When designing a data access layer you should carefully design it's services.
When you don't use this layer, you will have the following advantages:
- full access to the features provided by the datasource (rdms)
- flexibility since you can create any view you like on the fly
Advantages of a data access layer are obvious ones like:
- encapsulate all physical data access logic in one layer
- hide all the database features and provide some predefined services
- potentially change the internal physical datasource access logic without disrupting layers above. You can't change the service's behavior though without disrupting the depending layers.
Most of the time designing the data access layer interface is the hard part. Especially when you don't want any database access outside the data access layer, you're going to make this layer very critical. I think the "hard part" is caused by the reduction in flexibility. Sometimes you need a query resultset to achieve something efficiently, sometimes you can pass some parameters and that's sufficient to execute an transaction and sometimes you can handle a disconnected prodataset efficiently. But defining a data access layer that can only deal with disconnected prodatasets is not the way to go.
Perhaps the AutoEdge guys can share some of their design decissions in this area... A data access layer for simple operations is a no brainer. Think about CRUD-operations. You can generate this code or create some generic classes to do the work dynamically. But the area of querying the data and aggregating sets efficiently will be a bit harder. Here you need at least two types of "return values": a cursor (opened resultset) and a temp-table/dataset. Depending on the resultset size the caller might want to use one or the other.
I would have no problems implementing a "get me a list of backorders"-service that directly accesses the database to produce the list efficiently (*). This service would be a top level AppServer service. But the problem will be to draw a line somewhere, so developers won't put in FIND-statements everywhere.
I think the ABL plus the development environment could help us here. Since the ABL is a very data driven language, it could do a better job creating a data access layer. Maybe we need a special project type that allows you to do direct database access and other types that won't allow it. Why isn't it possible in OpenEdge 10+ to define some query and let the environment generate a datasource component for you?
*) I know you shouldn't talk about performance before you actually have a problem with it
Why isn't it possible in OpenEdge 10+ to define some query and let the environment generate a datasource component for you?
You can't even use it as a datasource for multiple TTs, only one at a time.
Which is a problem if the need is to fill some TTs with normal parent-child sets of records one would get back from a query.
I guess no one is going to argue with that ...
Isn't that what we have been
criticizingHHHHHHHHHH^Hdiscussing on the - Discussion
thread the last couple of days? Well, here we run into
this fuzziness about AppServer again. To me, everything we know
about AppServer is that it should be stateless. If it is stateless,
this seems to me that any one agent is also identityless, meaning
that it is not dedicated to any particular type of service.
Therefore, anything that we are going to call a service has to be
at least one level down from the agent. This raises the question of
how it is that the agent locates and "attaches" to or communicates
with the service. If this is done using Sonic, then it is pretty
clear how a service publishes its identity, how work is directed to
it, how one communicates, how things are routed, etc. The question
I have is whether Sonic alone is sufficient. I know I want it there
for workflow management, distributed computers, interface with
external systems, etc., but it seems to me that I either need large
multi-threaded sessions or I need a very high performance method of
communicating among multiple sessions on the same box ... creating,
in essence a virtual AVM out of multiple sessions (yes, I realize
that is a virtual virtual! ) I suppose you can always create a project that
has no database connection...
One of the
advantages on an encapsulated data access layer that
you didn't mention is that you only have to solve the
problem once and then every one else who needs this
data can just use it without having to worry about
how it is obtained.
I asumed you would that get one yourself once you define logic in one place...
Given all the
questions on the PEG over the years about "why did my
query do this funny thing? or why did my query
perform poorly", this seems like a big plus and a net
reduction in development effort overall, especially
since the data access layer is probably one of the
areas in an application must susceptible to
generation.
Your query might run 100% efficient and the unit test may run "green" as well, but still your application might perform bad. It's an illusion to claim that once we put everything in one tier, all database related problems will be solved! Due to the nature of OO you're moving responsibilities to autonomous units (classes). This means that you might have to ask the help of your neighbours (objects) a bit more often than you would in less well structured application. This will lead to more isolated, simplified queries that will fire more often. Sure there is caching, but that will complicate things as well.
Isn't that what we have been
criticizingHHHHHHHHHH^Hdiscussing on the -
Discussion thread the last couple of days?
I guess you're right, I just wanted to focus on the data access layer... To me the ABL is already providing a database abstraction layer when you compare it to raw SQL:
- it gives you typed buffers
- it gives you nested transaction
Sure, it's no ORM, not by far.... but it has it's strengths as well.
Thinking in OO terms, I would have said that the two
return types were either an individual object or a
collection of objects. I'm not sure I can think of a
case off hand where I need to return both the
collection and a pointer or "cursor" into a specific
location in the collection. Could you explain this a
bit more?
Wrong! I was explaining that returning a collection of objects or a temp-table or a prodataset is not what I need for certain areas. Whether it's OO or not is irrelevant. Sometimes you need to loop over a set of rows to do some calculation. You don't need a temp-table with all data in memory, since that would be very inefficient.
This service would be a top level AppServer
service
Well, here we run into this fuzziness about AppServer
again. To me, everything we know about AppServer is
that it should be stateless.
Let's take that debate in a new thread http://www.psdn.com/library/thread.jspa?threadID=2862
To be sure, but it won't be running badly because of the query
unless the query is one that is inherently slow and thus there is
nothing to be done about it. At the very least, we will know exactly where the
problem is! Your example is still not
clear to me. If you need to loop over a series of rows, what is
efficient about doing this with an in-memory representation? Unless
you mean that you only want the total and there is only need for
one pass ... well then, why not also provide that functionality. It
is one of those things for which a SQL connection would be ideal as
long as the aggregate of interest was one supported by the SQL.
Regardless of how it is done, there is nothing about OO which says
that you can't return aggregates.
Indeed, I wish that ABL had a few of the SQL extensions provided by some extensions. E.g., TOP ... which is in 10.1B SQL, but not in ABL.
One can do the same thing with code, which isn't exactly what you're looking for. However, since the query has to be processed manually anyway, putting in a counter to stop after "n" iterations isn't too hard.
There's also the "CACHE" attribute, but it's not what you're looking for either.
The point being that these SQL extensions are processed on the server, outside of the process space of the client. Even if the client is on the same box, this is a big plus.
It is another one of those places, like the no-index read, where the ABL query has oddly been left the poor cousin of the SQL query.
Wrong!
An interesting reaction to "Can you this explain a
bit more".
The "Wrong!" applied to your conclusion:
"...
Thinking in OO terms, I would have said that the two
return types were either an individual object or a
collection of objects.
..."
Let's assume that you want to put all data access logic in one tier. The tier above wraps this tier by adding business logic. Imagine a data intensive routine like a "credit limit check" that needs to query:
- outstanding orders
- outstanding invoices
- etc
to produce the result (which is basically "sufficient" or "insufficient").
You could design a data access component that returns a decimal for the "outstanding orders" request, one for the "outstanding invoices", etc and let the tier above do the math.
But maybe it requires some additional rules to filter orders/invoices, so the tier above needs to loop over the resultset. In that case it might be more efficient to return a simple forward only resultset ("cursor" in other languages) instead of a temp-table with all outstanding orders preloaded.
The most efficient way would probably be to let the data access layer produce the final result, since it's the closest one to the data. But in this case the DAC-component would have an awfull lot of knowlegde about other tables, since it needs to read related data.
So my point is: the boundary between "business logic" and pure "data access logic" might not always be that clear.
I
would be the last to suggest that it is a trivial question, but my
inclination is to think that when it isn't clear, that the problem
isn't yet clearly thought through.
Well, first of all, I would never design a credit
check system which required reading all open orders
and invoices for a customer as a part of the check.
No hard feelings, imagine something else than, it was just an example
Secondly, passing a credit check is not something
which I think is typically resolvable by a data
access layer.
Well something has to collect the data and something (else) has to do something with it. And I might not be interested in full blown orders with it's details or customers with their addresses.
Two frequent scenarios come to mind.
...
Another is the kind of ERP system which my
software represents where the factors involved in
the credit evaluation are very complex including
credit available, credit hold, total value of orders
which are being checked, which might be a combined
shipment for some customers and not others, age of
oldest open invoice in relationship to credit terms,
credit overrides for particular orders, etc.
Exactly, that's what came in my mind as well. This is a very complex calculation, I know. The same you could imagine for an "available to promise" calculation for products where you have to deal with locations, batches, orderportfolio, etc.
You can attack this problem in at least two ways:
- create a specialized component that reads the database directly, ignoring any data access layer
- create a component that uses data access components for every query
That's
a BL task if I've ever seen one.
What do you mean?
While I can't see returning a collection of orders
other than the ones being currently processed
It doesn't have to be a different collection. But the overhead of creating a collection of orders/a prodataset/a temp-table might not be what I need for this particular problem. When the data access layer would return a simpler proxy object that wraps the query and behaves like a forward only cursor. So it could be something like "OrderProxy" with a "Next"-method to advance to the next internal row and exposes all the "Order"-properties like OrderDate, OrderId, etc.
But in this case the DAC-component would have an
awfull lot of knowlegde about other tables, since it
needs to read related data.
I haven't seen any example yet where there is an
inappropriate amount of knowledge required of the DA
layer.
There would be if you would implement the "credit check" data logic in a single data access component.
So my point is: the boundary between "business
logic" and pure "data access logic" might not always
be that clear.
I would be the last to suggest that it is a trivial
question, but my inclination is to think that when it
isn't clear, that the problem isn't yet clearly
thought through.
The problem often is that a single solution will be applied to all problem areas. I like Scott Ambler's practical approach where the data access layer provides multiple targets (cursor, record and object collection).
But, having concluded above that this is inherently
a BL function, it has no business being a DA component, so it is a
non-example.
Well, one of the reasons for wanting an example is
not being sure it is a real requirements. If the
answer to every example is, "I wouldn't do it that
way", then it isn't clear that there is a real
requirement. It is not infrequent here and on the
PEG for someone to ask how they can do something and
it turns out the best answer is either that they
shouldn't be trying to do it or that they should be
formulating the problem differently. Before I try to
deal with the dilemma as presented, I want to know
whether it exists or not.
It is not infrequent that someone starts with a fresh mind designing an architecture and during implementation finds out that the ideal world doesn't exist after all.
Which only means that
they haven't finished the design yet.