Data access layer

Posted by Admin on 20-Jan-2007 03:29

When designing a data access layer you should carefully design it's services.

When you don't use this layer, you will have the following advantages:

- full access to the features provided by the datasource (rdms)

- flexibility since you can create any view you like on the fly

Advantages of a data access layer are obvious ones like:

- encapsulate all physical data access logic in one layer

- hide all the database features and provide some predefined services

- potentially change the internal physical datasource access logic without disrupting layers above. You can't change the service's behavior though without disrupting the depending layers.

Most of the time designing the data access layer interface is the hard part. Especially when you don't want any database access outside the data access layer, you're going to make this layer very critical. I think the "hard part" is caused by the reduction in flexibility. Sometimes you need a query resultset to achieve something efficiently, sometimes you can pass some parameters and that's sufficient to execute an transaction and sometimes you can handle a disconnected prodataset efficiently. But defining a data access layer that can only deal with disconnected prodatasets is not the way to go.

Perhaps the AutoEdge guys can share some of their design decissions in this area... A data access layer for simple operations is a no brainer. Think about CRUD-operations. You can generate this code or create some generic classes to do the work dynamically. But the area of querying the data and aggregating sets efficiently will be a bit harder. Here you need at least two types of "return values": a cursor (opened resultset) and a temp-table/dataset. Depending on the resultset size the caller might want to use one or the other.

I would have no problems implementing a "get me a list of backorders"-service that directly accesses the database to produce the list efficiently (*). This service would be a top level AppServer service. But the problem will be to draw a line somewhere, so developers won't put in FIND-statements everywhere.

I think the ABL plus the development environment could help us here. Since the ABL is a very data driven language, it could do a better job creating a data access layer. Maybe we need a special project type that allows you to do direct database access and other types that won't allow it. Why isn't it possible in OpenEdge 10+ to define some query and let the environment generate a datasource component for you?

*) I know you shouldn't talk about performance before you actually have a problem with it

All Replies

Posted by Tim Kuehn on 20-Jan-2007 08:18

Why isn't it possible in OpenEdge 10+ to define some query and let the environment generate a datasource component for you?

You can't even use it as a datasource for multiple TTs, only one at a time.

Which is a problem if the need is to fill some TTs with normal parent-child sets of records one would get back from a query.

Posted by Thomas Mercer-Hursh on 20-Jan-2007 13:58

I guess no one is going to argue with that ...

Isn't that what we have been

criticizingHHHHHHHHHH^Hdiscussing on the - Discussion

thread the last couple of days? Well, here we run into

this fuzziness about AppServer again. To me, everything we know

about AppServer is that it should be stateless. If it is stateless,

this seems to me that any one agent is also identityless, meaning

that it is not dedicated to any particular type of service.

Therefore, anything that we are going to call a service has to be

at least one level down from the agent. This raises the question of

how it is that the agent locates and "attaches" to or communicates

with the service. If this is done using Sonic, then it is pretty

clear how a service publishes its identity, how work is directed to

it, how one communicates, how things are routed, etc. The question

I have is whether Sonic alone is sufficient. I know I want it there

for workflow management, distributed computers, interface with

external systems, etc., but it seems to me that I either need large

multi-threaded sessions or I need a very high performance method of

communicating among multiple sessions on the same box ... creating,

in essence a virtual AVM out of multiple sessions (yes, I realize

that is a virtual virtual! ) I suppose you can always create a project that

has no database connection...

Posted by Admin on 21-Jan-2007 03:00

One of the

advantages on an encapsulated data access layer that

you didn't mention is that you only have to solve the

problem once and then every one else who needs this

data can just use it without having to worry about

how it is obtained.

I asumed you would that get one yourself once you define logic in one place...

Given all the

questions on the PEG over the years about "why did my

query do this funny thing? or why did my query

perform poorly", this seems like a big plus and a net

reduction in development effort overall, especially

since the data access layer is probably one of the

areas in an application must susceptible to

generation.

Your query might run 100% efficient and the unit test may run "green" as well, but still your application might perform bad. It's an illusion to claim that once we put everything in one tier, all database related problems will be solved! Due to the nature of OO you're moving responsibilities to autonomous units (classes). This means that you might have to ask the help of your neighbours (objects) a bit more often than you would in less well structured application. This will lead to more isolated, simplified queries that will fire more often. Sure there is caching, but that will complicate things as well.

Isn't that what we have been

criticizingHHHHHHHHHH^Hdiscussing on the -

Discussion thread the last couple of days?

I guess you're right, I just wanted to focus on the data access layer... To me the ABL is already providing a database abstraction layer when you compare it to raw SQL:

- it gives you typed buffers

- it gives you nested transaction

Sure, it's no ORM, not by far.... but it has it's strengths as well.

Thinking in OO terms, I would have said that the two

return types were either an individual object or a

collection of objects. I'm not sure I can think of a

case off hand where I need to return both the

collection and a pointer or "cursor" into a specific

location in the collection. Could you explain this a

bit more?

Wrong! I was explaining that returning a collection of objects or a temp-table or a prodataset is not what I need for certain areas. Whether it's OO or not is irrelevant. Sometimes you need to loop over a set of rows to do some calculation. You don't need a temp-table with all data in memory, since that would be very inefficient.

Posted by Admin on 21-Jan-2007 03:47

This service would be a top level AppServer

service

Well, here we run into this fuzziness about AppServer

again. To me, everything we know about AppServer is

that it should be stateless.

Let's take that debate in a new thread http://www.psdn.com/library/thread.jspa?threadID=2862

Posted by Thomas Mercer-Hursh on 21-Jan-2007 12:16

To be sure, but it won't be running badly because of the query

unless the query is one that is inherently slow and thus there is

nothing to be done about it. At the very least, we will know exactly where the

problem is! Your example is still not

clear to me. If you need to loop over a series of rows, what is

efficient about doing this with an in-memory representation? Unless

you mean that you only want the total and there is only need for

one pass ... well then, why not also provide that functionality. It

is one of those things for which a SQL connection would be ideal as

long as the aggregate of interest was one supported by the SQL.

Regardless of how it is done, there is nothing about OO which says

that you can't return aggregates.

Posted by Tim Kuehn on 21-Jan-2007 14:48

Indeed, I wish that ABL had a few of the SQL extensions provided by some extensions. E.g., TOP ... which is in 10.1B SQL, but not in ABL.

One can do the same thing with code, which isn't exactly what you're looking for. However, since the query has to be processed manually anyway, putting in a counter to stop after "n" iterations isn't too hard.

There's also the "CACHE" attribute, but it's not what you're looking for either.

Posted by Thomas Mercer-Hursh on 21-Jan-2007 15:42

The point being that these SQL extensions are processed on the server, outside of the process space of the client. Even if the client is on the same box, this is a big plus.

It is another one of those places, like the no-index read, where the ABL query has oddly been left the poor cousin of the SQL query.

Posted by Admin on 22-Jan-2007 02:12

Wrong!

An interesting reaction to "Can you this explain a

bit more".

The "Wrong!" applied to your conclusion:

"...

Thinking in OO terms, I would have said that the two

return types were either an individual object or a

collection of objects.

..."

Let's assume that you want to put all data access logic in one tier. The tier above wraps this tier by adding business logic. Imagine a data intensive routine like a "credit limit check" that needs to query:

- outstanding orders

- outstanding invoices

- etc

to produce the result (which is basically "sufficient" or "insufficient").

You could design a data access component that returns a decimal for the "outstanding orders" request, one for the "outstanding invoices", etc and let the tier above do the math.

But maybe it requires some additional rules to filter orders/invoices, so the tier above needs to loop over the resultset. In that case it might be more efficient to return a simple forward only resultset ("cursor" in other languages) instead of a temp-table with all outstanding orders preloaded.

The most efficient way would probably be to let the data access layer produce the final result, since it's the closest one to the data. But in this case the DAC-component would have an awfull lot of knowlegde about other tables, since it needs to read related data.

So my point is: the boundary between "business logic" and pure "data access logic" might not always be that clear.

Posted by Thomas Mercer-Hursh on 22-Jan-2007 11:36

I

would be the last to suggest that it is a trivial question, but my

inclination is to think that when it isn't clear, that the problem

isn't yet clearly thought through.

Posted by Admin on 22-Jan-2007 13:58

Well, first of all, I would never design a credit

check system which required reading all open orders

and invoices for a customer as a part of the check.

No hard feelings, imagine something else than, it was just an example

Secondly, passing a credit check is not something

which I think is typically resolvable by a data

access layer.

Well something has to collect the data and something (else) has to do something with it. And I might not be interested in full blown orders with it's details or customers with their addresses.

Two frequent scenarios come to mind.

...

Another is the kind of ERP system which my

software represents where the factors involved in

the credit evaluation are very complex including

credit available, credit hold, total value of orders

which are being checked, which might be a combined

shipment for some customers and not others, age of

oldest open invoice in relationship to credit terms,

credit overrides for particular orders, etc.

Exactly, that's what came in my mind as well. This is a very complex calculation, I know. The same you could imagine for an "available to promise" calculation for products where you have to deal with locations, batches, orderportfolio, etc.

You can attack this problem in at least two ways:

- create a specialized component that reads the database directly, ignoring any data access layer

- create a component that uses data access components for every query

That's

a BL task if I've ever seen one.

What do you mean?

While I can't see returning a collection of orders

other than the ones being currently processed

It doesn't have to be a different collection. But the overhead of creating a collection of orders/a prodataset/a temp-table might not be what I need for this particular problem. When the data access layer would return a simpler proxy object that wraps the query and behaves like a forward only cursor. So it could be something like "OrderProxy" with a "Next"-method to advance to the next internal row and exposes all the "Order"-properties like OrderDate, OrderId, etc.

But in this case the DAC-component would have an

awfull lot of knowlegde about other tables, since it

needs to read related data.

I haven't seen any example yet where there is an

inappropriate amount of knowledge required of the DA

layer.

There would be if you would implement the "credit check" data logic in a single data access component.

So my point is: the boundary between "business

logic" and pure "data access logic" might not always

be that clear.

I would be the last to suggest that it is a trivial

question, but my inclination is to think that when it

isn't clear, that the problem isn't yet clearly

thought through.

The problem often is that a single solution will be applied to all problem areas. I like Scott Ambler's practical approach where the data access layer provides multiple targets (cursor, record and object collection).

Posted by Thomas Mercer-Hursh on 22-Jan-2007 14:14

But, having concluded above that this is inherently

a BL function, it has no business being a DA component, so it is a

non-example.

Posted by Admin on 22-Jan-2007 14:38

Well, one of the reasons for wanting an example is

not being sure it is a real requirements. If the

answer to every example is, "I wouldn't do it that

way", then it isn't clear that there is a real

requirement. It is not infrequent here and on the

PEG for someone to ask how they can do something and

it turns out the best answer is either that they

shouldn't be trying to do it or that they should be

formulating the problem differently. Before I try to

deal with the dilemma as presented, I want to know

whether it exists or not.

It is not infrequent that someone starts with a fresh mind designing an architecture and during implementation finds out that the ideal world doesn't exist after all.

Posted by Thomas Mercer-Hursh on 22-Jan-2007 15:23

Which only means that

they haven't finished the design yet.

This thread is closed