Management of domain objects

Posted by Thomas Mercer-Hursh on 08-Dec-2006 17:10

Any given application is going to have a number of different types of objects, and the lifecycle and method of management are likely to vary by type. Let's consider, though, one of the core types of objects: the domain objects or business entities.

One of the characteristics of these objects is that they tend to be persistent, meaning that existing instances are likely to be fetched from either a data store or some other source which ultimately leads to a data store, and that new or changed objects likewise need to be returned to the data store or intermediary source so that they can ultimately be stored.

One of the implications of this is that these objects have a sort of "home base" in the data access layer from which they arise and to which they return. To me, this suggests that it might be desirable to anchor the lifecycle management in the data access mechanism.

One way this might be done is for the data access object to retain a cache of all objects of that type which are currently in use. When an object is changed, it could be returned to the data access object, and a save-changes method would return the object to the data store or other source. This could include a comparison with the cached object to determine whether or not it was changed and whether or not the before-image state in the returned object is identical to the one in the cache. It would only perform this check if it was an authoritative source for the object; otherwise it would pass it back to the authoritative source.
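A minimal sketch of this check-out/check-in idea might look as follows. This is only an illustration of the mechanism, not the actual design under discussion; the dict-based store and all names here are invustrative assumptions:

```python
import copy

class DataAccessObject:
    """Illustrative DAO that caches checked-out objects and compares
    before-images on check-in to decide whether a store is needed."""

    def __init__(self, store):
        self.store = store          # stands in for the database
        self.cache = {}             # key -> (object, before-image)

    def fetch(self, key):
        if key not in self.cache:
            obj = dict(self.store[key])                  # materialize from the store
            self.cache[key] = (obj, copy.deepcopy(obj))  # keep a before-image
        return self.cache[key][0]

    def save_changes(self, key):
        obj, before = self.cache[key]
        if obj != before:                      # changed since check-out?
            self.store[key] = dict(obj)        # write back to the store
            self.cache[key] = (obj, copy.deepcopy(obj))  # refresh before-image
            return True
        return False                           # unchanged: nothing to do
```

Checking an unchanged object back in becomes a no-op, which is the point of keeping the before-image in the cache.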

Were it possible to have multi-threaded services, one such data access object might serve as the authoritative source for all threads and, assuming this service was the authoritative source for that object type, then this one object would know whether or not changes had been made. Without this, it would need to re-obtain the object from the database to make the comparison.

There might also be a method to indicate that one was done using an object without having to return it, perhaps because no changes were made, the object was read-only, or any changes were being discarded.

The data access object might also keep track of the number of instances of an object which had been checked out and were not yet returned. While an instance was checked out, it would remain in the cache.

Different object types might have different caching policies for objects which drop to zero external references. The DAO for a code table, for example, might be initialized by reading the entire table, and it would keep the entire table in cache since that is normally what would be requested; i.e., actual updates would be rare. A more complex object like a Customer or Order might be retained in the cache for a specified period of time on the theory that, if there is one operation on it, there might be a series; these objects would age out after some specified period. Another type of complex object might be aged out immediately or quickly because it was known to be unlikely to be accessed again soon. An object of this type might be a completed transaction audit trail.
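The three policies just described (pin the whole table in cache, age out after a time-to-live, evict immediately) could be sketched roughly like this; the class and policy names are invented for illustration:

```python
import time

PIN, TTL, IMMEDIATE = "pin", "ttl", "immediate"

class CachedEntry:
    """One cached object plus its reference count and eviction policy."""

    def __init__(self, obj, policy, ttl_seconds=0):
        self.obj = obj
        self.policy = policy
        self.ttl = ttl_seconds
        self.refcount = 0
        self.released_at = None    # set when refcount drops to zero

    def check_out(self):
        self.refcount += 1
        self.released_at = None
        return self.obj

    def check_in(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.released_at = time.time()

    def evictable(self, now=None):
        """True when no one holds the object and its policy allows eviction."""
        if self.refcount > 0 or self.policy == PIN:
            return False               # checked out, or pinned like a code table
        if self.policy == IMMEDIATE:
            return True                # e.g. a completed audit trail
        now = time.time() if now is None else now
        return (now - self.released_at) >= self.ttl    # age-out policy
```

A sweep over the cache would then delete every entry for which `evictable()` returns true.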

This kind of lifecycle management is quite different from the general-purpose factory or general-purpose object manager to which there has been recent reference in other threads.

So, what does everyone think?

All Replies

Posted by Tim Kuehn on 14-Dec-2006 18:11

This sounds like the "permanent" condition I'd implemented in my procedure manager - if an object was written with that attribute, the PM would never delete it until a shutdown was being performed.

Posted by Admin on 15-Dec-2006 12:39

Let's consider, though, one of the core types of objects, the domain objects or business entities.

Now you're really confusing me. In other threads you explained that a business entity could wrap a temp-table/dataset/buffer, but here you clearly want the more traditional way of dealing with entity classes: every "order row" in the database will be mapped to an Order-business entity instance, when asked for, right? This means that you will end up with lots of entity instances during a service request (in case of the order-example, you will have the order entity, the order line entities as a collection of the order, product entities as a property of an orderline entity, a debtor entity, etc). This was one of my concerns, since I assume that it's a weakness of the OO4GL (since I assume that the PSC architects still think a lot in terms of buffers, temp-tables and prodatasets and not in entities). But OK... let's move on....

One of the implications of this is that these objects have a sort of "home base" in the data access layer from which they arise and to which they return. To me, this suggests that it might be desirable to anchor the lifecycle management in the data access mechanism.

You would use a "session" or a UnitOfWork at the level above the data access layer. The data access layer is clearly defined:

- when you ask it to fetch an object (row), it should go to the database and get the latest version of it

- when you ask it to save an object, it should save it to the database

This layer should be consistent in its database behavior. The layer on top should implement caching and transaction scoping. For some inspiration, see http://www.hibernate.org/42.html.
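A rough sketch of this split, with a strictly database-facing data access layer and a Unit-of-Work above it that owns caching and flushing. The `UnitOfWork` name follows the common pattern vocabulary; the dict-backed layer below is a stand-in, not anyone's actual architecture:

```python
class DataAccessLayer:
    """Always goes 'to the database' (a dict here) - no caching at this level."""

    def __init__(self, db):
        self.db = db

    def fetch(self, key):
        return dict(self.db[key])      # always the latest stored version

    def save(self, key, obj):
        self.db[key] = dict(obj)

class UnitOfWork:
    """The layer above: tracks loaded objects and flushes changes at commit."""

    def __init__(self, dal):
        self.dal = dal
        self.identity_map = {}         # one in-memory instance per key

    def get(self, key):
        if key not in self.identity_map:
            self.identity_map[key] = self.dal.fetch(key)
        return self.identity_map[key]  # repeated asks return the same instance

    def commit(self):
        for key, obj in self.identity_map.items():
            self.dal.save(key, obj)    # push everything back through the DAL
```

Because the identity map lives above the DAL, two parts of the same request that ask for the same key get the same in-memory object, while the DAL itself stays stateless.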

One way this might be done is for the data access object to retain a cache of all objects of that type which were currently in use.

I wonder what kind of class type you want to exchange with the data access layer? When you stuff in the specialized domain object (the "order business entity"), you're basically tying the data access layer and the layer above together. You will create a strong dependency between the two.

Posted by Admin on 15-Dec-2006 12:47

This site elaborates on caching and transaction context http://www.progress.com/dataxtend/data_caching/index.ssp.

Posted by Thomas Mercer-Hursh on 15-Dec-2006 13:52

You must have some assumptions in here that I don't follow or agree with. If the data access layer creates an Order object whose members are whatever one has defined them to be and are independent of how they are stored, the only tie between the data access layer and everything else is that they have to agree on the contract for Order. If one doesn't have that, one has nothing.

Posted by Thomas Mercer-Hursh on 15-Dec-2006 14:06

I think this is talking about a different level of caching. What I am talking about here is simply caching based on the idea that a recently used object may be needed again. An Order, for example, is likely to go through a series of processing steps from the initial creation, allocation, credit check, and routing to the warehouse. Somewhat later it might be shipped and invoiced. Depending on the nature of the business, these latter two might be "soon" or not. E.g., if manufacturing or assembly was required, it might be a while.

To me, the DataXtend products exist to provide more bandwidth and more local access to data. If the master database of customer information is in St. Louis and the satellite office is in Columbus, one can get into a performance bottleneck if one has to keep making requests across the WAN for all the customers. DataXtend provides a technology which can provide a limited local "shadow" of that data so that many requests can be filled locally. There is still a need for the WAN connection for each new customer, but it can be a specially tailored communication not limited by the generality of regular SOA communications, and thus it can be more efficient. I see this as supplementing, not replacing, local caching in an individual service.

Posted by Admin on 15-Dec-2006 14:32

Where did you get that idea? Whether the lines in an Order are OrderLine objects or whether they are simply a temp-table is a design decision that is independent from the current topic. Nothing I have said suggests a one-to-one mapping of table to object. Quite the contrary, the point of a data access layer is to make any such relationship invisible to the application.

Here you are not consistent: you replied to John Sadd somewhere that you're against exposing temp-tables. You said you wanted classes with properties, not buffers.

For the sake of the discussion, please share a simplified, written class model with us that demonstrates the class responsibilities. In the case of an order with orderlines, products, etc. show us how you want to model the data access layer and the business entities. It would be nice to see how that would interact with, for instance, a BROWSE-widget or DataGrid, but that's another topic.

Posted by Thomas Mercer-Hursh on 15-Dec-2006 14:38

The internal structure is irrelevant to this discussion. Pick your favorite.

Posted by Admin on 15-Dec-2006 14:53

Well, any individual data access layer class may or may not be accessing the database.

Sure, that's the whole idea of abstracting the database access...

But, Object is not equal to Row. It is whatever it is. Among other things, it might be a collection.

Be more concrete! If you don't want temp-tables/buffers and you don't want objects, what do you want to expose? A collection of what? When you don't wrap the orderline as a class, it doesn't have behavior attached.

When you wrap an orderline temp-table with a collection wrapper and the collection is not a collection of orderline objects, you probably have a current orderline in the collection. In that case there can be only one orderline active in the collection. Something like "order.Orderlines.GetOrderline(2)". Next you will have access to the orderline properties via "order.Orderlines.ProductId". But this collection isn't reentrant... I'm trying to figure out what you have in mind, so perhaps it's better that you write down what's in your head instead of me guessing...

Among other things, it might be a collection. It also might be assembled from multiple sources. Note also that SaveChanges on a collection would only save/update/delete those entities which had been modified.

So you would have an "order.Orderlines.SaveChanges()" and that wouldn't signal the order? Or would it internally call its order, telling it that it wants to be saved?

It is one thing to have a Customer data access object and to know that all fetches and stores of Customer happen through one instance of that object, and it is another to have one of these objects in each relevant session. In the latter case, none can assume that a cached object is still current.

With "current" I guess you mean "not dirty"? So what happens when I have a WRITE-trigger defined on the orderline table that updates some order header total/price/something? The order will become invalid when the orderline is saved. In this simplified scenario you can tell, but it soon gets very complex. But surely you don't have triggers...

You must have some assumptions in here that I don't follow or agree with. If the data access layer creates an Order object whose members are whatever one has defined them to be and are independent of how they are stored, the only tie between the data access layer and everything else is that they have to agree on the contract for Order. If one doesn't have that, one has nothing.

Perhaps all questions will be answered when you provide the model...

Posted by Phillip Magnay on 15-Dec-2006 15:03

If I could very briefly interject a couple of thoughts here...

This is a complex and involved topic with a lot of scope for discussion. However, there is also the possibility of misunderstanding due to the limited nature of text as a means to convey the complexities involved, differences in terminology, and tone. Patience and the willingness to seek to understand the other's point of view are critical to the goodwill of such complex discussions via this forum.

Robust debate is healthy. But the goal should not only be to convince the other of your own view; it should also be to understand the other's perspective.

Posted by Thomas Mercer-Hursh on 15-Dec-2006 15:33

Be more concrete! If you don't want temp-tables/buffers and you don't want objects, what do you want to expose?

Correct. They are clearly anathema in this context.

Posted by Thomas Mercer-Hursh on 15-Dec-2006 15:36

What I have long thought was one of the most powerful learning techniques.

Posted by Admin on 16-Dec-2006 13:06

For the present discussion, I think it is irrelevant what the internal implementation is.

I'm not asking for the internal implementation, I'm asking for the contracts you have in mind when you talk to the data access layer. It clearly makes a difference if you return an Order-entity instance per order or if your order class only exposes the order properties as some kind of resultset wrapper.

What I want to avoid is passing around temp-tables or PDS unwrapped. That requires that the definition be in both ends.

Sure, that's a fair argument.

That doesn't keep one from passing an object which wraps a temp-table or PDS, because then one merely uses the object's methods.

The question will be: how are you going to find the right order (wrapper) instance in memory that contains a particular order? When I ask for the last three orders of a customer, what do I get from your Finder-class? An Order instance that wraps the 3 orders and manages a temp-table internally or a collection of order classes with three order instances?

In the latter case it's very easy to identify the object externally by the order primary key. So the next time I ask the Finder for order "xxx", it can look up the order instance in memory (cache) and return the already loaded version. Take into consideration that the entire state of an Order class instance isn't equal to the properties stored in the order row (or the temp-table it wraps).

When you start aggregating orderheader and orderlines into a flattened Order-entity class, as you suggest, when do you stop? When do you decide that Product should be fetched as a standalone object instead of being exposed as product properties of the order? And will the Order(line) provide you the Product or should the caller use the Product Finder to locate the product based on the orderline information? This would indicate that something else knows how to fetch the product.

An Order object can include either a collection of OrderLines or a temp-table of OrderLines ... doesn't matter, because the temp-table never goes anywhere on its own.

But the Orderline will go on its own as soon as you expose it. You can't stop the caller from using it.

Similarly, I can have an OrderLineCollection object which is a specialized object for only order lines and which contains a temp-table with individual fields, or I can have a generic collection object which contains OrderLine objects. I can define the former to have the same signature as the latter and no one will ever know.

So you want to keep the orderlines, either as collection or as temp-table, internal to the Order?

More to the point, the question I have raised here is independent of the specific implementation. You can choose to do things one way and I can choose another, and we can still both ask ourselves the question I have posed.

Sure, I can ask myself where, how and what to cache. But when you start a thread and ask people for their opinions, it would be nice if you could elaborate a bit on the architecture you have in mind. Just show us how your classes would interact so we can give better feedback.

Posted by Thomas Mercer-Hursh on 16-Dec-2006 13:28

I don't want to restrict the discussion to caching in my particular architecture. Indeed, one of the interesting questions would be if the caching decision was different based on different architectures.

Posted by Admin on 16-Dec-2006 14:04

So, to summarize:

- a Finder will return Order-instances

- for manipulating the orderlines you would like to use the Order

- sometimes you will use an Orderline collection when you want to query

Manipulating such a restricted (or perhaps readonly) Orderline would mean fetching the full blown Order.

Interesting that you want to aggregate orderline and order behavior into the order, since the orderline behavior can be rather complex as well. An orderline can be the "header" of a delivery schema (an orderline can be delivered in multiple shipments, for instance). But I can imagine that you want to model this shipment handling in a dedicated class. On the other hand, there is a relationship between the orderline delivery amount and the shipments, discount, etc. This makes me question whether your entity classes merely expose state or whether they encapsulate data and behavior.

So now that I know that you want to return Order-instances from the Finder, I think you somehow have to track the materialized Orders. So when I ask 10 times for order id "xyz", I will always get the same instance. Else I would be running into trouble: when I have two versions of order "xyz" in memory, which version is valid? Now I assume that the Order entity class is a full-blown, self-supporting object that doesn't fetch its order state from a Finder-singleton that manages some kind of temp-table/PDS with the actual order rows, and that an Order object is more than the accumulation of the order buffer properties. Else it wouldn't really matter if the Finder returned 10 different wrapper versions of order id "xyz", since the state would be managed externally from the order entity class, but centrally, namely in the Finder** (and there would be just one of them in the session).

So I think you have two levels of "caching":

1) object tracking (for session* consistency)

2) performance optimization (cache mostly readonly objects and reduce the # of database reads)

Number two could be done by the Finder. Number one should be one level up, since it should be created at the service request.

*) with session I don't mean application session, but the session context of the current request. So when something wants to place an order, this call handling is my "session context". It has a transaction scope, materialized entity scope, security scope, etc.

**) this is just a way to model it. And it wouldn't be my preferred method....

Posted by Thomas Mercer-Hursh on 16-Dec-2006 14:47

Much better, though I don't think I see it quite the same way. For complex domain objects, it seems to me that the purpose of caching is also performance, but perhaps we should ask about two different levels of caching. One is during the usage lifetime of the object. The other is some longer period, which might be minutes, hours, or days, but at least it is during a period when the object is not currently in use in that session, but was previously used.

You seem to think that it might be important to cache something like an order during its usage lifetime in case the same object were requested by another part of the session. Given that we don't have a multi-threaded session, this seems pretty remote, but I agree that it is one thing that I would like to do. Indeed, one of the things I would like if we could get multi-threaded sessions or closely cooperating sessions is that the source for a particular type of object would not only cache the object while it was in use, but it would also have a distributed event system so that it could notify all users of an object when a change was made so that they could get a fresh copy. With single-threaded sessions I'm not sure if this has practical value, but I agree that it is a nice idea. In an optimistic locking discipline, it is possible for different sessions to get different versions of an "object" and this is resolved or rejected at check-in. So, ensuring that all copies are identical is not considered necessary in this discipline in order to ensure consistency in the stored data. The discipline works because the conflict doesn't happen very often.

Longer term caching is clearly just a performance issue. This is based on the notion that accessing something once at time X means that it is likely that it will be wanted again sometime "soon", e.g., an order generates a shipping request which is routed to the warehouse and, at least in some contexts, this means that the order will soon be updated by the results of the shipping request. Caching the object would keep the Mapper from having to rebuild it, which might be moderately expensive if lots of tables were involved. The second type of caching, I think I agree, is more a matter of caching in the business logic, although it seems likely that one would back that cache with a cache in the data access layer. This kind of cache is one where it would be particularly nice to have the ability to publish a CollectionModified event. One of the reasons for the cache in the data access layer, btw, is that if any of the requestors have edit ability, then one would want to be able to use Tracking-Changes to identify what was different. Note too that the authoritative source for such a collection may be in a different service than where it is being used, so one needs to cache it on both sides of the bus.

Posted by Admin on 17-Dec-2006 05:05

I seem to be having a hard time convincing you that I'm not trying to discuss specific architectures in this thread. This thread was intended to discuss the management of the lifetime of domain objects ... however composed.

Thanks for your patience, and I really appreciate talking to a fellow architect (it's a pity the audience is rather small).

The primary reason for me diving into details is that far too often people agree on the highest level of abstraction. But as soon as you start implementing things, you either run into runtime specifics or other concerns, and then suddenly things aren't so easy anymore. It's you yourself who complains about the simple examples in the AutoEdge reference architecture, and I agree with you (I haven't dived into AutoEdge, but other samples/whitepapers only highlight certain areas).

So when you say "No need to fetch the whole order to update impacted fields" you make a misjudgement, imho. I think updating an orderline should be done in the scope of the order header. Something simple like changing the ordered quantity might have a big impact on the price. Let's assume you get a discount of 25% when you order 10 pieces. So you order 10 pieces and a day later you cancel 9 of them. The order entry clerk updates the quantity, and hopefully the business logic will update the price as well, unless the clerk overrules the system.

Don't get me wrong: I'm really not into creating a perfect orderline system. It's just my favorite example.

Posted by Admin on 17-Dec-2006 05:31

You seem to think that it might be important to cache something like an order during its usage lifetime in case the same object were requested by another part of the session. Given that we don't have a multi-threaded session, this seems pretty remote, but I agree that it is one thing that I would like to do.

No it's not. You tend to forget the responsibilities of classes. Once you start OO'ing your application, you also start creating "self-supporting objects". So during orderline validation, something might want to fetch the customer object. This code could manipulate the customer, so its in-memory state will be changed: it's not synchronized with the database state, since the processing is not done yet. Now another part of the code during this same call processing requires the same customer as well. What if the Finder produces a new version of the Customer, materialized with the current database state? The two Customer instances will be out of sync. This has nothing to do with being multi-threaded.

Indeed, one of the things I would like if we could get multi-threaded sessions or closely cooperating sessions

Whoa... don't underestimate the complexity you will introduce by adding multi-threading. There are very few people who know what they should be doing when it comes to multi-threading. It will be very easy to deadlock yourself. A simple example: thread A has a pending transaction and thread B wants to update the same data. More complex to detect is when you start adding synchronized code...

is that the source for a particular type of object would not only cache the object while it was in use, but it would also have a distributed event system so that it could notify all users of an object when a change was made so that they could get a fresh copy.

What about the ACID-rules here?

In an optimistic locking discipline, it is possible for different sessions to get different versions of an "object" and this is resolved or rejected at check-in.

Are you considering pessimistic locking in your architecture? You will soon be in trouble when you add the multi-threading part, once it becomes available...

consistency in the stored data. The discipline works because the conflict doesn't happen very often.

Hehe... that's the same argument as saying "I ignore conflicts in an optimistic concurrency controlled environment since two users will hardly ever update the same row"

Longer term caching is clearly just a performance issue. This is based on the notion that accessing something once at time X means that it is likely that it will be wanted again sometime "soon", e.g., an order generates a shipping request which is routed to the warehouse and, at least in some contexts, this means that the order will soon be updated by the results of the shipping request. Caching the object would keep the Mapper from having to rebuild it, which might be moderately expensive if lots of tables were involved.

So you mean that a subsequent request, in the case of an AppServer a new stateless AppServer call, will want to reuse the same order instance? I find that very unrealistic... Between the two requests another user could be working on this order as well, perhaps he's approving it... And then you have load balancing: you won't get the same AppServer session.

The second type of caching I think I agree is more of a matter of caching in the business logic, although it seems likely that one would back that cache with a cache in the data access layer.

You will cache when you think it's worth it. You might cache things at the user interface level as well. A list of countries can very well be cached on a smart client device once it has been fetched. There is no need to spend another AppServer roundtrip on that one.
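That kind of read-mostly client-side cache is trivial to sketch; `fetch_countries` here stands in for the AppServer roundtrip and is an invented name:

```python
class CountryCache:
    """Caches a read-mostly list on the client so repeat requests
    skip the expensive server roundtrip."""

    def __init__(self, fetch_countries):
        self.fetch_countries = fetch_countries   # the roundtrip, injected
        self._countries = None

    def get(self):
        if self._countries is None:              # first call only
            self._countries = self.fetch_countries()
        return self._countries                   # served locally afterwards
```

The tradeoff is staleness: a list of countries changes so rarely that caching it for the life of the client session is usually safe, which is exactly why it is a good candidate.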

This kind of cache is one where it would be particularly nice to have the ability to publish a CollectionModified event.

One of the problems with hooking up subscribers is that the object's lifetime will be extended as well, since you will connect everything together. So my request handler will subscribe, deep down there, to a CollectionModified event published by a data access component. It would probably have to do that for all the DACs it uses (order, product, customer, etc).

One of the reasons for the cache in the data access layer, btw, is if any of the requestors have edit ability, since then one would want to be able to use Tracking-Changes to identify what was different. Note too that the authoritative source for such a collection may be in a different service than where it is being used, so one needs to cache it on both sides of the bus.

So what will happen to the state of this cache when you rollback (UNDO) the transaction somewhere?

Posted by Thomas Mercer-Hursh on 17-Dec-2006 12:11

OK, I have started a new thread on Order and OrderLine ( http://www.psdn.com/library/thread.jspa?threadID=2725 ) so we can go into that discussion there.

Let's pick this up in the new thread.

Posted by Thomas Mercer-Hursh on 17-Dec-2006 12:37

Very well understood. One of the points Gus made when we were discussing my use case on multithreading was, in essence, "how are we going to keep people from shooting themselves in the foot?" The Right Thing. Surely you don't intend having long open transactions ...

This thread is closed