Reverse & forward engineering of database schemas

Posted by Phillip Magnay on 13-Sep-2006 10:33

I would be interested in hearing what people think about the need (or otherwise) for reverse & forward engineering of OpenEdge/Progress database schemas into UML design tools. The UML/MDA demos I just posted include some basic functionality in this regard. If there is a genuine need, what other functionality needs to be included to make it a viable/usable approach?

Phil

All Replies

Posted by Thomas Mercer-Hursh on 13-Sep-2006 12:31

I haven't tried out this version yet, but there were a couple of limitations in the version I did try where information in the dictionary did not seem to get imported, notably initial values and validation. Has this changed?

The really obvious weakness is the lack of JOIN information in the OE schema. While you have the tool for "guessing" about relationships, this is going to work poorly or not at all with some schemas and is likely to have some notable issues with most schemas. Any chance we can twist the arms of the engine crew to add a JOIN table? It would be a pain to build it the first time, but at least then it would be done once and it would be over. With a guess step, one has a problem of potentially having to re-do a lot of work if one ever does a fresh import.
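(The kind of relationship "guessing" described above can be sketched in a few lines. This is illustrative Python with invented table names, not the actual import tool: it proposes a parent/child join wherever every field of one table's unique key appears, same-named, in another table - which is exactly why it misfires on schemas that don't follow that naming convention.)

```python
def guess_relationships(tables):
    """tables: dict of table name -> {'fields': [...], 'unique_key': [...]}."""
    guesses = []
    for child, cmeta in tables.items():
        for parent, pmeta in tables.items():
            if child == parent:
                continue
            key = pmeta["unique_key"]
            # Guess a parent/child join only when every field of the
            # parent's unique key also appears, same-named, in the child.
            if key and all(f in cmeta["fields"] for f in key):
                guesses.append((parent, child, tuple(key)))
    return guesses

# Invented two-table schema for illustration.
schema = {
    "Customer": {"fields": ["CustNum", "Name"], "unique_key": ["CustNum"]},
    "Order":    {"fields": ["OrderNum", "CustNum"], "unique_key": ["OrderNum"]},
}
print(guess_relationships(schema))
```

A schema whose child fields are named differently from the parent key (say, `Cust-No` vs `CustNum`) yields nothing, which is the weakness being discussed.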

Posted by Phillip Magnay on 13-Sep-2006 13:55

> I haven't tried out this version yet, but there were a couple of limitations in the version I did try where information in the dictionary did not seem to get imported, notably initial values and validation. Has this changed?

OpenEdge-specific dictionary elements are captured as TaggedValues upon import of the .df. To open the TaggedValues Window in EA, it's just Ctrl-Shift-6 (or from the main menu, View->Tagged Values). Any field with an 'INITIAL' value should have a tagged value called 'INITIAL' with the corresponding value. Same with VALEXP and VALMSG or any other dictionary element. Of course, re: VALEXP and VALMSG, this is just data attached to a field and not a constraint/rule expressed in the model. But that should be reasonably simple to achieve. It's just figuring out the most appropriate way to express these rules.
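(A rough sketch of the mapping Phil describes - keyword lines on a .df field definition becoming tag/value pairs. The sample .df text and the parsing are simplified for illustration; the real importer and the real .df grammar handle far more than this.)

```python
# Illustrative .df field block; real .df files carry many more attributes.
SAMPLE_DF_FIELD = '''ADD FIELD "CustNum" OF "Customer" AS integer
  FORMAT ">>>>9"
  INITIAL "0"
  VALEXP "CustNum > 0"
  VALMSG "Customer number must be positive"
'''

def tagged_values(df_field_text):
    """Map each indented KEYWORD "value" line to a (tag, value) pair."""
    tags = {}
    for line in df_field_text.splitlines():
        if line.startswith("  "):                 # attribute lines are indented
            keyword, _, rest = line.strip().partition(" ")
            tags[keyword] = rest.strip().strip('"')
    return tags

print(tagged_values(SAMPLE_DF_FIELD))
```

The import would then attach each pair (INITIAL, VALEXP, VALMSG, ...) to the field element as a TaggedValue.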

> The really obvious weakness is the lack of JOIN information in the OE schema. While you have the tool for "guessing" about relationships, this is going to work poorly or not at all with some schemas and is likely to have some notable issues with most schemas. Any chance we can twist the arms of the engine crew to add a JOIN table? It would be a pain to build it the first time, but at least then it would be done once and it would be over. With a guess step, one has a problem of potentially having to re-do a lot of work if one ever does a fresh import.

I'm pretty sure a JOIN table is not under any current consideration so we're pretty much stuck with implicit relationships. Perhaps it might be possible to add more options or parameters to make the guesses. But yes, this is a real bother to work around.

Posted by Thomas Mercer-Hursh on 13-Sep-2006 14:24

Well, based on the earlier version of the code, I'm not seeing these tagged values. If I go to a column which I know has an initial value and a validation expression, the Initial box is blank and Ctrl-Shift-6 (lovely affordance!) gets me only a tagged value for dump-name ... oops, interesting ... I initially opened the table, then the detail, then the columns, and I get the tagged value for the table, but if I expand the table to the columns and click on that directly, I get the same display for the column, but now there are a bunch of tagged values, including the initial and validation ones. There really should be a more obvious way to find the tagged values than Ctrl-Shift-6.

I understand that translating the value expression into a constraint may have some tricky aspects ... not the least of which are the ones that are include files! ... but if there is a bucket for initial and an initial value, I don't see why that shouldn't be easy to populate.

As for the joins, I think it is a tough problem. I wonder if we couldn't do something external to the core dictionary. I.e., a table or two with a .df we import, populate it, then export text and read that in as a part of the load?
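(Thomas's "external to the core dictionary" idea amounts to keeping join metadata in a side file and merging it in at load time. A minimal sketch with a made-up pipe-delimited format - nothing here is a real OpenEdge or EA file format.)

```python
import io

def dump_joins(joins, out):
    # One line per join: parent|child|comma-separated join fields.
    for parent, child, fields in joins:
        out.write(f"{parent}|{child}|{','.join(fields)}\n")

def load_joins(inp):
    joins = []
    for line in inp:
        parent, child, fields = line.rstrip("\n").split("|")
        joins.append((parent, child, tuple(fields.split(","))))
    return joins

# Round-trip through an in-memory "file" standing in for the exported text.
buf = io.StringIO()
dump_joins([("Customer", "Order", ("CustNum",))], buf)
buf.seek(0)
print(load_joins(buf))
```

The point of an explicit file like this is the one made above: built once, it survives a fresh import, where a guess step has to be re-done.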

Posted by Phillip Magnay on 13-Sep-2006 14:55

Tagged values are really very useful. They can be predefined so that only certain tags apply to specific class/element types and stereotypes. You can also set them up to be assigned using drop-downs for booleans, lists, etc., as well as file dialogs, color dialogs, long-description dialogs, and so on. They really are quite handy.

I usually keep the Tagged Values window open all the time and docked in the lower right of the EA window, in addition to the new Properties window and the Relationships window.

> As for the joins, I think it is a tough problem. I wonder if we couldn't do something external to the core dictionary. I.e., a table or two with a .df we import, populate it, then export text and read that in as a part of the load?

Sure. There are probably a number of ways you could do it. It's just difficult arriving at a standard approach and building on top of that.

Posted by Admin on 13-Sep-2006 15:39

> I'm pretty sure a JOIN table is not under any current consideration so we're pretty much stuck with implicit relationships. Perhaps it might be possible to add more options or parameters to make the guesses. But yes, this is a real bother to work around.

Are you serious? This feature pops up year after year and it's still not considered as being valuable? Everything you do in a data-centric application has to do with relationships between entities. As long as there is no object-relational mapping framework for ABL, we have to deal with ProDataSets, temp-tables, buffers, queries, etc. Why duplicate join information in ProDataSets, queries and triggers, when you could refer to a declaration?

Posted by Phillip Magnay on 13-Sep-2006 15:57

> Are you serious? This feature pops up year after year and it's still not considered as being valuable?

Yes, I'm serious. And I certainly did not imply that such a feature was not considered useful or valuable - only that it is not going to be available in the short term.

> Everything you do in a data-centric application has to do with relationships between entities. As long as there is no object-relational mapping framework for ABL, we have to deal with ProDataSets, temp-tables, buffers, queries, etc. Why duplicate join information in ProDataSets, queries and triggers, when you could refer to a declaration?

I understand the issues. But it's not my place to make promises or commitments on this forum in an area with which I have very little involvement.

Posted by Thomas Mercer-Hursh on 13-Sep-2006 16:06

On the surface, it would seem like this was a simple request ... but it's not. Merely creating a couple of tables and providing dictionary code to maintain them would be a couple of days' work. But, then what? For the limited purpose of exporting to UML, yes, it would be keen ... assuming that it had actually been fully and accurately maintained. But, if one were putting something like this in the product, one would expect the product to do something with it. That's where it gets difficult and interesting. I can imagine some things that I might expect to happen, but they seem difficult and unlikely to assist with existing code, unless it would be in some kind of SQL optimization.

What might actually be more useful, thinking a bit, would be if the prospective XREF tool created an empirical table of joins and we could access that for loading the UML. This would probably also identify some suspect joins!
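(An empirical join table of this sort could be mined from COMPILE ... XREF output by pairing up the tables each procedure searches. The xref lines below are illustrative and simplified - real output has more fields and record types - but the pairing idea is the same, and yes, it would surface suspect joins too.)

```python
from collections import defaultdict
from itertools import combinations

# Simplified, invented xref lines: proc, source, line, ref-type, object.
SAMPLE_XREF = """\
ord.p ord.p 12 SEARCH sports.Customer CustNum
ord.p ord.p 14 SEARCH sports.Order OrderNum
rpt.p rpt.p 9 SEARCH sports.Customer CustNum
"""

def candidate_joins(xref_text):
    tables_by_proc = defaultdict(set)
    for line in xref_text.splitlines():
        parts = line.split()
        if len(parts) >= 5 and parts[3] == "SEARCH":
            tables_by_proc[parts[0]].add(parts[4])
    pairs = set()
    for tables in tables_by_proc.values():
        # Tables searched together in one procedure are join suspects.
        pairs.update(combinations(sorted(tables), 2))
    return sorted(pairs)

print(candidate_joins(SAMPLE_XREF))
```

This is deliberately coarse: co-occurrence is evidence of a join, not proof, which is exactly why a human review step would follow.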

Posted by Tim Kuehn on 13-Sep-2006 19:28

At Exchange 2006 one of the engine guys mentioned referential integrity as something that's on the roadmap...

Posted by Admin on 20-Sep-2006 16:36

Finally!!! Having that concept (dare I call it a foreign key) would make reverse eng. and other tools (e.g. database visualisation) and reporting (e.g. Crystal) sooooo much easier!!

Posted by Thomas Mercer-Hursh on 20-Sep-2006 17:12

Of course, "on the roadmap" and "getting there soon" are not necessarily the same thing!

Posted by Admin on 05-Oct-2006 16:44

True. Personally I wrote a 4GL program to transfer the Progress schema to MySQL. It's got a little temp-table of "1 to many" relationships and builds the foreign keys on the fly. Then I put the whole thing into the new Hibernate "reverse engineer" and had it build me a full SEAM application based on that. It also fully documents the schema as it goes (in HTML), which is cool.

On the UML front, I've been putting the Schema into XML and using that to generate XMI maps and putting them into AndroMDA (http://www.andomda.org) - that's OK but it's a LONG way from finished.

I then took the first concept a little further and turned my MySQL database into a code mapping tool so that each ".p" was a table and each call a "foreign key". Then I could use DB reverse-engineering tools to produce a code map. This is definitely NOT for the faint-hearted and you need lots of RAM.
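(The ".p as table, call as foreign key" trick boils down to extracting caller/callee edges from source. A simplistic sketch - the regex ignores RUN ... IN handle, RUN VALUE(), internal procedures and so on, and the file names are invented.)

```python
import re

# Match "RUN something.p" - a deliberately naive view of ABL call syntax.
RUN_STMT = re.compile(r"\bRUN\s+([\w./-]+\.p)\b", re.IGNORECASE)

def call_edges(sources):
    """sources: dict of filename -> source text; returns (caller, callee) edges."""
    edges = set()
    for caller, text in sources.items():
        for callee in RUN_STMT.findall(text):
            edges.add((caller, callee))
    return sorted(edges)

# Tiny invented code base.
demo = {"main.p": "RUN lib/order.p.\nRUN cust.p.\n", "cust.p": "RUN lib/order.p.\n"}
print(call_edges(demo))
```

Each edge becomes a row in a "foreign key" table, and a DB reverse-engineering tool then draws the call graph for free.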

Muz

PS - BTW your spell checker doesn't recognise 4GL as a word ....

Posted by Thomas Mercer-Hursh on 05-Oct-2006 16:56

> On the UML front, I've been putting the Schema into XML and using that to generate XMI maps and putting them into AndroMDA (http://www.andomda.org) - that's OK but it's a LONG way from finished.

You should check out Enterprise Architect. From what I have been able to tell so far, its MDA potential is equal to or greater than AndroMDA's, except for the pre-existing "cartridges" in AndroMDA, assuming that any of those are relevant.

> PS - BTW your spell checker doesn't recognise 4GL as a word ....

That's because you are supposed to use ABL instead!

Posted by Muz on 09-Oct-2006 18:01

Do you mean the borland one? If so I just got it installed today

Posted by Thomas Mercer-Hursh on 09-Oct-2006 18:14

No, see http://www.sparxsystems.com/

Posted by Muz on 11-Oct-2006 15:55

I can't say I've tried it. I've been looking a lot at EJB3 (mainly from JBoss) and it looks great. I reckon that I could save 33% on coding and eliminate most of that annoying XML config by going from Spring/Hibernate to EJB3/Hibernate. However, I suppose the proof is in the pudding. Why is that relevant? I'm wondering if we could reverse the 4GL into a form and then forward it into Java/EJB3. Progress could still be the DB back end (seems to work better in V10). There are definitely parts of my app where it makes sense to get them out of the ABL and into a multi-threaded environment; however, the cost of taking the 4GL and tables to EJB3/Hibernate is not cheap!! Let alone the bugs that would probably creep in.

Posted by Thomas Mercer-Hursh on 11-Oct-2006 16:08

Well, it certainly is no secret that I would love to have a multi-threaded environment ( http://www.oehive.org/MultiThread ) although I also recognize that threading per se is a pretty dangerous area, so I might settle for some kind of close cooperation between sessions.

If you are focused on Java, then there might be AndroMDA components which would speed up the curve, but my own focus for MDA is to generate ABL. No reason the same model couldn't generate another language with different transforms, but the idea of having to create Java code is certainly not high on my list of desirable options.

The part about creating the UML from the existing application is certainly an interesting area. There are a couple of obvious and useful initial steps, but it certainly gets more difficult after that ... especially considering most source code! A couple of us think we might know how to do it, but there is some development which needs to get funded before we can get there. Even if the UML has to be built by hand based on the existing app, though, I think there are gains to be made by using modeling and MDA.

Posted by Admin on 11-Oct-2006 16:16

As for generating ABL (not 4GL) from the UML models, I was thinking that the ProDataSet could offer the same sort of function as Hibernate does to Java. Especially if you could hide the logic for the DataServers behind it as well. The ProDataSet then becomes the logical representation of your physical database. Now that's the data layer down. You can then write your "business" or "services" layer as a set of persistent procedures that pass around Value Objects (i.e. ProDataSets) for operations. The persistent procs may also need, on occasion, in-depth knowledge of the schema for locking/transactions, OR you need to be able to more easily roll back a ProDataSet update.

If the data is stored in the ProDataSet and isn't committed until the end of the transaction, we also fix one of the ACID properties of the Progress DB where everyone can read uncommitted records from the buffer pool (one of my bugbears and the cause of frequent bugs).

The front-end layer can then use the persistent procs to publish WebServices and SOA endpoints for Sonic, or maybe move onto JBI (see http://www.servicemix.org as an example). There are also many other endpoints that could come out of the box, like WebSphereMQ, eMail, FTP or, if you are really keen, CICS/Cobol copybooks.

Once you get the layers right, the rest is just a matter of some hard work.

Posted by Thomas Mercer-Hursh on 11-Oct-2006 16:41

> Once you get the layers right, the rest is just a matter of some hard work.

A general truism!

In my looking at the idea of ABL from MDA, I think that there are several parts to the problem.

First, as I think you are saying, one needs to have the target architecture clearly in mind. I.e., one has to know what one would write if one was writing it by hand.

Second, one has to extract out from that target what belongs in an underlying framework. I certainly wouldn't mind having that code come from MDA too, and I hope it eventually will, but in the short run I see no reason this part can't be hand crafted.

Third, one then has to figure out the transforms that will take the UML models and convert them into the desired target. Here again I think one needs clarity of concept of layers, but in this case in terms of PIM and PSM, so that one breaks the transform into layers in order to provide architectural nimbleness. In particular, one of the layers should be for the UI so that one can select different UIs by applying different transforms.

Fourth, although in some ways a part of the third, one needs to come up with an action language for expressing the algorithmic parts that are not a part of the UML proper. My instinct at this point is that this action language can strongly resemble ABL, at least a subset thereof (with all that nasty UI stuff gone).

I think this whole process is going to be way more straightforward using the OO features in the ABL than it would have been depending on the "imitation" of those features in pre-10.1A code.

You are probably right that the ProDataSet will be a key item in the data access layer although I haven't quite finished working through my full model there.

If we can get PSC to cooperate in Sonic licensing, I would be more than happy to focus on Sonic ESB as the backbone because it is best of breed.

Posted by Admin on 11-Oct-2006 16:48

> If we can get PSC to cooperate in Sonic licensing

Good luck.

I had thought of the OO stuff, but that would be too much of a jump for many people in ABL land. But yes - it could definitely be done that way.

Now if you wanted to be really cool (and this is way out there) you could organise the DB buffer pool (or the private BP) to store the ProDataSets as well .....

Posted by Thomas Mercer-Hursh on 11-Oct-2006 17:05

> Good luck.

I've had good talks with the main players and there seems to be a consensus all around that it is a thing they need to do, but they just aren't sure how to do it. This piece (http://www.oehive.org/ESBAsAnApplicationArchitecture ) was the basis for our discussion and I got a clear message that they agreed with the concept, but again there is the problem of the details. The crux of the problem is that Sonic ESB is now primarily marketed as an EAI solution, and EAI projects are notoriously complex, difficult, and expensive, so a tool that helps manage and simplify that complexity is worth a lot of money; in that context, the pricing is, if anything, reasonable. But, if one wants to use the same core tools as the basis for the architecture of tightly knit (but loosely coupled, of course!) applications, then the price point needs to be much lower unless and until one starts needing the EAI aspects.

> I had thought of the OO stuff but that would be too much of a jump for many people in ABL land.

Well, there are three things that I want to say about that.

One, acquiring the skill to decompose an application into services and layers is something that I think is really quite similar to learning how to think in OO terms so the OO piece doesn't really add significant additional complexity.

Two, the UML modeling tools are essentially OO modeling tools so effective use of them requires one to learn to think in OO terms anyway.

Three, there is a natural relationship between OO and SOA so in some ways it is easier to get the job done with OO than to try and fake it.

> Now if you wanted to be really cool (and this is way out there) you could organise the DB buffer pool (or the private BP) to store the ProDataSets as well .....

I'm not sure that there is any need to go there. The data itself will be in the DB buffer pool, of course, and the instantiated PDS will be in the local cache. What would one gain by moving the PDS into the DB pool?

Posted by Tim Kuehn on 11-Oct-2006 17:13

> Now if you wanted to be really cool (and this is way out there) you could organise the DB buffer pool (or the private BP) to store the ProDataSets as well .....
>
> I'm not sure that there is any need to go there. The data itself will be in the DB buffer pool, of course, and the instantiated PDS will be in the local cache. What would one gain by moving the PDS into the DB pool?

I'm guessing to not have PDS data change, or become inconsistent with what other sessions show while a given session's doing something else.

Posted by Thomas Mercer-Hursh on 11-Oct-2006 17:26

Well, caching the PDS in the DB buffer first of all assumes that they are on the same machine, which isn't necessarily the case. Secondly, moving the cache from the session to the DB buffer wouldn't in and of itself do anything to ensure that the data in the PDS was kept current.

I think what one really wants is the ability to have closely interacting sessions such that a particular session can be the source for a particular type of data, so that a single point of focus knows whether or not data has changed. Combine this with cross-session publish/subscribe and the source object can inform a user of that data that they need to refresh because the data has changed.
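(A minimal single-process sketch of the "source session plus publish/subscribe" idea: one object owns a type of data and notifies subscribers when it changes, so they know to refresh. Real cross-session pub-sub in OpenEdge would of course need inter-session plumbing that this toy omits.)

```python
class DataSource:
    """Single point of focus for one type of data."""

    def __init__(self):
        self._data = {}
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def update(self, key, value):
        self._data[key] = value
        for notify in self._subscribers:
            notify(key)              # tell users their copy is stale

    def get(self, key):
        return self._data[key]

stale = []                           # a subscriber's list of stale keys
src = DataSource()
src.subscribe(stale.append)
src.update("Customer:1", {"Name": "ACME"})
print(stale)
```

Because only the source mutates the data, subscribers never have to poll; they refresh on notification, which is also the hook for managed optimistic locking.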

Posted by Admin on 11-Oct-2006 18:35

Yes, that's all true. BUT - my aim would be to make all my persistent procedures run stateless on an AppServer and hence "simulate" SSBs (stateless session beans) from Java. If they could share ProDataSets under the hood, that would help a lot. Things like "static data" could be cached inside a general AppServer memory pool (which could be inside the -B). Typically your AppServer "would" be on the same machine as the DB to take advantage of the Shared Memory connection (we all know sockets are way slow!!!). If I could then automatically expose my ProDataSet business object as a .NET 3 client or a WebService with HTTP authentication, then I'm well on the way to getting .NET (or Java Swing or anything) connected into my ABL application. More money for me, more for Progress, everyone wins.

Posted by Thomas Mercer-Hursh on 11-Oct-2006 18:47

This is one of the reasons I want technology more intimate than sockets or named pipes to use for cooperating sessions or threads. We are looking at what sounds like the same structural model, but for me I want objects which are running in a thread and providing a local service to hold the shared data. I think that is a more appropriate model for data services and provides generalization to a lot else.

If one were to put the PDS in something like the DB cache, one would have to take it out and put it in to change it. Much better to have it in a process where it can initiate and respond to events. For non-static data, this would also mean the possibility of managed optimistic locking.

Posted by Muz on 12-Oct-2006 16:39

Now what would be really useful here is for PSC to either get the ABL running inside a JVM (bytecode) or into the Microsoft common format (sorry, I've got a mental blank on the name - the common language runtime thing). Then I should be able to write Java or .NET code that can directly access the shared DB pool or call directly into / out of the ABL. Now I get multi-threadedness with Java and the fantastic locking structure of the ABL. I could then also deploy my whole PSC app into JBoss or directly onto a Win2003 server with .NET 3.0 installed.

Posted by Thomas Mercer-Hursh on 12-Oct-2006 16:45

I think we can safely say that you shouldn't hold your breath for this happening.

Posted by Muz on 12-Oct-2006 16:46

Possibly - but then I never thought PSC would get a usable JDBC driver either ...

BTW - Phil - are you still reading this?

Posted by Phillip Magnay on 12-Oct-2006 16:53

> Possibly - but then I never thought PSC would get a usable JDBC driver either ...
>
> BTW - Phil - are you still reading this?

Sure. You guys are having a good exchange. It's what these forums are for.

Phil

Posted by Thomas Mercer-Hursh on 12-Oct-2006 17:00

Well, and I was surprised by the OO additions, but I think there are degrees of surprise. Having DataDirect work on the drivers to get them up to snuff is expected ... the only real surprise is possibly that they took as long as they did, or perhaps that they finally got there, depending on your attitude. The OO stuff is a surprise in a way, but with the whole SOA and OERA push, perhaps it isn't such a big leap from coding to imitate objects to actually putting it in the language. Considering that PSC's UML use precedes the development of 10.1A, one might even be tempted to guess that the embracing of UML by the EMEA consulting folks was a strong driving force behind getting real OO into the language.

But, where is that driving force here? I suspect that you think of the idea because you see a deficiency on the ABL side, multi-threading, and you see this as a strength in these other languages, and so you think, "why not just jump into that pond?". Well, one of the reasons not to jump into that pond is that it would be a fabulously huge development effort. Given the whining and moaning I get whenever I ask for something that merely requires cleaning up the existing execution environment, I'm sure the idea of re-inventing it entirely is mind-blowing. Note particularly that the problem isn't just one of building a 4GL on that base, but of reinventing the existing execution environment exactly, so that it will continue to be upwardly compatible.

Now, compare that with the effort of adding some things to the current environment. For example, a mechanism for close, performant inter-session exchanges which allows a sort of manual multi-threading with cross-session pub-sub. Difficult, yes, but actually a fairly clean bolt-on. Or, true multi-threading ... probably a fairly messy cleanup of the engine, which would probably do it some good, but lots of it can remain the same ... i.e., it is refactoring, not recreating.

Posted by Admin on 12-Oct-2006 17:13

True, true, all true. But then I look at the PSC "Data Access Object" and "Data Source Object" ideas, and then I look at Hibernate, and I think: writing something new is going to be FAR easier in EJB3, backed by Hibernate, database agnostic, with thousands of tools out there to help me. As an example, I found something like this today: http://www.headwaysoftware.com/products/structure101/ Now regardless of how we proceed, with great tool support (over and above UML) out there, we of the ABL world are under threat. Why? Because other teams (like Jee/J2ee/.NET) have these tools, graphics and whiz-bangs to show management, and like it or not, they are now as fast as, or faster than, we of the ABL at developing new applications. Also, they can do it entirely in open source - no licenses. The other side is that their stuff is much harder to tune for performance and often has more bugs, but CFOs, CTOs and CEOs, I find (although not so much where I work now), see bugs as not part of the original build cost and hence move towards something cheaper.

My whole desire behind the UML is to get software built faster, with whiz-bang tools, and managed more easily.

Posted by Phillip Magnay on 12-Oct-2006 17:26

> Considering that PSC's UML use precedes the development of 10.1A, one might even be tempted to guess that the embracing of UML by the EMEA consulting folks was a strong driving force behind getting real OO into the language.

I would suggest against being too tempted by this logic. It is true that some of the more active and mature consulting teams within PSC had been using UML tools well before the OO additions were considered. Some of the EMEA groups had been using these sorts of tools for some years. And I personally remember being involved in large projects in Australia as far back as '98 (hardly ahead of the OO/UML curve) where Rational Rose was used extensively. But it was always a challenge going from model to code back then and typically the code was either produced by hand or through some other tool(s).

However, the OO additions were driven by many more significant factors apart from interoperability with UML tools.

Posted by Thomas Mercer-Hursh on 12-Oct-2006 17:54

> I would suggest against being too tempted by this logic.

Maybe I've just been talking to Tom Bascom too much lately ...

Seriously, though, there is a certain history of PSC making language innovations at times when that particular innovation was especially needed by some internal consumer. There is nothing really wrong or surprising about that, it just tends to make external lobbyists a bit skeptical about their ability to influence the choices.

The main point to me about the OO is not what triggered it, but that when you decided to do it, you did a pretty solid chunk on the first pass and you did it with a certain religious conviction. Whatever the trigger, you got the message and now are following up on it thoroughly (although I personally would up the pace a bit).

Posted by Thomas Mercer-Hursh on 12-Oct-2006 18:01

I understand the lure of the rich assortment of tools, but, frankly, they come with a cost too ... having to develop in Java. That is not something I would decide to do easily with all the tools in the world.

Also, I think that the contrast may be overstated. Sure, it is nice to be interested in MDA and discover that there are a ton of alternatives available, and some of them are even free ... but how many of them are you actually going to use ... one, max. I think we can get there in ABL, and having done so, it won't matter whether or not there are 100 alternatives because we will have what we need.

Posted by Thomas Mercer-Hursh on 12-Oct-2006 18:10

BTW, Murray have you looked at Joanju's CallGraph? http://www.joanju.com/callgraph/index.php

It isn't as developed as the product you pointed to, but addresses the core of some of the same structural issues. If people would just buy more of it, it might eventually do all those things too.

Posted by Admin on 12-Oct-2006 18:11

Yes - I've given it a go. It worked fine - although it took, I think, 12 hours to do the major part (i.e. not all) of our source code. I believe it usually takes about 2 hours ....

Posted by Thomas Mercer-Hursh on 12-Oct-2006 18:14

You might want to try a fresh copy. When I tried it first, it was very slow, but the slowness turned out to relate to certain program types. John eventually figured out what the problem was and put in a fix. I think it was something like 50 times faster overall after the fix.

Posted by Admin on 12-Oct-2006 18:15

I tried it about 4 weeks ago ...

Posted by Thomas Mercer-Hursh on 12-Oct-2006 18:21

I think it was mid to late August or so that John found this fix, but you ought to check with him. He has a couple very substantial code bases measured in MLOC that take something like half an hour each to process, so if there is some issue still there with your code, it is probably fixable.

Posted by Muz on 15-Oct-2006 15:55

I'm waiting for my new PC (dual core) before I do anything more serious. This P4 2.6GHz is starting to hurt - especially since it only has a 5400RPM drive.

Posted by Thomas Mercer-Hursh on 15-Oct-2006 16:00

Of course, if John makes a 20 to 1 or better improvement like he did with mine, that old "clunker" might be OK.

Posted by Muz on 15-Oct-2006 16:35

> Of course, if John makes a 20 to 1 or better improvement like he did with mine, that old "clunker" might be OK.

Yeah - as a paperweight

This thread is closed