CI/CD for OpenEdge Databases: State or Migration?

Posted by hutcj on 24-Jan-2020 21:33

There is a strong push to accelerate the testing and deployment of databases to keep up with the agility of source code. However, it is not simple, because of the nature of databases. For example, perhaps bugs from previous versions allowed bad data into the db that must be corrected; or data must be migrated to fit new models; or care must be taken to ensure data is not lost when promoting new schema (particularly renames) - not to mention updating all code that accesses the db, including triggers.

It goes without saying that the database should be under the same rigorous version control as the source code. Since the incremental df files needed to migrate the production database to the new schema are more a deployment detail than anything, I initially assumed the state approach, i.e. dumping the full schema and version controlling that, would make the most sense. However, this runs into problems at deploy time because the context for the change has been lost.

On the other hand, version controlling an initial schema and an ever-increasing number of migration scripts (incremental df's) would (I think) cause performance issues with CI as test db's must be regularly built and torn down. Also, we would still need the full schema file in order to have the pipeline analyze it for problems.

How do you handle this problem? Keep both the full schema and migration scripts (both for schema & data) in source control? Some kind of hybrid approach with full schema and rename files?

All Replies

Posted by Peter Judge on 24-Jan-2020 22:00

How do you handle this problem? Keep both the full schema and migration scripts (both for schema & data) in source control? Some kind of hybrid approach with full schema and rename files?

 
I think you manage the schema versions as a unit - so there's at least one delta.df and possibly an update_after_delta_data.p. If you need to remove fields, you'll need a before_delta.df, an update_after_before.p, an after_delta.df and an update_after_after.p.
 
All of which should be explicitly versioned.

We followed this kind of approach in Dynamics back in the day. If you care to look at the approach we followed then, there's a set of files that define the updates, what should happen and when. For example, github.com/.../icfdb102001patch.xml defines stuff after an "ADOLoad", but you can see there are PreDelta, Delta and PostDelta stages, and they carry some metadata too. There may be some doc on the DCU (Dynamics Configuration Utility) in the doc archives somewhere.
 
So if you need to drop a field whose data must be kept, you'd add an update that adds a new field ("Delta" stage) and moves the data from the old field to the new one ("PostDelta" stage). Then you'd create a Delta stage in a later update to drop the old field.
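For illustration, the PostDelta data-move program for that kind of change can be very small; the table and field names here (Customer, OldPhone, NewPhone) are made up for the example:

/* PostDelta step: copy data from the old field to its replacement.
   Table and field names are hypothetical. */
FOR EACH Customer EXCLUSIVE-LOCK:
    ASSIGN Customer.NewPhone = Customer.OldPhone.
END.

A later update's Delta stage can then drop OldPhone once the data has been moved.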
 
There was code to deal with things like is this update re-runnable, does it only apply to new databases, etc.
 
Take a look in github.com/.../dfd for the whole (or most of the) ball of wax.
 
 
 

Posted by Riverside Software on 25-Jan-2020 08:57

As Peter said, incremental DFs are more than just an implementation detail. They have to be applied in a specific order, and additional steps may have to be executed in between. All those steps have to be stored in the code repository, so that they can be applied again and again in any environment.

A few years ago, I started working on a DB update mechanism with Groovy. I gave a presentation on Groovy at the EU PUG Challenge ( www.pugchallenge.eu/.../groove-is-in-the-ant---querret.pdf ), making the case that this language is really good for this kind of task. The next step is to do a presentation on the full DB update mechanism!

Posted by hutcj on 28-Jan-2020 16:46

Would you version control the migration files (incremental df's, data migration procedures) to the exclusion of full df's, or would it be a good idea to include everything in the repository (even though that may be redundant)?

Posted by Peter Judge on 28-Jan-2020 18:10

If delta 001 has 2x DF, 5x D and 2x P, that's a single update version - or "bundle" is maybe a better way to look at it.
 
Each of the files within that bundle goes into source control as usual. I think the .D files you'd check in would be mainly master or control data, but it would depend on your application.
 
A new DB (with a full .df) is another kind of bundle.
 
In the Dynamics world, that was managed using a sequence, where the max-val was the current version. It was updated when an update bundle was applied.
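As a rough sketch of that idea (using the sequence's current value, and assuming a sequence named db-version, which is an example name rather than anything from Dynamics):

/* read the schema version tracked by the sequence */
DEFINE VARIABLE iSchemaVersion AS INTEGER NO-UNDO.
iSchemaVersion = CURRENT-VALUE(db-version).

/* after an update bundle has been applied successfully, bump it */
CURRENT-VALUE(db-version) = iSchemaVersion + 1.

If you don't want a compile-time reference to the sequence, DYNAMIC-CURRENT-VALUE can be used instead.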
 
 
-- peter
 

Posted by hutcj on 28-Jan-2020 18:43

I see - so even though you could produce the current full df by applying all the migration bundles in order from start to finish, it is still reasonable to actually store the full df in parallel in that same repo for its own purposes such as spinning up new test environments and running analytics?

Repository hierarchy (could also be further organized by database name at the root level?):

/schema - contains full df's
/migrations - contains a folder for each delta, which in turn holds all the df & p files needed for that delta
/data - contains d files with sample data for the current iteration

For example:

/db1/schema/db1.df
/db1/migrations/delta001/<delta001 artifacts>
/db1/migrations/delta002/<delta002 artifacts>
/db1/data/<d file for each table in db1>
/db2/schema/db2.df
/db2/migrations/delta001/<delta001 artifacts>
/db2/data/<d file for each table in db2>

etc.

Or should all the d files be inside their corresponding deltas, so that all versions of the data are available concurrently without having to check out a previous version of the repository?

Posted by Riverside Software on 28-Jan-2020 20:28

If you want to set up an old version of your database, then you'd just check out the right version in the history of the repository. So to my mind, you just store the data for the latest version of your database in the repository, and keep history for that.

However, some cases will have different requirements!

Posted by Peter Judge on 29-Jan-2020 02:20

I think that /delta001 should contain the schema, data and programs for that update version.
 
So you could have something like

/db1
    /full
        full.df
        table1…N.d
        sequence data
    /001
        delta.df
        table3.d
        update-table3.p
etc.
 
One reason to have the code in the /001 folder is that the r-code produced for those update programs may change based on the df that's applied. So you will also need to make sure your build scripts can deal with that.
 
You can see the scripts we used for Dynamics in github.com/.../adebuild, particularly pbuild and pbuildfxprg.
 
You'll also need a "driver" program (sketched below); one that
- queries the current db's existence and (application) version
- determines the target version
- applies the update versions (/001 to /005 or whatever) and updates the db-version
- I think you'll need to think about what the level of atomicity is. I'd say not less than a single update, but it'll depend.
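A minimal sketch of such a driver, assuming a sequence named db-version, numbered bundle folders, and fixed file names delta.df / update.p inside each bundle (all of those names are assumptions to adapt):

/* apply_updates.p - walk the update bundles from the db's current
   version up to the target version. Assumes the session is already
   connected to the target db (so DICTDB points at it) and that a
   sequence named db-version tracks the applied bundle number. */
DEFINE VARIABLE iCurrent AS INTEGER   NO-UNDO.
DEFINE VARIABLE iTarget  AS INTEGER   NO-UNDO INITIAL 5.
DEFINE VARIABLE iDelta   AS INTEGER   NO-UNDO.
DEFINE VARIABLE cFolder  AS CHARACTER NO-UNDO.

iCurrent = DYNAMIC-CURRENT-VALUE("db-version", LDBNAME(1)).

DO iDelta = iCurrent + 1 TO iTarget:
    /* adjust to your repo layout, e.g. db1/migrations/delta001 */
    cFolder = STRING(iDelta, "999").

    /* load the incremental schema for this bundle */
    RUN prodict/load_df.p (INPUT cFolder + "/delta.df").

    /* run the bundle's data-migration program, if there is one */
    IF SEARCH(cFolder + "/update.p") <> ? THEN
        RUN VALUE(cFolder + "/update.p").

    /* record that this bundle has been applied */
    DYNAMIC-CURRENT-VALUE("db-version", LDBNAME(1)) = iDelta.
END.

Each loop iteration is one bundle, which lines up with treating a single update as the minimum unit of atomicity; you'd add error handling so the loop stops (and the version is not bumped) if a schema load or data-migration step fails.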
 
 
 

Posted by Stefan Drissen on 29-Jan-2020 06:51

Dynamic buffers and queries can be used to simplify the build process, removing the need for r-code specific to a database version.
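For example, the field move discussed earlier can be written with a dynamic buffer and query so that the compiled r-code has no schema dependency at all; the table and field names are again made up:

/* move data from an old field to its replacement without any
   compile-time reference to the table or its fields */
DEFINE VARIABLE hBuffer AS HANDLE NO-UNDO.
DEFINE VARIABLE hQuery  AS HANDLE NO-UNDO.

CREATE BUFFER hBuffer FOR TABLE "Customer".
CREATE QUERY hQuery.
hQuery:SET-BUFFERS(hBuffer).
hQuery:QUERY-PREPARE("FOR EACH Customer EXCLUSIVE-LOCK").
hQuery:QUERY-OPEN().

REPEAT:
    hQuery:GET-NEXT().
    IF hQuery:QUERY-OFF-END THEN LEAVE.
    hBuffer:BUFFER-FIELD("NewPhone"):BUFFER-VALUE =
        hBuffer:BUFFER-FIELD("OldPhone"):BUFFER-VALUE.
END.

hQuery:QUERY-CLOSE().
DELETE OBJECT hQuery.
DELETE OBJECT hBuffer.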

Posted by Peter Judge on 29-Jan-2020 14:32

True. You will have to balance the risks of runtime errors against the build complexity.
 

Posted by gus bjorklund on 29-Jan-2020 18:56

two separate problems:

0) update an existing database with the new schema changes

for this you need one or more delta.df's.

1) create the current version of the database from scratch

for this, you need complete .df's, .st's, and a text change history that explains what schema changes were made and why.

you should be able to create a complete database with a small set of artifacts. doing 50 delta .df's is not fun. (a rough from-scratch sketch follows at the end of this post)

2) depending on applications

you may also need to version some data files - for example if you have (mostly) static lookup tables or application configuration tables - and/or the .p's needed to populate the data files.

don't forget the documentation needed to create or update the database(s)
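As a hedged sketch of point 1, building db1 from scratch out of the versioned artifacts might look like this (assuming DLC is set, single-user access, and the repo layout discussed earlier in the thread; every path and name here is an assumption):

/* create the physical structure from the versioned .st, then seed it
   from the empty database shipped with OpenEdge */
OS-COMMAND VALUE("prostrct create db1 db1/schema/db1.st").
OS-COMMAND VALUE("procopy " + OS-GETENV("DLC") + "/empty db1").

/* connect single-user and point the dictionary alias at the new db */
CONNECT VALUE("-db db1 -1").
CREATE ALIAS DICTDB FOR DATABASE VALUE("db1").

/* load the full schema; the .d data and sequence current values from
   /db1/data would then be loaded on top of this */
RUN prodict/load_df.p (INPUT "db1/schema/db1.df").

The change history and documentation gus mentions are what make it possible to keep this small set of artifacts trustworthy.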

Posted by hutcj on 30-Jan-2020 18:00

Thanks for all the input - I think this gives us plenty of ideas to work with.

This thread is closed