Dealing with historical (rarely accessed) data

Posted by davidkerkhofs3 on 16-Oct-2007 01:56

Hello,

First a brief situation overview: we're currently planning a migration to OE10.1B. Our production environment currently holds 5 databases (V9 on HP-UX), but one of them is growing exponentially. Three tables in that database are responsible for 60% of the volume. These tables contain historical data, but at the same time a portion of the data is part of the daily operation. Thank goodness for indexes!

A few years ago we toyed with the thought of porting the historical data into a separate schema, not part of the connected databases. But this would mean a modification to our application.

To avoid this hand-coding, another possibility could be to simply move the three tables to a separate database. The downside would be that an additional connection needs to be established for every application user.
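For readers unfamiliar with how such an extra connection looks in practice: a minimal ABL sketch, in which the archive database name `histarch`, the host `dbhost`, and the service name `histsvc` are all placeholder assumptions, not anything from this thread.

```abl
/* Hypothetical startup code: connect the archive database at runtime
   only if it is not already connected.  All names are illustrative. */
IF NOT CONNECTED("histarch") THEN
    CONNECT -db histarch -H dbhost -S histsvc NO-ERROR.

IF ERROR-STATUS:ERROR THEN
    MESSAGE "Could not connect the archive database."
        VIEW-AS ALERT-BOX ERROR.
```

Note that any compiled r-code referring to the moved tables would then need them qualified with (or aliased to) the new logical database name, which is exactly the application modification David is trying to avoid.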

My questions are:

a) Who has experienced the same problem of having a database where tables take up a considerable amount of disk space but aren't touched for ages?

b) Is the performance penalty of having an additional database connection (plus all the server resources for this database) too high, no matter what the purpose?

c) I personally am in favor of setting up an archive mechanism for this data. But to avoid waiting for that development, I plan on quickly shifting from V9 to OE10 using -conv910.
Does anyone have experiences to share why I should first get my archiving scenario in place?

Grtz,

David
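[Editor's note] The conv910 step mentioned above is an in-place conversion done with the OE10 executables. A hedged command sketch follows; the database name `salesdb` is a placeholder, and as always the conversion should only be run against a database with a verified backup.

```shell
# Sketch of the V9 -> OE10 in-place conversion ("salesdb" is illustrative).
probkup salesdb salesdb.bck        # offline backup of the V9 database first
proutil salesdb -C truncate bi     # the before-image file must be truncated
proutil salesdb -C conv910         # convert the database to the 10.x format
```

Note that conv910 converts the database structure but does not change storage areas; a dump-and-load into a new Type II area layout is a separate (and often worthwhile) exercise.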

All Replies

Posted by Thomas Mercer-Hursh on 16-Oct-2007 11:27

Are you experiencing any actual problems with the data where it is? Disk is cheap, after all, and if you need to have the data on-line anyway, then it has to be somewhere, and moving it won't cause you to use any less disk. There can be reasons to separate historical data from current data, but a lot of the time I think it is just fine to leave it combined: indexes provide direct access to what you need, and things like trend analysis are much easier since the data is all in one place under one structure.

Posted by avkaam on 19-Oct-2007 01:05

We have about 4 tables which take up 2/3 of the database. Moving them to separate storage areas (one for the data and one for the indexes) with a cluster size of 64 resolved a lot of our performance problems. The only changes I would make if I had to re-do it all again are to go for a cluster size of 512 and maybe make an area for each of the 4 tables.
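A layout like the one avkaam describes can be expressed in an OpenEdge structure (.st) file. A sketch, in which the area names, area numbers, records-per-block values, extent paths, and sizes are all illustrative choices, not values from this thread:

```
d "HistData":20,64;512 /db/hist/salesdb_20.d1 f 1024000
d "HistData":20,64;512 /db/hist/salesdb_20.d2
d "HistIdx":21,32;8 /db/hist/salesdb_21.d1
```

Here `:20` is the area number, `,64` the records per block, and `;512` the blocks per cluster. The new areas would be added with `prostrct add salesdb add_hist.st`, after which a table (here a hypothetical `order-history`) can be relocated with `proutil salesdb -C tablemove order-history HistData HistIdx`.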

Posted by davidkerkhofs3 on 23-Oct-2007 15:13

Posted by Thomas Mercer-Hursh on 23-Oct-2007 16:10

With all that restoration going on, it sounds like you would be a good candidate for some form of replication. Why move around all the same records all the time?

And, what BI tool are you using that can't simply get its data directly from the Progress database? After all, we don't want the Oracle database to suffer, do we?

If you have replication of any new data, then you can elect to purge old data from any given instance. At worst, at some point you might have to do a full copy again and re-delete. And you really should never have to do that, since you could clean out the tables in question and re-initialize them for a selected range using Tom Bascom's replication technique.

Come to that, if you used his technique you could empty out the tables in any clone as often as you wanted and do a selective copy for only the most recent data.

It seems to me that you have a number of options here that require no architectural change and which can be implemented immediately. This seems preferable to an architectural change, especially since the reason for the architectural change isn't that it is better architecture, but just because of the number and types of copies you are making.

Posted by davidkerkhofs3 on 24-Oct-2007 04:33

Posted by Thomas Mercer-Hursh on 24-Oct-2007 10:43

My first stop would be greenfieldtech.com, for sure. Not only is his technique fast, but it is ideally suited to targeted refreshes like you want to do.

Is the Oracle DB used for other data, or is all the data in it coming from the Progress DB? If the latter, it is really dumb.

Posted by davidkerkhofs3 on 09-Nov-2007 16:02

Hi Thomas,

sorry for my late response.

I checked out Greenfieldtech.com and found some useful information and code on dealing with large volumes. But I didn't take a closer look at the real replication solutions that they provide.

With regards to your Oracle question: several systems connect to Oracle databases, and Cognos was found to perform best with Oracle (hearsay; I don't have any figures showing Progress being slower).

David

Posted by Thomas Mercer-Hursh on 09-Nov-2007 16:15

Tom's orientation in his work was replication, but what you want to do is essentially the same thing, just targeted. The key is the technique of direct copy and multiple sessions.
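[Editor's note] The targeted copy being discussed can be illustrated with a minimal ABL sketch. This is not Tom Bascom's actual code: the logical database names `prod` and `rpt`, the `order-hist` table, and its `trx-date` field are all placeholder assumptions, and his real technique adds raw transfer and several parallel sessions, each working a separate key range.

```abl
/* Hedged sketch: copy only recent rows from a production database
   (logical name "prod") into a reporting clone ("rpt").  Both
   databases are assumed connected and to share the table schema. */
DEFINE VARIABLE dtCutoff AS DATE NO-UNDO.
dtCutoff = TODAY - 90.   /* refresh only the last 90 days */

FOR EACH prod.order-hist NO-LOCK
    WHERE prod.order-hist.trx-date >= dtCutoff:

    CREATE rpt.order-hist.
    BUFFER-COPY prod.order-hist TO rpt.order-hist.
END.
```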

This thread is closed