Adding an index from the command line

Posted by Admin on 21-Dec-2006 08:03

Guys,

Is there a way to add a new index to a Progress 9 database from the command line?

I know that if the index is already there, I can activate it using proutil.

However, in my case I get a backup of the production DB and need to add indexes to speed up the dump process. I don't want to, and can't, add indexes to the "live" DB (even if they are inactive).

Any ideas?

Dima.

All Replies

Posted by Alon Blich on 21-Dec-2006 09:49

> Is there a way to add a new index from the command line

You can write a procedure that creates the .df file and loads it using prodict/load_df.p.

Put all the other administration tasks for creating the index in the procedure as well: shutting down the database, running proutil idxfix or idxbuild (via OS-COMMAND), error handling, etc.

Then write a batch or shell script that runs the procedure and passes any parameters using -param / SESSION:PARAMETER.
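
A rough sketch of the procedure side, for example (the database, table, index, and field names below are invented, and the easiest way to get the exact .df syntax right is to create the index once in the Data Dictionary and dump it):

    /* add-index.p -- sketch: generate a .df on the fly and load it.
       Run in batch mode from a shell script, e.g.:
       $DLC/bin/_progres /db/reportdb -1 -b -p add-index.p -param dump-index.df */
    DEFINE VARIABLE cDfFile AS CHARACTER NO-UNDO.

    cDfFile = SESSION:PARAMETER.   /* .df file name passed with -param */

    /* 1. Write the index definition to the .df file. */
    OUTPUT TO VALUE(cDfFile).
    PUT UNFORMATTED
        'ADD INDEX "ix-dump" ON "customer"'      SKIP
        '  AREA "Schema Area"'                   SKIP
        '  INDEX-FIELD "sales-region" ASCENDING' SKIP.
    OUTPUT CLOSE.

    /* 2. Load it through the Data Dictionary loader; the DICTDB alias
       must point at the target database. */
    CREATE ALIAS DICTDB FOR DATABASE reportdb.
    RUN prodict/load_df.p (INPUT cDfFile).

    /* 3. If the .df marks the index INACTIVE, shut the database down and
       build it afterwards, e.g.:
       OS-COMMAND VALUE("proutil /db/reportdb -C idxbuild table customer"). */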

> ... I get a backup of the production DB and need to add indexes to speed up the dump process

Why are you not using probkup? For one, you wouldn't have these scattering issues, among many other reasons.

Posted by Admin on 21-Dec-2006 09:59

> You can write a procedure that creates the .df file and loads it using prodict/load_df.p.

Thanks! I'll try this method.

> ... I get a backup of the production DB and need to add indexes to speed up the dump process

Here is how it works: every night a process uses probkup to take a production backup and restores it to a "reporting" instance.

I have scripts that dump ASCII text files and load them into a data warehouse (Oracle).

But when I dump, I need to apply some WHERE conditions, and the original system does not have enough indexes for this...
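
In outline, the nightly job looks something like this (a sketch; every path and name here is illustrative):

    # nightly refresh + dump
    probkup online /db/prod /backups/prod.bck         # back up production
    prorest /db/report /backups/prod.bck              # refresh the reporting copy
    proserve /db/report                               # bring the reporting DB up
    $DLC/bin/_progres /db/report -b -p dump-tables.p  # ASCII dump (4GL procedure)
    sqlldr dwh/dwh control=load_tables.ctl            # load into Oracle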

Does it make sense?

Posted by Admin on 21-Dec-2006 10:16

I can't find the load_df.p program. I do have makedf.p, though.

Can you e-mail me load_df.p, or maybe just give me an idea of how it works? Does it use the system tables?

Posted by Alon Blich on 21-Dec-2006 13:19

Hi Dima,

> Can you e-mail me load_df.p, or maybe just give me an idea of how it works? Does it use the system tables?

load_df.p is the procedure used to load .df files and is part of the Data Dictionary code. The sources are in the prodict.pl procedure library in the %DLC%\src directory, and the compiled r-code is in the gui and tty directories.

Have a look at the extractpl script in the %DLC%\src directory for extracting files, although extracting them isn't necessary; the procedure can be called directly from the procedure library.

I would be happy to email you the sources, but the code could be version dependent. Only the installed load_df.p procedure should be used to load definitions, and no customization should be done on the code.

Are you planning on adding and removing indexes on a regular basis? Just the overhead of building an index (besides other complications, like shutting down the database) could be greater than a full table scan.

If you have large tables and the available indexes are not effective enough for your query, you might want to consider adding a permanent index.

When traversing a large volume of data, using an index that's more contiguous or less scattered, like a non-unique index or the primary index, can give better results than using an index that matches the query filter. I would also recommend using read-only buffers so the dump doesn't monopolize the buffer pool.

You could also look at using a log-based replication scheme, which will only copy the changes and will allow replication at short intervals instead of nightly runs.

Posted by Admin on 21-Dec-2006 14:58

First of all, thank you for all the details.

>load_df.p is the procedure used to load .df files and is part the Data Dictionary code.

found it!

> Are you planning on adding and removing indexes on a regular basis?

Yes, every time we refresh a DB, I'd add a "dump" index. I tested it and it gives me a big advantage over a full scan.

Some stats:

record count: 8 mil (very large record size)

full scan dump: 6-7 hours

time to add index: 1h

time to dump with index: 1h

> ... you might want to consider adding a permanent index.

I'd love to, but the system is locked for any changes except bug fixes. The risk of making ANY changes to it is too big. So all indexes have to be added after the database is copied.

> I would also recommend using read-only buffers

How do I do that? Right now I just use FOR EACH with NO-LOCK; I found it the fastest way to do the job. Note that the instance I'm using is fully dedicated to the dump process. No other users are connected to it at the time I'm dumping data.

> You could also look at using a log-based replication scheme

Is this trigger-based? We have considered these solutions. Trigger-based replication would require changes to the source system, which again I have no ability to make. We also considered solutions from Bravepoint, but they are trigger-based as well.

Posted by Tim Kuehn on 21-Dec-2006 16:49

> You could also look at using a log-based replication scheme
>
> Is this trigger-based? We have considered these solutions. Trigger-based replication would require changes to the source system, which again I have no ability to make.

Not entirely - I've written trigger-based replication systems where the DB logging triggers are stored in persistent procedures that are started at the beginning of the session, before the main application is started. The triggers just log what happened, and a backend process takes care of massaging the logs and applying them to the target DBs as required. Properly done, there are no changes to the existing system's behavior or code base, although it might run a little slower in a high-transaction update environment.
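
A minimal sketch of that shape, using session triggers (the table and field names are hypothetical, and the real thing also needs transaction and rollback handling):

    /* repl-log.p -- started persistently before the application:
       RUN repl-log.p PERSISTENT.
       RUN main-app.p.
       The session triggers below stay active for the life of the session. */

    /* Log every create/update of Customer. */
    ON WRITE OF Customer DO:
        CREATE ReplLog.
        ASSIGN ReplLog.TableName  = "Customer"
               ReplLog.EventType  = "WRITE"
               ReplLog.KeyValue   = STRING(Customer.CustNum)
               ReplLog.LoggedDate = TODAY
               ReplLog.LoggedTime = TIME.
    END.

    /* Log deletes, capturing the key before the record goes away. */
    ON DELETE OF Customer DO:
        CREATE ReplLog.
        ASSIGN ReplLog.TableName = "Customer"
               ReplLog.EventType = "DELETE"
               ReplLog.KeyValue  = STRING(Customer.CustNum).
    END.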

Posted by Alon Blich on 21-Dec-2006 17:42

> Yes, every time we refresh a DB, I'd add a "dump" index. I tested it and it gives me a big advantage over a full scan.
>
> Some stats:
>
> record count: 8 mil (very large record size)
> full scan dump: 6-7 hours
> time to add index: 1h
> time to dump with index: 1h

Wow, that's really a long time! I don't think it's unreasonable to expect to dump 8 million records in less than ten minutes or a quarter of an hour.

There can be significant improvements, as much as 20x (YMMV), between reading records in contiguous order, using an index sorted close to how the records are physically stored, and reading them in scattered order.

Non-unique indexes have the table's recid added as a phantom field and are therefore closer to how the records are physically stored.

The primary index usually means the index that's used the most (although opinions vary), and it is usually the order in which the table is dumped and loaded.
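
For example, a dump loop that reads in primary-index order and applies the WHERE condition per record (a sketch; the table, index, field, and path names are invented):

    /* dump-customer.p -- dump in primary-index, i.e. contiguous, order. */
    OUTPUT TO VALUE("/dumps/customer.d").
    FOR EACH customer NO-LOCK USE-INDEX cust-num:
        IF customer.sales-region = "EMEA" THEN   /* filter applied in the loop */
            EXPORT customer.
    END.
    OUTPUT CLOSE.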

If a dump and load hasn't been done in a while, I would recommend doing one now. That will solve scattering and record fragmentation issues and could improve performance substantially.

I would also recommend running heavy queries over a self-service connection: a client running on the same machine as the database, without a network connection (no -H and -S parameters).

You can use the -Bp startup parameter to give a client private read-only buffers (they can also be set dynamically at runtime using the VSTs) to prevent the dump from monopolizing the buffer pool and reduce its impact.
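
For instance (a sketch; double-check the parameter and the VST field name against your version's documentation):

    /* Self-service batch client (no -H/-S) with private read-only buffers:
       $DLC/bin/_progres /db/report -b -Bp 64 -p dump-customer.p
       Or allocate them at runtime through the VSTs: */
    FIND _MyConnection.
    ASSIGN _MyConnection._MyConn-NumSeqBuffers = 64.   /* private read-only buffers */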

And there are other tweaks, like dumping to a separate disk and running parallel dump processes. Tom Bascom wrote an excellent article about dumps and loads that you can find at greenfieldtech.com. You can also look at the documentation for references on startup parameters, etc.

> Is this trigger-based?

Yes, but... there are dedicated replication triggers, separate from the basic triggers, reserved especially for this purpose.

Overall there are significant advantages to using log-based replication, although it's not trivial to build. Changes should be loaded by transaction, all tables would need a primary key defined, etc.

In the future, when you upgrade to OpenEdge 10, you might want to look at Progress DataXtend (http://www.progress.com/dataxtend/index.ssp) for replication directly from OpenEdge to Oracle and other databases (without using database triggers).

HTH
