Highly Parallel Dump and Load

Posted by kunal.watkar on 28-Mar-2014 16:27

Hi,

I have a task to dump and load very large database ( approx. 300GB) and I will be having not more that 32 hrs to complete this task. With conventional ASCII or Binary D/L this task is not achievable within given time limit.

So, I tried parallel Dump/Load option. I used a raw-transfer to do parallel D/L . I wrote a 4GL code to transfer data of 1 table (tr_hist , size 55GB, records more than 120 million).

After almost 6 hours of data transfer, only 30 million records was loaded into target database, which implies that for a complete table it will take around 24 hours. This is not suitable for my task's time limits.

Please let me know which parameters I should tune to improve the performance.

Please find below some details about my environment:

Source DB:

OS: AIX

OE Version: 10.1A

-L :500000

-bibufs :120

-s :81

-B :100000

-spin :40000

-aibufs :130

-directio

Target DB:

OS: Linux

OE Version: 11.2.1

-L 300000

-bibufs 120

-s 81

-B 500000

-spin 1000

-aibufs 130

For doing raw-transfer, I am connecting source DB using -S parameter and progress 11.2.1 from Linux server.

OpenEdge RDBMS - Forum

Posted by ChUIMonster on 28-Mar-2014 16:39

You should be able to dump that tr_hist table faster than that.

Shutdown the db server and try a binary dump like so:

proutil dbName -C dump tr_hist -RO

Also -- if you have (or can obtain) a recent dbanalys check to see which index is the smallest and try the binary dump using "-index" whatever that smallest index is.

Using buffer-copy is sometimes helpful but if you are going between servers with -S you will also need to carefully tune the networking. Jumbo frames is a good first step. Your buffer-copy code is also important. For instance, you will benefit greatly from proper transaction "chunking" vs one record at a time commits. Using the implicit record creation might also be helpful.

But the first thing I suggest is to return to the binary dump. Try it with a -RO connection. And then try -RO plus the smallest index on the table.

Posted by Libor Laubacher on 28-Mar-2014 18:01

No need for the undefined "intermezzo". Just convert straight to utf-8. There are some extra steps for word break table, collaction etc - see knowledgebase.progress.com/.../19912 and possibly Internationalization guide/doc for that.

All Replies

Posted by ChUIMonster on 28-Mar-2014 16:39

You should be able to dump that tr_hist table faster than that.

Shutdown the db server and try a binary dump like so:

proutil dbName -C dump tr_hist -RO

Also -- if you have (or can obtain) a recent dbanalys check to see which index is the smallest and try the binary dump using "-index" whatever that smallest index is.

But the first thing I suggest is to return to the binary dump. Try it with a -RO connection. And then try -RO plus the smallest index on the table.

Posted by kunal.watkar on 28-Mar-2014 16:43

Hi Tom,

My souce DB codepage is ISO8859-1 and target DB codepage is UTF-8 .

So, some way I am not able to load the binary data dumped from source database.

Any suggestion on how can I do this ?

Posted by Libor Laubacher on 28-Mar-2014 17:18

convert the source db to utf-8 prior dumping the data -or- have the target db in iso8859-1, load the data and then convert it to utf-8

Posted by kunal.watkar on 28-Mar-2014 17:51

I will create ISO8859-1 DB and will then convert to UTF-8 .

For converting ISO8859-1 DB to UTF-8, I will first convert the DB to undefined codepage and then to UTF-8 .

Is this process correct ?

Posted by Libor Laubacher on 28-Mar-2014 18:01

Posted by kunal.watkar on 31-Mar-2014 22:35

Hi Tom,

Binary Dump and Load worked for me. Thanks .

Can you let me know the cases where raw-transfer is more effective than Binary D/L ?

This thread is closed