Age old question! I need to rebuild indexes on a DB as fast as possible.
Got 10GB memory
500GB free on DB volume
250GB free on another volume
What's the best options to use? Need any more info?
Windows by the way.
4 cores on the processor by the way!
165GB DB.
proutil <DB> -C idxbuild ALL -TB 64 -TM 32 -TF 90 -SG 64 -T K:\Temp
More Memory - faster idxbuild
Truncate all index areas before running idxbuild.
Unfortunately we've got data and indexes mixed. Does that make a difference to this recommendation, [mention:ae2ea2f6412743fc8be36c522f414ef0:e9ed411860ed4f2ba0265705b8793d05]
Of course, truncating can't be used for the mixed areas. It's sad that we can't move the inactive indexes between the areas. Otherwise you could easy separate data from indexes before running idxbuild.
Assuming that you're on 10.2B06 or later, look at -mergethreads and -datascanthreads. You can also set -threadnum <x> but be careful as mergethreads are per threadnum thread. IIRC -threadnum defaults to the number of CPUs/cores but I always set it explicitly.
There is also a new TMB parameter which I typically set to 256. And a large -B doesn't help you: leave it a 256 or 512. Same for -i: no value per my testing.
TF was the biggest addition to idxbuild, as it can move all the I/O intensive sorting and merging to memory. What we used to fake with RAM drives is now built in.
There was some weird issue with TF on HPUX where it couldn't/wouldn't allocate more than one shared memory segment, so even if you had 128 GB RAM on the server, proutil only used 4 GB (or whatever shmmax was). I had reported it as a bug so it might be fixed in later versions.
There was also an issue when you ran multiple concurrent idxbuild, all with -TF 90: they would bump into each other, sucking up all the RAM and eventually one or more would crash. But again that was way back in 10.2B0x so it might be resolved in later 11.x versions.
Thanks Paul. Appreciated as always. This is 10.2B08. Currently running a test with the command posted above by Manfred. -threadnum has definitely defaulted to num cores.
It's currently about 10% of the way through phase 1 - 17.5m blocks. Could be a long old night ahead. :(
Kill it and add -datascanthreads 4 -mergethreads 2 -TMB 256 -B 512.
Also add "-z -rusage" for some extra geeky stats.
The -rusage param isn't just geeky, it's important for determining whether you're tuning correctly. It will cause (11480) and (11483) messages to be written to stdout and the db log file, for each area processed, telling you how much I/O was done to temp files (-T or -SS). Example:
DBUTIL : (11480) Temporary sort file at /u/db/11.5/pac used up 0K of disk space.
DBUTIL : (11483) A total of 0K of temporary sort disk space was used for area 6.
If these numbers are larger than 0 KB then you are doing extra disk I/O: write I/O during the data scan/key build phase, read/write during the sort/merge phase, and read I/O during the b-tree insertion phase.
That's a very small amount of RAM, especially for a DB this size. Is this a VM? Can you bump it up temporarily?
Sorry the 10GB memory was free memory. They might be able to beef it up for the live run. We're testing on the replica target to get an idea of the timings involved at the moment.
Sorry Paul and James, I was wrong about -rusage :(. You get the 11480 and 11483 messages by default, at least back to 10.2B08. The -rusage param adds lines like this:
[2017/07/18@14:10:14.546-0400] P-75487 T-139980717442816 I DBUTIL : (-----) Resource usage: CPU user 0.002000, system 0.001000
Resource usage: DISK reads: 80 KB at 39 KB/sec, writes: 0 KB at 0 KB/sec
[2017/07/18@14:10:14.546-0400] P-75487 T-139980717442816 I DBUTIL : (-----) Resource usage: CPU user 0.003000, system 0.002999
Resource usage: DISK reads: 1640 KB at 818 KB/sec, writes: 456 KB at 227 KB/sec
Not sure where i got the idea that it was tied to 11480/11483 but (not surprisingly) Paul is correct: -rusage provides info that is interesting but not critical.
That said, you should also pay attention to your 11480/11483 messages and ideally you want to see 0K of temp file I/O per area.
[2017/07/19@03:49:18.139+0100] P-4460 T-4928 I : (11480) Temporary sort file at f:\sort\ used up 259868928K of disk space.
[2017/07/19@03:49:18.141+0100] P-4460 T-4928 I : (11483) A total of 259868928K of temporary sort disk space was used for area 6.
That's a lot of disk io. So adding more RAM will help that? Not that we'll be able to add 260GB more! I presume putting the sort files on as fast a disk as possible will help?
Full idxbuild took 14.5 hours! :(
Bit of a side question, but we've run idxfix option 3 on some of the tables we've got index corruption on and it makes the problem go away, but after a while it comes back again. It's always the same tables, different RECIDs though. Is it possible that their ODBC connections are causing this? Is there anything else I can do to debug where this is coming from. We haven't changed our application code in a few weeks so it's not code. Is it possible that the idxfix isn't solving the problem where the idxbuild will?
Hi James. We had the same problem overhere. It was a code page issue. An UTF-8 session was adding Chinees characters in a database with Polish code page.
The customer is receiving files from a German customer, but as far as I can see all connections are Codepage 1252. And the records we get the messages for do not have special characters in them.
> we've got index corruption on and it makes the problem go away, but after a while it comes back again.
Does the customer use the latest hot fix for V10.2B08? If I recall correctly, it fixed some bugs related to the index corruptions.
> Is there anything else I can do to debug where this is coming from.
There is no an easy answer. More details are required. What are the errors reported by idxfix? Which indexes were corrupted? Unique or non-unique? Are the records with the corrupted indexes be created recently? What are the typical transactions that create/update these tables? One or many records per transaction?
No the customer is not running any hot fixes. I will investigate.
The indexes are a mix of unique and non-unique. Yes they are recent creations because we very rarely access older records. The transacitons will be multi record ones. Our scoping is really bad.
It smells like a known bug. ;-)
Crap code? ;)
Progress OpenEdge Release Notes
Release: 10.2B0836
Date: April 2015
2. List of issues fixed in previous 10.2B08x releases
-----------------------------------------------------
Issue ID: PSC00328928
---------------------
Under certain cases involving multiple database updates, you may end up with logical index corruption if you undo certain sub-transactions.
The problem may occur on undo of certain sub-transactions when indexed fields of tables from multiple databases are updated in different nested sub-transactions.
This issue can be avoided by running with the -nosavepoints option.
The issue was fixed in 10.2B0832
Thanks George - that sounds very likely to be a cause.
> [2017/07/19@03:49:18.139+0100] P-4460 T-4928 I : (11480) Temporary sort file at f:\sort\ used up 259868928K of disk space.
> [2017/07/19@03:49:18.141+0100] P-4460 T-4928 I : (11483) A total of 259868928K of temporary sort disk space was used for area 6.
> That's a lot of disk io. So adding more RAM will help that? Not that we'll be able to add 260GB more! I presume putting the sort files on as fast a disk as possible will help?
> Full idxbuild took 14.5 hours! :(
Having large tables (or very many smaller ones) together in a single area will increase the required RAM to prevent temp file disk I/O. Having a well-designed structure will reduce the required RAM.
Are all of the application tables in the schema area?
Yup although I've just been asked to do a full D&L to Type II as part of the solution.
Hi Paul,
[quote user="Paul Koufalis"]
Kill it and add -datascanthreads 4 -mergethreads 2
[/quote]
How to calculate the values of these two parameters? From which they depend?
Regards,
Valeriy
[mention:819ac0812c524a3b9718403d142f1ba2:e9ed411860ed4f2ba0265705b8793d05] trial and error. datascanthreads depends on how much IO you can push through your disks. Maybe that's a good topic for next year's PUG Challenge: Benchmarking Index Rebuild under Various Hardware Environments.
[mention:3cb137fff0074190a2d3f32e55cf403a:e9ed411860ed4f2ba0265705b8793d05] I'd certainly be in support of such a venture!
Thank you, Paul!
I don’t remember what presentation I’ve watched, but it seemed to indicate that it was calculated as -datascanthreads = 1.5 X # CPU and -mergethreads = 1.5 X # CPU.
It turns out it was the wrong calculation, right?
For data scan threads it is really trial and error as Paul already mentioned. Balancing CPU usage and I/O throughput is the key to getting this optimal. The idea of 1.5 threads * CPUs is to eliminate any wasted CPU during this part of the index rebuild. With today's machines having so many CPUs, this number can become ridiculous since the file system will become the bottleneck. Having too many threads that are not improving I/O rates will introduce contention/concurrency issues and may actually decrease performance.
So you need enough datascan threads to use all the available CPU resources up to the point where I/O rates no longer improve.
I know everyone wants a simple formula to just apply but there really isn't one that fits every deployment. The experience of others with success running this in similar deployments to yours is the best resource for specific tuning suggestions.
Thank you Richard!
I smell a PUG Challenge submission brewing...
Fancy some sort of collaboration? I could do before and you could do after the Linux patch... ;)
[mention:6e3b17cb0ff148039a2aabb4c7834ce4:e9ed411860ed4f2ba0265705b8793d05] you are correct, it is not a patch. It's a cure.
:-)