Is my vague recollection true?
Every time there is a lock table overflow the RDBMS increases -L by 1.
Or something like that...
Paul
Wow... when did that start?
Tested with a small -L setting on 10.2B on AIX and it does indeed increase -L by one each time.
Yes.
It has been going on for a long time. At least since v8, probably
longer. Eventually it eats up all of -Mxs and the db goes BOOM!
Some more detail on the mechanism:
http://knowledgebase.progress.com/articles/Article/000039445/p
I guess I never noticed because I always either get the code fixed and/or adjust the settings (even temporarily).
I was going to ask what happened when it ran out of segments/memory, but I see Tom has already covered that :)
Yep...exactly: bye-bye -Mxs and bye-bye DB.
Now the $1M CAD question: what can I extract from the db.lg and a verbose AI scan to maybe help me figure out who the guilty party was?
AISCAN verbose...not for the faint of heart. Even with George Potemkin's AI Scan programs.
Does anyone have any words of wisdom to help alleviate my burden? You know, a quick shortcut magic method that says HERE!! LOOK HERE!!!
Paul
Guilt is in the eye of the beholder.
It might be that some particular user grabs a whole mess of locks *and*
blows up -L.
Or some user grabs a lot, but not all, of the locks, and under the normal
workload some poor unlucky user just happens to ask for a modest helping
at just the wrong moment.
Or maybe -L is simply too small and a bunch of users with perfectly
reasonable requirements all make requests at about the same time...
Having said that... I've been known to go tripping through _userlock
like so:
for each _userLock no-lock
    where _userLock-usr <> ?
      and _userLock-recid[512] <> ?:
  display _userLock-usr.
end.
to find people that seem to have more locks than might be generally
recognized as "reasonable"... I find the approach above to be *much*
safer than fiddling with _lock.
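A slightly fuller sketch of the same idea, for anyone who wants a per-user count rather than just the ">= 512 locks" crowd. _UserLock-Recid holds up to 512 lock entries per connected user, so counting the non-unknown slots gives a (capped) lock count. The threshold of 100 is an arbitrary placeholder; field names beyond those in the query above should be checked against your version's VST schema:

```
/* Sketch: count how many of the first 512 _UserLock slots each
   connected user is holding, and show the heavy hitters.
   The 100-lock threshold is an arbitrary example value. */
define variable i       as integer no-undo.
define variable lockCnt as integer no-undo.

for each dictdb._UserLock no-lock
    where _UserLock._UserLock-Usr <> ?:
  lockCnt = 0.
  do i = 1 to 512:
    if _UserLock._UserLock-Recid[i] <> ? then
      lockCnt = lockCnt + 1.
  end.
  if lockCnt > 100 then
    display _UserLock._UserLock-Usr lockCnt.
end.
```

Like the query above, this only reads the VSTs, so it is much safer to run against a busy database than anything that touches _lock directly.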
The database log is only going to give you the PID and possibly the userid (depends on how they are connecting).
Sometimes this will end up being a victim of another program that just pushed the lock table near the limits, sometimes it will be the offending process.
If it was actually doing DB updates and not just locks there should be some info in AISCAN, but like you said it can be a pain to look through especially if you have a lot of activity.
The problem is that I don't get a call until AFTER the DB has crashed. I am doing a post-mortem now.
What is -L set to now? How does that compare to the number of users and transactions?
I know Tom is probably going to blow a gasket on this suggestion, but ...
Try cranking -L up to a much larger value and put in more monitoring/alerts/logging to let you have time to find out where the suspects are. Then you can decrease it if it makes sense after the code is changed.
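If you do crank -L up, one way to get the monitoring/alerting half of that suggestion is a small ABL job run from cron that compares the lock table high-water mark against -L. This is only a sketch: the field names (_Startup-LockTable, _DbStatus-MostLocks) are from memory of the VST schema, so verify them against your OpenEdge version's VST documentation before relying on it, and the 80% threshold is just an example:

```
/* Sketch of a periodic lock-table monitor: warn when the lock table
   high-water mark approaches -L. Verify the VST field names
   (_Startup-LockTable, _DbStatus-MostLocks) for your release. */
define variable curL as integer no-undo.

find first dictdb._Startup no-lock.
curL = _Startup._Startup-LockTable.   /* current -L */

find first dictdb._DbStatus no-lock.
if _DbStatus._DbStatus-MostLocks > (curL * 0.8) then
  message "Lock table HWM" _DbStatus._DbStat-MostLocks
    "is over 80% of -L (" curL ")".
```

Logging that message somewhere your alerting picks up gives you time to run the _userlock query while the suspect still holds its locks, instead of doing a post-mortem.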
> I know Tom is probably going to blow a gasket on this
> suggestion, but ...
Bang!
> Try cranking -L up to a much larger value and put in more monitoring/alerts/logging to let you have time to find out where the suspects are. Then you can decrease it if it makes sense after the code is changed.
-- Tom Bascom 603 396 4886 tom@greenfieldtech.com
-L 100K
About 1500 concurrent users
We suspect the offending job was a batch process that will refire so I did exactly what you suggest in hope of catching it the next time around.
Then again, when one of the programmers heard our theory his eye twitched and he went back to his desk.
UPDATE: The twitch was something he saw in the batch scheduler. Someone scheduled a job with no start/end date.
Bad code once again...and the programmers wonder why we always say it's their fault.
Paul
Tom: I agree that some DBAs will just let it go forever. In fact I have jacked a few -L values up to absurd levels because the developers just will not stop writing crappy code.
I got tired of explaining to the CIO and business execs why parts of the business stop every few weeks or so because of "some database problem". The 10MB of memory just wasn't worth the trouble compared to explaining it to them.
It also gave me time to track down the bad code and crack some knuckles instead of talking to managers about it. Because it was always my problem, even when it wasn't my problem :)
Paul: Sounds like a decent plan to me. -L of 100K seems a tad low to me for that many users, but of course it depends on the application. You know it better than I do, obviously.