There are a couple of extracts we run that if they terminate abnormally have the propensity to bring the whole database down. We get the following in the log:
[2014/12/04@14:36:30.779+0000] P-12908 T-17196 I SRV 36: (2520) Stopped. [2014/12/04@14:36:30.779+0000] P-22176 T-18640 I WDOG 167: (2523) User 212 died with 1 buffers locked.
Where 212 is pertinent to the user in question. I've compiled the offending code with listing and there are no transactions. It's just an extract after all.
Does anyone have any ideas what this locked buffer might be?
In this case "buffer" is a synonym for "block".
The message is telling you that a shared memory connection was killed
while it held a latch on a block.
The database will see that and realize that the latch will never be
released and that, therefore, everyone who wants that resource will
Since the information about how to reverse that changes being protected
by the latch was only held in the memory of the process that was killed
there is no way to safely unlock the resource.
The only feasible solution is to shutdown.
Why would a mere extract need to lock blocks?
1) You want consistent reads. It wouldn't be very good if the record
was being updated by someone while you read it (you can see how that
"works" if you connect -RO... every now and then your session will crash
with some very scary messages...)
2) Progress needs to keep track of access for LRU purposes and also
makes adjustments to the LRU chain dues to reads -- these operations
all need to be protected by latches.
Fair enough - that makes sense. Thanks Tom. Any ideas (short of kicking the developer who wrote awful code) of things one can look at to try and reduce block locking by these extracts?
Locking blocks is not something that 4GL code has any awareness of. It
is just part of ensuring consistent concurrent access to the database.
On the other hand -- you lock lots more blocks when you read lots of
data. So poorly indexed queries and the ilk will drive lots more
activity. I am always in favor of fixing that sort of thing on any
On the third hand... this issue is a result of two things that you do
have some control over:
1) It only happens to shared memory clients. You could connect with -S
to avoid it. That might, however, perform poorly. especially if the
extract is data intensive. Which is very likely why it is running on
the server in the first place.
2) Why are the clients being abnormally terminated? Are they GUI
sessions that get killed when the user logs off? In that case run them
from a proper headless client as scheduled tasks or services. Use -b and
use _progres.exe NOT prowin32.exe. That will allow you to preserve the
advantages of shared memory connections without the risk of the
interactive session dying or being killed.
603 396 4886
Good call on the -b thing. I will make sure that gets implemented tomorrow.
Connecting with -S definitely isn't an option as you surmise due to the data being read. I suspect that if the queries were given FIELDS lists then they would run much quicker on a client/server connection, but that's beyond the understanding of the coder - see below.
The clients are usually abnormally terminated because someone realises they have gone out of control (due to a code change that wasn't tested). They kill the process.
Unfortunately (for me), the extracts are written by someone who has no understanding of proper index selection and reducing reads. It happens to be our IT manager. He writes the extracts because nobody else has time. He writes them badly and then someone else has to fix them. Not at all fun.
> Does anyone have any ideas what this locked buffer might be?
If ai is enabled then you can find out what session did right before its death.
If the client process does have to be killed, disconnecting it from the database with promon or proshut will reduce the chance of bringing down the database.
Good call [mention:4b133177ec1e4b6f9a6a7832e6f29913:e9ed411860ed4f2ba0265705b8793d05].
[mention:ae2ea2f6412743fc8be36c522f414ef0:e9ed411860ed4f2ba0265705b8793d05], how would i go about seeing this?
> how would i go about seeing this?
1) In db log find ai file that was active at 14:36:30;
2) Scan the ai file (and a few of its predecessors);
3) Find the login name of user 212;
4) grep the scans by the user's name;
5) grep the scans by TRIDs found by the previous grep.
or you can use my grepAiScan.p instead of 'grep' command:
> If the client process does have to be killed
...then don't use 'kill -9' for self-service users. Use: kill -SIGUSR1; kill -SIGTERM etc
Maybe a weeks worth of training (for the IT Manager), can save you months worth of problems. :-)
lol [mention:ff09ecb5f26245d181a46a061b650f24:e9ed411860ed4f2ba0265705b8793d05] we've tried... :)
[mention:ae2ea2f6412743fc8be36c522f414ef0:e9ed411860ed4f2ba0265705b8793d05], thanks for that really useful post. Spasibo!