User terminated with 1 buffer locked

Posted by James Palmer on 04-Dec-2014 08:50

There are a couple of extracts we run that if they terminate abnormally have the propensity to bring the whole database down. We get the following in the log:

[2014/12/04@14:36:30.779+0000] P-12908      T-17196 I SRV    36: (2520)  Stopped. 
[2014/12/04@14:36:30.779+0000] P-22176      T-18640 I WDOG  167: (2523)  User 212 died with 1 buffers locked. 


Where 212 is pertinent to the user in question. I've compiled the offending code with listing and there are no transactions. It's just an extract after all. 

Does anyone have any ideas what this locked buffer might be? 

Progress 11.2.1

All Replies

Posted by ChUIMonster on 04-Dec-2014 10:02
In this case "buffer" is a synonym for "block".

The message is telling you that a shared memory connection was killed
while it held a latch on a block.

The database will see that and realize that the latch will never be
released and that, therefore, everyone who wants that resource will
queue forever.

Since the information about how to reverse that changes being protected
by the latch was only held in the memory of the process that was killed
there is no way to safely unlock the resource.

The only feasible solution is to shutdown.

Why would a mere extract need to lock blocks?

1) You want consistent reads. It wouldn't be very good if the record
was being updated by someone while you read it (you can see how that
"works" if you connect -RO... every now and then your session will crash
with some very scary messages...)

2) Progress needs to keep track of access for LRU purposes and also
makes adjustments to the LRU chain dues to reads -- these operations
all need to be protected by latches.
Posted by James Palmer on 04-Dec-2014 10:05

Fair enough - that makes sense. Thanks Tom. Any ideas (short of kicking the developer who wrote awful code) of things one can look at to try and reduce block locking by these extracts?

Posted by ChUIMonster on 04-Dec-2014 10:26
Locking blocks is not something that 4GL code has any awareness of. It
is just part of ensuring consistent concurrent access to the database.

On the other hand -- you lock lots more blocks when you read lots of
data. So poorly indexed queries and the ilk will drive lots more
activity. I am always in favor of fixing that sort of thing on any
excuse :)

On the third hand... this issue is a result of two things that you do
have some control over:

1) It only happens to shared memory clients. You could connect with -S
to avoid it. That might, however, perform poorly. especially if the
extract is data intensive. Which is very likely why it is running on
the server in the first place.

2) Why are the clients being abnormally terminated? Are they GUI
sessions that get killed when the user logs off? In that case run them
from a proper headless client as scheduled tasks or services. Use -b and
use _progres.exe NOT prowin32.exe. That will allow you to preserve the
advantages of shared memory connections without the risk of the
interactive session dying or being killed.

--
Tom Bascom
603 396 4886
tom@greenfieldtech.com
Posted by James Palmer on 04-Dec-2014 10:34

Good call on the -b thing. I will make sure that gets implemented tomorrow.

Connecting with -S definitely isn't an option as you surmise due to the data being read. I suspect that if the queries were given FIELDS lists then they would run much quicker on a client/server connection, but that's beyond the understanding of the coder - see below.

The clients are usually abnormally terminated because someone realises they have gone out of control (due to a code change that wasn't tested). They kill the process.

Unfortunately (for me), the extracts are written by someone who has no understanding of proper index selection and reducing reads. It happens to be our IT manager. He writes the extracts because nobody else has time. He writes them badly and then someone else has to fix them. Not at all fun.

Posted by George Potemkin on 04-Dec-2014 10:36

> Does anyone have any ideas what this locked buffer might be?

If ai is enabled then you can find out what session did right before its death.

Posted by Rob Fitzpatrick on 04-Dec-2014 10:38

If the client process does have to be killed, disconnecting it from the database with promon or proshut will reduce the chance of bringing down the database.

Posted by James Palmer on 04-Dec-2014 10:42

Good call [mention:4b133177ec1e4b6f9a6a7832e6f29913:e9ed411860ed4f2ba0265705b8793d05].

[mention:ae2ea2f6412743fc8be36c522f414ef0:e9ed411860ed4f2ba0265705b8793d05], how would i go about seeing this?

Posted by George Potemkin on 04-Dec-2014 10:56

> how would i go about seeing this?

1) In db log find ai file that was active at 14:36:30;

2) Scan the ai file (and a few of its predecessors);

3) Find the login name of user 212;

4) grep the scans by the user's name;

5) grep the scans by TRIDs found by the previous grep.

or you can use my grepAiScan.p instead of 'grep' command:

ftp.progress-tech.ru/.../grepAiScan.p

> If the client process does have to be killed

...then don't use 'kill -9' for self-service users. Use: kill -SIGUSR1; kill -SIGTERM etc

Posted by Abri Venter on 05-Dec-2014 00:14

Hallo

Maybe a weeks worth of training (for the IT Manager), can save you months worth of problems. :-)

Cheers

A Venter

Posted by James Palmer on 05-Dec-2014 02:45

lol [mention:ff09ecb5f26245d181a46a061b650f24:e9ed411860ed4f2ba0265705b8793d05] we've tried... :)

[mention:ae2ea2f6412743fc8be36c522f414ef0:e9ed411860ed4f2ba0265705b8793d05], thanks for that really useful post. Spasibo!

This thread is closed