Transaction timeout now available in ABL ?!

Posted by dbeavon on 20-Feb-2019 00:31

We regularly struggle with long-running transactions in our ABL code. They lock records and create conflicts with other processes, and the end result is that the entire system comes to a halt (and sometimes crashes).  The problem normally escalates gradually over the course of an hour or more.

I know that we can monitor VSTs or use promon to diagnose these types of issues.  I think some customers build their own custom scripts to do this type of database management (see https://community.progress.com/community_groups/openedge_rdbms/f/18/t/24286 ).

I have looked for it frequently, but I've never found a "built-in" way to give badly designed ABL transactions a timeout on HP-UX.  A timeout would be better than a database-wide problem instigated by a single misbehaving client.  As far as I know, this transaction timeout functionality has never been available (at least not for those of us who use shared-memory ABL connections to a database running on HP-UX).

But I recently discovered that Progress does now have a "ClientTimeOut" feature!!!  This feature will be available to us if/when we migrate our databases from HP-UX to Windows, and if/when we start using client-server connections (rather than shared memory).  Here are the details:

https://knowledgebase.progress.com/articles/Article/P15531

The ClientTimeOut parameter allows the database remote server process 
to determine whether a remote ABL client has been inactive for a
specified period of time and if so, the remote server disconnects that remote client
and backs out any related active transaction releasing associated locks/latches
it may be holding at the time. Client "inactivity" is treated as the client
not accessing the database for a specified period of time.

Does anyone have experience with this?  It is unfortunate that it was never available to us on HP-UX, or we would have investigated it a long time ago.  It would be helpful to hear any real-world experiences with "ClientTimeOut", especially in the context of PASOE.

 

All Replies

Posted by Tim Kuehn on 20-Feb-2019 02:50

Have you looked at the Lock Wait Timeout parameter (-lkwtmo) which helps prevent deadly embraces if one transaction waits too long to get a record locked by another transaction?

Posted by Mike Fechner on 20-Feb-2019 05:26

Also for specific transactions or batch routines, you can use the STOP-AFTER option on a suitable block.

DO TRANSACTION STOP-AFTER 60
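
For context, a fuller version of that block header might look like the sketch below. This is a minimal illustration rather than a tested recipe: process-batch.p is a hypothetical procedure name, lCompleted is an illustrative flag, and the ON STOP UNDO, LEAVE phrase is one assumed way of handling the STOP condition that is raised when the timer expires.

```abl
/* Sketch only. process-batch.p is a hypothetical procedure name;
   the 60-second limit comes from the one-line example above. */
DEFINE VARIABLE lCompleted AS LOGICAL NO-UNDO INITIAL FALSE.

DO TRANSACTION STOP-AFTER 60 ON STOP UNDO, LEAVE:
    RUN process-batch.p.      /* potentially long-running work */
    lCompleted = TRUE.        /* reached only if the work finished */
END.

/* lCompleted is NO-UNDO, so it keeps its value when the block is undone. */
IF NOT lCompleted THEN
    MESSAGE "Block did not complete; its transaction was undone.".
```

The flag only tells you that the block did not finish; it does not distinguish a STOP-AFTER timeout from any other STOP or ERROR condition raised inside the block.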

Posted by Brian K. Maher on 20-Feb-2019 11:59

Look at the DO STOP-AFTER phrase too.
 
 
Brian Maher
Principal Engineer, Technical Support
Progress

Posted by dbeavon on 20-Feb-2019 15:04

Thanks for the tips.  My experience with those options (lock-wait and stop-after) has been very mixed.  I was hoping to try something new, which might behave in a more reliable way.

An example related to LOCK-WAIT...  the other day we had a couple of client-server processes that appeared to be locked in a deadly embrace for about two hours.  This was despite the fact that one of them was a PASOE session, and therefore its lock-wait should have timed out after only 10 seconds!  (That is the default behavior for PASOE client code.)

Similarly, the STOP-AFTER seems very finicky and seems to only work in certain scenarios.  It would be interesting to know how that is implemented internally.

Here are a couple of links to details about the behavior of STOP-AFTER:

Notes:

"STOP-AFTER phrases are not intended to interact with user interfaces"

"Blocking calls to third party software components, where the AVM has transferred execution control, cannot be timed out."

As far as I can tell, STOP-AFTER involves both (1) a timer and (2) ongoing polling, which must be happening internally in order to detect when the timer has elapsed.  I.e., if the polling (part 2) isn't able to happen as promptly as it should, then the STOP-AFTER does not have the intended effect.

Anyway, I don't want to get too far off-topic.  I was hoping someone may know about the purpose of the ClientTimeOut feature.  That could be really useful now that some of our appserver stuff is being migrated to PASOE.  Please let me know if anyone has tried to use it.

Posted by dbeavon on 26-Aug-2019 23:03

We are eager to try some new things when our OE database finally moves from HP-UX to Windows.  One database feature I'd like to try is a timeout that enforces limits on client connections (via "ClientTimeOut"; see https://knowledgebase.progress.com/articles/Article/P15531 ).

We frequently have long-running transaction issues.  I accept that it is normally a programmer's responsibility to fix this.  However there are times when an ABL programmer has no means of disconnecting from the database or the transaction.  It follows that the database *server* should have some responsibility as well - in preventing a client from connecting for hours at a time (especially while holding locks).

I'm eager to hear if anyone has any real-world experience with the "ClientTimeOut" feature. 

A possible use for the feature came up again today.  A small amount of ABL code was running in a single session, and it locked a few records, and proceeded to open a JMS adapter session to send the data to a queue.  However, the JMS adapter hung (as a result of a Progress bug) and could neither proceed nor fail.  This caused a series of cascading issues as other ABL transactions started trying to lock the same database records.  It lasted for HOURS until an OpenEdge DBA was able to intervene and kill the misbehaving database connection.

(... for more on that scenario, please reference the following thread: https://community.progress.com/community_groups/openedge_development/f/19/t/57696 )

It seems that the "ClientTimeOut" would have alleviated the related locking issues in the OE database.  The misbehaving ABL session would have continued to remain hung by the JMS adapter ... but at least the database server should be able to cut itself loose.  The fate of the entire database shouldn't depend on what is happening within a single misbehaving ABL session!

Posted by marian.edu on 27-Aug-2019 08:34

[quote user="dbeavon"]

A small amount of ABL code was running in a single session, and it locked a few records, and proceeded to open a JMS adapter session to send the data to a queue. 

[/quote]
I would say it would be better to fix that in the app's code instead of relying on the server to detect and fix all possible cases where the client seems to be 'hung'. Unfortunately sloppy code exists everywhere (exclusive locks without NO-WAIT, potentially long-running code run in a transaction), and for whatever reason developers look to the technology provider to fix everything; it just looks like others' bugs are more visible :)
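
As a rough illustration of the two habits mentioned above (NO-WAIT on exclusive locks, and keeping slow work out of the transaction), something like the following sketch could apply. It is not the code from the original scenario: the Customer table and its CustNum and Comments fields are assumed from the sports2000 demo schema, send-to-queue.p is a hypothetical wrapper for the external call, and whether that call can safely move outside the transaction depends on the application.

```abl
/* Sketch only. Customer, CustNum and Comments are assumed from the sports2000
   demo schema; send-to-queue.p is a hypothetical wrapper for the external call. */
DEFINE BUFFER bCustomer FOR Customer.

/* 1. Make the slow external call while no transaction or record lock is held. */
RUN send-to-queue.p.

/* 2. Keep the transaction short. DO FOR scopes the buffer to this block,
      so the lock is released when the block ends. */
DO FOR bCustomer TRANSACTION:
    /* NO-WAIT returns immediately instead of queuing behind another lock holder. */
    FIND bCustomer WHERE bCustomer.CustNum = 1 EXCLUSIVE-LOCK NO-WAIT NO-ERROR.

    IF AVAILABLE bCustomer THEN
        ASSIGN bCustomer.Comments = "Sent to queue".
    ELSE IF LOCKED bCustomer THEN
        MESSAGE "Record is locked by another user; try again later.".
    ELSE
        MESSAGE "Record not found.".
END.
```

With this shape, a hung external call leaves the session stuck but holds no record locks, so other sessions are not queued behind it.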

Posted by dbeavon on 27-Aug-2019 13:51

As I said, I accept that client-side code is part of the problem as well.  But in this case, "it takes two".  The OE database server is just as much at fault for *allowing* a client connection to hold a transaction open for hours at a time.  The server should have a way to enforce its own timeout as well.

A database transaction timeout is not a new concept.  It is a constraint that is enforced within the *database*, not by the client program.  That concept should not be conflated with features in the client-side programming (i.e. things like "lock-wait" and "STOP-AFTER", which are basically features that control the execution flow within an individual ABL client process).

This thread is closed