PASOE auto-session-destruction after losing database connection

Posted by dbeavon on 27-Nov-2018 16:09

I noticed a change in PASOE, but I don't believe I have seen any announcements or KB articles about it.

In the past when a remote database connection was lost, PASOE didn't seem to know or care about that insignificant detail.  This was the case even when the database connection was started based on client session startup parameters.  Until manual maintenance was performed on the PASOE instance, all clients that tried to use it would experience STOP conditions.

Now, in version 11.7.4, I'm seeing messages like this in the logs:

[18/11/25@02:30:36.041-0500] P-002304 T-002800 1 AS-4 MSAS Connection to networked db lumbertrack lost for AS-4.
[18/11/25@02:30:36.041-0500] P-002304 T-002800 1 AS-4 -- Disconnect from server; database is being shutdown. (2659)
[18/11/25@02:30:36.047-0500] P-002304 T-002800 1 AS-4 MSAS Destroying Session AS-4 due to externally instigated db disconnect.

Notice the message,

MSAS Destroying Session AS-### due to externally instigated db disconnect

This seems to indicate that PASOE is able to detect and resolve database disconnections. I think this is a step in the right direction. The only complaint I have is that PASOE doesn't proactively trim these bad sessions. It seems to wait until the first use of each one after the database has been shut down and restarted. That usually means that the PASOE client will get an error message, even if the database was stopped and restarted over an hour earlier (i.e. there is a residual problem with the session that needs to be flushed out, and it won't be flushed out until the first unfortunate client attempts to use the session).

After that, the problems seem to resolve on their own, but only once ALL of the ABL sessions that lost their connections have been individually repaired, one at a time.

All Replies

Posted by kevin hermans on 21-Dec-2018 10:12

Hi,

Sometimes we also have this problem in development, because the procedures are less strict :) Our web developers don't like it, and I agree, even if it only happens occasionally.

What is your solution for this problem?
I assume you have something dynamic in your activate procedure that parses the database parameters.
I have checked this with etime and it gives a delay of 0 ms.
Or is there a better solution, other than Progress having to build it into their software?

Kind regards

Kevin Hermans

Posted by dbeavon on 21-Dec-2018 14:02

I have a Windows service on the PASOE server that checks the health of the entire PASOE ABL application by polling two separate methods every thirty seconds.

  • The first method runs a simple ABL statement that does *NOT* interact with the database.
  • The second method runs a simple ABL statement that does interact with the database.

Since I want to isolate database connectivity issues, I only take action when the first method is succeeding for a given ABL application and the second is failing. I don't care about any other combination. That one combination indicates that there is a problem, and that it is connectivity-related.
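
The decision rule above amounts to a small truth table. Here is a minimal sketch in Python, not the actual service code (which is not shown in this thread). The names `ProbeResult` and `should_trim` are hypothetical, and the two probe outcomes are assumed to have been collected already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """Outcome of one polling pass (hypothetical structure for illustration)."""
    no_db_ok: bool  # the ABL call that does NOT touch the database succeeded
    db_ok: bool     # the ABL call that does touch the database succeeded


def should_trim(result: ProbeResult) -> bool:
    """Act only when the application responds but database access fails."""
    return result.no_db_ok and not result.db_ok


# Enumerate all four combinations of the two probes.
for no_db_ok in (True, False):
    for db_ok in (True, False):
        r = ProbeResult(no_db_ok, db_ok)
        print(r, "-> trim" if should_trim(r) else "-> ignore")
```

Only one of the four combinations leads to action. If both probes fail, the problem is probably not limited to database connectivity, so the rule deliberately leaves it alone.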

When these facts are established, the health service simply trims all ABL sessions for the ABL application using the oemanager REST interface. By doing so, it flushes out the ABL sessions that are affected by latent connectivity problems (but may not even know that yet).

One final rule for the health monitoring is that, before the two methods are called, you have to query the oemanager REST interface to determine the number of ABL sessions in the ABL application (across all agents of the application). If there are no sessions at all, then nothing needs to be done during that thirty-second iteration. Without this rule, the polling methods themselves may do more harm than good (i.e. if there are ongoing, intentional, or planned outages). Hope this is clear.
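
Putting these rules together, one polling iteration might look roughly like the sketch below. It is written in Python purely for illustration. The four callables are hypothetical placeholders for the oemanager REST calls and the two ABL test methods, and no real endpoint names are assumed.

```python
from typing import Callable


def poll_once(
    count_sessions: Callable[[], int],  # hypothetical: session count from oemanager
    probe_no_db: Callable[[], bool],    # hypothetical: ABL call without database access
    probe_db: Callable[[], bool],       # hypothetical: ABL call with database access
    trim_sessions: Callable[[], None],  # hypothetical: trims all sessions via oemanager
) -> str:
    """Run one health-check iteration and report what was done."""
    # With no sessions there is nothing to repair, and probing could
    # interfere with an intentional or planned outage.
    if count_sessions() == 0:
        return "skipped"
    # If the application itself is not responding, the problem is not
    # isolated to database connectivity, so take no action.
    if not probe_no_db():
        return "ignored"
    # The application responds and database access works: nothing to do.
    if probe_db():
        return "healthy"
    # The application responds but database access fails: trim everything.
    trim_sessions()
    return "trimmed"


# Example with stubbed callables: 3 sessions, app OK, database probe failing.
trimmed = []
print(poll_once(lambda: 3, lambda: True, lambda: False, lambda: trimmed.append("all")))
# prints: trimmed
```

A real service would call something like `poll_once` in a loop with a thirty-second delay between iterations, and would treat an exception or timeout from either probe as a failure.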

Personally I think all this work should be unnecessary, especially given that PASOE can now detect an "externally instigated db disconnect"...

MSAS Destroying Session AS-4 due to externally instigated db disconnect

... in other words, PASOE should at least take the initiative to trim the other ABL sessions in the same ABL application so that the others don't continue to have the latent connectivity problems as well.  Otherwise the problems in these sessions drag on for a much longer period of time than necessary.

Posted by dbeavon on 21-Dec-2018 14:13

>> Think you have something dynamic in your activate procedure by parsing the database parameters.

There were KB articles about detecting database connectivity problems on the fly, e.g. in the connect procedures and such.

That seems like a very poor strategy and causes performance overhead and clutter in the code, and it must be sprinkled around in a lot of places.  

In my opinion whenever the openedge.properties for my ABL application specifies a client-connection on the database, then that is a declaration of a dependency.  PASOE could/should provide more assistance to ensure that sessions are trimmed in a well-defined way as soon as the dependency on the database is missing.

Posted by Brian K. Maher on 21-Dec-2018 14:16

David,
>> PASOE could/should provide more assistance to ensure that sessions are trimmed in a well-defined way as soon as the dependency on the database is missing.
This is a good idea.  Please submit an enhancement request.
 
Brian Maher
Principal Engineer, Technical Support
Progress

Posted by dbeavon on 21-Dec-2018 14:50

Thanks Brian.  These are the types of scenarios that would be hard to conceive abstractly, until they are biting you in the rear.  Lately I have been finding a few of these types of issues in PASOE.  

The good news is that this is the first time I started noticing the "externally instigated db disconnect" error messages (since OE 11.7.5).  It makes me suspect Progress is already aware of the issue.  Perhaps Progress is even using PASOE for some of their own internal software projects, and so they might be fixing bugs based on practical experience.

This thread is closed