Stack trace on PASOE

Posted by Gareth Vincent on 26-Sep-2019 07:43

I was just wondering if anyone else has had issues with PASOE where a session remains in the Active state yet you cannot pull a stack trace for it.  Below is an example of a JMX query against one of the sessions that shows as active in the curl query, yet the JMX query reports it as idle.  After a period of time all agents go into this "ACTIVE" state, preventing any more connections to the DB.  We are currently running 11.7.4 on CentOS 6.

I've also looked at the agent.log file and I'm not seeing anything out of the ordinary.

Query: Line 1. Object: PASOE:type=OEManager,name=AgentManager, Method: getSessionStacks(24252, 9)
Result: {"getSessionStacks":{"ABLStacks":[{"Status":"Idle","Callstack":"","AgentSessionId":9}]}}
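When many sessions need checking, a result like the one above can be screened mechanically. The following is only a minimal Python sketch, assuming the JSON always has the shape of the single sample shown in this post; the helper name `sessions_without_stack` is hypothetical and not part of any PASOE tooling.

```python
import json

# Sample copied verbatim from the getSessionStacks result in this post.
RESULT = '{"getSessionStacks":{"ABLStacks":[{"Status":"Idle","Callstack":"","AgentSessionId":9}]}}'


def sessions_without_stack(raw):
    """Return (AgentSessionId, Status) for every entry whose Callstack is empty."""
    stacks = json.loads(raw)["getSessionStacks"]["ABLStacks"]
    return [(s["AgentSessionId"], s["Status"]) for s in stacks if not s["Callstack"]]


print(sessions_without_stack(RESULT))  # [(9, 'Idle')]
```

Comparing this list against the sessions that the curl query reports as active would show which ones disagree between the two views.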

All Replies

Posted by dbeavon on 28-Sep-2019 00:45

Are you using the APSV transport with HTTP sessions enabled?

Is it every single client connection that is behaving this way or just certain ones?  What does the user experience in the remote client process when this happens?

I'd probably use agent tracing (log entry type 4GLTrace) to see if you can find out the last thing that the bad ones do before they get stuck in "ACTIVE".  You might find a pattern.

The fact that you can't get a stack trace is not in itself unusual.  I think that feature is a bit unreliable, and I have seen that happen when things are operating normally.  But I would agree that it is a problem for your ABL sessions to remain stuck at ACTIVE.

FYI, if you are using HTTP sessions then I've found that the tomcat "manager" webapp has quite a lot of power to free things up again in a pinch.  The HTTP session from tomcat essentially becomes the master key for all related resources in PASOE (both in the java session manager and in the ABL ms-agents).  If you forcefully expire/disconnect HTTP sessions in the tomcat "manager" webapp and it doesn't free up the related resources in PASOE, then that is almost certainly a Progress bug.  I would definitely deploy that "manager" webapp, try to force-expire all of the HTTP sessions, and see what happens.

Posted by Gareth Vincent on 30-Sep-2019 04:43

Thanks for the reply.  Yes, I'm using APSV transport with HTTP sessions enabled.  I will start with the 4GLTrace and see what I can find.  The strange thing is we have over 150 clients that are currently running on the same software and only 3 clients are experiencing this problem.  

Granted, 80% of these clients are still on classic appserver.  The 3 clients in question are running PASOE but have been running fine for over a year now.  This problem only started about 2 weeks ago after a software deployment, yet none of our other clients (classic or PASOE) are experiencing it.

How do I go about enabling "force-expire" on the http sessions?

Posted by Gareth Vincent on 30-Sep-2019 07:51

I've just figured out how to end the sessions in the active state: by stopping old client sessions that are stuck in "READING" under the Request State.  Now to figure out why these sessions are getting into the READING state.  Unfortunately, I cannot enable 4GLTrace until close of business this evening unless there is a way to enable it without restarting the PASOE instance.

Posted by dbeavon on 30-Sep-2019 12:48

>> How do I go about enabling "force-expire" on the http sessions?

The tomcat "manager" webapp allows you to interact with your HTTP sessions.  PASOE uses tomcat as its host, and is subject to any management operations that happen from tomcat.  So when you forcefully terminate an HTTP session in tomcat, the PASOE webapp is required to release any of the related resources.

The tomcat manager webapp is an application that you can deploy to your instance; it should live on port 8810 by default.  It has meaningful information that will complement the information that you see in OEE for oepas1.  In fact, OEE will show you similar HTTP session information on the "webapps" tab; but OEE will not let you interact directly with the HTTP sessions.  

Here is a screenshot of the tomcat manager webapp. Notice that there is a button where you can expire sessions that have been idle for a certain number of minutes.  (idle is based on tomcat's definition of an inactive session, and I would guess that you will find your sessions listed in here).
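Besides the HTML page with the button, Tomcat's manager documents a text interface with an `expire` command that takes the webapp context path and an idle threshold in minutes. The sketch below only builds such a URL and does not send it. Whether the text interface and the required manager role are enabled in a given PASOE instance is an assumption, the webapp path `/mywebapp` is a placeholder, and `expire_url` is a hypothetical helper; port 8810 is the default mentioned above.

```python
from urllib.parse import urlencode


def expire_url(host, port, webapp_path, idle_minutes):
    """Build a Tomcat manager text-interface URL that expires idle sessions.

    webapp_path is the context path of the webapp (placeholder here);
    idle_minutes is the idle threshold, as on the manager's HTML page.
    """
    query = urlencode({"path": webapp_path, "idle": idle_minutes}, safe="/")
    return f"http://{host}:{port}/manager/text/expire?{query}"


# Assumed values: local instance, default port from the post, placeholder webapp.
print(expire_url("localhost", 8810, "/mywebapp", 30))
# http://localhost:8810/manager/text/expire?path=/mywebapp&idle=30
```

A URL like this could then be requested with curl or a script using manager credentials, which would allow the expiry to be scheduled rather than clicked by hand.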

Posted by dbeavon on 30-Sep-2019 12:53

What does the user experience in the remote client process when this happens?  Can they recreate the problem consistently?

Are the clients using the open client (java or .net)?

Posted by Gareth Vincent on 30-Sep-2019 14:18

It looks like the user is getting kicked out of the system when trying to print a copy document, and the session is left behind in a READING state.  This is intermittent, but I managed to capture a protrace file from one of the users' machines, as they are using WebClient to connect to the APSV.

Our DEV team is currently looking at the code so hopefully we will have this resolved soon.  I appreciate the feedback and I will definitely have a look at the expired sessions.  

Out of interest is this something you do regularly in your environment?  

Our environment is quite unique in the way we connect to the APSV.  We are currently running two frameworks, and we make asynchronous calls to the APSV to load dashboard widgets on the home page, which continue to load while the user navigates through the application.  Essentially we have 3 connections to the APSV per client: one for our legacy code, one for the dashboard and one for our modernized screens.

Posted by dbeavon on 30-Sep-2019 14:44

>> Out of interest is this something you do regularly in your environment?  

Not by choice.

There is a problem with APSV transport as it relates to the "session-free" openclients.  These types of openclients connect only once and then they re-use a dedicated HTTP connection for long periods of time (for all round-trips over the life of the client application).  The problem is that this worked well in classic appserver and not so well in PASOE.  In PASOE the default HTTP session timeout configuration causes the connection to be broken after a "reasonable" tomcat session timeout - a number of minutes.  Then the "session-free" openclient will encounter this broken connection the next time it needs to make an appserver round-trip.  Things go from bad to worse because of the way in which the openclient will raise a generic, low-level communication exception and crash the entire client application.

So the fix that we were given by PSC tech support is to increase the default HTTP session timeout configuration.  I think we have it set to 24 or 48 hours!  This avoids the continual crashing of our .Net openclient applications.
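For reference, the standard servlet deployment descriptor expresses the HTTP session timeout in minutes inside a `session-config` element. The sketch below renders that element for a timeout given in hours. It assumes the timeout in question is the standard `session-timeout` setting in the webapp's web.xml; the post does not say exactly which setting tech support changed, and `session_config` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def session_config(hours):
    """Render a servlet <session-config> element for a timeout given in hours.

    The servlet <session-timeout> value is expressed in minutes.
    """
    cfg = ET.Element("session-config")
    ET.SubElement(cfg, "session-timeout").text = str(hours * 60)
    return ET.tostring(cfg, encoding="unicode")


print(session_config(24))
# <session-config><session-timeout>1440</session-timeout></session-config>
```

So 24 hours corresponds to 1440 minutes and 48 hours to 2880.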

But the tradeoff is that there is now a greater potential for resource leaks.  The number of abandoned HTTP sessions may grow over the course of a day (as yours are doing) and these abandoned sessions may hold onto various types of resources (primarily tomcat and msagent memory).  Using the tomcat manager to expire the HTTP sessions administratively releases those resources, even if 24 hours haven't yet elapsed.

Another way to avoid all of this is to use the "session-managed" connections to PASOE (rather than "session-free").  That model doesn't maintain long-running connections to the server.  It creates new connections for every round-trip.  This avoids the need for the extremely long HTTP session timeouts.  But the round-trips will have quite a lot of additional overhead (tens or hundreds of ms each) since new connections are being opened and closed.

Posted by Gareth Vincent on 04-Oct-2019 05:08

We eventually found out the root cause of this issue.  

There was a code change that was caching all printing templates back to the user's machine.  In some instances the size of these templates amounted to around 80 MB.  If there was a slight drop in the network while these templates were being cached, the user's session would remain in a READING state, which then left the agent session in an ACTIVE state.  This problem was highlighted by some of our customers running PASOE over WAN connections.

We were only able to reproduce this by physically unplugging the network cable when caching was taking place at the time of printing.

I would like to think that PASOE would handle this better and simply time out the connection at some point, but this was not the case.  Our DEV team has now changed the code to only cache what is needed for the user.

A part of me is glad that it was crashing as I’m not sure that we would have picked up the amount of data that was getting transferred to the end-user.
