PASOE ms-agent terminations

Posted by dbeavon on 17-May-2018 07:50

This may be a bit of a vague question, but it would help me determine if/when to contact Progress technical support regarding the behavior of PASOE. Are there scenarios where a single ABL session should run into a problem severe enough to take down the entire MS-agent process? Is that ever appropriate or acceptable behavior (especially given that the MS-agent is potentially hosting a lot of unrelated sessions)? Note that I'm not loading any third-party libraries or using the .NET runtime or anything like that; I'm just using run-of-the-mill ABL database code.

It seems that this comes up for me from time to time and the root cause usually seems to be an ABL compilation problem or a schema problem (either r-code CRC schema or client-server/"SERV" schema).  This problem is typically encountered in the ABL code running in a *single* session of the agent.  It seems a bit severe that such a problem in a single session should take down the entire process, potentially including other ABL sessions which are having no problems at all.

A related issue is the lack of logging when this happens. Perhaps it is because of some internal indirection mechanism that PASOE uses for logging errors to the agent logs. When the entire ms-agent is terminated so quickly, it is unlikely that any meaningful message will appear in the agent logs. While you can see in the java (session manager) logs that the ms-agent is dead, you don't get to see the ABL-related reason in the agent log file.

In order to troubleshoot, I typically need to do a lot of guesswork to figure out what code may have been running at the time, and then try to run the same code in an isolated "_progres" process. That provides more clues about what went wrong.

Posted by Peter Judge on 17-May-2018 08:02

>   Is that ever an appropriate/acceptable behavior (esp. given that the MS-agent is potentially hosting a lot of unrelated sessions)? 
 
No. If you find that a single AVM session takes down the whole ms-agent please contact Tech Support.
 
If it’s doing something along the lines of self-terminating (which would have been not-so-big-a-deal in classic AppServer) you should fix the code first, but I assume that’s not the case here.
 
In general, GPFs/cores are reason enough to contact TS.
 

All Replies

Posted by Irfan on 17-May-2018 10:15

Does looking at the MS-Agent session stack help to figure out what is happening?

Posted by dbeavon on 17-May-2018 11:14

By the time I know there is a problem, the MS-Agent is already long-gone, and I don't have any stacks (... but perhaps there is a configuration that would generate stacks while crashing?)

I see this in the java (session-manager) logs :

[xXUBwzg6RrejhqchK27x3g-agent-watchdog] WARN  c.p.appserv.PoolMgt.AgentWatchdog - AgentWatchdog(xXUBwzg6RrejhqchK27x3g) : agent -927YaQeTdK7r2UAzbwm8A PID= 17580 has terminated.

...and Windows also reports the process crashing in the Application event log. But there is no good reason for that PID being killed (i.e. no applicable agent/resource timeouts or anything). It typically happens because of a compiler or schema error while a single ABL session is running code. Typically this only affects us in development, because that is where things are compiled on the fly and the schema is being actively changed. However, it is very disruptive, and *much* more difficult to troubleshoot than it should be (since there is absolutely *nothing* being reported to the agent log).
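Since the agent log stays silent, the watchdog warning in the session-manager log is the only reliable marker of when an agent died. Here is a minimal sketch for pulling the agent ID and PID out of such lines. It assumes the message format matches the single sample line quoted above (other OpenEdge versions may word it differently), and the function name `find_terminated_agents` and the group names in the pattern are hypothetical, made up for illustration.

```python
import re

# Pattern derived only from the one sample line quoted in this post.
# Assumption: other PASOE/OpenEdge versions may format this differently.
TERMINATED = re.compile(
    r"AgentWatchdog\((?P<watchdog>[^)]+)\)\s*:\s*agent\s+(?P<agent>\S+)"
    r"\s+PID=\s*(?P<pid>\d+)\s+has terminated"
)


def find_terminated_agents(lines):
    """Return a list of (agent_id, pid) for each watchdog termination line."""
    hits = []
    for line in lines:
        match = TERMINATED.search(line)
        if match:
            hits.append((match.group("agent"), int(match.group("pid"))))
    return hits


# The sample line from the session-manager log, used as test input.
sample = [
    "[xXUBwzg6RrejhqchK27x3g-agent-watchdog] WARN  c.p.appserv.PoolMgt.AgentWatchdog"
    " - AgentWatchdog(xXUBwzg6RrejhqchK27x3g) : agent -927YaQeTdK7r2UAzbwm8A"
    " PID= 17580 has terminated.",
]

print(find_terminated_agents(sample))
# prints [('-927YaQeTdK7r2UAzbwm8A', 17580)]
```

The PID can then be matched against the crash entry in the Windows Application event log to narrow down which requests were in flight at the time.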

I've found an earlier discussion that I tried to start about this here:

https://community.progress.com/community_groups/openedge_development/f/19/t/36346

According to that thread, a simple compiler error was enough to cause _mproapsv to crash. The offending code looked like this:

DS_Shift:WRITE-XML("FILE""shift.xml", TRUE)

...instead of...

DATASET DS_Shift:WRITE-XML("FILE", "shift.xml", TRUE)

This is enough of a compiler problem to cause the whole MS-agent to die. And it doesn't put anything helpful in the agent log in the process of dying.  

The problems that cause the ms-agent to die seem to be limited to a small set of compiler/schema issues. For example, if I replace that line of code with something like "asdfasdf", then things fail in a more expected way, with a compiler message in the agent log like so: ** Unable to understand after -- "asdfasdf". (247)

I am glad to know that this is a bug in PASOE. I will spend the time to open the tech support case next time I waste more than an hour trying to figure out why my code caused the MS-agent to crash. Hopefully Progress has started their preparations for 11.7.4...
