JMS generic client stuck since yesterday.

Posted by dbeavon on 04-Apr-2019 16:12

Is the JMS generic client code (in ABL) available to be reviewed by the public?

I have an ABL session in PASOE that is interacting with the "sendToQueue" method on a jms/ptpsession.  The ABL session callstack is stuck on "send2" in message-header.p.  There has been no change for the past 12 hours.
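For context, the calling pattern on our side is roughly the following.  (This is a simplified sketch, not the real code; the handles, queue name, and connection parameters are placeholders, and the parameter list on sendToQueue is as I recall it from the ABL-JMS docs.)

DEFINE VARIABLE hPtpSession AS HANDLE NO-UNDO.
DEFINE VARIABLE hMessage    AS HANDLE NO-UNDO.

RUN jms/ptpsession.p PERSISTENT SET hPtpSession ("-H <host> -S <port> -AppService <adapter>").
/* ... connection-factory setup and beginSession omitted ... */

RUN createTextMessage IN hPtpSession (OUTPUT hMessage).
RUN setText IN hMessage ("...payload...").
RUN sendToQueue IN hPtpSession ("MyQueue", hMessage, ?, ?, ?).  /* <-- this is the call that never returns */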

We do a TON of JMS and know that the scope of the problem is limited to this single ABL session.

Callstack
 	C:\Progress\OpenEdge\jms\impl\message-header.r  (send2 jms/impl/message-header.p - 1616)

 	C:\Progress\OpenEdge\jms\impl\session.r  (doSend jms/impl/session.p - 2161)

 	C:\Progress\OpenEdge\jms\impl\session.r  (sendToQueue jms/impl/session.p - 1950)

 	\\grnetappm03\oe_prod\OpenEdge\USW\LumberTrack\app\p\app0479.r  (FetchAsyncDataTransaction app/p/app0479.p - 828)

 	\\grnetappm03\oe_prod\OpenEdge\USW\LumberTrack\app\p\app0479.r  (FetchAsyncData app/p/app0479.p - 501)

It would be really great if I had some idea what "send2" is doing to make it stuck, so that I can free it up (either manually or introduce a custom workaround that will force a timeout).

I've already verified that the PASOE client is not connected, and that the HTTP session in Tomcat has expired.  But these things weren't enough to free up the ABL session and unlock it from whatever it is doing with the generic JMS adapter.

Our PASOE environment runs on Windows with OpenEdge 11.7.4.  Any help would be appreciated.  This is the first time I've seen anything get locked up at this point in the callstack, and we send hundreds of thousands of messages a day.  I haven't found any Google hits about this, so I thought I'd at least start the conversation.  The other thing I haven't done yet is to kill the entire PASOE agent after adding another one in its place.  That is going to be the next step, but I'd like to gather information and understand what happened first.

All Replies

Posted by Matt Baker on 04-Apr-2019 16:49

In lieu of posting the code, here is a summary of what it does.  The method is fairly simple.

A few uses of CODEPAGE-CONVERT on several variables, then a call that sends the actual message across the wire to the adapter, a bit of error handling, and then a few more CODEPAGE-CONVERT calls on the response.  The call you see hanging is most likely waiting for a response from the adapter.
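In rough pseudo-ABL, the shape is something like this (illustrative only; this is not the actual Progress source, and the procedure and variable names here are invented):

DEFINE VARIABLE cHeaderData   AS CHARACTER NO-UNDO.
DEFINE VARIABLE cWireRequest  AS CHARACTER NO-UNDO.
DEFINE VARIABLE cWireResponse AS CHARACTER NO-UNDO.
DEFINE VARIABLE cResult       AS CHARACTER NO-UNDO.

cWireRequest = CODEPAGE-CONVERT(cHeaderData, "UTF-8").
/* this is the call that goes across the wire and waits on the adapter */
RUN sendToAdapter (cWireRequest, OUTPUT cWireResponse) NO-ERROR.
IF ERROR-STATUS:ERROR THEN
    RETURN ERROR RETURN-VALUE.
cResult = CODEPAGE-CONVERT(cWireResponse, SESSION:CPINTERNAL).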

There is only one code path other than the error handling code which is quite minimal.  No loops or anything.

Maybe a protrace from the session and/or a Java stack trace from the adapter would yield more clues as to where the code is blocked waiting for the response from the adapter.
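(For the protrace: running the proGetStack utility from the OpenEdge bin directory against the stuck agent's process id should write a protrace file for that process, and a standard JDK jstack <pid> against the adapter's JVM would give the Java side, assuming a JDK is available on that machine.)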

Posted by dbeavon on 04-Apr-2019 19:35

That is helpful.  It sounds like killing the adapter process and restarting it might be the fix?  It might release the ABL session from its hung state.  I don't have the UI configured properly (within OEE) to browse the broker-adapter at the moment, but perhaps there is some corresponding thread that is hung on that side of things as well?

I don't suppose there is a REST management API for the broker-adapter yet?  Is there an adaptman command that serves that purpose?  It would be nice to detect this issue in the future via REST and send out email alerts to the admins...  (I'm a bit spoiled by the oemanager REST API for PASOE... it seems like the broker-adapter could benefit from something like that too.)

Posted by dbeavon on 26-Aug-2019 22:16

I see that there was a new KB added about this (about a week ago).  Someone else must have found the same bug that I had described above.  Apparently Progress has acknowledged that this is a defect in the JMS adapter:

https://knowledgebase.progress.com/articles/Article/OpenEdge-ABL-Sonic-adapter-hangs-on-send2-jms-impl-message-header-p-at-line-1616

Our environment is also running on Windows, but we are on OE 11.7.4, and we use the Apache ActiveMQ broker, not the Sonic MQ broker.

What is the timeline for a fix to something like this?  Is there any workaround?  Would ABL's "STOP-AFTER" have any effect?  In our case the issue happens within the context of a Progress PASOE agent process.  It is really disruptive that we have to kill an entire agent process.  That will also interfere with whatever is happening in any of the other ABL sessions.
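For what it's worth, the kind of guard I had in mind is something like the sketch below.  The handle names are placeholders, and I honestly don't know whether the STOP-AFTER timer would even fire while the AVM is blocked inside the adapter call, which is really the question.

DEFINE VARIABLE lTimedOut AS LOGICAL NO-UNDO INITIAL TRUE.

DO STOP-AFTER 30 ON STOP UNDO, LEAVE:
    RUN sendToQueue IN hPtpSession ("MyQueue", hMessage, ?, ?, ?).
    lTimedOut = FALSE.
END.

IF lTimedOut THEN DO:
    /* the send did not complete within 30 seconds: log it, alert someone, reconnect, etc. */
END.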

Posted by dbeavon on 05-Sep-2019 17:03

Based on interacting with Progress tech support, the KB that was created about this is specific to the Sonic MQ broker and does *not* apply to the generic JMS adapter (Active MQ broker).  This is despite the fact that we are seeing the exact same symptoms and callstacks.

Here was that KB:

knowledgebase.progress.com/.../OpenEdge-ABL-Sonic-adapter-hangs-on-send2-jms-impl-message-header-p-at-line-1616

Unfortunately it sounds like the KB article was a red herring, and I'm not actually very close to a resolution after all.

Even so, I would *really* appreciate hearing some feedback from the Progress customer who worked with Progress support on that recent KB ("OPENEDGE ABL SONIC ADAPTER HANGS ON SEND2 JMS/IMPL/MESSAGE-HEADER.P").  You know who you are.  It would be nice to get as much additional information as possible (beyond the brief notes in the KB) so that I might attempt to build a similar repro using Active MQ.  There are lots of missing pieces to this puzzle, and I don't quite have enough information to solve it (nor even to consistently recreate the problem).  It looks like I'm back at square 1 since my issue is supposedly unrelated to the one involving Sonic MQ.

My plan "B" will be 100% trial-and-error.  Maybe it will come down to pulling network wires out the back of the PASOE servers to see whether that causes the "generic JMS adapter" to hang (bad) or raise an error (good).

Posted by dbeavon on 13-Sep-2019 18:50

I'm still working with tech support on this "JMS generic adapter" issue.  The current theory is that the root problem might be as simple as a TCP transport issue, whereby the interaction with the remote broker becomes unresponsive and, with no timeout in effect, ABL is forced to hang indefinitely.  (It isn't too surprising that we are scrutinizing the remote broker connectivity rather than the various moving parts on the local server where Progress is running.)

I agree that a javastack trace from the adapter would be helpful if I could get one.  So far I haven't been able to recreate the problem on demand.

This whole "JMS generic adapter" thing is kind of crazy stuff.  My own ABL code runs in a process that is *two* levels of indirection away from the messaging broker.  And actual client connection is happening in a third-party jar that Progress doesn't necessarily support.  In the Java runtime.  And the jar, in turn, speaks to a messaging broker with a proprietary wire protocol.  If/when issues arise with this configuration, the majority of issues are things that Progress can blame on a third-party.

To be fair, Progress does own the "JMS generic adapter" and supports that.  But the generic adapter doesn't currently seem to have error-detection logic that would interrupt a client that is misbehaving.  E.g. if an ABL program is trying to send a message and the "generic adapter" is blocked for a long period of time, even if it is waiting on a third-party JMS implementation, then there should be a way for it to *abandon* the operation and return control to ABL with a failure message.  It seems like the "JMS generic adapter" is just as much at fault for blocking execution; it is not just the fault of the remote messaging broker.

I wish Progress would provide native AMQP support to talk directly to a messaging broker from ABL code. The complexity would be reduced, and we'd eliminate extra hops, additional processes, and the need for the Java runtime.  By now there are probably *lots* of open source implementations of AMQP that Progress could copy from.

Posted by onnodehaan on 13-Sep-2019 19:43

dbeavon: we have been using the JMS Generic Adapter for years and years, processing thousands of messages every hour (even over the internet), and we have never seen a hanging AppServer process pop up except in one use case.

When we restart the state-free AppServer that the OpenEdge Native Adapter (Sonic component) uses to talk to Metacom, then it hangs.  But that's the other way around, so it's not related to your use case.  I thought I'd chip in anyway since the last few posts didn't receive a response, which I found kind of sad :-)

Posted by dbeavon on 14-Sep-2019 02:07

@onnodehaan ... I have only heard about one other customer that encountered the same issue, and I only heard about it when I happened upon their KB article with the same symptoms.  I suspect there are not many OE customers who actually use JMS in the first place.  If it were more popular, then Progress might not have unloaded the Sonic MQ product to Aurea.

I think I agree with the Progress theory that the root cause might be related to a TCP transport issue in the third-party JMS library.  The JMS specification itself doesn't seem to care about the wire protocol or the TCP/IP layers, so Progress really isn't able to care about those things either.  There probably isn't any perfect way for the Progress "generic JMS" interface to interact with TCP/IP settings in a generic way (given that the underlying client library might be Sonic or ActiveMQ or WebSphere or whatever).

My strategy now is to add some additional TCP timeout settings to my connection string and then cross my fingers.  These settings are supposed to affect the ActiveMQ client library that we are using today: "&soTimeout=60000&soWriteTimeout=60000"
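The full broker URL then ends up looking something like the line below (the host and port are placeholders; where exactly the URL is configured depends on how the adapter / connection factory is set up in your environment):

tcp://<activemq-host>:61616?soTimeout=60000&soWriteTimeout=60000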

Hopefully that will reduce or eliminate the hangs.  If the hangs continue, then I suppose the next step would be to write a custom health-checker.  It might do an "adaptman -query" on a regular basis, and if there is anything that looks wrong or hung, then we could kill off the whole broker-connect mess and restart it again from scratch.  Given that we have a busy and long-running PASOE service on that server, we can't really afford any hangs or leaked client connections within the adapter.
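A very rough sketch of what that health-checker might look like from ABL is below, just to give the idea.  The exact adaptman arguments (instance name, host, port, credentials) and the actual "looks wrong" test would have to be worked out for the specific environment.

DEFINE VARIABLE cLine AS CHARACTER NO-UNDO.

/* dump the adapter status to a file and scan it */
OS-COMMAND SILENT VALUE("adaptman -query > adapter-status.txt").
INPUT FROM VALUE("adapter-status.txt").
REPEAT:
    IMPORT UNFORMATTED cLine.
    /* ... inspect cLine and send an alert if anything looks hung ... */
END.
INPUT CLOSE.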

Posted by dbeavon on 11-Nov-2019 23:00

Below is another example of a callstack where we've seen the generic adapter get hung.  The only difference from my original posting is that this time it is stuck while calling the commitSend method.

Callstack
C:\Progress\OpenEdge\jms\impl\session.r  (commitSend jms/impl/session.p - 1874)

\\grnetappm03\oe_prod\OpenEdge\USW\LumberTrack\app\p\app0479.r  (FetchAsyncDataTransaction app/p/app0479.p - 828)
 
\\grnetappm03\oe_prod\OpenEdge\USW\LumberTrack\app\p\app0479.r  (FetchAsyncData app/p/app0479.p - 501) 
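In other words, the transacted sends do roughly this (simplified, with the same placeholder handles as before), and this time the block happened inside the commit rather than inside the send itself:

RUN sendToQueue IN hPtpSession ("MyQueue", hMessage, ?, ?, ?).
RUN commitSend IN hPtpSession.   /* <-- stuck here this time */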

