Strange webspeed error

Posted by Jens Dahlin on 23-Sep-2019 09:54

Hi,

I'm struggling with a strange Webspeed problem together with Progress Support so I'm looking around for anybody else with the same issue. 

What happens is that once in a while we get orphaned agents: the broker loses contact with them. The processes are still running and connected to the database, but idle. Unless we mitigate this by killing the processes and logging the users out of the database, we end up with a full database and/or depleted system resources. These orphaned agents seem to live on forever.

We have seen a correlation with high load (or high bursts of load) when this happens: the WebSpeed broker has spawned lots of agents, and one or more of those agents get "orphaned" as described above.

We see errors 6404, 6397, 6403 and 6400 in the server log file, and also the following in the broker log file when this happens:

FSM ERROR: INVALID ACTION state= 10 event = 6 : FSM : action= 20 nextstate = 12
FATAL ERROR : (2) Protocol Error. (8121)
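
For anyone who wants to check their own logs for the same pattern, here is a minimal Python sketch of a scanner. It is not something Support provided, just an illustration. It assumes that the messages in the server log carry their error number in parentheses at the end, the way the (8121) line above does, and the function names are made up for this example.

```python
import re
from collections import Counter

# Error numbers mentioned in this post. Assumption: each message carries
# its number in parentheses, as in "Protocol Error. (8121)".
ERROR_CODES = {"6404", "6397", "6403", "6400", "8121"}
CODE_RE = re.compile(r"\((\d+)\)")


def count_errors(lines, codes=ERROR_CODES):
    """Count how often the given error numbers and FSM errors appear."""
    counts = Counter()
    for line in lines:
        # The FSM line has no error number, so match it by its text.
        if "FSM ERROR" in line:
            counts["FSM ERROR"] += 1
        for code in CODE_RE.findall(line):
            if code in codes:
                counts[code] += 1
    return counts


def scan_file(path):
    """Scan one log file; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as fh:
        return count_errors(fh)
```

Calling scan_file() on the broker and server log files (the paths depend on your installation) returns a count per error number, which makes it easier to see whether the errors cluster around the load bursts.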

After this happens, the agents producing errors in the server log file become orphaned and the broker "heals" by spawning new ones if needed.

Support has suggested port scans as a cause (because of the protocol errors), but we have no port scanners running. This is on a private network where only HTTP requests get through from the outside. No strange requests are found in the Apache log files (apart from an occasional large number of requests).

Database logs show nothing at all.

The Apache log file shows a CGI timeout about five minutes after this happens, with a corresponding HTTP 504 being returned.

General Linux logs (syslog etc.) show nothing.

No strange things in firewall logs etc.

If anybody has experienced something similar and come up with a solution please let me know.

We run OE 11.7.5 with WebSpeed (not PAS) on Ubuntu 16.04 and Apache 2.4.18.
