Question regarding AS query output (Rq Wait & Rq Duratio

Posted by qasimpatwekar on 01-Feb-2016 04:47

Dear Experts,

We have two AppServers 'qadfinyoungs' and 'qadui_ASyoungs' both with very high 'Rq Wait' and 'Rq Duration' values.  Is there any way of checking what is causing these high values?  Do we need to turn on specific Logging Type and/or increase the level of logging from '2' to '4' (verbose) to capture this information?  

 
Example:
proenv>asbman -name qadui_ASyoungs -query
OpenEdge Release 11.2 as of Wed Feb 13 19:00:20 EST 2013
Connecting to Progress AdminServer using rmi://localhost:20931/Chimera (8280)
Searching for qadui_ASyoungs (8288)
 
Connecting to qadui_ASyoungs  (8276)
Broker Name                    : qadui_ASyoungs
Operating Mode                 : Stateless
Broker Status                  :  ACTIVE
Broker Port                    : 53205
Broker PID                     : 3660
Active Servers                 : 11
Busy Servers                   : 1
Locked Servers                 : 0
Available Servers              : 10
Active Clients (now, peak)     : (188, 288)
Client Queue Depth (cur, max)  : (0, 37)
Total Requests                 : 1466361
Rq Wait (max, avg)             : (352092 ms, 5 ms)  ==> (5 minutes Max time)
Rq Duration (max, avg)         : (10072341 ms, 222 ms) ==> (167 minutes Max time)
PID   State     Port  nRq    nRcvd  nSent  Started          Last Change
03948 AVAILABLE 53247 002425 002468 004010 Jan 20, 2016 09:34 Jan 20, 2016 10:21
49647 AVAILABLE 53261 038949 039656 056081 Jan 27, 2016 08:45 Jan 28, 2016 12:13
01573 AVAILABLE 53232 007971 008099 012143 Jan 28, 2016 07:22 Jan 28, 2016 12:13
49830 AVAILABLE 53239 007126 007259 011205 Jan 28, 2016 07:59 Jan 28, 2016 12:13
49852 AVAILABLE 53242 006866 007062 011119 Jan 28, 2016 07:59 Jan 28, 2016 12:13
49854 AVAILABLE 53244 007126 007314 011151 Jan 28, 2016 07:59 Jan 28, 2016 12:13
49866 AVAILABLE 53248 007029 007160 011676 Jan 28, 2016 07:59 Jan 28, 2016 12:13
50237 AVAILABLE 53251 002195 002237 003203 Jan 28, 2016 10:41 Jan 28, 2016 12:13
50240 AVAILABLE 53253 002104 002170 003376 Jan 28, 2016 10:41 Jan 28, 2016 12:13
50245 AVAILABLE 53254 002042 002091 003280 Jan 28, 2016 10:41 Jan 28, 2016 12:13
50261 AVAILABLE 53255 002087 002147 003438 Jan 28, 2016 10:41 Jan 28, 2016 12:13
2) There is huge discrepancy in nRq, nRcvd & nSent number and I found http://knowledgebase.progress.com/articles/Article/21131. Do you have any other kbase which clarifies this in more details (like taking real time example)?
 
Note: I searched in Progress kbase but only find the information regarding what is Rq Wait & Rq Duration but I am unable to find the cause of its high value and what value Progress suggests the optimal.
Thanks in advance.
--
Thanks,
Qasim.

All Replies

Posted by lecuyer on 01-Feb-2016 06:48

There is no simple way to empirically identify the specific requests that cause large values for the rqWait time.  However, it may be possible to narrow down the source of this a bit.

As requests arrive in the broker, agents are assigned in a FIFO manner to serve these request.  If agents are 'available' to serve the request, they are assigned immediately ... no waiting involved.  If no agents are available at that moment, the request waits for a short time for an existing agent to become available, on the assumption that an existing agent may become available in less time than it would take to start a new agent.  The duration of this timeout is specified by the srvrStartupTimeout property (def=3 secs)in the ubroker.properties file

Once this first timeout occurs, behavior depends on whether all allowed agents have been started (based on the srvrMaxInstance property).  If all agents have been started, then the request waits again for an existing agent to become available.  The duration of this second wait is defined by the requestTimeout property (def= 15 secs).  If no agent becomes available within this interval, an error response is returned to the client.  For these types of requests, the maximum rqWait should be no greater than the sum of the srvrStartupTimeout and the requestTimeout (approximately).

If the number of agents has not reached its maximum, then the broker attempts to start a new agent.  It is important to note that at this point, the requestTimeout timer is stopped, since the amount of time necessary to start a new agent is application dependent.  Consequently, for requests where a new agent process is started 'on demand', the rqWait time will include the srvrStartupTimeout plus the amount of time it takes to start a new agent process.  It is also important to note that this time also includes the time necessary to run the agent startup procedure (which may involve connecting to databases, loading of application context information, caching of static temptables, etc.).

This would suggest, therefore, that the large rqWait times represent large agent startup times.  This time could be aggravated by the frequency with which agents are started 'on demand', and whether multiple agents are started up simultaneously.  This could be the result of request 'storms' due to application requirements.

As far as the discrepancies in the nRq, nSent, and nRcvd values, this is perfectly normal.  It would suggest that the average request transfers a lot of data in each direction.  As the kbase suggests, as requests are sent and received, they are serialized into 'packets' of (relatively) fixed length.  A request/response message may involve multiple packets depending on the size of parameter data.

Posted by qasimpatwekar on 01-Feb-2016 07:17

Dear Lecuyer,

Thanks for providing detailed flow of the AS request model. However I am just wondering If detailed logging and debugging had been enabled, would this have captured the instance where the ‘Rq Duration’ max value was reached and would it indicate with timestamps when that request started and ended so we have some way of identifying what process caused it?  If so what level of debugging is required? If NOT what is the best possible thing to identify the delay?

You also mentioned "it may be possible to narrow down the source of this a bit." may I know how we can do this?

Thanks in advance.

Thanks,

Qasim.

Posted by lecuyer on 01-Feb-2016 07:20

As you would expect, rqDuration is the length of time it takes to run a request.  This highly dependent on the application, and the resources it requires.  Identifying requests that might take a long time is largely a matter of analyzing the behaviour of the application and what its dependencies are.

The one possible tool that might prove helpful in identifying long running requests would be to use the srvrExecutionTimeLimit property.  If set, this property should cause a STOP condition to be raised in the application after the specified number of seconds, normally resulting in an error being returned to the client.  Obviously this could be very disruptive to your application, and should not be employed without a clear understanding of how your application would behave.  If the large rqDuration values can be reproduced in a test environment, setting this property might help identify which requests are taking a long time.  You would have to determine whether this is 'normal' behaviour for you application, or if it indicates some kind of problem.

The other alternative would be to  utilize OEManagement which would give you this kind of detailed information.

Posted by lecuyer on 01-Feb-2016 07:23

Detailed logging can give you some of this information.  However, this would likely generate very large log files, and cause significant performance penalties.  This can sometimes lead to an actual change in the behaviour that you are trying to observe.

Posted by lecuyer on 01-Feb-2016 07:45

To narrow this down a bit, I would suggest analyzing the application requirements of starting new agents (especially startup procedures), and what your usage patterns are regarding starting agents 'on demand.  That is, do you employ an relative static agent startup policy (e.g. prestart all/most of your agents and keep them running) or is it more dynamic (e.g. prestart few agents, and use an agressive auto-trim property to terminate agents that are not immediately needed).  There is no right or wrong configuration in this regards, and it might be possible to tune this a bit.

That being said, I should probably ask whether these values for rqWait and rqDuration are typical, or are they anomalous, since your average times are very respectable.  How long had this broker been running when this data was collected?  One of your agents has been running at least 13 days.  Unfortunately the statistics presented reflect behaviour since the time the broker was started, and long running times can tend to mask the effects of specific events.  Restarting you broker more frequently might yield some information.  You could also consider gathering the asbman query data over time in regular intervals (e.g. every 30 seconds) using an automated script.  This might help identify when these long running requests occurred.

Again, you might want to consider using OEManagement to gather this level of detail.

Posted by qasimpatwekar on 03-Feb-2016 01:35

Hi lecuyer,

Thanks for the detailed information. I will add ASPlumbing,DB.Connects,4GLTrace with auto script which queries Appserver on regular interval. At the same time I am fully aware that it may have consequences like performance degradation and large consumption of disk space. I will do this for short period of time.

I really appreciate your time and thanks once again.

--

Thanks,

Qasim.

This thread is closed