We have two AppServers 'qadfinyoungs' and 'qadui_ASyoungs' both with very high 'Rq Wait' and 'Rq Duration' values. Is there any way of checking what is causing these high values? Do we need to turn on specific Logging Type and/or increase the level of logging from '2' to '4' (verbose) to capture this information?
There is no simple way to empirically identify the specific requests that cause large values for the rqWait time. However, it may be possible to narrow down the source of this a bit.
As requests arrive in the broker, agents are assigned in a FIFO manner to serve these request. If agents are 'available' to serve the request, they are assigned immediately ... no waiting involved. If no agents are available at that moment, the request waits for a short time for an existing agent to become available, on the assumption that an existing agent may become available in less time than it would take to start a new agent. The duration of this timeout is specified by the srvrStartupTimeout property (def=3 secs)in the ubroker.properties file
Once this first timeout occurs, behavior depends on whether all allowed agents have been started (based on the srvrMaxInstance property). If all agents have been started, then the request waits again for an existing agent to become available. The duration of this second wait is defined by the requestTimeout property (def= 15 secs). If no agent becomes available within this interval, an error response is returned to the client. For these types of requests, the maximum rqWait should be no greater than the sum of the srvrStartupTimeout and the requestTimeout (approximately).
If the number of agents has not reached its maximum, then the broker attempts to start a new agent. It is important to note that at this point, the requestTimeout timer is stopped, since the amount of time necessary to start a new agent is application dependent. Consequently, for requests where a new agent process is started 'on demand', the rqWait time will include the srvrStartupTimeout plus the amount of time it takes to start a new agent process. It is also important to note that this time also includes the time necessary to run the agent startup procedure (which may involve connecting to databases, loading of application context information, caching of static temptables, etc.).
This would suggest, therefore, that the large rqWait times represent large agent startup times. This time could be aggravated by the frequency with which agents are started 'on demand', and whether multiple agents are started up simultaneously. This could be the result of request 'storms' due to application requirements.
As far as the discrepancies in the nRq, nSent, and nRcvd values, this is perfectly normal. It would suggest that the average request transfers a lot of data in each direction. As the kbase suggests, as requests are sent and received, they are serialized into 'packets' of (relatively) fixed length. A request/response message may involve multiple packets depending on the size of parameter data.
Thanks for providing detailed flow of the AS request model. However I am just wondering If detailed logging and debugging had been enabled, would this have captured the instance where the ‘Rq Duration’ max value was reached and would it indicate with timestamps when that request started and ended so we have some way of identifying what process caused it? If so what level of debugging is required? If NOT what is the best possible thing to identify the delay?
You also mentioned "it may be possible to narrow down the source of this a bit." may I know how we can do this?
Thanks in advance.
As you would expect, rqDuration is the length of time it takes to run a request. This highly dependent on the application, and the resources it requires. Identifying requests that might take a long time is largely a matter of analyzing the behaviour of the application and what its dependencies are.
The one possible tool that might prove helpful in identifying long running requests would be to use the srvrExecutionTimeLimit property. If set, this property should cause a STOP condition to be raised in the application after the specified number of seconds, normally resulting in an error being returned to the client. Obviously this could be very disruptive to your application, and should not be employed without a clear understanding of how your application would behave. If the large rqDuration values can be reproduced in a test environment, setting this property might help identify which requests are taking a long time. You would have to determine whether this is 'normal' behaviour for you application, or if it indicates some kind of problem.
The other alternative would be to utilize OEManagement which would give you this kind of detailed information.
Detailed logging can give you some of this information. However, this would likely generate very large log files, and cause significant performance penalties. This can sometimes lead to an actual change in the behaviour that you are trying to observe.
To narrow this down a bit, I would suggest analyzing the application requirements of starting new agents (especially startup procedures), and what your usage patterns are regarding starting agents 'on demand. That is, do you employ an relative static agent startup policy (e.g. prestart all/most of your agents and keep them running) or is it more dynamic (e.g. prestart few agents, and use an agressive auto-trim property to terminate agents that are not immediately needed). There is no right or wrong configuration in this regards, and it might be possible to tune this a bit.
That being said, I should probably ask whether these values for rqWait and rqDuration are typical, or are they anomalous, since your average times are very respectable. How long had this broker been running when this data was collected? One of your agents has been running at least 13 days. Unfortunately the statistics presented reflect behaviour since the time the broker was started, and long running times can tend to mask the effects of specific events. Restarting you broker more frequently might yield some information. You could also consider gathering the asbman query data over time in regular intervals (e.g. every 30 seconds) using an automated script. This might help identify when these long running requests occurred.
Again, you might want to consider using OEManagement to gather this level of detail.
Thanks for the detailed information. I will add ASPlumbing,DB.Connects,4GLTrace with auto script which queries Appserver on regular interval. At the same time I am fully aware that it may have consequences like performance degradation and large consumption of disk space. I will do this for short period of time.
I really appreciate your time and thanks once again.