Appserver agents.....where does the memory go?!

Posted by chrisrichardson on 29-Jul-2010 06:34

Over the course of a couple of days swap space gets swallowed up, resulting in the following error in the messages file:

 WARNING: /tmp: File system full, swap space limit exceeded

Swap is showing:

root@server # swap -s
total: 39085584k bytes allocated + 11687120k reserved = 50772704k used, 7612008k available


with the available figure constantly decreasing. For example, in the last 10 minutes it has dropped by 809 MB:

root@server # swap -s
total: 39871152k bytes allocated + 11727856k reserved = 51599008k used, 6783264k available

This seems to be caused by the AppServer/WebSpeed agents: when I trim them, the memory is released and /tmp gets its space back.  I could add swap space, but my concern is that the agents will just continue to gobble up memory - so should I really be looking at the code the AppServer/WebSpeed agents are running and at the configuration of the AppServers?
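One quick way to track the decline is to log the "available" figure from `swap -s` over time. A minimal sketch, assuming the Solaris output format pasted above (the parsing is naive and would need adjusting if the format differs):

```shell
# Extract the available-swap figure (in kB) from Solaris `swap -s` output.
# Format assumed: "total: ...k used, NNNNNNNk available"
parse_swap_avail() {
    awk -F', ' '{ sub(/k available.*/, "", $2); print $2 }'
}

# Example with the figures from the first post:
echo "total: 39085584k bytes allocated + 11687120k reserved = 50772704k used, 7612008k available" \
    | parse_swap_avail    # prints 7612008

# On the server itself, something like this could be run from cron every
# 10 minutes to build a trend line:
#   swap -s | parse_swap_avail >> /var/tmp/swap_watch.log
```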

For example, the heap size for a long-running agent (c. 24 hrs) is 458752k, while for an agent just started it is 16384k.  (I've attached the full details of each process.)

We are running 10.2B on Solaris 10.  Server: Sun T5440, 64 GB RAM.

Does anyone have any pointers/appserver configuration suggestions?

Thanks,

Chris

pmap.txt.zip

All Replies

Posted by Gangs on 29-Jul-2010 10:42

We had a big memory leak in our WebSpeed agents last year.  After talking with Progress, we went through the code, found every place where a widget-pool was created, and made sure we deleted it afterwards.

It helped a lot.

Posted by chrisrichardson on 29-Jul-2010 10:59

Thanks for this....

Just looking at the code, 99% of our WebSpeed .p's have CREATE WIDGET-POOL in the definitions section.  Does this mean we should explicitly delete the widget-pool, say at the end of the main block?

I should add - this has only happened since upgrading from 10.1b to 10.2.

Thanks

Chris

Posted by Tim Kuehn on 29-Jul-2010 12:24

Unnamed widget-pools go away when they go out of scope. Named widget-pools do not.
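To illustrate the difference, a minimal sketch (pool name, table name and handle are illustrative): the unnamed pool is scoped to the procedure and cleaned up automatically when it ends, while the named persistent pool survives the procedure and must be deleted explicitly - a classic leak source in long-running agents.

```abl
/* Unnamed pool: scoped to this .p, deleted automatically on return. */
CREATE WIDGET-POOL.

/* Named persistent pool: lives on after this procedure returns.      */
CREATE WIDGET-POOL "myPool" PERSISTENT.

DEFINE VARIABLE hBuf AS HANDLE NO-UNDO.
CREATE BUFFER hBuf FOR TABLE "customer" IN WIDGET-POOL "myPool".

/* ... use hBuf ... */

/* Without this, hBuf (and everything else in "myPool") stays
   allocated in the agent until it is trimmed or restarted:           */
DELETE WIDGET-POOL "myPool".
```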

Posted by chrisrichardson on 30-Jul-2010 09:36

So I'm basically looking for dynamic TTs, objects that are not cleaned up, and persistent procedures that may be left hanging around?
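For each of those suspects the cleanup pattern looks roughly like this (a sketch only - the object names and the library .p are made up for illustration):

```abl
DEFINE VARIABLE hTT    AS HANDLE NO-UNDO.
DEFINE VARIABLE hQuery AS HANDLE NO-UNDO.
DEFINE VARIABLE hProc  AS HANDLE NO-UNDO.

/* 1. Dynamic temp-table */
CREATE TEMP-TABLE hTT.
hTT:ADD-NEW-FIELD("custNum", "INTEGER").
hTT:TEMP-TABLE-PREPARE("ttCust").
/* ... use it ... */
DELETE OBJECT hTT NO-ERROR.

/* 2. Dynamic query */
CREATE QUERY hQuery.
/* ... use it ... */
DELETE OBJECT hQuery NO-ERROR.

/* 3. Persistent procedure */
RUN somelib.p PERSISTENT SET hProc.   /* hypothetical library .p */
/* ... use its internal procedures ... */
IF VALID-HANDLE(hProc) THEN DELETE PROCEDURE hProc.
```

Anything created but never deleted accumulates in the agent for as long as it runs, which is exactly what shows up as ever-growing heap sizes.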

Chris

Posted by Admin on 30-Jul-2010 09:55

Those are the usual suspects....

Posted by rbf on 30-Jul-2010 15:38

 

I should add - this has only happened since upgrading from 10.1b to 10.2.

In that case your code is OK and you should contact Tech Support immediately.

Posted by Tim Kuehn on 03-Aug-2010 10:20

rbf wrote:

I should add - this has only happened since upgrading from 10.1b to 10.2.

In that case your code is OK and you should contact Tech Support immediately.

Also check the release notes for 10.2 to see if something was changed from 10.1B to 10.2 that could result in this.

Posted by chrisrichardson on 09-Aug-2010 02:45

Thanks all for the input.  Back at work today so will have a look this week and report back with findings.

Chris

Posted by kborn on 09-Aug-2010 03:58

If the tmp directory is getting full, please verify whether you use the -t parameter. On Unix systems this is sometimes used to make the Progress temp files visible. If -t (lowercase t) is used, the automatic cleanup of these files (lbi*) will not work. See KB entry P107512.
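A quick way to check for that is to look for orphaned Progress temp files in the agents' -T directory. A sketch, assuming the directory is /tmp and the usual Progress temp-file prefixes (lbi*, srt*, dbi*):

```shell
# List any Progress temp files left behind in the -T directory
# (directory and prefixes assumed; adjust to your startup parameters).
tdir=${PROGRESS_TMPDIR:-/tmp}
find "$tdir" -maxdepth 1 -type f \
    \( -name 'lbi*' -o -name 'srt*' -o -name 'dbi*' \) -exec ls -l {} +
```

Files belonging to agent processes that are no longer running are candidates for the missing cleanup described in the KB entry.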

Posted by chrisrichardson on 31-Aug-2010 03:19

With advice from Progress support, it turned out that a handful of procedures and dynamic objects were not being cleaned up properly on the AppServer and WebSpeed agents. Within an hour the AppServer was running 9,000 instances of a .p persistently across 15 agents!
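For anyone wanting to spot this kind of build-up from inside an agent, the procedure chain can be walked off the session handle. A hypothetical diagnostic .p (run on the agent, e.g. via a debug request):

```abl
/* Count the persistent procedures currently loaded in this agent. */
DEFINE VARIABLE hProc AS HANDLE  NO-UNDO.
DEFINE VARIABLE iCnt  AS INTEGER NO-UNDO.

hProc = SESSION:FIRST-PROCEDURE.
DO WHILE VALID-HANDLE(hProc):
    IF hProc:PERSISTENT THEN
        iCnt = iCnt + 1.
    hProc = hProc:NEXT-SIBLING.
END.

MESSAGE iCnt "persistent procedures loaded in this agent.".
```

A count that only ever grows between requests is a strong hint that something is RUNning procedures persistently without a matching DELETE PROCEDURE.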

One WebSpeed procedure was missing the CREATE WIDGET-POOL statement, which meant that the dynamic objects created by the WebSpeed .i's were never cleaned up. This happened to be a procedure called thousands of times a day.  No wonder memory was disappearing!

To track it down I enabled enhanced logging on the AppServer by adding "DynObjects.*:4" to the server logging entry types. This logged every dynamic object being created or deleted, including the handle number. From that I could work out which procedures needed attention.
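For reference, that setting normally goes into the broker's entry in ubroker.properties (broker name and the other entry types shown here are illustrative; only the DynObjects part is from the post):

```
[UBroker.AS.asbroker1]
    ...
    srvrLogEntryTypes=ASPlumbing,DynObjects.*:4
```

The `:4` suffix is the logging level for that entry type; the broker needs a restart (or a dynamic logging update via the management tools) for it to take effect.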

Attached is a before/after shot of agent memory usage.

Cheers for your help.

Chris


Posted by rbf on 31-Aug-2010 17:00

Surely you are not saying that this problem was a result of your upgrade to 10.2?

My bet is that a code change was introduced at the same time and that was the real issue.

Posted by chrisrichardson on 01-Sep-2010 02:10

Nope, it was definitely our code.

The upgrade to 10.2B was one of many things that changed around that time.  In addition we had added 250 high-throughput users to the system, so I think that is what brought the memory issue to light.

Chris

This thread is closed