(warning) Directory Service unavailable, configuration from local cache only

Posted by sumit_saxena on 23-Feb-2012 10:50

I am using Sonic 6.1 in our production environment.

For the past few months we have been facing an issue where the Directory Service frequently becomes unavailable. Over the last few days the frequency has increased to twice a day, sometimes more.

We are using a SuSE 9 Linux environment to host the Domain Manager and the Directory Service.

When I check the Java processes on the system, I can see that the process using ds.xml is running, and the Domain Manager is also running.

Can anyone help me with this? What is the exact reason the Directory Service becomes unavailable so frequently?

What should the fix be?

I am presently using 2048 MB as the maximum Java heap. Are there any performance tuning tips for fixing this?

All Replies

Posted by kjervis on 23-Feb-2012 16:33

Hi Sumit,

There could be a number of reasons why the Domain Manager becomes non-responsive.  Two of the more common are probably:

  • Insufficient memory: If memory (or the lack of it) were the primary factor, I would typically expect to see "OutOfMemoryError: Java heap space" exceptions in the Domain Manager log. Alternatively, excessive thread usage (usually correlated with promiscuous client connections or defective thread handling in the process itself) can result in "OutOfMemoryError: unable to create new native thread" exceptions.
  • Deadlocked threads: The next time this occurs, you may wish to obtain thread dumps from the running process, assuming it is still running as indicated. These will help indicate whether the process is in a hung state and may identify the culprit. Whether it is possible to obtain these will likely depend on the version of Java being used.
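
A quick way to check the log for either of the two memory failure modes above is to count the standard JVM error messages. This is a minimal Python sketch; it assumes the Domain Manager log is a plain-text file that contains these messages verbatim, and the function name classify_oom is hypothetical.

```python
from collections import Counter
from typing import Iterable

# Standard HotSpot messages for the two failure modes described above.
HEAP_OOM = "java.lang.OutOfMemoryError: Java heap space"
THREAD_OOM = "java.lang.OutOfMemoryError: unable to create new native thread"


def classify_oom(lines: Iterable[str]) -> Counter:
    """Count heap-exhaustion vs. native-thread-exhaustion errors in log lines."""
    counts = Counter(heap=0, native_thread=0)
    for line in lines:
        if HEAP_OOM in line:
            counts["heap"] += 1
        elif THREAD_OOM in line:
            counts["native_thread"] += 1
    return counts
```

For example, open the Domain Manager log with errors="replace" and pass the file object to classify_oom. Two zero counts point away from memory exhaustion and towards the thread-dump route.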

You say the issue appears to be happening more frequently recently. Has something obvious changed in your environment in recent months that might have contributed to this?

NOTE: For these types of issues, especially where they occur in a production environment, it is generally better to open a support ticket with Progress Support and work with them towards a satisfactory resolution (http://www.progress.com/en-gb/support/index.html). That said, version 6 is quite old and quite likely outside any support/maintenance agreement.

Hope this helps

Kevin

Posted by davila on 23-Feb-2012 16:40

Is it possible that the machine where the domain manager is running is experiencing network problems? Can anyone check if a client like the SMC also has problems connecting to the domain manager at these times?

Posted by sumit_saxena on 24-Feb-2012 02:55

Kevin, thanks for the prompt response and concern.

In the log file I cannot find anything suspicious, nor can I see any warning or error related to insufficient memory. We are presently running with 8 GB of RAM on this Linux server, and the maximum Java heap size in our Domain Manager startup is set to 2 GB.

The only relevant line I can see in the log file is: (warning) Directory Service unavailable, configuration from local cache only

Attached is a listing of the running processes on this server, which I got by running ps -ef | grep java on the server.

Regarding deadlocked threads, I took a thread dump of the Java process today using kill -3 9836. Please find the attached dump file for reference.
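
Taking several dumps a fixed interval apart can be scripted. kill -3 sends SIGQUIT, and a HotSpot JVM responds by writing a thread dump to its standard output without stopping. For a Sonic container the dump lands wherever that container's console output is redirected, which is an assumption about your setup. This is a minimal Python sketch assuming a Unix host; take_thread_dumps is a hypothetical helper, and the send/sleep parameters exist only so it can be tested without signalling a real process.

```python
import os
import signal
import time
from typing import Callable


def take_thread_dumps(
    pid: int,
    count: int = 3,
    interval_s: float = 30.0,
    send: Callable[[int, int], None] = os.kill,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send SIGQUIT (the signal `kill -3` sends) to a JVM `count` times.

    A HotSpot JVM prints a thread dump to stdout on SIGQUIT and keeps running.
    Returns the number of signals sent.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    for i in range(count):
        send(pid, signal.SIGQUIT)
        # Wait between dumps, but not after the last one.
        if i < count - 1:
            sleep(interval_s)
    return count
```

For example, take_thread_dumps(9836) would send three signals to the process from the post, 30 seconds apart.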

I am not able to analyze this dump properly, so help with the analysis would be highly appreciated.

Q :

You say the issue appears to be happening more frequently recently, is there something obvious that has changed in your environment in recent months that might have contributed to this?

A :

In recent months nothing major has changed except the load, which I think is higher nowadays. But I have no idea how to measure load for Sonic.

Apart from this, we have only set up a few more ESB containers for our processes in the past.

So overall, there has been no configuration change in the Domain Manager or the Directory Service.

One question: would rebooting the Sonic domain every weekend help us in any way?

Posted by sumit_saxena on 24-Feb-2012 03:00

Maria, thanks a lot for the prompt reply and concern.

Yes, we are also not able to connect to the domain using the SMC. Regarding a possible network issue, we are also checking with our IT infrastructure team.

But one thing I want to highlight is that our Domain Manager, Directory Service, and Agent Manager are all running on the same server. In that case, can the network play any role in this warning?

I think that if there were a network issue, we might not even be able to connect to our production environment using PuTTY.

Posted by kjervis on 24-Feb-2012 04:26

Hi Sumit,

I've had a quick look at the thread dump. Usually it's best to take several of these about 30 seconds apart, so you can compare them and identify any threads that are deadlocked. However, in your case it looks to me (although I could be wrong) as if pretty much every runnable thread in the management container is locked. This would explain why the Domain Manager is not responding to requests from other containers or the SMC.
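
Once you have several dumps, one way to compare them is to look for threads whose stack is identical in every dump. This is a minimal Python sketch assuming the HotSpot text format (thread header lines start with a quoted thread name, stack frames start with "at "); both function names are hypothetical. An identical stack is only a hint, because idle pool threads waiting for work also look the same from dump to dump, so the result should be read alongside the lock information in the dumps.

```python
from typing import Dict, List, Tuple


def parse_stacks(dump: str) -> Dict[str, Tuple[str, ...]]:
    """Map each thread name to its stack frames in a HotSpot-style thread dump."""
    stacks: Dict[str, List[str]] = {}
    current = None
    for raw in dump.splitlines():
        line = raw.strip()
        if raw.startswith('"'):
            # Header line: the thread name sits between the first pair of quotes.
            # Threads sharing a name collapse into one entry in this sketch.
            current = raw[1:raw.index('"', 1)]
            stacks[current] = []
        elif current is not None and line.startswith("at "):
            stacks[current].append(line)
    return {name: tuple(frames) for name, frames in stacks.items()}


def unchanged_threads(dumps: List[str]) -> List[str]:
    """Threads with the same non-empty stack in every dump (candidates for being stuck)."""
    parsed = [parse_stacks(d) for d in dumps]
    common = set(parsed[0]).intersection(*parsed[1:])
    return sorted(
        name
        for name in common
        if parsed[0][name] and all(p[name] == parsed[0][name] for p in parsed[1:])
    )
```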

Unfortunately I cannot say (or speculate as to) why.

To answer your question: there is a good chance that taking down the management container (Domain Manager) and starting it back up will resolve the issue temporarily. However, the conditions that cause the issue will still be present, so it will only be a matter of time before it happens again.

General note: depending on your architecture, an increase in messaging load tends not to impact the management container(s) directly. This general statement assumes that dedicated containers hosting brokers handle the application messaging traffic, and that a separate management container hosts the management broker and the Directory Service. So if the broker hosted in the management container is also used for application messaging traffic, there is a greater chance that message traffic will ultimately affect the management capability.

With kind regards

Kevin

Posted by pmeadows on 24-Feb-2012 05:02

We'd need to know the broker's Build Number in addition to the version (6.1) to properly interpret the thread dump. However, as Kevin says, many of the mgmt threads are blocked, waiting (directly or indirectly) for a lock held by Task Runner 4043665. This thread is trying to publish a response to a mgmt request. From the information in the thread dump alone, I don't see any clues as to what's holding up the publish. (A series of thread dumps would help confirm that this thread really is stuck and that we're not just catching it in some transient state.) At this stage you'd be best off working with Progress Technical Support. A heap dump from the mgmt container may help.
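
The analysis in the reply above (many threads blocked, directly or indirectly, behind one Task Runner thread) can be approximated by following the "waiting to lock" and "locked" monitor lines in a dump. This is a minimal Python sketch assuming the HotSpot text format and unique thread names; blocking_roots is a hypothetical helper. It skips monitors released by Object.wait(), which HotSpot still prints as "locked" further down the waiting thread's stack, and it only sees Java monitors, not java.util.concurrent locks.

```python
import re
from collections import Counter
from typing import Dict, Set

HEADER = re.compile(r'^"([^"]+)"')
LOCK_LINE = re.compile(r"- (locked|waiting on|waiting to lock) <(0x[0-9a-fA-F]+)>")


def blocking_roots(dump: str) -> Counter:
    """For each thread at the head of a lock chain, count the threads blocked behind it."""
    locked: Dict[str, Set[str]] = {}
    waiting_on: Dict[str, Set[str]] = {}
    wants: Dict[str, str] = {}  # blocked thread -> monitor address it is trying to enter
    current = None
    for line in dump.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            locked.setdefault(current, set())
            waiting_on.setdefault(current, set())
            continue
        match = LOCK_LINE.search(line)
        if current is None or match is None:
            continue
        kind, addr = match.groups()
        if kind == "locked":
            locked[current].add(addr)
        elif kind == "waiting on":
            waiting_on[current].add(addr)
        else:
            wants[current] = addr

    # A monitor released by Object.wait() still shows as "- locked"; exclude it.
    owner = {
        addr: thread
        for thread, addrs in locked.items()
        for addr in addrs - waiting_on[thread]
    }

    roots: Counter = Counter()
    for thread in wants:
        seen = {thread}
        holder = owner.get(wants[thread])
        # Follow the chain while the holder is itself blocked; stop on a cycle (deadlock).
        while holder is not None and holder in wants and holder not in seen:
            seen.add(holder)
            holder = owner.get(wants[holder])
        if holder is not None:
            roots[holder] += 1
    return roots
```

Calling most_common() on the result lists the threads with the most waiters first, which is where a thread like Task Runner 4043665 would be expected to show up.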

Assuming the mgmt container is used purely for mgmt traffic, it might be worth reinitialising the mgmt broker's storage. This is done using the 'dbtool' utility, which you'll find described in the MQ documentation. It's just possible that something has built up in the mgmt broker's persistent storage over time and is now starting to cause problems.

Posted by pmeadows on 24-Feb-2012 08:34

Another quick thought: you could try enabling the Flow To Disk feature on the mgmt broker in case the mgmt comms are getting flow controlled.

If you do this, I'd suggest also setting a Maximum Topic DB Size (maybe 100 MB or thereabouts?) and perhaps monitoring the broker.bytes.TopicDBSize metric for a while. This is to check that, if Flow To Disk kicks in, it's only to avoid flow control during short bursts of high activity and that you're not building an ever-growing backlog of messages.
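
The burst-versus-backlog check described above can be expressed as a simple heuristic over a series of broker.bytes.TopicDBSize readings. This is a minimal Python sketch: it assumes the readings were already collected at regular intervals by whatever means you use to read the metric (not shown), and both the function name and the "drained" threshold are hypothetical.

```python
from typing import Sequence


def looks_like_backlog(samples: Sequence[float], drained_below: float = 0.0) -> bool:
    """Heuristic: a backlog never drains and trends upward; a burst drains back down.

    samples: TopicDBSize readings in bytes, oldest first, taken at regular intervals.
    drained_below: size at or below which the store counts as drained (assumed threshold).
    """
    if len(samples) < 2:
        return False
    # If the store emptied out at any point after the first reading, treat it as bursty.
    if min(samples[1:]) <= drained_below:
        return False
    # Least-squares slope of size against sample index; positive means growing.
    n = len(samples)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den > 0
```

A True result over a long window, especially with readings approaching the configured Maximum Topic DB Size, would suggest the backlog case rather than short bursts.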

Posted by sumit_saxena on 24-Feb-2012 09:25

The broker build version is 2.1.1 and the config version is 100. The maximum management thread pool size for the management container is 50.

The management broker does not host any queues or topics for message flow. It handles management work and hosts the Directory Service and the Agent Manager.

We have a clustered broker with backup for message servicing. Only JNDI clients, JMS, JMX, and ESB containers connect to the management broker.

Posted by sumit_saxena on 27-Feb-2012 13:12

I have restored the DS from my latest dump of the DS configuration using the dsAdmin tool.

But after that, when I start the DM, the log does not show the message saying that the Directory Service is available and the local cache has been reconciled.

Can anyone tell me what the problem might be?

Posted by davila on 27-Feb-2012 13:29

You will only get the message about cache reconciliation if, in that same run of the MF container, the container stopped getting its configuration from the DS because the DS became unavailable. So, if the DS was always available for the DM container to get configuration information, it just reconciles the cache without writing a message about it. You can post the text of the DM log after your restart so we can make sure it looks OK.

Posted by pmeadows on 05-Mar-2012 04:21

Hi Sumit,

Did you try enabling Flow To Disk on the management broker?  If so, any indication as to whether it helped?

There was also the earlier suggestion of reinitializing the mgmt broker's storage using the dbtool utility. Note that I'm referring to the mgmt broker's storage (losing information about in-flight mgmt messages shouldn't be a problem), not the DS storage. But I'd be tempted to try the Flow To Disk suggestion first, since it's the least intrusive.

Thanks, Paul.

Posted by sumit_saxena on 26-Mar-2012 04:33

Hello Paul

Just to update you: the DS has been stable for the last 20+ days.

This thread is closed