When I recreate (cleanDeploy) Sonic Domain using Sonic Deployment Manager (SDM) I get following warning:
WARN [main] [HostManagerAccessor] Communication failure with the HostManager.
Caused by: com.sonicsw.mf.comm.InvokeTimeoutException: SampoDomain.ctESB1HM:ID=HOST MANAGER - invoke()
... 6 more
This happens while SDM is cleaning up and stopping the existing containers. The deployment still completes successfully, but it takes an awful lot of time when there are many containers.
How can this warning be avoided? How does SDM know how to shut down containers on remote machines? How does it choose which host manager it should use?
Thank you for any help.
During a cleanDomain, SDM will try to clean up/remove ALL the containers belonging to the existing domain, provided it is the same domain with the same connection URLs. It identifies the host that a host manager is running on via the host manager operation getAllHostnamesAndIPs, which returns the host names for the host the HostManager is running on. If there is more than one host manager running on a host, it probably chooses one at random. From the stack trace it looks like SDM is having problems communicating with a host manager. You might want to try setting the command-line argument requestTimeout to a value in seconds higher than the default (60 seconds) to see if that makes any difference. Can you post the entire SDM output? That would give me an idea of where in the SDM processing the exception occurred. As far as you know, was the host manager running?
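For illustration, raising the timeout might look something like the sketch below. Note this is an assumption, not a confirmed syntax: the post only establishes that a requestTimeout command-line argument exists with a 60-second default; the launcher name and flag form here are placeholders you should check against your own SDM launch script.

```shell
# Hypothetical invocation -- the "sdm" launcher name and "-requestTimeout"
# flag form are assumptions; only the argument name and its unit (seconds,
# default 60) come from the discussion above. Here we triple the default
# to give slow container shutdowns more headroom during cleanDomain.
sdm cleanDomain -requestTimeout 180
```

If the warning disappears with a larger value, the shutdown calls were simply outrunning the default timeout rather than failing outright.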
Host managers were running. This timeout only occurs during cleanDomain. When running updateDomain, there are no problems.
Will post full log later.
There is one more thing I noticed in the log:
WARN: There are no machines matching the logical host(s) for the container: ctESB1HM. Logical hosts are: . Caused by model el
Proceeding with default host (e.g. this machine).
WARN: There are no machines matching the logical host(s) for the container: ctClient1HM. Logical hosts are: . Caused by model
Added the log as an attachment. Hostnames, IP addresses, and most container names have been renamed for privacy.
Also, when I clean the domain myself (kill the processes and delete the container directories) and then run SDM, it creates the domain without these proxy errors.
It looks like the host managers (or at least some of them) are part of the model. SDM shuts down all the containers in the model during a cleanDomain, and then it cannot find the host managers it has picked when it tries to perform operations on other hosts. Try taking the host managers out of the model - this is what SDM expects. Couple this with Launcher installs on the remote hosts that configure a container and a host manager after the bits are laid down, and then start the container. These remote containers will wait until they can connect to the domain created by the cleanDomain. At that point they become part of your environment, but not part of your model.
When you re-execute cleanDomain, SDM will not shut down these host managers as part of the cleanup. It will restart them automatically so they become part of the new domain, and then it will use them to perform operations on those hosts.
It looks like there is more than one host manager per host, and SDM is picking the one that it is shutting down. It is fine to have more than one host manager on a host, but it is not necessary. I think SDM picks one at random, so if the one it picks also happens to be in the model, it could get shut down before SDM uses it.