Hi All,
I am currently setting up seven databases which will replicate to two targets. I am using the replication set to allow for failover and failback, whilst continuing replication to the second target.
Yesterday one of the target servers had to be rebooted, and since then the databases on this server report as Pre-transition. The source and second target are in Normal Processing.
The servers are not in use yet, so I presume that I can just restart the source and targets and both targets will then be back to Normal Processing. My question is: if this were live and transactions had continued on the source and been replicated to the second target, would the other target have to be re-baselined? Do the ai files not queue up when only one of their expected targets is communicating?
Many Thanks,
Laura
Hi,
You should check the status of AI extents on the Source database.
proutil <source-db> -C aimage list
If the AI extents have Locked status, then replication waits for your second target.
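If you want to script this check, here is a minimal sketch. It assumes that the aimage list output reports each extent with a line such as "Status:  Locked"; that layout is an assumption on my part, so adjust the pattern to what your OpenEdge version actually prints. The function name count_locked_extents is made up for illustration.

```shell
#!/bin/sh
# Count AI extents reported as Locked in "aimage list" output read from stdin.
# Assumption: each extent is reported with a line like "Status:  Locked".
# Intended use (replace the placeholder with your database name):
#   proutil <source-db> -C aimage list | count_locked_extents
count_locked_extents() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the function's exit status at 0.
    grep -i -c 'status:.*locked' || true
}
```

A non-zero count suggests the source is still holding AI extents for a target that has not caught up.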
The source and target logs should also be checked. They may explain why the server and the agent did not connect.
Instead of restarting the source and targets, try restarting only the replication server (provided all agents are started, of course).
Replication Server parameter agent-shutdown-action should be set to Recovery.
dsrutil <source-db> -C terminate server
then
dsrutil <source-db> -C restart server
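The two commands above could be wrapped in a small helper, sketched below. It uses only the dsrutil invocations already shown; the function name restart_repl_server and the DSRUTIL override variable are hypothetical names introduced for this example. Setting DSRUTIL=echo gives a dry run that prints the commands instead of executing them.

```shell
#!/bin/sh
# Path to dsrutil; override with DSRUTIL=echo for a dry run (assumed convention).
DSRUTIL="${DSRUTIL:-dsrutil}"

# Terminate and then restart the replication server for one source database.
restart_repl_server() {
    db="$1"
    if [ -z "$db" ]; then
        echo "usage: restart_repl_server <source-db>" >&2
        return 2
    fi
    # Stop here if terminate fails, so restart is not attempted blindly.
    "$DSRUTIL" "$db" -C terminate server || return 1
    "$DSRUTIL" "$db" -C restart server
}
```

Whether to continue when the terminate step fails is a judgment call; this sketch stops so that the failure can be checked in the logs first.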
Thanks for this excellent advice. Aimage list showed that all ai extents were locked on the source database. Once I terminated the server and restarted it, both targets returned to Normal Processing. Is it possible to configure my repl.properties so that this would occur automatically, without the need to restart the replication server?
In Windows, I dislike doing this as the replication server is then started in my user session, so if I log off the server it ends.
Many Thanks,
Laura
You can try to increase the value of the connect-timeout parameter in the replication server settings (source.repl.properties).
# [control.agent] Properties
#
# connect-timeout seconds Specifies for how long, in seconds, the server will attempt
# to connect to its configured agents. This property is also
# used by the server while reconnecting to the agent after
# communication has been lost.
The default is 600 seconds.
I think that when the machine hosting the first target was rebooted, the replication server tried to connect to the agent for 10 minutes, could not, and then stopped trying.
It may be enough to tell the server to try connecting to the agent again.
Next time, instead of restarting the replication server, try this command:
dsrutil <source-db> -C connectagent <agent-name>
or
dsrutil <source-db> -C connectagent ALL
This should avoid the problem of the replication server being started under your user session.
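If you have several agents, the connectagent commands above could be wrapped as in the sketch below. It relies only on the connectagent forms already shown; the function name reconnect_agents, the DSRUTIL override variable, and the agent names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Path to dsrutil; override with DSRUTIL=echo for a dry run (assumed convention).
DSRUTIL="${DSRUTIL:-dsrutil}"

# Ask the replication server to reconnect to the named agents,
# or to all agents when no agent name is given.
# Example (hypothetical names): reconnect_agents sourcedb agent1 agent2
reconnect_agents() {
    db="$1"
    shift
    if [ "$#" -eq 0 ]; then
        "$DSRUTIL" "$db" -C connectagent ALL
        return
    fi
    status=0
    for agent in "$@"; do
        # Keep going after a failure so every agent is attempted.
        "$DSRUTIL" "$db" -C connectagent "$agent" || status=1
    done
    return "$status"
}
```

Because this only asks the already running server to reconnect, the server process itself is not restarted.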