Our replication server is complaining when trying to monitor:
F:\DATABASE\LIVE>dsrutil icmasliv -C monitor
Cannot connect to replication shared memory. Status = -1
The target is in normal processing, but hasn't received any data since the server stopped responding.
I've run Restart Server and it is now Connecting to Agents but seems to be unsuccessful in doing so. Is there anything else I can attempt before restarting the DB?
Also, is there anywhere I can look to see if I can work out why replication failed? Nothing obvious in the logs I've looked in so far.
Had to restart the DB and it's still complaining so I'm restarting the server. :/
Did you check the connectivity between both servers/databases, i.e. telnet on the target database port from the source db server, do you get a response?
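If telnet isn't installed on the source box, the same check can be sketched in a few lines of Python. This is only a rough equivalent of "telnet host port": it says whether a TCP connection can be opened, nothing about whether replication itself is healthy. The function name `can_connect` is made up for illustration, and the host and port have to be your own target server and target database port.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Run it from the source db server against the target database port, for example `can_connect("target-host", 12345)` with your real values substituted. A result of False points at the network or firewall rather than at the database.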
Unfortunately I am restricted by when the business says it's convenient to do things. The next "convenient" window being Wednesday when our AI extents would be full...
I restarted the source as that is the one which was giving the error.
Found some tell-tale problems in the Target log file, although the timings don't match completely. Target DB was up to date until 12:19 when things went south.
[2015/03/30@11:35:35.088+0100] P-4528 T-4520 I RPLA 162: (9407) Connection failure for host 192.168.125.1 port 4859 transport TCP.
[2015/03/30@11:35:35.089+0100] P-4528 T-4520 I RPLA 162: (-----) Diagnostic Dump of RPCommInfo_t - TCP/IP Poll Error:2
[2015/03/30@11:35:35.089+0100] P-4528 T-4520 I RPLA 162: (-----) 0000: 0000 0000 0000 0000 6080 4050 2811 0000 2311 0000 9411 0000 0200 0000 2400 0000
[2015/03/30@11:35:35.089+0100] P-4528 T-4520 I RPLA 162: (-----) 0020: 4a92 1100 3730 af00 0000 0000 9e26 1955 0000 0000 4021 0000 0000 0000 2c01 0000
[2015/03/30@11:35:35.089+0100] P-4528 T-4520 I RPLA 162: (-----) 0040: 0000 0000 58f0 ffff 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
[2015/03/30@11:35:35.089+0100] P-4528 T-4520 I RPLA 162: (-----) 0060: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
[2015/03/30@11:35:35.089+0100] P-4528 T-4520 I RPLA 162: (-----) 0080: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
[2015/03/30@11:35:35.089+0100] P-4528 T-4520 I RPLA 162: (-----) 00a0: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
[2015/03/30@11:35:35.089+0100] P-4528 T-4520 I RPLA 162: (-----) 00c0: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
[2015/03/30@11:35:35.089+0100] P-4528 T-4520 I RPLA 162: (-----) 00e0: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
[2015/03/30@11:35:35.089+0100] P-4528 T-4520 I RPLA 162: (-----) 0100: 0000 0000 0000 0000 0000 0000 3139 322e 3136 382e 3132 352e 3100 0000 0000 0000
[2015/03/30@11:35:35.089+0100] P-4528 T-4520 I RPLA 162: (-----) 0120: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
[2015/03/30@11:35:35.089+0100] P-4528 T-4520 I RPLA 162: (-----) 0140: 0000 0000 0000 0000 0000 0000
[2015/03/30@11:35:35.089+0100] P-4528 T-4520 I RPLA 162: (10492) A communications error -157 occurred in function rpNLA_PollListener while receiving a message.
[2015/03/30@11:35:35.089+0100] P-4528 T-4520 I RPLA 162: (11699) A TCP/IP failure has occurred. The Agent's will enter PRE-TRANSITION, waiting for connection from the Replication Server.
[2015/03/30@11:35:38.027+0100] P-4528 T-4520 I RPLA 162: (10392) Database f:\database\live\icmasliv is being replicated from database f:\database\live\icmasliv on host 192.168.125.1.
[2015/03/30@11:35:39.030+0100] P-4528 T-4520 I RPLA 162: (10671) The OpenEdge Replication Agent agent1 is beginning Recovery Synchronization at block 11913.
[2015/03/30@11:35:39.399+0100] P-4528 T-4520 I RPLA 162: (6806) Retry transaction point located at dbkey 0 note type 10 updctr 0.
[2015/03/30@11:35:39.399+0100] P-4528 T-4520 I RPLA 162: (10705) Retry point located at logical op 1 note type 70 trid 908325446.
[2015/03/30@11:35:39.720+0100] P-4528 T-4520 I RPLA 162: (10670) The Source and Target databases are synchronized. Normal processing is resuming.
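Since the timings between the logs don't line up at a glance, one option is to pull just the numbered messages out of each log with their timestamps and compare them side by side. Below is a minimal sketch, assuming every entry follows the layout seen in the extract above (timestamp in brackets, P- and T- ids, a severity letter, the RPLA tag, then the message number in parentheses). The names `LOG_LINE`, `Event` and `numbered_events` are my own, not anything shipped with OpenEdge.

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple

# Pattern inferred from the log extract in this thread; other entry layouts
# may exist in a real .lg file and would simply be skipped.
LOG_LINE = re.compile(
    r"\[(?P<ts>[^\]]+)\]\s+P-(?P<pid>\d+)\s+T-(?P<tid>\d+)\s+\w\s+"
    r"(?P<source>\w+)\s+\d+:\s+\((?P<number>\d+|-+)\)\s*(?P<text>.*)"
)


class Event(NamedTuple):
    when: datetime
    source: str
    number: str
    text: str


def numbered_events(lines: Iterable[str]) -> List[Event]:
    """Keep entries that carry a message number; drop (-----) dump rows."""
    events = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m is None or not m.group("number").isdigit():
            continue
        # Example timestamp: 2015/03/30@11:35:35.088+0100
        when = datetime.strptime(m.group("ts"), "%Y/%m/%d@%H:%M:%S.%f%z")
        events.append(
            Event(when, m.group("source"), m.group("number"), m.group("text"))
        )
    return events
```

Feeding it the lines of the source and target log files in turn and sorting the combined list on `when` should make it easier to see which side noticed the failure first.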
We would need the log data from around the time you mention: “Target DB was up to date until 12:19 when things went south”.
Is the target database "aware" that you restarted the source db/server, did it say anything about it or it kept happily ignoring the issue on source?
No, the target carried merrily on its way when the source was rebooted.
When I attempted to restart the target db I got a message saying shared memory was already in use.
So the target db stopped, but gave you an error upon starting. I've seen it sometimes take a couple of minutes to shut down completely. Is the dbname.lk still there? Can you promon the database, or does it say that there is no server for it? Is the rpagent.exe process still alive (assuming Windows platform)?
I left it 30 minutes before trying again and still no joy. The admin service log file had an error saying shared memory was already in use. DB log file showed shutdown was complete.
If (.lk isn't there AND promon says "no server..." AND rpagent.exe isn't there) then maybe the db's port is in a hanging state. You should be able to kill/force disconnect the port (with cports or TCP View, etc...)
Then start the db again.
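Two of those checks can be scripted. Here is a rough sketch that looks for a leftover .lk file and tries to bind the database port to see whether anything still holds it. The function names and the idea of passing the .lk path and port as arguments are my own, and a failed bind only tells you that something is holding the port, not what. The promon and rpagent.exe checks are left as manual steps.

```python
import os
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if this process can bind host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Something else still owns the port (or binding is not permitted).
            return False
    return True


def residual_state(lk_path: str, port: int, host: str = "0.0.0.0") -> dict:
    """Summarise leftovers that could stop the database from restarting."""
    return {
        "lk_present": os.path.exists(lk_path),
        "port_free": port_is_free(port, host),
    }
```

If `lk_present` is False and `port_free` is also False while the database is supposedly down, that would fit the hanging-port theory, and a tool like cports or TCPView can show which connection is holding it.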
Hmmmm. Back in the same scenario again. I'll try to give more info this time.
Source:
Win 2003 server 32 bit, running Progress 11.2.1 32 bit (yes I know. We are migrating to 11.5 64 bit in May).
Target:
Win 2008 R2 64 bit running Progress 11.2.1 32 bit.
Log File:
[2015/03/30@21:01:27.491+0100] P-3632 T-3628 I RPLA 162: (9407) Connection failure for host 192.168.125.1 port 2633 transport TCP.
[2015/03/30@21:01:27.492+0100] P-3632 T-3628 I RPLA 162: (-----) Diagnostic Dump of RPCommInfo_t - TCP/IP Poll Error:2
[2015/03/30@21:01:27.492+0100] P-3632 T-3628 I RPLA 162: (-----) 0000: 0000 0000 0000 0000 28a9 6700 2811 0000 2311 0000 9411 0000 0200 0000 2400 0000
[2015/03/30@21:01:27.492+0100] P-3632 T-3628 I RPLA 162: (-----) 0020: 8d6d 0000 a044 0400 0000 0000 508f 1955 0000 0000 4021 0000 0000 0000 0500 0000
[2015/03/30@21:01:27.492+0100] P-3632 T-3628 I RPLA 162: (-----) 0040: 0000 0000 58f0 ffff 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
[2015/03/30@21:01:27.492+0100] P-3632 T-3628 I RPLA 162: (-----) 0060: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
[2015/03/30@21:01:27.492+0100] P-3632 T-3628 I RPLA 162: (-----) 0080: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
[2015/03/30@21:01:27.492+0100] P-3632 T-3628 I RPLA 162: (-----) 00a0: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
[2015/03/30@21:01:27.492+0100] P-3632 T-3628 I RPLA 162: (-----) 00c0: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
[2015/03/30@21:01:27.492+0100] P-3632 T-3628 I RPLA 162: (-----) 00e0: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
[2015/03/30@21:01:27.492+0100] P-3632 T-3628 I RPLA 162: (-----) 0100: 0000 0000 0000 0000 0000 0000 3139 322e 3136 382e 3132 352e 3100 0000 0000 0000
[2015/03/30@21:01:27.492+0100] P-3632 T-3628 I RPLA 162: (-----) 0120: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
[2015/03/30@21:01:27.492+0100] P-3632 T-3628 I RPLA 162: (-----) 0140: 0000 0000 0000 0000 0000 0000
[2015/03/30@21:01:27.492+0100] P-3632 T-3628 I RPLA 162: (10492) A communications error -157 occurred in function rpNLA_PollListener while receiving a message.
[2015/03/30@21:01:27.492+0100] P-3632 T-3628 I RPLA 162: (11699) A TCP/IP failure has occurred. The Agent's will enter PRE-TRANSITION, waiting for connection from the Replication Server.
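For what it's worth, the only readable part of that diagnostic dump is the run of non-zero bytes on the 0100 row, which is plain ASCII. The sketch below decodes it. I don't know the actual layout of RPCommInfo_t, so this shows nothing more than what those particular bytes spell out, and the helper name `decode_dump_words` is made up.

```python
def decode_dump_words(words: str) -> str:
    """Decode space-separated hex words from a dump row as ASCII text."""
    raw = bytes.fromhex(words.replace(" ", ""))
    # Drop the zero padding around the text and stop at the first NUL inside it.
    return raw.strip(b"\x00").split(b"\x00", 1)[0].decode("ascii")


# The 0100 row of the dump above, copied without its offset prefix.
row_0100 = (
    "0000 0000 0000 0000 0000 0000 3139 322e "
    "3136 382e 3132 352e 3100 0000 0000 0000"
)
print(decode_dump_words(row_0100))  # prints 192.168.125.1
```

So the dump just repeats the host address already given in the 9407 message and doesn't seem to add anything about the cause.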
We are in Pre Transition.
I've restarted the server and we've gone to Performing Startup Synchronisation.
Fingers crossed it'll come back but some ideas where to look for why this is happening would be good as I don't appreciate getting alerted during the night :D
When I say restarted the server I mean I've restarted the replication server, not the whole server!
A little bit off topic, I had a support case once, and was told that replication across nonidentical OS (Windows 2003 32 bit vs Windows 2003 64 bit in my case) wasn't supported.
Just as an extra note: we have 6 databases that are replicating between the same servers. Is it at all pertinent that only one of them is failing like this?
And now the systems guys tell me they were messing around with switches last night. Nice of them to warn me.
The Progress service-pack level (as a part of the OE version) should be (or is recommended to be) the same on the source and target boxes, due to possible differences in the structure of recovery notes. I guess that can't be guaranteed if the Progress bit-ness is different, which is why it's not supported.
Regards,
George
So how do I work out what's using the shared memory on win? Got ProcMon installed. Not sure how to reconcile what I see with a particular DB.
Not procmon tho, but procexp, i.e. Process Explorer.
Got a really weird one at the moment. I'll keep it in this thread as it is related.
Source DB says that replication is in normal processing. Target DB is listening. AI notes are not being processed, as the AI files are all marked as locked. I've restarted the target DB. I don't want to restart the source unless I have to. The source hasn't responded to a Terminate Server request, and Restart Server says the server is already running. Any ideas?