Error (10716) after unexpected server shutdown - RPLS: seque

Posted by Dmitry Lishafaev on 25-Mar-2015 16:27

I have two AIX servers, Server-Tgt and Server-Src, and a 600 GB archive database that has almost no transaction activity (except for data archiving, one day per month). The database is replicated from Server-Src to Server-Tgt in async mode.

Today Server-Src ran out of virtual memory, hung, and was rebooted manually.

db log:

[2015/03/25@09:24:23.832+0300] P-9371908 T-1 I AIMGT 5: (3778) This is after-image file number 850 since the last AIMAGE BEGIN

[2015/03/25@09:24:24.002+0300] P-5833054 T-1 I RPLS 14: (11805) Unlocking after-image file 848 and locking ALL FULL after-image files beginning with file 849.
[2015/03/25@09:24:28.850+0300] P-9371908 T-1 I AIMGT 5: (13199) After-image extent /renmsg/ai/renmsg.a4 has been copied to /backup/tape/ai/oss~dbase~renmsg.20150218.182948.00000849.renmsg.a4.
[2015/03/25@09:24:28.851+0300] P-9371908 T-1 I AIMGT 5: (13154) Marked after-image extent /renmsg/ai/renmsg.a4 ARCHIVED.
[2015/03/25@09:24:28.868+0300] P-9371908 T-1 I AIMGT 5: (3789) Marked after-image extent /renmsg/ai/renmsg.a3 EMPTY.
[2015/03/25@10:24:28.910+0300] P-9371908 T-1 I AIMGT 5: (3777) Switched to ai extent /renmsg/ai/renmsg.a1.
[2015/03/25@10:24:28.910+0300] P-9371908 T-1 I AIMGT 5: (3778) This is after-image file number 851 since the last AIMAGE BEGIN
[2015/03/25@10:24:29.056+0300] P-5833054 T-1 I RPLS 14: (11805) Unlocking after-image file 849 and locking ALL FULL after-image files beginning with file 850.
[2015/03/25@10:24:33.932+0300] P-9371908 T-1 I AIMGT 5: (13199) After-image extent /renmsg/ai/renmsg.a5 has been copied to /backup/tape/ai/oss~dbase~renmsg.20150218.182948.00000850.renmsg.a5.
[2015/03/25@10:24:33.933+0300] P-9371908 T-1 I AIMGT 5: (13154) Marked after-image extent /renmsg/ai/renmsg.a5 ARCHIVED.
[2015/03/25@10:24:33.953+0300] P-9371908 T-1 I AIMGT 5: (3789) Marked after-image extent /renmsg/ai/renmsg.a4 EMPTY.
[2015/03/25@11:01:21.395+0300] P-7930094 T-1 I SRV 2: (739) Logout usernum 21, userid appsrv1, on Server-Src batch.

Note: AI sequence 850 was locked before the reboot (at 10:24) and was archived by AIMGT at the same time (the AIMGT policy is one AI extent per hour).

Server-Src ran out of swap space at 11:18 and was rebooted after 11:20.

During the reboot, I got this on Server-Tgt:
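To cross-check the lock and archive events in a longer db log, a small script can pull the sequence numbers out of the (11805) and (13199) messages. This is only a sketch: the regular expressions assume the message layout shown in the excerpt above, and the names `summarize_ai_activity` and `SAMPLE` are hypothetical, not part of any OpenEdge tooling.

```python
import re

# Assumption: message text matches the 10.2B08 log excerpt quoted in this post.
LOCK_RE = re.compile(
    r"\(11805\) Unlocking after-image file (\d+) and locking ALL FULL "
    r"after-image files beginning with file (\d+)"
)
# The archived file name carries the AI sequence as an 8-digit field
# just before "<dbname>.a<N>".
ARCHIVE_RE = re.compile(
    r"\(13199\) After-image extent (\S+) has been copied to "
    r"\S*\.(\d{8})\.[^./\s]+\.a\d+"
)


def summarize_ai_activity(log_lines):
    """Return (first sequence still locked by RPLS, list of archived sequences)."""
    locked_from = None
    archived = []
    for line in log_lines:
        match = LOCK_RE.search(line)
        if match:
            # Keep the most recent lock message only.
            locked_from = int(match.group(2))
            continue
        match = ARCHIVE_RE.search(line)
        if match:
            archived.append(int(match.group(2)))
    return locked_from, archived


# Lines copied from the log excerpt above.
SAMPLE = [
    "[2015/03/25@09:24:24.002+0300] P-5833054 T-1 I RPLS 14: (11805) Unlocking after-image file 848 and locking ALL FULL after-image files beginning with file 849.",
    "[2015/03/25@09:24:28.850+0300] P-9371908 T-1 I AIMGT 5: (13199) After-image extent /renmsg/ai/renmsg.a4 has been copied to /backup/tape/ai/oss~dbase~renmsg.20150218.182948.00000849.renmsg.a4.",
    "[2015/03/25@10:24:29.056+0300] P-5833054 T-1 I RPLS 14: (11805) Unlocking after-image file 849 and locking ALL FULL after-image files beginning with file 850.",
    "[2015/03/25@10:24:33.932+0300] P-9371908 T-1 I AIMGT 5: (13199) After-image extent /renmsg/ai/renmsg.a5 has been copied to /backup/tape/ai/oss~dbase~renmsg.20150218.182948.00000850.renmsg.a5.",
]

if __name__ == "__main__":
    locked_from, archived = summarize_ai_activity(SAMPLE)
    print("locked from sequence:", locked_from)
    print("archived sequences:", archived)
```

The script only reports what the log says; it does not decide whether an extent should still have been available to the replication server.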

[2015/03/25@11:36:56.621+0300] P-10682534 T-1 I RPLA 5: (11699) A TCP/IP failure has occurred. The Agent's will enter PRE-TRANSITION, waiting for connection from the Replication Server.

After the DB started on Server-Src, I got:

[2015/03/25@11:51:04.393+0300] P-7405852 T-1 I BROKER 0: (13875) This database is enabled for OpenEdge Replication as a Source database....

[2015/03/25@11:51:07.509+0300] P-6684840 T-1 I RPLS 5: (10507) The Fathom Replication Server has successfully connected to the Fathom Replication Agent agent1 on host Server-Tgt.
[2015/03/25@11:51:07.509+0300] P-6684840 T-1 I RPLS 5: (11251) The Replication Server successfully connected to all of it's configured Agents.
[2015/03/25@11:51:07.649+0300] P-6684840 T-1 I RPLS 5: (10716) Fathom Replication Agent agent1 cannot be configured because the required AI area (area-number>13, sequence number 850 is not available.
[2015/03/25@11:51:07.649+0300] P-6684840 T-1 I RPLS 5: (11696) The Agent agent1 cannot be properly configured and is being terminated.
[2015/03/25@11:51:07.650+0300] P-6684840 T-1 I RPLS 5: (10700) The Fathom Replication Agent agent1 is being terminated.
[2015/03/25@11:51:07.650+0300] P-6684840 T-1 I RPLS 5: (10504) Unexpected error -158 returned to function rpSRV_ServerLoop.

But the current sequence number on Server-Tgt is 851 (I ran this command only an hour ago, not immediately):

-bash-4.2# dsrutil msgtgt -C recovery agent

Online Replication Recovery Information for /oss/dbase/msgtgt:
Replication version: 5.0
Date created: Wed Feb 18 23:04:00 2015
Date last written: Wed Mar 25 21:44:18 2015

Replication local agent information:
Last Block: Complete
Last block received location: area: 7, seq: 0, loc: 0, offset: 0
Last block processed location: area: 0, seq: 0, loc: 0, offset: 0
Last block ACKed location: area: 7, seq: 851, loc: 0, offset: 128
Last block received: no date
Last block ACKed: no date
ID of the last TX begin: 139660314
ID of the last TX end: 139660314
Time of last TX end: Wed Mar 11 18:37:44 2015
Last AI Extent processed
AIMAGE BEGIN date: Wed Feb 18 18:29:48 2015
AIMAGE NEW date: Wed Mar 25 10:24:28 2015
After-Image File Number: 851
File Last Opened: Wed Mar 25 10:24:28 2015
Completely Applied to Target: No


I can run dsrutil applyextent only with sequence numbers 851 and above.

I decided to re-enable replication (via backup/restore), but maybe there are other solutions?
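The mismatch between what the server asks for and what the agent has acknowledged can be made explicit by comparing the sequence in the (10716) message with the "Last block ACKed location" line from the recovery output. The following is a minimal sketch; the regular expressions assume the text layout shown in this post, and the function names are hypothetical.

```python
import re

# Assumption: layouts match the dsrutil output and (10716) message quoted above.
ACKED_RE = re.compile(
    r"Last block ACKed location:\s*area:\s*(\d+),\s*seq:\s*(\d+)"
)
REQUIRED_RE = re.compile(
    r"\(10716\).*sequence number (\d+) is not available"
)


def last_acked_sequence(recovery_text):
    """AI sequence from the agent's 'Last block ACKed location' line, or None."""
    match = ACKED_RE.search(recovery_text)
    return int(match.group(2)) if match else None


def required_sequence(log_line):
    """AI sequence the replication server reports as unavailable, or None."""
    match = REQUIRED_RE.search(log_line)
    return int(match.group(1)) if match else None


# Text copied from this post.
RECOVERY_TEXT = """Last block received location: area: 7, seq: 0, loc: 0, offset: 0
Last block processed location: area: 0, seq: 0, loc: 0, offset: 0
Last block ACKed location: area: 7, seq: 851, loc: 0, offset: 128
Last block ACKed: no date
"""
ERROR_LINE = (
    "[2015/03/25@11:51:07.649+0300] P-6684840 T-1 I RPLS 5: (10716) Fathom "
    "Replication Agent agent1 cannot be configured because the required AI "
    "area (area-number>13, sequence number 850 is not available."
)

if __name__ == "__main__":
    acked = last_acked_sequence(RECOVERY_TEXT)
    required = required_sequence(ERROR_LINE)
    print("agent ACKed sequence:", acked)
    print("server requires sequence:", required)
```

With the values from this thread, the agent has acknowledged a block in sequence 851 while the server asks for sequence 850, which is the inconsistency described above.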

10.2B08/AIX

All Replies

Posted by Libor Laubacher on 26-Mar-2015 10:01

> I decided to re-enable replication (via backup/restore), but maybe there are another solutions?

I am afraid not; unfortunately, it’s a bug.

Posted by Dmitry Lishafaev on 26-Mar-2015 10:16

Thank you.

Is this a known bug?

Is it fixed in v11.5 (our new platform in a couple of months)?
