Replication target error 2329, source is fine

Posted by James Palmer on 30-Jun-2017 10:52

My replication target is crashing with 

SYSTEM ERROR: Invalid block 609150 for file D:\index\databases\live\audit_8.d1, max is 609087

Source is fine. It's an index area. 

Source:

d "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d1 f 2048000
d "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d2 f 2048000
d "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d3 f 2048000
d "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d4 f 2048000
d "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d5 f 2048000
d "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d6 f 2048000
d "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d7 f 2048000
d "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d8 f 2048000
d "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d9 f 2048000
d "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d10 f 2048000
d "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d11

Target: 

d "Indexes":8,8;64 D:\index\databases\live\audit_8.d1 

RPB is the same. How can this happen? I assume the only solution is to reseed replication?

OE 10.2B

All Replies

Posted by George Potemkin on 30-Jun-2017 12:31

Db blocksize is 4K?

block 609150 (and block 609087) => 2.3 GB

Size of the source area is 20 GB.

What is the size of audit_8.d1 on target db?

IMHO, more likely it's an index corruption on target db.
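The block numbers in the two 2329 errors reported in this thread can be checked with a little arithmetic. The sketch below is only an illustration: the 4K block size is the assumption George asks about (it is not confirmed anywhere in the thread), the cluster size of 64 is read from the ";64" in the .st lines, and the function name `describe` is made up for this example.

```python
# (invalid block, reported max) pairs copied from the 2329 errors in this thread.
ERRORS = [
    (609150, 609087),  # first report
    (729342, 729279),  # second report
]

CLUSTER_SIZE = 64   # from the ";64" in the .st lines
BLOCK_SIZE = 4096   # ASSUMPTION: 4K database block size, not confirmed in the thread


def describe(invalid_block, max_block,
             cluster_size=CLUSTER_SIZE, block_size=BLOCK_SIZE):
    """Return simple arithmetic facts about one 'Invalid block' error."""
    return {
        # How far past the reported maximum the requested block lies.
        "blocks_past_max": invalid_block - max_block,
        # Whether max + 1 is a whole number of clusters.
        "max_plus_one_cluster_aligned": (max_block + 1) % cluster_size == 0,
        # Approximate position of the requested block within the area, in GiB.
        "offset_gib": round(invalid_block * block_size / 2**30, 2),
    }


results = [describe(invalid, maximum) for invalid, maximum in ERRORS]
for result in results:
    print(result)
```

In both errors the requested block is 63 blocks past the reported maximum, and the maximum plus one is a multiple of the cluster size. That is only an observation about the numbers; it does not by itself establish the cause.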

Posted by cjbrandt on 30-Jun-2017 21:42

Are the structure files of the source and target the same? I haven't used replication in a long time, so I don't recall whether the target can differ from the source.

Posted by James Palmer on 27-Jul-2017 18:25

So this has happened again, for the same customer.

Source

[2017/07/27@23:02:28.118+0100] P-7480       T-6060  I RPLS   26: (9407)  Connection failure for host cmlsql03 port 4509 transport TCP. 
[2017/07/27@23:02:28.119+0100] P-7480       T-6060  I RPLS   26: (11713) A communications error -4004 in rpCOM_SendMsg. 
[2017/07/27@23:02:28.119+0100] P-7480       T-6060  I RPLS   26: (-----) Diagnostic Dump of RPCommInfo_t - TCP/IP Send Error
[2017/07/27@23:02:28.119+0100] P-7480       T-6060  I RPLS   26: (-----) 0000:  c088 fc01 0000 0000 0000 0000 9d11 0000 e489 0000 e489 0000 0200 0000 4200 0000 
[2017/07/27@23:02:28.119+0100] P-7480       T-6060  I RPLS   26: (-----) 0020:  ea05 0000 9a00 0000 0000 0000 ee62 7a59 0000 0000 3c41 0000 0000 0000 0000 0000 
[2017/07/27@23:02:28.119+0100] P-7480       T-6060  I RPLS   26: (-----) 0040:  0000 0000 baff ffff 4627 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 
[2017/07/27@23:02:28.119+0100] P-7480       T-6060  I RPLS   26: (-----) 0060:  0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 
[2017/07/27@23:02:28.119+0100] P-7480       T-6060  I RPLS   26: (-----) 0080:  0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 
[2017/07/27@23:02:28.119+0100] P-7480       T-6060  I RPLS   26: (-----) 00a0:  0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 
[2017/07/27@23:02:28.119+0100] P-7480       T-6060  I RPLS   26: (-----) 00c0:  0000 0000 0000 0000 0000 0000 636d 6c73 716c 3033 0000 0000 0000 0000 0000 0000 
[2017/07/27@23:02:28.119+0100] P-7480       T-6060  I RPLS   26: (-----) 00e0:  0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 
[2017/07/27@23:02:28.119+0100] P-7480       T-6060  I RPLS   26: (-----) 0100:  0000 0000 0000 0000 0000 0000 3137 322e 3136 2e31 342e 3732 0000 0000 0000 0000 
[2017/07/27@23:02:28.119+0100] P-7480       T-6060  I RPLS   26: (-----) 0120:  0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 
[2017/07/27@23:02:28.119+0100] P-7480       T-6060  I RPLS   26: (-----) 0140:  0000 0000 0000 0000 0000 0000 
[2017/07/27@23:02:28.120+0100] P-7480       T-6060  I RPLS   26: (10491) A communications error -155 occurred in function rpNLS_SendAIBlockToAgent while sending AIBLOCK. 
[2017/07/27@23:02:28.120+0100] P-7480       T-6060  I RPLS   26: (10661) The Fathom Replication Server is beginning recovery for agent l_idx41_audit. 
[2017/07/27@23:02:28.120+0100] P-7480       T-6060  I RPLS   26: (10842) Connecting to Fathom Replication Agent l_idx41_audit.


Target

[2017/07/27@23:02:22.688+0100] P-2636       T-3216  I RPLA   26: (2329)  SYSTEM ERROR: Invalid block 729342 for file D:\index\databases\live\audit_8.d1, max is 729279 
[2017/07/27@23:02:22.692+0100] P-2636       T-3216  F RPLA   26: (612)   SYSTEM ERROR: Possible file truncation, 729342 too big for database. 
[2017/07/27@23:03:12.302+0100] P-5516       T-5520  I WDOG   28: (2527)  Disconnecting dead user 26. 
[2017/07/28@00:16:52.659+0100] P-964        T-5460  I RPLU       (10829) Database connections are not allowed at this time. 
[2017/07/28@00:16:52.709+0100] P-964        T-5460  I RPLU       (10829) Database connections are not allowed at this time. 
[2017/07/28@00:16:52.759+0100] P-964        T-5460  I RPLU       (10829) Database connections are not allowed at this time. 
[2017/07/28@00:16:52.809+0100] P-964        T-5460  I RPLU       (10829) Database connections are not allowed at this time. 
[2017/07/28@00:16:52.859+0100] P-964        T-5460  I RPLU       (10829) Database connections are not allowed at this time. 
[2017/07/28@00:16:52.909+0100] P-964        T-5460  I RPLU       (10829) Database connections are not allowed at this time. 
[2017/07/28@00:16:52.959+0100] P-964        T-5460  I RPLU       (10829) Database connections are not allowed at this time. 
[2017/07/28@00:16:53.009+0100] P-964        T-5460  I RPLU       (10829) Database connections are not allowed at this time. 
[2017/07/28@00:16:53.059+0100] P-964        T-5460  I RPLU       (10829) Database connections are not allowed at this time. 
[2017/07/28@00:16:53.109+0100] P-964        T-5460  I RPLU       (10829) Database connections are not allowed at this time. 
[2017/07/28@00:16:53.159+0100] P-964        T-5460  I RPLU       (10829) Database connections are not allowed at this time. 
[2017/07/28@00:16:53.209+0100] P-964        T-5460  I RPLU       (10829) Database connections are not allowed at this time. 
[2017/07/28@00:16:53.259+0100] P-964        T-5460  I RPLU       (10829) Database connections are not allowed at this time. 
[2017/07/28@00:16:53.309+0100] P-964        T-5460  I RPLU       (10829) Database connections are not allowed at this time. 
[2017/07/28@00:16:53.359+0100] P-964        T-5460  I RPLU       (10829) Database connections are not allowed at this time. 
[2017/07/28@00:16:53.409+0100] P-964        T-5460  I RPLU       (10829) Database connections are not allowed at this time. 
[2017/07/28@00:16:53.459+0100] P-964        T-5460  I RPLU       (10829) Database connections are not allowed at this time. 
[2017/07/28@00:16:53.509+0100] P-964        T-5460  I RPLU       (10829) Database connections are not allowed at this time. 
[2017/07/28@00:16:53.559+0100] P-964        T-5460  I RPLU       (10829) Database connections are not allowed at this time. 
[2017/07/28@00:16:53.609+0100] P-964        T-5460  I RPLU       (10829) Database connections are not allowed at this time. 
[2017/07/28@00:16:53.659+0100] P-964        T-5460  I RPLU       (10829) Database connections are not allowed at this time. 
[2017/07/28@00:16:53.659+0100] P-964        T-5460  I RPLU       (10829) Database connections are not allowed at this time. 
[2017/07/28@00:16:53.699+0100] P-964        T-5460  I REPL     : (10429) The user failed to connect to database D:\index\databases\live\audit with error -1 in rpDB_OpenDatabase. 
[2017/07/28@00:16:53.699+0100] P-964        T-5460  I REPL     : (10717) The Fathom Replication Utility cannot connect to database D:\index\databases\live\audit. 



Source and target structures are the same, everything I can think of is the same. How can you get index corruption on the target and not the source?

Posted by cjbrandt on 27-Jul-2017 20:01

I used to get a similar error, but it was when rolling forward AI files using rfutil, not replication. We fixed it by adding a new extent (so audit_8.d2 on the target). After we did this, we could continue rolling forward AI files. It was on versions of 10.2B.

Posted by James Palmer on 28-Jul-2017 03:07

That's interesting. Do you reckon if I add an extent the DB will just work again?

Unfortunately in this case it will be useless as I disabled Replication so they didn't get system down in the meantime.

Posted by Dapeng Wu on 31-Jul-2017 10:58

This error indicates a space allocation error on target. And I agree with cjbrandt that you should fix the .st files on target so that they match the ones on source.

You should also make sure large files are enabled on both hosts, and there is enough disk space available on target.

BTW, the best practice on the RPB value for index areas is to set it to 1.

Dapeng

Posted by James Palmer on 31-Jul-2017 14:40

Setting RPB to 1 on an index area is dangerous. It's fine if you can guarantee you get no data objects in there, but until Progress allow you to specify that a storage area is for indexes only I will not be using RPB 1.

As for the .st files, the structure is the same. The extents are not, but that doesn't matter.

Large files have to be enabled on both. You can't replicate if it's not on for one or the other.
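The distinction between "same structure" and "same extents" can be made concrete by comparing the area definitions in the two .st listings while ignoring the extent layout. This is a minimal sketch, assuming .st lines shaped like the ones quoted in the first post and paths without spaces; the regular expression and the helper names (`parse_st_line`, `area_signature`) are illustrative, not part of any Progress tooling.

```python
import re

# Matches lines such as:
#   d "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d1 f 2048000
# ASSUMPTION: the path contains no spaces; the cluster size (";64") is optional.
ST_LINE = re.compile(
    r'^(?P<type>[abdt])\s+"(?P<area>[^"]+)":(?P<num>\d+),(?P<rpb>\d+)'
    r'(?:;(?P<cluster>\d+))?\s+(?P<path>\S+)'
    r'(?:\s+(?P<kind>[fv])\s+(?P<size>\d+))?\s*$'
)


def parse_st_line(line):
    """Parse one data-extent line into a dict, or return None if it does not match."""
    match = ST_LINE.match(line.strip())
    return match.groupdict() if match else None


def area_signature(lines):
    """Set of (area name, area number, RPB, cluster size), ignoring extents."""
    parsed = [p for p in map(parse_st_line, lines) if p]
    return {(p["area"], int(p["num"]), int(p["rpb"]), p["cluster"]) for p in parsed}


# Two source lines and the single target line, copied from the first post.
source = [
    r'd "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d1 f 2048000',
    r'd "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d11',
]
target = [
    r'd "Indexes":8,8;64 D:\index\databases\live\audit_8.d1 ',
]

same_areas = area_signature(source) == area_signature(target)
same_extent_count = len(source) == len(target)
print("area definitions match:", same_areas)
print("extent counts match:", same_extent_count)
```

With the lines from this thread, the area definitions (name, number, RPB, cluster size) match while the number of extents does not, which is the situation described above.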

Posted by George Potemkin on 01-Aug-2017 00:21

> Setting RPB to 1 on an index area is dangerous

<Religious war is on>

If someone accidentally puts a data object in there, then the area will indeed grow abnormally fast. It's dangerous if the DBA is unable to notice such anomalies, but why should RPB be blamed?

After each schema update we can easily check that no mixed object types (tables, indexes, LOBs) are located in the same area. There's no need to wait until Progress embeds such checks in the Data Dictionary.

<Religious war is off>
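The kind of check described above (no area holding more than one object type) is simple to express once the object-to-area mapping has been exported. The sketch below is hypothetical throughout: the `inventory` rows, the object names, and the function `mixed_areas` are invented for illustration, and how the mapping is exported from the database (for instance from the meta-schema) is left out and would need to be verified for the OpenEdge version in use.

```python
from collections import defaultdict

# HYPOTHETICAL inventory of (area name, object type, object name) rows.
# In practice these rows would be exported from the database itself.
inventory = [
    ("Data", "table", "audit-trail"),
    ("Indexes", "index", "audit-trail.pk"),
    ("Indexes", "index", "audit-trail.by-date"),
    ("Indexes", "table", "audit-notes"),   # a table placed in the index area
    ("LOBs", "lob", "audit-trail.payload"),
]


def mixed_areas(rows):
    """Return {area: set of object types} for areas holding more than one type."""
    types_by_area = defaultdict(set)
    for area, object_type, _name in rows:
        types_by_area[area].add(object_type)
    return {area: types for area, types in types_by_area.items() if len(types) > 1}


violations = mixed_areas(inventory)
for area, types in sorted(violations.items()):
    print(f"Area {area!r} mixes object types: {', '.join(sorted(types))}")
```

Run after each schema update, a report like this would flag a table that has landed in an index area before the area starts growing abnormally.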

Posted by James Palmer on 01-Aug-2017 03:27

Fair point, George, except in a situation where anyone just goes and adds structures in production because everyone considers themselves a DBA. Yes, that is a control issue, but it's difficult to stop bad practice that's been ongoing for many years because of not having a DBA in situ.

Posted by George Potemkin on 01-Aug-2017 04:11

James, putting LOBs in the wrong area would be, IMHO, more dangerous because we can't change their location the way we can with tablemove/indexmove. We could probably add our own checks to the standard Data Dictionary programs, just to report a violation of data object allocation rules.

Posted by e.schutten on 01-Aug-2017 06:02
Posted by James Palmer on 01-Aug-2017 06:26

I've already voted ;) For the record, this is something the Standard Storage Areas CCS folks have asked for as well IIRC.

Posted by George Potemkin on 01-Aug-2017 06:43

Object segregation is a digital racism but I've voted for it as well. ;-)
