My replication target is crashing with
SYSTEM ERROR: Invalid block 609150 for file D:\index\databases\live\audit_8.d1, max is 609087
Source is fine. It's an index area.
Source:
d "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d1 f 2048000
d "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d2 f 2048000
d "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d3 f 2048000
d "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d4 f 2048000
d "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d5 f 2048000
d "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d6 f 2048000
d "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d7 f 2048000
d "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d8 f 2048000
d "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d9 f 2048000
d "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d10 f 2048000
d "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d11
Target:
d "Indexes":8,8;64 D:\index\databases\live\audit_8.d1
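To compare the two layouts mechanically rather than by eye, here is a minimal Python sketch. It parses the d lines of a .st file and summarizes each area: RPB, cluster size, fixed and variable extent counts, and total fixed size. It assumes paths without spaces and handles only fixed (f) and plain variable extents. A missing ;n cluster size is treated as 1, which is an assumption. The names summarize and differing_fields are hypothetical helpers, not Progress tools.

```python
import re
from dataclasses import dataclass, fields

# One data-extent line of a .st file, e.g.
#   d "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d1 f 2048000
# Groups: area name, area number, records per block, blocks per cluster,
# extent path, and the optional fixed size in KB (absent = variable extent).
ST_LINE = re.compile(
    r'^d\s+"(?P<area>[^"]+)":(?P<num>\d+),(?P<rpb>\d+)(?:;(?P<cluster>\d+))?'
    r'\s+(?P<path>\S+)(?:\s+f\s+(?P<size_kb>\d+))?\s*$'
)


@dataclass
class AreaSummary:
    area: str
    number: int
    rpb: int
    cluster: int
    fixed_extents: int = 0
    variable_extents: int = 0
    fixed_kb: int = 0


def summarize(st_text: str) -> dict[int, AreaSummary]:
    """Group the data extents of a .st file by area number."""
    areas: dict[int, AreaSummary] = {}
    for line in st_text.splitlines():
        m = ST_LINE.match(line.strip())
        if not m:
            continue  # blank lines, comments, b/a extents, etc. are skipped
        num = int(m["num"])
        s = areas.setdefault(num, AreaSummary(
            area=m["area"], number=num, rpb=int(m["rpb"]),
            cluster=int(m["cluster"] or 1)))  # assumption: no ;n means 1
        if m["size_kb"]:
            s.fixed_extents += 1
            s.fixed_kb += int(m["size_kb"])
        else:
            s.variable_extents += 1
    return areas


def differing_fields(a: AreaSummary, b: AreaSummary) -> list[str]:
    """Names of the summary fields that differ between two areas."""
    return [f.name for f in fields(AreaSummary) if getattr(a, f.name) != getattr(b, f.name)]


# Area 8 from this thread: ten fixed extents plus one variable extent on the source...
source_st = "\n".join(
    [rf'd "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d{i} f 2048000' for i in range(1, 11)]
    + [r'd "Indexes":8,8;64 D:\index\databases\live\audit\audit_8.d11']
)
# ...and a single variable extent on the target.
target_st = r'd "Indexes":8,8;64 D:\index\databases\live\audit_8.d1'

src = summarize(source_st)[8]
tgt = summarize(target_st)[8]
print(src)
print(tgt)
print(differing_fields(src, tgt))
```

For the lines above, the area definition (number 8, RPB 8, cluster 64) matches. Only the extent layout differs: the source has ten 2048000 KB fixed extents plus a variable one, and the target has a single variable extent.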
RPB is the same. How can this happen? I assume the only solution is to reseed replication?
OE 10.2B
Db blocksize is 4K?
block 609150 (and block 609087) => 2.3 GB
Size of the source area is 20 GB.
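The arithmetic behind the two replies above, as a small Python sketch. It assumes 4 KiB blocks, because the block size was asked about but not confirmed. It also ignores extent headers, so the figures are approximate.

```python
KIB = 1024
GIB = 2**30
BLOCK_SIZE = 4 * KIB   # assumption: 4 KiB blocks (asked about, not confirmed)


def block_to_gib(block: int, block_size: int = BLOCK_SIZE) -> float:
    """Approximate position of a block within its area, in GiB (ignores headers)."""
    return block * block_size / GIB


def fixed_capacity_blocks(extent_sizes_kb: list[int], block_size: int = BLOCK_SIZE) -> int:
    """How many blocks fit in the fixed extents of a .st file (sizes are in KB)."""
    return sum(extent_sizes_kb) * KIB // block_size


source_extents_kb = [2048000] * 10   # audit_8.d1 .. audit_8.d10 on the source

requested_gib = block_to_gib(609150)
max_gib = block_to_gib(609087)
source_gib = sum(source_extents_kb) * KIB / GIB
source_blocks = fixed_capacity_blocks(source_extents_kb)

print(f"requested block ~{requested_gib:.2f} GiB, max ~{max_gib:.2f} GiB")
print(f"source fixed extents: {source_gib:.2f} GiB, {source_blocks:,} blocks")
```

With 4 KiB blocks, each 2048000 KB source extent holds 512,000 blocks. Block 609150 therefore falls in the source's second fixed extent, which is already allocated on disk. On the target, the same block sits past 2.3 GiB in a single variable extent that has to grow to reach it. The source's 19.5 GiB of fixed extents is the "20 GB" above in round numbers.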
What is the size of audit_8.d1 on target db?
IMHO, it's more likely index corruption on the target db.
Are the structure files of the source and target the same? I haven't used replication in a long time, so I don't recall whether the target can differ from the source.
So this has happened again, for the same customer.
Source
[2017/07/27@23:02:28.118+0100] P-7480 T-6060 I RPLS 26: (9407) Connection failure for host cmlsql03 port 4509 transport TCP.
[2017/07/27@23:02:28.119+0100] P-7480 T-6060 I RPLS 26: (11713) A communications error -4004 in rpCOM_SendMsg.
[2017/07/27@23:02:28.119+0100] P-7480 T-6060 I RPLS 26: (-----) Diagnostic Dump of RPCommInfo_t - TCP/IP Send Error
[2017/07/27@23:02:28.119+0100] P-7480 T-6060 I RPLS 26: (-----) 0000: c088 fc01 0000 0000 0000 0000 9d11 0000 e489 0000 e489 0000 0200 0000 4200 0000
[2017/07/27@23:02:28.119+0100] P-7480 T-6060 I RPLS 26: (-----) 0020: ea05 0000 9a00 0000 0000 0000 ee62 7a59 0000 0000 3c41 0000 0000 0000 0000 0000
[2017/07/27@23:02:28.119+0100] P-7480 T-6060 I RPLS 26: (-----) 0040: 0000 0000 baff ffff 4627 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
[2017/07/27@23:02:28.119+0100] P-7480 T-6060 I RPLS 26: (-----) 0060: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
[2017/07/27@23:02:28.119+0100] P-7480 T-6060 I RPLS 26: (-----) 0080: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
[2017/07/27@23:02:28.119+0100] P-7480 T-6060 I RPLS 26: (-----) 00a0: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
[2017/07/27@23:02:28.119+0100] P-7480 T-6060 I RPLS 26: (-----) 00c0: 0000 0000 0000 0000 0000 0000 636d 6c73 716c 3033 0000 0000 0000 0000 0000 0000
[2017/07/27@23:02:28.119+0100] P-7480 T-6060 I RPLS 26: (-----) 00e0: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
[2017/07/27@23:02:28.119+0100] P-7480 T-6060 I RPLS 26: (-----) 0100: 0000 0000 0000 0000 0000 0000 3137 322e 3136 2e31 342e 3732 0000 0000 0000 0000
[2017/07/27@23:02:28.119+0100] P-7480 T-6060 I RPLS 26: (-----) 0120: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
[2017/07/27@23:02:28.119+0100] P-7480 T-6060 I RPLS 26: (-----) 0140: 0000 0000 0000 0000 0000 0000
[2017/07/27@23:02:28.120+0100] P-7480 T-6060 I RPLS 26: (10491) A communications error -155 occurred in function rpNLS_SendAIBlockToAgent while sending AIBLOCK.
[2017/07/27@23:02:28.120+0100] P-7480 T-6060 I RPLS 26: (10661) The Fathom Replication Server is beginning recovery for agent l_idx41_audit.
[2017/07/27@23:02:28.120+0100] P-7480 T-6060 I RPLS 26: (10842) Connecting to Fathom Replication Agent l_idx41_audit.
Target
[2017/07/27@23:02:22.688+0100] P-2636 T-3216 I RPLA 26: (2329) SYSTEM ERROR: Invalid block 729342 for file D:\index\databases\live\audit_8.d1, max is 729279
[2017/07/27@23:02:22.692+0100] P-2636 T-3216 F RPLA 26: (612) SYSTEM ERROR: Possible file truncation, 729342 too big for database.
[2017/07/27@23:03:12.302+0100] P-5516 T-5520 I WDOG 28: (2527) Disconnecting dead user 26.
[2017/07/28@00:16:52.659+0100] P-964 T-5460 I RPLU (10829) Database connections are not allowed at this time.
[2017/07/28@00:16:52.709+0100] P-964 T-5460 I RPLU (10829) Database connections are not allowed at this time.
[2017/07/28@00:16:52.759+0100] P-964 T-5460 I RPLU (10829) Database connections are not allowed at this time.
[2017/07/28@00:16:52.809+0100] P-964 T-5460 I RPLU (10829) Database connections are not allowed at this time.
[2017/07/28@00:16:52.859+0100] P-964 T-5460 I RPLU (10829) Database connections are not allowed at this time.
[2017/07/28@00:16:52.909+0100] P-964 T-5460 I RPLU (10829) Database connections are not allowed at this time.
[2017/07/28@00:16:52.959+0100] P-964 T-5460 I RPLU (10829) Database connections are not allowed at this time.
[2017/07/28@00:16:53.009+0100] P-964 T-5460 I RPLU (10829) Database connections are not allowed at this time.
[2017/07/28@00:16:53.059+0100] P-964 T-5460 I RPLU (10829) Database connections are not allowed at this time.
[2017/07/28@00:16:53.109+0100] P-964 T-5460 I RPLU (10829) Database connections are not allowed at this time.
[2017/07/28@00:16:53.159+0100] P-964 T-5460 I RPLU (10829) Database connections are not allowed at this time.
[2017/07/28@00:16:53.209+0100] P-964 T-5460 I RPLU (10829) Database connections are not allowed at this time.
[2017/07/28@00:16:53.259+0100] P-964 T-5460 I RPLU (10829) Database connections are not allowed at this time.
[2017/07/28@00:16:53.309+0100] P-964 T-5460 I RPLU (10829) Database connections are not allowed at this time.
[2017/07/28@00:16:53.359+0100] P-964 T-5460 I RPLU (10829) Database connections are not allowed at this time.
[2017/07/28@00:16:53.409+0100] P-964 T-5460 I RPLU (10829) Database connections are not allowed at this time.
[2017/07/28@00:16:53.459+0100] P-964 T-5460 I RPLU (10829) Database connections are not allowed at this time.
[2017/07/28@00:16:53.509+0100] P-964 T-5460 I RPLU (10829) Database connections are not allowed at this time.
[2017/07/28@00:16:53.559+0100] P-964 T-5460 I RPLU (10829) Database connections are not allowed at this time.
[2017/07/28@00:16:53.609+0100] P-964 T-5460 I RPLU (10829) Database connections are not allowed at this time.
[2017/07/28@00:16:53.659+0100] P-964 T-5460 I RPLU (10829) Database connections are not allowed at this time.
[2017/07/28@00:16:53.659+0100] P-964 T-5460 I RPLU (10829) Database connections are not allowed at this time.
[2017/07/28@00:16:53.699+0100] P-964 T-5460 I REPL : (10429) The user failed to connect to database D:\index\databases\live\audit with error -1 in rpDB_OpenDatabase.
[2017/07/28@00:16:53.699+0100] P-964 T-5460 I REPL : (10717) The Fathom Replication Utility cannot connect to database D:\index\databases\live\audit.
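Pasted logs like these are easier to read once they are split back into entries. The Python sketch below splits on the [YYYY/MM/DD@...] timestamps and pulls out the message number. For each (2329) message, it also computes how far the requested block lies past the reported max. The regular expressions are assumptions fitted to the lines in this thread, not a documented log format.

```python
import re
from typing import NamedTuple, Optional

# Every entry starts with a "[YYYY/MM/DD@" timestamp, so split just before each one.
ENTRY_START = re.compile(r"(?=\[\d{4}/\d{2}/\d{2}@)")
# Assumed shape: [ts] P-pid T-tid S PROC [user][:] (msgnum) text
ENTRY = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+P-(?P<pid>\d+)\s+T-(?P<tid>\d+)\s+(?P<sev>\w)\s+"
    r"(?P<proc>[A-Z]+)\s*(?P<user>\d+)?\s*:?\s*\((?P<msg>[^)]*)\)\s*(?P<text>.*)$"
)
INVALID_BLOCK = re.compile(r"Invalid block (\d+) for file (.+?), max is (\d+)")


class Entry(NamedTuple):
    ts: str
    severity: str
    process: str
    msg: str
    text: str


def split_entries(flat_log: str) -> list[Entry]:
    """Turn a log pasted as one long line back into individual entries."""
    entries = []
    for chunk in ENTRY_START.split(flat_log):
        m = ENTRY.match(chunk.strip())
        if m:  # the empty chunk before the first timestamp is skipped
            entries.append(Entry(m["ts"], m["sev"], m["proc"], m["msg"], m["text"]))
    return entries


def block_overshoot(text: str) -> Optional[int]:
    """Requested block minus reported max for an 'Invalid block' message, else None."""
    m = INVALID_BLOCK.search(text)
    return int(m[1]) - int(m[3]) if m else None


target_log = (
    r"[2017/07/27@23:02:22.688+0100] P-2636 T-3216 I RPLA 26: (2329) SYSTEM ERROR: "
    r"Invalid block 729342 for file D:\index\databases\live\audit_8.d1, max is 729279 "
    r"[2017/07/27@23:02:22.692+0100] P-2636 T-3216 F RPLA 26: (612) SYSTEM ERROR: "
    r"Possible file truncation, 729342 too big for database. "
    r"[2017/07/27@23:03:12.302+0100] P-5516 T-5520 I WDOG 28: (2527) Disconnecting dead user 26."
)
first_report = (r"SYSTEM ERROR: Invalid block 609150 for file "
                r"D:\index\databases\live\audit_8.d1, max is 609087")

entries = split_entries(target_log)
for e in entries:
    print(e.ts, e.severity, e.process, f"({e.msg})", e.text)

overshoots = [block_overshoot(first_report)]
overshoots += [block_overshoot(e.text) for e in entries if e.msg == "2329"]
print(overshoots)
```

Both reports show the same pattern: 609150 vs 609087, and 729342 vs 729279. In each case the requested block is 63 past the max, which puts it inside the next 64-block cluster given the ;64 in the .st line. That is consistent with the target failing to extend the area, but on its own it doesn't say why.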
Source and target structures are the same; everything I can think of is the same. How can you get index corruption on the target and not on the source?
I used to get a similar error, but it happened when rolling forward AI files with rfutil, not with replication. We fixed it by adding a new extent (audit_8.d2 on the target). After we did this, we could continue rolling forward AI files. This was on 10.2B.
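For reference, the extent described above would be written as a .st line like the one produced below. That line would go into an add.st file passed to prostrct add, e.g. prostrct add D:\index\databases\live\audit add.st. This is a hypothetical Python helper that only formats text. The area name, number, RPB and cluster size must match the existing area on the target, and the .dN naming is assumed from this thread.

```python
import re
from typing import Optional


def st_extent_line(area: str, area_num: int, rpb: int, cluster: int,
                   path: str, fixed_kb: Optional[int] = None) -> str:
    """Format one data-extent line in .st syntax (variable extent if fixed_kb is None)."""
    line = f'd "{area}":{area_num},{rpb};{cluster} {path}'
    return line if fixed_kb is None else f"{line} f {fixed_kb}"


def next_extent_path(last_path: str) -> str:
    """audit_8.d1 -> audit_8.d2; assumes the .dN naming used in this thread."""
    m = re.search(r"\.d(\d+)$", last_path)
    if not m:
        raise ValueError(f"not a .dN extent path: {last_path}")
    return f"{last_path[:m.start()]}.d{int(m[1]) + 1}"


# Target area 8 currently ends with its only extent, audit_8.d1.
last_extent = r"D:\index\databases\live\audit_8.d1"
add_st = st_extent_line("Indexes", 8, 8, 64, next_extent_path(last_extent))
print(add_st)
```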
That's interesting. Do you reckon that if I add an extent the DB will just work again?
Unfortunately, in this case it won't help, as I disabled replication so that they wouldn't have the system down in the meantime.
This error indicates a space allocation problem on the target. I agree with cjbrandt that you should fix the .st files on the target so that they match the ones on the source.
You should also make sure that large files are enabled on both hosts and that there is enough disk space available on the target.
BTW, best practice for the RPB value of index areas is to set it to 1.
Dapeng
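Both checks suggested above (large files, free disk space) can be roughed out in Python. Two values are assumptions: the 2 GiB per-extent limit without large files, and the 4 KiB block size. Treating one 64-block cluster as the space an extension needs is only a crude lower bound.

```python
import shutil

BLOCK_SIZE = 4096                  # assumption: 4 KiB database blocks
CLUSTER_BLOCKS = 64                # the ";64" in the .st lines
NO_LARGE_FILES_LIMIT = 2**31 - 1   # assumption: max extent size without large files


def needs_large_files(extent_bytes: int) -> bool:
    """True if an extent this big could not exist without large files enabled."""
    return extent_bytes > NO_LARGE_FILES_LIMIT


def room_for_one_cluster(free_bytes: int) -> bool:
    """Crude lower bound: is there disk space for one more cluster of blocks?"""
    return free_bytes >= CLUSTER_BLOCKS * BLOCK_SIZE


target_d1_bytes = 609087 * BLOCK_SIZE   # target audit_8.d1 at the reported max block
print("large files required:", needs_large_files(target_d1_bytes))

# Point this at the target's data drive (e.g. "D:\\") when run on that host.
free = shutil.disk_usage(".").free
print("room for one more cluster:", room_for_one_cluster(free))
```

Under those assumptions, the target's audit_8.d1 was already about 2.3 GiB at the reported max block. It could not have grown that far without large files, which fits the later reply that large files must be on for replication to work at all.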
Setting RPB to 1 on an index area is dangerous. It's fine if you can guarantee you get no data objects in there, but until Progress allow you to specify that a storage area is for indexes only I will not be using RPB 1.
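To put a number on that risk, here is a rough Python estimate of how many blocks a table needs in an area, given the RPB and an average row size. The 100-byte per-block overhead and the example table (one million 100-byte rows, 4 KiB blocks) are made up for illustration; real block layout is more involved.

```python
import math


def blocks_for_rows(rows: int, rpb: int, avg_row_bytes: int,
                    block_size: int = 4096, overhead_bytes: int = 100) -> int:
    """Rough block count for a table stored in an area with the given RPB.

    A block holds at most `rpb` rows and no more rows than fit in its usable
    space. `overhead_bytes` is a guessed per-block header cost, and rows larger
    than a block are simply counted as one per block (no fragment accounting).
    """
    fit_by_size = max(1, (block_size - overhead_bytes) // avg_row_bytes)
    per_block = min(rpb, fit_by_size)
    return math.ceil(rows / per_block)


# Hypothetical table: one million rows of ~100 bytes each.
for rpb in (1, 8, 64):
    blocks = blocks_for_rows(1_000_000, rpb, 100)
    print(f"RPB {rpb:>2}: {blocks:>9,} blocks ~ {blocks * 4096 / 2**30:.2f} GiB")
```

With these made-up numbers, the table needs roughly 3.8 GiB at RPB 1 versus under 0.5 GiB at RPB 8. That is the "grows abnormally fast" effect discussed below.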
As for the .st files, the structure is the same. The extents are not, but that doesn't matter.
Large files have to be enabled on both. You can't replicate if they're not enabled on one or the other.
> Setting RPB to 1 on an index area is dangerous
<Religious war is on>
If someone accidentally puts a data object in there, then the area will indeed grow abnormally fast. It's dangerous if the DBA is unable to notice such anomalies, but why should RPB be blamed?
After each schema update we can easily check that no mixed object types (tables, indexes, LOBs) are located in the same area. There's no need to wait for Progress to embed such checks in the Data Dictionary.
<Religious war is off>
Fair point, George, except in a situation where anyone can just go and add structures in production because everyone considers themselves a DBA. Yes, that is a control issue, but it's difficult to stop a bad practice that has been going on for many years because there was no DBA in situ.
James, putting LOBs in the wrong area would be, IMHO, more dangerous, because we can't change their location the way we can with tablemove/indexmove. We could probably add our own checks to the standard Data Dictionary programs, just to report violations of the data object allocation rules.
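A check like the ones suggested above, whether run after each schema update or bolted onto the Data Dictionary, boils down to grouping storage objects by area. Here is a minimal Python sketch. It assumes you have already exported (object name, object type, area name) triples, for example from the _StorageObject and _Area metaschema tables. The export step and the sample objects are hypothetical.

```python
from collections import defaultdict


def mixed_areas(objects: list[tuple[str, str, str]]) -> dict[str, list[str]]:
    """Return areas that hold more than one object type.

    `objects` is a list of (object name, object type, area name) triples,
    with types such as "table", "index" or "lob".
    """
    types_by_area: dict[str, set[str]] = defaultdict(set)
    for _name, obj_type, area in objects:
        types_by_area[area].add(obj_type)
    return {area: sorted(types) for area, types in types_by_area.items() if len(types) > 1}


# Hypothetical catalog after a schema update: one table landed in "Indexes".
catalog = [
    ("audit", "table", "Data"),
    ("audit-date", "index", "Indexes"),
    ("audit-user", "index", "Indexes"),
    ("audit-note", "table", "Indexes"),
    ("audit-doc", "lob", "LOBs"),
]
violations = mixed_areas(catalog)
for area, types in violations.items():
    print(f"area {area!r} mixes {', '.join(types)}")
```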
I've already voted ;) For the record, this is something the Standard Storage Areas CCS folks have asked for as well IIRC.
Object segregation is digital racism, but I've voted for it as well. ;-)