Question concerning scripting a progress restore - getting C

Posted by Stephanie Seney on 17-Jun-2014 14:57

We're running OE 10.2B04 on SUSE Linux - enterprise DB lic - char only.  I created a script that will run from cron to restore selected DBs to a "testing" location after the nightly full Progress backup.  I was testing the script and all was running fine until it got to one of the DB backup files, where it started to throw CRC error 1147.  I know what that means, but the odd thing is that on the date of that backup file, prorest -vp and prorest -vf showed NO issues.

[2014/06/17@13:11:29.161-0400] P-24485      T-2092578560 I          : (12854) Restoring database from /marcs/clsbi/marccls.db.
[2014/06/17@13:11:41.905-0400] P-24485      T-2092578560 I          : (1147)  CRC check error reading backup block 2707
[2014/06/17@13:11:41.907-0400] P-24485      T-2092578560 I          : (1147)  CRC check error reading backup block 2708
[2014/06/17@13:11:41.912-0400] P-24485      T-2092578560 I          : (1147)  CRC check error reading backup block 2709

First - the backup file of this DB is 5 days old.  Below is the output showing that it appears to have completed cleanly:
******** BACKUP of marccls started  Thu Jun 12 01:59:38 EDT 2014

289159 active blocks out of 485913 blocks in /marcs/clsbi/marccls will be dumped. (6686)
0 BI blocks will be dumped. (6688)
The blocksize is 8192. (6994)
Backup requires an estimated 878.5 MBytes of media. (9285)
Restore would require an estimated 289159 db blocks using 2.2 GBytes of media. (9286)
Backed up 289170 db blocks in 00:00:17
Wrote a total of 3310 backup blocks using 879.2 MBytes of media. (13625)

Backup complete. (3740)
This is a full backup of /marcs/clsbi/marccls.db. (6759)
This backup was taken Thu Jun 12 01:59:39 2014. (6760)
The blocksize is 8192. (6994)
Partial verification successfully read backup volume. (6765)
Verify pass started. (3751)
Verified 289170 db blocks in 04:43:44
Backup for /marcs/clsbi/marccls.db verified ok. (6758)

This is a full backup of /marcs/clsbi/marccls.db. (6759)
This backup was taken Thu Jun 12 01:59:39 2014. (6760)
The blocksize is 8192. (6994)
It will require a minimum of 485913 blocks to restore. (6763)
Full verify pass started. (3752)
Verified 289170 db blocks in 00:00:07
Full verify successful. (3758)
BACKUP of marccls completed  Thu Jun 12 06:44:43 EDT 2014
OpenEdge Release 10.2B04 as of Thu Mar  3 19:15:28 EST 2011

On 6/12/14 we had an issue with the VM disk space filling up, stalling all our processes, including the Progress backups.  Once our SA allocated more disk space, the VM backups continued and the system was available again.
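
The stall is visible in the log above: the partial verify pass reports 04:43:44 for the same 289170 blocks that the backup pass handled in 00:00:17 and the full verify in 00:00:07. A minimal Python sketch for flagging such a pass in a saved backup log follows. The function name slow_passes and the 600-second threshold are assumptions chosen for illustration, not anything Progress provides. A long elapsed time only hints that the run was interrupted; it does not prove the backup file is bad.

```python
import re

# Matches lines such as "Verified 289170 db blocks in 04:43:44".
ELAPSED = re.compile(
    r"^(Backed up|Verified) (\d+) db blocks in (\d+):(\d{2}):(\d{2})"
)


def slow_passes(log_text, max_seconds=600):
    """Return (line, seconds) for every pass slower than max_seconds.

    max_seconds is a hypothetical threshold; tune it to your own runs.
    """
    flagged = []
    for raw in log_text.splitlines():
        line = raw.strip()
        match = ELAPSED.match(line)
        if not match:
            continue
        hours, minutes, secs = (int(match.group(i)) for i in (3, 4, 5))
        seconds = hours * 3600 + minutes * 60 + secs
        if seconds > max_seconds:
            flagged.append((line, seconds))
    return flagged
```

A cron script could call slow_passes on the text of the nightly backup log and skip or postpone the restore when the list is not empty.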

I don't fully understand why, if the check of the backup on the day of the backup was clean, it would now throw CRC errors.

We will be backing up our DBs again tonight as full backups - so I expect that this issue will resolve itself.  If anyone might have some thoughts on why this would/could happen, that would be helpful.

All Replies

Posted by Jean Richert on 18-Jun-2014 07:17

Hello sseney,

I am not a DB expert, but I searched the Community search engine, which also covers the KB articles, and found the following, which may help you.

progresscustomersupport-survey.secure.force.com/.../P18147

progresscustomersupport-survey.secure.force.com/.../P9362

Posted by Stephanie Seney on 18-Jun-2014 08:24

Jean, thanks, I had already looked at those.  My concern was more that the backup & verify showed no errors, yet when I used that backup file, it apparently was corrupted.  If you look at the data that I posted from the actual backup on that day for that DB & the verify passes done, all say OK.  I'm sure it had something to do with the fact that our VM ran out of space while the Progress backups ran.  When our SA increased the space, everything picked up and continued, yet Progress didn't seem to think there were any issues with the backup at that time.

I've since created a new full backup & will be testing my script again today - and I'm fairly confident that the restores will work cleanly.  

It is, however, a bit disconcerting that the prorest command in this instance was not very reliable.

Posted by Thomas Mercer-Hursh on 18-Jun-2014 09:57

Stephanie (btw, you might fill in your name), I haven't followed the details of this thread since it has been quite a while since I did this sort of thing, but your description sounds a lot like you made a valid backup and someplace later, outside of Progress control, corruption occurred.  If so, it is hardly surprising that the restore should throw errors.
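
One way to tell "corrupted later" apart from "never good" is to record a checksum of the backup file right after the backup finishes and compare it just before the restore. The following is a minimal Python sketch under that assumption. The helper names sha256_of and backup_unchanged are hypothetical, and where the digest is stored is left to the caller. A match only shows that the file has not changed since the digest was recorded, not that it was valid to begin with.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large backup files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_unchanged(path, recorded_digest):
    """True if the file still matches the digest recorded after the backup."""
    return sha256_of(path) == recorded_digest
```

If backup_unchanged returns False, the file was altered after the backup; if it returns True and the restore still reports CRC errors, the damage was most likely already in the file when the digest was taken.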

Posted by Stephanie Seney on 18-Jun-2014 11:35

I can accept that; however, the issue occurred on the day/night of the backup on 6/12, not after that time.  While the Progress backups were executing on the VM, the VM SAN ran out of disk space.  When the VM runs out of space, it apparently "suspends" operations, so the whole server was "unavailable" the following morning.  Our SA increased the space needed by the VM snapshots, and then the server(s) were available.  Once our DB server was "awakened", the Progress backups completed, along with the verifies, as seen in my initial post.  However, it appears that running out of space during the backup of this DB corrupted the Progress backup, yet the immediate prorest verify passes didn't detect that corruption.

My concern is that if we were to script the restore, if the initial verify shows it was GOOD, but something undetected by progress backup verify does happen and the CRC is thrown, how long before the restore attempt is aborted because of the CRC errors?  In this case, I was "monitoring" the restore in part...and just killed the script since it seemed like all it would do is continue with the CRC error until the disk space of the restore location was filled up.
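
A scripted restore does not have to wait for someone to notice the errors. The following is a minimal Python sketch of a wrapper that stops the restore as soon as a (1147) message appears. It assumes the CRC messages are written to the restore command's stdout or stderr; if they only appear in the database .lg file, that file would have to be watched instead. The function name run_with_crc_guard and the max_crc_errors parameter are hypothetical.

```python
import subprocess

CRC_MARKER = "(1147)"  # message number of the CRC check error shown above


def run_with_crc_guard(cmd, max_crc_errors=1):
    """Run cmd and terminate it once max_crc_errors CRC lines are seen.

    Returns (aborted, crc_errors, returncode).
    Assumption: the CRC messages reach the command's stdout or stderr.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    crc_errors = 0
    aborted = False
    for line in proc.stdout:
        if CRC_MARKER in line:
            crc_errors += 1
            if crc_errors >= max_crc_errors:
                proc.terminate()  # stop the restore instead of letting it run on
                aborted = True
                break
    proc.stdout.close()
    proc.wait()
    return aborted, crc_errors, proc.returncode
```

A call would look like run_with_crc_guard(["prorest", target_db, backup_file]), where target_db and backup_file are placeholders for your own paths. After an aborted run, the partially restored target would presumably need to be removed before the next attempt.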

I'm not saying that the CRC error is not "legit" because it was....and I can attribute it to the VM disk space issue on the day of the backup, yet, the verify pass didn't catch that it was corrupted.  We create full progress backups nightly that reside on disk.  I will be attempting to restore them again today to the special testing location - so that I can do some time tests of how long a DB change on our LIVE DBs will take, for a project to be installed next week.

This special restore script would not be used daily, just on an as-needed basis.  I was hoping to be able to run it right after the DB backups complete (and the DBs are back up in MU mode, ready for the following day), but if the restore hangs whenever a CRC issue happens, I'm not going to, since I'm no longer comfortable with the verify pass.
