backup causes abnormal shutdown

Posted by jmls on 16-Mar-2010 02:26


We recently upgraded our production server to 10.2B, and since then
we have had some problems.

Every night, a full online backup is performed. Every night, the
backup fails because the database shuts down with the following
message:


[2010/03/16@00:11:50.757+0000] P-3360       T-3196  I WDOG   63:
(5028)  SYSTEM ERROR: Releasing regular latch. latchId: 18
[2010/03/16@00:11:50.757+0000] P-3360       T-3196  I WDOG   54:
(2522)  User 63 died holding 1 shared memory locks.

This then brings down the whole database:

[2010/03/16@00:11:50.991+0000] P-944        T-3716  I SRV    28:
(2520)  Stopped.
[2010/03/16@00:11:52.179+0000] P-1896       T-3580  I BROKER  0:
(15192) The database will complete shutdown within approximately 60
seconds.
[2010/03/16@00:11:52.194+0000] P-1896       T-3580  I BROKER  0:
(2249)  Begin ABNORMAL shutdown code 2
[2010/03/16@00:11:52.460+0000] P-3076       T-2872  I AIMGT  51: (453)
Logout by SYSTEM on CON:.
[2010/03/16@00:11:53.194+0000] P-1896       T-3580  I BROKER  0:
(2527)  Disconnecting dead user 63.

This is obviously not acceptable.

User 63 seemed to be the actual backup process:

               Tue Mar 16 00:01:01 2010
[2010/03/16@00:01:01.347+0000] P-456        T-3724  I RFUTIL 63: (452)
Login by administrator on CON:.
[2010/03/16@00:01:01.347+0000] P-456        T-3724  I RFUTIL 63:
(7129)  Usr 63 set name to Aimage list.
[2010/03/16@00:01:02.362+0000] P-456        T-3724  I RFUTIL 63: (453)
Logout by Aimage list on CON:.
[2010/03/16@00:01:02.425+0000] P-2472       T-4020  I RFUTIL 63: (452)
Login by administrator on CON:.
[2010/03/16@00:01:02.440+0000] P-2472       T-4020  I RFUTIL 63:
(7129)  Usr 63 set name to Aimage new.
[2010/03/16@00:01:02.456+0000] P-2472       T-4020  I RFUTIL 63:
(3777)  Switched to ai extent G:\MyDBLive\mydb.a2.
[2010/03/16@00:01:02.456+0000] P-2472       T-4020  I RFUTIL 63:
(3778)  This is after-image file number 298 since the last AIMAGE
BEGIN
[2010/03/16@00:01:03.456+0000] P-2472       T-4020  I RFUTIL 63: (453)
Logout by Aimage new on CON:.
[2010/03/16@00:01:03.565+0000] P-940        T-3548  I BACKUP 63:
(-----) Login by administrator.
[2010/03/16@00:01:03.628+0000] P-940        T-3548  I BACKUP 63:
(12850) Backup blocks will be written to C:\data\mydbbackup\live.bak.
[2010/03/16@00:01:03.628+0000] P-940        T-3548  I BACKUP 63:
(1362)  Full backup started.
[2010/03/16@00:01:03.628+0000] P-940        T-3548  I BACKUP 63:
(6686)  20484617 active blocks out of 20484932 blocks in c:\mydb will
be dumped.
[2010/03/16@00:01:03.628+0000] P-940        T-3548  I BACKUP 63:
(6688)  4096 BI blocks will be dumped.
[2010/03/16@00:01:03.628+0000] P-940        T-3548  I BACKUP 63:
(9285)  Backup requires an estimated 78.4 GBytes of media.
[2010/03/16@00:01:03.628+0000] P-940        T-3548  I BACKUP 63:
(9286)  Restore would require an estimated 20542790 db blocks using
19.6M of media.
[2010/03/16@00:01:03.628+0000] P-940        T-3548  I BACKUP 63:
(3777)  Switched to ai extent G:\mydbLive\mydb.a3.
[2010/03/16@00:01:03.643+0000] P-940        T-3548  I BACKUP 63:
(3778)  This is after-image file number 299 since the last AIMAGE
BEGIN
[2010/03/16@00:01:03.862+0000] P-940        T-3548  I BACKUP 63:
(5459)  Begin backup of Before Image file(s).
[2010/03/16@00:01:06.784+0000] P-940        T-3548  I BACKUP 63:
(5460)  End backup of Before Image file(s).
[2010/03/16@00:01:06.800+0000] P-940        T-3548  I BACKUP 63:
(5461)  Begin backup of Data file(s).
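
For anyone retracing which process held a given user number: in the excerpt above, usr 63 is logged by three different PIDs in turn (two RFUTIL runs, then BACKUP). Below is a rough Python sketch that groups log entries by user number so that this kind of reuse is easy to spot. It assumes the header layout seen in this excerpt (timestamp, P-, T-, severity, process type, user number) and that wrapped lines belong to the preceding entry. The names sessions_by_user and join_wrapped are made up for illustration and are not part of any Progress tool.

```python
import re
from collections import defaultdict

# Header layout as it appears in the excerpt above (an assumption, not a spec):
# [timestamp] P-<pid> T-<tid> <severity> <process type> <user number>: ...
HEADER = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+P-(?P<pid>\d+)\s+T-(?P<tid>\d+)\s+"
    r"(?P<sev>\w)\s+(?P<proc>\w+)\s+(?P<usr>\d+):\s*(?P<rest>.*)$"
)


def join_wrapped(lines):
    """Re-join entries that were wrapped: a line not starting with '['
    is treated as a continuation of the previous entry."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("[") or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def sessions_by_user(lines):
    """Map each user number to the (process type, pid) pairs that logged
    under it, in order of first appearance."""
    seen = defaultdict(list)
    for entry in join_wrapped(lines):
        m = HEADER.match(entry)
        if not m:
            continue  # skip date banners and anything else that is not an entry
        key = (m["proc"], int(m["pid"]))
        usr = int(m["usr"])
        if key not in seen[usr]:
            seen[usr].append(key)
    return dict(seen)


# A few entries copied from this thread, wrapped the same way as in the post.
SAMPLE = """\
[2010/03/16@00:01:02.425+0000] P-2472       T-4020  I RFUTIL 63: (452)
Login by administrator on CON:.
[2010/03/16@00:01:03.628+0000] P-940        T-3548  I BACKUP 63:
(1362)  Full backup started.
[2010/03/16@00:11:50.757+0000] P-3360       T-3196  I WDOG   54:
(2522)  User 63 died holding 1 shared memory locks.
""".splitlines()

for usr, procs in sorted(sessions_by_user(SAMPLE).items()):
    print(usr, procs)
```

Run against the full .lg file, this lists every process that logged under user number 63 on the night in question, which may help confirm whether the backup was really the last holder of that slot.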


Has anyone else come across this problem?

Thanks

Julian

All Replies

Posted by jmls on 16-Mar-2010 07:00

Some further info: This was taken from one of several protrace files:

//=====================================================
PROGRESS stack trace as of Tue Mar 16 00:11:52 2010
//=====================================================
Exception code: C0000005 ACCESS_VIOLATION
Fault address:  0063576A 01:0023476A C:\Progress\OpenEdge\bin\_sqlsrv2.exe
Registers:
EAX:10020BD8
EBX:00000000
ECX:00000005
EDX:00408540
ESI:00DD0D40
EDI:00DD0C90
CS:EIP:001B:0063576A
SS:ESP:0023:0012D454  EBP:0012D464
DS:0023  ES:0023  FS:003B  GS:0000
Flags:00210287
Call Stack:
Address   Frame
0063576A  0012D464  dsmContextGetLong+21A
00408558  0012D4B8  0001:00007558 C:\Progress\OpenEdge\bin\_sqlsrv2.exe
00408331  0000000B  0001:00007331 C:\Progress\OpenEdge\bin\_sqlsrv2.exe

Does this give any clue as to what may be happening?

Posted by ChUIMonster on 16-Mar-2010 07:03

Since it is apparently going to crash anyway... what happens if you shut it down and do an offline backup? (IOW, is the problem exclusive to online backups?)

Posted by jmls on 16-Mar-2010 07:09

When I bring the db back up, I back it up online without a problem. I
suspect that it's a combination of the sql92 server crashes and the
online backup that is causing this.

Julian

Posted by ChUIMonster on 16-Mar-2010 07:14

How soon after restarting do the SQL server crashes resume?

Does an online backup right after a restart succeed?

Posted by jmls on 16-Mar-2010 07:24

1) sometime during the following day.

2) yes. (see previous)

Posted by jmls on 16-Mar-2010 11:37

Anyone? Backup time is approaching.

Posted by rvkanten on 17-Nov-2014 03:27

Was the problem eventually solved, and if so, how?

A customer of ours has the same problem.

Their stack trace bears a remarkable resemblance to yours:

//=====================================================
PROGRESS stack trace as of Fri Nov 14 21:35:07 2014
//=====================================================
Exception code: C0000005 ACCESS_VIOLATION
Fault address:  0063997A 01:0023897A D:\Progress\OE102B\bin\_sqlsrv2.exe
Registers:
EAX:100236D0
EBX:00000000
ECX:00000006
EDX:004085D0
ESI:00911860
EDI:009117B0
CS:EIP:0023:0063997A
SS:ESP:002B:0018D414  EBP:0018D424
DS:002B  ES:002B  FS:0053  GS:002B
Flags:00210283
Call Stack:
Address   Frame
0063997A  0018D424  dsmContextGetLong+21A
004085E8  0018D47C  0001:000075E8 D:\Progress\OE102B\bin\_sqlsrv2.exe
004083C1  0018D48C  0001:000073C1 D:\Progress\OE102B\bin\_sqlsrv2.exe
739D623B  0018D4A2  fopen+0
DFA00000  0018D4A6  0000:00000000

Posted by TheMadDBA on 17-Nov-2014 13:06

Are they running 10.2B with no service packs or low SP numbers (2, 3, etc.)? There were a substantial number of bugs on the SQL92 side in those old versions.

Make sure they are at least 10.2B07 or 10.2B08 (the last SP/update for 10.2B).
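
If it helps to check this across several customer machines, here is a minimal Python sketch of the comparison. It assumes the release is available as a string shaped like "10.2B" or "10.2B07" (for example, copied from wherever the installation reports its version; where that string comes from is left open here). The function names parse_release and meets_minimum are hypothetical, and releases without a letter suffix are not handled.

```python
import re

# Assumed shape: <major>.<minor><letter>[<service pack>], e.g. "10.2B" or "10.2B07".
VERSION_RE = re.compile(r"(\d+)\.(\d+)([A-Z])(\d+)?", re.IGNORECASE)


def parse_release(text):
    """Return (major, minor, letter, service_pack) for the first release
    string found in text; a missing service pack counts as 0."""
    m = VERSION_RE.search(text)
    if m is None:
        raise ValueError(f"no release string found in {text!r}")
    major, minor, letter, sp = m.groups()
    return (int(major), int(minor), letter.upper(), int(sp) if sp else 0)


def meets_minimum(text, minimum="10.2B07"):
    """True if the release found in text is at or above the minimum."""
    return parse_release(text) >= parse_release(minimum)


for candidate in ("10.2b", "10.2B03", "10.2B07", "10.2B08"):
    print(candidate, meets_minimum(candidate))
```

The tuple comparison orders releases by major, minor, letter, and then service pack, so a base 10.2B install (service pack 0) falls below the 10.2B07 threshold.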

This thread is closed