We recently upgraded our production server to 10.2B, and since then
we have had some problems.
Every night, a full online backup is performed. Every night, the
backup fails because the database shuts down with the following
message:
[2010/03/16@00:11:50.757+0000] P-3360 T-3196 I WDOG 63:
(5028) SYSTEM ERROR: Releasing regular latch. latchId: 18
[2010/03/16@00:11:50.757+0000] P-3360 T-3196 I WDOG 54:
(2522) User 63 died holding 1 shared memory locks.
This then brings down the whole database:
[2010/03/16@00:11:50.991+0000] P-944 T-3716 I SRV 28:
(2520) Stopped.
[2010/03/16@00:11:52.179+0000] P-1896 T-3580 I BROKER 0:
(15192) The database will complete shutdown within approximately 60
seconds.
[2010/03/16@00:11:52.194+0000] P-1896 T-3580 I BROKER 0:
(2249) Begin ABNORMAL shutdown code 2
[2010/03/16@00:11:52.460+0000] P-3076 T-2872 I AIMGT 51: (453)
Logout by SYSTEM on CON:.
[2010/03/16@00:11:53.194+0000] P-1896 T-3580 I BROKER 0:
(2527) Disconnecting dead user 63.
This is obviously not acceptable.
User 63 seemed to be the actual backup process:
Tue Mar 16 00:01:01 2010
[2010/03/16@00:01:01.347+0000] P-456 T-3724 I RFUTIL 63: (452)
Login by administrator on CON:.
[2010/03/16@00:01:01.347+0000] P-456 T-3724 I RFUTIL 63:
(7129) Usr 63 set name to Aimage list.
[2010/03/16@00:01:02.362+0000] P-456 T-3724 I RFUTIL 63: (453)
Logout by Aimage list on CON:.
[2010/03/16@00:01:02.425+0000] P-2472 T-4020 I RFUTIL 63: (452)
Login by administrator on CON:.
[2010/03/16@00:01:02.440+0000] P-2472 T-4020 I RFUTIL 63:
(7129) Usr 63 set name to Aimage new.
[2010/03/16@00:01:02.456+0000] P-2472 T-4020 I RFUTIL 63:
(3777) Switched to ai extent G:\MyDBLive\mydb.a2.
[2010/03/16@00:01:02.456+0000] P-2472 T-4020 I RFUTIL 63:
(3778) This is after-image file number 298 since the last AIMAGE
BEGIN
[2010/03/16@00:01:03.456+0000] P-2472 T-4020 I RFUTIL 63: (453)
Logout by Aimage new on CON:.
[2010/03/16@00:01:03.565+0000] P-940 T-3548 I BACKUP 63:
(-----) Login by administrator.
[2010/03/16@00:01:03.628+0000] P-940 T-3548 I BACKUP 63:
(12850) Backup blocks will be written to C:\data\mydbbackup\live.bak.
[2010/03/16@00:01:03.628+0000] P-940 T-3548 I BACKUP 63:
(1362) Full backup started.
[2010/03/16@00:01:03.628+0000] P-940 T-3548 I BACKUP 63:
(6686) 20484617 active blocks out of 20484932 blocks in c:\mydb will
be dumped.
[2010/03/16@00:01:03.628+0000] P-940 T-3548 I BACKUP 63:
(6688) 4096 BI blocks will be dumped.
[2010/03/16@00:01:03.628+0000] P-940 T-3548 I BACKUP 63:
(9285) Backup requires an estimated 78.4 GBytes of media.
[2010/03/16@00:01:03.628+0000] P-940 T-3548 I BACKUP 63:
(9286) Restore would require an estimated 20542790 db blocks using
19.6M of media.
[2010/03/16@00:01:03.628+0000] P-940 T-3548 I BACKUP 63:
(3777) Switched to ai extent G:\mydbLive\mydb.a3.
[2010/03/16@00:01:03.643+0000] P-940 T-3548 I BACKUP 63:
(3778) This is after-image file number 299 since the last AIMAGE
BEGIN
[2010/03/16@00:01:03.862+0000] P-940 T-3548 I BACKUP 63:
(5459) Begin backup of Before Image file(s).
[2010/03/16@00:01:06.784+0000] P-940 T-3548 I BACKUP 63:
(5460) End backup of Before Image file(s).
[2010/03/16@00:01:06.800+0000] P-940 T-3548 I BACKUP 63:
(5461) Begin backup of Data file(s).
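For anyone who wants to follow a single user number through a log like this, here is a rough Python sketch. It assumes the entry layout seen in the excerpts above (timestamp in brackets, then P-, T-, severity, process name, user number), and it treats any line that does not start with a timestamp as a continuation of the previous entry, because the entries in this post are wrapped. The function names parse_entries and history_for_user are made up for illustration and are not part of any Progress tooling.

```python
import re

# Header layout assumed from the excerpts in this thread:
# [timestamp] P-<pid> T-<tid> <severity> <process> <user number>: <message>
HEADER = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+P-(?P<pid>\d+)\s+T-(?P<tid>\d+)\s+"
    r"(?P<sev>\w)\s+(?P<proc>\w+)\s+(?P<usr>-?\d+):\s*(?P<rest>.*)$"
)


def parse_entries(text):
    """Split log text into entries, re-joining wrapped message lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            entry = match.groupdict()
            entry["msg"] = entry.pop("rest")
            entries.append(entry)
        elif entries:
            # No timestamp: assume this is the wrapped tail of the last entry.
            entries[-1]["msg"] = (entries[-1]["msg"] + " " + line).strip()
    return entries


def history_for_user(entries, usr):
    """Return (timestamp, pid, process, message) for one user number."""
    return [
        (e["ts"], e["pid"], e["proc"], e["msg"])
        for e in entries
        if e["usr"] == str(usr)
    ]
```

Calling history_for_user(parse_entries(log_text), 63) on the excerpts above would list every process that logged in as user 63, with its PID. That makes it easier to see which process held the user number when the watchdog reported it dead. Date-stamp lines such as "Tue Mar 16 00:01:01 2010" are folded into the preceding entry, so a real log file may need an extra filter for those.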
Has anyone else come across this problem?
Thanks
Julian
Some further info: This was taken from one of several protrace files:
//=====================================================
PROGRESS stack trace as of Tue Mar 16 00:11:52 2010
//=====================================================
Exception code: C0000005 ACCESS_VIOLATION
Fault address: 0063576A 01:0023476A C:\Progress\OpenEdge\bin\_sqlsrv2.exe
Registers:
EAX:10020BD8
EBX:00000000
ECX:00000005
EDX:00408540
ESI:00DD0D40
EDI:00DD0C90
CS:EIP:001B:0063576A
SS:ESP:0023:0012D454 EBP:0012D464
DS:0023 ES:0023 FS:003B GS:0000
Flags:00210287
Call Stack:
Address Frame
0063576A 0012D464 dsmContextGetLong+21A
00408558 0012D4B8 0001:00007558 C:\Progress\OpenEdge\bin\_sqlsrv2.exe
00408331 0000000B 0001:00007331 C:\Progress\OpenEdge\bin\_sqlsrv2.exe
Since it is apparently going to crash anyway... what happens if you shut it down and do an offline backup? (In other words, is the problem exclusive to online backup?)
When I bring the db back up, I can back it up online without a problem. I
suspect that it's a combination of the SQL92 server crashes and the
online backup that is causing this.
Julian
How soon after restarting do the SQL server crashes resume?
Does an online backup right after a restart succeed?
1) Sometime during the following day.
2) Yes (see previous).
Anyone? Backup time is approaching.
Was the problem eventually solved and if so, how?
A customer of ours has the same problem.
Their stack trace shows a remarkable resemblance to yours:
//=====================================================
PROGRESS stack trace as of Fri Nov 14 21:35:07 2014
//=====================================================
Exception code: C0000005 ACCESS_VIOLATION
Fault address: 0063997A 01:0023897A D:\Progress\OE102B\bin\_sqlsrv2.exe
Registers:
EAX:100236D0
EBX:00000000
ECX:00000006
EDX:004085D0
ESI:00911860
EDI:009117B0
CS:EIP:0023:0063997A
SS:ESP:002B:0018D414 EBP:0018D424
DS:002B ES:002B FS:0053 GS:002B
Flags:00210283
Call Stack:
Address Frame
0063997A 0018D424 dsmContextGetLong+21A
004085E8 0018D47C 0001:000075E8 D:\Progress\OE102B\bin\_sqlsrv2.exe
004083C1 0018D48C 0001:000073C1 D:\Progress\OE102B\bin\_sqlsrv2.exe
739D623B 0018D4A2 fopen+0
DFA00000 0018D4A6 0000:00000000
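To make the comparison between the two traces less of an eyeball exercise, here is a small Python sketch that pulls out the call-stack frames resolved to symbol+offset and reports the ones the two protrace excerpts have in common. The frame layout (address, frame pointer, symbol+hex offset) is assumed from the two excerpts in this thread only, and symbol_frames is a hypothetical helper name.

```python
import re

# Matches resolved frames such as "0063576A 0012D464 dsmContextGetLong+21A".
# Unresolved frames ("0001:00007558 C:\...") are deliberately skipped.
FRAME = re.compile(
    r"^[0-9A-Fa-f]{8}\s+[0-9A-Fa-f]{8}\s+(\w+)\+([0-9A-Fa-f]+)$"
)


def symbol_frames(trace):
    """Return (symbol, offset) pairs for every resolved frame in a trace."""
    frames = []
    for raw in trace.splitlines():
        match = FRAME.match(raw.strip())
        if match:
            frames.append((match.group(1), match.group(2).upper()))
    return frames


def common_frames(trace_a, trace_b):
    """Return the resolved frames that appear in both traces."""
    return set(symbol_frames(trace_a)) & set(symbol_frames(trace_b))
```

The absolute addresses differ between the two installs, so only symbol name and offset are compared. A shared symbol and offset suggests the same code path but does not prove the same root cause.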
Are they running 10.2B with no service packs or a low SP number (2, 3, etc.)? There were a substantial number of bugs on the SQL92 side in those old versions.
Make sure they are at least 10.2B07 or 10.2B08 (the last SP/update for 10.2B).