The first Sunday of every month, we run proutil -C dbanalys against all databases to determine dump and load needs, track extent growth, etc.
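For context, the monthly pass is nothing fancy - roughly a loop like the sketch below (the database list, report directory, and DLC path are placeholders, not our actual script):

#!/bin/sh
# Monthly dbanalys pass: run proutil -C dbanalys against each database and
# keep the report for dump/load and extent-growth review.
DLC=/usr/dlc; export DLC
OUTDIR=/reports/dbanalys/`date +%Y%m`
mkdir -p $OUTDIR

for db in /db/prod/appl /db/prod/hist       # placeholder database list
do
    name=`basename $db`
    $DLC/bin/proutil $db -C dbanalys > $OUTDIR/$name.dbanalys 2>&1
done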
This morning, a database crashed with the error:
Routine upanalys releasing forgotten buffer lock on (6, 80768)
Error getting first extent for area 0, error = -20006. (7113)
** Save file named core for analysis by Progress Software Corporation. (439)
This appears in both the .lg file and the dbanalys output. The dbanalys itself completed successfully. We have a trace file that we will get to Progress on Monday morning, but this is something I haven't seen before and I was wondering if anyone has any ideas.
The database restarted and has been functioning without error since the crash - no further messages in the log.
OE 10.1B SP3/HPUX 11.11
This is what the "old KB" says about 7113:
---
Status: Verified
SYMPTOM(s):
Error getting first extent for area , error = . (7113)
Error getting first extent for area , error = -20006. (7113)
Error when trying to start a database broker or client
Error when trying to start a database in single session
Failed to open file protrace.13682 errno 13 (1263)
SYSTEM ERROR: The broker is exiting unexpectedly, beginning Abnormal Shutdown. (5292)
A previous prorest /db/81d/live/appl failed. (614)
!!! ERROR - Database restore utility FAILED !!! (8564)
Backup failed due to EOF during next output device request. (5057)
FACT(s) (Environment):
All Supported Operating Systems
Progress/OpenEdge Product Family
CAUSE:
The client was "hotspare'ing" the live database every night (probkup) and prorest'ing it through scripts, without validation. Users access this database for querying and reports in multi-user mode and get error 7113. A single-user connection produced the same result. The live database was connecting without errors.
The "hotspare" Log reads like this:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
prorest session begin for mxpadm on batch. (451)
Full restore started. (1368)
Started restoring volume 2. (3763)
Backup failed due to EOF during next output device request. (5057)
Restore failed. (1618)
!!! ERROR - Database restore utility FAILED !!! (8564)
prorest session end. (334)
BROKER 0: A previous prorest /db/81d/live/appl failed. (614)
BROKER 0: Multi-user session begin. (333)
BROKER 0: Error getting first extent for area 16, error = -20006. (7113)
BROKER 0: SYSTEM ERROR: The broker is exiting unexpectedly, beginning Abnormal Shutdown. (5292)
BROKER 0: drexit: Initiating Abnormal Shutdown
BROKER 0: ** Save file named core for analysis by Progress Software Corporation. (439)
BROKER 0: Failed to open file protrace.13682 errno 13 (1263)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
In the log file, the problem started with the incomplete restore of a backup. -20006 corresponds to DSM_S_AREA_NO_EXTENT or DSM_S_AREA_NULL and means: "Passed area not found".
Quite simply, the storage areas that were expected were not present because the backup ran out of disc space: "Backup failed due to EOF during next output device request. (5057)"
FIX:
Initially, resolve disc space issues:
Running probkup db-name /dev/null -scan will NOT make a backup, but it will show how many MB of media are needed to back up ALL the database blocks. Consider allocating (more) storage space on the disc for this backup procedure. Effectively, the error records that users are trying to connect to an "incomplete copy" of the live database.
Finally, review cron jobs and insert verification checks like:
prorest dbname -vf
to catch a "bad backup" before users try to access it the following morning.
Consider allocating reserved storage space on disc for this process.
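A rough sketch of what that verification might look like in the hotspare cron job - the paths, backup file name, and mail alert here are assumptions, not from the KB:

#!/bin/sh
# Hotspare refresh with the checks the KB suggests: estimate backup size,
# then restore and verify before users touch the spare copy.
DLC=/usr/dlc; export DLC
LIVE=/db/81d/live/appl           # live database (path from the log above)
SPARE=/db/81d/spare/appl         # hotspare copy users query (assumed path)
BACKUP=/backup/appl.bck          # backup file written by probkup (assumed path)

# Report how many MB of media a full backup needs; writes nothing.
$DLC/bin/probkup $LIVE /dev/null -scan

# Restore the spare, then verify it against the backup (-vf) before users
# connect.  The real cron job would handle overwriting the old spare copy.
if $DLC/bin/prorest $SPARE $BACKUP && $DLC/bin/prorest $SPARE $BACKUP -vf
then
    echo "Hotspare refresh OK" >> /var/adm/hotspare.log
else
    echo "Hotspare restore/verify FAILED for $BACKUP" | mailx -s "bad hotspare" dba
    exit 1
fi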
Thanks, Tim
First, it was my bad: I typed 7116 instead of 7113. This wasn't a disk space issue - we had plenty - but it being a disk problem makes sense. We've had ongoing issues with our network storage and proutil -C busy: on rare occasions it would return a nonstandard return code (like a 13), which our monitoring script would interpret as a database failure, resulting in the database being restarted through ServiceGuard. We've since changed the script to check twice to confirm the failure before restarting the database.
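For anyone hitting the same thing, the change to the monitoring script boils down to something like this (database path, retry delay, and the expected return code are assumptions about our setup):

#!/bin/sh
# Check proutil -C busy twice before treating the database as down, so a
# single flaky return code from network storage doesn't trigger a restart.
DLC=/usr/dlc; export DLC
DB=/db/prod/appl                 # placeholder database path

check_busy()
{
    $DLC/bin/proutil $DB -C busy > /dev/null 2>&1
    echo $?
}

rc1=`check_busy`
if [ "$rc1" != "6" ]             # 6 = in use, multi-user (assumed; verify for your release)
then
    sleep 60                     # give the storage a chance to settle
    rc2=`check_busy`
    if [ "$rc2" != "6" ]
    then
        echo "busy returned $rc1 then $rc2 - flagging failure" >> /var/adm/dbmon.log
        # ... only now let ServiceGuard restart the package ...
    fi
fi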
I'm guessing this was a case where proutil received a return code from a disk-related call that made no sense to it and shut down the database to protect it.
Again, thanks!!
The typo was also mine - I cut 'n' pasted 7113, not 7116, when I did the KB lookup. The reported KB was for 7113, not 7116. (I've edited my original post to reflect that.)
Even so, if it pointed you in the right direction, then I hope it led to a solution, and I was glad to help!