'IMPORT UNFORMATTED RawBlock' misses some data when reading large files. After offset 2147483648 (2 GB), it periodically skips 1024 bytes, no matter what block size you set with LENGTH(RawBlock) = BlockSize. Tested with Progress V10.2B and V11.4 on Windows. I can't test on Unix right now, but I have reason to think the bug exists there as well. The issue can be reproduced with the program below (my own investigation was more detailed). For the tests I used a large db extent and a large video file. My goal is to read object blocks from Progress databases.
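The offset where the trouble starts is worth noting: 2147483648 is 2^31, one more than the largest signed 32-bit integer (2147483647). As an illustration only, and as an assumption about the mechanism rather than something the tests establish, the sketch below shows how offsets at and past that point wrap to negative values if they are ever truncated to a signed 32-bit integer. The helper name `as_int32` is hypothetical.

```python
INT32_MAX = 2**31 - 1  # 2147483647, the largest signed 32-bit integer


def as_int32(offset: int) -> int:
    """Hypothetical helper: truncate an offset to a signed 32-bit value,
    wrapping around the way C-style int32 arithmetic does."""
    return (offset + 2**31) % 2**32 - 2**31


for off in (INT32_MAX, INT32_MAX + 1, 2150046720):
    print(off, "->", as_int32(off))
# 2147483647 -> 2147483647
# 2147483648 -> -2147483648
# 2150046720 -> -2144920576
```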
On Unix the workaround is to use the dd command to copy the data blocks to a temp file. Does anybody know a "native" Windows solution? I know there are implementations of dd for Windows, but I'd rather not use them.
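For comparison, a dd call that extracts one 8 KB block at block index 262144 (262144 * 8192 = 2147483648, i.e. offset 2 GB) looks like `dd if=<your.large.file> of=block.bin bs=8192 skip=262144 count=1`. A rough equivalent in Python, which uses 64-bit file offsets on both Windows and Unix, is sketched below as a cross-check tool, not as an ABL solution; `copy_blocks` is a hypothetical name.

```python
def copy_blocks(src: str, dst: str, bs: int, skip: int, count: int) -> int:
    """Hypothetical dd-like helper: copy `count` blocks of `bs` bytes from
    `src`, starting `skip` blocks in, to `dst`
    (like `dd if=src of=dst bs=bs skip=skip count=count`).
    Returns the number of bytes written."""
    written = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        # Python ints are unbounded, so offsets beyond 2 GB or 4 GB are fine.
        fin.seek(skip * bs)
        for _ in range(count):
            block = fin.read(bs)
            if not block:  # end of file reached
                break
            fout.write(block)
            written += len(block)
    return written
```

Usage mirrors the dd example: `copy_blocks("<your.large.file>", "block.bin", 8192, 262144, 1)`.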
/* Program to reproduce the bug */
DEFINE VARIABLE vInputFile   AS CHARACTER NO-UNDO INITIAL "<your.large.file>".
DEFINE VARIABLE vBlockSize   AS INTEGER   NO-UNDO INITIAL 1024.
DEFINE VARIABLE vBlockOffset AS INT64     NO-UNDO INITIAL 0.

&SCOPED-DEFINE Sep "~t"

OUTPUT TO VALUE("GetBlock.txt").
PUT UNFORMATTED
    "Offset"
    {&Sep} "Shift"
    {&Sep} "Read"
    SKIP.

DO WHILE vBlockOffset LE 2150046720:
    RUN GetBlock(vInputFile, vBlockOffset, vBlockSize).
    ASSIGN vBlockOffset = vBlockOffset + vBlockSize.
END.
OUTPUT CLOSE.

PROCEDURE GetBlock.
    DEFINE INPUT PARAMETER ipInputFile   AS CHARACTER NO-UNDO.
    DEFINE INPUT PARAMETER ipBlockOffset AS INT64     NO-UNDO.
    DEFINE INPUT PARAMETER ipBlockSize   AS INTEGER   NO-UNDO.
    DEFINE VARIABLE vRawBlock AS RAW   NO-UNDO.
    DEFINE VARIABLE vOffset1  AS INT64 NO-UNDO.
    DEFINE VARIABLE vOffset2  AS INT64 NO-UNDO.

    INPUT FROM VALUE(ipInputFile) BINARY.
    SEEK INPUT TO ipBlockOffset.
    ASSIGN LENGTH(vRawBlock) = ipBlockSize
           vOffset1 = SEEK(INPUT).
    IMPORT UNFORMATTED vRawBlock.
    ASSIGN vOffset2 = SEEK(INPUT).
    INPUT CLOSE.

    IF vOffset2 - vOffset1 NE ipBlockSize THEN
        PUT UNFORMATTED
            vOffset1
            {&Sep} vOffset2 - vOffset1
            {&Sep} LENGTH(vRawBlock)
            SKIP.

    ASSIGN LENGTH(vRawBlock) = 0.
END PROCEDURE. /* GetBlock */
/* See the result in "GetBlock.txt" file */
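To confirm the symptom independently of SEEK(INPUT), a block returned by IMPORT UNFORMATTED (saved to disk, for example) can be compared against a reference read of the same file. The sketch below, a hypothetical `find_shift` helper in Python rather than ABL, searches for the shift k at which the returned block actually starts: for a correct read k is 0, and if the bug behaves as described, k would be 1024. On highly repetitive data, such as empty db blocks, several values of k can match and the smallest one is returned, so non-repeating content gives the clearest answer.

```python
def find_shift(path: str, offset: int, block: bytes, max_shift: int = 4096):
    """Hypothetical helper: return the smallest k (0..max_shift) such that
    `block` equals the file bytes starting at offset + k, or None."""
    with open(path, "rb") as f:
        f.seek(offset)
        # Read enough bytes to cover every candidate shift.
        window = f.read(len(block) + max_shift)
    for k in range(max_shift + 1):
        if window[k:k + len(block)] == block:
            return k
    return None
```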
Regards,
George
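Workaround: the missed chunk of input data (the first 1K of the block) can be re-read with the READKEY statement, like this:
/* Assumed context: the body of a GetBlock-like procedure with these parameters */
DEFINE INPUT  PARAMETER ipInputFile   AS CHARACTER NO-UNDO.
DEFINE INPUT  PARAMETER ipBlockOffset AS INT64     NO-UNDO.
DEFINE INPUT  PARAMETER ipBlockSize   AS INTEGER   NO-UNDO.
DEFINE OUTPUT PARAMETER opRawBlock    AS RAW       NO-UNDO.
DEFINE VARIABLE vReadSize AS INT64   NO-UNDO.
DEFINE VARIABLE vReadByte AS INTEGER NO-UNDO.

INPUT FROM VALUE(ipInputFile) BINARY.
SEEK INPUT TO ipBlockOffset.
ASSIGN LENGTH(opRawBlock) = ipBlockSize.
IMPORT UNFORMATTED opRawBlock.
ASSIGN vReadSize = SEEK(INPUT) - ipBlockOffset.
IF vReadSize GT ipBlockSize THEN
DO:
    /* vReadSize becomes the number of bytes skipped at the start of the block */
    ASSIGN vReadSize = vReadSize - ipBlockSize.
    /* Move the bytes that were read to their correct position */
    IF ipBlockSize GT vReadSize THEN
        PUT-BYTES(opRawBlock, vReadSize + 1) =
            GET-BYTES(opRawBlock, 1, ipBlockSize - vReadSize).
    /* Re-read the skipped head of the block byte by byte,
       never past the end of the block */
    SEEK INPUT TO ipBlockOffset.
    REPEAT vReadByte = 1 TO MINIMUM(vReadSize, ipBlockSize):
        READKEY PAUSE 0.
        IF LASTKEY LT 0 THEN
            LEAVE.
        PUT-BYTE(opRawBlock, vReadByte) = LASTKEY.
    END. /* REPEAT vReadByte */
END. /* IF vReadSize GT ipBlockSize */
INPUT CLOSE.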
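The byte shuffling in the workaround can be checked in isolation. The sketch below, a hypothetical `repair_block` in Python, models the same two steps on plain byte strings, assuming the faulty read returns the block shifted by `skipped` bytes: the data that was read is moved right by `skipped` bytes (the PUT-BYTES step) and the re-read head fills the gap (the READKEY step). It caps the head at the block size, which matters when the block is smaller than the skipped chunk.

```python
def repair_block(raw: bytes, skipped: int, head: bytes) -> bytes:
    """Hypothetical helper: rebuild a block whose first `skipped` bytes were lost.

    raw     -- what the faulty read returned: file[offset+skipped : offset+skipped+size]
    skipped -- (SEEK position after the read) - offset - size
    head    -- bytes re-read from `offset` (at least min(skipped, size) of them)
    """
    size = len(raw)
    n = min(skipped, size)
    # Keep the first size - n bytes that were read (they belong after the gap),
    # and put the re-read head in front of them.
    return head[:n] + raw[:size - n]
```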
The issue is logged as defect PSC00325355.
BTW, the COPY-LOB statement also has problems with large files:
COPY-LOB FROM FILE "<your.large.file>" STARTING AT <offset> FOR <length> TO MEMPTR.
If the file is larger than 4 GB, it raises the error:
Could not seek to appropriate position in file '<your.large.file>' during COPY-LOB (11320)
Thanks for sharing this with the rest of the community!
Hi George:
Quick update on the issue noted above, PSC00325355: it is resolved in 11.6.0. The fix addresses an issue in which the AVM did not process offsets greater than 2 GB properly. Do you have use cases in which the file you are processing is larger than 4 GB? Is that a common scenario?
Thanks
Evan
Hi Evan,
Thanks for the update.
Because some Progress utilities we need don't exist, I sometimes have to write programs that parse db files directly. And nowadays db extents larger than 4 GB are quite common.
Best regards,
George