'IMPORT UNFORMATTED RawBlock' loses data when reading large files. Beyond an offset of 2147483648 bytes (2 GB) it periodically skips 1024 bytes, regardless of the block size you set with LENGTH(RawBlock) = BlockSize. Tested with Progress V10.2B and V11.4 on Windows. I can't test on Unix right now, but I have reason to think the bug exists there as well. The issue can be reproduced with the program below (my own investigation was more detailed). For the tests I used a large db extent and a large video file. My goal is to read the object blocks from Progress databases.
The workaround on Unix is to use the dd command to copy the data blocks to a temp file. Does anybody know a "native" Windows solution? I know there are implementations of the dd command for Windows, but I would rather not use them.
/* Program to reproduce the bug */

DEFINE VARIABLE vInputFile   AS CHARACTER NO-UNDO INITIAL "<your.large.file>".
DEFINE VARIABLE vBlockSize   AS INTEGER   NO-UNDO INITIAL 1024.
DEFINE VARIABLE vBlockOffset AS INT64     NO-UNDO INITIAL 0.

&SCOPED-DEFINE Sep "~t"

OUTPUT TO VALUE("GetBlock.txt").
PUT UNFORMATTED "Offset" {&Sep} "Shift" {&Sep} "Read" SKIP.

DO WHILE vBlockOffset LE 2150046720:
  RUN GetBlock(vInputFile, vBlockOffset, vBlockSize).
  ASSIGN vBlockOffset = vBlockOffset + vBlockSize.
END.

OUTPUT CLOSE.

PROCEDURE GetBlock.
  DEFINE INPUT PARAMETER ipInputFile   AS CHARACTER NO-UNDO.
  DEFINE INPUT PARAMETER ipBlockOffset AS INT64     NO-UNDO.
  DEFINE INPUT PARAMETER ipBlockSize   AS INTEGER   NO-UNDO.

  DEFINE VARIABLE vRawBlock AS RAW   NO-UNDO.
  DEFINE VARIABLE vOffset1  AS INT64 NO-UNDO.
  DEFINE VARIABLE vOffset2  AS INT64 NO-UNDO.

  INPUT FROM VALUE(ipInputFile) BINARY.
  SEEK INPUT TO ipBlockOffset.
  ASSIGN LENGTH(vRawBlock) = ipBlockSize
         vOffset1 = SEEK(INPUT).
  IMPORT UNFORMATTED vRawBlock.
  ASSIGN vOffset2 = SEEK(INPUT).
  INPUT CLOSE.

  IF vOffset2 - vOffset1 NE ipBlockSize THEN
    PUT UNFORMATTED vOffset1 {&Sep} vOffset2 - vOffset1 {&Sep} LENGTH(vRawBlock) SKIP.

  ASSIGN LENGTH(vRawBlock) = 0.
END PROCEDURE. /* GetBlock */

/* See the result in "GetBlock.txt" file */
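For reference, the Unix dd workaround mentioned above could be wrapped in a procedure along these lines. This is only a sketch: the procedure name GetBlockViaDd and the temp file name /tmp/GetBlock.tmp are made up for illustration, and it assumes that ipBlockOffset is a multiple of ipBlockSize and that dd is on the PATH. Because the temp file holds a single block, IMPORT UNFORMATTED reads it from offset 0, far below the 2 GB boundary.

```abl
/* Sketch: read one block through a temp file created by dd (Unix only). */
PROCEDURE GetBlockViaDd:
  DEFINE INPUT  PARAMETER ipInputFile   AS CHARACTER NO-UNDO.
  DEFINE INPUT  PARAMETER ipBlockOffset AS INT64     NO-UNDO.
  DEFINE INPUT  PARAMETER ipBlockSize   AS INTEGER   NO-UNDO.
  DEFINE OUTPUT PARAMETER opRawBlock    AS RAW       NO-UNDO.

  DEFINE VARIABLE vTmpFile AS CHARACTER NO-UNDO.
  DEFINE VARIABLE vSkip    AS INT64     NO-UNDO.

  /* Assumption: the offset is a multiple of the block size,
     so dd can address the block by its ordinal number. */
  ASSIGN vSkip    = INT64(TRUNCATE(ipBlockOffset / ipBlockSize, 0))
         vTmpFile = "/tmp/GetBlock.tmp". /* hypothetical temp file name */

  /* Copy exactly one block of the large file to the temp file. */
  OS-COMMAND SILENT VALUE(SUBSTITUTE(
    'dd if="&1" of="&2" bs=&3 skip=&4 count=1 2>/dev/null',
    ipInputFile, vTmpFile, STRING(ipBlockSize), STRING(vSkip))).

  /* Read the small temp file the same way as in the program above. */
  INPUT FROM VALUE(vTmpFile) BINARY.
  ASSIGN LENGTH(opRawBlock) = ipBlockSize.
  IMPORT UNFORMATTED opRawBlock.
  INPUT CLOSE.

  OS-DELETE VALUE(vTmpFile).
END PROCEDURE. /* GetBlockViaDd */
```

Spawning dd for every block is slow, so this only makes sense when a handful of blocks are needed.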
Regards,
George
Workaround: the missed chunk of input data can be re-read with the READKEY statement, just the 1K at the beginning of the block. Like this:
INPUT FROM VALUE(ipInputFile) BINARY.
SEEK INPUT TO ipBlockOffset.
ASSIGN LENGTH(opRawBlock) = ipBlockSize.
IMPORT UNFORMATTED opRawBlock.
ASSIGN vReadSize = SEEK(INPUT) - ipBlockOffset.

IF vReadSize GT ipBlockSize THEN
DO:
  ASSIGN vReadSize = vReadSize - ipBlockSize.

  IF ipBlockSize GT vReadSize THEN
    PUT-BYTES(opRawBlock, vReadSize + 1) =
      GET-BYTES(opRawBlock, 1, ipBlockSize - vReadSize).

  SEEK INPUT TO ipBlockOffset.
  REPEAT vReadByte = 1 TO vReadSize:
    READKEY PAUSE 0.
    IF LASTKEY LT 0 THEN LEAVE.
    PUT-BYTE(opRawBlock, vReadByte) = LASTKEY.
  END. /* REPEAT vReadByte */
END. /* IF vReadSize GT ipBlockSize */

INPUT CLOSE.
The issue is logged as defect PSC00325355.
BTW, the COPY-LOB statement also has problems with large files:
COPY-LOB FROM FILE "<your.large.file>" STARTING AT <offset> FOR <length> TO MEMPTR.
If the file is larger than 4 GB, it raises the error:
Could not seek to appropriate position in file '<your.large.file>' during COPY-LOB (11320)
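A minimal sketch of how that failure could be trapped so the caller can fall back to another read method. The variable names are illustrative, and the "+ 1" rests on my assumption that STARTING AT takes a one-based position; check that against the documentation for your release.

```abl
/* Sketch: read one block with COPY-LOB and detect the seek failure. */
DEFINE VARIABLE vFile   AS CHARACTER NO-UNDO INITIAL "<your.large.file>".
DEFINE VARIABLE vOffset AS INT64     NO-UNDO INITIAL 0.    /* zero-based block offset */
DEFINE VARIABLE vLength AS INTEGER   NO-UNDO INITIAL 1024.
DEFINE VARIABLE vBlock  AS MEMPTR    NO-UNDO.

/* Assumption: STARTING AT is one-based, hence the + 1. */
COPY-LOB FROM FILE vFile STARTING AT vOffset + 1 FOR vLength TO vBlock NO-ERROR.

IF ERROR-STATUS:ERROR OR ERROR-STATUS:NUM-MESSAGES GT 0 THEN
  /* On files over 4 GB this is where error 11320 is expected to show up. */
  MESSAGE "COPY-LOB failed:" ERROR-STATUS:GET-MESSAGE(1) VIEW-AS ALERT-BOX.
ELSE
  MESSAGE "Read" GET-SIZE(vBlock) "bytes" VIEW-AS ALERT-BOX.

SET-SIZE(vBlock) = 0. /* release the memory held by the MEMPTR */
```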
Thanks for sharing with the rest of the community!
Hi George:
Quick update on the issue noted above - PSC00325355. This issue is resolved in 11.6.0. The fix addresses a problem in which the AVM did not process offsets greater than 2 GB properly. Do you have use cases in which the file you are processing is larger than 4 GB? Is that a common scenario?
Thanks
Evan
Hi Evan,
Thanks for the update.
Because some Progress utilities that we need do not exist, I am sometimes forced to write programs that parse db files directly. And nowadays db extents larger than 4 GB are quite common.
Best regards,
George