IMPORT raw data is missing the 1K chunks after 2GB offset

Posted by George Potemkin on 02-Jan-2015 14:35

'IMPORT UNFORMATTED RawBlock' misses some data while reading large files: past an offset of 2147483648 bytes (2 GB) it periodically skips 1024 bytes, no matter what block size you set via LENGTH(RawBlock) = BlockSize. Tested with Progress V10.2B and V11.4 on Windows. I can't test on Unix right now, but I have reason to think the bug exists there as well. The issue can be reproduced with the program below (my own investigation was more detailed). For the tests I used a large db extent and a large video file. My goal is to read object blocks from Progress databases.

The workaround on Unix is to use the dd command to copy the data blocks to a temp file. Does anybody know a "native" Windows solution? I know there are implementations of the dd command for Windows, but I'd rather not use them.
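For example, something along these lines (a sketch: block.tmp is an arbitrary temp file name, and skip/count are counted in bs-sized units, so skip=2097152 with bs=1024 starts exactly at the 2147483648-byte offset):

  dd if=<your.large.file> of=block.tmp bs=1024 skip=2097152 count=1

The temp file can then be read safely with IMPORT, since it is small and starts at offset 0.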

/* Program to reproduce the bug */
DEFINE VARIABLE vInputFile AS CHARACTER   NO-UNDO INITIAL "<your.large.file>".
DEFINE VARIABLE vBlockSize AS INTEGER NO-UNDO INITIAL 1024.
DEFINE VARIABLE vBlockOffset AS INT64 NO-UNDO INITIAL 0.
 
&SCOPED-DEFINE Sep "~t"
 
OUTPUT TO VALUE("GetBlock.txt").
PUT UNFORMATTED 
         "Offset"
  {&Sep} "Shift"
  {&Sep} "Read"
SKIP.
 
DO WHILE vBlockOffset LE 2150046720: /* just past the 2 GB (2147483648-byte) boundary */
  RUN GetBlock(vInputFile, vBlockOffset, vBlockSize).
  ASSIGN vBlockOffset = vBlockOffset + vBlockSize.
END.
OUTPUT CLOSE.
 
PROCEDURE GetBlock.
 
  DEFINE INPUT  PARAMETER ipInputFile   AS CHARACTER NO-UNDO.
  DEFINE INPUT  PARAMETER ipBlockOffset AS INT64     NO-UNDO.
  DEFINE INPUT  PARAMETER ipBlockSize   AS INTEGER   NO-UNDO.
  DEFINE VARIABLE vRawBlock    AS RAW       NO-UNDO.
 
  DEFINE VARIABLE vOffset1 AS INT64 NO-UNDO.
  DEFINE VARIABLE vOffset2 AS INT64 NO-UNDO.
 
  INPUT FROM VALUE(ipInputFile) BINARY.
  SEEK INPUT TO ipBlockOffset.
  ASSIGN LENGTH(vRawBlock) = ipBlockSize /* IMPORT will read this many bytes */
         vOffset1 = SEEK(INPUT).
  IMPORT UNFORMATTED vRawBlock.
  ASSIGN vOffset2 = SEEK(INPUT).         /* should equal vOffset1 + ipBlockSize */
  INPUT CLOSE.
 
  /* Report any block where IMPORT advanced the file position
     by other than the requested block size: */
  IF vOffset2 - vOffset1 NE ipBlockSize THEN
  PUT UNFORMATTED 
           vOffset1 
    {&Sep} vOffset2 - vOffset1 
    {&Sep} LENGTH(vRawBlock)
  SKIP.
 
  ASSIGN LENGTH(vRawBlock) = 0. /* release the RAW buffer */
END PROCEDURE. /* GetBlock */
 
/* See the result in the "GetBlock.txt" file */


Regards,

George

All Replies

Posted by George Potemkin on 03-Jan-2015 05:31

Workaround: the missed chunk of input data can be re-read using the READKEY statement - just 1K from the beginning of the block. Like this:

  /* Fragment of a revised GetBlock: opRawBlock stands for the RAW
     block returned by the procedure (e.g. as an OUTPUT parameter),
     and two work variables are needed: */
  DEFINE VARIABLE vReadSize AS INT64 NO-UNDO.
  DEFINE VARIABLE vReadByte AS INT64 NO-UNDO.

  INPUT FROM VALUE(ipInputFile) BINARY.
  SEEK INPUT TO ipBlockOffset.
  ASSIGN LENGTH(opRawBlock) = ipBlockSize.
  IMPORT UNFORMATTED opRawBlock.
  ASSIGN vReadSize = SEEK(INPUT) - ipBlockOffset.

  /* If IMPORT advanced the file position by more than the block
     size, the difference is the size of the chunk it skipped at
     the beginning of the block: */
  IF vReadSize GT ipBlockSize THEN
  DO:
    ASSIGN vReadSize = vReadSize - ipBlockSize.
    /* Shift the bytes that were read towards the end of the
       buffer to make room for the skipped chunk: */
    IF ipBlockSize GT vReadSize THEN
    PUT-BYTES(opRawBlock, vReadSize + 1) =
    GET-BYTES(opRawBlock, 1, ipBlockSize - vReadSize).
    /* Re-read the skipped bytes one by one with READKEY: */
    SEEK INPUT TO ipBlockOffset.
    REPEAT vReadByte = 1 TO vReadSize:
      READKEY PAUSE 0.
      IF LASTKEY LT 0 THEN /* end of file */
      LEAVE.
      PUT-BYTE(opRawBlock, vReadByte) = LASTKEY.
    END. /* REPEAT vReadByte */
  END. /* IF vReadSize GT ipBlockSize */
  INPUT CLOSE.


Posted by George Potemkin on 19-Jan-2015 07:40

The issue is logged as defect PSC00325355.

BTW, the COPY-LOB statement also has problems with large files:

COPY-LOB FROM FILE "<your.large.file>" STARTING AT <offset> FOR <length> TO MEMPTR.

If the file is larger than 4 GB, it issues the error:

Could not seek to appropriate position in file '<your.large.file>' during COPY-LOB (11320)
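For reference, a minimal sketch of such a call (the MEMPTR variable, the offset just past the 2 GB boundary, and the 8K length are all illustrative):

  DEFINE VARIABLE vBlock AS MEMPTR NO-UNDO.

  /* Read an 8K chunk starting just past the 2 GB boundary;
     COPY-LOB allocates the MEMPTR itself. Per the above, this
     raises error 11320 when the source file is larger than 4 GB: */
  COPY-LOB FROM FILE "<your.large.file>" STARTING AT 2147483649 FOR 8192 TO vBlock.

  SET-SIZE(vBlock) = 0. /* release the memory */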

Posted by Jean Richert on 28-Jan-2015 07:55

Thanks for sharing with the rest of the community, George!

Posted by Evan Bleicher on 09-Oct-2015 10:31

Hi George:

Quick update on the issue noted above - PSC00325355. This issue is resolved in 11.6.0. The resolution addressed an issue in which the AVM did not process offsets greater than 2 GB properly. Do you have use cases in which the file you are processing is greater than 4 GB in size? Is that a common scenario?

Thanks

Evan

Posted by George Potemkin on 09-Oct-2015 10:58

Hi Evan,

Thanks for update.

Due to the lack of some Progress utilities that we need, I'm sometimes forced to write programs that parse db files directly. And nowadays db extents larger than 4 GB are quite a common case.

Best regards,

George
