Extract a PDF from a data file

Posted by Admin on 25-May-2011 11:15

I have a data file with a PDF embedded in it. I want to extract the PDF data and write it to a separate file.

I can do it manually, but I cannot find the correct command to do it in Progress.

Thank you

All Replies

Posted by Admin on 25-May-2011 14:04

Can you please be a bit more specific about what exactly you are trying to do? How do you define embedded? Are you referring to a PDF document that has 'data' in it (like customer info in the header, order data in some document section, and so on)?

Posted by Admin on 25-May-2011 14:08

The PDF file is simply the PDF file opened as raw data and put into the data file.

The data file layout is like this:

Header

Data

Header

PDF Data.

Thank you for taking an interest in this...

Posted by Admin on 25-May-2011 14:12

dchalom wrote:

The PDF file is simply the PDF file opened as raw data and put into the data file.

You say you can do it manually. What does that mean? From within Acrobat or Acrobat Reader? Do you use copy & paste?

Posted by Admin on 25-May-2011 14:20

Yes, I open the data file in an editor that does not modify the data, strip out the non-PDF part, and save the file as a PDF.

Posted by Admin on 25-May-2011 14:37

dchalom wrote:

Yes, I open the data file in an editor that does not modify the data, strip out the non-PDF part, and save the file as a PDF.

Sounds a bit odd... But if it is basically just a plain-text operation (and you're on OpenEdge 10), define a LONGCHAR variable and use the COPY-LOB statement to read the file into it. Then you can use ABL functions such as INDEX and SUBSTRING to extract parts of the LONGCHAR.

Posted by Admin on 25-May-2011 14:41

Yes, but it is not just plain text. The PDF part of the data file contains control codes and the like, and when I tried a CHARACTER data type it stripped out everything that was not a character. I tried LONGCHAR, but IMPORT/PUT does not accept LONGCHAR.

Posted by Admin on 25-May-2011 14:52

Then COPY-LOB it into a MEMPTR and use GET-BYTE and the related functions.
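A minimal sketch of that suggestion, assuming the embedded PDF begins with the standard "%PDF-" signature and that those five bytes do not also occur earlier in the headers. The file name "datafile.dat" and all variable names are placeholders, not anything from the original poster's system.

```abl
/* Sketch: find the byte offset where the embedded PDF starts. */
DEFINE VARIABLE mData  AS MEMPTR NO-UNDO.
DEFINE VARIABLE iSize  AS INT64  NO-UNDO.
DEFINE VARIABLE iPos   AS INT64  NO-UNDO.
DEFINE VARIABLE iStart AS INT64  NO-UNDO INITIAL 0.

/* Read the whole file as raw bytes ("datafile.dat" is a placeholder). */
COPY-LOB FROM FILE "datafile.dat" TO mData.
iSize = GET-SIZE(mData).

/* Look for the bytes "%PDF-" (37 80 68 70 45).
   MEMPTR byte positions are 1-based. */
ScanLoop:
DO iPos = 1 TO iSize - 4:
    IF  GET-BYTE(mData, iPos)     = 37
    AND GET-BYTE(mData, iPos + 1) = 80
    AND GET-BYTE(mData, iPos + 2) = 68
    AND GET-BYTE(mData, iPos + 3) = 70
    AND GET-BYTE(mData, iPos + 4) = 45 THEN
    DO:
        iStart = iPos.
        LEAVE ScanLoop.
    END.
END.

/* iStart stays 0 if the signature was not found. */
MESSAGE "PDF starts at byte" iStart VIEW-AS ALERT-BOX.

/* Release the memory held by the MEMPTR. */
SET-SIZE(mData) = 0.
```

Because GET-BYTE returns plain integer byte values, control codes in the PDF data are left untouched, unlike with a CHARACTER variable.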

Posted by Admin on 25-May-2011 14:55

And there is my problem... I do not understand MEMPTRs. Could you give me a sample?

Posted by Admin on 25-May-2011 15:12

dchalom wrote:

And there is my problem... I do not understand MEMPTRs. Could you give me a sample?

In the Progress documentation, the "Programming Interfaces" manual has some samples in the chapter "Introduction to External Program Interfaces".

COPY-LOB is the easiest way to read a large binary file into a MEMPTR.

And don't forget to use SET-SIZE(mptr) = 0. when you're done, to free the memory (put it in the FINALLY block).
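A rough sketch of how those pieces might fit together, assuming OpenEdge 10.1C or later (for FINALLY) and assuming, per the layout described earlier in the thread, that the PDF data runs from its first byte to the end of the data file. The file names and the value of iStart are placeholders; iStart has to be set to the real 1-based offset of the first PDF byte, for example from a GET-BYTE scan for "%PDF-".

```abl
/* Sketch: copy the tail of a data file, from iStart onward, to a new file. */
DEFINE VARIABLE mData  AS MEMPTR NO-UNDO.
DEFINE VARIABLE iStart AS INT64  NO-UNDO.
DEFINE VARIABLE iLen   AS INT64  NO-UNDO.

/* Read the whole file as raw bytes ("datafile.dat" is a placeholder). */
COPY-LOB FROM FILE "datafile.dat" TO mData.

/* Placeholder value: replace with the real offset of the first PDF byte. */
iStart = 1.
iLen   = GET-SIZE(mData) - iStart + 1.

/* Write bytes iStart..end of the MEMPTR to the output file. */
COPY-LOB FROM mData STARTING AT iStart FOR iLen TO FILE "extracted.pdf".

/* Runs whether or not an error was raised above, so the memory is always freed. */
FINALLY:
    SET-SIZE(mData) = 0.
END FINALLY.
```

COPY-LOB allocates the MEMPTR itself when it is the target, so no SET-SIZE call is needed before the read, only the one that frees the memory afterwards.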

Posted by Admin on 26-May-2011 11:03

dchalom wrote:

And there is my problem... I do not understand MEMPTRs. Could you give me a sample?

Forget about MEMPTR. Even if you understand it, reading data from a PDF document won't be as easy as using the common string manipulation routines (INDEX, SUBSTRING, ...).

Your best bet is to find an 'external' tool to do it, either by escaping to an OS command or by using shared libraries (assemblies could be an option if you have the most recent OE version and are on Windows). For instance, see what you can get from Xpdf and other tools that use it, such as pdf2ipe (Ipe output is plain XML), or something that translates it to Excel.

Good luck.

Posted by jquerijero on 11-Oct-2011 17:08

If you are using OEA (OpenEdge Architect), System.IO has plenty of stream-related classes that you can use to manipulate a file byte by byte.

This thread is closed