Import XLSX

Posted by James Palmer on 06-Aug-2014 05:25

Just a quick question, hopefully! 

Is anyone aware of open source code to read XLSX files into an ABL data structure? We've got a solution that uses the API to read the data cell by cell, but it's very slow, and as I've seen libraries for writing DOCX files, I was wondering if anyone had actually sat down to work out the XLSX xml. Would save me a job! 

Thanks

All Replies

Posted by Thomas Mercer-Hursh on 06-Aug-2014 09:43

Ask Alon Blich, the DOCX guy.  I know he has a library for XLSX *writing* which he is planning to update to the same standard as the new DOCX stuff, but I don't know about the reading.

Posted by James Palmer on 06-Aug-2014 09:45

Thanks Thomas.

Posted by gabriel.lucaciu on 06-Aug-2014 10:30

Hi James.

Check out this post from OEHive. Maybe it would help you out: www.oehive.org/.../xls

Regards, Gabriel

Posted by James Palmer on 06-Aug-2014 10:33

Thanks Gabriel,

That sadly only writes to XML, rather than reading from XML. The Read function uses the API which I already have code for. It's just slow :)

Posted by andrew.may on 06-Aug-2014 11:14

The internals of Alon’s libraries definitely let you read the data from xlsx files.
 
I don’t think that there’s a nice wrapper to translate it into a dataset, but writing one shouldn’t be that hard.
(& if you write one, it should be possible to share the changes, as Alon’s project is now OSS)
 
[collapse]
From: James Palmer [mailto:bounce-jdpjamesp@community.progress.com]
Sent: 06 August 2014 16:35
To: TU.OE.Development@community.progress.com
Subject: RE: [Technical Users - OE Development] Import XLSX
 
Reply by James Palmer

Thanks Gabriel,

That sadly only writes to XML, rather than reading from XML. The Read function uses the API which I already have code for. It's just slow :)

Stop receiving emails on this subject.

Flag this post as spam/abuse.





Business Computer Projects Ltd is a limited company registered in England and Wales.

Registered number: 1813992. Registered office: BCP House, 151 Charles Street, Stockport, Cheshire SK1 3JY. Tel: +44 (0)161 355-3000. Fax: +44 (0)161 355-3001 Web: http://www.bcpsoftware.com

Disclaimer: This message, and any associated attachment is confidential. If you have received it in error, please delete it from your system, do not use or disclose the information in any way, and notify either the sender or Postmaster@BCPSoftware.com immediately. The contents of this message may contain personal views which are not necessarily the views of Business Computer Projects Ltd., unless specifically stated. Whilst every effort has been made to ensure that emails and their attachments are virus free, it is the responsibility of the recipient(s) to verify the integrity of such emails.
[/collapse]

Posted by tbergman on 06-Aug-2014 16:18

A quick Google search yielded a couple of .Net components that may work. I haven’t used these but I’m tempted.

www.codeproject.com/.../Csharp-How-To-Read-xlsx-Excel-File-With-Lines-of

exceldatareader.codeplex.com

Since I know you are on Windows and have Excel, you could also try opening the spreadsheet, saving as CSV, then importing the csv using Progress. Alternatively, you could open the spreadsheet, copy the contents into the clipboard, and then read the data from the clipboard. Either of these will be much faster than the cell by cell approach.

Tom

Posted by gabriel.lucaciu on 07-Aug-2014 02:12

This what I found that might help you to automatically convert xlsx to csv:

{libooxml.i}

run ooxml_convert( "/tmp/test.xlsx", "/tmp/test.csv" ).

And here is the include needed (and many other useful tools that are Office related):

docxfactory.com/free-download

Posted by blichal on 08-Aug-2014 03:08

Hello James,

The structure of a worksheet in an .XLSX file is pretty straight forward and simple, very similar to an HTML table with rows and cells -

<worksheet>

<sheetData>

<row>

<cell>

</cell>

</row>

.

.

.

</sheetData>

</worksheet>

Although there are a few caveats -

- Office usually removes empty rows and cells so you'll need to pay attention to the r (reference/position) attribute of the row and cell nodes. slibxlsx.p adds the missing rows and cells in the beginning so it will be easier to work with the worksheet.

- Values are saved differently for text, numbers and dates (besides formulas).

I'll be happy to help if you have any questions. I'd also search for other solutions maybe in other languages/platforms, as suggested.

It wouldn't be too complicated to add an import XLSX file to temp-table/dataset. It would even be incredibly fast if it was written in C++ (less than 0.01 seconds).

I think it's also important to support importing from Excel tables instead of simply importing form a worksheet since Excel files are usually "polluted" with peoples added remarks, calculations etc. It's definitely something I plan to add to the new version of the XLSX library.

But at the moment I'm a little busy with another side project (besides my regular day job). The project is sort of a Javascript web desktop and similar in concept to ADM/SmartObjects which will also be free and open source. More to come soon.

Posted by James Palmer on 08-Aug-2014 06:55

I've got code that saves as CSV. The trouble with that is that you tend to lose formatting of things. I work in the energy industry and we deal with MPAN references a lot that are around 21 digits long. The save as CSV approach results in incorrect values for these :(

This thread is closed