Just a quick question, hopefully!
Is anyone aware of open source code to read XLSX files into an ABL data structure? We've got a solution that uses the API to read the data cell by cell, but it's very slow, and as I've seen libraries for writing DOCX files, I was wondering if anyone had actually sat down to work out the XLSX xml. Would save me a job!
Thanks
Ask Alon Blich, the DOCX guy. I know he has a library for XLSX *writing* which he is planning to update to the same standard as the new DOCX stuff, but I don't know about the reading.
Thanks Thomas.
Hi James.
Check out this post from OEHive. Maybe it would help you out: www.oehive.org/.../xls
Regards, Gabriel
Thanks Gabriel,
That sadly only writes to XML, rather than reading from XML. The Read function uses the API which I already have code for. It's just slow :)
Thanks Gabriel,
That sadly only writes to XML, rather than reading from XML. The Read function uses the API which I already have code for. It's just slow :)
Flag this post as spam/abuse.
A quick Google search yielded a couple of .Net components that may work. I haven’t used these but I’m tempted.
www.codeproject.com/.../Csharp-How-To-Read-xlsx-Excel-File-With-Lines-of
Since I know you are on Windows and have Excel, you could also try opening the spreadsheet, saving as CSV, then importing the csv using Progress. Alternatively, you could open the spreadsheet, copy the contents into the clipboard, and then read the data from the clipboard. Either of these will be much faster than the cell by cell approach.
Tom
This what I found that might help you to automatically convert xlsx to csv:
{libooxml.i}
run ooxml_convert( "/tmp/test.xlsx", "/tmp/test.csv" ).
And here is the include needed (and many other useful tools that are Office related):
Hello James,
The structure of a worksheet in an .XLSX file is pretty straight forward and simple, very similar to an HTML table with rows and cells -
<worksheet>
<sheetData>
<row>
<cell>
</cell>
</row>
.
.
.
</sheetData>
</worksheet>
Although there are a few caveats -
- Office usually removes empty rows and cells so you'll need to pay attention to the r (reference/position) attribute of the row and cell nodes. slibxlsx.p adds the missing rows and cells in the beginning so it will be easier to work with the worksheet.
- Values are saved differently for text, numbers and dates (besides formulas).
I'll be happy to help if you have any questions. I'd also search for other solutions maybe in other languages/platforms, as suggested.
It wouldn't be too complicated to add an import XLSX file to temp-table/dataset. It would even be incredibly fast if it was written in C++ (less than 0.01 seconds).
I think it's also important to support importing from Excel tables instead of simply importing form a worksheet since Excel files are usually "polluted" with peoples added remarks, calculations etc. It's definitely something I plan to add to the new version of the XLSX library.
But at the moment I'm a little busy with another side project (besides my regular day job). The project is sort of a Javascript web desktop and similar in concept to ADM/SmartObjects which will also be free and open source. More to come soon.
I've got code that saves as CSV. The trouble with that is that you tend to lose formatting of things. I work in the energy industry and we deal with MPAN references a lot that are around 21 digits long. The save as CSV approach results in incorrect values for these :(