Reading in CSV without double quotes on every entry.

Posted by Garry Hall on 30-Mar-2017 10:59

Use the DELIMITER option of IMPORT:

IMPORT DELIMITER "," c1 c2 c3 c4.

This assumes you have a consistent format within the file.

Posted by Brian K. Maher on 30-Mar-2017 11:00

What about checking num-entries() and if it returns the value indicating <normal count> + 1 then split as normal for all fields but the two that should be combined then combine those two into a single value?

Posted by Joseph Kreifels on 30-Mar-2017 11:04

@Garry Hall

Doesn't that still cause the issue I stated above?

1. "He loves apple pie"

2. " but is allergic to apples"

See response below

Posted by James Palmer on 30-Mar-2017 11:11

How big is the CSV? If it's not too big you can use copy-lob to put the whole file in a LONGCHAR and then parse the contents using chr(10)/chr(13) as your line delimiter. It might be a bit clunky, but it's better than having to shell out to some other process IMO.

Posted by Joseph Kreifels on 30-Mar-2017 11:15

Garry,

I think that actually works. I tried your method against the sample data and it seems to have done the trick. I'll try to implement it using the actual CSV file and see how it goes.

Posted by Brian K. Maher on 30-Mar-2017 11:16

define variable vLine as character no-undo.

input from data.csv.

repeat:
    import unformatted vLine.
    if num-entries(vLine) = 4 then
        message entry(1,vLine) skip
                entry(2,vLine) skip
                replace(entry(3,vLine),'"','') skip
                entry(4,vLine) view-as alert-box.
    else
        if num-entries(vLine) = 5 then
            message entry(1,vLine) skip
                    entry(2,vLine) skip
                    replace(entry(3,vLine) + "," + entry(4,vLine),'"','') skip
                    entry(5,vLine) view-as alert-box.
        else
            message "Found line with wrong number of entries" view-as alert-box.            
end.

input close.

Posted by Garry Hall on 30-Mar-2017 11:17

I just test it in 11.7, and it worked for me.

DEFINE VARIABLE c1 AS CHARACTER NO-UNDO.

DEFINE VARIABLE c2 AS CHARACTER NO-UNDO.

DEFINE VARIABLE c3 AS CHARACTER NO-UNDO.

DEFINE VARIABLE c4 AS CHARACTER NO-UNDO.

INPUT FROM "csv.txt".

REPEAT:

IMPORT DELIMITER "," c1 c2 c3 c4.

DISPLAY c3 FORMAT "x(50)".

END.

INPUT CLOSE.

c3

--------------------------------------------------

description

He loves apple pie, but is allergic to apples

IMPORT is the statement used for dictionary dump and load. It has to handle delimiters in quoted character data, even if the delimiter is embedded in the quoted data. The default delimiter is " " (space), so IMPORT must be able to handle embedded spaces. As long as your quote starts after your delimiter, it should work.

Posted by Brian K. Maher on 30-Mar-2017 11:19

This is better:

define variable vLine as character no-undo extent 4.

input from data.csv.

repeat:
    import delimiter ',' vLine[1] vLine[2] vLine[3] vLine[4].
    message vLine[1] skip
            vLine[2] skip
            vLine[3] skip
            vLine[4] view-as alert-box.
end.

input close.

Posted by Joseph Kreifels on 30-Mar-2017 11:20

@James

The CSV has 23 columns and can have anywhere between 1 to a few hundred rows. There is at least 1 column in particular that allows commas in the data. Thankfully though this 1 column is wrapped in double quotes. I'm looking for a method that works with the CSV like it should without jury rigging a band aid method.

Posted by Ken Ward on 30-Mar-2017 11:26

The IMPORT statement is definitely the way to go, but I had to deal with a similar issue where the IMPORT statement wouldn't have been useful.

What I did was I checked the first character to see if it was a quote but the last character was NOT a quote. If so, then it went into a loop where it looked for a token that ended with a quote, appending commas and tokens along the way.

It might've been kludgy, but it got the job done.

Posted by scott_auge on 30-Mar-2017 14:12

This might help:

/* Given a string from import unformatted, return CSV entries in the */

/* 100 extent CSVEntries variable. */

/* TODO: Quotes within a quote is not done */

define input parameter CSVLine as character no-undo.

define output parameter CSVEntries as character extent 100 no-undo.

define variable CSVLength as integer no-undo.

define variable iterator as integer no-undo.

define variable extentiterator as integer no-undo.

define variable WholeValue as character no-undo.

define variable CharValue as character no-undo.

define variable LastCharValue as character no-undo.

define variable InQuotes as logical initial false no-undo.

assign CSVLength = length(CSVLine).

do iterator = 1 to CSVLength:

assign

LastCharValue = CharValue

CharValue = substring(CSVLine, iterator, 1)

.

case CharValue:

when "~"" then do:

assign InQuotes = not InQuotes. /* flip flop switch */

next.

end. /* when " */

when "," then do:

if not InQuotes then do:

assign

extentiterator = extentiterator + 1

CSVEntries[extentiterator] = WholeValue

WholeValue = ""

.

end. /* if not InQuotes */

if InQuotes then assign WholeValue = WholeValue + CharValue.

end. /* when , */

otherwise assign WholeValue = WholeValue + CharValue.

end. /* case */

end. /* do iterator */

assign CSVEntries[extentiterator + 1] = WholeValue.

/*************** Unit test code

define variable A as character no-undo.

define variable C as character extent 100 format "x(10)" no-undo.

A = "345,the word,*,45,~"417,000~",3ed".

run /home/sauge/prg/CSVParser.p (input A, output C).

/**/

display C.

/**/

/*

display integer(C[5]).

*/

*********************************/

This thread is closed

All Replies