If I have a CSV file that doesn't use double quotation marks except for items with commas.
Sample Data
name,age,description,state
Johnson,43,"He loves apple pie, but is allergic to apples",PA
I can't use the standard entry() method because it will split the description into 2 parts.
I've noticed software like Excel knows that when a item is wrapped in double quotation marks, it should ignore delimiters.
Is there a way to parse a CSV file into a temp table following the CSV standards?
I am using Progress 10.2B on RHEL. If Progress can't parse the file, there may be a outside option through Bash, Java, or Python that can be invoked to make changes to the CSV during runtime
Use the DELIMITER option of IMPORT:
IMPORT DELIMITER "," c1 c2 c3 c4.
This assumes you have a consistent format within the file.
Doesn't that still cause the issue I stated above?
1. "He loves apple pie"
2. " but is allergic to apples"
See response below
How big is the CSV? If it's not too big you can use copy-lob to put the whole file in a LONGCHAR and then parse the contents using chr(10)/chr(13) as your line delimiter. It might be a bit clunky, but it's better than having to shell out to some other process IMO.
Garry,
I think that actually works. I tried your method against the sample data and it seems to have done the trick. I'll try to implement it using the actual CSV file and see how it goes.
define variable vLine as character no-undo. input from data.csv. repeat: import unformatted vLine. if num-entries(vLine) = 4 then message entry(1,vLine) skip entry(2,vLine) skip replace(entry(3,vLine),'"','') skip entry(4,vLine) view-as alert-box. else if num-entries(vLine) = 5 then message entry(1,vLine) skip entry(2,vLine) skip replace(entry(3,vLine) + "," + entry(4,vLine),'"','') skip entry(5,vLine) view-as alert-box. else message "Found line with wrong number of entries" view-as alert-box. end. input close.
I just test it in 11.7, and it worked for me.
DEFINE VARIABLE c1 AS CHARACTER NO-UNDO.
DEFINE VARIABLE c2 AS CHARACTER NO-UNDO.
DEFINE VARIABLE c3 AS CHARACTER NO-UNDO.
DEFINE VARIABLE c4 AS CHARACTER NO-UNDO.
INPUT FROM "csv.txt".
REPEAT:
IMPORT DELIMITER "," c1 c2 c3 c4.
DISPLAY c3 FORMAT "x(50)".
END.
INPUT CLOSE.
c3
--------------------------------------------------
description
He loves apple pie, but is allergic to apples
IMPORT is the statement used for dictionary dump and load. It has to handle delimiters in quoted character data, even if the delimiter is embedded in the quoted data. The default delimiter is " " (space), so IMPORT must be able to handle embedded spaces. As long as your quote starts after your delimiter, it should work.
This is better:
define variable vLine as character no-undo extent 4. input from data.csv. repeat: import delimiter ',' vLine[1] vLine[2] vLine[3] vLine[4]. message vLine[1] skip vLine[2] skip vLine[3] skip vLine[4] view-as alert-box. end. input close.
@James
The CSV has 23 columns and can have anywhere between 1 to a few hundred rows. There is at least 1 column in particular that allows commas in the data. Thankfully though this 1 column is wrapped in double quotes. I'm looking for a method that works with the CSV like it should without jury rigging a band aid method.
The IMPORT statement is definitely the way to go, but I had to deal with a similar issue where the IMPORT statement wouldn't have been useful.
What I did was I checked the first character to see if it was a quote but the last character was NOT a quote. If so, then it went into a loop where it looked for a token that ended with a quote, appending commas and tokens along the way.
It might've been kludgy, but it got the job done.
This might help:
/* Given a string from import unformatted, return CSV entries in the */
/* 100 extent CSVEntries variable. */
/* TODO: Quotes within a quote is not done */
define input parameter CSVLine as character no-undo.
define output parameter CSVEntries as character extent 100 no-undo.
define variable CSVLength as integer no-undo.
define variable iterator as integer no-undo.
define variable extentiterator as integer no-undo.
define variable WholeValue as character no-undo.
define variable CharValue as character no-undo.
define variable LastCharValue as character no-undo.
define variable InQuotes as logical initial false no-undo.
assign CSVLength = length(CSVLine).
do iterator = 1 to CSVLength:
assign
LastCharValue = CharValue
CharValue = substring(CSVLine, iterator, 1)
.
case CharValue:
when "~"" then do:
assign InQuotes = not InQuotes. /* flip flop switch */
next.
end. /* when " */
when "," then do:
if not InQuotes then do:
assign
extentiterator = extentiterator + 1
CSVEntries[extentiterator] = WholeValue
WholeValue = ""
.
end. /* if not InQuotes */
if InQuotes then assign WholeValue = WholeValue + CharValue.
end. /* when , */
otherwise assign WholeValue = WholeValue + CharValue.
end. /* case */
end. /* do iterator */
assign CSVEntries[extentiterator + 1] = WholeValue.
/*************** Unit test code
define variable A as character no-undo.
define variable C as character extent 100 format "x(10)" no-undo.
A = "345,the word,*,45,~"417,000~",3ed".
run /home/sauge/prg/CSVParser.p (input A, output C).
/**/
display C.
/**/
/*
display integer(C[5]).
*/
*********************************/