In one of our methods, which is responsible for outputting columns from the data buffer (row buffer) to a stream, we need to detect when a value is truncated during formatting.
We need to run this logic roughly 12 million times per run, and there are many other runs that output various kinds of data into the warehouse.
Performance is critical due to the volume of data.
The crude way to do it: when the field buffer's data type is CHAR, calculate the length using the LENGTH function and compare it to the format length, but that is expensive.
Is there a startup parameter or something so that an error is raised whenever formatting truncates a value?
See the section of code below. This is the line where I want an error:

    CAPS(pFBuffer:string-value()) /* i wish this would error out if STRING TRUNCATION HAPPENS */
method public override logical writeBuffer(input pHBuffer as handle):

    define variable i           as integer   no-undo.
    define variable vCharBuffer as character no-undo.

    for each ttOutStream no-lock:

        vCharBuffer = "".
        ttOutStream.hbuf:buffer-copy(pHBuffer).

        do i = 1 to ttOutStream.hbuf:num-fields:
            pFBuffer = ttOutStream.hbuf:buffer-field(i).
            vReportValue = if can-do("GUID,UUID", pFBuffer:name)
                              and pFBuffer:buffer-value > "" then
                               pFBuffer:string-value()
                           else if (pFBuffer:label > "") then
                               DYNAMIC-INVOKE(pFBuffer:label,
                                              "format",
                                              input pFBuffer:buffer-value,
                                              input pFBuffer:format)
                           else if (pFBuffer:buffer-value <> ?)
                              and ttOutStream.uppercase
                              and pFBuffer:data-type = "character" then
                               CAPS(pFBuffer:string-value()) /* i wish this would error out if STRING TRUNCATION HAPPENS */
                           else if (pFBuffer:buffer-value <> ?) then
                               pFBuffer:string-value()
                           else if pFBuffer:data-type = "date" then
                               "00000000"
                           else
                               fill(" ", integer(pFBuffer:width-chars)).
            vCharBuffer = vCharBuffer + vReportValue + ttOutStream.columnDelimiter.
        end. /* do i = */

        if (ttOutStream.columnDelimiter <> "") then
            vCharBuffer = substring(vCharBuffer, 1, length(vCharBuffer) - 1).

        put stream-handle ttOutStream.hStream unformatted vCharBuffer skip.

    end. /* for each */

end method.
I don't know about performance, but have you tried string(myString,theDBFormat)?
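A quick sketch of what that suggestion does with an oversized character value (the format string here is illustrative):

```abl
DEFINE VARIABLE cVal AS CHARACTER NO-UNDO.

/* STRING(value, format) applies the display format to the value;  */
/* for character data an oversized value is cut to fit the format. */
cVal = STRING("ABCDE", "x(3)").
DISPLAY cVal.   /* shows "ABC": truncated silently, no error raised */
```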
Also, you could use a .NET method:

public class StringABLTool
{
    public static string UpperCase(string val)
    {
        return val.ToUpper();
    }
}
Are you sure that the string comparison is the issue here? You do a whole lot more in that method. A lot of the time might be spent just writing millions of records to disk. Consider profiling the method.
Perhaps the warehouse can import files exported with just a delimiter instead of what looks like a fixed-column format?
ETIME below says around six seconds for 1 million records on my desktop PC. Removing the check roughly halves the time.
DEFINE TEMP-TABLE myTable NO-UNDO
    FIELD myField AS CHARACTER FORMAT "xx".

DEFINE VARIABLE iNums  AS INTEGER NO-UNDO INIT 1000000.
DEFINE VARIABLE iRun   AS INTEGER NO-UNDO.
DEFINE VARIABLE lError AS LOGICAL NO-UNDO.

DO iRun = 1 TO iNums:
    CREATE myTable.
    myTable.myField = "ABC".
END.

ETIME(TRUE).

FOR EACH myTable:
    lError = FALSE.
    IF BUFFER myTable:BUFFER-FIELD("myField"):DATA-TYPE = "CHARACTER"
       AND myTable.myField <> STRING(myTable.myField,
                                     BUFFER myTable:BUFFER-FIELD("myField"):FORMAT) THEN
        lError = TRUE.
END.

DISP ETIME.
Yes, I have tried that.
For the CHARACTER data type it does not raise any error when the format size is smaller than the number of characters to display.
That is very unfortunate.
It seems I might have to compare LENGTH(var_char_data) against the format size.
For DECIMAL data types it does raise an error, indicating that the display size cannot hold the converted element.
I was hoping to avoid the manual check and instead rely on Progress generating the error in both cases, so that I can trap it consistently the same way.
Thank you for your suggestion.
We are on Linux, so there is no .NET.
I wish they would allow Java classes to be mapped into the ABL framework.
Thanks for your reply, Jen.
I have extensively profiled this part to optimize it, since the majority of our time is spent here.
A simple case is 200,000 rows x 64 columns to output.
I am trying to avoid any string-type comparisons, since I have noticed they are much slower than integer or logical compares.
So, worst case, I will use LENGTH(string) to check against the BUFFER-FIELD:WIDTH-CHARS attribute.
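As a sketch, that worst-case check could sit right before the CAPS call in writeBuffer() above (pFBuffer as in the original method; raising the condition via RETURN ERROR is just one option for signalling it):

```abl
/* Crude truncation guard: compare the value's length to the field */
/* width before formatting.  pFBuffer is the buffer-field handle   */
/* used in writeBuffer() above.                                    */
if pFBuffer:data-type = "character"
   and length(pFBuffer:string-value()) > integer(pFBuffer:width-chars) then
    return error "Value truncated in field " + pFBuffer:name + ".".
```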
Mike
You seem to be doing a lot of run-time evaluation within your record loop which could be pulled outside and evaluated once per table.
Define a character field with EXTENT 64 (or whatever your maximum number of columns is) and fill it with the 'data type' outside the record loop. Within the loop you then only need a CASE on the 'data type' and do what is required.
CASE p_cdatatype[i]:
    WHEN "GUID" THEN
        vreportvalue = IF pfbuffer:BUFFER-VALUE > "" THEN pfbuffer:BUFFER-VALUE ELSE "".
    WHEN "UUID" THEN
        vreportvalue = IF pfbuffer:BUFFER-VALUE > "" THEN pfbuffer:BUFFER-VALUE ELSE "".
    WHEN "CUSTOM" THEN
        vreportvalue = DYNAMIC-INVOKE(pfbuffer:LABEL, "format",
                                      INPUT pfbuffer:BUFFER-VALUE,
                                      INPUT pfbuffer:FORMAT).
    WHEN "CHARACTER" THEN
        ... /* and so on for the remaining types */
END CASE.
You may want to look at the -clientwidth startup parameter; when set to 2, character values may not exceed their width. You will initially need to set widths in the schema to match their formats.
Is -clientwidth going to throw an error? The OP clearly wants an error.
tamhas wrote:
Is -clientwidth going to throw an error? The OP clearly wants an error.
checkwidth (clientwidth was a typo) will throw an error when attempting to create or update fields beyond their maximum width. So you will not be able to assign "ABC" to a field that has a width defined as 2:
ERROR: Width of data is greater than exactcs.mcomp.main-comp-code (rowid -1734) _width. (10841)
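A minimal sketch of trapping that error in code, assuming the session was started with -checkwidth 2 and the field's schema width is 2 (table and field names here are hypothetical):

```abl
/* With -checkwidth 2, assigning past the field's _width raises     */
/* error 10841, which NO-ERROR lets us trap like any other error.   */
CREATE myTable.
ASSIGN myTable.myField = "ABC" NO-ERROR.   /* field width defined as 2 */
IF ERROR-STATUS:ERROR THEN
    MESSAGE ERROR-STATUS:GET-MESSAGE(1).   /* expect message (10841) */
```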
I have an idea: if you are on OpenEdge 10 or later, you could create a custom ABL CAPS method with an HLC library that does not truncate the string.
It is explained here: http://communities.progress.com/pcom/servlet/JiveServlet/download/16301-2-15434/dvpin.pdf