Need high performance method to check if length of char is b

Posted by mliu.mike on 11-Apr-2012 08:17

In one of our methods, which is responsible for outputting the columns in the data buffer (row buffer) to a stream, we need to:

  • check and error out if the length of the character field is greater than the format length (x(nn))

We need to run this logic roughly 12 million times per run, and there are many other runs where we output various kinds of data into the warehouse.

Performance is critical due to the volume of data.

The crude way to do it is: when the field's data type is CHARACTER, calculate its length with the LENGTH function and compare that to the format length. But that is expensive.
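For reference, the explicit check can at least be reduced to one LENGTH call and one integer comparison per value, with the field handle and width looked up only once. This is a minimal sketch, not the poster's code: the temp-table ttDemo and field cCode are hypothetical, and it assumes the buffer-field's WIDTH-CHARS attribute equals nn in an x(nn) format (the same assumption the original poster makes later in the thread).

```abl
/* Sketch: explicit overflow check for one character field.
   ttDemo and cCode are hypothetical names for illustration.  */
DEFINE TEMP-TABLE ttDemo NO-UNDO
    FIELD cCode AS CHARACTER FORMAT "x(2)".

DEFINE VARIABLE hField AS HANDLE    NO-UNDO.
DEFINE VARIABLE iWidth AS INTEGER   NO-UNDO.
DEFINE VARIABLE cValue AS CHARACTER NO-UNDO.
DEFINE VARIABLE lError AS LOGICAL   NO-UNDO.

CREATE ttDemo.
ttDemo.cCode = "ABC".                 /* three characters, format is x(2) */

/* Look the field and its display width up once, outside any row loop. */
hField = BUFFER ttDemo:BUFFER-FIELD("cCode").
iWidth = hField:WIDTH-CHARS.          /* assumed to equal nn in x(nn) */

DO ON ERROR UNDO, THROW:
    cValue = hField:BUFFER-VALUE.
    /* One LENGTH call and one integer comparison per value. */
    IF cValue <> ? AND LENGTH(cValue) > iWidth THEN
        UNDO, THROW NEW Progress.Lang.AppError(
            SUBSTITUTE("&1: &2 characters do not fit in &3",
                       hField:NAME, LENGTH(cValue), iWidth), 1).

    CATCH eErr AS Progress.Lang.AppError:
        lError = TRUE.
        MESSAGE eErr:GetMessage(1).
    END CATCH.
END.
```

Because the error is an ordinary AppError, the caller can trap it with the same CATCH it would use for any other run-time error.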

Is there a startup parameter or something similar that makes Progress raise an error when formatting truncates the data?

SEE THIS SECTION OF CODE BELOW:

                            CAPS(pFBuffer:string-value()) /* i wish this would error out if STRING TRUNCATION HAPPENS */

method public override logical writeBuffer(input pHBuffer as handle):

  define variable i           as integer   no-undo.
  define variable vCharBuffer as character no-undo.
  /* pFBuffer (handle) and vReportValue (character) are not defined in this
     method; they are assumed to be defined at class level. */

  for each ttOutStream no-lock:

    vCharBuffer = "".
    ttOutStream.hbuf:buffer-copy(pHBuffer).

    do i = 1 to ttOutStream.hbuf:num-fields:

      pFBuffer = ttOutStream.hbuf:buffer-field(i).

      vReportValue =
        if can-do("GUID,UUID", pFBuffer:Name)
           and pFBuffer:buffer-value > "" then
          pFBuffer:string-value()
        else if (pFBuffer:label > "") then
          DYNAMIC-INVOKE(pFBuffer:label,
                         "format",
                         input pFBuffer:buffer-value,
                         input pFBuffer:format)
        else if (pFBuffer:buffer-value <> ?)
                and ttOutStream.uppercase
                and pFBuffer:data-type = "character" then
          CAPS(pFBuffer:string-value()) /* i wish this would error out if STRING TRUNCATION HAPPENS */
        else if (pFBuffer:buffer-value <> ?) then
          pFBuffer:string-value()
        else if pFBuffer:data-type = "date" then
          "00000000"
        else
          fill(" ", integer(pFBuffer:width-chars)).

      vCharBuffer = vCharBuffer + vReportValue + ttOutStream.columnDelimiter.

    end. /* do i = */

    /* strip the trailing delimiter; subtracting its length also handles
       delimiters longer than one character */
    if (ttOutStream.columnDelimiter <> "") then
      vCharBuffer = substring(vCharBuffer, 1,
                              length(vCharBuffer) - length(ttOutStream.columnDelimiter)).

    put stream-handle ttOutStream.hStream unformatted vCharBuffer skip.

  end. /* for each */

end method.

All Replies

Posted by Thomas Mercer-Hursh on 11-Apr-2012 11:18

I don't know about performance, but have you tried string(myString,theDBFormat)?

Posted by nidk on 12-Apr-2012 02:43

Also, you could use a .NET method:

public class StringABLTool
{
    public static string UpperCase(string val)
    {
        return val.ToUpper();
    }
}

Posted by Jens Dahlin on 12-Apr-2012 04:56

Are you sure that the string comparison is the issue here? You do a whole lot more in that method. Much of the time might simply be spent writing millions of records to disk. Consider profiling the method.
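One way to profile in ABL is the PROFILER system handle. This is a minimal sketch, not taken from the thread: export-run.p and writeBuffer.prof are hypothetical names standing in for whatever drives the export and wherever the output should go.

```abl
/* Sketch: collect per-line timings for one export run.
   export-run.p and writeBuffer.prof are hypothetical names. */
ASSIGN PROFILER:FILE-NAME   = "writeBuffer.prof"
       PROFILER:DESCRIPTION = "writeBuffer export"
       PROFILER:ENABLED     = TRUE
       PROFILER:PROFILING   = TRUE.

RUN export-run.p.                  /* whatever ends up calling writeBuffer() */

ASSIGN PROFILER:PROFILING = FALSE.
PROFILER:WRITE-DATA().             /* write the collected data to FILE-NAME */
ASSIGN PROFILER:ENABLED = FALSE.
```

Running it on a representative slice of the data, rather than all 12 million rows, should be enough to see whether the length check or the stream output dominates.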

Perhaps the warehouse can import files exported with just a delimiter instead of what looks like a fixed column format?

ETIME below says around six seconds for 1 million records on my desktop PC. Removing the check roughly halves the time.

DEFINE TEMP-TABLE myTable NO-UNDO
    FIELD myField AS CHARACTER FORMAT "xx".

DEFINE VARIABLE iNums  AS INTEGER     NO-UNDO INIT 1000000.
DEFINE VARIABLE iRun   AS INTEGER     NO-UNDO.
DEFINE VARIABLE lError AS LOGICAL     NO-UNDO.

DO iRun = 1 TO iNums:
    CREATE myTable.
    myTable.myField = "ABC".

END.

ETIME(TRUE).

FOR EACH myTable:

    lError = FALSE.

    IF BUFFER myTable:BUFFER-FIELD("myField"):DATA-TYPE = "CHARACTER" AND myTable.myField <> STRING(myTable.myField, BUFFER myTable:BUFFER-FIELD("myField"):FORMAT) THEN
        lError = TRUE.


END.

DISP ETIME.

Posted by mliu.mike on 13-Apr-2012 10:36

Yes, I have tried that.

In the case of the character data type, it does not give any error if the format size is smaller than the number of characters to display.

That is very unfortunate.

It seems I might have to compare LENGTH(var_char_data) against the format size.

For DECIMAL data types, it gives an error indicating that the display format cannot hold the converted value.

I was hoping to avoid that and instead rely on Progress generating an error in both cases, so that I can trap it consistently in the same way.
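A middle ground is to throw an AppError for the character case yourself, so that a single CATCH on Progress.Lang.Error handles both kinds of overflow. This is a sketch under stated assumptions: the values, formats and iWidth are made up, and it assumes, as described above, that the decimal overflow is raised as a trappable run-time error.

```abl
/* Sketch: trap character and decimal overflow with the same CATCH.
   Values, formats and iWidth are made up for illustration.        */
DEFINE VARIABLE cValue   AS CHARACTER NO-UNDO INITIAL "ABC".
DEFINE VARIABLE iWidth   AS INTEGER   NO-UNDO INITIAL 2.   /* nn of "x(2)" */
DEFINE VARIABLE dValue   AS DECIMAL   NO-UNDO INITIAL 12345.67.
DEFINE VARIABLE cOut     AS CHARACTER NO-UNDO.
DEFINE VARIABLE iTrapped AS INTEGER   NO-UNDO.

/* Character: STRING(cValue, "x(2)") would quietly return "AB",
   so raise an AppError ourselves when the value does not fit.   */
DO ON ERROR UNDO, THROW:
    IF LENGTH(cValue) > iWidth THEN
        UNDO, THROW NEW Progress.Lang.AppError("Character value too wide", 1).
    cOut = STRING(cValue, "x(2)").
    CATCH eErr AS Progress.Lang.Error:
        iTrapped = iTrapped + 1.
        MESSAGE eErr:GetMessage(1).
    END CATCH.
END.

/* Decimal: assumed to raise a built-in run-time error by itself. */
DO ON ERROR UNDO, THROW:
    cOut = STRING(dValue, ">>9.99").
    CATCH eErr AS Progress.Lang.Error:
        iTrapped = iTrapped + 1.
        MESSAGE eErr:GetMessage(1).
    END CATCH.
END.

MESSAGE iTrapped "overflow(s) trapped".
```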

Thank you for your suggestion.

Posted by mliu.mike on 13-Apr-2012 10:37

We are on Linux, so there is no .NET.

I wish they would allow Java classes to be mapped into the ABL framework.

Posted by mliu.mike on 13-Apr-2012 10:40

Thanks for your reply, Jens.

I have extensively profiled this part to optimize it, since the majority of our time is spent here.

A simple case is 200,000 rows x 64 columns to output.

I am trying to avoid any kind of string comparison, since I have noticed they are much slower than integer or logical comparisons.

So, worst case, I will use LENGTH(string) to check against the buffer-field's WIDTH-CHARS attribute.

Mike

Posted by Stefan Drissen on 17-Apr-2012 01:18

You seem to be doing a lot of run-time evaluation within your record loop that could be pulled outside and evaluated once per table.

Define a character field with extent 64 (or whatever your maximum number of columns is) and fill this with the 'datatype' outside of the record loop. Within your loop you then only need to add a case on the 'datatype' and do what is required.

CASE p_cdatatype [i]:

   WHEN "GUID" THEN vreportvalue = IF pfbuffer:BUFFER-VALUE > "" THEN pfbuffer:BUFFER-VALUE.

   WHEN "UUID" THEN vreportvalue = IF pfbuffer:BUFFER-VALUE > "" THEN pfbuffer:BUFFER-VALUE.

   WHEN "CUSTOM" THEN vreportvalue = DYNAMIC-INVOKE( pfbuffer:LABEL, "FORMAT", ...

   WHEN "CHARACTER" THEN etc
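The idea can be fleshed out roughly as follows. This is a sketch, not Stefan's code: writeRows, ttDemo, MAX-COLS and the integer kind codes are hypothetical, the GUID/UUID and DYNAMIC-INVOKE branches are left out, and WIDTH-CHARS is again assumed to match the x(nn) format. Field handles, widths and data types are resolved once per table, so each row only costs integer CASE tests plus one LENGTH per character column.

```abl
/* Sketch of hoisting field metadata out of the record loop.
   writeRows, ttDemo, MAX-COLS and the kind codes are hypothetical. */
ROUTINE-LEVEL ON ERROR UNDO, THROW.

&SCOPED-DEFINE MAX-COLS 64

DEFINE TEMP-TABLE ttDemo NO-UNDO
    FIELD cCode  AS CHARACTER FORMAT "x(2)"
    FIELD dtWhen AS DATE.

DEFINE VARIABLE lError AS LOGICAL NO-UNDO.

CREATE ttDemo.
ttDemo.cCode = "AB".
CREATE ttDemo.
ttDemo.cCode = "ABC".                      /* too wide for x(2) */

DO ON ERROR UNDO, THROW:
    RUN writeRows (BUFFER ttDemo:HANDLE, "|").
    CATCH eErr AS Progress.Lang.Error:
        lError = TRUE.
        MESSAGE eErr:GetMessage(1).
    END CATCH.
END.

PROCEDURE writeRows:
    DEFINE INPUT PARAMETER phBuffer AS HANDLE    NO-UNDO.
    DEFINE INPUT PARAMETER pcDelim  AS CHARACTER NO-UNDO.

    DEFINE VARIABLE hField  AS HANDLE    NO-UNDO EXTENT {&MAX-COLS}.
    DEFINE VARIABLE iWidth  AS INTEGER   NO-UNDO EXTENT {&MAX-COLS}.
    DEFINE VARIABLE iKind   AS INTEGER   NO-UNDO EXTENT {&MAX-COLS}.
    DEFINE VARIABLE iFields AS INTEGER   NO-UNDO.
    DEFINE VARIABLE i       AS INTEGER   NO-UNDO.
    DEFINE VARIABLE hQuery  AS HANDLE    NO-UNDO.
    DEFINE VARIABLE cValue  AS CHARACTER NO-UNDO.
    DEFINE VARIABLE cLine   AS CHARACTER NO-UNDO.

    iFields = phBuffer:NUM-FIELDS.
    IF iFields > {&MAX-COLS} THEN
        UNDO, THROW NEW Progress.Lang.AppError("Too many columns", 1).

    /* Once per table: cache handle, width and kind
       (1 = character, 2 = date, 0 = anything else). */
    DO i = 1 TO iFields:
        hField[i] = phBuffer:BUFFER-FIELD(i).
        iWidth[i] = hField[i]:WIDTH-CHARS.
        iKind[i]  = LOOKUP(hField[i]:DATA-TYPE, "character,date").
    END.

    CREATE QUERY hQuery.
    hQuery:SET-BUFFERS(phBuffer).
    hQuery:QUERY-PREPARE("FOR EACH " + phBuffer:NAME + " NO-LOCK").
    hQuery:QUERY-OPEN().

    DO WHILE hQuery:GET-NEXT():
        cLine = "".
        DO i = 1 TO iFields:
            CASE iKind[i]:
                WHEN 1 THEN DO:            /* character: error instead of truncating */
                    cValue = hField[i]:BUFFER-VALUE.
                    IF cValue = ? THEN
                        cValue = FILL(" ", iWidth[i]).
                    ELSE IF LENGTH(cValue) > iWidth[i] THEN
                        UNDO, THROW NEW Progress.Lang.AppError(
                            SUBSTITUTE("&1 is wider than &2", hField[i]:NAME, iWidth[i]), 1).
                END.
                WHEN 2 THEN
                    cValue = IF hField[i]:BUFFER-VALUE = ? THEN "00000000"
                             ELSE hField[i]:STRING-VALUE.
                OTHERWISE
                    cValue = IF hField[i]:BUFFER-VALUE = ? THEN FILL(" ", iWidth[i])
                             ELSE hField[i]:STRING-VALUE.
            END CASE.
            /* Delimiter only between columns, so nothing to strip afterwards. */
            cLine = cLine + (IF i > 1 THEN pcDelim ELSE "") + cValue.
        END.
        PUT UNFORMATTED cLine SKIP.
    END.

    FINALLY:
        IF VALID-HANDLE(hQuery) THEN DELETE OBJECT hQuery.
    END FINALLY.
END PROCEDURE.
```

Writing the delimiter only between columns also sidesteps the trailing-delimiter SUBSTRING in the original method.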

You may want to look at the -clientwidth startup parameter: when it is set to 2, character values may not exceed their width. You will initially need to set the widths in the schema to match their formats.

Posted by Thomas Mercer-Hursh on 17-Apr-2012 11:27

Is -clientwidth going to throw an error?  The OP clearly wants an error.

Posted by Stefan Drissen on 17-Apr-2012 11:34

tamhas wrote:

Is -clientwidth going to throw an error?  The OP clearly wants an error.

checkwidth (clientwidth was a typo) will throw an error when attempting to create / update fields beyond their max width. So you will not be able to assign "ABC" in a field that has a width defined as 2:

ERROR: Width of data is greater than exactcs.mcomp.main-comp-code (rowid -1734) _width. (10841)

Posted by nidk on 18-Apr-2012 03:22

I have an idea:

If you use OpenEdge 10 or later, you could create your own ABL CAPS method with the HLC library that does not truncate the string.

It is explained here: http://communities.progress.com/pcom/servlet/JiveServlet/download/16301-2-15434/dvpin.pdf
