Does getCharCount take Unicode sizing into account?

Posted by Phil_M on 23-Nov-2009 17:23

Hi all,

It appears as if the SonicMQ / 4GL adapter function getCharCount doesn't take Unicode sizing into account for text messages, although I don't know that for sure yet.

I've found that if I size a memptr based on it, then load it up using successive putstring functions in a getTextSegment loop, I end up with an error message (Cannot PUT past the end of the MEMPTR. (4791)) when receiving a message containing multi-byte characters (e.g. those in the 128-255 extended range, or others such as Chinese).  As a test, I found that if the message contains 4 Chinese characters (amongst other regular 1-127 ASCII), increasing the memptr size by 8 makes the error go away, but adding 7 does not (i.e. each Chinese character probably takes up 3 bytes, so I need 4 x 3 = 12 bytes instead of 4 bytes, therefore an extra 8 sorts it out).
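The arithmetic above can be checked outside ABL. The following is a minimal sketch in Python rather than ABL, assuming the message is held as UTF-8; the sample text and the helper name `utf8_overhead` are made up for illustration and are not part of the SonicMQ adapter.

```python
def utf8_overhead(text: str) -> int:
    """Extra bytes needed beyond the character count when text is stored as UTF-8."""
    char_count = len(text)                   # what a character-based count reports
    byte_count = len(text.encode("utf-8"))   # what a byte buffer actually has to hold
    return byte_count - char_count


# Hypothetical sample: plain ASCII plus 4 Chinese characters.
sample = "order 你好世界 ok"
print(len(sample), len(sample.encode("utf-8")), utf8_overhead(sample))
```

Each of the four Chinese characters here takes 3 bytes in UTF-8, so the buffer needs 8 bytes more than the character count, which matches the "add 8 works, add 7 fails" observation.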

If this is correct, i.e. that it doesn't take Unicode / double-byte (or more) characters into account, then can another function be used instead?  Alternatively, can I pass it an argument such as "RAW" (as with LENGTH() etc. functions) or otherwise?

I am aware that using a bytesmessage gets around this, but I would like to find a solution for textmessage as we would like to keep an option we currently have that allows our clients to select which kind of message types they want to use.

Any assistance or comments on this would be much appreciated.

Cheers.

p.s. this is my 1st post here ... I hope it was clear enough (and you could understand the Aussie accent!)

All Replies

Posted by Bill Wood on 24-Nov-2009 08:33

This community has more of a pure Sonic focus so you may want to cross-post a link in the PSDN/OpenEdge community.

The getCharCount is a pass-through to the MQ Adapter's Java environment -- in Java the TextMessage is Unicode, so the character count is the Java Unicode character count.  For ABL, I think you have pure double-byte (vs Unicode variable-length encoding).  You might try just doubling the char count (but that is a guess).
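Whether doubling is enough can be tested with a short sketch. This is Python rather than ABL, and it rests on two assumptions: that getCharCount reports what Java's String.length() reports (UTF-16 code units), and that the receiving side stores the text as UTF-8. The helper names are hypothetical.

```python
def utf16_units(text: str) -> int:
    """Number of UTF-16 code units, which is what Java's String.length() counts."""
    return len(text.encode("utf-16-le")) // 2


def utf8_bytes(text: str) -> int:
    """Number of bytes the same text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def factor_is_enough(text: str, factor: int) -> bool:
    """True if a buffer of factor * (Java char count) bytes can hold the UTF-8 form."""
    return factor * utf16_units(text) >= utf8_bytes(text)


print(factor_is_enough("你好世界", 2))  # doubling: 8 bytes offered, 12 needed
print(factor_is_enough("你好世界", 3))  # tripling: 12 bytes offered, 12 needed
```

Under those assumptions, doubling falls short for text that is mostly 3-byte characters, while three times the count is a safe upper bound: a character needing 4 bytes in UTF-8 already counts as two UTF-16 code units.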

Posted by Phil_M on 24-Nov-2009 16:03

William,

Thanks for the reply.

I've actioned your suggestion by re-posting it on PSDN/OpenEdge.

As for the doubling suggestion, whilst it would work (in most cases, although possibly not for Chinese etc. that might have 3 or more bytes per character) I'd rather be more precise if possible to save on memory, especially for the rare occasion where the message ends up being huge. If there is no precise solution available, I'll limit use to bytesmessage only and remove the option.

Cheers.

Posted by Bill Wood on 25-Nov-2009 08:26

Phil_M wrote:

As for the doubling suggestion, whilst it would work (in most cases, although possibly not for Chinese etc. that might have 3 or more bytes per character) I'd rather be more precise if possible to save on memory, especially for the rare occasion where the message ends up being huge.


I'm a little rusty on my ABL (I still call it 4GL), but I thought OpenEdge was double-byte always, and not Unicode.

Posted by Phil_M on 25-Nov-2009 17:36

William,

We investigated using Unicode 3 years ago to handle Chinese and other Asian languages, and found it was possible by converting the d/b to UTF-8 and ensuring that all string handling functions such as LENGTH (I can't remember if there are others) use the optional "RAW" parameter where required to keep track of the true length in bytes (e.g. for transmitting via SonicMQ messages).
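The same "track the true length in bytes" idea could in principle be applied to the segment loop: measure each segment in bytes before sizing the buffer, instead of relying on a character count. Below is a minimal sketch in Python rather than ABL. `split_into_segments` is a hypothetical stand-in for reading a text message segment by segment, and the ABL analogue (presumably summing LENGTH(segment, "RAW") per segment) is an assumption that has not been verified against the adapter.

```python
from typing import Iterable, List


def split_into_segments(text: str, segment_chars: int) -> List[str]:
    """Hypothetical stand-in for fetching a text message in fixed-size character segments."""
    return [text[i:i + segment_chars] for i in range(0, len(text), segment_chars)]


def load_segments(segments: Iterable[str]) -> bytes:
    """Size the buffer from per-segment byte lengths, then copy each segment in."""
    encoded = [segment.encode("utf-8") for segment in segments]
    total_bytes = sum(len(chunk) for chunk in encoded)  # exact size in bytes
    buffer = bytearray(total_bytes)                     # allocated once, no padding
    offset = 0
    for chunk in encoded:
        buffer[offset:offset + len(chunk)] = chunk      # advance by bytes, not characters
        offset += len(chunk)
    return bytes(buffer)


message = "order 你好世界 ok"
loaded = load_segments(split_into_segments(message, 4))
print(len(message), len(loaded))
```

The trade-off is that this sketch holds every encoded segment in memory before allocating, so for a very large message it briefly needs about twice the final size. Growing the buffer as segments arrive would avoid that at the cost of reallocations.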

In context, we are using a Web system written in Ext / Java, and using the Progress ABL for business logic to run the Web via SonicMQ messages containing XML requests from the Web for what screen to draw / what menu to provide / saving record details etc., replying with XML responses providing details requested or confirmation that info has been saved & what to do next etc.  We are effectively running the Web as a fancy 'green-screen' from Progress business-logic that runs the show.

Cheers.

Posted by Phil_M on 25-Nov-2009 18:06

p.s. extra information regarding the previous reply:

We don't use Progress's language facilities, we use our own language tables (held on the d/b) and have mechanisms in place to convert all labels etc. to use equivalent alternate language strings if a system is using something other than English.

From one installed software installation (client ABL/4GL and/or Web) we are capable of running a system in Canada and having it appear in either English or French depending on who logs in (and what language is configured on their profile), and the same goes for other languages (including Asian).

Cheers.

Posted by Bill Wood on 28-Nov-2009 07:34

Given all that history, I agree that going with the BytesMessage might be your best bet.  In the progress.message.jclient.Message interface there is a getBodySize() method, but I don't think you'd have this available in the ABL.

Posted by Håvard Danielsen on 30-Nov-2009 10:10

The ABL bytes-message.p has a getBytesCount(), which possibly is the equivalent of text-message.p getCharCount().

Posted by Phil_M on 30-Nov-2009 16:34

Havard,

Thanks for your suggestion.

Note that I've already got bytesmessage working, which was already using getBytesCount().

What I was attempting to achieve was also being able to keep using textmessages for those clients that wanted to use it, but unfortunately getBytesCount() isn't supported for textmessages (I've already tried it!).

It's looking more likely that I might need to just limit the options to bytesmessage if it is a system that is going to support Unicode.

Cheers.

p.s. William, I was unable to find any references to getBodySize() in the ABL documentation ('Progress on the Web' or anything in the Progress Kbase), so I don't think it's available via the SonicMQ adapter.

Posted by Håvard Danielsen on 01-Dec-2009 13:13

Phil_M wrote:

What I was attempting to achieve was also being able to keep using textmessages for those clients that wanted to use it, but unfortunately getBytesCount() isn't supported for textmessages (I've already tried it!).

It's looking more likely that I might need to just limit the options to bytesmessage if it is a system that is going to support Unicode.

You should contact Support to check if this really needs to be the case. Most ABL consumers that use the text message probably do not care about the size, so it is possible that this is a simple bug that no one has encountered and that it can be fixed if a customer reports it. It is, however, also possible that it is a known limitation, in which case Support should be able to tell you what other alternatives you have.

Posted by Phil_M on 01-Dec-2009 15:52

Thanks Havard.

Will do so.  I'll post the response here once received.

Cheers.

This thread is closed