How Sonic handles non-English characters

Posted by smahanta on 06-May-2010 10:16

Hi,

We have a web application that displays a form in which the user inputs an XML file. On submission, the web application posts XML messages to a Sonic message queue. These messages sometimes contain Greek characters. The web application does its job properly, as verified by the logs. The encoding of the XML file was explicitly declared as UTF-8.

Originally the web application had the same problem. When I learned that a JSP form assumes ISO-8859-1 encoding by default, I used String.getBytes("ISO-8859-1") and then converted the resulting byte array back to a string using UTF-8 encoding.
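The workaround above undoes one specific mis-decoding: the servlet container read the UTF-8 bytes of the form post as ISO-8859-1, turning each byte into one character. Because ISO-8859-1 maps all 256 byte values one-to-one, encoding back to ISO-8859-1 recovers the original bytes, which can then be decoded as UTF-8. A minimal sketch, assuming the garbled value is already in a String; the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;

public class FormParamRepair {

    /**
     * Reverses "UTF-8 bytes decoded as ISO-8859-1". ISO-8859-1 maps every byte
     * to exactly one char (U+0000..U+00FF), so this round trip is lossless.
     */
    public static String repairLatin1Misdecode(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Greek "good morning", written with escapes so the source file stays ASCII.
        String greek = "\u039a\u03b1\u03bb\u03b7\u03bc\u03ad\u03c1\u03b1";

        // Simulate the container: UTF-8 bytes interpreted as ISO-8859-1.
        String garbled = new String(greek.getBytes(StandardCharsets.UTF_8),
                                    StandardCharsets.ISO_8859_1);

        System.out.println(garbled.equals(greek));                        // false
        System.out.println(repairLatin1Misdecode(garbled).equals(greek)); // true
    }
}
```

Where the servlet API is available, calling request.setCharacterEncoding("UTF-8") before reading any parameter avoids the need for this repair. The trick only works as long as the text has not already been pushed through a charset that lacks Greek letters.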

However, I am not sure what to do about Sonic, because there the string turns into "????????????????".
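A run of question marks usually means that, somewhere along the way, the text was encoded into a charset that has no Greek letters (for example ISO-8859-1, windows-1252, or MySQL's latin1). Java's String.getBytes(Charset) does not throw in that case; it silently substitutes the charset's replacement byte, which is '?' for ISO-8859-1. The sketch below reproduces the symptom; the class name is hypothetical. Once this substitution has happened, the original characters cannot be recovered.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {

    /** Encodes text; characters the charset cannot represent become its replacement bytes. */
    public static byte[] encodeLossy(String text, Charset charset) {
        // getBytes(Charset) replaces unmappable characters instead of throwing.
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        // Greek word for "Greece", written with escapes so the source stays ASCII.
        String greek = "\u0395\u03bb\u03bb\u03ac\u03b4\u03b1";

        byte[] latin1 = encodeLossy(greek, StandardCharsets.ISO_8859_1);
        byte[] utf8 = encodeLossy(greek, StandardCharsets.UTF_8);

        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1));        // ??????
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(greek)); // true
    }
}
```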

Would appreciate any help in solving this encoding problem.

Regards,

Subhendu

All Replies

Posted by Bill Wood on 07-May-2010 14:59

smahanta wrote:

However, not sure what to do with sonic as the string is becoming '????????????????".

Would appreciate any help in solving this encoding problem.


I would look at a few things. I am assuming you are sending this as a TextMessage (or an XML or Multipart message, but not a BytesMessage). For text, Sonic uses standard Java multibyte encoding. It's possible (and often the case) that the problem lies with the viewer you are using to view or process the message at the other side. If you try a different client, on a different machine with a different system encoding, do you see the same thing?
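To illustrate the point about viewers: the same bytes read back with different charsets give different text, and only the charset that wrote them round-trips. A client that does not name a charset falls back to the JVM's default, which depends on the machine. A minimal JDK-only sketch (Sonic is not involved; the class name is hypothetical):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ViewerCharsetDemo {

    /** Decodes bytes the way a viewer configured for the given charset would. */
    public static String viewAs(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String greek = "\u0395\u03bb\u03bb\u03ac\u03b4\u03b1"; // "Greece" in Greek
        byte[] stored = greek.getBytes(StandardCharsets.UTF_8);  // written as UTF-8

        // What a client uses when no charset is given explicitly.
        System.out.println("Default charset here: " + Charset.defaultCharset());

        System.out.println(viewAs(stored, StandardCharsets.UTF_8).equals(greek));      // true
        System.out.println(viewAs(stored, StandardCharsets.ISO_8859_1).equals(greek)); // false
    }
}
```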

Posted by smahanta on 07-May-2010 21:32

Hi William,

Thanks for replying.

My web application is sending a progress.message.jclient.XMLMessage, not a BytesMessage.

The message is ultimately delivered to a mobile handset, and there it is displayed properly, i.e. the Greek text shows up as Greek. Our problem is with logging, both with log4j and in the database (MySQL). For now we are ignoring the database part, as I expect that declaring the database CHARACTER SET as utf8 and the collation as utf8_unicode_ci will solve it (when the database was created we did not think about this, so MySQL used its defaults, such as the latin1 character set).
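On the database side, changing the schema default alone may not be enough. In MySQL, ALTER DATABASE ... CHARACTER SET only sets the default for tables created afterwards; existing tables need ALTER TABLE ... CONVERT TO CHARACTER SET, and the JDBC connection itself should also use UTF-8 (with MySQL Connector/J, via the useUnicode and characterEncoding URL properties). Rows already stored as '?' in latin1 columns stay lost. A sketch that only builds the statements and URL; the host, schema, and table names are hypothetical, and nothing is executed against a server:

```java
import java.util.List;

public class MySqlUtf8Setup {

    /** JDBC URL asking MySQL Connector/J to exchange text with the server as UTF-8. */
    public static String jdbcUrl(String host, String database) {
        return "jdbc:mysql://" + host + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    /**
     * Statements to switch a schema and one existing table to utf8.
     * ALTER DATABASE only changes the default for tables created later,
     * so existing tables also need CONVERT TO CHARACTER SET.
     */
    public static List<String> migrationStatements(String database, String table) {
        return List.of( // List.of needs Java 9 or later
            "ALTER DATABASE " + database
                + " CHARACTER SET utf8 COLLATE utf8_unicode_ci",
            "ALTER TABLE " + table
                + " CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci");
    }

    public static void main(String[] args) {
        // "localhost:3306", "mydb" and "message_log" are hypothetical names.
        System.out.println(jdbcUrl("localhost:3306", "mydb"));
        migrationStatements("mydb", "message_log").forEach(System.out::println);
    }
}
```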

We are viewing the log file with EditPlus on my development machine (Windows XP).

First I tried to solve the problem in the log4j appender by adding <param name="Encoding" value="UTF-8" />.
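The Encoding parameter controls the charset the appender uses when turning log text into bytes (in log4j 1.x, as I understand it, the writer-based appenders wrap their output stream in an OutputStreamWriter with that encoding). The JDK-only sketch below does the same by hand, to show that a file written this way holds correct UTF-8. The editor used to view it (EditPlus here) must then also read it as UTF-8; otherwise the Greek will look garbled even though the file is fine. The class and file names are hypothetical:

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class Utf8LogWriter {

    /** Appends one line to a log file, encoding it explicitly as UTF-8. */
    public static void appendLine(Path logFile, String line) throws IOException {
        try (Writer w = new OutputStreamWriter(
                Files.newOutputStream(logFile, StandardOpenOption.CREATE,
                                      StandardOpenOption.APPEND),
                StandardCharsets.UTF_8)) {
            w.write(line);
            w.write(System.lineSeparator());
        }
    }

    /** Reads the whole log back, decoding it as UTF-8 (what the viewer must do too). */
    public static String readAll(Path logFile) throws IOException {
        return new String(Files.readAllBytes(logFile), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("service", ".log");
        String greek = "\u0395\u03bb\u03bb\u03ac\u03b4\u03b1"; // "Greece" in Greek
        appendLine(log, "payload=" + greek);
        System.out.println(readAll(log).trim().equals("payload=" + greek)); // true
        Files.delete(log);
    }
}
```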

The behavior is mostly erratic. We have several Sonic services. At times the log shows the Greek letters correctly when the message has arrived from the web application and is travelling through the first service. But by the time the first service has finished its processing and dispatched the message to the next queue, from which the next service picks it up, the Greek part of the message has become gibberish.
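One plausible explanation for text that survives the first service but not the second (an assumption, since the services' code is not shown) is a hop where the payload is turned into bytes with one charset and back into a String with another, for instance a call to getBytes() or new String(bytes) without a charset, which silently uses the platform default. Each mismatched hop makes the damage worse. The sketch models a hop as a hypothetical function; the fix it points to is naming the same charset explicitly on both sides:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ServiceHopDemo {

    /** One hypothetical service hop: serialize with one charset, deserialize with another. */
    public static String hop(String payload, Charset writeAs, Charset readAs) {
        byte[] wire = payload.getBytes(writeAs);
        return new String(wire, readAs);
    }

    public static void main(String[] args) {
        Charset utf8 = StandardCharsets.UTF_8;
        Charset latin1 = StandardCharsets.ISO_8859_1;
        String greek = "\u0395\u03bb\u03bb\u03ac\u03b4\u03b1"; // 6 Greek letters

        // Same charset on both sides: any number of hops is harmless.
        String consistent = hop(hop(greek, utf8, utf8), utf8, utf8);
        System.out.println(consistent.equals(greek)); // true

        // Mismatched hop: each 2-byte UTF-8 letter becomes 2 Latin-1 characters.
        String once = hop(greek, utf8, latin1);
        String twice = hop(once, utf8, latin1);
        System.out.println(once.length());  // 12
        System.out.println(twice.length()); // 24
    }
}
```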

Let us examine the options we have programmatically:

  1. XQMessage.getPartAsDocument(0,false), which gives me an org.w3c.dom.Document. I parse this document to construct my Java object.
  2. XQMessage.getPart(0).getContent().toString(), then do XML parsing to get the Document, and parse the document to construct my Java object.
  3. Get the content through the part's DataHandler, then do XML parsing to get the Document, and parse the document to construct my Java object:

     XQPart part = message.getPart(0);
     javax.activation.DataHandler dataHandler = part.getDataHandler();
     String content = (String) dataHandler.getContent();

    The Progress SonicMQ Deployment Guide V7.6, page 474 (Accepting Inbound HTTP Messages Through Direct Acceptors), says: "The character encoding attribute charset can be specified when a message is intended to be treated as JMS TextMessage. The character encoding in this example is specified as UTF-8. When no encoding attribute is provided, the encoding defaults to the ISO-8859-1". Although we are not using an HTTP acceptor here, I assumed that Sonic might be treating the encoding as ISO-8859-1. So I tried the following (here I did not convert the whole incoming message to bytes, only the field of my Java object that contains Greek):

  4. Re-encoding just that field:

     iso88591bytes = pld.getBytes();
     ByteBuffer bb = ByteBuffer.wrap(iso88591bytes);
     ByteBuffer cc = cset.encode(bb.asCharBuffer());
     utfbytes = cc.array();
     String utfString = new String(utfbytes, "UTF-8");

     This did not work; it threw an UnsupportedEncodingException.
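Some observations on options 1 to 4, offered as a sketch rather than a diagnosis. The String returned by options 2 and 3 is already decoded Java text, so no further charset conversion should be needed; it can be parsed through a StringReader, in which case the parser does not use the XML encoding declaration to decode anything (going through getBytes() without a charset would instead use the platform default). In option 4, ByteBuffer.asCharBuffer() does not decode: it reinterprets each pair of bytes as one UTF-16 code unit, so it cannot produce the intended text. Decoding bytes is done with a Charset's decode method. The String below stands in for the content returned by options 2 or 3 (an assumption; the Sonic API is not used), and the class name is hypothetical:

```java
import java.io.StringReader;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DecodeAndParse {

    /** Correct byte-to-text step: run the bytes through a Charset decoder. */
    public static String decodeUtf8(byte[] bytes) {
        return StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes)).toString();
    }

    /** What asCharBuffer() does: byte pairs reinterpreted as UTF-16 code units (no decoding). */
    public static String reinterpretAsUtf16(byte[] bytes) {
        return ByteBuffer.wrap(bytes).asCharBuffer().toString();
    }

    /** Parses XML that is already a Java String, without any byte conversion. */
    public static Document parseXml(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String greek = "\u0395\u03bb\u03bb\u03ac\u03b4\u03b1"; // "Greece" in Greek
        byte[] utf8 = greek.getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeUtf8(utf8).equals(greek));         // true
        System.out.println(reinterpretAsUtf16(utf8).equals(greek)); // false

        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><msg>" + greek + "</msg>";
        Document doc = parseXml(xml);
        System.out.println(doc.getDocumentElement().getTextContent().equals(greek)); // true
    }
}
```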

I don't know whether it has any relevance, but my development platform is Windows XP. It has 151 charsets:

Big5,Big5-HKSCS,EUC-JP,EUC-KR,GB18030,GB2312,GBK,IBM-Thai,IBM00858,IBM01140,IBM01141,IBM01142,IBM01143,IBM01144,IBM01145,IBM01146,IBM01147,IBM01148,IBM01149,IBM037,IBM1026,IBM1047,IBM273,IBM277,IBM278,IBM280,IBM284,IBM285,IBM297,IBM420,IBM424,IBM437,IBM500,IBM775,IBM850,IBM852,IBM855,IBM857,IBM860,IBM861,IBM862,IBM863,IBM864,IBM865,IBM866,IBM868,IBM869,IBM870,IBM871,IBM918,ISO-2022-CN,ISO-2022-JP,ISO-2022-KR,ISO-8859-1,ISO-8859-13,ISO-8859-15,ISO-8859-2,ISO-8859-3,ISO-8859-4,ISO-8859-5,ISO-8859-6,ISO-8859-7,ISO-8859-8,ISO-8859-9,JIS_X0201,JIS_X0212-1990,KOI8-R,Shift_JIS,TIS-620,US-ASCII,UTF-16,UTF-16BE,UTF-16LE,UTF-8,windows-1250,windows-1251,windows-1252,windows-1253,windows-1254,windows-1255,windows-1256,windows-1257,windows-1258,windows-31j,x-Big5-Solaris,x-euc-jp-linux,x-EUC-TW,x-eucJP-Open,x-IBM1006,x-IBM1025,x-IBM1046,x-IBM1097,x-IBM1098,x-IBM1112,x-IBM1122,x-IBM1123,x-IBM1124,x-IBM1381,x-IBM1383,x-IBM33722,x-IBM737,x-IBM856,x-IBM874,x-IBM875,x-IBM921,x-IBM922,x-IBM930,x-IBM933,x-IBM935,x-IBM937,x-IBM939,x-IBM942,x-IBM942C,x-IBM943,x-IBM943C,x-IBM948,x-IBM949,x-IBM949C,x-IBM950,x-IBM964,x-IBM970,x-ISCII91,x-ISO-2022-CN-CNS,x-ISO-2022-CN-GB,x-iso-8859-11,x-JIS0208,x-JISAutoDetect,x-Johab,x-MacArabic,x-MacCentralEurope,x-MacCroatian,x-MacCyrillic,x-MacDingbat,x-MacGreek,x-MacHebrew,x-MacIceland,x-MacRoman,x-MacRomania,x-MacSymbol,x-MacThai,x-MacTurkish,x-MacUkraine,x-MS950-HKSCS,x-mswin-936,x-PCK,x-windows-50220,x-windows-50221,x-windows-874,x-windows-949,x-windows-950,x-windows-iso2022jp
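The number of installed charsets matters less than which one is the JVM default, because getBytes(), new String(bytes), and FileWriter without an explicit charset all use it. On Java runtimes before 18 on Windows, the default typically follows the system code page (for example windows-1252 on a Western European installation), and windows-1252 has no Greek letters; since Java 18 the default is UTF-8. A quick check, with a hypothetical class name:

```java
import java.nio.charset.Charset;

public class CharsetReport {

    /** True if this JVM can encode and decode the named charset. */
    public static boolean supports(String name) {
        return Charset.isSupported(name);
    }

    public static void main(String[] args) {
        // The default is what charset-less conversions silently use.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
        System.out.println("Installed:       " + Charset.availableCharsets().size());
        System.out.println("UTF-8 supported: " + supports("UTF-8"));
        System.out.println("ISO-8859-7 (Greek) supported: " + supports("ISO-8859-7"));
    }
}
```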

Regards,

Subhendu

Posted by tsteinbo on 10-May-2010 06:07

Subhendu,

Have a look at solution P148382 in our knowledge base:

http://progress.atgnow.com/esprogress/Group.jsp?bgroup=sonic&id=P148382

Thomas

Posted by Bill Wood on 10-May-2010 08:34

Progress SonicMQ Deployment Guide V7.6 page 474 Accepting Inbound HTTP Messages Through Direct Acceptors says "The character encoding attribute charset can be specified when a message is intended to be treated as JMS TextMessage. [snip]

...

Though we are not using Http Acceptor here, I assumed may be sonic is treating the encoding as ISO-8859-1.

...

You should not make that assumption. The ISO-8859-1 default relates specifically to inbound HTTP; it is not a Sonic default.
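A sketch of the rule the deployment guide describes, applied to an HTTP Content-Type header and assuming the charset arrives as a standard charset= parameter. This is not Sonic's code; it only illustrates that the ISO-8859-1 fallback belongs to the HTTP inbound path and is triggered only when the sender omits the charset. The class and method names are hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {

    /**
     * Returns the charset named in a Content-Type header, or ISO-8859-1 when
     * no charset parameter is present (the documented HTTP-inbound fallback).
     */
    public static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).trim();
                    // Strip optional quotes, e.g. charset="UTF-8".
                    if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                        name = name.substring(1, name.length() - 1);
                    }
                    return Charset.forName(name);
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/plain; charset=UTF-8")); // UTF-8
        System.out.println(charsetOf("text/xml"));                  // ISO-8859-1
    }
}
```

In practice, a sender posting to an HTTP acceptor avoids the fallback by always sending the charset, e.g. Content-Type: text/plain; charset=UTF-8.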

This thread is closed