SonicMQ C/C++ client library connect problem

Posted by priitr on 19-Aug-2010 07:26

Hello

I'd like to ask for suggestions on solving an extremely annoying problem involving the SonicMQ C/C++ client library and unreliable network connections.

A client of mine asked me to interface his application (running on RedHat EL 5) with a service provider's SonicMQ server. I've implemented the client part as a single thread, loosely based on the SonicMQ ReliableTalk example program and using the SonicCClient v7.6.2 libraries.

As the provider's Sonic server only accepts SSL connections, I had to tunnel the traffic through a stunnel proxy on the same host as the application itself (the SonicMQ C/C++ library doesn't support SSL brokers).

Usually this setup works as intended and behaves well even when the provider's SonicMQ server goes down occasionally. setPingInterval and setExceptionListener do their work, and the connection is re-established when the SonicMQ server comes back.

Unfortunately, the connection cannot always be re-established. Sometimes it's the network or the firewalls between the endpoints that break, and then real trouble arises.

The exception is still thrown and caught and the old connection is invalidated, but then the program gets stuck in the createQueueConnection library call, endlessly trying to write/read data to/from the socket it's connected to (remember, it will successfully connect to the stunnel listening on localhost). The call doesn't recover even when network connectivity is restored.

The only solution so far has been to restart the whole application, and this is extremely annoying as the application's availability is vitally important for my client's business.

It _is_ stated in the documentation that connect with a timeout is not implemented (again) in the C/C++ client, so I can't blame Progress for not telling about the 'feature', but still... Can't something be done about it?

First I tried to check TCP-level connection availability before calling createQueueConnection, but it obviously doesn't work: a connection can always be established to the stunnel.

It kind of _does_ work if I monitor TCP connectivity to the provider's endpoint and only call createQueueConnection once it becomes available, but it is possible that the provider will also use some kind of proxy that can't connect to the SonicMQ server, and then we will block again...
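For reference, the TCP-level check on the provider's endpoint can be sketched with a non-blocking connect() plus poll(), so that the check itself cannot hang. This is only a minimal sketch: the function name tcp_reachable and its parameters are made up for illustration and are not part of the SonicMQ API. It only proves that something accepts TCP connections on that address, not that a SonicMQ broker is answering behind it.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <netdb.h>
#include <poll.h>
#include <string>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (not a SonicMQ call): returns true if a TCP connection
// to host:port completes within timeout_ms. The timeout applies per resolved
// address; name resolution in getaddrinfo() is not covered by it.
bool tcp_reachable(const std::string& host, const std::string& port, int timeout_ms) {
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    addrinfo* res = nullptr;
    if (getaddrinfo(host.c_str(), port.c_str(), &hints, &res) != 0) return false;

    bool ok = false;
    for (addrinfo* ai = res; ai != nullptr && !ok; ai = ai->ai_next) {
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0) continue;
        // Non-blocking mode makes connect() return immediately with EINPROGRESS.
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) {
            ok = true;
        } else if (errno == EINPROGRESS) {
            pollfd pfd{fd, POLLOUT, 0};
            if (poll(&pfd, 1, timeout_ms) == 1) {
                // Writable (or error): SO_ERROR tells whether the connect succeeded.
                int err = 0;
                socklen_t len = sizeof err;
                ok = getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0 && err == 0;
            }
        }
        close(fd);
    }
    freeaddrinfo(res);
    return ok;
}
```

The reconnect loop would then call something like tcp_reachable(provider_host, provider_port, 3000) and only proceed to createQueueConnection when it returns true; the host, port and timeout values are placeholders.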

I think that I could solve the problem by writing something to the (presumably) connected socket and reading some kind of answer back. Unfortunately I don't know what to write to the SonicMQ server and what answer to expect.

So my plea for help follows - does somebody know about the protocols used? Some kind of a HELLO message? Anything at all to ensure that SonicMQ is listening on the connected port?

The best solution, of course, would be a new version of the library supporting both SSL and timeouts.

Regards

Priit

All Replies

Posted by tsteinbo on 23-Aug-2010 15:51

The problem you describe sounds like the broker thinks the old connection is still active. In that case it will not let a new client connect that is using the same connect ID.

You should tune/enable the broker ping interval. The broker will then ping its clients, which allows it to terminate stale connections if, as in your case, firewalls/routers etc. misbehave and do not report a TCP error to the broker.

Thomas

Posted by priitr on 24-Aug-2010 08:50

Hi

Unfortunately the problem does not seem to be at the broker level, but strictly at the POSIX API programming level.

When tracing the test program's system calls, it can be seen that (presuming the network is down) createQueueConnection connects a socket to the SSL proxy successfully, writes 20 bytes to the socket, and waits (polling for 30 ms at a time) for an answer until the proxy resets the TCP connection (as it cannot contact the real broker). It then times out and does nothing more with that socket.

Moreover, it behaves the same way even without an SSL tunnel, for example when trying to connect to an SSL-enabled broker without having an SSL-enabled library: it writes its 20 bytes to the socket, the broker resets the connection, but the program just hangs within createQueueConnection...

Regards,

Priit

Posted by nkamath on 24-Aug-2010 09:38

Hi Priit,

If the same problem is occurring without the SSL tunnel proxy, perhaps you can explain the sequence of steps undertaken at the broker side and the client application/API side, so that we can understand the problem better?

Is it something like...

1. client connects to the broker

2. client publishes messages (is it a sender or receiver or both like our Talk sample)

3. waits for some time

3a. meanwhile the broker is shut down (as a way to simulate the network failure) or is it something else?

4. client retries connection thru createQueueConnection as part of the exception handling (and blocks forever)

If you have sample snippets for the client side (without the SSL tunnel proxy), that will also be useful.

Maybe the TCP connection or the higher level connection objects were not closed properly when the reset occurred.

It is too early to determine the stage of the SonicMQ connection handshake at which the block is occurring; we can take a closer look if you post the client snippets. Regardless of the determination, there is no way for the API to manipulate the SonicMQ protocol packets directly. It is better to fix the underlying issue in the runtime (as it seems to be a bug: the client connection is left in an unusable state if the connection is reset during the handshake) as opposed to tweaking the underlying socket directly from the application. I realise you are trying to get around the issue at the moment, but we need further data to answer your question more precisely.

Regards,

Navin

Posted by nkamath on 25-Aug-2010 07:38

Another suggestion is to use cf->setTCPConnectionTimeout(time_in_msecs);

It is worth trying, though I am not sure whether it will work.

Regards,

Navin

Posted by priitr on 26-Aug-2010 09:49

Hi

Thank you for the answer.

I'm currently using the ReliableTalk example from the SonicCClient7.6.2 32-bit RHEL4 distribution as the base for experiments, since it is a source we all have access to.

The command looks like this:

./ReliableTalk -u user -p pwd -qr TQ -qs TQ -b localhost:12506

If the SonicMQ server is up and stunnel is listening on localhost:12506, I can send and receive messages:

Enter text messages to clients that read from the TQ queue.
Press Enter to send each message.
dsfdsf
[ MESSAGE RECEIVED ] user: dsfdsf
sdfsdf
[ MESSAGE RECEIVED ] user: sdfsdf

...

If I kill the SSL proxy, the connection is terminated as expected and the program tries to reconnect repeatedly until stunnel is restarted.

....

There is a problem with the connection.  The connection was dropped by the broker

-5
error: Cannot connect to Broker - localhost:12506.  Try again in  - 10 seconds.
error: Cannot connect to Broker - localhost:12506.  Try again in  - 10 seconds.

...

Connection restored.  Messages will now be accepted again

... and sending/receiving works again; everything seems to be fine.

------------------------------------------

If I instead simulate a network failure and drop the TCP packets meant for the SonicMQ server (by doing iptables -A OUTPUT -d server_ip -p tcp -j DROP), the output is:

There is a problem with the connection.  The connection was dropped by the broker
-5

and so it stays, even when the network traffic is enabled again.

Liberal use of cout statements shows that the program gets stuck on the line with

            connect = createQueueConnection(createString(m_broker), createString(""), createString(m_username), createString(m_password));

The program also gets stuck in the same place when I try connecting to the SonicMQ server directly, without using the proxy, i.e. trying to establish a non-SSL connection to an SSL-listening SonicMQ server.

The same also happens when trying to connect to arbitrary TCP services (HTTP, for example).

I also modified the example to use QueueConnectionFactory in order to set the timeout for TCP connections. Something like this:

factory = createQueueConnectionFactory(createString(m_broker), createString(CLIENT_ID), createString(m_username), createString(m_password));
factory->setTCPConnectionTimeout((jint)5000);

and later, in ReliableTalk::setupConnection, I created the connection with:

connect = factory->createQueueConnection();

Unfortunately the observed behavior remains the same.
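Since the library-level timeout has no effect here, a blunt fallback is to put a process-wide deadline around the blocking call and let an external supervisor restart the application when it fires, which at least automates the manual restart. Below is a minimal sketch using POSIX alarm(). The class name ConnectWatchdog is made up for illustration, and it rests on assumptions that have not been verified: that the SonicMQ client library does not use SIGALRM itself, that SIGALRM is not blocked in every thread, and that something (init, a shell loop) restarts the process after it is killed.

```cpp
#include <csignal>
#include <unistd.h>

// Hypothetical RAII guard (not a SonicMQ class): arms a SIGALRM deadline for
// as long as the object is alive. If the guarded code does not finish within
// `seconds`, the default SIGALRM action terminates the whole process.
class ConnectWatchdog {
public:
    explicit ConnectWatchdog(unsigned seconds) {
        std::signal(SIGALRM, SIG_DFL);  // make sure the default (terminate) action applies
        alarm(seconds);                 // arm the per-process alarm timer
    }
    ~ConnectWatchdog() { alarm(0); }    // guarded call returned in time: disarm

    ConnectWatchdog(const ConnectWatchdog&) = delete;
    ConnectWatchdog& operator=(const ConnectWatchdog&) = delete;
};
```

Declaring a guard such as ConnectWatchdog guard(60); in the scope that calls createQueueConnection would bound that call to roughly 60 seconds; the value is a placeholder. There is only one alarm timer per process, so this cannot be nested or combined with other code that relies on alarm().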

strace shows that the program waits indefinitely for _something_ to come back from the socket it wrote to:

Excerpts:

connect(3, {sa_family=AF_INET, sin_port=htons(2507), sin_addr=inet_addr("172.26.214.112")}, 16) = -1 EINPROGRESS (Operation now in progress)

..... wait till connection established, then write 'magic' to the socket

writev(3, [{"\32\4\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 20}, {NULL, 0}, {NULL, 0}, {NULL, 0}, {NULL, 0}], 5) = 20

...tries to read an answer,
readv(3, [{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768}], 1) = 0

.. and then it polls the socket about 200 times for events, each time waiting 30 ms:

poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}], 2, 29) = 0 (Timeout)

... and then it doesn't do anything more with that socket, just doing 'something' in createQueueConnection().


Was all that even remotely helpful for diagnosing the problem? Anything specific I should try to do?

Regards

Priit

Posted by nkamath on 31-Aug-2010 07:39

The comments were helpful. Let's try to set up the simplest scenario that reproduces the issue:

-QueuePTP/ReliableTalk sample

-remove the ssl proxy mediation

-make the broker non-SSL (if possible)

-check that the client works correctly

-drop broker connection using iptables -j DROP (at which point is this step done?)

-check that the client is blocked in createQueueConnection

Does the above seem correct? Can you try these steps with a non-SSL broker? It would be good to get pstack (or equivalent) output when the client is blocked in createQueueConnection.

Regards,

Navin

Posted by priitr on 01-Sep-2010 08:54

Hi again

Fortunately my client's SonicMQ service provider agreed to let me also use a non-SSL listener for testing purposes.

I tried the ReliableTalk example application against that listener and can confirm that it works as expected and intended.

If the network connection drops, the SonicMQ session terminates with an exception and createQueueConnection is called to recreate the connection.

As the server cannot be reached, createQueueConnection terminates with an exception, the program prints "Cannot connect to Broker - ..., try again in.." and the process repeats until network connectivity is restored.

However, if I'm using the (SSL) proxy, or the program listening on the broker address isn't SonicMQ, I get a hang in createQueueConnection.

It seems that the problem manifests itself if:

1. createQueueConnection _can_ establish a TCP connection to the broker port and

2. it does not get back what it expects from the connected port.

pstack then gives us the following call stack:

Thread 2 (Thread 0x497cb70 (LWP 11102)):
#0  0x00a43416 in __kernel_vsyscall ()
#1  0x0048ede6 in poll () from /lib/libc.so.6
#2  0x00140838 in spr_evsdisp_dispatch () from /home/priitr/Sonic762/SonicCClient7.6.2/Linux_AS4/bin/release/libspr.so
#3  0x0014c9a3 in io_broker_connection_connect () from /home/priitr/Sonic762/SonicCClient7.6.2/Linux_AS4/bin/release/libio.so
#4  0x0015c3f5 in _jio_connect_orchestrator_connect_single () from /home/priitr/Sonic762/SonicCClient7.6.2/Linux_AS4/bin/release/libjio.so
#5  0x0015c65a in _jio_connect_orchestrator_connect_iterator () from /home/priitr/Sonic762/SonicCClient7.6.2/Linux_AS4/bin/release/libjio.so
#6  0x0015c869 in jio_connect_orchestrator_connect () from /home/priitr/Sonic762/SonicCClient7.6.2/Linux_AS4/bin/release/libjio.so
#7  0x001639a2 in jio_connection_connect () from /home/priitr/Sonic762/SonicCClient7.6.2/Linux_AS4/bin/release/libjio.so
#8  0x00efe15a in progress::message::jclient::Connection::constructorHelper(progress::message::jclient::Connection*, java::lang::String*, java::lang::String*, java::lang::String*, java::lang::String*, java::lang::String*, java::util::Hashtable*, progress::message::jclient::ILoginSPI*) () from /home/priitr/Sonic762/SonicCClient7.6.2/Linux_AS4/bin/release/libsmq.so
#9  0x00f3b05a in progress::message::jclient::QueueConnection::constructorHelper(progress::message::jclient::QueueConnection*, java::lang::String*, java::lang::String*, java::lang::String*, java::lang::String*, java::lang::String*, java::util::Hashtable*, progress::message::jclient::ILoginSPI*) () from /home/priitr/Sonic762/SonicCClient7.6.2/Linux_AS4/bin/release/libsmq.so
#10 0x00f441cc in progress::message::jclient::QueueConnectionFactory::createQueueConnection(progress::message::jclient::QueueConnectionFactoryRef, java::lang::StringRef, java::lang::StringRef, unsigned char, unsigned char) () from /home/priitr/Sonic762/SonicCClient7.6.2/Linux_AS4/bin/release/libsmq.so
#11 0x00f44617 in progress::message::jclient::QueueConnectionFactory::createQueueConnection() () from /home/priitr/Sonic762/SonicCClient7.6.2/Linux_AS4/bin/release/libsmq.so
#12 0x00f3a5b4 in progress::message::jclient::createQueueConnection(java::lang::StringRef, java::lang::StringRef, java::lang::StringRef, java::lang::StringRef) () from /home/priitr/Sonic762/SonicCClient7.6.2/Linux_AS4/bin/release/libsmq.so
#13 0x0804afb6 in ReliableTalk::setupConnection() ()
#14 0x0804bf90 in ReliableTalk::onException(java::lang::ExceptionRef) ()
#15 0x00efc995 in progress::message::jclient::Connection::onException(progress::message::jclient::JMSExceptionRef) () from /home/priitr/Sonic762/SonicCClient7.6.2/Linux_AS4/bin/release/libsmq.so
#16 0x00efcd01 in progress::message::jclient::Connection::notifyExceptionListener(unsigned int) () from /home/priitr/Sonic762/SonicCClient7.6.2/Linux_AS4/bin/release/libsmq.so
#17 0x00efd092 in progress::message::jclient::Connection::changeConnectionState(int, unsigned int) () from /home/priitr/Sonic762/SonicCClient7.6.2/Linux_AS4/bin/release/libsmq.so
#18 0x00efd109 in progress::message::jclient::connection_state_change_listener(t_jio_connection_struct*, t_jio_connection_state, unsigned int, void*) () from /home/priitr/Sonic762/SonicCClient7.6.2/Linux_AS4/bin/release/libsmq.so
#19 0x001630c3 in jio_connection_state_update () from /home/priitr/Sonic762/SonicCClient7.6.2/Linux_AS4/bin/release/libjio.so
#20 0x00164c30 in jio_connection_handle_connection_drop () from /home/priitr/Sonic762/SonicCClient7.6.2/Linux_AS4/bin/release/libjio.so
#21 0x0016e32e in _lwc_connection_shutdown_phase2 () from /home/priitr/Sonic762/SonicCClient7.6.2/Linux_AS4/bin/release/liblwc.so
#22 0x0016e461 in _lwc_connection_shutdown_dpc_handler () from /home/priitr/Sonic762/SonicCClient7.6.2/Linux_AS4/bin/release/liblwc.so
#23 0x00140b3b in spr_evsdisp_dispatch () from /home/priitr/Sonic762/SonicCClient7.6.2/Linux_AS4/bin/release/libspr.so
#24 0x0016e94f in _lwc_client_thread_ep () from /home/priitr/Sonic762/SonicCClient7.6.2/Linux_AS4/bin/release/liblwc.so
#25 0x001415f3 in spr_thread_thread_entry_point () from /home/priitr/Sonic762/SonicCClient7.6.2/Linux_AS4/bin/release/libspr.so
#26 0x0057c919 in start_thread () from /lib/libpthread.so.0
#27 0x00499cbe in clone () from /lib/libc.so.6
Thread 1 (Thread 0xb788f6d0 (LWP 11099)):
#0  0x00a43416 in __kernel_vsyscall ()
#1  0x004898fb in read () from /lib/libc.so.6
#2  0x004268eb in _IO_new_file_underflow () from /lib/libc.so.6
#3  0x0042860b in _IO_default_uflow_internal () from /lib/libc.so.6
#4  0x00429c1a in __uflow () from /lib/libc.so.6
#5  0x0042319c in getc () from /lib/libc.so.6
#6  0x06a2eb56 in __gnu_cxx::stdio_sync_filebuf<char, std::char_traits<char> >::underflow() () from /usr/lib/libstdc++.so.6
#7  0x06a1a35b in std::basic_istream<char, std::char_traits<char> >::getline(char*, int, char) () from /usr/lib/libstdc++.so.6
#8  0x0804b8f9 in ReliableTalk::talker(char*, char*, char*, char*, char*, int) ()
#9  0x0804ccc0 in main ()

Regards

Priit

Posted by nkamath on 09-Sep-2010 10:23

Hi Priit,

I saw a similar problem on the very first attempt when I tried to connect to an SSL-enabled/configured broker from the C client (which has no support for SSL). I tried on Solaris, but the stack looks very similar:

  [1] __pollsys(0x8ad58, 0x1, 0xffbfe948, 0x0, 0xfe7ba480, 0x1), at 0xfe4c65c4
  [2] _pollsys(0x8ad58, 0x1, 0xffbfe948, 0x0, 0x0, 0x0), at 0xfe4b965c
  [3] _poll(0x8ad58, 0x1, 0x19, 0x10624c00, 0x5f5e1, 0x17d7840), at 0xfe461b34
=>[4] _spr_evsdisp_dispatch(evsdispselect = 0x8ab00, timeout = 25), line 671 in "spr_evsdisp.c"
  [5] spr_evsdisp_dispatch(evsdisp = 0x8ab00, timeout = -1), line 816 in "spr_evsdisp.c"
  [6] io_broker_connection_connect(iobc = 0x78d80, timeout = 0.0), line 1971 in "iobc.c"
  [7] _jio_connect_orchestrator_connect_single(co = 0x784f0, p = 0x768f0, timeout = 1284040963.4949, honour_timeout = 0, iobc = 0x77f54), line 283 in "jio_connect_orchestrator.c"
  [8] _jio_connect_orchestrator_connect_iterator(co = 0x784f0, p = 0x768f0, timeout = 1284040963.4949, honour_timeout = 0, iobc = 0x77f54), line 357 in "jio_connect_orchestrator.c"
  [9] _jio_connect_orchestrator_connect(co = 0x784f0, p = 0x768f0, default_broker_url = 0x2da28 "ssl://sonicsol08:8889", t = -1, iobc = 0x77f54), line 452 in "jio_connect_orchestrator.c"
  [10] jio_connect_orchestrator_connect(co = 0x784f0, default_broker_url = 0x2da28 "ssl://sonicsol08:8889", p = 0x768f0, iobc = 0x77f54), line 503 in "jio_connect_orchestrator.c"
  [11] jio_connection_connect(connection = 0x77ef0, default_broker_url = 0x2da28 "ssl://sonicsol08:8889"), line 1409 in "jio_connection.c"
  [12] progress::message::jclient::Connection::constructorHelper(conn = 0x76720, brokerURL = 0x72588, connectID = (nil), username = 0x759a0, password = 0x759b8, clientID = (nil), environment = 0x752f0, loginSPI = 0x2d9a8), line 461 in "Connection.cpp"
  [13] progress::message::jclient::QueueConnection::constructorHelper(conn = 0x76720, brokerURL = 0x72588, connectID = (nil), username = 0x759a0, password = 0x759b8, clientID = (nil), env = 0x752f0, loginSPI = 0x2d9a8), line 103 in "QueueConnection.cpp"
  [14] progress::message::jclient::QueueConnectionFactory::createQueueConnection(cf = CLASS, username = CLASS, password = CLASS, checkTxn = '\001', tcpNoDelay = '\001'), line 295 in "QueueConnectionFactory.cpp"
  [15] progress::message::jclient::QueueConnectionFactory::createQueueConnection(this = 0x75590, userName = CLASS, password = CLASS), line 258 in "QueueConnectionFactory.cpp"
  [16] Talk::talker(this = 0xffbff730, broker = 0xffbff94a "ssl://sonicsol08:8889", username = 0xffbff963 "Administrator", password = 0xffbff974 "Administrator", qReceiver = 0xffbff986 "SampleQ1", qSender = 0xffbff993 "SampleQ1", timeout = 0), line 128 in "Talk.cpp"
  [17] main(argc = 11, argv = 0xffbff7d4), line 411 in "Talk.cpp"

To me it looks like the broker is right in not responding to a normal packet on an SSL acceptor in the above situation. FWIW, we will be implementing SSL on the client side and ensuring that encryption etc. are supported. IMO, this is the right way forward.

The case of the SSL-tunneled proxy seems to be different, more complicated, and not necessarily related to the above, even though the stacks are similar. I say this because the proxy approach works for you the first time and keeps working until the connection is dropped, whereas in my case it's the first connection attempt that fails. My guess is that if the SSL handshake packets and Sonic connection packets are not dropped, it will work correctly. Is it possible to shut down and restart the broker while normal data operations (i.e. the chat conversation) are going on? That is, do a disconnect without dropping higher-protocol packets from SSL or Sonic. That should tell us whether we are on the right track or not.

Regards,

Navin

This thread is closed