PASOE - Clustering and Session Replication

Posted by christian.bryan@capita.co.uk on 02-Apr-2019 13:42

Hi,

I am trying to set up PASOE clustering. I have followed the documentation, but I cannot get the session replication part to work.

I have successfully set up a Tomcat load balancer with Apache HTTPD and mod_jk on a Windows server, and this works as expected.

Next, I tried to set up session replication between two PASOE instances on the same server.

However, I cannot get this to work, and I am receiving the following error in the catalina.out file:

02-Apr-2019 12:52:39.256 WARNING [Tribes-Task-Receiver[Catalina-Channel]-1] org.apache.catalina.ha.session.ClusterSessionListener.messageReceived Context manager doesn't exist:[]
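
For anyone hunting for the same message, here is a minimal sketch of a filter that pulls cluster-related warnings out of catalina.out. The regular expression is an assumption based only on the line format shown above, and the `cluster_warnings` helper is a hypothetical name, not part of PASOE or Tomcat.

```python
import re

# Assumed layout, inferred from the line quoted above:
# <date> <time> <LEVEL> [<thread>] <logger.method> <message>
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[A-Z]+) "
    r"\[(?P<thread>.+?)\] "
    r"(?P<source>\S+) "
    r"(?P<message>.*)$"
)

# Package prefixes used by the clustering classes named in this thread.
CLUSTER_PREFIXES = ("org.apache.catalina.ha.", "org.apache.catalina.tribes.")


def cluster_warnings(lines):
    """Return parsed WARNING/SEVERE entries logged by clustering classes."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # continuation lines, stack traces, blank lines
        entry = match.groupdict()
        if entry["level"] in ("WARNING", "SEVERE") and entry["source"].startswith(CLUSTER_PREFIXES):
            hits.append(entry)
    return hits


if __name__ == "__main__":
    sample = [
        "02-Apr-2019 12:52:39.256 WARNING [Tribes-Task-Receiver[Catalina-Channel]-1] "
        "org.apache.catalina.ha.session.ClusterSessionListener.messageReceived "
        "Context manager doesn't exist:[]",
    ]
    for hit in cluster_warnings(sample):
        print(hit["timestamp"], hit["source"], hit["message"])
```

In practice you would pass the open catalina.out file object instead of the sample list.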

All Replies

Posted by Roy Ellis on 02-Apr-2019 13:54

Did you run the following?

<instance-directory>/bin/tcman.sh feature Cluster=on

To see your current settings, run:

<instance-directory>/bin/tcman.sh feature

Let me know, Roy

Posted by christian.bryan@capita.co.uk on 02-Apr-2019 14:03

Hi Roy

Yes, I have run that, and I noticed a bug in OE 11.7.2. Turning on that feature essentially un-comments the following block:

<!-- feature:begin:Cluster:on -->
<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
         channelSendOptions="${psc.as.clust.sendoptions}">
  <Manager className="org.apache.catalina.ha.session.${psc.as.clust.manager}"
           expireSessionsOnShutdown="${psc.as.clust.expireOnShut}"
           notifyListenersOnReplication="${psc.as.clust.notifyListeners}"
           maxInactiveInterval="${psc.as.clust.inactivetimeout}"/>
  <Channel className="org.apache.catalina.tribes.group.GroupChannel">
    <Membership className="org.apache.catalina.tribes.membership.McastService"
                address="${psc.as.clust.mcast.addr}"
                port="${psc.as.clust.mcast.port}"
                frequency="${psc.as.clust.mcast.freq}"
                dropTime="${psc.as.clust.mcast.dropafter}" />
    <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
      <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
    </Sender>
    <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
              address="${psc.as.clust.recv.addr}"
              port="${psc.as.clust.recv.port}"
              autoBind="${psc.as.clust.recv.autobind}"
              selectorTimeout="${psc.as.clust.recv.selectortimeout}"
              maxThreads="${psc.as.clust.recv.maxthreads}"
              tcpNoDelay="${psc.as.clust.recv.nodelay}"
              timeout="${psc.as.clust.recv.timeout}" />
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
    <!-- CJB <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/> -->
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor"/>
  </Channel>
  <Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter=""/>
  <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve"/>
  <!--CJB <ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener"/> -->
  <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
</Cluster>
<!-- feature:end:Cluster:on -->

I had to comment out the lines marked CJB because they reference Tomcat 7 classes that are deprecated in Tomcat 8, which OE 11.7.2 uses. If I don't do this, the PAS instance won't start.
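
To check another instance for the same problem, a small script along these lines could list any active elements that still reference the two classes commented out above. This is only a sketch: `find_legacy_classes` and the sample fragment are hypothetical, and the list of class names comes solely from the CJB comments in this post.

```python
import xml.etree.ElementTree as ET

# Class-name endings taken from the two lines marked CJB in this post.
TOMCAT7_ONLY = ("MessageDispatch15Interceptor", "JvmRouteSessionIDBinderListener")


def find_legacy_classes(xml_text):
    """Return className values of active elements that end with a listed name.

    Elements inside XML comments are ignored, because the default
    ElementTree parser drops comments.
    """
    root = ET.fromstring(xml_text)
    hits = []
    for elem in root.iter():
        class_name = elem.get("className", "")
        if class_name.endswith(TOMCAT7_ONLY):
            hits.append(class_name)
    return hits


# Hypothetical fragment: one active legacy element, one already commented out.
SAMPLE = """<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster">
  <Channel className="org.apache.catalina.tribes.group.GroupChannel">
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
    <!-- <ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener"/> -->
  </Channel>
</Cluster>"""

if __name__ == "__main__":
    for name in find_legacy_classes(SAMPLE):
        print("still active:", name)
```

Only the Interceptor is reported for the sample, since the ClusterListener is already inside a comment.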

Posted by christian.bryan@capita.co.uk on 02-Apr-2019 14:34

I am going to upgrade to 11.7.4 to see if that helps!

Posted by dbeavon on 02-Apr-2019 15:38

Are you working from the PASOE documentation, or from the Tomcat docs?

I'd be interested to hear how this ends up working for you, and what docs you found for the setup work. Session replication seems like it would be a useful feature.

PS. Our own load balancing is done by Citrix NetScaler. Unfortunately, the failover from one PASOE server to another is fairly painful. During a failover, all of the "state-free/session-free" openclients running on .NET will crash. This is because the openclient's internal session pool becomes invalidated as a result of the failover. The moment one of them begins another round-trip request to PASOE, it encounters a generic communication exception. (The exception seems to indicate a very general-purpose network problem: "something bad happened".)

Posted by Paul Connaughton on 03-Apr-2019 13:59

Hi Christian,

This is a known defect in 11.7.2 (ADAS-4236). It was fixed in the 12.0 release. It is also fixed in the upcoming 11.7.5 service pack, which is tentatively scheduled for release at the end of Q2 or early Q3. The workaround in the defect is the same as the one you discovered.

Posted by christian.bryan@capita.co.uk on 03-Apr-2019 15:04

Hi Paul

Thanks for the confirmation.

With all the changes I have made, I still can't get the clustering / session replication to work, which makes me wonder whether something else is broken or undocumented.

My assumption is that with clustering turned on, I should be able to stop one of the PAS instances in the cluster. Because the session identified by the JSESSIONID has been replicated to the other instances, the load balancer can route me to another PAS instance, and the jvmRoute appended to the end of the JSESSIONID is rewritten to match the PAS instance I have now been routed to.

If this is incorrect, I am not sure what the point of turning clustering on is, as I can get load balancing simply by switching on the AJP port.
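
To make that assumption concrete, here is a rough illustration of the suffix rewrite I am expecting after a failover. It is not Tomcat's implementation, just a sketch of the behaviour; `rebind_session_id` and the route names `pas1` and `pas2` are hypothetical.

```python
def rebind_session_id(session_id, local_route):
    """Return session_id with its jvmRoute suffix replaced by local_route.

    Assumes the "<id>.<jvmRoute>" layout described above. IDs without a
    suffix are returned unchanged in this sketch.
    """
    base, sep, route = session_id.rpartition(".")
    if not sep:
        return session_id  # no route suffix, nothing to rewrite
    if route == local_route:
        return session_id  # already bound to this instance
    return base + "." + local_route


if __name__ == "__main__":
    # Session created on pas1, request now lands on pas2 after a failover.
    print(rebind_session_id("0123456789ABCDEF.pas1", "pas2"))
```

For the rewritten ID to be usable, the session itself must already have been replicated to the second instance, which is the part that is not working for me.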

I have logged a support call for this: 00486394.

One thing I tried was commenting out the standard manager in context.xml, e.g.

<!-- CJB
    <Manager
        maxActiveSessions="-1"
        pathname=""
        processExpiresFrequency="6" >
        <SessionIdGenerator sessionIdLength="22" />
    </Manager>
-->

This seemed to throw some new errors in catalina.out, which suggested that the DeltaManager was now being used, but I think it is failing at the following warning.

03-Apr-2019 14:29:15.603 INFO [localhost-startStop-1] org.apache.catalina.ha.session.DeltaManager.startInternal Register manager [localhost#] to cluster element [Engine] with name [Catalina]

03-Apr-2019 14:29:15.603 INFO [localhost-startStop-1] org.apache.catalina.ha.session.DeltaManager.startInternal Starting clustering manager at [localhost#]

03-Apr-2019 14:29:15.627 INFO [localhost-startStop-1] org.apache.catalina.ha.session.DeltaManager.getAllClusterSessions Manager [localhost#], requesting session state from [org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 1, 173, 66}:4001,{10, 1, 173, 66},4001, alive=15968, securePort=-1, UDP Port=-1, id={-48 113 -63 -73 57 87 72 -21 -65 -4 -52 49 88 113 -5 -108 }, payload={}, command={}, domain={}]]. This operation will timeout if no session state has been received within [60] seconds.

>>>>> WARNING

03-Apr-2019 14:29:15.733 WARNING [localhost-startStop-1] org.apache.catalina.ha.session.DeltaManager.waitForSendAllSessions Manager [localhost#]: No context manager send at [4/3/19 2:29 PM] received in [127] ms.

<<<<<<

PS - I am not deploying any web apps, just ABL apps, and the WEB-INF/web.xml has the <distributable/> tag.
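
For anyone wanting to verify the same thing on their own instance, a quick check for the tag could look like the sketch below. The `is_distributable` helper and both sample documents are hypothetical; in practice you would read the text of the deployed web.xml.

```python
import xml.etree.ElementTree as ET


def is_distributable(web_xml_text):
    """Return True if <distributable/> is a direct child of the root element.

    Local tag names are compared so the check works whether or not the
    file declares an XML namespace.
    """
    root = ET.fromstring(web_xml_text)
    return any(child.tag.rsplit("}", 1)[-1] == "distributable" for child in root)


# Hypothetical minimal descriptors, with and without the tag.
WITH_TAG = (
    '<web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="3.1">'
    "<distributable/></web-app>"
)
WITHOUT_TAG = '<web-app version="3.1"><display-name>demo</display-name></web-app>'

if __name__ == "__main__":
    print(is_distributable(WITH_TAG))
    print(is_distributable(WITHOUT_TAG))
```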

Posted by christian.bryan@capita.co.uk on 17-Jun-2019 05:04

Can the 11.7 workaround be published?

Posted by christian.bryan@capita.co.uk on 30-Aug-2019 09:30

PS - The bug with the context.xml still exists in 11.7.5.

This thread is closed