I would just like to know if anyone has a detailed answer as to what these errors mean in the DB log file.
19:15:08 BROKER 0: Server 4 has 1 unresolved pending connections(s). Please check port 5703. (12454)
19:15:08 BROKER 0: Clearing pending connections from server 4. (12455)
We are currently experiencing an issue with batch jobs that run in the evening: they are failing with a "lock wait time out of 1800". We have multiple customers in our data centre running on version 9.1e that share one common DB that reports on day-end times. These batch jobs connect via a client/server connection on our LAN.
At this point I'm not sure if it is a DB resource, coding or network issue. I'm leaning towards some type of load issue, as there are multiple cron jobs that kick in during the evening to start these batch processes. They start from 19h00 and end at around 22h00. The first few batch runs go through, then later in the evening we start getting the "lock wait time out" error. Each batch job can run for approximately 10 minutes to 2 hours, so there is definitely an overlap where multiple customers are connecting to the common DB at the same time.
"Unresolved pending connections" means that a login broker sent a server's port to a new client to connect to, but the server did not report back to the login broker that the client had successfully connected. It might happen, for example, when a remote client server is extremely busy.
"Lock wait time" errors are reported by sessions that are already connected to the db. It's unlikely that these errors can cause the pending connections.
9.1E is an old version. It has an issue with jump notes in the bi file. Just a guess: one of the remote sessions is undoing a large transaction that, for example, deleted a lot of records. The undo restores these records and keeps them locked until the undo ends. That causes the "lock wait timeout" errors. If the undo runs into the jump notes, it makes the server that serves the session very busy. That causes the "unresolved pending connections".
Check the db log for transaction backout messages with a long time between the begin and end phases. During a new incident, check the current activity: if bi reads exceed bi writes, then the jump notes are the likely cause. The MTX latch will often be owned by the client that is undoing its transaction.
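If it helps, the log check could be scripted roughly like this. It is only a sketch: the line layout (time, process name, process number, message) is taken from the two log lines quoted in the question, while the begin/end marker strings and the 60-second threshold are assumptions that should be adjusted to match the backout messages that actually appear in your own log.

```python
import re
import sys
from datetime import datetime, timedelta

# Layout taken from the log lines quoted in this thread:
# "HH:MM:SS <process> <number>: <message>"
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+(\S+)\s+(\d+):\s+(.*)$")

# ASSUMPTION: wording of the begin/end backout messages. Check your own
# log file and change these two strings if the wording differs.
BEGIN_MARK = "Begin transaction backout"
END_MARK = "Transaction backout completed"


def long_backouts(lines, threshold_seconds=60):
    """Return (process, number, begin, end, seconds) for every backout
    that lasted at least threshold_seconds."""
    started = {}   # (process, number) -> time the backout began
    results = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        stamp, proc, num, msg = m.groups()
        when = datetime.strptime(stamp, "%H:%M:%S")
        key = (proc, num)
        if BEGIN_MARK in msg:
            started[key] = when
        elif END_MARK in msg and key in started:
            begin = started.pop(key)
            if when < begin:
                # The log has no date on each line, so treat an earlier
                # end time as a backout that crossed midnight.
                when += timedelta(days=1)
            seconds = int((when - begin).total_seconds())
            if seconds >= threshold_seconds:
                results.append(
                    (proc, int(num), begin.strftime("%H:%M:%S"), stamp, seconds)
                )
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python backouts.py <path to db .lg file>
    with open(sys.argv[1], errors="replace") as log:
        for row in long_backouts(log):
            print(row)
```

Any backout it lists that overlaps the 19h00 to 22h00 batch window would be worth comparing against the times of the lock wait timeouts and the pending connection messages.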
Thanks George, I appreciate the feedback