Pend. Users at Promon, r&d, 1 (status displays), 3 (serv

Posted by Rob Fitzpatrick on 20-Nov-2017 15:09

Are you specifying -minport/-maxport on your connection brokers? You should have unique, non-overlapping port ranges for each one. It makes firewall configuration and troubleshooting of issues like this one much more straightforward.

If a client connects to the broker (-S port) but can't connect to the assigned server port (between -minport and -maxport inclusive) then you will have a pending connection. Check that the firewall(s) between clients and server allow connections to the server on all required broker and server ports.

Posted by Rob Fitzpatrick on 20-Nov-2017 15:12

Also check promon to ensure that all servers that should be able to spawn are able to. Check the DB log for errors related to spawning servers. Note that the broker won't spawn a 4GL server on a port if it is defined in the OS services file, even if it isn't currently in use. Check that there aren't defined services with port numbers within your minport/maxport ranges.

Posted by bremmeyr on 20-Nov-2017 15:20

Thank you Rob for the input. In promon I find servers for the number of servers I expect. All of the servers have Cur. users values that account for the connected users.

It looks like the number of pending users adds up to what I expect are open slots. So new connections get the no more resources.

I need to look into the min/max port, service file, and run a netstat to see what is in use.

Posted by Rob Fitzpatrick on 20-Nov-2017 15:22

Also check that -n is high enough given your anticipated connection count.

Posted by George Potemkin on 20-Nov-2017 18:45

> What causes values to show in pend. in servers status of promon?

Login broker sents a port of the started server to a new client. Then it checks if the number of the clients served by the server is increased. In other words the login broker checks if the client successfully connected to the server. Otherwise it's counted as a pending client connection.

So it's always a port inside the -minport/-maxport that was used to start a remote client server. Most likely there is a problem with the port on OS level.

Can you post the output from promon?
Example:

Sv                                                   Pend.   Cur.   Max.   Port
No    Pid  Type       Protocol               Logins  Users  Users  Users    Num

361 50988512 Auto       TCP                       339      0      2     30  10375
362 15599228 Auto       TCP                       460      0      2     30  10376
363 66258402 Auto       TCP                         0      2      0     30  10405
364 16449868 Auto       TCP                       306      0      2     30  10378
365 42468244 Auto       TCP                       362      0      2     30  10379

Srv 363 did not have the successful logins. It was a server newly started on port 10405 and the clients were unable to connect to this port.

Posted by bremmeyr on 21-Nov-2017 09:41

Connections can be made on each of the ports and then connections seem to get stuck in pending.

11/21/17 Status: Servers 07:37:58

Sv Pend. Cur. Max. Port

No Pid Type Protocol Logins Users Users Users Num

0 5484 Login TCP 0 0 0 2 2021

1 16956 Auto TCP 1 0 1 2 4002

2 7256 Login TCP 20312 0 0 6 2020

3 8132 Auto TCP 2088 4 2 6 4100

4 6276 Auto TCP 3762 4 2 6 4101

5 6116 Auto TCP 2127 4 2 6 4102

6 18496 Auto TCP 790 0 1 2 4006

7 4848 Auto TCP 6070 2 4 6 4103

8 8424 Auto TCP 2061 3 3 6 4104

9 7772 Auto TCP 2067 5 1 6 4105

10 7472 Auto TCP 2112 3 3 6 4106

11 13372 Auto TCP 105 0 1 2 4003

12 14000 Auto TCP 600 0 1 2 4004

13 17196 Auto TCP 461 0 0 2 4005

Posted by George Potemkin on 21-Nov-2017 09:56

Too busy servers may result in the pending connections.

What about db performance?

R&D/Activity/Summary/Commits per sec and Active trans?

Activity/Performance Indicators/Latch timeouts per sec?

Posted by gmuchon on 21-Nov-2017 10:22

The parameter PenndConnTime is set?

Posted by bremmeyr on 21-Nov-2017 12:06

R&D/Activity/Summary/Commits per sec and Active trans?

11/21/17 Activity: Summary

10:05:50 08/08/17 19:41 to 11/21/17 10:05 (2511 hrs 23 min)

Event Total Per Sec |Event Total Per Sec

Commits 98954K 11.2 |DB Reads 1414600K 160.2

Undos 6138 0.0 |DB Writes 18149892 2.0

Record Reads 346135M 40144.7 |BI Reads 701766 0.1

Record Updates 104761K 11.9 |BI Writes 5625586 0.6

Record Creates 5450812 0.6 |AI Writes 5345318 0.6

Record Deletes 2844443 0.3 |Checkpoints 76869 0.0

Record Locks 1929763K 218.6 |Flushed at chkpt 16441621 1.8

Record Waits 125 0.0 |Active trans 0

Rec Lock Waits 0 % BI Buf Waits 1 % AI Buf Waits 1 %

Writes by APW 0 % Writes by BIW 0 % Writes by AIW 0 %

DB Size: 14 GB BI Size: 1263 MB AI Size: 376 K

Empty blocks: 91305 Free blocks: 41426 RM chain: 2814

Buffer Hits 99 % Primary Hits 99 % Alternate Hits 0 %

8 Servers, 24 Users (5 Local, 19 Remote, 9 Batch), 0 Apws

Posted by bremmeyr on 21-Nov-2017 12:16

11/21/17 Activity: Performance Indicators

10:15:28 08/08/17 19:41 to 11/21/17 10:15 (2511 hrs 33 min)

Total Per Min Per Sec

Per Tx

Commits 98955K 672 11.21

1.00

Undos 6138 0 0.00

0.00

Index operations 271070M 1886198 31436.64

2805.08

Record operations 346274M 2409489 40158.15

3583.30

Total o/s i/o 1444471K 9816 163.59

14.60

Total o/s reads 1416032K 9622 160.37

14.31

Total o/s writes 29121304 193 3.22

0.29

Background o/s writes 2121117 14 0.23

0.02

Partial log writes 856362 6 0.09

0.01

Database extends 118336 1 0.01

0.00

Total waits 23008 0 0.00

0.00

Lock waits 125 0 0.00

0.00

Resource waits 22883 0 0.00

0.00

Latch timeouts 262567K 1784 29.74

2.65

Buffer pool hit rate: 99 % Primary pool hit rate: 99 % Alternate pool

hit rate: 0 %

Posted by George Potemkin on 21-Nov-2017 12:18

> 10:05:50 08/08/17 19:41 to 11/21/17 10:05 (2511 hrs 23 min)

Interval is longer than 100 days. It can't help to identify the current bottleneck.

Posted by bremmeyr on 21-Nov-2017 12:22

Is there a way to refresh the counters without a DB stop and start?

Posted by George Potemkin on 21-Nov-2017 12:25

Enter <return>, A, L, R, S, U, Z, P, T, or X (? for help): ?

Entry    Action taken

A        Activate auto repeat mode. The current display then restarts
         after the number of seconds specified by the current display
         pause time. Stop after the number of times specified by the
         auto repeat count or when you press control-c.
S        Sample activity counters for the number of seconds specified
         by the sampling interval. The current display then restarts.
         The data shown include only activity which occurred during the
         sample interval.
U        Update activity counters. The current display then restarts.
         The data shown include changes which occurred since the initial
         set of data was collected.
Z        Zero the activity counters so updates show changes from now.

I prefer: Z, wait a few seconds, U

Posted by bremmeyr on 21-Nov-2017 12:42

♀11/21/17 Activity: Performance Indicators

10:40:55 11/21/17 10:31 to 11/21/17 10:31 (9 sec)

Total Per Min Per Sec

Per Tx

Commits 0 0 0.00

0.00

Undos 0 0 0.00

0.00

Index operations 953316 6355440 105924.00

0.00

Record operations 958636 6390907 106515.11

0.00

Total o/s i/o 497 3313 55.22

0.00

Total o/s reads 497 3313 55.22

0.00

Total o/s writes 0 0 0.00

0.00

Background o/s writes 0 0 0.00

0.00

Partial log writes 0 0 0.00

0.00

Database extends 0 0 0.00

0.00

Total waits 0 0 0.00

0.00

Lock waits 0 0 0.00

0.00

Resource waits 0 0 0.00

0.00

Latch timeouts 800 5333 88.89

0.00

Buffer pool hit rate: 99 % Primary pool hit rate: 99 % Alternate pool

hit rate: 0 %

Posted by George Potemkin on 21-Nov-2017 12:49

Latch timeouts is 88.89 per sec. I suspect it's mainly MTX naps.

Check promon/R&D/debghb/6/11. Latch Counts

Update: no commits/undoes per 9 seconds! It's unlikely that MTX latch was used at all. What latches have non-zero naps?

Posted by bremmeyr on 21-Nov-2017 12:53

♀11/21/17 Activity: Latch Counts

10:52:34 08/08/17 19:41 to 11/21/17 10:51 (2512 hrs 9 min)

----- Locks ----- ------ Busy ------ Naps -

--------- Spins ----------- ----- Nap Max -----

Owner Total /Sec /Sec Pct /Sec /

Sec /Lock /Busy Total HWM

MTX -- 536071432 59 0 0.0 0

0 0 0 0 0

USR -- 43428375 4 0 0.0 0

0 0 0 0 0

OM -- 310790521764 34365 0 0.0 4

0 0 0 0 0

BIB -- 443499238 49 0 0.0 0

0 0 0 0 0

SCH -- 1466550 0 0 0.0 0

0 0 0 0 0

LKP -- 11168141 1 0 0.0 0

0 0 0 0 0

GST -- 108999 0 0 0.0 0

0 0 0 0 0

TXT -- 413323790 45 0 0.0 0

0 0 0 0 0

LKT -- 39722330474 4392 0 0.0 0

0 0 0 0 0

LKT -- 39711588891 4391 0 0.0 0

0 0 0 0 0

LKT -- 39719111353 4391 0 0.0 0

0 0 0 0 0

LKT -- 39696930662 4389 0 0.0 0

0 0 0 0 0

SEQ -- 4319118 0 0 0.0 0

0 0 0 0 0

AIB -- 442376741 48 0 0.0 0

0 0 0 0 0

TXQ -- 586187429 64 0 0.0 0

0 0 0 0 0

Posted by bremmeyr on 21-Nov-2017 13:01

♀11/21/17 Activity: Latch Counts

10:56:29 08/08/17 19:41 to 11/21/17 10:51 (2512 hrs 9 min)

----- Locks ----- ------ Busy ------ Naps -

--------- Spins ----------- ----- Nap Max -----

Owner Total /Sec /Sec Pct /Sec /

Sec /Lock /Busy Total HWM

MTX -- 536071432 59 0 0.0 0

0 0 0 0 0

USR -- 43428375 4 0 0.0 0

0 0 0 0 0

OM -- 310790521764 34365 0 0.0 4

0 0 0 0 0

BIB -- 443499238 49 0 0.0 0

0 0 0 0 0

SCH -- 1466550 0 0 0.0 0

0 0 0 0 0

LKP -- 11168141 1 0 0.0 0

0 0 0 0 0

GST -- 108999 0 0 0.0 0

0 0 0 0 0

TXT -- 413323790 45 0 0.0 0

0 0 0 0 0

LKT -- 39722330474 4392 0 0.0 0

0 0 0 0 0

LKT -- 39711588891 4391 0 0.0 0

0 0 0 0 0

LKT -- 39719111353 4391 0 0.0 0

0 0 0 0 0

LKT -- 39696930662 4389 0 0.0 0

0 0 0 0 0

SEQ -- 4319118 0 0 0.0 0

0 0 0 0 0

AIB -- 442376741 48 0 0.0 0

0 0 0 0 0

TXQ -- 586187429 64 0 0.0 0

0 0 0 0 0

Enter <return> for more, A, L, R, S, U, Z, P, T, or X (? for help):

♀11/21/17 Activity: Latch Counts

10:58:44 08/08/17 19:41 to 11/21/17 10:51 (2512 hrs 9 min)

----- Locks ----- ------ Busy ------ Naps -

--------- Spins ----------- ----- Nap Max -----

Owner Total /Sec /Sec Pct /Sec /

Sec /Lock /Busy Total HWM

EC -- 0 0 0 0.0 0

0 0 0 0 0

LKF -- 158527987901 17529 0 0.0 0

0 0 0 0 0

BFP -- 758422154 83 0 0.0 0

0 0 0 0 0

BHT -- 802039844902 88684 0 0.0 0

0 0 0 0 0

PWQ -- 0 0 0 0.0 0

0 0 0 0 0

CPQ -- 34543367 3 0 0.0 0

0 0 0 0 0

LRU -- 799182187673 88368 0 0.0 24

0 0 0 0 0

LRU -- 0 0 0 0.0 0

0 0 0 0 0

BUF -- 397460116541 43948 0 0.0 0

0 0 0 0 0

BUF -- 395255999673 43704 0 0.0 0

0 0 0 0 0

BUF -- 414575552839 45841 0 0.0 0

0 0 0 0 0

BUF -- 398311657901 44042 0 0.0 0

0 0 0 0 0

INC -- 0 0 0 0.0 0

0 0 0 0 0

L29 -- 0 0 0 0.0 0

0 0 0 0 0

L30 -- 0 0 0 0.0 0

0 0 0 0 0

L31 -- 0 0 0 0.0 0

0 0 0 0 0

Posted by George Potemkin on 21-Nov-2017 13:02

You can create a short sampling interval using any "Activity" screen and all "Activity" screens will report statistics exactly for the same interval. It's a good idea to post the different statistics for the same sampling interval.

Posted by bremmeyr on 21-Nov-2017 13:13

LRU and OM are non-zero naps.

Posted by bremmeyr on 21-Nov-2017 13:15

I am feeling like a promon newbie. I am not sure what you saying with this comment "You can create a short sampling interval using any "Activity" screen and all "Activity" screens will report statistics exactly for the same interval."

Posted by bremmeyr on 21-Nov-2017 13:27

Perhaps add -lruskips 25 ? Do you expect that would have an impact on getting the connections showing as pending?

Posted by George Potemkin on 21-Nov-2017 13:28

On any Activity screen do: Z, wait a few second, U

Then copy the screen and go to other screens to copy statistics. Do not update interval on these screens.

It would suggest to post data for:

t/2/1. Activity: Summary

t/3/1. Activity: Performance Indicators

t/debghb/6/11. Activity: Latch Counts

t/1/3. Status: Servers

Posted by bremmeyr on 21-Nov-2017 13:44

♀11/21/17 Activity: Summary

11:41:12 11/21/17 11:41 to 11/21/17 11:41 (10 sec)

Event Total Per Sec |Event Total Per Sec

Commits 17 1.7 |DB Reads 220 22.0

Undos 0 0.0 |DB Writes 0 0.0

Record Reads 398316 39831.6 |BI Reads 0 0.0

Record Updates 17 1.7 |BI Writes 3 0.3

Record Creates 0 0.0 |AI Writes 3 0.3

Record Deletes 0 0.0 |Checkpoints 0 0.0

Record Locks 51 5.1 |Flushed at chkpt 0 0.0

Record Waits 0 0.0 |Active trans 0

Rec Lock Waits 0 % BI Buf Waits 0 % AI Buf Waits 0 %

Writes by APW 0 % Writes by BIW 0 % Writes by AIW 0 %

DB Size: 14 GB BI Size: 1263 MB AI Size: 1456 K

Empty blocks: 91385 Free blocks: 41426 RM chain: 2832

Buffer Hits 99 % Primary Hits 99 % Alternate Hits 0 %

8 Servers, 22 Users (4 Local, 18 Remote, 8 Batch), 0 Apws

♀11/21/17 Activity: Performance Indicators

11:42:35 11/21/17 11:41 to 11/21/17 11:41 (10 sec)

Total Per Min Per Sec

Per Tx

Commits 17 102 1.70

1.00

Undos 0 0 0.00

0.00

Index operations 389515 2337090 38951.50

22912.65

Record operations 398333 2389998 39833.30

23431.35

Total o/s i/o 226 1356 22.60

13.29

Total o/s reads 220 1320 22.00

12.94

Total o/s writes 6 36 0.60

0.35

Background o/s writes 6 36 0.60

0.35

Partial log writes 6 36 0.60

0.35

Database extends 0 0 0.00

0.00

Total waits 0 0 0.00

0.00

Lock waits 0 0 0.00

0.00

Resource waits 0 0 0.00

0.00

Latch timeouts 1888 11328 188.80

111.06

Buffer pool hit rate: 99 % Primary pool hit rate: 99 % Alternate pool

hit rate: 0 %

♀11/21/17 Activity: Latch Counts

11:43:29 11/21/17 11:41 to 11/21/17 11:41 (10 sec)

----- Locks ----- ------ Busy ------ Naps -

--------- Spins ----------- ----- Nap Max -----

Owner Total /Sec /Sec Pct /Sec /

Sec /Lock /Busy Total HWM

MTX -- 71 7 0 0.0 0

0 0 0 0 0

USR -- 45 4 0 0.0 0

0 0 0 0 0

OM -- 466061 46606 0 0.0 36

0 0 0 0 0

BIB -- 60 6 0 0.0 0

0 0 0 0 0

SCH -- 0 0 0 0.0 0

0 0 0 0 0

LKP -- 0 0 0 0.0 0

0 0 0 0 0

GST -- 0 0 0 0.0 0

0 0 0 0 0

TXT -- 68 6 0 0.0 0

0 0 0 0 0

LKT -- 72 7 0 0.0 0

0 0 0 0 0

LKT -- 43 4 0 0.0 0

0 0 0 0 0

LKT -- 74 7 0 0.0 0

0 0 0 0 0

LKT -- 32 3 0 0.0 0

0 0 0 0 0

SEQ -- 0 0 0 0.0 0

0 0 0 0 0

AIB -- 60 6 0 0.0 0

0 0 0 0 0

TXQ -- 70 7 0 0.0 0

0 0 0 0 0

Enter <return> for more, A, L, R, S, U, Z, P, T, or X (? for help):

♀11/21/17 Activity: Latch Counts

11:43:31 11/21/17 11:41 to 11/21/17 11:41 (10 sec)

----- Locks ----- ------ Busy ------ Naps -

--------- Spins ----------- ----- Nap Max -----

Owner Total /Sec /Sec Pct /Sec /

Sec /Lock /Busy Total HWM

EC -- 0 0 0 0.0 0

0 0 0 0 0

LKF -- 68 6 0 0.0 0

0 0 0 0 0

BFP -- 0 0 0 0.0 0

0 0 0 0 0

BHT -- 906599 90659 0 0.0 0

0 0 0 0 0

PWQ -- 0 0 0 0.0 0

0 0 0 0 0

CPQ -- 0 0 0 0.0 0

0 0 0 0 0

LRU -- 906403 90640 0 0.0 148

0 0 0 0 0

LRU -- 0 0 0 0.0 0

0 0 0 0 0

BUF -- 447032 44703 0 0.0 2

0 0 0 0 0

BUF -- 451183 45118 0 0.0 0

0 0 0 0 0

BUF -- 401535 40153 0 0.0 0

0 0 0 0 0

BUF -- 514389 51438 0 0.0 0

0 0 0 0 0

INC -- 0 0 0 0.0 0

0 0 0 0 0

L29 -- 0 0 0 0.0 0

0 0 0 0 0

L30 -- 0 0 0 0.0 0

0 0 0 0 0

L31 -- 0 0 0 0.0 0

0 0 0 0 0

♀11/21/17 Status: Servers

11:44:19

Sv Pend. Cur. Max. Port

No Pid Type Protocol Logins Users Users Users Num

0 5484 Login TCP 0 0 0 2 2021

1 16956 Auto TCP 1 0 1 2 4002

2 7256 Login TCP 20343 0 0 6 2020

3 8132 Auto TCP 2091 4 2 6 4100

4 6276 Auto TCP 3764 4 2 6 4101

5 6116 Auto TCP 2128 4 2 6 4102

6 18496 Auto TCP 790 0 1 2 4006

7 4848 Auto TCP 6079 2 4 6 4103

8 8424 Auto TCP 2067 3 2 6 4104

9 7772 Auto TCP 2069 5 0 6 4105

10 7472 Auto TCP 2120 3 2 6 4106

11 13372 Auto TCP 105 0 1 2 4003

12 14000 Auto TCP 600 0 1 2 4004

13 17196 Auto TCP 461 0 0 2 4005

Posted by George Potemkin on 21-Nov-2017 14:00

To eliminate the latch timeouts set the -omsize equal to the number of the objects in your database (or just count the _StorageObject records) and the -lruskip 100.

But in your case the pending connections are not caused by those latch timeouts. Their root case is probably outside Progress.

Posted by George Potemkin on 21-Nov-2017 14:06

10:05:50        08/08/17 19:41 to 11/21/17 10:05 (2511 hrs 23 min)
Event                  Total  Per Sec
DB Writes          18149892       2.0
BI Writes           5625586       0.6
AI Writes           5345318       0.6

Writes by APW      0 %    Writes by BIW     0 %    Writes by AIW     0 %

You should run APW, BIW and AIW.

Posted by bremmeyr on 21-Nov-2017 14:11

Thank you George for all your feedback and input.

Posted by George Potemkin on 22-Nov-2017 08:20

Mark, in your case the thousands clients successfully used the server's ports (4003-4006, 4100-4106) during 100+ days. I guess you started getting the pending users only recently - you can check the errors in db log:

Server <n> has <n> unresolved pending connections(s). Please check port <port>. (12454)

Clearing pending connections from server <n>. (12455)

Also you have the pending users only for the servers spawn by your secondary login broker (ports 4100-4106)

As a workaround/troubleshooting in future I would suggest the following:

1. Start primary database broker with -Mn twice higher than the current value.

2. Use the separate login brokers. You seem to start the primary database broker as a login broker and one secondary login broker. My suggestion: no -S parameter for the primary database broker and two secondary login brokers (with their own -minport/-maxport as you do now).

3. When you will face with the pending users again stop an affected secondary login broker without stopping the database. Then start it again on the same port but with new port range (new -minport/-maxport). The existent clients will be still served by old servers. New clients will be connected to the new servers on new ports.

4. Stop old servers when all their clients will be disconnected. So you will free the slots in the -Mn table and you will be able to use them to restart a login broker again in the future when/if needed.

Posted by bremmeyr on 22-Nov-2017 16:26

Thank you very much.

This thread is closed

All Replies