Network Database Connection Pause using VMWare - vmxnet3 vs. e1000

Posted by Rob Mylett on 24-Sep-2018 07:33

We have configured new Linux hosts to run in a VMWare environment. We have 1 VM running on a single ESX host. The HP Linux server has 4 sockets with 10 cores per socket, for a total of 40 cores.

The test is to connect remotely (-db, -H, -S) to a database running on the new server, in a loop; each connection should complete in less than 50 milliseconds. All the test does is a native Progress ABL CONNECT with -db (dbname), -H (IP address) and -S (port number), recording the elapsed time before and after the connection.
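
The driving loop looks roughly like this (a sketch only; conntest.p is a hypothetical batch program that does the ABL CONNECT with those parameters, prints the elapsed ETIME in milliseconds, and then DISCONNECTs):

# run the hypothetical conntest.p repeatedly and collect the per-connection timings
$ for i in $(seq 1 1000); do $DLC/bin/_progres -b -p conntest.p >> connect_times.log; done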

When we configure the VM with 1 socket and 8 CPUs, for a total of 8 cores, the database behaves normally and most connections complete within 25-35 milliseconds.

When we change the configuration to 4 sockets with 8 CPUs each, for a total of 32, the same exact test causes the connection to pause for varying lengths of time, from 8 seconds to sometimes over 2 minutes.

Working with VMWare and Progress on this issue, we agreed to try a different network driver. We had been using vmxnet3 and switched to e1000. The e1000 driver worked and eliminated the pause completely.

Progress has since told us that we should be using vmxnet3, and VMWare also states that the e1000 driver is old and that we should be using vmxnet3.

So my question: has anyone run into this issue? Is anyone running VMWare with the vmxnet3 network driver over multiple sockets? Has anyone tested remote database connections to see whether there are any pauses when connecting?

Thanks.

All Replies

Posted by Paul Koufalis on 24-Sep-2018 08:07

Hi Rob,

This sounds suspiciously like a KB entry that was distributed via the PANS late last week.

I agree with VMWare/PSC that you should not be using E1000 over vmxnet3. I suspect that if you do further benchmarking you'll see that the E1000 is significantly slower than the vmxnet3; we certainly have in our testing.

Your comments regarding 1 socket vs. multiple sockets suggest that this is a NUMA issue within the hypervisor. Do simple ping tests show similar results? I.e., if you run ping for an hour or two on the 32-core VM, do you see any variations in the ping times?
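
For example, something like this logs only the slow round-trips with a timestamp (a rough sketch assuming GNU awk; the IP and the 100 ms threshold are placeholders):

# print a timestamped line only when the ping round-trip exceeds 100 ms
$ ping -i 1 <vm-ip> | awk -F'time=' '/time=/ { if ($2 + 0 > 100) print strftime("%F %T"), $0; fflush() }'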

Does the 4-socket/32-core configuration represent the entire physical capacity of the physical server?

Did you configure the hypervisor with vCPU = # of actual cores, or # of hyperthreads? I often see hypervisor configurations where the number of vCPUs = # of hyperthreads = 2 x # of cores.

Perhaps the better question is: how many physical CPUs and cores, how many NUMA nodes, and how many vCPUs are configured in ESXi?
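
On the Linux guest, lscpu summarizes most of that in one shot (a sketch; the ESXi-side vCPU topology still has to come from the vSphere client):

# sockets, cores per socket, threads per core and NUMA nodes as the guest sees them
$ lscpu | egrep 'Socket|Core|Thread|NUMA'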

What is the scheduler being used? Example:

$ cat /sys/block/sda/queue/scheduler

noop [deadline] cfq

What is the output of numactl -H on the 32 core VM?

numactl also allows you to bind a process and its memory to a single core or set of cores. I would be curious to see if the results are different if you tie the broker, its memory and the single _mprosrv -m1 to a single processor.
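
For example (an untested sketch; the db name, port and core list are placeholders):

# start the broker with its CPU and memory bound to NUMA node 0
$ numactl --cpunodebind=0 --membind=0 proserve <dbname> -S <port> ...

# or pin an already-running _mprosrv to cores 0-7 after the fact
$ taskset -cp 0-7 <pid>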

How many _mprosrv -m1 processes are running and being used by your tests? I would think just one, but want to be certain.

Are you checking if the broker and presumably single _mprosrv -m1 process are bouncing around cores? The example below shows that my _mprosrv processes are bound to either CPU 0 or 1 (the PSR column).

$ for i in $(pgrep _mprosrv);do ps -mo pid,tid,fname,user,psr -p $i;done

 PID   TID COMMAND  USER     PSR
10244     - _mprosrv root       -
   - 10244 -        root       1
 PID   TID COMMAND  USER     PSR
10274     - _mprosrv root       -
   - 10274 -        root       0
 PID   TID COMMAND  USER     PSR
10278     - _mprosrv root       -
   - 10278 -        root       1
 PID   TID COMMAND  USER     PSR
10282     - _mprosrv root       -
   - 10282 -        root       1
 PID   TID COMMAND  USER     PSR
10285     - _mprosrv root       -
   - 10285 -        root       0
 PID   TID COMMAND  USER     PSR
10313     - _mprosrv root       -
   - 10313 -        root       1
 PID   TID COMMAND  USER     PSR
10336     - _mprosrv root       -
   - 10336 -        root       0
 PID   TID COMMAND  USER     PSR
10366     - _mprosrv root       -
   - 10366 -        root       1
 PID   TID COMMAND  USER     PSR
10369     - _mprosrv root       -
   - 10369 -        root       1
 PID   TID COMMAND  USER     PSR
10397     - _mprosrv root       -
   - 10397 -        root       1
 PID   TID COMMAND  USER     PSR
10424     - _mprosrv root       -
   - 10424 -        root       0
 PID   TID COMMAND  USER     PSR
10937     - _mprosrv root       -
   - 10937 -        root       1
 PID   TID COMMAND  USER     PSR
10994     - _mprosrv root       -
   - 10994 -        root       1
 PID   TID COMMAND  USER     PSR
16326     - _mprosrv root       -
   - 16326 -        root       1
 PID   TID COMMAND  USER     PSR
34287     - _mprosrv root       -
   - 34287 -        root       1

Posted by Libor Laubacher on 24-Sep-2018 08:31

You did not mention your vSphere/ESX version or the version of VM Tools; there have been recent vmxnet3 issues reported (in general). Pardon me for being blunt, but why would you need 32 CPUs on a single box? Aside from this setting engaging NUMA for the VM, you might be running into kernel scheduling contention on the host. You might want to try assigning 4 NICs to this VM, binding each NIC to one NUMA node and bonding them.
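
As a quick check, Linux exposes the NUMA node a NIC's PCI device is attached to (a sketch; ens192 is a placeholder interface name, and inside a VM this may simply report -1):

# -1 means no NUMA affinity is reported for that device
$ cat /sys/class/net/ens192/device/numa_node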

Posted by Rob Mylett on 24-Sep-2018 09:45

Scheduler:

cat /sys/block/sda/queue/scheduler
[noop] deadline cfq

numactl -H
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 0 size: 491519 MB
node 0 free: 397671 MB
node distances:
node   0
 0:  10

The physical server has 4 sockets, 10 cores each, with HT enabled. The VM has been assigned 4 sockets with 8 cores each, and it still shows 1 NUMA node.

Posted by Rob Mylett on 24-Sep-2018 09:48

We will be running 60 OE databases on this server, and that is why we need 32 CPUs. That is not uncommon for us. We have 4 servers set up in this configuration to run 200 databases, with about 5,000 user connections.

Posted by Rob Mylett on 24-Sep-2018 09:55

proserve -db test -S 25000 -minport 20000 -maxport 29999 -B 25000 -Mn 25 -Mpb 20 -Ma 20 -Mi 1

This server is very idle right now as this is the only functioning process on this server.

When I do the test, it always goes to the same PID, as I can see in the db.lg. This is a copy of the sports2000 database, so it's very isolated, but I can have up to 20 server processes.

When I run ps -mo pid,fname,user,psr -p (pid) against that server process in a loop every 3 seconds, I can see the process assigned to different processors: 8, 9, 11, 12, 14, etc.
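
Something along these lines reproduces that check (a sketch; the PID is whatever the db.lg shows for that server process):

# refresh the PSR column every 3 seconds for the server process in question
$ watch -n 3 "ps -mo pid,fname,user,psr -p <pid>"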

Also, I want to add: when I run strace against the process that my test connects to, the pause does not happen, and when I terminate the strace the pause starts again. Very odd.

Posted by Paul Koufalis on 24-Sep-2018 12:18

Can you please answer Libor's questions: ESX version and VMTools version? He mentioned some known vmxnet3 issues.

Posted by Libor Laubacher on 24-Sep-2018 12:22

On top of that, what is the HW brand in question? If that's confidential, then what has the BIOS NUMA node interleaving setting been set to?

Posted by Paul Koufalis on 24-Sep-2018 12:32

The Linux view of a single NUMA node configuration is suspicious. Clearly you have more than one NUMA node.

Check out this article:

www.opvizor.com/decoupling-of-cores-per-socket-from-virtual-numa-topology-in-vsphere-6-5

Posted by Rob Mylett on 24-Sep-2018 13:45

HP DL560 Gen10

ESXi 6.0

open-vm-tools-10.1.5-3.el7.x86_64

Posted by Brian K. Maher on 25-Sep-2018 05:35

Libor & Paul,

Please note that during testing Rob did reduce the memory down to 128 GB and lower such that numactl showed a single node.  The performance problem was still present.

Rob,

I found this via Google & wonder if it applies ... kb.vmware.com/.../2129176

Posted by Brian K. Maher on 25-Sep-2018 05:43

Libor & Paul,

I did a lot of testing on our VCloud Director environment using the following config:

- CentOS 7.5, memory varied between 64 and 480 GB, virtual sockets varied between 1 and 4, virtual cpus varied between 8 and 32, OpenEdge 11.5.1.

- Windows 10 64 bit client, 1 virtual cpu, 4 GB memory, OpenEdge 11.5.1 / 11.6 / 11.7.

- CentOS 7.5, 1 virtual cpu, 4 GB memory, OpenEdge 11.5.1.

I could see the problem very, very slightly when using Windows 10 as the client and the large CentOS VM as the server, but it was random and the delays were much smaller than what Rob sees.

Using CentOS 7.5 for both client and server did not show the problem.

Let me know if you want to know anything else we did during the testing.

Posted by Paul Koufalis on 25-Sep-2018 07:17

One of the first things to try is ESXi 6.5 or later. I know, I know, easy for me to say over here in my office. But you are testing a vmxnet3 issue on a 3-year-old version of ESXi, and Libor mentioned that there are known vmxnet3 issues that have been corrected since then.

Posted by Rob Mylett on 25-Sep-2018 07:25

We tried disabling LSO/RSC and powered the VM off and on.

The issue was still present; we are still seeing lengthy pauses:

Connect Time (ms)   Disconnect Time (ms)
               16                      0
               15                      0
               17                      0
               16                      0
               17                      0
               16                      0
               16                      0
           19,802                      0
               17                      0
               17                      0
               16                      0
               16                      0
               17                      0
           47,292                      0
               17                      0
               16                      0
               16                      0
               17                      0
               17                      0
               17                      0
               17                      0
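
If the LSO/RSC change was made on the vSphere side, the guest-side offload state can still be double-checked (and toggled) with ethtool, along these lines (a sketch; ens192 is a placeholder interface name):

# show the current offload settings on the vNIC
$ ethtool -k ens192 | egrep 'tcp-segmentation-offload|generic-receive-offload|large-receive-offload'

# disable them in the guest as well
$ ethtool -K ens192 tso off gro off lro off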

Posted by Rob Mylett on 25-Sep-2018 08:17

We are going to try the ESXi 6.5 version and we will test again...

Posted by Libor Laubacher on 25-Sep-2018 10:31

> Please note that during testing Rob did reduce the memory down to 128 GB and lower such that numactl showed a single node

IMO, 32 vCPUs won't fit into a single NUMA node; they will span across 4 (or 3 if HyperThreading is on). It's possible that the node interleaving feature is enabled (and it should not be), which would explain the single-node numactl output, but that information is not available in this thread.
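
If someone has access to the ESXi host, one way to sanity-check this from the hypervisor side is (a sketch; exact output varies by ESXi build):

# a NUMA Node Count of 1 on a 4-socket box usually means BIOS node interleaving is enabled
$ esxcli hardware memory get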

I am assuming that the C/S test is done from Windows and that Windows is also a VM, though that is not explicitly stated here. I would try the same C/S stress test (in Rob's environment) from Linux to further clarify where the hiccups reside, as well as getting the latest VM Tools (10.3.2) from the vmware.com site - my.vmware.com/.../details

> We are going to try the ESXi 6.5 version and we will test again.

Why not 6.7 ?

Posted by Libor Laubacher on 25-Sep-2018 11:40

I forgot to ask this, but how does/did VMware explain the difference between vmxnet3 and e1000, and what did they say about the cause of vmxnet3 being 'slow on connect'?

Posted by Brian K. Maher on 25-Sep-2018 11:44

Libor,
 
They said it was an application problem.
 
Brian Maher
Principal Engineer, Technical Support
Progress

Posted by Paul Koufalis on 25-Sep-2018 11:47

hahahahahaha!!! Classic! Same application does exactly the same thing. Only diff is a vmware change. Of course they said it was an application problem.

Posted by Libor Laubacher on 25-Sep-2018 12:03

> They said it was an application problem.

And that's IT ? And that's been accepted ??

Posted by Brian K. Maher on 25-Sep-2018 12:55

Basically, yes.

Their mantra is that they have no control over or exposure to what happens inside a VM and that we need to prove to them that we are not the problem.

Posted by Libor Laubacher on 25-Sep-2018 13:54

I think the fact that switching the adapters makes a difference has already proved it, but that does not help Rob here (I mean, if we started acting like them, them being VMware, and pointing fingers instead of actually trying to figure something out). I suppose the proof would be to install Linux on the same bare metal where the ESX host is and redo the same test. I kicked off the connect/disconnect code 2 hours ago and am leaving it overnight, but in my previous (albeit quick) tests using ESX 6.0 and 6.7 I saw no hiccups.

Posted by Rob Mylett on 26-Sep-2018 08:20

We discussed 6.5 vs. 6.7 and we decided to go to 6.5. We rarely go bleeding edge...

So we installed 6.5 and did the test with vmxnet3, and it worked flawlessly... The issue is gone.

Posted by Libor Laubacher on 26-Sep-2018 08:32

Thanks for the update, Rob. I have run 12 hours of your code against 6.7 - no sign of the problem. The same test will run against 6.0 overnight later today; it is only CPU time after all :)

I would be very interested to know what VMware does/would say about 6.5 correcting the issue :)

Posted by Paul Koufalis on 26-Sep-2018 08:34

<BIG SMILE> !!

Great news Rob. Not just for you, but for all of us out here who support customers with these configurations.

The next thing to do is download ProTop and run some of the built-in benchmark tests to compare your new server to your old server. Better yet, install ProTop on the old server so that we can gather some historical metrics. That way when you go live, we can quickly and easily compare before and after.

Ping me offline if you're interested.

Posted by Libor Laubacher on 26-Sep-2018 08:36

Sigh, Paul. And such a nice technical thread this was... :p :-D

Posted by Paul Koufalis on 26-Sep-2018 08:41

Hey hey hey! The benchmark tests are free. And short-term historical metrics are free, too. You only have to pay if you want access to alerting and long-term historical metrics.

Posted by Rob Mylett on 26-Sep-2018 08:42

I am asking that question to them...   I will post the reply.

Posted by Brian K. Maher on 26-Sep-2018 08:52

Hi Rob, are we now agreed that this wasn't a Progress issue?

This thread is closed