Explaining VMWare CPU Performance

Posted by Paul Koufalis on 19-Jun-2015 04:13

How do you explain a CPU benchmark that is slower on a virtual server versus a laptop PC? I'm looking for real explanations or ideas on what to look at and not vague references to host-system overhead and thin provisioning (sorry - I know I can be a less-than-nice person sometimes). I know there is a cost related to the HV but I don't know how to quantify it, especially when the host CPUs are running at 50%.

VMWare ESX 5.x on an HP Blade with 2 Xeon E5-2640 (2 CPU, 6 cores each + H/T). My Windows 2008R2 VM has 2 vCPU out of 24 and there are 22 VMs running on the box. esxtop shows that the CPUs are fairly quiet during my benchmark (approx 50% CPU usage) but my Windows VM is at 100% CPU - right where I want it. The test was run on the production box in the evening so really there was very little activity on any of the VMs.

Anti-virus was configured to ignore my benchmark directory and my test doesn't do any disk I/O anyways. The benchmark result was 21% slower on the VM versus the laptop (core i7). CPUMark says the Xeon should be 3X faster than my Core i7.

The only thing I can think of is that my laptop has 4 logical processors (1 CPU = 2 Core = 4 H/T) versus 2 LCPU for the server. Unfortunately I am not educated enough to understand how hyperthreading really works in this situation. Sorry I did not check context switching and sys cpu% during the benchmark runs.

The benchmark used was White Star Software's ReadProbe. RP loads the entire sports DB into memory then tries to read it as fast as it can, steadily increasing the number of clients (shared memory in this case). In the strict sense it is not a pure CPU benchmark but it is a more realistic evaluator of CPUs in an OpenEdge environment.

All Replies

Posted by Libor Laubacher on 19-Jun-2015 06:08

> sorry - I know I can be a less-than-nice person sometimes

 
I bet nobody has ever accused you of being nice, eh ? :-D
 
Few ideas:
 

1) HP blade servers have “Dynamic power saving” enabled in the BIOS by default. How is that set on your/their HP Blade? If it is left at the default and, as you said, the ESX host was not loaded, then the blade will “apply” power saving and the CPUs will run at a lower frequency; the load from a single VM might not be ‘strong’ enough to ‘wake’ them.

 

2) A server with more CPUs (your laptop) is able to “queue” more load/threads because it has more CPUs to use, and will beat fewer, albeit faster, CPUs.

 

If you want (and I am interested myself as well), I have 4 identical (to a dot) HP blade servers at my disposal, so I can re-run ReadProbe with one server on bare metal and one server running ESX/VM, so we would actually be comparing apples to apples here. One question tho – does it have to be Windows?
 

Posted by ChUIMonster on 19-Jun-2015 06:27

You have 4 identical servers… it would be fascinating to test:

1) Bare metal Linux
2) Linux in a VM
3) Bare metal Windows
4) Windows in a VM




Posted by Libor Laubacher on 19-Jun-2015 06:46

I was told I need to learn to keep my mouth quiet sometimes.
 
Okay then.
 

Posted by Paul Koufalis on 19-Jun-2015 07:20

Like that's never happened before!

I'll send you the benchmark offline.

Paul

Posted by ChUIMonster on 19-Jun-2015 07:20

It sounds like the VM is set up to cap the logical CPU power at 2. I'm not familiar with the gory details of vmware configuration, but other virtualization technologies (such as AIX) will let you exceed what is allocated (the "entitlement") if it is available (IOW if demand is low, as you claim that it is) and if you have allowed that sort of thing to happen.

In this case it sounds like vmware is not permitting you to go beyond 2 logical cpus.  IOW a hard limit on logical CPUs is stopping you.

I am fairly certain that a "logical cpu" is equivalent to a "core".  Not a "cpu".  I seem to recall that vmware also abstracts the clock speed -- so a logical cpu might not even be a full "native" core.  There would not be any hyper threading allowed or accounted for.

So to match or exceed your laptop you probably need the vm to have 4 virtual cpus at full speed.

If it were an "apples to apples" configuration with equivalent resources available (i.e. the tests that we hope Libor will run)  I would not expect the *hypervisor* to have much of an impact.  Low single digit percentages is my guess.  Pending the outcome of actual testing ;)  So if you are seeing bigger differences (as you are) the issue probably lies elsewhere -- like in the configuration not providing the resources that you think you are getting.

Posted by Paul Koufalis on 19-Jun-2015 07:30

A logical CPU is equal to half a core or one hyperthread. Hence there are 24 vCPU available (2 CPU X 6 Core X 2 HT).
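As a rough sketch of that arithmetic (Python; the helper name `logical_cpus` is hypothetical, and the socket, core, and thread counts are the ones quoted in this thread):

```python
# Hypothetical helper: counts logical CPUs (hardware threads) on a machine.
def logical_cpus(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    return sockets * cores_per_socket * threads_per_core

# Host blade: 2 x Xeon E5-2640, 6 cores each, hyperthreading on.
host_lcpus = logical_cpus(sockets=2, cores_per_socket=6, threads_per_core=2)

# Laptop: 1 CPU, 2 cores, hyperthreading on.
laptop_lcpus = logical_cpus(sockets=1, cores_per_socket=2, threads_per_core=2)

# The Windows VM was given 2 vCPUs.
vm_vcpus = 2

print(f"host logical CPUs:   {host_lcpus}")
print(f"laptop logical CPUs: {laptop_lcpus}")
print(f"VM share of host:    {vm_vcpus}/{host_lcpus} = {vm_vcpus / host_lcpus:.1%}")
```

So the benchmark had 4 logical CPUs to work with on the laptop and only 2 on the VM, regardless of how idle the other 22 on the host were.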

I also don't know enough about the gory details of VMWare to know about CPU entitlement.

Libor: I'll check the power thingy...

Posted by Libor Laubacher on 19-Jun-2015 07:37

Yes, knowing the resource limit and allocations would help :-)
If there’s a cap, then that would certainly explain it. And there usually is, as I can’t imagine one should be able to hog the production machine resources on his test machine (even if that person is Paul).

> In this case it sounds like vmware is not permitting you to go beyond 2 logical cpus.  IOW a hard limit on logical CPUs is stopping you.

If you ask for 2 CPUs, you get power of 2 CPUs (whatever that CPU might be). However the hypervisor will serve you that power how it sees fit. It’s possible to set affinity, but that’s not without risks.
 

> (i.e. the tests that we hope Libor will run)

 
Okay okay, the message has been rec’d.
 

Posted by James Palmer on 19-Jun-2015 07:41

All this vmware talk is really interesting. Is there a handy guide out there somewhere on what I need to know? I've been landed with vmware architecture and I'd like to try and understand what's been done well and what hasn't.

Posted by Libor Laubacher on 19-Jun-2015 07:55

Log in to vCenter and look :-)
Unless you have access to see the config, you have to rely on what the admin tells you.
 

Posted by ChUIMonster on 19-Jun-2015 07:55

> If you ask for 2 CPUs, you get power of 2 CPUs (whatever that CPU might be)

"CPUs"?  Or "cores"?

> However the hypervisor will serve you that power how it sees fit.

AIX lets you decide certain things about that.  By default the cores are "shared" and it grabs cycles wherever it can find them.  But you can also set them up as "dedicated" and arrange for affinity and set limits on growth or allow a partition to exceed its entitlement if necessary and if appropriate resources are available.  I would hope that vmware would have similar capabilities -- but I've not dug into that.

Not that I am simply singing the praises of AIX virtualization -- by default it spreads the resources "wide".  For Progress databases it is much better to narrowly focus them.  IOW "dedicated" and specifically keep the CPUs within a single physical package.  999 times out of 100 that is NOT the way that the sys admins will have set it up and changing it is not exactly easy.

Posted by TheMadDBA on 19-Jun-2015 09:35

For vmware, virtual CPUs are cores (or the extra "core" you get from hyperthreading). Hyperthreading/SMT can help under certain workloads to varying degrees, but for most databases you don't really get close to an extra core's worth of work out of it.

Vmware does have concepts similar to AIX's for allocating resources per host. The default is shares, which is basically a ratio of available virtual CPUs (either a low, med, high setting or actual cpus). If you are using the low, med, high share model, performance can suffer.

You can also set minimums and maximums and the affinity that Libor mentioned. You can also control how the hyperthreaded core is used/not used per host.
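To illustrate how shares behave: when the host is contended, CPU time is divided in proportion to each VM's shares. Below is a minimal sketch of that proportional split. It assumes the per-vCPU values of 500/1000/2000 for Low/Normal/High as I recall them from the VMware documentation (verify for your version), and it ignores reservations, limits, and idle VMs. The VM names and the 10,000 MHz capacity are made up for the example.

```python
# Assumed default CPU shares per vCPU for each level (check your vSphere docs).
SHARES_PER_VCPU = {"low": 500, "normal": 1000, "high": 2000}

def contended_split(vms: dict, capacity_mhz: float) -> dict:
    """Split contended CPU capacity in proportion to each VM's total shares.

    vms maps a VM name to (vcpu_count, share_level). This is a simplification:
    it applies only when every VM wants more CPU than it can get.
    """
    weights = {
        name: vcpus * SHARES_PER_VCPU[level]
        for name, (vcpus, level) in vms.items()
    }
    total = sum(weights.values())
    return {name: capacity_mhz * weight / total for name, weight in weights.items()}

# Hypothetical example: a 2-vCPU benchmark VM on "low" competing with
# a 2-vCPU VM on "high" for 10,000 MHz of contended capacity.
split = contended_split({"bench": (2, "low"), "other": (2, "high")}, 10_000.0)
for name, mhz in split.items():
    print(f"{name}: {mhz:.0f} MHz")
```

With these made-up inputs the "low" VM ends up with a fifth of the contended capacity, which is why the share level only matters once the host is actually busy.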

Paul do you have access to the vsphere client to look at the settings?

Posted by Paul Koufalis on 19-Jun-2015 10:10

No I do not have access to vsphere. Must pass through the proper channels. Did I mention that I was in France?

Posted by TheMadDBA on 19-Jun-2015 10:41

Must be nice :-)

The short list of things to have them look at.... (disclaimer: I am assuming your VM is the most important VM).

1) Make sure SpeedStep is turned off... power saving is great until it slows down your CPUs. There are BIOS settings on the host as well as settings within vmware to control if/how SpeedStep is used.

2) Make sure the CPU settings have appropriate minimums and maximums for your host. High shares/guaranteed MHz.

3) By default vmware will hop between cores which is ok for most applications. But if you are trying to maximize the CPU cache you probably want to look into the affinity settings to make sure your VM gets a certain set of cores and the other VMs don't. We did this for our Outlook and Domain controllers. Keep in mind that 0 and even numbers will be the actual cores and odd numbers will be the hyperthreaded virtual core. Also if you don't do this right you can make performance worse.

4) Make sure the memory reservations (min,limit) are set properly to give you exactly the memory you want and to prevent your memory from being shuffled off to other VMs. There are also similar settings for IO.

It would also be interesting to run just some basic operations, like incrementing an integer/decimal, on both the VM and your laptop and compare the results. Basically get any memory access out of the way and just pound the CPU.
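A minimal sketch of such a test, in Python rather than ABL purely for illustration (the function names and iteration counts are arbitrary choices, not part of ReadProbe): time a tight integer-increment loop and keep the best of several runs, then run the same script on the VM and on the laptop.

```python
import time

def increment_loop(iterations: int) -> int:
    """Pure CPU work: increment a counter, with no I/O and almost no memory traffic."""
    counter = 0
    for _ in range(iterations):
        counter += 1
    return counter

def time_increments(iterations: int = 5_000_000, repeats: int = 5) -> float:
    """Return the best (lowest) wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        increment_loop(iterations)
        best = min(best, time.perf_counter() - start)
    return best

if __name__ == "__main__":
    best = time_increments()
    print(f"best of 5 runs: {best:.3f} s for 5,000,000 increments")
```

This is single-threaded, so it exercises only one vCPU at a time. That makes it a check of per-core speed (clock frequency, power saving), not of how many logical CPUs the VM has been given.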

This thread is closed