High CPU consumption after client update (11.7.4) connected

Posted by weberj on 12-Feb-2020 07:24

Hi all,

We currently have a CPU issue on our database server (OE 10.2B08) after updating the clients to version 11.7.4.

As you can see in the following screenshot (from our monitoring system; please note the date format), CPU consumption increased after the update:

The higher consumption is caused by a stateless AppServer, which runs on the database VM, although we haven't increased the number of connected users or changed the code. Aside from the update, we haven't changed anything in the environment.

The database and AppServer run on the same VM (Windows Server 2008 R2 on an ESX host with 2 sockets x 4 CPUs, Intel Xeon Gold 2646).

According to the following articles, a client with a higher major release can connect to a database and AppServers with a lower major release:

https://knowledgebase.progress.com/articles/Article/P125537?popup=true

https://knowledgebase.progress.com/articles/Article/OpenEdge-Connectivity-rules-with-previous-later-versions?popup=true

https://knowledgebase.progress.com/articles/Article/P132?popup=true

Why did we do this? We currently have a migration project to move our servers from Windows Server 2008 to Windows Server 2016. As a first step we are migrating the clients (relying on the documented compatibility between the different versions). After that we will migrate the servers.

Maybe someone has an idea what is causing the CPU consumption and what we can do to reduce it.

Best regards from Germany

Jochen

All Replies

Posted by Tim Hutchens on 12-Feb-2020 12:10

I think I remember at Progress NEXT 2018 Roy Ellis and David Cleary talking about how customers had experienced performance degradation during certain version 11 updates due to Progress fixing some issues that were masking other problems (grossly oversimplifying this). The net effect was that the new AppServers exposed certain types of bad code that were previously masked, requiring some code refactoring.

I don't know if this would apply to your situation, but it may explain some of what you're experiencing. I'd be on the phone with Progress Tech Support, especially since you say there weren't any code or user count changes.

Posted by dbeavon on 12-Feb-2020 15:57

What type of clients are upgraded to 11.7.4?  OpenClients?  .Net/Java?

Can you capture the activity using 4GLTrace, and then replay it in a preproduction environment (using both the configurations)?  It would be helpful if your observations could be verified on another server.

It certainly seems odd that you would have such a high amount of CPU associated with the particular version of the remote openclient.  However the openclients are fairly sophisticated nowadays and can connect to both PASOE and classic.  Maybe there is some additional negotiation work that is happening in the legacy 10.2B agents.

You should probably enable a variety of logging types and try to see if you can correlate CPU and the logs (long delays in the logging with high CPU).   You should be able to modify your logging configuration on-the-fly in "classic" with the setting: allowRuntimeUpdates=1.
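As a rough illustration of the "long delays in the logging" idea, here is a minimal Python sketch that scans log lines for gaps between consecutive timestamps. The timestamp layout (`YYYY-MM-DD HH:MM:SS` in the first 19 characters), the sample lines, and the function name `find_log_gaps` are all assumptions made for the example; real AppServer logs use a different layout, so the parsing would need to be adjusted.

```python
from datetime import datetime, timedelta


def find_log_gaps(lines, min_gap=timedelta(seconds=5),
                  ts_format="%Y-%m-%d %H:%M:%S"):
    """Return (start, end, gap) tuples where two consecutive log entries
    are separated by at least min_gap."""
    gaps = []
    previous = None
    for line in lines:
        # Assumed layout: the first 19 characters hold the timestamp.
        try:
            current = datetime.strptime(line[:19], ts_format)
        except ValueError:
            continue  # skip lines without a parseable timestamp
        if previous is not None and current - previous >= min_gap:
            gaps.append((previous, current, current - previous))
        previous = current
    return gaps


# Hypothetical sample lines, not taken from a real server log.
sample = [
    "2020-02-12 07:00:01 agent 1: request start",
    "2020-02-12 07:00:02 agent 1: request end",
    "2020-02-12 07:00:30 agent 1: request start",
]

for start, end, gap in find_log_gaps(sample):
    print(f"{start} -> {end}: {gap.total_seconds():.0f}s without log output")
```

The intervals it reports can then be compared by hand against the CPU graph from the monitoring system.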

It is probably not worth saying it anymore, but I would have suggested upgrading the server side of things first. Our OE server products are typically running more recent versions of OE than the clients (unless we have a client-specific hotfix). For example we run 11.7.5 for our database server (and appservers), but some of our SQL92 drivers are still using the 10.2B library. It is easier to manage the centralized server products, and so they typically get attended to first. After that we try to "herd" the various types of remote clients. If I had to guess, you may be describing a configuration of the products which even Progress has never tested explicitly.

Did you try any of this in pre-production?  Was there enough pre-production activity to compare the CPU behavior before/after?  Did you also get around to upgrading the *server* side of things yet in pre-production?  Are you comfortable pushing ahead with the upgrade on the *server* side of things, without first getting to the bottom of this particular performance mystery?  At least the "only" problem you have is performance related ... it could have been worse.  Once the server side of things is running a consistent version of OE, it won't be such an unusual configuration.  At that point your tech support case may go more smoothly (if you still have a case).

Posted by weberj on 13-Feb-2020 12:59

Thanks for your response.

To your first question:

We only updated the OpenEdge Graphical client and no others.

To the 4GL-Trace idea:

I have no access to the source code of the AppServer routines. The software we are talking about is an externally developed ERP system. I think capturing the 4GL trace has to be implemented in the AppServer routines, right?

By the way, the vendor of the ERP system suggested updating the clients first and the database server afterwards. Obviously not the best suggestion. I also opened a support case with the vendor's tech service. But in my experience we won't get a qualified answer that solves the problem, so I created this post.

Nevertheless, this combination of different client/server versions is officially supported (as I wrote in my initial post). But maybe you are right that Progress never tested this specifically.

Fact is, we learned our lesson.

Anyway, what I can try is to increase the logging level of the AppServers. Maybe I will get more information from the log files. At the weekend I will restart the database server, hoping this lowers the CPU consumption.

Regarding the pre-production test:

Yes, we had a pre-production test with around twenty client connections. In this test scenario we noticed no problems with the CPU consumption. We are talking about 400 client connections in our production environment.

Our plan is to migrate the database server in three weeks and, as you said, we can live with this problem until then.

But if someone has an idea what is causing this problem or what a possible solution could be, I'd be pleased to hear it.

Best regards

Posted by Paul Koufalis on 13-Feb-2020 13:13

Can you verify that you are running r-code and not p-code? We have seen similar behaviour in the past, and it is one of the reasons why we advocate setting -basetable and -baseindex to negative numbers in order to monitor the Progress system tables like _file and _field. One of the key indicators of this potential cause is seeing massive reads in _File, _field, _index, etc., as the AppServer agents compile and recompile the same code over and over again. Since these tables are almost always in the -B buffer cache, the only visible manifestation at the system level is increased CPU.
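To make the "massive reads" indicator concrete, here is a small Python sketch that compares two snapshots of cumulative per-table read counters and lists the schema tables (names starting with an underscore) whose reads grew the most. The snapshot dictionaries, the threshold, and the function name `schema_read_deltas` are hypothetical; the numbers are invented, and how the counters are collected (ProTop, for example) is not shown.

```python
def schema_read_deltas(before, after, threshold=10_000):
    """Compare two snapshots of cumulative read counters ({table: reads})
    and return (table, delta) pairs for schema tables whose reads grew by
    at least threshold, largest delta first."""
    hot = []
    for table, reads_after in after.items():
        if not table.startswith("_"):
            continue  # only schema tables such as _File or _Field
        delta = reads_after - before.get(table, 0)
        if delta >= threshold:
            hot.append((table, delta))
    return sorted(hot, key=lambda item: item[1], reverse=True)


# Invented counters for two points in time, a few minutes apart.
before = {"_File": 1_000, "_Field": 5_000, "customer": 200_000}
after = {"_File": 90_000, "_Field": 400_000, "customer": 260_000}

for table, delta in schema_read_deltas(before, after):
    print(f"{table}: {delta} reads in interval")
```

Schema-table reads that keep climbing while the application workload is unchanged would be consistent with agents compiling code at run time.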

Warning: the interplay of -base[table | index] and -[table | index]rangesize is not as simple as it may seem when you change -base[table | index] to a negative number. Download ProTop to get suggested values. You can also watch table and index reads in real time with the free ProTop real-time monitor.

Posted by weberj on 18-Feb-2020 06:02

Update to this issue:

I rebooted the DB server last weekend. Since the reboot I haven't noticed any abnormal CPU usage.

So for now, it looks good.

But I can't explain why this behaviour occurred.

Thank you all for the suggestions of possible solutions.

Best regards

Jochen

This thread is closed