kswapd0 running at 99.99%

Posted by Gareth Vincent on 07-Jun-2016 03:58

I've been trying to identify a problem that I am experiencing on one of our CentOS servers. The kswapd0 process runs close to 100% most of the time, causing consistently high %IOWAIT times. The server is running 11.5.1 with appserver connections to the DB. If I run free -m, this is the result, which suggests I have plenty of memory available:

                     total    used    free  shared buffers  cached
Mem:                  9888    9779     108    6509       0    6538
-/+ buffers/cache:            3240    6648
Swap:                 9997    3348    6649

What I have found is that if I run the proadsv -keepservers -stop command, the kswapd0 process goes away and %IOWAIT drops to acceptable levels (under 5%, as opposed to 10-20%). The second I bring the proadsv process back up, performance drops.

Any ideas on how I can tweak the OS or the AdminServerPlugins.properties file to reduce swapping to disk?

All Replies

Posted by gus on 07-Jun-2016 08:24

What does cat /proc/meminfo show?

What does vmstat show?

Posted by Keith Sudbury on 07-Jun-2016 09:38

Run the following...

cat /proc/sys/vm/dirty_ratio

cat /proc/sys/vm/swappiness

cat /proc/sys/vm/dirty_background_ratio

A lot of your memory is being used for OS caching, and kswapd is maintaining those buffers as part of its job. For database servers I usually set swappiness to 0, dirty_ratio to 60 and dirty_background_ratio to 5 as a starting point.
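One way to compare a box against those starting points is a short script like the sketch below. It is only an illustration: the function names and the SUGGESTED table are made up for this example, the target values are just the starting points mentioned above rather than tuned recommendations, and it assumes a Linux host where /proc/sys/vm is readable.

```python
from pathlib import Path

# Starting points quoted in the post above; treat them as a baseline to test, not a rule.
SUGGESTED = {"swappiness": 0, "dirty_ratio": 60, "dirty_background_ratio": 5}


def read_vm_settings(names, base="/proc/sys/vm"):
    """Return {name: int or None}; None means the file could not be read."""
    values = {}
    for name in names:
        try:
            values[name] = int(Path(base, name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


def differences(current, suggested=SUGGESTED):
    """Return {name: (current, suggested)} for every setting that differs."""
    return {
        name: (current.get(name), target)
        for name, target in suggested.items()
        if current.get(name) != target
    }


if __name__ == "__main__":
    for name, (now, target) in differences(read_vm_settings(SUGGESTED)).items():
        print(f"vm.{name}: current={now} suggested starting point={target}")
```

The script only reads the settings; changing them is still a separate, deliberate step.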

Before you make any changes you need to carefully consider what you are going to do with that memory (-B, -T, etc.) or you will probably run into new performance problems as pages are read from disk instead of the buffer cache.

Posted by Gareth Vincent on 07-Jun-2016 23:32

Hi Gus,

Here is the output from meminfo:

MemTotal:       10125560 kB
MemFree:          140592 kB
Buffers:            2688 kB
Cached:          6714752 kB
SwapCached:       537032 kB
Active:          1959232 kB
Inactive:         918092 kB
Active(anon):    1941552 kB
Inactive(anon):   884180 kB
Active(file):      17680 kB
Inactive(file):    33912 kB
Unevictable:     6665748 kB
Mlocked:               0 kB
SwapTotal:      10237948 kB
SwapFree:        6498224 kB
Dirty:               484 kB
Writeback:             0 kB
AnonPages:       2589180 kB
Mapped:          6677372 kB
Shmem:           6665756 kB
Slab:             136420 kB
SReclaimable:      46372 kB
SUnreclaim:        90048 kB
KernelStack:       13200 kB
PageTables:       181136 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    15300728 kB
Committed_AS:   14072256 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      312140 kB
VmallocChunk:   34359410440 kB
HardwareCorrupted:     0 kB
AnonHugePages:     73728 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        8560 kB
DirectMap2M:    10475520 kB

vmstat shows a lot of swap activity, particularly in the "si" (swap-in) column:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b    swpd   free buff   cache   si so     bi  bo   in   cs us sy id wa st
 2  8 3777404 110484  508 6693548  952  0 152356 176 9912 3891  7  3 65 25  0
 1  0 3777312 130964  708 6698072  152  0  58648  44 4453 1837  8  2 81  9  0
 0  0 3776792 129444 1140 6698200  616  0   7800   8 3844 1983 13  3 80  4  0
 1  1 3775372 122144 1056 6700292 1676  0   9224  64 3509 2134  8  4 84  5  0
 1  3 3773624 112072 1064 6708128 2642  0  14114  10 3953 2536  7  3 65 25  0

Posted by Gareth Vincent on 07-Jun-2016 23:50

Hi Keith,

Thanks for your response, please see values below.

cat /proc/sys/vm/dirty_ratio

20

cat /proc/sys/vm/swappiness (I recently changed this from the default of 60 to 1; there has been a slight improvement)

1

cat /proc/sys/vm/dirty_background_ratio

10

I've also tried the following, with some improvement:

echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag

echo never > /sys/kernel/mm/transparent_hugepage/enabled

With over 6 GB of free memory I wouldn't expect the OS to swap so much. I haven't worked with "dirty_ratio" or "dirty_background_ratio" before, so I will certainly look into these parameters.

Posted by gus on 08-Jun-2016 08:44

Well you /don't/ have 6 GB of free memory.

We don't know what you are running on this system or how the database and other stuff are configured, but you do not have enough memory.

- there is 10 GB of total memory, which is not a lot for an active server

- there is about 4 GB of stuff paged out because it does not fit in memory.

- there is a small amount of filesystem cache (Active(file))

- there is 6 GB of stuff that is locked in memory (unevictable)

- there is 6 GB'ish of "Cached" but that can be any of lots of things, including filesystem pages, code, and dynamically allocated memory.
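To make the arithmetic behind those bullets concrete, here is a small sketch that works from the /proc/meminfo figures posted above. The function names are invented for this example, and subtracting Shmem from Cached to estimate the reclaimable file cache is a rule of thumb (on Linux, shared memory segments and tmpfs are counted under Cached), not an exact accounting.

```python
# Subset of the /proc/meminfo output posted earlier in the thread.
MEMINFO_SAMPLE = """\
MemTotal:       10125560 kB
MemFree:          140592 kB
Buffers:            2688 kB
Cached:          6714752 kB
Active(file):      17680 kB
Inactive(file):    33912 kB
SwapTotal:      10237948 kB
SwapFree:        6498224 kB
Shmem:           6665756 kB
"""


def parse_meminfo(text):
    """Parse 'Name:  value kB' lines into {name: value}."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def summarize(info):
    """Rough breakdown in kB; 'file_cache' is an estimate, not an exact figure."""
    return {
        "swap_used": info["SwapTotal"] - info["SwapFree"],
        "shared_memory": info["Shmem"],
        "file_cache": info["Cached"] - info["Shmem"],
        "free": info["MemFree"],
    }


if __name__ == "__main__":
    for key, kb in summarize(parse_meminfo(MEMINFO_SAMPLE)).items():
        print(f"{key:>14}: {kb / 1024:8.0f} MB")
```

On these numbers the estimate comes out at about 48 MB of reclaimable file cache, against about 6.4 GB of shared memory and about 3.6 GB of swap in use, which is why the 6 GB of "Cached" cannot be counted as free. To run it against a live system, pass the contents of /proc/meminfo to parse_meminfo instead of the sample.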

We have to figure out where your memory is going. Questions:

How many databases are you running?

How much shared memory are they allocating?

What else is allocating shared memory?

How many app servers?

How many 4GL runtime clients?

What else is running on this box?

What does vmstat or some other utility (e.g. nmon) tell you about page-in and page-out I/O rates?

Posted by Gareth Vincent on 08-Jun-2016 23:55

Hi Gus, I really appreciate your feedback on this one. I must be honest, I don't fully understand memory allocation on Linux; there are a lot of contradictory articles out there. If this server is indeed running out of memory, that would explain the high system load and swapping.

This is a customer's machine and is dedicated to running Progress and nothing else. There are 7 DBs running that make up the application. Below are the size and memory allocation for each:

DB    Size     Shared memory  Description
DB1   729 MB   1.1 GB         Static DB, high reads
DB2   87 GB    512 MB         Document DB, high writes
DB3   26 MB    46 MB          Framework DB, low reads/writes
DB4   2 MB     95 MB          Integration DB, low reads/writes
DB5   26 GB    2.6 GB         Main DB, high reads/medium writes (was 3.2 GB; I decreased -B last night after reading your response)
DB6   10 MB    46 MB          Framework DB, low reads/writes
DB7   33 MB    234 MB         Temp DB

Total: 4.6 GB shared memory
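As a quick cross-check of that total, here is a minimal sketch that adds up the per-database figures listed above. The dictionary and function name are just for illustration, and it assumes 1 GB = 1024 MB.

```python
# Shared memory per database in MB, as listed above (1 GB taken as 1024 MB).
SHARED_MB = {
    "DB1": 1.1 * 1024,
    "DB2": 512,
    "DB3": 46,
    "DB4": 95,
    "DB5": 2.6 * 1024,
    "DB6": 46,
    "DB7": 234,
}


def total_gb(shared_mb):
    """Sum per-database shared memory (MB) and return the total in GB."""
    return sum(shared_mb.values()) / 1024


if __name__ == "__main__":
    print(f"current total: {total_gb(SHARED_MB):.1f} GB")
    # Same sum with DB5 at its earlier 3.2 GB allocation.
    before = dict(SHARED_MB, DB5=3.2 * 1024)
    print(f"before the -B change: {total_gb(before):.1f} GB")
```

The listed figures do add up to about 4.6 GB, and to about 5.2 GB with DB5 at its earlier 3.2 GB.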

Only 1 appserver is running, with 10 agents on average.

Other memory allocation:

-Bt 20000

-tmpbsize 4

240 ABL clients
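For a sense of scale, the client-side settings can be turned into a worst-case figure with a sketch like the one below. It assumes that -Bt is a count of temp-table buffers, that -tmpbsize is the block size in KB, and that every one of the 240 clients could fill its pool. Real usage depends on how many temp-table blocks each session actually touches, so treat the result as an upper bound, not a measurement. The function name is made up for this example.

```python
def temp_table_pool_mb(bt_buffers, tmpbsize_kb, clients=1):
    """Upper bound in MB: buffers x block size (KB) x number of clients."""
    return bt_buffers * tmpbsize_kb * clients / 1024


if __name__ == "__main__":
    per_client = temp_table_pool_mb(20000, 4)
    all_clients = temp_table_pool_mb(20000, 4, clients=240)
    print(f"per client:  {per_client:.0f} MB")
    print(f"240 clients: {all_clients / 1024:.1f} GB (worst case)")
```

Under those assumptions the ceiling is about 78 MB per client and around 18 GB across 240 clients, well beyond the 10 GB in the box.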

It's currently 06:30, so I will try to gather some more stats during the course of today.

Posted by Gareth Vincent on 09-Jun-2016 02:10

It's 9am now and I'm already seeing a great improvement in performance. You were spot on about the memory. Just by reducing the buffer pool (not ideal; if anything I would like to allocate more), the server is no longer swapping. That is at least a 70% improvement in overall performance.

As I was providing you with the shared memory allocations along with the temp-table -Bt param, it was quite obvious that I was overcommitting memory. I just need to convince the customer to invest in new hardware or at least upgrade the memory.

I would appreciate it if you could provide me with more insight into "meminfo". I would really like to get a better understanding of memory allocation at the OS level.

Posted by Gareth Vincent on 09-Jun-2016 05:22

12:15:  Steady performance.  Thanks again

Posted by gus on 13-Jun-2016 08:35

here's a snippet of vmstat output:

Note the "si" and "so" columns. These indicate pages read in and pages written (i.e. swapped) out. It is possible for some of the swap space to be in use but no paging going on, such as in this case. A typical system has lots of processes that are running only occasionally and those can be swapped out but nothing is wrong.
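To illustrate reading those columns, here is a sketch that pulls si and so out of vmstat rows. The sample rows are the ones posted earlier in this thread; the column order (r b swpd free buff cache si so ...) is the usual procps layout and is an assumption here, and the function name is invented for the example.

```python
# Usual procps vmstat column order (assumed; check the header on your system).
COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa st".split()

SAMPLE = """\
2 8 3777404 110484 508 6693548 952 0 152356 176 9912 3891 7 3 65 25 0
1 0 3777312 130964 708 6698072 152 0 58648 44 4453 1837 8 2 81 9 0
0 0 3776792 129444 1140 6698200 616 0 7800 8 3844 1983 13 3 80 4 0
1 1 3775372 122144 1056 6700292 1676 0 9224 64 3509 2134 8 4 84 5 0
1 3 3773624 112072 1064 6708128 2642 0 14114 10 3953 2536 7 3 65 25 0
"""


def swap_activity(text):
    """Return (si, so) pairs from numeric vmstat rows; header lines are skipped."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) or not fields[0].isdigit():
            continue
        row = dict(zip(COLUMNS, map(int, fields)))
        pairs.append((row["si"], row["so"]))
    return pairs


if __name__ == "__main__":
    pairs = swap_activity(SAMPLE)
    print("si:", [si for si, _ in pairs])
    print("so:", [so for _, so in pairs])
    print("paging in" if any(si for si, _ in pairs) else "no swap-in activity")
```

In that earlier sample every row has a non-zero si and a zero so: the box was steadily reading pages back in from swap without writing new ones out at that moment.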

These days, there is only demand paging in most systems. The idea of swapping out entire processes all at once has long been abandoned.

To make things confusing, some systems use the demand pager to do file I/O. That can make it hard to distinguish paging from other I/O.
