Database backups crashing on OE 11.6.3

Posted by dbeavon on 15-Mar-2017 08:57

Our database administrator tells me that our database crashes whenever we do backups.  This began within the very first week of installing OE 11.6.3 (we upgraded to 11.6.3 from 11.3). 

We run HP-UX IA64 in the environment that is crashing, and we also use a feature in OE called the "Alternate Buffer Pool".

Has anyone else had these issues?  Apparently there is a KB as well: http://knowledgebase.progress.com/articles/Article/database-B2-crashes-with-1040-000078422

We didn't know about this problem in the planning phase when we were making preparations for the upgrade to 11.6.3.  Prior to the planning, I had already done some of my own preliminary testing of OE 11.6.3 on a personal database environment and had experienced no problems, but I was NOT using an "Alternate Buffer Pool" (B2).

Given this experience with 11.6.3, I think Progress should be warning all its customers to avoid this service pack and disallow access to it in ESD.  Has there been any communication along these lines?  If they do allow customers to move forward, it leads to a lot of trouble.   I am having a hard time making sense of it all;  but at this point I'm thinking that the B2 option is extremely uncommon among OE RDBMS customers... (or that OE customers may view backups as an unnecessary luxury.)

Either way we must get a better service pack (i.e., 11.6.4).  As it stands, this final service pack for 11.6 will leave customers in a position where they have to choose between using B2 and doing backups.  It doesn't make sense.  Nor does it make sense that every single customer that moves up to 11.6.3 should also have to request a private hotfix in addition to the service pack.  Maybe my experiences with the service packs and the hotfixes from other vendors do not compare to the way Progress creates these things.

Is it unreasonable to think that 11.6.4 is needed?  After all, 11.6 is the most recent version of OE, and 11.6.3 is probably causing more trouble than it fixes.   How can Progress rule out a new service pack on the current version of OE?

 

All Replies

Posted by James Palmer on 15-Mar-2017 09:22

Have you tried it with the -Bp parameter as explained in the KB? In all honesty you should be using this parameter anyway for the backups as otherwise your -B is wiped out every time you back up.

Posted by dbeavon on 15-Mar-2017 10:06

It seems to me that if the "-B2" option is an uncommon thing for OE customers to use (or for Progress to test when releasing service packs) then this "-Bp" is probably even more uncommon.  We'd like to be using the product in a mainstream way so that we don't bump into problems before everyone else.

It was very surprising that we ran into this, and ran into it as quickly as we did.  OE 11.6.3 has been available since September but perhaps there aren't many who have tried using it yet.  (And/or they aren't using the B2 option).

Is "-Bp" something that most OE dba's should be familiar with?  It appears to be a client session parameter, and I'm guessing that most dba's overlook those when performing database administration tasks like serving databases, or performing backups.  (In order to use the product in a more mainstream fashion, I would just as soon stop using "B2" and just use "B" ... at least until the 11.6.4 service pack).

Posted by James Palmer on 15-Mar-2017 10:13

Not sure whether I'd say a dba should be aware of -Bp, but it's something I've become aware of in the last couple of years. The whole point of -B is that it's meant to be a cache of what people are using frequently. Anything that comes along and wipes that out for administration purposes is not a welcome addition as it means the buffer is no longer what people actually want. I generally only use it on the backups because they happen every night, or even more frequently, and this can have a negative impact on users.

It is something that is probably considered 'Best practise' nowadays.

Posted by dbeavon on 15-Mar-2017 10:17

Reading about "-Bp", it seems to say that the buffers are stolen from the public (-B) buffers anyway, so either way the public buffers are being consumed by the backup.  

Hopefully a backup isn't so greedy as to "wipe out" all available buffers.  It would seem foolish considering most of the data is used on a one-time basis and won't be accessed again.

Since we are talking about this "BP" client parameter that is new to me, I have a related question.  In the past I had wondered if there was a way to flush clean buffers in order to test the performance of reads from the I/O disk system.  

(see community.progress.com/.../86187)

Would the Bp option serve the purpose of effectively flushing my buffers?  In other words, whenever I restart an ABL client with private buffers, will all the initial reads go to the disk before it starts using the data cached in buffers?  Nobody had suggested this as an option when we were talking about flushing buffers, and the best ideas were either to restart the entire database or perform a long-running dbanalys to flush out the shared buffers (maybe in a similar way as what you say happens during a backup).

Posted by Tom Cattigan on 15-Mar-2017 10:20

Yes I would agree wholeheartedly here - -Bp should be used when using any of the online tools.  We have implemented B2 and seen some tremendous performance benefits so this is a pretty serious bug in my opinion.

Posted by ChUIMonster on 15-Mar-2017 10:25

Re: -Bp "stealing" buffers... I think you are thinking about it backwards -- if a block is already in -B then it will not count against -Bp.  If a block referenced by a process using -Bp is NOT in -B then, rather than replacing a "public" buffer with that block, one of the -Bp buffers will be used.  Thus "public" buffers do not get flushed by processes doing sequential and non-repeating access (like backups).

So, no, -Bp is not a sneaky way to flush -B.  It is quite the opposite: a way to *avoid* flushing -B.
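The difference can be illustrated with a toy model. The Python sketch below is not how OpenEdge is implemented: the `Lru` class, the pool sizes, and the block numbers are all hypothetical, and it assumes that private buffers are borrowed from the public pool, as the -Bp description mentioned earlier in the thread suggests. It only shows why a one-time sequential scan through a small private LRU costs the public pool a handful of buffers, while the same scan through the public pool evicts everything.

```python
from collections import OrderedDict


class Lru:
    """Toy LRU chain of block numbers (oldest first). Not OpenEdge internals."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def hit(self, block):
        """Return True and mark the block most recently used if present."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        return False

    def load(self, block):
        """Bring a block in, evicting least recently used blocks if full."""
        while self.blocks and len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)
        self.blocks[block] = True

    def shrink(self):
        """Give up one buffer, evicting the LRU block if the pool is full."""
        self.capacity -= 1
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)


def scan(public, blocks, bp=0):
    """Read blocks once, in order; bp > 0 models a client started with -Bp."""
    private = Lru(0)
    for block in blocks:
        if public.hit(block) or private.hit(block):
            continue
        if bp == 0:
            public.load(block)      # competes with everyone's hot blocks
            continue
        if private.capacity < bp:   # assumption: borrow a buffer from -B
            public.shrink()
            private.capacity += 1
        private.load(block)         # further evictions stay private


def warm_pool():
    """A public pool of 100 buffers holding the users' hot blocks 0..99."""
    pool = Lru(100)
    for block in range(100):
        pool.load(block)
    return pool


backup_blocks = range(1000, 2000)   # hypothetical one-time sequential read

without_bp = warm_pool()
scan(without_bp, backup_blocks)

with_bp = warm_pool()
scan(with_bp, backup_blocks, bp=10)

hot_left_without = sum(b in without_bp.blocks for b in range(100))
hot_left_with = sum(b in with_bp.blocks for b in range(100))
print(hot_left_without, hot_left_with)
```

In this model the scan without private buffers leaves none of the hot blocks in the pool, while the scan with ten private buffers displaces only the ten buffers it borrowed.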

Posted by Tom Cattigan on 15-Mar-2017 10:26

An online probkup will copy each block into memory - thus it will fill your buffer pool - this is why -Bp is advised.  Note, this will keep your buffer pool relatively clear but the file system cache will be impacted.

Sorry, I don't understand the second question so cannot comment.

Posted by dbeavon on 15-Mar-2017 10:30

Thanks Tom,

But if my ABL client session is reading data on a testing environment that *nobody* else is interested in, then the private (-Bp) buffers will be used for storing that data, even in preference to using free "public" (-B) buffers. ... And the next time the ABL code is started in a brand new client process, it will go back to disk again... Right?

Posted by Ruanne Cluer on 15-Mar-2017 10:46

For Private Read-Only buffers, take a look at:

000022329 - How Progress uses Buffers and when to use Private Buffers (-Bp) ?

knowledgebase.progress.com/.../P95829

And specifically for online probkups:

000021080 - When should the -Bp parameter be used with a Progress Online Probkup?

knowledgebase.progress.com/.../P49128

The initial question raised:

When your database crashes whenever online PROBKUP runs, is it specifically crashing with error (1040) SYSTEM ERROR: Not enough database buffers (-B)?

000078422 - A database running with -B2 crashes with error (1040)

knowledgebase.progress.com/.../database-B2-crashes-with-1040-000078422

This issue does not affect only PROBKUP.

It occurs specifically when -B2 has object-level assignments (not area-level assignments).

The workaround for PROBKUP (and, for example, dbanalys) when the database is started with -B2 and objects have been assigned is to use private buffers (-Bp), which avoids this problem.

If your online PROBKUP is crashing for any other reason, you're not hitting this issue.

Posted by ChUIMonster on 15-Mar-2017 10:58

If your first session accesses more data than fits into whatever you defined for -Bp then the excess will result in buffers being evicted from -Bp.  If nobody else referenced them, then they are no longer in memory.

If later on a 2nd session wants to reference some data that your first session referenced and that data is no longer in memory then it will need to be re-read from disk.

It doesn't "go back to disk" -- it is simply removed from the list of blocks that are available in memory.  (Unless you modified some data -- but this discussion is all about reads....)

Offhand I'm not entirely sure what happens if -B is very large and underutilized compared to -Bp.  I suppose that Progress *might* decide that if a block is being "evicted" from -Bp *and* there are unused -B blocks then it could be kept around in memory just in case.  It does not seem to me like there would be any harm to that -- except that it would have to be coded and every bit of coding means potential for bugs and unanticipated overhead.  If it were me I probably wouldn't do it.  It is far too speculative and the whole point of -Bp is that you are saying "I don't think anyone else will care about this data".

Posted by Richard Banville on 15-Mar-2017 11:12

Buffers associated with a user connection's -Bp are indeed re-associated with the -B when the user disconnects from the database.  This will not cause paging of the -B since the buffers are already allocated.  The buffer is just removed from one LRU chain and added to another.

Posted by dbeavon on 15-Mar-2017 14:13

So I'm hearing that -Bp doesn't allow me to reset/flush my buffers after a client disconnects.  The buffers stay in memory.  (Nor does this help with the ultimate goal of testing the performance of reads when they are going all the way back to disk.).

It might be that we have some pretty slow I/O hardware.  But I've often noticed that some ABL code, like certain reports, will execute *extremely* slowly if run once in a day, and will be fast on subsequent executions (every ten minutes). There should be a way to troubleshoot and optimize the *first* execution of the report, and in that way isolate the I/O hardware issues that lie beneath the database.  Today the only good way to troubleshoot the *first* execution is to stop and restart the entire OE database.

In the Windows world we use an I/O tool called "sqlio" for isolating and troubleshooting hardware. And insofar as SQL server itself is concerned, we can use DBCC DROPCLEANBUFFERS whenever we need our database reads to go all the way back to the disk.

I am still absolutely convinced that there is a secret command for easily flushing out the OE buffer pools (ie. removing clean buffers from the "LRU chains" for -Bp and -B).  I *really* wish someone would tell us what it is.  We promise to use it in development and not in production.

Insofar as my initial question goes, are there any thoughts on the likelihood of a new service pack (11.6.4) given the -B2 issues?  I'm hearing that -B2 is a fairly popular feature (or at least it was before it broke).  And I think we were specifically advised to use it with "object level assignments" that reference the specific tables that would benefit the most.  Can someone tell me how many severe bugs need to be found before an OE service pack is made available?  I'd rather stay on 11.6.3 for testing and wait to do a production upgrade after 11.6.4 is available.  

Posted by Thomas Mercer-Hursh on 15-Mar-2017 14:35

The secret command is proshut .... :)

Posted by cjbrandt on 15-Mar-2017 14:44

Flushing the buffer pool is a fairly common question in DBA forums for OpenEdge.  I have never seen a command that would do it, the normal response is to either restart the db or run a backup or table analysis - without using the -Bp option.  DBCC DROPCLEANBUFFERS will flush the database memory blocks, but not the file system cache, similar to if the database was stopped and restarted.

If there is a report that runs slowly the first time and quickly after, then that usually indicates the subsequent runs are using data in the buffer pools.  You can track the physical vs logical reads a report makes.  You could also try removing the "-q" client side parameter so the code in the report will be loaded into memory for each run, otherwise the code stays in memory and that would also make future reports run faster.
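As a rough illustration of comparing logical and physical reads, the sketch below computes a buffer hit ratio for a first (cold) run and a repeat (warm) run of the same report. The read counts are made-up numbers and `hit_ratio` is a hypothetical helper; in practice the counts would come from promon or whatever monitoring you already use.

```python
def hit_ratio(logical_reads, physical_reads):
    """Fraction of block requests satisfied from the buffer pool."""
    if logical_reads == 0:
        return 0.0
    return (logical_reads - physical_reads) / logical_reads


# Hypothetical counts for the same report, run twice.
cold = hit_ratio(logical_reads=50_000, physical_reads=20_000)
warm = hit_ratio(logical_reads=50_000, physical_reads=500)

print(f"cold run: {cold:.1%}")
print(f"warm run: {warm:.1%}")
```

A large gap between the two ratios would point at buffer pool warm-up rather than at the report's code.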

I can see the case for creating an 11.6.4.  Most companies don't bother requesting the latest hotfix when they upgrade.  They would upgrade to 11.6.3 and then be at risk for the db crashing during a backup or dbanalys.

Posted by George Potemkin on 16-Mar-2017 02:01

> Today the only good way to troubleshoot the *first* execution is to stop and restart the entire OE database.

You can start the database with a low value of -B/-B2. It will imitate the unused buffer pool. But it's still not enough for fair tests. You need to empty the filesystem cache as well. The only accurate way to troubleshoot the first execution is to reboot the whole system.
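A toy model makes the effect of a very small pool visible. The Python sketch below is only an illustration under stated assumptions: a plain LRU of block numbers stands in for -B, a miss stands in for a disk read, and the "report" is a hypothetical workload that reads 500 blocks twice. It does not model the file system cache at all.

```python
from collections import OrderedDict


def run_report(pool, capacity, blocks):
    """Read blocks through a toy LRU pool; return the number of misses."""
    misses = 0
    for block in blocks:
        if block in pool:
            pool.move_to_end(block)     # hit: mark most recently used
            continue
        misses += 1                     # miss: stands in for a disk read
        if len(pool) >= capacity:
            pool.popitem(last=False)    # evict least recently used
        pool[block] = True
    return misses


report = list(range(500)) * 2           # hypothetical: 500 blocks, read twice

large = OrderedDict()
large_runs = [run_report(large, 1000, report) for _ in range(2)]

small = OrderedDict()
small_runs = [run_report(small, 10, report) for _ in range(2)]

print(large_runs, small_runs)
```

With the large pool only the first run misses. With the tiny pool every run looks cold, but the reuse within a single run is lost as well, so the result is harsher than a genuine first execution.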

Posted by George Potemkin on 16-Mar-2017 02:06

> You could also try removing the "-q" client side parameter so the code in the report will be loaded into memory for each run, otherwise the code stays in memory and that would also make future reports run faster.

Documentation says: "With Quick Request (-q), after the initial search, IF the procedure still resides in memory or in the local session-compiled file, the AVM uses that version of the procedure rather than searching the directories again."

Posted by dbeavon on 16-Mar-2017 08:10

>You can start the database with low value of the -B/-B2. It will imitate the unused buffer pool. But it's still not enough for the fair tests.  

Yes, using a very small -B/B2 is the obvious test that bypasses OE buffering.  But to make things fair, even the *first* execution of some custom ABL code should be allowed to use a little bit of database buffering.

I understand about file system caching.  That is between me and my operating system.  Obviously the thing I'm looking for OE to do is help me troubleshoot its own RDBMS buffers in a more flexible way.  

I'm waiting for someone to tell me to submit a feature request ... but the fact of the matter is that there is probably already a feature in there that is internal or is being hidden from us.  I was hopeful that -Bp was finally the answer that I was looking for, but then someone says that the client's private buffers are added to the public buffer pool when the client with the -Bp option disconnects.  What use is that anyway?  The only reason that -Bp was specified in the first place was because the client was going to use data that wouldn't be of interest to other database clients.  (Or else there should be a -Bpx that tells Progress not to put the client's private buffers back in the public buffer pool when the client disconnects.)

Posted by Richard Banville on 16-Mar-2017 09:21

Just to clarify, when a user disconnects from the database, whatever remains in its -Bp is left in memory and "added" to the -B general pool.  However, the user's -Bp of say 10 may have had many more blocks read that are not ever made part of the -B.

Posted by George Potemkin on 16-Mar-2017 09:56

> when a user disconnects from the database, whatever remains in its -Bp is left in memory and "added" to the -B general pool.

Or to the alternate buffer pool (-B2) depending from an object the block belongs to?

Do the private buffers use their own LRU chains even though the -Bp value is limited by 64?

Posted by George Potemkin on 16-Mar-2017 10:07

> I understand about file system caching.  That is between me and my operating system.

BTW, it's possible to empty file system cache. At least on HP-UX:

$ cd mount_point
$ umount mount_point
umount: cannot unmount /dev/logical/volume : Device busy
umount: return error 1.

But as a side effect, the umount command will release the memory that was allocated as a cache for the specified file system.

Posted by gus bjorklund on 16-Mar-2017 10:10

> On Mar 16, 2017, at 9:11 AM, dbeavon wrote:
>
> there is probably already a feature in there that is internal or is being hidden from us

not for flushing the buffer pool. there really isn’t.

Posted by Richard Banville on 16-Mar-2017 10:13

"Or to the alternate buffer pool (-B2) depending from an object the block belongs to?"

Unfortunately not at this time.  The -Bp buffers are taken as needed from the -B, so all buffers remaining on a user's -Bp LRU go back to the -B regardless of the block data they contain.

"Do the private buffers use their own LRU chains even though the -Bp value is limited by 64?"

Every db connection has its own LRU for its -Bp separate from other user's -Bp LRU and separate from the -B LRU and -B2 LRU so the replacement policy is connection specific for its -Bp .

Posted by Ruanne Cluer on 17-Mar-2017 04:25

"even though the -Bp value is limited by 64?"

-Bp is not limited to 64. That's the default. It's limited to 25% of -B and can be increased with -Bpmax

"db connection has its own LRU for its -Bp separate from other user's"

If you want to get an idea of how, when the -Bp user disconnects, their -Bp LRU is simply added to the -B LRU chain, try the following:

proserve sports2000 -B 1000 -Bpmax 250

Start a PROMON session: R&D > 1 > 7. Buffer Cache

prowin sports2000 -Bp 100

FOR EACH Customer NO-LOCK:
    DISPLAY CustNum Name.
END.

In promon, the Used Buffers value never goes above -Bp.

When you exit the -Bp client, their buffers go to the -B LRU chain.

Now run the above query without -Bp

To get the best idea, you ideally want to add more customer records than the 1117

Posted by George Potemkin on 17-Mar-2017 07:08

Ruanne,

I would add one more step to your test:

proutil sports -C enableB2 "Customer/Order Area"

Then start db with/without -B2.

BTW, it would be nice to have an article that describes the usage of the private buffers more deeply and completely than the documentation does, including the -Bp and probkup topic.

Posted by George Potemkin on 17-Mar-2017 14:08

> when a user disconnects from the database, whatever remains in its -Bp is left in memory and "added" to the -B general pool.

Also the tests say that all used blocks from the -Bp are added to the /beginning/ of LRU chain (in other words, they are treated as the most recently used blocks - no matter when the session read them).

Posted by Richard Banville on 17-Mar-2017 15:31

Yes, this is also true.  At the time -Bp was implemented the "technology" to add to the LRU end of the chain did not exist.  Since then such a mechanism has been added.  I will file a change request to add these blocks to the LRU end vs the MRU end of the replacement chain.
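The practical difference between the two ends of the chain can be shown with a small sketch. This is a toy model in Python, not OpenEdge code: the chain is an ordered mapping from the LRU end (first) to the MRU end (last), and the block names (`h*` for hot public blocks, `b*` for blocks left in a departing client's -Bp, `n*` for newly read blocks) are hypothetical.

```python
from collections import OrderedDict


def disconnect(public, private, to_mru_end):
    """Merge a departing client's private blocks into the public chain.

    The returned chain is ordered from LRU (first) to MRU (last).
    """
    merged = OrderedDict()
    if to_mru_end:
        merged.update(public)
        merged.update(private)   # private blocks become the newest
    else:
        merged.update(private)   # private blocks become the oldest
        merged.update(public)
    return merged


def load(chain, capacity, block):
    """Bring in a new block; return the block evicted from the LRU end."""
    evicted = None
    if len(chain) >= capacity:
        evicted, _ = chain.popitem(last=False)
    chain[block] = True
    return evicted


hot = OrderedDict.fromkeys(["h1", "h2", "h3"], True)
private = OrderedDict.fromkeys(["b1", "b2"], True)

results = {}
for to_mru_end in (True, False):
    chain = disconnect(hot, private, to_mru_end)
    results[to_mru_end] = [load(chain, 5, new) for new in ("n1", "n2")]

print(results)
```

When the private blocks are merged at the MRU end, the next two reads evict hot blocks; merged at the LRU end, the same reads evict the leftover private blocks first.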

This thread is closed