AppServers getting different values from the DB

Posted by ojfoggin on 14-Jul-2011 09:23

I've been mornitoring an error for the past few days and despaite getting as much info out of the system as I can (by logging processes and states etc...) I still can't work out how it is possible for this to happen at all.

What is happening is people in the depots are amending orders and before any orders are added/amended the system checks that no one else has been in and done amendments in the mean time.  If amendments have been made then it throws back an error along the lines of "Someone else has changed this order.  Your changes cannot be accepted".

We have one company who make a large amount of amendments to orders, adding additional stuff, changing stuff, etc... and over the past few days we've had a problem where they are seeing this error even when no one else has been in the order.

Each time an amendment is made the system currently keeps a track of those amendments (there used to be a limit of 10 as the notes for these were kept in an array).  After each amendment the field amendlevel is incremeted by 1.  The logs below have come from the appserver and are all from the same order being amended several times.  At the beginning of the logs the amendLevel = 5. (I have edited in the amend level details as they go, they are not part of the logs).

amendLevel = 5

- [11/07/14@14:25:54.878+0100] P-002609 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-comparemob.p' Line:3181) Comparing order number 91019377
- [11/07/14@14:25:54.880+0100] P-002609 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-comparemob.p' Line:3182) Current dorder value 91019377/5
- [11/07/14@14:25:54.883+0100] P-002609 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-comparemob.p' Line:3206) Comparison OK for order 91019377
- [11/07/14@14:25:54.968+0100] P-002662 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-addorder.p' Line:4356) Update order 91019377
- [11/07/14@14:25:54.970+0100] P-002662 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-addorder.p' Line:4357) dorder now set to 91019377/6

amendLevel = 6

- [11/07/14@14:27:26.213+0100] P-002609 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-comparemob.p' Line:3181) Comparing order number 91019377
- [11/07/14@14:27:26.216+0100] P-002609 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-comparemob.p' Line:3182) Current dorder value 91019377/6
- [11/07/14@14:27:26.218+0100] P-002609 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-comparemob.p' Line:3206) Comparison OK for order 91019377

amendLevel = 7

- [11/07/14@14:28:16.564+0100] P-002393 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-comparemob.p' Line:3181) Comparing order number 91019377
- [11/07/14@14:28:16.570+0100] P-002393 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-comparemob.p' Line:3182) Current dorder value 91019377/6
- [11/07/14@14:28:16.573+0100] P-002393 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-comparemob.p' Line:3206) Comparison OK for order 91019377

amendLevel = 8

- [11/07/14@14:31:29.496+0100] P-002721 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-comparemob.p' Line:3181) Comparing order number 91019377
- [11/07/14@14:31:29.498+0100] P-002721 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-comparemob.p' Line:3182) Current dorder value 91019377/6
- [11/07/14@14:31:29.499+0100] P-002721 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-comparemob.p' Line:3206) Comparison OK for order 91019377

amendLevel = 9

- [11/07/14@14:33:44.369+0100] P-002393 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-comparemob.p' Line:3181) Comparing order number 91019377
- [11/07/14@14:33:44.370+0100] P-002393 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-comparemob.p' Line:3182) Current dorder value 91019377/6
- [11/07/14@14:33:44.372+0100] P-002393 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-comparemob.p' Line:3187) Compare Error
- [11/07/14@14:33:44.373+0100] P-002393 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-comparemob.p' Line:3188) S  -  91019377
- [11/07/14@14:33:44.375+0100] P-002393 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-comparemob.p' Line:3189) amenddate,amendinits,amendlevel,amendreason
- [11/07/14@14:33:44.376+0100] P-002393 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-comparemob.p' Line:3190) field name  - temp-table - DB
- [11/07/14@14:33:44.377+0100] P-002393 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-comparemob.p' Line:3196) amenddate   -  ?   -   ?
- [11/07/14@14:33:44.379+0100] P-002393 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-comparemob.p' Line:3196) amendinits   -  ?   -   ?
- [11/07/14@14:33:44.380+0100] P-002393 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-comparemob.p' Line:3196) amendlevel   -  10   -   8
- [11/07/14@14:33:44.381+0100] P-002393 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-comparemob.p' Line:3196) amendreason   -  ?   -   ?
- [11/07/14@14:33:52.553+0100] P-002372 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-comparemob.p' Line:3181) Comparing order number 91019377
- [11/07/14@14:33:52.555+0100] P-002372 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-comparemob.p' Line:3182) Current dorder value 91019377/6
- [11/07/14@14:33:52.558+0100] P-002372 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-comparemob.p' Line:3206) Comparison OK for order 91019377
- [11/07/14@14:33:52.645+0100] P-002609 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-addorder.p' Line:4356) Update order 91019377
- [11/07/14@14:33:52.647+0100] P-002609 T-000001 1 AS -- (Procedure: 'libapp/ordlib/ordlib-addorder.p' Line:4357) dorder now set to 91019377/7

amendLevel = 10

(the ?s are because they are arrays but they are related directly to amendLevel so the difference will be that the arrays have 8 entries instead of 10).

As you can see the comparison between the temp-table and the db at 14:33:44 failed because the db amendLevel = 8.  This is not the case though as the amendment before set the db amendLevel = 9.

The only thing I can think that might be relevant is that the update to amendLevel = 9 happened on a different AppServer agent (but then so did all the others).  I have checked that no temp-tables are left lying around in between calls and they aren't.  The system empties all temp-tables before populating them with new data and most of the procedures are running non-persistently across the client-server connection anyway and define the temp-tables a few lines before populating them.

Is it possible to have different AppServer agents caching their own data somewhere?  Or using buffered information instead of the actual DB information?  i.e. the DB is being updated but the buffered information isn't and then the info is retrieved from the old buffer info?

We're using NFS caching and going to turn that off tomorrow to see if it makes a difference but we have several DBs using this and only one causing these errors.

If anyone has any ideas where we can look next or any advice on what logs to look at I'd appreciate it a lot as I'm beginning to pull my hair out

Thanks for any help

Oliver

All Replies

Posted by jmls on 14-Jul-2011 09:31

make sure all appservers are running with -rereadnolock

Julian

Posted by Admin on 14-Jul-2011 09:36

Check the -rereadnolock parameter. A must have for AppServers

Posted by ojfoggin on 14-Jul-2011 10:12

Thanks both.

We will be putting the setting onto the AppServer tonight and I'll keep an eye on it.

The problem is that this problem is very random in the way it works.  Not consistent at all.  If it doesn't happen for a week it should be fine

I'll keep you posted!

Thanks

Posted by ojfoggin on 15-Jul-2011 04:22

Well, I had a read of the docs on the -rereadnolock switch and we have implemented that now but this error only started happening since the last release and from the stuff it was mentioning about buffers and stuff I started searching.

We have one procedure (common.p) that now runs on startup of the AppServer and runs persistently as a super procedure for the session.

Previously one of the internal procs of common.p started with...

def buffer order for order.

def buffer orderline for orderline.

find order where blah blah blah

     no-lock no-error.

since the release this IP has been changed and now it just does...

find order where blah blah blah

     no-lock no-error.

I am almost 100% convinced that this is the cause of the error and combined with the lack of -rereadnolock it is somehow retaining the buffered version of the order record.

I have now put the defined buffers back into the IP and we have also put the -rereadnolock on the AppServer.

I'll keep an eye on it but I think we've caught it now.

Thanks again for the help!

Oliver

P.S. I gave the correct answer to the first reply but they were actually both correct

Posted by Tim Kuehn on 15-Jul-2011 07:30

The original version scoped the buffers to the IP, while the removal of those "Define buffer" statements allowed the scope to go to the procedure. If successive FIND order's got the same record back, w/out rereadnolock it's possible it "kept" an old copy of the buffer rather than get a current one from the db. If one doesn't want to use rereadnolock for the session for some odd reason, FIND CURRENT will also ensure a fresh version is read from the db.

Posted by Thomas Mercer-Hursh on 15-Jul-2011 11:19

The difference between the two is likely to be buffer scope.  By defining a buffer, even with the same name as the table, it limits the scope to the IP whereas any other reference in the program will cause it to be scoped to the procedure.  This has more consequences than the one you noticed!  You can confirm it with COMPILE LISTING.

You should *always* define a buffer to scope to an IP.  Personally, I find this practice of naming the buffer the same as the table leads to confusion ... which is probably why it got dropped in your case.  I heartily recommend some other naming convention like b_Order so that you know exactly what you are dealing with an where it is defined.  Some people disagree with me, but nothing new in that.

This thread is closed