What could be the cause of RECID pointers corruption/missing

Posted by Tai Li on 26-May-2014 02:40

Hi All,

Some of my data using RECID pointers randomly went missing. Regret to say that I do not have the database log (.lg ) file for that moment when this incident occurred.

HDD does not have any bad sectors. Database was not touched either. Have done an extensive checking on our application code and highly confident that was not due to application code since the application have been running for number of years with no such issue before.

Kindly advise what could be the cause of this?

Thank you!

All Replies

Posted by Mike Fechner on 26-May-2014 02:50

You know that storing RECID’s and ROWID’s as referened keys in the DB is a bad idea?

Did you dump and load your DB recently? That changes RECID’s and ROWID’d. Or delete and recreate a record.

Physical storage reorganization like a TBLMOVE from one are to another also changes RECID and ROWID.    

Posted by ChUIMonster on 27-May-2014 07:51

Mike understates the case -- "bad idea" doesn't even begin to cover it.

Never, ever store a RECID or a ROWID in  a database table.  Treat RECIDs as very volatile data -- they *might* be consistent from moment to moment during program execution.  You will mostly, probably "get away" with using RECID *variables* (variables -- not db fields) within a narrow bit of code -- but even that will not *always* be safe.

Posted by ezequielmontoya on 27-May-2014 08:25

Agree. In my previous job they had one table using RECID in db fields for referring to another table, and they forgot it when we performed a Dump & Load.

The boss sent me to repair the disaster, so I'll never forget how bad is the idea.

Posted by gus on 27-May-2014 09:21

In addition to the fact that rowids can change as mentioned above by mike and tom, recid and rowid values are not nearly as unique as you might have been led to believe (and the documentation does not say much about it). long ago, in a galaxy far away, a 32-bit recid was always unique within a database. that has not been true since 1998. to be specific

rows in two different tables can have the same rowid values.

rows residing in different partitions of a single table can have the same rowid value.

rows belonging to different tenants in the same multitenant table can have the same rowid value.

Posted by Tim Kuehn on 27-May-2014 09:36

Gus - is what you wrote another way of saying "rows in different areas can have the same ROWID"?

One big "OMG" though is "rows in different partitions of the same table can have the same ROWID".That's a big issue for any program which use ROWID to do things like find the same record as another buffer has only with a different locking status and the like.

Posted by Rob Fitzpatrick on 27-May-2014 10:59

Tim,

Partitions of a table are storage objects.  Two table partitions *can* be in different areas, but they don't have to be.

Posted by gus on 27-May-2014 13:42

@tim

lock id's aren't the same as rowids. buffers have a lot of internal information in them along with the record itself (which you cannot get at but which we can use in the 4GL runtime).

most places in the 4GL where you can use a rowid, there is something else (e.g. rowid(customer)) involved as well as the context of the usage, so the expected thing happens. still, one can subvert the correct behaviour and get into trouble, such as when a rowid is saved away someplace without the rest of the information, then retrieved and (mis)used later.

Posted by Tim Kuehn on 27-May-2014 13:45

Gus -

This is the construct I'm thinking of -

FIND customer

   WHERE some-condition

   NO-LOCK

   NO-ERROR.

FIND CustomerUpdate

   WHERE ROWID(customer) = ROWID(CustomerUpdate)

   EXCLUSIVE-LOCK

   NO-ERROR.

Your comment about the same table having duplicate rowid's makes me concerned that this would not work anymore. If that's not the case - ie it's the same table for different tenants - then we're good.

Posted by gus on 27-May-2014 13:59

Experience is what you get when you don't get what you want.

Even without considering the possibility of duplicate rowids, that code might not work the way you want because after the first find, but before the second, the following could happen:

a second transaction comes and deletes the customer you found and then commits.

a third transaction comes along and creates a new record with the same rowid as before and commits.

if the new record is for a different table, then the find customerupdate will not find it.

if the new record is for the same table, then the find will succeed but the record will be the new one, not the one you found before.

Posted by Tim Kuehn on 27-May-2014 14:13

I've used this construct for years, and have never run into the situation you described. I have had cases where I persisted a ROWID for something, and then had another part of the program delete a record for that table and make a new one, which meant the ROWID pointed to an "illegal' record.

If the scenario you describe happened "very often", most of the NO-LOCK code out there wouldn't work.

Posted by Mike Fechner on 27-May-2014 14:17

Hi Gus,
 
is a FIND CURRENT … EXCLUSIVE-LOCK guaranteed to return the same record (if it still exists) or none -  or can the same things happen and FIND CURRENT suddenly returns a different record….
 
Mike
 

Posted by Thomas Mercer-Hursh on 27-May-2014 14:21

And, in many applications, even if the record was updated in between, one would want the new version.  Only if there were a meaningful time delay ... e.g., if it went out to the user in between, would one worry and then, of course, one would use standard optimistic locking behavior to detect the possible change.  The delete in between should be catchable and thus gracefully handled.  Note that doing the second find with a GUID or whatever doesn't actually change the scenario except for your third case, which seems a fairly thin chance.

Which said, I can't see any reason why a find by RECID would be better than a find by GUID unless one was dealing with legacy code where one had to use multiple fields to uniquely identify the record.

Posted by Mike Fechner on 27-May-2014 14:27

Which said, I can't see any reason why a find by RECID would be better than a find by GUID unless one was dealing with legacy code where one had to use multiple fields to uniquely identify the record.

I think the thread proves the opposite. As we cannot control if/when/how/why RECID’s are reused GUID’s have a great advantage: You are in controls of reusing them or not and if you are a reasonable developer, you’d never reuse them for different records.

Posted by Tim Kuehn on 27-May-2014 14:35

[quote user="Thomas Mercer-Hursh"]Which said, I can't see any reason why a find by RECID would be better than a find by GUID unless one was dealing with legacy code where one had to use multiple fields to uniquely identify the record.[/quote] 

Because a lot of legacy code out there predates the availability of a GUID, and ROWID/RECID can serve as a reasonable proxy for certain restricted cases.

Posted by Thomas Mercer-Hursh on 27-May-2014 14:44

Hence my "unless".  Been there, done that.  Even then, one has to admit that the real motivation for doing the find using a recid instead of on the three fields or whatever that make up the primary key is laziness in typing ... and, a certain clarity that the intent is "the same one".

Posted by Tim Kuehn on 27-May-2014 14:50

[quote user="Thomas Mercer-Hursh"]Even then, one has to admit that the real motivation for doing the find using a recid instead of on the three fields or whatever that make up the primary key is laziness in typing [/quote]

Actually - no.There are situations where I don't care what record it is or where it comes from in order to "do something to/with it." In other cases the record could be part of a FOR EACH or query result list and doesn't have a "unique" set of fields to get a locked version of the record with. 

Posted by Thomas Mercer-Hursh on 27-May-2014 15:11

You have database tables where there is no unique key for the record????

Posted by Tim Kuehn on 27-May-2014 15:12

yes.

Posted by Mike Fechner on 27-May-2014 15:13

If he has them, he's unfortunately not alone with that.

Always the biggest mess when introducing ProDatasets at customers with legacy DB's...=

Posted by Tim Kuehn on 27-May-2014 15:14

Or the code in question doesn't know what table the record is coming from.

Posted by Thomas Mercer-Hursh on 27-May-2014 15:22

Well, I can only hope that this is an inherited schema .... but how can you hope to find the right thing if you don't know the table?

Posted by Tim Kuehn on 27-May-2014 15:26

If the buffer handle is available and I know the record's RECID, then knowing which table it came from or it's primary key becomes irrelevant.

Posted by gus on 27-May-2014 15:27

@tim: i did not say anything about how often the code might fail, only that it /could/. i know that it does happen because awhile back we had to add some error handling code in the 4GL to handle that case. but as you say, it will be rare. very rare.

the failure scenario now isn't bad. you just have to handle the case when the row no longer exists and the second find does not return anything. when the rowid has been taken by a row in another table, then the find returns not found instead of the wrong row.

also, i should have mentioned that the scenario where the third transaction creates a new row for a different table cannot occur if you are using type ii data areas.

@mike: find current could do the same thing.

@tmh: nothing wrong with using find by recid in the right circumstances. since a rowid is an encoding of the storage location of the row, we can fetch it directly, without doing an index lookup, which typically involves several additional block accesses.

Posted by ChUIMonster on 27-May-2014 15:33

Using the RECID does skip the index lookup.  But that's an awfully skimpy justification.

Posted by ChUIMonster on 27-May-2014 15:37

I can see that the bit about partitions is going to be a problem.  Areas and Tenants no issue.  But potential duplication between partitions of a table is almost certainly going to break stuff.  I'm embarrassed to admit that I may have some code that needs remediation.

On the other hand it isn't like we haven't been warned not to do it for 25 years or so...

Posted by andrew.may on 28-May-2014 08:20

I seem to remember that some of the internals of the ADM1/2 frameworks passed around the rowid of a record as its unique ID (I might be remembering wrongly though).

Can anyone remember if this is true?

if so, does anyone know if the frameworks have been changed to be partition-safe (or are partitions just incompatible with ADM framework).

Posted by Mike Fechner on 28-May-2014 08:24

ADM2 batching is based on ROWID‘s
But as long as you are not switching the tenant I don’t see a problem.
 
 
Von: andrew.may [mailto:bounce-andrewmay@community.progress.com]
Gesendet: Mittwoch, 28. Mai 2014 15:22
An: TU.OE.RDBMS@community.progress.com
Betreff: RE: What could be the cause of RECID pointers corruption/missing?
 
Reply by andrew.may

I seem to remember that some of the internals of the ADM1/2 frameworks passed around the rowid of a record as its unique ID (I might be remembering wrongly though).

Can anyone remember if this is true?

if so, does anyone know if the frameworks have been changed to be partition-safe (or are partitions just incompatible with ADM framework).

Stop receiving emails on this subject.

Flag this post as spam/abuse.

Posted by andrew.may on 28-May-2014 08:27

Gus said "rows residing in different partitions of a single table can have the same rowid value."

That (to me) sounded like it would affect a single tenant DB if table partitioning was used.

Grepping the ADM1 source  for "rowid" shows up a lot of places where the ROWID seems to be used as a unique record id.

Posted by Mike Fechner on 28-May-2014 08:30

That’s true. In a multi-tenant scenario.
 
 
 
Von: andrew.may [mailto:bounce-andrewmay@community.progress.com]
Gesendet: Mittwoch, 28. Mai 2014 15:28
An: TU.OE.RDBMS@community.progress.com
Betreff: RE: What could be the cause of RECID pointers corruption/missing?
 
Reply by andrew.may

Gus said "rows residing in different partitions of a single table can have the same rowid value."

Stop receiving emails on this subject.

Flag this post as spam/abuse.

Posted by Stefan Marquardt on 28-May-2014 08:34

I thought he means the new announced OE 11.x feature: table partitioning (horizontal).

Posted by andrew.may on 28-May-2014 08:35

Since Gus said both ...

"rows residing in different partitions of a single table can have the same rowid value."

... and ... 

"rows belonging to different tenants in the same multitenant table can have the same rowid value."

... I assumed that he was talking about the new table partitioning functionality, not just the multi-tenancy features.

Not a major problem for me, as I've got no immediate plans to use table partitioning.  It just sounded like a nasty gotcha for anyone that decided to partition a legacy DB that was being maintained by ADM1 code

Posted by Tim Kuehn on 28-May-2014 08:35

[quote user="Mike Fechner"]

That’s true. In a multi-tenant scenario.

[/quote] In a MT scenario, it's not really the same table, but an seperate instance of a table.

Posted by Mike Fechner on 28-May-2014 08:37

… but not yet there.
 

Posted by Rob Fitzpatrick on 28-May-2014 08:45

There was a question about this (re-finding a record by rowid) in RichB's table partitioning session at Exchange last year, and whether such code would be broken when using it for a partitioned table.  Rich didn't go into specifics but the gist of his answer was that they have thought about this scenario in the design of TP and they have a way to deal with it.

Stay tuned, there should be more details available at PUG Challenge.

Posted by gus on 28-May-2014 09:30

@tim: yes, it is a different instance of a table and the partitioning is somewhat different with multitenancy than horizontal table partitioning. the main difference is that with multitenancy there is nothing in the data that has to do with partitioning since it is all based on identity.

@everybody: yes, by all means go to PUG Challenge and ask Rich about table partitioning and rowids. not too late to sign up.

btw:i was surprised to learn of a small problem with ADM code and rowids and multitenancy. in the ADM, the code sometimes wants to find a table's template row which it does by getting its rowid from the table's _file entry and the doing a find by rowid. that fails on a multitenant table because the template is not in the tenant's data partition. i've been told it is not hard to fix though.

Posted by Phillip Molly Malone on 28-May-2014 19:00

Can confirm there was an issue with this (MT and ADM). From looking at the code (from memory) it looked like it used the template record to get initial values and that a minor change to the code could do the same with dynamic references to the table.

Posted by Håvard Danielsen on 29-May-2014 11:58

The term ADM is ambiguous, so to clarify:    

-------------

ADM1 uses  _File._template to find the template record in the database table in order to initialize values for a new record. This was actually a rather common way to generically initialize a new record outside a transaction without doing a create when dynamic code and buffers did not exist.  

One can make the code “MT-compliant” and replace the template usage with dynamic code -  adm/method/tableio.i  adm-add-record.  

ADM1 uses rowids as record identifiers for navigation and update and cannot be considered "partition-safe".

Posted by Mike Fechner on 29-May-2014 12:00

I think the second ADM1 should have been ADM2, right?

Mit freundlichen Grüßen / Best regards,
 
Mike Fechner
Geschäftsführer / Director
 
Consultingwerk Ltd.
Unter Käster 1
50667 Köln / Germany
 
Tel.:  +49 221 / 6 77 88 55 0
Fax.: +49 221 / 6 77 88 55 5
Mobil: +49 171 / 4 357 484
 
 
Experts in OpenEdge & .NET

Am 29.05.2014 um 17:58 schrieb "Håvard Danielsen" <bounce-hdaniels@community.progress.com>:

The term ADM is ambiguous, so to clarify:    

-------------

ADM1 uses  _File._template to find the template record in the database table in order to initialize values for a new record. This was actually a rather common way to generically initialize a new record outside a transaction without doing a create when dynamic code and buffers did not exist.  

One can make the code “MT-compliant” and replace the template usage with dynamic code -  adm/method/tableio.i  adm-add-record.  

ADM1 uses rowids as record identifiers for navigation and update and cannot be considered "partition-safe".

Posted by Håvard Danielsen on 29-May-2014 12:10

No, I meant ADM1.      
 
Edit: This was in response to Mike Fechner, not myself...
 
[collapse]
From: Håvard Danielsen [mailto:bounce-hdaniels@community.progress.com]
Sent: Thursday, May 29, 2014 12:59 PM
To: TU.OE.RDBMS@community.progress.com
Subject: RE: [Technical Users - OE RDBMS] What could be the cause of RECID pointers corruption/missing?
 

The term ADM is ambiguous, so to clarify:    

-------------

ADM1 uses  _File._template to find the template record in the database table in order to initialize values for a new record. This was actually a rather common way to generically initialize a new record outside a transaction without doing a create when dynamic code and buffers did not exist.  

One can make the code “MT-compliant” and replace the template usage with dynamic code -  adm/method/tableio.i  adm-add-record.  

ADM1 uses rowids as record identifiers for navigation and update and cannot be considered "partition-safe".

[/collapse]

Posted by Mike Fechner on 29-May-2014 12:14

Ok. ADM2 RowIdent's do also contain ROWID values.


Am 29.05.2014 um 18:11 schrieb "Håvard Danielsen" <bounce-hdaniels@community.progress.com>:

No, I meant ADM1.       
 
[collapse]
From: Håvard Danielsen [mailto:bounce-hdaniels@community.progress.com]
Sent: Thursday, May 29, 2014 12:59 PM
To: TU.OE.RDBMS@community.progress.com
Subject: RE: [Technical Users - OE RDBMS] What could be the cause of RECID pointers corruption/missing?
 

The term ADM is ambiguous, so to clarify:    

-------------

ADM1 uses  _File._template to find the template record in the database table in order to initialize values for a new record. This was actually a rather common way to generically initialize a new record outside a transaction without doing a create when dynamic code and buffers did not exist.  

One can make the code “MT-compliant” and replace the template usage with dynamic code -  adm/method/tableio.i  adm-add-record.  

ADM1 uses rowids as record identifiers for navigation and update and cannot be considered "partition-safe".

[/collapse]

Posted by Håvard Danielsen on 29-May-2014 13:15

andrew.may wrote:

if so, does anyone know if the frameworks have been changed to be partition-safe (or are partitions just incompatible with ADM framework).

Mike Fechner wrote

Ok. ADM2 RowIdent's do also contain ROWID values.

--------------

ADM2 uses rowids as record identifiers when updating and when batching.  

The update is protected by the optimistic locking checking, so it is unlikely to cause any damage if the rowids have been changed (due to a move). However the optimistic lock check is also using rowids to find the current record(s)… (see fixes below)

Over the decades ADM2 has been extended with request methods that use keys instead of rowids and as of current only batching still relies on rowids. All other requests have been refactored to use key based methods.

Note that the older the ADM2 version is the more likely it is that rowids are used internally. In older versions visual objects also use rowids (concatenated string of rowids) as the identifier to their data sources. (The only practical way to avoid this is to upgrade)      

This means that existing customer code may still use the old rowid based methods.

ADM2  11.4 will have the following fixes to make updates work transparently over a partitioned database.  

Problem:  Errors on subsequent saves after an update that moved the record to another partition.

Fix: The rowids that identifies the database records of an SDO row are refreshed on every save.  

Problem: Record does not exist or is deleted error on save if someone else moved the records (in the SDO join) after you read it.  

Fix: The optimistic locking check will do a second find using keys if the rowid fails in a partitioned database in order to give the correct error. (changed or really deleted)

No plans to fix adm2 batching to use true keys.   

As mentioned the visual objects do not depend on rowids in recent versions. The DataView is a  “rowid-free” alternative that can replace existing SDO and SBO data-sources also if they are batching  (you have to write your own backend though…)      

What you may have to fix yourself  

The SDO (SBO?) refreshRow method is one known case that still uses rowids internally and that does not have an alternative key based method. This is not used by the framework itself, but it is commonly used to refresh a single row for example in cases where the update is done from a different object  and data-source.

We did add a getUseKeyOnRefresh() function in some version 11 release that can be overridden to return true and make refreshRow pass a query expression to the server instead of the rowid(s). Note that this is not backed by a true property, so it has to be overridden in code (cannot be changed in Dynamics repository).

Posted by RWEBSTER on 30-May-2014 05:25

>>ADM2  11.4 will have the following fixes to make updates work transparently over a partitioned database.  

Can the same be said of the flaws you mention for ADM1? Or are there no plans to address these?

Posted by gus on 30-May-2014 13:34

Havard mentions the problem of "errors on subsequent saves after an update that moved the record to another partition". that problem can occur when using table partitioning and you update a row's partition key value such that it has to go to a different partition.

as a rule, that is not a good thing to to be doing but we recognize that there may be special circumstances in which you may have to do it.

moving a row to another partition requires deleting it from its original location along with all its index entries and then creating it anew in another location along with new index entries. this can have a deleterious effect on performance if you do it a lot.

therefore, though table partitioning is intended to be transparent to 4GL applications, when adding new rows, you had should make sure to assign the partition key value(s) before the row gets created.

Posted by Håvard Danielsen on 30-May-2014 14:05

>> Can the same be said of the flaws you mention for ADM1? Or are there no plans to address these?

There are no plans to improve or address these issues in ADM1. The product is way passed the end of its life cycle. I do not know the official term... "mature" maybe?    

Posted by Thomas Mercer-Hursh on 30-May-2014 14:15

"Mature" in the same sense as one might use the term for a very old piece of Limburger? :)

Posted by Håvard Danielsen on 30-May-2014 15:05

>>  "Mature" in the same sense as one might use the term for a very old piece of Limburger? :)

Pretty much.

Posted by ChUIMonster on 04-Jul-2014 15:34

> ... why a find by RECID would be better than a find by GUID

I hate to mention it because it in no way justifies doing it except in *very* limited special cases that probably don't apply anymore and which can usually be solved in other ways anyhow, but for completeness, the traditional reason is that find by RECID bypasses the index.  Find by GUID (or anything other than a RECID) will first find the index and then find the data.

This thread is closed