GUID Globally unique?

Posted by Thomas Mercer-Hursh on 06-May-2015 16:36

In this thread https://community.progress.com/community_groups/openedge_development/f/19/t/17038.aspx?pi20882=3

Mark Opfer claimed that they would occasionally get duplicate GUIDs from different machines.

In this Knowledgebase article https://progress.my.salesforce.com/articles/Article/P117460?popup=true it indicates that the GUIDs are composed of a timestamp, a piece from the MAC address of the machine, and a large random number.

To me, this suggests that two GUIDs produced on the same machine could have the same timestamp and MAC address, with the large random number keeping them unique, whereas two GUIDs produced on different machines would not have the same MAC address and therefore could never be identical.

The knowledgebase article does not specifically address GUID with no arguments, but my understanding is that this is identical in function to GUID(GENERATE-UUID).

Could we get some official confirmation?

And, Mark, if you are listening, could we hear some more specifics?

All Replies

Posted by ntwatkins on 06-May-2015 17:33

If the GUID generation is only using a "piece" of the MAC address, could it not be possible to get duplicate values from different machines?  Would it not depend upon what pieces of the MAC address are being used?

Posted by Matt Gilarde on 07-May-2015 04:27

GUID and GUID(GENERATE-UUID) are equivalent. Both generate GUIDs using the same algorithm.

I haven't stepped through all of the code but from what I have seen we don't use the MAC address. Instead, we use a combination of the timestamp, some information from the system, and random numbers. I'm not in a position to make an official statement, but I believe the implementation does not guarantee uniqueness; rather, it relies on it being statistically improbable that the same GUID will be generated twice.

I, too, would be interested in specifics about a situation in which collisions were found. I believe it's possible that there could be a collision but it should be like winning the lottery three times in a row and not something that happens frequently enough for someone to use the word "occasionally". That suggests that the algorithm may be flawed and would need to be fixed.

Posted by Michael Jacobs on 07-May-2015 04:53

The raw 16-byte UUID generator used for the ABL language GUID function and the OpenEdge database audit trail is an implementation of a Type 3 UUID. For the MD5 source it uses a buffer constructed from the OS hostname, process ID, and current time in microseconds. There are compensations embedded that account for generating more than one UUID per system clock tick as CPU speeds increase, and for handling the overflow if the maximum UUIDs/tick is exceeded. Multi-threaded processes will add mutex locking to minimize collisions during the tracking of how many UUIDs are generated per clock tick.
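As a rough illustration of the scheme described above (not the OpenEdge implementation), here is a minimal Python sketch that hashes hostname, process ID, and a microsecond timestamp with MD5 and stamps the result as a version 3 UUID. The function name `md5_based_uuid`, the `|` separator, and the per-tick sequence counter are assumptions made for the example; the overflow handling mentioned above is not modelled.

```python
import hashlib
import os
import socket
import threading
import time
import uuid

# Module-level state guarded by a lock, standing in for the mutex that
# protects the "UUIDs generated in this clock tick" bookkeeping.
_lock = threading.Lock()
_last_tick = -1
_count_in_tick = 0


def md5_based_uuid(hostname=None, pid=None, now_us=None):
    """Return a version-3-style UUID built from host, process ID, and time.

    All three inputs can be overridden, which makes the sketch easy to test.
    """
    global _last_tick, _count_in_tick
    hostname = socket.gethostname() if hostname is None else hostname
    pid = os.getpid() if pid is None else pid
    now_us = time.time_ns() // 1000 if now_us is None else now_us

    # Assumed compensation: if two requests land in the same tick, mix a
    # sequence number into the hashed buffer so the inputs still differ.
    with _lock:
        if now_us == _last_tick:
            _count_in_tick += 1
        else:
            _last_tick = now_us
            _count_in_tick = 0
        seq = _count_in_tick

    seed = f"{hostname}|{pid}|{now_us}|{seq}".encode("utf-8")
    digest = hashlib.md5(seed).digest()  # MD5 always yields 16 bytes

    # Passing version=3 makes uuid.UUID set the version and variant bits.
    return uuid.UUID(bytes=digest, version=3)
```

Note that two machines with the same hostname whose processes happen to share a process ID and a microsecond timestamp would hash identical input in this sketch, which is one way a hash-of-system-data design can collide.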

Because the algorithms are based on MD5 and system clock data, there may always be a possibility of collision.  There was a bug that was fixed in the initial releases that could infrequently generate a collision.   We have not had a report of UUID collisions in the 11.x release series (that I recall).  If you do find a recurring collision it should be reported.

Hope that information helps.

Posted by mopfer on 07-May-2015 07:56

To answer the request for more specifics of our case: We had a couple of cases where the same GUID value was generated from code that was executing on two separate machines. This was a few years ago and we were using 10.2B at that time.


Our "universe" is the database, so we added a bit of information that we knew was unique to the connected user at the database level to the GUID value that was generated to get a value that we knew would always be unique at the database level, and we haven't had a collision since.
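The idea Mark describes can be sketched in a few lines of Python. This is only an illustration of the approach, not his code: `database_unique_key` and `connection_id` are hypothetical names, and the suffix is assumed to be some value the database guarantees is unique per connected user.

```python
import uuid


def database_unique_key(connection_id, guid=None):
    """Append a database-level unique token to a GUID.

    connection_id is a hypothetical value that is unique per connected
    user; guid defaults to a freshly generated random UUID.
    """
    guid = uuid.uuid4() if guid is None else guid
    # Even if two sessions produce the same GUID, their keys differ
    # because the suffixes differ.
    return f"{guid}-{connection_id}"
```

With this shape, uniqueness within the database rests on the suffix rather than on the GUID generator alone.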


Mark O




Posted by Thomas Mercer-Hursh on 07-May-2015 09:17

So, Mark, is it likely that you encountered the earlier bug and would not need the extra bit now?

Posted by mopfer on 07-May-2015 09:55

It is likely that we encountered the earlier bug.  It sounds like there is still a tiny possibility of a collision though, so we will continue to use the extra bit. 
 
 

Posted by Michael Jacobs on 07-May-2015 14:36

I think the appropriate adage here is: "If it is not broken, do not fix it".   I agree with Mark.

Posted by Thomas Mercer-Hursh on 07-May-2015 14:40

Well, yes, I would hardly expect Mark to go back and undo the fix just to see if it was still needed.  The real question is for the rest of us and whether we need to apply Mark's caution or can rely on GUID.

Posted by Stefan Drissen on 07-May-2015 14:56

Note also that the GUID function was broken in 11.2 and 11.3, see:

community.progress.com/.../49033.aspx

and

community.progress.com/.../8986.aspx

Posted by Thomas Mercer-Hursh on 07-May-2015 15:05

So, the question is, given post-11.3 or whatever qualification is required, how guaranteed is uniqueness on one platform and across multiple simultaneous platforms? Do we need to be adding something like Mark did ... and, if so, why doesn't PSC add it for us?

Posted by TheMadDBA on 07-May-2015 15:13

Even with the best GUID/UUID logic there is a chance of collisions. For most versions the chance is smaller than somebody winning the lottery several hundred (or thousand) times in a row, but it is still possible.
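To put a number on "still possible", the usual birthday approximation can be computed directly. The sketch below assumes an ideal generator whose 122 free bits (a 128-bit UUID minus the version and variant bits) are uniformly distributed, which a real implementation may not achieve; `collision_probability` is a name made up for the example.

```python
import math


def collision_probability(n, bits=122):
    """Approximate chance of at least one duplicate among n values.

    Uses the birthday approximation p = 1 - exp(-n(n-1) / (2 * 2**bits)),
    assuming every value is drawn uniformly from 2**bits possibilities.
    """
    exponent = n * (n - 1) / (2 * 2 ** bits)
    # expm1 keeps precision when the exponent is extremely small.
    return -math.expm1(-exponent)


# One billion ideal UUIDs: roughly a 1-in-10**19 chance of any duplicate.
p = collision_probability(1_000_000_000)
```

A flawed generator, such as the buggy releases mentioned earlier in the thread, breaks the uniformity assumption and can collide far more often than this bound suggests.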

You can add random user/session specific things to the GUID and still theoretically have collisions. Still exceedingly rare but possible.

If you are planning on consolidating data between databases, it would probably be a good idea to add a site component to the GUID or an extra site column. If you aren't, then standard error handling on inserts (catching the unique-key violation and retrying with a new GUID) would probably be fine.
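Both suggestions can be shown together in a small Python sketch using the standard `sqlite3` module as a stand-in database. The table layout, the `site_id` column, and the helper names are assumptions for illustration only; an ABL application would rely on a unique index and its own error handling instead.

```python
import sqlite3
import uuid


def create_table(conn):
    """Create a table whose key is the (site, GUID) pair."""
    conn.execute(
        "CREATE TABLE records ("
        " site_id TEXT NOT NULL,"
        " guid TEXT NOT NULL,"
        " payload TEXT,"
        " PRIMARY KEY (site_id, guid))"
    )


def insert_with_retry(conn, site_id, payload, new_guid=uuid.uuid4, max_attempts=3):
    """Insert a row, generating a new GUID if the key already exists."""
    for _ in range(max_attempts):
        guid = str(new_guid())
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO records (site_id, guid, payload) VALUES (?, ?, ?)",
                    (site_id, guid, payload),
                )
            return guid
        except sqlite3.IntegrityError:
            # Duplicate (site_id, guid): try again with a fresh GUID.
            continue
    raise RuntimeError("could not generate a unique GUID")
```

Because the key includes the site, rows from two databases that happen to share a GUID can still be consolidated without conflict, and a duplicate within one site costs only a retry.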

This thread is closed