Uniqueness of UUIDs

Posted by jtownsen on 30-Apr-2007 11:55

OpenEdge 10 offers a great new function, GENERATE-UUID, to generate UUIDs. Someone once told me that the 16 bytes generated are universally unique - ever. Now, I'm a trusting kind of a guy and I believe it, but what do we need to do to get a usable UUID in Progress?

For me, I like IDs that I can read. We can base64 encode the 16 bytes, ending up with a 22 byte value (once the trailing "=" padding is dropped) made up of 64 different characters: A-Z, a-z, 0-9, + and /.
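To make the size arithmetic concrete, here is a minimal sketch in Python rather than ABL. It assumes a random 16-byte value from Python's `uuid` module as a stand-in for the output of GENERATE-UUID; the helper names are made up for illustration.

```python
import base64
import uuid


def to_base64_id(raw: bytes) -> str:
    """Base64 encode 16 raw bytes and drop the two '=' padding characters."""
    encoded = base64.b64encode(raw).decode("ascii")  # 24 characters, ends in "=="
    return encoded.rstrip("=")                       # 22 characters


def from_base64_id(text: str) -> bytes:
    """Put the padding back and decode to the original 16 bytes."""
    return base64.b64decode(text + "==")


raw = uuid.uuid4().bytes  # stand-in for the 16 bytes from GENERATE-UUID
short_id = to_base64_id(raw)
print(len(short_id), short_id)
```

Sixteen bytes leave one byte over after the last full group of three, which is why the padded form is always 24 characters ending in "==" and the stripped form is always 22.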

Now since I'm working on a WebSpeed project and I want to pass IDs in a query string, the + and / cause me a couple of problems. I thought about substituting these characters with something that doesn't cause URL-encoding issues and, while I was thinking about it, realised that for a UUID, "a" <> "A".
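Both points can be sketched in a few lines of Python (not ABL). The sketch assumes the common "URL-safe" base64 variant, which swaps + for - and / for _; the helper names and the two sample byte strings are invented for the demonstration.

```python
import base64


def to_urlsafe_id(raw: bytes) -> str:
    """URL-safe base64 ('-' and '_' instead of '+' and '/'), padding removed."""
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def from_urlsafe_id(text: str) -> bytes:
    """Restore the padding and decode back to 16 bytes."""
    return base64.urlsafe_b64decode(text + "==")


# Two different 16-byte values whose encoded IDs differ only by case.
value_a = bytes(16)                  # sixteen zero bytes
value_b = bytes([0x68]) + bytes(15)  # first byte chosen so the first character is 'a'

id_a = to_urlsafe_id(value_a)
id_b = to_urlsafe_id(value_b)

print(id_a)
print(id_b)
print(id_a != id_b, id_a.lower() == id_b.lower())
```

A case-insensitive comparison would treat these two IDs as equal even though they come from different raw values, which is exactly the problem described above.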

That means that throughout my whole application, whenever I work with a base64 encoded UUID, I'm going to have to do some case sensitive checking. In addition, I'm going to have to enable case sensitivity for the ID fields in my database. Of course, I could do that, but I'm essentially lazy.

If we hex-encode the 16 bytes, we end up with 32 bytes consisting only of the characters 0-9 and A-F. This solves my case sensitivity issues at the expense of space in the database tables, indices, database buffers, client memory, network packets, etc, etc.
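As a rough illustration in Python (not ABL), again assuming a random 16-byte value as a stand-in for GENERATE-UUID, the hex form is twice the length of the raw value and decodes the same way whichever case it arrives in:

```python
import uuid

raw = uuid.uuid4().bytes      # stand-in for the 16 bytes from GENERATE-UUID
hex_id = raw.hex().upper()    # 32 characters, digits 0-9 and letters A-F only

# Decoding ignores case, so "a" and "A" name the same value here.
same_from_upper = bytes.fromhex(hex_id)
same_from_lower = bytes.fromhex(hex_id.lower())

print(len(hex_id), hex_id)
```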

What if we went with a base32 encoded value? We should end up with 26 bytes consisting of something readable. A quick search shows the following page: http://www.crockford.com/wrmg/base32.html
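A minimal Python sketch (not ABL) of what that could look like with the alphabet from the page above. It assumes the 16 bytes are treated as one 128-bit big-endian number with two leading zero bits to fill out 26 five-bit characters; that padding choice and the function names are assumptions for illustration, not something the page prescribes for UUIDs.

```python
import uuid

# Crockford's alphabet: digits plus letters, with I, L, O and U left out.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_base32(raw: bytes) -> str:
    """Encode 16 bytes as 26 characters, 5 bits per character."""
    number = int.from_bytes(raw, "big")
    chars = []
    for _ in range(26):
        chars.append(ALPHABET[number & 31])  # lowest 5 bits first
        number >>= 5
    return "".join(reversed(chars))


def decode_base32(text: str) -> bytes:
    """Decode case-insensitively, reading O as 0 and I or L as 1."""
    cleaned = text.upper().replace("O", "0").replace("I", "1").replace("L", "1")
    number = 0
    for ch in cleaned:
        number = number * 32 + ALPHABET.index(ch)
    return number.to_bytes(16, "big")


raw = uuid.uuid4().bytes  # stand-in for the 16 bytes from GENERATE-UUID
text_id = encode_base32(raw)
print(len(text_id), text_id)
```

Because the alphabet has a single case and no punctuation, the result needs no URL encoding and survives a case-insensitive comparison, at a cost of 4 characters over base64 and a saving of 6 over hex.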

Obviously there are advantages and disadvantages of any approach. I'm interested in any thoughts.

All Replies

Posted by Thomas Mercer-Hursh on 30-Apr-2007 12:15

Myself, I don't see much reason not to use the native form internally and only go to some alternate form if you need to expose them in a web context or somesuch where the character set is going to be a problem. There, it seems like the base 32 solution is attractive for the reasons stated. If you have a need to visualize it elsewhere, you can always use the same routine to make it visual. But, it certainly isn't going to be easy to pronounce or even transcribe, no matter what you do.

Posted by jtownsen on 03-May-2007 06:29

The problem with trying to use the "native" form of a UUID is that it's a 16 byte raw value. Since you can't index raw fields in the Progress DB, you need to make it into something else to be useful. Then comes the question posed above.

I agree that none of the options will give you something that is really easily readable, but I think the question still stands. Since this particular application is a WebSpeed app, should we bite the bullet with the size of the key so that we don't ever have to worry about URL encoding/decoding them, or should we go with two different values - one internal (base-64 encoded, case sensitive) and one external (base-32 or base-16 encoded)?

If we go with different internal and external values, at which level should we do something like this? We could store base-64 in the database and convert to base-32/16 in the DAO, but we haven't really saved any space except in the database. All client memory, temp-tables, network traffic, etc. still have the longer values.

We could do the conversion in the "client" code, just before sending to the web stream or just after getting a response, but then we might as well just url-encode/decode the values. Oh, and then we need to program the whole application logic case sensitive.

Posted by fbe on 03-May-2007 06:57

Maybe it's just me, but what is wrong with using the 36 byte GUID that you can create out of the raw UUID value?

GUID(GENERATE-UUID)

This will get you a 'readable' string based on your raw UUID. It only has hexadecimal digits and '-' in it, so none of your '+/' etc. characters. You will add some bytes compared to the 22 bytes using base64 but none of the problems exist.

Do you think having the extra byte load in the stuff you mentioned (like client memory) is going to be an issue? You could take out the '-' characters and save an additional 4 bytes so ending up with your hex-encode length...
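The length arithmetic can be checked with a short Python sketch. Python's `uuid` module is only a stand-in here: it produces a string of the same 8-4-4-4-12 shape, but nothing below is a claim about how ABL's GUID function lays out the bytes.

```python
import uuid

raw = uuid.uuid4().bytes                # stand-in for the 16 bytes from GENERATE-UUID
guid_text = str(uuid.UUID(bytes=raw))   # 36 characters: 32 hex digits plus 4 hyphens
compact = guid_text.replace("-", "")    # 32 characters, same as plain hex encoding

# The hyphens carry no information, so the raw value is still recoverable.
restored = uuid.UUID(compact).bytes

print(len(guid_text), guid_text)
print(len(compact), compact)
```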
