GUID/sequence

Posted by agent_008_nl on 22-Mar-2012 10:21

Some background info: http://www.codinghorror.com/blog/2007/03/primary-keys-ids-versus-guids.html and a thead on the peg (see below the sig).

I'm interested to use the GUID as primary unique key, because of the pro's. What I forgot to mention in the thread on the peg is that a sequential number, f.e. for an ordertable is useful/needed. You can make a PUK (orderid field) and assign a GUID to it, and make the index on the ordernumberfield unique

(and assign it with a sequence). Expect the create of an order to become a bit slower and queries with joins on the GUID fields ditto. But that could be acceptable. Thoughts?

Regards, Stefan.

guid vs sequence
peg
x

Stefan Houtzager
Jan 30
to peg
Hi,

OE 10.2B04/11. Using Java ZK framework as front-end. In new
applications I often see GUID's being used as unique identifiers for
records. What would be good reasons to abandon the use of sequences
and replace them with guid's?
Julian Lyndon-Smith julian@lyndon-smith.com
Jan 30
to me, peg
One of the reasons is if you ever transfer data from one system to another there will be no clash in id's.
Another reason is that you don't have to create a sequence for every table that has a unique key requirement
Scott Augé sauge@amduus.com
Jan 30
to me, peg
Sequences are easily guessed.  GUIDs are not.

Sequences can be screwed up on db reloads.  GUIDs are not.
Julian Lyndon-Smith julian@lyndon-smith.com via peg.com
Jan 30
to peg
Antanas Kompanas a.kompanas@baltic-amadeus.lt via peg.com
Jan 30
to peg
On 01/30/12 14:15, Stefan Houtzager wrote:
> What would be good reasons to abandon the use of sequences
> and replace them with guid's?

The only good reason to use guid as primary key is when you somehow need
to merge the data into one table - that is, when you essentially have a
distributed database. Other that that, guid's are hard to read/write,
need additional storage and take more time to generate.
Mike Fechner mike.fechner@consultingwerk.de
Jan 30
to me, peg
A GUID can be created on the client.

Plus all the other arguments.

-----Ursprüngliche Nachricht-----
Von: postmaster@peg.com [mailto:postmaster@peg.com] Im Auftrag von Stefan Houtzager
Gesendet: Montag, 30. Januar 2012 13:15
An: peg@peg.com
Betreff: guid vs sequence
Antanas Kompanas a.kompanas@baltic-amadeus.lt via peg.com
Jan 30
to peg
On 01/30/12 14:29, Scott Augé wrote:
> Sequences are easily guessed.  GUIDs are not.

I wouldn't be so sure. GUIDs are unique, but they are not necessarily
hard to guess for a given system.

For example, if I generate my IDs using my ethernet MAC address and a
sequence, I get globally unique values but they are completely
predictable after some recent IDs have been sampled.
Julian Lyndon-Smith julian@lyndon-smith.com via peg.com
Jan 30
to peg
we just use

guid(generate-uuid)

cannot guess a pattern. However, surely guessing a sequence is a
*whole* lot easier

1
2
3
*wonder what comes next*

Antanas Kompanas a.kompanas@baltic-amadeus.lt via peg.com
Jan 30
to peg
On 01/30/12 15:11, Julian Lyndon-Smith wrote:
> we just use
>
> guid(generate-uuid)
>
> cannot guess a pattern.

Run this many times in a tight loop and see how many bits actually change.

> However, surely guessing a sequence is a
> *whole* lot easier
>
> 1
> 2
> 3
> *wonder what comes next*
>
>

What I wanted to say is that no-one should believe that GUIDs are
automatically a lot safer in this respect. I can just as well guess
(more or less, with a bit of luck) what comes next with this:

e0fb87f6-59f8-04a0-e111-454b9e2ac94f
e0fb87f6-59f8-04a0-e111-454b5492c94f
e0fb87f6-59f8-04a0-e111-454b5cadc94f

Depending on the circumstances, it might be just as easy as 1 2 3.
Gus Bjorklund gus@progress.com via peg.com
Jan 30
to peg
why does it matter if identifiers can be guessed?

if it is for some security reason, then both are the wrong solution.

-gus
Tom Bascom tom@wss.com via peg.com
Jan 30
to peg
On 1/30/12 7:44 AM, Antanas Kompanas wrote:
On 01/30/12 14:15, Stefan Houtzager wrote:
What would be good reasons to abandon the use of sequences
and replace them with guid's?
The only good reason to use guid as primary key is when you somehow need
to merge the data into one table - that is, when you essentially have a
distributed database. Other that that, guid's are hard to read/write,
need additional storage
So far, so good...

and take more time to generate.
One should always test performance assertions

I find that guid( myUUID ) is 30x-40x faster than next-value( someSequence )...
None the less -- I like sequences in many cases.  Being able to get a quick estimate of how rapidly something is growing simply by watching the current-value can be quite handy.
Gregory Higgins ablsaurusrex@gmail.com
Jan 30
to me, peg
A sequence implies an ordering that may not be wanted. A guid can
remove the implied ordering.

On Mon, Jan 30, 2012 at 7:15 AM, Stefan Houtzager
<stefan.houtzager@gmail.com> wrote:
> Hi,
>
> OE 10.2B04/11. Using Java ZK framework as front-end. In new
> applications I often see GUID's being used as unique identifiers for
> records. What would be good reasons to abandon the use of sequences
> and replace them with guid's?
>

--
I'm all-in with PPA!
http://theppa.org
Gregory Higgins ablsaurusrex@gmail.com
Jan 30
to me, peg
Also, a guid allows any number of sources to be a primary key generator.
Scott Augé sauge@amduus.com via peg.com
Jan 31
to peg
I agree with this.

For stuff that will show up on web pages and other dangerous areas (don't trust those web services either!) identifying a record, I use:

http://www.oehive.org/amduus/ServiceExpress/srvexp/src/MakeID2.i

with the date and time postfixed.
Scott Augé sauge@amduus.com via peg.com
Jan 31
to peg
Yes, you are correct.   As noted in another email that just went out, I usually uniquely id a record going into the public with a very random string (with 30 or more characters on it).

For really important records, the random string seen in public relates to a web state record with the real unique identifier on it tucked away in the database - thus the random string seen in public is throw away, something like:

FIND WebState NO-LOCK
WHERE WebState.UserCookie = GET-VALUE("AppCookie")
AND WebState.WebLookUpData = GET-VALUE("ThrowAway")
NO-ERROR.

...

FIND RealRecord EXCLUSIVE-LOCK
WHERE RealRecord.RID = WebState.DBLookUpData
NO-ERROR.


I was thinking maybe GUIDs can be used perhaps the same way, but the evidence is against it.
Stefan Houtzager
Feb 1
to Gregory, peg, bcc: engineers, bcc: Lorens.Zhou, bcc: Richard.vanWuy.
Thanks all!

Another reason I can think of is that sequeces are specific for the
progress db. If you want to use another datastore you have to use
something else then next-value(<sequencename>). The GUID can be used
for all datastores (except maybe one from before the second worldwar).
Antanas Kompanas a.kompanas@baltic-amadeus.lt via peg.com
Feb 1
to peg
On 02/01/12 10:47, Stefan Houtzager wrote:
> Another reason I can think of is that sequeces are specific for the
> progress db. If you want to use another datastore you have to use
> something else then next-value(<sequencename>). The GUID can be used
> for all datastores (except maybe one from before the second worldwar).

I don't think sequences or some form of them (like serial or autonumber
fields) are not available in other databases. If I had to guess you
would find sequences in those DBs "from before the second worldwar",
although that would not be so true about GUIDs. Mainly because they take
relatively a lot of space to store.
Stefan Houtzager
Feb 1
to Antanas
Could be, but you would have to write separate code for that other
datastore. And then there is the unmanaged datastore...
Darrell Davis ddavis.gso@gmail.com via peg.com
Feb 1
to peg
>> Another reason I can think of is that sequeces are specific for the
>> progress db. If you want to use another datastore you have to use
>> something else then next-value(<sequencename>). The GUID can be used
>> for all datastores (except maybe one from before the second worldwar).
>
> I don't think sequences or some form of them (like serial or autonumber
> fields) are not available in other databases. If I had to guess you
> would find sequences in those DBs "from before the second worldwar",
> although that would not be so true about GUIDs. Mainly because they take
> relatively a lot of space to store.

I believe that the he meant the sequence is going to be unique only in
the progress db that is the source of the sequence.  If you are
storing data in multiple db's, or the data is sourced from multiple
sources, a guid is unique across disparate systems while a sequence is
only unique if you get it from the source db first.

Hope I said that clearly.

Darrell Davis
RHCE / System Analyst / Progress Enthusiast

717/884-9231 office
336/324-0598 cell
Stefan Houtzager
Feb 1
to Darrell, peg
That is said clearly to me at least and it was (implicitly) said by
someone else before in this thread as a good argument for the use of
the GUID, but it is not what I meant. I mean that NEXT-VALUE in your
DA layer or databasetrigger means that you are tied to the progress
type of db. But of course you could use another way of numbering for
other datastores. For me the con of the space a guid needs does not
make a heavy weight on the balance (that now leans to the GUID).
Richard Kelters richard.kelters@gmail.com
Feb 1
to peg
I believe from the DB point of view there's a penalty of using a character datatype instead of an integer when a field is used in an index.

Richard

Op 1-2-2012 16:24, Stefan Houtzager schreef:
Gregory Higgins ablsaurusrex@gmail.com via peg.com
Feb 1
to peg
Systems are fast, space is cheap. There is some extra overhead I
supposed inserting a well constructed unique identifier into the
database, maybe the index pages get used more, but let's be honest,
it's probably less expensive than the appserver call you made to get
the data to the data base.
Antanas Kompanas a.kompanas@baltic-amadeus.lt via peg.com
Feb 2
to peg
On 02/01/12 17:50, Gregory Higgins wrote:
> Systems are fast, space is cheap.

Every system has its limits. When every programmer and architect starts
thinking this way one eventually ends up with a system that's neither
fast, nor cheap.
Scott Augé sauge@amduus.com via peg.com
Feb 2
to peg
On Feb 2, 2012, at 4:06 AM, Antanas Kompanas wrote:

> On 02/01/12 17:50, Gregory Higgins wrote:
>> Systems are fast, space is cheap.
>
> Every system has its limits. When every programmer and architect starts
> thinking this way one eventually ends up with a system that's neither
> fast, nor cheap.
>
>

We are not using 486's anymore either  ... but this is a good conversation to have.  Allow me to prattle on like someone who has stayed up all night (which I have.)

I have certainly seen some bass ackward code written because it was "faster."  Often, it wasn't.  For those rare exceptions, it wasn't as fast as someone thought it would be.

Then came those scenarios based on use - and often, it didn't matter to the user that the machine spent some more CPU cycles between times it sat there waiting for something to do.

Actually, often users are interested in seeing the code actually do something (right) - because often having a computer do it is a multitude of times faster than a human doing it - and allows for longer coffee breaks.  Even if the code was "slow" (judged by us old 8 bit assembly programmers who smashed code and data into 16,000 bytes), it is still much faster for the users anyhow.

What did happen, is that programmers spent overtime and thousands of dollars trying to figure out the "short cut" someone made because it pulled a few micro-seconds out of code execution, and being bass ackwards contained multiple defects on the boundaries, in re-use, or in extension of purpose.

This of course, pulls in the second no-no of software development - delays.  What should have been "a few minutes to do" turns into a big long frustrating battle with sh!t code written by people with no concept of coupling or cohesion.  (Just speaking from experience - not saying your's is!)  Having experience with C programming, I can certainly remember where people got way out of their league with "faster code" (regardless of if it is "correct code.")  (Entertainment: http://www.ioccc.org/)  I have seen some really horrible "tuned code" in Progress apps also.

So one spends thousands of dollars and people's time in development, testing, and deployment to shave 120 seconds a day off a program.  I can almost guarantee you the finance people, the people who pay for it, prefer "it just does something."  And if they found out they spent thousands of dollars for 120 seconds... ooooooo.

Most of us figure out a way for mono-threaded Progress to open up a process behind the scenes (aka a thread-like library) to grind through something while still allowing the user to move onto the next thing.  (Aka my Zeno project: http://www.oehive.org/node/1008)  Because lets face it, a lot of times there is another core sitting there twiddling it's thumbs for something to do.

Lets look at Facebook - PHP wasn't cutting it anymore.  They switched to automatic PHP -> C++ code conversion (https://developers.facebook.com/blog/post/358/) and compiled C++. (Hmmm, an ABL to C++ project?)

Most places Progress is used (and I have used it in some pretty big companies with pretty big user counts) only wishes they had Facebook problems.

Quite often, the solution is a faster computer and/or another computer in the pool (think AppServers/behind the scene processes) for a couple thousand bucks with clear, re-usable, and extensible code.

I now step off the soap box waving my hands around. 


Scott Augé
President
Amduus Information Works, Inc.
http://www.amduus.com

http://www.linkedin.com/in/scottauge
Twitter: ScAuge

Store Front: http://stores.lulu.com/store.php?fAcctID=2520351

PLEASE CONSIDER OUR ENVIRONMENT BEFORE PRINTING THIS EMAIL
Scott Augé sauge@amduus.com via peg.com
Feb 2
to peg
LOL - I am a Dr. Pepper man - caffeine caffeine caffeine - and the bathroom breaks associated with that!

Tonight I have been contemplating many things... 

On Feb 2, 2012, at 5:15 AM, Marian Edu wrote:

> Well said Scott, what have you been doing all night long and more important... what is that thing that you were using (smoking???)
>
> Greg should probably think of a way to add that 'like' button
Gregory Higgins ablsaurusrex@gmail.com via peg.com
Feb 2
to peg
Seriously? Limits? Have you been in the cloud lately? The most likely
limitation to any system today is an inadequacy of imagination by its
architect.
Antanas Kompanas a.kompanas@baltic-amadeus.lt via peg.com
Feb 2
to peg
On 02/02/12 12:49, Gregory Higgins wrote:
> Seriously? Limits? Have you been in the cloud lately?

Not everyone is happy enough to reach the clouds. Some of us here need
to write apps for earth.

Seriously, clound services are still too expensive (budget limit) and
have the unfortunate property of being remote (internet availability,
latency and throughput limit).
Scott Augé sauge@amduus.com via peg.com
Feb 2
to peg
Not to mention interesting legal principles and liabilities... especially if housed in a foreign country.
Rick Terrell terrell.rick@gmail.com via peg.com
Feb 2
to peg
Not to mention also, that some customers absolutely will not put their
data elsewhere (like Law Enforcement Agencies)...

All Replies

Posted by Stefan Drissen on 22-Mar-2012 17:16

My primary objection against using technical keys is that when there is a functional key available (invoice number, debtor number, g/l account) that any table referencing this data must always join to referential table to be able to show or sort on anything meaningful. Imagine a g/l transaction table with debtor numbers and invoice numbers. The query for showing the results sorted by debtor by invoice number is simply:

FOR EACH gltrans

   WHERE ...

BY debtor BY invoice_nr:

   ...

END.

When using technical keys this becomes:

FOR EACH gltrans

   WHERE ...

, debtor

   WHERE debtor.guid = gltrans.debtor_guid

, invoice

   WHERE invoice.guid = gltrans.invoice_guid

BY debtor.debtor_no BY invoice.invoice_no:

   ...

END.

I get the impression that result set based databases (SQL Server / Oracle) are very good at doing this. I am not so sure about ABL.

Posted by Thomas Mercer-Hursh on 22-Mar-2012 17:38

Well, yes, but the minute you want the name or default sign or anything else out of the account, then you have to do the join anyway.  The only time you can avoid the join is when the simple account number is all you want and you need no more information.

One of the big attractions of *not* using functional keys is that it is much more forgiving about mistakes.    Assign the wrong customer number, create a few dozen transactions, and then fixing your mistake becomes "exciting" if you have used the customer number as the link.  But, used a GUID as a link, then one can just change the number and all is well.

The big attraction of GUID over sequence is that GUID is unique to the record across all tables.  Sequence is not.  To be sure, there is no sensible link in which one would confuse the value across tables, but using GUID it is only going to be a link if it is a valid link.

Posted by Stefan Drissen on 22-Mar-2012 17:52

tamhas wrote:

Well, yes, but the minute you want the name or default sign or anything else out of the account, then you have to do the join anyway.  The only time you can avoid the join is when the simple account number is all you want and you need no more information.

I mentioned two issues. For referential data, yes a join is necessary - and I have no problems with that. The problem I do have is with sorting.

tamhas wrote:

One of the big attractions of *not* using functional keys is that it is much more forgiving about mistakes.    Assign the wrong customer number, create a few dozen transactions, and then fixing your mistake becomes "exciting" if you have used the customer number as the link.  But, used a GUID as a link, then one can just change the number and all is well.

True, but once again when sorting it gets messy. It becomes a balancing act between how often do you need to report and how often do you need to fix mistakes.

For referntial links I am completely fine and can admire the virtue of a technical key. I am however very worried about the performance impact on sorting and selection. In the one key sort you can take your referential data to the outside part of the query and inner join to your transaction table, but sort on two or more referential technical keys and you are stuck with having to scan through all your transactions to get transactions on debtors between 1000 and 4000 and g/l accounts between 8000 and 8070 (for example).

If this concern can be taken away - I would be happy to hear it!

Three of our corporate products, which are SQL Server only, do all this guid joining. But it's SQL Server. It's a result set based database, not a record based database.

Sure, we need complex recoding logic. But recoding is the exception.

Posted by Thomas Mercer-Hursh on 22-Mar-2012 18:08

Sorting and selection are partly a question of how one roles it... 

For starters, one might wonder in what context is a range of account numbers a genuinely interesting thing.

And, of course, there is the direction of the join.   If one wants all transactions within a certain account number range, as one might doing the general ledger, for example, one *could* start with the transactions and that would work efficiently only if the account number itself was in the transaction record.  But, in a general ledger, one *is* going to want to display things like the account name normal CR/DR etc. so one is going to need the join anyway.  Knowing that one needs the join, one can select based on the account table and join efficiently to the transaction table on the GUID.

Posted by Stefan Drissen on 22-Mar-2012 18:16

tamhas wrote:

Sorting and selection are partly a question of how one roles it... 

For starters, one might wonder in what context is a range of account numbers a genuinely interesting thing.

Up till know our customers think multiple selections are genuinely interesting, so I do not need to wonder about this. Please show me all g/l transactions on g/l accounts x thru y, project a thru c, cost center e and f, cost categories a and b.

tamhas wrote:

And, of course, there is the direction of the join.   If one wants all transactions within a certain account number range, as one might doing the general ledger, for example, one *could* start with the transactions and that would work efficiently only if the account number itself was in the transaction record.  But, in a general ledger, one *is* going to want to display things like the account name normal CR/DR etc. so one is going to need the join anyway.  Knowing that one needs the join, one can select based on the account table and join efficiently to the transaction table on the GUID.

Which is fine for one selection / sorting criterium as I mentioned in my previous post, but fails miserably for multiple selection / sorting criteria. There is a big difference in joining to simply get additional referential data and sorting on primary data.

Posted by Thomas Mercer-Hursh on 22-Mar-2012 18:59

Given the context of reporting such as you describe, it is hard for me to imagine how one avoids the join.  If one has the join, it is a problem in query optimization.

Posted by agent_008_nl on 23-Mar-2012 02:26

I find the possibility to assign the PUK on the client (java/.net/progress) a big plus too. But I would be interested in some performanceconsiderations, did anyone of you do some tests?

Posted by ujj1 on 14-Oct-2014 12:50

FWIW: We talked about a GUID as part of the Progress Exchange 2014 "InfoExchange" session.  Possible outcome could be a Guid datatype (16 byte binary) to optimize storage.  Currently we store a guid in a character string which would be 32bytes (with the dashes removed).  There was talk about sequential GUIDs also to optimize using as a database key field.

I'm going to create a Progress "Idea" Enhancement so feel free to vote/comment on that thread.

community.progress.com/.../abl_datatype_-_guid.aspx

Posted by agent_008_nl on 15-Oct-2014 01:06

The time I had a look I found the sequential GUIDs most interesting. For those new to the matter: www.codeproject.com/.../GUIDs-as-fast-primary-keys-under-multiple-database

Posted by Mike Fechner on 15-Oct-2014 01:12

Consider this as an attempt to have a serious discussion with Mr. Houtzager.:
 
Have you tried to implement this in ABL already in your framework? How is the runtime of the GUID generator compared to the plain GUID function?

 

Posted by agent_008_nl on 15-Oct-2014 01:39

Mike,

I leave it to others to see your attempt as serious or not (in case not they might not dare to say so). I for myself think you're far too smart to ask this seriously. You try to test my knowledge I think. That's ok, but I think what I can answer (clustered indexes do not exist in the openedge db, some cut and paste comments of others like "Is the slower creation a concern?  Personally, I find it acceptable.  Unless your application involves very frequent inserts (in which case a GUID key may not be ideal for other reasons), the cost of occasional GUID creation will pale in comparison to the benefits of faster database operations." would not add much to the discussions that have taken place already on the peg and at exchange (I assume). Besides I'm not a db guru. You better discuss this with f.e. Gus and Tom.

Posted by Mike Fechner on 15-Oct-2014 01:45

Believe it or not – I am really curious to see it this has real benefit.
And anybody following the ABL documentation should be able to use the C# solution from within the ABL – hence my interest if it was tried already.

Posted by agent_008_nl on 15-Oct-2014 02:10

According to what is written in the link I sent: www.codeproject.com/.../GUIDs-as-fast-primary-keys-under-multiple-database you can expect the comb guid creation to be a bit slower than the plain guid as we have it now. The clustered indexes would be needed for the real benefits: faster reads. I'll copy a discussion from the peg, you can find more info there. And maybe some who involved in the discussion at exchange want to explain more.

Re: clustered indexesImportant mainly because of your interaction with messages in the conversation.

Click to teach Gmail this conversation is not important.

pegx

Tom BascomOn 6/4/14, 2:39 AM, Stefan Houtzager wrote: > > Has the possibity to mark an ...

Jun 4

On 6/4/14, 2:39 AM, Stefan Houtzager wrote: > > Has the possibity to mark an ...

Tom Bascom  

Jun 4

to peg

On 6/4/14, 2:39 AM, Stefan Houtzager wrote:

Has the possibity to mark an index as clustered in a progress db been

considered bij psc by the way?

A clustered index is one where the data is arranged in index order on disk.  (The name strikes me as misleading -- "clustered data" seems more appropriate...)

There can only be one such index per table because the data can only be in one "physical" order.

When reading data in the clustered index order you can mostly skip reading the index -- so it is much faster (sort of like the oe11 TABLE-SCAN option except that the data is ordered).

But when writing it can be *very* expensive -- if you insert a record in the middle everything above it has to be shifted.

Personally I don't see the feature as being very attractive.  I suspect that it gets a lot of attention because in the Microsoft world they get created automatically and they are a way to do fast table scans.  So it is "received wisdom" that clustered indexes are an important feature.  Somewhat like the idea that fields should have a  known width that is never exceeded ;)

The ordering part is, maybe, occasionally useful but the write-performance trade-off seems far too high to me.  I also suspect that a great many table scans either don't care about the ordering or will need a sort order different from such a key.  My understanding is that with MS SQL these are often found in association with the "primary" key which in most Progress databases that I have seen is probably not the order that you are likely to want data in.  Lastly -- if a GUID is used for the key to cluster around it would likely be an even less useful sort order.  Even if the GUID were to be a monotonically increasing sort order the replication/merge use-case, which IMHO is the strongest reason to consider GUIDs as keys, makes the order attribute basically useless.

One Progress-specific issue that immediately springs to mind when thinking about Progress is that RECIDs would become volatile even within a session (not just across dump & load etc).  You could read a record, someone could insert another record in the table, and your RECID would no longer point to what it used to point to.

--

Tom Bascom

Reply Reply to all Forward

Rob FitzpatrickFollowing that train of thought re volatile recids, if an insert of one recor...

Jun 4

Following that train of thought re volatile recids, if an insert of one recor...

Rob FitzpatrickJun 4

Loading...

Rob Fitzpatrick  

Jun 4

to peg

Following that train of thought re volatile recids, if an insert of one

record caused a change in the recids of all the following records (in

clustered index order) then wouldn't any index entries for those records

in other indexes on that table also require the updated recids?  If so I

wonder how SQL Server deals with that.  Or are their data structures and

physical storage layout so different from Progress' that the question

isn't meaningful?

-----Original Message-----

From: Tom Bascom

One Progress-specific issue that immediately springs to mind when

thinking about Progress is that RECIDs would become volatile even within

a session (not just across dump & load etc).  You could read a record,

someone could insert another record in the table, and your RECID would

no longer point to what it used to point to.

Reply Reply to all Forward

Gus Bjorklundwhat makes you guys think that everything else moves when you add a row to a ...

Jun 4

what makes you guys think that everything else moves when you add a row to a ...

Gus BjorklundJun 4

Loading...

Gus Bjorklund

Jun 4

to peg

On 6/4/14 9:17 AM, "Rob Fitzpatrick"  wrote:

>Following that train of thought re volatile recids, if an insert of one

>record caused a change in the recids of all the following records (in

>clustered index order) then wouldn't any index entries for those records

>in other indexes on that table also require the updated recids?  If so I

>wonder how SQL Server deals with that.  Or are their data structures and

>physical storage layout so different from Progress' that the question

>isn't meaningful?

>

what makes you guys think that everything else moves when you add a row to

a table that has a clustered index ?

-gus

Reply Reply to all Forward

12 older messagesTim KuehnFrom the definition - "data is in index order on disk"

Jun 4

From the definition - "data is in index order on disk"

Tim KuehnJun 4

Loading...

Tim Kuehn

Jun 4

to peg

On Wed, Jun 4, 2014 at 9:45 AM, Gus Bjorklund wrote:

> what makes you guys think that everything else moves when you add a row to

> a table that has a clustered index ?

From the definition - "data is in index order on disk"

--

Tim Kuehn:  Senior Consultant  - TDK Consulting Services

Ontario PUG President

PUG Challenge Americas Executive, Program Committee Chair

Reply Reply to all Forward

Rob FitzpatrickGus, I don't know how clustered indexes work. That's why I wrote my question ...

Jun 4

Gus, I don't know how clustered indexes work. That's why I wrote my question ...

Rob FitzpatrickJun 4

Loading...

Rob Fitzpatrick  

Jun 4

to peg

Gus,

I don't know how clustered indexes work.  That's why I wrote my question

in conditional form.  If the condition is false then my question is moot

and can be ignored.

-----Original Message-----

what makes you guys think that everything else moves when you add a row

to a table that has a clustered index ?

-gus

Reply Reply to all Forward

DavidMSSQL will leave some gaps, especially when using something like a guid. I gu...

Jun 4

MSSQL will leave some gaps, especially when using something like a guid. I gu...

DavidJun 4

Loading...

DavidMSSQL will leave some gaps, especially when using something like a guid. I gu...

Jun 4

MSSQL will leave some gaps, especially when using something like a guid. I gu...

David

Jun 4

to peg

On 6/4/2014 8:51 AM, Rob Fitzpatrick wrote:

Gus,

I don't know how clustered indexes work.  That's why I wrote my question

in conditional form.  If the condition is false then my question is moot

and can be ignored.

-----Original Message-----

what makes you guys think that everything else moves when you add a row

to a table that has a clustered index ?

-gus

MSSQL will leave some gaps, especially when using something like a guid.   I guess its similar to create/toss limits in Progress.  Guids are a poor choice for a clustered key since they are effectively random.   Also, the clustered key becomes the implied last component of all other indexes.  The ideal clustered key will be a monotonically increasing integer (eg. an identity column or a sequence assigned column)

Reply Reply to all Forward

Tom BascomBecause that's what the Wiki that I found for "clustered indexes" says happen...

Jun 4

Because that's what the Wiki that I found for "clustered indexes" says happen...

Tom BascomJun 4

Loading...

Tom Bascom  

Jun 4

to peg

On 6/4/14, 10:12 AM, David wrote:

On 6/4/2014 8:51 AM, Rob Fitzpatrick wrote:

Gus,

I don't know how clustered indexes work.  That's why I wrote my question

in conditional form.  If the condition is false then my question is moot

and can be ignored.

-----Original Message-----

what makes you guys think that everything else moves when you add a row

to a table that has a clustered index ?

Because that's what the Wiki that I found for "clustered indexes" says happens.  Various other material says the same thing.

It passes the sniff test because of the "stored in order on disk" characteristic.

I suppose it would not be a problem if records are always appended but I did say "insert a record in the middle".

-gus

MSSQL will leave some gaps, especially when using something like a guid.   I guess its similar to create/toss limits in Progress. Guids are a poor choice for a clustered key since they are effectively random.   Also, the clustered key becomes the implied last component of all other indexes.  The ideal clustered key will be a monotonically increasing integer (eg. an identity column or a sequence assigned column)

So maybe it doesn't *always* cause everything to move but eventually gaps will get filled and when they do it is going to be really painful.

Sounds like a bad idea to me.

--

Tom Bascom

Reply Reply to all Forward

DavidLogically, the rows are ordered, but not necessarily physically. That is why ...

Jun 4

Logically, the rows are ordered, but not necessarily physically. That is why ...

DavidJun 4

Loading...

David  

Jun 4

to peg

Logically, the rows are ordered, but not necessarily physically.  That is why there is a need to periodically do a rebuild/reorg of the tables in MSSQL.

On 6/4/2014 9:44 AM, Tom Bascom wrote:

On 6/4/14, 10:12 AM, David wrote:

On 6/4/2014 8:51 AM, Rob Fitzpatrick wrote:

Gus,

I don't know how clustered indexes work.  That's why I wrote my question

in conditional form.  If the condition is false then my question is moot

and can be ignored.

-----Original Message-----

what makes you guys think that everything else moves when you add a row

to a table that has a clustered index ?

Because that's what the Wiki that I found for "clustered indexes" says happens.  Various other material says the same thing.

It passes the sniff test because of the "stored in order on disk" characteristic.

I suppose it would not be a problem if records are always appended but I did say "insert a record in the middle".

-gus

MSSQL will leave some gaps, especially when using something like a guid.   I guess its similar to create/toss limits in Progress. Guids are a poor choice for a clustered key since they are effectively random.   Also, the clustered key becomes the implied last component of all other indexes.  The ideal clustered key will be a monotonically increasing integer (eg. an identity column or a sequence assigned column)

So maybe it doesn't *always* cause everything to move but eventually gaps will get filled and when they do it is going to be really painful.

Sounds like a bad idea to me.

Reply Reply to all Forward

Thomas Mercer-Hursh, Ph.D.Is it, perhaps, the case that clustered index is used only on table which are...

Jun 4

Is it, perhaps, the case that clustered index is used only on table which are...

Tom BascomSorry, you lost me. I would have thought that the index is the "logical" part...

Jun 4

Sorry, you lost me. I would have thought that the index is the "logical" part...

Tom BascomJun 4

Loading...

Tom Bascom  

Jun 4

to peg

On 6/4/14, 10:59 AM, David wrote:

Logically, the rows are ordered, but not necessarily physically.  That is why there is a need to periodically do a rebuild/reorg of the tables in MSSQL.

Sorry, you lost me.  I would have thought that the index is the "logical" part of things and that the ordering in the file is the "on disk" or "physical" portion.

Can you elaborate?

--

Tom Bascom

Reply Reply to all Forward

DavidI can't recall the specifics, but there is some kind of pointer left where th...

Jun 4

I can't recall the specifics, but there is some kind of pointer left where th...

DavidJun 4

Loading...

David

Jun 4

to peg

On 6/4/2014 10:07 AM, Tom Bascom wrote:

On 6/4/14, 10:59 AM, David wrote:

Logically, the rows are ordered, but not necessarily physically.  That is why there is a need to periodically do a rebuild/reorg of the tables in MSSQL.

Sorry, you lost me.  I would have thought that the index is the "logical" part of things and that the ordering in the file is the "on disk" or "physical" portion.

Can you elaborate?

I can't recall the specifics, but there is some kind of pointer left where the new middle row would logically be.  Like a "jump to this block".

Reply Reply to all Forward

Gus Bjorklundthere are a variety of ways to implement storage for clustered index tables. ...

Jun 4

there are a variety of ways to implement storage for clustered index tables. ...

Gus BjorklundJun 4

Loading...

Gus Bjorklund

Jun 4

to peg

On 6/4/14 11:07 AM, "Tom Bascom"  wrote:

>On 6/4/14, 10:59 AM, David wrote:

>> Logically, the rows are ordered, but not necessarily physically.  That

>> is why there is a need to periodically do a rebuild/reorg of the

>> tables in MSSQL.

>

>Sorry, you lost me.  I would have thought that the index is the

>"logical" part of things and that the ordering in the file is the "on

>disk" or "physical" portion.

>

>Can you elaborate?

>

>

there are a variety of ways to implement storage for clustered index

tables. and for handling inserts and page splits. you don't need to

reorganize the whole table for every insert. that would  negate whatever

advantage you might get from the clustering.

note that "physical order" does not have much meaning with modern disk and

disk arrays. nobody stores things in disk sector order anymore. clustered

indexes serve to reduce the I/O overhead but don't have to be perfectly

optimized. good enough is good enough.

regards, gus

"Culture eats strategy for breakfast." -- Peter Drucker

Reply Reply to all Forward

Tom BascomThat would seem to defeat the whole "in order on disk" assumption that the be...

Jun 4

That would seem to defeat the whole "in order on disk" assumption that the be...

Tom BascomJun 4

Loading...

Tom Bascom  

Jun 4

to peg

On 6/4/14, 11:13 AM, David wrote:

On 6/4/2014 10:07 AM, Tom Bascom wrote:

On 6/4/14, 10:59 AM, David wrote:

Logically, the rows are ordered, but not necessarily physically.  That is why there is a need to periodically do a rebuild/reorg of the tables in MSSQL.

Sorry, you lost me.  I would have thought that the index is the "logical" part of things and that the ordering in the file is the "on disk" or "physical" portion.

Can you elaborate?

I can't recall the specifics, but there is some kind of pointer left where the new middle row would logically be.  Like a "jump to this block".

That would seem to defeat the whole "in order on disk" assumption that the benefits of clustered indexes are supposedly based on.

I guess that if it is a rare event that it might not be too horrible?

--

Tom Bascom

Reply Reply to all Forward

DavidYou have some control over how rare. If you have an ascending numeric key, yo...

Jun 4

You have some control over how rare. If you have an ascending numeric key, yo...

DavidJun 4

Loading...

David

Jun 4

to peg

On 6/4/2014 10:20 AM, Tom Bascom wrote:

On 6/4/14, 11:13 AM, David wrote:

On 6/4/2014 10:07 AM, Tom Bascom wrote:

On 6/4/14, 10:59 AM, David wrote:

Logically, the rows are ordered, but not necessarily physically.  That is why there is a need to periodically do a rebuild/reorg of the tables in MSSQL.

Sorry, you lost me.  I would have thought that the index is the "logical" part of things and that the ordering in the file is the "on disk" or "physical" portion.

Can you elaborate?

I can't recall the specifics, but there is some kind of pointer left where the new middle row would logically be.  Like a "jump to this block".

That would seem to defeat the whole "in order on disk" assumption that the benefits of clustered indexes are supposedly based on.

I guess that if it is a rare event that it might not be too horrible?

You have some control over how rare.  If you have an ascending numeric key, you can pack rows pretty tightly.  If you have something like a GUID, then you'd set the "fill %" accordingly when you rebuild the index.  This underscores why an plain GUID for a clustered key is not a great idea, since you have to have your fill % lower than other alternatives.  The article mentioned earlier in the thread describes a way around this by manipulating the GUID to be ordered of sorts.

Reply Reply to all Forward

Stefan Houtzager

Jun 5

to David, peg

Seems like important content in the article I sent (

www.codeproject.com/.../GUIDs-as-fast-primary-keys-under-multiple-database)

is forgotten by some in this thread. "This article outlines an

approach for using GUID values as primary keys/clustered indexes that

avoids most of the normal disadvantages, adapting the COMB model for

sequential GUIDs".

 So it is not about the normal guid with a clustered index.

On Wed, Jun 4, 2014 at 5:44 PM, David wrote:

> On 6/4/2014 10:20 AM, Tom Bascom wrote:

>>

>> On 6/4/14, 11:13 AM, David wrote:

>>>

>>> On 6/4/2014 10:07 AM, Tom Bascom wrote:

>>>>

>>>> On 6/4/14, 10:59 AM, David wrote:

>>>>>

>>>>> Logically, the rows are ordered, but not necessarily physically.  That

>>>>> is why there is a need to periodically do a rebuild/reorg of the tables in

>>>>> MSSQL.

>>>>

>>>>

>>>> Sorry, you lost me.  I would have thought that the index is the

>>>> "logical" part of things and that the ordering in the file is the "on disk"

>>>> or "physical" portion.

>>>>

>>>> Can you elaborate?

>>>>

>>> I can't recall the specifics, but there is some kind of pointer left

>>> where the new middle row would logically be.  Like a "jump to this block".

>>

>>

>> That would seem to defeat the whole "in order on disk" assumption that the

>> benefits of clustered indexes are supposedly based on.

>>

>> I guess that if it is a rare event that it might not be too horrible?

>>

> You have some control over how rare.  If you have an ascending numeric key,

> you can pack rows pretty tightly.  If you have something like a GUID, then

> you'd set the "fill %" accordingly when you rebuild the index.  This

> underscores why an plain GUID for a clustered key is not a great idea, since

> you have to have your fill % lower than other alternatives.  The article

> mentioned earlier in the thread describes a way around this by manipulating

> the GUID to be ordered of sorts.

>

--

Kind regards,

Stefan Houtzager

Houtzager ICT consultancy & development

www.linkedin.com/in/stefanhoutzager

Reply Reply to all Forward

Tom Bascom  

Jun 5

to me, peg

We didn't forget that.  It has been mentioned several times.  I changed the subject text though because topic of clustered indexes is interesting on its own and comes up once in a while and I wanted to talk about it more generally - not necessarily in relation to guids.

--

Tom Bascom

> On Jun 5, 2014, at 3:05 AM, Stefan Houtzager wrote:

>

> Seems like important content in the article I sent (

> www.codeproject.com/.../GUIDs-as-fast-primary-keys-under-multiple-database)

> is forgotten by some in this thread. "This article outlines an

> approach for using GUID values as primary keys/clustered indexes that

> avoids most of the normal disadvantages, adapting the COMB model for

> sequential GUIDs".

>  So it is not about the normal guid with a clustered index.

>

>

>> On Wed, Jun 4, 2014 at 5:44 PM, David wrote:

>>> On 6/4/2014 10:20 AM, Tom Bascom wrote:

>>>

>>>> On 6/4/14, 11:13 AM, David wrote:

>>>>

>>>>> On 6/4/2014 10:07 AM, Tom Bascom wrote:

>>>>>

>>>>>> On 6/4/14, 10:59 AM, David wrote:

>>>>>>

>>>>>> Logically, the rows are ordered, but not necessarily physically.  That

>>>>>> is why there is a need to periodically do a rebuild/reorg of the tables in

>>>>>> MSSQL.

>>>>>

>>>>>

>>>>> Sorry, you lost me.  I would have thought that the index is the

>>>>> "logical" part of things and that the ordering in the file is the "on disk"

>>>>> or "physical" portion.

>>>>>

>>>>> Can you elaborate?

>>>> I can't recall the specifics, but there is some kind of pointer left

>>>> where the new middle row would logically be.  Like a "jump to this block".

>>>

>>>

>>> That would seem to defeat the whole "in order on disk" assumption that the

>>> benefits of clustered indexes are supposedly based on.

>>>

>>> I guess that if it is a rare event that it might not be too horrible?

>> You have some control over how rare.  If you have an ascending numeric key,

>> you can pack rows pretty tightly.  If you have something like a GUID, then

>> you'd set the "fill %" accordingly when you rebuild the index.  This

>> underscores why an plain GUID for a clustered key is not a great idea, since

>> you have to have your fill % lower than other alternatives.  The article

>> mentioned earlier in the thread describes a way around this by manipulating

>> the GUID to be ordered of sorts.

>

>

>

> --

> Kind regards,

>

> Stefan Houtzager

>

> Houtzager ICT consultancy & development

>

> www.linkedin.com/in/stefanhoutzager

>

>

Reply Reply to all Forward

Stefan Houtzager

Jun 5

to Tom, peg

Ah, dummy me. David wrote "The article mentioned earlier in the thread

describes a way around this by manipulating the GUID to be ordered of

sorts." indeed.

On Thu, Jun 5, 2014 at 12:53 PM, Tom Bascom  wrote:

> We didn't forget that.  It has been mentioned several times.  I changed the subject text though because topic of clustered indexes is interesting on its own and comes up once in a while and I wanted to talk about it more generally - not necessarily in relation to guids.

>

> --

> Tom Bascom

>

>> On Jun 5, 2014, at 3:05 AM, Stefan Houtzager wrote:

>>

>> Seems like important content in the article I sent (

>> www.codeproject.com/.../GUIDs-as-fast-primary-keys-under-multiple-database)

>> is forgotten by some in this thread. "This article outlines an

>> approach for using GUID values as primary keys/clustered indexes that

>> avoids most of the normal disadvantages, adapting the COMB model for

>> sequential GUIDs".

>>  So it is not about the normal guid with a clustered index.

>>

>>

>>> On Wed, Jun 4, 2014 at 5:44 PM, David wrote:

>>>> On 6/4/2014 10:20 AM, Tom Bascom wrote:

>>>>

>>>>> On 6/4/14, 11:13 AM, David wrote:

>>>>>

>>>>>> On 6/4/2014 10:07 AM, Tom Bascom wrote:

>>>>>>

>>>>>>> On 6/4/14, 10:59 AM, David wrote:

>>>>>>>

>>>>>>> Logically, the rows are ordered, but not necessarily physically.  That

>>>>>>> is why there is a need to periodically do a rebuild/reorg of the tables in

>>>>>>> MSSQL.

>>>>>>

>>>>>>

>>>>>> Sorry, you lost me.  I would have thought that the index is the

>>>>>> "logical" part of things and that the ordering in the file is the "on disk"

>>>>>> or "physical" portion.

>>>>>>

>>>>>> Can you elaborate?

>>>>> I can't recall the specifics, but there is some kind of pointer left

>>>>> where the new middle row would logically be.  Like a "jump to this block".

>>>>

>>>>

>>>> That would seem to defeat the whole "in order on disk" assumption that the

>>>> benefits of clustered indexes are supposedly based on.

>>>>

>>>> I guess that if it is a rare event that it might not be too horrible?

>>> You have some control over how rare.  If you have an ascending numeric key,

>>> you can pack rows pretty tightly.  If you have something like a GUID, then

>>> you'd set the "fill %" accordingly when you rebuild the index.  This

>>> underscores why an plain GUID for a clustered key is not a great idea, since

>>> you have to have your fill % lower than other alternatives.  The article

>>> mentioned earlier in the thread describes a way around this by manipulating

>>> the GUID to be ordered of sorts.

>>

>>

>>

>> --

>> Kind regards,

>>

>> Stefan Houtzager

>>

>> Houtzager ICT consultancy & development

>>

>> www.linkedin.com/in/stefanhoutzager

>>

>>

--

Kind regards,

Stefan Houtzager

Houtzager ICT consultancy & development

www.linkedin.com/in/stefanhoutzager

Reply Reply to all Forward

Marian Edu

Jun 5

to me, peg

have found this video to be quite helpful on explaining the differences - www.youtube.com/watch

main idea, for a clustered index all record data is available (leaf index structure = table structure) while for non-clustered if values from non-index columns are needed the rowid will be used to get the record from table structure... however I don't think this should require reshuffling whole records when adding an 'out of order' record, meaning that might be one implementation but it's definitively a bad one :)

didn't actually get the idea of the article though, depending on how rows gets stored in the heap using a sorted or random value for the field of clustered index (pk defaults or is mandatory clustered) the time required to insert a record might be higher but that is probably because of the algorithm the engine uses to find the right table structure to put it in and not because it has to rearrange all existing records in some cases.

and beside, imho the performance impact of using guid (string) as pk is more on the read end than on insert/create one

On 06/05/2014 02:09 PM, Stefan Houtzager wrote:

Ah, dummy me. David wrote "The article mentioned earlier in the thread

describes a way around this by manipulating the GUID to be ordered of

sorts." indeed.

On Thu, Jun 5, 2014 at 12:53 PM, Tom Bascom  wrote:

We didn't forget that.  It has been mentioned several times.  I changed the subject text though because topic of clustered indexes is interesting on its own and comes up once in a while and I wanted to talk about it more generally - not necessarily in relation to guids.

--

Tom Bascom

On Jun 5, 2014, at 3:05 AM, Stefan Houtzager wrote:

Seems like important content in the article I sent (

www.codeproject.com/.../GUIDs-as-fast-primary-keys-under-multiple-database)

is forgotten by some in this thread. "This article outlines an

approach for using GUID values as primary keys/clustered indexes that

avoids most of the normal disadvantages, adapting the COMB model for

sequential GUIDs".

 So it is not about the normal guid with a clustered index.

On Wed, Jun 4, 2014 at 5:44 PM, David wrote:

On 6/4/2014 10:20 AM, Tom Bascom wrote:

On 6/4/14, 11:13 AM, David wrote:

On 6/4/2014 10:07 AM, Tom Bascom wrote:

On 6/4/14, 10:59 AM, David wrote:

Logically, the rows are ordered, but not necessarily physically.  That

is why there is a need to periodically do a rebuild/reorg of the tables in

MSSQL.

Sorry, you lost me.  I would have thought that the index is the

"logical" part of things and that the ordering in the file is the "on disk"

or "physical" portion.

Can you elaborate?

I can't recall the specifics, but there is some kind of pointer left

where the new middle row would logically be.  Like a "jump to this block".

That would seem to defeat the whole "in order on disk" assumption that the

benefits of clustered indexes are supposedly based on.

I guess that if it is a rare event that it might not be too horrible?

You have some control over how rare.  If you have an ascending numeric key,

you can pack rows pretty tightly.  If you have something like a GUID, then

you'd set the "fill %" accordingly when you rebuild the index.  This

underscores why an plain GUID for a clustered key is not a great idea, since

you have to have your fill % lower than other alternatives.  The article

mentioned earlier in the thread describes a way around this by manipulating

the GUID to be ordered of sorts.

--

Kind regards,

Stefan Houtzager

Houtzager ICT consultancy & development

www.linkedin.com/in/stefanhoutzager

--

m.edu

keep it simple

http://www.ganimede.ro

ro.linkedin.com/.../marianedu

Reply Reply to all Forward

Stefan Houtzager

Jun 5

to Marian, peg

> the performance impact of using guid (string) as pk is more on the read end than on insert/create one

And that's where the 'comb guid' with the clustered index comes in

(see the article).

On Thu, Jun 5, 2014 at 2:38 PM, Marian Edu wrote:

> have found this video to be quite helpful on explaining the differences -

> www.youtube.com/watch

>

> main idea, for a clustered index all record data is available (leaf index

> structure = table structure) while for non-clustered if values from

> non-index columns are needed the rowid will be used to get the record from

> table structure... however I don't think this should require reshuffling

> whole records when adding an 'out of order' record, meaning that might be

> one implementation but it's definitively a bad one :)

>

> didn't actually get the idea of the article though, depending on how rows

> gets stored in the heap using a sorted or random value for the field of

> clustered index (pk defaults or is mandatory clustered) the time required to

> insert a record might be higher but that is probably because of the

> algorithm the engine uses to find the right table structure to put it in and

> not because it has to rearrange all existing records in some cases.

>

> and beside, imho the performance impact of using guid (string) as pk is more

> on the read end than on insert/create one

>

>

> On 06/05/2014 02:09 PM, Stefan Houtzager wrote:

>>

>> Ah, dummy me. David wrote "The article mentioned earlier in the thread

>> describes a way around this by manipulating the GUID to be ordered of

>> sorts." indeed.

>>

>> On Thu, Jun 5, 2014 at 12:53 PM, Tom Bascom  wrote:

>>>

>>> We didn't forget that.  It has been mentioned several times.  I changed

>>> the subject text though because topic of clustered indexes is interesting on

>>> its own and comes up once in a while and I wanted to talk about it more

>>> generally - not necessarily in relation to guids.

>>>

>>> --

>>> Tom Bascom

>>>

>>>> On Jun 5, 2014, at 3:05 AM, Stefan Houtzager

>>>> wrote:

>>>>

>>>> Seems like important content in the article I sent (

>>>>

>>>> www.codeproject.com/.../GUIDs-as-fast-primary-keys-under-multiple-database)

>>>> is forgotten by some in this thread. "This article outlines an

>>>> approach for using GUID values as primary keys/clustered indexes that

>>>> avoids most of the normal disadvantages, adapting the COMB model for

>>>> sequential GUIDs".

>>>>   So it is not about the normal guid with a clustered index.

>>>>

>>>>

>>>>> On Wed, Jun 4, 2014 at 5:44 PM, David wrote:

>>>>>>

>>>>>> On 6/4/2014 10:20 AM, Tom Bascom wrote:

>>>>>>

>>>>>>> On 6/4/14, 11:13 AM, David wrote:

>>>>>>>

>>>>>>>> On 6/4/2014 10:07 AM, Tom Bascom wrote:

>>>>>>>>

>>>>>>>>> On 6/4/14, 10:59 AM, David wrote:

>>>>>>>>>

>>>>>>>>> Logically, the rows are ordered, but not necessarily physically.

>>>>>>>>> That

>>>>>>>>> is why there is a need to periodically do a rebuild/reorg of the

>>>>>>>>> tables in

>>>>>>>>> MSSQL.

>>>>>>>>

>>>>>>>>

>>>>>>>> Sorry, you lost me.  I would have thought that the index is the

>>>>>>>> "logical" part of things and that the ordering in the file is the

>>>>>>>> "on disk"

>>>>>>>> or "physical" portion.

>>>>>>>>

>>>>>>>> Can you elaborate?

>>>>>>>

>>>>>>> I can't recall the specifics, but there is some kind of pointer left

>>>>>>> where the new middle row would logically be.  Like a "jump to this

>>>>>>> block".

>>>>>>

>>>>>>

>>>>>> That would seem to defeat the whole "in order on disk" assumption that

>>>>>> the

>>>>>> benefits of clustered indexes are supposedly based on.

>>>>>>

>>>>>> I guess that if it is a rare event that it might not be too horrible?

>>>>>

>>>>> You have some control over how rare.  If you have an ascending numeric

>>>>> key,

>>>>> you can pack rows pretty tightly.  If you have something like a GUID,

>>>>> then

>>>>> you'd set the "fill %" accordingly when you rebuild the index.  This

>>>>> underscores why an plain GUID for a clustered key is not a great idea,

>>>>> since

>>>>> you have to have your fill % lower than other alternatives.  The

>>>>> article

>>>>> mentioned earlier in the thread describes a way around this by

>>>>> manipulating

>>>>> the GUID to be ordered of sorts.

>>>>

>>>>

>>>>

>>>> --

>>>> Kind regards,

>>>>

>>>> Stefan Houtzager

>>>>

>>>> Houtzager ICT consultancy & development

>>>>

>>>> www.linkedin.com/in/stefanhoutzager

>>>>

>>>>

>>

>>

>

>

> --

> m.edu

> keep it simple

> http://www.ganimede.ro

> ro.linkedin.com/.../marianedu

--

Kind regards,

Stefan Houtzager

Houtzager ICT consultancy & development

www.linkedin.com/in/stefanhoutzager

Reply Reply to all Forward

Tom Bascom  

Jun 5

to peg

Interesting.  I don't think I followed everything.

I especially don't understand the point of apparently duplicating the data in both the index and in the table.  I'm going to need to go back and watch it again.

On 6/5/14, 8:38 AM, Marian Edu wrote:

have found this video to be quite helpful on explaining the differences - www.youtube.com/watch

main idea, for a clustered index all record data is available (leaf index structure = table structure) while for non-clustered if values from non-index columns are needed the rowid will be used to get the record from table structure... however I don't think this should require reshuffling whole records when adding an 'out of order' record, meaning that might be one implementation but it's definitively a bad one :)

didn't actually get the idea of the article though, depending on how rows gets stored in the heap using a sorted or random value for the field of clustered index (pk defaults or is mandatory clustered) the time required to insert a record might be higher but that is probably because of the algorithm the engine uses to find the right table structure to put it in and not because it has to rearrange all existing records in some cases.

and beside, imho the performance impact of using guid (string) as pk is more on the read end than on insert/create one

On 06/05/2014 02:09 PM, Stefan Houtzager wrote:

Ah, dummy me. David wrote "The article mentioned earlier in the thread

describes a way around this by manipulating the GUID to be ordered of

sorts." indeed.

On Thu, Jun 5, 2014 at 12:53 PM, Tom Bascom  wrote:

We didn't forget that.  It has been mentioned several times.  I changed the subject text though because topic of clustered indexes is interesting on its own and comes up once in a while and I wanted to talk about it more generally - not necessarily in relation to guids.

--

Tom Bascom

On Jun 5, 2014, at 3:05 AM, Stefan Houtzager wrote:

Seems like important content in the article I sent (

www.codeproject.com/.../GUIDs-as-fast-primary-keys-under-multiple-database)

is forgotten by some in this thread. "This article outlines an

approach for using GUID values as primary keys/clustered indexes that

avoids most of the normal disadvantages, adapting the COMB model for

sequential GUIDs".

 So it is not about the normal guid with a clustered index.

On Wed, Jun 4, 2014 at 5:44 PM, David wrote:

On 6/4/2014 10:20 AM, Tom Bascom wrote:

On 6/4/14, 11:13 AM, David wrote:

On 6/4/2014 10:07 AM, Tom Bascom wrote:

On 6/4/14, 10:59 AM, David wrote:

Logically, the rows are ordered, but not necessarily physically.  That

is why there is a need to periodically do a rebuild/reorg of the tables in

MSSQL.

Sorry, you lost me.  I would have thought that the index is the

"logical" part of things and that the ordering in the file is the "on disk"

or "physical" portion.

Can you elaborate?

I can't recall the specifics, but there is some kind of pointer left

where the new middle row would logically be.  Like a "jump to this block".

That would seem to defeat the whole "in order on disk" assumption that the

benefits of clustered indexes are supposedly based on.

I guess that if it is a rare event that it might not be too horrible?

You have some control over how rare.  If you have an ascending numeric key,

you can pack rows pretty tightly.  If you have something like a GUID, then

you'd set the "fill %" accordingly when you rebuild the index.  This

underscores why an plain GUID for a clustered key is not a great idea, since

you have to have your fill % lower than other alternatives.  The article

mentioned earlier in the thread describes a way around this by manipulating

the GUID to be ordered of sorts.

--

Kind regards,

Stefan Houtzager

Houtzager ICT consultancy & development

www.linkedin.com/in/stefanhoutzager

--

Tom Bascom

Reply Reply to all Forward

Gus Bjorklundmaybe. there are tradeoffs, always these d**n tradeoffs. when multiple users ...

Jun 5

maybe. there are tradeoffs, always these d**n tradeoffs. when multiple users ...

Gus BjorklundJun 5

Loading...

Gus Bjorklund

Jun 5

to peg

On 6/5/14 8:38 AM, "Marian Edu" wrote:

>and beside, imho the performance impact of using guid (string) as pk is

>more on the read end than on insert/create one

>

>

maybe.  there are tradeoffs, always these d**n tradeoffs.

when multiple users are making inserts to the same index, if the key

values are monotonically increasing then all the inserts can end up in the

same leaf block of a b-tree.  then there will be contention and for locks

on the index block.  the transactions serialize and you lose concurrency.

regards, gus

"Culture eats strategy for breakfast." -- Peter Drucker

Reply Reply to all Forward

Marian Edutrue, but my point was in normal circumstances one will read that data more t...

Jun 5

true, but my point was in normal circumstances one will read that data more t...

Marian EduJun 5

Loading...

Marian Edu

Jun 5

to peg

On 06/05/2014 05:29 PM, Gus Bjorklund wrote:

On 6/5/14 8:38 AM, "Marian Edu" wrote:

and beside, imho the performance impact of using guid (string) as pk is

more on the read end than on insert/create one

maybe.  there are tradeoffs, always these d**n tradeoffs.

when multiple users are making inserts to the same index, if the key

values are monotonically increasing then all the inserts can end up in the

same leaf block of a b-tree.  then there will be contention and for locks

on the index block.  the transactions serialize and you lose concurrency.

true, but my point was in normal circumstances one will read that data more than once while there can only be one insert for the lifespan of an 'event'

and that assertion holds true for all indexes, back to those darn trade-offs I guess... how often one insert data in a table, how often data will be read and how will help a designer/dba to put those on balance and try to find the lesser of two evils :)

--

m.edu

keep it simple

http://www.ganimede.ro

ro.linkedin.com/.../marianedu

mobile: +40 740 036 212

skype: marian.edu

Reply Reply to all Forward

Tom BascomSome data is write-only. (Most implementations of auditing spring to mind...)

Jun 5

Some data is write-only. (Most implementations of auditing spring to mind...)

Tom Bascom  

Jun 5

to peg

true, but my point was in normal circumstances one will read that data more than once while there can only be one insert for the lifespan of an 'event'

Some data is write-only.

(Most implementations of auditing spring to mind...)

--

Tom Bascom

Reply Reply to all Forward

Marian Edu

Jun 6

to peg

On 06/05/2014 09:33 PM, Tom Bascom wrote:

true, but my point was in normal circumstances one will read that data more than once while there can only be one insert for the lifespan of an 'event'

Some data is write-only.

(Most implementations of auditing spring to mind...)

yeah, but in those cases one could argue there is no need for a primary-key... the most probably index on date (time-stamp) will lead to theracing condition Gus was mentioned on the b-tree leaf anyway since time flow isn't too random unfortunately, always increasing :)

but that makes me think... a good index must have a good distribution, does this mean a sequence should not be used as a primary key? An always incrementing number (like order number mentioned before) will lead to a skew index and not so great data distribution... well, luckily it's Friday :)

as for clustered index, wouldn't aordinary b-tree index on all table's fields will be the exact equivalent(not considering limitations)... this will of course duplicate the whole table data (table structure and b-tree index)

feel like those kind of thoughts would make a great camp fire subject at pug's/exchange events if kept at a level where developers can follow instead of flag it as `dba only` :)

- what makes a good index

- when enough is enough(don't over index)

- composed key vs. single key index (when, why,will a composed key be enough for a query if all fields required are there or will fetch the whole record from table anyway)

Posted by Stefan Drissen on 15-Oct-2014 06:36

The implementation for OpenEdge is very relevant.

From the article:

[quote]

I could generate a million random GUIDs in 140 milliseconds, but sequential GUIDs took 2800 milliseconds

[/quote]

With 11.4 it takes me 498 milliseconds to generate only 10,000 guids.

def var ii as int no-undo.

def var itime as int no-undo.

itime = etime.

do ii = 1 to 10000:

  guid.

end.  

message etime - itime view-as alert-box.


Posted by agent_008_nl on 15-Oct-2014 07:29

You tested the C# solution from within the ABL where Mike asks for?

Posted by Etienne Begin on 15-Oct-2014 08:17

Seems slow on the surface, but under which circumstances will you generate guid in such a tight loop ?

Posted by Peter Judge on 15-Oct-2014 09:06

I get around the same time.

Most pleasantly, if i use an integer counter, it's about the same speed - ~500 ms for both messages in the code below.

def var ii as int no-undo.
def var cg as char no-undo.
 
def var itime as int no-undo.
 
itime = mtime.
 
do ii = 1 to 10000:
 
  guid.
 
end.  
 
message mtime - itime view-as alert-box.

itime = mtime.
 
do ii = 1 to 10000:
  cg = guid.   
  entry(1, cg, '-') = string(ii, '99999999').
end.  
 
message mtime - itime view-as alert-box.


Posted by gus on 17-Oct-2014 13:04

@stefan:

498 ms for 10,000.

this is a problem?

Posted by Etienne Begin on 17-Oct-2014 13:22

In that tight loop you could throw in a couple "if true then." and you would notice a slow down :)

Posted by Stefan Drissen on 18-Oct-2014 02:23

[quote user="gus"]

@stefan:

498 ms for 10,000.

this is a problem?[/quote]

Just an observation.

A tight loop that does nothing one million times takes 225 ms.

A tight loop that generates one million guids takes 50001 ms.

In another language generating one million guids takes 140 ms.

This feels wrong.

Posted by Stefan Drissen on 18-Oct-2014 02:28

[quote user="beginetienne"]

In that tight loop you could throw in a couple "if true then." and you would notice a slow down :)

[/quote]

Three 'if true then .' resulted in three ms extra.(in 1 million loop)

Posted by Thomas Mercer-Hursh on 18-Oct-2014 09:09

There are many such comparisons possible between ABL and some other language.  String operations in particular come to mind.  And yet, by and large ABL performs just fine compared to these other languages.  In part, some of that comes from having more complex operations like reading and writing the disk built in and highly optimized and in part it comes from comparing things which don't happen or don't matter.  E.g., what exactly is the context in which one is generating 10,000 or 1 million GUIDs?  Chances are, if it ever happens it is in the context of creating new records in the database ... and so, the overhead of creating a GUID is a meaningless iota in the midst of the processing to obtain and validate the data which is going into the record and writing the record to disk.

Posted by gus on 20-Oct-2014 10:01

[quote user="Stefan Drissen"]

Just an observation.

A tight loop that does nothing one million times takes 225 ms.

A tight loop that generates one million guids takes 50001 ms.

In another language generating one million guids takes 140 ms.

This feels wrong.

[/quote]
No doubt the guid-generation speed could be improved substantially with some effort. Now and then, as time permits, the 4GL group makes minor performance improvements. It is not likely that this particular item will get a high priority (IMNSHO, not worthwhile). Then again, performance improvements in one place often beget improvements in others.

Posted by agent_008_nl on 20-Oct-2014 11:18

Just an observation, "My primary objection against using technical keys" (etc, by Stefan Drissen, 2012 in this thread). When then would what you observed become a problem in practice for you, is it a problem already? I for me would of course like to see the guid faster, you never know which usecase could appear sometime, but I have no problem with it's slowness for now.  

 And you Mike (or whoever), do you have a concrete problem with this slow guid?

Posted by Etienne Begin on 20-Oct-2014 12:38

Adding "if true then." slows down on my version of OE much more than 2 ms for 1M iterations.  Increment a variable, or casting an integer to string is just as slow as guid().

i = i + 1 is "slow", in comparison to an iteration with no code.

I'd rather have a faster substring() function than guid().  

This sounds a lot like another conversation on the PEG (or here don't remember) about class instantiation in ABL vs .NET.   

E-

Posted by Stefan Drissen on 20-Oct-2014 14:49

[quote user="agent_008_nl"]

Just an observation, "My primary objection against using technical keys" (etc, by Stefan Drissen, 2012 in this thread). When then would what you observed become a problem in practice for you, is it a problem already? I for me would of course like to see the guid faster, you never know which usecase could appear sometime, but I have no problem with it's slowness for now.  

[/quote]

I do not use guids in practice. The theory interested me briefly, being a clean cut case.

I would like the compiler to handle global defined preprocessors more efficiently - but that's another story.

Posted by Stefan Drissen on 20-Oct-2014 15:01

[quote user="beginetienne"]

Adding "if true then." slows down on my version of OE much more than 2 ms for 1M iterations.  Increment a variable, or casting an integer to string is just as slow as guid().

i = i + 1 is "slow", in comparison to an iteration with no code.

I'd rather have a faster substring() function than guid().  

This sounds a lot like another conversation on the PEG (or here don't remember) about class instantiation in ABL vs .NET.   

E-

[/quote]

I'm using prowin.exe on 11.4 x64 on Vista x64 on a five year old i7-920.

For one million iterations:

descriptionms
empty loop 221
if true then . 388
today 847
time 747
string( ii ) 737
string( ii, ">>>>>>>>9" ) 1788
substring( "", ii, 1 ) 588
it = it + 1 653
guid 50000

This is not slower - it is crawling. But again - just an observation.

Right... so I pulled out 11.2.1 32-bit - 1 million guids in 472 ms....

Is the 11.2.1 guid generator broken and functionally fixed at the expense of performance in 11.4 x64?

Update: switched to a Win7 x64 machine:

10.2B08: 50000 ms

11.2.1: 378 ms

11.4 (32-bit): 49999 ms

11.4 (64-bit): 50005 ms

So the 11.2.1 GUID generator seems to have been broken.

Posted by TheMadDBA on 20-Oct-2014 15:39

Older AIX box, each for 1 million calls of GUID.

10.2A (32 bit)     667ms

10.2B07(32 bit)  621ms

11.3 (64 bit)        670ms

I suspect a Windows call issue of some kind.

Posted by Aidan Jeffery on 20-Oct-2014 16:43

Your guess is correct.
A problem was found with the generator on Windows only, and was fixed in 11.3 to ensure the uniqueness of the guids.
 
Here’s a link to an earlier thread on this subject:
 
 
[collapse]
From: Stefan Drissen [mailto:bounce-14941@community.progress.com]
Sent: Monday, October 20, 2014 4:02 PM
To: TU.OE.Development@community.progress.com
Subject: RE: [Technical Users - OE Development] GUID/sequence
 
Reply by Stefan Drissen
beginetienne
Adding "if true then." slows down on my version of OE much more than 2 ms for 1M iterations.  Increment a variable, or casting an integer to string is just as slow as guid().
i = i + 1 is "slow", in comparison to an iteration with no code.
I'd rather have a faster substring() function than guid().  
This sounds a lot like another conversation on the PEG (or here don't remember) about class instantiation in ABL vs .NET.   
E-

I'm using prowin.exe on 11.4 x64 on Vista x64 on a five year old i7-920.

For one million iterations:

description
ms
empty loop
221
if true then .
388
today
847
time
747
string( ii )
737
string( ii, ">>>>>>>>9" )
1788
substring( "", ii, 1 )
588
it = it + 1
653
guid
50000

This is not slower - it is crawling. But again - just an observation.

Right... so I pulled out 11.2.1 32-bit - 1 million guids in 472 ms....

Is the 11.2.1 guid generator broken and functionally fixed at the expense of performance in 11.4 x64?

Stop receiving emails on this subject.

Flag this post as spam/abuse.

[/collapse]

Posted by Stefan Drissen on 20-Oct-2014 17:43

[quote user="Aidan Jeffery"]

Your guess is correct.
A problem was found with the generator on Windows only, and was fixed in 11.3 to ensure the uniqueness of the guids.
 
Here’s a link to an earlier thread on this subject:
 

[/quote]

Thanks for clearing that up! I was rather sure I'd seen this somewhere before, but couldn't find it in the release notes or in the knowledgebase.

Posted by Etienne Begin on 21-Oct-2014 08:31

[quote user="Stefan Drissen"]

For one million iterations:

descriptionms
empty loop 221
if true then . 388
today 847
time 747
string( ii ) 737
string( ii, ">>>>>>>>9" ) 1788
substring( "", ii, 1 ) 588
it = it + 1 653
guid 50000

[/quote]

I have run the same test just out of curiosity.

OE 10.2B, hp Integrity BL860c i2, B.11.31 ia64

CPU: Intel® Itanium® 9300 series

Memory: 64gb

descriptionms
empty loop 390
3x if true then . 1558
today 3835
time 3774
string( ii ) 1931
string( ii, ">>>>>>>>9" ) 3832
substring( "", ii, 1 ) 1367
it = it + 1 (repeat) 2561
guid 1559

Etienne.

This thread is closed