Many fields vs Extents

Posted by James Palmer on 24-Feb-2014 05:15

A theoretical question for those of you who like that sort of thing. 

Is there any difference in storage utilisation for having one field on a table with an extent of 10 vs having 10 fields on the table? 
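
For concreteness, a minimal sketch of the two options in ABL (table and field names are made up for illustration):

    DEFINE TEMP-TABLE ttArray NO-UNDO
        FIELD custNum  AS INTEGER
        FIELD monthAmt AS DECIMAL EXTENT 10. /* one field with an extent of 10 */

    DEFINE TEMP-TABLE ttFlat NO-UNDO
        FIELD custNum    AS INTEGER
        FIELD monthAmt01 AS DECIMAL
        FIELD monthAmt02 AS DECIMAL
        FIELD monthAmt03 AS DECIMAL
        FIELD monthAmt04 AS DECIMAL
        FIELD monthAmt05 AS DECIMAL
        FIELD monthAmt06 AS DECIMAL
        FIELD monthAmt07 AS DECIMAL
        FIELD monthAmt08 AS DECIMAL
        FIELD monthAmt09 AS DECIMAL
        FIELD monthAmt10 AS DECIMAL. /* ten separate fields */

The first is addressed as ttArray.monthAmt[3], the second as ttFlat.monthAmt03.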

All Replies

Posted by mallen on 24-Feb-2014 09:46

Hi James,

What happens if you need 11?

Posted by Mike Fechner on 24-Feb-2014 10:00

[quote user="mallen"]

What happens if you need 11?

[/quote]
In both cases a schema change would be required.
Only normalization can save you from that. But the question was about storage optimization: 10 or 11 child records will certainly need more (disk/memory) space than 10 or 11 fields or extent slots.
My expectation is that there is no difference between the storage requirements for 10 fields and an extent of 10.

Posted by James Palmer on 24-Feb-2014 10:09

Thanks both. I realise there's a problem with either approach. The question was posed to me by a colleague as a theoretical exercise, so yes, the ideal answer would be to find a different approach! :)

Thanks for the thoughts Mike. Confirms my suspicions! :)

I suppose that if the child table were in an area with a different RPB, or records per block (it's likely to be a smaller table, I'd have thought), then whilst the individual records may well be bigger than with the extent/field solution, you may find they don't take up much more space overall. I could be wrong of course! :)
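
As a hypothetical illustration (area names and numbers invented), the RPB is set per storage area in the structure (.st) file, so small child records could go in a higher-RPB area:

    # parent/array records at 64 records per block,
    # small child records at 128 so more of them pack into each block
    d "ParentData":7,64;8 .
    d "ChildData":8,128;8 .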

Posted by Thomas Mercer-Hursh on 24-Feb-2014 10:15

The real point, though, is that one shouldn't be making decisions about data structure based on what must be microscopically trivial differences in total storage, especially in this era of cheap disk.

Posted by James Palmer on 24-Feb-2014 10:43

Completely agree Thomas. It was a trivia question more than anything.

Posted by ChUIMonster on 24-Feb-2014 11:17

I may regret saying this but...

I am generally a fierce advocate of normalization but there are times when an array is a significant performance advantage.

I have a particular use case where I could either have 50 to 100 records that all share the same leading key component (a timestamp) and are then distinguished by a "type" field that is essentially an array index.

Or I could have one record with an extent of X and decide on a mapping of those type values to array positions. Making the array much bigger than it needs to be has no serious impact on storage and allows me to add new mappings easily.
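
Roughly, the two shapes would look like this (names and the extent size are invented):

    /* Normalized: one record per timestamp and type */
    DEFINE TEMP-TABLE ttReading NO-UNDO
        FIELD readTime  AS DATETIME
        FIELD readType  AS INTEGER /* essentially an array index */
        FIELD readValue AS DECIMAL
        INDEX ixPrime IS PRIMARY UNIQUE readTime readType.

    /* Denormalized: one record per timestamp, types mapped to extent slots */
    DEFINE TEMP-TABLE ttSample NO-UNDO
        FIELD readTime  AS DATETIME
        FIELD readValue AS DECIMAL EXTENT 128 /* deliberately oversized */
        INDEX ixPrime IS PRIMARY UNIQUE readTime.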

The extent based approach is much, much more efficient in terms of disk space, memory usage and performance.

Keep in mind that this is a special case where the trade-offs were carefully considered and tested.  Not a standard approach or a recommendation for such.

Posted by Mike Fechner on 24-Feb-2014 11:19

ProDatasets exchanged between AppServer and (GUI) client may also be a perfect place for denormalization, as every byte may count on the wire.
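
For instance (hypothetical names), a dataset temp-table can carry twelve values in one row rather than as twelve child rows:

    /* one row with an extent instead of twelve child rows on the wire */
    DEFINE TEMP-TABLE ttOrderTotals NO-UNDO
        FIELD orderNum   AS INTEGER
        FIELD monthTotal AS DECIMAL EXTENT 12.

    DEFINE DATASET dsOrderTotals FOR ttOrderTotals.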

Posted by ChUIMonster on 24-Feb-2014 12:12

To clarify my comment above -- when I say 50 to 100 records I mean 50 to 100 per timestamp.  There are hundreds of millions of records in total -- so the difference is magnified considerably across that dataset.  The inefficiencies of the normalized approach were not immediately apparent when there were only millions of records.  When there got to be tens of millions, it started to become apparent that this was maybe not the best approach...

Posted by gus on 04-Mar-2014 23:02

Yes. An array has slightly more overhead to mark the start and end of the array. But it is insignificant overhead. You will never notice, even if you use a magnifying glass.

Posted by James Palmer on 05-Mar-2014 02:29

Gus, many thanks for wading in to this. I'll have to buy you a beer next time I see you, as that helps me win the wager! ;)
