A theoretical question for those of you who like that sort of thing.
Is there any difference in storage utilisation between having one field on a table with an extent of 10 and having 10 separate fields on the table?
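To make the comparison concrete, the two layouts look roughly like this in ABL (temp-table syntax used just for illustration, and the field names are made up):
[code]
/* Option 1: one field with an extent of 10 (an ABL array field). */
DEFINE TEMP-TABLE ttOptionA NO-UNDO
    FIELD custId AS INTEGER
    FIELD qty    AS INTEGER EXTENT 10
    INDEX idxCust IS PRIMARY UNIQUE custId.

/* Option 2: ten separately named fields holding the same values. */
DEFINE TEMP-TABLE ttOptionB NO-UNDO
    FIELD custId AS INTEGER
    FIELD qty01  AS INTEGER
    FIELD qty02  AS INTEGER
    FIELD qty03  AS INTEGER
    FIELD qty04  AS INTEGER
    FIELD qty05  AS INTEGER
    FIELD qty06  AS INTEGER
    FIELD qty07  AS INTEGER
    FIELD qty08  AS INTEGER
    FIELD qty09  AS INTEGER
    FIELD qty10  AS INTEGER
    INDEX idxCust IS PRIMARY UNIQUE custId.
[/code]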
Hi James,
What happens if you need 11?
[quote user="mallen"]
What happens if you need 11?
[/quote]
Thanks both. I realise there's a problem with either approach. The question was posed to me by a colleague as a theoretical exercise, so yes, the ideal answer would be to find a different approach! :)
Thanks for the thoughts, Mike. Confirms my suspicions! :)
I suppose that if the child table were in an area with a different RPB (it's likely to be a smaller table, I'd have thought), then whilst the individual records may well be bigger than in the extent/field solution, you may find that many records don't take up much more space overall. I could be wrong of course! :)
The real point, though, is that one shouldn't be making decisions about data structure based on what must be microscopically trivial differences in total storage, especially in this era of cheap disk.
Completely agree, Thomas. It was a trivia question more than anything.
I may regret saying this but...
I am generally a fierce advocate of normalization, but there are times when an array offers a significant performance advantage.
I have a particular use case where I could either have 50 to 100 records that all share the same leading key component (a timestamp) and are distinguished by a "type" field that is essentially an array index.
Or I could have one record with an extent of X and decide on a mapping of those indexes to array positions. Making the array much bigger than it needs to be has no serious impact on storage and allows me to add new mappings easily.
The extent based approach is much, much more efficient in terms of disk space, memory usage and performance.
Keep in mind that this is a special case where the trade-offs were carefully considered and tested. Not a standard approach or a recommendation for such.
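For illustration only (the table names, field names and extent size here are invented, not the real schema), the two shapes would be along these lines:
[code]
/* Normalized shape: 50-100 records per timestamp, one per "type". */
DEFINE TEMP-TABLE ttReadingRow NO-UNDO
    FIELD readingTime  AS DATETIME
    FIELD readingType  AS INTEGER
    FIELD readingValue AS DECIMAL
    INDEX idxTimeType IS PRIMARY UNIQUE readingTime readingType.

/* Extent shape: one record per timestamp, with each "type" mapped to
   an array position. The extent is deliberately oversized so that new
   mappings can be added without touching the schema. */
DEFINE TEMP-TABLE ttReadingArray NO-UNDO
    FIELD readingTime  AS DATETIME
    FIELD readingValue AS DECIMAL EXTENT 128
    INDEX idxTime IS PRIMARY UNIQUE readingTime.
[/code]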
ProDatasets exchanged between AppServer and (GUI) client may also be a perfect place for denormalization, as every byte may count on the wire.
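A minimal sketch of the idea (names are hypothetical): a denormalized temp-table with an extent field means fewer, wider records in the ProDataset that travels between AppServer and client.
[code]
/* Denormalized temp-table plus ProDataset definition for the
   AppServer/client round trip. */
DEFINE TEMP-TABLE ttSample NO-UNDO
    FIELD sampleTime  AS DATETIME
    FIELD sampleValue AS DECIMAL EXTENT 100.

DEFINE DATASET dsSample FOR ttSample.
[/code]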
To clarify my comment above: when I say 50 to 100 records I mean 50 to 100 per timestamp. There are hundreds of millions of records in total, so the difference is magnified considerably across that dataset. The inefficiencies of the normalized approach were not immediately apparent when there were only millions of records. Once there were tens of millions, it started to become apparent that this was maybe not the best approach...
Yes. An array has slightly more overhead to mark the start and end of the array. But it is insignificant overhead. You will never notice, even if you use a magnifying glass.
Many thanks for wading in to this. I'll have to buy you a beer next time I see you, as that helps me win the wager! ;)