Hey guys...
I've been tinkering on and off with partitioning on the Linux environment in the [very little] spare time I've had over the past month or so, and I must admit I have been rather frustrated.
As the sort of tables we would be likely to partition are existing tables with hundreds of millions of records (>10G), what we really need to do is migrate the data in place. And in at least some cases we will want to use unique integer key value ranges.
In my particular attempt therefore, I had generated an extra one million customer records, then attempted to create three partitions of range 500,000. [The sports.customer table is not precisely analogous to our data but is a useful simulation.]
With the given examples being almost entirely driven by metaschema, I would characterize the effort as “struggling through virtually every step”. At the furthest point I got, I received this error:
“Cannot add range value that splits an existing range, partition [partition-name]”
This, along with the context of the given examples, suggests to me that moving existing data may not be supported – that the current design assumes a necessary dump and load of the data. [If so, the functionality will likely be practically unusable for us, since we are primarily dealing with legacy applications & data.]
Is this the case – that existing range data can’t be subdivided – or am I misinterpreting my current results?
Admittedly, the lack of any discernible Unix-based data dictionary tool makes it difficult to know if I’m using the metaschema correctly… but since those are the examples provided, I assume that’s the only method available short of trying the Windows environment.
At the moment I feel like the Linux examples aren’t really in a state that an end user can reasonably evaluate.
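In case it helps pin down what I think I'm seeing, here's a conceptual sketch in Python of the restriction the error message seems to describe (illustrative only, not product behavior or syntax): a range partition is defined by an inclusive upper bound, and a new bound can apparently only be added above every existing bound, never inside an already-defined range.

```python
def add_range_bound(bounds, new_bound):
    """Append a new inclusive upper bound to a sorted list of bounds.

    Mimics the observed behavior: a bound at or below an existing one
    would split that existing range, and is rejected.
    """
    if bounds and new_bound <= bounds[-1]:
        raise ValueError(
            "Cannot add range value that splits an existing range")
    bounds.append(new_bound)
    return bounds

# Adding increasing bounds works...
bounds = add_range_bound([], 1_500_000)
# ...but carving 500,000 out of the already-covered range fails, which
# matches what I saw when trying to create three 500,000-row partitions
# over data that already existed:
try:
    add_range_bound(bounds, 500_000)
except ValueError as err:
    print(err)  # Cannot add range value that splits an existing range
```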
As Fernando said, the scripts in the first ESAP were intended to be followed precisely, not tried on your own database. We limited it this way in order to get early functionality into your hands while 'protecting' you : ) from areas that were not complete. Were you able to run through the process in the script with the canned DBs? Hopefully your experience with the next ESAP will give you a better feel for working with partitions on your own.
Hey Rich...
> The integer range values you mention for the partitioning, are they associated with some time based criteria or something closer to a customer number?
> Are the values of these increasing or at least for the most part increasing?
We are not entirely certain of all the possible scenarios we would entertain the solution for at this point, but there are two specific cases we have in mind that are integer-based.
In case one, the integers are sequence numbers -- as you say, constantly increasing over time. So if record number 100000 happens 01/01/2010 and 200000 happens 01/01/2011, we might partition by year. Theoretically this could be especially useful for a partial tablemove on a 40GB table -- where both the time and the BI growth would otherwise be prohibitive in a production environment.
In case two, it is a more complicated legacy issue which is effectively case one + a "range" where record number = 0. Here, we have a table with a primary composite key which was reused inappropriately, resulting in different types of data. So much of the new data has increasing integers but is intermixed with zero-keyed records.
We understand that it might not be optimal to code the general database engine to account for the second case because it is a poor design to begin with... but of course those of us who are dealing with these legacy issues (which I imagine is a fair percentage of Progress administrators) would appreciate maximal flexibility where it does not cut into performance in the optimal cases.
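To make case two concrete, here is a hypothetical sketch of the routing we'd want. The partition names and boundaries are invented for illustration, and this is Python rather than anything resembling product syntax: zero-keyed legacy records go to a dedicated partition, and everything else is routed by integer range.

```python
from bisect import bisect_left

# Hypothetical "case two" layout: zero-keyed legacy records are routed
# to their own partition; all other records are routed by integer range.
ZERO_PARTITION = "p_legacy_zero"
RANGE_BOUNDS = [500_000, 1_000_000, 1_500_000]  # inclusive upper bounds

def partition_for(record_number):
    if record_number == 0:
        return ZERO_PARTITION
    # find the smallest upper bound that covers this record number
    i = bisect_left(RANGE_BOUNDS, record_number)
    if i == len(RANGE_BOUNDS):
        raise ValueError("record number exceeds the defined ranges")
    return f"p_range_{RANGE_BOUNDS[i]}"

print(partition_for(0))        # p_legacy_zero
print(partition_for(750_000))  # p_range_1000000
```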
Fernando / mlynch:
Thanks for the follow up. I was hoping that the current examples were very preliminary, so that confirms what I was presuming was true.
Hopefully the next iteration will let us offer more useful feedback; I'll take a closer look at the split target logic in the meantime.
1. You will identify your existing table as being a partitioned table and create a partition definition with the one range partition field that has some logical grouping value at least >= the largest existing value in that field, say 400,000 for example. This will “cap” the existing partition where it is today, with all records with the partitioned field <= 400,000 residing in the current location/partition. (New records can still be added to this initial partition as long as the field value is in the range <= 400,000.)
2. To add new data to this table that is > 400,000 for your partitioned field, create a new partition with a range value inclusive of your next logical range grouping, say 500,000 for example. All new data (or data in the range of 400,001 thru 500,000) will reside in this new partition when those records are added. This partition can be created in the same or different Type II storage area than the original data.
3. So now you’ve got one 40GB partition (your original data) and one partition waiting to receive new data. This is helpful for managing new data but is really not partitioning as you want it since you still need to deal with that 40GB partition.
4. If you want to further partition your existing 40GB of original data, you will use the “split” option of proutil to split logical groupings of data out of the big partition into new partitions. This is accomplished by creating one or more new partitions with the “split target” attribute set on them and running the proutil split command as described in Migration example #2 of ESAP drop #1. Yes, this part does indeed require data movement, but it can be done online without crazy growth to the BI file and without affecting the parts of the data that are not being split, so your application can run unaffected while this operation is executing (provided the application does not try to update the data being split, since it is locked until completion).
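The steps above can be sketched as a toy state model in Python. Everything here is illustrative: nothing is proutil syntax, the bounds are just the example values from the steps, and the rule that only a split-target partition may sit inside existing data is my reading of the description, not a documented constraint.

```python
# Toy model of the partition states described in steps 1-4.
partitions = []  # each entry: {"name", "upper", "split_target"}

def cap_existing(bound):
    """Step 1: the existing data becomes one partition holding every
    record whose partitioned field is <= bound."""
    partitions.append({"name": "p_original", "upper": bound,
                       "split_target": False})

def add_partition(name, bound, split_target=False):
    """Steps 2 and 4: an ordinary new partition must sit above every
    existing upper bound; only a partition marked as a split *target*
    may carve a range out of existing data (assumption based on the
    description of proutil split in step 4)."""
    if not split_target and any(bound <= p["upper"] for p in partitions):
        raise ValueError(
            "Cannot add range value that splits an existing range")
    partitions.append({"name": name, "upper": bound,
                       "split_target": split_target})

cap_existing(400_000)                                # step 1: cap at 400,000
add_partition("p_500k", 500_000)                     # step 2: new-data partition
add_partition("p_200k", 200_000, split_target=True)  # step 4 prep: split target
# Step 4 proper: proutil split then moves the records <= 200,000 out of
# p_original into p_200k (command syntax omitted -- see Migration
# example #2 in the ESAP).
```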
Cool -- this is primarily what we're expecting. It's just that we're doing it [metaschema] wrong in our initial experiments.
One follow up...
> Yes, this part does indeed require data movement but can be done online without crazy growth to the bi file and without affecting the parts of the data that are not being split so your application can run unaffected while this operation is executing (provided the application does not try to update the data being split since it is locked until completion).
When you say "locked until completion", do you refer to all the data in the range of records being moved (e.g. tablemove, all or nothing), or will it be along the lines of an idxcompact where the reorganization scope is limited and safely interruptible?
Clearly the latter seems preferable -- we are presuming/hoping that if BI does not require crazy growth that the partition split can be partially completed over multiple maintenance windows.
Then we just have to deal with [slightly less crazy] AI growth/pica. (And we've heard replication improvements are coming as well...)
I think we are all eagerly awaiting a drop that we can use more extensively with our own data, to see how easy it is to migrate. A dump and load would not necessarily be a bad idea: you may find that once you partition data based on certain fields and then create global and partition indexes, it could be beneficial to dump and load the data. But my understanding so far is that it should not be a requirement for you to use/activate partitioning.
[quote user="shaske1"]
When you say "locked until completion", do you refer to all the data in the range of records being moved (e.g. tablemove, all or nothing), or will it be along the lines of an idxcompact where the reorganization scope is limited and safely interruptible?
Clearly the latter seems preferable -- we are presuming/hoping that if BI does not require crazy growth that the partition split can be partially completed over multiple maintenance windows.
Then we just have to deal with [slightly less crazy] AI growth/pica. (And we've heard replication improvements are coming as well...)
[/quote]
Only the data in the range to be split out will be locked down. Other data in that same partition which is not being split out will remain accessible when it is explicitly selected by the restriction (WHERE clause) in the query.
The split command operates by moving only a user-specified number of records per transaction. This way it remains "recoverable", BI space does not grow unbounded, and it works in replication environments.
When I say "recoverable" I mean that if interrupted, the command will re-start where it left off. It does not mean that once the split command is cancelled the entire split operation can be undone. It must be run to completion once started.
If the split command is interrupted for some reason, the data will remain locked (even if you restart the database!). The split command when re-executed will pick up where it left off.
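For anyone trying to picture that behavior, here is a rough Python analogy of a batched, resume-from-checkpoint move. The batching and checkpoint details are my assumptions for illustration, not the actual proutil implementation.

```python
def split_batched(records, belongs_in_target, batch_size, checkpoint):
    """Move matching records a batch at a time, recording progress in
    `checkpoint` so an interrupted run resumes where it left off rather
    than undoing completed work -- mirroring the description above:
    small transactions keep BI growth bounded, and the operation must
    eventually run to completion."""
    moved = []
    i = checkpoint.get("pos", 0)  # resume point if previously interrupted
    while i < len(records):
        batch = [r for r in records[i:i + batch_size] if belongs_in_target(r)]
        moved.extend(batch)       # one short transaction per batch
        i += batch_size
        checkpoint["pos"] = i     # durable progress marker
    return moved

ckpt = {}
moved = split_batched(list(range(10)), lambda r: r >= 5, 3, ckpt)
print(moved)  # [5, 6, 7, 8, 9]
# Re-running with the same checkpoint does no re-work -- it "picks up
# where it left off", which here means there is nothing left to do:
print(split_batched(list(range(10)), lambda r: r >= 5, 3, ckpt))  # []
```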