Hi,
The documentation states the following:
TABLE-SCAN provides the most efficient access method when retrieving all rows of a temp-table or a database table in a Type II Storage area. TABLE-SCAN returns these rows without using an index. When the TABLE-SCAN keyword is used, the AVM only accesses the record block instead of both record and index blocks.
Am I correct in assuming that using TABLE-SCAN to delete all records from a table has no benefit over using the primary index, since the index entry needs to be removed as well?
The difference is the ordering - table scan (I believe) goes by rowid order, which generally means faster access because successive blocks will be "closer" to each other.
Going by the primary index can be an issue if the blocks are not "close" to each other and the engine has to go looking all over the place for them.
But wouldn't the cost of fetching the corresponding index entry and deleting it negate the benefit of accessing the table's records in physical order?
Either way, only one of the two - the table records or the index entries - can be iterated in physical order on disk...
The issue is which direction you're going.
Finding a block using the primary index means taking the data, looking it up in the index, then reading the block that goes with that index entry. When you delete a record, the engine has to delete all the index entries, for all the indexes, that go with that record. Where those entries sit in the index governs how fast this process can run.
Finding a block using TABLE-SCAN avoids that by going record by record in physical order, skipping the index-lookup/block-load part of the process. Deleting a record would only incur the index entry lookups/deletions, and since those are linked by a pointer, there's not much time spent finding the block with the target record.
As an experiment, do a FOR EACH on a big table using
a) a unique index,
b) a non-unique index, then
c) TABLE-SCAN
and take timing measurements. You should find that c) is the fastest, followed by b), then a).
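That experiment can be sketched roughly like this (a minimal sketch against the sports2000 Customer table; the index names CustNum and Name are assumptions, so substitute whatever indexes your table actually has):

```abl
/* Rough timing comparison of the three access methods.
   ETIME(TRUE) resets the millisecond timer before each pass. */
DEFINE VARIABLE i AS INTEGER NO-UNDO.

ETIME(TRUE).
FOR EACH Customer NO-LOCK USE-INDEX CustNum:   /* a) unique index     */
    i = i + 1.
END.
DISPLAY "a) unique index:"     ETIME "ms" SKIP.

ETIME(TRUE).
FOR EACH Customer NO-LOCK USE-INDEX Name:      /* b) non-unique index */
    i = i + 1.
END.
DISPLAY "b) non-unique index:" ETIME "ms" SKIP.

ETIME(TRUE).
FOR EACH Customer NO-LOCK TABLE-SCAN:          /* c) no index at all  */
    i = i + 1.
END.
DISPLAY "c) TABLE-SCAN:"       ETIME "ms" SKIP.
```

Run it more than once (or against a table much larger than -B) so buffer-pool warm-up doesn't skew the first pass.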
> Am I correct in assuming that using TABLE-SCAN to delete all records from a table has no benefit over using the primary index, since the index entry needs to be removed as well?
I would say the answer is "it depends"; TABLE-SCAN might be better or worse than reading with an index. It depends on the state of your table.
The benefit of TABLE-SCAN, in theory, is that you don't have to read any index blocks to access the RM blocks in your table. Another benefit is that because it accesses the blocks by following the cluster chain, it will read each RM block into memory just once.
The downside of TABLE-SCAN is that it will read all RM blocks in the table, whether or not they contain records. By contrast, a scan via an index will only read RM blocks that contain at least one record. So if this table previously contained many more records than it does now, and it has a very long RM chain, the TABLE-SCAN may read many more RM blocks than the index scan would. However the downside of the index scan is that if the table is heavily scattered and quite large, compared to -B, you might read a given RM block from disk multiple times to delete its records.
So without knowing more about your table, it is difficult to say which approach is better.
Also, if this is a very large table, it might be faster for you to just dump the table's schema, drop it, and create it again rather than deleting records programmatically. Another alternative approach, if it is in its own area, is to truncate the area.
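For the truncate route, the command is a one-liner against an offline database (a sketch: "mydb" and the area name are placeholders, and the table must be the only object in that area, since TRUNCATE AREA empties everything in it):

```
proutil mydb -C truncate area "Customer Data"
```

Check the PROUTIL documentation for the prerequisites before running this against anything you care about.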
Hi Tim,
You don't have to convince me TABLE-SCAN is the fastest way of reading all records when the order isn't important.
I completely understand the principle here ;-)
It is the special case of deleting a record that makes me wonder whether the benefit is still there, since at that point the engine also needs to find all the index entries, for all indexes, pointing to that record, and finding those might negate the performance benefit of TABLE-SCAN...
Think of it this way. The DELETE is going to go through the same work regardless of how the record was found in the first place. So clearly if TABLE-SCAN makes finding each record faster, then the whole process will be faster.
You seem to be thinking that if we use the index to find the record then deleting the record and the indexes will somehow be faster. That is not true. Well, maybe index blocks will already be in memory that wouldn't have been there. But these blocks needed to be loaded once regardless. So there would be no gain in performance overall.
I would have hoped table-scan could be used after deactivating the indexes but
for each customer table-scan: delete customer. end.
fails with
Index CustNum is inactive and cannot be referenced. (995)
Hi Laura,
That was more or less what I was thinking.
I did not think going through the index would be faster; I did, however, think using TABLE-SCAN would be only marginally faster (if at all) because of the work the DELETE causes (i.e. finding the corresponding index entries and deleting them).
Like you said, the same work has to be done, which leads me to believe TABLE-SCAN is of little use when deleting records...
Guess I'll have to write some benchmarks...
The inactive index error when using table-scan is a bug, if the table is in a type II area.
Tested on a brand new OE 11.7 sports2000 database, Type II. "for each customer table-scan: end." fails with error 995 if the indexes are deactivated.
> On May 23, 2017, at 9:50 AM, Lieven De Foor wrote:
>
> Am I correct in assuming that using TABLE-SCAN to delete all records from a table has no benefit over using the primary index
No.
Try it on a big table for yourself and you will see. I use "for each blah table-scan" every time I have to delete many records, and it works extremely well. Way faster than without the TABLE-SCAN.
OK guys, thanks for all the answers.
I'll do some experiments myself to confirm this.
Keep us informed. I'm very curious.
maybe what we’re looking at is an ‘empty table’ statement… it sure should be fairly easy to trash everything away :)
note that:
0) your mileage may vary depending on the number of indexes the table of interest has. The more indexes, the less difference it will make.
1) a TABLE-SCAN on a table in a Type I data area will use the primary index because there is no better way.
2) you should not be using Type I data areas anyway.
It now turns out TABLE-SCAN can't be used in a dynamic query, making it somewhat useless for my particular use case...
Is this something that could be proposed as a feature enhancement?
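For anyone who hasn't hit it, the limitation shows up at prepare time (a minimal sketch; the handle names are illustrative, and the exact error raised by QUERY-PREPARE may vary by release):

```abl
/* TABLE-SCAN is accepted in a static FOR EACH, but rejected
   when the same phrase is handed to a dynamic query. */
DEFINE VARIABLE hQuery  AS HANDLE NO-UNDO.
DEFINE VARIABLE hBuffer AS HANDLE NO-UNDO.

CREATE BUFFER hBuffer FOR TABLE "Customer".
CREATE QUERY hQuery.
hQuery:SET-BUFFERS(hBuffer).

hQuery:QUERY-PREPARE("FOR EACH Customer TABLE-SCAN") NO-ERROR.
IF ERROR-STATUS:ERROR THEN
    MESSAGE ERROR-STATUS:GET-MESSAGE(1) VIEW-AS ALERT-BOX.

DELETE OBJECT hQuery.
DELETE OBJECT hBuffer.
```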
In the 11.0 functional spec of the TABLE-SCAN feature, it is noted that adding this facility to dynamic queries is not a requirement in the initial release. I don't recall other developers noting the lack of this functionality in the past, but I can add it to our backlog for consideration in a future release.
I think support for dynamic queries would be a great feature. We cannot use TABLE-SCAN because the places in our app where we would need it all use dynamic queries.
I'm pretty sure it has come up more than once in "info exchanges" and other such venues.
I suppose it might have made sense to omit it from the initial release if it was going to somehow extend the timeline but I cannot understand why it would not be on the radar as an obvious "completion" sort of thing.
I believe Tom is correct. I seem to recall it was discussed at one of the PUG Challenges.
I've created a community idea so you can add your votes ;-)
community.progress.com/.../extend_table-scan_support_to_dynamic_queries