TABLE-SCAN - Useless when deleting records?

Posted by Lieven De Foor on 23-May-2017 08:48

Hi,

The documentation states the following:

TABLE-SCAN provides the most efficient access method when retrieving all rows of a temp-table or a database table in a Type II Storage area. TABLE-SCAN returns these rows without using an index. When the TABLE-SCAN keyword is used, the AVM only accesses the record block instead of both record and index blocks.

Am I correct in assuming that using TABLE-SCAN to delete all records from a table has no benefit over using the primary index, since the index entry needs to be removed as well?

Posted by Laura Stern on 23-May-2017 10:02

Think of it this way.  The DELETE is going to go through the same work regardless of how the record was found in the first place.  So clearly if TABLE-SCAN makes finding each record faster, then the whole process will be faster.  

You seem to be thinking that if we use the index to find the record then deleting the record and the indexes will somehow be faster.  That is not true.  Well, maybe index blocks will already be in memory that wouldn't have been there.  But these blocks needed to be loaded once regardless.  So there would be no gain in performance overall.

Posted by gus bjorklund on 23-May-2017 17:28

> On May 23, 2017, at 9:50 AM, Lieven De Foor wrote:

>

> Am I correct in assuming that using TABLE-SCAN to delete all records from a table has no benefit over using the primary index

No.

Try it on a big table for your own self and you will see. I use "for each blah table-scan" every time i have to delete many records and it works extremely well. Way faster than without the table-scan.

Posted by Lieven De Foor on 24-May-2017 02:00

OK guys, thanks for all the answers.

I'll do some experiments myself to confirm this.

All Replies

Posted by Tim Kuehn on 23-May-2017 08:54

The difference is the ordering - table scan (I believes) goes by rowid order which generally means faster access because successive blocks'll be "closer" to each other.

Going by primary index can be an issue if the blocks are not "close" to each other and the engine has to go looking all over the place for them.

Posted by Lieven De Foor on 23-May-2017 09:01

But wouldn't the cost of fetching the corresponding index entry and deleting that one negate the benefit from accessing the records in the table in physical order?

Either way, only one of both table records or index entries can be iterated in the physical order on disk...

Posted by Tim Kuehn on 23-May-2017 09:16

The issue is which direction you're going.

Finding a block using the primary index means taking the data, looking it up in the index, then reading the block that goes with that index. When you delete a record the engine has to delete all the index entries for all the indexes that go that record. Depending on where the blocks are in the index will govern how fast this process can run.

Finding a block using table-scan avoids that by going record by record in order by physical record and skipping the index-lookup/block load part of the process. Deleting a record would only incur an index entry lookup / deletion. Since those are linked by a pointer, there's not much time used in finding a block with the target record.

As an experiment do a FOR EACH on a big table using

a) a unique index,

B a non-unique index, then

c)TABLE-SCAN

and take timing measurements. You should find that C is the fastest, followed by B, then A.

Posted by Rob Fitzpatrick on 23-May-2017 09:21

> Am I correct in assuming that using TABLE-SCAN to delete all records from a table has no benefit over using the primary index, since the index entry needs to be removed as well?

I would say the answer is "it depends"; TABLE-SCAN might be better or worse than reading with an index.  It depends on the state of your table.

The benefit of TABLE-SCAN, in theory, is that you don't have to read any index blocks to access the RM blocks in your table.  Another benefit is that because it accesses the blocks by following the cluster chain, it will read each RM block into memory just once.

The downside of TABLE-SCAN is that it will read all RM blocks in the table, whether or not they contain records.  By contrast, a scan via an index will only read RM blocks that contain at least one record.  So if this table previously contained many more records than it does now, and it has a very long RM chain, the TABLE-SCAN may read many more RM blocks than the index scan would.  However the downside of the index scan is that if the table is heavily scattered and quite large, compared to -B, you might read a given RM block from disk multiple times to delete its records.

So without knowing more about your table, it is difficult to say which approach is better.

Also, if this is a very large table, it might be faster for you to just dump the table's schema, drop it, and create it again rather than deleting records programmatically.  Another alternative approach, if it is in its own area, is to truncate the area.

Posted by Lieven De Foor on 23-May-2017 09:24

Hi Tim,

You don't have to convince me TABLE-SCAN is the fastest way of reading all records when the order isn't important.

I completely understand the principle here ;-)

It is the special case of deleting a record that makes me wonder if the benefit is still there, since at that point the engine needs to find all index entries for all indexes pointing to that record too, and finding these might negate the performance benefit of TABLE-SCAN...

Posted by Laura Stern on 23-May-2017 10:02

Think of it this way.  The DELETE is going to go through the same work regardless of how the record was found in the first place.  So clearly if TABLE-SCAN makes finding each record faster, then the whole process will be faster.  

You seem to be thinking that if we use the index to find the record then deleting the record and the indexes will somehow be faster.  That is not true.  Well, maybe index blocks will already be in memory that wouldn't have been there.  But these blocks needed to be loaded once regardless.  So there would be no gain in performance overall.

Posted by cverbiest on 23-May-2017 10:07

I would have hoped table-scan could be used after deactivating the indexes but

for each customer table-scan: delete customer. end.

fails with

Index CustNum is inactive and cannot be referenced. (995)

Posted by Lieven De Foor on 23-May-2017 10:13

Hi Laura,

That was more or less what I was thinking.

I did not think going through the index would be faster, I did however think using TABLE-SCAN would only be marginally faster (if at all) because of the work the DELETE will cause (i.e. get the corresponding index entries and delete those).

Like you said, the same work has to be done, which leads me to believe TABLE-SCAN is of little use when deleting records...

Guess I'll have to write some benchmarks...

Posted by Fernando Souza on 23-May-2017 10:16

The inactive index error when using table-scan is a bug, if the table is in a type II area.

Posted by cverbiest on 23-May-2017 10:53

Tested brand new oe11.7 sports2000, Type II  . "for each customer table-scan:  end." fails with error 995 if the indexes are deactivated

Posted by gus bjorklund on 23-May-2017 17:28

> On May 23, 2017, at 9:50 AM, Lieven De Foor wrote:

>

> Am I correct in assuming that using TABLE-SCAN to delete all records from a table has no benefit over using the primary index

No.

Try it on a big table for your own self and you will see. I use "for each blah table-scan" every time i have to delete many records and it works extremely well. Way faster than without the table-scan.

Posted by Lieven De Foor on 24-May-2017 02:00

OK guys, thanks for all the answers.

I'll do some experiments myself to confirm this.

Posted by e.schutten on 24-May-2017 03:49

Keep us informed. I'm very curious.

Posted by marian.edu on 24-May-2017 04:04

maybe what we’re looking at is an ‘empty table’ statement… it sure should be fairly easy to trash everything away :)


Marian Edu

Acorn IT 
+40 740 036 212

Posted by gus bjorklund on 24-May-2017 06:14

note that:

0) your mileage may vary depending on the number of indexes the table of interest has. the more indexes, the less difference it will make.

1) a table-scan on a table in a type i data area will use the primary index because there is no better way.

2) you should not be using type i data areas anyway.=

Posted by Lieven De Foor on 24-May-2017 06:28

It now turns out TABLE-SCAN can't be used in a dynamic query, making it somewhat useless for my particular use case...

Is this something that could be proposed as a feature enhancement?

Posted by Evan Bleicher on 24-May-2017 07:12

In the 11.0 functional spec of the TABLE-SCAN feature, it is noted that adding this facility to dynamic queries is not a requirement in the initial release.  I don't recall other developers noting the lack of this functionality in the past, but I can add it to our backlog for consideration in a future release.  

Posted by Roger Blanchard on 24-May-2017 07:16

I think support for dynamic queries would be a great feature. We cannot use TABLE-SCAN since where we would need to use it in our app uses dynamic queries.

Posted by ChUIMonster on 24-May-2017 07:34

I'm pretty sure it has come up more than once in "info exchanges" and other such venues.

I suppose it might have made sense to omit it from the initial release if it was going to somehow extend the timeline but I cannot understand why it would not be on the radar as an obvious "completion" sort of thing.

Posted by Roger Blanchard on 24-May-2017 08:09

I believe Tom is correct. I seem to recall it was discussed at one of the PUG Challenges .

Posted by Lieven De Foor on 24-May-2017 08:23

I've created a community idea to add your votes ;-)

community.progress.com/.../extend_table-scan_support_to_dynamic_queries

This thread is closed