Hi, I need to delete large amounts of data from a table on a regular basis and I was wondering what the best way of doing that would be.
I'm using OpenEdge 10.2A.
Thank you.
Drop the table.
Well I don't need to delete all the data from it.
How often will you need to delete the data?
I have some routines I perform at low-usage times, like 4 am daily. If your job runs weekly, you could likely hit some low usage on Sundays.
I have a scheduled task that currently runs once a day. The task time to run has increased significantly as the amount of data has increased over time.
I'm currently analyzing my system to find better down times, as you mentioned, since my task is now stepping over some operation times and affecting those operations.
I'm also considering running the task more than once a day.
I've had the least amount of complaints by running large processes at 4am server time, and more conservative transactions at lunchtime. Most of my users are off the db's during their lunch.
You could also add a third process time of "end of work + 1 hour" - also run conservatively.
We are a 24 hour shop, but the bulk of our users are active from 7am - 6pm. So gauge your own environment accordingly. It may help to consult the dbname.lic file in your database directory, in order to figure out low usage times. That would be an indirect way to do that, as the .lic file will just tell you how many users are logged on, not if anyone is crunching serious CPU/hard disk.
lace28 wrote:
Hi, I need to delete large amounts of data from a table on a regular basis and I was wondering what the best way of doing that would be.
I'm using OpenEdge 10.2A.
Thank you.
How is the delete code structured? Deleting data a record at a time is a lot slower than deleting them in groups of 100 or 1000.
Thanks about the lic file I didn't know that. As far as low usage time I'm confident to find the right times after a little bit more analysis.
It's interesting that you mention that as I'm currently testing different code structures.
As it is, most of the work is done inside for-each loops that delete one record at a time:
for each table1 where <date condition>:
    delete table1.
end.

for each table2 where <date condition>:
    delete table2.
end.

etc...
How exactly would you delete them in groups?
Something like this does the job nicely, and groups the transactions into blocks of 100 records per delete:

DEFINE VARIABLE i AS INTEGER NO-UNDO.

DEFINE QUERY q-name FOR table-name.

OPEN QUERY q-name
    FOR EACH table-name
    WHERE table-name.field-name = "something".

GET FIRST q-name NO-LOCK.

tx-block:
REPEAT TRANSACTION:
    DO i = 1 TO 100:
        IF NOT AVAILABLE table-name THEN
            LEAVE tx-block.
        GET CURRENT q-name EXCLUSIVE-LOCK.
        DELETE table-name.
        GET NEXT q-name EXCLUSIVE-LOCK.
    END.
END.
Sounds good. I'll do some testing and post my results. Thanks.
You might want to test that assumption.
Grouping is great for creation & updates. My experience is that it doesn't help deletes.
Ok here are some results for deleting chunks of 10 000 records from one table.
Code1:
for each table1:
    delete table1.
    icnt = icnt + 1.
    if icnt > 9999 then leave.
end.
Average time: 12.7 seconds
Code2:
DEFINE QUERY q-name FOR table1.

OPEN QUERY q-name FOR EACH table1.

GET FIRST q-name NO-LOCK.

tx-block:
REPEAT TRANSACTION:
    DO icnt = 1 TO 100:
        IF NOT AVAILABLE table1 THEN LEAVE tx-block.
        GET CURRENT q-name EXCLUSIVE-LOCK.
        DELETE table1.
        GET NEXT q-name EXCLUSIVE-LOCK.
    END.
    icnt2 = icnt2 + 100.
    if icnt2 > 9999 then leave.
END.
Average time: 9.8 seconds
Code3:

do transaction:
    for each table1:
        delete table1.
        icnt = icnt + 1.
        if icnt > 9999 then leave.
    end.
end.
Average time: 10.9 seconds
So code 2 is faster than code 1, and if you project the results to a deletion of over 1,000,000 records, the difference is pretty significant.
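Projecting those measured rates linearly (which assumes the per-batch cost stays roughly constant as the table shrinks, so treat this as a rough estimate), the gap over 1,000,000 records works out to:

```
1,000,000 / 10,000 = 100 batches of the benchmark size
100 × 12.7 s = 1,270 s ≈ 21 minutes   (code 1, one transaction per record)
100 ×  9.8 s =   980 s ≈ 16 minutes   (code 2, transactions of 100 records)
```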
I guess I'll see how I can incorporate that into my daily job and see how it goes.
Thanks again, and if anyone has any more input or suggestions on this it is welcomed.
It might help if you would tell us a little about the actual problem. In your example, it looks like you are deleting the most recent data, which would be unusual. If we knew something about the process, we might be able to suggest different approaches.
E.g., years and years ago, before we had temp-tables for building temporary data for reporting, I developed a technique by which we made a copy of an empty database with a generic schema, filled that DB with the data for the report, and at the end simply deleted the database. Back in the day, this made a really dramatic improvement in the overall processing time.
So tell us a bit about where the data comes from and why you are deleting it.
I'm actually deleting the data that's older than 30 days.
I'm deleting it because there's a lot of data in some tables and the database grows pretty big pretty quickly.
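For a purge like that, the 30-day cutoff can be combined with the batched-transaction approach shown earlier in the thread. This is only a sketch: `table1`, `table1.create-date`, and the batch size of 100 are placeholder names and values to swap for your own schema and tuning:

```
/* Sketch: purge rows older than 30 days in transactions of 100 records.
   table1 and table1.create-date are placeholder names. */
DEFINE VARIABLE i AS INTEGER NO-UNDO.

DEFINE QUERY q-purge FOR table1.

OPEN QUERY q-purge
    FOR EACH table1
    WHERE table1.create-date < TODAY - 30.

GET FIRST q-purge NO-LOCK.

purge-block:
REPEAT TRANSACTION:
    DO i = 1 TO 100:
        IF NOT AVAILABLE table1 THEN LEAVE purge-block.
        GET CURRENT q-purge EXCLUSIVE-LOCK.
        DELETE table1.
        GET NEXT q-purge EXCLUSIVE-LOCK.
    END.
END.
```

An index on the date field matters here: without one, the query has to scan the whole table to find the old rows, which can cost more than the deletes themselves.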
Your main factor with this is /when/ you run this routine.
If you ran it daily you would have less deletes each time = quicker runtime.
If you ran it weekly - and have a nice slot of time where no one is actively using the system - then who cares if it takes 9.1 minutes compared to 8.6 minutes?
So get your timing down - then do your tweaks.
I hate to see people throwing away information. Define "pretty big".
If some process is slowing down as the amount of data in the table increases, then it is likely that the fault is the query in that process, not the amount of data in the table.
Like I already said I'll figure out the best time to run the job.
And it's more like 12.7 secs per 10 000 records vs 9.8 secs per 10 000 records, projected over 2.5 million records and eventually more. It might not be that big a deal, but I care.
No process is slowing down. It's about disk space and getting rid of lots of unnecessary data.
Like I said, how big is big. Disk is awfully cheap these days....
The data I'm cleaning goes through a first database before getting transferred to another.
The first db acts as an intermediate, and that's the one I'm cleaning regularly.
So DB1 is 10 gigs now and DB2 is 60 gigs. DB2 grew 25 gigs in the last year and will grow by more than that in the next year.
I understand disks are cheap but isn't it a good idea to delete useless data?
Really useless data is, of course, useless ... but is it really useless? I've had customers go back years mining information from old transaction details. This is the era of Big Data!
Right! I'll need to get used to it. Thanks for your help.
Then again, we may be deluding ourselves into the belief that all that data is actually going to be good for something someday. I bet often it turns out not to be.
Clearly, in fact, one of the lessons of Big Data is that merely collecting large amounts of it is unlikely to produce meaningful results. I am merely suggesting that deciding not to keep data simply because it is big is just as wrong as deciding to keep everything just in case.