Hi, I need to delete large amounts of data from a table on a regular basis and I was wondering what the best way of doing that would be.
I'm using OpenEdge 10.2A.
Thank you.
Drop the table.
Well I don't need to delete all the data from it.
How often will you need to delete the data?
I have some routines I perform at low-usage times, like 4 am daily. If your job runs weekly, you could likely hit some low usage on Sundays.
I have a scheduled task that currently runs once a day. The task time to run has increased significantly as the amount of data has increased over time.
I'm currently analyzing my system to find better down times, as you mentioned, since my task is now stepping over some operation times and affecting those operations.
I'm also considering running the task more than once a day.
I've had the least amount of complaints by running large processes at 4am server time, and more conservative transactions at lunchtime. Most of my users are off the db's during their lunch.
You could also add a third process time of "end of work + 1 hour" - also run conservatively.
We are a 24 hour shop, but the bulk of our users are active from 7am - 6pm. So gauge your own environment accordingly. It may help to consult the dbname.lic file in your database directory, in order to figure out low usage times. That would be an indirect way to do that, as the .lic file will just tell you how many users are logged on, not if anyone is crunching serious CPU/hard disk.
lace28 wrote:
Hi, I need to delete large amounts of data from a table on a regular basis and I was wondering what the best way of doing that would be.
I'm using OpenEdge 10.2A.
Thank you.
How is the delete code structured? Deleting data a record at a time is a lot slower than deleting them in groups of 100 or 1000.
Thanks about the lic file I didn't know that. As far as low usage time I'm confident to find the right times after a little bit more analysis.
It's interesting that you mention that as I'm currently testing different code structures.
As it is, most of the work is done inside for-each loops that delete one record at a time:
for each table1 where <date condition>:
    delete table1.
end.

for each table2 where <date condition>:
    delete table2.
end.

etc...
How exactly would you delete them in groups?
Something like this does the job nicely, and groups the transactions into blocks of 100 records per delete:

DEFINE VARIABLE i AS INTEGER NO-UNDO.

DEFINE QUERY q-name FOR table-name.

OPEN QUERY q-name
    FOR EACH table-name
    WHERE table-name.field-name = "something".

GET FIRST q-name NO-LOCK.

tx-block:
REPEAT TRANSACTION:
    DO i = 1 TO 100:
        IF NOT AVAILABLE table-name THEN
            LEAVE tx-block.
        GET CURRENT q-name EXCLUSIVE-LOCK.
        DELETE table-name.
        GET NEXT q-name EXCLUSIVE-LOCK.
    END.
END.
Sounds good. I'll do some testing and post my results. Thanks.
You might want to test that assumption.
Grouping is great for creation & updates. My experience is that it doesn't help deletes.
Ok here are some results for deleting chunks of 10 000 records from one table.
Code1:
for each table1:
    delete table1.
    icnt = icnt + 1.
    if icnt > 9999 then leave.
end.
Average time: 12.7 seconds
Code2:
DEFINE QUERY q-name FOR table1.

OPEN QUERY q-name FOR EACH table1.

GET FIRST q-name NO-LOCK.

tx-block:
REPEAT TRANSACTION:
    DO icnt = 1 TO 100:
        IF NOT AVAILABLE table1 THEN LEAVE tx-block.
        GET CURRENT q-name EXCLUSIVE-LOCK.
        DELETE table1.
        GET NEXT q-name EXCLUSIVE-LOCK.
    END.
    icnt2 = icnt2 + 100.
    if icnt2 > 9999 then leave.
END.
Average time: 9.8 seconds
Code3:

do transaction:
    for each table1:
        delete table1.
        icnt = icnt + 1.
        if icnt > 9999 then leave.
    end.
end.
Average time: 10.9 seconds
So code 2 is faster than code 1, and if you project the results to a deletion of over 1,000,000 records, the difference is pretty significant.
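Projecting those measured rates linearly (which assumes the per-batch cost stays roughly constant as the table shrinks, so treat this as a rough estimate), the gap over 1,000,000 records works out to:

```
1,000,000 / 10,000 = 100 batches of the benchmark size
100 × 12.7 s = 1,270 s ≈ 21 minutes   (code 1, one transaction per record)
100 ×  9.8 s =   980 s ≈ 16 minutes   (code 2, transactions of 100 records)
```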
I guess I'll see how I can incorporate that into my daily job and see how it goes.
Thanks again, and if anyone has any more input or suggestions on this it is welcomed.
It might help if you would tell us a little about the actual problem. In your example, it looks like you are deleting the most recent data, which would be unusual. If we knew something about the process, we might be able to suggest different approaches.
E.g., years and years ago, before we had temp-tables for building temporary data for reporting, I developed a technique by which we made a copy of an empty database with a generic schema, filled that DB with the data for the report, and at the end simply deleted the database. Back in the day, this made a really dramatic improvement in the overall processing time.
So tell us a bit about where the data comes from and why you are deleting it.
I'm actually deleting the data that's older than 30 days.
I'm deleting it because there's a lot of data in some tables and the database grows pretty big pretty quickly.
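For a purge like that, the 30-day cutoff can be combined with the batched-transaction approach shown earlier in the thread. This is only a sketch: `table1`, `table1.create-date`, and the batch size of 100 are placeholder names and values to swap for your own schema and tuning:

```
/* Sketch: purge rows older than 30 days in transactions of 100 records.
   table1 and table1.create-date are placeholder names. */
DEFINE VARIABLE i AS INTEGER NO-UNDO.

DEFINE QUERY q-purge FOR table1.

OPEN QUERY q-purge
    FOR EACH table1
    WHERE table1.create-date < TODAY - 30.

GET FIRST q-purge NO-LOCK.

purge-block:
REPEAT TRANSACTION:
    DO i = 1 TO 100:
        IF NOT AVAILABLE table1 THEN LEAVE purge-block.
        GET CURRENT q-purge EXCLUSIVE-LOCK.
        DELETE table1.
        GET NEXT q-purge EXCLUSIVE-LOCK.
    END.
END.
```

An index on the date field matters here: without one, the query has to scan the whole table to find the old rows, which can cost more than the deletes themselves.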
Your main factor with this is /when/ you run this routine.
If you ran it daily you would have less deletes each time = quicker runtime.
If you ran it weekly - and have a nice slot of time where no one is actively using the system - then who cares if it takes 9.1 minutes compared to 8.6 minutes?
So get your timing down - then do your tweaks.
I hate to see people throwing away information. Define "pretty big".
If some process is slowing down as the amount of data in the table increases, then it is likely that the fault is the query in that process, not the amount of data in the table.
Like I already said I'll figure out the best time to run the job.
And it's more like 12.7 secs per 10 000 records vs 9.8 secs per 10 000 records, projected over 2.5 million records and eventually more. It might not be that big a deal, but I care.
No process is slowing down. It's about disk space and getting rid of lots of unnecessary data.
Like I said, how big is big. Disk is awfully cheap these days....
The data I'm cleaning goes through a first database before getting transferred to another.
The first db acts as an intermediate, and that's the one I'm cleaning regularly.
So DB1 is 10 gigs now and DB2 is 60 gigs. DB2 grew 25 gigs in the last year and will grow by more than that in the next year.
I understand disks are cheap but isn't it a good idea to delete useless data?
Really useless data is, of course, useless ... but is it really useless? I've had customers go back years mining information from old transaction details. This is the era of Big Data!
Right! I'll need to get used to it. Thanks for your help.
Then again, we may be deluding ourselves into the belief that all that data is actually going to be good for something someday. I bet often it turns out not to be.
Clearly, in fact, one of the lessons of Big Data is that merely collecting large amounts of it is unlikely to produce meaningful results. I am merely suggesting that deciding not to keep data simply because it is big is just as wrong as deciding to keep everything just in case.