Storing dates for statistics

Posted by jmls on 19-Dec-2011 12:48

Before I head off and do all sorts of stats gathering, I was just wondering if anyone else had done something similar ..

given that I have something that happens on a date (be it financial or actions like "a memo"), and that accumulating totals by client, account, and  date, would it be best to store the date

a) as a date with an index

b) as three fields yyyy mm dd with an index on yyyy, an index on mm , and an index on dd

there are millions of records per year, with 25 years+ data

now, if I wanted to get a total of all records in January 2000

a) for each table no-lock where client eq "foo" and date ge 1/1/2000 and date le 31/12/2000

b) for each table no-lock where client eq "foo" and yyyy eq 2000 and mm eq 1

which would be faster, or more efficient ?

 * jmls goes off to play with 50m+ records ...

All Replies

Posted by gus on 19-Dec-2011 13:05

a) will be faster.

You might want to have a compound index on client and date. But I bet these are not the only queries you are going to perform, I bet the story has more to it.

Posted by jmls on 19-Dec-2011 13:07

if date were a datetime would that make a difference ?

And, of course, you are right. There are several queries that will be excecuted, some of them not involving the client ..

Posted by gus on 19-Dec-2011 13:29

Well, you said "date" but I was actually thinking "datetime". Since the

datetime datatype combines date and time it will be faster than having

separate date and time fields.

However, if you don't have time in your existing data (and you would likely

use 00:00:00.000 for the time part), datetime will be marginally slower than

a date field.

-gus

On 12/19/11 2:07 PM, "progresscommunities"

Posted by Tim Kuehn on 20-Dec-2011 08:27

gus wrote:

Well, you said "date" but I was actually thinking "datetime". Since the datetime datatype combines date and time it will be faster than having separate date and time fields.

This structure'll work if one exclusively does date-time comparisons, it's a bit harder to use when one wants to do date-specific comparisons / lookups. I've had situations where I need both/and, so I have to have both a date-time and a date field in a db table along with corresponding indexes.

Posted by gus on 20-Dec-2011 09:19

Can you give an example for why you need both?

Posted by Tim Kuehn on 20-Dec-2011 10:40

if I try to do something like this:

DEFINE TEMP-TABLE tt    NO-UNDO
    FIELD dt-field      AS DATE
    FIELD dtm-field     AS DATETIME

    INDEX i-dt      dt-field
    INDEX i-dtm     dtm-field
    .

FOR EACH tt
    WHERE tt.dtm-field = TODAY
    NO-LOCK:

    DISPLAY tt
        WITH FRAME f2
        DOWN. 

END.

I get:

+---------------------------- Error ----------------------------+
¦ ** Incompatible data types in expression or assignment. (223) ¦
¦      ** date-time.p Could not understand line 22. (196)       ¦
¦                                                               ¦
¦ ------------------------------------------------------------- ¦
¦                                                           ¦
+---------------------------------------------------------------+

In order to do date (range) matching against a date-time field, I'd need to do something like:

WHERE tt.dtm-field >= DATETIME(cur-date, 0) AND tt.dtm-field

which is a pain in the patootie, particularly when automated query generators are involved.

Posted by gus on 20-Dec-2011 12:40

Ah! I get it.

Posted by Peter Judge on 21-Dec-2011 10:19

This is where you might want to do something like

WHERE DATE(tt.tdm-field) EQ TODAY

This syntax works today (in 11 at least) but I would not expect it to perform particularly well because of the LHS-function. But I would think that this would be a nice thing to have and it might be worthwhile reporting as a bug/enhancement.

-- peter

Posted by jmls on 21-Dec-2011 10:24

arrgh! please, please do not encourage this by mentioning it !!.

Progress should not compile this statement.

Mind you, people like Dan and Tom will be rubbing their hands with

glee in anticipation of the performance issues coming down the line

Posted by Peter Judge on 21-Dec-2011 10:40

jmls wrote:

arrgh! please, please do not encourage this by mentioning it !!.

Progress should not compile this statement.

Mind you, people like Dan and Tom will be rubbing their hands with

glee in anticipation of the performance issues coming down the line

Should not compile this specific statement - ie the DATE - or functions-in-WHERE-clauses in general?

You can make an argument that DATE and DATETIME are wideing cases, and so should be treated similarly to CHAR / LONGCHAR and INT / INT64. I think in these cases there should be some flexibility without the performance hit. Adding functions (ABL or UDF) willy-nilly into a WHERE clause is not a good idea at all.

An alternative might be to extent the BEGINS statement to allow parts of datetime values on datetime fields:

  WHERE tt.dt-field BEGINS TODAY

or

  WHERE tt.dt-field BEGINS DATETIME(TODAY, 12, 00, 00).

The use of BEGINS semantically bight be a little odd, but I think you get my drift.

-- peter

Posted by jmls on 21-Dec-2011 10:49

lhs functions in where clauses. Or at least, get the compiler to throw

out a bloody great "Oi! do you want your query to run as slow as a

one-legged drunken kangaroo"

Posted by Peter Judge on 21-Dec-2011 10:51

jmls wrote:

lhs functions in where clauses. Or at least, get the compiler to throw

out a bloody great "Oi! do you want your query to run as slow as a

one-legged drunken kangaroo"

Do you claim copyright over that message text, or can we use it as-is? 

-- peter

Posted by jmls on 21-Dec-2011 10:54

Creative Commons. Use as you see fit .. .. Unless we get an

objection from our Antipodean friends ..

Posted by ChUIMonster on 21-Dec-2011 13:19

Don't forget to add a comment to any such code:

...

where LHS( somefield ) = something     /* improved performance! */

...

That way I'll have no trouble finding it when it is time to fix it

Posted by Tim Kuehn on 21-Dec-2011 14:15

pjudge wrote:

This is where you might want to do something like

WHERE DATE(tt.tdm-field) EQ TODAY





This syntax works today (in 11 at least) but I would not expect it to perform particularly well because of the LHS-function. But I would think that this would be a nice thing to have and it might be worthwhile reporting as a bug/enhancement.




-- peter


Absofrickinloutely NOT. I have no interest in doing things that may result in increased business for Tom!

This thread is closed