Looking for information about "too many temp tables"

Posted by Admin on 03-Mar-2010 07:22

In the thread here...

http://communities.progress.com/pcom/message/70180#70180

...there's reference to a "too many temp tables" problem.  At one point in the thread it's referred to as "expected behavior".  We're trying to make some architectural decisions and before we go too far we want to understand if this is something that we need to consider.  I've looked on the KB and searched here and I can't find anything else about the issue.  Can anyone point me to some more information on this?

All Replies

Posted by ChUIMonster on 03-Mar-2010 08:39

In OpenEdge 10 the "temp-table database" uses Type 2 storage areas with a cluster size of 8 blocks and a default block size of 4 KB.  When you define a temp-table, even if you never populate it with a single record, a minimum of 9 blocks will be allocated.  This is no big deal if you have a modest number of temp-tables.  But if your architecture creates lots and lots of temp-tables "just in case", or if you perhaps create a lot of single-record temp-tables for some reason (maybe it is something that you do for every class in your OO architecture), you may be very surprised by 1) the amount of disk space used in the -T directory and 2) the amount of time that it takes to initialize a large number of temp-tables.

Currently you can mitigate this somewhat by choosing to use -tmpbsize 1 to change the default block size and by using a very large -Bt to buffer as much of the temp-table db as possible in RAM.
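
For example, a startup along these lines (the database name and the parameter values are purely illustrative, not tuned recommendations):

pro mydb -T /tmp -tmpbsize 1 -Bt 5000

With -tmpbsize 1 each temp-table block is 1 KB rather than the default 4 KB, and -Bt 5000 keeps roughly 5 MB of the temp-table database buffered in memory before anything spills to the DBI file in the -T directory.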

Posted by Admin on 03-Mar-2010 08:54

Thanks Tom!  That's exactly the kind of info I was looking for. 

Does all of what you said apply to temp tables defined as REFERENCE-ONLY as well?

Also, can you put any actual numbers to what you mean by "modest number of temp-tables" and "large number of temp-tables"? 

Posted by ChUIMonster on 03-Mar-2010 09:01

"Modest" is subjective but I would consider less than 100 to be modest.

Defining a temp-table is the key -- REFERENCE-ONLY should prevent a new instance from being created.  But I have not specifically looked into that, so you might want to do some testing.
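
For reference, the usual way to wire that up is an include file whose argument lets the same definition compile either as a real instance or as a definition only, with the real instance bound in at run time.  Something like this (the file, table, and field names are just illustrative):

/* ttcustomer.i -- {1} is either blank (real instance) or REFERENCE-ONLY */
define temp-table ttCustomer no-undo {1}
  field custNum  as integer
  field custName as character.

/* code that owns the data:          {ttcustomer.i}                */
/* code that only needs the schema:  {ttcustomer.i REFERENCE-ONLY} */

The REFERENCE-ONLY compile still gives you compile-time checking of ttCustomer references; the real instance has to be passed in as a BIND parameter at run time.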

Posted by Admin on 03-Mar-2010 09:15

Any thoughts on how we'd go about doing the testing you suggest?  I mean, can you point me at something that describes how to access diagnostic data showing how temp-tables are handled relative to the blocks used, etc.?  If that kind of diagnostic data is available then I'd think we should be able to pretty readily identify whether there's a difference between the way REFERENCE-ONLY temp-tables are handled and the way regular temp-tables are handled.

(sorry if that's a really silly question -- just diving back into Progress after a 3 year hiatus so I'm a bit rusty)

Posted by ChUIMonster on 03-Mar-2010 09:47

Temp-tables are almost completely opaque from a diagnostic POV.

The only thing that you can do is to specify a location for the DBI* file (with -T) and use -t to make it visible (if you are using UNIX).

To test the behaviors described above you would write some programs that define lots of temp tables and then observe the size of the DBI file.  Something like this:

/* tt_test.p
*
* pro -T . -t -Bt 10 -nb 1000 -p tt_test.p
*
*/

define new global shared variable d as integer.
define new global shared variable t as integer.

define temp-table xyzzy
  field f as character
.

if d = 0 then update t label "Target depth:" with side-labels.

d = d + 1.
if d < t then
  run tt_test.p.
else
  pause.

quit.

Run this and observe the size of the DBI file while at the prompt for target depth.  The base size can vary from release to release and depending on the -tmpbsize parameter, but on my system (Linux, 10.2B) it is 32,768 bytes.  Then observe the file size when it pauses at the end.  Modify -tmpbsize or the TT definition to test various scenarios.  In the case shown above a target depth of 100 results in a DBI file of 3,670,016 bytes.  (FYI adding REFERENCE-ONLY behaves as expected -- it does not cause DBI file growth.)
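
If you'd rather check it from inside the session, a quick and dirty sketch like the following will list the DBI file(s) and their sizes.  It assumes the session was started with -t so the file is visible, and that -T points at the session temp directory; the program name is made up.

/* dbisize.p -- list DBI* files in the session temp directory and their sizes */
define variable cFile as character no-undo format "x(24)" label "File".
define variable cPath as character no-undo.
define variable iSize as integer   no-undo format ">>>,>>>,>>9" label "Bytes".

input from os-dir(session:temp-directory).
repeat:
  import cFile cPath.
  if cFile begins "DBI" then do:
    file-info:file-name = cPath.
    iSize = file-info:file-size.
    display cFile iSize.
  end.
end.
input close.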

Posted by Admin on 03-Mar-2010 10:00

Excellent, thanks heaps Tom.

Posted by Thomas Mercer-Hursh on 03-Mar-2010 10:59

Tom has provided you (and the rest of us) with some great concrete testing on this question.

But, personally, I am curious what architectural decision you were considering which brought up the question.

Posted by Admin on 03-Mar-2010 11:50

Thanks for your interest Thomas.

I'm not sure how to answer your question without some explanation....

Well, I've been in the 3GL world exclusively for the last 3 years and before that I straddled the 3GL/4GL worlds.  A couple of the principles I've found really useful in 3GL OO development are single responsibility and favoring composition over inheritance.  This tends to produce architectures with many small classes that are highly specialized to a given task.  One approach we're looking at to deal with datasets involves custom classes that model the concept of a data source and a query.  As stand-alone classes, data sources would be responsible for wiring up a dataset, particularly for pushing changes from the dataset to the database.  Query classes will encompass the functionality of a data source but for the purpose of reading data rather than writing it.  The data source and query classes are leveraged through a DAO class as method/constructor parameters.  This allows us to persist/hydrate a given dataset in a variety of ways based on need.

For both the data source and the query classes, we'd prefer that they have access to statically defined reference-only versions of the dataset and temp-tables.  We know we could do quite a bit dynamically through the handles, but that relies on strings too much.  What I mean is that as the system evolves we have to rely on finding the errors that its evolution introduces at runtime, because the compiler can't do anything to help locate them when everything is done dynamically.  Using references to statically defined structures we increase the compiler's effectiveness as a bug hunter.  For instance, referencing a field dynamically in a buffer by the wrong name won't show as an error until runtime, whereas referencing it statically will show as an error at compile time.  Apart from that, doing things dynamically tends to be a bit more cumbersome.  So, if our query and data source classes are going to have access to reference-only versions of the dataset and its temp-tables, that means we could end up with quite a few definitions.  That said, the query and data source objects have a very short lifetime and I'm wondering if that short lifetime will also help us as far as the "too many temp tables" problem goes.
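
To make that a bit more concrete, the rough shape we're considering for a query class is something like the sketch below.  None of this is compiled or final -- the class and member names are placeholders -- it's just meant to show where the REFERENCE-ONLY definitions would sit:

/* CustomerQuery.cls -- sketch only; names are placeholders */
class CustomerQuery:

  /* definition only: REFERENCE-ONLY means no TT instance is created here */
  define temp-table ttCustomer no-undo reference-only
    field custNum  as integer
    field custName as character.

  define dataset dsCustomer reference-only for ttCustomer.

  /* the caller owns the real dataset instance and binds it in */
  constructor public CustomerQuery ( input dataset dsCustomer bind ):
  end constructor.

  method public void Fetch ( input pcWhere as character ):
    /* fill dsCustomer from the database here; static references like */
    /* ttCustomer.custNum get checked by the compiler                  */
  end method.

end class.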

Posted by Thomas Mercer-Hursh on 03-Mar-2010 13:42

A couple of the principles I've found really useful in 3GL OO development are single responsibility and favoring composition over inheritance.

You might want to check out http://www.cintegrity.com/content/Object-Oriented-Design-Principles and possibly some of the other whitepapers in the vicinity.

Short lifespan will help the TMTT problem, for sure, since it is the total number of concurrent TTs that produces the symptoms.  But the short lifespan might also raise a flag, because a TT is a pretty heavyweight object to be creating and deleting all the time.

I would need to know more about your design to suggest alternatives, but one of the things I always think in terms of is encapsulating a TT in an object so that there is only ever one copy of it and providing that object with the methods and properties needed to manipulate the TT as desired.  I.e., encapsulate all parts of that responsibility in one object.

Posted by Tim Kuehn on 03-Mar-2010 16:32

I've posted quite a bit about this problem, including some test cases using OO instances - unfortunately the thread with the attached code seems to've vanished.

The basic problem boils down to how the AVM handles the number of TTs defined in combination with the number of concurrent program instances.

In my original case the application used a lot of super procedures to do certain things.  These SPs in turn had a lot of TTs as API parameters for procedures that did things to said TTs.  Each of those TT API parameters needed a TT definition, and those definitions became part of a header file embedded in every procedure that calls APIs in said SP.  This results in a lot of TT definitions existing concurrently across a set of procedure instances.

Since the AVM creates a TT instance whenever it encounters a TT definition, regardless of whether it's actually used or not, the system overflows its -Bt space and spills into the DBI file.

Using REFERENCE-ONLY will help reduce that problem, since a TT definition like that implicitly states that a TT instance from somewhere else will always be bound to the procedure's TT code.  This requires BINDing a _specific_ TT instance to a given persistent procedure / class instance TT, effectively creating a limited-scope shared TT.  This is useful for some situations - it's also a problem for others.
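
In sketch form the binding looks like this (procedure and table names are invented; the include is assumed to take a {1} argument like the one shown earlier in the thread):

/* caller.p -- owns the real TT instance */
{ttorder.i}                            /* plain include: creates the instance */
define variable hAPI as handle no-undo.

run orderapi.p persistent set hAPI ( input table ttOrder bind ).
/* internal procedures in hAPI now work directly on the caller's */
/* ttOrder for the life of the persistent instance               */

/* orderapi.p -- definition only, no instance of its own */
{ttorder.i REFERENCE-ONLY}
define input parameter table for ttOrder bind.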

Since TT data is stored in Type II structures, short-lived TTs aren't a real problem, as the AVM is very good about creating and nuking TTs, particularly if they all fit in the application's -Bt area.

Bottom line - keep the number of TTs the application defines down and restrict their lifespan as much as possible, and you'll be ok.

Posted by Admin on 03-Mar-2010 17:13

Awesome info, thanks Tim!  Your explanation helps a lot. 

Posted by Thomas Mercer-Hursh on 09-Mar-2010 16:19

I've been doing a little calculating and benchmarking using something like Tom's little program.  The results have me wondering a little bit.

The total expected size in KB required for N empty temp-tables is N * 9 * tmpbsize.  So, for example, with the default -tmpbsize of 4, that means 36,000 KB for 1000 empty temp-tables, dropping to 9,000 KB with -tmpbsize of 1.  Now, 36 MB used to be a lot of space, but it isn't really any more, and 9 MB isn't much at all, but 1000 simultaneous TTs seems like rather more than modest.
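
As a throwaway sanity check of that arithmetic:

/* expected DBI growth for N empty temp-tables, per the formula above */
define variable iTables     as integer no-undo initial 1000.
define variable iTmpBSize   as integer no-undo initial 4.    /* -tmpbsize, in KB */
define variable iExpectedKB as integer no-undo label "Expected KB" format ">>>,>>9".

iExpectedKB = iTables * 9 * iTmpBSize.
display iExpectedKB with side-labels.
/* 1000 * 9 * 4 = 36,000 KB; with -tmpbsize 1 it drops to 9,000 KB */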

So, I ran some benchmarks on my PC.  Once upon a time a few years ago it would have been considered hot, but these days it isn't much.  The hottest thing about it is 10K SCSI disks, which are still pretty good performers.  Time to run for 1000 TT with the default -tmpbsize of 4 was 4.77 seconds.  Now, nearly five seconds is an appreciable delay if immediately visible to the user, but not inordinate if the user is doing something complex enough to justify 1000 temp-tables.  Drop -tmpbsize to 1 and this drops to 1.45s, which seems like it wouldn't be a problem if not frequent.  Moreover, Tom's Q&D program is arriving at a particular number of temp-tables by doing a recursive run, so one is not just instantiating a TT, but also doing the run itself.  Over a range of 100 to 2000 TT, the run itself accounts for 65-71% of the total time, i.e., the actual TT instantiation is only about a third of the figure in the -tmpbsize 1 case.  Moreover, all of these times are for -Bt 10, i.e., worst case, since so much of the build goes to disk.

-Bt can go up to 50,000.  With -tmpbsize 1, that means 55 MB of memory ... not trivial, but again hardly a massive commitment in the modern era.  That's enough RAM for over 6100 TTs in memory!

So, I'm wondering how much of the TMTT problem is really just a matter of not taking advantage of -tmpbsize and -Bt?

Posted by ChUIMonster on 09-Mar-2010 18:22

It may not be a big deal when it is just a single user on a PC (even an old clunker).  Like many performance and scalability issues, it doesn't look like a big deal on a small system.  But it becomes a much bigger deal when it is 100 or 1,000 or 10,000 users on a server.  In those cases you are either consuming gobs of disk space and IO ops or you are wasting unseemly amounts of RAM that could be put to far better purposes.

Posted by Thomas Mercer-Hursh on 09-Mar-2010 18:53

Yes, but I think there are two major questions here.

One is whether we are talking about something architectural, i.e., an intrinsic pattern of use which means that every user will be using some large number of TTs, or situational, i.e., a given user may periodically have a specific task which uses large numbers of TTs.

The other is how many TTs are we talking about.

If we are *only* talking about a couple of hundred, it seems that a little tuning is all that should be required even if every session is using that many.

If we are talking a thousand or two, then, yes, that might be a problem in an architectural context, but still shouldn't be a problem for an individual session, as long as there is something meaningful being done with them.
