A case for deferred TT creation

Posted by Tim Kuehn on 30-Sep-2008 14:33

This DBI listing came from a development system after the client had problems with users on the production system "running out of space" on an HP-UX box running 10.1B:

(user info) 168230912 Sep 30 11:56 /protemp/DBIa22729

In this case, the user went to an order-entry screen and then did nothing. The net result is a DBI file of 168,230,912 bytes for a set of TTs that barely hold anything, or nothing at all!

I'm not sure where all the TTs that go into this file are coming from either, as there are plenty of cases where TT definitions may be included in a procedure but never used.

However, the time spent creating this DBI file may be the single root cause of why my client's procedure-managed code is taking so long to load.

All Replies

Posted by Thomas Mercer-Hursh on 30-Sep-2008 15:01

That's a lot of TTs! It would be interesting to know how many.

Posted by Tim Kuehn on 30-Sep-2008 15:08

It may not be that many, because if the DBI area is a Type II structure, then the DBI file size makes sense: for every TT (and possibly each of their associated indexes) there may be at least one cluster's worth of space required. If a cluster is a couple of MB in size, then getting to 160 MB doesn't take much trying.
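That cluster arithmetic can be sketched. This is a toy model, not OpenEdge internals: the block size, blocks-per-cluster, and one-cluster-per-object assumption are all hypothetical numbers chosen only to show how quickly per-TT cluster allocation adds up.

```python
# Toy model (not OpenEdge internals): assume every temp-table -- and each of
# its indexes -- gets at least one cluster's worth of space in a Type II area,
# so the DBI file grows in cluster-sized steps even for empty tables.

def dbi_estimate(num_tts, indexes_per_tt, block_size, blocks_per_cluster):
    """Rough lower bound: one cluster per TT plus one per index."""
    cluster_bytes = block_size * blocks_per_cluster
    clusters = num_tts * (1 + indexes_per_tt)
    return clusters * cluster_bytes

# Hypothetical numbers: 200 TTs, 2 indexes each, 4K blocks, 64-block clusters.
print(dbi_estimate(200, 2, 4096, 64))  # 157286400 bytes -- the 160 MB ballpark
```

With made-up but plausible inputs, a session that "barely holds anything" still lands in the observed 160 MB range.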

What I find puzzling is why, in cases where a static TT is never referenced either explicitly or via a handle, the AVM would make an entry in the DBI file for it - with the resulting space usage and time delay while writing to disk.

Posted by Thomas Mercer-Hursh on 30-Sep-2008 15:41

I don't know if it is correct, but that is certainly an interesting model. One thinks of a non-instantiated TT as a tiny little thing, using very little memory, but if it is one per cluster in the DBI, then it would certainly add up pretty quickly. And, it could well be that PSC never thought that anyone would use 50-100 TTs all at once, much less multiple hundreds (like the guy on PEG).

Have you considered taking this and the other information to tech support to see if you can get an explanation?

Posted by Tim Kuehn on 30-Sep-2008 15:46

I don't know if it is correct, but that is certainly an interesting model. One thinks of a non-instantiated TT as a tiny little thing, using very little memory

Exactly! There should be some kind of placeholder in memory, with the TT instantiated when it's first used and not before.

Have you considered taking this and the other information to tech support to see if you can get an explanation?

Yep.

"Not a bug, this is expected behavior, we're not going to change it, move along."

They're actually the ones who told me that TTs are set up when a new program is run, which gave me the clue I needed to look for things like this rather impressively sized DBI file.

Posted by Tim Kuehn on 30-Sep-2008 16:05

We have a smoking gun.....

Tim

ID: P122597

Title: "The size of the DBI file is much bigger in OpenEdge 10"

Created: 03/02/2007 Last Modified: 01/08/2008

Status: Unverified

Symptoms:

  1. The size of the DBI file is much bigger in OpenEdge 10

  2. The DBI file size is much bigger after upgrading to OpenEdge

  3. The session DBI file in Progress 9.x is typically a few K blocks in size

  4. In OpenEdge 10.x the DBI file is a few hundred K blocks for the same application code

  5. A temp table in OpenEdge increases the size of the DBI file much more than in Progress version 9

Facts:

  1. All Supported Operating Systems

  2. OpenEdge 10.x

Cause:

This is because the DBI file utilizes Type II Storage Areas in OpenEdge 10. As such, the allocation of the DBI file uses clusters in OpenEdge instead of blocks, and consequently the DBI file uses much more disk space.

The size of the DBI file can be estimated: it should be about 9 blocks * # of ACTIVE temp tables (unless any one temp table contains more than 8 blocks of data).

Also, be aware that the default block size of the temp tables has changed in OpenEdge 10.1B from 1K to 4K. This is to support the new large index key entries feature. So, if there are no plans to use large index key entries, then request a 1K temp table block size by specifying -tmpbsize 1 on client startup.
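The KB's rule of thumb is easy to run as arithmetic. A sketch assuming the 9-blocks-per-active-TT estimate and the 1K/4K block sizes mentioned above; the 100-TT figure is a hypothetical example:

```python
def estimated_dbi_blocks(active_tts, blocks_per_tt=9):
    # Per the KB: about 9 blocks per ACTIVE temp-table, provided no single
    # TT holds more than 8 blocks of data.
    return active_tts * blocks_per_tt

def estimated_dbi_bytes(active_tts, tmpbsize_kb):
    return estimated_dbi_blocks(active_tts) * tmpbsize_kb * 1024

# 100 active TTs: 4K blocks (the 10.1B default) vs 1K blocks (-tmpbsize 1).
print(estimated_dbi_bytes(100, 4))   # 3686400 bytes
print(estimated_dbi_bytes(100, 1))   # 921600 bytes
```

The 4x block size shows up directly as a 4x estimate, which is why -tmpbsize 1 shrinks the file so dramatically when large index keys aren't needed.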

Fixes:

This is expected behavior.

To reduce the size of the DBI file use the client startup parameter -tmpbsize set to 1.

Notes:

References to Written Documentation:

Progress Solution(s):

P81853, "Is there any advantage to using Type II Areas for a table when I "

P81745, "Guidelines for Type II Area Blocks per Cluster setting with Progress OpenEdge 10.X?"

Posted by Tim Kuehn on 30-Sep-2008 16:09

another one...

ID: P134680

Title: "Increasing temptable blocksize decreases performance due to DBI growth."

Created: 08/29/2008 Last Modified: 08/29/2008

Status: Unverified

Symptoms:

  1. Increasing temptable blocksize decreases performance due to DBI growth.

Facts:

  1. After upgrading to OpenEdge 10.1B the DBI size increases.

  2. Increasing tmpbsize from 1kb to 8kb increases DBI size substantially.

Cause:

Increasing the temptable blocksize will increase the amount initially allocated for temptables. The more temptables that are defined, the more memory/DBI space is consumed.

Using an 8kb temptable blocksize will allocate approximately 8 times the amount compared to using a 1kb blocksize.

Note that it is only the initial allocation that increases to this degree. When the temptables actually get used then they are more comparable.

For example:

                 1kb blk     8kb blk
Defining only     442368     3538944
Populating      15884288    17629184
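The KB's own figures bear this out; a quick check of the ratios (numbers taken from the example above):

```python
defining = {1: 442_368, 8: 3_538_944}        # bytes, definition only
populating = {1: 15_884_288, 8: 17_629_184}  # bytes, after populating

print(defining[8] / defining[1])                 # 8.0 -- definition cost scales with block size
print(round(populating[8] / populating[1], 2))   # 1.11 -- populated sizes are comparable
```

So the 8x penalty is purely an initial-allocation effect; once the temp-tables actually hold data, the two block sizes converge.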

Due to the increase in temp-table space allocation, if the DBI file is being used a lot more, then it can negatively impact performance, especially when large numbers of users are involved.

In 10.1B the default temptable blocksize has been increased from 1kb to 4kb.

Fixes:

There are a couple of solutions to the problem:

1. Decrease the temptable blocksize by using the -tmpbsize parameter:

-tmpbsize 1

2. Increase the amount of memory each client allocates for temp-table buffers by using the -Bt parameter. The DBI file is only used once the temp-table buffers have all been filled. Note that this memory is allocated for EACH CLIENT; increasing this parameter on a system that doesn't have adequate memory can hurt performance.

The -Bt parameter specifies the number of buffers to allocate; each buffer is the size of a temp-table block.

In Progress versions up to 10.0B the default is -Bt 10.

In Progress version 10.1A onward the default is -Bt 255.
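The interplay of -Bt and -tmpbsize above can be made concrete: per-client buffer memory is simply buffers times block size, and the DBI file only comes into play once those buffers fill. A sketch using the documented defaults; the 500-client figure is a hypothetical illustration:

```python
def tt_buffer_memory_kb(bt, tmpbsize_kb):
    # -Bt buffers per client, each one temp-table block in size.
    # The DBI file is only touched once these buffers are all full.
    return bt * tmpbsize_kb

print(tt_buffer_memory_kb(10, 1))    # 10 KB per client  (defaults up to 10.0B)
print(tt_buffer_memory_kb(255, 4))   # 1020 KB per client (10.1A+ -Bt, 10.1B block size)
print(round(tt_buffer_memory_kb(255, 4) * 500 / 1024))  # ~498 MB for a hypothetical 500 clients
```

Which is exactly why the KB warns that -Bt is per client: raising it on a box without the RAM to back it trades DBI I/O for swapping.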

Posted by Tim Kuehn on 30-Sep-2008 16:17

Here's the difference running the same code:

-tmpbsize 8

52,887,552 Sep 30 17:17 DBIa27416

-tmpbsize 1

6,709,248 Sep 30 17:15 DBIa27253
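Those two measurements line up with the KB's point that an 8K block size initially allocates roughly eight times as much as a 1K one:

```python
# Observed DBI sizes from the same code run with -tmpbsize 8 vs -tmpbsize 1.
print(round(52_887_552 / 6_709_248, 2))  # 7.88 -- close to the predicted 8x
```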

Posted by Thomas Mercer-Hursh on 30-Sep-2008 16:30

So, what kind of performance are you seeing with the 1K size?

Just thinking off the top of my head here, but it seems to me that if there are a lot of TTs defined, there is a good chance that many are either empty or have only a few records (or, at least, if there are a large number of active TTs, all with a lot of data, then one shouldn't be surprised at less-than-good performance). That being the case, it seems like going for a small block size would be indicated.

Deferred instantiation would only actually help if there were many more TTs defined than were actually used. I suppose that is not a very ordinary programming model. Instead, I would think that if there were a TT (or group) that might or might not get used, then one would put it (them) into their own .p or .cls and only instantiate it when needed.

In any event, it does seem to produce a very strong argument against the use of a TT in a class to hold entity data since that is bound to result in a large number of very small TTs and thus a huge disk overhead.

Posted by Tim Kuehn on 30-Sep-2008 16:38

So, what kind of performance are you seeing with the 1K size?

Operationally - I didn't try it; this was just to see if -tmpbsize 1 would shrink the DBI very much. I would think that the user could handle the operational slowdown compared to the time delay loading the program set.

Just thinking off the top of my head here, but it seems to me that if there are a lot of TTs defined, there is a good chance that many are either empty or have only a few records (or, at least, if there are a large number of active TTs, all with a lot of data, then one shouldn't be surprised at less-than-good performance). That being the case, it seems like going for a small block size would be indicated.

Yes.

Deferred instantiation would only actually help if there were many more TTs defined than were actually used. I suppose that is not a very ordinary programming model.

In our case, there are a number of cases where a set of related TTs are defined in a .tt file, even though the including program may use only one of those TTs, so there could be a number of unused TTs created when the procedures are instantiated.

Instead, I would think that if there were a TT (or group) that might or might not get used, then one would put it (them) into their own .p or .cls and only instantiate it when needed.

If one wanted to get that fine-grained about it, sure. But I don't see that happening in a typical shop.

In any event, it does seem to produce a very strong argument against the use of a TT in a class to hold entity data since that is bound to result in a large number of very small TTs and thus a huge disk overhead.

Which is what I'll bet was going on when running code with a high TT count from the class instance generator I posted earlier.

Posted by Thomas Mercer-Hursh on 30-Sep-2008 17:13

So, are you going to try this at the customer with the problem? It would certainly be interesting to see the result.

In our case, there are a number of cases where a set of related TTs are defined in a .tt file, even though the including program may only use one of those TTs, so there could be a number of un-used TTs created when the procedures are instantiated.

Doesn't seem like the way I would design. Reminds me of the old days where people had these huge include files with a zillion shared variables and any one program might only access a couple of them.

If one wanted to get that fine-grained about it, sure. But I don't see that happening in a typical shop.

I don't suppose I can make any claims for typicity, but it is sure the way things are done around here. It is very, very rare these days that I use an include file of any form ... and the only instance I can think of off the top of my head in several years was a consequence of not having generics for the collection classes.

Posted by Tim Kuehn on 30-Sep-2008 19:57

So, are you going to try this at the customer with the problem? It would certainly be interesting to see the result.

I'll certainly bring it up.

In our case, there are a number of cases where a set of related TTs are defined in a .tt file, even though the including program may use only one of those TTs, so there could be a number of unused TTs created when the procedures are instantiated.

Doesn't seem like the way I would design. Reminds me of the old days where people had these huge include files with a zillion shared variables and any one program might only access a couple of them.

With associated TTs defined in a single file, there are no 'zillions' of things to maintain in different places as with shared vars.

If that approach wasn't followed, then it would have to be a bunch of base TT definitions, and then a multitude of different includes to get the different TT combinations. Or multiple includes in a given parent file, one for each TT - but that would require remembering those combinations when passing multiple TTs to an API.

If one wanted to get that fine-grained about it, sure. But I don't see that happening in a typical shop.

I don't suppose I can make any claims for typicity, but it is sure the way things are done around here. It is very, very rare these days that I use an include file of any form ... and the only instance I can think of off the top of my head in several years was a consequence of not having generics for the collection classes.

So how do you pass TTs between APIs? A code generator defines the TT in a class, and then you use the class to manage the TT?

Posted by Thomas Mercer-Hursh on 01-Oct-2008 11:27

So how do you pass TTs between APIs?

Current theory is to pass XML if over a wire or the enclosing class if not.

Posted by Tim Kuehn on 01-Oct-2008 12:29

Current theory is to pass XML if over a wire or the enclosing class if not.

Which is why I do things the way I do - there's no "enclosing" class in this application.

Yet.

Posted by Thomas Mercer-Hursh on 01-Oct-2008 12:55

Doesn't need to be a .cls. A .p is just as enclosing.

The question to ask here is whether or not there is a sound reason to have large numbers of temp-tables. If not, then discovering this "property" - that large numbers of temp-tables use up large amounts of disk and slow performance - is an interesting fact, but not really a problem that needs to be solved. If there were a large number of TTs with a lot of data, then the problem is inherently complex and one probably has modest expectations of performance. Moreover, lazy instantiation wouldn't help. Further, a smaller block size probably wouldn't help either.

Your issue, though, is that you have a large number of temp-tables, many of which are not being used, so you don't expect a performance issue because you aren't actually doing that much. For you, lazy instantiation would help, and a smaller block size might provide some level of help ... we hope that is the case and are looking forward to hearing what you find.

The key question, though, is whether PSC should invest the work necessary to move to lazy instantiation. I don't know how complex this is, but it sounds like it might be difficult. So, we have situations where people have wanted to use large numbers of temp-tables, like that fellow on PEG who put one in every entity object. Lazy instantiation doesn't help there since there is actually data in them. All that would help there is not allocating disk space until it was necessary to overflow memory and giving it enough memory to work with.

So, I guess we actually have two versions of lazy instantiation: 1) not building a structure at all until needed; and 2) not allocating disk space until an overflow condition is reached. Reminds me a bit of the problem with some Unix systems in which one had to allocate large amounts of disk space for swap even though one had more than enough RAM to never swap.

Both 1 & 2 sound like they could be good ideas, but the question to ask is how important they are. #1 is only relevant if there are large numbers of TTs which are never used. My own reaction is to think that possibly this isn't the best possible programming and that if one encapsulated things, the issue wouldn't arise. I have some trouble coming up with examples where I would want to do such a thing by design.

#2 seems like it could be of wider applicability since much of the time we are likely to provide enough memory for the TTs to all be entirely in memory and on those rare occasions when we want really large TTs, then we are unlikely to mind a tiny performance hit for initializing the overflow area.

Posted by Tim Kuehn on 01-Oct-2008 13:34

Another optimization is for the compiler to ignore TT definitions which are never used. This would be a clear win in cases where associated TT definitions are declared in an include file and referenced by a parent file which may use only one or a few of those definitions.

This wouldn't require any deep changes to the AVM to accomplish - only a change to the compiler.

more on this later.

Posted by Admin on 01-Oct-2008 13:39

That sounds like a very strange idea to me. If it's part of the source, then I want it to be in the r-code.

Why special treatment for temp-tables? Just because they tend to become large? Then the developer should take better care about how they are declared and whether they are used.

Mike

Posted by Admin on 01-Oct-2008 13:41

Just think what that might do to code that is built around SESSION:FIRST-BUFFER and SESSION:FIRST-DATA-SET.

Posted by Thomas Mercer-Hursh on 01-Oct-2008 13:44

Next he'll want it to detect dead code between IF FALSE THEN DO ... END

One of the major maintenance headaches here seems to me to be one that is very similar to the issue with the 80s-style includes full of shared variables. I.e., one doesn't easily know where something is used, because it is included in a bunch of things where it is not used - but, of course, its mere inclusion means there is a reference.

Posted by Admin on 01-Oct-2008 13:48

Kind of.

But in this case - the large include files - why wouldn't you add &IF ... &THEN ... &ENDIF directives around each temp-table declaration? Thomas will still not like it (I bet), but you could reuse the include files and still control (manually!) which temp-tables should be in and which should be out.

Posted by Tim Kuehn on 01-Oct-2008 13:51

Just think what that might do to code that is built around SESSION:FIRST-BUFFER and SESSION:FIRST-DATA-SET.

none at all:

FIRST-BUFFER attribute: Returns the handle for the first dynamic buffer in the first table containing a dynamic buffer. The table may be either a temp-table or a connected database, in that order. If no dynamic temp-table or database buffers exist in the session, it returns the Unknown value (?).

Note: Only dynamic buffers created with the CREATE BUFFER statement are chained on the SESSION system handle.

I only want un-used static TT def's to be ignored.

Also, one can't "optimize out" IF FALSE THEN code, because it could contain code which impacts the scope of a buffer or something. (I've seen cases where broken code was "fixed" by adding a statement to an IF FALSE THEN block....)

Posted by Thomas Mercer-Hursh on 01-Oct-2008 14:00

Thomas will still not like it (I bet)

Got that right! Preprocessor devil spawn!

Posted by Tim Kuehn on 01-Oct-2008 14:07

That sounds like a very strange idea to me. If it's part of the source, then I want it to be in the r-code.

A static TT definition may be part of the r-code, but if it's not used, then there's no way for a program to get to it, so such definitions can be safely ignored.

Why special treatment for temp-tables?

If the AVM is instantiating each TT definition regardless of whether it's used or not, then that'll result in a lot of unneeded disk activity and a corresponding program/user delay.

Just because they tend to become large? Then the developer should take better care about their declaration and their use or not use.

Maybe so, but if a TT isn't used, then it shouldn't be instantiated.

Posted by Thomas Mercer-Hursh on 01-Oct-2008 14:11

An unused static TT in a procedure is like using a LIKE definition to a table that is not referenced in that procedure. It creates an apparent connection where none exists. That is a maintenance headache.

I understand that you have an empirical problem that this is the way the code is and there is unlikely to be a budget for moving it to a more encapsulated form. That's why I hope that the block size adjustment has a significant benefit for you, enough to turn a problem into a temporary non-problem.

But, it is hard for me to blame the code more than PSC here since the problem is being caused by a lot of definitions for things being in places where they don't belong. I see a much better argument for my type 2 fix, i.e., deferring any disk impact until the TT overflows memory than for your type 1 fix of lazy instantiation. If one has a valid need for a large number of TTs, type 1 does nothing to help. It helps you, but it does nothing for people who actually use what they define in their code. The type 2 fix, however, would benefit both you and anyone who used a large number of small TTs and its only impact on those using TTs which overflowed is that the disk initialization pause would be moved from program start up to some time in the processing. If one is processing that much data, I doubt that one would notice the pause there, particularly since it could well be only one of a dozen TTs which overflowed. Whereas, a pause at program startup is very noticeable and annoys people. Note too that encapsulating the TTs in .ps or .cls means a small impact to instantiate the program or class, but trivial and it only happens if you need it.

So, I am supportive of the type 2 fix, but question the type 1 fix.

Posted by Admin on 01-Oct-2008 14:12

Maybe so, but if a TT isn't used, then it shouldn't be instantiated.

... then avoid using it (in your source code)!

If it's not in the source code, it won't cause disk activity as well. I know we are turning in circles. But I've seen nothing here relevant enough to change the current behaviour of the compiler.

Posted by Thomas Mercer-Hursh on 01-Oct-2008 14:16

A static TT defn may be part of the r-code, but if it's not used, then there's no way for a program to get to it, so such definitions can be safely ignored.

They can also be safely removed!

If the AVM is instantiating each TT defn regardless of whether it's being used or not, then that'll result in a lot of un-needed disk activity, and corresponding program / user delay.

But, the point here is that there is no benefit to someone who actually uses the TTs they define. The only benefit comes if there are significant numbers ... rather large numbers, apparently, ... of TTs which are never referenced. Not only does this not apply to most people, it is difficult to suggest that it represents best practice.

Maybe so, but if a TT isn't used, then it shouldn't be instantiated.

If it isn't used, then it shouldn't be there ... then the problem doesn't exist.

Posted by Thomas Mercer-Hursh on 01-Oct-2008 14:18

I've seen nothing here relevant enough to change the current behaviour of the compiler.

Except that I see no reason to spend time and effort creating a disk extension of a TT that will never overflow. Why pay a substantial start-up penalty for an application that uses 100 TTs when I know that I will always give it enough memory that all of those TTs will be in core? That's a change that would improve performance for anyone using a lot of TTs and would also solve Tim's problem, since unused TTs are very unlikely to overflow!

Posted by Tim Kuehn on 01-Oct-2008 14:26

They can also be safely removed! :)

Safely - yes.

Easily - not always.

Posted by jtownsen on 01-Oct-2008 14:32

...and

DEFINE VARIABLE myFavouriteTableOrTempTable AS CHARACTER NO-UNDO.
DEFINE VARIABLE hFavouriteTableOrTempTable  AS HANDLE    NO-UNDO.

/* The table name is only known at run time, so the compiler cannot tell
   whether a given static TT will be the target of this buffer. */
myFavouriteTableOrTempTable = getRandomTableOrTempTableName().

CREATE BUFFER hFavouriteTableOrTempTable FOR TABLE myFavouriteTableOrTempTable.

Posted by Thomas Mercer-Hursh on 01-Oct-2008 14:38

To be sure, it is not trivial, but it also doesn't seem that hard. How many of these includes do you have and how many TTs in each and how many references?

While not trivial, it would be pretty straightforward to analyze each reference to see what TTs are actually used in that reference. From that you will have a picture of all possible reference combinations. Put the definition for each TT in its own include and then create one include for each combination that simply includes the two or more TT includes. Then substitute each reference for the appropriate reference combination or single include which applies. I'll bet with a little help from John Green you could even automate a lot of it.

I still wouldn't like it much because of all those includes, but it requires no change in the way you are using TTs.

Posted by Tim Kuehn on 01-Oct-2008 14:39

in which case one couldn't do the optimization for that .p/.pp/.sp.

Posted by Admin on 01-Oct-2008 14:42

Great! Dirty, but great!

Thank you, Jamie. That was the case I was looking for. I knew it existed, but...

Posted by Admin on 01-Oct-2008 14:43

A human knowing all possible results of the function could. Not the compiler.

Posted by Tim Kuehn on 01-Oct-2008 14:46

And in those cases, the TTs couldn't be optimized out.

But that still leaves lots of other programs without that statement, which could have their unused static TTs optimized out.

Posted by Admin on 01-Oct-2008 14:51

Please write a spec for the compiler rule to identify those cases without harming those with dynamic expressions returning a table name, like in Jamie's CREATE BUFFER statement.

Posted by Tim Kuehn on 01-Oct-2008 14:55

CREATE BUFFER FOR TABLE only references buffers for TTs which are local to the procedure.

Procedures which do not have this statement in them do not have TTs which are in danger of having buffers made for them, so their un-used static TTs can be safely optimized out.

QED.

Posted by Admin on 01-Oct-2008 14:58

No. No. No.

Don't punish those with valid, harmless CREATE BUFFER FOR TABLE statements for the dirty code of people who, for whatever reason, refuse to optimize their include files (with &IF ... &THEN ... &ENDIF).

Posted by Tim Kuehn on 01-Oct-2008 15:00

What are you talking about?

Posted by Admin on 01-Oct-2008 15:06

That rule you described creates unpredictable behavior. Temp-tables (and the performance hit) would depend on a treatment of code that I believe should be fixed by the developer.

Potentially, a procedure with a small number of temp-tables might perform worse than other programs with 100 unused temp-tables, just because in the first case I'm using a harmless CREATE BUFFER statement. That does not sound like a good outcome.

I don't say that I don't see a problem in general. Maybe Thomas' #2 is a good solution. But I really doubt that the compiler can ever make a good decision here.

Posted by Thomas Mercer-Hursh on 01-Oct-2008 15:07

The point is that it is difficult, at best, to determine that a TT can be optimized out at the compiler level, and it only benefits people who, for whatever reason, have a large number of TTs defined but not used, which is not going to be most people. Thus, this type of fix seems to be mostly a fix for your specific problem, not for a general issue that people are having. The other type of fix, deferring the disk piece until needed, would benefit you and anyone else using a large number of temp-tables ... it might even fix the guy on the PEG with a TT in every entity class.

Posted by Tim Kuehn on 02-Oct-2008 10:10

With all due respect to those who suggest that this is "just" my problem...please read the following thread on the PEG:

http://www.peg.com/lists/peg/web/msg18374.html

Posted by Tim Kuehn on 02-Oct-2008 10:22

That rule you described creates unpredictable behavior. Temp-tables (and the performance hit) would depend on a treatment of code that I believe should be fixed by the developer.

And in what wonderland do you live, where legacy applications were written by developers who had complete knowledge of how the AVM behaves and wrote their code accordingly?

Potentially, a procedure with a small number of temp-tables might perform worse than other programs with 100 unused temp-tables.

That's a distinct possibility.

Just because in the first case I'm using a harmless CREATE BUFFER statement.

It's not harmless if it can reference one of those static TTs and the compiler can't tell which one will be the target.

That does not sound like a good outcome.

Well, what do you want me to say?

Would you rather leave things the way they are and have poor performance for all procedures with unused static TTs?

I don't say that I don't see a problem in general. Maybe Thomas' #2 is a good solution. But I really doubt that the compiler can ever make a good decision here.

That's a question for the people who do compiler work to answer.

Based on what I know about such things, I think optimizing out un-referenced static TTs in the compiler would be much easier to do than changing the AVM to do delayed TT instantiation.

Given a choice between the two, though, I'd rather have the delayed TT instantiation.

Posted by Admin on 02-Oct-2008 10:28

Based on what I know about such things, I think optimizing out un-referenced static TTs in the compiler would be much easier to do than changing the AVM to do delayed TT instantiation.

The days of my compiler-building classes are long past. But wouldn't that require a two-pass compiler? Do the first compile run and then evaluate what's not required in a second pass.

We have the two-pass compile for OO classes, but not for procedures. If it were an easy step, why hasn't it been introduced already?

Posted by Tim Kuehn on 02-Oct-2008 10:29

The point is that it is difficult, at best, to determine that a TT can be optimized out at the compiler level, and it only benefits people who, for whatever reason, have a large number of TTs defined but not used, which is not going to be most people.

"Not going to be most people"?!? How do you come to that amazing conclusion?

And even if it is a few people, that kind of performance hit is about as acceptable as having to empty all the TTs before closing the session.

Thus, this type of fix seems to be mostly a fix for your specific problem, not for a general issue that people are having.

See the PEG thread - others are having this issue. That there are two KBs on this also suggests that this is not a "just Tim" issue.

The other type of fix, deferring the disk piece until needed, would benefit you and anyone else using a large number of temp-tables ... it might even fix the guy on the PEG with a TT in every entity class.

I agree that delayed TT instantiation would be the preferable way to go, but I'm not optimistic it can be done anytime soon compared to optimizing out unused static TT references in the compiler.

Posted by Tim Kuehn on 02-Oct-2008 10:31

The days of my compiler-building classes are long past. But wouldn't that require a two-pass compiler? Do the first compile run and then evaluate what's not required in a second pass.

I'd guess it'd be one pass to tokenize/parse things, and then a generation phase to emit the r-code.

We have the two-pass compile for OO classes, but not for procedures. If it were an easy step, why hasn't it been introduced already?

Because not enough people are making enough noise about this issue to get it scheduled ahead of the 100 years' worth of work requests they already have.

Posted by Thomas Mercer-Hursh on 02-Oct-2008 10:57

"just" my problem.

Not just your problem, but only the problem of someone who includes large numbers of unused temp-tables. It isn't at all clear that the poster you point to has unused temp-tables. He may simply have a lot of temp-tables in use, in which case any optimization for unused ones wouldn't help him a bit.

The deferred instantiation of the disk portion might, however.

Posted by Thomas Mercer-Hursh on 02-Oct-2008 11:09

That's a question for the people that do compiler work to answer.

Not necessarily. I suppose there are actually three different proposals on the table at this point:

1) Have the compiler null out the instantiation of any temp-tables which are not used.

1.5) Have the AVM defer any instantiation processing on temp-tables until they are referenced. This might require some change in the compiler, but it might not.

2) Have the AVM defer any DBI disk processing on temp-tables until they actually overflow. I'm pretty sure this requires no change to the compiler.

1 seems problematic because of dynamic constructs and only helps people with unused temp-tables. People who are using all their temp-tables are not helped a bit.

1.5 might spread out the impact for those with lots of used temp-tables, but only if their access is spread throughout a procedure. It does help those with lots of unused temp-tables. This might require some fairly interesting changes to the AVM, since all the instantiation processing (which I would guess normally happens when the definition is hit) would have to be deferred and associated with a first reference, and the first reference could well not be deterministic.

2 has a material benefit to anyone using any number of temp-tables, unused or not. The only one it doesn't help is those with lots of temp-tables which are overflowing to disk ... and I'm not sure there is much that could be done to help them. This seems to me an easier change to the AVM since the basic instantiation of the temp-table in memory can be done when the definition is encountered and the only real change is a flag to indicate that the disk portion is not yet done and modifying the logic for slopping over to disk to check that flag and do the disk initialization if needed.
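Proposal 2 amounts to a lazy-allocation flag. A minimal Python sketch of the idea (purely illustrative -- the `TempTable` class, the `mem_limit_rows` parameter, and the flag name are invented for this sketch, not actual AVM internals):

```python
# Illustrative sketch of proposal 2: create the in-memory portion of a
# temp-table at definition time, but defer the DBI disk allocation until
# rows actually overflow the in-memory limit (a stand-in for -Bt).
class TempTable:
    def __init__(self, name, mem_limit_rows):
        self.name = name
        self.mem_limit = mem_limit_rows   # stand-in for this TT's -Bt share
        self.mem_rows = []                # in-memory portion, made eagerly
        self.disk_rows = []               # stand-in for the DBI extension
        self.dbi_allocated = False        # the proposed deferral flag

    def _allocate_dbi(self):
        # In the real AVM this would reserve the per-table DBI blocks;
        # here it just flips the flag.
        self.dbi_allocated = True

    def create_row(self, row):
        if len(self.mem_rows) < self.mem_limit:
            self.mem_rows.append(row)     # fits in memory: no disk touched
        else:
            if not self.dbi_allocated:
                self._allocate_dbi()      # first overflow pays the disk cost
            self.disk_rows.append(row)

tt = TempTable("tt-order", mem_limit_rows=2)
tt.create_row("a")
tt.create_row("b")
print(tt.dbi_allocated)   # False: a small or unused TT never touches DBI
tt.create_row("c")
print(tt.dbi_allocated)   # True: allocated only on first overflow
```

An unused temp-table never reaches `create_row`, so under this scheme it would never cost any DBI space at all.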

So, I agree that there is a potential big win here. I just don't agree on either of your proposed solutions.

Posted by Thomas Mercer-Hursh on 02-Oct-2008 11:11

Do the first compile run and then evaluate what's not required in a second pass.

And to skip doing it if there are any dynamic references.

Posted by Thomas Mercer-Hursh on 02-Oct-2008 11:18

See the PEG thread - others are having this issue.

The other cases I have heard of are either clearly a case of lots of temp-tables which are all being used (the guy with the TT in every widget) or are cases where I at least see no indication that the temp-tables are unused. Yours is the only case where I am aware of the large number of unused temp-tables.

Solution 1 provides no help to those who are using all the tables. Solution 1.5 might provide a little spreading out of the impact, but only if there are meaningful gaps between when the TTs are encountered in the session. So, its big impact is again only on those who have lots of unused temp-tables. Solution 2 benefits everyone except those who have lots of temp-table spilling over onto disk and I don't think there is much one can do for them except give them more memory.

I agree that delayed TT instantiation would be the preferable way to go, but I'm not optimistic it can be done anytime soon compared to optimizing out un-used static TT references by the compiler.

See above. I am not proposing deferred instantiation. That is 1.5. I am proposing simply that temp-tables be treated as if they were going to live entirely in memory until such time as they don't. There has to be logic in there already that detects that in memory spaces is used up and it is time to go to disk, so instantiation of the disk portion could be linked to that.

Posted by Thomas Mercer-Hursh on 02-Oct-2008 11:23

I'd guess it'd be one pass to tokenize / parse things, and then a generation phase to emit the r-code.

Which sounds like at least as much change to the compiler as adding OO, but with impact only on people who have lots of temp-tables in their code that they aren't using.

Because not enough people are making enough noise about this issue to get it scheduled along with the 100 years worth of work requests they already have.

Or, because the work around is to change the code in a way which actually makes it better code. I.e., the cost-benefit ratio is high compared to the number of people helped. Whereas, my proposal seems like it would benefit anyone using any number of temp-tables, referenced or not.

Posted by Tim Kuehn on 02-Oct-2008 11:29

I am proposing simply that temp-tables be treated as if they were going to live entirely in memory until such time as they don't. There has to be logic in there already that detects that in-memory space is used up and it is time to go to disk, so instantiation of the disk portion could be linked to that.

You mean the -Bt parameter.

Posted by Thomas Mercer-Hursh on 02-Oct-2008 11:53

I mean that when the space provided by -Bt is used up, there needs to be code that slops to disk. I am proposing that this code have a flag per TT to indicate whether the disk image has been initialized, and to do the initialization when needed if it has not.

Posted by Tim Kuehn on 02-Oct-2008 11:59

"Slopping to disk" is where the DBI files are coming from, and it is already part of the system's current behavior.

Beyond that, you're basically talking delayed instantiation again.

Posted by Thomas Mercer-Hursh on 02-Oct-2008 12:25

No, my perception based on your description is that the initial DBI space is being reserved for each temp-table at the point where it is defined, whether or not it is ever being used. Otherwise, you wouldn't be getting huge DBIs when only a small number of the temp-tables are actually referenced. This is certainly a direct parallel to what many Unix systems have done with swap space in the past (and perhaps still do). So, I am advocating instantiating the in-memory portion of the temp-table at the time the definition is encountered and just deferring the DBI extension until needed.

In fact, I would assume that in many cases, possibly even yours, supplying a sufficient -Bt would mean that nothing ever went to disk.

Posted by Tim Kuehn on 02-Oct-2008 12:30

No, my perception based on your description is that the initial DBI space is being reserved for each temp-table at the point where it is defined, whether or not it is ever being used.

At the point where I was observing the huge DBI file, the user had gone to a menu, but hadn't done anything. So, the size of the file wasn't due to data in the TT, but by how much space the AVM had allocated for the TTs.

Otherwise, you wouldn't be getting huge DBIs when only a small number of the temp-tables are actually referenced. This is certainly a direct parallel to what many Unix systems have done with swap space in the past (and perhaps still do). So, I am advocating instantiating the in-memory portion of the temp-table at the time the definition is encountered and just deferring the DBI extension until needed.

It's doing that to a certain extent right now, but the problem appears to be how much space is being allocated to each TT structure.

In fact, I would assume that in many cases, possibly even yours, supplying a sufficient -Bt would mean that nothing ever went to disk.

Using a lower -tmpbsize helped in one test, as it substantially reduced the amount of space taken up. However, I'm not sure what kind of operational impact it would have when the user actually started doing work.

Posted by ChUIMonster on 02-Oct-2008 13:09

I don't think that this is the same issue as the PEG posting. The PEG thing seems to have more to do with lots of -T space usage associated with lots of data in a temp-table. Yours seems to be about lots of -T usage but no data.

In your 1st post you've got 160MB (more or less) of disk being used. You apparently have -tmpbsize 8 for that run which, according to the kbase rule, would work out to 228 temp-tables. Given your description of creating "lots" of temp tables that doesn't sound unreasonable. (72 8k blocks per table would be needed.)

If you're really defining 228 distinct temp-tables, and really need to, then you should face the facts and set -Bt to accommodate that. 20,000 or so should do the trick. Or drop -tmpbsize down to 1 and use -Bt 2500 (plus whatever you need for actual data).

(I'm not sure that I believe the kbase is 100% correct -- it only seems to be accounting for the data portion of the table. The 9 blocks sounds, to me, like 1 cluster of 8 blocks plus a block for the default index. I would guess that each additional index is another block...)

(As an aside -- do you really need "lots" of temp-tables? Or are these all basically the same? Could you simply add an "owner" column and make a smaller set of global TTs?)

I do sort of like the idea of deferred creation. A sort of "create on write" sort of approach might be an interesting option to have. After all why go to the trouble of allocating memory if it never gets used? It would be nice if such a feature were combined with making -Bt more like -mmax. IOW, make it a soft limit and don't allocate any RAM unless it is actually needed.

I would also very much like to see some instrumentation that sheds light on what is actually happening internally with temp-tables. We need many of the same VSTs and meta-schema tables as we have for the database -- but with a local, tt flavor to them. I'd really, really, like to be able to do something like:

for each _ttFile:

display _ttFile-name.

end.

and:

find _ttActBuffer.

display _ttBuffer-LogicRds.

and so forth. ttTableStats and ttIndexStats too please

We also probably need to be able to assign temp tables to storage areas with a particular RPB setting. Maybe we only need pre-defined SAs on a per RPB basis or maybe we need something more flexible but the knobs and dials currently available aren't nearly enough.

Posted by ChUIMonster on 02-Oct-2008 13:13

Oops...

A slight arithmetic problem. Each table would take 72KB -- 9 blocks @ 8k each. So 160MB of empty temp-tables translates to about 2,200 temp-tables. That's a bit less reasonable than I thought originally.
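The corrected arithmetic can be checked directly, assuming 9 blocks per empty TT (an 8-block cluster plus one default-index block) at -tmpbsize 8:

```python
# Back-of-the-envelope check of the corrected estimate above.
dbi_bytes = 168_230_912            # the DBI size from the original post
bytes_per_tt = 9 * 8 * 1024        # 9 blocks of 8 KB = 73,728 bytes per empty TT
print(dbi_bytes // bytes_per_tt)   # about 2,281 -- in line with "2,200 or so"
```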

So Tim... how many temp tables do you really create? And how many indexes are there?

Posted by Thomas Mercer-Hursh on 02-Oct-2008 13:43

At the point where I was observing the huge DBI file, the user had gone to a menu, but hadn't done anything. So, the size of the file wasn't due to data in the TT, but by how much space the AVM had allocated for the TTs.

Exactly. This is my point.

It's doing that to a certain extent right now, but the problem appears to be how much space is being allocated to each TT structure.

While I have no real expertise in the area or inside knowledge, everything I am seeing at this point suggests that when the TT definition is encountered, the in memory part is initialized and so is a block in DBI that is the logical extension of the in memory portion. If it ever slops into that DBI space and if it uses up that space, then it adds more space. For an unused temp-table this never happens, but you are still paying the penalty of allocating the disk block you don't need.

using a lower -tmpbsize helped in one test, as it substantially reduced the amount of space that was taken up. However, I'm not sure what kind of operational impact it would have when the user actually started doing work.

Well worth a test, I should think.

Posted by Tim Kuehn on 02-Oct-2008 14:03

So Tim... how many temp tables do you really create?

And how many indexes are there?

That seems to be the million-dollar question.

The application loads upwards of 150 SP/PP files in some places, and the 160MB DBI file is the result of just entering a menu before the user does anything.

The individual programs may have APIs which pass TTs as parameters, which would mean each referenced TT defn is included as part of the procedure / function header. If more than one TT defn is included in a TT defn file, or APIs are referenced which aren't used, then there's the distinct possibility that a given PP / SP / .p could have a multitude of TT defn's which are never used.

Posted by ChUIMonster on 02-Oct-2008 14:29

In other words a few thousand wouldn't be a completely crazy number to expect to find.

Posted by Thomas Mercer-Hursh on 02-Oct-2008 14:42

So, another avenue for reduction would be consolidating some of the PPs so there weren't as many instances.

Posted by Tim Kuehn on 02-Oct-2008 14:42

In other words a few thousand wouldn't be a completely crazy number to expect to find.

nope.

I never, in my wildest dreams, anticipated running into this kind of problem.

Posted by Tim Kuehn on 02-Oct-2008 14:47

So, another avenue for reduction would be consolidating some of the PPs so there weren't as many instances.

Not easily... while the code structure lends itself to that kind of refactoring, there's over 1200 SP's in this system, not to mention .p's, so the sheer scope of the effort would be a bit daunting.

Posted by Thomas Mercer-Hursh on 02-Oct-2008 15:03

Well, cross your fingers for -tmpbsize, I guess. You certainly aren't going to get a fix from PSC for quite a while, so if a parameter doesn't fix it, the only choice is changing the code. Like I said, I would think that a little help from John Green would give you all the combinations of what TTs are actually used and identify where they are and aren't referenced, so refactoring to refer only to needed TTs could be done with a lot of automation. That seems like the easiest code fix.

Posted by Tim Kuehn on 02-Oct-2008 15:28

The DBI file size isn't the biggest issue per se. Rather, with the current application structure, navigating through the application requires loading and unloading procedures as the user moves through the menus, and the resulting delays are where the real pain is. Changes are in process to keep this load/unload from having to be done on a recurring basis, but until that's completed, we're kind of stuck with what we have.

Having delayed TT instantiation would be a much nicer solution though.

Posted by jtownsen on 02-Oct-2008 16:52

If performance is the killer here, have you considered putting the -T space on a ram disk? Of course, if it's 160MB / user, you might have to only do something like this for a number of selected users...

Posted by Tim Kuehn on 02-Oct-2008 17:18

If performance is the killer here, have you considered putting the -T space on a ram disk? Of course, if it's 160MB / user, you might have to only do something like this for a number of selected users...

If a RAM disk would help a situation like this, wouldn't a larger -Bt do better?

Posted by jtownsen on 02-Oct-2008 17:25

I don't know the internals of whether the initial creation and writing to disk performs a sync() or whether temp-tables use the -Mf parameter (or perhaps have a hard coded value). If either of these apply, a bigger -Bt won't help because you're disk bound and the only solution is to make the disk faster.

More OS buffer space for disk writes still won't help either if a sync() is called though...

Posted by Thomas Mercer-Hursh on 02-Oct-2008 19:01

One rather assumes that you have already tuned -Bt as best you can ... haven't you?

As Tom Bascom has noted, RAM disks and solid states disks vary in performance relative to good hard disks. Are the hard disks on this system fast and sensibly used? Have you tried just moving the -T files to one or more disks of their own?

It would take some knowledge of the internals to know what the bottleneck is, although allocating 160MB in 2000+ separate pieces sounds like something that could be a bottleneck even with the fastest disks.

Do you also have contention issues? I.e., is the delay really bad when the system is loaded and not so bad otherwise?

Posted by ChUIMonster on 03-Oct-2008 08:28

If you've got enough memory to stuff these files onto a RAM disk then it is a far better use of that RAM to allocate it to -Bt.

A RAM disk might be better than a real disk -- but only by some percentage, and your mileage will vary substantially. But no disk at all, of either type, is clearly a much, much better option.

Like a real disk, a RAM disk is multiple layers of indirection away from where the data is being accessed and used. To access that data you're going to have to execute a lot more code (the path length at the machine code level) and you're going to incur context switches when you make the read() & write() system calls. At best this is going to be (literally) 10x slower (and probably more like 100x slower) than allocating the very same RAM to a larger -Bt.

Whenever possible it is best to cache data as few layers as you can from where it will be used.

Posted by Darren Parr on 03-Oct-2008 11:25

Hi

sorry to add my 2c here. I have had issues with temp-tables and I guess my problem is not unrelated to the content here.

Most of you have been replying to my message on the PEG. I do think there has been a massive performance regression between v9 and 10.1 in this area.

First of all I think there should be a way of specifying the old v9 method for temp-table structure as it causes us performance issues. I'm not sure if this is possible as I haven't found a way of doing it yet.

Secondly I back the argument for deferred instantiation as this whole thing makes sense.

Thirdly, I think it was a daft idea to use Type II structures in temp-tables when we only have -tmpbsize as a way of altering performance. My Type II storage areas in our db can be tuned in all sorts of ways: RPB, clusters, etc. Unfortunately Progress gives us no way of altering these and applies a one-size-fits-all model to TTs. Is this really a good idea?

What I'm personally seeing is that space usage is massive for our -T location. Whenever the customer complains about speed, we find the temp disk at 100% and cpu usage going high. We had acceptable performance in 9.1E and poor performance in 10.1B. No parameter changes were made when we did this upgrade and the performance issue only became apparent to us when the system was load tested with about 140 users and 5 batch processes.

It looks to me like PSC never really knew how bad this was in comparison.

Regards

Darren

Posted by Thomas Mercer-Hursh on 03-Oct-2008 11:45

Whenever possible it is best to cache data as few layers as you can from where it will be used

While Tim hasn't said much about his -Bt situation yet, I rather assume that he has already done what he can get the customer to do. From what I understand of the situation, his issue is not that the system is beating up on the disk when the TTs are being used, but rather that the start-up burden of initializing 160MB in 2000 pieces is producing a delay before there may be any meaningful data in any of those TTs.

So, the question about a possible RAM disk is not that one expects -Bt to slop over onto that disk, but that a RAM disk would make that initialization process more efficient. My personal instinct is that -tmpbsize is the first easy thing to try since it will drastically cut the amount of disk being allocated, but it may not help proportionately since the real overhead might be the 2000 pieces of initialization.

Posted by Thomas Mercer-Hursh on 03-Oct-2008 11:49

Darren, can you clarify at what point you are seeing the impact? I.e., do you also see a big hit on initialization, or is yours more a question of performance during use?

If the former, do you have any idea how many TTs you are initializing at once?

If the latter, have you tuned -Bt? Have you tried -tmpbsize? Do you have a lot of data in these tables so that they are inherently going to slop to disk?

Posted by Thomas Mercer-Hursh on 03-Oct-2008 11:53

Tim, I was thinking about this issue this morning and it occurs to me that the refactoring problem might be even easier than I was suggesting earlier.

Step one might be simply replacing one reference to an include with multiple temp-tables with N references, each to a single temp-table. That's the sort of thing that you might even accomplish with grep and sed.

Then, suppose you wrote a ProLint rule ... which might already exist ... to look for TT definitions which were not used? Then you could quickly zap the includes for any of those TTs.

No new fancy refactoring work required.

Posted by ChUIMonster on 03-Oct-2008 11:58

It looks to me like PSC never really knew how bad this was in comparison.

They are about as blind to temp-table usage as we are. There is 0 visibility into temp-table usage and performance at run-time. We all desperately need instrumentation in this area.

Posted by ChUIMonster on 03-Oct-2008 12:07

>> Whenever possible it is best to cache data as few layers as you can from where it will be used

While Tim hasn't said much about his -Bt situation yet, I rather assume that he has already done what he can get the customer to do.

You know what they say about assumptions?

From what I understand of the situation, his issue is not that the system is beating up on the disk when the TTs are being used, but rather that the _start up_ burden of initializing 160MB in 2000 pieces is providing a delay before there may be any meaningful data in any of those TTs.

So, the question about a possible RAM disk is not that one expects -Bt to slop over onto that disk, but that a RAM disk would make that initialization process more efficient.

It's the same thing.

During initialization he is overflowing -Bt. He is trying to initialize more blocks of temp-tables (at least 9 blocks per table the moment that they are defined) than he has -Bt.

A larger -Bt will resolve that problem. If there is RAM to define a RAM disk that big then that RAM will be more useful as -Bt.

... My personal instinct is that -tmpbsize is the first easy thing to try since it will drastically cut the amount of disk being allocated, but it may not help proportionately since the real overhead might be the 2000 pieces of initialization.

-tmpbsize will just change the size of the allocation units. That is handy in so far as it reduces wastefulness and therefore permits a larger -Bt within the same memory footprint.

Posted by Thomas Mercer-Hursh on 03-Oct-2008 12:08

One thing this thread might help is to build an awareness. The KB entries indicate that they aren't totally oblivious to some of the issues, but it would appear that they aren't really clear on just how big the issues can be and what they might do to either report on or improve on things. This is why I think it is important to clarify as much as we can with existing tools and tuning options exactly what is going on for Tim and Darren and anyone else we can find so that we can present a clear picture to PSC.

Posted by ChUIMonster on 03-Oct-2008 12:13

Thirdly, I think it was a daft idea to use Type II structures in temp-tables as we only have -tmpbsize as a way of altering performance. My Type II storage areas in our db can be tuned in all sorts of ways: RPB, clusters, etc. Unfortunately Progress gives us no way of altering these and applies a one-size-fits-all model to TTs. Is this really a good idea?

Good point.

If we're going to have fancy Type II areas for temp-tables, we need the tools to monitor and tune them with.

Posted by ChUIMonster on 03-Oct-2008 12:17

Part of the problem is likely that PSC itself has little experience with how this stuff is being used in the field.

Having top secret limited access "beta" programs doesn't help either.

Posted by Thomas Mercer-Hursh on 03-Oct-2008 12:20

You know what they say about assumptions?

Especially since we know that Tim isn't really in control here, so he may have recommended action which has not been taken.

During initialization he is overflowing -Bt. He is trying to initialize more blocks of temp-tables (at least 9 blocks per table the moment that they are defined) than he has -Bt.

A larger -Bt will resolve that problem. If there is RAM to define a RAM disk that big then that RAM will be more useful as -Bt.

Do you know this for a fact? I.e., it isn't just the 2000 tables you calculated from the 160MB, but that plus whatever fit in the current -Bt? I.e., were he able to allocate an additional 160MB per client to -Bt, then he wouldn't be getting those disk files?

Of course, that might not be a practical solution since it could be a whole lot of additional memory on a good-sized system, but it sure would be interesting to know. Should be easy enough to test by temporarily boosting -Bt and seeing whether the disk footprint drops proportionately. Could be done for one client.

My assumption ... yes, yes, I know, but it is all I can do until someone comes up with tests or authoritative knowledge ... was that it was creating a disk space similar to what OSs do when they assign swap space for an in memory process at initialization without knowing whether it will ever be needed since it is then fast when one wants to use it. I can see PSC thinking that fast response during use was more important than a delay at instantiation.

-tmpbsize will just change the size of the allocation units. That is handy in so far as it reduces wastefulness and therefore permits a larger -Bt within the same memory footprint.

It too should help confirm your view of -Bt versus disk since wouldn't it mean that 8 times as many could fit in -Bt without needing to use disk? Thus, if my initial theory about the swap pattern is right, an 8X decrease in -tmpbsize should reduce disk utilization by 8, but if your position is correct, it should be more than 8, especially if -Bt was non-trivial.

Posted by ChUIMonster on 03-Oct-2008 12:39

-Bt is number of blocks. Not MB of space.

Posted by Thomas Mercer-Hursh on 03-Oct-2008 12:58

Yes, but one relates to the other. If one reduces block size, then one can afford to specify more blocks for the same memory footprint, no?

Posted by Tim Kuehn on 03-Oct-2008 13:00

On my customer's site:

We haven't tried a lower tmpbsize, higher -Bt, etc. because it was just now that I came to understand that the huge DBI's on the client's machine were from empty TT's, and not from large amounts of data getting stored in one or more TTs.

I'll be back there next week, and the first thing I'll suggest is reducing -tmpbsize and seeing how things work. -Bt is about as high as we can put it right now, but a lower -tmpbsize will at least keep more of the TT structure in memory rather than having it overflow onto disk.

FWIW - the mass storage system is running on a Fibre Channel SAN with a RAID 10 configuration. Beyond that, I'm not sure what the disk configuration is.

Posted by ChUIMonster on 03-Oct-2008 13:01

(Nicely formatting postings is more or less impossible, sorry.)

I ran some tests:

The results:

The assumptions about sizing and initialization appear to be correct. Except that the default for -tmpbsize is 4k (at least on 10.1C).

In short, using disk is slowest, a ram disk is better and no disk of any sort is best.

Changing to -tmpbsize 1 results in 1/4 the disk space being used (the default on my system is 4) and reduced etimes to 17ms (no TT), 93ms, 68ms and 52ms. Less of an advantage to a large -Bt but still quite notable.

Looks like I was right about "9" being 8 blocks for the data and 1 for the (default) index. Adding indexes to the definition seems to increase the initial allocation by 1 block per index.

Posted by Tim Kuehn on 03-Oct-2008 13:12

From what I understand of the situation, his issue is not that the system is beating up on the disk when the TTs are being used, but rather that the start up burden of initializing 160MB in 2000 pieces is providing a delay before there may be any meaningful data in any of those TTs.

Correct. There's little, if any data in the TT's, but there's a huge DBI file being created before the user can do anything.

Posted by Thomas Mercer-Hursh on 03-Oct-2008 13:21

the first thing I'll suggest is reducing -tmpbsize and seeing how things work. -Bt is about as high as we can put it right now

OK, but if you move from 8 to 1 on -tmpbsize, then you should increase -Bt by a factor of 8 for the same memory footprint.
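The "same memory footprint" point follows from -Bt counting blocks rather than bytes. A trivial check (the -Bt values here are illustrative, not a recommendation):

```python
# -Bt counts blocks, so the memory footprint is blocks * block size.
# Moving from -tmpbsize 8 to 1 with 8x the -Bt leaves the footprint unchanged.
def footprint_kb(bt_blocks, tmpbsize_kb):
    return bt_blocks * tmpbsize_kb

print(footprint_kb(2_500, 8))    # 20000 KB at -tmpbsize 8
print(footprint_kb(20_000, 1))   # 20000 KB at -tmpbsize 1 -- identical
```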

I would suggest doing some experiments with a single client session so that you can follow a fixed workflow, playing with parameters and collecting timings and sizes. You are going to want a mix of the initialization and then some production type activity that bangs away on the TTs in order to get a profile.

Results should shed some light.

Posted by ChUIMonster on 03-Oct-2008 13:26

Yes, but one relates to the other. If one reduces block size, then one can afford to specify more blocks for the same memory footprint, no?

Yes. But it wasn't at all clear to me that that was clear to all.

If you want more stuff to stay in memory you must either:

Allocate more blocks (increase -Bt)

- or -

Put more stuff in each block.

The second one is a bit tricky.

You might be able to put more stuff in each block by increasing -tmpbsize.

Decreasing it will not result in getting more stuff in each block and might even result in less stuff per block.

But if you are wasting space in your blocks (fairly likely with 8k blocks - especially if the TT is empty...) then using smaller blocks will use less memory without evicting anything. You can then take the "saved" memory and use it for more blocks by increasing -Bt.
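A rough sketch of that trade-off, assuming (as estimated earlier in the thread) that an empty TT pins 9 blocks regardless of block size:

```python
# How many empty temp-tables fit in a fixed -Bt memory budget at two block
# sizes, assuming 9 blocks per empty TT (8-block cluster + 1 index block).
# The 16 MB budget is an arbitrary example.
BLOCKS_PER_EMPTY_TT = 9
BUDGET_KB = 16 * 1024

for kb in (8, 1):
    per_tt_kb = BLOCKS_PER_EMPTY_TT * kb
    fits = BUDGET_KB // per_tt_kb
    print(f"-tmpbsize {kb}: {per_tt_kb} KB per empty TT, "
          f"{fits} empty TTs fit in a 16 MB budget")
```

At -tmpbsize 8 only a couple hundred empty TTs fit in memory; at -tmpbsize 1 the same budget holds roughly eight times as many, which is why shrinking the block size while raising -Bt can keep everything off disk.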

Posted by Thomas Mercer-Hursh on 03-Oct-2008 13:32

Correct. There's little, if any data in the TT's, but there's a huge DBI file being created before the user can do anything.

So, to wrap up a little at this point:

We have a couple of things for you to try which might help your immediate problem.

While you are doing it, you should be able to determine from the change in the DBI whether there is a DBI piece created for each temp-table regardless of -Bt or whether you are only getting DBI from overflow of -Bt.

You might also consider doing a little grepping to get an idea of the number of affected compile units, in case you are able to persuade them to do some refactoring.

Clearly there is a need for a bit more information from someone at PSC who could tell us how some of these things work now.

It seems clear that it could be very desirable to have additional parameters to control TT behavior, although I think it remains to be seen exactly what they are. After all, with 2000 temp-tables in the mix, how does one decide how to set RPB and the like?

Likewise, it is clear that some additional instrumentation would be desirable. Tom? Want to make some specific proposals?

And, for longer term consideration we have three different proposals which might improve TT handling. Two of those relate mostly to Tim's situation of having TTs which are defined, but not used and the third would be useful to a larger set of circumstances, but is irrelevant if Tom is right that DBI is not used at all until -Bt is exhausted. I.e., if Tom is right, the third option is already the way it works.

That pretty well cover it?

Posted by jtownsen on 03-Oct-2008 13:44

Try making your <pre> & </pre> tags into [ p r e ] & [ / p r e ] tags

Posted by ChUIMonster on 03-Oct-2008 13:45

Minor detail...

DBI is initialized, on disk, with 8 -tmpbsize blocks. This always happens on session startup. Even if you never define a temp-table. I don't know what this disk space is used for but it is always there.

Other than that I am quite sure that no disk space will be used until all of the -Bt blocks are consumed. There is nothing magic about how these blocks get consumed. The initial blocks allocated when a table is created are not any different from any other.

Posted by ChUIMonster on 03-Oct-2008 13:48

Try making your <pre> & </pre> tags into [ p r e ] & [ / p r e ] tags

Thanks!

Posted by Tim Kuehn on 03-Oct-2008 14:05

Turns out, I don't have to go back to the customer site to test this.

I posted this thread: http://www.psdn.com/library/thread.jspa?threadID=12403&tstart=0 because I thought it demonstrated something about r-code load performance.

Now, it turns out that's not the issue at all.

I took that same code, ran "makeobjects.p" to make the classes, and then ran "testobjects.p" under 10.1C in a ChUI DOS window, with default parameters.

DBI file size: 86MB!

running w/-tmpbsize = 1 resulted in a 21MB DBI file.

running w/-tmpbsize = 8 resulted in a 170MB DBI file.

Oi....
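Taking the default -tmpbsize to be 4 (as noted earlier for 10.1C), these three measurements suggest the empty-TT DBI size scales roughly linearly with block size:

```python
# Sanity check on the measurements above: DBI size vs. -tmpbsize for the
# same (nearly empty) set of temp-tables.
measured_mb = {1: 21, 4: 86, 8: 170}   # -tmpbsize (KB) -> DBI size (MB)

base = measured_mb[1]
for kb, mb in sorted(measured_mb.items()):
    # ratio of measured size to a purely linear prediction from the 1 KB run
    print(kb, round(mb / (base * kb), 2))   # all close to 1.0 => linear scaling
```

That linearity is consistent with a fixed number of blocks being reserved per temp-table, so halving the block size roughly halves the DBI.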

Posted by svi on 03-Oct-2008 14:07

I'm back. Just a quick word to acknowledge that we're working on this (starting back with the thread from Tim http://www.psdn.com/library/thread.jspa?threadID=12403)

We've been running tests with the procedures that were posted earlier, and following up and/or updating with the information you've been providing through both threads. We shall be getting back to you sometime next week.

Thank you all, as always

Salvador

Posted by Thomas Mercer-Hursh on 03-Oct-2008 14:25

So, am I right that if Tim goes from a block size of 8 to a block size of 1, that he should increase -Bt by 8?

I think you might have said this at one point, but would you clarify the formula for using up these blocks for empty temp-tables, both in -Bt and on disk? It would be nice for Tim to have a clear expectation of what he should see in terms of DBI change.

Posted by Thomas Mercer-Hursh on 03-Oct-2008 14:27

Good to know you are on the job! I hope you have followed this whole thread, since there are a number of different circumstances and ideas exposed. It would be nice to have someone who understood the TT initialization mechanism to comment.

Posted by Thomas Mercer-Hursh on 03-Oct-2008 14:28

OK, now make a proportionate difference in -Bt and tell us what you get?

How about timings?

Posted by ChUIMonster on 03-Oct-2008 14:29

Yes. It is just like changing -B and the db block size. To keep memory use constant if you change the one you should change the other by an inverse amount.

Posted by svi on 03-Oct-2008 14:35

With the latest postings you are on the right track; and we may need to change some OE 10 defaults, and perhaps a setting for deferred TT??? All's been looked at. The experts need some more time, but we will!

Salvador

Posted by Tim Kuehn on 03-Oct-2008 14:44

I'm back. Just a quick word to acknowledge that we're working on this (starting back with the thread from Tim http://www.psdn.com/library/thread.jspa?threadID=12403)

Salvador - it's always good to hear from you, and that your team's working on this.

However, this isn't the first time I've made noise about r-code load performance, and I'd even sent you a log file which showed some pretty crazy delays while loading r-code. Had those reports been investigated further when I reported them, this problem would've been uncovered a lot sooner.

Why didn't that happen?

We've been running tests with the procedures that were posted earlier, and following up and/or updating with the information you've been providing through both threads. We shall be getting back to you sometime next week.

Communication is always good.

However, in the future, it would also be extremely helpful to get more interaction from whoever's working on something that's reported here while they're checking things out, rather than leaving us out here to decipher what's going on on our own.

Posted by Thomas Mercer-Hursh on 03-Oct-2008 14:44

I missed this post with Tom's timings in the flurry this morning, but it seems to prove your case that the DBI piece only happens after -Bt is consumed. So, if Tim is already at the max -Bt they can handle, moving to -tmpbsize 1 would allow him to bump that number by 8 (if they are at 8 now) and get a significantly larger number of temp-tables defined before the disk activity started. It also seems clear that pushing the -Bt as far as he can is key.

Posted by Thomas Mercer-Hursh on 03-Oct-2008 14:45

Tim, apropos easy ways to help the source code side of this problem, check out Jurjen's remarks here http://www.oehive.org/prolint/rules/tableusage

Posted by Tim Kuehn on 03-Oct-2008 14:48

With the latest postings you are on the right track; and we may need to change some OE 10 defaults, and perhaps a setting for deferred TT??? All's been looked at. The experts need some more time, but we will!

Salvador

Of the possible solutions posted here so far, I think deferred TT instantiation would be the biggest overall general win for everybody who uses temp tables.

I'm not sure how "do"able it is, but that would get my first vote.

I look forward to your report on where things have been and where they're going next week.

Posted by Tim Kuehn on 03-Oct-2008 14:51

Tim, apropos easy ways to help the source code side of this problem, check out Jurjen's remarks here http://www.oehive.org/prolint/rules/tableusage

This was a known problem in Mar 2007?!? And hasn't been addressed in the following 18 months?

Geeez.

Thanks for the pointer!

Posted by ChUIMonster on 03-Oct-2008 14:56

It was addressed. That is why the default -tmpbsize was changed to 4.

Posted by Tim Kuehn on 03-Oct-2008 14:58

It was addressed. That is why the default -tmpbsize was changed to 4.

4 is a better value, but it doesn't go far enough.

Posted by ChUIMonster on 03-Oct-2008 15:01

Ideally we would be able to place any given temp-table in either a type 1 or type 2 area and, for type 2 areas, we would be able to choose an area with an appropriate rows per block.

I'm not holding my breath but that's where I think things should go.

Along with adding all of the necessary instrumentation to monitor and tune just like using VSTs on real tables.

Posted by Tim Kuehn on 03-Oct-2008 15:04

One thing I find kind of amusing about this whole mess is that we're grumbling about TT areas that are bigger than a full-size database from years ago!

Posted by svi on 03-Oct-2008 15:24

It did happen, while 10.1C02 has been completed and 10.2A beta is going on... Sometimes things just take time.

Thank you for your patience

Salvador

Posted by Thomas Mercer-Hursh on 03-Oct-2008 17:49

Are you implying some change in this area in 10.2A?

Posted by Tim Kuehn on 03-Oct-2008 19:55

It did happen, while 10.1C02 has been completed and 10.2A beta is going on... Sometimes things just take time.

I'm confused - what "did happen"?

I understand "things take time" (like no-index table scans), but beyond taking my information, your posts here are the first I've heard that something's being done about it, and when those changes will be out.

My prior gripe was more along the lines of wanting some interaction with the people doing the actual work, so we could work on this together rather than us users working in our own area and them "watching" us.

Thank you for your patience

Salvador

So spill the beans - will this update be in 10.1C02? 10.2A? And what will be in it?

Enquiring minds want to know!

Posted by Darren Parr on 06-Oct-2008 04:45

Hi. I'm not seeing a massive slowdown on initialization but we're getting massive issues at run-time. We have done a lot to sort this out, but the system was developed to read thousands of records and pass them all to the client. This was done before I joined the company.

Effectively it's not ADM2 and hence has no rows-to-batch capability. The data is read to the client and then sliced and diced in a tree/list situation. A single click of a tree node can fetch tens of thousands of records, and there is a lot of redirection on the server in that we call one routine to return this data in a TT, and this hits a super first, is passed through another super and then into a specific procedure to return the data. As the TT is passed as an output parameter we have 3 procedures getting the same data and all having their own copy. At one time these procedures were never cleaned up and removed, but now they are.

Essentially at one time error 40 (too many indices) was a common issue for us. Now because of the amount of TT data, we mainly suffer with large DBI files and the issues this presents in OE10.

Our later builds are better at dealing with this. Our large customer who this topic is about is using an older version. We are struggling to convince them to upgrade because of the upheaval.

Posted by jtownsen on 06-Oct-2008 05:02

As the TT is passed as an output parameter we have 3 procedures getting the same data and all having their own copy.

You might be able to make use of the REFERENCE-ONLY & BIND options. The following is from the ABL Reference:

Passing a reference-only temp-table object parameter to a local routine using either the BY-REFERENCE or BIND option allows the calling routine and the called routine to access the same object instance (instead of deep-copying the parameter).
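As a rough sketch of the caller's side under that approach (the table and procedure names are illustrative, not from the thread):

```
/* REFERENCE-ONLY: this definition only declares the schema; no local
   TT instance is allocated in the caller. */
DEFINE TEMP-TABLE ttOrder NO-UNDO REFERENCE-ONLY
    FIELD OrderNum AS INTEGER
    INDEX idxOrder IS PRIMARY UNIQUE OrderNum.

/* BIND on the parameter ties ttOrder to the instance defined and
   populated inside fetchOrders.p, so both routines share one copy
   instead of deep-copying the table. */
RUN fetchOrders.p (OUTPUT TABLE ttOrder BIND).
```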

Posted by Tim Kuehn on 06-Oct-2008 06:37

we call one routine to return this data in a TT and this hits a super first, is passed through another super and then into a specific procedure to return the data. As the TT is passed as an output parameter we have 3 procedures getting the same data and all having their own copy. At one time these procedures were never cleaned up and removed but now they are.

In addition to the BIND / REFERENCE-ONLY option already suggested, have you looked at passing your TT BY-REFERENCE? Then there's only one copy of the data in the system rather than multiple copies of the TT holding the same data, and there's no persistent association between the calling procedure and the called procedure.
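A minimal sketch of a BY-REFERENCE call (table and procedure names are hypothetical, not from the thread):

```
/* The caller defines and owns the TT instance as usual. */
DEFINE TEMP-TABLE ttData NO-UNDO
    FIELD id AS INTEGER.

/* BY-REFERENCE on the call statement binds the called procedure's
   matching TT definition to this instance for the duration of the
   call only -- no deep copy, and no association left behind
   afterwards. */
RUN processData.p (INPUT-OUTPUT TABLE ttData BY-REFERENCE).
```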

Posted by Thomas Mercer-Hursh on 06-Oct-2008 11:55

So, in some ways this is a very different problem from Tim's since yours has to do with having lots and lots of data. -Bt and -tmpbsize might still help by getting more of the data into memory, but it sounds like you have enough that you are still going to be going to disk. And there, I would suspect that the slowness comes not from the volume of disk space used, but from the inherent lower efficiency of disk IO compared to RAM IO.

It does sound like the first key step is to get down to having only one copy of the TT instead of 3. That alone is likely to have a huge impact.

One way to do that would be to encapsulate the TT in its own procedure or class and provide methods for the other procedures to access and manipulate it. One of the advantages of that approach would be that, once done, you would have a natural context in which to explore possibilities for lazy instantiation, if that is even possible. Certainly, one of the problems of tree views, especially if they have sorting or grouping options, is that they tend to require a full set of data, but in some cases one can either lazy instantiate branches or construct a tree of minimal data and lazy instantiate the details.

Posted by svi on 07-Oct-2008 16:33

First of all, thank you for all the interest and the information that you all have shared through this thread. As I said earlier, some postings are pointing in the right direction.

Based on issues reported through Technical Support we have been looking at TT performance in OpenEdge 10 compared to Progress V9. This thread has added insights and some information. All in all we are realizing that some of the changes that have been introduced in OpenEdge 10, and that I am commenting on below, will need to be adjusted because, in some cases (for example when dealing with small TTs) they are causing unintended negative performance impact to TT handling. We have been working, and continue to work on it.

With OpenEdge 10 we introduced support of Type II Storage Areas for TT. One of the major objectives of this change, among others, was to optimize the deletion of TTs in that the deletion time of TTs would be fixed, regardless of their size. Using Type II Storage Areas would introduce some I/O overhead due to the way the resources are allocated with Type II Storage Areas compared to Type I.

In OpenEdge 10.1B we introduced a change in the default block size (-tmpbsize) from 1KB to 4KB. The change was due to the 10.1B enhancement to support large key entries (using 64-bit database keys).

As we described in the Knowledgebase entry:

ID: P122597

Title: "The size of the DBI file is much bigger in OpenEdge 10"

Created: 03/02/2007 Last Modified: 01/08/2008

http://progress.atgnow.com/esprogress/jsp/AnswerControls.jsp?directSolutionLink=1&tabs=true&docPropValue=p122597

and indicated by some of you in previous postings to this thread, setting –tmpbsize to 1 significantly helps increase TT performance, otherwise impacted by the new default in OpenEdge 10: –tmpbsize 4.

Because of the nature of the space allocation of Type II Storage Areas, the resources needed to handle TTs have increased in OpenEdge 10. Therefore it is very important to adjust the –Bt setting as well. Our recommendation is to set –Bt to 10 times the number of TTs in your session. Again, some of you have rightfully referred to the –Bt setting in previous postings.

From issues reported through Technical Support, and with the information from this thread, we have been running test scenarios with the objective of identifying the settings, and other internal values, that would best accommodate typical TT usage. We are finding that your applications and reproducibles have high deviation. Some of the reproducibles have arguably been considered unrealistic from a business application perspective, in this very threaded discussion. One of our objectives is to try to avoid introducing new startup parameters that you need to set and tune. With that objective in mind we need to continue collecting more typical configuration information, and from more customers, including:

1. What is the typical number of rows in your TTs (this is important to adjust default settings, and other internal values, for optimal breakeven of initial resource allocation and performance)

2. What is the typical number of TTs in a session

3. What percentage of the TTs that you create in your session, are rarely used. Or, if the percentage of TTs that are rarely used is low, what is the mean amount of time between TT creation and first use.

4. How common is it for your Database and TTs to use large index keys? (that would need –tmpbsize 4)

In addition to –tmpbsize and –Bt settings we are analyzing a number of additional internal values (e.g. TT cluster-size) as well as algorithms for TT handling in upcoming releases.

In summary, for those of you that are experiencing TT performance issues compared to your corresponding environments in Progress V9, we recommend using –tmpbsize 1 and follow the recommendations from the product documentation and knowledgebases to set –Bt. As always Technical Support will assist you if you have questions or concerns. We trust that the enhancements introduced in OpenEdge 10 are helping improve the performance of regular database tables, and we are working to improve the handling of TTs in those cases where it has been (undesirably) negatively impacted.
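For reference, a client parameter file following those recommendations might look like this (the values are illustrative only; per the guideline above, -Bt should be sized to roughly 10 times the number of TTs in the session):

```
# client.pf -- hypothetical settings per the recommendations above
# 1KB temp-table block size (unless large index keys are needed)
-tmpbsize 1
# temp-table buffers: roughly 10x the number of TTs in the session
-Bt 20000
```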

We are looking forward to your replies with information to the four questions listed above, regarding your typical configurations.

Thank you for your continued support

Salvador

PS. FYI In the next couple of weeks (starting tomorrow) I will be on a business trip in Latin America with limited access to e-mail, therefore I likely won’t be able to reply further to this, or other, PSDN threads until I’m back in the office. ABL Developer Managers will follow up in my absence, as needed.

Message was edited by:

Salvador Vinals

Posted by Thomas Mercer-Hursh on 07-Oct-2008 17:38

Salvador, thanks for the informative response and the open ears. While I'm not one of the people currently having a problem, I'd like to throw out a couple of comments.

One is that it is clear that for many people the answer is no more complicated than using the parameters that are already there. I.e., if one can drop to a low -tmpbsize and increase -Bt sufficiently, then the temp-tables will live entirely in memory and the possible DBI issues will be non-existent or rare. The only problem here is in making people aware of the symptoms and the solutions.

Of course, that may not be enough for some people. With Tim's 2000+ temp-tables it is quite possible that it may not be practical to allocate enough -Bt space in the available memory, even at a tmpbsize of 1. The same may be true of people like Darren who are simply handling large amounts of data in a much smaller set of temp-tables. Getting an optimum setting for the parameters is obviously desirable, but it may well not be enough.

You ask about typical values, but I think that the reality is that typical values in typical applications don't have the issue. If one is only handling a limited number of temp-tables simultaneously and they have modest amounts of data, then a modest -Bt will get everything in memory.

The real issue has to do with unusual values. Tim's is unusual, but not unprecedented, for the number of temp-tables that are being simultaneously defined. Darren's seems to be unusual in the volume of data being processed. If there is help for either of them outside of optimizing parameters and/or restructuring the application, it is going to have to be something new.

In Darren's case, it is hard to think off hand what might help unless there is some inefficiency that we haven't yet discovered. In Tim's case, some form of lazy instantiation would produce dramatic benefits.

Posted by Tim Kuehn on 07-Oct-2008 18:25

Salvador -

Thanks for the explanation and description of what PSC is doing about this issue.

We are finding that your applications and reproducibles have high deviation.

This does not surprise me as application design and structure also have a high level of deviation depending on how each shop develops their code base.

Some of the reproducibles have been arguably been considered unrealistic from a business application perspective, in this very threaded discussion.

Which cases would these be?

We trust that the enhancements introduced in OpenEdge 10 are helping improve the performance of regular database tables, and we are working to improve the handling of TTs in those cases where it has been (undesirably) negatively impacted.

Type II db areas are a definite overall win in my book. And if the initial TT allocation problem can be resolved they'd be an overall win in the TT area as well.

...

With respect to your queries - it's really hard to get answers for questions 1-3 without some kind of instrumentation in the language, unless there's a LOG-MANAGER parameter which would suit?

WRT question 4: none of the applications I'm presently working on use large index keys since they're all based on legacy code.

...

As for the client, they've since gone to -tmpbsize 1 and -Bt ~16K, and have seen a good increase in overall application responsiveness when starting the various menus.

No before/after measurements were taken of the absolute difference in time though.

Posted by Tim Kuehn on 07-Oct-2008 18:40

The real issue has to do with unusual values. Tim's is unusual, but not unprecedented, for the number of temp-tables that are being simultaneously defined. Darren's seems to be unusual in the volume of data being processed. If there is help for either of them outside of optimizing parameters and/or restructuring the application, it is going to have to be something new.

Agreed.

In Darren's case, it is hard to think off hand what might help unless there is some inefficiency that we haven't yet discovered. In Tim's case, some form of lazy instantiation would produce dramatic benefits.

Indeed it would - as it would for any case where an application has lots of TTs in their code, which could easily happen in an organically-grown OO system. LI would also save memory since a TT wouldn't use much, if anything in terms of resources until it was actually used. This would also be a performance win whenever a "run x.p" is performed, and a TT def'n is made but not used during the call.

Since part of the issue is the time the system spends writing to the DBI file, why not rip a page out of the DBA manual and pre-allocate the DBI file just like .bi and .d* files are supposed to be pre-grown for performance reasons. If the DBI file was pre-grown at startup, then less time would be taken when it's actually in use. Put the parameter in a client.pf file, and off you go!

Posted by Thomas Mercer-Hursh on 08-Oct-2008 11:26

Indeed it would - as it would for any case where an application has lots of TTs in their code, which could easily happen in an organically-grown OO system.

I don't know that you get off the hook that easily. E.g., the fellow on the PEG who talked about having thousands of temp-tables would actually not benefit from lazy instantiation since any temp-table in his system that got defined was immediately used, as near as I can tell.

I can't quite get away from the feeling that the core issue in having thousands of temp-table definitions which are not only never referenced, but in fact can't ever be referenced since they are included in compile units which don't refer to them is that this can hardly be considered a best practice. While I would tend to design with encapsulated temp-tables, I recognize that is a fairly major restructuring, but I have suggested a couple different routes by which you could refactor so that only temp-tables actually used by a program were defined there. This seems so clearly good practice. And, if it turns out that 3/4 of the temp-table definitions causing your current problem are never referenced, this refactoring could well solve the problem.

I.e., I am still a little dubious about lazy instantiation being of benefit to most systems.

As for pre-allocation, that is certainly a potentially interesting proposal. Certainly there is precedent in the other DB files between fixed and variable extents, so preallocating a fixed extent on client startup while providing a variable extent for overflow would seem like a strategy that would be in keeping with that in other parts of the database. What we don't know is how much of the DBI performance issue comes from grabbing the additional 9 blocks and how much comes from the formatting and writing to those blocks. Still, it seems like a very sensible proposal and one that could be implemented with fairly little effort. As a client startup parameter, it could even be tuned to give certain clients a much larger allocation.

I wonder if there might also be some in use performance gain because the sectors were contiguous instead of scattered?

Weren't you going to be looking at a -tmpbsize and -Bt adjustment with the problem site this week? I'm very interested in the results.

Posted by ChUIMonster on 08-Oct-2008 14:42

Salvador,

I'm not sure that you should trust the answers that you get (if you get many). Because of the lack of available temp-table instrumentation there are no reliable ways to definitively answer these questions. People are going to have to go with their guts. Designers and architects will tell you what they think should be happening within their code but the reality in the field may be (IMHO probably is) radically different.

These problems are going to get worse as more and more ProDataSet and OO based code becomes more widely deployed. The design principles that PSC has been advocating are going to create an awful lot of temp-tables and there is no effective way to monitor and manage them.

IMHO it is also unlikely that a "one size fits all" approach to temp-table configuration is going to work. There are very good reasons why the real database supports multiple row sizes, multiple cluster sizes and both type 1 and type 2 areas. The 4gl is brilliant in making decent default choices for a great many behaviors while preserving the ability to peel back the covers at need. That brilliance is needed here.

Posted by Thomas Mercer-Hursh on 09-Oct-2008 13:00

Indeed, Tom, not only is it the exceptions rather than the rule which are of interest in currently deployed code, there is also an issue here, as you suggest with your references to PDS, of the design of future code. If I come up with a particular architectural design involving TTs and do some testing and find it performs poorly, then I am likely to revise my design. That might actually be a good thing, but it would be unfortunate if the real issue was that I either didn't understand how to tune for large numbers of TTs or that I didn't have the controls necessary to tune them.

E.g., the whole idea of having a one row TT in every entity object. It is appealing on the one hand because it has the XML methods for serialization and I might be able to do before-image processing with it. But, combine that with using entity objects in collection objects a la my OE Hive paper and one ends up with a vast number of temp-tables and probably poor performance. Now, as it happens, I tend to favor entity objects with properties instead and the use of entity set object in most cases, which drastically reduces the number of temp-tables in play. At the moment, I am thinking this is actually a better design, but it would be unfortunate to be forced there by excessive overhead for temp-tables.

One thing we can be very sure of is that there will be many, many different forms of use. Some of these will be questionable design practice, but many will be perfectly reasonable design responses to valid real world problems. One person will have a large number of very small temp-tables and another might have only a few, but with large amounts of data. These are almost certainly going to require different tuning.

Providing additional client parameters like those for the DB is certainly one idea, but I wonder if ultimately we shouldn't have something more dynamic than that. What would I do, for example, if there was one place in a session where I had the many little situation and another in the same session where I had the one big situation?

Posted by ChUIMonster on 10-Oct-2008 14:54

Indeed, Tom, not only is it the exceptions rather than the rule which are of interest in currently deployed code, there is also an issue here, as you suggest with your references to PDS, of the design of future code.

I suspect that there is an awful lot of stuff "in the pipeline" that hasn't been deployed on a large scale yet. When it is deployed we're going to see a lot of surprised faces...

If I come up with a particular architectural design involving TTs and do some testing and find it performs poorly, then I am likely to revise my design.

Call me a cynic but I think it far more likely that the sorts of problems that we're talking about won't be noticed until they go live. And then probably not unless they go live in a large-scale deployment.

A lot of (most?) partner applications start off with small deployments and grow into larger markets as they become successful. I see the evidence all the time in my consulting. Partners whose "standard" out of the box configuration makes sense for fairly small customers but which is seriously deficient for their big customers. It usually takes them a long time (if ever) to realize that they need to configure their newer and larger customers differently.

One thing we can be very sure of is that there will be many, many different forms of use. Some of these will be questionable design practice, but many will be perfectly reasonable design responses to valid real world problems. One person will have a large number of very small temp-tables and another might have only a few, but with large amounts of data. These are almost certainly going to require different tuning.

That's a very safe bet.

Providing additional client parameters like those for the DB is certainly one idea, but I wonder if ultimately we shouldn't have something more dynamic than that. What would I do, for example, if there was one place in a session where I had the many little situation and another in the same session where I had the one big situation?

Exactly. That's why we need sensible defaults coupled with powerful capabilities to specify exactly what we need on a TT by TT basis. Just as you would with a real database -- you can let everything default to a certain storage area or you can start explicitly managing individual tables. It might be as simple as a few additional properties exposed via something like DEFINE TEMP-TABLE ... ROWS-PER-BLOCK 128 etc.

Posted by Tim Kuehn on 10-Oct-2008 15:12

If I come up with a particular architectural design involving TTs and do some testing and find it performs poorly, then I am likely to revise my design.

Call me a cynic but I think it far more likely that the sorts of problems that we're talking about won't be noticed until they go live. And then probably not unless they go live in a large-scale deployment.

Which is pretty well what happened in this case. The initial code base worked, and worked well. But over time, as the code base expanded and these techniques were used in more and more places, performance degraded. It was a slow, incremental process that accumulated over time and was barely noticeable except when it was seen in the overall time delay.

And by the time we figured out what was causing the problem, "revising the design" would've been painful, if not exceedingly difficult or impossible, to carry out.

Posted by Thomas Mercer-Hursh on 10-Oct-2008 15:39

I suspect that there is an awful lot of stuff "in the pipeline" that hasn't been deployed on a large scale yet. When it is deployed we're going to see a lot of surprised faces...

For people with large existing applications who are doing stepwise transformations, I would think the issues would become progressively apparent rather than a big bang sort of thing. The ones most susceptible to a big bang are those developing from scratch or doing a major re-write with inadequate testing. It always surprises me a bit how people don't do any stress testing until they are done, when they could have stress tested many things from very early on.

Call me a cynic

OK.

But, yes, I do think that is a typical pattern... but only one of a number of patterns. The stepwise replacement pattern is likely to be increasingly common as people look to transform applications which have spent most of the last 10-20 years in accretion mode rather than having any architectural review.

Posted by ChUIMonster on 11-Oct-2008 12:37

I think that you're naively, albeit understandably, optimistic regarding the likelihood of stepwise refinement. It doesn't really matter though. Either way the problems won't get noticed (or taken seriously) until they are in the field and being used somewhere that is much, much larger than "normal". And by then it will be very, very difficult to fix. And the first step to fixing these things is an accurate diagnosis -- which will be handicapped by the lack of instrumentation.

Posted by Admin on 13-Oct-2008 08:17

Hi Salvador:

First off, I don't envy anyone this task. Every place I've been, the usage of temp tables has been wildly varied. Let me try to answer some of the questions:

1. What is the typical number of rows in your TTs (this is important to adjust default settings, and other internal values, for optimal breakeven of initial resource allocation and performance)

- The majority of temp tables contain less than 2 dozen rows. Maybe 10% of the tables contain thousands. The small tables are for caching, the larger ones to consolidate and rearrange data for other purposes. This is changing dramatically over the next year at D&H as we move to a completely separated data access layer, so we will be using tons of ProDataSets. That will skew toward the smaller end, but with more tables.

2. What is the typical number of TTs in a session

-created and destroyed over the life of a session? Maybe 100-150. Concurrently active? Maybe 2-5 in most sessions (created, deleted, recreated, repeatedly), with a handful over a dozen.

3. What percentage of the TTs that you create in your session, are rarely used. Or, if the percentage of TTs that are rarely used is low, what is the mean amount of time between TT creation and first use.

- assumptions: creating the TT means defining it, first use means first CREATE statement? If so, time is pretty short. First CREATE to first FIND is also relatively close - probably closer. Most are loaded by FIND FIRST/IF NOT AVAIL THEN CREATE.

4. How common is it for your Database and TTs to use large index keys? (that would need –tmpbsize 4)

- very few, very uncommon. Maybe a handful throughout the application.

I think the Type II storage areas are likely to be an overburden for the small caching temp tables. But you know what might be a real interesting thought... rather than a bunch of session parameters that can't be changed in the middle of a session to accommodate how the temp tables are being used "at the moment", what about having storage area settings on the temp table definition? Allow us to set records per block, blocksize, cluster size, etc. all in the DEFINE TEMP-TABLE statement. That way, when we have a small (say US States) cached table, or a huge order summary temp table with tens of thousands of small records, or a medium size table with a huge index or text area, we can set the storage area attributes appropriately. What do you think?
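The FIND FIRST / IF NOT AVAIL THEN CREATE caching idiom mentioned in the answer to question 3 might be sketched like this (the table and values are hypothetical, not from the thread):

```
DEFINE TEMP-TABLE ttState NO-UNDO
    FIELD StateCode AS CHARACTER
    FIELD StateName AS CHARACTER
    INDEX idxState IS PRIMARY UNIQUE StateCode.

/* Look the row up in the cache first; create it only on a miss. */
FIND FIRST ttState WHERE ttState.StateCode = "MA" NO-ERROR.
IF NOT AVAILABLE ttState THEN DO:
    CREATE ttState.
    ASSIGN ttState.StateCode = "MA"
           ttState.StateName = "Massachusetts".
END.
```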

Glen West

Posted by Thomas Mercer-Hursh on 13-Oct-2008 11:20

While setting attributes on a per temp-table basis is ideal from a tuning perspective, it also requires the most amount of work for retrofitting an existing application. I would think the ideal would be a combination approach.

1) Set a session default

2) Provide a mechanism for changing that default at a point in the session where one knows that one is about to handle unusual activity and then revert when done.

3) Per temp-table tuning.

Of course, this implies mixed mode use of the DBI, which I would guess means multi-volume DBI with the volumes dynamically created ... might be a little complex.

Posted by svi on 13-Oct-2008 13:07

Taking advantage of a good connection...

Thank you for all the follow ups to my posting. My takes so far:

- Need instrumentation to monitor and have visibility into TT usage

- New TT startup parameters for session defaults may be unavoidable

- Extended settings for TT definition may be needed also

Glen, the information you provided is great! If we were able to find out some patterns of usage from others' applications, we could set default values for such typical configurations so that no changes would be needed out-of-the-box. As I said in my posting, we are wary of startup-parameter proliferation eroding ease of use, and if at all possible we'd like to make sure their number and use remain manageable.

Please follow Glen's posting with your actual settings (if known) or your best estimates. We'll take Tom's word of advice into consideration as well, and not jump to conclusions too quickly! The more input the better.

Bye now, from warm and sunny Brazil (hehe)

Salvador

Posted by Tim Kuehn on 02-Dec-2008 09:46

Bye now, from warm and sunny Brazil (hehe) Salvador

Now that (I presume) you're back from Brazil, how are things wrt the issues identified in this thread?

Posted by Thomas Mercer-Hursh on 02-Dec-2008 11:28

how are things wrt the issues identified in this thread?

Tim, I was thinking of asking you the same! Did you get a chance to do some experimentation with parameter settings at your customer? With what results?

Posted by Tim Kuehn on 02-Dec-2008 13:45

The customer went straight to -tmpbsize 1, with a corresponding increase in -Bt so the client would use the same amount of memory.

Anecdotal evidence has shown a significant decrease in DBI size and in the time to start any given program, which has also resulted in (so far) the users stowing their pitchforks and torches.
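For reference, the trade-off works out roughly as follows; the old settings below are illustrative assumptions, not the customer's actual values:

```shell
# Client temp-table buffer memory is roughly -Bt (number of buffers)
# times -tmpbsize (KB per temp-table block). To shrink the block size
# without shrinking the cache, scale -Bt up by the same factor.
OLD_BT=255     # assumed previous -Bt
OLD_BSIZE=4    # assumed previous -tmpbsize (KB)
NEW_BSIZE=1    # new, smaller -tmpbsize
NEW_BT=$(( OLD_BT * OLD_BSIZE / NEW_BSIZE ))
echo "equivalent settings: -tmpbsize $NEW_BSIZE -Bt $NEW_BT"   # -Bt 1020
```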

Posted by Thomas Mercer-Hursh on 02-Dec-2008 14:00

Well, sounds like good news ... and you didn't even have to wait for an update!

Posted by Tim Kuehn on 02-Dec-2008 14:04

No, we didn't.

This experience has illustrated that lazy TT instantiation would be a serious performance win all around, on all platforms and in all environments, regardless of how many TTs the application uses.

I hope PSC implements that sometime in the future.

Posted by Thomas Mercer-Hursh on 02-Dec-2008 14:11

Well, it is still true that the only users who will materially benefit from lazy instantiation are those who define large numbers of temp-tables and don't use them. It doesn't really do anything for people who only define them when they are ready to use them, and we don't really have any idea how many people are out there with your specific problem. This thread has made it clear that there is certainly a need for better information about how to manage temp-tables, and that there are a number of opportunities for providing us with better information and better control ... enhancements that would significantly benefit anyone using temp-tables. It still isn't clear to me that lazy instantiation will benefit very many people, and it seems to me that the circumstances you describe should be addressed by refactoring the code to eliminate the large number of unused temp-table definitions ... even if you were to get lazy instantiation, that would be recommended.

Posted by ChUIMonster on 02-Dec-2008 14:25

It is a fallacy to argue that a feature is not important because it only applies to a small subset of users.

Many features are important and necessary simply to ensure that customers who start small can be confident that their system will smoothly scale beyond their wildest dreams. Even if those dreams are never realized the feature is important to them.

Likewise some features are important in order to preempt potential support issues.

Even if a feature is never used by anyone it can be important.

While I can sympathize to some extent with the idea that the specific trigger of this thread could be remedied by refactoring the code, I'm also quite sure that there are many other cases where lazy instantiation would be very helpful, and that there will be more and more of these temp-table related issues as people modernize their code and start implementing the technologies and ideas that PSC has been articulating for the past several years.

Posted by Admin on 02-Dec-2008 14:27

I fully agree with Thomas, but I have the feeling that we are turning in circles... And this is already such a long thread.

Posted by Tim Kuehn on 02-Dec-2008 14:33

Well, it is still true that the only users that will materially benefit from lazy instantiation are those who define large numbers of temp-tables and don't use them.

It'll be the most obvious for people like my clients, but every single AVM program in existence which does a "RUN" has all its TTs instantiated as part of its startup process right now, regardless of whether or not the program logic will require them.

If that initialization process can be avoided, then that's a win for everyone, and every little bit helps.

A percent here, a percent there, and soon you're talking "real" performance increases.

Posted by Thomas Mercer-Hursh on 02-Dec-2008 14:38

I'm not voting against lazy instantiation ... seems like a good idea. I'm just trying to put things in perspective, especially relative to a limited pool of development dollars. Better instrumentation on temp-tables and more options for how they get instantiated like exist for the database seem to me to be fairly straightforward projects with impact across a wide range of the user base. Lazy instantiation might be harder and impact only a limited number of sites. That's all I am trying to suggest. If they look at the problem and find that lazy instantiation is easy or can be done in association with doing something else good, then great. But, I still think that code should get refactored.

Posted by Thomas Mercer-Hursh on 02-Dec-2008 14:46

every single AVM program in existence which does a "RUN" has all it's TTs instatiated as part of it's startup process right now, regardless of whether or not the program logic will require it.

Ah, but the key issue here is the "regardless of whether" phrase. How much code out there is there where temp-tables are defined, but not used. If they are used, then lazy instantiation simply moves the instantiation delay to a different spot ... which may or may not be a good thing. The only clear win here is a TT that is defined and not used.

With well-encapsulated code, one would expect that a .p will get run or not based on whether it is needed. If it contains a TT, chances are that needing the .p means needing the TT. At least, that is how it seems to me.

From the earlier discussion, it sounded to me like the real killer in your application was the use of include files which define multiple temp-tables being included in programs which used only one, or fewer than all, of them. I wouldn't think that was a typical programming practice. Simply refactoring this so that TTs were defined in their own includes, and including only those actually used by the program ... no structural changes whatsoever ... sounded to me like it would dramatically drop the total number of TTs. This seems to me a good thing across the board, and one where there are tools which could make it fairly easy to do.
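The mechanical refactoring described here might look like the following sketch; the file, table, and field names are hypothetical:

```abl
/* BEFORE: every program pulls in a shared multi-table include,
 *   {allTTs.i}   -- defines ttCustomer, ttOrder, ttInvoice, ...
 * whether the program uses them or not.
 *
 * AFTER: one include file per temp-table. ttCustomer.i contains only: */
DEFINE TEMP-TABLE ttCustomer NO-UNDO
    FIELD CustNum  AS INTEGER
    FIELD CustName AS CHARACTER
    INDEX idxCust IS PRIMARY UNIQUE CustNum.

/* ...and a program that needs only customers includes only that file:
 *   {ttCustomer.i}
 * so no other TT definitions get instantiated on its behalf. */
```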

Posted by Tim Kuehn on 02-Dec-2008 14:58

Ah, but the key issue here is the "regardless of whether" phrase. How much code out there is there where temp-tables are defined, but not used.

That depends on the logic. A program may have a TT definition; it may use it in certain cases, and in others it may not. In the "others it may not" cases, LI is a win.

From the earlier discussion, it sounded to me like the real killer in your application was the use of include files which define multiple temp-tables being included in programs which only used one or less than all of them. I wouldn't think that was a typical programming practice.

With the advent of ProDataSets, it is now. A PDS encourages grouping of TTs into a "business module", so one can expect more cases where TTs are defined as part of a PDS definition, regardless of whether each is actually needed in a given program.

Posted by Thomas Mercer-Hursh on 02-Dec-2008 15:40

In the "others it may not" cases, LI is a win.

Well, yes, but if we are talking about one or two temp-tables in a current session for which this is true, the "win" may be trivial and undetectable.

Perhaps I just have a skewed experience base, but to me a temp-table is most often something that is a key intermediary for the function, like a TT used to assemble data for a report, or it is something that gets packaged in a procedure, typically a PP or SP, where the whole point of instantiating the procedure is to use it and the TT is intrinsic to that use. I'm not suggesting there aren't other cases, e.g., an SP that gets instantiated because it might be needed, but then doesn't happen to be. I'm just suggesting that it isn't rampantly common.

With the advent of ProDataSets, it is now. A PDS encourages grouping of TTs into a "business module", so one can expect more cases where TTs are defined as part of a PDS definition, regardless of whether each is actually needed in a given program.

Again, if the PDS has any cohesion, I find it unlikely that part of it will get populated without the other parts. Even if the other parts are empty, e.g., a record for adding memos to order lines or some such, the TT still needs to exist so that code can notice that it is empty and there is nothing to do. I just don't see that it is going to be common to have a PDS with pieces that are frequently unused.

Not to mention, of course, that I advocate encapsulating the PDS in an object so its definition is only in one place. The whole idea of having TTs defined all over the place and flinging them left and right seems poor programming practice to start with. Yes, I know it is what PSC publishes, but we haven't exactly looked to PSC as the arbiter of good style, have we?

Posted by Darren Parr on 05-Dec-2008 04:04

Hi.

Having looked at all this in detail and also by doing some benchmarking of our system here, I can conclude that although LI might help us, the real issues we suffer from are crashes as a result of error 40.

Imagine an ERP system which brings loads of TT data from the server to the client and commits what it needs. We can easily get into issues with the number of physical TTs around on the Windows client and the Linux server.

The BY-REFERENCE stuff and using a 6 GB RAM disk have alleviated most of the performance issues we were having - that, coupled with a targeted -Bt setting for batch processes.

Anyway, I would like to know if it's at all possible to have the 32000 limit on TT indices doubled, say, or done away with altogether. That would be a major improvement for us.

We have maybe 10 people per day who crash (client side) with the error 40 issue, out of a 150-user system. We hardly see any crashes server side now.

_Darren

Posted by Tim Kuehn on 05-Dec-2008 06:58

the real issues we suffer from are crashes as a result of error 40.

Darren - can you start a new thread with this topic? I think this is interesting, but it's not really pertinent to LI and how / when TTs are instantiated.

Posted by GregHiggins on 27-Dec-2008 08:14

From the earlier discussion, it sounded to me like the real killer in your application was the use of include files which define multiple temp-tables being included in programs which only used one or less than all of them. I wouldn't think that was a typical programming practice.

I disagree. I've spent about 18 of my 21-year Progress career looking at other people's code, and I think it is a fairly common practice. Sticking a temp-table definition in an include which is already included in several pieces of code may not be a best practice but, as I have observed in the past, management does not pay for best practice.

I've seen sites where it is required practice. I did some work at one just recently, running 10.1C, where the practice is canonized.

Posted by Tim Kuehn on 18-Oct-2010 13:07

I'd like to note that PSC Development has been working on implementing this solution, and it might make it to 11.0.

Posted by Admin on 18-Oct-2010 13:18

Do you have details about the exact fix they are going to implement?

Posted by Evan Bleicher on 18-Oct-2010 15:43

We are exploring whether or not we can delay the instantiation of temp-tables until the temp-table is first referenced within a procedure or class.  We are still in the investigation stage of this initiative and, as Tim noted, we have only indicated at this time that this change may be included in release 11.0.0.  Whether or not this feature makes it into release 11.0.0 will be determined by the performance gains we achieve by implementing this change and the overall quality of the modification.

Posted by Tim Kuehn on 30-May-2011 12:13

any update on the status of this update?

Posted by Evan Bleicher on 31-May-2011 10:43

I can report a positive update on this front.  Based on our internal performance testing, this feature is currently planned for version 11.0.  To review: this feature delays the instantiation of temp-tables, datasets and their associated indexes until the temp-table or dataset is utilized.  Delaying the instantiation of these constructs improves the performance of both class and procedure instantiation, and the boost increases as the number of defined objects (temp-tables and datasets) increases.  In our performance tests we instantiated classes which defined - but did not access - from 1 to 10 temp-tables.  For a class with 10 temp-tables, our testing showed an 81% improvement on Linux32, a 75% improvement on Solaris-64 and a 78% improvement on Windows32.  Your performance gains may vary.

Posted by Thomas Mercer-Hursh on 31-May-2011 11:28

Your performance gains can vary.

Would I be correct in assuming that if the TTs are actually used, there is no performance gain, just a moving of the time from one instant to a later instant?

Posted by Tim Kuehn on 31-May-2011 12:14

This is indeed good news! Thanks for the update!

Posted by Evan Bleicher on 31-May-2011 12:58

Hi Thomas:

You are correct.  If the temp-tables are actually used, there is no performance gain; we are just moving the time from one instant to a later instant.  However, given that a class or procedure needs to define all temp-tables and datasets which are referenced within it, there are likely instances in which a particular code path will not need to reference all of those temp-tables or datasets.  In that case there will be a net performance gain.

Posted by Admin on 31-May-2011 13:22

In this case there will be a net performance gain.

I can certainly see that - also in cases which are not as extreme as Tim's case that initially kick-started this thread.

Posted by Thomas Mercer-Hursh on 31-May-2011 13:53

Yes, although I think it likely that one would have to have a lot of instances with a lot of temp-tables for it to make a noticeable difference.

Posted by Tim Kuehn on 02-Jun-2011 13:33

Any application that overflows the -Bt buffer to disk will see a noticeable benefit from this improvement. In some cases it'll be because the TT definition doesn't cause disk writes directly; in others, because unused TT definitions don't take up space in the -Bt buffer, so other TTs which are in use can be stored there without causing disk writes.

Posted by Thomas Mercer-Hursh on 02-Jun-2011 13:40

My point is that in most applications, there will not be a significant number of TT definitions for TTs that are not used.  If the TT is used, then the only performance difference is that the instantiation is deferred from program startup to first use, which will often be soon.  The only substantial impact will be on applications which have TTs that are defined and not used, and have enough of those in currently used programs to amount to a significant aggregate.  You ran into this because of having multiple TTs in a standard include, so lots and lots of TTs, a large number of which were not used.  I don't think one will see this pattern very often ... though, of course, it will really matter in places where one does see the pattern.

Posted by Tim Kuehn on 02-Jun-2011 14:25

tamhas wrote:

My point is that in most applications, there will not be a significant number of TT definitions for TTs that are not used.

...

If the TT is used, then the only performance difference is that the instantiation is deferred from program startup to first use, which will often be soon.

I'm curious what your basis for these statements is...

Posted by Thomas Mercer-Hursh on 02-Jun-2011 14:41

1. I can't say that I have looked at thousands of applications to check, but I would think that generally people defined TTs in order to use them and the definition was confined to programs which did use them.  Thus, they would only remain unused if the flow of control was such that sometimes they would not be used.  I just don't think that is typical and certainly not such that hundreds or thousands of such unused TTs would be defined and not used at one time.

2. Isn't that apparent?  If the TT is going to be used, it has to be instantiated.  This change only moves the timing of that instantiation, not the amount of work required.

Posted by jmls on 02-Jun-2011 15:07

I very rarely use temp-tables, and with the super-duper-code-generator (see it at PUG Americas), only ever when they are needed. The most TT defs in one class is 4. Or is it 5? Have to check.

For my use-case, I only ever need define what I am going to use.

Don't get me wrong - I know, and understand completely, that you (Tim) need to be able to define a large amount of tables and I think that what Progress have done makes an awful lot of sense.

Julian

Posted by Tim Kuehn on 10-Jun-2011 14:52

tamhas wrote:

1. I can't say that I have looked at thousands of applications to check, but I would think that generally people defined TTs in order to use them and the definition was confined to programs which did use them.  Thus, they would only remain unused if the flow of control was such that sometimes they would not be used.  I just don't think that is typical and certainly not such that hundreds or thousands of such unused TTs would be defined and not used at one time.

2. Isn't that apparent?  If the TT is going to be used, it has to be instantiated.  This change only moves the timing of that instantiation, not the amount of work required.

1. That's your first fallacy - that TTs are only defined _in_ the program that uses them. When the same TT definitions have to be used by different programs, the logical way to go is to define the TTs in a file and include that. The reason my particular case came up was that multiple TTs were defined in such a file, which was then included in the target programs, each of which may or may not actually use all the TT definitions. (And no, this wasn't me doing that.) The organization in question didn't know about the TMTT problem, so this wasn't considered a problem until they were too far down that path to easily get out of it again.

2. And if the particular code path which would require a given TT is never called, the work is never done.

Posted by Tim Kuehn on 10-Jun-2011 14:54

jmls wrote:

Don't get me wrong - I know, and understand completely, that you (Tim) need to be able to define a large amount of tables and I think that what Progress have done makes an awful lot of sense.

Julian

Let me be clear about this - this wasn't a coding structure I made up, but one made by reasonably competent developers who weren't aware of the TMTT problem or of how TTs are instantiated on program load, and so they took what seemed to be a perfectly sensible approach to centralizing their TT definitions.

Posted by Admin on 10-Jun-2011 15:09

but one made by reasonably competent developers who weren't aware of the TMTT problem, or about how TT's are instantiated on program load, and so they took what would seem to be a perfectly sensible approach to centralizing their TT definitions.

To be fair, there isn't a whole lot of documentation about this - mainly personal experiences, if any.

Let's hope that the TT-VST's as expected for OE11 help to make this more visible.

Posted by jmls on 10-Jun-2011 15:14

Sure, and I have done the same sort of thing myself. I was just saying

that this (the fix) is a good idea

Posted by Thomas Mercer-Hursh on 10-Jun-2011 17:19

You have a case where TTs are defined much more than they are used.  I recognize that and I recognize that it isn't your fault and is not easily fixed.  My point is not about your site, but about what is generally the case.

I understand that a TT may be defined in a program unit and the path of execution may not pass through the code that uses it.  Happens, although I would suggest that it reflects questionable program partitioning.

While I don't like it, I recognize that it is common to put TT definitions into an include file so that an identical definition can be used in each program that uses it.  However, note the word *use*.  If a program does not anywhere reference a TT, why does that program have a definition of the TT?  If *no* flow of control is going to touch a TT in a given compile unit, then the TT definition should not be there, any more than one should be defining variables which are never referenced, functions which are never called, etc.  That just makes things hard to figure out.

The practice at your site which appears to have contributed the most to the problem was putting multiple TT definitions into a single file and then referencing that include file regardless of whether all, or perhaps even any, of the TTs were actually used in that compile unit.  Had the TT definitions been put in one file per TT, and that include file referenced only in compile units that actually use the TT, it sounds like there would have been far, far fewer TTs being defined.  Quite aside from the TMTT problem, that is bad programming practice - no different, really, than include files with 100 shared variables referenced everywhere regardless of whether any of those variables are actually used in the program.

Fixing the problem by encapsulating the TT usage would probably be a fairly large task.  Helping it along with something like a Proparse script - producing a list of compile units and the TTs they do and don't reference, as a guide to systematically substituting single-TT includes for the multiple-TT includes - would be not nearly so daunting a task, especially since nothing requires one to do all the work at once.  Some cleverness, and one could probably automate the whole thing.

The question is not this one site, but how many sites are likely to be helped by this change.  Given that I would guess:

1. Most sites using include files to define TTs have one per TT.

2. Most sites using include files to define TTs put them only into compile units that actually reference that TT.

3. A large percentage of functions which use TTs have no flow of control that does not touch the TT.

I think only a small number of sites will be impacted by this change.  Those sites that are impacted may be impacted dramatically, so great for them, but that doesn't mean the average site will notice any impact.


Posted by Thomas Mercer-Hursh on 10-Jun-2011 17:29

Let's hope that the TT-VST's as expected for OE11 help to make this more visible.

One doesn't need anything that sophisticated.  Some fairly simple scripting should be able to run through a code base and report instances in which a TT is defined but not referenced.  Use that to substitute one or more include files which define the TTs that are referenced and leave out the ones that are not, and the only thing left will be cases where the flow of control in a particular execution doesn't go through the piece of code which references the TT.  This is no different, really, than refactoring to eliminate dead code, unreferenced variables, etc.
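A very rough sketch of such a scan, assuming TT definitions are inline (not hidden in includes) and that TT names aren't substrings of one another; a real tool such as Proparse would handle includes and word boundaries properly:

```shell
# List temp-tables that appear only in their definition line, i.e. are
# defined but never otherwise referenced in the same compile unit.
scan_unused_tts() {
    f="$1"
    grep -io 'temp-table  *[a-z0-9_-]*' "$f" | awk '{print $NF}' | sort -u |
    while read -r tt; do
        refs=$(grep -ic "$tt" "$f")              # lines mentioning the name
        defs=$(grep -ic "temp-table  *$tt" "$f") # just the definition line(s)
        if [ "$refs" -le "$defs" ]; then
            echo "$f: $tt defined but unreferenced"
        fi
    done
}
```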

Posted by Admin on 10-Jun-2011 17:34

One doesn't need anything that sophisticated.  Some fairly simple scripting should be able to run through a code base and report instance in which a TT is defined but not referenced

I doubt your code parser will come anywhere close to estimating how much I/O temp-tables will cause at runtime. There are more use cases for those VSTs than just the TMTT issue.

Posted by Thomas Mercer-Hursh on 10-Jun-2011 18:19

I think the TT VSTs are a great addition and will be helpful at diagnosing many TT performance issues ... I just think that they are unneeded and possibly even unhelpful for the problem addressed by this thread.  One imagines that a defined but non-instantiated TT will not show up in the VST at all because it doesn't exist yet.

Posted by Admin on 11-Jun-2011 00:54

I think the TT VSTs are a great addition and will be helpful at diagnosing many TT performance issues ... I just think that they are unneeded and possibly even unhelpful for the problem addressed by this thread. One imagines that a defined but non-instantiated TT will not show up in the VST at all because it doesn't exist yet.

With delayed instantiation it will probably not show up in the VST (but we won't know before we have details about the implementation). But unused temp-tables will also no longer slow down the runtime as they do now.

Well done, Tim and dev team!

Posted by Thomas Mercer-Hursh on 11-Jun-2011 11:12

unused temp tables

Where Tim and I seem to differ is how common are unused temp tables in most sites.

Posted by Ruanne Cluer on 19-Feb-2014 02:10

Not that this will help the situation described in your current version, but quite some work has been done in this area since OE 11, specifically wrt delayed temp-table instantiation (which is the case you're making ;)

TT creation is delayed by default to the point in the code where records in the temp-table are queried or created, thereby alleviating the performance impact of instantiating a procedure/class that uses loads of TTs and completely avoiding having to spend resources on TTs that aren't needed. Cool, hey?

The -nottdelay startup parameter has to be specified to revert to the behavior you're describing, i.e. to disable delayed TT instantiation in OE 11.

This thread is closed