I am trying to adjust my habits and begin writing client-server code that will perform well. I have always made "self-service" connections to the database in the past, and it seems I picked up bad habits that make for poor performance when ABL is using a client-server connection.
One of the things I gathered from reading the following link...
https://knowledgebase.progress.com/articles/Article/18342
... is that there are optimizations that take place for special variations of FOR EACH NO-LOCK. These optimizations are realized the most when -Mm is configured to send a higher number of records at once. I started with the simplest variations of "FOR EACH" that I could come up with, but even in these simple cases I can't make sense of my observations. Consider the two loops below, which are identical except for the WHERE condition.
/* ****************************************************************** */
/* Loop and fetch data                                                */
/* ****************************************************************** */
FOR EACH prd_desc WHERE prd_desc.branch = p_BranchCode NO-LOCK:
    DEFINE VARIABLE v_Test AS CHARACTER NO-UNDO.
    v_Test = prd_desc.code-value.
END.

/* ****************************************************************** */
/* Loop and fetch data                                                */
/* ****************************************************************** */
FOR EACH prd_desc WHERE prd_desc.job-type = p_JobType NO-LOCK:
    DEFINE VARIABLE v_Test AS CHARACTER NO-UNDO.
    v_Test = prd_desc.code-value.
END.
As it turns out, the second variation packs a *lot* more records into each round-trip than the first. I examined the table definition and found that there is an index on job-type but not on branch. The table is relatively small (maybe 50,000 small rows in total), and the number of results is about the same for both of the FOR EACH examples.
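Incidentally, the compiler's XREF output is one way to double-check which index each FOR EACH picks (a sketch; the file and index names below are just placeholders, not from my real schema):

COMPILE myprog.p XREF "myprog.xrf".

/* myprog.xrf then contains one SEARCH line per record phrase, along the lines of:
     myprog.p myprog.p 12 SEARCH prd_desc job-type
     myprog.p myprog.p 20 SEARCH prd_desc branch-idx WHOLE-INDEX
   where WHOLE-INDEX marks a loop that could not bracket on a useful index. */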
My preference would be for the client-server "FOR EACH NO-LOCK optimization" to behave the same for both of these loops: both should send over the maximum number of records per round-trip and perform equally well. Unfortunately this is not the case, and the number of network round-trips seems to be somehow dependent on the available indexes (only a theory). This index-dependent behavior was not stated explicitly in the KB article I mentioned above, so I wanted to check whether this theory makes sense to anyone else.
In short, the second loop (an index is available for the where condition) seems to pack a large number of records into its round-trips (it's only limited by the -Mm configuration). The first loop, however, seems to make a lot more round-trips with fewer records in each. The number of records per round-trip is limited in some other, mysterious way that does not seem to be based simply on -Mm.
Any feedback would be appreciated. If this is not clear then I can try to recreate in the sports database.
I haven't tried those prefetch parameters yet. But I did see some references to them and I have it on my list to investigate. Given that I am new to client-server, I didn't want to jump in so deeply yet; I believed the defaults would suffice. But my concern is (still) that something is going on internally that treats "FOR EACH NO-LOCK" differently based on whether it is using the "perfect" index or not.
I noticed the default prefetchNumRecs is 16. My theory is that this number is treated as the number of candidate records that are evaluated via the selected index. In the case of a "perfect" index, this may return 16 consecutive matches that all get shoved over the network in a single round-trip. But if a bad index is used then perhaps 16 records are scanned in the base table, and only some small portion of them get shoved over the network in the client-server round-trip. This is all conjecture based on the differences I see in the network traffic between the two cases (whether or not there is a "perfect" index).
Here are the parameters I plan to test with (a sample startup line follows the list):
-prefetchFactor 100
-prefetchNumRecs 100
-prefetchDelay (allows the first round-trip to contain more than a single record)
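These are database server (broker) startup parameters, so as a rough sketch (the database name, port, and -Mm value here are just placeholders) the startup line would become something like:

proserve mydb -S 5205 -Mm 32000 -prefetchDelay -prefetchFactor 100 -prefetchNumRecs 100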
FYI, I found another discussion about prefetch* parameters here:
community.progress.com/.../20822
I will test these parameters asap and let you know if they eliminate the differences I see in the FOR EACH NO-LOCK (differences between the case where it uses a "perfect" index and the case where it does not).
> and find that there is an index on job-type but not on branch.
Then it would make sense to compare:
WHERE prd_desc.job-type = p_JobType
vs
WHERE STRING(prd_desc.job-type) = p_JobType
The STRING() function will be resolved on the server. Both queries (with and without STRING) will send the same number of records over the network, but they will retrieve a different number of records from the database because they use different indexes.
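In other words, something like this (a sketch reusing the table and parameter from your post):

FOR EACH prd_desc NO-LOCK WHERE prd_desc.job-type = p_JobType:
    /* brackets on the job-type index */
END.

FOR EACH prd_desc NO-LOCK WHERE STRING(prd_desc.job-type) = p_JobType:
    /* no index bracket on job-type: the server reads more records but still
       filters them server-side, so the same records cross the network */
END.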
> In short, the second loop (an index is available for the where condition) seems to pack a large number of records into its round-trips (it's only limited by the -Mm configuration). The first loop, however, seems to make a lot more round-trips with fewer records in each. The number of records per round-trip is limited in some other, mysterious way that does not seem to be based simply on -Mm.
Can you post the numbers? You can use the following code:
&SCOPED-DEFINE TestQuery EACH Customer NO-LOCK WHERE Country EQ "Finland" USE-INDEX NAME
&SCOPED-DEFINE MessagesForPromon FALSE

DEFINE VARIABLE vMsgRec    LIKE _ActServer._Server-MsgRec    NO-UNDO EXTENT 3.
DEFINE VARIABLE vMsgSent   LIKE _ActServer._Server-MsgSent   NO-UNDO EXTENT 3.
DEFINE VARIABLE vByteRec   LIKE _ActServer._Server-ByteRec   NO-UNDO EXTENT 3.
DEFINE VARIABLE vByteSent  LIKE _ActServer._Server-ByteSent  NO-UNDO EXTENT 3.
DEFINE VARIABLE vRecRec    LIKE _ActServer._Server-RecRec    NO-UNDO EXTENT 3.
DEFINE VARIABLE vRecSent   LIKE _ActServer._Server-RecSent   NO-UNDO EXTENT 3.
DEFINE VARIABLE vQryRec    LIKE _ActServer._Server-QryRec    NO-UNDO EXTENT 3.
DEFINE VARIABLE vTimeSlice LIKE _ActServer._Server-TimeSlice NO-UNDO EXTENT 3.
DEFINE VARIABLE vDbAccess  LIKE _UserIO._UserIO-DbAccess     NO-UNDO EXTENT 3.
DEFINE VARIABLE vDbReads   LIKE _UserIO._UserIO-DbRead       NO-UNDO EXTENT 3.
DEFINE VARIABLE vTableRead LIKE _UserTableStat._UserTableStat-read NO-UNDO EXTENT 3.
DEFINE VARIABLE vIndexRead LIKE _UserIndexStat._UserIndexStat-read NO-UNDO EXTENT 3.
DEFINE VARIABLE vExecTime  AS INTEGER NO-UNDO.

PROCEDURE GetServerStat:
  DEFINE INPUT PARAMETER ipStatId AS INTEGER NO-UNDO.

  DEFINE VARIABLE vRangeSize   AS INTEGER NO-UNDO.
  DEFINE VARIABLE vFirstStatID AS INTEGER NO-UNDO.
  DEFINE VARIABLE vLastStatID  AS INTEGER NO-UNDO.

  FOR FIRST DICTDB._MyConnection NO-LOCK,
      FIRST DICTDB._Connect NO-LOCK
      WHERE DICTDB._Connect._Connect-Id EQ DICTDB._MyConnection._MyConn-UserId + 1,
      FIRST DICTDB._ActServer NO-LOCK
      WHERE DICTDB._ActServer._Server-Id EQ DICTDB._Connect._Connect-Server + 1,
      FIRST DICTDB._UserIO NO-LOCK
      WHERE DICTDB._UserIO._UserIO-Id EQ DICTDB._MyConnection._MyConn-UserId + 1:

    ASSIGN vMsgRec   [ipStatId] = DICTDB._ActServer._Server-MsgRec
           vMsgSent  [ipStatId] = DICTDB._ActServer._Server-MsgSent
           vByteRec  [ipStatId] = DICTDB._ActServer._Server-ByteRec
           vByteSent [ipStatId] = DICTDB._ActServer._Server-ByteSent
           vRecRec   [ipStatId] = DICTDB._ActServer._Server-RecRec
           vRecSent  [ipStatId] = DICTDB._ActServer._Server-RecSent
           vQryRec   [ipStatId] = DICTDB._ActServer._Server-QryRec
           vTimeSlice[ipStatId] = DICTDB._ActServer._Server-TimeSlice
           vDbAccess [ipStatId] = DICTDB._UserIO._UserIO-DbAccess
           vDbReads  [ipStatId] = DICTDB._UserIO._UserIO-DbRead
    . /* ASSIGN */

    FOR LAST DICTDB._UserTableStat NO-LOCK:
      ASSIGN vRangeSize   = DICTDB._UserTableStat._UserTableStat-Id
                          / (DICTDB._UserTableStat._UserTableStat-Conn + 1)
             vFirstStatID = vRangeSize * DICTDB._MyConnection._MyConn-UserId
             vLastStatID  = vRangeSize + vFirstStatID
             vFirstStatID = vFirstStatID + 1
      . /* ASSIGN */
    END. /* FOR LAST _UserTableStat */

    FOR EACH DICTDB._UserTableStat NO-LOCK
       WHERE DICTDB._UserTableStat._UserTableStat-id GE vFirstStatID
         AND DICTDB._UserTableStat._UserTableStat-id LE vLastStatID:
      ACCUMULATE DICTDB._UserTableStat._UserTableStat-Read (TOTAL).
    END. /* FOR EACH _UserTableStat */

    FOR LAST DICTDB._UserIndexStat NO-LOCK:
      ASSIGN vRangeSize   = DICTDB._UserIndexStat._UserIndexStat-Id
                          / (DICTDB._UserIndexStat._UserIndexStat-Conn + 1)
             vFirstStatID = vRangeSize * DICTDB._MyConnection._MyConn-UserId
             vLastStatID  = vRangeSize + vFirstStatID
             vFirstStatID = vFirstStatID + 1
      . /* ASSIGN */
    END. /* FOR LAST _UserIndexStat */

    FOR EACH DICTDB._UserIndexStat NO-LOCK
       WHERE DICTDB._UserIndexStat._UserIndexStat-id GE vFirstStatID
         AND DICTDB._UserIndexStat._UserIndexStat-id LE vLastStatID:
      ACCUMULATE DICTDB._UserIndexStat._UserIndexStat-Read (TOTAL).
    END. /* FOR EACH _UserIndexStat */

    ASSIGN vTableRead[ipStatId] = ACCUM TOTAL DICTDB._UserTableStat._UserTableStat-read
           vIndexRead[ipStatId] = ACCUM TOTAL DICTDB._UserIndexStat._UserIndexStat-read
    . /* ASSIGN */
  END. /* FOR FIRST _MyConnection, _Connect, _ActServer, _UserIO */
END PROCEDURE. /* GetServerStat */

RUN GetServerStat(1).
RUN GetServerStat(2).

/******************** Insert here a query to test: **************************/
&IF {&MessagesForPromon} &THEN
MESSAGE "Zero Activity in promon" VIEW-AS ALERT-BOX INFORMATION BUTTONS OK.
&ENDIF

ASSIGN vExecTime = ETIME.
FOR {&TestQuery}:
  ACCUMULATE "Record":U (COUNT).
END.
ASSIGN vExecTime = ETIME - vExecTime.

&IF {&MessagesForPromon} &THEN
MESSAGE "Update Activity in promon" VIEW-AS ALERT-BOX INFORMATION BUTTONS OK.
&ENDIF
/***************************************************************************/

RUN GetServerStat(3).

DISPLAY
  vMsgRec   [3] + vMsgRec   [1] - 2 * vMsgRec   [2] @ _Server-MsgRec
  vMsgSent  [3] + vMsgSent  [1] - 2 * vMsgSent  [2] @ _Server-MsgSent
  vByteRec  [3] + vByteRec  [1] - 2 * vByteRec  [2] @ _Server-ByteRec
  vByteSent [3] + vByteSent [1] - 2 * vByteSent [2] @ _Server-ByteSent
  vRecRec   [3] + vRecRec   [1] - 2 * vRecRec   [2] @ _Server-RecRec FORMAT ">>>>>>>>>>>>>>>>>>>9"
  vRecSent  [3] + vRecSent  [1] - 2 * vRecSent  [2] @ _Server-RecSent
  vQryRec   [3] + vQryRec   [1] - 2 * vQryRec   [2] @ _Server-QryRec
  vTimeSlice[3] + vTimeSlice[1] - 2 * vTimeSlice[2] @ _Server-TimeSlice
  vTableRead[3] + vTableRead[1] - 2 * vTableRead[2] @ _UserTableStat-read LABEL "Record Reads"
  vIndexRead[3] + vIndexRead[1] - 2 * vIndexRead[2] @ _UserIndexStat-read LABEL "Index Reads"
  vDbAccess [3] + vDbAccess [1] - 2 * vDbAccess [2] @ _UserIO-DbAccess
  vDbReads  [3] + vDbReads  [1] - 2 * vDbReads  [2] @ _UserIO-DbRead
  ACCUM COUNT "Record":U FORMAT ">>>>>>>>9" LABEL "Record Count"
  vExecTime FORMAT ">>>>>>>>9" LABEL "Execution Time"
  WITH SIDE-LABELS 1 COLUMN TITLE '{&TestQuery}' WIDTH 78.

OUTPUT TO VALUE("ServerStatTest.log") APPEND.
VIEW.
PUT SKIP(2).
OUTPUT CLOSE.
Why are you defining a variable inside the loop? That's kind of bizarre.
-prefetchNumRecs is very important, 16 is a terrible default that dates from the 1980s. I suggest 10,000.
Properly applied FIELDS phrases are also important (and surprisingly effective).
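For example, a minimal sketch against sports2000 (the field list is only an illustration):

FOR EACH Customer FIELDS (CustNum Name Country) NO-LOCK
    WHERE Customer.Country EQ "USA":
    /* only the listed fields are shipped from the server, so more records fit per message */
    DISPLAY Customer.CustNum Customer.Name.
END.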
Avoiding JOINs is important.
The difference when there is no index sounds a lot like the client is doing the filtering. It's bad enough when the server has to do a table scan but filtering on the client would be awful. I was under the impression that CAN-DO() and user defined functions were the only things that currently cause that to happen. I feel the urge to run some tests but I don't see much free time on the horizon :(
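To illustrate the kind of WHERE clause I mean, a sketch with a made-up function against sports2000:

FUNCTION isUsa RETURNS LOGICAL (ipCountry AS CHARACTER):
    RETURN ipCountry EQ "USA".
END FUNCTION.

/* a user-defined function (like CAN-DO) can't be evaluated by the server,
   so the client ends up fetching the records and applying the filter itself */
FOR EACH Customer NO-LOCK WHERE isUsa(Customer.Country):
END.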
> I was under the impression that CAN-DO() and user defined functions were the only things that currently cause that to happen.
It's still true.
I tested:
FOR EACH Customer NO-LOCK WHERE Customer.Country EQ "USA"
vs
FOR EACH Customer NO-LOCK WHERE STRING(Customer.Country) EQ "USA"
vs
FOR EACH Customer NO-LOCK WHERE Customer.Country EQ "USA" USE-INDEX Cust-Num
vs
FOR EACH Customer NO-LOCK WHERE Customer.Country EQ "USA" NO-PREFETCH
vs
FOR EACH Customer NO-LOCK WHERE CAN-DO("USA", Customer.Country)
Results:
(Columns (1)-(5) correspond to the five queries above, in order.)

                     (1)     (2)     (3)     (4)     (5)
Messages received      8       8       8      35      18
Messages sent          7       7       7      34      17
Bytes received       720     748     740    2772    1500
Bytes sent          5387    5387    5387    7141   14261
Records received       0       0       0       0       0
Records sent          33      33      33      33      83
Queries received       7       7       7      34      17
Time slices           32      32      32       0      72
Tested with 10.2B. Note that these statistics are broken in 11.6 and 11.7 on Windows. Fixed in 11.7.1:
PSC00357275
=============================
On Windows, the _ActServer VST has zeroes in some fields where it should have
non-zero values:
_server-msgrec
_server-msgsent
_server-byterec
_server-bytesent
This is also true in promon in the Activity: Servers screen.
George, thanks for the pointer to the VSTs for _ActServer and _Connect. Using those numbers should be easier and more accurate than using Wireshark.
My first demonstration uses sports2000 on HP-UX with OE 11.6.3. I will also repeat the same test against a Windows server running OE 11.6.3. I will post the Windows results at another time.
These tests use the orderline table, of which about 4000 records match the condition "WHERE orderline.linenum = 1". To set up the database and take my baseline measurements, I do the following:
prodb mysports sports2000 -newinstance
proserve mysports -Mm 32000 -B 3000 -L 50000 -d mdy -T /temp -S 5205

logs from startup:
=============================
[2017/12/11@10:31:06.076-0500] P-20374 T-1 I BROKER 0: (16955) Delay first prefetch message (-prefetchDelay): Not Enabled
[2017/12/11@10:31:06.076-0500] P-20374 T-1 I BROKER 0: (16956) Prefetch message fill percentage (-prefetchFactor): 0
[2017/12/11@10:31:06.076-0500] P-20374 T-1 I BROKER 0: (16957) Minimum records in prefetch msg (-prefetchNumRecs): 16
[2017/12/11@10:31:06.076-0500] P-20374 T-1 I BROKER 0: (16958) Suspension queue poll priority (-prefetchPriority): 0
...
[2017/12/11@10:31:06.076-0500] P-20374 T-1 I BROKER 0: (4239) Number of Database Buffers (-B): 3000
...
[2017/12/11@10:31:06.077-0500] P-20374 T-1 I BROKER 0: (12818) Message Buffer Size (-Mm): 32000
=============================

Code: See George's test harness above. I use the same thing but insert the following ABL loop.
Note that initially there should be no "perfect" index to assist with the filtering condition on linenum. However, the table should fit entirely into buffered memory after the first execution or two, and data should be retrieved very quickly from the database, especially if a self-service connection were to be used instead.

FOR EACH orderline WHERE orderline.linenum = 1 NO-LOCK:
END.
Running this code a number of times will cause the data to be buffered into the memory of the database server. The network statistics from _ActServer will settle at the following values (between 210 and 240 messages in both directions). The number of records should always be the same (i.e. 3953, which is determined by the WHERE filter):
Messages receive:      238
Messages sent:         237
Bytes received:      25880
Bytes sent:         241661
Records received:        0
Records sent:         3953
Queries received:      237
Time slices:          3913
Now I add an index, use "proshut" to stop the database, restart it with the exact same configuration as before, and recompile my program. This time the program will use the index on orderline.linenum. Nothing else should change other than the presence of this new index and the compiler's preference to use it. Below is the index definition and the new results generated while running the program.
CREATE INDEX "mylinenum" ON PUB."orderline" ("linenum" ASC ) PRO_DESCRIPTION '' PRO_ACTIVE 'y'; │Messages receive: 84 │ │ Messages sent: 83 │ │ Bytes received: 9232 │ │ Bytes sent: 226712 │ │Records received: 0 │ │ Records sent: 3953 │ │Queries received: 83 │ │ Time slices: 3953 │
It's pretty troubling to me that, depending on the indexes, there are three times the number of round-trips on the network, despite the fact that my code has not changed in any way. I had hoped that the "FOR EACH NO-LOCK" optimization would, in some way, batch my entire FOR EACH as a query sent to the server, and that I would receive consistent behavior from the database server no matter its internal implementation (especially when the whole operation can be performed using data that is buffered in memory). However, the behavior of the server doesn't appear to be consistent at all; it depends on a variety of startup parameters and on unexpected factors such as the available indexes.
My fear is that we will have yet *another* reason to keep expanding the list of indexes that we create on our tables. Many of the indexes we have on our tables are already highly questionable. It would be unfortunate to find ourselves adding even more indexes for no other reason than to influence the number of round-trips made to resolve the "FOR EACH NO-LOCK" queries. In the worst possible case, every FOR EACH loop on a given table may need to be evaluated to determine whether a special index would greatly decrease the network traffic. This client-"server" stuff could get out of hand...
> and then restart the database using the exact same configuration as before.
Are you sure that the -Mm stays the same (32000)?
Yes, my proserve command is the same each time. The only reason I stopped the database at all is that I've found that ABL "SERV" connections don't handle online SQL schema changes very well. I have it on my list to open a case with Progress about the weird error messages we've seen in client/server connections after performing schema changes. The errors won't go away until the entire database is restarted.
Here are the results of the test on Windows. As George says, some of the _ActServer numbers aren't being reported properly but I think we can rely on "queries received" which is still reported on Windows, as it was on HP-UX.
before index added
==================
Messages receive:        0
Messages sent:           0
Bytes received:          0
Bytes sent:              0
Records received:        0
Records sent:         3953
Queries received:       93
Time slices:          3991

after index added
=================
Messages receive:        0
Messages sent:           0
Bytes received:          0
Bytes sent:              0
Records received:        0
Records sent:         3953
Queries received:       27
Time slices:          3953
Once again, the round-trips are different by a factor of three. I create and start the database the same way as in HP-UX:
prodb mysports sports2000 -newinstance
proserve mysports -Mm 32000 -B 3000 -L 50000 -d mdy -T c:\temp -S 5205
My Windows repros always run slightly faster than on HP-UX, probably because of a more modern Intel Xeon CPU. As you can see above, the number of network round-trips (with and without an index) was still different by a factor of three, and the overall elapsed times were 30 ms (with an index) and 50 ms (without). It is interesting that the number of network round-trips was not correlated with the elapsed time.
I decided to do yet another set of tests running the client/server code *local* to the database (but still not self-service). However, when running locally, the elapsed times were basically the same (i.e. 30 ms and 50 ms for the "with index" and "without index" scenarios). I guess that indicates that the latency of the network itself is negligible and the database server is solely responsible for the performance issues.
When running my tests in self-service mode ("shared memory"), the elapsed times go all the way down to 15 ms and 25 ms.
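For reference, the only thing changing between those runs is how the session attaches to the database (a sketch; the path, host, and port are placeholders):

/* client-server: goes through the broker and a server process over TCP, even on the same box */
CONNECT mysports -H localhost -S 5205.

/* self-service ("shared memory"): no -H/-S, the session attaches to the database directly */
CONNECT /db/mysports.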
I went ahead and introduced the prefetch options: -prefetchDelay -prefetchFactor 100 -prefetchNumRecs 1000
Below is the output for the "with index" and "without index" scenarios, respectively:
with index:
Messages receive:        8
Messages sent:           7
Bytes received:       1024
Bytes sent:         219342
Records received:        0
Records sent:         3953
Queries received:        7
Time slices:          3947

without index:
Messages receive:        8
Messages sent:           7
Bytes received:       1040
Bytes sent:         219342
Records received:        0
Records sent:         3953
Queries received:        7
Time slices:          3985
As you can see, they have become essentially the same (whereas previously they indicated a difference in network traffic by a factor of 3).
While the network traffic is (finally) the same - regardless of the index - the performance is still quite different. When the index is present the code runs in 40 ms, vs 70 ms when it's not available. So it's clear that, in the end, the network activity wasn't the root cause of the performance problems I was trying to troubleshoot.
While it is true that network activity used to be three times higher when an index was not in use, it was probably not the root cause of the performance problems, since those problems persist despite the equivalent network activity we see now.
I can confirm your results in tests with 11.7.2 on Linux against sports2000 without/with an index on the "linenum" field and without/with your -prefetch* parameters:
Messages receive:      197      14       9       9
Messages sent:         196      13       8       8
Bytes received:      21542    1688    1148    1148
Bytes sent:         233720  215969  215486  215486
Records received:        0       0       0       0
Records sent:         3953    3953    3953    3953
Queries received:      196      13       8       8
Time slices:          3799    3988    3991    3991
The results are unexpected for me and they are different from what I got in V10.2B.
> While the network traffic is (finally) the same - regardless of the index - the performance is still quite different.
Without the index the query retrieved all 13970 records from the database, while with the index it retrieved only 3953 records.
Thanks for double-checking my work. I appreciate knowing that I wasn't crazy in my observations.
I think that, while my observations were accurate, my original conclusions were wrong. I now suspect that the duration of the "FOR EACH" operation on the server is what determines how many network round-trips are performed, not the other way around (i.e. if the operation takes longer, then by default it schedules more round-trips to return intermediate results to the client). In the end, the root cause of the higher number of network messages was probably the extended duration of the work (i.e. operating on ~14000 records vs ~4000, which is roughly 3x more).
I don't think the network traffic in itself was a source of any substantial delays - at least not in any of these tests above.
Perhaps if our network was bad *and* the database operation was long-running, then my performance problems would be compounded by all the additional round-trips that would return intermediate results....
George, as a suggestion for the future... you may want to include the elapsed ETIME output in your tests - especially if I'm correct in my theory that the duration of the work can be the direct cause of a higher number of round-trip messages.
The overall lesson here seems to be simply that indexes make ABL code execute quite a lot faster (even when the entire table resides in memory). I was quite shocked to see 3x the messages, but I think I understand that the reason for that originates with the performance of reading the database table. Another lesson learned was that the prefetch* configuration should avoid compounding database performance problems with larger network performance problems.
> George, as a suggestion for the future... you may want to include the elapsed ETIME output in your tests
I did. I also added more statistics, and the program now saves the results to an output file. The new version of the program is in the updated message above.
I ran additional tests against sports2000 with -Mm 32000 and Progress V11.7.2 on Windows:
FOR EACH Customer NO-LOCK WHERE Country EQ "Finland"
vs
the same query and -prefetchNumRecs 1000
vs
FOR EACH Customer NO-LOCK WHERE Country EQ "Finland" USE-INDEX NAME
vs
the same query and -prefetchNumRecs 1000
Results:
(Columns (1)-(4) correspond to the four test combinations above, in order.)

                        (1)     (2)     (3)     (4)
Messages received:        4       2      13       2
Messages sent:            3       1      12       1
Bytes received:         608     392    1580     392
Bytes sent:            5008    4816    5884    4816
Records received:         0       0       0       0
Records sent:            28      28      28      28
Queries received:         3       1      12       1
Time slices:             26      28     187     198
Record Reads:            28      28    1117    1117
Index Reads:             29      29    1118    1118
DB Access:               59      59    2260    2260
DB Read:                  2       2      48      48
Record Count:            28      28      28      28
Execution Time:           0       0       2       2
It's the -prefetchNumRecs 1000 that makes the difference.