We have some legacy code in an ABL / CHUI environment , and I have one particularly widely used procedure which is :
(ie. fairly standard stuff for legacy ABL programming)
And now, many years down the road, we discover some rare form of data corruption from the procedure and we decide it's better that the ABL client should die on occasion than to corrupt our data. Without revisiting the design of the code from square one, how can we go about that (rolling back and dying)?
My initial thought was to pick between QUIT and STOP. Since QUIT actually commits data, that leaves me with STOP. But according to the docs, STOP seems to be trappable and may not always cause the ABL client to die.
So now I'm thinking I shouldn't use either QUIT or STOP. My next thought is to use the OS and kill myself with the following:
kill -SIGHUP 12345
http://knowledgebase.progress.com/articles/Article/P133912
According to the docs, this behaves in ABL like an untrappable STOP.
Any thoughts? Is there a more straightforward approach to rolling back and dying in an ABL client?
We'll need to know your version first as different releases have different capabilities.
I'm also curious why you're looking at proceeding this way instead of fixing the broken procedure - clearly something is telling you there's an error so why not fix it?
maybe just wrap the legacy code in a do on error undo, retry block and then check if retry show a message and quit...
do on error undo, retry: if retry then do: message 'your quote of the day'. quit. end. /* legacy code, refactor that to let it throw errors aka: remove no-error on statements that should fail fast */ ... end.
> QUIT actually commits data
No, it's logially impossible.
> the procedure has no output error parameters
You can check the errors using the undocumented _MSG pseudo-variable and _MSG() function.
According to the docs, QUIT commits the current data: Link
The faulty data indicates an invalid state going all the way back up to the start of the transaction (potentially many levels up the call-stack, with different possible call-stacks that may reference the procedure). While it is the most common scenario, it isn't always a CHUI client that uses the procedure either - it may be a batch or something else.
Redesigning the code would take too much effort, especially considering how rare it is that we get ourselves into an invalid state and create invalid data (once a month or so, on a very busy database).
Regardless of the reason, STOP is about as close as the ABL comes to what we are looking to do (ie. rollback and crash out of the AVM). Here is the link to STOP information: I just wish we could prevent any ON STOP handling (as would be the case if we used kill -SIGHUP).
According to the docs you can raise the STOP condition in the code with a STOP. statement, and then catch the STOP with an ON STOP in the "-p" procedure. If this is correct, then you can use that to unwind the transaction / call stack.
> According to the docs, QUIT commits the current data
Yes, I was wrong... a bit.
DO TRANSACTION ON QUIT UNDO: CREATE customer. ASSIGN customer.name = "New". QUIT. END.
Yes, QUIT does cause the current transaction to get committed, before exiting the session (assuming there no code trapping QUIT).
When you say faulty data / data corruption, what exactly are you talking about? Is it "logical" application-level inconsistency? Or is it actual data corruption in the database that can be detected with idxcheck / dbrpr? What are the exact side effects of this corruption. I just wanted to check if this is an application issue or a product issue.
What OE release are you running with?
As far as stopping that process, as you alluded to, STOP can be trapped by any ABL code so that is not a reliable approach. Very few stop conditions are not trappable at all (for instance, a Lock table overflow cannot be trapped).
BTW, session's "suicide" using kill -8 or kill -9 is safe for database because session will not hold the critical db resources at this moment.
Yes, its application level inconsistent data that is discovered deep in a callstack without a good way of notifying callers about a critical problem (no S.E.H, no error output params, no "NO-ERROR" strategy on run statements, etc). Its amazing how much of our legacy ABL code assumes no errors will ever possibly occur, either now or later after any future coding changes...
I'm very happy with OE/OO which takes the opposite approach and allows for any manner of error to occur at any time (the application code then having the responsibility for handling them - but only the ones it knows about).
(These days with brand new code, I would use OO to throw a unique type of error class that isn't handled by anyone. That definitely gets the job done on the OO side of things. But with poorly written legacy code, however, there is no obvious way to throw a fatal error and roll back the db).
We are on 11.3, planning an upgrade to 11.6.
Your KB says that sending SIGHUP corresponds to an untrappable STOP, which is why I'm inclining towards that approach. I wish there was an option on the ABL STOP statement that had the same effect.
FROM YOUR KB: SIGHUP: indicate to the running process that the terminal it is trying to read or write to has disconnected. The signal is sent by the operating system as a response to a read or write operation. The Progress / OpenEdge process defines a special signal handler that will treat this as an untrappable STOP condition.
Are there any ON STOP statements in your code? If there isn't then you can use the STOP statement and then trap the STOP condition at the top of the callstack and get what you want that way.
Cool. I'll try that out. I would prefer to have a future-proof'ed version of an "untrappable STOP". But I guess it is unlikely that anyone will include new STOP handling at the start of the callstack (batch, chui, appserver, whatever) unless they *really* know what they are doing.
As far as I'm aware, the only ABL conditions we check for and handle on a regular basis are ON ERROR and ON ENDKEY. (In ABL you have to put these on *every* single FOR EACH loop or you get the agonizing behavior where only *one* iteration of the loop is undone and the program keeps going on its merry way. But that is a separate topic altogether.)
Thanks for the help, David
I did some looking around for causing untrappable stops, and the only times it happens is when things like a DB shutdown or similar external events results in the ABL session effectively resetting / restarting itself.
When catching the top-level STOP condition, here's the top-level code I used:
test-block: DO ON STOP UNDO, LEAVE test-block: RUN CallStack.p. MESSAGE "In Main" VIEW-AS ALERT-BOX INFO BUTTON OK. END. MESSAGE "After Main" VIEW-AS ALERT-BOX INFO BUTTON OK.
I must be extra stupid today... a bad thing happens somewhere in your legacy code. You know that this bad thing happens and you're prepared to go through quite a few contortions to abort the process in such a way that the transaction does not get committed. But you cannot fix the error? That seems ill considered.
I can't create a general purpose fix for these errors in that program at that level of the call-stack. It is mainly because the program doesn't have all the necessary contextual information it would need to take a new course of action. IE. In my situation the program can see for itself - based on some of its own logic - that the data is invalid. But it:
Given how rare it is that the error situations come up, the choke-and-die approach seems suitable. As you say, we'd also fix the root cause if/when its possible to find it (nobody will prefer a program that chokes-and-dies over a program that succeeds).
But there would be too much refactoring involved to have a general-purpose solution that would either (A) pass back an output parameter all the way up the legacy ABL callstack, or (B) try to find and fix/validate all the input data everywhere, as supplied by potentially dozens of clients to this particular program.
This program is like a low-level private first class in the army. It has to do what its told without sending any unwanted outputs to its callers. It doesn't have too many options.
Now if this was OO code and could use Structured Error Handling, that would be a whole different story.
I really hate to suggest this as it goes against almost every best practice I can think of.... you could create a class with a static member, set a value in there, and then check that flag/value higher in the callstack...
A class with static members would also be a good way to capture the callstack and related information when the error happens, and then check the class's TT to see if there's anything interesting in it higher up the callstack or when quitting the program.
I tried to stay out of this...but I can't. Please...don't do that. You'll just tangle it even more, and have to fix even more bad code..."later".
If you're going about adding checks against global flags...just use exceptions. There's no easy fix. But there are correct fixes. The correct fix, is to throw an exception; It will unwind the stack and back out the transaction properly. Find all the places where this procedure is called, and add catch block to either handle it or sprinkle undo, throw where needed.
Given the constraints this is a viable - temporary - solution / diagnostic tool as long as it's only used for this, it's highly documented, and removed when the problem is fixed.
Alternatively would be to have a procedural pub/sub with a sub at the top of the stack and pub's where required. I'd consider this an inferior solution as its not encapsulated.
Just an FYI in case your last comment was an accurate reflection of your belief: you can use structured error handling ANYWHERE. It doesn't have to be in classes. A thrown error can be caught as an OO object, which you can do in any block, whether it's in a class or not. Or it can be trapped with a NO-ERROR just as before.
Though it is true that in your case you'd probably need a "BLOCK-LEVEL ON ERROR UNDO, THROW" in all your procedures and that could cause you even more nightmares without much careful thought. So, I'm not necessarily suggesting it. Just trying to clear up any myths.
> (A) pass back an output parameter all the way up the legacy ABL callstack
You can always pass back information using a shared variable, if you have code higher up that wants to check it and then undo the transation. No need to have output parameters for that.
@Laura, While I know its possible, I would avoid using SEH in legacy code. In general there are other error-handling strategies (as poor as they may be) in place, so that adding one more on top just creates confusion for the legacy ABL coders. (ie. callers of the error-throwing programs would then have to check for two or three different factors to see if an error happened).
The declaration of default ON ERROR handling (ON ERROR UNDO, THROW) which you mentioned, is one issue you'd have to address when using SEH in legacy code. Additionally you will force your OO-based error classes on non-OO coders; and they'd have to understand how to catch them properly, how class type inheritance works, how to get at class members within the error classes, and so on. I guess that in practice I don't buy into the idea that SEH should be used in OO *and* non-OO code ABL.
As soon as you start using SEH the code becomes, at the very least, OO-aware and you might as well not stop and limit your OO to just your errors.
Maybe this can be used as a learning opportunity. Work with co-workers to bring their skills up-to-date if that is hindering a proper solution.