Stop executing legacy code

Posted by dbeavon on 16-Feb-2016 12:18

We have some legacy code in an ABL / CHUI environment , and I have one particularly widely used procedure which is :

  • Is normally only one part of a larger encompasing transaction (may have an outer transaction scope in a calling procedure) 
  • the procedure has no output error parameters (ie. no way of normally signaling errors and notifying callers)
  • the enclosing environment does not support Structured Error Handling
  • the callers don't use "RUN xyz NO-ERROR" or check ON ERROR conditions

(ie. fairly standard stuff for legacy ABL programming)

And now, many years down the road, we discover some rare form of data corruption from the procedure and we decide it's better that the ABL client should die on occasion than to corrupt our data.  Without revisiting the design of the code from square one, how can we go about that (rolling back and dying)?

My initial thought was to pick between QUIT and STOP.  Since QUIT actually commits data, that leaves me with STOP.  But according to the docs, STOP seems to be trappable and may not always cause the ABL client to die.

So now I'm thinking I shouldn't use either QUIT or STOP. My next thought is to use the OS and kill myself with the following:

   kill -SIGHUP 12345

  http://knowledgebase.progress.com/articles/Article/P133912

According to the docs, this behaves in ABL like an untrappable STOP.


Any thoughts?  Is there a more straightforward approach to rolling back and dying in an ABL client?

All Replies

Posted by Tim Kuehn on 16-Feb-2016 12:25

We'll need to know your version first as different releases have different capabilities. 

I'm also curious why you're looking at proceeding this way instead of fixing the broken procedure - clearly something is telling you there's an error so why not fix it? 

Posted by Marian Edu on 16-Feb-2016 12:35

maybe just wrap the legacy code in a do on error undo, retry block and then check if retry show a message and quit...

do on error undo, retry:
   if retry then do:
      message 'your quote of the day'.
      quit.
   end.

   /* legacy code, refactor that to let it throw errors
      aka: remove no-error on statements that should fail fast
    */
   ...
end.

Posted by George Potemkin on 16-Feb-2016 12:58

> QUIT actually commits data

No, it's logially impossible.

> the procedure has no output error parameters

You can check the errors using the undocumented _MSG pseudo-variable and _MSG() function.

Posted by dbeavon on 16-Feb-2016 13:21

According to the docs, QUIT commits the current data: Link

The faulty data indicates an invalid state going all the way back up to the start of the transaction (potentially many levels up the call-stack, with different possible call-stacks that may reference the procedure).  While it is the most common scenario, it isn't always a CHUI client that uses the procedure either - it may be a batch or something else.

Redesigning the code would take too much effort, especially considering how rare it is that we get ourselves into an invalid state and create invalid data (once a month or so, on a very busy database).

Regardless of the reason, STOP is about as close as the ABL comes to what we are looking to do (ie. rollback and crash out of the AVM).  Here is the link to STOP information:   I just wish we could prevent any ON STOP handling (as would be the case if we  used kill -SIGHUP).

Posted by Tim Kuehn on 16-Feb-2016 13:49

According to the docs you can raise the STOP condition in the code with a STOP. statement, and then catch the STOP with an ON STOP in the "-p" procedure. If this is correct, then you can use that to unwind the transaction / call stack.

Posted by George Potemkin on 16-Feb-2016 14:10

> According to the docs, QUIT commits the current data

Yes, I was wrong... a bit.

DO TRANSACTION ON QUIT UNDO:
  CREATE customer.
  ASSIGN customer.name = "New".
  QUIT.
END.

Posted by Fernando Souza on 16-Feb-2016 14:12

Yes, QUIT does cause the current transaction to get committed, before exiting the session (assuming there no code trapping QUIT).

When you say faulty data / data corruption, what exactly are you talking about? Is it "logical" application-level inconsistency? Or is it actual data corruption in the database that can be detected with idxcheck / dbrpr? What are the exact side effects of this corruption. I just wanted to check if this is an application issue or a product issue.

What OE release are you running with?

As far as stopping that process, as you alluded to, STOP can be trapped by any ABL code so that is not a reliable approach. Very few stop conditions are not trappable at all (for instance, a Lock table overflow cannot be trapped).

Posted by George Potemkin on 16-Feb-2016 14:24

BTW, session's "suicide" using kill -8 or kill -9 is safe for database because session will not hold the critical db resources at this moment.

Posted by dbeavon on 16-Feb-2016 15:44

Yes, its application level inconsistent data that is discovered deep in a callstack without a good way of notifying callers about a critical problem (no S.E.H, no error output params, no  "NO-ERROR" strategy on run statements, etc).  Its amazing how much of our legacy ABL code assumes no errors will ever possibly occur, either now or later after any future coding changes...

I'm very happy with OE/OO which takes the opposite approach and allows for any manner of error to occur at any time (the application code then having the responsibility for handling them - but only the ones it knows about).

(These days with brand new code, I would use OO to throw a unique type of error class that isn't handled by anyone.  That definitely gets the job done on the OO side of things.  But with poorly written legacy code, however, there is no obvious way to throw a fatal error and roll back the db).

We are on 11.3, planning an upgrade to 11.6.

Your KB says that sending SIGHUP corresponds to an untrappable STOP, which is why I'm inclining towards that approach.  I wish there was an option on the ABL STOP statement that had the same effect.

FROM YOUR KB:  SIGHUP: indicate to the running process that the terminal it is trying to read or write to has disconnected. The signal is sent by the operating system as a response to a read or write operation. The Progress / OpenEdge process defines a special signal handler that will treat this as an untrappable STOP condition.

Posted by Tim Kuehn on 16-Feb-2016 16:15

Are there any ON STOP statements in your code? If there isn't then you can use the STOP statement and then trap the STOP condition at the top of the callstack and get what you want that way.

Posted by dbeavon on 16-Feb-2016 16:26

Cool.  I'll try that out.  I would prefer to have a future-proof'ed version of an "untrappable STOP".   But I guess it is unlikely that anyone will include new STOP handling at the start of the callstack (batch, chui, appserver, whatever) unless they *really* know what they are doing.

As far as I'm aware, the only ABL conditions we check for and handle on a regular basis are ON ERROR and ON ENDKEY.  (In ABL you have to put these on *every* single FOR EACH loop or you get the agonizing behavior where only *one* iteration of the loop is undone and the program keeps going on its merry way.  But that is a separate topic altogether.)

Thanks for the help, David

Posted by Tim Kuehn on 16-Feb-2016 16:32

I did some looking around for causing untrappable stops, and the only times it happens is when things like a DB shutdown or similar external events results in the ABL session effectively resetting / restarting itself.

When catching the top-level STOP condition, here's the top-level code I used:

test-block:
DO ON STOP UNDO, LEAVE test-block:

RUN CallStack.p.

MESSAGE "In Main"
   VIEW-AS ALERT-BOX INFO BUTTON OK.

END.

MESSAGE "After Main"
   VIEW-AS ALERT-BOX INFO BUTTON OK.

Posted by ChUIMonster on 16-Feb-2016 16:35

I must be extra stupid today...  a bad thing happens somewhere in your legacy code.  You know that this bad thing happens and you're prepared to go through quite a few contortions to abort the process in such a way that the transaction does not get committed.  But you cannot fix the error?  That seems ill considered.

Posted by dbeavon on 16-Feb-2016 17:03

I can't create a general purpose fix for these errors in that program at that level of the call-stack.  It is mainly because the program doesn't have all the necessary contextual information it would need to take a new course of action.  IE.  In my situation the program can see for itself - based on some of its own logic - that the data is invalid.  But it:

  • doesn't have authority to change anything about the data it is given.
  • it doesn't really know how much other logic may have been done previously - leading up to that point
    (ie. logic that had been executed based on the invalid data)
  • most importantly, it doesn't have a mechanism to send a back an error
    response to a calling program, nor allow someone else to take a new course of action
    (eg. generate a message to the user about the error and fail in a graceful way).

Given how rare it is that the error situations come up, the choke-and-die approach seems suitable.  As you say, we'd also fix the root cause if/when its possible to find it (nobody will prefer a program that chokes-and-dies over a program that succeeds).  

But there would be too much refactoring involved to have a general-purpose solution that would either (A) pass back an output parameter all the way up the legacy ABL callstack, or (B) try to find and fix/validate all the input data everywhere, as supplied by potentially dozens of clients to this particular program.

This program is like a low-level private first class in the army.  It has to do what its told without sending any unwanted outputs to its callers.  It doesn't have too many options.  

Now if this was OO code and could use Structured Error Handling, that would be a whole different story.

Posted by Tim Kuehn on 16-Feb-2016 17:15

I really hate to suggest this as it goes against almost every best practice I can think of.... you could create a class with a static member, set a value in there, and then check that flag/value higher in the callstack...

A class with static members would also be a good way to capture the callstack and related information when the error happens, and then check the class's TT to see if there's anything interesting in it higher up the callstack or when quitting the program.

Posted by Matt Baker on 16-Feb-2016 17:25

I tried to stay out of this...but I can't.  Please...don't do that.  You'll just tangle it even more, and have to fix even more bad code..."later".

If you're going about adding checks against global flags...just use exceptions.  There's no easy fix.  But there are correct fixes.   The correct fix, is to throw an exception; It will unwind the stack and back out the transaction properly.   Find all the places where this procedure is called, and add catch block to either handle it or sprinkle undo, throw where needed.

Posted by Tim Kuehn on 16-Feb-2016 18:10

Given the constraints this is a viable - temporary - solution / diagnostic tool as long as it's only used for this, it's highly documented, and removed when the problem is fixed.

Alternatively would be to have a procedural pub/sub with a sub at the top of the stack and pub's where required. I'd consider this an inferior solution as its not encapsulated.

Posted by Laura Stern on 16-Feb-2016 18:48

Just an FYI in case your last comment was an accurate reflection of your belief: you can use structured error handling ANYWHERE.  It doesn't have to be in classes.  A thrown error can be caught as an OO object, which you can do in any block, whether it's in a class or not.  Or it can be trapped with a NO-ERROR just as before.

Though it is true that in your case you'd probably need a "BLOCK-LEVEL ON ERROR UNDO, THROW" in all your procedures and that could cause you even more nightmares without much careful thought.  So, I'm not necessarily suggesting it.  Just trying to clear up any myths.

Posted by James Palmer on 17-Feb-2016 04:11

It doesn't need to be oo code to use structured error handling...

Sent from my Windows Phone

Posted by ske on 17-Feb-2016 04:28

>  (A) pass back an output parameter all the way up the legacy ABL callstack

You can always pass back information using a shared variable, if you have code higher up that wants to check it and then undo the transation. No need to have output parameters for that.

Posted by James Palmer on 17-Feb-2016 04:54

No shared variables are horrible horrible things.

Sent from my Windows Phone

Posted by dbeavon on 17-Feb-2016 07:33

@Laura, While I know its possible, I would avoid using SEH in legacy code.  In general there are other error-handling strategies (as poor as they may be) in place, so that adding one more on top just creates confusion for the legacy ABL coders. (ie. callers of the error-throwing programs would then have to check for two or three different factors to see if an error happened).  

The declaration of default ON ERROR handling (ON ERROR UNDO, THROW) which you mentioned, is one issue you'd have to address when using SEH in legacy code.  Additionally you will force your OO-based error classes on non-OO coders; and they'd have to understand how to catch them properly, how class type inheritance works, how to get at class members within the error classes, and so on.  I guess that in practice I don't buy into the idea that SEH should be used in OO *and* non-OO code ABL.  

As soon as you start using SEH the code becomes, at the very least, OO-aware and you might as well not stop and limit your OO to just your errors.

Posted by Matt Baker on 17-Feb-2016 08:12

Maybe this can be used as a learning opportunity.  Work with co-workers to bring their skills up-to-date if that is hindering a proper solution.

This thread is closed