SYSTEM ERROR: I/O error 2 while compiling code

Posted by dbeavon on 01-Oct-2019 22:01

In a Progress source-code compilation procedure that we've used for several years we are starting to get intermittent errors like so:

SYSTEM ERROR: I/O error 2 in readit, ret 0, file 10(.\gen\Common\NewVersion\VersionMigration.r), addr 0. (290)

It is generated from the _progres client on Windows, OE 11.7.4.  This error is kind of a scary one, and I look forward to spending many unproductive hours digging into this.

As near as I can tell, this is the best KB that might apply: https://knowledgebase.progress.com/articles/Article/P117101

The issue appears when a statement like this is executed:

     COMPILE lkp\p\lkp0200.p  GENERATE-MD5 SAVE.

Within that program (lkp0200) is a reference to the OE class, ie. gen.Common.NewVersion.VersionMigration.cls

I should also point out that it is possible that the class (gen\Common\NewVersion\VersionMigration) is being simultaneously compiled and saved to disk by a worker in a different _progres process.  (There are a handful of compilation processes that run concurrently or our projects would take all day to build.)

Any tips would be appreciated.

All Replies

Posted by frank.meulblok on 02-Oct-2019 08:25

> I should also point out that it is possible that the class (gen\Common\NewVersion\VersionMigration) is being simultaneously compiled and saved to disk by a worker in a different _progres process.

I suspect that's actually the cause of this - the session failing because it finds the partial .r file the other worker is writing to.

This should be avoidable if each worker uses the SAVE INTO option to write the r-code to its own folder outside of the propath, and the outputs of the different workers are merged into a final set when they've completed.
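The merge step suggested above can be sketched as follows. This is only an illustration in Python, not part of any Progress tooling: the function name `merge_worker_outputs` and the folder layout are assumptions, and it presumes each worker has already run its `COMPILE ... SAVE INTO` into a private folder that is not on the PROPATH.

```python
import shutil
from pathlib import Path


def merge_worker_outputs(worker_dirs, final_dir):
    """Copy every .r file from each worker's private folder into final_dir.

    Relative paths are preserved. Returns the relative paths copied, in order.
    If two workers produced the same .r file, the later worker's copy wins.
    """
    final_dir = Path(final_dir)
    copied = []
    for worker_dir in map(Path, worker_dirs):
        for rcode in sorted(worker_dir.rglob("*.r")):
            if not rcode.is_file():
                continue  # skip anything that is not a regular file
            rel = rcode.relative_to(worker_dir)
            target = final_dir / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(rcode, target)
            copied.append(rel.as_posix())
    return copied
```

The merge should run only after all workers have finished, so that nothing reads from the final folder while it is being populated.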

Posted by dbeavon on 02-Oct-2019 12:58

Thanks for the response.  That is probably what is happening but there is no KB yet for it (at least not in relation to COMPILE), and I didn't think this type of problem would go so far as to crash the entire process.

So the AVM/runtime isn't that careful about using partially written r-code files?  

It seems like Progress should be relying on a checksum or MD5 feature - in order to avoid using an r-code file that is invalid.  At a very minimum it seems like the COMPILE statement should generate an error condition instead of crashing.  It would be better if the runtime would keep it together, rather than panicking and crashing out of the entire process.  

Or better yet, just rely on the file-system and OS to provide the concurrency control.  Those r-code files can be shared for reading and exclusive for writing.  It doesn't seem like this is so difficult ... a database vendor like Progress shouldn't have trouble with the concept of controlling concurrent access to a file resource.

>> Should be avoidable if each worker uses the SAVE INTO option to write the r-code to it's own folder outside of the propath, and merging the outputs of the different workers into a final set when they've completed.

Yes ... I'm thinking of using that approach, or giving up and using PCT one day (once it is formally supported).  Building our own compiler tooling is an uphill battle for sure, despite the fact that we've been at it for twenty years.  It seems like Progress has a *lot* of opportunity for improvements in this area.  ABL customers lose efficiency in the creation of applications if they are also burdened with the creation of their own compiler and deployment toolchains.

Posted by Brian K. Maher on 02-Oct-2019 13:20

David,
 
FYI (for you and everyone else) ... any crash of the AVM should be reported to Tech Support.
 
Brian Maher
Principal Engineer, Technical Support
Progress
14 Oak Park | Bedford, MA 01730 | USA
Phone: 781-280-3075
 
 
Posted by tbergman on 02-Oct-2019 13:34

I’m not sure if this will be helpful in your situation but one strategy we often use for preventing applications from using partially written files is to write the file under a different name, then rename it when done.
 
We often do this when sending files via FTP etc. when we know there’s an automated process to pick up the file. I have no idea if this will prevent your errors but it might be worth a try.
 
Tom
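
The write-then-rename strategy described in this post can be sketched in general terms as follows. This is a Python illustration of the file-handling pattern only, not of how COMPILE writes r-code; the function name `publish_atomically` is hypothetical. Note that on Windows the final rename may fail if another process holds the target file open, which is consistent with the platform difference a later reply in this thread describes.

```python
import os
import tempfile
from pathlib import Path


def publish_atomically(data: bytes, final_path) -> None:
    """Write data to a temporary file, then rename it over final_path.

    Readers see either the old complete file or the new complete file,
    never a partially written one.
    """
    final_path = Path(final_path)
    final_path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the target directory so the rename
    # stays on the same volume.
    fd, tmp_name = tempfile.mkstemp(dir=final_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, final_path)
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
```

A consumer that only ever opens the final name never observes the `.tmp` file, so it cannot pick up an incomplete write.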
Posted by dbeavon on 02-Oct-2019 21:30

>> any crash of the AVM should be reported to Tech Support.

I just now submitted this.  It was a bit hard to get it to happen under normal circumstances, so I had to artificially increase the number of concurrent client processes that were making attempts to read the r-code.

As an aside... Initially I thought the problem was going to be related to my class inheritance.  IE I was guessing that the bug would be a result of the unusual way in which the compiler may save *multiple* files at once - even ones that were NOT requested....  But it turned out that the bug had nothing to do with that at all; it can even happen when working with ABL code which doesn't use inheritance.

Another thing occurred to me while building the repro ... the bug can probably happen when only *one* party is doing the COMPILE/SAVE.  The other party might not even be compiling code, but just trying to *execute* something that hasn't been fully written to disk yet.  IE. this seems like a fundamental problem with COMPILE/SAVE in cases where it is used in a shared environment (ie. an environment where there are multiple active processes sharing the same propath).  I had no idea that there was a possibility that another AVM client process might crash if a COMPILE was underway.  I always thought that the writing of r-code was somehow "atomic" and/or that other client processes would revert to using the ABL source code if ever something went wrong while attempting to use the r-code.

>> one strategy we often use for preventing applications from using partially written files is to write the file under a different name, then rename it when done.

I will pass that along as a suggestion for Progress to improve the behavior of COMPILE.  IMHO the behavior of that operation should be atomic from the standpoint of any outside observer.  Reading or executing an incomplete r-code file is not good for anyone.

Posted by dbeavon on 02-Oct-2019 22:01

Speaking of submitting bugs.  I have four active support cases at the moment, and I just learned that we are going to have our software licenses audited! Coincidence?  

Or maybe it is the customers that use tech support who get moved to the top of the audit list? Maybe I need to be more inconspicuous, and just let those bugs be!

Posted by Simon L. Prinsloo on 03-Oct-2019 07:00

I doubt that your support cases have any bearing on the audit selection. I know of a few companies that were audited in recent years who rarely if ever log support cases.

Posted by Brian K. Maher on 03-Oct-2019 11:32

Hi David,
 
Short answer: Switch to decaf.  <smile>
 
Long answer: No, there is no correlation.
 
Brian Maher
 
 
Posted by Matt Baker on 03-Oct-2019 12:50

Please don't stop contributing to communities.  You've provided some great feedback (esp about the .net open client) and we do appreciate your input.

Posted by dbeavon on 03-Oct-2019 14:21

@Matt.  Thanks, I don't plan on going away.

But it would be nice to find the patterns and correlations that trigger an audit.  They are quite a waste of resources (similar to working on PSC tech support cases).  

Posted by dbeavon on 21-Oct-2019 13:34

I have some feedback from tech support about the failures in the COMPILE statement.  This issue does *not* appear to be specifically related to the compilation of class hierarchies (or even to classes in particular, vs procedures).

Here is a KB that describes the failures while using COMPILE in one session within a multi-user environment.

knowledgebase.progress.com/.../abl-compile-statement-fails-and-crashes-avm-with-error-290

A defect has been filed with the product team (OCTA-16825).  Based on what I've read and heard, there is nothing that tells us to avoid COMPILE operations in a multi-user environment, and any related crashing that may happen can be considered a bug.

Posted by dbeavon on 17-Jan-2020 17:11

Another follow-up on this error... knowledgebase.progress.com/.../abl-compile-statement-fails-and-crashes-avm-with-error-290

The defect is possible whenever anyone is compiling code in an environment where other active AVM clients are actively running.  There appears to be no guaranteed way to avoid it, other than stopping the other active clients in the system (ie. you cannot compile in production, *and* you must create a single-threaded compiler that is isolated into its own private PROPATH).

While it is considered a defect, my understanding is that the problem will not be addressed in 11.7.6 or 12.2.

This seemed to be the first time anyone had reported the issue to Progress.  It wasn't even clear what the expected behavior should actually be.  It is not well-defined how things will behave if a COMPILE statement is writing to a PROPATH while other processes are running.  The current behavior is not ideal, based on my testing.  We are seeing the process crash to the OS, without a way to intercede or change course.

It will be really challenging for us to develop a work-around for this issue!  Ultimately we would like to get out of the business of building our own compiler tools.  I think Progress would prioritize this type of bug if THEY were the ones trying to develop compiler tooling that supported larger ABL projects.  Hopefully that will happen some day.

Posted by ducity on 19-Jan-2020 22:29

"The defect is possible whenever anyone is compiling code in an environment where other active AVM clients are actively running.  There appears to be no guaranteed way to avoid it,"

Actually, a very old solution (probably from v2) is to start your client sessions with -q.  There are obvious downsides, like existing sessions not picking up the finished .r ......

And the reason this probably hasn't been reported? People long ago learnt to isolate compilation from testing.......

Posted by dbeavon on 19-Jan-2020 22:55

>>  start your client sessions with -q ,

The -q isn't a guarantee ... any newly started session still has to read r-code before it can run, and this has the potential to crash if the code compilation is actively underway.

Funny enough, the bug came up on a build server that does nothing more than compile our source code.  So we are avoiding the issue in production, but our build server is a mess as a result of COMPILE statements that are crashing the processes in an unpredictable and unavoidable way.  Is there a way to tell a client session NOT to use the preexisting r-code, even if present?  I had a long support case with Progress and didn't pose that particular question but it might be a potential workaround, if such a feature existed.  I got the impression that they didn't want to make changes on their end, and were OK with occasional/unpredictable crashing of the AVM.  

Posted by frank.meulblok on 20-Jan-2020 09:49

What about my previous suggestion from Oct 3 2019 ?

"Should be avoidable if each worker uses the SAVE INTO option to write the r-code to it's own folder outside of the propath, and merging the outputs of the different workers into a final set when they've completed."

It's not without downsides, as your disk space requirements will grow significantly.

But it should avoid sessions writing to .r files that other sessions are reading from, and AFAIK that's where the risk lies. (Not just multiple sessions reading the same .r files.).

Posted by dbeavon on 20-Jan-2020 17:14

>> What about my previous suggestion from Oct 3 2019 ?

I remember reading that but was less open to the idea at the time.  I had always heard that the crashing of the AVM wasn't an acceptable outcome under any circumstance, so I was fairly convinced that Progress would want to own up to this and fix it themselves.  (IE I don't see why the compile of every single file shouldn't work internally like a SAVE INTO that saves to a temp file and renames it when finished.  The operation should be atomic and other active clients shouldn't observe intermediate WIP outputs..)  Moreover the issue is not unique to me.  Progress conceded that there is a potential for this to happen any time a COMPILE w/SAVE happens in a multi-user environment.  The compiling client has the potential to negatively impact the executing clients.  Given that Progress is a dynamic platform, this is how many customers have always done it ("this is the way mando") and they have never told us otherwise nor given us any cautionary warnings about the unintended consequences of it.  Until a few years ago we always compiled all our code in-place within our production PROPATH (not on an isolated build server as we do today.)

Now that I know Progress wants me to do the heavy lifting,  and work around the issue myself, I agree that SAVE INTO is a real great option.  (I'm glad you reminded me because I was contemplating the use of redundant copies of our entire source from our repo).  Your option lets me use a single copy of the source and compile the results to a location that is outside the PROPATH so it shouldn't cause concurrency conflicts.  Unfortunately the compile operations will never take advantage of pre-compiled r-code and there will be a lot of redundant work being done, but that seems unavoidable.

Posted by Evan Bleicher on 31-Jan-2020 19:20

I would like to take this opportunity to provide some context on how the Development team handled OCTA-16825, which was originally logged as a regression.

As noted in this post, the ABL compile statement is a session-based facility for compiling a procedure or class file.  The output of the compilation is either a single or possibly multiple rcode files.  The compile statement has no synchronization with any other ABL sessions.

One of the examples cited in this post, was compiling an updated source file, while the corresponding r-code file was being used in a running application.  Alternatively, compilation may be occurring in one session, while another session is compiling the same project.

To some degree the ability to perform these simultaneous actions is supported by OpenEdge on Unix by leveraging operating system calls.  These calls allow us to write new r-code to a temporary file and then rename it to the proper name without disrupting existing readers of that same r-code file.  This capability is not supported on Windows.  The development team attempted to identify other approaches to provide a similar behavior on Windows and did not find a viable solution.  It was also conclusively determined that the product had not regressed and in fact this behavior has always existed in the product.  The decision to address this behavior (cost of development and validation) needed to be weighed against the benefit and implicitly the frequency of this situation occurring.

I noted that on Unix, to some degree,  these simultaneous actions are supported.  But it is possible that an invalid outcome may still occur.  In the scenario in which one ABL session is compiling a project while the project is being run in another session, it is possible for the running application to fail.  For example, this can occur if the signature of a method is changed. If the session which is compiling the project compiles and generates a new r-code file for a class, before the calling class or procedure is recompiled, a runtime error could occur during the execution of the logic which invokes the method.

For this reason, it is best practice to holistically compile a project prior to updating a running application.  In fact, the Progress Application Server for OpenEdge (PASOE) environment supports the ability to easily update an application in production without impacting running agents.  See “Update PROPATH in a production instance with zero downtime” (docs.progress.com/.../Update-PROPATH-in-a-production-instance-with-zero-downtime.html).

Therefore, the decision was made to invest OpenEdge Development resources on other issues as this issue can be avoided by leveraging existing development practices.  However, this discussion highlighted that we need to add clarifications to documentation noting these limitations.

Posted by dbeavon on 01-Feb-2020 00:16

Thanks Evan.  I appreciate this summarization of the support case.  It is helpful for others to hear it straight from Progress, if they ever encounter this.

I do understand the larger points related to the difficulties supporting multiple OS'es, the need to invest your resources wisely, the fact that this is not a recent regression, and especially the explanation that customers should wean themselves away from risky compilation in production.  (We especially appreciate the last point, and that is why we do very large, concurrent compilations of our ABL code as part of an automated build ... the place where we encountered this concurrency problem to begin with).

I know you did spend a lot of time on this issue for me and I appreciate it, especially given the fact that it probably impacts customers infrequently. 

I do have a few comments.  First of all we assume that this is an infrequent issue, but I would submit that we don't know exactly how often this issue bites your customers.  Many of them may still be compiling r-code directly into their production PROPATHs.  The issue causes the AVM-hosting process to unexpectedly crash.  There is not really an approachable error message (like "your session failed because of partially compiled code".)  If there were better errors/symptoms from the AVM then a customer might use that to open support cases.  And you might hear from customers a lot more than you do. As it is now, I would suspect that most customers flail about, recompile, restart, and the problem randomly goes away.  They shrug and move on.

My next comment is that it should be clearly defined how this stuff is supposed to work in the first place.  If you don't define the "right" behavior then of course you can argue that all is well no matter how things work.  IE. If things behave one way in UNIX then that is what is "right" for UNIX.  If they behave a totally different way on Windows then that is "right" for Windows.  I'm reminded of programming-by-coincidence (see https://pragprog.com/the-pragmatic-programmer/extracts/coincidence )

... Fred doesn’t know why the code is failing because he didn’t know why it worked in the first place. It seemed to work, given the limited “testing” that Fred did, but that was just a coincidence.

Finally, my last comment is that you should probably clearly state in the documentation of the COMPILE statement that it isn't fully supported when performed in production (where a PROPATH is shared by active sessions).  There should be a LARGE warning about how it creates risks for other sessions - risks that are created by the behavior of the Progress runtime itself, and they go beyond the changing of custom method signatures in ABL.  I suspect it won't stop anyone from doing what they are doing, but at least they'll know it isn't advisable.

Thanks again for spending time on this.  I think I can do some things to work around the issue for now. 

Posted by Thomas Mercer-Hursh on 01-Feb-2020 15:19

To me, one of the questions that should be asked here is "Should I be doing the thing that is creating the problem?"

Compiling on top of production code seems to me to be fairly obviously risky behavior, something that should probably be avoided.  There are lots of ways in which that could cause a problem, even without the subtleties of this particular interaction.

Multiple parallel compiles seems like something that one might clearly want to do in order to reduce the time required to do a full compile, but seems unnecessary when doing compiles of limited amounts of code.  But, it also sounds like something that has obvious risks so that it would seem desirable to structure any such operation so that the code being compiled was as nearly mutually exclusive as possible.

At least, that's the way it seems to me.   And, if one followed these practices, it would be very unlikely to encounter this problem, which may be why it is not reported more commonly.

Posted by Evan Bleicher on 19-Feb-2020 15:52

In a previous Post I noted that OpenEdge Development would add clarifications to the COMPILE statement documentation.  The following is the proposed text:

The COMPILE statement is not synchronized with any other ABL sessions. As such, compiling individual files in a common code base when applications are using that same code base for either compilation or execution, can cause unpredictable behavior and is not advised. The best practice is to holistically compile a project in one ABL session prior to updating a running application. The Progress Application Server (PAS) for OpenEdge supports the ability to easily update an application in production. For more information, see <insert *Update PROPATH in a production instance with zero downtime* link here>.

Posted by dbeavon on 02-Mar-2020 02:40

Another update on this topic (hopefully the final one).  Our automated builds are now using the "SAVE INTO" option as suggested by @frank.meulblok. So far so good.    

This allows us to avoid the concurrency conflicts (... which Progress is now acknowledging in their documentation as "unpredictable behavior".)

There are only two additional considerations I would mention.  Firstly - none of the ABL "COMPILE" sessions will ever benefit from the R-code produced by *another* session, since the workers are all saving into their own independent directories that are specified by "SAVE INTO".  This allows them to avoid concurrency conflicts, but causes the overall duration of the builds to last a bit longer.  Synchronization is performed at the very end, in order to merge all the results together.  

Secondly it is important to note that the "SAVE INTO" option will *not* create the folder structure to receive compiled outputs when compiling P-code.  But it *will* create the folder structure when compiling OOABL (ie. CLS files).  This was an unfortunate "gotcha".

So when we use the "SAVE INTO" option, we need to proactively create the folders that are going to receive the compiled outputs.
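
Pre-creating the folders could look something like the sketch below. It is a Python illustration of the build-script side only; the function name `prepare_output_dirs` and the example paths are assumptions, and the actual `COMPILE ... SAVE INTO` call would still be issued from the ABL session afterwards.

```python
from pathlib import Path, PureWindowsPath


def prepare_output_dirs(relative_sources, output_root):
    """Create, under output_root, the folder for each relative source path.

    Accepts Windows-style separators such as lkp\\p\\lkp0200.p as well as
    forward slashes. Returns the sorted relative folders that were ensured.
    """
    output_root = Path(output_root)
    ensured = set()
    for src in relative_sources:
        # PureWindowsPath splits on both "\" and "/" regardless of host OS.
        rel_dir = Path(*PureWindowsPath(src).parent.parts)
        (output_root / rel_dir).mkdir(parents=True, exist_ok=True)
        ensured.add(rel_dir.as_posix())
    return sorted(ensured)
```

Running this once per worker output folder, before any compile starts, means the procedure compiles find their target folders already in place.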

Hopefully the gradle plugin will allow for compiling large amounts of ABL/OOABL code in parallel.  Maybe they will be able to learn from this thread, and from the Progress support case as well.  One important lesson we have learned is that Progress needs to test their gradle stuff on Windows as well as Linux.  That will ensure that it will account for the slight differences in the way that the COMPILE statement interacts with each of the two file systems.

This thread is closed