PASOE - intermittent issue ("Error loading the .NET runtime. (14081)")

Posted by dbeavon on 11-Dec-2018 14:48

We are running PASOE in production.  I'm seeing intermittent issues with some .Net interoperation (via clr bridge).  The interop is being used within our ABL code in order to quickly access a remote REST interface.  An error comes up intermittently while running some simple .Net code used by "sessionConnectProc" for session-managed connections.

We do not currently preload any assemblies for use by the CLR.

The Progress error message that is reported within the AVM, via Progress.Lang.Error, is not of much help: Error loading the .NET runtime. (14081)

There are only a few lines of code that reference .Net, and they run during the "sessionConnectProc".  I'm pretty sure I've isolated the problem to this code.

   /* ********************************************************************* */
   /* Call rest                                                             */
   /* ********************************************************************* */
   DEFINE VARIABLE v_HttpClient AS CLASS System.Net.WebClient. 
   v_HttpClient = NEW System.Net.WebClient(). 
   

 

Is there any way to get to the real underlying source of the problem?  I would guess that there are exceptions being thrown internally that are suppressed and replaced with the unhelpful and generic message, "Error loading the .NET runtime. (14081)".  It would help if there were a way to get the LOG-MANAGER to show more information, or to hook into the first-chance details about the underlying errors/exceptions before they are superseded by the very generic one.

We are running PASOE on 11.7.4.  The KB articles that I've found about this are for problems that occurred in prior versions, and/or they are for problems that occur a lot more consistently.  In my case, I would guess I only see the error for 1 out of 1000 times that the WebClient is used.

Any help would be greatly appreciated.

All Replies

Posted by Brian K. Maher on 11-Dec-2018 14:58

David,
 
Use the MS fuslogvw.exe program (must be run as admin) and tell it to log all failures.
 
Brian Maher
Principal Engineer, Technical Support
Progress

Posted by Jeffrey Z. Wolf on 11-Dec-2018 15:13

David,

You might also try the Windows Sysinternals program procmon. I have had an instance where fuslogvw.exe did not show me what assembly component was not being loaded but procmon did.

docs.microsoft.com/.../procmon

Jeffrey

Posted by frank.meulblok on 11-Dec-2018 15:29

I wonder:

Does this always happen on the first request for the session-managed connection?

Or does the first request for the connection succeed, and is it a later request that fails?

If it's a later request that fails, did this particular session move across different threads on the agent?

That should be visible in the agent logs by looking at the P-nnn (agent PID) T-nnn (Thread number)  AS-nnn (session number) portions of the entries. Filter those where P- and AS- portions match a session that ran into the error, see if there are variations in the T- portion.

COM automation breaks when sessions switch threads (knowledgebase.progress.com/.../COM-objects-on-PASOE-raise-CoInitialize-has-not-been-called-error), so it'd be worth probing if .NET objects are affected by something similar.
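
The log check described above could be sketched in ABL roughly as follows. This is only an illustration: the log file name and the P-/AS- values are hypothetical, and it assumes each log line carries space-separated P-nnn, T-nnn and AS-nnn tokens as described. Message text that happens to contain a token starting with "T-" would be picked up too, so treat the result as a hint.

```abl
/* Sketch: collect the distinct T- (thread) tokens logged for one P-/AS- pair. */
DEFINE VARIABLE cLogFile AS CHARACTER NO-UNDO INITIAL "myapp.agent.log". /* hypothetical file name */
DEFINE VARIABLE cPid     AS CHARACTER NO-UNDO INITIAL "P-012345".        /* hypothetical agent PID token */
DEFINE VARIABLE cSession AS CHARACTER NO-UNDO INITIAL "AS-7".            /* hypothetical session token */
DEFINE VARIABLE cLine    AS CHARACTER NO-UNDO.
DEFINE VARIABLE cToken   AS CHARACTER NO-UNDO.
DEFINE VARIABLE cThreads AS CHARACTER NO-UNDO.
DEFINE VARIABLE i        AS INTEGER   NO-UNDO.

INPUT FROM VALUE(cLogFile).
REPEAT:
    IMPORT UNFORMATTED cLine.

    /* Keep only lines that belong to the agent process and session of interest. */
    IF LOOKUP(cPid, cLine, " ") = 0 OR LOOKUP(cSession, cLine, " ") = 0 THEN NEXT.

    /* Remember every thread token not seen before. */
    DO i = 1 TO NUM-ENTRIES(cLine, " "):
        cToken = ENTRY(i, cLine, " ").
        IF cToken BEGINS "T-" AND LOOKUP(cToken, cThreads) = 0 THEN
            cThreads = cThreads + (IF cThreads = "" THEN "" ELSE ",") + cToken.
    END.
END.
INPUT CLOSE.

MESSAGE "Threads seen for" cPid cSession ":" cThreads.
```

More than one entry in the resulting list would suggest that the session did move between threads.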

Posted by dbeavon on 11-Dec-2018 17:19

So it sounds like the problem is most likely related to loading the .net runtime, and related assemblies, right?  I'm hoping it isn't a catch-all error message that could arise for any arbitrary .Net exception (eg, including something in the implementation of the constructor of this class: System.Net.WebClient).

Why isn't there a way to get the .Net exception details?  This error (14081) is not something from the .Net runtime itself.  Troubleshooting is so much more difficult when Progress hides all the inner exception details.  I've noticed that the .Net openclient for appserver does the same thing.  It hides all the root-cause information like inner exceptions and stack traces, and replaces them with some generic message that will take many hours to troubleshoot.  (You almost need to have a debugger attached before you will find the real root cause for any of the unexpected errors).

Posted by goo on 11-Dec-2018 18:26

I got a similar error when trying to run an uninstalled prowc.exe (that works as well), calling a CrystalReport dll. I had no clue why it failed, but did an install of Progress, and after that it worked. I found some talk about CAS and security, but not sure...

Posted by Laura Stern on 11-Dec-2018 18:53

Unfortunately, when there is a problem during the initialization of the .NET bridge (part of the AVM), you get this generic message "Error loading the .NET runtime. (14081)", which is, I believe, very often not the problem!  There are a few different reasons why this can fail (I would have to investigate to know which), but we always just give this one generic message that is extremely misleading.  It is also never about a failure on loading one of the assemblies in the assemblies.xml file.  There are no errors generated when any of those fail to load.  And it may have nothing to do with .NET whatsoever.  There is an outstanding bug that we should modify the code to be able to generate a more specific error message.  It is OCTA-3502, if that helps anything.  I've been wanting to do this for a while, but it currently does not have priority.

Posted by dbeavon on 11-Dec-2018 20:48

@Laura 

Thanks for the reply.  I was afraid you might say something like that.  It sounds like the error could mean a number of things.

Perhaps you can prioritize OCTA-3502 based on the fact that this message is happening in PASOE, and there are times when the outer _mproapsv agent becomes very unreliable in the context of .Net interoperation.

Today I happened to be doing some debugging with the Visual Studio debugger attached to the _mproapsv.exe process (for unrelated reasons), and I noticed a few of these "Error loading the .Net runtime" messages come up in the PASOE agent log while I was debugging.  I had originally suspected that the error was triggered by a first-chance exception on the .Net side of things, but I never encountered any .Net exception for the whole duration of my debug session.  I now suspect that some of the reasons for the message are entirely on the Progress side of the fence, and not related to anything going wrong in .Net.

It seems to me that the message "Error Loading the .Net runtime" doesn't pass the sniff test.  For example, there only seems to be a single appdomain in the entire _mproapsv.exe process.  I use .Net for the most minimal purposes (primarily just to use the WebClient in order to call a few REST methods).  The initial usage of the WebClient takes place in the very first moments of the life of the agent process.  We can see that the appdomain is loaded with the relevant assemblies right away (image omitted).

Given that the appdomain is loaded and initialized many hours (days?) ahead of time, it doesn't make sense for us to be getting the message  "Error loading the .Net runtime".  I suspect the message is trying to describe a different problem where a new ABL session is unable to be hooked up to the pre-existing .Net runtime.

Is there some way to get to the root cause of these error messages?  I no longer believe that the cause is on the .Net side.  It seems to be a problem within the ABL session.  Alternatively, is it likely that this message has a timing-related component to it, and will go away after some number of iterations?  Finally, is there documentation about how to use the CLR bridge in the context of PASOE?  It seems a bit scary that all ABL sessions are being re-directed to use the same appdomain.  An ABL programmer might expect that the CLR calls from one session are isolated from the CLR calls of another session, but that doesn't appear to be the case.

Any additional tips would be greatly appreciated.

Posted by Laura Stern on 11-Dec-2018 21:24

Yes, I agree with everything you said and getting to the root cause of this error is what OCTA-3502 is about.  We are working on prioritizing this to be higher in the queue.  We occasionally get that error message in our test environment, and personally, I find it VERY annoying!

Regarding the AppDomain, yes, we use the default AppDomain and there is only one.  However, most other things internal to the CLR bridge are NOT shared between the sessions.  We DID need to make these changes when we incorporated the CLR Bridge into PASOE.  So I think we are OK with that.  We don't modify the AppDomain or rely on it for anything that would differ between PASOE sessions. And hopefully PASOE ABL code is not mucking with it either!

Posted by dbeavon on 11-Dec-2018 21:45

>> We are working on prioritizing this to be higher in the queue.

So I'm hearing there isn't a way to troubleshoot on my end?  Alternatively, how should I bundle up this issue to send it over to tech support?  It happens so infrequently in production that I will have a very hard time coming up with a consistent reproducible.  I may be able to refactor our use of the CLR bridge so it doesn't happen in the PASOE connect procedures (where it causes users to get error messages that appear almost identical to connectivity or authentication failures).  But before I start refactoring this code and moving it around, I'd like to know where to move it so that it is less likely to raise the error messages.

>> Regarding the AppDomain ...

Any static members in the CLR that an ABL programmer interacts with will affect the other ABL sessions in the same agent process as well.  It is worth documenting this at the very least, since it might be unexpected and unintuitive to a programmer.  In contrast, on the ABL side of things all the static members of ABL classes are isolated within the context of the individual ABL session.

Posted by dbeavon on 11-Dec-2018 23:46

I'm wondering if these error messages from the CLR bridge are somehow correlated to the amount of load that is placed on the PASOE agent.  I really don't remember any occurrences of this issue prior to the recent deployment of a new application.  

Perhaps there is a way to synchronize the calls to the CLR bridge so it becomes less busy.  With some synchronization, the bridge will think there is only one ABL session running in the process at a time.  The REST methods (called via the WebClient) normally take only about five milliseconds but, under heavy load, they might take a bit longer.  I suspect that whenever the PASOE agent is under heavy load, there is a greater possibility of overlapping calls to the CLR bridge.

Hopefully I will be able to find a workaround.  If synchronization is the key, then I should only need to synchronize the ABL sessions within a given agent.  Perhaps I can create a table (CLR_IS_BUSY_RESTING) that has a primary key based on the process ID of the agent.  Prior to using the CLR bridge, I will create/lock a record associated with the process ID.  Then I'll call the REST methods using the CLR bridge.  Then I'll release the record again at the end.

It certainly isn't a "pretty" solution but I don't have any other ideas at the moment.  The REST method calls are pretty critical to our PASOE connection procedure.  Hopefully it won't add more than one additional millisecond to do the synchronization. It should still be faster than using OpenEdge.Net.HTTP for these REST methods.  (I use that API in other places but our PASOE connection procedures require really fast performance that I can't seem to get without using the .Net WebClient).

Posted by Laura Stern on 12-Dec-2018 15:33

Regarding reporting this to tech support, you really don't need to provide a reproducible to tech support in order to improve the error message.  As I said, there is already a bug and we also have a fairly high-priority "feature" to improve some of our error messages in order to avoid tech support calls.  This fits right into that category!  So the call to TS would really be to argue for increasing the priority.  But of course, if you did have a way to reproduce, that would be very helpful in diagnosing your actual problem!

In regards to trying to synchronize things on your own, I think you are going down a difficult and dangerous path!  I would not recommend it.  I will reiterate: the CLR Bridge really does not share any data between the session threads.  There is a separate AssemblyStore for each thread and as far as we know, anything that stores data has its own instance for each thread.  The shared AppDomain should not affect anything.  We do not interact with it/modify anything in it.

Regarding the load factor: The error that started this whole discussion (Error loading the .NET runtime. (14081)) happens when we first need to initialize the CLR Bridge and that happens on the first call to anything in .NET.  Are you surmising that it could fail when one session is first initializing the Bridge and another one has already initialized but is now trying to make some .NET call?  I really can't comment on that.  I can't offhand see why this would be a problem.  But I can't say with any certainty that it isn't related.

Have you tried using -preloadCLR?  Maybe that would help.  That will cause the initialization to occur when the session starts up.

Posted by dbeavon on 12-Dec-2018 16:10

>> The error that started this whole discussion (Error loading the .NET runtime. (14081)) happens when we first need to initialize the CLR Bridge and that happens on the first call to anything in .NET

When you say the "first call to anything in .NET", then are you referring to the first call in the entire life of the agent process?  That doesn't seem relevant.  That had happened many days ago and there have been many tens of thousands of calls to .Net methods since then.  But it appears that we are still encountering the same intermittent message in the logs "Error loading the .NET runtime" in association with that same MS-agent process ID.  We churn thru the individual ABL sessions daily, because they are restarted on a regular basis (eg. when they become idle, are trimmed, and then are started when needed again ).  But the outer agent process has been running over the course of several days.

Given that the appdomain was loaded and initialized many days ago, it doesn't make sense for us to be getting the message  "Error loading the .Net runtime".  I suspect the message is trying to describe a different problem which is ABL-session-specific.  I still think that the new ABL sessions might be having trouble hooking into the pre-existing .Net runtime.

As far as the load factor goes, I'm wondering if there is contention between ABL sessions as they try to hook into the pre-existing runtime.  The more rapidly the ABL sessions are started, the more likely we might encounter errors?

Will the parameter which you referenced, -preloadCLR, affect the behavior of all the individual ABL sessions, or does it only impact the outer agent process (by initializing the CLR app domain on a one-time basis)?  That parameter was referenced in the forums earlier, and the developer said it didn't change matters and they continued to see this same message ( see community.progress.com/.../34331 )

I have opened a tech support case on this, given that I'm a long way from finding the root cause on my own.  I'd like to at least have a work-around that prevents these errors as much as possible, since I think they are disruptive when a few of our users encounter them each day.  My tech support engineer wants me to supply a consistent reproducible and I still don't have one.  Do you have any clues about how I might create a reproducible, even an artificial one?  I was going to focus on my theory that this is related to a synchronization problem, but you don't seem convinced.  Based on your experiences of this message, did you have any theories about how to recreate it on demand?

As an aside, I also opened another tech support case (00470446) about a substantial memory leak in the ms-agent process that seems to be related to the CLR bridge.  It is pretty clear that there is a memory problem, given that the CLR managed memory dump can be opened in the VS debugger and we can see hundreds of rooted references (rooted via Progress.ClrBridge.ProMarshal).  Currently we are killing agent processes only once a week.  But as we migrate more of our applications from "classic" to PASOE, we are probably going to need to do that daily.

Posted by ske on 12-Dec-2018 16:29

> Do you have any clues about how I might create a reproducible, even an artificial one?

See if you can provoke the error to occur more often, by any means imaginable?

Maybe add some procedure that makes very many calls to the CLR bridge.

Posted by Laura Stern on 12-Dec-2018 16:45

>>>When you say the "first call to anything in .NET", then are you referring to the first call in the entire life of the agent process?  

I mean the first call from a particular session, not the whole agent process.

>>>Given that the appdomain was loaded and initialized many days ago, it doesn't make sense for us to be getting the message  "Error loading the .Net runtime".  I suspect the message is trying to describe a different problem which is ABL-session-specific.

Yes, that's what I already surmised - that it is probably not about the .NET runtime at all.

>>>As far as the load factor goes, I'm wondering if there is contention between ABL sessions as they try to hook into the pre-existing runtime.  The more rapidly the ABL sessions are started, the more likely we might encounter errors?

We really can't answer this until we know what the problem is.  If we could get a better error message, we would at least have some clue!

>>>Will the parameter which you referenced, -preloadCLR, affect the behavior of all the individual ABL sessions, or does it only impact the outer agent process (by initializing the CLR app domain on a one-time basis)?  That parameter was referenced in the forums earlier, and the developer said it didn't change matters and they continued to see this same message ( see community.progress.com/.../34331 )

This would affect the behavior of each ABL session, i.e., when a session starts up, it would do this initialization.  However, after the first session, the .NET framework would already be loaded into the process, so it would only be other internal initialization that would occur.  The loading of the framework happens kind of automagically!  We don't actually have code that does it.  We just call into the CLR bridge, and voila, there it is (or not!).

>>> My tech support engineer wants me to supply a consistent reproducible and I still don't have one.  Do you have any clues about how I might create a reproducible, even an artificial one?  

I'm sorry, but I really don't.  So I think the first step is to prioritize getting that message fixed.  If the TSE balks, you can tell him/her to talk to me!  

>>>As a side, I also opened another tech support case (00470446) about a substantial memory leak in the ms-agent process that seems to be related to the CLR bridge.  ...

What version are you on?  We just fixed 3 different .NET-related memory leak issues in 11.7.4 and an 11.7.3 hot fix.  Could be one of these.  Or something else?? :-(

Posted by dbeavon on 12-Dec-2018 17:15

We're running 11.7.4 already.  Thankfully the memory issue will be an easy one to reproduce and submit, unlike the message "Error loading the .NET runtime".

Posted by frank.meulblok on 13-Dec-2018 08:39

Well, if the .NET code is only used in the sessionConnectProc, setting up a set of session-managed clients that do nothing more than connect and disconnect as fast as they can should work to put the stress on that code. And that should increase the odds of triggering the issue if it's tied to number or frequency of invocations of the procedure.
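
A connect/disconnect stress client along those lines might look like the sketch below. The URL, port and loop count are assumptions for illustration, not values from this thread. Running several copies of it at once (for example from separate batch sessions) would be needed to put concurrent load on the sessionConnectProc.

```abl
/* Sketch: hammer the session-managed connect procedure by connecting and disconnecting in a loop. */
DEFINE VARIABLE hServer AS HANDLE  NO-UNDO.
DEFINE VARIABLE i       AS INTEGER NO-UNDO.
DEFINE VARIABLE lOk     AS LOGICAL NO-UNDO.

CREATE SERVER hServer.

DO i = 1 TO 1000: /* arbitrary iteration count */
    /* Hypothetical PASOE APSV URL; each connect runs the sessionConnectProc on the agent. */
    lOk = hServer:CONNECT("-URL http://localhost:8810/apsv -sessionModel Session-managed") NO-ERROR.

    IF NOT lOk OR ERROR-STATUS:ERROR THEN
        MESSAGE "Connect" i "failed:" ERROR-STATUS:GET-MESSAGE(1).

    IF hServer:CONNECTED() THEN
        hServer:DISCONNECT().
END.

DELETE OBJECT hServer.
```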

Posted by dbeavon on 14-Dec-2018 17:28

Here is a quick update on this.  I was able to find a way to recreate that error message and I sent it over to Progress tech support.  Hopefully my reproducible is accurately demonstrating the same type of circumstances that affect us in production.  (The repro involves starting up more than one session that both try to use the CLR bridge at the exact same time, in the same agent process).

The parameter, -preloadCLR, didn't seem to fix the problem.  It may have made it worse.  

Once I had a repro, I was able to put synchronization control around the section of logic that interacts with the CLR bridge.  I prevented multiple sessions from running that same section of code at the same time in the same agent.  This seems to make the problem go away.  My strategy uses record locks in a remote client/server database, so it slows things down a *lot* more than I would like.  (It appears that if two client/server sessions both want a lock on the same record at the same time, then one of them is forced to wait an extra 2 seconds for the record to unlock before trying again.  I don't know if there is any way to control that 2-second overhead.)

Posted by Laura Stern on 14-Dec-2018 21:32

Glad you got a reproducible!

Posted by bernhardkraml on 18-Dec-2018 09:34

Maybe I can add some information to this .NET mystery.

In our application (more precisely, an application from a software vendor, for which I did some customization to produce PDFs with the .NET library from https://www.dynamicpdf.com/), some of the users also get this error message.

The error occurs only for users working on a Windows Terminal Server.

.NET works fine for some time, then one user hits this error.  All sessions started before the first error have no problem; all sessions started after the first error have the problem.

The only workaround at the moment is to restart the terminal server.
 
-bernhard
 
Posted by dbeavon on 18-Dec-2018 14:06

Bernhard,    It sounds like a combination of two or more different bugs.  These terminal server users don't share the same process, right?  It is odd that your users would have a problem that crosses process boundaries.  The problem might be *initially* triggered by an intermittent issue with the Progress CLR bridge.  But if I had to guess, the *subsequent* failures within the *other* processes are probably *not* something that you can blame on Progress.  They may be related to a shared resource (a file on disk) or something along those lines.

You might want to try running "process monitor" to see if the third-party tools (or your own code) write to a hard-coded path (e.g. in a temp directory or such) that causes all of the processes on the same server to start "stepping on one another" after they have been triggered by the initial failure.

Another thing to do is put CATCH Progress.Lang.Error blocks around small-ish sub-blocks of code so you know exactly which step is causing the failures.
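
As a rough sketch of that idea, each .NET step can get its own block and its own CATCH, so that the log entry names the step that failed. The URL and the step labels are hypothetical, and LOG-MANAGER:WRITE-MESSAGE assumes the session has a client log open.

```abl
/* Sketch only: one small block per .NET step, each with its own CATCH. */
DEFINE VARIABLE oClient AS CLASS System.Net.WebClient NO-UNDO.
DEFINE VARIABLE cReply  AS CHARACTER NO-UNDO.
DEFINE VARIABLE cUrl    AS CHARACTER NO-UNDO INITIAL "http://localhost:8080/ping". /* hypothetical endpoint */

/* Step 1: create the .NET object. */
DO ON ERROR UNDO, LEAVE:
    oClient = NEW System.Net.WebClient().

    CATCH e1 AS Progress.Lang.Error:
        LOG-MANAGER:WRITE-MESSAGE(SUBSTITUTE("Step 1 (NEW WebClient) failed: &1 (&2)",
                                             e1:GetMessage(1), STRING(e1:GetMessageNum(1)))).
    END CATCH.
END.

/* Step 2: make the REST call, only if step 1 produced an object. */
IF VALID-OBJECT(oClient) THEN
DO ON ERROR UNDO, LEAVE:
    cReply = oClient:DownloadString(cUrl).

    CATCH e2 AS Progress.Lang.Error:
        LOG-MANAGER:WRITE-MESSAGE(SUBSTITUTE("Step 2 (DownloadString) failed: &1 (&2)",
                                             e2:GetMessage(1), STRING(e2:GetMessageNum(1)))).
    END CATCH.
    FINALLY:
        oClient:Dispose(). /* release the underlying .NET resources */
    END FINALLY.
END.
```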

Finally, once the problems are initially triggered, it would be fairly easy to get to the bottom of the *subsequent* failures by installing Visual Studio and attaching the debugger.  Visual Studio will provide all the "first-chance" exception details.  Alternatively, you could configure procdump to generate memory dumps that you could analyze later on a development machine (see docs.microsoft.com/.../procdump ).  I'd suggest using the -e 1 option to get dumps from first-chance exceptions.

Hope this helps, David

This thread is closed