Posted by jamesmc on 18-Apr-2018 09:56


I am curious whether someone could explain some behaviour I saw recently.  We have two data centres, one in the UK and one in the States.  The AIX server I use is in the UK data centre and has two name server entries in its resolv.conf file: one for a DNS server in the UK data centre and one for a DNS server in the US data centre.  We had a line issue the other day that cut us off from the US data centre.  That should not have been a big problem, as we don't tend to rely on anything hosted at the remote site (in either direction).  Once the line went down, though, all interaction with the appservers and the admin server stopped working.  After a period of time the admserv.log file was producing "System generated password has expired (9908)" errors, and interactively the command line tools were producing "Login denied, check username and password." errors.

Once the line came back up, these tools started behaving themselves again.  Throughout the outage I was testing DNS lookups on the AIX box because the issues felt network related (unreachable hosts, timeouts and so on), but lookups were still working fine: DNS queries were being satisfied by the name server in the UK.  Nothing on this host uses any resource located off this server.  Everything that needs a hostname uses localhost, and even where the real hostname is used we have an entry for it in the hosts file, and netsvc.conf shows that we use local resources first, "hosts=local,bind".

I can only surmise that, even though a name server was available to resolve lookups, the OpenEdge tools were attempting to use the name server at the IP address that was unreachable because of the line outage.  Could this be the case?
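
One way to test this theory would be to time hostname lookups from the AIX box while the remote name server is unreachable.  The following is only a rough sketch and assumes Python 3 is available on the host; the function name time_lookup is made up for illustration and nothing here is part of OpenEdge.  It only shows how long the system resolver takes for a given name, not what the OpenEdge tools themselves do.

```python
import socket
import time


def time_lookup(name):
    """Return (elapsed_seconds, result) for one forward lookup of name.

    result is the resolved IPv4 address, or a short error string if the
    system resolver could not resolve the name.
    """
    start = time.monotonic()
    try:
        result = socket.gethostbyname(name)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        result = "lookup failed: %s" % exc
    return time.monotonic() - start, result


if __name__ == "__main__":
    # Check localhost and this machine's own hostname, since those are
    # the names the post says everything on the host uses.
    for name in ("localhost", socket.gethostname()):
        elapsed, result = time_lookup(name)
        print("%-30s %-40s %.3fs" % (name, result, elapsed))
```

A lookup that takes several seconds instead of milliseconds would suggest the resolver is waiting on an unreachable name server before it gets an answer.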

AIX 7.1 / OpenEdge 10.2B08


