Asked by batesit

Bind not forwarding requests to root or forwarder.

This is a new install of openSUSE 42.1. I built two boxes, but debugging is focused on the master. The bind version is 9.9.6P1-25.2 (9.9.6P1-33.1). I am also using webmin, 8.801-1, for management. Bind is configured as authoritative for our zones. The local firewall is turned off until this is debugged, and I downloaded the latest hints with webmin. Local network clients can use nslookup to discover the IPs of entries in our zones. When I try an external name from Windows, the response is "query refused." On the DNS server itself, nslookup returns "servfail." This happens whether bind is configured to look at the root servers or to forward requests to the old server I am replacing. When I changed yast to look at the old server instead of 127.0.0.1, nslookup was successful. Since nslookup works when the internal bind is not used, my conclusion is that this is not a firewall issue but some sort of permission problem when asking bind to forward requests to root servers or forwarders.
The configuration file below starts with the address ranges I expect to limit external queries to, but this is not yet implemented. The file is currently configured by webmin to use our old DNS server as a forwarder. named.conf.include seems to be empty. I have removed most of our zones from this listing. I tried enabling logging, but something isn't working. I tried making changes by hand, but found that webmin complains unless it is used to make the changes. I have also removed nearly all of the comments before posting here.

acl college {
      172.16.0.0/16;
      "our public range";
      };
options {
      # The directory statement defines the name server's working directory
      directory "/var/lib/named";
      #dnssec-validation auto;
      managed-keys-directory "/var/lib/named/dyn/";
      dump-file "/var/log/named_dump.db";
      statistics-file "/var/log/named.stats";
      listen-on-v6 { any; };
      # The next three statements may be needed if a firewall stands between
      # the local server and the internet.
      #query-source address * port 53;
      #transfer-source * port 53;
      #notify-source * port 53;
      notify yes;
    disable-empty-zone "1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.IP6.ARPA";
      also-notify {
            };
      recursion yes;
      forwarders {
            "IP of old DNS server";
            };
};
zone "." in {
      type hint;
      file "root.hint";
};
zone "localhost" in {
      type master;
      file "localhost.zone";
};
zone "0.0.127.in-addr.arpa" in {
      type master;
      file "127.0.0.zone";
};
zone "0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa" in {
      type master;
      file "127.0.0.zone";
};
include "/etc/named.conf.include";
server "public IP of server"{
      };
zone "1.16.172.in-addr.arpa" {
      type master;
      file "/var/lib/named/master/172.16.1.rev";
      };
logging {
      channel namedlog {
            file "/var/lib/named/log/namedlog";
            severity info;
            print-category yes;
            print-severity yes;
            print-time yes;
            };
      };
ASKER CERTIFIED SOLUTION
gheist
This solution is only available to members.
arnold
I would advise against forwarding everything, as it will wreak havoc when the old server is retired.
Forwarders also reduce the efficiency of caching.
Check whether your new server is authorized to query your old server.

From the new server (this is also how you can build the root.hint file):
dig @oldserver . NS

Why do you see a need to forward?
Forwarding makes sense for a specific zone, e.g. one served over a VPN.
The query log should help in the beginning.
Check the log (probably /var/log/messages) and see if named is even starting up correctly. Stop and start it, then check the log. If you see any zones not starting, etc then that gives you a direction. I often have 2 windows open when I'm troubleshooting-- one with tail -f on the log, the other to restart the service.
named-checkconf, sir

batesit (ASKER)

It appears my comments from yesterday weren't registered. gheist got me most of the way. Thanks for getting me this far.
All of my tests have been with IPv4 even though only IPv6 is explicitly listed.
After manually editing to add allow-recursion for all of our networks, allow-query {any;};, and making sure the firewall stayed off during reboots, I am able to do lookups through the root servers from the DNS server itself. Tests from the server failed until I added 127.0.0.1. Dig is now able to list the yahoo servers.
The message from Windows clients has changed to "DNS request timed out" when querying external hosts. Local zone lookups still work fine. If the server is configured to use a forwarder, everything appears to work.
named-checkconf didn't report anything.
It appears the only thing left to fix is recursive queries from our network when forwarders are not used.
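
For readers following along, the manual edit described above might look roughly like the fragment below. This is only a sketch, not the asker's actual file: the acl name college and the 172.16.0.0/16 range come from the configuration posted earlier, the public range stays a placeholder, and putting 127.0.0.1 into the acl (rather than directly into allow-recursion) is an assumption.

```
acl college {
      172.16.0.0/16;
      127.0.0.1;             # lookups from the server itself failed until this was added
      # "our public range";  # placeholder from the original post, replace with the real prefix
};
options {
      recursion yes;
      allow-query { any; };            # anyone may query the zones we are authoritative for
      allow-recursion { college; };    # only our own networks get recursive answers
};
```

Only the statements relevant to the "query refused" symptom are shown. In a real named.conf they would be merged into the existing options block rather than added as a second one.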

batesit (ASKER)

For logging I configured in webmin a file /var/lib/named/log/namedlog and set it to minimum message level of info. Log category, log severity, and log time are set to yes. It is owned by root but I enabled read and write permissions for owner, group, and others. It now totals 2 bytes and shows no readable characters when opened. I would think that with the number of changes made to the configuration and the number of lookup failures, let alone the hangs from syntax errors and the reboots, there should be something in this file.

batesit (ASKER)

Once again my post didn't save.
I got logging to work by changing named.conf to end with:
logging {
      channel namedlog {
            file "/var/lib/named/log/namedlog";
            severity info;
            print-category yes;
            print-severity yes;
            print-time yes;
            };
      category default {
            namedlog;
            };
      };
Now it's filled with lame-servers (network unreachable) entries. They all seem to have IPv6 addresses. My guess is that nslookup in Windows gives up while bind is waiting for a response from all of these queries to IPv6 addresses. I disabled IPv6 in yast with no change. I found a recommendation to use OPTIONS="-4" to disable IPv6 in bind at startup. This caused bind to shut down. Any suggestions?
The only thing I can think of if I can't come up with a clean way to tell bind to ignore IPv6 would be to edit out all of the AAAA records in root.hints, but if I do that, the next time root.hints is updated, we will be broken again.

Make sure lookups for your private IP space do not leak out by defining zones for these ranges:
192.168.0.0/16
10.0.0.0/8
172.16.0.0/12

This should eliminate lame-server errors when resolving names such as 0.0.168.192.in-addr.arpa.
It is hard to say more without an example of a log entry.

batesit (ASKER)

Good point, but I'm not getting errors in any of those zones. As I said, they seem to be IPv6 errors. My experiments messed things up so I had to reload the operating system. Here is the log after getting it back up this morning. I really think this is a case of bind trying to use IPv6 to do its lookups and thus causing timeouts, but I don't know how to tell bind to quit using IPv6.

05-Jul-2016 10:50:09.294 lame-servers: info: error (address not available) resolving 'b0.org.afilias-nst.org/A/IN': 2001:500:3::42#53
05-Jul-2016 10:50:09.294 lame-servers: info: error (address not available) resolving 'b2.org.afilias-nst.org/A/IN': 2001:500:3::42#53
05-Jul-2016 10:50:09.294 lame-servers: info: error (address not available) resolving './NS/IN': 2001:500:3::42#53
05-Jul-2016 10:50:09.294 lame-servers: info: error (address not available) resolving 'b0.org.afilias-nst.org/AAAA/IN': 2001:500:3::42#53
05-Jul-2016 10:50:09.294 lame-servers: info: error (address not available) resolving 'c0.org.afilias-nst.info/A/IN': 2001:500:3::42#53
05-Jul-2016 10:50:09.294 lame-servers: info: error (address not available) resolving 'b2.org.afilias-nst.org/AAAA/IN': 2001:500:3::42#53
05-Jul-2016 10:50:09.294 lame-servers: info: error (address not available) resolving 'c0.org.afilias-nst.info/AAAA/IN': 2001:500:3::42#53
05-Jul-2016 10:50:09.294 lame-servers: info: error (address not available) resolving 'd0.org.afilias-nst.org/A/IN': 2001:500:3::42#53
05-Jul-2016 10:50:09.294 lame-servers: info: error (address not available) resolving 'd0.org.afilias-nst.org/AAAA/IN': 2001:500:3::42#53
05-Jul-2016 10:50:09.295 lame-servers: info: error (address not available) resolving 'a2.org.afilias-nst.info/AAAA/IN': 2001:7fe::53#53
05-Jul-2016 10:50:09.295 lame-servers: info: error (address not available) resolving 'a2.org.afilias-nst.info/AAAA/IN': 2001:503:c27::2:30#53
05-Jul-2016 10:50:10.894 lame-servers: info: error (address not available) resolving './NS/IN': 2001:500:2d::d#53
05-Jul-2016 10:50:12.495 lame-servers: info: error (address not available) resolving './NS/IN': 2001:500:1::803f:235#53
05-Jul-2016 10:50:15.696 lame-servers: info: error (address not available) resolving './NS/IN': 2001:7fe::53#53
05-Jul-2016 10:50:15.697 lame-servers: info: error (address not available) resolving './NS/IN': 2001:503:c27::2:30#53
05-Jul-2016 10:50:17.297 lame-servers: info: error (address not available) resolving './NS/IN': 2001:503:ba3e::2:30#53

Before the re-install I was getting messages like this. I suspect the "network unreachable" instead of the later "address not available" message was caused by subsequently unchecking the IPv6 enable box in yast.
30-Jun-2016 12:38:02.829 lame-servers: info: error (network unreachable) resolving 'a10-128.akadns.org/A/IN': 2001:500:f::1#53
30-Jun-2016 12:38:02.829 lame-servers: info: error (network unreachable) resolving 'a5-130.akadns.org/A/IN': 2001:500:f::1#53
30-Jun-2016 12:38:02.829 lame-servers: info: error (network unreachable) resolving 'a28-129.akadns.org/A/IN': 2001:500:f::1#53
30-Jun-2016 12:38:02.829 lame-servers: info: error (network unreachable) resolving 'a10-128.akadns.org/AAAA/IN': 2001:500:f::1#53
30-Jun-2016 12:38:02.829 lame-servers: info: error (network unreachable) resolving 'a13-130.akadns.org/A/IN': 2001:500:f::1#53
30-Jun-2016 12:38:02.829 lame-servers: info: error (network unreachable) resolving 'a28-129.akadns.org/AAAA/IN': 2001:500:f::1#53
30-Jun-2016 12:38:02.829 lame-servers: info: error (network unreachable) resolving 'a4-131.akadns.org/A/IN': 2001:500:f::1#53
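
To check the suspicion that every failing lookup is aimed at an IPv6 server, the log can be summarised with a few lines of shell. This is a rough sketch that assumes the log layout shown in the entries above, with the target address as the last field in addr#port form. The function name summarise_lame is made up for this example.

```shell
#!/bin/sh
# Count lame-servers entries by the address family of the server
# that named was trying to reach.
summarise_lame() {
    # $1: path to the named log file
    grep 'lame-servers:' "$1" | awk '
        {
            addr = $NF
            sub(/#[0-9]+$/, "", addr)   # strip the trailing "#53" port suffix
            if (addr ~ /:/) v6++        # a colon means an IPv6 literal
            else v4++
        }
        END { printf "ipv6=%d ipv4=%d\n", v6, v4 }'
}
```

It could be run as summarise_lame /var/lib/named/log/namedlog. If the ipv4 count stays at zero, the errors are confined to IPv6 targets.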

It looks as though it is attempting IPv6 connections. Does your WAN interface have an IPv6 address? Try disabling IPv6 as well as updating your root.hints.

dig @a.root-servers.net . NS
Make sure everything it lists is an IPv4 address.

batesit (ASKER)

The response to "dig @a.root-servers.net . NS" is
; (1 server found)
;; global options: +cmd
;; connectivity timed out; no servers could be reached

As I indicated, I have disabled IPv6 in yast. We are not set up for IPv6 at this time, so there is no way for an IPv6 packet to be routed. I don't know how to disable IPv6 lookups by bind. (I suspect this would solve the entire issue.) Since the rebuild I have not updated root hints. When I did a search for root.hints I couldn't find it.

This morning I was able to get a single successful lookup of www.google.com with nslookup in Windows after two timeouts. All other lookups failed. Here are the corresponding log entries.

06-Jul-2016 09:06:19.096 edns-disabled: info: success resolving 'www.google.com/A' (in 'google.com'?) after reducing the advertised EDNS UDP packet size to 512 octets
06-Jul-2016 09:06:19.236 edns-disabled: info: success resolving './NS' (in '.'?) after reducing the advertised EDNS UDP packet size to 512 octets
06-Jul-2016 09:06:19.346 lame-servers: info: error (address not available) resolving 'apple.wa-k20.net/A/IN': 2001:503:231d::2:30#53
06-Jul-2016 09:06:19.346 lame-servers: info: error (address not available) resolving 'apple.wa-k20.net/A/IN': 2001:503:a83e::2:30#53
06-Jul-2016 09:07:01.266 security: info: client 85.131.138.142#37932 (067.cz): query (cache) '067.cz/ANY/IN' denied

batesit (ASKER)

I found how to turn off IPv6 requests. Add NAMED_ARGS=" -4" to /etc/sysconfig/named.

Unfortunately that didn't fix everything. Now the log has entries like this.
edns-disabled: info: success resolving 'ns1-99.akam.net/A' (in 'akam.net'?) after reducing the advertised EDNS UDP packet size to 512 octets

It's starting to look like our ASA firewall is blocking DNS requests larger than 512 bytes. I'm still getting timeouts from nslookup but I can see progress. I will check what can be done from both directions since I don't care whether I enable Extended DNS with the ASA or disable EDNS on the server.

Make sure your firewall allows both UDP and TCP on port 53. When a response is too large for UDP, DNS retries the query over TCP.
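
One way to see which of these paths the firewall lets through is to ask the same question three times with dig: over plain UDP without EDNS, over UDP with a large advertised EDNS buffer, and over TCP. The helper below is a sketch only. The name probe_dns and the DIG override are inventions for this example, and the dig options used (+noedns, +bufsize, +dnssec, +ignore, +notcp, +tcp) are assumed to be supported by the installed dig.

```shell
#!/bin/sh
# Ask one server the same question over three transports.
# Set DIG=echo to preview the commands without sending anything.
probe_dns() {
    # $1: server to query, $2: name, $3: record type
    dig_cmd=${DIG:-dig}
    # Plain UDP, no EDNS: answers are capped at 512 bytes; +ignore stops
    # dig from silently retrying over TCP when the answer is truncated.
    "$dig_cmd" "@$1" "$2" "$3" +noedns +ignore +notcp
    # UDP with EDNS and a 4096-byte buffer; +dnssec asks for signatures,
    # which makes a response larger than 512 bytes more likely.
    "$dig_cmd" "@$1" "$2" "$3" +bufsize=4096 +dnssec +ignore +notcp
    # The same question over TCP.
    "$dig_cmd" "@$1" "$2" "$3" +tcp
}
```

For example, probe_dns a.root-servers.net . NS repeats the query suggested earlier in the thread. If the first and third queries are answered but the second times out, something on the path is probably dropping large UDP responses.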

batesit (ASKER)

Just before going home last night I found this link: https://supportforums.cisco.com/discussion/10880076/cisco-pix-asa-and-dnssec-problem-approaching-may-5th

I changed the entry in DNS inspection on the ASA from:

policy-map type inspect dns preset_dns_map
 parameters
  message-length maximum 512

to:

policy-map type inspect dns preset_dns_map
 parameters
  message-length maximum client auto
  message-length maximum 4096

Now every nslookup from my Windows client has worked. Thanks for all the help.

batesit (ASKER)

This turned out to be more involved than expected and required changes in three different areas. Gheist got me through the first hurdle and I appreciate not being dropped until all were completed.

You must switch off ASA DNS inspection if you intend to use DNSSEC/EDNS0, because the ASA is defective.
Even with a 512-byte EDNS0 limit, the defective ASA will drop some packets. You can set dnssec-enable no; to stop bind from trying DNSSEC. (Who said the ASA was for security? It will also strip SMTP STARTTLS the same way.)