Correctly interpreting RNDC Stats

asked by kapshure:

I've got a CentOS 5.5 box running BIND 9.3.6, and I need to find out how many DNS queries this system has served over a period of time. I know about "rndc stats", and I've also read that the stats it generates cover the total time the process has been running (correct?).

What I don't see in these stats is information broken down by record type. That isn't a big deal, but what I do need is the total number of DNS queries for, say, a month, as we're looking to outsource DNS.

Here is the "rndc stats" output:

+++ Statistics Dump +++ (1289517741)
success 154162821      
referral 295177
nxrrset 30021771
nxdomain 9025909
recursion 0
failure 10616
--- Statistics Dump --- (1289517741)


I guess the "success" line is what I'm most interested in, so I need to know whether this count is for a given window or since the process was last started. I can see in my shell history that there was a "service named restart" a few hundred commands back... ha! But how do I tell WHEN that was?

thanks team.
arnold replied:

The data covers the period since named was started.

ps -ef | grep named

rndc status will tell you how long the process has been running.

The rest of the data is explained in the BIND documentation.
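If you just want the start time, the process table has it; a minimal sketch using standard procps options (it falls back to the current shell so the commands still demonstrate the flags when named isn't running):

```shell
# Show when a process started and how long it has been up.
# lstart and etime are standard procps output columns.
pid=$(pgrep -x named | head -n 1)
[ -n "$pid" ] || pid=$$   # fall back to the current shell for demonstration
ps -o lstart=,etime= -p "$pid"
```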
kapshure:
rndc status doesn't seem to tell me the duration of the process:

/usr/sbin/rndc status
number of zones: 21
debug level: 0
xfers running: 0
xfers deferred: 0
soa queries in progress: 0
query logging is OFF
recursive clients: 0/1000
tcp clients: 0/100
server is up and running


But doesn't this indicate that the process has been running since Sep 29?

named     4334     1  0 Sep29 ?        07:01:07 /usr/sbin/named -u named -t /var/named/chroot


arnold: (solution available to members only)

Thanks a ton!
@arnold --

I was told that the DNS stats I gathered are incorrect, and that perhaps the cache hadn't been flushed when the last named restart occurred, in this case on Sep 29th? Do I need to do an rndc reload to clear this? Or, to get fresh stats, do a named restart and then an rndc reload?
rndc reload tells named to reread named.conf and recheck the zones.
I do not believe it will clear the cache.
service named restart will stop/start named, which should start reflecting the new stats going forward.
The cache is memory-based, so when named is restarted the cache is not retained.

rndc flush should clear the cache (rndc dumpdb -cache only writes the cache out to named_dump.db; it does not purge it). Neither purges the statistics data.
rndc stats should report a timestamp and the relevant statistics.

You could look into BIND data collection using SNMP.

Ok thanks.
You could also use the output from rndc stats, since it has the timestamp of when the command ran as well as the current counts. By comparing the current results to the prior ones, you can determine the change in each category over the elapsed time since the prior check.
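That comparison can be sketched in shell, assuming you save a snapshot of the stats dump after each run (the file names and the choice of the "success" field are just for illustration):

```shell
#!/bin/sh
# Compare two rndc stats dumps and report the change in the "success" count.
# Each dump begins with: +++ Statistics Dump +++ (<epoch seconds>)

stat_field() { awk -v f="$2" '$1 == f { print $2; exit }' "$1"; }
stamp()      { awk '/^\+\+\+ Statistics Dump/ { gsub(/[()]/, "", $NF); print $NF; exit }' "$1"; }

delta() {  # usage: delta old_dump new_dump
    q=$(( $(stat_field "$2" success) - $(stat_field "$1" success) ))
    t=$(( $(stamp "$2") - $(stamp "$1") ))
    echo "$q queries in $t seconds (~$(( q / t )) qps)"
}
```

Run rndc stats from cron, keep dated copies of the stats file, and call, e.g., delta yesterday.txt today.txt.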
But look above: the rndc stats output I posted doesn't contain that info. Problem is, the director of IT doesn't believe the stats that were generated from this output.

At the time I didn't know whether a named restart clears the cache, and that's what he asked me to look into. I looked in a few other places, but wanted to ask you.

I'm starting to wonder if I should just bounce named again, let the stats build for a week, and crunch numbers from there.
rndc status reports on authoritative zone data and the current state of named, not historic information.

rndc stats
+++ Statistics Dump +++ (1289517741)

The number in parentheses is the Unix timestamp (seconds since the epoch, January 1st 1970 GMT).

perl -e 'print scalar localtime(1289517741), "\n";'

will output the human-readable date of when it ran.
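If GNU date is available, the same conversion works without perl:

```shell
# Convert the epoch timestamp from the dump header to a readable date (UTC)
date -u -d @1289517741
# → Thu Nov 11 23:22:21 UTC 2010
```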

Do you have a single internal DNS server on which all other systems rely?
What is the issue being explored through these stats?
Also check /var/named to see whether there are stats files there alongside the zone files.
This data came from our primary; we do have a secondary.

We are just looking to outsource DNS and were hoping to get some insight into how many DNS queries we are serving. But the number of queries it showed for a 44-day period looked wrong: it worked out to something like 163 DNS queries per visitor, or 20 DNS queries per page view.

We had one stats file in a /data folder. We are chrooting BIND, and I honestly try NOT to touch this box. :)
The outsourcing part, I gather, means outsourcing the management of the 14-18 domains for which your server is authoritative?
The frequency of queries for these domains depends on the TTL/minimum/expiry setup. These settings control how long a lookup remains valid in other caching servers; if you reduce the TTL/minimum, your DNS will see many more requests. A TTL and minimum set to 60 seconds will see many more requests than when the TTL/minimum is set to 8400 seconds (one day).
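For reference, these values live at the top of each zone file; a hypothetical zone header (example.com and the server names are placeholders):

```
$TTL 86400       ; default TTL for records in this zone (one day)
@  IN  SOA  ns1.example.com. hostmaster.example.com. (
       2010111101 ; serial
       10800      ; refresh
       3600       ; retry
       604800     ; expire
       86400 )    ; minimum / negative-caching TTL
```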

Presumably the firm to whom this is transferred will charge based on the number of domains, and perhaps on how many and how frequent the included changes are.

If you want to minimize/limit DDoS-type attacks against your server, you can find a provider that offers secondary DNS service, such that you remain in charge of all the zone data, but their name servers are the ones referenced publicly for the domain, so only their name servers get queried.

Gotcha. Our TTL is actually set to 15 minutes, which would probably explain why there are so many requests coming in. Yeah, it looks like RFC 1912 recommends a TTL value of 1 day or more.
There are 86400 seconds, not 8400, in one day, just to clarify the prior post's error.
I didn't even catch that mistake above :/ Thanks for clarifying.

One more question: is it even correct to try to get a rough monthly DNS query count by taking the number spat out by rndc stats and dividing it by the number of days, then dividing that by the number of page views (or unique page views)? It seems that, given our short TTL, dividing the stats by days and then by page views is not going to give a correct number.

I could be wrong.
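The arithmetic in question, as a back-of-the-envelope sketch (44 days is the window mentioned above; a 30-day month is assumed):

```shell
# Rough per-day and per-month query estimates from the cumulative counter
success=154162821   # "success" line from the rndc stats dump above
days=44             # days since named was started
echo "$(( success / days )) queries/day"
echo "~$(( success / days * 30 )) queries/month"
```

As discussed, this averages over the whole window, so it ignores TTL effects and any seasonality in the traffic.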
It all depends on what you need the information for.

I do not know for certain whether rndc dumpdb, etc., will clear the cache while the time/date on the named process remains the same.
The stats file created as a result of rndc stats should contain a detailed report (duration, etc.), or at least that is the type of information I remember being included in the file. Usually, having someone else manage your DNS is not so much about the traffic the DNS gets, but about managing the data within each zone.

You could do it that way, but consider whether your organization has scheduled activities during certain months; e.g., if your firm runs many events in April, June, and October, those will be the months of peak interest in your sites/email, where DNS queries spike, while the rest of the time access could be minimal.
I believe mgmt just wants to know a rough average, so they can build that figure into a price model of what it would cost monthly to host, as we are moving everything out of our colo. We're moving heavily to EC2, and mail and DNS are the next two pieces of infrastructure to go.
Check the stats file in /var/named.
I'm not sure whether an average data size is part of the statistics;
i.e., a query can have a 50-byte response or a 500-byte response while being reflected as 2 queries.

I think the stats file breaks it up into the specific types of queries (NS, SOA, MX, A, CNAME, etc.), and each type of query has an average response size.

e.g.
nslookup -q=ns <domain> | wc -c
will return roughly how much data the NS records for a domain use.

Between DNS and mail, mail is the one you really need to concentrate on.
The DNS server is important, as it is what everything else relies on.

You could issue a USR1 signal to named to increase its verbose logging level. Do this during your domain's peak time and then use the information generated in /var/log/messages (named) to quantify the traffic. Issue USR2 when you are done, to turn the debug logging back off (on BIND 9, rndc trace and rndc notrace do the same thing):
kill -USR1 <pid_of_named>
kill -USR2 <pid_of_named>

If your named instance only serves static authoritative data (no dynamic record updates, no caching service), the memory in the lowest available option should be enough; e.g., a computer from 6 years ago with 256MB RAM, 1GB of disk, and a 1GHz Pentium would be good enough to service DNS requests for the 16-18 domains you have.
As previously mentioned, a better TTL/minimum configuration will reduce the load. The one thing to remember with a large TTL/minimum is that transitions have to be planned one TTL/minimum period before the change: lower the TTL/minimum to a period that reflects the propagation delay your firm is willing to tolerate (an hour, half an hour, 60 seconds, etc.). During that transition period many more requests will flow in, until the change is completed and the TTL/minimum are adjusted back up.
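A concrete (hypothetical) example of that planning rule:

```shell
# If the zone TTL is one day and you can tolerate a one-hour propagation
# window for the move, lower the TTL at least one old-TTL period beforehand.
old_ttl=86400   # current TTL (seconds)
new_ttl=3600    # acceptable propagation delay (seconds)
echo "Lower the TTL to $new_ttl at least $(( old_ttl / 3600 )) hours before the change"
```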

IMHO, you should concentrate on the mail side, as it is the one that has to be sized accurately, with some 200% overhead allocation. Depending on which mail server you use, crunching its logs will likely give you the information for that one.