08 October 2010

Displaying web access statistics using AWStats

Categories:  Web  Server  Monitoring  Linux  Gentoo

How to configure AWStats to monitor web access statistics on a Gentoo Linux server

Business decision makers like to base their judgments on statistics. You will find a lot of services that help you gather web access statistics and present them in a pleasant and, most importantly, easy-to-understand way. The most popular services, like Google Analytics, rely mostly on a JavaScript API embedded in the web page, so they are extremely easy for a site visitor to block. In fact, many browser security plug-ins block them by default. So how do you see the real access statistics for your web page? Rely on your web server logs. The information kept in the server log cannot be blocked: if someone downloads something from your web page, the request is recorded in the logs no matter what security plug-ins the visitor is using. On my servers I use the awesome AWStats web server log parser, which presents web site access statistics in a very pleasant way.

You will find a lot of articles about configuring AWStats on various Linux distributions, but unfortunately most of the information they provide did not work for me. I managed to configure AWStats the way I wanted by combining pieces of information from various sources, and I want to share my configuration in the hope that someone else will find it useful. Most of the information below addresses the Gentoo Linux distribution, but some of it should be applicable to any Linux or POSIX-type system.

I will start by defining the objectives of my setup. As I'm running a few sites on my servers, I would like to have one separate virtual host presenting statistics for all sites published on the same server. Presenting statistics in a separate location for every single site is not a bad idea; you can access them, for example, using a URL similar to http://www.mysiteaddress.org/awstats. However, some of the sites I'm taking care of are built using frameworks like Symfony or Ruby on Rails, and the framework's routing engine would make it really hard for me to access the statistics like this. It's much easier for me to create a separate web page with links to all of the statistics I need. I would also like to limit access to my web statistics page to logged-in users and to access the site over a secure SSL connection.

Most of the resources I found on the Internet were far from complete. I want to make sure my information is complete and reliable (at least for Gentoo Linux users), so I will start with the required Apache server compilation, then go through AWStats installation and configuration, and end with the virtual hosts and web page configuration.

The first thing is Apache server configuration. On Gentoo Linux, the Apache modules to build are defined in the /etc/make.conf file. To make sure you will be able to use AWStats, I recommend adding an APACHE2_MODULES definition similar to this one:

APACHE2_MODULES="ssl alias log_config mime mime_magic unique_id vhost_alias threads authz_host auth_basic auth_default rewrite dir cgid"

This definition is typical for a multithreaded Apache configuration, that is, one built with:

APACHE2_MPMS="event"

or:

APACHE2_MPMS="worker"

If instead of these multiprocessing modules you are using:

APACHE2_MPMS="prefork"

then make sure you replace the cgid module with the cgi one. This is an absolutely key module, as the AWStats web page is generated by a CGI Perl script.
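
The rule above can be written down as a small shell helper. This is only a sketch: the function name cgi_module_for_mpm is made up for illustration, and it covers just the three MPM values mentioned here.

```shell
#!/bin/sh
# Hypothetical helper: print the CGI module matching a given APACHE2_MPMS value.
cgi_module_for_mpm() {
    case "$1" in
        event|worker) echo "cgid" ;;   # threaded MPMs pair with the cgid module
        prefork)      echo "cgi" ;;    # prefork pairs with the cgi module
        *)            echo "unknown MPM: $1" >&2; return 1 ;;
    esac
}
```

Calling cgi_module_for_mpm with the value from your /etc/make.conf tells you which of the two modules to list in APACHE2_MODULES.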

The above configuration is very minimalistic. To tell you the truth, I use more modules on my servers, but the other modules (proxy_balancer, for example) do not affect my AWStats configuration. You can tune this configuration to match your needs. For example, you can replace auth_basic with authnz_ldap to use LDAP authentication instead of the basic authentication I use (this also requires compiling Apache with LDAP support). If your knowledge of Apache modules is limited, read the Apache module documentation to understand what each module does.

The next thing is Apache compilation. I had to make sure my Apache would be compiled with threads and ssl support. This should be enabled by default for every server profile, but if you are unsure, you can always add these USE flags to /etc/make.conf or to /etc/portage/package.use like this:

echo "www-servers/apache ssl threads" >> /etc/portage/package.use

or run the emerge command like this:

USE="threads ssl" emerge www-servers/apache

If you were using a different Apache build before, make sure you run:

etc-update

or

dispatch-conf

to update your previously created configuration files.

You should also enable the needed Apache options in the /etc/conf.d/apache2 configuration file. My configuration looks similar to this one:

APACHE2_OPTS="-D DEFAULT_VHOST -D SSL -D SSL_DEFAULT_VHOST"

This lets us use SSL and the default vhost configuration. This is almost everything I had to do to make my Apache AWStats-ready.

The last thing to do is to change the default Apache log format to a format that AWStats will be able to parse. Instead of the combined format suggested by the AWStats documentation, I use the same approach as described on the Gentoo Linux Wiki (take a look at the sources list at the bottom of the article): the vhost format. This will of course affect the AWStats configuration. For every site available via HTTP, I added the following log configuration to the vhost definition:

ErrorLog /var/log/apache2/my_site_address-error_log
CustomLog /var/log/apache2/my_site_address-access_log vhost

For every site available via HTTPS, I added the following log configuration to the vhost definition:

ErrorLog /var/log/apache2/my_site_address-ssl-error_log
CustomLog /var/log/apache2/my_site_address-access_log vhost
CustomLog /var/log/apache2/my_site_address-ssl-request_log "%t %h %{HTTPS}x %{SSL_PROTOCOL}x %{SSL_CIPHER}x %{SSL_CIPHER_USEKEYSIZE}x %{SSL_CLIENT_VERIFY}x \"%r\" %b"

If the site can be accessed using both HTTP and HTTPS, you should use one access_log file in both configurations. Make sure you use separate log files for separate web sites. You can delete or rotate your current logs at this point, as AWStats will not be able to parse them (they were most likely created using an incompatible format). Then restart Apache by running the following command:

/etc/init.d/apache2 restart

This way Apache will create new log files that can be parsed by AWStats. Visit your sites to make sure some information gets stored in the logs. This is all that has to be done in the Apache web server configuration.

The next step is AWStats installation. The latest version available in Portage is AWStats 7.0. To use this version we have to unmask it by running the following command:
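
Before going further, it may be worth checking that new log lines really start with the virtual host name. The sketch below assumes the vhost format puts the host name in front of the usual fields, so the timestamp (which starts with "[") is the fifth whitespace-separated field instead of the fourth. That matches the %virtualname %host %other %logname %time1 layout used in the AWStats configuration later on. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: succeed if the 5th field of a log line starts with "[",
# i.e. the timestamp is preceded by four fields (vhost, host, ident, user).
looks_like_vhost_line() {
    printf '%s\n' "$1" | awk '{ if (substr($5, 1, 1) == "[") exit 0; else exit 1 }'
}

# Example: inspect the most recent line of an access log (path is a placeholder).
# looks_like_vhost_line "$(tail -n 1 /var/log/apache2/my_site_address-access_log)" \
#     && echo "vhost format" || echo "not vhost format"
```

This is a rough heuristic only: it looks at the position of the timestamp and does not validate the rest of the line.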

echo "www-misc/awstats" >> /etc/portage/package.keywords

The next thing to do is choosing USE flags for AWStats and its dependencies. AWStats comes with apache2, vhost, geoip and ipv6 support. I added my USE flag configuration by running the following commands:

echo "dev-libs/geoip perl-geoipupdate" >> /etc/portage/package.use
echo "www-misc/awstats -ipv6 geoip apache2 vhost"  >> /etc/portage/package.use
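
Appending with echo and >> adds a duplicate line every time the command is repeated. A possible way around it is a small helper that appends a line only when it is not in the file yet. This is a sketch: add_line_once is a made-up name, and it assumes /etc/portage/package.use is a plain file, as in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if that exact line is missing.
add_line_once() {
    line=$1
    file=$2
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage with the flags from this article (run as root):
# add_line_once "dev-libs/geoip perl-geoipupdate" /etc/portage/package.use
# add_line_once "www-misc/awstats -ipv6 geoip apache2 vhost" /etc/portage/package.use
```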

As you can see, I'm not using ipv6 on my servers, but I do use geoip. This way AWStats will be able to identify the site visitors' countries. You can install AWStats by running the following command:

emerge www-misc/awstats

Now we can create a new virtual host for our AWStats statistics. To do it in Gentoo we will use a great script called webapp-config. I installed AWStats in a separate virtual host by running the following command:

webapp-config -I -h awstats_host_name awstats 7.0

This will create a new virtual host directory in /var/www/ and copy all needed files into it. You can visit this directory and see for yourself that there are a few CGI scripts inside the cgi-bin directory and some files in the htdocs directory. The next thing is creating an AWStats configuration file for every site you want to monitor.

To do it, simply copy the sample configuration file under a new name that matches your main site address. You can do it by running the following command:

cp /etc/awstats/awstats.model.conf /etc/awstats/awstats.my_site_address.conf
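
With several sites the copy step can be looped. The following is a sketch under a few assumptions: create_site_configs is a hypothetical helper, the site names are placeholders, and existing files are left alone so that configurations you have already edited are not overwritten.

```shell
#!/bin/sh
# Hypothetical helper: copy awstats.model.conf to awstats.<site>.conf for each site,
# skipping sites that already have a configuration file.
create_site_configs() {
    conf_dir=$1
    shift
    for site in "$@"; do
        target="$conf_dir/awstats.$site.conf"
        [ -e "$target" ] || cp "$conf_dir/awstats.model.conf" "$target"
    done
}

# Usage (site names are placeholders):
# create_site_configs /etc/awstats my_site_address my_other_site_address
```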

Now edit the file and provide the needed configuration options. My configuration looks similar to this one (I'm showing only the options I changed):

LogFile="/var/log/apache2/my_site_address-access_log"
LogFormat="%virtualname %host %other %logname %time1 %methodurl %code %bytesd %refererquot %uaquot %other"
SiteDomain="my_site_address"
HostAliases="IP_address my_site_address REGEX[my_site_address\.pl$]"
DirData="../datadir"
DirCgi="/var/www/awstats_host_name/cgi-bin"
DirIcons="/awstatsicons"
BuildReportFormat=xhtml
#Depending on your site technology use index.php or index.html
DefaultFile="index.php"
SkipHosts="127.0.0.1 localhost"
Lang="pl"
StyleSheet="/awstatscss/awstats_bw.css"
LoadPlugin="geoip GEOIP_STANDARD /usr/share/GeoIP/GeoIP.dat"
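
A common mistake at this point is a LogFile path that does not match the log Apache actually writes. As a quick sanity check, the value can be read back from the configuration file. This sketch assumes the line is written exactly as LogFile="..." at the start of a line, as in the listing above; configured_logfile is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print the LogFile value from an AWStats configuration file.
# Only matches uncommented lines of the exact form LogFile="..."
configured_logfile() {
    sed -n 's/^LogFile="\(.*\)"[[:space:]]*$/\1/p' "$1" | head -n 1
}

# Usage:
# log=$(configured_logfile /etc/awstats/awstats.my_site_address.conf)
# [ -r "$log" ] || echo "log file '$log' is missing or unreadable" >&2
```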

We should be able to test our configuration by entering the virtual host's cgi-bin directory and running awstats like this:

cd /var/www/awstats_host_name/cgi-bin
./awstats.pl -config=my_site_address -update

If you see output similar to this:

Create/Update database for config "/etc/awstats/awstats.my_site_address.conf" by AWStats version 7.0 (build 1.970)
From data in log file "/var/log/apache2/my_site_address-access_log"...
Phase 1 : First bypass old records, searching new record...
Direct access to last remembered record is out of file.
So searching it from beginning of log file...
Phase 2 : Now process new records (Flush history on disk after 20000 hosts)...
Jumped lines in file: 0
Parsed lines in file: 92
 Found 0 dropped records,
 Found 0 comments,
 Found 0 blank records,
 Found 0 corrupted records,
 Found 0 old records,
 Found 92 new qualified records.

then AWStats is configured properly and you can go on. If you run into problems, make sure your Apache log format matches the AWStats LogFormat setting.
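
A log format mismatch usually shows up as a large number of corrupted records. The count can be pulled out of the update output with a small filter. This is a sketch that relies only on the "Found N corrupted records" line shown above; corrupted_records is a made-up name.

```shell
#!/bin/sh
# Hypothetical filter: read awstats.pl -update output on stdin and print
# the number from the "Found N corrupted records" line.
corrupted_records() {
    sed -n 's/^ *Found \([0-9][0-9]*\) corrupted records.*/\1/p'
}

# Usage:
# cd /var/www/awstats_host_name/cgi-bin
# ./awstats.pl -config=my_site_address -update | corrupted_records
```

If the printed number is close to the number of parsed lines, the LogFormat setting most likely does not match the Apache log format.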

Now we should make sure our statistics will be updated periodically. I did it by creating a simple script in /etc/cron.hourly/ named awstats:

#!/bin/sh
cd /var/www/awstats_host_name/cgi-bin
./awstats.pl -config=my_site_address -update > /dev/null 2>&1

You should add a similar line for every site you want to monitor. Do not forget to make this script executable by running the following command:

chmod +x /etc/cron.hourly/awstats
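
Instead of repeating the update line for each site, the script can loop over a list of configuration names. This is a sketch of that variant, not the script I actually run: update_all_sites is a hypothetical function and the site names are placeholders.

```shell
#!/bin/sh
# Hypothetical variant of the cron script: run awstats.pl -update for every
# site name given after the cgi-bin directory.
update_all_sites() {
    cgi_dir=$1
    shift
    (
        # Run in a subshell so the caller's working directory is not changed.
        cd "$cgi_dir" || exit 1
        for site in "$@"; do
            ./awstats.pl -config="$site" -update > /dev/null 2>&1
        done
    )
}

# Usage (site names are placeholders):
# update_all_sites /var/www/awstats_host_name/cgi-bin my_site_address my_other_site_address
```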

You should also make sure that the logs are parsed shortly before every log rotation. To do it, you need to add a prerotate command to your log rotation configuration. Here is my log rotation configuration for the Apache access_log files, stored in the /etc/logrotate.d/apache2 configuration file:

/var/log/apache2/*access_log {
  daily
  missingok
  notifempty
  rotate 365
  dateext
  olddir /var/log/old/apache2
  sharedscripts
  nocompress
  nocreate
  prerotate
        /etc/cron.hourly/awstats
  endscript
  postrotate
        /etc/init.d/apache2 reload > /dev/null 2>&1 || true
  endscript
}

Now we are close to finishing the configuration. We just have to make our stats appear on a web page. To do it, we will first create a simple HTML page with links to our statistics. Create an index.html file in /var/www/awstats_host_name/htdocs with contents similar to this:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>AWSTAT Domain List</title>
</head>
<body>
AWSTATS Page.<br />
<ul>
<li><a href="/awstats/awstats.pl?config=my_site_address">My_Site_Address</a></li>
</ul>
</body>
</html>
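
If you monitor many sites, the list items in the page above can be generated rather than typed by hand. This is a minimal sketch: site_links is a hypothetical helper, and it assumes the site names are plain host names that need no HTML or URL escaping.

```shell
#!/bin/sh
# Hypothetical helper: print one <li> link per AWStats configuration name.
site_links() {
    for site in "$@"; do
        printf '<li><a href="/awstats/awstats.pl?config=%s">%s</a></li>\n' "$site" "$site"
    done
}

# Usage (site names are placeholders); paste the output between <ul> and </ul>:
# site_links my_site_address my_other_site_address
```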
 

Add a separate link for every site you are monitoring. Now the last thing to do is to make this page appear and to force our awstats.pl file to work as a CGI script. To do it we need to create a virtual host configuration for the Apache server. In Gentoo, virtual host configuration files are kept in the /etc/apache2/vhosts.d/ directory. As I pointed out at the beginning, my configuration should force users to visit this site over a secure SSL connection. The first thing is creating a simple virtual host configuration with a rewrite rule that redirects everyone to the SSL connection. This configuration should be similar to this one:

# awstats_vhost.conf
<VirtualHost *:80>
  ServerAdmin admin@my_domain.org
  DocumentRoot /var/www/awstats_host_name/htdocs/
  ServerName awstats_host_name.my_domain.org
  RewriteEngine On
  RewriteCond %{HTTPS} !=on
  RewriteRule ^/(.*) https://%{SERVER_NAME}/$1 [R,L]
  ErrorLog /var/log/apache2/awstats-error_log
  CustomLog /var/log/apache2/awstats-access_log vhost
</VirtualHost>

Next we need to create a proper configuration for the SSL virtual host with all the parameters that let us use CGI scripts. This configuration should be similar to this one:

# awstats_vhost_ssl.conf
<VirtualHost *:443>
  ServerAdmin admin@my_domain.org
  ServerName awstats_host_name.my_domain.org
  UseCanonicalName On
  SSLEngine on
  SSLOptions StrictRequire
  SSLCertificateFile /etc/ssl/apache2/server.crt
  SSLCertificateKeyFile /etc/ssl/apache2/server.key
  SSLProtocol all -SSLv2
  DocumentRoot /var/www/awstats_host_name/htdocs
  Alias /awstatsclasses "/var/www/awstats_host_name/htdocs/classes/"
  Alias /awstatscss "/var/www/awstats_host_name/htdocs/css/"
  Alias /awstatsicons "/var/www/awstats_host_name/htdocs/icon/"
  ScriptAlias /awstats "/var/www/awstats_host_name/cgi-bin/"
  <Directory "/var/www/awstats_host_name/htdocs">
    Options -Indexes FollowSymLinks
    AllowOverride All
    AuthType Basic
    AuthName "AWStats Admin Access Required"
    AuthUserFile /etc/awstats/.htpasswd
    require valid-user
    Order allow,deny
    Allow from all
    SSLRequireSSL
  </Directory>
  <Directory "/var/www/awstats_host_name/cgi-bin">
    Options ExecCGI -Indexes FollowSymLinks
    SetHandler cgi-script
    Order allow,deny
    Allow from all
    SSLRequireSSL
  </Directory>
  <Location /awstats>
    AuthType Basic
    AuthName "AWStats Admin Access Required"
    AuthUserFile /etc/awstats/.htpasswd
    require valid-user
    SSLRequireSSL
  </Location>
  ErrorLog /var/log/apache2/awstats-ssl-error_log
  CustomLog /var/log/apache2/awstats-access_log vhost
  CustomLog /var/log/apache2/awstats-ssl-request_log "%t %h %{HTTPS}x %{SSL_PROTOCOL}x %{SSL_CIPHER}x %{SSL_CIPHER_USEKEYSIZE}x %{SSL_CLIENT_VERIFY}x \"%r\" %b"
</VirtualHost>

The last thing to do is generating the password file. You can create this file and add the first user by running the following command:

htpasswd2 -c /etc/awstats/.htpasswd my_user_name

and add new users to the existing file by running the following command:

htpasswd2 /etc/awstats/.htpasswd my_2nd_user_name

And this is it. You can enjoy your statistics and watch what types of browsers and operating systems visitors are using:

[Image: AWStats Statistics]

or what type of content they are accessing:

[Image: AWStats Statistics]

or analyze the traffic:

[Image: AWStats Statistics]

This information comes straight from the server logs, so visitors cannot block it. You cannot say the same about JavaScript-based tracking services.

Sources:



