Squid view statistics. Comparison of SQUID traffic accounting programs

> ee /usr/local/etc/squid/squid.conf

# display error messages to users in Ukrainian (the default is English)
error_default_language uk-ua
# Squid listens on localhost only; intercept enables the transparent proxy
http_port 127.0.0.1:3128 intercept
visible_hostname local
# RAM allocated for the cache itself, not for the Squid process;
# for example, with 192 MB of RAM you can set this to 64 MB
cache_mem 64 MB
# size of the cache directory: 800 MB
cache_dir ufs /usr/local/squid/cache 800 16 256
# thresholds (in percent) for the mechanism that purges obsolete data from the cache;
# the cache starts being cleaned once it is 95% full
cache_swap_low 90
cache_swap_high 95
# default cache replacement policy
memory_replacement_policy lru
# Maximum size of an object that can be stored in memory. Objects larger than this
# are not kept in memory. Objects are retrieved from memory faster, so it should
# hold only objects that clients request frequently. Increasing this value
# reduces server performance.
maximum_object_size_in_memory 512 KB
# minimum object size to save in the cache
minimum_object_size 0 KB
# maximum object size to save in the cache
maximum_object_size 4096 KB
# the ftp password for the Anonymous(!!!) user on whose behalf Squid
# fetches anonymous resources via the FTP protocol
ftp_user [email protected]
# enable/disable FTP passive mode
ftp_passive on
# Number of log file rotations; the default is 10. When you run "squid -k rotate",
# the current log file gets an extension from 0 to 9 and is set aside, a new file
# is created for logging, and all new records go there. Setting logfile_rotate
# to 0 disables rotation.
logfile_rotate 4
access_log /var/log/squid/access.log squid
# debugging information; contains basic information about cache usage
cache_log /var/log/squid/cache.log
# Default: none
#cache_store_log /var/log/squid/store.log
# a list of words that, when found in the URL, tell Squid that the object at this
# URL must be fetched directly rather than from the cache
hierarchy_stoplist cgi-bin ?

Let's check the Squid configuration file for errors.

> squid -f /usr/local/etc/squid/squid.conf -k parse
2010/03/16 16:40:32| Processing Configuration File: /usr/local/etc/squid/squid.conf (depth 0)

Before the first launch, you need to create a Squid cache.

> squid -z
2010/03/16 16:42:05| Creating Swap Directories
2010/03/16 16:42:05| /usr/local/squid/cache exists
2010/03/16 16:42:05| Making directories in /usr/local/squid/cache/00 ...

To start Squid at boot, you need to add the line squid_enable="YES" to the /etc/rc.conf file.
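For example, the line can be appended from a root shell (a one-line sketch; you can of course edit /etc/rc.conf in any editor instead):

echo 'squid_enable="YES"' >> /etc/rc.conf

After that the service can be started with the rc script below.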

> /usr/local/etc/rc.d/squid start

Everyone who sets up a proxy eventually wants to see who is using it and how much each user downloads. Sometimes it is also very useful to see in real time who is downloading what. The following programs are discussed in this article:
  • SqStat - real-time statistics via the web
  • Sarg - Squid log analyzer with subsequent HTML report generation
  • SquidView - interactive console monitor of Squid logs

0. Introduction

I won't tell you how to configure Apache here; there are plenty of manuals on this topic on the Internet, so go look them up. I will only talk about the setup I implemented at home.
I will use Debian Etch as an example; your paths may differ, so keep that in mind...
Go…

1. SquidView

This program runs in the console and displays there everything that Squid is doing.
Installation:

aptitude install squidview

Wait a couple of seconds (if you have fast internet) and that's it: now we can see who is downloading what. If you have not changed the location of the logs and left most of the Squid parameters at their defaults, then all you need to do is run it, but with root rights, since the Squid logs are not readable by ordinary users...

sudo squidview

I think this will be enough for most, but here are a few very useful things; press these keys and watch:

  • h - help, where you can learn even more ;)
  • l, then Enter - report generation; additional settings can also be configured here
  • T - start recording download-size statistics
  • O - view who downloaded what, per user (after pressing T)

That seems to be all for SquidView; if I left something out, write to me and I'll add it!

2. SqStat

This is a script that allows you to view active connections, channel load, and average channel load.
I assume that you already have Apache configured.
Download the latest version:

wget -c samm.kiev.ua/sqstat/sqstat-1.20.tar.gz
tar xvfz sqstat-1.20.tar.gz
cd ./sqstat-1.20
mkdir /var/www/squid-stat
cp -R * /var/www/squid-stat/

That's it. Now we need to configure squid-cgi (cachemgr.cgi). Install it:
aptitude install squid-cgi

Now you need to configure access...

nano /etc/squid/squid.conf

Add
acl manager proto cache_object
http_access allow manager localhost
http_access deny manager
#This line sets the password secret and allows you to do everything
cachemgr_passwd secret all

Now you need to fix /etc/squid/cachemgr.conf
echo "*" >> /etc/squid/cachemgr.conf
Instead of * you can put the network address that squid is listening to

For some reason I couldn't get it working at the address 127.0.0.1, so I entered 192.168.0.1 and everything worked. Now open cachemgr.cgi in a browser: enter the proxy's network address in the Cache Host field and its port in the Cache Port field; if you did everything according to this manual, you don't have to enter anything in the login field, and write secret in the password field. If everything went well, you will see a list of available parameters... Take a look, and then we move on to setting up SqStat...
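As a quick sanity check from the command line, you can also query the cache manager directly (a sketch, assuming the squidclient utility is installed and the address/password from above):

squidclient -h 192.168.0.1 -p 3128 mgr:info@secret

If the password and the ACLs are right, this prints the general runtime information page.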

nano /var/www/squid-stat/config.inc.php
//This is the address where your squid listens
$squidhost="192.168.0.1";
$squidport=3128;
$cachemgr_passwd="secret";
//This parameter allows names to be resolved by records in your system
$resolveip=false;
//This file lists the IPs and names of the computers (see the example after this block); Cyrillic names are fine :)
$hosts_file="hosts";
$group_by="host";
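The hosts file is simply address/name pairs, something like this sketch (the entries are hypothetical):

# hosts (hypothetical entries: IP address, then the name to display)
192.168.0.15 Ivan Ivanov
192.168.0.22 Accounting-PC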

In principle, the config itself is well documented, study it, fortunately there is nothing to study there))

Now we’re making a subdomain, it’s much more convenient)

nano /etc/apache2/sites-enabled/sqstat

ServerAdmin [email protected]
DocumentRoot /var/www/squid-stat/
ServerName proxy.server.local
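These directives normally live inside a VirtualHost container; a minimal sketch of the whole file, assuming Apache serves name-based virtual hosts on port 80 and using the paths above:

<VirtualHost *:80>
    ServerAdmin [email protected]
    DocumentRoot /var/www/squid-stat/
    ServerName proxy.server.local
</VirtualHost>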

To resolve, write to /etc/hosts

nano /etc/hosts
192.168.0.1 proxy.server.local

That's all :) almost everything

squid -k reconfigure
/etc/init.d/apache2 reload

3. Sarg

This program generates html reports, draws graphs, etc...
Install it:

aptitude install sarg

nano /etc/squid/sarg.conf
language Russian_koi8
graphs yes
title "Squid User Access Reports"
temporary_dir /tmp
output_dir /var/www/sarg
max_elapsed 28800000
charset Koi8-r

Of course, no one is stopping you from tweaking the display style of this whole thing to your taste - the config is provided with very detailed comments.

crontab -u root -e
* 08-18/1 * * * /usr/sbin/sarg-reports today
* 00 * * * /usr/sbin/sarg-reports daily
* 01 * * 1 /usr/sbin/sarg-reports weekly
* 02 1 * * /usr/sbin/sarg-reports monthly

Epilogue

That's it :)) If you want, you can create a subdomain for it too! This has already been described...
I myself use all three programs and am satisfied.

UPD. To make this work with Squid version 3 (whose logs live in /var/log/squid3), you need to create a soft link:

ln -s /var/log/squid3/access.log /root/.squidview/log1

UPD.2. The next article will talk about delay pools

Squid is a software package that implements a caching proxy server for the HTTP, FTP, Gopher and (with the appropriate settings) HTTPS protocols. It is developed by the community as open-source software (distributed under the GNU GPL). All requests are handled by a single non-blocking I/O process. It runs on UNIX systems and on the Windows NT family of operating systems. It can interact with Active Directory on Windows Server through LDAP authentication, which makes it possible to restrict access to Internet resources for users who have accounts on Windows Server, and also to organize per-user "slicing" of Internet traffic.

Sarg (Squid Analysis Report Generator) is a report generator based on analysis of the Squid proxy server log file. The reports let you find out which user accessed which site and when. The summary report can be of great help in billing users who work through Squid, since it includes the total traffic and the number of connections for each user over a given period of time.

Installing and configuring Squid

There are two versions of Squid - 2.x and 3.x. The latest beta version is 3.1. We will install the third branch of the proxy server. At the time of writing, the stable version was 3.0.STABLE15.

# cd /usr/ports/www/squid30/
# make install clean

While looking through the descriptions of all the build options (in English), I realized that if an option is not going to be used within a month, there is no point in enabling it, especially since, if I understand correctly, some of the options will be enabled in the config starting from release 3.1 (for example, SQUID_PINGER). Therefore, I did not select options that I do not plan to use after the initial configuration. In other words, I need a simple caching proxy server.

In the installation options I selected SQUID_SSL, SQUID_IDENT, SQUID_PF, SQUID_KQUEUE.

All squid configuration files are located in the /usr/local/etc/squid/ folder
Cache directory: /usr/local/squid/cache/
Logs are written to /usr/local/squid/logs/

Add the following line to /etc/rc.conf:

squid_enable="YES"

Before starting work, you need to initialize the cache directories. This is done with the command:

# squid -z

The command must be executed as the root or squid user, and only after you have created the final configuration file, otherwise errors may occur.

By default, Squid's configuration denies everyone access to the proxy server. To change this, you need to edit the http_access parameters in the file /usr/local/etc/squid/squid.conf.
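A minimal sketch of what those rules might look like, assuming the local network is 192.168.0.0/24 (adjust the ACL to your own subnet):

acl localnet src 192.168.0.0/24
http_access allow localnet
http_access deny all

After editing, the configuration can be re-checked for syntax errors, as below.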

# squid -f /usr/local/etc/squid/squid.conf -k parse

The initial configuration file is the file where everything is set to its default value. As the documentation says, if you are not going to change an option and want its default value, you should not uncomment it; doing so may cause run-time errors. Also, be aware that the value NONE sometimes means that no value should be used for the option at all, and sometimes it is itself a valid value. So, if anything about a particular parameter in the configuration file is unclear, refer to the official documentation (a link to the documentation site can be found at the very bottom of the article).

The configuration file provides for the use of the include directive to include additional configuration files, for example:

include /path/to/file/configuration/squid.acl.config

Please note that the nesting depth of include is limited to 16 levels. This was done to prevent loops in which Squid would endlessly include configuration files referenced from other files.

My configuration is relevant for networks where Squid, in conjunction with PF, is used as a transparent proxy server. All access will come from the internal local network and is closed to everyone from the outside, so no authorization parameters are used. Also, because this is a transparent proxy server, all requests need to be redirected through PF to port 3128.

Also, we must remember that by default Squid accepts requests on port 3128.

Now that the small theoretical part is done, it's time to move on to the configuration. To save space, I will show only the parameters that I changed and those worth paying attention to. So let's get started. Configuration file:

# commented out the following lines because I don't have these subnets
#acl localnet src 10.0.0.0/8
#acl localnet src 172.16.0.0/12

# I also commented out the unused ports and added the necessary ones
# (I don't know how correct this is; maybe it would be better to leave them)
acl WEBMIN_ports port 10000
acl Safe_ports port 8080         # www
acl Safe_ports port 10000        # webmin
#acl Safe_ports port 70          # gopher (a protocol for distributed search and document retrieval)
#acl Safe_ports port 210         # wais (network information retrieval system)
#acl Safe_ports port 1025-65535  # unregistered ports
#acl Safe_ports port 280         # http-mgmt
#acl Safe_ports port 488         # gss-http
#acl Safe_ports port 591         # filemaker (cross-platform relational databases)
#acl Safe_ports port 777         # multiling http (some prehistoric protocol nobody knows anything about)

# without this it will not be possible to connect to Webmin at https://webmin_address:10000
http_access allow CONNECT WEBMIN_ports

# uncomment the following line... it ensures that the proxy server does not access
# http://localhost through the proxy server itself;
# this is a recommended parameter, but not mandatory
http_access deny to_localhost

# since we have a transparent proxy server, the default value must be changed to the following;
# I also add the IP address on which to accept requests, because the computer has two network cards
http_port 192.168.0.10:3128 transparent

# how much memory Squid may use for objects; 8 MB by default, but in version 3.1
# this parameter will be increased to 256 MB, so let's increase it too
cache_mem 256 MB

# maximum size of objects kept in memory... again, in version 3.1 this parameter
# will be increased from 8 KB to 512 KB; do the same
maximum_object_size_in_memory 512 KB

# the directory with the cache. the parameter has the form:
# cache_dir ufs Directory-Name Mbytes L1 L2, where
# Directory-Name is the directory itself
# Mbytes is the number of megabytes allocated for the directory (I allocate 2 GB)
# L1 is the number of directories that can be created in the cache folder (default 16)
# L2 is the number of subdirectories that can exist in each directory (default 256)
cache_dir ufs /usr/local/squid/cache 2048 256 512

# I didn't touch this parameter, but a small explanation: it is the maximum size
# of objects that will be saved in the cache;
# if you want to save traffic, increase it; if you want performance, lower it
# maximum_object_size 4096 KB

# when 90% of the space in the cache directory is used up (cache_swap_low),
# gradual eviction (replacement) of stored objects begins;
# when the occupied space reaches 95% (cache_swap_high), eviction becomes more "aggressive";
# keep in mind that if a lot of space is allocated for the cache directory, the difference
# between 90% and 95% can amount to hundreds of megabytes, so consider reducing the gap
# between these parameters
# cache_swap_low 90
# cache_swap_high 95

# this log records which objects are removed from the cache, which are saved and for how long;
# since there are no utilities for building reports on this data, it can safely be disabled
cache_store_log none

# which part of the client's IP address to write to the log. by default the whole address is logged,
# but if, for example, you specify 255.255.255.0, only the first three octets of the address
# appear in the log, i.e. if the client 192.168.0.15 makes a request, the log will show 192.168.0.0;
# I did not change this parameter, because users are cunning and I want to know who goes where
# client_netmask 255.255.255.255

# comment this out, since we don't use this protocol
#refresh_pattern ^gopher: 1440 0% 1440

# e-mail of the person responsible for the cache; a letter is sent here if something happens to the cache
cache_mgr av3nger

# server name
visible_hostname computer_name

# don't use this
#icp_port 3130

# at the very bottom of the config there is a DNS OPTIONS section;
# you can play with it some more if you need further fine-tuning

Now it's worth initializing the cache directories (squid -z). To make the proxy "transparent", you need to add the following line to the Packet Filter configuration file /etc/pf.conf:

rdr on $int_if proto tcp from $int_if:network to !(self) port 80 -> proxy_server_ip_address port 3128

Let me remind you that an example of setting up PF is covered separately.

Problems I faced

1. When redirecting, the sender's IP address is replaced with the server address. That is, if I (192.168.0.3) try to connect to 192.168.0.10:8080 (a web server) through the proxy, but the web server only allows connections from IP 192.168.0.3, the connection will NOT happen, because the logs will say that the connection came from the address 192.168.0.10. There are two ways to solve this. The first is to bother with NAT, the second is to simply fix the Apache config. After all, the proxy server is still closed to use from the outside; we only lose the protection of the site on the local network, which, of course, is not great, but tolerable.
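A hypothetical sketch of the second approach, in Apache 2.2-style access control (the directory path is an example; the idea is simply to also allow the proxy's address):

<Directory "/usr/local/www/mysite">
    Order deny,allow
    Deny from all
    Allow from 192.168.0.3 192.168.0.10
</Directory>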

Installing and configuring Sarg

Let's get started with the installation right away:

# cd /usr/ports/www/sarg/
# make install clean

The only installation option available is GD; select it. GD is a graphics library, most likely responsible for drawing the nice graphs. If a window pops up asking you to select installation options for GD itself, you do not need to select anything.

All configuration files are located in /usr/local/etc/sarg/. We need the sarg.conf file, open it in any editor and bring it to something like this:

# language (besides this, Russian_windows1251 and Russian_UTF-8 are also available)
language Russian_koi8
# where the Squid logs are located
access_log /usr/local/squid/logs/access.log
# use graphs where possible
graphs yes
# the title of the pages
title "Squid usage statistics"
# directory for temporary files
temporary_dir /tmp
# where to put the reports (this is how I have it)
output_dir /usr/local/www/secure/squid-reports
# if you want the reports e-mailed to you, you can use the following parameter;
# in that case, reports will not be saved to the directory specified above
# output_email your_email
# resolve IP addresses to names
resolve_ip yes
# which field to sort by, and how, on the top users page;
# the fields are USER, CONNECT, BYTES and TIME; the sort orders are normal and reverse
topuser_sort_field BYTES reverse
# the same, but for the users page
user_sort_field BYTES reverse
# European date format
date_format e
# remove temporary files
remove_temp_files yes
# generate an index.html file
index yes
# if the report folder has already been created, overwrite it
overwrite_report yes
# remove from the report any records containing the following codes (400, 404, etc.);
# the codes must be listed in the specified file
exclude_codes /usr/local/etc/sarg/exclude_codes
# I uncommented the following line to get all the reports
# (I am not quoting the whole line because it is rather long)
report_type ...
# the next file lists ip-address - user pairs;
# the format is: 192.168.0.1 Vasily Pupkin
# there must be an end-of-line character at the end (in other words, press Enter)
usertab /usr/local/etc/sarg/users
# encoding of the generated reports
charset Koi8-r
# remove the logo; it just gets in the way
show_sarg_logo no
# replace bytes with more readable values (KB and MB)
displayed_values abbreviation

All done! Now, to generate the reports, you just need to run the command:

# /usr/local/bin/sarg

To avoid having to run this command all the time, you can add the following entry to your crontab. Sarg will run every day at midnight.

@daily /usr/local/bin/sarg

1. Squid official website (in English)
2. (in English)
3. (again, in English)

Recently, our company needed to move its proxy server from MS ISA Server to free software. Choosing the proxy server did not take long (squid). Using several practical recommendations, we configured the proxy to suit our needs. Some difficulties arose when choosing a program for traffic accounting.

The requirements were:

1) free software
2) the ability to process logs from different proxies on one server
3) the ability to create standard reports with sending by mail, or a link on a web server
4) building reports for individual departments and distributing such reports to department heads, or providing access via a link on a web server

The developers provide very little information on traffic accounting programs: a laconic description of the purpose of the program plus an optional bonus of a couple of screenshots. Yes, it is clear that any program will calculate the amount of traffic per day/week/month, but additional interesting features that distinguish one program from others are not described.

I decided to write this post in which I will try to describe the capabilities and disadvantages of such programs, as well as some of their key features, in order to help those who have to make a choice a little.

Our candidates:

SARG
free-sa
lightsquid
SquidAnalyzer
ScreenSquid

A digression

Information about the “age” of the program and the latest release is not a comparison parameter and is provided for information only. I will try to compare exclusively the functionality of the program. I also deliberately did not consider too old programs that have not been updated for many years.

Logs are sent to the analyzer for processing in the form in which squid created them and will not undergo any pre-processing to make changes to them. Processing of incorrect records and all possible transformations of log fields must be done by the analyzer itself and be present only in the report. This article is not a setup guide. Configuration and usage issues can be covered in separate articles.


So let's get started.

SARG - Squid Analysis Report Generator

The oldest of the still-maintained programs of this class (development started in 1998; the former name was sqmgrlog). The latest release (version 2.3.10) is from April 2015. Since then there have been several improvements and fixes, available in the master branch (it can be downloaded with git from sourceforge).

The program is launched manually or via cron. You can run it without parameters (then everything will be taken from the sarg.conf configuration file), or you can specify parameters on the command line or in a script, for example the dates for which the report is generated.

Reports are created as html pages and stored in the /var/www/html/squid-reports directory (by default). You can set a parameter that specifies the number of reports stored in the catalog. For example, 10 daily and 20 weekly, older ones will be automatically deleted.

It is possible to use several config files with different parameters for different report options (for example, for daily reports, you can create your own config, in which the option for creating graphs will be disabled and a different directory for outputting the report will be specified).
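For instance (a sketch; sarg accepts -f to point at an alternative config file and -d to limit the date range, with dates in dd/mm/yyyy format; the config file name here is hypothetical):

sarg -f /usr/local/etc/sarg/sarg-daily.conf -d 01/12/2015-01/12/2015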

Details

When entering the main page with reports, we can select the period for which it was created (defined in the report creation parameters), the date of its creation, the number of unique users, the total traffic for the period, the average amount of traffic per user.

When you select one of the periods, we will be able to get a topusers report for this period. Below I will give descriptions and examples of all types of reports that SARG can produce.

1) topusers - total traffic by users. A user is either the name of the host to which Internet access is granted, or the user's login. Sample report:

IP addresses are displayed here; when the corresponding option is enabled, the IP addresses are resolved to domain names.

If you use authentication, accounts can be converted to real names:

The appearance can be customized in a css file. The displayed columns are also customizable, and unnecessary ones can be removed. Column sorting is supported (sorttable.js).

When you click on the graph icon on the left, you will see a graph like this:

When you click on the icon on the right, we get report 5.

2) topsites - a report on the most popular sites. By default the 100 most popular sites are listed (the value can be adjusted). Using regular expressions or aliases, you can merge the traffic of 3rd- and higher-level domains into their 2nd-level domain (as in the screenshot), or define any other rule. A rule can also be set for an individual domain, for example combining yandex.ru and mail.ru only up to the 3rd level. The meaning of the fields is fairly obvious.

3) sites_users - a report on who visited a specific site. Everything is simple here: the domain name and who accessed it. Traffic is not shown here.

4) users_sites - a report on sites visited by each user.

Everything is clear here too. If you click on the icon in the first column, we get report 8).

5) date_time - distribution of user traffic by day and hour.

6) denied - requests blocked by squid. This displays who, when and where access was denied. The number of entries is configurable (default is 10).

7) auth_failures - authentication failures. HTTP/407.
The number of entries is configurable (default is 10).

8) site_user_time_date - shows what time the user visited which site and from which machine.

9) downloads - list of downloads.

10) useragent - report on programs used

The first part of the report displays the IP address and used useragents.

In the second - a general list of useragents with distribution in percentage, taking into account versions.

11) redirector - the report shows who has had access blocked using the blocker. Squidguard, dansguardian, rejik are supported, the log format is customizable.

SARG has more than 120 settings parameters, language support (100% of messages are translated into Russian), support for regular expressions, work with LDAP, the ability to provide users with access only to their reports on the web server (via .htaccess), the ability to convert logs into their own format to save space, uploading reports to a text file for subsequent filling of the database, working with squid log files (splitting one or more log files by day).

It is possible to create reports for a specific set of specified groups, for example, if you need to make a separate report for a department. In the future, access to the web page with department reports can be provided, for example, to managers using a web server.

You can send reports by e-mail, however, for now only the topusers report is supported, and the letter itself will be simple text without HTML support.

You can exclude certain users or certain hosts from processing. You can set aliases for users, combining the traffic of several accounts into one, for example, all outstaffers. You can also set aliases for sites, for example, combine several social networks, in this case, all parameters for the specified domains (number of connections, traffic volume, processing time) will be summed up. Or, using a regular expression, you can “discard” domains above level 3.
It is also possible to export to separate files the lists of users who exceeded certain volumes over the period. The output is several files, for example: userlimit_1G.txt for exceeding 1 GB, userlimit_5G.txt for exceeding 5 GB, and so on - 16 limits in total.

SARG also has a couple of PHP pages in its arsenal: viewing current connections to squid and for adding domain names to squidguard block lists.

In general, this is a very flexible and powerful tool that is easy to learn. All parameters are described in the default configuration file, and the project's wiki section on sourceforge has a more detailed description of all parameters, divided into groups, with examples of their use.

free-sa

A domestic development. There have been no new versions since November 2013. The author claims faster report generation compared to competing programs and less disk space for the finished reports. Let's check!

In terms of operating logic, this program is closest to SARG (the author himself compares it with that program), so we will compare it with SARG.

I was pleased that there were several design themes. The theme consists of 3 css files and 4 png icons corresponding to them.

Reports are indeed generated faster: the daily report was created in 4 minutes 30 seconds, versus 12 minutes for SARG. The disk space claim, however, did not hold: the reports occupy 440 MB (free-sa) versus 336 MB (SARG).

Let's try a harder task: processing a 3.2 GB log file covering 10 days and containing 26.3 million lines.

free-sa again produced the report faster, in 46 minutes; the report takes up 3.7 GB of disk space. SARG took 1 hour 10 minutes; its report takes up 2.5 GB.

But both of these reports are awkward to read. Who, for example, would want to manually work out which domain is more popular - vk.com or googlevideo.com - and manually add up the traffic of all their subdomains? If you keep only 2nd-level domains in the SARG settings, creating the report takes about the same time, but the report itself now takes up 1.5 GB on disk (the daily one shrank from 336 MB to 192 MB).

Details

When entering the main page we see something like the following (the blues theme is selected):

To be honest, the purpose of displaying the year and months is unclear; when you click on them, nothing happens. You can write something in the search field, but again nothing happens. You can select the period of interest.

List of blocked URLs:

CONNECT method report:

PUT/POST method report:

Popular sites:

The report on the effectiveness of the proxy server seemed interesting:

User report:

When you click on the graph icon in the second column, we get a graph of Internet usage by a specific user:

When you click on the second icon, we get a table of Internet channel loading by hour:

When you select an IP address, we get a list of sites by user in descending order of traffic:

All statistics are displayed in bytes. To switch to megabytes you need to set the parameter


The program does not accept compressed log files, does not accept more than one file with the -l parameter, and does not support filtering files by mask. The author of the program suggests circumventing these restrictions by creating named pipes.
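A sketch of that named-pipe workaround, assuming only the -l option mentioned above is used to point free-sa at a log (the paths and file names are examples):

# create a FIFO, stream several compressed logs into it, and point free-sa at it
mkfifo /tmp/fsa.pipe
zcat /var/log/squid/access.log.*.gz > /tmp/fsa.pipe &
free-sa -l /tmp/fsa.pipe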

An annoying glitch was discovered: when a log line is too long, timestamps end up recorded instead of addresses:

When viewing the traffic of this “user” you can see the domain with the source of the error:

Thus, the number of users has increased several times.

If we compare the two programs, free-sa creates reports a little faster. I could not reproduce the 20-fold speedup claimed by the author; perhaps it shows up under certain conditions. I don't think it matters much whether a weekly report created at night takes 30 minutes or 50. In terms of disk space occupied by reports, free-sa has no advantage.

lightsquid

Perhaps the most popular traffic counter. It works quickly, reports do not take up much disk space. Although this program has not been updated for a long time, I still decided to consider its capabilities in this article.

The logic of the program is different: it reads the log and builds a set of data files, from which the web pages are then generated on the fly. That is, there are no pre-built reports; pages with data are created on request. The advantage is obvious: to get a report, you do not have to parse all the logs for the period; it is enough to "feed" the accumulated log to lightsquid once a day, or even several times a day via cron to pick up new data more promptly.

There are some drawbacks: it is impossible to process logs from different servers and collect statistics in one place: when processing a log for a day from another server, the existing statistics for that day are erased.

There is a strange limitation: lightsquid accepts both uncompressed log files and compressed ones (gz, to be precise), but in the latter case the file name must have the format access.log.X.gz; files named access.log-YYYYMMDD.gz will not be accepted.

With some simple manipulation we can work around this limitation and see what happens.
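For example, a sketch of that manipulation (the paths and the lightsquid install directory are assumptions; lightparser.pl is the parsing script that ships with lightsquid):

# give the date-stamped archive a name lightsquid accepts, then parse it
cd /var/log/squid
ln -s access.log-20151201.gz access.log.1.gz
cd /var/www/lightsquid && ./lightparser.pl access.log.1.gz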

Details

The report for the month (total traffic 3 TB, 110 million lines) took up 1 GB of disk space.

On the home page we see traffic by day for the current month.

When you select a day, we see a report for the day for all users:

If groups are defined, the right-hand column shows the group the user belongs to. Users who are not members of any group are placed in the group "00 no in group" (they are marked with a question mark in this report).

When you click grp next to the corresponding date on the home page, you get to a report page with the users divided into groups. Those not included in any group are listed first, then the groups in order.

When you click on the name of a group in the table on the right, we go below to the place on the page where the report for this group begins:

When you click on “Top sites report” we get a report on popular sites for the day:

Big files report:

Let's move on to the table on the right.
Here you can get a list of top sites for the month and for the whole year (they look the same, so no screenshot), general statistics for the year and month, as well as statistics for the year and month by group.

Statistics for the month:

By clicking on the clock icon we can see a table of sites, access time and traffic consumed per hour:

Statistics for the day are displayed here, but for the month and for the year it will look approximately the same, hourly statistics for domains will be summed up.

When you click on the graph icon, we can see the user’s traffic consumption during the month:

The graph columns are clickable: when you click on a column, you go to the user’s statistics for another day.

By clicking on [M], we will receive a report on the user’s traffic consumption during the month, indicating the volume for each day and for the full week.

When you click on the user's name, we get a list of sites that the user visited in descending order of traffic:

Well, that seems to be all. Everything is simple and concise. IP addresses can be converted to domain names, and with regular expressions domain names can be collapsed to 2nd-level domains; just in case, here is an approximate regular expression:

$url =~ s/([a-z]+:\/\/)??([a-z0-9\-]+\.){0,}([a-z0-9\-]+\.){1}([a-z]+)(.*)/$3$4/o;

If you have skills in perl, you can customize it to suit your needs.

SquidAnalyzer

A program similar to lightsquid, also written in Perl, with a prettier design. The latest version, 6.4, was released in mid-December of this year and contains many improvements. Program website: squidanalyzer.darold.net.

SquidAnalyzer can use multiple processor cores (the -j option), which makes report generation faster, but this only applies to uncompressed files; packed files (the gz format is supported) are processed on a single core.
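For example (a sketch; squid-analyzer is the command-line script installed by the package, and the log path is an assumption):

squid-analyzer -j 4 /var/log/squid/access.log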

And one more comparison with lightsquid: the same report on the same server took about a day and occupies 3.7 GB on disk.

Just like lightsquid, SquidAnalyzer will not be able to merge two or more log files from different servers for the same period.

More details

Home page - you can select the year of the report.

When any period (year, month, week, day) is selected, the web pages look similar: at the top is a menu with the reports MIME types, Networks, Users, Top Denied, Top URLs, Top Domains. Below that are the proxy statistics for the selected period: Requests (Hit/Miss/Denied), Megabytes (Hit/Miss/Denied), Total (Requests/Megabytes/Users/Sites/Domains). Below that is a graph of the number of requests and of the traffic for the period.

In the top right corner there is a calendar. When you select a month, you can see brief statistics and a load graph by day:

The calendar allows you to select a week. When selected, we will see similar statistics:

When you select a day, you see statistics by hour:

Content Type Report:

Networks report.

User report.

When you select a user, we get his statistics for the period.

Prohibited resources:

Report on 2nd level domains.

For my part, I would note that the program becomes very slow as information accumulates: with every new log, the statistics for the week, month and year are recalculated. So I would not recommend this program for processing logs from a server with a large amount of traffic.

ScreenSquid

This program has a different logic: the log is imported into a MySQL database, and data is then queried from it while you work in the web interface. The database containing the ten-day log mentioned earlier occupies 1.5 GB.

More details

The program cannot import log files with arbitrary names; it only accepts access.log.

Home page:

Brief statistics:

You can create aliases for IP addresses:

... and then they can be combined into groups:

Let's move on to the main thing - reports.

On the left is a menu with report types:

User traffic logins
IP address user traffic
Website traffic
Top sites
Top users
Top IP addresses
By time of day
User traffic logins expanded
IP address user traffic extended
IP address traffic with resolution
Popular sites
Who downloaded large files
Traffic by period (days)
Traffic by period (day name)
Traffic by period (months)
HTTP statuses
Login IP addresses
Logins from IP addresses

Examples of reports.

IP address user traffic:

Website traffic:

Top sites:

... Further, to be honest, I did not have the patience to study the rest of the features, since the pages started taking 3-5 minutes to generate. The "time of day" report for a day whose log had not been imported at all took more than 30 seconds to build; for a day with traffic, 4 minutes:


One of the pressing issues for a system administrator is obtaining statistics on Internet use in the organization. With such data you can always answer management's question "where did all the Internet go", justify the need to expand the channel, and promptly detect and stop unwanted traffic. Today we will look at such a solution for the Ubuntu Server platform.

The main type of traffic we are interested in is HTTP, which makes up the lion's share of an organization's incoming Internet traffic and is the most interesting, since it lets us judge users' activity and preferences (and how they spend their working hours). All the data we need is already in the Squid proxy server logs, but we are not going to read them by hand! We need a tool that analyzes these logs and produces reports. One such tool is SARG - Squid Analysis Report Generator, as its name says.

Let's get started. Before installing SARG, the server needs to be prepared: the utility produces reports in HTML format, so you will need a web server to view them. If you are not going to use the router as a full-fledged web server, the lightweight lighttpd will be enough:

sudo apt-get install lighttpd

The server starts working immediately after installation; to check, type the server's address in your browser and you will see the default page. By default lighttpd accepts connections on all interfaces, which does not suit us at all; we will limit it to the internal network. Open the configuration file /etc/lighttpd/lighttpd.conf, find the following option and bring it to this form:

server.bind = "10.0.0.1"

where 10.0.0.1 is the internal address of the router, also do not forget to uncomment this line and restart the web server:

sudo /etc/init.d/lighttpd restart

Install SARG:

sudo apt-get install sarg

Setting up the log analyzer is quite simple and comes down to choosing the language, encoding and format of the report, as well as the path for its placement. We make all changes to the file /etc/sarg/sarg.conf:

language Russian_UTF-8
graphs yes
graph_days_bytes_bar_color orange
output_dir /var/www/squid-reports
charset UTF-8

We also find and comment out the line:

#site_user_time_date_type table

Now we can check the operation of the analyzer:

sudo /usr/bin/sarg

After the utility finishes, open http://10.0.0.1/squid-reports in your browser; you should see the following page:

By default, SARG generates a report for the entire available period; the report contains details on users (addresses) and the sites they visited, traffic and cache usage, and downloads. Separately, you can view the most visited sites; this report sorts sites not by traffic, but by the number of visits.

You can get comprehensive statistics for each user:

You can also view the traffic consumption graph and operation statistics by date and time.

If you wish, you can customize how the reports look: the report output parameters in the SARG configuration use standard HTML tags and are well documented. If you know HTML at a basic level, this operation should not cause you any difficulties.

The analyzer is configured and working, which is good. But running it manually every time is not very interesting, so we’ll configure the system to receive daily, weekly and monthly reports. To do this, open the file /etc/sarg/sarg-reports.conf and indicate the path for posting reports, as well as the address and link for the logo.

HTMLOUT=/var/www/squid-reports
LOGOIMG=/squid-reports/logo.png
LOGOLINK="http://10.0.0.1/squid-reports"

Please note that the logo image must be located within the web server root folder (/var/www) and the paths are specified from the web server root, not the file system.

Now let's set a schedule for generating reports, which needs to be added to /etc/crontab

00 09-18 * * * root sarg-reports today
00 22 * * * root sarg-reports daily
30 22 * * 0 root sarg-reports weekly
30 23 1 * * root sarg-reports monthly

This schedule means that every hour from 9:00 to 18:00 (the organization's working day) the script generates today's statistics, every day at 22:00 statistics for the day are generated, at 22:30 on Sundays statistics for the week, and on the first day of each month at 23:30 statistics for the month.
