Analyze Nginx access logs with PostgreSQL

By John Doe, March 14, 2023

Summary: PostgreSQL supports the json and jsonb data types, and Nginx supports log output in JSON format. Combining the two makes it easy to analyze Nginx access logs with PostgreSQL.


Configure Nginx to output logs in JSON format

Modify the Nginx configuration file /etc/nginx/nginx.conf to output access logs in JSON format:

##
# Logging Settings
##
# Split $time_iso8601 (e.g. 2023-03-14T14:47:07+08:00) into date, time,
# and timezone parts, and extract the milliseconds from $msec
map $time_iso8601 $date {
  ~([^T]+) $1;
}
map $time_iso8601 $time {
  ~T([0-9:]+)[\+\-] $1;
}
map $time_iso8601 $timezone {
  ~([\+\-][0-9:]+)$ $1;
}
map $msec $millisec {
  ~\.([0-9]+)$ $1;
}

# Define a JSON log format; escape=json makes Nginx escape special
# characters in variable values so the output stays valid JSON
log_format json_logger escape=json
'{'
    '"time":"$date $time.$millisec$timezone",'
    '"client_addr":"$remote_addr",'
    '"host":"$http_host",'
    '"path":"$request_uri",'
    '"request":"$request",'
    '"status":"$status",'
    '"content_type":"$sent_http_content_type",'
    '"body_bytes_sent":"$body_bytes_sent",'
    '"http_referrer":"$http_referer",'
    '"user_agent":"$http_user_agent"'
'}';

access_log /var/log/nginx/access.log json_logger;
error_log /var/log/nginx/error.log;
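
With this configuration, each request will be logged as a single JSON object per line. An entry will look roughly like this (the values here are illustrative):

{"time":"2023-03-14 14:47:07.832+08:00","client_addr":"203.0.113.10","host":"example.com","path":"/","request":"GET / HTTP/1.1","status":"200","content_type":"text/html; charset=utf-8","body_bytes_sent":"4320","http_referrer":"","user_agent":"Mozilla/5.0 ..."}

Note that every value, including status and body_bytes_sent, is a JSON string; this is why we cast those fields when moving them into a typed table later.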

Restart the nginx service to load the new configuration:

# systemctl restart nginx
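
If in doubt, you can check the configuration for syntax errors (including invalid map regular expressions) before restarting:

# nginx -t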

Import the JSON-formatted log file

Use psql to connect to your database:

$ psql -U postgres -d postgres

First, create a temporary table to import the data into. We use jsonb rather than json because jsonb stores values in a decomposed binary format that is faster to query.

CREATE TEMPORARY TABLE json_logs (log jsonb);

Once the table is created, let’s import our data.

We will use psql's \copy meta-command, which reads the local file on the client and streams it through the server's COPY into the temporary table:

\copy json_logs (log) FROM '/var/log/nginx/access.log';
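
One caveat: in its default text format, COPY interprets backslash sequences itself, so the backslashes that escape=json inserts (for quotes inside user agents, for example) can be mangled during the load. If your logs contain such characters, a common workaround, sketched here, is to load the file in CSV mode with quote and delimiter characters that cannot appear in the data:

\copy json_logs (log) FROM '/var/log/nginx/access.log' WITH (FORMAT csv, QUOTE E'\x01', DELIMITER E'\x02');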

This imports each JSON object as one row in the temporary table. You can now query the data with the jsonb operators, like this:

SELECT log->>'time' AS time, log->>'path' AS path
  FROM json_logs;

             time              |  path  
-------------------------------+--------
 2023-03-14 14:47:07.832+08:00 | /
 2023-03-14 14:47:37.893+08:00 | /zh-cn
...
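
Note that ->> always returns text, so numeric-looking fields compare as strings at this stage. Simple aggregations still work fine directly on the raw JSON; for example, counting requests per status code:

SELECT log->>'status' AS status, count(*) AS requests
  FROM json_logs
  GROUP BY status
  ORDER BY status;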

Moving the JSON data into a proper table

Now we have our JSON data, but it's all in a single jsonb column. To guarantee types and structure, it's highly recommended to move it into a proper table. An INSERT INTO ... SELECT statement makes this easy. Here's an example (you can also use an existing table):

CREATE TABLE nginx_logs (
    time timestamptz,
    client_addr varchar,
    host varchar,
    path varchar,
    request varchar,
    status smallint,
    content_type varchar,
    body_bytes_sent integer,
    http_referrer varchar,
    user_agent varchar);

INSERT INTO nginx_logs
  SELECT (log->>'time')::timestamptz,
         log->>'client_addr',
         log->>'host',
         log->>'path',
         log->>'request',
         (log->>'status')::smallint,
         log->>'content_type',
         (log->>'body_bytes_sent')::integer,
         log->>'http_referrer',
         log->>'user_agent'
    FROM json_logs;

This will insert all the records into the nginx_logs table, and Postgres will raise an error on any type mismatch, which is very helpful for ensuring data integrity. Note that if you're inserting a large number of records, indexes and triggers on the destination table can greatly increase the time the insert takes; temporarily disabling them can speed things up significantly.
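
For example, a sketch of that approach could look like this. Note that DISABLE TRIGGER ALL requires superuser rights because it also disables internal constraint triggers, and the index name here is hypothetical:

-- Disable all triggers on the destination table for the bulk load
ALTER TABLE nginx_logs DISABLE TRIGGER ALL;

-- Indexes cannot be disabled, but they can be dropped and rebuilt;
-- nginx_logs_time_idx is a hypothetical example
DROP INDEX IF EXISTS nginx_logs_time_idx;

-- ... run the INSERT INTO nginx_logs ... SELECT shown above ...

CREATE INDEX nginx_logs_time_idx ON nginx_logs (time);
ALTER TABLE nginx_logs ENABLE TRIGGER ALL;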

Query and analyze access logs

Now that we have converted the JSON log content into a regular table, we can use PostgreSQL's powerful query capabilities to analyze the access log. For example, to count visits per hour (the content_type filter restricts the count to HTML pages, skipping assets such as images and scripts):

SELECT  date_trunc('hour', time) AS hour,
        count(time) AS access
  FROM nginx_logs
  WHERE content_type LIKE 'text/html;%'
  GROUP BY hour
  ORDER BY hour;
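
Along the same lines, a follow-up query can rank the most requested pages, for example the ten most visited paths:

SELECT path, count(*) AS visits
  FROM nginx_logs
  WHERE content_type LIKE 'text/html;%'
  GROUP BY path
  ORDER BY visits DESC
  LIMIT 10;

From here, everything SQL offers (window functions, joins against other tables, materialized views) is available for digging deeper into the traffic.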