When it comes to logging and monitoring, organizations nowadays are dealing with a colossal amount of data coming from various sources, including applications, servers, firewalls, VPNs, etc. These data are essential for doing a forensic analysis and finding use cases.

ClickHouse is highly regarded for its ability to process massive amounts of data in real-time, making it an ideal choice for logging use cases.

ClickHouse is a highly scalable, open-source database management system designed for OLAP. ClickHouse utilizes a columnar data structure and supports distributed query processing, partitioning, data replication, and sharding. This makes it an excellent choice for applications dealing with massive structured datasets, complex data reports, data science computations, and data analytics. These features make it an ideal candidate for a logging system, In this blog we will try to create a logging system based on clickhouse.

What is OLAP?

OLAP (for online analytical processing) is software for performing multidimensional analysis at high speeds on large volumes of data from a data warehouse. OLAP scenarios require real-time responses on top of large datasets for complex analytical queries with the following characteristics:

  • Datasets can be massive - billions or trillions of rows
  • Data is organized in tables that contain many columns
  • Only a few columns are selected to answer any particular query
  • Results must be returned in milliseconds or seconds

What is a Columnar database?

A columnar database is a database management system (DBMS) which stores data in columns rather than rows. The purpose of this storage method is to efficiently write and read data to and from hard disk storage, which leads to improved query performance and speed.

ClickHouse Quickstart

ClickHouse runs natively on Linux, FreeBSD, and macOS, and runs on Windows via the WSL. The simplest way to download ClickHouse locally is to run the following curl command. It determines if your operating system is supported, then downloads an appropriate ClickHouse binary:

Download the binary using the given command

curl https://clickhouse.com/ | sh

Once downloaded Start the server using the command

./clickhouse server

the Use the clickhouse-client to connect to your ClickHouse service. Open a new Terminal, change directories to where your clickhouse the binary is saved, and run the following command:

./clickhouse client

Now that we have successfully connected to the server, we can execute queries using this client

Let's define a simple table in ClickHouse to store our log entries. We'll create a table called logs with columns for the log message, timestamp, and any other relevant information.

( timestamp DateTime, log String, level String ) 
ENGINE = MergeTree() ORDER BY (timestamp);

We just started a simple click-house server with a table for logs. Now let's ingest some real logs into it using its driver SDK's. Clickhouse supports ingestion via HTTP interface and you can ingest it using the fluent bit, logstash as well

Ingesting logs

Let's use a Python script for ingesting logs into clickhouse. We'll use the click house-driver library to send data into Clickhouse.
it can be installed using pip package manager

pip install click house-driver

Please use the given Python script to write logs into Clickhouse

import clickhouse_driver
from datetime import datetime

def log_to_clickhouse(message, level='DEBUG'):
  client = clickhouse_driver.Client('localhost')
  timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
  query = f"INSERT INTO logs(timestamp, log, level) VALUES('{timestamp}', '{message}', '{level}');"

log_to_clickhouse("System shutdown", level='ERROR')
log_to_clickhouse("Something went wrong", level='ERROR')

Copy the given code into a Python file and execute it to ingest logs into click house

python main.py

Visualizing via grafana

Let's use Grafana as a visualization layer for our sample logging system. Use the given docker-compose file to spin up a Grafana instance.

version: '3.8'
    image: grafana/grafana-enterprise:latest
    container_name: grafana
    hostname: grafana
    - GF_INSTALL_PLUGINS=grafana-clickhouse-datasource
      - ""

Copy the YAML file into docker-compose.yaml and execute it via the docker command

docker-compose up

Once the stack is up you can access Grafana UI in the given URL

Now we can add the click house data source in Grafana in-order to visualize the data.

Go to connections and add Clickhouse as a data source

In the explorer section, you should be able to query from Clickhouse

Just like that, you can execute queries in the Clikhouse on top of logs like we do in the traditional logging systems. As Clickhouse follows a SQL-like language it would be easier to write queries and slice & dice the logs.

So we built a simple logging system using Clickhouse. Logs are supposed to be immutable by nature and ClickHouse works best with immutable data.  So from this blog, we can understand that Clickhouse can be used for logging. However various optimizations are required to convert it from a DB into a full full-fledged logging system.