“THANOS” — Monitoring with Prometheus and Grafana

May 31, 2021

When I started working as a DevOps Engineer, I was introduced to the concept of monitoring, whether infrastructure-level monitoring, application-level monitoring, or something similar. Consider a scenario where we have an infrastructure of more than five hundred servers. In that case we are bound to monitor the whole infrastructure proactively, efficiently, and smartly in order to meet our service level agreements and keep the system up and running. If we are any sort of service provider, we must keep an eagle’s eye on our whole stack.
When I was still new, I saw a huge infrastructure being monitored with Sensu, which taught me the basic concepts of monitoring an entire infrastructure. I also saw the transition to modern monitoring techniques, and at that point I was introduced to tools such as Sensu, Prometheus, and Grafana.

Baseline Monitoring Setup with Prometheus and Grafana

To set up a baseline monitoring stack in a containerized environment, you need a basic understanding of Docker and Prometheus. I previously wrote a blog post in which I set up this stack in a simple, easy way.

Thanos

Thanos is a set of components that can be composed into a highly available metric system with unlimited storage capacity, which can be added seamlessly on top of existing Prometheus deployments.

Why Integrate Prometheus with Thanos?

When practicing monitoring with Prometheus, I found certain capabilities and areas that needed an upgrade in order to meet requirements such as:

Storing historical data in a reliable and cost-efficient way
Accessing all metrics, whether they are new or old using a single-query API
Merging replicated data collected via Prometheus high-availability (HA) setups
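To make the last point concrete, here is a simplified Python sketch of merging HA replicas. It assumes two Prometheus replicas scrape the same target and that their series differ only by a replica label (called replica here, a hypothetical name; in the setup below that role is played by the prometheus external label). Dropping that label groups the replicas together, and the second replica is only used to fill gaps in the first. Thanos’s own deduplication logic is more elaborate; this only illustrates the idea.

```python
import bisect
from typing import Dict, List, Tuple

Labels = Tuple[Tuple[str, str], ...]  # sorted (name, value) pairs
Sample = Tuple[int, float]            # (timestamp_ms, value)


def strip_label(labels: Labels, name: str) -> Labels:
    """Return the label set without `name` (e.g. the replica label)."""
    return tuple((n, v) for n, v in labels if n != name)


def fill_gaps(primary: List[Sample], secondary: List[Sample], max_gap_ms: int) -> List[Sample]:
    """Keep every primary sample; add a secondary sample only when no primary
    sample lies within max_gap_ms of it (i.e. the primary replica had a gap)."""
    ts = [t for t, _ in primary]  # primary is assumed to be sorted by time
    out = list(primary)
    for t, v in secondary:
        i = bisect.bisect_left(ts, t)
        nearest = min((abs(ts[j] - t) for j in (i - 1, i) if 0 <= j < len(ts)), default=None)
        if nearest is None or nearest > max_gap_ms:
            out.append((t, v))
    return sorted(out)


def dedup(series: Dict[Labels, List[Sample]], replica_label: str,
          max_gap_ms: int) -> Dict[Labels, List[Sample]]:
    """Group series that differ only by replica_label and merge each group."""
    merged: Dict[Labels, List[Sample]] = {}
    for labels in sorted(series):  # deterministic: the lowest replica value is primary
        key = strip_label(labels, replica_label)
        if key in merged:
            merged[key] = fill_gaps(merged[key], series[labels], max_gap_ms)
        else:
            merged[key] = sorted(series[labels])
    return merged
```

Preferring one replica and only borrowing from the other across gaps avoids interleaving two slightly offset scrape timelines into one jagged series.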

Thanos comes into play here: it allows a user to run multiple instances of Prometheus, deduplicate their data, and archive data in long-term storage such as AWS S3 or other object storage providers. Thanos introduces several key components to cover these shortcomings and take monitoring to the next level. Let’s look at each of them individually to better understand the whole stack.

The Sidecar runs alongside each Prometheus server. It exposes that server’s data to the rest of Thanos and uploads completed TSDB blocks to object storage.

The Querier answers PromQL queries. It pulls data from the different Prometheus servers (through their sidecars) and from the store gateway, merges it, and deduplicates series that differ only by a replica label.

The Store gateway serves the historical data that has already been uploaded to object storage.

The Compactor works on the data the sidecars have uploaded to object storage. It compacts blocks into larger ones and downsamples huge metric data sets into lower-resolution aggregates.

The last component is the Ruler, which evaluates recording and alerting rules globally across the Prometheus servers and stores its results in the shared object store for high availability.
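The Compactor’s downsampling can be pictured with a small sketch: raw samples are grouped into fixed 5-minute windows, and each window keeps aggregates instead of every sample. Thanos stores count, sum, min, max and a counter-specific aggregate per window; the sketch below computes only the first four, and the function name and data layout are hypothetical.

```python
from typing import Dict, List, Tuple

FIVE_MINUTES_MS = 5 * 60 * 1000


def downsample(samples: List[Tuple[int, float]], window_ms: int = FIVE_MINUTES_MS) -> List[dict]:
    """Aggregate (timestamp_ms, value) samples into epoch-aligned windows."""
    windows: Dict[int, dict] = {}
    for ts, value in sorted(samples):
        start = ts - ts % window_ms  # window start, aligned to a multiple of window_ms
        agg = windows.get(start)
        if agg is None:
            windows[start] = {"start": start, "count": 1, "sum": value, "min": value, "max": value}
        else:
            agg["count"] += 1
            agg["sum"] += value
            agg["min"] = min(agg["min"], value)
            agg["max"] = max(agg["max"], value)
    return [windows[start] for start in sorted(windows)]


if __name__ == "__main__":
    # Ten one-minute samples with values 0..9 collapse into two 5-minute windows.
    raw = [(i * 60_000, float(i)) for i in range(10)]
    for window in downsample(raw):
        print(window, "avg =", window["sum"] / window["count"])
```

Keeping sum and count, rather than only an average, means an average over a longer range can still be computed exactly from the downsampled windows.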

Thanos Components Architecture Overview

Here is the official and complete architecture diagram for implementing Thanos with Prometheus.

In our scenario, let’s create a simple dockerized version of the whole stack to practice implementing Thanos and its components with Prometheus. To explain the whole concept before applying and testing it in a local containerized environment, I first created a simplified block diagram.

**Block Diagram of Prometheus with Thanos**

Deployment Overview

Let’s write a docker-compose file to deploy the complete stack. We will go through the following steps:

a) Setting up a Prometheus Server.

prometheus0:
  image: prom/prometheus:v2.9.2
  container_name: prometheus0
  user: root
  volumes:
    - thanos0:/data
    - ./data/prom0/prometheus.yml:/etc/prometheus/prometheus.yml
  command:
    - "--config.file=/etc/prometheus/prometheus.yml"
    - "--storage.tsdb.path=/data/prom0"
    - "--web.enable-lifecycle"
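    # Equal min/max block durations stop Prometheus from compacting blocks locally;
    # the Thanos sidecar requires this so it can upload each 2h block as-is.
    - "--storage.tsdb.min-block-duration=2h"
    - "--storage.tsdb.max-block-duration=2h"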
    - "--web.listen-address=0.0.0.0:9090"
  networks:
    - thanos
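With --storage.tsdb.min-block-duration and --storage.tsdb.max-block-duration both set to 2h, Prometheus cuts fixed 2-hour blocks and does not merge them locally, which is what the sidecar expects before uploading. The sketch below shows which 2-hour window a given sample would land in, assuming (as Prometheus’s TSDB does) that block boundaries are aligned to multiples of the block duration since the Unix epoch; block_range and to_utc are hypothetical helpers.

```python
from datetime import datetime, timezone
from typing import Tuple

BLOCK_MS = 2 * 60 * 60 * 1000  # matches the 2h min/max block duration flags


def block_range(ts_ms: int, width_ms: int = BLOCK_MS) -> Tuple[int, int]:
    """Return the [start, end) window, in ms, of the aligned block containing ts_ms."""
    start = ts_ms - ts_ms % width_ms
    return start, start + width_ms


def to_utc(ms: int) -> str:
    """Format a millisecond timestamp as a UTC string."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d %H:%M")


if __name__ == "__main__":
    sample_ts = int(datetime(2021, 5, 31, 13, 45, tzinfo=timezone.utc).timestamp() * 1000)
    start, end = block_range(sample_ts)
    print(to_utc(start), "->", to_utc(end))  # 2021-05-31 12:00 -> 2021-05-31 14:00
```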

b) Enabling Thanos Sidecar for that specific Prometheus instance.

sidecar0:
  image: thanosio/thanos:v0.10.0
  container_name: thanos-sidecar0
  command:
    - "sidecar"
    - "--debug.name=sidecar-0"
    - "--grpc-address=0.0.0.0:10901"
    - "--http-address=0.0.0.0:10902"
    - "--prometheus.url=http://prometheus0:9090"
    - "--tsdb.path=/data/prom0"
    - "--objstore.config-file=/bucket.yml"
  volumes:
    - thanos0:/data
    - ./data/bucket.yml:/bucket.yml
  depends_on:
    - prometheus0
  networks:
    - thanos
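What the sidecar ships are Prometheus TSDB blocks: one directory per block, named by a ULID, holding an index, a chunks/ directory and a meta.json with minTime and maxTime in milliseconds. The sketch below lists the blocks found in a directory, which is a quick way to check whether uploads are reaching the bucket directory configured in bucket.yml. list_blocks is a hypothetical helper that reads only the ulid, minTime and maxTime fields of meta.json, and the demo uses fake block names rather than real ULIDs.

```python
import json
import os
import tempfile
from typing import List, Tuple


def list_blocks(root: str) -> List[Tuple[str, int, int]]:
    """Return (ulid, minTime, maxTime) for every block directory under root."""
    blocks = []
    for name in os.listdir(root):
        meta_path = os.path.join(root, name, "meta.json")
        if not os.path.isfile(meta_path):
            continue  # not a block (e.g. the wal/ directory or a lock file)
        with open(meta_path) as f:
            meta = json.load(f)
        blocks.append((meta["ulid"], meta["minTime"], meta["maxTime"]))
    return sorted(blocks, key=lambda block: block[1])  # oldest block first


if __name__ == "__main__":
    # Demo on a throwaway directory with two fake blocks and a wal/ directory.
    with tempfile.TemporaryDirectory() as root:
        for ulid, lo, hi in [("BLOCK_B", 7_200_000, 14_400_000), ("BLOCK_A", 0, 7_200_000)]:
            os.makedirs(os.path.join(root, ulid))
            with open(os.path.join(root, ulid, "meta.json"), "w") as f:
                json.dump({"ulid": ulid, "minTime": lo, "maxTime": hi}, f)
        os.makedirs(os.path.join(root, "wal"))
        print(list_blocks(root))
```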

c) Deploying Thanos Querier with the ability to talk to Sidecar.

query0:
  image: thanosio/thanos:v0.10.0
  container_name: thanos-query0
  command:
    - "query"
    - "--grpc-address=0.0.0.0:10903"
    - "--http-address=0.0.0.0:10904"
    - "--query.replica-label=prometheus"
    - "--store=sidecar0:10901"
    - "--store=store:10905"
  ports:
    - 10904:10904
  depends_on:
    - sidecar0
    - store
  networks:
    - thanos
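Besides the UI, the Querier serves the Prometheus-compatible HTTP API on its --http-address, plus a Thanos-specific dedup parameter that turns deduplication on the --query.replica-label on or off. A minimal standard-library sketch, assuming the Querier is reachable from the host at http://localhost:10904 through the ports mapping of query0; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

QUERY_URL = "http://localhost:10904"  # assumption: query0's published port


def instant_query_url(base: str, promql: str, dedup: bool = True) -> str:
    """Build a /api/v1/query URL; `dedup` is the Thanos-specific toggle."""
    params = urllib.parse.urlencode({"query": promql, "dedup": str(dedup).lower()})
    return f"{base}/api/v1/query?{params}"


def instant_query(base: str, promql: str, dedup: bool = True) -> list:
    """Run an instant query and return the result vector (needs a running Querier)."""
    with urllib.request.urlopen(instant_query_url(base, promql, dedup), timeout=5) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body}")
    return body["data"]["result"]
```

For example, instant_query(QUERY_URL, "up") returns the same vector the up query shows in the Query UI. With a single Prometheus replica as in this setup, dedup=true and dedup=false should mostly differ in whether the prometheus replica label is kept on each series.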

d) Deploying Thanos Store (the store gateway) to retrieve metrics data stored in long-term storage (in this case, the local file system).

store:
  image: thanosio/thanos:v0.10.0
  container_name: thanos-store
  restart: always
  command:
    - "store"
    - "--grpc-address=0.0.0.0:10905"
    - "--http-address=0.0.0.0:10906"
    - "--data-dir=/data/store"
    - "--objstore.config-file=/bucket.yml"
  volumes:
    - store:/data
    - ./data/bucket.yml:/bucket.yml
  networks:
    - thanos

e) Setting up Thanos Compactor for data compaction and downsampling.

compactor:
  image: thanosio/thanos:v0.10.0
  container_name: compactor
  command:
    - "compact"
    - "--wait"
    - "--data-dir=/tmp/thanos-compact"
    - "--objstore.config-file=/bucket.yml"
    - "--http-address=0.0.0.0:10902"
  volumes:
    - compact:/tmp
    - ./data/bucket.yml:/bucket.yml
  depends_on:
    - sidecar0
    - store
  networks:
    - thanos
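Downsampling does not start right away. Based on the Thanos compactor documentation, 5m-resolution data is created only for raw blocks spanning at least 40 hours, and 1h-resolution data only for 5m blocks spanning at least 10 days, so a fresh local setup like this one will only show raw data for the first couple of days. A small sketch of that rule; next_resolution is a hypothetical helper, and the thresholds come from the Thanos documentation rather than from anything configured in this compose file.

```python
from typing import Optional

HOUR_MS = 60 * 60 * 1000
RAW, FIVE_MINUTES, ONE_HOUR = 0, 5 * 60 * 1000, HOUR_MS  # resolutions in ms
MIN_SPAN_FOR_5M = 40 * HOUR_MS        # raw -> 5m
MIN_SPAN_FOR_1H = 10 * 24 * HOUR_MS   # 5m -> 1h


def next_resolution(resolution_ms: int, min_time: int, max_time: int) -> Optional[int]:
    """Return the resolution a block would be downsampled to next, or None."""
    span = max_time - min_time
    if resolution_ms == RAW and span >= MIN_SPAN_FOR_5M:
        return FIVE_MINUTES
    if resolution_ms == FIVE_MINUTES and span >= MIN_SPAN_FOR_1H:
        return ONE_HOUR
    return None
```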

f) Configuration of a node exporter container

node-exporter:
  image: prom/node-exporter:v0.18.1
  container_name: node-exporter
  volumes:
    - /proc:/host/proc:ro
    - /sys:/host/sys:ro
    - /:/rootfs:ro
  command:
    - '--path.procfs=/host/proc'
    - '--path.rootfs=/rootfs'
    - '--path.sysfs=/host/sys'
    - '--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|host|etc)($$|/)'
  ports:
    - 9100:9100
  networks:
    - thanos
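node_exporter publishes its metrics in Prometheus’s plain-text exposition format at http://localhost:9100/metrics, which is what Prometheus scrapes. A rough Python sketch of reading that format: it skips # HELP and # TYPE comment lines and splits each sample line into its series (name plus raw label set) and value. It is a simplification that keeps label sets as raw strings and drops the optional timestamp; parse_metrics is a hypothetical helper, not part of any library.

```python
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Map each series (name plus raw label set) to its sample value."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or # HELP / # TYPE metadata
        if "}" in line:
            end = line.rindex("}") + 1  # the label set ends at the last closing brace
            series, rest = line[:end], line[end:]
        else:
            series, _, rest = line.partition(" ")
        samples[series] = float(rest.split()[0])  # ignore an optional timestamp
    return samples
```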

g) Finally, here is Grafana for visualization, along with the top-level networks and volumes definitions of the docker-compose file:

grafana:
  image: grafana/grafana:6.5.2
  container_name: grafana
  environment:
    - GF_SECURITY_ADMIN_USER=${ADMIN_USER:-admin}
    - GF_SECURITY_ADMIN_PASSWORD=${ADMIN_PASSWORD:-admin}
    - GF_USERS_ALLOW_SIGN_UP=false
  restart: unless-stopped
  ports:
    - 3000:3000
  networks:
    - thanos

networks:
  thanos: {}

volumes:
  thanos0: {}
  store: {}
  compact: {}

Once the docker-compose file is complete, let’s look at the local Prometheus configuration and the local file system storage through which Thanos Store and Thanos Sidecar exchange data. The project layout looks like this:

- DevOps
    |---docker-compose.yml
    |---data
        |---bucket.yml
        |---prom0
              |---prometheus.yml

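Here is the bucket.yml file, which configures the local file system as the object store that the sidecar uploads blocks to and that the store and compactor read from. Because the FILESYSTEM provider only stands in for a real object store, sidecar0, store, and compactor must all see the same directory, and it should be separate from Prometheus’s own TSDB path. Note that /data/prom0 cannot be used here: it is Prometheus’s TSDB directory inside the thanos0 volume, and the store and compactor containers do not mount it. So add a shared bind mount to the volumes list of each of those three services:

    - ./data/bucket:/bucket

and point bucket.yml at that directory:

type: FILESYSTEM
config:
  directory: "/bucket"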

Here is the Prometheus configuration file. How many jobs to add is up to us; to have more to visualize, I have added node_exporter and the Thanos components as scrape targets as well.

global:
  external_labels:
    prometheus: prom-0

scrape_configs:
- job_name: prometheus
  scrape_interval: 5s
  static_configs:
  - targets:
     - "localhost:9090"

- job_name: 'nodeexporter'
  scrape_interval: 5s
  static_configs:
  - targets:
     - "node-exporter:9100"

- job_name: thanos-sidecar
  scrape_interval: 5s
  static_configs:
  - targets:
     - "sidecar0:10902"

- job_name: thanos-store
  scrape_interval: 5s
  static_configs:
  - targets:
     - "store:10906"

- job_name: thanos-query
  scrape_interval: 5s
  static_configs:
  - targets:
     - "query0:10904"

Let’s Execute it and See the Results

All the configuration is now complete. First, create the folder and file structure shown above, then run the following command to bring up the whole stack:

$ docker-compose up -d
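Containers can take a few seconds to become usable after docker-compose up -d returns. A small polling sketch, assuming the Querier’s query endpoint at http://localhost:10904/api/v1/query?query=up as the readiness signal (any endpoint that answers HTTP 200 would do); wait_until_ready and http_status are hypothetical helpers, and the fetch parameter only exists so the function can be exercised without a running stack.

```python
import time
import urllib.request
from typing import Callable, Optional

READY_URL = "http://localhost:10904/api/v1/query?query=up"  # assumption: query0's published port


def http_status(url: str) -> int:
    """Return the HTTP status code of a GET request (urlopen raises on 4xx/5xx)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.status


def wait_until_ready(url: str, timeout_s: float = 60.0, interval_s: float = 2.0,
                     fetch: Optional[Callable[[str], int]] = None) -> bool:
    """Poll url until it answers 200 or timeout_s elapses; return whether it became ready."""
    fetch = fetch or http_status
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if fetch(url) == 200:
                return True
        except OSError:  # connection refused, timeouts, and urllib's URLError/HTTPError
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```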

Once all the containers are up and running, let’s look at the results. First, open the Query UI at http://localhost:10904; it looks almost identical to the Prometheus UI. In the picture, I ran the simple query up to see the monitored targets.
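The same up query can be checked from a script. The sketch below summarizes an /api/v1/query response body (the JSON format shared by Prometheus and the Thanos Querier, where each result carries a metric label map and a [timestamp, "value"] pair) into up and down target counts per job. summarize_up is a hypothetical helper, and the example response in the demo is made up for illustration rather than captured from this setup.

```python
import json
from collections import Counter
from typing import Dict, Tuple


def summarize_up(body: dict) -> Dict[str, Tuple[int, int]]:
    """Map job -> (targets up, targets down) from an instant-query response for `up`."""
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error')}")
    up, down = Counter(), Counter()
    for series in body["data"]["result"]:
        job = series["metric"].get("job", "")
        # Values arrive as [unix_seconds, "string value"]; up is "1" or "0".
        if float(series["value"][1]) == 1.0:
            up[job] += 1
        else:
            down[job] += 1
    return {job: (up[job], down[job]) for job in sorted(set(up) | set(down))}


if __name__ == "__main__":
    # Illustrative response body, not output captured from this setup.
    example = json.loads("""{"status": "success", "data": {"resultType": "vector", "result": [
      {"metric": {"__name__": "up", "job": "nodeexporter", "instance": "node-exporter:9100"}, "value": [1622468700.0, "1"]},
      {"metric": {"__name__": "up", "job": "thanos-store", "instance": "store:10906"}, "value": [1622468700.0, "0"]}
    ]}}""")
    print(summarize_up(example))
```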

Everything looks correct for Prometheus and the other containers. We can also look at the raw node_exporter metrics at http://localhost:9100/metrics. Finally, the Grafana results are the most important to see: go to http://localhost:3000 and log in to Grafana (admin/admin unless ADMIN_USER and ADMIN_PASSWORD are set). Once logged in, we can add a data source; following the architecture diagram, in our case it is the Thanos Query component, which provides the metrics to Grafana.

Select the Prometheus data source type, since Thanos Query exposes a Prometheus-compatible API, and then add the Thanos Query component as its source.

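Add the Thanos Query URL to the data source and save it. Because Grafana runs in the same Docker network as the Querier, the URL is http://query0:10904.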
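The same data source can also be created through Grafana’s HTTP API (POST /api/datasources) instead of the UI. A minimal sketch, assuming the default admin/admin credentials from the compose file and Grafana published on http://localhost:3000; the data source name Thanos and the helper names are hypothetical. The data source URL is the one Grafana’s backend uses inside the Docker network, hence query0 rather than localhost.

```python
import base64
import json
import urllib.request

GRAFANA_URL = "http://localhost:3000"     # assumption: port published in the compose file
THANOS_QUERY_URL = "http://query0:10904"  # reachable from the grafana container


def datasource_request(user: str = "admin", password: str = "admin") -> urllib.request.Request:
    """Build the POST request that registers Thanos Query as a Prometheus-type data source."""
    payload = {
        "name": "Thanos",        # hypothetical display name
        "type": "prometheus",    # Thanos Query speaks the Prometheus HTTP API
        "url": THANOS_QUERY_URL,
        "access": "proxy",       # Grafana's backend, not the browser, sends the queries
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{GRAFANA_URL}/api/datasources",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


def create_datasource() -> dict:
    """Send the request (requires the running stack) and return Grafana's JSON reply."""
    with urllib.request.urlopen(datasource_request(), timeout=10) as resp:
        return json.load(resp)
```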

Finally, on the Grafana dashboard we can set up any query we want and see the results, as shown in the screenshot below. It will look more or less like this:

In A Nutshell

We have seen and tested how Thanos is used together with Prometheus and how it can be deployed with a basic Docker setup. We have seen how it interacts with Prometheus and what advantages it gives us. Thanos Sidecar and Store are clearly advantageous for scaling compared with a typical Prometheus setup, because older data no longer has to live on the Prometheus server’s local disk. Thanos Query shows how valuable a single query point over all metrics is in bigger environments. Lastly, downsampling through the Thanos Compactor is a clear performance win: queries over long time ranges can read 5-minute or 1-hour aggregates instead of every raw sample, so large datasets stay manageable.
In an upcoming blog post, I hope to deploy this stack in a Kubernetes environment and see how efficiently things work there.

Having gone through all the components of Thanos, I now fully understand all the powers Thanos has in his hands.

References:
https://github.com/thanos-io/thanos
https://dickingwithdocker.com/2019/03/thanos-prometheus-without-kubernetes/
https://grafana.com/