⚒️Node-exporter + Grafana + Prometheus + Alertmanager

In this guide, we will set up a monitoring system that collects metrics from all our servers and visualizes them in Grafana. Optionally, Alertmanager can be installed to send notifications about server problems; in our case, to Telegram
We need a separate server that runs Prometheus, Grafana, Node Exporter, and Alertmanager. All other servers only need Node Exporter
Prometheus is an open-source monitoring system and time series database written in Go. A notable feature is that Prometheus pulls (scrapes) metrics from a configured set of targets instead of having them pushed to it. Because Prometheus controls the scrape rate, the monitored services cannot flood it with data, so monitoring is much less likely to become a bottleneck in the system
Node Exporter is a service whose job is to export machine information in a format that Prometheus can understand. There are many other exporters for Prometheus, but Node Exporter is well suited to our purpose of server monitoring
Grafana is an open-source web frontend for various time series databases, such as Graphite, InfluxDB, and, of course, Prometheus. With Grafana, we can view graphs and dashboards in the browser. Prometheus also has its own web interface, but even the Prometheus developers recommend Grafana for dashboards

Node Exporter
You can find the latest binaries, along with their checksums, on the Prometheus download page
# use the --no-create-home and --shell /bin/false parameters,
# so that this user cannot log into the server
useradd --no-create-home --shell /bin/false node_exporter
cd
wget https://github.com/prometheus/node_exporter/releases/download/v1.5.0/node_exporter-1.5.0.linux-amd64.tar.gz
tar xvf node_exporter-1.5.0.linux-amd64.tar.gz
# copy the binaries to /usr/local/bin
cp node_exporter-1.5.0.linux-amd64/node_exporter /usr/local/bin
chown node_exporter:node_exporter /usr/local/bin/node_exporter
node_exporter --version
#node_exporter, version 1.5.0 (branch: HEAD, revision: 1b48970ffcf5630534fb00bb0687d73c66d1c959)
# delete unnecessary files
rm -r node_exporter-*
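Since the download page publishes checksums, it is worth comparing them before extracting an archive. The helper below is a minimal sketch, not part of the original steps: `verify_sha256` is a hypothetical name, and the expected hash is something you copy from the download page yourself.

```shell
# verify_sha256 FILE EXPECTED_HASH
# Returns 0 when the SHA-256 of FILE equals EXPECTED_HASH, 1 otherwise.
verify_sha256() {
    file=$1
    expected=$2
    actual=$(sha256sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual)" >&2
        return 1
    fi
}

# Usage (placeholder hash; copy the real value from the download page):
# verify_sha256 node_exporter-1.5.0.linux-amd64.tar.gz "<EXPECTED_SHA256>"
```

Run it right after wget and continue with tar only if it prints OK.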
Create a service file
tee /etc/systemd/system/node_exporterd.service > /dev/null <<EOF
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=node_exporter
Group=node_exporter
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=default.target
EOF
systemctl daemon-reload
systemctl enable node_exporterd
systemctl restart node_exporterd && journalctl -u node_exporterd -f -o cat

Check that the metrics are being served
curl 'localhost:9100/metrics'
Now check the metrics in the browser
echo -e "\033[0;32mhttp://$(wget -qO- eth0.me):9100/\033[0m"
# http://108.108.108.108:9100/
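The /metrics page is plain text: lines starting with # are HELP and TYPE comments, and every other line is one sample (metric name, optional labels in braces, then the value). The output is long, so a small filter helps when checking a single metric. This is a sketch; `metric_samples` is a hypothetical helper name.

```shell
# metric_samples NAME
# Reads Prometheus text-format metrics on stdin and prints only the sample
# lines for metric NAME (comment lines starting with '#' are skipped).
metric_samples() {
    awk -v name="$1" '
        /^#/ { next }
        {
            # The metric name ends at the first "{" (labels) or space (value).
            split($0, parts, /[{ ]/)
            if (parts[1] == name) print
        }'
}

# Usage against a running Node Exporter:
# curl -s localhost:9100/metrics | metric_samples node_memory_MemAvailable_bytes
```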

Prometheus, Grafana, AlertManager
Step 1 - Preparing the Server
apt update && apt upgrade -y
apt install curl iptables build-essential git wget jq make gcc nano tmux htop nvme-cli pkg-config libssl-dev libleveldb-dev tar clang bsdmainutils ncdu unzip -y
apt install python3-pip -y
pip install yq
Step 2 - Create a Prometheus User
For security purposes, let's create a prometheus account
useradd --no-create-home --shell /bin/false prometheus
# create the necessary directories to store Prometheus files and data
mkdir /etc/prometheus
mkdir /var/lib/prometheus
# set user and group permissions in new directories for user prometheus
chown prometheus:prometheus /etc/prometheus
chown prometheus:prometheus /var/lib/prometheus
Step 3 - Install Prometheus
cd && \
wget https://github.com/prometheus/prometheus/releases/download/v2.38.0/prometheus-2.38.0.linux-amd64.tar.gz && \
tar xvf prometheus-2.38.0.linux-amd64.tar.gz
cp prometheus-2.38.0.linux-amd64/prometheus /usr/local/bin/
cp prometheus-2.38.0.linux-amd64/promtool /usr/local/bin/
chown prometheus:prometheus /usr/local/bin/prometheus
chown prometheus:prometheus /usr/local/bin/promtool
cp -r prometheus-2.38.0.linux-amd64/consoles /etc/prometheus
cp -r prometheus-2.38.0.linux-amd64/console_libraries /etc/prometheus
cp prometheus-2.38.0.linux-amd64/prometheus.yml /etc/prometheus/
chown -R prometheus:prometheus /etc/prometheus
rm -rf prometheus-2.38.0.linux-amd64.tar.gz prometheus-2.38.0.linux-amd64
prometheus --version
promtool --version
Step 4 - Configure Prometheus
Now we need to edit prometheus.yml and add one or more scrape targets to it, depending on the number of servers and nodes being monitored
Important - the Prometheus configuration file uses YAML, which forbids tabs for indentation and requires the indentation to be consistent (two spaces per level is the usual convention). Prometheus will not start if the configuration file is not formatted correctly
Open the default config and bring it to the following form
nano /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:8080','localhost:9100']
Important - Prometheus listens on port 8080 here instead of its default port 9090 (set with --web.listen-address in the service file), so it scrapes itself at localhost:8080
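Because a stray tab is enough to stop Prometheus from starting, it can help to check the file before restarting the service. The sketch below uses a hypothetical helper, `check_no_tabs`, and then hands the file to `promtool check config`, which validates the full configuration.

```shell
# check_no_tabs FILE
# Prints any lines that contain a tab character and returns 1 if there are any.
check_no_tabs() {
    tab=$(printf '\t')
    if grep -n "$tab" "$1"; then
        echo "Tabs found in $1: replace them with spaces" >&2
        return 1
    fi
    return 0
}

# Usage:
# check_no_tabs /etc/prometheus/prometheus.yml && \
#     promtool check config /etc/prometheus/prometheus.yml
```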
Create a service file
tee /etc/systemd/system/prometheusd.service > /dev/null <<EOF
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
--config.file /etc/prometheus/prometheus.yml \
--storage.tsdb.path /var/lib/prometheus/ \
--web.console.templates=/etc/prometheus/consoles \
--web.console.libraries=/etc/prometheus/console_libraries \
--web.listen-address=:8080
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload &&
systemctl enable prometheusd &&
systemctl restart prometheusd && systemctl status prometheusd
#journalctl -u prometheusd -f -o cat

Check that the metrics are being served
curl 'localhost:8080/metrics'
Now check the Prometheus web interface in the browser
echo -e "\033[0;32mhttp://$(wget -qO- eth0.me):8080/\033[0m"
# http://108.108.108.108:8080/


Later, you can add your other servers to prometheus.yml to monitor several servers at once. The exact configuration depends on your setup and on the dashboard JSON you use in Grafana
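As an illustration of adding more servers, the sketch below prints one extra scrape job that lists Node Exporter on each host. The function name `node_scrape_job`, the job name, and the IP addresses in the usage line are all placeholders, not values from this guide.

```shell
# node_scrape_job JOB_NAME HOST...
# Prints a scrape_configs entry that scrapes Node Exporter (port 9100)
# on every HOST given on the command line.
node_scrape_job() {
    job=$1
    shift
    printf "  - job_name: '%s'\n" "$job"
    printf "    static_configs:\n"
    printf "      - targets:\n"
    for host in "$@"; do
        printf "          - '%s:9100'\n" "$host"
    done
}

# Usage (placeholder addresses):
# node_scrape_job servers 192.0.2.10 192.0.2.11
```

Paste the printed block under scrape_configs in /etc/prometheus/prometheus.yml, keeping the same indentation as the existing job entry, then restart Prometheus.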
Step 5 - Install Grafana
sudo apt-get install -y adduser libfontconfig1 && \
wget https://dl.grafana.com/enterprise/release/grafana-enterprise_9.1.3_amd64.deb && \
sudo dpkg -i grafana-enterprise_9.1.3_amd64.deb
systemctl daemon-reload &&
systemctl enable grafana-server &&
systemctl restart grafana-server && systemctl status grafana-server

Now it's time to go to the browser
echo -e "\033[0;32mhttp://$(wget -qO- eth0.me):3000/\033[0m"
# http://108.108.108.108:3000/





Now add Prometheus as a data source in Grafana and enter the address of the Prometheus server in the URL field. If Prometheus is on the same server as Grafana, set http://localhost:8080 and save the settings
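The data source can also be created from a file instead of through the web interface, using Grafana's provisioning directory. This is an optional sketch: the function name `write_prometheus_datasource` and the file name prometheus.yaml are assumptions, and the URL should match your own Prometheus address.

```shell
# write_prometheus_datasource URL [OUTPUT_FILE]
# Writes a Grafana data source provisioning file pointing at URL.
write_prometheus_datasource() {
    url=$1
    out=${2:-/etc/grafana/provisioning/datasources/prometheus.yaml}
    cat > "$out" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $url
    isDefault: true
EOF
}

# Usage:
# write_prometheus_datasource http://localhost:8080
# systemctl restart grafana-server
```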


If you have downloaded a dashboard JSON file, import it manually



You can import several dashboards and switch between them, or build your own, which takes some time
Step 6 - Install Alertmanager
First, let's set up the Prometheus rules. To do this, create a rules.yml file in the /etc/prometheus/ directory; it will contain our alerting rules. In this file, we will monitor the availability of Node Exporter, CPU load, RAM usage, and disk space
nano /etc/prometheus/rules.yml
groups:
  - name: CriticalAlerts
    rules:
      - alert: UPTIME_DOWN
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          description: '🚨𝐒𝐄𝐑𝐕𝐄𝐑 𝐢𝐬 𝐃𝐎𝐖𝐍 {{ $labels.instance }} '
          summary: 'SERVER IS UNAVAILABLE FOR MORE THAN 1 MINUTE. CHECK node_exporter AND THE SERVER ITSELF'
      - alert: DISK_space_usage_is_High
        expr: 100 * (1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})) > 85
        for: 1m
        labels:
          severity: critical
        annotations:
          description: '🚨𝐇𝐈𝐆𝐇 𝐃𝐈𝐒𝐊 𝐔𝐒𝐀𝐆𝐄 {{ $labels.instance }} '
          summary: 'DISK IS 85 PERCENT FULL. CHECK DISK'
      - alert: CPU_usage_is_High
        # avg by (instance) keeps the instance label used in the description
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
        for: 1m
        labels:
          severity: critical
        annotations:
          description: '🚨𝐇𝐈𝐆𝐇 𝐂𝐏𝐔 𝐔𝐒𝐀𝐆𝐄 {{ $labels.instance }} '
          summary: 'CPU USAGE IS OVER 85 PERCENT. CHECK SERVER'
      - alert: RAM_usage_is_High
        expr: 100 * (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) > 85
        for: 1m
        labels:
          severity: critical
        annotations:
          description: '🚨𝐇𝐈𝐆𝐇 𝐑𝐀𝐌 𝐔𝐒𝐀𝐆𝐄 {{ $labels.instance }} '
          summary: 'RAM USAGE IS OVER 85 PERCENT. CHECK SERVER'
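The disk and RAM rules use the same arithmetic: 100 * (1 - available / total), compared against 85. The sketch below reproduces that calculation in the shell so a threshold can be sanity-checked on a server; `used_percent` and `root_disk_used_percent` are hypothetical helper names.

```shell
# used_percent AVAILABLE TOTAL
# Same arithmetic as the disk and RAM rules: 100 * (1 - available / total).
used_percent() {
    awk -v avail="$1" -v total="$2" \
        'BEGIN { printf "%.1f\n", 100 * (1 - avail / total) }'
}

# root_disk_used_percent
# Applies used_percent to the root filesystem, using the "Available" and
# total size columns of df (both in 1K blocks).
root_disk_used_percent() {
    set -- $(df -P / | awk 'NR == 2 { print $4, $2 }')
    used_percent "$1" "$2"
}

# Example: 15 units available out of 100 is exactly at the alert threshold.
# used_percent 15 100    # prints 85.0
```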
Add the rules file and the Alertmanager address to the existing Prometheus config
nano /etc/prometheus/prometheus.yml
Add these lines
rule_files:
  - 'rules.yml'
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

Check the created rules. The output should contain SUCCESS
promtool check rules /etc/prometheus/rules.yml
After making all the changes, restart Prometheus
systemctl restart prometheusd && systemctl status prometheusd
Download the Alertmanager binaries
wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
tar -zxf alertmanager-0.26.0.linux-amd64.tar.gz
cp alertmanager-0.26.0.linux-amd64/alertmanager /usr/local/bin/
cp alertmanager-0.26.0.linux-amd64/amtool /usr/local/bin/
cp alertmanager-0.26.0.linux-amd64/alertmanager.yml /etc/prometheus
chown -R prometheus:prometheus /etc/prometheus/alertmanager.yml
chown -R prometheus:prometheus /etc/prometheus/rules.yml
chown -R prometheus:prometheus /usr/local/bin/alertmanager
chown -R prometheus:prometheus /usr/local/bin/amtool
# delete unnecessary data
cd
rm -r alertmanager-*
Set up alertmanager.yml with the bot token and the chat ID of the Telegram chat that will receive the notifications
nano /etc/prometheus/alertmanager.yml
global:
  resolve_timeout: 10s
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 15s
  repeat_interval: 60m
  receiver: 'telegram_bot'
receivers:
  - name: 'telegram_bot'
    telegram_configs:
      - bot_token: 'BOT_TOKEN'
        api_url: 'https://api.telegram.org'
        chat_id: CHAT_ID
        parse_mode: ''
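Before starting Alertmanager, it can save time to confirm that the bot token and chat ID actually work. The sketch below calls the Telegram Bot API sendMessage method directly with curl; the helper names `telegram_api_url` and `telegram_send` are hypothetical, and the token and chat ID in the usage line are placeholders.

```shell
# telegram_api_url BOT_TOKEN METHOD
# Builds the Bot API URL for the given method.
telegram_api_url() {
    printf 'https://api.telegram.org/bot%s/%s\n' "$1" "$2"
}

# telegram_send BOT_TOKEN CHAT_ID TEXT
# Sends TEXT to CHAT_ID; the API answers with a JSON document.
telegram_send() {
    curl -s -X POST "$(telegram_api_url "$1" sendMessage)" \
        --data-urlencode "chat_id=$2" \
        --data-urlencode "text=$3"
}

# Usage (placeholders):
# telegram_send "BOT_TOKEN" "CHAT_ID" "Alertmanager test message"
```

If the message arrives, the same values should work in alertmanager.yml. The file itself can be validated with amtool check-config /etc/prometheus/alertmanager.yml.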
Create a service file
tee /etc/systemd/system/alertmanager.service > /dev/null <<EOF
[Unit]
Description=alertmanager
After=network.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/alertmanager --config.file=/etc/prometheus/alertmanager.yml --log.level=debug
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable alertmanager
systemctl restart alertmanager && systemctl status alertmanager
journalctl -u alertmanager -f -o cat
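To test the Telegram delivery without waiting for a real problem, an alert can be posted to Alertmanager by hand through its v2 API. This is a sketch under assumptions: the helper names, the alert name TestAlert, and the summary text are made up for the test, and Alertmanager is assumed to listen on its default port 9093.

```shell
# test_alert_payload INSTANCE
# Prints a JSON array with one alert, in the shape /api/v2/alerts expects.
test_alert_payload() {
    printf '[{"labels":{"alertname":"TestAlert","severity":"critical","instance":"%s"},"annotations":{"summary":"Manual test alert"}}]\n' "$1"
}

# send_test_alert INSTANCE [ALERTMANAGER_URL]
# Posts the test alert to Alertmanager (default http://localhost:9093).
send_test_alert() {
    curl -s -X POST "${2:-http://localhost:9093}/api/v2/alerts" \
        -H 'Content-Type: application/json' \
        -d "$(test_alert_payload "$1")"
}

# Usage:
# send_test_alert my-server
```

With the group_wait of 10s configured above, the notification should appear in Telegram shortly after the request.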
Setting up UFW
To protect your data, at a minimum close these ports to outsiders. Otherwise, anyone who knows the server IP and port can read the Node Exporter and Prometheus data or open Grafana. We can configure UFW so that Prometheus and Grafana are reachable only from the IP we need (for example, a home PC), and Node Exporter is reachable only from the server where Prometheus is installed
# Allow access to Grafana only from the home IP
ufw allow from <YOUR_IP> to any port 3000
# Allow access to Prometheus only from the home IP
ufw allow from <YOUR_IP> to any port 8080
# Allow access to Node Exporter only from the server with Prometheus
ufw allow from <IP_PROMETHEUS> to any port 9100
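The allow rules only restrict access once UFW is enabled with a default policy that denies other incoming traffic, and SSH has to stay open so you do not lock yourself out. The sketch below prints a full rule set for the monitoring server without running anything; `ufw_rules` is a hypothetical helper, the addresses are placeholders, and SSH is assumed to be on its default port.

```shell
# ufw_rules HOME_IP PROMETHEUS_IP
# Prints the ufw commands for the monitoring server without running them.
ufw_rules() {
    home_ip=$1
    prom_ip=$2
    cat <<EOF
ufw default deny incoming
ufw default allow outgoing
ufw allow ssh
ufw allow from $home_ip to any port 3000 proto tcp
ufw allow from $home_ip to any port 8080 proto tcp
ufw allow from $prom_ip to any port 9100 proto tcp
ufw --force enable
EOF
}

# Usage (placeholder addresses): review the output first, then pipe it to sh.
# ufw_rules 203.0.113.5 192.0.2.10
# ufw_rules 203.0.113.5 192.0.2.10 | sh
```

On the servers that only run Node Exporter, only the SSH rule and the port 9100 rule are needed.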
Useful commands
Temporarily load the CPU
apt install stress
stress --cpu 4 --timeout 600s
Temporarily load RAM
apt install stress-ng
stress-ng --vm-bytes $(awk '/MemFree/{printf "%d\n", $2 * 0.9;}' < /proc/meminfo)k --vm-keep -m 1
Remove Alertmanager
systemctl stop alertmanager
systemctl disable alertmanager
rm /etc/systemd/system/alertmanager.service
systemctl daemon-reload
Remove Prometheus
systemctl stop prometheusd
systemctl disable prometheusd
rm /etc/systemd/system/prometheusd.service
systemctl daemon-reload
Remove Grafana
systemctl stop grafana-server
systemctl disable grafana-server
systemctl daemon-reload