In this guide, we will set up a monitoring system that collects metrics from all our servers and visualizes them in Grafana. You can also install Alertmanager to receive notifications about server problems; in our case, they are sent to Telegram.
We need a separate server for Prometheus, Grafana, Node Exporter, and Alertmanager. All other servers only need Node Exporter installed.
Prometheus is an open source time series database written in Go. An interesting feature of Prometheus is that it pulls (scrapes) metrics from a given set of services instead of waiting for them to be pushed. Because of this, there are no incoming data queues that could get clogged, so monitoring is unlikely to become a bottleneck in the system.
Node Exporter is a service whose job is to export machine information in a format that Prometheus can understand. There are actually many other exporters for Prometheus, but Node Exporter is perfect for our purposes of server monitoring
Grafana is an open source web frontend to various time series databases, such as Graphite, InfluxDB, and, of course, Prometheus. With Grafana, we can view clear, attractive graphs in the browser. Prometheus also has its own web interface, but even the Prometheus developers themselves recommend using Grafana.
You can find the latest binaries along with their checksums on the Prometheus download page.
# use the --no-create-home and --shell /bin/false parameters
# so that this user cannot log into the server
useradd --no-create-home --shell /bin/false node_exporter

cd
wget https://github.com/prometheus/node_exporter/releases/download/v1.5.0/node_exporter-1.5.0.linux-amd64.tar.gz
tar xvf node_exporter-1.5.0.linux-amd64.tar.gz
# copy the binaries to /usr/local/bin
cp node_exporter-1.5.0.linux-amd64/node_exporter /usr/local/bin
chown node_exporter:node_exporter /usr/local/bin/node_exporter
node_exporter --version
# node_exporter, version 1.5.0 (branch: HEAD, revision: 1b48970ffcf5630534fb00bb0687d73c66d1c959)
# delete unnecessary files
rm -r node_exporter-*
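Since the download page publishes checksums, it can be worth verifying the archive before unpacking it. The following is a minimal sketch, assuming GNU coreutils `sha256sum` is installed; `verify_sha256` is a hypothetical helper name, and the expected hash has to be copied from the download page yourself (none is filled in here).

```shell
#!/bin/sh
# verify_sha256 FILE EXPECTED_HEX  (hypothetical helper, not part of any tool)
# Exits with status 0 when the SHA-256 of FILE equals EXPECTED_HEX, non-zero otherwise.
verify_sha256() {
    file=$1
    expected=$2
    # sha256sum prints "<hash>  <file>"; keep only the hash field
    actual=$(sha256sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}

# Example usage (replace the placeholder with the hash from the download page):
# verify_sha256 node_exporter-1.5.0.linux-amd64.tar.gz "<SHA256_FROM_DOWNLOAD_PAGE>" \
#     && tar xvf node_exporter-1.5.0.linux-amd64.tar.gz
```

Chaining the extraction with `&&` means the archive is only unpacked when the hash matches.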
For security purposes, let's create a prometheus account.
useradd --no-create-home --shell /bin/false prometheus
# create the necessary directories to store Prometheus files and data
mkdir /etc/prometheus
mkdir /var/lib/prometheus
# set user and group permissions in new directories for user prometheus
chown prometheus:prometheus /etc/prometheus
chown prometheus:prometheus /var/lib/prometheus
Now we need to configure prometheus.yml and add one or more entries to it, depending on the number of servers and nodes being monitored.
Important: the Prometheus configuration file uses YAML, which forbids tabs for indentation. Use spaces instead and keep the indentation consistent (two spaces per level is the usual convention). Prometheus will not start if the configuration file is not formatted correctly.
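A quick pre-check can catch stray tabs before Prometheus refuses to start. This is a small sketch only: `has_tabs` is a hypothetical helper name, and the `promtool` line assumes the `promtool` binary from the Prometheus archive is on your PATH, which this guide does not set up explicitly.

```shell
#!/bin/sh
# has_tabs FILE  (hypothetical helper)
# Exits with status 0 if FILE contains at least one tab character, non-zero otherwise.
has_tabs() {
    tab=$(printf '\t')
    grep -q "$tab" "$1"
}

# Example usage with the path used in this guide:
# if has_tabs /etc/prometheus/prometheus.yml; then
#     echo "prometheus.yml contains tabs, replace them with spaces"
# fi
#
# A full syntax check, assuming promtool is installed:
# promtool check config /etc/prometheus/prometheus.yml
```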
Open the default config and edit it so that it contains your scrape targets
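As an illustration of the general shape, here is a minimal sketch of such a configuration, printed by a shell function so that you can review it before writing it anywhere. Everything specific in it is an assumption: the job names, the 15s scrape interval, Prometheus listening on port 8080 (the port used elsewhere in this guide), Node Exporter on port 9100, and the hypothetical helper name `print_prometheus_config`. `<IP_OTHER_SERVER>` is a placeholder.

```shell
#!/bin/sh
# print_prometheus_config  (hypothetical helper)
# Prints an example prometheus.yml to stdout. All values are illustrative assumptions.
print_prometheus_config() {
cat <<'EOF'
global:
  scrape_interval: 15s

scrape_configs:
  # Prometheus scraping itself (port 8080 as used in this guide)
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:8080']

  # Node Exporter on the monitoring server; add more servers to the list
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']
      # - targets: ['<IP_OTHER_SERVER>:9100']
EOF
}

# Example usage: review the output first, then write it to a file of your choice
# print_prometheus_config > prometheus.yml.example
```

Note that the indentation uses spaces only, two per level.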
Later, you can add your other servers to prometheus.yml to monitor several servers at once. The exact configuration depends on your setup and on the JSON dashboard you use in Grafana.
Now we need to specify the URL of our Prometheus server in the URL field. If Prometheus is on the same server as Grafana, set http://localhost:8080 and save the settings (Prometheus listens on port 9090 by default; this guide runs it on 8080).
Now for the most interesting part: we need to load a JSON file for our dashboard. To do this, you need to know the dashboard ID, download the JSON from someone else, or create it yourself.
If we have downloaded the JSON file, we upload it ourselves.
You can load several JSON dashboards and switch between them, or create your own, which takes some time.
Step 6 - Install Alertmanager
First, let's set up the Prometheus rules. To do this, create a rules.yml file in the /etc/prometheus/ directory, which will contain our alert rules. In our configuration file, we will monitor whether Node Exporter is running, as well as CPU load, RAM, and disk space.
nano /etc/prometheus/rules.yml
groups:
  - name: CriticalAlerts
    rules:
      - alert: UPTIME_DOWN
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          description: '🚨𝐒𝐄𝐑𝐕𝐄𝐑 𝐢𝐬 𝐃𝐎𝐖𝐍 {{ $labels.instance }} '
          summary: 'SERVER IS UNAVAILABLE FOR MORE THAN 1 MINUTE. CHECK node_exporter AND THE SERVER ITSELF'
      - alert: DISK_space_usage_is_High
        expr: 100 * (1 - (node_filesystem_avail_bytes / node_filesystem_size_bytes{mountpoint="/"})) > 85
        for: 1m
        labels:
          severity: critical
        annotations:
          description: '🚨𝐇𝐈𝐆𝐇 𝐃𝐈𝐒𝐊 𝐔𝐒𝐀𝐆𝐄 {{ $labels.instance }} '
          summary: 'DISK IS 85 PERCENT FULL. CHECK DISK'
      - alert: CPU_usage_is_High
        # avg by (instance) keeps the instance label, so the alert is evaluated
        # per server and {{ $labels.instance }} is filled in
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
        for: 1m
        labels:
          severity: critical
        annotations:
          description: '🚨𝐇𝐈𝐆𝐇 𝐂𝐏𝐔 𝐔𝐒𝐀𝐆𝐄 {{ $labels.instance }} '
          summary: 'CPU USAGE IS OVER 85 PERCENT. CHECK SERVER'
      - alert: RAM_usage_is_High
        expr: 100 * (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) > 85
        for: 1m
        labels:
          severity: critical
        annotations:
          description: '🚨𝐇𝐈𝐆𝐇 𝐑𝐀𝐌 𝐔𝐒𝐀𝐆𝐄 {{ $labels.instance }} '
          summary: 'RAM USAGE IS OVER 85 PERCENT. CHECK SERVER'
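The disk and RAM rules share one formula: 100 * (1 - available / total). To get a feel for when the 85 percent threshold is crossed, the arithmetic can be checked by hand. The sketch below uses a hypothetical helper, `usage_percent`, and made-up numbers; it only mirrors the arithmetic and does not query Prometheus.

```shell
#!/bin/sh
# usage_percent AVAILABLE TOTAL  (hypothetical helper)
# Prints 100 * (1 - AVAILABLE / TOTAL) with one decimal place.
usage_percent() {
    awk -v avail="$1" -v total="$2" \
        'BEGIN { printf "%.1f\n", 100 * (1 - avail / total) }'
}

# Made-up example: 10 GiB available out of 80 GiB total
usage_percent 10 80   # prints 87.5
```

With these numbers the result is above 85, so the rule's condition holds; the alert itself fires only once the condition has stayed true for the `for: 1m` period.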
Next, add information about our rules to the existing Prometheus config
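A minimal sketch of the sections that typically get added to prometheus.yml for this purpose is shown below, again printed by a shell function for review. The rules path is the one created above; the Alertmanager address `localhost:9093` (Alertmanager's default port, on the same server) and the helper name `print_alerting_config` are assumptions.

```shell
#!/bin/sh
# print_alerting_config  (hypothetical helper)
# Prints the rule and alerting sections for prometheus.yml to stdout.
# The Alertmanager target is an assumption (default port, same server).
print_alerting_config() {
cat <<'EOF'
rule_files:
  - '/etc/prometheus/rules.yml'

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
EOF
}

# Example usage: print the fragment and merge it into prometheus.yml by hand
# print_alerting_config
#
# The rules file can be validated first, assuming promtool is installed:
# promtool check rules /etc/prometheus/rules.yml
```

Both keys belong at the top level of prometheus.yml, next to `global` and `scrape_configs`.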
To protect your data, you should at least close these ports to outsiders. Without this, anyone who knows the server IP and port can read Node Exporter or Prometheus data. We can configure UFW so that Prometheus and Grafana are reachable only from the IP we need (for example, a home PC), and Node Exporter is reachable only from the server where Prometheus is installed.
# Allow receiving data from Grafana only from home IP
ufw allow from <YOUR_IP> to any port 3000
# Allow Prometheus to receive data only from home IP
ufw allow from <YOUR_IP> to any port 8080
# Allow receiving data from Node Exporter only for the server with Prometheus
ufw allow from <IP_PROMETHEUS> to any port 9100