
Introduction

Prometheus Alertmanager is a crucial component in the Prometheus monitoring ecosystem, responsible for handling alerts generated by the Prometheus server. In this blog post, we’ll dive into the key aspects of Alertmanager, its role in managing alerts, and how it contributes to effective incident response.

Key Features:

Alert Grouping:

  • Alertmanager intelligently groups similar alerts, preventing alert fatigue and providing a more streamlined view for operators.
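
As a rough sketch, grouping is controlled in the route section of alertmanager.yml; the label names, timings, and the receiver name default-receiver below are illustrative assumptions, not prescriptive values:

route:
  group_by: ['alertname', 'instance']  # alerts sharing these labels are batched into one notification
  group_wait: 30s                      # wait before sending the first notification for a new group
  group_interval: 5m                   # wait before notifying about new alerts added to an existing group
  receiver: 'default-receiver'         # assumed receiver name, defined under receivers: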

Silencing Alerts:

  • Operators can silence specific alerts temporarily, allowing for scheduled maintenance or when certain alerts are expected and don’t require immediate attention.
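
For example, a temporary silence can be created from the command line with amtool, which ships with Alertmanager; the matcher values and duration below are placeholders, and the Alertmanager URL is assumed to be the local default:

$ amtool silence add alertname="InstanceDown" instance="10.0.0.5:9100" --duration="2h" --comment="planned maintenance" --author="ops" --alertmanager.url=http://localhost:9093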

Notification Routing:

  • The tool supports flexible notification routing, enabling alerts to be sent to appropriate channels or recipients based on predefined configurations.

Integration with Prometheus:

  • Seamless integration with Prometheus allows Alertmanager to receive alerts and execute actions based on the defined rules and configurations.
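
On the Prometheus side, this integration is declared in the alerting block of prometheus.yml; the target below assumes Alertmanager is running locally on its default port 9093:

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'localhost:9093'   # replace with your Alertmanager host:port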

Configurations:

Alert Routing:

  • Explore how to set up routing trees to direct alerts to the right team or individual based on severity or type.
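
As a minimal sketch, a routing tree might look like the following; the team label and the receiver names (default-team, oncall-pager, dba-email) are assumptions for illustration and must exist under receivers:

route:
  receiver: 'default-team'        # fallback receiver for anything not matched below
  routes:
    - match:
        severity: page
      receiver: 'oncall-pager'    # paging alerts go to the on-call channel
    - match:
        team: database
      receiver: 'dba-email'       # alerts labelled team=database go to the DBA team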

Inhibition Rules:

  • Learn about inhibition rules and how they prevent unnecessary alerts by suppressing dependent alerts when a higher-level alert is triggered.
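
For example, the inhibition rule below (a sketch, assuming severity values of critical and warning) suppresses warning alerts for an instance while a critical alert is firing on the same instance:

inhibit_rules:
  - source_match:
      severity: 'critical'   # if a critical alert is firing...
    target_match:
      severity: 'warning'    # ...mute warning alerts...
    equal: ['instance']      # ...that share the same instance label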

Notification Templates:

  • Customise alert notifications with templates, allowing operators to receive informative and actionable alerts.

Best Practices:

Effective Labeling:

  • Utilise Prometheus labels effectively to enhance alert grouping and ensure alerts are directed to the right teams.
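
As an illustration (the team value and receiver name are assumptions), a label attached in the Prometheus rule can drive routing in Alertmanager:

# In the Prometheus alert rule (fragment):
    labels:
      severity: warning
      team: platform                  # label used for routing (example value)

# In alertmanager.yml (fragment):
    routes:
      - match:
          team: platform
        receiver: 'platform-team'     # assumed receiver name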

Silence Rules:

  • Implement silence rules judiciously, understanding when to use them and when they might impact incident response negatively.

Testing Configurations:

  • Develop a testing strategy for Alertmanager configurations to ensure that changes won’t lead to unexpected behaviours during critical incidents.
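
A simple starting point is to validate the configuration file with amtool (shipped alongside Alertmanager) before reloading the service; the path assumes the layout used later in this post:

$ amtool check-config /etc/alertmanager/alertmanager.yml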

Real-world Use Cases:

I. Prometheus Alertmanager Installation

(i) Download Alertmanager:

Start by downloading the latest version of Prometheus Alertmanager from the official releases page.

https://github.com/prometheus/alertmanager

$ wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
$ tar -xvf alertmanager-0.26.0.linux-amd64.tar.gz
$ cd alertmanager-0.26.0.linux-amd64

(ii) Set Up the Alertmanager systemd Service

Create a dedicated user and group for Alertmanager so that the service runs with restricted permissions.

$ groupadd -f alertmanager
$ useradd -g alertmanager --no-create-home --shell /bin/false alertmanager

Create a directory under /etc to store the configuration and template files, and change its ownership so that only the alertmanager user can manage it.

$ mkdir -p /etc/alertmanager/templates
$ chown -R alertmanager:alertmanager /etc/alertmanager

Copy the alertmanager and amtool (a configuration syntax checking utility) binaries to /usr/bin and change their owner and group to alertmanager. Likewise, copy the configuration file alertmanager.yml to /etc/alertmanager and change its owner and group to alertmanager.

$ cp alertmanager /usr/bin/
$ cp amtool /usr/bin/
$ chown alertmanager:alertmanager /usr/bin/alertmanager
$ chown alertmanager:alertmanager /usr/bin/amtool
$ cp alertmanager.yml /etc/alertmanager/alertmanager.yml
$ chown alertmanager:alertmanager /etc/alertmanager/alertmanager.yml

(iii) Run Alertmanager:

You can first verify the setup by launching Alertmanager manually with the configuration file:

./alertmanager --config.file=alertmanager.yml

To run it as a managed service, create a unit file named alertmanager.service in /etc/systemd/system:

[Unit]
Description=AlertManager
Wants=network-online.target
After=network-online.target
[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/bin/alertmanager \
  --config.file=/etc/alertmanager/alertmanager.yml \
  --storage.path=/etc/alertmanager/
[Install]
WantedBy=multi-user.target

After saving the unit file, reload the systemd daemon and start the Alertmanager service. Enable the service so that it starts automatically after a reboot, with no manual restart needed.

systemctl daemon-reload
systemctl start alertmanager.service
systemctl enable alertmanager.service

(iv) To access the Prometheus Alertmanager dashboard in a browser, use the URL below, replacing <alertmanager-ip> with the IP of the VM on which Alertmanager was installed.

http://<alertmanager-ip>:9093
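
If the page does not load, a quick check against Alertmanager's built-in health endpoint can confirm whether the service itself is up:

$ curl http://<alertmanager-ip>:9093/-/healthy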

II. Create Prometheus Rules

Alerting rules are what cause alerts to fire: when a rule's expression holds true for the configured duration, Prometheus sends the resulting alert to Alertmanager.

Below are some basic alert rules:

$ vim /etc/prometheus/alert-rules.yml

groups:
- name: alert_rules
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 30s
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 30 seconds."

  - alert: HostOutOfMemory
    expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}
    for: 10s
    labels:
      severity: warning
    annotations:
      summary: Host out of memory (instance {{ $labels.instance }})
      description: "Node memory is filling up (< 10% left)n  VALUE = {{ $value }}n  LABELS = {{ $labels }}"
  - alert: HostHighCpuLoad
    expr: (sum by (instance) (avg by (mode, instance) (rate(node_cpu_seconds_total{mode!="idle"}[2m]))) > 0.8) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: Host high CPU load (instance {{ $labels.instance }})
      description: "CPU load is > 80%n  VALUE = {{ $value }}n  LABELS = {{ $labels }}"
  - alert: Jenkins_Service_Down
    expr: node_systemd_unit_state{name="jenkins.service",state="active"} == 0
    for: 1s
    annotations:
      summary: "Instance {{ $labels.instance }} is down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} is down."

Once the alert rules are added, restart prometheus.service. The alert rules will then appear under the Alerts tab in the Prometheus console.
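
Note that Prometheus only evaluates rule files listed under rule_files in prometheus.yml, so the new file has to be referenced there before the restart; the rules can also be validated first with promtool (the path matches the example above):

# /etc/prometheus/prometheus.yml
rule_files:
  - /etc/prometheus/alert-rules.yml

$ promtool check rules /etc/prometheus/alert-rules.yml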

Create a new email template file and add the following to email.tmpl:

sudo vi /etc/alertmanager/templates/email.tmpl
{{ define "email" }}
<html>
   <head>
      <style type="text/css">
         table {
         font-family: verdana,arial,sans-serif;
         font-size:11px;
         color:#333333;
         border-width: 1px;
         border-color: #999999;
         border-collapse: collapse;
         }
         table th {
         background-color:#ff6961;
         border-width: 1px;
         padding: 8px;
         border-style: solid;
         border-color: #F54C44;
         }
         table td {
         border-width: 1px;
         padding: 8px;
         border-style: solid;
         border-color: #F54C44;
         text-align: right;
         }
      </style>
   </head>
   <body>
      <table border=1>
         <thead>
            <tr>
               <th>Alert name</th>
               <th>Host</th>
               <th>Summary</th>
               <th>Description</th>
            </tr>
         </thead>

         <tbody>
            {{ range .Alerts }}
            <tr>
               <td>{{ .Labels.alertname }}</td>
               <td>{{ .Annotations.host }}</td>
               <td>{{ .Annotations.summary }}</td>
               <td>{{ .Annotations.description }}</td>
            </tr>
            {{ end }}
         </tbody>

      </table>
  </body>
</html>

{{end}}
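
To actually use this template, alertmanager.yml needs to load the templates directory and reference the "email" template from an email receiver; the SMTP host, addresses, and credentials below are placeholders to replace with your own:

templates:
  - '/etc/alertmanager/templates/*.tmpl'

receivers:
  - name: 'email-team'
    email_configs:
      - to: 'team@example.com'
        from: 'alertmanager@example.com'
        smarthost: 'smtp.example.com:587'
        auth_username: 'alertmanager@example.com'
        auth_password: 'CHANGE_ME'
        html: '{{ template "email" . }}'
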
Conclusion

We want you to feel confident that you’re getting the most out of your reading time, and that you’ll leave with a better understanding of the subject matter. So relax, and let us do the heavy lifting for you. With iDevopz you can count on getting the information you need in a friendly, accessible way.

If you found this blog helpful, please feel free to share it with your friends. Thanks for checking it out.