
Prometheus is an open-source monitoring system and time series database originally built at SoundCloud. It is now a graduated Cloud Native Computing Foundation (CNCF) project, the second after Kubernetes. Prometheus is designed for reliability: it is meant to be the system you turn to during an outage so that you can diagnose problems quickly.
Prometheus identifies each time series by a metric name and a set of key-value pairs called labels. This dimensional data model allows flexible aggregation and slicing of data, which makes it well suited to monitoring complex infrastructure.
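To make the data model concrete, the following Python sketch represents each series as a metric name plus a label set and sums the latest samples by one label. The metric name `http_requests_total`, the label names and values, and the helper `sum_by` are illustrative assumptions, not part of Prometheus itself.

```python
from collections import defaultdict

# Each series is identified by a metric name plus a set of label pairs.
# All names and values below are made up for illustration.
series = {
    ("http_requests_total", frozenset({("method", "GET"), ("status", "200")})): 120.0,
    ("http_requests_total", frozenset({("method", "GET"), ("status", "500")})): 3.0,
    ("http_requests_total", frozenset({("method", "POST"), ("status", "200")})): 40.0,
}


def sum_by(samples, metric, label):
    """Sum the latest values of `metric`, grouped by one label (hypothetical helper)."""
    totals = defaultdict(float)
    for (name, labels), value in samples.items():
        if name != metric:
            continue
        # Series lacking the label are grouped under the empty string.
        key = dict(labels).get(label, "")
        totals[key] += value
    return dict(totals)


by_method = sum_by(series, "http_requests_total", "method")
by_status = sum_by(series, "http_requests_total", "status")
```

Because the labels are independent dimensions, the same samples can be grouped by `method` or by `status` without changing how they are stored.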
PromQL (Prometheus Query Language) lets users query, aggregate, and analyze time series data in real time. It supports aggregation, filtering, and mathematical transformations directly on the stored metrics.
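As a rough illustration, the sketch below builds a request URL for the instant query endpoint of the Prometheus HTTP API (`/api/v1/query`) carrying a PromQL expression. The server address `http://localhost:9090`, the metric `http_requests_total`, and the helper `instant_query_url` are assumptions for this example; no request is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# PromQL: per-second rate of 500 responses over the last 5 minutes, summed per job.
# The metric and label names are assumed examples.
EXPR = 'sum by (job) (rate(http_requests_total{status="500"}[5m]))'


def instant_query_url(base_url, expr):
    """Build an instant-query URL (hypothetical helper; sends no request)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


url = instant_query_url("http://localhost:9090", EXPR)

# Round-trip check: decoding the query string yields the original expression.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

The expression combines the three kinds of operation mentioned above: a label filter (`status="500"`), a transformation (`rate`), and an aggregation (`sum by (job)`).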
Prometheus uses a pull model over HTTP to collect metrics from instrumented targets. Combined with service discovery, this approach works well in dynamic environments like Kubernetes, and a failed scrape makes it easy to detect that an instance is down.
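From the target's side, the pull model amounts to serving the current metric values over HTTP so that the server can scrape them. The sketch below uses only the Python standard library to render a counter in the text exposition format and serve it at `/metrics`. The metric, its value, the port, and the helper names are assumptions; a real service would normally use an official client library instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed example data: metric name -> (help text, {label pairs: value}).
COUNTERS = {
    "http_requests_total": (
        "Total HTTP requests handled.",
        {(("method", "GET"), ("code", "200")): 1027},
    ),
}


def render_metrics(counters):
    """Render counters as text exposition lines (no label-value escaping)."""
    lines = []
    for name, (help_text, samples) in counters.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} counter")
        for labels, value in samples.items():
            label_str = ",".join(f'{k}="{v}"' for k, v in labels)
            selector = f"{{{label_str}}}" if label_str else ""
            lines.append(f"{name}{selector} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the scrape path is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(COUNTERS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block and serve /metrics until interrupted (port is an arbitrary choice)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint. The target never needs to know where the Prometheus server lives, and if the process stops answering, the next scrape simply fails.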
Each Prometheus server operates independently with local storage. There are no external dependencies, making deployment straightforward and eliminating complex distributed system concerns for single-node deployments.
Prometheus evaluates alerting rules against collected metrics and sends the resulting alerts to Alertmanager, a separate component that handles grouping, routing, and silencing of alerts and delivers the notifications.
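Alerting rules can require a condition to hold for some duration (the `for` clause) before an alert fires. The sketch below is a simplified model of that behavior, not Prometheus code: the function `alert_states`, the evaluation interval, and the sample data are assumptions chosen for illustration.

```python
def alert_states(condition_by_time, for_seconds):
    """Return (timestamp, state) per evaluation: 'inactive', 'pending' or 'firing'.

    `condition_by_time` is a list of (timestamp_seconds, bool) evaluations in
    time order. Simplified model of a rule's `for` clause.
    """
    states = []
    active_since = None  # when the condition most recently became true
    for ts, is_true in condition_by_time:
        if not is_true:
            # Any false evaluation resets the alert.
            active_since = None
            states.append((ts, "inactive"))
            continue
        if active_since is None:
            active_since = ts
        held = ts - active_since
        states.append((ts, "firing" if held >= for_seconds else "pending"))
    return states


# Condition evaluated every 60 s; it must hold for 120 s before firing.
evaluations = [(0, False), (60, True), (120, True), (180, True), (240, False)]
result = alert_states(evaluations, for_seconds=120)
```

In this model only firing alerts would be passed on for notification, which keeps short-lived spikes from paging anyone.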
Prometheus is written in Go and distributed as statically linked binaries, making it easy to deploy.
Prometheus integrates with hundreds of systems through exporters and has client libraries for many programming languages. It works with visualization tools like Grafana and Perses, and supports federation for scaling across multiple data centers.
Prometheus also appears in variations of the MING (Mosquitto, InfluxDB, Node-RED, Grafana) monitoring stack, where it often replaces or complements InfluxDB for metric collection and alerting.