Prometheus Performance Tuning with Recording Rules

Posted on 2021-02-04 | Category: prometheus

Discovering the problem

In a customer's k8s environment, some Grafana panels failed to display data for a 2-day time range:

[Screenshot: error]

Environment

rancher-v2.3.6
Monitoring version 0.0.7

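Troubleshooting

First, check which expressions the panel uses:

[Screenshot: grafana-1]

Next, run that expression directly in Prometheus:

[Screenshot: prom-1]

As shown, this expression took 37 seconds to evaluate, and it is only one of the panel's expressions. From this we can conclude that Grafana shows no data because its requests to Prometheus time out.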

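As the business grows, Prometheus stores more and more monitoring data and is queried more often. Once the data volume reaches a certain size, query performance degrades; in particular, when many expressions have to compute over raw metric data, PromQL queries start to time out.

According to the customer, Prometheus was configured with a 30-day retention period, the cluster is fairly large, and it exposes a lot of metrics. So when Grafana runs this expression, a large amount of time goes into evaluating it, and the request eventually times out.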

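Optimization

The first idea was to clean up old data so that less data has to be fetched, but the effect was not noticeable.

Since the time is spent evaluating the expression, can we compute its result ahead of time and store it, so that later queries read the stored result directly instead of evaluating the expression again?

We can. Prometheus provides exactly this mechanism: the Recording Rule.

Recording Rule: recording rules let you precompute frequently needed or computationally expensive expressions and save their results as a new set of time series. Querying the precomputed result is then usually much faster than evaluating the original expression every time it is needed. This is especially useful for dashboards, which query the same expressions again on every refresh.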
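As a rough illustration of the idea, a recording rule in a plain Prometheus rule file might look like the sketch below. The group name, the 30s interval, and the http_requests_total metric are placeholders chosen for this example, not values from the customer's environment. It also assumes the file is loaded through rule_files in prometheus.yml rather than through the operator.

```yaml
# Minimal sketch of a native Prometheus rule file (all names are hypothetical).
groups:
  - name: example_recording_rules   # placeholder group name
    interval: 30s                   # optional; falls back to global.evaluation_interval
    rules:
      # On every evaluation, the result of `expr` is written
      # as a new time series whose name is given by `record`.
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
```

A dashboard can then query job:http_requests:rate5m directly, which is a simple series lookup instead of a rate-and-aggregate over the raw samples.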

Steps

First, get the expressions from Grafana:

sum(rate(container_network_receive_bytes_total[5m])) by (node)

sum(rate(container_network_transmit_bytes_total[5m])) by (node)

sum(rate(wmi_container_network_receive_bytes_total[5m])) by (node)

sum(rate(wmi_container_network_transmit_bytes_total[5m])) by (node)

Then write the corresponding PrometheusRule manifest:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  generation: 1
  labels:
    app: exporter-kubernetes
    chart: exporter-kubernetes-0.0.1
    heritage: Tiller
    io.cattle.field/appId: cluster-monitoring
    release: cluster-monitoring
    source: rancher-monitoring
  name: custom
  namespace: cattle-prometheus
spec:
  groups:
  - name: network_IO
    rules:
    - record: custom_container_network_receive_bytes_total
      expr: sum(rate(container_network_receive_bytes_total[5m])) by (node)
    - record: custom_container_network_transmit_bytes_total
      expr: sum(rate(container_network_transmit_bytes_total[5m])) by (node)
    - record: custom_wmi_container_network_receive_bytes_total
      expr: sum(rate(wmi_container_network_receive_bytes_total[5m])) by (node)
    - record: custom_wmi_container_network_transmit_bytes_total
      expr: sum(rate(wmi_container_network_transmit_bytes_total[5m])) by (node)