Setting up alerts with Datadog monitors

Notes from when I set up alerts with Datadog monitors a while back, kept for my own reference.

APM monitor

For operation: event.handler, I want an alert when p90 latency exceeds 5s. This time I create the monitor from New Monitor > APM.

Select monitor scope

Select event.handler for Resource.

At first the dropdown only offered * and servlet.request. After I changed the Primary operation from servlet.request to event.handler in the APM settings, event.handler showed up in the dropdown. (I don't know why only the Primary operation is listed.)

Set alert conditions

Under "Alert when p90 latency is above the threshold over the last 5 minutes", set Alert threshold: 5.

After that, set the notification message and the remaining options, and the monitor is done.

Metric monitor

For operation: scheduled.call, I want an alert when either of the two target resources exceeds p90 latency: 60s. This time I create the monitor from New Monitor > Metric.

Define the metric

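Set the metric to trace.scheduled.call from resource_name:scheduledtasks.foo* p90 by resource_name. This depends on how the resources are named, but using a prefix match in the from field lets a single monitor cover multiple resources. (Setting each resource as its own query and combining them with a formula might achieve the same thing, but I couldn't figure out how.)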
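For reference, the same definition can be written down as a monitor query string. The sketch below is only an illustration: the "percentile(last_5m):p90:..." layout is my assumption about what the UI generates for a percentile monitor, and the monitor name and message are made up. Compare it with the JSON exported from the actual monitor before relying on it.

```python
def build_percentile_query(metric, scope, group_by,
                           percentile="p90", window="last_5m", threshold=60):
    """Assemble a monitor query string.

    The layout is an assumption about Datadog's percentile monitor syntax,
    not something confirmed by these notes.
    """
    return (
        f"percentile({window}):{percentile}:{metric}"
        f"{{{scope}}} by {{{group_by}}} > {threshold}"
    )


query = build_percentile_query(
    metric="trace.scheduled.call",
    scope="resource_name:scheduledtasks.foo*",  # prefix match from the notes
    group_by="resource_name",
)

# Hypothetical monitor definition; name and message are placeholders.
monitor_payload = {
    "name": "scheduled.call p90 latency",
    "type": "query alert",
    "query": query,
    "message": "p90 latency of scheduled.call exceeded 60s",
    "options": {"thresholds": {"critical": 60}},
}

print(query)
```

Keeping the scope and the group-by as separate arguments makes the prefix match (scheduledtasks.foo*) easy to swap out if the resources are renamed.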

Set alert conditions

Under "Trigger when the metric is above the threshold percentile (p90) during the last 5 minutes for any resource_name", set Alert threshold: 60.
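To make the "for any resource_name" condition concrete, here is a small local sketch of the idea: compute p90 per resource over one evaluation window and report every matching resource that is over the threshold. It uses a plain nearest-rank percentile, which is not how Datadog computes percentiles internally, and the resource names and latency samples are invented for illustration.

```python
def p90(samples):
    """Nearest-rank 90th percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = (9 * len(ordered) + 9) // 10  # ceil(0.9 * n) in integer arithmetic
    return ordered[rank - 1]


def breaching_resources(latencies_by_resource, prefix, threshold):
    """Return the resources under `prefix` whose p90 exceeds `threshold`."""
    return sorted(
        name
        for name, samples in latencies_by_resource.items()
        if name.startswith(prefix) and p90(samples) > threshold
    )


# Hypothetical latencies (seconds) observed during one 5-minute window.
window = {
    "scheduledtasks.foo_a": [10, 12, 15, 20, 22, 25, 30, 35, 40, 45],
    "scheduledtasks.foo_b": [10, 20, 30, 40, 50, 55, 58, 59, 70, 90],
    "scheduledtasks.bar": [100, 200, 300],  # outside the prefix, ignored
}

alerts = breaching_resources(window, "scheduledtasks.foo", 60)
print(alerts)
```

Each resource is evaluated on its own, so one slow resource triggers the alert even when the other stays well under 60s.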

After that, set the notification message and the remaining options, and the monitor is done.

Other notes

I changed the Primary operation for the APM monitor, but if I had created it as a Metric monitor instead, changing the Primary operation might not have been necessary.

That's all from the field.