garrettheel
left a comment
Thanks for contributing!
Can you help me understand what original issue prompted this change? As far as I can tell, we're already using rule['raw'] as the incident_key, which should allow for stateless resolution.
event_type = 'resolve'

# Extract unique alert identifiers
alert_name = message[message.find("<")+1:message.find(">")]
Can you use the context passed into the function for the alert name and metric name instead of pulling it out of the message?
h.update(alert_metric)

# Use hash as incident key to support resolution
incident_key = h.hexdigest()
Is there any benefit to md5ing these? Why not just do "{alert_name}:{alert_metric}"?
# Use hash as incident key to support resolution
incident_key = h.hexdigest()

if level == 'critical':
if level in ['critical', 'warning']:
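For illustration, a minimal sketch of how that check could map alert levels to PagerDuty event types. The helper name `event_type_for` is hypothetical, and treating 'normal' as the level that resolves an incident is an assumption, not something shown in this diff:

```python
def event_type_for(level):
    """Return the PagerDuty event type for an alert level, or None to send nothing.

    Hypothetical helper: the 'normal' level name is an assumption.
    """
    if level in ['critical', 'warning']:
        # Both severities open (or update) an incident.
        return 'trigger'
    if level == 'normal':
        # Back to normal: close the incident that shares the same incident_key.
        return 'resolve'
    # Unknown level: do not notify.
    return None
```

With this shape, a warning opens an incident just like a critical alert does, and only the back-to-normal notification closes it.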
event_type = "resolve"
else:
event_type = 'trigger'
return
Is there a reason you're changing this?
"event_type": event_type,
"description": message,
"details": message,
"incident_key": rule['raw'] if rule is not None else 'graphite connect error',
Looks like this logic has been lost. Can you re-add it?
PagerDuty incidents are generated with messages like:
"Back to normal" messages look like:
From those, I figured that the incident-unique information is the combination of the alert name (<Test Wildcard Alert>) and the metric name ((stats.gauges.server1.data)). To avoid storing data on the file system, I decided to generate a hash from those two. Using this hash value, incidents can be triggered and resolved in a stateless way.

The following tests were performed:
("query": "stats.gauges.test")
("query": "stats.gauges.*.data")
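The key derivation described above can be sketched roughly as follows. This is an illustration rather than the code from this PR: both function names are hypothetical, the identifiers are passed in directly instead of being parsed out of the message, and the second function shows the plain composite key raised in the review for comparison. The `server2` metric is made up to show that different metrics get different keys.

```python
import hashlib


def incident_key_hashed(alert_name, alert_metric):
    """Hash the alert name and metric name into a fixed-length incident key."""
    h = hashlib.md5()
    # hashlib works on bytes, so encode the identifiers explicitly.
    h.update(alert_name.encode("utf-8"))
    h.update(alert_metric.encode("utf-8"))
    return h.hexdigest()


def incident_key_plain(alert_name, alert_metric):
    """Readable alternative: join the two identifiers without hashing."""
    return "{}:{}".format(alert_name, alert_metric)


# The same alert and metric always yield the same key, so a later
# "back to normal" event can resolve the incident without stored state.
trigger_key = incident_key_hashed("Test Wildcard Alert", "stats.gauges.server1.data")
resolve_key = incident_key_hashed("Test Wildcard Alert", "stats.gauges.server1.data")

# A different metric under the same wildcard alert gets its own incident.
other_key = incident_key_hashed("Test Wildcard Alert", "stats.gauges.server2.data")

plain_key = incident_key_plain("Test Wildcard Alert", "stats.gauges.server1.data")
```

Both variants are stateless. The hashed key has a fixed length, while the plain key stays human-readable when it shows up in PagerDuty.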