Ok so Loki from Grafana Labs is *sexy*. It solves a lot of the alerting and analytics problems I set out to fix on our shipped systems, and because it works from the logs our code already writes, it does so without further burdening our overwhelmed software team with adding a metrics API to our codebases.

And as a plus, Loki (well, promtail) can also eat syslog, systemd/journald, and Windows Event Log formats.

Combined with the time-series system health data we already collect and the relational databases we send out to the plant, we can answer questions like:
* "Does the inspection cycle time change with CPU temperature?"
* "What source file throws the most errors? Is it dependent on the inspection recipe?"
* "Send an alert when, for any recipe, the inspection cycle time exceeds 80% of the allotted cycle time for more than 5 inspections in a row."
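In Loki itself that last one would presumably be an alerting rule evaluated by the ruler, but the streak logic is easy to get subtly wrong, so here's a rough Python sketch of it. It assumes the log lines have already been parsed into a recipe name, a measured cycle time, and an allotted cycle time (the `Inspection` and `CycleTimeAlerter` names are made up for illustration), and it reads "more than 5 in a row" as "fire on the 6th consecutive slow inspection".

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Inspection:
    # Hypothetical record, assumed to be parsed out of a log line.
    recipe: str          # recipe name
    cycle_time_s: float  # measured inspection cycle time, seconds
    allotted_s: float    # allotted cycle time for this recipe, seconds


class CycleTimeAlerter:
    """Track, per recipe, how many inspections in a row ran slow.

    "Slow" means cycle_time_s > fraction * allotted_s. The alert fires once,
    on the inspection that makes the streak longer than max_run.
    """

    def __init__(self, fraction: float = 0.8, max_run: int = 5) -> None:
        self.fraction = fraction
        self.max_run = max_run
        self.streaks = defaultdict(int)  # recipe -> current slow-streak length

    def observe(self, insp: Inspection) -> bool:
        """Record one inspection; return True if the alert should fire now."""
        if insp.cycle_time_s > self.fraction * insp.allotted_s:
            self.streaks[insp.recipe] += 1
        else:
            self.streaks[insp.recipe] = 0  # one normal cycle resets the streak
        # Fire only on the first inspection past the limit, not on every one after.
        return self.streaks[insp.recipe] == self.max_run + 1


if __name__ == "__main__":
    alerter = CycleTimeAlerter()
    # Allotted time 10 s, so anything above 8 s counts as slow.
    times = [9.0, 8.5, 8.2, 9.1, 8.8, 8.6, 8.9]
    fired = [alerter.observe(Inspection("recipe-A", t, 10.0)) for t in times]
    print(fired)  # [False, False, False, False, False, True, False]
```

Keeping the streak per recipe means a slow recipe can't be masked by a fast one interleaved on the same machine, and firing only on the transition avoids spamming an alert for every slow cycle after the sixth.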

As well as the usual statistical process control questions like:
* "Which recipes have the worst reject rates?"
* "Are there time- or shift-dependent changes in judgement accuracy?"
* "Is this measurement trending out of tolerance?"
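A rough Python sketch of two of those, once the data is out of Loki: ranking recipes by reject rate, and flagging a measurement that's drifting. For the drift check I've swapped in a standard SPC rule, Nelson rule 3 (six points in a row steadily increasing or decreasing), which catches a trend before it actually crosses a tolerance limit. The input shapes, (recipe, rejected) pairs and a plain list of measurements, are assumptions, not anything Loki hands you directly.

```python
from __future__ import annotations

from collections import Counter
from typing import Optional


def reject_rates(results: list[tuple[str, bool]]) -> list[tuple[str, float]]:
    """Rank recipes by reject rate, worst first.

    `results` is a list of (recipe, rejected) pairs, e.g. parsed from log lines.
    """
    totals: Counter[str] = Counter()
    rejects: Counter[str] = Counter()
    for recipe, rejected in results:
        totals[recipe] += 1
        rejects[recipe] += rejected  # a bool adds as 0 or 1
    rates = [(recipe, rejects[recipe] / totals[recipe]) for recipe in totals]
    return sorted(rates, key=lambda pair: pair[1], reverse=True)


def monotonic_run(values: list[float], run_length: int = 6) -> Optional[int]:
    """Nelson rule 3: return the index at which `run_length` points in a row
    have been strictly increasing or strictly decreasing, else None."""
    up = down = 1  # number of points in the current rising / falling run
    for i in range(1, len(values)):
        if values[i] > values[i - 1]:
            up, down = up + 1, 1
        elif values[i] < values[i - 1]:
            up, down = 1, down + 1
        else:
            up = down = 1  # a repeated value breaks both runs
        if up >= run_length or down >= run_length:
            return i
    return None


if __name__ == "__main__":
    print(reject_rates([("A", True), ("A", False), ("B", True), ("B", True), ("C", False)]))
    # [('B', 1.0), ('A', 0.5), ('C', 0.0)]
    print(monotonic_run([10.0, 10.1, 10.3, 10.2, 10.4, 10.5, 10.7, 10.8, 10.9]))
    # 8
```

The run rule only looks at the direction of change, so it will flag a slow creep that is still well inside tolerance; that's the point for an early warning, but it pairs naturally with a plain "is this point outside the limits" check.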
