System Logging

From Laen

Send data to a central logging host with syslog.

Use syslog-ng to receive the logs and sort them into the appropriate files.

Monitor those files, alerting on errors and reporting on unknowns.

Unknown lines must be acknowledged by a sysadmin.

Alerts are events. Events can be correlated.

All raw data gets loaded into Splunk.
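As a rough illustration of the first step above (sending data to a central logging host), here is a minimal sketch using Python's standard `logging.handlers.SysLogHandler` over UDP. The function name `make_central_logger` and the choice of the `LOCAL0` facility are assumptions made for the example, not part of the plan above.

```python
import logging
import logging.handlers


def make_central_logger(name, host, port=514):
    """Return a logger that forwards records to a syslog host over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        facility=logging.handlers.SysLogHandler.LOG_LOCAL0,  # assumed facility
    )
    # Prefix each message with the logger name so the receiving side
    # (for example syslog-ng) can sort messages by program.
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Something like `make_central_logger("tomcat", "loghost.example.com").error("process exited")` would then send one line to the central host; the hostname is a placeholder.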

So, I'm thinking of an event correlation system based on syslog (or syslog-ng, or msyslog) and logsurfer.

Goals

  • Determine when things are going wrong
  • Determine the root cause of a problem, as a debugging aid
  • Batch up multiple alerts into single admin pages.
  • Trigger recovery

Why?

Because logs are only useful if they're analyzed.

Components

Raw Data

First, we need raw data. These are logs, system states, application activity, and things like that. An Apache log is raw data. Everything from syslog is raw data. Instantaneous CPU and memory usage is raw data. Temperature and wind velocity are raw data.

Raw data doesn't need to be in any particular format. The Event Generators are responsible for making sense of whatever format it arrives in.

Event Generators

Next, we need Event Generators. These are things that process raw data and give it context.

Events are statements of state, changes of state, or statistics.

  • Statement of State: "Service Z on Host A is DOWN."
  • Change of State: "The tomcat process on appserver1 started."
  • Statistic: "Web server averaged 300 requests/second in the past 5 minutes."
  • More detailed Statistic: "IP 127.0.3.2 has registered on web server Y 20 times."

Eventually, all Raw Data should have a corresponding Event Generator rule, even if that rule is "ignore this line." If a pattern shows up in a raw data stream that no Event Generator knows what to do with, then an "UNKNOWN LINE" event should be generated.

Event Generators produce specially formatted Events and send them to the Correlators, along with an arbitrary number of tags.
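A minimal sketch of what an Event Generator could look like, assuming regular-expression rules over syslog lines. The `Event` fields, the two sample patterns, and the `generate_event` name are all hypothetical; the only parts taken from the text above are the explicit "ignore this line" rule, the "UNKNOWN LINE" fallback, and the tags attached to each Event.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Event:
    kind: str                 # "state", "change", "statistic", or "unknown"
    message: str
    tags: dict = field(default_factory=dict)


# Hypothetical rules: (pattern, handler). A handler returns an Event,
# or None for the explicit "ignore this line" case.
RULES = [
    (re.compile(r"tomcat\[\d+\]: Server startup"),
     lambda m, host: Event("change", f"The tomcat process on {host} started.",
                           {"host": host, "service": "tomcat"})),
    (re.compile(r"CRON\[\d+\]: "), lambda m, host: None),
]


def generate_event(host, line):
    """Turn one raw log line into an Event, None (ignored), or UNKNOWN LINE."""
    for pattern, handler in RULES:
        match = pattern.search(line)
        if match:
            return handler(match, host)
    # No rule matched: surface the line so a sysadmin can acknowledge it.
    return Event("unknown", f"UNKNOWN LINE: {line}", {"host": host})
```

Keeping "ignore" as a real rule rather than a missing one is what lets every unmatched line count as unknown.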

Notes to self:

  • Event streaming could use a lot of bandwidth, especially with a lot of tags, so compression could be important. One option is something like "event stream tagging", where the client and server work out a shorthand such as "StreamID: 0xFED marks an event stream with a certain set of tags."
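The "event stream tagging" idea could be sketched roughly as follows: the sender transmits a full tag set only the first time it is used, and a short stream ID afterwards. The class names and the dictionary packet layout are invented for illustration, and the sketch assumes tag values are hashable and that packets arrive in order.

```python
import itertools


class TagStreamEncoder:
    """Sender side: replace a repeated tag set with a short stream ID."""

    def __init__(self):
        self._ids = {}
        self._next_id = itertools.count(1)

    def encode(self, message, tags):
        key = frozenset(tags.items())
        if key not in self._ids:
            stream_id = next(self._next_id)
            self._ids[key] = stream_id
            # First use: send the full tag set once so the receiver learns it.
            return {"stream": stream_id, "define": dict(tags), "msg": message}
        return {"stream": self._ids[key], "msg": message}


class TagStreamDecoder:
    """Receiver side: expand a stream ID back into its tag set."""

    def __init__(self):
        self._tags = {}

    def decode(self, packet):
        if "define" in packet:
            self._tags[packet["stream"]] = packet["define"]
        return packet["msg"], self._tags[packet["stream"]]
```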

Correlators

Now we need Correlators. Correlators look at events and try to tie them together. For example: "Tomcat is down on all these hosts. That's our entire cluster. Alert!!"

Or, written more like a rule: "If a certain source host has scanned the same destination port on more than 10 distinct destination hosts during 60 seconds, raise an alarm."

Correlators need to have pretty in-depth knowledge of your systems and how they interrelate.
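The port-scan rule quoted above could be expressed as a small sliding-window correlator. This is only a sketch: the class name and method signature are made up, and it assumes events arrive in timestamp order with timestamps given in seconds.

```python
from collections import defaultdict, deque


class PortScanCorrelator:
    def __init__(self, max_hosts=10, window=60.0):
        self.max_hosts = max_hosts
        self.window = window
        # (source host, destination port) -> deque of (timestamp, destination host)
        self._seen = defaultdict(deque)

    def observe(self, timestamp, src, dst, port):
        """Record one scan event; return True if the alarm should be raised."""
        hits = self._seen[(src, port)]
        hits.append((timestamp, dst))
        # Drop observations that have fallen out of the time window.
        while hits and hits[0][0] <= timestamp - self.window:
            hits.popleft()
        distinct = {host for _, host in hits}
        return len(distinct) > self.max_hosts
```

Each call to `observe` corresponds to one "source scanned destination:port" Event coming from an Event Generator; a `True` result is where a Correlator would emit its own alarm Event.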

Correlators really aren't much different from Event Generators, except that they take Events as their raw data. I picture them having a somewhat higher-level rule language, too.

Actions

Correlator events can trigger actions. An action could be:

  • Generate an Event
  • Call for help
  • Try to fix it
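The three kinds of action could be wired up with a small dispatcher like the one below. Everything here is hypothetical (the `ActionRunner` name, the action strings, the page format); the batching in `flush_page` is one possible way to get several alerts into a single admin page, as listed under Goals.

```python
class ActionRunner:
    """Hypothetical dispatcher for the three kinds of action listed above."""

    def __init__(self):
        self.emitted_events = []   # events fed back to the Correlators
        self.pending_alerts = []   # alerts waiting to be sent as one page
        self.fixes = {}            # event name -> recovery callable

    def handle(self, event_name, actions):
        for action in actions:
            if action == "generate_event":
                self.emitted_events.append(f"ACTION TAKEN for {event_name}")
            elif action == "call_for_help":
                self.pending_alerts.append(event_name)
            elif action == "try_to_fix":
                fix = self.fixes.get(event_name)
                if fix is not None:
                    fix()
            else:
                raise ValueError(f"unknown action: {action}")

    def flush_page(self):
        """Batch all pending alerts into a single page body, then clear them."""
        if not self.pending_alerts:
            return None
        page = f"{len(self.pending_alerts)} alert(s): " + "; ".join(self.pending_alerts)
        self.pending_alerts.clear()
        return page
```

Calling `flush_page` on a timer, rather than paging on every alert, is what turns a burst of related alerts into one message.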

Notes

A GUI? An event rule generator? A visual correlator? Query all the events we saw at a certain time, and just click: "This event + This event + This event => This trigger."
