LISA 2007

From Laen

Workshops

Building Data Centers

Running wire under the floor is out. Ladder racking is in. The main reason is that the underfloor space is used to carry cold air, and as rack density increases, that space becomes more and more important. One site has a 4-foot raised floor (!).

Power, cooling, and space are everyone's biggest problems. 120 cores in a single rack produce a lot of heat, and it takes a lot to power the rack, cool it, and get rid of the waste heat. Some sites in colder climates were channeling the waste heat into the building heating system.

APC manageable power strips are loved by all as a way to monitor power usage in the datacenter, and also for power management.

Brady TLS2200 label printers will print little labels especially for cable management.

It's important to balance the phases of power in your datacenter. If the phases are out of balance, it can toast your transformers.
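A minimal sketch of how per-phase readings (for example, from the metered power strips above) could be turned into a single imbalance figure. The measure used here, the worst deviation from the mean divided by the mean, is one common rule of thumb and not something stated in the workshop. The function name and the sample amperages are made up for illustration.

```python
def phase_imbalance(loads_amps):
    """Return the worst deviation from the mean load, as a fraction of the mean.

    loads_amps: one current reading per phase (hypothetical sample values below).
    """
    mean = sum(loads_amps) / len(loads_amps)
    if mean == 0:
        return 0.0  # nothing is drawing power, so nothing is out of balance
    return max(abs(amps - mean) for amps in loads_amps) / mean


# Hypothetical readings for a three-phase feed, in amps.
readings = {"L1": 42.0, "L2": 30.0, "L3": 48.0}
imbalance = phase_imbalance(list(readings.values()))
print(f"imbalance: {imbalance:.1%}")
```

What counts as "too far out of balance" depends on the electrical gear, so any alert threshold would be a site-specific choice.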

People who have fiber in their datacenters have it cut to length and run to fiber patch panels.

Configuration, Managing Architectures

Security-wise: "Don't give orders you can't enforce."

WISE - What-If Scenario Evaluator. Lets you run What-If scenarios detailing what would happen if you moved pieces of your infrastructure around.

WebApps:

  • Monitor systems with heartbeats.
  • Monitor applications with synthetic requests.
  • Assign each request a UUID. Track it throughout your process flow.
  • If the request errors in any way, store it, otherwise toss it.
  • Collect per-request performance info on each step of the request.
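The request-tracking bullets above could be sketched roughly as follows. This is only an illustration of the idea, not code from the talk: `RequestTrace`, `handle`, and `failed_traces` are hypothetical names, and a real system would write failed traces to durable storage rather than an in-memory list.

```python
import time
import uuid


class RequestTrace:
    """Per-request record: a UUID plus timing for every step it passes through."""

    def __init__(self):
        self.request_id = str(uuid.uuid4())
        self.steps = []    # list of (step name, seconds elapsed)
        self.error = None  # set only if a step raises

    def run_step(self, name, func, *args):
        start = time.perf_counter()
        try:
            return func(*args)
        except Exception as exc:
            self.error = f"{name}: {exc!r}"
            raise
        finally:
            # Timing is recorded whether the step succeeded or failed.
            self.steps.append((name, time.perf_counter() - start))


failed_traces = []  # stand-in for whatever store keeps failed requests


def handle(payload, steps):
    """Run payload through (name, func) steps; keep the trace only on error."""
    trace = RequestTrace()
    try:
        for name, func in steps:
            payload = trace.run_step(name, func, payload)
        return payload  # success: the trace is simply dropped
    except Exception:
        failed_traces.append(trace)
        return None
```

For example, `handle(0, [("invert", lambda x: 1 / x)])` returns `None` and leaves one trace in `failed_traces`, with its UUID, the failing step, and the time spent in it.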

NSF's Future Internet Design Program

BoFs

Cfengine Developments

There's a USENIX/SAGE Cfengine book.

Cfengine 3 will have built-in version control hooks.

Cfengine 3 has "Reusable, parameterizable templates (bodies)". Config file generation?

Cfengine 2 has package management code.

Lots of instrumentation and performance data collection. "CfBrain"

cfenvd is neat. It automatically detects system norms, and alerts when variables are more than a standard deviation away from normal.
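To make the "more than a standard deviation away from normal" idea concrete, here is a small sketch. It is not how cfenvd is implemented; it only shows the general technique using a plain mean and sample standard deviation over a list of past readings. The function name and the `threshold` parameter are assumptions for illustration.

```python
import statistics


def is_anomalous(history, value, threshold=1.0):
    """True if value is more than `threshold` standard deviations from the mean.

    history: past readings of one variable (e.g. number of logged-in users).
    """
    if len(history) < 2:
        return False  # not enough data to know what "normal" looks like
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return abs(value - mean) > threshold * stdev
```

With a history of `[10, 12, 11, 9, 13]`, a new reading of 11.5 is within normal range while a reading of 20 would be flagged.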

Nagios

SEC+Nagios Plugin

check_cluster and check_cluster2 can be used to monitor sets of hosts.

Autogenerate hosts from the hostdb.

Use passive service checks more. Send them with StatServer.

Add some interface monitoring.

Nagios<->RRD integration.

How do we store dependencies??

StatServer can be used to pass around Nagios passive service checks.
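For reference, a passive service check result reaches Nagios as a single `PROCESS_SERVICE_CHECK_RESULT` line in its external command file. The sketch below only formats that line; how StatServer actually delivers it was not covered in these notes, and the helper name and sample host/service values are hypothetical.

```python
import time


def passive_check_line(host, service, return_code, output, now=None):
    """Format one Nagios external command for a passive service check result.

    return_code follows the plugin convention:
    0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
    """
    timestamp = int(time.time() if now is None else now)
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{output}\n"
    )
```

The resulting line would be written to the Nagios command file, whose path is site-specific, so it is deliberately left out of the sketch.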

I need some Cacti management scripts..

Opsview -- Built around Nagios.

ZenOSS -- Not built around Nagios

Tutorials

Deploying Linux-HA

Linux-HA

Alan Robertson's Blog

You can't pass more packets over GigE than over 100 megabit? Minimum packet time? What? He uses Denver ball game tickets as the example.

HA systems are a lot like "init on steroids". With policies on:

  • what order to do things in
  • how services relate to each other
  • when to run them.

Split Brain

  • Communications failures between nodes of the cluster.

Quorum

  • Voting on who the errant node is.
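A common way to decide such a vote is a strict majority: a partition may keep running services only if it can see more than half of the cluster. The sketch below shows that rule only; it is an assumption for illustration, not a description of the exact quorum logic in Linux-HA.

```python
def has_quorum(reachable_nodes, cluster_size):
    """True if this partition sees a strict majority of the cluster.

    reachable_nodes counts the local node plus every peer it can still talk to.
    """
    return reachable_nodes > cluster_size // 2
```

In a 3-node cluster, a partition of 2 has quorum and the isolated node does not. In a 4-node cluster split 2/2, neither side has quorum, which is one reason tie-breaking needs separate handling.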

Fencing

  • Putting a "fence" around the errant node.
  • STONITH
  • Fibre Channel switch lockout

Narrowly defined "SPOF" -- If this bit fails, service stops. So, replication links aren't SPOFs.

Could you use routing to Fence off a host?

Heartbeat could trigger DNS updates.. Low TTL required.

Shared Data

  • DRBD
  • NAS, NFS/iSCSI
  • Database replication
  • SAN

Shared Storage vs. Storage Replication

HA vs DR - Failback is hard in DR. Easier in HA.


Linux-HA capabilities

  • Up to 16 node clusters
  • Active/Passive or Active/Active
  • Open Cluster Framework
  • XML-based resource configuration
  • Configuration and Monitoring GUI?
  • OCFS2
  • Master/slave resource support?
  • VM Migration built-in.
  • Split-site cluster

Has LVS integration (ldirectord)

Minimal config in ha.cf:

node node1 node2 node3
bcast eth0
crm on

Also needs an /etc/ha.d/authkeys file containing the authentication key.

ClusterIP - Uses multicast MACs. Every host has the same IP address.

Random Stuff

NetApp has an emulator.

Running SNMP as a different user

RSS for change sets

VMCasting - automatic virtual machine deployment mechanism based on RSS2.0
