DevOps Resources
Nagios wasn't built for the cloud: it can't discover new machines on its own
built with the intention of being paired with a CM tool
uses RabbitMQ to securely route check requests and results, making it possible to scale out and back in on demand
uses Redis as a non-persistent database, to store client and event data
schedule the remote execution of checks and collect their results
OS
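The request/result flow in the notes above can be sketched roughly as follows. This is only an illustration, not the tool's real implementation: an in-memory queue stands in for RabbitMQ, a plain dict stands in for Redis, and every name here (publish_check_request, client_run_once, server_collect_once, fake_runner, the check name and command) is hypothetical.

```python
import json
import queue

# Assumption: Queue objects stand in for the RabbitMQ transport,
# and a dict stands in for the Redis data store.
requests = queue.Queue()
results = queue.Queue()
store = {}

def publish_check_request(name, command):
    """Server side: put a check request on the transport."""
    requests.put(json.dumps({"name": name, "command": command}))

def client_run_once(client_name, run_command):
    """Client side: take one request, execute it, publish the result."""
    request = json.loads(requests.get_nowait())
    status, output = run_command(request["command"])
    results.put(json.dumps({
        "client": client_name,
        "check": request["name"],
        "status": status,
        "output": output,
    }))

def server_collect_once():
    """Server side: take one result and keep it as the latest event."""
    result = json.loads(results.get_nowait())
    store[(result["client"], result["check"])] = result
    return result

def fake_runner(command):
    # Hypothetical executor; a real client would run the command itself.
    return 0, "OK: ran " + command

publish_check_request("check_disk", "check_disk --warn 80")
client_run_once("web-01", fake_runner)
event = server_collect_once()
print(event["client"], event["check"], event["status"])
```

Because clients only talk to the transport, adding or removing machines needs no change on the server side, which is the scaling property the notes point at.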
"scalable distributed monitoring system for high-performance computing systems such as clusters and grids"
XML for data representation
XDR for compact, portable data transport
RRDtool for data storage and visualization
agent-based
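To give a feel for what "compact, portable" XDR transport means, here is a minimal sketch of XDR-style encoding using only Python's struct module. The basic rules (big-endian 4-byte integers, length-prefixed strings padded to a multiple of 4 bytes, IEEE single-precision floats) follow the XDR standard; the field layout in encode_metric is a made-up example and is not the tool's actual wire format.

```python
import struct

def xdr_uint(value):
    """Encode an unsigned 32-bit integer, big-endian."""
    return struct.pack(">I", value)

def xdr_string(text):
    """Encode a string: 4-byte length, the bytes, zero padding to 4 bytes."""
    data = text.encode("utf-8")
    padding = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * padding

def xdr_float(value):
    """Encode a single-precision float, big-endian."""
    return struct.pack(">f", value)

def encode_metric(host, name, value):
    # Hypothetical layout for illustration: host, metric name, value.
    return xdr_string(host) + xdr_string(name) + xdr_float(value)

packet = encode_metric("node1", "load_one", 0.5)
print(len(packet), packet.hex())
```

Every field lands on a 4-byte boundary, so the same bytes decode identically on any architecture.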
majority of monitoring systems out there are built around workloads and change processes that are no longer valid
future of application environments is dynamic, scalable, and ever changing
logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use (like, for searching). Speaking of searching, logstash comes with a web interface for searching and drilling into all of your logs.
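The collect, parse, store, search cycle described above can be illustrated with a small Python sketch. This is not logstash configuration or its API; the line format, the LINE_PATTERN regex, the parse_failure tag, and the in-memory list used as a store are all assumptions made for the example.

```python
import re

# Assumed line format: "<timestamp> <host> <program>: <message>"
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+) (?P<host>\S+) (?P<program>[^:\s]+): (?P<message>.*)$"
)

def parse_line(line):
    """Turn one raw log line into a structured event (a dict)."""
    line = line.strip()
    match = LINE_PATTERN.match(line)
    if match is None:
        # Keep unparseable lines, but tag them so they can be found later.
        return {"message": line, "tags": ["parse_failure"]}
    return match.groupdict()

def search(events, term):
    """Case-insensitive substring search over the message field."""
    return [e for e in events if term.lower() in e["message"].lower()]

raw_lines = [
    "2013-02-01T10:15:00Z web-01 nginx: GET /index.html 200",
    "2013-02-01T10:15:02Z web-01 sshd: Failed password for root",
    "not a structured line",
]
events = [parse_line(line) for line in raw_lines]  # the "store" step
hits = search(events, "failed")
print(hits)
```

Parsing up front is what makes later searching cheap: queries run against named fields instead of raw text.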
Graylog2 is an open source log management solution that stores your logs in ElasticSearch. It consists of a server written in Java that accepts your syslog messages via TCP, UDP or AMQP and stores them in the database. The second part is a web interface that allows you to manage the log messages from your web browser.
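As a rough idea of what sending a syslog message over UDP looks like from the application side, here is a minimal Python sketch. It builds only the priority prefix and a tag, leaving out the timestamp and hostname header fields; the function names, the host graylog.example.com, and port 514 are assumptions for illustration, not values taken from Graylog2's documentation.

```python
import socket

def syslog_frame(facility, severity, tag, message):
    """Build a minimal syslog frame: "<priority>tag: message"."""
    priority = facility * 8 + severity  # syslog priority value
    return "<{}>{}: {}".format(priority, tag, message).encode("utf-8")

def send_udp(frame, host, port):
    """Send one frame as a single UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(frame, (host, port))

# facility 1 = user-level, severity 6 = informational
frame = syslog_frame(facility=1, severity=6, tag="myapp", message="user logged in")
print(frame)

# Hypothetical server address; uncomment to actually send.
# send_udp(frame, "graylog.example.com", 514)
```

UDP is fire-and-forget, so a lost datagram is a lost log line; TCP or AMQP would be the choice when delivery matters more than simplicity.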
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. The system is centrally managed and allows for intelligent dynamic management. It uses a simple extensible data model that allows for online analytic application.
Logging as a service (SaaS in the cloud: you send your logs to the provider instead of installing the software yourself)
Loggly provides a cloud-based application intelligence solution for app developers. Loggly indexes application log data, which can be used to troubleshoot, monitor and analyze customer usage.