0. Introduction
ELK is the new black, it seems, so let's give it a try. Suppose that we have a bunch of servers where a simple application is running smoothly:
operator@srv1 $ /usr/local/bin/myapp -c /usr/local/etc/myapp.cfg --log /var/log/myapp.log
I know, myapp is completely uncool (and boring), so let's make some noise.
Suppose that we want a dashboard to inspect /var/log/myapp.log coming from all servers, so basically we need:
to search within logfiles;
to filter per host/datetime/colour_of_my_tshirt;
to create some nice dashboard to see how our new iMac 5k will render it.
You have basically two choices:
convince your colleagues to help you log on each server, learn to regexp and to dashboard (then you'll spend a lot of money on beer for your colleagues), or
use an ELK stack (and buy beers only for yourself).
We will follow the second approach, because a real sysadmin doesn't have colleagues, only enemies.
ELK stands for ElasticSearch/Logstash/Kibana because, well, we will need all of them.
1. Install Fest
Let's install our favourite FreeBSD 10.2, then:
textproc/elasticsearch
sysutils/logstash
textproc/kibana43
www/nginx
You know how to install a pkg/port, right? If not, it's as easy as (as root):
cd /usr/ports/ports-mgmt/portmaster
make install clean
rehash # if you are using (t)csh
portmaster elasticsearch logstash kibana43 nginx
Spend some time watching a black terminal (this is the time you can post a screenshot on Facebook saying something like 'OMG, I'm such a nerd').
Ok, now we need to enable them (so they can be started and restarted):
root@elk:~ # echo 'elasticsearch_enable="YES"' > /etc/rc.conf.d/elasticsearch
root@elk:~ # printf 'logstash_enable="YES"\nlogstash_log="YES"\nlogstash_log_file="/var/log/logstash.log"\n' > /etc/rc.conf.d/logstash
root@elk:~ # echo 'kibana_enable="YES"' > /etc/rc.conf.d/kibana
root@elk:~ # echo 'nginx_enable="YES"' > /etc/rc.conf.d/nginx
Hold on, man. It's not time to start up anything yet.
2. ElasticSearch
Let's start with ElasticSearch: we will have a single-node instance because, you know, myapp is not logging that much (so no cluster for now, sorry). Let's see what we have to change in /usr/local/etc/elasticsearch/elasticsearch.yml (don't worry, really just a few things):
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
# path.data: /path/to/data
path.data: /var/db/elasticsearch
#
# Path to log files:
#
# path.logs: /path/to/logs
path.logs: /var/log/elasticsearch
#
[...]
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 127.0.0.1
#
# Set a custom port for HTTP:
#
http.port: 9100
#
# For more information, see the documentation at:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-network.html>
Basically we are changing where data, indexes, and logs are stored, and making ElasticSearch listen on 127.0.0.1:9100.
Now we can finally start using our cpu cycles, so let's start ElasticSearch with:
root@elk # service elasticsearch start
Check its logfile (and/or use ps/pgrep) to see if elasticsearch is happy.
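A quick sanity check (assuming the host and port we configured above): ElasticSearch speaks plain HTTP, so a curl against the root endpoint should return a small JSON banner with the node name and version:
root@elk:~ # curl http://127.0.0.1:9100/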
3. LogStash
Time to run logstash, so let's open /usr/local/etc/logstash/logstash.conf and replace all its content with:
input {
  lumberjack {
    port => 4433
    ssl_certificate => "/usr/local/etc/logstash/logstash-forwarder.crt"
    ssl_key => "/usr/local/etc/logstash/logstash-forwarder.key"
  }
}
filter {
  if [type] == "myapp" {
    grok {
      match => { "message" => "%{TIME:event_time} %{DATA:process}:%{GREEDYDATA:message}" }
    }
  }
}
output {
  elasticsearch { hosts => ["localhost:9100"] }
  stdout { codec => rubydebug }
}
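To make the grok part less magic, here is what the pattern above would extract from a hypothetical myapp.log line (the log format is my assumption; adapt the pattern to your real one):
# hypothetical line in /var/log/myapp.log:
#   14:23:01 worker-3: connection to db01 timed out
# fields extracted by the grok pattern:
#   event_time => "14:23:01"
#   process    => "worker-3"
#   message    => " connection to db01 timed out"
# note: without overwrite => ["message"], grok appends the capture to the
# existing message field instead of replacing it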
What's that lumberjack thing? It's a protocol, used primarily via the logstash-forwarder.
On each server you will install and run a logstash-forwarder (e.g. https://github.com/didfet/logstash-forwarder-java) that will 'take care' of our /var/log/myapp.log, sending it to our logstash server (the first time it will send the entire file, then only the deltas).
On your servers you will have to start something like:
/usr/local/bin/java -jar logstash-forwarder-java-X.Y.Z.jar -config /usr/local/etc/logstash-agent.json
where logstash-agent.json is:
{
  "network": {
    "servers": [ "XXX.YYY.WWW.ZZZ:4433" ],
    "ssl ca": "/usr/local/etc/logstash/keystore.jks",
    "timeout": 15
  },
  "files": [
    {
      "paths": [ "/var/log/myapp.log" ],
      "fields": { "type": "myapp" }
    }
  ]
}
Here we are configuring our logstash endpoint (XXX.YYY.WWW.ZZZ), its port (4433), the keystore that will be used to encrypt the traffic, the file that will be 'monitored', and the type that will be attached to each event sent.
Awesome, isn't it?
Unfortunately logstash-forwarder-java is not in the FreeBSD ports tree (you will find sysutils/logstash-forwarder, written in Go), but I don't want to compile and install Go for this, so I'll use the Java one.
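If you don't want the forwarder to die together with your shell, FreeBSD's daemon(8) can detach it for you; a minimal sketch, assuming you keep the jar next to its config (the pidfile path is just my choice):
/usr/sbin/daemon -f -p /var/run/logstash-forwarder.pid \
    /usr/local/bin/java -jar logstash-forwarder-java-X.Y.Z.jar \
    -config /usr/local/etc/logstash-agent.json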
A few words on the SSL stuff
Let's create key and cert on our elk server with:
openssl req -x509 -batch -nodes -newkey rsa:2048 -keyout /usr/local/etc/logstash/logstash-forwarder.key -out /usr/local/etc/logstash/logstash-forwarder.crt
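You can inspect what you just generated (note that without -days, openssl defaults to a certificate valid for only 30 days):
openssl x509 -in /usr/local/etc/logstash/logstash-forwarder.crt -noout -subject -dates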
Create a keystore based on logstash-forwarder.crt:
keytool -importcert -trustcacerts -file /usr/local/etc/logstash/logstash-forwarder.crt -alias ca -keystore /usr/local/etc/logstash/keystore.jks
(keytool will ask for a password; this is a good moment to remember how much you hate systemd)
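Before distributing it, you can double-check that the certificate actually landed in the keystore (yes, keytool will ask for that password again):
keytool -list -keystore /usr/local/etc/logstash/keystore.jks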
Distribute the jks on all clients
Grab a beer and relax
Now you can start logstash-forwarder-java on your clients and logstash on your server with:
root@elk # service logstash start
then take a look at /var/log/logstash.log; our planet should still be safe.
Now stay tuned for the second part of this amazing tutorial, where we'll talk about kibana with screenshots and incredible special effects and probably a spoiler on Star Wars Episode VIII (just kidding): until then, have fun!
4. ElastAlert
To create alerts for our ELK setup, we can use different methods.
The one I will show you is based on ElastAlert from Yelp.
Let's install ElastAlert (no port is available, so I will install it manually in a virtualenv).
We need to be root (and to use bash, for the virtualenv):
sudo su
Install py-virtualenv
portmaster devel/py-virtualenv
Create and use a virtualenv
virtualenv /usr/local/elastalert
source /usr/local/elastalert/bin/activate
mkdir -p /usr/local/elastalert/etc
Download and install the repo
mkdir /tmp/elastalert
cd /tmp/elastalert
git clone https://github.com/Yelp/elastalert.git
cd elastalert
python setup.py build
pip install setuptools --upgrade
python setup.py install
pip install -r requirements.txt
# the first time it will (probably) fail due to an error related to argparse
pip install -r requirements.txt
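A quick check that the entry points landed in our virtualenv (the path is the one we created above):
which elastalert-create-index
# should print /usr/local/elastalert/bin/elastalert-create-index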
Create the elastalert config file in /usr/local/elastalert/etc/config.yml
rules_folder: /usr/local/elastalert/etc/rules
# The unit can be anything from weeks to seconds
run_every:
  minutes: 1
# ElastAlert will buffer results from the most recent
# period of time, in case some log sources are not in real time
buffer_time:
  minutes: 15
# The elasticsearch hostname for metadata writeback
# Note that every rule can have its own elasticsearch host
es_host: 127.0.0.1
# The elasticsearch port
es_port: 9100
# Optional URL prefix for elasticsearch
#es_url_prefix: elasticsearch
# Connect with SSL to elasticsearch
use_ssl: False
# The index on es_host which is used for metadata storage
# This can be an unmapped index, but it is recommended that you run
# elastalert-create-index to set a mapping
writeback_index: elastalert_status
# If an alert fails for some reason, ElastAlert will retry
# sending the alert until this time period has elapsed
alert_time_limit:
  days: 2
Create the rules directory
mkdir -p /usr/local/elastalert/etc/rules
Create a frequency-based alert that will send an email if 10 or more events with status: 404 and type: nginx happen within 1 hour.
In /usr/local/elastalert/etc/rules/frequency_nginx_404.yaml
name: Large Number of 404 Responses
es_host: 127.0.0.1
es_port: 9100
index: logstash-*
filter:
- term:
    status: 404
- term:
    type: nginx
type: frequency
num_events: 10
timeframe:
  hours: 1
alert:
- "email"
email:
- "spaghetti.with.meatballs@are.not.italian.at.all"
Create an index for metadata storage
(elastalert)[dave@elk /usr/local/elastalert]$ ./bin/elastalert-create-index
Enter elasticsearch host: 127.0.0.1
Enter elasticsearch port: 9100
Use SSL? t/f: f
Enter optional basic-auth username:
Enter optional basic-auth password:
Enter optional Elasticsearch URL prefix:
New index name? (Default elastalert_status)
Name of existing index to copy? (Default None)
New index elastalert_status created
Done!
Test our rule
(elastalert)[dave@elk /usr/local/elastalert]$ ./bin/elastalert-test-rule etc/rules/frequency_nginx_404.yaml
[...]
Launch ElastAlert (in a tmux session, maybe?)
(elastalert)[dave@elk /usr/local/elastalert]$ ./bin/elastalert --config etc/config.yml --debug
INFO:elastalert:Starting up
INFO:elastalert:Queried rule Large Number of 404 Responses from 2016-02-16 17:22 CET to 2016-02-16 17:37 CET: 9 hits
[...]
INFO:elastalert:Ran Large Number of 404 Responses from 2016-02-16 17:22 CET to 2016-02-16 17:37 CET: 9 query hits, 0 matches, 0 alerts sent
Let's generate some http/404s (again, it's time to let the world know how much you agree with the systemd architecture).
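A crude way to do that, assuming the nginx vhost in the logs below is yours (the URL is just an example that doesn't exist, hence the 404s):
for i in $(seq 1 10); do curl -s -o /dev/null http://blog.gufi.org/test/foo/bar; done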
INFO:elastalert:Sleeping for 59 seconds
INFO:elastalert:Queried rule Large Number of 404 Responses from 2016-02-16 17:23 CET to 2016-02-16 17:38 CET: 10 hits
[...]
INFO:elastalert:Alert for Large Number of 404 Responses at 2016-02-16T16:38:00.680Z:
INFO:elastalert:Large Number of 404 Responses
At least 10 events occurred between 2016-02-16 16:38 CET and 2016-02-16 17:38 CET
@timestamp: 2016-02-16T16:38:00.680Z
@version: 1
_id: AVLq8hzNIkvyITAb373u
_index: logstash-2016.02.16
_type: nginx
agent: "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.109 Safari/537.36"
bytes: 564
file: /var/log/nginx/nginx.access.log
host: [
"blog.gufi.org"
]
message: xxx.xxx.xxx.xxx - blog.gufi.org [16/Feb/2016:17:37:57 +0100] "GET /test/foo/bar HTTP/1.1" 404 564 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.109 Safari/537.36" "141.101.98.223" [-] "-" "-" "0.000"
offset: 1853436
remote_addr: xxx.xxx.xxx.xxx
request_httpversion: 1.1
request_time: 0.000
request_url: /test/foo/bar
request_verb: GET
status: 404
time_local: 16/Feb/2016:17:37:57 +0100
type: nginx
upstream_addr: -
xforwardedfor: "xxx.xxx.xxx.xxx"
Once the --debug parameter is removed, we will start receiving e-mails (yay!).
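And to actually keep ElastAlert running after you close your terminal, a detached tmux session is enough; a minimal sketch, using the paths from above:
tmux new-session -d -s elastalert \
    '/usr/local/elastalert/bin/elastalert --config /usr/local/elastalert/etc/config.yml'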
Before leaving
A few words to answer the question 'Ok, so how can this be useful to me/my company/my fiancée?' Many of us sysadmins have used Nagios and its derivatives (Zabbix/Icinga) for years, but it's now time to say goodbye to Nagios (and thanks for all the fish).
Tools like ELK or OpenTSDB (or InfluxDB/KairosDB) and Bosun/Prometheus were created to give us a new generation of tools better suited to environments that keep getting bigger: I know, creating and managing an ELK stack (or an OpenTSDB/Grafana/Bosun stack) requires more effort than managing a Nagios box, but it's an overhead you will soon get used to (and you probably already have a hadoop/hbase installation to manage, right?).
In this case, having an in-house tool to parse your application logs will allow you:
- to blame developers if something goes wrong (just in case you need further reasons to)
- to not give them access to any production machines (yes, they will ask anyway)
- to be able to search all your logs at once (like grepping on a syslog-ng basedir, but with superpowers) or with better semantics (e.g. spotting trends)
- to create dashboards (for your management, you know...) or alerts (because you need a good reason to skip that boring meeting, right?)
ELK engineers suggest using ELK not only for DEBUG/ERROR messages but also for application-level ones: this will add great value to your logs and, once again, the world will be a safer place thanks to you, bro.
There were no screenshots in this article, so here's a potato for you.
Resources:
blog.gufi.org
freebsd.org
www.elastic.co
Any questions are welcome.