Check MK – Write your own check

Download mkp-File: redis-info-mkp

Many thanks to Robert Sander, who was so kind as to create a GitHub repo at https://github.com/HeinleinSupport/check_mk/tree/master/redis_info – he will update it sooner or later.

2016-12-06 – Updated this How To: if you use my old version of the redis_info_check (up to version 0.2.5), first delete every corresponding rule, remove the old package, clean up your agents and wait until they have been deployed without any redis_info_check. Once everything is cleaned up, install the new package, create new rules and so on.

2017-02-05 – Updated HowTo to version 0.3.3

2017-07-14 – For Check_MK 1.4.0pX you have to change a few lines:

  • each #!/usr/bin/python must be replaced with #!/usr/bin/env python
  • and, so that the agent bakery doesn’t complain about local_agents_dir, you have to change it to cmk.paths.local_agents_dir in the agents/bakery/redis_info file

I’ll update the howto and source/mkp-files ASAP.

Introduction

A good starting point for writing your own checks is https://mathias-kettner.de/checkmk_devel_agentbased.html. The Check_MK documentation is sometimes unclear and/or incomplete – they are working on it – so there is a lot of self-study to do. If you are using the Check_MK Enterprise Edition, you’ll soon miss a how-to for creating a check that can be distributed by the „Agent Updater/Bakery". Here I’ll try to explain how to create a complete check module for REDIS that can later be distributed this way. I also tried to follow the guidelines for writing checks (https://mathias-kettner.de/checkmk_devel_guidelines.html), but don’t be too harsh with me: I was only thrown into Python through Check_MK.

Where to begin?

Let’s have a look at the folder structure where, as far as I understand it, self-programmed checks and their corresponding parts should be placed. Change to your site user (e.g. with „omd su mysite") and go to ~/local/share/check_mk – here we can see:

.
├── agents
│   ├── bakery
│   │   └── redis_info
│   ├── plugins
│   │   └── redis_info
│   └── special
├── alert_handlers
├── checkman
│   └── redis_info
├── checks
│   └── redis_info
├── compiled_mibs
├── inventory
├── mibs
├── notifications
├── pnp-rraconf
├── pnp-templates
├── reporting
│   └── images
└── web
    ├── htdocs
    │   └── images
    └── plugins
        ├── config
        ├── dashboard
        ├── icons
        ├── metrics
        ├── pages
        ├── perfometer
        │   └── redis_info.py
        ├── sidebar
        ├── views
        ├── visuals
        └── wato
            ├── agent_bakery_redis_info.py
            └── check_parameters_redis_info.py

We can already see a few „redis" files here – all of them will later be packaged into an mkp file. But first some explanations; later we will look at all the files in detail.

agents/bakery

Here you place a file with Python code that tells the agent bakery where to find the plugin itself and where it should be placed on the client. Besides that, this file can contain procedures that define where a config file with configuration parameters for the client-side check should be placed.

agents/plugins

Here is the plugin that will be run on the corresponding client. On Linux hosts it normally lands in the „PluginsDirectory" (e.g. /usr/lib/check_mk_agent/plugins). The files in here must be executable (e.g. chmod 755). Their output should look, for example, like this:

<<<mycheck>>>
some output
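As a minimal sketch of this contract (the function name format_section is my own and not part of Check_MK), a plugin only has to print the section header followed by its data lines:

```python
#!/usr/bin/env python
import sys


def format_section(name, lines):
    """Build one agent section: the <<<name>>> header followed by one data line per entry."""
    return "<<<%s>>>\n" % name + "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # prints the example section shown above
    sys.stdout.write(format_section("mycheck", ["some output"]))
```

Everything after the header up to the next <<<…>>> line is handed to the check with the same name on the Check_MK server.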

checkman

This should contain a file with the „manual" of the check. Unfortunately there is no documentation at all about what this file should look like, so you’ll have to look at the checkman files of other checks to find out what it should/can contain.

checks

Here go the files with the checks that Check_MK itself executes to process the output of an agent. As far as I know, files in this folder are always interpreted, no matter which file extension you choose. So don’t be surprised if, for example, a .bak, .orig or .old file renders your Check_MK unresponsive.

web/plugins/perfometer/

This contains your configured Perf-O-Meter graphs, which you often see to the right of the service checks of your hosts.

web/plugins/wato/

  • if you want to distribute the plugin with the Agent Bakery, you’ll need an agent_bakery file. This file contains some info about the agent plugin itself and is shown in WATO, for example, if you click on „Monitoring Agents – Agent Bakery" -> „Rules".
  • And secondly, you’ll need a check_parameters file. This file contains info about all the parameters (e.g. warning/critical levels) of the check itself.

The Check

Client plugin (agents/plugins/redis_info)

REDIS listens on port 6379 by default, and you can use either redis-cli or a normal telnet client to issue commands. The command we need is simply „INFO" – it returns interesting output about the status of the service. To get this data I created a simple Python script that connects to the port via Python’s socket module and sends „INFO". The output is then just written to the console. This script can be written in any programming/scripting language you want; the agent just has to be able to execute it (Linux: bash, perl, python,… ; Windows: batch, powershell, …). If you want to stick to the guidelines, you should only use very basic, built-in tools for your checks, so that there are very, very few dependencies. In this case I could have used redis-cli – but as I’m not sure it is available everywhere, I did it in Python (keep Python 2 and 3 compatibility in mind!).
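Redis answers INFO with a single „bulk string" reply: a line $<length> (that is the $951 line in the example output of the plugin below), followed by exactly that many bytes and a final \r\n. The recv_timeout function in the plugin below instead waits until no more data arrives. As an alternative sketch, not the approach the plugin uses, the reply can be read by its announced length; the function names are my own and error handling is minimal:

```python
import socket


def read_bulk_reply(stream):
    r"""Read one RESP bulk-string reply ($<length>\r\n<payload>\r\n) from a binary file object."""
    header = stream.readline()  # e.g. b"$951\r\n"
    if not header.startswith(b"$"):
        # e.g. an error reply such as b"-ERR ...\r\n"
        raise ValueError("unexpected reply: %r" % header)
    length = int(header[1:].strip())
    if length < 0:
        return None  # "$-1" is a null reply
    payload = stream.read(length)
    stream.read(2)  # consume the trailing \r\n
    return payload.decode("utf-8", "replace")


def redis_info(host="127.0.0.1", port=6379, timeout=3.0):
    """Send INFO as an inline command and return the decoded reply text."""
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        sock.sendall(b"INFO\r\n")
        reader = sock.makefile("rb")
        try:
            return read_bulk_reply(reader)
        finally:
            reader.close()
    finally:
        sock.close()
```

Calling print(redis_info("127.0.0.1", 6379)) returns as soon as the reply is complete instead of waiting for a timeout, but it assumes the server really answers with a bulk string.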

#!/usr/bin/env python
# -*- encoding: utf-8; py-indent-offset: 4 -*-

# Check_MK Redis Info Plugin - queries the INFO command of Redis instances.
#
# Copyright 2016, Clemens Steinkogler <c.steinkogler[at]cashpoint.com>
# many thanks to: http://www.binarytides.com/receive-full-data-with-the-recv-socket-function-in-python/
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
#
#
# Example Agent Output:
# <<<redis_info>>>
# +++ 127.0.0.1:6389 +++
# $951
# redis_version:2.4.10
# redis_git_sha1:00000000
# redis_git_dirty:0
# arch_bits:64
# multiplexing_api:epoll
# gcc_version:4.4.6
# process_id:10170
# uptime_in_seconds:10370
# uptime_in_days:0
# lru_clock:2085406
# used_cpu_sys:1.49
# used_cpu_user:2.13
# used_cpu_sys_children:0.00
# used_cpu_user_children:0.00
# connected_clients:1
# connected_slaves:0
# client_longest_output_list:0
# client_biggest_input_buf:0
# blocked_clients:0
# used_memory:726112
# used_memory_human:709.09K
# used_memory_rss:7143424
# used_memory_peak:734632
# used_memory_peak_human:717.41K
# mem_fragmentation_ratio:9.84
# mem_allocator:jemalloc-2.2.5
# loading:0
# aof_enabled:0
# changes_since_last_save:0
# bgsave_in_progress:0
# last_save_time:1467878579
# bgrewriteaof_in_progress:0
# total_connections_received:190
# total_commands_processed:189
# expired_keys:0
# evicted_keys:0
# keyspace_hits:0
# keyspace_misses:0
# pubsub_channels:0
# pubsub_patterns:0
# latest_fork_usec:0
# vm_enabled:0
# role:master
# --- 127.0.0.1:6389 ---

# ### imports
import socket  # for sockets
import sys  # for exit
import time  # for time
import os
import inspect
from subprocess import Popen, PIPE


# ##### methods ###########
def recv_timeout(the_socket, timeout=2):
    # make socket non blocking
    the_socket.setblocking(0)

    # collect the received data chunk by chunk in a list
    total_data = []

    # beginning time
    begin = time.time()
    while True:
        # if we already got some data, stop once nothing new arrived within the timeout
        if total_data and time.time() - begin > timeout:
            break
        # if we got no data at all, wait a little longer: twice the timeout
        elif time.time() - begin > timeout * 2:
            break

        # receive something
        try:
            data = the_socket.recv(8192)
            if data:
                total_data.append(data)
                # reset the beginning time for the measurement
                begin = time.time()
            else:
                # an empty read means the server closed the connection
                break
        except socket.error:
            # nothing to read yet on the non-blocking socket - sleep a moment instead of busy-looping
            time.sleep(0.1)

    # join all parts (bytes on Python 3) and decode them to a printable string
    return b''.join(total_data).decode('utf-8', 'replace')
# enddef


def send_cmd(the_socket, message):
    message += "\r\n"
    try:
        # send the whole string - sockets expect bytes on Python 3
        the_socket.sendall(message.encode('ascii'))
    except socket.error:
        # Send failed
        print('Failed to send command')
        sys.exit()
# enddef


# thanks to: http://stackoverflow.com/questions/3718657/how-to-properly-determine-current-script-directory-in-python
def get_script_dir(follow_symlinks=True):
    if getattr(sys, 'frozen', False):  # py2exe, PyInstaller, cx_Freeze
        path = os.path.abspath(sys.executable)
    else:
        path = inspect.getabsfile(get_script_dir)
    if follow_symlinks:
        path = os.path.realpath(path)
    return os.path.dirname(path)
# enddef

##################################################################################
# vars
config_file_dir = get_script_dir()
config_file = config_file_dir + "/redis_info.cfg"

##################################################################################
# main
print("<<<redis_info>>>")
try:
    with open(config_file, 'r') as fh:
        config = [line.strip().split() for line in fh]

    # if we just find a line with the string 'automatic', we detect our redis-instances automatically
    # there are definitely better ways how to do it - look at Robert Sander's github-repo (heinlein)
    if config[0][0] == 'automatic':
        config = []  # we reset the config
        # command: ps xa -o command= | grep redis-server | grep -v grep | cut -d " " -f2
        # output: 127.0.0.1:6379
        #         *:6380
        p1 = Popen(['ps', 'ax', '-o', 'command='], stdout=PIPE)
        p2 = Popen(['grep', 'redis-server'], stdin=p1.stdout, stdout=PIPE)
        p3 = Popen(['grep', '-v', 'grep'], stdin=p2.stdout, stdout=PIPE)
        # universal_newlines=True returns text instead of bytes on Python 3
        p4 = Popen(['cut', '-d', ' ', '-f2'], stdin=p3.stdout, stdout=PIPE, universal_newlines=True)
        output = p4.communicate()[0]
        output = output.splitlines()  # split lines into list, for example will create: ['127.0.0.1:6379', '*:6380']
        for output_element in output:
            output_element = output_element.replace('*',
                                                    '127.0.0.1')  # if redis listens on all interfaces, we take localhost
            output_element = output_element.split(':')  # we create a list and append it to the config list
            config.append(output_element)  # we receive a list [['127.0.0.1', '6379'], ['127.0.0.1', '6380']]
        # endfor
    # endif

    # print(config)
    for config_element in config:
        redis_error = False
        s = None
        # print(str(config_element))
        if len(config_element) == 2:
            redis_host = config_element[0]
            redis_port = int(config_element[1])
            print('+++ ' + str(redis_host) + ':' + str(redis_port) + ' +++')
        else:
            # every line must look like "<host> <port>"
            raise IOError("malformed config line: " + " ".join(config_element))

        # create an INET, STREAMing socket
        try:
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        except socket.error:
            redis_error = True
            print('Failed to create socket')

        # resolve the hostname, but only if the socket could be created
        if not redis_error:
            try:
                remote_ip = socket.gethostbyname(redis_host)
            except socket.gaierror:
                redis_error = True
                print('Failed - hostname could not be resolved')

        # connect to the Redis server, but only if the steps before succeeded
        if not redis_error:
            try:
                s.connect((remote_ip, redis_port))
            except socket.error:
                redis_error = True
                print('Failed to connect to host')

        if not redis_error:
            # send info command - will get all info sections
            send_cmd(s, "info")
            # DEBUG: get reply and print
            # print(recv_timeout(s))

            recvd_data = recv_timeout(s)
            print(recvd_data)
        # endif

        # close the socket in every case, also after a failed connect
        if s is not None:
            s.close()

        print('--- ' + str(redis_host) + ':' + str(redis_port) + ' ---')
    # endfor
except IOError as e:
    print("IOError: Error reading config -- " + str(e))
except IndexError as e:
    print("IndexError: Config file is empty or starts with an empty line -- " + str(e))
# endtry

Yay, the first part is finished: we have a program that creates output Check_MK can parse. You can already place this plugin on a host where a Redis daemon is running. Don’t forget to create a redis_info.cfg in the same folder containing the line 127.0.0.1 6379 (one „host port" pair per line), or just the word automatic to detect the running instances. Next we need the check script on our Check_MK server, which will interpret the output.
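The accepted format of redis_info.cfg follows from the plugin’s main block: either the single word automatic on the first line, or one „host port" pair per line. A small sketch of the same rules as a standalone function (parse_redis_info_cfg is a hypothetical name; unlike the plugin, it skips blank lines and returns an empty list for an empty file instead of failing):

```python
def parse_redis_info_cfg(lines):
    """Return "automatic" or a list of (host, port) tuples from the lines of redis_info.cfg."""
    # split each line into words and ignore empty lines (the plugin itself would reject them)
    entries = [line.split() for line in lines if line.strip()]
    if entries and entries[0][0] == "automatic":
        return "automatic"
    instances = []
    for entry in entries:
        if len(entry) != 2:
            raise ValueError("expected '<host> <port>', got: %r" % " ".join(entry))
        instances.append((entry[0], int(entry[1])))
    return instances
```

It can be fed a file object directly, e.g. with open("redis_info.cfg") as fh: print(parse_redis_info_cfg(fh)).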

Server plugin

This will be the most complicated part. The agent now delivers the output and we need to parse every line. The way „INFO" formats its output changed slightly between Redis versions (newer versions, for example, split it into sections like „# Server"), so we have to handle that too. The code:

#!/usr/bin/env python
# -*- encoding: utf-8; py-indent-offset: 4 -*-

# Check_MK Redis Info Plugin
#
# Copyright 2016, Clemens Steinkogler <c.steinkogler[at]cashpoint.com>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
#
# check_mk --debug -nv --checks=redis_info some.redis-server.dom

# example output
# <<<redis_info>>>
# +++ 127.0.0.1:6380 +++
# Failed to connect to host
# --- 127.0.0.1:6380 ---
# +++ 127.0.0.1:6379 +++
# $1858
# # Server
# redis_version:2.8.19

import time

# factory_settings ... is part of version/share/check_mk/modules/check_mk.py
factory_settings["redis_info_default_values"] = {
    'config': ((None, None), False),
}


# the inventory function
def inventory_redis_info(info):
    if len(info) > 0:
        hostinfo = ''
        for line in info:
            redis_info_item = ''
            line = " ".join(line)
            line = str(line)
            # print("line: '" + line + "'")
            if line.startswith('+++'):
                hostinfo = line.strip('+++').strip()
                # print('hostinfo: ' + hostinfo)
                continue  # just go to next line
            elif line.startswith('---'):
                continue  # end of host-section reached
            elif line.startswith('$') or line.startswith('#'):
                continue
            # endif

            errors = ['Failed', 'Error']
            if any(error in line for error in errors):
                redis_info_item = hostinfo + ": error"
                yield (redis_info_item, "redis_info_default_values")
            elif ":" in line:
                line_as_list = line.split(':', 1)  # split at the first colon only, values may contain colons
                # we have to reformat for example ["slave0", "ip=10.12.47.107,port=6380,state=online,offset=89946822,lag=1"]
                # to get a nice usable format
                #print(str(line_as_list))
                if "=" in line_as_list[1]:
                    dissected_line_as_list = line_as_list[1].split(",")
                    # will create a new list
                    # ["ip=10.12.47.107", "port=6380", "state=online", "offset=89946822", "lag=1"]
                    for dissected_line_as_list_element in dissected_line_as_list:
                        hostinfo_part, hostinfo_value = dissected_line_as_list_element.split("=", 1)
                        # we now split for example "ip=10.12.47.107" to hostinfo_part = "ip" and to
                        # hostinfo_value = "10.12.47.107"
                        redis_info_item = hostinfo + ": " + str(line_as_list[0]) + "_" + str(hostinfo_part)
                        # we now yield a new item - e.g.: slave0_ip: 10.12.47.107
                        # print(str(redis_info_item))
                        yield (redis_info_item, "redis_info_default_values")
                    # endfor
                else:
                    redis_info_item = hostinfo + ": " + str(line_as_list[0])
                    # print(str(redis_info_item))
                    yield (redis_info_item, "redis_info_default_values")
                # endif
            # endif
        # endfor
    # endif
# enddef


# create the perfdata stuff and save new values in counter-file
def create_redis_total_commands_perfdata(current_total_commands, last_total_commands, last_total_commands_diff):
    total_commands_diff = last_total_commands_diff  # stays unchanged if no new commands were processed
    # parameter_name = "total_commands_processed"
    if current_total_commands > last_total_commands:  # redis was not restarted
        current_diff = current_total_commands - last_total_commands
        total_commands_diff = last_total_commands_diff + current_diff
        set_item_state('daily_redis_total_commands_diff', total_commands_diff)  # save new daily total commands difference
    # endif

    if current_total_commands < last_total_commands:  # redis must have been restarted
        total_commands_diff = last_total_commands_diff + current_total_commands  # we add the amount of new commands to the already known ones
        set_item_state('daily_redis_total_commands_diff', total_commands_diff)  # save new daily total commands difference
    # endif

    # return parameter_name, saveint(total_commands_diff)  # return the needed perfdata
    return saveint(total_commands_diff)  # return the needed perfdata-value
# enddef


def float_int_or_string(value):
    if isinstance(value, int):
        return saveint(value)
    # endif

    if isinstance(value, float):
        return savefloat(value)
    # endif

    if isinstance(value, basestring):
        value_int = saveint(value)
        value_float = savefloat(value)

        # ### DEBUGGING
        # print("value: " + str(value))
        # print("value_int: " + str(value_int))
        # print("value_float: " + str("{0:.2f}".format(round(value_float,2))))
        # some examples and their return value
        #
        # # value is initially really a string - e.g.: master
        # saveint will return    0
        # savefloat will return  0.0
        # the 'integer string' 0 is not equal to the string master - the elif will be taken into account
        # the 'float string' 0.0 is not equal to the string master - the else will be taken into account
        # so the initial given string will be returned
        #
        # # value is initially an integer given as string - e.g.: 17
        # saveint will return    17
        # savefloat will return  17.0
        # the 'integer string' 17 is equal to the string 17 the value will be returned as integer
        #
        # # value is initially a float given as string - e.g.: 17.4
        # saveint will return    17
        # savefloat will return  17.4
        # the 'integer string' 17 is not equal to the string 17.4 the elif will be taken into account
        # the 'float string' 17.4 is equal to the string 17.4 so it will be returned as float
        if str(value_int) == str(value):
            return value_int
        # as we possibly receive a value of for example 1.10, savefloat will return 1.1 - for comparison
        # we need a float with two decimal points
        elif str("{0:.2f}".format(round(value_float, 2))) == str(value):
            return value_float
        else:
            return value
        # endif
    # endif
# enddef


# the check function
def check_redis_info(item, params, info):
    perfdata = []
    state = 0

    # ### DEBUGGING
    # print(str(params))
    item_as_list = item.split(": ")
    host_in_item, hostinfo_in_item = item_as_list
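    info_as_dict = {}
    # lines printed before the first +++ line (e.g. config errors of the plugin) are stored under ''
    hostinfo = ''
    info_as_dict[hostinfo] = {}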
    create_perfdata = False
    errors = ['Failed', 'Error']
    warn, crit = params['config'][0]
    create_perfdata = params['config'][1]
    warn_crit_append_string = ""
    message = ""
    perfdata_value = None
    perfdata_value_str = None
    check_string_crit = False
    # ### DEBUGGING
    # just tried something - some interesting variables
    # print(str(get_info_for_check(g_hostname, lookup_ip_address(g_hostname), 'redis_info')))
    # print(str(extra_service_conf))
    # debug warn and crit
    # print(str(warn))
    # print(str(crit))

    for line in info:
        line = " ".join(line)
        line = str(line)
        if line.startswith('+++'):
            hostinfo = line.strip('+++').strip()
            info_as_dict[hostinfo] = {}
            continue  # just go to next line
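        elif line.startswith('---'):
            continue  # end of host-section reached
        elif line.startswith('$') or line.startswith('#'):
            continue  # skip the bulk-length line and section headers like "# Server", as the inventory does
        else: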
            if any(error in line for error in errors):
                info_as_dict[hostinfo].update({hostinfo_in_item: str(line)})
            elif ":" in line:
                # e.g.:  line = "slave0: ip=10.12.47.107,port=6380,state=online,offset=89946822,lag=1"
                line_as_list = line.split(':', 1)  # split at the first colon only, values may contain colons
                # e.g.: line_as_list = ["slave0", "ip=10.12.47.107,port=6380,state=online,offset=89946822,lag=1"]
                if "=" in line_as_list[1]:
                    dissected_line_as_list = line_as_list[1].split(",")
                    # will create a new list
                    # ["ip=10.12.47.107", "port=6380", "state=online", "offset=89946822", "lag=1"]
                    for dissected_line_as_list_element in dissected_line_as_list:
                        hostinfo_part, hostinfo_value = dissected_line_as_list_element.split("=", 1)
                        info_as_dict[hostinfo].update({str(line_as_list[0]) + '_' + str(hostinfo_part): str(hostinfo_value)})
                        # e.g. info_as_dict["127.0.0.1:6379"].update({"slave0_ip": "10.12.47.107"})
                    # endfor
                else:
                    info_as_dict[hostinfo].update({line_as_list[0]: line_as_list[1]})
                # endif
            # endif
        # endif
    # endfor

    # from pprint import pprint
    # pprint(info_as_dict)

    # Catch the case that automatic detection is used: the plugin may no longer be able to connect to a previously
    # automatically detected redis-instance, so here we try to read the supplied value. If it's not available,
    # a 'KeyError' would occur
    try:
        hostinfo_in_item_value = str(info_as_dict[host_in_item][hostinfo_in_item])
        # print(str(info_as_dict))
    except KeyError:
        hostinfo_in_item_value = "Error - Redis-Instance or Check not available"
    # endtry

    # if initial first connection to host fails (e.g. if first inventory is run)
    if "error" in item:
        state = 2
    # else if connection to host was lost or if REDIS is currently not running
    elif any(error in hostinfo_in_item_value for error in errors):
        state = 2
    else:
        warn = float_int_or_string(warn)
        crit = float_int_or_string(crit)
        # print("host_in_item: " + str(host_in_item))
        # print("hostinfo_in_item: " + str(hostinfo_in_item))
        # print("hostinfo_in_item value: " + str(info_as_dict[host_in_item][hostinfo_in_item]))
        perfdata_value = float_int_or_string(info_as_dict[host_in_item][hostinfo_in_item])
        # print("perfdata_value after float_int_or_string: " + str(perfdata_value))
        if isinstance(perfdata_value, basestring):
            if (warn is None) and (crit is None):
                warn_crit_append_string = " "
            else:
                warn_crit_append_string = " (crit if string changes) "
                check_string_crit = True
            # endif
        else:
            if (warn is None) and (crit is None):
                warn_crit_append_string = " "
            else:
                warn_crit_append_string = " (warn/crit at " + str(warn) + "/" + str(crit) + ")"
            # endif
        # endif

        bytes_to_mb_for = ['used_memory', 'used_memory_peak', 'used_memory_rss']
        if hostinfo_in_item in bytes_to_mb_for:
            perfdata_value = round(float(perfdata_value) / 1024 / 1024, 2)
            perfdata_value_str = str(perfdata_value) + "MB"
        else:
            perfdata_value_str = str(perfdata_value)
        # endif
    # endif

    if create_perfdata and state != 2:  # no perfdata values exist in the error case
        # catch total_commands_processed
        if hostinfo_in_item == "total_commands_processed":
            # print("hostinfo_in_item: " + str(hostinfo_in_item))
            date = time.strftime("%Y%m%d")
            current_redis_total_commands = saveint(perfdata_value)  # current total_commands_processed value
            daily_redis_total_commands_diff = get_item_state('daily_redis_total_commands_diff')  # we try to get the daily processed total commands
            # print("daily_redis_total_commands_diff: " + str(daily_redis_total_commands_diff))

            if daily_redis_total_commands_diff is None:  # there was no value found - so this is running the first time
                # we set the few different counters that we need later
                daily_redis_total_commands_diff = 0
                set_item_state('daily_redis_total_commands_diff', daily_redis_total_commands_diff)
                daily_redis_total_commands_date = date
                set_item_state('daily_redis_total_commands_date', daily_redis_total_commands_date)
                last_redis_total_commands = current_redis_total_commands
                set_item_state('last_redis_total_commands', last_redis_total_commands)
                perfdata_value = saveint(daily_redis_total_commands_diff)
                # text = "%s: %d" % (parameter, saveint(daily_redis_total_commands_diff))
            else:  # we found an old value
                daily_redis_total_commands_date = get_item_state('daily_redis_total_commands_date')  # we get the set date
                last_redis_total_commands = get_item_state('last_redis_total_commands')  # we get the last total commands processed value
                # print("daily_redis_total_commands_date: " + str(daily_redis_total_commands_date))

                if daily_redis_total_commands_date == date:  # do we still have the same day?
                    # print("date: " + str(date) + " daily_redis_total_commands_date: " + str(daily_redis_total_commands_date))
                    daily_redis_total_commands_diff = get_item_state('daily_redis_total_commands_diff')

                    # we have a function for creating the perfdata stuff, it will also set the new daily_redis_total_commands_diff value
                    perfdata_value = create_redis_total_commands_perfdata(current_redis_total_commands, last_redis_total_commands, daily_redis_total_commands_diff)

                    # we set the current total commands processed as the last known - so we can calculate the values for the next check run
                    set_item_state('last_redis_total_commands', current_redis_total_commands)
                else:  # we have a new day - we must reset the already daily counted total commands processed to 0
                    daily_redis_total_commands_diff = 0
                    daily_redis_total_commands_date = date  # we set the new date
                    set_item_state('daily_redis_total_commands_date', daily_redis_total_commands_date)

                    # we have a function for creating the perfdata stuff, it will also set the new daily_redis_total_commands_diff value
                    perfdata_value = create_redis_total_commands_perfdata(current_redis_total_commands, last_redis_total_commands, daily_redis_total_commands_diff)

                    # we set the current total commands processed as the last known
                    set_item_state('last_redis_total_commands', current_redis_total_commands)
                # endif
                perfdata_value_str = str(perfdata_value)
            # endif
        # endif

        # perfdata = [(label, value*, warn, crit, min, max)]
        # * can have a unit of measurement:
        #   + no unit specified - assume a number (int or float) of things (eg, users, processes, load averages)
        #   + s - seconds (also us, ms)
        #   + % - percentage
        #   + B - bytes (also KB, MB, TB, GB?)
        #   + c - a continuous counter (such as bytes transmitted on an interface)
        if (warn is None) or (crit is None):
            perfdata = [(hostinfo_in_item, perfdata_value_str)]
        else:
            perfdata = [(hostinfo_in_item, perfdata_value_str, warn, crit, 0)]
        # endif
    # endif

    if state != 2:
        message = str(hostinfo_in_item) + ": " + str(perfdata_value_str) + str(warn_crit_append_string)
    else:
        try:
            message = "(!!) An error occurred: " + hostinfo_in_item_value
        except KeyError:
            state = 2
            message = "(!!) - Item not found, maybe do a Tabula Rasa :o) "
        # endtry
    # endif

    # we check if warn/crit is set and if we have to change the state
    # the outer parentheses are needed - otherwise "and" binds tighter than "or"
    if ((warn is not None) or (crit is not None)) and (state != 2):
        if check_string_crit:
            if str(crit) != perfdata_value_str:
                state = 2
                message += "difference between set string '" + str(crit) + "' detected(!!)"
            # endif
        else:
            current_value = float_int_or_string(perfdata_value)
            if (crit is not None) and (current_value >= crit):  # critical part
                state = 2
                message += " (!!)"
            elif (warn is not None) and (current_value >= warn):  # warning part
                state = 1
                message += " (!)"
            # endif
        # endif
    # endif

    return state, message, perfdata
# enddef

# declare the check to Check_MK
check_info["redis_info"] = {
    'default_levels_variable': "redis_info_default_values",
    'inventory_function': inventory_redis_info,
    'check_function': check_redis_info,
    'service_description': 'Redis info',
    'has_perfdata': True,
    'group': "redis_info",
}
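The total_commands_processed handling in check_redis_info above spreads its state over three item states (daily_redis_total_commands_diff, daily_redis_total_commands_date and last_redis_total_commands). Stripped of Check_MK, the bookkeeping can be sketched as a pure function; here a plain dict stands in for get_item_state/set_item_state, and the function name is hypothetical:

```python
def update_daily_commands(state, current_total, today):
    """Return the number of commands processed today; state is a dict standing in for the item states."""
    if "last_total" not in state:
        # first run: remember the current counter and start the day at 0
        state.update(last_total=current_total, day=today, daily_diff=0)
        return 0
    if state["day"] != today:
        # a new day has started: reset the daily sum
        state["day"] = today
        state["daily_diff"] = 0
    if current_total >= state["last_total"]:
        delta = current_total - state["last_total"]
    else:
        # the counter went down, so Redis was restarted: count everything since the restart
        delta = current_total
    state["daily_diff"] += delta
    state["last_total"] = current_total
    return state["daily_diff"]
```

In this sketch an unchanged total simply leaves the daily value as it is, and a restart is detected by the counter going down, as in the check.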

Explanations

Debugging

To debug a check, run the shell command check_mk --debug -nv --checks=redis_info some-server.with.redis as the site user. The output will then also include everything your check prints. But don’t forget to comment out the print statements afterwards, as they will cause problems in the web GUI.

NOTE about -n: do not submit results to the core and do not save counters.

Factory Settings

# factory_settings ... is part of version/share/check_mk/modules/check_mk.py
factory_settings["redis_info_default_values"] = {
    'config': ((None, None), False),
}

Here you define the default parameters that the check function def check_redis_info(item, params, info): should use later. These default values are passed to it in the params variable.
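check_redis_info reads params['config'] as ((warn, crit), create_perfdata). A small sketch of that shape; the rule values are made up for illustration, and the dict update only approximates how, as far as I understand, Check_MK lays rule parameters over the factory settings:

```python
# default from factory_settings["redis_info_default_values"]: no levels, no perfdata
FACTORY_DEFAULT = {'config': ((None, None), False)}

# hypothetical parameters coming from a WATO rule: warn at 80, crit at 90, perfdata enabled
RULE_PARAMS = {'config': ((80, 90), True)}


def effective_params(rule_params=None):
    """Approximate the merge: start from the defaults and let rule values override them."""
    params = dict(FACTORY_DEFAULT)
    params.update(rule_params or {})
    return params


def unpack_config(params):
    """Unpack the structure the same way check_redis_info does."""
    (warn, crit), create_perfdata = params['config']
    return warn, crit, create_perfdata
```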

Some notes about ~/version/share/check_mk/modules/check_mk.py

Here you find many predefined, useful methods and functions. Unfortunately they are not well documented on the Check_MK manual website, and you only stumble across them by looking at other checks. I’m sure some parts of my scripts could be replaced by methods found in there – but I just don’t know enough about them yet. For example, there is some „parse" function which surely could be used – but I haven’t had the time to look at it yet.
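As far as I know, check_info also accepts a 'parse_function' entry whose return value is handed to the inventory and check functions instead of the raw info. A sketch of what such a parse step could look like for this agent output (parse_redis_info is a hypothetical name; it mirrors the line handling of the inventory and check functions, but the result layout is my own choice):

```python
def parse_redis_info(info):
    """Turn the agent section (a list of token lists) into {instance: {key: value}}."""
    parsed = {'': {}}  # messages printed before the first +++ line end up under ''
    instance = ''
    for tokens in info:
        line = " ".join(tokens)
        if line.startswith('+++'):
            # "+++ 127.0.0.1:6379 +++" starts a new instance
            instance = line.strip('+').strip()
            parsed[instance] = {}
        elif line.startswith(('---', '$', '#')):
            # end marker, bulk-length line or section header of newer Redis versions
            continue
        elif 'Failed' in line or 'Error' in line:
            parsed[instance]['error'] = line
        elif ':' in line:
            key, value = line.split(':', 1)
            if '=' in value:
                # "slave0:ip=10.12.47.107,port=6380" becomes slave0_ip, slave0_port, ...
                for pair in value.split(','):
                    if '=' in pair:
                        sub_key, sub_value = pair.split('=', 1)
                        parsed[instance][key + '_' + sub_key] = sub_value
            else:
                parsed[instance][key] = value
    if not parsed['']:
        del parsed['']
    return parsed
```

Registered via 'parse_function': parse_redis_info in check_info, the inventory and check functions could then look up parsed[host][key] directly instead of walking all lines again.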

The Inventory Function

# the inventory function
def inventory_redis_info(info):
    if len(info) > 0:
        hostinfo = ''
        for line in info:
            redis_info_item = ''
            line = " ".join(line)
            line = str(line)
            # print("line: '" + line + "'")
            if line.startswith('+++'):
                hostinfo = line.strip('+++').strip()
                # print('hostinfo: ' + hostinfo)
                continue  # just go to next line
            elif line.startswith('---'):
                continue  # end of host-section reached
            elif line.startswith('$') or line.startswith('#'):
                continue
            # endif

            errors = ['Failed', 'Error']
            if any(error in line for error in errors):
                redis_info_item = hostinfo + ": error"
                yield (redis_info_item, "redis_info_default_values")
            elif ":" in line:
                line_as_list = line.split(':')
                # we have to reformat for example ["slave0", "ip=10.12.47.107,port=6380,state=online,offset=89946822,lag=1"]
                # to get a nice useable format
                # print(str(line_as_list))
                if "=" in line_as_list[1]:
                    dissected_line_as_list = line_as_list[1].split(",")
                    # will create a new list
                    # ["ip=10.12.47.107", "port=6380", "state=online", "offset=89946822", "lag=1"]
                    for dissected_line_as_list_element in dissected_line_as_list:
                        hostinfo_part, hostinfo_value = dissected_line_as_list_element.split("=")
                        # we now split for example "ip=10.12.47.107" to hostinfo_part = "ip" and to
                        # hostinfo_value = "10.12.47.107"
                        redis_info_item = hostinfo + ": " + str(line_as_list[0]) + "_" + str(hostinfo_part)
                        # we now yield a new item - e.g.: slave0_ip: 10.12.47.107
                        # print(str(redis_info_item))
                        yield (redis_info_item, "redis_info_default_values")
                    # endfor
                else:
                    redis_info_item = hostinfo + ": " + str(line_as_list[0])
                    # print(str(redis_info_item))
                    yield (redis_info_item, "redis_info_default_values")
                # endif
            # endif
        # endfor
    # endif
# enddef

This creates a service for each info line that contains a :. Check_MK splits every line of the agent output at whitespace into a list of words, so the function first joins a line back into one string. Then we check whether the line starts with +++; if so, all following lines belong to the current REDIS instance, and we extract the hostinfo (e.g. 127.0.0.1:6379). Lines starting with ---, $ or # are skipped. If the keywords Failed or Error appear anywhere in the line, we yield the item hostinfo + ": error" (so if the instance cannot be queried during the initial inventory, only a single “error” service is created). Otherwise we check whether the line contains a :, combine the hostinfo with the “info” parameter name (e.g. 127.0.0.1:6379: used_memory) and yield that as the item. Values consisting of key=value pairs (e.g. slave0) are split further, so one item is created per pair (e.g. 127.0.0.1:6379: slave0_ip).
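The inventory logic can be tried out without a Check_MK site. The following is a minimal standalone sketch of the same idea; discover_redis_items and the sample agent output are hypothetical, and unlike the original it splits only at the first : and the first =, so a value containing an additional = cannot break the unpacking.

```python
# Standalone sketch of the inventory logic, runnable without Check_MK.
# `info` mimics the agent section: every line already split at whitespace.
def discover_redis_items(info):
    items = []
    hostinfo = ''
    for words in info:
        line = " ".join(words)
        if line.startswith('+++'):
            hostinfo = line.strip('+').strip()  # e.g. "127.0.0.1:6379"
            continue
        if line.startswith(('---', '$', '#')):
            continue  # end of host section, protocol line or section header
        if 'Failed' in line or 'Error' in line:
            items.append(hostinfo + ": error")
        elif ":" in line:
            key, value = line.split(':', 1)  # split only at the first colon
            if "=" in value:
                # e.g. "slave0:ip=10.12.47.107,port=6380" -> slave0_ip, slave0_port
                for pair in value.split(","):
                    items.append(hostinfo + ": " + key + "_" + pair.split("=", 1)[0])
            else:
                items.append(hostinfo + ": " + key)
    return items


# hypothetical agent output for one instance
sample_info = [
    ['+++', '127.0.0.1:6379'],
    ['$2051'],
    ['#', 'Server'],
    ['redis_version:3.2.5'],
    ['used_memory:174'],
    ['slave0:ip=10.12.47.107,port=6380,state=online'],
    ['---'],
]
for item in discover_redis_items(sample_info):
    print(item)
```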

The additional perfdata-function

# create the perfdata stuff and save new values in counter-file
def create_redis_total_commands_perfdata(current_total_commands, last_total_commands, last_total_commands_diff):
    # if nothing changed since the last check, keep the last known difference
    # (otherwise saveint(None) would return 0 and the graph would drop to zero)
    total_commands_diff = last_total_commands_diff
    # parameter_name = "total_commands_processed"
    if current_total_commands > last_total_commands:  # redis was not restarted
        current_diff = current_total_commands - last_total_commands
        total_commands_diff = last_total_commands_diff + current_diff
        set_item_state('daily_redis_total_commands_diff', total_commands_diff)  # save new daily total commands difference
    elif current_total_commands < last_total_commands:  # redis must have been restarted
        total_commands_diff = last_total_commands_diff + current_total_commands  # we add the amount of new commands to the already known ones
        set_item_state('daily_redis_total_commands_diff', total_commands_diff)  # save new daily total commands difference
    # endif

    # return parameter_name, saveint(total_commands_diff)  # return the needed perfdata
    return saveint(total_commands_diff)  # return the needed perfdata-value
# enddef

We need this function for proper counter handling when creating the perfdata output, which is used for those nice graphs. For example, this check monitors the REDIS INFO value total_commands_processed, which grows continually and only resets when REDIS is restarted. To get a nice „saw tooth“ pattern graph, we need some extra handling. As you can see, we use set_item_state and saveint, just two of the many built-in methods I mentioned earlier. set_item_state writes values into the host's „counter“ file on the server (later we use get_item_state to read those saved values back). To learn more about counters, have a look at https://mathias-kettner.de/checkmk_devel_counters.html
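The counter handling can be simulated in plain Python to see how the value behaves across a REDIS restart. In this sketch a dict named counter_file is a hypothetical stand-in for the counter file behind set_item_state/get_item_state, and daily_commands_diff is an illustrative name; the daily reset of the value is not part of it.

```python
# Simulation of the total_commands_processed handling. The dict `counter_file`
# is a hypothetical stand-in for the Check_MK counter file behind
# set_item_state()/get_item_state(); it is not the real API.
counter_file = {}


def daily_commands_diff(current_total, last_total):
    last_diff = counter_file.get('daily_redis_total_commands_diff', 0)
    if current_total > last_total:    # normal growth since the last check
        diff = last_diff + (current_total - last_total)
    elif current_total < last_total:  # counter went down: redis was restarted
        diff = last_diff + current_total
    else:                             # no new commands since the last check
        diff = last_diff
    counter_file['daily_redis_total_commands_diff'] = diff
    return diff


# total_commands_processed from five consecutive checks (restart before 20)
totals = [100, 150, 150, 20, 60]
results = [daily_commands_diff(current, last)
           for last, current in zip(totals, totals[1:])]
print(results)  # [50, 50, 70, 110]
```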

The additional int- or float-function

def float_int_or_string(value):
    if isinstance(value, int):
        return saveint(value)
    # endif

    if isinstance(value, float):
        return savefloat(value)
    # endif

    if isinstance(value, basestring):
        value_int = saveint(value)
        value_float = savefloat(value)

        # ### DEBUGGING
        # print("value: " + str(value))
        # print("value_int: " + str(value_int))
        # print("value_float: " + str("{0:.2f}".format(round(value_float,2))))
        # some examples and their return value
        #
        # # value is initially really a string - e.g.: master
        # the 'integer string' from saveint is not equal to the string master,
        # and float("master") raises a ValueError, so the initial given string will be returned
        #
        # # value is initially an integer given as string - e.g.: 17
        # saveint will return    17
        # the 'integer string' 17 is equal to the string 17, so the value will be returned as integer
        #
        # # value is initially a float given as string - e.g.: 17.4 or 1.10
        # the 'integer string' from saveint is not equal to the string 17.4 (or 1.10),
        # but float() succeeds, so 17.4 (or 1.1) will be returned as float
        if str(value_int) == str(value):
            return value_int
        # endif

        # comparing formatted float strings is unreliable ("17.40" != "17.4", "1.23" != "1.234"),
        # so we simply try a real float conversion
        try:
            return float(value)
        except ValueError:
            return value
    # endif
# enddef

This function checks whether a given value is an integer, a float or a plain string and returns it with the matching type (numeric strings such as "17" or "17.4" are converted to numbers). The check function uses it a few times later.
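If you only need the type detection, a plain-Python alternative is to simply attempt the conversions instead of comparing formatted strings. This is a sketch with a hypothetical name (to_number_if_possible), not a Check_MK helper:

```python
# Alternative sketch to float_int_or_string(): try int(), then float(),
# and keep the original value if neither conversion works.
def to_number_if_possible(value):
    if isinstance(value, (int, float)):
        return value  # already a number
    try:
        return int(value)       # "17" -> 17
    except (TypeError, ValueError):
        pass
    try:
        return float(value)     # "17.4" -> 17.4, "1.10" -> 1.1
    except (TypeError, ValueError):
        return value            # "master", "3.2.5" stay strings


for raw in ["17", "17.4", "1.10", "master", "3.2.5"]:
    print(repr(raw), "->", repr(to_number_if_possible(raw)))
```

Note that with this approach strings such as "nan" or "inf" would also be turned into floats; keep that in mind if such values can occur in your output.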

The check-function

# the check function
def check_redis_info(item, params, info):
[...]
  • item … for example will be 127.0.0.1:6379: used_memory. Remember the inventory-function yield-lines
  • params … if nothing else is defined via WATO the values of factory_settings will land in here (in our case this will be a dictionary)
  • info … the whole output of the agent plugin (the <<<redis_info>>> section, with every line already split into a list of words)

The script first initializes a few variables and then parses the info into a nicely usable format: for each monitored REDIS instance a dictionary is created (e.g. {'127.0.0.1:6379': {'used_memory': '174'}}). This can easily be used later; the rest of the script should be pretty clear from the comments.
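The parsing step and the level evaluation can be sketched outside Check_MK as follows. parse_redis_info and evaluate_levels are hypothetical names and the sample data is made up; the state logic mirrors the >= crit / >= warn comparisons used in check_redis_info.

```python
# Sketch: build {instance: {parameter: value}} from the agent section and
# evaluate numeric levels like the check does (>= crit -> 2, >= warn -> 1).
def parse_redis_info(info):
    parsed = {}
    instance = None
    for words in info:
        line = " ".join(words)
        if line.startswith('+++'):
            instance = line.strip('+').strip()  # e.g. "127.0.0.1:6379"
            parsed[instance] = {}
        elif instance is not None and ":" in line and not line.startswith(('$', '#', '---')):
            key, value = line.split(':', 1)
            parsed[instance][key] = value
    return parsed


def evaluate_levels(value, warn, crit):
    if warn is None or crit is None:
        return 0  # no levels configured: always OK
    if value >= crit:
        return 2
    if value >= warn:
        return 1
    return 0


sample_info = [['+++', '127.0.0.1:6379'], ['used_memory:174'], ['role:master'], ['---']]
parsed = parse_redis_info(sample_info)
print(parsed)
print(evaluate_levels(float(parsed['127.0.0.1:6379']['used_memory']), 100.0, 200.0))  # 1
```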

The declaration part

# declare the check to Check_MK
check_info["redis_info"] = {
    'default_levels_variable': "redis_info_default_values",
    'inventory_function': inventory_redis_info,
    'check_function': check_redis_info,
    'service_description': 'Redis info',
    'has_perfdata': True,
    'group': "redis_info",
}

This should be pretty clear.

The group parameter connects the check to its WATO rule set: it must match the name used in register_check_parameters (here "redis_info"). Some info about the different parameters can be found at https://mathias-kettner.de/checkmk_devel_newapi.html.

That's it. You should now be able to run the debugging command mentioned above and see the nicely parsed output that Check_MK will use.

Manual Page (checkman/redis_info)

How to write a manual page is unfortunately not documented at all. You will have to look at other checks and their checkman files. In my case, the redis_info file looks like this:

title: Check for redis info
agents: linux
author: Clemens Steinkogler <c.steinkogler[at]cashpoint.com>
license: GPL
distribution: check_mk
description:
 This plugin checks for redis info parameters

 The check becomes {critical} or {warning} if the defined
 levels are reached.

 The check may become {critical} if a defined string does not
 match.

 This is an inventorized check and will create one service for
 every info parameter found. By default no warn or critical levels
 are set and no graphs are created unless the corresponding
 service is configured via WATO.

Check-Parameters (web/plugins/wato/check_parameters_redis_info.py)

There are a lot of undocumented features for check_parameters files which you have to figure out on your own by looking at all those different checks 🙁. For example, have a look at ~/share/check_mk/web/htdocs/valuespec.py, where you can find all the form fields that can be used in a WATO config file.

This check_parameters…-file is needed if you want to configure the different warn/crit-levels for a service. In our case it looks like:

#!/usr/bin/python
# -*- encoding: utf-8; py-indent-offset: 4 -*-

# Check_MK Redis Info Plugin
#
# Copyright 2016, Clemens Steinkogler <c.steinkogler[at]cashpoint.com>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

group = "checkparams"

subgroup_applications = _("Applications, Processes & Services")

register_check_parameters(
    subgroup_applications,
    "redis_info",
    _("Redis info levels"),
    Dictionary(
        elements=[
            ("config",
             Alternative(
                 title=_("Choose if output returns Numbers or Strings"),
                 elements=[
                 Tuple(
                     title=_("Config for Service with Number-output (e.g. maxmemory, used_memory, mem_fragmentation_ratio, ...)"),
                     elements=[
                         Optional(
                             Tuple(
                                 elements=[
                                     Float(
                                         title=_("Warning at"),
                                         default_value=5,
                                     ),
                                     Float(
                                         title=_("Critical at"),
                                         default_value=10,
                                     ),
                                 ],
                             ),
                             label=_("Warn/Crit Levels"),
                             help=_("Put here the warn/crit-levels. For memory checks like used_memory, used_memory_peak and used_memory_rss should be given values in MB"),
                             none_label=_("No levels set"),
                             none_value=(None, None)
                         ),
                         DropdownChoice(
                             title=_("Create graph"),
                             help=_('default is No'),
                             choices=[
                                 (False, _("No")),
                                 (True, _("Yes")),
                             ],
                             default_value=False,
                         )
                     ]
                 ),
                 Tuple(
                     title=_("Config for Service with String-output (e.g. role, redis_version, config_file, ...)"),
                     elements=[
                         Optional(
                             Tuple(
                                 elements=[
                                     FixedValue(
                                         None,
                                         title="",
                                         totext="",
                                     ),
                                     TextAscii(
                                         title=_("Crit if this string is NOT detected"),
                                         size=60,
                                         default_value="",
                                     ),
                                 ],
                             ),
                             label=_("Critical if string is different to the set one"),
                             help=_("Put here the string which should be OK"),
                             none_label=_("Nothing set"),
                             none_value=(None, None)
                         ),
                         FixedValue(
                             False,
                             title="No graphs with strings possible",
                             totext="No",
                         ),
                     ]
                 ),
                 ]
             )
             ),
        ]),
    TextAscii(
        title=_("Service description"),
        allow_empty=False,
    ),
    "dict"
)

Explanation

This will “wrap” the defined values into a dictionary. Remember the factory_settings!
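Assuming the value shapes that follow from the valuespec above (the Alternative delivers one of the two Tuples, and Optional falls back to its none_value (None, None)), the config entry in params can be unpacked like this. describe_config is a hypothetical helper for illustration; it is not how the original check function is written.

```python
# Hypothetical helper: unpack params['config'] as produced by the valuespec above.
# Number variant: ((warn, crit), create_graph)   e.g. ((5.0, 10.0), True)
# String variant: ((None, expected), False)      e.g. ((None, 'master'), False)
# Nothing set:    ((None, None), create_graph)   e.g. the factory_settings default
def describe_config(params):
    (first, second), create_graph = params['config']
    if first is not None:
        return {'kind': 'levels', 'warn': first, 'crit': second, 'graph': create_graph}
    if second:
        # the first element is the FixedValue(None) placeholder of the string variant
        return {'kind': 'expected_string', 'expected': second, 'graph': False}
    return {'kind': 'none', 'graph': create_graph}


print(describe_config({'config': ((None, None), False)}))
print(describe_config({'config': ((5.0, 10.0), True)}))
print(describe_config({'config': ((None, 'master'), False)}))
```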

Variable – group

In this case it is not really needed, but we set it anyway.

Variable – subgroup_applications

This one is important. You will find the different subgroups in ~/version/share/check_mk/web/plugins/wato/check_parameters.py; the subgroup defines where the check will be listed in WATO „Monitoring Agents – Agent Bakery“ -> „Rules“.

Why do we use a „Dictionary“? You should have a look at https://mathias-kettner.de/checkmk_devel_factorysettings.html

So far, we already have a pretty complete check that you can distribute manually to your Check_MK clients.

Perf-o-Meter (web/plugins/perfometer/redis_info.py)

Since we follow the guidelines, we also need a Perf-O-Meter file:

#!/usr/bin/python
# -*- encoding: utf-8; py-indent-offset: 4 -*-

# Check_MK Redis Info Plugin
#
# Copyright 2016, Clemens Steinkogler <c.steinkogler[at]cashpoint.com>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.


def float_or_int(value):
    if isinstance(value, int):
        return saveint(value)

    if isinstance(value, float):
        return savefloat(value)

    if isinstance(value, basestring):
        value_int = saveint(value)

        if str(value_int) == str(value):
            return value_int

        # comparing str(value_float) fails for values like "1.10" and would return None,
        # which breaks the division in perfometer_redis_info; so just try a float conversion
        try:
            return float(value)
        except ValueError:
            return 0.0
    # endif
    return 0.0
# enddef


def perfometer_redis_info(row, command, perf_data):
    # debug_file = file("/tmp/redis_info_perfometer.log", "a")
    # debug_file.write(str(perf_data))
    # debug_file.close()
    # return u"%s" % str(perf_data[0][1]), perfometer_linear(int(perf_data[0][1]), 'silver')
    half = float_or_int(perf_data[0][1]) / 2
    if half <= 1:
        half = 2
    base = 2
    return u"%s" % str(perf_data[0][1]), perfometer_logarithmic(float_or_int(perf_data[0][1]), half, base, 'silver')

perfometers['check_mk-redis_info'] = perfometer_redis_info

I kept this file as simple as possible; for more info have a look at https://mathias-kettner.de/checkmk_devel_perfometer.html. In this case all perfdata is passed as a list, and perf_data[0][1] is the value of the corresponding service item. For debugging purposes you can uncomment the debug_file lines. While they are active there may be problems in the WebGUI, but you can then see how the values are delivered to this function.

Note: 'silver' … as far as I understand, you can use the safe HTML color names here (see e.g. the list of colors at http://www.w3schools.com/colors/colors_names.asp).

Agent-Bakery

Part I – (web/plugins/wato/agent_bakery_redis_info.py)

This file provides the “Bakery rule” that is needed to tell the bakery to add this plugin to the agent at all. If the rule is created via the WATO GUI, you'll see a dropdown with three options: deploy with automatic detection of instances, deploy with manual configuration of instances, or do not deploy the plugin. If the manual configuration option is chosen, there will be an “Add element” button that lets you define a REDIS instance (IP address and port). Click it once for each instance that should be monitored.

#!/usr/bin/python
# -*- encoding: utf-8; py-indent-offset: 4 -*-

# Check_MK Redis Info Plugin
#
# Copyright 2016, Clemens Steinkogler <c.steinkogler[at]cashpoint.com>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

group = "agents/" + _("Agent Plugins")

register_rule(group,
              "agent_config:redis_info",
              CascadingDropdown(
                   title=_("Redis Info Check (Linux)"),
                   help=_("This plugin monitors configured REDIS info parameters."),
                   choices=[
                        ("automatic", _("Deploy REDIS Info Check with automatic detection of instances")),
                        ("static", _("Deploy REDIS Info Check plugin with manual configuration of instances"),
                            ListOf(
                                Tuple(elements=[
                                    IPv4Address(
                                        title=_("REDIS-Server IP Address"),
                                        help=_("Often configured to 0.0.0.0 - so all networkinterfaces"),
                                        default_value="127.0.0.1",
                                     ),
                                    Integer(
                                        title=_("REDIS-Server Port Number"),
                                        help=_("Default often configured to 6379"),
                                        minvalue=1,
                                        maxvalue=65535,
                                        default_value=6379,
                                     ),
                                 ],
                                 ),
                             ),
                         ),
                         (None, _("Do not deploy REDIS Info Check plugin")),
                    ],
               ),
)

Part II – (agents/bakery/redis_info)

Here the bakery is told where to place the client's check plugin and how to create the corresponding config file:

#!/usr/bin/python
# -*- encoding: utf-8; py-indent-offset: 4 -*-

# Check_MK Redis Info Plugin
#
# Copyright 2016, Clemens Steinkogler <c.steinkogler[at]cashpoint.com>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.


def bake_redis_info(opsys, conf, conf_dir, plugins_dir):
    shutil.copy2(local_agents_dir + "/plugins/redis_info", plugins_dir + "/redis_info")
    f = file(plugins_dir + "/redis_info.cfg", "w")
    # debug_file = file("/tmp/redis_info_cfg_write_debug", "a")
    # debug_file.write("conf: " + str(conf) + "\n")
    config_line_parameters = ''
    if conf[0] == "static":
        # debug_file.write('conf: ' + str(conf[1]) + '\n')
        for parameter in conf[1]:
            config_line_parameters += ' '.join(str(i) for i in parameter) + '\n'
        f.write(str(config_line_parameters))
        # debug_file.write('config-file content: \n' + str(config_line_parameters))
    elif conf == "automatic":
        config_line_parameters = 'automatic\n'
        # debug_file.write('config-file content: \n' + str(config_line_parameters))
        f.write(str(config_line_parameters))
    # debug_file.close()
    f.close()

bakery_info["redis_info"] = {
    "bake_function": bake_redis_info,
    "os"           : ["linux", ],
}
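To see what ends up in redis_info.cfg for each rule choice, the file content can be expressed as a pure function. This sketch assumes the rule value is either "automatic", a ("static", [(ip, port), ...]) pair or None, as suggested by the CascadingDropdown in Part I; render_redis_info_cfg is a hypothetical name and does not write any file.

```python
# Sketch of the config-file content written by bake_redis_info(), as a pure
# function. Assumed rule values (from the CascadingDropdown in Part I):
#   "automatic", ("static", [(ip, port), ...]) or None ("do not deploy").
def render_redis_info_cfg(conf):
    if conf == "automatic":
        return "automatic\n"
    if isinstance(conf, (tuple, list)) and conf[0] == "static":
        # one "<ip> <port>" line per configured instance
        return "".join(" ".join(str(part) for part in instance) + "\n"
                       for instance in conf[1])
    return ""  # None or anything unexpected: nothing to write


print(repr(render_redis_info_cfg(("static", [("127.0.0.1", 6379), ("10.0.0.5", 6380)]))))
print(repr(render_redis_info_cfg("automatic")))
```

The sketch returns an empty string for None, so it never has to index a value that is not a tuple.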

Explanation

Variable local_agents_dir

This variable tells the bakery to look in ~/local/share/check_mk/agents/; otherwise it would use ~/share/check_mk/agents/ (as far as I know, this is not very well documented).

bakery_info

os defines which operating systems this check works for. I haven't had time to look at this in detail yet, for example to create a check for both Linux and Windows.

Conclusion

I hope some things got clearer with this short HowTo. Check_MK is really great and easy to implement in your infrastructure if you use the built-in checks. BUT the documentation is often outdated, incomplete or missing entirely. Lots of self-study is needed to write your first own checks, and sometimes it's hardly possible to find the right information via Google or Check_MK's mailing list. Besides, Check_MK's support model is pretty strange in my eyes, and mostly they will charge money for anything you bring to them. However, they reply quickly, and sometimes even the boss answers, which I find pretty awesome :o). Bugs or small mistakes in their included checks are also fixed really quickly.

5 comments on “Check MK – Write your own check”
  1. First Remark: Configuration files for agent plugins should reside in /etc/check_mk (conf_dir argument for the bakery function and $MK_CONFDIR in the environment of the agent / plugin).
    Second Remark: Why don’t you use the redis-cli tool to query redis?

    • admin says:

      Hi Robert,
      thanks for your remarks. The REDIS-check was my first check I had to program and like I wrote I really just was tossed into Check_MK and Python.
      @first remark: yes, I know that already after writing a few other checks :o) and I’m trying to implement it in future checks. The first few times I tried to use the conf_dir argument – it didn’t work very well for me 😉 for whatever reason. So I just used a quick, dirty and working solution.
      @second remark: well, I wanted to create the whole check in Python and didn’t want to use any other commandline calls or a bash-script as agent-plugin. Besides the plugin can be installed on clients that do not have any REDIS stuff installed to remotely monitor instances (I know, authentication is still missing ;()

      I hope I can update the howto soon to the current version we are using at work and upload the source finally to github.

  2. I took the liberty and “forked” your redis check into our repository at https://github.com/HeinleinSupport/check_mk/tree/master/redis
    I also altered the agent plugin to autodetect running redis instances similar to the apache_status plugin.

    • admin says:

      Hi Robert,
      sorry for the late answer. Thank you very much! Awesome job. I’ll try to find time to update my howto this weekend to the new version.

    • admin says:

      Hi Robert,
      I finally updated the HowTo to the newest check version. Maybe you want to update the repo too. I have not yet implemented your changes. I hope my “pig code” (quick and dirty code) is OK :o)
