Watchdog

From OraWiki

General hint

This project is under construction. I'm currently writing the code, changing the documentation and so on. All pages related to Watchdog will change frequently. The project may become stable in December 2007. I work on Watchdog for a few hours nearly every day to get it really running. The old plan to become stable in July 2007 is no longer valid because of many changes in dependencies.

This page is probably useful for those who want to get an idea of software engineering and C programming. If you are interested in learning C programming, you should look at our C programming beginners page to find some good links.

When I started this project, my first idea was to create a very simple HA management and monitoring tool to run a small website like this one. Since then the project has grown. I've added a proprietary HTTP server for online, web browser based configuration, a scheduler and so on. The idea now is to create a general purpose, simple to use and understand, self-learning HA management and monitoring tool including general purpose performance management and forecasting.

Since 2007.04.21 a beta release of Watchdog has been running at OraForecast.com. Just click on Watchdog preview to see that it really runs and grows! If you cannot access Watchdog, then most likely I'm just changing the code or you are behind a firewall which blocks outgoing HTTP on port 81.

Download is available at Watchdog download.


It is possible that this document contains false or outdated information because I write the documentation before I implement a function and sometimes it seems necessary during development to change some of the ideas I've written down here as a reminder!




Current state of watchdog


2007.08.21 During my holiday I was ill and not able to write a line. Maybe I had too much stress last month. I've made some changes to Watchdog. Development will continue in the next weeks. At the moment I'm working on Oracle RAC, DataGuard and Grid problems and on the paperwork required to become self-employed again, as I was until 2003.

I'm sorry about the delay.


2007.07.08

Installed a new demo of Watchdog to demonstrate the base code of the statistics image. It is just a simple change which shows only a simple fiber curve and a "scale-paper". The next release will make a huge step: I will move the data gathering code to the main loop and add the code to display the graph.

I'm currently experimenting with a method to implement a useful graph comparison plus a dynamic scale and zoom function.

Today I'm a little demotivated because there are a lot of ugly changes in German law which make my work a criminal act (by arbitrariness. Maybe it is..maybe it is not!).

2007.06.22

Installed new Watchdog preview to test the PNG library. Just click on the statistics icon at Watchdog processes. The statistics module is not implemented. This is just a test of the watchdog png library.


2007.06.12

  • Added a fine tuning configuration for process status logging to the ini file. It is now possible to configure gathering of data in a MySQL database for all, monitored or not_monitored processes and to give each monitored process an additional flag to control its logging behavior.
  • Ported the already written Java ellipse and line algorithms (Java demos at OraForecast.com) to C. I need them to create the PNG images with the statistical data. I will write a general purpose library with high level paint functions for PNG image creation. See ->watchdog png library (Added 2007.06.19)
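The text does not say which line algorithm was ported. As an illustration only, the following sketch assumes the classic integer Bresenham line algorithm and a hypothetical 8-bit grayscale canvas (`canvas_t`, `canvas_plot` and `canvas_line` are invented names, not part of the watchdog png library).

```c
#include <stdlib.h>

/* Hypothetical 8-bit grayscale canvas; not the layout used by
 * the watchdog png library. */
typedef struct {
    int width;
    int height;
    unsigned char *pixels;   /* width * height bytes, row by row */
} canvas_t;

/* Set one pixel, silently ignoring coordinates outside the canvas. */
void canvas_plot(canvas_t *c, int x, int y, unsigned char value)
{
    if (x >= 0 && x < c->width && y >= 0 && y < c->height)
        c->pixels[y * c->width + x] = value;
}

/* Draw a line from (x0,y0) to (x1,y1) with integer-only
 * Bresenham stepping (assumed algorithm, see lead-in). */
void canvas_line(canvas_t *c, int x0, int y0, int x1, int y1,
                 unsigned char value)
{
    int dx = abs(x1 - x0), sx = (x0 < x1) ? 1 : -1;
    int dy = -abs(y1 - y0), sy = (y0 < y1) ? 1 : -1;
    int err = dx + dy;

    for (;;) {
        canvas_plot(c, x0, y0, value);
        if (x0 == x1 && y0 == y1)
            break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }   /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; }   /* step in y */
    }
}
```

A statistics curve could then be drawn by connecting consecutive samples with `canvas_line` before the buffer is handed to a PNG encoder.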


2007.06.07

This function is one reason for me to say that this tool will become an illegal hacker tool if §22 is changed. I've written a mail to all German state governments asking them not to vote for these changes and got an answer from the Bavarian,

Now I will write the code to generate the statistics PNG image. Since Watchdog gathers data, it makes sense to write the code to show the statistics. I have already written a Java program for such analysis. Just click on Linear Regression Analysis to see some samples. I will publish the complete code of the Java application, including the gatherer processes, by Sunday 2007.06.10 because this tool is of no further value to me. It was just the statistics prototype, and I will create a better one using C and PNG images.

Why a better one?

The Java application needs an open port in the firewall, while the new solution does not need any additional open port. That's less risk!

Attention! I've changed some code of the HTTP server! Until now the server aborted if it could not open the configured port. Now it will retry opening the port one hundred times at ten-second intervals. So if you start Watchdog and cannot access the HTTP server, you should wait a few seconds or minutes.

Another related change is the failure handling of Watchdog's HTTP connections. If the connection is lost, the HTTP server will try to reopen the port. The benefit of this feature is that network card errors will not kill Watchdog, which makes Watchdog a little more fault tolerant.



2007.06.05

Added the statistics gathering module. It is now possible to save all gathered data shown by the process monitor in a MySQL database. The stored data will allow high availability and performance analysis plus performance forecasts.

Asynchronous monitoring using multiple threads is now in the test stage.


2007.06.02

Added a traffic light logic to Watchdog.

  • Green - monitored and running processes
  • Yellow - running but not monitored processes
  • Red - monitored and not running processes
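The three cases above map directly onto two flags per process. A minimal sketch, assuming hypothetical names (`light_t`, `traffic_light`) that are not taken from the Watchdog sources:

```c
/* Colours of the traffic light shown in the process list. */
typedef enum {
    LIGHT_NONE,    /* neither monitored nor running: nothing to show */
    LIGHT_GREEN,   /* monitored and running */
    LIGHT_YELLOW,  /* running but not monitored */
    LIGHT_RED      /* monitored and not running */
} light_t;

/* Map the two flags of a process onto its traffic light colour. */
light_t traffic_light(int monitored, int running)
{
    if (monitored)
        return running ? LIGHT_GREEN : LIGHT_RED;
    return running ? LIGHT_YELLOW : LIGHT_NONE;
}
```

The fourth combination (not monitored, not running) is not listed in the text; the sketch returns `LIGHT_NONE` for it because such a process has nothing to display.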

Each link now contains the process name, so the links to the configuration dialog and the statistics page/image are unique. Started reading the PNG documentation to create statistics images using PNG graphs.


2007.05.27

  • added POST function to Watchdog's HTTP server
  • added button for graphical analysis to intro page
  • reorganized internal function handling of Watchdog's internal reporting
  • removed some additional bugs from the HTTP server code that became visible when using POST functions

A parser for the posted URI still has to be written, to remove special characters from the POST data and get valid ASCII text.
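One common way to handle special characters in posted form data is to decode the `%XX` escapes and `+` signs used by URL encoding. The sketch below assumes that this is what the planned parser has to do; `hex_value` and `url_decode` are hypothetical names, not functions of Watchdog's HTTP server.

```c
#include <ctype.h>
#include <stddef.h>

/* Convert one hexadecimal digit to its value, or -1 if it is not one. */
int hex_value(int ch)
{
    if (ch >= '0' && ch <= '9')
        return ch - '0';
    ch = tolower(ch);
    if (ch >= 'a' && ch <= 'f')
        return ch - 'a' + 10;
    return -1;
}

/* Decode src into dst (at most dst_size bytes including the NUL).
 * Returns the decoded length, or -1 if dst is too small or an
 * escape sequence is malformed or encodes a NUL byte. */
int url_decode(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;

    if (dst_size == 0)
        return -1;
    while (*src != '\0') {
        unsigned char ch = (unsigned char)*src++;

        if (ch == '+') {
            ch = ' ';
        } else if (ch == '%') {
            int hi = hex_value((unsigned char)src[0]);
            int lo = (hi < 0) ? -1 : hex_value((unsigned char)src[1]);
            if (lo < 0 || hi * 16 + lo == 0)
                return -1;
            ch = (unsigned char)(hi * 16 + lo);
            src += 2;
        }
        if (out + 1 >= dst_size)   /* keep room for the final NUL */
            return -1;
        dst[out++] = (char)ch;
    }
    dst[out] = '\0';
    return (int)out;
}
```

Rejecting malformed input instead of passing it through keeps the later string handling simple, which matters for a server that is reachable from the public network.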

One benefit of C programming is clear to see: up to now Watchdog's executable is only 180 kB!


2007.05.26

Started writing the process monitoring configuration dialog. Using this dialog a user can configure process monitoring online. It is not required to edit Watchdog's ini file. All changes made here will be saved and used immediately! It's planned to have this working, including the start of real monitoring, by 2007.06.03.

The idea of this dialog is simple: if the process monitor could not detect that a process is already under Watchdog's control, a user can change this. In case of a failure a user can restart the process or do whatever is required to make it work.

I'm toying with the idea of not publishing all of the sources written by me. Maybe I will publish most parts of the live Watchdog only as archives and give the public the header files only. I'm sorry about this.


2007.05.23


Fixed all memory leaks. Details:

  • m_linux.c function machine_init malloc of p_active
  • httpserver.c function w3ServerGetPath and w3ServerRespond (twice token not freed!)


Marked all memory leaks with comment "Avoid memory leaks" in the source.

The report produced by watchdog?PROCESSES (file lib/gui/report/top.c) has been completely migrated to the data gathered by m_linux.c. Removed all of my own functions to avoid concurrent access to the /proc directory.

Added a version generator: a script which creates a unique number and a date for each call of make. Users can now see the actual build date of Watchdog in each page header.

2007.05.22

Found some bugs in the machine dependent code of top-3.6.1 in file m_linux.c. There is a huge memory leak.

See

  • m_linux.c function machine_init

2007.05.20 Now the GUI of Watchdog looks like top, but it isn't a replacement for top. The goal of the Watchdog project is to create an HA management and monitoring tool and to use an interface that is already well known to a huge community.

2007.05.19 Replaced the top-3.51 beta files with those from top-3.6.1. Now the process monitor uses data generated by the machine dependent code in m_linux.c as used by top-3.6.1.

I've added two subroutines to m_linux.c. They have to be "mutexed" because they are not thread safe. In particular, the variable static struct top_proc **nextactive; has to be protected against access by multiple threads.
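The pattern can be sketched as follows. The table and function names here are stand-ins invented for illustration, not the code added to m_linux.c; `next_entry` plays the role of a shared cursor like `nextactive`.

```c
#include <pthread.h>
#include <stddef.h>

/* Stand-in for the shared, non-reentrant state inside m_linux.c.
 * The names below are illustrative and not taken from top. */
static int proc_table[4] = { 101, 202, 303, 404 };
static int *next_entry = proc_table;   /* shared cursor */

/* Not thread safe: every call moves the shared cursor. */
static int unsafe_next_pid(void)
{
    int pid = *next_entry++;
    if (next_entry == proc_table + 4)
        next_entry = proc_table;
    return pid;
}

static pthread_mutex_t proc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread-safe wrapper: copy up to max pids into out while holding
 * the lock, so no other thread can move the cursor in the middle
 * of a snapshot. Returns the number of pids copied. */
size_t snapshot_pids(int *out, size_t max)
{
    size_t n;

    pthread_mutex_lock(&proc_lock);
    next_entry = proc_table;            /* restart the walk */
    for (n = 0; n < max && n < 4; n++)
        out[n] = unsafe_next_pid();
    pthread_mutex_unlock(&proc_lock);
    return n;
}
```

Callers work on their private copy after the lock is released, so the lock is held only for the short time needed to walk the shared data.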


2007.05.17 Added thread synchronization using mutexes. Reading the /proc directory, scheduling, process monitoring and reporting can now be done asynchronously. Race conditions seem to be handled efficiently.


2007.05.16 Added a publishing function to watchctl for automated publishing of the source to Apache's directory.

Added top's machine dependent file m_linux.c and some others (look at include/machine) to the project. Some of the files from the top project had to be changed to make them work within Watchdog. I've not touched the interface of the machine dependent files used by top. This is the first step towards making Watchdog operating system independent. I will create a layer between Watchdog and top's machine layer because top does not seem to be multi-threaded. Synchronization between reading the /proc directory and monitoring has to be added.

Next steps will be:

  • Make processes HTML report work like top with additional Watchdog functions
  • Make scheduler threads really runnable (I need to make some code of the old Watchdog thread safe. Maybe I have to rewrite this part!)


2007.05.10 Added the administration utility watchctl. Working on a process monitor comparable to the famous top, including the status of the processes monitored by Watchdog. This idea will change the complete layout of Watchdog's intro (the first image you see when you run Watchdog).

I will develop this because I need a nice test case for the htmlgen.h library. The major work is currently the development of a powerful C function suite usable as an HTML generator. This will probably take 2-4 hours spread over 3 days. I hope to have this done by Sunday 2007.05.13.

Tested Watchdog's automatic startup in all runlevels. It seems to work pretty well. Installation is simple using watchctl: a user has to type watchctl -install.


2007.05.06 Added a first draft of an internal status report to Watchdog using an ImageMap on the first image. Changing the scheduler. Working on the scheduler and the GUI in parallel.

2007.05.01 Published all of the source and a binary of the currently running Watchdog. This is just a look into our daily build directory.

It offers single shot monitoring if watchdog is started via cron and some other features for those interested in Watchdog.

Currently I'm working on Watchdog's GUI.

2007.04.22

Watchdog ran for one week without any failure.


2007.04.22

Currently Watchdog is a loosely grouped set of functions bound into one executable file. The source files contain all the code required to run Watchdog as expected. But now I have to do the fine tuning to make it work the way a user would expect, because today I change the code to build a release which fits my own needs. That's not user friendly and that's not what is called a product.

The current state is:

Good:

  • Command line parameter scanner runs perfectly
  • Process detection and error management run perfectly
  • HTTP server runs perfectly (for a small number of users!)
  • Daemon mode runs perfectly

Bad:

  • Private memory management would be nice to have but is not implemented
  • No security features are implemented in Watchdog's HTTP server.
  • A code review has to be done:

-> Remove unnecessary functions like StringDup or StringConcat in Util.c
-> Add limits from limits.h where required!

  • The makefile is bad for a large project like this.
  • All source files in the report directory have to be renamed.


Well the good news is it runs now at OraForecast.com and this is a first stability check.

General License Information


    Watchdog , a simple to use Linux  High Availability management software
    Copyright (C) 1989-2007  Gerald Roehrbein

    This software contains libraries I've already implemented between 1989
    and 2007.
    
    This software contains code written and Copyrighted by Christian Gosch
    to implement a HTTP server.
    
    This software contains code written by William LeFebvre (Author of top)
    also known as the UNIX TOP project http://www.unixtop.org. Currently
    Watchdog uses everything required to use the so called machine dependent
    modules of top. Currently it uses m_linux.c written by Richard Henderson and
    Alexey Klimkin.
    
    I've done some changes to the top files to make it work for me!


    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License along
    with this program; if not, write to the Free Software Foundation, Inc.,
    51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.


    Contact information:

    e-mail: Gerald.Roehrbein@OraForecast.com

    Gerald Roehrbein
    Boskamp 19
    24214 Gettorf
    Germany

You can find version 2.0 of the GPL at Watchdog license, at gnu.org or, if you have downloaded and installed Watchdog, at /etc/watchdog/source/doc/gpl.

Commercial users should read our LICENSE file at /etc/watchdog/source/doc/LICENSE or at Watchdog LICENSE file.

Author


(c) 2007 by Steven of Oz

Name: Steven
http: www.oraforecast.com
e-mail: Gerald.Roehrbein@OraForecast.com
Release: 0.90
Created: 2007.03.18
Last update: 2007.04.06 00
Changes: none
Estimated duration of installation: 5 minutes
License: GPL


Up to now this is just the project's documentation. Download will be available at Watchdog download.

Purpose


A freely configurable system health check robot which is able to restart programs that have died or to reboot a system. It is simply a piece of clusterware. You should be able to set up a complete cluster for non-stop services with this piece of software.

Watchdog monitors a freely definable number of processes, is able to run freely definable health checks and executes any action you need to offer non-stop services on your server.

Watchdog is able to send e-mail, SMS, FAX or whatever you need to an admin or a special user owning a service.

Look at the watchdog.ini options to get some details.

Another purpose of this project is to get an easy to use and easy to understand cluster manager whose code is manageable by anybody able to read C source files. This means that I try to comment and document everything in a useful way.

Watchdog is a very small but very powerful Linux cluster management system having all required functions, including HTML based setup and status pages. A sysop using Watchdog should be able to do complete remote management of a large Linux HA server farm using only a web browser.

Architecture


Watchdog is a multi-threaded HA cluster manager with a high degree of parallelism. For more information read General Programming Concepts Chapter 9-11.

For each process to be monitored, Watchdog runs a configurable thread which checks the existence and health of the process at a user definable time interval. These threads do all the recovery of dead or defunct processes on their own.

Watchdog offers an HTTP server thread which an admin can use to configure and monitor each process monitored by Watchdog.

Watchdog runs as a background daemon and you can start Watchdog in any runlevel. Watchdog is able to replace a lot of the functions the init process offers. Watchdog is able to manage all processes configured in the /etc/init.d or /etc/rc.d directories. Compared to init, Watchdog offers the ability to check all the started processes and to deal with a lot of the problems they may have.

To be as safe as possible, Watchdog allocates the required memory once during startup. The only resources Watchdog requires after startup are a working CPU, accessible memory and access to the /proc directory.

  • If all of the memory is dead, the system is dead.
  • If all CPUs are dead, the system is dead.
  • If /proc is not accessible, the system is dead.

If one of the events described above occurs, the system is really dead and Watchdog won't be able to rescue it.

To handle such problems additional computers are required! I will describe the way to handle such situations using Watchdog later.

The architecture of Watchdog is intended to allow solving any cluster-aware HA problem.

Supported Operating Systems

Up to now it is planned to support only the Linux operating system. During development of the HTML process monitor I had some problems gathering and interpreting all of the files stored in the /proc directory, so I decided to have a look into the sources of top. There I found a very well designed, modular set of source files with generic functions which allow gathering all required data. Using this set of functions could make it possible to support most of the operating systems supported by top.

To reach this goal it also seems necessary to create a modular structure for access to all of the runlevel dependent information.

On the other hand it seems to be a good idea not to replace the init process with Watchdog but to make Watchdog an assistant of init.

These ideas will increase the complexity of Watchdog many times over.

Up to now it is planned only to create a running prototype for Linux by July 2007 and then to decide what to do. I now treat this project a little more like a rapid application development (RAD) project, to create a running prototype with a small amount of time and a small budget.

Installation


  • Download package watchdog.tar.gz
  • mv the tar file to /etc
  • extract using tar -xzvf watchdog.tar.gz watchdog/*
  • cd watchdog/source
  • add the path /etc/watchdog/util to your environment (~/.profile or ~/.bashrc)
  • when done, just type watchctl -start; if watchdog starts, nothing else has to be done
  • if the procedure above does not work, you should rebuild watchdog. Just type watchctl -rebuild.
  • Install creates
Directories:
/etc/watchdog
/etc/watchdog/modules
/etc/watchdog/modules/<process>/restart
/etc/watchdog/modules/<process>/onerror
/etc/watchdog/modules/<process>/subprocess/onerror
/etc/watchdog/modules/<process>/subprocess/restart
/etc/init.d
/etc/rcX.d 

<process> - one directory for each monitored process. 

Default installation creates also monitoring modules for:

* ddclient
* mysql
* apache2
* postfix
* courier imap
* openLDAP
* oracle RDBMS
* oracle Listener
* oracle RAC
* generic

Files
/etc/watchdog/watchdog
/etc/watchdog/watchdog.ini
and proprietary modules, restart and onerror functions for each monitored process in the modules directory.

Watchdog's data model

  • If you want to gather process data, you have to create a user and three tables in a MySQL database
  • A MySQL database is not required to run Watchdog. It is only required for statistical analysis!
  • Create the database using the following script:
CREATE USER 'watchdog'@'%' IDENTIFIED BY '********';
GRANT USAGE ON *.* TO 'watchdog'@'%' IDENTIFIED BY '********'
WITH MAX_QUERIES_PER_HOUR 0
MAX_CONNECTIONS_PER_HOUR 0
MAX_UPDATES_PER_HOUR 0
MAX_USER_CONNECTIONS 0;
CREATE DATABASE `watchdog`;
GRANT ALL PRIVILEGES ON `watchdog`.* TO 'watchdog'@'%';
-- Select the new database so the tables below are created inside it
USE `watchdog`;
CREATE TABLE IF NOT EXISTS `dual` (
  `X` varchar(1) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='dual like Oracle dual';

INSERT INTO `dual` (`X`) VALUES
('X');

CREATE TABLE IF NOT EXISTS `global_stats` (
  `system_id` varchar(128) NOT NULL,
  `timestamp` datetime NOT NULL,
  `users` int(11) NOT NULL,
  `load_1` int(11) NOT NULL,
  `load_2` int(11) NOT NULL,
  `load_3` int(11) NOT NULL,
  `t_total` int(11) NOT NULL,
  `t_running` int(11) NOT NULL,
  `t_sleeping` int(11) NOT NULL,
  `t_stopped` int(11) NOT NULL,
  `t_zombie` int(11) NOT NULL,
  `c_usr` int(11) NOT NULL,
  `c_sys` int(11) NOT NULL,
  `c_ni` int(11) NOT NULL,
  `c_idle` int(11) NOT NULL,
  `c_wait` int(11) NOT NULL,
  `c_hi` int(11) NOT NULL,
  `c_si` int(11) NOT NULL,
  `m_total` int(11) NOT NULL,
  `m_used` int(11) NOT NULL,
  `m_free` int(11) NOT NULL,
  `m_buffers` int(11) NOT NULL,
  `s_total` int(11) NOT NULL,
  `s_used` int(11) NOT NULL,
  `s_free` int(11) NOT NULL,
  `s_cached` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;


CREATE TABLE IF NOT EXISTS `process_stats` (
  `system_id` varchar(128) NOT NULL,
  `timestamp` datetime NOT NULL,
  `pid` int(11) NOT NULL,
  `user` varchar(128) NOT NULL,
  `pr` int(11) NOT NULL,
  `ni` int(11) NOT NULL,
  `virt` int(11) unsigned NOT NULL,
  `res` int(11) unsigned NOT NULL,
  `shr` int(11) unsigned NOT NULL,
  `state` int(11) NOT NULL,
  `p_cpu` double NOT NULL,
  `p_mem` double NOT NULL,
  `time` int(10) unsigned NOT NULL,
  `command` varchar(2048) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Remark: If you install the database and tables after Watchdog has started, you can reactivate data gathering using kill -SIGUSR2 <pid>. You can stop data gathering by simply changing the password. Watchdog will not have any problems with lost MySQL database connections or anything like that.


Additional remark: The data model is very simple! It is not normalized because, up to now, the idea is just to store the gathered data as fast as possible with as little code as possible!

Usage (Quick Start Guide)


  • Configure watchdog.ini for your needs
  • Configure number of processes required to run
  • Configure actions in case of a failure of a monitored process
  • Use command line parameters to override ini file settings
  • Set environment variables to increase or decrease the log level
  • Start the current version of watchdog regularly via cron using no parameters!
  • Use the integrated Watchdog HTML manager for all configuration tasks (Planned!)


A user can configure the general log level using the environment variable DBG_ERRCLASS, as described in logwriter.h, to increase or decrease the number of messages generated by Watchdog. You can also give each process its own logging attributes using the keys debug=[yes|no] and syslog=[yes|no]. If DBG_ERRCLASS is set to 0, Watchdog will not log anything. This is the silent mode. During normal operation users should set DBG_ERRCLASS to 1 to have all error messages in /var/log/syslog.
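Reading such a level from the environment could look like the sketch below. It is not the code from logwriter.h: the function names are hypothetical, and both the fallback to 0 for missing or invalid values and the rule "a message is logged if its class is not above the level" are assumptions.

```c
#include <limits.h>
#include <stdlib.h>

/* Read the log level from DBG_ERRCLASS.
 * Unset, empty or invalid values fall back to 0 (silent mode);
 * this fallback is an assumption of the sketch. */
int log_level_from_env(void)
{
    const char *text = getenv("DBG_ERRCLASS");
    char *end;
    long value;

    if (text == NULL || *text == '\0')
        return 0;
    value = strtol(text, &end, 10);
    if (*end != '\0' || value < 0 || value > INT_MAX)
        return 0;
    return (int)value;
}

/* Decide whether a message of the given class should be written.
 * Assumed rule: class 1 = errors, higher classes = more detail. */
int should_log(int message_class)
{
    int level = log_level_from_env();
    return level > 0 && message_class <= level;
}
```

With `export DBG_ERRCLASS=1`, `should_log(1)` is true and more detailed classes are suppressed; with `DBG_ERRCLASS=0` nothing is logged.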

Command Line Options

Another option is to use the command line to configure some of Watchdog's features or to override settings in Watchdog's configuration file. If you start Watchdog with the option -?, Watchdog will print an online help. For more details on Watchdog's command line parameter handling read Options.c or Watchdog_installation_and_configuration_guide.

List of command line options:

-i <inifile> path and name of Watchdog's ini file    
-p <number> port where Watchdog will offer HTTP protocol.
-d <number> Debuglevel between 0 and 5.                     
-h enable HTTP server.                                      
-s enable scheduler.                                        
-t check interval. Check delay between 50% and 100% of this 
-n enable daemon mode. Run Watchdog as a background daemon  
-o <section> check one section defined in ini file.         
-? Help                                                     

Configuration File

Sample watchdog.ini (Release 2007.06.14):

;
; (c) 2007 Gerald Roehrbein
; Gerald.Roehrbein@OraForecast.com
;
; Read LICENSE AGREEMENT at the end of this file
;
; Watchdog configuration file
; Define in section main with the processes key the processes to monitor.
; The admin key defines an address where to send messages if the program
; has to restart one or more of the monitored processes.
;
; Service defines a service which is called to send a message to an admin.
; If mail=yes in a process section, than the admin will receive a message.
; You can also define service and mail inside a process section to send
; only messages to the owner of the process. This works regardless mail is set
; or not if these keys are defined!
;
; Lines starting with a semicolon are treated as comments!
;
; Syntax hint:  Do not use spaces between key and equal sign.
;               Everything is case sensitive!
;               Start at the left side.
;               Do not use leading blanks or other whitespace!

[main]
;processes - List of sections which contain detail information of processes to check
;admin     - email of the admin to contact in case of an error
;            The system will send one mail to the admin and if defined a second
;            to a user having a much more detailed knowledge of the problem
;service   - The command which we will use to send the admin a message
;on_errors_force_reboot - After this number of unrecoverable Watchdog errors the
;                         system is rebooted.
;                         If this parameter is not set then Watchdog will die after
;                         errors such as a failed dynamic memory allocation,
;                         being unable to open a file or being unable to execute
;                         external modules which exist.
;
;
;http           - http server on                cmdline param -http -nohttp
;irq            - irq handler on                cmdline param -irq -noirq
;scheduler      - scheduler on                  cmdline param -scheduler -noscheduler
;daemon         - start watchdog in daemon mode cmdline param -daemon -nodaemon
;
;MySQL settings
;mysql_host     - MySQL hostname
;mysql_user     - MySQL username
;mysql_pass     - MySQL password
;mysql_db       - MySQL database
;mysql_port     - Port where MySQL listens! Attention if set to -1 than gathering statistics is complete offline!
;

;processes=ddclient,apache2,postfix,courier,mysql,oracle,oraclelistener,oracleRAC,openLDAP,generic
processes=ddclient,apache2,postfix,courier,mysql,openLDAP,generic
;processes=generic
admin=bbs@oraforecast.com
service=mailx -s "Watchdog @$host" $admin "Watchdog found and tried to solve a problem with process $process."
on_errors_force_reboot=100
http=on
daemon=on

mysql_host=localhost
mysql_user=watchdog
mysql_pass=watchdog
mysql_db=watchdog

; Remember: If mysql_port is set to -1 than no statistics is written to the database !
mysql_port=0
mysql_socket= /var/run/mysqld/mysqld.sock


;
; Statistics gathering level:
;
; stat = none                 0
; stat = all                  1
; stat = monitored            2
; stat = not_monitored        3
;

stat=monitored

[http]

;HTTP Server configuration
;root           - HTML root directory
;cgi-bin        - cgi directory
;port           - port where HTTP will serve
;index_file     - default index_file

root=/etc/watchdog/htdocs
cgi-bin=/cgi-bin
port=81
index_file=index.html

[scheduler]
;handler        - on/off turn on scheduler
;timing         - number of seconds
handler=on
timing=60

[irq]
;handler        - on/off turn on interrupthandler
handler=on

[ddclient]
; Documentation for complete set of params for a process section:
; check    - pattern which should exist in the process list or /proc/<pid>/cmdline file
;            pattern have to be a substring of cmdline which can be used
;            to identify the running process correctly
; count    - expected number of processes with this pattern
; external - user defined module which returns 1 if successful and 0 if not
;            This could be used to do hangchecks of the module.
; restart  - command to restart process on error
;            This command must ensure that it returns 0 if it could start the process
;            successfully and another value if not!
;            Watchdog will not monitor whether this job was really successful.
; mail     - in case of an error send mail to admin
; owner    - the owner of this program we have to inform
; service  - a message service to use to send a message to a special user defined by owner
; message  - file which contains a message to send in case of an error
; retries  - number of tests before sending a mail. delay at main defines delay in seconds between retries
; onerror  - in case of an error (after the configured number of unsuccessful retries) execute shell command
; debug    - debug messages to syslog dependent on environment variable DBG_ERRCLASS
; syslog   - error messages and warnings to syslog dependent on environment variable DBG_ERRCLASS
; on_errors_force_reboot - if the onerror command does not exist or is not executed without an
;                          error than Watchdog will reboot the operating system if this
;                          parameter is set to yes and an unrecoverable error occurs!
;
;schedule  - cron like scheduling
;            minute hour day month year
;            example:
;            0,5,10,15,20,25,30,35,40,45,50,55
;            0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
;            mo,tu,we,th,fr,sa,su
;            jan,feb,mar,apr,mai,jun,jul,aug,sep,oct,nov,dec
;            2007,2008,2009,2010
;            use '*' as an alias for each group. Minute can be in the interval 0-59
;
;stat       - yes or no for MySQL based logging of this monitored process


check=ddclient
count=1
external=/etc/watchdog/modules/ddclient/mod_ddclient
restart=/etc/watchdog/modules/ddclient/restart/restart.sh
mail=yes
owner=bbs@oraforecast.com
service=mailx -s "Watchdog @$host" $owner
;message=/etc/watchdog/modules/ddclient/message.txt
retries=5
delay=10
;onerror=shutdown -r now
debug=yes
syslog=yes
on_error_force_reboot=no
schedule=* * * * *
stat=yes

[apache2]
check=apache2
external=/etc/watchdog/modules/apache2/mod_apache2
restart=/etc/watchdog/modules/apache2/restart/restart.sh
mail=yes
debug=no
syslog=no
schedule=* * * * *
stat=yes

[postfix]
check=saslauthd
external=/etc/watchdog/modules/postfix/mod_postfix
restart=/etc/watchdog/modules/postfix/restart/restart.sh
mail=yes
debug=no
syslog=no
schedule=* * * * *
stat=no

[courier]
check=couriertcpd
external=/etc/watchdog/modules/courier/mod_courier
restart=/etc/watchdog/modules/courier/restart/restart.sh
mail=yes
debug=no
syslog=no
schedule=* * * * *
stat=no

[mysql]
check=mysqld
external=/etc/watchdog/modules/mysql/mod_mysql
restart=/etc/watchdog/modules/mysql/restart/restart.sh
mail=yes
debug=no
syslog=no
schedule=* * * * *
stat=yes

[oracle]
check=oracle
external=/etc/watchdog/modules/oracle/rdbms/mod_oracle
restart=/etc/watchdog/modules/oracle/rdbms/restart/restart.sh
mail=yes
retries=2
delay=5
debug=yes
syslog=yes
schedule=* * * * *
stat=no

[oraclelistener]
check=listener
external=/etc/watchdog/modules/oracle/listener/mod_listener
restart=/etc/watchdog/modules/oracle/listener/restart/restart.sh
mail=yes
retries=2
delay=5
debug=yes
syslog=yes
schedule=* * * * *
stat=no

[oracleRAC]
check=gsd
external=/etc/watchdog/modules/oracle/rac/mod_rac
restart=/etc/watchdog/modules/oracle/rac/restart/restart.sh
mail=yes
retries=2
delay=5
debug=yes
syslog=yes
schedule=* * * * *
stat=no

[openLDAP]
check=slapd
external=/etc/watchdog/modules/openLDAP/mod_ldap
restart=/etc/watchdog/modules/openLDAP/restart/restart.sh
mail=yes
retries=2
delay=5
debug=yes
syslog=yes
schedule=* * * * *
stat=yes

[generic]
;
; Generic example
;

check=generic
;external=/etc/watchdog/modules/generic/mod_generic
restart=/etc/watchdog/modules/generic/restart/restart.sh
onerror=/etc/watchdog/modules/generic/onerror/onerror.sh
mail=yes
owner=Gerald.Roehrbein@OraForecast.com
service=mailx -s "Watchdog @$host" $owner
message=/etc/watchdog/modules/generic/message.txt
;retries=10
retries=1
delay=2
debug=yes
syslog=yes
on_error_force_reboot=no
schedule=* * * * *
stat=no

[END]
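The schedule key in the sample above uses one field per time unit, each being either '*' or a comma-separated list. Matching a single numeric field (for example the minute) can be sketched as follows; `schedule_field_matches` is a hypothetical name, and the named day and month tokens (mo, tu, jan, ...) are deliberately left out of this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Return 1 if value is selected by one schedule field, else 0.
 * A field is either "*" or a comma-separated list of numbers. */
int schedule_field_matches(const char *field, int value)
{
    const char *p = field;

    if (strcmp(field, "*") == 0)
        return 1;
    while (*p != '\0') {
        char *end;
        long item = strtol(p, &end, 10);

        if (end == p)            /* not a number: malformed field */
            return 0;
        if (item == value)
            return 1;
        if (*end != ',')         /* end of list (or trailing garbage) */
            break;
        p = end + 1;
    }
    return 0;
}
```

A scheduler would call this once per field with the current minute, hour and so on, and run the check only if every field matches.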


Sample bash script to start watchdog:

#!/bin/bash
# Log error messages only (see DBG_ERRCLASS above)
export DBG_ERRCLASS=1
cd /etc/watchdog
./watchdog

The next version of Watchdog offers a daemon mode, which makes it independent of an external scheduler. If you need to run Watchdog at a particular interval, you can start it via cron, for example a few times a day or every 5 minutes.

Signal Handler (Software Interrupt Management)

Watchdog handles three signals:

  • SIGTERM Terminate Watchdog
  • SIGUSR1 Force reload list of running processes
  • SIGUSR2 Force reload of ini file (That's something like a reset of Watchdog)


In a UNIX environment you can send signals using the kill command. First identify Watchdog's process ID using ps -efl | grep watchdog (there may already be a system watchdog process, so make sure you pick the right one) and then send the signal using kill -s <signal> <PID>.
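
On the receiving side, handling these three signals could look roughly like the following sketch. It is not Watchdog's actual handler: the flag and function names are hypothetical, and it assumes the common POSIX pattern in which the handler only records the request and the main loop does the real work (terminate, reload the process list, reload the ini file).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* expose sigaction() under strict -std modes */
#endif
#include <signal.h>
#include <string.h>

/* Flags set by the handler and polled by the main loop (hypothetical names). */
volatile sig_atomic_t want_terminate = 0;    /* SIGTERM */
volatile sig_atomic_t want_proc_reload = 0;  /* SIGUSR1 */
volatile sig_atomic_t want_ini_reload = 0;   /* SIGUSR2 */

/* Do only async-signal-safe work here: record the request and return. */
void on_signal(int signo)
{
    if (signo == SIGTERM)
        want_terminate = 1;
    else if (signo == SIGUSR1)
        want_proc_reload = 1;
    else if (signo == SIGUSR2)
        want_ini_reload = 1;
}

/* Install on_signal for the three signals; returns 0 on success, -1 on error. */
int install_signal_handlers(void)
{
    static const int signals[] = { SIGTERM, SIGUSR1, SIGUSR2 };
    struct sigaction sa;
    size_t i;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;  /* no SA_RESTART: blocking calls return EINTR */

    for (i = 0; i < sizeof signals / sizeof signals[0]; i++) {
        if (sigaction(signals[i], &sa, NULL) != 0)
            return -1;
    }
    return 0;
}
```

Leaving out SA_RESTART is a design choice of this sketch: a thread blocked in a call such as accept() is then interrupted with EINTR and gets a chance to notice want_terminate instead of waiting for the next connection.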

Watchdog Control Program (watchctl)

watchctl simplifies major Watchdog administration tasks.

watchctl offers functions to start, stop and reload Watchdog, to install it in init.d, to generate reports and to query Watchdog's status. The reporting functions require a working Lynx browser.

Ubuntu Linux users should be able to install Lynx using:

  • apt-get install lynx

watchctl (in directory /etc/watchdog/util) must be added to the system's PATH.

Add the next two lines to your environment (for example in /etc/profile, /root/.profile or .bashrc).

  • export WATCHDOG=/etc/watchdog
  • export PATH=$PATH:$WATCHDOG/util

watchctl screenshot of online help facility (Image:Watchctl.png)


For more details, study the source.

Installation and configuration guide


Follow the link to the in-depth Watchdog installation and configuration guide.

Recommendations


Example(s)


Click OraForecast.com Watchdog live to access the built-in HTML interface of the Watchdog running at OraForecast.com.

There you will have access to all of the read-only features of Watchdog.

Security related issues


Up to now the Watchdog HTTP server does not check for malformed URIs. It is running and open to the public to show that it really works, but it probably has a lot of security holes. This is unplanned work that I now have to start on. I will review the Watchdog daemon so that it handles HTTP traffic safely.

I will probably add an authorization page and allow access only to authorized sessions. This seems to be a simple way to make the Watchdog HTTP server secure.


How to make this feature as secure as possible

How to build this project from scratch


The source code is very complex. It's written in the C programming language using POSIX standards. The source is documented as well as I can manage and contains a lot of comments.

All you need is a C compiler and a make utility.

Installing the required tools is simple on Ubuntu Linux 6.10 Edgy Eft:

  • apt-get install gcc
  • apt-get install make
  • apt-get install splint

For writing C programs I use Midnight Commander and VIM.

  • apt-get install mc
  • apt-get install vim

You also need to install the C libraries and documentation:

  • apt-get install build-essential
  • apt-get install manpages-dev



For MySQL and PNG support you also need


MySQL required libraries
  • apt-get install libmysqlclient12-dev (Ubuntu users)

others


libpng required libraries
  • apt-get install libpng12-dev (Ubuntu users)

others


zlib required libraries (required by libpng)
  • apt-get install zlib1g-dev (Ubuntu users)

others


freetype required libraries (required to write text to PNG images)
  • Freetype Release 2.34

To make things easy, the Watchdog tarball contains all required files and an installation procedure for them! You can use any dev release you want or already have installed, but at your own risk! I do not guarantee that Watchdog will run with every MySQL, libpng or zlib release.


Download watchdog.tar.gz and required support files.

  • sudo -s -H
  • cp <source> /etc
  • cd /etc
  • tar -xzvf watchdog.tar.gz
  • cd watchdog/source
  • make
  • make install
  • make rmsource # if you want to remove the source of this project


Planned enhancements


  • add a contribution list for everyone whose code I've used! I'm sorry, but I cannot write each name into the license, so I will add a contribution list to the software. Please forgive me if I've forgotten someone, and write me an e-mail if so. Thanks!
  • make it runnable as a daemon (done 2007.04.22)
  • make it configurable via Web Browser (added HTTP server 2007.04.22)
  • make it a HA manageable replacement of the Linux INIT process

Reminder: My current idea is to let users rename the start and stop scripts in /etc/rcX.d. The plan is to give all files handled by Watchdog the prefix W.

For example, if you start MySQL in runlevel 3 (/etc/rc3.d) using S19mysql, then giving it the additional prefix W ends the usual management by init and lets Watchdog manage startup and stop. This will simplify Watchdog's configuration!

It's just an idea up to now, but it seems to be a good one. The only process good ol' init has to start is Watchdog. This will be the first step, which is simple to realize, and later I will try to remove init completely!

This will have no effect on watchdog.ini structure!


  • process report should look like utility top plus Watchdog monitoring add on (done 2007.05.20)
  • make Watchdog a tool which learns (by assistance of an admin) what is required and what to do
  • store gathered data in a MySQL database and write graphs (done 2007.06.08)
  • add modules to gather load from third party products
  • add linear regression analysis module already implemented in JAVA for systems load analysis.
  • add gathering of IO details (storage and network)

Reminder:

HA is not only about keeping a system running 24 hours a day, 365 days a year. HA also needs a good load forecast for the system, because a system grows and the owner should know when to buy additional hardware or which piece of software should be redesigned!
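
As an illustration of the kind of load forecast meant here, the following is a minimal least-squares sketch in C. It is not the existing Java module mentioned in the list above; the function names linear_fit and forecast are hypothetical, and it assumes load samples y measured at times x and a simple straight-line trend.

```c
#include <stddef.h>

/*
 * Ordinary least-squares fit of y = intercept + slope * x over n samples.
 * Returns 0 on success, -1 if there are fewer than two samples or all x
 * values are identical (the slope is undefined in that case).
 */
int linear_fit(const double *x, const double *y, size_t n,
               double *slope, double *intercept)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0, denom;
    size_t i;

    if (n < 2)
        return -1;

    for (i = 0; i < n; i++) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }

    denom = (double)n * sxx - sx * sx;
    if (denom == 0.0)
        return -1;

    *slope = ((double)n * sxy - sx * sy) / denom;
    *intercept = (sy - *slope * sx) / (double)n;
    return 0;
}

/* Extrapolate the fitted line to a future point in time x. */
double forecast(double slope, double intercept, double x)
{
    return intercept + slope * x;
}
```

With x as the day number and y as, say, the average CPU load of that day, forecast() gives a rough estimate of when the load will reach a limit that calls for new hardware. Real load curves are rarely this linear, so this is only a first approximation.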

Discussions


Do we need such software? I need it and probably you do too. All the other tools I've seen dealing with such problems are complex, difficult and expensive. OraForecast.com's Watchdog is smart. Yes, we need it!

Why C language?

I'm able to do coding in a lot of scripting and programming languages but I prefer the C language. The reason is simple:

First of all, there are nearly no C programmers. The chance that someone steals my work and passes it on to anybody else is small. The number of PHP, Perl and Java programmers is very large, and I know that they steal ideas and code.

The other reason is that C programming is very very powerful. Using C language you can write very small, modular and powerful programs with a few lines of code.

I know that a lot of people say the C language is complicated because the programmer has to deal with pointers. In my opinion that's a reason to use C, because pointers are the most powerful feature I've ever seen and used. Few other languages (assembly language aside) expose the pointer concept as directly; most of them use it implicitly.

Using C or C++ is portable and the fine art of programming for process automation.

Why not using C++ for this project?

Using the C language may seem old-fashioned and outdated, but that is wrong. I would use C++ for a project in which I expected to deal with objects that have behaviour, rather than with functions operating on some data. For a problem like this one, the design of a C program is in my opinion much easier to understand, and since I always insist on the old KISS principle, I decided to Keep It as Simple as a Stupid like me can.

Design issues

Up to now Watchdog allocates and deallocates a lot of resources dynamically at runtime. I will probably change this in future releases. Dynamic memory allocation is good style in general, but for an HA daemon like Watchdog it does not seem to be the best design.

The only resources Watchdog should require during normal operation are CPU time and access to the /proc directory. In case of really severe problems Watchdog should be able to initiate a reboot of the server by calling void kernel_restart(char *cmd). To enable this feature, add the key on_error_force_reboot=yes for a monitored process, or set it as general behaviour in section main using the key on_errors_force_reboot=100 (the number of unrecoverable, Watchdog-related errors that may occur before Watchdog forces a system reboot).
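
The counting behind on_errors_force_reboot can be sketched as follows. The struct and function names are hypothetical, and the sketch assumes that the reboot is due as soon as the configured number of unrecoverable errors has been reached and that a limit of 0 disables the feature. The reboot itself is deliberately left out; a user-space program on Linux would normally request it through the reboot(2) system call.

```c
/* Counts unrecoverable Watchdog errors against a configured limit. */
struct reboot_policy {
    unsigned long limit;   /* value of on_errors_force_reboot; 0 = disabled */
    unsigned long errors;  /* unrecoverable errors seen so far */
};

/*
 * Record one unrecoverable error.
 * Returns 1 when the limit has been reached and a reboot should be forced,
 * 0 otherwise. The caller decides how to perform the reboot.
 */
int reboot_policy_record(struct reboot_policy *p)
{
    if (p->limit == 0)
        return 0;               /* feature switched off */
    if (p->errors < p->limit)
        p->errors++;            /* saturate instead of overflowing */
    return p->errors >= p->limit;
}
```

With limit set to 100 as in the example key above, the first 99 calls would return 0 and the 100th would return 1.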

Future releases will allocate memory once during startup of Watchdog and again only when a user forces a reload of watchdog.ini.
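
One way to allocate everything up front is a fixed-size slot pool, sketched below. This is not Watchdog's implementation: the names pool_init, pool_get, pool_put and pool_destroy are hypothetical, and the sketch assumes that all slots have the same size (for example the size of one per-process record) and that only one thread uses the pool at a time.

```c
#include <stddef.h>
#include <stdlib.h>

/* Fixed-size slot pool: all allocation happens in pool_init(). */
struct pool {
    unsigned char *slots;   /* nslots * slot_size bytes */
    size_t *free_list;      /* stack of free slot indexes */
    size_t slot_size;       /* use sizeof the stored type to keep alignment */
    size_t nslots;
    size_t nfree;
};

/* Allocate the pool once at startup; returns 0 on success, -1 on failure. */
int pool_init(struct pool *p, size_t slot_size, size_t nslots)
{
    size_t i;

    p->slots = malloc(slot_size * nslots);
    p->free_list = malloc(nslots * sizeof *p->free_list);
    if (p->slots == NULL || p->free_list == NULL) {
        free(p->slots);
        free(p->free_list);
        return -1;
    }
    p->slot_size = slot_size;
    p->nslots = nslots;
    p->nfree = nslots;
    for (i = 0; i < nslots; i++)
        p->free_list[i] = i;
    return 0;
}

/* Hand out one slot without calling malloc; NULL when the pool is exhausted. */
void *pool_get(struct pool *p)
{
    if (p->nfree == 0)
        return NULL;
    return p->slots + p->free_list[--p->nfree] * p->slot_size;
}

/* Return a slot previously obtained from pool_get(). */
void pool_put(struct pool *p, void *slot)
{
    if (p->nfree < p->nslots)
        p->free_list[p->nfree++] =
            (size_t)((unsigned char *)slot - p->slots) / p->slot_size;
}

/* Release the pool, for example before rebuilding it after an ini reload. */
void pool_destroy(struct pool *p)
{
    free(p->slots);
    free(p->free_list);
}
```

With this pattern a memory shortage caused by another process can only hit Watchdog at startup or during a forced reload, not in the middle of normal monitoring.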

If neither on_error_force_reboot nor on_errors_force_reboot is used and Watchdog cannot allocate memory, which could happen if one of the running processes has a memory leak, Watchdog itself will fail and abort.

I am currently considering solutions that automatically restart Watchdog if it dies, or reboot the system. On the other hand, there are kernel built-in facilities like hangcheck which should be able to detect a completely hung system and reboot it.

Watchdog is designed in a generic way to give a sysop the option to configure an HA system and to define jobs which check resources and deal with processes that have memory leaks.

Public discussion bulletin board system

For users interested in Watchdog I've configured a free BBS discussion group. You do not need to register there to start Watchdog related discussions. If you want to register as a user, register as a user of the Wiki using the login/register page.

History


  • 2007.03.18 Start of project
  • 2007.04.07 Starting debugging of core functions. The code is up to now not stable.
  • 2007.04.17 Added command line options
  • 2007.04.17 Added HTTP server
  • 2007.04.21 Watchdog installed at OraForecast.com. Just try Watchdog ! ;-) It works!
  • 2007.04.22 Wrote scheduler. Process monitoring will now be done via parallel working threads for each process to check
  • 2007.04.23 Changed syntax of schedule statement in Watchdog.ini
  • 2007.04.22 - 2007.04.29 Watchdog ran for one week as a daemon
  • 2007.04.29 Made Watchdog LINT clean to be ISO C90 compliant
  • 2007.04.30 Made command line options and HTTP server config via INI really running
  • 2007.04.30 Added HTML generator to produce status reports
  • 2007.04.30 Offered download of really running source code via HTTP
  • 2007.05.01 Migration of a C++ HTML container class written in 1998 to ANSI C90
  • 2007.05.01 Wrote htmlprintf to write a text printf alike to a socket
  • 2007.05.01 Removed lib/httphtml.c and include/httphtml.h.
  • 2007.05.01 Removed function w3ServerSendError from lib/httpserver.c
  • 2007.05.01 Added first Watchdog generated internal status report handler using ImageMap
  • 2007.05.10 Changed htmlprintf to allow RAW socket writing
  • 2007.05.10 Added watchctl to project. This is the central Watchdog administration facility
  • 2007.05.10 Adding HTML table management for reporting
  • 2007.05.10 Installed Lynx at www.oraforecast.com
  • 2007.05.11 Added styles to htmlgen.c page generator and redesigned some functions.
  • 2007.05.11 Writing top alike process monitor for Watchdog. Click here
  • 2007.05.20 Rewritten data gathering and reporting of /proc using m_linux from top-3.6.1
  • 2007.05.27 Added HTTP POST to Watchdog's WebServer
  • 2007.05.27 Starting Watchdog's online documentation Click here for a demo
  • 2007.05.28 Started adding PNG graphics module to create graphics in PNG format for Watchdog's statistics module

Known bugs/ Issues


  • Reported Description of problem Fixed Release
  • yyyy.mm.dd error xy at line yyyy.mm.dd n.nn
  • 2007.04.07 In file profile.c. Reading of key started in correct section but doesn't stop at next section - fixed
  • 2007.04.21 In file watchdog.c bad thread management...the published release worked with Red Hat but not with Ubuntu. -fixed
  • 2007.04.29 Termination using SIGTERM waits for HTTP thread until next connection in Watchdog.c - fixed
  • 2007.05.06 Have to completely rewrite the makefile. It was planned for a small project. And now....
  • 2007.05.06 Project becomes more and more a prototype because I know a lot of really required changes
  • 2007.05.20 Watchdog possibly has memory leaks since functions from top-3.6.1 were added. - fixed
  • 2007.05.20 CPU IDLE value seems to show sometimes false values. No idea why. Using same datatypes and format strings as used by top.
  • 2007.05.20 Number of active users is just hard coded 3. - fixed counting utmp struct
  • 2007.05.27 It's required to write some help files for Watchdog. Need some support by someone able to write perfect english.
  • 2007.06.03 Watchdog crashed tonight. Possible reason: all sections and keys in the ini file are case sensitive. Two sections were configured incorrectly.
  • 2008.01.23 Watchdog seems to have been running pretty well for 6 months. I'm not working on it at the moment because the Siemens Lifebook I used for development crashed. I have a copy of the hard disk, but I have to order a new notebook. It seems that Watchdog is something a lot of people would like to have.

License

