Re: the event monitor in 0.9i

Juergen Schoenwaelder (schoenw@sol)
Fri, 29 Jul 94 19:44:58 +0200

Hi!

John> It would be really nice if the event monitor could work from a file,
John> or a fifo. Then there would be no reason to run the special syslogd
John> when an entry like:

John> *.debug /var/tmp/scotty_log

John> would do the trick. Of course an entry would have to be added to the
John> menu to allow you to rotate the log file.

Below is a diff that should do the job. It uses tail(1) to read the
syslog file, starting at the end. There is one difference when reading
events from syslog files: the level and facility information is lost
when the messages are appended to a log file. The event script sets
both to the empty string in this case.
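
To make the difference concrete, here is a minimal sketch of the two
line formats, modelled on the scan calls in the patch. The proc name
parse_event_line and the sample lines are made up for illustration;
real syslogd output may differ in detail.

```tcl
# Hypothetical helper: returns {host facility level process}, or an
# empty string if the line matches neither format.
proc parse_event_line {line} {
    # Format sent by the forwarding syslogd:
    #   host facility.level month day hh:mm:ss process message
    set n [scan $line "%s %d.%d %s %d %d:%d:%d %s" \
               host facility level month day hour min sec proc]
    if {$n != 9} {
        # Format found in a plain log file:
        #   month day hh:mm:ss host process message
        set n [scan $line "%s %d %d:%d:%d %s %s" \
                   month day hour min sec host proc]
        if {$n != 7} { return "" }
        # A log file carries no priority, so both fields stay empty.
        set facility ""
        set level ""
    }
    return [list $host $facility $level $proc]
}

# Sample lines (assumed, not taken from a real log).
puts [parse_event_line {sol 0.3 Jul 29 19:44:58 unix: NFS server fs1 not responding}]
puts [parse_event_line {Jul 29 19:44:58 sol unix: NFS server fs1 not responding}]
```

For the second line, facility and level come back empty, so a filter
that insists on a particular level will not match events read from a
file.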

John> It would be nice to have notification ability for the rest of
John> the snmp or ip monitoring capabilities as well.

I decided to put this kind of thing into a separate package, because
it should be possible to create events even if no tkined is currently
running. The netguard (an unfinished version is part of scotty-0.9i)
will be able to create events when variables monitored by simple
probing scripts reach user-defined thresholds etc. But this is not
complete yet and needs some more work.

John> Also it would be nice if you could define different actions
John> (beep, flash, email, popup a window (the last is very useful
John> since I usually have tkined open on another part of my
John> desktop, or it is iconified, and I don't look at it when the
John> problem occurs.) etc.

OK. Documentation of the idea behind the event script is clearly
missing, so I will try to explain it here.

An important (perhaps the most important) task of a network management
system is to create events describing abnormal situations and to
present them in an appropriate way to the people responsible for
solving the problem. There are some questions to answer:

1 Who creates events?

2 Where are events stored?

3 How are events transmitted over the network?

4 How do we handle a series of events that all tell us the same thing?

5 How do we filter interesting events?

6 What should be done if the software has noticed an interesting event?

Fortunately, most UNIX systems have a good and mostly reliable system
logging facility (syslog). It has some deficiencies (messages can get
lost, messages can appear out of order due to clock skew, the length
of a message is limited, there is no support for reading back old
events with all their information, and there is no protection against
forged messages), but it is useful in most settings. The good thing
about syslog is that it solves 2, 3 and partly 4 (syslog normally
writes something like 'last message repeated 25 times' when a series
of identical messages is received).

The event script handles 5 and 6 in a simple but usable way. To
filter events received from a forwarding syslog host (or read from a
file), you can define event filters. An event filter has a name (to
remember what it does) and a series of regular expressions that are
matched against the attributes describing an event (host name, event
level, event facility, the name of the process that created the event,
and the message that describes the event).
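
As a rough illustration of the matching step, a filter can be seen as
one regular expression per attribute, all of which must match. This is
only a sketch under that assumption; the proc name event_matches and
the array layout are invented here and are not how event.tcl stores
its filters.

```tcl
# Hypothetical sketch: f and ev are arrays indexed by attribute name.
# Returns 1 if every non-empty expression in the filter matches the
# corresponding attribute of the event, 0 otherwise.
proc event_matches {filterVar eventVar} {
    upvar 1 $filterVar f $eventVar ev
    foreach attr {host level facility process message} {
        # A missing or empty expression places no restriction.
        if {![info exists f($attr)] || $f($attr) == ""} continue
        if {![info exists ev($attr)]} { return 0 }
        if {![regexp -- $f($attr) $ev($attr)]} { return 0 }
    }
    return 1
}

# Example filter and event (made-up values).
array set f  {name {NFS trouble} host {^fs[0-9]+$} message {not responding}}
array set ev {host fs1 level 3 facility 0 process unix:
              message {NFS server fs1 not responding}}
puts [event_matches f ev]
```

Note that an event read from a log file has an empty level and
facility, so an expression on either of those attributes would need to
accept the empty string to match such events.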

For each event filter we store an action to be taken when a matching
event arrives. This action is a shell command, so you can do whatever
is appropriate for the filter (play a sound, ring the telephone or
just log you out to go home before it's too late). The only action
taken by tkined itself is to highlight objects that send events. You
can turn this highlighting off, just as you can completely suspend a
particular filter.
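
Since the action is just a shell command, running it from Tcl can be
as simple as the following sketch. The proc name run_action is made
up, and running the command through "sh -c" and waiting for it to
finish is an assumption for the example, not necessarily what the
event script does.

```tcl
# Hypothetical sketch: run a filter's action and return its output.
# An empty action means "do nothing".
proc run_action {action} {
    if {$action == ""} { return "" }
    # exec raises an error on a non-zero exit status or on output to
    # stderr; report that instead of aborting event processing.
    if {[catch {exec sh -c $action} out]} {
        return "action failed: $out"
    }
    return $out
}

puts [run_action {echo beep}]
```

A long-running action (say, a dial-up pager script) would block the
caller in this form; appending "&" to the exec arguments runs it in
the background at the cost of losing its output.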

This is all very simple. But of course there are problems: syslog does
a nice job of compressing repeated messages, but it does this from a
local point of view. There is still some need to improve 4. Think of
an NFS server that is mounted by 50 clients going down. Everything
will start flashing and beeping. You can get around this by defining
clever filter expressions, but this needs some care.
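
One possible direction, which is not part of the patch, is to suppress
a message that has already been seen from any host within a short time
window. The sketch below assumes that the 50 clients log the same
message text; the proc name is_duplicate, the global array last_seen
and the 60 second default are all invented for illustration.

```tcl
# Hypothetical sketch: returns 1 if the same message text was seen
# less than $window seconds ago (from any host), 0 otherwise.
# The window is measured from the most recent occurrence, so a steady
# stream of identical messages stays suppressed.
proc is_duplicate {message now {window 60}} {
    global last_seen
    set dup 0
    if {[info exists last_seen($message)]
        && $now - $last_seen($message) < $window} {
        set dup 1
    }
    set last_seen($message) $now
    return $dup
}

# Three clients report the same dead server; times are in seconds.
set msg {NFS server fs1 not responding}
puts [is_duplicate $msg 100]
puts [is_duplicate $msg 110]
puts [is_duplicate $msg 200]
```

Only the first and the third call report a new event. This does not
help if each client words the message differently, which is where
careful filter expressions are still needed.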

That's enough for now. If you have any ideas or comments you might
want to share, please let me (us) know. Below is the patch (in case
you have forgotten why I wrote this mail :-).

Juergen

*** /usr/local/tmp/scotty-0.9i/tkined/event.tcl Wed Jul 13 17:08:07 1994
--- event.tcl Fri Jul 29 18:52:20 1994
***************
*** 36,41 ****
--- 36,54 ----
} else {
set server moloch.ibr.cs.tu-bs.de
}
+ if {[info exists default(file)]} {
+ set file $default(file)
+ } else {
+ set file /var/adm/messages
+ }
+ if {[info exists default(use)]} {
+ set use $default(use)
+ } else {
+ set use server
+ }
+ if {$use != "file" && $use != "server"} {
+ set use server
+ }

##
## The sl_level table is used to map priorities to readable strings.
***************
*** 239,245 ****
{ {Status:} $filter($id,status) radio active suspend } \
{ {Highlight:} $filter($id,highlight) radio true false} "]

! if {$result == ""} return

set filter($id,name) [lindex $result 0]
set filter($id,host) [lindex $result 1]
--- 252,260 ----
{ {Status:} $filter($id,status) radio active suspend } \
{ {Highlight:} $filter($id,highlight) radio true false} "]

! if {$result == ""} {
! return ""
! }

set filter($id,name) [lindex $result 0]
set filter($id,host) [lindex $result 1]
***************
*** 250,255 ****
--- 265,295 ----
set filter($id,action) [lindex $result 6]
set filter($id,status) [lindex $result 7]
set filter($id,highlight) [lindex $result 8]
+
+ return $id
+ }
+
+ ##
+ ## Select a filter from all existing filters. Returns the selected
+ ## id or an empty string.
+ ##
+
+ proc select_filter {} {
+
+ global filter
+
+ if {![info exists filter(ids)]} {
+ ined acknowledge "There is no filter defined yet."
+ return
+ }
+
+ foreach id $filter(ids) {
+ lappend filternames "$id $filter($id,name)"
+ }
+ set result [ined list "Select a filter to modify:" $filternames]
+ if {$result == ""} return
+
+ return [lindex [lindex $result 0] 0]
}

##
***************
*** 263,268 ****
--- 303,311 ----
if {![info exists filter(ids)]} return

set filter(ids) [ldelete filter(ids) $id]
+ if {$filter(ids) == ""} {
+ unset filter(ids)
+ }

catch {
unset filter($id,name)
***************
*** 281,324 ****
## List all defined filters.
##

! proc list_filter {} {

global filter

! set txt ""
! foreach id $filter(ids) {
! lappend txt "Filter $id:"
! if {$filter($id,name) != ""} {
! lappend txt "Name: $filter($id,name)"
! }
! if {$filter($id,host) != ""} {
! lappend txt "Host: $filter($id,host)"
! }
! if {$filter($id,level) != ""} {
! lappend txt "Level: $filter($id,level)"
! }
! if {$filter($id,facility) != ""} {
! lappend txt "Facility: $filter($id,facility)"
! }
! if {$filter($id,process) != ""} {
! lappend txt "Process: $filter($id,process)"
! }
! if {$filter($id,message) != ""} {
! lappend txt "Message: $filter($id,message)"
! }
! if {$filter($id,action) != ""} {
! lappend txt "Action: $filter($id,action)"
! }
! if {$filter($id,status) != ""} {
! lappend txt "Status: $filter($id,status)"
! }
! if {$filter($id,highlight) != ""} {
! lappend txt "Highlight: $filter($id,highlight)"
! }
! lappend txt ""
}

! ined browse "List of defined filters:" $txt
}

##
--- 324,365 ----
## List all defined filters.
##

! proc list_filter { id } {

global filter

! if {![info exists filter($id,name)]} return
!
! lappend txt [format "%-16s%s" Filter: $id]
! if {$filter($id,name) != ""} {
! lappend txt [format "%-16s%s" Name: $filter($id,name)]
! }
! if {$filter($id,host) != ""} {
! lappend txt [format "%-16s%s" Host: $filter($id,host)]
! }
! if {$filter($id,level) != ""} {
! lappend txt [format "%-16s%s" Level: $filter($id,level)]
! }
! if {$filter($id,facility) != ""} {
! lappend txt [format "%-16s%s" Facility: $filter($id,facility)]
}
+ if {$filter($id,process) != ""} {
+ lappend txt [format "%-16s%s" Process: $filter($id,process)]
+ }
+ if {$filter($id,message) != ""} {
+ lappend txt [format "%-16s%s" Message: $filter($id,message)]
+ }
+ if {$filter($id,action) != ""} {
+ lappend txt [format "%-16s%s" Action: $filter($id,action)]
+ }
+ if {$filter($id,status) != ""} {
+ lappend txt [format "%-16s%s" Status: $filter($id,status)]
+ }
+ if {$filter($id,highlight) != ""} {
+ lappend txt [format "%-16s%s" Highlight: $filter($id,highlight)]
+ }

! eval ined acknowledge $txt
}

##
***************
*** 331,337 ****

if {[info exists syslog]} {
removeinput $syslog
! tcp close $syslog
unset syslog
}
}
--- 372,380 ----

if {[info exists syslog]} {
removeinput $syslog
! if {[catch {tcp close $syslog}]} {
! close $syslog
! }
unset syslog
}
}
***************
*** 352,371 ****

# scan the administrative fields

set n [scan $line "%s %d.%d %s %d %d:%d:%d %s" \
host facility level month day hour min sec proc]
if {$n != 9} {
! if {[string match "-- MARK --" $line]} return
! writeln "** error parsing event message: $line"
! return
}

# extract the message

set i [string first $proc $line]
if {$i < 0} {
! writeln "** $line"
! writeln "** can not extract message after $proc"
}
incr i [string length $proc]
set message [string range $line $i end]
--- 395,422 ----

# scan the administrative fields

+ if {[string match "-- MARK --" $line]} return
+
set n [scan $line "%s %d.%d %s %d %d:%d:%d %s" \
host facility level month day hour min sec proc]
if {$n != 9} {
!
! set n [scan $line "%s %d %d:%d:%d %s %s" \
! month day hour min sec host proc]
! if {$n != 7} {
! debug "** error parsing event message: $line"
! return
! }
! set facility ""
! set level ""
}

# extract the message

set i [string first $proc $line]
if {$i < 0} {
! debug "** $line"
! debug "** can not extract message after $proc"
}
incr i [string length $proc]
set message [string range $line $i end]
***************
*** 449,470 ****

proc "Connect" {list} {

! global server port
global syslog
global tools

set result [ined request "Event Parameter" \
! "{ {Server:} $server } \
! { {Port:} $port } "]

if {$result == ""} return

set server [lindex $result 0]
set port [lindex $result 1]

! if {[catch {tcp connect $server $port} syslog]} {
! ined acknowledge "Can not connect to server $server using port $port"
! return
}

addinput -read $syslog ev_receive
--- 500,538 ----

proc "Connect" {list} {

! global server port use file
global syslog
global tools

set result [ined request "Event Parameter" \
! [list [list Server: $server] \
! [list Port: $port] \
! [list Use: $use radio server file ] \
! [list File: $file] \
! ]
! ]

if {$result == ""} return

set server [lindex $result 0]
set port [lindex $result 1]
+ set use [lindex $result 2]
+ set file [lindex $result 3]

! if {$use == "server"} {
! if {[catch {tcp connect $server $port} syslog]} {
! ined acknowledge "Can not connect to $server using port $port"
! return
! }
! } else {
! if {![file exists $file]} {
! ined acknowledge "Can not read $file."
! return
! }
! if {[catch {open "|tail -1f $file" r} syslog]} {
! ined acknowledge "Can not open $file:" "" $syslog
! return
! }
}

addinput -read $syslog ev_receive
***************
*** 504,510 ****
##

proc "Create Filter" {list} {
! edit_filter [create_filter "Temporary Filter"]
}

##
--- 572,582 ----
##

proc "Create Filter" {list} {
!
! set id [create_filter "Temporary Filter"]
! if {[edit_filter $id] != $id} {
! delete_filter $id
! }
}

##
***************
*** 513,526 ****

proc "List Filter" {list} {

! global filter
!
! if {![info exists filter(ids)]} {
! ined acknowledge "No filters defined."
! return
! }

! list_filter
}

##
--- 585,594 ----

proc "List Filter" {list} {

! set id [select_filter]
! if {$id == ""} return

! list_filter $id
}

##
***************
*** 529,546 ****

proc "Edit Filter" {list} {

! global filter
!
! if {![info exists filter(ids)]} return
!
! if {[llength $filter(ids)] == 1} {
! set result $filter(ids)
! } else {
! set result [ined list "Select a filter to modify:" $filter(ids)]
! }
! if {$result == ""} return

! edit_filter [lindex $result 0]
}

##
--- 597,606 ----

proc "Edit Filter" {list} {

! set id [select_filter]
! if {$id == ""} return

! edit_filter $id
}

##
***************
*** 549,562 ****

proc "Delete Filter" {list} {

! global filter
!
! if {![info exists filter(ids)]} return
!
! set result [ined list "Select a filter to delete:" $filter(ids)]
! if {$result == ""} return

! delete_filter [lindex $result 0]
}

##
--- 609,618 ----

proc "Delete Filter" {list} {

! set id [select_filter]
! if {$id == ""} return

! delete_filter $id
}

##