Effective system monitoring, with automated alarm notifications to support staff, can preempt many problems and improve system availability. Such monitoring will typically include:
- Free disk space available
- CPU utilisation
- Memory utilisation
- RAID health
- Checking for disk errors
- Checking system logs for potential problems
- Ensuring essential services are running
- The status of the backups
- Whether security updates need to be installed
- Routine system security checks
- Checking the validity and lifetime of any SSL certificates
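Here is a minimal Python sketch of how two of the checks above might be automated: free disk space and SSL certificate lifetime. The thresholds, the function names and the convention of returning a list of alarm messages are assumptions made for illustration. In practice, checks like these would normally run inside a dedicated monitoring system, which would also send the alarm notifications to support staff.

```python
import shutil
import socket
import ssl
import time
from typing import List, Optional

# Hypothetical thresholds; these would be tuned for each system.
DISK_FREE_MIN_PERCENT = 10.0
CERT_MIN_DAYS = 14


def disk_free_percent(path: str = "/") -> float:
    """Return the percentage of free space on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def cert_days_remaining(not_after: str, now: Optional[float] = None) -> float:
    """Days until a certificate's notAfter date, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    expiry = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch (UTC)
    if now is None:
        now = time.time()
    return (expiry - now) / 86400


def fetch_cert_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Connect to host:port over TLS and return the server certificate's notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def run_checks(paths: List[str], hosts: List[str]) -> List[str]:
    """Return a list of alarm messages; an empty list means every check passed."""
    alarms = []
    for path in paths:
        free = disk_free_percent(path)
        if free < DISK_FREE_MIN_PERCENT:
            alarms.append(f"{path}: only {free:.1f}% free")
    for host in hosts:
        try:
            days = cert_days_remaining(fetch_cert_not_after(host))
        except OSError as exc:  # covers ssl.SSLError, timeouts and connection errors
            alarms.append(f"{host}: certificate check failed ({exc})")
            continue
        if days < CERT_MIN_DAYS:
            alarms.append(f"{host}: certificate expires in {days:.0f} days")
    return alarms
```

A scheduler such as cron could call `run_checks(["/", "/home"], ["www.example.com"])` periodically and forward any returned messages to support staff.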
Business Process Monitoring
Extending monitoring to cover key business processes is relatively straightforward. An online shop, for example, may know that it takes around 10 orders an hour. One check could look at the time of the last update to the “sales” table in the database and raise an alarm if it was more than 10 minutes ago.
This is by no means the only test that should be run on such a server, and it would not help much in diagnosing the cause of a problem, but it would alert staff, who could then check that all is well. If an issue is found, it may be appropriate to add more specific checks that would alert staff to similar issues in future.
IT is there to support the business, and monitoring how effectively it does so is certainly worthwhile.
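The sales check described in the online shop example could be sketched in Python as follows. This assumes a hypothetical `sales` table with a `created_at` column holding Unix timestamps, because not every database records when a table was last updated. SQLite is used only to keep the example self-contained, and the 10-minute threshold comes from the example in the text.

```python
import sqlite3
import time
from typing import Optional

# Threshold from the example: alarm if no sale has been recorded for 10 minutes.
MAX_SILENCE_SECONDS = 10 * 60


def seconds_since_last_sale(conn: sqlite3.Connection, now: Optional[float] = None) -> Optional[float]:
    """Seconds since the newest row in the hypothetical sales table, or None if it is empty."""
    (latest,) = conn.execute("SELECT MAX(created_at) FROM sales").fetchone()
    if latest is None:
        return None
    if now is None:
        now = time.time()
    return now - latest


def sales_alarm(conn: sqlite3.Connection, now: Optional[float] = None) -> Optional[str]:
    """Return an alarm message if sales appear to have stopped, otherwise None."""
    elapsed = seconds_since_last_sale(conn, now)
    if elapsed is None:
        return "No sales recorded at all"
    if elapsed > MAX_SILENCE_SECONDS:
        return f"No sales for {elapsed / 60:.0f} minutes"
    return None


if __name__ == "__main__":
    # Usage: an in-memory database with one sale, checked 15 minutes later.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, created_at REAL)")
    demo.execute("INSERT INTO sales (created_at) VALUES (?)", (1000.0,))
    print(sales_alarm(demo, now=1000.0 + 15 * 60))  # No sales for 15 minutes
```

As the text notes, a check like this only says that something may be wrong; it does not say what.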
While status monitoring helps identify problems as they begin to appear, system trend monitoring looks at system parameters over a longer period of time. Many of the same parameters are typically measured, but the results are displayed as graphs over time. This can be useful in a number of ways:
- It allows reasonable predictions to be made, for example about future disk space or memory requirements.
- The cause of transient issues, such as slow performance at a given time, can be narrowed down.
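Predicting when a disk will fill up can be sketched as a simple straight-line (least-squares) fit: given periodic samples of disk usage, estimate when the trend line reaches the partition's capacity. This is a minimal sketch that assumes usage grows roughly linearly; the function names and the sample format (day number, gigabytes used) are hypothetical. Real usage is often irregular, so the result is a rough guide rather than a firm date.

```python
from typing import List, Optional, Tuple


def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need samples taken at two or more different times")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(samples: List[Tuple[float, float]], capacity_gb: float) -> Optional[float]:
    """Days after the last sample at which the fitted trend reaches capacity.

    Returns None if usage is flat or falling, i.e. the trend never reaches capacity.
    """
    slope, intercept = fit_line(samples)
    if slope <= 0:
        return None
    full_day = (capacity_gb - intercept) / slope
    return full_day - samples[-1][0]
```

For example, with 100, 120 and 140 GB used on days 0, 10 and 20 of a 200 GB partition, the fitted line grows by 2 GB a day, so `days_until_full` returns 30.0.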
By way of example, the graph below shows the disk usage of a system over time. It can be seen that the /home partition (the top line) was filling up between June and early October. The system status monitor alerted support staff that the disk was getting full, and the graph enabled them to judge that, unless something was done, the system would run out of space around the beginning of November.
In this case, some files that were no longer required were deleted (shown by the drop in mid-October), but if that had been more appropriate, the fitting of an additional or larger disk could have been scheduled instead.
The use of system status monitoring and trend monitoring allowed the problem to be resolved before it impacted the business.