Server maintenance is the work of keeping a server in good repair and good physical health. In other words, it is the set of maintenance tasks and procedures that keep the server’s operating system, software, applications, and hardware up to date and operational.
Tiered Infrastructure Maintenance Standards (TIMS) for Server
Why Server Maintenance?
Maintenance, including tests, measurements, adjustments, parts replacement, and cleaning, is performed specifically to prevent faults from occurring and to keep the server performing optimally. Hardware maintenance is the testing and cleaning of server equipment. Software maintenance is the updating of application programs to meet changing information requirements, such as adding new functions.
Operating System & Software
✓ Check and confirm that the server is running the latest stable, supported version of its operating system.
✓ Make sure that the latest stable version of the application software is installed.
✓ Check that the upgraded versions of the OS and application packages are compatible with the server hardware.
Hardware
✓ Check the server hardware for errors.
✓ Test and confirm that the backup hardware components are functional.
✓ Check that the hardware is not overloaded or overheating.
✓ Confirm that the server hardware is physically clean.
✓ Ensure that the server is kept in a clean, temperature-controlled server room.
✓ Make sure that the network and power cables are securely fastened and not loose.
Storage & Processing
✓ Check processor (CPU) utilisation and confirm there are no errors; sustained CPU utilisation above 70% may cause serious performance issues.
✓ Check the server’s hard drives or RAID arrays and make sure they have enough free storage capacity.
✓ Delete unnecessary files and folders to free storage space and reduce the load on server resources.
✓ Make sure that RAM utilisation is normal. If it is high, check for errors first; if there are none, upgrade the server’s RAM.
✓ Review the daily and weekly service performance reports to keep the server operating healthily.
✓ Ensure that your performance reports are consistent with the server’s real-time resource utilisation.
✓ Save the performance reports in a safe repository for future reference.
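The utilisation checks above lend themselves to automation. The following is a minimal Python sketch, assuming you already obtain a CPU reading from your monitoring tool (the standard library has no portable CPU-percentage call, so a sample value stands in). It reads disk usage with `shutil.disk_usage` and compares both readings against limits: the 70% CPU figure follows the checklist, while the 85% disk limit is a hypothetical value to tune for your environment.

```python
import shutil

CPU_LIMIT_PERCENT = 70.0   # figure taken from the checklist
DISK_LIMIT_PERCENT = 85.0  # hypothetical limit; tune for your environment


def disk_used_percent(path: str = "/") -> float:
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)  # named tuple of total, used, free (bytes)
    return usage.used / usage.total * 100.0


def check_resources(cpu_percent: float, disk_percent: float) -> list[str]:
    """Compare utilisation readings with the limits and return any warnings."""
    warnings = []
    if cpu_percent > CPU_LIMIT_PERCENT:
        warnings.append(f"CPU at {cpu_percent:.1f}% exceeds {CPU_LIMIT_PERCENT:.0f}%")
    if disk_percent > DISK_LIMIT_PERCENT:
        warnings.append(f"Disk at {disk_percent:.1f}% exceeds {DISK_LIMIT_PERCENT:.0f}%")
    return warnings


if __name__ == "__main__":
    sample_cpu = 55.0  # placeholder: substitute a reading from your monitoring tool
    problems = check_resources(sample_cpu, disk_used_percent("/"))
    print("\n".join(problems) if problems else "All readings within limits")
```

Keeping the comparison in a pure function makes the limits easy to test and to reuse when generating the daily and weekly performance reports.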
Security
✓ Ensure that the server operating system and application software are up to date with newly released security patches.
✓ Make sure that the server has appropriate antivirus software installed; a reliable network firewall complements antivirus protection but does not replace it.
✓ Run a virus scan and a penetration test when installing a new server, and again whenever needed.
✓ Stay informed about new vulnerabilities and attacks that affect the server by reviewing OWASP (Open Web Application Security Project) publications and Common Vulnerabilities and Exposures (CVE) releases.
Logging & Monitoring
✓ Make sure that detailed logs are collected from the server.
✓ Ensure that every log entry has a full timestamp and event tags.
✓ Ensure that the collected logs are stored in a secure repository outside the server.
✓ Review the server logs daily to identify any anomalies and remediate issues immediately.
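A log entry with a full timestamp and event tags might look like the following minimal sketch, which uses Python's standard `logging` module to emit one JSON object per line. The `host` and `tags` fields and the logger name are hypothetical choices. In practice the handler would forward entries to the external repository (for example through `logging.handlers.SysLogHandler` or a log-shipping agent) rather than print them to the console.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON line with a full UTC timestamp and event tags."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            # ISO 8601 timestamp: date, time, and UTC offset
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "host": getattr(record, "host", "unknown"),  # hypothetical field, set via extra=
            "tags": getattr(record, "tags", []),         # hypothetical field, set via extra=
            "message": record.getMessage(),
        }
        return json.dumps(entry)


def make_logger(name: str = "server-maintenance") -> logging.Logger:
    """Return a logger that writes JSON lines; attach the handler only once."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()  # swap for a remote handler in production
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


if __name__ == "__main__":
    log = make_logger()
    log.info("Disk usage above threshold", extra={"host": "srv-01", "tags": ["storage", "warning"]})
```

One JSON object per line is easy for a central log store to parse, and the tags make the daily review easier to filter.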
Backups and Restore
✓ Ensure that regular backups of the server are taken.
✓ Make sure that backups are stored in a secure repository outside the server.
✓ Make sure that backup files are not corrupted in transit: always verify each file with an MD5 checksum, and periodically perform a test restore in a lab environment to confirm that the backups are usable.
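The MD5 check can be sketched with Python's standard `hashlib`: hash the original and the transferred copy and compare the digests. This is a minimal example; the file names are hypothetical, and a local copy stands in for the transfer to the repository. MD5 detects accidental corruption but not deliberate tampering; `hashlib.sha256` is a drop-in replacement if that matters.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest, reading 1 MiB at a time so large backups fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source: Path, copy: Path) -> bool:
    """True when the transferred copy has the same MD5 checksum as the original."""
    return md5_of_file(source) == md5_of_file(copy)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "backup.tar"  # hypothetical backup file
        original.write_bytes(b"example backup contents")
        copy = Path(tmp) / "offsite-copy.tar"
        shutil.copyfile(original, copy)  # stands in for the transfer to the repository
        print("copy verified:", verify_copy(original, copy))
```

A matching checksum only shows the file arrived intact; the periodic test restore is still what proves the backup is usable.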
Physical Security & Inventory
✓ Check and confirm that the server hardware is kept in a locked, secure, and clean room.
✓ Keep track of the server’s physical cleaning and, when necessary, have it professionally cleaned inside and out.
✓ Keep an inventory of server parts, whether installed or spare.
Server Maintenance Types
- Condition-based maintenance
- Corrective maintenance
- Planned maintenance
- Predictive maintenance
- Preventive maintenance
- Total productive maintenance
Condition-based maintenance (CBM)
Condition-based maintenance (CBM) is, in short, maintenance performed when the need arises. It is carried out after one or more indicators show that the server equipment is going to fail or that its performance is deteriorating.
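One way to express condition-based maintenance in code is to watch an indicator and raise a maintenance flag only when it stays beyond its limit. The sketch below assumes readings arrive one at a time; the class name, the three-reading window, and the 80 °C CPU temperature limit in the usage example are all hypothetical.

```python
from collections import deque


class ConditionMonitor:
    """Signal maintenance when the last `window` readings of one indicator all exceed `limit`.

    Requiring several consecutive readings keeps a single spike from triggering work.
    """

    def __init__(self, limit: float, window: int = 3) -> None:
        self.limit = limit
        self.readings: deque[float] = deque(maxlen=window)

    def add(self, value: float) -> bool:
        """Record a reading and return True when maintenance should be scheduled."""
        self.readings.append(value)
        full = len(self.readings) == self.readings.maxlen
        return full and all(v > self.limit for v in self.readings)


if __name__ == "__main__":
    monitor = ConditionMonitor(limit=80.0)  # hypothetical CPU temperature limit in °C
    for reading in [72.0, 85.0, 79.0, 83.0, 86.0, 88.0]:
        if monitor.add(reading):
            print(f"Schedule maintenance: last readings {list(monitor.readings)}")
```

With several indicators, one monitor per indicator can be kept and maintenance triggered when any of them fires.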
Corrective maintenance
Corrective maintenance is a maintenance task performed to identify, isolate, and rectify a fault in the server so that the failed equipment, machine, or system can be restored to an operational condition within the tolerances or limits established for in-service operations.
Planned Maintenance/Scheduled Maintenance
Planned preventive maintenance (PPM), more commonly referred to simply as planned maintenance (PM) or scheduled maintenance, is any maintenance scheduled in advance for a server or an item of its equipment. It is performed regularly in the server environment to reduce the likelihood of failure, and it is carried out while the equipment is still working so that it does not break down unexpectedly.
Predictive maintenance (PdM)
Predictive maintenance (PdM) techniques are designed to help determine the condition of server equipment in order to predict when maintenance should be performed. This approach promises cost savings over routine or time-based preventive maintenance because tasks are performed only when warranted.
Preventive maintenance (PM)
Preventive maintenance is the care and servicing performed by personnel to keep server equipment in satisfactory operating condition through systematic inspection, detection, and correction of incipient failures, either before they occur or before they develop into major defects. It follows planned guidelines and is carried out as regular, routine work to prevent the server equipment from breaking down or malfunctioning.
Total Productive Maintenance
Total productive maintenance (TPM) is a system of maintaining and improving the integrity of production and quality systems through the machines, equipment, processes, and employees that add business value to the organisation running the server.
Server Maintenance Window
The server maintenance window is a period of time designated in advance by the technical staff, during which preventive maintenance that could disrupt service may be performed. Standard maintenance windows are defined so that clients of the service can prepare for possible disruption or changes.
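A maintenance window can be encoded so that scripts check it before starting disruptive work. This minimal sketch uses Python's `datetime`; the Sunday 02:00-04:00 UTC window is a hypothetical example, and the sketch assumes the window starts and ends on the same day.

```python
from dataclasses import dataclass
from datetime import datetime, time, timezone


@dataclass(frozen=True)
class MaintenanceWindow:
    weekday: int  # Monday == 0 ... Sunday == 6, matching datetime.weekday()
    start: time
    end: time     # assumed to be later on the same day as `start`

    def contains(self, moment: datetime) -> bool:
        """True when `moment` (already in the window's timezone) falls inside it.

        The start is inclusive and the end exclusive.
        """
        return moment.weekday() == self.weekday and self.start <= moment.time() < self.end


# Hypothetical standard window: Sundays 02:00-04:00 UTC
SUNDAY_EARLY = MaintenanceWindow(weekday=6, start=time(2, 0), end=time(4, 0))

if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    if SUNDAY_EARLY.contains(now):
        print("Inside the maintenance window: disruptive work may proceed.")
    else:
        print("Outside the maintenance window: defer disruptive work.")
```

Publishing the same definition to clients lets them plan around the window, which is the purpose the paragraph above describes.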
Downtime
Downtime, or outage duration, is the period during which the server system is unavailable, that is, when it fails to provide or perform its primary function. Downtime usually results either from an unplanned event, such as a system failure, or from routine maintenance (a planned event).
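Downtime is often summarised as an availability percentage over a reporting period. The arithmetic is sketched below in Python. Counting planned and unplanned downtime together follows the definition above, although a particular service agreement may treat announced maintenance differently. For example, 43.2 minutes of downtime in a 30-day month (43,200 minutes) leaves 99.9% availability.

```python
def availability_percent(period_minutes: float, downtime_minutes: float) -> float:
    """Percentage of the period in which the server performed its primary function.

    `downtime_minutes` should include both unplanned outages and planned maintenance.
    """
    if period_minutes <= 0:
        raise ValueError("period_minutes must be positive")
    if not 0 <= downtime_minutes <= period_minutes:
        raise ValueError("downtime_minutes must be between 0 and period_minutes")
    return (period_minutes - downtime_minutes) / period_minutes * 100.0


if __name__ == "__main__":
    month = 30 * 24 * 60  # a 30-day month is 43,200 minutes
    print(f"{availability_percent(month, 43.2):.3f}%")  # prints 99.900%
```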
Outage
An outage is the unavailability, or a decrease in the quality, of the server’s service caused by unexpected behaviour of a particular service. Both incidents and maintenance work can cause an outage, in which the service is not delivered at the level its users reasonably expect.
Server Change Management
The objective of server change management is to ensure that standardised methods and procedures are used to handle all changes to the server infrastructure efficiently and promptly, minimising the number and impact of related incidents on the service. Change management also maintains the proper balance between the need for change and the potential detrimental impact of changes.
Presented by Data Centre & Server Cleaning Service