By definition, server room maintenance involves keeping the server room components and environment in a good state of repair and physical health. This means keeping the server room hardware, the building facility, and the hosted equipment functional and operational.
Tiered Infrastructure Maintenance Standards (TIMS) for Server Room
Why Server Room Maintenance?
Maintenance, including tests, measurements, adjustments, parts replacement, and cleaning, is performed specifically to prevent faults from occurring. Hardware maintenance is the testing and cleaning of server room equipment. Software maintenance is the updating of server applications and programs to meet changing information requirements, such as adding new functions.
Maintenance Checklist
Building
✓ Check the condition of floors, ceilings and walls
✓ Look for leaks or water damage in the ceilings
✓ Make sure that exits are clearly marked, with additional signage as needed
✓ Make sure server room is free of trash or large items that could be a fire or tripping hazard
✓ Conduct routine pest inspections and treatments
IT Equipment
✓ Make sure IT hardware (servers, communication gear, and storage equipment) is racked in the appropriate locations as per the server room plan and design
✓ Make sure there are no loose wires on or above the floor of the server room
Electrical Infrastructure
✓ Check and confirm that the utility grid supply is in good working order
✓ Make sure that backup generators are available and in good working order
✓ Check that Automatic Transfer Switches (ATS), Uninterruptible Power Supplies (UPS), and Power Distribution Units (PDU) are in good working condition
Cooling Infrastructure
✓ Check and confirm that Computer Room Air Conditioners (CRAC) or Computer Room Air Handlers (CRAH) and the overall HVAC system are functioning efficiently
Security and Safety
✓ Check the door locks and make sure they lock and unlock easily
✓ Test smoke and carbon monoxide detectors and change batteries at least once a year
✓ Check that all lights (interior and exterior) are working, replacing bulbs as needed
✓ Regularly check the visitors’ list and try to limit access to the server room as much as possible
Cleaning
✓ Check the cleanliness of the server room facility
✓ Make sure that the server room hardware and the facility itself are free of dust and contamination
Server Room Maintenance Types
- Condition-based maintenance
- Corrective maintenance
- Planned maintenance
- Predictive maintenance
- Preventive maintenance
- Total productive maintenance
Condition-based maintenance (CBM)
Condition-based maintenance (CBM), briefly put, is maintenance performed when the need arises. It is carried out after one or more indicators show that server room equipment is going to fail or that its performance is deteriorating.
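As a rough illustration of the indicator idea, the sketch below flags equipment whose latest reading falls outside an acceptable range. The metric names, the limits in `LIMITS`, and the `Reading` type are all assumptions made for this example; real limits should come from the equipment vendor or the site's own standards.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    equipment: str  # hypothetical equipment label, e.g. "rack-3"
    metric: str     # hypothetical metric name, must match a key in LIMITS
    value: float


# Illustrative (low, high) limits only; these numbers are assumptions.
LIMITS = {
    "inlet_temp_c": (18.0, 27.0),
    "ups_battery_pct": (80.0, 100.0),
}


def needs_maintenance(readings):
    """Return the equipment names whose reading is outside its limits."""
    flagged = []
    for r in readings:
        limits = LIMITS.get(r.metric)
        if limits is None:
            continue  # no limit defined for this metric, so nothing to check
        low, high = limits
        if not (low <= r.value <= high):
            flagged.append(r.equipment)
    return flagged


readings = [
    Reading("rack-3", "inlet_temp_c", 31.5),
    Reading("ups-1", "ups_battery_pct", 92.0),
]
print(needs_maintenance(readings))
```

In this sketch only the equipment with an out-of-range indicator is scheduled for work, which is the core of the condition-based approach.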
Corrective Maintenance
Corrective maintenance is a maintenance task performed to identify, isolate, and rectify a fault in the server room so that the failed equipment, machine, or system can be restored to an operational condition within the tolerances or limits established for in-service operations.
Planned Maintenance/Scheduled Maintenance
Planned preventive maintenance (PPM), more commonly called planned maintenance (PM) or scheduled maintenance, is any maintenance of server room equipment that is scheduled in advance. It is performed regularly to lessen the likelihood of equipment failing, and it is carried out while the equipment is still working so that it does not break down unexpectedly.
Predictive Maintenance
Predictive maintenance (PdM) techniques are designed to help determine the condition of server room equipment in order to predict when maintenance should be performed. This approach promises cost savings over routine or time-based preventive maintenance because tasks are performed only when warranted.
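One simple way to picture prediction is to fit a straight line to a metric that is drifting upward and estimate when it will reach a limit. The sketch below does this with an ordinary least-squares fit. The function name `days_until_threshold`, the sample values, and the threshold are assumptions for illustration, and a linear trend is only one of many possible models.

```python
def days_until_threshold(days, values, threshold):
    """Estimate days from the last sample until a rising metric hits threshold.

    Fits a least-squares line through (day, value) samples. Returns None when
    there is too little data or the metric is not trending upward.
    """
    n = len(days)
    if n < 2:
        return None
    mean_x = sum(days) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        return None  # all samples taken on the same day
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, values))
    slope = sxy / sxx
    if slope <= 0:
        return None  # flat or falling: no crossing of an upper limit predicted
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold - intercept) / slope
    # Clamp at zero in case the fitted line has already passed the threshold.
    return max(0.0, crossing_day - days[-1])


# Hypothetical daily fan-bearing temperatures (deg C) with a limit of 50.
remaining = days_until_threshold([0, 1, 2, 3], [40.0, 42.0, 44.0, 46.0], 50.0)
print(remaining)
```

The estimate can then be used to schedule the task shortly before the predicted date rather than on a fixed calendar interval.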
Preventive maintenance (PM)
Preventive maintenance is the care and servicing of server room equipment by personnel to keep it in satisfactory operating condition. It provides for systematic inspection, detection, and correction of incipient failures, either before they occur or before they develop into major defects. It follows planned guidelines and consists of regular, routine work carried out on equipment and machinery to prevent breakdown or malfunction.
Total Productive Maintenance
Total productive maintenance (TPM) is a system of maintaining and improving the integrity of production and quality systems through the machines, equipment, processes, and employees that add business value to a server room.
Maintenance Windows
The server room maintenance window is a period of time designated in advance by the technical staff, during which preventive maintenance that could cause disruption of service may be performed. The purpose of defining standard maintenance windows is to allow clients of the service to prepare for possible disruption or changes.
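A maintenance window can be written down as a small data structure so that tools can check whether a proposed start time falls inside it. The sketch below assumes a weekly recurring window and naive local datetimes. The `MaintenanceWindow` class and the Sunday 02:00 example are hypothetical, not taken from any particular scheduling system.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass(frozen=True)
class MaintenanceWindow:
    weekday: int         # 0 = Monday ... 6 = Sunday (datetime.weekday() convention)
    start: time          # local start time of the window
    duration: timedelta  # how long the window stays open

    def contains(self, moment: datetime) -> bool:
        """Return True if moment falls inside this weekly window."""
        # Find the most recent window start at or before the given moment.
        days_back = (moment.weekday() - self.weekday) % 7
        start_dt = datetime.combine(
            moment.date() - timedelta(days=days_back), self.start
        )
        if start_dt > moment:
            start_dt -= timedelta(days=7)
        return start_dt <= moment < start_dt + self.duration


# Hypothetical window: every Sunday from 02:00 for four hours.
window = MaintenanceWindow(weekday=6, start=time(2, 0), duration=timedelta(hours=4))
print(window.contains(datetime(2024, 1, 7, 3, 0)))  # 7 January 2024 is a Sunday
```

Searching back to the most recent window start, rather than comparing clock times directly, lets the same check handle windows that run past midnight.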
Server room Downtime
Downtime refers to periods when the server room system is unavailable. Downtime, or outage duration, is a period of time in which the system fails to provide or perform its primary function. This is usually the result of an unplanned event or of routine maintenance (a planned event).
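Downtime is often summarised as an availability percentage over a reporting period. The sketch below shows the arithmetic, keeping planned and unplanned downtime separate. The `Outage` type, the function name, and the 30-day example figures are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Outage:
    duration: timedelta
    planned: bool  # True for routine maintenance, False for an unplanned event


def availability_pct(period: timedelta, outages) -> float:
    """Percentage of the period during which the system was available."""
    if period <= timedelta(0):
        raise ValueError("period must be positive")
    downtime = sum((o.duration for o in outages), timedelta())
    return 100.0 * (1 - downtime / period)


# Hypothetical 30-day month with one planned and one unplanned outage.
outages = [
    Outage(timedelta(hours=2), planned=True),
    Outage(timedelta(hours=1), planned=False),
]
unplanned = sum((o.duration for o in outages if not o.planned), timedelta())
print(round(availability_pct(timedelta(days=30), outages), 3), unplanned)
```

Whether planned maintenance counts against availability is a reporting decision; this sketch counts both kinds and reports the unplanned total separately.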
Server room Outage
An outage is the unavailability of, or a decrease in the quality of, a server room service due to unexpected behaviour of that service. Both incidents and maintenance work may cause an outage, which results in a service not being delivered at the level its users reasonably expect.
Server room Change Management
The objective of server room change management is to ensure that standardised methods and procedures are used for efficient and prompt handling of all changes to the server room infrastructure, in order to minimise the number and impact of any related incidents upon service. It also maintains the proper balance between the need for change and the potential detrimental impact of changes.
Presented by Server Room Cleaning