By definition, data centre maintenance involves keeping the data centre components and environment in a good state of repair and physical health. This means keeping the data centre hardware, the building facility, and the hosted equipment functional and operational.
Tiered Infrastructure Maintenance Standards (TIMS) for Data Centre
Why Data Centre Maintenance?
Maintenance includes tests, measurements, adjustments, parts replacement, and cleaning, all performed specifically to prevent faults from occurring. Hardware maintenance is the testing and cleaning of data centre equipment. Software maintenance is the updating of application programs to meet changing information requirements, such as adding new functions.
✓ Check the condition of floors, ceilings and walls
✓ Look for leaks or water damage in the ceilings
✓ Make sure that exits are clearly marked, with additional signage as needed
✓ Make sure the data centre is free of trash or large items that could be a fire or tripping hazard
✓ Conduct routine pest inspections and treatments
✓ Make sure IT hardware equipment, i.e. servers, communication gear, and storage equipment, is racked in the appropriate locations as per the plan
✓ Make sure there are no loose wires on or above the floor
✓ Check and confirm that the utility grid supply is in good working order
✓ Make sure that Backup generators are available and are in good working order
✓ Check the working condition of Automatic Transfer Switches (ATS), Uninterruptible Power Supplies (UPS), and Power Distribution Units (PDU)
✓ Check and confirm that Computer Room Air Conditioners (CRAC) or Computer Room Air Handlers (CRAH) and the overall HVAC system are functioning efficiently
Security and Safety
✓ Check the locks on the doors and make sure they lock and unlock easily
✓ Test smoke and carbon monoxide detectors and change batteries at least once a year
✓ Check that all lights (interior and exterior) are working, replacing bulbs as needed
✓ Regularly check the visitors’ list and try to limit access to the data centre as much as possible
✓ Check the cleanliness of the data centre facility
✓ Make sure that the data centre hardware equipment and the facility itself are free of dust and contamination
Data Centre Maintenance Types
- Condition-based maintenance
- Corrective maintenance
- Planned maintenance
- Predictive maintenance
- Preventive maintenance
- Total productive maintenance
Condition-based maintenance (CBM)
Condition-based maintenance (CBM), in short, is maintenance performed when the need arises. It is carried out after one or more indicators show that data centre equipment is going to fail or that its performance is deteriorating.
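The idea of acting only when an indicator signals trouble can be sketched in a few lines of Python. This is a minimal illustration, not a monitoring product: the `Indicator` class, the indicator names, and every threshold value are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class Indicator:
    """One monitored reading and the level above which it signals trouble."""
    name: str
    value: float
    warn_above: float  # hypothetical threshold, not a vendor recommendation


def needs_maintenance(indicators):
    """Return the names of indicators whose value exceeds their threshold."""
    return [i.name for i in indicators if i.value > i.warn_above]


# Illustrative readings only.
readings = [
    Indicator("ups_battery_temp_c", 41.0, 40.0),
    Indicator("crac_supply_air_temp_c", 18.5, 24.0),
    Indicator("fan_vibration_mm_s", 7.2, 7.1),
]

flagged = needs_maintenance(readings)
print(flagged)
```

With these sample readings, the UPS battery temperature and the fan vibration exceed their thresholds, so those two items would be put forward for maintenance while the CRAC reading would not.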
Corrective maintenance
Corrective maintenance is a maintenance task performed to identify, isolate, and rectify a fault in the data centre so that the failed equipment, machine, or system can be restored to an operational condition within the tolerances or limits established for in-service operations.
Planned Maintenance/Scheduled Maintenance
Planned preventive maintenance (PPM), more commonly referred to as planned maintenance (PM) or scheduled maintenance, is any maintenance of a data centre item or piece of equipment that is scheduled in advance. It is performed regularly to lessen the likelihood of the equipment failing, and it is carried out while the equipment is still working so that it does not break down unexpectedly.
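A fixed-interval schedule of this kind can be generated mechanically. The sketch below assumes a simple model in which a task repeats every fixed number of days after it was last done; the function name, the 90-day interval, and the dates are hypothetical.

```python
from datetime import date, timedelta


def upcoming_due_dates(last_done, interval_days, count):
    """Return the next `count` due dates, spaced `interval_days` apart."""
    if interval_days <= 0:
        raise ValueError("interval_days must be positive")
    return [last_done + timedelta(days=interval_days * k)
            for k in range(1, count + 1)]


# Hypothetical task: a UPS inspection every 90 days, last done on 1 January 2024.
schedule = upcoming_due_dates(date(2024, 1, 1), 90, 3)
for due in schedule:
    print(due.isoformat())
```

For this example the three due dates are 2024-03-31, 2024-06-29, and 2024-09-27. A real schedule would normally also account for maintenance windows and staff availability.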
Predictive maintenance (PdM)
Predictive maintenance (PdM) techniques are designed to help determine the condition of data centre equipment in order to predict when maintenance should be performed. This approach promises cost savings over routine or time-based preventive maintenance because tasks are performed only when warranted.
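One very simple way to predict when maintenance will be needed is to fit a straight line to past readings and extrapolate to the point where a limit is reached. The sketch below uses an ordinary least-squares fit as a stand-in for whatever prediction technique is actually in use; the function name, the readings, and the limit are all hypothetical.

```python
def days_until_threshold(days, values, threshold):
    """Estimate days from the last reading until `threshold` is reached.

    Fits a least-squares line through (days, values) and extrapolates.
    Returns None when the readings are not trending upward.
    """
    n = len(days)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired readings")
    mean_x = sum(days) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("readings must be taken on different days")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None  # no upward trend, so no crossing is predicted
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - days[-1])


# Hypothetical wear indicator sampled on days 0 to 3, with a limit of 20.
remaining = days_until_threshold([0, 1, 2, 3], [10, 12, 14, 16], 20)
print(remaining)
```

Here the indicator rises by 2 units per day, so the limit of 20 is expected about 2 days after the last reading. Real equipment rarely degrades in a straight line, so this should be read as an illustration of the principle only.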
Preventive maintenance (PM)
Preventive maintenance is the care and servicing carried out by personnel to keep data centre equipment in satisfactory operating condition. It provides for systematic inspection, detection, and correction of incipient failures, either before they occur or before they develop into major defects. It is regular, routine work that follows planned guidelines and is carried out to prevent equipment and machinery from breaking down or malfunctioning.
Total Productive Maintenance
Total productive maintenance (TPM) is a system of maintaining and improving the integrity of production and quality systems through the machines, equipment, processes, and employees that add business value to a data centre.
Data Centre Maintenance Window
The data centre maintenance window is a period of time designated in advance by the technical staff, during which preventive maintenance that could cause disruption of service may be performed. The purpose of defining standard maintenance windows is to allow clients of the service to prepare for possible disruption or changes.
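A maintenance window can be represented as a start time plus a duration, which makes it easy to check whether a given moment falls inside it. The following is a minimal sketch; the `MaintenanceWindow` class and the example times are hypothetical, and time zones are ignored for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class MaintenanceWindow:
    """A pre-announced period during which disruptive work is allowed."""
    start: datetime
    duration: timedelta

    @property
    def end(self) -> datetime:
        return self.start + self.duration

    def contains(self, moment: datetime) -> bool:
        """True if `moment` lies in the window (start inclusive, end exclusive)."""
        return self.start <= moment < self.end


# Hypothetical window: 1 June 2024, 22:00, lasting four hours.
window = MaintenanceWindow(datetime(2024, 6, 1, 22, 0), timedelta(hours=4))

inside = window.contains(datetime(2024, 6, 2, 1, 30))
outside = window.contains(datetime(2024, 6, 2, 2, 0))
print(inside, outside)
```

A check like this could be used to refuse disruptive tasks outside the announced period. The end of the window is treated as exclusive here, which is a design choice rather than a rule.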
Data Centre Downtime
The term downtime refers to periods when the data centre system is unavailable. Downtime, or outage duration, is a period of time in which the system fails to provide or perform its primary function. This is usually the result of an unplanned event or of routine maintenance (a planned event).
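Downtime is often summarised as an availability percentage: the share of a period in which the system was able to perform its function. The sketch below works through that arithmetic; the function name and the figures (a 30-day month with 60 minutes of planned and 30 minutes of unplanned downtime) are hypothetical.

```python
def availability_percent(total_minutes, downtime_minutes):
    """Return the percentage of the period in which the system was available."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime must lie between 0 and total_minutes")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


# Hypothetical 30-day month expressed in minutes.
total = 30 * 24 * 60          # 43,200 minutes
planned = 60                  # routine maintenance
unplanned = 30                # unexpected failure

result = availability_percent(total, planned + unplanned)
print(f"{result:.2f}%")
```

With 90 minutes of downtime in 43,200 minutes, availability comes to roughly 99.79%. Whether planned downtime counts against availability depends on the agreement in place, so counting both here is an assumption.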
Data Centre Outage
An outage is the unavailability of, or a decrease in the quality of, a data centre service due to unexpected behaviour of that service. Both incidents and maintenance work may cause an outage, with the result that a service is not delivered at the level its users reasonably expect.
Data Centre Change Management
The objective of data centre change management is to ensure that standardised methods and procedures are used for the efficient and prompt handling of all changes to the data centre infrastructure, in order to minimise the number and impact of any related incidents upon service. It also maintains the proper balance between the need for change and the potential detrimental impact of changes.
Presented by Data Centre Cleaning