ICT Today

ICT Today May_June 19

Issue link: https://www.e-digitaleditions.com/i/1107553

HUMAN ERROR

Data center operations need to consider quantitative characteristics and metrics, such as availability, reliability and mean time between failures (MTBF), as well as the unpredictable effect of human error.

IBM conducted a study to evaluate the ability of technicians to repair failed drives within a redundant array of independent disks (RAID).³ Five technically savvy personnel were tasked with a basic repair: replacing a drive within a RAID array. The technicians completed this task multiple times on up to three different OS environments. All technicians were trained on how to perform the repair and were given printed step-by-step instructions for each OS environment. They completed the tasks in a low-stress environment, free of alarms, angry customers or supervisors. A total of 99 repairs were attempted, and human error occurred in 8 to 23 percent of the attempts, depending on the OS environment. Human error cannot be predicted, but policies and procedures can certainly help to reduce or eliminate it.

BICSI 009-2019 STANDARD

BICSI 009 focuses on data center operations and complements the ANSI/BICSI 002 standard, which comprehensively covers data center design. The sections of BICSI 009 that are most relevant to data center operations include:

• Standard Operating Procedures
• Maintenance Operating Procedures
• Emergency Operating Procedures
• Management

Standard Operating Procedures

BICSI 009 provides guidance regarding standard operating procedures (SOPs). SOPs are developed for all personnel working within the data center and for those responsible for providing data center services. A data center's SOPs address safety requirements, personnel code of conduct, quality of work, and defined processes for work order requests, approval and implementation. The SOPs are general policies and procedures to which all personnel must adhere.
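The Human Error discussion above names availability, reliability and MTBF, and it cites error rates of 8 to 23 percent from the IBM study. A little arithmetic can make those numbers concrete. The following is a minimal illustrative sketch, not a method taken from BICSI 009 or from the study. It rests on several assumptions:

* It uses the textbook steady-state availability formula A = MTBF / (MTBF + MTTR), where MTTR is mean time to repair.
* The reliability function assumes a constant failure rate (an exponential model).
* The error calculation assumes every repair is independent and carries the same error probability.
* The 50,000-hour MTBF and 8-hour MTTR are hypothetical values. Only the 8 and 23 percent rates come from the study.

```python
import math


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: A = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def reliability(t_hours: float, mtbf_hours: float) -> float:
    """Probability of running t hours without failure.

    Assumes a constant failure rate (exponential model): R(t) = exp(-t / MTBF).
    """
    return math.exp(-t_hours / mtbf_hours)


def p_any_error(p_error: float, n_tasks: int) -> float:
    """Probability of at least one human error across n tasks.

    Assumes each task is independent and has the same error probability.
    """
    return 1.0 - (1.0 - p_error) ** n_tasks


if __name__ == "__main__":
    # Hypothetical component: 50,000 h MTBF, 8 h MTTR.
    print(f"A = {availability(50_000, 8):.5f}")
    print(f"R(1 year) = {reliability(8_760, 50_000):.3f}")
    # Per-attempt error rates at the two ends of the range reported in the IBM study.
    for p in (0.08, 0.23):
        print(f"p = {p:.2f}: P(at least 1 error in 10 repairs) = {p_any_error(p, 10):.2f}")
```

Under these assumptions, even the lower 8 percent rate gives better-than-even odds of at least one error across ten repairs. This is one way to see why the article stresses policies and procedures rather than relying on individual skill.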
Maintenance Operating Procedures

Maintenance operating procedures (MOPs) are developed for the specific data center technicians who are responsible for specific components or systems. Human error is not predictable, and it is the leading contributor to unplanned downtime.

Under normal operating conditions, the data center responds to various internal and external conditions (e.g., utility power, outdoor temperature, humidity) without the need for any human interaction. As technology continues to develop, automation within data centers is increasing through more sophisticated control systems based on machine learning and other artificial intelligence technologies.

The clear boundary that used to exist between data center facilities and data center IT no longer exists. Common protocols are being developed that enable compute, storage, network, power and cooling systems to communicate with each other. This ultimately creates one critical infrastructure ecosystem that integrates both IT and facility systems, enabling the critical infrastructure to respond to IT requirements in real time. With this increased interaction between facility and IT systems, a human error made while working on either facility or IT systems can have cascading results. Most human interaction with the data center occurs during maintenance activities, which is a time when the systems
