    CNet Training


    Human Errors: The Biggest Challenge to Data Center Availability and how we can mitigate them – Part 1

    by CNet / Thursday, 02 March 2017 / Published in News

    The 2016 Ponemon Institute research report on the cost of downtime (reference 1) contains a chart showing the causes of data center downtime. The top six contributors are UPS system failure (25%), cyber crime (22%), accidental human error (22%), water/heat/CRAC failure (11%), weather related (10%), and generator failure (6%). However, the accidental human error category does not account for latent human error that could have contributed to those UPS, CRAC, and generator failures.

    The Uptime Institute has stated that 70% of data center outages can be attributed to human error.

    The definition of human error is broader than accidental error, and it can generally be classified into Active Error (where a deliberate action caused a deviation from the expected outcome) and Latent Error (where a non-deliberate action caused a deviation from the expected outcome). For example, a latent error arises when a design decision is made regarding the power protection circuit for a data center room, but the design is not fully co-ordinated to isolate a power fault and prevent it from cascading upstream to higher-level circuit breakers.

    There have been many major outages in the past few years that are attributed to human error. The 2016 Delta Air Lines data center outage is reported to have cost the company USD 150 million. Part of the reason for the long delay (3 days) in resuming service was that a significant part of its IT infrastructure was not connected to a backup power source, which raises the question: why was it set up that way? The most likely cause is latent error, where the IT equipment or the in-rack PDUs were not fed from two separate UPS systems or supported by an in-rack ATS.
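
    A latent gap of this kind can be searched for before it causes an outage. The following is a minimal sketch only, assuming a hypothetical inventory in which each device lists the upstream source behind each of its power cords. The names `Device`, `feeds`, and `single_source_devices`, and the sample inventory, are illustrative assumptions and do not describe how any particular operator records its power chain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    # Hypothetical field: the upstream source behind each power cord,
    # e.g. ("UPS-A", "UPS-B") for a dual-corded device on separate UPS systems.
    feeds: tuple


def single_source_devices(devices):
    """Return the names of devices whose cords all trace back to one source."""
    return [d.name for d in devices if len(set(d.feeds)) < 2]


# Illustrative inventory (made-up data).
inventory = [
    Device("db-01", ("UPS-A", "UPS-B")),   # two cords, two separate UPS systems
    Device("app-02", ("UPS-A", "UPS-A")),  # two cords, but both on the same UPS
    Device("sw-03", ("UPS-A",)),           # single-corded, no ATS in front of it
]

at_risk = single_source_devices(inventory)
print(at_risk)
```

    In this made-up inventory, `app-02` and `sw-03` would be flagged, because losing `UPS-A` would take both of them down even though one of them has two power cords.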

    During my presentation on this subject, I was asked whether a data center designed and implemented to a higher tier level (that is, higher resiliency) can minimize the issue of human error. My answer is that you can design and implement a 2N power and cooling infrastructure, but when one side is taken down for maintenance, any mistake or weakness (inexperienced operations staff or vendor personnel, a gap in procedure that is overlooked and filled with a wrong guess, and so on) can take down the IT load. This has happened to many data centers, as a web search for human error and data center outage incidents will show.

    There are multiple ways for human error to manifest in a data center outage. It can be a simple external trigger that slips through loopholes in successive layers of defence, as in the Swiss cheese model, or a cascade (combination) of errors, or a direct active human error.

    As an example of a cascade, a lightning strike that causes a momentary power dip (see reference) should not cause an outage in a data center. However, if the selection of circuit protection devices or the design did not cater for how the DRUPS would respond in such a situation, and the automated control was not configured to deal with it, then no amount of SOP/MOP/EOP or Method of Statement-Risk Assessment (MOS-RA) may protect the facility against that particular external trigger. In one case at a data center in Sydney, the circuit breakers were not designed and selected to cater for such a scenario, which caused the UPS to supply power to the grid instead of to the load.

    For direct human error, I know of a case in which a service engineer trained and authorized by the UPS manufacturer caused an outage. The engineer did not follow the documented service manual and caused the entire set of UPS to trip. Because the circuit protection devices were not able to isolate the fault downstream, the upstream incoming breaker tripped as well. This is part of the reason why data center staff should accompany and question the service engineer at critical check-points during the servicing of critical infrastructure.
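
    The requirement that a downstream device isolates the fault before the upstream breaker operates can be illustrated with a toy check. This is a deliberately simplified sketch: it assumes each breaker has one pickup current and one fixed (definite-time) delay, whereas a real co-ordination study compares full time-current curves from the manufacturer. The `Breaker` class, its fields, and all the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Breaker:
    name: str
    pickup_a: float  # assumed current (A) at or above which the breaker operates
    delay_s: float   # assumed fixed trip delay (s) once picked up


def trip_time(breaker: Breaker, fault_a: float) -> Optional[float]:
    """Trip delay for this fault current, or None if the breaker stays closed."""
    return breaker.delay_s if fault_a >= breaker.pickup_a else None


def is_selective(downstream: Breaker, upstream: Breaker, fault_a: float) -> bool:
    """True if the upstream breaker stays closed, or the downstream one clears first."""
    up = trip_time(upstream, fault_a)
    down = trip_time(downstream, fault_a)
    if up is None:
        return True   # upstream never operates for this fault
    if down is None:
        return False  # upstream trips while downstream does not
    return down < up


rack = Breaker("rack-feed", pickup_a=100.0, delay_s=0.05)
incomer_fast = Breaker("incomer", pickup_a=800.0, delay_s=0.05)
incomer_delayed = Breaker("incomer", pickup_a=800.0, delay_s=0.30)

print(is_selective(rack, incomer_fast, fault_a=1000.0))     # both trip together
print(is_selective(rack, incomer_delayed, fault_a=1000.0))  # downstream clears first
```

    With identical delays, the incomer operates at the same moment as the rack feed, so the fault is not contained. Adding an intentional delay upstream is one way, in this simplified model, to let the downstream device clear the fault first.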

    An outage can also result from failure of the resilient design or implementation due to under-capacity. This can be traced to latent error (no tracking of actual power usage against designed capacity) or active error (no checking of UPS capacity before maintenance). For example, load growth can mean that a UPS set designed as N+1 is in practice running as N, so that when one UPS goes down, the entire UPS set shuts down.
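
    The capacity erosion described above comes down to simple arithmetic that can be checked before any maintenance window. The sketch below assumes identical UPS modules that share the load evenly and ignores power factor, derating, and battery state. The function names and the figures (three 200 kW modules, 400 kW design load, 450 kW actual load) are made up for illustration.

```python
import math


def required_modules(load_kw: float, module_kw: float) -> int:
    """Smallest number of modules (N) able to carry the load."""
    return max(1, math.ceil(load_kw / module_kw))


def redundancy_label(load_kw: float, module_kw: float, installed: int) -> str:
    """Describe the redundancy the installed modules give at this load."""
    spare = installed - required_modules(load_kw, module_kw)
    if spare < 0:
        return "overloaded"
    return "N" if spare == 0 else f"N+{spare}"


def safe_to_remove_one(load_kw: float, module_kw: float, installed: int) -> bool:
    """True if the remaining modules can carry the load with one taken out."""
    return load_kw <= module_kw * (installed - 1)


MODULE_KW = 200.0
INSTALLED = 3

print(redundancy_label(400.0, MODULE_KW, INSTALLED))    # design load
print(redundancy_label(450.0, MODULE_KW, INSTALLED))    # actual load after growth
print(safe_to_remove_one(450.0, MODULE_KW, INSTALLED))  # pre-maintenance check
```

    At the 400 kW design load the set is N+1, but at 450 kW all three modules are needed, so taking one out for maintenance (or losing one to a fault) would overload the remaining two.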

    References:

    1. http://www.enterpriseinnovation.net/system/files/whitepapers/1_2016-cost-of-data-center-outages-final-2.pdf
    2. https://aws.amazon.com/message/4372T8/
    3. http://news.delta.com/chief-operating-officer-gives-delta-operations-update
    4. https://journal.uptimeinstitute.com/examining-and-learning-from-complex-systems-failures/

    Source: LinkedIn – James SOH – Lead Data Center Consultant at Newwit Consultancy
