Showing posts with label Downtime.

Tuesday, January 3, 2012

Testing & Exercising Branch Circuit Breakers

- Ken Koty, sales engineer for PDU Cables (http://www.pducables.com/) and former data center facilities manager for Thomson Reuters, says:

Any significant downtime at my data centers could potentially cost my company millions of dollars in lost revenue, so I looked for every possible problem area and implemented preventative maintenance measures to minimize or eliminate downtime. One of these measures was to pre-test all branch circuit breakers before installation.

The last line of defense protecting your critical servers on the raised floor is the set of branch circuit breakers located in your PDUs/RPPs. Given how many of these breakers are produced each day, regardless of brand, you can be sure some of them have defects, and this can lead to several problems. If a branch circuit breaker does not trip on a direct short, the next breaker in line is the main breaker in your 42-circuit panel, and its tripping can cause widespread failures of the critical servers on your raised floor. If a branch breaker does not trip on an overloaded circuit, the overload can eventually cause a fire.

Over my years of testing the branch circuit breakers we installed in our data centers, we found a failure rate of about 4% across overload and direct-short tests. Although 4% may not seem like a large percentage, most data centers use thousands of circuit breakers, so 4% of them can still put a significant number of servers at risk.
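To make the 4% figure concrete, here is a minimal sketch of the arithmetic. The fleet sizes are hypothetical, and the panel-level estimate assumes (our assumption, not the author's) that each breaker fails independently at the same 4% rate, with one breaker per circuit in a 42-circuit panel.

```python
# Hypothetical sketch: what a ~4% breaker failure rate implies at scale.
# Assumption (not from the source): failures are independent and identically
# distributed, so each breaker is defective with probability failure_rate.

FAILURE_RATE = 0.04  # approximate rate reported from in-house testing


def expected_defective(breaker_count: int, failure_rate: float = FAILURE_RATE) -> float:
    """Expected number of defective breakers in a batch of untested breakers."""
    if breaker_count < 0:
        raise ValueError("breaker_count must be non-negative")
    return breaker_count * failure_rate


def prob_at_least_one_defective(breaker_count: int, failure_rate: float = FAILURE_RATE) -> float:
    """Probability that at least one breaker in the batch is defective."""
    if breaker_count < 0:
        raise ValueError("breaker_count must be non-negative")
    return 1.0 - (1.0 - failure_rate) ** breaker_count


if __name__ == "__main__":
    # Illustrative (hypothetical) fleet sizes.
    for count in (1_000, 2_500, 5_000):
        print(f"{count} breakers: about {expected_defective(count):.0f} expected defects")
    # A fully populated 42-circuit panel of untested breakers.
    print(f"42-circuit panel: {prob_at_least_one_defective(42):.0%} chance of at least one defect")
```

Under these assumptions, a single panel populated entirely with untested breakers has roughly an 82% chance of containing at least one defective unit, which is why pre-testing every breaker, rather than sampling, fits the author's recommendation.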

The solution I implemented, and the one I recommend to you, is to purchase a circuit breaker tester, set up a test station onsite, and send some of your people to a training class so you can test all branch circuit breakers in-house before installing them in your critical environment. If that is not an option, consider having a local vendor do the testing for you. We stickered, dated, signed, and inventoried each circuit breaker that passed its test for future installation.

Another very important maintenance practice is to exercise these breakers at least every two years, when you shut down the PDU/RPP as part of your scheduled maintenance program. I recommend exercising each branch circuit breaker by switching it off and on at least three or four times. This helps keep the breaker from seizing in the closed position, which could prevent it from tripping on a direct short or overloaded circuit and lead to the problems described above.

If continuous uptime is your goal, testing and exercising branch circuit breakers, as part of a good overall maintenance program, is a preventative maintenance measure well worth implementing.

Friday, October 7, 2011

What's Your Preventive Maintenance Strategy? Protecting Against Unplanned Downtime

- Lealand (Lea) Chittim, director of operations at Electronic Environments Corporation (http://www.eecnet.com/), says:

Preventive maintenance of electrical and mechanical systems, such as Computer Room Air Conditioners (CRACs), Uninterruptible Power Supply (UPS) systems and batteries, and generators, keeps equipment working and should be used as a strategy to improve the performance of one’s assets. Most companies implement a preventive maintenance strategy to protect themselves against unplanned downtime and to extend the operating life of the equipment inside their data center.

Preventive maintenance is the best approach a data center can take to get value and Return on Investment (ROI) out of its assets. Every data center that wants to maximize these benefits should implement a high-quality preventive maintenance program for its critical environment equipment.

Preventive maintenance should rank among the top three overall priorities in a data center. The 2007 Study of Root Causes found that two-thirds of data center downtime events stem from preventable causes, and the number one preventable cause was insufficient maintenance. Reducing potential downtime and preventing failures should be the most critical aspects of a good preventive maintenance program and a high priority within a data center.

According to research by Forrester Research, Inc., companies running today’s critical environments are increasingly focused on service continuity, continuous availability, and limiting equipment downtime. There is also a strong emphasis on adopting standards to maximize the benefit of a company’s assets. A high-quality preventive maintenance program for critical environment equipment such as Uninterruptible Power Supplies (UPS), batteries, HVAC, and generators is the best way to accomplish these initiatives while getting the most value out of those assets.

Probably the biggest challenge for data center and IT managers when it comes to preventive maintenance is budgeting for it: preventive maintenance is usually the first item cut from a budget.

Managers need to understand how much a good preventive maintenance program can save in the long run. Not only can it save money, it also extends the life of critical equipment. Extending life expectancy is a crucial aspect of preventive maintenance because critical infrastructure equipment is a major capital investment. Part of managing one’s assets is tracking Return on Investment (ROI) and getting the maximum benefit from the assets relative to their cost. A standard preventive maintenance program decreases the total cost of the investment by extending the life of the asset and reducing its repair costs. According to a study performed by Jones Lang LaSalle, a preventive maintenance investment not only pays for itself but also produces a substantial ROI; the study shows that a good program can yield up to a 500% return in some cases. The primary reason for this significant ROI is that proper maintenance adds years to the life expectancy of the equipment, which lets companies postpone the expensive capital outlay needed to replace it. Essentially, the longer the capital expense can be deferred, the higher the ROI, because the bulk of the return comes from increasing the useful life of the equipment.
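As a rough illustration of what a "500% return" means, and why deferring a replacement raises it, here is a minimal sketch. The dollar amounts are hypothetical, the simple ROI definition is the common (benefit − cost) / cost form, and valuing a deferred purchase with a discount rate is our own assumption rather than the method used in the Jones Lang LaSalle study.

```python
# Hypothetical sketch: simple ROI and the value of deferring a capital purchase.
# All figures below are illustrative assumptions, not data from the cited study.


def roi_percent(benefit: float, cost: float) -> float:
    """Simple ROI: net gain divided by cost, expressed as a percentage."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (benefit - cost) / cost * 100.0


def deferral_value(capital_cost: float, discount_rate: float, years_deferred: int) -> float:
    """Present-value saving from paying capital_cost years_deferred years later.

    Assumption: money is discounted at a constant annual discount_rate.
    """
    if years_deferred < 0 or discount_rate < 0:
        raise ValueError("years_deferred and discount_rate must be non-negative")
    return capital_cost * (1.0 - 1.0 / (1.0 + discount_rate) ** years_deferred)


if __name__ == "__main__":
    # A hypothetical $20,000 maintenance program that returns $120,000 in
    # avoided repairs and deferred replacement works out to a 500% ROI.
    print(f"ROI: {roi_percent(120_000, 20_000):.0f}%")
    # Deferring a hypothetical $500,000 replacement longer is worth more.
    for years in (1, 3, 5):
        print(f"Deferred {years} yr: saves ${deferral_value(500_000, 0.05, years):,.0f}")
```

The deferral function grows with the number of years deferred, which mirrors the point above that the longer the capital expense is postponed, the larger the return attributable to maintenance.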

Performing maintenance helps equipment run more efficiently, which in turn extends its longevity and, more importantly, saves money by reducing the amount of energy it uses. Scopes of Work aimed at efficiency should include, to name a few, regular checks of fluid quality, frequent air filter changes, and power washing of condenser coils. Checking fluids is essential because contaminants can diminish performance. Changing filters frequently maximizes airflow through the equipment. One of the most important maintenance checks on HVAC equipment is regularly cleaning the condenser coils: condensers are known to draw in dirt, pollen, and other debris that restricts airflow and reduces efficiency. Power washing the condenser regularly keeps HVAC equipment running efficiently and helps reduce emergency service calls.

If preventive maintenance has not been a high priority, the critical environment equipment in a data center has probably experienced a number of failures during the past year. Some of the failures may be random, but the majority are a direct result of not implementing a preventive maintenance program. Such programs keep equipment running efficiently and effectively. A good preventive maintenance program should include partial or complete overhauls at set intervals and should record equipment trends so that worn parts can be repaired or replaced before they cause an unplanned outage.

The primary goal of preventive maintenance is to prevent failures before they occur and avoid their consequences. Long-term benefits include improved reliability, lower replacement costs, less downtime, and more efficient equipment. Companies should work with their maintenance provider to develop a scope of work and then standardize that scope throughout the company. Given the exorbitant cost of downtime, a strong preventive maintenance program is not optional; it is a must.

Thursday, July 15, 2010

Downtime: Human Error

- Alex Bewley, chief technology officer at uptime software (www.uptimesoftware.com), says:

Ultimately, in a typical mid-sized enterprise, the greatest cause of downtime is still human error, and no amount of automation or fault tolerance is going to fix that. When it happens, you need deep monitoring across your virtual, physical, and cloud applications to quickly pinpoint the problem and deal with it.

Achieving Zero Downtime

- Brady Reiter, General Manager of Enterprise Architecture and Application Strategy at NaviSite (www.navisite.com), says:

NaviSite has been managing packaged and custom applications for over a decade. During my eight years with NaviSite I have talked with many companies about the common causes of application downtime. A recurring theme that data center/IT managers need to take to heart is that business and IT often define zero downtime differently. For your business users, zero downtime means not only that the servers are up and running but that every part of the application is working properly. If your company can’t complete its month-end processes, or if performance is degraded, then in your business users’ eyes you have not achieved zero downtime.