BUSINESS CONTINUITY PLANNING AND SOLUTIONS
GUIDE TO RISK AND PROTECTION

Today's IT world increasingly requires non-stop access to the entire spectrum of computing and communications services to enable an organisation to operate in an effective and profitable manner. Any disruption to system availability is costly in terms of operations, lost revenue, missed deadlines and customer experience. Thus there is massive demand for full availability of IT equipment. The challenge for the IT professional is to balance the risks against the cost of protection to achieve full availability and justify this within their own operating environment.
The following topics cover the evaluation path to determining what type of protection should be considered and solutions to some common problems.
 

1. Operational Requirements

The simple fact is that, like all forms of insurance, protection costs money. You need to ensure that the level of cover is realistic for the operational needs and is cost-effective. IT equipment can broadly be categorised as critical or non-critical. Anything whose loss would have an operational impact and cause real pain, whether internal or external, is considered critical. Conversely, services that do not adversely affect core operations or customers are non-critical. As well as critical servers, the communications path, including terminating devices, must also be considered.

In many organisations the customer-facing teams are considered a critical component and it may be prudent to ensure they enjoy full protection to avoid any customer impact. Many organisations choose to group their critical staff and provide them with a protected environment to ensure business continuity. This is achieved through physical co-location or by assigning a special "hot-desk" area where the staff can work in an emergency with guaranteed access to services.
 

2. Risk Assessment

Risks are everywhere, and simple steps can be taken to minimise the obvious issues such as poor wiring, inadequate ventilation and uncontrolled access to equipment. The main issues are:
 

2.1 Electrical Power Risk

Mains power failure is a commonplace occurrence in the UK as the dated distribution network is pushed to capacity with a maintenance-upon-failure attitude increasingly prevalent in today's de-regulated power industry. We also suffer from regular disruption due to building works and natural disasters such as flooding. There is the constant threat of brown-outs which may only cause the lights to flicker, but can severely disrupt IT equipment by causing power-downs and re-boots. By nature they are very difficult to trace, so can recur over a long period of time.

Another often overlooked issue is the actual stability of the incoming mains supply. In the UK it is supposed to stay within +/- 10% of nominal, though in practice it can vary by anything up to 20%. This voltage cycling has a direct effect on the power consumption and heat output of any connected equipment, so is an undesirable factor.
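As a rough illustration of checking supply stability, the sketch below flags logged voltage readings that fall outside the +/- 10% band. The 230 V nominal value and the sample readings are assumptions made for the example, not measurements from any site.

```python
NOMINAL_V = 230.0  # assumed UK nominal supply voltage
TOLERANCE = 0.10   # the +/- 10% band discussed in the text


def out_of_band(readings, nominal=NOMINAL_V, tolerance=TOLERANCE):
    """Return (index, volts) for every reading outside the tolerance band."""
    low = nominal * (1 - tolerance)
    high = nominal * (1 + tolerance)
    return [(i, v) for i, v in enumerate(readings) if v < low or v > high]


# Hypothetical logged RMS voltages, one per sample interval.
readings = [231.0, 228.5, 204.0, 229.0, 254.5, 186.0]
print(out_of_band(readings))  # [(2, 204.0), (4, 254.5), (5, 186.0)]
```

A log of this kind, taken over several weeks, can help to trace intermittent brown-outs that are otherwise hard to pin down.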

External data services require power to the terminating units to operate. These are often forgotten, as services are frequently terminated in a PBX room or comms cabinet away from the main IT suite. PBXs, and in particular VoIP-enabled devices, require power to give dial tone and again are often overlooked. Most traditional PBX systems have an optional battery pack from the manufacturer, though in practice these are of limited duration and often fail due to lack of maintenance.

When analysing what equipment to protect, you must bear in mind the electrical power characteristics and the effect this will have on any solution. Laser printers and electric motors all use a high surge current on start-up, so will require a considerably larger power rating than their normal operating state. The same is true for a VGA screen that can draw up to 5 times its normal power at start-up.
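The effect of start-up surge on sizing can be sketched as follows. This is a minimal illustration only: the wattages are invented, and the printer surge factor of 3 is an assumption (the factor of 5 for the screen follows the figure quoted above).

```python
from dataclasses import dataclass


@dataclass
class Load:
    name: str
    running_w: float           # steady-state draw in watts
    surge_factor: float = 1.0  # start-up draw as a multiple of running_w


def running_total(loads):
    """Steady-state demand with everything already switched on."""
    return sum(l.running_w for l in loads)


def worst_single_start(loads):
    """Demand when the load with the largest surge starts while the rest run."""
    extra = max((l.running_w * (l.surge_factor - 1.0) for l in loads), default=0.0)
    return running_total(loads) + extra


def simultaneous_start(loads):
    """Demand if every load starts at the same moment, e.g. after a power cut."""
    return sum(l.running_w * l.surge_factor for l in loads)


# Hypothetical equipment list.
loads = [
    Load("server", 400.0),
    Load("vga_screen", 100.0, surge_factor=5.0),
    Load("laser_printer", 300.0, surge_factor=3.0),
]
print(running_total(loads))       # 800.0
print(worst_single_start(loads))  # 1400.0
print(simultaneous_start(loads))  # 1800.0
```

The gap between the running total and the simultaneous-start figure shows why high-surge devices are often left off the protected supply or started in sequence.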

It is good practice to create a database of all electrical equipment and determine its power requirements. By structuring the data you can see the exact requirement for each location or rack, which enables the supply wiring infrastructure to be assessed. Where 3 phase power is used in an IT suite, an accurate power database will enable the phases to be balanced with equal loads for optimal operation.
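A power database of this kind can be very simple. The sketch below totals the load per rack and then spreads the racks across three phases using a greedy "heaviest rack to lightest phase" rule. The inventory, rack names and phase labels are all hypothetical, and the greedy rule is just one straightforward way to approach balancing, not a prescribed method.

```python
from collections import defaultdict

# Hypothetical inventory rows: (device, rack, running watts).
INVENTORY = [
    ("db-server", "rack-A", 600),
    ("web-server", "rack-A", 350),
    ("san-shelf", "rack-B", 900),
    ("core-switch", "rack-B", 250),
    ("pbx", "rack-C", 400),
    ("router", "rack-C", 100),
    ("file-server", "rack-D", 450),
    ("backup-drive", "rack-E", 300),
]


def watts_per_rack(inventory):
    """Sum the running load for each rack."""
    totals = defaultdict(int)
    for _device, rack, watts in inventory:
        totals[rack] += watts
    return dict(totals)


def balance_phases(rack_totals, phases=("L1", "L2", "L3")):
    """Assign racks to phases, heaviest first, each to the lightest phase so far."""
    load = {p: 0 for p in phases}
    assignment = {}
    ordered = sorted(rack_totals.items(), key=lambda kv: (-kv[1], kv[0]))
    for rack, watts in ordered:
        target = min(phases, key=lambda p: load[p])
        assignment[rack] = target
        load[target] += watts
    return assignment, load


racks = watts_per_rack(INVENTORY)
assignment, phase_load = balance_phases(racks)
for phase, watts in phase_load.items():
    print(phase, watts)
```

With real data the same per-rack totals also show whether the wiring and fusing to each rack are adequate.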

All equipment is fused at the device and again at the mains plug. The plugs will typically connect via a fused distribution block back to a circuit breaker in the IT suite mains distribution panel. This is in turn fed from a circuit breaker in the main building distribution panel. Thus you can have 5 fused elements in the power supply chain - and that is only on your premises! All of these fused points, along with the wiring should be checked to determine correct rating, bearing in mind the hierarchical distribution structure.
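The hierarchy check described above can be sketched in a few lines. The chain, ratings and expected loads below are hypothetical, and the check is deliberately simplified: it compares nominal ratings only and does not model how quickly each fuse or breaker trips.

```python
# Hypothetical chain, ordered from the device up to the building supply.
# Each entry: (name, rating in amps, expected load in amps at that point).
CHAIN = [
    ("device fuse", 5, 3),
    ("plug fuse", 13, 3),
    ("distribution block fuse", 13, 10),
    ("IT suite breaker", 32, 24),
    ("building breaker", 100, 60),
]


def check_chain(chain):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    # Each point must be rated for the load it is expected to carry.
    for name, rating, load in chain:
        if rating < load:
            problems.append(f"{name} ({rating} A) is rated below its {load} A load")
    # Ratings should not decrease when moving upstream towards the supply.
    for (down_name, down, _), (up_name, up, _) in zip(chain, chain[1:]):
        if down > up:
            problems.append(
                f"{down_name} ({down} A) is rated above upstream {up_name} ({up} A)"
            )
    return problems


print(check_chain(CHAIN))  # []
```

An undersized breaker in the middle of the chain would be reported twice: once for its own load and once because the point below it is rated higher.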
 

2.2 Environmental Risk

Heat and humidity are enemies of electronic equipment and a stable environment is a must for a long service life. All power consumed by equipment has a heating effect and must be catered for. The optimum solution is a close control air conditioning system to manage both temperature and humidity. In an IT suite cooled air should optimally blow up through a ventilated floor with roof-level extraction. In practice, the floor gap is often insufficient and is filled with cabling, which obstructs airflow. Where air conditioning is not possible, efforts should be made to achieve air circulation, in particular to extract hot air from the equipment.

Upon mains power failure, the IT equipment can readily be kept running using a UPS system, though this is not suitable for running the air conditioning due to the high start-up power surge. The heat build-up from an IT suite can be considerable and provision must be made for this if operations are to be maintained for any length of time. This can involve ventilating through open doors and using modest-sized air-circulation fans to help vent the room. The best solution is to employ a backup generator to restore a mains supply, including operation of the air conditioning systems.

2.3 Fire and Flood Risk

Fire and flood are significant risks, borne out by increased buildings insurance premiums. There is often considerably more consequential damage than direct damage from an event and steps should be taken to minimise this.

Fire damages equipment from a considerable distance through thermal effects and smoke deposits, both of which cause significant harm to sensitive equipment. There is also the risk of water damage from fire hoses, particularly in a typical plasterboard construction room. A fire hose will easily penetrate a plasterboard panel, leaving water and plaster debris across everything in its path. Suspended ceilings can also drop, increasing the debris falling onto the equipment.

Flood damage can cripple all services, as they are typically located at low points in a building and so are highly vulnerable to water ingress. Equipment should therefore be suitably located or sealed.
 

2.4 Backup Devices

As applications and SANs continue to grow, it is becoming impractical to run regular full tape backups, which makes continuous network access all the more important for service continuity or prompt restoration. A network-based backup drive is no good if it cannot be accessed in an emergency, so it must be considered critical equipment.
 

3. Disaster Recovery

Many organisations invest heavily in disaster recovery facilities to offer an alternate operational location in the event of a disaster. The scope of these DR facilities varies greatly, but in general most can only accommodate a fraction of the normal number of staff and are rarely co-located with the main building, so necessitating a period for travel and start-up. Effective planning for a managed period of business continuity at the main site will ensure core services are maintained whilst the DR site comes on-line to minimise impact and ensure no data loss.
 

4. Protection Solutions

4.1 Electrical Power Protection

The simplest and most effective method of protecting against mains power failure is a UPS (Uninterruptible Power Supply). Having determined the overall power requirement and the period of autonomy, it is relatively straightforward to specify a suitable UPS system. All commonplace UPS systems use batteries, typically sealed lead-acid type. There are 2 types of UPS systems:

Line Interactive - typically these are the small units attached to individual equipment or sat in the bottom of equipment racks. These are normally passive, with mains power fed straight to the load equipment. In the event of mains failure the UPS senses the loss and switches over to battery power, using an inverter to supply the load with AC power. By nature there is a switching transient upon mains loss and there is no guarantee that the load will not be disrupted, or indeed that the UPS will even work under load. Typically line interactive UPS systems are rated up to a maximum of 10kVA and should only be considered for dispersed equipment that needs minimal protection.

On-Line Double Conversion - these overcome the shortcomings of the line interactive units by converting the power twice: a rectifier takes the mains AC input down to DC, and an inverter then takes it back up to AC output for the load. The batteries sit on the DC link between the rectifier and the inverter, so the load is always driven from this link while the batteries are constantly recharged. Thus there is no impact whatsoever on the load from a mains failure (subject to battery capacity). The output is precisely controlled at an optimum voltage level with full power conditioning, so spikes and surges are removed. These UPS systems come in a wide range of sizes, typically up to 300kVA, and are ideal for IT equipment protection.
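The sizing step mentioned at the start of this section (power requirement plus period of autonomy) can be sketched as below. The power factor of 0.8, the 25% headroom, the 0.9 inverter efficiency and the list of available unit sizes are all assumptions for illustration. Real battery sizing also depends on discharge curves and battery age, which this sketch ignores.

```python
def required_kva(load_w, power_factor=0.8, headroom=0.25):
    """Apparent power the UPS must deliver, in kVA, including growth headroom."""
    return load_w / power_factor * (1 + headroom) / 1000.0


def battery_wh(load_w, autonomy_min, inverter_efficiency=0.9):
    """Battery energy needed to carry load_w for autonomy_min minutes."""
    return load_w * (autonomy_min / 60.0) / inverter_efficiency


def pick_rating(kva, available=(1, 2, 3, 5, 6, 10, 20, 40, 80, 160, 300)):
    """Smallest unit from a hypothetical product range that covers the demand."""
    for size in available:
        if size >= kva:
            return size
    raise ValueError("load exceeds the largest single unit in the list")


kva = required_kva(8000)            # 8 kW of protected load
print(round(kva, 2))                # 12.5
print(pick_rating(kva))             # 20
print(round(battery_wh(8000, 15)))  # 2222
```

In this example an 8 kW load with 15 minutes of autonomy points to a unit of around 20kVA, which is above the typical line interactive range and so suggests an on-line double conversion system.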
 

4.2 Environmental Protection

Providing a stable operating environment requires control of all parameters, so a full-function close control air conditioning system should be used to control temperature and humidity. By nature, air conditioning systems are potential sources of failure and it is good practice to design any installation with N+1 units for redundancy. As a general rule, the air conditioning system cooling capacity in kW should match the UPS rating, as the equipment cannot generate more heat than its input power. Hot-spots are a major problem and equipment layout should take account of how the hot air is extracted from equipment or racks. Most IT equipment employs fans to blow hot air out of the rear of the chassis, and stacked equipment will have a significant heat concentration which is multiplied when open or perforated racks are set back-to-back.
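The N+1 rule can be illustrated with a short calculation. The unit capacities used here are hypothetical, and the heat load is simply taken as equal to the UPS rating in kW, in line with the rule of thumb above.

```python
import math

BTU_PER_HOUR_PER_KW = 3412.14  # standard conversion factor


def units_for_n_plus_1(heat_load_kw, unit_capacity_kw):
    """Units needed so the load is still covered with any one unit out of service."""
    n = math.ceil(heat_load_kw / unit_capacity_kw)
    return n + 1


def kw_to_btu_per_hour(kw):
    """Convert a heat load in kW to BTU/h, as often quoted for cooling units."""
    return kw * BTU_PER_HOUR_PER_KW


print(units_for_n_plus_1(20, 8))      # 4
print(round(kw_to_btu_per_hour(20)))  # 68243
```

For a 20 kW heat load and 8 kW units, three units are needed to carry the load and a fourth provides the redundancy.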

The optimal solution is to use down-flow air conditioning units to blow cooled air under a raised floor void with outlets under the equipment racks. Racks should have open bottoms and offer sufficient airflow past the equipment leading up to extraction fans at the top. Thus hot air is cycled back to the roof level, in line with the air conditioning unit inlets. Where wall or ceiling mounted air conditioning units are employed, extra care should be taken to ensure there is good airflow around the room and hot-spots are avoided.
 

4.3 Fire and Flood Protection

The only way to fully protect the environment against hazards is to create a sealed, secure chamber that isolates it from both internal and external events. For data suites the optimal solution is to construct a metal-walled sealed chamber with good heat insulation. The PowerWorks offers its range of modular IT rooms, which can be built into any location, existing or new, to meet this demand. These rooms utilise a twin-wall steel construction, sandwiching a highly effective thermal barrier. With an external temperature of 1,200°C, the interior will rise by less than 40°C. Services enter via intumescent seals, so the room is entirely impervious to smoke and water, fully protecting against fire and flood risks. The rooms have been tested and exceed the requirements of LPS 1208 cat 2, so are approved by insurers.

A dedicated fire detection and suppression system should be installed with multi-stage operation. Detection normally combines smoke detectors with infra-red heat sensors, set in zones for efficient operation. Suppression is best handled by a low-volume gaseous agent such as FM200, which extinguishes fire mainly by absorbing heat. Allowance should be made to expel spent gases externally following discharge.

Flood damage can be avoided by raising the equipment, or even the entire comms room above the risk level. The modular IT rooms can be set to any height on feet to avoid risk.
 

SUMMARY

Whilst certain protection steps are easy to identify and implement, the overall level of protection is only as good as the weakest component, so any situation should be viewed as a whole to design an integrated solution. Power failure is the most frequent risk and is the easiest to combat, so return on investment is generally very good. At the other end of the spectrum, the risk of fire is more difficult to quantify, but a fire generally leads to catastrophic failure, so fire protection needs to be addressed as a matter of priority.

The PowerWorks Ltd invites you to contact us for a free risk assessment survey to determine areas of concern and solutions to keep your business running.

The PowerWorks Ltd
June 2003
 

Copyright © 2002 The PowerWorks. All specifications are subject to change without notice.