Alarm management: a Six Sigma approach

December 2007 SCADA/HMI

1:20 p.m., 23 March 2005 – An explosion at the BP Texas City Refinery, the third-largest oil refinery in the United States, kills 15 people and injures 180.

When a distillation tower was unknowingly overfilled, extreme pressure resulted in the release of flammable hydrocarbon, which was then ignited by the backfire of a diesel truck idling some 7 metres from the blowdown drum, causing a massive explosion.

BP Texas City refinery

Several factors contributed to this disaster, which resulted in a financial loss of US$1,5 billion. However, the Final Investigation Report on the incident, released in March 2007 by the US Chemical Safety and Hazard Investigation Board, highlighted critical lapses in alarm management:

* The tower's high level alarm set-point was exceeded 65 times during the last 19 start ups, with more than 50 hours of operating time with the high level alarm activated.

* The redundant high level alarm (for the distillation tower) did not activate. When the...tower was filled beyond the set-points of both alarms...in the early morning on 23 March, 2005, only one alarm was activated. The high level alarm was triggered at 3:09 a.m. when the level reached 72% of transmitter range. The redundant hard-wired high level alarm (78% on transmitter) never sounded.

* The (redundant high level) alarm's set-point was not known to operations personnel or provided in the procedure, control data, or training materials.

* A functionality check of all alarms and instruments was also required prior to start up, but these checks were not completed.

* Tower pressure alarm set-points were frequently exceeded, yet the procedure did not address all the reasons this might happen and the steps operators should take in response.

Lessons from a disaster

Companies the world over are looking at these findings to understand not only how best to prevent such disasters from happening on their watch, but also to reassess their entire safety and risk management approach and specifically revisit their alarm management approach and practices.

Alarm systems have been an intrinsic part of plant safety management for a long time. They play a critical role in alerting operators to a change in operations at a process plant, informing them about the nature of the change and guiding them to implement corrective action.

Poor alarm management results in:

* Increased downtime. When source alarms cannot cut through the clutter, real problems are ignored for too long, resulting in process breakdowns. This translates into lost production as well as increased operator costs through overtime, and higher lifecycle cost of equipment through increased maintenance costs.

* Reduced plant productivity. When operators do not spot the early signs of a developing problem, their response to alarm floods (large numbers of alarms annunciated at the time of a process upset) typically takes the form of stabilising the process by reducing the rate of throughput.

* Reduced quality. When alarm systems fail to prompt operators to take corrective action at the right time, off-spec product has to be contended with.

* Reduced operator effectiveness, higher operator stress levels and increased operator staffing costs.

* In the worst-case scenario, alarm-related confusion can result in or aggravate serious industrial accidents.

* Increased insurance premiums on plant equipment or fines incurred by not meeting regulatory requirements.

Too much of a good thing

However, despite their obvious significance, alarms have become yet another case of 'too much of a good thing', making less functional what was once an effective safety and productivity improvement system. How did this come about?

When control systems became mainstream, they also brought down the cost of alarms, which led to their proliferation. After all, engineers no longer had a strong cost disincentive to configuring excessive numbers of alarms. With this excess came reduced visibility of urgent and underlying problems, more clutter for operators to deal with and longer response times before appropriate corrective action was taken.

Clear evidence of this came from an industrial disaster that predates the one at the BP Texas City Refinery: the 24 July 1994 explosion at the Texaco Refinery in Milford Haven, Wales. According to the Health and Safety Executive (HSE), which investigated the incident, "the flood of alarms greatly decreased the chances of the operators restoring control in the plant." This is not surprising, given that during the incident alarms were presented at the rate of 20-30 per minute, with operators contending with 275 alarms in the final 11 minutes. As HSE put it, "warnings of the developing problems were lost in the plethora of instrument alarms triggered in the control room, many of which were unnecessary and registering with increasing frequency, so operators were unable to appreciate what was actually happening."

Going back further in time, the partial meltdown at the Three Mile Island Unit 2 nuclear power plant near Middletown, Pennsylvania (28 March 1979) also saw a similar alarm flood (110 alarms presented during the incident, preventing operators from understanding the real problem for 2,5 hours).

Adopting a systematic approach

To help organisations move away from the ad hoc approach of the past and adopt a more systematic and rational approach to alarm management, in 1999 the Engineering Equipment and Materials Users Association (EEMUA) released the document '191 Alarm Systems - A Guide to Design, Management and Procurement'.

This guide has rightly become the global reference point for alarm management. Its second edition - available from June 2007 - significantly updates and builds on the first. Designers and operators have much to gain from using EEMUA 191 when undertaking improvement of their existing alarm systems or launching into a new alarm management program.

The essentials

To understand how best to improve an existing alarm system or introduce a new alarm management program, it is useful to approach the task using the steps outlined in the well-known Six Sigma sequence: define, measure, analyse, improve, control.

Define

Successful alarm management is based on a comprehensive and consistent alarm philosophy document that defines:

* Business objectives to be met.

* Needs and requirements of the users of the alarm system.

* Alarm system design principles.

* Compliance parameters.

* Roles and responsibilities.

* Criteria for alarm generation, setting, prioritisation and presentation.

* Management of Change (MOC) (for example, tracking authorised and unauthorised changes to alarm settings, alarm suppression or shelving).

* Training/maintenance parameters.

* Escalation guidelines (moving from normal status mode where operators are trying to keep the process within the 'safe envelope' to emergency/disaster management).
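The Management of Change item above lends itself to a simple automated check. The following is a minimal sketch, assuming the approved alarm set-points are kept in one table and the live settings can be exported from the control system into another; the tag names, values and the function name audit_alarm_settings are hypothetical and are not taken from any particular product.

```python
# Sketch only: both dictionaries are hypothetical exports (tag -> set-point).
def audit_alarm_settings(approved, running):
    """Compare approved alarm set-points with those found on the running system."""
    findings = []
    for tag in sorted(set(approved) | set(running)):
        if tag not in running:
            findings.append((tag, "missing from running system"))
        elif tag not in approved:
            findings.append((tag, "not in approved list"))
        elif approved[tag] != running[tag]:
            findings.append((tag, f"changed: {approved[tag]} -> {running[tag]}"))
    return findings


# Illustrative data, not real plant settings.
approved = {"LT-101.HI": 72.0, "PT-202.HI": 15.0, "TT-303.LO": 5.0}
running = {"LT-101.HI": 72.0, "PT-202.HI": 18.0, "FT-404.HI": 90.0}

for tag, finding in audit_alarm_settings(approved, running):
    print(tag, finding)
```

Each finding would still need to be checked against the MOC records to decide whether the change was authorised.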

Measure

Typically, with thousands of alarms per site, a 'stocktake' of the existing process, alarms and trends is critical before any changes are implemented. But while engineers and designers appreciate the benefits of such an exercise, the task - by virtue of its scale - can be quite daunting.

This is where plant Historians (central data repositories that gather, historise, archive and distribute plant data) can simplify the task. For example, CitectSCADA Reports, the plant-wide reporting solution from Citect, is capable of accurately recording all alarm data and tag values at high speed. Such a tool can help engineers and operators to gather and organise alarm data from across the entire site.

Analyse

If gathering data from thousands of alarms appears daunting, then analysing such data to derive useful insight can be even more formidable an undertaking.

Some plant Historians provide assistance with this by helping engineers and operators with the following:

* Event analysis: pulling up all alarms that occurred at a given point in time, be they basic process alarms or aggregated alarms or even critical safety-related alarms.

* Alarm and event archiving: historising all alarms and events for long term analysis.

* Alarm analysis:

- Identifying source (root) alarms around which other, consequential alarms are triggered.

- Identifying nuisance alarms such as stale alarms (that remain present for extended periods of time), chattering alarms (that go in and out of alarm mode in a short span of time), or duplicate alarms (that persistently occur within a short period of time of another alarm). Pareto analysis can help rank nuisance alarms by frequency to help detect the so-called 'bad actors'.

- Identifying shelved alarms (temporarily suppressed) or permanently suppressed alarms (that are prevented from appearing on the operator's screen).

- Alarm setting analysis by the state/mode of operation of the plant.
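The nuisance-alarm analysis described above can be illustrated with a short sketch. It assumes the alarm log has been exported from the Historian as (timestamp, tag) pairs; the function names, the tag names and the chattering rule (three activations within one minute) are assumptions made for illustration, not definitions from EEMUA 191 or from any product.

```python
from collections import Counter
from datetime import datetime, timedelta


def pareto_rank(events):
    """Return (tag, count, cumulative share) tuples, most frequent tag first."""
    counts = Counter(tag for _, tag in events)
    total = sum(counts.values())
    ranked, running = [], 0
    for tag, n in counts.most_common():
        running += n
        ranked.append((tag, n, running / total))
    return ranked


def chattering_tags(events, min_repeats=3, window=timedelta(minutes=1)):
    """Tags that activate at least min_repeats times within any one window."""
    by_tag = {}
    for ts, tag in events:
        by_tag.setdefault(tag, []).append(ts)
    flagged = set()
    for tag, times in by_tag.items():
        times.sort()
        for i in range(len(times) - min_repeats + 1):
            if times[i + min_repeats - 1] - times[i] <= window:
                flagged.add(tag)
                break
    return flagged


# Illustrative log: one chattering tag, one occasional tag, one rare tag.
t0 = datetime(2007, 12, 1, 8, 0, 0)
events = (
    [(t0 + timedelta(seconds=10 * i), "LT-101.HI") for i in range(6)]
    + [(t0 + timedelta(minutes=20 * i), "PT-202.HI") for i in range(3)]
    + [(t0 + timedelta(hours=1), "TT-303.LO")]
)

for tag, count, share in pareto_rank(events):
    print(f"{tag}: {count} alarms, cumulative {share:.0%}")
print("Chattering:", sorted(chattering_tags(events)))
```

The cumulative share column is what makes the ranking a Pareto view: it shows how few tags account for most of the alarm load.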

Benefits of analysis

Cutting through the clutter: EEMUA 191 suggests that 150 alarms per day (about one every 10 minutes) presented to an operator is 'very likely to be acceptable' and that 300 alarms per day (about one every 5 minutes) is 'manageable'. In reality it is not unusual to record tens of thousands of alarms per operator per day, which makes such a system self-defeating. Identifying nuisance alarms helps to eliminate unnecessary or ineffective alarms, bringing the number of alarms per operator down to a more manageable level. To do this, a clear justification is required for each alarm. An alarm's reason for being should be tied to a specific problem or abnormal situation and to a specific, defined operator response. If there is no problem, or if the alarm is not intended to elicit a specific operator action, then its legitimacy should be questioned. A process indicator or alert does not automatically equate to an alarm.
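As a rough illustration of the figures quoted above, the sketch below converts a daily alarm count into an average interval between alarms and labels it against the 150 and 300 alarms-per-day figures. Treating those two figures as upper limits of each band is an assumption of this sketch, and the function names are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def mean_minutes_between_alarms(alarms_per_day):
    """Average spacing between alarms, in minutes, for one operator."""
    return MINUTES_PER_DAY / alarms_per_day


def classify_daily_alarm_load(alarms_per_day):
    """Label a daily count using the two figures quoted in the article."""
    if alarms_per_day <= 150:  # assumption: 150 is the top of this band
        return "very likely to be acceptable"
    if alarms_per_day <= 300:  # assumption: 300 is the top of this band
        return "manageable"
    return "above the quoted guidance"


for count in (150, 300, 20000):
    interval = mean_minutes_between_alarms(count)
    print(f"{count} alarms/day: one every {interval:.2f} min, "
          f"{classify_daily_alarm_load(count)}")
```

At 150 and 300 alarms per day the intervals work out to 9,6 and 4,8 minutes respectively, which is where the rounded 'every 10 minutes' and 'every 5 minutes' come from.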

Under the carpet: analysing shelved alarms can help highlight potential reductions in alarm numbers as well. More importantly, looking at how long important alarms have been shelved or permanently suppressed allows operator practices to be corrected (once an alarm has been shelved, there is no guarantee that the operator will remember to reactivate it).
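A check on shelving duration could look something like the sketch below. It assumes a shelving log of (tag, shelved_at, unshelved_at) records in which unshelved_at is None while the alarm is still shelved; the 12-hour limit, the tag names and the function name are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta


def overdue_shelved(shelve_log, now, limit=timedelta(hours=12)):
    """Return (tag, duration, still_shelved) for alarms shelved longer than limit."""
    overdue = []
    for tag, shelved_at, unshelved_at in shelve_log:
        end = unshelved_at if unshelved_at is not None else now
        duration = end - shelved_at
        if duration > limit:
            overdue.append((tag, duration, unshelved_at is None))
    # Longest-shelved alarms first, so the worst cases head the report.
    return sorted(overdue, key=lambda record: record[1], reverse=True)


# Illustrative shelving log, not real plant data.
now = datetime(2007, 12, 3, 6, 0)
shelve_log = [
    ("LT-101.HI", datetime(2007, 12, 1, 6, 0), None),
    ("PT-202.HI", datetime(2007, 12, 2, 22, 0), datetime(2007, 12, 3, 2, 0)),
    ("TT-303.LO", datetime(2007, 12, 2, 6, 0), datetime(2007, 12, 3, 0, 0)),
]

for tag, duration, still_shelved in overdue_shelved(shelve_log, now):
    status = "still shelved" if still_shelved else "since reactivated"
    print(tag, duration, status)
```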

The heart of the matter: identifying source (root) alarms and the consequential alarms they trigger helps ensure that prioritisation models are configured so that, in an alarm flood, the source alarm does not get lost or go unnoticed. For this kind of alarm and event analysis, most Historians compare one set of alarm data with another (depending on the query placed). However, it is even more useful to be able to compare alarm data with plant/process trend data. This is significant because alarms, being reactive in function, cannot by themselves anticipate a process drift towards an abnormality that could eventually lead to breakdown or process failure. Correlating alarm data with trend information can reveal such drift. It can also help in fine-tuning alarm settings and in linking alarm spikes to specific process conditions (startups, shutdowns, changes in process set-points such as tank levels, pressure and temperature), changes in instrumentation, or new or changed control system configurations. In addition, it is by analysing operator response to alarms (and not by focusing on alarm data alone) that poor alarm system design is identified.

CitectSCADA Reports offers such an option, since it historises both alarms and plant process trends. This way, alarm and event data can be correlated to trend data from the plant to throw up anomalies or areas for alarm rationalisation or even assist in incident reviews.
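To make the idea of correlating alarms with trends concrete, here is a minimal sketch that pulls the trend samples leading up to an alarm and estimates how fast the variable was drifting. It assumes the trend has been exported as a time-sorted list of (timestamp, value) pairs; the function names, the 30-minute lookback and the sample values are hypothetical, and the sketch does not use any CitectSCADA Reports interface.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta


def trend_before_alarm(trend, alarm_time, lookback=timedelta(minutes=30)):
    """Samples from a time-sorted trend that fall in the window before the alarm."""
    times = [ts for ts, _ in trend]
    lo = bisect_left(times, alarm_time - lookback)
    hi = bisect_right(times, alarm_time)
    return trend[lo:hi]


def drift_per_minute(samples):
    """Average rate of change between the first and last sample, per minute."""
    if len(samples) < 2:
        return None
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    minutes = (t_last - t_first).total_seconds() / 60
    return (v_last - v_first) / minutes if minutes else None


# Illustrative trend: a level rising by 2 units every 5 minutes.
start = datetime(2007, 12, 1, 10, 0)
trend = [(start + timedelta(minutes=5 * i), 40.0 + 2 * i) for i in range(17)]
alarm_time = datetime(2007, 12, 1, 11, 9)

window = trend_before_alarm(trend, alarm_time)
print(len(window), "samples, drift per minute:", drift_per_minute(window))
```

A steady drift in the window before a high-level alarm suggests the alarm was the end of a slow trend rather than a sudden upset, which is the kind of insight that alarm data alone cannot provide.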

Improve

The analysis stage seeks to assess each alarm from the standpoint of the alarm philosophy of the organisation and typically leads to certain specific improvements:

* Reduction in needless alarms.

* Recalibration of alarm parameters where necessary (such as action, set-point, detection time, etc).

* Bringing in consistency in alarm settings where desirable.

* Prioritisation of alarms where required.

* Reorganisation of the presentation of alarms if needed (to ensure relevance to operator, visibility, etc).

This process of alarm rationalisation and system improvement is clearly a laborious, expensive and disruptive effort, but the support of robust alarm analysis can help simplify this step. While implementing this step, the temptation is to focus only on the 'bad actors', i.e. the low-hanging fruit. As an initial focus this is appropriate, given that it provides immediate relief to an overloaded system. However, alarm floods (which involve many more alarms than just the top five or 10 most frequent ones) can be minimised only by undertaking a total rationalisation exercise of all alarms in the system.

Control

Successful alarm management rests on the tools used to ensure that the key performance indicators (KPIs) set out are achieved and that gains are sustained. This also involves creating appropriate training material for new personnel, procedures and manuals for management of change (MOC), and ongoing review of analysis findings from the Historian. The operative word here is 'ongoing', because new nuisance alarms have a habit of appearing surreptitiously (probably as a result of instrumentation failure, changes to plant equipment and process conditions, lack of adherence to MOC procedures, or inadequate justification for new alarm additions).

Synergise

In a 'learning organisation', the fruit of analysis of the alarm system is shared with other stakeholders who are not necessarily on the plant floor. This is also helpful when plant engineers need to keep senior management informed of progress in alarm system improvements and to justify future investments in the alarm system. With CitectSCADA Reports v4 and above, which uses an embedded Microsoft SQL Server 2005, operators, engineers and management deal with an industry-standard data storage and exchange tool. Reports can be delivered in a variety of formats (such as PDFs for regulatory reports, Excel spreadsheets that allow any user to immediately extract data for further analysis, or web pages that can be integrated with other business systems in the organisation).

Citect also offers consulting services, both directly and through its Professional Services and System Integration partners, to add further value by guiding customers through the process of developing their alarm philosophy and rationalising their alarms.

The next level

To establish the extent of improvement required in an alarm system (gap analysis) or to measure improvements after a new alarm management program has been initiated, it is useful to compare the system with industry best practice.

For this, benchmarking tools such as Citect's Meta can prove useful. Meta enables comparison of KPIs across plants, divisions and countries, either within an organisation or across an industry peer group. Some alarm KPIs that could form the basis for such benchmarking include:

* Average number of alarms per hour.

* Maximum number of alarms per hour.

* Percentage of hours where there were >30 alarms per hour.

* Operator response time.
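The first three KPIs in this list can be derived directly from alarm timestamps. The sketch below is one possible way of doing so, assuming the alarm activations for a reporting period are available as a list of datetimes; the function name and the sample data are hypothetical. Operator response time is left out because it would also need acknowledgement timestamps.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_alarm_kpis(alarm_times, period_start, period_hours, threshold=30):
    """Average, maximum and share of busy hours for a reporting period."""
    counts = Counter()
    for ts in alarm_times:
        hour_index = int((ts - period_start).total_seconds() // 3600)
        if 0 <= hour_index < period_hours:  # ignore alarms outside the period
            counts[hour_index] += 1
    per_hour = [counts[h] for h in range(period_hours)]
    busy_hours = sum(1 for n in per_hour if n > threshold)
    return {
        "average_per_hour": sum(per_hour) / period_hours,
        "max_per_hour": max(per_hour),
        "pct_hours_over_threshold": 100 * busy_hours / period_hours,
    }


# Illustrative four-hour period: 5, 40, 0 and 15 alarms in successive hours.
start = datetime(2007, 12, 1, 0, 0)
alarm_times = (
    [start + timedelta(minutes=10 * i) for i in range(5)]
    + [start + timedelta(hours=1, minutes=i) for i in range(40)]
    + [start + timedelta(hours=3, minutes=2 * i) for i in range(15)]
)

print(hourly_alarm_kpis(alarm_times, start, period_hours=4))
```

Hours with no alarms are counted as zero rather than skipped, so quiet periods pull the average down as they should.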

The 'human' dimension

In the final analysis, successful alarm management is not about the equipment or the alarm, but about people who impact and are impacted by the alarm system - operators, process and control engineers, maintenance personnel, shift supervisors, instrument and control system technicians, designers, safety officers, training staff and senior management. To implement a successful alarm management program requires factoring in the different expectations and priorities as well as the differing levels of awareness and understanding among these diverse groups of stakeholders.

Tools that can help to effectively share alarm analytics and the resulting insight across these stakeholders in a simple, relevant, meaningful and easy-to-understand format will help ensure that alarm management is fed back the multilevel and multidisciplinary input it requires to validate it and keep it relevant to the business objectives and the alarm philosophy of the organisation. Tools that can take alarm KPIs and benchmark them against industry best practice could take alarm management to the next level and provide the organisation with alarm report cards that can directly result in improved productivity, profitability and safety.

For more information contact Niconette du Toit, Citect, +27 (0)11 699 6600, www.citect.com




