Sunday, October 26, 2008

What is a process safety incident?

During about 10 years at a major Canadian petrochemicals producer, a number of undesired events occurred at the plant and at other facilities we collaborated with. Among them were the following:
  • An employee jumps from a 4-foot-high platform onto a concrete floor in the packaging area of a polyethylene plant. Unfortunately he does not land on his feet, but falls and breaks his back. He dies instantly. No chemicals were involved. No process equipment was involved. No acute release took place. It is definitely an undesired event.
  • During maintenance work on a building housing a polyvinyl chloride plant, a ground-operated man lift was used. The bucket controls malfunctioned, and an employee in the bucket was squeezed to death against piping on the outside of the building. No chemicals were involved. No acute release took place. It is definitely an undesired event.
  • An unplanned shutdown of a polyethylene reactor is an abnormal process condition, which leads to the burning of a significant amount of hydrocarbons in the flare and an intense heat flux to nearby ground areas, up to more than 100 feet away from the flare. This event is also quite costly, probably in excess of $100,000. Chemicals are involved, but they are released or burned through equipment designed for this event. There is definitely an acute release, but it is through equipment designed for it. It is definitely an undesired event.
  • During patching of some software on a Honeywell PMX system, two constants in the computer system were interchanged. As a result, valves that should have closed did the opposite, i.e. they opened, and valves that should have opened also did the opposite, i.e. they closed. The result was significant process deviations and releases through safety valves. This was definitely an undesired event.
Clearly not all undesired events should be counted as process safety events, or perhaps they should only be counted as process safety events to some degree. According to the CCPS criteria for a process safety incident, none of the above events qualifies as a process safety incident. However, they all happened because there was a chemical plant.

Could these events have been prevented? Yes. The fall from the packaging platform could have been prevented by equipping it with stairs instead of a vertical ladder. The squeezing of an employee could have been prevented by installing a dead-man button in the bucket. The incorrect patching of the control system software could have been prevented by patching during plant downtime and testing before plant startup. Would these events have happened if there had been no chemical plant at the site? No. They are linked to the production of chemicals!

All four events were treated by management as major undesired events associated with their facilities. The first two events had a significant impact on every engineer at the site where they occurred. Three of the four events were treated as significant incidents by management. None of them qualifies as a process safety incident according to the CCPS criteria. Why? Could it be that the purpose of the CCPS criteria is only to count events caused by operational errors in running the process, maintenance errors in maintaining the process, or process design errors? I don't know.

A few years ago Shell UK reported in its annual report that it had experienced two fatalities in the previous year. The fatalities did not involve Shell employees. They did not involve contractors working at Shell UK sites. They were road accidents involving a subcontractor.

If we want to improve the performance of our chemical plants, we need to focus on undesired events and use the skills of our employees to understand why they occurred and how we can prevent them in the future. Whether a particular event is labelled a process safety incident or something else is not really that important, as long as we see the event as an opportunity to improve our performance. This approach is what drove the quality movement in Japan after WWII, and what drove Ford and many other companies in the 1980s: a clear focus and a belief that we can do better!

CCPS Global Process Safety Metrics - How can they be improved?

I was quite excited when I read the presentation "Global Process Safety Metrics - Driving Consistency and Improvement" by Tim Overton, who is chief safety engineer at The Dow Chemical Company. These first results of the CCPS Metrics Project were presented at this year's AIChE Spring Meeting and CCPS Conference on April 6th. According to the presentation, the goal of the process safety metrics is to help company management answer questions such as
  • Is our company headed for a major accident?
  • How is our company's process safety performance compared to others?
and questions more aimed at the general public, such as
  • Which companies are becoming safer?
  • Is the chemical industry improving its process safety performance?
The CCPS Metrics Project, of which Tim Overton is chairman, was initiated by the CCPS following recommendations in the Baker Panel Report and the CSB Report on the explosion and fire at BP's Texas City Refinery on March 23rd, 2005. Due to the apparent failure of BP's corporate management to monitor process safety in its plants, the Baker Panel recommended that BP should "develop, implement, maintain and periodically update an integrated set of leading and lagging performance indicators for more effectively monitoring the process safety performance". The CSB recommended that the API and USW should "create performance indicators for process safety in the refinery and petrochemical industries" and involve relevant scientific organizations and disciplines in this work.

In his presentation Tim Overton states that the current situation is characterized by process safety metrics that differ from organization to organization and are likely based on incident definitions that are not well aligned with the actual hazard of the event. This may well be so, but how does this prevent such metrics from being used to drive improvement in process safety performance within the particular company that uses them?

The aims of the CCPS Process Safety Metrics Project are to:
  • Define metrics that focus on process safety as contrasted to personal safety
  • Define common industry-wide lagging process safety metrics
  • Define near-miss or other lagging process safety metrics
  • Define leading indicator process safety metrics.
To achieve these aims it is, of course, necessary to define what constitutes a process safety incident. In the presentation the criteria for a process safety incident are a) an employee or contractor lost time injury, b) a fire or explosion resulting in direct damage costs of at least $25,000, and c) a release from primary containment of a quantity greater than the chemical release threshold quantity. I see several problems with such a definition. Firstly, an event that involved only first aid but had the potential for large-scale injury and/or damage is not counted. Secondly, the amount of damage that costs $25,000 will change with time and probably also with location. Over time this could distort the historical trend of the metrics. Thirdly, defining threshold quantities indirectly defines distances of concern, as used e.g. in the Dow Chemical Exposure Index. This is very relevant in risk assessment, but in my opinion it seems odd in performance monitoring.
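To make the crispness of this definition concrete, here is a minimal Python sketch of a yes/no test built from criteria a) to c). It is an illustration under assumptions, not the CCPS procedure: the `Event` fields, the function name and the threshold table are hypothetical, and the `process_involved` flag stands in for the chemical or process involvement that the examples in my previous post suggest is also required. Only the $25,000 damage limit and the 1000 kg gasoline threshold come from the CCPS material discussed here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold table (kg released). Only the gasoline value is taken
# from the CCPS booklet example discussed in this post.
THRESHOLD_KG = {"gasoline": 1000.0}
DAMAGE_LIMIT_USD = 25_000.0  # criterion b): direct damage cost of at least $25,000


@dataclass
class Event:
    process_involved: bool = True        # hypothetical: chemicals/process equipment involved
    lost_time_injury: bool = False       # criterion a)
    fire_or_explosion: bool = False      # criterion b)
    direct_damage_usd: float = 0.0
    material: Optional[str] = None       # criterion c): released material
    released_kg: float = 0.0             # amount released, same basis as the threshold


def is_process_safety_incident(event: Event) -> bool:
    """Crisp classification: an event either is or is not a process safety incident."""
    if not event.process_involved:
        return False
    if event.lost_time_injury:
        return True
    if event.fire_or_explosion and event.direct_damage_usd >= DAMAGE_LIMIT_USD:
        return True
    threshold = THRESHOLD_KG.get(event.material) if event.material else None
    return threshold is not None and event.released_kg > threshold  # criterion c)


# The fall from the packaging platform: a fatal injury, but no process involvement.
print(is_process_safety_incident(Event(process_involved=False, lost_time_injury=True)))  # False
# Gasoline releases just below and just above the 1000 kg threshold.
print(is_process_safety_incident(Event(material="gasoline", released_kg=933.0)))   # False
print(is_process_safety_incident(Event(material="gasoline", released_kg=1017.0)))  # True
```

Every event lands on one side of a hard boundary, however close it is to that boundary.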

Well, if we can somehow agree on how process safety incidents are counted, then the following lagging metrics can be defined:
  • Count of Process Safety Incidents (PSI).
  • Process Safety Incident Rate (PSR) = total count of all process safety incidents times 200,000 divided by total employee and contractor work hours.
Furthermore, by assigning a severity score to each process safety incident, a Process Safety Severity Rate (PSSR) can be defined as the total severity score for all process safety incidents times 200,000 divided by total employee and contractor work hours. The 200,000 hours used in these calculations correspond, of course, to 100 employees working 40 hours per week for 50 weeks per year. Using this number creates a parallel to normalized rates such as the fatal accident rate (FAR), although the FAR is normally expressed per 10^8 exposure hours; the 200,000-hour basis itself is the one used for OSHA injury and illness incidence rates. Indirectly, a rate can thus be read as the number of incidents per 100 full-time workers per year.
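The arithmetic behind PSR and PSSR is simple enough to show in a few lines of Python. This is a sketch, not a CCPS-specified implementation: the function names are my own, and the severity scores in the example are hypothetical, since the scoring scheme itself is not described here.

```python
from typing import Iterable

BASE_HOURS = 100 * 40 * 50  # 100 employees x 40 h/week x 50 weeks/year = 200,000 h


def process_safety_incident_rate(incident_count: int, work_hours: float) -> float:
    """PSR: process safety incidents per 200,000 employee and contractor work hours."""
    if work_hours <= 0:
        raise ValueError("work_hours must be positive")
    return incident_count * BASE_HOURS / work_hours


def process_safety_severity_rate(severity_scores: Iterable[float], work_hours: float) -> float:
    """PSSR: total severity score per 200,000 employee and contractor work hours."""
    if work_hours <= 0:
        raise ValueError("work_hours must be positive")
    return sum(severity_scores) * BASE_HOURS / work_hours


# Hypothetical site: 500 full-time people for one year = 1,000,000 work hours.
hours = 500 * 40 * 50
print(process_safety_incident_rate(3, hours))           # 0.6
print(process_safety_severity_rate([1, 3, 9], hours))   # 2.6
```

Because both rates share the same denominator, the ratio PSSR/PSR is simply the average severity score per incident.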

My concern about the CCPS Metrics Project increased significantly after reading the booklet "CCPS Process Safety Metrics". This booklet carefully describes which events count as process safety incidents and which do not. The different metrics appear to have been created by engineers for engineers. Let me give an example. The booklet describes how a release of 933 kg per hour of gasoline is not a process safety incident, because it is below the threshold of 1000 kg defined for this kind of material. How accurately can we, in real-life situations, calculate the amount involved in a release? Can we be certain that this release did not involve 1017 kg per hour of gasoline? This difference, I believe, could be due simply to the value of the discharge coefficient used in the estimate or to the uncertainty in the level measurement used to calculate the amount.
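The sensitivity to the discharge coefficient is easy to illustrate. For a liquid leaking through a hole, a common estimate is the orifice equation, mass flow = Cd * A * sqrt(2 * rho * dP), so the estimated release scales linearly with Cd. The sketch below assumes, hypothetically, that the 933 kg per hour figure was obtained with Cd = 0.61 (a typical value for a sharp-edged orifice). Using Cd = 0.665 instead, a difference of about 9%, pushes the same release above the 1000 kg threshold. The hole size, density and pressure in the proportionality check are also hypothetical.

```python
import math


def liquid_release_rate_kg_per_h(cd: float, hole_diameter_m: float,
                                 density_kg_m3: float, delta_p_pa: float) -> float:
    """Orifice equation for an incompressible liquid: m = Cd * A * sqrt(2 * rho * dP)."""
    area_m2 = math.pi * hole_diameter_m ** 2 / 4.0
    return cd * area_m2 * math.sqrt(2.0 * density_kg_m3 * delta_p_pa) * 3600.0


def rescale_for_cd(estimate: float, cd_used: float, cd_alternative: float) -> float:
    """The release rate is proportional to Cd, so an existing estimate can be rescaled."""
    return estimate * cd_alternative / cd_used


THRESHOLD_KG = 1000.0
for cd in (0.61, 0.665):
    rate = rescale_for_cd(933.0, cd_used=0.61, cd_alternative=cd)
    print(f"Cd={cd}: {rate:.0f} kg/h, above threshold: {rate > THRESHOLD_KG}")

# Proportionality check with a hypothetical 10 mm hole, 740 kg/m3 liquid, 2 bar overpressure.
r_low = liquid_release_rate_kg_per_h(0.61, 0.01, 740.0, 2e5)
r_high = liquid_release_rate_kg_per_h(0.665, 0.01, 740.0, 2e5)
print(f"rate ratio: {r_high / r_low:.4f}")
```

A crisp 1000 kg line therefore classifies the same physical event differently depending on an engineering judgement about Cd.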
In my view the ideas of fuzzy logic, which have been used to great advantage in e.g. cement kiln control, should probably be used to decide whether a particular event was a process safety incident or not. This would introduce the concept of the degree to which a particular event is a process safety incident.
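A minimal sketch of what such a fuzzy criterion could look like, assuming a simple linear ramp membership function and a hypothetical band of plus or minus 20% around each crisp threshold. The band width, the function names and the choice of the max operator (the standard fuzzy OR) to combine the release and damage criteria are my own illustrative assumptions, not anything proposed by CCPS.

```python
def ramp_membership(x: float, lower: float, upper: float) -> float:
    """Degree (0..1) to which x belongs to the fuzzy set 'exceeds the threshold'."""
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    if x <= lower:
        return 0.0
    if x >= upper:
        return 1.0
    return (x - lower) / (upper - lower)


def psi_degree(released_kg: float, threshold_kg: float,
               damage_usd: float = 0.0, damage_limit_usd: float = 25_000.0,
               band: float = 0.2) -> float:
    """Degree to which an event is a process safety incident (hypothetical +/- band)."""
    release = ramp_membership(released_kg, threshold_kg * (1 - band), threshold_kg * (1 + band))
    damage = ramp_membership(damage_usd, damage_limit_usd * (1 - band), damage_limit_usd * (1 + band))
    return max(release, damage)  # fuzzy OR (max operator)


for kg in (933.0, 1017.0):
    print(f"{kg:.0f} kg of gasoline -> degree {psi_degree(kg, 1000.0):.2f}")
```

With these assumptions the 933 kg and 1017 kg releases from the booklet example get degrees of about 0.33 and 0.54 instead of a hard no and yes, which reflects the uncertainty in the release estimate far better.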

The idea of a degree of process safety event could also be applied to events that, according to the CCPS booklet, are currently not considered process safety incidents. An example is a fire in a laboratory associated with a plant. The laboratory is there only because the plant is, and the laboratory works with the materials it does because of the plant. So in some sense the events in the laboratory occur due to the presence of the plant. Without the plant there would be no need for the laboratory. Therefore, in my view, undesired events in the laboratory should to some extent be considered process safety events. By how much remains to be defined.

Although I may appear critical of the current results of the CCPS Metrics Project, I do believe that defining common lagging metrics is a step in the right direction. However, without some direction as to how these metrics are to be used inside and outside the companies that have to calculate them, I fear they will end up being just another administrative burden on the companies. The idea of fuzzifying the criteria for defining a process safety incident should also be explored. Therefore I call upon the CCPS and others involved in process safety, such as the EPSC and the EFCE Working Party on Loss Prevention and Safety Promotion, to start thinking about how the proposed metrics and other metrics can be linked into the company management system, what actions company management need to take based on whatever metrics they have access to, and how the border between what is and what is not a process safety event could be softened.

Leading metrics are much more difficult to define. At the 2007 Loss Prevention Symposium in Edinburgh, Unni Nord Samdal from the Norsk Hydro Oil & Energy Research Centre in Porsgrunn gave a presentation on an indicator for technical safety called the T-Rate. It showed how difficult it is, even within a single company, to create meaningful leading indicators related to process safety performance. The CCPS Metrics Project suggests the following as possible leading indicators:
  • Mechanical integrity inspections done divided by mechanical integrity inspections due.
  • Number of past due action items divided by total number of action items.
  • Percent of MOCs satisfying the MOC policy.
  • Percent of operators trained on schedule.
  • Survey of safety culture.
  • Number of activations of safety systems.
  • Number of activations of relief valves.
  • Number of deviations outside operating limits.
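Several of the indicators in this list are simple ratios. A minimal Python sketch of how they might be computed from monthly counts is shown here; the function names and the example numbers are hypothetical, and returning `None` when the denominator is zero (for example, no inspections due that month) is my own choice rather than anything the CCPS booklet prescribes.

```python
from typing import Dict, Optional


def ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None if the ratio is undefined (nothing due)."""
    if numerator < 0 or denominator < 0:
        raise ValueError("counts cannot be negative")
    return None if denominator == 0 else numerator / denominator


def as_percent(value: Optional[float]) -> Optional[float]:
    """Convert a fraction to a percentage, passing None through unchanged."""
    return None if value is None else 100.0 * value


def leading_ratio_indicators(mi_done: int, mi_due: int,
                             actions_past_due: int, actions_total: int,
                             mocs_compliant: int, mocs_total: int,
                             operators_trained_on_schedule: int, operators_total: int
                             ) -> Dict[str, Optional[float]]:
    """Hypothetical helper computing the ratio-type leading indicators from the list."""
    return {
        "mi_inspections_done_per_due": ratio(mi_done, mi_due),
        "past_due_action_fraction": ratio(actions_past_due, actions_total),
        "moc_policy_compliance_pct": as_percent(ratio(mocs_compliant, mocs_total)),
        "operators_trained_on_schedule_pct": as_percent(
            ratio(operators_trained_on_schedule, operators_total)),
    }


# Hypothetical month at one plant.
print(leading_ratio_indicators(45, 50, 6, 40, 19, 20, 27, 30))
```

Even this small example shows that each ratio needs decisions about data collection and edge cases before it can be trended.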
I am inclined to favour the "simpler is better" principle. Leading indicators that are easy to understand and easy to use are also more likely to be used by management. From the above list these would be the number of activations of safety systems, the number of activations of relief valves, and the number of deviations outside operating limits. I would add to this the alarm rate, i.e. the number of console alarms per hour or per shift.
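The alarm rate is perhaps the simplest of these indicators to compute. The Python sketch below counts console alarms in a time window and expresses them per hour and per shift. The alarm log format (a list of timestamps), the function name and the 12-hour default shift length are hypothetical assumptions; a plant with 8-hour shifts would simply pass a different `shift_hours`.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def alarm_rates(alarm_times: Iterable[datetime], start: datetime, end: datetime,
                shift_hours: float = 12.0) -> Tuple[float, float]:
    """Return (alarms per hour, alarms per shift) for alarms in [start, end)."""
    window_hours = (end - start).total_seconds() / 3600.0
    if window_hours <= 0:
        raise ValueError("end must be after start")
    count = sum(1 for t in alarm_times if start <= t < end)
    per_hour = count / window_hours
    return per_hour, per_hour * shift_hours  # shift_hours is a hypothetical default


# Hypothetical log: one alarm every 40 minutes over a 24-hour day.
day_start = datetime(2008, 10, 26)
log = [day_start + timedelta(minutes=40 * i) for i in range(36)]
print(alarm_rates(log, day_start, day_start + timedelta(hours=24)))  # (1.5, 18.0)
```

Counts of safety system activations, relief valve activations and operating limit deviations can be trended in exactly the same way from their respective event logs.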

I once worked for a large Canadian oil company where the alarm rate in the refineries and chemical plants was rather low. At the time we had rather good safety performance in terms of the lagging indicators (first aid cases, lost time injuries and fatalities), so this has colored my belief that keeping the alarm rate low by using good process control is a significant step towards good process safety performance.