Wednesday, August 17, 2016

The reporting about a flash fire at an oil terminal

Yesterday my Hydrocarbon Processing Daily News reported on a flash fire at the Sunoco Logistics Terminal in Nederland, Texas last Friday. This is what the story looked like in the email I received from Hydrocarbon Processing:

A Sunoco spokesperson stated to Hydrocarbon Processing that the workers were either preparing to weld or welding a pipe to a new crude storage tank.

But what do we know?
We know that a spark ignited some hydrocarbon vapors in and around the workers, resulting in a flash fire which critically injured four of the seven contract workers on the particular job. We also know that the other workers on the job suffered minor injuries, and that the critically injured were transported to different Texas burn centers.

Unfortunately, Hydrocarbon Processing was not sufficiently concerned about the status of the critically injured workers to provide an update on their condition in the story distributed yesterday - four days after the fire. However, local officials talking to the media on Saturday morning didn't have updates on the status of the critically injured either.

And what do we not know?
We also know that the local sheriff's office issued the following statement: "We would like to reassure the public that there was no danger to residents who live near the plant." I would say that statement is wrong, because once the fire started it could have spread to nearby crude storage tanks and resulted in a crude storage tank fire. It was only because the fire was quickly controlled that there was no danger to the public.

Could we have avoided this event?
Probably, and quite easily, by monitoring the perimeter of the work areas with hydrocarbon detectors, or simply by having the workers wear hydrocarbon detectors. Such detectors could probably have warned the workers early enough to result in lesser injuries.

This flash fire is an example of not paying attention to the process around you when working in a refinery or chemical plant, and it is in my view the responsibility of facility management to ensure that all workers, both employees and contractor employees, focus on process safety - and not just personal safety.

As always, it comes down to cost and who pays. Who pays for hydrocarbon detectors for contractor employees? The contractor? Or the company hiring the contractor? Who has the most at risk? Clearly the company hiring the contractor. Who pays for the treatment of the injured workers at the burn centers? Texas taxpayers? Or the company hosting the incident?

Saturday, August 13, 2016

CSB Safety Alert Should / Can Be Improved and Increase Impact

During the past week the CSB issued a Safety Alert about high temperature hydrogen attack. You can read this here. In this safety alert the CSB uses the following structure:

  • Background and Investigation Findings
  • CSB Recommendation No. 2010-8-I-WA-R10
  • Catastrophic HTHA Equipment Failure Can Still Occur Using the New API Nelson Curve
  • CSB Safety Guidance to Prevent HTHA Equipment Failure
I think the CSB would have much higher impact with the following structure:
  • CSB Safety Guidance to Prevent HTHA Equipment Failure
  • Background and Investigation Findings
  • Catastrophic HTHA Equipment Failure Can Still Occur Using the New API Nelson Curve
And just for inspiration, here is what this could look like (please send comments!):


CSB Safety Alert:


Preventing High Temperature Hydrogen Attack (HTHA)
Based on the findings in its Investigation Report (CSB Report 2010-08-I-WA, May 2014) on the catastrophic rupture of a heat exchanger at the Tesoro Anacortes Refinery on April 2, 2010, which resulted in seven fatalities, and on the insufficient guidance from the API in its revised API RP 941 (2016), the CSB issues the following recommendations to industry on preventing high temperature hydrogen attack (HTHA):
CSB Safety Guidance to Prevent HTHA Equipment Failure
1. Identify all carbon steel equipment in hydrogen service that has the potential to harm workers or communities due to catastrophic failure;
2. Verify actual operating conditions (hydrogen partial pressure and temperature) for the identified carbon steel equipment;
3. Replace carbon steel process equipment that operates above 400 °F and greater than 50 psia hydrogen partial pressure;
4. Use inherently safer materials, such as steels with higher chromium and molybdenum content.
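
As a rough illustration of how steps 1 to 3 of the guidance could be applied to an equipment list, here is a minimal Python sketch. The two thresholds (400 °F and 50 psia hydrogen partial pressure) come from the guidance above; the `Equipment` record, its field names, the equipment tags and the operating values are all hypothetical and only there to make the example runnable. A real screening would of course rest on verified operating data, not on design values.

```python
from dataclasses import dataclass

# Thresholds taken from step 3 of the guidance above.
TEMP_LIMIT_F = 400.0        # operating temperature, degrees Fahrenheit
H2_PP_LIMIT_PSIA = 50.0     # hydrogen partial pressure, psia


@dataclass
class Equipment:
    """Hypothetical inventory record; field names are assumptions."""
    tag: str
    material: str
    hydrogen_service: bool
    temp_f: float        # verified operating temperature (step 2)
    h2_pp_psia: float    # verified hydrogen partial pressure (step 2)


def needs_replacement(eq: Equipment) -> bool:
    """True for carbon steel in hydrogen service above both thresholds."""
    return (
        eq.material == "carbon steel"
        and eq.hydrogen_service
        and eq.temp_f > TEMP_LIMIT_F
        and eq.h2_pp_psia > H2_PP_LIMIT_PSIA
    )


def screen(inventory: list[Equipment]) -> list[str]:
    """Return the tags of equipment flagged for replacement (step 3)."""
    return [eq.tag for eq in inventory if needs_replacement(eq)]


# Made-up example inventory, for illustration only.
INVENTORY = [
    Equipment("E-101", "carbon steel", True, 520.0, 290.0),
    Equipment("E-102", "carbon steel", True, 350.0, 120.0),
    Equipment("E-103", "2.25Cr-1Mo", True, 600.0, 400.0),
    Equipment("V-201", "carbon steel", False, 450.0, 0.0),
]

if __name__ == "__main__":
    print(screen(INVENTORY))
```

In this made-up inventory only E-101 is flagged: E-102 runs below the temperature limit, E-103 is not carbon steel, and V-201 is not in hydrogen service.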

Background & Investigation Findings

The U.S. Chemical Safety and Hazard Investigation Board (CSB) has found that inadequate mechanical integrity programs, including preventive maintenance to control damage mechanisms and aging equipment at chemical facilities, have been causal to incidents investigated by the CSB. The CSB’s investigation into the catastrophic failure of a forty-year-old heat exchanger at the Tesoro Refinery in Anacortes, Washington, determined that the fatal explosion and fire was caused by a damage mechanism known as high temperature hydrogen attack (HTHA), which severely cracked and weakened the carbon steel heat exchanger over time, leading to a rupture. [1]
Industry uses a standard for determining vulnerability of equipment to HTHA, known as American Petroleum Institute (API) Recommended Practice (RP) 941, Steels for Hydrogen Service at Elevated Temperatures and Pressures in Petroleum Refineries and Petrochemical Plants. The standard uses “Nelson Curves” to predict the operating conditions where HTHA can occur in different types of steels. The curves are based on process data voluntarily reported to API, and are drawn beneath reported occurrences of HTHA to indicate the “safe” and “unsafe” operating regions. The CSB investigation identified that Tesoro, like others in the industry, used API RP 941 to predict susceptibility of equipment to HTHA damage. The CSB found that HTHA occurred in the Tesoro heat exchanger in the “safe” operating region – where API RP 941 did not predict HTHA to occur.
Predicting and identifying equipment damage due to HTHA is complex. The CSB concluded in its investigation of the Tesoro Anacortes incident that using inherently safer materials of construction is the best approach to prevent HTHA. The carbon steel Nelson curve has repeatedly proven to be unreliable to predict HTHA. For example, the 2016 edition of API RP 941 reports 13 new failures below the carbon steel Nelson curve. In addition, inspecting for HTHA is difficult because the microscopic cracks can be hard to localize and hard to identify. The CSB concluded that inspections should not be relied on to identify and control HTHA, as successful identification of HTHA is highly dependent on the specific techniques employed and the skill of the inspector, and few inspectors were found to have this level of expertise.
As a result of its findings, the CSB made a recommendation to API to further prevent the occurrence of HTHA by revising RP 941 as follows (CSB Recommendation No. 2010-8-I-WA-R10 [1]):
Revise American Petroleum Institute API RP 941: Steels for Hydrogen Service at Elevated Temperatures and Pressures in Petroleum Refineries and Petrochemical Plants to:
  1. Clearly establish the minimum necessary “shall” requirements to prevent HTHA equipment failures using a format such as that used in ANSI/AIHA Z10-2012, Occupational Health and Safety Management Systems;
  2. Require the use of inherently safer materials to the greatest extent feasible;
  3. Require verification of actual operating conditions to confirm that material of construction selection prevents HTHA equipment failure;
  4. Prohibit the use of carbon steel in processes that operate above 400 °F and greater than 50 psia hydrogen partial pressure.

Catastrophic HTHA Equipment Failure Can Still Occur Using the New API Nelson Curve

In February 2016, API published the 8th edition of RP 941. Though this updated guidance does provide incremental improvements, it does not address important elements of the CSB’s recommendation. In the 2016 version, there are now two carbon steel Nelson curves, distinguished by whether the equipment has been post-weld heat treated (PWHT). API’s curve for non-PWHT carbon steel is drawn below the 13 newly reported failures. This Nelson curve does not, however, take into account all of the estimated process conditions where catastrophic failure occurred due to HTHA at the Tesoro Anacortes Refinery. As a result, the new curve allows refinery equipment to operate at conditions where HTHA severely damaged the Tesoro heat exchanger. The use of a curve not incorporating significant failure data could result in future catastrophic equipment ruptures.
In addition, the updated standard does not establish minimum requirements to prevent equipment failure due to HTHA or require the use of inherently safer materials. API already identifies materials that are not susceptible to HTHA failure in API 571. [2] The CSB ultimately believes that the stronger option for industry to protect against HTHA is to focus on upgrading equipment susceptible to HTHA with inherently safer materials of construction rather than simply relying on administrative controls. Not only is HTHA very difficult to detect, but equipment inspections and post-weld heat-treating rely on procedures and human implementation, which are low on the hierarchy of controls. These options are weaker safeguards to prevent HTHA failures than the use of materials that are less susceptible to HTHA damage. As a result of these noted deficiencies, the Board voted on July 13, 2016, to designate its Recommendation 2010-08-I-WA-R10 with the status of Closed – Unacceptable Action.
Inadequate mechanical integrity programs were causal to several recent incidents investigated by the CSB. In its “Most Wanted Safety Improvements,” the CSB identifies Preventive Maintenance—which includes actions to effectively control damage mechanisms—as a critical industry-wide improvement to prevent catastrophic incidents. The CSB also calls on regulators to modernize U.S. Process Safety Management regulations, including requiring inherently safer systems analyses, as a way to prevent catastrophic equipment failures. More information about these safety topics is available at: http://www.csb.gov/mostwanted/.

References

  1. Further information on the CSB’s investigation of the Tesoro Anacortes Refinery Explosion and Fire can be found at: http://www.csb.gov/tesoro-refinery-fatal-explosion-and-fire/.
  2. American Petroleum Institute (API) Recommended Practice (RP) No. 571 (2003): “Damage Mechanisms Affecting Fixed Equipment in the Refinery Industry” states on page 5-83: “300 Series SS, as well as 5Cr, 9Cr and 12 Cr alloys, are not susceptible to HTHA at conditions normally seen in refinery units.”

Monday, August 08, 2016

Process Reliability Starts with Knowledgeable People

Some time ago J.D. Stroup of Solomon Associates explained in a Hydrocarbon Processing Viewpoint, "What characteristics define the world's best refineries?", that refinery performance correlates with maintenance execution. That was in the May issue two years ago, and at the time I questioned whether there would be a correlation between refinery performance and process safety performance, but unfortunately Solomon Associates doesn't collect process safety performance data. At the start of this summer, in the June issue of HP, associate editor H.P. Bloch followed up with excellent advice to all process industry management: process reliability - a key to excellent performance - starts with knowledgeable people.

In the June issue of this year, Heinz Bloch describes a situation where a number of small deviations in the use of bearings resulted in major pump problems at two different refineries, in the maintenance and reliability article "How small deviations and lack of management access compromise reliability". The article describes six minor deviations, which added up to a big problem. The deviations were:
  1. An open oil return notch at the 6 o'clock position of the housing bore resulted in some oil mist being bypassed, so that it was no longer available for the dual purposes of lubrication and cooling;
  2. An unusually wide bronze cage acted as a restriction orifice for the remaining oil mist;
  3. Not using directed classifiers when the velocity at the shaft periphery exceeded 2000 fpm;
  4. A large distance from reclassifiers to bearings at high windage (angular contact);
  5. High viscosity oil created a small puddle of oil at the 6 o'clock position of the bearing's outer ring race, slightly slowing the many rolling elements as they dipped into the puddle and causing skidding or smearing in the machined pockets of the brass/bronze cage;
  6. A shaft-to-bearing interference fit at the upper limit of the permissible range, possibly resulting in bearing preload adding to an already existing high temperature.
To understand the details of the article you have to be a bearing and pump specialist, and I am not even close to that.

Nonetheless, I have no problem understanding the key message, that the increased cost of implementing deviation-avoiding steps would be more than balanced by increased reliability. But to get there you need respected and experienced subject matter experts (SMEs) who have access to high management. At the company I used to work for such technical experts were called "associates", and they had, among other things, the freedom to travel worldwide as they deemed necessary for using their skills. Without such knowledgeable people with access to corporate level management, any field experience will likely go unheeded, and blame for an event will be shifted to a lowly employee. Heinz Bloch concludes that the process industry can only hope that good managers truly wish to do something about the indifference to learning from the mistakes of others, and that such managers insist on facts and accountability. I think this is true both in process reliability and in process safety.

While you are reading the June issue of HP, don't skip the reliability blog, in which H.P. Bloch discusses the current biases and challenges facing the process industry, such as cash outlay versus cost over time, as well as the absence of groomed and nurtured experts.









Friday, August 05, 2016

Have you learned the lessons from Stuxnet?

Stuxnet managed to damage centrifuges at a plant in Iran, which was supposedly inaccessible from the outside. Nonetheless, the creators of Stuxnet managed to get their software onto the control network at an Iranian facility and execute it to damage some equipment.

This shows us that if there is a vulnerability inside your control network, then it really does not matter whether this network is directly connected to the internet or not. If there is a transport path to the control network, then it is vulnerable. Such a path exists in all control networks, because the nature of Windows-based software systems is that they need to be regularly patched to eliminate bugs and vulnerabilities. Regularly could be as often as every six months or as infrequently as each turnaround. Stuxnet showed us that potential attackers have the patience and time to wait.

Recently Joe Weiss published an opinion in the Unfettered blog at ControlGlobal titled "Process safety and cyber security - they are not the same", which I totally agree with. I used to work as a process control engineer at a major Canadian petrochemicals producer, where we enjoyed the freedom to adjust the gains on control loops on a day-to-day basis, whenever we, an operator or a technician judged that a loop needed attention. Today, at some facilities, such adjustments are subject to MoC reviews and sign-offs. We sure have complicated operations life by not behaving properly.


Recently Mike Basidore reported from CONNECT 2016 about ExxonMobil's views on configurable I/O. Sandy Vasser explained at CONNECT 2016 that in the not too distant future ExxonMobil would install a new HART-enabled field device and have the configuration of the device automatically downloaded from the process control system. With this, the vulnerability issue around process control systems expands from the control room to the computers and interconnections used to create the configuration file. Bugs in the configuration program could potentially be exploited to change the functioning of field instruments. Also at issue is who has access to the configuration program and file. Nonetheless, I believe that ExxonMobil's view about configurable I/O is the way towards more effective plant commissioning and operations. The bottom line is, however, that the security of the process control system is no longer just a plant issue.

So there is every reason to participate in the discussion of digital safety systems for critical, high-risk applications, such as those found in our refineries and chemical plants, including the often overlooked level monitoring and alarm systems.

Wednesday, August 03, 2016

When will the US process industry wake up?

Early this morning (or late last evening - depending on where you live) the CSB released their Case Study about a series of sulfuric acid releases more than two years ago at the Tesoro Martinez Refinery in California. The CSB news release states that the report is on Facility Safety Culture. This is emphasized on the report cover, where it is stated "A strong process safety culture is necessary to help prevent process safety incidents and worker injuries". I agree 100%!

So does the Case Study contain any advice about how one achieves "a strong process safety culture"? Or how to know if a particular facility has one? I think these are important questions for company board members to ask themselves and discuss with fellow board members at board meetings. After all, according to the Baker report THEY are responsible for process safety.

The case study includes excellent analysis of the two sulfuric acid releases in early 2014, well illustrated with pictures. It also points to many concerns about the safety culture at the refinery. However, like many previous CSB Case Studies and Investigation Reports, this case study also ends by calling for more effective regulation from the authorities.
What the hell is a construction crane doing in the middle of an operating refinery?
But in this case study one does not find any help on how to create a strong process safety culture. I know some companies that exhibit it, e.g. ExxonMobil or Dow/DuPont (I unfortunately don't know what the combined entity is called). From what the CSB calls a list of issues with the process safety culture, I can only conclude that a safety culture did not exist at the Tesoro Martinez Refinery. In my view the issues are a clear indication of the absence of process safety knowledge in the management group at this refinery.

The CSB also calls for proactive inspections by authorities to help companies such as the Tesoro Martinez Refinery implement good safety practices. However, that would require the authorities to employ people more knowledgeable about refinery operations than the people employed by the refineries. I don't believe that is the situation in California or any other place in the world. The issues highlighted by the CSB in the executive summary clearly indicate a lack of knowledge about important areas of process safety among the leaders at the Tesoro Martinez Refinery. That, in my view, can only be fixed by hiring people with the relevant knowledge.

Process safety incidents continue to take place, and they continue to result in injuries to workers and losses for these workers' families. I think it is time to focus on those who can change things at the refineries, and that is the boards of the companies operating them, in cooperation with the managers they have put in charge of the daily operations. Under the current system these persons go free. Their salaries are not reduced after incidents. They don't go to jail after fatalities. What has happened to management responsibility?

Each incident, such as the ones detailed in the current case study, damages the image of the whole process industry. So where are the process industry leaders who can fix the current state of affairs? Those with a vision of incident-free operations. What has happened to process industry leadership?

Does size matter? The six refineries operated by Tesoro together process less crude than one of ExxonMobil's refineries on the Gulf Coast. Can small refineries be operated safely? It is rather unfortunate that we don't have studies which compare refineries' safety performance the same way that their operational excellence and maintenance excellence are compared by Solomon Associates. When do we get that?




Tuesday, August 02, 2016

Do your board operators walk around your plant?

Recently Jerry J. Forest from Celanese published the article "Walk The Line" online in Chemical Processing. It is about a corporate-wide initiative at Celanese to eliminate loss of containment incidents due to equipment not being properly lined up prior to e.g. a startup. In the article Mr. Forest discusses taking the root cause analysis beyond the usual conclusion "operator error" to also answer the question "why did the operator commit this error?".
Mr. Forest and his team at Celanese deserve credit for recognizing that the key to eliminating line-up errors, when equipment was started up or re-introduced after maintenance, was knowing the causes of past incidents beyond the common "operator error". In fact, my personal belief is that "operator error" cannot be a root cause, but only an intermediate cause. The figure to the left indicates the decline of events at Celanese without a properly identified root cause. The result of Celanese's focus on root causes was the identification of three primary root causes for incidents related to line-up: expectation for energy control not set, lack of continuity of operations, and deficiencies in operational readiness. Unfortunately, Celanese chose not to include any indication of the number of LOPC events without an assigned root cause in the years prior to 2014. Why not include the numbers on the ordinate?
I would not be surprised if the original manuscript included figures with numbers on the ordinate, but during the normal legal review which company publications undergo, they were removed by the legal department - most likely without any discussion with the author. Both in the EU and the USA companies have to report certain incidents to authorities, so one could probably - with a bit of work - discover how many reportable LOPC events Celanese experienced in the years before 2014. Some of these probably had a root cause beyond "operator error" identified, but the trend would likely be similar to the trend in the figure. So let us get the numbers!
The figure with the green bars apparently shows the decline in the number of line-up errors after the program implementation in 2013 (it is not unusual to initially see an increase, as is seen here). The lack of numbers also means we don't know if the changes indicated are statistically significant or random year-on-year variations (I believe the former).
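
If the numbers were available, a quick check of whether a year-on-year drop in event counts is more than random variation could look like the Python sketch below. It assumes the annual counts behave like Poisson counts with equal exposure in both years, and uses the exact conditional test: under the hypothesis of an unchanged event rate, the first year's count given the two-year total follows a binomial distribution with p = 0.5. The function name and the counts used in the example are made up for illustration, since the article gives no numbers.

```python
from math import comb


def poisson_two_sample_p(k_before: int, k_after: int) -> float:
    """Two-sided exact p-value for 'same event rate in both periods'.

    Assumes Poisson counts and equal exposure (e.g. one year each).
    Under that hypothesis, k_before given the total n is Binomial(n, 0.5).
    The p-value sums the probabilities of all outcomes that are no more
    likely than the one observed.
    """
    n = k_before + k_after
    if n == 0:
        return 1.0  # no events at all: nothing to distinguish
    pmf = [comb(n, k) * 0.5 ** n for k in range(n + 1)]
    observed = pmf[k_before]
    # Small tolerance so that equally likely outcomes are counted reliably.
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical counts: 20 line-up errors one year, 5 the next.
    print(f"20 -> 5 events: p = {poisson_two_sample_p(20, 5):.4f}")
    # Hypothetical counts: 12 events one year, 9 the next.
    print(f"12 -> 9 events: p = {poisson_two_sample_p(12, 9):.4f}")
```

With counts this small, a drop from 12 to 9 events is well within random variation, while a drop from 20 to 5 is not. That is exactly why the numbers on the ordinate matter.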

I like the focus on a particular type of event, i.e. in this case LOPC. However, I believe that in order to get lasting results from such an effort, you also have to analyse the minor events of this type, and not just those large enough to be reportable. A very successful program to eliminate minor fires was implemented by a major Canadian petrochemicals company about 20 years ago. That program involved treating minor fires as events the CEO should have knowledge of within 24 hours. At one company I used to work for there was a focus on reporting first aid cases, and therefore the first aid kit was in the shift supervisor's office. That ensured reporting of even minor events.

It seems clear from Mr. Forest's description that operators at Celanese make regular rounds. We used to call them housekeeping rounds, because one of their purposes was to ensure that housekeeping would not become a problem. However, as I continue to read the article, I see a clear focus on the operators. Unless an effort also has an impact on supervisors and managers, I doubt its sustainability. What do you think?