Wednesday, December 17, 2014

Measuring Safety

"Should Safety be Measured?" was the opening question by professor Erik Hollnagel from University of Southern Denmark at todays meeting on "Measuring Safety" at Herlev Hospital in Copenhagen, Denmark.

Erik Hollnagel went on to argue that in what he calls "Safety-I" we don't measure safety. We measure the absence of safety by counting things that are easy to count, such as the number of fatalities, the number of near-miss events, the number of outstanding HAZOP items, the number of outstanding safety reviews, the number of first aids, the number of fires, the number of leaks, the number of explosions, etc. The thesis is that as the numbers go down, safety improves. However, behind these numbers are events which clearly indicate the absence of safety, so we are not really measuring safety, but just its absence. Maybe that is why safety has not improved.

We have been doing this for almost 80 years. But have we improved safety? I would say: No! We still have people killed almost every day at a chemical plant or refinery somewhere in the world. And there is the management saying that what gets measured gets managed. Usually we measure something in order to use the measurement for feedback - or in some cases feedforward - control. However, if we are safe, the numbers we measure go to zero. Then what good are they in feedback control? This situation has led Erik Hollnagel to focus on the work being done and how it is being done. He calls this "Safety-II".

As I see it, the idea appears to be that in professional sports one analyzes how the best performers perform and learns from them. So why not adopt this approach for work in chemical plants and refineries? What do you think?

Erik Hollnagel and Sidney Dekker do not argue that we should abandon the things we do to create safe, or at least safer, plants, such as HAZOP, PHA, FTA, ETA, etc. They just argue that this alone has not improved safety over 80 years of using the approach. And within the last 10 years even some of the icon performers of the 1980s have failed, e.g. DuPont. So currently it is difficult to see who the best are, whom we can learn from.

Sunday, November 09, 2014

Risk Management and Rule Compliance - What is the connection?

Some time ago I noticed a discussion in the Process Safety group on LinkedIn with the headline "Do you have the right balance between risk-management and rule compliance?". I was curious, because to me rule compliance is one of the tools of risk management. So how could there be a balance between the objective, i.e. risk management, and one of the tools used to perform this task, i.e. rule compliance?

The wording of the complete post was "Risk-management and rule-compliance are inter-related strategies for promoting safety in hazardous industries. They are co-existing and complementary, not contradictory. We have free download that demonstrate what happens between the two strategies for a wide range of operational decision-making. Just let me know if you like a copy and I will email you". The post claimed to be from someone who was a production assistant at Oxygen360.

Since I had certain issues with the wording of the original post, and did not know the company Oxygen360, I decided to google a bit before asking for the free download. Why don't they just provide a link to the free download? I quickly discovered that there are at least two companies using the name Oxygen360. One is a company in Australia, which, based on its website, is involved in the creation of digital content. From the website it is quite clear that this company is not at all involved in process safety or risk management, except maybe the risk of people not liking the content which they create.

The other company had a website at http://www.oxygen360.com and appeared to be UK based. I write "had" because the website is no longer available after I posted a warning in the discussion on LinkedIn at the end of October. At that time the website appeared to be involved in creating links between contractors and clients for short-term work. For now, however, they have disappeared from the web, and when I visited the site in October there was nothing available for free download. Hence it appears to me that the post cited above was just an attempt to gather e-mail addresses of safety professionals.

I am a bit concerned that almost 100 safety professionals in one of the major LinkedIn groups related to process safety apparently responded to the post without any hesitation and exposed their e-mail addresses without first looking into the background of the post.

But maybe there is something out there about risk management and rule compliance. The person behind the original post claims to currently be working both for Oxygen360 and FutureMedia, and has an education from the University of Technology in Sydney. FutureMedia is involved in creating training material for, among other things, workplace health and safety. But FutureMedia also has no free download about rule compliance. However, they do have many other interesting types of safety training material.

Therefore I have still not requested my "free download" about the balance between risk-management and rule compliance. However, by googling a bit more I discovered a working paper from 2010 by Andrew Hopkins titled "Risk Management and Rule Compliance, Decision Making in Hazardous Industries", which I look forward to reading. Hopefully then I will understand the connection.

Thursday, October 02, 2014

Hiring for a safer plant and maintenance for increased profit

The August issue of Hydrocarbon Processing has some interesting articles about how to hire for a safer plant and how to use maintenance to increase profits.

Pier Parisi writes about maintenance of compressor control systems in the article titled "Optimize control systems with preventive maintenance". His article contains two excellent examples of how lack of maintenance of compressor control systems leads to reduced plant performance and increased maintenance. In the first example he describes how an increased reaction time - just a few seconds - of the anti-surge control valve led to several unexpected plant shutdowns at a gas production facility. The second example involved a suction throttling valve which spent most of its time at the software low clamp. The clamp prevented the controller from closing the valve to the required 10-20% open. It turned out that the software clamp was based on process conditions that no longer applied. In both examples the problems - unnecessary shutdowns or reduced plant throughput - could have been avoided with on-site control engineers performing continuous maintenance of the control systems as process conditions change. The examples in the article focus on compressor control, but similar examples can be conceived for distillation or furnace control, to mention some other areas where conditions change as time goes by.
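
To illustrate the second of Parisi's examples, here is a minimal Python sketch of how a stale software low clamp can keep a controller from driving a suction throttling valve down to the 10-20% opening that current process conditions call for. The clamp value and the process numbers below are hypothetical; the real system Parisi describes is of course a vendor-specific compressor control package.

```python
def clamp_output(requested_opening_pct, low_clamp_pct, high_clamp_pct=100.0):
    """Limit a controller's requested valve opening to the configured software clamps."""
    return max(low_clamp_pct, min(high_clamp_pct, requested_opening_pct))

# Hypothetical numbers: the controller wants the suction throttling valve
# at 15% open, but a low clamp of 35% - set for process conditions that no
# longer apply - keeps the valve further open and throttles plant throughput.
requested = 15.0          # % open requested by the controller
stale_low_clamp = 35.0    # % open, never revisited after conditions changed

actual = clamp_output(requested, stale_low_clamp)
print(f"Requested {requested}% open, actual {actual}% open "
      f"(clamped: {actual != requested})")
```
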
The cost of the on-site control engineer can easily be recovered through improved plant performance. The industry leaders knew this in the early days of computer-controlled plants, and they continue to have control engineers who are part of the operations team on site.

Dan Fearn and Mark Porter also focus on maintenance in their article titled "Optimize plant reliability with operator-based maintenance programs". As far as I can tell from the article, OPPM means that operators perform many of the simpler maintenance tasks. A case history claims a 30% reduction in maintenance labor cost. I wonder if this means the hourly pay of operators is 30% less than the hourly pay of maintenance technicians. At least that was not the case in the plants I have worked in. In my view the focus of operators should be operating the facility in the most optimal way. If their attention is diverted from this task, costly errors during simple maintenance work could be the result. I recall a recent CSB report about events at a DuPont site, where this was the case. So I am clearly quite sceptical about OPPM, and would rather see operator-initiated preventive maintenance. What is your experience with this?

I still wonder if we can show a correlation between maintenance cost and safety performance. Any data on this would be most welcome! We already know there is a correlation between maintenance cost and overall profitability.

The two articles discussed above indicate that one way to safer plants is proper maintenance of the facilities the company operates. This avoids costly process safety events, such as unnecessary shutdowns. But are there other ways to safer plants? Yes, indeed!

One of the most underused is described by G. Ford in his article "Understanding people helps reduce safety risk in the workplace". Basically, Mr. Ford describes the story of Charlie Morecraft, who in 1980 was employed by Exxon Corporation, and how taking a short-cut nearly cost him his life. I was actually hired by Esso Chemical Canada around that time, and quickly got exposed to the company's attitude to safety. While on an interview trip I was allowed to visit a control room in the company of an employee, but after getting hired and starting work, all plant visits of any kind were prohibited until I had completed the mandatory site safety training. The next training course was 10 days away. That is a long time to wait if you are a new employee!

However, Mr. Ford goes further by stating that by asking the right questions during the interview process one can indeed hire people with an attitude to safety in line with the company values. Already at Loss Prevention 2010 in Belgium, a Canadian speaker revealed that students applying for jobs with certain Canadian companies were screened during interviews based on their answers and attitudes to safety-related questions.

But even if you hire the best, people's attitudes towards safety may change over the years due to personal experiences. Therefore attitude towards risk should be addressed individually, and not in one-size-fits-all training sessions, as Mr. Ford points out in his article. Only then can you understand what a person's knee-jerk reactions are, and then teach the person to manage his or her risk behavior. After all, very few people consciously and on purpose put themselves and others at risk.

I highly recommend that anyone with any level of personnel responsibility in a process plant read and reflect on Mr. Ford's article. I think it would be well worth your time!

Finally there is Marty Moran's article on operational excellence titled "How are leading organizations implementing operational excellence?", in which he hints that among leading organizations he counts Chevron, DuPont and ExxonMobil. First I would like to point out that ExxonMobil uses only 3 pages, and not 4 pages, of their 2012 annual report on operational excellence.

ExxonMobil experienced what I would call a wake-up event when the Exxon Valdez ran aground in Alaska in 1989. Now, more than 20 years later, that event still costs the company money. But as a result of this wake-up event, ExxonMobil 3 years later rolled out their Operations Integrity Management System (OIMS) on a worldwide basis. In my view this changed how every ExxonMobil - and affiliated company - employee approached their work. As Mr. Moran points out, OIMS at ExxonMobil helped every employee, team or site to focus on maximizing the value they contribute to the company, all the way from the scientist asking what the value and risk to the company are of doing this or that experiment, to the board members asking what the value and risk to the company are of doing this or that investment.

So in my view OIMS goes beyond operational excellence to create a culture to aim for value and reduce risk. What is your view?


Monday, August 25, 2014

Are we updating the right standards?

Recently an article titled "Prevent Tank Farm Overfill Hazards" by WL Mostia from SIS-TECH Solutions was published on ControlGlobal. In this article Mostia briefly describes some of the spectacular overfill events that have happened during the past 10 years, such as Buncefield in 2005 and Cataño in 2009, and then continues to describe the regulatory changes that have been implemented since these events, such as the recent 4th edition of API 2350 from 2012.

However, what really caught my interest was the subtitle of Mostia's article: "Catastrophic Incidents Have Led to Useful Rules for Systems That Help Avoid Them". Especially since Mostia in the article describes how Buncefield had both a level gauge and an automatic high-level shutdown, and how at Cataño the tank monitoring application was not operational at the time of the event. To me this sounds a lot like maintenance failure was the cause of these two events. Do we have a standard - prescriptive or not - for instrumentation system maintenance and testing?

To be honest, I don't know if such a standard exists. However, I would consider the failure to maintain the instrumentation at Buncefield a management failure. So do we need a management standard? Of course not, but we do need companies with better management control of their operations, including the integrity of those operations.

It actually reminds me of a visit to a major US Gulf of Mexico refinery more than 10 years ago. As the bus with 25 professors - mostly from the US - drove onto the site, our host and guide reminded us that this refinery was run and maintained by engineers. The implication was that everything might not look pretty, but that they had control of the integrity of every tank, piece of process equipment, piece of pipe and other equipment on the site.

I think that to create a step change in process safety we don't need more and/or better prescriptive standards, such as updated editions of API 2350, but we do need more focus from both management and other employees on the potential consequences of something not working properly or being out of service.



Thursday, August 07, 2014

Process Operators - an overlooked resource for process improvements?

When I was working as a computer control engineer at a major Canadian oil company early in my professional life, I was fortunate to work with a process operator who knew the process extremely well. Recently I read a piece by A. Sofronas in Hydrocarbon Processing titled "Random deck vibration with beats", in which he describes how process operators alerted the engineers to an intermittent problem with a deck on which four centrifuges were mounted. Once in a while the deck would shake every few seconds with a beat sound. This occurred for a few seconds and then stopped, and then it did not happen again for weeks or months. Without the eyes and ears of the field operators, a problem such as this would have gone unnoticed until equipment damage was observed. Mr. Sofronas' story reminded me of the close collaboration I had with one particular operator.

At the major Canadian oil company, operators were allowed to create their own displays on the DCS. The mentioned operator had created two displays with a large number of DCS process values on each: one display for the reaction area and one for the purification area. Each process value was shown as a vertical bar with a marker for the setpoint, if any. The operator used these two displays to determine the state of the plant when his shift started and to decide on adjustments to move to a more optimal plant state. The bars were scaled so that at optimum operation they all had more or less the same height. Unfortunately, as a control engineer I did not at the time see the opportunity to turn these displays into a tool for all operators to improve plant operations. I guess I was too focused on the displays my colleague and I had created to see the opportunity.
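
The scaling trick is easy to reproduce. Below is a minimal sketch, with made-up tag names and values, of how each process value could be normalized against its optimum so that all bars have the same height when the plant runs at the optimum; any bar that is clearly taller or shorter then points directly to where an adjustment is needed.

```python
# Hypothetical tags with (current value, value at optimum operation).
process_values = {
    "reactor_temp_C": (312.0, 305.0),
    "feed_rate_t_h":  ( 48.5,  50.0),
    "column_dp_kPa":  ( 22.0,  20.0),
    "reflux_ratio":   (  3.1,   3.0),
}

BAR_HEIGHT_AT_OPTIMUM = 20  # characters

for tag, (value, optimum) in process_values.items():
    # Scale so that a value equal to its optimum gives the reference height.
    height = round(value / optimum * BAR_HEIGHT_AT_OPTIMUM)
    deviation_pct = (value / optimum - 1.0) * 100.0
    print(f"{tag:16s} {'#' * height:24s} {deviation_pct:+5.1f}% from optimum")
```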

However, I did have excellent collaboration with the mentioned operator on the development of computer control structures for the purification area. On his day shifts we would discuss how computer control could help operations and improve performance. Then I would design the necessary control strategies, and he would test them when he went on night shift. After such a night I would have charts on my desk showing how the strategy performed and where he believed adjustments were called for. After a couple of iterations he would then test the strategies during the day, so final tuning could be performed and the new strategy documented on a DCS display from which it could also be operated. All our strategies were designed so they could be activated both from the instrumentation console and from the computer console. The fact that a respected operator had been involved in the development and testing made for easy acceptance of the strategies by the other operators.

I came away from this experience thinking that operator involvement is often an overlooked resource when consultants or head office staff implement new control strategies at our plants. Usually they don't have the time to get to know the operators and gain their confidence. This leads to missed opportunities. What do you think? What are your experiences with operator involvement in control projects?

Later the mentioned company went on to create a constraint-based optimization of the whole purification area. A key to the success of this control project was the involvement of the operators in the creation of the displays needed to operate the strategy. In fact, the operators were so pleased with the optimization that they left the strategy on overnight after just two days of testing.

Monday, July 28, 2014

Functional models - a tool for incident prediction and avoidance?

In the June issue of Hydrocarbon Processing one finds an article titled "A new era emerging for incident prediction and avoidance strategies" by D. Hill. In this viewpoint on automation strategies, D. Hill advocates the use of multivariable statistical process control (SPC), principal component analysis (PCA) and conditional logic as the tools for predicting incidents before they happen.

However, these tools include only limited amounts of process knowledge, and I believe that accident prediction could benefit significantly from using process knowledge. One tool which is able to capture process knowledge and use it to analyze the current state of the process is multilevel flow modelling (MFM), as developed by M. Lind at the Technical University of Denmark. MFM allows the engineer to create a model of the process in the goals-means domain. This model can then be used to reason about possible causes and consequences of currently measured process deviations, and the results can be presented to the operator in a way that allows the operator to easily take action on the measured deviations before they develop into a major plant event.
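
This is not MFM itself, but the reasoning pattern it enables can be illustrated with a toy sketch: a small directed graph of functions in the goals-means domain, where a measured deviation is traced upstream to candidate causes and downstream to possible consequences. The functions and structure below are hypothetical; real MFM models use a much richer set of function types and causal roles.

```python
# Toy goals-means style model: each node is a function, and edges say
# which functions support which. Only an illustration of the reasoning
# pattern, not the actual MFM formalism.
supports = {                       # upstream -> downstream
    "cooling_water_flow": ["condenser_heat_removal"],
    "condenser_heat_removal": ["column_pressure_control"],
    "reflux_pump_flow": ["column_separation"],
    "column_pressure_control": ["column_separation"],
}

def upstream(node):
    """Candidate causes: functions that support the deviating function."""
    return [u for u, vs in supports.items() if node in vs]

def downstream(node):
    """Possible consequences: functions that the deviating function supports."""
    return supports.get(node, [])

deviation = "condenser_heat_removal"   # e.g. measured low heat duty
print("Possible causes:      ", upstream(deviation))
print("Possible consequences:", downstream(deviation))
```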

M. Lind and co-workers have demonstrated how MFM can be used to assist in HAZOP analysis of complex facilities such as chemical plants or nuclear power plants. Currently, work is ongoing on the use of MFM for operator assistance during abnormal situations in nuclear power plants and other complex facilities.

The mentioned tools for assisting in HAZOP analysis can easily be modified to predict how minor observed process deviations can develop into major accidents, and what process consequences such deviations may have. Predicting accident impact on surrounding communities will require additional modelling efforts. This approach allows process knowledge to be used, since it is captured in the goals-means structure of the MFM models.

Are you getting to the bottom line?

The purpose of investigating a process accident or deviation is to reduce future operating cost by avoiding a repeat event. In H.P. Bloch's reliability editorial "Small deviations can compromise equipment reliability" from the June 2014 issue of Hydrocarbon Processing, it is detailed how a change from liquid oil lubrication to pure oil-mist lubrication can result in reduced bearing reliability. My conclusion from reading the editorial is that the refinery had simply used a centrifugal pump designed for liquid oil lubrication in an application which used pure oil-mist lubrication.

The pump which experienced thrust bearing failures was manufactured by a well-known company, and the bearings were in compliance with the current recommended practice for centrifugal pumps: API-610. However, in this particular case, with a rotational speed of 3,560 rpm and a linear velocity of 2,780 fpm, better guidelines can be found in books (see H.P. Bloch's article for references) than in API-610.
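
As a back-of-envelope check of the quoted figures - and assuming the 2,780 fpm is a surface speed of the form V = π·D·N, which the editorial does not spell out - the implied reference diameter comes out at about 3 inches:

```python
import math

rpm = 3560            # rotational speed quoted by Bloch
surface_fpm = 2780    # linear velocity quoted by Bloch

# Assuming V [ft/min] = pi * D [in] / 12 * N [rpm]; solve for D.
diameter_in = surface_fpm * 12 / (math.pi * rpm)
print(f"Implied reference diameter: {diameter_in:.2f} in "
      f"({diameter_in * 25.4:.0f} mm)")
```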

So what is the issue here? When unexpected events happen during the operation of a plant, such as premature bearing failures, there is a learning opportunity. If one simply replaces the broken part without further investigation, then this learning opportunity is missed, and so is an opportunity to improve the long-term performance of the unit.

Unfortunately the editorial does not cover why pure oil-mist lubrication was selected in this particular application, or why the pump design was not the best for this particular choice of lubrication. Neither do we know when these choices were made. My philosophy is that to gain maximum learning benefit from a plant deviation, human, technical and organisational factors must all be analysed. Stopping before that point has a permanent impact on your bottom line.


Sunday, July 06, 2014

Is the peer review process of scientific publications failing?

In the old days the editors of scientific journals chose whom to send a paper to for review. However, lately this has changed. Now authors are typically asked to suggest three persons (the authors' friends?) whom they think could review their contribution. I think this is a dangerous development. I think the result is less critical reviews and a declining quality of published papers.

I will demonstrate what I think reviewers have missed by analysing two publications, both of which I definitely think deserved to be published, but which in my view could have been improved by a more critical review.

The first paper is "Methodology for the Generation and Evaluation of Safety System Alternatives Based on Extended Hazop" by N. Ramzan et al., and it was published in the March 2007 issue of Process Safety Progress (Vol. 26, No. 1, pp. 35-42). My first question is: what is "Extended Hazop"? It turns out that by this the authors mean combining a traditional Hazop study with dynamic simulation. I think this is a great idea, and one which few people considered 7 years ago. I was fascinated by "Generation and Evaluation", and hoped the authors would present procedures both for the generation of safety systems and for their evaluation. It turned out that I was disappointed on both counts.

The introduction, in my view, attempts to paint a picture that major accidents are particular to the chemical processing industries. This is done by mentioning only chemical processing industry events and by choosing to reference just another publication instead of event-focused publications, such as "The 100 Largest Losses (in the Hydrocarbon Industry) 1974-2013" published by Marsh Risk Consulting. This year's edition is the 23rd, but the authors should have had access to the 19th. Another, more appropriate reference could be appendices 1-8 of Frank P. Lees' "Loss Prevention in the Process Industries" (2nd Edition) or the updated 3rd Edition edited by Sam Mannan. I also wonder what the authors mean by "the old concepts of accident prevention". The reference provided to "Chemical Process Safety: Fundamentals with Applications" by DA Crowl and JF Louvar does not provide the answer.
Then the introduction continues with a list of techniques claimed to be the most common in the chemical processing industries. However, the listed techniques are equally common in other process industries, and most of them also in other industries. Quite honestly, I do not understand the purpose of this list, nor the reference provided for the claim that "no single technique can support all the aspects of safety/risk".
The next paragraph of the introduction starts "Several methodologies of risk analysis have been presented so far...". It would have been more correct to say "mentioned" instead of "presented". After this a list of books is referenced. Two are the textbooks by Skelton (name misspelled in the reference list) and by Wells, respectively. The other two are guideline books by the CCPS. In my view the authors should have chosen to reference either textbooks or guideline books - but not both. The reference to Tixier et al.'s survey of tools is most welcome.
The third paragraph of the introduction mentions the lack of standards for safety/risk analysis methodology. But I would like to disagree with the statement that the tools used are based on "...judgemental contribution of the analyst or the plant manager". I certainly hope this is not the case!
The final paragraph of the introduction introduces the idea of using dynamic simulation to simulate large variations of design/operational parameters. To me, simulation of variations in design parameters makes little sense after design completion, and I would also question whether dynamic simulations can actually cope with events such as loss of a reflux pump or loss of cooling water to a condenser. At the end of this paragraph the purpose of the paper is stated as "...a combination of conventional risk analysis techniques and process disturbance simulation...for safety/risk analysis and optimization". Nothing about generation of safety system alternatives or their evaluation! There seems to be a major difference between the title of the paper and the stated purpose at the end of the introduction. The peer review process should have pointed out this discrepancy.
The second section of the paper by Ramzan et al. describes a four-step methodology based on extended Hazop - without defining what is meant by extended Hazop.
The first step covers the usual tasks in preparation of a Hazop study. Unfortunately most academic authors don't provide access to the documents and other information collected in this first step. Such documents would of course be of tremendous value to others wanting to learn the Hazop methodology.
The description of the second step starts with a very true statement, that "the biggest source of error in hazard analysis is failure to identify the ways in which a hazard can occur". This is normally called the causes of the hazards. Indeed, I find that many - especially students - have great difficulty distinguishing between hazards, their causes and their consequences. The section then continues with a discussion of a traditional Hazop study and the requirements for conducting one, leading to a statement of the purpose of the work: "...to identify weak points arising from disturbances in operation, which may or may not be hazardous, to improve safety, operability, and/or profit at the same time". That is not exactly what I expected from the title of the paper, and to me it sounds like a rather unclear purpose ("weak points"?) with possibly conflicting objectives (safety versus profit). The authors go on to state that the analysis of disturbances is based "on shortcut or simplified hand calculations supported by dynamic simulation". Unfortunately the authors do not explain what shortcut or simplified calculations are used in the distillation example of the paper, but they do have some very positive statements about Aspen Dynamics after briefly mentioning other dynamic simulators. The section then continues with a clear five-point description of the differences between traditional Hazop and Hazop supported by dynamic simulation (Extended Hazop). These differences are: 1) identifying consequences using dynamic simulation, 2) ranking consequences in eight classes, 3) identifying the frequency of each possible consequence, 4) documenting the results in an extended Hazop worksheet, and 5) ranking the Hazop results. To me this sounds like a description of QRA using a risk matrix with 9 consequence classes and 10 frequency classes. Most of the industrial risk matrices I have seen are limited to 4-5 consequence classes and 3-5 frequency classes, since the uncertainty in the calculations and the assumptions they are based on does not justify a larger risk matrix.
The third step involves what is called either the risk potential matrix or the Hazop decision matrix. Here the authors use consequences and hazards interchangeably. The matrix of 9 by 10 risk categories is condensed to just four risk levels: 1) intolerable, 2) acceptable for a short time, 3) risk reduction optional, and 4) acceptable. According to the authors, the risk potential matrix is used for the following: a) status of plant safety, b) ranking of events, c) optimization proposals, d) improvements, and e) documentation. I can't help asking: what happened to the safety system alternatives?
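
A condensation like that is easy to express in code. Here is a minimal sketch of how consequence and frequency classes might be mapped to the four risk levels; the class boundaries and combination rule below are hypothetical and not taken from the paper.

```python
RISK_LEVELS = ["acceptable", "risk reduction optional",
               "acceptable for a short time", "intolerable"]

def risk_level(consequence_class, frequency_class):
    """Map a (consequence, frequency) cell of the matrix to one of four levels.
    Hypothetical rule: the higher the combined score, the higher the risk level."""
    score = consequence_class + frequency_class          # crude combination
    if score >= 15:
        return RISK_LEVELS[3]      # intolerable
    if score >= 11:
        return RISK_LEVELS[2]      # acceptable for a short time
    if score >= 7:
        return RISK_LEVELS[1]      # risk reduction optional
    return RISK_LEVELS[0]          # acceptable

# Example: consequence class 6 of 9, frequency class 7 of 10.
print(risk_level(6, 7))            # -> 'acceptable for a short time'
```
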
The fourth step is the development and analysis of optimization proposals. Here the term "risk target" is introduced for the first time. This is a further indication that the article discusses a QRA approach to risk assessment. However, there is no clear description of how optimization proposals are developed, but we are told they are evaluated using dynamic simulation, event tree analysis and/or fault tree analysis.

It appears to me that the conclusions go beyond what has been presented in the paper. For example, I find that the paper contains no analysis of operational failures. Neither does it contain any analysis of the effects of design improvements. A critical review should also have pointed out that the term "weak point" has not been defined. I agree with the authors' statement that dynamic simulation in combination with Hazop is a powerful tool. However, I disagree with the authors' claims about the usefulness of the methodology for safety concept definition, safety analysis and safety system design. Did the reviewer actually read the conclusion?

The second paper I have looked at is "Modelling of BP Texas City refinery accident using dynamic risk assessment approach" by M. Kalantarnia et al., and this paper was published in 2010 in Process Safety and Environmental Protection (Vol. 88, pp. 191-199). I wonder what "dynamic risk assessment" could be. Is it something accident responders use during an event? Or is it the real-time risk state of the facility?
One should always be very careful with stating conclusions already in the abstract, since there is no room for a supporting argument. In this article the authors should, in my view, have avoided telling us what the main reason for events such as the BP Texas City refinery explosion and fire is. Especially when they are dead wrong! Such tragic events are not caused by "lack of effective monitoring and modelling approaches that provide early warning". Neither do such events occur "In spite of endless end-of-pipe safety systems". To me such statements are indications that the authors have not fully understood the event. Unfortunately, the internal preliminary accident report and the internal final accident report, both of which BP at the time made available on the internet, appear to be no longer accessible. These reports clearly show that the main cause of the event was a local management decision to start the unit before the turnaround was 100% complete (some instrumentation had not been re-commissioned) and before it was needed.
That being said, I do believe that accidents give an opportunity to look at how process monitoring could be improved, and the article by Kalantarnia et al. is indeed a very good example of such research - although implementation of the proposed system would not have prevented the events of March 23rd, 2005 in Texas City. The local management failed to react to warnings in Hazop updates several years before 2005, so I think they would also have overlooked the warnings of a sophisticated probability-based system.

In the first paragraph of the introduction the authors get it partly right and partly wrong. It is absolutely correct that a high level of safety and reliability requires "...the implementation of a strong safety culture within the facility". Unfortunately, not having such a culture was one of the major contributing factors to the event. However, to believe that safety and reliability are "maintained by strict regulatory procedures" is just wishful thinking, and in clear contrast to the end of the paragraph, which reads "...a culture to respect safety throughout the plant both by the personnel and the management is critical". The authors then continue with a more technical introduction in the second paragraph, where a reference to the original QRA work in the nuclear industry is encouraging. However, Hazop never was and never will be part of the QRA toolset. A more critical review should have pointed this out to the authors. The references which the authors choose to mention do not all appear to be relevant to the present work. Again, a more critical review could have pointed this out. I find that only the reference to the work of Khan and Amyotte (J. Loss Prev. Process Ind. (2002) pp. 467-575) is relevant to the contribution of the paper, while the others appear not to relate to the current work. In my view, the highlight of the introduction is the review of work on using accident precursors in risk assessment. I really feel that here things sparkle. But I am left without a definition of dynamic risk assessment.
However, towards the end of the introduction it becomes clear that the main inspiration is the work of Meel and Seider (Chem. Eng. Sci., vol. 61, pp. 7036-7056). Here I get the idea that dynamic risk assessment is actually probabilistic risk assessment updated using real-time plant data and Bayes' theorem - I guess!
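
Here is a minimal sketch of that reading of "dynamic risk assessment": a Beta prior on the failure probability of a single safety barrier is updated with observed precursor data using Bayes' theorem. The prior and the yearly numbers are hypothetical and not taken from the paper.

```python
# Conjugate Beta-Binomial update of a single barrier's failure probability.
# Prior Beta(a, b); after observing k failures in n demands the posterior
# is Beta(a + k, b + n - k). All numbers below are hypothetical.

a, b = 1.0, 99.0                     # prior with mean 0.01 (generic data)
print(f"Prior mean failure probability: {a / (a + b):.4f}")

# Year-by-year precursor data: (demands on the barrier, observed failures)
yearly_data = [(12, 0), (15, 1), (14, 2)]

for year, (n, k) in enumerate(yearly_data, start=1):
    a += k
    b += n - k
    print(f"After year {year}: posterior mean failure probability "
          f"{a / (a + b):.4f}")
```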

The second part of the article is titled "BP Texas City Refinery", but it could more appropriately have been titled "Raffinate splitter at the ISOM unit of the BP Texas City Refinery". Then the authors could have avoided filling a scientific paper with irrelevant data from the CSB report on the event, i.e. section 2.1 titled "A brief history" - which it is not - and the first two paragraphs of section 2.2 titled "Process description". A critical review should have pointed this out. Also, half of section 2.3 titled "Accident description" could be omitted without losing information relevant for the present study. The final sentence of this section should not be admitted in a scientific publication: "The release of flammable let to an explosion and fire". It would be more correct to write "The released flammables found a source of ignition resulting in an explosion and fire", and one could add "which killed 15 persons and injured more than 170".

The third part of the article, "Dynamic risk assessment", is the key contribution of the paper. It gives a nice, short five-step description of the procedure before going into the details of each step.
In the first step the authors define 18 failure scenarios. 17 of these are related to instrumentation failures, such as failure of a level sight glass or failure of a safety relief valve. This simple approach to scenario definition indicates a rather limited understanding of failure modes. I think it would be more appropriate to talk about failure of the safety valve to open on demand, or leakage through the 1.5 inch reflux bypass valve on the reflux drum. How can a sight glass be labelled as a safety barrier? A reviewer should also look at inappropriate use of terminology. The 18th (or rather 1st) scenario is an operational scenario: excess feed loading. But what about other scenarios that actually happened on March 23rd, such as the failure to start the flow of heavy distillate from the tower, or the overheating of the feed to the raffinate splitter? Maybe these operational events are only described in BP's internal incident investigation reports, and hence the authors did not know about them. It would also have been interesting with a discussion of the choice of parameter values (discrete value, parameter 'a', parameter 'b'); this would help others implement dynamic risk assessment. It is good that the choice of distribution function is discussed.
The second step is the prior function calculation. To perform this calculation an event tree is needed. Part of it is shown in figure 3 of the paper, where it is incorrectly labelled "Part of event tree of ISOM unit in BP Texas City refinery" even though the event tree covers just the raffinate section of the ISOM unit. A mislabelling is also found in the event tree itself, where the "Raffinate splitter" has been labelled the "ISOM Tower". Furthermore, the initiating event of the tree is not defined. I wonder if excess feed loading is considered the initiating event?
The third step is the formation of the likelihood function, and the paper has a nice discussion of the choice of this function. Unfortunately, however, the paper does not discuss how the collected accident precursor data, such as those shown in table 2 of the paper, are fitted to the distribution. But maybe that is trivial, and I just need to take a statistics course online.
The fourth step is the calculation of the posterior function from the prior function and the likelihood function. This apparently is where the event tree plays a key role. So it appears to me that essentially the paper shows a methodology for updating an event tree using likelihood functions for a single event - excess feed loading.
Essentially it is QRA for a single event, and this is a long way from a monitoring system providing early warning.
The fifth and final step is consequence assessment. Unfortunately the authors limit themselves to the consequences asset loss, human fatality, environmental loss and image loss, and hence neglect other forms of human loss. They go on to state that in the BP case the main focus is on asset loss and human fatality - thereby neglecting the roughly 180 persons injured in the event. They then match each group of end states to a severity class, i.e. A (process upset) - severity class 1, B (process shutdown) - severity class 2 and C (release) - severity class 4.
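
To make the structure concrete, here is a minimal sketch of how an event tree with a few safety barriers turns an initiating-event frequency and (possibly Bayes-updated) barrier failure probabilities into end-state frequencies, which are then matched to severity classes as the authors do. The barriers, probabilities, frequency and end-state mapping are hypothetical, not the paper's.

```python
# Hypothetical event tree for a single initiating event (e.g. excess feed).
initiating_freq = 2.0          # events per year (hypothetical)
barriers = [                   # (name, probability of failure on demand)
    ("operator response", 0.10),
    ("automatic shutdown", 0.05),
    ("relief / containment", 0.02),
]
severity = {0: "A: process upset (class 1)",
            1: "B: process shutdown (class 2)",
            2: "B: process shutdown (class 2)",
            3: "C: release (class 4)"}

freq = initiating_freq
for n_failed, (name, pfd) in enumerate(barriers):
    # Branch where this barrier works: end state after n_failed failed barriers.
    print(f"{severity[n_failed]:35s} {freq * (1 - pfd):.4f} /yr")
    freq *= pfd                 # continue down the all-barriers-failed branch
print(f"{severity[len(barriers)]:35s} {freq:.4f} /yr")
```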

The fourth part of the paper is titled "Result and discussion", and it starts with a claim which doesn't hold. The authors claim "The BP Texas City refinery has been analyzed..", when in reality it is the raffinate section of the ISOM unit at the BP Texas City refinery which has been analyzed, and only for a single scenario. I guess the claim could be based on the fact that the ASP data are for the whole ISOM unit, and not just the raffinate splitter section.
Unfortunately the authors report the calculated risk in dollars, and to me it looks like the cost of unplanned shutdowns goes from $104 to $2,610 and the cost of a release goes from $2,590 to $96,600. But it remains unclear to me, as an operator or manager of this facility, what these numbers tell me and what they tell me something about. These are significant increases, but I wonder how things would change if there were additional releases or shutdowns in the ASP data for the last 2-3 years. How would that change the profile? The whole refinery actually had several Hazop reviews during the time frame. Could such events be included in the ASP data?
I fail to see how the authors can claim that the data in figure 4 signal an absence of inspection and maintenance plans. I am certain that the Texas City refinery has such plans. However, they may not have satisfied generally accepted standards.
Events such as leakages, shutdowns or process upsets are clearly discrete events, for which a Poisson distribution would probably be a better choice than the linear model. The diagram indicates that there is little difference in the discrete numbers, but some difference in the cumulative ones.
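
For discrete yearly counts, the Poisson alternative is straightforward: the maximum-likelihood rate is just the mean count, and the distribution then gives the probability of seeing any number of events next year. A minimal sketch with hypothetical yearly precursor counts:

```python
import math

counts = [3, 1, 4, 2, 2]                      # hypothetical events per year
rate = sum(counts) / len(counts)              # Poisson MLE: the mean count

def poisson_pmf(k, lam):
    """Probability of observing exactly k events at rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

print(f"Estimated rate: {rate:.2f} events/year")
for k in range(6):
    print(f"P({k} events next year) = {poisson_pmf(k, rate):.3f}")
```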

While I am critical of several points in this paper, my excitement about the approach remains very high. If one could develop warning systems based on alarms - alarms which, if ignored, lead to negative consequences, and which are more frequent than the ASPs considered in the present paper - then I see dynamic risk assessment as another tool in the control engineer's toolset.

However, I remain very critical of the current state of the peer review process and of the huge pressure to publish, which create papers far worse than the two I have critiqued here. I also maintain that both papers critiqued here contain significant scientific contributions.
If the peer review process is not improved, and simple counting of the number of publications and citations as a measure of scientific production is not changed, then I fear for those students of the future who have to review the literature within a subject area in order to obtain a graduate degree. Or maybe the literature review will just disappear.
IBM and their Watson computer have made great progress in analysing unstructured information and helping doctors diagnose diseases such as cancer. I wonder if we could teach Watson to help the human reviewer by, e.g., rating the relevance of different citations and spotting missing relevant citations. I think it would be worth a test, and I hope one of the major scientific publishers will approach IBM about this idea.

Thursday, June 12, 2014

Are the best also the safest?

In the May issue of Hydrocarbon Processing there is a viewpoint titled "What characteristics define the world's best refineries?". It is by Jon D. Stroup, who is a senior consultant with HSB Solomon Associates in Dallas, Texas. A short version was presented at the 112th American Fuel & Petrochemical Manufacturers Annual Meeting in March this year, and the AFPM version is freely available on the internet.

For 30 years HSB Solomon Associates have been gathering performance data for refineries around the world, giving their clients a view of how the client's own refinery performs relative to all refineries being monitored. In the HP viewpoint Jon Stroup claims that the best refineries have high availability of their plants. He further states that there is a strong correlation between high availability and low total maintenance cost.

The best are not necessarily the lowest-cost refiners. On the cost spectrum the best refiners lie on the borderline between the first and second quartile with respect to cost. The article further states that the best refineries don't have the longest turnaround times. Instead they optimize the cost and length of outages.

Now, given the above information, I cannot help wondering if the best refineries are also the safest refineries. I believe they must be, but it would be nice to have data from the HSB Solomon Associates studies to support this.

If best and safest correlate, then we need to develop a tool for the process industries in general using the ideas which HSB Solomon Associates have been applying for the past 30 years.

Wednesday, June 11, 2014

An article for your board of directors!

After the BP Texas City Refinery explosion and fire, which killed fifteen people and injured many more, we got the Baker report, which clearly stated that company boards of directors have responsibility for process safety at their facilities. You can read what the Washington Post stated about this report here, and you can download the report from the CSB website here. (Now, 7 years after its release, all BP webpages relating to the report appear to have disappeared. I wonder what that says about BP and safety?)

Some companies have been taking important steps to ensure that all their process hazard studies, including HAZOP, are performed to a consistently high level. However, to accomplish this you need involvement at the board level to ensure that those leading your PHAs and other process safety activities have clear knowledge of what the company expects of them. The board also needs to ensure that adherence to the approach decided on is regularly tested. One of the major German chemical companies has done that. They now have internal procedures for what training a person needs before she/he is allowed to lead a safety study.

However, even if you don't have access to safety professionals or board members of this German company, you can improve your own company's situation by having board members read the article "Minimize false assurances in hazard analyses" by Mike Sawyer in the May issue of Hydrocarbon Processing (a subscription is needed to download the article).

Sawyer points out where many hazard analyses fail. A study, e.g. using the HAZOP approach, is performed based on deviations from normal operations. Often this will not cover items such as the process safety impact of overdue or deferred process safety management audit findings and action items, delays in completing management of change, systems operating without defined operating parameters (e.g. not-to-operate-beyond limits), deferred preventive maintenance activities, overdue or deferred inspections of e.g. vessels and relief valves, bypassed or out-of-service critical alarms, blocked relief valves, overdue operating procedure reviews, incomplete or out-of-date process safety information, isolation philosophy for emergencies, operating-envelope changes, increases in operational and/or maintenance tasks per shift, and equipment failures incorrectly documented as routine maintenance. These are just some of the items listed by Sawyer in his Table 1.

Other common deficiencies in hazard analyses listed by Sawyer are: failure to identify changed operating parameters, failure to identify re-rating of vessels or equipment, including out-of-service safety devices as safeguards, over-reliance on operator intervention to mitigate emergencies, including inspections/tests as safeguards when these have been deferred, under-estimation of hazard severity, and confusion between hazards and consequences.

Sawyer also mentions how JSA/JHA in many organisations has become just another checklist to be completed before one can start the work. The issue here is culture, not the tool provided. A major Norwegian company started paying more attention to their contractors on a daily basis, and this had an impact on the number of near-miss events. Sometimes a change in attitude is all that is needed.

I believe that Sawyer's article also indirectly tells us why the many academic attempts to automate HAZOP over the past 20 years have all, to a large degree, failed. These studies mostly fail to deal with the organizational and training issues. Another reason could be the academic fascination with models, without providing methods to ensure that the models used are actually suitable for their intended use.

In my view, improving process safety requires a fundamental change in culture in many companies and in the societies these companies are part of, and not just another model.

Friday, June 06, 2014

How much can a video do to prevent future events?

Yesterday the CSB released an 11-minute animation of the Deepwater Horizon blowout, which can be downloaded here as a QuickTime movie. The video elegantly shows what happened in the blowout preventer on April 20th, 2010 in a fashion that gives even non-experts a chance to understand why this device failed to prevent the largest environmental disaster in US history (I disagree with this label; several prior events, such as the Exxon Valdez in Alaska or Love Canal, seem to have had a much longer-lasting impact than this one).

However, I am very disappointed that the CSB does not, with a single word, mention in the video the organizational issues on the platform, the conflicting agendas of the platform owner and the well owner, and the failure of the well owner to escalate decision making to the proper level in their organization. A platform like the Deepwater Horizon costs a substantial amount of money to operate, so the cost of just a few extra days at the Macondo well could potentially influence the share price of the well owner. Therefore a decision to delay the movement of the platform, in order for the cement used to seal the well to properly set, could have a cost that requires a decision at a much higher level in the company than the well manager on duty on the platform. The procedure for such an escalation did not exist within the organisation of the well owner.

Five years earlier, ExxonMobil was drilling in a nearby area when kicks were experienced. A similar discussion at the Blackbeard well, between people wanting to continue drilling and people wanting to stop drilling, was within hours moved from the platform level to company headquarters. The company CEO at the time chose to stop drilling at a large loss. That saved the company's image.

I think we need much more discussion of organisational issues and decision-making structures in order to make a step change in the industry's safety performance. Similarly, it would be very nice if the CSB also provided us with educational videos which showed the impact of such organisational questions. That would help teach engineers that there are more questions and answers than the purely technical ones.

Unfortunately, a proposal put before the EPSC TSC in the early years of this millennium to look at the impact of organisational changes on process safety did not get off the ground. I hope someone will get the ball rolling.

Thursday, May 08, 2014

How can Internet of Everything improve safety?

Yesterday I attended Cisco Connect 2014 at the Bella Center here in Copenhagen. This is an event where Cisco gives details about their latest offerings and tells us why our networks will need to move more and more data over the coming years.

After a welcome by Cisco Denmark CEO Niels Münster-Hansen we were treated to two keynotes. The first was by vice-president Maciej Kranz from the Cisco Corporate Technology Group, and the title of his presentation was "Connect the Unconnected". According to him and Cisco, by the end of the current decade 50 billion devices will be connected to the internet. That is more than six devices for each human being on earth! Not until a breakout session in the afternoon did I realize what he was talking about.

During a collaboration breakout session in the afternoon, Stephen Quatrano from Cisco in the USA presented a use case example which started my thinking. The scenario was an older relative who needed to take his or her medication regularly. When it was time to take the medicine, the top of the container would light up. Simultaneously, miles or continents away, another light plugged into an outlet at a son's or daughter's home would also light up - telling them that it was time for mother or father to take her or his medicine. The older relative opens the medicine container, and the light in the son's or daughter's house changes intensity or shape. Furthermore, inside the lid was a button which the older relative could push if the container was empty - thus automatically ordering a refill through the designated doctor and the local pharmacy. Some would probably call this unnecessary surveillance. But not taking one's medicine can for some of us be life threatening - if I forget to take my blood thinner, my risk of blood clots increases significantly. However, I believe the use case shows that through the internet we can assist even when we are not there in person. In this case it improves personal safety. And I can easily come up with other cases.

Consider a case where traces of impurities or foreign elements are found in vials from a particular laboratory. In such a situation the laboratory recalls all vials from a particular batch through letters, newspaper ads, notices on websites and such things. This takes time, and the chance that all relevant vials are discovered is not that great. If the packaging for the vials contained a GPS locator, then the producer would be able to follow the package, most likely all the way to the final consumer. The information could be kept encrypted until a situation such as the one described here arises, and authorities with the proper credentials could use the information to quickly track down most of the affected vials. That would improve safety in the case of a medical product recall.

Let us consider another situation. A company suddenly experiences leakage problems with a particular type of valve. The manufacturer is alerted and discovers problems with the seat seal composition in this valve from a production run some years ago. Again, sales records could probably be used to contact others with valves from the same production run in service, but if the valves were internet enabled, then this feature could likely be used to locate the valves. Once the valves were located, it would probably be easy to fix them before a serious leakage occurred.
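
A minimal sketch of the lookup the manufacturer would need, assuming each internet-enabled valve reports a serial number, production run and last-known location to some registry; all names and records below are made up.

```python
from dataclasses import dataclass

@dataclass
class ValveRecord:
    serial: str
    production_run: str
    site: str          # last reported location
    in_service: bool

# Hypothetical registry populated from valve self-reports.
registry = [
    ValveRecord("V-1001", "RUN-2009-07", "Refinery A, unit 300", True),
    ValveRecord("V-1002", "RUN-2009-07", "Terminal B, tank farm", True),
    ValveRecord("V-1003", "RUN-2011-02", "Refinery A, unit 100", True),
    ValveRecord("V-1004", "RUN-2009-07", "Warehouse, spare", False),
]

affected_run = "RUN-2009-07"   # the run with the bad seat seal composition
to_fix = [v for v in registry if v.production_run == affected_run and v.in_service]
for valve in to_fix:
    print(f"Schedule seal replacement for {valve.serial} at {valve.site}")
```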

Similarly, internet-enabled equipment could report on operating-range excursions and alert the owner to change the equipment before a serious event. This would also improve safety.

You can probably find other scenarios from your own facility that could benefit from the Internet of Everything discussed at Cisco Connect in Copenhagen yesterday.

PS: Cisco also demonstrated their latest video-meeting offering with voice- and face-activated cameras and automatic zoom-out if someone moves around in the meeting room. However, I find it quite impressive how close to that offering Google Hangouts actually is. And so far that is free - i.e. without cost to the user.

Sunday, March 30, 2014

Do company boards focus on process safety?

It is the time of the year when company annual reports arrive in the mailbox, and it is always interesting to see how much focus these reports put on process safety. The annual report is a report from the company board - the highest level of management of the company - to the company shareholders. So if the report has significant coverage of safety or process safety, then this is a clear message to shareholders that the board is focused on running a safe business.

The role of company boards in process safety changed after the release of the Baker report following the explosion and fire at BP's Texas City Refinery in March 2005. From friends in HSE departments there are clear indications that at least some boards now take process safety very seriously. Another such indication is the success of process safety courses aimed at board members.

Canadian Pacific

One of the first reports to arrive was the annual report from Canadian Pacific (CP). CP is a railway company, and last year Canada experienced its worst train disaster in more than 100 years: the derailment of an oil-carrying train at Lac-Mégantic on a section of track formerly owned and operated by CP. In this year's annual report Canadian Pacific devotes 2 pages to safety. In my view the key message appears to be that the company "is intensifying its efforts to build a culture of safety across the organisation". That is good, and it is clearly needed in a company which experienced several serious derailments in 2013:
  • On September 11th eight cars of a CP train carrying a diluting agent used in oil pipelines derailed at a rail yard in southeast Calgary. More than 140 homes were evacuated.
  • On July 27th a CP locomotive and seven cars carrying oil left the tracks in Lloydminster on the border between Alberta and Saskatchewan. 
  • On June 27th seven cars - some of which carried ethylene glycol - of a CP train derailed while attempting to cross the flood-swollen Bow River in Calgary.
  • On June 2nd eleven cars of a CP train derailed at a trestle bridge east of Sudbury in Ontario. Half ended in the river, and a drinking water advisory was issued.
  • On May 23rd a runaway CP train car rumbled through the southern Alberta community of Okotoks.
  • On May 21st five cars of a CP train near the village of Jansen in southeastern Saskatchewan derailed, spilling 91,000 liters of oil.
Among its performance indicators CP reports both personal injuries per 200,000 employee hours and train accidents per million train-miles for the last three years, as defined by the Federal Railroad Administration in the US. The data are shown in the bar graph below, and it appears the two indicators are correlated.
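
Correlation between two such indicators is easy to check, though with only three annual data points it is weak evidence either way. A sketch with hypothetical numbers (CP's actual figures are in the bar graph referred to above):

```python
from statistics import correlation   # Python 3.10+

# Hypothetical three-year indicator values, NOT CP's actual figures.
injuries_per_200k_hours = [1.8, 1.6, 1.7]
accidents_per_million_train_miles = [2.0, 1.7, 1.9]

r = correlation(injuries_per_200k_hours, accidents_per_million_train_miles)
print(f"Pearson correlation over three years: {r:.2f} "
      "(too few points to draw firm conclusions)")
```
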
Unfortunately CP does not provide goals for their safety performance indicators. If the company is serious about building a safety culture, then goals could be a good idea. However, in my view such goals cannot stand alone. There must also be guidance, so that employees are encouraged to take the safe course of action, e.g. not crossing the Bow River when it is seriously swollen.
Clearly the CP board has some focus on safety, but it could be improved by setting performance goals in the safety area and publishing these, just like The Dow Chemical Company does.

Imperial Oil

Imperial Oil has stopped publishing the traditional annual report to shareholders - as far as I can tell. Imperial Oil is a major integrated oil company with operations only in Canada, and it is affiliated with ExxonMobil.

This year shareholders received a letter from the CEO and the annual financial statement. The latter does not mention safety at all. The letter to shareholders uses the word 'safety' eight times - CEO Rich Kruger mentions it in his opening remarks in the context of an "intense focus on the fundamentals - safety, operational integrity, reliability and profitability". Rich Kruger goes on to compliment employees for their commitment to achieving a workplace where Nobody Gets Hurt. Further, in the section on operational highlights it is stated that "Working more than 44 million hours, our second highest total on record, we achieved workforce safety performance on par with 2012's best ever". However, I find it a bit disappointing that a company with such a clear focus on safety - see below - does not find it worthwhile to communicate its safety achievements directly to its shareholders. By this I mean setting long-term goals towards which progress can be measured with relevant KPIs.

On the Imperial Oil website the safety information is found under the heading "Community & society". Here it is clearly stated that the goal is "Nobody Gets Hurt", and that safety is the number one and first priority. Unfortunately it seems there have been cut-backs in website maintenance. To the right of the nice words are links to the 2009 safety record, with information about what the company said in 2008 it would do, what it did in 2009, and what it planned to do. A similar record can be found for 2010, but I did not succeed in finding one for the following years.

The data on Imperial Oil's safety performance IS available on the website. For the corporate performance you have to look at the Corporate Citizenship Report, and the safety record is something to be proud of. During the years 2008-2012, when the huge Kearl project was being constructed among other activities, the company achieved zero work-related fatalities among both employees and contractors. During the same period the total recordable incident rate went from 0.35 to 0.15 among employees and from 1.07 to 0.37 among contractors. However, the standard deviations indicate that the change may not be statistically significant.
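Whether such a drop is more than random variation can be checked with a simple rate comparison. The sketch below treats recordable incidents as Poisson counts and uses a Wald test; the exposure hours are assumptions for illustration, since the report only gives the rates.

```python
# A minimal sketch of testing whether a change in TRI rate is statistically
# significant. Incident counts are treated as Poisson; exposures are hypothetical.
import math

def rate_change_z(count_1: int, hours_1: float, count_2: int, hours_2: float) -> float:
    """Wald z-statistic for the difference between two Poisson incident rates."""
    r1, r2 = count_1 / hours_1, count_2 / hours_2
    se = math.sqrt(count_1 / hours_1**2 + count_2 / hours_2**2)
    return (r1 - r2) / se

# Hypothetical: 35 recordables in 20 million hours (TRI 0.35 per 200,000 hours)
# versus 15 recordables in 20 million hours (TRI 0.15 per 200,000 hours).
z = rate_change_z(35, 20e6, 15, 20e6)
print(f"z = {z:.2f}")  # |z| > 1.96 would indicate significance at the 5% level
```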
Imperial Oil also publishes community reports or Neighbor News in the communities in which it operates, and these contain safety performance information related to the particular site. For example, at the Sarnia site the total recordable incident rate went from 0.51 in 2010 up to 1.49 in 2011, and then down to 0.21 in 2012. The site numbers for refinery sites like Nanticoke, Sarnia and Strathcona also appear to be higher than the corporate numbers.

Imperial Oil continues to have a clear focus on safety with a stated goal of Nobody Gets Hurt. One must compliment the company for being open about a very difficult goal. However, the community reports show that there is room for improvement in the downstream operational units of the company. I would suggest that setting long-term - i.e. 5-10 year - sustainability and safety goals may help achieve this.

EnCana

EnCana, which is a North American energy producer, uses the word 'safety' just seven times in its more than 100-page-long annual report. There is no information about safety performance in the report, and the fact that the section on 'Risk Management' has three subsections labelled 'financial risks', 'operations risks', and 'environmental, regulatory, reputational and safety risks' in my view shows where the focus is in this company.
The TRI performance of EnCana appears much like that of Imperial Oil. Generally employee performance is better than contractor performance. The two companies are also similar in only reporting KPIs in the area of occupational safety, even though the CCPS has suggested a set of process safety indicators, and these have been endorsed by some organisations.

EnCana - like Canadian Pacific and Imperial Oil - does not set publicly available performance goals in the area of process safety or occupational safety. This is unfortunate, since without such goals we don't know whether the year-to-year variations on a chart like the one above just happen by chance, or whether there is a program to reduce the contractor TRI rate.

Cenovus

Cenovus is a Canadian oil company focusing on the upstream end of the oil and gas business. In the company's annual report for 2013 the CEO states that neither he nor the board is satisfied with the current safety performance of the company.

Unfortunately the CEO does not state clear and quantified objectives for the company's safety performance - and we are just talking about occupational safety - just like the previous three companies.
Again the picture here is very similar to that of the TRI for Imperial Oil and EnCana. I suspect that in order to improve both occupational safety and process safety Cenovus needs to take a good look at its systems - especially those for investigating incidents. Do the incident investigations dig deep enough? Or do they stop with the first identified mistake?

Conclusion

There appears to be a general reluctance to set specific goals in the areas of process safety, occupational safety and sustainability. This is unfortunate, since only by setting quantifiable and public goals will the industry and individual companies move in the direction of improved process safety and improved sustainability.

In this connection it is worth remembering that when the CEO of The Dow Chemical Company set the first set of safety goals in 1985, neither he nor anyone else in the company had any idea how to achieve them - only a belief that with a joint effort they would be achievable.

Stating occupational safety and process safety goals will probably not be sufficient on its own to achieve improved safety performance. Most likely the investigations of incidents also need to go a step deeper. Take a look at the recent Chemical Processing article by Nancy Leveson and Sidney Dekker, "Get to the Root of Accidents".

Thursday, March 13, 2014

Transport of crude - by railcar or by pipeline - what are the NGOs fighting for?

Currently in Canada there is strong opposition to the National Energy Board's (NEB) decision to allow the flow in Enbridge Line 9, which now runs from Montreal to Sarnia, to be reversed, so that in the near future pre-treated bitumen from Alberta's oil sands can flow from Sarnia to refineries near Montreal.

During media coverage of the decision CTV W5 - a Canadian TV news program - has discovered that over the 38-year history of Enbridge Line 9 there have been 35 spills. The graphic below from the Toronto Star shows the locations of these spills and the amount of oil involved.



To me most of the spills - except for two near Edwardsburgh/Cardinal and two at Terrebonne near the Montreal terminal - have been minor. CTV W5 also uncovered that during the past 10 years 22 spills have been reported to the authorities. However, it appears that the different reporting requirements of the NEB and the Ontario Ministry of the Environment are part of the issue. Another reporting issue is whether a spill occurred along the main line or at a company facility associated with the pipeline. There is clearly room for improving the uniformity of the reporting requirements to local and national authorities.

Most people agree that transport of crude oil by pipeline is safer than transport by railcars. Even the US-only Pipeline Safety Tracker acknowledges this. So reversing the flow of crude in order to avoid a large number of daily crude oil trains makes a lot of sense. In my view the NGOs' opposition to this seems completely unjustified.

However, in its decision the NEB also allowed the capacity of the pipeline to be increased from 240,000 barrels of oil per day to 300,000. That is a 25% increase. Since the diameter of the pipeline is fixed, this can only be done by increasing the flow rate through the pipeline, and this requires either increasing the outlet pressure at pumping stations along the line or adding compounds to lower the viscosity of the transported crude. The former approach clearly has some integrity risk with a 38-year-old pipeline, and the latter is probably too costly. Therefore it makes perfect sense that the NEB requires Enbridge to improve its pipeline integrity monitoring.
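A back-of-the-envelope calculation illustrates the integrity concern. With the Darcy-Weisbach relation and a roughly constant friction factor, the frictional pressure drop in a fixed-diameter line scales with about the square of the flow rate, so a 25% capacity increase means roughly 50-56% more pressure drop to be supplied by the pump stations. The sketch below is only an order-of-magnitude illustration, not a pipeline hydraulics calculation.

```python
# Rough scaling of frictional pressure drop with flow rate in a fixed-diameter line,
# assuming Darcy-Weisbach with a roughly constant (or weakly varying) friction factor.

def pressure_drop_ratio(q_new_bpd: float, q_old_bpd: float, exponent: float = 2.0) -> float:
    """Ratio of new to old frictional pressure drop for the same pipeline."""
    return (q_new_bpd / q_old_bpd) ** exponent

print(pressure_drop_ratio(300_000, 240_000))       # ~1.56: roughly 56% higher pressure drop
print(pressure_drop_ratio(300_000, 240_000, 1.8))  # ~1.49 if the friction factor drops slightly
```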

To satisfy the public's need to feel - and be - secure, it would have been nice if the NEB had also required Enbridge to regularly release information to the public about the findings of its integrity monitoring of the Line 9 pipeline.

Tuesday, February 25, 2014

Have your online analyzer developments come far enough?

On ControlGlobal Paul Studebaker has an editorial about the last 25 years of development in process analyzers. The editorial is accompanied by a timeline of the key developments over the past 25 years. However, there are at least two developments which I find missing from the list.

The first of these probably goes back further than 25 years. That was when we learned not to tweak the analyzer settings each time we ran a standard test through it. Overadjustment of online analyzers was probably a significant cause of extra process variation in the late seventies and early eighties. Each week a technician would take an analyzer offline, run a standard sample through it, and based on the result adjust the analyzer settings.
In the eighties we learned the basics of quality control, and started only adjusting the analyzers when the result from the standard sample was outside the control limits on our control chart. Thank you, Dr. W. Edwards Deming! Read about the basics of statistical process control (SPC), including control charts, on Wikipedia.
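The adjustment rule itself is simple enough to show in a few lines. The sketch below assumes a Shewhart-type chart with a known centre line and a sigma estimated from historical standard-sample results; all numbers are made up.

```python
# Minimal sketch of the "adjust only on an out-of-control signal" rule.
# Centre line and sigma are assumed known here; in practice they are estimated
# from past standard-sample results. All numbers are hypothetical.

def needs_adjustment(result: float, centre: float, sigma: float) -> bool:
    """True only if the standard-sample result falls outside the 3-sigma limits."""
    return abs(result - centre) > 3 * sigma

# Standard sample with nominal value 5.00 and historical sigma 0.05:
for reading in (5.03, 4.92, 5.18):
    action = "adjust" if needs_adjustment(reading, 5.00, 0.05) else "leave alone"
    print(reading, action)
```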

The second happened after I switched from industry to academia about 25 years ago. My former employer in Canada eliminated the quality control laboratory in their chemical plant. No, they did not eliminate quality control - only the lab. All necessary analyses were converted to online analyzers, and in the few cases where this was not possible, a small lab was established close to the control room to perform the necessary tests. The reasons for the change were many, but a key one was the delays involved in taking a sample, getting it to the lab, getting it analyzed, and getting the result back to the control room. Each of these steps was also a source of error. The result was that the process operators did not make any process adjustments based on the quality control measurements - neither when necessary, nor when not necessary. The conversion of the quality control laboratory to online analyzers had many positive effects. For example, many impurities that had been measured on a single sample every 24 hours were now measured by online analyzers every 2, 4 or 8 hours. This naturally increased the operators' awareness of these measurements.

So have you also moved your quality control laboratory online? If not, I think you have not gone far enough!

Monday, February 24, 2014

Trends in incidents in February 2014

I have done a highly unscientific survey of spills, fires, explosions etc. at fixed chemical facilities during the first three weeks of February this year. For this I have primarily used an RSS feed called "Recent Chemical Incidents at Fixed Facilities" created by Meltwater News. This RSS feed is available on the CSB website, and may also be followed on Feedly.

I have found 25 articles about events which I would characterize as process safety incidents. Unfortunately all of these are about events in the USA, so the picture is probably biased. Not unexpectedly, events are first reported by local news media, and some days later they are picked up by NGOs such as Nation of Change or Think Progress.

However, the 25 articles cover explosions, fires, spills and blowouts in 12 different states. On the positive side, only a single fatality is mentioned across these 25 stories. Spills are involved in more stories than any other type of event, including coal slurry spills from old and even shut-down facilities. I guess the industry and society have not learned the lessons from Love Canal.

My greatest concern is, however, the failure of a blowout preventer at a fracking well in North Dakota. In Denmark we are currently discussing whether to allow preliminary exploration for oil and gas using this new technology, and the opposition to this possible new activity is quite well organized here. So any incident involving fracking anywhere else will not help development here. And the failure of a blowout preventer brings back memories of BP's Deepwater Horizon.

Last year's incident in Quebec was a wake-up call for the industry involved in the transportation of crude oil by rail. Nonetheless, the NGOs report that more crude oil was spilled in 2013 than in the previous four years. Does this mean that transportation safety has decreased? Not necessarily! Both the number of trains and the number of rail cars with crude oil could have increased significantly in 2013 compared to the previous years. Both the transportation sector and those who own the crude being transported owe it to the public to come out with more information about rail transportation of crude oil.

Accidents or Incidents?

Many people use the terms incidents and accidents interchangeably. This, in my view, is rather unfortunate. In my view incidents are undesired events which could have been prevented. For example, a fire in a pot of oil in the kitchen can be prevented - or at least its likelihood reduced - by using a heating source which doesn't involve open flames. Most chemical engineers have learned this in their first laboratory course in organic chemistry. Accidents cannot be prevented. For example, you cannot prevent another driver from running a red light and crashing into your car. However, you can reduce the consequences using the techniques of defensive driving.

I get quite upset when officials are very quick to label an event as an accident. A recent fire at a scrap metal plant in Iowa is a very good example. The scrap metal fire at Rich Metals on February 11th has been labelled an accident, because according to the Blue Grass, Iowa police chief "a worker was grinding metal when a spark or hot metal ignited a pile of metal turnings covered with oil". My immediate question upon reading this story was: Why was a pile of metal turnings covered with oil left close to a potentially spark-producing process?

So the so-called accident could have been prevented by placing the pile of oil-covered metal turnings further away from the potentially spark-producing grinding process. This makes the event preventable, and hence it is an incident, and NOT an accident.

So it appears clear that housekeeping at Rich Metals could be improved by a proper investigation of this incident. But is there anything else which could be improved by an incident investigation? What about worker training? Did the worker know about the dangers of the sparks produced by the grinding process? Was the worker trained to evaluate the safety of the work area prior to starting the work?

In Denmark employers are required by EU law to perform a workplace safety assessment (Danish: arbejdspladsvurdering) before any work is performed. This assessment is to ensure the work can be carried out with minimal harm to the employee. The assessment can be carried out by the employee who will perform the job, by another company employee, or even by an external consultant. The assessment must be documented.

So let us call every undesired event in our plants an incident or process safety incident until a proper incident investigation confirms that it really is an accident.

Even if lightning strikes - a so-called act of God - as it did on Tank 11 at Sunoco's Sarnia Refinery one summer night in 1996, it is still possible to learn from the event by using proper accident investigation techniques.

Thursday, February 06, 2014

Cloud - now much clearer!

Yesterday I attended an HP Software event here in Denmark titled "HP Discover Brush Up" at the Danish HP headquarters in Allerød. You can watch videos from HP Discover in Barcelona last December here. It was a rewarding afternoon. Country manager for Sweden and Denmark Lene Skov welcomed us before Rolf - unfortunately I have forgotten his last name - gave us HP's version of current trends in IT, which they call "The New Style of IT". This is all about mobility, security, cloud and big data, and Rolf presented a vision for the enterprise of 2020. A significant part of this was naturally HP's big data platform HAVEn. HAVEn, just like other big data platforms - except maybe Tableau - is founded on the open source software Hadoop.

After the keynote lecture there were four tracks. The one I attended was cloud and automation. A key new product offering was apparently Cloud Service Automation (CSA), which allows you to configure and deploy a system including networking, storage etc. The system can be deployed in your own cloud, in HP Cloud or with a number of other providers, e.g. Amazon S3 or Dell, with the HP Marketplace Portal functioning as a broker. HP Cloud provides a 90-day free trial - really nice for small independent consultants who see the benefits of deploying in the cloud instead of on premise.

HP CSA is based on TOSCA, which is an open standard for defining and describing cloud service offerings. This makes whatever you define using CSA transferable among the different service providers. If you want to get your feet wet without spending any money and learn more about the technology under the elegant HP portal, you could download the latest openSUSE 13.1, which includes access to an OpenStack implementation including orchestration tools.

Orchestration is something I heard about some years ago, before everyone started talking about cloud. At the time Novell had a product called ZENworks Orchestrator. It was about virtual machine definition and deployment, and you can read about it here. Today the offering has developed into PlateSpin Orchestrate.

When I think back on IT hardware development over the past 40 years, it appears we have come full circle. When I studied at the Technical University of Denmark in the early 70's our main computing facility was called NEUCC, for Northern Europe University Computing Center. It started with some IBM 7000-series systems and later evolved to the 360s and 370s. You delivered your stack of punch cards to the machine room and picked up the printed output some hours later. Later a remote terminal room was created, where you had your card deck read in one end of the room and picked up the output at the other end some minutes later. In the late 70's, during my studies at the University of Alberta, we accessed the Amdahl mainframe through remote terminals and DECwriters, but still had to pick up the output near the actual machines. That is, we needed to know where the machines were!

In the following decade the mainframe was declared dead many times, especially by IBM's competitors. However, mainframe technology is still with us, and it has advanced, so today the systems can be repaired and expanded without outage. And thanks to virtualization technology you can deploy a new OS instance within a few minutes. Then around the turn of the millennium we started seeing something new: so-called blade servers. At the same time IBM started running Linux on their mainframes - in the beginning a few hundred virtual Linux instances on a single System z, but today more than 60,000 virtual Linux instances on a single System z without extender.

During the past 5-6 years the major buzzword has been "cloud computing". I think it started with the idea that the network is the computer, and then it became less and less relevant where the computing power is, compared to how much you have access to. Today all the major players on the market offer you the ability to buy a cloud computer. However, there is very little information on what hardware is actually involved here. My guess would be that it is simply a container with some blade servers, some storage and some networking hardware, which can be configured over the internet to perform the services you need. So what is the difference between this private cloud and the IBM mainframe? I would say the supplier, and possibly that the mainframe has more compute power per square foot than most cloud offerings. But basically both the mainframe and the cloud computer are just a large amount of computing power!

The other big current question is public versus private cloud, i.e. should you own the hardware and have it located on your premises, or should you rent the compute power when needed, either from Amazon EC2 or Google Compute Engine? For many uses, e.g. university teaching and often also research, it makes little sense to have the compute power on premise, since there will be many hours of the day when it is not needed. It would make more sense to access e.g. Google Compute Engine when the compute power is needed. That same compute power could then benefit European universities while American students are sleeping, and so on. Just like we shared mainframes in the past! What do you think?

One final note! Currently I am migrating our scanned mail to Google Drive. This means I will no longer physically know where a given document is located. However, the benefit is that I can search for a document on Drive more quickly than I can currently click my way to the document through the folder structure on my hard disk. I will also be able to access my stuff at any location with a Wi-Fi connection, and I can easily download a document to my phone or tablet if I need it while offline, e.g. when travelling by plane. I decided to go all in after reading an interview with the Google manager in charge of security and after testing the connectivity at my home office location - about 15 meters and several internal walls from my access point. An additional benefit is not losing my data in case of a break-in at my home - probably more likely than a 24-hour Google outage. Only one thing concerns me: I am much older than Google!

Friday, January 24, 2014

What is "the cloud"? - I think yesterday it became a bit more clear

Yesterday, i.e. January 23rd 2014, I attended the IDC event "360 degrees of IT" in Copenhagen. It was a good event with speakers from IBM, Symantec, Hitachi, T-Systems and others. The focus of this year's "360 degrees of IT" was what IDC calls the 3rd platform, i.e. mobility, big data, social and cloud.

But what is the cloud that all IT people keep talking about? In the past I have learned that one can have either a private cloud or a public cloud. But what is this cloud? During the event it became a little clearer during the talk by Tony Franck, "Increased business value with cloud". He talked about Hitachi's Unified Compute Platform (UCP). According to Franck the UCP makes it possible to deploy new servers in just 46 minutes by inserting a couple of blades. Blades? So it appears a cloud, at least the Hitachi UCP one, involves blade servers. So I guess a cloud is just a sophisticated box with blade servers and storage well integrated - much like Oracle's Exadata or Exalogic boxes or IBM's System z mainframes. The only difference is that on a System z mainframe no hardware needs to be added to deploy new servers, and the process only takes a few minutes and not almost an hour (unfortunately I don't have any information about the similar processes on Oracle equipment). Really, I see little functional difference between an Exalogic box and a System z mainframe - except the name on the box. Am I seeing the world of the cloud as it really is? I.e. old wine in new bottles.

Arne Sigurd Rognan Nielsen, a Norwegian who has worked many years with IBM and whom I have heard before, talked about Social Business 2.0 - not as a service offering from IBM but as a process. Among the questions he asked was: "What is the business value of a license?", with the answer "Zero". I guess the question was asked from the point of view of the customer - not the vendor. Unfortunately Arne's Apple laptop did not play well with the projector in IDA's main conference room, so we could only see every second slide during the presentation.

Another presentation which deserves to be mentioned was that of T-Systems' Dieter Weisshaar, "Life and Business Changes - Zero Distance", in which he showed a grocery shop at a Japanese subway station that consists simply of pictures of groceries with QR codes on them. The customer scans the QR codes of the goods she wants to buy, pays for them using her phone, and later that day they are delivered to her home. I consider that the first intelligent use of QR codes.

Saxo Bank's Mikael Munck talked about using the majority of his IT budget on new developments year after year, and about a new social trading platform announced by the bank the same morning, called TradingFloor.com. Part of the secret of spending most of the budget on new development is focusing on just one platform, i.e. .NET, and on the needs of the traders, i.e. the customers. Mikael Müller from DR Technology talked about the transformation of the broadcaster from the old static world, in which they decided when, where and how, to the new dynamic media consumption model, where the users/customers decide when, where and how. Examples mentioned were the recent coverage of the municipal elections in Denmark from 56 locations using mobile phone video technology, and harvesting tweets in almost real time as part of the coverage of that election.

I found that even though I am not currently in the market for any of the services offered by the companies taking part in "360 degrees of IT", the day was time well spent to get updated on current trends. Although I was a bit surprised to see two presentations with copyright notices from respectively 2012 and 2011 at the end.


Sunday, January 12, 2014

It used to be my favorite periodical!

Many years ago, while I was a chemical engineering student at the Technical University of Denmark, I remember reading Hydrocarbon Processing regularly. It seemed to me that the issues at that time were significantly thicker than the current monthly issues. I always found it fascinating to read about the many large construction projects going on across the globe.

Lately I have been somewhat disappointed with the articles I have chosen to read in Hydrocarbon Processing. My primary field of interest is process safety, so when I received the December issue some weeks ago I was pleased to find two safety-related articles: "Rethink the hazards in your process" by R. Modi and "Consider process-based failure analysis methods for piping and equipment" by D.L.N. Cypriano et al.

I started reading the first of these, and after reading the introductory paragraph I started wondering about the author's purpose in writing this article. It indicated that process safety was just about "meeting mandatory requirements". I was a bit shocked, but continued reading.

The next three sections were titled respectively "Managing process hazards safety", "Safety life cycle" and "Basis of SIS design". This appeared interesting, although I was wondering about the role of the word "safety" in the title of the first section.

It turned out that the section "Managing process hazards" was mostly about designing and operating the process within acceptable risk limits. I was stunned to read that "eliminating risk and still operating the facility in absolute safety is not practically possible".

The section "Safety life cycle" starts with mentioning the functional safety standards for electronic safety systems IEC 61511 and IEC 61508. But the author states that "to balance system design between safety and availability". Does the author really mean, that increased process safety means reduced system availability? I hope not! The author also seem to believe the analysis and realization phases of the process life cycle are more important than the operation phase. I respectfully disagree.

The section continues by stating that the major steps to comply with IEC 61511 are SIF identification, SIF assessment, SIF design and SIF validation. However, when the author goes on to write "Risk assessment determines the required safety instrumented level (SIL)", it leaves me with the impression that the knowledge of IEC 61511 could be deeper. I also wonder why any discussion of the importance of the company's allowable risk criteria, which are required to determine a safety integrity level, appears to be missing.

The physical components - input elements, logic solvers and output elements - of a safety instrumented system are correctly identified in the section "Basis of SIS design". But the sentence "...components are assessed for their complexity, inherent properties and behavioral uncertainty" I find very difficult to understand, and the same is the case with the sudden introduction of type A and B subsystems immediately following.

Unfortunately a direct error appears in Table 1, which lists factors to be considered in SIS design. The third factor, "Keep PFDavg value of each SIF superior or greater than the targeted value", should read "lower than the targeted value". I am not an expert on the design of safety systems - that is why I read this article - and maybe that is why I find Figure 3 difficult to understand, especially that PFDavg = SIL is a contributing factor for SIS voting. Maybe the word "voting" in Figure 3 should have been "design"?
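For readers unfamiliar with the low-demand SIL bands, the point about Table 1 can be illustrated with a small lookup: the achieved PFDavg must fall below the upper limit of the band for the target SIL. The bands below are the standard IEC 61508/61511 low-demand ranges; the example value is arbitrary.

```python
# Sketch of the IEC 61508/61511 low-demand SIL bands and the "lower is better" point.

SIL_BANDS = {  # SIL: (lower bound, upper bound) on PFDavg, low-demand mode
    1: (1e-2, 1e-1),
    2: (1e-3, 1e-2),
    3: (1e-4, 1e-3),
    4: (1e-5, 1e-4),
}

def achieved_sil(pfd_avg: float) -> int:
    """Return the SIL whose band contains the achieved PFDavg (0 if above the SIL 1 band)."""
    for sil in (4, 3, 2, 1):
        low, high = SIL_BANDS[sil]
        if low <= pfd_avg < high:
            return sil
    return 0

print(achieved_sil(5e-3))  # -> 2: a PFDavg of 0.005 meets SIL 2 but not SIL 3
```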

Sometimes things are explained with a few more words than really needed, as in "..., PFDavg, as the probability of failure to perform the desired safety function when demanded". Most engineers would just write "..., PFDavg, the probability of failure on demand".

The following section, titled "Contributing factors for SIS voting", first lists all the factors and calculations which enter into an SIS design, and then continues the discussion about safety versus availability without, in my view, making this so-called "balance" any clearer. However, I get the impression that SIS voting is just choosing which XooY voting structure to implement. Can it be that simple?
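As far as I understand it, that choice is usually illustrated with the simplified PFDavg approximations for identical channels - ignoring common-cause failures, diagnostics and proof-test coverage - which is what the sketch below does. The failure rate and proof-test interval are hypothetical; a real design would use the full IEC 61508 calculations.

```python
# Simplified, textbook-style PFDavg approximations for a few XooY voting structures
# (identical channels, dangerous undetected failure rate lambda_du per hour,
# proof-test interval ti_hours; common cause and diagnostics ignored).

def pfd_avg(voting: str, lambda_du: float, ti_hours: float) -> float:
    x = lambda_du * ti_hours
    formulas = {
        "1oo1": x / 2,     # single channel
        "1oo2": x**2 / 3,  # either channel can trip: lower PFD, more spurious trips
        "2oo2": x,         # both must trip: higher PFD, fewer spurious trips
        "2oo3": x**2,      # the common compromise between safety and availability
    }
    return formulas[voting]

# Hypothetical transmitter: lambda_du = 2e-6 per hour, proof test once a year.
for v in ("1oo1", "1oo2", "2oo2", "2oo3"):
    print(v, f"{pfd_avg(v, 2e-6, 8760):.1e}")
```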

The final sections are titled "Other factors", "Effective factors to design an SIS" and "Other measures for SIS life cycle design". The first of these continues the discussion of voting structures after again listing a number of factors which should be considered in SIS design. As far as I can tell, the section "Effective factors to design an SIS" doesn't contain a single word relating to effective design, at least from an engineering perspective. The final section, "Other measures for SIS life cycle design", states the important and surprising fact that "the optimum is the best design".

My conclusion is that the title of the article has very little relation to its content, and hence I recommend that instead of wasting time reading this article, you go directly to the book "Safety Instrumented Systems Verification" by William Goble and Harry Cheddie mentioned in the bibliography.

The other article mentioned at the start of this blog post is better. It contains a nice case study of the investigation of leakages in a boiler feed water system. But again I have difficulty seeing the link between title and content. It appears the editors of HP want titles to be as broad as possible to increase readership. But aren't engineers smart enough to see through that? Maybe HP articles are not aimed at engineers?