Sunday, October 26, 2008

What is a process safety incident?

During about 10 years at a major Canadian petrochemicals producer, a number of undesired events occurred at the plant and at other facilities with which we collaborated. Among them are the following events:
  • An employee jumps from a four-foot-high platform onto a concrete floor in the packaging area of a polyethylene plant. Unfortunately he does not land on his feet, but falls and breaks his back. He is killed instantly. No chemicals were involved. No process equipment was involved. No acute release took place. It is definitely an undesired event.
  • During maintenance work on a building housing a polyvinyl chloride plant a ground-operated man lift was used. The bucket controls malfunctioned, resulting in an employee in the bucket being squeezed to death against some piping on the outside of the building. No chemicals were involved. No acute release took place. It is definitely an undesired event.
  • An unplanned shutdown of a polyethylene reactor is an abnormal process condition, which leads to the burning of a significant amount of hydrocarbons and an intense heat flux to nearby ground areas - up to more than 100 feet away from the flare. This event is also a quite costly one - probably in excess of $100,000. There are chemicals involved, but they are released or burned through equipment designed for this event. There is definitely an acute release, but it is through equipment designed for it. It is definitely an undesired event.
  • During patching of some software on a Honeywell PMX system two constants in the computer system were interchanged. The result was that valves which should close did the opposite, i.e. they opened, and valves which should open also did the opposite, i.e. they closed. The result was significant process deviations and releases through safety valves. This was definitely an undesired event.
Clearly not all undesired events should be counted as process safety events, or maybe they should only be counted as process safety events to some degree. According to the CCPS criteria for a process safety incident none of the above events qualify as a process safety incident. However, they all happened because there was a chemical plant.

Could these events have been prevented? Yes, the fall from the packaging platform could have been prevented by equipping it with stairs in place of the vertical ladder. The squeezing of an employee could have been prevented by installation of a dead-man button in the bucket. The incorrect patching of the control system software could have been prevented by patching during plant downtime and testing before plant startup. Would these events have happened if there was no chemical plant at the site? No. They are linked to the production of chemicals!

All four events were treated by management as major undesired events associated with their facilities. The first two events had a significant impact on every engineer at the site where they occurred. Three of the four events were treated as significant incidents by management. None of them qualify as process safety incidents according to the CCPS criteria. Why? Could it be that the CCPS purpose is just to count events which are caused by operations errors in running the process, maintenance errors in maintaining the process, or process design errors? I don't know.

A few years ago Shell UK reported in their annual report that they had experienced two fatalities the previous year. The fatalities did not involve Shell employees. The fatalities did not involve contractors working at Shell UK sites. The fatalities involved road accidents involving a subcontractor.

If we want to improve the performance of our chemical plants, then we need to focus on undesired events, and use our employees' skills to understand why they occurred and how we can prevent them in the future. Whether a particular event is labelled as a process safety incident or something else really is not that important, as long as we see the event as an opportunity to improve our performance. This approach is what drove the quality movement in Japan after WWII, and what drove Ford and many other companies in the 80's. A clear focus and belief that we can do better!

CCPS Global Process Safety Metrics - How can they be improved?

I was quite excited when I read the presentation "Global Process Safety Metrics - Driving Consistency and Improvement" by Tim Overton, who is chief safety engineer at The Dow Chemical Company. These first results of the CCPS Metrics Project were presented at this year's AIChE Spring Meeting and CCPS Conference on April 6th. The goal of the process safety metrics according to the presentation is to help company management answer questions such as
  • Is our company headed for a major accident?
  • How is our company's process safety performance compared to others?
and questions more aimed at the general public, such as
  • Which companies are becoming safer?
  • Is the chemical industry improving its process safety performance?
The CCPS Metric Project, of which Tim Overton is chairman, was initiated by the CCPS following recommendations in the Baker Panel Report and the CSB Report on the explosion and fire at BP's Texas City Refinery on March 23rd, 2005. Due to the apparent failure of BP's corporate management in monitoring process safety in their plants, the Baker Panel recommended that BP should "develop, implement, maintain and periodically update an integrated set of leading and lagging performance indicators for more effectively monitoring the process safety performance". And the CSB recommended that the API and USW should "create performance indicators for process safety in the refinery and petrochemical industries" and involve relevant scientific organizations and disciplines in this work.

In his presentation Tim Overton states that the current situation is characterized by process safety metrics which differ from organization to organization, and are likely based on incident definitions that are not well aligned to the actual hazard of the event. This could well be so, but how does this prevent such metrics from being used to drive improvement in process safety performance within the particular company in which they are used?

The aims of the CCPS Process Safety Metrics Project are
  • Define metrics that focus on process safety as contrasted to personal safety
  • Define common industry-wide lagging process safety metrics
  • Define near-miss or other lagging process safety metrics
  • Define leading indicator process safety metrics.
Therefore it is of course necessary to define what constitutes a process safety incident. In the presentation the criteria for a process safety incident are a) employee / contractor lost time injury, b) fire or explosion resulting in direct damage cost of at least $25,000, c) release from primary containment of quantities greater than chemical release threshold quantities. I see several problems with such a definition. Firstly, an event which involved just first aid but had the potential for large-scale injury and/or damage is not counted. Secondly, the amount of damage corresponding to a cost of $25,000 will change with time and probably also with location. Over time this could influence the historical trend of the metrics. Thirdly, defining threshold quantities indirectly defines distances of concern as used e.g. in the Dow Chemical Exposure Index. This is very relevant in risk assessment, but in my opinion seems odd in performance monitoring.
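The logic of the three criteria can be sketched in a few lines. This is only my reading of the presentation's definition - the parameter names are mine, and the "process involved" condition reflects why the four events described earlier do not qualify under the CCPS criteria:

```python
def is_process_safety_incident(process_involved, lost_time_injury,
                               fire_damage_usd, release_kg, threshold_kg):
    """True if the event arises from the process AND meets at least
    one of the consequence criteria a) - c) above."""
    if not process_involved:
        return False
    return (lost_time_injury                  # a) lost time injury
            or fire_damage_usd >= 25_000      # b) fire/explosion damage
            or release_kg > threshold_kg)     # c) release above threshold

# The fatal fall from the packaging platform: a lost time injury,
# but no process involvement, so it does not count.
print(is_process_safety_incident(False, True, 0, 0, 1000))  # False
```

Written out this way, the hard edges of the definition become obvious: a first-aid event with large-scale potential returns False, and a release one kilogram under the threshold returns False.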

Well, if we can somehow agree on how process safety incidents are counted, then the following lagging metrics can be defined:
  • Count of Process Safety Incidents (PSI).
  • Process Safety Incident Rate (PSR) = Total count of all process safety incidents times 200,000 divided by Total employee and contractor work hours.
Furthermore, by assigning a severity score to each process safety incident, a Process Safety Severity Rate (PSSR) can be defined as the total severity score for all process safety events times 200,000 divided by Total employee and contractor work hours. The 200,000 hours used in these calculations are of course 100 employees working 40 hours per week for 50 weeks per year. Using this number creates a parallel to the fatal accident rate or FAR. Indirectly, we thus know the count of incidents is per year.
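As a quick sketch, the two rate metrics can be written out like this (the incident count, severity scores and work hours below are invented numbers for illustration only):

```python
def psr(incident_count, work_hours):
    """Process Safety Incident Rate: incidents normalized to 200,000
    work hours (100 employees x 40 hours/week x 50 weeks/year)."""
    return incident_count * 200_000 / work_hours

def pssr(severity_scores, work_hours):
    """Process Safety Severity Rate: total severity score for all
    incidents, normalized to the same 200,000 work hours."""
    return sum(severity_scores) * 200_000 / work_hours

# A hypothetical site: 3 incidents over 1.2 million work hours.
print(psr(3, 1_200_000))                     # 0.5
print(round(pssr([1, 4, 2], 1_200_000), 2))  # 1.17
```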

My concern about the CCPS Metric Project increased significantly after reading the booklet "CCPS Process Safety Metrics". This booklet describes carefully which events are counted as process safety incidents and which ones are not. The different metrics appear to be created by engineers for engineers. Let me give an example. The booklet describes how a release of 933 kg per hour of gasoline is not a process safety incident, because it is below the threshold of 1000 kg defined for this kind of material. How accurately can we, in real-life situations, calculate the amount involved in a release? Can we be certain that this release did not involve 1017 kg per hour of gasoline? This difference, I believe, could be due simply to the value of the discharge coefficient used in the estimation or the uncertainty in the level measurement used to calculate the amount.
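To illustrate how sensitive such an estimate is, consider the standard orifice equation for a liquid release. With the hole size, density and driving pressure held fixed (all values below are assumed purely for illustration), merely varying the discharge coefficient within its usual range moves the estimate from one side of the 1000 kg/hr threshold to the other:

```python
import math

def liquid_release_rate(cd, area_m2, rho_kg_m3, dp_pa):
    """Liquid release rate in kg/s through an orifice:
    m_dot = Cd * A * sqrt(2 * rho * dP)."""
    return cd * area_m2 * math.sqrt(2 * rho_kg_m3 * dp_pa)

# Assumed scenario: a 0.2 cm^2 hole, gasoline at 740 kg/m^3,
# 3.5 bar driving pressure.
area, rho, dp = 2e-5, 740.0, 3.5e5

for cd in (0.60, 0.62, 0.65):
    kg_per_hr = liquid_release_rate(cd, area, rho, dp) * 3600
    print(f"Cd = {cd:.2f}: {kg_per_hr:.0f} kg/hr")
```

With Cd = 0.60 the estimate lands below the 1000 kg/hr threshold, with Cd = 0.65 above it: the same physical event is counted or not counted depending on an engineering assumption.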
In my view the ideas of fuzzy logic, which have been used to great advantage in e.g. cement kiln control, should probably be used to decide if a particular event was a process safety incident or not. This could create a concept of how much of a safety incident a particular event is.
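A minimal sketch of what such a fuzzy boundary could look like, assuming a simple linear ramp around the release threshold (the ramp and its width are my invention, not anything the CCPS has defined):

```python
def incident_degree(release_kg_hr, threshold=1000.0, band=0.2):
    """Degree of membership in 'process safety incident', ramping
    linearly from 0 at threshold*(1 - band) up to 1 at
    threshold*(1 + band), instead of a hard cut at the threshold."""
    lo = threshold * (1 - band)   # at or below 800 kg/hr: degree 0
    hi = threshold * (1 + band)   # at or above 1200 kg/hr: degree 1
    if release_kg_hr <= lo:
        return 0.0
    if release_kg_hr >= hi:
        return 1.0
    return (release_kg_hr - lo) / (hi - lo)

# The 933 kg/hr gasoline release from the booklet would then count
# as roughly a third of an incident rather than not at all.
print(round(incident_degree(933.0), 4))  # 0.3325
```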

The idea of a degree of process safety event could be applied to events which, according to the CCPS booklet, are currently not considered process safety incidents. An example is a fire in a laboratory associated with a plant. The laboratory is there only because the plant is, and the laboratory works with the materials it does because of the plant. So in some sense the events in the laboratory occur due to the presence of the plant. Without the plant there would be no need for the laboratory. Therefore in my view undesired events in the laboratory should to some extent be considered as process safety events. By how much would need to be defined.

Although I may appear critical of the current results of the CCPS Metric Project, I do believe that defining common lagging metrics is a step in the right direction. However, without some direction as to how these metrics are to be used inside and outside the companies which have to calculate them, I fear they will end up being just another administrative burden on the companies. Also the idea of fuzzifying the criteria for defining a process safety incident should be explored. Therefore I call upon the CCPS and others involved in process safety, such as the EPSC and the EFCE Working Party on Loss Prevention and Safety Promotion, to start thinking about how the proposed metrics and other metrics can be linked into the company management system and the actions company management need to take based on whatever metrics they have access to, and how the border between what is a process safety event and what is not could be softened.

Leading metrics are much more difficult to define. At the 2007 Loss Prevention Symposium in Edinburgh there was a presentation by Unni Nord Samdal from Norsk Hydro Oil & Energy Research Centre in Porsgrunn on an indicator for technical safety called the T-Rate. This showed how difficult it is even within a single company to create meaningful leading indicators related to process safety performance. The CCPS Metric Project suggests the following as possible leading indicators:
  • Mechanical integrity inspections done divided by mechanical integrity inspections due.
  • Number of past due action items divided by total number of action items.
  • Percent of MOC's satisfying the MOC policy.
  • Percent of operators trained on schedule.
  • Survey of safety culture.
  • Number of activations of safety systems.
  • Number of activations of relief valves.
  • Number of deviations outside operating limits.
I am inclined to favour the simpler-the-better principle. Leading indicators which are easy to understand and easy to use are also more likely to be used by management. From the above list this would be the number of activations of safety systems, the number of activations of relief valves, and the number of deviations outside operating limits. I would add to this the alarm rate, i.e. the number of console alarms per hour or shift.
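Part of the appeal of these simple indicators is that they are trivial to compute and explain. A sketch with invented numbers:

```python
def alarm_rate_per_hour(alarm_count, hours):
    """Console alarm rate: alarms per operating hour."""
    return alarm_count / hours

def past_due_fraction(past_due_items, total_items):
    """Fraction of action items past due (one of the CCPS
    suggestions listed above)."""
    return past_due_items / total_items if total_items else 0.0

# Invented example: 18 console alarms during a 12-hour shift.
print(alarm_rate_per_hour(18, 12))  # 1.5 alarms per hour
# Invented example: 5 of 40 action items are overdue.
print(past_due_fraction(5, 40))     # 0.125
```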

I once worked in a large Canadian oil company at which the alarm rate in their refineries and chemical plants was rather low. At the time we had a rather good safety performance in terms of the lagging indicators - first aids, lost time and fatalities - so this has colored my belief that keeping the alarm rate low by using good process control is a significant step towards good process safety performance.

Sunday, August 17, 2008

Half a HAZOP!

Since ICI told the world about its hazard and operability studies in the early 70's, HAZOP studies, as they have come to be called, have been one of the preferred tools for risk assessment in the process industries. A HAZOP study uses a systematic team approach to investigate causes and consequences of relevant process deviations from normal operations in order to improve the ability of the process to handle deviations from normal operations.

Over the years the use of this approach has been adopted in many fields outside the process industries. Recently, while looking for references to the early descriptions of the method by ICI, I came across a publication which claimed to describe the use of HAZOP in the study of intelligent traffic systems. This article, "Application of Hazard and Operability Studies (HAZOP) to ISA and Speed Humps in a Built-up Area", was presented at the e-SAFETY conference in Lyon, France in September 2002. The paper presents a deviation matrix for two viewpoints on a traffic situation: a) that of a single participant in the situation, and b) the situation as a whole. Already here the parallel to the HAZOP studies known in the process industries appears to be weak. The process analogy to these two viewpoints would be a) that of a single process operator or other process worker (technician, contractor, engineer), and b) that of the process unit or plant. I don't know if these two viewpoints would be relevant to the process industry or not. It could be argued that the first viewpoint would be a personal safety viewpoint, and the second viewpoint a process safety viewpoint.

The mentioned article by Jagtman and Heijer does however limit itself to presenting causes of deviations. For the traffic speed adjustment system the parameters considered are speed, direction, location, attention and travel time. The deviations are no (none), too high, too low, wrong, fail of, part of, unknown and unexpected. The matrix of relevant deviations is relatively sparse, with less than 50% of the parameter and deviation combinations considered relevant by the authors. This, and the fact that the article doesn't consider the consequences of the deviations, makes me think that maybe HAZOP studies are not the best tool for studying this system. Through 10 years of teaching risk assessment to chemical engineering students at a university in Denmark, I have always struggled with explaining to students when the different qualitative and quantitative tools - HAZOP, What-if, FTA, ETA, FMEA and QRA - should be used, and which tool to start with. Generally I have recommended starting with What-if or HAZOP studies (HAZOP is easier with limited process knowledge - especially if you use a functional approach during study preparation).

However, the analysis of causes of deviations in a traffic control system presented by Jagtman and Heijer indicates that maybe FTA and ETA would have been better tools. Since the article only presents causes of deviations but does not discuss consequences, it is really only half a HAZOP study. I think a better choice of tool would be the Fault Tree Analysis (FTA) approach, in which finding all causes of a top event, e.g. failure of the traffic control system, is the goal of the analysis. ETA or Event Tree Analysis could complete the study by showing possible consequences of particular failures.

Thus it is important not to consider HAZOP studies, or any other tool such as layer of protection analysis, as a universal tool for all risk assessment studies. Rather, the tool among the many available which best serves the purpose of the study should be chosen. Choosing the right tool for a study requires insight both into the system to be studied and into the different tools available for risk assessment.

Monday, August 11, 2008

Equipment to be banned from plants!

For more than 25 years I have felt that sight glasses don't belong in chemical plants. Three years ago the list was expanded with blowdown drums venting directly to the atmosphere. Why? Because there are inherently safer alternatives available.

I would also like to see these kinds of equipment eliminated from the textbooks used to educate new engineers at our universities. Maybe authors could introduce a chapter about historical and obsolete equipment in textbooks about chemical unit operations and textbooks about instrumentation.

Recently I was greatly surprised by the article "Level: A visual concept" in the April issue of ISA's InTech. The article appeared in the Automation Basics section, and was actually based on the book "Industrial Pressure, Level and Density Measurement", soon to be available from the ISA bookstore. My surprise was caused by the fact that the article appeared to advocate traditional sight glasses with a glass tube in a metal shield for pressures up to a few hundred PSI. My first job was at a Canadian chemical plant in Sarnia, and when I joined in the early eighties all sight glasses were already taken out of service. Why? Because sight glasses are a major hazard to the workers who attempt to use them to determine a drum level. Today alternatives are available using steel tubes - even with remote readout of the level. Such inherently safer alternatives should be used wherever a sight glass was called for 30 years ago.

In the mid 90's a refinery at Milford Haven in southwest Wales burned for several days, when a blowdown drum was overfilled with liquid and the gas outlet fractured due to the liquid head. Three years ago another blowdown drum overfilled at a refinery in Texas City, resulting in an explosion and fire which killed 15 co-workers. Blowdown drums are necessary to protect our refineries and chemical plants. Usually you don't want to use them, since anything which leaves the plant through the blowdown drum is a loss. Therefore it is easy to forget about a piece of equipment which is not in use during normal plant operations. At Milford Haven the gas pipe from the drum was corroded past its useful life. Both at Milford Haven and at Texas City the control room operator had no information about the level in the blowdown drum.

This indicates a common problem: our plants contain equipment which is only in use during abnormal operational situations. This includes flare systems, safety valves, and the piping associated with these systems. However, these systems should be maintained as well as the reactor or distillation tower. Only then can we be sure that they will protect our lives and our plant in an emergency situation.

So if you have sight glasses with glass tubes, replace them with inherently safer, more modern sight glasses. If you have blowdown drums in your plant, then ensure that they are tied to your flare system, and that this equipment is well maintained so it works when needed. Only then can you limit the consequences of process safety events in your plant.

Sunday, August 10, 2008

The biggest danger to your plant!

There have been various efforts at establishing what is most important. For example, the Copenhagen Consensus among economists attempts to decide what is the best investment society can make for the future of mankind. Likewise the British Government has attempted to decide what is the biggest risk to British society. It finds that the biggest risk is not terrorism, but a flu epidemic. The question which immediately comes to mind is: What is the biggest danger to your plant?

Many facilities, especially in North America, have since 9/11 performed several security assessment studies of their facilities and reported the results to the ACC and possibly also the Department of Homeland Security. These studies aim to proactively assess the likelihood of a terrorist attack on a particular facility and the potential consequences of such an attack. During a visit to Baton Rouge a few years back I noticed that the result is a reinforced perimeter around the plants, using e.g. concrete blocks to prevent large trucks from driving through the fence. I have yet to see similar efforts at facilities in my own country, although the attack on the Glasgow airport has already resulted in the placement of concrete blocks around other airports, such as at Terminals 2 and 3 at the Copenhagen Airport. This indicates that some security measures spread more quickly around the world than others.

Many years ago, at a biweekly safety seminar in a major Canadian oil company, the engineering manager asked the audience: What is most important to the company about you? There were many and varied suggestions. The manager's own answer was: Your health!

Without good health you may not be able to go to work. This usually means something will not get done. If a process operator calls in sick, then most companies have plans in place for calling in a replacement, so process safety is not compromised - at least short term. If an engineer calls in sick, the likely consequence is that some development or maintenance work gets postponed - at least short term this is not a process safety concern.

Now let us assume that a flu epidemic strikes your area. Unfortunately current vaccines are not effective against this particular virus, and within a short period a quarter of the population is sick with this flu. Can your company cope with a quarter of your employees being sick at the same time?

So maybe you should be as concerned with employee health as you are with process safety!

Friday, August 08, 2008

Is more regulation the road to better process safety?

Recently the CSB chairman John Bresland called for OSHA to adopt CSB recommendations on a comprehensive combustible dust standard. It seems that on both sides of the Atlantic politicians and regulators are very quick to suggest more regulation after a process safety incident, such as the explosion and fire at Imperial Sugar Company in Port Wentworth, Georgia, on February 7th this year.
Is more regulation really the way forward? Does regulation really improve process safety? I don't believe it does. Even companies with excellent process safety systems, such as e.g. the Dow Chemical Company, experience process safety events from time to time, which make employees ask the question: Is it really worth the effort, when we still experience these accidents?
Increasing process safety, in my view, is mostly about using one's own common sense. Or as one major Canadian oil company put it: "Safety is the art of working properly".

Nonetheless I read excerpts from an interview with the CEO of Imperial Sugar. In it he says among other things: "We have treated worker safety as our top priority ... and will continue to do so." I think that this focus is completely wrong! On another web-site this CEO seems to claim that his company's lack of knowledge about the dangers of dust explosions is due to inadequate federal guidelines on the handling of dust. I think that indicates missing the point as far as responsibility is concerned! Isn't it the responsibility of the CEO to ensure that he hires people with the skills necessary to safely operate his company?

At BP's Texas City Refinery there was also a focus on worker safety prior to the 2005 explosion and fire, which killed 15 people. Both the CSB report and the Baker Panel's report (also found on the CSB web-site) on that event seem to indicate that there should have been more focus on process safety. I completely agree.

A focus on process safety will ensure that hazardous substances and equipment are operated and maintained in such a way that workers cannot be injured. The result of failing process safety is that workers can be injured. We need to avoid that possibility!

So in my view the best thing which can be done for worker safety is to have the CEO focus on process safety. That way the CEO protects the shareholders from losses such as those involved in rebuilding the Port Wentworth plant after the February fires and explosions. That way the CEO also protects the employees from the consequences of fires and explosions, such as injuries and loss of employment, and hopefully he/she also hires the people with the skills necessary for safe operation of his/her facility.

Now, how do we get the CEO to focus on the right thing? The CCPS has created a 10-minute presentation for CEO's about the importance of process safety. The EFCE WP on Loss Prevention and Safety Promotion is developing a video with the same purpose, and the CCPS is developing seminars with a similar focus according to their web-site. Will this do the job?

I don't know.

Thursday, August 07, 2008

What to be alarmed about?

As a young engineer with a major integrated oil company in Canada I had a limited conception of when an alarm was needed and what was required to implement an alarm. However, that situation was quickly corrected by a more experienced instrumentation engineer with whom I worked on computer applications for an ethylene unit.
At the time, more than 20 years ago, he told me that if we generate an alarm, then the operator must act on it. The least he should do is acknowledge the alarm. However, for good alarms the engineer should also generate suggested responses to the alarm. He then added: if you cannot come up with a suggested response, then forget about the alarm, since all it will do is contribute to operator frustration. So instead of generating alarms or messages to the operator when our advanced computer control applications for some reason could not do what they were supposed to do, we implemented graceful degradation. That meant that on failure of the computer control application, it degraded transparently to standard Honeywell TDC 2000 control strategies - and they worked all the time.

With this philosophy about alarms, this and other units at the site were able to achieve alarm rates below the current EEMUA recommendation of 1 alarm every 5 minutes - without any use of the alarm management (read: alarm filtering / inhibition / removal) applications which have become popular in recent years.
However, there is a fundamental question which we did not address at the time in the mid eighties: What should we alarm about in our refineries, petrochemical plants and pharmaceutical plants?
A few studies have addressed special situations. I am thinking of PCA for monitoring batch operations, or crude mass balance simulations to discover when we are trying to put too much stuff where it should not be, such as during the startup of the raffinate splitter at BP's Texas City refinery in March 2005.

Fundamentally an alarm should be generated when an operator action is needed because a system goal cannot be achieved - or, as M.Sc. student Tolga Us at the Technical University of Denmark recently expressed it, when the goal is under threat. So to decide when to generate alarms we need to look at the goal of our system or subsystem, and specifically at when there is a danger or chance that this goal cannot be fulfilled.
So let us look at the application of this principle to a type of unit which I am somewhat familiar with: an ethylene gas cracker. The purpose of the ethylene gas cracker is to convert X kg/hr of ethane to Y kg/hr of ethylene. Notice the quantification! Without quantification we cannot define deviation from a goal - this of course also applies to power plants generating electricity.

So the purpose of our gas cracker is to convert X kg/hr of ethane-rich natural gas to Y kg/hr of almost pure ethylene and of course some by-products. This process is essentially two connected processes: the gas cracking furnace, in which the actual chemical conversion is performed, and the so-called light end, in which the compounds from the furnaces are separated into pure or almost pure compound streams.
One of the by-products of the gas cracking process is coke. Coke builds up in the furnace tubes, and if there is too much coke in the tubes they could block, and conversion will cease. So one way the goal of conversion could be prevented is by too much coke in the furnace tubes. The amount of coke generated depends among other things on the temperatures in the furnace. Hence wrong temperatures in the furnace could also prevent the conversion process by speeding up coke creation in the furnace tubes. So some things to be alarmed about in the conversion process are too much coke in the furnace tubes and incorrect furnace temperatures.
What does this mean in terms of goal fulfilment? If the coke build-up in the cracking furnace is too high, then the ethylene production goal is threatened. However, if the furnace temperature deviates from normal, then the ethylene production goal could be threatened in the future.

In practice the coke build-up is monitored via the pressure drop over the transfer line heat exchanger at the end of the cracking tube. If this pressure drop exceeds a certain value, which depends on many factors such as feed composition, cracking severity, etc., then the operator should be alerted that a de-coke is called for. Similarly, if the temperature(s) deviate from normal, then the operator probably should be alerted, so the situation does not develop into a coke build-up situation. Possible interventions could be adjustment of the secondary air flow to the furnace or adjustment of the gas-feed firing ratio.
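The pressure-drop check above, combined with the old rule that every alarm should come with a suggested response, could be sketched like this (the limit value, readings and message wording are invented for illustration):

```python
def decoke_alarm(dp_measured_bar, dp_limit_bar):
    """Return an alarm message with a suggested response when the
    transfer line exchanger pressure drop exceeds the limit for the
    current feed and cracking severity; otherwise return None."""
    if dp_measured_bar > dp_limit_bar:
        return ("TLE pressure drop high - coke build-up in furnace "
                "tubes; suggested response: schedule a de-coke")
    return None

# Invented readings: measured 1.35 bar against a 1.20 bar limit.
msg = decoke_alarm(1.35, 1.20)
if msg is not None:
    print(msg)
```

Note that the alarm only exists because a goal (conversion) is threatened, and it carries its suggested response with it - exactly the philosophy described earlier.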

So in conclusion: we need to be alarmed about situations which threaten the fulfilment of one or more system goals, such as production goals. I believe that this would limit the number of alarms configured on a particular system, and hence some features of alarm management systems may not be needed.

Wednesday, August 06, 2008

What does it take to survive a double digit fatality event?

Recently The Chronicle Herald reported that disaster still looms at the BP Texas City refinery (unfortunately the story is no longer available online - not even cached by Google!). This made me think about other companies which have experienced double-digit fatality events.

The first which comes to mind is Union Carbide. After the 1984 Bhopal disaster the company struggled for some years before the remains were bought by the Dow Chemical Company. Another that comes to mind is Nypro Ltd, the owner / operator of the plant that exploded at Flixborough ten years earlier. I seem to recall that the company initially attempted to relocate production to Eastern Europe, but eventually the effort was abandoned.

Then there is the Phillips 66 polyethylene plant explosion in Pasadena, Texas, on October 23rd, 1989, which killed 23 persons - among them most of the people knowledgeable about the facility. At the 2003 SACHE Workshop for professors at ExxonMobil's Baton Rouge facility, Angela Summers of SIS-TECH reported that the follow-up on the 1989 explosion in Pasadena, and other explosions at the site during the nineties, eventually involved more than 3,300 action items - among them a completely new management at all levels of the plant. The result appears to be a culture change, but it has certainly taken some time: more than 10 years.

Then there is BP's Texas City refinery, where the Associated Press story reported by The Chronicle Herald wants us to believe that another disaster is just around the corner. Could that really be true? Within half a year of the event in 2005 the BP board had removed all but one person in the command line from the manager of the refinery to the CEO of the company. BP has also started a world-wide education programme in process safety for all their employees.

From reading the many excellent reports about the BP Texas City refinery, such as BP's own accident investigation report, the OSHA report on the event, the CSB report on the event, and the Baker Panel report, it seems clear that a culture change was called for at BP's Texas City refinery. I don't believe that a culture change at a major refinery is easily accomplished. However, I recall a presentation by a Canadian company at the CCPS conference in Toronto about a month after 9/11. The company reported that before a new CEO arrived they were experiencing many small fires and explosions in their facilities. Then the new CEO arrived, and he demanded to have a report on his desk every day before 10 AM about any fires or explosions the previous day - no matter how small. This action by the CEO put focus on what most considered nuisance events, and within a relatively short period of time - less than six months, I believe - these small fires and explosions had been virtually eliminated. That was a quick culture change.

To me it appears that a similar culture change is called for within BP. BP is much larger than the Canadian company mentioned above. Some would argue that it would be impossible to implement a similar reporting system within the BP organization, and that it would take CEO time away from other important business issues. However, the CEO is responsible for the survival of the company, and many people believe that BP would not survive another event like the March 2005 explosion at Texas City. The CEO must put focus on process safety, and by demanding quick reporting on process safety related events on his desk every day he does exactly that!

As did the CEO of the Canadian company!

Sunday, August 03, 2008

Cyber Security or Process Safety?

Recently ISA's InTech magazine printed the article "Peril in the pipeline" with the subtitle "Cyber security could have precluded gasoline rupture at Washington pipeline" (InTech, June 2008). The background for the article was the rupture of a 16-inch buried pipeline through Whatcom Falls Park in Bellingham, Washington, on June 10, 1999. The rupture resulted in a fireball travelling 1½ miles downstream from the rupture location and killed 3 people.
Since pipelines are classified as a transportation activity, the accident was investigated by the NTSB, which on October 8, 2002 issued the pipeline accident report "Pipeline Rupture and Subsequent Fire in Bellingham, Washington, June 10, 1999". If you read just the InTech article, you could easily be left with the impression that the accident was caused by the pipeline owner performing development work on its SCADA system without adequate protection of the running system from the development activities, and that on the day of the pipeline rupture this development work had resulted in degraded responsiveness of the pipeline monitoring system. Unless you read the InTech article very carefully, you are left with the impression that "the accident resulted from the database development work that was done on the SCADA system" on the day of the accident.
Wait a minute! This can't be. One cannot design a pipeline such that the only protection from an overpressure event is a remote SCADA system! There must be independent protective systems which protect a pipeline from an overpressure event. Actually, if one digs into the NTSB report, it is discovered that the database development work is just the 5th probable cause of the accident listed. The other 4 are:
  1. damage done to the pipe during water treatment plant modification project and inadequate inspection work during the project;
  2. inaccurate evaluation of inline pipeline inspection results, which led to the company’s decision not to excavate and examine the damaged section of pipe;
  3. failure to test, under approximate operating conditions, all safety devices associated with a products facility before activating the facility, and
  4. failure to investigate and correct the conditions leading to the repeated unintended closing of an inlet block valve to the products facility.
In my view these four probable causes of the pipeline rupture all fall under the heading: process safety. Clearly the pipeline owner did not have adequate process safety procedures in place, and that resulted in the pipeline rupture event. The cyber security issue, i.e. the SCADA system degradation on the day of the event, was just a contributing factor, which prevented operators from intervening in a situation where automatic systems should have prevented the event.

Nonetheless, it is not good practice to perform systems development work on an operating SCADA or BPCS, especially when this is done without the benefit of the security features built into systems such as the VAX multiuser operating system. Development of system software or even key applications should not be done on operating SCADA or BPCS systems without a prior assessment of the risk using established MOC procedures.

Friday, July 11, 2008

openSUSE 11.0 Install Experience

Yesterday I decided to wipe away the Windows XP Home partition on the computer at Slangerup Church Council. This computer has for more than a year been running openSUSE, and only on a few rare occasions has Windows XP been booted.

The system is a 4 year old 2.x GHz Celeron with 512 MB RAM and a 75 GB HD, plus a Tandberg Data tape drive, a DVD-RW drive and 5 USB slots (2 on the front). Windows, including recovery and backup partitions, consumed about half of the 75 GB.

Some weeks ago I had already downloaded openSUSE 11.0 GA, so a bootable DVD was burned using the existing openSUSE 10.3 installation. Prior to the install all data - about 3.8 GB - was saved to a DVD. The system was then rebooted using the install DVD. Install was selected, and Danish was selected as the language, since the system will be used by Council members who are not very proficient in English.

In order to do away with the Windows partitions I had to select the expert partitioning tool. I had no problem getting rid of the old partitions. But how do you partition 75 GB for a new install? I decided on just two partitions: '/' of 20 GB and '/home' of 50 GB. The system also has an old 9.4 GB IBM HD, and this was set up as '/backup'. It would have been nice with some suggestions for partitioning a fresh HD.

During the install I selected every possible package except for 'Cell development', since I don't have access to a Cell processor. This only consumed 5.6 GB of my root partition. Unfortunately I selected 'Automatic Configuration' at the end. This gave a lot of problems connecting to our wireless network after the install finished, so I re-installed and chose manual configuration.

Again I had problems connecting to our router during the install (I did not have these problems with the install of openSUSE 10.3 last year). Eventually I just bypassed the updating during the install.

While playing around with the network settings after the initial install, and also changing to NetworkManager, I finally got a wireless connection. Then the fun started.

I did not know that I would be unable to access YaST Online Update (YOU) if I had activated scheduled patch download.
When I finally got YOU to work, it insisted on updating the package handler several times, and each time a re-boot was required.

Am I pleased with openSUSE 11.0? Yes! Was it worth the trouble? Yes! On the old openSUSE 10.3 installation I had been playing around with a number of things (openVPN, openLDAP, Samba, KnowledgeTree, Amanda, Rdiff-backup and a few others), so a fresh start was welcome.

I have now decided not to use KnowledgeTree, since I think it is overkill for our small organisation. I have decided to use Rdiff-backup for local disk-to-disk backup of documents. I have decided to use Samba for file access inside our local church office. I also plan to support openVPN access at a future date.

What I would like to see:
  • Some advice about the different file systems available. I chose ext3 just because I used that before.
  • That NetworkManager is installed by default on the status line.
  • An option to install Adobe Reader during the initial install with creation of relevant links to Firefox.
  • Some more info during the test of the network connection. Mine seemed to just sit there for a long, long time.
So on a scale of 0 to 10 I would rate this install experience as a 9!

Thursday, May 29, 2008

Linux Day in the Danish Parliament

Yesterday was Linux Day in the Danish Parliament. The talks of the day were arranged along 4 themes: political and ethical interests, commercial interests, everyday usage, and perspectives. All subjects which I would consider important for increasing the usage of Linux in the Danish public sector.

A brief opening talk by the chairman of the Parliament's committee on science and technology emphasized the fast pace of change in our times, and the reluctance of some 68'ers to use new technology such as mobile phones and computers (one of my high school class mates still refuses to get an e-mail account - I don't understand how he can get by without one!).

I left the seminar early, quite disappointed with the ability of the open source or Linux community to arrange a professional event at which all actors take full advantage of the opportunity to sell the open source idea and the commercial products which build on it.

It is clearly difficult to get people involved in very different aspects of Linux - such as lawyers dealing with issues around licensing of in-house developed software in connection with a business takeover, people involved in re-use of computers in developing countries, and commercial vendors trying to sell a product - to participate in a Linux Day, and on top of that to deliver a clear message to the politicians.

The licensing talk left some unanswered questions. If I download a MySQL database and connect it to my commercial electronic laboratory notebook, does that mean the notebook is suddenly also covered by the open source license? My feeling is no! Only the software which I create to make the open source and commercial software work together is covered, but I am not certain.

The talk about re-use of computers in developing countries clearly showed that using Linux on a computer could extend the life of the hardware by 2 or more years. Such an extension of course has a very significant impact on the CO2 footprint of a company. So maybe it is time to create use cases showing how a company can become greener by deploying Linux.

During the breaks I talked with a one-person company which promotes the usage of Linux in schools by using technology such as PXE, FreeNX or NoMachine servers, and NoMachine's client software. Given a modern network card you can start openSUSE or another Linux distro from the server using PXE. This ensures that student files are left on the server. Furthermore, by using remote access tools such as FreeNX and NoMachine servers, the students get the same experience whether connecting at the school or from the Windows PC at home. And as a side benefit Mom and Dad get exposed to Linux! I went right home and installed the FreeNX server and the NoMachine client on a computer at our local church office on our 4 year old hardware. Startup is rather slow over a wireless link, but performance after startup is quite acceptable for tasks such as looking at or copying a small document. The beauty of this solution is that it expands to the internet with little effort - except security! Thus church council members can be provided with electronic access to all incoming mail instead of making lists of incoming mail and distributing these.
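The PXE boot mentioned above is usually driven by a small configuration file served over TFTP. The fragment below is a hypothetical sketch only - the file paths, kernel names and boot options are assumptions for illustration, not taken from any actual school setup:

```
# Hypothetical /srv/tftpboot/pxelinux.cfg/default for network-booting clients
DEFAULT opensuse
LABEL opensuse
  # kernel and initrd are served over TFTP from the boot server
  KERNEL boot/opensuse/linux
  APPEND initrd=boot/opensuse/initrd splash=silent showopts
```

With a configuration like this, a client with a PXE-capable network card fetches the kernel and initrd from the server at power-on, so nothing needs to be installed on the local disk.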

Why did the event lack professionalism? First of all, the agenda was written in such a way that it was not clear when there were breaks and how long they lasted. Secondly, the announcement indicated that a sandwich would be served at lunch time; it turned out that only the people exhibiting were supposed to get a sandwich. Thirdly, if you are selling professional Linux based software, then make a professional presentation using a tool such as Impress with screenshots. In these days, where most Linux distros have tools for this built in, I expect a bit more than bullet points in black text on a white background in a presentation.

Questions from the participants also indicated that the organizing associations - DKUUG, SSLUG and KLID - had no idea what the outcome of the day should be. I guess most of the participants either had played around with Linux or were using Linux as their main OS platform.

Besides the hopefully incorrect impression of the professionalism of the local Linux community, I got home from yesterday's seminar with 3 live CD/DVD distros. One is MandrivaOne 2008.1 Spring. Another is Ubuntu 8.04 (Hardy Heron), and a third is Keldix Linux 20080329. MandrivaOne and Hardy Heron you probably know if you have been around Linux for even a short period of time. But Keldix is probably unknown to most people outside Denmark. It is a local Danish distro based on PCLinuxOS, but with added features for watching our national TV station: DR TV. All three distros are of course in Danish. These are examples of a new trend in the Linux community: local fingerprints on the distros (DR TV) and a focus on national languages.

In this country, like in many others, computers are not as widely used among pensioners as among pre-schoolers and school children. If we want to interest pensioners in computers, a national language Linux distro is a must. After it is installed, updating not just of the OS but of all applications can be completely automated. This is not the case with the software which normally comes with a PC in this country. Even if you add software to your Linux system, the automatic updating also takes care of those new applications.

Maybe it is the ease of maintenance which is the reason behind 125,000 PCs at Russian post offices being switched to Linux these days? Or is it the fact that the Russian post office can itself correct spelling mistakes in any messages the system generates? Maybe it is for the same reasons that 9,000 PCs at schools in the Geneva area are being switched to Linux?

I am writing this on a computer at our local church office, which is running openSUSE 10.3. I do quite a bit of installing and removing of software as I learn more and more about Linux. However, since 10.3 was installed last fall I have not worried about maintenance. There is an application which continuously monitors whether updates are available from the sources I selected during install. Usually once a week it turns red, which means: click on me and install updates.

So let us promote Linux not only based on very low initial cost, but also on the very easy maintenance of not just the OS but all the software installed on the system, and also based on the native language versions available. Let's do this as professionally as the competition!

Wednesday, May 07, 2008

From Daily Safety Leader to Looking for Work

In 1982 I joined Imperial Oil in Sarnia as a process control applications engineer. From the first day I was impressed by the attitude towards process safety and ethics.

The company started by sending me on several weeks of courses on process control systems and process control applications in beautiful New Jersey. That was a wonderful time in the spring months of that year.

When I returned to Sarnia after the courses I was informed that before I could enter the plant area of the site I had to complete an introductory safety course, and that course would not take place for another two weeks. Until then I was asked to stay in the trailer-like office building outside the plant fence.

While working in Sarnia I had to sign the company ethics guidelines every year. All engineers and managers had to do that. Each year the company also provided a summary booklet about possible ways to get in conflict with the ethics guidelines. One year this booklet contained the following three ethics questions at the end:

1. Is it legal? Am I making or proposing a decision that breaks the law or runs contrary to a company policy?

2. Is it fair? Will my decision disadvantage or perhaps even bring harm to anyone - colleagues, customers, suppliers, the community? Or is it a decision that will make those affected by it feel they have been treated fairly in the long run as well as the short term?

3. Can I defend it? If I had to justify my decision to my family or the media would I feel embarrassed and uncomfortable? Or could I explain my decision with pride, believing that in making my decision I have done the right thing?

Since the middle of 1992 I had been the daily safety leader at my last place of employment, but in the spring of 2007 I was formally appointed to that position. In the fall of 2007 I, together with a few safety representatives, planned an evacuation exercise for one of the buildings at that place of employment. I agreed with these safety representatives that we would monitor how fast people were coming out of the building, but that we would not report any names to the management. The exercise went quite well, and most people were out within our two minute success criterion. One person had not heard the initial alarm - you know how knowledge workers can get really into their work - and this person did not leave the building until well after the two minutes had passed. At the next safety committee meeting, where the evacuation exercise was evaluated, this was of course reported without any names being mentioned. At the safety committee meeting and after the meeting the manager demanded to get the name of the person who was slow. I consulted with different experts, and they told me that I had to give the name to the manager.

I saw a clear conflict with the above ethics guidelines - especially the second and third ones - and refused to reveal the name of the colleague. About a week later I was fired as daily safety leader.

Prior to this event the manager had during the last 10 years had no complaints about my work as daily safety leader, and on several occasions had praised the work being done to improve safety and the awareness of safety issues at the workplace, as well as the efficiency of our safety committee meetings.

A few weeks after being fired as safety leader, I was offered an early retirement package. Now, some months later, when I think back on the events, I still think I made the right decision. I have started a small consulting company and have had my first customers. I have also had time to improve the administrative procedures of our local church office.

Wednesday, April 30, 2008

Living Happily in a Virtualized World

No, this blog is not about Second Life. Today I attended a seminar at Symbion, a University of Copenhagen conference facility, hosted by Novell Danmark on virtualization technology provided by Novell. If you just listen to the hype, then you are led to believe that virtualization technology is the best thing since sliced bread for managing and optimizing your data center. Fortunately or unfortunately, I have at the moment no data center to manage or optimize.

I have experimented briefly with virtualization under eComStation in the days when VirtualPC was not a Microsoft product. I had VirtualPC running Windows XP Pro as a guest OS. It was cool technology. Today's seminar also demonstrated some really cool technology.

The first part showed how, using a product called ZENworks Orchestra, you could migrate a virtual machine from one computer to another using drag and drop - without taking the VM down! Well, this is, as Terry Pratchett would say, a lie to children. The Python script working behind the scenes does of course momentarily take the VM down, or at least prevents it from doing any communication with the outside world. On the 1G Ethernet used in the demo the downtime for the VM was less than a second. I don't know what applications were running on the demo VM - if any. So I don't know if you could do this with an Oracle or MySQL database running in the VM. Nonetheless, it is quite impressive that you can do this with a tool programmed in Python!
But what is the business case for this technology? The demo clearly showed that it was faster than a forced failover in most cluster configurations. However, is there a real business case for this in a scenario with just a few servers? I guess the answer is probably not. On the other hand, if you have hundreds or thousands of servers and attempt to optimize the resource usage in your data center, then automation of this migration based on hardware utilization could be a real business benefit.
Since ZENworks Orchestra is programmed in Python, it is probably possible to create rules implementing functionality in ZENworks Orchestra similar to automatic failover in a cluster configuration. What advantages and/or disadvantages this has compared to a full cluster configuration was not answered at today's seminar. But live migration is definitely a cool technology.
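As an illustration of what such a utilization-based rule might look like, here is a toy sketch in Python. The function, thresholds and data layout are all invented for this example; they have nothing to do with the actual ZENworks Orchestra API.

```python
def choose_migration(hosts, cpu_limit=0.85):
    """Decide whether a VM should be migrated off an overloaded host.

    hosts: dict mapping host name -> (cpu_utilization, list of VM names).
    Returns (vm, source, target) if a migration is warranted, else None.
    """
    # Hosts above the CPU limit that actually have a VM to move.
    overloaded = [h for h, (cpu, vms) in hosts.items() if cpu > cpu_limit and vms]
    if not overloaded:
        return None
    # Move a VM from the busiest overloaded host to the least loaded host.
    source = max(overloaded, key=lambda h: hosts[h][0])
    target = min(hosts, key=lambda h: hosts[h][0])
    if target == source:
        return None
    vm = hosts[source][1][0]
    return (vm, source, target)
```

A real orchestration engine would of course add hysteresis and check that the target has spare capacity, but the rule-following nature of the decision - sensors in, action out - is the same as what a modern DCS does.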
Anything to watch out for? Well, you will not be able to migrate from a Pentium II to a Celeron processor: the processors in the computers you are migrating between need to be similar. Today's demo showed a live migration from a quad core processor to a dual core processor. That worked!

The speaker at today's seminar was Gabor Nyers from Novell in Holland. In his introduction of ZENworks Orchestra it was claimed that it could intelligently make decisions. AI implemented in Python? I doubt it! In my view this is sales language for saying that if you can specify intelligent rules in ZENworks Orchestra, then ZENworks Orchestra has no problem following those rules at any time, using real time sensors to decide which rules or parts of rules to follow. This is basically what a modern DCS does all the time in the chemical process industry and related industries.
The new thing is that with ZENworks Orchestra similar technology can now be applied to manage and optimize very large data centres, e.g. for energy efficiency. Green technology?
A really cool usage of ZENworks Orchestra is for taking backups using the snapshot technology provided by most storage vendors today. Using this feature you can back up your data without having a backup agent in each VM and without paying license money for these agents. I was led to believe that this reduces hours of backup time to just a few seconds. That would indeed be cool technology!

Not so cool is the claim by vendors of open source based products that these prevent lock-in. I believe that lock-in is a combination of the technology you buy from a vendor and the training and experience which your people receive by attending vendor courses or using the vendor's products. With time the cost of training and experience will far exceed the cost of the technology. Then you are, in my view, locked in as much with open source based products as with proprietary technologies. The cost of changing is just too big. Standards- or norm-based products can make the cost of moving from one vendor to another smaller.

To me the coolest thing is that much of this virtualization technology is freely available with openSUSE 10.3, which I am running on a server at our local church office. Unfortunately virtualization is not on the top three list of projects for that server at the moment - for many reasons.

Tuesday, April 29, 2008

Organizing Workplace Safety

Yesterday I attended a half day seminar on the safety organisation of the future at the Danish Engineering Society, IDA. The seminar, which was arranged by SAM, started with a review of the developments in Denmark over the last 40 years. This was followed by viewpoints from the unions, represented by FOA, and from the employers, represented by HTS. Finally Danske Bank presented a new approach to taking care of working environments and all the risks associated with them.

It is in my view quite clear that the responsibility for creating a safe and healthy work environment lies squarely with the management. Nonetheless, for the last 40 years or so a participatory model for organizing workplace safety has been used in Denmark. The work environment law demands that any company with more than 5 employees must have a safety committee, and if you are a larger company, then you must have safety groups throughout the organization. The safety committee is made up of elected employee representatives and appointed employer representatives. With the increasing difficulty of attracting good employees it has also become more difficult to attract employees interested in participating in the work for a better workplace.

The Danish way of organizing workplace safety, and its relative success over the years, is rooted in our desire to be members of societies and organizations of all kinds. In the United States and Canada it is said that important business decisions take place on the golf course. In Denmark they take place when parents meet as their children participate in local activities, such as the soccer club, the swim club, the bicycle club, and so on. With increasing globalization and the arrival in Denmark of a significant number of immigrants the model is under threat, since the newly arrived don't participate in club life to the same extent as ethnic Danes.

This might therefore be a good time to review how the involvement of employees in workplace safety could be arranged in a way which is better aligned with current management and leadership principles. The case presented by Danske Bank at yesterday's meeting is one example of how this could be done. By clearly placing the responsibility for workplace safety on the local leader, the company has apparently been able to increase employee involvement in and knowledge about workplace safety. During the experimental period of the FARMOR project (Google 'FARMOR Danske Bank Arbejdsmiljø' for more information about this project) local leaders were coached by professional workplace consultants, who also participated in the facilitation of local meetings about workplace safety. Other companies years ago adopted a similar approach, i.e. eliminating safety leaders in the organization and placing the responsibility squarely with the business group, section or branch leader, who ultimately must find the funding necessary to finance workplace improvement projects. This approach, combined with a clear evaluation of the leader on workplace safety issues, has been highly successful.

Maybe this is the time when we in Denmark should change the focus of involvement and participation in workplace improvement activities from how these activities are organized to the business goals of these activities. It is now more than 14 years since a CEO of the Dow Chemical Company told the Harvard Business Review that he could not think of a single investment in SHE which had not improved the profit of the company.

Sunday, March 09, 2008

Ryerson University in Toronto doesn't like collaborating students

I just read in the CBC news that a student at Ryerson University faces expulsion because he facilitated collaboration among his fellow students. To me it appears as if Ryerson doesn't like collaborating students.

The days when each student sat behind a small desk in the library for long hours to study a subject are long gone from most progressive universities around the world. Today's businesses demand engineers who are able to collaborate not just with a colleague in the office next door, but also with the people in the company or supplier office on the other side of the world.

Such collaboration is what students can learn in the Facebook environment created and managed by the Ryerson student. So in my view Ryerson would be better off if they encouraged such collaboration, and maybe allowed the professor to learn some modern teaching approaches at a course, e.g. the ones given by Professor Felder.