Giulio Di Gravio* and Riccardo Patriarca
Department of Mechanical and Aerospace Engineering, University of Rome “La Sapienza”, Via Eudossiana, 18, 00184 Roma, Italy
Received date: October 08, 2015; Accepted date: June 28, 2016; Published date: June 30, 2016
Citation: Gravio GD, Patriarca R (2016) Safety Performance of Complex Systems: Lesson Learned from ATM Resilience Analysis. Ind Eng Manage 5: 193. doi:10.4172/2169-0316.1000193
Copyright: © 2016 Gravio GD, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
The Air Traffic Management (ATM) system has become steadily more complex due to rampant technological, procedural and societal developments and to the increase in traffic volume. These factors have become gradually more difficult to understand and manage, mainly because of tight couplings among functions and because of the continuous development characterizing everyday activities. In this view, traditional safety analyses, based on the belief that systems are completely known and that a cause-effect link can always be easily detected, may become ineffective. Furthermore, these methodologies can evaluate only linear causal dependencies. It is therefore necessary to evolve ATM risk assessment from its classic view of safety (Safety-I) to a new one that integrates the principles of resilience engineering (Safety-II). This editorial article presents this complexity and the outcomes deriving from resilience engineering methodologies, aiming to illustrate possible guidelines for managers and academics.
Safety performance; Air traffic management; Rampant technological
The traditional definition of safety is “a condition where nothing goes wrong or where the number of things that go wrong is acceptably small”. This indirect statement may appear somewhat paradoxical, since safety is defined by “what happens when it is missing” and is measured not as a quality in itself but by the consequences of its absence. Risk governance and safety management, therefore, have traditionally, and with good reason, been concerned with what can go wrong and lead to unwanted outcomes. Generally, investigations rely on the historical approach of listing the adverse events experienced during an accident, in order to find the causes of each adverse occurrence and to propose countermeasures that eliminate those causes. Safety is implicitly considered a performance achievable by eliminating the causes that contributed to the accident. The process of describing the events and the subsequent actions to impose barriers against future occurrences requires a large set of analyses with a complex structure. Figure 1 shows this common practice, which corresponds to analyzing in depth only the areas named in Figure 1 as disasters, accidents, incidents and, occasionally, near misses. Note that Figure 1 qualitatively represents all the possible outcomes of everyday performance in an ultra-safe system, e.g. the Air Traffic Management (ATM) system. In detail, the x-axis describes predictability, ranging from very low to very high; the y-axis describes the value of the outcome, ranging from negative to positive; and the z-axis describes the frequency of possible outcomes.
Figure 2 addresses the safety characteristics of a potential ultra-safe system, where the probability of a system failure is, e.g., 10^-4. These data give a clear overview of the characteristics of conventional safety research. Traditional analysis focuses on just 1 out of 10,000 events, while for every time something goes wrong there are 9,999 times when things go right and lead to the desired outcome without deserving, according to this view, any kind of analysis.
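As a rough illustration of the arithmetic behind these figures, the minimal Python sketch below splits a number of events into expected failures and expected successes for a given failure probability. The function name `expected_outcomes` is hypothetical and introduced only for illustration; the failure probability is the example value quoted above.

```python
def expected_outcomes(p_failure: float, n_events: int) -> tuple[int, int]:
    """Return (expected failures, expected successes) for n_events events."""
    failures = round(p_failure * n_events)
    return failures, n_events - failures


# Example value from the text: a failure probability of 10^-4.
failures, successes = expected_outcomes(1e-4, 10_000)
print(f"things that go wrong: {failures}, things that go right: {successes}")
```

Under this assumption, a traditional analysis examines the first count only and leaves the second, far larger, count unexamined.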
In line with this point of view, decomposing systems into their components allows a detailed and stable description, enabling an accurate analysis of the causes of events. However, as technical and socio-technical systems are continuously developing, work environments have gradually become more difficult to understand because of their complexity.
As a result, since classical safety analysis assumes that systems are tractable, in the sense that they are well-understood and well-behaved, classical models and methods become progressively unable to describe and genuinely address safety. The probability of occurrence of a safety event is a consequence of several factors, depending on the external condition, i.e. the operating scenario, and on the internal condition, i.e. the service level the system has to guarantee. It is then possible to associate these conditions with a probability of occurrence of safety events and, overall, with a probability of failure of the system. A failure, however, can happen independently of the conditions of the system, due to unpredictable factors or to an unpredictable combination of several predictable elements.
It therefore becomes necessary to adopt a new point of view, changing the definition of safety from “avoiding that something goes wrong” to “ensuring that everything goes right” or, more precisely, “the ability to succeed under varying conditions, so that the number of intended and acceptable outcomes is as high as possible”.
This perspective introduces the definition of resilience and paves the way to Resilience Engineering itself, which focuses on the whole set of outcomes: things that go right as well as things that go wrong. The only possible exceptions, with reference to Figure 1, are the areas of serendipity and good luck, where the situation is mostly in the hands of fate.
The resilience perspective emerged from ecology with Holling, who analyzed interacting populations and their functional responses in relation to ecological stability theory. The concept has since been analyzed and evaluated in depth, to the point that it has influenced other fields such as anthropology (Vayda and McCay), non-linear dynamics (Common and Perrings), cultural theory and human geography (Zimmerer), and the modeling of complex organizational systems and the social sciences (Costanza et al.), as described in Folke's detailed literature review.
In recent years, resilience has also gained considerable interest in the ATM system, mainly through the safety-related research of Hollnagel and his co-workers (Hollnagel et al. [14,15]; Hollnagel; Woltjer; Macchi et al.). Resilience has also received an increasing amount of attention in the areas of risk management and safety management over the past several years, focusing on critical infrastructures, communities, regions and various subsystems (such as a region’s economy, governmental units, etc.).
In the industrial context, several methods have been developed or are currently being improved, e.g. the Functional Resonance Analysis Method (FRAM) and the Systems-Theoretic Accident Modeling and Processes (STAMP) [18,19], which both hold that a system-wide evaluation is strictly necessary to consider the resilience performance of an organization. Resilience indeed acquires a fundamental role in the ATM system, where large numbers of interacting human operators and technical systems, acting at different levels in a variety of locations, must control air traffic safely and efficiently in the context of uncertainty and disturbances. The SESAR MAREA WP-E research project, started in March 2011 and completed in October 2013, has shown the importance of resilience engineering in ATM through the development of early-stage mathematical modeling to support its effective implementation.
In the ATM system, a resilience perspective fully addresses what the system requires in terms of safety and high-level quality performance. Although procedures and regulations tend to specify working methods, the flexibility of the system and system oversight by human operators are essential for efficient and safe operations in normal and exceptional conditions. Therefore, the process of effectively evaluating these abilities, in order to identify the weaknesses and strengths of the specific ATM organization and, accordingly, to plan actions or activities that enhance these skills, acquires crucial interest.
Even though progress in safety management has made flying one of the safest ways to travel, there is a strong consensus that safety always needs to improve so that it does not become static or inadequate. Figure 3 shows some well-known methods developed over the years to address technical, human-factor and organizational issues.
ICAO defines safety as “the state in which harm to persons or of property damage is reduced to, and maintained at or below, an acceptable level through a continuing process of hazard identification and risk management”.
This conception led, with good reason, to a focus on adverse outcomes (accidents, incidents), trying to reduce their number and limit their effects. The classic vision of Safety-I assumes that adverse events happen because something goes wrong and, above all, that it is possible to find and treat their causes. This belief, the causality credo, has been considered valid for decades, and many different models have defined practical tools for its application to real cases.
The Domino model, which first represented the linear cause-effect link, held true for systems characterized by a low level of detail and by few, very simple, interactions between their subsystems. The increasingly complex and hardly comprehensible socio-technical environments that have developed in recent years required more advanced and powerful models. Specifically, in the ATM system, the Reason Swiss Cheese Model acquired a fundamental role in most risk management analyses. The core idea of the EUROCONTROL Safety Regulatory Requirements (ESARRs) is based on the Reason Swiss Cheese Model, which relates a system failure to an alignment of the weaknesses of all the metaphoric barriers, permitting “a trajectory of accident opportunity” in which a hazard passes through all of the holes in all of the defenses, leading to a failure [28,29].
The FAA and the US Naval Safety Center, with the contribution of EasyJet, developed the Aerospace Performance Factor (APF), a methodology capable of evaluating the overall safety level of the ATM system and offering user-friendly outcomes for decision-makers.
Although building the APF can present some complexity in a real-case implementation, the APF indexes provide a performance evaluation tool that takes safety performance into account by analyzing the historical count of each safety event and combining these time series, through weights, into a single value that represents an overall risk. This value can be broken down into its components to analyze specific causal factors. Di Gravio et al. applied the APF methodology to the Italian ANSP, obtaining a robust and useful safety performance tool.
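The idea of combining weighted event-count time series into one value, and then breaking that value down again, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the APF methodology itself: the event names, the counts and the weights below are invented for the example, and the way the real APF elicits weights or normalizes counts is not described here.

```python
def weighted_risk_index(event_counts: dict[str, list[int]],
                        weights: dict[str, float]) -> list[float]:
    """Combine per-event count time series into one weighted value per period."""
    n_periods = len(next(iter(event_counts.values())))
    return [
        sum(weights[name] * series[t] for name, series in event_counts.items())
        for t in range(n_periods)
    ]


def breakdown(event_counts: dict[str, list[int]],
              weights: dict[str, float], period: int) -> dict[str, float]:
    """Split the index of one period into the contribution of each event type."""
    return {name: weights[name] * series[period]
            for name, series in event_counts.items()}


# Hypothetical data: two event types observed over three periods.
counts = {"runway_incursion": [2, 1, 0], "separation_loss": [4, 3, 5]}
# Hypothetical weights expressing the relative severity of each event type.
weights = {"runway_incursion": 0.75, "separation_loss": 0.25}

index = weighted_risk_index(counts, weights)
print(index)                          # one overall value per period
print(breakdown(counts, weights, 0))  # contributions in the first period
```

Summing the contributions returned by `breakdown` for a period gives back the index value for that period, which is what allows the single value to be traced to specific event types.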
The Reason Swiss Cheese Model, its derived methods and all the traditional ones, often referred to as Safety-I, promote a bimodal view of work and activities, according to which acceptable and unacceptable outcomes are due to clearly distinct modes of functioning. When things go right, it is because the system functions as it should and because people work as imagined. When things go wrong, it is because something has malfunctioned or failed (Figure 4). According to this view, a safer system is one in which the transition from normal functioning to abnormal functioning (or malfunction) is blocked or minimized.
Another fundamental assumption of Safety-I treats systems as decomposable into meaningful constituents, both in mechanical systems and in “soft systems” (departments, roles, stakeholders, etc.).
The classic safety thinking is therefore based on the following assumptions:
• Systems are decomposable and well-understood
• Systems and places of work are well-designed and correctly maintained
• Procedures are comprehensive, complete and correct
• Operators behave as they are expected to and as they have been trained to
• Designers have foreseen every contingency and have provided the system with appropriate response capabilities.
It is easy to understand that, in many systems such as ATM, these assumptions prove inappropriate, and therefore a new perspective must be developed.
Accident analysis and risk assessment methods have usually been developed in response to problems following major technological developments or to cope with “new” types of accidents. With reference to Figure 3, it is noteworthy that human-factor methods came onto the scene after the accident at Three Mile Island in 1979 and that organizational methods were developed following the Chernobyl and Challenger accidents in 1986.
Working conditions in ATM have changed significantly over the past decades. In detail, between 2009 and 2014, revenue in the global aviation industry grew at a compound annual growth rate of around 7.4 percent, reaching a net profit of US$9 billion in 2014. This financial performance is clearly the result of rising air cargo and passenger figures, which in turn are driven by a world that is becoming increasingly affluent and interconnected.
Aircraft movements, in terms of aircraft departures and aircraft kilometres flown, have been expected to increase over the period 2005-2025 at average annual rates of 3.6 and 4.1 per cent, respectively. The growth of passenger traffic on the major international route groups has been expected to range from 3 to 6 per cent through the year 2025. In detail, at European level, flight growth stabilizes at around 2.6% per year, showing higher rates in 2016 and 2020, as shown in the EUROCONTROL analysis for the EUROCONTROL Statistical Reference Area (ESRA). Table 1 summarizes the traffic forecast, which highlights conspicuous growth in three scenarios, i.e. high, base or low growth.
| Flight Movements (thousands) | High-Growth | . | . | . | . | 9834 | 10228 | 10675 | 11089 | 11487 | 11957 | 12332 |
| Annual growth (compared to previous year) | High-Growth | . | . | . | . | 2.40% | 4.00% | 4.40% | 3.90% | 3.60% | 4.10% | 3.10% |
Table 1: Summary of the flight forecast for Europe.
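The relationship between the two rows of Table 1 can be checked with a short Python sketch, which recomputes the year-on-year growth from the flight movement figures. The function name `annual_growth` is hypothetical; the input values are the high-growth movements reported in Table 1. The first growth value in the table (2.40%) cannot be recomputed this way, because the movement figure for the preceding year is not reported.

```python
def annual_growth(series: list[int]) -> list[float]:
    """Year-on-year growth in percent, rounded to one decimal place."""
    return [round((current / previous - 1) * 100, 1)
            for previous, current in zip(series, series[1:])]


# High-growth flight movements (thousands), as reported in Table 1.
movements = [9834, 10228, 10675, 11089, 11487, 11957, 12332]

growth = annual_growth(movements)
print(growth)  # [4.0, 4.4, 3.9, 3.6, 4.1, 3.1]
```

The recomputed values match the last six growth figures in the table.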
Besides these significant changes in air traffic volume, the complexity of Air Traffic Control (ATC) procedures has also changed dramatically, in order to respond to performance demands. Moreover, the progress of technology itself, and of IT and software capability in particular, has driven a significant change in the structure of tools and in the related human skills. Hence, it is clear that the conditions in which Air Traffic Controllers (ATCOs), and all the actors involved in ATC, work have become more complex and more interrelated. Few factors are truly independent of each other, and therefore a change to any of them will affect the others in ways that are often hard to understand. Decomposing problems and examining them one by one may become inadequate. The ATM system, like many other contemporary systems in major industries concerned with industrial safety, thus becomes intractable. Intractable systems (e.g. financial systems, space missions, military operations; EUROCONTROL) have common traits: the principles of functioning are only partly known; descriptions are elaborate, contain many details that are difficult to make explicit, and take so long to produce that the system changes before the description can be completed. Consequently, it is never possible to provide a complete description or specification of the system.
Since the models and methods of Safety-I assume that systems are tractable, in the sense that they are well-understood and well-behaved, they gradually become ineffective at describing systems with growing complexity and interdependence, such as intractable systems.
A possible solution is therefore to change the definition of safety, focusing on what goes right rather than on what goes wrong. Building on this fundamental belief, the new perspective of safety essentially paraphrases a possible definition of resilience. Although several documents discuss the concept of resilience in very broad terms and without reference to a specific object of analysis, two main families of definitions can be identified: those that focus on what happens “after the adverse event” [37-39], and those that include one or more “before the adverse event” components [40,41], including resistance, protection, anticipation and preparedness. For the purpose of this paper, it is necessary to consider the entire process, assessing the characteristics of the system before and after the adverse event.
In the ATM system, indeed, the management of safety has become a systematic and structured process which is integrated in all operating and support processes. Safety, in the Safety-II view, is the ability to succeed under varying conditions, so that the number of intended and acceptable outcomes, i.e. everyday activities, is as high as possible.
Safety-II acknowledges that systems are incompletely understood, that descriptions can be complicated and that changes are frequent and irregular rather than infrequent and regular, i.e. that those systems are intractable. In this conception, systemic methods based on Resilience Engineering acknowledge that acceptable and unacceptable outcomes have a common basis, namely everyday performance adjustments (Figure 5).
The basis of resilience is the awareness that individuals and organizations habitually adjust their performance to match current demands, resources and constraints, in order to compensate for the incompleteness of procedures and instructions. Thus, performance variability should be managed rather than constrained. It is necessary to identify everyday situations in order to analyze the variability of normal performance and the ways in which its variations may combine to create unwanted effects. It is also necessary to continuously monitor how the system functions, in order to intervene and dampen performance variability when it threatens to get out of control and, on the other hand, to accentuate or amplify it when it can improve successful outcomes. The characteristics of Safety-II can be summarized as follows:
• Systems cannot be decomposed in a meaningful way.
• System functions are not bimodal; everyday performance is flexible and variable.
• Human performance variability leads to successes as well as failures.
• Even though some outcomes can be interpreted as linear consequences of other events, some events result from coupled performance variability.
However, since resilience refers [44-47] to something that an organization does (its ability to adjust the way things are done) rather than to something that an organization has (e.g. traffic count, number of accidents/incidents), it is difficult to measure by counting specific outcomes, and specific models are required.
FRAM (Functional Resonance Analysis Method) uses a non-linear model based on the assumption that accidents result from an unexpected combination (resonance) of normal performance variability. FRAM characterizes complex systems by the functions they perform rather than by their structure. It captures dynamics and interactions among functions by modeling non-linear dependencies and the performance variability of system functions.
On the other hand, STAMP (Systems-Theoretic Accident Modeling and Processes) considers systems as interrelated components that are kept in a state of dynamic equilibrium by feedback loops of information and control. The system is treated not just as a static design but as a dynamic process that is continually adapting to achieve its ends and to react to changes in itself and its environment [49-51]. The process leading up to an accident can be described in terms of an adaptive feedback function that fails to maintain safety as performance changes over time to meet a complex set of goals and values. Accidents result not simply from component failure, but from inadequate control of safety-related constraints on the development, design, construction and operation of the system [52-55].
Dramatic changes to the airspace environment have affected the ATM system in the last decade, extensively modifying the way in which ANSPs, ATCOs and all the other actors operate. Equipment, procedures and human-factor interactions have become more complex, and only their correct understanding and management make it possible to achieve high-level performance targets. In this view, although safety analysis and risk assessment methodologies have evolved significantly from their starting perspective, several accidents and incidents still happen. These events, often characterized by a difficult etiology and a complex causality structure, highlight the need for a different perspective in ATM safety management.
Safety must be managed not only through barriers and defenses, according to a reactive perspective, but also through proactive and flexible strategies, while taking into account the limitation of resources, the irreducibility of uncertainty and the presence of multiple conflicting goals that characterize everyday activities [57-59]. In this context, resilience engineering and the principles of Safety-II could help in achieving a new high-level standard for ATM system safety management.