Monday, April 6, 2015

How to Stop Rising Emergency Preventive Maintenance Costs

By Ananth Seshan, Chairman, MESA Asset Performance Management (APM) Working Group and MESA International Board Member at Large
One of the main objectives of Asset Performance Management is to maximize the net return on production assets. This requires maximizing availability by maintaining the assets in good health. This in turn is achieved by adopting a sound policy of maintenance interventions.  Too much intervention is costly.  Too little – again costly!  In other words, we need to arrive at an optimal balance of proactive and reactive interventions.

Proactive maintenance includes any activity performed to maintain an asset before it fails. Reactive maintenance is any maintenance activity performed after a failure. Proactive maintenance comprises three types: Emergency Preventive Maintenance (EPM), Preventive Maintenance (PM), and Predictive Maintenance (PdM).

Reactive maintenance comprises two types: Emergency Corrective Maintenance (ECM) and Non-Emergency Corrective Maintenance (NCM). Achieving an optimal balance means increasing spending on proactive maintenance just enough to reduce the cost of reactive maintenance substantially.
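The taxonomy above can be captured in a few lines of Python, which may be handy when tagging work orders for cost reporting. This is only a minimal sketch: the names `Category`, `MaintenanceType`, and `cost_by_category` are illustrative assumptions, not part of any EAM product, and work orders are assumed to arrive as (type, cost) pairs.

```python
from enum import Enum


class Category(Enum):
    PROACTIVE = "proactive"  # performed before the asset fails
    REACTIVE = "reactive"    # performed after a failure


class MaintenanceType(Enum):
    # Each member carries a readable label and its category.
    EPM = ("Emergency Preventive Maintenance", Category.PROACTIVE)
    PM = ("Preventive Maintenance", Category.PROACTIVE)
    PDM = ("Predictive Maintenance", Category.PROACTIVE)
    ECM = ("Emergency Corrective Maintenance", Category.REACTIVE)
    NCM = ("Non-Emergency Corrective Maintenance", Category.REACTIVE)

    def __init__(self, label, category):
        self.label = label
        self.category = category


def cost_by_category(work_orders):
    """Sum cost per category for an iterable of (MaintenanceType, cost) pairs."""
    totals = {Category.PROACTIVE: 0.0, Category.REACTIVE: 0.0}
    for mtype, cost in work_orders:
        totals[mtype.category] += cost
    return totals
```

Comparing the two totals period over period gives a rough view of whether extra proactive spending is actually being offset by lower reactive cost.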

Emergency Corrective Maintenance

ECM activities are the costliest and are performed as a result of emergency failures, so organizations typically take significant measures to avoid such failures. Emergency Preventive Maintenance (EPM) is one of the main contributors to the cost of those measures. EPM has to be distinguished from ECM. In the case of EPM, the machine in question is in a state of imminent emergency failure (it has not failed yet) and is prevented from failing by appropriate, rapid-response EPM activities.
In the case of ECM, the machine has already failed. Even though EPM is not as costly as ECM, it is the costliest component of the Proactive Maintenance activities and therefore has to be minimized as much as possible. The cost of EPM is normally a function of the following two elements: (1) a crisp definition of emergency in the maintenance policy; and (2) prior knowledge of asset failure modes.

Definition of Emergency
The maintenance policy needs to provide clarity on what constitutes an emergency. Based on such clarity, an unambiguous method of identifying a maintenance event as emergency or non-emergency should be practiced across the organization as a standard operating procedure (SOP).
More often than not, the policy document is unclear or outdated, and therefore the SOP derived from it is imprecise or irrelevant.  This leads to the SOP not being followed in a standard way. Lack of a standard, even in pockets to begin with, creates a culture of ad hoc practices. Such tendencies increase the cost of EPM quite rapidly and need to be nipped in the bud. 

Also, it is imperative that the maintenance policy document be updated every couple of years, since market needs and the organization's asset base change continuously and new dynamics are introduced all the time. Typically, it is useful to audit adherence to the maintenance policy every six months, especially as regards interventions.

More specifically, as regards EPMs, the audit can review the costliest 30% of emergency interventions in the previous six months to understand the following:

  1. How many emergencies were real emergencies (True Positives) that were avoided due to EPM interventions?
  2. How many were false alarms (False Positives), that is, maintenance calls where the team realized upon reaching the site that the situation was not actually an emergency, resulting in wasted EPM calls?
  3. How many were not visible or “hidden” until it was too late (False Negatives), so that proactive measures could not be taken and emergency failures resulted?
Answers to these questions will help the organization understand the effectiveness of its emergency maintenance practices and reduce both the cost of EPM activities and the incidence of false negatives in the future.
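The audit questions above can be tallied with a short script once each intervention has been labeled by the auditor. The following is a minimal sketch under stated assumptions: the `Intervention` record, its field names, and the `audit_summary` function are hypothetical and would need to be mapped to whatever the organization's EAM system actually exports. The precision and recall ratios are an added convenience, not something the audit itself prescribes.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Intervention:
    asset_id: str
    cost: float
    outcome: str  # "TP", "FP", or "FN", as assigned by the auditor


def costliest_share(interventions, share_percent=30):
    """Return the costliest share_percent of interventions, rounded up."""
    ranked = sorted(interventions, key=lambda i: i.cost, reverse=True)
    # Integer ceiling division avoids floating-point rounding surprises.
    keep = (len(ranked) * share_percent + 99) // 100
    return ranked[:keep]


def audit_summary(interventions, share_percent=30):
    """Count true positives, false positives, and false negatives in the sample."""
    sample = costliest_share(interventions, share_percent)
    counts = Counter(i.outcome for i in sample)
    tp, fp, fn = counts["TP"], counts["FP"], counts["FN"]
    return {
        "reviewed": len(sample),
        "TP": tp,
        "FP": fp,
        "FN": fn,
        # Share of EPM calls that were real emergencies.
        "precision": tp / (tp + fp) if tp + fp else None,
        # Share of real emergencies that were caught before failure.
        "recall": tp / (tp + fn) if tp + fn else None,
        # Money spent on calls that turned out to be false alarms.
        "wasted_epm_cost": sum(i.cost for i in sample if i.outcome == "FP"),
    }
```

A falling precision from one audit to the next would point to more false alarms, while a falling recall would point to more hidden failures.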

The audit can also reveal emerging variance in how the SOP is adopted, as well as fuzziness in the SOP itself, which can lead to improvements to the SOP and to more training and emphasis on it. If there is a large number of false positives, a proactive investigation is required into why those responsible for making the call are unable to do so correctly. There can also be simple hidden reasons for ordering emergency maintenance, such as labor rates for maintenance work being higher under EPM than under regular PM activities.

Prior Knowledge of Asset Failure
Another factor that has an impact on EPM costs is prior knowledge of asset failure modes, that is, how well informed the organization is about the potential causes of failure of a critical asset.

Every critical asset will have a history of failures. Typically, root causes of failures of assets are recorded and maintained in the Enterprise Asset Management (EAM) systems. Often the cause of previous failures can teach us to look for symptoms of the same cause recurring in a future period.

An illustration of a P-F Graph punctuated by progressive failure states  (Source – A Blue-print for Reliability Excellence in Water Utilities, 5G Automatika Ltd.)

The P-F Graph (Potential Failure Graph) is one such tool that can be used to forecast a potential failure of equipment given its current state. P-F Graphs, such as the one shown above, can be synthesized for critical assets by recording the progressive set of symptoms that can be observed during a certain time period before an emergency failure happens. The critical asset can be monitored for potential failure symptoms using state-of-the-art real-time monitoring software. The same real-time software can be used to record new symptoms.

These inputs can actively drive PMs (PM schedules dynamically refined based on early symptoms of potential failure) or PdMs and thereby reduce the need for EPM activities. In other words, the organization can strive to prevent an asset from reaching the state of imminent emergency failure through active Preventive or Predictive Maintenance interventions. This in turn reduces the need for EPM responses, which are performed very close to a failure and are therefore costlier.
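One way to picture how recorded symptoms could drive the choice of intervention is a simple lookup over the ordered stages of a P-F Graph. This is a sketch only: the `Stage` record, the `recommend` function, and the pump symptoms with their assigned responses are all hypothetical examples, not data from the graph cited above or from any monitoring product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    symptom: str   # observable symptom recorded on the P-F Graph
    response: str  # "PdM", "PM", or "EPM"


# Hypothetical symptom progression for one asset, earliest stage first.
PUMP_PF_STAGES = [
    Stage("vibration above baseline", "PdM"),
    Stage("audible bearing noise", "PM"),
    Stage("bearing housing hot to touch", "EPM"),
]


def recommend(stages, observed):
    """Return the response for the most advanced observed symptom, or None."""
    latest = None
    for stage in stages:  # stages are ordered from early to late
        if stage.symptom in observed:
            latest = stage
    return latest.response if latest else None
```

The earlier on the curve a symptom is caught, the more often the recommendation stays at PdM or PM rather than escalating to EPM.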

Dr. Ananth Seshan is the CEO and Managing Director of 5G Automatika Ltd., a high-technology software product company from Canada. The company is headquartered in Ottawa and has operations in the UK and India. He has been the main thought leader behind the company's successful flagship product, Enterprise Gateway. Enterprise Gateway has a user footprint in 13 countries and more than 100 installations in large manufacturing organizations and utilities. It is the first off-the-shelf product of its kind in the niche area of "Plant to Enterprise" (P2E) Integration.
