IT in Manufacturing


Recovering from industrial data disasters

June 2022 IT in Manufacturing

It can quickly become your worst nightmare. The plant’s systems are down, and the backups don’t work. Production has stopped. Everyone is looking to you to sort it out quickly.

As manufacturers increase their use of digital technologies, so the amount of data grows. This comes with increased risk, which is further exacerbated by the variety and complexity of new interconnected systems.

The way manufacturers manage risk is by assessing the probability of failure together with the consequences. The risk of automation system failure and the associated data loss is high because of the severe consequences. Aside from bringing production to a sudden halt, there are often safety and environmental implications when shutting down a plant. Risk mitigation must reduce both the likelihood of failure and the consequences.

Risks arise from expected and unforeseeable sources

There are many possible causes of system failure, not all of which can be prevented. Failure might arise from human error, malicious activity, natural disasters or equipment faults. Often there is a single point of failure. These common-mode failures might include a shared power supply or utility, security services, reliance on a third party, rolling out incompatible operating system patches, and many more.

While we usually consider significant disasters such as floods, it is more common that system failures will arise from less obvious causes. A severed cable or a water leak onto a vital computer circuit can go undetected for days. What about malicious damage or sabotage? When formulating a disaster recovery plan, it is helpful to remember that you will not be able to identify and prevent every possible cause of failure.

Because failure is inevitable at some stage, you must implement proper controls that serve to limit the consequences. Disaster recovery (DR) is an integral part of business continuity planning (BCP) as it ensures that proper mitigating controls are in place to protect the organisation from loss, corruption or compromised information.

Central to a well-formulated disaster recovery plan is making a determination of the system’s recovery point objectives (RPO) and the recovery time objectives (RTO). For example, you might decide that a specific PLC needs to be restored within two hours to a particular software version (which might not necessarily be the most recent update). Or you must regain your scada system within six hours to a point where you can retrieve data for the past 30 days. A laboratory management system might need to be up and running in four hours. And so on.

Backup and restore procedures

Backup and restore procedures will form an integral part of disaster recovery. Backup and restore systems may be on-premises in the same data centre, in an offsite location or even in the cloud. Each of these configurations will affect the time to recover your plant. There are also implications on the network infrastructure to guarantee data transfer rates during both ongoing operations and the recovery process.

When the cloud is used as a backup data store, it is vital to understand how your data is safeguarded. Service level agreements must cater for disaster recovery procedures that align with your recovery objectives. Not all cloud vendors and infrastructure providers are equal in this regard, so do your due diligence carefully.

A variety of PLC distributed control systems (DCS) and scada systems will be at the heart of the automation and control in any plant. The safe operation of the plant will rely on multiple interconnected systems, some of which might no longer be supported by the vendor. A failure in any subsystem that is not repaired quickly could lead to shutting down sections of the plant. It is also vital to back up every point of integration.

The risk of manufacturing system failure can be reduced by having some redundancy together with regular backups. In mission-critical process control applications, redundancy might involve installing a ‘hot standby’. Backups will then act as a second layer of defence. Remember that redundancy will introduce additional costs and can pose an added risk.

Reliable systems do not equate to clean backups

It is possible to gloss over and confuse the techniques for improving reliability with a backup. For example, having hot-swappable hard drives in a redundant array with self-diagnostic capability will enhance reliability and might ‘tick the box’ in your mind. But this system of reliable hard drives is not enough if the data itself becomes corrupted, whether through failure elsewhere in the system or malicious activity. A second data centre on-site with a hot standby is also of little use if the data corruption has been replicated. You need to be able to restore backward in time to a specific point where you know the data was not compromised.

The DRP itself could fail. This is quite possible because it is hard to test a complete backup/restore without creating some form of disruption. Production pressures can limit the window for shutdowns needed to test such systems thoroughly. A full backup/restore test should also involve the vendors who are responsible for subsystems. Often, subsystems are tested independently and you simply accept the risk of not testing the integrated whole. It is important that you also understand the risk of an incomplete DR test and how you will mitigate it.

Cost-cutting might, in the past, have resulted in your company cancelling service agreements with OEM vendors and taking over the responsibility of specialised or proprietary disaster recovery in-house. But with that responsibility comes the need to ensure the right skills are available at short notice during a system failure. Over time these specialised skills tend to dissipate, leaving the organisation vulnerable.

The importance of a regular risk review and continuous auditing of the effectiveness of your control measures cannot be overstated. Just because something has never happened does not mean it never will. Complacency is a real risk and needs to be constantly challenged – test, test and re-test your disaster recovery plan.

IT security professionals advocate a zero-trust approach whereby you make no assumptions about the trustworthiness of any factor outside your direct control. A similar uncompromising and critical approach is essential to also ensure the continuity of industrial systems.


About Gavin Halse


Gavin Halse.

Gavin Halse is a chemical process engineer who has been involved in the manufacturing sector since mid-1980. He founded a software business in 1999 which grew to develop specialised applications for mining, energy and process manufacturing in several countries. Gavin is most interested in the effective use of IT in industrial environments and now consults part time to manufacturing and software companies around the effective use of IT to achieve business results.

For more information contact Gavin Halse, Absolute Perspectives, +27 83 274 7180, [email protected], www.absoluteperspectives.com




Share this article:
Share via emailShare via LinkedInPrint this page

Further reading:

Siemens ecosystem strengthens data and AI integration
Siemens South Africa IT in Manufacturing
Siemens has announced significant expansions to its Industrial Edge ecosystem, accelerating data and AI integration and releasing enhanced cybersecurity functionalities. These enable a seamless integration of IT and OT environments, optimise processes and reduce operational disruptions.

Read more...
Siemens manages shipbuilding process for HD Hyundai
Siemens South Africa IT in Manufacturing
Siemens has been selected by HD Korea Shipbuilding & Offshore Engineering as a preferred partner to establish an integrated platform to manage the entire shipbuilding process as a single data flow to help ensure consistency across all its global shipyard facilities.

Read more...
Transforming the process industry through digitalisation
Endress+Hauser South Africa IT in Manufacturing
By connecting field devices, systems and people, digitalisation creates new opportunities to optimise operations, enhance maintenance strategies and support continuous improvement. As a leading instrumentation provider and major source of process data, Endress+Hauser plays a key role in enabling this transformation.

Read more...
The OT operator’s guide to security and uptime on the plant
RJ Connect IT in Manufacturing
The article addresses three common questions about industrial network deployment and maintenance, exploring ways to achieve better control and visibility with more efficiency.

Read more...
The assets you can’t see are the ones that can shut you down
IT in Manufacturing
ABEGuardOT is an asset management solution that delivers continuous, non-intrusive visibility across multi-vendor environments, including Siemens, Rockwell, ABB, Honeywell, Schneider Electric, Emerson, GE and Yokogawa, with support for OPC UA, EtherNet/IP, Modbus and Profibus.

Read more...
Edge I/O NTS and the need for industrial speed
Schneider Electric South Africa IT in Manufacturing
One of the most compelling solutions to emerge from industrial automation is Edge I/O NTS, which represents a natural evolution of computing from centralised servers to localised, device-level input/output processing, offering improved speed, efficiency and resilience.

Read more...
The next wave of AI-driven process automation
Schneider Electric South Africa IT in Manufacturing
As process industries hurtle toward an AI-driven future, four powerful trends are set to redefine automation strategies in 2026: hyper automation, AI-first automation, low code/no code platforms, and advanced process intelligence.

Read more...
Huge increase in denial-of-service cyber threats
IT in Manufacturing
NETSCOUT has released its Distributed Denial-of-Service Threat Intelligence report, revealing sophisticated attacker collaboration, resilient botnets and compromised IoT infrastructure that drove more than eight million DDoS attacks worldwide.

Read more...
Sustainable manufacturing
ABB South Africa IT in Manufacturing
ABB’s production facility in Shandong province, China is delivering measurable energy and emissions reductions through the implementation of advanced digital energy management and electrification solutions.

Read more...
Open automation is breaking legacy chains
Schneider Electric South Africa IT in Manufacturing
Industrial automation is now entering a new era defined by open, software-driven principles that are breaking decades of hardware-bound limitations.

Read more...









While every effort has been made to ensure the accuracy of the information contained herein, the publisher and its agents cannot be held responsible for any errors contained, or any loss incurred as a result. Articles published do not necessarily reflect the views of the publishers. The editor reserves the right to alter or cut copy. Articles submitted are deemed to have been cleared for publication. Advertisements and company contact details are published as provided by the advertiser. Technews Publishing (Pty) Ltd cannot be held responsible for the accuracy or veracity of supplied material.




© Technews Publishing (Pty) Ltd | All Rights Reserved