The term ‘industrial Ethernet’ was first coined in the late 1990s to describe Ethernet products designed for use in applications with challenging environments. Ethernet was promoted as an alternative to fieldbus technology, delivering openness and reduced costs thanks to the widespread availability of components for office-based networks. Unfortunately, some of these same office devices, better suited to air-conditioned environments, were deployed in industrial applications, with the inevitable consequence of premature failures.
As customers became educated about the need for products with robust industrial enclosures, the ability to withstand high operating temperatures, immunity to EMC interference and rapid ring recovery protocols, the reliability and effectiveness of Ethernet networks in industrial applications improved. However, even where users deployed industrial Ethernet devices, poor network reliability and device failures still occurred. Devices were being described as ‘industrial Ethernet’ irrespective of their suitability for use in tough industrial applications. Customers were assuming that because a product had a DIN-rail clip, it had been designed specifically to cope with the demands of extreme environments.
Over time, product reliability has of course improved tremendously, but even today problems still arise. Only recently Westermo was asked to troubleshoot an Ethernet network at a wind farm in the UK. We discovered that the problem was caused by software glitches and unreliable Ethernet switches. The switch vendor will remain nameless, but needless to say the customer was surprised to discover that these ‘industrial’ devices were not up to the task.
High cost of maintenance is critical
Failed and unreliable networking products cost industrial users millions of dollars every year in maintenance, network and product downtime, and loss of service. On this occasion the sites were unmanned, so maintenance visits were costly, adding to the significant loss of revenue for the operator while the turbine was out of service.
The more critical the application, the greater emphasis there is on equipment reliability. Should a device fail in a mission critical application there might well be redundancy or network repair technology in place. However, there may not be an opportunity to repair or replace the failed product until a scheduled period of downtime.
All devices will eventually fail, but it is important to understand and plan for the consequences of a failure. Firstly, how will a failure affect the network and the application as a whole? Will it cause a network failure and possibly production downtime or loss of service? Secondly, if a device fails, how easy will it be to maintain or replace? If a device is installed in an accessible cabinet onsite, it can be replaced easily. If it is installed in a remote location, a service engineer must make a trip into the field to investigate the failure and diagnose the problem, which can represent significant time and maintenance cost. If problems are slow to be resolved, the availability of service suffers. Often the most significant cost arises when a failure leads directly to a loss or suspension of service for the customer; the impact can be huge in terms of lost income streams and customer confidence. Network and device reliability are therefore critical, and this raises the question: exactly how reliable are industrial Ethernet products?
When evaluating products from different vendors, customers need not only to know the functionality and price of the product, but also the full life cycle cost including maintenance and possibly the cost of network downtime. Saving a few hundred dollars on the unit price of a device may prove to be insignificant against the cost of travelling out to replace it or worse still a loss of service. The more critical the application, the more important the reliability of the device is.
Ask any vendor how reliable their industrial Ethernet products are and “robust, suitable for industrial use and very reliable” will be the reply. But how is reliability quantified?
Mean Time Between Failure (MTBF)
MTBF is the predicted elapsed time between inherent failures of a device or system during operation. It can be calculated as the arithmetic mean (average) time between failures and is a measure of how reliable a product is, usually given in units of hours – the higher the MTBF, the more reliable the product. Manufacturers often provide it as an index of a product’s reliability and, in some cases, to give customers an idea of how much service to plan for.
However, the quoted figure must not be confused with the expected service life of a product. During the service life the failure rate is approximately constant and equal to 1/MTBF. Just because a device has a stated MTBF does not mean that the product will last that long. In fact, in a system consisting of a large number of devices, failures can statistically occur quite often even when each device’s MTBF looks respectable. For example, if a device has an MTBF of 100 000 hours (about 11 years) and there are 1000 of these devices in the system, a failure can be expected roughly every 100 hours (about 4 days).
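The fleet arithmetic above can be sketched in a few lines (a minimal illustration assuming the constant failure rates discussed later; the variable names are ours, not from any standard):

```python
# Assumptions: constant per-device failure rate, so the fleet's combined
# failure rate is simply the sum of the individual rates.
mtbf_hours = 100_000      # per-device MTBF (about 11 years)
fleet_size = 1_000        # devices deployed in the system

# Expected time between failures somewhere in the fleet = MTBF / fleet size
hours_between_failures = mtbf_hours / fleet_size
print(hours_between_failures)   # 100.0 hours, roughly 4 days
```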
The MTBF figure can be calculated from laboratory test data, actual field failure data, or a statistical model. Live testing in laboratories can yield very good numbers if few units fail. Real data collected from the field over time tends to be very accurate but is difficult to gather. Statistical or computer models offer a good basis for comparison, but do tend to deliver worst-case figures.
If we ran 1000 units for 1000 hours and 10 failed, the calculated MTBF would be (1000 × 1000)/10 = 100 000 hours.
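The test-data calculation can be expressed as a small helper (a sketch; the function name is illustrative, not from any standard):

```python
# MTBF from test data: total unit-hours of operation divided by the
# number of observed failures.
def mtbf_from_test(units, hours, failures):
    """Return MTBF in hours given units tested, test duration, failures."""
    return units * hours / failures

print(mtbf_from_test(1000, 1000, 10))   # 100000.0 hours
```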
The MTBF is often calculated based on an algorithm that factors in all of a product’s components to reach the sum life cycle in hours.
MTBF = 1/(FR1 + FR2 + FR3 + ... + FRn), where FR is the failure rate of each of the system’s n components, and FR = 1/MTBF for that component.
The same equation can be used to calculate the MTBF of a complete system consisting of a number of Ethernet devices, with FR being the failure rate of the device instead of the component. So if for example you have three devices in your network with MTBFs of 100 000 hrs, 100 000 hrs and 200 000 hrs, the system MTBF will be 40 000 hrs.
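The same formula is easy to sketch in code and reproduces the three-device example above (the function name is ours):

```python
# System MTBF = 1 / (FR1 + FR2 + ... + FRn), with FR = 1/MTBF per device.
def system_mtbf(device_mtbfs):
    """Combine per-device MTBFs (hours) into a system MTBF (hours)."""
    return 1.0 / sum(1.0 / m for m in device_mtbfs)

# Three devices: 100 000 h, 100 000 h and 200 000 h
print(round(system_mtbf([100_000, 100_000, 200_000])))   # 40000 hours
```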
For electronic devices, it is commonly assumed that components have constant failure rates following an exponential distribution. MTBF is also reduced by increased operating temperature: as a rule of thumb, MTBF halves for every 10°C rise in temperature. MTBFs are normally quoted for use at 25°C. Note that under the constant-failure-rate model failures accumulate steadily over time; by the time a population of units reaches its calculated MTBF, roughly 63% of them can be expected to have failed.
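The temperature rule of thumb can be sketched as follows (an approximation only; a vendor’s properly derated figure should always take precedence):

```python
# Rule of thumb from the article: MTBF halves for every 10°C rise above
# the 25°C reference temperature at which MTBFs are normally quoted.
def derated_mtbf(mtbf_at_25c, operating_temp_c):
    """Approximate MTBF (hours) at a given operating temperature."""
    return mtbf_at_25c * 0.5 ** ((operating_temp_c - 25) / 10)

print(derated_mtbf(200_000, 45))   # 50000.0 hours at 45°C
```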
Increasing the MTBF of Ethernet devices
By designing and constructing Ethernet devices to the highest standards, it is possible to increase the MTBF of these products. Careful consideration must go into the components used; their quality is a major factor in the resulting MTBF. Cheaper, lower-quality components, which may individually fail under certain conditions or after certain periods, will inevitably reduce the MTBF. Multi-board designs lead to lower MTBF, and the greater the number of components used within a device, the greater the chance that one of them will fail.
The use of electrolytic capacitors can also reduce the MTBF. These components are desirable because they offer larger capacitance, which helps to prevent a loss of service between the time power is lost and a UPS starts to operate. However, electrolytic capacitors dry out over time, and this reduces the MTBF. Vendors that provide purpose-built solutions overcome this by using this type of component only when the application specifically requires it.
There are a number of other variables that can lead to product failures. Aside from component failures, the application or installation can also result in failure. For example, if a customer misuses a product and it then malfunctions, should that be considered a failure? In reality, degradation modes could limit the life of the product much earlier due to some of the variables listed above. It is entirely possible to have a product with an extremely high MTBF but an average, more realistic expected service life.
Different MTBF calculation standards
An MTBF may be an expected line item in a request for quotation. Without the proper data, a manufacturer’s piece of equipment would be immediately disqualified. When equipment such as industrial Ethernet switches must be installed into mission critical applications, MTBF becomes very important. Therefore the higher the MTBF, the higher the reliability, but are all MTBF figures the same?
In short, the answer is no! There is currently a lack of consistency between the MTBF figures supplied by vendors. Two products with the same MTBF figures may not have the same level of reliability. The MTBF figure is certainly a good starting point when understanding reliability of a product, but unfortunately that is not enough. The method used to calculate the MTBF figure is critical.
A number of prediction methods have been developed to determine reliability. The two most commonly used calculation methods when compiling reliability data for Ethernet devices are MIL-HDBK-217F Notice 2 (Military Handbook) and Telcordia SR332. The MIL-HDBK-217F method gives a relatively conservative MTBF for industrial applications since it is based on military component requirements. The MIL-HDBK-217F method encompasses two ways to predict reliability: Parts Count Prediction, which is used to predict the reliability of a product in its early development cycle; and Parts Stress Analysis Prediction, used towards the end of the development cycle as the product nears production. There are also environmental conditions to take into account in the calculation dependent on the location of the installation. These environments vary for ground, mobile, naval or airborne applications. Typically the Ground Benign (GB) environment is used in calculation for industrial equipment.
The Telcordia SR332 method originated in the telecom industry. The calculation is simpler than MIL-HDBK-217F and tends to produce higher MTBF figures, hence it is sometimes used for products that may not perform well under the MIL standard calculation. Another point to consider is that an operating cycle can be defined in the calculation: by assuming that the unit will operate only 5 days a week and 8 hours a day, a huge MTBF figure can be produced. It is for the user to decide whether their power substation, train signalling system or water pumping system can be switched off overnight.
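A quick calculation shows how much a duty-cycle assumption can inflate a quoted figure (the 5-day/8-hour cycle is from the article; the rest is simple calendar arithmetic):

```python
# Hours a unit runs per year under continuous operation vs an assumed
# office-hours duty cycle (8 hours/day, 5 days/week, 52 weeks).
continuous_hours_per_year = 24 * 365        # 8760
office_hours_per_year = 8 * 5 * 52          # 2080

# A failure rate quoted per *operating* hour turns into a much larger
# calendar MTBF if the unit is assumed to be switched off most of the time.
inflation = continuous_hours_per_year / office_hours_per_year
print(round(inflation, 2))   # about 4.21x higher calendar MTBF
```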
Another factor to consider is the operating temperature to which the MTBF relates. Higher operating temperatures reduce the MTBF significantly. The MTBF is usually calculated at an operating temperature of 25°C, so if the device is going to be used continuously in temperatures above that, then it is important to acquire the MTBF at that specific operating temperature. Ask your vendor as this will not always be on the promotional flyer.
It is critical for customers to question what method has been used to calculate a product’s MTBF and to understand that not all MTBF results are the same. For products designed to be used in industrial applications, the MIL-HDBK-217F standard should be used.
When installing industrial Ethernet networks in applications that require low levels of maintenance and continuous uptime, the reliability of individual devices will be critical. Naturally vendors will promote how robust and reliable their own devices are, but where that reliability is backed up with a high MTBF figure, the questions for users must be: what method was used to calculate it, was it the MIL-HDBK-217F Notice 2 (Military Handbook) method, and at what operating temperature? It then falls to vendors to ensure this information is available and accurate, with supporting documentation, so that customers can select the appropriate devices for the demands of their application. If a vendor is unable to supply all this, the user needs to consider whether the product is right for their application.
© Technews Publishing (Pty) Ltd | All Rights Reserved