Historian's demise greatly exaggerated
November 2010, IT in Manufacturing
Today, manufacturers in all industries must be more competitive, and understand exactly how to tune and control their manufacturing processes in an optimal manner to stay competitive. Automation software is sophisticated and gives operations personnel the visibility of the plant state in real time, enabling them to make changes and tune process parameters to achieve the optimal state, regardless of end product.
The plant may be making bread, gasoline or pharmaceuticals, and without the constant control and feedback from their process systems, operators can seldom produce optimal output. However, real-time control is not the only control that enables optimal production. Analysis of previous production statistics allows engineers to learn what went right or wrong about a previous production run. Knowing how a change in one variable parameter can affect the yield of a process is highly valuable information. Typically, it takes multiple production runs before one is deemed perfect enough to be a role model for future production – often termed a ‘golden batch’.
To enable the analysis of a process, it is necessary to record information regarding operating parameters and states at the time of production. This is where the plant historian becomes a useful application tool. A plant historian is a database system designed to record as many parameters of a manufacturing process as needed. Typically, vast amounts of data are created during a manufacturing production run, and often data changes occur at high frequencies – especially when something goes wrong. This information must be stored accurately and timely.
Does any database system meet this challenge? Some proponents would suggest that a commercial, off the shelf database can be used, but in reality, this is not the case.
Just because the New York Stock Exchange trading systems and other high-throughput applications use a relational database does not mean they are the correct database for everything. For example, take time-series data; can SQL Server or Oracle store time-series data? Certainly. Are there issues you should know before you try it? Absolutely. In this paper we will review some myths on how a commercial database can be used as a plant historian, and why these ideas are not as sound as they seem initially.
Myth 1: Storage is so cheap that efficiency does not matter
You must fully understand how much data a typical process actually generates. A modest 5000-tag historian logging data every second generates 157 billion values per year. Stored efficiently in 8 bytes each, that is roughly a terabyte a year. In certain tests, comparing SQL Server storage requirements for time-series data to the Wonderware Historian, the difference was 50:1 and included all the necessary indices. Even with storage prices falling, 50 terabytes a year is a considerable amount of data. In addition, you must recognise that having enough disk space to hold that much data is insufficient, most historian applications also require that the data be protected, multiplying the amount of storage for backups or disk mirroring. Some industries have regulatory requirements for several years’ worth of data, further amplifying the required storage amount.
Myth 2: Relational databases are fast enough
As hardware price-performance has improved, relational databases have benefited. However, relational databases are designed to protect referential integrity around ‘transactions’ that may update multiple table values in unison, adding significant overhead. For example, on high-end hardware (running 96 processors) SQL Server 2008 established a record 2013 transactions per second. Even on such high-end hardware it is not possible to store 5000 values per second and treat each value as a transaction. Therefore, a front-end buffering application must collect the data and stream numerous values into the database in a single transaction. Databases without full transactional support, such as MySQL’s freeware MyISAM storage engine, can support higher throughputs, but still require a front-end buffer to achieve adequate throughput for all but the smallest historian applications.
Obviously, the main reason to store data is so that it can be retrieved easily, making retrieval performance extremely important. In general purpose solutions, such as a relational database, it is possible to organise data so that it is either efficient to store (higher throughput) or efficient to retrieve (fast retrieval), but not both. Efficient retrieval of time-series data from general purpose databases requires the use of a clustered index, such as in the higher-throughput MyISAM storage engine, which is currently unavailable in general purpose products.
In contrast, purpose built storage engines are designed specifically for time-series data leverage, knowledge of how data is collected and consumed allows it to be stored efficiently for both – this would not be possible if the data was generalised.
Myth 3: Managing data in a relational database is trivial
Relational databases are designed to accumulate massive amounts of data. However, as the amount of data grows so do query execution times, the size of backups and numerous other routine operations. To alleviate this problem, database administrators routinely purge data from the system. In any database that protects transactional integrity, this purge operation must suspend normal updates, which is a problem for historian applications running 24/7/365. To make the actual purge operation tolerable requires minimising the amount of data maintained in the database.
In the event purged data is required later (for example, in response to an audit or some regulatory demands); the data cannot be easily restored. Generally, recommended practice is to restore a full database backup that includes the required data to a separate system dedicated for this purpose. This is even more problematic if the required data is unavailable within a single database backup. For example, if the data is only maintained for the last 30 days in the online database and an audit requires 90 days of data, you must either manually merge all the data into a single database, requiring three systems each with an isolated 30-day window, or serially examine each backup.
True historians, in contrast, are designed to handle the rapid growth in data and provide simple means of taking subsets of the data offline and online.
Myth 4: Retrieving time-series data is no different from any other type of data
With all the power of Structured Query Language (SQL) to query data, some claim that relational databases are as good at retrieving time-series data as they are transactional data. SQL allows greater flexibility but is based on some fundamental assumptions that do not apply to time-series data: a) there is no inherent order in the data records (in fact, time-series data is ordered by time), b) all the data is explicitly stored (in fact, most historian data only represents samples from a continuum of the real data), all data is of equal significance.
These two differences are significant. For example, if an instrument reports a value time stamped at ‘7:59:58.603’ and a user queries a relational database for the value at ‘8:00:00.000’, no data is returned as there is no record stored at that precise time – the database does not recognise that time is a continuum. Similarly, if a temperature was ‘21,0°C’ and two-minutes later was ‘23,0°C’, it has no inherent ability to infer that halfway between these samples the temperature was approximately ‘22,0°C’.
In historian applications, rarely are steady-state operations significant. If the only way for a client application to find exceptions is to query all of the data for a measurement, this could place a heavy load on the overall system: server, network and client. In contrast, historians generally have the means of filtering out insignificant data (based on comparing sequential records) to reduce the volume that must be delivered to client applications.
Myth 5: All data is equal in importance and quality
In collecting thousands of data points from around a process, it is inevitable that some information is incorrect. For instance, there may be problems with physical equipment that is out of range, or simply not working. To a standard database, a stored value is precisely that, a value. In a plant historian, a stored data point not only has an associated value and time, it has an indication of data quality. Storing a data point from an instrument outside of the instrument’s normal operating range, for example, will cause a specific series of quality indicators to be stored with the value. When these are retrieved, they can be used to alert operations or engineering personnel to the potential anomaly. The information used within the historian in a summary point (for example, calculating an average over the last hour of a temperature) will cause the resulting aggregate value to have a quality factor. Managing and propagating the quality of data values within a process historian is necessary to enable any report or analysis performed with that data to be flagged as suspect and alert the consumer of the data.
Myth 6: The only options are fully relational or fully proprietary historian solutions
While it is true that most historian solutions use fully proprietary technology to address the inherent limitations of relational database or fully leverage relational database to reduce their own engineering costs, Wonderware Historian delivers the best of both worlds. It relies on a solid relational scheme for managing all the relatively static configuration data, but extends the native transactional storage engine and query processor of Microsoft SQL Server with proprietary extensions to address the limitations for time-series data.
Building on Microsoft SQL Server delivers a solution that is easier to secure and manage than fully proprietary solutions, but without compromising on the fundamental capabilities required in a historian.
This white paper discussed several reasons why a process historian is superior to the task of plant information acquisition and retrieval compared to a relational database system. However, this is not to say that commercial software has no place in an industrial environment. Today, process information is often needed outside of the plant environment and inside the business systems section of an enterprise. There is no better way to provide this interface between the plant data and the enterprise systems than a commercially accepted, standard interface. As pointed out in the last myth, the Wonderware Historian integrates a commercially available product (Microsoft SQL Server) with its open, standard query interface (SQL), to provide open access to plant historical data. This interface is easily understood by the IT department for reporting or integrating into the enterprise ERP system.
Wonderware’s Historian offers all the capability discussed within this paper and more. Trusted and in use in over 25 000 installations worldwide, Wonderware Historian empowers plant operations and enterprise business users alike, delivering the right information to the right person and leaving database management where it belongs – in the enterprise IT department and not on the plant floor.
For more information contact Jaco Markwat, Wonderware Southern Africa, 0861 WONDER, email@example.com, www.wonderware.co.za