In the previous post, Understanding the Value of Data, I examined how we can model both the data lifecycle and the cost and value of data. That model allows organisations to assess the cost and value of the data they generate and hold, creating opportunities for better management of that data. In addition, participants in the information governance industry (consumers and suppliers) can seize other opportunities to bring about structural changes that increase data interoperability between systems. This will bring many benefits, such as:
- Increase information reuse
- Increase information value
- Increase flow of data
- Reduce cost
- Maintain higher information fidelity
- Improve market transparency & efficiency
- Increase competition among suppliers
Best of all, this is an area with real economies of scale: because of the vast volumes of data organisations hold, even a small change in cost can bring significant financial rewards.
That all sounds wonderful; who would not want the items outlined above? How can we achieve it? What will be required is a community effort: an effort to define and maintain a common data object standard, which I call the Universal Data Object (UDO), into which other data objects can be converted and from which they can be extracted, together with an open source library of code to perform the required conversions or transformations with complete fidelity. The UDO would also allow the original data object to be optionally attached and the data object to be encrypted, if required.
Some of you reading this might think “this is a very big effort, it cannot be achieved, converting every data object type ….”. It is a valid concern. This is a very big vision: a universal data object type that can represent any data object. On the face of it, it’s almost impossible to achieve. However, as with many things in life, there is an 80:20 rule, and I suspect that when it comes to data, that rule is even more pronounced: 90% or 95% of data is probably stored in 10% or 20% of data formats. In my work with unstructured data, there are fewer than 150 data object formats that represent virtually everything in terms of communications. In fact, I know of many products in the archiving space which meet their customers’ needs with the ability to handle fewer than 90 data object types.
This will need to be tackled in stages. I am about to embark on this research, firstly to formalise the data lifecycle model and then the data life cost of ownership model; that’s the easy part. Then we will build the first version of the UDO, initially servicing only a small number, say three, of source object types, and build the corresponding code to support these.
How will it work? A blog is not the right place to delve into this in great detail; that will be covered in a future research paper. But let’s examine some of the key stages.
At the point of capture, messages are transported from the source system to the control application. This can be done as a three-step process which can be configured as required. The steps are:
- Detect: Detect the source type. This step is optional: if the connection is guaranteed to deliver a known data object type, then detection of the data object type is redundant.
- Transform: Transform the data from its native data type into a UDO using the open source tools.
- Transport: Transport the object from the source location into the control location. This could optionally be at the very start of the process.
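The three steps above can be sketched as a small configurable pipeline. This is only an illustration under stated assumptions: the `Envelope` class, the step functions and the toy detection heuristic are all hypothetical names invented for this example, and the real transformation would be performed by the open source UDO library.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class Envelope:
    """A data object moving through capture (hypothetical structure)."""
    payload: bytes
    source_type: Optional[str] = None  # None until detected or preconfigured
    location: str = "source"
    is_udo: bool = False
    history: List[str] = field(default_factory=list)


def detect(obj: Envelope) -> Envelope:
    # Toy heuristic standing in for real format detection.
    obj.source_type = "email" if obj.payload.startswith(b"From:") else "unknown"
    obj.history.append("detect")
    return obj


def transform(obj: Envelope) -> Envelope:
    # Placeholder for the native-to-UDO conversion; it needs a known type.
    if obj.source_type is None:
        raise ValueError("source type must be detected or configured first")
    obj.is_udo = True
    obj.history.append("transform")
    return obj


def transport(obj: Envelope) -> Envelope:
    # Move the object from the source location to the control location.
    obj.location = "control"
    obj.history.append("transport")
    return obj


STEPS: Dict[str, Callable[[Envelope], Envelope]] = {
    "detect": detect,
    "transform": transform,
    "transport": transport,
}


def capture(obj: Envelope,
            order: Sequence[str] = ("detect", "transform", "transport")) -> Envelope:
    """Run the configured steps in the given order."""
    for name in order:
        obj = STEPS[name](obj)
    return obj


# Default flow: detect, transform at the source, then transport.
default_flow = capture(Envelope(b"From: alice@example.com"))

# Known source type, transport first: detection is skipped entirely.
transport_first = capture(Envelope(b"%PDF-1.7", source_type="pdf"),
                          order=("transport", "transform"))
```

Passing the step order as configuration reflects the two options described in the list: detection can be dropped when the source type is guaranteed, and transport can run first so that all processing happens at the control location.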
This means that a data object can be captured, transformed from the native data object type into the UDO type and transported to the control location, where it can be directed, enriched and encrypted. As you can imagine, this flow can be reconfigured to have the transport happen first and then have all the processing happen at one location. The object then gets stored and indexed as per the normal life cycle. After this initial investment in transforming the data object just once, the organisation is in a position to take great advantage of that investment. Firstly, any application that is configured to comply with the UDO standard is able to consume the data, avoiding the time and cost of further ETL (extract, transform & load) activities. Such applications also have the potential to avoid duplication of storage and data transport costs. The result is that data is easier to reuse and thus drives greater value. Additionally, because the data is transformed only once, fidelity is maintained.
How is the UDO data object formatted? The UDO is a hierarchical data object, and an initial illustration is presented below:
[Illustration: initial UDO structure]
As you can see, the structure is hierarchical and each layer holds data that is common to any following layer in the data structure.
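One way to picture this layering in code is a tree in which each layer stores only its own attributes and inherits everything held by the layers above it. The sketch below is an assumption made for illustration, not the proposed UDO schema: the `Layer` class, the layer names and the attribute names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(eq=False)
class Layer:
    """One level of a hypothetical hierarchical UDO."""
    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)
    children: List["Layer"] = field(default_factory=list)
    parent: Optional["Layer"] = field(default=None, repr=False)

    def add(self, child: "Layer") -> "Layer":
        """Attach a child layer and return it so calls can be chained."""
        child.parent = self
        self.children.append(child)
        return child

    def effective_attributes(self) -> Dict[str, Any]:
        """Merge attributes from the root down; lower layers override."""
        inherited = self.parent.effective_attributes() if self.parent else {}
        return {**inherited, **self.attributes}


# Hypothetical three-layer object: envelope -> message -> attachment.
udo = Layer("udo", {"udo_version": "0.1"})
message = udo.add(Layer("message", {"source_type": "email"}))
attachment = message.add(Layer("attachment", {"filename": "report.pdf"}))

print(attachment.effective_attributes())
```

Storing shared data once at the higher layer means every following layer sees it without holding its own copy.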
Call to action
To achieve this, we need to define a Universal Data Object (UDO) format or standard. To make this a reality, we need an industry-wide consortium to be the standards body which will:
- Maintain the data object standard
- Maintain a library of performant code to perform basic operations on the data objects, such as:
  - Converting a native data object into the UDO format
  - Activating encryption
  - Rendering the data object; and
  - Extracting or producing a native data object from the UDO on demand.
- Certify formats and versions of data objects as compatible with the standard
- Certify systems as compatible with the standard
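To make the conversion and extraction operations in the list above more concrete, here is a minimal sketch of a round trip with a fidelity check. Everything in it is an assumption made for illustration: the function names `to_udo` and `extract_native`, the dictionary layout and the use of a SHA-256 digest are hypothetical and not part of any defined standard. The "conversion" is a toy text decode standing in for real format handling.

```python
import base64
import hashlib
from typing import Any, Dict


def to_udo(native: bytes, source_type: str,
           attach_original: bool = True) -> Dict[str, Any]:
    """Wrap a native object in a hypothetical UDO dictionary."""
    udo: Dict[str, Any] = {
        "source_type": source_type,
        # Digest of the native bytes, used later to verify fidelity.
        "sha256": hashlib.sha256(native).hexdigest(),
        # Toy conversion: undecodable bytes are replaced, so it can be lossy.
        "content": native.decode("utf-8", errors="replace"),
    }
    if attach_original:
        # Optionally carry the untouched original alongside the converted form.
        udo["original"] = base64.b64encode(native).decode("ascii")
    return udo


def extract_native(udo: Dict[str, Any]) -> bytes:
    """Produce the native object on demand and verify it is unchanged."""
    if "original" in udo:
        native = base64.b64decode(udo["original"])
    else:
        native = udo["content"].encode("utf-8")
    if hashlib.sha256(native).hexdigest() != udo["sha256"]:
        raise ValueError("fidelity check failed: extracted object differs")
    return native


source = b"From: alice@example.com\r\n\r\nHello"
assert extract_native(to_udo(source, "email")) == source
```

The digest comparison is one possible way a certification test could confirm that a converter preserves fidelity; a lossy conversion without the original attached fails the check instead of silently returning altered data.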
Starting in September, I am going to commence research into this topic to lay the foundations for the creation of a UDO and the necessary standards consortium. As part of this, I will be working on a paper that will outline all of this in more detail from the data lifecycle model through to an architecture to process and manage data through its entire lifecycle.
To get an early release of this paper, simply click here, and if you like this work, please show your support by commenting, clicking “like” and forwarding to others who might be interested. Thanks!