Before I start our journey to explore the universe of Big Data, I will apologise in advance for any Star Trek references.
The true potential of Big Data solutions came to the forefront with the advent of the relational database model. The initial foray into this domain took the form of reference data databases, or static data databases as they were called at the time. The aim was to create a centralised golden copy of reference data that could be shared throughout the lifecycle of a transaction.
The next step in the evolution was to supplement reference data with transactional information, which led to the development of the first Big Data solutions. These were typically based on inflexible, predefined schemas, requiring a great deal of thought about the design at the project inception phase and a thorough data lineage analysis.
With the next generation, we saw corporations explore and implement NoSQL-based repositories. Whilst this removed the risk of loss of traceability, an inherent weakness in the design of first-generation repositories, there was still a need to construct an effective identification and versioning framework. Such a framework should comprise metadata that includes terms to uniquely identify the item, its version and a persisted timestamp associated with the item. This provides the ability to support bi-temporality and emulate a single version of the truth.
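As an illustration of the identification and versioning metadata described above, here is a minimal sketch in Python. It is not taken from any particular product: the names VersionedItem, Repository, valid_from and persisted_at are hypothetical, and the rule for choosing between versions is a deliberately simplified assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class VersionedItem:
    item_id: str            # uniquely identifies the item
    version: int            # increments each time the item is saved
    valid_from: datetime    # when the values became true for the business
    persisted_at: datetime  # when this version was written to the repository
    payload: Dict[str, Any]


class Repository:
    """Append-only store: new versions are added, old ones are never overwritten."""

    def __init__(self) -> None:
        self._items: List[VersionedItem] = []

    def save(self, item_id: str, payload: Dict[str, Any],
             valid_from: datetime, persisted_at: datetime) -> VersionedItem:
        # The next version number is one more than the versions already held.
        version = 1 + sum(1 for i in self._items if i.item_id == item_id)
        item = VersionedItem(item_id, version, valid_from, persisted_at, payload)
        self._items.append(item)
        return item

    def as_of(self, item_id: str, valid_at: datetime,
              known_at: datetime) -> Optional[VersionedItem]:
        """Return the item as it applied at valid_at, using only the
        versions that had been persisted by known_at."""
        candidates = [
            i for i in self._items
            if i.item_id == item_id
            and i.valid_from <= valid_at
            and i.persisted_at <= known_at
        ]
        # Simplifying assumption: prefer the most recent valid_from,
        # then the highest version (the latest correction).
        return max(candidates, key=lambda i: (i.valid_from, i.version), default=None)


def day(d: int) -> datetime:
    """Helper for the example: a UTC date in January 2020."""
    return datetime(2020, 1, d, tzinfo=timezone.utc)


repo = Repository()
repo.save("item-1", {"rating": "A"}, valid_from=day(1), persisted_at=day(2))
repo.save("item-1", {"rating": "B"}, valid_from=day(1), persisted_at=day(10))  # a later correction

print(repo.as_of("item-1", valid_at=day(5), known_at=day(5)))   # what we knew on day 5
print(repo.as_of("item-1", valid_at=day(5), known_at=day(20)))  # what we know after the correction
```

Keeping both timestamps is what makes the store bi-temporal: the same question about day 5 can be answered with the knowledge held at any later point, and because nothing is overwritten, the latest version still serves as the single version of the truth.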
Now that there is a better understanding of how technology can be used to simplify processes and remove complexity, the business has the opportunity to take control of Data Lake and Data Warehouse developments.
Is it possible to construct a Data Management solution within weeks?
Having had the privilege of being on this evolutionary journey of repository deployment, I can say that it does not need to be a “five-year mission” to explore new technologies: it is achievable with the assistance of appropriate self-service preparation tooling. Data Lake and Data Warehouse construction need not have lengthy project delivery milestones, even for use cases with complex data sourced externally or internally from diverse business silos.
So, what should a Big Data self-service preparation framework include?
A user interface through which the business can monitor progress and maintain and facilitate onboarding
The ability to onboard data from multiple business silos, whether structured or unstructured, and to ingest from pre-existing Data Lakes or other repositories
Data discovery capabilities that identify the data formats, minimum and maximum lengths and values, and datetime ranges of the data sets to be ingested
Workflow templates to facilitate data validation, cleansing and duplicate identification, as well as support for bi-temporal capabilities, thus ensuring that high-quality data is made available for use by the business
Scheduling capabilities allowing for data ingestion and analytics to be undertaken when the content is made available
The ability to support the four governance pillars of Data Lake construction
Support for full traceability and proxying of upstream systems, thus enabling Data Warehouse capabilities
The deployment of technologies that lend themselves to integration with artificial intelligence and machine learning solutions
Built-in modelling capabilities that facilitate data linkage, lineage and semantic understanding
Support for complex or unpredictable queries, as users are not constrained by the design of any underlying database schema
Enhanced query response times on large datasets
No third-party licence fees, as the solution is based on open source technologies
Cloud agnostic: Comes with production-ready installation images for Oracle, Google, Microsoft Azure and AWS
No Big Data or Data Warehouse expertise required
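To make the data discovery item in the list above more concrete, here is a minimal sketch of column profiling in Python. It assumes the incoming data arrives as rows of strings and that dates use the YYYY-MM-DD format. The function names profile_column and profile_rows, and the sample field names, are hypothetical; this is an illustration, not the way any particular platform implements discovery.

```python
from datetime import datetime
from typing import Any, Callable, Dict, List

DATE_FORMAT = "%Y-%m-%d"  # assumed date format for this sketch


def to_datetime(value: str) -> datetime:
    return datetime.strptime(value, DATE_FORMAT)


def parses(value: str, parser: Callable[[str], Any]) -> bool:
    """True if the parser accepts the value without raising ValueError."""
    try:
        parser(value)
        return True
    except ValueError:
        return False


def profile_column(values: List[str]) -> Dict[str, Any]:
    """Report the format, length range and value range of one column."""
    present = [v.strip() for v in values if v and v.strip()]  # ignore blanks
    if not present:
        return {"format": "empty", "count": 0}

    # Try the most specific format first and fall back to plain text.
    if all(parses(v, int) for v in present):
        fmt, typed = "integer", [int(v) for v in present]
    elif all(parses(v, float) for v in present):
        fmt, typed = "decimal", [float(v) for v in present]
    elif all(parses(v, to_datetime) for v in present):
        fmt, typed = "datetime", [to_datetime(v) for v in present]
    else:
        fmt, typed = "text", present

    return {
        "format": fmt,
        "count": len(present),
        "min_length": min(len(v) for v in present),
        "max_length": max(len(v) for v in present),
        "min_value": min(typed),
        "max_value": max(typed),
    }


def profile_rows(rows: List[Dict[str, str]]) -> Dict[str, Dict[str, Any]]:
    """Profile every column found in a list of row dictionaries."""
    columns = sorted({key for row in rows for key in row})
    return {c: profile_column([row.get(c, "") for row in rows]) for c in columns}


sample = [
    {"trade_id": "7", "counterparty": "Alpha Ltd", "trade_date": "2019-03-01"},
    {"trade_id": "42", "counterparty": "Beta plc", "trade_date": "2020-11-15"},
]
for column, profile in profile_rows(sample).items():
    print(column, profile)
```

A profile of this kind gives the business an early view of what is about to be ingested, and the same figures can then feed the validation and cleansing templates mentioned in the list.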
To summarise, an appropriate self-service preparation platform eliminates the need for a development team to construct and maintain a Big Data solution. This ensures that the business retains control, as the solution is assembled, maintained and managed by Business Analysts, Data Architects or equivalent staff. With a collaborative supplier and an intuitive, adaptable and responsive solution, implementation times can be significantly reduced and ROI optimised. Not wanting to reduce your time to market and development costs is “highly illogical”.
Timely and more responsive data management solution
Any self-service preparation framework worth its weight in gold should deliver tangible results within weeks of project initiation. The Finworks Data Platform has achieved this on several occasions and in multiple business sectors, including Finance, Health and Transport.
Martin Sexton is a Senior Business Analyst at Finworks
For further information
We will be presenting at the Big Data Analytics 2020 conference