The current reversion to software code-centricity is almost a paradox. If we had told the programmer-developers who engineered us out of the IBM-PC era and into the early iterations of Windows that code would drive everything, they might have offered some quizzical glances.
“Of course, code will drive everything; that’s why we’re building applications, establishing database procedures, and looking to the future when artificial intelligence (AI) finally graduates out of the movies,” said any given software developer in the 1980s and probably most of them in the 1990s, too.
Moving Forward with Everything as Code
But just as software application development syntax changes and evolves over the decades, our notion of what we mean by built-as-code has also moved on. This shift is coming to the forefront largely because we now reside in the virtualization-first world of cloud, with all the abstracted layers that make up our new compute solutions.
When we talk about code constructs today, we’re not just talking about applications or their core components.
Today, when we say that something is delivered as code, we could be talking about infrastructure as code (IaC), testing as code (TaC), or some more general layer of networking as code (NaC) that we used to entrust to the hubs, switches, and routers of the pre-cloud era.
Typically written as XaC, everything as code embodies everything from infrastructure to platforms to applications and every deeper “service” element of the stack, such as compliance, security, and daily operations.
Data at the Core
As we work to finesse the features of the new IT stack with everything as code, we will need to know our way around this new era of technology solutions.
So where should we start? With data, obviously.
Data is of course the core element with which we will build all tiers of cloud. Logically then, cloud data observability should form a core discipline in any testing-as-code capability if we are to navigate the everything-as-code universe.
Aiming to resonate with that technology proposition and work in precisely this space is Soda, a provider of open source data reliability tools and cloud data observability platform technology based in Brussels, Belgium.
Late last year Soda released its Cloud Metrics Store to provide testing-as-code capabilities for data teams to get ahead of data issues in a more sophisticated way. This technology captures historical information about the health of data to support the intelligent testing of data across every workload.
As the modern cloud-native network stack now evolves and the use of everything as code starts to become a de facto approach, we start to think of the so-called “data value chain” as a measure of the worth of our wider IT system.
“It’s advantageous for data teams to unify around a common language that allows them to specify what good data looks like across the data value chain from ingestion to consumption, irrespective of roles, skills, or subject matter expertise,” said Maarten Masschelein, CEO at Soda. “Most data teams today are organized by domain, so when creating data products, they often depend on each other to provide timely, accurate, and complete data.”
A Common Language for Data
A common language for data might see coding and data solutions become available for use by anyone; there is little or no hierarchy between users in the democratized everything-as-code future.
Without a clear strategy to monitor data for quality issues, many organizations fail to catch the problems that can leave their systems exposed and can result in serious downstream issues. Masschelein and team say they are working to give data teams the tools to create a culture and community of good data practice through a combination of the Soda Cloud Data Observability Platform and its open source data reliability tools built by and for data engineers.
He says that his firm’s latest release compels data teams to be explicit about what good data looks like, enabling agreements to be made between domain teams that can be easily tracked and monitored, giving data product teams the freedom to work on the next big thing.
“With this latest release, Cloud Metrics Store gives data and analytics engineers the ability to test and validate the health of data based on previous values,” said Masschelein. “These historical metrics allow data tests to use a baseline understanding of what good data looks like, with any bad data efficiently quarantined for inspection before it impacts data products or downstream consumers.”
Alerts are sent via popular on-call tools or Slack, so data teams are the first to know when data issues arise and can swiftly resolve the problem.
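To make the idea of testing against historical values concrete, here is a minimal Python sketch. It is not Soda's implementation or API; the function name `split_by_baseline`, the `max_sigma` threshold, and the sample row counts are all hypothetical, chosen only to illustrate how a stored baseline can separate healthy batches from ones that should be quarantined.

```python
from statistics import mean, stdev

def split_by_baseline(history, batches, max_sigma=3.0):
    """Split batches into passed and quarantined using a historical baseline.

    history: past values of one metric (here, daily row counts).
    batches: (name, value) pairs for newly arrived data.
    max_sigma: hypothetical tolerance, in standard deviations.
    """
    baseline = mean(history)
    threshold = max_sigma * stdev(history)
    passed, quarantined = [], []
    for name, value in batches:
        # A batch is suspect when its metric drifts further from the
        # historical mean than the allowed threshold.
        if abs(value - baseline) > threshold:
            quarantined.append(name)
        else:
            passed.append(name)
    return passed, quarantined

# Illustrative values only: five past row counts and two new batches.
history = [1000, 1020, 980, 1010, 990]
batches = [("monday", 1005), ("tuesday", 120)]
passed, quarantined = split_by_baseline(history, batches)
print("passed:", passed)
print("quarantined:", quarantined)
```

In this sketch the quarantined list is where an alert would be raised; a real metrics store would also persist each new value so the baseline keeps up with the data.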
The Era of Data Best Practice
Moving forward, we’re not yet hearing IT vendors talk about data best practice as a defined principle or workflow objective; most are still just doling out the usual best practice messages, some of which will cover data.
But if data best practice does exist in the everything-as-code arena (and arguably it should), then we will need data reliability tools that work across the data product lifecycle. This will mean that it is straightforward for data engineers to test data at ingestion, and that data product managers can validate data before it is consumed in other tools.
This test and validation function is precisely what Soda is bringing to market.
“All checks can be written ‘as-code’ in an easy-to-learn configuration language,” said Masschelein. “Configuration files are version controlled and used to determine which tests to run each time new data arrives into a data platform. Soda supports every data workload, including data infrastructure, science, analysis and streaming workloads, both on-premises and in the cloud.”
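As a rough illustration of what checks written as code can look like, the Python sketch below declares checks as plain data and evaluates them against a batch of rows. This is not Soda's configuration language; the check fields (`name`, `metric`, `column`, `op`, `value`), the metric names, and the sample rows are assumptions made up for this example.

```python
import operator

# Hypothetical declarative checks. In practice these would live in a
# version-controlled configuration file rather than in the script itself.
CHECKS = [
    {"name": "row_count_min", "metric": "row_count", "op": "ge", "value": 1},
    {"name": "no_missing_email", "metric": "missing_count",
     "column": "email", "op": "eq", "value": 0},
]

OPS = {"eq": operator.eq, "ge": operator.ge, "le": operator.le}

def compute_metric(rows, check):
    """Compute the metric named by a check over a list of row dicts."""
    if check["metric"] == "row_count":
        return len(rows)
    if check["metric"] == "missing_count":
        column = check["column"]
        return sum(1 for row in rows if row.get(column) in (None, ""))
    raise ValueError(f"unknown metric: {check['metric']}")

def run_checks(rows, checks):
    """Return a mapping of check name to True (pass) or False (fail)."""
    results = {}
    for check in checks:
        observed = compute_metric(rows, check)
        results[check["name"]] = OPS[check["op"]](observed, check["value"])
    return results

# Illustrative batch: the second row is missing its email address.
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
]
results = run_checks(rows, CHECKS)
print(results)
```

Because the checks are data rather than logic, changes to them can be reviewed and versioned like any other code, which is the property the quote above describes.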
Our world is now cloud-first, code-first, and data-first. With everything as code pushing data solutions forward, it will be important to keep an eye on the future of data innovation.