Digital Transformation & Operations

Why data transformation is no longer the only imperative for AI’s longevity and adoption

Mark Dangelo, Co-Founder of DMink

· 7 minute read

No longer can data auditability be a one-and-done periodic strategy that provides a point-in-time snapshot; rather, it must be designed for automation and continuous iteration that’s now needed for all forms of AI fuel

Artificial intelligence (AI), in all its forms, has collectively become a set of innovative technology hammers in search of efficiency nails.

As promoted, AI is the answer to every change in business models: it will fix staffing and skill-set deficiencies, and it will boost and scale profitability. AI has also ushered in new sets of delivery demands, deployment methods, data operations, and rising expectations, in addition to changing the roles and responsibilities needed to define, establish, manage, and retire its rapid-cycle technologies. AI, from cradle to grave, is redefining the lifecycles of deployment.

Nevertheless, AI at its foundations contains complex, data-driven solution sets that require data fuel that is relevant, event-driven, secure, and, critically, free from defects. In simple terms, if the data fuel is dirty, irrelevant, or biased, the AI outputs feeding other AI inputs will create results that may initially excite project sponsors, yet harbor hidden risks, compliance failures, and legal liabilities. In the end, if the fuel that feeds AI solutions is tainted, the AI itself will yield misinformation that cascades across the enterprise, impacting customers and attracting unwanted regulatory attention.

Implicit, and often hidden within the AI pilots and divergent vendor promises, is the assumption that the data to be used is generally stable, defect-free, and readily available. As AI solutions increasingly become interconnected and trained on smaller, internal data sources, the margin of error often rises. AI discussions are concentrating on the intelligence layers within the AI solution sets, but the hidden risk starts and ends with the unglamorous data itself.

Digital transformation was just the beginning

Designing clean data ingestion solutions and elements from the original systems-of-record is neither simple nor as glamorous as designing new AI intelligence algorithms. For corporate leaders seeking to embrace AI's innovative relevance, the idea that additional time, money, and skills need to be allocated to data after 15 years of digital transformations appears counterintuitive.

By discounting emerging data usage across what are familiar data transformation categories — that is, structured and unstructured — leaders and their IT operations are not recognizing the vast inputs used by AI solutions. Data auditability beyond structured and unstructured includes:

      • semi-structured;
      • metadata;
      • dark data;
      • archived and legacy sources;
      • event and streaming data;
      • transactional and reference sources;
      • third-party, ethically sourced data (beyond data broker inputs); and
      • derived and AI-created (or cascading) data.

Given the constant change in the data, organizations rapidly adopting AI solutions need to de-risk and streamline training and inputs across vast domain-specific sources. Organizations need to objectively assess and continuously audit the data that is fueling their AI solutions, while simultaneously taking coordinated and comprehensive action to bridge the gaps between current and future data requirements. Where many organizations begin with the features, functionality, and form of their desired AI end-state, the project baselines need to start instead with system-of-record data origins and existing segmentations contained across departmental and divisional functional applications.

These data compartments represent pain points that are spread across traditional domain systems and encased within legacy infrastructures. As AI progresses, data-driven gains will create new pressures, both demanded by and generated from AI inputs and outputs. These pressures sit at opposite ends of the data spectrum and will drive emerging, advanced requirements for data audits.

[Figure 1: data transformation]

What Figure 1 puts into perspective are the critical drawbacks of non-cohesive AI implementations. If leaders embrace the operating and business model solutions that spearheaded the digital transformation efforts of the last decade, it is probable that fragmentation efforts — due to data quality and proliferation — will drive up costs, such as personnel, power, infrastructure, cloud, and more, while also raising the likelihood of inconsistencies and risks at alarming rates.

Integration of autonomous software layers

To address the emerging regulatory, audit, and legal requirements that are sandwiched between the pains and gains of data, enterprises will need to create and deliver against a framework of automated solutions. Each of these building block solutions, built upon a foundation of domain and cross-market data standards, requires tightly coupled stacks of solutions to provide multidimensional interoperability that will be necessary to meet the rapidly changing AI technologies that they power.

No longer can data auditability be a one-and-done periodic strategy that provides a point-in-time snapshot — it must be designed for automation and for the continuous iteration that's needed for all forms of AI fuel. To understand the complexity and intricacy of the data fuel for AI, a deeper dive is required, one demanded by regulators, customers, and the very AI technology itself.
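As an illustration of the shift from periodic snapshots to continuous iteration, the sketch below (hypothetical rules and field names; an assumption, not the author's method) audits every incoming data batch at delivery time, producing a timestamped report per batch instead of a once-a-year snapshot:

```python
from datetime import datetime, timezone

# Hypothetical rule set; real checks would be domain-specific.
RULES = {
    "no_missing_id": lambda rec: rec.get("id") is not None,
    "amount_non_negative": lambda rec: rec.get("amount", 0) >= 0,
    "source_declared": lambda rec: bool(rec.get("source")),
}

def audit_batch(records, rules=RULES):
    """Audit one incoming batch; intended to run on every delivery
    (continuous), not as a periodic point-in-time exercise."""
    failures = []
    for i, rec in enumerate(records):
        for name, check in rules.items():
            if not check(rec):
                failures.append({"record": i, "rule": name})
    total_checks = max(len(records) * len(rules), 1)
    return {
        "audited_at": datetime.now(timezone.utc).isoformat(),
        "records": len(records),
        "failures": failures,
        "pass_rate": 1 - len(failures) / total_checks,
    }

batch = [
    {"id": 1, "amount": 100, "source": "crm"},
    {"id": None, "amount": -5, "source": ""},
]
report = audit_batch(batch)
print(report["pass_rate"], len(report["failures"]))
```

Because the report is generated per batch, audit evidence accumulates continuously alongside the data itself, which is the property regulators and downstream AI consumers would draw on.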

[Figure 2: data transformation]

Moreover, Figure 2 represents the granular segmentation necessary to create not just the rigor of data auditability, but also the adaptability of event-driven data and architectures. Comparing this with prior, pre-AI efforts at data auditability, when corporate leaders relied on a host of specialists and solution containers to define, assess, and implement specialized solutions, illustrates how much more complex the situation has become.

Shifting of roles and responsibilities

These solution outcomes, while progressive and managed, led to data auditability deserts between one area and another. Moreover, traditional point-developed solutions that could address a unique data component — such as lifecycle, analytics, or governance — typically are under the control of discrete C-level leaders, each with their own goals, outcomes, and staff. An illustration of these is shown in Figure 3 (below) with ultimate alignment under the control of distinct C-level executives.

[Figure 3: data transformation]

To guarantee that AI is defined, managed, measured, and optimized, the evolving organizational roles and responsibilities must surround the shared auditability elements shown at the core of Figure 3, including strategy, cost-benefit analysis, change management, and ethical sourcing. By taking a holistic approach to data auditability, the risks of regulatory noncompliance, legal exposure, and adverse software actions are minimized, allowing tightly coupled solution sets that reduce the need for outside oversight and governance.

The mainstreaming of AI in 2022 promised step-change improvements that, in theory, would eliminate the need for and expense of experienced personnel. Yet, as we now understand, AI is not flawless: it struggles with scalability and transparency, and it has yet to robustly address the cascading impacts of upstream and downstream data interconnectivity.

Enterprises can glean several call-to-actions from the illustrations presented, but in summary these actions are grouped around:

      • identifying cross-domain problem solutions that span roles, software, and data usage;
      • atomizing the value of each data management solution as part of an active stack of capabilities, each interconnected to the others;
      • focusing on re-use and automation to adapt (rather than adopt) to vast data sources and uses;
      • democratizing data across a federated cradle-to-grave data strategy and architecture;
      • developing deep data-as-a-product skills and designs that can ease the burden of data oversight and governance; and
      • accepting that AI is the next-gen usage for software development, and that it begins with shared data resources, approaches, and rapid-cycling.

In the end, whether AI fails or succeeds, the data that fuels it will be the opaque cause-and-effect linchpin that triggers unintended consequences or delivers value-added capabilities. Without proactive formulation and oversight that utilizes the components highlighted, AI will fail to deliver against rising expectations.


You can find more analysis of AI and software infrastructure from this author here.
