Levels of AI Industrialization

Introduction

There is a lot of hype surrounding AI these days. Early advances in image recognition and text classification are being superseded by even more exciting breakthroughs in more specialized and business-oriented areas (see DeepMind’s AlphaFold tackling the protein folding problem).

Despite that, the economic impact of AI remains small. Let’s discuss why this is happening and how business organizations can realize the tremendous value promised by AI.

Naive idea of AI

For someone entering the field, Data Science might look simple:

  • The business user formulates a problem, compiles a dataset, and passes it to a data scientist. 
  • The data scientist explores the data, conducts experiments with various machine learning architectures, trains and selects the best-performing model, and hands it over to a data engineer.
  • The data engineer deploys the model for inference.
  • The happy business user channels new data to the model and receives high-quality business insights.
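
Taken at face value, this naive view reduces the whole effort to a handful of lines of training code. Here is a minimal sketch of what such a “finished” deliverable often looks like (the dataset file, its columns, and the model choice are illustrative assumptions):

```python
# The naive view in code: train a model, save it, declare the project done.
# "customer_churn.csv" and its "churned" column are hypothetical.
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("customer_churn.csv")           # dataset compiled by the business user
X, y = df.drop(columns=["churned"]), df["churned"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))

joblib.dump(model, "model.joblib")               # "hand it over to the data engineer"
# Monitoring, drift analysis, feedback loops, retraining and governance
# are all missing from this picture, which is exactly the point.
```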

What’s wrong with this description?

The Google Research article “Hidden Technical Debt in Machine Learning Systems” answers this question with a diagram:

Copyright Google; quoted from “Hidden technical debt in machine learning systems” (Sculley et al., 2015)

ML code is just a small fraction of a real-world AI-based system. To operationalize ML code and make it usable in production, all the surrounding blocks are equally critical.

Even this picture is not yet complete: we can add integration into end-to-end business applications, including data and concept drift analysis, feedback loops, etc.

Production-grade AI

Here is another example, this time for a real-world production system developed by Tesla for autonomous driving AI:

Copyright Andrej Karpathy, Tesla

Every Tesla car has an AI module deployed locally and running all the time. Even when Autopilot is off, the AI module keeps working in shadow mode, predicting what the driver is going to do next.

If these predictions are consistent with the driver’s actual behavior, no action is needed. If the driver does something unexpected that contradicts the predictions of the onboard AI, the car transmits a dataset describing the situation to the Data Engine.

The Data Engine asks other Tesla cars to search for similar situations in their historical records and transmit the related datasets, “boosting” the amount of training data. All these datasets go through labeling and are then used to train a new version of the AI models, which is then deployed to all Tesla cars.
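
The core of this shadow-mode loop can be sketched in a few lines. Everything below (the Snapshot class, the function names, the send_to_data_engine callback) is an illustrative assumption, not Tesla’s actual code:

```python
# Illustrative sketch of shadow-mode disagreement detection.
# All names here are assumptions made for the example.
from dataclasses import dataclass

@dataclass
class Snapshot:
    sensor_data: dict       # camera frames, telemetry, etc.
    predicted_action: str   # what the onboard model expected the driver to do
    actual_action: str      # what the driver actually did

def shadow_mode_step(model, sensor_data, actual_action, send_to_data_engine):
    """Compare the onboard model's prediction with the driver's behavior."""
    predicted_action = model.predict(sensor_data)
    if predicted_action != actual_action:
        # Disagreement: ship this situation back for labeling and retraining.
        send_to_data_engine(Snapshot(sensor_data, predicted_action, actual_action))
    # If prediction and behavior agree, nothing needs to be done.
```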

As you can see, rather than focusing on model training, Tesla engineers focus on a continuous pipeline that brings together machine learning, feedback from the field, labeling, retraining, etc. You can imagine the challenges of running this pipeline across the whole fleet of Tesla cars!

Industrialized AI

Moving from a small data science project to an enterprise-scale implementation is often associated with the magic word “industrialization”. What does it mean, exactly?

From hand-crafted production to factories

In the past, multiple industries transitioned from hand-crafted production by small groups of artisans to huge factories producing highly repeatable products in large quantities.

Until recently, the IT industry was expected to undergo the same change, gaining maturity and reducing the number of IT projects that fail. It did not happen. Software factories built around V-model assembly lines, with separate architecture, design, development, and testing teams and various degrees of offshoring, turned out to be heavy, rigid, and out of sync with the real consumers of the product.
This approach worked for high-complexity tasks with well-defined and stable requirements, such as getting people to the Moon, but the majority of ordinary projects continued to suffer.

The rise of agile methodologies

The last ten years have witnessed a rise of agile methodologies that goes against the traditional view of industrialization. What we see is a comeback of pre-industrialization practices, but at a new level.

The agile movement empowers small artisanal teams with high-performance DevOps infrastructure and tooling, giving them end-to-end visibility of the production chain and enabling iterative development with frequent validation by end users.

The industrialization of AI follows the same path. Rather than introducing the data scientist as another assembly-line worker, busy training models with little visibility into what happens upstream or downstream, modern AI industrialization injects machine learning expertise into agile software development teams, complements DevOps with MLOps, and integrates AI initiatives into an overall digital transformation program managed in an agile and decentralized way.

AI Industrialization Journey

Let’s look at the stages of the AI industrialization journey, as organizations move from one maturity level to another.

Level 1: PoC (Proof of Concept) trap

When an organization discovers AI, it often has the naive idea mentioned at the beginning of this article. There is a popular belief that data is the new oil, so it’s enough to collect a large amount of data, hire a bright data scientist, and enjoy the benefits of AI.

In reality, organizations end up with a large number of fragmented experiments that never make their way into production systems. Even when some of these experiments reach production, the results are inconclusive: end users look at this kind of AI with suspicion, unsure how far the insights can be trusted and whether AI helps them or rather creates more work and confusion through bias and false positives. In the end, AI stays there as a curiosity and the business process goes on as before.

The problem is twofold:

  • On the one hand, as the Google Research article pointed out, the AI model is a small fraction of what needs to be implemented for a production-grade system. A data science team working in isolation can produce a PoC, but has no chance of getting the remaining 95% of the AI-based solution right.
  • On the other hand, the organization must change its traditional way of doing things, redesigning business processes to put AI at the center and, eventually, in the driving seat. This will not happen unless business users trust the system and trust the governance process ensuring that the whole chain, from data acquisition to model training and deployment, is constantly monitored, controlled, and audited.

Level 2: Use-case-level industrialization

As the organization gets more experience with AI and its machine learning teams gain maturity, local business champions appear. Such people are critical: they can identify the business problems that AI can solve, articulate the benefits, manage expectations, and secure the organization’s buy-in for making AI an integral part of the business process.

Apart from business process reengineering challenges and integration of AI components into the overall IT landscape, the machine learning team faces a number of ML-specific challenges:

  • Concept drift and the need to constantly maintain deployed AI models are addressed by retraining the models or applying continual learning strategies;
  • The scarcity of labels, which are often costly to obtain, of poor quality, and arriving late, is addressed by integrating feedback loops into business solutions (remember the Tesla Data Engine!) and by active learning and few-shot learning strategies;
  • Out-of-distribution cases, not seen during training, require careful monitoring of confidence levels, which can trigger fallback mechanisms and involve human users when confidence is low (see the sketch after this list);
  • Black-box models, which are subject to bias, not trusted by end users, and unable to meet compliance criteria, are addressed by responsible AI techniques that improve the explainability and fairness of AI models.
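
As an illustration of the confidence-based fallback mentioned in the list above, here is a minimal sketch; the 0.8 threshold and the route_to_human() callback are assumptions made for the example:

```python
# Minimal confidence-threshold fallback. The threshold value and the
# route_to_human() callback are illustrative assumptions.
import numpy as np

CONFIDENCE_THRESHOLD = 0.8

def predict_with_fallback(model, x, route_to_human):
    """Return the model's answer, or escalate to a human reviewer."""
    probabilities = model.predict_proba([x])[0]
    confidence = float(np.max(probabilities))
    if confidence < CONFIDENCE_THRESHOLD:
        # Possibly an out-of-distribution input: let a human decide,
        # and keep the example for labeling and retraining.
        return route_to_human(x, probabilities)
    return int(np.argmax(probabilities))
```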

Industrialization here means putting in place a full-scale MLOps infrastructure to support the whole life cycle of AI models. It ensures consistent quality of AI-driven business insights by controlling the quality of the data used for training, testing, and validation of AI models, monitoring input data at run time, identifying any drift, and managing the continuous retraining of AI models.
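
Input drift monitoring, for example, often boils down to comparing live feature distributions against those seen at training time. A minimal sketch using a two-sample Kolmogorov–Smirnov test follows; the significance level and the trigger_retraining() hook are assumptions:

```python
# Minimal input-drift check: compare live feature values against the
# training-time distribution with a two-sample KS test. The 0.05
# significance level and the trigger_retraining() hook are assumptions.
from scipy.stats import ks_2samp

def feature_has_drifted(training_values, live_values, alpha=0.05):
    statistic, p_value = ks_2samp(training_values, live_values)
    return p_value < alpha        # True means the distributions likely differ

def monitor_inputs(training_data, live_data, trigger_retraining):
    """training_data / live_data: dicts mapping feature name -> list of values."""
    drifted = [name for name in training_data
               if feature_has_drifted(training_data[name], live_data[name])]
    if drifted:
        trigger_retraining(drifted)   # e.g. kick off the retraining pipeline
```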

This infrastructure also integrates AI components into the business application development life cycle, making sure that both sides evolve in sync and that any discrepancy is identified before it causes an incident in production.

Another dimension of industrialization is the emergence of standards and governance procedures. Standards increase reusability, facilitate exchanges between different agile teams, and make the whole development process faster and more efficient. Governance brings clear ownership and responsibility, which ultimately creates trust and motivates end users to rely on AI in their day-to-day jobs.

Level 3: Enterprise-level industrialization across multiple use cases

At the next maturity level, organizations start focusing on the integration of various AI-related initiatives. The main driver of integration is the fact that the same data might be used across different use cases, so it makes sense to coordinate downstream tasks and align the roadmaps of AI-based applications that depend on the same data.

These initiatives also become part of the broader digital transformation strategy, with business KPIs used to evaluate success and return on investment.

This stage brings new ML-specific challenges:

  • Managing a large number of use-case-specific models that reuse features and learned representations produced by upstream models;
  • Controlling data quality and trustworthiness across multiple data sources with multiple owners, and performing selective rollback and retraining when an upstream control fails.

This is where a holistic ML platform becomes critical. In addition to MLOps capabilities, such platforms bring model repositories and feature stores. The platform allows versioning of data, features, and model artifacts, and tracks data lineage. Deployed on scalable managed infrastructure that spans from edge layers to the cloud, such an ML platform becomes the backbone of industrialized AI capabilities.
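
To make this concrete, here is a deliberately minimal, hypothetical sketch of versioning and lineage tracking; real feature stores and model registries offer far richer APIs than this toy Registry class:

```python
# Hypothetical, minimal registry to illustrate versioning and lineage;
# every name here is an assumption made for the example.
import hashlib
import json
import time

class Registry:
    def __init__(self):
        self.entries = []

    def register(self, kind, name, metadata, parents=()):
        """Record a versioned artifact (dataset, feature set, or model)."""
        version = hashlib.sha256(
            json.dumps(metadata, sort_keys=True).encode()).hexdigest()[:12]
        self.entries.append({"kind": kind, "name": name, "version": version,
                             "parents": list(parents), "timestamp": time.time()})
        return version

registry = Registry()
data_v = registry.register("dataset", "transactions", {"rows": 1_000_000})
feat_v = registry.register("features", "customer_features", {"source": data_v}, parents=[data_v])
model_v = registry.register("model", "churn_classifier", {"features": feat_v}, parents=[feat_v])
# The lineage model_v -> feat_v -> data_v lets a failed upstream control
# trigger selective rollback and retraining of only the affected models.
```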

Conclusion

It took IT half a century to reach maturity and become a main driver of productivity growth in the world economy. We are far from that point with AI, but the time may come sooner than we expect. The path to maturity and industrialization of AI follows the same pattern as its ancestor, traditional software development. The path will be even shorter if AI benefits from the latest industrialization patterns that emerged for software development: agile methodologies and DevOps. There is no need to look two centuries back for industrialization ideas. AI industrialization is way more fun than you might think!