For AI model success, utilize MLops and get the data right

It’s critical to adopt a data-centric mindset and support it with ML operations

Artificial intelligence (AI) in the lab is one thing; in the real world, it’s another. Many AI models fail to yield reliable results when deployed. Others start well, but then results erode, leaving their owners frustrated. Many businesses do not get the return on AI they expect. Why do AI models fail and what is the remedy?

As companies have experimented with AI models more, there have been some successes, but numerous disappointments. Dimensional Research reports that 96% of AI projects encounter problems with data quality, data labeling and building model confidence.

AI researchers and developers for business often use the traditional academic method of boosting accuracy. That is, hold the model’s data constant while tinkering with model architectures and fine-tuning algorithms. That’s akin to mending the sails when the boat has a leak — it is an improvement, but the wrong one. Why? Good code cannot overcome bad data.

Instead, they should ensure the datasets are suited to the application. Traditional software is powered by code, whereas AI systems are built using both code (models + algorithms) and data. Take facial recognition, for instance, in which AI-driven apps were trained on mostly Caucasian faces, instead of ethnically diverse faces. Not surprisingly, results were less accurate for non-Caucasian users.

Good training data is only the starting point. In the real world, AI applications are often initially accurate, but then deteriorate. When accuracy degrades, many teams respond by tuning the software code. That doesn’t work because the underlying problem was changing real-world conditions. The answer: to increase reliability, improve the data rather than the algorithms.

Because AI failures are usually related to data quality and data drift, practitioners can use a data-centric approach to keep AI applications healthy. Data is like food for AI. In your application, data should be a first-class citizen. Endorsing this idea isn’t sufficient; organizations need an “infrastructure” to keep the right data coming.

MLops: The “how” of data-centric AI

Continuous good data requires ongoing processes and practices known as MLops, for machine learning (ML) operations. The key mission of MLops: make high-quality data available because it’s essential to a data-centric AI approach.

MLops works by tackling the specific challenges of data-centric AI, which are complicated enough to ensure steady employment for data scientists. Here is a sampling:

The wrong amount of data: Noisy data can distort smaller datasets, while larger volumes of data can make labeling difficult. Both issues throw models off. The right size of dataset for your AI model depends on the problem you are addressing.
Outliers in the data: A common shortcoming in data used to train AI applications, outliers can skew results.
Insufficient data range: This can cause an inability to properly handle outliers in the real world.
Data drift: As real-world data shifts away from the data the model was trained on, model accuracy often degrades over time.
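Two of the challenges above, outliers and insufficient data range, can be illustrated with a minimal Python sketch. It is not a prescribed MLops method: the function names, the sample ages and the 1.5 x IQR rule are assumptions chosen for illustration, and the right thresholds depend on the problem you are addressing.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def out_of_training_range(training_values, live_values):
    """Return live values that fall outside the range seen in training."""
    lo, hi = min(training_values), max(training_values)
    return [v for v in live_values if v < lo or v > hi]


# Hypothetical feature: customer age in a training set and in live traffic.
training_ages = [22, 25, 27, 29, 31, 33, 35, 38, 41, 95]
live_ages = [24, 30, 17, 40, 102]

print("Training outliers:", iqr_outliers(training_ages))
print("Live values outside training range:",
      out_of_training_range(training_ages, live_ages))
```

A flagged value is a prompt to investigate, not proof of bad data: the 95-year-old customer may be real, in which case the fix is more data in that range rather than deletion.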

These issues are serious. A Google survey of 53 AI practitioners found that “data cascades—compounding events causing negative, downstream effects from data issues — triggered by conventional AI/ML practices that undervalue data quality… are pervasive (92% prevalence), invisible, delayed, but often avoidable.”

How does MLops work?

Before deploying an AI model, researchers need to plan to maintain its accuracy with new data. Key steps:

Audit and monitor model predictions to continuously ensure that the outcomes are accurate.
Monitor the health of the data powering the model; make sure there are no surges, missing values, duplicates or anomalies in distributions.
Confirm the system complies with privacy and consent regulations.
When the model’s accuracy drops, figure out why.
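The data-health step above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: records arrive as a batch of dictionaries, and `batch_health`, `expected_rows` and `surge_factor` are hypothetical names, not part of any particular MLops tool.

```python
def batch_health(records, required_fields, expected_rows, surge_factor=2.0):
    """Summarize simple health signals for one batch of records (dicts)."""
    # Count missing (None or absent) values per required field.
    missing = {
        field: sum(1 for r in records if r.get(field) is None)
        for field in required_fields
    }

    # Count exact duplicate records; assumes field values are hashable.
    seen, duplicates = set(), 0
    for r in records:
        key = tuple(sorted(r.items()))
        if key in seen:
            duplicates += 1
        seen.add(key)

    return {
        "rows": len(records),
        "missing": missing,
        "duplicates": duplicates,
        # Flag a surge when volume exceeds the expected count by surge_factor.
        "volume_surge": len(records) > surge_factor * expected_rows,
    }


# Hypothetical batch with one missing value and one duplicate record.
batch = [
    {"user_id": 1, "amount": 10.0},
    {"user_id": 2, "amount": None},
    {"user_id": 1, "amount": 10.0},
]
report = batch_health(batch, ["user_id", "amount"], expected_rows=1)
print(report)
```

In practice a report like this would be computed on every batch and compared against a baseline, so that a change in any signal can be traced when accuracy drops.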

To practice good MLops and responsibly develop AI, here are several questions to address:

How do you catch data drifts in your pipeline? Data drift can be more difficult to catch than data quality shortcomings. Data changes that appear subtle may have an outsized impact on particular model predictions and particular customers.
Does your system reliably move data from point A to B without jeopardizing data quality? Thankfully, moving data in bulk from one system to another has become much easier as tools for ML improve.
Can you track and analyze data automatically, with alerts when data quality issues arise?
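One way to approach the drift and alerting questions above is to compare the distribution of a feature in live data against a training baseline. The sketch below uses the population stability index (PSI), a common drift statistic that the article itself does not name. The function names, the bin count and the 0.2 alert threshold are assumptions; 0.2 is a widely quoted rule of thumb, not a standard.

```python
import math


def psi(baseline, current, bins=10, eps=1e-6):
    """Population stability index of `current` relative to `baseline`."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the baseline range go to the first or last bin.
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each share at eps so the logarithm is always defined.
        return [max(c / len(values), eps) for c in counts]

    b, c = shares(baseline), shares(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))


def check_drift(baseline, current, threshold=0.2):
    """Return an alert message if PSI exceeds the threshold, else None."""
    score = psi(baseline, current)
    if score > threshold:
        return f"ALERT: possible data drift (PSI={score:.2f})"
    return None


# Hypothetical feature values: the live data has shifted upward by 50.
training = list(range(100))
live = [v + 50 for v in training]

print(check_drift(training, training))  # identical data, no alert
print(check_drift(training, live))      # shifted data, alert
```

A single aggregate score can hide subtle changes that affect particular customers, so a check like this is usually run per feature and per segment rather than once for the whole dataset.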

MLops: How to start now

You may be thinking, how do we gear up to address these problems? Building an MLops capability can begin modestly, with a data expert and your AI developer. MLops is still an early-stage discipline and is evolving. There is no gold standard or approved framework yet to define a good MLops system or organization, but here are a few fundamentals:

In developing models, AI researchers need to consider data at each step, from product development through deployment and post-deployment. The ML community needs mature MLops tools that help make high-quality, reliable and representative datasets to power AI systems.
Post-deployment maintenance of the AI application cannot be an afterthought. Production systems should implement ML equivalents of devops best practices, including logging, monitoring and CI/CD pipelines that account for data lineage, data drift and data quality.
Structure ongoing collaboration across stakeholders, from executive leadership and subject-matter experts to ML and data scientists, ML engineers and SREs.

Sustained success for AI/ML applications demands a shift from “get the code right and you’re done” to an ongoing focus on data. Systematically improving data quality for a basic model is better than chasing state-of-the-art models with low-quality data.

Not yet a defined science, MLops encompasses practices that make data-centric AI workable. We will learn much in the upcoming years about what works most effectively. Meanwhile, you and your AI team can proactively – and creatively – devise an MLops framework and tune it to your models and applications.

Alessya Visnjic is the CEO of WhyLabs.
