Top 5 data quality & accuracy challenges and how to overcome them

Every company today is data-driven or at least claims to be. Business decisions are no longer made based on hunches or anecdotal trends as they were in the past. Concrete data and analytics now power businesses’ most critical decisions.

As more companies leverage the power of machine learning and artificial intelligence to make critical choices, there must be a conversation around the quality (the completeness, consistency, validity, timeliness and uniqueness) of the data used by these tools. The insights companies expect from machine learning (ML) or AI-based technologies are only as good as the data used to power them. The old adage “garbage in, garbage out” applies directly to data-based decisions.

Poor data quality increases the complexity of data ecosystems and leads to poor decision-making over the long term. By one widely cited estimate, it costs organizations an average of roughly $12.9 million every year. As data volumes continue to increase, so will the challenges that businesses face with validating and remediating their data. To overcome issues related to data quality and accuracy, it’s critical to first know the context in which the data elements will be used, as well as the best practices that should guide data quality initiatives.

1. Data quality is not a one-size-fits-all endeavor

Data initiatives are not specific to a single business driver. In other words, determining data quality will always depend on what a business is trying to achieve with that data. The same data can impact more than one business unit, function or project in very different ways. Furthermore, the list of data elements that require strict governance may vary according to different data users. For example, marketing teams are going to need a highly accurate and validated email list while R&D would be invested in quality user feedback data.

The best team to discern a data element’s quality, then, would be the one closest to the data. Only they will be able to recognize data as it supports business processes and ultimately assess accuracy based on what the data is used for and how.

2. What you don’t know can hurt you

Data is an enterprise asset. However, actions speak louder than words. Not everyone within an enterprise is doing all they can to make sure data is accurate. If users do not recognize the importance of data quality and governance, or simply don’t prioritize them as they should, they are not going to make an effort either to anticipate data issues caused by mediocre data entry or to raise their hand when they find a data issue that needs to be remediated.

This might be addressed practically by tracking data quality metrics as a performance goal to foster more accountability among those directly involved with data. In addition, business leaders must champion the importance of their data quality program. They should align with key team members on the practical impact of poor data quality. For instance, misleading insights shared in inaccurate reports for stakeholders can potentially lead to fines or penalties. Investing in better data literacy can help organizations create a culture of data quality and avoid careless or ill-informed mistakes that damage the bottom line.
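To make the idea of tracking data quality metrics concrete, here is a minimal Python sketch of three of the quality dimensions named earlier (completeness, uniqueness and validity). The sample records, field names and the deliberately simple email pattern are all assumptions made for illustration, not part of any particular tool.

```python
import re

# Hypothetical sample records; the field names are assumptions for illustration.
RECORDS = [
    {"id": 1, "email": "ana@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "not-an-email"},
    {"id": 3, "email": "bo@example.com"},
]

# Deliberately simple pattern; real email validation is more involved.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(records, field):
    """Share of records where the field is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def uniqueness(records, field):
    """Share of distinct values among all values of the field."""
    if not records:
        return 0.0
    values = [r.get(field) for r in records]
    return len(set(values)) / len(values)


def validity(records, field, pattern):
    """Share of non-empty values that match the given pattern."""
    values = [r[field] for r in records if r.get(field) not in (None, "")]
    if not values:
        return 0.0
    return sum(1 for v in values if pattern.match(v)) / len(values)


if __name__ == "__main__":
    print("email completeness:", completeness(RECORDS, "email"))
    print("id uniqueness:", uniqueness(RECORDS, "id"))
    print("email validity:", validity(RECORDS, "email", EMAIL_PATTERN))
```

Scores like these could be reported per team over time, which is one way to turn data quality into a measurable performance goal.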

3. Don’t try to boil the ocean

It is not practical to fix a long laundry list of data quality problems, nor is it an efficient use of resources. The number of data elements active within any given organization is huge and growing rapidly. It’s best to start by defining an organization’s Critical Data Elements (CDEs), which are the data elements integral to the main function of a specific business. CDEs vary from business to business, although some are widely shared: net revenue, for example, is a CDE for most businesses because it is important for reporting to investors and other stakeholders.

Since every company has different business goals, operating models and organizational structures, every company’s CDEs will be different. In retail, for example, CDEs might relate to design or sales. On the other hand, healthcare companies will be more interested in ensuring the quality of regulatory compliance data. Although this is not an exhaustive list, business leaders might consider asking the following questions to help define their unique CDEs: What are your critical business processes? What data is used within those processes? Are these data elements involved in regulatory reporting? Will these reports be audited? Will these data elements guide initiatives in other departments within the organization?

Validating and remediating only the most critical elements will help organizations scale their data quality efforts in a sustainable and resourceful way. Eventually, an organization’s data quality program will reach a level of maturity where frameworks (often with some level of automation) categorize data assets based on predefined elements to remove disparity across the enterprise.

4. More visibility = more accountability = better data quality

Businesses drive value by knowing where their CDEs are, who is accessing them and how they’re being used. In essence, a company cannot identify its CDEs without proper data governance in place from the start. However, many companies struggle with unclear or non-existent ownership of their data stores. Defining ownership before onboarding more data stores or sources promotes commitment to quality and usefulness. It’s also wise for organizations to set up a data governance program where data ownership is clearly defined and people can be held accountable. This can be as simple as a shared spreadsheet recording who owns each data element, or it can be managed by a sophisticated data governance platform.
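The “shared spreadsheet” end of that spectrum can be sketched in a few lines of Python. The registry below is a hypothetical example (the element and team names are invented for illustration); it simply maps each data element to an owner and flags the elements nobody is accountable for yet.

```python
# Hypothetical ownership registry: data element -> owning team.
# None means no owner has been assigned yet.
OWNERSHIP = {
    "net_revenue": "finance",
    "customer_email": "marketing",
    "user_feedback": None,
}


def unowned_elements(registry):
    """Return the data elements that have no owner, sorted by name."""
    return sorted(name for name, owner in registry.items() if not owner)


def elements_by_owner(registry):
    """Group owned data elements under the team accountable for them."""
    grouped = {}
    for name, owner in registry.items():
        if owner:
            grouped.setdefault(owner, []).append(name)
    return grouped


if __name__ == "__main__":
    print("Needs an owner:", unowned_elements(OWNERSHIP))
    print("By owner:", elements_by_owner(OWNERSHIP))
```

Running a check like this before onboarding a new data source is one simple way to make sure ownership is defined first.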

Just as organizations should model their business processes to improve accountability, they must also model their data in terms of data structure, data pipelines and how data is transformed. Data architecture attempts to model the structure of an organization’s logical and physical data assets and data management resources. Creating this type of visibility gets at the heart of the data quality issue: without visibility into the lifecycle of data (when it’s created, how it’s used or transformed and how it’s output), it’s impossible to ensure true data quality.

5. Data overload

Even when data and analytics teams have established frameworks to categorize and prioritize CDEs, they are still left with thousands of data elements that need to be either validated or remediated. Each of these data elements can require one or more business rules that are specific to the context in which it will be used. However, those rules can only be assigned by the business users working with those unique data sets. Therefore, data quality teams will need to work closely with subject matter experts to identify rules for each and every unique data element, which is an extremely demanding task even when the elements are prioritized. This often leads to burnout and overload within data quality teams because they are responsible for manually writing a large number of rules for a variety of data elements. Organizations must set realistic expectations about the workload of their data quality team members. They may consider expanding their data quality team, investing in tools that leverage ML to reduce the amount of manual work in data quality tasks, or both.
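As a rough illustration of what “one or more business rules per data element” can look like in practice, here is a minimal Python sketch. The element names, rule descriptions and checks are hypothetical; in a real program the subject matter experts would supply them.

```python
from typing import Callable

# A rule is a human-readable description plus a check that returns True on success.
Rule = tuple[str, Callable[[object], bool]]

# Hypothetical rules keyed by data element; the names and checks are assumptions.
RULES: dict[str, list[Rule]] = {
    "net_revenue": [
        ("is a number",
         lambda v: isinstance(v, (int, float)) and not isinstance(v, bool)),
    ],
    "customer_email": [
        ("is not empty", lambda v: isinstance(v, str) and v.strip() != ""),
        ("contains @", lambda v: isinstance(v, str) and "@" in v),
    ],
}


def validate(record, rules):
    """Return a list of 'element: rule' strings for every rule the record fails."""
    failures = []
    for field, field_rules in rules.items():
        value = record.get(field)
        for description, check in field_rules:
            if not check(value):
                failures.append(f"{field}: {description}")
    return failures


if __name__ == "__main__":
    good = {"net_revenue": 1200.5, "customer_email": "ana@example.com"}
    bad = {"net_revenue": "n/a", "customer_email": ""}
    print(validate(good, RULES))
    print(validate(bad, RULES))
```

Even this toy version shows why the workload grows quickly: every new data element needs its own entry, and every entry needs someone who understands the business context well enough to write the checks.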

Data isn’t just the new oil of the world: it’s the new water of the world. Organizations can have the most intricate infrastructure, but if the water (or data) running through those pipelines isn’t drinkable, it’s useless. People who need this water must have easy access to it, they must know that it’s usable and not tainted, they must know when supply is low and, lastly, the suppliers and gatekeepers must know who is accessing it. Just as access to clean drinking water helps communities in a variety of ways, improved access to data, mature data quality frameworks and a deeper data quality culture can protect data-reliant programs and insights, helping spur innovation and efficiency within organizations around the world.

JP Romero is Technical Manager at Kalypso
