AI applications are also invaluable for data-driven organizations. Databases were already a gold mine, but with AI you have the tools to extract even more value from your data. Your data turns out to be full of patterns and opportunities you had never discovered before.

AI won't let you fail

The magic of AI is also its greatest risk. If it always gives the right answer, always finds patterns and always makes extraordinary discoveries, does the input still matter?

In the data field, we all know: you can't build on bad data. Garbage in, garbage out. Garbage out is a direct reason to take another critical look at the quality of your input. Then you can clean up the garbage and continue building on a good foundation. But what happens when garbage out is no longer an option? In the AI era, it seems as if models no longer depend on the quality of their input. AI models are built to never give a "garbage" output, regardless of the quality of your input. You will always get a polished response that looks like what you wanted. Just like ChatGPT: with a bad prompt, in many cases you will still get back a little novel in full sentences. No matter the input, AI won't let you fail.

Garbage in, magic out.
Mobilee

Bad data is bad decision making
Unfortunately, a polished answer doesn't always mean you're right.
It used to be easier for us to judge the reliability of our information: the number of spelling errors, incorrect sentence structure or word choice, length, and so on. Today, these indicators are less relevant. The answers AI gives you almost always look like good quality. This is exactly what AI is good at: the form, structure and content appear sophisticated. Misinformation is hardly distinguishable from truth anymore. The same goes for the AI in your organization. In more and more organizations, AI is taking on a central role in informing crucial decision-making. That is a good opportunity, but it is difficult to evaluate the reliability of what AI produces. The result? Poor output is glossed over and can end up forming the basis for decisions in your organization. Bad decision-making.

AI won't let you fail ... and that's a problem

Preventing risk: Checking and Verifying
AI always gives you a great answer. It always finds something in your data, even if it is dirty data: data that is incomplete, irrelevant, unreliable or manipulated. When AI models are trained on dirty data, the bad data spreads through other models and decision systems in no time. Thus, small errors in your input can result in large-scale and often invisible risks to your organization. Such a chain of successive errors, the consequences of which only become apparent after some time, is called a Data Cascade. Data Cascades are common, invisible, delayed and long-lasting, but in most cases they are preventable.

Data Cascades: Google research
A 2021 Google Research study asked AI practitioners from several countries about their experiences with Data Cascades. As many as 92% of participants said they had experienced at least one Data Cascade. In 45% of cases, there were even several. The underlying problem, according to the researchers: insufficient attention to data quality and a poor understanding of the data. As the title says, "Everyone wants to do the [AI] model work, not the data work."

How do you prevent this cascade effect and find out if your AI model's findings are reliable?

Expertise more important than ever
Unfortunately, the answer is not ChatGPT. It is the data experts, the people who have a deep understanding of the data and processes in your organization.
First, it is important to continuously verify the quality of the input data. Second, it is important to evaluate not only the outcome of a model, but also the method by which it arrived at that outcome.
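To make the first point a bit more concrete, here is a minimal sketch of an input check that could run before data reaches a model. The function names (`check_quality`, `gate`), the field names and the 5% threshold are assumptions chosen for illustration; your own data experts would decide which checks and limits make sense.

```python
def is_blank(value):
    """A value counts as missing when it is None or an empty/whitespace string."""
    return value is None or str(value).strip() == ""


def check_quality(records, required_fields, key_field):
    """Count incomplete and duplicate records in a list of dicts."""
    seen_keys = set()
    incomplete = 0
    duplicates = 0
    for record in records:
        # Incomplete: at least one required field is missing or blank.
        if any(is_blank(record.get(field)) for field in required_fields):
            incomplete += 1
        # Duplicate: the same key (ignoring case and spaces) was seen before.
        key = record.get(key_field)
        if not is_blank(key):
            normalized = str(key).strip().lower()
            if normalized in seen_keys:
                duplicates += 1
            seen_keys.add(normalized)
    total = len(records)
    return {
        "total": total,
        "incomplete": incomplete,
        "duplicates": duplicates,
        "incomplete_ratio": incomplete / total if total else 0.0,
    }


def gate(report, max_incomplete_ratio=0.05):
    """Hypothetical rule: block the data when it has duplicates or too many gaps."""
    return (report["duplicates"] == 0
            and report["incomplete_ratio"] <= max_incomplete_ratio)


# Made-up example records.
records = [
    {"id": "1", "email": "a@example.com"},
    {"id": "2", "email": ""},
    {"id": "1", "email": "b@example.com"},
]
report = check_quality(records, required_fields=["id", "email"], key_field="id")
print(report, "-> ok to use:", gate(report))
```

The point of the gate is that bad input produces a visible failure again, instead of a polished answer.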

Since the use of AI often leads to more complex models, the expertise of data experts is all the more important. The job of a data expert has become a lot broader than performing a few analyses or building models. The new challenge is to identify and fix errors and anomalies in complex AI models. Models built by AI, rather than by the data scientists themselves, in particular require a deep understanding of the data and processes. So preferably use Explainable AI: models that provide insight into and explanations of the choices they make (Patel, 2024). As long as you can see into the black box, you keep a grip on the quality of your data, your decision systems and the underpinnings of your decision-making.

Better data quality thanks to AI

Now for the practical side. We want to avoid feeding AI dirty data at all costs. Fortunately, AI can also help you improve your data quality.

A simple but valuable use case of AI is to clean up and standardize your customer data. Many organizations deal with customer data that comes from different sources, resulting in an inconsistent or unstructured customer database. AI is great at recognizing the meaning (semantics) of words, regardless of exact spelling. Run a model on your customer database to recognize and structure the anomalous data. The data expert gives AI the task of identifying all the anomalies and explains the processes. AI executes. By applying AI to improve your data quality, you not only avoid the risk of blindly relying on AI models, but also extract even more of the potential and value from your existing data.
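As a rough illustration of this division of labor, the sketch below maps inconsistently spelled customer names onto a list of approved names. Note the swap: instead of a semantic AI model it uses plain string similarity from Python's standard library (`difflib`), which only catches spelling variants, not meaning. The customer names, the `standardize` function and the 0.8 cutoff are all made up for the example.

```python
import difflib

# Hypothetical list of approved customer names, maintained by the data expert.
CANONICAL = ["Acme Logistics", "Northwind Traders", "Globex"]


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.lower().split())


def standardize(raw, canonical=CANONICAL, cutoff=0.8):
    """Return the approved name closest to `raw`, or None when nothing is close enough."""
    lookup = {normalize(name): name for name in canonical}
    matches = difflib.get_close_matches(
        normalize(raw), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None


for raw in ["acme  logistics", "Northwind Tradrs", "Initech"]:
    result = standardize(raw)
    # None means: no confident match, leave it for a human to review.
    print(repr(raw), "->", result if result else "needs review")
```

Values without a confident match are not guessed. They go back to the data expert, who stays in charge of what counts as an anomaly.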

Practical tips for AI deployment within your organization

Here are some practical tips for when you want to build the AI landscape in your organization.

Start with a clear question
Know your organization, its goals and ambitions. Make sure you have a clear goal in mind with associated KPIs.
- If you give an assignment without clear goals and conditions, the AI model will flawlessly fill in the missing links. AI is often not trained in your context, and the input does not take into account the goals and ambitions of your organization.
- It is the people in an organization who determine the business goals. Don't let AI determine your direction, or even draw conclusions for you. Let it be your engine.

Know your data and choose your AI consciously
There are many flavors of AI to choose from (Google Learn): AI, GenAI, Machine Learning. The best choice depends on your data and goals. Structured or unstructured data. Text or numbers. Images or customer experiences. Make sure you choose consciously.

- Is your data not well structured yet? There are Machine Learning (ML) and Natural Language Processing (NLP) applications that allow you to label your data based on predetermined categories. This way, you define the rules yourself and AI assists.
- Is your data structured, but does it contain errors? Get help with data cleaning. You decide what counts as "wrong", and AI searches for you and comes back with all the wrong or anomalous data.
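The idea that you define the rules and the tooling assists can be sketched in a few lines. The example below labels free-text customer messages using predetermined categories. It is a deliberately simple keyword matcher, not an ML or NLP model, and the categories, keywords and the `label` function are assumptions for illustration only.

```python
import re

# Hypothetical categories and keywords, defined up front by the data expert.
CATEGORIES = {
    "billing": {"invoice", "payment", "refund"},
    "delivery": {"shipping", "delivery", "package"},
}


def label(text, categories=CATEGORIES):
    """Return the category with the most keyword hits, or 'unlabeled' if none match."""
    if not categories:
        return "unlabeled"
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {name: len(words & keywords) for name, keywords in categories.items()}
    # On a tie, the category listed first in the dictionary wins.
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unlabeled"


for message in ["My invoice shows a double payment",
                "Where is my package?",
                "Hello there"]:
    print(message, "->", label(message))
```

Anything that ends up "unlabeled" is a signal for a person to look at, either to label it by hand or to adjust the rules.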

Set standards for risk and implementation
Determine how much risk you are willing to take. An AI chat model that provides inspiration for a fiction story has far less serious implications than an AI medical model that assesses disease probability based on medical data.

- The extent to which you are willing to take risks depends on the specific data and organization in which you are applying AI. Research the potential implications of your model and set clear standards for what are acceptable risks for implementation.
- Make sure you can always track the actions of your model so you can recognize and prevent any risks.
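One possible way to combine both tips is to wrap your model so that every prediction is logged, and anything below a confidence standard you set yourself is flagged for human review. This is only a sketch: `AuditedModel`, `toy_model`, the labels and the 0.8 threshold are hypothetical, and a real setup would write to durable storage rather than a Python list.

```python
import json
from datetime import datetime, timezone


class AuditedModel:
    """Wraps a prediction function and records every call in an audit log."""

    def __init__(self, predict_fn, min_confidence, log):
        self.predict_fn = predict_fn          # returns (label, confidence)
        self.min_confidence = min_confidence  # your own risk standard
        self.log = log                        # any list-like store of JSON lines

    def predict(self, features):
        label, confidence = self.predict_fn(features)
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "input": features,  # must be JSON-serializable
            "output": label,
            "confidence": confidence,
            # Below the standard: a person decides, not the model.
            "needs_review": confidence < self.min_confidence,
        }
        self.log.append(json.dumps(entry))
        return entry


def toy_model(features):
    """Stand-in for a real model, with made-up logic and confidence values."""
    if features["age"] > 60:
        return "high_risk", 0.55
    return "low_risk", 0.95


audit_log = []
model = AuditedModel(toy_model, min_confidence=0.8, log=audit_log)
first = model.predict({"age": 70})
second = model.predict({"age": 30})
print(first["output"], "needs review:", first["needs_review"])
print(second["output"], "needs review:", second["needs_review"])
print(len(audit_log), "entries logged")
```

The stricter the domain, the higher you would set the threshold, so that more decisions are routed to a person.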

To actually apply these actions, you need a solid data foundation: good data governance. It ensures that your data is consistent and reliable and delivers maximum added value to the organization, while being secure and compliant with laws and regulations.

Create magic with value

An AI landscape starts with the first step. By making the right choices and considerations, you can tap into the 'magical' potential of AI and create magic with value.
Do you have a clear question, sufficient insight into your organization and clear criteria for implementation? Then you are ready to explore the many flavors of AI and enrich your data landscape with targeted applications.

By F. Bueters

This blog post is a contribution from Mobilee, a consulting firm for digital transformation challenges, strategy-execution and team building, for the readers of Data Expo. You can find more inspiration at www.mobilee.nl or visit Mobilee during Data Expo at booth #40.
