
Don’t Wait for Perfect Data

As MANTECH’s Data and AI Practice works with clients on how AI can help them, one of the most common concerns we hear is, “the data isn’t ready.”   

Clients often worry they won’t be able to get value out of implementing AI until their data is ideal: cleaned, correctly and identically formatted, structured consistently, and held in a fully managed schema, typically with normalized tables. There are good reasons for this belief, but it misses the ways AI can add value right away. 

Some of us have learned a lesson from traditional data science almost too well – that everything relies on the quality of our data. That quality is still hugely important, but an often-overlooked aspect is that data science and AI methods can contribute to improving that quality. This is especially true with modern generative AI models, which have already been trained on enormous quantities of extremely varied data and may well have learned the very patterns needed to clean or enhance a critical data set. 

As a simple example, suppose you want to predict future demand for an item based on past demand levels and the world events that were happening when that demand occurred. This data would likely reside in several different databases, each with its own date format. Worse, several of the event databases might use free-form text entries for dates, so the format of each individual date is unknown. You could sit down, start scrolling, and write queries to address the formatting inconsistencies one at a time. Or you could let data science and AI help you.

Both conventional data science and generative AI can recognize standard date formats, spot unusual entries that merely look like dates, and attempt to map all of them to a standard representation. Generative AI can also help you write code to standardize and normalize what you find. Humans then provide oversight and judgment rather than grunt work, which streamlines the process and improves the quality of life of your data engineers. It also shortens the time needed to get ready for the predictive analytics you were planning.
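To make the date example concrete, here is a minimal sketch of the conventional side of that workflow, using only the Python standard library. The format list, the function names, and the month-first reading of slash-separated dates are illustrative assumptions, not a description of any particular client system. The point is the division of labor: code handles the entries it can parse, and everything else goes to a human review list.

```python
from datetime import datetime

# Hypothetical list of formats seen so far; extend it as review turns up more.
# Assumption: slash-separated dates are month-first (US style).
CANDIDATE_FORMATS = [
    "%Y-%m-%d",   # 2021-03-04
    "%m/%d/%Y",   # 03/04/2021
    "%d %B %Y",   # 4 March 2021
    "%B %d, %Y",  # March 4, 2021
    "%d-%b-%y",   # 04-Mar-21
]


def normalize_date(raw):
    """Return an ISO 8601 date string, or None if no candidate format matches."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # Try the next candidate format.
    return None


def normalize_all(entries):
    """Split entries into parsed dates and a list that needs human review."""
    cleaned, needs_review = {}, []
    for raw in entries:
        iso = normalize_date(raw)
        if iso is None:
            needs_review.append(raw)
        else:
            cleaned[raw] = iso
    return cleaned, needs_review
```

Entries that land in the review list are natural candidates to hand to a generative model, or to a person, for interpretation. An ambiguous entry such as 03/04/2021 is exactly where human judgment about the source database still matters.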

Going further, in some cases, you may not even need to go through the process of improving the data before getting AI results. Suppose your demand prediction use case is really looking for a description of the conditions at the times that demand has spiked in the past. You may be able to use a generative AI model’s flexibility directly, by asking it to fetch events co-occurring with demand spikes and then summarize them, leaving it to translate the date formats as needed. You should prompt it to show its work in this case, and you’ll need to check that work! 
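As a rough sketch of this more direct approach, the code below finds demand spikes with a simple statistical rule and then assembles a prompt that passes the raw, unformatted event records to a generative model. Everything here is an assumption made for illustration: the function names, the spike threshold, the seven-day window, and the prompt wording. No particular model or API is implied, so the code stops at building the prompt text.

```python
from statistics import mean, stdev


def find_spikes(demand, threshold=2.0):
    """Return dates whose demand is more than `threshold` sample standard
    deviations above the mean. `demand` maps ISO date strings to units."""
    values = list(demand.values())
    if len(values) < 2:
        return []
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # Flat demand: nothing counts as a spike.
    return sorted(d for d, v in demand.items() if (v - mu) / sigma > threshold)


def build_prompt(spike_dates, raw_events):
    """Assemble a prompt that asks the model to match events to spikes and
    to show how it interpreted each free-form date."""
    lines = [
        "The following dates (ISO 8601) had demand spikes:",
        *[f"- {d}" for d in spike_dates],
        "",
        "Below are event records with free-form dates, exactly as stored:",
        *[f"- {e}" for e in raw_events],
        "",
        "List the events that occurred within 7 days of each spike, then summarize them.",
        "Show your work: for every event, give the original date text and the",
        "ISO 8601 date you interpreted it as, and mark any date you are unsure of.",
    ]
    return "\n".join(lines)
```

Asking for the original date text alongside the interpreted date gives a reviewer something specific to check, which is the "show its work" step described above.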

With the right application, data science and AI can meet your organization where you are, even if that means messy data and uncertainty about what’s in the collections. 

Don’t let the vision of perfect data be the enemy of good value from AI; with the right resources to help you, you can make real, mission-driven progress on both at once.

About Data and AI Bytes

Welcome to Data and AI Bytes – a series of short, snackable blog posts by experts from MANTECH’s Data and AI Practice. These posts aim to educate readers about current topics in the fast-moving field of AI.

MANTECH’s Data and AI Practice is at the forefront of operationalizing AI for government – delivering impactful AI solutions that drive mission success.

For more information, contact AI@mantech.com

Dr. Kristen Summers
Vice President and Artificial Intelligence Fellow
