How Artificial Intelligence Is Changing the Way We Read Data (and Make Decisions)
AI has made data analysis faster and more conversational. But anyone who works with data every day knows that the hard part has never been writing a query: it’s deciding what a number actually means. And that is something AI alone doesn’t solve.
The Convincing Answer That Is Also Wrong
We recently started working with a company that showed us an analysis produced by an AI system. The output was already packaged for the Monday meeting: a headline in capital letters, THE SEO BOOM, accompanied by triple-digit growth percentages and a polished commentary on the success of the organic strategy over the past six months.
The AI had been handed the monthly organic sessions of the last six months, compared with the same months of the previous year. From the system’s point of view, the conclusion was unassailable: relative to the comparison period, organic traffic had exploded. The phrase “extraordinary growth” would have been hard to dispute looking at those numbers alone.
Except that in the months used for the comparison, the previous year, the company’s Google Analytics property didn’t yet exist. It had been activated five months after the site’s launch. The “previous year” data was mostly zeros because no tool was measuring, not because there was no traffic. SEO hadn’t boomed at all: it had gone from “unmeasured” to “measured”. A setup change had been reported as a performance change.
The AI hadn’t got the maths wrong — the maths was right. It had got something more subtle wrong: it didn’t know that behind those numbers there was a story. It’s a mistake that’s hard to catch unless you know the client’s context, and that’s exactly why we found it — because that context is our job.
The point matters, and it deserves to be generalised. AI isn’t curious. It doesn’t ask itself “wait, did something change in the way we were measuring last year?”. It takes what you give it and reasons on top of it. If the data doesn’t say that the Analytics property was activated late, that piece of information doesn’t exist for the AI. And yet, it’s exactly that piece of information — utterly trivial once you know it — that flips the reading. Context, almost always, isn’t inside the data: it’s around the data, in the heads of the people who know it. And without those heads, AI reads everything and understands nothing.
“AI doesn’t get things wrong because it can’t read numbers. It gets things wrong because it doesn’t know what’s around the numbers.”
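The measurement-gap trap described above can be caught mechanically once someone writes the context down. The sketch below is only an illustration: the `TRACKING_START` date, the function name and the figures are hypothetical, and the original analysis did not include any such check. It refuses to compute a year-over-year growth figure when the baseline month predates the start of tracking.

```python
from datetime import date

# Hypothetical: the day the analytics property started collecting data.
TRACKING_START = date(2024, 6, 1)

def yoy_growth(current, baseline, baseline_month, tracking_start=TRACKING_START):
    """Return year-over-year growth as a fraction, or None when the
    baseline is not comparable.

    baseline_month is the first day of the month being used as baseline.
    """
    # A baseline month that began before tracking was active was not
    # (fully) measured: its zeros mean "unmeasured", not "no traffic".
    if baseline_month < tracking_start:
        return None
    # A zero baseline makes a percentage meaningless in any case.
    if baseline <= 0:
        return None
    return (current - baseline) / baseline

# Made-up figures: March 2024 predates tracking, July 2024 does not.
print(yoy_growth(1200, 0, date(2024, 3, 1)))     # None: not comparable
print(yoy_growth(1200, 1000, date(2024, 7, 1)))  # growth as a fraction
```

The check itself is trivial. The hard part is that someone who knows the client has to record the tracking start date somewhere a tool can read it.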
The Second Way to Stumble: Fragile Arithmetic
That is the first way AI stumbles on data: context. But there is a second way, which involves the maths itself, and here the issue gets more technical.
When you load a spreadsheet into a language model, it doesn’t “open” it the way Excel does: it reads it as text. It doesn’t see a grid with cells and formulas, it sees a sequence of words and numbers to process with the same mechanism it uses to write an email. With that mechanism, exact arithmetic is a side effect, not a guarantee. It’s like asking a brilliant writer to add up a column of thousands of numbers in their head in front of the client: they’ll probably get the order of magnitude right and some of the details wrong, and either way they will give you the result in the same confident tone.
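One common way to work around this, offered here as a sketch rather than as something the analysis above used, is to do the arithmetic in ordinary code and hand the model only the finished figures. The data and the `column_total` helper below are made up for illustration.

```python
import csv
import io
from decimal import Decimal

# Made-up data. This string is roughly what a language model "sees"
# when a spreadsheet is pasted into it: text, not a grid of cells.
raw_text = (
    "month,sessions,revenue\n"
    "2025-01,1200,1034.50\n"
    "2025-02,1350,1210.25\n"
    "2025-03,1280,1120.75\n"
)

def column_total(csv_text, column):
    """Sum one column exactly, using Decimal to avoid float rounding."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum((Decimal(row[column]) for row in reader), Decimal(0))

# Totals computed deterministically, ready to be quoted in a prompt or report.
print(column_total(raw_text, "sessions"))
print(column_total(raw_text, "revenue"))
```

The model can still write the commentary, but the numbers it comments on come from a calculation that gives the same answer every time.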
Research data. A study titled “Large Language Models in Numberland” tested five flagship models (OpenAI’s o1, Gemini, Claude and Copilot, plus o1-mini) on a benchmark of 100 numerical problems. On tasks with a deterministic path, the models scored between 74% and 95% accuracy. On the Game of 24, which requires trial-and-error search, performance collapsed to 10–73%. And when the same game was offered in a harder version, the best model dropped to 27%. Source: Razeghi et al., arXiv:2504.00226.
Between wrong context and fragile arithmetic, AI left to its own devices has two routes to take you off course. In both cases, the problem isn’t solved by switching model: it’s solved at a deeper, less glamorous level.
The Real Bottleneck: the Company Dictionary
Walk into a company and ask three different people: “how many active customers do we have?”. Sales will count whoever has signed a contract. Customer success will count whoever has used the product. The CFO will count whoever has actually been invoiced. Three different numbers, none of them wrong.
As long as humans are the ones answering, this isn’t a problem: in a meeting you ask “active in what sense?” and find the common language for that conversation. When the one answering is an AI, the ambiguity explodes. The machine doesn’t ask for clarification: it picks a definition — possibly a different one each time — and replies with the same confidence. Two managers ask the same question an hour apart and get different numbers. End of trust in the system.
The solution exists, and it has nothing to do with AI: a company data dictionary that is written, versioned and treated as the single source of truth. In it, “active customer” is defined once and for all, and that definition is the one every tool (dashboard, report, AI) uses to answer. In industry jargon this is called a semantic layer; the underlying idea is simply a dictionary.
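As a rough illustration of what “written, versioned, single source of truth” can look like in practice, here is a minimal sketch. The structure, the field names and the sample definition of “active customer” are all hypothetical; a real semantic layer would normally live in a dedicated tool rather than in a Python module.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: a published definition cannot be edited in place
class MetricDefinition:
    name: str
    version: str
    definition: str
    owner: str

# Hypothetical dictionary: one entry per metric, one definition per entry.
DICTIONARY = {
    "active_customer": MetricDefinition(
        name="active_customer",
        version="1.0.0",
        definition="Customer with at least one invoice issued in the last 90 days",
        owner="finance",
    ),
}

def lookup(metric):
    """Return the agreed definition, or fail loudly if none exists."""
    if metric not in DICTIONARY:
        raise KeyError(
            f"'{metric}' is not in the data dictionary; define it before reporting on it"
        )
    return DICTIONARY[metric]

print(lookup("active_customer").definition)
```

The design choice that matters is the failure mode: a tool asked about an undefined metric stops and says so, instead of quietly picking its own definition.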
This work needed doing anyway. AI didn’t create the problem: it made it visible and urgent. A Gartner survey published in February 2025 found that 63% of organisations don’t have, or aren’t sure they have, AI-ready data management practices; the same press release predicts that through 2026, organisations will abandon 60% of AI projects unsupported by AI-ready data. Source: Gartner, February 2025. The model is almost always the least of the problems.
When the Insight Is Just Theatre
There is a further risk. AI is exceptionally good at producing narratives: give it any number and it will write you a polished commentary, with causes, effects and recommendations. It will do so even when the number says nothing.
Example: a campaign runs for a week on 400 people, and the conversion rate moves from 2.1% to 2.4%. The AI comments: “the new creative is working, performance +14%, I suggest increasing budget”. Nice sentence. Except that on those numbers, the difference is well within normal random fluctuation. A standard statistical calculation (a significance test on two proportions, 95% confidence level, 80% power) shows that to conclude with any reasonable reliability that 2.4% really is better than 2.1%, you would need about 76,000 people in total, roughly 38,000 per variant. Not 400. On such a small sample, any difference below 4 percentage points is noise. And you are about to shift budget on the basis of a convincing narrative built on nothing.
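The sample size figure can be reproduced with the textbook formula for comparing two proportions. The sketch below uses the normal approximation with a two-sided alpha of 0.05 and power of 0.80, as stated in the sources; the function name is ours, and other calculators may apply slightly different corrections and return slightly different numbers.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    """Required sample size per group to detect p1 vs p2
    (two-sided test, normal approximation, equal group sizes)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.80
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return ceil(n)

per_group = sample_size_per_group(0.021, 0.024)
print(per_group)      # roughly 38,000 per group
print(2 * per_group)  # roughly 76,000 in total
```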
The answer isn’t to give up on AI, it’s to set guardrails around it: systems that say “sample too small”, “difference not significant”, “trend within natural variance”. This can be automated, but someone has to decide to do it. And it requires that someone inside the company has the explicit responsibility of saying “this insight is fragile, we are not publishing it”. A role almost nobody owns today.
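A guardrail of this kind can be quite small. The sketch below is one possible shape, not a prescribed method: it runs a two-proportion z-test and returns a verdict in plain words. The `min_conversions` threshold of 10 is an assumed rule of thumb, and the counts in the example are made up.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or all conversions): nothing to compare
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def verdict(conv_a, n_a, conv_b, n_b, alpha=0.05, min_conversions=10):
    """Label a comparison before anyone writes a narrative about it."""
    # Assumed rule of thumb: too few conversions for the approximation to hold.
    if min(conv_a, conv_b) < min_conversions:
        return "sample too small"
    if two_proportion_p_value(conv_a, n_a, conv_b, n_b) >= alpha:
        return "difference not significant"
    return "difference significant"

# Made-up counts: same 2.1% vs 2.4% rates at three different scales.
print(verdict(4, 200, 5, 200))
print(verdict(210, 10_000, 240, 10_000))
print(verdict(2_100, 100_000, 2_400, 100_000))
```

The code only produces the label. Deciding that a “sample too small” verdict blocks publication is the organisational part, and it still needs an owner.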
AI on data doesn’t replace a semantic infrastructure and a culture of honest measurement.
It hides their absence, and makes mistakes faster and more convincing.
Two Questions Before Putting AI on Your Data
If you are considering introducing AI into your analytics systems, ask yourself these two questions. They aren’t technical — they’re organisational. If the answer to even one is no, the problem isn’t choosing the tool: it’s something further upstream.
- DICTIONARY: Do my key metrics have a single, written, shared definition? If I ask three people what “active customer” or “net revenue” means, do I get the same answer?
- ACCOUNTABILITY: Is there someone, on my team, with the mandate to say “we are not publishing this insight because it’s statistically fragile”? Or is everyone incentivised to produce conclusions, whatever they are?
DICTIONARY FIRST, MODEL SECOND
Anyone who reverses this order ends up with a system that answers everything and gets a lot of it wrong — written in perfect English.
Sources
(1) Razeghi et al., “Large Language Models in Numberland: A Quick Test of Their Numerical Reasoning Abilities”, arXiv:2504.00226, April 2025. arxiv.org/abs/2504.00226
(2) Gartner, “Lack of AI-Ready Data Puts AI Projects at Risk”, press release of 26 February 2025, based on a Q3 2024 survey of 1,203 data management leaders. gartner.com
(3) Own calculation, based on a significance test for the comparison of two proportions (alpha = 0.05; power = 0.80). The calculation can be reproduced with any public sample size calculator.
Fortop is a strategic data-driven marketing consultancy. We help companies build the infrastructure (technical, semantic and organisational) so that data actually becomes decisions.
