Everyone's obsessed with what AI can do. Nobody's obsessed with what it shouldn't do. Here's a story about a tool where we deliberately chose not to let AI make the decisions — and why it's more trustworthy because of it.
The Problem
People need proof of payment history. It comes up in divorces, tax audits, disputes with contractors, child support matters, loan applications. They need a list of payments, dates, amounts. Banks don't provide what they need. Their own records are a mess.
So they try ChatGPT. They screenshot their bank statement, paste it in, ask "how much did I pay this person?" They get an answer. They run it again the next day with the same screenshot and get a different answer. That's a problem when a lawyer is asking.
That's where we came in. We built a tool that reconstructs payment history from bank statements and other sources. It's called TransactionPaid, and it's deliberately, aggressively deterministic.
Why Probabilistic AI Fails Here
LLMs (ChatGPT, Claude, and the like) are probabilistic. They're trained to predict the most likely next word. Run them twice on the same input, and you might get slightly different outputs. For creative tasks, that's a feature. For financial records, it's a catastrophe.
Imagine your lawyer says: "Your honour, the defendant paid $47,000 to my client over three years." She's got a printout from TransactionPaid saying exactly that. Then the defendant's lawyer says: "Actually, our analysis shows $43,000." He runs the same tool, same input, and gets a different answer because the underlying model updated or the temperature settings are different.
Now you've got a situation where the same input produces different outputs. In finance, that's game over. You lose credibility immediately.
That's why we didn't use an LLM as the core decision-maker. We use AI for specific, narrow tasks where probabilistic output is fine. Everything else is deterministic code.
What Deterministic Means
Same input + same confirmations = same output, every time. Forever. No model updates that change historical results. No probability involved.
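As a minimal sketch of what "deterministic" means here (the function and data are illustrative, not TransactionPaid's actual code), the core is a pure function over confirmed transactions:

```python
def total_confirmed(transactions, confirmed_ids):
    """Pure function: the total depends only on its inputs.

    `transactions` is a list of (id, amount_in_cents) pairs;
    `confirmed_ids` is the set of ids the user explicitly approved.
    No model call, no randomness: rerunning this tomorrow with the
    same inputs yields the same total, to the cent.
    """
    return sum(amount for txn_id, amount in transactions
               if txn_id in confirmed_ids)


txns = [("t1", 50000), ("t2", 120000), ("t3", 7500)]
print(total_confirmed(txns, {"t1", "t3"}))  # 57500 cents = $575.00
```

Integer cents instead of floats is part of the same discipline: no rounding drift between runs.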
Here's the actual workflow:
- User uploads a bank statement (CSV or PDF)
- AI parses the bank statement format and extracts transaction records. This is where LLMs genuinely help — bank statements are messy and inconsistent. An LLM can handle "ACME CORP USD 500" or "$500 Payment to ACME" or "TRANSACTION: ACME $500" and understand they're the same thing.
- AI suggests candidate matches based on fuzzy name matching. "John Smith" might match "SMITH JOHN" or "J SMITH" or "Smith, John." An LLM is great at this.
- Here's the key: the user confirms every match manually. The AI suggests, but the human decides.
- Deterministic code totals the confirmed transactions and produces a report.
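The messiness the parsing step absorbs shows up even in a toy normalizer. This is a deterministic regex stand-in, not how the product's LLM step works — the point is that the same fact arrives in many shapes:

```python
import re


def parse_line(line):
    """Extract (payee, amount) from a few common statement formats.

    A hard-coded stand-in for illustration: "ACME CORP USD 500",
    "$500 Payment to ACME" and "TRANSACTION: ACME $500" all carry
    the same fact in different shapes, which is exactly where an
    LLM earns its keep.
    """
    patterns = [
        r"^(?P<payee>.+?) USD (?P<amount>\d+)$",
        r"^\$(?P<amount>\d+) Payment to (?P<payee>.+)$",
        r"^TRANSACTION: (?P<payee>.+?) \$(?P<amount>\d+)$",
    ]
    for pat in patterns:
        m = re.match(pat, line)
        if m:
            return m.group("payee"), int(m.group("amount"))
    return None  # unrecognized format


print(parse_line("ACME CORP USD 500"))     # ('ACME CORP', 500)
print(parse_line("$500 Payment to ACME"))  # ('ACME', 500)
```

Every new bank format means another hand-written pattern; an LLM generalizes past the list.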
The AI never decides what's included. It never calculates totals. It never certifies results. It only assists the human who's doing the actual deciding.
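The "AI suggests, human confirms" split can be sketched without any model at all. Here a simple string-similarity measure stands in for the suggester (the real product uses an LLM for this step; all names below are made up):

```python
from difflib import SequenceMatcher


def normalize(name):
    # "Smith, John" -> "john smith": lowercase, strip punctuation, sort tokens
    tokens = name.replace(",", " ").replace(".", " ").lower().split()
    return " ".join(sorted(tokens))


def suggest_matches(target, payees, threshold=0.8):
    """Suggest candidate payee matches. Suggestions only --
    nothing enters the total until a human confirms it."""
    t = normalize(target)
    return [p for p in payees
            if SequenceMatcher(None, t, normalize(p)).ratio() >= threshold]


candidates = suggest_matches("John Smith", ["SMITH JOHN", "J SMITH", "ACME CORP"])

# The human decides; the code only records the decision.
decisions = {"SMITH JOHN": True, "J SMITH": False}
confirmed = [c for c in candidates if decisions.get(c)]
```

The boundary is the whole design: the suggester can be swapped out or upgraded, but only recorded human decisions ever reach the totaling code.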
Why This Is Actually Better
This constraint — "AI suggests, human confirms" — sounds limiting. It's not. It's the product.
Lawyers love this tool because it's defensible. If a transaction is included in the $47,000 total, the user confirmed it. There's an audit trail. If a judge asks "why is this transaction included?" the answer is "because the user reviewed it and said yes," not "because an AI decided it looks relevant."
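What such an audit trail might record can be sketched like this (field names and structure are hypothetical, not the product's schema):

```python
import json
from datetime import datetime, timezone


def audit_entry(txn_id, amount_cents, decision, user):
    """One line of the audit trail: who confirmed what, and when.

    Every included transaction can answer "why is this in the total?"
    with a recorded human decision rather than a model's opinion.
    """
    return {
        "transaction": txn_id,
        "amount_cents": amount_cents,
        "decision": decision,      # "included" or "excluded"
        "confirmed_by": user,
        "confirmed_at": datetime.now(timezone.utc).isoformat(),
    }


entry = audit_entry("t1", 50000, "included", "user@example.com")
print(json.dumps(entry))
```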
That's not just better legally. It's actually more user-friendly. The user has skin in the game. They're not passively trusting an algorithm. They're actively verifying their own data.
And it's more accurate. A human reviewing 200 transactions and confirming 150 of them will catch the ones an LLM would miss. The LLM suggests "Transaction to Acme Corp, $500" — but the user knows "that's my supplier, include it" or "that's actually a refund, exclude it." The LLM can't know context. The human can.
When AI Is the Right Tool
This doesn't mean we're anti-AI. We just deploy it where it actually makes sense.
In FL-Support (our customer support system), the AI makes decisions. It triages tickets, diagnoses bugs, even generates code fixes. That's fine because:
- The stakes are lower. A wrong diagnosis gets corrected later, not used in court.
- We have a human approval gate. AI suggests, human approves.
- We iterate on feedback. If the AI is wrong 2% of the time, we refine the prompt and shrink that 2%.
For Stocky (our inventory app), we don't use AI at all. We use deterministic code because inventory data needs to be exact. A webhook deducts inventory for every order, and that deduction must be reliable. Not "probably right," but "always right."
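For a webhook, "always right" largely means idempotent: if the order platform retries a delivery, stock must not be deducted twice. A toy sketch with in-memory stand-ins for the real store (not Stocky's actual code):

```python
processed = set()          # order ids already handled (stand-in for a DB table)
stock = {"sku-123": 10}    # current inventory (stand-in for the real store)


def handle_order_webhook(order_id, sku, qty):
    """Deduct inventory exactly once per order, even if the webhook retries."""
    if order_id in processed:
        return stock[sku]  # duplicate delivery: no double deduction
    if stock[sku] < qty:
        raise ValueError(f"insufficient stock for {sku}")
    stock[sku] -= qty
    processed.add(order_id)
    return stock[sku]


handle_order_webhook("order-1", "sku-123", 3)  # -> 7
handle_order_webhook("order-1", "sku-123", 3)  # retried delivery: still 7
```

There's no prompt to tune here: the deduction either happened exactly once or the code is wrong.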
The pattern is: use AI where uncertainty is acceptable. Use deterministic code where it isn't.
The Conversation We Have in Every Audit
This is actually the conversation we have with every client during a free audit.
"Where could AI help in your business?"
Most people immediately think: "Everywhere. AI can do anything." That's backwards. The real question is: "Where is uncertainty acceptable?"
For customer service? Usually yes. Respond to a customer email, get it slightly wrong, they reply, you correct it. Fine.
For financial reporting? No. For compliance? No. For safety-critical decisions? No. For content that's attached to your name? Probably no.
For data entry, parsing, format conversion, fuzzy matching? Yes. For tedious search-and-click workflows? Yes. For writing first drafts that humans will edit? Yes.
The honest answer is that some of your problems aren't ready for AI yet. And some never will be. That's not a limitation. That's clarity.
The Lesson
We talk to businesses that are obsessed with adding AI everywhere because it's trendy. They want to replace customer service reps with chatbots (fine, maybe), replace accountants with AI analysis (not fine; accounting is judgment work), replace approval processes with algorithms (nightmare scenario).
The smart move is to identify where your business actually needs certainty and where it doesn't. Build AI-assisted workflows, not AI-decision workflows. Keep humans in the loop for judgment. Let machines handle the drudgery.
TransactionPaid is less impressive than a fully autonomous system that calculates everything without human input. It's also infinitely more useful because it's actually trustworthy.
That's the north star: trustworthiness, not automation for automation's sake.