Asynchronous AI
The Future of Multi-Tiered, Team-Structured AI Deployment
Reading time: 5 minutes
When I speak to people about their approach to AI deployment at their businesses, there are a few hesitations that come up every time. Here are the most common:
"Doesn't it always hallucinate"
"How can I be sure that it's accurate"
"I don't know how to deploy it"
"Won't it take our jobs"
The future of AI deployment is not the chat interface as we know it, but built-out automation and systems that utilise proprietary data to do the jobs AI is much better at than we are: research, multi-source data ingestion, and data analysis and processing. But if you're giving it the keys to decision making, how can you be sure it makes the right call? The answer is Asynchronous AI.
But before we get to that, let's understand the issue.
AI Hallucinations
AI hallucinations, when large language models confidently provide false or fabricated information, have been a long-standing and well-documented issue. They range from subtle factual inaccuracies to completely invented events, citations, or reasoning. While hallucinations are often written with remarkable fluency and authority, their unreliability creates real limitations when AI is expected to perform tasks requiring consistency, reliability, and trust.
The issue becomes even more complicated when you’re no longer operating within the friendly confines of a chat interface. While humans can manually detect and course-correct a hallucination mid-conversation in the UI, the story is different when you’re using the AI via an API, where inputs and outputs are meant to be automated, invisible, and assumed to work.
A Test of Consistency, Output, and Accuracy
I ran a simple but revealing test that evaluated outputs across three metrics: consistency, output, and accuracy. I prompted GPT-3.5 Turbo 100 times via the API (Application Programming Interface) with the same input and observed how stable the outputs were.
Here were the input prompts:
Test 1 - "Acting as an SEO Expert & copywriter, look at this page https://www.theguardian.com/football/2025/jul/15/bigger-better-more-often-infantino-wont-let-up-on-his-ambition-for-club-world-cup and summarise in a short sentence what it is about"
Test 2 - "Acting as an expert in SEO, provide recommendations for optimising this page https://www.theguardian.com/football/2025/jul/15/bigger-better-more-often-infantino-wont-let-up-on-his-ambition-for-club-world-cup for more evergreen search volume. The output should be a list of 5 keywords with reasons for each of them"
Test 3 - "Acting as an SEO expert and sentiment analysis. Look at this page: https://www.theguardian.com/football/2025/jul/15/bigger-better-more-often-infantino-wont-let-up-on-his-ambition-for-club-world-cup provide me with a detailed list of 3 key changes you'd do to make the page target more search behaviour and keyword opportunity and then analysis the sentiment of the page to summarise 3 key points from the sentiment analysis"
Each prompt escalates in complexity, mirroring the sort of question you'd ask a junior in your team to complete.
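To make the setup concrete, here's a minimal sketch of how such a harness could look, assuming the OpenAI Python SDK. The model name, loop count, and prompt come from the test above; everything else is illustrative:

```python
# Sketch of a repeatability harness: fire the same prompt 100 times and
# count how many distinct answers come back. Assumes the OpenAI Python SDK
# and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Acting as an SEO Expert & copywriter, look at this page "
    "https://www.theguardian.com/football/2025/jul/15/bigger-better-more-often-"
    "infantino-wont-let-up-on-his-ambition-for-club-world-cup "
    "and summarise in a short sentence what it is about"
)

outputs = []
for _ in range(100):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": PROMPT}],
        # temperature is deliberately left at its default: the point is to
        # see how much the output drifts under ordinary production settings
    )
    outputs.append(response.choices[0].message.content)

# Crude stability signal: identical inputs should ideally give identical outputs
print(f"{len(set(outputs))} unique outputs from {len(outputs)} identical prompts")
```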
To grade the runs, I used the following framework:
Consistency (structural): Did the AI produce the same styles, tones, and formats across runs?
Output: Were the results delivered in the requested format on every request?
Accuracy: Were the facts present and correct? Sometimes they were. Other times they were… fictional.
Here were the results:
[Chart: AI Accuracy Test Looking at SEO Complexity]
On a one-off basis through the UI, this variation is manageable. You notice it. You click regenerate. You fix it. No problem.
But via the API, where automation depends on deterministic and predictable behavior, this is a breaking issue. In production environments where human oversight isn’t practical for every request, hallucination and inconsistency aren’t just annoying, they’re dangerous.
The Importance of Thresholds
This leads us to the concept of thresholds: the invisible standards that dictate whether an AI response is “good enough” to be used. Think of thresholds as the AI’s quality gate: how well does the output need to align with factuality, task specificity, or user tone before it’s deemed acceptable?
Let’s consider a playful but telling example. If you ask an AI:
“Tell me a story about a mop.”
You might get three different levels of threshold:
Low Threshold: “Once there was a mop. It cleaned floors. The end." (Technically accurate. Completely uninteresting. Functionally useless.)
Mid Threshold: “The mop had dreams of being a dancer, twirling across the linoleum like Fred Astaire. But it was stuck in a janitor’s closet… until one night…" (Creative, engaging. A solid answer.)
High Threshold: “In 1973, amidst the oil crisis, a factory in Detroit built a mop with an experimental polymer head that would later be considered revolutionary. This is the story of how that mop ended up in the Smithsonian…” (Original, researched, deeply structured. Too ambitious, but better.)
Setting and maintaining the right threshold is critical. And to do that reliably, you need more than just one-shot AI output. You need an architecture that can evaluate, refine, and structure, autonomously.
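As a sketch of what a threshold could look like in code: generate, score the candidate, and only release output that clears the bar. The scorer below is a deliberately toy stand-in (a real gate might be a rubric-driven second model call), and `gated_generate`, `THRESHOLD`, and `MAX_ATTEMPTS` are illustrative names, not an established API:

```python
# Sketch of a threshold as a quality gate: only outputs that clear the bar
# are released; everything else is retried, then escalated to a human.
THRESHOLD = 0.7   # illustrative: minimum acceptable score (0 to 1)
MAX_ATTEMPTS = 3  # illustrative: retries before flagging for review

def score_output(text: str) -> float:
    """Toy scorer checking the 'short sentence summary' brief from Test 1.
    A production gate might use a rubric-driven second model call instead."""
    is_short = len(text.split()) <= 40          # short, as requested
    is_single_sentence = text.count(".") <= 1   # one sentence, as requested
    mentions_topic = "Club World Cup" in text   # on-topic for the article
    return (is_short + is_single_sentence + mentions_topic) / 3

def gated_generate(prompt: str, generate) -> str:
    """generate is any callable that takes a prompt and returns model text."""
    for _ in range(MAX_ATTEMPTS):
        candidate = generate(prompt)
        if score_output(candidate) >= THRESHOLD:
            return candidate  # cleared the quality gate
    raise RuntimeError("No output met the threshold; flag for human review")
```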
Enter Asynchronous AI: A Team Model for Machines
Now imagine AI not as a monolithic black box that returns a string of text, but as a distributed asynchronous system, like a team of people with specialised roles:
The Briefer: Interprets the prompt and defines the goals.
The Executor: Actually does the work (writing, coding, summarizing, etc.).
The Reviewer: Checks the output for quality, accuracy, tone, etc.
The Outputter: Packages the final result in the desired format.
This is asynchronous AI: each “role” can be played by separate instances or phases of the model, running sequentially or in parallel, evaluating and improving each other’s output.
It mimics the way high-performance teams work: distributing complexity, enabling specialisation, and introducing checks and balances, all governed by differing threshold levels and standards to ensure the desired deliverables are met.
And just like in a human team, this system doesn’t assume perfection in the first draft, but rather, builds in refinement as a feature, not a patch.
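A minimal sketch of that team in code might chain four model calls, one per role. The role prompts and the `call_role`/`run_team` helpers below are illustrative assumptions, not a prescribed implementation:

```python
# Sketch of the Briefer -> Executor -> Reviewer -> Outputter pipeline, with
# each role as a separate model instance carrying its own instructions.
from openai import OpenAI

client = OpenAI()

def call_role(system_prompt: str, content: str) -> str:
    """One role = one model call with role-specific instructions."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": content},
        ],
    )
    return response.choices[0].message.content

def run_team(task: str) -> str:
    brief = call_role(
        "You are the Briefer. Restate the task as explicit goals, "
        "constraints, and a required output format.", task)
    draft = call_role(
        "You are the Executor. Complete the brief exactly as specified.", brief)
    review = call_role(
        "You are the Reviewer. List factual, tonal, and format problems "
        "with this draft against its brief.",
        f"BRIEF:\n{brief}\n\nDRAFT:\n{draft}")
    return call_role(
        "You are the Outputter. Apply the review notes and return only "
        "the final deliverable.",
        f"DRAFT:\n{draft}\n\nREVIEW NOTES:\n{review}")
```

The Reviewer step is where the threshold logic from earlier naturally plugs in: score its verdict, and loop back to the Executor if the draft falls short.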
Why This Matters: Context, Windows, and Limitations
In a conversational UI, a lot of this happens invisibly. Context is preserved in your chat history. The model remembers your earlier preferences. It self-corrects, adds nuance, and even “learns” over the session (within limits). But that context, the glue holding everything together, doesn’t exist in the same way via API.
When using the API, context windows become a hard constraint. Everything the model needs to understand has to be included in the payload: your prior prompts, any preferences, the response history, all of it. If you don’t manage this carefully, the model responds like it has no memory, because… it doesn’t.
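In practice, “memory” over the API is just a list you maintain and re-send yourself. A minimal sketch, again assuming the OpenAI Python SDK (the `ask` helper and system prompt are illustrative):

```python
# Sketch of simulated memory over the API: the model keeps no state between
# requests, so the full conversation history is re-sent in every payload.
from openai import OpenAI

client = OpenAI()
history = [{"role": "system", "content": "You are an SEO assistant. Be concise."}]

def ask(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=history,  # the payload IS the model's entire memory
    )
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    # In production you'd also trim or summarise older turns here, so the
    # payload stays inside the model's context window.
    return answer
```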
This is where asynchronous, multi-agent, team-like AI becomes not just helpful, but necessary. It allows you to simulate long-term memory, enforce standards, manage context, and execute multi-step reasoning, all without assuming the model will just “get it” in a single shot, because there are multiple attempts to deliver an accurate version.
Final Thoughts: The Path Forward
The future of AI isn’t about pushing harder on single-shot prompt engineering. It’s about orchestrating AI like a team, thinking in systems, and designing workflows where multiple agents with defined roles collaborate, asynchronously, to meet quality thresholds, manage context, and produce reliable outputs.
Hallucinations, inconsistency, and brittle context limitations aren’t just bugs, they’re signs that we’re still thinking too linearly. The solution isn’t just better models. It’s better architecture.
Asynchronous AI is that architecture. And it’s how we’ll go from clever answers to trustworthy systems.