
Architecture · 5 May 2026 · 5 min read

The five layers under every AI feature

A useful way to picture AI is as a five-layer cake, from the power plant at the bottom to the product people actually touch at the top. Here is how we read that stack, and why our work starts at the top.

Andy · Co-Founder · Technology & Vision

Most conversations about AI start at the very top of a tall stack. A team asks for a copilot, or an assistant that clears their invoice backlog overnight, and the conversation stays right there, at the level of the feature. It is worth looking down. Every AI feature rests on five layers, and each one quietly shapes what the layer above it is able to do.

NVIDIA recently described the whole thing as a five-layer cake. We like the picture, because it puts the part everyone argues about, the model, in its proper place: near the top, but not the top, and useless without everything beneath it. Here is how we read the stack from the bottom up, and where our work actually happens.

It starts with power

The bottom layer is energy. Every token a model produces comes down to electrons moving and heat being carried away. That feels a long way from a logistics team that wants fewer manual handoffs, but it is the reason inference is priced the way it is, and the reason capacity gets tight when everyone wants it at once. Energy is the constraint sitting underneath all the others.

Chips, then infrastructure

Above power sit the chips, the specialised processors that turn electricity into computation at scale. Above the chips sits infrastructure: the land, the cooling, the networking, and the work of wiring tens of thousands of those processors into a single machine. NVIDIA calls these AI factories, and the name fits. They do not store information. They produce intelligence on demand.

Two things follow for anyone running a business. The first is that, by NVIDIA's account, this is the largest infrastructure buildout in modern history, and the per-token prices on your invoice are how it gets paid for. The second is more useful: almost nobody needs to operate at these layers. You rent them by the token, and you move on.

Models are a component, not a product

The fourth layer is models, and this is where most people assume the story begins. It does not. Language models are one category among many. There are models for biology and chemistry, for physics and finance, for driving robots and running simulations. For the systems we build, the point worth holding onto is plain: a model is a component. It is general, capable, and completely unaware of how your particular business runs.

The top layer is the only one your customer touches

At the top is the application. The drug-discovery platform, the legal copilot, the system that reconciles a messy ledger while everyone sleeps. It is the one layer a user ever sees, and the only place where the value of everything underneath becomes real. A perfect model wired into a vague application is worth nothing. A modest model inside a sharp, well-built application can change how a team spends its day.

Where Mileon works

We build at the top of the stack, and we are deliberate about it. We do not train foundation models, and we are not about to pour concrete for a data centre. We assemble that top layer for one specific team, treating the layers below as supply we choose with care.

That choice carries weight. Every feature we ship pulls on every layer beneath it, right down to a power plant we will never see. So we make the decisions explicit. Which model does each job. Where it runs. What it is allowed to read, and what it must never touch. A frontier model in the cloud might draft a message. A smaller model on hardware the client owns might handle the records that cannot leave the building. To the user it is one smooth system. Underneath, it is a series of choices we made on purpose.
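One way to make those decisions explicit is to write them down as a routing table rather than leave them implicit in prompts. The sketch below is purely illustrative and assumes a three-level sensitivity scheme; the names `Sensitivity`, `Route`, `Task`, `route_task`, the model labels `frontier-model` and `small-model`, and the data sources `crm_notes`, `templates` and `ledger_records` are all hypothetical, not part of any real product or of how we necessarily implement this for a given client. Each route records which model does the job, where it runs, and what it may read; any read outside that list is refused.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    """Hypothetical data classes; a real deployment would define its own."""
    PUBLIC = "public"
    INTERNAL = "internal"
    RESTRICTED = "restricted"  # records that must not leave the client's building


@dataclass(frozen=True)
class Route:
    model: str                  # hypothetical model label, not a real product name
    location: str               # where inference runs: "cloud" or "on-prem"
    may_read: tuple[str, ...]   # the only data sources this route may read


@dataclass(frozen=True)
class Task:
    name: str
    sensitivity: Sensitivity
    sources: tuple[str, ...]    # data sources the task wants to read


# Hypothetical routing table: one explicit decision per sensitivity level.
ROUTES = {
    Sensitivity.PUBLIC: Route("frontier-model", "cloud", ("crm_notes", "templates")),
    Sensitivity.INTERNAL: Route("frontier-model", "cloud", ("crm_notes",)),
    Sensitivity.RESTRICTED: Route("small-model", "on-prem", ("ledger_records",)),
}


def route_task(task: Task) -> Route:
    """Pick a model and location for a task; refuse any read outside the allow-list."""
    route = ROUTES[task.sensitivity]
    forbidden = [s for s in task.sources if s not in route.may_read]
    if forbidden:
        raise PermissionError(f"{task.name} may not read: {', '.join(forbidden)}")
    return route


if __name__ == "__main__":
    draft = Task("draft_message", Sensitivity.PUBLIC, ("crm_notes",))
    records = Task("reconcile_records", Sensitivity.RESTRICTED, ("ledger_records",))
    print(route_task(draft).location)    # cloud
    print(route_task(records).location)  # on-prem
```

Keeping the table as data means the choices can be reviewed in one place, and a task that tries to read something it should not fails loudly instead of quietly sending restricted records to the cloud.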

The teams that get the most out of AI are not the ones chasing the newest model each month. They are the ones who treat the whole stack as something to design rather than something to accept. That is the job we take on: choosing the right layer for each part of the work, and connecting them into a system the team can actually run.
