Why Manufacturers Are Moving AI Inference to the Edge: Hardware Considerations

Michael Wellis, 24 seconds ago

The default assumption for a decade was that AI runs in the cloud. Data goes up, a model thinks, an answer comes back. For a recommendation engine that is fine. For a vision system inspecting parts on a line running at full speed, or an anomaly detector on a subsea asset with no reliable uplink, it is the wrong architecture. The question manufacturers are actually answering in 2026 is not whether AI is useful on the factory floor. It is where the inference should happen, and that turns out to be a hardware question more than a software one.

The real driver: determinism, not just speed

Cloud inference optimises for throughput and assumes asynchronous processing. Industrial edge inference often needs the opposite: a bounded, predictable response time, every cycle, regardless of load. A vision inspection that averages fast but occasionally stalls is not acceptable when a stall means a defective part ships or a line stops. The decisive metric at the edge is frequently worst-case latency and jitter, not average performance, and that requirement shapes which hardware is appropriate.
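To make the difference between average and worst-case behaviour concrete, here is a minimal Python sketch that summarises a trace of per-cycle inference latencies against a deadline. The function name `latency_report`, the 10 ms deadline, and the sample trace are all hypothetical values chosen for illustration, not measurements from any real line.

```python
import statistics

def latency_report(samples_ms, deadline_ms):
    """Summarise per-cycle inference latencies against a hard deadline."""
    ordered = sorted(samples_ms)
    n = len(ordered)
    # Nearest-rank 99th percentile, computed with integer arithmetic: ceil(0.99 * n).
    rank = max(1, (99 * n + 99) // 100)
    return {
        "mean_ms": statistics.fmean(ordered),
        "p99_ms": ordered[rank - 1],
        "worst_ms": ordered[-1],
        "jitter_ms": ordered[-1] - ordered[0],  # peak-to-peak spread
        "deadline_misses": sum(1 for s in ordered if s > deadline_ms),
    }

# Hypothetical trace: 999 fast cycles and a single stall.
samples = [4.0] * 999 + [40.0]
report = latency_report(samples, deadline_ms=10.0)
for name, value in report.items():
    print(f"{name}: {value}")
```

In this made-up trace the mean and the 99th percentile both look healthy, and only the worst-case figure and the miss count reveal the stall. That is why a report built on averages alone can hide exactly the event that matters on the line.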

The hardware options, and their trade-offs

There is no single “edge AI chip.” There is a spectrum, and the right point on it depends on model complexity, latency requirements, power budget, and volume.

FPGAs, and SoC devices that combine an FPGA fabric with a CPU, implement the model as a hardware datapath rather than a sequence of instructions. That enables deep parallelism, deep pipelining, and near-constant latency that holds under load, which is exactly what deterministic industrial workloads need. The cost is engineering effort: HDL expertise, longer development, and timing closure. Platforms like Zynq sit here. Compared with a microcontroller, the difference is large enough to be decisive: a benchmarked comparison of FPGAs and microcontrollers for industrial edge AI shows how sharply inference latency and jitter diverge between the two under real load.

Often the strongest design is hybrid: an FPGA or accelerator handling deterministic inference or preprocessing, while a microcontroller manages system logic and communication. The architecture is chosen to close the feedback loop under real conditions, not to maximise a benchmark.

Consideration	MCU / TinyML	FPGA / SoC	AI accelerator / edge GPU
Latency under load	Grows with model size	Near-constant, deterministic	High throughput, more variable
Power / thermal	Lowest	Moderate	Highest
Model complexity	Compact only	Scales with logic resources	Largest models
Integration effort	Lowest	High (HDL, timing closure)	Moderate
Best fit	Low-power, simple models	Strict real-time industrial	Heavy models, power available
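
The "Best fit" row of the table can be read as a rough decision rule. The sketch below encodes one possible reading in Python; the `Requirements` fields, the function name `suggest_hardware`, and the order in which the rules are checked are assumptions made for illustration, not a selection method taken from any vendor or standard.

```python
from dataclasses import dataclass

@dataclass
class Requirements:
    hard_real_time: bool     # a missed deadline is a failure, not a slowdown
    compact_model: bool      # model fits a microcontroller-class memory budget
    power_constrained: bool  # battery-powered or tight thermal envelope

def suggest_hardware(req: Requirements) -> str:
    """Map requirements onto the table's 'Best fit' row (illustrative only)."""
    if req.hard_real_time:
        return "FPGA / SoC"
    if req.compact_model:
        return "MCU / TinyML"
    if not req.power_constrained:
        return "AI accelerator / edge GPU"
    # Large model on a tight power budget: no single column fits.
    return "no clear fit: consider a hybrid design or a smaller model"

# Hypothetical workloads.
inspection = Requirements(hard_real_time=True, compact_model=False, power_constrained=False)
sensor_node = Requirements(hard_real_time=False, compact_model=True, power_constrained=True)
print(suggest_hardware(inspection))
print(suggest_hardware(sensor_node))
```

A real selection would weigh volume, cost, and team expertise as well, so treat this as a way to make the table's logic explicit rather than as a procedure to follow.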

The decisions that bite later

Two hardware-adjacent choices cause most of the regret. The first is model optimisation. A model trained in 32-bit floating point usually has to be pruned and quantised, often to 8-bit integer, to fit the memory and compute envelope of edge hardware, and that work has to be co-designed with the target device rather than bolted on afterward. The second is thermal and power headroom: an edge device that throttles in a hot enclosure or drains a battery faster than planned fails in the field even if it passed on the bench.
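
As a rough illustration of what quantising to 8-bit integer involves, the following Python sketch applies symmetric per-tensor quantisation to a handful of weights. This is one simple scheme among several; production toolchains typically add calibration data, per-channel scales, or quantisation-aware training. The weight values and function names are hypothetical.

```python
def quantise_int8(weights):
    """Symmetric per-tensor quantisation of floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantise(q, scale):
    """Recover approximate float values from the int8 representation."""
    return [v * scale for v in q]

weights = [0.50, -1.27, 0.03, 0.90, -0.64]  # hypothetical trained weights
q, scale = quantise_int8(weights)
restored = dequantise(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("int8 values:", q)
print("scale:", scale)
print("largest round-trip error:", max_error)
```

Each weight drops from four bytes to one, at the price of a rounding error of at most half the scale per weight. Whether that error is tolerable depends on the model and the target device's arithmetic, which is the reason the work cannot be bolted on at the end.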

Design the edge AI hardware in tandem with the model and the operating environment, so that the device, the quantised model, and the thermal budget are balanced against each other rather than discovered to be incompatible during integration. Teams that choose the silicon first and adapt the model later usually end up redoing one or the other.
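
A back-of-the-envelope version of that balancing act can be sketched in a few lines of Python. It uses the standard steady-state estimate of junction temperature, ambient temperature plus dissipated power times junction-to-ambient thermal resistance, alongside a simple memory-fit check. Every number below (model size, RAM, power draw, thermal resistance, temperature limit) is a hypothetical placeholder, and `check_budget` is an illustrative name.

```python
def junction_temp_c(ambient_c, power_w, theta_ja_c_per_w):
    """Steady-state estimate: T_j = T_ambient + P * theta_JA."""
    return ambient_c + power_w * theta_ja_c_per_w

def check_budget(model_bytes, ram_bytes, ambient_c, power_w, theta_ja_c_per_w, tj_max_c):
    """Check memory fit and thermal headroom for one operating condition."""
    tj = junction_temp_c(ambient_c, power_w, theta_ja_c_per_w)
    return {
        "model_fits": model_bytes <= ram_bytes,
        "junction_c": tj,
        "thermal_headroom_c": tj_max_c - tj,  # negative means over the limit
    }

# Hypothetical device: same model and power draw, two ambient temperatures.
device = dict(model_bytes=400_000, ram_bytes=512_000,
              power_w=2.0, theta_ja_c_per_w=20.0, tj_max_c=85.0)
bench = check_budget(ambient_c=25.0, **device)   # lab bench
field = check_budget(ambient_c=60.0, **device)   # hot enclosure
print("bench:", bench)
print("field:", field)
```

With these placeholder figures the design has headroom on the bench and exceeds its limit in the enclosure, even though nothing about the model or the silicon changed. Running the same check for each candidate device and quantised model size is one way to surface incompatibilities before integration.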

The bottom line

Manufacturers are moving inference to the edge because latency, connectivity, privacy, and cost on the factory floor all point in the same direction. But “edge AI” is not a product you buy; it is a hardware architecture you design, somewhere on a spectrum from a quantised model on a microcontroller to a deterministic datapath on an FPGA. Decide based on worst-case latency, power and thermal limits, and model complexity, and co-design the model with the silicon from the start. The projects that struggle are the ones that treated the edge as a place to deploy a cloud model unchanged. The edge does not work that way, and the hardware is where that becomes obvious.
