// THESIS

Why industrial robotics needs a different data layer, and what the moat looks like.

A working set of essays on industrial data, simulation and evaluation, and the physical AI infrastructure layer. Updated as the thinking sharpens. The numbered list below is the current set.

// THE ESSAYS

  • ESSAY 01

    The industrial data desert and why household egocentric does not transfer

    Foundation labs have trained robots on millions of hours of household video. They can fold laundry and load dishwashers. None of that transfers to a refinery floor, a fabrication yard, or a manufacturing line. The data that does transfer does not exist on the internet, is not in any public dataset, and is gated behind plant security, hazard protocols, and procurement cycles that take a year to navigate. This essay walks through the actual numbers and the gap.

  • ESSAY 02 (writing now)

    Process context is the missing label

    Labels in current robotics datasets describe what the camera sees. They do not describe what the plant's systems saw. Without MES, historian, SCADA, ERP, and DCS context synced to the frame, a video of a valve being turned is just video. With it, that frame becomes a state-action pair with a known process step, a known chemical, a known safety protocol, and a known operator. Same frame, completely different training signal.

  • ESSAY 03 (writing now)

    Sim + Eval is the moat in physical AI

    Data is table stakes. The moat is the simulator that matches the vertical's physics, layouts, and SOPs, plus the evaluation harness that scores policy performance against real operator benchmarks from the same plant. Whoever owns that pair for a vertical owns the data flywheel for that vertical. Whoever owns the flywheel becomes the deployment partner.

  • ESSAY 04 (writing now)

    One vertical first: how we are choosing

    We are evaluating three verticals with design partners: discrete manufacturing, construction and inspection, and oil and gas. The selection criterion is commercial pull, measured by concrete commitments from labs and operators. The vertical that pulls hardest gets the simulator, the harness, and the deployment partner role first. This essay walks through how we are scoring the three.

  • ESSAY 05 (writing now)

    What VLA labs should ask us before they pay us

    A short, candid set of questions a foundation lab evaluating Trekion should ask, with the honest answers. Use this essay as a pre-call briefing.

// ESSAY 01

The industrial data desert and why household egocentric does not transfer

The state of public robotics data, May 2026

Trekion is the industrial data, simulation, and evaluation infrastructure layer for physical AI. We sit between foundation robotics labs and the plants where their models will eventually deploy. From that seat, the public data landscape looks like this.

Open X-Embodiment (Google DeepMind, 2023, expanded through 2025) is the canonical robotics demonstration corpus. Over one million trajectories across 22 robot platforms, pooled from 60 underlying datasets at 21 institutions, with 527 distinct manipulation skills. Excellent for embodiment generalization research. Heavily weighted toward lab manipulation tasks: blocks, cups, kitchen surfaces, generic objects on flat tables. Almost zero industrial process workflows.

Egocentric-1M (Build AI, April 2026) is the largest first-person robotics-relevant dataset to date. Roughly one million hours of head-mounted camera footage from over fourteen thousand factory workers across Southeast Asia, captured on Build AI's custom glasses inside real production environments: assembly lines, sorting, packaging, machining. Released Apache 2.0. Important step, and it does cover factory work. What it does not capture is what the plant's own systems saw at the same moment. There is no synced MES tag, no historian timestamp, no SCADA reading, no DCS state attached to the frames. The video knows what the operator did; it does not know what the plant did.

AGIBOT World 2026 is an open dataset from AgiBot. Broad scenarios, good benchmark for generalist policies.

NVIDIA Cosmos is a platform of world foundation models for physical AI. It generates synthetic training and evaluation data from simulated worlds. Complementary, not competitive. Most labs running serious industrial training pipelines should use Cosmos for volume and synthetic perturbation, and pair it with real ground-truth capture for fidelity.

Add it all up. The total quantity of high-quality real physical-interaction data globally, as of mid-2026, sits under 500,000 hours by most industry estimates. Generalist embodied policies need tens of millions of hours of relevant interaction to reach the breadth of behavior foundation lab teams are targeting. Roughly speaking, the world has at most one twentieth of the data the labs need.
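The one-twentieth figure is a back-of-envelope ratio, and a quick check with the paragraph's own numbers shows which end of the range it sits at. This is a minimal sketch that assumes "tens of millions" means somewhere between 10 and 90 million hours; that range is our reading, not a sourced figure.

```python
# Back-of-envelope check of the data-gap ratio using the essay's figures.
# Assumption: "tens of millions of hours" is read as 10M to 90M hours.
AVAILABLE_HOURS = 500_000      # upper bound on real physical-interaction data, mid-2026
NEEDED_LOW = 10_000_000        # assumed low end of "tens of millions"
NEEDED_HIGH = 90_000_000       # assumed high end of "tens of millions"


def coverage(available: float, needed: float) -> float:
    """Fraction of the needed hours that currently exist."""
    return available / needed


best_case = coverage(AVAILABLE_HOURS, NEEDED_LOW)    # most optimistic coverage
worst_case = coverage(AVAILABLE_HOURS, NEEDED_HIGH)  # most pessimistic coverage

print(f"best case: 1/{round(1 / best_case)}, worst case: 1/{round(1 / worst_case)}")
```

Because 500,000 hours is an upper bound and 10 million is the smallest "tens of millions", one twentieth is the optimistic end. Under the assumed range, the real gap could be closer to one in a hundred and eighty.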

That is the data desert. Now the question that matters more: of the data that does exist, what fraction transfers to industrial deployment? Almost none. Here is why.

Why household tasks do not transfer to industrial

A foundation policy trained on household egocentric video learns three things well. It learns spatial reasoning over flat surfaces and clutter. It learns dexterous manipulation primitives for everyday objects. It learns natural language grounding for common verbs (pick up, place, open, close). All useful. None sufficient for industrial deployment.

What industrial deployment also requires:

Process state. The robot is not just moving a valve. It is opening a valve that brings a line from 0 bar to 4.2 bar, in a system where the next downstream tags are a flow meter reading and a gas concentration alarm. The action only makes sense in the context of the plant state. Without that context in the training signal, the policy cannot learn the correct action conditioning.

Safety sequencing. Industrial SOPs are deeply ordered. A correct gas detection round in a refinery involves a fixed sequence of substeps in a fixed order, with a known set of permitted deviations. A household kitchen task has none of this structure baked in. A policy trained only on household data has no prior over operator-grade sequencing.

Tools and surfaces. A pipe rack is not a corridor. A control room is not a kitchen. A 30-year-old refinery floor has corrosion, fluid spills, vibration, varying lighting, and process noise that no kitchen captures. Sim-to-real transfer that works for tabletop block manipulation does not solve transfer to plant floors.

Operator conventions. Plant operators do tasks one way for hard reasons: lockout-tagout, two-person rules, vendor-specific procedures. Policies that have never seen these conventions cannot reliably propose actions a plant safety officer will approve.
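The safety sequencing requirement can be made concrete as a conformance check over an operator's step sequence. This is a minimal sketch under simplifying assumptions. An SOP is modeled as an ordered tuple of step names. The only permitted deviation is an immediate repeat of certain steps. The step names are hypothetical and loosely modeled on one station of a refinery gas detection round. Real SOPs carry richer deviation rules than this.

```python
from typing import Sequence

# Hypothetical SOP for one station of a gas detection round (illustrative names only).
STATION_SOP = (
    "open_sample_valve",
    "take_detector_reading",
    "log_reading",
    "close_sample_valve",
)

# Hypothetical permitted deviation: the operator may immediately repeat a reading.
REPEATABLE_STEPS = frozenset({"take_detector_reading"})


def conforms(
    observed: Sequence[str],
    sop: Sequence[str] = STATION_SOP,
    repeatable: frozenset = REPEATABLE_STEPS,
) -> bool:
    """True if `observed` follows `sop` in order with no skipped or extra steps,
    except immediate repeats of steps listed in `repeatable`."""
    i = 0        # index of the next expected SOP step
    prev = None  # last step that matched the SOP
    for step in observed:
        if i < len(sop) and step == sop[i]:
            prev = step
            i += 1
        elif step == prev and step in repeatable:
            continue  # permitted deviation: immediate repeat of an allowed step
        else:
            return False  # out of order, skipped ahead, or unknown step
    return i == len(sop)  # every SOP step must have been completed


# A repeated reading is allowed; logging before reading is not.
print(conforms(["open_sample_valve", "take_detector_reading", "take_detector_reading",
                "log_reading", "close_sample_valve"]))
print(conforms(["open_sample_valve", "log_reading", "take_detector_reading",
                "close_sample_valve"]))
```

A policy trained only on household data has no prior that would make it respect a check like this. That is the point of the safety sequencing requirement.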

This is why an Open X-Embodiment foundation plus an Egocentric-1M fine-tune is not enough to deploy a robot into a chemical plant. The transfer gap is structural.

What industrial reality actually looks like

Three examples make this concrete.

A gas detection round on an oil refinery. An operator walks a fixed route. At each station, the operator turns a sample valve, takes a reading on a portable gas detector, logs the reading, and rotates the valve closed. The valve actions are simple. The state that makes them meaningful is not in the camera. It lives in the plant historian (the pressure reading before and after each valve action, the gas concentration recorded by the portable detector, the previous round's value for delta tracking) and in the DCS (the supervisory state of the upstream process unit). A useful training signal pairs the operator's action with these synchronized process readings. The video alone is just video.

A line changeover on an automotive assembly line. Parts swap, fixtures index, a quality gate runs. The operator's action sequence is dictated by the MES production order (which model, which trim, which fixture preset, which torque value). The same physical gesture means different things on different changeovers. Strip the MES context and the dataset has lost the information that made the action correct.

A structural inspection on a construction site. An inspector walks a route, captures defects against a checklist, syncs the checklist to ERP. The action is "look here, judge X." The training signal that makes it useful is the linkage to the ERP record (which structural element, which spec, which prior inspection round). Without that, the data is generic walking-with-camera content.
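In the gas detection round, pairing an action with historian state is essentially a time join. Each valve action is matched to the latest historian reading at or before its start and end. This is a minimal sketch that assumes the video and historian already share a synchronized clock, which in practice is a real engineering task of its own. The names `Sample`, `as_of` and `pair_valve_action` are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Sample:
    t: float      # seconds on a clock shared with the video (assumed already synchronized)
    value: float  # e.g. line pressure in bar


def as_of(samples: List[Sample], t: float) -> Optional[Sample]:
    """Latest sample recorded at or before t; `samples` must be sorted by t."""
    idx = bisect_right([s.t for s in samples], t)
    return samples[idx - 1] if idx else None


def pair_valve_action(
    pressure: List[Sample],
    t_start: float,
    t_end: float,
    gas_ppm: float,
    prev_round_gas_ppm: Optional[float],
) -> dict:
    """Attach historian context to one valve action spanning [t_start, t_end]."""
    before = as_of(pressure, t_start)
    after = as_of(pressure, t_end)
    return {
        "pressure_before_bar": before.value if before else None,
        "pressure_after_bar": after.value if after else None,
        "gas_ppm": gas_ppm,
        # Delta against the previous round supports trend tracking across rounds.
        "gas_delta_vs_prev_round": (
            None if prev_round_gas_ppm is None else gas_ppm - prev_round_gas_ppm
        ),
    }


# Example: the valve is turned between t=6 s and t=12 s while the line goes from 0 to 4.2 bar.
pressure = [Sample(0, 0.0), Sample(5, 0.0), Sample(10, 4.2), Sample(15, 4.2)]
record = pair_valve_action(pressure, 6, 12, gas_ppm=12.0, prev_round_gas_ppm=10.0)
print(record)
```

An as-of join is one common choice because historians sample at their own rate, not at the video frame rate. `pandas.merge_asof` implements the same idea for tabular data. The MES and ERP cases work the same way, with the production order or inspection record in effect at the action's timestamp taking the place of the pressure reading.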

All three workflows have a common shape. Human action plus camera plus synced plant or facility system state equals a useful training episode. Human action plus camera, with no synced state, equals weakly labeled video that may help a model generalize but will not get it past a deployment evaluation.
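The common shape above reduces to a simple classification rule. This is a minimal sketch that assumes a hypothetical episode record. The field names are illustrative and are not Trekion's delivered schema. The set of systems is taken from the ones this essay names.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Plant and facility systems named in this essay.
PLANT_SYSTEMS = frozenset({"MES", "historian", "SCADA", "ERP", "DCS"})


@dataclass
class Episode:
    """Hypothetical episode record; field names are illustrative only."""
    actions: List[str]  # operator actions, e.g. ["open_sample_valve", ...]
    frames: List[str]   # camera frame ids
    # System name -> readings synced to this episode, e.g. {"historian": {"pressure_bar": 4.2}}
    synced_state: Dict[str, dict] = field(default_factory=dict)


def classify(ep: Episode) -> str:
    """Human action + camera + synced system state -> training episode.
    Human action + camera, with no synced state -> weakly labeled video."""
    if not ep.actions or not ep.frames:
        return "incomplete"
    synced = {
        name for name, readings in ep.synced_state.items()
        if name in PLANT_SYSTEMS and readings  # ignore unknown systems and empty readings
    }
    return "training_episode" if synced else "weakly_labeled_video"


print(classify(Episode(["open_sample_valve"], ["f0"], {"historian": {"pressure_bar": 4.2}})))
print(classify(Episode(["open_sample_valve"], ["f0"])))
```

The rule is deliberately blunt. In practice the useful question is also which systems are synced and how tightly, but even this binary split separates deployable training signal from generic footage.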

The access problem

Even labs that recognize the gap and want to solve it run into the access problem. Industrial environments are gated. Plant security clearances are not casual. Hazard protocols vary by site and vendor. Insurance, IP, and safety officer sign-off are required before any data can leave the floor. Procurement cycles in oil and gas, pharma, and discrete manufacturing run six to twelve months on routine contracts.

A foundation lab that decides today to capture data in three chemical plants will, in the best case, see its first useful episode delivered nine months from now. The slow part is not the capture itself. It is the better part of a year of relationship work, certification, NDAs, and on-site operator training that has to happen before the first camera is even mounted.

This is the asymmetry. The labs that need the data the most cannot economically build the access pipeline. The companies that have the access do not have the data infrastructure or the training-format expertise. The gap between them is the wedge Trekion was founded to close.

What this means for foundation labs

Three concrete implications.

First, household egocentric does not get a model to industrial deployment. Egocentric-1M, which does cover factory work, is a useful upstream signal. It is still not a substitute for industrial capture with synced process context. A policy that has trained only on household and lab demonstrations will not pass a plant safety officer's evaluation.

Second, scale of data matters less than fidelity of context. Ten thousand hours of process-contextualized capture from a refinery is more useful for training a refinery deployment policy than a million hours of generic egocentric. The data desert is a fidelity problem, not just a volume problem.

Third, owning the access pipeline is the moat at the data layer. Capturing industrial workflows with process context, at scale, is hard not because the cameras are special. It is hard because the access takes a year. Once a partner builds that access, they hold a structural advantage.

What Trekion delivers

Multimodal capture inside real plants. Every episode synced with MES, historian, SCADA, ERP, and DCS. Delivered as state-action pairs, trajectories, and scene and task descriptions, in the formats foundation labs train on. The schema and a sample episode are available on request.

We deliver this for one vertical at a time, with depth. The simulation and evaluation stack on top of the data layer turns one vertical's data flywheel into the deployment partner relationship for that vertical. That is the wedge.

If you are training a vision-language-action model for industrial deployment, talk to us before you commit a year of capture from your own side.