[0] From STEP Files to Self-Reasoning Buildings: The Pipeline To Agentic Digital Twins
A technical deep-dive for practitioners in Digital Twins, STEP/EXPRESS, and local LLM pipelines.
The path to agentic digital twins runs through ISO standards (STEP files and EXPRESS schemas) rather than multimodal LLMs, enabling buildings to autonomously reason about sensor events and validate against design intent. A practical pipeline transforms CAD geometry into structured data that local LLMs validate through Jinja2-templated prompts, triggering autonomous actions like risk escalation—all while maintaining data sovereignty and predictable costs. This requires clean data exports and semantic infrastructure, but shifts buildings from reactive maintenance to continuous assurance with machine-actionable insights.
*A follow-up to "Diving Back Into the Foundations to Build the Future of Agentic Digital Twins"*
In my previous post, I argued that the path to agentic digital twins runs through the unglamorous terrain of EXPRESS schemas and STEP files. Some readers pushed back: "Why dig into 1990s ISO standards when we have multimodal LLMs that can interpret images, CAD screenshots, and natural language?"
Here's my answer: because I've built the alternative, and the core of it works in a sandbox.
This post documents the pipeline I've been developing — one that transforms static STEP geometry into a substrate for autonomous reasoning. No proprietary rule engines. No cloud-locked AI. Just ISO standards, structured prompts, and local inference.
[1] The Strategic Misunderstanding
The industry keeps framing this as "AI for BIM." That framing is backwards.
BIM models are not the destination. They're raw material. The 3D geometry sitting in your CDE right now contains decades of encoded design intent, spatial relationships, and product specifications — all trapped in formats optimised for human viewing, not machine reasoning.
When that data becomes machine-interpretable and your validation logic becomes machine-actionable, the calculus changes entirely:
- Continuous assurance replaces periodic audits
- Predictive intervention replaces reactive maintenance
- Autonomous triage replaces manual escalation
- The building becomes an active participant in its own lifecycle
This isn't speculative. It's a natural extension of what I'm about to show you.
[2] The Technical Pipeline
Here's the data flow from raw model to autonomous reasoning:
```
3D Model (CAD/BIM)
        |
        v  Export
STEP File (ISO 10303-21)
        |
        v  Parsed via EXPRESS schema
EXPRESS Interpreter (ISO 10303-11)
    -> Extract entities, attributes,
       relationships, geometry
        |
        v  Structured JSON
Jinja2 Prompt Builder
    -> Context-aware prompts
    -> Rule definitions
    -> Expected output schemas
        |
        v  Send to LLM
Local LLM (Ollama)
    -> Data validation
    -> Risk assessment
    -> Pattern detection
    -> Compliance reasoning
        |
        v  Structured results
Validation Engine
    -> Aggregate findings
    -> Flag issues
    -> Suggest actions
        |
        v  Feeds into
Agentic Digital Twin
    -> Sensor-triggered evaluation
    -> Risk reasoning chains
    -> Automated escalation
    -> Self-updating state
```
Each layer serves a specific purpose. The STEP file provides vendor-neutral geometry and product structure. The EXPRESS schema gives us the semantics: not just "this is a wall," but the wall's material composition, fire rating, spatial boundaries, and relationships to adjacent systems. Jinja2 templates let us generate prompts that carry the relevant context and specify the expected output structure. The local LLM reasons about the data without sending proprietary information to external APIs.
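To make the first two layers concrete, here is a minimal sketch of turning the DATA section of a STEP file into structured JSON. It is not the parser used in this pipeline and not a full ISO 10303-21 implementation: it assumes one simple instance per line and ignores multi-line records and complex instances. The entity names (`PLANT_ROOM`, `PIPE_SEGMENT`, `CONTAINED_IN`) and the function name `parse_data_section` are hypothetical and do not come from any real schema.

```python
import json
import re

# Assumption: each instance sits on one line, e.g. #2=PIPE_SEGMENT('P-01',$);
INSTANCE_RE = re.compile(r"^#(\d+)\s*=\s*([A-Z0-9_]+)\s*\((.*)\)\s*;\s*$")
STRING_RE = re.compile(r"'(?:[^']|'')*'")  # Part 21 strings, '' is an escaped quote
REF_RE = re.compile(r"#(\d+)")


def parse_data_section(step_text: str) -> list[dict]:
    """Extract simple entity instances from the DATA section of a STEP file."""
    entities = []
    in_data = False
    for raw_line in step_text.splitlines():
        line = raw_line.strip()
        if line == "DATA;":
            in_data = True
            continue
        if line == "ENDSEC;":
            in_data = False
            continue
        if not in_data:
            continue
        match = INSTANCE_RE.match(line)
        if match is None:
            continue  # anything more complex is out of scope for this sketch
        instance_id, entity_type, raw_args = match.groups()
        # Blank out string literals so text like 'Pipe #7' is not read as a reference.
        args_without_strings = STRING_RE.sub("''", raw_args)
        entities.append({
            "id": int(instance_id),
            "type": entity_type,
            "raw_args": raw_args,
            "refs": [int(r) for r in REF_RE.findall(args_without_strings)],
        })
    return entities


# Hypothetical sample file, for illustration only.
SAMPLE = """ISO-10303-21;
HEADER;
ENDSEC;
DATA;
#1=PLANT_ROOM('R-01','Plant Room');
#2=PIPE_SEGMENT('P-01','Pipe #7',$);
#3=CONTAINED_IN('C-01',(#2),#1);
ENDSEC;
END-ISO-10303-21;
"""

print(json.dumps(parse_data_section(SAMPLE), indent=2))
```

The `refs` list is what lets a downstream step walk relationships between instances; interpreting what each argument means still requires the EXPRESS schema.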
The validation engine is where things get interesting. Today, it aggregates LLM outputs and flags anomalies. Tomorrow, it becomes the orchestration layer for autonomous agents.
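As an illustration of the prompt-builder layer, the sketch below renders a prompt from structured entity data with Jinja2. The template wording, the variable names, and the helpers `build_prompt` and `template_version` are assumptions made for this example, not the templates used in the pipeline. `StrictUndefined` is a real Jinja2 option that makes a missing variable fail loudly instead of rendering as an empty string.

```python
import hashlib
import json

from jinja2 import Environment, StrictUndefined

# Hypothetical template: the wording and variable names are illustrative only.
PROMPT_TEMPLATE = """\
You are validating building model data against a rule.
Rule: {{ rule }}
Entities (JSON):
{{ entities_json }}
Respond with a single JSON object using exactly these keys: {{ output_keys | join(", ") }}.
"""

# StrictUndefined raises on a missing variable instead of rendering "".
env = Environment(undefined=StrictUndefined)


def template_version(source: str) -> str:
    """Short content hash, so each result can be traced to a template revision."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()[:12]


def build_prompt(rule: str, entities: list[dict], output_keys: list[str]) -> str:
    """Render the prompt from structured entity data."""
    return env.from_string(PROMPT_TEMPLATE).render(
        rule=rule,
        entities_json=json.dumps(entities, indent=2, sort_keys=True),
        output_keys=output_keys,
    )


prompt = build_prompt(
    rule="Every pipe segment must declare a material.",
    entities=[{"id": 2, "type": "PIPE_SEGMENT", "material": None}],
    output_keys=["entity_id", "status", "reason"],
)
print(template_version(PROMPT_TEMPLATE))
print(prompt)
```

Storing the template hash alongside every LLM result is one simple way to tell whether a change in findings came from the data or from the prompt.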
[3] Why Local LLMs Matter
I run this pipeline against Ollama, not cloud APIs. The reasons are practical:
**Data sovereignty.** Construction and facilities data often contains commercially sensitive information — equipment specifications, spatial layouts, operational patterns. Sending that to external inference endpoints creates legal and competitive exposure that most asset owners haven't fully evaluated.
**Latency.** Agentic systems need to reason in loops. A sensor event triggers a query, which triggers another query, which might trigger an action. Round-trip latency to cloud endpoints accumulates. Local inference keeps the loop tight.
**Cost predictability.** When you're running continuous validation across a building's lifecycle, per-token pricing becomes a liability. Local inference has upfront compute costs but predictable marginal economics.
**Iteration speed.** I break things constantly in development. Local models let me experiment without burning through API credits or hitting rate limits.
The tradeoff is capability. Current local models (7B-70B parameter range) can't match frontier model performance on complex reasoning. But for structured validation tasks with well-defined schemas and constrained outputs? They're more than adequate.
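For readers who have not used Ollama, here is a minimal sketch of a constrained-output call against its local HTTP API using only the standard library. The endpoint `/api/generate` and the `stream`, `format`, and `options` fields are part of Ollama's API; the model name `llama3`, the timeout, and the function names are assumptions for this example, and the call only works with an Ollama server running locally.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build the HTTP request without sending it (model name is an assumption)."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,   # return one JSON body instead of a token stream
        "format": "json",  # ask Ollama to constrain the output to valid JSON
        "options": {"temperature": 0},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3", timeout: float = 120.0) -> dict:
    """Send the prompt to the local server and parse the model's JSON answer."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # The generated text is in the "response" field; it can still be malformed,
    # so callers should be ready for json.JSONDecodeError here.
    return json.loads(body["response"])
```

Nothing in this request leaves the machine, which is the data sovereignty point above in practical terms.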
[4] A Concrete Scenario
Let me make this tangible.
A moisture sensor in a plant room detects a leak. In a conventional setup, this generates an alert. A facilities manager receives it, pulls up drawings, mentally traces the spatial relationships, assesses risk, and decides whether to escalate.
In an agentic setup, the digital twin initiates a reasoning chain:
1. **Sensor triggers event** — The agent receives structured data: location coordinates, moisture level, timestamp.
2. **Entity retrieval** — The agent queries the STEP-derived model for entities within a defined radius: pipes, electrical conduits, equipment, penetrations.
3. **Schema interpretation** — Using the EXPRESS schema, the agent understands not just proximity but functional relationships. That conduit isn't just "nearby" — it carries the main distribution for this zone.
4. **Prompt generation** — Jinja2 assembles a context-aware prompt with the structured entity data, thresholds, and risk assessment criteria.
5. **LLM evaluation** — The model reasons about proximity, material properties (is the conduit sealed?), containment measures (is there a drain nearby?), and historical patterns if available.
6. **Autonomous action** — If risk exceeds threshold: raise alarm with severity classification, isolate affected circuit if authorised, log event with full reasoning chain for audit, update digital twin state, generate structured action plan for maintenance dispatch.
This entire sequence can execute in seconds. The facilities manager stays in the loop, but as an oversight function rather than a bottleneck.
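The skeleton of that chain can be sketched as follows, under heavy simplification. Entity retrieval is a plain distance filter, and the prompt and LLM steps are replaced by an injected `evaluate_risk` callable so the control flow can be read and tested on its own. All names, the 3 m radius, and the 0.7 risk threshold are hypothetical values chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable

Point = tuple[float, float, float]  # metres, in model coordinates


@dataclass(frozen=True)
class Entity:
    entity_id: int
    kind: str
    position: Point


@dataclass(frozen=True)
class SensorEvent:
    position: Point
    moisture_pct: float
    timestamp: str


def entities_within(event: SensorEvent, entities: list[Entity], radius_m: float) -> list[Entity]:
    """Step 2: entities inside the radius, nearest first."""
    hits = [e for e in entities if math.dist(e.position, event.position) <= radius_m]
    return sorted(hits, key=lambda e: math.dist(e.position, event.position))


def run_chain(
    event: SensorEvent,
    entities: list[Entity],
    evaluate_risk: Callable[[SensorEvent, list[Entity]], float],
    radius_m: float = 3.0,        # assumed search radius
    risk_threshold: float = 0.7,  # assumed escalation threshold
) -> dict:
    """Steps 1, 2 and 6; evaluate_risk stands in for steps 3 to 5."""
    nearby = entities_within(event, entities, radius_m)
    risk = evaluate_risk(event, nearby)
    actions = ["log_event"]  # every event is logged for audit
    if risk > risk_threshold:
        actions += ["raise_alarm", "dispatch_maintenance"]
    return {"nearby_ids": [e.entity_id for e in nearby], "risk": risk, "actions": actions}


def stub_risk(event: SensorEvent, nearby: list[Entity]) -> float:
    """Placeholder for the LLM evaluation: high risk if a conduit is nearby."""
    return 0.9 if any(e.kind == "ELECTRICAL_CONDUIT" for e in nearby) else 0.1


entities = [
    Entity(1, "PIPE_SEGMENT", (1.0, 0.0, 0.0)),
    Entity(2, "ELECTRICAL_CONDUIT", (0.0, 2.0, 0.0)),
    Entity(3, "PUMP", (10.0, 0.0, 0.0)),
]
event = SensorEvent((0.0, 0.0, 0.0), moisture_pct=82.0, timestamp="2024-01-01T00:00:00Z")
result = run_chain(event, entities, stub_risk)
print(result)
```

Passing the evaluator in as a function keeps the escalation logic deterministic and testable, whatever model sits behind it.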
[5] What This Requires
I won't pretend this is simple to implement. The prerequisites are substantial:
**Clean STEP exports.** Most BIM-to-STEP workflows produce geometry-only exports. You need the full product structure — properties, relationships, classifications. This often requires custom export configurations or post-processing.
**Schema expertise.** EXPRESS, the data modelling language defined in ISO 10303-11, is powerful but not intuitive, and it has a learning curve. Understanding how to traverse entity relationships and extract meaningful attributes takes time.
**Prompt engineering discipline.** Jinja2 templates need to be versioned, tested, and maintained like code. Prompt drift is real — small changes in wording can produce meaningfully different outputs.
**Integration architecture.** Connecting sensor systems, building management, and the reasoning pipeline requires robust event handling and state management. This is distributed systems work.
**Validation rigour.** LLMs hallucinate. Every output needs verification against schema constraints and physical plausibility. The validation engine isn't optional — it's load-bearing.
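As a sketch of what that verification can look like, the function below checks one LLM finding against an expected structure and against the set of entity IDs that actually exist in the model. The field names, severity labels, and the 0 to 1 score range are assumptions for this example rather than a fixed schema.

```python
import json

# Hypothetical output contract for a single finding.
REQUIRED_KEYS = {"entity_id", "severity", "risk_score", "reason"}
ALLOWED_SEVERITIES = {"low", "medium", "high"}


def validate_finding(raw: str, known_entity_ids: set[int]) -> list[str]:
    """Return a list of problems; an empty list means the finding is accepted."""
    try:
        finding = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(finding, dict):
        return ["top-level value must be a JSON object"]

    missing = REQUIRED_KEYS - finding.keys()
    if missing:
        return [f"missing keys: {sorted(missing)}"]

    errors = []
    entity_id = finding["entity_id"]
    # Guards against the model inventing an entity that is not in the STEP data.
    if isinstance(entity_id, bool) or not isinstance(entity_id, int) or entity_id not in known_entity_ids:
        errors.append(f"unknown entity_id: {entity_id!r}")
    severity = finding["severity"]
    if not isinstance(severity, str) or severity not in ALLOWED_SEVERITIES:
        errors.append(f"invalid severity: {severity!r}")
    score = finding["risk_score"]
    if isinstance(score, bool) or not isinstance(score, (int, float)) or not 0 <= score <= 1:
        errors.append(f"risk_score must be a number in [0, 1]: {score!r}")
    reason = finding["reason"]
    if not isinstance(reason, str) or not reason.strip():
        errors.append("reason must be a non-empty string")
    return errors


good = '{"entity_id": 2, "severity": "high", "risk_score": 0.8, "reason": "Conduit within 2 m of leak"}'
bad = '{"entity_id": 99, "severity": "catastrophic", "risk_score": 1.7, "reason": ""}'
print(validate_finding(good, {1, 2}))
print(validate_finding(bad, {1, 2}))
```

Structural checks like these catch malformed or invented output; physical plausibility checks against the model geometry would be a further layer on top.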
[6] The Honest Assessment
I don't have production deployments yet. This is experimental work — breaking things in sandboxes before they touch real sites.
The current state:
- STEP parsing and EXPRESS interpretation: **working**
- Jinja2 prompt generation: **working**
- Local LLM inference: **working**
- Structured output validation: **working**
- Autonomous action loops: **prototype stage**
- Sensor integration: **not yet implemented**
- Production hardening: **nowhere close**
I'm sharing this because the architecture is sound and the path is clear. The implementation is a matter of engineering effort, not fundamental research.
[7] Why This Matters Beyond the Technical
For transformation leaders evaluating digital twin investments: the question isn't whether to adopt AI — it's whether you're building on foundations that enable autonomy or lock you into perpetual manual oversight.
Most "AI-powered" BIM solutions today are glorified chatbots sitting on top of document repositories. They can answer questions about your data. They cannot reason about your data, validate against design intent, or take autonomous action.
The difference is structural. If your data layer is PDFs and model screenshots, you've capped your ceiling at natural language Q&A. If your data layer is machine-interpretable product models with semantic schemas, you've built a substrate for genuine agency.
STEP files and EXPRESS schemas aren't exciting. They're old, they're dense, and they require expertise that isn't fashionable. But they represent one of the most rigorous attempts at describing engineering products in machine-interpretable form.
The future of intelligent buildings won't be built by ignoring these foundations. It will be built by understanding them deeply enough to transcend them.
*Next in this series: Practical EXPRESS — parsing ISO 10303-11 schemas and building entity relationship graphs for downstream reasoning.*
Key Takeaways
- Building autonomous digital twins requires transforming static STEP geometry into machine-interpretable data using EXPRESS schemas, rather than relying on multimodal LLMs and image interpretation alone.
- Local LLM inference (via Ollama) provides data sovereignty, predictable costs, and tight reasoning loops that cloud APIs cannot match for continuous building validation.
- The pipeline (STEP → EXPRESS → JSON → Jinja2 prompts → local LLM → validation engine) is designed to enable autonomous action loops in which sensor events trigger structured reasoning and escalation in seconds, with facilities managers overseeing the triage rather than performing it by hand.
- Most "AI-powered" BIM solutions are document chatbots; genuine agency requires machine-interpretable product models with semantic schemas as the data foundation.
- Production deployment requires clean STEP exports, EXPRESS schema expertise, disciplined prompt versioning, robust event handling, and rigorous LLM output validation—engineering effort, not fundamental research.