How an Agent Built a 3D Paris Gallery by Chaining Two Hugging Face Spaces
An AI agent chained two Hugging Face Spaces to automatically generate a 3D Gaussian splat gallery of Paris monuments, signaling a building-block economy for multimedia AI.
- The agent reads a plain-text agents.md file that describes exactly how to call a Space, eliminating the need for manual glue code.
- By piping the output of an image generation Space directly into a 3D reconstruction Space, the agent achieved end-to-end automation.
- Hugging Face Spaces are becoming standardized building blocks for AI capabilities, allowing agents to compose them like npm packages.
- This foreshadows a future where multimedia software is built by agents orchestrating existing models rather than by training models from scratch or wiring up integrations by hand.
A Hugging Face engineer recently shared a fascinating experiment: he asked a coding agent to build a 3D gallery of Paris landmarks automatically. Without ever opening an image generator or a 3D tool, the agent called two Hugging Face Spaces (one for image generation, another for 3D reconstruction into Gaussian splats) and combined their outputs in an interactive viewer.
The mechanism behind this is worth examining.
Why now?
This experiment validates a concept Mitchell Hashimoto calls the “building-block economy.” He argues that AI’s greatest strength lies not in writing code from scratch, but in gluing together small, proven components. This trend first emerged in code libraries, and now it’s hitting multimedia AI.
Think about it: the hardest part of using a top-tier image, video, TTS, or 3D model is rarely the model itself—it’s the integration: SDKs, GPUs, input formats, polling. But if each model becomes a documented, callable building block, an agent can snap them together like Lego. Hugging Face Spaces are quietly becoming exactly that.
The trick: agents.md
Every Gradio Space now exposes a plain-text file at https://huggingface.co/spaces/<author>/<Space>/agents.md. It contains everything an agent needs: API schema, call/poll endpoints, file upload instructions, and auth hints. No heavy client library is required: a single curl request gives the agent the full manual. Set an HF_TOKEN, and you’re ready to go.
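As a rough illustration of that "one request" step, here is a minimal Python sketch that builds the agents.md URL from the pattern above and fetches it with the standard library. The helper names (agents_md_url, build_request, fetch_agents_md) are made up for this example, and sending HF_TOKEN as a Bearer header is an assumption; the file's own auth hints are the authority on what a given Space expects.

```python
import os
import urllib.request
from typing import Optional

HUB = "https://huggingface.co"


def agents_md_url(space_id: str) -> str:
    """Build the agents.md URL for a Space id of the form '<author>/<Space>'."""
    author, _, name = space_id.partition("/")
    if not author or not name:
        raise ValueError(f"expected '<author>/<Space>', got {space_id!r}")
    return f"{HUB}/spaces/{author}/{name}/agents.md"


def build_request(space_id: str, token: Optional[str] = None) -> urllib.request.Request:
    """Prepare the GET request. The Bearer header is an assumption about auth."""
    request = urllib.request.Request(agents_md_url(space_id))
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    return request


def fetch_agents_md(space_id: str) -> str:
    """Download the manual as text, using HF_TOKEN from the environment if set."""
    request = build_request(space_id, os.environ.get("HF_TOKEN"))
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

Calling fetch_agents_md("VAST-AI/TripoSplat") would then hand the agent (or you) the text to read before making any real API call.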
The real breakthrough is chaining. The output of one Space becomes the input for the next. In the Paris gallery example, the agent first fed text prompts to an image Space (ideogram-ai/ideogram4) to generate 2D shots, then passed those images to a 3D Space (VAST-AI/TripoSplat) to produce Gaussian splats. Finally, it wrapped them in an HTML viewer. Zero manual steps.
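The shape of that chain can be sketched in a few lines of Python. This is not the code the agent wrote: the two steps are passed in as plain callables (hypothetical stand-ins for whatever call sequence each Space's agents.md describes), and the "viewer" is reduced to an HTML list of links, so the sketch only shows how one step's output feeds the next.

```python
import html
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical step signatures: each returns the path of the file it produced.
ImageStep = Callable[[str], str]  # text prompt -> image file path
SplatStep = Callable[[str], str]  # image file path -> splat file path


@dataclass
class GalleryItem:
    prompt: str
    image_path: str
    splat_path: str


def build_gallery(prompts: List[str], generate_image: ImageStep,
                  reconstruct_3d: SplatStep) -> List[GalleryItem]:
    """Run every prompt through both steps, feeding step 1's output to step 2."""
    items = []
    for prompt in prompts:
        image_path = generate_image(prompt)       # e.g. wraps the image Space
        splat_path = reconstruct_3d(image_path)   # e.g. wraps the 3D Space
        items.append(GalleryItem(prompt, image_path, splat_path))
    return items


def render_index(items: List[GalleryItem]) -> str:
    """Emit a bare HTML list of links; a real viewer would embed a splat renderer."""
    rows = "\n".join(
        f'  <li><a href="{html.escape(item.splat_path)}">{html.escape(item.prompt)}</a></li>'
        for item in items
    )
    return f"<ul>\n{rows}\n</ul>"
```

Keeping the steps as injected callables means the same loop works whether a step is a real Space call or a local stub used for testing.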
The trend: a building-block economy for multimedia
AI capabilities are moving from monolithic models to standardized, callable services. With thousands of open-source models on Hugging Face deployed as Spaces with agents.md, the platform starts to look like an app store of API bricks. This mirrors the shift from monoliths to microservices in software. Building a 3D showcase once required expertise in photogrammetry, point clouds, and GPU optimization; now an agent can discover, understand, and assemble the required Spaces on its own. The integration barrier to using cutting-edge AI has dropped sharply.
Moreover, this model is a natural fit for the open-source community. Any developer can deploy a model as a Space, add agents.md, and instantly make it available to any agent on the planet. Could this spawn an "agent-first" model marketplace?
What you can do now
If you’re a developer, try it out. Find a Space for a model you like, curl its agents.md endpoint, and replicate the calling sequence manually or let an AI agent read it and generate the code. Integrations that used to take days can now take minutes.
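If you replicate the sequence by hand, the part most likely to need a small helper is the wait between submitting a job and collecting its result. Below is a generic polling sketch in Python; the check callable is a hypothetical placeholder for whatever status or result request a given agents.md describes, and the interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until_done(check: Callable[[], Optional[T]],
                    interval_s: float = 2.0,
                    timeout_s: float = 300.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call check() until it returns a non-None result or the timeout expires.

    check is a hypothetical callable: it should return None while the job is
    still running and the finished result (for example a file URL) once done.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = check()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job did not finish within {timeout_s} seconds")
        sleep(interval_s)
```

The sleep function is injectable only so the loop can be exercised in tests without real waiting.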
If you have your own model, consider deploying it as a Space with agents.md. It could become a composable brick in someone else’s pipeline—maybe inside an agent-built product you never imagined.
The counterintuitive insight: a simple text file is the key
We often assume AI interfaces demand structured schemas like GraphQL or complex SDKs. Yet agents.md is just a plain-text file, as simple as Markdown. That simplicity makes it a universal language for agents: there is no messy documentation to parse, and one curl fetches it all. It suggests that the future of AI tool communication might rely not on heavy protocols, but on human-readable, agent-executable instruction sheets.
When AI can read the manual and assemble the pieces on its own, how far are we from true software automation?
Analysis by BitByAI