The Path to Autonomous Software Development

Will we get there by generating better code for developers? Or better apps for product owners?

Feb 18, 2025

Software has been eating the world for over a decade with no sign that its appetite is slowing. But despite all the advances in computing, programming languages, and AI, software is still extremely expensive to build and maintain. As a result, the benefits of custom software are available only to a few. As I wrote in The Future of Software, at Durable we’re passionate about making these benefits available to everyone, regardless of their software development skills. We’re excited about a future where anyone can empower their ideas by building custom software for themselves or others.

Two paths to autonomous software development

Regardless of how we achieve autonomous software development, the product UX needed is obvious: product owners first concretize their requirements and generate custom software apps to satisfy them. Then they’ll verify that these apps actually meet their requirements. When satisfied, they’ll deploy the app and begin using it. Finally, the requirements may change, and product owners will want to modify the software apps they’ve created as their needs evolve. Any deployment should also be continually maintained as the surrounding context (deployment environment, APIs) changes. So how will we get from here to there? It’s clear that we’ll need AI to create software for us, but less clear what kind of product or AI we need. I’ll contrast two competing paths toward this future.

Codegen: progressively better code for developers

On the first path we incrementally generate broader and more correct code for developers, starting from today’s products like Copilot and GPTX. I’ll call this path codegen, since the output is code and the primary user (for now) is the developer. Developers enter prompts that describe a subpart of the app’s requirements, take the resulting code, and incorporate it into the app’s codebase. Once the code for the entire app is written, they deploy it and connect it to the resources and APIs it depends on. Developers need to invest time to turn these outputs into functioning apps. But as the AI improves over time, developers can prompt it with increasingly complex and broad parts of the app’s requirements. Eventually, the AI may be able to write the code for the entire app. At this point, product owners could potentially use the product without requiring developer assistance, but two open problems remain: how the AI deploys functioning apps, and how product owners verify that the app meets their requirements.

Appgen: progressively more capable software for product owners

On the second path we start with an AI-powered platform that can generate fully functioning apps from requirements, and incrementally increase the scope and complexity of the apps the AI can generate over time. I’ll call this path appgen, since the output is an app, and the primary user is the product owner. They enter their requirements and interactively concretize them by adding detail. The AI then builds fully functioning apps from these requirements. The apps can be deployed instantly for product owners to try. To do this, the AI writes code to connect the app to supported resources and APIs it needs and makes them available to it during runtime. The higher bar for producing functioning and ready-to-deploy apps means that initially the AI is constrained in the kinds and complexity of apps it can produce. Some of these constraints are hard, e.g. the inability to use a particular API or to generate mobile apps. Others are soft constraints: e.g. the AI doesn’t know what a specific requirement means but a user can provide a step-by-step recipe for it to follow. As the AI improves, it can build more complex fully-functioning apps.

Codegen v. Appgen

These two paths differ principally in whether the output is code or a functional app. Consequently, they have different target audiences: codegen is targeted at developers, who can take the AI-generated code, correct errors, and deploy the resulting app. Appgen is targeted at product owners who want to instantly launch and deploy the generated apps themselves.

At Durable, we predict that appgen is the only viable path to autonomous software development, and we’ve bet on that prediction by designing a clean-sheet product and the bespoke AI to power it. This is a departure from the current AI status quo, which is betting that better developer tools will over time automate progressively more of the software development process and eventually reach the goal of full autonomy. So how did we come to such a different conclusion? I’ll outline our arguments below, following the product UX steps I described above: requirement concretization, generation, verification, deployment, and modification.

Requirement Concretization

Let’s start with the first step of autonomous software development: creating the requirements for what the software should do. Software development is the lossy translation of requirements into code. Going the other way from code to requirements is extremely difficult. This is why reading code that others wrote is much harder than writing it from scratch. When writing code, developers maintain a mental state of the requirements and their mapping to the code. They infer these through interactions with product owners. But neither the requirements nor the requirements→code mapping is represented in the code itself. Since these ingredients are absent from the codebase, the AI must have separate access to the requirements in its context. But the codegen path doesn’t require the AI to handle requirements since developers don’t reference them explicitly in their code workflow. The appgen path on the other hand must explicitly represent the requirements because they are its primary interface to its users: the product owners.

Even if the requirements were explicitly available to the AI, they will likely be underspecified and vague at first. For example, consider “an app that allows users to read and share reviews of books they’ve read”. There are two ways to concretize the requirements further: Q&A and trial-and-error. When concretizing via Q&A, product owners and software developers iterate to refine requirements based on what is feasible and efficient to implement. When concretizing via trial-and-error, on the other hand, product owners use a draft version of the app and make changes to the requirements based on their experience with it. Both are necessary for the concretization of requirements, but crucially, both are interactions between product owners, developers (or the AI), and the product itself. The code is not directly involved in requirement concretization. Apart from determining what is feasible and efficient, it plays no part. Appgen is the only path that requires the development of the AI capabilities and the UX to allow product owners to concretize requirements in tandem with the AI. It must both ask questions of product owners to concretize their requirements through Q&A and also enable them to experience the software and adjust the requirements by trial-and-error. Codegen for developers requires none of these bespoke capabilities.

Generation

Let’s now assume that the requirements are concretized in tandem with the AI such that they are feasible. On the codegen path, the AI writes a progressively larger share of the app’s code. But the correctness of any code the AI generates depends on all the other code it interacts with. Unless the AI is writing the code for the entire app at once, it’s unlikely to write correct and functioning code without considering the entire context of the codebase. And this problem only gets worse as we progress along the codegen path and the AI writes more of the app’s codebase. The need to consider the entire context of the codebase effectively rules out autoregressive codegen LLMs, which would need to emit the entire codebase at once and consider it in its entirety for each emitted output token. The appgen path, on the other hand, requires the requirements→code mapping to be explicitly represented, both to enable requirement-concretization features like Q&A, and to ensure that the codebase can be decomposed into subparts whose relations to each other and to the requirements are explicitly represented. Explicitly representing these interfaces is the only way to ensure that the code that is produced is functional and correct in the context of the entire codebase and deployment environment without developer intervention.

Verification

Even if correct and functional code for the app is emitted, how would the product owner, who we assume is not able to comprehend the code, verify its correctness? They could use trial-and-error by experimenting with the generated software, but this doesn’t scale to anything more than toy apps. Developers solve this with code testing, but code tests are useless to product owners who don’t have a mental model of the requirements→code mapping. For product owners to verify the correctness of the code, they would need to see tests at the requirement level. An AI that generates these tests needs to maintain the requirements→code mapping so it can invert it and generate tests for product owners. Only the appgen path builds the specialized AI necessary to maintain the requirements→code mapping explicitly and generate tests at the level of requirements for product owners. Codegen products need only focus on tests at the code level, which would be incomprehensible to product owners.
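To make the idea of requirement-level tests concrete, here is a minimal sketch of how code-level test results could be rolled up through an explicit requirements→code mapping. It is an illustration under stated assumptions, not a description of Durable's implementation: the function `requirement_report`, the requirement names, and the code unit names are all hypothetical, with the requirements borrowed from the book-review example above.

```python
def requirement_report(
    implements: dict[str, set[str]],
    unit_passed: dict[str, bool],
) -> dict[str, bool]:
    """Roll code-level test results up to the requirement level.

    `implements` is a hypothetical requirements->code mapping:
    requirement name -> names of the code units that implement it.
    `unit_passed` holds the code-level test outcome for each unit.
    """
    report: dict[str, bool] = {}
    for requirement, units in implements.items():
        # A requirement with no implementing units is reported as unmet
        # rather than passing vacuously. Untested units count as failing.
        report[requirement] = bool(units) and all(
            unit_passed.get(unit, False) for unit in units
        )
    return report


# Hypothetical mapping for the book-review app example.
implements = {
    "read-reviews": {"ReviewList"},
    "share-reviews": {"ReviewList", "ShareButton"},
}
unit_passed = {"ReviewList": True, "ShareButton": False}

report = requirement_report(implements, unit_passed)
for requirement, ok in report.items():
    print(f"{requirement}: {'met' if ok else 'not met'}")
```

A product owner reading this report sees which requirements are met, without needing to know that `ShareButton` exists or why its tests failed.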

Deployment

Once the code is written and product owners are satisfied of its correctness, they will want to deploy it. But deploying an app involves far more than writing its code. The APIs and services the app uses, such as databases, external services, security, and hosting, all need to be configured and made available. In fact, the code must be written with these runtime requirements and constraints in mind from the ground up, lest the application be impossible to deploy. We have now entirely left the realm of autoregressive code LLMs, which output only code. It’s also clear that generating apps that can be deployed requires the AI not only to configure the runtime environment and services necessary for the app, but also to write its code specifically for that runtime environment. Only the appgen path leads to the necessary capability by building AI that is not only aware of the runtime environment during code generation, but also able to configure it to deploy the app.
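As a rough sketch of what "aware of the runtime environment" could mean in practice, the snippet below checks an app's required services against a known set of platform capabilities before producing a deployment plan. Everything here is assumed for illustration: `SUPPORTED_SERVICES`, `plan_deployment`, and the service names are hypothetical and do not describe any real platform.

```python
# Hypothetical set of services the platform knows how to configure.
SUPPORTED_SERVICES = {"database", "auth", "hosting"}


def plan_deployment(required: set[str]) -> dict[str, str]:
    """Return a provisioning plan, or fail if a hard constraint is violated.

    An unsupported service is treated as a hard constraint: the app
    cannot be deployed, so it should be rejected before any code is
    generated for it.
    """
    missing = required - SUPPORTED_SERVICES
    if missing:
        raise ValueError(f"unsupported services: {sorted(missing)}")
    # Every required service is supported, so each one gets provisioned.
    return {service: "provision" for service in sorted(required)}


plan = plan_deployment({"database", "hosting"})
print(plan)
```

Checking the constraint up front mirrors the argument above: the code is only worth generating if the environment it targets can actually be configured.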

Modification

Finally, requirements evolve, and product owners will want to modify and update their apps. If developers are required to take the AI-generated code and correct it for errors before deploying it, any modification to the requirements results in new AI-generated code without these corrections and deployment updates. Developers would need to re-review the code and correct it from scratch on every update. Only the appgen path explicitly represents the requirements→code mapping. This in turn enables surgical updates to the code based on only the parts of the requirements that changed.
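One way to picture a surgical update is as a lookup in the requirements→code mapping: given the requirements that changed, find only the code units that implement them. The sketch below is a hypothetical illustration, not Durable's design; the `RequirementMap` class and the requirement and unit names are invented for the book-review example.

```python
from dataclasses import dataclass, field


@dataclass
class RequirementMap:
    """Hypothetical explicit requirements->code mapping."""

    # requirement name -> names of the code units that implement it
    implements: dict[str, set[str]] = field(default_factory=dict)

    def link(self, requirement: str, unit: str) -> None:
        """Record that `unit` helps implement `requirement`."""
        self.implements.setdefault(requirement, set()).add(unit)

    def units_for(self, changed: set[str]) -> set[str]:
        """Code units to regenerate when the `changed` requirements change."""
        affected: set[str] = set()
        for requirement in changed:
            affected |= self.implements.get(requirement, set())
        return affected


mapping = RequirementMap()
mapping.link("read-reviews", "ReviewList")
mapping.link("share-reviews", "ReviewList")
mapping.link("share-reviews", "ShareButton")
mapping.link("user-accounts", "LoginForm")

# Only the sharing requirement changed, so LoginForm is left untouched.
print(sorted(mapping.units_for({"share-reviews"})))
```

Without such a mapping, the only safe response to a changed requirement is to regenerate and re-review everything, which is the situation described above for the codegen path.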

The way forward

It’s clear to us that autonomous software development requires AI capabilities far beyond what developers directly need to improve their day-to-day productivity. Chief among these capabilities is the explicit representation of the requirements→code mapping. This mapping is needed to power Q&A with product owners to concretize requirements, the generation of verification tests at the requirements level, and the ability to make future edits to the requirements and generate correct modifications to the codebase. The AI must also write code that’s compatible with a known and configurable runtime environment so that product owners can launch the generated apps without developer involvement. None of these capabilities are required when incrementally improving the productivity of developers. So we believe the only path to achieve autonomous software development is via a ground-up design of the product, UX, and AI such that it enables building, verifying, deploying, and modifying fully functional apps for product owners from day one.

Thanks to Liam McInroy and Chris Fruci for reading drafts of this article.

Amana

May 19

I am a product from a HEMS (Home Energy Management System), and I need a 3D scene that visualizes energy flow. The scene should include elements such as a residential villa with photovoltaic panels on the roof, the power grid, household electricity usage, a battery storage system, and an EV charging pile.

Within Reason: The durable.ai Blog
