Artificial Intelligence constructs, from Large Language Models that answer questions to Agentic AI that runs various workflows, are fantastic, helpful tools for getting a job done. They aren’t quite automating entire tasks yet (the best ones as of this writing complete around one out of three tasks correctly: https://llm-stats.com/benchmarks/apex-agents), but they are still very helpful. “Vibe Coding” means explaining what you want the code to do to a model that can write code (or a Codex), trying the result, then correcting it until it does the thing. It is everywhere now, and it’s easy to do.
But the code a Codex creates meets a single need: to ship. It works through the process line by line, drawing on what it learned from decades of training material on the most popular languages, libraries and frameworks, and does its best to cobble that vast background of knowledge into a product. However, when you look at what it creates, in some cases it’s almost incomprehensible. You can try this yourself: open your favorite AI Codex tool and ask it to build a Single-Page Application (SPA) that does something non-trivial, like reading and writing data to a database such as SQLite. Make sure it has various buttons and menus to do that work. Vibe code it until it works.
Now open the assets it created. First, you’ll notice how massive the page is. Now change two of the less trivial menu items or button actions that the Codex created so they do something else. Don’t just rename the button or add a color; have it do some work. My observation is that if you are not an experienced developer in the framework and language the Codex picked, you won’t be able to do that, or at best you’ll break it and need the Codex’s help again. And when it helps, the codebase will become even more massive.
If you are an experienced developer, you can probably suss out what the Codex did and make the change. But after the initial amazement that it coded the brief, you will begin to see things that you would have done very differently. I’ve opened some of the Codex results, and I am actually pretty shocked that it wrote what it did. I find myself saying “why not just do a single line of a lambda function rather than these 58 lines it wrote?”
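To make that contrast concrete, here is a small Python sketch. It is not output from any actual Codex session: the customer records, field names, and both function names are invented for illustration. It only shows the general shape of the problem, a hand-rolled routine next to a one-line lambda that produces the same ordering.

```python
# Hypothetical customer records, invented for this illustration.
customers = [
    {"name": "Ortiz", "orders": 12},
    {"name": "Chen", "orders": 30},
    {"name": "Abara", "orders": 30},
]


def sort_customers_verbose(records):
    """A hand-rolled insertion sort, the style of code one might find generated.

    Orders records by most orders first, breaking ties alphabetically by name.
    """
    result = []
    for record in records:
        inserted = False
        for index in range(len(result)):
            existing = result[index]
            # Place the record ahead of anything with fewer orders.
            if record["orders"] > existing["orders"]:
                result.insert(index, record)
                inserted = True
                break
            # On a tie, fall back to alphabetical order by name.
            if record["orders"] == existing["orders"]:
                if record["name"] < existing["name"]:
                    result.insert(index, record)
                    inserted = True
                    break
        if not inserted:
            result.append(record)
    return result


# The same ordering expressed as a single lambda used as a sort key.
sort_customers_concise = lambda records: sorted(
    records, key=lambda r: (-r["orders"], r["name"])
)

verbose_names = [c["name"] for c in sort_customers_verbose(customers)]
concise_names = [c["name"] for c in sort_customers_concise(customers)]
print(verbose_names == concise_names)
```

Both versions work. The difference shows up later, when someone has to read, change, or review the longer one.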
It’s not hard to write software that you can deploy. It’s really hard to check that software for edge-case bugs, security vulnerabilities, and missed optimizations, and to package it for deployment into an enterprise. As the solution grows, most Codexes start to choke, lose context and memory, and hallucinate when you pass the entire solution back in and say “Revise this to include a custom sort the customer wants.” Try that now with your single page. Add a new, non-trivial function to the codebase. Then run a code optimization on it. Then pass it through a security check.
This is why you haven’t seen Codex-developed software replace SAP or some other large enterprise software package. Releasing something like that takes an extremely high level of software development process and procedure, built up over decades.
To be clear, I am not saying “don’t vibe code things”. You should use this amazing new tool for things you need to create. The advice I give my clients is to develop in a modular way, as atomically as possible. Microservices, Service-Oriented Architectures, or simply making sure each module is small, encapsulated, and talks to other modules via an endpoint will serve you well. Then you can swap out, optimize, check and maintain small chunks of code.
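As a rough sketch of that modular approach, the Python below puts storage behind a small interface so the rest of the code never touches the database directly. Everything here is an assumption made for illustration: the `CustomerStore` interface, both store classes, and `sorted_report` are hypothetical names, and an in-process interface stands in for the network endpoint a real microservice would expose.

```python
import sqlite3
from typing import List, Protocol


class CustomerStore(Protocol):
    """Hypothetical contract: the only thing other modules may rely on."""

    def add(self, name: str) -> None: ...

    def all_names(self) -> List[str]: ...


class InMemoryStore:
    """Small stand-in implementation, handy for tests."""

    def __init__(self) -> None:
        self._names: List[str] = []

    def add(self, name: str) -> None:
        self._names.append(name)

    def all_names(self) -> List[str]:
        return list(self._names)


class SqliteStore:
    """SQLite-backed implementation of the same contract."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS customers (name TEXT NOT NULL)"
        )

    def add(self, name: str) -> None:
        # The connection context manager commits on success.
        with self._conn:
            self._conn.execute(
                "INSERT INTO customers (name) VALUES (?)", (name,)
            )

    def all_names(self) -> List[str]:
        rows = self._conn.execute(
            "SELECT name FROM customers ORDER BY rowid"
        ).fetchall()
        return [row[0] for row in rows]


def sorted_report(store: CustomerStore) -> List[str]:
    """Business logic that depends only on the contract, not on SQLite."""
    return sorted(store.all_names())


for store in (InMemoryStore(), SqliteStore()):
    store.add("Ortiz")
    store.add("Chen")
    print(type(store).__name__, sorted_report(store))
```

Because `sorted_report` knows only the contract, either store can be rewritten, optimized, or security-reviewed on its own, and each piece stays small enough to hand back to a Codex without losing context.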
This is today. Tomorrow, AI will start eating away at these problems, and this post won’t be as accurate. But it illustrates an important point: you need to be smarter than the tool you are using. You need to know what it should do, and why it might make the choices it does. That knowledge will help in your prompting, architecting, “Brain” files and so on to get the best results.
Remember: It’s a tool, not a replacement for your professionalism.