Agentic coding tools have taken the developer world by storm. Whether you're using Gemini Code Assist, GitHub Copilot, or other AI-powered development assistants, these tools promise to revolutionize how we write software. But after months of intensive use, a surprising pattern emerges: most of the elaborate automations and workflow optimizations don't stick. The simple approaches win.
Armin Ronacher, creator of Flask and a prolific open-source developer, recently shared his experiences with agentic coding tools. His insights are invaluable because they come from someone who genuinely tried to make these features work—and had the discipline to recognize when they didn't. This isn't a critique of the tools themselves, but rather a practical guide to what actually works in day-to-day development.
The Rules of Automation
Before diving into specific failures, it's worth understanding Ronacher's framework for evaluating automation:
First, only automate things you do regularly. This seems obvious, but it's easy to get excited about automating edge cases that rarely occur. The cognitive overhead of remembering and maintaining automation for infrequent tasks often exceeds the time saved.
Second, if you create an automation and stop using it, delete it. Non-working automations are surprisingly common. Either you can't get yourself to use them, you forget about them, or you end up fine-tuning them endlessly. Unused custom commands cluttering your workspace confuse both you and others who might work in the same codebase.
The result? Ronacher does the simplest thing possible most of the time: he just talks to the machine more, gives it more context, keeps the audio input going, and dumps his train of thought into the prompt. That's 95% of the workflow. The rest is good use of copy/paste.
Slash Commands: Promise vs. Reality
Slash commands allow you to preload prompts to have them readily available in a session. In theory, they should be incredibly useful—predefined workflows at your fingertips. In practice, many of the commands Ronacher added went unused.
Several limitations make slash commands less useful than they could be. There's only one way to pass arguments, and it's unstructured. This proves suboptimal for complex use cases. Additionally, file-based autocomplete doesn't always work with slash command arguments, making it tedious to specify which files to operate on.
The workaround? Design commands that use Git state to determine context. For example, a grammar-fixing command can operate almost entirely from the current git status context, eliminating the need to explicitly provide filenames.
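To illustrate the pattern, the Git side of such a command might look something like the sketch below; the file globs and exact invocations are illustrative, not the actual command.

```bash
# Derive the files to proofread from Git state instead of slash-command
# arguments: modified documentation files plus anything newly added.
git diff --name-only HEAD -- '*.md' '*.rst' '*.txt'
git status --porcelain | grep '^??' | cut -c4-
```

Feeding that list into the prompt means the command works the same way whether you changed one file or ten, with nothing to type.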
Here are slash commands that didn't survive:
The /fix-bug command was supposed to pull issues from GitHub and add extra context. But there was no meaningful improvement over simply mentioning the GitHub issue URL and voicing thoughts about how to fix it. The natural language approach proved just as effective.
The /commit command attempted to generate good commit messages, but they never matched Ronacher's personal style. Commit messages are surprisingly personal—they reflect how you think about changes and what context future readers will need.
The /add-tests command was particularly disappointing. The idea was to have the agent skip tests during development, then use an elaborate reusable prompt to generate them properly at the end. But this approach wasn't consistently better than automatic test generation, which itself remains unsatisfying.
The /fix-nits command for fixing linting issues and running formatters never became muscle memory. Most AI coding assistants already know how to do this—you can just tell them "fix lint" without needing a dedicated command.
What works instead? Speech-to-text is transformative. Talking to the machine means you naturally share far more about what you want it to do. The friction of typing encourages brevity; speaking encourages completeness.
Copy/paste also remains surprisingly effective. Maintaining link collections, fetching files proactively into git-ignored folders, and mentioning them directly: it's simple and effective. You need to be selective to avoid polluting the context, but compared to letting the agent search in the wrong places, a little extra text does far less harm.
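A minimal version of that pattern, assuming a git-ignored docs-cache/ folder (the folder name and URL are placeholders):

```bash
# Keep fetched reference material inside the repo, but out of version
# control, so it can be mentioned directly in prompts.
mkdir -p docs-cache
grep -qx 'docs-cache/' .gitignore 2>/dev/null || echo 'docs-cache/' >> .gitignore
curl -sSL -o docs-cache/library-api.md https://example.com/docs/api.md
# Then reference docs-cache/library-api.md in the prompt instead of
# letting the agent hunt for the documentation itself.
```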
Hooks: The Efficiency That Wasn't
Hooks should enable powerful automation—running code at specific points in the agent's workflow. But Ronacher hasn't seen efficiency gains from them yet.
Part of the problem is running in permissive mode (without confirmation prompts). Hooks can only guide the agent by denying actions, and denies don't take effect in permissive mode. For instance, trying to use hooks to make the agent use uv instead of regular Python proved impossible.
The workaround is almost comically simple: preload executables on the PATH that override the default ones. A shell script that prints "This project uses uv, please use 'uv run python' instead" and exits with code 1 steers the agent toward the right tools more reliably than hooks.
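A minimal sketch of that interceptor, assuming a project-local .bin/ directory that you prepend to PATH before launching the agent (the directory name and message wording are illustrative):

```bash
#!/usr/bin/env bash
# .bin/python: a shim that shadows the real interpreter on PATH.
# Launch the agent with:  PATH="$PWD/.bin:$PATH" <your-agent-cli>
echo "This project uses uv, please use 'uv run python' instead" >&2
exit 1
```

Make the shim executable (chmod +x .bin/python) and the agent, seeing the message from the failed command, switches to 'uv run python' on its own.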
Timing is another challenge. Running formatters at the end of a long edit session would be ideal, but currently you must run formatters after each edit operation. This forces the agent to re-read files, wasting context.
Print Mode: The Unrealized Potential
Print mode—having the agent generate output that can be captured and used programmatically—seemed promising. The idea of mostly deterministic scripts with small inference components is appealing. Use an LLM for the commit message, but regular scripts for the commit and gh pr commands. Make mock data loading 90% deterministic with only 10% inference.
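A sketch of what that split could look like, with a hypothetical agent command standing in for whatever print or non-interactive mode your tool provides; everything else is ordinary Git and gh scripting:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Deterministic part: stage the work and capture the diff with plain Git.
git add -A
diff=$(git diff --cached)

# Small inference part: ask the agent only for the commit message.
# "agent -p" is a placeholder for your tool's print/non-interactive mode.
msg=$(agent -p "Write a one-line commit message for this diff:

$diff")

# Deterministic part again: commit and open the PR with regular tooling.
git commit -m "$msg"
gh pr create --fill
```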
The challenge is reliability. Print mode is slow and difficult to debug. The concept is sound—inference is too much like a slot machine, and many programming tasks are actually quite rigid and deterministic. We love linters and formatters because they're unambiguous. Anything we can fully automate, we should. Using an LLM for tasks that don't require inference is the wrong approach.
But the execution isn't there yet. Whether using the SDK or command-line flags, the results haven't matched the potential.
Sub-Tasks and Sub-Agents: Parallelism Pitfalls
Task tools are useful for basic parallelization and context isolation. Agent orchestration features were meant to streamline this process, but they haven't proven easier to use.
Sub-tasks and sub-agents enable parallelism, but you must be careful. Tasks that don't parallelize well—especially those mixing reads and writes—create chaos. Outside of investigative tasks, the results are poor.
Sub-agents are supposed to preserve context better, but in practice starting a new session, writing thoughts to Markdown files, or even switching to a different model in the chat interface often produces better results.
The Counterintuitive Conclusion
What's fascinating about workflow automation is that unless you already follow rigorous, consistent rules as a developer, simply taking the time to talk to the machine and give clear instructions outperforms elaborate pre-written prompts.
If you don't use emojis or commit prefixes, if you don't enforce templates for pull requests, there's little for automation to latch onto. The flexibility that makes you productive as a human makes automation harder to apply.
Practical Takeaways for AI-Assisted Development
Based on these experiences, here's what actually works:
Embrace voice input. The barrier to providing context drops dramatically when you can speak instead of type. More context generally leads to better results.
Keep automations minimal. If you're not using a slash command within a week of creating it, delete it. The cognitive overhead of maintaining unused automation exceeds its potential value.
Use Git state as context. Design your workflows so the agent can infer what you're working on from git status, git diff, and similar commands. This eliminates the need for explicit file arguments.
Prefer simple PATH tricks over hooks. If you need to steer the agent toward specific tools, interceptor scripts that print a helpful message and exit with a non-zero status are more reliable than hook-based approaches.
Accept that some features aren't ready. Print mode, sub-agents, and elaborate slash commands may improve over time. For now, the simple approach of talking to the machine and providing good context wins.
The Bigger Picture
These lessons extend to any AI-assisted workflow, whether you're using Gemini Code Assist, GitHub Copilot, or other tools. The temptation to over-engineer automation is strong—we're developers, after all. But the most effective approach is often the simplest: clear communication, good context, and the discipline to delete what doesn't work.
The tools will improve. Features that don't work today may become essential tomorrow. But the fundamental insight remains: automation should serve your workflow, not the other way around. If you find yourself adapting to your automation rather than your automation adapting to you, something has gone wrong.
For now, talk to the machine. Give it context. Keep it simple. And don't be afraid to delete the elaborate workflows that seemed like good ideas but never quite stuck.