#344

Weekly Thing 344 / Mythos, Artemis, Signals

Working with agents, economics of software teams, evals as new PRD, OpenAI Codex superapp, saying goodbye to Agile, cybersecurity proof of work.

Good morning! ☕️

It’s been a bit. I shared in WT343 that we were going on a two-week trip to Europe, so I was taking a little break. That trip got off to a rocky start, with a blizzard causing a flight cancellation, but after that bump we had a wonderful time in Amsterdam, Paris, and Barcelona. Mazie joined us in Paris for a quick stint at Disneyland Paris, and then we rejoined her in Barcelona. It was so great to see her on her semester abroad, and I’ll point you to her blog, where she has been writing regularly. Her most recent update is from a weekend in the Azores. You can see our trip logs from Escape in Europe for more.

More than any prior trip, this one was anchored around escape rooms. Tammy found us twelve amazing escape rooms, with the finale being Londium in Barcelona, the 7th-ranked room in the world according to TERPECA 2025. It was incredible, by the way. With so much escape room activity, I took the opportunity to make huge changes to our Escaping Things website, including a dedicated trip feature, so you can see our Escape in Europe trip showcasing those twelve rooms, with commentary from all of us!

Back to the timeline. Just two days after Tyler and I got back home, I came down with an infection that started in my left ear, spread to the whole left side of my head, and moved down my neck. Within 24 hours I went from totally fine to being in the emergency room, very “out of it”, and getting admitted to the hospital. I blogged a bit about my bout with facial cellulitis. After two days of IV antibiotics I was feeling much better and able to go back home. Thank you, modern medicine. The prognosis for sepsis in your head is very bleak without antibiotics.

Now I’m back and typing to you here on a Sunday morning. Should we get back right to it with some links? Yes, we should! ✅


The constant crowd of visitors at the Louvre looking to get a photo of the Mona Lisa.

March 20, 2026
Louvre, Paris


Notable

You can discuss any of these links at the Weekly Thing 344 tag in r/WeeklyThing.

Working with agents doesn’t feel like flow — Bill de hÓra

I have been working with many agents, building and exploring things, and I’ve been curious to observe what it feels like. I suspect we will find that, for people working closely with agents, 2-3 hours at a time is probably the limit. Maybe a little more, but co-creating alongside an agent takes a different kind of energy. This blog post’s commentary on flow and that feeling was interesting to me.

After a stint of deep work, I usually feel the tiredness of having held a line of thought together for a long time via concentration. After a stint with agents, the tiredness feels more like the aftermath, again, of sustained play or competition. The accumulation of lots of small judgments, many state updates, repeated course corrections, constant low-level vigilance. It’s neither better or worse, just different, more like a workout. Last of all, working with agents feels like… fun. Flow is not fun, it’s immensely rewarding yes, but not fun.

For me, I have found that I do enter a state of flow when working with agents to create and build. I lose track of time and I really “feel” like I’m co-creating with another entity, collaborating and ideating. I agree with a lot of the rest of the comments here. There is a game-like aspect to it. We are still incredibly early in understanding how people and agents will collaborate.

The Economics of Software Teams: Why Most Engineering Organizations Are Flying Blind - Viktor Cessan

This is a great read, and Cessan is spot on that most software teams (none I’ve ever seen) don’t think about the financials this way. Some get close: with engineering teams that make other teams more productive, there is a leverage view you have to apply to know whether the investment makes sense. But this is all getting turned upside down by automatic programming and agentic delivery. We need to go back to basics, and this math may be the best place to start.

Want to understand the current state of AI? | MIT Technology Review

Great overview of the pace of progress on frontier models. The charts here are incredible.

Despite predictions that development will plateau, AI models keep getting better and better. By some measures, they now meet or exceed the performance of human experts on tests that aim to measure PhD-level science, math, and language understanding. SWE-bench Verified, a software engineering benchmark for AI models, saw top scores jump from around 60% in 2024 to almost 100% in 2025. In 2025, an AI system produced a weather forecast on its own.

The crazy part? We are still in the very beginning of this transformation.

Browser Run: give your agents a browser

It is surprisingly difficult to give an AI agent a browser to use the web. The web is inherently very visual, and there is a lot of interface complexity for agents to navigate. Agents driving Chrome is clearly fine for testing a website, but it is NOT what an agent would prefer. Cloudflare making a cloud-hosted, agent-first browser makes a ton of sense. Notice how unique the features are, too. It is very obvious we are going to be building a lot of software for agents.

Evals Are the New PRD — Elezea

Agents aren’t just helping us write software, and they aren’t just end-user features; they are also helping evaluate and build the software. Without too much effort you can create agentic loops that evaluate how your product is performing and automatically work to improve the experiences that get negative scores. This is almost standard operating practice now, in part because there is so much data that you can’t possibly do it any other way.
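Here is a minimal Python sketch of the kind of loop described above, assuming you already log product interactions and have some way to score them. `Interaction`, `eval_loop`, `judge`, and `improve` are hypothetical names; in a real setup the judge might be an LLM grading a transcript and `improve` might open a task for a coding agent, while here toy stand-ins keep it runnable.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Interaction:
    """One recorded product interaction (hypothetical shape)."""
    id: str
    transcript: str


def eval_loop(
    interactions: list[Interaction],
    judge: Callable[[Interaction], float],
    improve: Callable[[Interaction], str],
    threshold: float = 0.5,
) -> list[tuple[str, float, str]]:
    """Score every interaction and queue an improvement task for each low scorer.

    Returns (id, score, task) tuples, worst-scoring first.
    """
    flagged = []
    for interaction in interactions:
        score = judge(interaction)  # in practice, e.g. an LLM-as-judge call
        if score < threshold:
            flagged.append((interaction.id, score, improve(interaction)))
    # Work on the worst experiences first (sort is stable for ties).
    return sorted(flagged, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy stand-ins so the loop runs without any model or service.
    def keyword_judge(i: Interaction) -> float:
        return 0.0 if "error" in i.transcript else 1.0

    def ticket(i: Interaction) -> str:
        return f"investigate {i.id}"

    sample = [
        Interaction("a1", "user asked for a refund and got it"),
        Interaction("a2", "user hit an error on checkout"),
    ]
    print(eval_loop(sample, keyword_judge, ticket))
```

The threshold and the worst-first ordering are design choices: they keep the agent’s attention on the experiences that scored lowest rather than spreading effort evenly across everything.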

Building a CLI for all of Cloudflare

When I saw this article I wondered if the focus was really “making a CLI for agents”, and that turned out to be exactly right. They start out plainly:

Increasingly, agents are the primary customer of our APIs. Developers bring their coding agents to build and deploy applications, agents, and platforms to Cloudflare, configure their account, and query our APIs for analytics and logs.

A service like Cloudflare has to pivot its entire product offering to be agent native. That means rethinking how agents can learn, use, and manage their software. Agent first means something totally different here. And if your service is hard for agents to use, they will just route around you to another solution.

This space of creating products that are actually for agents and not for people is pretty interesting. I have my own agent product I’ve been working on, mb, a micro.blog client built for agents.

The Human Cost of 10x AI Productivity - Denis Stetskov

To me, this article gets at the real issue with the human-in-the-loop answer to agentic transformation. It sounds good and makes folks feel better: oh good, there is a person looking at all that. However, when you agentically transform something you are usually looking for “machine speed”, and that becomes much more difficult with a human in the loop. Right now a lot of senior engineers are being asked to do this ill-defined task. We need to learn quickly how to move our systems into safer environments to minimize this before we burn out tons of people. Safety of systems is the place to focus to make this better.

Our evaluation of Claude Mythos Preview’s cyber capabilities | AISI Work

It was predictable that coding agents would find vulnerabilities, and they are moving along very quickly. It seems we are now in a race to use agents to secure systems at the same time others are using them to attack systems. The reality is that the vulnerabilities Mythos has found are nearly impossible for people to find. All this makes me wonder if there will come a time when we believe coding is just too hard for people, and that to do it safely, agents should do it. Driving a car could end up in the same place.

OpenAI Unveils Codex “Superapp” Update with Computer Use, Automations, Built-In Browser, and More - MacStories

Codex just got a ton of new capabilities.

On the productivity side of things, the update allows Codex to operate your desktop apps, interacting with interface elements and inputting text, for example. We’ve seen computer use from other AI companies before, but one thing that sets Codex apart is its ability to work in your apps in the background so they don’t steal the focus from whatever app you’re already using.

These systems are moving so fast it is impossible to keep up.

Saying Goodbye to Agile

I’ve practiced Agile delivery for decades while working on technology teams to build things. The arrival of automatic programming capabilities is throwing everything up in the air.

One unambiguously positive development that’s followed is that software professionals are writing specs again. LLMs - like many of us - do not perform well with ambiguity, and specifying problems is proving to be an effective tool for generating correct code. Agile told us “Working software over comprehensive documentation”. Spec-Driven Development is telling us “Comprehensive documentation creates working software”.

I’ve been building a bunch of things with agentic coding tools and this is how you do it. By the way, almost everyone also uses agents to help in creating that specification.

The part that everyone misses, though, is why we should make this change. The fundamental issue is less about spec-driven development and more about the fact that making a mistake is 10x less expensive than it was before. You can ask the agent to refactor and be on your way pretty quickly.

That one question, the impact of something being wrong, is the single most important consideration in figuring out how you do the work. And automatic programming is changing that in dramatic ways. It was changes to programming languages, and the move to more interpreted and dynamic development environments, that enabled Agile. What is automatic programming enabling?

Cybersecurity Looks Like Proof of Work Now

This is an interesting read of the impact AI is having on securing and exploiting systems.

If Mythos continues to find exploits so long as you keep throwing money at it, security is reduced to a brutally simple equation: to harden a system you need to spend more tokens discovering exploits than attackers will spend exploiting them.

In a way this isn’t completely surprising: to exploit something you only need to find one path, but to secure it you need to close all of them. Either way, the economics here are concerning.



Journal

Apr 11, 2026 at 3:02 PM

Dock is in!

Apr 12, 2026 at 7:53 PM

We saw Project Hail Mary today and it was every bit as good as everyone who has seen it told me. I read The Martian and loved that book, but had no idea about this book until the movie hit theaters. Just a great movie with incredible characters. Definitely one to see in the theater!

Apr 12, 2026 at 7:55 PM

My first meaningful print with the new Bambu P2S is a Gridfinity setup for my office desk. The grid is down, and now I’m populating it with various bins. There are so many options, and the Bambu P2S prints things effortlessly. This may be showing up in more drawers soon.


Briefly

Beautiful images from NASA’s Artemis mission to adorn your phone with. → Artemis II Mobile Wallpapers - NASA

I’ve been using SQLite in some of my agentic apps and it is just a wonderful piece of software. → SQLite Release 3.53.0

Interesting and short read on how folks are making agentic systems even more capable. → From Assistant to Collaborator: How My AI Second Brain Grew Up — Elezea

The title on this is so clickbait I feel gross even including it, but I find these articles compelling to read anyway to expand my thinking and see what folks pushing agentic capabilities are doing. → How I run multiple $10K MRR companies on a $20/month tech stack | Steve Hanov’s Blog

Good primer on getting more out of Claude Cowork. I’m close to dumping OpenClaw and instead just running Cowork with Claude Dispatch on my phone. → Cowork - How to AI

Bitcoin intentionally drops miner rewards over time. The math is sensible: in the early days miners should be compensated through newly minted coins, but over time Bitcoin should transition to fees paid for transactions. That transition is still ahead of the Bitcoin ecosystem, and given that Bitcoin has landed more as digital gold than cash, it may be hard for transaction fees to make mining profitable. → Bitcoin miners are losing $19,000 on every BTC produced as difficulty drops 7.8%
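To make the “drops miner rewards over time” part concrete, here is a minimal Python sketch of the block subsidy schedule: 50 BTC per block at launch, halving every 210,000 blocks. It follows the integer right-shift approach I understand Bitcoin Core to use, but the function names are my own and this is an illustration, not consensus code. It also leaves out transaction fees entirely, which is exactly the part of miner revenue that has to grow as the subsidy shrinks.

```python
COIN = 100_000_000           # satoshis per bitcoin
HALVING_INTERVAL = 210_000   # blocks between subsidy halvings (roughly four years)
INITIAL_SUBSIDY = 50 * COIN  # 50 BTC per block at launch


def block_subsidy_sats(height: int) -> int:
    """New coins (in satoshis) a miner may create in the block at `height`."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:
        # Bitcoin Core pins the subsidy to zero here, since a 64+ bit shift is undefined in C++.
        return 0
    # Halve by right-shifting; fractional satoshis are simply dropped.
    return INITIAL_SUBSIDY >> halvings


def total_supply_sats() -> int:
    """Sum of every block subsidy ever paid: slightly under 21 million BTC."""
    return sum(
        HALVING_INTERVAL * block_subsidy_sats(era * HALVING_INTERVAL)
        for era in range(64)
    )


if __name__ == "__main__":
    for height in (0, 210_000, 840_000):
        print(height, block_subsidy_sats(height) / COIN, "BTC")
    print(total_supply_sats() / COIN, "BTC total")
```

At block 840,000 (the April 2024 halving) this gives 3.125 BTC per block, down from 50 at launch, which is why fee revenue matters more with every halving.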

Delightful trip through the awesome Nintendo DS platform. → Introduction to Nintendo DS Programming

This is one of the most powerful shortcuts out there, and it really highlights the incredible solutions you can build in Shortcuts. → Introducing Apple Frames 4 - MacStories

WordPress Plugins are a security nightmare, but this strategy can be applied to many software pipelines. → Someone Bought 30 WordPress Plugins and Planted a Backdoor in All of Them

Time to read up on neurosymbolic AI! → The biggest advance in AI since the LLM - Gary Marcus

Beads was one of the first products I remember reading about that was specifically made for an agent as a user. I find this category of software fascinating. → Gas Town: from Clown Show to v1.0 | Steve Yegge

I’ve tried using Obsidian multiple times and concluded every time that it is a rabbit hole that productive time sinks into. However, having your agent use it to store knowledge? That is rather interesting. Maybe Obsidian should never have been used by people. I can mostly agree with that. → obsidian-skills: Agent skills for Obsidian

I recently bought a Bambu Lab P2S and have been loving it. This article comparing their strategy to what DJI did with drones seems spot on, and was part of why I felt good buying one of their printers! → Bambu Lab X2D Signals Potential Shift Toward Consumer-Focused 3D Printing Market « Fabbaloo

Everyone is all AI all the time, but I believe that crypto tech, particularly Ethereum, will continue to slowly grow in importance, and ENS continues to incrementally improve. Great stuff. → A Deeper Look at the ENS App | ENS Blog

Interesting project to see how far an AI can run something on its own. → We gave an AI a 3 year retail lease in SF and asked it to make a profit | Andon Labs


A haiku to leave you with…

Clouds of data drift ☁️
Agents browse, whisper ideas —
Profit hides in code.

Would you like to discuss the topics in the Weekly Thing further? Check out the Weekly Thing on Reddit. 👋

👨‍💻