Weekly Thing 321 / Saluting, Tetris, Sky
Good morning! 👋
You may notice that I've been going deep on a lot of AI topics in the links lately. I definitely am going through a bit of a brain on fire moment with the thoughts and ideas pushing out sleep and other things. 🧠🔥
I like these moments though. In the end I usually land on a mental framework that helps me understand how to use this stuff and where to apply it, or where not to.
My mind also caught fire over POAPs this week. More than half of you just rolled your eyes. 🙄
Way back when, I loved Geocaching. It was so much fun, particularly in the early days before it got commercialized. It was a way of exploring the area around you and finding cool, hidden things. 🗺️
When I first found POAPs the idea of claiming a POAP for a location was something I wanted right away. Well they have made that a reality and location drops are now a feature. I created my very first location POAP and even recorded a video claiming it. Awesome. So, I wanted to make something with this. 🤔
Then I connected the dots with the 612 Series NFTs, of which I have a complete collection. You may remember I also interviewed Erik Halaas, who created it. This was perfect — a peanut butter and chocolate moment! ✨
Thus the 612 POAP Challenge was created! 😲
If you live in Minneapolis, join in and go collect some of these locations. Details are on the page, and it is open until Labor Day. You are the first folks I’m sharing this with. I think this will be a fun summer activity! Get the whole family involved on some walks, bike rides, and kayak trips, and explore the city. 🧭
I love that I was able to create an entire activity on the POAP platform. 🤩
Honey Bee collecting pollen. This tree has so many bees working the flowers that it was buzzing as you stood by it. 🐝
May 26, 2025
Cannon Lake, MN
Notable
How I used o3 to find CVE-2025-37899, a remote zeroday vulnerability in the Linux kernel’s SMB implementation – Sean Heelan's Blog
One of the amazing use cases for LLMs and code is finding security vulnerabilities. Here Heelan uses o3 to see how well it can find very difficult Linux kernel bugs. I love how he uses multiple iterations and does a sort of Monte Carlo run. And it's wild that it finds the bug in some, but very few, iterations.
o3 finds the kerberos authentication vulnerability in the benchmark in 8 of the 100 runs. In another 66 of the runs o3 concludes there is no bug present in the code (false negatives), and the remaining 28 reports are false positives. For comparison, Claude Sonnet 3.7 finds it 3 out of 100 runs and Claude Sonnet 3.5 does not find it in 100 runs. So on this benchmark at least we have a 2x-3x improvement in o3 over Claude Sonnet 3.7.
And in the resolution? His initial remediation wasn't sufficient and o3 helped identify that in a small subset of runs.
Having realised this, I went again through o3's results from searching for the kerberos authentication vulnerability and noticed that in some of its reports it had made the same error as me, in others it had not, and it had realised that setting
sess->user = NULL
was insufficient to fix the issue due to the possibilities offered by session binding. That is quite cool as it means that had I used o3 to find and fix the original vulnerability I would have, in theory, done a better job than without it. I say 'in theory' because right now the false positive to true positive ratio is probably too high to definitely say I would have gone through each report from o3 with the diligence required to spot its solution. Still, that ratio is only going to get better.
This shows there is clear value here and honestly I think if you showed the original code to 100 engineers I would be surprised if even 8 of them found the issue.
Highlights from the Claude 4 system prompt
This is an incredible read and gives you insight into how Anthropic creates the personality of Claude for people to interact with. Is this programming? It doesn't look like it but it sure feels like it. This lights up all sorts of thoughts for me on how agents will appear in the workplace. We will need to consider designing agent personas in a much broader way than "help people create a ticket". The tagging of different components is also very interesting. Wow.
MCP is the coming of Web 2.0 2.0 - Anil Dash
I agree with Dash here and see the same glimmer of hope. The idea that my website could have an MCP capability was what got me there. I have a human interface in HTML, a feed interface with RSS and JSON Feed. Why not an agentic interface with MCP? And if that agentic interface existed there is an incredible variety of potential that could be created from this. I’m hopeful!
The Future of Engineering Services: Building and Measuring Human-AI Teams – rajiv.com
Friend and Weekly Thing reader Rajiv Pant (👋 hey Rajiv!) breaks down how to pivot our thinking about engineering capability as we transition into an AI-native approach. He even walks through how to phase the transformation that he describes. At the core is thinking more about the value provided and less about individual productivity metrics. I think there is some nuance there in the mental model. I’m currently more in the mindset of agentic engineers working alongside people to contribute value to a project. More horizontally integrated. However, if we take a more vertical approach as Rajiv suggests, and consider people the designers and the LLM the coder, that would result in a different view. Thought provoking regardless of how you approach it.
Sky Extends AI Integration and Automation to Your Entire Mac - MacStories
Sky hasn't launched yet; their website just has a teaser video showing the application doing some pretty amazing things powered by AI. I signed up on the waiting list and even filled out their survey. There is a lot of enthusiasm about this app because it is from the authors of Workflow, which Apple acquired and renamed Shortcuts. I use Shortcuts every single day, and it is the primary tool I use to build the Weekly Thing.
Viticci's writeup here is based on using the beta version of their product, and he is glowing about it. Viticci is notably also an extraordinary power user of Shortcuts. His admiration for the team's past success may introduce some positive bias, but he is also a shrewd user of automation technology.
What sets Sky apart from anything I’ve tried or seen on macOS to date is that it uses LLMs to understand which windows are open on your Mac, what’s inside them, and what actions you can perform based on those apps’ contents. It’s a lofty goal and, at a high level, it’s predicated upon two core concepts. First, Sky comes with a collection of built-in “tools” for Calendar, Messages, Notes, web browsing, Finder, email, and screenshots, which allow anyone to get started and ask questions that perform actions with those apps. If you want to turn a webpage shown in Safari into an event in your calendar, or perhaps a document in Apple Notes, you can just ask in natural language out of the box.
From that description I would describe Sky as an Agent platform that runs on your Mac to accomplish many different tasks. The agent part is a huge deal and follows the structure of more advanced AI solutions.
I think many Apple users may wonder if Sky is the app that Apple will buy to replace the woefully challenged Siri.
I think there is, and after using Sky for two weeks, I'll be honest: I wouldn't be surprised to hear that OpenAI or Apple would be interested in acquiring this company. Apple could use this team (again) to get an LLM-infused tool that works with apps before the two years it'll likely take them to ship a Siri LLM with comparable features -- and I'm sure Sky would work even better if it had access to more private APIs on macOS.
I’m looking forward to getting my hands on this.
AI-assisted development needs automated tests
Small but profound statement. You hear a lot of variety in feedback on how well LLMs work in contributing to code. What we don't hear enough about is the condition of that code. Does it have good tests? Is it a spaghetti mess? In the early days of LLMs doing code, I shared with many friends that coding is in some ways a much easier task than writing generic text. Why? Well, in a way you are playing a game with the compiler or interpreter. There is another piece of software that will tell you if you are right (compiles) or wrong (won't compile). So the LLM can play that game at machine speed to get to something that compiles. However, every developer will tell you that is interesting but means nothing. Just because it compiles doesn't mean it can do the thing you want. What does that? Automated tests and a focus on test-first development. It follows that tests give the LLM another "boss battle" to work through to get to a valid outcome.
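To make the idea concrete, here's a tiny hypothetical sketch (the `slugify` function and its test are my own illustration, not from the linked post): the test is written first, and it becomes the "boss battle" an LLM has to clear before the work counts as done.

```python
# Hypothetical test-first sketch: the test defines the target behavior,
# and an LLM (or a person) iterates on the implementation until it passes.

def slugify(title: str) -> str:
    """Turn a post title into a lowercase, hyphen-separated URL slug."""
    return "-".join(title.lower().split())

def test_slugify():
    # Compiling isn't enough; these assertions are what "done" means.
    assert slugify("Weekly Thing 321") == "weekly-thing-321"
    assert slugify("  Lazy   Tetris ") == "lazy-tetris"

test_slugify()
print("tests passed")
```

The point isn't the slug logic; it's that the assertion gives the machine an objective, machine-checkable target beyond "it compiles."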
All of this is to simply say that as Agentic coding becomes more common we will not just change a lot of tooling for how code is created but we will also likely change the process by which we come to understand what it is that we are building. In fact, if we don't assess that process as a whole we are almost certainly getting a suboptimal solution.
PS: Doing better at this would also help all developers. Test-first development, like pair programming, is generally a known "good thing", but it is hard, and many organizations find it slow in the beginning and bail on it.
Say Your Writing
Reading this made me smile. I guess I don't really think about the process that much but I will say that when I write these emails and blog posts this is exactly what I do.
In my case I find I constantly (silently) speak the words as I'm writing.
I’m usually speaking to myself in my head and working my hands as fast as I can to keep up. Doesn't everyone do that? 🤔
Vibe coding for teams, thoughts to date | Kellan Elliott-McCrea: Blog
Interesting observation about some of the impacts that LLMs have on a codebase.
LLMs are without a doubt the most disruptive change to how code gets written I've seen in my career since the introduction of the World Wide Web.
…
That said, as of today, LLMs don't change some key fundamental physics of writing code as a team. Importantly, as of today, they haven't changed the fundamental calculation that writing code is always easier than understanding code. The insight that reading code is harder than writing leads naturally to the insight that every line of code is tech debt. Every line of code encodes your best current understanding of the problem you're trying to solve (for increasingly weird definitions of "you"). Nothing has changed the fact that your current understanding is probably limited, and wrong in several ways.
All three of these insights are notable and things that teams should not ignore. In fact, I think this points to why an agentic model for software development is the right way to approach it. The things that the author highlights would also be true of adding a new engineer to the team. So how do we onboard an agent into the software development team? That is the question to ponder.
Lazy Tetris
Tetris is one of my favorite games. I still remember playing it early on, and to this day a good Tetris implementation (which is sadly hard to find!) is one of the few games that can capture me. I forget to blink when I play Tetris. So this version, where the tetromino doesn't actually fall, seemed silly at first. Then I used my mouse and just moved pieces around and created beautiful stacks of tiles. And it was fun. And I could still blink my eyes. It was even relaxing. Huh. 😊
LLMs are weird, man – Surfing Complexity
This is an enjoyable read highlighting how the inner workings of LLMs are a mystery to everyone, including those who create them. The point of this is that usually we can generate some sort of a mental model to categorize and understand a given technology, but LLMs seem to not fit those models. In a super simple way you can think of it like a "calculator for words" but that feels completely wrong when you see an LLM respond with something that is truly insightful.
But I think it's a mistake to write off this technology as just a statistical model of text. I think the word "just" is doing too much heavy lifting in that sentence. Our intuitions break down when we encounter systems beyond the scales of everyday human life, and LLMs are an example of that. It's like saying "humans are just a soup of organic chemistry" (c.f. Terry Bisson's short story They're Made out of Meat). Intuitively, it doesn't seem possible that evolution by natural selection would lead to conscious beings. But, somehow we humans are an emergent property of long chains of amino acids recombining, randomly changing, reproducing, and being filtered out by nature. The scale of evolution is so unimaginably long that our intuition of what evolution can do breaks down: we probably wouldn't believe that such a thing was even possible if the evidence in support of it wasn't so damn overwhelming. It's worth noting here that one of the alternative approaches to AI was inspired by evolution by natural selection: genetic algorithms. However, this approach has proven much less effective than artificial neural networks. We've been playing with artificial neural networks on computers since the 1950s, and once we scaled up those artificial neural networks with large enough training sets and a large enough set of parameters, and we hit upon effective architectures, we achieved qualitatively different results.
In the late 90's, before we created BigCharts and got a successful company up and running around market data visualization, we had a very different product called The Conductor. That original product was Philip Hotchkiss's first vision of informing trading models using neural nets and genetic algorithms. In modern terms, the dozen or so computers we had (no GPUs!) running these advanced methods is almost comical. But the actions we were taking were advanced. Genetic algorithms would compete for superiority. Neural nets would predict future market behaviors. We graded them. We would find a "model" that did really well and promote it. Several weeks later that same model had become less effective, possibly overtrained, so we would delete it. At that scale (dozens of cores of compute, not enough data to make training really matter) it was all fairly simple, but even at that level we had no idea "why" one model did better than another. We only judged them on the outputs.
But it turns out that if you scale the crap out of this it is different. Increase by dozens of orders of magnitude the data set, the compute, the storage — you get a very different thing on the other end. And that same challenge of not really knowing what is going on at the core is still there too. Why does this model work better than the other? Shrug.
I’m far out over my skis on this topic. It is notable to me that extreme scale can result in something that is simply very different. Don't brains have the same thing in the extreme? Increase the number of neurons by several orders of magnitude and at some point you get different outcomes? We don't really know, but I agree with the author that this stuff is "weird", and it is also rather amazing.
Supporting Membership
🌟 Join the Weekly Thing's mission to make a real impact—become a Supporting Member today! With 24 amazing members already on board, we've raised $97.57 so far, and every single dollar will go directly to the Electronic Frontier Foundation (EFF) at the end of the year. Help us stand up for digital rights and let's see how far we can grow this community-driven support for a fantastic cause! 🚀
$4 monthly | $40 yearly
Journal
How cool that the S3 Files app added a local MCP server so you can expose any S3 bucket of files to Claude or Cursor for local use.
Boat Day 2025! Got the boat out just in time to get on Cannon Lake for Memorial Day.
I'm wondering what MCP endpoints I would like to have:
- My blog
- Weekly Thing
- OmniFocus, that could be amazing
- Manuals for stuff I own
- My media library
- My photo library
- My house, or whatever that would mean
The Summer of 2025 Magic Pines POAP is now minting! Available for visitors until Labor Day. 🤩
Related: Magic Pines Summer POAPs for 2022, 2023, and 2024.
Lucky and me.
Smashburgers! 🍔🍔🍔
I created my very first location-based POAP. Our “Peanuts Tree” is a popular spot on Cannon Lake and you can now mint a POAP if you visit it. I’m going to place a QR code by it so visitors can get more info. Will anybody do it? I don’t know, but it is fun either way. See screen recording of mint.
I’ve been using ChatGPT so much that I hit the limits on my Plus account. I’ve been getting so much value out of it I made the jump to ChatGPT Pro. I’m curious to explore some of the advanced features further. 🧠
I'm not eager to have someone riding my bumper, but…
BACK OFF BUMPER HUMPER!!!
MY BRAKES ARE GOOD!
HOW ABOUT YOUR INSURANCE?
😬
Delicious cortado at Little Joy Coffee in Northfield. ☕️
Memorial Day Dairy Queen! 🍦
Pizzas at Pleasant Grove Pizza Farm. Love the pizza farms!
The kids playing Kubb while we wait for our pizza. Playing with the custom set I won at the USA Kubb Championship.
Ready for Game 4! Let's repeat Game 3 Wolves!
I find it ironic that I've got a lot of things going on and am struggling to find the time to read How to Do Nothing by Jenny Odell. 😬
Briefly
The cultural adoption of emoji is fascinating. → What Does The 🫡 Saluting Face Emoji Mean?
The scale of these DDoS attacks is incredible. All these insecure devices becoming zombies is a huge threat. → KrebsOnSecurity Hit With Near-Record 6.3 Tbps DDoS – Krebs on Security
Simple form with good controls to make a QR code. No redirects involved. No surveillance. Just a simple QR code. I tend to use Shortcuts for this, but this tool has more control over the QR code specifics. → Just a QR Code
Poetic. → The Way of Code | Rick Rubin
Interesting look into how journalists are using AI in their craft. I think this is notable given that the act of writing is core to an LLM and in many ways "too close" to the task so having guardrails becomes more important. → How this year’s Pulitzer awardees used AI in their reporting | Nieman Journalism Lab
Beautiful photos and so hard to capture. → 2025 Milky Way Photographer of the Year - Capture the Atlas
Bryn with some great thoughts about how to manage your attention. → Watching TV With The Sound Off - Gordo Byrn
Starbucks holds nearly $2 billion in prepayment from customers? That number is astonishing to me. → The Bank of Starbucks - Of All Trades
Willison's LLM tool is like a pocket knife for people that are exploring and learning about LLMs and how to build around and on them. This new feature is a big jump in capability. → Large Language Models can run tools in your terminal with LLM 0.26
This looks like a great course to learn how to interact with LLMs better. → prompt-eng-interactive-tutorial: Anthropic's Interactive Prompt Engineering Tutorial
Don't love it but I guess the trend is strong. Maybe I need to embrace that this is still some form of Semantic Versioning using the calendar? 😬 → Daring Fireball: Gurman: Apple Is Going to Re-Version OSes by Year, Starting With iOS 26, MacOS 26, tvOS 26, Etc.
Interesting mental model. → The Five Layers of Sharing Thoughts and Ideas | Matt Mullenweg
I've seen data lakes do some incredible things. Another approach at that pattern. → DuckLake is an integrated data lake and catalog format. – DuckLake
Handy timeout command for bash. → TIL: timeout in Bash scripts | Heitor's log
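For reference, a quick sketch of how the GNU coreutils timeout command behaves (on Linux systems with coreutils; exit status 124 signals the command was killed for running too long):

```shell
# Kill the command if it runs longer than 1 second.
timeout 1s sleep 5
echo "exit status: $?"   # 124 means the command timed out
```

Handy for wrapping flaky network calls in scripts so they can't hang forever.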
It is wild to me that this is possible with CSS only! 🤯 → CSS Minecraft
The font here is nice enough, but the thing I really liked in this article is the author going into detail on how they made their very first font. For some reason I always find font creation super interesting. Reading this article had me pondering the idea of creating a font of my own. I think it would be called Thing. Hmm… added to the Someday, Maybe list. 🤔 → I made a font | blog.chay.dev
Russia's attacks on Ukraine continue and have escalated. Three years on and the people of Ukraine need our support. 💙💛 → Russia launches one of biggest drone attacks on Kyiv since start of war | Ukraine | The Guardian
I loved Twin Peaks and am a Lynch fan. What a collection of stuff. The "Twin Peaks" Scripts with the original "Northwest Passage" title are already at $6,000. → The David Lynch Collection
Fortune
Here is your fortune…
Automate your day — saving time for Lazy Tetris! 🧩
Would you like to discuss the topics in the Weekly Thing further?
- Join the private Weekly Thing Forum 🤝
- r/WeeklyThing on Reddit 👋
- Sign the Weekly Thing Guestbook ✍️
Want to share this issue with others? The link is…
👨💻
This work by Jamie Thingelstad is licensed under CC BY-SA 4.0.
My opinions are my own and not those of any affiliates. The content is non-malicious and ad-free, posted at my discretion. Source attribution is omitted due to potential errors. Your privacy is respected; no tracking is in place.