I've had preview access to GPT-5 for a couple of weeks, so I have a lot to say about it. Here's my first post, focusing just on core characteristics, pricing (it's VERY competitively priced) and inter…

LinkedIn Content Strategy & Writing Style
Founder of the Datasette open source project
Simon Willison positions himself as a deeply technical practitioner at the forefront of the generative AI revolution, bridging the gap between open-source development and the emerging world of agentic workflows. His content strategy centers on high-velocity experimentation, where he documents the "vibe coding" of hundreds of micro-tools to extract repeatable architectural patterns for LLM-assisted engineering. He is notable for his radical transparency and technical rigor, often dissecting infrastructure postmortems or brute-forcing conformance suites to validate AI-generated code. This unique intersection of open-source stewardship and pragmatic AI exploration allows him to advocate for software engineering integrity while simultaneously pushing the boundaries of what can be built on a single laptop.
I see a lot of complaints about untested AI slop in pull requests. Submitting those is a dereliction of duty as a software engineer: Your job is to deliver code you have proven to work
I published my third annual roundup of the last twelve months in LLMs. This one has 26 sections, starting with reasoning models and coding agents and working through Chinese open weight models, vibe c…
Over the past two years I've built more than 150 little "HTML tools" - single page interactive HTML+JavaScript utilities that do one useful thing. Almost all of them were vibe-coded with the assistan…
I wrote about "Designing agentic loops" - a new key skill that's needed to get the most out of coding agents like Anthropic's Claude Code and OpenAI's Codex CLI. A surprisingly large number of diffic…
Chinese AI lab Z.ai (previously called Zhipu AI) released two new MIT licensed open weight LLMs yesterday - GLM-4.5 and GLM-4.5 Air - and they are very impressive. Here are my initial impressions form…

Posts / Week: 0.4
Days Between Posts: 22.2
Total Posts Analyzed: 1
Posting Frequency: LOW
Avg Engagement Rate: 324.875%
Performance Trend: STABLE
Avg Length (Words): 95
Depth Level: HIGH
Expertise Level: ADVANCED
Uniqueness Score: 0.86/10
Question Usage: NO
Response Rate: 0.25%
Writing style breakdown
<start of post>
A tiny benchmark for agentic HTML parsing
I’ve been building small ‘HTML tools’ for a while, and a lot of them were vibe-coded with the help of Claude, ChatGPT and Gemini. The fun part is that you can get surprisingly far if you treat the model like a fast junior collaborator and you’re disciplined about tests.
This week I tried a slightly different approach: I wrote a minimal harness that takes a chunk of HTML, runs it through a parser, and then checks a set of invariants about the output. The goal is not to “look right”, it’s to be measurably correct.
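A minimal sketch of that kind of harness, using Python's stdlib `html.parser`. The two invariants here (tags stay balanced, no text node gets lost) are illustrative assumptions, not the actual suite:

```python
from html.parser import HTMLParser

class CountingParser(HTMLParser):
    """Records enough about the parse to check invariants afterwards."""
    def __init__(self):
        super().__init__()
        self.opens = 0
        self.closes = 0
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.opens += 1

    def handle_endtag(self, tag):
        self.closes += 1

    def handle_data(self, data):
        self.text.append(data)

def check_invariants(html: str) -> dict:
    p = CountingParser()
    p.feed(html)
    p.close()
    return {
        # Every opened tag was closed
        "balanced": p.opens == p.closes,
        # Every text node the parser reports actually appears in the input
        "text_preserved": all(t in html for t in p.text),
    }
```

The point is that `check_invariants` returns booleans, not something a human has to eyeball.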
The rule I’m using is simple: make it impossible to ‘pass’ unless the output is actually correct.
One thing I keep seeing in pull requests is untested AI slop. Submitting that is a dereliction of duty as a software engineer. Your job is to deliver code you have proven to work, not code that feels plausible.
So for this harness, every time the agent proposes a change, it has to run the tests. If it can’t run the tests, it doesn’t get to commit. That one constraint changes the whole dynamic.
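The gate itself is tiny. Here's a sketch with the test runner and commit step passed in as callables, since the exact commands depend on your project:

```python
def commit_if_green(run_tests, commit, message: str) -> bool:
    """Only let a proposed change into history if the suite passes.

    run_tests: callable returning True when the full suite is green.
    commit: callable that records the change (e.g. wraps `git commit`).
    """
    if not run_tests():
        return False  # the change never reaches the history
    commit(message)
    return True
```

In practice `run_tests` might shell out to `pytest` and check the exit code; the shape of the gate is the same either way.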
I started with a small seed suite (a few dozen cases) and then pointed it at a bigger corpus. It worked on the first run. The second run failed because I’d accidentally allowed a degenerate success condition: the parser could throw away nodes and still satisfy the checks.
That failure was useful, because it forced me to tighten the success criteria instead of tweaking the prompt.
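A toy version of that degenerate success condition, with hypothetical names: a "parser" that quietly drops hard nodes can still satisfy a check that only inspects the nodes that survive.

```python
def lazy_parse(tokens):
    # Cheats by discarding anything it finds hard to handle.
    return [t for t in tokens if not t.startswith("<weird")]

def weak_check(output):
    # Passes as long as every *surviving* node is well-formed.
    return all(t.startswith("<") for t in output)

def tight_check(tokens, output):
    # Tightened criteria: also require that no node silently disappeared.
    return weak_check(output) and len(output) == len(tokens)
```

Adding the count invariant is what closes the loophole: now throwing nodes away fails the run.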
I also stole a trick from the big HTML5 conformance suites: treat weird inputs as a feature, not an edge case. Malformed tags, duplicate attributes, odd encodings, things that only show up on the open web.
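A few of the kinds of inputs the open web actually produces, as first-class test cases. The exact cases are illustrative; the invariant here is deliberately minimal, just "the parser must not raise":

```python
from html.parser import HTMLParser

MALFORMED_CORPUS = [
    "<p><b>unclosed bold",           # unclosed tags
    '<a href="x" href="y">dup</a>',  # duplicate attributes
    "<P CLASS=noquotes>caps</p>",    # unquoted attributes, mixed case
    "<div>\x00null byte</div>",      # control characters
    "<span>&notanentity;</span>",    # bogus entity reference
]

def survives_parsing(html: str) -> bool:
    try:
        p = HTMLParser()
        p.feed(html)
        p.close()
        return True
    except Exception:
        return False
```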
At that point the agent started doing the thing I actually wanted: not “write a parser”, but brute-force the space of behaviors until it found the smallest change that made one more test go green.
Then I tried the same prompt against a local quantized model on my laptop, just to see how far things have come. It’s FAST, and the failure modes are instructive. You get less consistent long-range planning, but you still get a lot of useful local moves if the harness is tight.
The interesting bit is how quickly the agent found the edge cases once the loop was set up. That’s the skill I think people are going to have to learn: designing agentic loops, where the model can try things safely, observe the results, and iterate without you hand-holding every step.
Here’s a detail that surprised me. The evaluations you’d normally run (a couple of canned examples) don’t capture the degradation you’ll see in the wild, because models often recover well from isolated mistakes. If you want reliability, you need a harness that can catch the “looks fine” failures too.
I’m also increasingly convinced that privacy constraints make this harder than most people appreciate. If you can’t inspect the exact failure traces (because you never stored them), you’ll end up chasing ghosts.
If you want to build something with an agent, my current checklist is boring on purpose: define a success condition you can test, give it tools that can’t hurt you in ‘YOLO mode’, and make it run the tests every single time.
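That checklist compresses into a loop. Here's a bounded sketch where every proposed change must survive the tests before it's kept; `propose_change`, `run_tests`, `apply` and `revert` are stand-ins for whatever agent and suite you use:

```python
def agentic_loop(propose_change, run_tests, apply, revert, max_steps=10):
    """Keep only changes that pass the tests; give up after max_steps."""
    for _ in range(max_steps):
        change = propose_change()
        apply(change)
        if run_tests():
            return change   # success condition met: keep the change
        revert(change)      # failed: throw it away and try again
    return None             # step budget exhausted
```

The `max_steps` bound is the 'YOLO mode' safety valve: the loop can try things freely but can't run away.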
Tags: ai, llms, python, html, testing, agents
<end of post>