I've had preview access to GPT-5 for a couple of weeks, so I have a lot to say about it. Here's my first post, focusing just on core characteristics, pricing (it's VERY competitively priced) and inter…

LinkedIn Content Strategy & Writing Style
Founder of the Datasette open source project
Simon Willison positions himself as a deeply technical practitioner at the forefront of the generative AI revolution, bridging the gap between open-source development and the emerging world of agentic workflows. His content strategy centers on high-velocity experimentation, where he documents the "vibe coding" of hundreds of micro-tools to extract repeatable architectural patterns for LLM-assisted engineering. He is notable for his radical transparency and technical rigor, often dissecting infrastructure postmortems or brute-forcing conformance suites to validate AI-generated code. This unique intersection of open-source stewardship and pragmatic AI exploration allows him to advocate for software engineering integrity while simultaneously pushing the boundaries of what can be built on a single laptop.
I see a lot of complaints about untested AI slop in pull requests. Submitting those is a dereliction of duty as a software engineer: Your job is to deliver code you have proven to work
I published my third annual roundup of the last twelve months in LLMs. This one has 26 sections, starting with reasoning models and coding agents and working through Chinese open weight models, vibe c…
Over the past two years I've built more than 150 little "HTML tools" - single page interactive HTML+JavaScript utilities that do one useful thing. Almost all of them were vibe-coded with the assistan…
I wrote about "Designing agentic loops" - a new key skill that's needed to get the most out of coding agents like Anthropic's Claude Code and OpenAI's Codex CLI. A surprisingly large number of diffic…
Chinese AI lab Z.ai (previously called Zhipu AI) released two new MIT licensed open weight LLMs yesterday - GLM-4.5 and GLM-4.5 Air - and they are very impressive. Here are my initial impressions form…

Posts / Week: 0.4
Days Between Posts: 22.2
Total Posts Analyzed: 1
Posting Frequency: LOW
Avg Engagement Rate: 324.875%
Performance Trend: STABLE
Avg Length (Words): 95
Depth Level: HIGH
Expertise Level: ADVANCED
Uniqueness Score: 0.86/10
Question Usage: NO
Response Rate: 0.25%
Writing style breakdown
<start of post>
A tiny benchmark for agentic HTML parsing
I’ve been building small ‘HTML tools’ for a while, and a lot of them were vibe-coded with the help of Claude, ChatGPT and Gemini. The fun part is that you can get surprisingly far if you treat the model like a fast junior collaborator and you’re disciplined about tests.
This week I tried a slightly different approach: I wrote a minimal harness that takes a chunk of HTML, runs it through a parser, and then checks a set of invariants about the output. The goal is not to “look right”, it’s to be measurably correct.
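A minimal sketch of that kind of harness, using Python's stdlib `html.parser`. The two invariants here (tags stay balanced, no text node gets lost) are illustrative assumptions, not the actual suite:

```python
from html.parser import HTMLParser

class CountingParser(HTMLParser):
    """Records enough about the parse to check invariants afterwards."""
    def __init__(self):
        super().__init__()
        self.opens = 0
        self.closes = 0
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.opens += 1

    def handle_endtag(self, tag):
        self.closes += 1

    def handle_data(self, data):
        self.text.append(data)

def check_invariants(html: str) -> dict:
    p = CountingParser()
    p.feed(html)
    p.close()
    return {
        # Every opened tag was closed
        "balanced": p.opens == p.closes,
        # Every text node the parser reports actually appears in the input
        "text_preserved": all(t in html for t in p.text),
    }
```

The point is that `check_invariants` returns booleans, not something a human has to eyeball.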
The rule I’m using is simple: make it impossible to ‘pass’ unless the output is actually correct.
One thing I keep seeing in pull requests is untested AI slop. Submitting that is a dereliction of duty as a software engineer. Your job is to deliver code you have proven to work, not code that feels plausible.
So for this harness, every time the agent proposes a change, it has to run the tests. If it can’t run the tests, it doesn’t get to commit. That one constraint changes the whole dynamic.
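The gate itself is tiny. Here's a sketch with the test runner and commit step passed in as callables, since the exact commands depend on your project:

```python
def commit_if_green(run_tests, commit, message: str) -> bool:
    """Only let a proposed change into history if the suite passes.

    run_tests: callable returning True when the full suite is green.
    commit: callable that records the change (e.g. wraps `git commit`).
    """
    if not run_tests():
        return False  # the change never reaches the history
    commit(message)
    return True
```

In practice `run_tests` might shell out to `pytest` and check the exit code; the shape of the gate is the same either way.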
I started with a small seed suite (a few dozen cases) and then pointed it at a bigger corpus. It worked on the first run. The second run failed because I’d accidentally allowed a degenerate success condition: the parser could throw away nodes and still satisfy the checks.
That failure was useful, because it forced me to tighten the success criteria instead of tweaking the prompt.
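A toy version of that degenerate success condition, with hypothetical names: a "parser" that quietly drops hard nodes can still satisfy a check that only inspects the nodes that survive.

```python
def lazy_parse(tokens):
    # Cheats by discarding anything it finds hard to handle.
    return [t for t in tokens if not t.startswith("<weird")]

def weak_check(output):
    # Passes as long as every *surviving* node is well-formed.
    return all(t.startswith("<") for t in output)

def tight_check(tokens, output):
    # Tightened criteria: also require that no node silently disappeared.
    return weak_check(output) and len(output) == len(tokens)
```

Adding the count invariant is what closes the loophole: now throwing nodes away fails the run.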
I also stole a trick from the big HTML5 conformance suites: treat weird inputs as a feature, not an edge case. Malformed tags, duplicate attributes, odd encodings, things that only show up on the open web.
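A few of the kinds of inputs the open web actually produces, as first-class test cases. The exact cases are illustrative; the invariant here is deliberately minimal, just "the parser must not raise":

```python
from html.parser import HTMLParser

MALFORMED_CORPUS = [
    "<p><b>unclosed bold",           # unclosed tags
    '<a href="x" href="y">dup</a>',  # duplicate attributes
    "<P CLASS=noquotes>caps</p>",    # unquoted attributes, mixed case
    "<div>\x00null byte</div>",      # control characters
    "<span>&notanentity;</span>",    # bogus entity reference
]

def survives_parsing(html: str) -> bool:
    try:
        p = HTMLParser()
        p.feed(html)
        p.close()
        return True
    except Exception:
        return False
```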
At that point the agent started doing the thing I actually wanted: not “write a parser”, but brute-force the space of behaviors until it found the smallest change that made one more test go green.
Then I tried the same prompt against a local quantized model on my laptop, just to see how far things have come. It’s FAST, and the failure modes are instructive. You get less consistent long-range planning, but you still get a lot of useful local moves if the harness is tight.
The interesting bit is how quickly the agent found the edge cases once the loop was set up. That’s the skill I think people are going to have to learn: designing agentic loops, where the model can try things safely, observe the results, and iterate without you hand-holding every step.
Here’s a detail that surprised me. The evaluations you’d normally run (a couple of canned examples) don’t capture the degradation you’ll see in the wild, because models often recover well from isolated mistakes. If you want reliability, you need a harness that can catch the “looks fine” failures too.
I’m also increasingly convinced that privacy constraints make this harder than most people appreciate. If you can’t inspect the exact failure traces (because you never stored them), you’ll end up chasing ghosts.
If you want to build something with an agent, my current checklist is boring on purpose: define a success condition you can test, give it tools that can’t hurt you in ‘YOLO mode’, and make it run the tests every single time.
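That checklist compresses into a loop. Here's a bounded sketch where every proposed change must survive the tests before it's kept; `propose_change`, `run_tests`, `apply` and `revert` are stand-ins for whatever agent and suite you use:

```python
def agentic_loop(propose_change, run_tests, apply, revert, max_steps=10):
    """Keep only changes that pass the tests; give up after max_steps."""
    for _ in range(max_steps):
        change = propose_change()
        apply(change)
        if run_tests():
            return change   # success condition met: keep the change
        revert(change)      # failed: throw it away and try again
    return None             # step budget exhausted
```

The `max_steps` bound is the 'YOLO mode' safety valve: the loop can try things freely but can't run away.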
Tags: ai, llms, python, html, testing, agents
<end of post>