LinkedIn Content Strategy & Writing Style
ML/AI research engineer. Author of Build a Large Language Model From Scratch (amzn.to/4fqvn0D) and Ahead of AI (magazine.sebastianraschka.com), on how LLMs work and the latest developments in the field.
Sebastian Raschka positions himself as the premier educational architect of the LLM era, bridging the gap between high-level AI research and ground-level implementation. His content strategy centers on a "from-scratch" philosophy, where he deconstructs complex papers from labs like DeepSeek and NVIDIA into readable code, visual guides, and modular tutorials. He is notable for his ability to translate dense architectural shifts—such as Multi-Head Latent Attention or hybrid Mamba-Transformer stacks—into tangible engineering workflows that prioritize clarity over hype. By maintaining a unique intersection of academic rigor and open-source transparency, Raschka transforms the opaque "black box" of modern reasoning models into an accessible curriculum for the global developer community.
I put together a new LLM Architecture Gallery that collects the architecture figures I shared over the months and years in one place. The goal is to make it easier to quickly browse recent open-weigh…

While waiting for DeepSeek V4, we got two very strong open-weight LLMs from India yesterday. They come in two size flavors, Sarvam 30B and Sarvam 105B (both reasoning models). Interestingly, the s…

I have been pretty heads-down this year to finish Chapter 6 on implementing reinforcement learning with verifiable rewards from scratch (using GRPO). I just finished it this weekend, and I'd say it's…
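For context on what that chapter covers, here is a minimal sketch of the group-relative advantage at the heart of GRPO: each sampled response is scored against the mean and standard deviation of its own group, so no learned value model is needed. The function name, tensor shapes, and example rewards below are illustrative assumptions, not code from the book.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Group-relative advantages as used in GRPO.

    rewards: (num_groups, group_size) verifiable rewards, e.g. 1.0 if the
    final answer is correct, else 0.0. Each response is normalized against
    the mean/std of its own group, replacing a learned critic.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled completions each, binary correctness rewards
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [1.0, 1.0, 1.0, 0.0]])
print(grpo_advantages(rewards))
```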

I coded up a Qwen3.5 from-scratch reimplementation for educational purposes. The full suite of smaller Qwen3.5 open-weight LLMs (0.8B, 2B, 4B, 9B) was released earlier this week, and it's probably the…

I just uploaded my State of LLMs 2025 report, where I take a look at the progress, problems, and predictions for the year. Originally, I aimed for a concise overview and outlook, but (like always) th…

I put together a new visual guide to the main attention variants used in modern LLMs. What's relatively new is that once models moved to longer contexts, the design space widened quite a bit, and a…
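As one concrete point in that design space, here is a minimal sketch of grouped-query attention (GQA), a variant that became popular for long contexts because several query heads share each key/value head, shrinking the KV cache. The helper function and shapes are illustrative assumptions, not taken from the guide.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v, group_size):
    """Minimal GQA: expand the shared KV heads to match the query heads.

    q: (batch, num_q_heads, seq, head_dim)
    k, v: (batch, num_kv_heads, seq, head_dim), num_q_heads = num_kv_heads * group_size
    """
    k = k.repeat_interleave(group_size, dim=1)  # share each KV head across a group
    v = v.repeat_interleave(group_size, dim=1)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Example: 8 query heads sharing 2 KV heads (group size 4)
b, seq, d = 1, 16, 64
q = torch.randn(b, 8, seq, d)
k = torch.randn(b, 2, seq, d)
v = torch.randn(b, 2, seq, d)
print(grouped_query_attention(q, k, v, group_size=4).shape)  # (1, 8, 16, 64)
```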

Posts / Week: 1.7
Days Between Posts: 4.7
Total Posts Analyzed: 7
Posting Frequency: MEDIUM
Avg Engagement Rate: 2768.2%
Performance Trend: STABLE
Avg Length (Words): 280
Depth Level: HIGH
Expertise Level: ADVANCED
Uniqueness Score: 0.88/10
Question Usage: NO
Response Rate: 0.25%
Writing style breakdown
Sample post:
I just finished a new deep-dive into the architecture of Qwen3-Next (the 235B MoE model). It's a massive release, and what's particularly interesting is how they've refined the "sparse" scaling laws we've been seeing lately.
Instead of just adding more experts, the team focused on the routing mechanism itself to ensure that the active parameter count (around 22B) stays efficient enough for reasonable inference throughput on H100 clusters.
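To make the routing idea in that sample concrete: in a sparse MoE layer, a small router scores all experts for each token but only the top-k actually run, which is how total parameters can be huge while active parameters stay small. The sketch below is a generic top-k router in PyTorch, not Qwen's actual implementation; all names and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Generic sparse MoE layer: only the top-k experts run per token,
    so active parameters are a small fraction of total parameters."""

    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: (num_tokens, dim)
        scores = self.router(x)                  # (num_tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e         # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = TopKMoE()
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64])
```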