Brendan Foody's LinkedIn Strategy — CEO @…

Gemini 3.1 Pro is now at the top of the APEX-Agents leaderboard. Gemini jumped from 18.4% to 33.5% on Pass@1 in just 90 days. It also completes 5 tasks that no model has ever been able to do before.…

710387427 viral

AI Agents Benchmarking · 5 months ago

GPT 5.4 is the best model we’ve ever tested on APEX-Agents. It’s also the first model to pass 50% mean score. A year ago, frontier models couldn’t even edit an Excel sheet and scored less than 5%. No…

576557726 viral

AI Model Benchmarking · 4 months ago

Mercor's unique ability to answer this question is framed by the fact that we know who's an expert in what, all around the world. Uber built a driver network. Airbnb built a host network. Mercor is b…

509414419 viral

Network Effects · 4 months ago

Anthropic jumping from 18.4% to 29.8% on APEX Agents in a few months is insane. This benchmark demonstrates its ability to create real deliverables such as slide decks, documents, financial models i…

2451497 viral

AI Agents · 5 months ago

Applied Compute improved 19% on Corporate Law tasks in APEX Agents. Their model traverses data rooms with hundreds of files to prepare complex legal deliverables. This level of model improvement wit…

1969167 viral

Legal AI · 5 months ago

Applied Compute achieving frontier capabilities on APEX Agents with just 2,000 tasks is incredible. Their model can produce complex legal deliverables, redlines, and slide decks. It feels like RL is…

205775 viral

AI Agents · 5 months ago

Topics & Content Focus

Primary Topics

Agentic AI performance benchmarking and model progress narratives
Economic displacement/augmentation thesis for elite knowledge work (consulting, banking, law) via AI agents
Platform strategy and network effects framing (building a global expertise/knowledge network)

Secondary Themes

Evaluation infrastructure as the core bottleneck (benchmarks/evals measuring economic value)
Reinforcement learning and rapid benchmark saturation dynamics
Social proof through operator shoutouts and ecosystem credit-giving

Industry Focus

Frontier LLMs applied to enterprise knowledge-work automation (legal deliverables, slide decks, spreadsheet/document workflows)
AI agents evaluation/leaderboards (APEX-Agents) as go-to-market narrative infrastructure
Talent/expertise marketplaces evolving into knowledge networks (Mercor platform positioning)

Content Categories

Benchmark update / leaderboard commentary
Strategic market thesis + prediction
Company narrative positioning (network effects, category creation)
Community building / team-alumni recognition

Performance Insights

Avg Engagement Rate: 417.57%

Performance Trend: STABLE

Best Performing Topics

APEX-Agents leaderboard updates with specific metric jumps and capability examples
Broad-scope predictions about AI surpassing professional services
Network-effect platform thesis positioned as a new category (knowledge network)

Virality Signals

Shares spike when posts combine hard metrics with a clear, memorable implication ("better than best consulting firm")
Benchmark/leaderboard framing creates 'reference utility' (people share as evidence in debates)
Specific deltas over short time windows (90 days, 3 months) increase repost-worthiness

Structure & Quality

Avg Length (Words)

Depth Level: HIGH

Expertise Level: ADVANCED

Uniqueness Score: 0.78/10

Common Hooks

Leaderboard/benchmark superlative hook ("best we've ever tested", "now at the top")
Compressed time-horizon contrast ("a year ago... now...") to signal acceleration
Hard threshold milestone ("first to pass 50%") as credibility anchor
Category-level claim framed as inevitability ("imminently", "most important network effect")
Named-entity credibility stacking (models, benchmarks, products, people)

Common Endings

Future-forward conclusion that reframes the implication at economy scale
Attribution/shoutout ending ("great work", naming individuals/teams)
Congratulatory close to a major lab/company release
Short community tag/identity marker (team/alumni signal)

Value Delivery Methods

Turns model eval results into executive-level implications (market timing + category impact)
Provides concrete progress signals (thresholds, deltas, task capability examples) that reduce ambiguity
Clarifies the real bottleneck (measurement/evals) rather than repeating generic 'AI is improving' claims
Builds trust through credit-giving and ecosystem awareness

Formatting Style

Short paragraphs with line breaks for scannability
Metric-first statements followed by implication framing
Minimal emoji usage (used sparingly as identity/energy marker)
Proper nouns and technical terms used without over-explaining (assumes informed audience)

Audience & Tone

Question Usage

Response Rate: 0.12%

Detected Tone

Benchmark-driven futurist-operator
Clinical credibility with occasional insider-casual signaling
Confident, thesis-forward, economy-scale framing
High-agency builder narrative (platform/network effects orientation)
semi-formal
first-person

Interaction Style

Debate-seeking through bold predictions (invites rebuttal and scenario-building)
Peer validation loops (tagging/crediting practitioners and labs)
Evidence-based discussion starter (metrics as shared ground for comments)

Community Building Signals

In-group identity markers (e.g., team/alumni 'mafia' language)
Public recognition of individuals/teams to strengthen network ties
Positions company narrative within a larger movement (agents transforming knowledge work)

Writing Style Patterns

Content Strategy

Hook: Leaderboard/benchmark superlative hook ("best we've ever tes…
Tone: semi-formal
CTA: Implicit CTA via strong claims designed to trigger…

Writing style breakdown

The speed at which LLMs are mastering specialized labor is breathtaking. GPT 5.4 just hit a 62% success rate on the APEX-Agents legal reasoning module.

Six months ago, no model could maintain consistency across a 50-page contract. Now, GPT 5.4 identifies conflicting clauses and suggests redlines in seconds.

This is the end of rote document review. The bottleneck is no longer human hours, but the quality of the underlying data.

The economic impact on professional services will be enormous.

great work Sam Altman and the OpenAI team

Brendan Foody
