
Ethan Mollick on AI Idea Generation: The Evidence
Unpacks Ethan Mollick's viral point that GPT-4 can outperform people at idea generation, with context and prompting tactics.
Ethan Mollick recently shared something that caught my attention. He said he still sees the argument that AI is bad at generating ideas, but that he, his colleagues, and many other researchers have repeatedly found that even the old GPT-4 could be prompted to generate more diverse and higher-quality ideas than most people. Newer models, he added, do better.
"I still see the argument that AI is bad at generating ideas... even the old GPT-4 could be prompted to generate more diverse and higher quality ideas than most people. And newer models do better."
That short post is doing a lot of work. It challenges a stubborn belief that creativity is a hard boundary for AI, and it points to something more practical: idea generation is not a mystical talent. It is a process you can measure, improve, and scale. If a model can reliably produce a wider spread of options, and if humans rate many of those options as high quality, then AI is not just "helpful". It is a meaningful brainstorming advantage.
Below, I want to expand on what Mollick is getting at, why the research result matters, and how you can apply it without turning your work into generic AI slop.
The real claim: diversity and quality, not just volume
When people say "AI is bad at ideas," they often mean one of three things:
- AI repeats common patterns and clichés.
- AI produces a lot of ideas, but most are low quality.
- AI cannot make the leap to something truly novel or useful in context.
Mollick's point pushes back with two specific criteria researchers can actually test:
- Diversity: Are the ideas meaningfully different from one another, or are they variations of the same theme?
- Quality: When evaluated (often by humans, sometimes by other metrics), do the ideas meet the bar for usefulness, originality, feasibility, or fit?
In other words, the benchmark is not "Can the model be a lone genius?" It is "Can the model help produce a better set of options than a typical human brainstorm?" That is a much more grounded question, and it is exactly where AI tends to shine.
Why "even the old GPT-4" matters
The phrase "even the old GPT-4" is a subtle jab at moving goalposts. A lot of skepticism about AI creativity is based on earlier experiences: weak prompts, older models, or unstructured experiments where people ask for "10 startup ideas" and then judge the output as bland.
But in research settings, prompting is treated like experimental design. You do not just ask once and stop. You specify constraints, ask for multiple directions, request rationale, enforce novelty checks, and iterate. When you do that, GPT-4 (even the earlier versions many people now call "old") can outperform many individuals on standard ideation tasks.
And if that was true with older GPT-4, Mollick's follow-up point is important: newer models do better. That does not mean they are perfect. It means the default capability baseline keeps rising, so the "AI cannot generate ideas" claim ages poorly.
The hidden variable: most brainstorming is not very good
A tough truth: many human brainstorming sessions are optimized for comfort, not output.
- The loudest voice anchors the group.
- People converge too early on "safe" directions.
- Social pressure reduces weird options.
- Time constraints force shallow exploration.
AI does not feel awkward, does not worry about judgment, and does not get tired after the 12th variant. That alone can produce more breadth.
Key insight: In practice, AI is often competing with average ideation, not with elite creative directors at their best.
So when studies show AI beating "most people" on idea diversity and quality, it may be less magical than it sounds. It may simply be that we are comparing a tireless, structured generator to a human process full of bottlenecks.
What "prompted to generate" implies (and what to copy)
Mollick's wording matters: "could be prompted to generate". The win is not automatic. It is conditional on how you set the task up.
Here are a few prompting patterns that reliably improve ideation outcomes.
1) Ask for a spread, not a list
Instead of: "Give me 20 ideas for X."
Try: "Generate 12 ideas in 4 distinct categories (3 per category). Categories should be meaningfully different approaches." This forces diversity by design.
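As a minimal sketch, the "spread, not a list" pattern can be turned into a reusable prompt builder. The topic, category names, and counts below are illustrative assumptions, not something from Mollick's post:

```python
# Sketch: build an ideation prompt that forces diversity by design,
# by requiring a fixed number of ideas per distinct category.
# All example topics and categories are hypothetical.

def build_spread_prompt(topic: str, categories: list[str], per_category: int = 3) -> str:
    """Construct a prompt asking for ideas grouped into distinct categories."""
    lines = [
        f"Generate {per_category * len(categories)} ideas for: {topic}.",
        f"Organize them into {len(categories)} distinct categories "
        f"({per_category} ideas per category).",
        "Categories must represent meaningfully different approaches:",
    ]
    lines += [f"- {c}" for c in categories]
    return "\n".join(lines)

prompt = build_spread_prompt(
    "reducing churn in a B2B SaaS product",
    ["pricing changes", "onboarding fixes", "community building", "product integrations"],
)
print(prompt)
```

The point of the structure is that diversity is enforced by the request itself rather than hoped for in the output.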
2) Define what "good" means
Quality is not generic. Tell the model how you will judge it.
Example: "Each idea must be feasible in 30 days with a 2-person team, and must include a clear user, a pain point, and a testable first experiment." Constraints reduce fluff.
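One way to keep the quality bar explicit is to attach a rubric to every ideation prompt. A small sketch, where the criteria values are just the examples from above and entirely adjustable:

```python
# Sketch: embed judging criteria directly in the prompt, so "quality"
# is operationalized rather than left generic. Criteria are examples.

CRITERIA = {
    "feasibility": "buildable in 30 days by a 2-person team",
    "user": "names a specific user",
    "pain_point": "states a concrete pain point",
    "experiment": "includes a testable first experiment",
}

def with_criteria(base_prompt: str, criteria: dict[str, str]) -> str:
    """Append an explicit rubric that every idea must satisfy."""
    rubric = "\n".join(f"- {name}: {rule}" for name, rule in criteria.items())
    return f"{base_prompt}\n\nEach idea must satisfy all of:\n{rubric}"

print(with_criteria("Generate 8 ideas for reducing onboarding drop-off.", CRITERIA))
```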
3) Use novelty pressure carefully
You can ask for non-obviousness without demanding impossible originality.
Example: "Avoid common suggestions like A, B, and C. Prefer ideas that combine two unrelated domains." Then list the clichés you already know.
4) Make it critique itself, then regenerate
Two-step loops are powerful:
- Step 1: Generate candidates.
- Step 2: Score them against your criteria and explain weaknesses.
- Step 3: Regenerate improved versions based on the critique.
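The three steps above can be sketched as a loop. `call_model` here is a placeholder stub, not a real API; swap in whatever LLM client you actually use. The loop structure, not the stub, is the point:

```python
# Sketch of the generate -> critique -> regenerate loop described above.
# `call_model` is a stand-in stub, not a real LLM API.

def call_model(prompt: str) -> str:
    """Placeholder for an actual LLM call; replace with your client of choice."""
    return f"[model response to: {prompt[:40]}...]"

def ideate_with_critique(task: str, criteria: str, rounds: int = 2) -> str:
    # Step 1: generate initial candidates.
    ideas = call_model(f"Generate 10 candidate ideas for: {task}")
    for _ in range(rounds):
        # Step 2: score candidates against explicit criteria and surface weaknesses.
        critique = call_model(
            f"Score these ideas against the criteria ({criteria}) "
            f"and explain each one's weaknesses:\n{ideas}"
        )
        # Step 3: regenerate improved versions based on the critique.
        ideas = call_model(
            f"Rewrite the strongest ideas, fixing the weaknesses noted:\n{critique}"
        )
    return ideas
```

Running the critique step with the criteria spelled out is what separates this from simply asking twice.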
This is one reason researchers often see better results than casual users. They run a process, not a single prompt.
A practical workflow: AI for breadth, humans for taste and context
If you want a simple division of labor that matches Mollick's research framing, try this:
- Use AI to generate a wide search space (diverse directions).
- Use humans to apply context (what fits your market, brand, audience, constraints).
- Use AI again to elaborate the best few (turn sketches into test plans).
Think of AI as a multiplier for exploration. You still need human judgment to pick the right hill to climb.
Example: content ideas for a B2B newsletter
Prompting for diversity might produce:
- Contrarian takes (challenge a common assumption)
- Case breakdowns (one company, one lesson)
- Tactical playbooks (step-by-step)
- Data-driven mythbusting (small original analysis)
Even if only 2 out of 12 are strong, you now have options you might not have reached in a rushed meeting.
Where skeptics are still right (and how to avoid the traps)
Mollick is arguing against a blanket claim, not saying AI is always creative in the way humans mean it. A few pitfalls are real:
- Genericness: If your prompt is generic, your output will be generic.
- Hidden repetition: Models can produce ideas that feel different but share the same underlying structure.
- Context blindness: AI may miss constraints that are obvious inside your organization.
- Over-trust: Quantity can create false confidence.
The fix is not to abandon AI. It is to add evaluation.
How to evaluate idea quality without fooling yourself
If you want to apply the research spirit in real work, borrow these lightweight checks:
- Blind ranking: Remove labels and rank ideas without knowing which came from AI or a person.
- Diversity audit: Group ideas by theme and count how many clusters you actually have.
- Feasibility pass: For top ideas, require a first experiment and success metric.
- "One step to action": If an idea cannot be turned into a next step, it is not ready.
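Two of these checks, blind ranking and the diversity audit, are easy to make concrete. A minimal sketch, where theme assignment is a trivial keyword match and the sample ideas are hypothetical:

```python
# Sketch: blind ranking prep (strip source labels, shuffle before review)
# and a diversity audit (count how many theme clusters the pool covers).
# Keyword-based theme matching is purely illustrative.

import random

def blind_shuffle(ideas: list[tuple[str, str]], seed: int = 0) -> list[str]:
    """Drop source labels ('ai' / 'human') and shuffle so reviewers rank blind."""
    texts = [text for _source, text in ideas]
    random.Random(seed).shuffle(texts)
    return texts

def diversity_audit(ideas: list[str], themes: dict[str, list[str]]) -> dict[str, int]:
    """Count how many ideas fall into each theme cluster via keyword match."""
    counts = {theme: 0 for theme in themes}
    for idea in ideas:
        for theme, keywords in themes.items():
            if any(k in idea.lower() for k in keywords):
                counts[theme] += 1
                break
    return counts

pool = [
    ("ai", "Run a pricing experiment with annual discounts"),
    ("human", "Interview churned users about onboarding friction"),
    ("ai", "Launch a community Slack for power users"),
]
blind = blind_shuffle(pool)
clusters = diversity_audit(blind, {
    "pricing": ["pricing", "discount"],
    "onboarding": ["onboarding"],
    "community": ["community"],
})
print(clusters)  # each theme appears once for this sample pool
```

If most ideas land in one cluster, you have volume, not diversity, and that is exactly the failure mode the audit is meant to catch.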
Key insight: The goal is not to prove AI is creative. The goal is to reliably surface better options.
Why this connects to LinkedIn content and content strategy
Mollick's post went viral because it compresses a counterintuitive research result into a clean, debatable claim. That is classic LinkedIn content: one sharp point, clear stakes, and a hint of evidence (plus a paper link).
For creators and teams, the lesson is bigger than AI. It is about process. If you treat ideation like a repeatable system, you can produce better work more consistently.
- Use AI to generate angles for posts.
- Use your expertise to select, fact-check, and add lived experience.
- Use iteration to refine hooks, examples, and structure.
Viral posts are rarely accidents. They are often the visible output of good idea generation habits.
Closing thought
When Ethan Mollick says GPT-4 can be prompted to generate more diverse and higher quality ideas than most people, I hear a challenge: stop arguing about whether AI can be creative in theory, and start learning how to run better ideation in practice.
If you pair structured prompting with human judgment, AI becomes less like a gimmick and more like a serious tool for exploration. And if newer models keep improving, the competitive advantage will shift from "having AI" to "knowing how to direct it."
This blog post expands on a viral LinkedIn post by Ethan Mollick, Associate Professor at The Wharton School and author of Co-Intelligence.