
Prateek Joshi on the Power of Repeating Prompts


A deeper look at Prateek Joshi's prompt repetition tip, why it works in LLMs, and when to use it for better accuracy.


Prateek Joshi recently shared something that caught my attention: researchers at Google found that "simply repeating your prompt twice can dramatically improve LLM performance" - even reporting jumps like "21% to 97%" on specific search tasks.

That is the kind of claim that makes you pause, because it is both oddly simple and deeply technical. And the more I sit with it, the more it feels like a perfect example of what prompt engineering still is in 2026: not magic words, but small, testable changes that align better with how models actually process text.

Below, I want to expand on what Prateek pointed out, add some context about why repetition helps, and share practical ways to use this technique without fooling yourself with cherry-picked wins.

The idea: give the model a second pass

Prateek explained the core mechanism in a way I rarely see stated so clearly: most large language models read prompts left-to-right using causal attention. In other words, when the model is processing early tokens, it has not yet seen the later tokens.

So what happens if you duplicate the input?

If your prompt is structured like: "[Task] [Task]", the second copy can attend to the full context of the first.

That matters because the second repetition is no longer "early" in the overall sequence. By the time the model reaches it, the relevant constraints, definitions, and examples from the first copy are already in its attention window.

This is different from asking the model to "think twice" or "reflect" after generating an answer. It is a structural trick: you are changing what information is available during the model's internal processing of the prompt.
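To make the "[Task] [Task]" structure concrete, here is a minimal sketch. The helper name and separator are my own choices, not from the post; the point is simply that repetition is a string-level transformation applied before the model ever sees the prompt.

```python
def repeat_prompt(task: str, times: int = 2, separator: str = "\n\n") -> str:
    """Duplicate the full task text so the later copy can attend to the earlier one.

    This changes only the prompt string; the model's causal attention does the
    rest during prefill, when the second copy can "see" the entire first copy.
    """
    return separator.join([task] * times)


prompt = repeat_prompt("Find the row where revenue exceeds 1M and return its id.")
# The second copy of the task is no longer "early" in the overall sequence.
```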

Why causal attention makes repetition surprisingly effective

Let me rephrase Prateek's intuition in a more concrete way.

When you put instructions at the top of a prompt, you are betting that the model will retain those instructions as it moves forward into details, edge cases, and noisy context. But attention is a finite resource. Even with modern architectures, models can drift as the prompt grows.

Repeating the task can help in three ways:

  1. Constraint reinforcement: The second statement of the task restates the rules right when the model is about to decide what to do.

  2. Disambiguation after context: Sometimes the context clarifies what the task really means. By repeating the task after the context (or by repeating the whole block), you let the later tokens reinterpret the ask with more clarity.

  3. Error correction during prefill: During the prompt ingestion (often called "prefill"), the model forms internal representations before it generates any output token. If the second copy yields a cleaner representation of the task, the generation phase starts from a better place.

Prateek also noted an important practical detail: if the repetition happens in prefill, it can be parallelizable on modern serving stacks. That means it may add tokens (and thus some cost), but it does not necessarily add much generation latency the way long chain-of-thought style prompting can.

What the benchmarking claim suggests (and what it does not)

Prateek highlighted that the technique was tested across 7 major models (he name-checked Gemini, GPT-4o, Claude 3, DeepSeek) and 7 benchmarks, with a striking summary: it "won" 47 out of 70 tests with 0 losses.

That kind of result is a strong signal that we are not looking at a model-specific quirk. But it is still worth interpreting carefully:

  • A "win" can mean very different things depending on the task. Search and retrieval-style benchmarks are often brittle and instruction-sensitive, so prompt structure changes can swing accuracy a lot.
  • "0 losses" sounds incredible, but it may depend on how losses are defined (for example, non-significant differences may be counted as ties rather than losses).
  • The biggest gains usually show up where the baseline prompt is under-specified or easy to misread.

So I take the takeaway as: repetition is a low-effort baseline to try, not a guarantee.

How to use prompt repetition in practice

If you want to apply what Prateek shared, here are a few patterns that are easy to test.

Pattern 1: Repeat the instruction block, not the entire prompt

This is the simplest, lowest-token version.

  • Put your instructions at the top.
  • Provide context.
  • Repeat the same instructions again right before the output format.

Example:

Instruction: Summarize the document into 5 bullets. Keep each bullet under 12 words.

Context: [your text]

Instruction (repeat): Summarize the document into 5 bullets. Keep each bullet under 12 words.

You get the "second pass" effect without duplicating a huge context window.
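The instruction-sandwich layout above is easy to express as a small template function. This is a sketch (the function name and labels are mine, not a standard API), but it shows why the token overhead stays small: only the instruction block is duplicated, never the context.

```python
def build_sandwich_prompt(instruction: str, context: str) -> str:
    """Instruction -> context -> repeated instruction ("sandwich" layout).

    Only the short instruction block appears twice, so the extra token cost
    is roughly the length of the instruction, even for very large contexts.
    """
    return (
        f"Instruction: {instruction}\n\n"
        f"Context:\n{context}\n\n"
        f"Instruction (repeat): {instruction}"
    )
```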

Pattern 2: Repeat the task with a tighter phrasing the second time

Sometimes the best second repetition is more explicit.

First instruction (broader): "Extract the key entities and their roles."

Second instruction (tight): "Return JSON with fields: person, organization, role, evidence_quote."

You still respect the spirit of Prateek's point (restating the task later), while using the second pass to remove ambiguity.
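A minimal sketch of this broad-then-tight variant (helper name and layout are my own): the first statement sets the goal, and the second, placed after the context, pins down the output format.

```python
def build_refined_prompt(broad: str, context: str, tight: str) -> str:
    """State the task broadly before the context, then restate it in a
    stricter, format-specific form afterwards, where the model has already
    seen the full input."""
    return f"{broad}\n\nContext:\n{context}\n\n{tight}"


prompt = build_refined_prompt(
    broad="Extract the key entities and their roles.",
    context="Alice Chen joined Acme Corp as CTO last March.",
    tight="Return JSON with fields: person, organization, role, evidence_quote.",
)
```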

Pattern 3: Repeat the user query after tool outputs or retrieved passages

For RAG systems, this can be especially helpful.

Flow:

  1. User question
  2. Retrieved documents
  3. Repeat user question (verbatim)
  4. Answer with citations

That helps keep the model anchored to the original question instead of drifting into retrieval noise. In my experience, many "hallucinations" in RAG are really goal drift: the model answers the question the passages suggest rather than the one the user asked.
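The four-step flow above can be sketched as a prompt builder. This is an illustrative layout, not a specific framework's API; the key detail is that the question is repeated verbatim after the retrieved passages.

```python
def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Question -> retrieved passages -> verbatim question repeat -> answer
    request with citations, matching the flow: 1) user question, 2) documents,
    3) repeated question, 4) cited answer."""
    docs = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        f"Question: {question}\n\n"
        f"Retrieved passages:\n{docs}\n\n"
        f"Question (repeated verbatim): {question}\n\n"
        "Answer using only the passages above. Cite passage numbers like [1]."
    )
```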

When repetition might not help (or could hurt)

A technique this simple invites overuse. Here are a few constraints to keep in mind.

Token limits and cost still matter

Even if prefill is parallelizable, extra tokens are still extra tokens. If you duplicate large contexts, you can:

  • push out important earlier content due to context window limits
  • pay more per request
  • increase time-to-first-token in some deployments

So prefer repeating the instruction block unless you have evidence that duplicating more improves accuracy.

Do not confuse repetition with better evaluation

If you test this, do not just eyeball one or two examples. Track outcomes:

  • define a scoring rule (exact match, rubric, human rating)
  • run enough samples to avoid randomness
  • compare to a baseline prompt that is already well-written

Otherwise, you may attribute a win to repetition when the real win is simply clearer instructions.
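A simple A/B harness makes this comparison honest. Everything here is a sketch under assumptions: `model_fn` stands in for your actual LLM call, and `scorer` is whatever scoring rule you defined (exact match, rubric, human rating mapped to 0/1).

```python
def evaluate(prompt_builder, cases, model_fn, scorer):
    """Mean score of one prompt variant over many test cases.

    Each case is {"input": ..., "expected": ...}; `scorer` should return
    1.0 for a correct answer and 0.0 otherwise.
    """
    scores = [
        scorer(model_fn(prompt_builder(case["input"])), case["expected"])
        for case in cases
    ]
    return sum(scores) / len(scores)


def compare(baseline_builder, repeated_builder, cases, model_fn, scorer):
    """Run the baseline and the repeated-instruction variant on the same cases."""
    return {
        "baseline": evaluate(baseline_builder, cases, model_fn, scorer),
        "repeated": evaluate(repeated_builder, cases, model_fn, scorer),
    }
```

Run it on 30 to 100 cases per variant; a gap smaller than the run-to-run noise of your scorer is a tie, not a win.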

Some tasks need diversity, not reinforcement

Brainstorming and creative writing can get worse when you over-constrain the model. Repetition can increase compliance, which is great for extraction and classification, but can reduce novelty for ideation.

Why this kind of insight goes viral

Prateek ended with a line I agree with: it is amazing we are still discovering low-hanging fruit in prompt engineering.

This also explains the post's appeal as a piece of LinkedIn content. It combines:

  • a counterintuitive claim (repeat the prompt)
  • a memorable metric (21% to 97%)
  • a mechanism explanation (causal attention, second pass)
  • broad validation (multiple models, multiple benchmarks)

That is a repeatable content strategy: make a simple action feel justified by a concrete mechanism and credible testing. When you can do that, you create something people can try immediately, then share because it worked for them.

A quick checklist to try today

If you want a practical starting point inspired by Prateek's post, here is a fast experiment plan:

  1. Pick one workflow (classification, extraction, RAG Q&A).
  2. Create a baseline prompt.
  3. Create a repeated-instruction version.
  4. Run 30 to 100 test cases.
  5. Track accuracy and failure modes (format errors, missed constraints, wrong citations).

If you see improvement, keep the repetition. If you do not, you learned something equally valuable: your task may not be sensitive to this particular alignment trick.

Sometimes the best prompt engineering is not adding clever words. It is matching the prompt structure to the model's reading behavior.

This blog post expands on a viral LinkedIn post by Prateek Joshi (Infra Investing at Moxxie Ventures, author of 13 AI books, Nvidia alum, recovering founder).