David Arnoux Automates Invoice Reconciliation with AI
A practical breakdown of David Arnoux's viral workflow using Claude, Gmail MCP, and Vision AI to end manual invoice processing.
David Arnoux recently shared something that caught my attention: "still managing invoices manually in 2026???" Then he followed with the part that will feel painfully familiar if you have ever closed the books at the end of a month: downloading invoice PDFs from email one by one, typing numbers into spreadsheets, and hoping you did not miss the one invoice that comes back to haunt you at tax time.
I want to expand on what David is pointing to here, because it is bigger than a clever automation. It is a pattern for turning a repetitive back-office chore into a lightweight, reliable system that runs in the background and only asks for human attention when it truly needs it.
Key insight: stop treating invoices as documents you process, and start treating them as signals your systems can detect, extract, validate, and file.
The real problem David is calling out
David Arnoux explained that even with a "smart" neobank and a stack full of integrations, the last mile is still manual: invoices arrive in messy formats, scattered across threads, vendors change templates, and someone has to reconcile what got paid with what got received.
That last mile is where time disappears. Not because any single invoice is hard, but because the work is fragmented:
- Find the email
- Download the PDF
- Open it
- Read the fields
- Copy amounts, dates, invoice numbers
- Confirm vendor identity
- Attach the document somewhere
- Repeat 20-200 times
David's claim is simple: you can eliminate most of this with Claude connected to Gmail via MCP, plus Vision AI that reads invoices "like a human would".
Why connecting AI to Gmail changes the game
When an AI assistant is only a chat window, it helps you think. When it can connect to your inbox, it can help you operate.
As David described it, Claude can connect directly to Gmail via MCP and "watch for invoice patterns" instead of waiting for you to forward attachments or paste content. This matters because invoices are not a one-time task. They are an ongoing stream.
The moment you frame invoices as a stream, you can build an assembly line:
- Detect incoming invoice emails
- Capture attachments
- Read the document
- Extract structured data
- Validate against rules
- Post to your ledger
- File the source of truth
- Send a summary and queue exceptions
That is exactly the workflow David outlined.
The workflow, expanded into a practical system
David Arnoux said the system extracts vendor, amount, date, invoice number, and even line items with 95%+ accuracy, then validates, saves to bookkeeping software, archives PDFs to Google Drive by vendor and month, flags ambiguous items, and sends a daily summary.
Let’s break that into components you can actually implement and maintain.
1) Invoice detection rules (your "secret sauce")
David called out the "secret": detection rules. You teach the system your vendor patterns once, and then it runs.
This is the part most people skip. They jump straight to OCR and extraction, but the highest leverage is deciding what counts as an invoice and where to look for it.
Good detection rules typically use a mix of:
- Sender domain (for example, @vendor.com)
- Subject keywords (invoice, receipt, statement, billing)
- Attachment types (PDF)
- Common invoice identifiers (INV-, Invoice #, Bill No.)
- Known vendor display names that vary ("AWS", "Amazon Web Services")
Practical tip: avoid overly strict detection rules. As David noted, it is often better to flag false positives than to miss real invoices.
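As a rough illustration of the signals above, here is a minimal Python sketch that scores an email and routes weak matches to review rather than dropping them. The `Email` shape, the vendor domains, and the thresholds are all assumptions for the example, not part of David's setup.

```python
import re
from dataclasses import dataclass

# Hypothetical rule values for illustration; replace with your own vendors.
VENDOR_DOMAINS = {"vendor.com", "aws.amazon.com"}
SUBJECT_KEYWORDS = ("invoice", "receipt", "statement", "billing")
INVOICE_ID_PATTERN = re.compile(r"(INV-\d+|Invoice\s*#|Bill\s+No\.)", re.IGNORECASE)


@dataclass
class Email:
    sender: str
    subject: str
    body: str
    attachment_names: list[str]


def invoice_score(email: Email) -> int:
    """Count how many independent signals suggest this email is an invoice."""
    score = 0
    domain = email.sender.rsplit("@", 1)[-1].lower()
    if domain in VENDOR_DOMAINS:
        score += 1
    if any(word in email.subject.lower() for word in SUBJECT_KEYWORDS):
        score += 1
    if any(name.lower().endswith(".pdf") for name in email.attachment_names):
        score += 1
    if INVOICE_ID_PATTERN.search(email.subject + "\n" + email.body):
        score += 1
    return score


def classify(email: Email) -> str:
    """Lenient on purpose: a single weak signal goes to review, not the bin."""
    score = invoice_score(email)
    if score >= 2:
        return "invoice"
    if score == 1:
        return "review"
    return "ignore"
```

The "review" bucket is what keeps lenient rules cheap: a false positive costs a glance, while a missed invoice costs a reconciliation problem later.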
2) Automated download and document handling
Once detection triggers, the system should download PDFs and normalize them:
- Rename files consistently (Vendor - YYYY-MM-DD - InvoiceNumber.pdf)
- Store the original file unchanged
- Create a derived text or image representation for extraction if needed
This is where reliability comes from. If you can always find the source document, you can always audit.
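The naming convention above could be sketched like this. The function name and the set of characters treated as unsafe are assumptions; adjust them to wherever the files end up.

```python
import re
from datetime import date


def invoice_filename(vendor: str, invoice_date: date, invoice_number: str) -> str:
    """Build 'Vendor - YYYY-MM-DD - InvoiceNumber.pdf' with filesystem-safe parts."""

    def clean(part: str) -> str:
        # Replace characters that are awkward in file names, then collapse whitespace.
        part = re.sub(r'[\\/:*?"<>|]', "-", part)
        return re.sub(r"\s+", " ", part).strip()

    return f"{clean(vendor)} - {invoice_date.isoformat()} - {clean(invoice_number)}.pdf"
```

Because the ISO date sits in the middle of the name, files for a single vendor sort chronologically in any file listing.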
3) Vision-based extraction that works across templates
David mentioned using Claude Vision to read the invoice like a human. In practice, that means your extraction prompt (or workflow instructions) should be explicit about:
- Which fields to capture
- How to handle missing fields
- How to output data in a strict schema (JSON or a table)
- How to treat taxes, discounts, shipping, and multiple currencies
- How to parse line items when present
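One way to enforce a strict schema is to reject any model output that does not match it before anything else runs. The sketch below assumes the extraction step returns a JSON object as text; the field names and the split between required and optional fields are illustrative choices, not something David specified.

```python
import json

# Hypothetical schema: which keys must be present and which may be null.
REQUIRED_FIELDS = ("vendor", "invoice_number", "invoice_date", "total", "currency")
OPTIONAL_FIELDS = ("due_date", "subtotal", "tax_total", "line_items")


def parse_extraction(raw: str) -> dict:
    """Parse model output and fail loudly if it does not match the schema."""
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    unknown = set(data) - set(REQUIRED_FIELDS) - set(OPTIONAL_FIELDS)
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    missing = [f for f in REQUIRED_FIELDS if data.get(f) in (None, "")]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    # Normalize absent optional fields to None so downstream code sees one shape.
    return {**{f: None for f in OPTIONAL_FIELDS}, **data}
```

Any `ValueError` here is a natural trigger for sending the invoice to human review instead of retrying silently.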
A robust approach is to extract:
- Vendor legal name
- Vendor address (optional)
- Invoice date
- Due date
- Invoice number
- Subtotal
- Tax total
- Total amount due
- Currency
- Line items (description, quantity, unit price, line total)
And then compute checks:
- Does subtotal + tax equal the total, allowing for discounts, shipping, and rounding?
- Is the currency expected for that vendor?
- Is the invoice date in a plausible range?
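The three checks above could be sketched as follows. The `Invoice` shape, the per-vendor currency table, the one-cent tolerance, and the date window are assumptions for illustration. Discounts and shipping are left out to keep the example short.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from decimal import Decimal


@dataclass
class Invoice:
    vendor: str
    invoice_number: str
    invoice_date: date
    subtotal: Decimal
    tax_total: Decimal
    total: Decimal
    currency: str


# Hypothetical per-vendor expectations.
EXPECTED_CURRENCY = {"Acme GmbH": "EUR"}


def validate(inv: Invoice, today: date, tolerance: Decimal = Decimal("0.01")) -> list[str]:
    """Return a list of issue codes; an empty list means all checks passed."""
    issues = []
    # Decimal avoids float rounding noise when comparing money amounts.
    if abs(inv.subtotal + inv.tax_total - inv.total) > tolerance:
        issues.append("totals_mismatch")
    expected = EXPECTED_CURRENCY.get(inv.vendor)
    if expected is not None and inv.currency != expected:
        issues.append("unexpected_currency")
    # Assumed plausible window: up to a year old, at most a week in the future.
    earliest = today - timedelta(days=365)
    latest = today + timedelta(days=7)
    if not (earliest <= inv.invoice_date <= latest):
        issues.append("implausible_date")
    return issues
```

Returning issue codes rather than a single pass/fail flag lets a reviewer see why an invoice was flagged.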
4) Validation and the review queue
This is the safety net David referenced when he said the system "flags anything ambiguous for quick review".
Treat the review queue as a product:
- Show the extracted values next to the invoice image
- Highlight low-confidence fields
- Provide one-click actions: approve, edit, reject, mark as duplicate
- Track reasons for flags so you can improve rules later
If you do this well, "tax season chaos" really can become a short weekly review. You are not removing human oversight; you are concentrating it where it matters.
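Two of the review-queue ideas above, highlighting low-confidence fields and tracking flag reasons, can be sketched in a few lines. This assumes your extraction step produces a per-field confidence score between 0 and 1, which is an assumption; you may need to derive one yourself. The 0.8 threshold and the item shape are hypothetical.

```python
from collections import Counter

LOW_CONFIDENCE = 0.8  # hypothetical threshold; tune it against your own reviews


def low_confidence_fields(confidences: dict[str, float],
                          threshold: float = LOW_CONFIDENCE) -> list[str]:
    """Names of fields a reviewer should look at first, in alphabetical order."""
    return sorted(name for name, score in confidences.items() if score < threshold)


def tally_flag_reasons(review_items: list[dict]) -> Counter:
    """Count how often each flag reason occurs across the queue."""
    reasons = Counter()
    for item in review_items:
        reasons.update(item["reasons"])
    return reasons
```

Calling `most_common()` on the tally points at the rule most worth fixing next.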
5) Posting to bookkeeping software (Sheets, QuickBooks, Xero)
David listed Google Sheets, QuickBooks, or Xero as targets. The key is to pick one system as your ledger of record and keep the others as operational views.
Common posting fields:
- Vendor
- Account/category mapping
- Amount
- Tax
- Date
- Due date
- Memo
- Attachment link (Drive URL)
A strong pattern is to store the extracted data in a simple table first (like Sheets), then sync approved items to QuickBooks or Xero. That keeps automation flexible while reducing the risk of messy postings.
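A minimal sketch of that staging pattern, using CSV text as a stand-in for a Sheets tab. The column names follow the posting fields above plus a `status` column, which is an assumption. The actual sync to QuickBooks or Xero is deliberately left out, since it depends on their APIs.

```python
import csv
import io

# Posting fields from the list above, plus a hypothetical review status column.
COLUMNS = ["vendor", "category", "amount", "tax", "date",
           "due_date", "memo", "attachment_link", "status"]


def to_staging_csv(rows: list[dict]) -> str:
    """Serialize staged rows to CSV text with a fixed column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def approved_rows(rows: list[dict]) -> list[dict]:
    """Only rows a human (or a trusted rule) approved should reach the ledger."""
    return [row for row in rows if row["status"] == "approved"]
```

Keeping the approval filter separate from the export means the staging table can hold everything, while the ledger only ever receives reviewed entries.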
6) Filing PDFs in Drive by vendor and month
David emphasized a Google Drive structure organized by vendor and month. That is not just tidy, it is operationally important:
- You can audit quickly
- You can share with accountants
- You can train detection rules from historical folders
- You can recover from mistakes
A simple structure:
- Invoices/
  - Vendor Name/
    - 2026-01/
    - 2026-02/
  - Vendor Name/
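The vendor-and-month layout can be derived from the extracted data. This is a sketch with an assumed function name and root folder. Note that Google Drive addresses folders by ID through its API, so the path here is only a logical key you would resolve to real folders yourself.

```python
from datetime import date
from pathlib import PurePosixPath


def archive_folder(vendor: str, invoice_date: date, root: str = "Invoices") -> PurePosixPath:
    """Logical archive location: <root>/<vendor>/<YYYY-MM>."""
    return PurePosixPath(root) / vendor.strip() / f"{invoice_date:%Y-%m}"
```

For example, `archive_folder("Acme", date(2026, 1, 15))` gives `Invoices/Acme/2026-01`.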
7) Daily summaries that build trust
If the automation runs silently, people do not trust it. The daily summary David mentioned solves that.
A useful summary includes:
- Count of invoices processed
- Total amount captured
- Vendors processed
- Items in review queue
- Any anomalies (new vendor, currency change, unusually high amount)
Over time, these summaries become a lightweight internal control.
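A daily summary along those lines could be assembled like this. The invoice dictionaries and their keys are assumptions for the example. Totals are grouped by currency so that amounts in different currencies are never added together.

```python
from decimal import Decimal


def daily_summary(processed: list[dict], review_queue: list[dict]) -> str:
    """Build a plain-text summary of the day's run."""
    totals: dict[str, Decimal] = {}
    for inv in processed:
        currency = inv["currency"]
        totals[currency] = totals.get(currency, Decimal("0")) + inv["total"]
    vendors = sorted({inv["vendor"] for inv in processed})
    total_text = ", ".join(f"{amount} {cur}" for cur, amount in sorted(totals.items()))
    lines = [
        f"Invoices processed: {len(processed)}",
        "Total captured: " + (total_text or "none"),
        "Vendors: " + (", ".join(vendors) or "none"),
        f"In review queue: {len(review_queue)}",
    ]
    return "\n".join(lines)
```

The anomaly line from the list above is omitted here; it would draw on whatever flag reasons your validation step records.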
Time-to-build versus time saved (and why it matters)
David Arnoux shared concrete numbers: time to build of around 2 hours, time saved of 3-6 hours per month, and an end state of "zero manual invoice processing".
At those figures, the build pays for itself within the first month. Even if your real-world results are more modest, the ROI is compelling because the workflow compounds:
- Every new vendor rule reduces future work
- Every reviewed exception improves your detection
- Every month of clean data reduces year-end cleanup
What to watch out for: security, access, and failure modes
A system that reads your inbox and touches your accounting data deserves basic guardrails:
- Use least-privilege access for Gmail and Drive
- Keep an audit log of what was read, what was extracted, and what was posted
- Separate "extract" from "post" with approvals for higher-risk amounts
- Handle duplicates (invoice resent, updated invoices, credit notes)
- Define what happens when extraction confidence is low
This is how you keep speed without losing control.
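Two of these guardrails, duplicate handling and approvals for higher-risk amounts, can be sketched as a single routing decision. The threshold, the action names, and the in-memory `seen` set are assumptions; a real setup would persist seen invoices and treat credit notes and updated invoices separately.

```python
from decimal import Decimal

APPROVAL_THRESHOLD = Decimal("1000")  # hypothetical cutoff for manual approval


def dedupe_key(vendor: str, invoice_number: str) -> tuple[str, str]:
    """Normalize vendor and invoice number so resent invoices compare equal."""
    return (vendor.strip().casefold(), invoice_number.strip().upper())


def next_action(vendor: str, invoice_number: str, total: Decimal,
                seen: set[tuple[str, str]]) -> str:
    """Decide what happens to an extracted invoice before anything is posted."""
    key = dedupe_key(vendor, invoice_number)
    if key in seen:
        return "flag_duplicate"
    seen.add(key)
    if total >= APPROVAL_THRESHOLD:
        return "needs_approval"
    return "auto_post"
```

Checking for duplicates before the amount check means a resent high-value invoice is flagged once rather than queued for approval twice.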
Why this post went viral (and what to learn from it)
David's post works because it pairs a relatable pain with a specific, achievable outcome. It is not "AI will change accounting". It is "stop downloading PDFs and typing numbers".
It also provides a clear playbook: Gmail MCP setup, detection rules, Vision prompt, connections to Sheets and QuickBooks or Xero, Drive structure, review queue, and a warning about overly strict rules.
If you are building your own LinkedIn content, there is a lesson here in content strategy too: real story, clear metrics, and an actionable system. That combination is why LinkedIn content can spread fast, and why viral posts often come from highly specific workflows that readers can imagine themselves using tomorrow.
This blog post expands on a viral LinkedIn post by David Arnoux, Helping GTM Leaders & Founders Grow With GTM x AI | Fractional CxO | Building Linkedin Tools @ humanoidz.ai.