I spent this morning running an audit on my own audit funnel.
I didn't plan to. I was answering a question I thought I already knew the answer to — how is VibeTokens doing? — and the number I opened the conversation with turned out to be a story I'd been telling myself for two weeks. Twenty-four hundred dollars a month from three paying clients, I said. Then I pulled Stripe.
Zero.
Not "low." Not "soft." Zero paying subscriptions. One lifetime invoice for six hundred and forty dollars, paid in March, from a friend who was doing me a favor. The rest was phantom revenue I'd written into a status file on March 27 and copy-pasted forward into six other files without ever checking the source of truth.
That was the beginning of the autopsy, not the end of it.
The Sample Size I Invented
The file that convinced me we had traction said there were fifteen audits in the VibeTokens funnel, a twenty percent conversion rate, three paying clients. I had the Notion database open in one tab and the Stripe dashboard in another, and I was still quoting the status file.
So I wrote a script. Hit the Notion API, pulled every row from the VT Audits database, and filtered by the email address on the record.
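A minimal sketch of that kind of script, using the official Notion client. The token, the database ID, and the Email property name are stand-ins for whatever the real workspace uses:

```typescript
// Count funnel rows and split out the ones that are just me testing.
// NOTION_TOKEN, VT_AUDITS_DB_ID, and the "Email" property are stand-ins.
import { Client } from "@notionhq/client";

const notion = new Client({ auth: process.env.NOTION_TOKEN });

async function auditTheFunnel(databaseId: string, myEmail: string) {
  const emails: string[] = [];
  let cursor: string | undefined;

  // Page through every row in the audits database.
  do {
    const page = await notion.databases.query({
      database_id: databaseId,
      start_cursor: cursor,
    });
    for (const row of page.results) {
      // Cast because the SDK returns a union of partial/full page objects.
      const email = (row as any).properties?.Email?.email as string | null;
      if (email) emails.push(email.toLowerCase());
    }
    cursor = page.has_more && page.next_cursor ? page.next_cursor : undefined;
  } while (cursor);

  const mine = emails.filter((e) => e === myEmail).length;
  console.log(`${emails.length} rows, ${mine} are me, ${emails.length - mine} are not`);
}

auditTheFunnel(process.env.VT_AUDITS_DB_ID!, "jasonmatthewmurphy@gmail.com").catch(console.error);
```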
Fifteen rows total. Twelve of them were me. jasonmatthewmurphy@gmail.com twelve times over, with twelve different business names I'd invented to test different parts of the pipeline. Murphys Plumbing Test. E2E Test Company. The Brow Fairy. Final Email Test. I'd been counting my own QA sessions as funnel activity.
Two of the remaining three were friends. They knew the product was half-built and submitted anyway because they love me and they wanted to help. They are not a conversion rate. They are a favor.
That left one. Exactly one real stranger had found /start on his own, filled out the form, and sat through the ninety-second audit pipeline without anyone holding his hand.
A sample size of one is not a conversion rate. A sample size of one is a test case. I had been running an entire business strategy against a test case and calling it data.
The Stranger's Audit
His name isn't mine to share. I'll call him the stranger. He runs a home service business in Texas. He submitted the audit on April 8 at 5:19 PM Central.
Before the audit modules ran, the intake chatbot asked him four questions. Here are three of his answers, verbatim, from the JSON I pulled out of Notion:
```json
{
  "services": ["lawn service"],
  "frustrations": "people finding but not calling",
  "readyToBuild": false
}
```
Three signals. Lawn service. People find him, they don't call. He is not ready to build a new site.
The audit generated forty-five seconds later recommended he create a dedicated page for kitchen remodeling. And bathroom renovation. And roofing. And general contractor services. The summary opened by benchmarking him against a local contractor with a 4.9-star Google rating and suggested he build a testimonials page. The five recommendations at the end of his report all began with variations of create and build.
Every single one of those recommendations contradicted something he had literally just typed.
The model that generated his keywords didn't know he ran a lawn service. It saw the business name, saw home service, and pattern-matched to general contracting. The summary module wrote its executive overview from the keyword output. The email module pulled the summary and shipped it. No step in the chain looked back at the intake form to ask whether the story it was telling matched the one the customer had told first.
By the time the email hit his inbox, the funnel had already failed. Not because of a bug in any single module — each one did exactly what it was written to do — but because the chain had no idea what a coherent audit looked like, so it couldn't recognize an incoherent one.
The Friends Who Were Testing
The two friends were the same story in different costumes.
One of them runs a tree service. His audit ran the site-health module against his domain; the PageSpeed API silently returned nulls. It ran the Google Business Profile module; the API returned `found: false`. The other three modules never produced files. The orchestrator marked the lead complete anyway. He got an email telling him his audit was ready. The report was blank. He clicked the payment link as a favor and nothing happened, because the checkout flow was half-built. He never said anything because he's my friend.
The other runs a salon. The GBP module pulled back a business record for a completely different salon at the same address. The summary module read that record and confidently opened with "Mavon Beauty has a significant brand identity crisis — your Google Business Profile shows up as 'Cosmetique West' instead of your real name." It was accusing her of a brand crisis that was actually my data-matching bug. The email fired. She was too polite to correct me.
Two out of two warm leads had received broken or wrong audits. One out of one real stranger had received an audit for the wrong business. The conversion rate I'd been quoting for two weeks wasn't low. It was unmeasured, because the pipeline I was measuring had never been stress-tested against anyone who would tell me the truth.
The Real Problem
I can show you the exact line of code where the funnel fails.
It's in `lib/audit/deliver.ts`. The function is seventy lines long. Line sixty-two pulls the summary JSON. Line seventy sends the email. Between those two lines there is nothing — no verification that the GBP the audit matched actually bears the submitted business name, no check that performance scores returned real numbers instead of nulls, no cross-reference between the recommendations the summary made and the readiness signals the intake captured, no alignment test between the keywords the model surfaced and the services the customer listed.
The delivery function trusts the pipeline completely. Every file it reads, it believes; every email it assembles, it ships. Whether the audit is brilliant or whether it has confidently matched the wrong company, the function can't tell the difference, and so the customer can't either until it's too late.
I'd built a system whose weakest link was unaware it was the weakest link.
What I'm Shipping
There is an obvious fix and I'm building it tonight.
An integrity gate in front of the delivery function. Before any email leaves, four checks run:
- The GBP module returned `found: true` AND the business name it matched is a fuzzy match for the one the customer submitted. If the Levenshtein distance exceeds fifty percent of the name's length, it's a different business. Flag.
- The site-health module returned a performance score that isn't null. If it's null, PageSpeed silently failed. Retry, don't ship.
- The summary's recommendations don't contain phrases like "create a new," "build a," or "new website" if the intake form said `readyToBuild: false`. If they do, the audit contradicts the customer. Flag.
- At least one of the customer's stated services fuzzy-matches at least one of the top five keywords the audit surfaced. If not, the pipeline audited the wrong vertical. Flag.
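Here's a sketch of the gate in TypeScript. The input and output shapes are stand-ins I made up for this post, not the pipeline's real types; the thresholds mirror the four checks above:

```typescript
// Stand-in shapes for the intake form and the module outputs. The real
// types live in the pipeline; these exist so the sketch compiles alone.
interface IntakeForm {
  businessName: string;
  services: string[];
  readyToBuild: boolean;
}

interface AuditOutput {
  gbp: { found: boolean; matchedName: string | null };
  siteHealth: { performanceScore: number | null };
  summary: { recommendations: string[] };
  keywords: string[]; // top keywords, best first
}

// Plain dynamic-programming Levenshtein distance.
function levenshtein(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    [i, ...new Array<number>(b.length).fill(0)],
  );
  for (let j = 0; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// "Fuzzy match" here means edit distance within half the longer string.
function fuzzyMatch(a: string, b: string): boolean {
  const x = a.trim().toLowerCase();
  const y = b.trim().toLowerCase();
  return levenshtein(x, y) <= Math.max(x.length, y.length) * 0.5;
}

const BUILD_PHRASES = ["create a new", "build a", "new website"];

export function integrityFlags(intake: IntakeForm, audit: AuditOutput): string[] {
  const flags: string[] = [];

  // 1. The GBP must exist and bear (roughly) the submitted business name.
  if (
    !audit.gbp.found ||
    !audit.gbp.matchedName ||
    !fuzzyMatch(audit.gbp.matchedName, intake.businessName)
  ) {
    flags.push("GBP missing or matched a different business");
  }

  // 2. A null performance score means PageSpeed silently failed.
  if (audit.siteHealth.performanceScore === null) {
    flags.push("site-health score is null: retry, don't ship");
  }

  // 3. No build-something recommendations when readyToBuild is false.
  if (!intake.readyToBuild) {
    const contradicts = audit.summary.recommendations.some((rec) =>
      BUILD_PHRASES.some((p) => rec.toLowerCase().includes(p)),
    );
    if (contradicts) flags.push("recommendations contradict readyToBuild: false");
  }

  // 4. At least one stated service must match a top-five keyword.
  const topFive = audit.keywords.slice(0, 5);
  const aligned = intake.services.some((s) => topFive.some((k) => fuzzyMatch(s, k)));
  if (!aligned) flags.push("keywords miss every stated service: wrong vertical");

  return flags; // empty means safe to ship
}
```

The gate returns a list of reasons instead of a boolean, so the email that routes to me can say exactly which check failed.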
Any flag routes the email to me instead of the customer, with a diff showing what the modules produced vs. what the customer actually said. I decide whether to re-run, correct by hand, or discard.
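The wiring inside the delivery function, between the read and the send, is a few lines. This is a trimmed illustration that reuses the integrityFlags sketch above and sends the flags in place of the full diff; the mailer is injected so the snippet stands alone, and every name in it is illustrative:

```typescript
// Hypothetical wiring for lib/audit/deliver.ts, between the line that
// reads the summary and the line that sends the email.
type Mailer = (to: string, subject: string, body: string) => Promise<void>;

export async function deliverWithGate(
  intake: IntakeForm,
  audit: AuditOutput,
  report: string,
  customerEmail: string,
  sendEmail: Mailer,
  myEmail = "jasonmatthewmurphy@gmail.com",
): Promise<void> {
  const flags = integrityFlags(intake, audit);
  if (flags.length > 0) {
    // Route to me with the reasons instead of shipping a wrong audit.
    await sendEmail(myEmail, "Audit flagged, not sent", flags.join("\n"));
    return;
  }
  await sendEmail(customerEmail, "Your audit is ready", report);
}
```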
In parallel I'm pausing the outbound outreach cron that was about to start pushing fresh cold leads into this funnel tonight. A one-day delay on cold outreach is cheap. A wrong-vertical audit landing in a stranger's inbox is a letter of introduction I can't take back.
The Lesson I'll Actually Remember
Every startup I've worked in has a version of this moment. The dashboard looks good, the numbers in the weekly update match the story in the founder's head, and the founder is spending his time optimizing a funnel that has never actually been tested by someone who isn't emotionally invested in its success.
I'd been doing it to myself for two weeks. I'd built a quality control layer for everyone else — clients, prospects, search engines — and exempted my own reporting from it. The number in my status file said one thing. Stripe said another. I kept reading the status file.
The rule I added to my global instructions tonight is short. Revenue comes from Stripe. Period. Never cite MRR, ARR, client count, or any dollar figure from a local file. If you're about to state a number, run the script first. I added a similar rule for the audit pipeline itself. Before any audit leaves the building, it has to pass a set of sanity checks that don't trust the model.
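The enforcement half of the revenue rule is also a script. A minimal sketch with the Stripe Node SDK, assuming active subscriptions in a single currency; annual plans just get divided by twelve:

```typescript
// "Run the script first": compute MRR from live Stripe data,
// never from a local status file.
import Stripe from "stripe";

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

async function liveMrr(): Promise<number> {
  let cents = 0;
  // The SDK auto-paginates list calls under for-await.
  for await (const sub of stripe.subscriptions.list({ status: "active", limit: 100 })) {
    for (const item of sub.items.data) {
      const amount = (item.price.unit_amount ?? 0) * (item.quantity ?? 1);
      if (item.price.recurring?.interval === "month") cents += amount;
      else if (item.price.recurring?.interval === "year") cents += Math.round(amount / 12);
    }
  }
  return cents / 100;
}

liveMrr()
  .then((mrr) => console.log(`MRR, from Stripe: $${mrr.toFixed(2)}`))
  .catch(console.error);
```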
The version of me that trusts the model is the version that ships wrong-vertical audits to strangers in Texas. The version that audits the model is the version that finds out in time.
If you run a small business and you want to see what this audit would tell you about yours — the intake is at /start and the pipeline takes about two minutes. The integrity gate will be live before I drive any new traffic to it. You'll get a report about your business, not one that a model guessed at.
And if you're a founder reading this and nodding — audit your own dashboard tonight. Pull the source of truth. Count the rows where the email isn't yours. Read what the strangers actually said. The number you walk away with is the number you actually have.
That's the one worth improving.
— Jason Murphy, VibeTokens
