

A CRO program without documentation is a memory game. Tests get repeated, hypotheses get forgotten, and every team change resets the organization's learning. Documentation transforms your testing activity from a series of one-off experiments into a compounding knowledge base, where every test builds on what came before and every new team member can immediately access the full history of what's been tried, what worked, and why. This guide provides ready-to-use templates and the practices that make documentation actually useful rather than just theoretically important.
Most CRO teams start with documentation intentions and abandon them. The reasons:
Too complex: A 15-field hypothesis template takes 45 minutes to fill out. Teams skip it when testing velocity is the priority.
Not searchable: Results stored in slide decks or scattered across email threads can't be searched. When you want to know "have we ever tested this?" you can't find out.
Not linked to outcomes: Documentation that captures what was tested but not what was decided or implemented becomes an orphaned archive.
No review cadence: Documentation that doesn't get read in regular team meetings becomes performative: filled out to satisfy a process, not to drive learning.
The solution: simpler templates, a single source of truth, and a regular cadence where documentation is actively used.
Use the hypothesis brief before launching any test. It forces clear thinking and creates a reference for post-test analysis.
HYPOTHESIS BRIEF
Test name: [Short descriptive name, e.g., "Checkout trust badge - return policy"]
Test ID: [Sequential number, e.g., CRO-047]
Date created: [YYYY-MM-DD]
Test owner: [Name]
Page / Element: [URL or element description]
Observation: [What data, user research, or qualitative signal prompted this hypothesis?] Example: "Heatmaps show 62% of visitors exit the checkout page at the payment step. Exit survey data suggests 28% cite 'unsure about return policy' as their reason."
Hypothesis: "We believe that [change] will [improve/reduce] [metric] for [audience segment] because [reason based on evidence]." Example: "We believe adding a '30-day free returns' badge adjacent to the payment button will reduce checkout abandonment by 10-15% for first-time buyers because the exit survey data indicates return policy uncertainty is a top objection at this step."
Control: [Description or screenshot URL]
Variant: [Description or screenshot URL]
Primary metric: [Single metric you'll use to determine the winner, e.g., checkout completion rate]
Secondary metrics: [Supporting metrics to track, e.g., add-to-cart rate, revenue per session]
Traffic allocation: [50/50 | 70/30 | other, with reason]
Target audience: [All visitors | new visitors only | mobile only | other segment]
Minimum sample size needed: [Calculate using a sample size calculator]
Planned runtime: [Minimum 2 weeks; calculated end date]
ICE/PIE Score:
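If you don't have a sample size calculator at hand, the "Minimum sample size needed" and "Planned runtime" fields can be approximated in a few lines. This is a minimal sketch, assuming a two-sided two-proportion z-test at 95% confidence and 80% power; the function names, the default `alpha` and `power`, and the even traffic split are illustrative assumptions, not part of the template. Only the 14-day floor comes from the brief itself.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant (normal approximation).

    baseline_rate: current conversion rate, e.g. 0.03 for 3%
    relative_lift: smallest relative change worth detecting, e.g. 0.10 for +10%
    """
    if not 0 < baseline_rate < 1:
        raise ValueError("baseline_rate must be between 0 and 1")
    if relative_lift == 0:
        raise ValueError("relative_lift must be non-zero")
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


def planned_runtime_days(n_per_variant, daily_visitors, variants=2, minimum_days=14):
    """Days needed to reach the sample size, never less than the 2-week minimum."""
    days = math.ceil(n_per_variant * variants / daily_visitors)
    return max(days, minimum_days)


# Example inputs are hypothetical: 3% baseline, +10% relative lift, 4,000 visitors/day.
n = sample_size_per_variant(0.03, 0.10)
print(n, planned_runtime_days(n, daily_visitors=4000))
```

Smaller expected lifts push the required sample up quickly, which is worth knowing before you commit to a planned end date.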
Use the results report when a test concludes. Fill it out within 48 hours of stopping the test.
TEST RESULTS REPORT
Test name: [Same as hypothesis brief]
Test ID: [Same as hypothesis brief]
Dates: [Start date] to [End date]
Duration: [X days]
Total visitors: [Per variant and total]
Result:
Primary metric outcome:
| | Control | Variant | Change |
|---|---|---|---|
| [Metric] | X% | Y% | +/- Z% |
Statistical confidence: ___% (target: 95%)
Revenue / business impact (if implemented): "At current traffic levels, implementing this variant is estimated to generate ₹___/month in additional revenue."
Secondary metrics summary: [Did any secondary metrics move unexpectedly? Positively or negatively?]
Segment analysis: [Did the effect differ by device / traffic source / new vs. returning / geo?]
Hypothesis verdict:
Key learning: In 2-3 sentences: what did this test teach us about our customers' behavior?
Decision:
Next hypothesis generated: "Based on this result, we will next test ___ because ___."
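The "Change", "Statistical confidence", and "Revenue / business impact" fields of the report can be filled from raw counts. The sketch below assumes a conversion-rate metric and reads "confidence" as 1 minus the two-sided p-value of a pooled two-proportion z-test, which is one common convention; your testing tool may define it differently (for example, as a Bayesian probability). The revenue estimate is a simple extrapolation that assumes traffic and average order value stay constant. All function and parameter names are illustrative.

```python
from statistics import NormalDist


def summarize_test(control_visitors, control_conversions,
                   variant_visitors, variant_conversions):
    """Return conversion rates, relative change, and z-test confidence."""
    p_c = control_conversions / control_visitors
    p_v = variant_conversions / variant_visitors
    pooled = (control_conversions + variant_conversions) / (
        control_visitors + variant_visitors)
    se = (pooled * (1 - pooled)
          * (1 / control_visitors + 1 / variant_visitors)) ** 0.5
    if se == 0:
        raise ValueError("no variation in the data; cannot compute confidence")
    z = (p_v - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {
        "control_rate": p_c,
        "variant_rate": p_v,
        "relative_change": (p_v - p_c) / p_c,
        "confidence": 1 - p_value,  # compare against the 0.95 target
    }


def estimated_monthly_revenue_impact(monthly_visitors, control_rate,
                                     variant_rate, average_order_value):
    """Extra revenue per month if every visitor saw the variant (assumption)."""
    return monthly_visitors * (variant_rate - control_rate) * average_order_value


# Hypothetical counts for illustration only.
summary = summarize_test(10_000, 300, 10_000, 360)
print(f"{summary['relative_change']:+.1%} at {summary['confidence']:.1%} confidence")
```

Treat the revenue figure as an estimate to sanity-check against post-implementation data, not as a forecast.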
The test log is your single source of truth: a spreadsheet or database where every test gets one row. All hypothesis briefs and results reports link back to this log.
Columns:
| Column | Description |
|---|---|
| Test ID | Sequential (CRO-001, CRO-002...) |
| Test Name | Short descriptive name |
| Status | Planning / Live / Concluded / Implemented / Archived |
| Page | URL or page type |
| Element | What was changed |
| Pillar | Which CRO pillar (checkout / PDP / homepage / etc.) |
| Hypothesis | One-sentence summary |
| Primary Metric | What was measured |
| Result | Won / Lost / Inconclusive |
| Lift | % improvement (blank if inconclusive) |
| Revenue Impact (₹/mo) | Estimated monthly impact |
| Confidence | % statistical confidence |
| Start Date | YYYY-MM-DD |
| End Date | YYYY-MM-DD |
| Implemented? | Yes / No / Pending |
| Implementation Date | When the winner was deployed |
| Owner | Test owner name |
| Notes | Any anomalies, caveats, links to deeper docs |
This log should be reviewed weekly in your team meeting and monthly with leadership. The "Implemented?" and "Revenue Impact" columns are the two most important for demonstrating program value.
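If the log lives in a CSV or is exported from a spreadsheet, a small validator keeps the rows consistent enough to search and report on. The sketch below uses the column names from the table above; the validation rules (allowed values taken from the column descriptions, ISO dates, blank lift for inconclusive tests) and the function names are assumptions for illustration.

```python
import csv
import io
from datetime import date

# Column names follow the test log table; the checks below are assumptions.
COLUMNS = [
    "Test ID", "Test Name", "Status", "Page", "Element", "Pillar",
    "Hypothesis", "Primary Metric", "Result", "Lift", "Revenue Impact (₹/mo)",
    "Confidence", "Start Date", "End Date", "Implemented?",
    "Implementation Date", "Owner", "Notes",
]
ALLOWED = {
    "Status": {"Planning", "Live", "Concluded", "Implemented", "Archived"},
    "Result": {"", "Won", "Lost", "Inconclusive"},
    "Implemented?": {"", "Yes", "No", "Pending"},
}
DATE_COLUMNS = ["Start Date", "End Date", "Implementation Date"]


def validate_row(row):
    """Return a list of problems; an empty list means the row looks clean."""
    problems = [f"missing column: {c}" for c in COLUMNS if c not in row]
    for column, allowed in ALLOWED.items():
        if row.get(column, "") not in allowed:
            problems.append(f"{column}: unexpected value {row.get(column)!r}")
    for column in DATE_COLUMNS:
        value = row.get(column, "")
        if value:
            try:
                date.fromisoformat(value)
            except ValueError:
                problems.append(f"{column}: not a valid ISO date: {value!r}")
    if row.get("Result") == "Inconclusive" and row.get("Lift", ""):
        problems.append("Lift: should be blank when Result is Inconclusive")
    return problems


def to_csv(rows):
    """Serialize rows to CSV text: one header line, then one line per test."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Running `validate_row` on every row before the weekly review is a cheap way to catch half-filled entries while the test owner still remembers the details.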
The hypothesis backlog is your pipeline: test ideas waiting to run, ordered by ICE/PIE score.
HYPOTHESIS BACKLOG
| ID | Hypothesis | Page | Element | Impact (1-10) | Confidence (1-10) | Ease (1-10) | Score | Status |
|---|---|---|---|---|---|---|---|---|
| H-022 | Adding a live chat button... | Checkout | Support widget | 8 | 7 | 6 | 21 | Ready to test |
| H-023 | Showing ingredient origin story... | PDP | Description | 7 | 6 | 8 | 21 | Ready to test |
| H-024 | Free shipping threshold bar... | Cart | Progress indicator | 8 | 8 | 5 | 21 | Design needed |
Score is the sum of Impact + Confidence + Ease. Work from the top of the list. Add new hypotheses at any time with their scores; sorting by the Score column keeps the list in priority order.
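The scoring rule is simple enough to automate. Here is a minimal sketch that computes the score as Impact + Confidence + Ease and sorts the backlog by it. The `BacklogItem` class and `prioritize` function are illustrative names, and breaking ties by ID (oldest idea first) is an assumption, since the three example rows above all score 21.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    id: str
    hypothesis: str
    impact: int
    confidence: int
    ease: int
    status: str = "Ready to test"

    def __post_init__(self):
        # Each component is rated on the 1-10 scale used in the backlog table.
        for name in ("impact", "confidence", "ease"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")

    @property
    def score(self):
        return self.impact + self.confidence + self.ease


def prioritize(items):
    """Highest score first; ties fall back to ID order (an assumption)."""
    return sorted(items, key=lambda item: (-item.score, item.id))


backlog = [
    BacklogItem("H-024", "Free shipping threshold bar...", 8, 8, 5, "Design needed"),
    BacklogItem("H-022", "Adding a live chat button...", 8, 7, 6),
    BacklogItem("H-023", "Showing ingredient origin story...", 7, 6, 8),
]
for item in prioritize(backlog):
    print(item.id, item.score, item.status)
```

Sorting alone doesn't account for status: an item marked "Design needed" can sit at the top without being runnable, so you may want to filter on status before picking the next test.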
The monthly report is a one-page summary of program performance, shared with leadership monthly.
CRO PROGRAM MONTHLY REPORT: [Month Year]
Tests launched this month: X
Tests concluded this month: X
Winners implemented: X (___% implementation rate)
Estimated monthly revenue impact from implemented tests:
Current tests live: [List with status]
Key learning this month: [1-3 bullet points on what you learned: about customers, about what works, about areas to explore]
Next month's priority tests: [2-3 tests planned with one-line rationale for each]
Blockers: [Any constraints preventing velocity: developer bandwidth, traffic limitations, data access]
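The counting fields at the top of the report can be derived from the test log rather than tallied by hand. The sketch below assumes each log row is a dictionary keyed by the test log column names, with ISO dates. It also assumes one specific reading of "implementation rate": the share of winning tests concluded in the month that have been implemented. If your team defines the rate differently, adjust the filter accordingly.

```python
from datetime import date


def _in_month(iso_date, year, month):
    """True if a non-empty YYYY-MM-DD string falls in the given month."""
    if not iso_date:
        return False
    d = date.fromisoformat(iso_date)
    return (d.year, d.month) == (year, month)


def monthly_summary(rows, year, month):
    launched = [r for r in rows if _in_month(r.get("Start Date", ""), year, month)]
    concluded = [r for r in rows if _in_month(r.get("End Date", ""), year, month)]
    winners = [r for r in concluded if r.get("Result") == "Won"]
    implemented = [r for r in winners if r.get("Implemented?") == "Yes"]
    return {
        "tests_launched": len(launched),
        "tests_concluded": len(concluded),
        "winners_implemented": len(implemented),
        # None when there were no winners, to avoid reporting a misleading 0%.
        "implementation_rate": len(implemented) / len(winners) if winners else None,
        "tests_live": [r["Test ID"] for r in rows if r.get("Status") == "Live"],
    }


# Hypothetical rows for illustration only.
rows = [
    {"Test ID": "CRO-001", "Status": "Implemented", "Start Date": "2024-02-20",
     "End Date": "2024-03-05", "Result": "Won", "Implemented?": "Yes"},
    {"Test ID": "CRO-002", "Status": "Concluded", "Start Date": "2024-03-01",
     "End Date": "2024-03-20", "Result": "Won", "Implemented?": "Pending"},
    {"Test ID": "CRO-003", "Status": "Live", "Start Date": "2024-03-18",
     "End Date": "", "Result": "", "Implemented?": ""},
]
print(monthly_summary(rows, 2024, 3))
```

The qualitative fields (key learning, priorities, blockers) still need a human; the point is only to stop the numbers from drifting away from the log.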
One source of truth: All documentation lives in one place (Notion, Confluence, or Google Drive; pick one). No results in slides, no hypotheses in individual emails, no test logs in personal spreadsheets.
File every test, including null results: Tests that showed no winner are often more valuable than wins because they eliminate directions. A searchable archive of null results prevents teams from re-running tests that already have answers.
Tag by element type: Create a tagging system (checkout, PDP, homepage, cart, navigation, email, mobile) so you can instantly filter the test log by area. When you're planning a new PDP test, you can immediately see all previous PDP tests and their outcomes.
Link results to decisions: Documentation that captures "the test won" but not "and we implemented it on [date]" is incomplete. The implementation column in your test log closes the loop.
Review documentation in team rituals: A weekly 30-minute "test review" where the team reads results aloud, discusses learnings, and prioritizes next tests ensures documentation is actively used, not just filed.
Assign ownership clearly: Every test has one owner who is responsible for the hypothesis brief, the results report, and the implementation follow-through. Shared ownership is no ownership.
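To make the "have we ever tested this?" question answerable, the tagging and null-result practices above come down to a filter over the log. A minimal sketch, assuming log rows are dictionaries keyed by the test log column names and that the "Pillar" column serves as the tag; `find_prior_tests` is a hypothetical helper name.

```python
def find_prior_tests(rows, pillar=None, keyword=None):
    """Return past tests matching a pillar tag and/or a keyword.

    Inconclusive and losing tests are returned too, since null results
    are exactly what prevents a re-run.
    """
    matches = []
    for row in rows:
        if pillar and row.get("Pillar", "").lower() != pillar.lower():
            continue
        searchable = " ".join(
            row.get(field, "") for field in ("Test Name", "Hypothesis", "Element")
        ).lower()
        if keyword and keyword.lower() not in searchable:
            continue
        matches.append(row)
    return matches


# Hypothetical archive for illustration only.
archive = [
    {"Test ID": "CRO-010", "Pillar": "PDP", "Test Name": "Sticky add-to-cart",
     "Hypothesis": "A sticky button will raise add-to-cart rate",
     "Element": "CTA", "Result": "Inconclusive"},
    {"Test ID": "CRO-011", "Pillar": "checkout", "Test Name": "Trust badge",
     "Hypothesis": "A returns badge will reduce abandonment",
     "Element": "Badge", "Result": "Won"},
]
for test in find_prior_tests(archive, pillar="pdp"):
    print(test["Test ID"], test["Result"])
```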
Documentation isn't just operational; it's cultural. A well-maintained test archive tells the organization:
When stakeholders propose changes based on gut feeling or competitor copying, a searchable test archive lets you say "We tested something similar in Q2. Here's what we found." That's institutional knowledge creating better decisions.
CustomFit.ai's built-in test history and results tracking automates much of this documentation: every test you run is automatically logged with traffic, variants, and results. You layer the qualitative documentation (hypothesis rationale, learnings, next steps) on top.