Articles on A/B testing

In-depth articles on A/B testing with a primary focus on statistical methods applied to online experimentation. Written in accessible language for conversion optimization practitioners, the articles also go into deep technical topics where necessary.

Stop AbUsing the Mann-Whitney U Test (MWU)

The Mann-Whitney U Test (MWU), also known as the Wilcoxon Rank Sum Test and the Mann-Whitney-Wilcoxon Test, continues to be advertised as the go-to test for analyzing non-normally distributed data. In online experimentation, it is often touted as the most suitable test for analyses of non-binomial metrics with typically non-normal (skewed) distributions such as average […] Read more…

Also posted in Statistics

Q&A on Sequential Statistics in A/B Testing

Sequential statistics are gathering interest, and CROs looking into the matter are posing more and more questions. For this article, I teamed up with Lucia van den Brink, a distinguished CRO consultant who recently started using Analytics Toolkit and integrated frequentist sequential testing into her client workflow. In this short interview, she asks […] Read more…

Posted in A/B testing

Sequential Testing is About Improving Business Returns

A central feature of sequential testing is the idea of stopping “early”, as in “earlier compared to an equivalent fixed-sample size test”. This allows running A/B tests with fewer users and in a shorter amount of time while adhering to the targeted error guarantees. For example, a test may be planned with a maximum duration […] Read more…

Also posted in AGILE A/B testing

False Positive Risk in A/B Testing

Have you heard that the probability of a statistically significant test outcome being a false positive is much greater than generally expected? In industry jargon: that a variant has been identified as a “winner” when it is not. In demonstrating the above, the terms “False Positive Risk” (FPR), “False Findings Rate” (FFR), […] Read more…

Also posted in Bayesian A/B testing, Statistics

Analytics Toolkit to discontinue Google Analytics-related functionalities

Analytics Toolkit was conceived in 2012 as a set of tools that automate essential Google Analytics-related tasks and augment the GA functionalities in various ways. This goal was achieved in the years since with the release of over a dozen tools utilizing the Google Analytics API. These were accompanied by dozens of in-depth technical articles […] Read more…

Also posted in Analytics-Toolkit.com, Google Analytics

How to Run Shorter A/B Tests?

Running shorter tests is key to improving the efficiency of experimentation, as it translates to smaller direct losses from testing inferior experiences and less unrealized revenue due to late implementation of superior ones. Despite this, many practitioners have yet to start conducting tests at the frontier of efficiency. This article presents ways to shorten […] Read more…

Also posted in Statistics

Comparison of the statistical power of sequential tests: SPRT, AGILE, and Always Valid Inference

In A/B testing, sequential tests are gradually becoming the norm due to the increased efficiency and flexibility that they grant practitioners. In most practical scenarios, sequential tests offer a balance of risks and rewards superior to that of an equivalent fixed-sample test. Sequential monitoring achieves this superiority by trading statistical power for the ability […] Read more…

Also posted in AGILE A/B testing, Statistics

Statistical Power, MDE, and Designing Statistical Tests

In my ten years of developing statistical tools, consulting, and participating in discussions with CRO and A/B testing practitioners, one topic has surfaced as causing the most confusion: statistical power and the related concept of minimum detectable effect (MDE). Some myths were previously dispelled in “Underpowered A/B Tests – Confusions, […] Read more…

Also posted in Statistics

What Can Be Learned From 1,001 A/B Tests?

How long does a typical A/B test run for? What percentage of A/B tests result in a ‘winner’? What is the average lift achieved in online controlled experiments? How good are top conversion rate optimization specialists at coming up with impactful interventions for websites and mobile apps? This meta-analysis of 1,001 A/B tests analyzed using […] Read more…

Also posted in AGILE A/B testing, Conversion optimization

When Session-Based Metrics Lie

In online A/B testing, it is not uncommon to see session-based metrics used as the primary performance indicator. Session-based conversion rates and session-based averages (like average revenue per session, analogous to ARPU) are often reported by default in software from prominent vendors, including Google Optimize and Google Analytics. This widespread availability makes session-based […] Read more…

Also posted in Conversion optimization

A/B Testing Statistics – A Concise Guide for Non-Statisticians

Navigating the maze of A/B testing statistics can be challenging. This is especially true for those new to statistics and probability. One reason is the obscure terminology popping up in every other sentence. Another is that the writings can be vague, conflicting, incomplete, or simply wrong, depending on the source. Articles sprinkled with advanced math, […] Read more…

Also posted in Statistics

P-values and Confidence Intervals Explained

Hundreds if not thousands of books have been written about both p-values and confidence intervals (CIs) – two of the most widely used statistics in online controlled experiments. Yet, these concepts remain elusive to many otherwise well-trained researchers, including A/B testing practitioners. Misconceptions and misinterpretations abound despite great efforts from statistics educators and experimentation evangelists. […] Read more…

Also posted in Statistical significance, Statistics