Articles on A/B testing

In-depth articles on A/B testing with a primary focus on statistical methods applied to online experimentation. Written in accessible language for conversion optimization practitioners, the articles also go into deep technical topics where necessary.

Overgeneralization in A/B testing

Overgeneralization is a mistake in interpreting the outcomes of online controlled experiments (a.k.a. A/B tests) that can have a detrimental impact on any data-driven business. Overgeneralization is used in the typical sense of going above and beyond what the evidence at hand supports, with “evidence” being a statistically significant or non-significant outcome of an online […] Read more…

Also posted in Conversion optimization | Tagged ab testing, conversion rate optimization, cro, online experimentation, statistical significance

The Business Value of A/B Testing

Several charges are commonly thrown at A/B testing while considering it or even after it has become standard practice in a company. They may come from product teams, designers, developers, or management, and can be summed up like this: A good way to address these and to make the business case for experimentation is to […] Read more…

Also posted in Conversion optimization | Tagged ab test ROI, business experiments, business risk, business value, return on investment, risk reward analysis, roi

What if the Observed Effect is Smaller Than the MDE?

The above is a question asked by some practitioners of A/B testing, as well as a number of their clients, when examining the outcome of an online controlled experiment. It may be raised regardless of whether the outcome is statistically significant or not. In both cases the fact that the observed effect in an A/B test is […] Read more…

Also posted in Statistics | Tagged mde, minimum detectable effect, minimum effect of interest, observed power, statistical power

Using Observed Power in Online A/B Tests

Observed power, often referred to as “post hoc power” or “retrospective power”, is the statistical power of a test to detect a true effect equal to the observed effect size. “Detect” in the context of a statistical hypothesis test means to result in a statistically significant outcome. Some calculators aimed at A/B testing practitioners use […] Read more…

Also posted in Statistics | Tagged observed power, optional stopping, peeking, post hoc power, statistical power, underpowered tests
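
To make the definition of observed power in the excerpt above concrete, here is a minimal sketch. It assumes a one-sided z-test with a normally distributed test statistic, which is an illustrative choice and not necessarily the setting used in the article; the function name `observed_power` and its parameters are hypothetical.

```python
from statistics import NormalDist


def observed_power(z_observed: float, alpha: float = 0.05) -> float:
    """Power of a one-sided z-test against a true effect equal to the observed one.

    Assumption: the test statistic is standard normal under the null, and
    normal with mean z_observed (unit variance) under the assumed true effect.
    """
    std_normal = NormalDist()
    # Critical value the test statistic must exceed to be significant.
    z_critical = std_normal.inv_cdf(1 - alpha)
    # Probability of exceeding the critical value when the true
    # standardized effect equals the observed z-score.
    return std_normal.cdf(z_observed - z_critical)


if __name__ == "__main__":
    for z in (1.0, 1.645, 2.5):
        print(f"z = {z:.3f} -> observed power = {observed_power(z):.3f}")
```

Under these assumptions, an outcome that sits exactly on the significance threshold has an observed power of 50%, since observed power here is a direct function of the observed z-score (and hence of the p-value).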

Stop AbUsing the Mann-Whitney U Test (MWU)

The Mann-Whitney U Test (MWU), also known as the Wilcoxon Rank Sum Test and the Mann-Whitney-Wilcoxon Test, continues to be advertised as the go-to test for analyzing non-normally distributed data. In online experimentation it is often touted as the most suitable for analyses of non-binomial metrics with typically non-normal (skewed) distributions such as average […] Read more…

Also posted in Statistics | Tagged arpu, average revenue per user, difference in medians, mann-whitney u test, mwu, skewed distribution, skewness, statistical power, statistical significance, stochastic difference

Q&A on Sequential Statistics in A/B Testing

Sequential statistics are gathering interest and there are more and more questions posed by CROs looking into the matter. For this article I teamed up with Lucia van den Brink, a distinguished CRO consultant who recently started using Analytics Toolkit and integrated frequentist sequential testing into her client workflow. In this short interview she asks […] Read more…

Posted in A/B testing | Tagged early stopping, peeking, sequential testing

Sequential Testing is About Improving Business Returns

A central feature of sequential testing is the idea of stopping “early”, as in “earlier compared to an equivalent fixed-sample size test”. This allows running A/B tests with fewer users and in a shorter amount of time while adhering to the targeted error guarantees. For example, a test may be planned with a maximum duration […] Read more…

Also posted in AGILE A/B testing | Tagged agile ab testing, return on investment, roi, sequential testing, testing velocity

False Positive Risk in A/B Testing

Have you heard how there is a much greater probability than generally expected that a statistically significant test outcome is in fact a false positive? In industry jargon: that a variant has been identified as a “winner” when it is not. In demonstrating the above the terms “False Positive Risk” (FPR), “False Findings Rate” (FFR), […] Read more…

Also posted in Bayesian A/B testing, Statistics | Tagged bayes rule, bayesian inference, false discovery rate, false findings rate, false positive rate, false positive risk, fpr, p-value, statistical power, type I error

Analytics Toolkit to discontinue Google Analytics-related functionalities

Analytics Toolkit was conceived in 2012 as a set of tools that automate essential Google Analytics-related tasks and augment the GA functionalities in various ways. This goal was achieved in the years since with the release of over a dozen tools utilizing the Google Analytics API. These were accompanied by dozens of in-depth technical articles […] Read more…

Also posted in Analytics-Toolkit.com, Google Analytics | Tagged bigquery, cardinality estimates, google analytics 4, google analytics api, hyperloglog

How to Run Shorter A/B Tests?

Running shorter tests is key to improving the efficiency of experimentation as it translates to smaller direct losses from testing inferior experiences and also less unrealized revenue due to late implementation of superior ones. Despite this, many practitioners are yet to start conducting tests at the frontier of efficiency. This article presents ways to shorten […] Read more…

Also posted in Statistics | Tagged ab testing, efficient testing, small sample size, statistical power, test duration

Comparison of the statistical power of sequential tests: SPRT, AGILE, and Always Valid Inference

In A/B testing, sequential tests are gradually becoming the norm due to the increased efficiency and flexibility that they grant practitioners. In most practical scenarios sequential tests offer a balance of risks and rewards superior to that of an equivalent fixed sample test. Sequential monitoring achieves this superiority by trading statistical power for the ability […] Read more…

Also posted in AGILE A/B testing, Statistics | Tagged msprt, power and sample size, sample size, sequential testing, sequential tests, sprt, statistical power

Statistical Power, MDE, and Designing Statistical Tests

In my ten years of developing statistical tools, consulting, and participating in discussions with CRO & A/B testing practitioners, one topic has surfaced as causing the most confusion: statistical power and the related concept of minimum detectable effect (MDE). Some myths were previously dispelled in “Underpowered A/B tests – confusions, […] Read more…

Also posted in Statistics | Tagged minimum detectable effect, minimum effect of interest, risk reward analysis, risk-reward ratio, statistical power
