Evaluating an A/B test? Better not use transaction time data

November 18, 2016
Sam Knapp, Head of Insights for [24]7 Predictive Search Bidding

Continuing our series on attribution and the various ways it can impact your SEM accounts, this entry adds another one of our favorite elements: how Google Drafts and Experiments depends on a solid understanding of attribution. Why does this specific combination warrant a whole thought piece? At [24]7 we absolutely love Google Drafts and Experiments as an A/B testing framework and think it provides an amazing way to evaluate changes within your campaigns. When measuring the impact of an ad copy change or testing out a new bid management platform, conversions or revenue will likely be a main metric in deciding the winner.

As a quick intro for those unfamiliar with Google Drafts and Experiments (we will refer to it from here on simply as D&E), it allows you to clone AdWords campaigns and split the available impressions between the Original and Experiment campaigns, either 50/50 or at whatever ratio you prefer. All attributes are copied over to the new Experiment campaigns, and conversions are credited to the keyword and campaign that generated the click. With standard AdWords attribution, this is an open-and-shut case thanks to the simple last-click, click-time model.

Third-party tracking solutions often offer custom attribution models, which can be quite valuable when tailored to your business, but it is important to understand how the attribution model you use affects the way you evaluate a test. The example we will dive into here is how to properly evaluate a D&E test where conversions are attributed to the time of the transaction rather than the time of the click. For the rest of this post, let's assume we are using transaction-time attribution.

If you are trying out a new A/B testing framework such as D&E, it can be wise to start with an A/A test to make sure it works as expected. With D&E this can be done simply by cloning the campaigns and… doing nothing! All attributes will be the same, although for the integrity of the A/A test, it is essential that any bid changes or other adjustments be applied to both the Original and Experiment sides of the test.

Jumping into our example scenario: one day into the A/A test, our brand new Experiment campaign has a nearly identical number of impressions, clicks and spend, but there is an alarming difference in ROI compared to the Original campaign! The image below shows how an Original campaign that has historically achieved a $5.00 ROI while spending $2,000/day could appear to reach a $7.00 ROI on Day 1 of the A/A test, while the Experiment is doing really poorly at a $3.00 ROI, even though all the attributes such as ad copy, keyword bids and landing pages are identical. What could be causing this?!

[Image: Evaluating an A/A test, Day 1]

Our old friend attribution, that’s what! Imagine your business sells shoes and you are using the standard 30-day cookie window. If Roger clicked on an ad two days prior to the launch of the A/A test and then returns directly to your website www.myfakeamazingshoewebsite.com to complete his purchase after the Experiment campaigns have been created, think about how that skews the results. The Original campaign has essentially banked a large number of clicks prior to the test launch, and those clicks have the incredible ability to generate revenue at no cost during the test! Phantom conversions appearing on keywords with 0 clicks on that same day can be a key indicator that transaction-time data is influencing how results are viewed.
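To make the skew concrete, here is a minimal sketch of transaction-time reporting against a hypothetical conversion log (the campaign names, dates and revenue figures are all invented for illustration). Roger-style conversions, whose clicks predate the test launch, can only ever land on the Original side:

```python
from datetime import date

# Hypothetical conversion log: which campaign won the click, when the
# click happened, when the purchase went through, and the revenue.
conversions = [
    {"campaign": "Original",   "click": date(2016, 11, 14), "txn": date(2016, 11, 17), "revenue": 120.0},
    {"campaign": "Original",   "click": date(2016, 11, 15), "txn": date(2016, 11, 16), "revenue": 80.0},
    {"campaign": "Original",   "click": date(2016, 11, 16), "txn": date(2016, 11, 16), "revenue": 60.0},
    {"campaign": "Experiment", "click": date(2016, 11, 16), "txn": date(2016, 11, 16), "revenue": 60.0},
]

TEST_LAUNCH = date(2016, 11, 16)  # day the Experiment campaigns were cloned

def revenue_by_campaign(conversions, day):
    """Transaction-time view: credit revenue on the day the purchase happens."""
    totals = {"Original": 0.0, "Experiment": 0.0}
    for c in conversions:
        if c["txn"] == day:
            totals[c["campaign"]] += c["revenue"]
    return totals

# On Day 1 the Original side also collects revenue from a click "banked"
# before launch (Nov 15) -- a click the Experiment could never have won.
print(revenue_by_campaign(conversions, TEST_LAUNCH))
# → {'Original': 140.0, 'Experiment': 60.0}
```

Both sides won one identical click on launch day, yet transaction-time reporting makes the Original look more than twice as productive.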

As a general rule of thumb, transaction-time attribution should be avoided when running D&E tests. With a 30-day cookie window, your business leaves open the possibility of a user “converting” as many as 29 days after the initial ad click, which would unfairly advantage the Original campaign. The only way to account for this would be to launch the test but not begin evaluating performance until the cookie window has closed; that way, all converting traffic was bought during the test period under the conditions being evaluated. With an A/A test, you can clearly see a huge gap in performance between the two test sides early on, but that difference will shrink as time goes on. As shown below, revenue will quickly start to converge, though a few conversions may trickle in towards the end of the cookie window. Only once the full 30 days have passed is it safe to run a clean D&E test and ensure that the Original campaign cannot benefit from these “free” conversions.
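The wait-out-the-window rule can be stated as a one-line check: a transaction date is only trustworthy once the entire cookie window has elapsed since launch, because until then some pre-launch click may still be converting. A sketch, assuming a hypothetical November 16 launch and the standard 30-day window:

```python
from datetime import date, timedelta

COOKIE_WINDOW = timedelta(days=30)
TEST_LAUNCH = date(2016, 11, 16)  # hypothetical launch date for illustration

def safe_to_evaluate(txn_date, launch=TEST_LAUNCH, window=COOKIE_WINDOW):
    """Under transaction-time attribution, a transaction can only be trusted
    once every click still inside the cookie window occurred during the test."""
    return txn_date >= launch + window

print(safe_to_evaluate(date(2016, 11, 20)))  # False: pre-launch clicks may still convert
print(safe_to_evaluate(date(2016, 12, 16)))  # True: the full 30 days have elapsed
```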

[Image: Evaluating the 30-day transaction-time lag]

Waiting 30 days to run a test is obviously not ideal, as the purpose of a D&E test is often to view results and act quickly.

Switching to a click-time attribution model gives advertisers the opportunity to start measuring results from Day 1 and gain valuable insights into ad copy, landing page or bid change strategies. If a tracking solution is able to identify a specific user and determine the time of both the click and the transaction, it can often switch seamlessly between the two models from a reporting standpoint. Click-time data is based entirely on traffic bought once the Experiment has begun, which makes it massively more relevant when evaluating the two test sides! From a strategy perspective, looking at performance from a click-time point of view not only clarifies the results of a test, it also gives much more meaningful insight into how to adjust SEM strategies moving forward. Someone in the “click-buying business” should be more concerned with the optimal time to raise or lower bids based on metrics such as conversion rate or revenue per click than with knowing when those purchases actually go through.
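A click-time view can be produced from the same underlying data simply by grouping on the click date instead of the transaction date. A minimal sketch against a hypothetical conversion log (names, dates and figures invented for illustration):

```python
from datetime import date

TEST_LAUNCH = date(2016, 11, 16)  # hypothetical day the Experiment was cloned

# Hypothetical conversion log: campaign, click day, transaction day, revenue.
conversions = [
    {"campaign": "Original",   "click": date(2016, 11, 14), "txn": date(2016, 11, 17), "revenue": 120.0},
    {"campaign": "Original",   "click": date(2016, 11, 16), "txn": date(2016, 11, 18), "revenue": 60.0},
    {"campaign": "Experiment", "click": date(2016, 11, 16), "txn": date(2016, 11, 19), "revenue": 60.0},
]

def revenue_by_click_day(conversions, day):
    """Click-time view: credit revenue back to the day the ad click occurred."""
    totals = {"Original": 0.0, "Experiment": 0.0}
    for c in conversions:
        if c["click"] == day:
            totals[c["campaign"]] += c["revenue"]
    return totals

# The Nov 14 click is excluded from Day 1 of the test -- only traffic bought
# under test conditions counts, so the two sides are directly comparable.
print(revenue_by_click_day(conversions, TEST_LAUNCH))
# → {'Original': 60.0, 'Experiment': 60.0}
```

Grouped this way, the A/A test shows the identical performance one would expect, with no waiting period required.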

D&E provides a truly incredible platform for unbiased evaluation of changes within advertisers’ Google AdWords accounts; we at [24]7 just want to make sure the data is evaluated correctly. As data gets more and more complex to analyze, such a clean testing platform is a huge treat for advertisers seeking improvements to their SEM strategies that they can measure, act upon, and apply quickly.

As we move forward in our series on attribution, the question of when gets replaced by the questions of who and why as we attempt to unpack Google’s new data-driven attribution model and determine some of the logic behind this new feature.

Want to learn more? Join us for a webinar on November 30th as we uncover the true value of your search traffic and dissect the core components of SEM attribution you need to understand when choosing attribution models. You can register here.

