Comparing comScore to Collected Data

There has been a lot of great discussion about comScore’s recent press release in the Yahoo group and on the blogs this past week. Two posts particularly caught my eye. The first covers comScore’s responses to Eric Peterson’s questions about their methodology, along with their promise to release more information soon. The second is Marshall Sponder’s post, where he looks at what reducing the measured Visitor numbers by a factor of 2.5 could mean for the numbers published by some of the bigger online retail sites.

Checking Their Method

I actually talked to Eric beforehand about the questions he sent on to them. I must say they pretty much addressed my major concerns about their data collection, although I will wait for their upcoming write-up before making any final judgments.

There is still another piece of this sticking in my mind: they gave the participants privacy tools. Understanding those tools and how they might have contributed to the results is next on my list of things to question. comScore’s numbers have always been a bit lower than cookie based numbers. Have they always given their participants privacy tools? I don’t necessarily think this will account for a huge amount of the factor, but whether their panel is 2.5 times more likely to delete their 1st party cookies is certainly worth considering.

Overstating the Factor

According to the press release:

Frequent Cookie Deletion by 3 out of 10 U.S. Internet Users Leads to Overstatements in Audience Sizes by a Factor as High as 2.5

So what if the method holds up and the factor becomes more or less accepted?

Marshall made this comment in his post after making a little before and after comparison of the numbers published for a few online retailers:

The corrected numbers are much more believable and feel right - but I don’t think anyone who sells something on the web and puts a value on the number of visitors they’re getting will be in any hurry to divide their web analytics sanctified Uniques by 2.5.

I have two problems with the comment. First, it assumes that the numbers he started with were based on cookie ID driven log analysis, when in fact they appear to have come from Nielsen, which uses a panel based method similar to comScore’s. Second, it assumes that the 2.5 factor has been accepted.

I’m including this because I do agree that anyone who has been using cookie based numbers for their public figures is certainly in for an adjustment. It also highlights the importance of knowing where the numbers come from. All methods of counting users are estimates in the end, and the panel based numbers are certainly not flawless. See my data below for an example of how variable comScore’s numbers can be when compared to cookie based data. When it comes to trends and fluctuations in types of users, neither is 100% accurate. Knowing your data and how it came to be is key to really understanding your results.

Investigating the factor

So what is so magical about the 2.5 number? I just happen to work with a number of geographically focused news publisher sites in the US that are also comScore customers. So I decided to go to the best tool at my disposal for looking into this question: the data.

What I did was take the numbers from 6 of the sites for 2 months and compare them to the comScore numbers. The result was interestingly supportive of comScore’s report:

[Image: comScore comparison summary]

Or is it? To be clear, until the method is fully vetted, don’t take this as conclusive evidence of comScore’s findings. It could also turn out that the factor simply explains the difference between our analytics tools and comScore. One thing this comparison shows is that although the average tends toward the 2.5 factor, there is certainly a bit of variability in the numbers for any given site/month comparison. Here is the raw data:

[Image: comScore comparison data]
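
For anyone who wants to run the same kind of comparison on their own sites, here is a minimal sketch in Python of the arithmetic involved. The site names and visitor counts are hypothetical placeholders and are not the figures from this post. The function name inflation_factor is also just an illustrative choice. The idea is simply to divide the cookie based unique visitor count by the panel based count for each site/month pair, then look at both the average and the spread.

```python
from statistics import mean, stdev

# Hypothetical placeholder figures, NOT the data behind this post.
# Each row: (site, month, cookie based uniques, panel based uniques)
comparisons = [
    ("Site A", "Month 1", 500_000, 200_000),
    ("Site A", "Month 2", 450_000, 200_000),
    ("Site B", "Month 1", 300_000, 100_000),
    ("Site B", "Month 2", 240_000, 120_000),
]


def inflation_factor(cookie_uniques, panel_uniques):
    """Return how many times larger the cookie based count is."""
    if panel_uniques <= 0:
        raise ValueError("panel_uniques must be positive")
    return cookie_uniques / panel_uniques


# One factor per site/month comparison.
factors = [inflation_factor(c, p) for _, _, c, p in comparisons]

# The average can sit near a headline number while individual
# site/month factors vary quite a bit around it.
average_factor = mean(factors)
spread = stdev(factors)

for (site, month, _, _), factor in zip(comparisons, factors):
    print(f"{site} {month}: factor {factor:.2f}")
print(f"average {average_factor:.2f}, std dev {spread:.2f}")
```

Reporting the spread next to the average is the point here: an average close to 2.5 says little about how far any single site/month comparison may sit from it.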

The one part of this that really troubles me is the size of the factor for the daily averages per month. With cookies tending to be more accurate over shorter time periods, I was expecting the difference there to be far less than I found. The notion that the daily numbers could still be twice as high as the reality is a little disturbing.
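
The daily comparison can be sketched the same way. This is only an illustration under assumptions: the daily counts below are hypothetical and stand in for a full month of data, and average_daily_uniques is a made-up helper name. It averages the cookie based daily unique visitors over the period and divides by a panel based average daily figure.

```python
def average_daily_uniques(daily_uniques):
    """Average the per-day unique visitor counts for a period."""
    if not daily_uniques:
        raise ValueError("daily_uniques must not be empty")
    return sum(daily_uniques) / len(daily_uniques)


# Hypothetical values: a few days standing in for a whole month.
cookie_daily_uniques = [40_000, 42_000, 38_000, 40_000]
panel_average_daily = 20_000  # hypothetical panel based daily average

cookie_average_daily = average_daily_uniques(cookie_daily_uniques)
daily_factor = cookie_average_daily / panel_average_daily

print(f"daily factor: {daily_factor:.2f}")
```

Because a cookie only has to survive a single day to be counted once in a daily figure, one would expect this factor to land much closer to 1 than the monthly one.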

Anyone else willing to share some data to widen the sample for this comparison?

I have provided some additional notes below about this research.

-Ian

Notes, Methodology, Disclaimer

Here is some more information about myself and my findings in order to be as transparent in this as possible:

  • Cookie based data is derived from an implementation of Visual Sciences that I have designed and maintained over the past 2 years. The implementation uses a hybrid data collection model in order to see the cookies that have been accepted by the browsers on the first page view to the site. A side effect of the hybrid model is that the count of attempted cookies is inflated by almost a 3 to 1 ratio for browsers that block them. The cookies used in this implementation are almost always 1st party, but a small number of pages on each site employ 3rd party cookies.
  • Although the sites and months are masked to provide some anonymity, the results are based on 2 consecutive months per site within the past 3 months.
  • All sites used in this example use the comScore number as their published number. Cookie based data is used solely for trending and performance analysis. Cookie based data is also used to help understand the possible inaccuracies in the panel based numbers.
  • I conducted this work as a practitioner/consultant. However, I will be announcing in my next couple of blog posts that I am joining the team at Visual Sciences on May 1st.

6 Responses to “Comparing comScore to Collected Data”


  1. 1 Clint Apr 23rd, 2007 at 10:01 am

    So are you the person that Eric mentioned last night?
    http://blog.webanalyticsdemystified.com/weblog/2007/04/welcome-to-the-blogosphere-judah-phillips.html

    welcome aboard!

  2. 2 Jacques Warren Apr 23rd, 2007 at 10:41 am

    Hi Ian,

    Hmm I find your results with the dailies quite troubling too. One would have expected a much lower inflation factor. I think this casts some doubt on the comScore conclusions, not that I absolutely want to stick to cookies forever (still waiting for a Savior).

    Hopefully, other readers with access to a WA platform and comScore services will come forward.

  3. 3 Ian Apr 24th, 2007 at 8:08 am

    Thanks Clint.

    Yeah Jacques, the daily inflation factor does raise some issues with how accurate this could be. I also just stumbled upon another interesting tidbit. I just saw comScore’s list of top sites in March by visits per visitor.

    http://lsvp.wordpress.com/2007/04/20/most-frequently-visited-websites-not-what-youd-expect/

    I don’t think it is purely coincidental that the number 1 site by visits per visitor is a privacy and protection tools vendor. I downloaded the software from aluriasoftware.com and so far it has been a disappointing experience. The app hangs, has trouble opening files, and produces an odd memory read error on all 3 machines I have tried it on.

    I wouldn’t trade Norton or TrendMicro’s products for this one. But I do wonder if it is part of the gifts they give their panel volunteers. This isn’t a smoking gun though. On top of the program’s other faults, it does a pretty poor job of scanning cookies and deleting tracking cookies. It hasn’t deleted a single one on my test boxes, and I made sure to go to a couple of “special” sites that would set cookies that most certainly should be flagged.

    But what if the impact of giving a tool like this out to the panel is more about influencing behavior? It is impossible to observe without having an impact on that which you are observing.

  4. 4 Sébastien Brodeur Apr 26th, 2007 at 1:18 pm

    Hi Ian,

    I was wondering, are the web sites you used for your test using a P3P policy? I guess this can have an impact on the result.

    Also, a lot of traffic comes from people surfing during work hours. For fear of being discovered, are those people clearing their cookies/cache more often? This could also explain the ratio by day.

    I can’t wait to see the methodology of comScore for this study.

  5. 5 Ian Apr 26th, 2007 at 2:44 pm

    Hi Sébastien,

    All of the sites are P3P compliant, but if they weren’t it would cause a higher rate of blocking, not deletion.

    Your point about work hours could be an important consideration. One common criticism of comScore’s method of data collection is that most businesses and universities block the installation of their software and/or the transmission of data to their servers. As far as I know, their panel may be very Home centric, which could cause a fair amount of discrepancy if a site’s audience is Work and University centric.

    The sites I used are extremely geo-regional focused and it so happens that the busiest traffic periods for them are from 8am to 5pm in their local markets. I’ll see if I can’t pull some day part data for comparison and put up a part two post.

    -Ian

  1. 1 Web Analytics and SEO Blog » Blog Archive » The Simple Answer To Cookies Being Deleted: Spyware Pingback on May 6th, 2007 at 10:44 am
