Jeremiah Owyang posted some great questions yesterday about the accuracy of User measurement and the cultural reasons why “User” measurement may not be accurate in certain countries such as China and India. The most interesting part of the post (IMHO) is that both the post and the responses seem to focus on personal perceptions of why User measurement may not be accurate in a specific locale instead of focusing on whether User numbers are accurate at all. The underlying assumption appears to be that if the numbers are not accurate for a particular country, they must be accurate for other countries.
Whether or not User measurement is more or less accurate because of cultural differences is interesting but the answer is completely dependent on how you define and identify “Users”. There are issues with all forms of user identification regardless of cultural and socioeconomic differences around the globe.
For instance, take the use of just an identity cookie as the sole form of identification (the most common approach today by far). Cookies do not have a one to one relationship with a “computer” (a single physical machine). They are stored per browser, per user account and per virtual machine (where they are employed). There are several ways they are skewed in both directions from the reality. Take these scenarios for instance:
More “Users” than identity cookies.
- Many users grouped around one machine viewing the content at the same time (per your example). This may happen more frequently in some cultures, but I would ask you to also consider how many times you may have only seen a site over a coworker's shoulder.
- Many users using the same computer, such as a public machine at a library or an internet cafe, or a central family computer.
More id cookies than “Users”.
- How about a single user who uses both Firefox and IE (or Firefox and Safari) on the same sites? More cookies than users.
- Someone with a work computer and a home computer and more than one browser, account, and/or virtual machine on either or both.
Add to the mix above the issue of cookie deletion!
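To make the two directions of skew concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the `Visit` record, the names and the machines are hypothetical, and the `person` field is exactly the thing a real analytics system never gets to see.

```python
from collections import namedtuple

# Hypothetical record: who was really at the keyboard (unknown in practice)
# and which client instance (machine, account, browser) holds the cookie.
Visit = namedtuple("Visit", ["person", "machine", "account", "browser"])

def count_people_and_cookies(visits):
    """Return (distinct people, distinct cookie-holding client instances)."""
    people = {v.person for v in visits}
    cookies = {(v.machine, v.account, v.browser) for v in visits}
    return len(people), len(cookies)

# More "Users" than cookies: three people share one library machine.
shared = [
    Visit("ann", "library-pc", "guest", "ie"),
    Visit("bob", "library-pc", "guest", "ie"),
    Visit("cy", "library-pc", "guest", "ie"),
]

# More cookies than "Users": one person, two machines, two browsers at home.
multi = [
    Visit("dee", "work-pc", "dee", "ie"),
    Visit("dee", "home-pc", "dee", "firefox"),
    Visit("dee", "home-pc", "dee", "ie"),
]

print(count_people_and_cookies(shared))  # (3, 1)
print(count_people_and_cookies(multi))   # (1, 3)
```

Cookie deletion would add still more cookie-holding instances for the same person, which this sketch does not model.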
This raises perhaps the perfect example of why it is important as an analyst to understand the underlying data, how that data translates into the metrics, and the goals of the analysis. For all of the issues I have listed above and more, some might complain that this is simply a problem with data accuracy. But unless we move to some sort of Orwellian scenario where every computer in the world includes a proximity sensor that reads data from implanted chips and transmits identities in every request, there is no 100% solution for identifying users. There are, however, smarter ways to use what you have in the analysis.
Here are a number of points to consider when working on how to best utilize the user tracking of your implementation:
Manage your Implementation
Don’t just install an analytics package and expect to know what every metric means. Work with your vendor and/or your IT folks to make sure you understand how your data is collected. For cookie based tracking this may even include identifying how well your site follows these best practices:
- Count accepted cookies, not attempted cookies. A number of analytics systems will log an id whether it was returned from the user's browser or not. Unless the cookie has been accepted, you can't accurately use the id as a measure of successful identification.
- Use first party cookies instead of third party cookies wherever possible. Whether or not you believe the recent ComScore report on cookie deletion, it is undeniable that today's browsers are becoming increasingly tougher on third party cookies with each release.
- Set a proper P3P header. It is equally undeniable that the most popular browser (IE) has become more reliant on seeing an acceptable privacy policy from the server before it will accept any cookie.
- Don’t overuse cookies. This applies to your whole site. Cookie storage in the browsers is finite. Cookies are limited both per site and in total. If you are using too many of them you will create a different form of cookie deletion for yourself and possibly for the other sites your users are interested in.
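As a rough illustration of the first point, the sketch below separates attempted ids from accepted ids. It assumes a simplified, hypothetical log in which each request records the id the browser sent back (if any) and the id the server tried to set (if any). Real log formats differ by vendor, so treat the field layout as an assumption.

```python
def attempted_and_accepted(requests):
    """requests: iterable of (id_sent_by_browser, id_issued_by_server).

    Either element may be None. An id only counts as accepted once a
    browser has actually sent it back on a later request.
    """
    attempted = set()
    accepted = set()
    for sent, issued in requests:
        if issued is not None:
            attempted.add(issued)
        if sent is not None:
            accepted.add(sent)
    return attempted, accepted

log = [
    (None, "a1"),  # first hit: server issues a1
    ("a1", None),  # browser returns a1, so a1 was accepted
    (None, "b2"),  # browser with cookies blocked: server issues b2
    (None, "c3"),  # same blocked browser: a fresh id on every hit
]

attempted, accepted = attempted_and_accepted(log)
print(len(attempted), len(accepted))  # 3 1
```

Counting the three attempted ids as three "Users" would triple the audience in this tiny example. Only `a1` is a confirmed identification.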
Work with What You Know
Beyond understanding the implementation, you also need to know how the tool arrives at the metrics and what they really tell you about your audience. For the standard cookie based implementation this can also be broken down into a set of best practices:
- Be aware of what you are measuring. A cookie only has a one to one relationship with a specific instance of a browser and user account on a machine. Calling these “Visitors” or “Users” is widely accepted, but in reality you are measuring unique client instances.
- Focus on the Known instead of the Possible. If a user id shows up in your traffic for multiple sessions across multiple days, weeks or months, that is a known return of a unique client instance. You can say for certain that it occurred, segment on it and analyze the behavior. If a new cookie id is seen in your traffic, you know it is possible that it was the first time a user came to the site, but there are several other possibilities. This doesn’t mean that you should ignore the new users metrics, but you do need to be aware of this difference to tell a more accurate story with your analysis.
- Trend on shorter time periods. With the exception of a group of people huddling around the glow of a single computer screen, most methods of inflation or deflation are more likely to occur over time. For this reason a daily count is more accurate than a weekly one which is more accurate than a monthly number. When measuring things like growth trends, consider using rolling averages of shorter time segments.
- Embrace the Value of Session data. The session is not only a smaller segment of time, but there are also methods (such as click stream data) by which it can be at least partially validated.
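The "known return" and "shorter time periods" points can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the input is a hypothetical list of `(date, cookie_id)` hits, the dates and ids are invented, and the function names are mine rather than anything from a particular analytics tool.

```python
from collections import defaultdict
from datetime import date

def daily_unique_ids(hits):
    """hits: iterable of (date, cookie_id). Returns {date: set of ids}."""
    by_day = defaultdict(set)
    for day, cid in hits:
        by_day[day].add(cid)
    return dict(by_day)

def known_returning(by_day):
    """Ids seen on more than one distinct day: known returns."""
    days_seen = defaultdict(int)
    for ids in by_day.values():
        for cid in ids:
            days_seen[cid] += 1
    return {cid for cid, n in days_seen.items() if n > 1}

def rolling_average(daily_counts, window):
    """Trailing mean of daily counts over `window` consecutive days."""
    return [
        sum(daily_counts[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(daily_counts))
    ]

hits = [
    (date(2024, 1, 1), "a"), (date(2024, 1, 1), "b"),
    (date(2024, 1, 2), "a"), (date(2024, 1, 2), "c"), (date(2024, 1, 2), "d"),
    (date(2024, 1, 3), "a"), (date(2024, 1, 3), "e"),
    (date(2024, 1, 4), "b"),
]

by_day = daily_unique_ids(hits)
counts = [len(by_day[d]) for d in sorted(by_day)]

print(sorted(known_returning(by_day)))  # ['a', 'b']
print(counts)                           # [2, 3, 2, 1]
print(rolling_average(counts, 2))       # [2.5, 2.5, 1.5]
```

The ids `c`, `d` and `e` are only possible new users. The ids `a` and `b` are known returning client instances, and that is the segment you can analyze with certainty.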
Learn More Where You Can
Along with managing and understanding your User identification, you should also work with your site to find possibilities for alternative data that would refine what you know about your Users number.
- Identify multiple identification situations that will at least give you a better understanding of how your primary method performs within a segment of your audience. Even options such as an optional site registration and login that only a part of your audience may participate in will give you knowledge through the comparison of 2 IDs.
- Implement alternative forms of identification in other types of web clients such as email and blog readers. Capturing both IDs in the click through events between the other clients and the browser will give you 2 numbers that can help you identify inflation of your primary tracking for a higher value segment of your audience (the ones that are responding to your marketing).
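Comparing 2 IDs can be as simple as pairing them wherever both are present. The sketch below assumes a hypothetical list of `(login_id, cookie_id)` pairs captured on logged-in requests. The ids are invented, and the result only describes the segment of your audience that logs in.

```python
from collections import defaultdict

def compare_ids(events):
    """events: iterable of (login_id, cookie_id) seen together.

    Returns (cookies per login, set of cookies shared by several logins).
    """
    cookies_by_login = defaultdict(set)
    logins_by_cookie = defaultdict(set)
    for login, cookie in events:
        cookies_by_login[login].add(cookie)
        logins_by_cookie[cookie].add(login)
    n_logins = len(cookies_by_login)
    n_cookies = len(logins_by_cookie)
    ratio = n_cookies / n_logins if n_logins else None
    shared = {c for c, logins in logins_by_cookie.items() if len(logins) > 1}
    return ratio, shared

events = [
    ("u1", "c1"), ("u1", "c2"), ("u1", "c4"),  # one login, three cookies
    ("u2", "c3"), ("u3", "c3"),                # two logins share one cookie
    ("u4", "c5"),
]

ratio, shared = compare_ids(events)
print(ratio)           # 1.25
print(sorted(shared))  # ['c3']
```

A ratio above 1 suggests cookie counts overstate this segment, and the shared cookies show where they understate it. Both effects can partly cancel, so it is worth looking at each number and not only the ratio.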
In the end, I believe the golden rule for all of the above is to simply understand what you are analyzing and the answers available from your data AND your analytics tool together. That is the best way to be more accurate in your User measurement and reporting regardless of where on the globe those users are coming from.
-Ian
NOTE: A majority of the content in this post comes from my second presentation at Emetrics last month. Gary Angel wrote a review of the presentation here.




