Using a framework to define and communicate web analytics measurement

In my previous post on Evolving the Web Analytics Data Model I attempted to describe, in perhaps too technical a way, a move away from narrowly defined terms such as Page View, Clickthrough, Impression, etc. to the more generic classification of an Event. In the resulting comments and some discussions with colleagues it became clear that the meaning of what I was trying to describe had been lost and confused, mostly through the use of the phrase “Data Model”.

This got me thinking more about the problem I am trying to solve: how to better define and communicate the core measurements applied to the web. Or, as I stated in a response comment to the post:

to foster the discussion of the Data Model in language that is practical and understandable across multiple disciplines (Statisticians, Programmers, Marketers, etc…)

Variations on a Theme

If there is one truth I have learned over the last 12 years of working with internet technologies, it is that new developments are, more often than not, simply variations on what preceded them. Sure, there are improvements along the way, but the core of what is being done usually stays the same.

This is especially true in the realm of Web 2.0 technologies such as Ajax, where the new name is really a reclassification of DHTML now that all of the browsers (not just IE) support XMLHttp. DHTML with XMLHttp (plus a couple of other ways of pulling and changing data in a page) was a replacement for what was previously done with things like iframes. And all of them, at their core, are methods of dynamically changing part of a web page without having to reload the whole (the theme).

Finding the Web Analytics Theme

Historically, Web Analytics tools have been based on very simple and narrowly defined premises, such as “Visitors Visit sites and view Pages” or “ads are delivered to Impress visitors enough that they Click through to the advertiser’s content”. Even though these terms are focused on particular things, they are so widely accepted and used that they are not likely to disappear anytime soon.

When I hear people declare the death or the replacement of the page view, I often have the knee-jerk reaction of cringing and rolling my eyes. What many of these conversations fail to convey is that Page Views are not really disappearing as a metric, but rather that the term is too narrow and not really applicable to the whole of what can occur across the entire scope of Web Usage. They also usually fail to mention that even if the Page View is no longer useful for one discipline (like Marketing), it can still be very relevant to others (like Site Optimization). (See my first post about events and identifying types.)

So how do we find the terms that are applicable to the whole? The way I look at it, Web Analytics at its simplest is the means to analyze web use. To find the theme behind web analytics, the starting point is to define the core of what web use really is. In taking a stab at this I have boiled it down to this elementary definition:

Users use Clients to engage in Activities comprised of Events

As the definition of web use, this in turn becomes the core theme behind anything you would wish to measure on the web.

Turning the Theme into a Tool

Now that we have our theme, all we need to do is define the components and their relationships, and we have a basic conceptual structure for defining how we measure various pieces of the web.

Enter the Web Analytics Measurement Framework.

The Web Analytics Measurement Framework

In this “reference model” the components break down as follows:

Events

The request unit. Events are the smallest unit of data capture or the specific units of Activity. Examples: Page Views, Impressions, Clickthroughs, Google Map Scrolls, Blog Post Views, etc…

Activity

Formula-driven groupings of events representing periods of activity: Visits, Subscriptions, etc… Activity is the most effective place to measure concepts like duration. Unique client identification can occur here for activity that does not support traditional techniques of client or user identification such as cookies.

Clients

The application by which the activity occurred. It is more than just the User Agent and should also be able to illustrate the application within the application in the case of AJAX, Flex, Widgets, etc… Unique client identification (such as unique id cookies) occurs here whenever possible.

Users

More than just a client. Events between clients are used as the glue to provide multi-client user identification. Users can be human or robotic (instead of just throwing out all robots). With SEO and feed syndication there are plenty of clients emerging that could look like a single user but, through search results or aggregation of feeds, can result in off-site human views of the content. Thus robotic users/clients will become a more important part of analysis for understanding where and how new Users are coming from.
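
One way to picture the four components is as nested data structures. The following is only a minimal sketch in Python, not a formal definition of the framework: every class and field name (identity, type, client_id, is_robot, and so on) is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    """Smallest unit of data capture (hypothetical field names)."""
    identity: str        # what was requested, e.g. a URL or item id
    timestamp: datetime  # when it happened
    type: str            # e.g. "page_view", "impression", "clickthrough"


@dataclass
class Activity:
    """A grouping of events, e.g. a visit or a subscription."""
    kind: str
    events: List[Event] = field(default_factory=list)

    def duration(self) -> timedelta:
        # Duration is measured at the Activity level:
        # time between the first and last event in the group.
        if not self.events:
            return timedelta(0)
        times = [e.timestamp for e in self.events]
        return max(times) - min(times)


@dataclass
class Client:
    """The application through which the activity occurred."""
    client_id: str                     # e.g. a unique id cookie value
    user_agent: str
    application: Optional[str] = None  # app within the app, e.g. a widget
    activities: List[Activity] = field(default_factory=list)


@dataclass
class User:
    """A human or robotic user, possibly spanning several clients."""
    user_id: str
    is_robot: bool = False
    clients: List[Client] = field(default_factory=list)

    def events(self) -> List[Event]:
        # Walk User -> Clients -> Activities -> Events.
        return [e for c in self.clients
                for a in c.activities
                for e in a.events]
```

The nesting mirrors the definition above: a User holds Clients, a Client holds Activities, and an Activity holds Events. A real implementation would likely store these as related records instead of nested lists.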

Expanding the Framework

As I hinted at in some of the component descriptions, there are places in the framework where certain types of measurement such as duration are most effective. As an expansion of the framework definition it is also very easy to add baseline definitions of core relational metrics and KPIs such as Events per Activity. The relationship between unique clients and the users also demonstrates some ideas I have about ways to combat issues like cookie deletion.
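
To make a metric like Events per Activity concrete, here is a minimal sketch in Python. It assumes one possible grouping formula, a 30-minute inactivity gap for splitting events into visits; that threshold and the function names are assumptions for illustration, not part of the framework itself.

```python
from datetime import datetime, timedelta


def group_into_visits(timestamps, gap=timedelta(minutes=30)):
    """Group event timestamps into visits.

    A new visit starts whenever the time since the previous event
    exceeds `gap` (an assumed inactivity threshold).
    """
    visits = []
    for ts in sorted(timestamps):
        if visits and ts - visits[-1][-1] <= gap:
            visits[-1].append(ts)   # continue the current visit
        else:
            visits.append([ts])     # start a new visit
    return visits


def events_per_activity(activities):
    """Average number of events per activity (0.0 if there are none)."""
    if not activities:
        return 0.0
    return sum(len(a) for a in activities) / len(activities)


# Four page views: three close together, one two hours later.
t0 = datetime(2007, 2, 16, 9, 0)
page_views = [
    t0,
    t0 + timedelta(minutes=2),
    t0 + timedelta(minutes=10),
    t0 + timedelta(hours=2),
]
visits = group_into_visits(page_views)
ratio = events_per_activity(visits)  # Page Views per Visit
```

The same ratio applies at any level of the framework once events have been grouped, whatever formula defines the grouping.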

But I will leave these topics for future posts, along with a more formal description of the framework and how it can be applied.

-Ian


2 Responses to “Using a framework to define and communicate web analytics measurement”


  1. 1 Jeroen Feb 16th, 2007 at 4:53 pm

    Ian,

    I have just posted a quite technical reply to your previous post, but I think I can see where you’re trying to go.

    The ‘Page View’ is too specific, so we need to broaden our minds and try to find something more general that still includes the good old page view, but can also accommodate the other beasties in the zoo of possible commercial internet activities.

    I think, however, that, in order to be useful, the subtypes of Event should have some pretty elemental attributes in common. I would suggest the attributes: activity, timestamp, request, and response. Where request and response would both be of type Message. A message would have the following attributes: headers, MIME-type, and content. The headers attribute would be a map of key/value pairs. A request could have an extra attribute: location.

    Be careful with clients, they don’t fall neatly between users and activities. I use more than one client to participate in e-mail threads. A campaign can span more than one channel (e-mail, web, off-line media). I would suggest rearranging the framework in the form of a diamond, with one of the four terms at every corner. The most abstract on top, Users, and the most specific at the bottom, Events. This would mean that Event needs an extra elementary attribute: client.

    I am curious as to where KPIs get located in this framework. Is a KPI a function that assigns a number to (sets of) activities?

    Jeroen

  2. 2 Ian Feb 19th, 2007 at 6:59 am

    Hi Jeroen,

    Exactly right about the “beasties in the zoo” although I don’t think commercial vs. non-commercial is a distinction for consideration here.

    I agree that there are some required attributes for an event but I think those are simply the Identity and the Timestamp (what and when). Beyond that everything else is meta-data, additional attributes that can be captured from the request or the response, a header or a query string parameter, or even by creating an attribute after the fact based on the value of one or more of the collected attributes. In fact, even the two required attributes can be created after the fact. An example of this would be page tags where the actual request is to a data collection mechanism and the identity is most often passed as a parameter in the query string.

    In addition to the two attributes above, in order to be able to sub-class the events into things like Page Views, Impressions, Clickthroughs, etc., a third attribute is needed, which I simply call type.

    As far as the Message concept goes, that appears to me to be just a grouping that separates request data from response data, and that is fine. I have no issues with separating them for understanding where the information originated and would certainly identify them in such terms technically. But in more general terms I am looking at the whole as attributes of the WHAT, where the WHAT is measured through the interaction between a client and a server.

    I am intrigued by the diamond shape but I don’t necessarily think it is a perfect fit either. You are certainly right in pointing out that there are issues between the Users and Clients levels. But most of those issues stem from the strengths and weaknesses of various identification methods.

    For instance, the practice of using a cookie for a unique identifier only identifies a unique client, not a user (unless the cookie is set through a required login process). An email address for, say, a newsletter is much closer to identifying a single person, but even here a person can have more than one email address, or an address can be an alias for many people. The point of the User level is to deal with the client identification and use information from events that transpire between clients (linking client identifications to one another) to arrive at a more accurate measurement of real people.

    A lot of the above is very specific. The point of the framework is to be able to identify what is being measured in more conceptual terms and then apply those concepts downward when defining more specific levels of measurement and data models. All I meant by KPIs was that simple formulas such as Events per Activity apply downward to any level of use of the framework (Page Views per Session, Image Views per Slideshow, etc.).

    In perhaps even more general terms:

    Events are WHAT happened WHEN.

    Activities are a collection or stream of WHATS over a segment of WHEN. (Note, WHEN could sometimes be a period of time equal to the total possible time the WHATs could occur such as Subscriptions to an email newsletter.)

    Clients are HOW that can be identified by unique actors of the how (don’t think of client as Web Browser, think of it as Web Browser with ID cookie value 12345).

    Users are WHO.

    -Ian
