Dennis D. McDonald consults from Alexandria, Virginia. His services include writing & research, proposal development, and project management.


By Dennis D. McDonald

Robert Scoble, in his May 13, 2007 post We Need Better Statistics, does a nice job of starting off a discussion of the problems currently experienced by anyone who tries to make sense of web usage statistics. His post and the comments that follow it provide a compendium of issues including:

  • Lack of agreement among major reporting services.
  • Pressure on the importance of the “page” metaphor given current web technologies (e.g., AJAX).
  • Failure of counting systems to take into account widget-based remote display of files of various types.
  • Apparent unwillingness of some services (e.g., Google, ISPs) to make public their own data on usage.
  • Weakness of voluntary and panel-based systems to represent the “true population” of users.
  • Possibility that current systems are being influenced by heavy amounts of “cheating.”
  • The belief by some that the only thing that really matters is whether or not someone actually purchased something from your web site.

As someone who once made a living designing and managing surveys, samples, and statistical analyses, I understand the frustration of those who need such data to make intelligent buying and pricing decisions. Just figuring out who uses what when it comes to my own blog is challenge enough! 

What questions are you asking? Your answer to how to build a better web usage data mousetrap will depend a lot on what questions you are asking and how intrusive you are willing to be to answer those questions. The web and its associated standards were not originally designed to be an infrastructure for tracking human behavior. 

This means that significant behaviors that cannot be expressed through keyboard or mouse (whether those behaviors occur at or away from the keyboard) may never be universally gathered and reported back to a measuring service without a significant amount of time, effort, and expense.

Why is this visit taking place? We have been spoiled by the relative ease with which we can track certain gross measures of web-based usage and behavior. But we can’t easily get inside the head of individual users and ferret out “deep thoughts” without creating severe policy- and privacy-related concerns. All the widgets, toolbars, pop-up surveys, spyware, and rootkits we throw at the problem can’t get around the fact that we seldom know why someone visits a web site in the first place.

This question of “why” is an important input to tracking outcomes and satisfaction, yet it’s also the most difficult to get at. I don’t see any way around mounting a special effort to gather such data in order to supplement what we gather during the actual “physical interaction” the user has with the web site, but I’m also not naive about the difficulty of doing so. Of course, if you have an eCommerce web site and the visitor ends up purchasing something, you may care less about the “why” (although I’m sure your marketing department will care very much!)

Something else making the collection of good usage data more difficult is the rising importance of relationship-based activity via systems that promote collaboration, information sharing, and specialized “communities,” AKA “web 2.0.”

Community membership is influential - and shifting. In the old days we used to talk a lot about “population demographics,” which led to “the tyranny of the zip code” as a design guide for all manner of marketing messages. These were succeeded by data that classified people by behaviors, one of the most recent being the Pew study A Typology of Information and Communication Technology Users.

As we see with the rise of “web 2.0” initiatives and the increasingly popular use of social media and social networking as marketing tools, people can belong to many different “communities,” each of which gives a slightly different picture of one’s interests, abilities, and influence within the group. Since these communities are constantly changing, a “true” picture of a visitor to a web site should include not only information about that individual but also information about that person’s “community” activities at a given time as well.

Need for transparency. Another important idea I took from the Scoble discussion about web usage statistics was the importance of transparency, by which I mean that the manner in which data are collected and reported should be openly described.

That conclusion was reached long ago by survey research trade associations and is one of the reasons that public opinion surveys that are reported in the mainstream media are usually accompanied by some details about the survey methodology. The same should hold true for data on web site usage, especially if that web site usage data includes any repeat measures of usage or ranking. We need to know, for example, how efforts to “game the system” are controlled by web sites reporting any sort of usage or rating data. Until such data are made available in a clear and professional manner, we have a right as consumers to suspect any of the data being reported to us.
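To make the idea of disclosed methodology a little more concrete, here is a minimal Python sketch of one detail that opinion surveys commonly report: the margin of error for an estimated share. The function name and the panel numbers are hypothetical, and the formula assumes a simple random sample, which voluntary and panel-based web measurement usually is not, so real uncertainty is likely larger than this suggests.

```python
import math


def margin_of_error(share, sample_size, z=1.96):
    """Approximate margin of error for a proportion.

    Assumes a simple random sample; z=1.96 corresponds to a
    roughly 95% confidence level. Hypothetical helper for illustration.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return z * math.sqrt(share * (1.0 - share) / sample_size)


if __name__ == "__main__":
    # Hypothetical figures: 12% of a 1,000-person panel visited a site.
    share, panel_size = 0.12, 1000
    moe = margin_of_error(share, panel_size)
    print(f"Estimated share: {share * 100:.1f}% +/- {moe * 100:.1f} points")
```

A usage report that states its sample size and an error range like this one gives readers at least some basis for deciding whether a difference between two sites, or between two reporting services, is meaningful.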

Copyright (c) 2007 by Dennis D. McDonald 

