Dennis D. McDonald consults from Alexandria, Virginia. His services include writing & research, proposal development, and project management.

Balkanization of the Web - or Just Better Focus?

By Dennis D. McDonald

Last week I posted “Google’s Custom Search Engine Applied to Film Reviews and the Social Media Collective,” where I discussed the creation of specialized search engines that limit their targets to a specified list of URLs. While there are disadvantages to this approach (e.g., failure to retrieve valuable content from beyond your initially defined search scope), I believe the advantages can be substantial.

Limiting one’s search to a given set of web sites or pages is a good example of a lesson I learned early in statistical work: “shoot where the ducks are.” Basically, when you define the scope in advance of your search by specifying a known or defined target area, both your search precision and search performance can be improved.
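To make the precision point concrete, here is a minimal Python sketch of scoping a result list to a predefined set of sites. The site names, URLs, and relevance judgments are all made up for illustration, and this is not how any particular custom search service works internally; it only shows the effect of filtering to a known target area.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: the sites a specialized engine is scoped to.
ALLOWED_SITES = {"reviews.example.org", "socialmedia.example.org"}


def in_scope(url, allowed=ALLOWED_SITES):
    """Return True if the URL's host is an allowed site or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == site or host.endswith("." + site) for site in allowed)


def scoped(results, allowed=ALLOWED_SITES):
    """Keep only the results that fall inside the predefined scope."""
    return [url for url in results if in_scope(url, allowed)]


def precision(results, relevant):
    """Fraction of returned results that are relevant (0.0 for an empty list)."""
    if not results:
        return 0.0
    return sum(1 for url in results if url in relevant) / len(results)


# Made-up "give me everything" result list for a film review query.
raw_results = [
    "https://reviews.example.org/casablanca",
    "https://www.reviews.example.org/vertigo",
    "https://blog.example.com/a-thoughtful-review",
    "https://spam.example.net/buy-dvds",
    "https://spam.example.net/more-dvds",
]

# Made-up relevance judgments; note that one relevant page is out of scope.
relevant = {
    "https://reviews.example.org/casablanca",
    "https://www.reviews.example.org/vertigo",
    "https://blog.example.com/a-thoughtful-review",
}

scoped_results = scoped(raw_results)
unscoped_precision = precision(raw_results, relevant)
scoped_precision = precision(scoped_results, relevant)
```

In this toy data, scoping raises precision because the out-of-scope noise disappears, but the relevant page on `blog.example.com` is also lost. That is the disadvantage noted earlier: valuable content beyond the defined scope is never retrieved.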

This week Mike Stopforth posted “Why I Want Localised, Specialised Search,” where he describes why he wants tools like Technorati and search engines to be able to focus specifically on African and South African web sites. Mike’s post was a follow-up to an earlier post of his where he had commented positively on the experimental South African Gargoyle search engine.

Mike’s arguments for a specialized South African search service make eminently good sense and bear reading. Still, I pointed out two potential disadvantages:

Thanks Mike, this is a terrific explanation. We should be able to “slice and dice” the web any way we see fit, even if it’s just to be able to search a specialized collection of movie reviews or a specialized set of blogs related to social media and social networking. Geographic orientation is an obvious segmentation variable and certainly one of the most important criteria relevant to personal and professional networking.

Two possible caveats arise if we add metadata-based or infrastructure-based identifiers that simplify geographic aggregation of sites:

First, do we risk some “balkanization” of the web if different regions adopt different geographic tagging or aggregation techniques?

Second, do we make it easier for internet-unfriendly governments to track and restrict the free flow of information?

In the ideal world, it should be very easy to “scope” an Internet search just about any way you want — by topic, by groups of sites, by geography, by whatever. There are just too many times when “give me everything” is inappropriate — especially when the placement of the returned items can be gamed.

Assuming that specialized search engines are sustainable, how serious is my concern? My blogging service vendor Squarespace already automatically “pings” a whole host of search engines when I make a change to Dennis McDonald’s Blog. If more specialized search engines become available, whose responsibility will it be to ensure that all relevant specialized search engines are “pinged” so my materials are indexed?
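To illustrate the bookkeeping involved, here is a small Python sketch that prepares one update notification per search service. It assumes the XML-RPC `weblogUpdates.ping` convention that many blog ping services accepted; nothing here says that is what Squarespace actually uses. The endpoint addresses and the blog URL are hypothetical placeholders, and the sketch only builds the request bodies without sending anything.

```python
import xmlrpc.client

# Hypothetical ping endpoints: one general engine plus two specialized ones.
PING_ENDPOINTS = [
    "http://ping.general-search.example/RPC2",
    "http://ping.regional-search.example/RPC2",
    "http://ping.film-reviews-search.example/RPC2",
]


def build_ping(blog_name, blog_url):
    """Serialize a weblogUpdates.ping(name, url) call as an XML-RPC request body."""
    return xmlrpc.client.dumps((blog_name, blog_url), methodname="weblogUpdates.ping")


def plan_pings(blog_name, blog_url, endpoints=PING_ENDPOINTS):
    """Pair every endpoint with the request body that would be posted to it."""
    payload = build_ping(blog_name, blog_url)
    return [(endpoint, payload) for endpoint in endpoints]


# The blog URL is a placeholder, not the real address.
pings = plan_pings("Dennis McDonald's Blog", "http://blog.example.com/")
```

The point of the sketch is that the list of endpoints is the part somebody has to maintain. Every new specialized engine is another entry, and whoever owns that list (the blogger, the blogging vendor, or some shared directory) decides which engines ever hear about an update.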

I say that as someone who realizes how important, volume-wise, search engine hits are to the traffic that reaches my blog. At least 46% of the hits on my blog during 2006 were Google related. Perhaps inclusion in more specialized search engines might drive that higher, as long as I’m included.

On the other hand, hits on my blog that are referrals, not search based, tend to be, I believe, much more targeted to topics I care most about. These referrals tend to come from sites and sources that are more closely aligned with my interests related to social media, social networking, knowledge management, expertise location, and digital asset management. Such hits tend to be more “social” in origin; that is, they come from blogs, aggregators, search engines, or other sources that are more closely aligned with my own interests.

All things being equal, a hit related to an RSS or Atom feed, due to the “opt in” element, may tend to be more closely aligned with my interests than someone’s random search for nude vampire pictures.

Copyright (c) 2007 by Dennis D. McDonald

