Web 2.0 and the Integrity of Individual Works

by Dennis D. McDonald

Part of the evolving Web 2.0 environment is the collection and redistribution of original content and conversations, as well as meta-information (tags, etc.) relating to that content. What we are seeing is not only the easy creation of online content and media but also the sharing of that content and its associated information and conversations. Aggregation of feeds and mash-ups of application services are examples of operations in which original content and functionality may be combined and recombined in new, unique, and potentially powerful ways. We need to make sure, though, that in the process we don't destroy the integrity of the intellectual property we now find so easy to copy and manipulate.

Now, I’m probably an example of an "old school" attitude, but I like to know a few things when I fire up my computer and go online:

  1. Where am I? Am I working on my own machine or on a remote one, and are the content and tools I'm using on my machine or elsewhere? I like to have a physical grounding in where I'm working for a variety of reasons, such as (a) transmission latency may affect the speed and performance of the network operation I'm performing and (b) if my connection with a remote server is lost I want to recover and/or continue working to the extent possible.
  2. Whose information am I working with? I like to know the provenance of the information -- its history, ownership, authorship, and credibility. Even "facts" don't make much sense without context, and the key contextual details I want to know about something I'm reading or viewing include (a) who wrote this thing and (b) is this what was actually written. I need some basis for trusting what I'm reading.

This brings me to a post I put out on the LinkedIn Bloggers group on Yahoo! recently; I'll repeat parts of it here:

I was recently researching the origins of searches to my blog and tracked back to an HTML document online that had been generated by a news research company that provides keyword-based tracking services to its customers. 100% of my original content was displayed, but with a few twists. I was concerned about the manner in which the original article was presented and expressed this by email to the company:
  1. all the links on the left side of my page had been deleted; I use these as a navigational device within my blog. Their loss means that a major feature is gone even though a link to the original article is available at the top of the page. One of the items missing was my email address.
  2. my copyright notice, displayed on all pages, was gone.
  3. the formatting was incorrect and resulted in a run-on between a sentence I wrote and a quotation (attributed) that I had included from another publication. The result was that a reader of this stripped-down version of my original text would misinterpret what I said and what the quotation said.
I am still attempting to get a response from this company (they have a US address) and I notice that their indexers are still crawling my site.

I'm bringing this up because, while I normally believe that it makes great sense to tag text separately from how it is presented, my experience is one example where failure to represent the original structure (at least in terms of the third point I make above) may result in a misrepresentation of what I actually wrote.

Now, anyone who publishes on the web knows how easy it is to copy and redistribute digital information. Publishing is an act of faith, and a body of law and related enforcement mechanisms has evolved to “protect” the rights of the intellectual property’s owner. We all know how digital communications have fundamentally altered the economics of intellectual property distribution, and I don’t want to get into issues such as DRM and Fair Use here.

What I am concerned about is that, as the web becomes more and more interactive, social, and embedded in daily life, we still need to be able to keep track of what belongs to the individual and what belongs to the group. Just as many people are concerned about digital copies being made of their intellectual property without their permission, we also need to respect and maintain the physical and intellectual integrity of individual works, even when those works are intended for social and interactive use and manipulation.

Now, back to my quoted posting above. I still haven’t heard back from this publisher, and its crawlers still regularly visit my site. But I know that at least one of my publications exists in cyberspace in a corrupted form, and that people reading it will not receive the intended message. My assumption is that the formatting mistake was not intentional: the publisher’s software stripped out “extraneous” tags and frame information, and an error in how it parsed those tags caused the run-on.
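
To make that assumption concrete, here is a minimal Python sketch of how tag stripping can fuse an author's sentence with a quotation, and one way to avoid it. Nothing here reflects the publisher's actual software, which is unknown; the sample markup, the function names, and the list of block-level tags are all hypothetical and chosen only for illustration.

```python
import re
from html.parser import HTMLParser

# Hypothetical fragment: the author's own sentence followed by an
# attributed quotation from another publication.
SAMPLE = (
    "<p>I disagree with this claim.</p>"
    "<blockquote>Tagging solves everything.</blockquote>"
)


def naive_strip(html: str) -> str:
    """Delete every tag and keep only the text, ignoring structure."""
    return re.sub(r"<[^>]+>", "", html)


class BlockAwareExtractor(HTMLParser):
    """Collect text, starting a new block at each block-level tag."""

    # Assumed, incomplete list of tags that mark a structural boundary.
    BLOCK_TAGS = {"p", "blockquote", "div", "li", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.blocks = []    # finished text blocks, in document order
        self._current = []  # text pieces of the block being built

    def _flush(self):
        text = "".join(self._current).strip()
        if text:
            self.blocks.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        self._current.append(data)


def block_aware_strip(html: str) -> str:
    """Strip tags but keep a blank line between block-level elements."""
    parser = BlockAwareExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # keep any trailing text outside a block tag
    return "\n\n".join(parser.blocks)


if __name__ == "__main__":
    print(repr(naive_strip(SAMPLE)))
    print(repr(block_aware_strip(SAMPLE)))
```

With the naive version the two sentences run together with nothing between them, so a reader cannot tell where the author stops and the quotation starts. The block-aware version still discards presentation, but it keeps the one piece of structure that carries meaning: the boundary between the two blocks.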

I’ve written here about my concern over the potential misrepresentation of my work. Is the publisher as concerned that its subscribers may not be able to trust the integrity of the product it delivers to them? So far I have no way of knowing.

