15 August 2010

Call for Collaboration: Blogosphere Analysis, Visualization, and Reducing "Knowledge Creep"

Software designers (and users) often speak of "feature creep" -- a once-reasonable application becomes bloated and unwieldy as features are added without rethinking its cohering principles.  In software project management, we speak of "scope creep" -- the tendency for the target feature-set to change and grow as development progresses.  The phrase "information overload," popularized by Alvin Toffler, has been applied to the challenge of navigating Internet information flows, but I think a more accurate phrase would be "Knowledge Creep."

I have started a new coding experiment, resulting from recent discussions regarding the difficulty most advanced digerati (AD) have in keeping up with the daily tsunami-like output of blog and news feeds.

By "advanced digerati" I don't necessarily mean anything terribly special or commendable.  If you use Google Reader or a similar interface to keep up with many blogs and feeds, and you have subscribed to more than 30 feeds, then congratulations, you qualify.

AD tend to read newsfeeds on a variety of topics.  They might organize feeds into meaningful subgroups or categories based on their proclivities and the capabilities of their feed-reading system.  But problems develop over time with high-volume blog and feed reading, along the lines of the frog-in-boiling-water trope.  As we find new feeds of interest, we slowly, incrementally increase the volume of information we must process, until we have unwittingly made it unmanageable.  People who accumulate many Facebook friends over a few years experience the same problem: their "Wall" becomes an intimidating stream of info tidbits more aptly named "Trivia Tsunami."

This is similar to the oft-discussed problem of information overload.  Subscribed blogs and news feeds, however, offer interesting (arguably unique) opportunities to address the problem:

  • The information is highly structured.  Blogs and newsfeeds use RSS, Atom and other content-syndication standards that make it easy to automate content processing and analysis.
  • Blogs and feeds were among the very first parts of the Web to adopt Semantic Web ideas, in the strict Tim Berners-Lee sense of identifying semantic elements using XML to facilitate computer manipulation.
  • The Blogosphere is particularly, annoyingly loquacious about itself.  For instance, I subscribe to Slashdot, Gizmodo and Engadget, and often find myself reading essentially the same story thrice.  That means the feed-reading process, by design, adds the insult of needless redundancy to the injury of information overload.
  • It is fairly straightforward to write software to "spider" or "crawl" blogs and feeds -- as Google does for indexing Web sites as well as news -- to obtain high-value metadata other than that which would be needed for content search.
  • To date, the design thinking of most Internet Applications (among which I would include the Web itself) has dramatically neglected the impact of time on usability.  Interfaces dealing with datasets that grow geometrically and independently of the individual user need to apply an entirely different set of design metrics from those suited to, say, a word processor.  Call it "History Creep."

For these and a host of other reasons, I've been writing some code to spider-crawl news feeds.  I plan to extract some interesting data that could be of immediate practical value to a lot of people, particularly the aforementioned AD.
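
To make two of the points above concrete (feeds are structured XML, and different feeds often carry essentially the same story), here is a minimal Python sketch.  It is only an illustration under stated assumptions, not a description of the project itself: the two sample feeds, the stopword list, the function names and the 0.5 threshold are all hypothetical, and word-overlap (Jaccard) similarity on titles is just one simple way to flag likely repeats.

```python
import re
import xml.etree.ElementTree as ET
from itertools import combinations

# Hypothetical sample feeds (RSS 2.0), inlined so the sketch runs offline.
FEED_A = """<rss version="2.0"><channel><title>Feed A</title>
<item><title>Apple unveils new iPad with faster chip</title><link>http://a.example/1</link></item>
<item><title>Ten tips for better sleep</title><link>http://a.example/2</link></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Feed B</title>
<item><title>New iPad unveiled by Apple, with a faster chip</title><link>http://b.example/9</link></item>
<item><title>Linux kernel release notes</title><link>http://b.example/10</link></item>
</channel></rss>"""

# Assumed, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "the", "by", "with", "of", "for", "to", "in"}


def parse_items(rss_text):
    """Return (feed_title, item_title, item_link) for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    feed = root.findtext("channel/title", default="")
    return [
        (feed, item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


def tokens(title):
    """Lowercased word set of a title, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", title.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicates(items, threshold=0.5):
    """Pairs of links from different feeds whose titles overlap at or above the threshold."""
    pairs = []
    for x, y in combinations(items, 2):
        if x[0] == y[0]:
            continue  # only compare stories across feeds
        score = jaccard(tokens(x[1]), tokens(y[1]))
        if score >= threshold:
            pairs.append((x[2], y[2], round(score, 2)))
    return pairs


if __name__ == "__main__":
    all_items = parse_items(FEED_A) + parse_items(FEED_B)
    for link_a, link_b, score in near_duplicates(all_items):
        print(link_a, link_b, score)
```

On the sample data this flags the two iPad stories as a likely repeat and leaves the other items alone.  Comparing every pair of items is quadratic, so a real reader with hundreds of feeds would need something smarter, and titles alone will miss repeats that are worded very differently.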

I know the foregoing is vague, and I don't mean to sound coy.  But I think I'm on to something very interesting, and I'm loath to tip my hand quite yet.

I'm not the first person, by far, to ponder these problems and possibilities.  mSpoke in particular was doing very interesting things for its FeedHub service.  However, following mSpoke's acquisition by LinkedIn, it's not clear how, nor in what form, that technology may resurface.

There are some interesting opportunities here.  If this is an area of interest for you, and you have the time and desire to collaborate, and especially if you're already involved in a project working along similar lines, please contact me.  It's a rich, complex problem-space, and I'm gonna need all the help I can get.

