embedded video is the new fax

4 December 2006, 1:42 pm

Lately I’ve been spending time with some of the late-20th-century classics of techno-punditry: books I’ve been ashamed to admit I never actually read, like Negroponte’s Being Digital and Norman’s The Invisible Computer. These folks are unanimous in their disdain for the fax, which they proclaim a giant step backwards in information technology. The fax takes information that is (usually) mostly text, turns it into a graphic, and sends it as analog information. The human brain can process graphic-as-text directly, but if you want to put the faxed text back into a form you can manipulate electronically (a word processor, say), you either have to scan it with optical character recognition software or retype it from scratch.

The last several times I used a fax, it was to transmit an indication that a formal document had been signed. (It was usually followed up by sending the physical signed document to the recipient.) The decline of the fax is one of the biggest prognostication victories of these books (although someone should probably give Negroponte credit for more-or-less inventing TiVo).

I’ve also been reading quite a bit about the semantic web (supposedly coming soon to an Internet near you). It concerns the aboutness of documents. You’re probably familiar with the problem — you’re looking for information on a topic, so you type some keywords in a search box. You get back a huge list of documents, some of which are about the topic, but many of which just mention it. The central concept of the semantic web is that documents should contain information about their contents that search engines and other tools can use.

The other day, I was working with a prototype of a web-based research tool for a specific content domain (kind of like Allmusic.com or IMDb, the Internet Movie Database). I typed in keywords and got, typically, a list of documents that was too big to be useful. As on Google, the search results displayed snippets showing the search keywords in their context in the document. In addition, the search results told me under which keywords each document was filed. It was easy to make a good assessment of how useful a document was likely to be without accessing the whole thing.

However, the search results also included audio and videocasts. I could see how these were filed, but I couldn’t hear my search terms in context, and it wasn’t nearly as easy to determine if they were likely to be useful or not. This has also been an issue lately at work, where we have literally hours of un-indexed video that sometimes contains vital kernels of information. It suddenly struck me that even though the audio and video are stored in digital files, from an information perspective, they might as well be analog. They’re almost as bad as faxes. In order to determine that a 10-minute podcast isn’t relevant to my needs, I may have to spend (waste) 10 minutes listening to it. If I’m at home, time-to-download is a significant annoyance. If I’m at work, just determining which video file has the interview session I need to review is a daunting task. I’ve seen posts recently to some web-fora looking for ways to speed up podcasts so the information in them can be extracted more quickly — I don’t think it’s just me.

I hope the architects of the semantic web don’t ignore rich media. I hope the next generation of video and audio file formats contains aboutness information, like transcripts, summaries, timestamps, and production credits. But in the meantime a video clip on the web is much less useful to me than a few sentences about its contents. And without those few sentences, I may not bother.
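As a rough illustration of what that aboutness information might look like, here is a minimal Python sketch. Everything in it is an assumption made up for the example: the `MediaRecord` and `Segment` names, the fields, and the sample interview data do not come from any real file format or standard. The point is only that once a recording carries a summary, credits, and a timestamped transcript, a search tool can show a keyword in context for audio and video the same way it does for text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Segment:
    """One timestamped passage of a transcript (hypothetical structure)."""
    start_seconds: float  # where the passage begins in the recording
    text: str             # what is said in that passage


@dataclass
class MediaRecord:
    """Descriptive information stored alongside an audio or video file."""
    title: str
    summary: str
    credits: List[str] = field(default_factory=list)
    segments: List[Segment] = field(default_factory=list)


def find_in_context(record: MediaRecord, term: str) -> List[Tuple[float, str]]:
    """Return (start time, transcript text) for each segment mentioning term."""
    needle = term.lower()
    return [
        (segment.start_seconds, segment.text)
        for segment in record.segments
        if needle in segment.text.lower()
    ]


def format_timestamp(seconds: float) -> str:
    """Render a position in seconds as minutes:seconds."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


# Made-up sample data, purely for illustration.
example = MediaRecord(
    title="Interview session 3",
    summary="A discussion of faxes, search, and indexing video.",
    credits=["Interviewer", "Guest"],
    segments=[
        Segment(0, "Introductions and background."),
        Segment(95, "Why the fax was a step backwards."),
        Segment(410, "Indexing video so it can be searched."),
    ],
)

for start, text in find_in_context(example, "fax"):
    print(format_timestamp(start), text)
```

With a record like this, a results page could show the summary and the matching passages with their timestamps, so deciding whether a 10-minute recording is relevant would no longer require listening to all 10 minutes.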

One comment on “embedded video is the new fax”

  1. Ezra

    I agree with you re: podcasts that there is unfortunately a lot of good stuff there that can’t be easily extracted. In addition to having some kind of transcript, I want to use something like the DSP systems that radio stations use to remove all the little pauses from spoken words to increase their ad inventory (i.e. time). And better indexing (e.g. the They Might Be Giants Podcast usually has 5-10 songs on it, but no way to move between them but to fast forward or rewind).

    Also, semantic web: I really think it’s never gonna happen.

