From Nick Carr, here’s a New York Times article on the semantic web – both Nick and the Times reporter are quite excited about the semantic web’s potential to make the current generation of search engines obsolete. The Times report calls the semantic web “Web 3.0”, and any talk of “Web 3.0” makes me want to shout “Web 4.0 is psychic, Web 5.0 is omnipotent!” in my best mad-scientist voice, but aside from that, it’s a solid article – it just overlooks the one huge roadblock in the way of a semantic-web-related “Web 3.0” boom. I call that roadblock “Bill.”
I’m exaggerating, of course – the roadblock in the way of Web 3.0 isn’t just Bill, but the entire class of people like Bill. Bill himself is an independent webmaster who’s got a large content site that focuses on a popular hunk of electronics. There’s nothing particularly flashy about his business – he built up traffic, sells ads and manages his costs, and because of this he doesn’t have to work for anybody but himself. If there’s one thing Bill hates, it’s automated scrapers. They drive up his costs and they slow down his site for the legitimate customers – just like Google, Bill knows ad-driven sites thrive on speed. So Bill blocks bots – he stops every single spider or scraper that comes near his site unless it’s already got a proven record of delivering traffic. And he’s not the only webmaster to do so, although he’s the only one I know of who’s putting together a commercial product to share his solutions with others.
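To make that policy concrete, here is a minimal sketch of a default-deny rule for crawlers. I don’t know how Bill’s product actually works, so everything here is an assumption for illustration: the function name `should_block`, the allowlist entries, and the list of bot hints are all hypothetical, and matching on the User-Agent header alone is easy to spoof.

```python
# Hypothetical allowlist: crawlers that have already proven they send traffic.
# The entries are illustrative, not taken from any real configuration.
ALLOWED_CRAWLERS = {"googlebot", "slurp", "msnbot"}

# Hypothetical substrings that suggest an automated client.
BOT_HINTS = ("bot", "spider", "crawl", "scrape")


def should_block(user_agent: str) -> bool:
    """Return True if the request looks like a bot that is not on the allowlist."""
    ua = user_agent.lower()
    # Known, traffic-delivering crawlers are let through.
    if any(name in ua for name in ALLOWED_CRAWLERS):
        return False
    # Anything else that identifies itself as a bot is refused.
    return any(hint in ua for hint in BOT_HINTS)
```

The point of the sketch is the shape of the rule: a brand-new spider is refused by default, no matter how well behaved it is, simply because it has no track record yet.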
I’m very bullish on Bill’s product. There’s a ton of scraper crap out there already, and if the “semantic web” project takes off, there will be even more of it – a whole flurry of startups, each trying to make a buck by sending out a spider that ever-so-slightly degrades the performance of the sites it visits. You’ll need a solution like Bill’s if you don’t have one already. Of course, this means that any startup reliant on a new bot is going to be at a severe disadvantage, one that’ll probably prevent the semantic web from developing into a startup- and VC-fueled “Web 3.0” boom, complete with its own series of conferences. Instead, the semantic web will have to be developed by already-established search engines – the Googles and Yahoos and Microsofts of the world. If a startup succeeds in this space, it’ll only be because a Yahoo or a Microsoft gets too far behind a Google and starts licensing out the raw data from its crawler.