When McLean-based Cvent Inc. filed a $3 million copyright lawsuit against a West Coast competitor this spring, the software company didn’t just allege simple plagiarism. Cvent, which offers a database of venue profiles for corporate event planners, accused rival Eventbrite Inc. of quietly unleashing an automated program — a webbot or “bot,” for short — on Cvent.com to purloin thousands of pages of valuable content.
In its complaint filed May 10 in federal District Court in Alexandria, Cvent alleged the San Francisco company had taken information that cost more than $10 million to create and reproduced it on its own website — errors intact.
Cvent’s suit highlights a prime fear of companies whose stock in trade is a mass of publicly available data: Web scraping. The widespread but sometimes legally hazy practice — in which tailor-made programs mimic a human user to harvest content from the Web — runs the gamut from benign to malicious.
In some cases, scraping is used to help market researchers or create Web mashups that stitch together data in new and creative ways.
In others, it serves as a vehicle for corporate espionage and piracy. The demand for scraping has spawned a market for custom-built bot software, as well as for software to thwart those bots.
Scraping as a means of stealing Web content occurs “on a fairly regular basis,” said attorney Karl Means, who heads the intellectual property group at Potomac-based Shulman Rogers Gandal Pordy & Ecker PA.
His practice confronts scraping-related piracy about a half-dozen times a year, Means said. “As long as the information is out there, people are going to abscond with it. If you think about it, it’s essentially plagiarism.”
The problems associated with scraping are broader than intellectual property, however — a fact underscored in June by a high-profile and embarrassing security breach at AT&T Inc. A group of hackers exploited a security flaw in AT&T’s iPad 3G network to scrape 114,000 customers’ e-mail addresses, including addresses from the military, media, Congress and the White House.
The group, Goatse Security, which derives its name from an infamously obscene website, said it attacked AT&T as a “service to our nation,” to expose a gaping security hole.
Cvent’s lawsuit claims a bot that automatically copies website material had accessed cvent.com several times between August and October 2008.
Eventbrite, the lawsuit claims, took data from 1,613 of Cvent’s copyrighted venue profiles and earlier this year copied and redistributed the information in a “wholesale and indiscriminate” manner on eventbrite.com — including typographical errors, duplicated paragraphs and incorrect tax rates.
Neither party would comment on the suit, and Eventbrite has not yet filed a formal response. In lieu of a temporary restraining order, the parties have agreed not to download information from each other’s websites during the litigation.
Cvent claims it spent more than $10 million researching and building its venue database, dubbed the Cvent Supplier Network, which compiles information like meeting room capacity and amenities for each facility. The company calls the database “a key differentiator” that gives it a competitive advantage over rivals.
Cvent also said it spent $800,000 over the past three years to create and market a destination guide that enables planners to compare locations across cities.
Stealing another company’s competitive advantage is typically the motivation for engaging in Web scraping.
But Michael Schrenk, a Las Vegas- and Minneapolis-based bot designer, lecturer and author of “Webbots, Spiders & Screen Scrapers,” doesn’t see it as clear-cut pilfering. Scraping can be either legitimate or nefarious, he said, and it represents “probably the most exciting area of Web development.”
“Basically, you can make the Internet a lot more useful than what it is,” he said. “Instead of taking the Internet and using it the way it’s presented to you, you can actually [remake] it the way you want the Internet to look.”
Schrenk described his customers as “the people that tend to be a little more adventurous,” those in procurement and fraud detection fields, private investigators and even journalists.
Tech analysts declined to venture an estimate on the size of the market for the services that Schrenk offers, and none of the software companies interviewed for this article would specifically discuss their clients.
“In order to get that competitive advantage, in order to keep it, you got to be kind of quiet about what you’re doing,” said Schrenk, who recently addressed the national hackers convention, DEF CON 17. “No real numbers will ever be made [on the size of the market]. It’s absolutely impossible. But you have to assume there’s quite a bit of it going on.”
On the flip side, the market for combating scraping is “absolutely massive,” said David Crowder, CEO of Pramana Inc., an Internet security company. The Atlanta-based startup grew out of the Georgia Institute of Technology and specializes in products that detect and block bots and prevent scraping.
The company initially thought its products would be used primarily to stop fraudulent account creation and spam posts on websites. Instead, scraping has become its customers’ single largest concern.
The biggest target for scraping, Crowder said, is actually nonsensitive — but still valuable — content, like Facebook profile information and original news content.
Those companies are thus presented with a conundrum, he said. “They want that information as public as possible because it drives traffic to their site, but they want to protect it as much as possible because that’s their asset.”
And businesses that make their money on subscriptions see their services become less valuable when their data is scraped and re-created elsewhere on the Web.
That puts legitimate content providers in the position of “competing with their own data that they paid to create,” Crowder said. “It’s absolutely mind-boggling.”
Source: http://www.bizjournals.com/washington/stories/2010/07/12/focus1.html?page=all