hiQ Labs, Inc. v. LinkedIn Corp.: Scratching the Surface of Web Scraping
Neha Mehta is a J.D. Candidate, 2021 at NYU School of Law.
Introduction to Data Scraping
It’s hard to imagine how the
Internet would function without web scraping. Web scraping, often referred to
as crawling or spidering, is the automated extraction
and collection of large amounts of data from websites. The primary function
of web scraping is to utilize the repositories of data collected to provide
meaningful insights on a variety of metrics, including real estate price
comparison, website competition monitoring, and stock market analysis. The practice is ubiquitous and has been
employed by many sites, including common search engines, such as Google and
Yahoo. Similarly, many websites permit web scraping by third parties to provide
real-time analytics. Despite its benefits, scraping can be
problematic when web scrapers collect information from sites without explicit
consent. In the last few years, the improper
collection, retention, and use of third-party data has grown to be
an alarming phenomenon.
Computer Fraud and Abuse Act
Even though web scraping is pervasive,
the legality of the practice has not been definitively settled. Given the
novelty of web scraping, in addition to its highly technical nature, there is
no comprehensive legal framework that aims to regulate web scraping. However, plaintiffs
who have sought to bring suit against third-party companies that engage in
automated scraping of user data have turned to the Computer Fraud and Abuse
Act (CFAA). The CFAA was enacted in 1986, before the modern web existed, as an
anti-hacking statute; it imposes both criminal and civil liability for unauthorized access to computers. The
CFAA broadly states that “whoever…intentionally accesses a computer without
authorization or exceeds authorized access, and thereby obtains…information
from any protected computer…shall be punished.” Because the CFAA does not
explicitly define “without authorization,” courts have struggled to interpret
its meaning, often reaching conflicting interpretations. For
example, one issue the courts have had to confront is whether “without
authorization” should be interpreted within the scope of a company or site’s
terms of use.
Background
The hiQ Labs,
Inc. v. LinkedIn Corp. case represents the newest development in
cases concerning third-party web scraping practices in violation of a site’s
terms of service. HiQ Labs, a data analytics company, scraped data from public
LinkedIn profiles to develop competing analytics tools; hiQ Labs would
routinely sell the data it had aggregated from LinkedIn users to employers. In response, LinkedIn issued a cease-and-desist
letter claiming that hiQ Labs had violated the CFAA and LinkedIn’s User Agreement,
and demanding that hiQ Labs stop accessing and retaining public user data. HiQ Labs
preemptively filed suit seeking injunctive relief, which the district court
granted. On appeal, the Ninth Circuit had to determine whether any further scraping
after hiQ Labs received LinkedIn’s cease-and-desist letter was
“without authorization” within the meaning of the CFAA.
The Ninth Circuit found that
automated scraping of publicly accessible data likely does not violate the
CFAA, even if the site owner tries to revoke access through a cease-and-desist
letter. The court reasoned that the “without authorization” provision does not
apply to publicly accessible data; rather, it covers private data that web scrapers
collect when circumventing “permissions, such as username and password
requirements.” Here, LinkedIn’s data was open to the public. Moreover, the
court pointed to the legislative
history of the CFAA and noted that the “prohibition on unauthorized access”
was not meant to police automated scraping of publicly available data.
Implications
While hiQ Labs, Inc. v.
LinkedIn Corp. signifies a win for those who advocate that information on
publicly available sites is free to utilize, the Ninth Circuit did not issue a
slam-dunk ruling; rather, the court’s opinion was far from definitive. Even
though the court concluded that scraping public data from websites likely does not
violate the CFAA, it observed that there were alternative
laws that could provide recourse to corporations or websites that seek to
prevent wholesale copying of public information, including claims rooted in
trespass to chattels, copyright infringement, conversion, and unjust enrichment.
Thus, the Ninth Circuit’s holding may be narrow in scope, limited to the
legality of web scraping under the CFAA.
Additionally, the court expressed
concern about providing companies, like LinkedIn, sole discretion in deciding
who can collect and use data that companies do not own and otherwise make publicly
available. Providing companies with the power to determine what information
third-party sites may collect and use may lead to the creation of an
information monopoly that would work contrary to the public’s interest in preserving
an open and fair Internet. However, if authorization within the meaning of the
CFAA is interpreted to mean password- or login-protected access, the court’s decision
may in turn incentivize
companies and websites to implement security measures that would transform
publicly available data into private data. This is especially likely given that
cease-and-desist letters, one of the few legal remedies available to companies,
are not likely to deter third-party scrapers.
For the time being, the Ninth
Circuit’s ruling provides relevant guidance on how the court will treat future
web scraping cases. However, given the lack of uniformity nationwide regarding
the interpretation of “authorization” and the perceived benefits of web
scraping, clarity may come only if the Supreme Court resolves lingering
questions related to the scope and application of the CFAA.