Archive for January, 2018

TL;DR

Mobile networks are on the cusp of a step-change improvement in speed and bandwidth. This will make previously infeasible capabilities (like multi-gigabit-per-second data transmission) practical. Government involvement in the oversight of these new networks may heighten consumers’ privacy concerns. Society should welcome this change, but perspectives on the boundaries between private and public formed in the era of 4G need to be updated.

Detailed Discussion 

The mobile networks in developed nations have been sufficient to support substantial progress toward connecting people and things to each other. However, these networks were built primarily to support voice communication and other standard applications like email and web surfing. Due to a variety of technical challenges, they are struggling to meet the demands of newer, more intensive applications like augmented reality (AR), autonomous vehicles, and always-on HD video streaming. To address these challenges, private telecoms firms in the United States, China, Japan, and South Korea have been racing fervently to build out so-called fifth-generation (“5G”) network architectures.

If and when they succeed, 5G will fundamentally change the relationship private individuals have with technology and the ubiquity of computing in society. The promise of 5G is that mobile communication networks will be able to handle much larger transfer volumes with lower latency. This will make possible transfers which were previously prohibitively costly or slow. According to an article from DMV.org, modern automobiles may be able to continuously transmit information about your location, identity (e.g. fingerprints, facial images), and even your health (such as your heart rate and posture). Most connected devices lack the local storage capacity to keep long time series of all that information, but the latency and throughput guarantees provided by 5G would allow it to be streamed out to a persistent data store, where it could be used to build a more detailed profile and possibly be joined with other information about the device or the individual(s) interacting with it.
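A quick back-of-envelope calculation illustrates how trivial these transfer volumes become at 5G speeds. The per-sample payload size and sampling rate below are illustrative assumptions, not measurements; the 10 Gbps figure is the low end of the speeds cited by the Network World reference.

```python
# Back-of-envelope: how much telemetry a connected car could stream,
# and how quickly a 5G link could move it.

BYTES_PER_SAMPLE = 1_000   # assumed: location + biometric payload per reading
SAMPLES_PER_SEC = 1        # assumed: one reading per second
SECONDS_PER_DAY = 86_400

daily_bytes = BYTES_PER_SAMPLE * SAMPLES_PER_SEC * SECONDS_PER_DAY
daily_mb = daily_bytes / 1e6
print(f"Daily telemetry per vehicle: {daily_mb:.1f} MB")  # 86.4 MB

# Network World cites next-generation 5G speeds of roughly 10 to 20 Gbps.
GBPS_5G = 10
upload_seconds = (daily_bytes * 8) / (GBPS_5G * 1e9)
print(f"Time to move a full day's log at 10 Gbps: {upload_seconds * 1000:.1f} ms")
```

Even at a generous sampling rate, a full day of per-vehicle telemetry fits in well under a second of 5G airtime, which is what makes continuous streaming to a persistent store plausible.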

In light of the new data transfers that will be possible, private individuals need to consider the new context within which they will be asked to make disclosure choices. In general, more consideration will need to be given to longitudinal data and what can be learned from it. For example, disclosing “location” in the 4G world may (for some applications / devices) just mean that you are consenting to the existence of a real-time endpoint that holds your information and can be used to trigger events like location-based ads. However, disclosing “location” in the 5G world may carry more weight, as it may imply consenting to the disclosure of time-series data which could be used to derive other information like patterns of behavior.
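To make the longitudinal-data point concrete, even a crude analysis of timestamped location points can surface patterns of behavior, such as a likely home location, that no single real-time fix reveals. The coordinates and the night-hours heuristic below are fabricated for illustration.

```python
# A minimal sketch: inferring a likely "home" from a location time series
# by finding the most frequent (rounded) coordinates during overnight hours.
from collections import Counter
from datetime import datetime

# (timestamp, latitude, longitude) samples over several days -- fabricated data
samples = [
    (datetime(2018, 1, 1, 2, 0), 37.8716, -122.2727),    # overnight
    (datetime(2018, 1, 1, 13, 0), 37.8044, -122.2712),   # midday
    (datetime(2018, 1, 2, 3, 0), 37.8716, -122.2727),    # overnight
    (datetime(2018, 1, 2, 23, 30), 37.8716, -122.2727),  # late night
    (datetime(2018, 1, 3, 12, 0), 37.8044, -122.2712),   # midday
]

def likely_home(points):
    """Return the most frequent rounded location seen between 10pm and 6am."""
    nighttime = [
        (round(lat, 3), round(lon, 3))
        for ts, lat, lon in points
        if ts.hour < 6 or ts.hour >= 22
    ]
    return Counter(nighttime).most_common(1)[0][0]

print(likely_home(samples))  # (37.872, -122.273)
```

A one-off location disclosure cannot support this inference; a time series of such disclosures makes it nearly free to compute.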

To complicate things further, it is possible that in the United States 5G may become a government-run public utility. A few days ago, the Trump Administration floated the possibility of a nationalized 5G wireless communication network. Today’s communication infrastructure is dominated by a handful of private firms, and improvements in the infrastructure are largely driven by competitive forces. In an internal memo obtained by Axios, the administration cited national security concerns as the main reason it is considering subverting this competitive process. By some accounts, China’s Huawei Technologies is leading the 5G race in the private sector, and the Trump administration is worried about the national security implications of having a critical part of the U.S. communication infrastructure controlled by a foreign firm. If the federal government builds and maintains the network infrastructure, it may make it easier for government agencies to access the data traveling over it.

To be clear, the intention of this post is not to convince readers that the 5G-pocalypse is coming and that we should fear its might. The promised improvements to mobile networks will open new opportunities for creativity to flourish, for individuals to connect, and for the reliability and effectiveness of institutions and infrastructure to improve. This post merely serves to raise the concern that 5G will alter the context in which individuals make data privacy decisions.

 

References:

  • “Trump team considers nationalizing 5G network”. (Axios)
  • “How Huawei is leading 5G development”. (Forbes)
  • “5G Network Architecture, A High-Level Overview”. (Huawei)
  • “1 billion could be using 5G by 2023 with China set to dominate”. (CNBC)
  • “Autonomous cars, big data, and the post-privacy world”. (DMV.org)
  • “Next-generation 5G speeds will be about 10 to 20 Gbps”. (Network World)

The average Internet user visits dozens upon dozens of websites every day and thereby interacts with the infrastructure and code on which those websites are built.  However, individuals also interact with a layer beyond the digital code, often without their active knowledge: a legal code, the Terms of Service.

While the former code is easily modified and updated frequently, this legal code, the Terms of Service, is typically drafted by attorneys and updated much less often.  Understandably, many companies try to craft their Terms of Service as broadly as possible, to afford the company the greatest amount of protection.  While computer code is typically precise and exact, legal code allows for more ambiguity and interpretation.  The law must strike a balance: protecting companies’ and individuals’ property while also affording good actors reasonable access and use.

One enduring question, curiously not tested until this past decade, was whether violating a website’s Terms of Service constitutes a crime.  The legal basis for treating such a violation as criminal rests on the Computer Fraud and Abuse Act, passed by the US Congress in 1986 and broadened in 1996, which prohibits accessing “protected computers” without authorization, or exceeding authorized access and obtaining information from those computers, where the access involves interstate or foreign communication.

As the Electronic Frontier Foundation (EFF) notes in its detailed blog post, in the most recent case, Oracle v. Rimini, the Court held that violating a website’s Terms of Service is not criminally punishable under the Computer Fraud and Abuse Act (and similar state statutes).

Central to this case, as argued in an amicus brief filed by the EFF, is that definitions of criminal activity must be very specific and follow the Rule of Lenity, which, as the EFF explains, requires that “criminal statutes be interpreted to give clear notice of what conduct is criminal.”  Most critically, the EFF goes on to say, “Not only do people rarely (if ever) read terms of use agreements, but the bounds of criminal law should not be defined by the preferences of website operators.”

Of particular interest to Data Scientists was the question of whether using “bots and scrapers” for automated collection of data constitutes a violation of the law when it violates a Terms of Service.  An important tool in the Data Scientist’s and Data Engineer’s toolbox, automated scraping scripts allow data to be accumulated efficiently.  Further, many individuals note that Terms of Service are often too broad or vague to interpret with confidence.

Scraped data can subsequently be used for academic research, or to develop novel products and services that connect disparate sets of information and reduce information asymmetries across consumer populations (for example, search engines or price tracking).  On the other hand, malicious bots can sometimes burden a company’s website and impede its operations.

Legal scholars have argued that public websites implicitly give the public the right to access (including to scrape) their content, but some companies disagree.  This presents a fascinating quandary that is beyond the scope of this article.

At issue, as argued by Oracle in the case, was whether “the manner in which [the defendant] used” “bots and scrapers” was more than a contractual violation (a breach of the Terms of Service) and also a criminal violation under the Computer Fraud and Abuse Act.  In the oral argument (viewable beginning at 33:42), Judge Susan Graber stated (at 36:00) that she had difficulty seeing how Oracle’s argument fits with the statute and previous cases.  “They had permission to take [the scraped data],” she states, noting that previous cases and statutes refer only to data the defendant did not have legal access to.  Oracle’s attorney rebuts (at 34:47), “The manner restriction is critical to protect the integrity of the computer systems.”  Judge Graber counters that this argument may have force in the civil sphere, but not in the criminal realm.

In another, currently pending case, hiQ v. LinkedIn, the Court noted a further danger:

Under [an aggressive] interpretation of [the Computer Fraud and Abuse Act (CFAA) ], a website would be free to revoke ‘authorization’ with respect to any person, at any time, for any reason, and invoke the CFAA for enforcement, potentially subjecting an Internet user to criminal, as well as civil liability.  Indeed … merely viewing a website in contravention of a unilateral directive from a private entity would be a crime, effectuating the digital equivalence of Medusa.

The Court goes on to explain that website owners could block certain populations on a discriminatory basis, consequently putting any individual who accesses a website, including Data Professionals, at risk.

Fortunately, the Ninth Circuit articulated in the Oracle v. Rimini case that “[T]aking data using a method prohibited by the applicable terms of use when the taking itself generally is permitted, does not violate [criminal statutes]” (Page 3).

This Oracle decision further clarifies for Data Scientists, Data Engineers, and others that they cannot be criminally prosecuted for violating a website’s Terms of Service.  As mentioned above, because Terms of Service can be broad and open to interpretation, data professionals were potentially at risk of criminal prosecution and liability if a company encouraged authorities to pursue criminal charges in addition to simply excluding the offender.  This resolution, however, still leaves businesses a remedy against bad actors through civil litigation.  Oracle v. Rimini helps clarify some of the parameters within which the law will be applied to web scraping.  The other case mentioned in this post, hiQ v. LinkedIn, with oral arguments scheduled for March of 2018, will further test the resolution in the Oracle case, in addition to previous cases that have been resolved similarly.

 

Note: When engaging in web scraping, there are a number of best practices to follow:

  • Respect the Terms of Service as much as possible.
  • Respect the website’s robots.txt.
  • Identify your bot.
  • Do not republish the data without consent.
  • Do not gather non-public or sensitive data.
  • Do not overburden the website.
  • E-mail the site’s administrator if you have a question.
  • If you have additional questions, seek advice from an attorney.
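Two of these practices, honoring robots.txt and identifying your bot, are easy to build into a scraper with Python’s standard library. The robots.txt content, URLs, and bot name below are hypothetical, used only to show the mechanics.

```python
# A minimal sketch of a polite scraper's pre-flight check using the
# standard library's robots.txt parser. All names here are hypothetical.
import urllib.robotparser

# In practice you would fetch this from https://<site>/robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

# Identify your bot and give the site operator a way to reach you.
BOT_USER_AGENT = "example-research-bot/1.0 (contact: admin@example.com)"

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def may_fetch(url):
    """Check the site's robots.txt rules before requesting a page."""
    return parser.can_fetch(BOT_USER_AGENT, url)

print(may_fetch("https://example.com/public/page.html"))   # True
print(may_fetch("https://example.com/private/data.html"))  # False

# A real scraper would also send BOT_USER_AGENT as the User-Agent header
# on each request, and sleep between requests (honoring Crawl-delay)
# to avoid overburdening the site.
```

This is a courtesy mechanism, not a legal safeguard; as the cases above show, the legal questions around scraping are decided elsewhere.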

Disclosure: I am not a lawyer and am interpreting these legal concepts and rulings from an aspiring Data Scientist’s perspective.  Should there be an error in my understanding or writing, or if you have a question, please let me know at dkent [at] Berkeley [dot] edu.  Thank you in advance.

 

Please join us for the NLP Seminar on Monday, January 22, at 4:00pm in 202 South Hall.  All are welcome!

Speaker:  Jacob Andreas (Berkeley)

Title:  Learning from Language

Abstract:

The named concepts and compositional operators present in natural language provide a rich source of information about the kinds of abstractions humans use to navigate the world. Can this information help us build better machine learning models? We’ll explore three different ways of using language to support learning: to provide structure to question answering models, fast training and improved generalization for reinforcement learners, and interpretability to general-purpose deep models.

(Slides)

PhD Student Noura Howell and Prof. Greg Niemeyer are participating in the Arts Research Center Fellowship to develop their project proposal FEELER/CRAWLER/OCTOPET.

This project explores an alternative vision of urban sensing inspired by the dérive or unstructured wandering of the Situationists. Through these walks the Situationists sought to break out of routines and experience life differently. This project asks,

Instead of surveillance or self-optimization, how can biosensing support more poetic unscripted ways of experiencing daily life?

A Feeler/Crawler/Octopet wanders the environment and invites others to wander too. Its senses/sensors comprise its own quirky perspective, not the all-seeing authority of data surveillance. Its skin includes sensing pigments that change color in response to temperature, pH, carbon monoxide, UV, and ozone. It can sense touch with conductive fabric and piezo discs and respond with electrical signals transduced into sound. With little protruding feelers, it crawls around accumulating (recording, sensing) material traces of dust, dirt, leaves, and DNA.