California’s Trailblazing Consumer Privacy Law

by Anamika Sinha

I’m sure when most of us stumble across a trivia question, the first answer that crosses our minds is Google, the ultimate search engine. But have you ever wondered what exactly Google does with the data it gathers from the billions of searches it handles per day (about 3.5 billion, to be precise)? That’s a question not even the ultimate search engine will answer perfectly for you. In fact, Alastair Mactaggart, one of the main masterminds behind the new California privacy bill, once reminisced about bumping into a Google engineer at a cocktail party, where the engineer casually mentioned that if consumers had a glimpse into what the company knew about its users, they would be shocked. This gave Mactaggart a clear incentive to advocate for privacy rights for the general public, which resulted in a new piece of legislation known as the “California Consumer Privacy Act of 2018”.

Highlights of the Law: Passed by the California legislature on June 28, 2018, the law will be enforced starting January 1, 2020. It requires businesses to disclose all categories of data they have collected about a given user over the preceding twelve months, including the names of third-party entities with whom they have shared that data. It also requires businesses to offer users a simple, intuitive way to opt out of having their information stored and shared. Businesses may still monetize aggregate data, as long as individuals cannot be identified from it. Lastly, businesses may charge a different price to customers who have opted out, but ONLY if they can prove that the difference reflects the value provided by the consumer’s data. Compared to the GDPR, which took effect in the European Union earlier this year, there are many similarities. One key difference is that while the GDPR levies huge fines on companies for non-compliance, the California law falls short in this respect and instead gives sweeping enforcement powers to the attorney general.

Reaction from Businesses: How do major businesses feel about stricter privacy guidelines? Facebook supposedly supports this law (truth be told, mainly because the alternative measure headed for the November California ballot was more onerous; the ballot proposal was polling at an 80% approval rate, and ballot measures are much harder to change than legislation). Most other businesses, in Silicon Valley and the nation in general, seem opposed to the legislation.

Tech giants like Google and Uber were frustrated that they were not consulted and that such important legislation was passed in record time without proper deliberation on its pros and cons. Even if their concerns are somewhat valid, the reality is that they will have to make massive changes to support the law and risk a high percentage of their customers opting out of data collection and sharing. This puts their entire business model at risk.

Another major argument from opponents was that the privacy issue should be addressed by the US Congress, not by individual state governments. This leads to a pressing question: how will this law, enacted in the world’s fifth-largest economy, affect the rest of the country? Due to many factors, it will likely lead a significant number of companies to apply the same rules to all their customers. Expecting businesses to filter users by IP address in order to apply the law only to Californians is quite unreasonable, which means that users worldwide will benefit.

What’s Next? Nobody imagined that a Google employee’s words at a cocktail party would have such a large domino effect on our tech world, just as we can’t predict the full extent of the people this law will impact. But it’s safe to say that businesses will use their money and influence to orchestrate changes to the law. It’s hard to imagine that tech giants will sit still and allow the law in its current form to take effect. Regardless of what happens next, when you combine the Europe-led GDPR with this California initiative, one can rest assured that the world wide web is about to undergo some major changes from the standpoint of privacy.

Venmo is Conditioning Your Expectation of Privacy

by James Beck

Add to your ever-growing list of apps and services to pay attention to: Venmo.

Venmo is an interesting service. Its core function is to quickly and conveniently facilitate transactions between you and your contacts. Need to pay your friends back for a night at the bar that got a little out of control? Splitting a check at a restaurant, but don’t want the awkward pain of asking the server to work out fractions to get the amounts correct? Or maybe you have more illicit purposes and would rather just not deal with cash.


Who carries cash anymore? Just Venmo me

Regardless of your usage, Venmo is a wildly convenient means of moving money around. Users are fairly easy to find and setting up your account with your bank account information or a credit card is fairly straightforward as well.

So what’s the catch?

Well, there doesn’t seem to be one – for now.

For a long time there has been an urban myth of sorts that Venmo makes money by micro-investing the cash that sits in its service while users wait for it to be transferred to their bank accounts (users must specifically request that Venmo transfer their balance, so significant sums of money can be stuck in Venmo-limbo for long durations). In reality, Venmo wasn’t generating much income for itself beyond the usual and expected credit card transaction fees.

The catch is that Venmo is following a model that has been paved out for it by many services before it. Attract a ludicrous volume of users, generate information, and figure out later how exactly to capitalize on those users and their information.

Venmo has now begun to partner with businesses to allow users to pay for real goods and services directly through the application rather than just serving as a means to pay your friends back for that late night burrito at that cash only place that you always forget is cash only. Venmo plans to charge a transaction fee to these businesses in exchange for the convenience of their service – the thought being that users have become so accustomed to the convenience of Venmo between their peers that they will begin to expect that same payment convenience from businesses. This also feels fairly reasonable.

However, Venmo has another facet to its service that is worth stopping to consider.

In a way that feels oddly tacked on, Venmo also serves as a social media site of sorts. Transactions show up by default in a “news feed”-style interface along with all of your contacts’ transactions. The amounts are hidden, but the participants and the user-entered descriptions of the transactions are visible. What you’re left with is a running history of who is paying whom, and for what.


Venmo’s Social Media Feed

It’s a strange and mostly benign feature. Transactions can be set to private, and even if you don’t keep things private you still have the autonomy to choose the description of your transaction and keep it fairly innocuous.

What we should be concerned with though is how the addition of this social media dimension to a service that is just supposed to serve as a tool for monetary transactions is conditioning users for the future of Venmo. By incorporating a social media feed as the default behavior of the application, Venmo is slowly normalizing sharing our transactions publicly. This is not something that has traditionally been seen as “normal”.

Our credit card purchases have historically been seen as very private. However, now that we’ve normalized sharing payments between individuals, will there be any protest when we start sharing our transactions with businesses by default? Will there be protest when Venmo starts using our past transactions to serve ads to us and our contacts? Or will we shrug our shoulders because the new de facto business model is to attract users to a free or wildly inexpensive tool of convenience and then eventually introduce targeted advertisements based on our behavior with that service?

I fear it’s the latter and you should too – we’ve normalized sharing so many details of our lives and in doing so have gradually eroded our expectations of privacy. The way you move your money around is about to become the next pool of big data to analyze and the only fanfare to mark the occasion will be an update to a privacy policy that we’ll all forget we agreed to.

Drones: Privacy up in the air

by Elly Rath

Drones, or unmanned aerial vehicles (UAVs), are flying devices capable of collecting a vast array of information on a daily basis if configured and equipped correctly. The basic function of most drones is aerial videography and image capture, but images are not the only data drones can gather. Equipped with the appropriate sensor, a drone can capture light, speed, sound, chemical composition and a myriad of other information. Drones are now ubiquitous and, unlike in earlier days, no longer limited to military or police surveillance. They can be purchased online or at a local superstore as easily as any toy.

There are no laws controlling the data collection or restricting its usage. Any individual or private company with a properly equipped drone can collect and process huge amounts of potentially private information in a short time.

The privacy concerns surrounding drones affect both the government and civilians. On November 29, 2017, the NY Times reported that DJI, the market leader in drones, was fighting a claim by a United States government office that its commercial drones and software may be sending sensitive information about American infrastructure back to China. In this blog I mostly focus on the perspective of civilian privacy.

Drones are affordable, and the learning curve to operate one is mild; even a novice can master it in a few days. While there are laws to protect citizens against people stalking or spying on them in their homes, there are no federal laws that would protect individuals from being spied on specifically by a drone. A drone can fly overhead unnoticed, peer directly into someone’s house or record activities on private property from the sky. Drone privacy regulations first surfaced in 2012, when the Federal Aviation Administration (FAA) was tasked with integrating drones and UAVs into US airspace. The FAA, however, failed to consider privacy aspects. The Electronic Privacy Information Center (EPIC), a privacy and civil liberties nonprofit, along with 100 organizations, experts and members of the public, has filed multiple petitions against the FAA since 2014. One of those petitions is still pending.

It is normal for citizens to expect a certain amount of privacy on their own property. In recent years, there have been multiple incidents of citizens suspecting that they were being “watched” by someone operating a drone above their property. Sometimes it is simply a land survey company, but it is still unsolicited. In one instance, a man named William Merideth in Hillview, Kentucky shot down a drone hovering over his sixteen-year-old daughter, who was sunbathing in the garden. He was arrested for wanton endangerment and criminal mischief, but a Kentucky judge dismissed all charges against him, stating that the drone was an invasion of his privacy.

With the multitude of images drones can now capture, companies have developed high-tech software that does the data analysis at the click of a button. Firms that once had difficulty making sense of the data now have algorithms to help them, so the risks to privacy only increase with further technological advancement.

The first ruling regarding personal airspace above one’s property came in 1946, when the Supreme Court held in United States v. Causby that a person’s property extends to 83 feet up in the air. The federal government prohibits the unauthorized use of drones above national parks, military bases, airports and federal buildings. Civilian drones must fly at or below 400 feet and under a maximum speed of 100 miles per hour. But that is about all we have as far as federal law. Drone laws are now mostly covered in state law and vary from state to state; the FAA has issued a fact sheet for state and local lawmakers to help them create non-federal drone laws. Many of the state laws pertaining to drones relate to interfering with emergency measures, filming someone without their permission, or filming accident sites and crime scenes. Penalties range from fines to jail time, but again, such laws can be very difficult to enforce.

The lack of clear, standardized drone privacy laws is glaring when set against the more than 1 million FAA-registered drones, a count that excludes lightweight recreational drones. In 2017, Senator Edward Markey introduced a drone privacy bill that aims to create privacy protections and data-reduction requirements for the information a drone collects, disclosure provisions for when data collection is happening, and warrant requirements for law enforcement.

On one hand, the drone advocacy group Small UAV Coalition, which represents companies like Google’s parent Alphabet and Amazon, wants lax laws; on the other, citizens want well-defined boundaries. Hopefully a mutual middle ground will be found in the near future, offering increased innovation for businesses while retaining citizens’ rights.

Smart Home Devices: Consent to Private Matters?

by Adhaar Gupta

June 30, 2018

A garage linked to a camera that scans your license plate and automatically opens the door as you drive toward it. We all love easing our lives, but can we be sure this camera is not storing our data and using it for purposes we don’t know of? Will this data be analyzed to figure out when you are likely to be home, and then used to schedule deliveries based on your availability? What if it is shared with advertisers who now know the make and model of your car, running algorithms to tie your car details to your household income or family profile in order to offer you products (car accessories, a new car loan, etc.) that you were not even thinking about? Are systems determining what an individual needs rather than the individual himself?

Research from Parks Associates finds purchase intentions for smart home devices among U.S. broadband households have increased by 66% year over year. The analysis firm Juniper Research projects that smart speakers will be installed in 55 percent of U.S. households within the next four years, and that total advertising spend on voice will reach $19 billion in the same period. These research firms confirm that the use of smart home devices is on the rise, and so are the privacy concerns, particularly about eavesdropping. Recently a family reported that their smart home device recorded a conversation at home and sent it to a random contact in their list. Smart devices, especially home technology, are rapidly integrating into our personal lives, and it is not difficult to imagine the consequences for privacy.

We have smart speakers that record conversations, smart thermostats with motion sensors that track the whereabouts of each household member, smart security systems that recognize family members and enable keyless entry, smart health-tracking devices, and smart refrigerators that analyze our grocery lists and spending habits, among much more. We are surrounded by devices that invade our private spaces.

If the existing spectrum of smart devices weren’t enough to invade our privacy, the future has much more in store for us. Recent headlines from a leading tech giant offered a sneak peek into the future of voice, in which an Artificial Intelligence (AI) system was used to book a hair appointment and a restaurant reservation. The conversation between human and computer sounded natural. Systems talking and engaging in complex conversations without human intervention is exciting and concerning at the same time. Should we be made aware that we are talking to a bot on the other end of the phone? Should we be concerned that someone is recording the phone conversation with a bot? Such recordings can reveal a lot more about a person and his or her surroundings than he or she would like to share. All these advances will have massive ramifications for the future of advertising, and we need to be more concerned about the consequences.

The recent Facebook-Cambridge Analytica case of harvesting data to meet one’s own needs, and the new data rules (GDPR) implemented in the EU, have put a spotlight on data privacy. As users, we should be aware that getting a smart device is like inviting an outsider into our private space.


“It’s All About Me”

by Prashant Sahay

To my grandfather, privacy was a pretty simple concept. He believed that what we did in public spaces was public; everything else was private. If we played loud music at home, that was a private matter, but I had to turn the volume down when driving by a busy street corner. To his way of thinking, an individual’s personal information, likes and dislikes, political affiliation, health issues, and friend lists, unless he or she advertised them in public, were all private matters. This was a simple-minded yet surprisingly useful conception of privacy.

In contrast, since May 25th when GDPR went into effect, we have been inundated and puzzled by the complexity of the privacy policies of websites we did not even realize collected so much personal information. These policies seem designed to comply with governing laws, including GDPR in Europe as well as digital privacy statutes and regulations elsewhere. They argue that you can exercise your right to be let alone by withholding consent and opting out of the service. You are assured that there is no potential for substantial injury or harm because the website will safeguard your information. You are also assured that you have a right to see and amend your information in keeping with Fair Information Practice Principles. With GDPR, you also have a right to have your information deleted if you so desire.

The policies I have read check all of the legal boxes: governing laws, tenets of sound privacy practices and standards. And yet they are deeply unsatisfying, as they do not seem to have a core theme. Dan Solove rightly likens the struggles of privacy as a concept to those of Jorge Luis Borges, the Argentine writer who lived vicariously through the characters he created but who did not have a core of his own. Privacy, Solove observes, lacks a core and is a concept in disarray.

Solove sidestepped the challenge of defining what privacy is by addressing the ways in which it can be lost; he proposed his conception of privacy in the context of the harms that materialize when privacy is lost. This raises an important question, though: if losing something is undesirable, shouldn’t what is lost be considered an asset?

I believe that conceptualizing personal information as an asset that belongs to the individual provides a core structure over which other privacy principles can be added. As a society, we are pretty good at developing, owning, renting, managing and accounting for assets. We can apply our body of knowledge and standards of practice related to private property rights and asset management to managing personal and private information. We can de-center companies and websites from the role of safeguarding privacy and place the individual at the center. In this conception, information about an individual is an asset that is created by society but owned by the individual. The individual will authorize the use of personal information to be selective, targeted, and temporary. For example, the individual will allow a bank to use her information to determine whether she is creditworthy when she applies for credit, and disallow that access when she closes the account. Of course, implementing selective, targeted and temporary disclosure of information would require greater interoperability between computer systems. Fortunately, there are new technologies on the horizon, such as blockchain and Enigma, that facilitate such interoperability.
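The "selective, targeted, and temporary" authorization described above can be sketched in code. This is a minimal, hypothetical illustration, not any real system's API; the class, field names and duration are all invented:

```python
from datetime import datetime, timedelta

class ConsentGrant:
    """A per-use grant the individual issues, scopes, and can revoke."""

    def __init__(self, owner, grantee, fields, duration_days):
        self.owner = owner                 # individual who owns the data
        self.grantee = grantee             # party granted access (e.g. a bank)
        self.fields = set(fields)          # selective: only named attributes
        self.expires = datetime.now() + timedelta(days=duration_days)  # temporary
        self.revoked = False

    def revoke(self):
        # e.g. invoked when the individual closes her account
        self.revoked = True

    def allows(self, grantee, field):
        # targeted: only the named grantee, only the named field, only until expiry
        return (not self.revoked
                and grantee == self.grantee
                and field in self.fields
                and datetime.now() < self.expires)

grant = ConsentGrant("alice", "bank", ["income", "credit_history"], duration_days=30)
print(grant.allows("bank", "income"))       # True while the grant is active
print(grant.allows("insurer", "income"))    # False: wrong grantee
grant.revoke()
print(grant.allows("bank", "income"))       # False once revoked
```

In practice the hard part is enforcement across organizations, which is where the interoperability technologies mentioned above would have to come in.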

It is time we gave control over personal information back to individuals and let them decide what information they want shared, with whom, and for how long. The individual must also take responsibility for her actions. If you do something stupid in a public space or forum, do not expect people to forget it; you do not have a right to wipe out memories.

Why wearable technology could actually hurt the earning power of professional athletes

by Nick McCormack

According to Statista, the number of connected wearable devices is expected to reach over 830 million by the year 2020. It is already estimated that about 25% of Americans own a wearable device, but for most individuals, the data collected by these devices is strictly for private consumption. In instances where it is not, the data is pseudonymized and aggregated with other people who own the same device. Imagine, however, that the company you worked for was monitoring your device and tracking your every move. They knew the number of hours you slept and the quality of that sleep. They knew what you were eating, and what your body was doing when your performance slowed. Using this information, they then gauged your aptitude as an employee, how well you were tracking to your potential, and the brightness of your future. Your next salary could potentially be based on this information, and if it didn’t look great, you may not even receive an offer to continue on as an employee.
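The pseudonymization and aggregation mentioned above can be illustrated with a short sketch. This is a hypothetical example, assuming a vendor that hashes user IDs with a secret salt and releases only summary statistics; the user names, step counts and salt are all invented:

```python
import hashlib
import statistics

SALT = b"rotating-secret-salt"  # hypothetical secret held by the vendor

def pseudonymize(user_id):
    # One-way hash so shared records can't be trivially mapped back to a user
    # by anyone who lacks the salt.
    return hashlib.sha256(SALT + user_id.encode()).hexdigest()[:12]

raw = {"alice": 9200, "bob": 4100, "carol": 12400}   # daily step counts

# What leaves the vendor: pseudonymous records and one aggregate figure.
shared = {pseudonymize(u): steps for u, steps in raw.items()}
aggregate = statistics.mean(raw.values())
print(round(aggregate, 1))   # only the cohort average, not who walked what
```

The employer scenario in the paragraph above is precisely the case where this layer is stripped away: the team knows exactly whose wrist the device is on.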

While this may seem like a distant dystopia, it is the not-so-distant reality for many professional athletes. While much of the focus on wearable technology in sports has been on the positives (e.g. injury prevention, performance optimization, and preventative medicine), more and more athletes and players’ associations (essentially unions for athletes in a particular league) are beginning to push back against their use, at least when they are applied inappropriately. In an article in Sports Illustrated, bioethicists Katrina Karkazis and Jennifer Fishman warn that the devices “come with the risk of compromising players’ privacy and autonomy” and could “even cut short athletic careers.”

One of the biggest concerns, voiced by Michele Roberts, executive director of the NBA’s players’ association, is that the data could be used against players in contract negotiations. Teams may notice something in the wearable data that raises a red flag, and an injury that has yet to materialize could prevent a player from receiving his next contract. This is an even bigger concern in a league like the NFL, where contracts are not guaranteed. Players could be cut based on their running motion, ability to recover, or even their sleeping habits.

While consent is generally given at the individual level, a bigger concern is that players may not even have the autonomy to decide for themselves whether or not to wear these devices. That decision is likely to be negotiated between player union representatives and league owners in the Collective Bargaining Agreement (CBA), the document that essentially sets forth the laws of the league and is renegotiated every few years. The players’ association may trade higher salaries for less say in what can and cannot be collected, which could inadvertently hurt the earning power of its athletes.

This raises another big question: who exactly owns the data? Is it the player, the team, or the team’s medical staff? Conflicting goals and motivations could result in a major standoff, and could require legislation on whether wearable data is medical data that binds doctors to doctor-patient confidentiality laws. The implications of such a decision could extend far beyond sports.

Overall, this is something to definitely keep an eye on in upcoming CBA negotiations. As big data becomes a larger and larger part of society, issues like this are likely to spread from isolated industries like professional sports into our daily lives. Hopefully all of the ramifications are thought through thoroughly and our privacy is protected going forward.

Universities: They’re Learning From You Too

by Keri Wheatley

Universities want you to succeed. Did you know that? Besides the altruistic reasons, universities are also incentivized to make sure you do well. School funding is largely doled out based on performance. In Florida, state performance funding is determined by ranking the 11 eligible universities on 10 criteria, such as their six-year graduation rate, salaries of recent graduates, retention of students and student costs. The top ranked university nets the most funding while the 3 lowest ranked universities don’t get any at all. To put it into perspective, Florida A&M University earned $11.5 million for the 2016-2017 school year, but then lost that funding when it finished 10th on the list the next year. Policies like these, coupled with the trend of decreasing enrollments, have compelled universities to start thinking about ways to improve their numbers.


Data analytics is a rapidly growing field in the higher education industry. Universities are no longer using data just for record keeping; they are also using it to identify who will be successful and who needs more help. What does this mean for you? Before you enroll and while you are there, the university’s basic systems collect thousands of data points about you: high school background, age, hometown, ethnicity, communications with your professors, campus housing, number of gym visits, etc. This is normal; if a university didn’t keep track of these things, it couldn’t run the basic functions of the organization. It is when a university decides to combine these disparate data systems into one dataset that eyebrows should be raised. The mosaic effect happens when individual tidbits of information get pieced together to form a picture that wasn’t apparent from the individual pieces. And having that knowledge is powerful.
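The mosaic effect is easy to sketch: each table below stands in for a separate campus system, and the student ID, field names and values are invented for illustration. None of the systems alone is remarkable, but the joined record is:

```python
# Three routine administrative tables, keyed by the same student ID.
registrar  = {"s001": {"major": "Biology", "gpa": 3.1}}
housing    = {"s001": {"dorm": "West Hall"}}
rec_center = {"s001": {"gym_visits_per_week": 4}}

# Joining them on the shared key builds a profile no single office held.
profile = {}
for table in (registrar, housing, rec_center):
    for student_id, fields in table.items():
        profile.setdefault(student_id, {}).update(fields)

print(profile["s001"])
# one record now spans academics, residence, and daily habits
```

The technical step is trivial; the privacy question is entirely about whether the join should happen at all.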

But they’re using their powers for good, right? There are a lot of questions that should be posed when universities begin building such datasets.

What about security? Every day, data is becoming more vulnerable. Organizations, especially those regulated and funded by government agencies, just can’t keep up with new threats. When universities begin aggregating and sharing this data internally, they open themselves and their students to new risks. Do the benefits for the students outweigh the potential harms? This can only be answered on a case-by-case basis, since the security practices and uses of data differ vastly between universities.

Who has access? FERPA, one of the nation’s strictest privacy protection laws, was written to protect students’ personal information. The law restricts universities from selling student data to other organizations, and also dictates that universities create policies restricting access to only those who need it. In practice, however, these policies are applied ambiguously. A professor shouldn’t have access to a student’s grades in other courses, but your history professor wants to know why you wrote such a bad essay in his class, so he has a use case to look up your English I grade. Unless a university has stringent data access policies, this dataset could be shared with people at the university who don’t need access to it.

How do they use the data? There are many ways. Once a university collects the data of all its students, it gets a bird’s-eye view, and institutional researchers have the ability to answer almost any question. Which students will drop out next semester? Do students who attend school events do better than students who don’t? How about computer lab visits? How does the subject line affect email open rates? These are all investigations I have done. Universities tend to ask more questions than they can act on, which leads to an unintentional imbalance where the university learns more about its students than is necessary to make decisions.
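A question like "which students will drop out next semester?" is typically answered with a scoring model over exactly the kind of combined dataset described above. The sketch below is purely illustrative; the features, weights, and threshold are invented, not any university's actual model:

```python
# Toy risk score: each hypothetical signal adds a hand-picked weight.
def dropout_risk(student):
    score = 0.0
    if student["gpa"] < 2.0:
        score += 0.4     # struggling academically
    if student["credits_attempted"] < 12:
        score += 0.2     # part-time load
    if student["gym_visits_per_week"] == 0:
        score += 0.1     # disengaged from campus life
    if student["email_opens"] < 5:
        score += 0.3     # not reading university communications
    return score

at_risk = dropout_risk({"gpa": 1.8, "credits_attempted": 9,
                        "gym_visits_per_week": 0, "email_opens": 2})
print(at_risk >= 0.5)    # True: flagged for an advising "nudge"
```

Note that two of the four invented signals (gym visits, email opens) have nothing to do with academics, which is exactly the imbalance the paragraph above describes.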

Universities are asking a lot of questions and finding the answers through the data. In doing so, they are learning more about their students than their students are aware of. How would a student feel if he knew someone was monitoring his gym visits and predicting what grades he will get? What if his academic advisor knew this piece of information about him? How would the student feel when he starts getting subtle nudges to go to the gym? These scenarios are a short step from becoming reality.

In the end, you are purchasing a product from universities: your degree. Shouldn’t they have a right to analyze your actions and make sure you are getting the best product? At what point do we consider it an invasion of privacy versus “product development”?


The Cost of Two $1 Bike Rides

by Alex Lau

In February 2018, bike sharing was finally introduced to denizens of San Diego, making their presence known overnight, and without much forewarning, as multicolored bicycles seemed to sprout on public and private land all across the city. Within weeks of their arrival, multitudes of people could be seen taking advantage of the flexibility these pick-up-and-go bikes provided, and most people liked the idea of offering alternatives to cars for getting around town. Not as widely discussed was the large amount of information these companies gather through payment information, logging of bike pick-up and drop-off locations, and potentially a vast store of other less obvious metadata.

Recently my wife and I grabbed two orange bikes standing on the grass just off the sidewalk, deciding to ride to the nearby UCSD campus. Each of us paired a payment method to the Spin app, and off we went. We hit a snag while pedaling up a one-mile incline that is normally imperceptible behind the wheel of a car but forced us to pedal at a moderate jogging pace in the bikes’ first gears. We finally got off the bikes short of campus, grateful that the service allowed us to drop off a bike as easily as we had picked it up. After walking the bikes over to a wide part of the sidewalk and securing the wheels with the built-in locking mechanism, we began to walk the rest of the way. Maybe we wouldn’t be competing in the Tour de France, but we got in a little exercise, had some fun riding bikes together, and tried out a new bike app for very little money.

Spin bike. (By SounderBruce - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=61416390)

Within a minute of leaving the bikes, we both received text messages and e-mails informing us that we did not leave the bikes in an approved designated area, and that our profiles may suffer hits if the bikes were not parked well. While trying to understand what constituted a designated area in a neighborhood already littered with bike shares, we began wondering to one another what information we had just handed over to Spin and what kind of profile the company was building on us.

There have been articles in the press about the potential dangers of inadvertent data leakage from ride-sharing apps, describing a hypothetical in which a high-level executive of a well-known public company uses a ride share to visit the doctor, or, more revealing still, an outpatient cancer therapy center. This type of information could be accidentally or even purposely exposed, invading the rider’s privacy and perhaps hurting the company’s stock price. While I doubt my bike app is angling to embarrass me in the tabloids one day, some of the same data that can leak out of ride-sharing habits extends to the simple bike app.


Note: You cannot drop off anywhere.
(https://fifi.myportfolio.com/spin-branding)

In the case of our quick ride, one could begin to imagine how Spin might start to learn personal details about my wife and me, both individually and as two users who share some sort of connection. While we each paid through Apple Pay, keeping some of the payment details private from Spin, we had to provide phone numbers and e-mail addresses. Even without a street address, repeat uses of the app may give Spin a picture of which neighborhood we live in. When we read through Spin’s privacy policy, we found most of it to be in the realm one expects: if you use our service, we have access to your info. A few other items were more concerning, including Spin reserving the right to use our personal information to pull a driving record and to assess our creditworthiness. We had assumed there would be some mechanism ensuring a rider cannot abuse a bike or cause an accident without being exposed to some liability, but neither of us thought that might include pulling a driving record. Other areas of the privacy policy state that a user’s private information is one of Spin’s business assets and may be disclosed to a third party in the event of a corporate restructuring or bankruptcy.

Although I am not privy to how Spin uses its user data, I can understand the business reality of protecting the company’s assets and satisfying insurance obligations when almost anyone with a smartphone and a credit card can pick up a bike with no human intervention. But even though the policy may state what the company can do with personal data, I would want to err towards the option of least intrusion, or least potential harm. I find it hard to justify using a user’s information to run a detailed background check on their credit history and driving record merely to build a user profile, though if a user is involved in an incident, such actions may be required. (If the incident is severe enough to involve legal action, privacy may not be possible or guaranteed in any case.) I do worry that the lines between which actions are viewed as ethically right or wrong in relation to user data may shift, especially if the company were facing financial hardship.

While the privacy policy opened my eyes to what our cheap novelty really cost us, I would be naive not to assume that every other app and non-app service I use daily has similar wording. It can be worryingly easy to handwave away such concerns as the price of participation in these services. Instead, as data professionals, we need to use our expertise to examine and understand the potential benefits and pitfalls of how organizations use our data, and lend our voices where needed to minimize potential areas for abuse.

Spin Privacy Policy: https://www.spin.pm/privacy

Thick Data: ethnography trumps data science

Thick Data: ethnography trumps data science by Michael Diamond

“It is difficult / to get the news from poems / yet men die miserably every day / for lack / of what is found there.” William Carlos Williams

As business continues to pivot on its data-obsessed axis, with a fixation on the concrete and measurable, we are in danger of missing true meaning and insight from what surrounds us. The field of ethnography, established long before the bright sparks of data science were kindled, provides some language to enlighten our path, guide us through the thickets of information, and situate the analytics with new perspectives.

Drawing on his fieldwork in Morocco, the American anthropologist Clifford Geertz introduced the world to “thick description” in the 1970s in the context of ethnography, borrowing the term from the British philosopher Gilbert Ryle, who used it to unpack the workings of language and thought. Ethnographers, who study human culture and society, must hack a path to insight as if hiking through an overgrown jungle. Thick descriptions are, in Geertz’s words, the “multiplicity of complex conceptual structures, many of them superimposed upon or knotted into one another, which are at once strange, irregular, and inexplicit.”

For today’s ethnographers, and the business consultants who champion these methodologies, thick description has morphed into thick data. Consultant Christian Madsbjerg contrasts this with the “thin data” that consumes the work of data scientists, which he portrays as simply “the clicks and choices and likes that characterize the reductionist versions of ourselves.” What thin data lacks is context: the rich overlay of impressions, feelings, cultural associations, family and tribal affinities, and societal shifts; the less measurable or unseen aspects that frame and inform our orientation towards the world we experience.

Businesses, or at least the products they launch, die miserably every day for the lack of what is found in this thick data. The story of Ford’s introduction of the Edsel is instructive.

In 1945 Henry Ford II took over the auto manufacturer his grandfather had founded at the turn of the twentieth century. By the 1930s and 1940s Ford’s growth had slowed and its business reputation was waning. Henry Ford II immediately set out to professionalize the management team with modern scientific principles of organization. Bringing together the finest minds from the war effort and from rivals like General Motors, Ford hired executives with Harvard MBAs and recruited the “whiz kids” from Statistical Control, a management-science operation within the Army Air Forces. The senior management team wrestled over strategy and organization and pored over the data, commissioning multiple research studies and building elaborate demand-forecasting models. They ultimately concluded that what Ford needed was a new line of cars pitched to the upwardly mobile young professional family. The data and analysis identified a gap in the product portfolio, an area where Ford under-served market demand and where rival General Motors showed growing strength.

The much-heralded “Edsel” launched on September 4, 1957, with a commitment from over 1,000 newly established dealerships. Within weeks it was clear that the public was turning against the product, and the brand never gained traction with its target market. Described by one critic as “fabulously overpriced jukeboxes,” the Edsel came to represent everything that was wrong with the flash and excess of Detroit. Within two years Ford had abandoned the business, and Edsel had become a watchword for a failed and misguided project. With over $250 million invested and no sign of the projected 200,000 unit sales in sight, the last Edsel rolled off the production line on November 20, 1959.

Ford missed the cultural moment.
Looking back, Ford’s statisticians and planners missed a series of cultural moments, the thick data hidden from their analysis and models. First, there was an emerging sense of fiscal responsibility: car-buyers increasingly saw vehicles coming out of Detroit as gas-guzzling dinosaurs belonging to an earlier era, an idea successfully exploited by one of the best-selling cars that season, American Motors’ more fuel-efficient “Rambler.” Second, there was a deepening sense that America was falling behind the rest of the world culturally and scientifically, that participating in the American Dream was not quite as glamorous as once believed, a feeling heightened by the deep psychic impact felt across America when the Soviet Sputnik went into orbit in October 1957. Third, there was the beginning of a consumer movement against the product-oriented “build it and they will buy” approach to marketing. That concern was captured in the same year the Edsel launched by the publication of Vance Packard’s _The Hidden Persuaders_, which exposed the manipulation and psychological tactics used by big business and their Madison Avenue advertising agencies; the same approach to marketing was roundly and succinctly critiqued a few years later in Theodore Levitt’s seminal 1960 essay “Marketing Myopia.”

Lessons from history.
The Edsel may be one of the best-known business failures before the age of Coca-Cola’s New Coke or McDonald’s Arch Deluxe, but it is an interesting and salient case because cars are a uniquely American form of self-expression: they announce who we are, how we see ourselves, and what tribe we belong to. Indeed, automakers have been described as the “grammarians of a non-verbal language.”

But these lessons about fetishizing the things that can be measured, ignoring the limits to how well we can quantify key drivers, and mistaking strong measures for true indicators of what matters most were to have much greater consequences than an abandoned brand. Sadly, they were lessons still being learned by America as the country entered and prosecuted the War in Vietnam a decade later. Robert McNamara, one of the “whiz kids” hired by Ford who rose to be president of the company, was by then leading America’s military strategy as Secretary of Defense. His dedication to the “domino theory,” which argued that if one country came under the influence of Communism, all of the surrounding countries would soon follow suit, was the justification used to escalate and prolong one of America’s most misguided foreign interventions. And his obsession with “body count” as the key metric of the war led many to exaggerate and mislead the public.

While it is simplistic to reduce the tragedy of the War in Vietnam to one man or one concept, more than a million Vietnamese, civilian and military, died in that war and nearly 60,000 soldiers from the US lost their lives.

McNamara failed to grapple with the “thick data” of the situation because it was hard to quantify. He refused to entertain any hypothesis about the conduct of the war that differed from his own, as doing so would have meant pursuing a much deeper understanding of, and empathy for, the leaders and people of Southeast Asia. Late in his life, McNamara came to understand and champion “empathy” in foreign affairs: “We must try to put ourselves inside their skin and look at us through their eyes, just to understand the thoughts that lie behind their decisions and their actions.”

Algorithmic Misclassification – the (Pretty) Good, the Bad, and the Ugly

Algorithmic Misclassification – the (Pretty) Good, the Bad, and the Ugly by Arnobio Morelix

Every day, your identity and your behavior are algorithmically classified countless times. Your credit card transaction is labeled “fraudulent” or not. Political campaigns decide whether you are a “likely voter” for their candidate. You constantly claim, and are judged on, your identity of “not a robot” through captchas. Add to this the classification of your emails, the face recognition in your phone, and the targeted ads you get, and it is easy to imagine hundreds of such classification instances per day.

For the most part, these classifications are convenient and pretty good for you and for the organizations running them. So much so that we can almost forget they exist, unless they go obviously wrong. I encounter plenty of examples of these predictions working poorly. I am a Latino living in the U.S., and I often get ads in Spanish. That would be pretty good targeting, except that I am a Brazilian Latino, and my native language is Portuguese, not Spanish.

Needless to say, this misclassification causes no real harm. My online behavior might look similar enough to that of a native Spanish speaker living in the U.S., and users like me getting mis-targeted ads may be no more than a rounding error. Although it is in no one’s interest that I get these ads (I am wasting my time, and the company is wasting money), the targeting is probably good enough.

This “good enough” mindset is at the heart of a lot of prediction applications in data science. As a field, we constantly put people in boxes to make decisions about them, even though we inevitably know predictions will not be perfect. “Pretty good” is fine most of the time — it certainly is for ad targeting.

But these automatic classifications can go from good to bad to ugly fast, whether because of the scale of deployment or because of tainted data. As we move into higher-stakes fields beyond those for which the techniques have arguably been perfected, like social media and online ads, we get into problems.

Take psychometric tests, for example. Companies are increasingly using them to weed out candidates; usage is growing, and 8 of the top 10 private employers in the U.S. use related pre-hire assessments. Some of these companies report good results, with higher performance and lower turnover. [1] The problem is, these tests can be pretty good but far from great. IQ tests, a popular component of psychometric assessments, are a poor predictor of cognitive performance across many different tasks, though they are certainly correlated with performance on some of them. [2]

When a single company weeds out a candidate who would otherwise perform well, that may not be a big problem by itself. But it can be a big problem when the tests are used at scale and a job seeker is consistently excluded from jobs they would perform well in. And while the use of these tests by a single private actor may well be justified on hiring-efficiency grounds, it should give us pause to see them used at scale for both private and public decision making (e.g., testing students).
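A toy calculation makes the scale point concrete. The numbers here are entirely illustrative assumptions (a 20% false-rejection rate and 10 employers, not figures from the studies cited): if every employer runs its own independent screen, a qualified candidate falsely rejected by one still gets through somewhere else, but if all employers share the same test, a single false negative follows the candidate everywhere.

```python
# Illustrative only: assumed numbers, not drawn from the cited studies.
# A qualified candidate faces a screen with a 20% false-rejection rate
# at each of 10 employers.

false_reject = 0.20
employers = 10

# Case 1: each employer uses an independent, uncorrelated screen.
# The candidate is shut out only if every screen errs independently.
p_shut_out_independent = false_reject ** employers

# Case 2: every employer uses the same test, so the error is perfectly
# correlated: one false negative excludes the candidate everywhere.
p_shut_out_shared = false_reject

print(f"Independent screens: {p_shut_out_independent:.8f}")  # ~0.00000010
print(f"One shared test:     {p_shut_out_shared:.2f}")       # 0.20
```

Under these assumed numbers, a shared test makes blanket exclusion roughly two million times more likely than independent screens would, which is the sense in which "pretty good" errors become consistent exclusion at scale.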

Problems with “pretty good” classifications also arise from blind spots in the prediction, as well as from tainted data. Somali markets in Seattle were barred by the federal government from accepting food stamps because many of their transactions looked fraudulent, with infrequent, large-dollar purchases driven by the fact that many families in the community they serve shopped only once a month, often sharing a car to do so (the USDA later reversed the decision). [3] [4] African American voters in Florida were disproportionately disenfranchised because their names were more often automatically matched to felons’ names, African Americans having a disproportionate share of common last names (a legacy of original names being stripped under slavery). [5] Also in Florida, black crime defendants were more likely to be algorithmically classified as “high risk,” and among defendants who did not reoffend, blacks were over twice as likely as whites to have been labelled risky. [6]
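The disparity in that last example corresponds to a standard metric: the false positive rate per group, i.e., among people who did not reoffend, how often each group was labelled high risk. A minimal sketch of the computation, using a handful of fabricated toy records chosen only to mirror the shape of the reported finding (these are not the actual Florida data):

```python
# Illustrative sketch with fabricated toy labels -- not the real data.
# Each record: (group, labelled_high_risk, reoffended)
records = [
    ("A", True,  False), ("A", True,  False), ("A", True,  True),
    ("A", False, False), ("A", False, False),
    ("B", True,  False), ("B", True,  True),
    ("B", False, False), ("B", False, False), ("B", False, False),
]

def false_positive_rate(records, group):
    """Among group members who did NOT reoffend, the share labelled high risk."""
    did_not_reoffend = [r for r in records if r[0] == group and not r[2]]
    flagged = [r for r in did_not_reoffend if r[1]]
    return len(flagged) / len(did_not_reoffend)

for g in ("A", "B"):
    print(g, false_positive_rate(records, g))  # A: 0.5, B: 0.25
```

In this toy data, group A’s false positive rate is double group B’s even though both groups contain reoffenders and non-reoffenders, which is exactly the kind of gap that overall accuracy numbers can hide.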

In none of these cases is there necessarily evidence of malicious intent. The results can be explained by a mix of “pretty good” predictions and data reflecting previous patterns of discrimination, even if the people designing and applying the algorithms had no intention to discriminate.

While the examples mentioned here span a broad range of technical sophistication, there is no strong reason to believe the most sophisticated techniques are free of these problems. Even the newest deep learning techniques excel at identifying relatively superficial correlations, not deep patterns or causal paths, as entrepreneur and NYU professor Gary Marcus explains in his January 2018 paper “Deep Learning: A Critical Appraisal.” [7]

The key problem with the explosion in algorithmic classification is that we are invariably designing life around a slew of “pretty good” algorithms. “Pretty good” may be a great outcome for ad targeting. But when we deploy such algorithms at scale on applications ranging from voter-roll exclusions to hiring to loan decisions, the final outcome may well be disastrous.

References

[1] Weber, Lauren. “Today’s Personality Tests Raise the Bar for Job Seekers.” Wall Street Journal. https://www.wsj.com/articles/a-personality-test-could-stand-in-the-way-of-your-next-job-1429065001

[2] Hampshire, Adam et al. “Fractionating Human Intelligence.” https://www.cell.com/neuron/fulltext/S0896-6273(12)00584-3

[3] Davila, Florangela. “USDA disqualifies three Somalian markets from accepting federal food stamps.” Seattle Times. http://community.seattletimes.nwsource.com/archive/?date=20020410&slug=somalis10m

[4] Parvas, D. “USDA reverses itself, to Somali grocers’ relief.” Seattle Post-Intelligencer. https://www.seattlepi.com/news/article/USDA-reverses-itself-to-Somali-grocers-relief-1091449.php

[5] Stuart, Guy. “Databases, Felons, and Voting: Errors and Bias in the Florida Felons Exclusion List in the 2000 Presidential Elections.” Harvard University, Faculty Research Working Papers Series.

[6] Corbett-Davies, Sam et al. “Algorithmic decision making and the cost of fairness.” https://arxiv.org/abs/1701.08230

[7] Marcus, Gary. “Deep Learning: A Critical Appraisal.” https://arxiv.org/abs/1801.00631