Archive for February, 2018

Modern gun control began with the Gun Control Act of 1968, passed after the assassinations of John F. Kennedy, Martin Luther King Jr., and Bobby Kennedy. It prohibited mail-order gun purchases, required that all new guns be marked with a serial number, and created the Federal Firearms License (FFL) system, which manages the licenses businesses need in order to sell guns. The law was strengthened in 1993 by the Brady Handgun Violence Prevention Act, which established a set of criteria that disqualify a person from legally purchasing a gun. It also created the National Instant Criminal Background Check System (NICS), which is maintained by the FBI and used by FFL-licensed businesses to quickly check whether a person matches any of the disqualifying criteria.

Although NICS was created with good intent, and without any explicit racist assumptions, the NICS database and algorithms likely inflict a greater burden on the African American community than on its white counterpart. This unintended consequence stems from a perfect storm of seemingly unrelated policies and history.

To see this bias we must first understand how a background check is performed. When you purchase a gun through an FFL-licensed business, you submit identifying information such as your name, age, address, and physical description. NICS then searches three databases of criminal records for an exact match on your personal data. If no exact match is found, a close name match can still halt your purchase.

The data and matching algorithms used by NICS are not publicly available, so we can only guess at what exists in the databases and how they are utilized. But based on public records and one particular criterion established by the Brady Act, conviction of a crime punishable by imprisonment for a term exceeding one year, we can make educated assumptions about the data. First, drug possession and distribution can result in multi-year imprisonment. Second, the largest proportion of inmates are incarcerated for drug-related offenses. Together these imply that a large, maybe the largest, population in NICS is there due to drug-related crimes. Lastly, African Americans are imprisoned for drug-related crimes at a rate six times greater than whites, even though the two groups use and possess drugs at essentially the same rates. This final statistic indicates the NICS databases must include a disproportionate number of African Americans due to biases in law enforcement and the criminal justice system. These upstream biases not only affect inmates at the time of conviction but follow them throughout life, limiting their ability to exercise rights protected by the 2nd Amendment.

Unfortunately, this is not where the bias ends. Because many African American families lost their surnames during the slavery era, African Americans are over-represented among people with common names. There is evidence that loose name-based matching against felony records in Florida exploited exactly this fact: it disproportionately and incorrectly identified black voters as felons, stripping them of their right to vote in the 2000 elections. It’s worth wondering whether the FBI’s name-matching algorithm suffers from the same bias, denying or delaying a disproportionate number of law-abiding African Americans from buying guns. In addition, this bias would result in law-abiding African Americans having their gun purchases tracked in NICS. By law, NICS deletes all traces of successful gun purchases. However, if you are incorrectly denied a purchase, you can appeal and add content to the databases proving you are allowed to purchase guns, so that you need not appeal every future purchase. The existence of this content is the only record of gun purchases in NICS, information the government is generally forbidden to retain. If this bias does exist, there is sad irony in laws passed on the backs of infamous violence perpetrated by non-African Americans now most negatively affecting African Americans.

This evidence should be weighed carefully, especially by those who advocate for both gun control and social justice. The solutions settled upon for gun control must pass intense scrutiny to ensure social justice is not damaged. In the case of NICS, the algorithms should be transparent, and simple probabilistic methods should be employed to lessen the chance of burdening African Americans who have common names.
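NICS’s actual data and algorithms are not public, so the following is only an illustrative sketch of the general idea: name-only fuzzy matching flags innocent people who share a common name, while requiring a second identifier (here, date of birth) suppresses those false positives. The record list, names, and threshold are all hypothetical.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Crude string similarity between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# A hypothetical prohibited-persons list (illustrative only).
records = [
    {"name": "James Washington", "dob": "1970-03-12"},
    {"name": "John Wilson", "dob": "1985-11-02"},
]

def loose_match(applicant_name, threshold=0.85):
    """Name-only matching: flags anyone whose name is merely similar."""
    return [r for r in records
            if name_similarity(applicant_name, r["name"]) >= threshold]

def stricter_match(applicant_name, applicant_dob, threshold=0.85):
    """Require a second identifier (date of birth) to confirm a hit."""
    return [r for r in loose_match(applicant_name, threshold)
            if r["dob"] == applicant_dob]

# A law-abiding buyer who happens to share a common name:
hits = loose_match("James Washington")                        # 1 false positive
confirmed = stricter_match("James Washington", "1992-07-04")  # 0 hits
```

The point of the sketch is that adding even one independent identifier to the match criteria dramatically shrinks the pool of people a common name can collide with, which is exactly the kind of simple probabilistic safeguard argued for above.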

If you received an undergraduate degree in the United States, you are likely familiar with the U.S. financial aid system from a student perspective – you submit your essays, your academic records and test scores, you file the FAFSA, and in return you expect some amount of financial aid from the colleges you applied to. Your institution may or may not have provided an estimated cost calculator on its website, and you may or may not have received as much financial aid as you hoped for from your institution. Given that approximately 71% of each undergraduate class takes on student loan debt (TICAS, 2014), institutional aid typically does not cover the gap between what the student can pay and what the institution charges (also known as unmet need). What is clear, however, is that despite a consistent sticker price, the actual cost of college differs from student to student.

Colleges and consultants refer to the practice as “enrollment management” or “financial aid leveraging”, but the pricing strategy itself is known as price discrimination (How Colleges Know What You Can Afford, NY Times, 2017). As with any business constrained by net revenue, there is in some ways a fundamental opposition between the best interests of the consumer (student) and the seller (college): consumers ideally want the product at the cheapest rate, and sellers want to earn as much revenue as possible (though many factors other than revenue also drive colleges’ decision making). This idea becomes more problematic once we consider that education is not an inessential service, but a key component of personal development and economic opportunity.

The looming ethical discussion, at least in the U.S., is whether higher education should be free for anyone who wants it, perhaps eliminating the need for universities to engage in price discrimination. A parallel discussion is whether price discrimination that leaves students with unmet need is the more immediate problem to resolve.

Rather than taking a stance on U.S. college pricing, however, I am interested in the enrollment management paradigm from a student privacy perspective. If Nissenbaum et al. posit that “informational norms, appropriateness, roles, and principles of transmission” govern a framework of contextual integrity (Nissenbaum et al., 2006), how might the use of student-provided data by enrollment consultants violate contextual integrity from the perspective of a student?

I cannot find any existing studies on students’ expectations of how colleges handle their data. As a U.S. student myself, I expect that many students’ expectations are driven by the norms laid out by U.S. policy (particularly FERPA), which treats educational and financial data as private and protected.

I believe, therefore, that certain expectations about the flow of data from student to institution may be violated when universities don’t explicitly divulge their partnerships. If the flow is expected to be a straight line from the student to the college, the continuation of that information from college to consultancy and back to the college may seem aberrant. Equally important, I think, is the expectation of the extent of the information. Students likely expect, and cost calculators imply, that certain static pieces of information will be used to make an admit decision, offer merit aid, and determine financial need. In that case, passing that information to an outside consultancy, which can combine it with third-party data in a predictive model whose inferences surpass any individual piece of data, both to recommend aid and to predict behavior, and then returning that information to the college may also violate students’ expectations.

It seems to me that whether financial aid leveraging is beneficial to the student or not, a lapse in privacy occurs to the benefit of institutions when they fail to disclose the extent to which student data will be used, and by whom exactly.

Please join us for the NLP Seminar on Monday, February 26, at 4:00pm in 202 South Hall. All are welcome!

Speaker: Jonathan Kummerfeld (University of Michigan)

Title:  Representing Online Conversation Structure with Graphs: A New Corpus and Model

Abstract: 

When a group of people communicate online, their conversation is rarely linear, with each message responding only to the one immediately before it. To build systems that understand a group conversation we need a way to identify the discourse structure: what each message is responding to. I’ll speak about a new corpus we constructed with reply structure annotations for 19,924 messages across 58 hours of IRC discussion. Using our annotations we analyse strengths and weaknesses of a recent heuristically extracted set of conversations that have formed the basis of extensive work on dialogue systems (Lowe et al., 2015). Finally, I’ll present statistical models for the task, which improve thread extraction performance from 25.7 F (heuristic) to 60.3 F (our approach). Using our model we extract a new set of conversations that provide high quality data for use in downstream dialogue system development.
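The “F” numbers above are F-scores over the recovered reply structure. The talk’s exact evaluation details aren’t given here, so the following is a minimal sketch under the assumption that reply structure is represented as (child, parent) message links and scored link-by-link against gold annotations:

```python
def link_f1(gold_links, predicted_links):
    """F1 over reply-structure links, each a (child, parent) message pair."""
    gold = set(gold_links)
    pred = set(predicted_links)
    tp = len(gold & pred)                      # correctly predicted links
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy example: four gold links, one predicted parent is wrong.
gold = [(2, 1), (3, 1), (4, 2), (5, 3)]
pred = [(2, 1), (3, 2), (4, 2), (5, 3)]
print(round(link_f1(gold, pred), 3))  # prints 0.75
```

Because each message has exactly one parent in this representation, precision and recall coincide when the model predicts a parent for every message, but the general F1 form covers partial predictions too.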

(Slides)

Privacy issues arising from technology often share more or less the same story. A technology is developed with simple intentions: to enhance a feature or perform a modest task. The fittest of those technologies survive to serve a wide set of users. However, as more information is logged and transmitted, a growing concern over privacy surfaces, until that privacy issue devours the once simple technology. We have observed too many of these stories. Notably, each of the social networking sites that took turns in popularity was developed as a means to host personal opinions and connections. That never changed, except that the discussion around privacy infringements exploded and profoundly affected the direction of the sites. The baton for the next debate seems to have been handed to On-Board Diagnostics (OBD). OBD is a system placed behind the driver’s dashboard to collect data on the car, such as whether tire pressure is low. But more features have been added, with more to come. The addition of entertainment systems, cameras, and navigation devices contributes richer layers of data to the OBD.

Originally developed to track maintenance information and later gas emissions, OBD is attracting mounting concern over its expanding capability to inflict serious privacy violations. Much like the social networking sites, OBD is becoming a lucrative source of rich data. In the case of cars, insurance agencies, advertisers, manufacturers, governments, and hackers all have an interest in the data contained in the OBD. For example, some insurance companies have used information from the OBD to measure driving distance and offer discounts to drivers with low mileage, while others issue monetary incentives for customers to submit information from their OBD. Manufacturers can use the information to improve their cars and services. Governments can monitor and regulate traffic and gas emissions with the information, and advertisers can be guided by it as well. Of course, the distribution of information to insurers and marketers seems trivial when you weigh the harm of a possible hacking incident.

As more OBDs are loaded with internet connectivity, the vulnerability may be worsening. The types of information are no longer limited to whether your tires are low on pressure; they now include more personal details such as your music preferences, the number of passengers, and your real-time location. Location data can be used to infer your home address, school or office, choice of supermarkets, and maybe even your religious views or nightlife habits. Cameras in and around the vehicle can supply streaming video as well. While each of these devices is useful in enhancing the driver and passenger experience, the privacy and security concerns are indeed alarming. Moreover, an OBD on a “smart” car can collect more information more accurately, and share it faster with a wider audience. Unlike smartphone developers, however, developers of smart cars face bigger challenges in keeping up with rapid technological evolution. Also, even if drivers were offered choices to turn off features of the OBD, many features would likely remain on, as safety concerns may override privacy concerns. The question of who owns the information is also debated in the absence of clear rules and regulations.

A collaborative effort involving governments, manufacturers, and cybersecurity professionals is needed to address the privacy and security concerns arising from OBD. In the United States, senators introduced the “Security and Privacy in Your Car Act of 2015,” which requires cars to be “equipped with reasonable measures to protect against hacking attacks.” However, the bill is too ambiguous and will be difficult to enforce in a standardized way. Manufacturers, while acknowledging the possible risks associated with OBD, are not fully up to speed on the matter. Federal and state governments need to take leadership, with the cooperation of manufacturers and security professionals, to make sure safe and reliable automobiles are delivered to customers. How we collectively approach the issue will certainly affect the cost we pay.

Strava and Behavioral Economics

February 12th, 2018

I am a self-described health and fitness nut, and in the years since smartphones have become an essential device in our day-to-day lives, technology has also slowly infiltrated my daily fitness regime.  With such pervasive use of apps to track one’s own health and lifestyle choices, is it any wonder that companies are also collecting the data that we freely give them, with the potential to monetize that information in unexpected ways?  Ten years ago, when I went outside for a run, I would try to keep to daylight hours and busy streets because of the worry that something could happen to me and no one would know.  Now, the worry is completely different – now I am worried that if I use my GPS-enabled running app, my location (along with my heart rate and running speed) is saved and stored in some unknown database, to be used in some unknown manner.


Recently, a fitness app called Strava made headlines after it published a heat map showing the locations and workouts of users who made the data public (which is the default setting) and inadvertently revealed the location of secret military bases and the daily habits of personnel.  It was a harsh reminder of how the seemingly innocuous use of an everyday tool can have serious consequences – not just personally, but also professionally, and even for one’s own safety (the Strava heatmap showed certain jogging routes of military personnel in the Middle East).  Strava’s response to the debacle was to release a statement that said they were reviewing their features, but also directed their users to review their own privacy settings – thus the burden remains on the user to opt out, for now.


Fitness apps don’t just have the problem of oversharing their users’ locations.  Apps and devices like Strava or Fitbit are in the business of collecting a myriad of health and wellness data, from sleep patterns and heart rates to what the user eats in a day.  Such data is especially sensitive because it relates to a user’s health; however, because the user is not sharing it with their doctor or hospital, they may not even realize the extent to which others may be able to infer their private, sensitive information.


One of the biggest issues here is the default setting.  Behavioral economics studies show that the status quo bias is a powerful predictor of how we humans make (or fail to make) decisions.  Additionally, most users simply fail to read and understand privacy statements when they sign up to use an app.  Why, then, do some companies still choose to make the default setting “public” for users of their app, especially in cases where it is not necessary?  For Strava, if the default had been to “opt in” to share your location and fitness tracking data with the public, their heatmaps would have looked very different.
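The power of the default can be made concrete with a toy simulation. The 5% deviation rate and user counts below are made-up numbers chosen only to illustrate the status quo bias, not measurements of Strava’s user base:

```python
import random

random.seed(0)  # fixed seed so the toy numbers are reproducible

def simulate_sharing(n_users, default_public, p_change=0.05):
    """Count users whose data ends up public, assuming only a small
    fraction (p_change) ever deviates from whatever the default is."""
    public = 0
    for _ in range(n_users):
        changed = random.random() < p_change
        is_public = (not default_public) if changed else default_public
        public += is_public
    return public

opt_out = simulate_sharing(10_000, default_public=True)   # ~9,500 public
opt_in = simulate_sharing(10_000, default_public=False)   # ~500 public
```

Flipping a single boolean – the default – swings the shared population by roughly a factor of twenty under these assumptions, which is exactly why the choice of default is a design decision with ethical weight.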


It is not in the interest of companies to make the default settings anything other than public.  The fewer people who share data, the less data the company has about you, and the less able it is to use that data to its benefit – whether for targeted marketing or for developing additional features for individual users.  Thus, companies could argue that collecting users’ data on a widespread basis also benefits users in the long run (as well as their own revenues).  However, headlines like this one erode public trust in technology companies, and companies such as Strava would do well to remember that their revenues also depend on the trust of their users.  In the absence of “private” or “friends only” default settings, these companies should at least analyze the potential consequences before releasing the public data they collect about their users.


Less than two months after its launch, MessengerKids, Facebook’s new child-focused messaging app, has received backlash from child-health advocates, including a plea directly to Mark Zuckerberg to pull the plug. On January 30th, the Campaign for Commercial-Free Childhood published an open letter, compiled and signed by over 110 medical professionals, educators, and child-development experts, which accuses the tech giant of forsaking its promise to “do better” for society and of targeting children under 13 to enter the world of social media.

At its introduction in early December 2017, MessengerKids was branded as another tool for parents struggling to raise children in the digital age. After installing the app on their child’s device(s), parents can control their child’s contact list from their own Facebook account. The app has kid-friendly GIFs, frames, and stickers; built-in screening for age-inappropriate content in conversations; and a reporting feature for both parents and children intended to combat cyberbullying. It contains no advertisements, and the child’s personal information isn’t collected, in accordance with US federal law. Creating an account does not create a Facebook profile, but the service nonetheless introduces children to social media and their own online presence.

Contrary to the image MessengerKids hoped to present, child-health advocates have interpreted the application less as a gatekeeper for online safety and more as a gateway to unhealthy online habits. In its letter to Mark Zuckerberg, the CCFC cites multiple studies linking screen time and social media presence to depression and other negative mental-health effects. In addition, the letter argues, the app will interfere with the development of social skills, like the “ability to read human emotion, delay gratification, and engage in the physical world.” The connectivity MessengerKids promises is not an innovation, as these communication methods already exist with parents’ approval or supervision (e.g., Skype or parents’ Facebook accounts); nor does the app solve the problem of underage Facebook accounts, as there is little incentive for those users to migrate to a service with fewer features designed for younger kids. Instead, it reads as a play to bring users onboard even earlier by marketing specifically to the untapped under-13 audience.

In addition to the psychological-development concerns, a user’s early-instilled brand trust may later outweigh the perceived importance of privacy. Data spread and usage is already a foggy concept to adults, and young children certainly won’t understand the consequences of sharing personal information. This is what US federal law (“COPPA”) hopes to mitigate by protecting underage users from targeted data collection. MessengerKids normalizes an online identity early on, so young users may not consider the risks of sharing their data with Facebook or other online services once they age out of COPPA protection. The prioritization of online identity that MessengerKids may propagate presents a developmental concern that may affect how those after Generation Z value online privacy and personal data collection.

While Facebook seems to have done its homework by engaging a panel of child-development and family advocates, this could be another high-risk situation for user trust, especially in the midst of the fake-news controversy. Facebook’s discussions with its team of advisors are neither publicly available nor subject to the review process of academic or medical research. With the CCFC’s public backlash, parents who wouldn’t otherwise have questioned the feature may now perceive the decision to allow the app as a medical decision about their child’s health. A curated panel of experts may not be enough to assure parents that Facebook does, in fact, care about kids as more than potential users. And despite its reporting features, the app cannot prevent cyberbullying outright; if Facebook is concerned about unmitigated online activity, why not simply enforce its existing age restrictions?

Weighing the “benefits” of this service against the developmental risks, it appears that private business interests have clearly outweighed Facebook’s concern for users’ well-being. While changing social interactions has long been Facebook’s trademark, MessengerKids threatens to alter interpersonal relationships by molding the children who form them, and it could additionally undermine data responsibility by normalizing online presence at an early age. Facebook seems willing to risk the current generation’s trust to gain the next generation’s: a profitable, but not necessarily ethical, decision.