December 2018 – Data Science W231 | Behind the Data: Humans and Values

December 11, 2018

Impact of Algorithmic Bias on Society

By Anonymous | December 11, 2018

Artificial intelligence (AI) is being widely deployed in a number of realms where it has never been used before. Examples of areas in which big data and AI techniques are used include selecting potential candidates for employment, deciding whether a loan should be approved or denied, and applying facial recognition to policing. Unfortunately, AI algorithms are often treated as a black box in which the “answer” provided by the algorithm is presumed to be the absolute truth. What is missed is the fact that these algorithms can be biased for many reasons, including the data that was used to train them. These hidden biases have a serious impact on society and, in many cases, on the divisions that have appeared among us. In the next few paragraphs we will present examples of such biases and what can be done to address them.

Impact of Bias in Education

In her book “Weapons of Math Destruction”, mathematician Cathy O’Neil gives many examples of how the mathematics on which machine learning algorithms are based can easily cause untold harm to people and society. One such example is the goal set by Washington D.C.’s newly elected mayor, Adrian Fenty, to turn around the city’s underperforming schools. To achieve his goal, the mayor hired an education reformer as the chancellor of Washington’s schools. This individual, working from a prevailing theory that the students were not learning enough because their teachers were not doing a good job, implemented a plan to weed out the “worst” teachers. A new teacher assessment tool called IMPACT was put in place, and the teachers whose scores fell in the bottom 2% in the first year of operation, and the bottom 5% in the second year, were automatically fired. From a mathematical standpoint this approach makes perfect sense: evaluate the data and optimize the system to get the most out of it. Alas, as O’Neil points out, the factors used to determine the IMPACT score were flawed. Specifically, the score was based on a model that did not have enough data to reduce statistical variance and improve the accuracy of the conclusions one can draw from it. As a result, teachers in poor neighborhoods who were performing very well on a number of other metrics were the ones harmed by the flawed model. The situation was further exacerbated by the fact that it is very hard to attract and grow talented teachers in schools in poor neighborhoods, many of which are underperforming.

Gender Bias in Algorithms Used By Large Public Cloud Providers

Bias in algorithms is not limited to small entities with a limited amount of data. Even large public cloud providers with access to a large number of records can easily create algorithms that are biased and cause irreparable harm when used to make impactful decisions. The website http://gendershades.org/ provides one such example. The research examined whether there were biases in the algorithms of three major facial recognition AI service providers (Microsoft, IBM and Face++) by submitting 1270 images of individuals from Africa and Europe. The sample had subjects from 3 African countries and 3 European countries, with 55.4% male and 44.6% female subjects. Furthermore, 53.6% of the subjects had lighter skin and 46.4% had darker skin. When the algorithms from the three companies were asked to classify the gender of the samples, as seen in the figure below, they performed relatively well when one looks just at the overall accuracy.

However, on further investigation, as seen in the figure below, the algorithms performed poorly when classifying darker-skinned individuals, particularly women. Clearly, any decision based on the classification results of these algorithms would be inherently biased and potentially harmful, to darker-skinned women in particular.
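The gap between overall accuracy and subgroup accuracy can be illustrated with a minimal Python sketch. The records below are hypothetical toy data chosen only to show the effect; they are not the Gender Shades results, and the function name `accuracy_by_group` is an assumption made for illustration.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return accuracy per group for (group, true_label, predicted_label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in records:
        total[group] += 1
        correct[group] += int(truth == predicted)
    return {group: correct[group] / total[group] for group in total}


# Hypothetical toy data (NOT the study's figures): 10 faces per subgroup.
records = (
    [("lighter male", "M", "M")] * 10
    + [("lighter female", "F", "F")] * 9 + [("lighter female", "F", "M")] * 1
    + [("darker male", "M", "M")] * 9 + [("darker male", "M", "F")] * 1
    + [("darker female", "F", "F")] * 6 + [("darker female", "F", "M")] * 4
)

# Overall accuracy hides the weakest subgroup.
overall = sum(truth == predicted for _, truth, predicted in records) / len(records)
per_group = accuracy_by_group(records)

print(f"overall: {overall:.2f}")
for group, accuracy in per_group.items():
    print(f"{group}: {accuracy:.2f}")
```

In this toy example the classifier looks acceptable in aggregate while one subgroup fares far worse, which is why reporting accuracy per subgroup matters.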

Techniques to Address Biases in Algorithms

Recognizing that algorithms are potentially biased is the first and most important step towards addressing the issue. Techniques to reduce bias and improve the performance of algorithms are an active area of research. A number of approaches, ranging from the creation of an oath similar to the Hippocratic Oath that doctors pledge to a conscious effort to use a diverse set of data much more representative of society, have been proposed and are being evaluated. There are many reasons to be optimistic that, although bias in algorithms can never be eliminated, its extent will be reduced in the very near future.

Bibliography

Cathy O’Neil, 2016, Weapons of Math Destruction, Crown Publishing Company.
Gender Shades, How well do IBM, Microsoft and Face++ AI services guess the gender of a face? http://gendershades.org/

December 5, 2018 | February 22, 2019

Data Privacy for Low Income Segment

by Anonymous

While many people express unease over a perceived loss of privacy in the days of “Big Data” and predictive analytics, does the harm from such “Data Surveillance” impact the rich and poor equally? I believe the answer is no. People in the low income segment are more likely to be victims of predictive algorithms, as they have more data exposed and less knowledge of how to protect themselves against data malpractice.

Government Data

As the low income population is required to provide day-to-day data to the government for basic needs (e.g., Welfare, Medicare), they face more pervasive and disproportionate scrutiny. For example, food stamp users are required to have their spending patterns electronically recorded and monitored so government agencies can watch for potential fraud. This limits autonomy and opportunity for food stamp users. In one instance, the U.S. Department of Agriculture (USDA) wrongfully deactivated market access for some Somali supermarkets because low income Somalis tended to spend all of their monthly allowance in one day, which didn’t follow the “Normal Pattern” established by the USDA data algorithm. Further investigation showed that this practice resulted from language barriers and the limited availability of Somali food in local communities: local Somalis organized group car rides to Somali supermarkets in other communities and bought a month’s worth of food in one trip. Improper government data usage like this can unfairly restrict the spending behavior of the low income population.

Mobile Usage

Based on survey results from Washington University, the low income segment relies more heavily on mobile phones for internet use than higher income segments (63% vs. 21%). With high mobile data usage, low income segments are more likely to be victims of advertisers’ cross-device tracking, cell site simulators and in-store tracking by retailers. The whereabouts of low income consumers are more likely to be shared with 3rd parties for data mining purposes. In addition, as the low income segment is more likely to own older generation phones that do not have the latest security updates, they are more likely to encounter security breaches that result in identity theft. As mobile devices usually contain more personal data than desktops, they also become the main source of internet data leaks for low income users.

Social Media

The Washington University survey also indicated that low income internet users are more likely to use social media compared to higher income users (81% vs. 73%). Most of the difference is skewed to the younger users. When answering questions about social media privacy settings, people from the low income segment were significantly less likely to use privacy settings to restrict access to the content they post online (65% vs. 79%). I agree with the researchers’ claim that the lack of data restriction control is the result of a lack of skills and understanding. After all, privacy settings for social media platforms usually have complex wording and are usually not easy to navigate. As the low income segment tends to have lower education levels, they are more likely to be confused about privacy settings and share their content with the public by default.

Predictive Targeting

As the low income segment is usually more price sensitive, they are also more likely to fall for traps that trade personal information for store coupons. By releasing their information to marketers, they can be easily profiled into various “financially vulnerable” market segments as marketers compile data from various platforms together. With such a profile, they are more likely to receive advertisements for dubious financial products such as payday loans or debt relief services. This would result in more financial loss.

Conclusion

Overall, I find data protection for the low income segment to be a tricky subject. While the Civil Rights Act of 1964 outlawed discrimination on the basis of race, color, religion, sex or national origin, no specific laws were enacted to protect the poor against discrimination. Financial strength is a key factor used in loan, housing/rental and employment decisions. It might be hard to establish laws that include protections for the low income population without limiting the ability of businesses to properly vet potential customers for risk. However, in the interim, I believe we should increase opportunities for training and awareness for the low income population to reduce the knowledge gap so they are less likely to become victims of “big data” or privacy invasion. In my current work, my team is partnering with a non-profit organization to provide web safety training for low income communities. We hope that by educating low income consumers on the ways they can ensure their privacy online, they will benefit from all the opportunities the internet can deliver, without putting themselves or their families at unnecessary risk.

Reference:

Privacy, Poverty, and Big Data: A Matrix of Vulnerabilities for Poor Americans https://openscholarship.wustl.edu/cgi/viewcontent.cgi?article=6265&context=law_lawreview

USDA disqualifies three Somalian markets from accepting federal food stamps http://community.seattletimes.nwsource.com/archive/?date=20020410&slug=somalis10m

Internet Essentials Online Safety and Security
https://internetessentials.com/en/learning/OnlineSafetyandSecurity

December 5, 2018 | February 22, 2019

Autonomous Vehicles

by anonymous

As Internet of Things technology becomes a larger part of our lives, there will be privacy and ethics questions that will need to be addressed by lawmakers to protect consumers. With companies like Waymo, Uber, and other start-ups pouring millions of dollars each year into autonomous vehicle technology, self-driving cars are just around the corner and will make huge changes in our society in the next decade. As these technologies have developed over the past 5 years, questions surrounding the safety, potential ethical dilemmas, and subsequent legal issues regarding self-driving vehicles have been widely discussed and at least somewhat regulated as autonomous vehicle testing takes place on our roads. One topic that has been missing from the conversation is the potential data protection and privacy issues that may arise once computer operated vehicles are shuffling us around while collecting data and using the stores of data they possess to subtly influence our daily lives.

To illustrate, Google already has troves of data on each of its users, collected from platforms and apps such as Gmail, Google Maps, Android phones, Google Fi equipped devices, and Google Homes. If Waymo, an Alphabet subsidiary, begins selling self-driving cars as personal vehicles, Google will gain access to new granular behavioral information on its users. What places does a person go to, at what time and on which days, and which brands do they prefer? Waymo could combine the information it gathers with the data Google already has to deliver targeted ads that persuade its users to visit sponsored businesses. For example, if McDonald’s pays Waymo, the vehicle may suggest a 5 minute detour to stop for food during a road trip even when an alternative such as Burger King is available with a shorter detour. Waymo could target users who Google’s machine learning algorithms have determined would buy food at McDonald’s after being nudged by their vehicle’s suggestion. Most users may never know that they were the target of a sponsored suggestion. Autonomous vehicles will be able to do this for retail, restaurants, grocery markets, bars, as well as services such as dry cleaning, salons, etc. If no protections are put in place, companies will have free rein to target users and influence their decisions each time they get into a vehicle.

There are a few simple things that companies can do proactively to reduce potential harm to users. This is an area where regulations will be crucial, since there will be no standards or consistency without legal guidelines. Companies can remove personally identifiable information from their databases, avoiding the potential harm of data leaks or hacks and making it more difficult for other platforms to use the gathered data to target users. They can also give users the option to be targeted and even offer direct discounts in exchange for targeted ads. This would both provide a tangible benefit and help ensure that users are aware that they are being targeted when they receive their perks. Unnecessary data can be deleted after a certain time period so that each person’s history is not stored forever.
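Two of these safeguards, replacing direct identifiers and deleting old records, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`user_id`, `destination`, `ended_at`), the 30-day retention window, and the secret key are all hypothetical, and a keyed hash is pseudonymization rather than full anonymization, since a destination history can still identify a person.

```python
import hashlib
import hmac
from datetime import datetime, timedelta

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key, kept outside the database
RETENTION = timedelta(days=30)              # hypothetical retention window


def pseudonymize(user_id: str) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def scrub(trips, now):
    """Drop trips older than the retention window and strip direct identifiers."""
    kept = []
    for trip in trips:
        if now - trip["ended_at"] > RETENTION:
            continue  # past the retention period: do not keep
        kept.append({
            "rider": pseudonymize(trip["user_id"]),
            "destination": trip["destination"],
            "ended_at": trip["ended_at"],
        })
    return kept


now = datetime(2018, 12, 5)
trips = [
    {"user_id": "alice@example.com", "destination": "grocery store", "ended_at": datetime(2018, 12, 1)},
    {"user_id": "alice@example.com", "destination": "restaurant", "ended_at": datetime(2018, 9, 1)},
]
cleaned = scrub(trips, now)
print(cleaned)
```

Only the recent trip survives, and it no longer carries the raw account identifier.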

This domain is entirely new for data gathering, targeted advertising, and sponsored suggestions and has had no impact on people’s lives in the past. The question of what protections will be put in place for people as self-driving cars enter our roads is a fundamental one that needs answers. Technology today develops so quickly that legal guidelines often lag, as they take time to form and be passed into law. This leaves a gap in which technology can be pushed to production quickly, leaving users and the general public fully exposed to potential harm.

December 5, 2018

Five Interesting Facts About GDPR’s Data Protection Officer (DPO)

David Larance

The European Union’s recently enforced General Data Protection Regulation (GDPR) introduced a new term that CEOs, boards of directors, and other senior corporate officials need to start learning more about: the Data Protection Officer. While some “What is a DPO?” articles exist online, I’ve found five additional interesting facts in my review of the new role.

1. It’s so important that the GDPR commits an entire section to DPOs
Image by David Larance

2. Not every company needs a DPO

Article 37 requires the designation of a DPO only if one of three conditions is met.

a) The data processing is managed by a “public authority or body”;
b) The processor’s core business already requires “regular and systematic monitoring of data subjects”; or
c) The processor’s core business involves data related to criminal activity or data in a “special categories” section, which includes sensitive data such as race/ethnicity, political opinions, genetic data, etc.

3. Companies can contract out the DPO to a 3rd party provider

Article 37.6 clearly states that “The DPO may….fulfill the tasks on the basis of a service contract”. It doesn’t state any additional detail as to whether the DPO must be a full-time position or even if one DPO can fulfill the role for multiple independent organizations. By not explicitly stating what a valid service contract entails, the article appears to legally open the door for a cottage industry of DPOs for hire. Given the cost of implementing GDPR reported by many high profile organizations, it will be interesting to see if firms feel they can reduce head count costs by using a 3rd party to meet the DPO requirements.

Image via aphaia, see references

4. The DPO is essentially a free agent

Article 38 details several elements of the DPO’s role which, when combined, paint the picture of an independent role that is part data auditor and part data subject protector. What makes the role especially interesting is that while the DPO “may be a staff member of the controller or processor”, the regulation also says that the DPO cannot be penalized or dismissed by the controller or processor for performing their tasks and reports to the highest levels of management. This provides a legal defense for any DPO wrongful dismissal case while also maintaining that the only people who need to be 100% aware of the DPO’s activities are the highest levels of management (who usually are only focused on data privacy issues when an event or breach has occurred).

5. Good DPOs will be hard to find

A good DPO will be a skilled data technician and a data privacy expert, and will be able to navigate complicated business processes within their own organization. They will need to understand and access the back end systems and algorithms that manage their company’s data in order to adequately monitor and test how protected the data actually is, while also managing regulator and executive expectations. These two domains, when combined, are challenging to manage and, probably more importantly, challenging to communicate transparently to all stakeholders.

See also:
1. Regulation (EU) 2016/679. (2018). Retrieved from https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32016R0679
2. What is a Data Protection Officer (DPO)? Learn About the New Role Required for GDPR Compliance, digitalguardian, (2018). Retrieved from https://digitalguardian.com/blog/what-data-protection-officer-dpo-learn-about-new-role-required-gdpr-compliance

Images from:
1. Larance (2018)
2. Do I need a Data Protection Officer, aphaia, (2018). Retrieved from https://aphaia.co.uk/en/2018/03/26/do-i-need-a-data-protection-officer/

December 5, 2018 | February 22, 2019

Can Blockchain and GDPR Truly Coexist?

by Joseph Lee

Since taking effect on 25 May 2018, the General Data Protection Regulation (GDPR) has been taking no prisoners in its enforcement across the world. Facebook alone could face a fine of as much as $1.6 billion for a publicly disclosed data breach that allowed hackers access to over 50 million Facebook user accounts [1][2]. Tech giants are not the only targets of this regulation; any organization is fair game. As of October 29, 2018, a Portuguese hospital had been fined €400,000 for two violations of the GDPR stemming from poor data management practices. While it is comforting to know that regulation regarding the ethical conduct of data collection, storage, and usage is in place, how does GDPR impact areas that have fuzzy definitions of data controllers, processors, and subjects? In this essay, I will lightly assess whether a well known decentralized protocol, blockchain, can feasibly comply with GDPR.

The GDPR was first proposed by the European Commission back in 2012 with the initial intent of monitoring cloud services and social networks [8]. At the time, blockchain was not a well-known concept, and most cloud infrastructures and social networks were based on a central information system [4]. This centrality gives the GDPR a relatively easy target for finding and substantiating data breaches and other related violations. But how will the GDPR affect, and even be enforced on, decentralized protocols such as blockchains?

First, what is blockchain? The blockchain is essentially an incorruptible digital ledger of economic transactions that can be programmed to record anything from financial transactions to any digitized action [6]. Proponents of blockchain usually cite as its critical characteristics public transparency, the potential to increase transaction speed, and the reduction of middle management costs. While this technology is famous for its applications in cryptocurrencies, it is essential to acknowledge that this decentralized protocol could potentially revolutionize other areas such as health records, smart contracts, or even banking [5]. That said, the future of blockchain will depend on how this technology can comply with GDPR.

At an initial glance, one might think there is a paradoxical relationship between GDPR and public blockchains. For instance, among the many requirements set out in the GDPR, the “right to erasure” appears to contradict the immutability of blockchain technology.

A promising solution that is gaining popularity amongst blockchain supporters is the use of private blockchains and off-chain storage. The general concept is simple. A person would store personal data off-chain and store only a reference to this data, in the form of a hash, on the ledger. Because the ledger holds only the hash, any person can delete their private information off-chain even though the original reference is still on the public blockchain network. I would strongly recommend visiting [Andries Van Humbeeck’s post](https://medium.com/wearetheledger/the-blockchain-gdpr-paradox-fc51e663d047) regarding the details of how off-chain and private blockchains can work, represented in the figure below [7].
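The off-chain idea can be sketched in Python as follows. This is a toy model under stated assumptions, not the design from the linked post: an append-only list stands in for the blockchain, a dictionary stands in for the off-chain database, and the class name `OffChainStore` and the record contents are hypothetical. A random salt is mixed into the hash so that the on-chain reference cannot be recomputed by simply guessing the personal data.

```python
import hashlib
import secrets


class OffChainStore:
    """Toy off-chain database keyed by the hash that is published on the ledger."""

    def __init__(self):
        self._records = {}

    def put(self, data: bytes) -> str:
        salt = secrets.token_bytes(16)  # random salt, stored off-chain only
        reference = hashlib.sha256(salt + data).hexdigest()
        self._records[reference] = (salt, data)
        return reference

    def get(self, reference: str):
        record = self._records.get(reference)
        return None if record is None else record[1]

    def erase(self, reference: str) -> None:
        # Exercising the "right to erasure": remove the salt and the data.
        self._records.pop(reference, None)


ledger = []  # append-only list standing in for the immutable blockchain
store = OffChainStore()

ref = store.put(b"name=Alice;dob=1990-01-01")
ledger.append({"tx": 1, "data_ref": ref})  # only the hash goes on-chain

before = store.get(ref)  # personal data is retrievable while it exists off-chain
store.erase(ref)
after = store.get(ref)   # the ledger entry remains, but it now points at nothing

print(before, after, ledger)
```

Whether a leftover hash still counts as personal data under GDPR is a legal question that this sketch does not settle.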

While this may technically meet GDPR’s definition of the right of erasure, there are other components of this workaround to consider regarding feasibility. The use and enforcement of off-chain ledgers would in actuality imply an increase in complexity and reduction of transparency. Moreover, the additional complexity could reduce the speed of peer-to-peer transactions [8]. In short, this means that in order to make blockchain comply with GDPR, we would need to sacrifice the primary benefits of having a decentralized network in the first place.

Beyond the pros and cons of these workarounds, there are still a large number of unknowns. As mentioned before, GDPR relies on a clear definition of controllers and subjects. However, managing these relationships will be very complex when it comes to decentralized protocols. If we are not aware of every individual using a blockchain, how can anyone be clear about with whom the responsibilities of controllers or subjects lie? How can we ensure that regulations are fairly and justly applied when such relationships are not clear?

While the future of blockchain compliance with GDPR is uncertain, it is vital for us to continue the dialogue regarding blockchain and GDPR coexistence. In 2017, the Financial Conduct Authority published a discussion paper regarding the challenges that blockchain faces in light of GDPR enforcement [4]. The overall conclusion was while there were significant challenges, the combination of GDPR and the use of decentralized ledger systems has the potential to improve an organization’s data collection, storage, and management of private data which would, in turn, enhance consumer outcomes and experiences.

In conclusion, the question of coexistence is still relevant and should continue to be debated and discussed. It would be exciting to see both relatively young paradigms interact and see how this interaction will create new precedents on how we regulate decentralized protocols.

References

[1] https://www.cnet.com/news/gdpr-google-and-facebook-face-up-to-9-3-billion-in-fines-on-first-day-of-new-privacy-law/

[2] https://www.huntonprivacyblog.com/2018/09/25/ico-issues-first-enforcement-action-gdpr/

[3] https://www.cnbc.com/2018/10/02/facebook-data-breach-social-network-could-face-EU-fine.html

[4] https://www.insideprivacy.com/international/european-union/the-gdpr-and-blockchain/

[5] https://www.g2crowd.com/categories/blockchain

[6] https://blockgeeks.com/guides/what-is-blockchain-technology/

[7] https://medium.com/wearetheledger/the-blockchain-gdpr-paradox-fc51e663d047

[8] https://cointelegraph.com/news/gdpr-and-blockchain-is-the-new-eu-data-protection-regulation-a-threat-or-an-incentive

December 5, 2018

Privacy in Communications in Three Acts. Starring Alice, Bob, and Eve

By Mike Frazzini

Act 1: Alice, Bob, and Setting the Communications Stage
Meet Alice and Bob, two adults who wish to communicate privately. We are not sure why or what they wish to communicate privately, but either would say it’s none of our business. OK. Well, let’s hope they are making good decisions, and assess whether their wish for private communication is even possible. If it is possible, how could it be done, and what are some of the legal implications? We set the stage with a model of communications. A highly useful choice is the Shannon-Weaver Model of Communications, which dates to 1948, when Claude Shannon published the article “A Mathematical Theory of Communication” in the Bell System Technical Journal; Warren Weaver later co-authored the book-length version. This article, and Claude Shannon in particular, are considered foundational to information theory. The model is shown in the diagram below:

Image: Shannon-Weaver Model, Image Credit: http://www.wlgcommunication.com/what_is

Playing the role of Sender will be Alice, Bob will be the Receiver, and the other key parts of the model we will focus on are the message and the channel. The message is simply the communication Alice and Bob wish to exchange, and the channel is how they exchange it, which could take many forms. The channel could be the air if Alice and Bob are right next to each other in a park and speaking the message, or the channel could be the modern examples of text, messenger, and/or email services on the Internet.

Act 2: Channel and Message Privacy, and Eve the Eavesdropper
So, Alice and Bob wish to communicate privately. How is this possible? Referring to the model, this would require that the message Alice and Bob communicate only be accessed and understood by them, and only them, from sending to receipt, through the communication channel. With respect to access, whether the channel is a foot of air between Alice and Bob on a park bench, or a modern global communications network, we should never assume the channel is privately accessible to only Alice and Bob. There is always risk that a third party has access to the channel and the message – from what might be recorded and overheard on the park bench, to a communications company monitoring a channel for operational quality, to the U.S. government accessing and collecting all domestic and global communications on all major communications channels like the whistleblower Edward Snowden reported.
If we assume that there is a third party, let’s call them Eve, and Eve is able to access the channel and the message, it becomes much more challenging to achieve the desired private communications. How can this be achieved? Alice and Bob can use an amazing practice for rendering information secret for only the authorized parties. This amazing practice is called cryptography. There are many types and techniques in cryptography, the most popular being encryption, but the approach is conceptually similar and involves “scrambling” a message into nonsense that can then only be understood by authorized parties.
Cryptography provides a way that Alice and Bob can exchange a message that only they can fully decode. Cryptography would prevent Eve from understanding the message between Alice and Bob, even if Eve had access to it.

To illustrate cryptography, let’s suppose Alice and Bob use an unbreakable form of cryptography called a One Time Pad (OTP). This is a relatively simple method in which Alice and Bob pre-generate a completely random string of characters, called a key, at least as long as the message, then securely and secretly share it and never reuse it. One way they might do this is by rolling a 40-sided die with a number on each side representing each of the 40 characters they might use in their message: the digits 0-9, the 26 letters of the English alphabet A-Z, and 4 additional characters, one of which represents a space. They would assign all of the characters a sequential number as well. They could then use modular arithmetic to encode the message with the random key:

Image: OTP Example, Image Credit: Mike Frazzini
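The same modular arithmetic can be written as a short Python sketch. The character ordering and the choice of the four extra characters (space, period, comma, question mark) are assumptions made for illustration and may differ from the example in the image; `secrets.choice` stands in for rolling the 40-sided die.

```python
import secrets

# 40 characters: digits, A-Z, and 4 extras (the extras are an assumed choice).
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ .,?"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
N = len(ALPHABET)  # 40


def generate_key(length: int) -> str:
    """Draw each key character uniformly at random, like rolling a 40-sided die."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def encrypt(message: str, key: str) -> str:
    """Add message and key character numbers modulo 40."""
    if len(key) < len(message):
        raise ValueError("key must be at least as long as the message")
    return "".join(ALPHABET[(INDEX[m] + INDEX[k]) % N] for m, k in zip(message, key))


def decrypt(ciphertext: str, key: str) -> str:
    """Subtract key character numbers modulo 40 to recover the message."""
    if len(key) < len(ciphertext):
        raise ValueError("key must be at least as long as the ciphertext")
    return "".join(ALPHABET[(INDEX[c] - INDEX[k]) % N] for c, k in zip(ciphertext, key))


message = "MEET AT 9"
key = generate_key(len(message))  # shared secretly in advance, used only once
ciphertext = encrypt(message, key)
recovered = decrypt(ciphertext, key)
print(ciphertext, "->", recovered)
```

Eve, seeing only the ciphertext, learns nothing about the message as long as the key is truly random, kept secret, and never reused.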

Act 3: What have Alice and Bob Wrought? Legal Implications of Private Communications

So now that we have shown that it certainly is technically possible, and also mathematically provable, for Alice and Bob to engage in private communications, we face a tempest of legal questions that we will now attempt to resolve at a high level. The first big question is the legality of Alice and Bob engaging in private communications. We will approach this question from the standpoint of a fully democratic and free society, and specifically of the United States, since many countries have not even established de facto, let alone de jure, full democracies including protections for freedom of speech.

We can address this in two parts: the question of the legality and protections of communications on the channel, including the aspects of monitoring and interception of communications; and the question of the legality of using cryptographic methods to protect the privacy of communications. In the United States, there are a number of foundational legal rights, statutes, and case law precedents, from the Fourth Amendment of the U.S. Constitution protecting against “unreasonable searches and seizures,” to the Electronic Communications Privacy Act, to Title 18, Chapter 119 of the U.S. Code, that all generally protect privacy of communications, including protection from monitoring and interception of communications. However, this legal doctrine also defines conditions where monitoring and interception may be warranted. And as we have also presented, in at least the one case reported by Edward Snowden, there was widespread unwarranted abuse of the monitoring and interception of communications by the U.S. government, with indefinite retention for future analysis. So, given these scenarios, and then including all the commercial and other third parties that may have access to the communication channel and message, Alice and Bob are wise to assume there may be eavesdropping on their communication channel and message.

Regarding the question of the legality of using cryptographic methods to protect the privacy of communications in the U.S., there does not appear to be any law generally preventing consumer use of cryptographic methods domestically. There are, however, myriad acceptable use and licensing restrictions based in U.S. statute, such as the FCC Part 97 rules that prohibit cryptography over public ham radio networks. It is also likely that many communications providers have terms and conditions, as well as commercial law based contracts, that restrict or prohibit use of certain high-strength cryptographic methods. Alice and Bob would be wise to be aware of and understand these before they use high-strength cryptography.

There are also export laws within the U.S. that address certain types and methods of strong cryptography. There is legislation pending and relevant case law precedents that restrict cryptography as well. In response to the strengthening of technology platform cryptography, like that recently done by Apple and Google, and referred to by the U.S. law enforcement community as “going dark,” a Senate bill was introduced by Senators Dianne Feinstein and Richard Burr to require “back-door” access for law enforcement, which would render the cryptography ineffective. This has not yet become law; however, there have been several examples of lower courts and state and local jurisdictions requiring people to reveal their secret keys so messages could be decrypted by law enforcement. This is despite the Fifth Amendment protections of the U.S. Constitution against self-incrimination.

Of course, there are many scenarios where information itself, and/or communication of information, can constitute a criminal act (actus reus). Examples of this include threats, perjury, and conspiracy. So, again, we hope Alice and Bob are making good choices, since their communications – and the information transmitted therein – could certainly be illegal, even if the privacy of their communications itself is not illegal.

December 5, 2018

The Physics Behind Good Data Security

By Rani Fields

The data security apocalypse is upon us. While that statement might be a bit hyperbolic, 2017 and 2018 were certainly emblematic years in data security. From Panera to Facebook, Aadhaar to Saks Fifth Avenue, data breaches grew in frequency, intensity, and scope. When you factor in the cost of a breach, at $148 per record (IBM, 2018), the damages from a breach affecting even a relatively small group can reach into the millions. Responding quickly in a manner that satisfies regulators and placates public sentiment also becomes more complex as a company’s profile rises. Needless to say, modern companies with an online presence should not concentrate solely on preventing breaches but should also focus extensively on managing the fallout of data breaches. After all, data breaches should be considered inevitable in today’s world.
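To make the per-record figure concrete, here is a rough back-of-the-envelope sketch. It assumes the cost scales linearly with the number of exposed records, which is a simplification; the function name and the record counts are illustrative and not taken from the IBM study.

```python
# Per-record cost cited in the text (IBM, 2018).
COST_PER_RECORD_USD = 148


def estimated_breach_cost(records_exposed, cost_per_record=COST_PER_RECORD_USD):
    """Linear estimate: assumes every exposed record costs the same amount."""
    if records_exposed < 0:
        raise ValueError("records_exposed must be non-negative")
    return records_exposed * cost_per_record


# Illustrative record counts only.
for records in (1_000, 10_000, 100_000):
    print(f"{records:>7,} records -> ${estimated_breach_cost(records):,}")
```

Under this assumption, a breach of 10,000 records already comes to roughly $1.48 million.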

Where Proactive Approaches Fail
All companies with a sizeable online footprint should be prepared for a security breach. As penetration methods and social engineering methods become increasingly refined, we cannot reasonably assume any infrastructure to be sufficiently free from the risk of a breach. As such, breaches are a question of when they will occur, not if they will occur. The sources of a breach are varied: they can happen via data leaks, infrastructure hacks and other technical vulnerabilities, phishing and other social engineering methods, and inside jobs. Because the perfect system and the perfect policy do not exist, a company can always consider itself vulnerable to some degree along one or more of these axes. Thus, any robust breach policy should be designed not just to mitigate risks but also to prepare the company for a speedy response tailored to which systems were breached and the nature of the resulting exposure.

Image via Statista, see references

Technical and Policy Risk Management
As the major attack medium in question is ultimately electronic, we can consider a number of digital areas when reducing risk. Naturally, existing best practices will prevail first and foremost. Identity management, security-by-design, intruder detection, and other similar techniques and technologies should be used wherever possible. The issue with these proactive methods is that, as a company’s technical offerings grow over time, the resources required to manage its electronic systems can eclipse the benefit that hardening those systems provides.

From a technical standpoint, proactive security policies present a definite amount of benefit, albeit with a limit. Thus, when managing system risk, companies should consider the amount of time and resources required to harden mission-critical systems versus other systems when pursuing a proactive approach.

With a reactive approach to security, we pivot from a question of where we can minimize risk to a question of how we can better understand and respond to security incidents. In this, we see a disproportionate importance in maintaining business continuity plans and disaster recovery plans. For each type of data stored and for each type of breach, you need to ask yourself if your company or group has a clear policy defining:

1. What happens to affected systems in the immediate aftermath of a breach?
2. Can you operate with reduced resources if a system is taken offline due to a breach?
3. Do you have any way to determine which resources were accessed within hours after a breach?
4. Do you have a data protection officer?
5. Does your plan account for any legal and regulatory questions which can occur in the aftermath of a breach, within reason?

Finally, consider modeling privacy and internal data flows when designing data-oriented privacy policies. The key focus of a company in the wake of a breach will be a fast and accurate response; knowing which entities had their data exposed and which specific data were affected are critical to ensuring that your company takes the correct response at the correct time. Furthermore, knowing this information in a process-oriented manner opens pathways to efficiently reducing risk by way of reducing attack surfaces while enabling internal security policies to operate smoothly.
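A minimal sketch of what such a data-flow model could look like follows. The system names, data types, and groups in the inventory are hypothetical placeholders; the point is only that a maintained inventory lets you answer "who was exposed, and which data?" as soon as you know which systems were breached.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFlow:
    system: str     # system that stores or processes the data
    data_type: str  # category of data held there
    subjects: str   # group of people the data describes


# Hypothetical inventory; a real one would come from a data-mapping exercise.
FLOWS = [
    DataFlow("billing-db", "payment card", "customers"),
    DataFlow("billing-db", "email", "customers"),
    DataFlow("hr-portal", "salary", "employees"),
    DataFlow("newsletter", "email", "subscribers"),
]


def exposure(breached_systems, flows=FLOWS):
    """Map each affected group to the sorted list of data types exposed."""
    breached = set(breached_systems)
    result = {}
    for flow in flows:
        if flow.system in breached:
            result.setdefault(flow.subjects, set()).add(flow.data_type)
    return {subjects: sorted(types) for subjects, types in result.items()}


print(exposure(["billing-db"]))
```

With this inventory, a breach of the hypothetical `billing-db` system maps to customers' email and payment card data, while the other groups are unaffected.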

Due to both evolving regulatory changes and the ever-evolving security landscape, the failure to be able to act in a reactive fashion can damage a company more than the benefit provided by simply reducing the risk of a breach. Thus, companies and stakeholders should review their policies to ensure procedures are properly defined so that a company can act in a reactive fashion when the time inevitably comes.

See also:
1. Cost of a Data Breach Study. (2018). Retrieved from https://www.ibm.com/security/data-breach
2. U.S. data breaches and exposed records 2018 | Statistic. (2018). Retrieved from https://www.statista.com/statistics/273550/data-breaches-recorded-in-the-united-states-by-number-of-breaches-and-records-exposed/
3. Data Breach Reports (Tech.). (2018, June 30). Retrieved from https://www.idtheftcenter.org/wp-content/uploads/2018/07/DataBreachReport_2018.pdf
4. The History of Data Breaches. (2018, November 07). Retrieved from https://digitalguardian.com/blog/history-data-breaches
5. Data Breach Notification Requirements Coming from EU Expand Obligations for Organizations Worldwide. (2017, September 21). Retrieved from https://www.mayerbrown.com/data-breach-notification-requirements-coming-from-eu-expand-obligations-for-organizations-worldwide/

Images from:
1. University of West Florida. (n.d.). Legal & Consumer Info. Retrieved from https://uwf.edu/go/legal-and-consumer-info/general-data-protection-regulation-gdpr/
2. U.S. data breaches and exposed records 2018 | Statistic. (2018). Retrieved from https://www.statista.com/statistics/273550/data-breaches-recorded-in-the-united-states-by-number-of-breaches-and-records-exposed/

December 4, 2018

Potential Negative Consequences IoT Devices Could Have on Consumers

By Anonymous | December 4, 2018

IoT, or the Internet of Things, refers to devices that can collect data and transmit it across the internet or to other devices. The number of internet-connected devices has grown rapidly among consumers. In the past, a typical person owned only a few connected devices, such as desktops, laptops, routers and smartphones. Now, due to technological advances, many people also own televisions, video game consoles, smart watches (e.g. Fitbit, Apple Watch), digital assistants (e.g. Amazon Alexa, Google Home), cars, security systems, appliances, thermostats, locks and lights that all connect and transmit information over the internet.

While companies are constantly trying to find new ways to implement IoT capabilities into the lives of consumers, security seems to be taking a back seat. Therefore, with all of these new devices, it is important for consumers to remain aware of the personal information that is being collected, and to be informed of the potential negative consequences that could result from owning such devices. Here are four things you may want to be aware of:

1. Hackers could spy on you

I am sure you have heard stories of people who have been spied on after having the webcams on their laptops hacked. Other devices, like the Owlet, a wearable baby monitor, and SecurView smart cameras, were found to be hackable. What if someone were able to access your Alexa? They would be able to learn a lot about your personal life through recordings of your conversations. If someone were to hack your smart car, they would be able to know where you are at most times. Recently, researchers uncovered vulnerabilities in Dongguan Diqee vacuum cleaners that could allow attackers to listen in or perform video surveillance.

2. Hackers could sell or use your personal information

It may not seem like a big deal if a device such as your Fitbit is hacked. However, many companies would be interested in obtaining this information and could profit from it. What if an insurance company could improve its models with this data and, as a result, increase its rates for customers with poor vital signs? Earlier this year, hackers were able to steal sensitive information from a casino after gaining access to a smart thermometer in a fish tank. If hackers can steal data from companies that prioritize security, then they will probably have a much easier time doing the same to an average person. The data you generate is valuable, and hackers can find a way to monetize it.

3. Invasion of privacy by device makers

Our personal information is not only obtainable through hacks. We may be willingly giving it away to the makers of the devices we use. Each device and application has its own policies regarding the data it chooses to collect and store. A GPS app may store your travel history so it can make recommendations in the future. However, it may also use this information to make money on marketing offers for local businesses. Device makers are financially motivated to use your information to improve their products and target their marketing efforts.

4. Invasion of privacy by government agencies

Government agencies are another group that may have access to our personal information. Some agencies, like the FBI, have the power to request data from device makers in order to gather intelligence related to possible threats. Law enforcement may be able to access certain information for purposes of investigations. Last year, police used data from a woman’s Fitbit to charge her husband with her murder. Also, lawyers may be able to subpoena data in criminal and civil litigation.

IoT devices will continue to play an important role in everyone’s lives. They will continue to create an integrated system that will lead to increased efficiency for all. However, consumers should remain informed and, when given a choice between brands of device, like Alexa or Google Home, consider choosing a company that prioritizes the security and policy issues discussed above. This will send a message that consumers care, and encourage positive change.

December 4, 2018

The View from The Middle

By Anonymous | December 4, 2018

If you are like me, you probably spend quite a bit of time online.

We read news articles online, watch videos, plan vacations, shop and much more. At the same time, we are generating data that is being used to tailor advertising to our personal preferences. Profiles constructed from our personal information are used to suggest movies and music we might like. Data driven recommendations make it easier for us to find relevant content. Advertising also provides revenue for the content providers which allows us to access those videos and articles at reduced cost.

But is the cost really reduced? How valuable is your data and how important is your privacy? Suppose you were sharing a computer with other members of your household. Would you want all your activities reflected in targeted advertising? Most of the time we are unaware that we are under surveillance and have no insight into the profiles created using our personal information. If we don’t want our personal information shared, how do we turn it off?

To answer that question, let’s first see what is being collected. We’ll put a proxy server between the web browser and the internet to act as a ‘Man-in-the-Middle’. All web communication goes through the proxy server which can record and display the content. We can now see what is being shared and where it is going.

The Privacy Settings of our Chrome browser allow us to turn off web services that share data. We also enable ‘Do Not Track’ to request that sites not track our browsing habits across websites.

Let’s see what happens when we browse to the webpage of a popular travel site and perform a search for vacation accommodation. In our proxy server we observe that the travel website caused many requests to be sent from our machine to advertising and analytics sites.

We can see requests being made to AppNexus (secure.adnxs.com), a company which builds groups of users for targeted advertising. These requests have used the X-Proxy-Origin HTTP Header to transmit our IP address. As IP addresses can be associated with geographic location this is personal data we may prefer to protect.

Both the Google Marketing Platform (doubleclick.net) and AppNexus are receiving details of the travel search in the Referer HTTP header. They know the intended destination and dates and the number of adults and children travelling.
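The kind of inspection described above can be sketched in a few lines of Python. This is not the proxy we used, only an illustration of the analysis: the captured requests, the first-party domain `travel.example`, the search parameters, and the IP address are all made-up sample data, and header names are assumed to be stored exactly as written here.

```python
from urllib.parse import parse_qs, urlsplit

FIRST_PARTY = "travel.example"  # hypothetical travel site domain
SEARCH_URL = "https://www.travel.example/search?dest=Lisbon&adults=2&children=1"

# Made-up sample of what a proxy might record for one page load.
CAPTURED = [
    {"url": SEARCH_URL, "headers": {}},
    {"url": "https://secure.adnxs.com/px?id=1",
     "headers": {"Referer": SEARCH_URL, "X-Proxy-Origin": "203.0.113.7"}},
    {"url": "https://ad.doubleclick.net/activity",
     "headers": {"Referer": SEARCH_URL}},
]


def is_third_party(url, first_party=FIRST_PARTY):
    """True when the request goes to a host outside the first-party domain."""
    host = urlsplit(url).hostname or ""
    return not (host == first_party or host.endswith("." + first_party))


def leaked_search_terms(request):
    """Query parameters of the page URL carried in the Referer header."""
    referer = request["headers"].get("Referer", "")
    return parse_qs(urlsplit(referer).query)


def report(requests):
    """Summarise what each third-party request reveals about the user."""
    findings = []
    for request in requests:
        if not is_third_party(request["url"]):
            continue
        findings.append({
            "host": urlsplit(request["url"]).hostname,
            "ip_sent": "X-Proxy-Origin" in request["headers"],
            "search_terms": leaked_search_terms(request),
        })
    return findings


for finding in report(CAPTURED):
    print(finding)
```

In this sample, both third-party requests carry the full search (destination and number of travellers) in the Referer header, and one of them also carries an IP address.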

ATDMT (ad.atdmt.com) is owned by Atlas Solutions, a Facebook subsidiary. It uses a one-pixel image as a tracking bug even though the Do Not Track header is set. Clearbrain is a predictive analytics company which is also using a tracking bug.

Now we’ll have a look at the effectiveness of some popular privacy tools:

‘Privacy Badger’ combined with ‘Adblock Plus’ in Chrome. Privacy Badger is a browser add-on from the Electronic Frontier Foundation that stops advertisers and other third-party trackers from secretly tracking what pages you look at on the web. Adblock Plus is a free open source ad blocker which allows users to customize how much advertising they want to see.
The Cliqz browser with Ghostery enabled. Cliqz is an open source browser designed for privacy. Ghostery is a privacy plugin giving control over ads and tracking technologies.

There are now far fewer calls to third-party websites. Privacy Badger has successfully identified and blocked the ATDMT tracking bug. Our IP address and travel search are no longer being collected. However, neither Privacy Badger nor Ghostery detected the Clearbrain tracker. Since Privacy Badger learns to spot trackers while we browse, it may just need more time to detect bugs.
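One simple way to compare two proxy captures, with and without the privacy tools enabled, is a set difference over the third-party hosts contacted. The sketch below uses the hosts named in this post; `clearbrain.example` is a placeholder, since we do not give the real Clearbrain hostname here, and the two host lists are illustrative rather than a full capture.

```python
def compare_runs(before, after):
    """Compare third-party hosts seen without and with privacy tools enabled."""
    before, after = set(before), set(after)
    return {
        "blocked": sorted(before - after),
        "still_contacted": sorted(before & after),
        "new": sorted(after - before),
    }


# Illustrative host lists; clearbrain.example is a placeholder hostname.
without_tools = ["secure.adnxs.com", "doubleclick.net", "ad.atdmt.com", "clearbrain.example"]
with_tools = ["clearbrain.example"]

result = compare_runs(without_tools, with_tools)
print("Blocked:", result["blocked"])
print("Still contacted:", result["still_contacted"])
```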

While these privacy tools are quite effective at providing some individual control over personal information, they are by no means a perfect solution. This approach places the burden of protecting privacy on the individual who does not always understand the risks. While these tools are designed to be easy to install, many people are unfamiliar with browser plugins.

Furthermore, we are making a trade-off between our privacy and access to tailored advertising. Content websites we love to use may be sponsored by the advertising revenue we are now blocking.

For now, these tools at least offer the ability to make a choice.

December 4, 2018 / December 5, 2018

The Customer Is Always Right: No Ethics in Algorithms Without Consumer Support

by Matt Swan | December 4, 2018

There is something missing in data science today: ethics. It seems like there is a new scandal every day; more personal data leaked to any number of bad actors in the greatest quantities possible. Big Data has quickly given way to Big Data Theft.

The Internet Society of France, for example, a public interest group advocating for online rights, is pushing Facebook to fix the problems that led to the recent string of violations. They’re suing for 100 million euros (~$113 million USD) and threatening EU-based group action if appropriate remedies are not made. Facebook is also being pursued by a public interest group in Ireland and recently paid a fine of 500,000 pounds (~$649,000 USD) for its role in the Cambridge Analytica breach. Is this the new normal?

Before we answer that question, it might be more prudent to ask why this happened in the first place. That answer is simple.

Dollars dictate ethics.

Facebook’s primary use of our data is to offer highly targeted (read: effective) advertising. Ads are the price of admission and it seems we’ve all come to terms with that. Amid all the scandals and breaches, Facebook made their money – far more money than they paid in fines. And they did it without any trace of ethical introspection. Move fast and break things, so long as they’re not your things.

Dollars dictate ethics.

Someone should be more concerned about this. In the recent hearings in the US Congress in early September, there was talk about regulating the tech industry to try to bring these problems under control. This feels like an encouraging move in the correct direction. It isn’t.

First, laws cannot enforce ethical behavior. Laws can put in place measures to reduce the likelihood of breaches, or punish those not sufficiently safeguarding personal data or those failing to correct algorithms with a measurable bias, but they cannot require a company to have a Data Ethicist on the payroll. We’ve already noted that Facebook made more money than they paid in fines, so what motivation do they have to change their behavior?

Second, members of Congress are more likely to believe TensorFlow is a new setting on their Keurig than they are to know it’s an open source machine learning framework. Because of this reality, some organizations – such as 314 Action – prioritize electing more STEM professionals to government, since technology has progressed quickly and government is out of touch. We need individuals who have a thorough understanding of technological methods.

Meanwhile, higher education is making an effort to import ethics into computer and data science programs, but there are still limitations. Some programs, such as UC Berkeley’s MIDS program, have implemented an ethics course. However, at the time of this writing, no program includes a course in ethics as a graduation requirement.

Dollars dictate ethics.

Consider the time constraints; only so many courses can be taken. If one program requires an ethics course, the programs that do not will be at an advantage in recruiting because they will argue the ethics course is a lost opportunity to squeeze in one more technology course. This will resonate with prospective students since there are no Data Ethicist jobs waiting for them and they’d prefer to load up on technology-oriented courses.

Also, taking an ethics course does not make one ethical. Ultimately, while each budding data scientist should be forced to consider the effects of his or her actions, a course is certainly no guarantee of future ethical behavior.

If companies aren’t motivated to pursue ethics themselves and the government can’t force them to be ethical and schools can’t force us to be ethical, how can we possibly ensure the inclusion of ethics in data science?

I’ve provided the answer three times. If it were “ruby slippers”, we’d be home by now.

Dollars dictate ethics.

All the dollars start with consumers. And it turns out that when consumers collectively flex their economic muscles, companies bend and things break. Literally.

In late 2017, Fox News anchor Sean Hannity made some questionable comments regarding a candidate for an Alabama senate seat. Consumers contacted Keurig, whose commercials aired during Hannity’s show, and complained. Keurig worked with Fox to ensure their ads would no longer be shown at those times, which also resulted in the untimely death of a number of Keurig machines.

The point is this: if we want to effect swift and enduring change within tech companies, the most effective way to do that is through consistent and persistent consumer influence. If we financially support companies that consider the ethical implications of their algorithms, or simply avoid those that don’t, we can create the necessary motivation for them to take it seriously.

But if we keep learning about the newest Facebook scandal from our Facebook feeds, we shouldn’t expect anything more than the same “ask for forgiveness, not permission” attitude we’ve been getting all along.

Sources:
https://www.siliconrepublic.com/companies/data-science-ethicist-future
https://www.siliconrepublic.com/enterprise/facebook-twitter-congress
https://www.siliconrepublic.com/careers/data-scientists-ethics
https://news.bloomberglaw.com/privacy-and-data-security/facebook-may-face-100m-euro-lawsuit-over-privacy-breach
https://www.nytimes.com/2017/11/13/business/media/keurig-hannity.html
http://www.pewinternet.org/2018/11/16/public-attitudes-toward-computer-algorithms/