
Data Privacy and the Chinese Social Credit System
“Keeping trust is glorious and breaking trust is disgraceful”
By Victoria Eastman | February 24, 2019

Recently, the Chinese Social Credit System has been featured on podcasts, blogs, and news articles in the United States, often highlighting the Orwellian feel of the imminent system China plans to use to encourage good behavior amongst its citizens. The broad scope of this program raises questions about data privacy, consent, algorithmic bias, and error correction.

What is the Chinese Social Credit System?

In 2014, the Chinese government released a document entitled “Planning Outline for the Construction of a Social Credit System.” The system uses a broad range of public and private data to rank each citizen on a scale from 0 to 800. Higher ratings offer citizens benefits like discounts on energy bills, more matches on dating websites, and lower interest rates. Low ratings incur punishments such as the inability to purchase plane or train tickets, exclusion of you and your children from universities, and even pet confiscation in some provinces. The system has been undergoing testing in various provinces around the country with different implementations and properties, but the government plans to take the rating system nationwide in 2020.

The exact workings of the system have not been explicitly detailed by the Chinese government; however, details have emerged since the policy was announced. Data is collected from a number of private and public sources: chat and email data; online shopping history; loan and debt information; smart devices, including smartphones, smart home devices, and fitness trackers; criminal records; travel patterns and location data; and the nationwide network of millions of cameras that watch Chinese citizens. Even your family members and other people you associate with can affect your score. The government has signed up more than 44 financial institutions and has issued at least 8 licenses to private companies such as Alibaba, Tencent, and Baidu to submit data to the system. Algorithms are run over the entire dataset and generate a single credit score for each citizen.

This score will be publicly available on any number of platforms, including newspapers, online media, and even some people’s phones: when you call a person with a low score, you will hear a message telling you that the person you are calling has low social credit.

What does it mean for privacy and consent?

On May 1st, 2018, China announced the Personal Information Security Specification, a set of non-binding guidelines to govern the collection and use of personal data of Chinese citizens. The guidelines appear similar to the European GDPR with some notable differences, namely a focus on national security. Under these rules, individuals have full rights to their data, including erasure, and must give consent before a collecting company can make any use of their personal data.

How do these guidelines square with the social credit system? The connection between the two policies has not been explicitly outlined by the Chinese government, but at first blush there appear to be some key conflicts between them. Do citizens have erasure power over their poor credit history or other details that negatively affect their score? Are companies required to ask for consent to send private information to the government if it is to be used in the social credit score? If the social credit score is public, how much control do individuals really have over the privacy of their data?

Other concerns have also been raised about the algorithms themselves. How are individual actions weighted by the algorithm? Are some ‘crimes’ worse than others? Does recency matter? How can incorrect data be fixed? Is the government removing demographic information like age, gender, or ethnicity, or could those criteria unknowingly create bias?

Many citizens with high scores are happy with the system that gives them discounts and preferential treatment, but others fear the system will be used by the government to shape behavior and punish actions deemed inappropriate by the government. Dissidents and minority groups fear the system will be biased against them.

Many details about how the system will work on a nationwide scale remain unclear; however, there are clear discrepancies between the data privacy policy China announced last year and the scope of the social credit system. How the government addresses those conflicts will likely lead to even more podcasts, news articles, and blogs.

Sources

Sacks, Sam. “New China Data Privacy Standard Looks More Far-Reaching than GDPR”. Center for Strategic and International Studies. Jan 29, 2018. https://www.csis.org/analysis/new-china-data-privacy-standard-looks-more-far-reaching-gdpr

Denyer, Simon. “China’s plan to organize its society relies on ‘big data’ to rate everyone“. The Washington Post. Oct 22, 2016. https://www.washingtonpost.com/world/asia_pacific/chinas-plan-to-organize-its-whole-society-around-big-data-a-rating-for-everyone/2016/10/20/1cd0dd9c-9516-11e6-ae9d-0030ac1899cd_story.html?utm_term=.1e90e880676f


Doxing: An Increased (and Increasing) Privacy Risk
By Mary Boardman | February 24, 2019

Doxing (or doxxing) is a form of online abuse in which one party releases another’s sensitive and/or personally identifiable information. While it isn’t the only risk associated with a loss of privacy, it is one that can put people physically in harm’s way. The released data can include information such as a name, home address, or telephone number, and it exposes doxing victims to threats, harassment, and even violence.

People dox others for many reasons, all with the intent to harm. Because more data is available to more people than ever before, we can and should assume the risk of being doxed is also increasing. For those of us working with this data, we need to remember that there are actual humans behind the data we use. As data stewards, it is our obligation to understand the risks to these people and do what we can to protect them and their privacy interests. We need to be deserving of their trust.

Types of Data Used
To address a problem, we must first understand it. Doxing happens when direct identifiers are released, but these aren’t the only data that can lead to doxing. Some data, such as indirect identifiers, can also be used to dox people. Below are various levels of identifiability, with examples of each:

  • Direct Identifier: Name, Address, SSN
  • Indirect Identifier: Date of Birth, Zip Code, License Plate, Medical Record Number, IP Address, Geolocation
  • Data Linking to Multiple Individuals: Movie Preferences, Retail Preferences
  • Data Not Linking to Any Individual: Aggregated Census Data, Survey Results
  • Data Unrelated to Individuals: Weather

Anonymization and De-anonymization of Data
Anonymization, the removal of identifiers from a dataset, is a common response to privacy concerns and an attempt to protect people’s privacy. However, because anonymized data can be de-anonymized, anonymization is not a guarantee of privacy. In fact, we should never assume that anonymization provides more than a level of inconvenience for a doxer. (And, as data professionals, we should not assume anonymization is enough protection.)

Generally speaking, there are four types of anonymization:
1. Remove identifiers entirely.
2. Replace identifiers with codes or pseudonyms.
3. Add statistical noise.
4. Aggregate the data.
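
To make these four approaches concrete, here is a minimal, illustrative Python sketch using pandas and numpy. The dataset, column names, salt, and noise scale are all hypothetical; a real pipeline would be far more careful about each of them.

```python
import hashlib

import numpy as np
import pandas as pd

# Hypothetical dataset: a direct identifier (name), an indirect identifier
# (zip_code), and a sensitive value (monthly_spend).
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "zip_code": ["94110", "94110", "10001"],
    "monthly_spend": [412.0, 380.0, 955.0],
})

# 1. Remove identifiers entirely.
removed = df.drop(columns=["name"])

# 2. Replace identifiers with codes or pseudonyms (here, a salted hash).
SALT = "keep-this-secret"
pseudonymized = df.assign(
    name=df["name"].map(lambda n: hashlib.sha256((SALT + n).encode()).hexdigest()[:10])
)

# 3. Add statistical noise to numeric values.
rng = np.random.default_rng(seed=0)
noised = removed.assign(
    monthly_spend=removed["monthly_spend"] + rng.normal(0, 25, size=len(removed))
)

# 4. Aggregate the data so rows no longer describe individuals.
aggregated = df.groupby("zip_code", as_index=False)["monthly_spend"].mean()
```

Even combined, steps like these only raise the cost of re-identification; they do not eliminate it.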

De-anonymization (or re-identification) is the process by which anonymized data are accurately matched back to the original owner or subject. This is often done by combining two or more datasets containing different information about the same or overlapping groups of people. For instance, anonymized data from social media accounts could be combined to identify individuals. Often this risk is highest when anonymized data is sold to third parties who then re-identify people.
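
As a toy illustration of such linking (all of the data below is invented), two datasets that each look harmless on their own can be joined on shared quasi-identifiers:

```python
import pandas as pd

# "Anonymized" dataset: names removed, quasi-identifiers kept.
health = pd.DataFrame({
    "zip_code": ["94110", "10001"],
    "birth_date": ["1985-03-02", "1990-07-19"],
    "sex": ["F", "M"],
    "diagnosis": ["asthma", "diabetes"],
})

# Public dataset (say, a voter roll) that includes names.
voters = pd.DataFrame({
    "name": ["Carol Smith", "Dan Jones"],
    "zip_code": ["94110", "10001"],
    "birth_date": ["1985-03-02", "1990-07-19"],
    "sex": ["F", "M"],
})

# Joining on the shared quasi-identifiers re-attaches names to diagnoses.
reidentified = health.merge(voters, on=["zip_code", "birth_date", "sex"])
print(reidentified[["name", "diagnosis"]])
```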


Image Source:
http://technodocbox.com/Internet_Technology/75952421-De-anonymizing-social-networks-and-inferring-private-attributes-using-knowledge-graphs.html

One example of this is Sweeney’s 2002 paper, which showed that 87% of the US population could be uniquely identified using just zip code, birth date, and sex. Another example is work by Acquisti and Gross from 2009, in which they were able to predict Social Security numbers from birth date and geographic location. Other examples include a 2018 study by Kondor et al., who were able to identify people based on mobility and spatial data. While their method had only a 16.8% success rate after a week, this jumped to 55% after four weeks.
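
The mechanism behind Sweeney’s result is that the combination of those three fields is unique for most people. A quick, hypothetical sketch of how you might estimate that kind of uniqueness in your own data (the column names and rows are invented):

```python
import pandas as pd

def fraction_unique(df: pd.DataFrame, quasi_identifiers: list[str]) -> float:
    """Fraction of records whose quasi-identifier combination appears exactly once."""
    group_sizes = df.groupby(quasi_identifiers).size()
    return (group_sizes == 1).sum() / len(df)

# Invented example: half of these rows are unique on (zip, birth_date, sex).
people = pd.DataFrame({
    "zip_code":   ["94110", "94110", "10001", "10001"],
    "birth_date": ["1985-03-02", "1990-07-19", "1985-03-02", "1985-03-02"],
    "sex":        ["F", "M", "F", "F"],
})
print(fraction_unique(people, ["zip_code", "birth_date", "sex"]))  # 0.5
```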


Image Source:
https://portswigger.net/daily-swig/block-function-exploited-to-deanonymize-social-media-accounts

Actions Moving Forward
There are many approaches data professionals can take, ranging from negligent stewardship (doing as little as possible) to more sophisticated options such as differential privacy. El Emam presented a protocol in 2016 that does an elegant job of balancing feasibility with effectiveness when anonymizing data. He proposed the following steps:

1. Classify variables according to direct, indirect, and non-identifiers
2. Remove or replace direct identifiers with a pseudonym
3. Use a k-anonymity method to de-identify the indirect identifiers
4. Conduct a motivated intruder test
5. Update the anonymization with findings from the test
6. Repeat as necessary
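
Step 3 deserves a brief illustration. k-anonymity requires that every combination of indirect identifiers be shared by at least k records, which is usually achieved by generalizing values (truncating ZIP codes, bucketing ages, and so on). A minimal sketch of the check, with hypothetical data and column names:

```python
import pandas as pd

def is_k_anonymous(df: pd.DataFrame, quasi_identifiers: list[str], k: int) -> bool:
    """True if every quasi-identifier combination occurs in at least k records."""
    return df.groupby(quasi_identifiers).size().min() >= k

# Generalize before checking: truncate ZIP codes and bucket ages by decade.
records = pd.DataFrame({
    "zip_code": ["94110", "94112", "94133", "94134"],
    "age": [31, 34, 38, 36],
    "diagnosis": ["flu", "asthma", "flu", "flu"],
})
generalized = records.assign(
    zip_code=records["zip_code"].str[:3] + "**",
    age=(records["age"] // 10) * 10,
)
print(is_k_anonymous(generalized, ["zip_code", "age"], k=2))  # True for this toy data
```

The motivated intruder test in steps 4 through 6 then probes whether someone with public data and modest effort can still re-identify anyone despite these protections.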

We are unlikely to ever truly know the risk of doxing (and with it, the de-anonymization of PII). However, we need to assume de-anonymization is always possible. Because our users trust us with their data and their assumed privacy, we need to make sure their trust is well placed and be vigilant stewards of their data and privacy interests. The steps we take as data professionals can and do have an impact on the lives of the people behind the data.

Works Cited:
Acquisti, A., & Gross, R. (2009). Predicting Social Security numbers from public data. Proceedings of the National Academy of Sciences, 106(27), 10975–10980. https://doi.org/10.1073/pnas.0904891106
Center, E. P. I. (2019). EPIC – Re-identification. Retrieved February 3, 2019, from https://epic.org/privacy/reidentification/
El Emam, Khaled. (2016). A de-identification protocol for open data. In Privacy Tech. International Association of Privacy Professionals. Retrieved from https://iapp.org/news/a/a-de-identification-protocol-for-open-data/
Federal Bureau of Investigation. (2011, December 18). (U//FOUO) FBI Threat to Law Enforcement From “Doxing” | Public Intelligence [FBI Bulletin]. Retrieved February 3, 2019, from https://publicintelligence.net/ufouo-fbi-threat-to-law-enforcement-from-doxing/
Lubarsky, Boris. (2017). Re-Identification of “Anonymized” Data. Georgetown Law Technology Review. Retrieved from https://georgetownlawtechreview.org/re-identification-of-anonymized-data/GLTR-04-2017/
Narayanan, A., Huey, J., & Felten, E. W. (2016). A Precautionary Approach to Big Data Privacy. In S. Gutwirth, R. Leenes, & P. De Hert (Eds.), Data Protection on the Move (Vol. 24, pp. 357–385). Dordrecht: Springer Netherlands. https://doi.org/10.1007/978-94-017-7376-8_13
Narayanan, A., & Shmatikov, V. (2010). Myths and fallacies of “personally identifiable information.” Communications of the ACM, 53(6), 24. https://doi.org/10.1145/1743546.1743558
Snyder, P., Doerfler, P., Kanich, C., & McCoy, D. (2017). Fifteen minutes of unwanted fame: detecting and characterizing doxing. In Proceedings of the 2017 Internet Measurement Conference on – IMC ’17 (pp. 432–444). London, United Kingdom: ACM Press. https://doi.org/10.1145/3131365.3131385
Sweeney, L. (2002). k-ANONYMITY: A MODEL FOR PROTECTING PRIVACY. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(05), 557–570. https://doi.org/10.1142/S0218488502001648


Over 17k Android Apps in the Hot Seat for Violating Privacy Rules
A new ICSI study shows that Google’s user-resettable advertising IDs aren’t working
by Kathryn Hamilton (https://www.linkedin.com/in/hamiltonkathryn/)
February 24, 2019

What’s going on?
On February 14th 2019, researchers from the International Computer Science Institute (ICSI) published an article claiming that thousands of Android apps are breaking Google’s privacy rules. ICSI claims that while Google provides users with advertising privacy controls, these controls aren’t working. ICSI is concerned for users’ privacy and is looking for Google to address the problem.

But what exactly are the apps doing wrong? Since 2013, Google has required that apps record only the user’s “Ad ID” as an individual identifier. This is a unique code associated with each device that advertisers use to profile users over time. To ensure control remains in the hands of each user, Google allows users to reset their Ad ID at any time. This effectively resets everything that advertisers know about a person, so that their ads are once again anonymous.

Unfortunately, ICSI found that some apps are recording other identifiers too, many of which the user cannot reset. These extra identifiers are typically hardware-related, such as the IMEI, MAC address, SIM card ID, or device serial number.


Android’s Ad ID Settings

How does this violate privacy?

Let’s say you’ve downloaded one of the apps that ICSI has identified as being in violation. This list includes everything from Audible and Angry Birds to Flipboard News and antivirus software.

The app sends data about your interests to its advertisers. Included are your resettable advertising ID and your device’s IMEI, a non-resettable code that should not be there. Over time, the ad company begins to build an advertising profile about you, and the ads you see become increasingly personalized.

Eventually, you decide to reset your Ad ID to anonymize yourself. The next time you use the app, it will again send data to its advertisers about your interests, plus your new advertising ID and the same old IMEI.

To a compliant advertiser, you would appear to be a new person—this is how the Ad ID system is supposed to work. For the noncompliant app, however, advertisers simply match your IMEI to the old record they had about you and associate your two Ad IDs together.

Just like that, all your ads go back to being fully personalized, with all the same data that existed before you reset your Ad ID.
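
A rough sketch of what that matching might look like on the tracker’s side (the field names and values here are entirely hypothetical; the point is simply that any stable hardware identifier defeats the reset):

```python
# Hypothetical events an ad network might receive before and after a reset.
before_reset = {"ad_id": "aaaa-1111", "imei": "356938035643809", "interests": ["running", "travel"]}
after_reset = {"ad_id": "bbbb-2222", "imei": "356938035643809", "interests": ["camping"]}

profiles_by_imei: dict[str, dict] = {}

def ingest(event: dict) -> None:
    """Merge incoming events into a profile keyed on the non-resettable IMEI."""
    profile = profiles_by_imei.setdefault(event["imei"], {"ad_ids": set(), "interests": set()})
    profile["ad_ids"].add(event["ad_id"])            # old and new Ad IDs end up linked
    profile["interests"].update(event["interests"])  # interest history survives the "reset"

ingest(before_reset)
ingest(after_reset)
print(profiles_by_imei["356938035643809"])
```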

But they’re just ads. Can this really harm me?

I’m sure you have experienced the annoyance of being followed by ads after visiting a product’s page once, perhaps even by accident. Or maybe you’ve tried to purchase something secretly for a loved one and had your surprise ruined by a side banner ad. The tangible harm to a given consumer might not be life-altering, but it does exist.

Regardless, the larger controversy here is not the direct harm to a consumer but rather the blatant lack of care or conscience exhibited by the advertisers. This is an example of the ever-present trend of companies being overly aggressive in the name of profit, and not respecting the mental and physical autonomy that should be fundamentally human.

This problem is only growing as personal data becomes more plentiful and more easily accessible. If we’re having this much difficulty anonymizing ads, what kind of trouble will we face when it comes to bigger issues or more sensitive information?

What is going to happen about it?

At this point, you might be thinking that your phone’s app list is due for some attention. Take a look through your apps and delete those you don’t need or use—it’s good practice to clear the clutter regardless of whether an app is leaking data. If you have questions about specific apps, search ICSI’s Android app analytics database, which has privacy reports for over 75,000 Android apps.

In the bigger picture, it’s not immediately clear that Google, app developers, or advertisers have violated any privacy law or warrant government investigation. More likely, it seems that Google is in the public hot seat to provide a fix for the Ad ID system and to crack down on app developers.

Sadly, ICSI reported its findings to Google over five months ago but has yet to hear back. The study has spurred many media articles over the past few days, which means Google should feel increasing pressure and negative publicity over this in the coming weeks.

Interestingly, this case is very similar to a 2017 data scandal involving Uber’s iOS app, which used hardware-based IDs to tag iPhones even after the Uber app had been deleted. This was in direct violation of Apple’s privacy guidelines, caused a large amount of public outrage, and resulted in threats from Apple CEO Tim Cook to remove Uber from the iOS App Store. Uber quickly updated its app.

It will be interesting to see how public reaction and Google’s response measure up to the loud public outcry and swift action taken by Apple in the case of Uber.


Impact of Algorithmic Bias on Society
By Anonymous | December 11, 2018

Artificial intelligence (AI) is being widely deployed in a number of realms where it has never been used before. A few examples of areas in which big data and AI techniques are used are selecting candidates for employment, deciding whether a loan should be approved or denied, and using facial recognition for policing. Unfortunately, AI algorithms are often treated as a black box in which the “answer” provided by the algorithm is presumed to be the absolute truth. What is missed is the fact that these algorithms are biased for many reasons, including the data used to train them. These hidden biases have a serious impact on society and, in many cases, on the divisions that have appeared among us. In the next few paragraphs we will present examples of such biases and what can be done to address them.

Impact of Bias in Education

In her book “Weapons of Math Destruction,” the mathematician Cathy O’Neil gives many examples of how the mathematics on which machine learning algorithms are based can easily cause untold harm to people and society. One such example is the goal set forward by Washington, D.C.’s newly elected mayor, Adrian Fenty, to turn around the city’s underperforming schools. To achieve his goal, the mayor hired an education reformer as the chancellor of Washington’s schools. This individual, acting on the ongoing theory that students were not learning enough because their teachers were not doing a good job, implemented a plan to weed out the “worst” teachers. A new teacher assessment tool called IMPACT was put in place, and the teachers whose scores fell in the bottom 2% in the first year of operation, and 5% in the second year, were automatically fired. From a mathematical standpoint this approach seems to make perfect sense: evaluate the data and optimize the system to get the most out of it. Alas, as O’Neil points out, the factors used to determine the IMPACT score were flawed. Specifically, the score was based on a model that did not have enough data to reduce statistical variance and improve the accuracy of the conclusions one could draw from it. As a result, teachers in poor neighborhoods who performed very well on a number of other metrics were the ones impacted by the flawed model. The situation was further exacerbated by the fact that it is very hard to attract and grow talented teachers in schools in poor neighborhoods, many of which are underperforming.

Gender Bias in Algorithms Used By Large Public Cloud Providers

The bias in algorithms is not limited to small entities with limited amounts of data. Even large public cloud providers with access to a large number of records can easily create algorithms that are biased and cause irreparable harm when used to make impactful decisions. The website http://gendershades.org/ provides one such example. The research, which set out to determine whether there were any biases in the algorithms of three major facial recognition AI service providers (Microsoft, IBM, and Face++), was conducted by providing 1,270 images of individuals originating from Africa and Europe. The sample had subjects from 3 African countries and 3 European countries, with a 54.4% male and 44.6% female division. Furthermore, 53.6% of the subjects had lighter skin and 46.4% had darker skin. When the algorithms from the three companies were asked to classify the gender of the samples, as seen in the figure below, they performed relatively well if one looks only at overall accuracy.

However, on further investigation, as seen in the figure below, the algorithms performed poorly when classifying dark-skinned individuals, particularly women. Clearly, any decisions made on the basis of these classification results would be inherently biased and potentially harmful, to dark-skinned women in particular.
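
The Gender Shades finding is a reminder that model evaluation should be disaggregated by subgroup rather than reported as a single overall number. A minimal sketch of such a breakdown, using invented predictions and labels:

```python
import pandas as pd

# Invented evaluation results: one row per test image.
results = pd.DataFrame({
    "skin_type": ["darker", "darker", "darker", "lighter", "lighter", "lighter"],
    "gender":    ["female", "female", "male",   "female",  "male",    "male"],
    "correct":   [0,        1,        1,        1,         1,         1],
})

# The overall accuracy hides the disparity that a subgroup breakdown reveals.
print("overall accuracy:", results["correct"].mean())
print(results.groupby(["skin_type", "gender"])["correct"].mean())
```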

Techniques to Address Biases in Algorithms

Recognizing that algorithms are potentially biased is the first and most important step toward addressing the issue. Which techniques to use to reduce bias and improve the performance of algorithms is an active area of research. A number of approaches have been proposed and are being evaluated, ranging from the creation of an oath similar to the Hippocratic Oath that doctors pledge, to a conscious effort to use a diverse set of data that is much more representative of society. There are many reasons to be optimistic that, although bias in algorithms can never be eliminated entirely, its extent will be reduced in the very near future.

Bibliography

  1. Cathy O’Neil, 2016, Weapons of Math Destruction, Crown Publishing Company.
  2. How well do IBM, Microsoft and Face++ AI services guess the gender of a face?


Data Privacy for Low Income Segment
by Anonymous

While many people express unease over a perceived loss of privacy in the days of “Big Data” and predictive analytics, does the harm from such “data surveillance” impact the rich and poor equally? I believe the answer is no. People in the low income segment are more likely to be victims of predictive algorithms, as they have more data exposed and less knowledge of how to protect themselves against data malpractice.

Government Data

Because the low income population is required to provide day-to-day data to the government for basic needs (e.g., welfare, Medicare), they face more pervasive and disproportionate scrutiny. For example, food stamp users are required to have their spending patterns electronically recorded and monitored so government agencies can watch for potential fraud, which limits the autonomy and opportunity of food stamp users. In one instance, the U.S. Department of Agriculture (USDA) wrongfully deactivated market access for some Somali supermarkets because low income Somalis tend to spend all of their monthly allowance in one day, which didn’t follow the “normal pattern” established by the USDA’s data algorithm. Upon further investigation, it emerged that this practice resulted from language barriers and the limited availability of Somali food in local communities: local Somalis organized group car rides to Somali supermarkets in other communities and bought a month’s worth of food in one trip. Improper government data usage like this can unfairly limit spending behavior for the low income population.

Mobile Usage

Based on survey results from Washington University, the low income segment relies more heavily on mobile phones for internet use than higher income segments do (63% vs. 21%). With high mobile data usage, low income users are more likely to be victims of advertisers’ cross-device tracking, cell-site simulators, and in-store tracking by retailers. The whereabouts of low income consumers are more likely to be shared with third parties for data mining purposes. In addition, as the low income segment is more likely to own older-generation phones that do not have the latest security updates, they are more likely to suffer security breaches that result in identity theft. And because a mobile phone usually contains more personal data than a desktop, it also becomes the main source of internet data leaks for low income users.

Social Media

The Washington University survey also indicated that low income internet users are more likely to use social media than higher income users (81% vs. 73%), with most of the difference coming from younger users. People from the low income segment are also significantly less likely to use privacy settings to restrict access to the content they post online (65% vs. 79%). I agree with the researchers’ claim that this lack of restriction is the result of a lack of skills and understanding. After all, privacy settings for social media platforms usually have complex wording and are not easy to navigate. As the low income segment tends to have lower education levels, they are more likely to be confused by privacy settings and to share their content with the public by default.

Predictive Targeting

As the low income segment is usually more price sensitive, they are also more likely to fall for traps that trade personal information for store coupons. By releasing their information to marketers, they can easily be profiled into various “financially vulnerable” market segments as marketers compile data from multiple platforms. With such profiles, they are more likely to receive advertisements for dubious financial products such as payday loans or debt relief services, resulting in further financial loss.

Conclusion

Overall, I find data protection for the low income segment to be a tricky subject. While laws such as Title VI of the Civil Rights Act of 1964 outlawed discrimination on the basis of race, color, religion, sex, or national origin, no specific laws were enacted to protect the poor from discrimination. Financial strength is a key factor used in loan, housing/rental, and employment decisions, and it might be hard to establish laws that protect the low income population without limiting the ability of businesses to properly vet potential customers for risk. In the interim, however, I believe we should increase training and awareness opportunities for the low income population to reduce the knowledge gap, so they are less likely to become victims of “big data” or privacy invasion. In my current work, my team is partnering with a non-profit organization to provide web safety training for low income communities. We hope that by educating low income consumers on the ways they can protect their privacy online, they will benefit from all the opportunities the internet can deliver without putting themselves or their families at unnecessary risk.

 

Reference:

Privacy, Poverty, and Big Data: A Matrix of Vulnerabilities for Poor Americans  https://openscholarship.wustl.edu/cgi/viewcontent.cgi?article=6265&context=law_lawreview

USDA disqualifies three Somalian markets from accepting federal food stamps  http://community.seattletimes.nwsource.com/archive/?date=20020410&slug=somalis10m

Internet Essentials Online Safety and Security
https://internetessentials.com/en/learning/OnlineSafetyandSecurity


Autonomous Vehicles
by anonymous

As Internet of Things technology becomes a larger part of our lives, there will be privacy and ethics questions that lawmakers will need to address to protect consumers. With companies like Waymo, Uber, and other start-ups pouring millions of dollars each year into autonomous vehicle technology, self-driving cars are just around the corner and will bring huge changes to our society in the next decade. As these technologies have developed over the past five years, questions surrounding the safety, potential ethical dilemmas, and subsequent legal issues of self-driving vehicles have been widely discussed and at least somewhat regulated as autonomous vehicle testing takes place on our roads. One topic that has been missing from the conversation is the potential data protection and privacy issues that may arise once computer-operated vehicles are shuttling us around while collecting and using the stores of data they possess to subtly influence our daily lives.

To illustrate with an example, Google already has troves of data on each of its users, collected from platforms and apps such as Gmail, Google Maps, Android phones, Google Fi equipped devices, and Google Homes. If Waymo, an Alphabet subsidiary, begins selling self-driving cars as personal vehicles, Google will gain access to new, granular behavioral information on its users. What places does a person go to, at what time and on which days, and which brands do they prefer? Waymo could use the information gathered, along with the data Google already has, to integrate targeted ads that persuade its users to visit sponsored businesses. For example, if McDonald’s pays Waymo, the car may suggest a 5-minute detour to stop for food during a road trip even when an alternative such as Burger King is available with a shorter detour. Waymo could target users who Google’s machine learning algorithms have determined would buy food at McDonald’s after being nudged by their vehicle’s suggestion. Most users may never know that they were the target of a sponsored suggestion. Autonomous vehicles will be able to do this for retail, restaurants, grocery markets, and bars, as well as for services such as dry cleaning and salons. If no protections are put in place, companies will have free rein to target users and influence their decisions each time they get into a vehicle.

There are a few simple things companies can do proactively, and rather easily, to reduce potential harm to users. This is an area where regulation will be crucial, since there will be no standards or consistency without legal guidelines. Companies can remove personally identifiable information from their databases, avoiding the potential harm of data leaks or hacks and making it more difficult for other platforms to use the gathered data to target users. They can also give users the option to opt in to targeting, and even offer direct discounts in exchange for targeted ads. This would both provide a tangible benefit and serve to ensure that users are aware they are being targeted when they receive their perks. Unnecessary data can be deleted after a certain time period so that each person’s history is not stored forever.

This domain is entirely new for data gathering, targeted advertising, and sponsored suggestions, and it has had no impact on people’s lives in the past. The question of what protections will be put in place for people as self-driving cars enter our roads is a fundamental one that needs answers. Technology today develops so quickly that legal guidelines often lag, as they take time to form and be passed into law. This leaves a hole in which technology is pushed to production quickly, leaving users, the general public, to bear the full exposure to potential harm.


Five Interesting Facts About GDPR’s Data Protection Officer (DPO)
David Larance

The recently enforced European Union’s General Data Protection Regulation (GDPR) introduced a new term that CEOs, boards of directors, and other senior corporate officials need to start learning more about: the Data Protection Officer. While some “What is a DPO?” articles exist online, I’ve found five additional interesting facts in my review of the new role.

1. It’s so important that the GDPR committed an entire section to DPOs
Image by David Larance

2. Not every company needs a DPO

Article 37 requires the designation of a DPO only if one of three situations is met:

a) The data processing is managed by a “public authority or body”;
b) The processor’s core business already requires “regular and systematic monitoring of data subjects”; or
c) The processor’s core business involves processing data related to criminal activity or data in the “special categories,” which include sensitive information such as racial or ethnic origin, political opinions, and genetic data.

3. Companies can contract out the DPO role to a 3rd party provider

Article 37.6 clearly states that “The DPO may….fulfill the tasks on the basis of a service contract.” It doesn’t give any additional detail as to whether the DPO must be a full-time position, or even whether one DPO can fulfill the role for multiple independent organizations. By not explicitly stating what a valid service contract entails, the article appears to open the legal door for a cottage industry of DPOs for hire. Given the cost of implementing GDPR reported by many high-profile organizations, it will be interesting to see whether firms feel they can reduce headcount costs by using a 3rd party to meet the DPO requirements.


Image via aphaia, see references

4. The DPO is essentially a free agent

Article 38 details several elements of the DPO’s role which, when combined, paint the picture of an independent position: part data auditor, part data subject protector. What makes the role especially interesting is that while the DPO “may be a staff member of the controller or processor,” the regulation also says the DPO cannot be penalized or dismissed by the controller or processor and must report to the highest level of management. This provides a legal defense in any DPO wrongful-dismissal case, while also meaning that the only people who need to be fully aware of the DPO’s activities are the highest levels of management (who are usually focused on data privacy issues only when an event or breach has occurred).

5. Good DPOs will be hard to find

A good DPO will be a skilled data technician, a data privacy expert, and able to navigate complicated business processes within their own organization. They will need to understand and access the back-end systems and algorithms that manage their company’s data in order to adequately monitor and test how protected the data actually is, while also managing regulator and executive expectations. These two domains, when combined, are challenging to manage and, probably more importantly, challenging to communicate about transparently to all stakeholders.

See also:
1. Regulation (EU) 2016/679. (2018). Retrieved from https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32016R0679
2. What is a Data Protection Officer (DPO)? Learn About the New Role Required for GDPR Compliance?, digitalguardian, (2018). Retrieved from https://digitalguardian.com/blog/what-data-protection-officer-dpo-learn-about-new-role-required-gdpr-compliance

Images from:
1. Larance (2018)
2. Do I need a Data Protection Officer, aphaia, (2018). Retrieved from https://aphaia.co.uk/en/2018/03/26/do-i-need-a-data-protection-officer/


Can Blockchain and GDPR Truly Coexist?
by Joseph Lee

As of 25 May 2018, the General Data Protection Regulation (GDPR) has been taking no prisoners in its enforcement across the world. Facebook alone is expected to face at least a $1.6 billion fine for a publicly disclosed data breach that allowed hackers access to over 50 million Facebook user accounts [1][2]. Nor are tech giants the only targets of this regulation; any organization is fair game. As of October 29, 2018, a Portuguese hospital had been fined €400,000 for two violations of the GDPR stemming from poor data management practices. While it is comforting to know that regulation on the ethical conduct of data collection, storage, and usage is in place, how does the GDPR apply to areas with fuzzy definitions of data controllers, processors, and subjects? In this essay, I will briefly assess the feasibility of reconciling a well-known decentralized protocol, blockchain, with GDPR compliance.

The GDPR was first proposed by the European Commission back in 2012, with the initial intent of monitoring cloud services and social networks [8]. At the time, blockchain was not a well-known concept, and most cloud infrastructures and social networks were based on a central information system [4]. This centrality gives the GDPR a relatively easy target for substantiating and finding data breaches and other related violations. But how will the GDPR affect, and even enforce regulations on, decentralized protocols such as blockchains?

First, what is blockchain? A blockchain is essentially an incorruptible digital ledger of transactions that can be programmed to record anything from financial transactions to any digitized action [6]. Proponents of blockchain usually cite its critical characteristics as public transparency, the potential to increase transaction speed, and the reduction of middle-management costs. While the technology is famous for its applications in cryptocurrencies, it is essential to acknowledge that this decentralized protocol could potentially revolutionize other industries, such as automating health records, smart contracts, or even banking [5]. That said, the future of blockchain will depend on how the technology can comply with the GDPR.

At first glance, one might think there is a paradoxical relationship between the GDPR and public blockchains. For instance, among the many requirements set out in the GDPR, the “right to erasure” appears to contradict the immutability of blockchain technology.

A promising solution that is gaining popularity amongst blockchain supporters is the use of private blockchains and off-chain storage. The general concept is simple: a person stores personal data off-chain and stores only a reference to this data (such as a hash) on the ledger. Because the data itself lives off-chain, a person can delete their private information even though the reference remains on the public blockchain network. I strongly recommend Andries Van Humbeeck’s post (https://medium.com/wearetheledger/the-blockchain-gdpr-paradox-fc51e663d047) for details on how off-chain storage and private blockchains can work, as represented in the figure below [7].
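
A highly simplified sketch of the off-chain idea follows (plain Python, not any particular blockchain’s API; a real design would also salt or encrypt the stored reference so that low-entropy personal data cannot be brute-forced from its hash):

```python
import hashlib

ledger: list[str] = []                 # stand-in for an immutable public chain
off_chain_store: dict[str, str] = {}   # mutable storage under the subject's control

def record(personal_data: str) -> str:
    """Keep the data off-chain; append only its hash reference to the ledger."""
    digest = hashlib.sha256(personal_data.encode()).hexdigest()
    off_chain_store[digest] = personal_data
    ledger.append(digest)
    return digest

def erase(digest: str) -> None:
    """Honor a right-to-erasure request: the data goes away, the hash stays behind."""
    off_chain_store.pop(digest, None)

ref = record("name=Alice; email=alice@example.com")
erase(ref)
print(ref in off_chain_store, ledger)  # False, but the reference is still on the ledger
```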

While this may technically meet the GDPR’s definition of the right to erasure, there are other aspects of this workaround to consider regarding feasibility. The use and enforcement of off-chain ledgers would in practice mean increased complexity and reduced transparency. Moreover, the additional complexity could reduce the speed of peer-to-peer transactions [8]. In short, making blockchain comply with the GDPR would mean sacrificing some of the primary benefits of having a decentralized network in the first place.

Despite the pros and cons of these workarounds, there are still a large number of unknowns. As mentioned before, the GDPR relies on a clear definition of controllers and subjects. However, managing these relationships becomes very complex for decentralized protocols. If we are not aware of every individual using a blockchain, how can anyone be clear about where the responsibilities of controllers lie, or who the data subjects are? How can we ensure that regulations are fairly and justly applied when such relationships are not clear?

While the future of blockchain compliance with the GDPR is uncertain, it is vital for us to continue the dialogue about their coexistence. In 2017, the Financial Conduct Authority published a discussion paper on the challenges that blockchain faces in light of GDPR enforcement [4]. The overall conclusion was that, while there are significant challenges, the combination of the GDPR and decentralized ledger systems has the potential to improve an organization’s collection, storage, and management of private data, which would in turn enhance consumer outcomes and experiences.

In conclusion, the question of coexistence is still open and should continue to be debated and discussed. It will be exciting to watch these two relatively young paradigms interact, and to see how that interaction creates new precedents for how we regulate decentralized protocols.

References

[1] https://www.cnet.com/news/gdpr-google-and-facebook-face-up-to-9-3-billion-in-fines-on-first-day-of-new-privacy-law/

[2] https://www.huntonprivacyblog.com/2018/09/25/ico-issues-first-enforcement-action-gdpr/

[3] https://www.cnbc.com/2018/10/02/facebook-data-breach-social-network-could-face-EU-fine.html

[4] https://www.insideprivacy.com/international/european-union/the-gdpr-and-blockchain/

[5] https://www.g2crowd.com/categories/blockchain

[6] https://blockgeeks.com/guides/what-is-blockchain-technology/

[7] https://medium.com/wearetheledger/the-blockchain-gdpr-paradox-fc51e663d047

[8] https://cointelegraph.com/news/gdpr-and-blockchain-is-the-new-eu-data-protection-regulation-a-threat-or-an-incentive


Privacy in Communications in Three Acts. Starring Alice, Bob, and Eve

By Mike Frazzini

Act 1: Alice, Bob, and Setting the Communications Stage
Meet Alice and Bob, two adults who wish to communicate privately. We are not sure why, or what they wish to communicate, but either would say it’s none of our business. OK. Well, let’s hope they are making good decisions, and assess whether their wish for private communication is even possible. If it is possible, how could it be done, and what are some of the legal implications? We set the stage with a model of communications. A highly useful choice is the Shannon-Weaver Model of Communication, which was created in 1948 when Claude Shannon and Warren Weaver wrote the article “A Mathematical Theory of Communication” in the Bell System Technical Journal. This article, and Claude Shannon in particular, is considered foundational to information theory and studies. The model is shown in the diagram below:


Image: Shannon-Weaver Model, Image Credit: http://www.wlgcommunication.com/what_is

Playing the role of Sender will be Alice, Bob will be the Receiver, and the other key parts of the model we will focus on are the message and the channel. The message is simply the communication Alice and Bob wish to exchange, and the channel is how they exchange it, which could take many forms. The channel could be the air if Alice and Bob are right next to each other in a park and speaking the message, or the channel could be the modern examples of text, messenger, and/or email services on the Internet.

Act 2: Channel and Message Privacy, and Eve the Eavesdropper
So, Alice and Bob wish to communicate privately. How is this possible? Referring to the model, this would require that the message Alice and Bob communicate be accessed and understood by them, and only them, from sending to receipt, across the communication channel. With respect to access, whether the channel is a foot of air between Alice and Bob on a park bench or a modern global communications network, we should never assume the channel is privately accessible to only Alice and Bob. There is always a risk that a third party has access to the channel and the message, from what might be recorded or overheard at the park bench, to a communications company monitoring a channel for operational quality, to the U.S. government accessing and collecting domestic and global communications on all major channels, as the whistleblower Edward Snowden reported.
If we assume that there is a third party, let’s call her Eve, and Eve is able to access the channel and the message, it becomes much more challenging to achieve the desired private communications. How can it be done? Alice and Bob can use an amazing practice for rendering information secret to only the authorized parties: cryptography. There are many types and techniques in cryptography, the most popular being encryption, but the approach is conceptually similar across them and involves “scrambling” a message into nonsense that can then be understood only by authorized parties.
Cryptography provides a way that Alice and Bob can exchange a message that only they can fully decode. Cryptography would prevent Eve from understanding the message between Alice and Bob, even if Eve had access to it.

To illustrate cryptography, let’s suppose Alice and Bob use an unbreakable form of cryptography called a One Time Pad (OTP). This is a relatively simple method in which Alice and Bob pre-generate a completely random string of characters, then securely and secretly share this string, called a key. One way they might do this is by using a 40-sided die, with a number on each side representing each of the 40 characters they might use in their message: the numbers 0-9, the 26 letters of the English alphabet A-Z, and 4 additional characters, including a space. They would assign all of the characters a sequential number as well. They could then use modular arithmetic to encode the message with the random key:


Image: OTP Example, Image Credit: Mike Frazzini
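
To make the modular arithmetic concrete, here is a small Python sketch of the scheme described above, using a 40-character alphabet and a pre-shared random key. (In a true OTP the key must be genuinely random, at least as long as the message, and never reused; the specific punctuation characters chosen here are an assumption.)

```python
import secrets

# 40-character alphabet: digits, A-Z, space, and three punctuation marks.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ .,?"
M = len(ALPHABET)  # modulus = 40

def make_key(length: int) -> str:
    """Generate one random key character per message character (the 'pad')."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def encrypt(message: str, key: str) -> str:
    """Add message and key positions modulo 40."""
    return "".join(
        ALPHABET[(ALPHABET.index(m) + ALPHABET.index(k)) % M]
        for m, k in zip(message, key)
    )

def decrypt(ciphertext: str, key: str) -> str:
    """Subtract the key positions modulo 40 to recover the message."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) - ALPHABET.index(k)) % M]
        for c, k in zip(ciphertext, key)
    )

message = "MEET AT NOON"
key = make_key(len(message))
ciphertext = encrypt(message, key)
assert decrypt(ciphertext, key) == message  # Bob recovers the message; Eve sees nonsense
```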

Act 3: What have Alice and Bob Wrought?  Legal Implications of Private Communications

So now that we have shown it is technically possible, and mathematically provable, for Alice and Bob to engage in private communications, we create a tempest of legal questions that we will now attempt to resolve at a high level. The first big question concerns the legality of Alice and Bob engaging in private communications at all. We will approach this question from the standpoint of a fully democratic and free society, specifically the United States, since many countries have not established even de facto, let alone de jure, full democracies with protections for freedom of speech.

We can address this in two parts: the question of the legality and protection of communications on the channel, including the monitoring and interception of communications; and the question of the legality of using cryptographic methods to protect the privacy of communications. In the United States, there are a number of foundational legal rights, statutes, and case law precedents, from the Fourth Amendment of the U.S. Constitution protecting against unreasonable searches and seizures, to the Electronic Communications Privacy Act, to U.S. Code Chapter 119, that all generally protect the privacy of communications, including protection from monitoring and interception. However, this legal doctrine also defines conditions under which monitoring and interception may be warranted. And, as we have also noted, in at least the one case reported by Edward Snowden, there was widespread unwarranted abuse of the monitoring and interception of communications by the U.S. government, with indefinite retention for future analysis. So, given these scenarios, and then including all the commercial and other third parties that may have access to the communication channel and message, Alice and Bob are wise to assume there may be eavesdropping on their communication channel and message.

Regarding the question of the legality of using cryptographic methods to protect the privacy of communications in the U.S., there does not appear to be any law generally preventing consumer use of cryptographic methods domestically. There are a myriad of acceptable use and licensing restrictions, based in U.S. statute, such as the case of the FCC part 97 rules that prohibit cryptography over public ham radio networks. It is also likely that many communications providers have terms and conditions, as well as commercial law based contracts, that restrict or prohibit use of certain high-strength cryptographic methods. Alice and Bob would be wise to be aware and understand these before they use high strength cryptography.

There are also export laws within the U.S. that address certain types and methods of strong cryptography, and there are pending legislation and relevant case law precedents that restrict cryptography as well. In response to the strengthening of technology platform cryptography, like that recently done by Apple and Google and referred to by the U.S. law enforcement community as “going dark,” a senate bill was introduced by Senators Dianne Feinstein and Richard Burr to require “back-door” access for law enforcement, which would render the cryptography ineffective. This has not yet become law; however, there have been several examples of lower courts and state and local jurisdictions requiring people to reveal their secret keys so that messages could be decrypted by law enforcement, despite the Fifth Amendment’s protection against self-incrimination.

Of course, there are many scenarios in which information itself, and/or the communication of information, can constitute a criminal act (actus reus). Examples include threats, perjury, and conspiracy. So, again, we hope Alice and Bob are making good choices, since their communications, and the information transmitted therein, could certainly be illegal even if the privacy of their communications itself is not.


The Physics Behind Good Data Security
By Rani Fields

The data security apocalypse is upon us. While that statement might be a bit hyperbolic, 2017 and 2018 were certainly emblematic years in data security. From Panera to Facebook, Aadhaar to Saks Fifth Avenue, the frequency, scale, and expansiveness of data breaches entered an upward swing. When you factor in the expense of a breach, at $148 per record (IBM, 2018), the damages from a data breach impacting even a small group can reach into the millions. A company’s ability to respond quickly, in a manner that satisfies regulators and placates embroiled public sentiment, also becomes more complex as its profile grows. Needless to say, modern companies with an online presence should not concentrate solely on preventing breaches but should also focus extensively on managing the fallout of breaches. Data breaches should be considered inevitable in today’s world, after all.
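
As a rough back-of-the-envelope reading of that per-record figure (ignoring fixed response costs, fines, and reputational damage), even modest breach sizes translate into large totals:

```python
COST_PER_RECORD = 148  # IBM's 2018 estimate, in USD

for records in (10_000, 100_000, 1_000_000):
    print(f"{records:>9,} records -> ${records * COST_PER_RECORD:,}")
# A breach of just 10,000 records already implies roughly $1.5 million in costs.
```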

Where Proactive Approaches Fail
All companies with a sizeable online footprint should be prepared for a security breach. As penetration methods and social engineering methods become increasingly refined, we cannot reasonably assume any infrastructure to be sufficiently free from the risk of a breach. As such, breaches are a question of when they will occur, not if. The sources of a breach are varied: they can happen via data leaks, infrastructure hacking and other technical vulnerabilities, phishing and other social engineering methods, and inside jobs. Because the perfect system and the perfect policy do not exist, a company can always consider itself vulnerable to some degree along one or more of these axes. Thus, any robust breach policy should be designed not just to mitigate risks but also to properly prepare the company for a speedy response, in line with which systems were breached and the nature of the resulting exposure.

Image via Statista, see references

Technical and Policy Risk Management
As the major attack medium in question is ultimately electronic, we can consider a number of digital areas when reducing risk. Naturally, existing best practices prevail first and foremost: identity management, security-by-design, intruder detection, and other similar techniques and technologies should be used wherever possible. The issue with these proactive methods is that, with time, the resources required to harden a company’s electronic systems can eclipse the benefit that hardening provides, as the company’s technical offerings grow.

From a technical standpoint, proactive security policies present a definite amount of benefit, albeit with a limit. Thus, when managing system risk, companies should consider the amount of time and resources required to harden mission-critical systems versus other systems when pursuing a proactive approach.

With a reactive approach to security, we pivot from a question of where we can minimize risk to a question of how we can better understand and respond to security incidents. In this, we see a disproportionate importance in maintaining business continuity plans and disaster recovery plans. For each type of data stored and for each type of breach, you need to ask yourself if your company or group has a clear policy defining:

1. What happens to affected systems in the immediate aftermath of a breach?
2. Can you operate with reduced resources if a system is taken offline due to a breach?
3. Do you have any way to determine which resources were accessed within hours after a breach?
4. Do you have a data protection officer?
5. Does your plan account for any legal and regulatory questions which can occur in the aftermath of a breach, within reason?

Finally, consider modeling privacy and internal data flows when designing data-oriented privacy policies. The key focus of a company in the wake of a breach will be a fast and accurate response; knowing which entities had their data exposed and which specific data were affected is critical to ensuring that your company takes the correct response at the correct time. Furthermore, knowing this information in a process-oriented manner opens pathways to efficiently reducing risk by shrinking attack surfaces while enabling internal security policies to operate smoothly.

Due to both evolving regulatory requirements and the ever-changing security landscape, the failure to be able to act reactively can damage a company more than simply reducing the risk of a breach can protect it. Thus, companies and stakeholders should review their policies to ensure procedures are properly defined, so that the company can respond quickly when the time inevitably comes.

See also:
1. Cost of a Data Breach Study. (2018). Retrieved from https://www.ibm.com/security/data-breach
2. U.S. data breaches and exposed records 2018 | Statistic. (2018). Retrieved from https://www.statista.com/statistics/273550/data-breaches-recorded-in-the-united-states-by-number-of-breaches-and-records-exposed/
3. Data Breach Reports (Tech.). (2018, June 30). Retrieved https://www.idtheftcenter.org/wp-content/uploads/2018/07/DataBreachReport_2018.pdf
4. The History of Data Breaches. (2018, November 07). Retrieved from https://digitalguardian.com/blog/history-data-breaches
5. Data Breach Notification Requirements Coming from EU Expand Obligations for Organizations Worldwide. (2017, September 21). Retrieved from https://www.mayerbrown.com/data-breach-notification-requirements-coming-from-eu-expand-obligations-for-organizations-worldwide/

Images from:
1. University of West Florida. (n.d.). Legal & Consumer Info. Retrieved from https://uwf.edu/go/legal-and-consumer-info/general-data-protection-regulation-gdpr/
2. U.S. data breaches and exposed records 2018 | Statistic. (2018). Retrieved from https://www.statista.com/statistics/273550/data-breaches-recorded-in-the-united-states-by-number-of-breaches-and-records-exposed/