October 2018 – Data Science W231 | Behind the Data: Humans and Values

October 24, 2018

Is the GDPR’s Bark Bigger than its Bite?

Is the GDPR’s Bark Bigger than its Bite?
by Zach Day on 10/21/2018

The landmark EU regulation, formally called the General Data Protection Regulation or GDPR, took effect on May 25, 2018. Among other protections, GDPR grants “data subjects” with a bundle of new rights and placed an increased obligation on companies who collect and use the data. Firms were given a two year preemptive notice to implement changes that would bring them into compliance with GDPR by May 2018.

Image Credit: https://www.itrw.net/2018/03/22/what-you-need-to-know-about-general-data-protection-regulation-gdpr/

I don’t think I’m reaching too far to make this claim: some for-profit enterprises won’t do the right thing just because it’s the right thing, especially when the right thing is costly. Do the EU member countries’ respective Data Protection Authorities, also called DPAs, have enforcement tools that are powerful enough to motivate firms to invest in the systems and processes required for compliance?

Let’s compare two primary enforcement tools/consequences, monetary fines and bad press coverage.

Monetary Fines

When the UK Information Commissioner’s Office released their findings on Facebook’s role in the Cambridge Analytica scandal, the fine was capped at 500,000 pounds or $661,000. This is because Facebook’s transgressions occured before the initiation of GDPR and was therefore subject to the regulations of the UK Data Protection Act of 1998, the UK’s GDPR precursor, which specifies a maximum administrative fine of 500,000 pounds. How painful do you think a sub-million dollar fine is for a company that generated $40B of revenue in 2017?

GDPR vastly increases the potential monetary fine amount to a maximum of $20M euros or 4% of the company’s global turnover. For Facebook, this would have amounted to a fine of ~$1.6B. That’s more like it.

But how effectively can EU countries enforce the GDPR? GDPR enforcement occurs at the national level, with each member country possessing its own Data Protection Authority. Each nation’s DPA has full enforcement discretion. Because of this, there will inevitably be variation in enforcement trends from country to country. Countries like Germany, with a strong cultural value of protecting individual privacy, may enforce the GDPR with far more gusto than a country like Malta or Cyprus.

Monetary fines are not going to be the go-to tool for every enforcement case brought under the GDPR. DPAs have vast investigative powers, such as carrying out audits, obtaining access to stored personal data, accessing the facilities of the data controller or processor then issuing warnings, reprimands, orders, and bans on processing. It’s likely that these methods will be used with much more frequency. Although, the first few cases will be anomalies since, a) media outlets are chomping at the bit to report on the first few enforcement actions taken under the GDPR and b) DPAs will be trying to send a message.

PR Damage

What do you think burned worse to Facebook, a $661,000 fine or the front page of every international media outlet running the story for hundreds of millions of readers to see (imagine for how many this was the last straw, causing them to deactivate their Facebook accounts)? I would argue that the most powerful tool in the GDPR regulators toolbox is the bad press associated with a GDPR violation brought against a company, especially in the early years of the regulation when the topic is still fraught.

Mark Zuckerberg testifying before a joint hearing of the Senate Judiciary and Senate Commerce Committees, April 10, 2018. Image Credit: https://variety.com/2018/digital/news/zuckerberg-congress-testimony-1202749461/

A report published in July by TrustArc outlining estimated GDPR compliance rates across the US, UK, and EU noted that 57% of firms are motivated to comply with GDPR by ‘customer satisfaction’, whereas only 39% were motivated by fines. Of course a small business with 100 employees in a suburb of London is chiefly concerned with a potential 20,000,000 euro fine. They’d simply be out of business. On the other hand, large Silicon Valley based tech firms, with armies of experienced attorneys (Facebook attorneys have plenty of litigation experience in this area, by now), have much more to lose from more bad press versus a fine of any amount allowed under GDPR.

Path Forward Firms are going to pursue any path that leads to maximum revenue growth and profitability, even if it means operating in ethical/legal grey areas. If GDPR regulators plan to effectively motivate compliance from companies, they need to focus on the most sensitive pressure points. For some, it’s the threat of monetary penalty. For the tech behemoths, it’s the threat of another negative front-page headline. Regulators will be at a strategic disadvantage if they don’t acknowledge this fact and master their PR strategies.

October 24, 2018

Ethical Issues of the Healthcare Internet of Things

Ethical Issues of the Healthcare Internet of Things
By Osmar Coronel | October 21, 2018

Tracking vital signs on the fly

Very likely you are already using an Internet of Things (IoTs) product for your healthcare.

Our new connected world

IoTs are small computing objects that are constantly collecting information and sending it to the cloud or turning on or off something automatically. This blog is mainly focused on the ethical risks of IoT devices applied to improving your health.

IoT devices are predicted to expand in the healthcare industry due to the many benefits they provide. However, our FIPP, Fair Information Practice Policy, regulation might not be able to protect the consumer against all the new ethical risks associated with the upcoming healthcare IoT device applications.

IoT devices have several applications in healthcare. For instance, insulin pumps and blood-pressure cuffs connect to a mobile app that tracks and monitors blood pressure. The power of technology allows people to take control of their health. With IoTs people can also be more engage with their health. A patient with an insulin pump can be in more control of their blood pressure levels and this gives them more control of their diabetes. IoTs applied to healthcare can monitor and collect information such as heart rate and skin temperature. The data captured from the consumer can be transmitted, stored, and analyzed. This creates opportunities for research.

The use of IoTs in healthcare is expanding in the medical area. According to a Market Watch article, the healthcare IoT Market is expected to be worth $158 Billion by 2022. The Consumer Electronics Show (CES) in 2018, showcased several companies IoT products created to diagnose, monitor and treat illnesses.

Under the lens of the Federal Trade Commission (“FTC”), the FIPP focus is on notice, access, accuracy, data minimization, security, and accountability. The most relevant recommendations for IoT are security, data minimization, notice, and choice.

From the FIPP security recommendation, companies should implement from the very beginning “security by design”. They should also train their employees and retain providers that are able to enforce security in their services.

New Healthcare

Another risk of the application of healthcare IoTs is that it collects a large amount of data of the consumer for a long period. The FIPP Data Minimization proposes that companies should limit the data collected only for what is needed and for a limited time. Companies should develop and apply best practices, business needs and develop policies and practices that impose reasonable limits on the collection and retention of consumer data. Security and data minimization have more explicit initiatives to help minimize the ethical risks of IoTs.

On the other hand, notice and choice could be a challenge. In general there is a high risk that IoT companies do not provide notice or choice to the customer. Providing a notice or choice is challenging since IoTs are used in everyday life and they typically lack a user interface. Furthermore, some people think that the benefit of the IoT devices outweigh the cost of not giving the consumer notice and choice.

It is challenging to provide a choice when there is no user interface. However, according to the FIPP, there are still suitable alternatives like the implementation of video tutorials and implementation of QR codes on the devices. Also, in many cases, the data use might be under the consumers’ expectations, so that means that not every data collection needs to require the consumer consenting to the collection of data. Companies should implement opt-in choices at the point of sale when the consumer is acquiring the device with an easy to understand language.

The use of new technological advances in healthcare IoT devices offers a large number of benefits and they will expand considerable in the healthcare sector. Nonetheless, they will require careful implementation. The expansion of healthcare IoTs will come with a surge of new ethical problems and conflicts.

October 24, 2018

Unknown Knowns

Unknown Knowns
by Anonymous on 10/21/2018

Image Credit: https://www.azquotes.com/quote/254214
Donald Rumsfeld during Department of Defense News Briefing, archive.defense.gov. February 12, 2002.

The taxonomy of knowledge laid out by Rumsfeld in his much quoted new briefing conspicuously omits a fourth category: unknown knowns. In his critique of Rumsfeld’s analysis, the philosopher Slavoj Žižek defines the unknown knowns as “the disavowed beliefs and suppositions we are not even aware of adhering to ourselves, but which nonetheless determine our acts and feelings.” While this may seem like the realm of psychoanalysis, it’s a term that could also be applied to two of the most important topics in machine learning today: bias and interpretability.

The battle against bias, especially illegal biases that discriminate against protected classes, is a strong focus for both academia and industry. Simply testing the outputs of an algorithm for different categories of people for statistical difference can reveal things about the decision making process that were previous unknown, flipping things from the “unknown known” state to “known known.” More advanced interpretability tools, like LIME, are able to reveal even more subtle relationships between inputs and outputs.

While swaths of “unknown knowns” are being converted to “known knowns” with new techniques and attention, there’s still a huge amount of “unknown knowns” that we will miss forever. Explicitly called out protected classes are becoming easier to measure, but it’s rare to check for all possible intersections of protected classes. For example, there may be know measurable bias in some task when comparing genders or across race separately, but there may be bias when looking at the combinations. The fundamental nature of intersections is that their populations become smaller as more dimensions are considered, so the statistical tests become less powerful and it’s harder for automated tools to identify bias with certainty.

Image Credit: https://www.ywboston.org/2017/03/what-is-intersectionality-and-what-does-it-have-to-do-with-me/

There are are also many sub-classes that we don’t even know to look for bias against and have to rely on chance to discover. For example, in 2014 Target was called out for predicting pregnancies based on shopping patterns. Their Marketing Analytics team had a hypothesis that they could target pregnant women and made the explicit choice to single out this population, but with modern unsupervised learning techniques it could have just as easily been an automatically deployed campaign where no human had ever seen the description of the target audience.

“Pregnant women” as a category is easy to describe and the concerns about such targeting are easy to stir up controversy and change corporate behaviour, but more niche groups that may be biased against by algorithms may never be noticed. It’s also troubling that there may be classes discovered by unsupervised learning algorithm that have no obvious description yet, but would be controversial if given a name.

So what can be done? It may seem like a contradiction to try and address unknown knowns, given that they’re unknown, but new interpretability tools are changing what can be known. Practitioners could also start dedicating more of their model validation time to exploring the full set of combinations of protected classes, rooting out the subtle biases that might be missed with separate analysis of each category. A less technical but more ambitious solution is for organizations and practitioners to start sharing the biases they’ve discovered in their models and to contribute to some sort of central repository that others can learn from.

October 24, 2018

GDPR Will Transform Insurance Industry’s Business Model

GDPR Will Transform Insurance Industry’s Business Model
By Amit Tyagi | October 21, 2018

The European Union wide General Data Protection Regulation, or GDPR, came into force on May 25, 2018, with significant penalties for non-compliance. In one sweep, GDPR harmonizes data protection rules across the EU and gives greater rights to individuals over how their data is used. GDPR will radically reshape how companies can collect, use and store personal information, giving people the right to know how their data are used, and to decide whether it is shared or deleted. Companies face fines of up to 4 per cent of global turnover or €20m, whichever is greater.

To comply with GDPR, companies across various industries are strengthening their data usage and protection policy and procedures, revamping old IT systems to ensure that they have the functionality to comply with GDPR requirements, and reaching out to customers to get required consents.

However, GDPR will also require a fundamental rethink of business models for some industries, especially those that heavily rely on personal data to make business and pricing decisions. A case in point is insurance industry. Insurers manage and underwrite risks. Collection, storage, processing and analyzing data is central to their business model. The data insurers collect go beyond personal information. They collect sensitive information such as health records, genetic history of illnesses, criminal records, accident-related information, and much more.

GDPR is going to affect insurance companies in many ways. Start with pricing. Setting the right price for underwriting risks heavily relies on data. With data protection and usage restriction provisions of GDPR, insurers will have to re-look at their pricing models. This may have an inflationary effect on insurance prices: not a good thing for consumers. This will be further compounded by ‘data minimization’, a core principle of GDPR limits the amount of data companies can lawfully collect.

Insurance companies typically store their data for long periods. This aids them in pricing analytics and customer segmentation. With right to erasure, customers can request insurers to erase their personal data and claims history. These requests might come from customers who have an unfavorable claims history, leading to adverse selection due to information asymmetry.

Insurance frauds are another area that will be impacted by GDPR. Insurance companies protect themselves from fraudulent claims by analyzing myriad data points, including criminal convictions. With limitation on the type of data they are able to lawfully use, quite possibly insurance frauds may spike.

Insurance companies will also have to rethink their internal processes and IT systems which were built for a pre-GDPR era. Most decisions in insurance industry are automated, which includes, inter alia, whether to issue a policy or not, how much insurance premium to charge, whether to processes a claim fully or partially. Now with GDPR, customers can lawfully request human intervention in decision making.

GDPR gives the right to customers to receive their personal data held by an insurer, or have it transmitted to another insurer in a structured, commonly used and machine-readable format. This will be a challenge as insurers will have to maintain interoperable data formats from disparate legacy IT systems. Further, this has to be done free of charge. This will surely lead to lower profitability as competition among insurers will increase.

GDPR mandates that data should be retained only as long as is necessary for the purpose for which it was collected, after which it needs to be deleted and anonymized. If stored for longer duration, the data should be pseudonymized. This will require significant system changes, which will be a huge challenge for insurance companies as the rely on disparate systems and data sources, all of which will have to change to meet GDPR requirements.

Though insurers may be acutely impacted by GDPR, their path to compliance should be a disciplined approach: revisiting systems and processes to assess readiness for this regulation and investing in filling gaps. Some changes may be big, such as data retention and privacy by design, while some may be more straightforward, such as providing privacy notices. In all cases, effective change management is the key.

October 23, 2018October 23, 2018

Making AI “Explainable”: Easier Said than… Explained

Making AI “Explainable”: Easier Said than… Explained
By Julia Buffinton | October 21, 2018

Technological advances in machine learning and artificial intelligence (AI) have opened up many applications for these computational techniques in a range of industries. AI is now used in facial recognition for TSA pre-check, reviewing resumes for large companies, and determining criminal sentencing. In all of these examples, however, these algorithms have received attention for being biased. Biased predictions can have grave consequences, and determining how biases end up in the algorithm is key to preventing them.

Understanding how algorithms reach their conclusions is crucial to their adoption in industries such as insurance, medicine, finance, security, legal, and military. Unfortunately, the majority of the population is not trained to understand these models, viewing them as opaque and non-intuitive. This is challenging when accounting for the ethical considerations that surround these algorithms – it’s difficult to understand their bias if we don’t understand how they work in general. Thus, seeking AI solutions that are explainable is key to making sure that end users of these approaches will “understand, appropriately trust, and effectively manage an emerging generation of artificially intelligent machine partners.”

How can we do that?

Developing AI and ML systems are resource-intensive, and being thorough in managing ethical implications and safety adds to that. The federal government has recognized not only the importance of AI but also ethical AI and has increased its attention and budget for both.

In 2016, the Obama administration formed the new National Science and Technology Council (NSTC) Subcommittee on Machine Learning and Artificial Intelligence to coordinate federal activity relating to AI. Additionally, the Subcommittee on Networking and Information Technology Research and Development (NITRD) created a National Artificial Intelligence Research and Development Strategic Plan to recommended roadmaps for AI research and development investments by the federal government.

This report identifies seven priorities:

Make long-term investments in AI research
Develop effective methods for human-AI collaboration
Understand and address the ethical, legal, and societal implications of AI
Ensure the safety and security of AI systems
Develop shared public datasets and environments for AI training and testing
Measure and evaluate AI technologies through standards and benchmarks
Better understand the national AI R&D workforce needs

Three of the seven priorities focus on the minimizing the negative impacts of AI. The plan indicates that “researchers must strive to develop algorithms and architectures that are verifiably consistent with, or conform to, existing laws, social norms and ethics,” and to achieve this, they must “develop systems that are transparent, and intrinsically capable of explaining the reasons for their results to users.”

Is this actually happening?

Even though topics related to security, ethics, and policy of AI comprise almost half of the federal government’s funding priorities for AI, this has not translated directly into funding levels for programs. A brief survey of the budget for the Defense Advanced Research Project Agency (DARPA) shows an overall increase in funding for 18 existing programs that focus on advancing basic and applied AI research, almost doubling each year. However, only one of these programs fits into the ethical and security priorities.

Funding Levels for DARPA AI Programs

The yellow bar on the table represents the Explainable AI program, which aims to generate machine learning techniques that produce more explainable yet still accurate models and enable human users to understand. trust, and manage these models. Target results from this program include a library of “machine learning and human-computer interface software modules that could be used to develop future explainable AI systems” that would be available for “refinement and transition into defense or commercial applications.” While funding for Explainable AI increases, it is not at rate proportional to the overall spending increases for DARPA AI programs.

What’s the big deal?

This issue will become more prevalent as the national investment in AI grows. Recently, predictions have been made that China will close the AI gap by the end of this year. As US organizations in industry and academia strive to compete with their international counterparts, careful consideration must be given not only to improving technical capabilities but also developing an ethical framework to evaluate these approaches. This not only affects US industry and economy, but it has big consequences for national security. Reps. Will Hurd (TX) and Robin Kelly (IL) argue that, “The loss of American leadership in AI could also pose a risk to ensuring any potential use of AI in weapons systems by nation-states comports with international humanitarian laws. In general, authoritarian regimes like Russia and China have not been focused on the ethical implications of AI in warfare.” These AI tools give us great power, but with great power comes great responsibility, and we have a responsibility to ensure that the systems we build are fair and ethical.

October 23, 2018October 24, 2018

What do GDPR and the #MeToo Movement Have in Common?

What do GDPR and the #MeToo Movement Have in Common?
By Asher Sered | October 21, 2018

At first glance it might be hard to see what the #MeToo movement has in common with the General Data Protection Rule (GDPR), the new monumental European regulation that governs the collection and analysis of commercially collected data. One is a 261 page document composed by a regulatory body while the other is a grassroots movement largely facilitated by social media. However, both are attempts to protect against commonplace injustices that are just now starting to be recognized for what they are. And, fascinatingly, both have brought the issue of consent to the forefront of public consciousness. In the rest of this post, I examine the issue of consent from the perspective of sexual assault prevention and data privacy and lay out what I believe to be a major issue with both consent frameworks.

Consent and Coercion

Feminists and advocates who work on confronting sexual violence have pointed out several issues with the consent framework including the fact consent is treated as a static and binary state, rather than an evolving ‘spectrum of conflicting desires’^[1]. For our purposes, I focus on the issue of ‘freely given’ consent and coercion. Most legal definitions require that for an agreement to count as genuine consent, the affirmation must be given freely and without undue outside influence. However, drawing a line between permissible attempts to achieve a desired outcome and unacceptable coercion can be quite difficult in theory and in practice.

Consider a young man out on a date where both parties seem to be hitting it off. He asks his date if she is interesting in having sex with him and she says ‘yes’. Surely this counts as consent, and we would be quick to laud the young man for doing the right thing. Now, how would we regard the situation if the date was going poorly and the man shrugged off repeated ‘nos’ and continued asking for consent before his date finally acquiesced? What if his date feared retribution if she were to continue saying ‘no’? What if the man was a celebrity? A police officer? The woman’s supervisor? At some point a ‘yes’ stops being consent and starts being a coerced response. But where that line is drawn is both practically and conceptually difficult to disentangle.

Consent Under GDPR

The authors of GDPR are aware that consent can also be a moving target in the realm of data privacy, and have gone to somelengths to try and articulate under what conditions an affirmation qualifies as consent. The Rule spends many pages laying out the specifics of what is required from a business trying to procure consent from a customer, and attempts to build in consumer protections that shield individuals from unfair coercion. GDPR emphasizes 8 primary principles of consent, including that consent be easy to withdraw, free of coercion and given with no imbalance in the relationship.

How Common is Coercion?

Just because the line between consent and coercion is difficult to draw doesn’t necessarily mean that the concept of consent isn’t ethically and legally sound. After all, our legal system rests on ideas such as intention and premeditation that are similarly difficult to disentangle. Fair point. But the question remains, in our society how often is consent actually coerced?

Michal Buchhandler-Raphael, a professor of Law at Washington and Lee University, argues that problems with legal frameworks in which the definition of rape is built around non-consensual sex ‘[are] most noticeable in the context of sexual abuse of power stemming from professional and institutional relationships.^[2]’ She cites numerous cases in which a supervisor or someone in an otherwise privileged position managed to extract consent from a subordinate, and was therefore unpunished by the legal system. Since 70% of rapes are committed by someone known to the victim, and presumably an even larger percentage of sexual interactions take place between parties who know each other, we can expect that some amount of coercion occurs in numerous day to day sexual interactions. Especially given the fact that we continue to live in a patriarchal society where men are much more likely than women to be in positions of power^[3].

This observation about power imbalances in sexual interactions neatly parallels a major concern with consent under GDPR. While GDPR requires that data subjects have a ‘genuine or free choice’ about whether to give consent, it fails to adequately account for the fact that there is always a power differential between a major corporation and a data subject. Perhaps I could decide to live without email, a smartphone, social networks or search engines, but giving up those technologies would have a major impact on my social, political and economic life. It matters much more to me that I have access to a Facebook account than it does to Facebook that they have access to my data. If I opt out, they can sell ads to their other 2 billion customers.

Conclusion

I should be clear that I do not intend to suggest that companies stop offering Terms and Conditions to potential data subjects or that prospective sexual partners stop seeking affirmative consent. But we do need to realize that consent is only part of the equation for healthy sexual relationships and just data practices. The next step is to think about what a world would like where people are not constantly pressured to give things away, but instead are empowered to pursue their own ends.

Notes

^[1] See, https://economictimes.indiatimes.com/news/politics-and-nation/thoughts-on-metoo-why-cant-men-understand-the-concept-of-consent-a-flimmaker-explains/articleshow/66198444.cms?from=mdr&fbclid=IwAR0O8fzj4cQ4d68nwqWciQPSLrepZIV00RJKAnUmsC0id6JBqnNb4CR69WQ for a fascinating take on the topic

^[2] https://repository.law.umich.edu/cgi/viewcontent.cgi?article=1014&context=mjgl

^[3] Of course, coercion can be used by people of all genders to convince potential partners to agree to have sex.

October 17, 2018

Workplace monitoring

Workplace monitoring
by Anonymous on September 30, 2018

We never intended to build a pervasive workplace monitoring system – We just wanted to replace a clunky system of shared spreadsheets and email.

Our new system was intended to track customer orders as they moved through the factory. Workers finding a problem in the production process could quickly check order history and take corrective action. As a bonus, the system would also record newly-mandatory safety-related records. And it also appealed to the notion of “Democratization of Data,” giving workers direct access to customer orders. No more emails or phone-tag with production planners.

It goes live

The system was well received, and we started collecting a lot of data: a log every action performed on each sales order, with user IDs and timestamps. Workers could see all the log entries for each order they processed. And the log entries became invisible to workers once an order was closed.

Invisible, but not deleted.

Two years later, during a time of cost-cutting, it came to the attention of management that the logs could be consolidated and sorted by *any* field. Like this:

A new report was soon generated; logs sorted by worker ID. It didn’t seem like such a major request. After all, the data was already there. No notice was given to workers about the new report, or its potential use as a worker performance evaluation tool.

Re-purposed data

The personally identifiable information was re-purposed without notice or consent. The privacy issue may be intangible now to the workers, but could one day become very tangible as a factor in pay or layoff decisions. There is also potential for misinterpretation of the data. A worker doing many small tasks could be seen as doing far more than a worker doing a few time-consuming tasks.

Protection for workers’ information

California employers may monitor workers’ computer usage. The California Consumer Privacy act of 2018 covers consumers, not workers.

However, European Union’s General Data Protection Regulation (GDPR) addresses this directly, and some related portions of the system operate in the EU.

GDPR’s scope is broad; covering personally identifiable information in all walks of life (e.g. as a worker, as consumer, as citizen.) Section 26 makes clear: “The principles of data protection should apply to any information concerning an identified or identifiable natural person.” Other sections cover consent, re-purposing, and fairness/transparency issues, and erasability (sections 32, 60, 65, and 66.)

Most particularly, Article 88 requires that collection and use of personally identifiable information in workplace should be subject to particularly high standards of transparency.

Failings

It’s easy in hindsight to find multiple points at which this might have been avoided. Mulligan’s and Solove’s frameworks suggest looking at “actors” and causes.

Software Developer’s action (harm during data collection): There could have been a login at the level of a processing station ID, rather than a worker’s personal ID.
Software Developer’s action (and timescale considerations, yielding harm during data processing): The developer could have completely deleted the worker IDs once the order closed.
Software Developer’s action (harm during data processing and dissemination: increased Accessibility, Distortion): The
developer could have written the report to show that simple “event counting” wasn’t a reliable way of measuring worker contribution.
Management’s action (harm during data gathering and processing): the secondary use of the data resulted in intrusive surveillance. Businesses have responsibility (particularly under GDPR) to be transparent with respect to workplace data. Due concern for the control over personal information was not shown.

Prevention

One way forward, when working on systems which evolve over time: Consider Mulligan’s contrast concept dimension of privacy analysis, applied with Solove’s meta-harm categories. Over the full evolutionary life of a software system, we can ask: “What is private and what is not?” If the actors – developer, manager, and even workers – had asked, as each new feature was requested: “What is being newly exposed, and what is made private,” they might not have drifted into the current state. It’s a question which could readily be built into formal software development processes.

But in an era of “Data Democratization,” when data-mining may be more broadly available withing organizations, such checklists might be necessary but not sufficient. Organizations will likely need broadly-based internal training on protection of personal information.

October 17, 2018

May we recommend…some Ethics?

May we recommend…some Ethics?
by Jessica Hays | September 30, 2018

The internet today is awash with recommender systems. Some pop to mind quickly – like Netflix’s suggestions of shows to binge, or Amazon’s nudges towards products you may like. These tools use your personal history on their site, along with troves of data from other users, to predict which items or videos are most likely to tempt you. Other examples of recommender system include social media (“people you may know!”), online dating apps, news aggregators, search engines, restaurants finders, and music or video streaming services.

Screenshots of recommendations from LinkedIn, Netflix

Recommender systems have proliferated because there are benefits to be shared on both sides of the coin. Data-driven recommendations mean customers have to spend less time digging for the perfect product themselves. The algorithm does the heavy lifting – once they set off on the trail of whatever they’re seeking, they are guided to things they may have otherwise only found after hours of searching (or not at all). Reaping even more rewards, however, are the companies using the hook of an initial search to draw users further and further into their platform, increasing their revenue potential (the whole point!) with every click.

Not that innocent

In the last year, however, some of these recommender giants (YouTube, Facebook, Twitter) have gotten attention for the ways in which their algorithms have been unwitting enablers of political radicalization and the proliferation of conspiratorial content. It’s not surprising, in truth, that machine learning quickly discovered that humans are drawn to drama and tempted by content more extreme than what they originally set out to find. And if drama means clicks and clicks mean revenue, that algorithm has accomplished its task! Fortunately, research and methods are underway to redirect and limit radicalizing behavior.

However, dangers need not be as extreme as ISIS sympathizing to merit notice. Take e-commerce. With over 70% of the US population projected to make an online purchase this year, behind-the-scenes algorithms could be influencing the purchasing habits of a healthy majority of the population. The sheer volume of people impacted by recommender systems, then, is cause for a closer look.

It doesn’t take long to think up recommendation scenarios that could raise an eyebrow. While humans fold ethical considerations into their recommendations, algorithms programmed to drive revenue do not. For example, imagine an online grocery service, like [Instacart

Junk food -> more junk food!

Can systems be held accountable?

While this may be great for retailers’ bottom line, it’s clearly not for our country’s growing waistlines. Some might argue that nothing has fundamentally changed from the advertising and marketing schemes of yore. Aren’t companies merely responding to the same old pushes and pulls of supply and demand – supplying what they know users would ask for if they had the knowledge/chance? Legally, of course, they’re right – nothing new is required of companies to suggest and sell ice cream, just because they now know intimately that a customer has a weakness for it.

Ethics, however, points the other way. Increased access to millions of users’ preferences and habits and opportunity to influence behavior aren’t negligible. Between the power of suggestion, knowledge of users’ tastes, and lack of barriers between hitting “purchase” and having the treat delivered – what role should ethically-responsible retailers play in helping their users avoid decisions that could negatively impact their well-being?

Unfortunately, there’s unlikely to be a one-size-fits-all approach across sectors and systems. However, it would be advantageous to see companies start integrating approaches that mitigate potential harm to users. While following the principles of respect for persons, beneficence, and justice laid out in the Belmont Report is always a good place to start, some specific approaches could include:

Providing users more transparency and access to the algorithm (e.g. being able to turn off/on recommendations for certain items)
Maintaining manual oversight of sensitive topics where there is potential for harm
Allowing users to flag and provide feedback when they encounter a detrimental recommendation

As users become more savvy and aware of the ways in which recommender systems influence their interactions online, it is likely that the demand for ethical platforms will only rise. Companies would be wise to take measures to get out ahead of these ethical concerns – for both their and their users’ sakes.

October 17, 2018

Mapping the Organic Organization

Mapping the Organic Organization
Thoughts on the development of a Network Graph for your company

Creating the perfect organization for a business is hard. It has to maintain alignment of goals and vision, enable the movement of innovation and ideas, develop the products and services and deliver them, apply the processes and controls needed to remain disciplined, safe and sustainable, and many things besides. But getting it right means a significant competitive advantage.

Solid Organization

There are myriad organization designs but even with all the efforts and thought in the word, the brutal truth is that the perfect organizational structure of your company today will be imperfect tomorrow. But what if we could track how our organization was developing and evolving over time? Would that be useful in identifying the most effective organization structure, i.e. the one that, informally, is in place already? Furthermore, could tracing the evolution of that organization predict the structure of the future?

Monitoring the way information flows in a company can be tricky as it takes many formats through many systems, but communication flows are becoming easier to map. Large social media platforms such as Facebook or LinkedIn have developed functions to visualize your social network or business contact network. Twitter maps of key influencers and their networks also make interesting viewing. And increasingly, corporations are developing such tools to understand their inner workings.

Network graph example

A corporation’s ‘social network’ can be visualised using the metadata captured from communication between colleagues – email, IM, VOIP call, meetings – and maps them on a network graph. Each individual engaged in work with the company constitutes a node, and the edges, communications (the weight of the edges relating to the frequency of communications).

The insights could be fascinating and important. The communities within the graph can be identified – perhaps these line up with an organization chart, but more likely, not. It would identify key influencers or knowledge centers that may not easily be recognised outside a specific work group, aiding with risk management and key succession planning with many nodes connecting to that person. It could identify the ‘alpha index’, the degree of connectivity within a department providing some understanding of how controlled or independent workgroups are. Perhaps key influencers, those who have a wide and disparate connection profile, could help in driving cultural changes or adoption of new concepts and values.

You could even match the profiles of the people in the team with Belbin’s team roles to see how balanced it is! But maybe that’s getting a little carried away

And I think that’s the risk. What is being collected and what are you using it for?

Data is collected on communications within a corporation. Many companies have Computer User Guidelines that are reviewed and agreed to upon joining a company. The use of computers and the networks owned by an employer comes with some responsibilities and an agreement about this corporate assets’ intended use – and how the company will use the metadata generated. Clearly it is a matter of importance for the protection of company technology and IP to understand when large files are transferred to third parties. And when illicit websites are being accessed from within a company network. But how far can employers go when using this data for other insights into their employees behaviour?

Dilbert privacy

Before embarking on this data processing activity, a company should develop an understanding of its employees expectation of privacy. There is a clear contrast here between an evil, all-seeing big-brother style corporation spying on staff, and employees that are compensated and paid to fulfill specific job functions for that company and who must accept a level of oversight. An upfront and transparent engagement will keep the employee population from being concerned and reduce distrust.

Guidelines for the collection, processing and dissemination of this data can be developed in conjunction with employee representatives using the multi-dimensional analytic developed by Deirdre Mulligan, Colin Koopman and Nick Doty which systematically considers privacy in five dimensions.

The theory dimension helps in identifying the object of the data, specifying the justification for collecting it, and the archetypal threats involved.
The protection dimension helps to develop the boundaries of data collection that the employee population are comfortable with and that the company desire. It will also help the company to understand the value of the data it is generating and understand the risks to its function should information about critical personal be made available outside, for example, to competitors or head hunters.
The dimension of harm can help understand the concerns employees have in the use of the collected data, and a company should be open about what the expected uses are and what they are not – results would not be used as input for any employee assessments, for example.
An agreement of what ‘independent’ entities to oversee the guidelines are carried out can make up the provision dimension.
And, finally, the provision of scope documents concerns about limits to the data managed and access to it. This should include storage length for example. An employee engaging in business on behalf of a company should have limited recourse to be concerned that the company monitors email, but specifications of “reasonable use” of company email for personal business – corresponding with your bank for example – muddies the waters. It will be key, for example, to agree that the content of communications would not be monitored.

In order to maintain motivation, trust and empowerment, it is key to be open about such things. There is an argument that providing this openness may impact the way the organization communicates: Perhaps people become more thoughtful or strategic in communications; perhaps verbal communications start to occur electronically or vice versa as people become conscious of the intrusion. Much like arguments on informed consent, however, I believe the suspicion and demotivation generated if employees’ privacy is dismissed will far outweigh the benefits gained from organizational graphing.

Works Cited

Mulligan, D; Koopman, C; Doty, N. “Privacy is an essentially contested concept: a multi-dimensional analytic for mapping privacy”, 2016

Rothstein, M.A; Shoben, A.B; “Does consent bias research?”, 2013

Ducruet, C. , Rodrigue, J. “Graph Theory: Measures and Indices”, Retrieved September 29, 2018 from https://transportgeography.org/?page_id=5981

Nodus Labs, “Learning to Read and Interpret Network Graph Data Visualizations”, Retrieved September 29, 2018 from https://noduslabs.com/cases/learn-read-interpret-network-graphs-data-visualization/

October 17, 2018

Content Integrity and the Dubious Ethics of Censorship

Content Integrity and the Dubious Ethics of Censorship
By Aidan Feay | September 30th, 2018

In the wake of Alex Jones’ exile from every social media platform, questions about censorship and content integrity are swirling around the web. Platforms like Jones’ Infowars propagate misinformation under the guise of genuine journalism while serving a distinctly more sinister agenda. Aided by the rise of native advertising and the blurring of lines between sponsored or otherwise nefariously motivated content and traditional editorial media, the general populace finds it increasingly difficult to distinguish between the two. Consumers are left in a space where truth is nebulous and the ethics of content production are constantly questioned. Platforms are forced to evaluate the ethics of censorship and balance profitability with public service.

At the heart of this crisis is the concept of fake news, which we can define as misinformation that imitates the form but not the content of editorial media. Whether it’s used to generate ad revenue or sway entire populations of consumers, fake news has found marked success on social media. The former is arguably less toxic but no less deceitful. As John Oliver so aptly put it in his 2014 piece on native advertising, publications are “like a camouflage manufacturer saying ‘only an idiot could not tell the difference between that man [gesturing to a camouflage advertisement] and foliage’.” There’s a generally accepted suggestion of integrity in all content that is allowed publication or propagation on a platform.

Image via Last Week Tonight with John Oliver

Despite the assumption that the digital age has advanced our access to information and thereby made us smarter, it has simultaneously improved the spread of misinformation. The barriers of entry to mainstream consumption are lower and the isolation of like-minded communities has created ideological echo chambers that feed confirmation bias in order to widen the political spectrum and reinforce extremist beliefs. According to the Pew Research Center, this gap has more than doubled over the past twenty years, making outlandish claims all the more palatable to general media consumers.

Democrats and Republicans are More Ideologically Divided than in the Past
Image via People Press.org

Platforms are stuck weighing attempts to bridge the gap and open up echo chambers with cries against censorship. On the pro-censorship side, arguments are made in favor of safety for the general public. Take the Comet Ping Pong scandal, for example, wherein absurd claims based on the John Podesta emails found fuel on 4chan and gained traction within far-right blogs, which propagated the allegations of a pedophilia ring as fact. These articles found purchase on Twitter and Reddit and ultimately led to an assailant armed with an assault rifle firing shots within the restaurant in an attempt to rescue the supposed victims. What started as a fringe theory found purchase and led to real-world violence.

The increasing pressure on platforms to prevent this sort of exacerbation has led media actors to partner with platforms in order to arrive at a solution. One such effort is the Journalism Trust Initiative, a global effort to create accountability standards for media orgations and develop a whitelisted group of outlets which would be implemented by social media platforms as a low-lift means of censoring harmful content.

On the other hand, strong arguments have been made against censorship. In the Twitter scandal of Beatrix von Storch, evidence can be found of legal pressure from the German government to promote certain behaviors by the platform’s maintainers. Similarly, Courtney Radsch of the Committee to Protect Journalists points out that authoritarian regimes have been the most eager to acknowledge, propagate and validate the concept of fake news within their nations. Egypt, China, and Turkey have jailed more than half of imprisoned journalists worldwide, illustrating the dangers of censorship to a society that otherwise enjoys a free press.

Committee to Protect Journalists map of imprisoned journalists worldwide
Image via the Committee to Protect Journalists

How can social media platforms ethically engage with the concept of censorship? While censorship can prevent violence amongst the population, it can also reinforce governmental bias and suppress free speech. For so long, platforms like Facebook tried to hide behind their terms of service in order to avoid the debate entirely. During the Infowars debacle, the Head of News Feed at Facebook said that they don’t “take down false news” and that “being false […] doesn’t violate the community standards.” Shortly after, they contorted the language of their Community Standards due to public pressure and cited their anti-violence clause in Infowars ban.

It seems then that platforms are only beholden to popular opinion and the actions of their peers (Facebook only banned InfoWars after YouTube, Stitcher and Spotify did). Corporate profitability will favor censorship as an extension of consumers so long as those with purchasing power remain ethically conscious and exert their power by choosing which platforms to utilize and passively fund via advertising.