Implications of Advances in Machine Translation

By Cathy Deng | April 2, 2021

On March 16, graduate student Han Gao wrote a two-star review of a new Chinese translation of the Uruguayan novel La tregua. Posted on the popular Chinese website Douban, her comments were brief, yet biting – she claimed that the translator, Ye Han, was unfit for the task, and that the final product showed “obvious signs of machine translation.” Eleven days later, Gao apologized and retracted her review. This development went viral because the apology had not exactly been voluntary – friends of the affronted translator had considered the review to be libel and reported it to Gao’s university, where officials counseled her into apologizing to avoid risking her own career prospects as a future translator.

Gao’s privacy was hotly discussed: netizens felt that though she’d posted under her real name, Gao should have been free to express her opinion without offended parties tracking down an organization with power over her offline identity. The translator and her friends had already voiced their disagreement and hurt; open discussion alone should have been sufficient, especially when no harm occurred beyond a level of emotional distress that is ostensibly par for the course for anyone who exposes their work to criticism by publishing it.

Another opinion, however, was that spreading misinformation should carry consequences because by the time the defamed party could respond, the damage was often already done. Hence, the next question was: was Gao’s post libelous? Quality may be a matter of opinion, but the use of machine translation was a question of integrity. To this end, another Douban user extracted snippets from the original novel and compared Han’s 2020 translation to a 1990 rendition by another translator, as well as to corresponding outputs from DeepL, a website providing free neural machine translation. This analysis led to two main conclusions: that Han’s work was often similar in syntax and diction to the machine translation, more so than its predecessor; and that observers agreed the machine translation was, in some cases, superior to its human competition. The former may seem incriminating, but the latter makes it much less so: after all, if Han had seen the automated translation, wouldn’t she have made it better, not worse? Perhaps the similarities were caused merely by a lack of training (Han was not formally educated in literary translation).

Researchers have developed methods to detect machine translations, such as assessing the similarity between the text in question and its back-translation (e.g. translated from Chinese to Spanish, then back to Chinese). But is this a meaningful task for the field of literary translation? Machine learning has evolved to the point that models can generate or translate text that is nearly indistinguishable from, or sometimes even more enjoyable than, the “real thing.” The argument that customers always “deserve” fully manual work is outdated. And compared with detecting deep fakes, detecting machine translations does far less to combat misinformation.
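The back-translation check described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any published detector: the `translate` callable is a hypothetical stand-in for whatever machine translation system is being tested, and Python's `difflib.SequenceMatcher` is used as a simple similarity score in place of the features a real detector would learn.

```python
from difflib import SequenceMatcher
from typing import Callable

# Hypothetical interface for any MT system: translate(text, source_lang, target_lang) -> str
Translator = Callable[[str, str, str], str]


def back_translation_similarity(text: str, translate: Translator,
                                src: str = "zh", pivot: str = "es") -> float:
    """Round-trip `text` through a pivot language and measure how much survives.

    Assumed intuition: text that a machine already produced tends to come back
    from a round trip through an MT system with few changes, so a score
    near 1.0 is (weak) evidence of machine origin.
    """
    pivot_text = translate(text, src, pivot)        # e.g. Chinese -> Spanish
    round_trip = translate(pivot_text, pivot, src)  # Spanish -> back to Chinese
    # ratio() is a similarity score in [0, 1] based on matching subsequences.
    return SequenceMatcher(None, text, round_trip).ratio()


def looks_machine_translated(text: str, translate: Translator,
                             threshold: float = 0.9) -> bool:
    # 0.9 is an arbitrary illustrative cut-off, not a calibrated value.
    return back_translation_similarity(text, translate) >= threshold
```

With a real MT client plugged in as `translate`, the threshold would have to be tuned on texts whose origin is known, and a single similarity score like this one is easy to fool with light post-editing.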

Yet I believe assessing similarity to machine translation remains a worthwhile pursuit. It may never be appropriate as a measure of professional integrity, because the days when we could ascertain whether a translator relied on automated methods are likely behind us. Just as plagiarism detection tools are disproportionately harsh on international students, a machine detection tool for translation (currently only 75% accurate at best) may unfairly punish certain styles or decisions. Still, a low level of similarity may well be a fine indicator of quality if combined with other methods. If even professional literary translators flock to a small number of ever-advancing, state-of-the-art machine translation platforms, it is the labor-intensive act of delivering something different that reveals the talent and hard work of the translator. Historically, some of the best translators worked in pairs, with one providing a more literal interpretation that the other then enriched with artistic flair; perhaps algorithms could now play the former role, but the ability to produce meaningful literature in the latter may be the mark of a translator who has earned their pay. After all, a machine can be optimized for accuracy or popularity or controversy, but only a person can rejigger its outputs to reach instead for truth and beauty – the aspects about which Gao expressed disappointment in her review.

A final note on quality: the average star rating on Douban, as on other review sites, is meant to indicate quality. Yet angry netizens have flooded the works of Han and her friends with one-star reviews, a popular tactic that all but eliminates any relationship between quality and average rating.


How private are Zoom meetings?

by Gerardo Mejia | April 2, 2021

This topic caught my attention, especially since the pandemic began, because I see people using Zoom to replace in-person interaction more and more every day. Zoom is used throughout the day for many things, including work, education, and personal meetings. At first, I thought that privacy issues were mostly limited to personal meetings, but I later learned that there are privacy concerns in both education and the workplace.

Personal Meetings

My initial interest in the topic was due to my observations of people using Zoom for things like birthday parties, bridal showers, baby showers, and other non-traditional uses. I became interested in whether Zoom itself monitors or listens in on those calls. I was convinced that somewhere in their privacy policy there would be some type of loophole allowing them to listen in on calls for the purposes of troubleshooting or ensuring the service was working. I was a bit disappointed, but also relieved, when I read that meetings themselves are considered “Customer Content” and that the company did not monitor, sell or use customer content for any purpose other than to provide it to the customer.

However, there was a small, if not obvious, loophole. Zoom considers this “Customer Content” to be under the user’s control, including its security, and thus it cannot guarantee that unauthorized parties will not access this content. I later found out that this is a major loophole that has been exploited in many instances. Although Zoom doesn’t take responsibility for this, many people blame the company for not upgrading its security features. All this means that somebody would have to hack their way into my family’s private meeting in order to listen in. I believe that for most family gatherings the risk of this happening is not very high, so it is safe to say that most family Zoom meetings are private as long as they are not the target of a hacker.


Education

I had initially thought that the education field was not heavily affected by Zoom’s privacy or security issues. After all, most educators have trouble getting all their students to attend, and who would want to hack into a class? I was wrong about that too. The most notorious example occurred in China, where Zoom assisted the Chinese government in censoring content that it did not agree with. It is also important to note that, in addition to classes, schools hold other, more private meetings that put sensitive information such as grades or school records at risk. These could also become targets of malicious hackers. In conclusion, while censorship may not be a large issue in the United States, there are some countries where it is a real issue.


Workplace

I remembered that Zoom is on my company’s prohibited software list. I also learned that many tech companies have banned their employees from using Zoom for work. I initially thought that this was because Zoom’s privacy policy or terms of use allowed Zoom employees to listen in, making the meetings insecure since a third party could be listening. It turns out that Zoom’s privacy policy states that they will not listen in on or monitor meetings. However, as with personal and education meetings, it is up to the company to secure its meetings, and Zoom cannot guarantee that unauthorized users will not access the content. These security issues mean that Zoom cannot be held responsible if a company’s meeting is hacked and accessed by an unauthorized user. Companies are targeted by hackers all the time, so the risk of their Zoom meetings being hacked is significant, especially for high-profile companies.

Rise of Voice Assistants

by Lucas Lam | April 2, 2021

Many horror stories have surfaced as a result of the rise of Voice Assistants. From Alexa giving a couple some unwarranted advice to Alexa threatening someone with Chuck Norris, many creepy, perhaps crazy, stories have surfaced. Without questioning the validity of these stories or getting deep into conspiracy theories, we recognize that the rise of voice assistants like Amazon’s Echo and Google Home has led, and will continue to lead, to more privacy concerns. Yet, as it gets harder and harder to get away from them, we must understand what kind of data they are collecting, how we can take measures to protect our privacy, and how we can have peace of mind when using these products.

What are they collecting?
In the words of a “How-To Geek” article: “Alexa is always listening but not continually recording.” Voice Assistants are triggered by wake words. For Amazon’s device, the wake word is “Alexa”. When the blue ring light appears, the device captures audio input, sends it to the cloud to process the request, and receives a response. Anything said after a wake word is fair game for the assistant to record. Every command is stored and sent to the cloud for processing, and a response is sent back so the device can perform the task. Alexa’s Privacy Hub states that “all interactions with Alexa are encrypted in transit to Amazon’s cloud where they are securely stored,” presenting the round trip of recorded audio to the cloud and back as a secure process. Once a request is processed, the interaction is stored, though users can choose to delete the recordings.
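The “always listening but not continually recording” behavior can be pictured as a gate: audio is checked locally, piece by piece, and only what follows a detected wake word is kept and sent on. The sketch below is a hypothetical model, not Amazon’s implementation; treating each “frame” as a transcribed word, the `detect_wake_word` stand-in, and the fixed `request_len` are all illustrative assumptions.

```python
from typing import Iterable, List


def detect_wake_word(frame: str, wake_word: str = "alexa") -> bool:
    # Stand-in for an on-device keyword spotter; real devices score raw audio,
    # here each "frame" is simply one transcribed word.
    return frame.lower() == wake_word


def gated_capture(frames: Iterable[str], request_len: int = 3) -> List[List[str]]:
    """Return only the snippets that follow a wake word.

    Frames heard before a wake word are checked and then discarded
    ("always listening"); frames after it are buffered as a request
    ("recording"), up to `request_len` frames.
    """
    requests: List[List[str]] = []
    current: List[str] = []
    recording = False
    for frame in frames:
        if recording:
            current.append(frame)
            if len(current) == request_len:
                requests.append(current)  # this is what would go to the cloud
                current, recording = [], False
        elif detect_wake_word(frame):
            recording = True
    if current:  # flush a request cut short by the end of the stream
        requests.append(current)
    return requests
```

For example, `gated_capture("we talked about dinner alexa play some jazz then left".split())` keeps only `["play", "some", "jazz"]`; the conversation before the wake word never leaves the gate.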

When users don’t actively delete their recordings, that’s information these companies can harness to “understand” you better, give more targeted and customized responses, make more precise recommendations, etc. Though this can be considered creepy, the real threat comes not when the virtual assistant understands your preferences better, but when that understanding gets into the hands of other people.

Potential Threats
What are some of the real threats when it comes to virtual assistants?

Any accidental triggering of the wake word leads to unwelcome eavesdropping. Again, these voice assistants are always listening for their wake words, so a word that is mistaken for “Alexa” will inadvertently start a recording and return a response. That is why it is of utmost importance that companies optimize their algorithms to reduce false positives and increase the precision of wake word detection. One major threat is that these recordings can land in the hands of people working on the product, from the machine learning engineers to the transcribers who work with this kind of data to improve the device’s services. Though personally identifiable information should be encrypted, an article in Bloomberg revealed that transcribers potentially have access to first names, device serial numbers, and account numbers.
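The trade-off between false positives and precision can be made concrete. Suppose the detector gives each utterance a confidence score and fires when the score reaches a threshold: raising the threshold cuts accidental recordings but also misses more genuine requests. The scores and labels below are invented toy values, not measurements from any real device.

```python
from typing import List, Tuple

# (detector score, was the wake word actually spoken?) - invented toy data
SAMPLES: List[Tuple[float, bool]] = [
    (0.95, True), (0.91, True), (0.85, False), (0.80, True),
    (0.65, True), (0.62, False), (0.40, False), (0.30, False),
]


def evaluate(threshold: float, samples: List[Tuple[float, bool]] = SAMPLES) -> dict:
    tp = sum(1 for s, y in samples if s >= threshold and y)
    fp = sum(1 for s, y in samples if s >= threshold and not y)  # accidental recordings
    fn = sum(1 for s, y in samples if s < threshold and y)       # missed requests
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return {"threshold": threshold, "false_positives": fp,
            "precision": precision, "recall": recall}


for t in (0.5, 0.7, 0.9):
    print(evaluate(t))
```

In this toy data, raising the threshold from 0.5 to 0.9 eliminates both accidental recordings but also misses half of the genuine requests, which is why tuning a wake-word detector is a trade-off rather than a free improvement.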

Hacking is another possible threat. According to an article from Popular Mechanics, a German security consulting firm found that voice data can be hacked into through third-party apps. Hackers can attempt phishing by getting these virtual assistants to ask for a password or other sensitive information in order for a request to be processed. Active security measures must be put in place to prevent such activity.

What to do?
There are some possible threats, and their consequences can escalate. The odds of something like this happening to an average Joe are low, but even for those who fear the consequences, many things can be done to protect one’s data privacy, from setting up automatic voice deletion to going file by file to delete recordings. Careful use, and careful investigation into how you can protect your own privacy, can give you greater peace of mind every time you go home and talk to Alexa.
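Automatic deletion amounts to a retention window: anything older than a chosen age is removed. The sketch below models that rule on a hypothetical list of recording timestamps; it is not Amazon’s API, and the 90-day window is an arbitrary example value.

```python
from datetime import datetime, timedelta
from typing import List


def apply_retention(recordings: List[datetime], now: datetime,
                    keep_days: int = 90) -> List[datetime]:
    """Keep only recordings newer than `keep_days`; older ones are dropped."""
    cutoff = now - timedelta(days=keep_days)
    return [r for r in recordings if r >= cutoff]


# Hypothetical recording history, evaluated on the date of this post.
now = datetime(2021, 4, 2)
history = [datetime(2020, 12, 1), datetime(2021, 2, 15), datetime(2021, 4, 1)]
kept = apply_retention(history, now)
```

Here the cutoff falls on January 2, 2021, so the December recording is deleted and the two newer ones are kept. Deleting “file by file” is the manual version of the same filter.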


Invisible Data: Its Impact on Ethics, Privacy and Policy

By Anil Imander | April 5, 2021

A Tale of Two Data Domains

In the year 1600, Giordano Bruno, an Italian philosopher and mystic, was charged with heresy. He was paraded through the streets of Rome, tied to a stake, and then set afire. To ensure his silence in his last minutes, a metal spike was driven through his tongue. His crime – believing that earth is another planet revolving around the sun!

Just over a century later, in 1705, Queen Anne of England knighted Isaac Newton. One of Newton’s achievements was the very idea for which Giordano Bruno was burnt alive – proving that the earth is another planet revolving around the sun!

Isn’t this strange? The same set of data and interpretations, but completely different treatment of the two men.

What happened?

Several things changed during the 100 years between Bruno and Newton. The predictions of Copernicus, the data collected by Tycho Brahe and Kepler’s laws remained the same. Newton did come up with a better explanation of the observed phenomena using calculus, but the most important change was not in the data or its interpretation. The real change was invisible. Most importantly, Newton had political support from royalty, and Protestant Christianity was more receptive to ideas challenging the church and the Bible. Many noted scientists had used Newton’s laws to understand and explain the observed world, and many in the business world had found practical applications for them. Newton had suddenly become a rock star in the eyes of the world.

This historical incident, and thousands like it, highlight the fact that data has two distinct domains – Visible and Invisible.

The visible domain deals with the actual data collection, algorithms, model building and analysis. This is the focus of today’s data craze. The visible domain is the field of Big Data, Statistics, Advanced Analytics, Data Science, Data Visualization and Machine Learning.

The invisible domain is the human side of data. It is difficult to comprehend, not well defined, and subjective. We tend to believe that data has no emotions, belief systems, culture, biases or prejudices. But data in itself is completely useless unless we, human beings, can interpret and analyze it to make decisions. And unlike data, human beings are full of emotions, cultural limitations, biases and prejudices. This human side is a critical component of invisible data. This may come as a surprise to many readers, but the invisible side of data is sometimes more critical than the visible facts when it comes to making impactful decisions and policies.

The visible facts of data are a necessary condition for making effective decisions and policies, but they are not sufficient unless we consider the invisible side of data.

So, going back to the example of Bruno and Newton: in a way, the visible data had remained the same, but the invisible data had changed during the 100 years between them.

You may think that we have grown since the time of Newton – we have more data, more tools, more algorithms, advanced technologies and thousands of skilled professionals. But we are still not far from where we were; in fact, the situation is even more complicated than before.

There is a preponderance of data supporting the theory that humans are responsible for climate change, yet almost 50% of the people in the US do not believe it. Per capita health care spending in the US is roughly twice that of other developed nations, even though a significant percentage of people are uninsured or underinsured. Yet many politicians ignore the facts on the table and are totally against incorporating any ideas from other developed nations into their plans, whether joining the Paris climate accord or adopting a regulated health care system.

Why is the data itself not sufficient? There are many such examples in both business and social settings that clearly point out that along with visible facts, the invisible side of data is equally or in many cases more important than the hard facts.

Data Scientists, Data Engineers and Statisticians are well versed in visible data – raw & derived data, structures, algorithms, statistics, tools and visualization. But unless they are also well versed in the invisible side of data, they will be ineffective.

The invisible side of data is the field of behavioral scientists, social scientists, philosophers, politicians, and policy makers. Unless we bring them along for the ride, datasets alone will not be sufficient.

Four challenges of Invisible Data:

I believe that the invisible data domain has critical components that all data scientists and policymakers should be aware of. Typically, the invisible data domain is either ignored, marginalized or misunderstood. I have identified four focus areas of the invisible data domain. They are as follows:

  1. Human Evolutionary Limitations: Our biases, fallacies, illusions, beliefs etc.
  2. Brute Data Environments: Complex issues, cancer research, climate change
  3. Data Mirages: Black swans, statistical anomalies, data tricks etc.
  4. Technology Advancements: Free will, consciousness, data ownership

Human Evolutionary Limitations

Through the process of evolution we have learnt to avoid Type II errors (false negatives) more than Type I errors (false positives). If the thing to detect is a snake, a false negative is far costlier than a false positive: it is better to leave a rope alone thinking that it’s a snake than to pick up a snake thinking that it’s a rope. This is just one simple example of how the brain works and creates cognitive challenges. Our thinking is riddled with behavioral fallacies. I am going to use some of the work done by Nobel laureate Daniel Kahneman to discuss this topic. Kahneman shows that our brains are highly evolved to perform many tasks with great efficiency, but they are often ill-suited to accurately carry out tasks that require complex mental processing.
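The rope-and-snake example is a case of asymmetric error costs, and a short expected-cost calculation shows why it pays to react to even a small chance of danger. The cost values below are arbitrary illustrative numbers chosen for this sketch: mistaking a rope for a snake costs 1 (a moment of wasted caution), while mistaking a snake for a rope costs 100.

```python
COST_FALSE_ALARM = 1.0  # treating a rope as a snake: some wasted caution (assumed value)
COST_MISS = 100.0       # treating a snake as a rope: a bite (assumed value)


def should_avoid(p_snake: float) -> bool:
    """Avoid the object when touching it has the higher expected cost."""
    expected_cost_touch = p_snake * COST_MISS                # risk of a bite
    expected_cost_avoid = (1 - p_snake) * COST_FALSE_ALARM   # risk of a needless detour
    return expected_cost_touch > expected_cost_avoid


# Break-even probability, from p * C_miss = (1 - p) * C_false_alarm
break_even = COST_FALSE_ALARM / (COST_FALSE_ALARM + COST_MISS)
```

With these costs the break-even probability is 1/101, about 1%, so the cheapest policy steps around anything that is even slightly snake-like. It accepts many false alarms in exchange for avoiding rare but catastrophic misses.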

By exploiting these weaknesses in the way our brains process information, social media platforms, governments, media, and populist leaders are able to exercise a form of collective mind control over the masses.

Two Systems

Kahneman introduces two characters of our mind:

  • System 1: This operates automatically and immediately, with little or no effort and no sense of voluntary control.
  • System 2: This allocates attention to the effortful mental activities that demand it, such as complex computations.

These two systems co-exist in the human brain and together help us navigate life; they aren’t literal or physical, but conceptual. System 1 is an intuitive system that cannot be turned off; it helps us perform most of the cognitive tasks that everyday life requires, such as identifying threats, navigating our way home on familiar roads, recognizing friends, and so on. System 2 can help us analyze complex problems like proving a theorem or doing crossword puzzles. Engaging System 2 takes effort and energy. System 2 is also lazy and tends to take shortcuts at the behest of System 1.

This gives rise to many cognitive challenges and fallacies. Kahneman has identified several fallacies that impact our critical thinking and make data interpretation challenging. A subset is described below; I will be including more as part of my final project.

Cognitive Ease

Whatever is easier for System 2 is more likely to be believed. Ease arises from idea repetition, clear display, a primed idea, and even one’s own good mood. It turns out that even the repetition of a falsehood can lead people to accept it, despite knowing it’s untrue, since the concept becomes familiar and is cognitively easy to process.

Answering an Easier Question

Often when dealing with a complex or difficult issue, we transform the question into an easier one that we can answer. In other words, we use a heuristic; for example, when asked “How happy are you with your life?”, we instead answer the question “How is my married life?” or “How is my job?” While these heuristics can be useful, they often lead to incorrect conclusions.


Anchoring

Anchoring is a form of priming the mind with an expectation. An example is the following pair of questions: “Is the height of the tallest redwood more or less than x feet? What is your best guess about the height of the tallest redwood?” When x was 1200, the average answer to the second question was 844 feet; when x was 180, it was 282 feet.
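The redwood numbers can be condensed into a single ratio: the difference between the two groups’ estimates divided by the difference between the anchors they were shown (a measure Kahneman refers to as an anchoring index). The function below is a straightforward sketch of that arithmetic; its name and parameters are chosen for illustration.

```python
def anchoring_index(high_anchor: float, low_anchor: float,
                    high_estimate: float, low_estimate: float) -> float:
    """Share of the gap between anchors that carries through to the estimates.

    1.0 would mean people simply adopt the anchor; 0.0 would mean the
    anchor has no effect at all.
    """
    return (high_estimate - low_estimate) / (high_anchor - low_anchor)


# Redwood figures from the text: anchors of 1200 and 180 feet,
# average estimates of 844 and 282 feet.
index = anchoring_index(1200, 180, 844, 282)
```

The estimates moved by 562 feet when the anchors moved by 1,020 feet, so roughly 55% of the anchor’s shift carried through to people’s answers, even though the anchor carried no information about redwoods.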

Brute Data Environments

During the last solar eclipse, people in the USA travelled hundreds of miles to witness the phenomenon. Thanks to the predictions of scientists, we knew exactly what time and day to expect the eclipse. Even though we have no independent way to verify the calculations, we tend to trust scientists.

On the other hand, climate scientists have been predicting the likely consequences of our emissions of industrial gases. These forecasts are critically important, because the experts see grave risks to our civilization. And yet, half the population of the USA ignores or distrusts the scientists.

Why this dichotomy?

The reason is that, unlike an eclipse, the climate dystopia is not immediate; it cannot be predicted as precisely as an eclipse; it requires collective action on a global scale; and there is no financial motivation.

If environmentalists had predicted the February Texas snowstorm accurately and early enough to avoid its adverse impact, the majority of people in the world would probably have started believing in global warming. But global warming is not deterministic like an eclipse.

I call issues like global warming issues of a brute data environment. The problem is not deterministic like predicting an eclipse; it is probabilistic and therefore open to interpretation. Many problems fall into this category – world hunger, cancer research, income inequality and many more.

Data Mirages

Even though we have an abundance of data today, there are some inherent data problems that must not be ignored. I call them data mirages. These are statistical fallacies that can play tricks on our minds.

Survivorship Bias

Drawing conclusions from an incomplete set of data, because that data has ‘survived’ some selection criteria. When analyzing data, it’s important to ask yourself what data you don’t have. Sometimes, the full picture is obscured because the data you’ve got has survived a selection of some sort. For example, in WWII, a team was asked where the best place was to fit armour to a plane. The planes that came back from battle had bullet holes everywhere except the engine and cockpit. The team decided it was best to fit armour where there were no bullet holes, because planes shot in those places had not returned.
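The aircraft example can be reproduced with a small simulation. The assumptions are invented for illustration: each plane takes three hits spread uniformly across five sections, and a plane hit in the engine or cockpit does not return. Counting holes only on the planes that come back then shows no damage exactly where damage is most lethal.

```python
import random
from collections import Counter

SECTIONS = ["engine", "cockpit", "fuselage", "wings", "tail"]
LETHAL = {"engine", "cockpit"}  # assumption: a hit here downs the plane


def simulate(n_planes: int = 1000, hits_per_plane: int = 3, seed: int = 0):
    rng = random.Random(seed)
    all_hits, observed_hits = Counter(), Counter()
    returned = 0
    for _ in range(n_planes):
        hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]
        all_hits.update(hits)              # what actually happened
        if not LETHAL.intersection(hits):  # plane survives and is inspected
            returned += 1
            observed_hits.update(hits)     # what the analysts get to see
    return all_hits, observed_hits, returned


all_hits, observed_hits, returned = simulate()
```

In `all_hits` the engine and cockpit are struck about as often as any other section, but in `observed_hits` they show zero holes, because every plane hit there is missing from the sample.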

Cobra Effect

When an incentive produces the opposite of the intended result. Also known as a Perverse Incentive. Named after a historical legend, the Cobra Effect occurs when an incentive for solving a problem creates unintended negative consequences. It’s said that in the 1800s, the British Empire wanted to reduce cobra bite deaths in India. They offered a financial incentive for every cobra skin brought to them to motivate cobra hunting. But instead, people began farming cobras. When the government realized the incentive wasn’t working, it removed it, and the cobra farmers released their now-worthless snakes, increasing the population. When setting incentives or goals, make sure you’re not accidentally encouraging the wrong behaviour.

Sampling Bias

Drawing conclusions from a set of data that isn’t representative of the population you’re trying to understand. This is a classic problem in election polling, where the people taking part in a poll aren’t representative of the total population, either due to self-selection or bias from the analysts. One famous example occurred in 1948, when The Chicago Tribune mistakenly predicted, based on a phone survey, that Thomas E. Dewey would become the next US president. The pollsters hadn’t considered that only a certain demographic could afford telephones, excluding entire segments of the population from their survey. Make sure to consider whether your research participants are truly representative and not subject to sampling bias.
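A toy calculation shows how a telephone-only poll can flip the result. All numbers here are invented for illustration and are not 1948 data: suppose phone owners make up 40% of voters and favor Dewey 60 to 40, while everyone else favors his opponent by the same margin.

```python
# Invented illustrative groups: name -> (share of electorate, support for Dewey)
GROUPS = {
    "phone owners": (0.40, 0.60),
    "no phone": (0.60, 0.40),
}


def true_support(groups=GROUPS) -> float:
    # Population-weighted average across all groups.
    return sum(share * support for share, support in groups.values())


def phone_poll_support(groups=GROUPS) -> float:
    # A phone survey only reaches the "phone owners" group.
    return groups["phone owners"][1]
```

The phone poll reports 60% support for Dewey, while the electorate as a whole gives him 48%. Reweighting responses by each group’s true share is one standard correction, but it only works for groups the sample reaches at all.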

Technology Advancements

According to Yuval Noah Harari, a historian and one of the most widely read writers on the impact of Artificial Intelligence, there is a new equation that has thrown a monkey wrench into our belief system.

B * C * D = AHH

What he means is that advances in biotechnology (B), combined with advances in computer technology (C) and data (D), will provide the ability to hack human beings (AHH). Artificial intelligence is creating a new world for us where traditional human values and traits are becoming obsolete.

Technologies like CRISPR have already created a moral and ethical issue by providing the ability to create designer babies while technologies like Machine Learning have reignited the issue of bias by using “racist” data for training. The field of Artificial Intelligence is going to combine the two distinct domains of biology and computer technology into one.

This is going to create new challenges to the field of privacy, bias, joblessness, ethics and diversity while introducing unique issues like free will, consciousness, and the rise of machines. Some of the issues that we must consider and pay close attention to are as follows:

Transfer of authority to machines:

A couple of days ago I was sending an email from my Gmail account. As soon as I hit the send button, a message popped up: “Did you forget the attachment?” Indeed, I had forgotten to include the attachment, and Google had figured that out by interpreting my email text. It was scary, but I was also thankful to Google! Within the last decade or more, we have come to trust eHarmony to choose a partner, Google to conduct our searches, Netflix to pick a movie and Amazon to recommend a book. Self-driving cars are beginning to take over our driving, and AI systems are beginning to take on work that once required real doctors. We love to transfer authority and responsibility to machines. We trust algorithms to make decisions for us more than we trust our own abilities.
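A reminder like the one Gmail showed can be approximated with a very simple rule, which illustrates how little “understanding” such a feature might need. This is a hypothetical keyword heuristic for illustration only, not Google’s actual method; the cue list is an assumption.

```python
import re
from typing import List

# Illustrative cue phrases only (assumed); matched on word boundaries.
ATTACHMENT_CUES = [r"\battached\b", r"\battachment\b", r"\benclosed\b",
                   r"\bsee the file\b"]


def forgot_attachment(body: str, attachments: List[str]) -> bool:
    """Warn when the text mentions an attachment but none is present."""
    if attachments:
        return False
    text = body.lower()
    return any(re.search(cue, text) for cue in ATTACHMENT_CUES)
```

For example, `forgot_attachment("Please find the report attached.", [])` triggers a warning, while the same text with a file attached, or a message that never mentions one, does not.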

Joblessness and emergence of the “useless class”:

Ever since the Industrial Revolution we have dealt with the idea of machines pushing people out of the job market. In the Industrial Revolution, and to some extent in the Computer Revolution of the 1980s and 1990s, machines competed for manual or clerical skills. But with Artificial Intelligence, machines are also competing with the cognitive and decision-making skills of human beings.

According to Yuval Noah Harari, the Industrial Revolution created the proletariat, but the AI Revolution will create a “useless class.” Those who lost jobs in agriculture or handicraft during the Industrial Revolution could train themselves for industrial jobs, but the new AI Revolution is creating a class of people who will not only be unemployed but also unemployable!

Invisible Data: Impact on Ethics, Privacy and Policy

The abundance of data has created several challenges in terms of privacy, security, ethics, morals and policy-making. The mere collection of data makes it vulnerable to hacking, aggregation and de-anonymization. These are clear problems in the domain of visible data, but they become even more complicated when we bring invisible data into the mix. Following are a few suggestions that we must explore:

Data Ownership and Usage

After the agricultural revolution, land was a key asset and decisions about its ownership were critical in managing society. After the Industrial Revolution, the focus shifted from land to factories and machines. The entire twentieth century was riddled with the ownership issue of land, factories and machines. This gave rise to two sets of political systems – liberal democracy and capitalism on one side and communism and central ownership on the other side. Today the key asset is data and decisions about its ownership and use will enable us to set the right policies. We may experience the same turmoil we went through while dealing with the issue of democracy vs communism.

The individual or the community

On most moral issues, there are two competing perspectives. One emphasizes individual rights, personal liberty, and a deference to personal choice. Stemming from John Locke and other Enlightenment thinkers of the seventeenth century, this tradition recognizes that people will have different beliefs about what is good for their lives, and it argues that the state should give them a lot of liberty to make their own choices, as long as they do not harm others.

The contrasting perspective views justice and morality through the lens of what is best for society and perhaps even the species. Examples include vaccinations and wearing masks during a pandemic. The emphasis is on seeking the greatest amount of happiness in a society, even if that means trampling on the liberty of some individuals.

AI Benevolence

Today when we talk about AI, we are overwhelmed by two types of feelings. One is of awe, surprise, fascination and admiration and the other is of fear, dystopia and confusion. We tend to consider AI as both omnipotent and omniscient. These are the same adjectives we use for “God”. The AI concerns have some legitimate basis, but, as with “God”, we should also look to AI for benevolence. Long-term strategies must include an intense focus on using AI technology to enhance human welfare. Once we switch our focus from AI being a “big brother” to AI being a “friend”, our policies, education and advancement will take a different turn.

Cross Pollination of Disciplines

As we have already seen, invisible data spans many disciplines, from history and philosophy to society, politics, behavioral science, justice and more. New advancements in AI must include cross-pollination between humanists, social scientists, civil society, government and philosophers. Even our educational system must embrace cross-pollination of disciplines, ideas and domains.

Somatic vs Germline Editing

Who decides what is right – somatic vs germline editing to cure diseases?

Somatic gene therapies involve modifying a patient’s DNA to treat or cure a disease caused by a genetic mutation. In one clinical trial, for example, scientists take blood stem cells from a patient, use CRISPR techniques to correct the genetic mutation causing them to produce defective blood cells, then infuse the “corrected” cells back into the patient, where they produce healthy hemoglobin. The treatment changes the patient’s blood cells, but not his or her sperm or eggs.

Germline human genome editing, on the other hand, alters the genome of a human embryo at its earliest stages. This may affect every cell, which means it has an impact not only on the person who may result, but possibly on his or her descendants. There are, therefore, substantial restrictions on its use.

Treatment: What is normal?

BioTech advancements like CRISPR can treat several disabilities. But many of these so-called disabilities often build character, teach acceptance, and instill resilience. They may even be correlated with creativity.

In the case of Miles Davis, the pain of sickle cell drove him to drugs and drink. It may have even driven him to his death. It also, however, may have driven him to be the creative artist who could produce his signature blue compositions.

Vincent van Gogh had either schizophrenia or bipolar disorder. So did the mathematician John Nash. People with bipolar disorder include Ernest Hemingway, Mariah Carey, Francis Ford Coppola, and hundreds of other artists and creators.

Apple Vs Facebook: Who’s Right is Your Data?

by Dan Ortiz | March 12, 2021

Photo by Markus Spiske from Pexels

Apple and Facebook are squaring off in the public domain over user privacy. In iOS 14.5, across all devices, app tracking will move from opt-out to opt-in, and developers will be required to justify any request for third-party tracking (App Tracking Transparency, User Privacy, App Privacy). Much as we worry that an app may spy on us through our camera or sell our location data, this permission addresses the concern that an app is following us throughout our digital experience and logging our interactions with other apps. Apple’s goal is to better inform users about the information each app collects and to give them more control over their data. It is not to end user tracking or personalized advertising, but to increase transparency and obtain users’ consent first. People who prefer highly targeted ads can accept the tracking request. Those who find it creepy and swear Facebook is listening in on their conversations can deny it. Everyone gets what they want.

In response to the upcoming iOS updates, Facebook launched a very loud, very public campaign against the new policies, claiming they will financially damage small businesses by limiting the effectiveness of personalized advertisements. At the core of this disagreement is who owns the data. Facebook phrases it like this: “Apple’s policy could limit your ability to use your own data to show personalized ads to people who are likely to be interested in your business”. Clearly, Apple views control of personal data as the right of the individual user, while Facebook believes it controls that right.

Facebook argues that giving users the ability to say no to cross-application tracking will hurt small businesses’ ability to serve personalized ads to potential customers, increasing their marketing costs. Facebook has taken out full-page ads and launched a campaign (Speak Up For Small Business). Although iOS holds only 17% of the global market, it is used by roughly 33% of the US population, and the average income of an iOS user tends to be 40% higher than that of an Android user. iOS users are a significant market in the USA and control a significant share of its disposable income.

Photo by Anton from Pexels

However, Facebook’s argument, setting aside concerns about how it calculated the impact on small businesses, is disingenuous. Its campaign portrays this iOS update as the death of personalized ads and the death of the small business. In reality, small businesses can still target advertisements using all the data that has been uploaded to Facebook directly from our phones (first-party data). Small businesses can still use information about us, such as our hometown, our interests and our groups, all associated with our Facebook profile. What is changing is Facebook’s ability to track iOS users across multiple applications and browsers on the device itself. It is disingenuous to claim that user-generated data on applications not owned by Facebook is the property of another, unrelated small business.

The landscape of privacy in the digital age is shifting. Apple’s policy of championing individual choice when it comes to sharing personal data, although still a notice-and-consent model, is a step in the right direction. It informs users and asks for consent directly, rather than burying it in a long user agreement. This aligns with the GDPR requirements for lawful consent requests. The collection and misuse of user data is a growing concern and a topic of increasing debate. Landmark legislation like CalOPPA and the GDPR is increasingly redefining the privacy rights of the individual. Instead of embracing this changing landscape, Facebook chose to oppose Apple’s app-tracking feature rather than convince us, the users, why we should allow Facebook to track us across the internet.

This conflict has exposed the real question consumers will face when iOS is updated. When the tracking request pops up as the user launches the Facebook app, what will they do? Will they allow tracking and vindicate Facebook’s position, or will they deny the request, challenging Facebook’s current business model of tracking people all across the web?

Better yet, what decision will you make?

Google, the Insidious Steward of Memories

by Laura Treider | March 12, 2021

All those cookies are bad for you
If you are a regular reader of this blog or take an interest in the use of data in our modern economy, you are aware that most companies you interact with online or through apps on your phone track as much of your activity as possible. These companies frequently sell your data to brokers or, while not technically selling it, work with advertisers to monetize your data by serving you tailored ads. Proponents of this scheme argue it’s good for consumers and that everyone is happier seeing ads tailored to their interests. Privacy advocates are more skeptical and argue you should delete your data to avoid being trapped in a filter bubble and to prevent other uses of your data that might not be to your benefit. Choosing to err on the side of privacy, I decided to see what data Google has about me and how easy it is to delete it.

How to view and download your data
If you’re logged in to a Chrome browser, seeing what data Google has about you is fairly simple; the details of how to accomplish it are listed under “More reading” below. Google has a data and personalization landing page that lets you view your tracked activity and control “personalization settings,” Google’s euphemism for how much it tracks your activity. From this landing page, I clicked through to Google Takeout, their cheekily named platform for downloading all the information linked to your account. I chose to download everything possible. It took 10 hours for my download to be ready, and then I received an email with 23 links: 22 of them were 2 GB zipped files, and the last was a 6 GB .mbox mail file containing emails since 2006. (I’m not an email deleter.)

What data does Google have about me?
While going through the downloads, I became overwhelmed by the sheer number of folders relating to different products: 40 in all. After I unzipped the 22 files and put them together on my hard drive, I found that Google includes an archive_browser.html file that helps you navigate the file structure. Google isn’t just holding data about my web browsing activity and search history. I was surprised to learn that I had more than 25,000 photos uploaded to Google’s servers. These weren’t from my Android phone, either; they were from my camera. At some point in my data life, I must have chosen to upload 25,000 photos to the cloud, but I don’t remember doing that. There were also 48 videos, the entirety of the YouTube channel I had set up 15 years ago while living in England to share memories of my newborn son with family in the US.
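For anyone facing the same pile of archives, here is a minimal Python sketch of the “unzip and put them together” step. It assumes all the parts sit in one local folder and that no path appears in more than one part (if one did, the later archive would overwrite the earlier file). The function name `merge_takeout_archives` is hypothetical, not a Google tool.

```python
import zipfile
from pathlib import Path


def merge_takeout_archives(archive_dir: Path, output_dir: Path) -> list[str]:
    """Extract every .zip file in archive_dir into a single output_dir.

    Hypothetical helper, not a Google tool. Assumes the parts do not contain
    conflicting paths; if two parts share a path, the file from the archive
    that sorts later by name overwrites the earlier one.
    Returns the sorted list of extracted file paths, relative to output_dir.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    extracted: list[str] = []
    for archive in sorted(archive_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            # Record regular files only; directory entries end with "/".
            extracted.extend(n for n in zf.namelist() if not n.endswith("/"))
            zf.extractall(output_dir)
    return sorted(extracted)
```

Calling `merge_takeout_archives(Path("downloads"), Path("takeout_merged"))` would leave one combined folder tree in `takeout_merged` that can then be browsed like a single download.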

Interestingly, my Takeout did not include my Ad Settings. I had to navigate to those via the online “Data & personalization” hub, where I could see all the things Google thinks I’m interested in. Some of them made sense: “Dogs,” “TV Comedies,” “Computers and Electronics,” and “Cooking and Recipes.” Others were a little more perplexing: “Nagoya,” “Legal Education,” and “Parental Status: Not A Parent.” (Sorry, son, I must not Google how to parent you enough.) As an aside, a great icebreaker for virtual dates during this pandemic would be to share your ad personalization data with each other.

In addition to the files I had given Google and the ad settings Google was predicting for me, Google also has 96 MB of location data for me, 12 MB of search history spanning the last 18 months, and another 11 MB of browsing history spanning the past 5 months. Here’s where I got distracted: Google gives you your location data in the form of a JSON file. If you want to turn that JSON file into something viewable in map form and you’re programmatically inclined, I recommend this GitHub repo.
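For the programmatically inclined, here is a rough Python sketch of the first step such a tool performs: pulling coordinates out of the JSON and writing them as GeoJSON, which many mapping tools can display. It assumes the layout of older Takeout `Records.json` exports (a top-level `locations` list with coordinates stored as integers scaled by 10^7 in `latitudeE7` and `longitudeE7`); Google has changed this format over time, so check the field names in your own file. The function names are hypothetical and are not taken from the GitHub repo mentioned above.

```python
import json


def records_to_points(records_json: str) -> list[tuple[float, float]]:
    """Extract (latitude, longitude) pairs in decimal degrees.

    Hypothetical helper. Assumes the older Takeout "Records.json" layout:
    a top-level "locations" list whose entries store coordinates as integers
    scaled by 1e7 in "latitudeE7" and "longitudeE7". Entries missing either
    field are skipped.
    """
    data = json.loads(records_json)
    points = []
    for entry in data.get("locations", []):
        if "latitudeE7" in entry and "longitudeE7" in entry:
            points.append((entry["latitudeE7"] / 1e7, entry["longitudeE7"] / 1e7))
    return points


def to_geojson(points: list[tuple[float, float]]) -> str:
    """Serialize the points as a GeoJSON LineString (GeoJSON order is lon, lat)."""
    feature = {
        "type": "Feature",
        "geometry": {
            "type": "LineString",
            "coordinates": [[lon, lat] for lat, lon in points],
        },
        "properties": {},
    }
    return json.dumps(feature)
```

Note the coordinate swap in `to_geojson`: GeoJSON lists longitude first, a common source of maps that land in the wrong hemisphere.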

The Google location data rabbit hole
But Google’s online viewer for showing you your location history is engrossing. You land on a map with a bunch of markers representing places you have visited and lived at with some fun badges to encourage you to explore.

I looked at my data from Boise, the previous city I lived in, and randomly clicked dots. Each location came with a flood of memories about when I had gone there and why.

I went back in time to when I knew I had gone on a vacation to Germany. Clicking through the days was like looking at a scrapbook Google assembled for me. One of the days is shown below. I had traveled from our hotel in Munich to the Neuschwanstein and Hohenschwangau castles. My travel path was there, complete with the stop I had made in Trauchgau to air up a leaky tire. Not only was my route there, but I could also dig through the photos I took on my mobile phone at each stop. What a gift!

The verdict:
According to a recent US Census survey, only 89% of households have a computer. That’s potentially more than 30 million people who may have smartphones but no computer. So not all users of Google products have the luxury of using Takeout to move their data from cloud storage to home storage. These people will be less likely to delete their data because there’s nowhere else for it to live. By positioning itself as a simple and generous cloud storage provider for the average citizen, Google has trained us to let it be our data caretaker.
Armed with the knowledge of the extent of the data Google has about me, I was ready to decide whether to wipe the slate clean and remove my digital traces from their servers. Reader, I was too weak. I couldn’t do it. Whoever designed the interface for viewing your Google data online knows exactly what strings to pull to make my nostalgia kick in and want to save this treasure trove of memories. Google doesn’t just know everything about my digital life. It knows where I go and when. I am trussed up like a ham and served to advertising partners, and I go to the feast willingly.

More reading:
How to see your Google data:

Apple’s Privacy War: Is it Good or Bad? You decide.

by Darian Worley | March 5, 2021

In the ongoing battle to provide more tracking and tools to identify consumers’ buying habits, Apple has decided to take a different approach and limit the ability of many companies to track your data without your permission. In this multi-billion-dollar-a-year industry, Apple has indicated that it will release what it calls App Tracking Transparency (ATT) across iOS, iPadOS, and tvOS. This feature is expected to launch in early spring 2021 to combat the digital surveillance economy.

How do advertisers track me anyway?

People go about their lives every day without realizing just how much data internet giants have collected. When you use an app on your iPhone to check the weather, read a Facebook post or do anything else, advertisers use an identifier called the Identifier for Advertisers (IDFA) to track your behavior across multiple apps. The IDFA is a random identifier that Apple assigns to a user’s device. Companies use it to determine your location, the websites you visit and other pertinent information without obtaining access to your personal information. They use this information to sell marketing ads targeted at the individuals they are able to track, monetizing the data they collected based on your individual habits. Interestingly, Apple created the IDFA after being sued for sharing user information without limitation via the UDID (Unique Device Identifier).
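To make the mechanism concrete, here is a toy Python sketch of why a shared identifier is so valuable to advertisers: any events that carry the same IDFA can be joined into one cross-app profile. The app names, events and the `build_profiles` function are hypothetical. The one detail taken from iOS itself is that, when a user has not authorized tracking, the advertising identifier is reported as all zeros, which is what gives ATT’s opt-in its bite: every such event carries the same value and cannot be linked to a person.

```python
# iOS reports this all-zero value in place of a real IDFA when the user
# has not authorized tracking.
ZERO_IDFA = "00000000-0000-0000-0000-000000000000"


def build_profiles(events: list[tuple[str, str, str]]) -> dict[str, list[str]]:
    """Join (app, ad_id, activity) events from different apps by advertising ID.

    Hypothetical helper for illustration. Events carrying the all-zero ID are
    dropped: they cannot be told apart, so merging them would lump many users
    into one meaningless profile.
    """
    profiles: dict[str, list[str]] = {}
    for app, ad_id, activity in events:
        if ad_id == ZERO_IDFA:
            continue
        profiles.setdefault(ad_id, []).append(f"{app}: {activity}")
    return profiles
```

Two events from different apps that share an IDFA end up in the same profile; events reported with the all-zero ID are discarded rather than linked.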

Why do I need ATT if Apple already has App Privacy Labels in the App Store?
Currently, Apple has what it calls Privacy Nutrition Labels in the Apple App Store. These labels give iPhone users a snapshot of what data apps collect and how they use it. However, the labels are based on self-reporting by app developers. There’s no verification by Apple or any other source to determine whether an app is misusing your data. Users should be cautious when reviewing these labels, as they may not be able to trust what Apple and the privacy label say in the App Store. Many apps in the App Store say that they are not sharing your data, but they could be.

Aren’t There Privacy Frameworks and Privacy Laws to Protect Me?
Many users are concerned about their privacy. Privacy frameworks, laws and regulators such as the Belmont principles, CalOPPA, the CCPA and the FTC exist to protect an individual’s rights. While these frameworks and laws address many privacy areas, two core tenets are consumer choice and greater transparency. With the explosion of big data and online apps, many app developers and internet companies have skirted these laws and frameworks. In a limited, unscientific study where users were specifically asked to read a privacy policy for a specific company, users said the privacy policy was “too long” and that they “assume that the privacy policy has good intentions.”

Potential Negative Implications of Apple’s ATT

In the tech giant war, Facebook took out a full-page ad claiming that Apple’s ATT would harm businesses. In the article “New Apple Privacy Features Will Be Hard on Small Businesses: Curtailing the collection of user data may mean big spending for small developers,” the author does not share any data on how small companies will be impacted, other than stating that small businesses have smaller budgets than Facebook and Google and need to gather information to target their users. Further research did not yield any additional insight into how small firms would be hurt. The bigger story is that Facebook has taken a stand against this privacy policy because it stands to lose millions from its billion-dollar digital advertising revenue stream.

In summary
We’ve been told that to get better services from internet companies, we need to give up more of our data. While this may be true, consumers should have the right to choose. While one can’t be sure of Apple’s motives for limiting user tracking on the iPhone, the policy is already yielding tangible results: LinkedIn and Google have indicated that they will stop collecting IDFA data on iOS. This seems a welcome approach to the wild, wild west of collecting, using and monetizing people’s data without permission. Apple’s policy seems to strike the right balance by giving users the choice to determine how individual apps use their data. Ultimately, as a user of the iPhone, you get to decide.

Can a Privacy Badge drive consumer confidence on privacy policy?

by Dinesh Achuthan | March 5, 2021

As a user, I have always wondered what is in the privacy policy, or even the terms and conditions, that I blindly scroll through and accept. I talked with a few colleagues and friends, and I was not surprised to hear that they do the same. When a privacy policy or terms and conditions are shown, most of us scroll past and accept, because we know there is no option other than accepting if we want to use the application. Along the same lines, the visual display of sites’ and apps’ security was enhanced a couple of decades ago. We started to see trust badges, verified by third parties, that give a quick impression of an app’s or site’s security. A company called TRUSTe started the idea of badges based on privacy policies two decades ago, but it has since been acquired by another company, and the badge has shifted toward driving more e-commerce business rather than establishing the intended privacy-policy trust with end consumers.

My idea of a privacy badge grew out of security badges, payment partner badges and other badges meant to instill confidence and trust in end consumers. Why not provide a privacy badge, or even a terms-and-conditions badge, for any site or mobile app, either through a third-party service or via a self-assessment framework? Would this in any way help the end consumer? Could a company or industry use such a framework to assess itself and improve its privacy policy? As a user, would seeing a badge give me some sense of security, instead of scrolling through pages and pages of privacy documentation? After thinking it through and talking with a few of my colleagues, I started to work out how to create a privacy self-assessment framework through a methodical process and establish a scoring template that assigns a privacy badge to any privacy policy. If we had such a thing, what would it look like?

I would like to share my approach with a limited scope and validate whether it works before embarking on a larger scale. So I restricted myself to the US and set aside the EU’s GDPR, Germany’s BDSG and any Asian privacy laws. First, I needed to design a privacy assessment framework. What should go into it?

1. I want to capture what end consumers consider important for their privacy. How can I get this? I started by looking at privacy-related lawsuits from the past decade.
2. I want to capture how a company thinks about user privacy in light of its business model. I can get this for any company from its privacy policy.
3. Finally, I want a way to map consumer expectations to corporate practice through what is legally binding: US privacy laws.

To stitch these three together, I used the three leading academic privacy frameworks (Solove’s Taxonomy, Mulligan et al.’s Analytic, Nissenbaum’s Contextual Integrity). Below is the approach I used.

Assessment Framework Design and Validation Approach
1. Design privacy categories based on the 3 leading academic privacy frameworks (Privacy Assessment Framework).
2. List the US legal frameworks under consideration.
3. Analyze the top 5-10 privacy lawsuits and map each to the privacy categories it fits.
4. Design a Qualtrics privacy lawsuit questionnaire to get users’ perspectives on the lawsuit categories.
5. Design a Qualtrics privacy baseline questionnaire to get users’ perspectives on the top 5-10 good privacy policies and bottom 5-10 bad privacy policies.
6. Compute weights for each privacy category with inputs from the Qualtrics surveys. Establish a privacy score-to-badge matrix.
7. Compute privacy scores with the assessment framework by evaluating 3-5 random privacy policies from the industry. The higher the score, the better the privacy policy and the higher the badge.
8. Validate the resulting badges with leading privacy experts.
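Steps 6 and 7 can be sketched as a weighted average. This is a minimal Python illustration under stated assumptions, not the exact scoring template used here: survey respondents rate each category’s importance (say on a 1-5 scale), the mean ratings are normalized into weights that sum to 1, and each policy receives a score between 0 and 1 per category. The function names are hypothetical, and the category names borrow the four top-level groups of Solove’s taxonomy purely as placeholders.

```python
def category_weights(survey_ratings: dict[str, list[int]]) -> dict[str, float]:
    """Normalize mean importance ratings per category into weights summing to 1.

    Hypothetical helper; assumes every category has at least one rating.
    """
    means = {cat: sum(r) / len(r) for cat, r in survey_ratings.items()}
    total = sum(means.values())
    return {cat: m / total for cat, m in means.items()}


def privacy_score(category_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted 0-100 score; each category score is assumed to be in [0, 1].

    A category missing from category_scores counts as 0 (no protection found).
    """
    return 100 * sum(w * category_scores.get(cat, 0.0) for cat, w in weights.items())


# Hypothetical survey ratings (1-5), using Solove's four groups as placeholder categories.
weights = category_weights({
    "information collection": [5, 5],
    "information processing": [3, 3],
    "information dissemination": [4, 4],
    "invasion": [3, 3],
})
score = privacy_score({
    "information collection": 0.6,
    "information processing": 0.5,
    "information dissemination": 0.75,
    "invasion": 1.0,
}, weights)
```

With these made-up inputs, information collection gets a weight of 5/15 = 1/3, and the policy scores 100 × (0.2 + 0.1 + 0.2 + 0.2) = 70, which falls in the Gold band of the mapping below.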

Below is a sample view of the privacy score-to-badge mapping. I have omitted further templates and charts from this post to keep it simple.

Assessment Score, Privacy Badge
0-25, Copper
26-40, Bronze
41-60, Silver
61-80, Gold
81-100, Platinum
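The table translates directly into a lookup. In this Python sketch, a score of exactly 80 counts as Gold and anything above 80 as Platinum, and a fractional score that falls between two bands (say 25.5) is placed in the higher band; both are assumptions about how the bands are meant to be read. The names `badge_for`, `UPPER_BOUNDS` and `BADGES` are hypothetical.

```python
import bisect

# Hypothetical constants: inclusive upper bound of each band except the last,
# following the table above.
UPPER_BOUNDS = [25, 40, 60, 80]
BADGES = ["Copper", "Bronze", "Silver", "Gold", "Platinum"]


def badge_for(score: float) -> str:
    """Map a 0-100 assessment score to its privacy badge."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    # bisect_left counts the bounds strictly below score, which is the band index.
    return BADGES[bisect.bisect_left(UPPER_BOUNDS, score)]
```

Keeping the bounds in a list rather than a chain of `if` statements means the bands can be re-tuned after expert validation (step 8) without touching the logic.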

Sample view of privacy assessment scoring template

I believe this framework will help both consumers and companies. Companies can use it to start self-evaluating their privacy policies and at least get a basic understanding of their score. As a consumer, I can get an approximate handle on a privacy policy based on its score or badge.



US Privacy Lawsuits:
● New York Attorney General Letitia James announced her office reached a settlement with Dunkin’ Donuts over its handling of a 2015 data breach affecting approximately 20,000 customers. The settlement includes $650,000 in penalties, along with new requirements for data security.
● U.S. District Judge Charles Kocoras in Chicago denied IBM’s motion to dismiss a case over Illinois’ Biometric Information Privacy Act (BIPA) violations involving the use of facial images from Flickr, Reuters reports.
● Related to IBM, MediaPost reports that Amazon and Microsoft are seeking dismissal of their own Illinois BIPA cases over their use of the same images held by IBM.
● Facebook reached a $650 million settlement for violating Illinois’ BIPA by using facial recognition technology to tag photos, storing biometric data (digital scans of users’ faces) without notice or consent.
● The FTC and the New York Attorney General fined Google and YouTube $170 million for collecting children’s personal information (persistent identifiers), violating COPPA.
● (It is claimed that companies displaying the TRUSTe Certified Privacy seal have demonstrated that their privacy policies and practices meet the TRUSTe Enterprise Privacy & Data Governance Practices Assessment Criteria. It’s fair to say that TRUSTe is no longer the preeminent trustmark for website visitors. Many have never heard of the organization or know its history, and many other entities and regulations have stepped forward in the privacy and security space.)

DNA Databases: The Line Between Personal Privacy and Public Safety

by Brittney Van Hese | March 5, 2021

Recently, customers of popular ancestry companies such as GEDmatch learned that the DNA data they had submitted to learn about their families was secretly being searched by police to solve crimes. While law enforcement has touted its contribution to putting away some of the vilest criminals, like the Golden State Killer, as a win for society, the revelation that police were searching genealogy profiles without users’ knowledge has raised questions about the line between consumer privacy and public safety.


Genetic genealogy uses DNA shared through ancestral lineage to establish a family connection between a perpetrator’s sample and an uploader. An analyst then manually builds down a family tree from that connection using public records such as birth certificates, death records and marriage licenses. The family tree is used to generate a focused suspect pool, at which point investigative police work takes over to build the rest of the case needed for an arrest.
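The “build down” step is essentially a walk through a family tree. Here is a deliberately simplified Python sketch, assuming the public records have already been assembled into a parent-to-children mapping and a table of birth years; the names, data layout, helper functions and birth-year filter are all hypothetical, and real genealogists weigh far more evidence than this.

```python
from collections import deque
from collections.abc import Collection


def descendants(children: dict[str, list[str]], ancestor: str) -> list[str]:
    """List everyone below `ancestor`, generation by generation (breadth-first)."""
    found: list[str] = []
    queue = deque(children.get(ancestor, []))
    while queue:
        person = queue.popleft()
        found.append(person)
        queue.extend(children.get(person, []))
    return found


def suspect_pool(
    children: dict[str, list[str]],
    birth_years: dict[str, int],
    ancestor: str,
    min_year: int,
    max_year: int,
    exclude: Collection[str] = (),
) -> list[str]:
    """Hypothetical filter: descendants of the shared ancestor born in [min_year, max_year].

    People with no recorded birth year are left out here; in practice they
    would need further research rather than being dropped.
    """
    return [
        person
        for person in descendants(children, ancestor)
        if person not in exclude
        and person in birth_years
        and min_year <= birth_years[person] <= max_year
    ]
```

Every name the function returns is a relative of the uploader, which is exactly why the consent questions below are so thorny: the uploader’s data narrows the search to people who never uploaded anything.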


For the Golden State Killer, police obtained the family tree data by acting as a normal user uploading DNA to find a relative, without identifying themselves as police. This approach shone a light on a previously unconsidered legal and ethical concern: police access to consumer data for the public good. Until the Golden State Killer case, GEDmatch was not even aware police were using its services, users were unaware they were cooperating with police, and no regulation existed on the subject.

Now that the discussion has started, two sides of the argument have naturally emerged. Those in law enforcement who believe in the beneficence of their work see little harm in the practice: it is solving terrible crimes that would otherwise go cold. Additionally, non-profits like the DNA Doe Project access genealogy resources to identify John and Jane Doe victims, bringing closure to families.

On the other side of the argument, users voice concerns about consent, constitutionality and police misconduct. First, users uploading their profiles were not volunteering to be included in a suspect database, and they never consented to their data being searched. Second, these searches were conducted without warrants, which conflicts with recent Supreme Court precedent on obtaining information from public databases. Lastly, there are people, like Michael Usry, who were targeted as suspects because their profiles were closely related to the culprit’s family tree. This opens the door to police misconduct, such as biased efforts to confirm the genealogy results.

In response to the debate, DNA and genealogy companies have altered their privacy policies in an attempt to please both sides, creating an opt-in policy for users. By opting in, users agree to add their profiles to a database that is available to police. However, the glaring concern with this approach is that opting in does not actually affect the individual sharing the data, since they would most likely share it knowing they have not committed any crimes. The problem is that people who exercise their personal freedom to opt in and share their data with police are doing so on behalf of distant relatives who may have committed these crimes. This presents not only a moral dilemma about implicating others’ privacy but also ethical pressure in the name of public safety, making this a particularly difficult situation.

Luckily, there is a path forward through legislation. First and foremost, the process still relies on the due process of the criminal justice system: a judge must grant a warrant to search databases whose users have consented. Warrants can be requested for this purpose only if the case involves a violent personal crime, such as homicide or rape, and all other investigative resources have been exhausted. Most importantly, current technology still limits genealogy data to pointing investigators in a general direction, from which they must still rely on evidence-based crime solving to make an arrest. For now, federal regulation of genealogy data usage in crime fighting strikes a sufficient balance between privacy and policing, but this legislation will need to be closely monitored as genealogy technology advances.


Akpan, Nsikan. “Genetic Genealogy Can Help Solve Cold Cases. It Can Also Accuse the Wrong Person.” PBS, Public Broadcasting Service, 7 Nov. 2019.

“DNA Databases Are Boon to Police But Menace to Privacy, Critics Say.” The Pew Charitable Trusts.

“DNA Doe Project.” DNA Doe Project Cases, 5 Mar. 2021.

Payne, Kate. “Genealogy Websites Help To Solve Crimes, Raise Questions About Ethics.” NPR, 6 Mar. 2020.

Schuppe, Jon. “Police Were Cracking Cold Cases with a DNA Website. Then the Fine Print Changed.” NBCUniversal News Group, 29 Oct. 2019.

Zhang, Sarah. “The Messy Consequences of the Golden State Killer Case.” The Atlantic, Atlantic Media Company, 2 Oct. 2019.

When is faking it alright?

by Randy Moran | March 5, 2021

Photo by Markus Winkler on Unsplash

AI news has been littered with deepfake articles over the last couple of years. Some articles are about using deepfakes for fun (CNet), and some use them to demonstrate technical capability, like the recent Tom Cruise fakes (Piper). Others use them maliciously, to cause harm, sway opinions or rally opposition (“Malicious use of deep fakes is a threat to democracy everywhere”). All of this points to the fact that AI is just technology, a tool that can be used for good or bad purposes.

The recent announcement of WE-FORGE (Dartmouth College) takes faking in a whole different direction. WE-FORGE can generate fake but realistic documents, not for fun and not quite for malicious reasons, but to obfuscate actual content for the purposes of counterespionage. This AI approach can be used to hide corporate or national security documents within the noise of numerous fake documents. As the announcement points out, this kind of noise was used successfully in WWII to thwart efforts to discover upcoming military maneuvers in Sicily.
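The “hide it in the noise” idea can be illustrated with a deliberately simple Python toy: generate variants of a document by swapping sensitive terms for plausible alternatives, then shuffle the real one among them. This is not WE-FORGE’s AI approach, only the underlying idea; the function `make_decoys` and the substitution table are hypothetical, and real decoys would need to be far more convincing than a word swap.

```python
import itertools
import random


def make_decoys(real_doc: str, substitutions: dict[str, list[str]], n: int, seed: int = 0) -> list[str]:
    """Hypothetical toy: hide real_doc among n decoys built by swapping sensitive terms.

    Every combination of alternatives is generated, duplicates and the original
    text are discarded, n variants are sampled, and the real document is
    shuffled in among them. Terms are replaced in order, so alternatives should
    not themselves contain other sensitive terms.
    """
    terms = list(substitutions)
    variants: list[str] = []
    for combo in itertools.product(*(substitutions[t] for t in terms)):
        decoy = real_doc
        for term, alternative in zip(terms, combo):
            decoy = decoy.replace(term, alternative)
        if decoy != real_doc and decoy not in variants:
            variants.append(decoy)
    if n > len(variants):
        raise ValueError(f"only {len(variants)} distinct decoys can be built")
    rng = random.Random(seed)
    docs = rng.sample(variants, n) + [real_doc]
    rng.shuffle(docs)
    return docs
```

With two alternatives each for a place and a month, for example, at most four distinct decoys exist, so asking for five raises an error instead of silently repeating one.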

This announcement leads to thinking about its application to obfuscating (hiding) individuals’ activity from tracking. As we have reviewed in the w231 coursework, individuals have little to no control over their data in today’s web, social and application landscape. We have seen that privacy policies, for the most part, serve and protect a firm more than the individual. Truly protective legislation is years away, will likely still be guided by procedure over individual rights, and is likely to be fought hard by highly profitable tech firms. The procedures to control your own information are laborious and don’t provide all the controls one would want. The only choices are to live with it so you can use the service, or to stop using the service altogether, limiting one’s ability to connect and participate in the good aspects of the technology.

Helen Nissenbaum, whose privacy framework we studied in week 5 in “A Contextual Approach to Privacy Online” (Nissenbaum 32-48), co-authored with Finn Brunton a book called “Obfuscation: A User’s Guide for Privacy and Protest” (Brunton and Nissenbaum). In that book, they define obfuscation as “the deliberate addition of ambiguous, confusing, or misleading information to interfere with surveillance and data collection.” They outline numerous variations for obfuscating your identity: chaff, location spoofing, disinformation, etc. As they state in chapter three, “privacy is a multi-faceted concept, and a wide range of structures, mechanisms, rules, and practices are available to produce it and defend it.” There are legal mechanisms, technical solutions and application options. To these, the chapter adds the case for individuals to use obfuscation: producing noise that looks like normal activity so as to hide the actual activity. It gives individuals a way to camouflage their activity when other protections fail. It is similar to the WE-FORGE process above, but for individuals.

To put these ideas into practice, Nissenbaum partnered with Daniel Howe and Vincent Toubiana to develop a tool called “TrackMeNot” (Howe et al.). It is available as a browser plugin for Google Chrome and Mozilla Firefox, so that your search activity can be obfuscated. It generates random search queries through several search services (Google, Bing, Yahoo, etc.) to hide an individual’s actual search history. One could spend the time to do this manually, but the systematic approach is much more efficient.
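The core idea can be sketched in a few lines of Python: interleave each real query with several decoys drawn from a pool, so an observer sees a mixed stream. This is only an illustration under assumptions; TrackMeNot’s own query generation is more elaborate, the function `obfuscated_stream` and the decoy pool are hypothetical, and the approach only helps if the decoys look like plausible searches.

```python
import random
from typing import Optional


def obfuscated_stream(
    real_queries: list[str],
    decoy_pool: list[str],
    decoys_per_query: int = 3,
    seed: Optional[int] = None,
) -> list[str]:
    """Hypothetical helper: send each real query hidden in a shuffled batch of decoys.

    The output contains len(real_queries) * (decoys_per_query + 1) queries;
    batches stay in the original order, so the real queries keep their
    relative order. Requires decoys_per_query <= len(decoy_pool).
    """
    rng = random.Random(seed)
    stream: list[str] = []
    for query in real_queries:
        batch = rng.sample(decoy_pool, decoys_per_query) + [query]
        rng.shuffle(batch)
        stream.extend(batch)
    return stream
```

Batching per query keeps the sketch simple; a real tool would also spread decoys out over time, since a burst of searches around each genuine one could itself give the real queries away.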

While legislation may come in time, and individuals may eventually gain control of their data, they can hide their activity now. Not everyone will seek this out. It will likely be used only by those aware of the lengths organizations and companies have gone to in identifying and categorizing users. As the authors put it, “it’s a small revolution” for those interested in mitigating and defeating surveillance. To the average individual, it’s an effort they don’t care to spend. To the few aware individuals, it’s one small step toward regaining control of their own privacy. The browser plugins provide obfuscation for at least this one specific aspect of user activity.

Still, I can see additional applications being developed in the future, in social networking apps and other service areas, using AI to generate noise for the very AI mechanisms that applications use to capture and identify activity. Just as hackers use AI to infiltrate networks (F5), security software now uses AI to identify and counter those attacks (IBM). Most folks know AI/ML is being used to catalog and categorize individuals and their activity; the next obvious step is to use the technology to thwart that activity for those who are concerned. To some, including the companies capturing the data, polluting the data may seem wrong. Still, it is justified for individuals who care about their privacy, since laws have not caught up to stop the proverbial data peeping toms. The data collectors simply have to sift through more information, which is no different from what WE-FORGE is trying to accomplish with its counterespionage tactics.