The Effectiveness of Privacy Regulations

Digital Data and Privacy Concerns

In a world of fast-paced digital change, new services are being built around data in ever more varied ways. At the same time, more and more people are becoming concerned about their privacy as more and more data about them is collected and analyzed. Are privacy regulations able to keep up with the pace of data collection and usage? Are the existing efforts put into privacy notices effective in communicating with users and forming meaningful agreements between services and their users?

An estimated 77% of websites now post a privacy policy. These policies differ greatly from site to site, and often address issues that are different from those that users care about. They are in most cases the users’ only source of information.

Policy Accessibility

According to a study conducted at the Georgia Institute of Technology on online privacy notice formats, of the 64 sites offering a privacy policy, 55 (86%) offered a link to it from the bottom of their homepage, three sites (5%) offered it as a link in a left-hand menu, and two (3%) offered it as a link at the top of the page. While there are regulations requiring that a privacy notice be given to users, there is no explicit regulation about how or where it should be communicated. From this study we can see that most sites tend not to emphasize the privacy notice before data access begins. Rather, with little incentive to make the notice accessible, it is pushed to the least-viewed section of the homepage, since most users navigate away from the homepage before reaching the bottom.

In my final project I conducted a survey designed to collect accessibility feedback directly from users who interact with new services regularly. The data supports the above observations from another perspective: the notices are designed with much lower priority than the other content presented to users, so only a very small percentage of users actually read them.

Policy Readability

Another gap in the privacy regulations is the readability of the notice. In the course of W231, one of the assignments we did was to read various privacy notices, and from the discussion we saw very different notice structures and approaches from site to site. A general pattern is the frequent use of strong, intimidating language with heavy legal backing, which makes the notice content hard for the general population to understand while still abiding by the regulations to a large extent.

Policy Content

Based on the examinations of different privacy notices during W231, it is clear that even among service providers who intend to abide by privacy regulations, there is often vague language and missing information. One commonly seen pattern is the use of phrases like ‘may or may not’ and ‘could’. Combined with the accessibility issue and users’ differing mental states at different stages of using a service, few users actually seek clarification from service providers before they effectively sign the privacy agreement. The lack of standards or controls over privacy policy content puts users at a disadvantage when they later encounter privacy-related issues, because the content has already been agreed to.

To summarize, the existing regulations on online privacy agreements are largely at the stage of getting from ‘zero’ to ‘one’, which is an important step as the digital data world evolves. However, considerable improvement is still needed to close the gap between existing policies and an ideal situation in which service providers are incentivized to make policy agreements accessible, readable, and reliable.

 

Privacy Policy Regulation

With rapid advances in data science, there is an ever-increasing need to better regulate data and people’s privacy. In the United States, existing guidelines such as the FTC guidelines and the California Online Privacy Protection Act are not sufficient to address all privacy concerns. We need to draw some inspiration from the regulations laid out in the European Union’s General Data Protection Regulation (GDPR). Some have criticized the GDPR for potentially impeding innovation. I don’t agree, for two reasons: 1) There is a lot of regulation in traditional industries in the US, and organizations still manage to innovate. Why should data-driven organizations be treated any differently? 2) The majority of data-driven organizations have been very good at innovating and coming up with new ideas. If they have to innovate with more tightly regulated data, I believe they will figure out how to do it.

From the standpoint of data and privacy, I believe we need more regulation in the following areas:

  • Data Security – We have seen a number of cases where user information is compromised and organizations have not been held accountable for the same. They get away with a fine which pales in comparison to the organizations’ finances.
  • Data Accessibility – Any data collected on a user should be made available to the user. The procedure to obtain the data should be simple and easy to execute.
  • Data Removal – Users should have the right to have any data about them removed on request.
  • Data Sharing – There should be greater regulation in how organizations share data with third parties. The organization sharing the data should be held accountable in case of any complications that arise.
  • A/B Testing – Today, there is no regulation on A/B testing. Users need to be educated about A/B testing, and there should be regulation of A/B testing with respect to content. Users must consent before content-related A/B tests are performed on them, and they should be compensated fairly for their input (see the sketch after this list). Today, organizations compensate users for completing a survey; why shouldn’t users be compensated for being part of an A/B testing experiment?
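As a rough sketch of what consent-gated experimentation could look like (the function and field names below are hypothetical, not any company’s actual implementation), a service would simply refuse to place non-consenting users into any experiment arm:

```python
import hashlib

# Minimal sketch of consent-gated A/B assignment (hypothetical names).
# Users who have not explicitly opted in are never placed in an experiment arm.

def assign_variant(user_id: str, experiment: str, has_consented: bool) -> str:
    """Return 'control', 'treatment', or 'excluded' for a user."""
    if not has_consented:
        return "excluded"  # no experimentation without opt-in consent
    # Deterministic 50/50 split based on a hash of the experiment and user IDs
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 else "control"

print(assign_variant("user-123", "new-feed-layout", has_consented=True))
print(assign_variant("user-456", "new-feed-layout", has_consented=False))
```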

The privacy policy of every organization should include a Privacy Policy Rubric. The rubric would indicate to users how well the organization complies with the policy regulations, and it could also be used to hold an organization accountable for any violation of those regulations.
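Purely as an illustration (the categories, field names, and scoring below are assumptions, not a proposed standard), such a rubric could be a small machine-readable record that users and regulators can check at a glance:

```python
# Hypothetical illustration of a machine-readable Privacy Policy Rubric;
# the categories and fields are assumptions, not a standard.
privacy_policy_rubric = {
    "organization": "ExampleCorp",
    "last_audited": "2017-11-01",
    "compliance": {
        "data_security":      {"compliant": True,  "notes": "encryption at rest and in transit"},
        "data_accessibility": {"compliant": True,  "notes": "self-service data export"},
        "data_removal":       {"compliant": False, "notes": "deletion requests take >90 days"},
        "data_sharing":       {"compliant": True,  "notes": "third parties listed publicly"},
        "ab_testing_consent": {"compliant": False, "notes": "no opt-in for content experiments"},
    },
}

# A simple summary a user or regulator could read at a glance
score = sum(v["compliant"] for v in privacy_policy_rubric["compliance"].values())
print(f"{score} of {len(privacy_policy_rubric['compliance'])} rubric areas compliant")
```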

 

Lastly, there need to be stricter fines for any breach of regulation. The GDPR sets a maximum penalty of 4% of total global revenue, with penalties befitting the nature of the violation. The top-level management of an organization needs to be held accountable by the organization’s board for failing to meet the regulations.

Artificial Intelligence: The Doctor is In

When I hear that AI will be replacing doctors in the near future, images of Westworld cybernetics come to mind, with robots toting stethoscopes instead of rifles. The debate over the role of AI in medicine is raging, and with good reason. To understand the perspectives, you just have to ask these questions:

• What will AI be used for in medicine?
• If for diagnosis, does AI have the capability of understanding physiology in order to make a diagnosis?
• Will AI ever harm the patient?

To the first point, AI can be a significant player in areas such as gauging adverse events and outcomes for clinical trials and processing genomic data or immunological patterns. Image recognition in pathology and radiology is a flourishing field for AI, and there have even been (gasp!) white papers proving so. The dangers start emerging when AI is used for new diagnoses or predictive analytics for treatment and patient outcomes. How a doctor navigates through the history and symptoms of a new patient to formulate a diagnosis is akin to the manner in which supervised learning occurs. We see a new patient, hear their history, do an exam, and come up with an idea of a diagnosis. While that is going on, we have already wired into our brains, let’s say, a convolutional neural network. That CNN has already been created by medical school/residency/fellowship training, with ongoing feature engineering every time we see a patient, read an article, or go to a medical conference. Wonderful. We have our own weights for each point found in the patient visit and voila! A differential diagnosis. Isn’t that how AI works?

Probably not. There is a gaping disconnect between the scenario described above and what actually goes on in a doctor’s mind. The problem is that machine learning can only learn from the data that is fed into it, likely through an electronic health record (EHR), a database also created by human users, with inherent bias. It lacks the connected medical knowledge and physiology that physicians have. If this is too abstract, consider this scenario: a new patient comes into your clinic with a referral for evaluation of chronic cough. Your clinic is located in the southwest US. Based on the patient’s history and symptoms, coupled with your knowledge of medicine, you diagnose her with a histoplasmosis infection. However, your CNN is based on EHR data from the northeast coast, which has almost no cases of histoplasmosis. Instead, the CNN diagnoses the patient with asthma, a prevalent issue across the US and a disease with a completely different treatment.
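To make that training-data gap concrete, here is a deliberately toy sketch (invented numbers, not a real diagnostic model) of why a model trained only on northeast records can never surface a diagnosis it has never seen:

```python
from collections import Counter

# Toy sketch: a "model" trained only on northeast-US records cannot output
# a label that is absent from its training data, whatever the patient's context.
northeast_training_labels = ["asthma"] * 950 + ["gerd"] * 50  # no histoplasmosis cases

def naive_diagnosis(symptoms: str) -> str:
    """Predict the most common training label, ignoring geography entirely."""
    return Counter(northeast_training_labels).most_common(1)[0][0]

print(naive_diagnosis("chronic cough, patient lives in the southwest US"))
# -> 'asthma', even when histoplasmosis would be the correct call
```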

AI could harm the patient. After all, we do not have the luxury of missing one case like when we screen emails for spam. Testing models and reengineering features will come with risks that everyone – the medical staff and the patient – must understand and accept. But before we jump to conclusions of Dr. Robot, we must have much more discussion on the ethics as we improve healthcare with AI.

Understanding the Basics of the GDPR

On May 25, 2018, enforcement of the General Data Protection Regulation (GDPR) will begin in the European Union.  The Regulation unifies data protections for all individuals within the European Union; however, in some cases, it also hinders the usage of such data.  By no means a comprehensive analysis, this post will help get you up to speed on the GDPR, how it impacts business, and what analysts can do to still get valid results from data.

Very Brief History

On January 25, 2012, the European Commission proposed a comprehensive reform of the 1995 data protection rules to “strengthen online privacy rights and boost Europe’s digital economy.”  It was estimated that implementing a single law could bypass “the current fragmentation and costly administrative burdens, leading to savings for businesses of around €2.3 billion a year.”  On April 14, 2016, the Regulation was officially adopted by the European Parliament, and it is scheduled to be put into force on May 25, 2018.  Now that we know how we got here, let’s answer some basic questions:

Why does Europe need these new rules?

In 1995, when the prior regulations were written, there were only 16 million Internet users in the world.  By June 2017, that number had increased to almost 4 billion users worldwide, and more than 433 million of the European Union’s 506 million inhabitants were online.  The increased use ushered in new technology, search capabilities, data collection practices, and legal complexity.  Individuals lacked control over their personal data, and businesses were required to develop complex plans to comply with the varying implementations of the 1995 rules throughout Europe.  The GDPR fixes these issues by applying the same law consistently throughout the European Union and will allow companies to interact with just one data protection authority.  The rules are simpler, clearer, and provide increased protections to citizens.

What do we even mean by “personal data?”

Simply put, personal data is any information relating to an identified or identifiable natural person.  According to The Regulation’s intent, it “can be anything from a name, a photo, an email address, bank details, your posts on social networking websites, your medical information, or your computer’s IP address.”

Isn’t there also something called “Sensitive personal data?”

Yes.  Sensitive personal data is “personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person’s sex life or sexual orientation.” Under the GDPR, the processing of this is prohibited, unless it meets an exception.

What are those exceptions?

Without getting into the weeds of the rule, the exceptions lay out cases where it is necessary and beneficial to take sensitive personal data into consideration.  These include legal proceedings, substantial public interests, medical purposes, protecting against cross-border threats, and scientific research.

With all this data being protected, can I still use Facebook?

Yes!  The new rules just change how data controllers collect and use your information.  Rather than users having to prove that the collection of information is unnecessary, businesses must prove that collecting and storing your data is necessary for the business.  Further, companies must implement “data protection by default,” meaning those pesky default settings that you have to change on Facebook to keep people from seeing your pictures will already be set to the most restrictive setting.  Finally, the GDPR includes a right to be forgotten, so you can make organizations remove your personal data if there is no legitimate reason for its continued possession.

How can data scientists continue to provide personalized results under these new rules?

This is a tricky question, but some other really smart people have been working on this problem and the results are promising!  By aggregating data and applying pseudonymization processes, data gurus have continued to achieve great results!  For a good jumping-off point on this topic, head over here!
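As a rough illustration of the idea (the data, salt handling, and helper names below are simplified assumptions, not a production pipeline), pseudonymization replaces direct identifiers with salted one-way hashes so that behavior can still be aggregated and analyzed without exposing who is who:

```python
import hashlib
import secrets

# Minimal sketch of pseudonymization plus aggregation (illustrative only;
# real GDPR-grade pipelines involve key management and re-identification review).
SALT = secrets.token_hex(16)  # kept separately from the data

def pseudonymize(user_id: str) -> str:
    """Replace a direct identifier with a salted, one-way hash."""
    return hashlib.sha256((SALT + user_id).encode()).hexdigest()

events = [
    {"user": "alice@example.com", "purchase": 20.0},
    {"user": "alice@example.com", "purchase": 35.0},
    {"user": "bob@example.com",   "purchase": 12.5},
]

# Pseudonymized, aggregated view: behavior can still be analyzed per pseudonym
totals = {}
for e in events:
    key = pseudonymize(e["user"])
    totals[key] = totals.get(key, 0.0) + e["purchase"]
print(totals)
```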

Online monitoring and “Do Not Track”

If you’ve ever noticed that online ads are targeted to your tastes and interests, or that websites remember your preferences from visit to visit, the reason is online tracking methods such as cookies.  In recent years, we as consumers have become increasingly aware of organizations’ ability to track our movements online.  These organizations are able to track movement across their own website as well as sister websites; for example, LinkedIn is able to track movement across LinkedIn, Lynda.com, SlideShare, etc.  Organizations track movement by using IP addresses, accounts, and much more to identify us and connect our online movements across sessions.  From there, advertisers and third parties are able to purchase information about our movements to target ads.

While online tracking is now ubiquitous in 2017, efforts to curtail an organization’s ability to monitor online movement started almost 10 years earlier. In 2007, several consumer advocacy groups petitioned the FTC for an online “Do Not Track” list for advertising.  Then, in 2009, researchers created a prototype add-on for Mozilla Firefox that implemented a “Do Not Track” header.  One year later, the FTC Chairman told the Senate Commerce Committee that the commission was exploring the idea of proposing an online “Do Not Track” list.  And toward the end of 2010, the FTC issued a privacy report that called for a “Do Not Track” system that would enable users to avoid the monitoring of their online actions.

As a result of the FTC’s announcement, most internet browsers provided a “Do Not Track” header similar to the one created in 2009.  These “Do Not Track” headers work by sending a signal alerting websites that the user does not want their movements to be tracked, which the website can then choose to honor or not. While most of these browsers made the “Do Not Track” header an opt-in option, Internet Explorer 10 originally enabled the “Do Not Track” option by default.  Microsoft faced blowback for the default setting from advertising companies, who thought that users should have to choose to utilize the “Do Not Track” header.  Eventually, in 2015, Microsoft changed the “Do Not Track” option to be opt-in and no longer a default.
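As a rough sketch of the mechanics (invented names, not any browser’s or site’s actual code), the header is just a key-value pair that a cooperating server can check before setting tracking cookies:

```python
# Illustrative sketch: the browser adds a "DNT: 1" request header, and a
# cooperating site checks for it before setting tracking cookies.
# Honoring the signal is entirely voluntary.

browser_request_headers = {"User-Agent": "ExampleBrowser/1.0", "DNT": "1"}

def handle_request(headers: dict) -> dict:
    """Return the cookies a cooperating server would set for this request."""
    cookies = {}
    if headers.get("DNT") != "1":
        cookies["tracking_id"] = "abc123"  # only track when no DNT signal is present
    return cookies

print(handle_request(browser_request_headers))         # {} – signal honored
print(handle_request({"User-Agent": "OtherBrowser"}))  # {'tracking_id': 'abc123'}
```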

Despite these browsers implementing “Do Not Track” solutions for users, there has been no legally agreed-upon standard for what organizations should do when they receive the signal. As a result, the majority of online organizations do not honor the “Do Not Track” signal.  This was further enabled in 2015, when the FCC dismissed a petition that would have required some of the larger online companies (e.g., Facebook, Google, Netflix) to honor the “Do Not Track” signals from consumers’ browsers.  In the response, the FCC stated that “the Commission has been unequivocal in declaring that it has no intent to regulate edge providers.”  The FCC’s response signaled that it has no intention of enforcing the signal, further enabling organizations to ignore it.  Due to the lack of enforcement, today the “Do Not Track” signal amounts to almost nothing for the majority of the web.  However, a few organizations have decided to honor the “Do Not Track” signal, including Pinterest and Reddit.

In general, while there is a “Do Not Track” option available in internet browsers, it does not do much at all.  Instead, to protect their online privacy and prevent tracking, users must consider other options, such as setting their browser to reject third-party cookies, leveraging browser extensions that limit tracking, etc.

Social Media’s Problem with the Truth

Based on your political views, your Facebook/Google/Twitter news feed probably looks quite a bit different from mine.  This is because of a process known as a “filter bubble”, in which news that comports with your world view is highlighted and news that conflicts with your world view is filtered out, painting a very lopsided picture in your news feed.  A filter bubble results from the confluence of two phenomena.  The first is known as an “echo chamber,” in which we seek out information that confirms what we already believe and disregard that which challenges those beliefs.  The second is social media recommender algorithms doing what recommender algorithms do: recommending content they think you’ll enjoy.  If you only read news of a certain political persuasion, eventually your favorite social media site will only recommend news of that persuasion.
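As a toy illustration of that feedback loop (invented data, not any platform’s actual ranking algorithm), a purely click-driven ranker quickly crowds out everything but the already-favored category:

```python
from collections import Counter

# Toy sketch of how a click-driven recommender reinforces a filter bubble.
click_history = ["left-leaning"] * 18 + ["right-leaning"] * 2

def recommend(candidates, history, k=3):
    """Rank candidate articles by how often the user clicked that category before."""
    weights = Counter(history)
    return sorted(candidates, key=lambda a: weights[a["category"]], reverse=True)[:k]

articles = [
    {"title": "Article A", "category": "left-leaning"},
    {"title": "Article B", "category": "right-leaning"},
    {"title": "Article C", "category": "left-leaning"},
    {"title": "Article D", "category": "left-leaning"},
]
print(recommend(articles, click_history))
# The feed fills with the already-favored category, crowding out the rest.
```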

Unfortunately, this has resulted in a breeding ground for fake news.  Unscrupulous content providers don’t care whether or not you know the information they peddle is false, so long as you click on it and share it with your social network (many members of which probably share your political views and will keep the fake news article propagating through the network).  The only barrier between fake news mongers and your news feed is the filter bubble you’ve created.  That same barrier, however, becomes an express lane simply by capitalizing on key words and phrases that you’ve already expressed an interest in.

Social media sites have done little to combat this barrage of fake news.  Their position on the matter is that it’s up to the user to decide what is fake and what is real.  In fact, Twitter relieves itself of any obligation to discern fact from fiction in its Terms of Service, stating that you “use or rely upon” anything you read there at your own risk.  Placing the onus of fact-checking on the users has led to real consequences such as “Pizzagate,” an incident in which a man, acting in response to fake news he had read on Facebook, fired an assault rifle in a pizzeria he believed was being used as a front by Hillary Clinton’s campaign chairman, John Podesta, to traffic child sex slaves.

Clearly, placing the burden of verifying news on the users’ shoulders doesn’t work.  Many users suffer from information illiteracy — they aren’t equipped with the skills necessary to ascertain whether or not a news article has any grounding in reality.  They don’t know how to fact check a claim, or even question the expertise or motivation of someone going on the record as “an expert.”  And if the news article happens to align with their existing world view, many have little reason to question its authenticity.

Social media sites need to do more to combat fake news.  They’ve already been excoriated by Congressional committees over their part in the Russian meddling effort during the 2016 Presidential Election.  Facebook, Google, and Twitter have since pledged to find a solution to end fake news, and Twitter has suspended 45 accounts suspected of pushing pro-Russia propaganda into U.S. political discourse, but they are only addressing the issue now that they are facing scrutiny, and they are still dragging their feet about it.  Ultimately though, the incentive structures in place do little to encourage social media giants to change their ways.  Social media sites make the majority of their money through advertisements and sponsored content, so when a content provider offers large sums to ensure millions of people get their message, social media sites won’t ask questions until the fines for sponsoring misleading content offset any potential profit.

Data Privacy and GDPR

As the world’s most valuable resource, and as The Economist put it in May 2017, data is the new oil. Internet and technology giants have transformed the business world and social interaction as we know it. Companies such as Amazon, Google, Facebook and Uber have collected more data on consumers around the world in the past 5 years than was collected in the entire prior history of data collection.

With that said, companies don’t always collect data in the most “ethical” way; as we have seen in class, Google, for example, collected private data from Wi-Fi signals during its routine Google Street View drives. Uber gets access to your phone details, your contacts, and your schedule, among many other things on your phone, and my personal favorite: “third-party site or service you were using before interacting with our services.” (Uber Privacy)

Many other big and small tech companies capture more information by the second, and we as consumers have grown accustomed to scrolling, scrolling, scrolling some more, clicking “accept”, entering personal information, and starting to consume services. Through cognitive tricks and with the help of psychologists, companies nudge consumer behavior to their benefit; it is made increasingly easy to “opt in” and increasingly difficult to “opt out”, when the choice is offered at all. Most of us never pause to think about how we could continue using the services without sharing all this private information; we never pause to read what is being captured and tracked as we interact with the application or the platform; and we never pause to question the impact on our lives if this data gets leaked or the company gets hacked due to weak security policies or a lack of privacy safeguards implemented by the organization.

Fortunately, the EU has drafted a new set of data protection regulations built around the protection of the user’s privacy and information. The new General Data Protection Regulation (GDPR) will be enforced starting May 25, 2018. Companies in violation of these regulations will be subject to a “penalty of up to 4% of their annual global turnover or €20 Million (whichever is greater)”. Most of you reading this blog are thinking: great, but this is in Europe and we are in the U.S., so why should we care? How will it impact us?

The beauty of the GDPR is that “it applies to all companies processing the personal data of data subjects residing in the Union, regardless of the company’s location”; hence, all technology companies and internet giants will need to comply with these new regulations if they would like to continue operating in the EU.

It is worth noting that although data protection directives are not new to the EU, the GDPR introduces new regulations that are necessary to address the issues brought forth by the evolution and creativity of today’s technology companies when it comes to data protection and privacy. The two biggest changes are the global reach of the GDPR and the financial penalty mentioned above. Other changes include strengthened consent requirements and improved data subject rights (see more details here).

All said and done, though the GDPR is a step in the right direction for the protection of our data and privacy, there are still no clear and strict guidelines preventing companies from capturing and processing excessive data that is irrelevant to the user’s experience (if there is such a concept as “excessive data”). For example, my Uber hailing and riding experience is not linked in any shape or form to Uber capturing my browsing history on “how to sue Uber” or to me checking my wife’s ovulation cycle before using their application!

Hence, I believe regulations should also require a clear “opt-in” consent (an unchecked checkbox) before a platform may capture, monitor, or process data that is not relevant to the user’s experience or to the services the platform offers.

How important is data integrity to a consumer?

I want to explore a couple of ideas. Is data a consumer’s best friend or worst enemy? Or both? Can they tell the difference? Do they care?

Big data has become the buzz around Silicon Valley over the last few decades. Every company strives not only to be a data-driven company but also, in many cases, to become a “data company.” From the perspective of a data enthusiast this prospect is exciting and promises many opportunities. However, as we have seen in this class, with these opportunities comes risk. Throughout this class we have tried to make sense of the blurry line that governs what companies can collect, how they can collect it, and what their duty is to their consumers at the end of the day when it comes to data collection and data privacy.

Many times companies use data in less than ethical ways, but in the end it benefits the user. For example, let’s assume a hypothetical company scraped information off the web for every one of its users so that it could serve up personalized and relevant content. This benefits the users because the personalization makes the product more attractive to them, but at the same time it is a clear invasion of their privacy, because the company has access to this information. My base question is: do people care? My assumption is that they definitely care, as I am sure everyone in this class assumes. But would the conversation change if they don’t? Put another way: why do we have the rules that we have? Is it because we feel that this is what the vast majority of the population wants, or is it because we feel that they need to be protected?

We have a vast array of rules and regulations, but, according to an article by Aaron Smith published by Pew Research, about half of online Americans don’t even know what a privacy policy is. His article suggests that the majority of online consumers believe that if a company has a privacy policy, it is not allowed to share the data it collects. Given this research, and much more on the same subject, I think it is reasonable to suggest that our privacy laws are put into place to protect the population rather than to conform to what it wants, because for the most part people don’t even understand what we are protecting them from. Or, at the very least, we have not adequately explained to them what our laws say and how our laws restrict companies.

With that in mind, the next logical question is whether the consumer population actually cares. This is the topic of my final project. They may not understand what we are trying to protect them from today, but if they did understand, would it matter to them? Would they rather have a more compelling product, or would they rather have more control over how their information is being handled? A quick note: I have focused mainly on improving a product because I felt that the question of using data only for the company’s gain was not an interesting one. I am interested in what a consumer feels is more beneficial to them: a compelling product or control of their data.

Pew research article: http://www.pewresearch.org/fact-tank/2014/12/04/half-of-americans-dont-know-what-a-privacy-policy-is/

 

 

Sources of Bias in Machine Learning

Recently Perspective, an application created by Google to score “toxicity” in online comments, has come under fire for displaying high levels of gender and racial bias. Here we will attempt to view this perceived bias through a machine learning lens.

There are two types of bias exhibited in machine learning and it is useful to distinguish them. On one hand we have the contribution due to algorithmic bias. This occurs when the mathematical algorithm itself is too simple to account for all of the variance in the observed data. On the other hand we have training bias. This occurs when the data used as training input into the mathematical model is too limited to explain all of the variance in the world. The solution would seem simple: increase the complexity of the algorithm and the variety of the training data. However in the real world this is often difficult and sometimes impossible.

One of the decisions that goes into creating a mathematical model is known as the “bias-variance tradeoff,” in which a supervised machine learning model is selected such that it isn’t so specific that it only works in a limited number of cases, but isn’t so general that it ignores all the details. This tradeoff is straightforward to quantify and is very well understood for known algorithms. With Perspective, Google uses a type of supervised machine learning called a deep neural network, a machine learning algorithm specifically designed to solve complex problems. Interestingly, deep neural networks almost exclusively sit on the high-variance end of the bias-variance spectrum. That is to say, Perspective almost certainly has very low algorithmic bias. While it is possible that the model does have some unquantified algorithmic bias (for example, it may not be able to distinguish intentional deception), the instances of text used in the referenced articles are not an example of this.
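For reference, the tradeoff is usually quantified through the standard decomposition of expected squared prediction error (textbook form, not anything specific to Perspective), where $f$ is the true function, $\hat{f}$ the fitted model, and $\sigma^2$ the irreducible noise:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
$$

Highly flexible models such as deep neural networks typically shrink the bias term at the cost of the variance term, which is the sense in which the algorithmic bias of a model like Perspective is expected to be low.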

The conclusion, then, is that training bias accounts for almost all of the bias in this application. Training bias is much less well understood than its algorithmic counterpart. The data used to train Perspective comes from discussions between different Wikipedia editors about the content of page edits, a simple and widely available data set. However, the latent sources of bias in this training dataset are difficult to spot, ranging from local copyright law to the composition of the workforce in the software industry. Algorithms can be corrected so that these biases are not amplified, but these adjustments require a priori knowledge to identify the affected classes, and they still result in at least the same bias as the training data itself. The solution to this problem will involve awareness, working with both the variables in the data set and the outcome to be predicted.

Developing tests for bias among the predictor variables in the training set can, at a minimum, allow the consumer of the model to be informed of its limits. With Perspective, Google simply put forward a test environment that allows an individual to enter any English utterance and get a toxicity score. But the data used to build and test the model consisted of full sentences that were part of a larger thread of conversation, which is a bias in itself. If Perspective forced the user to submit an entire conversation and then select a specific response for a toxicity rating, the results might be more interpretable.
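One minimal way to probe for this kind of bias (the scorer below is a placeholder stand-in, not Perspective’s actual API) is to score template sentences that differ only in an identity term and compare the results:

```python
# Sketch of a simple bias probe: score otherwise-identical sentences that
# differ only in an identity term. `dummy_score` is a stand-in for a real model call.
identity_terms = ["women", "men", "muslims", "christians", "gay people", "straight people"]
template = "I am proud to stand with {}."

def dummy_score(text: str) -> float:
    """Placeholder scorer; replace with a real toxicity model."""
    return 0.1 + 0.05 * len(text) / 100

def probe(score_fn, terms, template):
    """Large gaps between scores for identical templates flag identity-term bias."""
    return {t: round(score_fn(template.format(t)), 3) for t in terms}

print(probe(dummy_score, identity_terms, template))
```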

Adjusting the outcome variables to incorporate additional parameters of “fairness” is one avenue being explored. Another solution is to throw out the predicted outcomes entirely and allow the algorithm to infer the underlying structure of the data. Asking Perspective to partition the Wikipedia data into a number of unlabeled categories may yield an implicit toxic/non-toxic split. This type of machine learning is much less well understood, but many experts believe it is the path forward towards a more generalizable intelligence.
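A minimal sketch of that unsupervised alternative, assuming scikit-learn is available and using invented placeholder comments, might look like:

```python
# Hedged sketch of the unsupervised approach: cluster raw comments and inspect
# whether a toxic/non-toxic split emerges without using any labels.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

comments = [
    "Thanks for fixing the citation, looks good now.",
    "This edit is garbage and so are you.",
    "Could you add a source for that claim?",
    "Nobody wants your worthless opinion here.",
]

X = TfidfVectorizer().fit_transform(comments)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(list(zip(labels, comments)))  # inspect whether the clusters track toxicity
```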

Overall, bias presents one of the most difficult obstacles to overcome for wider adoption of machine learning. An awareness and understanding of the sources of those biases is the first step toward correcting them.

Data Breaches

According to the United States Government, “A data breach is a security violation in which sensitive, protected or confidential data is copied, transmitted, viewed, stolen or used by an individual unauthorized to do so.”  The news has been filled with massive company data breaches involving customer and employee information.

Notification Laws: Every state in the U.S., with the exception of Alabama and South Dakota, has a data breach notification law in place.  The National Conference of State Legislatures has a link to all the different state laws so you can see what your state requires.  Keeping track of all these laws can be very confusing, even before considering the international laws that apply to multinational corporations.  Currently, there is no federal law that covers general personal information data breaches. Both the Data Security and Breach Notification Act of 2015 and the Personal Data Notification and Protection Act of 2017 have been introduced into the House of Representatives, but that is as far as they got.  For health information specifically, there are two rules at the federal level that cover notification of those affected: the Health Breach Notification Rule and the HIPAA Breach Notification Rule.

Data Ownership: Discussion stemming from these breaches has brought up the topic of data ownership. The personal information that companies hold in their databases has long been thought of as their property.  This concept has been changing and evolving as our personal data has proliferated into many databases, with increasingly more personal information being collected and generated.  Users of these websites and companies understand that organizations need their information to provide services, whether that’s a personalized shopping experience or hailing a ride.  This point of ownership cannot be highlighted enough.  The acquisition of personal information in a data breach is not just an attack on the company but an attack on all the users whose personal information was stolen and could be sold or used for illegal activities.

Timing: Customers of these companies want to know if their information has been compromised, so they can evaluate whether account or other identity fraud has occurred. There are several milestones in the data breach timeline.  One is when the data breach actually occurred.  This may not be known if the company does not have a digital trail and the infrastructure to discover when it happened.  It may be well before the next milestone: the company discovering the breach and assessing its extent.  The next milestone is the corrective action taken by the affected company or agency to ensure the data is now being protected.  Currently, only eight states have a firm deadline for notification, which is usually 30 to 90 days after discovery of the breach.

Encryption: California led the data breach notification law effort by passing, in 2002, a law requiring businesses and government agencies to notify California residents of data security breaches.  In the California law, there is an exception to notifying those affected if the personal information is encrypted. The law defines the term “encrypted” to mean “rendered unusable, unreadable, or indecipherable to an unauthorized person through a security technology or methodology generally accepted in the field of information security.”  These broad terms do not mandate a particular level of encryption but try to leave room for whatever the industry-standard level of encryption is at the time.  Perhaps, if a breach occurs, a government or third party could evaluate the company’s encryption to determine whether reporting is required.
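As a minimal sketch of the kind of protection the statute contemplates (using the third-party `cryptography` package; the record contents are invented), symmetric encryption renders the stored data unreadable to anyone without the key:

```python
# Minimal sketch of data "rendered unreadable" without the key.
# Requires: pip install cryptography
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # stored separately from the database
f = Fernet(key)

record = b"name=Jane Doe;ssn=123-45-6789"
token = f.encrypt(record)     # what an attacker would see in a breach
print(token)                  # unreadable ciphertext
print(f.decrypt(token))       # original record, recoverable only with the key
```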

The issue of data breaches is not going away. If government agencies and companies do not respond in a fashion that customers find acceptable, users will become wary of sharing this valuable personal information, and the insights that come with it will be lost.