Archive for December 18th, 2017

When I hear that AI will be replacing doctors in the near future, images of Westworld cybernetics come to mind, with robots toting stethoscopes instead of rifles. The debate over the role of AI in medicine is raging, and with good reason. To understand the perspectives, you just have to ask these questions:

• What will AI be used for in medicine?
• If used for diagnosis, is AI capable of understanding physiology well enough to make a diagnosis?
• Will AI ever harm the patient?

To the first point, AI can be a significant player in areas such as gauging adverse events and outcomes for clinical trials and processing genomic data or immunological patterns. Image recognition in pathology and radiology is a flourishing field for AI, and there have even been (gasp) white papers proving so. The dangers start emerging when AI is used for new diagnoses or for predictive analytics about treatment and patient outcomes. How a doctor navigates the history and symptoms of a new patient to formulate a diagnosis is akin to the way supervised learning works. We see a new patient, hear their history, do an exam, and come up with a working diagnosis. While that is going on, we already have wired into our brains, let's say, a convolutional neural network (CNN). That CNN was created by medical school/residency/fellowship training, with ongoing feature engineering every time we see a patient, read an article, or go to a medical conference. Wonderful. We have our own weights for each finding in the patient visit and voila! A differential diagnosis. Isn't that how AI works?
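The "weights for each finding" analogy can be sketched in a few lines. This is a toy illustration only: the findings, weights, and bias below are invented, and a real diagnostic model would be trained on far richer data than a handful of named symptoms.

```python
import math

# Hypothetical learned weights for one candidate diagnosis (say, asthma).
# Positive weights push toward the diagnosis, negative weights away from it.
weights = {
    "chronic_cough": 1.2,
    "wheezing": 2.0,
    "fever": -0.5,
    "night_sweats": -1.0,
}
bias = -1.5

def score(findings):
    """Weighted sum of the visit's findings, squashed through a sigmoid
    to yield a pseudo-probability for this one diagnosis."""
    z = bias + sum(weights[f] for f in findings if f in weights)
    return 1 / (1 + math.exp(-z))

# A patient presenting with cough and wheezing scores high for this diagnosis
print(round(score({"chronic_cough", "wheezing"}), 2))  # -> 0.85
```

Rank the scores across many such candidate diagnoses and you get something resembling a differential. The question the rest of the post asks is whether that resemblance holds up.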

Probably not. There is a gaping disconnect between the scenario described above and what actually goes on in a doctor's mind. The problem is that machine learning can only learn from the data fed into it, most likely through an electronic health record (EHR), a database also created by human users, with inherent bias. What is missing is the connection to the medical knowledge and physiology that physicians have and the CNN does not. If this is too abstract, consider this scenario: a new patient comes into your clinic with a referral for evaluation of chronic cough. Your clinic is located in the southwest US. Based on the patient's history and symptoms, coupled with your knowledge of medicine, you diagnose her with a histoplasmosis infection. However, your CNN is based on EHR data from the northeast coast, which has almost no cases of histoplasmosis. Instead, the CNN diagnoses the patient with asthma, a prevalent condition across the US and a disease with a completely different treatment.
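The geographic-bias problem can be reduced to a minimal sketch. The training records below are invented, and the "model" is just a most-common-diagnosis lookup rather than a real CNN, but the failure mode is the same: a model trained only on northeast data never sees histoplasmosis, so it cannot predict it.

```python
from collections import Counter

# Hypothetical northeast EHR: chronic cough there is overwhelmingly asthma,
# and histoplasmosis never appears in the training data at all.
northeast_ehr = [
    ("chronic_cough", "asthma"),
    ("chronic_cough", "asthma"),
    ("chronic_cough", "bronchitis"),
    ("chronic_cough", "asthma"),
]

def train(records):
    """For each symptom, remember the most common diagnosis seen in training."""
    by_symptom = {}
    for symptom, dx in records:
        by_symptom.setdefault(symptom, Counter())[dx] += 1
    return {s: c.most_common(1)[0][0] for s, c in by_symptom.items()}

model = train(northeast_ehr)

# A southwest patient whose true diagnosis is histoplasmosis still gets asthma:
print(model["chronic_cough"])  # -> asthma
```

No amount of tuning fixes this from the inside; the training data simply does not contain the answer, which is the point of the scenario above.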

AI could harm the patient. After all, we do not have the luxury of missing a case, the way we can when screening emails for spam. Testing models and reengineering features will come with risks that everyone, medical staff and patients alike, must understand and accept. But before we jump to conclusions about Dr. Robot, we need much more discussion of the ethics as we improve healthcare with AI.

On May 25, 2018, enforcement of the General Data Protection Regulation (GDPR) will begin in the European Union.  The Regulation unifies data protections for all individuals within the European Union; however, in some cases, it also hinders the usage of such data.  While by no means a comprehensive analysis, this post will help get you up to speed on the GDPR, how it impacts business, and what analysts can do to still get valid results from data.

Very Brief History

On January 25, 2012, the European Commission proposed a comprehensive reform of the 1995 data protection rules to “strengthen online privacy rights and boost Europe’s digital economy.”  It was estimated that implementing a single law could bypass “the current fragmentation and costly administrative burdens, leading to savings for businesses of around €2.3 billion a year.”  On April 14, 2016, the Regulation was officially adopted by the European Parliament, and it is scheduled to take effect on May 25, 2018.  Now that we know how we got here, let’s answer some basic questions:

Why does Europe need these new rules?

In 1995, when the prior regulations were written, there were only 16 million Internet users in the world.  By June 2017, that number had increased to almost 4 billion users worldwide, and more than 433 million of the European Union’s 506 million inhabitants were online.  The increased use ushered in increased technology, search capabilities, data collection practices, and legal complexity.  Individuals lacked control over their personal data, and businesses had to develop complex plans to comply with the varying implementations of the 1995 rules throughout Europe.  The GDPR fixes these issues by applying the same law consistently throughout the European Union and will allow companies to interact with just one data protection authority.  The rules are simpler, clearer, and provide increased protections to citizens.

What do we even mean by “personal data?”

Simply put, personal data is any information relating to an identified or identifiable natural person.  According to The Regulation’s intent, it “can be anything from a name, a photo, an email address, bank details, your posts on social networking websites, your medical information, or your computer’s IP address.”

Isn’t there also something called “Sensitive personal data?”

Yes.  Sensitive personal data is “personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person’s sex life or sexual orientation.” Under the GDPR, the processing of such data is prohibited, unless it meets an exception.

What are those exceptions?

Without getting into the weeds of the rule, the exceptions lay out cases where it is necessary and beneficial to process sensitive personal data.  These include legal proceedings, substantial public interests, medical purposes, protecting against cross-border threats, and scientific research.

With all this data being protected, can I still use Facebook?

Yes!  The new rules just change how data controllers collect and use your information.  Rather than users having to prove that the collection of information is unnecessary, businesses must prove that the collection and storage of your data are necessary.  Further, companies must provide “data protection by default,” meaning those pesky settings that you currently have to change on Facebook to keep people from seeing your pictures will already be set to the most restrictive option.  Finally, the GDPR includes a right to be forgotten, so you can make organizations remove your personal data if there is no legitimate reason for its continued possession.

How can data scientists continue to provide personalized results under these new rules?

This is a tricky question, but some really smart people have been working on this problem, and the results are promising!  By aggregating data and applying pseudonymization, analysts have continued to achieve great results!  For a good jumping-off point on this topic, head over here!
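The two techniques mentioned above can be sketched briefly. In this toy example, the records and the salt are invented, and a salted hash is only one simple pseudonymization approach; whether it suffices under the GDPR depends on how the salt is protected and on the re-identification risk in the data itself.

```python
import hashlib
from collections import defaultdict

SECRET_SALT = b"keep-me-out-of-the-dataset"  # hypothetical; must be stored separately from the data

def pseudonymize(user_id: str) -> str:
    """Replace a direct identifier with an opaque, salted-hash token.
    The same input always maps to the same token, preserving joins."""
    return hashlib.sha256(SECRET_SALT + user_id.encode()).hexdigest()[:12]

records = [
    {"user": "alice@example.com", "country": "DE", "purchases": 3},
    {"user": "bob@example.com",   "country": "DE", "purchases": 5},
    {"user": "carol@example.com", "country": "FR", "purchases": 2},
]

# Pseudonymization: analysis keeps its utility without raw identifiers
pseudo = [{**r, "user": pseudonymize(r["user"])} for r in records]

# Aggregation: only group-level statistics ever leave the analysis
totals = defaultdict(int)
for r in pseudo:
    totals[r["country"]] += r["purchases"]

print(dict(totals))  # -> {'DE': 8, 'FR': 2}
```

Because the same identifier always hashes to the same token, analysts can still link a user's records across tables and personalize results, while the raw email addresses never enter the analytical dataset.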