Unstructured data is information that does not follow a predefined data model or schema, so it can be difficult for an enterprise to locate and digest. Examples include physicians’ notes in EHRs, emails, text files, photos, videos, call transcripts and recordings, and messages in business chat apps.
Unstructured data can make up upward of 80% of the data within a healthcare organization, so it’s important to know what exists and to avoid being caught off guard and accidentally sharing sensitive information.
Apoorv Agarwal is the cofounder and CEO of Text IQ, a vendor of technology that uses artificial intelligence and machine learning to work with sensitive unstructured data. He has five nightmares when it comes to what can be found in unstructured data. Healthcare IT News sat down with him to discuss these nightmares and what healthcare provider organization CIOs and CISOs need to know.
Q: You’ve told me about nightmares hiding in a healthcare organization’s unstructured data. One such nightmare, you said, involves personally identifiable information and personal health information. Failing to redact all the PII and PHI in files might leave an enterprise in grave danger. How does this happen, and what should organizations do to resolve this issue?
A: At times, people will share personal information, from credit card numbers and social security numbers to personal health information, when communicating with a company (as a customer) or within a company (as an employee). Oftentimes, they may not realize the significance of doing this, may not remember that they did it, or may not understand that this information is now part of the company’s unstructured data.
Obviously, this information is extremely sensitive, and the last thing companies want is for it to end up in the wrong hands. Using AI to automate the process of uncovering this information, categorizing it properly and associating it with the right people can prevent the nightmares that would occur if this information were exposed during a breach or an audit.
Organizations should have processes and technology in place to identify this sensitive information in whatever systems, messages and documents it exists.
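To make the idea of automated identification concrete, here is a minimal Python sketch of a pattern-based scan. It is a deliberately simple stand-in, not the AI approach described in the interview: it only looks for U.S.-style social security numbers and payment card numbers (checked with the Luhn checksum), and the function names `find_pii` and `redact` are hypothetical, chosen for illustration.

```python
import re

# Illustrative patterns only (assumptions): real PII/PHI detection needs far
# broader coverage, such as names, dates, record numbers and free-text diagnoses.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum used by payment cards."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched_text) pairs for candidate SSNs and card numbers."""
    hits = [("ssn", m.group()) for m in SSN_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            hits.append(("card", m.group()))
    return hits


def redact(text: str) -> str:
    """Replace every detected value with a labeled placeholder."""
    for kind, value in find_pii(text):
        text = text.replace(value, f"[REDACTED {kind.upper()}]")
    return text


print(redact("SSN 123-45-6789, card 4111 1111 1111 1111"))
# SSN [REDACTED SSN], card [REDACTED CARD]
```

A scan like this catches only well-formatted identifiers. Associating each hit with the right person, as described above, requires context that simple patterns cannot supply.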
Q: Another nightmare you mentioned is code words. What are these and why is it important to deal with them?
A: When committing fraud, a person is likely to disguise their activity using code words. For large enterprises, it’s very difficult to sort through thousands of files, emails and business chats to make this discovery. The last thing a company would want in the middle of litigation or an audit is to be surprised by the information found.
New AI and machine learning techniques are able to identify code words by understanding not only what’s in a document or message, but also the context around it, including the social graph of the people involved.
Words that aren’t normally used in a conversation or are repeatedly used only within a small group of participants can indicate code words. By using AI to automate the process, enterprises can save both time and money, while understanding their data and avoiding issues like potential fraud.
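The heuristic in that last point, terms that recur but only among a small group of participants, can be sketched in a few lines of Python. This is an assumption-laden illustration rather than Text IQ's method: it uses raw word counts and ignores the surrounding context and social graph mentioned above. The function name, thresholds and sample messages are all made up.

```python
import re
from collections import defaultdict


def candidate_code_words(messages, min_uses=3, max_users=2):
    """Flag terms used at least `min_uses` times but by no more than `max_users` senders.

    `messages` is an iterable of (sender, text) pairs. Both thresholds are
    arbitrary illustrative defaults, not tuned values.
    """
    uses = defaultdict(int)    # term -> total number of occurrences
    users = defaultdict(set)   # term -> set of senders who used it
    for sender, text in messages:
        for word in re.findall(r"[a-z']+", text.lower()):
            uses[word] += 1
            users[word].add(sender)
    return sorted(
        word for word in uses
        if uses[word] >= min_uses and len(users[word]) <= max_users
    )


# Made-up sample conversation for illustration.
messages = [
    ("ann", "Did the pineapple ship yet?"),
    ("raj", "Pineapple goes out Friday."),
    ("ann", "Good, keep the pineapple off the books."),
    ("lee", "Lunch on Friday?"),
    ("kim", "The quarterly report is ready for Friday."),
    ("lee", "Thanks, the report looks good."),
    ("raj", "I will read the report today."),
]

print(candidate_code_words(messages))  # ['pineapple']
```

A flagged term is only a candidate for human review. Plenty of innocent jargon is also shared by just two or three people.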
Q: The same person appearing under different names is another of your nightmares. Why is this such a powerful problem?
A: This is problematic because information can become scattered, be incorrectly attributed to a person who doesn’t exist, or get lost completely in the process. In the event of a data breach, for example, people must be notified that their data was made accessible.
Organizations risk added expense and reputational damage if they notify the same person multiple times because they weren’t able to identify that Bob, Robert, rburns and R. Burns are indeed the same person.
Q: You said sexual harassment can rear its ugly head in unstructured data. How? And what steps should healthcare executives take here?
A: Sexual harassment is never acceptable, but having it surface unexpectedly can create a new set of issues. During a company audit or litigation, or prior to a merger or acquisition, organizations risk a surprise like this if they are unaware of what is in their unstructured data.
As with sharing personally identifiable information, employees may be engaged in this activity via email or business chat apps. Even if it is not reported, there will still be a trail of evidence, which will remain hidden until it is identified and addressed.
Q: And finally, another nightmare you pointed to is unconscious bias. How does this manifest, and how should healthcare executives deal with this?
A: Unconscious bias can refer to comments made unknowingly that expose a bias based on gender, race or culture. Unconscious bias is easy to miss and much more pervasive in the workplace than blatant discrimination, and it can be blamed for lower wages, fewer opportunities for advancement and high turnover.
An example of this could be a manager leading a performance review who expresses an unconscious bias that impacts the performance and career path of another employee because of their race, gender or culture.
Unconscious bias can also manifest itself in casual conversations via email or business-chat apps such as Slack or Microsoft Teams. It is extremely important for business leaders to be able to identify this bias so that they can address it immediately with the individual.
If left undetected and unaddressed, unconscious bias hurts individuals as well as the business. Employees who feel like they have experienced negative bias are likely to withhold their best creative thinking, ideas and solutions from the organization, are less likely to refer others to their organization and will eventually leave the job for other opportunities.
And unconscious bias can lead to expensive lawsuits. In a lawsuit filed earlier this month, Amazon is alleged to hire people of color “at lower levels” and promote them less often than white coworkers with similar qualifications. And last month Google reached a deal with the Department of Labor, requiring it to pay roughly $2.6 million in back wages to thousands of workers over claims that pay and hiring practices illegally disadvantaged women and Asians.
Advanced AI and machine learning systems can now be deployed to objectively identify unconscious bias by comparing, for example, the phrases that a manager uses to review female employees with those used to review male employees.
Managers may not be aware of these differences until the machine is able to make objective assessments across a large data set, as it isn’t realistic to have humans review and categorize thousands of performance appraisals, meeting notes or messages.
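As a rough illustration of that kind of comparison, the Python sketch below scores how strongly each word leans toward one of two sets of reviews, using a smoothed log-ratio of relative frequencies. It is a simple word-frequency stand-in and not the system described in the interview. The function names, the `min_total` threshold and the sample reviews are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(reviews):
    """Count lowercase word occurrences across a list of review texts."""
    counts = Counter()
    for text in reviews:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def skewed_terms(group_a, group_b, min_total=2):
    """Return (term, log_ratio) pairs, most skewed first.

    A positive log-ratio means the term is relatively more common in
    `group_a`; a negative one means it is more common in `group_b`.
    Add-one smoothing keeps terms absent from one group finite.
    """
    a, b = term_counts(group_a), term_counts(group_b)
    vocab = set(a) | set(b)
    total_a = sum(a.values()) + len(vocab)
    total_b = sum(b.values()) + len(vocab)
    scored = []
    for term in vocab:
        if a[term] + b[term] < min_total:
            continue  # too rare to say anything about
        ratio = math.log(((a[term] + 1) / total_a) / ((b[term] + 1) / total_b))
        scored.append((term, ratio))
    return sorted(scored, key=lambda pair: (-abs(pair[1]), pair[0]))


# Made-up review snippets for illustration only.
reviews_of_women = ["She is abrasive in meetings", "Abrasive but capable"]
reviews_of_men = ["He is assertive in meetings", "Assertive and capable"]

for term, score in skewed_terms(reviews_of_women, reviews_of_men):
    print(f"{term:10s} {score:+.2f}")
```

On real data, a skewed term is a prompt for a closer look rather than proof of bias. Sample sizes, job roles and review templates all affect the counts.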