AI can increasingly link different datasets and match different types of information, with profound consequences. Data held separately and considered non-PII (data stripped of personal identifiers) can, with the application of AI, become personally identifiable information (PII). This occurs when machine learning algorithms correlate non-personal data with other data and match it to specific individuals, turning it into personal data. Through algorithmic correlation, AI will weaken the distinction between personal data and other data: non-personal data can increasingly be used to identify individuals or to infer sensitive information about them, beyond what was originally and knowingly disclosed (Reference: OECD (2019), Artificial Intelligence in Society, OECD Publishing, Paris).
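The linkage risk described above can be illustrated with a toy sketch. The records, field names, and the `link` helper below are fabricated for illustration: two datasets that are each non-PII on their own are joined on shared quasi-identifiers (ZIP code, birth date, sex), re-attaching a name to a "de-identified" record.

```python
# Toy illustration (fabricated records): linking two non-PII datasets
# on shared quasi-identifiers re-identifies an individual.

# "Anonymized" medical records: direct identifiers removed.
medical = [
    {"zip": "02138", "dob": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "dob": "1962-02-14", "sex": "M", "diagnosis": "asthma"},
]

# Public records that carry names alongside the same quasi-identifiers.
voter_roll = [
    {"name": "A. Example", "zip": "02138", "dob": "1945-07-31", "sex": "F"},
    {"name": "B. Sample", "zip": "02140", "dob": "1970-01-01", "sex": "M"},
]

def link(records_a, records_b, keys):
    """Join two datasets on the given quasi-identifier keys."""
    index = {tuple(r[k] for k in keys): r for r in records_b}
    matches = []
    for r in records_a:
        hit = index.get(tuple(r[k] for k in keys))
        if hit:
            matches.append({**hit, **r})  # combined record is now PII
    return matches

reidentified = link(medical, voter_roll, keys=("zip", "dob", "sex"))
for person in reidentified:
    print(person["name"], "->", person["diagnosis"])
```

A single exact match on three quasi-identifiers is enough here; at scale, ML correlation makes such matches possible across far noisier data.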
Personally identifiable information, as defined by the U.S. Department of Labor and the GDPR, is as follows:
Personally Identifiable Information (PII) – the U.S. Department of Labor definition states:
“Any representation of information that permits the identity of an individual to whom the information applies to be reasonably inferred by either direct or indirect means. Further, PII is defined as information: (i) that directly identifies an individual (e.g., name, address, social security number or other identifying number or code, telephone number, email address, etc.) or (ii) by which an agency intends to identify specific individuals in conjunction with other data elements, i.e., indirect identification. (These data elements may include a combination of gender, race, birth date, geographic indicator, and other descriptors). Additionally, information permitting the physical or online contacting of a specific individual is the same as personally identifiable information. This information can be maintained in either paper, electronic or other media.” (Reference: U.S. Department of Labor)
General Data Protection Regulation (GDPR)
GDPR strengthens individuals' rights to access information held about them and places limits on what organizations can do with personal data. GDPR's seven principles are: lawfulness, fairness and transparency; purpose limitation; data minimization; accuracy; storage limitation; integrity and confidentiality (security); and accountability.
“The General Data Protection Regulation (GDPR) is a legal framework that sets guidelines for the collection and processing of personal information from individuals who live in the European Union (EU). As further protection for consumers, the GDPR also calls for any personally identifiable information (PII) that sites collect to be either anonymized (rendered anonymous, as the term implies) or pseudonymized (with the consumer’s identity replaced with a pseudonym). The GDPR affects data beyond that collected from customers. Most notably, perhaps, the regulation applies to the human resources’ records of employees.” (Reference: Investopedia)
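The anonymization/pseudonymization distinction the GDPR draws can be sketched in a few lines. This is an assumption-laden toy (the key, field names, and helper functions are hypothetical, not from any regulation or standard library for GDPR compliance): pseudonymization replaces the identifier with a keyed token so records can still be linked internally by whoever holds the key, while anonymization drops the identifier irreversibly.

```python
# Sketch only: hypothetical pseudonymization vs. anonymization of a record.
import hmac
import hashlib

SECRET_KEY = b"example-key-store-separately"  # hypothetical key, kept apart from the data

def pseudonymize(record, id_field="email"):
    """Replace the direct identifier with a keyed pseudonym (reversible linkage
    only for the key holder)."""
    pseudonym = hmac.new(SECRET_KEY, record[id_field].encode(),
                         hashlib.sha256).hexdigest()[:16]
    out = {k: v for k, v in record.items() if k != id_field}
    out["pseudonym"] = pseudonym
    return out

def anonymize(record, id_field="email"):
    """Drop the identifier entirely; no link-back key exists."""
    return {k: v for k, v in record.items() if k != id_field}

customer = {"email": "jane@example.com", "plan": "premium"}
print(pseudonymize(customer))
print(anonymize(customer))
```

Note the design choice: an HMAC rather than a plain hash, so that an outsider without the key cannot recompute pseudonyms from guessed emails.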
Given the specifics of these guidelines for PII, we must constantly consider and monitor how our data is used in AI/ML applications from inception to deployment. Without proper governance it will be difficult to assess which data will remain non-PII. However, consistent data selection, training, and monitoring throughout the AI/ML lifecycle (see diagram below) can ensure that AI/ML applications distinguish between PII and non-PII and enact the necessary protocols.
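A minimal sketch of one such governance checkpoint, assuming a hypothetical list of direct identifiers and column names: before data enters a training pipeline, columns matching known direct identifiers are flagged so they can be removed or pseudonymized first.

```python
# Minimal sketch of a pre-training governance gate (hypothetical
# identifier list and column names, not an exhaustive PII taxonomy).
DIRECT_IDENTIFIERS = {"name", "address", "ssn", "phone", "email"}

def audit_columns(columns):
    """Split proposed training columns into flagged PII and cleared fields."""
    flagged = sorted(c for c in columns if c.lower() in DIRECT_IDENTIFIERS)
    cleared = sorted(c for c in columns if c.lower() not in DIRECT_IDENTIFIERS)
    return {"flagged_pii": flagged, "cleared": cleared}

report = audit_columns(["Email", "zip", "birth_date", "purchase_total"])
print(report)
```

A real pipeline would also screen quasi-identifiers (as the linkage example earlier in this section shows, "cleared" fields like ZIP and birth date can still re-identify people in combination), which is why monitoring must continue past this initial gate.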
AI also challenges personal data protection principles of collection limitation, use limitation and purpose specification. To train and optimize AI systems, ML algorithms require vast quantities of data. This creates an incentive to maximize, rather than minimize, data collection. With the growth in use of AI devices, and the Internet of Things (IoT), more data are gathered, more frequently and more easily. They are linked to other data, sometimes with little or no awareness or consent on the part of the data subjects concerned.
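The collection-limitation and purpose-specification principles can be expressed as a small filter. This is a hedged sketch (the purpose registry and field names are invented for illustration): each declared purpose is bound to the minimum set of fields it needs, and anything else is dropped at collection time rather than maximized.

```python
# Sketch of data minimization / purpose limitation (hypothetical
# purpose registry; field names are illustrative only).
PURPOSE_FIELDS = {
    "shipping": {"name", "address"},
    "analytics": {"zip", "purchase_total"},
}

def minimize(record, purpose):
    """Retain only the fields declared for the stated purpose."""
    allowed = PURPOSE_FIELDS[purpose]
    return {k: v for k, v in record.items() if k in allowed}

raw = {"name": "J. Doe", "address": "1 Main St", "zip": "02138",
       "purchase_total": 42.0, "email": "j@example.com"}
print(minimize(raw, "analytics"))
```

Inverting the incentive this way (collect only what a declared purpose justifies) is the opposite of the data-maximizing pull that training large ML models creates.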
The patterns identified, and the evolution of the "learning," are difficult to anticipate. The collection and use of data can therefore extend beyond what was originally known, disclosed, and consented to by a data subject. AI/ML applications will be able to learn over time and to offer individuals tailored, personalized services based on their personal privacy preferences (see Knowledge-as-a-Service). Developing AI systems around the privacy principles detailed by the U.S. Department of Labor and the GDPR will be essential, because AI carries the potential to enrich personal data, and that enriched personal data can in turn cause organizations to violate these policies.