Existing Data Protection Concepts for AI

March 6, 2024

Do we really need more regulations specifically tailored to artificial intelligence? Probably. But in the meantime, it appears that many existing regulations and frameworks could already be leveraged to protect user privacy and uphold security best practices.


Disclaimer: This article is not legal advice; I am not a lawyer, and it does not reflect my current company’s views or practices. Consider it more of a conversation opener. It also deliberately skips the topic of copyright, which is a separate, and even more complex, discussion.

Data Processor and Controller Obligations


The most straightforward aspect involves data processors and controllers. Given that most AI applications revolve around processing user data, existing regulations and obligations should apply. For specifics, Article 5 of the General Data Protection Regulation (GDPR) serves as a good example. Here are some key points, with a short code sketch after the list:

  • Lawfulness, Fairness, and Transparency: AI processes should adhere to legal requirements, treat users fairly, and maintain transparency.
  • Data Minimization and Purpose Limitation: collect only necessary data and use it for specific, legitimate purposes.
  • Storage Limitation and Accuracy: limit data retention and ensure accuracy.
  • Limitations on Data Transfer to Third Parties: be cautious when transferring data to other companies or countries.
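
To make these principles concrete, here is a minimal Python sketch of how they might be enforced in code before any data reaches a model. The ProcessingRecord wrapper and minimise helper are hypothetical illustrations, not part of any specific library or framework.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class ProcessingRecord:
    """Hypothetical wrapper tying data to a purpose and a retention deadline."""
    payload: dict           # only the fields the task strictly needs (minimisation)
    purpose: str            # documented, legitimate purpose (purpose limitation)
    delete_after: datetime  # retention deadline (storage limitation)

def minimise(user_data: dict, allowed_fields: set) -> dict:
    """Drop every field the stated purpose does not require."""
    return {k: v for k, v in user_data.items() if k in allowed_fields}

raw = {"name": "A. User", "email": "a@example.com", "shoe_size": 42}
record = ProcessingRecord(
    payload=minimise(raw, allowed_fields={"name", "email"}),
    purpose="account-support-chatbot",
    delete_after=datetime.now(timezone.utc) + timedelta(days=30),
)
# Only record.payload is sent to the model; a scheduled job can enforce
# record.delete_after to honour the storage limitation.
```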


Major AI providers (such as Google and Microsoft) have diligently worked on compliance across their core businesses. They offer relevant tools to business users, making adherence to these obligations feasible.



Personally Identifiable Information (PII) and AI Processing


Processing sensitive data, such as passport information, through AI introduces complexities, though not many more than usual. Let’s break down the process (a short code sketch follows the list):

  • User Interaction: a user securely submits sensitive data to your company.
  • Data Processing: the data is processed either internally on your servers or via a third-party provider.
  • Storage and Access: your company stores the processed result in a database where the customer and authorised staff or services can read, modify, or delete it.
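
As an illustration of the second step, here is a minimal Python sketch of a pseudonymisation pass that could run before document text leaves your infrastructure for a third-party provider. The regular expression and token scheme are purely illustrative; real passport number formats vary by country.

```python
import re

# Illustrative pattern only: real passport formats vary by country.
PASSPORT_RE = re.compile(r"\b[A-Z]{1,2}[0-9]{6,9}\b")

def pseudonymise(text: str, vault: dict) -> str:
    """Replace passport numbers with opaque tokens; the mapping stays in-house."""
    def _swap(match):
        token = "<PII_{}>".format(len(vault))
        vault[token] = match.group(0)  # re-link results later via this mapping
        return token
    return PASSPORT_RE.sub(_swap, text)

vault = {}
safe_text = pseudonymise("Passport no. X1234567, issued 2021.", vault)
# safe_text == "Passport no. <PII_0>, issued 2021."
# Only safe_text is sent to the third-party provider.
```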


These steps are not unique to AI; they align with standard data handling practices. However, the critical aspect lies in due diligence regarding third-party data processors. Here’s where AI introduces some differences:

  • Third-party providers must ensure that your data remains isolated from other customers’ data, which includes not using it to train their shared models (see the sketch below).
  • While not entirely new, this principle gains prominence because AI services typically run on shared infrastructure.
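
Below is a sketch of what this could look like in practice, with deliberately hypothetical field names (no real vendor exposes exactly this interface). The point is that isolation and no-training guarantees should be explicit, verifiable settings backed by contract terms, not defaults you assume.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProviderSettings:
    """Hypothetical third-party AI provider configuration."""
    data_residency: str         # keep data within an agreed jurisdiction
    allow_model_training: bool  # your inputs must not train shared models
    retention_days: int         # how long the provider may keep inputs

settings = ProviderSettings(
    data_residency="eu-west",
    allow_model_training=False,
    retention_days=0,  # discard inputs once processing completes
)

# Fail fast if a configuration change ever flips these guarantees.
assert not settings.allow_model_training
assert settings.retention_days == 0
```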


Data Security Challenges


Making AI systems and their outputs accessible, whether publicly or internally, creates security challenges. Consider the following:

  • Attack vectors:
    • AI systems become new targets for hackers.
    • Prompt injection attacks are increasingly common.
  • Applying old solutions to new problems:
    • Strong Authentication: ensure robust access controls.
    • Regular Audits: monitor system integrity.
    • Data and Prompt Hygiene: sanitise input data to mitigate malicious prompts (see the sketch after this list).
    • Maintenance Cycles: regularly update and patch AI systems.
  • Investment and training:
    • Companies must prioritise cybersecurity before a breach occurs, not after.
    • Staff training is crucial.
    • Identifying reliable security partners is essential.

Remember, while AI introduces new challenges, many solutions are rooted in established security practices. Vigilance and adaptation are key to safeguarding sensitive data in the AI era.


Conclusion


From a data privacy and security perspective, even though AI is rapidly evolving, it’s feasible to deconstruct what AI truly is and apply existing guidelines to each step. While AI is an impressive technology, companies have a history of incorporating new technologies over the decades. However, this process has often been imperfect and underfunded until issues arise. I appreciate that high-profile technologies like AI continue to spark conversations and raise awareness about these critical topics.



Appendix

  1. General Data Protection Regulation (GDPR): Introduced by the European Union (EU), the GDPR is a comprehensive regulation governing the processing of personal data of individuals in the EU. It sets high standards for data protection, privacy, and security. Organisations worldwide that handle EU residents’ data must comply with its requirements.
  2. California Consumer Privacy Act (CCPA): Enacted in California, the CCPA grants consumers rights over their personal information. It applies to businesses that collect or sell personal data of California residents and imposes transparency and consent obligations.
  3. Personal Data (Privacy) Ordinance (PDPO): Hong Kong’s PDPO was the first comprehensive personal data privacy legislation in Asia; the Office of the Privacy Commissioner for Personal Data (PCPD) oversees its enforcement. It emphasises principles such as notice, choice, access, and security. Organisations operating in Hong Kong must comply with its provisions.