Privacy risk assessment of AI models

The need to analyze personal data to drive business, alongside the requirement to preserve the privacy of data subjects, creates a known tension.

Data protection regulations such as the EU General Data Protection Regulation (GDPR) impose strict restrictions and obligations on the collection and processing of personal data. Similar laws and regulations are being enacted in other countries around the world.

Many data processing tasks nowadays involve machine learning (ML). These regulations may therefore also apply to ML models, which can be used to derive personal information about the data on which they were trained.

Recent studies show that a malicious third party with access to a trained ML model, even without access to the training data itself, can still reveal sensitive personal information about the people whose data was used to train the model. For example, it may be possible to reveal whether or not a person's data was part of the model's training set, or even to infer sensitive attributes about them, such as their salary.

It is therefore crucial to assess the privacy risk of AI models that may contain personal information before they are deployed, leaving time to apply appropriate mitigation strategies. Such assessments also make it possible to compare and choose between different ML models based not only on accuracy but also on privacy risk, enabling an informed decision about which model is most suitable for a given use case.

Existing tools and frameworks for assessing the privacy risk of ML models tend to be low-level, requiring a high degree of expertise to employ, or are tightly coupled to a specific ML framework. There is a strong need for more automated solutions that enable non-experts to perform, and more importantly to understand the results of, such privacy risk assessments.

To help overcome these limitations, IBM has developed an end-to-end framework for running privacy risk assessments of AI models. It can assess models from different ML frameworks, using a variety of low-level privacy attacks and metrics, without requiring deep technical expertise.

The goal of this framework is to automate many of the decisions around which attacks and metrics to run, as well as all of the technical preparation required to run them. For example, some attacks require dividing the given datasets into smaller fragments, for instance to train and validate the attack itself. In some cases, the fragments for different attacks must be generated in different ways for the attack to be most effective. Some attacks also have multiple runtime options and parameters that need to be set in a way that best matches the given model and data.
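As an illustration of this kind of preparation, the sketch below splits member and non-member data into fragments for training and validating a membership inference attack. The helper and its parameters are hypothetical and not part of the actual tool, which derives such splits automatically from the attack type and the amount of available data.

```python
from sklearn.model_selection import train_test_split

def prepare_membership_inference_data(x_members, y_members, x_nonmembers, y_nonmembers,
                                       attack_train_ratio=0.5, seed=42):
    """Hypothetical helper: split member (training) and non-member (held-out) data
    into the fragments a black-box membership inference attack typically needs."""
    # One part of each set is used to train the attack model ...
    xm_fit, xm_val, ym_fit, ym_val = train_test_split(
        x_members, y_members, train_size=attack_train_ratio, random_state=seed)
    xn_fit, xn_val, yn_fit, yn_val = train_test_split(
        x_nonmembers, y_nonmembers, train_size=attack_train_ratio, random_state=seed)
    return {
        "attack_fit": (xm_fit, ym_fit, xn_fit, yn_fit),
        # ... and the remaining part is used to measure how well the attack
        # distinguishes members from non-members.
        "attack_validate": (xm_val, ym_val, xn_val, yn_val),
    }
```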

A second challenge is that most non-technical users cannot interpret each individual attack or score, let alone compare the results of different models, so it is crucial to summarize these individual results into an overall privacy risk score. This is not trivial: different attacks may have different baselines or thresholds for defining risk, and the various metrics have different ranges or scales, or may not be bounded at all. All of this must be carefully considered when aggregating different scores into a single value.
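For illustration only, one possible way to normalize heterogeneous attack results against their baselines and combine them into a single score is sketched below. This is an assumption about how such aggregation could work, not the framework's actual scoring formula.

```python
def normalize_score(value, baseline, worst_case):
    """Map a raw attack result onto a 0-1 risk scale: 0 means no better than the
    attack's baseline (e.g., random guessing), 1 means the worst possible outcome."""
    if worst_case == baseline:
        return 0.0
    return max(0.0, min(1.0, (value - baseline) / (worst_case - baseline)))

def overall_risk(normalized_scores):
    """Conservative aggregation: the overall risk is driven by the riskiest finding."""
    return max(normalized_scores) if normalized_scores else 0.0

# Example: membership inference accuracy of 0.74 and attribute inference accuracy
# of 0.55, both against a 0.5 random-guess baseline and an upper bound of 1.0.
scores = [normalize_score(0.74, baseline=0.5, worst_case=1.0),
          normalize_score(0.55, baseline=0.5, worst_case=1.0)]
print(overall_risk(scores))  # 0.48
```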

Another design consideration was to make the tool generic, enabling evaluation of models from different ML frameworks (scikit-learn, PyTorch, Keras), different data modalities (tabular, images, text) and different model types (classification, regression, detection). Moreover, we wanted to allow plugging various low-level attack frameworks into the tool.
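To give a flavor of what such genericity could look like, the purely hypothetical sketch below defines the kind of adapter and plugin interfaces this design implies; none of the class or method names are taken from the actual tool.

```python
from abc import ABC, abstractmethod

class ModelWrapper(ABC):
    """Framework-agnostic view of a model; scikit-learn, PyTorch and Keras models
    would each get a thin adapter implementing this interface."""

    @abstractmethod
    def predict(self, x):
        """Return model predictions for a batch of inputs."""

class AttackPlugin(ABC):
    """A low-level attack implementation, possibly provided by an external attack
    framework, plugged into the assessment tool."""

    @abstractmethod
    def applies_to(self, task_type: str, modality: str) -> bool:
        """Whether this attack is relevant for the given task type and data modality."""

    @abstractmethod
    def run(self, model: ModelWrapper, data) -> dict:
        """Run the attack against the wrapped model and return its raw metrics."""
```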

The framework relies on Q&A-based interaction with the user. Depending on answers to general questions such as "what type of task does the model solve?" (e.g., classification/regression), "how will it be deployed?", etc., the tool automatically selects which low-level attacks and metrics to run, along with their corresponding inputs and parameters. The tool then runs the selected attacks and metrics, collects and summarizes their results, and returns them to be displayed to the user.
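As an illustration, the hypothetical snippet below maps questionnaire answers to a set of attacks to run; the actual questions, attack names and selection logic in the framework may differ.

```python
# Hypothetical questionnaire answers (not the framework's actual question set).
ANSWERS = {
    "task_type": "classification",   # "what type of task does the model solve?"
    "deployment": "public_api",      # "how will it be deployed?"
    "modality": "tabular",
}

def select_attacks(answers):
    """Choose which attacks to run based on the user's answers."""
    attacks = []
    if answers["task_type"] == "classification":
        # Black-box membership inference applies to most deployed classifiers.
        attacks.append("membership_inference_black_box")
    if answers["modality"] == "tabular":
        # Attribute inference is most meaningful for tabular data with
        # clearly identifiable sensitive columns.
        attacks.append("attribute_inference")
    if answers["deployment"] == "public_api":
        # A publicly queryable model also warrants a rule-based membership check.
        attacks.append("membership_inference_rule_based")
    return attacks

print(select_attacks(ANSWERS))
```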

This tool was first developed in the course of the CyberKit4SME project, where it was used to evaluate tabular and vision models in the transportation domain, supporting membership inference [1] and attribute inference [2] attacks. We plan to further develop and extend the tool in the NEMECYS project, where it will be used to assess models in the healthcare domain and will be extended to support additional modalities, such as time series and natural language, and possibly also new types of attacks.

This risk assessment process is also tightly connected to the ML model anonymization technology developed in the iToBoS project: it allows the initial risk of a model to be assessed first, and then, if the results of the assessment are not satisfactory, different strategies for mitigating the risk can be considered, including model anonymization. After applying a mitigation, the new model can be assessed again to verify that the privacy risk has indeed been reduced. This can also be an iterative process, in which the tradeoff between the model's privacy and accuracy is explored to find the best option for the given use case. In such cases, multiple assessments of the different versions of the model may be performed and compared until a final model is selected.
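The iterative assess-mitigate-reassess cycle could be pictured roughly as in the sketch below. The assess, mitigate and evaluate_accuracy callables stand in for the assessment tool, a mitigation step such as model anonymization, and ordinary model evaluation; the loop only illustrates the process described above and is not the projects' actual implementation.

```python
def assess_and_mitigate(model, data, assess, mitigate, evaluate_accuracy,
                        risk_threshold=0.3, max_rounds=5):
    """Illustrative assess/mitigate loop: keep mitigating until the privacy risk
    is acceptable, then pick the best version along the privacy-accuracy tradeoff."""
    versions = []
    for _ in range(max_rounds):
        risk = assess(model, data)                 # overall privacy risk score (0-1)
        accuracy = evaluate_accuracy(model, data)  # task accuracy of this version
        versions.append((model, risk, accuracy))
        if risk <= risk_threshold:
            break                                  # risk is already acceptable
        model = mitigate(model, data)              # e.g., retrain on anonymized data
    # Prefer the most accurate version that meets the risk threshold,
    # otherwise fall back to the lowest-risk version seen so far.
    acceptable = [v for v in versions if v[1] <= risk_threshold]
    return max(acceptable, key=lambda v: v[2]) if acceptable else min(versions, key=lambda v: v[1])
```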


Abigail Goldsteen, IBM.

[1] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. 2017. Membership Inference Attacks Against Machine Learning Models. In IEEE Symposium on Security and Privacy. San Jose, CA, USA, 3–18.

[2] Matthew Fredrikson, Somesh Jha, and Thomas Ristenpart. 2015. Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures. In ACM Conference on Computer and Communications Security (CCS). Denver, Colorado, USA.