WIP - Privacy-preserving machine learning: Methods, challenges and directions#

Note

Hi, these are my personal reading notes, and there may be mistakes in my understanding. Please feel free to correct me (hsiangjenli@gmail.com) if you find any. Thanks!

Before starting#

Before reading the paper, you should be familiar with the following basic concepts:

  • The entire ML pipeline

  • The participants in the ML pipeline

Key Terms#

  1. Privacy-preserving machine learning (PPML)

  2. Complete Model \(\rightarrow\) trained on a single machine

  3. Global Model \(\rightarrow\) trained across multiple machines

  4. Data Producer (DP)

  5. Model Consumer (MC)

  6. Computational Facility (CF)

  7. Confidential-level privacy

  8. Homomorphic encryption (HE)

  9. Functional encryption (FE)

  10. Differential privacy

  11. Multi-party computation (MPC)

  12. Secure multi-party computation (SMPC)

  13. Garbled circuit

  14. Oblivious transfer

Contributions [1]#

  1. Reviews existing privacy-preserving approaches

  2. Proposes an evaluation framework for PPML, which decomposes privacy-preserving features into distinct Phase, Guarantee, and Utility aspects (PGU).

  • Phase : Represents the use of privacy-preserving techniques at different stages in the ML pipeline

  • Guarantee : The level of privacy protection a technique provides in a specific scenario

  • Utility : The impact of privacy-preserving techniques on the model’s performance

Phases of ML Pipeline#

  • Techniques that can be applied to the training phase can also be applied to the serving phase. [1]

  • However, techniques that can be applied to the serving phase may not be applicable to the training phase. [1]

Figure: taken from the paper [1]

Privacy Preserving Data Preparation (Data Perspective)#

  1. Traditional anonymization mechanisms : Remove identifier information before training (a minimal \(k\)-anonymity check is sketched after this list)

    • \(k\)-anonymity [2]

    • \(l\)-diversity [3]

    • \(t\)-closeness [4]

  2. Surrogate dataset

    • Grouping the anonymized data [5]

    • Abstracting the data by sketch techniques [6, 7] (see the count-min sketch after this list)

  3. Differential privacy mechanism [8, 9, 10] : Add noise to the data to avoid privacy leakage (a Laplace-mechanism sketch appears after this list)

    • The leakage in question comes from inference or de-anonymization attacks [1], such as [11, 12, 13, 14]

  4. Encrypted data

    • Confidential-level privacy
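
To make the anonymization item above concrete, here is a minimal sketch of a \(k\)-anonymity check. The records, quasi-identifier columns, and generalized values (age brackets, truncated ZIP codes) are made up for the demo; real \(k\)-anonymity [2] also involves choosing how to generalize or suppress values, which is omitted here.

```python
from collections import Counter

# Minimal k-anonymity check sketch (assumption: records are already generalized;
# the quasi-identifiers and values below are made up for the demo).
records = [
    {"age": "30-39", "zip": "537**", "disease": "flu"},
    {"age": "30-39", "zip": "537**", "disease": "cold"},
    {"age": "40-49", "zip": "537**", "disease": "flu"},
    {"age": "40-49", "zip": "537**", "disease": "asthma"},
]
quasi_identifiers = ("age", "zip")

def is_k_anonymous(rows, qi, k):
    """Every combination of quasi-identifier values must appear at least k times."""
    groups = Counter(tuple(r[c] for c in qi) for r in rows)
    return min(groups.values()) >= k

print(is_k_anonymous(records, quasi_identifiers, k=2))  # True
print(is_k_anonymous(records, quasi_identifiers, k=3))  # False
```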
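For the sketch-based surrogate idea, the cited papers [6, 7] sketch gradients for communication-efficient federated learning; the generic count-min sketch below is only a minimal illustration of replacing raw data with a compact, lossy summary. The class name, table sizes, and hashing scheme are my own demo choices.

```python
import numpy as np

# Minimal count-min sketch: a compact, lossy summary of value frequencies.
class CountMinSketch:
    def __init__(self, width=64, depth=4, seed=0):
        self.table = np.zeros((depth, width), dtype=np.int64)
        self.seeds = np.random.default_rng(seed).integers(1, 2**31, size=depth)

    def _cols(self, item):
        # One hashed column index per row of the table.
        return [hash((int(s), item)) % self.table.shape[1] for s in self.seeds]

    def add(self, item):
        for row, col in enumerate(self._cols(item)):
            self.table[row, col] += 1

    def estimate(self, item):
        # Taking the minimum over rows bounds the overestimation error.
        return min(self.table[row, col] for row, col in enumerate(self._cols(item)))

cms = CountMinSketch()
for zip_code in ["53703"] * 5 + ["53704"] * 2:
    cms.add(zip_code)
print(cms.estimate("53703"))  # ≈ 5 (never an underestimate)
```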
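And for the differential privacy item, here is a minimal Laplace-mechanism sketch for a counting query. The dataset, threshold, and \(\epsilon\) value are arbitrary demo choices; a count query has sensitivity 1, so the noise scale is \(1/\epsilon\) [10].

```python
import numpy as np

# Minimal Laplace-mechanism sketch for a counting query (sensitivity of a
# count is 1; the epsilon value is arbitrary for the demo).
def noisy_count(values, threshold, epsilon=1.0, rng=np.random.default_rng(0)):
    true_count = sum(v > threshold for v in values)
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)   # scale = sensitivity / ε
    return true_count + noise

incomes = [30_000, 52_000, 75_000, 41_000, 98_000]
print(noisy_count(incomes, threshold=50_000))  # true answer is 3, plus Laplace noise
```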

Privacy Preserving Model Training (Computational Perspective)#

This phase supports computation on encrypted data [1]. Typically, encryption techniques involve two main steps, encoding and decoding [1]; minimal sketches of fixed-point encoding and of a toy additively homomorphic scheme follow below.

  • Encoding \(\rightarrow\) Transform floating-point values into integers

  • Decoding \(\rightarrow\) Recover the floating-point values from trained model or crypto-based training results
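
A minimal fixed-point sketch of this encode/decode step, assuming a simple power-of-two scale factor; real schemes such as CKKS [16] use their own, more elaborate encodings.

```python
# Minimal fixed-point encoding sketch (assumption: a scale factor of 2**16).
SCALE = 2 ** 16

def encode(x: float) -> int:
    """Map a float to an integer so crypto schemes can operate on it."""
    return round(x * SCALE)

def decode(v: int, depth: int = 1) -> float:
    """Recover the float; after multiplying two encodings, divide by SCALE twice."""
    return v / SCALE ** depth

a, b = encode(3.14), encode(2.0)
print(decode(a + b))           # ≈ 5.14  (addition keeps one scale factor)
print(decode(a * b, depth=2))  # ≈ 6.28  (multiplication squares the scale)
```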

  1. Homomorphic encryption :

    • BGV scheme [15]

    • CKKS [16] : Supports approximate arithmetic computation

  2. Functional encryption :

    • Multi-party functional encryption [17, 18]
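
To show what "computation on encrypted data" looks like end to end, here is a toy additively homomorphic scheme (textbook Paillier, not the BGV or CKKS schemes cited above): multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts. The tiny primes and lack of any hardening are for demonstration only.

```python
import math
import random

# Toy Paillier cryptosystem: additively homomorphic encryption.
# DEMO ONLY: tiny Mersenne primes, no padding, not constant-time.
p, q = 2**13 - 1, 2**17 - 1           # small, well-known primes (demo only)
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)          # λ = lcm(p-1, q-1)
mu = pow(lam, -1, n)                  # μ = λ⁻¹ mod n (valid because g = n+1)

def encrypt(m: int) -> int:
    """E(m) = (1+n)^m * r^n mod n²  with random r coprime to n."""
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(1 + n, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    """D(c) = L(c^λ mod n²) * μ mod n,  where L(u) = (u-1) // n."""
    u = pow(c, lam, n2)
    return ((u - 1) // n) * mu % n

# Homomorphic property: multiplying ciphertexts adds the plaintexts.
c1, c2 = encrypt(123), encrypt(456)
assert decrypt((c1 * c2) % n2) == 123 + 456
```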

Privacy Preserving Model Serving (Model Perspective)#

Includes model deployment and inference [1]

  1. Private aggregation of teacher ensembles (PATE); a noisy teacher-vote sketch appears after this list

  2. Model transform

  3. Model compression
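
As a rough sketch of the PATE aggregation step: each teacher model votes for a label, Laplace noise is added to the vote counts, and the noisy argmax becomes the label the student trains on. The teacher predictions and noise scale below are made-up demo values, and the real framework also tracks a privacy budget, which is omitted here.

```python
import numpy as np

# Minimal PATE-style noisy aggregation sketch for a single query.
def noisy_teacher_vote(teacher_preds, num_classes, gamma=1.0, rng=np.random.default_rng(0)):
    votes = np.bincount(teacher_preds, minlength=num_classes)      # count teacher votes
    noisy_votes = votes + rng.laplace(0.0, 1.0 / gamma, size=num_classes)
    return int(np.argmax(noisy_votes))   # the student trains on this noisy label

teacher_preds = np.array([2, 2, 1, 2, 0, 2, 1, 2])   # 8 teachers, 3 classes
print(noisy_teacher_vote(teacher_preds, num_classes=3))
```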

Privacy Guarantee#

  1. Object-Oriented Privacy Guarantee

    • Data-oriented privacy guarantee : Prevents the leakage of data, but sacrifices some data utility [1]

      • Anonymization mechanisms need to aggregate and remove the appropriate feature values; at the same time, certain quasi-identifier feature values are erased altogether

      • Differential privacy requires adding noise, controlled by a privacy budget, to the data samples.

      • Encrypted data may ensure the dataset’s confidentiality, but it adds an extra processing burden to the subsequent machine learning training.

    • Model-oriented privacy guarantee : Prevents adversaries from extracting private information through repeated model queries [1]

      • Perturb the trained model

        • DP-SGD [19] : Adds noise to the clipped gradients to achieve a differentially private model (a minimal DP-SGD step sketch appears after this list)

      • Regulate the model access times and patterns

  2. Pipeline-Oriented Privacy Guarantee
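
A minimal sketch of one DP-SGD [19] step, assuming plain NumPy and a linear model with squared loss: compute per-example gradients, clip each to a fixed norm, add Gaussian noise to the sum, and take an averaged step. The learning rate, clip norm, and noise multiplier are arbitrary demo values, and the privacy accountant used in the real algorithm is omitted.

```python
import numpy as np

# Minimal DP-SGD step sketch: per-example gradient clipping + Gaussian noise.
def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_mult=1.1,
                rng=np.random.default_rng(0)):
    grads = []
    for xi, yi in zip(X, y):                                  # per-example gradients
        g = 2 * (xi @ w - yi) * xi                            # gradient of squared loss
        g = g / max(1.0, np.linalg.norm(g) / clip_norm)       # clip to clip_norm
        grads.append(g)
    g_sum = np.sum(grads, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm, size=w.shape)  # Gaussian noise
    return w - lr * (g_sum + noise) / len(X)                  # noisy averaged step

w = np.zeros(3)
X, y = np.random.default_rng(1).normal(size=(8, 3)), np.ones(8)
w = dp_sgd_step(w, X, y)
```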

References#

[1]

Runhua Xu, Nathalie Baracaldo, and James Joshi. Privacy-preserving machine learning: methods, challenges and directions. arXiv preprint arXiv:2108.04417, 2021.

[2]

Latanya Sweeney. k-anonymity: a model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(05):557–570, 2002.

[3]

Ashwin Machanavajjhala, Daniel Kifer, Johannes Gehrke, and Muthuramakrishnan Venkitasubramaniam. l-diversity: privacy beyond k-anonymity. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1):3–es, 2007.

[4]

Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian. t-closeness: privacy beyond k-anonymity and l-diversity. In 2007 IEEE 23rd International Conference on Data Engineering, 106–115. IEEE, 2007.

[5]

Mengwei Yang, Linqi Song, Jie Xu, Congduan Li, and Guozhen Tan. The tradeoff between privacy and accuracy in anomaly detection using federated xgboost. arXiv preprint arXiv:1907.07157, 2019.

[6]

Tian Li, Zaoxing Liu, Vyas Sekar, and Virginia Smith. Privacy for free: communication-efficient learning with differential privacy using sketches. arXiv preprint arXiv:1911.00972, 2019.

[7]

Farzin Haddadpour, Belhal Karimi, Ping Li, and Xiaoyun Li. Fedsketch: communication-efficient and private federated learning via sketching. arXiv preprint arXiv:2008.04975, 2020.

[8]

Cynthia Dwork. Differential privacy: a survey of results. In International conference on theory and applications of models of computation, 1–19. Springer, 2008.

[9]

Cynthia Dwork, Guy N Rothblum, and Salil Vadhan. Boosting and differential privacy. In 2010 IEEE 51st annual symposium on foundations of computer science, 51–60. IEEE, 2010.

[10]

Cynthia Dwork, Aaron Roth, and others. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4):211–407, 2014.

[11]

Gilbert Wondracek, Thorsten Holz, Engin Kirda, and Christopher Kruegel. A practical attack to de-anonymize social network users. In 2010 IEEE Symposium on Security and Privacy, 223–238. IEEE, 2010.

[12]

Md Atiqur Rahman, Tanzila Rahman, Robert Laganière, Noman Mohammed, and Yang Wang. Membership inference attack against differentially private deep learning model. Trans. Data Priv., 11(1):61–79, 2018.

[13]

Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In 2017 IEEE symposium on security and privacy (SP), 3–18. IEEE, 2017.

[14]

Jianwei Qian, Xiang-Yang Li, Chunhong Zhang, and Linlin Chen. De-anonymizing social networks and inferring private attributes using knowledge graphs. In IEEE INFOCOM 2016-The 35th Annual IEEE International Conference on Computer Communications, 1–9. IEEE, 2016.

[15]

Masahiro Yagisawa. Fully homomorphic encryption without bootstrapping. Cryptology ePrint Archive, 2015.

[16]

Jung Hee Cheon, Andrey Kim, Miran Kim, and Yongsoo Song. Homomorphic encryption for arithmetic of approximate numbers. In Advances in Cryptology–ASIACRYPT 2017: 23rd International Conference on the Theory and Applications of Cryptology and Information Security, Hong Kong, China, December 3-7, 2017, Proceedings, Part I 23, 409–437. Springer, 2017.

[17]

Michel Abdalla, Florian Bourse, Angelo De Caro, and David Pointcheval. Simple functional encryption schemes for inner products. In IACR International Workshop on Public Key Cryptography, 733–751. Springer, 2015.

[18]

Michel Abdalla, Dario Catalano, Dario Fiore, Romain Gay, and Bogdan Ursu. Multi-input functional encryption for inner products: function-hiding realizations and constructions without pairings. In Advances in Cryptology–CRYPTO 2018: 38th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 19–23, 2018, Proceedings, Part I 38, 597–627. Springer, 2018.

[19]

Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, 308–318. 2016.