Anonymization is a major issue when handling Industrial Internet of Things (IIoT) data. Machine Learning (ML) applications require decrypted data to perform tasks efficiently, which means that third parties involved in data processing may have access to sensitive information. This exposes the companies that generate the data to privacy and information leaks. As a result, companies are hesitant to share their IIoT data with third parties.
Current approaches to the anonymization problem include encryption, homomorphic encryption, other cryptographic techniques, and distributed or federated learning. However, these methods have limitations in terms of computational cost, explainability of ML models, and vulnerability to cyberattacks. Furthermore, existing privacy-preserving techniques often involve a tradeoff between privacy and accuracy, where achieving strong privacy protection leads to a significant loss in ML model accuracy. These challenges make it difficult to preserve the privacy of IIoT data effectively and efficiently.
In this context, a research team from Kadir Has University in Turkey proposed a novel method that combines generative adversarial networks (GANs) and differential privacy (DP) to protect sensitive data in IIoT operations. The hybrid approach aims to achieve privacy preservation with minimal loss of accuracy and low additional computational cost. The GAN generates synthetic copies of sensitive data, while DP introduces random noise and parameters to maintain privacy. The proposed method is tested using publicly available data sets and a realistic IIoT data set collected from a confectionery production process.
The authors propose a hybrid privacy-preserving approach for IIoT environments. Their method involves two main components: GAN and DP.
- GAN: The authors use a GAN, specifically the Conditional Tabular GAN (CTGAN) approach, to create a synthetic copy (XG) of the original data set (XO). The GAN learns the distribution of the data and generates synthetic data with statistics similar to the original.
- DP: To improve privacy, the authors add random noise drawn from a Laplace distribution to the sensitive features of the data. This technique preserves privacy while maintaining the general probability distribution of the data.
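The DP component can be illustrated with a minimal sketch of Laplace noise addition. The article does not state how the authors parameterize the noise, so the scale used here (sensitivity divided by epsilon, as in the standard Laplace mechanism) is an assumption, and the function name `add_laplace_noise`, the `temperature` feature, and all numeric values are hypothetical.

```python
import numpy as np


def add_laplace_noise(values, sensitivity, epsilon, rng=None):
    """Return `values` plus zero-mean Laplace noise.

    Assumption: the noise scale is sensitivity / epsilon, the usual
    Laplace-mechanism choice. The paper's exact setting is not given
    in the article.
    """
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    scale = sensitivity / epsilon
    return values + rng.laplace(loc=0.0, scale=scale, size=values.shape)


# Hypothetical sensitive feature, e.g. a process temperature reading.
rng = np.random.default_rng(0)
temperature = rng.normal(loc=70.0, scale=5.0, size=10_000)

noisy = add_laplace_noise(temperature, sensitivity=1.0, epsilon=0.5, rng=rng)

# Individual readings are perturbed, but the overall mean stays close.
print(round(temperature.mean(), 2), round(noisy.mean(), 2))
```

A smaller epsilon gives a larger noise scale, which means stronger privacy and a more distorted feature.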
The proposed approach involves the following steps:
- Creation of a synthetic data set with GAN.
- Substitution of sensitive features.
- Application of differential privacy by adding random noise.
The resulting data set is privacy-preserving and can be used for machine learning analysis without compromising sensitive information. The complexity of the algorithm depends on the number of sensitive features and the size of the data set. The authors emphasize that their method ensures overall privacy protection for IIoT data.
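The three steps can be sketched end to end as follows. This is an illustration under stated assumptions, not the authors' implementation: a Gaussian sampler stands in for CTGAN so the example stays self-contained, the substitution step is assumed to replace the sensitive columns of XO with the corresponding columns of XG, and the names `fit_and_sample_gaussian`, `hybrid_privatize`, and `noise_scale` are hypothetical.

```python
import numpy as np


def fit_and_sample_gaussian(x_original, n_samples, rng):
    """Stand-in generator (NOT CTGAN): sample from a Gaussian fitted to the data."""
    mean = x_original.mean(axis=0)
    cov = np.cov(x_original, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n_samples)


def hybrid_privatize(x_original, sensitive_cols, noise_scale, rng):
    # Step 1: create a synthetic copy XG of the original data XO.
    x_synth = fit_and_sample_gaussian(x_original, len(x_original), rng)

    # Step 2 (assumed direction): replace the sensitive columns of XO
    # with the matching columns of XG.
    x_private = x_original.copy()
    x_private[:, sensitive_cols] = x_synth[:, sensitive_cols]

    # Step 3: add Laplace noise to the sensitive columns only.
    noise = rng.laplace(0.0, noise_scale,
                        size=(len(x_private), len(sensitive_cols)))
    x_private[:, sensitive_cols] += noise
    return x_private


rng = np.random.default_rng(42)
x_o = rng.normal(size=(500, 4))          # hypothetical data, 4 features
x_p = hybrid_privatize(x_o, sensitive_cols=[1, 3], noise_scale=0.1, rng=rng)
```

In this sketch the cost of steps 2 and 3 grows with the number of rows times the number of sensitive columns, in line with the article's remark that complexity depends on the number of sensitive features and the size of the data set.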
The authors evaluated the proposed hybrid approach through experiments on privacy-preserving data synthesis and prediction. The experiments were performed on four SCADA data sets: wind turbine, steam production, energy efficiency, and synchronous motors. They used CTGAN for synthetic data generation together with differential privacy (DP). Accuracy was measured with the R-squared metric, and privacy preservation with six privacy metrics. The results showed that the proposed hybrid approach achieved higher accuracy and privacy preservation than other methods, such as CTGAN and DP. The experiments also tested the method on data sets with hidden sensitive features and demonstrated its ability to protect such data.
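For reference, the R-squared metric can be computed with the standard coefficient-of-determination formula, sketched below. The article does not describe the authors' evaluation code, so the function name `r_squared` and the sample values are illustrative only.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot


perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])    # predictions match exactly
mean_only = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])  # always predicts the mean
print(perfect, mean_only)  # 1.0 0.0
```

Comparing this score for a model trained on the original data against one trained on the privacy-preserving data gives one view of how much accuracy the privacy step costs.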
In conclusion, the paper proposes a novel hybrid approach combining GAN and DP to address the problem of anonymization in IIoT data. The method consists of creating a synthetic data set using GAN and applying DP by adding random noise to the sensitive features. The evaluation results demonstrated that the hybrid approach achieved higher accuracy and privacy preservation than other methods. This approach offers a promising solution for protecting sensitive data in IIoT environments while minimizing accuracy loss and computational cost.
Mahmoud is a PhD researcher in machine learning. He also holds a bachelor's degree in physical sciences and a master's degree in telecommunication systems and networks. His current research areas include computer vision, stock market prediction, and deep learning. He has produced several scientific articles on person re-identification and on the robustness and stability of deep networks.