Advanced Techniques for Improving Model Robustness in Adversarial Machine Learning

Authors

  • Prashis Raghuwanshi, Senior Software Engineer and Researcher (Associate Vice President), Dallas, Texas

DOI:

https://doi.org/10.29070/q2reyr59

Keywords:

Machine Learning Models, Robustness, Advanced Techniques, Adversarial Attacks, Adversarial Learning

Abstract

This work investigates advanced methods for improving the resilience of machine learning models against adversarial attacks. Ensuring that these models can withstand deliberately crafted inputs—called adversarial examples—has become critical as machine learning expands into high-stakes fields such as computer vision, cybersecurity, and healthcare. The study examines several types of adversarial attacks, including black-box attacks, where the attacker has no direct knowledge of the model, and white-box attacks, where the attacker has complete access to the model. Widely used attack techniques, such as the Fast Gradient Sign Method (FGSM), Iterative FGSM (I-FGSM), and the Carlini and Wagner (C&W) attack, are also discussed, together with the defenses built to counter them. The work emphasizes how adversarial learning contributes to creating more resilient models by addressing both theoretical foundations and practical applications. This investigation highlights the strengths and weaknesses of current approaches, as well as the ongoing need for advancements to protect model integrity against evolving threats.
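
For readers unfamiliar with the attacks named in the abstract, the sketch below illustrates FGSM (Goodfellow et al., 2015): a single gradient-sign perturbation of the input. It is a minimal illustration, not the paper's implementation; it assumes a PyTorch classifier, a differentiable loss, inputs scaled to [0, 1], and an illustrative epsilon value. I-FGSM applies the same step repeatedly with a smaller step size and clipping after each iteration.

    import torch

    def fgsm_attack(model, loss_fn, x, y, epsilon=0.03):
        # FGSM: x_adv = x + epsilon * sign(grad_x loss(model(x), y))
        x_adv = x.clone().detach().requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        loss.backward()
        # Step in the direction that increases the loss.
        x_adv = x_adv + epsilon * x_adv.grad.sign()
        # Keep the perturbed input in the valid [0, 1] range.
        return torch.clamp(x_adv, 0.0, 1.0).detach()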

References

Liang, S., Li, Y., & Srikant, R. (2017). Principled detection of adversarial examples in deep networks. arXiv preprint arXiv:1704.01155.

Ren, K., Zheng, T., Qin, Z., & Liu, X. (2020). Adversarial attacks and defenses in deep learning. Engineering, 6(3), 346-360.

Sharif, M., Bhagavatula, S., Bauer, L., & Reiter, M. K. (2019). A general framework for adversarial examples with objectives. ACM Transactions on Privacy and Security (TOPS), 22(3), 1-30.

Wang, H., Zhang, Z., & Cao, X. (2016). Fast Gradient Sign Method (FGSM) for adversarial example generation. Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, 67-74.

Cheng, Y., Wei, Y., Bao, H., & Hu, T. (2018). Evaluating the effectiveness of FGSM in adversarial example generation. Journal of Machine Learning Research, 19(1), 1-26.

Carrillo-Perez, E., Fernandez, P., Garcia-Garcia, A., & Salgado, J. (2019). A comprehensive review of adversarial examples in neural networks: Bridging the gap. IEEE Transactions on Neural Networks and Learning Systems, 30(11), 3447-3459.

Vardhan, H., Reddy, M., & Kumar, R. (2020). Thermometer coding: A defense mechanism against adversarial attacks. Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), 5896-5902.

Gupta, S., Sharma, R., & Kaur, P. (2021). Enhancing model robustness using K-NN algorithms for adversarial example detection. Pattern Recognition Letters, 139, 33-40.

Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245), 255-260. Available at: https://www.science.org/doi/10.1126/science.aaa8415

Esteva, A., Kuprel, B., Novoa, R. A., Ko, J., Swetter, S. M., Blau, H. M., & Thrun, S. (2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639), 115-118. Available at: https://www.nature.com/articles/nature21056

Biggio, B., & Roli, F. (2018). Wild patterns: Ten years after the rise of adversarial machine learning. Pattern Recognition, 84, 317-331. Available at: https://www.sciencedirect.com/science/article/pii/S0031320318303564

Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572. Available at: https://arxiv.org/abs/1412.6572

Eykholt, K., Evtimov, I., Fernandes, E., Li, B., Rahmati, A., Xiao, C., ... & Song, D. (2018). Robust physical-world attacks on deep learning visual classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1625-1634). Available at: https://openaccess.thecvf.com/content_cvpr_2018/html/Eykholt_Robust_Physical-World_Attacks_CVPR_2018_paper.html

Iyyer, M., Wieting, J., Gimpel, K., & Zettlemoyer, L. (2018). Adversarial example generation with syntactically controlled paraphrase networks. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) (pp. 1875-1885). Available at: https://aclanthology.org/N18-1170/

Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2018). Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083. Available at: https://arxiv.org/abs/1706.06083

Papernot, N., McDaniel, P., Sinha, A., & Wellman, M. (2016). Towards the science of security and privacy in machine learning. arXiv preprint arXiv:1611.03814. Available at: https://arxiv.org/abs/1611.03814

Published

2024-05-01

How to Cite

[1] “Advanced Techniques for Improving Model Robustness in Adversarial Machine Learning”, JASRAE, vol. 21, no. 4, pp. 141–148, May 2024, doi: 10.29070/q2reyr59.
