Main Article Content

Authors

Jai Ramesh Juneja

Abstract

Compound AI systems are an emerging paradigm that integrates large language models (LLMs) with additional components such as agents, retrievers, orchestrators, and tools to address the limitations of individual models on tasks that require reasoning, memory, multimodal knowledge, and real-time grounding. These systems compose specialized modules into cohesive workflows, enabling more context-aware and capable behavior. Despite growing adoption in both industry and academia, the landscape of compound AI systems remains fragmented, lacking a unified framework for taxonomy, analysis, and evaluation.
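
To make the idea of module composition concrete, the sketch below chains a stand-in retriever and a stand-in generator into a single pipeline; every name and interface in it is hypothetical and is intended only to illustrate the compositional pattern described above, not any specific surveyed system.

```python
# A minimal sketch (all names and interfaces are hypothetical) of how a
# compound AI pipeline composes specialized modules -- here a retriever and
# a generator -- into a single workflow.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Module:
    """A named pipeline stage wrapping any callable component."""
    name: str
    run: Callable[[str], str]


def retrieve(query: str) -> str:
    # Stand-in retriever; a real system would query a search index or vector store.
    return f"context({query})"


def generate(prompt: str) -> str:
    # Stand-in LLM call; a real system would invoke a model API here.
    return f"answer based on {prompt}"


class CompoundPipeline:
    """Runs modules in order, feeding each stage's output into the next."""

    def __init__(self, modules: List[Module]):
        self.modules = modules

    def __call__(self, query: str) -> str:
        state = query
        for module in self.modules:
            state = module.run(state)
        return state


pipeline = CompoundPipeline([Module("retriever", retrieve),
                             Module("generator", generate)])
print(pipeline("What is a compound AI system?"))
```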


Recent advances in LLMs and AI systems have substantially changed how complex workflows are designed and optimized. Compound AI systems coordinate multiple components to carry out sophisticated tasks, and as these systems grow more complex, new challenges arise in optimizing both the individual components and the interactions between them. While traditional optimization paradigms such as reinforcement learning (RL) and supervised fine-tuning (SFT) remain foundational, advances in natural language processing (NLP) open promising new avenues for optimizing such systems. This study offers a systematic review of recent developments in compound AI systems: it formalizes the notion of optimizing compound AI systems, classifies current approaches along major dimensions, and highlights open research challenges.
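
As a rough illustration of what optimizing a single component might look like, the sketch below (toy scorer, hypothetical names, no real model calls) searches over candidate prompts for one stage of a pipeline and keeps the variant that scores best on a small development set.

```python
# A minimal sketch (hypothetical names, toy scorer) of component-level
# optimization in a compound pipeline: try several prompt variants for one
# module and keep the variant that scores best on a small development set.

from typing import Callable, List, Tuple

Pipeline = Callable[[str], str]


def evaluate(pipeline: Pipeline, dev_set: List[Tuple[str, str]]) -> float:
    """Fraction of dev examples whose expected answer appears in the output."""
    hits = sum(expected in pipeline(question) for question, expected in dev_set)
    return hits / len(dev_set)


def optimize_prompt(make_pipeline: Callable[[str], Pipeline],
                    candidates: List[str],
                    dev_set: List[Tuple[str, str]]) -> str:
    """Greedy search over candidate prompts; return the highest-scoring one."""
    scored = [(evaluate(make_pipeline(prompt), dev_set), prompt)
              for prompt in candidates]
    return max(scored)[1]


# Toy pipeline factory: the "model" simply echoes the prompt, the question,
# and a fixed answer, so the example runs without any external model or API.
def make_pipeline(prompt: str) -> Pipeline:
    return lambda question: f"{prompt} {question} -> 42"


best = optimize_prompt(
    make_pipeline,
    candidates=["Answer briefly:", "Think step by step, then answer:"],
    dev_set=[("What is 6 * 7?", "42")],
)
print("selected prompt:", best)
```

In practice such a search would be driven by richer signals, for example RL rewards, supervised labels, or natural-language feedback, rather than the toy substring check used here.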


Article Details

Section

Articles
