The development, application, and future of LLMs similar to ChatGPT (类ChatGPT大模型发展、应用和前景)
2023, Vol. 28, No. 9, Pages: 2749-2762
Print publication date: 2023-09-16
DOI: 10.11834/jig.230536
Yan Hao, Liu Yuliang, Jin Lianwen, Bai Xiang. 2023. The development, application, and future of LLMs similar to ChatGPT. Journal of Image and Graphics, 28(09): 2749-2762
Since the release of ChatGPT, generative artificial intelligence technology has continually broken through its bottlenecks, attracting large-scale capital investment, driving change across many fields, and receiving close attention from governments. This paper first analyzes the development, application status, and prospects of large models, and then briefly introduces the related technologies from the following three aspects: 1) an overview of large-model construction techniques, including the construction pipeline, research status, and optimization techniques; 2) a summary of the three mainstream image–text multimodal techniques for large models; 3) an introduction to the three types of large-model evaluation benchmarks, categorized by evaluation method. Parameter optimization and dataset construction are the core issues for the popularization and iteration of large-model products; multimodal capability is one of the important directions for large-model development; and establishing evaluation benchmarks is the key method for comparing and constraining large models. In addition, this paper discusses the challenges facing existing technologies and possible future directions. Current large-model products already possess strong comprehension and creative abilities and have shown broad application prospects in fields such as education, medicine, and finance; at the same time, they still face problems such as difficult training and deployment, insufficient domain expertise, and security risks. Therefore, improving parameter optimization, high-quality dataset construction, multimodal techniques, and related technologies, and establishing unified, comprehensive, and convenient evaluation benchmarks, will be the key for large models to break through their current limitations.
Generative artificial intelligence (AI) technology has achieved remarkable breakthroughs and advances in its intelligence level since the release of ChatGPT several months ago, especially in terms of its scope, automation, and intelligence. The rising popularity of generative AI attracts capital inflows and drives innovation across many fields. Moreover, governments worldwide pay considerable attention to generative AI and hold differing attitudes toward it: the US government maintains a relatively relaxed stance to stay ahead in the global technological arena; European countries are more conservative and concerned about data privacy in large language models (LLMs); and the Chinese government attaches great importance to AI and LLMs while also emphasizing regulatory issues. With the growing influence of ChatGPT and its competitors and the rapid development of generative AI technology, a deep analysis of them has become necessary.

This paper first provides an in-depth analysis of the development, application, and prospects of generative AI. Various LLMs have emerged as remarkable technological products with versatile capabilities across multiple domains, such as education, medicine, finance, law, programming, and paper writing. These models are usually fine-tuned from general LLMs, with the aim of endowing them with additional domain-specific knowledge and enhanced adaptability to a specific domain. LLMs (e.g., GPT-4) have improved rapidly in the past few months in terms of professional knowledge, reasoning, coding, credibility, security, transferability, and multimodality.

The technical contributions of generative AI are then briefly introduced from four aspects. 1) We review related work on LLMs, such as GPT-4, PaLM2, and ERNIE Bot, and their construction pipeline, which involves the training of base and assistant models; the base models store a large amount of linguistic knowledge, while the assistant models acquire stronger comprehension and generation capabilities after a series of fine-tuning stages. 2) We outline a series of public LLMs built on LLaMA, a family of open and efficient foundation language models, including Alpaca, Vicuna, Koala, and Baize, as well as the key technologies for building LLMs with low memory and computation requirements, namely low-rank adaptation (LoRA), Self-instruct, and the automatic prompt engineer (LoRA and Self-instruct are illustrated in the sketches following this abstract). 3) We summarize three types of mainstream image–text multimodal techniques: training additional adaptation layers to align visual modules with language models (see the adapter sketch below), multimodal instruction fine-tuning, and using the LLM as the center of understanding. 4) We introduce three types of LLM evaluation benchmarks, categorized by implementation method: manual evaluation, automatic evaluation, and LLM evaluation.

Parameter optimization and fine-tuning dataset construction are crucial for the popularization and innovation of generative AI products because they significantly reduce the training cost and computational resource consumption of LLMs while enhancing their diversity and generalization ability. Multimodal capability is the future trend of generative AI because multimodal models can integrate information from multiple perceptual dimensions, which is consistent with human cognition.
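As a concrete illustration of the low-rank adaptation (LoRA) technique listed in aspect 2), a minimal PyTorch sketch follows; the layer dimensions and usage are hypothetical placeholders, not details from the paper. LoRA freezes a pretrained weight matrix and learns only a low-rank additive update, which is what makes fine-tuning feasible with low memory and computation (Hu et al., 2022).

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA layer: y = base(x) + (alpha/r) * x A^T B^T.
    The pretrained layer is frozen; only the low-rank factors A and B train."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)             # freeze the pretrained layer
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(d_out, r))        # up-projection, zero-init
        self.scale = alpha / r                  # keeps update magnitude stable across r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The update adds only r * (d_in + d_out) trainable parameters
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Hypothetical usage on a 4096-d projection, as found in LLaMA-scale models
layer = LoRALinear(nn.Linear(4096, 4096), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable: {trainable}")  # 65,536 vs. ~16.8M in the frozen base layer
```

Because B is zero-initialized, training starts exactly from the pretrained behavior, and the learned update can be folded back into the base weight at deployment time.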
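Self-instruct, the second low-cost construction technique named in aspect 2), bootstraps an instruction-tuning dataset from a small seed set. The simplified sketch below is illustrative only: `complete` is a hypothetical helper wrapping an LLM API, and the word-overlap filter is a crude stand-in for the ROUGE-L filtering of Wang et al. (2023b).

```python
import random

def overlap(a: str, b: str) -> float:
    """Crude word-overlap similarity, a stand-in for ROUGE-L filtering."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(1, len(wa | wb))

def self_instruct(seed_tasks: list[str], complete, rounds: int = 100,
                  min_len: int = 10) -> list[str]:
    """Simplified Self-instruct loop: sample a few existing instructions,
    prompt the model to invent a new one, filter near-duplicates, and
    grow the task pool iteratively."""
    pool = list(seed_tasks)
    for _ in range(rounds):
        examples = random.sample(pool, k=min(4, len(pool)))
        prompt = "Here are some task instructions:\n"
        prompt += "\n".join(f"- {t}" for t in examples)
        prompt += "\nWrite one new, different task instruction:\n-"
        candidate = complete(prompt).strip()  # hypothetical LLM call
        # Keep only sufficiently long, sufficiently novel instructions
        if len(candidate) >= min_len and all(overlap(candidate, t) <= 0.7
                                             for t in pool):
            pool.append(candidate)
    return pool
```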
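The first multimodal route in aspect 3), training an additional adaptation layer to align a visual module with a frozen language model, can be sketched as follows. The encoder and LLM interfaces, as well as the 1024/4096 dimensions, are assumed placeholders rather than the configuration of any specific system such as MiniGPT-4 or LLaVA, which this pattern loosely follows.

```python
import torch
import torch.nn as nn

class VisionLanguageAdapter(nn.Module):
    """Sketch of adapter-based alignment: a frozen vision encoder feeds a
    small trainable projection whose outputs the frozen LLM consumes as if
    they were ordinary token embeddings."""

    def __init__(self, vision_encoder: nn.Module, llm: nn.Module,
                 vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.vision_encoder = vision_encoder
        self.llm = llm
        for p in self.vision_encoder.parameters():
            p.requires_grad_(False)   # both large modules stay frozen;
        for p in self.llm.parameters():
            p.requires_grad_(False)   # only the projection below is trained
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, image: torch.Tensor, text_embeds: torch.Tensor):
        patches = self.vision_encoder(image)   # (B, N, vision_dim)
        visual_tokens = self.proj(patches)     # (B, N, llm_dim)
        # Prepend projected visual tokens to the text embedding sequence
        inputs = torch.cat([visual_tokens, text_embeds], dim=1)
        return self.llm(inputs)
```

Training only the projection keeps the cost close to that of fine-tuning a single linear layer, at the price of relying entirely on the frozen modules' representations.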
Evaluation benchmarks are the key method for comparing and constraining generative AI models, given that they can efficiently measure and optimize the performance and generalization ability of LLMs and reveal their strengths and limitations (a pairwise LLM-as-judge sketch follows this abstract). In conclusion, improving parameter optimization, high-quality dataset construction, multimodal capability, and other technologies, and establishing unified, comprehensive, and convenient evaluation benchmarks, will be the key to further development in generative AI.

Furthermore, this paper discusses the current challenges and possible future directions of the related technologies. Existing generative AI products show considerable creativity, understanding, and intelligence and have broad application prospects in various fields, such as empowering content creation, innovating interactive experiences, creating "digital life," serving as smart home and family assistants, and enabling autonomous driving and intelligent in-car interaction. However, LLMs still exhibit limitations, such as a lack of high-quality training data, susceptibility to hallucination, factual errors in their output, poor interpretability, high training and deployment costs, and security and privacy issues.

Potential research directions can therefore be divided into three aspects. 1) The data aspect focuses on the input and output of LLMs, including the construction of general instruction-tuning datasets and domain-specific knowledge datasets. 2) The technical aspect improves the internal structure and function of LLMs, including training, multimodality, principle innovation, and structure pruning. 3) The application aspect enhances the practical effect and application value of LLMs, including security enhancement, evaluation system development, and the engineering implementation of LLM applications.

The advancement of generative AI has brought remarkable benefits for economic development, but it also presents new opportunities and challenges for various stakeholders, especially industry and the general public. On the one hand, industry needs to foster a large pool of researchers who can conduct systematic and cutting-edge research on constantly improving generative AI technologies. On the other hand, the general public needs to acquire and apply the skills of prompt engineering, which enable them to use existing LLMs effectively and efficiently.
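For the LLM-evaluation benchmark type discussed above, a common pattern is pairwise comparison in the style of MT-Bench and Chatbot Arena (Zheng et al., 2023). The sketch below is illustrative only: `judge` is a hypothetical callable wrapping an evaluator LLM, and each pair is scored in both orders to control for the position bias reported in that work.

```python
JUDGE_TEMPLATE = """You are an impartial judge. Given a question and two
answers, reply with exactly "A", "B", or "tie" for the better answer.

Question: {question}
Answer A: {answer_a}
Answer B: {answer_b}
Verdict:"""

def pairwise_judge(question: str, ans1: str, ans2: str, judge) -> str:
    """LLM-as-judge with order swapping: only verdicts that agree across
    both orderings count as wins, mitigating position bias.
    `judge` is a hypothetical callable wrapping an evaluator LLM."""
    v1 = judge(JUDGE_TEMPLATE.format(question=question,
                                     answer_a=ans1, answer_b=ans2)).strip()
    v2 = judge(JUDGE_TEMPLATE.format(question=question,
                                     answer_a=ans2, answer_b=ans1)).strip()
    if v1 == "A" and v2 == "B":
        return "model_1"   # model 1 wins under both orderings
    if v1 == "B" and v2 == "A":
        return "model_2"   # model 2 wins under both orderings
    return "tie"           # disagreement or explicit tie
```

Requiring agreement across both orderings trades some decisiveness (more ties) for robustness against the judge's preference for the first-listed answer.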
artificial intelligence (AI); ChatGPT; multi-modality; natural language processing; large language model (LLM)
Alayrac J B, Donahue J, Luc P, Miech A, Barr I, Hasson Y, Lenc K, Mensch A, Millican K, Reynolds M, Ring R, Rutherford E, Cabi S, Han T D, Gong Z T, Samangooei S, Monteiro M, Menick J L, Borgeaud S, Brock A, Nematzadeh A, Sharifzadeh S, Bińkowski M, Barreira R, Vinyals O, Zisserman A and Simonyan K. 2022. Flamingo: a visual language model for few-shot learning//Proceedings of the 36th International Conference on Neural Information Processing Systems. New Orleans, USA: [s.n.]: 23716-23736
Bubeck S, Chandrasekaran V, Eldan R, Gehrke J, Horvitz E, Kamar E, Lee P, Lee Y T, Li Y Z, Lundberg S, Nori H, Palangi H, Ribeiro M T and Zhang Y. 2023. Sparks of artificial general intelligence: early experiments with GPT-4[EB/OL]. [2023-07-01]. https://arxiv.org/pdf/2303.12712.pdf
Clark P, Cowhey I, Etzioni O, Khot T, Sabharwal A, Schoenick C and Tafjord O. 2018. Think you have solved question answering? Try ARC, the AI2 reasoning challenge[EB/OL]. [2023-07-01]. https://arxiv.org/pdf/1803.05457.pdf
Devlin J, Chang M W, Lee K and Toutanova K. 2019. BERT: pre-training of deep bidirectional transformers for language understanding//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics. Minneapolis, USA: Association for Computational Linguistics: 4171-4186 [DOI: 10.18653/v1/N19-1423]
Dubois Y, Li X C, Taori R, Zhang T Y, Gulrajani I, Ba J, Guestrin C, Liang P and Hashimoto T B. 2023. AlpacaFarm: a simulation framework for methods that learn from human feedback[EB/OL]. [2023-07-01]. https://arxiv.org/pdf/2305.14387.pdf
Girdhar R, El-Nouby A, Liu Z, Singh M, Alwala K V, Joulin A and Misra I. 2023. ImageBind: one embedding space to bind them all//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 15180-15190
Guo B Y, Zhang X, Wang Z Y, Jiang M Q, Nie J R, Ding Y X, Yue J W and Wu Y P. 2023. How close is ChatGPT to human experts? Comparison corpus, evaluation, and detection[EB/OL]. [2023-07-01]. https://arxiv.org/pdf/2301.07597.pdf
Hendrycks D, Burns C, Basart S, Zou A, Mazeika M, Song D and Steinhardt J. 2021. Measuring massive multitask language understanding//Proceedings of the 9th International Conference on Learning Representations. [s.l.]: OpenReview.net
Hu E J, Shen Y L, Wallis P, Allen-Zhu Z, Li Y Z, Wang S A, Wang L and Chen W Z. 2022. LoRA: low-rank adaptation of large language models//Proceedings of the Tenth International Conference on Learning Representations. [s.l.]: OpenReview.net
Kim S, Bae S, Shin J, Kang S, Kwak D, Yoo K M and Seo M. 2023. Aligning large language models through synthetic feedback[EB/OL]. [2023-07-01]. https://arxiv.org/pdf/2305.13735.pdf
Li J N, Li D X, Savarese S and Hoi S. 2023. BLIP-2: bootstrapping language-image pre-training with frozen image encoders and large language models[EB/OL]. [2023-07-01]. https://arxiv.org/pdf/2301.12597.pdf
Liang P P, Zadeh A and Morency L P. 2022. Foundations and trends in multimodal machine learning: principles, challenges, and open questions[EB/OL]. [2023-07-01]. https://arxiv.org/pdf/2209.03430.pdf
Liu H T, Li C Y, Wu Q Y and Lee Y J. 2023a. Visual instruction tuning[EB/OL]. [2023-07-01]. https://arxiv.org/pdf/2304.08485.pdf
Liu Y, Duan H D, Zhang Y H, Li B, Zhang S Y, Zhao W B, Yuan Y K, Wang J Q, He C H, Liu Z W, Chen K and Lin D H. 2023b. MMBench: is your multi-modal model an all-around player?[EB/OL]. [2023-07-01]. https://arxiv.org/pdf/2307.06281.pdf
Liu Y, Li H and Bai X. 2023. A brief analysis of ChatGPT: historical evolution, current applications, and future prospects. Journal of Image and Graphics, 28(04): 893-902
Mankowitz D J, Michi A, Zhernov A, Gelmi M, Selvi M, Paduraru C, Leurent E, Iqbal S, Lespiau J B, Ahern A, Köppe T, Millikin K, Gaffney S, Elster S, Broshear J, Gamble C, Milan K, Tung R, Hwang M, Cemgil T, Barekatain M, Li Y J, Mandhane A, Hubert T, Schrittwieser J, Hassabis D, Kohli P, Riedmiller M, Vinyals O and Silver D. 2023. Faster sorting algorithms discovered using deep reinforcement learning. Nature, 618(7964): 257-263 [DOI: 10.1038/s41586-023-06004-9]
Seenivasan L, Islam M, Kannan G and Ren H L. 2023. SurgicalGPT: end-to-end language-vision GPT for visual question answering in surgery[EB/OL]. [2023-07-01]. https://arxiv.org/pdf/2304.09974.pdf
Shen Y L, Song K T, Tan X, Li D S, Lu W M and Zhuang Y T. 2023. HuggingGPT: solving AI tasks with ChatGPT and its friends in Hugging Face[EB/OL]. [2023-07-01]. https://arxiv.org/pdf/2303.17580.pdf
Singhal K, Tu T, Gottweis J, Sayres R, Wulczyn E, Hou L, Clark K, Pfohl S, Cole-Lewis H, Neal D, Schaekermann M, Wang A, Amin M, Lachgar S, Mansfield P, Prakash S, Green B, Dominowska E, Arcas B A Y, Tomasev N, Liu Y, Wong R, Semturs C, Mahdavi S S, Barral J, Webster D, Corrado G S, Matias Y, Azizi S, Karthikesalingam A and Natarajan V. 2023. Towards expert-level medical question answering with large language models[EB/OL]. [2023-07-01]. https://arxiv.org/pdf/2305.09617.pdf
Tay Y, Dehghani M, Tran V Q, Garcia X, Wei J, Wang X Z, Chung H W, Bahri D, Schuster T, Zheng H S, Zhou D, Houlsby N and Metzler D. 2023. UL2: unifying language learning paradigms//Proceedings of the Eleventh International Conference on Learning Representations. Kigali, Rwanda: OpenReview.net: 1-33
Touvron H, Lavril T, Izacard G, Martinet X, Lachaux M A, Lacroix T, Rozière B, Goyal N, Hambro E, Azhar F, Rodriguez A, Joulin A, Grave E and Lample G. 2023. LLaMA: open and efficient foundation language models[EB/OL]. [2023-07-01]. https://arxiv.org/pdf/2302.13971.pdf
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł and Polosukhin I. 2017. Attention is all you need//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 6000-6010
Wang S, Zhao Z H, Ouyang X, Wang Q and Shen D G. 2023a. ChatCAD: interactive computer-aided diagnosis on medical image using large language models[EB/OL]. [2023-07-01]. https://arxiv.org/pdf/2302.07257.pdf
Wang Y Z, Kordi Y, Mishra S, Liu A, Smith N A, Khashabi D and Hajishirzi H. 2023b. Self-instruct: aligning language models with self-generated instructions//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. Toronto, Canada: ACL: 13484-13508 [DOI: 10.18653/v1/2023.acl-long.754]
Wu C F, Yin S M, Qi W Z, Wang X D, Tang Z C and Duan N. 2023b. Visual ChatGPT: talking, drawing and editing with visual foundation models[EB/OL]. [2023-07-01]. https://arxiv.org/pdf/2303.04671.pdf
Wu S J, Irsoy O, Lu S, Dabravolski V, Dredze M, Gehrmann S, Kambadur P, Rosenberg D and Mann G. 2023a. BloombergGPT: a large language model for finance[EB/OL]. [2023-07-01]. https://arxiv.org/pdf/2303.17564.pdf
Xu C W, Xu Y C, Wang S H, Liu Y, Zhu C G and McAuley J. 2023. Small models are valuable plug-ins for large language models[EB/OL]. [2023-07-01]. https://arxiv.org/pdf/2305.08848.pdf
Yang J F, Jin H Y, Tang R X, Han X T, Feng Q Z, Jiang H M, Yin B and Hu X. 2023a. Harnessing the power of LLMs in practice: a survey on ChatGPT and beyond[EB/OL]. [2023-07-01]. https://arxiv.org/pdf/2304.13712.pdf
Yang Z Y, Li L J, Wang J F, Lin K, Azarnasab E, Ahmed F, Liu Z C, Liu C, Zeng M and Wang L J. 2023b. MM-REACT: prompting ChatGPT for multimodal reasoning and action[EB/OL]. [2023-07-01]. https://arxiv.org/pdf/2303.11381.pdf
Yao S Y, Yu D, Zhao J, Shafran I, Griffiths T L, Cao Y and Narasimhan K. 2023. Tree of thoughts: deliberate problem solving with large language models[EB/OL]. [2023-07-01]. https://arxiv.org/pdf/2305.10601.pdf
Ye Q H, Xu H Y, Xu G H, Ye J B, Yan M, Zhou Y Y, Wang J Y, Hu A W, Shi P C, Shi Y Y, Li C L, Xu Y H, Chen H H, Tian J F, Qi Q, Zhang J and Huang F. 2023. mPLUG-Owl: modularization empowers large language models with multimodality[EB/OL]. [2023-07-01]. https://arxiv.org/pdf/2304.14178.pdf
Zhang J Y, Vahidian S, Kuo M, Li C Y, Zhang R Y, Wang G Y and Chen Y R. 2023. Towards building the federated GPT: federated instruction tuning[EB/OL]. [2023-07-01]. https://arxiv.org/pdf/2305.05644.pdf
Zheng L M, Chiang W L, Sheng Y, Zhuang S Y, Wu Z H, Zhuang Y H, Lin Z, Li Z H, Li D C, Xing E P, Zhang H, Gonzalez J E and Stoica I. 2023. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena[EB/OL]. [2023-07-01]. https://arxiv.org/pdf/2306.05685.pdf
Zhong Q H, Ding L, Liu J H, Du B and Tao D C. 2023. Can ChatGPT understand too? A comparative study on ChatGPT and fine-tuned BERT[EB/OL]. [2023-07-01]. https://arxiv.org/pdf/2302.10198.pdf
Zhou Y C, Muresanu A I, Han Z W, Paster K, Pitis S, Chan H and Ba J. 2023. Large language models are human-level prompt engineers//Proceedings of the Eleventh International Conference on Learning Representations. Kigali, Rwanda: OpenReview.net
Zhu D Y, Chen J, Shen X Q, Li X and Elhoseiny M. 2023. MiniGPT-4: enhancing vision-language understanding with advanced large language models[EB/OL]. [2023-07-01]. https://arxiv.org/pdf/2304.10592.pdf