Exploring the Potential of ChatGPT-4o in Translation Quality Assessment
Keywords:
Multidimensional Quality Metrics, ChatGPT-4o, Translation quality assessment, Large language models
Abstract
Large language models (LLMs) have shown significant potential in foreign language teaching and research. By evaluating two translations produced by MTI students, this study examines how effectively ChatGPT-4o assesses human translation under the Multidimensional Quality Metrics (MQM) framework. The study covers both literary and non-literary texts: the student translations are produced first and then revised separately by humans and by ChatGPT-4o. The resulting versions are then evaluated and compared against MQM standards. Score comparison and qualitative analysis show that ChatGPT-4o is highly consistent with human evaluators when assessing translations with the MQM method. Its scores and suggested modifications significantly improve translation quality, particularly in maintaining terminological consistency and ensuring grammatical accuracy.
License
Copyright (c) 2024 Jingjing Wang
This work is licensed under a Creative Commons Attribution 4.0 International License.