Deconstructing Digital Discourse: A Deep Dive into Distinguishing LLM-Powered Chatbots from Human Language

Jiarui Rao; Qian Zhang

Authors

Jiarui Rao, Uber Technologies Inc., LA, USA
Qian Zhang, The Chinese University of Hong Kong, HK

Keywords

DeBERTa v3, Machine learning, Chatbots

Abstract

In recent years, chatbots powered by large language models (LLMs) have attracted significant attention in artificial intelligence. These models are natural language processing systems trained with deep learning techniques. This study builds a pipeline to distinguish their output from human writing in four steps. First, the dataset is visualized and analyzed to understand its characteristics. Next, the text is preprocessed to clean and prepare it for training. Chatbot-generated text is then obtained and further processed using DeBERTa v3. Finally, a machine learning classifier is trained to separate chatbot-generated text from natural human language. On evaluation, the model achieves an accuracy of 85%, a precision of 60%, a recall of 62%, and an F1 score of 0.61. The high accuracy indicates that the model classifies most texts correctly. However, taking chatbot-generated text as the positive class, precision and recall near 60% mean that roughly four in ten texts flagged as chatbot-generated are actually human-written, and roughly four in ten chatbot-generated texts go undetected. An accuracy well above both precision and recall is consistent with chatbot-generated text being the minority class, in which case correct classification of the more numerous human texts lifts accuracy. The results therefore point to substantial remaining ambiguity: chatbot-generated text can still be readily confused with natural human language.
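The four reported scores are tied together by standard definitions. The sketch below (Python, standard library only) first checks that the reported F1 follows from the reported precision and recall, since F1 is their harmonic mean. It then uses a hypothetical confusion matrix, not the authors' data, to show how accuracy near 85% can coexist with precision and recall near 60% when chatbot-generated text is the minority class. It assumes chatbot-generated text is the positive class; the names `binary_metrics` and `f1_from` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    """Binary scores; positive class assumed to be chatbot-generated text."""
    accuracy: float
    precision: float
    recall: float
    f1: float


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> BinaryMetrics:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (no positive predictions / no positive labels).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1_from(precision, recall))


if __name__ == "__main__":
    # Consistency check on the reported scores: P = 0.60, R = 0.62.
    print(f"F1 from reported P/R: {f1_from(0.60, 0.62):.2f}")  # prints 0.61

    # HYPOTHETICAL counts: 1,000 texts, of which 189 are chatbot-generated.
    # Chosen only to illustrate the accuracy/precision gap under class imbalance.
    m = binary_metrics(tp=117, fp=78, fn=72, tn=733)
    print(f"accuracy={m.accuracy:.2f} precision={m.precision:.2f} "
          f"recall={m.recall:.2f} f1={m.f1:.2f}")
    # prints accuracy=0.85 precision=0.60 recall=0.62 f1=0.61
```

In this illustration the 733 correctly classified human texts dominate the accuracy numerator, while the 78 false positives and 72 missed chatbot texts weigh heavily against the much smaller positive class, which is what drags precision and recall down toward 60%.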

Journal of Theory and Practice in Education and Innovation (JTPEI)
