Quantifying the Information Flow of Long Narratives: A Case Study of Jane Austen’s Works
Keywords:
information flow, entropy, narrative dynamics, Jane Austen, large language model
Abstract
This study applies digital humanities methods to analyze information flow in Jane Austen's classic novels using the GPT-2 XL model. Entropy, a measure of unpredictability, is used to quantify how the narrative dynamically engages readers. By calculating the entropy of each sentence, the research reveals distinct patterns of information gain across Austen's novels, reflecting the ebb and flow of reader surprise. Peaks in entropy correspond to narrative climaxes, while declines indicate more predictable plot developments. The findings suggest that digital tools can offer fresh insights into literary analysis by highlighting the interplay between predictability and surprise in narrative structure. This exploratory approach enriches traditional literary studies and opens new avenues for understanding reader engagement with classic texts.
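To make the idea of sentence-level entropy concrete, the following is a minimal sketch rather than the study's actual pipeline. It assumes that a sentence's entropy is the mean Shannon entropy (in bits) of the model's next-token distributions across the sentence; the abstract does not specify the aggregation. A toy bigram model stands in for GPT-2 XL so that the example runs without any downloads. The function names `shannon_entropy`, `train_bigram`, `next_token_distribution`, and `sentence_entropy` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def train_bigram(tokens):
    """Count next-token frequencies for each token (toy stand-in for an LLM)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_distribution(counts, prev):
    """Return next-token probabilities after `prev` (empty if `prev` is unseen)."""
    following = counts.get(prev)
    if not following:
        return []
    total = sum(following.values())
    return [n / total for n in following.values()]


def sentence_entropy(counts, sentence_tokens):
    """Assumed aggregation: mean next-token entropy over the sentence's tokens."""
    entropies = [
        shannon_entropy(next_token_distribution(counts, tok))
        for tok in sentence_tokens
    ]
    return sum(entropies) / len(entropies) if entropies else 0.0


# Tiny illustrative corpus (not data from the study).
model = train_bigram("a b a c a b".split())
predictable = sentence_entropy(model, ["b", "c"])  # each token has one continuation
surprising = sentence_entropy(model, ["a"])        # "a" is followed by "b" or "c"
```

In this toy setting, a sentence whose tokens always have a single possible continuation scores zero, while one with competing continuations scores higher. With a real language model, the bigram counts would be replaced by the model's softmax output at each position.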
License
Copyright (c) 2024 Tianyi Zhang, Junying Liang
This work is licensed under a Creative Commons Attribution 4.0 International License.