LongFinBERT: A Language Model for Very Long Financial Documents

Authors: Minh Tri Phan, Erik-Jan Senn
Dates: 2024-01-09, 2023-12-18
URL: https://www.alexandria.unisg.ch/handle/20.500.14171/119138

Abstract: This paper introduces LongFinBERT, a modern language model specialized for processing long financial documents. Owing to an adaptation of the model architecture, LongFinBERT has substantially lower computational requirements for lengthy documents than other state-of-the-art language models. This makes it possible to process, for example, an entire annual accounting filing at once, which was previously computationally infeasible for language models. We apply LongFinBERT in two empirical settings. First, we aim to improve the detection of financial misreporting using text from 10-K filings from 1994 to 2018. Misreporting predictions that use text-based features from LongFinBERT outperform those based solely on accounting variables or on other textual models, namely Latent Dirichlet Allocation, neural document embeddings, and FinBERT. Second, we find that market returns respond to year-over-year changes in accounting disclosures, measured using LongFinBERT.

Language: en
Keywords: 10-K filings, deep learning, language model, machine learning, financial misreporting detection
Type: working paper