LongFinBERT: A Language Model for Very Long Financial Documents
Type
working paper
Date Issued
2023-12-18
Author(s)
Abstract
This paper introduces LongFinBERT, a modern language model specialized for processing long financial documents. Owing to an adaptation of the model architecture, LongFinBERT has substantially lower computational requirements for lengthy documents than other state-of-the-art language models. This makes it possible to process, for example, an entire annual accounting filing at once, which was previously computationally infeasible for language models.
We apply LongFinBERT in two empirical settings. First, we aim to improve the detection of financial misreporting using text from 10-K filings from 1994 to 2018. Misreporting predictions that use text-based features from LongFinBERT outperform those based solely on accounting variables or on other textual models, namely Latent Dirichlet Allocation, neural document embeddings, and FinBERT. Second, we find that market returns respond to year-over-year changes in accounting disclosures, as measured using LongFinBERT.
Language
English
Keywords
10-K filings
deep learning
language model
machine learning
financial misreporting detection
HSG Classification
contribution to scientific community
Contact Email Address
triminh.phan@unisg.ch
Additional Information
Presented at CFE 2023 Berlin, to be presented at COMPSTAT 2024 Giessen and FFMM 2024 Lancaster. Last update 23.08.2024.
File(s)
open access
Name
phan_senn_longfinbert_2024_08_22.pdf
Size
3.6 MB
Format
Adobe PDF
Checksum (MD5)
745a48878f6c59b7c73b70f74dc1ad8c