FinLlama: LLM-Based Financial Sentiment Analysis for Algorithmic Trading

The sentiment contained in online textual sources can drive market movements; such information harbours intrinsic advantages and gives a competitive edge to those equipped with the tools to harness it. Sentiment analysis rests upon the quantification of opinions present in unlabelled textual data, and aims to categorize whether the overall perspective is positive, negative, or neutral. When applied to large-scale information sources, this promises to enhance the understanding of the overall direction of macroscopic trends, a task which is both challenging and time-consuming for human analysts. Despite these conceptual benefits, the diverse, nuanced, and vast nature of financial text presents unique challenges when it comes to extracting sentiment in a manner that is both accurate and actionable.

To address these issues, we consider the following fundamental questions:

  • Can large language models (LLMs), which have already revolutionized manifold areas of NLP, be specifically tailored for sentiment analysis in the finance domain, particularly for enhancing algorithmic trading?

  • Can this be achieved in a way which does not require vast computational resources, typically associated with NLP models, thus making the approach accessible to anyone equipped with standard computational resources?

To this end, we propose FinLlama, obtained by fine-tuning a pre-trained LLM (namely Llama 2 7B) on specialised, labelled and publicly available financial news datasets. The ultimate goal of FinLlama is to enhance the performance of financial sentiment analysis, whilst leveraging parameter-efficient fine-tuning (PEFT) and 8-bit quantization to minimise resource requirements.

Key Features of FinLlama

  1. Targeted Fine-Tuning: Rather than utilising one general LLM for financial tasks, our approach capitalises on the foundational pre-trained Llama 2 model, whereby fine-tuning is performed specifically for the purpose of sentiment classification through a SoftMax classification layer at its output.

  2. Efficient Resource Utilization: Our approach ensures that even standard computational resources, with no high-end GPUs, can be employed. By virtue of the pre-trained Llama 2 model and through targeted parameter-efficient fine-tuning, computational demands are dramatically reduced compared to the existing methods, thus bridging the gap between academic benchmarks and practical utility.

  3. Benchmarking and Real-World Application: Despite their success, fine-tuned LLMs for finance have not yet adequately addressed the domain of portfolio construction. To this end, we integrate the sentiment signals extracted by FinLlama into a long-short portfolio, which allows us to obtain finance-specific real-world metrics including cumulative returns and the Sharpe ratio.
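To make the SoftMax classification layer from item 1 concrete, the following minimal sketch turns three raw output scores (logits) into class probabilities and a sentiment label. The label order, the function names and the example logits are illustrative assumptions and are not taken from the FinLlama implementation.

```python
import math

# Assumed label order; the text does not state how the classes are indexed.
LABELS = ("negative", "neutral", "positive")


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return (label, probabilities) for the three logits of one article."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


# Made-up logits for a single article, for illustration only.
label, probs = classify([-1.2, 0.3, 2.1])
print(label, [round(p, 3) for p in probs])
```

The probabilities always sum to one, so the largest of the three determines the predicted sentiment class.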

Fine-Tuning Process

Our work revisits the first principles of LLMs in order to align them with the task of financial sentiment analysis. This is achieved by using four labelled financial text datasets as training data to fine-tune the Llama 2 model. Such finance-specific training equips the model with the ability to understand the linguistic nuances present in the financial domain. Furthermore, a three-class SoftMax classification layer is employed at the output of the foundational model, which changes the primary function of the LLM from text generation to sentiment classification. In this way, the proposed fine-tuned FinLlama model acts as a generator-discriminator and produces sentiment decision outputs for three labels: positive, negative or neutral. Moreover, the LoRA implementation employed in the fine-tuning process reduces the number of trainable parameters to 4.2 million, amounting to just 0.0638% of the total number of parameters in the Llama 2 7B model, which makes training feasible on a single A100 GPU.
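As a back-of-the-envelope check on the 4.2 million figure, the sketch below counts the parameters that LoRA would add under one plausible configuration. The hidden size (4096) and number of layers (32) are those of Llama 2 7B; the rank of 8 and the choice to adapt only the query and value projections are assumptions made here for illustration, since the text above does not state them.

```python
def lora_params_per_matrix(d_in, d_out, rank):
    """LoRA adds two low-rank factors, A (rank x d_in) and B (d_out x rank),
    next to a frozen d_out x d_in weight matrix."""
    return rank * (d_in + d_out)


HIDDEN_SIZE = 4096      # Llama 2 7B hidden dimension
NUM_LAYERS = 32         # Llama 2 7B transformer blocks
RANK = 8                # ASSUMED LoRA rank (not stated in the text)
ADAPTED_PER_LAYER = 2   # ASSUMED: query and value projections only

trainable = (
    NUM_LAYERS
    * ADAPTED_PER_LAYER
    * lora_params_per_matrix(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
)
print(f"{trainable:,} trainable LoRA parameters")
```

Under these assumptions the count works out to 4,194,304, which is consistent with the roughly 4.2 million trainable parameters quoted above. The small classification head is not included in this count.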

Proposed Framework

After establishing the proposed fine-tuned Llama 2 model, we followed the framework shown in Figure 1, with the aim of assessing the performance of our FinLlama model against other established sentiment analysis methods, using finance-specific real-world metrics.

 

Figure 1: Framework for sentiment analysis

 

Data Collection and Processing: Both textual and market data were analysed in order to construct appropriate long-short (L/S) portfolios. Regarding the textual data, articles dating between 2015 and 2021 were collected from online sources. Financial market data were collected for the same time period from Yahoo Finance. These market data contained daily stock returns for the 500 companies in our Investable Universe (S&P 500). Data processing in the form of Named Entity Recognition (NER) and text pre-processing was then applied to the textual data, to remove irrelevant articles and ensure the compatibility of the articles with our sentiment methods.

Sentiment Analysis: In total, five sentiment analysis methods were applied. For the lexicon-based approaches, LMD, HIV-4 and VADER were employed. Regarding the deep learning methods, the FinBERT model and our FinLlama were utilized. The considered methods were evaluated on every article within each corpus for a given company. In cases where multiple articles were published on the same day for a given company, the average sentiment for that day was calculated.
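The daily averaging step described above can be sketched as follows. The tuple layout, the function name and the use of numeric per-article scores are assumptions for illustration; the text does not specify how each method's output is mapped to a number.

```python
from collections import defaultdict
from statistics import mean


def daily_sentiment(article_scores):
    """Average per-article scores into one score per (company, day).

    article_scores: iterable of (ticker, date_string, score) tuples.
    """
    buckets = defaultdict(list)
    for ticker, day, score in article_scores:
        buckets[(ticker, day)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}


# Hypothetical scores: two AAPL articles on the same day are averaged.
articles = [
    ("AAPL", "2020-03-02", 1.0),
    ("AAPL", "2020-03-02", -1.0),
    ("AAPL", "2020-03-03", 1.0),
    ("MSFT", "2020-03-02", 0.0),
]
daily = daily_sentiment(articles)
print(daily)
```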

Portfolio Construction and Evaluation: Once the sentiment for each method was defined for every company, we constructed a long-short portfolio. The sentiment scores were used to determine which companies should be placed in long or short positions, aiming to maximize returns from both. Specifically, the top 35% of companies with the highest positive sentiment were placed in long positions, while the bottom 35% with the strongest negative sentiment were placed in short positions. All positions were equally weighted, as this is the weighting strategy most commonly used by hedge funds.
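A minimal sketch of this selection rule is given below. It ranks companies by sentiment score, goes long the top 35% and short the bottom 35%, and weights every position equally within its leg. Normalising each leg to sum to one, rounding the position count down, and the ticker names are all assumptions made for illustration.

```python
def long_short_weights(scores, fraction=0.35):
    """Equal-weight long-short portfolio from a {ticker: sentiment score} dict.

    Returns {ticker: weight}: positive weights are long positions, negative
    weights are short positions. Tickers in the middle of the ranking are
    left out. Assumes fraction <= 0.5 so the two legs cannot overlap.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = int(len(ranked) * fraction)  # positions per leg, rounded down
    if k == 0:
        return {}
    weights = {ticker: 1.0 / k for ticker in ranked[:k]}
    weights.update({ticker: -1.0 / k for ticker in ranked[-k:]})
    return weights


# Hypothetical sentiment scores for ten companies on one day.
scores = {
    "A": 0.9, "B": 0.7, "C": 0.5, "D": 0.3, "E": 0.1,
    "F": -0.1, "G": -0.3, "H": -0.5, "I": -0.7, "J": -0.9,
}
weights = long_short_weights(scores)
print(weights)
```

With ten companies, three are held long (A, B, C) and three short (H, I, J), so the portfolio's net exposure is zero.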

The performance of the portfolio constructed using our fine-tuned model was assessed against the portfolios constructed using the other SOTA sentiment methods. To this end, the employed real-world financial metrics were: cumulative returns, annualized return, annualized volatility, and the Sharpe ratio.
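The four metrics can be computed from a series of daily portfolio returns as in the sketch below. It uses common conventions that are assumptions here rather than details confirmed by the text: compounded returns, 252 trading days per year, the sample standard deviation, and a zero risk-free rate by default.

```python
import math
from statistics import stdev

TRADING_DAYS = 252  # ASSUMED number of trading days per year


def performance(daily_returns, risk_free_rate=0.0):
    """Cumulative return, annualized return, annualized volatility and
    Sharpe ratio for a list of daily simple returns (at least two values,
    not all identical, otherwise the volatility is zero)."""
    growth = math.prod(1.0 + r for r in daily_returns)
    years = len(daily_returns) / TRADING_DAYS
    annualized_return = growth ** (1.0 / years) - 1.0
    annualized_vol = stdev(daily_returns) * math.sqrt(TRADING_DAYS)
    return {
        "cumulative_return": growth - 1.0,
        "annualized_return": annualized_return,
        "annualized_volatility": annualized_vol,
        "sharpe_ratio": (annualized_return - risk_free_rate) / annualized_vol,
    }


# Made-up returns covering exactly one assumed trading year.
example = performance([0.01, -0.005] * 126)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

Because the example spans exactly one assumed year, its annualized return equals its cumulative return, which is a quick sanity check on the annualization step.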

Experimental Results

The performances of the five portfolios constructed are illustrated in Figure 2. Notice that the deep learning approaches outperformed the lexicon-based approaches in terms of cumulative returns, particularly those relying on general-purpose dictionaries (HIV-4 and VADER). This was to be expected, given that lexicon-based approaches often fail to capture the contextual meaning of sentences, whilst the nuanced nature of financial text significantly reduces the accuracy of general-purpose dictionaries.

 

Figure 2: Comparison of the performance of the 35% long-short portfolios which were constructed using the five considered sentiment analysis methods, for the time period of February 2015 to June 2021. The MA(30) and MSTD(30) represent, respectively, the moving average and the moving standard deviation of the returns calculated over a 30-day rolling window.
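For reference, the MA(30) and MSTD(30) quantities mentioned in the caption can be computed with a simple rolling window, as sketched below. The function name is hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def rolling_stats(values, window=30):
    """Moving average and moving standard deviation over a rolling window.

    Each output entry summarises the `window` most recent values, so both
    returned lists have len(values) - window + 1 entries.
    """
    moving_avg, moving_std = [], []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        moving_avg.append(mean(chunk))
        moving_std.append(stdev(chunk))
    return moving_avg, moving_std


# Tiny illustration with a window of 2 instead of 30.
ma, mstd = rolling_stats([1.0, 2.0, 3.0, 4.0], window=2)
print(ma, mstd)
```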

 

The quantitative results, displayed in Table 1, suggest that the 35% long-short portfolio constructed using our fine-tuned Llama 2 model was the most successful. Overall, our FinLlama model generated significantly higher returns for investors than all other considered methods, most importantly FinBERT, whilst simultaneously reducing portfolio risk and being more robust to turbulent economic periods, as indicated by the higher Sharpe ratio and lower annualized volatility.

 

Table 1: Statistical comparison between the performances of the five considered sentiment analysis methods using a 35% long-short portfolio. For Cumulative Returns, Annualized Return and Sharpe Ratio, higher is better. For Annualized Volatility, lower is better.

 

Conclusion

We have introduced an innovative approach to financial sentiment analysis which rests upon the fine-tuning of a general-purpose LLM. Our fine-tuned Llama 2 7B model, termed FinLlama, has been used to construct a long-short portfolio, yielding results that have surpassed those of the existing methods in the field. FinLlama has achieved cumulative returns which have outperformed the currently leading FinBERT model by 44.7%, while achieving a significantly higher Sharpe ratio and lower annualized volatility.

In addition, the present work has set a new benchmark in the field, transcending traditional measures such as accuracy and F1-score, which are commonly used in the literature. It is our hope that such an approach is a step towards narrowing the divide between academic research and practical applications within quantitative finance.

Disclaimer: Nothing herein is financial advice or a recommendation to trade real money. Please use common sense and always consult a professional before trading or investing.

For a detailed read of the paper, please visit: https://arxiv.org/abs/2403.12285
