EUR/USD Forecasting with LLM & Deep Learning: The IUS Framework

1. Introduction

Accurate forecasting of the EUR/USD exchange rate is a critical challenge for global finance, impacting investors, multinational corporations, and policymakers. Traditional econometric models, reliant on structured macroeconomic indicators, often fail to capture real-time market volatility and the nuanced impact of news and geopolitical events. This paper introduces the IUS (Information-Unified-Structured) framework, a novel approach that fuses unstructured textual data (news, analysis) with structured quantitative data (exchange rates, financial indicators) to enhance prediction accuracy. By leveraging Large Language Models (LLMs) for advanced sentiment and movement classification, and integrating these insights with an Optuna-optimized Bidirectional Long Short-Term Memory (Bi-LSTM) network, the proposed method addresses key limitations in current forecasting paradigms.

2. The IUS Framework: Architecture & Methodology

The IUS framework is a systematic pipeline designed for multi-source financial data fusion and predictive modeling.

2.1. Multi-Source Data Integration

The framework ingests two primary data streams:

Structured Data: Historical EUR/USD exchange rates, key financial indicators (e.g., interest rates, inflation indices, GDP figures).
Unstructured Textual Data: News articles, financial reports, and market analysis pertaining to the Eurozone and US economies.

This combination aims to capture both the quantitative history and the qualitative sentiment driving market movements.

2.2. LLM-Powered Textual Feature Extraction

To overcome the challenges of noise and complex semantics in financial texts, the framework employs a Large Language Model (e.g., a GPT- or BERT-style model) for dual-purpose analysis:

Sentiment Polarity Scoring: Assigns each text document a numerical sentiment score on a continuous scale (e.g., from -1, bearish for EUR/USD, to +1, bullish).
Exchange Rate Movement Classification: Directly classifies the text's implied forecast on EUR/USD movement (e.g., Up, Down, Stable).

This step transforms unstructured text into actionable, numerical features.
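As a rough illustration of this step, the sketch below turns per-document LLM outputs into one row of numeric features per trading day. It assumes each document has already been scored by the LLM and aligned to a date. The `ScoredDoc` record, the `daily_text_features` helper, and the choice of averaging sentiment and using label shares are hypothetical, not the paper's stated aggregation.

```python
from collections import defaultdict
from dataclasses import dataclass

LABELS = ("Up", "Down", "Stable")


@dataclass
class ScoredDoc:
    """One news item after LLM scoring (hypothetical record layout)."""
    date: str         # trading day the item is aligned to, e.g. "2023-06-15"
    sentiment: float  # polarity in [-1, 1]: -1 bearish, +1 bullish for EUR/USD
    movement: str     # LLM movement label: "Up", "Down" or "Stable"


def daily_text_features(docs):
    """Aggregate document-level LLM outputs into one feature dict per day.

    Features: mean sentiment, document count, and the share of each movement label.
    """
    by_day = defaultdict(list)
    for doc in docs:
        if doc.movement not in LABELS:
            raise ValueError(f"unknown movement label: {doc.movement!r}")
        by_day[doc.date].append(doc)

    features = {}
    for day, items in sorted(by_day.items()):
        n = len(items)
        row = {"sentiment_mean": sum(d.sentiment for d in items) / n, "doc_count": n}
        for label in LABELS:
            # Fraction of the day's documents carrying this movement label.
            row[f"share_{label.lower()}"] = sum(d.movement == label for d in items) / n
        features[day] = row
    return features
```

In practice, the `date` alignment must respect publication time: news released after the market close belongs to the next trading day, otherwise the features leak future information.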

2.3. Causality-Driven Feature Generator

The generated textual features are combined with the pre-processed quantitative features. A causality analysis module (potentially using methods like Granger causality or attention mechanisms) is employed to identify and weight features based on their predictive causality concerning the future exchange rate, rather than mere correlation. This ensures the model focuses on the most relevant drivers.
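The source does not pin down the causality method. One concrete option is a pairwise Granger causality F-test. The sketch below implements the single-lag version, which asks whether adding yesterday's value of a candidate feature x to an autoregression of y on its own lag reduces the residual sum of squares by more than chance would. The lag order of 1, the helper names, and the idea of ranking features by the F statistic are assumptions for illustration. A production version would more likely use a statistics library (e.g., statsmodels) and select the lag order from the data.

```python
def solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular design matrix")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def ols_rss(rows, target):
    """Fit OLS through the normal equations and return the residual sum of squares."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    w = solve(xtx, xty)
    return sum((t - sum(wi * ri for wi, ri in zip(w, r))) ** 2 for r, t in zip(rows, target))


def granger_f_lag1(y, x):
    """F statistic for 'x Granger-causes y' with one lag.

    Restricted:   y_t = a + b*y_{t-1}
    Unrestricted: y_t = a + b*y_{t-1} + c*x_{t-1}
    """
    if len(x) != len(y) or len(y) < 5:
        raise ValueError("x and y must be aligned and reasonably long")
    target = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    unrestricted = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(unrestricted, target)
    n, k_u, q = len(target), 3, 1  # observations, unrestricted params, restrictions
    return ((rss_r - rss_u) / q) / (rss_u / (n - k_u))
```

A higher F statistic (equivalently, a lower p-value) marks a feature whose past values help predict the exchange rate. Note that Granger causality captures predictive precedence, not structural cause and effect.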

2.4. Optuna-Optimized Bi-LSTM Model

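The fused feature set is fed into a Bidirectional LSTM network. A Bi-LSTM runs one LSTM forward and another backward over each input window and concatenates their hidden states, so every time step is encoded with context from both earlier and later steps in the window. Because the input window ends at the forecast origin, the backward pass only adds context from later days inside the window; it does not see the future values being predicted. The hyperparameters (e.g., number of layers, hidden units, dropout rate, learning rate) are tuned automatically with Optuna, a hyperparameter optimization framework whose default Tree-structured Parzen Estimator (TPE) sampler is a form of sequential model-based (Bayesian) optimization, to find an effective model configuration.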
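Since the backward direction reads each window from its last step to its first, the window itself must stop at the forecast origin. The following is a minimal sketch of leakage-free window construction for one-step-ahead forecasting. It assumes the fused features are already stored as one vector per day. The function name and the lookback length are illustrative.

```python
def make_windows(features, target, lookback):
    """Build (window, label) pairs for one-step-ahead forecasting.

    features: list of per-day feature vectors (fused text + quantitative), oldest first.
    target:   list of per-day EUR/USD values aligned with `features`.
    Each window covers days t-lookback+1 .. t and its label is target[t+1], so
    both LSTM directions only see information available at day t.
    """
    if len(features) != len(target):
        raise ValueError("features and target must be aligned day by day")
    samples = []
    for t in range(lookback - 1, len(features) - 1):
        window = features[t - lookback + 1 : t + 1]
        samples.append((window, target[t + 1]))
    return samples
```

The same principle applies to the train/validation/test split, which should be chronological rather than shuffled.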

3. Experimental Setup & Results

3.1. Dataset & Baseline Models

Experiments were conducted on a dataset spanning several years of daily EUR/USD rates, corresponding macroeconomic indicators, and aligned financial news. The proposed IUS framework with Optuna-Bi-LSTM was compared against several strong baselines, including:

Standard LSTM and Bi-LSTM models using only structured data.
CNN-LSTM hybrid models.
Traditional econometric models (e.g., ARIMA).

3.2. Performance Metrics & Results

Model performance was evaluated using standard regression metrics: Mean Absolute Error (MAE) and Root Mean Square Error (RMSE).

Key Experimental Results

The IUS + Optuna-Bi-LSTM model achieved the best performance:

Reduced MAE by 10.69% compared to the best-performing baseline model.
Reduced RMSE by 9.56%.

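Interpretation: These are substantial gains in point-forecast accuracy. Because RMSE squares each error before averaging, it weights large misses more heavily than MAE; an RMSE reduction of similar size therefore indicates that the lower average error is not bought at the cost of larger occasional misses. The slightly smaller RMSE gain (9.56% vs. 10.69%) does not, however, point to especially better handling of outliers. The percentages alone also do not establish statistical significance, which would require a formal forecast-comparison test such as Diebold-Mariano.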
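For reference, the two metrics and the relative-reduction figures can be computed as follows. The example values are made-up numbers for illustration only and do not come from the paper.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error."""
    if len(actual) != len(predicted):
        raise ValueError("length mismatch")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error: square root of the mean squared error."""
    if len(actual) != len(predicted):
        raise ValueError("length mismatch")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def relative_reduction(baseline_error, model_error):
    """Percentage reduction of an error metric relative to a baseline."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Illustrative values only.
actual = [1.0, 2.0, 3.0]
pred = [1.5, 2.0, 2.0]
print(mae(actual, pred), rmse(actual, pred), relative_reduction(2.0, 1.5))
```

Read this way, a 10.69% MAE reduction means the model's MAE is 0.8931 times the baseline's. RMSE is never smaller than MAE on the same set of errors, and the gap between them widens as errors become more uneven.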

3.3. Ablation Study & Feature Importance

Ablation studies confirmed the value of data fusion:

Models using only structured data performed worse than the full IUS framework.
The combination of unstructured (text) and structured data yielded the highest accuracy.
Feature selection revealed that the optimal configuration combined the 12 most important quantitative features with the LLM-generated textual features.
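The selection step can be sketched as ranking quantitative features by an importance score, keeping the top k, and always retaining the LLM-generated text features. The source does not say how importance is computed. The feature names and scores below are hypothetical placeholders, and `k=12` simply mirrors the reported optimum.

```python
def select_features(quant_importance, text_features, k=12):
    """Keep the k highest-scoring quantitative features plus all text features.

    quant_importance: dict mapping quantitative feature name -> importance score
    text_features:    names of LLM-derived features, always kept
    """
    ranked = sorted(quant_importance, key=quant_importance.get, reverse=True)
    return ranked[:k] + list(text_features)


# Hypothetical importance scores.
importance = {"yield_spread_10y": 0.9, "prior_return": 0.7, "vix": 0.2, "cpi_yoy": 0.5}
print(select_features(importance, ["sentiment_mean", "share_up"], k=2))
```

An ablation like the one above would loop over k and over structured-only versus fused feature sets, retrain the model for each configuration, and compare validation error.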

4. Technical Deep Dive

Core Mathematical Formulation: The Bi-LSTM computation can be summarized as follows. For a given time step $t$ with input $x_t$, the forward LSTM computes the hidden state $\overrightarrow{h_t}$ from $x_t$ and $\overrightarrow{h_{t-1}}$, while the backward LSTM computes $\overleftarrow{h_t}$ from $x_t$ and $\overleftarrow{h_{t+1}}$. The output at step $t$ is their concatenation: $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$.
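For completeness, each direction uses the standard LSTM cell (Hochreiter & Schmidhuber, 1997), shown here in the now-common variant with a forget gate. For the forward direction, with $\sigma$ the logistic sigmoid, $\odot$ element-wise multiplication, and $W$, $U$, $b$ learned parameters:

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f \overrightarrow{h_{t-1}} + b_f) \\
i_t &= \sigma(W_i x_t + U_i \overrightarrow{h_{t-1}} + b_i) \\
o_t &= \sigma(W_o x_t + U_o \overrightarrow{h_{t-1}} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c \overrightarrow{h_{t-1}} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
\overrightarrow{h_t} &= o_t \odot \tanh(c_t)
\end{aligned}
$$

The backward direction applies the same equations with its own parameters. It replaces $\overrightarrow{h_{t-1}}$ and $c_{t-1}$ with $\overleftarrow{h_{t+1}}$ and the backward cell state at step $t+1$.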

The loss function minimized during training is typically the Mean Squared Error (MSE): $$L = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$ where $y_i$ is the actual future exchange rate and $\hat{y}_i$ is the model's prediction.

Optuna's Role: Optuna automates the search for hyperparameters $\theta$ (e.g., learning rate $\eta$, number of LSTM units). It defines an objective function $f(\theta)$ to be minimized (e.g., validation-set RMSE) and explores the search space with its default Tree-structured Parzen Estimator (TPE) sampler. TPE is a sequential model-based method that concentrates new trials in regions that have already produced low objective values. The framework itself is described in [Akiba et al., 2019].
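To make the TPE idea concrete, the sketch below implements a deliberately simplified, univariate TPE for a single hyperparameter, the base-10 log of the learning rate. After a few random start-up trials, it splits past trials into a "good" fraction (`gamma`) and the rest. It then fits a Gaussian Parzen density l(x) to the good points and g(x) to the others, draws candidates from l(x), and evaluates the candidate that maximizes l(x)/g(x). The objective `val_rmse_proxy` is a hypothetical stand-in for validation RMSE, and the bandwidth, `gamma`, and trial counts are arbitrary choices. Optuna's actual `TPESampler` handles many more details.

```python
import math
import random


def parzen_density(points, bandwidth):
    """Gaussian-kernel (Parzen) density estimate over 1-D `points`."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)

    return density


def simple_tpe(objective, low, high, n_trials=40, n_startup=10, gamma=0.25,
               n_candidates=24, bandwidth=0.3, seed=0):
    """Minimize a 1-D objective on [low, high] with a simplified TPE loop.

    Assumes n_startup >= 2 so both densities have points. Returns the best (x, loss).
    """
    rng = random.Random(seed)
    history = []  # list of (x, loss) pairs
    for trial in range(n_trials):
        if trial < n_startup:
            x = rng.uniform(low, high)  # start-up phase: plain random search
        else:
            ranked = sorted(history, key=lambda h: h[1])
            n_good = max(1, int(gamma * len(ranked)))
            good = [h[0] for h in ranked[:n_good]]
            bad = [h[0] for h in ranked[n_good:]]
            l_good = parzen_density(good, bandwidth)  # l(x): density of good trials
            g_bad = parzen_density(bad, bandwidth)    # g(x): density of the rest
            # Sample from l(x) (a good point plus Gaussian noise), clip to bounds,
            # and evaluate the candidate with the highest ratio l(x) / g(x).
            candidates = [min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
                          for _ in range(n_candidates)]
            x = max(candidates, key=lambda c: l_good(c) / max(g_bad(c), 1e-300))
        history.append((x, objective(x)))
    return min(history, key=lambda h: h[1])


def val_rmse_proxy(log10_lr):
    """Hypothetical stand-in for validation RMSE as a function of log10(learning rate)."""
    return (log10_lr + 3.0) ** 2


best_x, best_loss = simple_tpe(val_rmse_proxy, -5.0, -1.0)
print(f"learning rate ~ {10 ** best_x:.2e}, proxy loss {best_loss:.4f}")
```

With Optuna itself, the equivalent would be an `objective(trial)` function. It would call `trial.suggest_float("lr", 1e-5, 1e-1, log=True)` and similar methods for units, layers, and dropout, train the Bi-LSTM, and return validation RMSE. That function would then be passed to `optuna.create_study(direction="minimize").optimize(objective, n_trials=...)`.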

5. Analysis Framework: A Practical Case

Scenario: Forecasting EUR/USD movement for the next trading day following a European Central Bank (ECB) policy announcement.

Data Collection: Gather the day's ECB press release, analyst summaries from Reuters/Bloomberg, and structured data (current EUR/USD, bond yields, volatility index).
LLM Processing: Feed the textual documents into the LLM module. The model outputs: Sentiment Score = +0.7 (moderately bullish), Movement Classification = "Up".
Feature Fusion: These scores are combined with the 12 selected quantitative features (e.g., 10-year yield spread, prior day's return).
Causality Weighting: The feature generator assigns higher weight to the "Sentiment Score" and "Yield Spread" based on historical causal impact.
Prediction: The weighted feature vector is input to the trained Optuna-Bi-LSTM, which outputs a specific forecasted exchange rate value.

This case illustrates how the framework translates real-world events into a quantifiable, actionable forecast.
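The case study can be mirrored in a few lines. The sketch below encodes the LLM outputs as a sentiment score plus a one-hot movement label, appends the selected quantitative features, and multiplies each entry by a causality weight. All feature names, values, and weights are hypothetical placeholders, and only 2 of the 12 quantitative features are shown. The source does not specify the weighting scheme.

```python
MOVEMENT_LABELS = ("Up", "Down", "Stable")


def fuse_features(sentiment, movement, quant, weights):
    """Build a weighted feature vector for one trading day.

    sentiment: LLM polarity score in [-1, 1]
    movement:  LLM movement label, one of MOVEMENT_LABELS (one-hot encoded)
    quant:     dict of selected quantitative features (name -> value)
    weights:   dict of causality weights (name -> weight); missing names default to 1.0
    Returns (names, vector) so the column order is explicit.
    """
    if movement not in MOVEMENT_LABELS:
        raise ValueError(f"unknown movement label: {movement!r}")
    text = {"sentiment": sentiment}
    text.update({f"move_{lab.lower()}": float(movement == lab) for lab in MOVEMENT_LABELS})
    fused = {**text, **quant}
    names = list(fused)
    vector = [fused[name] * weights.get(name, 1.0) for name in names]
    return names, vector


# Scenario from the case study, with made-up quantitative values and weights.
names, vec = fuse_features(
    0.7, "Up",
    {"yield_spread_10y": 1.5, "prior_return": -0.002},
    {"sentiment": 2.0, "yield_spread_10y": 2.0},
)
print(dict(zip(names, vec)))
```

Stacking such vectors over the lookback window yields the Bi-LSTM input for the next-day forecast.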

6. Future Applications & Research Directions

Cross-Asset Forecasting: Applying the IUS framework to other currency pairs (e.g., GBP/USD, USD/JPY) and correlated assets like equities or commodities.
Real-Time Prediction Systems: Developing low-latency pipelines for intraday trading, requiring efficient, distilled LLMs and streaming data integration.
Explainable AI (XAI) Integration: Incorporating techniques like SHAP or LIME to explain why the model made a specific prediction, crucial for regulatory compliance and trader trust. Resources like the Interpretable Machine Learning book by Christoph Molnar provide a foundation for this.
Multi-Modal LLMs: Utilizing next-generation LLMs that can process not just text but also audio (earnings calls) and data from charts/graphs for even richer context.
Adaptive Feature Selection: Moving from a static top-12 feature set to a dynamic, time-varying feature importance mechanism.

7. References

Akiba, T., Sano, S., Yanase, T., Ohta, T., & Koyama, M. (2019). Optuna: A Next-generation Hyperparameter Optimization Framework. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.
Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735–1780.
Molnar, C. (2020). Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. https://christophm.github.io/interpretable-ml-book/
Singh et al. (2023). [Relevant baseline study on Weibo text and CNN-LSTM].
Tadphale et al. (2022). [Relevant baseline study on news headlines and LSTM].
Vaswani, A., et al. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems, 30.
Zhu, J.-Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. Proceedings of the IEEE International Conference on Computer Vision (ICCV).

8. Analyst's Corner: A Critical Deconstruction

Core Insight: This paper isn't just another "AI for finance" project; it's a targeted strike on the most persistent flaw in quantitative finance: the integration lag between news and numbers. The authors correctly identify that sentiment is a leading indicator, but traditional NLP tools are too blunt for the nuanced, bi-directional narratives of forex. Their use of LLMs as a semantic refinery to produce clean, directional sentiment features is the key intellectual leap. It's a move from bag-of-words to a model of understanding, akin to how CycleGAN's framework for unpaired image translation [Zhu et al., 2017] created a new paradigm by learning mappings between domains without strict correspondence.

Logical Flow: The architecture is logically sound. The pipeline—LLM feature extraction → causality filtering → optimized sequence modeling—mirrors best practices in modern ML: use a powerful foundation model for feature engineering, introduce an inductive bias (causality) to combat overfitting, and then let a specialized predictor (Bi-LSTM) do its job with tuned parameters. The Optuna integration is a pragmatic touch, acknowledging that model performance is often gated by hyperparameter hell.

Strengths & Flaws: The major strength is the demonstrated efficacy (a 10.69% MAE reduction is substantial in forex). Another is the elegant solution to the "two-country text" problem via LLM classification. News about either economy can move the pair, and good news for the US is typically bad news for EUR/USD, so sentiment must be mapped to a direction for the pair rather than scored in isolation. However, the paper's flaw is one of omission: operational latency and cost. Running inference on large LLMs for every news item is computationally expensive and slow. For high-frequency trading (HFT), this framework is currently impractical. Furthermore, the "Causality-Driven Feature Generator" is under-specified: is it Granger causality, a learned attention mask, or something else? This opacity is a reproducibility risk.

Actionable Insights: For quants and asset managers, the takeaway is clear: prioritize the quality of sentiment signals over their quantity. Fine-tuning a smaller, domain-specific language model (such as FinBERT) on a forex corpus might yield most of the benefits at a fraction of the cost and latency. Research should pivot in two directions. The first is efficiency, exploring knowledge distillation from large LLMs to smaller models. The second is explainability, using attention weights from the LLM (and from an attention layer, if one is added to the Bi-LSTM) to generate "reasoning reports" for trades, a necessity for fund compliance. The future winner in this space won't just have the most accurate model, but the one that is fastest, cheapest, and most transparent.