1. Introduction
This research addresses the critical challenge of forecasting the US Dollar to Bangladeshi Taka (USD/BDT) exchange rate, a vital task for Bangladesh's import-dependent economy. Currency fluctuations directly impact foreign reserve management, trade balances, and inflation. Traditional statistical models often fail to capture the non-linear, complex patterns characteristic of emerging market currencies, especially during economic uncertainty. This study leverages advanced machine learning, specifically Long Short-Term Memory (LSTM) neural networks, to model these dynamic temporal relationships using historical data from 2018 to 2023.
2. Literature Review
Recent literature establishes the superiority of LSTM networks over traditional time-series models like ARIMA for financial forecasting. Pioneered by Hochreiter & Schmidhuber to solve the vanishing gradient problem in RNNs, LSTMs excel at capturing long-term dependencies. Subsequent enhancements like forget gates (Gers et al.) improved adaptability to volatility. Empirical studies on major currency pairs show LSTMs outperforming ARIMA by 18–22% in directional accuracy. While research on currencies like USD/INR exists, specific studies on USD/BDT are limited, often using pre-pandemic data and lacking integration of modern techniques like attention mechanisms or local macroeconomic shocks.
3. Methodology & Data
3.1. Data Collection & Preprocessing
Historical daily USD/BDT exchange rate data was sourced from Yahoo Finance for the period 2018–2023. Over this period the taka depreciated against the dollar: the inverse BDT/USD rate (the value of one taka in US dollars) fell from approximately 0.012 to 0.009. Data preprocessing involved handling missing values, calculating normalized daily returns to capture volatility, and creating sequences for the time-series models.
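The sequence-creation step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the study's actual pipeline: the window length of 30 and the synthetic random-walk series are assumptions for demonstration.

```python
import numpy as np

def make_sequences(rates, window=30):
    """Turn a 1-D series of daily rates into (X, y) supervised pairs.

    Each X[i] holds `window` consecutive normalized daily returns and
    y[i] is the normalized return on the following day. The window
    length is an illustrative choice, not taken from the paper.
    """
    rates = np.asarray(rates, dtype=float)
    returns = np.diff(rates) / rates[:-1]                  # daily returns
    returns = (returns - returns.mean()) / returns.std()   # z-score normalization
    X = np.stack([returns[i:i + window] for i in range(len(returns) - window)])
    y = returns[window:]
    return X, y

# Synthetic stand-in for the 2018-2023 series: a noisy upward random walk
rng = np.random.default_rng(0)
rates = 84.0 + np.cumsum(rng.normal(0.02, 0.3, 100))
X, y = make_sequences(rates, window=30)
print(X.shape, y.shape)  # (69, 30) (69,)
```

Normalizing to returns rather than raw prices keeps the inputs roughly stationary, which generally helps recurrent models train stably.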
3.2. LSTM Model Architecture
The core forecasting model is an LSTM neural network. The architecture was optimized for the USD/BDT dataset, likely involving multiple LSTM layers, dropout for regularization, and a dense output layer. The model was trained to predict future exchange rate values based on past sequences.
3.3. Gradient Boosting Classifier (GBC)
A Gradient Boosting Classifier was employed for directional prediction—forecasting whether the exchange rate will move up or down. This model's performance was evaluated through a practical trading simulation.
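A directional classifier of this kind can be sketched with scikit-learn's `GradientBoostingClassifier`. The lagged-return features, hyperparameters, and synthetic data below are illustrative assumptions, not the study's configuration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(42)
returns = rng.normal(0, 0.002, 500)            # synthetic daily returns

lags = 5
X = np.stack([returns[i:i + lags] for i in range(len(returns) - lags)])
y = (returns[lags:] > 0).astype(int)           # label: 1 = next-day move up
split = 400                                    # simple chronological split

clf = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=0)
clf.fit(X[:split], y[:split])
signals = clf.predict(X[split:])               # directional signals for a backtest
print("hold-out directional accuracy:", (signals == y[split:]).mean())
```

A chronological train/test split (rather than a random shuffle) matters here: shuffling would leak future information into training and inflate the measured accuracy.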
4. Experimental Results & Analysis
Key results at a glance:
- LSTM Accuracy: 99.449%
- LSTM RMSE: 0.9858
- ARIMA RMSE: 1.342
- GBC Profitable Trades: 40.82%
4.1. LSTM Performance Metrics
The LSTM model achieved exceptional results: an accuracy of 99.449%, a Root Mean Square Error (RMSE) of 0.9858, and a test loss of 0.8523. This indicates a highly precise model for predicting the actual value of the USD/BDT rate.
4.2. GBC Trading Simulation
A backtest was conducted using the GBC's directional signals on an initial capital of $10,000 over 49 trades. While 40.82% of trades were profitable, the strategy resulted in a net loss of $20,653.25. This highlights the critical difference between predictive accuracy and profitable trading, where transaction costs, slippage, and risk management are paramount.
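The mechanics of such a backtest can be sketched in pure Python. The prices, signals, and 0.1% cost below are synthetic illustrations, not the study's data; the point is to show how proportional transaction costs drag on a strategy whose win rate sits below 50%.

```python
# Minimal long/flat backtest sketch. A signal of 1 at index i means:
# hold the asset from day i to day i+1; costs are charged per trade.
def backtest(prices, signals, capital=10_000.0, cost=0.001):
    equity = capital
    wins = trades = 0
    for i, s in enumerate(signals):
        if s != 1:
            continue
        ret = prices[i + 1] / prices[i] - 1.0
        pnl = equity * ret - equity * cost     # proportional transaction cost
        equity += pnl
        trades += 1
        wins += pnl > 0
    return equity, (wins / trades if trades else 0.0)

prices = [100, 101, 100.5, 100.2, 101.0, 100.1, 100.8]
signals = [1, 1, 0, 1, 1, 1]                   # one signal per holding period
final, win_rate = backtest(prices, signals)
print(final, win_rate)
```

Even this toy version makes the paper's point visible: per-trade costs and loss asymmetry, not raw hit rate, determine whether equity grows.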
4.3. Comparative Analysis vs. ARIMA
The LSTM model significantly outperformed the traditional ARIMA model, which had an RMSE of 1.342. This demonstrates the clear advantage of deep learning in modeling the complex, non-linear patterns present in financial time-series data.
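For reference, the RMSE metric used in this comparison is straightforward to compute; the example values below are arbitrary, not from the study.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error: sqrt of the mean squared prediction error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

print(rmse([110.2, 110.5], [110.0, 110.9]))  # 0.3162...
```

Because RMSE squares errors before averaging, it penalizes large misses heavily, which is one reason the LSTM's advantage over ARIMA during volatile stretches shows up so clearly in this metric.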
5. Technical Details & Mathematical Framework
The LSTM cell operates through a gating mechanism that regulates the flow of information. The key equations are:
- Forget Gate: $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$
- Input Gate: $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$, $\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$
- Cell State Update: $C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$
- Output Gate: $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$, $h_t = o_t * \tanh(C_t)$
Where $\sigma$ is the sigmoid function, $*$ denotes element-wise multiplication, $W$ are weight matrices, $b$ are bias vectors, $x_t$ is the input, $h_t$ is the hidden state, and $C_t$ is the cell state. This structure allows the network to learn which information to retain or discard over long sequences.
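The gate equations above can be executed directly. Below is a minimal NumPy sketch of a single cell step; the dimensions and random weights are arbitrary illustrations, not the trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM time step, mirroring the gate equations above.

    W maps gate name -> weight matrix over the concatenated [h_{t-1}, x_t];
    b maps gate name -> bias vector.
    """
    z = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    f = sigmoid(W["f"] @ z + b["f"])             # forget gate f_t
    i = sigmoid(W["i"] @ z + b["i"])             # input gate i_t
    C_tilde = np.tanh(W["C"] @ z + b["C"])       # candidate cell state
    C = f * C_prev + i * C_tilde                 # cell state update
    o = sigmoid(W["o"] @ z + b["o"])             # output gate o_t
    h = o * np.tanh(C)                           # new hidden state h_t
    return h, C

rng = np.random.default_rng(0)
n_in, n_hid = 1, 4                               # 1 feature (the rate), 4 hidden units
W = {g: rng.normal(0, 0.1, (n_hid, n_hid + n_in)) for g in "fiCo"}
b = {g: np.zeros(n_hid) for g in "fiCo"}
h, C = np.zeros(n_hid), np.zeros(n_hid)
for x in [0.3, -0.1, 0.2]:                       # a short input sequence
    h, C = lstm_step(np.array([x]), h, C, W, b)
print(h.shape)  # (4,)
```

Note how the cell state $C_t$ is updated additively rather than being repeatedly squashed; this is the mechanism that mitigates the vanishing gradient problem over long sequences.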
6. Analysis Framework: A Practical Example
Case: Integrating Macroeconomic Shocks into the LSTM Pipeline
The study mentions incorporating local macroeconomic shock detection. Here is a conceptual framework for how this could be implemented without explicit code:
- Data Augmentation: Create a parallel time-series dataset of "shock indicators" for Bangladesh. This could be binary (0/1) flags for events like central bank intervention announcements, major political events, or changes in remittance flows, sourced from news APIs or official bulletins.
- Feature Engineering: For each trading day, concatenate the historical window of exchange rate data with the corresponding window of shock indicators, producing an enriched input vector [Price_Seq, Shock_Seq].
- Model Adaptation: Adjust the LSTM's input layer to accept this multi-dimensional input. The network will learn to associate specific shock patterns with subsequent volatility or trend changes in the USD/BDT rate.
- Validation: Compare the performance (RMSE, directional accuracy) of the shock-augmented model against the baseline model that uses only price data, specifically during periods marked by shocks.
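The feature-engineering step in this framework can be sketched as follows. The function name, window length, and synthetic shock flags are illustrative assumptions; the idea is simply to stack each price window with its matching shock window into a multi-channel input.

```python
import numpy as np

def augment_windows(returns, shocks, window=10):
    """Stack each window of returns with its window of 0/1 shock flags
    into a (window, 2) input, i.e. the [Price_Seq, Shock_Seq] vector."""
    X = []
    for i in range(len(returns) - window):
        price_seq = returns[i:i + window]
        shock_seq = shocks[i:i + window]
        X.append(np.column_stack([price_seq, shock_seq]))
    return np.asarray(X)

rng = np.random.default_rng(1)
returns = rng.normal(0, 0.002, 50)
shocks = (rng.random(50) < 0.1).astype(float)   # sparse binary event flags
X = augment_windows(returns, shocks)
print(X.shape)  # (40, 10, 2)
```

The resulting shape `(samples, window, 2)` is exactly what a recurrent input layer with two features per time step expects, so the model adaptation step reduces to widening the input dimension.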
7. Future Applications & Research Directions
- Multi-Modal Data Integration: Beyond macroeconomic flags, integrating real-time sentiment analysis from financial news and social media (e.g., using Transformer models like BERT) could capture market mood, as seen in studies on major forex pairs.
- Attention Mechanisms: Incorporating attention layers (like those in the Transformer architecture) into the LSTM could allow the model to dynamically focus on the most relevant past time steps, improving interpretability and performance for long sequences.
- Reinforcement Learning for Trading: Moving from pure prediction to direct policy learning. A model like Deep Q-Network (DQN) could be trained to make buy/sell/hold decisions that maximize risk-adjusted returns (Sharpe Ratio), directly addressing the profitability gap seen in the GBC backtest.
- Cross-Currency Learning: Developing a meta-model trained on multiple emerging market currency pairs (e.g., USD/INR, USD/PKR) to learn universal patterns of volatility and policy impact, then fine-tuning on USD/BDT for improved robustness with limited data.
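To make the attention direction above concrete, here is a hedged sketch of scaled dot-product attention (the scoring rule from the Transformer architecture) applied over a window of time-step features. The self-attention setup and dimensions are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(2)
seq = rng.normal(0, 1, (30, 8))                      # 30 time steps, 8 features each
out, w = scaled_dot_product_attention(seq, seq, seq) # self-attention over the window
print(out.shape, w.sum(axis=-1)[0])                  # (30, 8); each row sums to 1
```

The weight matrix `w` is where the interpretability benefit comes from: inspecting which past time steps receive high weight shows what the model attends to when forecasting.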
8. References
- Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation.
- Gers, F. A., Schmidhuber, J., & Cummins, F. (2000). Learning to Forget: Continual Prediction with LSTM. Neural Computation.
- Rahman et al. (Year). Study on USD/INR forecasting with LSTM. [Relevant Journal].
- Afrin et al. (2021). Pre-pandemic study on USD/BDT. [Relevant Conference].
- Hosain et al. (Year). Hybrid techniques for currency forecasting. [Relevant Journal].
- Vaswani, A., et al. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems.
- Mnih, V., et al. (2015). Human-level control through deep reinforcement learning. Nature.
9. Original Analysis & Expert Commentary
Core Insight: This paper successfully demonstrates the technical supremacy of LSTM networks over legacy models like ARIMA for point forecasting but inadvertently exposes a dangerous chasm in fintech research: the conflation of statistical accuracy with economic utility. A 99.45% accurate model that, when translated into a trading strategy via a Gradient Boosting Classifier, incurs a catastrophic 200%+ loss on initial capital is not just an academic footnote—it's a siren call for a fundamental shift in how we evaluate AI in finance.
Logical Flow & Strengths: The research logic is sound and replicable. The authors correctly identify the limitations of linear models for non-linear, policy-sensitive currencies like the BDT. Their use of a managed-float regime as a case study is astute, as these markets are ripe for AI disruption. The technical execution is robust, with the LSTM's markedly lower RMSE of 0.9858 (vs. ARIMA's 1.342) providing strong evidence of deep learning's capacity to model complex temporal dependencies, a finding consistent with seminal works like the original LSTM paper by Hochreiter & Schmidhuber. The attempt to bridge to a trading outcome via the GBC is a commendable step towards real-world relevance.
Critical Flaws & The Profitability Paradox: Herein lies the critical flaw. The GBC's 40.82% win rate resulting in massive losses is a classic case of ignoring the asymmetry of financial returns. It highlights a lack of integrated risk metrics (e.g., Sharpe Ratio, Maximum Drawdown) and a naive execution model. This mirrors a common pitfall in early AI finance papers that focused purely on prediction error. The field has since evolved, as seen in reinforcement learning approaches that directly optimize for portfolio returns, such as the Deep Q-Network (DQN) framework applied in Mnih et al.'s seminal work. Furthermore, while the paper mentions macroeconomic factors, its implementation seems cursory. For a currency like the BDT, which is heavily influenced by central bank intervention and remittance flows, failing to deeply integrate these as structured features—perhaps using an attention mechanism to weigh their impact, as suggested in the Transformer architecture—is a missed opportunity.
Actionable Insights & The Path Forward: For practitioners and researchers, this study offers two crucial, actionable insights. First, stop worshipping at the altar of RMSE. The primary evaluation metric for any market-facing model must be its performance in a simulated trading environment that includes realistic costs, slippage, and position sizing. Tools like Backtrader or QuantConnect should be non-negotiable in the validation pipeline. Second, the future lies in end-to-end agent learning. Instead of the disjointed pipeline (LSTM -> GBC -> Trade), the next frontier is to employ a single, holistic agent—likely based on Proximal Policy Optimization (PPO) or similar advanced RL algorithms—that ingests raw or lightly processed market data and directly outputs risk-managed trading actions. This agent's reward function would be a composite of risk-adjusted return metrics, forcing the AI to learn the true economics of the market, not just its statistical patterns. The authors' suggestion of adding sentiment analysis is a good start, but it must be fused into this agent-based architecture, not merely appended as another feature column. This is the path from creating a clever predictor to engineering a viable financial agent.