Modeling Gold for Prediction and Portfolio Hedging

Gold prices have risen sharply in recent months, prompting renewed debate over whether the market has reached its peak. In this post, we examine quantitative models used to forecast gold prices and evaluate their effectiveness in capturing volatility and market dynamics. Gold, however, is not only a speculative vehicle; it also functions as an effective hedging instrument. We explore both aspects to provide a comprehensive view of gold’s role in modern portfolio management.

Comparative Analysis of Gold Forecasting Models: Statistical vs. Machine Learning Approaches

Gold is an important asset class, serving as both a store of value and a hedge against inflation and market uncertainty. Performing predictive analysis of gold prices is therefore essential. Reference [1] evaluated several predictive methods for gold prices, examining not only classical statistical approaches but also newer machine learning techniques. The study used daily data from 2021 to 2025, with 80% serving as in-sample data and 20% as validation data.
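
The sketch below illustrates the flavor of such a comparison: a chronological 80/20 split and a simple linear time-trend forecast scored by RMSE and R². The file name and column names are placeholders, and this is not the paper’s exact pipeline.

```python
# Minimal sketch: chronological 80/20 split and a linear time-trend forecast of
# gold prices, evaluated on the held-out 20%. "gold_daily.csv" and its columns
# are hypothetical.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

prices = (pd.read_csv("gold_daily.csv", parse_dates=["Date"])
            .sort_values("Date")["Close"].to_numpy())

split = int(0.8 * len(prices))               # 80% in-sample, 20% validation
t = np.arange(len(prices)).reshape(-1, 1)    # time index as the single regressor

model = LinearRegression().fit(t[:split], prices[:split])
pred = model.predict(t[split:])

rmse = np.sqrt(mean_squared_error(prices[split:], pred))
print(f"RMSE = {rmse:.1f}, R^2 = {r2_score(prices[split:], pred):.3f}")
```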

Findings

-The study analyzes gold’s forecasting dynamics, comparing traditional statistical models (ARIMA, ETS, Linear Regression) with machine learning methods (KNN and SVM).

-Daily gold price data from 2021 to 2025 were used for model training, followed by forecasts for 2026.

-Descriptive analysis showed moderate volatility (σ = 501.12) and strong cumulative growth of 85%, confirming gold’s ongoing role as a strategic safe-haven asset.

-Empirical results indicate that Linear Regression (R² = 0.986, RMSE = 35.7) and ETS models achieved superior forecasting accuracy compared to ARIMA, KNN, and SVM.

-Machine learning models (KNN and SVM) underperformed, often misrepresenting volatility and producing higher forecast errors.

-The results challenge the assumption that complex algorithms necessarily outperform traditional methods in financial forecasting.

-Forecasts for 2026 project an average gold price of $4,659, corresponding to a 58.6% potential return.

-The study cautions that these forecasts remain sensitive to macroeconomic shocks and market uncertainties.

-The findings emphasize that simpler, transparent, and interpretable models can outperform more complex machine learning approaches in volatile market conditions.

In short, the paper shows that,

-Linear Regression and ETS outperformed ARIMA, KNN, and SVM, delivering the lowest error and highest explanatory power,

-Machine learning models (KNN, SVM) did not outperform traditional statistical methods, emphasizing the value of interpretability and stability in volatile markets.

Another notable aspect of the study is its autocorrelation analysis, which reveals that, unlike equities, gold does not exhibit clear autocorrelation patterns—its price behavior appears almost random. The paper also suggested improving the forecasting model by incorporating macroeconomic variables.

Reference

[1] Muhammad Ahmad, Shehzad Khan, Rana Waseem Ahmad, Ahmed Abdul Rehman, Roidar Khan, Comparative analysis of statistical and machine learning models for gold price prediction, Journal of Media Horizons, Volume 6, Issue 4, 2025

Using Gold Futures to Hedge Equity Portfolios

Hedging is a risk management strategy used to offset potential losses in one investment by taking an opposing position in a related asset. By using financial instruments such as options, futures, or derivatives, investors can protect their portfolios from adverse price movements. The primary goal of hedging is not to maximize profits but to minimize potential losses and provide stability.
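
To make the mechanics concrete, here is a minimal sketch of a minimum-variance futures hedge: the hedge ratio is estimated from the covariance of portfolio and futures returns, and the hedged P&L subtracts the futures leg. The toy data are stand-ins, and this is not the paper’s exact construction.

```python
# Minimal sketch of a minimum-variance futures hedge (illustrative only).
import numpy as np

def hedged_pnl(portfolio_ret: np.ndarray, futures_ret: np.ndarray) -> np.ndarray:
    """Short h units of futures per unit of portfolio, h = cov / var."""
    h = np.cov(portfolio_ret, futures_ret)[0, 1] / np.var(futures_ret, ddof=1)
    return portfolio_ret - h * futures_ret

# Toy illustration with synthetic returns (stand-ins for real series)
rng = np.random.default_rng(0)
equity = 0.0005 + 0.010 * rng.standard_normal(1000)
gold = 0.5 * equity + 0.008 * rng.standard_normal(1000)
print("Unhedged vol:", equity.std(), "Hedged vol:", hedged_pnl(equity, gold).std())
```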

Reference [2] explores hedging basic materials portfolios using gold futures.

Findings

-The study examines commodities as alternative investments, hedging instruments, and diversification tools.

-Metals, in particular, tend to be less sensitive to inflation and exhibit low correlation with traditional financial assets.

-Investors can gain exposure to metals through shares of companies in the basic materials sector, which focus on exploration, development, and processing of raw materials.

-Since not all companies in this sector are directly linked to precious metals, the study suggests including gold futures to enhance portfolio diversification.

-The research compares a portfolio composed of basic materials sector stocks with a similar portfolio hedged using gold futures.

-Findings show that hedging with gold reduces both profits and losses, providing a stabilizing effect suitable for risk-averse investors.

-The analysis used historical data from March 1, 2018, to March 1, 2022, and tested several portfolio construction methods, including equal-weight, Monte Carlo, and mean-variance approaches.

-Between March 2022 and November 2023, most portfolios without gold futures experienced losses, while portfolios with short gold futures positions showed reduced drawdowns and more stable performance.

-The basis trading strategy using gold futures did not change the direction of returns but significantly mitigated volatility and portfolio swings.

In short, the study concludes that hedging basic materials equity portfolios with gold futures can effectively reduce PnL volatility and enhance portfolio stability, offering a practical approach for conservative investors and professional asset managers.

Reference

[2] Stasytytė, V., Maknickienė, N., & Martinkutė-Kaulienė, R. (2024), Hedging basic materials equity portfolios using gold futures, Journal of International Studies, 17(2), 132-145.

Closing Thoughts

In summary, gold can serve as an investment, a speculative vehicle, and a hedging instrument. In the first article, simpler models such as Linear Regression and ETS outperformed complex algorithms in forecasting gold prices, emphasizing the importance of interpretability in volatile markets. In the second, incorporating gold futures into basic materials portfolios reduced profit and loss volatility, offering stability for risk-averse investors. Together, the studies highlight gold’s dual function as both a return-generating asset and a tool for risk management.

Identifying and Characterizing Market Regimes Across Asset Classes

Identifying market regimes is essential for understanding how risk, return, and volatility evolve across financial assets. In this post, we examine two quantitative approaches to regime detection.

Hedge Effectiveness Under a Four-State Regime Switching Model

Identifying market regimes is important for understanding shifts in risk, return, and volatility across financial assets. With the advancement of machine learning, many regime-switching and machine-learning-based methods have been proposed. However, while promising, these methods often face challenges of interpretability, overfitting, and a lack of robustness in real-world deployment.

Reference [1] proposed a more “classical” regime identification technique. The authors developed a four-state regime-switching model for FX hedging. Instead of using a simple constant hedge ratio, they classified the market into regimes and optimized hedge ratios accordingly.

Findings

-The study develops a four-state regime-switching model for optimal foreign exchange (FX) hedging using forward contracts.

-Each state corresponds to distinct market conditions based on the direction and magnitude of deviations of the FX spot rate from its long-term trend.

-The model’s performance is evaluated across five currencies against the British pound over multiple investment horizons.

-Empirical results show that the model achieves the highest risk reduction for the US dollar, euro, Japanese yen, and Turkish lira, and the second-best performance for the Indian rupee.

-The model demonstrates particularly strong performance for the Turkish lira, suggesting greater effectiveness in hedging highly volatile currencies.

-The model’s superior results are attributed to its ability to adjust the estimation horizon for the optimal hedge ratio according to current market conditions.

-This flexibility enables the model to capture asymmetry and fat-tail characteristics commonly present in FX return distributions.

-Findings indicate that FX investors rely on short-term memory when the market is low relative to the trend and on long-term memory when it is high relative to the trend.

-The model’s dynamic structure aligns with prior research emphasizing the benefits of updating models with recent data over time.

-Results contribute to understanding investor behavior across market regimes and offer practical implications for mitigating behavioral biases, such as panic during volatile conditions.

In short, the authors built a more efficient hedging model by splitting markets into four conditions instead of two, adjusting hedge ratios and memory length depending on the volatility regime. This significantly improves hedge effectiveness, especially in volatile currencies.
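
The sketch below captures the spirit of this approach: classify each period into one of four states by the sign and size of the spot rate’s deviation from a long-term moving-average trend, then look up a state-specific hedge ratio. The band width, trend window, and hedge ratios are placeholders, not the authors’ calibrated values.

```python
# Illustrative four-state classification and state-dependent hedge ratios.
import pandas as pd

def classify_states(spot: pd.Series, trend_window: int = 252,
                    band: float = 0.02) -> pd.Series:
    trend = spot.rolling(trend_window).mean()
    dev = spot / trend - 1.0                       # deviation from long-term trend
    states = pd.Series(index=spot.index, dtype="object")
    states[dev <= -band] = "large_below"
    states[(dev > -band) & (dev <= 0)] = "small_below"
    states[(dev > 0) & (dev < band)] = "small_above"
    states[dev >= band] = "large_above"
    return states

# Hypothetical state-dependent hedge ratios (forward position per unit exposure)
hedge_ratio = {"large_below": 1.0, "small_below": 0.8,
               "small_above": 0.6, "large_above": 0.9}
```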

We believe this is an efficient method that can also be applied to other asset classes, such as equities and cryptocurrencies.

Reference

[1] Taehyun Lee, Ioannis C. Moutzouris, Nikos C. Papapostolou, Mahmoud Fatouh, Foreign exchange hedging using regime-switching models: The case of pound sterling, Int J Fin Econ. 2024;29:4813–4835

Using Gaussian Mixture Models to Identify Market Regimes

Reference [2] proposed an approach that uses Gaussian Mixture Models (GMMs) to identify market regimes by dividing the market into clusters. It divided the market into four clusters, or regimes (a clustering sketch follows the list),

Cluster 0: a disbelief momentum before the breakout zone,

Cluster 1: a high unpredictability zone or frenzy zone,

Cluster 2: a breakout zone,

Cluster 3: the low instability or the sideways zone.
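
Below is a minimal clustering sketch in the same spirit, assuming daily S&P 500 closes and two simple features (log returns and rolling volatility); the paper’s exact features and data preparation may differ.

```python
# Fit a 4-component Gaussian Mixture Model on simple return/volatility features.
import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture

df = pd.read_csv("sp500_ohlc.csv", parse_dates=["Date"])   # hypothetical file
df["ret"] = np.log(df["Close"]).diff()
df["vol"] = df["ret"].rolling(20).std()
features = df[["ret", "vol"]].dropna().copy()

gmm = GaussianMixture(n_components=4, covariance_type="full", random_state=0)
features["regime"] = gmm.fit_predict(features[["ret", "vol"]])
print(features["regime"].value_counts())                    # observations per regime
```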

Findings

-Statistical analysis indicated that the S&P 500 OHLC data followed a Gaussian (Normal) distribution, which motivated the use of Gaussian Mixture Models (GMMs) instead of k-means clustering, since GMMs account for the distributional properties of the data.

-Traditional trading strategies based on the Triple Simple Moving Average (TSMA) and Triple Exponential Moving Average (TEMA) were shown to be ineffective across all market regimes.

-The study identified the most suitable regimes for each strategy to improve portfolio returns, highlighting the importance of regime-based application rather than uniform use.

-This combined approach of clustering with GMM and regime-based trading strategies demonstrated potential for improving profitability and managing risks in the S&P 500 futures market.

In short, the triple moving average trading systems did not perform well. However, the authors managed to pinpoint the market regimes where the trading systems performed better, relatively speaking.

Reference

[2] F. Walugembe, T. Stoica, Evaluating Triple Moving Average Strategy Profitability Under Different Market Regimes, 2021, DOI:10.13140/RG.2.2.36616.96009

Closing Thoughts

Both studies underscore the importance of regime identification and adaptive modeling in financial decision-making. The four-state regime-switching hedging model demonstrates how incorporating changing market conditions enhances risk reduction in foreign exchange markets, while the Gaussian Mixture Model approach illustrates how clustering can effectively capture distinct market phases in equity trading. Together, they highlight the value of data-driven, regime-aware frameworks in improving both risk management and trading performance.

The Role of Data in Financial Modeling and Risk Management

Much emphasis has been placed on developing accurate and robust financial models, whether for pricing, trading, or risk management. However, a crucial yet often overlooked component of any quantitative system is the reliability of the underlying data. In this post, we explore some issues with financial data and how to address them.

How to Deal with Missing Financial Data?

In the financial industry, data plays a critical role in enabling managers to make informed decisions and manage risk effectively. Despite the critical importance of financial data, it is often missing or incomplete. Financial data can be difficult to obtain due to a lack of standardization and regulatory requirements. Incomplete or inaccurate data can lead to flawed analysis, incorrect decision-making, and increased risk.

Reference [1] studied the missing data in firms’ fundamentals and proposed methods for imputing the missing data.

Findings

-Missing financial data affects more than 70% of firms, representing approximately half of total market capitalization.

-The authors find that missing firm fundamentals exhibit complex, systematic patterns rather than occurring randomly, making traditional ad-hoc imputation methods unreliable.

-They propose a novel imputation method that utilizes both time-series and cross-sectional dependencies in the data to estimate missing values.

-The method accommodates general systematic patterns of missingness and generates a fully observed panel of firm fundamentals.

-The paper demonstrates that addressing missing data properly has significant implications for estimating risk premia, identifying cross-sectional anomalies, and improving portfolio construction.

-The issue of missing data extends beyond firm fundamentals to other financial domains such as analyst forecasts (I/B/E/S), ESG ratings, and other large financial datasets.

-The problem is expected to be even more pronounced in international data and with the rapid expansion of Big Data in finance.

-The authors emphasize that as data sources grow in volume and complexity, developing robust imputation methods will become increasingly critical.

In summary, the paper provides foundational principles and general guidelines for handling missing data, offering a framework that can be applied to a wide range of financial research and practical applications.
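
As a rough illustration of blending time-series and cross-sectional information, here is a much-simplified sketch; it is not the paper’s estimator, which is based on a latent factor model of the cross-section of characteristics.

```python
# Fill a missing fundamental with a blend of the firm's last observed value
# (time-series information) and the period's cross-sectional median
# (cross-sectional information). Purely illustrative.
import pandas as pd

def impute_panel(panel: pd.DataFrame, weight_ts: float = 0.5) -> pd.DataFrame:
    """panel: rows = dates, columns = firms, values = one fundamental."""
    ts_fill = panel.ffill()
    cs_fill = panel.apply(lambda row: row.fillna(row.median()), axis=1)
    blended = weight_ts * ts_fill + (1 - weight_ts) * cs_fill
    return panel.fillna(blended)
```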

We think that the proposed data imputation methods can be applied not only to fundamental data but also to financial derivatives data, such as options.

Reference

[1] Bryzgalova, Svetlana, Lerner, Sven, Lettau, Martin, and Pelger, Markus, Missing Financial Data, SSRN 4106794

Predicting Realized Volatility Using High-Frequency Data: Is More Data Always Better?

A common belief in strategy design is that ‘more data is better.’ But is this always true? Reference [2] examined the impact of data quantity on predicting realized volatility. Specifically, it focused on the accuracy of volatility forecasts as a function of data sampling frequency. The study was conducted on crude oil, and it used GARCH as the volatility forecast method.
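
A minimal sketch of the experiment’s flavor is shown below: fit a GARCH(1,1) on returns sampled at different frequencies and compare the in-sample fit. The price file is hypothetical, and the thesis itself tests several GARCH variants and out-of-sample error metrics beyond this.

```python
# Fit GARCH(1,1) at daily, weekly, and monthly frequencies and compare AIC/BIC.
import numpy as np
import pandas as pd
from arch import arch_model

prices = pd.read_csv("brent_prices.csv", parse_dates=["Date"],
                     index_col="Date")["Close"]            # hypothetical file

for freq in ["D", "W", "M"]:
    rets = 100 * np.log(prices.resample(freq).last()).diff().dropna()
    res = arch_model(rets, vol="GARCH", p=1, q=1).fit(disp="off")
    print(freq, "AIC:", round(res.aic, 1), "BIC:", round(res.bic, 1))
```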

Findings

-The research explores whether increased data availability through higher-frequency sampling leads to improved forecast precision.

-The study employs several GARCH models using Brent crude oil futures data to assess how sampling frequency influences forecasting performance.

-In-sample results show that higher sampling frequencies improve model fit, indicated by lower AIC/BIC values and higher log-likelihood scores.

-Out-of-sample analysis reveals a more complex picture—higher sampling frequencies do not consistently reduce forecast errors.

-Regression analysis demonstrates that variations in forecast errors are only marginally explained by sampling frequency changes.

-Both linear and polynomial regressions yield similar results, with low adjusted R² values and weak correlations between frequency and error metrics.

-The findings challenge the prevailing assumption that higher-frequency data necessarily enhance forecast precision.

-The study concludes that lower-frequency sampling may sometimes yield better forecasts, depending on model structure and data quality.

-The paper emphasizes the need to balance the benefits and drawbacks of high-frequency data collection in volatility prediction.

-It calls for further research across different assets, markets, and modeling approaches to identify optimal sampling frequencies.

In short, increasing the data sampling frequency improves in-sample prediction accuracy. However, higher sampling frequency actually decreases out-of-sample prediction accuracy.

This result is surprising, and the author provided some explanation for this counterintuitive outcome. In my opinion, financial time series are usually noisy, so using more data isn’t necessarily better because it can amplify the noise.

Another important insight from the article is the importance of performing out-of-sample testing, as the results can differ, sometimes even contradict the in-sample outcomes.

Reference

[2] Hervé N. Mugemana, Evaluating the impact of sampling frequency on volatility forecast accuracy, 2024, Inland Norway University of Applied Sciences

Closing Thoughts

Both studies underscore the central role of high-quality data in financial modeling, trading, and risk management. Whether it is the frequency at which data are sampled or the completeness of firm-level fundamentals, the integrity of input data directly determines the reliability of forecasts, model calibration, and investment decisions. As financial markets become increasingly data-driven, the ability to collect, process, and validate information with precision will remain a defining edge for both researchers and practitioners.

When Trading Systems Break Down: Causes of Decay and Stop Criteria

A key challenge in system development is that trading performance often deteriorates after going live. In this post, we look at why this happens by examining the post-publication decay of stock anomalies, and we address a practical question faced by every trader: when a system is losing money, is it simply in a drawdown or has it stopped working altogether?

Why and How Systematic Trading Strategies Decay After Going Live

Testing and validating a trading strategy is an important step in trading system development. It’s a commonly known fact that a well-optimized trading strategy’s performance often deteriorates after it goes live. Thus, developing a robust strategy that performs well out-of-sample is quite a challenge.

Reference [1] attempts to answer the question of why a strategy’s performance decays after going live.

Findings

-The paper investigates which ex-ante characteristics can predict the out-of-sample decline in risk-adjusted performance of published stock anomalies.

-The analysis covers a broad cross-section of anomalies documented in finance and academic journals, with the post-publication period defined as out-of-sample.

-Predictors of performance decay are based on two hypotheses: (1) arbitrage capital flowing into newly published strategies, and (2) in-sample overfitting due to multiple hypothesis testing.

-Publication year alone accounts for 30% of the variance in Sharpe ratio decay, with Sharpe decay increasing by 5 percentage points annually for newly published factors.

-Three overfitting-related variables—signal complexity (measured by the number of operations required) and two measures of in-sample sensitivity to outliers—add another 15% of explanatory power.

-Arbitrage-related variables are statistically significant but contribute little additional predictive power.

-The study tests both hypotheses using explanatory variables and univariate regressions, finding significant coefficients from both sets.

In short, the results indicate that performance decay is driven jointly by overfitting and arbitrage effects.

Reference

[1] Falck, Antoine, Rej, Adam, and Thesmar, David, Why and How Systematic Strategies Decay, SSRN 3845928

When to Stop Trading a Strategy?

When a trading system is losing money, an important question one should ask is: Are we in a drawdown, or has the system stopped working? The distinction is crucial because the two situations require different solutions. If we are in a drawdown, it means that our system is still working and we just have to ride out the losing streak. On the other hand, if our system has stopped working, we need to take action and find a new system.

Reference [2] attempted to answer this question.

Findings

-The paper examines how to distinguish between normal unlucky streaks and genuine degradation in trading strategies.

-It argues that excessively long or deep drawdowns should trigger a downward revision of the strategy’s assumed Sharpe ratio.

-A quantitative framework is developed using exact probability distributions for the length and depth of the last drawdown in upward-drifting Brownian motions (a Monte Carlo sketch of this setup follows the list).

-The analysis shows that both managers and investors systematically underestimate the expected length and depth of drawdowns implied by a given Sharpe ratio.
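
To build intuition for these magnitudes, here is a Monte Carlo sketch under the paper’s assumption of a drifted Brownian motion for log P&L; it is not the authors’ closed-form derivation, and the parameters are arbitrary.

```python
# Simulate drawdown depth for a given annualized Sharpe ratio under a drifted
# Brownian motion with unit annual volatility.
import numpy as np

def simulate_drawdowns(sharpe=1.0, years=10, n_paths=2000, steps_per_year=252, seed=0):
    rng = np.random.default_rng(seed)
    dt = 1.0 / steps_per_year
    n = years * steps_per_year
    increments = sharpe * dt + np.sqrt(dt) * rng.standard_normal((n_paths, n))
    pnl = increments.cumsum(axis=1)
    peak = np.maximum.accumulate(pnl, axis=1)
    return (peak - pnl).max(axis=1)          # deepest drawdown per path (sigma units)

depth = simulate_drawdowns(sharpe=1.0)
print("Median maximum drawdown (in annual-sigma units):", round(float(np.median(depth)), 2))
```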

I found that the authors have some good points. But I don’t think that the assumption that the log P&L of a strategy follows a drifted Brownian process is realistic.

Note that a trading strategy’s P&L can often exhibit serial correlation, which contradicts this assumption.

Reference

[2] Adam Rej, Philip Seager, Jean-Philippe Bouchaud, You are in a drawdown. When should you start worrying? arxiv.org/abs/1707.01457v2

Closing Thoughts

Both papers address the critical issue of strategy persistence and performance decay, though from different perspectives. The first highlights how published anomalies tend to lose risk-adjusted returns over time, with evidence pointing to both overfitting in backtests and arbitrage capital crowding as drivers of performance decay. The second provides a quantitative framework for assessing when drawdowns signal genuine deterioration rather than normal variance, showing that investors often underestimate the length and depth of drawdowns implied by a given Sharpe ratio. Taken together, these studies underscore the need for investors to treat historical performance with caution, monitor strategies rigorously, and account for both statistical fragility and realistic drawdown expectations in portfolio management.

Cross-Sectional Momentum: Results from Commodities and Equities

Momentum strategies can be divided into two categories: time series and cross-sectional. In a previous newsletter, I discussed time series momentum. In this post, I focus on cross-sectional momentum strategies.

Cross-Sectional Momentum in the Commodity Market

Momentum trading is often divided into two categories: time-series momentum and cross-sectional momentum. Time-series strategies generate trading signals based on an asset’s own past returns; a typical strategy buys assets with positive trend signals and sells those with negative trend signals. Cross-sectional strategies, in contrast, generate signals based on the relative performance of assets: a typical strategy buys the assets with the highest-ranked trend signals and sells those with the lowest-ranked ones. In essence, it is a relative-value strategy.
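
A generic cross-sectional momentum portfolio can be sketched as below; this illustrates the ranking idea and is not the construction used in the paper.

```python
# Rank assets on trailing 12-month return; long top quintile, short bottom quintile.
import pandas as pd

def xs_momentum_weights(monthly_prices: pd.DataFrame, lookback=12, quantile=0.2):
    """monthly_prices: rows = month-ends, columns = assets."""
    signal = monthly_prices.pct_change(lookback)
    ranks = signal.rank(axis=1, pct=True)
    long = (ranks >= 1 - quantile).astype(float)
    short = (ranks <= quantile).astype(float)
    # equal weights within each leg, dollar-neutral overall
    weights = long.div(long.sum(axis=1), axis=0) - short.div(short.sum(axis=1), axis=0)
    return weights.shift(1)                  # trade next period to avoid look-ahead
```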

Reference [1] examined trend trading in the commodity market from the cross-sectional momentum perspective. The authors conducted a study on a portfolio of 35 commodity futures.

Findings

-The study introduces a trend factor that uses short-, intermediate-, and long-run moving averages of settlement prices in commodity futures markets.

-The trend factor generates statistically and economically significant returns during the post-financialization period (2004–2020).

-Its Sharpe ratio is more than nine times that of the momentum factor, and it carries less downside risk.

-Unlike the momentum factor, which delivers insignificant returns in the sample, the trend factor consistently generates large positive returns.

-The trend factor cannot be explained by existing multifactor asset pricing models.

-It also provides a significant positive risk premium, confirming its economic relevance.

-The trend factor is correlated with funding liquidity, as measured by the TED spread.

-Overall, the findings highlight the economic value of using historical price information in commodity futures markets beyond traditional momentum strategies.

In short, cross-sectional momentum exists in the commodity market, and it is possible to construct a profitable trend trading strategy.

Reference

[1] Han, Yufeng and Kong, Lingfei, A Trend Factor in Commodity Futures Markets: Any Economic Gains From Using Information Over Investment Horizons? SSRN 3953845

Profitability of Cross-Sectional Momentum Strategy

Reference [2] examines the profitability of cross-sectional momentum strategies over the past few decades.

Findings

-The study investigates how different definitions of cross-sectional momentum affect the performance of long-short momentum strategies under varying market conditions.

-A long-short momentum portfolio buys stocks with strong past performance and shorts stocks with weak past performance.

-Standard long-short momentum strategies have delivered declining returns in recent decades, leading researchers and practitioners to search for solutions.

-A key weakness of momentum strategies is their tendency to experience “crashes,” where large gains are followed by sudden and substantial losses.

-The thesis proposes a method to mitigate crash risk, conditional on market states.

-While the literature offers complex methods for constructing long-short portfolios, the study demonstrates that relatively simple adjustments can meaningfully improve outcomes.

-Techniques such as volatility scaling and adjusting long and short positions based on market state enhance performance.

-These adjustments restore the robustness of the momentum premium and improve the risk-return profile of the strategy.

In short, the paper concludes that the profitability of cross-sectional momentum strategies has diminished. The author subsequently proposes an approach to enhance the strategy’s returns.

By applying techniques such as volatility scaling and adjusting long and short positions based on the market state, we can significantly enhance the efficacy of the momentum strategy, restoring its former robustness.

Reference

[2] Pyry Pohjantähti, Revisiting (Revitalizing) Momentum, 2024, Aalto University School of Business

Closing Thoughts

In summary, both studies highlight that momentum remains useful but requires refinement. The first shows that a trend factor in commodity futures, built on moving averages, delivers strong returns and outperforms traditional momentum benchmarks. The second finds that cross-sectional momentum, though weakened and crash-prone, can be improved with simple adjustments like volatility scaling and conditioning on market states. Together, they show that momentum strategies are still effective when adapted to market conditions.

The Impact of Market Regimes on Stop Loss Performance

A stop loss is a risk management technique. It has been advocated as a way to control portfolio risk, but how effective is it? In this post, I will discuss certain aspects of stop losses.

When Are Stop Losses Effective?

A stop loss serves as a risk management tool, helping investors limit potential losses by automatically triggering the sale of a security when its price reaches a predetermined level. This level is set below the purchase price for long positions and above the purchase price for short positions.
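
The mechanics of a fixed stop on a long position can be sketched in a few lines; this is a generic illustration rather than the paper’s model.

```python
# Exit a long position when the price falls a fixed fraction below the entry price.
import numpy as np

def apply_fixed_stop(prices: np.ndarray, stop_pct: float = 0.10) -> float:
    """Realized return of a long position with a fixed stop loss."""
    entry = prices[0]
    stop_level = entry * (1 - stop_pct)
    for p in prices[1:]:
        if p <= stop_level:                  # stop triggered: exit at the stop level
            return stop_level / entry - 1
    return prices[-1] / entry - 1            # stop never hit: hold to the end
```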

Reference [1] investigates the effectiveness of stop losses by formulating a market model based on fractional Brownian motion to simulate asset price evolution, rather than using the conventional Geometric Brownian motion.

Findings

-In long positions, stop loss levels are placed below purchase prices, while in short positions, they are positioned above to protect invested capital.

-Stop-loss strategies improve buy-and-hold returns when asset prices display long-range dependence, capturing fractal characteristics of financial market behavior over time.

-The Hurst parameter, expected return, and volatility significantly influence stop-loss effectiveness, making their measurement crucial for optimizing strategy performance.

-Simulation results confirm that optimizing stop-loss thresholds for these variables can significantly enhance investment returns and reduce downside risks.

-Polynomial regression models were developed to estimate the optimal relationship between stop-loss thresholds and influencing variables for better trading outcomes.

-In mean-reverting market conditions, stop losses tend to reduce risk-adjusted returns, highlighting the importance of adapting strategies to market regimes.

In short, the paper formulated a market model based on fractional Brownian motion, which allows the effectiveness of stop losses to be studied formally. It showed that stop losses enhance the risk-adjusted returns of the buy-and-hold investment strategy when the asset price is trending.

We note, however, that when the underlying asset is in the mean-reverting regime, stop losses decrease the risk-adjusted returns.

Reference

[1] Yun Xiang and Shijie Deng, Optimal stop-loss rules in markets with long-range dependence, Quantitative Finance, Feb 2024

Fixed and Trailing Stop Losses in the Commodity Market

Building on the previous discussion of the theoretical foundations of stop-loss strategies, Reference [2] examines their real-world application in the commodity market. It evaluates the performance of fixed and trailing stop losses, uncovering key factors that influence their effectiveness and impact on returns.

Findings

-The study analyzed fixed and trailing stop-loss strategies in commodity factor trading, focusing on their effectiveness in improving returns and reducing risk exposure.

-Results showed unmanaged factors performed poorly after accounting for transaction costs, while applying simple stop-loss rules significantly improved factor performance at the asset level.

-Fixed-stop strategies achieved an average Sharpe ratio of 0.92, whereas trailing-stop strategies delivered a higher average Sharpe ratio of 1.28.

-Both fixed and trailing stop-loss approaches maintained maximum drawdowns below 20 percent, with generally positive return skewness except for the skewness factor.

-The effectiveness of stop-loss strategies was not regime-dependent, but influenced by the quality of trading signals, commodity return volatility, and serial correlations.

-Transaction costs also played a significant role in determining stop-loss strategy performance, highlighting the importance of cost-efficient execution in commodity markets.

-Dynamically adjusting stop-loss thresholds based on realized volatility further enhanced factor performance compared to static fixed thresholds, especially in volatile trading environments.

-Stop-loss strategies were most effective when applied to factors built with high-conviction weighting schemes, maximizing their potential to capture commodity premia.

-Positive return autocorrelation and higher commodity return volatility were key conditions under which stop-loss strategies delivered the most meaningful performance improvements.

In short, in the commodity market, stop losses are effective when the autocorrelation of returns is positive, which is consistent with the findings of Reference [1]. Additionally, the volatility of returns influences how effective stop losses are.

A notable result of this study is that using trailing-stop with dynamic thresholds could enhance factor performance compared to using fixed thresholds.
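
A minimal sketch of a volatility-scaled trailing stop is shown below; the scaling constant and volatility window are placeholders, not the paper’s calibration.

```python
# Trailing stop that follows the running peak at a distance of k times recent volatility.
import numpy as np

def apply_trailing_stop(prices: np.ndarray, k: float = 3.0, vol_window: int = 20) -> float:
    """Realized return of a long position with a volatility-scaled trailing stop."""
    rets = np.diff(np.log(prices))
    entry, peak = prices[0], prices[0]
    for t in range(1, len(prices)):
        peak = max(peak, prices[t])
        if t < vol_window:                   # wait for a full volatility estimate
            continue
        vol = rets[t - vol_window:t].std()
        stop_level = peak * (1 - k * vol)    # threshold widens when volatility rises
        if prices[t] <= stop_level:
            return stop_level / entry - 1
    return prices[-1] / entry - 1
```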

Reference

[2] John Hua Fan, Tingxi Zhang, Commodity Premia and Risk Management, 2023

Closing Thoughts

In summary, the first paper formulates a market model based on fractional Brownian motion to formally study the effectiveness of stop losses. It finds that stop losses improve the risk-adjusted returns of a buy-and-hold strategy when the asset price exhibits trending behavior, but reduce returns in mean-reverting regimes. The second paper focuses on the commodity market and shows that stop losses are effective when return autocorrelation is positive, aligning with the first study’s findings. It also highlights that return volatility affects stop loss effectiveness, and notably, that trailing stops with dynamic thresholds can enhance factor performance compared to fixed thresholds.

The Limits of Out-of-Sample Testing

In trading system design, out-of-sample (OOS) testing is a critical step to assess robustness. It is a necessary step, but not sufficient. In this post, I’ll explore some issues with OOS testing.

How Well Do Overfitted Trading Systems Perform Out-of-Sample?

In-sample overfitting is a serious problem when designing trading strategies. This is because a strategy that worked well in the past may not work in the future. In other words, the strategy may be too specific to the conditions that existed in the past and may not be able to adapt to changing market conditions.

One way to avoid in-sample overfitting is to use out-of-sample testing. This is where you test your strategy on data that was not used to develop the strategy. Reference [1] examined how well the in-sample optimized trading strategies perform out of sample.
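
The flavor of such an experiment is sketched below: an SVM classifier fitted on in-sample data for several hyperparameter settings, with in-sample and out-of-sample directional accuracy compared side by side. The two features and the grid are simplified placeholders, not the paper’s ten indicators.

```python
# Compare in-sample vs out-of-sample accuracy of an SVM direction classifier
# across a small hyperparameter grid. "prices.csv" is a hypothetical file.
import pandas as pd
from sklearn.svm import SVC

px = pd.read_csv("prices.csv")["Close"]
ret = px.pct_change()
X = pd.DataFrame({"mom_10": px.pct_change(10),
                  "vol_10": ret.rolling(10).std()}).dropna()
y = (ret.shift(-1).reindex(X.index) > 0).astype(int)    # next-day direction

split = int(0.8 * len(X))
for C in [0.1, 1, 10, 100]:
    for gamma in ["scale", 0.1, 1.0]:
        clf = SVC(C=C, gamma=gamma).fit(X.iloc[:split], y.iloc[:split])
        print(C, gamma,
              "in-sample:", round(clf.score(X.iloc[:split], y.iloc[:split]), 2),
              "out-of-sample:", round(clf.score(X.iloc[split:], y.iloc[split:]), 2))
```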

Findings

-In-sample overfitting occurs when trading strategies are tailored too closely to historical data, making them unreliable in adapting to future, changing market conditions and behaviors.

-The study applied support vector machines with 10 technical indicators to forecast stock price directions and explored how different hyperparameter settings impacted performance and profitability.

-Results showed that while models often performed well on training data, their out-of-sample accuracy significantly dropped—hovering around 50%—highlighting the risk of misleading in-sample success.

-Despite low out-of-sample accuracy, about 14% of tested hyperparameter combinations outperformed the traditional buy-and-hold strategy in profitability, revealing some potential value.

-The highest-performing strategies exhibited chaotic behavior; their profitability fluctuated sharply with minor changes in hyperparameters, suggesting a lack of consistency and stability.

-There was no identifiable pattern in hyperparameter configurations that led to consistently superior results, further complicating strategy selection and tuning.

-These findings align with classic financial theories like the Efficient Market Hypothesis and reflect common challenges in machine learning, such as overfitting with complex, high-dimensional data.

-The paper stresses caution in deploying overfitted strategies, as their sensitivity to settings can lead to unpredictable results and unreliable long-term performance in real markets.

The results indicated that most models had a high in-sample accuracy but only around 50% when applied to out-of-sample data. Nonetheless, a significant proportion of the models managed to outperform the buy-and-hold strategy in terms of profitability.

However, it’s noteworthy that the most profitable strategies are sensitive to system parameters. This is a cause for concern.

Reference

[1] Yaohao Peng, Joao Gabriel de Moraes Souza, Chaos, overfitting, and equilibrium: To what extent can machine learning beat the financial market? International Review of Financial Analysis, Volume 95, Part B, October 2024, 103474

How Reliable Is Out-of-Sample Testing?

Out-of-sample testing is a crucial step in designing and evaluating trading systems, allowing traders to make more informed and effective decisions in dynamic and ever-changing financial markets. But is it free of well-known biases such as overfitting, data-snooping, and look-ahead? Reference [2] investigated these issues.

Findings

-Out-of-sample testing plays a vital role in evaluating trading systems by assessing their ability to generalize beyond historical data and perform well under future market conditions.

-Although useful, out-of-sample testing is not immune to biases such as overfitting, data-snooping, and especially look-ahead bias, which can distort the validity of results.

-A common issue arises when models are developed or tuned using insights gained from prior research, creating an indirect dependency between development and test data.

-Researchers found that excessively high Sharpe ratios in popular multifactor models can be largely explained by a subtle form of look-ahead bias in factor selection.

-Many out-of-sample research designs still overlap with datasets used in earlier studies, leading to results that reflect known patterns rather than genuine model performance.

-The ongoing and iterative nature of financial research makes it difficult to construct fully unbiased validation frameworks that truly represent out-of-sample conditions.

-When alternative evaluation methods were applied, Sharpe ratio estimates dropped significantly, indicating the extent to which traditional approaches may inflate performance expectations.

-This reduction in Sharpe ratios is actually encouraging, as it better reflects the realistic outcomes investors can expect when implementing these models in real time.

-Despite these findings, the paper emphasizes that multifactor models still improve on CAPM, though the improvements are smaller than widely claimed.

In short, out-of-sample testing also suffers, albeit subtly, from biases such as overfitting, data-snooping, and look-ahead.

We agree with the authors, and we believe that out-of-sample tests, such as walk-forward analysis, also suffer from selection bias.

Then how do we minimize these biases?

Reference

[2] Easterwood, Sara, and Paye, Bradley S., High on High Sharpe Ratios: Optimistically Biased Factor Model Assessments (2023). SSRN 4360788

Closing Thoughts

The results indicated that most models achieved high in-sample accuracy, but only around 50% when applied to out-of-sample data. While out-of-sample testing is an essential tool for evaluating trading strategies, it is not entirely free from biases such as overfitting and look-ahead. Research shows that these biases can inflate performance metrics like Sharpe ratios, leading to overly optimistic expectations.

Sentiment as Signal: Forecasting with Alternative Data and Generative AI

Quantitative trading based on market sentiment is a less developed area compared to traditional approaches. With the explosion of social media, advances in computing resources, and AI technology, sentiment-based trading is making progress. In this post, I will explore some aspects of sentiment trading.

Using ChatGPT to Extract Market Sentiment for Commodity Trading

A Large Language Model (LLM) is an advanced AI system trained on vast amounts of text data to understand, generate, and analyze human language. In finance, LLMs are used for tasks like analyzing earnings reports, generating market sentiment analysis, automating financial research, and enhancing algorithmic trading strategies.

Reference [1] examines the effectiveness of ChatGPT in predicting commodity returns. Specifically, it extracts commodity news information and forecasts commodity futures returns. The study gathers over 2.5 million articles related to the commodity market from nine international newspapers across three countries, covering a diverse set of 18 commodities.
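
A minimal sketch of how one might label headlines with an LLM is shown below; the model name, prompt, and aggregation are placeholders and do not reproduce the authors’ setup or their news-ratio index formula.

```python
# Classify headline sentiment with an LLM and form a simple positive-news ratio.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def label_headline(headline: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",                      # placeholder model choice
        messages=[{"role": "user",
                   "content": "Classify the sentiment of this commodity news "
                              f"headline as 'positive' or 'negative': {headline}"}],
    )
    return resp.choices[0].message.content.strip().lower()

headlines = ["Crude inventories fall sharply", "Copper demand outlook weakens"]
labels = [label_headline(h) for h in headlines]
ratio = labels.count("positive") / len(labels)    # share of positive headlines
```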

Findings

-A novel Commodity News Ratio Index (CNRI) was developed using ChatGPT, derived from the analysis of more than 2.5 million news articles from nine international newspapers across 18 commodities.

-The CNRI effectively forecasts commodity futures excess returns over 1- to 12-month periods, demonstrating significant predictive power in both in-sample and out-of-sample regression analyses.

-ChatGPT was used to classify sentiment in commodity-related news as either positive or negative, based on headlines, abstracts, or full article content.

-The CNRI shows stronger forecasting accuracy during specific macroeconomic conditions—particularly economic expansions, contango market phases, and periods of declining inflation.

-This ChatGPT-based approach outperforms traditional text analysis methods, including BERT and Bag-of-Words, in predicting future returns in commodity markets.

-The study controlled for various business variables and economic indicators, confirming the independent predictive significance of the CNRI.

-Results indicate that the CNRI also holds macroeconomic insight, offering valuable signals on broader economic performance beyond commodity markets.

-Findings affirm the utility of ChatGPT in financial forecasting, showcasing the broader potential of LLMs in understanding and extracting actionable intelligence from complex financial text data.

-This research highlights the growing role of AI in finance, illustrating how LLMs can enhance decision-making for investors, analysts, and risk managers alike.

In short, ChatGPT proves useful in forecasting commodity market dynamics and provides valuable insights for investors and risk managers.

Reference

[1] Shen Gao, Shijie Wang, Yuanzhi Wang, Qunzi Zhang, ChatGPT and Commodity Return, Journal of Futures Markets, 2025, 1–15

Using the Number of Confirmed Covid Cases as a Sentiment Indicator

COVID-19, the novel coronavirus, was a source of anxiety for markets and individuals around the world from its outbreak in December 2019. Many traders looked for ways to use information on the spread of the virus to predict market movements.

In Reference [2], the authors established an intraday algorithmic trading system that would open a short position in the Eurostoxx 50 futures market if the number of new confirmed Covid-19 cases had increased from the previous day (suggesting rising fear of the epidemic) and close it by the afternoon. The system would open a long position if the number of new confirmed cases had decreased from the previous day. The trading system achieved an annual return of 423% and a Sharpe ratio of 4.74.
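
The rule itself is simple enough to sketch in a few lines; the file names and the “afternoon” exit-price column below are hypothetical stand-ins for the paper’s intraday data.

```python
# Short after a rise in new cases, long after a fall; enter at the open, exit in the afternoon.
import numpy as np
import pandas as pd

cases = pd.read_csv("covid_cases.csv", parse_dates=["Date"], index_col="Date")["new_cases"]
futures = pd.read_csv("fesx.csv", parse_dates=["Date"], index_col="Date")   # "Open", "Afternoon"

signal = -np.sign(cases.diff()).shift(1)          # -1 after an increase, +1 after a decrease
intraday_ret = futures["Afternoon"] / futures["Open"] - 1
strategy_ret = (signal.reindex(intraday_ret.index) * intraday_ret).dropna()
print("Annualized Sharpe:", strategy_ret.mean() / strategy_ret.std() * np.sqrt(252))
```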

Findings

-Daily confirmed COVID-19 cases were used as a sentiment proxy, reflecting public fear and uncertainty in financial markets during the pandemic.

-Researchers built an intraday trading system for Eurostoxx 50 futures, responding to increases or decreases in new Covid-19 cases reported the previous day.

-The system opened short positions after rising case counts and long positions after declines, closing trades by the afternoon to reduce overnight exposure.

-This simple rule-based strategy delivered an annual return of 423% and a Sharpe ratio of 4.74, suggesting strong performance under extreme market stress.

-The study demonstrated that pandemic-related health data could serve as a reliable short-term predictor of market direction, especially during crisis periods.

-Results reinforce the idea that emotional triggers—like health fears—can impact trading behavior just as much as traditional economic indicators or financial models.

-During high-uncertainty environments, metrics that reflect collective anxiety, such as COVID-19 cases, can outperform classic sentiment tools like the VIX index.

-The strategy showed how non-financial data can be directly translated into market actions, offering practical tools for risk-aware investors and quant traders.

-Overall, the research contributes to behavioral finance by quantifying the influence of fear on asset prices in moments of extreme public concern.

The article presented new evidence that emotions have an impact on financial markets, especially in situations of extreme uncertainty. In these situations, investors may utilize a variety of investment techniques based on metrics reflecting the progression of fear.

Reference

[2] Gómez Martínez, R., Prado Román, C., & Cachón Rodríguez, G. (2021). Algorithmic trading based on the fear of Covid-19 in Europe, Harvard Deusto Business Research, 10(2), 295-304.

Closing Thoughts

Together, these studies highlight the growing role of alternative data and AI-driven sentiment analysis in financial forecasting. From pandemic case counts to millions of news articles, both fear and information flow can shape markets in measurable ways. Whether through rule-based trading or LLM-powered indices, the findings underscore how emotion, uncertainty, and unstructured data are becoming key inputs in modern investment strategies.

How Machine Learning Enhances Market Volatility Forecasting Accuracy

Machine learning has many applications in finance, such as asset pricing, risk management, portfolio optimization, and fraud detection. In this post, I discuss the use of machine learning in forecasting volatility.

Using Machine Learning to Predict Market Volatility

The unpredictability of the markets is a well-known fact. Despite this, many traders and portfolio managers continue to try to predict market volatility and manage their risks accordingly. Usually, econometric models such as GARCH are used to forecast market volatility.

In recent years, machine learning has been shown to be capable of predicting market volatility with accuracy. Reference [1] explored how machine learning can be used in this context.

Findings

-Machine learning models can accurately forecast stock return volatility using a small set of key predictors: realized volatility, idiosyncratic volatility, bid-ask spread, and returns.

-These predictors align with existing empirical findings, reinforcing the traditional risk-return trade-off in finance.

-ML methods effectively capture both the magnitude and direction of predictor impacts, along with their interactions, without requiring pre-specified model assumptions.

-Large current-period volatility values strongly predict higher future volatility; small values have a muted or negative impact.

-LSTM models outperform feedforward neural networks and regression trees by leveraging temporal patterns in historical data.

-An LSTM using only volatility and return history over one year performs comparably to more complex models with additional predictors.

-LSTM models function as distribution-free alternatives to traditional econometric models like GARCH.

-Optimal lag length remains critical in LSTM performance and must be selected through model training.

-The study reports an average predicted realized volatility of 44.1%, closely matching the actual value of 43.8%.

-Out-of-sample R² values achieved are significantly higher than those typically reported in related volatility forecasting literature.

In short, the paper aimed to demonstrate the potential of machine learning for modeling market volatility. In particular, the authors have shown how the LSTM model can be used to predict market volatility and manage risks. The results suggest that this is a promising alternative approach to traditional econometric models like GARCH.
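
A bare-bones LSTM volatility forecaster along these lines might look like the sketch below; the architecture, lag length, and volatility definition are illustrative choices rather than the paper’s tuned configuration.

```python
# Predict next-month realized volatility from a window of past daily returns.
import numpy as np
import tensorflow as tf

def make_windows(returns: np.ndarray, lookback: int = 22, horizon: int = 22):
    X, y = [], []
    for t in range(lookback, len(returns) - horizon):
        X.append(returns[t - lookback:t])
        y.append(returns[t:t + horizon].std() * np.sqrt(252))   # annualized realized vol
    return np.array(X)[..., None], np.array(y)

returns = np.loadtxt("daily_returns.csv")          # hypothetical single-column file
X, y = make_windows(returns)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(X.shape[1], 1)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1, activation="softplus"),   # volatility is non-negative
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, batch_size=64, validation_split=0.2)
```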

Reference

[1] Filipovic, Damir and Khalilzadeh, Amir, Machine Learning for Predicting Stock Return Volatility (2021). Swiss Finance Institute Research Paper No. 21-95

Machine Learning Models for Predicting Implied Volatility Surfaces

The Implied Volatility Surface (IVS) represents the variation of implied volatility across different strike prices and maturities for options on the same underlying asset. It provides a three-dimensional view where implied volatility is plotted against strike price (moneyness) and time to expiration, capturing market sentiment about expected future volatility.

Reference [2] examines five methods for forecasting the Implied Volatility Surface of short-dated options. These methods are applied to forecast the level, slope, and curvature of the IVS.

Findings

-The study evaluates five methods—OLS, AR(1), Elastic Net, Random Forest, and Neural Network—to forecast the implied volatility surface (IVS) of weekly S&P 500 options.

-Forecasts focus on three IVS characteristics: level, slope, and curvature.

-Random Forest consistently outperforms all other models across these three IVS dimensions.

-Non-learning-based models (OLS, AR(1)) perform comparably to some machine learning methods, highlighting their continued relevance.

-Neural Networks forecast the IVS level reasonably well but perform poorly in predicting slope and curvature.

-Elastic Net, a linear machine learning model, is consistently outperformed by the non-linear models (Random Forest and Neural Network) for the level characteristic.

-The study emphasizes the importance of model selection based on the specific IVS characteristic being forecasted.

-Performance evaluation is supported using the cumulative sum of squared error difference (CSSED) and permutation variable importance (VI) metrics.

-The research highlights the utility of Random Forest in capturing complex, non-linear patterns in IVS dynamics.

-Accurate IVS forecasting is valuable for derivative pricing, hedging, and risk management strategies.

This research highlights the potential of machine learning in forecasting the implied volatility surface, a key element in options pricing and risk management. Among the five methods studied, Random Forest stands out as the most consistent and accurate across multiple IVS features.
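
A minimal sketch of the forecasting setup is shown below, using lagged IVS characteristics as predictors of the next week’s level; the file, column names, and feature construction are placeholders, not the thesis’s definitions.

```python
# Forecast next week's IVS level with a random forest on lagged characteristics.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

ivs = pd.read_csv("ivs_weekly.csv", parse_dates=["Date"], index_col="Date")
X = ivs[["level", "slope", "curvature"]].shift(1).dropna()   # hypothetical columns
y = ivs["level"].reindex(X.index)

split = int(0.8 * len(X))
rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(X.iloc[:split], y.iloc[:split])
print("Out-of-sample R^2:", rf.score(X.iloc[split:], y.iloc[split:]))
```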

Reference

[2] Tim van de Noort, Forecasting the Characteristics of the Implied Volatility Surface for Weekly Options: How do Machine Learning Methods Perform? Erasmus University, 2024

Closing Thoughts

These studies highlight the growing effectiveness of machine learning in financial forecasting, particularly for market volatility and implied volatility surfaces. Models like LSTM and Random Forest demonstrate clear advantages over traditional methods by capturing complex patterns and dependencies. As financial markets evolve, leveraging such tools offers a promising path for enhancing predictive accuracy and risk management.

Predicting Corrections and Economic Slowdowns

Being able to anticipate a market correction or an economic recession is important for managing risk and positioning your portfolio ahead of major shifts. In this post, we feature two articles: one that analyzes indicators signaling a potential market correction, and another that examines recession forecasting models based on macroeconomic data.

Predicting Recessions Using The Volatility Index And The Yield Curve

The yield curve is a graphical representation of the relationship between the yields of bonds with different maturities. The yield curve has been inverted before every recession in the United States since 1971, so it is often used as a predictor of recessions.

A study [1] shows that the co-movement between the yield-curve spread and the VIX index, a measure of implied volatility in S&P 500 index options, offers improvements in predicting U.S. recessions over the information in the yield-curve spread alone.

Findings

-The VIX index measures implied volatility in S&P 500 index options and reflects investor sentiment and market uncertainty.

-A counterclockwise pattern (cycle) between the VIX index and the yield-curve spread aligns closely with the business cycle.

-A cycle indicator based on the VIX-yield curve co-movement significantly outperforms the yield-curve spread alone in predicting recessions.

-This improved forecasting performance holds true for both in-sample and out-of-sample data using static and dynamic probit models.

-The predictive strength comes from the interaction between monetary policy and financial market corrections, not from economic policy uncertainty.

-Shadow rate analysis confirms the cycle indicator’s effectiveness, even during periods of unconventional monetary policy and flattened yield curves.

-The findings suggest a new framework for macroeconomic forecasting, with the potential to enhance early detection of financial instability.

-The VIX-yield curve cycle adds value beyond existing leading indicators and may help in anticipating major economic disruptions like the subprime crisis.

In short, the study concludes that the co-movement between the yield-curve spread and the VIX index, a measure of implied volatility in S&P 500 index options, provides improved predictions of U.S. recessions over the information in the yield-curve spread alone.

This new research will have implications for how macroeconomists forecast future economic conditions and could even change how we predict periods of high financial instability like the subprime crisis.

Reference

[1] Hansen, Anne Lundgaard, Predicting Recessions Using VIX-Yield-Curve Cycles (2021). SSRN 3943982

Can We Predict a Market Correction?

A correction in the equity market refers to a downward movement in stock prices after a sustained period of growth. Market corrections can be triggered by various factors such as economic conditions, changes in investor sentiment, or geopolitical events. During a correction, stock prices may decline by a certain percentage from their recent peak, signaling a temporary pause or reversal in the upward trend.

Reference [2] examines whether a correction in the equity market can be predicted. It defines a correction as a 4% decrease in the S&P 500 index and utilizes logistic regression to examine the predictive power of several technical and macroeconomic indicators.
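
The basic setup can be sketched as a binary classification problem; the column names below are placeholders, and the indicator construction is not reproduced here.

```python
# Logistic regression mapping lagged indicators to a "correction next month" label.
import pandas as pd
from sklearn.linear_model import LogisticRegression

data = pd.read_csv("indicators.csv", parse_dates=["Date"], index_col="Date")
X = data[["smirk", "oi_diff", "bseyd"]].shift(1).dropna()    # hypothetical predictors
y = data["correction_next_month"].reindex(X.index)           # 1 if a 4% drop follows

model = LogisticRegression(max_iter=1000).fit(X, y)
prob = model.predict_proba(X)[:, 1]                          # estimated correction probability
```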

Findings

-Eight technical, macroeconomic, and options-based indicators were selected based on prior research.

-Volatility Smirk (skew), Open Interest Difference, and Bond-Stock Earnings Yield Differential (BSEYD) are statistically significant predictors of market corrections.

-These three predictors were significant at the 1% level, indicating strong reliability in forecasting corrections.

-TED Spread, Bid-Offer Spread, Term Spread, Baltic Dry Index, and S&P GSCI Commodity Index did not show consistent predictive power.

-The best-performing model used a 3% correction threshold and achieved 77% accuracy in in-sample prediction.

-Out-of-sample testing showed 59% precision in identifying correction events, offering an advantage over random prediction.

-The results highlight inefficiencies in the market and support the presence of a lead-lag effect between option and equity markets.

-The research provides valuable tools for risk management and identifying early signs of downturns in equity markets.

In short, the following indicators are good predictors of a market correction,

-Volatility Smirk (i.e. skew),

-Open Interest Difference, and

-Bond-Stock Earnings Yield Differential (BSEYD)

The following indicators are not good predictors,

-The TED Spread,

-Bid-Offer Spread,

-Term Spread,

-Baltic Dry Index, and

-S&P GSCI Commodity Index

This is an important research subject, as it allows investors to manage risks effectively and take advantage of market corrections.

Reference

[2] Elias Keskinen, Predicting a Stock Market Correction, Evidence from the S&P 500 Index, University of Vaasa

Closing Thoughts

This research underscores the growing value of combining traditional financial indicators with options market metrics to improve market correction and recession forecasts. Tools like the VIX-yield curve cycle, Volatility Smirk, and BSEYD offer a more refined understanding of market risks. As financial markets evolve, integrating diverse data sources will be key to staying ahead of economic and market shifts.