News Sentiment as Leading Indicators for Recessions

Published: 05/11/2018
This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

A novel news sentiment indicator, derived via topic modeling and sentiment analysis of unstructured news, significantly improves recession prediction. When combined with traditional survey-based sentiment and macroeconomic factors, this direct measure of public information polari

Abstract

In the following paper, we use a topic modeling algorithm and sentiment scoring methods to construct a novel metric that serves as a leading indicator in recession prediction models. We hypothesize that the inclusion of such a sentiment indicator, derived purely from unstructured news data, will improve our capabilities to forecast future recessions because it provides a direct measure of the polarity of the information consumers and producers are exposed to. We go on to show that the inclusion of our proposed news sentiment indicator, with traditional sentiment data, such as the Michigan Index of Consumer Sentiment and the Purchasing Manager's Index, and common factors derived from a large panel of economic and financial indicators helps improve model performance significantly.

In-depth Reading

English Analysis

1. Bibliographic Information

  • Title: News Sentiment as Leading Indicators for Recessions
  • Authors: Melody Y. Huang, Randall R. Rojas, Patrick D. Convery. All authors are affiliated with the Department of Economics at the University of California, Los Angeles (UCLA).
  • Journal/Conference: The paper is available on arXiv, which is a preprint server. This means it has not undergone formal peer review for publication in a journal or conference at the time of this version's release.
  • Publication Year: 2018
  • Abstract: The authors develop a new metric for recession prediction by applying topic modeling and sentiment analysis to unstructured news data. They hypothesize that this news-derived sentiment indicator, by directly measuring the polarity of information reaching the public, can improve forecasting models. They demonstrate that combining their indicator with traditional survey-based sentiment indices (like the Michigan Index of Consumer Sentiment and the Purchasing Manager's Index) and common factors from macroeconomic data leads to significantly better model performance.
  • Original Source Link: The paper is available at https://arxiv.org/abs/1805.04160, with the PDF at http://arxiv.org/pdf/1805.04160v2.

2. Executive Summary

  • Background & Motivation (Why):

    • Core Problem: Traditional models for predicting economic recessions rely heavily on a limited set of economic and financial indicators, such as the yield curve spread. While effective to a degree, these models often miss nuances captured by public sentiment.
    • Importance & Gaps: Existing sentiment indicators are typically derived from surveys (e.g., consumer confidence), which are indirect measures and can be subject to delays and biases. The rise of "big data" and computational linguistics offers a new opportunity to extract sentiment directly from the information that shapes public opinion, such as news articles. Prior work using unstructured data has focused more on financial market prediction than on macroeconomic forecasting like recessions. This paper aims to fill that gap.
    • Fresh Angle: The paper introduces a novel method to quantify news sentiment that goes beyond simple positive/negative word counting. It uniquely combines the polarity (positive/negative tone) of news with the concentration (topical similarity) of news stories, arguing that during downturns, news becomes not only more negative but also more thematically focused on the crisis.
  • Main Contributions / Findings (What):

    • Novel Indicator Construction: The paper details a new two-part methodology for creating a monthly news sentiment indicator from 1965 to 2017 using New York Times articles. This involves:
      1. A sentiment score based on an opinion lexicon.
      2. A "coherence" score based on the Jensen-Shannon distance between topics identified by a Latent Dirichlet Allocation (LDA) model.
    • Improved Forecasting Performance: The primary finding is that including this novel news sentiment indicator in a probit model for recession prediction significantly improves both in-sample and out-of-sample performance, especially at longer forecast horizons (6 and 12 months).
    • Synergy with Existing Indicators: The news sentiment indicator provides predictive power even when controlling for a large panel of 138 macroeconomic variables (distilled into 15 common factors) and traditional survey-based sentiment indices. The inclusion of interaction terms suggests a feedback loop between news sentiment and public sentiment.

3. Prerequisite Knowledge & Related Work

  • Foundational Concepts:

    • Recession Prediction: An economic recession is a significant, widespread, and prolonged downturn in economic activity. Predicting recessions is crucial for policymakers, businesses, and investors to mitigate negative impacts. Models typically use leading indicators—economic variables that tend to change before the broader economy does.
    • Yield Curve Spread: A classic leading indicator. It is the difference between the interest rates on long-term (e.g., 10-year) and short-term (e.g., 3-month) government bonds. An "inverted" yield curve (short-term rates higher than long-term rates) has historically been a reliable predictor of recessions.
    • Sentiment Analysis: A field of Natural Language Processing (NLP) that involves computationally determining the emotional tone or "sentiment" (positive, negative, neutral) of a piece of text. The simplest form uses a lexicon, which is a dictionary of words pre-labeled with sentiment scores.
    • Topic Modeling & Latent Dirichlet Allocation (LDA): Topic modeling is an unsupervised machine learning technique used to discover abstract "topics" that occur in a collection of documents. LDA is a popular algorithm for this. It assumes that each document is a mixture of various topics, and each topic is a distribution of words. For example, a topic about "finance" might have a high probability of containing words like "stock," "market," "rate," and "bank."
    • Common Factors & Principal Component Analysis (PCA): Modern economic forecasting often uses hundreds of time series variables. To avoid overfitting and handle multicollinearity, economists use techniques like PCA to extract a few "common factors" that capture the shared movement (co-variance) among these many variables. These factors represent underlying, unobserved drivers of the economy.
    • Survey-Based Sentiment Indicators:
      • Michigan Index of Consumer Sentiment (MICS): A monthly survey measuring U.S. consumer confidence in the economy.
      • Purchasing Manager's Index (PMI): An index based on surveys of purchasing managers in the manufacturing sector, indicating the economic health of that sector.
  • Previous Works:

    • The paper first acknowledges the long history of using financial indicators, especially the yield curve spread, as the primary tool for recession forecasting (Estrella, 2005; Rudebusch and Williams, 2009).
    • It then discusses the move towards using large panels of macroeconomic variables and extracting common factors to improve forecasts (Stock and Watson, 2002; Liu and Moench, 2016). However, even in these large models, the yield curve often remains the most powerful single predictor.
    • The literature has also recognized the importance of sentiment, or "animal spirits" (Keynes, 1936). Studies by Matsusaka & Sbordone (1995) and Christiansen et al. (2014) showed that survey-based indicators like MICS and PMI significantly improve recession models.
    • Finally, the paper situates itself within the emerging field of using unstructured "big data". It cites work on using Twitter to predict stock markets (Bollen et al., 2011) and Google searches to forecast unemployment (D'Amuri and Marcucci, 2017), noting that this approach has been underutilized for macroeconomic business cycle forecasting.
  • Differentiation:

    • This paper's key innovation is the creation of a sentiment indicator derived directly from news content, rather than from surveys.
    • It uniquely combines sentiment polarity with topic coherence, arguing that the focus of news is as important as its tone. This is a more sophisticated measure than a simple word count.
    • It rigorously tests this new indicator against strong benchmarks that include both traditional financial/macroeconomic factors and existing survey-based sentiment indicators, demonstrating its additive predictive value.

4. Methodology (Core Technology & Implementation)

The core of the paper is the construction of a novel news sentiment indicator. This process involves several distinct steps.

  • Data Collection & Preprocessing:

    1. Source: All articles published in the New York Times from 1965 to 2017.
    2. Filtering: Articles were selected if they were linked to the keywords "economy" or "stock market."
    3. Preprocessing: Standard text mining practices were applied: removing numbers, punctuation, white spaces, and common "stop words" (e.g., "the," "a," "is").
    4. Aggregation: The cleaned articles were grouped into monthly blocks.
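The preprocessing and aggregation steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the stop-word list is a tiny hypothetical one, lowercasing is an added assumption, and the sample articles are invented.

```python
import re

# Tiny illustrative stop-word list (hypothetical; the paper's list is not given here).
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in"}

def preprocess(text: str) -> list[str]:
    """Lowercase, drop digits and punctuation, collapse whitespace, remove stop words."""
    text = text.lower()                      # assumption: case is not meaningful
    text = re.sub(r"[^a-z\s]", " ", text)    # numbers and punctuation become spaces
    tokens = text.split()                    # split() also collapses extra whitespace
    return [tok for tok in tokens if tok not in STOP_WORDS]

def group_by_month(articles: list[tuple[str, str]]) -> dict[str, list[list[str]]]:
    """Group (ISO date, text) pairs into monthly blocks keyed by 'YYYY-MM'."""
    blocks: dict[str, list[list[str]]] = {}
    for date, text in articles:
        blocks.setdefault(date[:7], []).append(preprocess(text))
    return blocks

# Invented sample articles, for illustration only.
sample = [
    ("2008-09-15", "The stock market fell 4.4% today, a sharp decline."),
    ("2008-09-29", "Economy in crisis: lawmakers reject the plan."),
    ("2008-10-01", "Markets rally."),
]
blocks = group_by_month(sample)
first_tokens = preprocess(sample[0][1])
```

Each monthly block is then a list of token lists, which is the shape the scoring and topic-modeling steps operate on.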
  • Step 1: Scoring the Sentiment (Polarity)

    • The authors measure the positive or negative tone of the news.
    • Method: Each word in an article is scored using an opinion lexicon.
      • Positive words receive a score of +1.
      • Negative words receive a score of -1.
      • Neutral words receive a score of 0.
    • Document Score: The total score for a single article is the sum of its word scores, divided by the total number of words in the article.
    • Daily Score ($score_t$): The final score for a given day is the sum of the scores of all relevant articles published on that day. The authors use a sum rather than an average to account for the volume of news: more negative stories on a given day should have a larger impact than a single negative story.
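As a rough sketch of the scoring rule described above, the snippet below scores tokenized articles against a lexicon and sums the per-article scores. The six-word lexicon and the example articles are hypothetical stand-ins; the actual opinion lexicon is far larger.

```python
# Hypothetical miniature opinion lexicon: +1 positive, -1 negative, anything else 0.
LEXICON = {"growth": 1, "gain": 1, "strong": 1, "crisis": -1, "loss": -1, "decline": -1}

def document_score(tokens):
    """Sum of word scores divided by the number of words in the article."""
    if not tokens:
        return 0.0
    return sum(LEXICON.get(tok, 0) for tok in tokens) / len(tokens)

def period_score(documents):
    """Sum (not average) of document scores, so the volume of news matters."""
    return sum(document_score(doc) for doc in documents)

# Invented, already-tokenized articles.
docs = [
    ["market", "decline", "crisis", "deepens"],   # (-1 - 1) / 4 = -0.5
    ["strong", "growth", "reported", "today"],    # (+1 + 1) / 4 = +0.5
    ["bank", "loss", "widens", "again"],          # -1 / 4 = -0.25
]
total = period_score(docs)   # -0.5 + 0.5 - 0.25 = -0.25
```

Because the period score is a sum, adding a second copy of a negative article doubles its contribution, which is the volume effect the authors want to capture.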
  • Step 2: Measuring the Concentration of Topics (Coherence)

    • The goal here is to quantify how thematically similar or "coherent" the news articles are in a given month. The intuition is that during a crisis, news coverage narrows its focus on the negative event.
    • Algorithm: Latent Dirichlet Allocation (LDA) is applied to each monthly block of articles.
      • LDA models each document as a mixture of topics and each topic as a mixture of words. For this paper, the number of topics $K$ was fixed at 30.
      • The key output of LDA is the set of topic distributions $\phi_k$, where each $\phi_k$ is a probability distribution over the entire vocabulary, indicating which words are most important for that topic.
    • Measuring Distance Between Topics: To measure how related the 30 topics are to each other, the authors use a distance metric derived from information theory.
      • Kullback-Leibler (KL) Divergence: A measure of how one probability distribution diverges from a second. It is defined for two discrete distributions $P$ and $Q$ as: $$D_{KL}(P \| Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}$$
      • Jensen-Shannon Divergence (JSD): A symmetric and finite version of KL divergence, making it suitable as a distance metric. It is defined as: $$JSD(P \| Q) = \frac{1}{2} D_{KL}(P \| M) + \frac{1}{2} D_{KL}(Q \| M)$$ where $M = \frac{1}{2}(P+Q)$ is the average of the two distributions.
      • The Jensen-Shannon distance is the square root of the JSD. The authors compute this distance between every pair of the 30 topic distributions ($\phi_i$, $\phi_j$), resulting in a $30 \times 30$ distance matrix $M_{dist}$.
    • Coherence Score ($\sigma_t^{dist}$): The overall "coherence" for a month is calculated as the standard deviation of all the distances in the $M_{dist}$ matrix: $$\sigma_t^{dist} = \sqrt{\frac{1}{K^2} \sum_{j=1}^{K} \sum_{i=1}^{K} (d_{i,j} - \bar{d})^2}$$ where $d_{i,j}$ is the Jensen-Shannon distance between topics $i$ and $j$, and $\bar{d}$ is the mean of the entries of $M_{dist}$. The interpretation needs some care. The paper says "the more sparsely related the topics are, the lower the coherence," and ties the standard deviation to this sparsity. Under that reading, a low $\sigma_t^{dist}$ indicates topics that are similar to each other (high coherence), which is hypothesized to occur during recessions, and a high $\sigma_t^{dist}$ indicates diverse, sparsely related topics. Strictly speaking, the formula measures how much the pairwise distances vary rather than how large they are on average, so this analysis follows the paper's reading and treats a higher $\sigma_t^{dist}$ as a sign of more sparsely related topics.
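The distance and coherence calculations in this step can be sketched in plain Python as below. The three four-word "topics" are invented for illustration (the paper uses $K = 30$ LDA topics over the full vocabulary), and the natural logarithm is an assumption, since the base is not stated here.

```python
import math
from statistics import pstdev

def kl_divergence(p, q):
    """D_KL(P || Q) with natural log; terms with P(i) = 0 contribute zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence between P and Q."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(jsd, 0.0))   # guard against tiny negative rounding error

def coherence_sigma(topics):
    """Population standard deviation over all K*K pairwise distances (diagonal included)."""
    distances = [js_distance(p, q) for p in topics for q in topics]
    return pstdev(distances)

# Invented topic-word distributions over a 4-word vocabulary.
topics = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
]
sigma = coherence_sigma(topics)
```

`pstdev` divides by the number of entries, which matches the $1/K^2$ factor in the formula; the zero diagonal of the distance matrix is included, as the double sum implies.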
  • Step 3: Constructing the Final Sentiment Indicators

    • Absolute Sentiment Indicator ($sent_t$): The coherence score is used to weight the sentiment score: $$sent_t = \sigma_t^{dist} \cdot score_t$$ This combined metric captures both the tone and focus of the news. The paper notes that this metric tends to peak and then decline sharply just before a recession.
    • Relative Sentiment Indicator ($z_t^{sent}$): To account for long-term cultural shifts in media tone (e.g., news becoming generally more negative over decades), the authors normalize the absolute indicator. They calculate a rolling z-score over a two-year (24-month) window: $$z_t^{sent} = \frac{sent_t - \mu_{t-24,t}}{sd_{t-24,t}}$$ where $\mu_{t-24,t}$ and $sd_{t-24,t}$ are the mean and standard deviation of $sent$ over the previous 24 months. This relative measure, $z_t^{sent}$, is the final indicator used in the forecasting models.
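A minimal sketch of the two indicators follows. The input series are synthetic, and two details are assumptions rather than facts from the paper: the window is taken to be the 24 months strictly before month $t$, and the sample (n-1) standard deviation is used.

```python
from statistics import mean, stdev

def absolute_sentiment(sigma, score):
    """sent_t = sigma_t * score_t, element by element."""
    return [s * c for s, c in zip(sigma, score)]

def rolling_zscore(series, window=24):
    """z_t = (x_t - mean) / sd, with mean and sd over the `window` values before t."""
    z = [None] * len(series)              # undefined until a full window is available
    for t in range(window, len(series)):
        history = series[t - window:t]    # assumption: month t itself is excluded
        sd = stdev(history)               # assumption: sample standard deviation
        if sd > 0:
            z[t] = (series[t] - mean(history)) / sd
    return z

# Synthetic monthly inputs, for illustration only.
sigma = [0.2] * 30
score = [float(i % 5) for i in range(30)]
sent = absolute_sentiment(sigma, score)
z_sent = rolling_zscore(sent)
```

The first 24 entries of `z_sent` are `None` because no full two-year history exists yet; in practice this costs the first two years of the sample.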
  • Model Specification

    • The authors use a probit model, a type of binary regression suitable for predicting a yes/no outcome (recession or not). The probability of a recession at time $t$, given information up to $t-h$ (where $h$ is the forecast horizon in months), is: $$P(rec_t = 1 \mid H_{t-h}) = \Phi(\pi_t)$$ Here, $\Phi(\cdot)$ is the standard normal cumulative distribution function, and $\pi_t$ is a linear function of the predictor variables.
    • The full proposed model is: $$\pi_t = \alpha_t + \sum_{i=1}^{15} \beta_i f_{i,t-h} + \phi_1 mics_{t-h} + \phi_2 pmi_{t-h} + \gamma_1 z_{t-h}^{sent}(1 + \gamma_2 mics_{t-h} + \gamma_3 pmi_{t-h})$$ Explanation of terms:
      • $f_{i,t-h}$: The 15 common factors extracted from a large macroeconomic panel, lagged by $h$ months.
      • $mics_{t-h}$: The Michigan Index of Consumer Sentiment, lagged.
      • $pmi_{t-h}$: The Purchasing Manager's Index, lagged.
      • $z_{t-h}^{sent}$: The novel relative news sentiment indicator, lagged.
      • The terms including $\gamma_2$ and $\gamma_3$ are interaction terms. They test the hypothesis that the effect of news sentiment is amplified by the existing level of consumer and business sentiment.
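To make the specification concrete, the sketch below evaluates $\Phi(\pi_t)$ for one month. Every coefficient and predictor value is made up for illustration; none of them are the paper's estimates, and estimation itself (maximum likelihood) is not shown.

```python
import math

def std_normal_cdf(x):
    """Phi(x), the standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def recession_probability(alpha, betas, factors, phi1, phi2, gammas, mics, pmi, z_sent):
    """Phi(pi_t) for the full model; all predictors are assumed already lagged by h."""
    g1, g2, g3 = gammas
    pi_t = (
        alpha
        + sum(b * f for b, f in zip(betas, factors))
        + phi1 * mics
        + phi2 * pmi
        + g1 * z_sent * (1.0 + g2 * mics + g3 * pmi)   # news sentiment and interactions
    )
    return std_normal_cdf(pi_t)

# Hypothetical coefficients and standardized predictor values.
prob = recession_probability(
    alpha=-1.5,
    betas=[0.3] + [0.0] * 14,
    factors=[1.0] + [0.0] * 14,
    phi1=-0.4, phi2=-0.2,
    gammas=(-0.5, 0.1, 0.1),
    mics=-1.0, pmi=-0.5, z_sent=-1.2,
)
```

Writing the sentiment term as $\gamma_1 z (1 + \gamma_2 mics + \gamma_3 pmi)$ means the marginal effect of news sentiment scales with the survey indices, which is how the interaction hypothesis enters the model.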

5. Experimental Setup

  • Datasets:

    • News Data: New York Times articles from 1965-2017, filtered for "economy" and "stock market" keywords, used to construct the $z_t^{sent}$ indicator.
    • Macroeconomic Data: The FRED-MD monthly database, a balanced panel of 138 U.S. macroeconomic time series from 1965 to 2017. PCA was used to extract 15 common factors from this panel.
    • Recession Data: Recession periods are defined by the National Bureau of Economic Research (NBER) business cycle dates. This is converted into a binary variable $rec_t$ (1 for recession month, 0 otherwise).
  • Evaluation Metrics:

    • F1-Score:
      1. Conceptual Definition: Used because recession data is highly imbalanced (far more non-recession months than recession months). A simple accuracy metric would be misleading. The F1-score is the harmonic mean of precision and recall, providing a balanced measure of a model's performance on the positive class (recessions).
        • Precision: Of all the months the model predicted as a recession, what fraction were actually recessions? (Measures exactness).
        • Recall (True Positive Rate): Of all the actual recession months, what fraction did the model correctly identify? (Measures completeness).
      2. Mathematical Formula: $$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$$ $$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$$ $$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
      3. Symbol Explanation: True Positives are correctly identified recession months. False Positives are non-recession months incorrectly flagged as recessions. False Negatives are recession months that were missed.
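A small worked example may help show why F1 is preferred to accuracy on imbalanced recession data. The ten-month label sequences below are invented.

```python
def f1_score(actual, predicted):
    """F1 for the positive class (1 = recession month)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Invented example: 3 recession months out of 10.
actual    = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
predicted = [0, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # one false alarm, one missed month
never     = [0] * 10                         # a model that never predicts recession

f1_model = f1_score(actual, predicted)       # TP=2, FP=1, FN=1 -> precision = recall = 2/3
f1_never = f1_score(actual, never)
accuracy_never = sum(a == p for a, p in zip(actual, never)) / len(actual)
```

The never-recession model is right in 7 of 10 months yet scores an F1 of zero, which is the distortion the F1-score is meant to expose.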
    • ROC Curve and AUROC:
      1. Conceptual Definition: The Receiver Operating Characteristic (ROC) curve plots the True Positive Rate (Recall) against the False Positive Rate at various classification thresholds. The Area Under the ROC Curve (AUROC) provides a single number summarizing the model's performance across all thresholds. An AUROC of 1.0 represents a perfect classifier, while 0.5 represents a random guess.
      2. Mathematical Formula: $$\text{True Positive Rate (TPR)} = \frac{TP}{TP+FN}$$ $$\text{False Positive Rate (FPR)} = \frac{FP}{FP+TN}$$ $$\text{AUROC} = \int_{0}^{1} \text{TPR}(\text{FPR}^{-1}(x)) \, dx$$
      3. Symbol Explanation: TP, FN, FP are True Positives, False Negatives, and False Positives. TN is True Negatives (correctly identified non-recession months). The integral calculates the area under the curve formed by plotting TPR vs. FPR.
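In practice the integral is approximated from a finite set of thresholds. The sketch below sweeps the threshold through each distinct predicted probability and applies the trapezoidal rule; the labels and scores are invented, and the function assumes both classes are present.

```python
def roc_points(actual, scores):
    """(FPR, TPR) pairs obtained by lowering the threshold through each distinct score."""
    positives = sum(actual)
    negatives = len(actual) - positives
    ranked = sorted(zip(scores, actual), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all observations tied at this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points

def auroc(actual, scores):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = roc_points(actual, scores)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

# Invented labels and predicted probabilities.
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
area = auroc(labels, probs)   # 0.75: one of the four positive/negative pairs is mis-ordered
```

Equivalently, the AUROC is the fraction of (recession, non-recession) month pairs in which the recession month receives the higher predicted probability.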
    • AIC and BIC:
      1. Conceptual Definition: The Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) are measures used for model selection. They penalize models for having more parameters, balancing model fit against complexity to prevent overfitting. Lower values are better.
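For reference, the standard definitions are $AIC = 2k - 2\ln\hat{L}$ and $BIC = k\ln n - 2\ln\hat{L}$, where $k$ is the number of parameters, $n$ the number of observations, and $\hat{L}$ the maximized likelihood. The sketch below assumes the paper uses these definitions and that the "df" column of Table 2 equals $k$; under those assumptions, $BIC - AIC = k(\ln n - 2)$, so the reported values can be checked for internal consistency. The numbers are the $h = 1$ entries as transcribed in Table 2 of this analysis.

```python
import math

def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood

def implied_sample_size(aic_value, bic_value, k):
    """Solve BIC - AIC = k * (ln n - 2) for n."""
    return math.exp((bic_value - aic_value) / k + 2)

# (df, AIC, BIC) at h = 1, copied from the Table 2 transcription; df is assumed to equal k.
table2_h1 = {
    "Model 1": (16, 152.2288, 223.2328),
    "Model 2": (18, 117.6832, 197.5627),
    "Model 3": (21, 120.7297, 213.9225),
}
implied_n = {
    name: implied_sample_size(a, b, k) for name, (k, a, b) in table2_h1.items()
}
```

All three models imply roughly 625 observations, which is plausible for monthly data over 1965-2017 after the rolling window and lags are removed. Since $\ln n > 2$ for any sample this large, BIC penalizes each extra parameter more heavily than AIC, which explains why the larger Model 3 can win on AIC but lose on BIC.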
  • Baselines:

    • Model 1 (15 PCA): A probit model using only the 15 common factors derived from the macroeconomic panel.
    • Model 2 (MCI, PMI, 15 PCA): Model 1 augmented with the two traditional survey-based sentiment indicators (MICS and PMI).
    • Proposed Model (Model 3): Model 2 augmented with the novel relative news sentiment indicator ($z_t^{sent}$) and its interaction terms with MICS and PMI.

6. Results & Analysis

  • Core Results:

    • Regression Analysis (Table 1): The regression output shows that the news sentiment indicator $z_{t-h}^{sent}$ and its interaction terms are highly statistically significant, especially at longer forecast horizons ($h=6$ and $h=12$). This indicates that the new indicator provides predictive information not captured by the other variables. For example, at $h=12$, the coefficient on $z_{t-h}^{sent}$ is positive and significant, while its interaction with $mics_{t-h}$ is negative and significant. This suggests that the effect of news sentiment depends on the prevailing consumer sentiment.

    • Model Selection (Table 2):

      • At short horizons ($h=1$, $h=3$), AIC and BIC favor Model 2 (factors + traditional sentiment). This aligns with the regression results where the news indicator was less significant.
      • At long horizons, the proposed Model 3 is preferred by both AIC and BIC at $h=6$, and by AIC at $h=12$ (where BIC instead favors Model 1). This is a key finding, as long-term forecasting is particularly valuable.
    • Forecasting Performance (Table 3 & Table 4):

      • In-Sample: The proposed Model 3 achieves the highest F1-score across all forecast horizons, with the performance gap widening at longer horizons ($h=12$: 0.7355 vs. 0.6795 for Model 2).
      • Out-of-Sample: The analysis is split into three periods. The paper finds that the sentiment indicators provide the most benefit after the 1980s, and the news sentiment indicator specifically shows its strength in the most recent period (1999-2017). For this recent period, Model 3 generally outperforms the others, especially at $h=3$, $h=6$, and $h=12$.
      • AUROC: The bootstrapped AUROC results in Table 4 confirm these findings. At longer horizons ($h=6$ and $h=12$), the proposed Model 3 has a higher mean and median AUROC than the baseline models, indicating more robust and reliable predictive power.
  • Figures & Visualizations:

    • Figure 1: Intertopic Distance Map

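      Image description: An intertopic distance map showing the principal component analysis (PCA) layout of topics from a recession period (blue) and a non-recession period (red) and how they overlap, used to analyze how the news sentiment indicator behaves across the business cycle.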

      This figure visualizes the core intuition behind the "coherence" metric. The topics from a recession month (February 2009, blue circles) are tightly clustered, indicating a narrow, focused news cycle. In contrast, topics from a non-recession month (August 2004, red circles) are more spread out, indicating a diverse range of news topics.

    • Figure 2 & 3: Sentiment Time Series

      Figure 2: We plot out the constructed time series of news sentiment and observe that prior to recessions occurring (denoted by the light blue bars in the plot), the series peaks and then quickly begi… Image description: The chart shows the constructed news sentiment indicator time series together with its 6-month moving average. Light blue vertical bars mark recession periods; the indicator typically peaks and then falls quickly before a recession, signaling a potential downturn.

      Figure 3: We plot out the relative news sentiment index that we have constructed. This is done by taking a rolling two year z-score across our original measure of news sentiment (see Figure 2 for com… Image description: The chart shows the time series of the relative news sentiment indicator constructed in the paper. Standardizing the news sentiment indicator with a rolling two-year z-score removes the effect of news-cycle variation across different periods; gray regions mark recessions.

      Figure 2 shows the absolute sentiment indicator ($sent_t$). A clear pattern emerges where the indicator spikes and then plummets just before NBER-defined recessions (light blue bars). Figure 3 shows the normalized, relative indicator ($z_t^{sent}$), which is used in the final model. This version removes long-term trends and appears more stationary.

    • Figure 4: Density of Sentiment Metric

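      Figure 4: The estimated densities of the sentiment metric during times of recession and times of non-recession are plotted (where the dashed line notes the density during times of recession, and the… Image description: The chart shows the estimated probability density of the news sentiment indicator during recessions (dashed line) and non-recessions (solid line). The indicator is generally lower during recessions and higher during non-recessions.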

      This plot confirms that the distribution of the sentiment metric is different in recession vs. non-recession periods. The mean of the metric is visibly lower during recessions (dashed red line) than during non-recessions (solid blue line), supporting its validity as an indicator.

    • Figure 5 & 6: Fitted and Forecasted Probabilities

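      Figure 5: We compare the fitted recession probabilities for the different models. Black represents the values fit by the fifteen common factor model, red represents the values fit by the fifteen comm… Image description: The chart compares the recession probabilities fitted by the different models. Black is the fifteen-common-factor model, red is the model that adds the traditional sentiment indicators, and blue is the improved model that adds the news sentiment indicator; it shows that the news sentiment indicator and its interaction terms improve the fit.

      Figure 6: We plot the recursively forecasted values. We see that at a shorter time horizon, our model is prone to false signals, but is more stable and less noisy at longer time horizons. The accurac… Image description: The chart shows the recursively forecasted recession probabilities from the three models at forecast horizons h = 1, 3, 6, and 12. Short-horizon forecasts produce more false alarms, while long-horizon forecasts are more stable and less noisy.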

      These figures visually compare the models' predicted probabilities of recession against the actual recession periods (gray bars). Figure 5 (in-sample) shows the proposed model (blue) often provides a cleaner and more decisive signal. Figure 6 (out-of-sample) shows the model's real-time forecasting ability, highlighting that it becomes more stable at longer horizons.

    • Figure 7: ROC Curves

      Figure 7: We perform block bootstrapping across our data set and then perform out-of-sample recursive backtesting across the last third of the synthetic data set to obtain different ROC curves. This a… Image description: ROC curves obtained through block bootstrapping and recursive backtesting at horizons of h = 1, 3, 6, and 12 months. The proposed model performs best at the 6-month horizon and worse at the 1-month horizon.

      This figure shows the results of a robustness check using bootstrapping. The ROC curves for the proposed model (blue) are consistently shifted further towards the top-left corner (the ideal point) for the 6-month and 12-month ahead forecasts, visually confirming its superior performance at these longer horizons.
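The resampling idea behind this robustness check can be illustrated with a moving-block bootstrap, sketched below. This is one common variant, chosen here as an assumption; the paper's exact scheme and block length are not given in this analysis, and the 12-month block length and the toy series are hypothetical.

```python
import random

def moving_block_bootstrap(series, block_length, rng):
    """Resample a series by concatenating randomly chosen contiguous blocks."""
    n = len(series)
    starts = range(n - block_length + 1)      # every valid block start position
    resampled = []
    while len(resampled) < n:
        start = rng.choice(starts)
        resampled.extend(series[start:start + block_length])
    return resampled[:n]                      # trim the last block to the original length

rng = random.Random(0)                        # fixed seed so the sketch is repeatable
months = list(range(60))                      # toy stand-in for 60 monthly observations
synthetic = moving_block_bootstrap(months, block_length=12, rng=rng)
```

Sampling whole blocks rather than single months keeps short-run serial dependence intact inside each block, which matters because recession months cluster in time. Each synthetic series would then be fed through the same recursive out-of-sample procedure to obtain one ROC curve.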

  • Tables Transcription & Analysis

    • Table 1: Regression Output of Proposed Model. This table contains the coefficients from the probit regression for the proposed model (Model 3) at different forecast horizons ($h = 1, 3, 6, 12$ months). Standard errors are in parentheses. Significance levels: *p<0.1; **p<0.05; ***p<0.01. The table contains too many variables to transcribe fully, but the key variables are $mics_{t-h}$, $pmi_{t-h}$, $z_{t-h}^{sent}$, and the interaction terms.

      Key takeaway from Table 1: The statistical significance of the novel indicator $z_{t-h}^{sent}$ and its interaction with $mics_{t-h}$ increases dramatically at longer horizons ($h=6$ and $h=12$), demonstrating its value for long-term forecasting.

    • Table 2: AIC and BIC Comparison. This is a manual transcription of Table 2 from the paper.

      | Model | df | AIC (h=1) | BIC (h=1) | AIC (h=3) | BIC (h=3) | AIC (h=6) | BIC (h=6) | AIC (h=12) | BIC (h=12) |
      | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
      | Model 1 (15 PCA) | 16 | 152.2288 | 223.2328 | 194.1055 | 265.0582 | 337.7354 | 308.6109 | 253.886 | 324.606* |
      | Model 2 (MCI, PMI, 15 PCA) | 18 | 117.6832* | 197.5627* | 149.1068* | 228.9286* | 221.9752 | 301.7102 | 254.081 | 333.641 |
      | Model 3 (Our proposed model) | 21 | 120.7297 | 213.9225 | 183.9261 | 268.1824 | 197.8835* | 290.9076* | 238.621* | 331.441 |

      Note: The asterisk (*) indicates the preferred model according to that criterion. There appears to be a typo in the table for Model 1 at h=6, where the BIC is lower than the AIC. Assuming this is a typo and focusing on the general trend, the conclusion holds.

      Analysis: At shorter horizons (h=1, 3), Model 2 is preferred. At longer horizons (h=6, 12), the proposed Model 3 is selected by AIC and/or BIC, supporting the idea that news sentiment is a valuable leading indicator for more distant events.

    • Table 3: Model Performance (F1-scores). This is a manual transcription of Table 3 from the paper.

      In-Sample

      | Model | h=1 | h=3 | h=6 | h=12 |
      | --- | --- | --- | --- | --- |
      | Model 1 (15 PCA) | 0.8571 | 0.7875 | 0.7200 | 0.6710 |
      | Model 2 (MCI, PMI, 15 PCA) | 0.9091 | 0.8589 | 0.7564 | 0.6795 |
      | Model 3 (Our proposed model) | 0.9146 | 0.8606 | 0.7975 | 0.7355 |

      Out-of-Sample

      | Period 1 (1965-1981) | h=1 | h=3 | h=6 | h=12 |
      | --- | --- | --- | --- | --- |
      | Model 1 (15 PCA) | 0.6813 | 0.7865 | 0.8409 | 0.4390 |
      | Model 2 (MCI, PMI, 15 PCA) | 0.6392 | 0.7957 | 0.8506 | 0.4444 |
      | Model 3 (Our proposed model) | 0.5769 | 0.7912 | 0.7955 | 0.5063 |

      | Period 2 (1982-1998) | h=1 | h=3 | h=6 | h=12 |
      | --- | --- | --- | --- | --- |
      | Model 1 (15 PCA) | 0.7143 | 0.4545 | 0.0000 | 0.0000 |
      | Model 2 (MCI, PMI, 15 PCA) | 0.8966 | 0.6154 | 0.2222 | 0.0000 |
      | Model 3 (Our proposed model) | 0.8000 | 0.5000 | 0.2500 | 0.1250 |

      | Period 3 (1999-2017) | h=1 | h=3 | h=6 | h=12 |
      | --- | --- | --- | --- | --- |
      | Model 1 (15 PCA) | 0.5000 | 0.5789 | 0.4706 | 0.4444 |
      | Model 2 (MCI, PMI, 15 PCA) | 0.7273 | 0.7660 | 0.6500 | 0.4444 |
      | Model 3 (Our proposed model) | 0.6977 | 0.8077 | 0.6818 | 0.5000 |

      Analysis: The in-sample results clearly show Model 3 is superior. The out-of-sample results are more nuanced. Model 3's advantage is most pronounced in the most recent period (Period 3), where it shows clear outperformance at horizons of 3, 6, and 12 months. This suggests the indicator's relevance may have increased over time.

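    • Table 4: AUROC Comparison. This is a manual transcription of Table 4 from the paper. (1)=Model 1, (2)=Model 2, (3)=Model 3.

      | Horizon | Model | Min | 1st Quartile | Median | Mean | 3rd Quartile | Max |
      | --- | --- | --- | --- | --- | --- | --- | --- |
      | h = 1 | (1) | 0.939 | 0.975 | 0.982 | 0.980 | 0.988 | 0.999 |
      | | (2) | 0.818 | 0.964 | 0.982 | 0.974 | 0.992 | 0.999 |
      | | (3) | 0.878 | 0.955 | 0.968 | 0.966 | 0.988 | 1.000 |
      | h = 3 | (1) | 0.872 | 0.964 | 0.973 | 0.970 | 0.979 | 0.990 |
      | | (2) | 0.890 | 0.977 | 0.985 | 0.981 | 0.989 | 0.995 |
      | | (3) | 0.854 | 0.976 | 0.984 | 0.977 | 0.989 | 0.999 |
      | h = 6 | (1) | 0.893 | 0.943 | 0.956 | 0.955 | 0.968 | 0.987 |
      | | (2) | 0.913 | 0.952 | 0.962 | 0.961 | 0.972 | 0.986 |
      | | (3) | 0.906 | 0.959 | 0.972 | 0.968 | 0.979 | 0.987 |
      | h = 12 | (1) | 0.887 | 0.938 | 0.949 | 0.948 | 0.958 | 0.986 |
      | | (2) | 0.889 | 0.939 | 0.950 | 0.949 | 0.961 | 0.986 |
      | | (3) | 0.901 | 0.947 | 0.958 | 0.956 | 0.966 | 0.988 |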

      Analysis: The bootstrapped AUROC distributions confirm the paper's main thesis. At $h=6$ and $h=12$, both the mean and median AUROC for Model 3 are higher than for Models 1 and 2. This demonstrates that the improved performance is robust and not just a fluke of a single data split.

7. Conclusion & Reflections

  • Conclusion Summary: The paper successfully demonstrates that a novel sentiment indicator, constructed from the polarity and topical coherence of news articles, is a valuable leading indicator for U.S. recessions. Its inclusion in a forecasting model containing a wide array of traditional macroeconomic, financial, and survey-based sentiment variables provides a statistically significant improvement in predictive performance, particularly for longer forecast horizons of 6 to 12 months.

  • Limitations & Future Work: The authors acknowledge several avenues for future research:

    • Broaden Data Sources: The study exclusively uses the New York Times, which may have a specific editorial bias. Including other news outlets (e.g., CNN, Fox News, Wall Street Journal) would create a more robust and less biased indicator.
    • International Application: The methodology is generalizable and could be applied to predict recessions in other countries, as LDA can work with non-English text.
    • Alternative Data Streams: The survey-based sentiment indices are proxies for public mood. A more direct measure could be constructed by applying similar NLP techniques to social media data, such as Twitter feeds.
  • Personal Insights & Critique:

    • Novelty: The paper's primary strength is its innovative construction of the sentiment indicator. The combination of sentiment polarity with topic coherence (via Jensen-Shannon distance) is a clever way to capture a more nuanced signal than simple sentiment scoring alone. It formalizes the intuition that during a crisis, the news not only gets "bad" but also gets "stuck on one topic."
    • Potential Limitations:
      • The choice of $K=30$ (number of topics) for LDA is arbitrary and fixed for simplicity. The optimal number of topics likely varies over time, and a more adaptive method could potentially improve the coherence measure.
      • The sentiment lexicon used is simple (+1/-1/0). More advanced sentiment analysis techniques could capture nuances like sarcasm, negation, and varying degrees of sentiment intensity.
      • The finding that the indicator's performance is strongest in the post-1999 period is intriguing. This could be due to the changing nature of the 24-hour news cycle, the rise of the internet, or structural shifts in the economy itself. This deserves further investigation.
    • Impact: This work is a strong example of how techniques from computer science and data science can be applied to solve classic problems in economics. It pushes the boundary of economic forecasting beyond traditional time-series analysis and into the realm of unstructured data, paving the way for more timely and direct measures of economic "animal spirits."
