Revisionist History in ESG Ratings

Investors following an environmental, social and governance (ESG) mandate can achieve their goals only if they can accurately and consistently identify the stocks that meet their criteria. But new research shows that the ESG ratings used to apply those criteria have been subject to arbitrary revisions, and that there are wide discrepancies among the vendors providing the data.

Interest in ESG investing strategies has been accelerating. ESG investing in its various forms, such as sustainable investing (SI) and socially responsible investing (SRI), now accounts for one out of every four dollars under professional management in the United States and one out of every two dollars in Europe[1]. This heightened interest has been accompanied by a dramatic increase in academic research on the subject. Because that research relies on ESG rating providers such as Bloomberg, CDP, ISS, MSCI, S&P Global, Sustainalytics, Thomson Reuters Refinitiv ESG (“Refinitiv ESG”) and Vigeo-Eiris, it raises the question of how reliable, consistent and sound those ratings are. Unfortunately, research has uncovered two problems with the data.

Divergence in ratings across raters

Florian Berg, Julian Koelbel and Roberto Rigobon, authors of the August 2019 study, “Aggregate Confusion: The Divergence of ESG Ratings,” investigated the divergence of ESG ratings across the raters. Their findings led the authors to conclude:

  • ESG performance is unlikely to be properly reflected in corporate stock and bond prices, because investors struggle to identify outperformers and laggards. Investor tastes can influence asset prices, but only when a large enough fraction of the market holds and implements a uniform nonfinancial preference. Therefore, even if a large fraction of investors prefer strong ESG performance, the divergence of the ratings disperses the effect of those preferences on asset prices.
  • The divergence frustrates companies' efforts to improve their ESG performance, because they receive mixed signals from rating agencies about which actions are expected and will be valued by the market.
  • A significant portion of the measurement divergence is rater-specific rather than category-specific, suggesting the presence of a “rater effect”: a firm that performs well (poorly) in one category for one rater is more likely to perform well (poorly) in all other categories for that same rater.
  • The divergence of ratings poses a challenge for empirical research, as using one rater versus another may alter a study’s results and conclusions.
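To make “divergence” concrete, one common way to quantify it is to correlate the scores that different raters assign to the same set of firms: a correlation near 1 means the raters broadly agree, while lower values mean they disagree about which firms are leaders and laggards. The sketch below is purely illustrative. The rater names, firm labels and scores are hypothetical numbers invented for this example, not data from any provider or from the study, and plain Pearson correlation is simply one reasonable choice of agreement measure.

```python
from itertools import combinations
from math import sqrt

# Hypothetical firms and ESG scores (0-100) from three hypothetical raters.
# These numbers are made up for illustration; they are not real rater data.
FIRMS = ["F1", "F2", "F3", "F4", "F5", "F6"]
SCORES = {
    "rater_a": [72, 65, 40, 88, 55, 30],
    "rater_b": [60, 70, 52, 75, 45, 50],
    "rater_c": [45, 80, 35, 60, 70, 40],
}


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pairwise_correlations(scores):
    """Correlation for every pair of raters; lower values signal more divergence."""
    return {
        (r1, r2): pearson(scores[r1], scores[r2])
        for r1, r2 in combinations(sorted(scores), 2)
    }


def top_firm(scores, firms):
    """The firm each rater scores highest, i.e. its 'ESG leader'."""
    return {r: firms[max(range(len(s)), key=s.__getitem__)] for r, s in scores.items()}


if __name__ == "__main__":
    corrs = pairwise_correlations(SCORES)
    for (r1, r2), rho in corrs.items():
        print(f"{r1} vs {r2}: {rho:+.2f}")
    print(f"average pairwise correlation: {sum(corrs.values()) / len(corrs):.2f}")
    print("top-rated firm per rater:", top_firm(SCORES, FIRMS))
```

With these made-up scores, two raters pick F4 as the ESG leader while the third picks F2. This is the kind of mixed signal that, per the findings above, makes it hard for investors to act on a uniform preference and for companies to know which improvements will be rewarded.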