Revisionist History in ESG Ratings

Investors following an environmental, social and governance (ESG) mandate can achieve their goals only if they can accurately and consistently identify stocks that meet their criteria. But new research shows that those criteria have been subject to arbitrary revisions and that there are wide discrepancies among the vendors providing the data.

Interest in ESG investing strategies has been accelerating. In fact, ESG investing in its various forms, such as sustainable investing (SI) or socially responsible investing (SRI), now accounts for one out of every four dollars under professional management in the United States and one out of every two dollars in Europe[1]. The heightened interest has been accompanied by a dramatic increase in academic research into the subject. That research relies on ESG rating providers such as Bloomberg, CDP, ISS, MSCI, S&P Global, Sustainalytics, Thomson Reuters Refinitiv ESG (“Refinitiv ESG”) and Vigeo-Eiris, which raises the question of the reliability, consistency and overall quality of the ratings. Unfortunately, research has uncovered two problems with the data.

Divergence in ratings across raters

Florian Berg, Julian Koelbel and Roberto Rigobon, authors of the August 2019 study, “Aggregate Confusion: The Divergence of ESG Ratings,” investigated the divergence of ESG ratings across the raters. Their findings led the authors to conclude:

ESG performance is unlikely to be properly reflected in corporate stock and bond prices, as investors face a challenge when trying to identify outperformers and laggards – investor tastes can influence asset prices, but only when a large enough fraction of the market holds and implements a uniform nonfinancial preference. Therefore, even if a large fraction of investors have a preference for ESG performance, the divergence of the ratings disperses the effect of these preferences on asset prices.

The divergence frustrates the ambition of companies to improve their ESG performance because they receive mixed signals from rating agencies about which actions are expected and will be valued by the market.

A significant portion of the measurement divergence is rater-specific and not category-specific, suggesting the presence of a “rater effect” – a firm that performs well (poorly) in one category for one rater is more likely to perform well (poorly) in all other categories for that same rater.

The divergence of ratings poses a challenge for empirical research, as using one rater versus another may alter a study’s results and conclusions.

They added: “Ambiguity around ESG ratings is an impediment to prudent decision-making that would contribute to an environmentally sustainable and socially just economy.” And finally, they stated: “To change the situation, companies should work with rating agencies to establish open and transparent disclosure standards and ensure that the data is publicly accessible.”

Revisionist history in ratings

Florian Berg, Kornelia Fabisik and Zacharias Sautner contribute to the ESG literature with their September 2020 study, “Rewriting History II: The (Un)predictable Past of ESG Ratings.” They began by noting that prior research had found that fund flows react strongly to the ESG ratings of mutual funds, which are constructed based on the ESG ratings of their portfolio firms. They cited the 2018 study, “Why and How Investors Use ESG Information: Evidence from a Global Survey,” which found that “82% of investment professionals use ESG information in the investment process, but 26.4% also indicate a lack of ESG rating reliability.” To determine the reliability of ESG scores, they examined two versions of the same Refinitiv ESG data for the same set of firm-years. Refinitiv ESG’s scores have been used (or referenced) in more than 1,000 academic articles over the past 15 years and are used by major asset managers, such as BlackRock, to manage ESG investment risks. The initial version of the data was from September 2018; the rewritten version was from two years later, in September 2020. The scores included an overall ESG score as well as environmental (E), social (S) and governance (G) subscores. The sample contained 29,828 firm-year observations between 2011 and 2017 from 72 countries.

Berg, Fabisik and Sautner found widespread changes to the historical ESG scores of Refinitiv ESG. Such changes have important implications for analyses linking ESG scores to outcome variables such as firm performance or stock returns. Following are some of their key findings:

ESG scores for identical firm-years differed between the two data versions – in some cases dramatically.

Not a single ESG score was the same across the two versions.

Thirteen percent of the sample observations were subject to a score upgrade – the rewritten ESG score was higher than the initial ESG score.

Eighty-seven percent of the observations were subject to a score downgrade.

The data rewriting was large economically – the overall ESG score in the rewritten version was on average 20.6% lower than in the initial version.

The percentage deviations from the initial to the rewritten version for the E, S and G subscores were -47.4%, 8.6% and 116.2%, respectively.

The ESG score deviations strongly affected the ESG-based ranking of S&P 1500 firms, which in turn affected the classification of firms into different ESG quantiles. For the overall ESG score, only 68.5% of firm-year observations were classified into the top decile (top 10%) in both the initial and rewritten data versions; numbers were similar for the bottom decile.

Firms that experienced positive (or only small negative) score deviations (top quartile of the distribution of ESG score deviations) exhibited positive cumulative abnormal returns (CARs) when Refinitiv ESG’s methodology change was announced, while firms with large negative score deviations (bottom quartile) experienced negative CARs.

Firms that were more profitable (measured using EBIT/assets) and spent more on R&D experienced upgrades in their rewritten ESG scores.
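
To make the comparison of two data versions concrete, here is a minimal Python sketch of how one might measure score deviations and the overlap in top-group membership between an initial and a rewritten vintage. It is not the authors' code: the firm names, scores and function names are all hypothetical, and the toy example uses the top 20% rather than the top decile only because it contains just 10 firms.

```python
# Hypothetical toy data: firm -> overall ESG score in each data version.
initial = {"A": 80, "B": 75, "C": 70, "D": 60, "E": 55,
           "F": 50, "G": 45, "H": 40, "I": 30, "J": 20}
rewritten = {"A": 72, "B": 50, "C": 71, "D": 40, "E": 56,
             "F": 30, "G": 44, "H": 20, "I": 10, "J": 21}


def pct_deviation(initial, rewritten):
    """Percentage change from the initial to the rewritten score, per firm."""
    return {
        firm: 100.0 * (rewritten[firm] - initial[firm]) / initial[firm]
        for firm in initial
        if firm in rewritten and initial[firm] != 0
    }


def share_upgraded(initial, rewritten):
    """Fraction of firms whose rewritten score is higher than the initial one."""
    upgraded = sum(1 for firm in initial if rewritten[firm] > initial[firm])
    return upgraded / len(initial)


def top_group(scores, fraction):
    """Firms in the top `fraction` of the ranking (ties broken by firm name)."""
    size = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=lambda firm: (-scores[firm], firm))
    return set(ranked[:size])


def overlap_share(initial, rewritten, fraction=0.10):
    """Share of the initial top group that is still in the rewritten top group."""
    top_initial = top_group(initial, fraction)
    top_rewritten = top_group(rewritten, fraction)
    return len(top_initial & top_rewritten) / len(top_initial)


deviations = pct_deviation(initial, rewritten)
upgrade_share = share_upgraded(initial, rewritten)
top_overlap = overlap_share(initial, rewritten, fraction=0.20)
```

In this toy data, firm B drops out of the top group after the rewrite and firm C takes its place, so a portfolio formed from the "top" firms would differ depending on which data version was used.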

The authors noted that the changes in the scores were caused by two adjustments in the scoring methodology that occurred in April 2020. The first change was that Refinitiv ESG started to take into account that not all ESG metrics feeding into the ESG scores are of equal importance to every industry. The second was that while Refinitiv ESG was previously assigning a neutral score to firms that did not report on a certain metric, the new methodology assigned a score of zero to such firms.
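
The effect of those two adjustments can be illustrated with a small Python sketch. It is an assumption-laden toy, not Refinitiv ESG's actual methodology: the metric names, the 0-to-1 scale, the neutral value of 0.5, the weights and the weighted-average aggregation are all hypothetical choices made only to show the direction of the effect.

```python
from typing import Dict, Optional

NEUTRAL = 0.5  # assumed "neutral" value on a hypothetical 0-to-1 scale


def aggregate_score(
    metrics: Dict[str, Optional[float]],
    weights: Dict[str, float],
    missing_value: float,
) -> float:
    """Weighted average of metric scores; unreported metrics (None) get `missing_value`."""
    total_weight = sum(weights[name] for name in metrics)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_sum = sum(
        weights[name] * (missing_value if value is None else value)
        for name, value in metrics.items()
    )
    return weighted_sum / total_weight


# Hypothetical firm that does not report on water use.
metrics = {"emissions": 0.8, "water_use": None, "board_independence": 0.6}

# Before: every metric counts equally and a missing metric is scored as neutral.
equal_weights = {"emissions": 1.0, "water_use": 1.0, "board_independence": 1.0}
old_style = aggregate_score(metrics, equal_weights, missing_value=NEUTRAL)

# After: weights vary by industry (made-up values) and a missing metric scores zero.
industry_weights = {"emissions": 2.0, "water_use": 1.0, "board_independence": 1.0}
new_style = aggregate_score(metrics, industry_weights, missing_value=0.0)
```

Under these assumptions the same firm, with the same disclosures, receives a lower score after the change simply because the unreported metric is now treated as zero rather than neutral.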

Berg, Fabisik and Sautner also compared abnormal returns of high- and low-E&S firms before versus after a COVID-19 event date (February 24, 2020). When classifying firms based on the initial E&S scores, they found no evidence that high-E&S firms performed better during the COVID-19 pandemic compared to low-E&S firms. However, using the rewritten data, they found strong evidence that high-E&S firms exhibited better stock market performance during the pandemic relative to low-E&S firms. Of course, this outperformance would not have been achievable with the information available to investors at the onset of (or before) the pandemic. This finding has important implications, applying broadly to the backtesting of ESG strategies – it is critical to verify that the original, not the rewritten, scores are being used.
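
One way to guard against this kind of look-ahead bias in a backtest is to store every version of a score together with the date it became available, and to look up only what was known on the portfolio formation date. The Python sketch below illustrates the idea; the class, the firm name, the scores and the exact snapshot dates are hypothetical, and real vendor data would need its own handling of publication dates.

```python
from bisect import bisect_right
from datetime import date


class PointInTimeScores:
    """Keeps each version of a score with the date on which it became available."""

    def __init__(self):
        # (firm, fiscal_year) -> list of (available_on, score), kept sorted by date
        self._history = {}

    def record(self, firm, fiscal_year, available_on, score):
        versions = self._history.setdefault((firm, fiscal_year), [])
        versions.append((available_on, score))
        versions.sort()

    def score_as_of(self, firm, fiscal_year, when):
        """Latest score available on or before `when`, or None if none existed yet."""
        versions = self._history.get((firm, fiscal_year), [])
        dates = [available_on for available_on, _ in versions]
        index = bisect_right(dates, when)
        return versions[index - 1][1] if index else None


store = PointInTimeScores()
# Hypothetical firm-year with an initial and a rewritten score.
store.record("ACME", 2017, date(2018, 9, 1), 70.0)
store.record("ACME", 2017, date(2020, 9, 1), 48.0)

# A backtest forming portfolios on the COVID-19 event date sees only the initial score.
known_at_event = store.score_as_of("ACME", 2017, date(2020, 2, 24))
known_later = store.score_as_of("ACME", 2017, date(2020, 12, 31))
```

The design choice here is to never overwrite a score: a rewrite is recorded as a new version, so tests that need the original data and tests that need the current data can both be run from the same store.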

Summary

The explosion in ESG research has led to a reliance on ESG rating providers. These data vendors develop scores that evaluate how well a firm performs with respect to various ESG criteria. Unfortunately, there is wide dispersion in ratings among the raters, and ratings are subject to revision, making it difficult to analyze the implications of linking ESG scores to outcome variables such as firm performance or stock returns. As Berg, Fabisik and Sautner noted: “Moving forward, researchers and investment professionals need to verify whether the original, not the rewritten, ESG scores are needed to perform their tests. Given that ESG research and ESG-related investment strategies are likely to grow even further, this is an important caveat for the use of the current, and thereby rewritten, Refinitiv ESG data.”

Larry Swedroe is the chief research officer for Buckingham Strategic Wealth and Buckingham Strategic Partners.

Important Disclosure: This article is for educational and informational purposes only and should not be construed as specific investment, accounting, legal or tax advice. By clicking on any of the links above, you acknowledge that they are solely for your convenience, and do not necessarily imply any affiliations, sponsorships, endorsements or representations whatsoever by us regarding third-party websites. We are not responsible for the content, availability or privacy policies of these sites, and shall not be responsible or liable for any information, opinions, advice, products or services available on or through them. The opinions expressed by featured authors are their own and may not accurately reflect those of Buckingham Strategic Wealth®. R-20-1477

[1] Depending on how you identify ESG assets, those numbers could be considerably lower.
