On September 15th, Bloomberg Businessweek (BBW) published its annual ranking of MBA programs including 119 schools across four regions. In the U.S. ranking, each school is evaluated on five factors (‘indexes,’ as BBW calls them): Compensation, Learning, Networking, Entrepreneurship, and Diversity. Diversity is a new index introduced in the 2021 ranking of U.S. schools. These indexes are assigned relative weights, and using those weights to compute a composite score, BBW arrives at a ranking of the schools. A unique feature of the BBW methodology since 2018 is the ‘crowd-sourcing’ of weights for the indexes of the ranking. BBW’s stated methodology is elaborate and mentions the crowd-sourcing feature prominently: “Rather than assign the indexes relative weightings ourselves, as most rankings systems do, we let the stakeholders decide. In our surveys, we ask students, alumni, and recruiters what was most important to them.”
It is therefore problematic (and ironic) that BBW’s published ranking of the U.S. schools cannot be replicated by applying their stakeholder-generated weights to the five indexes that make up the overall score for each school. Using BBW’s published index scores and index weights produces a ranking dramatically different from the one that BBW has published. This replication crisis of the BBW ranking remains unmitigated whether one uses the ‘normalized scores’ provided by BBW for the five indexes, each scaled from 0 to 100, or the standardized ‘z-scores’ computed from the ‘normalized’ scores. The rankings that result by applying BBW’s index weights either to the published ‘normalized’ scores, or to the z-scores computed from the published scores, are egregiously off-kilter when compared to the published ranking, as the tables at the end of this article show.
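The replication check described above amounts to a weighted sum per school, computed once on the 0-100 scores and once on z-scores, followed by a sort. Here is a minimal sketch in Python. The four schools, their scores, and the weights are hypothetical placeholders, not BBW's published figures, and the function names are illustrative.

```python
from statistics import mean, pstdev

INDEXES = ["compensation", "learning", "networking", "entrepreneurship", "diversity"]

# Illustrative weights only (they sum to 1); NOT BBW's published weights.
WEIGHTS = {"compensation": 0.35, "learning": 0.25, "networking": 0.20,
           "entrepreneurship": 0.10, "diversity": 0.10}

# Hypothetical 0-100 index scores; NOT BBW data.
SCORES = {
    "School A": {"compensation": 100, "learning": 20, "networking": 90,
                 "entrepreneurship": 80, "diversity": 40},
    "School B": {"compensation": 60, "learning": 100, "networking": 50,
                 "entrepreneurship": 40, "diversity": 70},
    "School C": {"compensation": 80, "learning": 60, "networking": 100,
                 "entrepreneurship": 100, "diversity": 0},
    "School D": {"compensation": 0, "learning": 80, "networking": 0,
                 "entrepreneurship": 0, "diversity": 100},
}


def composite(scores, weights):
    """Weighted sum of index scores for each school."""
    return {school: sum(weights[i] * vals[i] for i in weights)
            for school, vals in scores.items()}


def standardize(scores):
    """Convert each index column to z-scores (population standard deviation)."""
    out = {school: {} for school in scores}
    for i in INDEXES:
        column = [scores[school][i] for school in scores]
        mu, sd = mean(column), pstdev(column)
        for school in scores:
            out[school][i] = (scores[school][i] - mu) / sd
    return out


def rank(totals):
    """Schools ordered from highest to lowest composite score."""
    return sorted(totals, key=totals.get, reverse=True)


print("Rank from 0-100 scores:", rank(composite(SCORES, WEIGHTS)))
print("Rank from z-scores:    ", rank(composite(standardize(SCORES), WEIGHTS)))
```

With real data, either resulting order would then be compared, school by school, against the published ranking.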
The only way to replicate the published ranking is to apply index weights that are vastly different from the stakeholder-generated weights that BBW claims to have used. I computed these true weights by using a constrained optimization model that minimizes variances from the published ranking. When applied to BBW’s ‘normalized scores’, the true weights replicate very closely the overall scores published by BBW for the 84 U.S. schools, and thus its published ranking:
A casual glance at some of the published data suggests that the true weights must be quite different from the stakeholder-generated index weights for the published ranking to be valid. For instance, on the “learning” index of the ranking, Wharton and MIT are both in the bottom quartile among 84 U.S. schools (Wharton at rank 78, MIT at 64; I note this without editorial commentary). On the other hand, UT Dallas (Jindal) has the highest “learning” score, ranking #1 on this index. If “learning” did contribute 25.8% to the overall score, Wharton and MIT would need to have dominant scores on the other indexes (which they don’t) to rise to their high overall ranks. I determined mathematically that the true “learning” weight needs to be suppressed to 7.7% for the published ranking to be consistent with BBW’s index scores.
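For readers who want to see how implied weights can be recovered from published scores, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the spreadsheet model itself: it swaps in a least-squares fit to the overall scores, solved by projected gradient descent, with weights constrained to be non-negative and to sum to 1. The data are synthetic, and `project_to_simplex`, `fit_weights`, and `hidden` are hypothetical names.

```python
import random


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) == 1}."""
    u = sorted(v, reverse=True)
    running, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        running += uk
        t = (running - 1.0) / k
        if uk - t > 0:
            theta = t  # keep the threshold from the last index that qualifies
    return [max(x - theta, 0.0) for x in v]


def fit_weights(X, y, steps=5000):
    """Minimize mean squared error of X @ w against y, with w on the simplex."""
    n, p = len(X), len(X[0])
    # Conservative step size: 1 / (upper bound on the gradient's Lipschitz constant).
    lr = n / (2.0 * sum(x * x for row in X for x in row))
    w = [1.0 / p] * p  # start from equal weights
    for _ in range(steps):
        resid = [sum(wj * xj for wj, xj in zip(w, row)) - yi
                 for row, yi in zip(X, y)]
        grad = [2.0 / n * sum(r * row[j] for r, row in zip(resid, X))
                for j in range(p)]
        w = project_to_simplex([wj - lr * gj for wj, gj in zip(w, grad)])
    return w


# Synthetic check: build overall scores from known ("hidden") weights,
# then see whether the fit recovers them. These weights are made up.
random.seed(0)
hidden = [0.40, 0.10, 0.30, 0.15, 0.05]
X = [[random.uniform(0, 100) for _ in hidden] for _ in range(30)]
y = [sum(h * x for h, x in zip(hidden, row)) for row in X]
recovered = fit_weights(X, y)
print([round(w, 3) for w in recovered])
```

Applied to real index scores and published overall scores, the fitted weights can be compared directly with the stated ones. A large gap on any index, such as the one described above for “learning,” is the signal of interest.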
BLOOMBERG BUSINESSWEEK'S RESPONSE TO THE AUTHOR
It is unclear from BBW’s description of methodology why their published rankings are irreproducible from their data on the five indexes and the weights they claim to have applied to the indexes. I posed the question to BBW’s b-school ranking team via email and received the following response:
For all indexes, schools first receive a raw score of 1 to 7 (to reflect the seven choices offered for each survey question). For “hard” data, like salaries and employment rates, figures are re-scaled 1 to 7 based on the minimum/maximum amounts in the entire cohort.
These 1-7 scores for each index are then weighted (according to our index weightings) into a total raw score between 1 and 7. This final raw score is then re-scaled 0-100. The school with the lowest total raw score gets a 0, while the one with the highest gets a 100. All others are scored proportionally in between.
So for example, if a school’s average Networking Index score was 4.5 out of 7, but that was the minimum score among all non-U.S. schools, it receives a 0 for its normalized score that we display.
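As an illustration of the re-scaling described in BBW’s response (this is not BBW’s code), the min-max step can be sketched in Python as follows. The raw values and the function name `rescale_0_100` are hypothetical.

```python
def rescale_0_100(raw):
    """Min-max re-scaling: lowest raw score -> 0, highest -> 100, linear in between."""
    lo, hi = min(raw), max(raw)
    return [100.0 * (x - lo) / (hi - lo) for x in raw]


# Hypothetical raw index averages on the 1-7 scale (not BBW data).
narrow = [4.5, 5.0, 5.75, 7.0]  # an index whose raw scores span only 4.5 to 7
wide = [1.0, 2.5, 4.0, 7.0]     # an index whose raw scores span the full 1 to 7

print(rescale_0_100(narrow))
print(rescale_0_100(wide))
```

Both lists come out spanning 0 to 100, so the displayed scores no longer reveal how wide or narrow the raw range of each index was.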
'AN ELEMENTARY STATISTICAL FACT AND ITS NEGLECT IN THE RANKING CALCULATION SEEMS IMPLAUSIBLE'
Let me state in somewhat mathematical terms why this explanation does not make sense. There is no mention in the statements above of the raw scores being standardized before the index weights were applied to them. The “proportional” re-scaling of the raw scores from the 1-7 scale to 0-100 is a simple linear transformation that would not affect the ranking computation if the index weights were applied to standardized scores (i.e., z-scores) either before or after the re-scaling. BBW’s last sentence above lends further credence to the possibility that raw scores were not standardized before index weights were applied. If some index scores ranged from 4.5 to 7, and others from 1 to 7, then the average of the index scores (weighted or otherwise) will effectively accord less weight to the index that was confined between 4.5 and 7.
This is an elementary statistical fact and its neglect in the ranking calculation seems implausible for a publication of BBW’s stature. College students, for instance, all understand, whether or not they’ve taken a course in statistics, that if a professor uses a 50-50 weight distribution between a midterm and a final exam to compute the course grade, and if the midterm scores range from 95 to 100, and final scores from 50 to 100, then simply adding the midterm and final scores implies that the course grade is predominantly determined by the final exam, and much less by the midterm.
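The midterm and final example can be checked numerically. The sketch below uses six hypothetical students and a rough heuristic for “effective weight”: each component’s nominal weight times its standard deviation, as a share of the total. That heuristic is an assumption chosen for illustration, not a standard named statistic, and the function names are illustrative.

```python
from statistics import mean, pstdev

# Hypothetical scores for six students.
midterm = [95, 96, 97, 98, 99, 100]  # confined to 95-100
final = [100, 50, 80, 60, 90, 70]    # spread over 50-100


def zscores(xs):
    """Standardize to mean 0 and (population) standard deviation 1."""
    mu, sd = mean(xs), pstdev(xs)
    return [(x - mu) / sd for x in xs]


def effective_shares(columns, weights):
    """Heuristic: share of w_i * sd_i in the sum over all components."""
    spread = [w * pstdev(col) for col, w in zip(columns, weights)]
    total = sum(spread)
    return [s / total for s in spread]


def order(xs):
    """Student indices sorted from highest to lowest score."""
    return sorted(range(len(xs)), key=lambda i: xs[i], reverse=True)


raw_total = [0.5 * m + 0.5 * f for m, f in zip(midterm, final)]
raw_shares = effective_shares([midterm, final], [0.5, 0.5])
std_shares = effective_shares([zscores(midterm), zscores(final)], [0.5, 0.5])

print("Effective shares, raw scores:  ", raw_shares)
print("Effective shares, z-scores:    ", std_shares)
print("Raw-total order matches final: ", order(raw_total) == order(final))
```

On these numbers the nominal 50-50 split behaves like roughly 9-91 in favor of the final exam, and the course ranking simply reproduces the final exam ranking. After standardization, the two components contribute equally.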
While it is implausible that BBW’s ranking team would neglect standardization before applying the index weights, it is certainly possible. The methodology described on BBW’s site is Byzantine, bearing greater resemblance to alchemy than to statistics in the various transformations and manipulations of data. So it is possible that essential computations like standardization were omitted. But here is why I am unsure that lack of standardization lies at the root of the error. If the published ‘normalized scores’ on each index are simply a 0-100 linear re-scaling of the 1-7 raw data, as BBW’s email response indicates, then the published index scores can still be standardized before applying the index weights to them. I did precisely this to produce the ranking of Table 2. But this ranking is reasonably close to the ranking produced by applying the index weights to the 0-100 ‘normalized’ scores (Table 1), and each of them is markedly different from the BBW ranking.
MEDIA RANKINGS OF B-SCHOOLS INFLUENCE TENS OF THOUSANDS OF PROSPECTIVE STUDENTS
This suggests that the distributions of ‘normalized’ scores published by BBW are reasonably similar to the distributions of the corresponding z-scores. This, in turn, casts doubt on the possibility that lack of standardization is the culprit for the skewing of effective weights.
So what is the root of the error? I don’t know and would rather not speculate. But it is indubitable and troubling that BBW’s published ranking cannot be replicated by their stated methodology. This state of affairs would be regrettable for any media publication; it is especially so for the magazine that launched B-school rankings and requires participating schools to “abide by Bloomberg’s strict code of ethics.” Media rankings of b-schools influence tens of thousands of prospective students each year, and participating schools should indeed compile the requested data with unimpeachable integrity and diligence. A parallel expectation of methodological rigor, transparency, and data integrity rests upon media organizations producing the rankings and seeking public trust.
(The following page enables access to the spreadsheet I used for the underlying computations.)
Anjani Jain is the deputy dean for academic programs at Yale University's School of Management. His research interests include the analysis and design of manufacturing systems, optimization algorithms, and probabilistic analysis of combinatorial problems. He joined the faculty of the Wharton School of the University of Pennsylvania in 1986 and served for 26 years before joining Yale SOM.