University rankings

For lack of a suitable alternative venue, I am putting this opinion piece, destined for a University of Auckland audience, here. Some interesting references related to this topic:
http://en.wikipedia.org/wiki/College_and_university_rankings
http://www.insidehighered.com/blogs/globalhighered/

The recent proliferation of attempts to rank universities worldwide according to various criteria has provoked considerable debate. Despite their deficiencies, such rankings seem unlikely to disappear, given their usefulness to outsiders (such as international students) in making basic quality judgments (although some universities have apparently boycotted some rankings). Among the many issues involved: What are they measuring? How accurate and unbiased is the data? How much is factual and how much opinion? How is the data aggregated? Can the rankings be manipulated? Given space limitations, I want to focus here on a few of these, and to exhort us all to apply our scholarly expertise to help improve the calculation and interpretation of such rankings.

The most famous rankings are probably those conducted by Times Higher Education (THE) (from 2010 with Thomson Reuters, using a different methodology; formerly with Quacquarelli Symonds), QS (continuing from 2010 with the same methodology), and Shanghai Jiao Tong. The first two aim to measure aspects beyond research alone. Other, research-only, rankings seen as reasonably credible are conducted by Taiwan's HEEACT, the University of Leiden, and, very recently, a group from the University of Western Australia. Most rank at the level of the whole university, and also (for research) at the faculty/discipline level. To give an idea of the variability, in 2010 QS ranked UoA 68th, THE 145th, HEEACT 287th, Leiden 272nd, and UWA 172nd.

The methodology of these rankings is similar. Several numerical measures are computed (either objectively or from opinion surveys), normalized in some way, and aggregated according to various weightings, in order to arrive at a single number that can be used for ranking. These final numbers are affected by the quality of the data, institutional game-playing, and the chosen weights, in addition to the actual metrics used.
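In schematic form, and assuming for illustration a simple weighted average (individual rankings may instead normalize via z-scores or other transformations before combining), the overall score is

$$S = \frac{\sum_{i=1}^{n} w_i s_i}{\sum_{i=1}^{n} w_i},$$

where $s_i$ is the normalized score on the $i$th measure and $w_i$ is the weight the ranking organization chooses to give it.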

A wide variety of objective metrics are used, many of which can be manipulated by institutions and have been, often notoriously, in recent years. They all measure (sometimes subtly) different things: for example, citation data can measure overall citation impact (favouring large institutions), impact per staff member, relative impact (field-dependent), and/or use statistics such as the $h$-index, $g$-index, etc. In 2010 the citation measures used by the various rankings placed UoA anywhere between 99th and around 300th. Measuring teaching performance by objective criteria is considered extremely difficult, and very crude proxies (such as student-staff ratios) are used instead.
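For readers unfamiliar with these statistics: if the papers concerned are ordered by non-increasing citation counts $c_1 \ge c_2 \ge \cdots$, then

$$h = \max\{k : c_k \ge k\}, \qquad g = \max\Big\{k : \textstyle\sum_{i=1}^{k} c_i \ge k^2\Big\},$$

so the $h$-index counts papers having at least $h$ citations each, while the $g$-index gives additional credit for a small number of very highly cited papers.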

Subjective measures have their own problems. THE and QS use opinion surveys to measure reputation. The alleged positive aspect of this is that a university's reputation is difficult for the institution itself (or a competitor) to change by strategic action (manipulating the ranking). The negative aspect is that an institution's real improvements may be very slow to be reflected in such surveys, and reputation may depend on factors other than real quality. THE gives 19.5% weight to research reputation and 15% to teaching reputation, while QS gives 40% to overall reputation and 10% to reputation among employers. My personal experience, as one of the nearly 14,000 academics surveyed by THE, was that I had very little confidence in the accuracy of my opinions on teaching in North American universities.

The importance of the weights is shown by the 2010 THE performance for UoA. In the five categories listed, with respective weights 30, 2.5, 30, 32.5, and 5, and respective normalized scores 34.8, 94.3, 39.2, 71.8, and 61.1, UoA achieved an overall score of 56.1. Different choices of the weights could lead to any score between 34.8 and 94.3, with the consequent ranking moving from well below the top 200 to inside the top 20.
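Indeed, if the overall score is taken to be a weighted average of the published category scores (an assumption, since THE's exact aggregation involves its own normalization), then however the non-negative weights are chosen the result must lie between the smallest and largest category scores:

$$34.8 = \min_i s_i \;\le\; \frac{\sum_i w_i s_i}{\sum_i w_i} \;\le\; \max_i s_i = 94.3,$$

with the extremes attained by placing all of the weight on the weakest or the strongest category respectively.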

It is clear that substantial interpretation of the rankings is required in order to make any use of them, and the media's focus on simplistic analysis of a single final rank is not helpful to universities or to any of their stakeholders. We should, as a university and as individual researchers:

* Ensure that we understand that different rankings measure different things and that aggregation of rankings is highly problematic, and communicate this accurately to the media and other stakeholders.
* Be honest. Some of these rankings have been singled out for comment by the Vice-Chancellor and Deputy VC (Research), while others have been publicly ignored. We should (at least internally) look at the unfavourable ones too. UoA has slowly declined in most rankings over the last few years. Field-specific information reveals that some of our faculties score much higher than others. Let's use the rankings not only for advertising, but also for reflection on our own performance, especially as regards research.
* Get involved, to ensure that the ranking methodology is as accurate as possible. Social scientists have a role to play here, in elucidating just what each measure is supposed to measure, what its axiomatic properties are, and how the measures should be aggregated. THE and QS claim to have discussed methodology with hundreds of people; I doubt that this included anyone from UoA.
* Demand transparency of methodology and timely provision of unaggregated data to the public, to enable independent analysis and reproduction of results. I have not discussed the Shanghai rankings in detail because their results are apparently not reproducible.
* Demand that ranking organizations justify the costs to the university of data provision. As with many journals, universities supply the data for free, and private companies then control it and sell it back to us. Why should we tolerate this?