Saturday, April 07, 2012

Discussion

It is interesting to learn how scholarly pay is determined, and whether citation centrality is a useful indicator of academic performance. If the academic market is a meritocracy, more productive professors, as measured for example by the number of publications, would earn higher pay. However, quality also matters. This suggests that a better measure would be number of publications adjusted for quality, such as quality of journals. Another way to measure quality of publications is by citation counts of the publications.


Existing research on academic salaries and on citation network centralities

Others have correlated academic salaries with citation counts, including:
Bernard Grofman, “Determinants of Political Science Faculty Salaries at the University of California.” Political Science, 2009, 719-727.
Luis R. Gomez-Mejia and David B. Balkin. “Determinants of Faculty Pay: An Agency Theory Perspective.” Academy of Management Journal, 1992, Vol. 35, No. 5, 921-955.
But counting citations is an imperfect measure of a scholar’s marginal product; it is more informative of quality to be cited by an important author than by a minor author. We therefore want to try to construct a different measure of importance of the authors based on centrality measures in the network of research in the academic subject area. If we are successful, our measure should have incremental explanatory power for scholarly pay, and perhaps even dominate determinants identified in past studies.

The paper Michael Hadani, Susan Coombes, Diya Das and David Jalajas, “Finding a good job: Academic network centrality and early occupational outcomes in management academia,” Journal of Organizational Behavior, 2011,
examines academic network centrality (where linkage is by department) in relation to occupational outcomes. However, it does not examine citation networks or the centrality of citations, and it does not look at the effect on pay. We intend to look at these aspects.


Centrality measures

The concept of centrality was introduced in 1948 by Bavelas in the context of human communication. Since then, various centrality measures have been proposed and studied in different contexts.
Linton C. Freeman, “Centrality in Social Networks: Conceptual Clarification,” Social Networks, 1 (1978/79) 215-239, gives a graph-theoretic approach to defining and measuring centrality in a network.

The following are a few measures of centrality in a network:
  1. PageRank/Katz centrality (there are several variations)
  2. Degree centrality (Citation count in citation network)
  3. Constant function (every paper gets the same score, so in a citation network an author's total is simply the number of papers)
  4. Closeness centrality (reciprocal of the average distance with other nodes)
  5. Betweenness (percentage of shortest paths that pass through the given node)
  6. Eigenvector centrality (the general family of which PageRank is a variant)
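
To make a few of these measures concrete, here is a minimal sketch in plain Python. The toy network `CITES`, the function names, and the damping factor of 0.85 are illustrative assumptions, not part of our data or method. Closeness is computed here ignoring the direction of citations, which is one possible choice among several.

```python
from collections import deque

# Hypothetical toy citation network: paper -> list of papers it cites.
CITES = {
    "A": [],
    "B": ["A"],
    "C": ["A", "B"],
    "D": ["A", "C"],
    "E": ["D"],
}

def citation_counts(cites):
    """Degree centrality: number of papers citing each paper (in-degree)."""
    counts = {p: 0 for p in cites}
    for refs in cites.values():
        for r in refs:
            counts[r] += 1
    return counts

def pagerank(cites, damping=0.85, iters=100):
    """Power-iteration PageRank; rank flows from the citing to the cited paper."""
    nodes = list(cites)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iters):
        # Papers with no references spread their rank uniformly over all papers.
        dangling = sum(rank[p] for p in nodes if not cites[p])
        new = {p: (1 - damping) / n + damping * dangling / n for p in nodes}
        for p in nodes:
            for r in cites[p]:
                new[r] += damping * rank[p] / len(cites[p])
        rank = new
    return rank

def closeness(cites):
    """Reciprocal of the average shortest-path distance to reachable papers,
    treating citations as undirected links (an assumption of this sketch)."""
    nbrs = {p: set() for p in cites}
    for p, refs in cites.items():
        for r in refs:
            nbrs[p].add(r)
            nbrs[r].add(p)
    result = {}
    for src in cites:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            u = queue.popleft()
            for v in nbrs[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[src] = (len(dist) - 1) / total if total else 0.0
    return result
```

On a real citation network the measures can rank papers quite differently, which is why the choice among them matters for our purpose.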

We need to identify which centrality measure best suits our purpose. In fact, any individual centrality measure might fail to capture all the desired characteristics. So we believe that we need to define a weighted centrality measure that combines the different aspects contributing to a scholar's relative importance. Some of the points we need to consider:
  1. Authors often publish papers that build on their previous work, and hence cite their own old papers. We need to decide how to value these self-citations against citations by other authors.
  2. People also use citations as a form of social exchange. For example, as a favor people may preferentially cite those that they know personally.
  3. Being cited by an important author should count for more, analogous to the contribution to PageRank when a highly ranked node links to a node.
(See also the discussion on bias below.)
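
As a rough illustration of points 1 and 3, the sketch below computes a weighted citation score per author. Everything in it is a hypothetical choice made for illustration: the sample records in `CITATIONS`, the `self_weight` discount of 0.25, and the optional `importance` weights for citing authors. Point 2 (social exchange) is not modeled here.

```python
from collections import defaultdict

# Hypothetical citation records: (authors of citing paper, authors of cited paper).
CITATIONS = [
    ({"ann"}, {"ann"}),            # self-citation
    ({"bob"}, {"ann"}),
    ({"cat"}, {"ann", "bob"}),
    ({"ann", "bob"}, {"cat"}),
]

def weighted_citation_scores(citations, self_weight=0.25, importance=None):
    """Sum citation weights per cited author.

    self_weight: assumed discount when the citing and cited papers share an
        author (0.0 ignores self-citations, 1.0 counts them fully).
    importance: optional dict of citing-author weights (default 1.0 each);
        a citation is weighted by the mean importance of its citing authors.
    """
    importance = importance or {}
    scores = defaultdict(float)
    for citing, cited in citations:
        if citing & cited:
            weight = self_weight
        else:
            weight = sum(importance.get(a, 1.0) for a in citing) / len(citing)
        for author in cited:
            scores[author] += weight
    return dict(scores)

plain = weighted_citation_scores(CITATIONS, self_weight=1.0)
discounted = weighted_citation_scores(CITATIONS)
```

Comparing `plain` with `discounted` shows how much of an author's score rests on self-citations; how large the discount should be is exactly the open question raised above.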


Robustness

An important challenge in studying centrality is robustness. In most cases we have imperfect data, with a few nodes or edges missing, or spurious nodes or edges present. A study on the robustness of centrality measures has been done in
Stephen P. Borgatti, Kathleen M. Carley, David Krackhardt, “On the robustness of centrality measures under conditions of imperfect data”, Social Networks, Volume 28, Issue 2, May 2006, Pages 124–136
where they (quote) “show that the accuracy of centrality measures declines smoothly and predictably with the amount of error. This suggests that, for random networks and random error, we shall be able to construct confidence intervals around centrality scores. In addition, centrality measures were highly similar in their response to error. Dense networks were the most robust in the face of all kinds of error except edge deletion. For edge deletion, sparse networks were more accurately measured”. The authors considered degree, betweenness, closeness and eigenvector centrality, and compared them using top 1%, top 3%, top 10%, overlap, and R² measures of accuracy. In our proposed project we will be collecting data on a citation network, and we will only be able to build a partial citation network from it. An understanding of robustness is therefore necessary.
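
A small experiment of this kind can be sketched as follows. This is not the procedure from the paper quoted above; it is a simplified illustration that deletes edges at random and checks how much of the true top-k set (by citation count) survives. The random graph generator, the function names, and all parameter values are assumptions.

```python
import random

def random_citation_graph(n_nodes, n_edges, seed=0):
    """Hypothetical random directed graph: edges are (citing, cited) pairs."""
    rng = random.Random(seed)
    nodes = list(range(n_nodes))
    edges = set()
    while len(edges) < n_edges:
        a, b = rng.sample(nodes, 2)  # two distinct nodes, so no self-loops
        edges.add((a, b))
    return nodes, sorted(edges)

def in_degree(edges, nodes):
    """Citation count per node."""
    deg = {n: 0 for n in nodes}
    for _, cited in edges:
        deg[cited] += 1
    return deg

def top_k(scores, k):
    """Top-k nodes by score; ties are broken by node id for determinism."""
    return set(sorted(scores, key=lambda n: (-scores[n], n))[:k])

def top_k_overlap_after_deletion(edges, nodes, drop_fraction, k,
                                 trials=200, seed=0):
    """Mean share of the true top-k nodes that remain in the top k
    after a random fraction of edges is deleted."""
    rng = random.Random(seed)
    true_top = top_k(in_degree(edges, nodes), k)
    keep = len(edges) - int(round(drop_fraction * len(edges)))
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(edges, keep)  # the edges we still observe
        observed_top = top_k(in_degree(sample, nodes), k)
        total += len(true_top & observed_top) / k
    return total / trials

nodes, edges = random_citation_graph(50, 300)
for fraction in (0.0, 0.1, 0.3, 0.5):
    print(fraction, top_k_overlap_after_deletion(edges, nodes, fraction, k=5))
```

The same loop could be repeated with other centrality measures, or with node deletion and spurious edges, to see which measure degrades most gracefully on a partial citation network.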


Sources of bias

Another major challenge will be noise in the data on salaries. Academics are paid not only to convert coffee into papers, but also to teach students and perform administrative work for the university. Furthermore, one expects output in these dimensions to be correlated, plausibly negatively (for example, time spent writing papers is time not spent teaching), which could bias our results if not properly controlled for. A difficulty in controlling for this is that measures of teaching load and administrative workload are unlikely to be public. We still need to address this issue.

We will only be looking at a small subset of scholars and academic papers in a few fields. Ideally, we want a very well-delimited field, distinct from all others, so that the field boundaries for the scholars and for the papers chosen align perfectly. People involved in interdisciplinary research may show up as peripheral when looking at only a single field, whereas, because of their importance in connecting disparate fields, they could plausibly have especially high centrality in the unobserved overall graph of publications. Their measured centrality would understate their true importance, and thus presumably their salary, so this error biases the estimated correlation toward zero.


Gameability

If citation centrality affects pay, academics have an incentive to alter their citation patterns to maximize their score. That would reduce the usefulness of the centrality measure, so we want to find a measure that is hard to game in this way. To do this, we must first develop a precise notion of gameability. One possibility is to analyze the maximum and expected marginal effects of getting one additional citation (perhaps as a favor or in exchange for a citation for the other author). More sophisticated models could try to capture social links (since it’s easier to get a citation from someone you know); perhaps school graduated from, graduation year, and current university department could be used as proxies for whether authors are socially linked.
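
One way to make the "marginal effect of one additional citation" concrete is sketched below, using PageRank on an author-level citation graph. This is only a possible starting point: the toy graph `AUTHOR_CITES`, the function names, and the use of a uniform average as the "expected" effect are all assumptions.

```python
# Hypothetical author-level citation graph: author -> authors they cite.
AUTHOR_CITES = {
    "a": ["b"],
    "b": ["c"],
    "c": ["a"],
    "d": ["a"],
}

def pagerank(cites, damping=0.85, iters=100):
    """Power-iteration PageRank; rank flows from citing to cited author."""
    nodes = list(cites)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iters):
        # Authors who cite nobody spread their rank uniformly.
        dangling = sum(rank[p] for p in nodes if not cites[p])
        new = {p: (1 - damping) / n + damping * dangling / n for p in nodes}
        for p in nodes:
            for r in cites[p]:
                new[r] += damping * rank[p] / len(cites[p])
        rank = new
    return rank

def marginal_gain(cites, target, citer):
    """Change in target's PageRank if citer adds one citation to target."""
    if citer == target or target in cites[citer]:
        return 0.0  # no new citation is possible in this simple graph model
    before = pagerank(cites)[target]
    gamed = {p: list(refs) for p, refs in cites.items()}
    gamed[citer].append(target)
    return pagerank(gamed)[target] - before

def gameability(cites, target):
    """Maximum and mean marginal gain over all authors who could newly cite
    target (the mean assumes each such author is equally likely to do so)."""
    gains = [marginal_gain(cites, target, c)
             for c in cites if c != target and target not in cites[c]]
    if not gains:
        return 0.0, 0.0
    return max(gains), sum(gains) / len(gains)

max_gain, mean_gain = gameability(AUTHOR_CITES, "d")
```

A measure with small maximum and mean gains relative to typical score differences would be harder to game; weighting potential citers by social proximity, as suggested above, would replace the uniform mean with a more realistic expectation.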
