Measuring how significant a node is in a network is a problem that arises in many applications. For example, search engines must estimate the importance of web pages for a keyword and rank them to produce useful output for the user. In advertising and epidemic control, identifying the most important nodes in the network is key. The major challenge is how to measure importance from the observed structure of a network. The notion of centrality addresses this issue by offering metrics for how “central” a node is to the given network.
In determining faculty salaries, universities attempt to measure a scholar’s marginal value to the university or to the general social welfare. Common measures used are the number of publications, the number of publications in top journals, and the number of times papers are cited. But these measures may not comprehensively measure a scholar’s marginal product.
We propose to construct centrality measures in the citation network of an academic field, use the citation centralities of papers to build empirical indices of the importance of scholars, and test whether these indices are correlated with professors’ salaries. We will evaluate several centrality measures in this way. By doing so, we hope to learn what each centrality measure describes, and how well each measure correlates with salary, a proxy for a scholar’s benefit to the university.
Papers cite other papers, and each paper belongs to one or more researchers. We can construct a paper citation network and calculate the centrality of each paper. We can then calculate statistics for a scholar based on the centralities of the scholar’s papers, such as the mean centrality of the scholar’s papers and the maximum centrality. We will need some index of the scholar’s overall influence, such as the sum of the author’s paper centralities. Such a sum reflects both the number of papers and their importance.
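The construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses PageRank as the paper-level centrality (one candidate among several), the function names `pagerank` and `scholar_statistics` are made up for this sketch, and the four papers and three authors are hypothetical toy data, not real citations.

```python
from collections import defaultdict


def pagerank(citations, damping=0.85, iterations=100):
    """Power-iteration PageRank on a graph given as {paper: [papers it cites]}."""
    nodes = set(citations)
    for cited_list in citations.values():
        nodes.update(cited_list)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iterations):
        # Rank held by papers that cite nothing is spread evenly over all papers.
        dangling = sum(rank[p] for p in nodes if not citations.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in nodes}
        for citing, cited_list in citations.items():
            if cited_list:
                share = damping * rank[citing] / len(cited_list)
                for cited in cited_list:
                    new_rank[cited] += share
        rank = new_rank
    return rank


def scholar_statistics(paper_scores, authorship):
    """Aggregate paper centralities into a per-scholar sum, mean, and maximum."""
    per_scholar = defaultdict(list)
    for paper, authors in authorship.items():
        for author in authors:
            per_scholar[author].append(paper_scores[paper])
    return {
        author: {
            "sum": sum(scores),
            "mean": sum(scores) / len(scores),
            "max": max(scores),
        }
        for author, scores in per_scholar.items()
    }


# Hypothetical toy data: p2, p3 and p4 all cite p1, and p3 also cites p2.
citations = {"p1": [], "p2": ["p1"], "p3": ["p1", "p2"], "p4": ["p1"]}
authorship = {"p1": ["alice"], "p2": ["bob"], "p3": ["alice", "bob"], "p4": ["carol"]}

paper_scores = pagerank(citations)
scholar_stats = scholar_statistics(paper_scores, authorship)
```

In this sketch a co-authored paper counts in full toward each author; splitting credit among co-authors would be an equally defensible choice.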
But we do not have valuation measures for individual papers, only for scholars. So we must develop a centrality measure for each scholar instead of each paper. A simple measure would be to sum the centralities of the papers a scholar has authored. Alternatively, one could calculate a centrality directly for each scholar in a scholar-to-scholar citation network. That requires a measure of centrality that is more general than standard PageRank, because scholars are not connected by single directed links. For example, Scholar A’s papers may cite Scholar B’s papers 17 times, and that must be distinguished from the case where there is only 1 such citation. So some form of weighted PageRank may be better.
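A minimal sketch of the scholar-level alternative follows. It assumes one particular form of weighted PageRank, in which a scholar’s rank is passed along in proportion to citation counts; other weightings are possible. The function names, the decision to ignore self-citations, and the scholars A, B, and C are all assumptions made for illustration.

```python
from collections import defaultdict


def scholar_citation_weights(citations, authorship):
    """Count how many times each scholar's papers cite each other scholar's papers."""
    weights = defaultdict(lambda: defaultdict(int))
    for citing, cited_list in citations.items():
        for cited in cited_list:
            for src in authorship[citing]:
                for dst in authorship[cited]:
                    if src != dst:  # assumption: self-citations are ignored
                        weights[src][dst] += 1
    return {src: dict(targets) for src, targets in weights.items()}


def weighted_pagerank(weights, nodes, damping=0.85, iterations=100):
    """PageRank in which rank flows along links in proportion to their weight."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_total = {v: sum(weights.get(v, {}).values()) for v in nodes}
    for _ in range(iterations):
        # Scholars who cite nobody spread their rank evenly over everyone.
        dangling = sum(rank[v] for v in nodes if out_total[v] == 0)
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in weights.items():
            for dst, w in targets.items():
                new_rank[dst] += damping * rank[src] * w / out_total[src]
        rank = new_rank
    return rank


# Hypothetical example: A cites B 17 times but C only once.
scholars = ["A", "B", "C"]
weights = {"A": {"B": 17, "C": 1}, "B": {"C": 1}, "C": {"B": 1}}
weighted = weighted_pagerank(weights, scholars)

# Same links with every weight set to 1, i.e. ordinary PageRank.
unweighted_links = {s: {t: 1 for t in targets} for s, targets in weights.items()}
unweighted = weighted_pagerank(unweighted_links, scholars)
```

With unit weights B and C are indistinguishable in this toy network, while the weighted version ranks B above C because of the 17 citations from A.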
It may also be interesting to examine other centrality measures such as degree centrality or closeness centrality. Given the sensitivity of betweenness centrality to small differences in network structure, this measure does not seem appropriate for this study.
Although we are proposing to work on citation networks, our idea and approach are widely applicable. The main aim is to formulate a way to find the central or powerful nodes in a network and see whether the valuation assigned to each node indeed reflects its true importance in the graph. For example, this applies to patent citation networks, considering R&D valuations of companies. This helps us come up with refined measures of centrality for specific applications which require design or incentives based on importance.
After exploring prior centrality measures, we will have some insight into how well each fits the salary data, and why. If the correlations are poor, and we have a specific idea as to why, we will try to spend the rest of the term coming up with a better centrality measurement. If the correlations are strong, that means that a professor has enough incentives to optimize his or her centrality. Then we will study the “gameability” of the most relevant centrality measurement, and analyze how one can improve the centrality and the salary given the existing network.
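As a rough illustration of how the fit between a centrality index and salary might be checked, the sketch below computes a Spearman rank correlation. The choice of Spearman rather than some other statistic is an assumption, since we have not committed to a specific test, and the centrality scores and salaries shown are hypothetical placeholders rather than real data.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scholar-level centrality indices and salaries (in thousands).
centrality = [0.40, 0.25, 0.20, 0.15]
salary = [150, 120, 125, 90]
rho = spearman(centrality, salary)
```

A rank-based statistic is used here because centrality scores are often heavily skewed, so a few highly cited scholars could otherwise dominate a linear correlation.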