Monday, May 21, 2012

Updates for May 21, 2012

This weekend, we ran the full regression of delta salary versus the number of citations. We observed that most professors received base salary raises every two years, so we calculated the salary difference over two-year spans and the corresponding citation changes. We obtained 150 datapoints, and the result is as follows:

deltaSalary = beta * deltaCitations


                 Estimate   p-value
delta Citation   33.4899    6.14027 x 10^-12

The p-value is extremely low, so we reject the null hypothesis that salary changes are unrelated to citation changes.

Also, while running regressions on the biyearly data, we realized that we should have paired the previous year's centrality with the current year's salary. We re-ran the regressions using centralities for 2009 against salaries for 2010; here are the results:


                 Estimate   p-value
Constant         57078.5    1.62923 x 10^-9
Years Since PhD  2431.5     2.01299 x 10^-10
Citations        5.062      0.00961352


                 Estimate        p-value
Constant         60246.7         2.01867 x 10^-10
Years Since PhD  2338.52         9.00664 x 10^-10
PageRank         4.4146 x 10^6   0.0155467


The p-values and the estimates were similar to those of the previous regressions.


During the last meeting, we agreed that we should try to capture the values of citations coming from differently ranked professors. This weekend, we debated whether PageRank would be a good measure to capture that. Even though the regression result we obtained for PageRank could be interpreted as statistically significant, we still haven't established the significance of delta PageRank.

Michael also wrote code for calculating the g- and h-indices. We will be analyzing the output this week.
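As a reference point for analyzing that output, here is a minimal sketch of the standard h- and g-index definitions (our own reimplementation for this post, not Michael's actual code), given a professor's per-paper citation counts:

```python
# Minimal sketch of the standard h- and g-index definitions.
# Input: a list of citation counts, one entry per paper.

def h_index(cites):
    """Largest h such that at least h papers have >= h citations each."""
    cites = sorted(cites, reverse=True)
    return sum(1 for i, c in enumerate(cites) if c >= i + 1)

def g_index(cites):
    """Largest g such that the top g papers have >= g^2 citations in total."""
    cites = sorted(cites, reverse=True)
    total, g = 0, 0
    for i, c in enumerate(cites):
        total += c
        if total >= (i + 1) ** 2:
            g = i + 1
    return g
```

For example, citation counts [10, 8, 5, 4, 3] give h = 4 and g = 5; the g-index rewards a few highly cited papers more than the h-index does.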

Why PageRank is not a good measure to capture value of a paper

Let us consider a toy example:

P = [ 0 0 0 0; 1 0 0 0; 1 0 0 0; 1 1 1 0]

From traditional PageRank (MATLAB-style notation, with damping factor a):

G = a * (P + I) ./ deg_of_each_node + (1 - a) * (1/n) * ones

Computing the PageRank of our toy example with this, we get
    0.9873
    0.1039
    0.1039
    0.0598

that is  r(1) = 1, r(2) = 2, r(3) = 2, r(4) = 3.

Now, suppose paper 3 had cited paper 2,

P = [ 0 0 0 0; 1 0 0 0; 1 1 0 0; 1 1 1 0]

Now the page rank gives,

    0.9835
    0.1475
    0.0848
    0.0608

that is  r(1) = 1, r(2) = 2, r(3) = 3, r(4) = 4.

What did we want to capture? (assuming each citation means the same, i.e., an unweighted graph)
The difference between network 2 and network 1 is that we should now have
val(1) > val(2) > val(3) > val(4)
at the current time.

Problems with absolute value of page ranks :

If we look at the values that PageRank gives, the value of paper 4 increased from 0.0598 to 0.0608. This does not make sense: whether a past paper cited another paper should not change the value of a new paper.

If we look at ranks instead, paper 3 lost its position from rank 2 to rank 3. But a paper's value should depend on its in-degree, not its out-degree.

So, page rank does not capture the essence of the difference between network 1 and 2. 
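To make the comparison reproducible, here is a minimal sketch that runs standard damped PageRank (power iteration, with dangling papers spreading their mass uniformly) on the two toy networks. This is the conventional normalized formulation, not the (P+I) variant above, so the absolute values differ from the ones quoted; only the rank orderings are meant for comparison.

```python
# Standard damped PageRank via power iteration on the two toy networks.
# cites[i] lists the papers that paper i cites (0-indexed); rank mass
# flows from a citing paper to the papers it cites.

def pagerank(cites, n, a=0.85, iters=200):
    r = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - a) / n] * n
        for i in range(n):
            if cites[i]:
                share = a * r[i] / len(cites[i])
                for j in cites[i]:
                    new[j] += share
            else:
                # dangling node: distribute its mass uniformly
                for j in range(n):
                    new[j] += a * r[i] / n
        r = new
    return r

# Network 1: papers 2 and 3 cite paper 1; paper 4 cites papers 1, 2, 3
net1 = [[], [0], [0], [0, 1, 2]]
# Network 2: same, except paper 3 now also cites paper 2
net2 = [[], [0], [0, 1], [0, 1, 2]]

r1 = pagerank(net1, 4)
r2 = pagerank(net2, 4)
```

By symmetry, papers 2 and 3 tie in network 1, and the extra citation in network 2 breaks the tie, reproducing the rank orderings discussed above.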

Wednesday, May 16, 2012

Linear regression with delta salary and delta citation count for the year 2010

Another goal of this week was to perform regression on the change in salary and centrality. To do so, we first calculated cumulative centrality for every year between 2004 and 2010. Then we calculated the change in citation count and PageRank.

We had access to two different types of salary data: gross pay, and base pay. We initially ran a regression on the change in gross pay and citation count, but the variance in gross pay was too high to draw any correlation.



When we graphed change in base pay versus change in citation count, it fared much better.



We had in fact not expected a negative change in base pay!

We ran linear regression on delta base pay using two variables, years since PhD and delta citation count, for year 2010 (that is change from 2009 to 2010). The result is as follows:

R^2 : 0.0709406

                 Estimate   p-value
Constant         -406.115   0.869237
delta Citation   45.6588    0.0348579

The p-value for the constant term suggests that it is not statistically significant. We also note that the estimate for the constant term is negative. A negative intercept would mean that below a certain threshold of citation change (where the line crosses y = 0), a professor's base pay decreases, i.e., a pay cut for non-performance. This would be quite interesting if it were statistically significant, but the very high p-value means we cannot reject the null hypothesis that the constant is zero.
Hence, we performed a single-variable regression without a constant term, with the following result:


R^2 : 0.0895593

                 Estimate   p-value
delta Citation   43.7869    0.0162938
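For reference, with no constant term the single-variable least-squares slope has the closed form b = Σxy / Σx². A minimal sketch (the (x, y) pairs below are made-up illustrative numbers, not our dataset):

```python
# No-intercept least squares: minimizing sum((y - b*x)^2) over b
# gives the closed form b = sum(x*y) / sum(x*x).

def no_intercept_slope(x, y):
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

delta_citations = [2.0, 5.0, 9.0, 1.0, 7.0]          # hypothetical
delta_base_pay = [90.0, 230.0, 380.0, 60.0, 310.0]   # hypothetical
b = no_intercept_slope(delta_citations, delta_base_pay)
```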


For PageRank, since the number of papers is steadily increasing, it was natural for the change in PageRank to be negative. We'd like to try multiplying PageRank by the number of papers taken into consideration.

We are performing regressions on delta citation count for the other years (2004 to 2009), after which we will compare and analyze the results. We are also calculating h- and g-indices to see how they would perform in predicting salary under a simple linear assumption.


Tuesday, May 15, 2012

Linear Regression on new data

Previously we ran a two-variable linear regression on data from SNAP, and found problems caused by considering only papers whose primary tag is HepTh. So we crawled arxiv, obtained new data, and performed linear regression on it. The following are the results:

Using Gross Pay (limiting years since PhD to <= 40):

R^2 : 0.34126

                 Estimate        p-value
Constant         84184.6         2.56162*10^-8
Years Since PhD  2094.61         0.000113447
Citation Count   7.47883         0.0138856

R^2 : 0.323274

                 Estimate        p-value
Constant         89390.7         3.70218*10^-9
Years Since PhD  1955.36         0.000356706
Page Rank        6.57888*10^6    0.0284647

Using Base Pay (limiting years since PhD to <= 40):

R^2 : 0.605081

                 Estimate        p-value
Constant         57158.9         1.61337*10^-9
Years Since PhD  2427.95         2.17864*10^-10
Citation Count   4.74429         0.0103403

R^2 : 0.598601

                 Estimate        p-value
Constant         60317.7         1.94649*10^-10
Years Since PhD  2334.74         9.65709*10^-10
Page Rank        4.39694*10^6    0.0158609

Yearly Centralities

We have finished calculating centralities for the citation network for each year from 2004 to 2010. We have also obtained UC salary data from ucpay.globl.org for the years 2004 through 2010. We identified UC professors in the citation data and listed the papers for each professor. Now we are converting paper centralities into professor ranks. With the yearly centralities we are also computing delta centrality, which will be used to construct ranks based on the change in centrality of papers.

Tuesday, May 08, 2012

Fresh Data :)

We have obtained data on papers in High Energy Physics Theory (HepTh) on arxiv from 1991 to May 2nd, 2012. We crawled all papers and abstracts tagged with HepTh, whether as a primary or secondary tag, to address issue 2 raised after looking at outliers in the old SNAP data (1992 to 2003, papers with HepTh as the primary tag only).

The current data has 79,188 nodes and 1,163,903 edges! We ran into memory issues :-P After fixing them, we calculated the in-degree, out-degree and PageRank of each node using Gephi. We are currently computing closeness and betweenness centrality on the data; it is taking a LOT of time, as expected.

Meanwhile, we are also simultaneously working on joining the fresh citation data with the salary data. We are downloading salary data for year 2010 for UC campuses. Since the latest salary data is 2010, we need to truncate our citation data accordingly.

Linear Regression


Our aim is to see whether the current payment system in public schools has any correlation with the research value of professors. We try to model salary linearly on the following parameters:

  1. Years since PhD : x_1
  2. Research value : x_2
  3. Gender : x_3
  4. Area of residence : x_4

Salary = c_0 + c_1 * x_1 + c_2 * x_2 + c_3 * x_3 + c_4 * x_4

We used the citation data for High Energy Physics Theory [HepTh] (from arxiv, 1992 to 2003), and Salary data for year 2003 for Universities of California. There are 27,770 nodes and 352,807 edges in our citations graph.

There are 10 UC campuses :
  1. Berkeley
  2. Davis
  3. Irvine
  4. Los Angeles
  5. Merced
  6. Riverside
  7. San Diego
  8. San Francisco
  9. Santa Barbara
  10. Santa Cruz

We made a list of professors appearing both in the abstracts of papers in the citation network and in the salary database of UC campuses. This gave us a list of UC professors who have published in HepTh; we found 52 matches. By manually checking, we found that some of these professors are from other areas such as Medicine, Political Science, Finance, and Mathematics, and have published only a few papers in HepTh. Since the bulk of their work lies outside HepTh, we cannot compare them with physics professors working primarily on HepTh. This leaves us with 30 professors.

We collected the data on years since PhD and gender for each professor manually. We found only one female professor and hence decided to ignore gender as a variable. Also, since we are confining ourselves to UC campuses, x_4 is the same for all professors, assuming the cost of living is approximately the same all over California. Our previous exercise of comparing years since PhD with salary suggests that there is some correlation.

So, our model reduces to a two-variable linear regression model :

y = b0 + b1 * x_1 + b2 * x_2

To begin with, we consider a simple definition of research value: total citation count.
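For reference, the two-variable fit can be reproduced outside Mathematica by solving the normal equations directly. A minimal pure-Python sketch; the data at the bottom is synthetic (generated from known coefficients for a sanity check), not our 30-professor dataset:

```python
# Two-variable OLS for y = b0 + b1*x1 + b2*x2, by solving the
# normal equations (X^T X) b = X^T y with Gaussian elimination.

def fit_ols(rows, y):
    """rows: list of [x1, x2] predictor pairs; returns [b0, b1, b2]."""
    X = [[1.0] + list(r) for r in rows]
    n, k = len(X), len(X[0])
    # Normal equations: A = X^T X, c = X^T y
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
         for p in range(k)]
    c = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    # Forward elimination with partial pivoting
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[piv] = A[piv], A[p]
        c[p], c[piv] = c[piv], c[p]
        for r in range(p + 1, k):
            f = A[r][p] / A[p][p]
            for q in range(p, k):
                A[r][q] -= f * A[p][q]
            c[r] -= f * c[p]
    # Back substitution
    b = [0.0] * k
    for p in reversed(range(k)):
        b[p] = (c[p] - sum(A[p][q] * b[q] for q in range(p + 1, k))) / A[p][p]
    return b

# Sanity check: recover known coefficients from exact synthetic data
data = [[1, 2], [3, 5], [4, 1], [6, 7], [2, 9], [5, 3]]
target = [100 + 2 * x1 + 0.5 * x2 for x1, x2 in data]
coef = fit_ols(data, target)
```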

We test our model on the 30 professors. The following is the result of the fit :
y = 5.32492 * citationCount + 269.615 * yrsSincePhD + 98470.3

r-squared : 0.02974

                 Estimate   Standard Error   t-Statistic   P-Value
Constant         98470.3    14595.5          6.74663       3.03632*10^-7
yrsSincePhD      269.615    627.158          0.4299        0.67068
citationCount    5.32492    7.42821          0.716851      0.479622

Looking at the p-values, neither variable appears statistically significant.

But we have already seen that years since PhD should contribute to the salary. So we went back and looked at the data. The following is a plot of gross pay versus citation count:




We looked at the outliers, which suggest the following two problems:

  1. There are retired/emeritus professors who are drawing just a pension.
  2. There are professors who have published in HepTh whose papers are primarily tagged in other areas of physics. Our current database includes only papers whose first tag is HepTh, which significantly lowers the citation count for some of the professors with high salaries.

To overcome problem 1, we decided to look only at professors whose years since PhD is <= 30. The result is as follows:

y = 73625.4 + 12.0094 * citationCount + 1880.43 * yrsSincePhD

r-squared = 0.313416

                 Estimate   Standard Error   t-Statistic   P-Value
Constant         73625.4    13891.3          5.3001        0.0000255525
yrsSincePhD      1880.43    791.199          2.37668       0.0265942
citationCount    12.0094    6.6497           1.80601       0.0846189


The p-value for years since PhD suggests that it is statistically significant (not surprising given the earlier observation and that we have eliminated outliers). The p-value for citation count is still not satisfactory (the standard threshold for statistical significance is ~0.05), but it is encouraging; with a larger dataset, we might see a better result, so we are collecting one. Also, citation count might be a very crude measure of research value, which calls for better ways of ranking papers and hence professors. We have calculated PageRank, closeness and betweenness centrality for the current network, and regressions using these measures are currently being compared. We shall also look at the h-index and g-index of the professors and compare them with measures based on traditional notions of centrality on the citation graph.

Problem 2 cannot be completely solved, since we cannot get data on all publications by all professors, i.e., a complete citation graph. But we are crawling arxiv to get better citation data by also considering papers with HepTh as a secondary tag.


--
P.S : We use Gephi to work on citation network and Mathematica for regression analysis.

Monday, April 30, 2012

Crawling Arxiv

We just finished crawling the abstracts and the citation network for 10 years of theoretical high energy physics papers. Arxiv lists up to 2000 articles for a given year, and we can retrieve 10000 abstracts per API request, so fetching metadata was quite simple.

The bottleneck was retrieving the citation data for a given paper. This is supported by http://inspirehep.net/, a high energy physics (HEP) literature database. It seems to perform multiple database lookups to match corresponding paper ids, taking anywhere from 1 to 10 seconds per request. We initially ran twenty threads, but using ten threads actually improved performance.

About 30% of the papers were cross-listed from other fields besides theoretical high energy physics, and more than half of the papers are tagged with multiple fields. "General Relativity and Quantum Cosmology"(GR-QC) was the most common overlap. It might be in our interest to crawl GR-QC and retrieve citation data, except then it would generate a bias for HEP-theory authors who publish papers related to the specific field.

Work Update as of 2012-04-30

[ramya] Retrieved secondary variables for regression. Calculating centrality for the new data.
[kijun] Finished arxiv crawler. Crawled high energy physics citation network from 2003-2012. 
[michael] Retrieved secondary variables for regression. Working on a unified salary table for public schools in multiple states.

Wednesday, April 25, 2012

Plot of Salary vs. Year of PhD




This is a plot of base pay versus years since Ph.D. for professors in the area of High Energy Physics at UC campuses (as of year 2003).

Monday, April 23, 2012

Work Update as of 2012-04-23

[ramya] Computed closeness, betweenness centrality and page rank for the high energy physics papers.
Working on centrality on weighted graph.
[kijun] For each paper in the database, obtained which journal/conference it was published in.
Still working on arxiv crawler.
[michael] Working on salary data.

A snap-shot of Citation Graph

A snapshot of the citation graph of papers in High Energy Physics on arXiv (from 1993 to 2003).

Average In-Degree : 10.089
Average Out-Degree : 15.723


Monday, April 16, 2012

Milestone 2 : Regression Variables and More Data

This week's goals are to calculate centralities for high energy physics (HEP) citation network, start crawling arXiv for other domains of physics, and to obtain data for regression variables.

Here's the to-do list for our second milestone:
  • calculate the PageRank for each paper
  • calculate closeness and betweenness centrality
  • obtain years since Ph.D. for professors in UCs who study high energy physics (HEP)
  • obtain relative cost of living for each college area
  • start crawling arXiv's HEP citation network from 2003 to 2012
  • start crawling arXiv's astrophysics and condensed matter citation network
  • obtain salary data from outside UCs, including University of Texas system.

Work Update as of 2012-04-16

[Michael] Collecting salary data from other public universities and calculating degree centrality for high energy physics citation data.
[Ramya] Calculated in-degree for each paper, and working on PageRank and degree centrality.
[Kijun] Mapped high energy physics authors to corresponding professors from UCs. Working on an arXiv crawler for collecting citation network for different areas of physics.

We also prepared for our presentation, which was held on Thursday, April 12th!

Saturday, April 14, 2012

Saturday, April 07, 2012

Milestone 1 : Simple Regression

Our first goal is to get the available data and run a simple single variable regression on it. For this we will be using citation network data for high energy physics from http://snap.stanford.edu/data/cit-HepPh.html. This dataset gives us citation network from January 1993 until April 2003. For salary data we are crawling ucpay.globl.org. This gives us base salary for the year 2004. Currently we restrict ourselves to professors from UC campuses.

To Do for first milestone :
  • Make a table of 2004 salary data from ucpay.globl.org (python script) DONE
  • Make a table of HEP-TH professors and their papers (by arxiv id)
  • Parse the abstracts from http://www.cs.cornell.edu/projects/kddcup/datasets.html
  • Alter ucpay salary table to align with HEP-TH professors (rearrange name, remove middle name, etc.)
  • Calculate citation measurements from SNAP data (mathematica, R)
  • Use HEP-TH table to calculate rank for each professor
  • Join the resulting tables
  • Linear regression (mathematica, R)

Work Update as of 2012-04-07

[Michael] The ucpay.globl.org actually lets you download the raw data as CSV. Parsed these (for years [2004, 2010)) and added to the sqlite DB.
[Ramya] Converted the citation network data to format usable by Mathematica. Exploring various tools available on Mathematica for analysis of networks.
[Kijun] Getting the list of authors for each article on the citation network

Also, we have set up a github repository where we commit our data, code and results, to facilitate easy coordination between team members.

Proposed Timeline

first half of the term
We will first focus on the empirical side. Since gathering data may take a long time, after we obtain sufficient data to start we will concurrently analyze the data and refine/obtain more data.

Gathering citation/authorship and salary data (5 weeks)
We need to gather data for citation network and salary of the professors. For citation networks, we will start with the high-energy physics data from http://snap.stanford.edu (a relatively small network). (SNAP also has software that may be useful for processing and computing using the datasets.) If needed, we will also use the DBLP citation network, available at http://arnetminer.org/citation, which consists of more than one million nodes and two million edges.

For professors’ salaries, the Collegiate Times offers a centralized database of such data. Unfortunately, they only display a non-pageable list of the first 250 results for each school. We will write to them and ask for the dataset for academic purposes. Otherwise, we will write a scraper to retrieve data from individual sites that host state-wide salary information, such as http://ucpay.globl.org.

One complication is that many women publish under a name that differs from their legal name, which is presumably used in the salary data.

Regression tools (1 week)
To familiarize ourselves with regression techniques, we will run regressions using the salary data and a trivial centrality measure (such as the number of papers for each professor). This involves choosing a software package, and writing scripts to process the data obtained into a form suitable for the software we decide to use.

Apply Centrality Measures and Regression (2 weeks)
We will calculate the PageRank, degree, closeness, and betweenness centralities of the citation network in our dataset. After calculating the centrality for each node, we will try different regression models to determine which model has the best fit.

Evaluate Result (1 week)
We will analyze the results obtained from the previous step. We will explain why some measures performed better than others, and determine and describe factors accounting for error or bias.

second half of the term
Focused on theory. Of course, since the difficulty of obtaining data is hard to predict, this could be advanced or delayed from the midpoint of the course.

Design a Better Centrality Measure (2 weeks)
If we believe that centrality measure can be improved dramatically, perhaps through weighing different measures or using an entirely new concept, we will spend the next two weeks implementing a centrality measure better suited for measuring importance in a citation network. We will run regressions using the new centrality measure and see whether it correlates more with salary.

Analysis of gameability of centrality measures (2 weeks)
If the new centrality measure is deemed “good enough”, it indicates that there exists an important correlation between a professor’s centrality and salary. We will explore the ways in which a professor is able to “game” the centrality measurement to improve his or her salary.

Discussion

It is interesting to learn how scholarly pay is determined, and whether citation centrality is a useful indicator of academic performance. If the academic market is a meritocracy, more productive professors, as measured for example by the number of publications, would earn higher pay. However, quality also matters. This suggests that a better measure would be number of publications adjusted for quality, such as quality of journals. Another way to measure quality of publications is by citation counts of the publications.


Existing research on academic salaries and on citation network centralities

Others have correlated academic salaries with citation counts, including:
Bernard Grofman, “Determinants of Political Science Faculty Salaries at the University of California.” Political Science, 2009, 719-727.
Luis R. Gomez-Mejia and David B. Balkin. “Determinants of Faculty Pay: An Agency Theory Perspective.” Academy of Management Journal, 1992, Vol. 35, No. 5, 921-955.
But counting citations is an imperfect measure of a scholar’s marginal product; it is more informative of quality to be cited by an important author than by a minor author. We therefore want to try to construct a different measure of importance of the authors based on centrality measures in the network of research in the academic subject area. If we are successful, our measure should have incremental explanatory power for scholarly pay, and perhaps even dominate determinants identified in past studies.

The paper Michael Hadani, Susan Coombes, Diya Das and David Jalajas “Finding a good job: Academic network centrality and early occupational outcomes in management Academia,” Journal of Organizational Behavior, 2011.
examines academic network centrality (where linkage is by department) in relation to occupational outcomes. However, it does not examine citation networks or the centrality of citations, and it does not look at the effect on pay. We intend to look at these aspects.


Centrality measures

The concept of centrality was introduced in 1948 by Bavelas in the context of human communication. Since then, various centrality measures have been proposed and studied in different contexts.
Linton C. Freeman, “Centrality in Social Networks: Conceptual Clarification”, Social Networks, 1 (1978/79) 215-239, gives a graph-theoretic approach to defining and measuring centrality in a network.

The following are a few measures of centrality in a network:
  1. PageRank/Katz centrality (there are several variations)
  2. Degree centrality (Citation count in citation network)
  3. Constant function (i.e. number of papers in case of citation network)
  4. Closeness centrality (reciprocal of the average distance with other nodes)
  5. Betweenness (percentage of shortest paths that pass through the given node)
  6. Eigenvector centrality (superset of PageRank)
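As a concrete reference for measures 2 and 4 above, here is a minimal sketch (function names and the toy graph are ours) computing in-degree and closeness centrality on a small directed citation graph:

```python
from collections import deque

# adj is an adjacency list; an edge u -> v means paper u cites paper v.

def in_degree(adj):
    """Citation count: number of incoming edges per node."""
    deg = {u: 0 for u in adj}
    for u in adj:
        for v in adj[u]:
            deg[v] += 1
    return deg

def closeness(adj, u):
    """Reciprocal of the average shortest-path distance from u to the
    nodes it can reach via BFS (0 if it reaches none)."""
    dist = {u: 0}
    q = deque([u])
    while q:
        x = q.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                q.append(y)
    reached = len(dist) - 1
    return reached / sum(dist.values()) if reached else 0.0

toy = {1: [2, 3], 2: [3], 3: []}
```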

We need to identify which centrality measure best suits our purpose. In fact, any individual centrality measure might fail to capture all the desired characteristics. So, we believe that a weighted centrality measure encompassing different aspects contributing to the relative importance needs to be defined. Some of the points we need to consider:
  1. Authors often publish papers that build on their previous work, and hence cite their own older papers. We need to see how to weigh these self-citations against citations by other authors.
  2. People also use citations as a form of social exchange. For example, as a favor people may preferentially cite those that they know personally.
  3. Getting cited by an important author should count for more, analogous to the larger contribution to PageRank when a highly ranked node links to a node.
(See also the discussion on bias below.)


Robustness

An important challenge in studying centrality is robustness. In most cases we have imperfect data: a few nodes or edges missing, or spurious nodes or edges present. A study on the robustness of centrality measures has been done in
Stephen P. Borgatti, Kathleen M. Carley, David Krackhardt, “On the robustness of centrality measures under conditions of imperfect data”, Social Networks, Volume 28, Issue 2, May 2006, Pages 124–136,
where they (quote) “show that the accuracy of centrality measures declines smoothly and predictably with the amount of error. This suggests that, for random networks and random error, we shall be able to construct confidence intervals around centrality scores. In addition, centrality measures were highly similar in their response to error. Dense networks were the most robust in the face of all kinds of error except edge deletion. For edge deletion, sparse networks were more accurately measured”. The authors considered degree, betweenness, closeness and eigenvector centrality, and compared them using top 1%, top 3%, top 10%, overlap, and R^2 measures of accuracy. In our project we will be collecting citation data from which we can construct only a partial citation network, so an understanding of robustness becomes necessary.


Sources of bias

Another major challenge will be noise in the data on salaries. Academics are paid not only to convert coffee into papers, but also to teach students and perform administrative work for the university. Furthermore, one expects output in these dimensions to be correlated (for example, time spent writing papers is not used for teaching), which could bias our results if not properly controlled for. A difficulty in controlling for this is that measures of teaching load and administrative workload are unlikely to be public. We still need to address this issue.

We will only be looking at a small subset of scholars and academic papers in a few fields. Ideally, we want a very-well-delimited field, distinct from all others, so that field boundaries (for the scholars and papers chosen) are aligned perfectly. People involved in interdisciplinary research may show up as peripheral when looking at only a single field, whereas due to their importance in connecting disparate fields, they could plausibly have especially high centrality in the unobserved overall graph of publications. They would have smaller centrality measures than their true importance, and thus presumably salary, so this error biases the correlation toward zero.


Gameability

If citation centrality affects pay, that incentivizes academics to alter their citation patterns to maximize their score. That would reduce the usefulness of the centrality measure; we want to find a measure that is hard to game in this way. To do this, we must first develop a precise notion of gameability. One possibility is to analyze the maximum and expected marginal effects of getting one additional citation (perhaps as a favor or in exchange for a citation for the other author). More sophisticated models could try to capture social links (since it’s easier to get a citation from someone you know); perhaps school graduated from, graduation year, and current university department could be used as proxies for whether authors are socially linked.

Introduction

The problem of identifying how significant a node is in a network is encountered in various applications. For example, search engines need to find the importance of web pages for a keyword and rank them in order to produce useful output for the user. For advertising and epidemic control, identifying the most important nodes in the network is a key. The major challenge here is how to measure importance according to the observed structure of a network. The notion of centrality addresses this issue by offering metrics for how “central” a node is to the given network.

In determining faculty salaries, universities attempt to measure a scholar’s marginal value to the university or to the general social welfare. Common measures used are the number of publications, the number of publications in top journals, and the number of times papers are cited. But these measures may not comprehensively measure a scholar’s marginal product.

We propose to construct centrality measures in the citations network of an academic field, use citation centralities of papers to construct empirical indices of the empirical measures of the importance of scholars, and test whether this measure is correlated with professors’ salaries. We will evaluate different centrality measures in the citations network of an academic field. By doing so, we hope to learn what each centrality measure describes, and how well each measure correlates with salary, a proxy for a scholar’s benefit to the university.

Papers cite other papers, and each paper belongs to one or more researchers. We can construct a paper citation network, and calculate the centrality of each paper. We can then calculate statistics for a scholar based on the centralities of the scholar’s papers, such as the mean centrality of the scholar’s papers, and the maximum centrality. We will need some index of the scholar’s overall influence, such as the sum of the author’s paper centralities. This reflects both number of papers and their importance.

But we do not have valuation measures for individual papers, only for scholars. So we must develop a centrality measure for each scholar instead of each paper. A simple measure would be to sum the centralities of papers authored. Alternatively, one could calculate a centrality directly for each scholar. That requires a measure of centrality that is more general than PageRank, because it would have to allow for the fact that scholars do not just have single directed links. For example, Scholar A’s papers may cite Scholar B’s papers 17 times, and that must be distinguished from the case where there is only 1 such citation. So some form of weighted page rank may be better.
It may also be interesting to examine other centrality measures such as degree centrality or closeness centrality. Given the sensitivity of betweenness centrality to small differences in network structure, this measure does not seem appropriate for this study.
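The paper-to-scholar aggregation described above can be sketched as follows (all names, numbers, and the choice of sum/mean/max statistics are illustrative):

```python
# Collapse paper-level centralities into per-scholar statistics.

def scholar_indices(paper_centrality, authorship):
    """authorship: scholar -> list of paper ids they authored."""
    out = {}
    for scholar, papers in authorship.items():
        vals = [paper_centrality[p] for p in papers]
        out[scholar] = {
            "sum": sum(vals),             # overall influence index
            "mean": sum(vals) / len(vals),
            "max": max(vals),             # best single paper
        }
    return out

centrality = {"p1": 0.5, "p2": 0.3, "p3": 0.2}      # hypothetical
papers_by = {"alice": ["p1", "p3"], "bob": ["p2"]}  # hypothetical
indices = scholar_indices(centrality, papers_by)
```

The sum statistic reflects both the number of papers and their importance, as noted above; mean and max isolate quality from quantity.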

Although we are proposing to work on citation networks, our idea and approach is widely applicable. The main aim is to formulate a way to find the central/powerful nodes on a network and see whether the valuation assigned to each node indeed reflects its true importance in the graph. For example, this applies to patent citation networks, considering R&D valuations of companies. This helps us come up with refined measures of centrality for specific applications which require design or incentives based on importance.

After exploring prior centrality measures, we will have some insight as to how well each fits the salary data, and why. If the correlations are poor and we have a specific idea as to why, we will spend the rest of the term coming up with a better centrality measure. If the correlations are strong, that means a professor has enough incentive to optimize his or her centrality; we will then study the “gameability” of the most relevant centrality measure, and analyze how one could improve one’s centrality, and hence salary, given the existing network.

CS 145 project blog

This is the CS 145 project blog created by Ramya, Kijun and Michael. We are studying centrality measures on networks and their implications, specifically for citation networks.