Wednesday, May 16, 2012

Linear regression with delta salary and delta citation count for the year 2010

Another goal this week was to run a regression of the change in salary on the change in centrality. To do so, we first calculated cumulative centrality for every year between 2004 and 2010. Then we calculated the year-over-year change in citation count and PageRank.
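As a rough illustration of the delta step, here is a minimal sketch that turns cumulative per-year values into year-over-year changes. The function name `yearly_deltas` and the citation numbers are made up for the example; they are not our actual code or data.

```python
def yearly_deltas(cumulative_by_year):
    """Return {year: value[year] - value[year - 1]} for consecutive years."""
    years = sorted(cumulative_by_year)
    return {
        year: cumulative_by_year[year] - cumulative_by_year[prev]
        for prev, year in zip(years, years[1:])
        if year - prev == 1  # skip gaps so a delta always spans one year
    }


# Hypothetical cumulative citation counts for one professor.
citations = {2008: 120, 2009: 150, 2010: 195}
deltas = yearly_deltas(citations)
print(deltas[2010])  # change from 2009 to 2010
```

The same helper would apply to cumulative PageRank, since it only assumes one numeric value per year.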

We had access to two different types of salary data: gross pay and base pay. We initially ran a regression of the change in gross pay on the change in citation count, but the variance in gross pay was too high to reveal any clear correlation.



When we graphed change in base pay versus change in citation count, the relationship was much clearer.



In fact, we had not expected to see any negative changes in base pay!

We ran a linear regression on delta base pay using two variables, years since PhD and delta citation count, for the year 2010 (that is, the change from 2009 to 2010). The result is as follows:

R^2: 0.0709406

                  Estimate     p-value
Constant          -406.115     0.869237
delta Citation    45.6588      0.0348579
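For readers who want to see what goes into the Estimate column and R^2, here is a minimal sketch of a one-predictor least-squares fit with an intercept. The function name and the data points are invented for illustration and are not our dataset. p-values are left out because they need a t-distribution, which the Python standard library does not provide.

```python
def fit_with_intercept(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 compares residual variation with variation around the mean of y.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


# Made-up points: change in citation count vs. change in base pay.
delta_citations = [0, 5, 10, 20, 40]
delta_base_pay = [-300, 100, 200, 900, 1500]
intercept, slope, r2 = fit_with_intercept(delta_citations, delta_base_pay)
print(intercept, slope, r2)
```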

The p-value for the constant term (0.87) shows that it is not statistically significant. We also note that the estimate for the constant term is negative. Taken at face value, a negative intercept would mean that a professor whose citation count does not change sees a cut in base pay, and that pay only starts to rise beyond a certain threshold (where the line crosses y = 0, at roughly 406.115 / 45.6588 ≈ 8.9 additional citations). This would be quite interesting if it were statistically significant. But with such a high p-value we cannot reject the null hypothesis that the constant term is zero, so the data does not suggest a negative change in salary due to non-performance (no change in citation count).
Hence, we performed a single-variable regression on the data without a constant term, with the following result:


R^2: 0.0895593

                  Estimate     p-value
delta Citation    43.7869      0.0162938
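A regression without a constant term forces the line through the origin, which changes both the slope formula and how R^2 is usually defined. The sketch below shows one common convention, the uncentered R^2, using an invented function name and made-up data. Many statistics packages switch to this uncentered form for no-intercept fits, so an R^2 from such a fit is not directly comparable with one from a fit that includes a constant; which convention produced the number above is not something this sketch can confirm.

```python
def fit_through_origin(x, y):
    """Least squares for y = slope * x (no constant term).

    Returns (slope, uncentered_r_squared).
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    slope = sxy / sxx
    ss_res = sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))
    # Uncentered total: variation of y around zero, not around its mean.
    ss_tot_uncentered = sum(yi * yi for yi in y)
    return slope, 1 - ss_res / ss_tot_uncentered


# Made-up points: change in citation count vs. change in base pay.
delta_citations = [0, 5, 10, 20, 40]
delta_base_pay = [-300, 100, 200, 900, 1500]
slope0, r2_uncentered = fit_through_origin(delta_citations, delta_base_pay)
print(slope0, r2_uncentered)
```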


For PageRank, the change was typically negative. This is natural: PageRank scores share a fixed total across all papers, so as the number of papers steadily increases, each paper's share tends to shrink. To compensate, we'd like to try multiplying PageRank by the number of papers taken into consideration.
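To make the proposed rescaling concrete, here is a small sketch with invented numbers, assuming PageRank scores are normalized to sum to 1 over all papers in the graph. Under that assumption, multiplying by the number of papers expresses a score relative to the average paper (1.0 means average). The function name `scaled_pagerank` is hypothetical.

```python
def scaled_pagerank(score, num_papers):
    """Rescale a normalized PageRank score so that 1.0 is the average paper."""
    return score * num_papers


# Made-up values: the raw score falls as the graph grows from 1000 to 1200
# papers, even though the paper gains ground relative to the average.
raw_2009, papers_2009 = 0.0020, 1000
raw_2010, papers_2010 = 0.0018, 1200

raw_delta = raw_2010 - raw_2009
scaled_delta = (scaled_pagerank(raw_2010, papers_2010)
                - scaled_pagerank(raw_2009, papers_2009))
print(raw_delta, scaled_delta)
```

In this toy case the raw delta is negative while the scaled delta is positive, which is the effect we hope the rescaling will expose in the real data.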

We are now running the regression on delta citation count for the other years (2004 to 2009), and then we will compare and analyze the results. We are also calculating h- and g-indices to see how well they predict salary under a simple linear assumption.
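For reference, here is a minimal sketch of the two indices using their standard definitions: the h-index is the largest h such that h papers have at least h citations each, and the g-index is the largest g such that the top g papers together have at least g^2 citations. This variant caps g at the number of papers, which is one common convention. The citation list is made up.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers total at least g**2 citations."""
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g


# Hypothetical per-paper citation counts for one professor.
papers = [10, 8, 5, 4, 3]
print(h_index(papers), g_index(papers))
```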

