Another goal this week was to regress the change in salary on the change in centrality. To do so, we first calculated cumulative centrality for every year between 2004 and 2010. Then we calculated the year-over-year change in citation count and PageRank.
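As a rough illustration of the differencing step, the sketch below turns per-year cumulative values into year-over-year changes. The function name `yearly_deltas` and the sample numbers are made up for this example and are not our actual data.

```python
from typing import Dict


def yearly_deltas(cumulative: Dict[int, float]) -> Dict[int, float]:
    """Return {year: value[year] - value[year - 1]} for every year
    whose previous year is also present in the input."""
    deltas = {}
    for year in sorted(cumulative):
        previous = year - 1
        if previous in cumulative:
            deltas[year] = cumulative[year] - cumulative[previous]
    return deltas


# Hypothetical cumulative citation counts for one professor (made-up numbers).
citations = {2008: 120, 2009: 150, 2010: 171}
print(yearly_deltas(citations))  # {2009: 30, 2010: 21}
```

The same helper would apply to cumulative PageRank, assuming one value per professor per year.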
We had access to two types of salary data: gross pay and base pay. We initially ran a regression on the change in gross pay and the change in citation count, but the variance in gross pay was too high to show any clear correlation.
When we graphed the change in base pay against the change in citation count, it fared much better.
In fact, we had not expected to see a negative change in base pay!
We ran a linear regression on delta base pay using two variables, years since PhD and delta citation count, for the year 2010 (that is, the change from 2009 to 2010). The results are as follows:
R^2: 0.0709406

Term | Estimate | p-value |
---|---|---|
Constant | -406.115 | 0.869237 |
delta Citation | 45.6588 | 0.0348579 |
The estimate for the constant term is negative, but its p-value shows that it is not statistically significant. In other words, the data do not suggest a negative change in salary due to non-performance (no change in citation count). A negative intercept would also mean that, up to a certain threshold (where the line crosses y=0), the professor's salary is reduced. That would be quite interesting if it were statistically significant, but with such a high p-value we cannot reject the null hypothesis that the constant is zero.
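To make the intercept argument concrete, here is a minimal sketch of a one-predictor least-squares fit and the point where the fitted line crosses y=0. It is a simplified stand-in, not the exact procedure we ran: it ignores years since PhD, does not compute p-values, and the data and the names `fit_line` and `zero_crossing` are hypothetical. With the estimates reported above, the crossing would sit near 406.115 / 45.6588 ≈ 8.9 additional citations, though the high p-value on the constant means that figure should not be read as a real threshold.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x.

    Returns (a, b, r_squared), where r_squared is the usual centered R^2.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


def zero_crossing(a, b):
    """x value at which the fitted line a + b*x equals zero."""
    return -a / b


# Made-up illustration data, NOT the salary data from this post.
delta_citations = [0, 5, 10, 20, 40]
delta_base_pay = [-300, 100, -50, 900, 1500]

a, b, r2 = fit_line(delta_citations, delta_base_pay)
print("intercept:", a, "slope:", b, "R^2:", r2)
print("line crosses zero at delta citation =", zero_crossing(a, b))
```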
Since the constant term was not significant, we ran a single-variable regression without it, with the following result:
R^2: 0.0895593

Term | Estimate | p-value |
---|---|---|
delta Citation | 43.7869 | 0.0162938 |
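A regression without a constant term can be sketched as below. One caveat when comparing the two R^2 values: many statistics packages report an uncentered R^2 (measured against y = 0 rather than against the mean of y) for models with no constant, and that number is not directly comparable to the R^2 of a model with an intercept. Whether that applies to the figure above depends on the software used, so treat this as something to check rather than a finding. The function name and sample data are hypothetical.

```python
def fit_through_origin(x, y):
    """Least squares for y = b*x (no constant term).

    Returns (b, r2_uncentered, r2_centered):
      r2_uncentered = 1 - SS_res / sum(y^2)          (baseline is y = 0)
      r2_centered   = 1 - SS_res / sum((y - mean)^2) (baseline is mean of y)
    """
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    ss_res = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    mean_y = sum(y) / len(y)
    ss_raw = sum(yi ** 2 for yi in y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return b, 1.0 - ss_res / ss_raw, 1.0 - ss_res / ss_tot


# Made-up illustration data, NOT the salary data from this post.
delta_citations = [0, 5, 10, 20, 40]
delta_base_pay = [-300, 100, -50, 900, 1500]

slope, r2_unc, r2_cen = fit_through_origin(delta_citations, delta_base_pay)
print("slope:", slope)
print("uncentered R^2:", r2_unc, "centered R^2:", r2_cen)
```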
For PageRank, since the number of papers is steadily increasing, it was natural for the change in PageRank to be negative. To compensate, we'd like to try multiplying PageRank by the number of papers taken into consideration.
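The sketch below illustrates why the raw score tends to shrink, assuming the common formulation in which PageRank scores sum to 1 over all papers, so the average score is 1/N. Multiplying by N rescales the average to 1 regardless of how many papers are in the graph. The power-iteration routine, the damping factor of 0.85, and the toy citation graphs are all assumptions for illustration, not our actual pipeline.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank.

    links maps each paper to the list of papers it cites.
    Returns {paper: score}; the scores sum to 1.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by papers that cite nothing is spread evenly over all papers.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            for t in targets:
                new_rank[t] += damping * rank[v] / len(targets)
        rank = new_rank
    return rank


def scale_by_paper_count(rank):
    """Multiply each score by the number of papers, so the mean score is 1."""
    n = len(rank)
    return {v: score * n for v, score in rank.items()}


# Toy graphs: the later one adds two papers that do not cite paper "A".
earlier = {"B": ["A"], "C": ["A"]}
later = {"B": ["A"], "C": ["A"], "D": ["E"]}

raw_earlier, raw_later = pagerank(earlier), pagerank(later)
print("raw A:", raw_earlier["A"], "->", raw_later["A"])
print("scaled A:", scale_by_paper_count(raw_earlier)["A"],
      "->", scale_by_paper_count(raw_later)["A"])
```

In this toy case the raw score of paper A drops once unrelated papers join the graph, even though its own citations are unchanged.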
We are now running the regression on delta citation for the other years (2004 to 2009), after which we will compare and analyze the results. We are also calculating h- and g-indices to see how well they predict salary under a simple linear assumption.
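For reference, here is a minimal sketch of the two indices using their usual definitions: the h-index is the largest h such that h papers have at least h citations each, and the g-index is the largest g such that the g most-cited papers together have at least g^2 citations. This version caps g at the number of papers, which is one common convention. The citation counts are made up.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers together have >= g^2 citations.

    g is capped at the number of papers in the list.
    """
    ranked = sorted(citations, reverse=True)
    g = 0
    running_total = 0
    for position, count in enumerate(ranked, start=1):
        running_total += count
        if running_total >= position * position:
            g = position
    return g


# Hypothetical per-paper citation counts for one professor.
papers = [10, 8, 5, 4, 3]
print(h_index(papers))  # 4
print(g_index(papers))  # 5
```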