Monday, May 21, 2012

Why PageRank is not a good measure to capture value of a paper

Let us consider a toy example:

P = [ 0 0 0 0; 1 0 0 0; 1 0 0 0; 1 1 1 0]

From the traditional page-rank:

G = a* (P+I)./deg_of_each_node + (1-a)*(1/n)*(ones)

with this if we compute page rank for our toy example, we get
    0.9873
    0.1039
    0.1039
    0.0598

that is  r(1) = 1, r(2) = 2, r(3) = 2, r(4) = 3.

Now, suppose paper 3 had cited paper 2,

P = [ 0 0 0 0; 1 0 0 0; 1 1 0 0; 1 1 1 0]

Now the page rank gives,

    0.9835
    0.1475
    0.0848
    0.0608

that is  r(1) = 1, r(2) = 2, r(3) = 3, r(4) = 4.

What did we want to capture?  (assuming each citation means the same, i.e., unweighted graph)
The difference between network 2 and network 1 is that,
val(1) > val(2) > val(3) > val(4)
at the current time.

Problems with absolute value of page ranks :

If we look at values that page rank gives, the value of paper 4 increased from 0.0598 to 0.0608. This does not make sense, since a paper in past having cited another paper or not should not change the value of a new paper.

If we look at ranks instead, paper 3 lost its position from rank 2 to rank 3.  But, a paper's value should depend on its in-degree and not its out degree.

So, page rank does not capture the essence of the difference between network 1 and 2. 

No comments:

Post a Comment