Tuesday, May 08, 2012

Fresh Data :)

We have obtained data on High Energy Physics Theory (HepTh) papers from arXiv, covering 1991 to May 2nd, 2012. We crawled every paper, along with its abstract, that is tagged HepTh as either its primary or a secondary category. This addresses issue 2, which was raised after we looked at outliers in the old data obtained from SNAP (1992 to 2003, only papers whose primary tag is HepTh).

The current data has 79,188 nodes and 1,163,903 edges! We ran into memory issues :-P After fixing them, we calculated the in-degree, out-degree and PageRank of each node using Gephi. We are currently computing closeness and betweenness centrality on the data. It is taking a LOT of time, as expected.
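
For readers who want to see what these node metrics mean, here is a minimal sketch in plain Python. It is not what Gephi runs internally, just a small illustration. It assumes the citation graph is a list of (citing, cited) pairs; the function names, the damping factor of 0.85 and the fixed iteration count are our own choices for the example.

```python
from collections import defaultdict


def degree_counts(edges):
    """Return (in_degree, out_degree) dicts for a directed edge list."""
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1  # src cites one more paper
        in_deg[dst] += 1   # dst is cited one more time
        # Touch the other counters so every node appears in both dicts.
        in_deg[src] += 0
        out_deg[dst] += 0
    return dict(in_deg), dict(out_deg)


def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank (assumed parameters, not Gephi's)."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Papers that cite nothing spread their rank evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if out_links[v]:
                share = damping * rank[v] / len(out_links[v])
                for w in out_links[v]:
                    new_rank[w] += share
        rank = new_rank
    return rank
```

Calling `degree_counts(edges)` and `pagerank(edges)` on a small edge list is enough to sanity-check a few nodes by hand. On a graph with over a million edges, a dedicated graph tool or library is the more practical route.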

Meanwhile, we are also working on joining the fresh citation data with the salary data. We are downloading the 2010 salary data for the UC campuses. Since the latest available salary data is from 2010, we need to truncate our citation data accordingly.
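
The truncation step could look roughly like the sketch below. It assumes we already have a mapping from each paper to its publication year (called `paper_year` here, a hypothetical name) and the same (citing, cited) edge list as before. The idea is to keep only papers from the cutoff year or earlier, and only citations whose two endpoints both survive.

```python
def truncate_citations(edges, paper_year, cutoff_year=2010):
    """Drop papers published after cutoff_year and any edge touching them.

    edges      -- list of (citing, cited) pairs
    paper_year -- dict mapping paper id to publication year (assumed input)
    """
    kept_nodes = {paper for paper, year in paper_year.items() if year <= cutoff_year}
    kept_edges = [
        (src, dst)
        for src, dst in edges
        if src in kept_nodes and dst in kept_nodes
    ]
    return kept_nodes, kept_edges
```

Node metrics such as in-degree should be recomputed on the truncated graph, because citations from papers published after 2010 no longer count.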
