Ryan Marcus, assistant professor at the University of Pennsylvania. Using machine learning to build the next generation of data systems.
      
    ____                       __  ___                          
   / __ \__  ______ _____     /  |/  /___ _____________  _______
  / /_/ / / / / __ `/ __ \   / /|_/ / __ `/ ___/ ___/ / / / ___/
 / _, _/ /_/ / /_/ / / / /  / /  / / /_/ / /  / /__/ /_/ (__  ) 
/_/ |_|\__, /\__,_/_/ /_/  /_/  /_/\__,_/_/   \___/\__,_/____/  
      /____/                                                    
        

Most influential database papers

Latest update: June 10th, 2024.

Ever wondered which database systems papers have been the most influential? This page ranks papers by their PageRank in the citation graph, which is one possible measure of influence.

All rankings, including this one, codify an ideology and thus won’t match everyone’s understanding of “influence.” Most notably, this ranking has nothing to do with industrial adoption, actual usage, or the quality of an idea, except insofar as those correlate with citations. Note that the citation graph grows every year, so papers published more recently will, on average, have a lower score.

PageRank is computed with the Python package NetworkX on the citation graph of all SIGMOD, VLDB, CIDR, and PODS papers, excluding self-citations. Each paper’s absolute and percentile ranks are then computed from its PageRank score. Authors are ranked by the sum of their papers’ PageRank scores, where each paper’s score is divided by its number of authors, similar to CSRankings. Click one of the links below to explore the ranking!
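The paper-ranking pipeline described above can be sketched in a few functions. This is a minimal, dependency-free sketch, not the page's actual code: the page uses NetworkX, and the hand-written power iteration here only mimics NetworkX's default behaviour (damping 0.85, score from papers that cite nothing spread uniformly). The page's own hyperparameters are listed in the Notes. Two further assumptions are labeled in the code: a "self-citation" is taken to mean the citing and cited papers share at least one author, and the percentile is defined as the share of papers ranked at or below a given paper. The function names and toy data are hypothetical.

```python
# Minimal sketch, not the page's actual code: the page uses NetworkX
# (networkx.pagerank); this dependency-free power iteration is for illustration.


def citation_edges(papers, citations):
    """Drop self-citations, ASSUMED here to mean the two papers share an author."""
    return sorted({(src, dst) for src, dst in citations
                   if src != dst and not (papers[src] & papers[dst])})


def pagerank(nodes, edges, alpha=0.85, tol=1e-8, max_iter=500):
    """PageRank by power iteration; an edge (a, b) means paper a cites paper b."""
    n = len(nodes)
    out = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Papers citing nothing in the graph spread their score uniformly,
        # like NetworkX's default treatment of dangling nodes.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1.0 - alpha) / n + alpha * dangling / n for v in nodes}
        for v in nodes:
            for w in out[v]:
                new[w] += alpha * rank[v] / len(out[v])
        err = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if err < n * tol:
            break
    return rank


def ranks(scores):
    """Absolute rank (1 = highest) and an ASSUMED percentile definition:
    the share of papers ranked at or below this one."""
    order = sorted(scores, key=scores.get, reverse=True)
    n = len(order)
    return {p: (i + 1, 100.0 * (n - i) / n) for i, p in enumerate(order)}


# Hypothetical toy data: paper -> set of authors, and (citing, cited) pairs.
papers = {"A": {"alice"}, "B": {"bob"}, "C": {"carol", "alice"}, "D": {"dave"}}
citations = [("B", "A"), ("C", "A"), ("D", "A"), ("D", "C")]

edges = citation_edges(papers, citations)  # ("C", "A") is dropped: alice wrote both
scores = pagerank(list(papers), edges)
print(ranks(scores)["A"])  # → (1, 100.0)
```

Edges point from the citing paper to the cited paper, so score flows toward heavily cited work; paper A, cited by two non-overlapping author groups, ends up on top.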

Most influential papers of all time

This is a list of all papers, sorted by their PageRank score.

Most influential papers, by year

This is a list of papers published in a particular year, sorted by their PageRank score. Papers published in recent years have fewer citations, and thus more noise in their score.

Most influential people

This table ranks every author of at least one paper that has been cited at least once by another paper. An author’s score is the sum of the normalized PageRank scores of their papers, where a paper’s normalized PageRank score is its PageRank score divided by its number of authors.
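The author score described above can be sketched as follows. This is a minimal illustration, not the page's code; the function name `rank_authors`, the three papers, and their scores are hypothetical, and it assumes an author's total includes all of their papers, with the "cited at least once" condition only deciding who appears in the table.

```python
from collections import defaultdict


def rank_authors(paper_authors, scores, cited):
    """Sum each author's per-paper PageRank shares; list only authors of a cited paper."""
    totals = defaultdict(float)
    for paper, authors in paper_authors.items():
        # Normalized score: the paper's PageRank split evenly among its co-authors.
        share = scores[paper] / len(authors)
        for author in authors:
            totals[author] += share
    eligible = {a for p in cited for a in paper_authors[p]}
    return sorted(((a, s) for a, s in totals.items() if a in eligible),
                  key=lambda pair: pair[1], reverse=True)


# Hypothetical scores for three papers; P3 has never been cited.
paper_authors = {"P1": {"alice", "bob", "carol"}, "P2": {"alice"}, "P3": {"dave"}}
scores = {"P1": 0.6, "P2": 0.3, "P3": 0.1}
cited = {"P1", "P2"}

# alice ≈ 0.6 / 3 + 0.3 = 0.5; bob and carol ≈ 0.2 each; dave is not listed.
for author, score in rank_authors(paper_authors, scores, cited):
    print(author, round(score, 3))
```

Splitting each paper's score evenly means a single-author paper credits its author with the full score, while a large collaboration spreads it thinly.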

Because PageRank scores are heavily skewed, the most influential papers have far higher scores than the median paper. As a result, authors of the most influential papers will rank highly here, even if they have few papers overall.

    Notes

    PageRank is computed with and with the initial weight for each paper set to . These hyperparameters come from:

    Walker, Dylan, Huafeng Xie, Koon-Kiu Yan, and Sergei Maslov. “Ranking scientific publications using a model of network traffic.” Journal of Statistical Mechanics: Theory and Experiment 2007, no. 06 (2007): P06010.