Ryan Marcus, assistant professor at the University of Pennsylvania. Using machine learning to build the next generation of data systems.
    ____                       __  ___
   / __ \__  ______ _____     /  |/  /___ _____________  _______
  / /_/ / / / / __ `/ __ \   / /|_/ / __ `/ ___/ ___/ / / / ___/
 / _, _/ /_/ / /_/ / / / /  / /  / / /_/ / /  / /__/ /_/ (__  )
/_/ |_|\__, /\__,_/_/ /_/  /_/  /_/\__,_/_/   \___/\__,_/____/
      /____/
        
   ___                   __  ___
  / _ \__ _____ ____    /  |/  /__ ___________ _____
 / , _/ // / _ `/ _ \  / /|_/ / _ `/ __/ __/ // (_-<
/_/|_|\_, /\_,_/_//_/ /_/  /_/\_,_/_/  \__/\_,_/___/
     /___/
        
   ___  __  ___
  / _ \/  |/  /__ ___________ _____
 / , _/ /|_/ / _ `/ __/ __/ // (_-<
/_/|_/_/  /_/\_,_/_/  \__/\_,_/___/
        

Scalable Semantic Operators

ScaleLLM flowchart. See https://doi.org/10.1145/3722212.3725130 for details.

Semantic operators, generally powered by LLMs, are taking the database world by storm. Queries that were previously out of reach – like β€œdoes this review appear fake?” – are now possible. Unfortunately, naive implementations of semantic operators generally involve calling an expensive LLM for every row of data. How can we scale semantic operators to datasets with billions of rows?

Papers

People