Our group’s research focuses on the implementation (as opposed to the
applications) of data systems. Broadly, we work on machine programming
for data systems, investigating how to build data systems that automatically
adapt to new hardware and user workloads, invent novel processing strategies,
and understand user intention.
Active projects
Offline query optimization: important queries deserve additional optimization. How can we find the best possible query plan given a time budget?
Scalable semantic operators: how can we scale semantic operators to datasets with billions of rows? Even calling a small LLM per-row is too slow.
Rethinking execution engines: as hardware and query workloads become more complex, how should we design execution engines for future DBMSes?
LLMs for query optimization: LLMs have great reasoning capabilities in some domains. How can we leverage those abilities for QO?
TIDES: How can organizations share data while ensuring compliance with privacy regulations? TIDES is a privacy-preserving cross-organization data integration service.
Are you a current Penn student interested in working with us? Feel free to book a quick 15-minute chat with me.
We generally work with students who have strong systems backgrounds. Please
include a link to your website, and use the phrase “march of the penguins” in
the description so I know you read this!