My field? I don't have a field, I'm not a donkey.

    


TL;DR

Educated in several acronyms across the globe (UNISR, SFI, MIT), I am the co-founder of Bauplan, an agentic data infrastructure company based in SF.

I was the co-founder and CTO of Tooso, an AI startup providing search and recommendations to millions of users, before it was acquired by Coveo (TSX:CVO). I led Coveo’s AI from scale-up to IPO, and built out Coveo Labs, an R&D lab rooted in open science: our libraries, models and datasets have collected thousands of stars and garnered tens of millions of downloads.

Throughout my career, I have been fortunate to collaborate with incredible teams (e.g. Netflix, NVIDIA, Stanford, Univ. of Wisconsin-Madison), while working on products spanning multiple fields: Artificial Intelligence, Data Management, Information Retrieval, Computer Systems. My research contributions are often product-focused, and are memorable mostly for their titles (e.g. “Not all those who browse are lost”, “You don’t need a bigger boat”, “FaaS and Furious”).

While building my new startup, I moonlight as an Adj. Professor of ML Systems at NYU, which is only notable because it is the only job I have ever had that my parents understand.

Where is my mind?

I occasionally share code, ideas and teaching materials. Selected projects, talks, papers and datasets are highlighted below.

I recently started investing in startups, both directly and as an LP in AI funds: I’m always happy to chat with founders!

When the stars align, I sometimes advise great teams on AI, Data, and IR: past engagements include Outerbounds (acquired by Anaconda), Objective (acquired by Upwork), and Plural (acquired by SAI360). If you think I can help, feel free to reach out.

Research

I have done research in a heterogeneous set of topics: Information Retrieval (e.g. RecSys, SIGIR), Machine Learning and model evaluation (WWW, NeurIPS), NLP (NAACL, ACL), data science (Nat. Sci. Rep., KDD), agentic AI and Large Language Models (ICML), data management (SIGMOD, VLDB), human-machine computation (HCOMP), computer systems (Middleware). Our paper on cognitively inspired query embeddings won the Best Paper Award at NAACL 21, and our talk on reproducible data pipelines on data lakes won the Best Presentation Award at DEEM 24.

I was the lead organizer of Supporting Our AI Overlords at ACM CAIS, the first-ever research workshop focused on the intersection of AI agents and data systems. I have been a co-organizer of SIGIR eCom (2022, 2023) and EvalRS (2022, 2023), Industry Sponsorship Chair for CIKM 2022, Industry Chair at UMAP 2025, and I have been involved in various capacities in several top-tier events (e.g. EMNLP, ACL, SIRIP, ECONLP, ECNLP, PaPoC).

As a true Santa Fe Institute alumnus, I am an old-fashioned generalist, and I made tiny contributions to other fields mostly as an excuse to spend time with old friends: logic and computation, cellular automata, computational social sciences, networks, philosophy of mind, political science, digital ethics.

Finally, some of my projects have been patented, but to this day nobody seems to really know why.

Old stuff

In previous lives, I managed to get a Ph.D., simulate a pre-Columbian civilization, document biases in national elections and give an academic talk on video games. Some of my improbable “achievements” received ample press coverage and earned a few sparks of Hacker News front-page popularity.

Having built end-to-end data pipelines at garage, growth and IPO scale, I happily shared all my mistakes in a series of articles that introduced the concept of Reasonable Scale.

Some time before Brad Pitt’s movie, I led one of the first attempts to run sophisticated analytics for a professional basketball team, and spearheaded the first data science effort on Milan’s bike-sharing service (no bikers or bureaucrats were harmed during the project).

About this page

The content of jacopotagliabue.it is released under the BY-NC-ND license; my chibi was designed by the incredibly talented wisesnail.

Last update: May 2026.

Appendix

Friends in industry and academia often invite me to talk about things I (sort of) know. Highlights include keynotes at KDD, SIGIR, RecSys, CiE, VLDB, and SRDS, plus talks at NVIDIA, Lyft, Home Depot, Pinterest, IBM, Columbia, Berkeley, and many others.

My publication list is available on Google Scholar: selected projects, talks, papers and datasets are collected here for convenience.

Selected Open Source Projects

Selected Talks

Selected Papers

Datasets and Data Challenges

Aside from research and tutorials, our datasets have been successfully used by dozens of graduate students to defend their theses at Tilburg University and Politecnico in Milan.