My field? I don't have a field, I'm not a donkey.

    

TL;DR

Educated in several acronyms across the globe (UNISR, SFI, MIT), I was co-founder and CTO of Tooso, an AI startup in San Francisco providing search results and product recommendations to millions of users, before it was acquired by TSX:CVO.

I led Coveo’s AI from scale-up to IPO, and built out Coveo Labs, an applied R&D practice rooted in open science: our libraries, models and datasets have collected thousands of stars, raised tens of thousands of dollars in donations, and garnered millions of downloads.

Throughout my career, I have been fortunate enough to collaborate with incredible folks in industry and academia (e.g. Netflix, NVIDIA, Stanford, Univ. of Wisconsin-Madison), and work on products spanning multiple fields: Information Retrieval, Data Science, Artificial Intelligence, Data Management, Computer Systems. My research papers are usually characterized by a keen product eye, and are memorable mostly for their titles (e.g. “Not all those who browse are lost”, “You don’t need a bigger boat”, “Mo’ models, mo’ problems”, “FaaS and Furious”).

While building my new startup, Bauplan, I moonlight as Adj. Professor of ML at NYU, which is only notable because it is the only job I ever had that my parents understand.

Where is my mind?

I occasionally share code, ideas and teaching materials. Selected projects, talks, papers and datasets are highlighted below.

I recently started investing in startups, both directly and as LP in AI funds: I’m always happy to chat with founders about DataOps, MLOps and AI.

Research

I have done product-minded research on a (perhaps surprisingly) heterogeneous set of topics: Information Retrieval (e.g. RecSys, SIGIR), Machine Learning and model evaluation (WWW, NeurIPS), NLP (NAACL, ACL), data science (Nat. Sci. Rep., KDD), AI and Large Language Models (ICML), data management (SIGMOD, VLDB), human-machine computation (HCOMP), computer systems (WoSC 10). Our paper on cognitively-inspired query embeddings won the Best Paper Award at NAACL 21, and our talk on reproducible data science on data lakes won the Best Presentation Award at DEEM 24.

I have been co-organizer of SIGIR eCom (2022, 2023) and EvalRS (2022, 2023), Industry Sponsorship Chair for CIKM 2022, Industry Chair at UMAP (2025), and I have been involved in various organizational capacities in several top-tier research events (COLING, EMNLP, ACL, SIRIP, ECONLP, ECNLP).

As a true Santa Fe Institute alumnus, I am an old-fashioned generalist, and I made tiny contributions to other fields mostly as an excuse to spend time with old friends: logic and computation, cellular automata, computational social sciences, networks, philosophy of mind, political science, digital ethics.

Finally, some of my research projects have been patented, but to this day nobody seems to really know why.

Old stuff

In previous lives, I managed to get a Ph.D., simulate a pre-Columbian civilization, document biases in national elections and give an academic talk on videogames. Some of my improbable “achievements” received ample press coverage.

Having built end-to-end data pipelines at garage, growth and IPO scale, I happily shared all my mistakes in a series of articles that introduced the concept of Reasonable Scale.

Some time before Brad Pitt’s movie, I led one of the first attempts at running sophisticated analytics for a professional basketball team, and spearheaded the first data science effort on Milan’s bike-sharing service (no bikers or bureaucrats were harmed during the project).

About this page

The content of jacopotagliabue.it is released under the BY-NC-ND license; my chibi has been designed by the incredibly talented wisesnail.

Last update: December 2024.

Appendix

I often get invited to talk about things I (sort of) know by friends in industry (e.g. Home Depot, Farfetch, eBay, Pinterest, Tubi) and academia (e.g. keynotes at KDD, SIRIP, RecSys, CiE).

While my full publication list is available on Google Scholar, quick links to selected projects, talks, papers and datasets are collected here for convenience.

Open source projects

Talks

Papers

Datasets and data challenges

Aside from research and tutorials, our datasets have been successfully used by dozens of master's students to defend their theses at Tilburg University and Politecnico in Milan.