There's a lesson here, and I'm not going to be the one to figure it out.



My name is Apo and I’m a mighty pirate

Educated in several acronyms across the globe (UNISR, SFI, MIT), I was co-founder and CTO of Tooso, an NLP / IR startup in San Francisco acquired by TSX:CVO.

I led Coveo’s A.I. and MLOps roadmap from scale-up to IPO, and built out Coveo Labs, an agile, applied R&D practice rooted in world-class collaborations (Stanford, Bocconi, Outerbounds, Uber, Microsoft, NVIDIA), open source and open science.

I talk a lot, and I’m often invited to do so by folks in industry (BBC, Walmart, Pinterest, eBay, Farfetch) and academia (SIRIP, CiE, KDD, Stanford).

I am currently an Adj. Professor of ML at NYU, which is mostly notable because it is the only job I ever had that my parents (sort of) understand.

Where is my mind?

I often share code, papers, posts, homework and tweets (ORDER BY importance DESC); if you have no intention of selling me anything, you can also try me on LinkedIn.

Selected talks, papers and datasets are highlighted (for the brave reader) at the very end of this page.

Current stuff

RecList and MLOps “at reasonable scale”

I have an ongoing project with Federico Bianchi (and friends) on behavioral testing for recommender systems: RecList spawned a popular open source package, a CIKM competition, hours of English-with-an-Italian-accent (e.g. this), and a paper at WWW 2022. RecList successfully raised funds from MLOps companies to sponsor its open development.

Having built end-to-end systems at garage, scale-up and IPO scale, I had the privilege of making a lot of mistakes in most parts of the DataOps and MLOps stack. To share my learnings, I introduced the idea of “reasonable scale ML” in a series of repositories and articles (“You don’t need a bigger boat”).

Current interests: ML testing, developing in the Modern Data Stack, improving SQL (?).

A.I. research

My recent research is in (mostly) applied and (sometimes) theoretical topics at the intersection of language, learning and retrieval.

I am co-organizer of SIGIR eCom, I’ve been involved in several NLP events in various roles (COLING, ECONLP, ECNLP, EMNLP) and my work has been presented in venues such as NAACL, WWW, RecSys: our work on cognitively-inspired query embeddings won the Best Paper Award at NAACL 21.

As a true SFI alumnus, I am an old-fashioned generalist, and I have made small contributions (papers, projects or reviews) to a bunch of topics outside of “traditional A.I.”: computational social sciences, agent-based models, urban studies, philosophy of mind.

Current interests: multi-modal representations, making logic great again.

Old stuff

In previous lives, I managed to get a Ph.D., work for a professional basketball team, simulate a pre-Columbian civilization and give an academic talk on videogames (among other improbable “achievements”).

iSport: some time before Brad Pitt’s movie, I led the first attempt in Italy (and one of the first worldwide) at running sophisticated analytics for a professional basketball team.

pedalaMI: I led the first data analysis and visualization effort on Milan’s bike-sharing service, which received plenty of press coverage. No bikers (and no bureaucrats) were harmed for the project.

SEP: together with Franz Berto I’m a proud author of Cellular Automata on the Stanford Encyclopedia, “the most interesting website on the internet” (or not).

About this page

The contents of this page are released under the BY-NC-ND license; my chibi has been designed by the incredibly talented wisesnail.

Last update: August 2022.


Quick links to selected talks, papers, datasets: if there’s a paper, talk, slide deck you know I have, but you can’t find it (here or elsewhere), please do get in touch directly.

Selected talks

Selected papers


Aside from research and tutorials, our datasets have been successfully used by dozens of master's students to defend their theses at Tilburg University and Politecnico in Milan.