One can't judge a click model only by how well it ranks documents. We also need to make sure it actually identifies and removes the biases hidden in the logged data.
That's what we showed in our recent #SIGIR23 paper with Philipp Hager, Jean-Michel Renders and Maarten de Rijke.
#sigir23 #PaperThread #ultr #clickmodels #IR
Together with Jean-Michel Renders and Maarten de Rijke, we investigated how click models *actually* perform (spoiler: not so great), and whether our offline metrics capture this (spoiler 2: they don't).
Here's what it means for unbiased learning-to-rank researchers and practitioners ⬇️