This week I ended up reading a couple of recent articles on the topic of search. Not groundbreaking papers, but down-to-earth field implementations. Below, I'll go through the paid search challenges at two major online platforms, and then the emerging role of the Relevance Engineer.
Shopping upsells on Pinterest. An interesting story. Let me decompose it into the common steps seen across data projects.
A simple problem to solve: introduce ads into the search results. They call it "shopping upsells". Imagine you need to build a shopping upsell model.
Step 1. Get Data
Where do you get the data for a feature that doesn't yet exist on the platform?
One approach: randomly display a portion of upsells for all queries. However, this way product quality gets mixed up with the user's shopping intent: it's not clear whether the user doesn't want to buy in general or just doesn't like this particular ad.
A better approach: embed products in both the upsell and organic sections, but hide prices in organic. This makes it possible to distill the user's intent and makes the data less noisy.
Step 2. Get Model
You've got data, now get a model.
Use business knowledge to come up with a smart objective. Clicks on products are usually noisy, but a good start. It is much better to assign proper weights to strong signals and combine them smartly. Pinterest uses pins and clicks to partner sites.
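To make the weighted objective concrete, here is a minimal sketch of turning engagement signals into a soft training label. The weights, the `Engagement` record and the "strongest signal wins" rule are my own assumptions for illustration; the article does not say how Pinterest actually combines its signals.

```python
from dataclasses import dataclass

# Hypothetical weights: the stronger the signal, the closer to 1.0.
SIGNAL_WEIGHTS = {"click": 0.2, "pin": 0.6, "partner_click": 1.0}


@dataclass
class Engagement:
    click: bool = False          # plain click on the product (noisy)
    pin: bool = False            # user saved the product
    partner_click: bool = False  # user clicked through to the partner site


def training_target(e: Engagement) -> float:
    """Return a soft label in [0, 1]: the weight of the strongest signal seen."""
    fired = [w for name, w in SIGNAL_WEIGHTS.items() if getattr(e, name)]
    return max(fired, default=0.0)
```

With these assumed weights, a lone click gives a weak label of 0.2, while a click that also leads to the partner site gives 1.0. Other combination rules (a capped sum, for instance) would fit the same interface.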
Model architecture:
Query -> Embedding -> Encoder -> Dense -> Log Loss
New practitioners are often disappointed to see simple architectures after all the ResNets and RNNs they've just studied. But complexity and the state of the art are often the wrong things to chase for most businesses.
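For a feel of how small such a pipeline is, here is a toy forward pass and loss in plain Python. Everything in it is an assumption for illustration: the vocabulary, the embedding size, the random weights, and the mean-pooling "encoder" standing in for whatever encoder Pinterest really uses. There is no training loop, only the Query -> Embedding -> Encoder -> Dense -> Log Loss path.

```python
import math
import random

random.seed(0)
EMBED_DIM = 4  # illustrative size

# Embedding table: one small random vector per known token (hypothetical vocab).
VOCAB = ["red", "dress", "pasta", "recipe", "running", "shoes"]
EMBEDDINGS = {t: [random.uniform(-0.5, 0.5) for _ in range(EMBED_DIM)] for t in VOCAB}
DENSE_W = [random.uniform(-0.5, 0.5) for _ in range(EMBED_DIM)]
DENSE_B = 0.0


def encode(query: str) -> list[float]:
    """Encoder stand-in: average the embeddings of the known tokens."""
    vecs = [EMBEDDINGS[t] for t in query.lower().split() if t in EMBEDDINGS]
    if not vecs:
        return [0.0] * EMBED_DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def predict(query: str) -> float:
    """Dense layer plus sigmoid: estimated probability of shopping intent."""
    z = sum(w * x for w, x in zip(DENSE_W, encode(query))) + DENSE_B
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(y_true: float, p: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy for a single example."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1.0 - y_true) * math.log(1.0 - p))
```

Calling `log_loss(1.0, predict("red dress"))` gives the loss for a query labelled as shoppable. Because the loss is ordinary cross-entropy, a soft label between 0 and 1 works as `y_true` too.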
Step 3a. Get Results
"After launching the experiment, the model increased more than 2X traffic to the shopping search page without hurting overall search metrics in terms of long clicks or saves. The model also increased more than 2X product impressions and product long clicks through the upsell."
Step 3b. Hack Production
Having the results, you now need to hack the costs to get the "model economics" right.
For example, they are smartly precomputing head queries and filtering out "non-shoppable categories, such as 'recipe' or 'finance'."
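Both tricks amount to skipping the model call whenever you can. A rough sketch of such a serving gate, assuming a hypothetical `model` callable that scores a query and a query category that is already known upstream (neither detail comes from the article):

```python
from typing import Callable, Optional

NON_SHOPPABLE = {"recipe", "finance"}  # the two categories named in the article


def build_head_cache(head_queries: list[str],
                     model: Callable[[str], float]) -> dict[str, float]:
    """Score frequent queries offline so they never hit the model at serving time."""
    return {q: model(q) for q in head_queries}


def upsell_score(query: str, category: str, cache: dict[str, float],
                 model: Callable[[str], float]) -> Optional[float]:
    """Return an upsell score, or None when the query should get no upsell."""
    if category in NON_SHOPPABLE:
        return None          # filtered out: no model call
    if query in cache:
        return cache[query]  # precomputed head query: no model call
    return model(query)      # tail query: live inference
```

Only the last branch costs a live inference, which is where the savings in serving traffic would come from.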
My bet is that Pinterest didn't come up with these optimizations from the beginning. Usually, it's a loop over steps 2 to 3b until you get all the components right. This often-overlooked cycle of small adjustments, in this case, allowed them to reduce model serving traffic by 70% 🤯
eBay's article on balancing paid and non-paid content in their search results.
The basic idea is that having fixed paid slots is bad, both for:
head queries, for which there is much more paid content than it's possible to fit
tail queries, for which there is often not enough high-quality paid content
The solution? Get rid of the fixed paid slots and rank the whole search result according to "relevancy".
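A toy illustration of ranking without fixed slots: paid and organic listings go into one pool and are sorted by a single score. The `Listing` type and the idea of one scalar `relevance` per listing are simplifying assumptions of mine; eBay's actual ranking function is surely richer than this.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    title: str
    relevance: float  # hypothetical model score in [0, 1]
    paid: bool


def rank_unified(organic: list[Listing], paid: list[Listing], k: int) -> list[Listing]:
    """Rank paid and organic listings together by one relevance score."""
    pool = organic + paid
    return sorted(pool, key=lambda item: item.relevance, reverse=True)[:k]
```

On a tail query where the only paid listing scores poorly, it simply does not make the top `k`. On a head query with many strong paid listings, more of them can show up than a fixed slot count would have allowed.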
🕵️‍♂️ DS or ML? RE!
Another interesting take on the career in the data field from one of the most famous search practitioners. A couple of highlights:
Who is a relevance engineer: "implements information retrieval algorithms that solve user information needs in real time, at scale"
Applied approach: "don't chase the state of the art unnecessarily, rather they prefer proven techniques for 80% of the problem", "don't solve search for Kaggle points or academia, but for real companies and users"
How it's different from an ML engineer: both roles are very similar, with relevance engineers tending to be more user-centric and focused on IR problems (ML is broader and not necessarily about user-facing problems)
I think the role will become more popular going forward, as many companies realize the need for and value of showing relevant content to users whose attention spans keep shrinking.