Training an LLM-RecSys Hybrid for Steerable Recs with Semantic IDs

A bilingual LLM trained with semantic IDs as vocabulary tokens can recommend items and be steered through natural conversation.

Large Language Models · Recommendation Systems · Semantic IDs · Model Fine-tuning

KEY POINTS

Semantic IDs replace random hashes, making items native to the LLM's vocabulary
Single model handles both recommendation and conversational explanation
Chat-based steering lets users control recommendations with reasoning

ANALYSIS

When Recommendation Systems Learn to "Speak Human" – Semantic IDs with LLM×RecSys

Recommendation systems (RecSys) and large language models (LLMs) have largely operated in separate silos. Recommendation systems learn from massive amounts of user behavior data to predict clicks, but they output only ranked lists with no explanation. LLMs can write, chat, and draw on world knowledge, but they lack specific knowledge of a product catalog, so their recommendations tend to be generic.

Eugene Yan conducted an interesting experiment: instead of having these two systems operate independently, why not use a single model to handle both tasks?

The Core Idea: Semantic IDs

Traditional recommendation systems assign a random hash ID to each product. To the model, "item_7f3a" and "item_b2c1" have no semantic relationship, even when the products themselves are similar. Eugene Yan's approach is to replace random IDs with semantic token sequences, making product IDs a natural part of the LLM's vocabulary.
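One common way to derive such token sequences is residual quantization over item embeddings: each level picks the nearest codebook entry and passes the leftover residual to the next level, so similar items end up sharing leading codes. The following is a minimal sketch under that assumption. The codebooks, the two-dimensional embeddings, and the `<|sid_level_code|>` token format are all hypothetical and are not taken from the original experiment.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest(codebook: List[Vector], v: Vector) -> int:
    """Return the index of the codebook entry closest to v (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)),
    )


def semantic_id(embedding: Vector, codebooks: List[List[Vector]]) -> List[int]:
    """Residual quantization: choose one code per level, then quantize the remainder."""
    residual = list(embedding)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def to_tokens(codes: List[int]) -> str:
    """Render codes as special tokens (hypothetical token format)."""
    body = "".join(f"<|sid_{level}_{code}|>" for level, code in enumerate(codes))
    return f"<|sid_start|>{body}<|sid_end|>"


# Toy codebooks, made up for illustration.
CODEBOOKS = [
    [(0.0, 0.0), (1.0, 1.0)],              # level 0: coarse grouping
    [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2)],  # level 1: finer distinction
]

# Toy item embeddings, also made up.
hiking_boots = (1.0, 1.2)
trail_shoes = (1.2, 1.0)
novel = (0.0, 0.05)

for name, emb in [("hiking_boots", hiking_boots),
                  ("trail_shoes", trail_shoes),
                  ("novel", novel)]:
    print(name, to_tokens(semantic_id(emb, CODEBOOKS)))
```

In this toy setup the two hiking items share the same first-level code, so their IDs begin with the same token, while the unrelated item does not. That shared prefix is the kind of structure a random hash cannot give the model.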

The trained model becomes "bilingual": it can speak English and "speak" product IDs. Given a user's browsing history, it can predict the item the user is likely to click next. More importantly, you can interact with it directly in natural language:

"Help me find gear suitable for outdoor hiking."
"Why did you recommend this?"
"Combine these products into a gift package and give it a name."

The model can not only recommend but also reason about and explain its choices.
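Because the model mixes English and product IDs in a single text stream, the application code around it mostly has to do two things: place the browsing history (as ID tokens) into the prompt, and pull ID tokens back out of the reply. A minimal sketch of that plumbing follows. The function names, the prompt wording, and the `<|sid_level_code|>` token format are assumptions for illustration, not the original experiment's interface, and no model is called here.

```python
import re
from typing import List

# Hypothetical token format: <|sid_start|><|sid_0_1|><|sid_1_2|><|sid_end|>
SID_PATTERN = re.compile(r"<\|sid_start\|>(?:<\|sid_\d+_\d+\|>)+<\|sid_end\|>")


def build_prompt(history: List[str], request: str) -> str:
    """Combine semantic-ID history with a natural-language request."""
    return (
        "The user recently viewed: " + ", ".join(history) + "\n"
        f"User request: {request}\n"
        "Recommend items by semantic ID and briefly explain each choice."
    )


def extract_semantic_ids(reply: str) -> List[str]:
    """Return every complete semantic ID found in the model's reply, in order."""
    return SID_PATTERN.findall(reply)


history = [
    "<|sid_start|><|sid_0_1|><|sid_1_2|><|sid_end|>",
    "<|sid_start|><|sid_0_1|><|sid_1_1|><|sid_end|>",
]
prompt = build_prompt(history, "Help me find gear suitable for outdoor hiking.")

# A made-up reply, standing in for real model output.
reply = (
    "Try <|sid_start|><|sid_0_1|><|sid_1_0|><|sid_end|> because it is "
    "waterproof, or <|sid_start|><|sid_0_1|><|sid_1_2|><|sid_end|> for longer trails."
)
recommended = extract_semantic_ids(reply)
```

A follow-up such as "Why did you recommend this?" would be another turn appended to the same conversation, and the extracted IDs can be mapped back to catalog entries for display.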

Why It Matters

This breaks down the wall between two paradigms. For developers, this means:

No more complex retrieval + dialogue hybrid architectures – a single model can handle both recommendation and conversation.
Recommendations can be steered in natural language, for example "Not too expensive" or "More sporty style."
Recommendation systems can finally "explain themselves," which is crucial for user experience and compliance.

Of course, this experiment is currently a small-scale fine-tuning exercise, and prompt design has a significant impact on the results. It is still some way from being production-ready. However, the direction is clear: the future of recommendation systems may lie not in stronger recall algorithms, but in smarter conversationalists.

The complete code and data preparation process are open-sourced.

Originally from eugeneyan.com · Analyzed by BitByAI