This post walks through, step by step, how our paper CORE tackles a recent problem in LLM-based search visibility: controlling the output ranking of generative engines. Full paper: arXiv:2602.03608.
Key definitions
- Generative engine — an LLM-based search system that returns a synthesized answer or a short ranked recommendation set rather than a list of links.
- Output ranking — the order of items in that generated recommendation.
- Optimization content — text appended to retrieved content to influence the ranking.
How CORE works, step by step
- Treat the system as a black box. CORE assumes no access to the model weights or the search engine internals.
- Pick the optimization surface. The realistic lever a content owner controls is the content the search engine retrieves — so that is what CORE optimizes.
- Append optimization content. CORE adds strategically designed content of three types (below) to steer how the LLM ranks items in its answer.
- Measure promotion. CORE evaluates whether a target item is promoted into the top-K of the generated ranking, on the ProductBench benchmark.
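
To make the four steps concrete, here is a minimal sketch of the black-box loop: append text to the target item's retrieved content, ask the engine for a ranking, and check whether the target lands in the top-K. It is an illustration under stated assumptions, not the paper's implementation. The names `Ranker`, `append_optimization_content`, `is_promoted`, and `toy_ranker` are all hypothetical, and `toy_ranker` is a keyword-counting stand-in for a real generative engine, used only so the example runs without an LLM.

```python
from typing import Callable, Dict, List

# Hypothetical interface: a black-box ranker takes a query and the retrieved
# content for each item, and returns item IDs in ranked order.
Ranker = Callable[[str, Dict[str, str]], List[str]]


def append_optimization_content(
    retrieved: Dict[str, str], target_id: str, optimization_content: str
) -> Dict[str, str]:
    """Return a copy of the retrieved content with text appended to the target only."""
    updated = dict(retrieved)
    updated[target_id] = retrieved[target_id] + "\n" + optimization_content
    return updated


def is_promoted(ranking: List[str], target_id: str, k: int) -> bool:
    """True if the target item appears in the first k positions of the ranking."""
    return target_id in ranking[:k]


def toy_ranker(query: str, retrieved: Dict[str, str]) -> List[str]:
    """Stand-in for the generative engine (NOT an LLM): rank by query-word counts."""
    words = query.lower().split()

    def score(item_id: str) -> int:
        text = retrieved[item_id].lower()
        return sum(text.count(word) for word in words)

    # Higher score first; ties broken alphabetically by item ID.
    return sorted(retrieved, key=lambda item_id: (-score(item_id), item_id))


def run_trial(
    ranker: Ranker, query: str, retrieved: Dict[str, str],
    target_id: str, optimization_content: str, k: int,
) -> bool:
    """Only the retrieved content changes; the ranker is treated as a black box."""
    optimized = append_optimization_content(retrieved, target_id, optimization_content)
    return is_promoted(ranker(query, optimized), target_id, k)


if __name__ == "__main__":
    query = "wireless headphones"
    retrieved = {
        "A": "wireless headphones with noise cancelling",
        "B": "wired earbuds",
        "C": "wireless headphones, long battery, wireless charging",
    }
    # Placeholder text chosen to move the toy ranker; it is not CORE's content.
    placeholder = "wireless headphones wireless headphones wireless headphones"
    print(is_promoted(toy_ranker(query, retrieved), "B", 1))
    print(run_trial(toy_ranker, query, retrieved, "B", placeholder, 1))
```

In practice the `ranker` argument would wrap a call to the generative engine; how CORE actually constructs the appended content is described in the paper and is not reproduced here.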
The three optimization-content types
- String-based — compact textual additions.
- Reasoning-based — comparative reasoning that helps the LLM when it ranks.
- Review-based — review-style supporting evidence.
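
One way to keep the three types distinct in code is to tag each piece of optimization content with its type. The sketch below is only a possible representation: `ContentType`, `OptimizationContent`, and `append_to` are hypothetical names, and the text values are placeholders rather than content generated by CORE.

```python
from dataclasses import dataclass
from enum import Enum


class ContentType(Enum):
    """The three optimization-content types named in the paper."""
    STRING = "string-based"
    REASONING = "reasoning-based"
    REVIEW = "review-based"


@dataclass(frozen=True)
class OptimizationContent:
    kind: ContentType
    text: str  # placeholder here; the actual text comes from the optimization method


def append_to(retrieved_text: str, content: OptimizationContent) -> str:
    """Append the optimization content to an item's retrieved text."""
    return f"{retrieved_text}\n{content.text}"


if __name__ == "__main__":
    review = OptimizationContent(ContentType.REVIEW, "<review-style text>")
    print(append_to("Base product description.", review))
```

Tagging the type makes it straightforward to report results per content type later on.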
Results
| Metric (average across 15 categories) | CORE |
|---|---|
| Promotion Success Rate @Top-5 | 91.4% |
| Promotion Success Rate @Top-3 | 86.6% |
| Promotion Success Rate @Top-1 | 80.3% |

LLMs evaluated: GPT-4o, Gemini-2.5, Claude-4, and Grok-3. CORE outperforms existing ranking-manipulation methods while preserving the fluency of the optimized content.
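
For readers who want to compute a similar metric on their own data, here is a hedged sketch of Promotion Success Rate @Top-K. It assumes the rate is the fraction of trials in which the target item appears in the top-K of the generated ranking, computed per category and then averaged across categories; the paper's exact definition may differ. The names `Trial`, `promotion_success_rate`, and `average_across_categories` are hypothetical, and the sample data is made up for illustration.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical record: (generated ranking of item IDs, target item ID).
Trial = Tuple[List[str], str]


def promotion_success_rate(trials: List[Trial], k: int) -> float:
    """Fraction of trials whose target appears in the first k ranked items."""
    if not trials:
        raise ValueError("at least one trial is required")
    hits = sum(1 for ranking, target in trials if target in ranking[:k])
    return hits / len(trials)


def average_across_categories(
    trials_by_category: Dict[str, List[Trial]], k: int
) -> float:
    """Unweighted mean of the per-category success rates (an assumption)."""
    return mean(
        promotion_success_rate(trials, k)
        for trials in trials_by_category.values()
    )


if __name__ == "__main__":
    # Made-up rankings, two categories with two trials each.
    sample = {
        "category_1": [(["a", "b", "c"], "a"), (["a", "b", "c"], "c")],
        "category_2": [(["x", "y", "z"], "y"), (["x", "y", "z"], "y")],
    }
    for k in (1, 2, 3):
        print(k, average_across_categories(sample, k))
```

Averaging per category first weights every category equally regardless of how many trials it contains, which matches the "average across 15 categories" framing of the table.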
FAQ
Do I need access to the LLM? No — CORE is black-box and only changes retrieved content.
What is ProductBench? Our benchmark: 15 product categories, 200 products each, paired with top-10 recommendations from Amazon’s search interface.
Which content type should I think about first? The paper studies all three (string / reasoning / review); each is effective, and they target different parts of how the LLM forms its ranking.