CORE: Controlling Output Rankings in Generative Engines for LLM-based Search

This is the official project page for our paper, “Controlling Output Rankings in Generative Engines for LLM-based Search” (CORE). Read the paper on arXiv:2602.03608. Below we give the key definitions, the method, the headline results, and an FAQ.

What problem do we study?

LLM-based search systems and generative engines increasingly answer product queries by directly recommending a small set of options, rather than returning a traditional ranked list of web pages. A product may be retrieved by the search engine yet still fail to appear prominently in the LLM’s final recommendation. CORE studies how to improve visibility in this setting by controlling output rankings.

Key definitions

Generative engine / LLM-based search — a search system that synthesizes a direct answer or a short ranked recommendation set with an LLM, instead of returning a list of links.
Output ranking — the order in which items appear in the LLM’s generated recommendation.
Optimization content — text appended to the content a search engine retrieves, designed to influence that output ranking.
Promotion Success Rate (PSR) — how often a target item is promoted into the top-K of the generated ranking.
ProductBench — our evaluation benchmark of real product-recommendation queries (defined below).
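To make the PSR definition concrete, here is a minimal Python sketch. It assumes a trial counts as a success when the target item's position in the generated ranking is at most K; the paper's exact success criterion may be stricter. The function name, the input format, and the example ranks are hypothetical and are not taken from the paper's evaluation code.

```python
from typing import Optional, Sequence


def promotion_success_rate(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of trials in which the target item lands in the top-k.

    ranks: 1-based position of the target item in each generated ranking,
           or None when the target item does not appear at all.
    Assumption: "success" means rank <= k. This is one illustrative reading
    of the definition, not the paper's evaluation code.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    if not ranks:
        raise ValueError("ranks must be non-empty")
    successes = sum(1 for r in ranks if r is not None and r <= k)
    return successes / len(ranks)


# Made-up ranks from five hypothetical trials, for illustration only.
example_ranks = [1, 3, None, 2, 6]
print(promotion_success_rate(example_ranks, k=1))  # 0.2
print(promotion_success_rate(example_ranks, k=3))  # 0.6
print(promotion_success_rate(example_ranks, k=5))  # 0.6
```

Under this reading, PSR is non-decreasing in K, which matches the ordering of the Top-1, Top-3, and Top-5 numbers reported on this page.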

Our method: CORE

CORE (Controlling Output Rankings in generative Engines) treats the LLM and its search interactions as a black box and optimizes the content returned by the search engine — not the model. It appends strategically designed optimization content of three types:

String-based optimization content — compact textual additions.
Reasoning-based optimization content — comparative reasoning the LLM can use when ranking.
Review-based optimization content — review-style evidence.
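The black-box setup can be sketched as follows: only the retrieved text of the target item changes, while the query, the other results, and the model stay untouched. This is a minimal illustration under our own assumptions. `RetrievedItem`, `append_optimization_content`, and the placeholder text are hypothetical names, and the sketch does not show how CORE actually generates the string-, reasoning-, or review-based content.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class RetrievedItem:
    """One search result handed to the LLM (hypothetical structure)."""
    item_id: str
    content: str


def append_optimization_content(
    results: List[RetrievedItem], target_id: str, optimization_text: str
) -> List[RetrievedItem]:
    """Return a copy of results with optimization_text appended to the target.

    Only the target item's content is modified; every other result is
    returned unchanged and in the same order.
    """
    if not any(item.item_id == target_id for item in results):
        raise KeyError(f"target {target_id!r} not in retrieved results")
    return [
        replace(item, content=f"{item.content}\n{optimization_text}")
        if item.item_id == target_id
        else item
        for item in results
    ]


# Usage with placeholder data (not from ProductBench).
results = [
    RetrievedItem("A", "Product A description."),
    RetrievedItem("B", "Product B description."),
]
optimized = append_optimization_content(results, "B", "<optimization content>")
print(optimized[1].content)
```

The optimized result list would then be passed to the LLM in place of the original one; how that hand-off happens depends on the search system and is outside this sketch.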

Results at a glance

Item	Value
Avg Promotion Success Rate @Top-5	91.4%
Avg Promotion Success Rate @Top-3	86.6%
Avg Promotion Success Rate @Top-1	80.3%
LLMs evaluated	GPT-4o, Gemini-2.5, Claude-4, Grok-3
Benchmark	ProductBench — 15 categories × 200 products, Amazon top-10
Access required	None (black-box; optimizes retrieved content)

Averages are taken across the 15 product categories. CORE outperforms existing ranking-manipulation methods while preserving the fluency of the optimized content.

Benchmark: ProductBench

15 product categories
200 products per category
Each product is paired with its top-10 recommendations from Amazon’s search interface
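For readers who want to picture the benchmark's shape, the description above could map to records like the following. This is a sketch under our own assumptions: the field names and the `check_shape` helper are hypothetical and do not describe the released data format.

```python
from dataclasses import dataclass
from typing import Dict, List

NUM_CATEGORIES = 15          # from the benchmark description
PRODUCTS_PER_CATEGORY = 200  # from the benchmark description
TOP_N = 10                   # Amazon top-10 recommendations per product


@dataclass
class BenchEntry:
    """One benchmark entry (hypothetical field names)."""
    category: str
    product: str
    amazon_top10: List[str]


def check_shape(entries: List[BenchEntry]) -> None:
    """Raise ValueError if entries do not match the stated benchmark shape."""
    per_category: Dict[str, int] = {}
    for entry in entries:
        if len(entry.amazon_top10) != TOP_N:
            raise ValueError(f"{entry.product!r}: expected {TOP_N} recommendations")
        per_category[entry.category] = per_category.get(entry.category, 0) + 1
    if len(per_category) != NUM_CATEGORIES:
        raise ValueError(f"expected {NUM_CATEGORIES} categories, got {len(per_category)}")
    for category, count in per_category.items():
        if count != PRODUCTS_PER_CATEGORY:
            raise ValueError(f"{category!r}: expected {PRODUCTS_PER_CATEGORY} products, got {count}")


# Synthetic placeholder entries, used only to exercise the check.
synthetic = [
    BenchEntry(f"cat{c}", f"cat{c}-prod{p}", [f"rec{i}" for i in range(TOP_N)])
    for c in range(NUM_CATEGORIES)
    for p in range(PRODUCTS_PER_CATEGORY)
]
check_shape(synthetic)
print(len(synthetic))  # 3000
```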

FAQ

What is CORE? A black-box optimization method that controls where items land in the rankings generated by LLM-based search, by optimizing the content the search engine retrieves.

Does CORE need access to the model or search engine? No. It is black-box: it only changes retrieved content.

What does CORE optimize? Retrieved content, via three optimization-content types: string-based, reasoning-based, and review-based.

How is CORE evaluated? On ProductBench (15 categories × 200 products, with Amazon top-10 recommendations), measuring Promotion Success Rate.

Which LLMs were tested? GPT-4o, Gemini-2.5, Claude-4, and Grok-3.

Does optimization hurt readability? The paper reports that CORE preserves the fluency of the optimized content while outperforming prior ranking-manipulation methods.

Paper

Controlling Output Rankings in Generative Engines for LLM-based Search — arXiv:2602.03608.