opensearch-function-scoring-algorithms


Search relevance and ranking on OpenSearch/Elasticsearch for a two-sided marketplace — candidate retrieval (hybrid BM25 + kNN, RRF, two-tower EBR), base relevance (BM25F, multi_match, LambdaMART), quality signals (Wilson lower bound, Bayesian average, rank_feature saturation/sigmoid), personalization (listing/user/session embeddings), spatial/temporal decay (gauss/exp), marketplace balance (conversion-weighted ranking, supply fairness, Pareto multi-objective), bias correction (IPS, click models, Thompson sampling), empirical evaluation (judgment sets, NDCG, ablation, A/B sizing, CUPED, regression suites), and diversity (MMR, DPP, max-per-host). Triggers on function_score, rank_feature, script_score, kNN, hybrid query, learning-to-rank, two-sided ranking, exposure fairness, NDCG, A/B testing, judgment set construction, ranking ablation, or "why is my OpenSearch ranking bad". Applies to Elasticsearch too — same APIs.

By pproenca. Updated 5/27/2026.


Marketplace-Research OpenSearch Function Scoring Best Practices

A reference distillation of research-backed algorithms for ranking in two-sided marketplaces (Airbnb, Uber Eats, DoorDash, Etsy, eBay, Booking.com) implemented on OpenSearch or Elasticsearch. Contains 56 rules across 9 categories, prioritised by cascade effect in the search ranking pipeline. Each rule explains the WHY (the cascade or the bias it corrects), shows incorrect-vs-correct code (OpenSearch JSON queries, Painless scripts, Python pre-processing, evaluation methodology), and links to the canonical source — KDD/SIGIR/WSDM papers, the OpenSearch documentation, and the engineering blogs of the marketplaces that proved these patterns at scale.

When to Apply

Reach for this skill when:

  • Designing a new marketplace search system on OpenSearch or Elasticsearch from scratch
  • Tuning function_score / rank_feature / script_score queries that aren't moving the needle
  • Setting up hybrid retrieval (BM25 + dense vectors) with Reciprocal Rank Fusion
  • Choosing between HNSW and IVF for billion-scale ANN indexes
  • Adding personalization via listing/user embeddings or two-tower architectures
  • Correcting position bias in click logs before retraining an LTR model
  • Designing exposure-fairness or new-listing cold-start exposure allocation
  • Composing decay functions (gauss / exp / linear) over geo + date + freshness
  • Diversifying the top window with MMR, DPP, or per-host caps
  • Debugging "why does my top-10 show 8 listings from one host?" or "why does ranking favor popular incumbents?"
  • Building offline evaluation infrastructure — graded judgment sets, NDCG@k pipelines, ablation studies, regression query suites
  • Designing A/B tests for ranking changes — MDE / power / sample-size pre-computation, CUPED variance reduction, online-offline correlation calibration
  • Attributing lift to specific scoring components — "did my new bias-correction help, or was it the embeddings, or both?"
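
For the hybrid retrieval item in the list above, Reciprocal Rank Fusion can be sketched in a few lines of Python. This is a minimal client-side illustration, not the OpenSearch search-pipeline implementation. The function name `rrf_fuse`, the document IDs, and the constant `k=60` (a value commonly seen in RRF write-ups) are assumptions made for the example.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from a BM25 query and a kNN query.
bm25_hits = ["a", "b", "c"]
knn_hits = ["c", "a", "d"]
fused = rrf_fuse([bm25_hits, knn_hits])
print(fused)
```

Documents that rank well in both lists ("a" and "c" here) rise above documents that appear in only one, without any need to normalise BM25 and vector scores onto a common scale.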

The rules apply to any OpenSearch/Elasticsearch-backed marketplace search regardless of vertical — accommodation, food delivery, restaurants, services, jobs, secondhand goods, real estate. Triggers include "marketplace ranking", "search relevance", "function_score", "rank_feature", "script_score", "kNN", "hybrid search", "RRF", "learning to rank", "embedding-based retrieval", "two-tower", "position bias", "MMR", "supply fairness", "Pareto multi-objective", "NDCG", "judgment set", "ablation study", "CUPED", "A/B sample size", "ranking eval", and "why are my search results bad".

The Search Ranking Lifecycle

Categories are derived from the marketplace search ranking pipeline. Earlier stages cascade — a miss in recall (stage 1) cannot be repaired by any downstream boost, and a wrong base relevance multiplies through every functional score:

Query → [1] Recall → [2] Base Relevance → [3] Quality Signals → [4] Personalization
      → [5] Geo/Time Decay → [6] Marketplace Balance → [7] Diversity Re-rank → Results
                                                            ↑
                                          [8] Bias Correction (applied across all stages
                                                       and into training)
                                                            ↑
                                          [9] Evaluation & Measurement (the meta-layer:
                                                       judgment sets, NDCG, ablation, A/B
                                                       sizing, CUPED — without these you
                                                       can't tell if any rule helped)
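
The cascade can be illustrated with a toy scorer. This is a sketch under stated assumptions rather than how OpenSearch computes scores internally: it assumes the functional scores combine multiplicatively with the base score (the behaviour of `function_score` with `boost_mode: multiply`), and the listing IDs, scores, and the `rank` helper are hypothetical.

```python
def rank(recalled, base_score, factors):
    """Score only recalled candidates: base relevance times every factor."""
    results = {}
    for doc_id in recalled:
        score = base_score[doc_id]
        for factor in factors:
            score *= factor.get(doc_id, 1.0)  # a missing signal is neutral (1.0)
        results[doc_id] = score
    return sorted(results.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical listings; "best" is the ideal match, but stage 1 never recalled it.
recalled = ["x", "y"]
base_score = {"x": 2.0, "y": 0.5, "best": 9.0}
quality = {"x": 0.8, "y": 1.0, "best": 1.0}
decay = {"x": 0.5, "y": 1.0, "best": 1.0}

ranked = rank(recalled, base_score, [quality, decay])
print(ranked)
```

No downstream factor can bring "best" back because it never enters the loop, and any error in `base_score` is carried into every product.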

Rule Categories by Priority

| Priority | Category | Impact | Prefix | Rules |
|---|---|---|---|---|
| 1 | Candidate Retrieval & Recall | CRITICAL | recall- | 6 |
| 2 | Base Relevance & Field Scoring | CRITICAL | rel- | 7 |
| 3 | Quality Signals & Confidence Bounds | HIGH | qual- | 6 |
| 4 | Personalization & Embeddings | HIGH | pers- | 7 |
| 5 | Spatial & Temporal Decay | HIGH | decay- | 5 |
| 6 | Two-Sided Marketplace Balance | HIGH | market- | 7 |
| 7 | Bias Correction & Online Learning | HIGH | bias- | 6 |
| 8 | Evaluation & Measurement | HIGH | eval- | 7 |
| 9 | Diversity & Re-ranking | MEDIUM-HIGH | div- | 5 |

Quick Reference

1. Candidate Retrieval & Recall (CRITICAL)

2. Base Relevance & Field Scoring (CRITICAL)

3. Quality Signals & Confidence Bounds (HIGH)

4. Personalization & Embeddings (HIGH)

5. Spatial & Temporal Decay (HIGH)

6. Two-Sided Marketplace Balance (HIGH)

7. Bias Correction & Online Learning (HIGH)

8. Evaluation & Measurement (HIGH)

9. Diversity & Re-ranking (MEDIUM-HIGH)

How to Use

For a focused question ("which decay function for geo distance?"), jump directly to the relevant rule (decay-gauss-geo) — each rule is self-contained with the WHY, OpenSearch query/Painless code, and the canonical source citation.
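
As a taste of what such a rule covers, the sketch below reproduces the Gaussian decay curve described in the `function_score` documentation, where a document exactly `scale` beyond `offset` receives the multiplier `decay`. It is an illustration, not the content of the decay-gauss-geo rule itself. The field name `location`, the origin coordinates, and the 2 km scale are hypothetical.

```python
import math


def gauss_decay(distance, scale, offset=0.0, decay=0.5):
    """Multiplier in (0, 1] for a document `distance` away from the origin."""
    # Variance is chosen so that the curve equals `decay` at offset + scale.
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    effective = max(0.0, abs(distance) - offset)
    return math.exp(-(effective ** 2) / (2.0 * sigma_sq))


# Equivalent query fragment (hypothetical field name and origin).
geo_decay_function = {
    "gauss": {
        "location": {
            "origin": {"lat": 40.71, "lon": -74.0},
            "scale": "2km",
            "offset": "0km",
            "decay": 0.5,
        }
    }
}

for km in (0.0, 1.0, 2.0, 4.0):
    print(km, round(gauss_decay(km, scale=2.0), 4))
```

The curve is flat near the origin and falls off smoothly, so small differences in distance barely change the score for nearby listings.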

For a full ranking system review, work the categories top-to-bottom. The cascade ordering is real: get recall right first (no boost recovers a missed candidate), then base relevance (it's the multiplicand of every functional score), then quality / personalization / decay / marketplace balance / bias correction in that order. Diversity is the last re-rank step over a well-ordered top window.

For correcting bias before retraining, start with bias-position-ips and bias-click-models — applying IPS to position-confounded click data is the single highest-leverage change for any marketplace that retrains LTR models on logged clicks.
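
A minimal sketch of the inverse propensity scoring idea, assuming examination propensities per rank position have already been estimated (for example from a randomisation experiment or a click model). The propensity values, the log format, the `ips_weights` helper, and the clipping threshold are all hypothetical and are not taken from the bias-position-ips rule.

```python
def ips_weights(click_log, propensity, max_weight=10.0):
    """Return (doc_id, weight) training examples from clicked impressions.

    Each click is weighted by 1 / P(examined at its position), so clicks at
    rarely examined positions count for more. Weights are clipped at
    `max_weight` to bound variance.
    """
    examples = []
    for doc_id, position, clicked in click_log:
        if not clicked:
            continue
        weight = min(1.0 / propensity[position], max_weight)
        examples.append((doc_id, weight))
    return examples


# Hypothetical examination propensities per rank position (1-indexed).
propensity = {1: 1.0, 2: 0.5, 3: 0.25, 10: 0.05}
# Hypothetical log rows: (doc_id, position shown, clicked?).
click_log = [("a", 1, True), ("b", 2, False), ("c", 3, True), ("d", 10, True)]

weighted = ips_weights(click_log, propensity)
print(weighted)
```

The resulting weights would typically be passed as per-example sample weights when retraining the LTR model. Clipping trades a little bias for much lower variance on deep-position clicks.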

For testing multiple algorithms together and validating empirically, start with eval-graded-judgment-set (build the foundation), eval-ndcg-primary-metric (pick the metric), then eval-ablation-attribution (attribute lift to specific components). Pair with eval-online-offline-correlation to verify your offline metric predicts online behavior, eval-ab-sample-size-mde + eval-cuped-variance-reduction for disciplined A/B testing, and eval-regression-query-suite to catch silent quality drops on named queries.
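
To make the metric concrete, here is a small NDCG@k sketch. It assumes graded judgments on an integer scale, uses the common exponential gain 2^grade - 1, and computes the ideal ordering from the grades of the returned list only (so it assumes every judged document for the query was returned). The function names are illustrative and not taken from the eval-ndcg-primary-metric rule.

```python
import math


def dcg_at_k(grades, k):
    """Discounted cumulative gain with exponential gain 2^grade - 1."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades[:k]))


def ndcg_at_k(grades, k):
    """NDCG@k for one query; `grades` are judgment grades in ranked order."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0


# Hypothetical grades (0 = irrelevant, 3 = perfect) for two ranker variants.
baseline = [1, 3, 0, 2]
candidate = [3, 2, 1, 0]
print(ndcg_at_k(baseline, 4), ndcg_at_k(candidate, 4))
```

For an ablation, the same function would be averaged over every query in the judgment set for each variant, with one scoring component switched off at a time.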

For research-citing a design decision, every rule ends with the canonical reference — KDD/SIGIR/WSDM papers, the relevant engineering blog (Airbnb, Pinterest, DoorDash, Etsy, Just Eat Takeaway, Thumbtack), or the OpenSearch documentation page.

Read the section definitions (references/_sections.md) for the cascade-impact rationale behind the category ordering, or the rule template (assets/templates/_template.md) when adding a new rule.

Reference Files

| File | Description |
|---|---|
| references/_sections.md | Category definitions and ordering by cascade impact |
| AGENTS.md | Compact TOC navigation (auto-built; do not edit by hand) |
| assets/templates/_template.md | Template for authoring new rules |
| metadata.json | Version and authoritative reference URLs |

Install via CLI

npx skills add https://github.com/pproenca/dot-skills --skill opensearch-function-scoring-algorithms