name: vs-search-tuning-partial-case description: "Use when the user provides 1-50 concrete bad-case search queries for one Viking Search app and wants local deterministic fixes. This skill only verifies request-level fine-operation interventions against a read-only baseline scene and delivers a console-ready configuration sheet, validated payloads, and a replay script. It must not mutate scenes, apps, dictionaries, datasets, recall core parameters, or online defaults."
Viking Search Partial Case Tuning
1. Scope
Use this skill when the user provides a small number of concrete bad-case queries, usually 1-50, and asks to "optimize the search results for these queries."
This skill only performs local, highly deterministic, fine-grained operational fixes:
- It does not replace global relevance or recall-parameter tuning.
- It does not change online configuration; it only validates candidates through request-level payloads.
- The final deliverables are a fine-grained operations configuration sheet that can be copied into the console, validated payloads, and a replay script.
2. Hard Boundaries
- Do not change core recall parameters:
user_defined_recall_mode,dense_weight,text_weight,query_keyword_match_percent, ormax_retrieved_num. - The baseline scene is read-only; do not create, modify, derive, publish, or switch scenes.
- Do not write app config, dataset config, dictionaries, recommend scenes; do not bind anything, and do not perform any persistent writes.
- Validation may only take effect through a single-request payload passed to
vs search run --data; no side effects may remain after the request finishes. - Do not rewrite the user's real query. Query-rewrite ideas may only be used as candidate evidence; they must not be written back to the original query or used as online trigger terms that replace the user's query.
- Do not use conclusions from local cases to judge the overall quality of the recall strategy.
- Do not force a search-mode change. Evaluate the currently locked baseline scene as-is; if the scene itself is not UserDefined, do not change it for this skill.
- In the apply phase, only deliver the configuration sheet and evidence; actual launch must be performed manually in the Viking console based on the configuration sheet.
3. Naming Conventions
- File names and report references should use "query + strategy name"; do not use internal aliases such as
C1,case-3, orcand-1. - File-name slug: keep the first few words of the query, convert spaces to hyphens, and remove symbols that are unsuitable for file names.
- When referring to a plan in the report, use readable names such as
filter item scope(category=Light Sports,product_type=SHOE).
4. Known CLI Behaviors and Pitfalls
These execution constraints must be internalized. Read this section before writing scripts.
4.1 Command Names Must Follow the Actual CLI
Common read-only commands:
vs auth status --json
vs llm status --json
vs app status --application-id <app_id> --json
vs app get --id <app_id> --json
vs app online-config get --application-id <app_id> --json
vs app dataset-config get --application-id <app_id> --dataset-id <dataset_id> --json
vs search scene list --application-id <app_id> --json
vs search scene get --application-id <app_id> --scene-id <scene_id> --json
Notes:
vs app getuses--id, not--application-id.vs app dataset-config getmust include--dataset-id.- The current CLI may not have
vs search dictionary list; if it returnsUnknown search subcommand: dictionary, record this as a missing capability and do not block the main flow. vs search run --helpmay not show service flags such as--json,--output, or--data; do not conclude that a flag is unsupported only because help omits it. You must verify support with probes.
4.2 --data Is a Full Request Override
vs search run --data does not "append fields on top of --query / --page-size"; it overrides the request body.
Incorrect example:
vs search run ... --query "white shoes for dresses" --page-size 10 --data '{}'
This causes the request body to omit query.text and pagination parameters.
A correct candidate payload must explicitly include:
{
"query": { "text": "user's original query" },
"page_size": 10,
"page_number": 1,
"filter": {
"op": "must",
"field": "reverse",
"conds": ["Footwear"]
}
}
The CLI usually fills in dataset_id, but for auditability, the report must still record the actual application-id, dataset-id, scene-id, and page-size.
4.3 Request-Level Filter DSL
Prioritize validating the following DSL. Do not use unverified eq, string expressions, or {field:value} maps.
Single field:
{
"op": "must",
"field": "category",
"conds": ["Soccer"]
}
Multi-field AND:
{
"op": "and",
"conds": [
{ "op": "must", "field": "category", "conds": ["Soccer"] },
{ "op": "must", "field": "product_type", "conds": ["SHOE"] }
]
}
Multiple conds on the same field means OR; multiple must clauses placed in and.conds mean AND.
When validating product_id filters, the common ID field in results is _id; when calculating field_match@10, map product_id to result _id to avoid incorrectly scoring it as 0.
4.4 Authentication Pitfalls in Batch Scripts
Running vs search run ... directly may read secure-store successfully, while spawning vs from a Node/Bash batch script in a restricted sandbox may fail with:
Missing Viking auth. Run `vs auth import-env` ...
This does not necessarily mean the user is not logged in. Handle it as follows:
- First verify auth with the direct command
vs auth status --json. - If the direct command works but the batch script fails, request external execution / real terminal permission for the current environment and rerun the batch script.
- Do not attribute this error to the app or query.
4.5 Query File Format
vs search tune validate accepts text or query.text; do not use an arbitrary field name query.
Recommended JSONL:
{"id":"white-shoes-for-dresses","text":"white shoes for dresses"}
{"id":"TF-soccer-shoes","text":"TF soccer shoes"}
5. Fine-Grained Operations Action Library
By default, the trigger condition should be "search term equals the user's original query" for exact matching and low spillover. Only use broader conditions such as contains, search-term type, or search-term length when the business explicitly wants to cover a class of queries.
Candidates must first pass capability confirmation. Fields must come from the dataset config's filterFields or from verified request-level capabilities; do not invent fields.
| Failure Mode | Preferred Action | Console Configuration | Request Validation Field | Notes |
|---|---|---|---|---|
| Wrong category / wrong top-level class recalled | Filter item scope | Search term equals X; field category/reverse/product_type/gender equals Y | filter |
Prefer low-risk structured fields |
| Audience mismatch | Filter item scope | gender equals Women/Men/Boys/Girls | filter |
Suitable for kids', women's, and men's queries |
| Product attribute has no structured field | Product ID whitelist, or guaranteed recall after business confirmation | product_id contains ID list | Approximate validation via filter.product_id |
Medium risk; requires maintaining a product pool; mark as requiring business confirmation in the report |
| Target is not recalled and CLI supports guaranteed recall | Guaranteed recall | Product ID contains item_id | guaranteed_recall |
Generate only when cli-capabilities proves support |
| Already recalled but ranked too low | Boost / bury | Product ID weight | boost_rules / bury_rules |
Generate only when a request-level entry point is proven to exist |
| Need sorting by field | Field sorting | Field online_time/sales descending | rank_rules |
Field must exist and be sortable |
| Out of stock / prohibited / competitor items | Strong filter + empty-result protection | stock > 0 and sellable = true | filter |
stock/sellable must be filterFields; otherwise do not generate |
| Same brand / series dominates the page | Diversity | Per-page upper limit N for brand | diversity |
Generate only when a request-level entry point is proven to exist |
If an action-library item has no CLI request-level entry point, write "Not evaluated: the CLI does not expose an equivalent request entry point" in the report; do not generate fake payloads.
6. Standard Flow
Step 1. Preflight Checks
Minimum requirements:
vs auth status --json
vs app status --application-id <app_id> --json
vs search scene list --application-id <app_id> --json
Confirm that the app is ready, that there is a unique or specified dataset-id, and that an available search scene exists.
Step 2. Lock the Baseline Scene
Priority:
- A scene explicitly specified by the user.
- The scene corresponding to a known best global tuning result that has already been applied.
- The current default scene.
After locking it, treat it as read-only. Record the scene id, name, whether it is the default, and a summary of the main SearchConfig.
Step 3. Research Capabilities and Save Them
First inspect help:
vs search run --help
vs search tune run --help
vs search tune plan --help
vs search tune validate --help
Then collect read-only information:
vs auth status --json
vs llm status --json
vs app status --application-id <app_id> --json
vs app get --id <app_id> --json
vs app online-config get --application-id <app_id> --json
vs app dataset-config get --application-id <app_id> --dataset-id <dataset_id> --json
vs search scene get --application-id <app_id> --scene-id <baseline_scene_id> --json
You must confirm capabilities with probes:
vs search run --application-id <app_id> --scene-id <scene_id> --dataset-id <dataset_id> \
--query "<q>" --page-size 2 --json --output <session>/probe-output.json
vs search run --application-id <app_id> --scene-id <scene_id> --dataset-id <dataset_id> \
--query "<q>" --page-size 2 \
--data '{"query":{"text":"<q>"},"page_size":2,"page_number":1}' \
--json --output <session>/probe-data.json
vs search run --application-id <app_id> --scene-id <scene_id> --dataset-id <dataset_id> \
--query "<q>" --page-size 2 \
--data '{"query":{"text":"<q>"},"page_size":2,"page_number":1,"filter":{"op":"must","field":"reverse","conds":["Footwear"]}}' \
--json --output <session>/probe-filter.json
Generate <session>/cli-capabilities.json, containing at least:
- application-id, dataset-id, and baseline scene-id.
search_run_output,search_run_raw_data, andrequest_filter.request_filter_dsl.- Dataset
filterFields,indexFields, andimageIndexFields. - Whether
tune_run_scene_idandtune_run_interventions_fileappear in help. - Unsupported capabilities and error messages, for example dictionary list does not exist.
effective_evaluation_mode: defaultsearch-topn-strategy-effect.
Only consider llm-candidate-matrix when tune run clearly supports scene-id and equivalent pass-through of candidate payloads; otherwise do not use an LLM matrix.
Step 4. Run Baseline and Probes
Run Top10 for each real user query:
vs search run --application-id <app_id> --scene-id <scene_id> --dataset-id <dataset_id> \
--query "<q>" --page-size 10 --json --output <session>/baselines/baseline-<q-slug>.json
Automatically attribute failure based on Top10:
- Wrong top-level class: reverse/category/product_type are clearly mismatched.
- Wrong audience: gender mismatch, for example children/adult items are mixed in.
- Missing attribute: the query requires attributes such as "wide feet / heavy weight / flat feet / rainy days / does not rub the foot," but
filterFieldshas no corresponding field. - Stock / competitor issues: continue only when relevant
filterFieldsexist.
Run a small number of filter probes for suspected field strategies to prove the fields actually take effect.
Step 5. Generate Candidates
By default, generate at most 5 candidates per query. Do not generate invalid or unsupported actions just to fill the count.
Candidate generation priority:
- Low-risk structured filters:
reverse,category,product_type, andgender. - Multi-field combined filters.
- For semantic attributes that have no structured field, use a
product_idwhitelist for approximate validation, and mark it as medium risk requiring business confirmation of the product pool. - Generate guaranteed recall, boost, bury, sorting, diversity, and similar candidates only when capability checks prove support.
Each candidate must include:
strategy_name: a readable strategy name.console_config: trigger condition + rule action + field/value.request_payload: a full payload includingquery.text,page_size, andpage_number.riskand the reason for the risk.
Step 6. Validate Candidates
Baseline and candidate runs must use the same application-id / dataset-id / scene-id / page-size. The only effective difference for the candidate is the --data payload.
Candidate command:
vs search run --application-id <app_id> --scene-id <scene_id> --dataset-id <dataset_id> \
--query "<q>" --page-size 10 \
--data '<full_request_payload_json>' \
--json --output <session>/candidates/candidate-<q-slug>-<strategy-slug>.json
For each query x candidate, output:
strategy_nameconsole_configrequest_payloadbaseline_top10candidate_top10target_hit@10bad_hit@10first_target_rankfield_match@10result_count_deltadecision_reason
Scores are only auxiliary. If the baseline is already perfect, do not force a no-benefit rule recommendation; mark it as "optional guardrail: baseline already satisfies the requirement" or "not recommended for launch."
Step 7. Report and Deliverables
The report's primary content is the fine-grained operations configuration sheet for the console:
| Search Term (Trigger Condition) | Rule Action | Field / Parameter | Value | Risk Level | Recommended for Launch |
|---|---|---|---|---|---|
| Equals X | Filter item scope | category equals AND product_type equals | Light Sports; SHOE | Low | Yes |
| Equals Y | Filter item scope | product_id contains | 123,456 | Medium | Launch after business confirms product pool |
The report must include:
- Console configuration sheet: directly copyable.
- Before/after evidence: Top10 titles, product IDs, reverse/category/gender/product_type, and metric changes.
- Validated payload appendix: the raw request payload for each selected rule.
- Out-of-stock / competitor-specific section: if there are no relevant fields or relevant queries, explain why no rule was generated.
- Spillover and maintenance risk: exact-query triggers have low spillover; product_id whitelists require maintenance; campaign rules need time windows and expiration cleanup.
Also deliver:
cli-capabilities.jsoncandidate-evaluation.jsonselected-strategies.jsonreplay.sh
replay.sh should rerun baseline, candidate, analysis, and report generation; do not use --resume-run-id.
7. Default Execution Summary
Lock a read-only baseline scene -> collect app/dataset/scene/capabilities -> use probes to confirm --output, --data, and filter DSL -> save real queries and validate them -> run baseline Top10 -> infer failure modes -> generate only fine-grained operations candidates supported by capabilities -> validate before/after with full --data payloads -> select the best options and label risks -> output the console configuration sheet, payloads, evidence, and replay script.