skill-trigger-evaluator

Package: companion


Purpose

Evaluate whether a skill's description and trigger phrasing are specific enough to activate on the right requests and avoid unrelated ones.

Trigger phrases

  • evaluate triggers
  • test skill trigger
  • trigger accuracy
  • false positive rate
  • is my skill triggering correctly

When to use

Use this skill when the user wants to test whether a skill fires on the right prompts, compare positive and negative query sets, or diagnose overlap with sibling skills.

Typical targets:

  • a newly authored skill before publishing
  • a skill with vague or noisy trigger phrasing
  • a pack with multiple related skills that may overlap
  • a skill that passes lint but still matches the wrong requests

Core workflow

  1. Load the target skill's SKILL.md and extract the description plus any explicit trigger phrases.
  2. Load a test file or generate a basic test set with:
       • positive queries that should trigger the skill
       • negative queries that should not trigger the skill
  3. For each query, compute similarity to the skill description using keyword matching.
  4. Calculate aggregate metrics:
       • trigger_pass_rate
       • false_positive_rate
       • false_negative_rate
  5. Output an evaluation report with per-query pass/fail results and aggregate metrics.
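
The scoring and aggregation steps above can be sketched roughly as follows. This is a minimal illustration, not the actual logic of evaluate_triggers.py: the tokenizer, the stopword list, the `match_cutoff` value, and the exact metric definitions are all assumptions made for the example.

```python
import re

# Illustrative stopword list (assumption); the real script's tokenization is not documented here.
STOPWORDS = {"a", "an", "the", "to", "of", "and", "or", "is", "on", "for", "my", "in"}


def keywords(text):
    """Return the set of lowercase word tokens, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def overlap_score(query, description):
    """Fraction of the query's keywords that also occur in the description."""
    q = keywords(query)
    if not q:
        return 0.0
    return len(q & keywords(description)) / len(q)


def evaluate(description, positives, negatives, match_cutoff=0.5):
    """Compute aggregate metrics; a query 'triggers' when its score >= match_cutoff."""
    def triggers(query):
        return overlap_score(query, description) >= match_cutoff

    passed_pos = sum(triggers(q) for q in positives)  # positives that fired
    failed_neg = sum(triggers(q) for q in negatives)  # negatives that fired anyway
    pass_rate = passed_pos / len(positives) if positives else 0.0
    return {
        "trigger_pass_rate": pass_rate,
        "false_negative_rate": (1.0 - pass_rate) if positives else 0.0,
        "false_positive_rate": failed_neg / len(negatives) if negatives else 0.0,
    }


description = "Evaluate whether a skill's description and trigger phrasing are specific enough"
positives = ["evaluate trigger phrasing", "test skill trigger", "is my skill triggering correctly"]
negatives = ["deploy the database", "format skill files"]
metrics = evaluate(description, positives, negatives)
```

In this sketch the third positive query fails because exact keyword matching does not treat "triggering" as a match for "trigger". That kind of miss is what a false negative looks like under a pure keyword heuristic.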

Run the evaluator script instead of improvising manual checks:

uv run .opencode/skills/skill-trigger-evaluator/scripts/evaluate_triggers.py --path <skill-directory>

Useful options:

  • --test-file <file> to supply curated test queries
  • --siblings <skills-dir> to detect sibling overlap
  • --json for machine-readable output
  • --verbose to show per-query decisions
  • --threshold 0.80 to set the minimum acceptable positive pass rate
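
The --siblings option is described only as detecting sibling overlap. One plausible way to do that, shown here purely as an assumption, is to compare the keyword sets of each pair of skill descriptions with Jaccard similarity. The `conflict_cutoff` parameter, the helper names, and the two extra sibling skills are hypothetical.

```python
import re
from itertools import combinations


def keywords(text):
    """Return the set of lowercase word tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two keyword sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_conflicts(descriptions, conflict_cutoff=0.5):
    """Return (skill_a, skill_b, score) for each pair at or above the cutoff."""
    conflicts = []
    pairs = combinations(sorted(descriptions.items()), 2)
    for (name_a, desc_a), (name_b, desc_b) in pairs:
        score = jaccard(keywords(desc_a), keywords(desc_b))
        if score >= conflict_cutoff:
            conflicts.append((name_a, name_b, round(score, 2)))
    return conflicts


# Hypothetical sibling skills; only the first name comes from this document.
descriptions = {
    "skill-trigger-evaluator": "evaluate skill trigger accuracy",
    "skill-trigger-tester": "test skill trigger accuracy",
    "changelog-writer": "write release changelog entries",
}
conflicts = find_conflicts(descriptions)
```

Here the two trigger-related skills share three of five distinct keywords, so they are flagged as a pair, while the changelog skill shares none and is not.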

Output rules

Always report:

  • skill name
  • total positive and negative counts
  • passed positive count
  • failed negative count
  • trigger_pass_rate
  • false_positive_rate
  • false_negative_rate
  • any cross_trigger_conflicts
  • final verdict
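
The fields above could be assembled into a report along these lines. This is a sketch under stated assumptions: apart from trigger_pass_rate, false_positive_rate, false_negative_rate, and cross_trigger_conflicts, the key names are hypothetical. The rule that the verdict depends only on the pass rate meeting the threshold is also an assumption, not a documented behavior of the script.

```python
import json


def build_report(skill_name, positive_results, negative_results, conflicts, threshold=0.80):
    """Build a summary dict; each *_results list holds booleans: did the query trigger?"""
    total_pos, total_neg = len(positive_results), len(negative_results)
    passed_pos = sum(positive_results)  # positives that triggered, as expected
    failed_neg = sum(negative_results)  # negatives that triggered, unexpectedly
    pass_rate = passed_pos / total_pos if total_pos else 0.0
    return {
        "skill": skill_name,
        "total_positive": total_pos,
        "total_negative": total_neg,
        "passed_positive": passed_pos,
        "failed_negative": failed_neg,
        "trigger_pass_rate": round(pass_rate, 2),
        "false_positive_rate": round(failed_neg / total_neg, 2) if total_neg else 0.0,
        "false_negative_rate": round(1 - pass_rate, 2) if total_pos else 0.0,
        "cross_trigger_conflicts": conflicts,
        # Assumed verdict rule: pass rate must meet the threshold.
        "verdict": "PASS" if pass_rate >= threshold else "FAIL",
    }


report = build_report(
    "skill-trigger-evaluator",
    positive_results=[True, True, True, True, False],
    negative_results=[False, False, False, True],
    conflicts=[],
)
print(json.dumps(report, indent=2))
```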

If verbose output is requested, include each query with:

  • expected behavior
  • actual behavior
  • overlap score
  • matched keywords
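
A per-query verbose entry carrying these four items might look like the sketch below. The record layout, the `match_cutoff` value, and the helper names are assumptions for illustration and may not match what the script actually prints.

```python
import re


def keywords(text):
    """Return the set of lowercase word tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def query_record(query, description, should_trigger, match_cutoff=0.5):
    """Describe one query: expected vs. actual behavior, score, and matched keywords."""
    q = keywords(query)
    matched = sorted(q & keywords(description))
    score = len(matched) / len(q) if q else 0.0
    triggered = score >= match_cutoff
    return {
        "query": query,
        "expected": "trigger" if should_trigger else "no trigger",
        "actual": "trigger" if triggered else "no trigger",
        "overlap_score": round(score, 2),
        "matched_keywords": matched,
        "passed": triggered == should_trigger,
    }


record = query_record(
    "test skill trigger",
    "evaluate whether a skill description and trigger phrasing are specific",
    should_trigger=True,
)
```

Listing the matched keywords next to the score makes it easier to see why a query fired, which helps when a negative query triggers on a single generic word.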

Must do

  • Use both positive and negative query sets.
  • Keep the heuristic explicit: keyword overlap against the skill description.
  • Report sibling conflicts when --siblings is provided.

... (13 more lines in the full SKILL.md)