skill-trigger-evaluator¶
Pack: companion
Purpose¶
Evaluate whether a skill's description and trigger phrasing are specific enough to activate on the right requests and avoid unrelated ones.
Trigger phrases¶
- evaluate triggers
- test skill trigger
- trigger accuracy
- false positive rate
- is my skill triggering correctly
When to use¶
Use this skill when the user wants to test whether a skill fires on the right prompts, compare positive and negative query sets, or diagnose overlap with sibling skills.
Typical targets:
- a newly authored skill before publishing
- a skill with vague or noisy trigger phrasing
- a pack with multiple related skills that may overlap
- a skill that passes lint but still matches the wrong requests
Core workflow¶
- Load the target skill's `SKILL.md` and extract the description plus any explicit trigger phrases.
- Load a test file or generate a basic test set with:
  - positive queries that should trigger the skill
  - negative queries that should not trigger the skill
- For each query, compute similarity to the skill description using keyword matching.
- Calculate aggregate metrics:
  - `trigger_pass_rate`
  - `false_positive_rate`
  - `false_negative_rate`
- Output an evaluation report with per-query pass/fail results and aggregate metrics.
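The workflow above can be sketched in a few lines of Python. This is only an illustration of keyword-overlap scoring, not the contents of `evaluate_triggers.py`: the stopword list, the `MATCH_CUTOFF` value, and the function names `keywords`, `overlap_score`, and `evaluate` are all assumptions made for the example.

```python
import re

# Assumed stopword list and per-query cutoff; the real script may use others.
STOPWORDS = {"a", "an", "the", "to", "of", "and", "or", "is", "my", "on", "for", "in", "with"}
MATCH_CUTOFF = 0.3  # hypothetical: minimum overlap for a query to count as triggering


def keywords(text):
    """Lowercase the text and return its non-stopword tokens as a set."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def overlap_score(query, description):
    """Fraction of the query's keywords that also appear in the description."""
    q = keywords(query)
    if not q:
        return 0.0
    return len(q & keywords(description)) / len(q)


def evaluate(description, positives, negatives, cutoff=MATCH_CUTOFF):
    """Return the three aggregate rates for one skill description."""
    hits = [overlap_score(q, description) >= cutoff for q in positives]
    false_hits = [overlap_score(q, description) >= cutoff for q in negatives]
    pass_rate = sum(hits) / len(positives) if positives else 0.0
    fp_rate = sum(false_hits) / len(negatives) if negatives else 0.0
    return {
        "trigger_pass_rate": pass_rate,
        "false_positive_rate": fp_rate,
        # Positives that failed to trigger; defined as 0.0 when there are none.
        "false_negative_rate": 1.0 - pass_rate if positives else 0.0,
    }
```

Scoring against the query's own keywords, rather than the description's, keeps short queries from being penalized by a long description. Other normalizations such as Jaccard similarity would be equally reasonable.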
Recommended execution¶
Run the evaluator script instead of improvising manual checks:
uv run .opencode/skills/skill-trigger-evaluator/scripts/evaluate_triggers.py --path <skill-directory>
Useful options:
- `--test-file <file>` to supply curated test queries
- `--siblings <skills-dir>` to detect sibling overlap
- `--json` for machine-readable output
- `--verbose` to show per-query decisions
- `--threshold 0.80` to set the minimum acceptable positive pass rate
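The `--siblings` option is described only as detecting sibling overlap. One way such a check could work is sketched below, assuming a conflict means that a sibling's description scores at least as high as the target's for a query the target is supposed to own. The function names and the conflict rule are hypothetical, not taken from the script.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens of the text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query, description):
    """Share of query tokens found in the description (0.0 for an empty query)."""
    q = tokens(query)
    return len(q & tokens(description)) / len(q) if q else 0.0


def cross_trigger_conflicts(target, siblings, positives):
    """List (query, sibling_name) pairs where a sibling competes with the target.

    `target` is the target skill's description; `siblings` maps a hypothetical
    sibling skill name to its description.
    """
    conflicts = []
    for query in positives:
        target_score = score(query, target)
        for name, description in sorted(siblings.items()):
            sibling_score = score(query, description)
            # Assumed rule: flag a sibling that matches at all and ties or beats the target.
            if sibling_score > 0 and sibling_score >= target_score:
                conflicts.append((query, name))
    return conflicts
```

A conflict found this way suggests that either the query belongs in the sibling's test set or the two descriptions need more distinctive wording.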
Output rules¶
Always report:
- skill name
- total positive and negative counts
- passed positive count
- failed negative count
- `trigger_pass_rate`
- `false_positive_rate`
- `false_negative_rate`
- any `cross_trigger_conflicts`
- final `verdict`
If verbose output is requested, include each query with:
- expected behavior
- actual behavior
- overlap score
- matched keywords
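As a rough picture of a report that satisfies these rules, the sketch below assembles the required fields from per-query results. Only `trigger_pass_rate`, `false_positive_rate`, `false_negative_rate`, `cross_trigger_conflicts`, and `verdict` are names from this document. The remaining keys, the `QueryResult` and `Report` classes, and the rule that the verdict passes when the positive pass rate meets the threshold are assumptions for illustration.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class QueryResult:
    query: str
    expected: bool          # should the skill trigger on this query?
    actual: bool            # did it trigger?
    overlap_score: float
    matched_keywords: list


@dataclass
class Report:
    skill: str
    results: list
    threshold: float = 0.80
    cross_trigger_conflicts: list = field(default_factory=list)

    def summary(self, verbose=False):
        """Build the report dictionary; per-query rows are added only if verbose."""
        pos = [r for r in self.results if r.expected]
        neg = [r for r in self.results if not r.expected]
        passed_pos = sum(r.actual for r in pos)
        failed_neg = sum(r.actual for r in neg)  # negatives that wrongly triggered
        pass_rate = passed_pos / len(pos) if pos else 0.0
        out = {
            "skill": self.skill,
            "positive_count": len(pos),
            "negative_count": len(neg),
            "passed_positive": passed_pos,
            "failed_negative": failed_neg,
            "trigger_pass_rate": pass_rate,
            "false_positive_rate": failed_neg / len(neg) if neg else 0.0,
            "false_negative_rate": (len(pos) - passed_pos) / len(pos) if pos else 0.0,
            "cross_trigger_conflicts": self.cross_trigger_conflicts,
            # Assumed rule: pass when the positive pass rate meets the threshold.
            "verdict": "pass" if pass_rate >= self.threshold else "fail",
        }
        if verbose:
            out["queries"] = [asdict(r) for r in self.results]
        return out
```

Calling `json.dumps(report.summary(verbose=True), indent=2)` on a `Report` instance would then give a machine-readable form comparable to what `--json` is described as producing.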
Must do¶
- Use both positive and negative query sets.
- Keep the heuristic explicit: keyword overlap against the skill description.
- Report sibling conflicts when `--siblings` is provided.
... (13 more lines in full SKILL.md)