name: voice-clone-lab
description: "Create and register cloned voices for later TTS only when the speaker has explicit consent. Use when the user asks for voice clone, clone voice, 克隆音色, 复刻声音, or wants a reusable voice_id."
triggers:
- "voice clone"
- "clone voice"
- "克隆音色"
- "复刻声音"
- "声音克隆"
provenance:
  origin: opensquilla-original
  license: Apache-2.0
  maintained_by: OpenSquilla
metadata:
  opensquilla:
    risk: high
    capabilities: [network-read, filesystem-write]
    requires_tools:
      - voice_clone
      - audio_provider_capabilities
voice-clone-lab
Creates a reusable provider voice from a local sample. OpenRouter may help
summarize the request or produce labels, but cloning must use the direct
audio provider through voice_clone.
Request triage
Before calling tools, extract these fields from the user request:
- sample path and whether the file is local, intentional, and user-provided
- speaker identity class: self, employee/team member, private person, public figure, fictional character, or unknown
- consent metadata: speaker, consent, sample source, permitted use, requested by, retention expectation, and whether commercial use is allowed
- target use: TTS narration, IVR, dubbing, training content, or internal demo
- target language, target locale, and desired locale-appropriate accent
OpenRouter can summarize consent text or label a voice, but it is not an audio provider and cannot replace explicit consent.
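The triage fields above can be held in one small record so that gaps are visible before any tool call. This is a minimal sketch: the `CloneRequest` class, its field names, and `open_questions` are hypothetical and are not part of any provider schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Identity classes taken from the triage list above.
IDENTITY_CLASSES = (
    "self",
    "employee/team member",
    "private person",
    "public figure",
    "fictional character",
    "unknown",
)


@dataclass
class CloneRequest:
    """Hypothetical triage record; field names are illustrative only."""

    sample_path: str
    sample_is_local: bool = False
    sample_is_user_provided: bool = False
    identity_class: str = "unknown"
    consent_metadata: Dict[str, object] = field(default_factory=dict)
    target_use: Optional[str] = None
    target_language: Optional[str] = None
    target_locale: Optional[str] = None

    def open_questions(self) -> List[str]:
        """List the triage fields that still need an answer from the user."""
        gaps = []
        if not (self.sample_is_local and self.sample_is_user_provided):
            gaps.append("sample_path")
        if self.identity_class not in IDENTITY_CLASSES or self.identity_class == "unknown":
            gaps.append("identity_class")
        if not self.consent_metadata:
            gaps.append("consent_metadata")
        for name in ("target_use", "target_language", "target_locale"):
            if not getattr(self, name):
                gaps.append(name)
        return gaps
```

An empty list from `open_questions()` only means every field has a value. It does not mean the consent itself is sufficient.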
Consent-first workflow
- Confirm the sample audio path is local and intentionally provided.
- Require consent_metadata before calling voice_clone.
- Include at minimum:
  - speaker
  - consent: true
  - sample_source
  - permitted_use
  - requested_by
- Reject or stop when consent is missing, vague, or contradicted by the request.
- Call audio_provider_capabilities if cloning availability is uncertain.
- Call voice_clone with the sample, name, description, and consent metadata.
- Return the created voice ID and the allowed usage summary.
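The consent gate in this workflow can be sketched as a check that runs before the tool is ever invoked. The required keys come from the list above. Everything else is an assumption: the `voice_clone` argument is any callable standing in for the real tool, and its keyword names (`sample`, `name`, `description`, `consent_metadata`) are illustrative, not a documented signature.

```python
REQUIRED_CONSENT_KEYS = (
    "speaker",
    "consent",
    "sample_source",
    "permitted_use",
    "requested_by",
)


def missing_consent_fields(consent_metadata):
    """Return required keys that are absent, blank, or (for consent) not True."""
    metadata = consent_metadata or {}
    missing = []
    for key in REQUIRED_CONSENT_KEYS:
        value = metadata.get(key)
        if key == "consent":
            # Only an explicit boolean True counts; "yes" or 1 does not.
            if value is not True:
                missing.append(key)
        elif not isinstance(value, str) or not value.strip():
            missing.append(key)
    return missing


def clone_with_consent(voice_clone, sample_path, name, description, consent_metadata):
    """Call the (assumed) voice_clone callable only when consent is complete."""
    missing = missing_consent_fields(consent_metadata)
    if missing:
        # Stop here: the tool is never called without full consent metadata.
        return {"status": "consent_required", "missing": missing}
    return voice_clone(
        sample=sample_path,
        name=name,
        description=description,
        consent_metadata=consent_metadata,
    )
```

A key-presence check like this cannot detect consent that is vague or contradicted by the request, so that judgment still has to happen before the call.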
Tool-result handling
- If voice_clone returns status=ok, return the voice ID first, then the consent summary, intended locale/accent, and any sample-quality warning.
- If it returns consent_required, do not proceed with a workaround. Ask for the missing consent metadata in one concise question.
- If the provider returns not_available, quote the note and distinguish disabled provider, key/quota limits, feature gating, and sample format issues.
- Never suggest scraping, downloading, or extracting third-party voice samples as a fallback.
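The branching above might look like the following sketch. It assumes the tool result is a dict with a `status` key. The other keys (`voice_id`, `consent_summary`, `locale`, `warning`, `missing`, `note`) are guesses made for illustration and may not match the real result shape.

```python
def handle_clone_result(result):
    """Turn an assumed voice_clone result dict into the reply described above."""
    status = result.get("status")
    if status == "ok":
        # Voice ID first, then consent summary, locale/accent, and any warning.
        lines = ["voice ID: {}".format(result.get("voice_id", ""))]
        for label, key in (
            ("consent summary", "consent_summary"),
            ("locale/accent", "locale"),
            ("sample-quality warning", "warning"),
        ):
            if result.get(key):
                lines.append("{}: {}".format(label, result[key]))
        return "\n".join(lines)
    if status == "consent_required":
        # One concise question, no workaround.
        missing = ", ".join(result.get("missing", [])) or "consent metadata"
        return "Which of these can you confirm before cloning: {}?".format(missing)
    if status == "not_available":
        # Quote the provider note verbatim, then list the causes to tell apart.
        template = (
            'Provider note: "{}". Possible causes: disabled provider, '
            "key/quota limits, feature gating, or sample format issues."
        )
        return template.format(result.get("note", ""))
    return "Unexpected voice_clone status: {!r}".format(status)
```

None of the branches offers a fallback that sources audio from elsewhere, which mirrors the last rule in the list.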
Rights and copyright guard
- Authorization is mandatory. The speaker must own or control the voice sample and agree to cloning for this use.
- Copyright: do not use copyrighted recordings, film/TV/game clips, music stems, interviews, or scraped audio unless the user states they have rights.
- Public figure policy: do not clone or imitate a public figure, celebrity, politician, influencer, actor, singer, or fictional character voice.
- Do not help bypass provider safety checks or watermark/disclosure duties.
- Store only the returned provider voice ID and consent summary in ordinary output; do not duplicate raw sample audio.
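As a rough illustration of the guard above, the identity class gathered during triage can be checked before consent is even considered. The function name, its arguments, and the reason strings are hypothetical, and a real check would need human judgment about who counts as a public figure.

```python
# Identity classes that the public figure policy rules out entirely.
BLOCKED_IDENTITY_CLASSES = {"public figure", "fictional character"}


def rights_guard(identity_class, user_states_rights):
    """Return (allowed, reason) for a clone request; a sketch, not a policy engine."""
    if identity_class in BLOCKED_IDENTITY_CLASSES:
        return False, "public figure and fictional character voices are not cloned"
    if identity_class == "unknown":
        return False, "speaker identity is unknown; ask who the speaker is"
    if not user_states_rights:
        return False, "the user has not stated rights to the sample"
    return True, "ok"
```

For example, `rights_guard("self", True)` returns `(True, "ok")`, while a public figure is refused regardless of the rights flag.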
Locale and accent quality notes
Ask which target language and locale the cloned voice will be used for. A clone works best when the sample matches the desired locale-appropriate accent.
- Chinese neutral narration: use clean Mandarin (Putonghua) sample audio.
- American English: use clean en-US sample audio.
- British English: use clean en-GB sample audio.
- Japanese/Korean/French/German/Spanish/etc.: use samples spoken in that target language, not an English sample repurposed cross-lingually.
- Strong dialect, code-switching, room echo, music, or singing can produce odd accent transfer in later TTS. Recommend 30-90 seconds of dry speech when possible.
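These notes can be turned into a simple pre-flight check that produces the sample-quality warning. The sketch below assumes locales are given as tags such as `en-US` and that the sample duration is already known. The function name and warning wording are illustrative.

```python
def sample_warnings(sample_locale, target_locale, duration_seconds):
    """Collect warnings about locale mismatch and sample length (30-90 s recommended)."""
    warnings = []
    if sample_locale.strip().lower() != target_locale.strip().lower():
        warnings.append(
            "sample locale {} differs from target locale {}; accent transfer may sound odd".format(
                sample_locale, target_locale
            )
        )
    if duration_seconds < 30 or duration_seconds > 90:
        warnings.append(
            "sample is {:.0f} s; 30-90 seconds of dry speech is recommended".format(
                duration_seconds
            )
        )
    return warnings
```

The check does not inspect the audio itself, so echo, music, singing, or code-switching still need to be flagged from listening to the sample or from the user's description.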
Output contract
Return:
- provider
- voice ID
- voice name
- consent summary
- allowed use
- target language / locale assumption
- warning if the source sample quality may harm target-language accent quality