[note] Higgs-Audio Common Token Summary
Note: This page is an AI-generated (gpt-5-mini-2025-08-07) translation from Traditional Chinese and may contain minor inaccuracies.
📌 Introduction
When encountering errors while working with Higgs-Audio, I plan to carefully study the entire operation of Higgs-Audio from start to finish, so I’ll start by understanding tokens. Because Higgs-Audio needs to handle both “text” and “audio” tokens simultaneously, it looks rather complicated at first, so I plan to thoroughly organize how many tokens there are.
🚀 Introduction to Tokens Used in Higgs-Audio
- Lowercase tokens: boundary control (start / end)
- Uppercase tokens: content replacement (replaced with actual data during preprocessing)
Text
Text markers
<|begin_of_text|>: start of text sequence<|end_of_text|>: end of text sequence
<|eom_id|>: end of message<|eot_id|>: end of turn
Message roles (System, User, Assistant)
<|start_header_id|>: marks the start of a message role<|end_header_id|>: marks the end of a message role
Audio
<|audio_bos|>: marks the start of an input audio segment<|audio_eos|>: marks the end of an input audio segment<|audio_out_bos|>: marks the starting point of output audio tokens
<|scene_desc_start|>: start of recording environment/scene description<|scene_desc_end|>: end of recording environment/scene description
<|AUDIO|>: audio input<|AUDIO_OUT|>: discrete audio tokens
Others
Tools
<|recipient|>: tool call
Reserved words
<|reserved_special_token_*|>
Generation style guidelines
<|generation_instruction_start|>: start of generation rules/style instructions<|generation_instruction_end|>: end of generation rules/style instructions
Event-type sound effects
<SE><SE_s><SE_e>
1 | for tag, replacement in [ |
🔁 Recap
- Learned there are two main token categories: boundary control and content replacement
- Compiled the tokens appearing in Higgs-Audio and their uses
🔗 References
[note] Higgs-Audio Common Token Summary
https://hsiangjenli.github.io/blog/note-higgs-audio-token.en/