NeuroAgent

How to Expand VOSK Vocabulary: Editing Model Files

Is it technically possible to expand the VOSK speech recognition vocabulary by simply editing text files? Learn about risks, alternatives, and safe methods for adding hundreds of words to the model.

Question

Is it possible to expand the VOSK speech recognition vocabulary by simply editing text files inside the model? I know how to limit the vocabulary using the KaldiRecognizer parameter, but when I try to add a new word that is not in the standard vocabulary, the system issues a warning. The official documentation for adding words to a VOSK model seems overly complicated. Is there a way to simply edit text files inside the model to add several hundred new words, and does anyone have experience with such modifications?

NeuroAgent

Brief Answer

Expanding the VOSK speech recognition vocabulary by editing text files within the model is technically possible, but strongly discouraged for most users. The process requires a deep understanding of the Kaldi model structure, and a mistake can break recognition entirely.

Possibility of editing text files in VOSK model

VOSK is based on the Kaldi speech recognition framework, which uses a complex file structure to store the vocabulary and language model. Technically, the model files can be edited manually, but this requires a deep understanding of the system’s internal architecture.

A VOSK model typically contains several key files related to the vocabulary:

  • graph/words.txt - word symbol table (maps each word to an integer ID)
  • graph/phones.txt - phone symbol table (maps each phoneme to an integer ID)
  • graph/phones/silence.csl - colon-separated list of the silence phone IDs
  • graph/phones/disambig.int - integer IDs of the disambiguation symbols
  • graph/phones/disambig.txt - names of the disambiguation symbols

The words.txt file contains the mapping of words to integer identifiers that are used throughout the recognition process. An example of its content:

<eps> 0
<s> 1
</s> 2
!SIL 3
<unk> 4
hello 5
world 6
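As a minimal sketch of how this mapping can be inspected, the snippet below parses words.txt-style text into a dictionary and reports which of the words you want are missing. The helper names parse_words_table and find_missing_words are illustrative and not part of the VOSK API, and the lowercase lookup assumes a model whose vocabulary is stored in lowercase.

```python
def parse_words_table(text):
    """Parse 'word id' lines (the words.txt layout) into a dict."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        word, word_id = parts
        table[word] = int(word_id)
    return table


def find_missing_words(wanted, table):
    """Return the wanted words that have no entry in the table."""
    # Assumption: the vocabulary is lowercase, so compare in lowercase.
    return [w for w in wanted if w.lower() not in table]


# Sample content copied from the example above.
SAMPLE = """<eps> 0
<s> 1
</s> 2
!SIL 3
<unk> 4
hello 5
world 6
"""

table = parse_words_table(SAMPLE)
missing = find_missing_words(["hello", "kubernetes"], table)
print(missing)
```

For a real model you would read the file contents first, for example with open(path, encoding="utf-8"), assuming the model ships words.txt at the location listed above.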

The internal structure of a VOSK model is much more complex than it appears at first glance. Even simply adding words to words.txt requires corresponding changes in other files:

  1. Phonetic transcription: each new word needs a phonetic transcription in the pronunciation lexicon (the lexicon.txt file)
  2. Phoneme indices: the transcription may only use phonemes the model already knows, and any new phonetic sequences must be added to the corresponding index files
  3. Language model weighting: the statistical language model must assign the new word a probability, otherwise the decoder has no path that leads to it
  4. Recognition graph: updating the dictionary requires rebuilding the HCLG graph

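Important: words.txt is only a symbol table, and the compiled recognition graph already has its word IDs baked in. Adding a line to words.txt without the corresponding changes in the other files will therefore either leave the new word unrecognizable or, if the IDs become inconsistent, cause an error during initialization.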

Practical experience of modifying the dictionary

Discussions in the developer community suggest that manual editing of model files is a last resort, used mainly by experienced users:

Pros of manual editing:

  • Allows adding specialized terminology
  • Doesn’t require retraining the model from scratch
  • Preserves all optimizations made by VOSK developers

Cons and risks:

  • High probability of model integrity violation
  • Requires deep Kaldi knowledge
  • May reduce overall recognition accuracy
  • Difficult to revert changes in case of errors

Community users note that adding several dozen words this way is possible, but for several hundred words, it becomes a very risky operation.

Safe alternatives for expanding vocabulary

For safe vocabulary expansion, it’s recommended to use official VOSK methods:

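1. Using the max_alternatives parameter

python
from vosk import Model, KaldiRecognizer
model = Model("model")  # assumed path to an unpacked VOSK model directory
recognizer = KaldiRecognizer(model, 16000)  # sample rate of the audio in Hz
recognizer.SetMaxAlternatives(5)  # return up to 5 alternative hypotheses

Note that this does not add words to the vocabulary: every alternative is still built from words the model already knows. It only helps when the intended word is in the vocabulary but loses to a similar-sounding one.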
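With alternatives enabled, the result string returned by the recognizer carries a list of hypotheses instead of a single text. The sketch below shows one way to read them. It assumes the JSON layout with an "alternatives" list whose items have "text" and "confidence" fields; the helper name list_alternatives is hypothetical and the sample string is hand-written for illustration, not real recognizer output.

```python
import json


def list_alternatives(result_json):
    """Return (text, confidence) pairs from an n-best result string."""
    result = json.loads(result_json)
    # Assumed layout: {"alternatives": [{"text": ..., "confidence": ...}]}
    alternatives = result.get("alternatives", [])
    return [(alt.get("text", ""), alt.get("confidence")) for alt in alternatives]


# Hand-written sample for illustration, not real recognizer output.
SAMPLE_RESULT = json.dumps({
    "alternatives": [
        {"text": "hello world", "confidence": 250.1},
        {"text": "hello word", "confidence": 248.7},
    ]
})

pairs = list_alternatives(SAMPLE_RESULT)
for text, confidence in pairs:
    print(text, confidence)
```

In a live setup the argument would be recognizer.Result() after a chunk of audio has been accepted.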

2. Creating a custom model

For adding a large number of words, the correct approach is:

  • Collect a corpus of texts with new terminology
  • Use VOSK scripts to create a language model
  • Rebuild the model's language model and recognition graph on the new data (the acoustic model usually does not need retraining)

Note: VOSK provides tools for creating custom models through scripts in the tools directory. This method takes more time, but it is far less likely to break the model than manual editing.

3. Using external post-processing systems

You can keep the standard VOSK model and add post-processing logic to recognize missing words.
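A minimal sketch of such post-processing, assuming you have collected the phrases the standard model tends to produce in place of your terms. The CORRECTIONS table and its entries are hypothetical examples, and the function names are illustrative rather than part of VOSK.

```python
import re

# Hypothetical mapping from frequent misrecognitions to intended terms.
# Keys must be lowercase because matches are lowercased before lookup.
CORRECTIONS = {
    "cuber netties": "Kubernetes",
    "post gress": "Postgres",
}


def build_pattern(corrections):
    """Compile one regex matching any known misrecognition as whole words."""
    # Longest phrases first so longer matches win over their prefixes.
    phrases = sorted(corrections, key=len, reverse=True)
    joined = "|".join(re.escape(p) for p in phrases)
    return re.compile(r"\b(?:" + joined + r")\b", re.IGNORECASE)


def fix_terms(text, corrections=CORRECTIONS):
    """Replace known misrecognitions in recognized text."""
    if not corrections:
        return text  # nothing to replace
    pattern = build_pattern(corrections)
    return pattern.sub(lambda m: corrections[m.group(0).lower()], text)


print(fix_terms("deploy to cuber netties and post gress"))
```

A fixed table like this is predictable and easy to audit, but it only covers the misrecognitions you have already observed.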

Recommendations for adding several hundred words

If you still decide to edit model files, follow these recommendations:

  1. Create a backup of the original model before starting
  2. Test changes on a separate copy of the model in an isolated test setup, never on the model you use in production
  3. Test gradually - add words in small groups
  4. Check model integrity after each change
  5. Document all changes for possible rollback
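The backup and integrity steps above can be sketched with the standard library alone. This is an illustration under the assumption that the model is a plain directory of files; the function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def backup_model(model_dir, backup_dir):
    """Copy the whole model directory; fails if backup_dir already exists."""
    shutil.copytree(model_dir, backup_dir)


def file_digest(path):
    """SHA-256 of a file, read in chunks because graph files can be large."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_manifest(model_dir):
    """Map each file's relative path to its SHA-256 digest."""
    root = Path(model_dir)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(before, after):
    """List files that were added, removed, or modified between manifests."""
    names = set(before) | set(after)
    return sorted(n for n in names if before.get(n) != after.get(n))
```

Taking a manifest before and after each batch of edits gives a simple change log: changed_files(before, after) names exactly the files that need to be restored from the backup to roll back.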

To add words, perform the following steps:

  1. Add words to words.txt with unique IDs
  2. Create phonetic transcriptions for each new word
  3. Update the lexicon.txt file
  4. Rebuild the recognition graph using Kaldi scripts
  5. Verify the model’s functionality
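Before rebuilding anything, it can help to check that every new word is actually ready. The sketch below assumes a lexicon in the usual "word phone phone ..." line format and a set of phone symbols your model supports; how you obtain that set depends on your setup. The function names, the sample words, and the phone symbols are illustrative assumptions.

```python
def parse_lexicon(text):
    """Parse 'word phone phone ...' lines into {word: [phones]}."""
    lexicon = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # a word without phones is not a usable entry
        # If a word appears several times, the last pronunciation wins.
        lexicon[parts[0]] = parts[1:]
    return lexicon


def check_new_words(new_words, lexicon, allowed_phones):
    """Return {word: problem} for words that are not ready to be added."""
    problems = {}
    for word in new_words:
        phones = lexicon.get(word)
        if phones is None:
            problems[word] = "no pronunciation"
            continue
        unknown = [p for p in phones if p not in allowed_phones]
        if unknown:
            problems[word] = "unknown phones: " + " ".join(unknown)
    return problems


# Illustrative data only; the phone symbols are an assumption.
LEXICON = parse_lexicon("""kubernetes k uw b er n eh t iy z
grafana g r ax f aa n ax
""")
PHONES = {"k", "uw", "b", "er", "n", "eh", "t", "iy", "z", "g", "r", "f", "aa"}

problems = check_new_words(["kubernetes", "grafana", "etcd"], LEXICON, PHONES)
print(problems)
```

An empty result means every word has a pronunciation built only from known phones. It does not replace the graph rebuild in step 4.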

Warning: This process requires deep Kaldi knowledge and can take several hours of work even for an experienced developer.

Conclusions and practical tips

Main conclusions:

  1. Technical possibility: Yes, editing text files in VOSK models is possible
  2. Practical difficulty: The process is very complex and requires specialized knowledge
  3. Risks: High probability of model malfunction
  4. Alternatives: There are safer methods for expanding vocabulary

Practical recommendations:

  • For adding several dozen words, you can try manual editing with extreme caution
  • For adding several hundred words, it’s recommended to create a custom model
  • Use official VOSK tools for working with the dictionary
  • If you’re not an experienced Kaldi user, it’s better to look for alternative solutions

Tip for beginners: Instead of complex model editing, consider using a combination of the standard VOSK model and a simple text post-processing system to fix recognition of specialized terms.
