
NeuroAgent

How to Expand VOSK Vocabulary: Editing Model Files

Is it technically possible to expand the VOSK speech recognition vocabulary by simply editing text files? Learn about risks, alternatives, and safe methods for adding hundreds of words to the model.


11/06/2025, 11:02 AM

Question

Is it possible to expand the VOSK speech recognition vocabulary by simply editing text files inside the model? I know how to limit the vocabulary using the KaldiRecognizer parameter, but when I try to add a new word that is not in the standard vocabulary, the system issues a warning. The official documentation on adding words to a VOSK model seems overly complicated. Is there a way to simply edit text files inside the model to add several hundred new words, and does anyone have experience with such modifications?

NeuroAgent

Brief Answer

Expanding the VOSK speech recognition vocabulary by editing text files within the model is technically possible, but it is strongly discouraged for regular users. The process requires a deep understanding of the Kaldi model structure, and mistakes can cause serious recognition failures.

Possibility of editing text files in VOSK model
Structure of VOSK model files related to vocabulary
Practical experience of modifying the dictionary
Safe alternatives for expanding vocabulary
Recommendations for adding several hundred words
Conclusions and practical tips

Possibility of editing text files in VOSK model

VOSK is based on the Kaldi speech recognition framework, which uses a complex file structure to store the vocabulary and language model. Technically, the model files can be edited manually, but this requires a deep understanding of the system’s internal architecture.

The VOSK model contains several key files related to the vocabulary:

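graph/words.txt - word symbol table: maps each word to an integer ID
graph/phones.txt - phone symbol table: maps each phone to an integer ID
graph/phones/silence.csl - colon-separated list of the integer IDs of the silence phones
graph/phones/disambig.int - integer IDs of the disambiguation symbols
graph/phones/disambig.txt - the same disambiguation symbols in text form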

The words.txt file contains the mapping of words to integer identifiers that are used throughout the recognition process. An example of its content:

<eps> 0
<s> 1
</s> 2
!SIL 3
<unk> 4
hello 5
world 6
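Before changing anything, it helps to know which of your words are actually missing. The sketch below parses words.txt content in the format shown above and lists the candidates that have no entry. The function names are hypothetical, and lowercasing the candidates assumes a model whose vocabulary is stored in lowercase.

```python
def load_word_table(text: str) -> dict[str, int]:
    """Parse words.txt content: one 'symbol id' pair per line."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        symbol, ident = parts
        table[symbol] = int(ident)
    return table


def missing_words(candidates: list[str], table: dict[str, int]) -> list[str]:
    """Return the candidates that have no entry in the symbol table."""
    # Assumption: the model stores its vocabulary in lowercase.
    return [w for w in candidates if w.lower() not in table]


sample = "<eps> 0\n<s> 1\n</s> 2\n!SIL 3\n<unk> 4\nhello 5\nworld 6\n"
table = load_word_table(sample)
print(missing_words(["hello", "kubernetes", "world"], table))
```

For a real model, read the file contents first, for example with Path("model/graph/words.txt").read_text(encoding="utf-8"), where the model path is whatever directory you unpacked.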

The internal structure of a VOSK model is much more complex than it appears at first glance. Even simply adding words to words.txt requires corresponding changes in other files:

Phonetic transcription: each new word must have a phonetic transcription in the lexicon.txt file
Phoneme inventory: each transcription may only use phonemes that already exist in phones.txt, and the lexicon must be recompiled from the updated transcriptions
Language model weighting: changes to the dictionary affect the statistical language model
Recognition graph: updating the dictionary requires rebuilding the HCLG graph

Important: Simply adding a line to words.txt without the corresponding changes in other files will not make the new word recognizable, because the decoder works from the compiled recognition graph rather than from the text files. At worst, the system will produce an error during initialization.

Practical experience of modifying the dictionary

Discussions in the developer community suggest that manual editing of model files is a last resort, used mainly by experienced users:

Pros of manual editing:

Allows adding specialized terminology
Doesn’t require retraining the model from scratch
Preserves all optimizations made by VOSK developers

Cons and risks:

High probability of model integrity violation
Requires deep Kaldi knowledge
May reduce overall recognition accuracy
Difficult to revert changes in case of errors

Community users note that adding several dozen words this way is possible, but for several hundred words, it becomes a very risky operation.

Safe alternatives for expanding vocabulary

For safe vocabulary expansion, it’s recommended to use official VOSK methods:

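1. Using the `max_alternatives` parameter

This does not add words to the vocabulary. It makes the recognizer return several alternative hypotheses, which a later step can inspect for near misses.

python

from vosk import Model, KaldiRecognizer

model = Model("model")  # path to an unpacked VOSK model directory
recognizer = KaldiRecognizer(model, 16000)  # sample rate must match the audio
recognizer.SetMaxAlternatives(5)  # return up to 5 alternative hypotheses per result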

2. Creating a custom model

For adding a large number of words, the correct approach is:

Collect a corpus of texts with new terminology
Use VOSK scripts to create a language model
Retrain the model on new data

Note: VOSK provides tools for creating custom models through scripts in the tools directory. This method takes more time, but it produces a consistently built model and is far less likely to break recognition than hand editing.

3. Using external post-processing systems

You can keep the standard VOSK model and add post-processing logic to recognize missing words.
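A minimal sketch of such post-processing, assuming the recognizer returns a plain text transcript: each token is compared against a list of domain terms and replaced when it is similar enough. The term list, the function name, and the 0.8 cutoff are illustrative assumptions, not part of VOSK. The comparison uses difflib from the Python standard library.

```python
import difflib

# Hypothetical domain terms that the stock model cannot output directly.
CUSTOM_TERMS = ["kubernetes", "postgres", "grafana"]


def correct_transcript(text: str, terms: list[str], cutoff: float = 0.8) -> str:
    """Replace each token with the closest custom term when it is similar enough."""
    corrected = []
    for token in text.split():
        # get_close_matches returns at most n candidates scoring >= cutoff.
        match = difflib.get_close_matches(token, terms, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else token)
    return " ".join(corrected)


print(correct_transcript("deploy kubernetis today", CUSTOM_TERMS))
```

This only handles single-token near misses. A stock model often splits an unknown term into several in-vocabulary words, so a practical version would probably also need to match across adjacent tokens or use a phrase-level replacement table.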

Recommendations for adding several hundred words

If you still decide to edit model files, follow these recommendations:

Create a backup of the original model before starting
Use a separate copy of the model for testing changes, not the one your application loads
Test gradually - add words in small groups
Check model integrity after each change
Document all changes for possible rollback
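The backup and change-tracking recommendations can be sketched with the standard library alone. The helper names below are hypothetical. Note that the checksum manifest only shows which files changed between two points in time. It does not prove that the model is valid, so you still need to load the model and run test audio through it.

```python
import hashlib
import shutil
from pathlib import Path


def backup_model(model_dir: str, backup_dir: str) -> None:
    """Copy the whole model directory; raises if backup_dir already exists."""
    shutil.copytree(model_dir, backup_dir)


def checksum_manifest(model_dir: str) -> dict[str, str]:
    """Map each file's path (relative to model_dir) to its SHA-256 digest."""
    root = Path(model_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads the whole file into memory; fine for a sketch,
            # but large graph files may call for chunked hashing.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """List files that were added, removed, or modified between two manifests."""
    return sorted(p for p in before.keys() | after.keys() if before.get(p) != after.get(p))
```

Taking one manifest before each group of edits and one after gives a simple change log that can be saved alongside the backup for rollback.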

To add words, perform the following steps:

Add words to words.txt with unique IDs
Create phonetic transcriptions for each new word
Update the lexicon.txt file
Rebuild the recognition graph using Kaldi scripts
Verify the model’s functionality
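Before rebuilding the graph, it is worth checking that every new transcription uses only phonemes the model already knows. The sketch below assumes lexicon lines of the form "word PH1 PH2 ..." and a phones.txt with one "symbol id" pair per line. It also assumes that position-dependent phonemes carry a _B, _E, _I, or _S suffix, which is a common Kaldi convention but should be confirmed against your model's phones.txt. The function names are hypothetical.

```python
def load_phone_set(phones_txt: str) -> set[str]:
    """Collect base phoneme names from phones.txt content."""
    phones = set()
    for line in phones_txt.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        symbol = parts[0]
        if symbol.startswith("#") or symbol == "<eps>":
            continue  # disambiguation symbols and epsilon are not phonemes
        # Assumption: word-position variants end in _B, _E, _I, or _S.
        base, sep, suffix = symbol.rpartition("_")
        phones.add(base if sep and suffix in {"B", "E", "I", "S"} else symbol)
    return phones


def invalid_entries(lexicon_txt: str, phones: set[str]) -> dict[str, list[str]]:
    """Return words whose transcription uses phonemes outside the phone set."""
    problems = {}
    for line in lexicon_txt.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # a usable entry needs a word and at least one phoneme
        word, pronunciation = parts[0], parts[1:]
        unknown = [p for p in pronunciation if p not in phones]
        if unknown:
            problems[word] = unknown
    return problems
```

An empty result from invalid_entries means the transcriptions are at least expressible with the model's phoneme inventory. It says nothing about whether they are phonetically accurate.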

Warning: This process requires deep Kaldi knowledge and can take several hours of work even for an experienced developer.

Conclusions and practical tips

Main conclusions:

Technical possibility: Yes, editing text files in VOSK models is possible
Practical difficulty: The process is very complex and requires specialized knowledge
Risks: High probability of model malfunction
Alternatives: There are safer methods for expanding vocabulary

Practical recommendations:

For adding several dozen words, you can try manual editing with extreme caution
For adding several hundred words, it’s recommended to create a custom model
Use official VOSK tools for working with the dictionary
If you’re not an experienced Kaldi user, it’s better to look for alternative solutions

Tip for beginners: Instead of complex model editing, consider using a combination of the standard VOSK model and a simple text post-processing system to fix recognition of specialized terms.
