NeuroAgent

Fix Docling PDF Conversion Error

Question

How to fix an error when using Docling for PDF conversion?

I’m trying to use the Docling library to convert a PDF document to Markdown, but I’m getting an error. The code I’m running is based on the official documentation:

python
from docling.document_converter import DocumentConverter

source = "https://arxiv.org/pdf/2408.09869"
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())

However, when running this code, the following error occurs:

Traceback (most recent call last):
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\datamodel\document.py", line 171, in __init__
    self._init_doc(backend, path_or_stream)
    ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\datamodel\document.py", line 215, in _init_doc
    self._backend = backend(self, path_or_stream=path_or_stream)
                    ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\backend\docling_parse_v4_backend.py", line 206, in __init__
    self.parser = DoclingPdfParser(loglevel="fatal")
                  ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling_parse\pdf_parser.py", line 579, in __init__
    self.parser = pdf_parser_v2(level=loglevel)
                  ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
RuntimeError: filename does not exists: C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling_parse\pdf_resources_v2/glyphs//standard/additional.dat
2025-10-28 20:18:53,881 - INFO - Going to convert document batch...
Traceback (most recent call last):
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\docling_test\docling1.py", line 10, in <module>
    result = converter.convert(source)
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\pydantic\_internal\_validate_call.py", line 39, in wrapper_function
    return wrapper(*args, **kwargs)
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\pydantic\_internal\_validate_call.py", line 136, in __call__
    res = self.__pydantic_validator__.validate_python(pydantic_core.ArgsKwargs(args, kwargs))
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\document_converter.py", line 237, in convert
    return next(all_res)
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\document_converter.py", line 260, in convert_all
    for conv_res in conv_res_iter:
                    ^^^^^^^^^^^^^
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\document_converter.py", line 332, in _convert
    for item in map(
                ~~~^
        process_func,
        ^^^^^^^^^^^^^
        input_batch,
        ^^^^^^^^^^^^
    ):
    ^
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\document_converter.py", line 379, in _process_document
    conv_res = self._execute_pipeline(in_doc, raises_on_error=raises_on_error)
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\document_converter.py", line 415, in _execute_pipeline
    raise ConversionError(f"Input document {in_doc.file} is not valid.")
docling.exceptions.ConversionError: Input document 2408.09869v5.pdf is not valid.
-> Cannot close object, library is destroyed. This may cause a memory leak!

Interestingly, the same code runs successfully on another computer. What could be the problem and how can I fix it?

Problem Details:

  • Library: Docling
  • Operation: PDF to Markdown conversion
  • Error: RuntimeError: filename does not exists and ConversionError: Input document is not valid
  • Context: The code works on one computer but not on another

Possible Causes:

  • Problem with Docling installation or configuration
  • Missing required resources or files
  • Library version conflicts
  • File access permission issues

RuntimeError: filename does not exists Error in Docling

The RuntimeError: filename does not exists: C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling_parse\pdf_resources_v2/glyphs//standard/additional.dat error means that the docling-parse PDF backend could not find one of the resource files it needs for PDF processing. The path in the message points inside the installed docling_parse package in the virtual environment. The ConversionError that follows is a consequence of the first error: the backend never initializes, so Docling reports the input document as not valid.

Main Causes of the Error

The error is raised when a resource file in the pdf_resources_v2 directory cannot be found. Based on the issues and discussions listed under Sources, the main causes include:

  1. Incomplete model installation: Docling requires downloading additional resources on first run, but the process may be interrupted due to network issues or limitations.

  2. Access restrictions: In some corporate networks or servers with limited internet access, downloading models from the Hugging Face Hub may be blocked.

  3. Version conflicts: Incompatibility between Docling, docling-parse, and other dependency versions can lead to resource loading issues.

  4. Access permission problems: Resource files may be missing due to write permission issues in the installation directory.
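
To narrow down which of these causes applies, it can help to check whether the file named in the error actually exists inside the installed package. The sketch below is one way to do that. `check_resource` is a hypothetical helper (not part of Docling), and the relative path is copied from the error message in the question. It also reports whether the full path is pure ASCII, because the failing path contains non-ASCII characters (`Макс`); treating that as a possible factor is an assumption worth ruling out, not a confirmed cause.

```python
import importlib.util
from pathlib import Path


def check_resource(package: str, relative_path: str) -> dict:
    """Report whether a file exists inside an installed top-level package."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        # Package is not installed (or is a single-file module, not a package)
        return {"package_found": False, "path": None, "exists": False, "ascii_path": None}
    path = Path(list(spec.submodule_search_locations)[0]) / relative_path
    return {
        "package_found": True,
        "path": str(path),
        "exists": path.is_file(),
        # False when the path contains non-ASCII characters
        "ascii_path": str(path).isascii(),
    }


if __name__ == "__main__":
    # Relative path taken from the RuntimeError message (assumption: same layout)
    report = check_resource(
        "docling_parse", "pdf_resources_v2/glyphs/standard/additional.dat"
    )
    for key, value in report.items():
        print(f"{key}: {value}")
```

If `exists` is False, the package installation is probably incomplete and a reinstall is the natural next step. If `exists` is True and the error persists, the path itself becomes the main suspect, and creating the virtual environment in a directory with an ASCII-only path is a cheap experiment.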

Solutions to the Problem

1. Manual Model Download

If the missing files are model artifacts, download the required models manually and point Docling at the directory that contains them:

python
import os
from pathlib import Path

# Create the models directory if it doesn't exist
models_dir = Path.home() / ".cache" / "docling" / "models"
models_dir.mkdir(parents=True, exist_ok=True)

# Point Docling at the local models (set this before importing docling)
os.environ["DOCLING_ARTIFACTS_PATH"] = str(models_dir)

Then use the command to download models:

bash
docling-tools models download rapidocr
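
As an alternative to the environment variable, the models directory can also be passed explicitly in code. This is a minimal sketch, assuming the models were downloaded to the default `~/.cache/docling/models` location and that your Docling version accepts `artifacts_path` on `PdfPipelineOptions`, as described in the Docling documentation. `default_models_dir`, `build_converter`, and `main` are hypothetical helper names.

```python
from pathlib import Path


def default_models_dir() -> Path:
    """Directory used above for the downloaded models."""
    return Path.home() / ".cache" / "docling" / "models"


def build_converter(models_dir: Path):
    """Create a DocumentConverter that loads its models from models_dir."""
    # Imported lazily so default_models_dir() works without Docling installed
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    options = PdfPipelineOptions(artifacts_path=str(models_dir))
    return DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=options)}
    )


def main() -> None:
    models_dir = default_models_dir()
    if not models_dir.is_dir():
        print(f"Models directory not found: {models_dir}")
        return
    converter = build_converter(models_dir)
    result = converter.convert("https://arxiv.org/pdf/2408.09869")
    print(result.document.export_to_markdown())


if __name__ == "__main__":
    main()
```

Passing the path explicitly makes the configuration visible in the script itself, which is easier to compare between two machines than an environment variable.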

2. Reinstall Docling with Cache Cleanup

A complete reinstall with cache cleanup may solve the problem:

bash
# Remove existing installation
pip uninstall docling docling-parse -y

# Clear cache
pip cache purge

# Install the latest version (the quotes stop the shell from interpreting the brackets)
pip install "docling[rapidocr]"

3. Using an Alternative Backend

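The traceback fails inside the docling-parse backend, so switching the PDF backend to pypdfium2 avoids that code path:

python
from docling.backend.pypdfium2_backend import PyPdfiumDocumentBackend
from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, PdfFormatOption

# Use the pypdfium2 backend instead of the default docling-parse backend
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(backend=PyPdfiumDocumentBackend),
    }
)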

source = "https://arxiv.org/pdf/2408.09869"
result = converter.convert(source)
print(result.document.export_to_markdown())
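
If you want to keep the default backend where it works and only switch where it fails, the attempts can be chained. This is a minimal sketch: `convert_with_fallback` is a hypothetical helper, not a Docling API. It takes named factories that each build an object with a `convert(source)` method and returns the result of the first one that succeeds.

```python
from typing import Any, Callable, Sequence


def convert_with_fallback(
    source: str,
    factories: Sequence[tuple[str, Callable[[], Any]]],
) -> tuple[str, Any]:
    """Try each converter factory in order; return (name, result) of the first success."""
    errors: list[str] = []
    for name, factory in factories:
        try:
            converter = factory()
            return name, converter.convert(source)
        except Exception as exc:  # broad on purpose: backends raise different error types
            errors.append(f"{name}: {exc!r}")
    # Nothing worked: surface every failure, not just the last one
    raise RuntimeError("All converters failed:\n" + "\n".join(errors))
```

With Docling, the first factory could be `DocumentConverter` itself and the second a small function that returns a converter configured with a different PDF backend. The returned name tells you which one actually produced the result.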

Checking and Updating Dependencies

Ensure all dependencies are updated to compatible versions:

bash
# Linux/macOS
pip list | grep -E "(docling|torch|numpy)"
# Windows (cmd or PowerShell)
pip list | findstr "docling torch numpy"

# Update main dependencies
pip install --upgrade torch numpy
pip install --upgrade docling docling-parse

Note: PyTorch and NumPy versions can be incompatible with each other, for example when an older PyTorch build was compiled against NumPy 1.x. Make sure you have a compatible combination installed.

Setting Up Offline Mode

If you’re working in an environment without internet access, you need to download all required models beforehand:

  1. Download models on a machine with internet:
bash
# Create a directory for the models
mkdir -p ~/docling_models_offline

# Download the default model set and the RapidOCR models into that directory
docling-tools models download -o ~/docling_models_offline
docling-tools models download -o ~/docling_models_offline rapidocr

# Note: the docling-parse resource files are not models; they ship inside the docling-parse pip package
  2. Transfer the models to the target machine and set the environment variable before importing Docling:
python
import os
os.environ["DOCLING_ARTIFACTS_PATH"] = "/path/to/your/models"
  3. Run Docling with the correctly configured model path.
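
Before running Docling offline, it can save time to confirm that the transferred directory is actually populated. This is a minimal sketch, assuming the models live in subdirectories of the directory you copied. `list_model_dirs` is a hypothetical helper, and the subdirectory names you see will depend on your Docling version.

```python
from pathlib import Path


def list_model_dirs(models_dir: str) -> list[str]:
    """Return the sorted names of non-empty subdirectories in models_dir."""
    root = Path(models_dir).expanduser()
    if not root.is_dir():
        raise FileNotFoundError(f"Models directory does not exist: {root}")
    return sorted(
        child.name
        for child in root.iterdir()
        # Skip plain files and empty directories left by an interrupted copy
        if child.is_dir() and any(child.iterdir())
    )
```

An empty list, or a list shorter than on the machine where the download ran, suggests the transfer was incomplete.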

Alternative Conversion Approaches

1. Using Docker Container

A Docker container can solve dependency and model path issues:

dockerfile
FROM python:3.11-slim

# Install Docling and download the models at build time
RUN pip install --no-cache-dir "docling[rapidocr]" && \
    docling-tools models download rapidocr

# Run the conversion when the container starts
CMD ["python", "-c", "from docling.document_converter import DocumentConverter; print(DocumentConverter().convert('https://arxiv.org/pdf/2408.09869').document.export_to_markdown())"]

2. Conversion via CLI

Try using the Command Line Interface:

bash
docling https://arxiv.org/pdf/2408.09869 --to md --output ./output

3. Using Alternative Libraries

If the problem persists, consider alternative approaches:

python
# Alternative: pdf2image + OCR
from pdf2image import convert_from_path
import pytesseract

# Convert PDF to images
images = convert_from_path("document.pdf")

# Process each image
for i, image in enumerate(images):
    text = pytesseract.image_to_string(image, lang='rus+eng')
    print(f"Page {i+1}: {text}")

Checking Version Compatibility

Check the compatibility of component versions:

Component       Recommended Version   Minimum Version
Docling         2.3+                  2.0+
docling-parse   Latest                1.0+
PyTorch         2.2.2+                2.0+
NumPy           2.0+                  1.21+

To check compatibility, run:

python
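# pkg_resources is deprecated; importlib.metadata is in the standard library.
# The packaging library is a separate install if it is not already present.
from importlib.metadata import PackageNotFoundError, version

from packaging.requirements import Requirement

# Check version compatibility
required_packages = [
    'docling>=2.3',
    'docling-parse>=1.0',
    'torch>=2.0',
    'numpy>=1.21'
]

for package in required_packages:
    requirement = Requirement(package)
    try:
        installed = version(requirement.name)
    except PackageNotFoundError:
        print(f"✗ {requirement.name} not found")
        continue
    # Compare the installed version against the requirement's specifier
    if requirement.specifier.contains(installed, prereleases=True):
        print(f"✓ {package} is installed ({installed})")
    else:
        print(f"✗ Version conflict for {package} (installed: {installed})")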

Conclusion

The RuntimeError: filename does not exists error in Docling is usually related to missing model resources. The main solutions include:

  1. Manually downloading models using docling-tools models download
  2. Complete reinstall with cache cleanup
  3. Setting up offline mode for working without internet
  4. Using alternative backends or approaches
  5. Checking version compatibility of all dependencies

Since the code works on one computer but not on another, the most likely cause is a difference between the two installation environments: either incomplete resource or model files, or dependency version conflicts. Start by downloading the models manually and checking version compatibility.
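
Because the same code works on another computer, comparing the two environments directly is a quick way to find the difference. This is a minimal sketch using only the standard library: run it on each machine, save the printed JSON, then compare the two with `diff_versions`. Both function names are hypothetical, and the package list is an assumption based on the packages discussed above.

```python
import json
import platform
import sys
from importlib.metadata import PackageNotFoundError, version

# Assumed list of packages worth comparing; extend as needed
PACKAGES = ["docling", "docling-parse", "docling-core", "torch", "numpy"]


def snapshot(packages=PACKAGES) -> dict:
    """Collect interpreter, platform, and package versions for this machine."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "prefix": sys.prefix,  # location of the (virtual) environment
    }
    for name in packages:
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            info[name] = None
    return info


def diff_versions(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    print(json.dumps(snapshot(), indent=2, ensure_ascii=False))
```

The `prefix` entry is included on purpose: it shows where each environment lives, so differences in the installation path are visible next to differences in versions.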

Sources

  1. Running docling offline with pre-downloaded models - GitHub Issue #232
  2. Installation - Docling Documentation
  3. Manual download of default models - GitHub Discussion #2089
  4. docling-tools models download rapidocr not actually being used by default DocumentConverter - Issue #2500
  5. FAQ - Docling Documentation