NeuroAgent

Fix Docling PDF Conversion Error

#docling-error #markdown-export #pdf-conversion #python-libraries #troubleshooting

10/31/2025, 08:01 PM

Question

How to fix an error when using Docling for PDF conversion?

I’m trying to use the Docling library to convert a PDF document to Markdown, but I’m getting an error. The code I’m running is based on the official documentation:

python

from docling.document_converter import DocumentConverter

source = "https://arxiv.org/pdf/2408.09869"
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())

However, when running this code, the following error occurs:

Traceback (most recent call last):
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\datamodel\document.py", line 171, in __init__
    self._init_doc(backend, path_or_stream)
    ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\datamodel\document.py", line 215, in _init_doc
    self._backend = backend(self, path_or_stream=path_or_stream)
                    ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\backend\docling_parse_v4_backend.py", line 206, in __init__
    self.parser = DoclingPdfParser(loglevel="fatal")
                  ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling_parse\pdf_parser.py", line 579, in __init__
    self.parser = pdf_parser_v2(level=loglevel)
                  ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
RuntimeError: filename does not exists: C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling_parse\pdf_resources_v2/glyphs//standard/additional.dat
2025-10-28 20:18:53,881 - INFO - Going to convert document batch...
Traceback (most recent call last):
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\docling_test\docling1.py", line 10, in <module>
    result = converter.convert(source)
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\pydantic\_internal\_validate_call.py", line 39, in wrapper_function
    return wrapper(*args, **kwargs)
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\pydantic\_internal\_validate_call.py", line 136, in __call__
    res = self.__pydantic_validator__.validate_python(pydantic_core.ArgsKwargs(args, kwargs))
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\document_converter.py", line 237, in convert
    return next(all_res)
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\document_converter.py", line 260, in convert_all
    for conv_res in conv_res_iter:
                    ^^^^^^^^^^^^^
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\document_converter.py", line 332, in _convert
    for item in map(
                ~~~^
        process_func,
        ^^^^^^^^^^^^^
        input_batch,
        ^^^^^^^^^^^^
    ):
    ^
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\document_converter.py", line 379, in _process_document
    conv_res = self._execute_pipeline(in_doc, raises_on_error=raises_on_error)
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\document_converter.py", line 415, in _execute_pipeline
    raise ConversionError(f"Input document {in_doc.file} is not valid.")
docling.exceptions.ConversionError: Input document 2408.09869v5.pdf is not valid.
-> Cannot close object, library is destroyed. This may cause a memory leak!

Interestingly, the same code runs successfully on another computer. What could be the problem and how can I fix it?

Problem Details:

Library: Docling
Operation: PDF to Markdown conversion
Error: RuntimeError: filename does not exists and ConversionError: Input document is not valid
Context: The code works on one computer but not on another

Possible Causes:

Problem with Docling installation or configuration
Missing required resources or files
Library version conflicts
File access permission issues

NeuroAgent

RuntimeError: filename does not exists Error in Docling

The error RuntimeError: filename does not exists: C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling_parse\pdf_resources_v2/glyphs//standard/additional.dat is a commonly reported problem related to missing resources. It is raised when the PDF parser cannot open a resource file it needs. In this traceback the file is a glyph table under pdf_resources_v2 inside the installed docling_parse package. The ConversionError: Input document is not valid that follows is most likely a consequence of the parser failing to initialize, not a sign that the PDF itself is broken.

Main Causes of the Error
Solutions to the Problem
Checking and Updating Dependencies
Setting Up Offline Mode
Alternative Conversion Approaches
Checking Version Compatibility
Conclusion

Main Causes of the Error

The error occurs because the parser cannot find resource files in the pdf_resources_v2 directory. The main causes include:

Incomplete model installation: Docling requires downloading additional resources on first run, but the process may be interrupted due to network issues or limitations.
Access restrictions: In some corporate networks or servers with limited internet access, downloading models from the Hugging Face Hub may be blocked.
Version conflicts: Incompatibility between Docling, docling-parse, and other dependency versions can lead to resource loading issues.
Access permission problems: Resource files may be missing due to write permission issues in the installation directory.
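Before trying any fix, it can help to confirm whether the file named in the traceback is actually present. The sketch below is a diagnostic only: the relative path is copied from the traceback, the helper names `check_resource` and `locate_docling_parse` are hypothetical, and the ASCII check is an assumption worth testing (the failing path contains a Cyrillic user name), not something the source confirms as the cause.

```python
from importlib.util import find_spec
from pathlib import Path

# Relative location of the resource, taken from the traceback in the question.
RESOURCE = Path("pdf_resources_v2") / "glyphs" / "standard" / "additional.dat"


def check_resource(package_dir: Path) -> dict:
    """Report whether the glyph resource exists and whether its path is pure ASCII."""
    target = package_dir / RESOURCE
    return {
        "path": str(target),
        "exists": target.is_file(),
        "ascii_only": str(target).isascii(),
    }


def locate_docling_parse():
    """Return the installed docling_parse directory, or None if it cannot be found."""
    spec = find_spec("docling_parse")
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


if __name__ == "__main__":
    pkg = locate_docling_parse()
    if pkg is None:
        print("docling_parse is not installed in this environment")
    else:
        print(check_resource(pkg))
```

If `exists` is False, the installation is incomplete. If it is True but the parser still reports the file as missing, the difference between the two computers is more likely in the path or permissions than in the package contents.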

Solutions to the Problem

1. Manual Model Download

The most reliable method is to manually download the required models:

python

import os
from pathlib import Path

# Create the models directory if it doesn't exist
models_dir = Path.home() / ".cache" / "docling" / "models"
models_dir.mkdir(parents=True, exist_ok=True)

# Point Docling at the local models (DOCLING_ARTIFACTS_PATH is the variable Docling reads)
os.environ["DOCLING_ARTIFACTS_PATH"] = str(models_dir)

Then use the command to download models:

bash

docling-tools models download rapidocr
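To see whether the download actually produced anything, the models directory can be listed. This is a minimal sketch that assumes the default location `~/.cache/docling/models`, the same path used in the snippet above; `summarize_models` is a hypothetical helper, not part of Docling.

```python
from pathlib import Path

# Assumed default download location; adjust if you downloaded elsewhere.
DEFAULT_MODELS_DIR = Path.home() / ".cache" / "docling" / "models"


def summarize_models(models_dir: Path = DEFAULT_MODELS_DIR) -> list:
    """Return (entry name, number of files beneath it) for each top-level entry."""
    if not models_dir.is_dir():
        return []
    summary = []
    for entry in sorted(models_dir.iterdir()):
        if entry.is_dir():
            count = sum(1 for p in entry.rglob("*") if p.is_file())
        else:
            count = 1
        summary.append((entry.name, count))
    return summary


if __name__ == "__main__":
    rows = summarize_models()
    if not rows:
        print(f"No models found in {DEFAULT_MODELS_DIR}")
    for name, count in rows:
        print(f"{name}: {count} file(s)")
```

An empty result, or a model folder containing zero files, suggests the download was interrupted and should be repeated.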

2. Reinstall Docling with Cache Cleanup

A complete reinstall with cache cleanup may solve the problem:

bash

# Remove existing installation
pip uninstall docling docling-parse -y

# Clear cache
pip cache purge

# Install the latest version (the quotes stop the shell from interpreting the brackets)
pip install "docling[rapidocr]"
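After reinstalling, one way to check that nothing was lost is to compare the files the installer recorded for the package against what is on disk. The sketch below uses only the standard library; `missing_files` is a hypothetical helper, and the distribution name `docling-parse` is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, files


def missing_files(dist_name: str):
    """Return recorded files of dist_name that are absent on disk.

    Returns None when the distribution is not installed or has no file record.
    """
    try:
        recorded = files(dist_name)
    except PackageNotFoundError:
        return None
    if recorded is None:
        return None
    return [str(f) for f in recorded if not f.locate().exists()]


if __name__ == "__main__":
    result = missing_files("docling-parse")
    if result is None:
        print("docling-parse is not installed (or has no file record)")
    elif result:
        print("Missing files:")
        for name in result:
            print("  ", name)
    else:
        print("All recorded files are present")
```

If every recorded file is present and the error persists, reinstalling again is unlikely to help, and the other causes listed earlier become more plausible.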

3. Using an Alternative Backend

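Try a different PDF parsing backend. The failing call in the traceback is inside docling-parse, so switching to the pypdfium2 backend avoids that parser:

python

from docling.backend.pypdfium2_backend import PyPdfiumDocumentBackend
from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, PdfFormatOption

# Parse PDFs with pypdfium2 instead of the default docling-parse backend
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(backend=PyPdfiumDocumentBackend)
    }
)

source = "https://arxiv.org/pdf/2408.09869"
result = converter.convert(source)
print(result.document.export_to_markdown())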
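If you want the script to keep working on both computers, the backend switch can be turned into a fallback: try the default converter first and only move to another backend when it fails. The sketch below is one possible arrangement. `first_successful` and `docling_attempts` are hypothetical helpers, and the Docling imports assume the `PyPdfiumDocumentBackend` and `PdfFormatOption` classes from Docling's documented API; they are loaded lazily so the helper itself runs without Docling.

```python
from typing import Callable, Iterable, Tuple


def first_successful(
    source: str, attempts: Iterable[Tuple[str, Callable[[str], str]]]
) -> Tuple[str, str]:
    """Run each (label, convert) pair in order and return the first result."""
    errors = []
    for label, convert in attempts:
        try:
            return label, convert(source)
        except Exception as exc:  # broad on purpose: any failure means "try the next one"
            errors.append(f"{label}: {exc!r}")
    raise RuntimeError("all attempts failed: " + "; ".join(errors))


def docling_attempts():
    """Build the Docling attempts lazily (requires Docling to be installed)."""
    from docling.backend.pypdfium2_backend import PyPdfiumDocumentBackend
    from docling.datamodel.base_models import InputFormat
    from docling.document_converter import DocumentConverter, PdfFormatOption

    default = DocumentConverter()
    pdfium = DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(backend=PyPdfiumDocumentBackend)
        }
    )

    def to_markdown(converter):
        return lambda src: converter.convert(src).document.export_to_markdown()

    return [("default", to_markdown(default)), ("pypdfium2", to_markdown(pdfium))]
```

Usage would look like `label, markdown = first_successful("https://arxiv.org/pdf/2408.09869", docling_attempts())`, where `label` tells you which backend produced the output.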

Checking and Updating Dependencies

Ensure all dependencies are updated to compatible versions:

bash

pip list | grep -E "(docling|torch|numpy)"

# Update main dependencies
pip install --upgrade torch numpy
pip install --upgrade docling docling-parse

Note: PyTorch and NumPy releases are not always compatible with each other, so make sure the installed pair is a supported combination.

Setting Up Offline Mode

If you’re working in an environment without internet access, you need to download all required models beforehand:

Download models on a machine with internet:

bash

# Create a directory for the models
mkdir -p ~/docling_models_offline

# Download the default models, then the RapidOCR models, into that directory
docling-tools models download -o ~/docling_models_offline
docling-tools models download -o ~/docling_models_offline rapidocr

Transfer models to the target machine and set the environment variable:

python

import os

# Set this before creating the DocumentConverter
os.environ["DOCLING_ARTIFACTS_PATH"] = "/path/to/your/models"

Run Docling with the correctly configured model path.
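The model path can also be passed to the converter explicitly instead of relying on the environment. The following is a sketch under stated assumptions: it uses the `DOCLING_ARTIFACTS_PATH` variable name and the `artifacts_path` pipeline option from Docling's documentation, while `resolve_artifacts_path` and `build_offline_converter` are hypothetical helper names introduced for illustration.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_artifacts_path(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the directory named by DOCLING_ARTIFACTS_PATH, or None if unset."""
    value = env.get("DOCLING_ARTIFACTS_PATH", "").strip()
    return Path(value).expanduser() if value else None


def build_offline_converter(artifacts_path: Path):
    """Create a converter that loads models from artifacts_path (needs Docling)."""
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    # Fail early with a clear message instead of a download attempt later on.
    if not artifacts_path.is_dir():
        raise FileNotFoundError(f"Model directory not found: {artifacts_path}")

    options = PdfPipelineOptions(artifacts_path=str(artifacts_path))
    return DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=options)}
    )
```

Checking that the directory exists before building the converter makes a wrong path fail immediately on the offline machine.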

Alternative Conversion Approaches

1. Using Docker Container

A Docker container can solve dependency and model path issues:

dockerfile

# Dockerfile
FROM python:3.11-slim

# Install Docling and download the models at build time
RUN pip install --no-cache-dir "docling[rapidocr]" && \
    docling-tools models download rapidocr

# Run the conversion when the container starts
CMD ["python", "-c", "from docling.document_converter import DocumentConverter; print(DocumentConverter().convert('https://arxiv.org/pdf/2408.09869').document.export_to_markdown())"]

2. Conversion via CLI

Try using the Command Line Interface:

bash

docling https://arxiv.org/pdf/2408.09869 --to md --output ./output

3. Using Alternative Libraries

If the problem persists, consider alternative approaches:

python

# Alternative: pdf2image + OCR
from pdf2image import convert_from_path
import pytesseract

# Convert PDF to images
images = convert_from_path("document.pdf")

# Process each image
for i, image in enumerate(images):
    text = pytesseract.image_to_string(image, lang='rus+eng')
    print(f"Page {i+1}: {text}")

Checking Version Compatibility

Check the compatibility of component versions:

Component	Recommended Version	Minimum Version
Docling	2.3+	2.0+
docling-parse	Latest	1.0+
PyTorch	2.2.2+	2.0+
NumPy	2.0+	1.21+

To check compatibility, run:

python

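from importlib.metadata import PackageNotFoundError, version

from packaging.requirements import Requirement

# Minimum versions from the table above
required_packages = [
    'docling>=2.3',
    'docling-parse>=1.0',
    'torch>=2.0',
    'numpy>=1.21'
]

for spec in required_packages:
    req = Requirement(spec)
    try:
        installed = version(req.name)
    except PackageNotFoundError:
        print(f"✗ {req.name} not found")
        continue
    # pkg_resources is deprecated, so compare with packaging instead
    if req.specifier.contains(installed, prereleases=True):
        print(f"✓ {req.name} {installed} satisfies {spec}")
    else:
        print(f"✗ Version conflict: {req.name} {installed} does not satisfy {spec}")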

Conclusion

The RuntimeError: filename does not exists error in Docling is usually related to missing model resources. The main solutions include:

Manually downloading models using docling-tools models download
Complete reinstall with cache cleanup
Setting up offline mode for working without internet
Using alternative backends or approaches
Checking version compatibility of all dependencies

Since the code works on one computer but not another, the most likely cause is a difference in the installation environment: either incomplete model downloads or dependency version conflicts. Start with the manual model download and the version compatibility check.
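Because the same script behaves differently on two machines, a direct comparison of the two environments can narrow things down. The sketch below captures a small snapshot that you can run on both computers and then diff. `environment_snapshot` and `diff_snapshots` are hypothetical helpers, and the package list is only a guess at what matters here.

```python
import json
import platform
import sys
from importlib.metadata import PackageNotFoundError, version

# Packages mentioned in this answer; extend the list as needed.
PACKAGES = ["docling", "docling-parse", "torch", "numpy"]


def environment_snapshot(packages=PACKAGES) -> dict:
    """Collect interpreter, platform, install prefix and package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "prefix": sys.prefix,
        "prefix_is_ascii": sys.prefix.isascii(),
        "packages": versions,
    }


def diff_snapshots(a: dict, b: dict) -> dict:
    """Return {key: (a_value, b_value)} for top-level keys whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    print(json.dumps(environment_snapshot(), indent=2))
```

Save the JSON output from each machine, load both with `json.load`, and pass them to `diff_snapshots` to see only the fields that differ.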

Sources

Running docling offline with pre-downloaded models - GitHub Issue #232
Installation - Docling Documentation
Manual download of default models - GitHub Discussion #2089
docling-tools models download rapidocr not actually being used by default DocumentConverter - Issue #2500
FAQ - Docling Documentation
