NeuroAgent

Fix 'filename does not exists' error in Docling

Solution for missing resource files in Docling when converting PDF to Markdown. Step-by-step methods to fix the error and prevent future issues.

Question

How do I fix the ‘filename does not exists’ error when using the Docling library to convert PDF to Markdown?

I’m trying to run the example code from the official Docling documentation:

python
from docling.document_converter import DocumentConverter

source = "https://arxiv.org/pdf/2408.09869"

converter = DocumentConverter()
result = converter.convert(source)

print(result.document.export_to_markdown())

But I’m getting the following error:

RuntimeError: filename does not exists: C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling_parse\pdf_resources_v2/glyphs//standard/additional.dat

Full traceback:

Traceback (most recent call last):
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\datamodel\document.py", line 171, in __init__
    self._init_doc(backend, path_or_stream)
    ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\datamodel\document.py", line 215, in _init_doc
    self._backend = backend(self, path_or_stream=path_or_stream)
                    ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\backend\docling_parse_v4_backend.py", line 206, in __init__
    self.parser = DoclingPdfParser(loglevel="fatal")
                  ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling_parse\pdf_parser.py", line 579, in __init__
    self.parser = pdf_parser_v2(level=loglevel)
                  ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
RuntimeError: filename does not exists: C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling_parse\pdf_resources_v2/glyphs//standard/additional.dat
2025-10-28 20:18:53,881 - INFO - Going to convert document batch...
Traceback (most recent call last):
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\docling_test\docling1.py", line 10, in <module>
    result = converter.convert(source)
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\pydantic\_internal\_validate_call.py", line 39, in wrapper_function
    return wrapper(*args, **kwargs)
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\pydantic\_internal\_validate_call.py", line 136, in __call__
    res = self.__pydantic_validator__.validate_python(pydantic_core.ArgsKwargs(args, kwargs))
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\document_converter.py", line 237, in convert
    return next(all_res)
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\document_converter.py", line 260, in convert_all
    for conv_res in conv_res_iter:
                    ^^^^^^^^^^^^^
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\document_converter.py", line 332, in _convert
    for item in map(
                ~~~^
        process_func,
        ^^^^^^^^^^^^^
        input_batch,
        ^^^^^^^^^^^^
    ):
    ^
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\document_converter.py", line 379, in _process_document
    conv_res = self._execute_pipeline(in_doc, raises_on_error=raises_on_error)
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\document_converter.py", line 415, in _execute_pipeline
    raise ConversionError(f"Input document {in_doc.file} is not valid.")
    raise ConversionError(f"Input document {in_doc.file} is not valid.")
docling.exceptions.ConversionError: Input document 2408.09869v5.pdf is not valid.
-> Cannot close object, library is destroyed. This may cause a memory leak!

The same code works successfully on another computer. What could be the problem and how can I fix it?

NeuroAgent

Error filename does not exists: ...additional.dat in Docling Library

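The error filename does not exists: ...additional.dat is raised by docling-parse, the native PDF parser that Docling uses by default, when it cannot open one of the resource files bundled with the docling_parse package. Either the file is genuinely missing (a broken or incomplete installation), or it exists but the parser cannot open it under that path. Since the same code works on another computer, the cause is specific to this environment, and the methods below address the most common variants.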


Causes of the Error

The error indicates that the parser could not open additional.dat in the docling_parse resource directory (docling_parse\pdf_resources_v2). The main causes include:

  1. Incomplete or corrupted installation - resource files were not installed correctly
  2. Access permission issues - the library cannot access resource files
  3. Version conflicts - mismatch between Docling versions and its dependencies
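  4. Missing environment variables - the resource path is not configured
  5. Non-ASCII characters in the path - the path in the error contains Cyrillic characters (C:\Users\Макс\...). The native parser may fail to open files under such paths on Windows even when they exist, which would also explain why the same code works on another computer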

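Based on your traceback, the failure occurs while the default backend (docling_parse_v4_backend.py, line 206) creates DoclingPdfParser, whose constructor builds the native pdf_parser_v2 object (docling_parse\pdf_parser.py, line 579). This happens before any page of the PDF is read, so the downloaded file is not at fault: the later ConversionError ("Input document 2408.09869v5.pdf is not valid") is only a consequence of the backend failing to initialize.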
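The path in the error contains Cyrillic characters (C:\Users\Макс\...), and native code on Windows sometimes fails to open files under such paths. A quick way to check is to find where docling_parse is installed and list any path components with non-ASCII characters. This is a minimal diagnostic sketch using only the standard library; the helper names non_ascii_parts and docling_parse_dir are hypothetical, and the idea that non-ASCII paths break the parser is an assumption to verify, not documented behavior.

```python
import importlib.util
from pathlib import PurePath
from typing import List, Optional


def non_ascii_parts(path: str) -> List[str]:
    """Return the components of `path` that contain non-ASCII characters."""
    return [part for part in PurePath(path).parts if not part.isascii()]


def docling_parse_dir() -> Optional[str]:
    """Locate the installed docling_parse package without importing it (hypothetical helper)."""
    spec = importlib.util.find_spec("docling_parse")
    if spec is None or not spec.submodule_search_locations:
        return None  # not installed in this interpreter
    return list(spec.submodule_search_locations)[0]


if __name__ == "__main__":
    location = docling_parse_dir()
    if location is None:
        print("docling_parse is not installed in this interpreter")
    else:
        suspicious = non_ascii_parts(location)
        print(f"docling_parse location: {location}")
        print(f"Non-ASCII path components: {suspicious or 'none'}")
```

If a component such as Макс is reported, recreating the virtual environment in an ASCII-only folder (for example C:\projects\RAG\.venv) is a cheap way to confirm or rule out this cause.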

Main Solutions to the Problem

1. Setting artifact paths via environment variable

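Docling can load pre-downloaded model artifacts (layout and table-structure model weights) from the directory given in the DOCLING_ARTIFACTS_PATH environment variable. Note that additional.dat is not one of these artifacts: it ships inside the docling_parse package, so this setting alone is unlikely to fix this particular error, although it is useful for offline setups: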

python
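import os

# Set the variable before importing docling, to be safe: docling builds its
# settings object from DOCLING_* environment variables at import time.
# The directory should hold pre-downloaded model weights; replace the placeholder.
os.environ["DOCLING_ARTIFACTS_PATH"] = "/path/to/artifacts/directory"

from docling.document_converter import DocumentConverter  # noqa: E402

source = "https://arxiv.org/pdf/2408.09869"
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())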
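Pointing DOCLING_ARTIFACTS_PATH at a missing or empty directory is likely to just move the failure to model loading, so it can help to validate the directory before exporting the variable. A minimal sketch; set_docling_artifacts_path is a hypothetical helper, not part of Docling, and it only checks that the directory is non-empty, not that it holds the right models.

```python
import os
from pathlib import Path


def set_docling_artifacts_path(path: str) -> Path:
    """Check that `path` is a non-empty directory, then export it as DOCLING_ARTIFACTS_PATH.

    Hypothetical helper: call it before importing docling so the value is
    already in the environment when docling reads its settings.
    """
    artifacts = Path(path).expanduser().resolve()
    # A missing or empty directory would only move the failure to model loading
    if not artifacts.is_dir() or not any(artifacts.iterdir()):
        raise FileNotFoundError(f"Artifacts directory is missing or empty: {artifacts}")
    os.environ["DOCLING_ARTIFACTS_PATH"] = str(artifacts)
    return artifacts
```

For example, call it with the directory populated by docling-tools models download, and only then import DocumentConverter.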

2. Reinstalling Docling with cache clearing

bash
# Complete library reinstall
pip uninstall docling docling-parse
pip install docling --no-cache-dir

# Or using pip cache purge
pip cache purge
pip install docling
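To rule out a version conflict, compare the installed versions with those on the computer where the same code works. A minimal sketch using the standard library's importlib.metadata; the package list is an assumption based on the distributions that appear in the traceback.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Distributions visible in the failing call chain (PyPI names)
PACKAGES = ("docling", "docling-parse", "pydantic")


def installed_versions(packages: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is absent."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:15} {ver or 'not installed'}")
```

Running the same script on both machines makes any mismatch in docling or docling-parse immediately visible.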

3. Configuring converter options with specified paths

python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

# Directory with pre-downloaded model weights (for example, fetched with
# `docling-tools models download`). It must already contain the models:
# pointing artifacts_path at an empty directory makes model loading fail.
artifacts_path = "./docling_artifacts"

# Configure options with explicit paths
pdf_pipeline_options = PdfPipelineOptions(
    artifacts_path=artifacts_path,
    generate_page_images=False,
    generate_picture_images=False,
)

# format_options maps an InputFormat to a PdfFormatOption, not to a plain dict
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pdf_pipeline_options)
    }
)

source = "https://arxiv.org/pdf/2408.09869"
result = converter.convert(source)
print(result.document.export_to_markdown())

Additional Troubleshooting Methods

4. Checking and correcting installation paths

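Verify that the resource file physically exists in the installed docling_parse package. If it does and the parser still reports it missing, the path itself is a more likely culprit than the installation: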

python
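import importlib.util
from pathlib import Path

# Locate the docling_parse package used by this interpreter (without importing it)
spec = importlib.util.find_spec("docling_parse")
if spec is None or not spec.submodule_search_locations:
    print("docling_parse is not installed in this environment")
else:
    package_dir = Path(list(spec.submodule_search_locations)[0])
    resource_path = package_dir / "pdf_resources_v2" / "glyphs" / "standard" / "additional.dat"
    if resource_path.exists():
        print(f"File found: {resource_path}")
    else:
        print(f"File missing: {resource_path}")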

5. Using an alternative backend

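Because PyPdfiumDocumentBackend extracts text with pypdfium2 instead of docling-parse, it should not need the docling_parse resource files at all, which makes it a practical workaround for this error: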

python
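from docling.backend.pypdfium2_backend import PyPdfiumDocumentBackend
from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, PdfFormatOption

# The backend is chosen per input format via PdfFormatOption;
# DocumentConverter has no backend_class argument
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(backend=PyPdfiumDocumentBackend)
    }
)

source = "https://arxiv.org/pdf/2408.09869"
result = converter.convert(source)
print(result.document.export_to_markdown())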

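6. Disabling unnecessary features to reduce resource requirements

This reduces memory use and model loading, but it does not change the PDF backend, so on its own it will not avoid the additional.dat error: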

python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

pdf_pipeline_options = PdfPipelineOptions(
    do_ocr=False,  # Disable OCR if not required
    do_table_structure=False,  # Disable table recognition
    generate_page_images=False,
    generate_picture_images=False,
)

# format_options maps an InputFormat to a PdfFormatOption, not to a plain dict
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pdf_pipeline_options)
    }
)

Preventing Future Errors

7. Using a virtual environment with isolated installation

bash
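# Create a clean environment; if the current path contains non-ASCII characters
# (such as a Cyrillic user name), create it in an ASCII-only folder, e.g. C:\projects\RAG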
python -m venv clean_docling_env
source clean_docling_env/bin/activate  # For Linux/Mac
# or
clean_docling_env\Scripts\activate  # For Windows

# Install in clean environment
pip install docling

8. Regularly updating the library

bash
pip install --upgrade docling docling-parse

9. Enabling debug logging

Add logging for issue diagnostics:

python
import logging
from docling.document_converter import DocumentConverter

# Configure logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

converter = DocumentConverter()
try:
    result = converter.convert("https://arxiv.org/pdf/2408.09869")
    print(result.document.export_to_markdown())
except Exception:
    # logger.exception records the message together with the full traceback
    logger.exception("Conversion error")

Verification and Diagnostics

10. Installation test verification

Create a test script to verify correct installation:

python
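import importlib.util
from pathlib import Path


def test_docling_installation():
    try:
        # Check that the resource file named in the error message exists
        spec = importlib.util.find_spec("docling_parse")
        if spec is None or not spec.submodule_search_locations:
            print("✗ docling_parse is not installed")
            return False
        resource_dir = Path(list(spec.submodule_search_locations)[0]) / "pdf_resources_v2"
        additional = resource_dir / "glyphs" / "standard" / "additional.dat"
        print(f"{'✓' if additional.exists() else '✗'} {additional}")

        # Reproduce the step that fails in the traceback: creating the native parser.
        # DocumentConverter() alone does not do this; it happens during convert().
        from docling_parse.pdf_parser import DoclingPdfParser
        DoclingPdfParser(loglevel="fatal")
        print("✓ docling-parse parser initialized successfully")
        return True

    except Exception as e:
        print(f"✗ Error during verification: {e}")
        return False


if __name__ == "__main__":
    test_docling_installation()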

If all methods fail, you may need to consult the official Docling repository on GitHub or review existing issues, as the problem may be related to a specific version of the library or a system-specific issue.

Sources

  1. Failed to parse pdf from docling >= 2.24 · Issue #1064 · docling-project/docling
  2. Docling FAQ - Official Documentation
  3. Docling CLI reference - Official Documentation
  4. Docling Usage - Official Documentation
  5. Microsoft Q&A - Azure Function crashes with Docling

Conclusion

The filename does not exists error in Docling comes from docling-parse being unable to open a resource file bundled with the docling_parse package. Start by checking whether additional.dat exists in the installation directory and whether its path contains non-ASCII characters; then reinstall the library, or recreate the virtual environment under an ASCII-only path. Switching to the PyPdfium backend is a workaround that avoids the failing docling-parse parser. DOCLING_ARTIFACTS_PATH and the artifacts_path option only control where model weights are loaded from, so they help with offline setups rather than with this error. To prevent similar issues in the future, use clean virtual environments and keep docling and docling-parse up to date.