How do I fix the ‘filename does not exists’ error when using the Docling library to convert PDF to Markdown?
I’m trying to run the example code from the official Docling documentation:
from docling.document_converter import DocumentConverter
source = "https://arxiv.org/pdf/2408.09869"
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())
But I’m getting the following error:
RuntimeError: filename does not exists: C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling_parse\pdf_resources_v2/glyphs//standard/additional.dat
Full traceback:
Traceback (most recent call last):
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\datamodel\document.py", line 171, in __init__
    self._init_doc(backend, path_or_stream)
    ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\datamodel\document.py", line 215, in _init_doc
    self._backend = backend(self, path_or_stream=path_or_stream)
                    ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\backend\docling_parse_v4_backend.py", line 206, in __init__
    self.parser = DoclingPdfParser(loglevel="fatal")
                  ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling_parse\pdf_parser.py", line 579, in __init__
    self.parser = pdf_parser_v2(level=loglevel)
                  ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
RuntimeError: filename does not exists: C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling_parse\pdf_resources_v2/glyphs//standard/additional.dat
2025-10-28 20:18:53,881 - INFO - Going to convert document batch...
Traceback (most recent call last):
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\docling_test\docling1.py", line 10, in <module>
    result = converter.convert(source)
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\pydantic\_internal\_validate_call.py", line 39, in wrapper_function
    return wrapper(*args, **kwargs)
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\pydantic\_internal\_validate_call.py", line 136, in __call__
    res = self.__pydantic_validator__.validate_python(pydantic_core.ArgsKwargs(args, kwargs))
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\document_converter.py", line 237, in convert
    return next(all_res)
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\document_converter.py", line 260, in convert_all
    for conv_res in conv_res_iter:
                    ^^^^^^^^^^^^^
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\document_converter.py", line 332, in _convert
    for item in map(
                ~~~^
        process_func,
        ^^^^^^^^^^^^^
        input_batch,
        ^^^^^^^^^^^^
    ):
    ^
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\document_converter.py", line 379, in _process_document
    conv_res = self._execute_pipeline(in_doc, raises_on_error=raises_on_error)
  File "C:\Users\Макс\Desktop\VS Code Projects\RAG\.venv\Lib\site-packages\docling\document_converter.py", line 415, in _execute_pipeline
    raise ConversionError(f"Input document {in_doc.file} is not valid.")
docling.exceptions.ConversionError: Input document 2408.09869v5.pdf is not valid.
-> Cannot close object, library is destroyed. This may cause a memory leak!
The same code works successfully on another computer. What could be the problem and how can I fix it?
Error filename does not exists: ...additional.dat in Docling Library
The error filename does not exists: ...additional.dat is raised by docling_parse, the native PDF parser that Docling uses by default, when it cannot open one of its bundled resource files. Either the file is missing or corrupted, or it exists but the parser cannot open it from its path. The issue is usually tied to the installed version of Docling or to the environment configuration, and the methods below address the common variants.
Contents
- Causes of the Error
- Main Solutions to the Problem
- Additional Troubleshooting Methods
- Preventing Future Errors
- Verification and Diagnostics
Causes of the Error
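The error indicates that docling_parse could not open additional.dat in its resource directory (pdf_resources_v2/glyphs/standard). The main causes include:
- Incomplete or corrupted installation - resource files were not installed correctly
- Access permission issues - the library cannot access resource files
- Version conflicts - mismatch between Docling versions and its dependencies
- Missing environment variables - the resource path is not configured
- Non-ASCII characters in the path - native code on Windows may fail to open a file whose path contains characters such as Cyrillic, even though the file exists
Based on your traceback, the problem occurs while the PDF parser is being initialized (DoclingPdfParser in docling_parse, called from docling_parse_v4_backend.py), before any page of the document is read. The input PDF is therefore not at fault: the later ConversionError ("Input document 2408.09869v5.pdf is not valid") is only a consequence of the failed initialization. Note also that every path in the traceback runs through C:\Users\Макс\..., which would explain why the same code works on a computer with a different user folder.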
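Because the paths in the traceback contain Cyrillic characters (Макс), it is worth checking whether the active environment lives under a non-ASCII path, since native code that opens files on Windows may fail on such paths. A minimal standard-library sketch; non_ascii_parts is a hypothetical helper name, and it assumes the virtual environment is activated so that sys.prefix points at the .venv folder:

```python
import sys
from pathlib import Path


def non_ascii_parts(path: str) -> list[str]:
    """Return the components of `path` that contain non-ASCII characters."""
    return [part for part in Path(path).parts if not part.isascii()]


if __name__ == "__main__":
    # sys.prefix is the root of the running environment (the .venv folder when it is active)
    suspicious = non_ascii_parts(sys.prefix)
    if suspicious:
        print(f"Non-ASCII components in {sys.prefix}: {suspicious}")
    else:
        print(f"Environment path is ASCII-only: {sys.prefix}")
```

On the failing machine this should flag the Макс folder; if it does, recreating the virtual environment in an ASCII-only directory is a reasonable next step.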
Main Solutions to the Problem
1. Setting artifact paths via environment variable
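Some reports on GitHub suggest pointing Docling at an explicit artifacts directory. Keep in mind that DOCLING_ARTIFACTS_PATH tells Docling where to find its downloaded models (such as the layout and table-structure models), whereas additional.dat belongs to the docling_parse package, so this setting alone may not resolve this particular error: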
import os
# Set the path to artifacts before importing Docling, so the value is already
# in place in case Docling reads its configuration at import time
os.environ["DOCLING_ARTIFACTS_PATH"] = "/path/to/artifacts/directory"
from docling.document_converter import DocumentConverter
source = "https://arxiv.org/pdf/2408.09869"
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())
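Pointing the variable at an empty or non-existent folder will not help, because Docling expects to find downloaded model files there. A small standard-library guard can validate the folder and export the variable before any docling import runs; set_artifacts_path is a hypothetical helper, not part of Docling:

```python
import os
from pathlib import Path


def set_artifacts_path(path: str) -> Path:
    """Check that `path` is a non-empty directory, then export DOCLING_ARTIFACTS_PATH.

    Call this before importing docling so the variable is already set when Docling starts.
    """
    directory = Path(path).expanduser().resolve()
    # An empty folder cannot contain the model files Docling looks for
    if not directory.is_dir() or not any(directory.iterdir()):
        raise FileNotFoundError(f"Artifacts directory is missing or empty: {directory}")
    os.environ["DOCLING_ARTIFACTS_PATH"] = str(directory)
    return directory
```

Usage: call set_artifacts_path("/path/to/artifacts/directory") at the top of the script, then import DocumentConverter and convert as in the example above.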
2. Reinstalling Docling with cache clearing
# Complete library reinstall (-y skips the confirmation prompt)
pip uninstall -y docling docling-parse
pip install docling --no-cache-dir
# Or using pip cache purge
pip cache purge
pip install docling
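Since the same code works on another computer, comparing the installed versions of the Docling packages on both machines is a quick way to spot a version mismatch before or after reinstalling. A standard-library sketch using importlib.metadata; installed_versions is a hypothetical helper name, and the distribution names are the ones published on PyPI:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_versions(names: list[str]) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is absent."""
    versions: dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Run this on both machines and compare the output line by line
    for name, ver in installed_versions(["docling", "docling-parse", "docling-core"]).items():
        print(f"{name}: {ver or 'not installed'}")
```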
3. Configuring converter options with specified paths
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption
# Directory that must already contain the downloaded Docling models
# (e.g. fetched with `docling-tools models download`); an empty folder will not work
artifacts_path = "./docling_artifacts"
# Configure options with an explicit artifacts path
pdf_pipeline_options = PdfPipelineOptions(
    artifacts_path=artifacts_path,
    generate_page_images=False,
    generate_picture_images=False,
)
# format_options is keyed by InputFormat and takes PdfFormatOption objects, not plain dicts
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pdf_pipeline_options)
    }
)
source = "https://arxiv.org/pdf/2408.09869"
result = converter.convert(source)
print(result.document.export_to_markdown())
Additional Troubleshooting Methods
4. Checking and correcting installation paths
If the problem persists, verify the physical presence of resource files:
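import sys
from pathlib import Path
# Location of the file named in the error message, relative to site-packages
rel = Path("docling_parse", "pdf_resources_v2", "glyphs", "standard", "additional.dat")
# Collect matches instead of printing a "missing" line for every sys.path entry
found = [Path(p) / rel for p in sys.path if (Path(p) / rel).exists()]
if found:
    for resource_path in found:
        print(f"File found: {resource_path}")
else:
    print(f"File missing: no {rel} under any sys.path entry")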
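If the file is reported as found but the parser still claims it does not exist, the installation is probably intact and the problem lies in how the native code opens the path. The sketch below separates the two cases: it locates the package with importlib.util.find_spec (without importing it) and then tries to read the file. diagnose and package_dir are hypothetical helper names; the relative path is copied from the error message.

```python
import importlib.util
from pathlib import Path
from typing import Optional

# Location of the failing file relative to the docling_parse package (from the error message)
GLYPH_FILE = Path("pdf_resources_v2", "glyphs", "standard", "additional.dat")


def package_dir(package: str) -> Optional[Path]:
    """Return the directory of an installed top-level package without importing it."""
    spec = importlib.util.find_spec(package)
    if spec is None:
        return None
    if spec.submodule_search_locations:
        return Path(list(spec.submodule_search_locations)[0])
    return Path(spec.origin).parent if spec.origin else None


def diagnose(package: str = "docling_parse", rel_path: Path = GLYPH_FILE) -> str:
    """Report whether a package resource is missing, unreadable, or readable."""
    base = package_dir(package)
    if base is None:
        return "package not installed"
    target = base / rel_path
    if not target.is_file():
        return f"missing: {target}"
    try:
        with target.open("rb") as handle:
            handle.read(1)  # Succeeds only if Python itself can open the file
    except OSError as exc:
        return f"unreadable: {target} ({exc})"
    return f"ok: {target}"


if __name__ == "__main__":
    print(diagnose())
```

If this prints ok: ... on the failing machine, the file is present and readable from Python, which points away from a broken installation and toward a path-handling issue in the native parser (for example, non-ASCII characters in the path).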
5. Using an alternative backend
If the default backend causes issues, try the pypdfium2-based backend, which does not go through the docling_parse parser that fails in your traceback:
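from docling.backend.pypdfium2_backend import PyPdfiumDocumentBackend
from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, PdfFormatOption
# Use the pypdfium2 backend for PDFs instead of the default docling_parse backend
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(backend=PyPdfiumDocumentBackend)
    }
)
source = "https://arxiv.org/pdf/2408.09869"
result = converter.convert(source)
print(result.document.export_to_markdown())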
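6. Disabling optional pipeline features
These options lower the compute load (OCR, table recognition, image generation). They do not change how docling_parse loads its glyph files, but they can help isolate whether another pipeline stage is failing: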
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import AcceleratorOptions, PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption
pdf_pipeline_options = PdfPipelineOptions(
    accelerator_options=AcceleratorOptions(device="auto"),
    do_ocr=False,  # Disable OCR if not required
    do_table_structure=False,  # Disable table recognition
    generate_page_images=False,
    generate_picture_images=False,
)
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pdf_pipeline_options)
    }
)
Preventing Future Errors
7. Using a virtual environment with isolated installation
# Create a clean environment
python -m venv clean_docling_env
source clean_docling_env/bin/activate # For Linux/Mac
# or
clean_docling_env\Scripts\activate # For Windows
# Install in clean environment
pip install docling
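Given that the failing traceback runs through a user folder with Cyrillic characters, it may help to create the clean environment somewhere whose full path is ASCII-only (for example a hypothetical C:\venvs\docling). The same step can be scripted with the standard-library venv module; create_ascii_venv is a hypothetical helper that simply refuses non-ASCII targets:

```python
import venv
from pathlib import Path


def create_ascii_venv(target: str, with_pip: bool = True) -> Path:
    """Create a virtual environment at `target`, refusing paths with non-ASCII characters."""
    path = Path(target).resolve()
    if not str(path).isascii():
        raise ValueError(f"Choose an ASCII-only location instead of: {path}")
    # Creates the directory tree and pyvenv.cfg; with_pip bootstraps pip via ensurepip
    venv.create(path, with_pip=with_pip)
    return path
```

Usage: create_ascii_venv(r"C:\venvs\docling"), then activate it with C:\venvs\docling\Scripts\activate and run pip install docling inside it.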
8. Regularly updating the library
pip install --upgrade docling
9. Enabling debug logging
Add logging for issue diagnostics:
import logging
from docling.document_converter import DocumentConverter
# Configure logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)
converter = DocumentConverter()
try:
    result = converter.convert("https://arxiv.org/pdf/2408.09869")
    print(result.document.export_to_markdown())
except Exception:
    # logger.exception also records the full traceback
    logger.exception("Conversion error")
Verification and Diagnostics
10. Installation test verification
Create a test script to verify correct installation:
import sys
from pathlib import Path
from docling.document_converter import DocumentConverter
def test_docling_installation():
    try:
        # Check basic functionality; note that the constructor does not load docling_parse yet,
        # the error from the question only appears during convert()
        converter = DocumentConverter()
        print("✓ Basic converter created successfully")
        # Check resource paths
        for path in sys.path:
            resource_dir = Path(path) / "docling_parse" / "pdf_resources_v2"
            if resource_dir.exists():
                print(f"✓ Resource directory found: {resource_dir}")
                return True
        print("✗ Resource directory not found")
        return False
    except Exception as e:
        print(f"✗ Error during verification: {e}")
        return False
if __name__ == "__main__":
    test_docling_installation()
If all methods fail, you may need to consult the official Docling repository on GitHub or review existing issues, as the problem may be related to a specific version of the library or a system-specific issue.
Sources
- Failed to parse pdf from docling >= 2.24 · Issue #1064 · docling-project/docling
- Docling FAQ - Official Documentation
- Docling CLI reference - Official Documentation
- Docling Usage - Official Documentation
- Microsoft Q&A - Azure Function crashes with Docling
Conclusion
The filename does not exists error in Docling means that docling_parse could not open one of its bundled resource files. Typical fixes are reinstalling the library (ideally in a clean environment whose path contains only ASCII characters), checking that the resource file is actually present in the installation directory, and, if the problem persists, switching to the pypdfium2 backend. Setting DOCLING_ARTIFACTS_PATH and configuring converter options are worth trying as well, but they govern Docling's model files rather than docling_parse's resources. To prevent similar issues in the future, use clean virtual environments and regularly update the library.