
How to Read JSON Files in Python Correctly

Learn the proper way to read JSON files in Python and fix TypeError and ValueError exceptions by choosing correctly between json.load() and json.loads().


How do I properly read a JSON file in Python? I’m getting a ‘TypeError: expected string or buffer’ error when using json.loads() and a ‘ValueError: Extra data’ error when using json.load(). What’s the correct way to parse a JSON file like this example?

Reading JSON files in Python requires understanding the fundamental difference between json.load() and json.loads() methods. The errors you’re encountering—TypeError with json.loads() and ValueError with json.load()—stem from using the wrong method for your data type. json.load() is designed to read from file objects, while json.loads() parses string data already loaded into memory.


Understanding the Difference Between json.load() and json.loads()

When working with JSON data in Python, the most common point of confusion is choosing between json.load() and json.loads(). These two methods serve different purposes and using them interchangeably leads to the exact errors you’re experiencing.

The json.load() function is specifically designed to read JSON data from a file object. It takes a file-like object as its parameter and parses the JSON content directly from that file. This is the method you should use when you have a JSON file on disk and want to read its contents into a Python data structure.

On the other hand, json.loads() (note the ‘s’ at the end) is used to parse JSON data that’s already in string form. The ‘s’ stands for “string,” so this method expects a string containing JSON-formatted data and converts it into a Python object.
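A quick way to internalize the pairing: json.dumps() turns a Python object into a JSON string, and json.loads() reverses it. A minimal round trip:

```python
import json

# json.dumps(): Python object -> JSON string
# json.loads(): JSON string -> Python object
obj = {"name": "John", "age": 30}
json_string = json.dumps(obj)
restored = json.loads(json_string)
print(restored == obj)  # True: the round trip preserves the data
```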

Let’s look at the correct implementation for each scenario:

Reading from a file:

python
import json

# Correct usage with json.load()
with open('data.json', 'r') as file:
    data = json.load(file)

Parsing from a string:

python
import json

# Correct usage with json.loads()
json_string = '{"name": "John", "age": 30}'
data = json.loads(json_string)

The key insight here is understanding your data’s current format. If you’re working directly with a file on disk, json.load() is your friend. If you’ve already read the file content into a string variable, then json.loads() is the right choice.

One common mistake developers make is reading the entire file content as a string and then trying to parse it with json.loads() without realizing they could have used json.load() directly on the file object. This approach works but is less efficient and can lead to encoding issues if not handled properly.

Another important consideration is the file mode. When opening JSON files for reading, ‘r’ (text) mode is usually the right choice, and json.load() handles decoding automatically for text files. Since Python 3.6, json.load() also accepts files opened in ‘rb’ (binary) mode and detects UTF-8, UTF-16, or UTF-32 on its own, which can be handy when the encoding isn’t known in advance.
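As a small illustration (using an in-memory io.BytesIO in place of a real file on disk), json.load() in Python 3.6+ also accepts binary streams and detects the Unicode encoding itself:

```python
import io
import json

# A UTF-16-encoded JSON document in an in-memory binary "file"
raw = '{"name": "Żółć", "age": 30}'.encode('utf-16')
data = json.load(io.BytesIO(raw))
print(data["name"])  # Żółć — decoded correctly without specifying an encoding
```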


Common JSON Reading Errors in Python and Their Solutions

The errors you’re encountering are quite common among Python developers working with JSON files. Let’s break down each error and explore their root causes and effective solutions.

TypeError: Expected String or Buffer

This error occurs when you pass a file object to json.loads() instead of a string. The method expects string data but receives a file-like object, resulting in the TypeError. (The exact wording depends on your Python version: Python 2 says “expected string or buffer”, while Python 3 says “the JSON object must be str, bytes or bytearray”.)

The problematic code:

python
import json

# This will raise TypeError
with open('data.json', 'r') as file:
    data = json.loads(file)  # Error: expected string or buffer

The solution:

python
import json

# Correct approach using json.load()
with open('data.json', 'r') as file:
    data = json.load(file)  # This works correctly

Or if you absolutely need to use json.loads() for some reason:

python
import json

# Alternative approach with json.loads()
with open('data.json', 'r') as file:
    content = file.read()  # Read the file content as a string
    data = json.loads(content)  # Now this will work

The first solution using json.load() is preferred because it’s more efficient and handles encoding issues automatically.

ValueError: Extra Data

The “Extra data” error typically occurs when you try to parse JSON data that contains multiple JSON values concatenated together. Standard JSON expects a single object or array at the top level. (In Python 3 this is raised as json.JSONDecodeError, a subclass of ValueError, which is why older material reports it as a ValueError.)
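The error is easy to reproduce directly from a string; the msg and pos attributes read below are part of json.JSONDecodeError in Python 3:

```python
import json

# Two top-level JSON objects back to back trigger "Extra data"
try:
    json.loads('{"name": "John"}{"name": "Jane"}')
except json.JSONDecodeError as err:
    error_message = err.msg   # "Extra data"
    error_position = err.pos  # index where the surplus content begins
print(error_message, error_position)
```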

Problematic scenarios that cause this error:

  1. Multiple JSON objects in a file:
json
{"name": "John"}{"name": "Jane"} # Invalid JSON
  2. JSON content followed by additional text:
json
{"name": "John"} # Some additional text here

Solutions for these scenarios:

For multiple JSON objects:
The most common solution is to use JSON Lines format (also known as JSONL), where each line contains a valid JSON object:

json
{"name": "John"}
{"name": "Jane"}
{"age": 25}

To parse such a file:

python
import json

json_objects = []
with open('data.jsonl', 'r') as file:
    for line in file:
        line = line.strip()  # Remove whitespace and newline characters
        if line:  # Skip empty lines
            json_objects.append(json.loads(line))
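If the concatenated objects are not newline-separated, the standard library’s JSONDecoder.raw_decode() can still walk through them one at a time: it returns each decoded value along with the index where it ended. A sketch (parse_concatenated_json is a hypothetical helper name):

```python
import json

def parse_concatenated_json(text):
    """Parse back-to-back JSON values from a single string."""
    decoder = json.JSONDecoder()
    objects = []
    idx = 0
    while idx < len(text):
        # Skip any whitespace between values
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx >= len(text):
            break
        # raw_decode returns (parsed_object, index_after_object)
        obj, end = decoder.raw_decode(text, idx)
        objects.append(obj)
        idx = end
    return objects

print(parse_concatenated_json('{"name": "John"}{"name": "Jane"}'))
```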

For JSON with trailing text:
If you have control over the file creation, ensure it contains only valid JSON. If not, you can extract the JSON portion:

python
import json
import re

with open('data.txt', 'r') as file:
    content = file.read()

# Extract JSON content (assuming it starts with { and ends with })
json_match = re.search(r'{.*}', content, re.DOTALL)
if json_match:
    json_data = json_match.group(0)
    data = json.loads(json_data)
else:
    raise ValueError("No valid JSON found in the file")

Comparison Table: When to Use Each Method

Scenario                | Recommended Method                        | Why
Reading from a file     | json.load()                               | Parses directly from the file object
Parsing a JSON string   | json.loads()                              | Parses string data
Multiple JSON objects   | json.loads() per object (custom parsing)  | Standard JSON allows only one top-level value
Large JSON files        | Stream parsing or chunked reading         | Avoids memory issues
Binary JSON data        | json.load() with ‘rb’ mode                | The parser detects UTF-8/16/32 automatically

Understanding these error scenarios and their solutions will help you troubleshoot JSON parsing issues more effectively and choose the right approach for your specific use case.


Best Practices for Reading JSON Files in Python

To avoid JSON parsing errors and write robust code for handling JSON files, follow these established best practices that ensure reliability and maintainability.

1. Always Use Context Managers for File Handling

The with statement in Python is your best friend when working with files. It ensures proper resource management and automatically closes the file even if errors occur.

python
import json

# Recommended approach
try:
    with open('data.json', 'r', encoding='utf-8') as file:
        data = json.load(file)
except FileNotFoundError:
    print("Error: The file was not found")
except json.JSONDecodeError:
    print("Error: Invalid JSON format")

This approach handles resource cleanup automatically, preventing file handle leaks and potential data corruption.
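To see the cleanup guarantee in action, the sketch below writes a deliberately invalid JSON file to a temporary directory and confirms the file object is closed even though json.load() raised:

```python
import json
import os
import tempfile

# Create a file containing invalid JSON
path = os.path.join(tempfile.mkdtemp(), 'bad.json')
with open(path, 'w', encoding='utf-8') as f:
    f.write('{not valid json')

try:
    with open(path, 'r', encoding='utf-8') as file:
        data = json.load(file)
except json.JSONDecodeError:
    pass

print(file.closed)  # True: the context manager closed the file despite the error
```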

2. Specify Explicit Encoding

JSON files should typically use UTF-8 encoding. While Python’s default encoding might work in many cases, explicitly specifying it makes your code more portable and prevents encoding-related issues.

python
# Explicit UTF-8 encoding (recommended)
with open('data.json', 'r', encoding='utf-8') as file:
    data = json.load(file)

3. Implement Proper Error Handling

JSON operations can fail for various reasons. Implementing comprehensive error handling makes your code more resilient.

python
import json

def load_json_file(file_path):
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            return json.load(file)
    except FileNotFoundError:
        print(f"Error: File '{file_path}' not found")
        return None
    except json.JSONDecodeError as e:
        print(f"Error: Invalid JSON format in '{file_path}': {e}")
        return None
    except Exception as e:
        print(f"Unexpected error loading '{file_path}': {e}")
        return None

# Usage
data = load_json_file('data.json')
if data is not None:
    # Process the data
    pass

4. Validate JSON Structure Before Processing

For production applications, consider validating the JSON structure against an expected schema before processing the data. This can prevent downstream issues when the data doesn’t match expectations.

python
import json
from jsonschema import validate, ValidationError

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"}
    },
    "required": ["name", "age"]
}

try:
    with open('user.json', 'r', encoding='utf-8') as file:
        data = json.load(file)

    # Validate against the schema
    validate(instance=data, schema=schema)
    print("JSON data is valid")

    # Process the data
    print(f"User: {data['name']}, Age: {data['age']}")

except json.JSONDecodeError as e:
    print(f"Invalid JSON: {e}")
except ValidationError as e:
    print(f"Validation error: {e}")

5. Handle Large JSON Files Efficiently

For large JSON files that might not fit in memory, consider alternative approaches:

Streaming approach for JSON Lines (JSONL) files:

python
import json

def process_large_jsonl(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        for line in file:
            line = line.strip()
            if line:  # Skip empty lines
                try:
                    data = json.loads(line)
                    # Process each JSON object
                    yield data
                except json.JSONDecodeError:
                    print(f"Skipping invalid line: {line}")
                    continue

# Usage
for item in process_large_jsonl('large_data.jsonl'):
    # Process each item
    print(item)

Chunked reading for large single JSON objects:

python
import json
import ijson

def stream_large_json(file_path):
    with open(file_path, 'rb') as file:
        # Process the JSON incrementally, item by item
        for item in ijson.items(file, 'item'):
            yield item

# Usage
for item in stream_large_json('large_data.json'):
    # Process each item
    print(item)

6. Create Reusable JSON Utilities

Encapsulate common JSON operations in utility functions for better code organization and reusability.

python
import json
import os

class JsonHandler:
    @staticmethod
    def load_json(file_path):
        """Load JSON data from a file with error handling."""
        if not os.path.exists(file_path):
            raise FileNotFoundError(f"File not found: {file_path}")

        try:
            with open(file_path, 'r', encoding='utf-8') as file:
                return json.load(file)
        except json.JSONDecodeError as e:
            raise ValueError(f"Invalid JSON format in {file_path}: {e}")

    @staticmethod
    def save_json(data, file_path, indent=2):
        """Save data to a JSON file with error handling."""
        try:
            with open(file_path, 'w', encoding='utf-8') as file:
                json.dump(data, file, indent=indent, ensure_ascii=False)
        except Exception as e:
            raise IOError(f"Failed to save JSON to {file_path}: {e}")

    @staticmethod
    def validate_json(json_data, schema):
        """Validate JSON data against a schema."""
        from jsonschema import validate
        try:
            validate(instance=json_data, schema=schema)
            return True
        except Exception:
            return False

# Usage
try:
    data = JsonHandler.load_json('data.json')
    # Process data...
    JsonHandler.save_json(data, 'output.json')
except Exception as e:
    print(f"Error: {e}")

7. Performance Considerations

For performance-critical applications, consider these optimization techniques:

  1. Use ujson if available: The ujson library provides a drop-in replacement for Python’s json module with significantly better performance.
python
# Install with: pip install ujson
import ujson

with open('data.json', 'r', encoding='utf-8') as file:
    data = ujson.load(file)
  2. Minimize file operations: Read the file once and work with the in-memory data structure rather than repeatedly accessing the file.

  3. Use appropriate data types: Convert JSON data to appropriate Python data types early in the processing pipeline.

  4. Consider asynchronous I/O for very large files: For extremely large files, consider using asynchronous file operations to prevent blocking the main thread.
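The “appropriate data types” tip can often be applied during parsing itself: json.load() and json.loads() accept conversion hooks such as parse_float and parse_int, so values get the right type as they are parsed rather than in a second pass. For example, using decimal.Decimal to avoid binary float rounding:

```python
import json
from decimal import Decimal

# parse_float converts every JSON number with a fractional part
# via Decimal as it is parsed, avoiding float rounding error
data = json.loads('{"price": 19.99, "qty": 3}', parse_float=Decimal)
print(type(data["price"]).__name__)   # Decimal
print(data["price"] + data["price"])  # exactly 39.98
```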

By following these best practices, you’ll create more robust, maintainable, and efficient code for handling JSON files in Python applications.


Advanced JSON Parsing Techniques

Once you’ve mastered the basics of JSON file reading in Python, you can explore more advanced techniques to handle complex scenarios, improve performance, and integrate JSON parsing seamlessly into your applications.

1. Custom JSON Decoders

Python’s json module allows you to create custom decoders to handle non-standard JSON formats or to convert JSON data into custom Python objects.

python
import json
from datetime import datetime

class CustomJSONDecoder(json.JSONDecoder):
    def __init__(self, *args, **kwargs):
        json.JSONDecoder.__init__(self, object_hook=self.object_hook, *args, **kwargs)

    def object_hook(self, obj):
        # Convert date strings to datetime objects
        if 'date' in obj and isinstance(obj['date'], str):
            try:
                obj['date'] = datetime.strptime(obj['date'], '%Y-%m-%d')
            except ValueError:
                pass
        return obj

# Usage
json_str = '{"name": "Event", "date": "2023-12-25"}'
data = json.loads(json_str, cls=CustomJSONDecoder)
print(data['date'])  # datetime object instead of a string

This approach is particularly useful when you need to perform type conversion or custom processing during JSON parsing.

2. Handling Different JSON Formats

Real-world JSON data often comes in various formats. Understanding how to handle different structures makes your code more versatile.

Array of objects:

json
[
    {"id": 1, "name": "Item 1"},
    {"id": 2, "name": "Item 2"}
]
python
with open('array_data.json', 'r') as file:
    items = json.load(file)

for item in items:
    print(f"ID: {item['id']}, Name: {item['name']}")

Nested objects:

json
{
    "user": {
        "id": 123,
        "profile": {
            "name": "John",
            "preferences": {
                "theme": "dark"
            }
        }
    }
}
python
with open('nested_data.json', 'r') as file:
    data = json.load(file)

# Access nested values
user_id = data['user']['id']
theme = data['user']['profile']['preferences']['theme']

Mixed data structures:

python
def process_mixed_json(file_path):
    with open(file_path, 'r') as file:
        data = json.load(file)

    if isinstance(data, list):
        # Handle an array of objects
        return [process_item(item) for item in data]
    elif isinstance(data, dict):
        # Handle a single object
        return process_item(data)
    else:
        raise ValueError("Unsupported JSON structure")

def process_item(item):
    """Process an individual JSON item."""
    # Custom processing logic
    return item

3. Working with JSON Schema

JSON Schema provides a powerful way to validate and document JSON data structures. Integrating schema validation into your JSON parsing workflow ensures data consistency.

python
import json
from jsonschema import validate, ValidationError

# Define a schema
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number", "minimum": 0},
        "email": {"type": "string", "format": "email"}
    },
    "required": ["name", "age"],
    "additionalProperties": False
}

def load_and_validate_json(file_path, schema):
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            data = json.load(file)

        validate(instance=data, schema=schema)
        return data
    except FileNotFoundError:
        raise FileNotFoundError(f"File not found: {file_path}")
    except json.JSONDecodeError as e:
        raise ValueError(f"Invalid JSON format: {e}")
    except ValidationError as e:
        raise ValueError(f"Schema validation failed: {e}")

# Usage
try:
    user_data = load_and_validate_json('user.json', schema)
    print("Valid JSON data:", user_data)
except Exception as e:
    print("Error:", e)

4. JSON Serialization with Custom Encoders

Just as you can create custom decoders, you can also implement custom encoders to handle non-standard Python types when serializing to JSON.

python
import json
from datetime import datetime
from uuid import UUID

class CustomJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        elif isinstance(obj, UUID):
            return str(obj)
        # Let the base class's default method raise the TypeError
        return json.JSONEncoder.default(self, obj)

# Usage
data = {
    'id': UUID('123e4567-e89b-12d3-a456-426614174000'),
    'created_at': datetime.now(),
    'name': 'Test'
}

json_str = json.dumps(data, cls=CustomJSONEncoder, indent=2)
print(json_str)

5. Streaming JSON Processing

For very large JSON files that don’t fit in memory, streaming processing allows you to handle the data incrementally.

python
import json
import ijson

def stream_json_array(file_path, array_key):
    """Process a large JSON array by streaming it."""
    with open(file_path, 'rb') as file:  # ijson works best with binary files
        # Yield each item in the array
        for item in ijson.items(file, f'{array_key}.item'):
            yield item

# Usage
for item in stream_json_array('large_data.json', 'items'):
    # Process each item without loading the entire file
    process_item(item)

6. Asynchronous JSON Processing

For applications that need to handle JSON files concurrently, asynchronous processing can improve performance.

python
import asyncio
import aiofiles
import json

async def async_json_load(file_path):
    """Asynchronously load a JSON file."""
    async with aiofiles.open(file_path, mode='r', encoding='utf-8') as file:
        content = await file.read()
        return json.loads(content)

async def process_multiple_json_files(file_paths):
    """Process multiple JSON files concurrently."""
    tasks = [async_json_load(path) for path in file_paths]
    return await asyncio.gather(*tasks)

# Usage
file_paths = ['data1.json', 'data2.json', 'data3.json']
results = asyncio.run(process_multiple_json_files(file_paths))

7. Integration with Data Science Libraries

For data analysis applications, integrating JSON parsing with libraries like pandas can streamline your workflow.

python
import json
import pandas as pd

def json_to_dataframe(file_path, normalize=False):
    """Convert JSON to a pandas DataFrame."""
    with open(file_path, 'r', encoding='utf-8') as file:
        data = json.load(file)

    if normalize:
        # Flatten nested JSON into a tabular structure
        return pd.json_normalize(data)
    else:
        return pd.DataFrame(data)

# Usage
df = json_to_dataframe('data.json', normalize=True)
print(df.head())

8. Environment-Specific JSON Handling

Different environments might require different approaches to JSON handling. Creating environment-aware utilities can make your code more adaptable.

python
import json
import os
from pathlib import Path

class EnvironmentAwareJsonHandler:
    def __init__(self, environment=None):
        self.environment = environment or os.getenv('ENVIRONMENT', 'development')

    def load_json(self, file_path):
        """Load JSON with environment-specific settings."""
        path = Path(file_path)

        if not path.exists():
            # Try an environment-specific path, e.g. config_production.json
            env_path = path.parent / f"{path.stem}_{self.environment}{path.suffix}"
            if env_path.exists():
                path = env_path

        with open(path, 'r', encoding='utf-8') as file:
            return json.load(file)

    def save_json(self, data, file_path, indent=None):
        """Save JSON with environment-specific settings."""
        if self.environment == 'production':
            indent = None  # Compact format in production
        else:
            indent = indent or 2

        with open(file_path, 'w', encoding='utf-8') as file:
            json.dump(data, file, indent=indent, ensure_ascii=False)

# Usage
handler = EnvironmentAwareJsonHandler('production')
data = handler.load_json('config.json')
handler.save_json(data, 'output.json')

These advanced techniques will help you handle complex JSON parsing scenarios more effectively and integrate JSON processing seamlessly into your Python applications.


Sources

  1. LambdaTest Community — Guide to correctly reading JSON files in Python without encountering load errors: https://community.lambdatest.com/t/how-can-i-correctly-read-a-json-file-in-python-without-encountering-load-errors/38008/

  2. Python Official Documentation — Comprehensive documentation on json module methods including load() and loads(): https://docs.python.org/3/library/json.html

  3. Stack Overflow — Community discussion and comparison of json.load() vs json.loads() methods with practical examples: https://stackoverflow.com/questions/33336160/python-typeerror-expected-string-or-buffer


Conclusion

Successfully reading JSON files in Python boils down to understanding the fundamental distinction between json.load() and json.loads(). The TypeError you encountered with json.loads() occurs when you try to parse a file object instead of string data, while the “Extra data” ValueError with json.load() typically happens when your JSON file contains multiple objects or invalid formatting.

By following the best practices outlined in this guide—using context managers for file handling, specifying explicit encoding, implementing proper error handling, and choosing the right parsing method for your data structure—you can avoid these common pitfalls and create more robust Python applications that handle JSON data efficiently.

Remember that json.load() is your go-to method for reading directly from files, while json.loads() is designed for parsing JSON strings already in memory. Mastering this distinction and implementing proper error handling will make your JSON parsing code more reliable and maintainable in the long run.
