How to convert bytes to string in Python 3
I’ve captured the standard output of an external program into a bytes object:
>>> from subprocess import *
>>> stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]
>>> stdout
b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file2\n'
I want to convert that to a normal Python string, so that I can print it like this:
>>> print(stdout)
-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file1
-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file2
How do I convert the bytes object to a str with Python 3?
The most straightforward way to convert bytes to string in Python 3 is to use the .decode() method on your bytes object. For subprocess output specifically, you can either decode the bytes manually or use the text=True parameter (available in Python 3.7+) to get strings directly from the start.
Contents
- Basic Bytes to String Conversion
- Converting Subprocess Output
- Advanced Decoding Techniques
- Handling Encoding Issues
- Best Practices
- Complete Examples
Basic Bytes to String Conversion
In Python 3, the fundamental method to convert bytes to string is using the .decode() method. When you have a bytes object like the one from your subprocess output, you can convert it to a regular string by specifying an encoding:
>>> stdout = b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file1\n'
>>> string_output = stdout.decode('utf-8')
>>> print(string_output)
total 0
-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file1
The decode() method takes an optional encoding argument, which defaults to 'utf-8'; passing it explicitly documents your intent. According to the Python 3.14.0 documentation, subprocess functions return data as encoded bytes by default, and the application needs to handle the decoding.
For your specific case:
>>> from subprocess import *
>>> stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]
>>> string_output = stdout.decode('utf-8')
>>> print(string_output)
total 0
-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file1
-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file2
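As a small sketch of the conversion itself (the sample bytes are copied from the question; the variable names are only illustrative), the following shows that str(data, encoding) is equivalent to data.decode(encoding), and why calling str() without an encoding is usually not what you want:

```python
raw = b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file1\n'

# bytes -> str: these two spellings are equivalent
decoded = raw.decode('utf-8')
also_decoded = str(raw, 'utf-8')

# Pitfall: str() without an encoding returns the repr, b'' prefix included
repr_text = str(raw)

# str -> bytes reverses the conversion
round_trip = decoded.encode('utf-8')

print(decoded == also_decoded)
print(repr_text[:10])
print(round_trip == raw)
```

If the printed text starts with b' and shows literal \n sequences, the bytes were passed to str() or print() without being decoded first.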
Converting Subprocess Output
When working with subprocess output, you have several approaches to get strings instead of bytes:
Method 1: Using text=True (Recommended for Python 3.7+)
The modern approach is to pass text=True, which makes subprocess decode the output for you. It is an alias for universal_newlines=True, which is the spelling to use before Python 3.7:
>>> from subprocess import Popen, PIPE
>>> process = Popen(['ls', '-l'], stdout=PIPE, text=True)
>>> stdout = process.communicate()[0]
>>> print(stdout) # stdout is already a string
total 0
-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file1
-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file2
As explained on Stack Overflow, this approach is cleaner and avoids manual decoding.
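To check what text=True actually returns, here is a minimal sketch. It runs the current Python interpreter (sys.executable) instead of ls so that it works on any platform; that substitution is mine, not part of the question. As far as I understand, when no encoding is given the output is decoded with the locale's preferred encoding, which the last line prints.

```python
import locale
import subprocess
import sys

# sys.executable stands in for ls so the sketch runs on any platform
result = subprocess.run(
    [sys.executable, '-c', 'print("hello")'],
    capture_output=True,
    text=True,
)

# With text=True the captured output is str, not bytes
print(type(result.stdout).__name__)
print(result.stdout.strip())

# Encoding that is normally used when none is specified (assumption: no
# PYTHONUTF8 or similar override changes it between these two calls)
print(locale.getpreferredencoding(False))
```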
Method 2: Using encoding Parameter
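You can also specify the encoding directly (Python 3.6+). Passing encoding switches the pipes to text mode on its own, so text=True is not needed: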
>>> process = Popen(['ls', '-l'], stdout=PIPE, encoding='utf-8')
>>> stdout = process.communicate()[0]
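The snippet above does not show any output, so here is a hedged, self-contained sketch of the same idea. The child process is a Python one-liner (my own stand-in for a real command) that writes UTF-8 bytes containing a non-ASCII character, which makes the effect of encoding='utf-8' visible:

```python
import subprocess
import sys

# The child writes raw UTF-8 bytes, so the result does not depend on its locale.
# \u00e9 is spelled as an escape to keep the command line pure ASCII.
child_code = "import sys; sys.stdout.buffer.write('caf\\u00e9\\n'.encode('utf-8'))"

result = subprocess.run(
    [sys.executable, '-c', child_code],
    stdout=subprocess.PIPE,
    encoding='utf-8',   # decode the pipe as UTF-8 regardless of the locale
)
text = result.stdout

print(type(text).__name__)
print(text, end='')
```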
Method 3: Using subprocess.run() (Modern API)
subprocess.run() (Python 3.5+) is the recommended high-level API for most cases. Note that capture_output=True, like text=True, requires Python 3.7+:
>>> import subprocess
>>> result = subprocess.run(['ls', '-l'], capture_output=True, text=True)
>>> print(result.stdout)
total 0
-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file1
-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file2
Advanced Decoding Techniques
Handling Different Encodings
Sometimes subprocess output might not be in UTF-8. You can specify different encodings:
import subprocess
# For Windows command output (often cp437 or cp1252)
output = subprocess.check_output('dir', shell=True, encoding='cp437')
As noted in the Stack Overflow discussion, you might need to use platform-specific encodings for certain system commands.
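To see why the choice of encoding matters, this sketch decodes the same four bytes with three codecs. The sample bytes are made up for illustration: they are what "café" looks like when encoded as cp437.

```python
raw = b'caf\x82'   # "café" encoded as cp437 (illustrative sample)

as_cp437 = raw.decode('cp437')     # correct codec: 'café'
as_cp1252 = raw.decode('cp1252')   # wrong codec: 0x82 becomes a low quotation mark

utf8_error = None
try:
    raw.decode('utf-8')            # 0x82 cannot start a UTF-8 sequence
except UnicodeDecodeError as exc:
    utf8_error = exc

print(as_cp437)
print(as_cp1252)
print(type(utf8_error).__name__)
```

Decoding with the wrong single-byte codec does not fail, it silently produces the wrong characters, which is why an explicit and correct encoding is worth the extra argument.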
Splitting Lines Directly
You can decode and split lines in one operation:
>>> lines = stdout.decode('utf-8').splitlines()
>>> for line in lines:
...     print(line)
...
total 0
-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file1
-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file2
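One reason to prefer splitlines() over split('\n') is shown in the sketch below. The sample bytes are invented and deliberately mix Windows and Unix line endings:

```python
raw = b'total 0\r\nfile1\nfile2\n'   # illustrative sample with mixed line endings
text = raw.decode('utf-8')

with_splitlines = text.splitlines()
with_split = text.split('\n')

print(with_splitlines)   # ['total 0', 'file1', 'file2']
print(with_split)        # ['total 0\r', 'file1', 'file2', '']
```

splitlines() handles \r\n and drops the empty string after the final newline, while split('\n') leaves a stray \r and a trailing empty element.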
Using Context Managers
For more robust handling, use context managers:
import subprocess
with subprocess.Popen(['ls', '-l'], stdout=subprocess.PIPE, text=True) as p:
    stdout, stderr = p.communicate()
print(stdout)
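A context manager is also convenient when you want to process decoded output line by line instead of collecting it all with communicate(). The following is only a sketch: the child process is a Python one-liner that I use as a portable stand-in for a real command.

```python
import subprocess
import sys

# Portable stand-in for a command that prints several lines
child_code = "print('first'); print('second')"

lines = []
with subprocess.Popen(
    [sys.executable, '-c', child_code],
    stdout=subprocess.PIPE,
    text=True,              # proc.stdout yields str lines, already decoded
) as proc:
    for line in proc.stdout:
        lines.append(line.rstrip('\n'))

# Leaving the with block closes the pipe and waits for the process
exit_code = proc.returncode

print(lines)
print(exit_code)
```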
Handling Encoding Issues
Error Handling in Decoding
When decoding bytes, you might encounter encoding errors. You can handle these with the errors parameter:
# Ignore errors
clean_string = corrupted_bytes.decode('utf-8', errors='ignore')
# Replace errors with placeholder
clean_string = corrupted_bytes.decode('utf-8', errors='replace')
# Strict mode (default) - raises UnicodeDecodeError on errors
clean_string = corrupted_bytes.decode('utf-8', errors='strict')
According to sqlpey, the errors argument is particularly useful when dealing with corrupted or mixed-encoding data.
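The difference between the handlers is easiest to see on a concrete value. In this sketch the sample bytes are invented (Latin-1 encoded text that is not valid UTF-8), and 'backslashreplace' is one more standard handler in addition to the three listed above:

```python
corrupted_bytes = b'caf\xe9 ok'   # Latin-1 bytes, not valid UTF-8 (illustrative)

ignored = corrupted_bytes.decode('utf-8', errors='ignore')
replaced = corrupted_bytes.decode('utf-8', errors='replace')
escaped = corrupted_bytes.decode('utf-8', errors='backslashreplace')

print(repr(ignored))    # 'caf ok'
print(repr(replaced))   # 'caf\ufffd ok'
print(repr(escaped))    # 'caf\\xe9 ok'
```

'ignore' silently loses data, 'replace' marks the damaged spot with U+FFFD, and 'backslashreplace' keeps the original byte value visible, which tends to be the most useful of the three for debugging.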
Detecting Encoding
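The standard library cannot truly detect the encoding of arbitrary bytes. If you're unsure, a reasonable first guess is the system's preferred encoding, since many command-line tools write their output in it: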
import locale
# Use system's preferred encoding
encoding = locale.getpreferredencoding(False)
output = stdout.decode(encoding, errors='replace')
The Stack Overflow discussion explains how encoding detection can help with subprocess output.
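When the encoding is genuinely unknown, one pragmatic option is to try a short list of candidates in order. The helper below is a hypothetical sketch (decode_with_fallback is my own name, not a library function), and it guesses rather than detects: a wrong single-byte codec can still "succeed" with the wrong characters.

```python
import locale


def decode_with_fallback(data, encodings=('utf-8',)):
    """Return (text, encoding_used), trying each candidate encoding in order."""
    # The locale's preferred encoding is always tried after the explicit ones
    candidates = list(encodings) + [locale.getpreferredencoding(False)]
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Last resort: never raise, but mark undecodable bytes with U+FFFD
    return data.decode('utf-8', errors='replace'), 'utf-8 (lossy)'


print(decode_with_fallback(b'caf\xc3\xa9'))
print(decode_with_fallback(b'caf\xe9', encodings=('utf-8', 'cp1252')))
```

Put strict encodings such as UTF-8 first, because they reject invalid input, and permissive single-byte encodings last.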
Best Practices
- Prefer text=True for Python 3.7+ - it’s cleaner and less error-prone than manual decoding.
- Handle encoding explicitly - don’t rely on system defaults when working with external command output.
- Use error handling - always consider what happens when encoding fails.
- Choose the right subprocess function - subprocess.run() is preferred for modern Python.
- Consider context managers - they ensure proper resource cleanup.
As the Python documentation states, “the actual encoding of the output data may depend on the command being invoked, so the decoding to text will often need to be handled at the application level.”
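Several of these practices can be combined in one small wrapper. This is a sketch under my own assumptions: run_text is a hypothetical helper name, and the child process is a Python one-liner that deliberately emits one invalid byte. It relies on subprocess.run() accepting both encoding and errors (Python 3.6+).

```python
import subprocess
import sys


def run_text(cmd, encoding='utf-8', errors='replace'):
    """Run cmd and return its stdout as str, with an explicit encoding and error policy."""
    completed = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        encoding=encoding,   # explicit, instead of the system default
        errors=errors,       # undecodable bytes become U+FFFD instead of raising
    )
    return completed.stdout


# Child writes 'ok', an invalid UTF-8 byte (0xff), and a newline
child_code = "import sys; sys.stdout.buffer.write(b'ok\\xff\\n')"
output = run_text([sys.executable, '-c', child_code])

print(repr(output))
```

Whether 'replace' is the right policy depends on the application; 'strict' is safer when silently altered output would be worse than a failure.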
Complete Examples
Example 1: Basic Conversion
from subprocess import Popen, PIPE
# Get bytes output
process = Popen(['ls', '-l'], stdout=PIPE)
stdout_bytes, stderr_bytes = process.communicate()
# Convert to string
stdout_str = stdout_bytes.decode('utf-8')
print(stdout_str)
Example 2: Modern Approach (Recommended)
import subprocess
# Get string output directly
result = subprocess.run(['ls', '-l'], capture_output=True, text=True)
print(result.stdout)
Example 3: Robust Error Handling
import locale
import subprocess
try:
    # Try with UTF-8 first
    result = subprocess.run(['ls', '-l'], capture_output=True, text=True, encoding='utf-8')
    print(result.stdout)
except UnicodeDecodeError:
    # Fall back to system encoding (this runs the command a second time)
    result = subprocess.run(['ls', '-l'], capture_output=True, text=True, encoding=locale.getpreferredencoding(False))
    print(result.stdout)
Example 4: Working with Multiple Commands
import subprocess
commands = [
    ['ls', '-l'],
    ['date'],
    ['whoami'],
]
for cmd in commands:
    result = subprocess.run(cmd, capture_output=True, text=True, encoding='utf-8')
    print(f"Command: {' '.join(cmd)}")
    print("Output:")
    print(result.stdout)
    print("-" * 40)
Sources
- Python 3.14.0 documentation - subprocess management
- Convert bytes to a string in Python 3 - Stack Overflow
- Python: Convert Bytes to String Effectively - sqlpey
- Python Bytes to String: Decoding Techniques and Solutions - sqlpey
- Python subprocess encoding - Stack Overflow
- Why does opening a subprocess with universal_newlines cause a unicode decode exception? - Stack Overflow
Conclusion
Converting bytes to a string in Python 3 is straightforward once you understand the available methods. For your subprocess output, the simplest solution is either to call .decode('utf-8') on the bytes object or to pass the text=True parameter so that you get strings directly.
Key takeaways:
- Use .decode('utf-8') for manual conversion
- Prefer the text=True parameter (Python 3.7+) for cleaner code
- Handle encoding errors with the errors parameter when needed
- Consider using subprocess.run() instead of Popen for modern Python code
- Be aware that subprocess output encoding may vary between systems
Choose the method that best fits your Python version and specific use case, and always consider potential encoding issues when working with external command output.