How to download a file over HTTP using Python?
I have a utility that downloads an MP3 file from a website on a schedule and then builds/updates a podcast XML file for iTunes. The XML processing is written in Python, but I currently use wget in a Windows .bat file to download the MP3 file. I want to replace wget with Python code to have the entire utility in Python.
How can I download files using Python instead of wget?
You can download files over HTTP using Python’s built-in urllib library or the popular requests library. The requests library is generally recommended for its simplicity and robust error handling, while urllib is built into Python and requires no additional dependencies. Both methods can easily replace wget in your podcast utility, allowing you to have the entire process in Python.
Contents
- Basic Download Methods
- Using the requests Library
- Advanced Features
- Error Handling and Best Practices
- Complete Podcast Utility Example
- Choosing the Right Approach
Basic Download Methods
Python offers several built-in ways to download files over HTTP. The most straightforward approach uses urllib.request, which is part of Python’s standard library.
Using urllib.request.urlretrieve
The urlretrieve function is the simplest way to download a file, similar to wget:
import urllib.request
url = 'https://example.com/podcast_episode.mp3'
filename = 'podcast_episode.mp3'
urllib.request.urlretrieve(url, filename)
This call copies the resource straight to the specified filename and returns a (filename, headers) tuple. As noted on Real Python, this approach is suitable for straightforward file downloading tasks that don't need advanced features. Note that the Python documentation lists urlretrieve as a legacy interface that may be deprecated in the future.
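If basic progress feedback turns out to be useful later, urlretrieve accepts an optional reporthook callback that is called as blocks arrive. The following is a minimal sketch of that idea; the name fetch_with_progress and the print format are illustrative choices, not part of the original utility.

```python
import urllib.request


def fetch_with_progress(url, filename):
    """Download url to filename, printing progress via urlretrieve's reporthook."""

    def report(block_num, block_size, total_size):
        # total_size is not positive when the size of the resource is unknown
        if total_size > 0:
            done = min(block_num * block_size, total_size)
            print(f"{done * 100 // total_size}% of {total_size} bytes")

    # urlretrieve returns a (local_filename, headers) tuple
    path, headers = urllib.request.urlretrieve(url, filename, reporthook=report)
    return path, headers
```

Usage would look like fetch_with_progress('https://example.com/podcast_episode.mp3', 'podcast_episode.mp3').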
Using urllib.request.urlopen
For more control over the download process, you can use urlopen:
import urllib.request
url = 'https://example.com/podcast_episode.mp3'
filename = 'podcast_episode.mp3'
with urllib.request.urlopen(url) as response, open(filename, 'wb') as out_file:
    out_file.write(response.read())
According to Stack Overflow, this method gives you more flexibility as it works with file-like objects and allows you to process the response before writing to disk.
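One caveat is that response.read() holds the whole body in memory before anything is written. Because the response is a file-like object, an alternative is to copy it to disk in blocks with shutil.copyfileobj. This is a sketch only; the helper name download_in_blocks and the 30-second timeout are assumptions made for the example.

```python
import shutil
import urllib.request


def download_in_blocks(url, filename, timeout=30):
    """Copy the response body to disk block by block instead of all at once."""
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(filename, 'wb') as out_file:
        # copyfileobj reads and writes fixed-size blocks until the stream ends
        shutil.copyfileobj(response, out_file)
    return filename
```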
Using the requests Library
The requests library is a third-party package that provides a more elegant and powerful API for HTTP requests. It’s not included in Python’s standard library, so you’ll need to install it first:
pip install requests
Basic Download with requests.get
import requests
url = 'https://example.com/podcast_episode.mp3'
filename = 'podcast_episode.mp3'
response = requests.get(url, timeout=30)
response.raise_for_status()  # raise an exception for 4xx/5xx responses
with open(filename, 'wb') as file:
    file.write(response.content)
As Stack Overflow explains, the requests package has a very easy API to start with and is preferred by many developers for HTTP-related tasks.
Streaming Download for Large Files
For large files like MP3s, it’s better to stream the download rather than loading the entire file into memory:
import requests
def download_file(url, filename):
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)

url = 'https://example.com/large_podcast_episode.mp3'
download_file(url, 'large_episode.mp3')
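For a scheduled job, one risk with streaming is that an interrupted download leaves a truncated MP3 under its final name. A common remedy is to write to a temporary file and rename it only once every chunk has arrived. The sketch below accepts any iterable of byte chunks, for example r.iter_content(chunk_size=8192); save_chunks_atomically and the '.part' suffix are hypothetical names.

```python
import os


def save_chunks_atomically(chunks, filename):
    """Write byte chunks to filename + '.part', then rename on success."""
    temp_name = filename + '.part'
    try:
        with open(temp_name, 'wb') as f:
            for chunk in chunks:
                f.write(chunk)
        # os.replace overwrites an existing destination file
        os.replace(temp_name, filename)
    except BaseException:
        # Remove the partial file so a failed run leaves nothing behind
        if os.path.exists(temp_name):
            os.remove(temp_name)
        raise
    return filename
```

With this approach, a previous good copy of the episode stays in place until the new one is complete.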
Advanced Features
Progress Bar with tqdm
For better user experience, especially with large files, you can add a progress bar using the tqdm library:
pip install tqdm
import requests
from tqdm import tqdm
def download_with_progress(url, filename):
    response = requests.get(url, stream=True, timeout=30)
    response.raise_for_status()  # stop on 4xx/5xx instead of saving an error page
    # Content-Length may be missing, in which case total_size is 0
    total_size = int(response.headers.get('content-length', 0))
    with open(filename, 'wb') as file, tqdm(
        desc=filename,
        total=total_size,
        unit='B',
        unit_scale=True,
        unit_divisor=1024,
    ) as progress_bar:
        for chunk in response.iter_content(chunk_size=8192):
            size = file.write(chunk)
            progress_bar.update(size)

# Usage
download_with_progress('https://example.com/podcast_episode.mp3', 'episode.mp3')
As Alpharithms explains, this implementation provides visual feedback during downloads, which is helpful for long downloads.
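Not every server sends a Content-Length header, in which case the total size is reported as 0 and no percentage can be computed. For a job that writes to a log rather than an interactive terminal, a small formatter along these lines covers both cases. The name format_progress is a hypothetical helper, not part of tqdm or requests.

```python
def format_progress(downloaded, total_size):
    """Describe download progress, with or without a known total size."""
    mib = downloaded / (1024 * 1024)
    if total_size > 0:
        percent = min(downloaded * 100 // total_size, 100)
        return f"{percent}% ({mib:.1f} MiB)"
    # No Content-Length header: only the byte count is known
    return f"{mib:.1f} MiB downloaded"
```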
Authentication for Protected Files
If your podcast files are behind authentication:
import requests
def download_protected_file(url, filename, username, password):
    response = requests.get(url, auth=(username, password), stream=True)
    response.raise_for_status()
    with open(filename, 'wb') as file:
        for chunk in response.iter_content(chunk_size=8192):
            file.write(chunk)
According to HeyCoach, authentication can be handled seamlessly using the auth parameter.
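Passing a (username, password) tuple as auth makes requests use HTTP Basic authentication, which sends the credentials base64-encoded in an Authorization header. As a rough illustration of what that header looks like for ASCII credentials, and of reading the credentials from the environment instead of hard-coding them, consider this sketch. The environment variable names PODCAST_USER and PODCAST_PASS are made up for the example.

```python
import base64
import os


def basic_auth_header(username, password):
    """Build the Authorization header value used by HTTP Basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def credentials_from_env():
    """Read credentials from (hypothetical) environment variables."""
    return os.environ["PODCAST_USER"], os.environ["PODCAST_PASS"]
```

Base64 is an encoding, not encryption, so Basic authentication should only be used over HTTPS.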
Error Handling and Best Practices
Proper Error Handling
Always implement proper error handling for network operations:
import requests
import urllib.request
import urllib.error
def safe_download_with_requests(url, filename):
    try:
        response = requests.get(url, stream=True, timeout=30)
        response.raise_for_status()  # Raises HTTPError for bad responses
        with open(filename, 'wb') as file:
            for chunk in response.iter_content(chunk_size=8192):
                file.write(chunk)
        return True
    except requests.exceptions.RequestException as e:
        print(f"Download failed: {e}")
        return False

def safe_download_with_urllib(url, filename):
    try:
        urllib.request.urlretrieve(url, filename)
        return True
    except urllib.error.URLError as e:
        print(f"Download failed: {e}")
        return False
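Because both helpers return True or False, a scheduled job can retry transient failures instead of giving up on the first error. Below is a minimal sketch of retrying with exponential backoff. The function retry, its default values, and the injectable sleep parameter are assumptions for illustration.

```python
import time


def retry(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it returns True or attempts run out.

    Waits base_delay, then 2 * base_delay, 4 * base_delay, ... between tries.
    """
    for attempt in range(attempts):
        if operation():
            return True
        # Do not wait after the final failed attempt
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return False
```

It could be called as retry(lambda: safe_download_with_requests(url, filename)).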
Best Practices from Medium
- Always use raise_for_status() to catch HTTP errors
- Stream large files using stream=True and iter_content()
- Add proper error handling for network issues
- Use progress bars for better user experience
- Validate downloaded files
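What "validate" means depends on what the publisher provides. Two simple checks are comparing the size on disk with an expected size (for example the Content-Length header, when present) and comparing a checksum if one is published. The helpers below are a sketch with hypothetical names. Note that the size check may not apply when the server compresses the response, since the decoded bytes written to disk can then differ from Content-Length.

```python
import hashlib
import os


def file_matches_size(filename, expected_size):
    """Return True if the file on disk has exactly expected_size bytes."""
    return os.path.getsize(filename) == expected_size


def sha256_of_file(filename, block_size=65536):
    """Compute the SHA-256 hex digest of a file without loading it whole."""
    digest = hashlib.sha256()
    with open(filename, 'rb') as f:
        # Read fixed-size blocks until read() returns an empty bytes object
        for block in iter(lambda: f.read(block_size), b''):
            digest.update(block)
    return digest.hexdigest()
```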
Complete Podcast Utility Example
Here’s a complete example that replaces your wget approach:
import requests
import os
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from urllib.parse import urljoin
import logging

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class PodcastUpdater:
    def __init__(self, base_url, output_dir='podcasts'):
        self.base_url = base_url
        self.output_dir = output_dir
        os.makedirs(output_dir, exist_ok=True)

    def download_episode(self, episode_url, episode_id):
        """Download a podcast episode with progress logging and error handling"""
        filename = os.path.join(self.output_dir, f"{episode_id}.mp3")
        try:
            # Stream the download so the whole file is never held in memory
            with requests.get(episode_url, stream=True, timeout=30) as response:
                response.raise_for_status()
                # Get total file size (0 if the server sends no Content-Length)
                total_size = int(response.headers.get('content-length', 0))
                logger.info(f"Downloading {episode_id} ({total_size/1024/1024:.1f} MB)")
                with open(filename, 'wb') as file:
                    downloaded = 0
                    next_mark = 10  # next percentage at which to log
                    for chunk in response.iter_content(chunk_size=8192):
                        if chunk:  # filter out keep-alive chunks
                            file.write(chunk)
                            downloaded += len(chunk)
                            # Log progress roughly every 10%
                            if total_size > 0:
                                progress = downloaded * 100 // total_size
                                if progress >= next_mark:
                                    logger.info(f"Progress: {progress}%")
                                    next_mark = progress - progress % 10 + 10
            logger.info(f"Successfully downloaded {filename}")
            return True
        except requests.exceptions.RequestException as e:
            logger.error(f"Failed to download {episode_id}: {e}")
            return False

    def update_podcast_xml(self, episodes_config):
        """Update the podcast XML file with new episodes"""
        xml_file = os.path.join(self.output_dir, 'podcast.xml')
        # Create XML structure (simplified example)
        root = ET.Element('rss', version='2.0')
        channel = ET.SubElement(root, 'channel')
        # Add basic podcast info
        ET.SubElement(channel, 'title').text = "My Podcast"
        ET.SubElement(channel, 'description').text = "A great podcast"
        ET.SubElement(channel, 'link').text = self.base_url
        # Add episodes
        for episode in episodes_config:
            if self.download_episode(episode['url'], episode['id']):
                item = ET.SubElement(channel, 'item')
                ET.SubElement(item, 'title').text = episode['title']
                ET.SubElement(item, 'description').text = episode['description']
                # Use UTC so that the "GMT" suffix is actually correct
                ET.SubElement(item, 'pubDate').text = datetime.now(timezone.utc).strftime('%a, %d %b %Y %H:%M:%S GMT')
                ET.SubElement(item, 'enclosure', {
                    'url': urljoin(self.base_url, f"{episode['id']}.mp3"),
                    'type': 'audio/mpeg',
                    'length': str(episode.get('size', 0))
                })
        # Write XML file
        tree = ET.ElementTree(root)
        tree.write(xml_file, encoding='utf-8', xml_declaration=True)
        logger.info(f"Updated podcast XML: {xml_file}")


# Example usage
if __name__ == "__main__":
    podcast = PodcastUpdater('https://my-podcast-website.com')
    episodes = [
        {
            'id': 'episode001',
            'title': 'First Episode',
            'description': 'This is my first episode',
            'url': 'https://my-podcast-website.com/episodes/episode001.mp3',
            'size': 52428800  # 50MB
        },
        {
            'id': 'episode002',
            'title': 'Second Episode',
            'description': 'This is my second episode',
            'url': 'https://my-podcast-website.com/episodes/episode002.mp3',
            'size': 78643200  # 75MB
        }
    ]
    podcast.update_podcast_xml(episodes)
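Two details of the feed are worth a closer look. RSS 2.0 dates follow the RFC 822 format, and strftime's %a and %b directives depend on the current locale, whereas the standard library's email.utils.format_datetime always emits the English names. The enclosure length can also be read from the downloaded file rather than configured by hand. The helpers below are a sketch with hypothetical names (rfc822_date, enclosure_length).

```python
import os
from datetime import datetime, timezone
from email.utils import format_datetime


def rfc822_date(moment=None):
    """Format a UTC datetime (default: now) for an RSS pubDate element."""
    if moment is None:
        moment = datetime.now(timezone.utc)
    # usegmt=True requires an aware datetime whose tzinfo is timezone.utc
    return format_datetime(moment, usegmt=True)


def enclosure_length(path):
    """Size in bytes of a downloaded file, as a string for the length attribute."""
    return str(os.path.getsize(path))
```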
Choosing the Right Approach
When to Use urllib
Use Python’s built-in urllib when:
- You want to avoid external dependencies
- You’re working in a restricted environment
- You need simple downloads without advanced features
- Your utility needs to be self-contained
As Tutorialspoint notes, urllib.request is suitable for straightforward file downloading tasks.
When to Use requests
Use the requests library when:
- You need better error handling
- You want cleaner, more readable code
- You require advanced features like streaming, sessions, or authentication
- You’re working with complex HTTP scenarios
- You want to add progress bars easily
According to AskPython, the requests library provides an efficient way to handle HTTP requests in Python.
Recommendations for Your Use Case
For your podcast utility, I recommend using the requests library with streaming and proper error handling because:
- MP3 files can be large, so streaming prevents memory issues
- Error handling is crucial for automated scheduled downloads
- Progress bars provide feedback during downloads
- The code will be more maintainable and readable
- You can easily extend it for authentication if needed in the future
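Since the download currently runs from a .bat file on a schedule, it helps if the Python replacement reports failure through its exit code, as wget does. The sketch below shows one way to structure such an entry point. It uses urllib only so that it runs without installing anything, and the requests-based streaming function could be swapped in. The argument names url and dest are assumptions.

```python
import argparse
import shutil
import sys
import urllib.request


def main(argv=None):
    """Download URL to DEST; return 0 on success, 1 on failure."""
    parser = argparse.ArgumentParser(description="Download a file over HTTP")
    parser.add_argument("url")
    parser.add_argument("dest")
    args = parser.parse_args(argv)
    try:
        with urllib.request.urlopen(args.url, timeout=30) as response, \
                open(args.dest, "wb") as out_file:
            shutil.copyfileobj(response, out_file)
    except OSError as e:
        # urllib.error.URLError and HTTPError are subclasses of OSError
        print(f"Download failed: {e}", file=sys.stderr)
        return 1
    return 0
```

Ending the script with sys.exit(main()) under the usual if __name__ == "__main__": guard passes that code back to the calling .bat file or task scheduler.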
Conclusion
Downloading files over HTTP in Python is straightforward and can easily replace wget in your podcast utility. Here are the key takeaways:
- Both urllib and requests can handle file downloads, with requests offering better error handling and more features
- Use streaming downloads (stream=True with iter_content()) for large files like MP3s to avoid memory issues
- Always implement proper error handling with raise_for_status() and try/except blocks
- Add progress bars using tqdm for better user experience with large downloads
- Consider authentication if your podcast files are behind protected access
For your specific podcast utility, the requests library with streaming, error handling, and progress tracking provides the most robust solution. The complete example above shows how to integrate file downloading with XML processing in a single Python script, eliminating the need for external tools like wget.
Sources
- Real Python - How to Download Files From URLs With Python
- Stack Overflow - How to download a file over HTTP?
- Stack Overflow - Download file from web in Python 3
- Tutorialspoint - Download a file over HTTP in Python
- AskPython - Python HTTP File Download: Using the Requests Library
- Alpharithms - Progress Bars for Python Downloads
- HeyCoach - Handling File Downloads In Python
- Medium - Downloading Files from URLs in Python
- WebScraping.AI - How do I handle file downloads during web scraping with Python?
- Geekflare - 5 Ways to Download Files from a URL Using Python