NeuroAgent

How to Download Files in Python: Complete Guide

Learn how to download files over HTTP in Python using urllib and requests libraries. Complete guide with error handling, streaming, and podcast utility examples.


11/11/2025, 06:41 PM

How to download a file over HTTP using Python?

I have a utility that downloads an MP3 file from a website on a schedule and then builds/updates a podcast XML file for iTunes. The XML processing is written in Python, but I currently use wget in a Windows .bat file to download the MP3 file. I want to replace wget with Python code to have the entire utility in Python.

How can I download files using Python instead of wget?

NeuroAgent

You can download files over HTTP using Python’s built-in urllib library or the popular requests library. The requests library is generally recommended for its simplicity and robust error handling, while urllib is built into Python and requires no additional dependencies. Both methods can easily replace wget in your podcast utility, allowing you to have the entire process in Python.

Basic Download Methods
Using the requests Library
Advanced Features
Error Handling and Best Practices
Complete Podcast Utility Example
Choosing the Right Approach

Basic Download Methods

Python offers several built-in ways to download files over HTTP. The most straightforward approach uses urllib.request, which is part of Python’s standard library.

Using urllib.request.urlretrieve

The urlretrieve function is the simplest way to download a file, similar to wget:

python

import urllib.request

url = 'https://example.com/podcast_episode.mp3'
filename = 'podcast_episode.mp3'

urllib.request.urlretrieve(url, filename)

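This call copies the resource to the specified filename and returns once the transfer finishes. As noted on Real Python, this approach is suitable for straightforward file downloading tasks when you don’t need progress feedback or advanced features. Note that the Python documentation lists urlretrieve under its legacy interface and says it might become deprecated in the future.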

Using urllib.request.urlopen

For more control over the download process, you can use urlopen:

python

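import shutil
import urllib.request

url = 'https://example.com/podcast_episode.mp3'
filename = 'podcast_episode.mp3'

# Copy the response to disk in blocks instead of reading it all into memory
with urllib.request.urlopen(url, timeout=30) as response, open(filename, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)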

According to Stack Overflow, this method gives you more flexibility as it works with file-like objects and allows you to process the response before writing to disk.
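
As one illustration of processing the response before it reaches the disk, the sketch below inspects the headers and computes a SHA-256 digest while copying the body in chunks. The function name `download_with_checksum`, the chunk size, and the timeout are assumptions made for this example, not part of the original utility.

```python
import hashlib
import urllib.request


def download_with_checksum(url, filename, chunk_size=64 * 1024, timeout=30):
    """Copy url to filename in chunks and return (content_type, sha256_hex)."""
    digest = hashlib.sha256()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Headers are available before any of the body has been read.
        content_type = response.headers.get("Content-Type", "")
        with open(filename, "wb") as out_file:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:  # empty bytes means the body is exhausted
                    break
                digest.update(chunk)
                out_file.write(chunk)
    return content_type, digest.hexdigest()
```

The returned digest could be stored alongside the episode so that a later run can tell whether the remote file has changed.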

Using the requests Library

The requests library is a third-party package that provides a more elegant and powerful API for HTTP requests. It’s not included in Python’s standard library, so you’ll need to install it first:

bash

pip install requests

Basic Download with requests.get

python

import requests

url = 'https://example.com/podcast_episode.mp3'
filename = 'podcast_episode.mp3'

response = requests.get(url, timeout=30)
response.raise_for_status()  # Stop on 4xx/5xx instead of saving an error page
with open(filename, 'wb') as file:
    file.write(response.content)

As Stack Overflow explains, the requests package has a very easy API to start with and is preferred by many developers for HTTP-related tasks.

Streaming Download for Large Files

For large files like MP3s, it’s better to stream the download rather than loading the entire file into memory:

python

import requests

def download_file(url, filename):
    with requests.get(url, stream=True, timeout=30) as r:  # requests has no default timeout
        r.raise_for_status()
        with open(filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)

url = 'https://example.com/large_podcast_episode.mp3'
download_file(url, 'large_episode.mp3')
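
For a scheduled job, one possible refinement is to stream into a temporary file and only move it into place once every chunk has arrived, so an interrupted download never overwrites a good copy. The sketch below is a minimal version of that idea; `write_chunks_atomically` is a hypothetical helper, and it accepts any iterable of byte chunks, such as `r.iter_content(chunk_size=8192)`.

```python
import os
import tempfile


def write_chunks_atomically(chunks, destination):
    """Write an iterable of byte chunks to destination via a temporary file."""
    directory = os.path.dirname(os.path.abspath(destination))
    # Create the temp file next to the destination so os.replace stays on one filesystem.
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as temp_file:
            for chunk in chunks:
                temp_file.write(chunk)
        # Swap the finished file into place in a single step.
        os.replace(temp_path, destination)
    except BaseException:
        # Remove the partial file, then re-raise the original error.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
    return destination
```

Inside `download_file`, the inner `with open(...)` loop could then be replaced by `write_chunks_atomically(r.iter_content(chunk_size=8192), filename)`.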

Advanced Features

Progress Bar with tqdm

For better user experience, especially with large files, you can add a progress bar using the tqdm library:

bash

pip install tqdm

python

import requests
from tqdm import tqdm

def download_with_progress(url, filename):
    with requests.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()
        # Content-Length may be absent; tqdm then shows a counter without a total
        total_size = int(response.headers.get('content-length', 0)) or None

        with open(filename, 'wb') as file, tqdm(
            desc=filename,
            total=total_size,
            unit='B',
            unit_scale=True,
            unit_divisor=1024,
        ) as progress_bar:
            for chunk in response.iter_content(chunk_size=8192):
                size = file.write(chunk)
                progress_bar.update(size)

# Usage
download_with_progress('https://example.com/podcast_episode.mp3', 'episode.mp3')

As Alpharithms explains, this implementation provides visual feedback during downloads, which is helpful for long downloads.

Authentication for Protected Files

If your podcast files are behind authentication:

python

import requests

def download_protected_file(url, filename, username, password):
    response = requests.get(url, auth=(username, password), stream=True, timeout=30)
    response.raise_for_status()
    
    with open(filename, 'wb') as file:
        for chunk in response.iter_content(chunk_size=8192):
            file.write(chunk)

According to HeyCoach, authentication can be handled seamlessly using the auth parameter.

Error Handling and Best Practices

Proper Error Handling

Always implement proper error handling for network operations:

python

import requests
import urllib.request
import urllib.error

def safe_download_with_requests(url, filename):
    try:
        response = requests.get(url, stream=True, timeout=30)
        response.raise_for_status()  # Raises HTTPError for bad responses
        
        with open(filename, 'wb') as file:
            for chunk in response.iter_content(chunk_size=8192):
                file.write(chunk)
        return True
    except requests.exceptions.RequestException as e:
        print(f"Download failed: {e}")
        return False

def safe_download_with_urllib(url, filename):
    try:
        urllib.request.urlretrieve(url, filename)
        return True
    except urllib.error.URLError as e:
        print(f"Download failed: {e}")
        return False
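
Because both functions above return True or False, a scheduled job could wrap them in a small retry loop to ride out transient network failures. The following is a sketch under that assumption; the `retry` helper, its default of three attempts, and the doubling delay are illustrative choices rather than anything prescribed by requests or urllib.

```python
import time


def retry(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it returns a truthy value or attempts run out."""
    for attempt in range(1, attempts + 1):
        if operation():
            return True
        if attempt < attempts:
            # Wait base_delay, then 2x, then 4x ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    return False
```

A call might look like `retry(lambda: safe_download_with_requests(url, filename))`. The `sleep` parameter exists only so the delay can be swapped out in tests.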

Best Practices from Medium

Always use raise_for_status() to catch HTTP errors
Stream large files using stream=True and iter_content()
Add proper error handling for network issues
Use progress bars for better user experience
Validate downloaded files
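
The last point can be approximated with a cheap sanity check before the file is added to the feed. The sketch below assumes the download is an MP3 and that `expected_size` (a hypothetical parameter) comes from the server's Content-Length header, which is only a reliable comparison when no content encoding is applied. It checks for an ID3 tag or an MPEG frame sync at the start of the file and does not decode the audio.

```python
import os


def looks_like_mp3(path, expected_size=None):
    """Cheap sanity check for a downloaded MP3; not a full decoder."""
    size = os.path.getsize(path)
    if size == 0:
        return False
    if expected_size is not None and size != expected_size:
        return False
    with open(path, "rb") as f:
        header = f.read(3)
    # Many MP3 files begin with an ID3v2 tag.
    if header == b"ID3":
        return True
    # Otherwise expect an MPEG audio frame sync: eleven set bits.
    return len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0
```

A check like this mainly catches truncated files and HTML error pages saved under an .mp3 name.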

Complete Podcast Utility Example

Here’s a complete example that replaces your wget approach:

python

import requests
import os
import xml.etree.ElementTree as ET
from datetime import datetime
from urllib.parse import urljoin
import logging

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class PodcastUpdater:
    def __init__(self, base_url, output_dir='podcasts'):
        self.base_url = base_url
        self.output_dir = output_dir
        os.makedirs(output_dir, exist_ok=True)
    
    def download_episode(self, episode_url, episode_id):
        """Download a podcast episode with progress tracking and error handling"""
        filename = os.path.join(self.output_dir, f"{episode_id}.mp3")
        
        try:
            # Stream the download and log progress as it goes
            with requests.get(episode_url, stream=True, timeout=30) as response:
                response.raise_for_status()
                
                # Get total file size
                total_size = int(response.headers.get('content-length', 0))
                
                logger.info(f"Downloading {episode_id} ({total_size/1024/1024:.1f} MB)")
                
                with open(filename, 'wb') as file:
                    downloaded = 0
                    next_mark = 10  # next percentage at which to log
                    for chunk in response.iter_content(chunk_size=8192):
                        if chunk:  # filter out keep-alive chunks
                            file.write(chunk)
                            downloaded += len(chunk)
                            
                            # Log each time another 10% completes (skipped if size is unknown)
                            if total_size > 0:
                                progress = downloaded * 100 // total_size
                                if progress >= next_mark:
                                    logger.info(f"Progress: {progress}%")
                                    next_mark = (progress // 10 + 1) * 10
                
                logger.info(f"Successfully downloaded {filename}")
                return True
                
        except requests.exceptions.RequestException as e:
            logger.error(f"Failed to download {episode_id}: {e}")
            return False
    
    def update_podcast_xml(self, episodes_config):
        """Update the podcast XML file with new episodes"""
        xml_file = os.path.join(self.output_dir, 'podcast.xml')
        
        # Create XML structure (simplified example)
        root = ET.Element('rss', version='2.0')
        channel = ET.SubElement(root, 'channel')
        
        # Add basic podcast info
        ET.SubElement(channel, 'title').text = "My Podcast"
        ET.SubElement(channel, 'description').text = "A great podcast"
        ET.SubElement(channel, 'link').text = self.base_url
        
        # Add episodes
        for episode in episodes_config:
            if self.download_episode(episode['url'], episode['id']):
                item = ET.SubElement(channel, 'item')
                ET.SubElement(item, 'title').text = episode['title']
                ET.SubElement(item, 'description').text = episode['description']
                ET.SubElement(item, 'pubDate').text = datetime.now().astimezone().strftime('%a, %d %b %Y %H:%M:%S %z')
                ET.SubElement(item, 'enclosure', {
                    'url': urljoin(self.base_url, f"{episode['id']}.mp3"),
                    'type': 'audio/mpeg',
                    'length': str(episode.get('size', 0))
                })
        
        # Write XML file
        tree = ET.ElementTree(root)
        tree.write(xml_file, encoding='utf-8', xml_declaration=True)
        logger.info(f"Updated podcast XML: {xml_file}")

# Example usage
if __name__ == "__main__":
    podcast = PodcastUpdater('https://my-podcast-website.com')
    
    episodes = [
        {
            'id': 'episode001',
            'title': 'First Episode',
            'description': 'This is my first episode',
            'url': 'https://my-podcast-website.com/episodes/episode001.mp3',
            'size': 52428800  # 50MB
        },
        {
            'id': 'episode002',
            'title': 'Second Episode',
            'description': 'This is my second episode',
            'url': 'https://my-podcast-website.com/episodes/episode002.mp3',
            'size': 78643200  # 75MB
        }
    ]
    
    podcast.update_podcast_xml(episodes)
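
The class above rebuilds the feed from scratch on every run. Since the original utility updates an existing XML file, a variation might parse that file and append a single item instead. The sketch below assumes a plain RSS 2.0 file with one channel element; `append_episode` and its parameters are hypothetical names. It takes the enclosure length from the file on disk and formats the date with `email.utils.format_datetime`. Feeds that use the itunes namespace would also need `ET.register_namespace` before parsing, otherwise ElementTree rewrites the prefixes on save.

```python
import os
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def append_episode(xml_path, mp3_path, title, enclosure_url, published=None):
    """Add one <item> to an existing RSS file and save it in place."""
    tree = ET.parse(xml_path)
    channel = tree.getroot().find("channel")
    if channel is None:
        raise ValueError(f"{xml_path} has no <channel> element")

    # Default to the current time in UTC; format_datetime needs an aware datetime.
    published = published or datetime.now(timezone.utc)
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "pubDate").text = format_datetime(published, usegmt=True)
    ET.SubElement(item, "enclosure", {
        "url": enclosure_url,
        "type": "audio/mpeg",
        # Use the real size of the downloaded file rather than a configured value.
        "length": str(os.path.getsize(mp3_path)),
    })
    tree.write(xml_path, encoding="utf-8", xml_declaration=True)
    return item
```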

Choosing the Right Approach

When to Use urllib

Use Python’s built-in urllib when:

You want to avoid external dependencies
You’re working in a restricted environment
You need simple downloads without advanced features
Your utility needs to be self-contained

As Tutorialspoint notes, urllib.request is suitable for straightforward file downloading tasks.

When to Use requests

Use the requests library when:

You need better error handling
You want cleaner, more readable code
You require advanced features like streaming, sessions, or authentication
You’re working with complex HTTP scenarios
You want to add progress bars easily

According to AskPython, the requests library provides a convenient and efficient way to handle HTTP requests in Python.

Recommendations for Your Use Case

For your podcast utility, I recommend using the requests library with streaming and proper error handling because:

MP3 files can be large, so streaming prevents memory issues
Error handling is crucial for automated scheduled downloads
Progress bars provide feedback during downloads
The code will be more maintainable and readable
You can easily extend it for authentication if needed in the future

Conclusion

Downloading files over HTTP in Python is straightforward and can easily replace wget in your podcast utility. Here are the key takeaways:

Both urllib and requests can handle file downloads, with requests offering better error handling and more features
Use streaming downloads (stream=True with iter_content()) for large files like MP3s to avoid memory issues
Always implement proper error handling with raise_for_status() and try/except blocks
Add progress bars using tqdm for better user experience with large downloads
Consider authentication if your podcast files are behind protected access

For your specific podcast utility, the requests library with streaming, error handling, and progress tracking provides the most robust solution. The complete example above shows how to integrate file downloading with XML processing in a single Python script, eliminating the need for external tools like wget.
