NeuroAgent

Best HTML to PDF Conversion PHP Linux Guide

Complete guide to converting HTML/CSS to PDF on Linux servers using PHP. Compare DOMPDF, Mpdf, Puppeteer, and wkhtmltopdf solutions with installation steps.

Question

How can I convert HTML and CSS to PDF on a Linux server using PHP? I have an HTML document that renders correctly in browsers but needs to be converted to PDF. I’ve tried several solutions:

  1. DOMPDF: Had issues with tables and images, consuming excessive memory
  2. HTML2PDF/HTML2PS: Better table rendering but encountered unknown node_type() errors
  3. Htmldoc: Works on basic HTML but has poor CSS support

I need a solution that runs on Linux and ideally works on-demand via PHP on a webserver. What are the best approaches or alternatives for this conversion?

NeuroAgent

You can convert HTML and CSS to PDF on a Linux server from PHP in several ways. Headless browsers (Chrome/Chromium) give the most accurate rendering, while a PHP library such as Mpdf, or a wrapper such as Snappy (around the wkhtmltopdf command-line tool), integrates more directly with PHP and addresses the limitations you hit with DOMPDF and the other tools.

Overview of HTML to PDF Conversion Methods

HTML to PDF conversion on Linux servers typically falls into three main categories:

  1. PHP-native libraries that parse HTML directly and generate PDF output
  2. Command-line tools that convert HTML to PDF via system calls
  3. Headless browser solutions that render HTML in a browser environment and convert to PDF

Each approach has its strengths and weaknesses. PHP-native libraries like Mpdf are lightweight and easy to integrate but may struggle with complex CSS. Command-line tools like Htmldoc offer good basic support but limited CSS capabilities. Headless browsers provide the most accurate rendering but require more system resources.

Key Consideration: The best approach depends on your specific requirements for CSS support, performance, system resources, and output quality.

Modern Headless Browser Solutions

Puppeteer with Node.js Bridge

Puppeteer, a Node.js library, offers excellent HTML to PDF conversion through headless Chrome/Chromium. While not PHP-native, you can create a bridge between PHP and Node.js:

php
// Example PHP to Node.js bridge for Puppeteer
function generatePdfWithPuppeteer($html, $outputPath) {
    $tempHtmlFile = tempnam(sys_get_temp_dir(), 'html_');
    file_put_contents($tempHtmlFile, $html);
    
    $command = "node puppeteer-pdf.js " . escapeshellarg($tempHtmlFile) . " " . escapeshellarg($outputPath);
    exec($command, $output, $returnCode);
    
    unlink($tempHtmlFile);
    
    return ($returnCode === 0);
}

Pros:

  • Excellent CSS support including modern features
  • Accurate rendering matching browsers
  • Supports complex layouts, tables, and images

Cons:

  • Requires Node.js on the server
  • Higher memory consumption
  • More complex setup
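
If the Node.js requirement is the main obstacle, Chrome and Chromium can also print a page to PDF directly from the command line, and PHP can call that the same way it calls the bridge script. The following is a minimal sketch, not a drop-in implementation: the function names and the `/usr/bin/chromium` path are assumptions for illustration, and only the `--headless`, `--disable-gpu`, and `--print-to-pdf` flags are used.

```php
<?php
// Hypothetical helper: builds a headless Chrome/Chromium command line.
// The default binary path is an assumption; adjust it to your installation.
function buildChromePdfCommand(string $htmlFile, string $outputPath, string $binary = '/usr/bin/chromium'): string
{
    return implode(' ', [
        escapeshellarg($binary),
        '--headless',
        '--disable-gpu',
        '--print-to-pdf=' . escapeshellarg($outputPath),
        escapeshellarg('file://' . $htmlFile),
    ]);
}

// Hypothetical wrapper: writes the HTML to a temp file and runs Chrome on it.
function generatePdfWithChrome(string $html, string $outputPath): bool
{
    // Give the temp file an .html extension so Chrome treats it as HTML
    $base = tempnam(sys_get_temp_dir(), 'html_');
    $tempHtmlFile = $base . '.html';
    rename($base, $tempHtmlFile);
    file_put_contents($tempHtmlFile, $html);

    exec(buildChromePdfCommand($tempHtmlFile, $outputPath) . ' 2>&1', $output, $returnCode);
    unlink($tempHtmlFile);

    // Treat the run as successful only if Chrome exited cleanly and wrote the file
    return $returnCode === 0 && is_file($outputPath);
}
```

This trades Puppeteer's fine-grained control (waiting for selectors, custom margins, injected scripts) for a simpler setup with one fewer runtime to maintain.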

WeasyPrint

WeasyPrint is a Python tool that converts HTML/CSS to PDF using its own layout engine rather than a browser, with strong support for the CSS paged-media standards:

php
// PHP wrapper for WeasyPrint
function generatePdfWithWeasyPrint($html, $outputPath) {
    $tempHtmlFile = tempnam(sys_get_temp_dir(), 'html_');
    file_put_contents($tempHtmlFile, $html);
    
    $command = "weasyprint " . escapeshellarg($tempHtmlFile) . " " . escapeshellarg($outputPath);
    exec($command, $output, $returnCode);
    
    unlink($tempHtmlFile);
    
    return ($returnCode === 0);
}

Pros:

  • Excellent CSS3 support
  • High-quality output
  • Good performance

Cons:

  • Requires Python dependencies
  • No JavaScript execution, so content must already be present in the HTML
  • Steeper learning curve
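
Both wrappers above return only true or false, which makes failures hard to diagnose. A small sketch of a runner that also captures the exit code and the tool's error output is shown here; `runConverter` is a hypothetical name, and the array form of `proc_open` assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper: runs an external converter and captures its exit
// code and stderr. Passing the command as an array (PHP 7.4+) bypasses the
// shell, so arguments need no manual escaping.
function runConverter(array $command): array
{
    $descriptors = [
        1 => ['file', '/dev/null', 'w'], // discard stdout; the PDF goes to a file
        2 => ['pipe', 'w'],              // capture stderr for diagnostics
    ];

    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        return ['exitCode' => -1, 'stderr' => 'could not start process'];
    }

    $stderr = stream_get_contents($pipes[2]);
    fclose($pipes[2]);

    return ['exitCode' => proc_close($process), 'stderr' => $stderr];
}
```

A call might look like `runConverter(['weasyprint', $tempHtmlFile, $outputPath])`, after which a non-zero `exitCode` can be logged together with `stderr`.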

PHP-Native Libraries and Alternatives

Mpdf (Improved Alternative to DOMPDF)

Mpdf is a frequently recommended alternative to DOMPDF that tends to handle tables and images better:

php
require_once __DIR__ . '/vendor/autoload.php';

$mpdf = new \Mpdf\Mpdf([
    'mode' => 'utf-8',
    'format' => 'A4',
    'margin_left' => 10,
    'margin_right' => 10,
    'margin_top' => 10,
    'margin_bottom' => 10
]);

$mpdf->WriteHTML($htmlContent);
$mpdf->Output('document.pdf', 'D');

Key Improvements over DOMPDF:

  • Better table rendering
  • Improved CSS support
  • Lower memory consumption
  • Better image handling

Installation:

bash
composer require mpdf/mpdf

TCPDF

TCPDF is another robust PHP library with extensive features:

php
require_once('tcpdf/tcpdf.php');

$pdf = new TCPDF(PDF_PAGE_ORIENTATION, PDF_UNIT, PDF_PAGE_FORMAT, true, 'UTF-8', false);

$pdf->AddPage();
$pdf->writeHTML($htmlContent, true, false, true, false, '');
$pdf->Output('document.pdf', 'D');

Features:

  • Supports complex layouts
  • Good for forms and tables
  • Extensive documentation

Snappy (wkhtmltopdf wrapper)

Snappy provides a PHP wrapper for the powerful wkhtmltopdf command-line tool:

php
require_once __DIR__ . '/vendor/autoload.php';

use Knp\Snappy\Pdf;

$snappy = new Pdf('/usr/bin/wkhtmltopdf');

$snappy->generateFromHtml(
    '<h1>Bill</h1><p>Dear Client</p>',
    '/path/to/bill.pdf'
);

Advantages:

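  • Uses a real browser engine (an older Qt WebKit build)
  • Good support for established CSS; newer features such as flexbox and grid are limited or missing
  • Handles tables and multi-page layouts well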

Installation and Configuration Guide

System Requirements

Before installing any HTML to PDF solution, ensure your Linux server meets these requirements:

bash
# Basic system requirements
sudo apt update
sudo apt install -y php php-cli php-gd php-xml php-mbstring
sudo apt install -y libpng-dev libjpeg-dev libfreetype6-dev
sudo apt install -y fontconfig
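
Because the web server and the command line can load different PHP configurations, it can help to verify the extensions from inside the application itself. The sketch below assumes a hypothetical helper named `missingExtensions`; the extension names match the packages installed above.

```php
<?php
// Hypothetical helper: returns the required extensions that are not loaded
// in the PHP runtime that executes this script.
function missingExtensions(array $required): array
{
    return array_values(array_filter(
        $required,
        fn(string $ext): bool => !extension_loaded($ext)
    ));
}

// Check the extensions provided by php-gd, php-mbstring, and php-xml
$missing = missingExtensions(['gd', 'mbstring', 'xml']);
if ($missing !== []) {
    error_log('Missing PHP extensions: ' . implode(', ', $missing));
}
```

Running this once under the web server and once with `php` on the command line shows whether both environments are set up the same way.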

Installing Mpdf

bash
# Install Mpdf via Composer
composer require mpdf/mpdf

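# Optional: Imagick is not required by Mpdf itself; install it only if your application needs it
sudo apt install -y php-pear php-dev libmagickwand-dev libpng-dev libjpeg-dev libfreetype6-dev
sudo pecl install imagick
# This enables the extension for the CLI only; add the same file under fpm/ or apache2/ for web requests
echo "extension=imagick.so" | sudo tee /etc/php/$(php -r 'echo PHP_MAJOR_VERSION.".".PHP_MINOR_VERSION;')/cli/conf.d/imagick.ini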

Installing wkhtmltopdf for Snappy

bash
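# Download and install wkhtmltopdf
# (this package targets Ubuntu 18.04 "bionic" on amd64; pick the build for your distribution)
# xvfb is generally only needed for distribution builds that lack the patched Qt
sudo apt install -y xvfb
wget https://github.com/wkhtmltopdf/packaging/releases/download/0.12.6-1/wkhtmltox_0.12.6-1.bionic_amd64.deb
sudo dpkg -i wkhtmltox_0.12.6-1.bionic_amd64.deb
sudo apt install -f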

# Verify installation
wkhtmltopdf --version

Setting Up Headless Chrome

bash
# Install Chrome/Chromium
sudo apt install -y chromium-browser

# Or install Google Chrome
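# apt-key is deprecated; store the signing key in a keyring and reference it with signed-by
wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | sudo gpg --dearmor -o /usr/share/keyrings/google-chrome.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/google-chrome.gpg] http://dl.google.com/linux/chrome/deb/ stable main" | sudo tee /etc/apt/sources.list.d/google-chrome.list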
sudo apt update
sudo apt install -y google-chrome-stable

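# Install Node.js and Puppeteer (Node 18 shown; substitute a currently supported LTS release)
curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
sudo apt install -y nodejs
# Install Puppeteer in the directory that holds puppeteer-pdf.js so require('puppeteer') resolves
npm install puppeteer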

Performance Optimization Techniques

Memory Management

For PHP-native libraries like Mpdf:

php
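// In php.ini (a configuration line, not PHP code): memory_limit = 512M

// Or raise the limit at runtime for the current request only
ini_set('memory_limit', '512M');

// Point Mpdf at a writable temp directory for its cache and temporary files
$mpdf = new \Mpdf\Mpdf([
    'mode' => 'utf-8',
    'format' => 'A4',
    'tempDir' => '/tmp/mpdf',
    'setAutoTopMargin' => 'pad',
    'setAutoBottomMargin' => 'pad'
]);

// WriteHTML() can be called repeatedly, so a large document can be fed in several pieces
$mpdf->WriteHTML($htmlContent);
// 'F' saves the PDF to a file on the server instead of sending it to the browser
$mpdf->Output('document.pdf', 'F');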

Caching and Optimization

php
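// Implement caching for frequently generated PDFs
function getCachedPdf($key, $html, $ttl = 3600) {
    $cacheDir = '/tmp/pdf_cache';
    if (!is_dir($cacheDir)) {
        mkdir($cacheDir, 0700, true);
    }

    // Hash the key so it is always a safe file name
    $cacheFile = $cacheDir . '/' . sha1($key) . '.pdf';

    if (file_exists($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        return $cacheFile;
    }

    // Generate PDF; generatePdfWithLibrary() stands for whichever converter you chose
    generatePdfWithLibrary($html, $cacheFile);
    return $cacheFile;
}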

// Optimize HTML for PDF generation
function optimizeHtmlForPdf($html) {
    // Remove <script> blocks; PHP-native renderers do not execute JavaScript
    $html = preg_replace('/<script\b[^>]*>.*?<\/script>/is', '', $html);

    // Keep <style> blocks: the renderer needs them to apply your CSS

    // Drop lazy-loading hints so images are fetched during rendering
    $html = preg_replace('/\sloading\s*=\s*(["\'])lazy\1/i', '', $html);

    return $html;
}
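
The cache function above takes a caller-supplied `$key`. One way to derive it, sketched here under the assumption that the PDF depends only on the HTML and a set of rendering options, is to hash both; `pdfCacheKey` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: derives a filesystem-safe cache key from the HTML and
// the rendering options, so changed content never maps to a stale PDF.
function pdfCacheKey(string $html, array $options = []): string
{
    ksort($options); // make the key independent of option order
    return hash('sha256', $html . "\0" . json_encode($options));
}
```

For example, `getCachedPdf(pdfCacheKey($html, ['format' => 'A4']), $html)` regenerates the file whenever the markup or the options change, while the TTL still bounds how long an unchanged entry lives.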

Background Processing

For resource-intensive conversions:

php
// Use queues or background processing
function generatePdfInBackground($html, $outputPath, $callback = null) {
    $tempHtmlFile = tempnam(sys_get_temp_dir(), 'html_');
    file_put_contents($tempHtmlFile, $html);
    
    $command = "nohup php generate_pdf.php " . escapeshellarg($tempHtmlFile) . " " . escapeshellarg($outputPath) . " > /dev/null 2>&1 &";
    exec($command);
    
    if ($callback) {
        $callback($tempHtmlFile);
    }
}

// Worker script (generate_pdf.php)
$htmlFile = $argv[1];
$outputFile = $argv[2];

$html = file_get_contents($htmlFile);
generatePdfWithLibrary($html, $outputFile);
unlink($htmlFile);
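
With background generation, the web request needs a reliable way to tell a finished PDF from one that is still being written. A common approach, sketched below with hypothetical helper names, is to write to a temporary file and rename it into place, since a rename within one filesystem is atomic on Linux.

```php
<?php
// Hypothetical helper: writes the PDF bytes next to the target and renames
// the file into place, so readers see either no file or the complete file.
function publishPdfAtomically(string $pdfBytes, string $outputPath): bool
{
    $tmpPath = $outputPath . '.' . uniqid('part', true);
    if (file_put_contents($tmpPath, $pdfBytes) === false) {
        return false;
    }
    if (!rename($tmpPath, $outputPath)) {
        unlink($tmpPath);
        return false;
    }
    return true;
}

// Hypothetical helper: the web request can poll this before offering a download.
function isPdfReady(string $outputPath): bool
{
    clearstatcache(true, $outputPath);
    return is_file($outputPath) && filesize($outputPath) > 0;
}
```

In the worker this assumes the converter can return the PDF as a string (with Mpdf, `Output('', 'S')` does so); converters that only write files can target the temporary path and be renamed afterwards.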

Troubleshooting Common Issues

Memory Issues with DOMPDF/Mpdf

php
// Solution: Raise PHP's memory limit and give Mpdf a dedicated temp directory
ini_set('memory_limit', '512M');

$mpdf = new \Mpdf\Mpdf([
    'mode' => 'utf-8',
    'format' => 'A4',
    'tempDir' => '/tmp/mpdf',
    'debug' => true // Surface errors while diagnosing; turn off in production
]);

// For very large documents, write one part per page.
// '<page-break>' is a custom marker placed in your own HTML, not a standard tag.
$htmlParts = explode('<page-break>', $htmlContent);
foreach ($htmlParts as $part) {
    $mpdf->AddPage();
    $mpdf->WriteHTML($part);
}
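
Before raising limits, it helps to know how much memory a conversion actually uses. The sketch below wraps any rendering callback and reports the elapsed time and PHP's peak memory; `measureRender` is a hypothetical helper, and the peak value covers the whole process so far, not only the callback.

```php
<?php
// Hypothetical helper: runs a rendering callback and reports the elapsed
// time and the peak memory PHP has allocated so far in this process.
function measureRender(callable $render): array
{
    $start = microtime(true);
    $result = $render();

    return [
        'result' => $result,
        'seconds' => microtime(true) - $start,
        'peakMemoryMb' => memory_get_peak_usage(true) / 1048576,
    ];
}
```

Wrapping the `WriteHTML()` and `Output()` calls in such a callback for a representative document gives a concrete number to compare against `memory_limit`.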

CSS Compatibility Issues

php
// Solution: Use CSS resets and PDF-specific styles
$pdfStyles = '
    @page {
        size: A4;
        margin: 1cm;
    }
    
    body {
        font-family: Arial, sans-serif;
        font-size: 12pt;
        line-height: 1.4;
    }
    
    table {
        border-collapse: collapse;
        width: 100%;
    }
    
    table td, table th {
        border: 1px solid #ddd;
        padding: 8px;
        text-align: left;
    }
    
    img {
        max-width: 100%;
        height: auto;
    }
';

// Apply PDF-specific styles
$html = '<style>' . $pdfStyles . '</style>' . $html;
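
Prepending a style block works for HTML fragments, but for a complete document it places the styles before the doctype and the opening html tag. A small sketch of a helper that inserts them at the end of the head instead, so they also take precedence over earlier rules of equal specificity, could look like this (`injectPdfStyles` is a hypothetical name):

```php
<?php
// Hypothetical helper: adds PDF-specific CSS to a document or fragment.
function injectPdfStyles(string $html, string $css): string
{
    $styleTag = '<style>' . $css . '</style>';

    // Full document: insert just before the closing </head> tag
    $pos = stripos($html, '</head>');
    if ($pos !== false) {
        return substr($html, 0, $pos) . $styleTag . substr($html, $pos);
    }

    // Fragment without a <head>: fall back to prepending
    return $styleTag . $html;
}
```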

Font Loading Issues

php
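// Solution: Register custom fonts alongside Mpdf's bundled ones
$defaultConfig = (new \Mpdf\Config\ConfigVariables())->getDefaults();
$defaultFontConfig = (new \Mpdf\Config\FontVariables())->getDefaults();

$mpdf = new \Mpdf\Mpdf([
    'mode' => 'utf-8',
    'format' => 'A4',
    // Merge with the defaults so the bundled fonts stay available
    'fontDir' => array_merge($defaultConfig['fontDir'], [
        __DIR__ . '/fonts/',
        '/usr/share/fonts/truetype/dejavu/'
    ]),
    'fontdata' => $defaultFontConfig['fontdata'] + [
        'customfont' => [
            'R' => 'Custom-Regular.ttf',
            'B' => 'Custom-Bold.ttf',
            'I' => 'Custom-Italic.ttf',
            'BI' => 'Custom-BoldItalic.ttf'
        ]
    ],
    // Use the font key registered above as the document default
    'default_font' => 'customfont'
]);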

Table Rendering Problems

php
// Solution: Use table-friendly HTML structure
$html = '
<table>
    <thead>
        <tr>
            <th>Header 1</th>
            <th>Header 2</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Data 1</td>
            <td>Data 2</td>
        </tr>
    </tbody>
</table>
';

// CSS for tables
$css = '
table {
    border-collapse: collapse;
    width: 100%;
    page-break-inside: avoid;
}
 
table td, table th {
    border: 1px solid #000;
    padding: 6px;
    text-align: left;
}

thead {
    background-color: #f2f2f2;
}
';
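
When table rows come from application data, generating the markup in one place keeps the structure consistent and ensures every cell is escaped. This is a minimal sketch with a hypothetical helper name, `renderHtmlTable`; it produces the same thead/tbody structure shown above.

```php
<?php
// Hypothetical helper: builds a thead/tbody table from arrays, escaping
// every value so stray markup in the data cannot break the layout.
function renderHtmlTable(array $headers, array $rows): string
{
    $esc = fn($value): string => htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');

    $html = '<table><thead><tr>';
    foreach ($headers as $header) {
        $html .= '<th>' . $esc($header) . '</th>';
    }
    $html .= '</tr></thead><tbody>';

    foreach ($rows as $row) {
        $html .= '<tr>';
        foreach ($row as $cell) {
            $html .= '<td>' . $esc($cell) . '</td>';
        }
        $html .= '</tr>';
    }

    return $html . '</tbody></table>';
}
```

The result can be combined with the table CSS in a style block and passed to whichever converter is in use.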

Advanced Features and Customization

Adding Headers and Footers

php
// Mpdf example with headers and footers
$mpdf = new \Mpdf\Mpdf();

// Set up headers and footers
$mpdf->SetHTMLHeader('
<div style="text-align: right; font-size: 10pt;">
    Page {PAGENO} of {nbpg}
</div>
');

$mpdf->SetHTMLFooter('
<div style="text-align: center; font-size: 8pt;">
    Generated on ' . date('Y-m-d H:i:s') . '
</div>
');

$mpdf->WriteHTML($html);
$mpdf->Output('document.pdf', 'D');

Watermarks and Security

php
// Add watermark to PDF
$mpdf = new \Mpdf\Mpdf();

// Add watermark
$mpdf->SetWatermarkText('DRAFT');
$mpdf->showWatermarkText = true;

// Set PDF security
$mpdf->SetProtection(
    ['copy', 'print'], // Allow printing and copying
    'user_password',   // User password
    'owner_password'   // Owner password
);

$mpdf->WriteHTML($html);
$mpdf->Output('document.pdf', 'D');

Multi-language Support

php
// Support for different languages and fonts
$mpdf = new \Mpdf\Mpdf([
    'mode' => 'utf-8',
    'format' => 'A4',
    'fontDir' => [
        __DIR__ . '/fonts/',
        '/usr/share/fonts/truetype/'
    ],
    'fontdata' => [
        'cjk' => [
            'R' => 'NotoSansCJK-Regular.ttc',
            'useOTL' => 0x100
        ]
    ]
]);

// Set language-specific settings
$mpdf->autoScriptToLang = true;
$mpdf->autoLangToFont = true;

$mpdf->WriteHTML('<p>Hello 世界!</p>');
$mpdf->WriteHTML('<p>Bonjour le monde!</p>');
$mpdf->Output('multilingual.pdf', 'D');

Dynamic Content Generation

php
// Generate PDF with dynamic data from database
function generateInvoicePdf($invoiceId) {
    // Get invoice data
    $invoice = getInvoiceData($invoiceId);
    $items = getInvoiceItems($invoiceId);
    
    // Generate HTML template
    $html = '
    <h1>Invoice #' . htmlspecialchars($invoice['id']) . '</h1>
    <p>Date: ' . htmlspecialchars($invoice['date']) . '</p>
    <table>
        <thead>
            <tr>
                <th>Item</th>
                <th>Quantity</th>
                <th>Price</th>
                <th>Total</th>
            </tr>
        </thead>
        <tbody>';
    
    // Escape database values so stray markup cannot break the layout
    foreach ($items as $item) {
        $html .= '
            <tr>
                <td>' . htmlspecialchars($item['description']) . '</td>
                <td>' . htmlspecialchars($item['quantity']) . '</td>
                <td>$' . number_format($item['price'], 2) . '</td>
                <td>$' . number_format($item['quantity'] * $item['price'], 2) . '</td>
            </tr>';
    }
    
    $html .= '
        </tbody>
    </table>
    <p><strong>Total: $' . number_format($invoice['total'], 2) . '</strong></p>
    ';
    
    // Generate PDF
    $mpdf = new \Mpdf\Mpdf();
    $mpdf->WriteHTML($html);
    $mpdf->Output('invoice_' . $invoiceId . '.pdf', 'D');
}

Conclusion

HTML to PDF conversion on Linux servers using PHP can be successfully implemented with several approaches depending on your specific requirements:

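  1. For high-quality CSS support: Use an external renderer such as headless Chrome via Puppeteer, or WeasyPrint. Puppeteer gives the most browser-accurate output, WeasyPrint avoids running a browser but does not execute JavaScript, and both require additional system dependencies.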

  2. For PHP-native solutions: Mpdf offers the best balance of features and performance, addressing many of the limitations you encountered with DOMPDF, particularly for tables and memory usage.

  3. For complex layouts: Consider wkhtmltopdf via Snappy, which uses WebKit rendering for excellent CSS support while remaining accessible through PHP.

  4. For performance optimization: Implement caching, background processing, and memory management techniques to handle large documents efficiently.

Recommended next steps: Start with Mpdf for its ease of integration and good performance, then explore headless browser solutions if you need advanced CSS support. Always test with your specific HTML content to ensure compatibility and performance meet your requirements.
