Fix Ansible SSM 'stty -echo' Timeout Failures in GitHub Actions
Troubleshoot and resolve intermittent 'DISABLE ECHO command' timeout failures when running Ansible playbooks via AWS SSM in GitHub Actions. Learn timeout configurations, pipelining settings, and retry strategies for stable deployments.
How to troubleshoot and fix intermittent ‘DISABLE ECHO command 'stty -echo' timeout’ failures when running Ansible playbook via AWS SSM in GitHub Actions?
Issue Description
- Ansible deployment to EC2 instances via SSM fails intermittently with timeout on DISABLE ECHO command ‘stty -echo’.
- Error occurs at task path: ansible/playbooks/deploy_app.yml:2.
- SSM connection sometimes shows ‘Connection Lost’ and gets stuck.
- Increasing timeout from 60s to 120s and testing SSM connection helps partially, but issue persists.
- Deployment sometimes succeeds.
GitHub Actions Workflow Step
```yaml
- name: Run Ansible deployment via SSM
  env:
    AWS_DEFAULT_REGION: ${{ env.AWS_REGION }}
    AWS_REGION: ${{ env.AWS_REGION }}
  run: |
    ansible-playbook \
      -i ansible/inventory/hosts.ini \
      ansible/playbooks/deploy_app.yml \
      -e "ecr_registry=${{ steps.deploy-vars.outputs.ecr-registry }}" \
      -e "ecr_repository=${{ steps.deploy-vars.outputs.ecr-repository }}" \
      -e "deploy_environment=${{ needs.determine-environment.outputs.environment }}" \
      -e "image_tag=latest" \
      -e "aws_region=eu-west-2" \
      -vvv
```
Ansible Playbook (deploy_app.yml)
The playbook pulls Docker images from ECR, logs in using SSM parameters, fetches env vars from SSM Parameter Store, stops old containers, starts new ones (ec2server_app, celery worker/beat, nginx), and cleans up.
Key sections:
- ECR Login:

```yaml
- name: Get ECR login password
  shell: aws ecr get-login-password --region {{ aws_region }}
  register: ecr_password
  changed_when: false
  no_log: true
  become: false
  retries: 3
  delay: 5
  until: ecr_password.rc == 0

- name: Log in to Docker with ECR credentials
  shell: |
    echo "{{ ecr_password.stdout }}" | docker login --username AWS --password-stdin {{ ecr_registry }}
# ...
```

- SSM Parameters:

```yaml
- name: Fetch environment variables from SSM
  shell: |
    aws ssm get-parameters-by-path \
      --path "/{{ deploy_environment | upper }}/BE/DJ" \
      --region {{ aws_region }} \
      --recursive \
      --with-decryption \
      --query 'Parameters[*].[Name,Value]' \
      --output json
# ...
```

- Docker containers are started with `become: true` on `webservers` hosts.
Full playbook involves Docker network creation, image pull, container management (nginx:latest and app image), crontab, celery, and image pruning.
Questions
- Why does the ‘stty -echo’ timeout occur during SSM Ansible execution (likely related to sudo/become)?
- How to stabilize SSM connections for reliable Ansible runs?
- Recommended timeouts, retries, or configuration for AWS SSM with Ansible in CI/CD?
- Any Ansible.cfg or SSM plugin settings to prevent echo disable timeouts?
Ansible deployments via AWS SSM intermittently fail with ‘stty -echo’ timeouts because privilege escalation (become) commands stall over high-latency or throttled SSM sessions. The connection drops before the stty -echo command that precedes a sudo prompt can complete, so the task times out even though the playbook is otherwise sound. To fix this, you’ll need to extend timeouts, enable pipelining, configure retries, and optimize GitHub Actions network settings for consistent SSM connectivity.
Contents
- Understanding the ‘stty -echo’ Timeout Issue
- Why the Timeout Occurs During SSM Ansible Execution
- Stabilizing SSM Connections for Reliable Ansible Runs
- Recommended Timeouts and Retries for AWS SSM
- Ansible Configuration Settings to Prevent Echo Disable Timeouts
- GitHub Actions Workflow Optimizations
- Advanced Troubleshooting Techniques
- Best Practices for Long-Term Stability
Understanding the ‘stty -echo’ Timeout Issue
The “DISABLE ECHO command ‘stty -echo’ timeout” error in Ansible SSM deployments occurs when privilege escalation (become/sudo) operations hang during terminal setup. This specifically affects become tasks running over SSM connections where the remote host executes stty -echo to disable command echo during sudo password prompts. When network latency or SSM throttling delays this command, Ansible’s default timeout (10-30 seconds) expires before the operation completes, causing connection loss and playbook failure.
The intermittent nature of these failures points to underlying network instability between GitHub Actions and your EC2 instances, or SSM throttling during peak usage. The error typically manifests during become tasks in your playbook, especially those requiring sudo privileges for Docker operations. Understanding this root cause is crucial for applying targeted fixes rather than simply increasing timeouts blindly.
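Before changing any timeouts, it can help to confirm that privilege escalation itself is the failing path rather than the Docker tasks. The play below is a minimal sketch, not part of the original playbook; the `webservers` group name is taken from the playbook description, and everything else is an assumption.

```yaml
# Hypothetical smoke test: exercises sudo (and the stty -echo step) over SSM
# without touching Docker. If this play also times out intermittently, the
# problem is the connection/become path, not the deployment tasks.
- hosts: webservers
  gather_facts: false
  become: true
  tasks:
    - name: Run a trivial privileged command
      command: /bin/true
      changed_when: false
```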
Why the Timeout Occurs During SSM Ansible Execution
The ‘stty -echo’ timeout specifically occurs during privilege escalation operations when Ansible attempts to disable terminal echo for security purposes. This happens because:
- Sudo Interactions: When tasks use `become: true`, Ansible must create an interactive shell session where sudo prompts may appear. The `stty -echo` command disables command visibility during these prompts, but requires immediate execution.
- SSM Connection Delays: AWS SSM connections rely on persistent sessions that can drop under network pressure. GitHub Actions runners experience intermittent latency between your CI environment and AWS regions, causing stty commands to time out.
- Docker Container Operations: Tasks starting or stopping containers with `become: true` are particularly vulnerable, as they often trigger longer privilege escalation sequences with multiple sudo interactions.
According to Nick vs Networking, this becomes problematic when Ansible-managed machines are being set up, as timeouts occur during hostname changes and privilege escalations. The same mechanism affects your Docker operations during container lifecycle management.
Stabilizing SSM Connections for Reliable Ansible Runs
To stabilize SSM connections for consistent Ansible execution:
- Enable Pipelining: Add `pipelining=True` to the `[ssh_connection]` section of ansible.cfg (see the sketch below). This reduces round-trips by sending multiple commands in a single session, which is critical for SSM connections where each new connection adds overhead.

- Implement Connection Retries: Configure your GitHub Actions workflow to retry failed SSM connections automatically. Add a retry step with exponential backoff before your Ansible playbook execution.

- Use Wait for Connection: Include a wait task before playbook execution to ensure SSM agents are responsive:

```yaml
- name: Wait for SSM connection
  wait_for_connection:
    timeout: 120
    delay: 10
  retries: 5
```

- Optimize Network Path: Ensure GitHub Actions runners use direct, low-latency paths to your AWS region. Consider deploying runners in the same region as your EC2 instances to minimize transit delays.
The AWS SSM documentation emphasizes using wait_for_connection to handle intermittent agent readiness issues before executing playbooks.
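As a concrete starting point, the pipelining change from the first item above is a small ansible.cfg addition. This is a minimal sketch, assuming a stock ansible.cfg; the `[connection]` entry is included on the assumption that it covers non-SSH connection plugins as well.

```ini
# ansible.cfg - enable pipelining to cut per-task round-trips
[ssh_connection]
pipelining = True

[connection]
pipelining = True
```

Note that pipelining only works when `Defaults requiretty` is not set in sudoers on the target hosts.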
Recommended Timeouts and Retries for AWS SSM
For reliable SSM connections in CI/CD environments, implement these timeout configurations:
- Connection Timeout: Set `ansible_ssh_timeout=120` (or the equivalent ansible.cfg entry) to extend the connection window for each task:

```ini
[defaults]
timeout = 120
```

- Command Timeout: Configure `ansible_command_timeout=120` for long-running commands like Docker operations; in ansible.cfg this lives under the persistent-connection section:

```ini
[persistent_connection]
command_timeout = 120
```

- Become Timeout: Set `ansible_become_timeout=120` to handle sudo privilege escalation delays:

```ini
[privilege_escalation]
become_timeout = 120
```

- Task Retries: Add retry parameters directly to vulnerable tasks, registering the task's own result so `until` checks the right variable:

```yaml
- name: Log in to Docker with ECR credentials
  shell: >
    echo "{{ ecr_password.stdout }}" |
    docker login --username AWS --password-stdin {{ ecr_registry }}
  register: docker_login
  retries: 3
  delay: 10
  until: docker_login.rc == 0
```

- SSM Plugin Timeout: The community.aws.aws_ssm connection plugin has its own command timeout and retry options; raise them through inventory or group variables (for example `ansible_aws_ssm_timeout` and `ansible_aws_ssm_retries`, per the plugin documentation) instead of relying only on the SSH settings above. See the group_vars sketch below.
As noted in the Server Fault discussion of the Mitogen plugin configuration, extending ConnectTimeout to 120s handles bastion delays and SSM throttling effectively.
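To apply the SSM plugin options per host group rather than globally, a group_vars file alongside the existing hosts.ini inventory can carry them. This is a minimal sketch, assuming the webservers group is reached through the community.aws.aws_ssm connection plugin; the file path, bucket name, and the option names marked below are assumptions to verify against the plugin documentation for your installed collection version.

```yaml
# ansible/inventory/group_vars/webservers.yml  (hypothetical path)
# Connection and timeout settings for the SSM-managed hosts.
ansible_connection: community.aws.aws_ssm
ansible_aws_ssm_region: eu-west-2                     # matches the aws_region extra-var in the workflow
ansible_aws_ssm_bucket_name: my-ssm-transfer-bucket   # placeholder bucket used by the plugin for file transfers
ansible_aws_ssm_timeout: 120                          # assumed option name: per-command timeout in seconds
ansible_aws_ssm_retries: 3                            # assumed option name: connection retry attempts
ansible_python_interpreter: /usr/bin/python3          # avoids interpreter discovery delays
```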
Ansible Configuration Settings to Prevent Echo Disable Timeouts
Prevent ‘stty -echo’ timeouts with these Ansible-specific configurations:
- Override Default SSH Arguments: Modify ansible.cfg to add connection keep-alive parameters:

```ini
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o ServerAliveInterval=30 -o ServerAliveCountMax=3
```

- Disable Interactive Prompts: Add `become_flags: '-n'` to skip sudo password prompts in your playbook. Note that `-n` only works when the remote user has passwordless sudo; otherwise sudo fails immediately instead of hanging:

```yaml
- hosts: webservers
  become: true
  become_flags: '-n'
  tasks:
    # Docker operations
```

- Use a Non-Interactive Shell: Set `ansible_shell_type=sh` (for example as an inventory or group variable) to avoid bash-specific stty issues.

- Use Async with Polling: Run long operations asynchronously so a single command does not hold the session open past the timeout:

```yaml
- name: Long-running Docker operation
  shell: long_command
  async: 300
  poll: 10
```

- Disable Python Interpreter Discovery: Set `ansible_python_interpreter=/usr/bin/python3` to avoid interpreter discovery delays that trigger become timeouts.
The Stack Overflow discussion highlights these ssh_args configurations and explains how async_status polling can extend wait times during command execution.
GitHub Actions Workflow Optimizations
Optimize your GitHub Actions workflow to reduce SSM connection failures:
- Add Network Resilience: Before the Ansible step, check that the instance's SSM agent is reachable:

```yaml
- name: Test SSM agent reachability
  run: |
    aws ssm describe-instance-information --region ${{ env.AWS_REGION }} --filters Key=InstanceIds,Values=$INSTANCE_ID
  env:
    INSTANCE_ID: ${{ ec2_instance_id }}
```

- Use Self-Hosted Runners: Deploy GitHub Actions runners in the same AWS region as your EC2 instances to minimize latency.

- Implement Circuit Breakers: Wrap Ansible execution in a script that retries on failure:

```bash
#!/bin/bash
MAX_RETRIES=3
RETRY_DELAY=30

for i in $(seq 1 $MAX_RETRIES); do
  ansible-playbook ... && exit 0
  if [[ $i -eq $MAX_RETRIES ]]; then
    echo "Final attempt failed"
    exit 1
  fi
  sleep $RETRY_DELAY
done
```

- Configure Regional Endpoints: Explicitly point the AWS CLI's SSM calls at the regional endpoint (recent AWS CLI v2 versions support service-specific endpoint variables):

```yaml
env:
  AWS_ENDPOINT_URL_SSM: https://ssm.${{ env.AWS_REGION }}.amazonaws.com
```
Cache Dependencies: Cache Docker images and Ansible collections to reduce SSM operations during runs.
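For the dependency-caching point above, a minimal sketch using the standard actions/cache action is shown below; the `ansible/requirements.yml` path and cache key are assumptions about a typical layout, not something from the original workflow.

```yaml
- name: Cache Ansible collections
  uses: actions/cache@v4
  with:
    path: ~/.ansible/collections
    key: ansible-collections-${{ hashFiles('ansible/requirements.yml') }}
    restore-keys: |
      ansible-collections-

- name: Install Ansible collections
  run: ansible-galaxy collection install -r ansible/requirements.yml
```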
Advanced Troubleshooting Techniques
When standard timeout adjustments fail, use these advanced techniques:
- Enable Debug Logging: Add `-vvv` to your ansible-playbook command to capture detailed SSM connection traces. Look for "stty -echo" commands in the output.

- Monitor SSM Sessions: Use AWS Systems Manager Session Manager to manually test privilege escalation commands:

```bash
aws ssm start-session --target i-1234567890abcdef0 --document-name AWS-StartInteractiveCommand --parameters command="sudo echo test"
```

- Check S3 Bucket Permissions: Ensure your S3 bucket (specified by `ansible_aws_ssm_bucket_name`) has proper permissions for script downloads. Add this to your playbook:

```yaml
- name: Verify SSM bucket access
  shell: curl -s https://{{ ansible_aws_ssm_bucket_name }}.s3.amazonaws.com/ansible-connection-test
```

- Capture SSM Error Details: Parse Ansible output for specific error patterns in GitHub Actions (this assumes a log file is written; see the logging snippet after this list):

```yaml
- name: Parse SSM errors
  if: failure()
  run: |
    grep -i "stty.*timeout" ansible.log && echo "Detected echo timeout"
```

- Use SSM Session Logging: Configure Session Manager preferences (the session document, typically SSM-SessionManagerRunShell) to stream session output to S3 or CloudWatch Logs so that failed stty commands are captured for later inspection.
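The error-parsing step above needs Ansible to write a log file. One way to produce ansible.log in the workspace is to set the standard ANSIBLE_LOG_PATH environment variable on the deployment step; this is a sketch based on the workflow step shown earlier, trimmed to the relevant parts.

```yaml
- name: Run Ansible deployment via SSM
  env:
    ANSIBLE_LOG_PATH: ${{ github.workspace }}/ansible.log   # makes the -vvv output greppable after a failure
    AWS_REGION: ${{ env.AWS_REGION }}
  run: |
    ansible-playbook -i ansible/inventory/hosts.ini ansible/playbooks/deploy_app.yml -vvv
```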
The GitHub issue discussion reveals that malformed echo commands in stty/sudo cause failures, and enabling pipelining helps bypass these issues.
Best Practices for Long-Term Stability
Ensure reliable SSM-based Ansible deployments with these best practices:
- Implement Health Checks: Regularly test SSM connectivity with a dedicated playbook:

```yaml
- name: SSM health check
  wait_for_connection:
    timeout: 60
    delay: 5
```

- Optimize SSM Agent Configuration: Keep SSM agents updated to the latest version and restart them after configuration changes:

```bash
sudo systemctl restart amazon-ssm-agent
```

- Use Connection Plugins: Prefer the community.aws.aws_ssm connection plugin with tuned timeout and retry options (see the group_vars sketch earlier); for any hosts still reached over SSH, keep sessions alive with:

```ini
[ssh_connection]
ssh_common_args = -o ControlMaster=auto -o ControlPersist=600 -o ServerAliveInterval=30
```

- Implement Canary Deployments: Test privilege escalation tasks on a staging instance before production deployment.

- Monitor SSM Metrics: Set up CloudWatch alarms for SSM session failures and connection latency.

- Document Timeout Settings: Maintain a central configuration repository with timeout standards that align with your network conditions.
According to Bobcares, combining increased timeouts with retries and non-interactive sudo flags resolves 90% of become timeout issues in SSM environments.
Sources
- Stack Overflow: Increase timeout of SSH command in Ansible
- Reddit: Ansible does not respect timeout in playbook
- Reddit: Having trouble using aws ssm for connection
- Server Fault: SSH timeout with Ansible Mitogen plugin
- Ansible Docs: Network connection options
- Nick vs Networking: Ansible – Timeout on Become
- Bobcares: Ansible Privilege Escalation Timeout
- Ansible Docs: AWS SSM connection
- GitHub Issue: ssm connection plugin fails at gathering facts
- Stack Overflow: Async_status delay for polling
Conclusion
Resolving intermittent ‘stty -echo’ timeouts in Ansible SSM deployments requires a multi-layered approach addressing connection stability, privilege escalation delays, and CI/CD resilience. By implementing extended timeouts (120s), enabling pipelining, configuring retries for become operations, and optimizing GitHub Actions network paths, you can eliminate most SSM timeout failures. The most critical fixes include adding ansible_become_timeout=120, using become_flags='-n' to skip interactive prompts, and setting up proper SSM bucket configurations. For long-term stability, establish health checks and monitoring of SSM sessions, as these timeouts often indicate deeper network or infrastructure issues that require ongoing attention.