How to force refresh AWS SSO login credentials to prevent token expiration during long-running jobs?
I’m using Metaflow with S3 and need to authenticate via AWS CLI single sign-on (aws sso login). The issue is that I can’t programmatically refresh my SSO login credentials when they expire.
Problem scenario:
- SSO timeout is set to 24 hours
- Day 1: Login at 10am using aws sso login
- Day 2: Start a program at 9am using aws sso login again
- The token is still valid at login, but expires at 10am
- My Metaflow flow crashes when it can no longer access S3 after token expiration
How can I prevent this issue? I haven’t found relevant information through searches. Should I manually delete tokens in ~/.aws/sso/cache?
I also have long-running jobs that exceed the 24-hour timeout period.
AWS SSO access tokens expire by design when the configured session duration elapses (24 hours in your setup). You can renew them programmatically by re-running AWS CLI commands, using custom scripts, or leveraging the AWS SSO cache mechanism. For jobs that outlast the session duration, you need to implement automatic credential refresh logic or use alternative authentication methods such as IAM roles or STS session tokens with MFA.
Contents
- Understanding AWS SSO Token Caching
- Programmatic Credential Refresh Methods
- Solutions for Long-Running Jobs
- Metaflow Integration Best Practices
- Manual Token Management
- Security Considerations
Understanding AWS SSO Token Caching
When you authenticate using aws sso login, the AWS CLI performs an OIDC/OAuth flow and stores a JSON token file in the SSO cache directory, typically located at ~/.aws/sso/cache/. This cache contains critical information:
{
  "accessToken": "Bearer_token_value",
  "expiresAt": "2025-01-15T10:00:00Z",
  "clientId": "client_id_value",
  "clientSecret": "secret_value",
  "region": "us-east-1",
  "startUrl": "https://your-sso-start-url"
}
The expiresAt timestamp determines when the token becomes invalid. With the classic profile-based SSO configuration you cannot silently refresh this token programmatically; it is designed to be renewed by running aws sso login again once it expires (newer AWS CLI v2 sso-session configurations can refresh tokens automatically where refresh tokens are issued). According to the AWS documentation, SSO tokens are deliberately short-lived to reduce the risk of leaked credentials compared to long-lived access keys.
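For a quick check of when the newest cached token expires, a short Python snippet helps. This is a sketch: it relies on the cache layout shown above, and the trailing-'Z' handling is an assumption about how your CLI version writes timestamps.

import glob
import json
import os
from datetime import datetime, timezone

# Find the most recently written cache file
cache_files = glob.glob(os.path.expanduser('~/.aws/sso/cache/*.json'))
newest = max(cache_files, key=os.path.getmtime)
with open(newest) as f:
    data = json.load(f)

expires_at = data.get('expiresAt')
print(f'{newest} expires at {expires_at}')
if expires_at:
    # Cache timestamps are UTC; some CLI versions write a trailing 'Z'
    expiry = datetime.fromisoformat(expires_at.replace('Z', '+00:00'))
    print('expired' if expiry <= datetime.now(timezone.utc) else 'still valid')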
Programmatic Credential Refresh Methods
Method 1: AWS SSO Token Refresh Script
Create a script that checks token expiration and refreshes when needed:
#!/bin/bash
CACHE_DIR="$HOME/.aws/sso/cache"
# Pick the most recently written cache file
TOKEN_FILE=$(ls -t "$CACHE_DIR"/*.json 2>/dev/null | head -1)

if [ -f "$TOKEN_FILE" ]; then
    EXPIRES_AT=$(jq -r '.expiresAt' "$TOKEN_FILE")
    CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
    if [[ "$CURRENT_TIME" > "$EXPIRES_AT" ]]; then
        echo "Token expired, refreshing..."
        aws sso login --profile your-sso-profile
    else
        echo "Token still valid"
    fi
else
    echo "No token found, logging in..."
    aws sso login --profile your-sso-profile
fi
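The plain string comparison works here because ISO 8601 UTC timestamps sort lexicographically; if your cache ever mixes timestamp formats, parse the dates explicitly instead.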
Method 2: Python Script with Boto3
For programmatic access, use the AWS SDK to get role credentials:
import glob
import json
import os

import boto3

def get_sso_credentials(sso_start_url, sso_region, sso_account_id, sso_role_name):
    """Get role credentials using the cached AWS SSO access token"""
    sso_client = boto3.client('sso', region_name=sso_region)

    # Read the most recently written cached token (the cache directory can
    # hold several JSON files; the filename is derived from the start URL)
    cache_files = glob.glob(os.path.expanduser('~/.aws/sso/cache/*.json'))
    cache_file = max(cache_files, key=os.path.getmtime)
    with open(cache_file, 'r') as f:
        token_data = json.load(f)

    # Exchange the SSO access token for temporary role credentials
    response = sso_client.get_role_credentials(
        accessToken=token_data['accessToken'],
        accountId=sso_account_id,
        roleName=sso_role_name
    )

    credentials = response['roleCredentials']
    return {
        'aws_access_key_id': credentials['accessKeyId'],
        'aws_secret_access_key': credentials['secretAccessKey'],
        'aws_session_token': credentials['sessionToken'],
        'expiration': credentials['expiration']  # epoch milliseconds
    }

def check_and_refresh_credentials():
    """Check if credentials need refresh and return new ones"""
    # Your SSO configuration
    config = {
        'sso_start_url': 'https://your-sso-start-url',
        'sso_region': 'us-east-1',
        'sso_account_id': '123456789012',
        'sso_role_name': 'YourRoleName'
    }
    return get_sso_credentials(**config)
Method 3: Using AWS STS with MFA
For longer-lived credentials, use AWS STS with MFA:
#!/bin/bash
MFA_ARN="arn:aws:iam::123456789012:mfa/your-user"
DURATION=86400  # 24 hours
PROFILE="your-profile"

echo -n "Enter MFA code: "
read TOKEN_CODE

# Get temporary credentials
CREDS=$(aws sts get-session-token \
    --serial-number "$MFA_ARN" \
    --token-code "$TOKEN_CODE" \
    --duration-seconds "$DURATION" \
    --profile "$PROFILE" \
    --output json)

# Extract credentials
AWS_ACCESS_KEY_ID=$(echo "$CREDS" | jq -r '.Credentials.AccessKeyId')
AWS_SECRET_ACCESS_KEY=$(echo "$CREDS" | jq -r '.Credentials.SecretAccessKey')
AWS_SESSION_TOKEN=$(echo "$CREDS" | jq -r '.Credentials.SessionToken')

# Export for use in scripts
export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN
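Note that get-session-token must be called with long-term IAM user credentials (not SSO-derived or other temporary credentials), and for IAM users the duration can range from 15 minutes up to 36 hours.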
Solutions for Long-Running Jobs
Solution 1: Automatic Credential Refresh in Metaflow
Modify your Metaflow script to handle credential refresh:
import os
import subprocess
from datetime import datetime, timedelta

import boto3
from metaflow import FlowSpec, step

class RefreshableCredentials:
    def __init__(self):
        self.credentials = None
        self.expiration = datetime.now()
        self.refresh_threshold = timedelta(minutes=30)

    def refresh_if_needed(self):
        """Refresh credentials if they're about to expire"""
        if datetime.now() > (self.expiration - self.refresh_threshold):
            self.refresh_credentials()

    def refresh_credentials(self):
        """Refresh AWS credentials by re-running the SSO login flow"""
        try:
            # Re-run aws sso login to refresh the token
            subprocess.run(['aws', 'sso', 'login'], check=True)
            # Create a new boto3 session that picks up the fresh token
            session = boto3.Session()
            self.credentials = session.get_credentials()
            # Assume a 1-hour validity; adjust to the session duration
            # configured on your permission set
            self.expiration = datetime.now() + timedelta(hours=1)
        except subprocess.CalledProcessError as e:
            print(f"Failed to refresh credentials: {e}")
            raise

# Usage in a Metaflow flow
class MyFlow(FlowSpec):

    @step
    def start(self):
        self.credentials = RefreshableCredentials()
        self.next(self.process_data)

    @step
    def process_data(self):
        # Refresh credentials before a long operation
        self.credentials.refresh_if_needed()
        # Set environment variables for boto3
        os.environ['AWS_ACCESS_KEY_ID'] = self.credentials.credentials.access_key
        os.environ['AWS_SECRET_ACCESS_KEY'] = self.credentials.credentials.secret_key
        os.environ['AWS_SESSION_TOKEN'] = self.credentials.credentials.token
        # Your S3 operations here
        s3 = boto3.client('s3')
        # ... your code ...
        self.next(self.end)

    @step
    def end(self):
        pass
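Keep in mind that aws sso login is interactive (it opens a browser for re-authentication), so this pattern only suits attended, local runs; for unattended jobs, prefer the IAM role options below.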
Solution 2: IAM Roles for EC2/ECS
If running on AWS infrastructure, use IAM roles instead of SSO credentials:
# For EC2 instances
aws ec2 associate-iam-instance-profile \
--instance-id i-1234567890abcdef0 \
--iam-instance-profile Name=YourIAMRoleName
# For ECS tasks
aws ecs run-task \
--cluster your-cluster \
--task-definition your-task-definition \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={subnets=[subnet-12345678],securityGroups=[sg-123456]}"
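With a role attached this way, the AWS SDKs fetch temporary credentials from the instance or task metadata endpoint and rotate them automatically, so no SSO login (and no manual refresh) is needed.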
Solution 3: Long-Lived Credentials with MFA
For truly long-running jobs, use AWS STS GetSessionToken with MFA; the example below requests 12-hour credentials (the API allows up to 36 hours for IAM users):
import boto3

def get_long_lived_credentials():
    """Get session credentials (12 hours here) using an MFA device"""
    sts_client = boto3.client('sts')

    # Look up the user's first MFA device ARN
    iam_client = boto3.client('iam')
    mfa_devices = iam_client.list_mfa_devices(UserName='your-username')
    mfa_arn = mfa_devices['MFADevices'][0]['SerialNumber']

    # Prompt for the current MFA code
    mfa_code = input("Enter MFA code: ")

    # Get a session token
    response = sts_client.get_session_token(
        DurationSeconds=43200,  # 12 hours
        SerialNumber=mfa_arn,
        TokenCode=mfa_code
    )

    credentials = response['Credentials']
    return {
        'access_key': credentials['AccessKeyId'],
        'secret_key': credentials['SecretAccessKey'],
        'session_token': credentials['SessionToken'],
        'expiration': credentials['Expiration']  # timezone-aware datetime
    }
Metaflow Integration Best Practices
Configuration Approach 1: Environment Variables
Set up Metaflow to use environment variables for AWS credentials:
# In your .bashrc or profile
export AWS_PROFILE=your-sso-profile
export AWS_CONFIG_FILE=~/.aws/config

# Metaflow configuration (run metaflow configure aws for the full setup)
export METAFLOW_DEFAULT_DATASTORE=s3
export METAFLOW_DATASTORE_SYSROOT_S3=s3://your-bucket/metaflow
export METAFLOW_DEFAULT_METADATA=service
export METAFLOW_SERVICE_URL=https://your-metadata-service
Configuration Approach 2: Metaflow AWS Plugin
Use Metaflow's built-in S3 client for tighter integration:
from metaflow import FlowSpec, step, S3

class AWSFlow(FlowSpec):

    @step
    def start(self):
        # Metaflow's S3 client resolves credentials through the standard
        # boto3 chain (profile, environment variables, instance role)
        with S3(s3root='s3://your-bucket/') as s3:
            obj = s3.get('your-file.txt')
            # Your code here
        self.next(self.process)

    @step
    def process(self):
        # Credentials are re-resolved whenever a new S3 client is created,
        # so a refreshed profile or token is picked up between steps
        self.next(self.end)

    @step
    def end(self):
        pass
Configuration Approach 3: Custom Credential Provider
Create a custom credential provider for Metaflow:
import os
import subprocess

import boto3

class SSOCredentialProvider:
    """Resolve fresh credentials from an SSO profile, re-running the
    login flow when the cached token has expired."""

    def __init__(self, profile_name=None):
        self.profile_name = profile_name

    def get_credentials(self):
        try:
            # A new Session re-reads the SSO cache on every call
            session = boto3.Session(profile_name=self.profile_name)
            creds = session.get_credentials()
            # Force resolution now so an expired token fails fast
            return creds.get_frozen_credentials()
        except Exception:
            # Token expired or missing: re-run the interactive login
            subprocess.run(['aws', 'sso', 'login'], check=True)
            session = boto3.Session(profile_name=self.profile_name)
            return session.get_credentials().get_frozen_credentials()

# Use before launching Metaflow
provider = SSOCredentialProvider()
creds = provider.get_credentials()
os.environ['AWS_ACCESS_KEY_ID'] = creds.access_key
os.environ['AWS_SECRET_ACCESS_KEY'] = creds.secret_key
os.environ['AWS_SESSION_TOKEN'] = creds.token
Manual Token Management
When to Delete Cache Files
Deleting cache files forces a fresh browser login the next time you run aws sso login. The following commands clear or inspect the cache:
# Delete all SSO cache files
rm -rf ~/.aws/sso/cache/*
# Delete specific expired tokens
find ~/.aws/sso/cache -name "*.json" -mtime +1 -delete
# Check token expiration dates
ls -la ~/.aws/sso/cache/
cat ~/.aws/sso/cache/*.json | jq '.expiresAt'
Automated Cache Cleanup
Create a cron job to clean up old tokens:
# Edit crontab
crontab -e
# Add this line to clean tokens older than 24 hours
0 2 * * * find ~/.aws/sso/cache -name "*.json" -mtime +1 -delete >/dev/null 2>&1
However, manually deleting tokens is not the recommended solution for your use case. It’s a temporary fix that requires manual intervention and doesn’t solve the underlying problem of long-running jobs exceeding token expiration.
Security Considerations
Risk of Token Storage
Be aware that AWS SSO cache files are stored in plain text, which poses security risks:
# Check file permissions
ls -la ~/.aws/sso/cache/
# Should be: -rw------- (600 permissions); tighten them if not
chmod 600 ~/.aws/sso/cache/*.json
Best Practices
- Use short credential durations - the default 1-hour role-credential session is a good balance between safety and convenience
- Enable MFA - always require multi-factor authentication for SSO access
- Use IAM roles - favor instance roles over long-lived credentials
- Monitor token usage - regularly review token expiration and sign-in logs
- Implement credential rotation - automatically refresh credentials before expiration (see the sketch after this list)
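For the last point, botocore ships a RefreshableCredentials helper that invokes a refresh function whenever credentials near expiry. The sketch below wires it to an SSO profile; the profile name your-sso-profile and the assumed 1-hour validity window are placeholders to adapt to your setup.

import subprocess
from datetime import datetime, timedelta, timezone

import boto3
from botocore.credentials import RefreshableCredentials
from botocore.session import get_session

def _fetch_sso_credentials():
    """Re-resolve credentials from the SSO profile, re-logging-in if needed"""
    try:
        session = boto3.Session(profile_name='your-sso-profile')
        creds = session.get_credentials().get_frozen_credentials()
    except Exception:
        # Token likely expired: re-run the interactive login, then retry
        subprocess.run(['aws', 'sso', 'login', '--profile', 'your-sso-profile'],
                       check=True)
        session = boto3.Session(profile_name='your-sso-profile')
        creds = session.get_credentials().get_frozen_credentials()
    # Assume a 1-hour validity window; adjust to your permission set's
    # configured session duration
    expiry = datetime.now(timezone.utc) + timedelta(hours=1)
    return {
        'access_key': creds.access_key,
        'secret_key': creds.secret_key,
        'token': creds.token,
        'expiry_time': expiry.isoformat(),
    }

# Build a botocore session whose credentials refresh themselves whenever
# they come within the advisory refresh window before expiry
refreshable = RefreshableCredentials.create_from_metadata(
    metadata=_fetch_sso_credentials(),
    refresh_using=_fetch_sso_credentials,
    method='sso-refresh',
)
botocore_session = get_session()
botocore_session._credentials = refreshable  # private attr, common pattern
s3 = boto3.Session(botocore_session=botocore_session).client('s3')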
Alternative Approaches
For truly long-running jobs, consider these alternatives:
- Use AWS IAM Roles - Assign roles to EC2 instances, ECS tasks, or Lambda functions
- Use AWS STS AssumeRole - Get temporary credentials with custom durations (see the sketch after this list)
- Use OIDC Federation - Connect GitHub Actions or other CI/CD systems directly to AWS
- Use AWS Credentials File - Generate long-lived keys with proper security measures
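As an illustration of the AssumeRole option, here is a minimal boto3 sketch; the role ARN and session name are placeholders, and DurationSeconds must not exceed the role's configured maximum session duration (which can be raised to 12 hours):

import boto3

sts = boto3.client('sts')
resp = sts.assume_role(
    RoleArn='arn:aws:iam::123456789012:role/YourJobRole',
    RoleSessionName='long-running-job',
    DurationSeconds=43200,  # 12 hours; must not exceed the role's maximum
)
creds = resp['Credentials']

# Build an S3 client from the assumed-role credentials
s3 = boto3.client(
    's3',
    aws_access_key_id=creds['AccessKeyId'],
    aws_secret_access_key=creds['SecretAccessKey'],
    aws_session_token=creds['SessionToken'],
)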
Conclusion
To prevent AWS SSO token expiration during long-running jobs:
- Implement automatic credential refresh using scripts that check token expiration and run aws sso login when needed
- Use AWS STS with MFA for longer-lived session credentials (12 hours in the examples above, up to 36 hours)
- Leverage IAM roles for infrastructure-based authentication when possible
- Integrate credential refresh directly into Metaflow flows using environment variable management
- Avoid manual token deletion as it’s not scalable and requires constant monitoring
The most sustainable solution is to implement programmatic credential refresh in your Metaflow workflows, ensuring seamless access to AWS resources without interruption. For production environments, consider using IAM roles or STS AssumeRole for better security and easier management.