Timeouts Configuration

Timeouts control how long Go Overlay waits for post-start scripts, service shutdown, and dependency startup to complete. Well-chosen values allow graceful shutdowns while preventing indefinite hangs.

Timeout Configuration Section

Timeouts are configured in the [timeouts] section of your configuration file:

[timeouts]
post_script_timeout = 30
service_shutdown_timeout = 30
global_shutdown_timeout = 60
dependency_wait_timeout = 300

Timeout Options

post_script_timeout

Type: integer (seconds)
Default: 30

Maximum time to wait for a service's pos_script to complete after the service starts.

[timeouts]
post_script_timeout = 30

When it applies:

  • After a service process starts successfully
  • Only if the service has a pos_script configured

What happens on timeout:

  • The pos_script process is terminated
  • The service continues running
  • A warning is logged

Post-Script Timeout

If your post-scripts perform health checks or critical initialization, ensure this timeout is long enough. A timeout doesn't stop the service itself, only the post-script.

Example use case:

[[services]]
name = "database"
command = "/usr/bin/mysqld"
pos_script = "/scripts/verify-db-ready.sh"  # May take 20 seconds

[timeouts]
post_script_timeout = 30  # Allow enough time for verification
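
As a rough sketch of what a script like /scripts/verify-db-ready.sh might contain, the helper below retries a readiness probe and gives up on its own before post_script_timeout would terminate it. The function name wait_until_ready and the probe passed to it are illustrative assumptions, not part of Go Overlay.

```shell
#!/bin/sh
# Hypothetical helper for a post-script (not provided by Go Overlay).
# Retries a probe command about once per second until it succeeds or the
# attempt budget is used up.
# Usage: wait_until_ready BUDGET COMMAND [ARGS...]
wait_until_ready() {
    budget=$1
    shift
    attempts=0
    while [ "$attempts" -lt "$budget" ]; do
        # A zero exit status from the probe means the service is ready.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts + 1))
        sleep 1
    done
    # Budget exhausted: fail explicitly instead of waiting to be terminated.
    return 1
}
```

A script could end with, for example, `wait_until_ready 25 mysqladmin ping || exit 1`. Keeping the budget below post_script_timeout = 30 lets the script report its own failure rather than being cut off mid-check.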

service_shutdown_timeout

Type: integer (seconds)
Default: 30

Maximum time to wait for an individual service to shut down gracefully after receiving SIGTERM.

[timeouts]
service_shutdown_timeout = 30

When it applies:

  • During system shutdown
  • When stopping a service via CLI
  • When a required service fails

What happens on timeout:

  • SIGKILL is sent to force termination
  • Service is marked as stopped
  • Shutdown process continues

Forced Termination

After this timeout expires, the service is forcefully killed with SIGKILL. This may result in data loss or corruption for services that need time to flush buffers or complete transactions.

Example use case:

[[services]]
name = "database"
command = "/usr/bin/postgres"
# Database needs time to flush writes and close connections

[timeouts]
service_shutdown_timeout = 60  # Give the database more time to shut down cleanly
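
The SIGTERM-then-SIGKILL escalation described above can be sketched for a single process as follows. This is an illustration of the general pattern under stated assumptions, not Go Overlay's implementation; the function name stop_with_timeout is hypothetical.

```shell
#!/bin/sh
# Illustrative sketch only; not Go Overlay's actual implementation.
# Usage: stop_with_timeout PID TIMEOUT_SECONDS
# Returns 0 if the process exited within the grace period (or was already
# gone), 1 if it had to be killed with SIGKILL.
stop_with_timeout() {
    pid=$1
    timeout=$2
    waited=0
    # Request a graceful shutdown; if the signal cannot be delivered,
    # treat the process as already stopped.
    kill -TERM "$pid" 2>/dev/null || return 0
    while [ "$waited" -lt "$timeout" ]; do
        # kill -0 delivers no signal; it only tests that the PID exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        waited=$((waited + 1))
    done
    # One final check at the deadline before forcing termination.
    kill -0 "$pid" 2>/dev/null || return 0
    kill -KILL "$pid" 2>/dev/null
    return 1
}
```

Calling `stop_with_timeout "$pid" 60` corresponds to service_shutdown_timeout = 60. A return value of 1 is the case to avoid for databases, since it means the process was killed before it finished flushing.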

global_shutdown_timeout

Type: integer (seconds)
Default: 60

Maximum time for the entire system to complete shutdown of all services.

[timeouts]
global_shutdown_timeout = 60

When it applies:

  • During system shutdown (SIGTERM/SIGINT to Go Overlay)
  • When a required service fails
  • When shutdown is initiated via CLI

What happens on timeout:

  • All remaining services are forcefully killed with SIGKILL
  • Go Overlay exits immediately
  • May result in ungraceful termination

System-Wide Timeout

This timeout applies to the entire shutdown sequence. If you have many services or services with long shutdown times, increase this value accordingly.

Calculation guideline (a conservative worst case that assumes services stop one after another):

global_shutdown_timeout >= (number_of_services * service_shutdown_timeout) + buffer

Example use case:

# System with 5 services, each needing up to 30 seconds
[timeouts]
service_shutdown_timeout = 30
global_shutdown_timeout = 180  # 5 * 30 + 30 second buffer
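
The guideline is simple arithmetic, and a small helper can make it easy to re-check when services are added. The function name below is hypothetical and only restates the formula above.

```shell
#!/bin/sh
# Hypothetical helper: computes the guideline value
#   number_of_services * service_shutdown_timeout + buffer
# Usage: min_global_shutdown_timeout SERVICES SERVICE_TIMEOUT BUFFER
min_global_shutdown_timeout() {
    echo $(( $1 * $2 + $3 ))
}
```

For the example above, `min_global_shutdown_timeout 5 30 30` prints 180.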

dependency_wait_timeout

Type: integer (seconds)
Default: 300

Maximum time to wait for service dependencies to become ready during startup.

[timeouts]
dependency_wait_timeout = 300

When it applies:

  • During system startup
  • When a service has depends_on configured
  • While waiting for dependent services to reach the RUNNING state

What happens on timeout:

  • The waiting service fails to start
  • An error is logged
  • The system may shut down if the service is marked as required

Dependency Chains

For complex dependency chains, ensure this timeout accounts for the cumulative startup time of all dependencies plus their wait_after delays.

Example use case:

[[services]]
name = "database"
command = "/usr/bin/postgres"
wait_after = 10  # Needs 10 seconds to initialize

[[services]]
name = "app"
command = "/app/server"
depends_on = ["database"]

[timeouts]
dependency_wait_timeout = 60  # Allow time for database startup + wait_after
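
To size dependency_wait_timeout for a chain, one approach is to add up an estimated startup time and the wait_after delay for each dependency, plus a safety margin. The helper below is a sketch of that sum; its name, the STARTUP:WAIT_AFTER argument format, and the startup estimates fed to it are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: sums estimated startup time and wait_after for each
# dependency in a chain, then adds a safety margin. All values in seconds.
# Usage: dependency_wait_budget MARGIN STARTUP:WAIT_AFTER [STARTUP:WAIT_AFTER...]
dependency_wait_budget() {
    total=$1
    shift
    for pair in "$@"; do
        startup=${pair%%:*}      # text before the colon
        wait_after=${pair##*:}   # text after the colon
        total=$((total + startup + wait_after))
    done
    echo "$total"
}
```

Assuming the database above takes about 20 seconds to start, `dependency_wait_budget 30 20:10` prints 60, which matches the value used in the example.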

Timeout Reference Table

Timeout                   Default  Unit     Applies To                   On Timeout
------------------------  -------  -------  ---------------------------  ------------------------------------
post_script_timeout       30       seconds  Post-start scripts           Script terminated, service continues
service_shutdown_timeout  30       seconds  Individual service shutdown  SIGKILL sent to service
global_shutdown_timeout   60       seconds  Complete system shutdown     All services killed with SIGKILL
dependency_wait_timeout   300      seconds  Waiting for dependencies     Service fails to start

Complete Configuration Example

[timeouts]
# Post-script execution timeout
# Increase if health checks or verification scripts take longer
post_script_timeout = 45

# Individual service shutdown timeout
# Increase for databases or services that need time to flush data
service_shutdown_timeout = 60

# Global system shutdown timeout
# Should be larger than service_shutdown_timeout * number_of_services
global_shutdown_timeout = 180

# Dependency wait timeout
# Increase for complex dependency chains or slow-starting services
dependency_wait_timeout = 300

Timeout Scenarios

Scenario 1: Fast Microservices

For lightweight services that start and stop quickly:

[timeouts]
post_script_timeout = 10
service_shutdown_timeout = 15
global_shutdown_timeout = 30
dependency_wait_timeout = 60

Scenario 2: Database-Heavy Stack

For systems with databases that need time for graceful shutdown:

[timeouts]
post_script_timeout = 30
service_shutdown_timeout = 90  # Databases need time to flush
global_shutdown_timeout = 300  # Multiple services with long shutdowns
dependency_wait_timeout = 180  # Database initialization can be slow

Scenario 3: Complex Dependency Chain

For systems with many interdependent services:

[timeouts]
post_script_timeout = 30
service_shutdown_timeout = 30
global_shutdown_timeout = 120
dependency_wait_timeout = 600  # Long chain of dependencies

Best Practices

1. Set Realistic Timeouts

Don't set timeouts too short. Consider:

  • Service initialization time
  • Data flush/commit operations
  • Network operations
  • Dependency chains

2. Test Shutdown Behavior

Verify your timeouts work correctly:

# Start the system
go-overlay

# In another terminal, trigger shutdown
kill -TERM $(pidof go-overlay)

# Monitor logs to ensure graceful shutdown

3. Monitor Timeout Events

Watch logs for timeout warnings:

  • Frequent timeouts indicate values are too low
  • Services being killed with SIGKILL suggest insufficient shutdown time

4. Account for Dependencies

When using depends_on, ensure dependency_wait_timeout accounts for:

  • Startup time of all dependencies
  • Any wait_after delays
  • Service initialization time

5. Balance Safety and Speed

  • Too short: Risk of data loss, corruption, or incomplete operations
  • Too long: Slow shutdown, delayed restarts, poor user experience

Timeout Interactions

Startup Sequence

1. Service with dependencies starts
2. Wait for dependencies (dependency_wait_timeout)
3. Start service process
4. Run pos_script if configured (post_script_timeout)
5. Service marked as RUNNING

Shutdown Sequence

1. Shutdown initiated
2. Send SIGTERM to all services
3. Wait for each service (service_shutdown_timeout)
4. If a service times out: send SIGKILL to it
5. All services must stop within global_shutdown_timeout
6. If the global timeout expires: force kill all remaining services
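
One way to read the interaction between the two shutdown timeouts: a service gets its full service_shutdown_timeout only while the global budget lasts. The helper below models that reading as arithmetic. It is an illustrative model under that assumption, not Go Overlay's implementation, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical model of how much grace time a service still has.
# Prints the smaller of SERVICE_TIMEOUT and the time left in the global
# budget, and never less than 0. All values in seconds.
# Usage: effective_wait SERVICE_TIMEOUT GLOBAL_TIMEOUT ELAPSED
effective_wait() {
    remaining=$(( $2 - $3 ))
    if [ "$remaining" -le 0 ]; then
        echo 0
    elif [ "$remaining" -lt "$1" ]; then
        echo "$remaining"
    else
        echo "$1"
    fi
}
```

With the defaults (30 and 60), `effective_wait 30 60 45` prints 15: a service that only begins stopping 45 seconds into shutdown has 15 seconds before the global timeout forces a kill.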

Troubleshooting

Services Being Killed During Shutdown

Symptom: Services are forcefully terminated with SIGKILL

Solution: Increase service_shutdown_timeout

[timeouts]
service_shutdown_timeout = 60  # Increased from 30

System Shutdown Takes Too Long

Symptom: Shutdown hangs or takes excessive time

Solution: Decrease timeouts or investigate why services aren't stopping

[timeouts]
service_shutdown_timeout = 20  # Reduced from 30
global_shutdown_timeout = 60   # Reduced from 120

Services Fail to Start Due to Dependencies

Symptom: "dependency wait timeout exceeded" errors

Solution: Increase dependency_wait_timeout or reduce the wait_after values of the dependencies

[timeouts]
dependency_wait_timeout = 600  # Increased from 300

Post-Scripts Being Terminated

Symptom: Post-scripts don't complete, warnings in logs

Solution: Increase post_script_timeout or optimize scripts

[timeouts]
post_script_timeout = 60  # Increased from 30

Advanced Timeout Patterns

Progressive Timeout Strategy

Size the shared timeouts for the critical services that need the longest to stop:

[timeouts]
service_shutdown_timeout = 30  # Default for most services
global_shutdown_timeout = 120  # Allow critical services extra time

[[services]]
name = "cache"
command = "/usr/bin/redis-server"
required = false  # Non-critical: can tolerate a forced kill

[[services]]
name = "database"
command = "/usr/bin/postgres"
required = true  # Critical: needs the full shutdown window

Development vs Production

Use different timeout configurations:

Development (fast iteration):

[timeouts]
post_script_timeout = 10
service_shutdown_timeout = 15
global_shutdown_timeout = 30
dependency_wait_timeout = 60

Production (data safety):

[timeouts]
post_script_timeout = 30
service_shutdown_timeout = 90
global_shutdown_timeout = 300
dependency_wait_timeout = 300

Next Steps