Pivotal Knowledge Base

Azure Networking Connection Idle for more than Four minutes

Environment

  • Pivotal Cloud Foundry all versions
  • Azure Networking Version with a Public IP
  • Spring Cloud Services

Symptom

When a user changes the Elastic Runtime Network setting Router Timeout to Backends (in seconds) to XXX seconds, a value under the 4-minute default for Azure networking, the Spring Cloud Services (SCS) smoke tests still fail.

Background

HAProxy and Gorouter use the same top-level manifest property:

properties:
  request_timeout_in_seconds: 900

The Elastic Runtime tile enables configuration of this property, so it's expected to be applied to both HAProxy and Gorouter. This setting should still be under 4 minutes (240 seconds) for Azure deployments. Our testing uses a value of 160 seconds.
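The constraint above can be sketched as a small check. This is a minimal illustration only: the function name timeout_is_safe and the variable AZURE_IDLE_TIMEOUT_SECONDS are hypothetical and are not part of PCF or the Azure CLI. The limit of 240 seconds assumes the Azure default idle timeout has not been changed.

```shell
#!/bin/bash
# Hypothetical helper, not part of PCF or the Azure CLI.
# Assumes the Azure default idle timeout of 4 minutes is still in effect.
AZURE_IDLE_TIMEOUT_SECONDS=240

# Succeeds (exit status 0) if the given router timeout, in seconds,
# is strictly below the Azure idle timeout.
timeout_is_safe() {
  local router_timeout="$1"
  [ "$router_timeout" -lt "$AZURE_IDLE_TIMEOUT_SECONDS" ]
}

# Compare the tested value (160) and the manifest example value (900).
for value in 160 900; do
  if timeout_is_safe "$value"; then
    echo "$value: below the Azure idle timeout"
  else
    echo "$value: NOT below the Azure idle timeout"
  fi
done
```

A value equal to the limit is treated as unsafe here, since the setting is described as needing to stay under 4 minutes.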

The original problem that led to the investigation was intermittent failures of Spring Cloud Services (SCS) connections during smoke test runs. The problem is even more likely to occur when real users create and update service instances, because the SCS broker is likely to be idle for more than 4 minutes between user requests. It was first identified when the SCS service broker logs showed the message “Timed out waiting for a connection”.

Cause

After extensive investigation involving SCS, Ecosystem, Diego, BOSH, Garden, and Microsoft Azure team members, the underlying issue was found to be that any resource with a public IP endpoint on Azure, such as an Azure Load Balancer (ALB), has a default idle connection timeout of 4 minutes. When Azure detects that a connection has been idle for more than 4 minutes, it closes the connection without sending a TCP RST (reset) to inform the client side that the connection has been closed.

Resolution

The problem is with the timeout for connections from the Router (and HAProxy, if you use it) to applications and system components. Increase this timeout to accommodate larger uploads over connections with high latency. To work around the Azure limit, set the idle timeout for all Azure public IPs to 30 minutes.

Create a script file containing the following commands to work around this problem:

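#!/bin/bash
# Usage: ./<script name>.sh <resource group name>
# Sets the idle timeout of every public IP in the resource group to 30 minutes.
for pip in $(azure network public-ip list -g "$1" | grep 'data' | tail -n +3 | cut -f 5 -d ' ')
do
  azure network public-ip set -g "$1" -n "$pip" -i 30
done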

Run the script as follows:

./<arbitrary script name>.sh <resource group name>

For example, run this with:

./set-timeouts.sh pcfresourcegroup
