Pivotal Knowledge Base

Azure Networking Connection idle for more than 4 minutes

Environment

 Product                  Version
 Pivotal Cloud Foundry    All versions
 Azure Networking         Versions with a Public IP
 Spring Cloud Services

Symptom

Azure networking endpoints disconnect established TCP connections that have been idle for more than 4 minutes.

Even after the Elastic Runtime networking setting Router Timeout to Backends (in seconds) is changed to XXX seconds, a value under the 4-minute default for Azure networking, the Spring Cloud Services (SCS) smoke tests still fail.

Background

HAProxy and Gorouter use the same top-level manifest property:

properties:
  request_timeout_in_seconds: 900

The Elastic Runtime tile exposes this property, so the configured value is expected to apply to both HAProxy and Gorouter. For Azure deployments, keep this setting under 4 minutes (240 seconds). Our testing uses a value of 160 seconds.
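
The constraint above can be expressed as a quick sanity check to run before applying a value. This is only a minimal sketch: the function name check_router_timeout and the variable AZURE_IDLE_LIMIT_SECONDS are hypothetical, and the 240-second limit is the Azure default described in this article.

```shell
#!/bin/bash
# Hypothetical helper (not part of any Pivotal or Azure tooling).
# Succeeds only if the given router timeout, in seconds, is a whole number
# between 1 and 239, i.e. below Azure's default 4-minute idle timeout.
AZURE_IDLE_LIMIT_SECONDS=240

check_router_timeout() {
  local timeout="$1"
  # Reject empty input and anything containing a non-digit character.
  case "$timeout" in
    ''|*[!0-9]*)
      echo "invalid: '$timeout' is not a whole number of seconds"
      return 2
      ;;
  esac
  if [ "$timeout" -gt 0 ] && [ "$timeout" -lt "$AZURE_IDLE_LIMIT_SECONDS" ]; then
    echo "ok: $timeout is under $AZURE_IDLE_LIMIT_SECONDS seconds"
  else
    echo "out of range: $timeout must be between 1 and $((AZURE_IDLE_LIMIT_SECONDS - 1))"
    return 1
  fi
}
```

For example, check_router_timeout 160 succeeds, while check_router_timeout 900 (the manifest value shown above) returns a non-zero status.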

The problem that originally led to the investigation was intermittent Spring Cloud Services (SCS) connection failures during smoke test runs. The problem is even more likely to occur when real users are creating and updating service instances, because the SCS broker is likely to be idle for more than 4 minutes between user requests. It was first identified when the SCS service broker logs showed the message “Timed out waiting for connection.”

Cause

After extensive investigation involving SCS, Ecosystem, Diego, BOSH, Garden, and Microsoft Azure team members, the underlying issue was identified: any resource with a public IP endpoint on Azure, such as an Azure Load Balancer (ALB), has a default idle connection timeout of 4 minutes. When Azure detects that a connection has been idle for more than 4 minutes, it closes the connection without sending a TCP reset (RST) to inform the client side that the connection has been closed.

Workaround

The affected timeout is the one for connections from the Router (and HAProxy, if you use it) to applications and system components. Increase it to accommodate larger uploads over connections with high latency, and set the idle timeout for all Azure public IPs to 30 minutes.

Create a script file containing the following commands to work around this problem:

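#!/bin/bash
# Usage: <script name>.sh <resource group name>
# Sets the idle timeout of every public IP in the resource group to 30 minutes.
for pip in $( azure network public-ip list -g "$1" | grep 'data' | tail -n +3 | cut -f 5 -d ' ' )
do
  azure network public-ip set -g "$1" -n "$pip" -i 30
done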

Run the script as follows:

./<arbitrary script name>.sh <resource group name>

For example, run this with:

./set-timeouts.sh pcfresourcegroup
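
To preview what the loop would do before changing any resources, a dry-run variant can print each command instead of running it. This is a minimal sketch and not part of the original workaround: the function name set_idle_timeouts, the DRY_RUN variable, and the public IP names pip-a and pip-b are hypothetical. The azure command itself is the same one used in the workaround script.

```shell
#!/bin/bash
# Hypothetical dry-run helper. The first argument is the resource group;
# the remaining arguments are public IP names.
# By default (DRY_RUN unset or 1) it only prints the commands.
# Set DRY_RUN=0 to execute them with the azure CLI.
set_idle_timeouts() {
  local resource_group="$1"
  shift
  local pip
  for pip in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Print the command that would be run for this public IP.
      echo "azure network public-ip set -g $resource_group -n $pip -i 30"
    else
      azure network public-ip set -g "$resource_group" -n "$pip" -i 30
    fi
  done
}
```

For example, set_idle_timeouts pcfresourcegroup pip-a pip-b prints one azure network public-ip set command per name, which can be reviewed before rerunning with DRY_RUN=0.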
