|Pivotal Cloud Foundry|All versions|
|Azure Networking|Versions with a Public IP|
|Spring Cloud Services||
The Azure networking endpoints disconnect active TCP connections that are idle for over 4 minutes.
Even after changing the Elastic Runtime Network setting Router Timeout to Backends (in seconds) to a value under the 4-minute default for Azure networking, the Spring Cloud Services (SCS) smoke tests still fail.
The HAProxy and Gorouter use the same top-level manifest property:
properties:
  request_timeout_in_seconds: 900
The Elastic Runtime tile enables configuration of this property, so it's expected to be applied to both HAProxy and Gorouter. This setting should still be under 4 minutes (240 seconds) for Azure deployments. Our testing uses a value of 160 seconds.
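The 240-second rule above can be checked before a value is entered in the tile. The following is a minimal sketch, not part of the product: the function name `router_timeout_is_safe` and the variable `AZURE_IDLE_LIMIT_SECONDS` are hypothetical names introduced for illustration. It only compares a candidate timeout against the 4-minute Azure default described in this article.

```shell
#!/bin/bash
# Azure closes idle connections on public IP endpoints after 4 minutes (240 s) by default.
AZURE_IDLE_LIMIT_SECONDS=240

# Hypothetical helper: succeeds (exit 0) only if the given timeout is a
# positive integer strictly below the Azure idle limit.
router_timeout_is_safe() {
  local timeout="$1"
  # Reject empty or non-numeric input.
  case "$timeout" in
    ''|*[!0-9]*) return 2 ;;
  esac
  [ "$timeout" -gt 0 ] && [ "$timeout" -lt "$AZURE_IDLE_LIMIT_SECONDS" ]
}

# Check the tested value (160) and the manifest example value (900).
for value in 160 900; do
  if router_timeout_is_safe "$value"; then
    echo "$value: below the Azure idle limit"
  else
    echo "$value: not below the Azure idle limit"
  fi
done
```

With the two values used in this article, 160 passes the check and 900 does not.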
The investigation began with intermittent Spring Cloud Services (SCS) connection failures during smoke test runs. The problem is even more likely to occur when real users create and update service instances, because the SCS broker is likely to be idle for more than 4 minutes between user requests. It was first identified when the SCS service broker logs contained the message “Timed out waiting for connection.”
After extensive investigation involving SCS, Ecosystem, Diego, BOSH, Garden, and Microsoft Azure team members, the underlying issue was identified: any resource with a public IP endpoint on Azure, such as an Azure Load Balancer (ALB), has a default idle connection timeout of 4 minutes. When Azure detects that a connection has been idle for more than 4 minutes, it closes the connection without sending a TCP RESET to inform the client side that the connection has been closed.
The problem lies in the timeout for connections from the Router (and HAProxy, if you use it) to applications and system components. This timeout can be increased to accommodate larger uploads over connections with high latency. To work around the Azure limit, set the idle timeout for all Azure public IPs to 30 minutes.
Create a script file that runs the following commands to work around this problem:
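#!/bin/bash
# Set the idle timeout to 30 minutes on every public IP in the resource group given as $1.
for pip in $( azure network public-ip list -g "$1" | grep 'data' | tail -n +3 | cut -f 5 -d ' ' )
do
  azure network public-ip set -g "$1" -n "$pip" -i 30
done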
Running the script looks like:
./<arbitrary script name>.sh <resource group name>
For example, run this with: