Container Metrics and Recent Logs Time Out with Numerous Dopplers


Pivotal Cloud Foundry® (PCF) 1.10 and 1.11


Running cf logs <app-name> --recent fails with the following error:

>cf logs admin-portal --recent 
Retrieving logs for app admin-portal in org <org-name> / space <space-name> as admin...

Error dialing trafficcontroller server: Get https://doppler.<system-domain>:443/apps/<app guid>/recentlogs: net/http: request canceled (Client.Timeout exceeded while awaiting headers). 
Please ask your Cloud Foundry Operator to check the platform configuration (trafficcontroller endpoint is wss://doppler.<system-domain>:443). 


In the affected versions, Traffic Controllers request recent logs and container metrics from Dopplers serially, asking only one Doppler at a time for its results. With a large number of Dopplers, the combined wait often takes too long and the CLI times out.


Recommended solution

To resolve this issue, upgrade Pivotal Cloud Foundry Elastic Runtime to version 1.10.33 [2] for PCF 1.10 or version 1.11.19 [3] for PCF 1.11. These releases make the requests to the Dopplers concurrently and add a 5 second timeout to each request.
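The difference between asking Dopplers one at a time and asking them concurrently with a per-request timeout can be illustrated with a small Go sketch. This is not the Loggregator source code: the doppler type, fakeDoppler and queryAll are hypothetical names invented for this example, and the Dopplers are simulated with timers rather than real network calls.

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"sync"
	"time"
)

// doppler is a hypothetical stand-in for one Doppler's recent-logs endpoint.
type doppler func(ctx context.Context) ([]string, error)

// fakeDoppler simulates a Doppler that answers after delayMs milliseconds,
// or gives up early if the request context is cancelled first.
func fakeDoppler(delayMs int, lines ...string) doppler {
	return func(ctx context.Context) ([]string, error) {
		select {
		case <-time.After(time.Duration(delayMs) * time.Millisecond):
			return lines, nil
		case <-ctx.Done():
			return nil, ctx.Err()
		}
	}
}

// queryAll asks every Doppler concurrently, gives each request its own
// timeout, and merges whatever came back in time.
func queryAll(dopplers []doppler, timeoutMs int) []string {
	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		merged []string
	)
	timeout := time.Duration(timeoutMs) * time.Millisecond
	for _, d := range dopplers {
		wg.Add(1)
		go func(d doppler) {
			defer wg.Done()
			ctx, cancel := context.WithTimeout(context.Background(), timeout)
			defer cancel()
			lines, err := d(ctx)
			if err != nil {
				return // a slow or failed Doppler is skipped, not waited on
			}
			mu.Lock()
			merged = append(merged, lines...)
			mu.Unlock()
		}(d)
	}
	wg.Wait()
	sort.Strings(merged) // deterministic order, for the example only
	return merged
}

func main() {
	dopplers := []doppler{
		fakeDoppler(10, "log from doppler 0"),
		fakeDoppler(20, "log from doppler 1"),
		fakeDoppler(3000, "arrives too late"), // slower than the timeout
	}
	start := time.Now()
	// The fixed releases use a 5 second timeout; 500 ms keeps this demo short.
	logs := queryAll(dopplers, 500)
	fmt.Println(logs)
	fmt.Println("elapsed:", time.Since(start).Round(100*time.Millisecond))
}
```

In this sketch the total wait is bounded by the per-request timeout no matter how many Dopplers are queried, whereas a serial loop would wait for the sum of all response times. A Doppler that misses the deadline simply contributes no lines to the merged result.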

Other temporary workarounds

If you can't upgrade, you can mitigate the issue by scaling up the Dopplers and Traffic Controllers (increasing CPU and memory). The problem still exists, but it occurs less frequently because the VMs are under less stress and can respond more quickly.

Additional Information

1. What caused the high memory usage on the Dopplers and Traffic Controllers that we didn't have on PCF 1.9?

Memory usage on the Dopplers and Traffic Controllers is higher in PCF 1.10 because communication from Metron to Doppler switched from UDP to gRPC [0]. gRPC adds the reliability of TCP, which increases load. With UDP, the kernel would silently drop messages and we would never know about it, as is normal for UDP.

[1] https://docs.pivotal.io/pivotalcf/1-11/devguide/deploy-apps/cf-networking.html#create-policies

[2] https://docs.pivotal.io/pivotalcf/1-10/pcf-release-notes/runtime-rn.html#1.10.33

[3] https://docs.pivotal.io/pivotalcf/1-11/pcf-release-notes/runtime-rn.html#1.11.19



