Pivotal Knowledge Base

App logs appear to be out of order in PCF platform

Environment

Product: Elastic Runtime
Version: 1.x

Symptom

For applications that run multiple instances and emit logs at a high rate, some of the message lines drained to a log management service appear out of order.

This usually happens on large Pivotal Cloud Foundry® (PCF) platforms that run multiple Doppler server instances.

Cause

The Loggregator system is designed to scale horizontally to handle the different load profiles of logs and metrics generated by the apps and the system components. As the deployment grows, the logging load becomes larger than a single chain of virtual machines (VMs) can handle. Loggregator is scaled by adding Dopplers.

The logging load coming from different VMs varies greatly. These VMs range from DEA and Diego app runners to system components such as the Router and the Cloud Controller. To distribute the load evenly over the scaled Dopplers, each Metron spreads its log stream across them: for each log line (message), the Metron chooses a random Doppler to send it to. As a result, consecutive log messages from the same application can be routed through different Dopplers over different network paths. Because the paths differ, log lines take varying amounts of time to reach the destination nozzle or syslog, so they are not guaranteed to arrive in order.
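
The effect of per-message random routing can be illustrated with a small simulation. This is only a sketch, not Loggregator code: the function names, the fixed per-Doppler latencies, and the emit interval are all hypothetical values chosen for illustration. It shows that a single path preserves order, while random selection among paths with different latencies does not.

```python
import random


def simulate_delivery(num_messages, doppler_latencies_ms, emit_interval_ms=1.0, seed=0):
    """Return message sequence numbers in the order they would arrive.

    Each message is emitted emit_interval_ms after the previous one and is
    sent through a randomly chosen Doppler. Every Doppler is modeled as a
    fixed latency in milliseconds (a simplifying assumption).
    """
    rng = random.Random(seed)
    arrivals = []
    for seq in range(num_messages):
        emitted_at = seq * emit_interval_ms
        # Random per-message choice, as described for the Metron above.
        doppler = rng.randrange(len(doppler_latencies_ms))
        arrived_at = emitted_at + doppler_latencies_ms[doppler]
        arrivals.append((arrived_at, seq))
    # The consumer sees messages in arrival order, not emission order.
    arrivals.sort()
    return [seq for _, seq in arrivals]


def count_out_of_order(sequence):
    """Count positions where a message arrives after a later-emitted one."""
    return sum(1 for prev, cur in zip(sequence, sequence[1:]) if cur < prev)


if __name__ == "__main__":
    single = simulate_delivery(200, [5.0])
    scaled = simulate_delivery(200, [2.0, 9.0, 20.0])
    print("one Doppler, inversions:", count_out_of_order(single))
    print("three Dopplers, inversions:", count_out_of_order(scaled))
```

In this model, reordering appears whenever the latency difference between two paths exceeds the gap between consecutive messages, which matches the symptom being most visible for applications that log at a high rate.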

For details on the architecture of the Loggregator, its components and how the system works, start here. The GitHub repo for it has further information.

Resolution

Fortunately, the messages coming from Loggregator have a format described here. Most public log management systems can detect fields such as the timestamp in a message line and sort on them. Some examples:

  • Loggly, which applies a prioritized detection logic to find the timestamp contained in a message line.
  • Sumo Logic, which by default detects the message timestamp automatically and converts it to epoch time. It can also be given a manually specified timestamp format.

Other systems have slightly different mechanisms for sorting messages, which can be used instead of relying on the order in which Loggregator delivers them.
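
If the log management system in use cannot do this, the same idea can be applied in a small script. The following is a minimal sketch that assumes each drained line contains an ISO 8601 / RFC 3339 timestamp somewhere in it. The sample lines, the regular expression, and the function names are hypothetical and should be adapted to the format actually received.

```python
import re
from datetime import datetime

# Matches timestamps such as 2016-03-01T10:00:00.100000+00:00 or 2016-03-01T10:00:00Z.
TIMESTAMP_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,6}))?(Z|[+-]\d{2}:\d{2})"
)


def extract_timestamp(line):
    """Return the first timestamp found in a log line as an aware datetime."""
    match = TIMESTAMP_RE.search(line)
    if match is None:
        raise ValueError("no timestamp found in line: %r" % line)
    base, fraction, offset = match.groups()
    # Normalize to a form that datetime.fromisoformat accepts on Python 3.7+.
    fraction = (fraction or "0").ljust(6, "0")
    if offset == "Z":
        offset = "+00:00"
    return datetime.fromisoformat("%s.%s%s" % (base, fraction, offset))


def sort_by_timestamp(lines):
    """Sort lines by embedded timestamp; ties keep their arrival order."""
    return sorted(lines, key=extract_timestamp)


# Hypothetical sample lines, listed in the order they arrived.
SAMPLE_LINES = [
    "2016-03-01T10:00:00.300000+00:00 [APP/0] OUT step 3",
    "2016-03-01T10:00:00.100000+00:00 [APP/0] OUT step 1",
    "2016-03-01T10:00:00.200000+00:00 [APP/1] OUT step 2",
]

if __name__ == "__main__":
    for line in sort_by_timestamp(SAMPLE_LINES):
        print(line)
```

Sorting this way restores emission order only as far as the timestamps allow: lines with identical timestamps stay in arrival order, and lines from different app instances are only comparable if the clocks that stamped them are reasonably synchronized.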
