Weekly digest (week 27-2018)
High Speed Trains are killing the European railway network (2013). The author shows that the travel time saved is small compared to the increase in ticket prices. In addition, even though trains are a more sustainable alternative to air traffic, the high cost of high speed train tickets has pushed more people to choose cheap but polluting flights over trains. Hence, the overall effect of high speed trains on the environment is negative. Wu Ming 1 from the Wu Ming collective wrote an interesting book about the fights against high speed trains in Val di Susa (in Italian).
Understanding app performance in the wild. Facebook has released Profilo, a high-throughput, mobile-first performance tracing library. This tool allows developers to collect and analyse production performance traces. Profilo is available only for Android, at least for now.
Lessons from Building Observability Tools at Netflix. Netflix relies on several metrics to measure customers’ experience and to improve their product. Simply storing logs does not scale well and the microservice architecture adopted by Netflix introduces new challenges, too. Here is what Netflix did.
- Logs are kept and processed in memory and persisted only when needed. The bigger the system is, the higher storage costs and query times are. See also Mantis.
- Microservices produce several distributed streams of logs. In order to get new insights from logs it is important to enrich traces with contextual information “so that multiple traces can be grouped together across services”.
- Traces are not analysed by eye, of course. Metrics analysis should be automated and alerts should be raised if anomalies are detected. Netflix went beyond trivial threshold alerting and implemented some statistical/machine learning algorithms to analyse metrics trends.
- Log storage and retrieval must be fast. Netflix used different databases (Cassandra, Elasticsearch, Hive) depending on the kinds of queries. For example, Elasticsearch is better for queries on different fields while Cassandra is better for queries by ids.
- Logs and analyses are made available to engineers and users through custom user interfaces. It does not make sense to always show every log, because not everybody is able to understand them.
It would be interesting to know what they do to limit the overhead of log collection on the performance of the whole system.
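To make the difference between threshold alerting and trend-based alerting concrete, here is a minimal Python sketch. It is not Netflix's algorithm, which the article does not spell out; it swaps in a simple rolling z-score, where each new point is compared against the mean and standard deviation of the last few points. The function name, the window size, the threshold and the latency values are all assumptions made for illustration.

```python
from collections import deque
from statistics import mean, stdev


def rolling_zscore_alerts(values, window=5, threshold=3.0):
    """Return indices of points that deviate strongly from the recent trend.

    Each point is compared against the mean and standard deviation of the
    previous `window` points, instead of against a fixed limit.
    """
    history = deque(maxlen=window)
    alerts = []
    for index, value in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # A flat window has sigma == 0; skip it to avoid dividing by zero.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                alerts.append(index)
        history.append(value)
    return alerts


# Hypothetical request latencies in milliseconds, with one spike.
latencies_ms = [100, 102, 98, 101, 99, 100, 103, 250, 101, 100]
print(rolling_zscore_alerts(latencies_ms))  # prints [7]
```

A fixed threshold would need to be tuned by hand for every metric, while this kind of check adapts to whatever the metric has been doing recently. Note that the spike itself enters the window afterwards and inflates the standard deviation for a while, so a real system would probably need something more robust.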
- Living APIs, and the Case for GraphQL. The author says that, despite some interest from the developer community and from some big players (e.g. GitHub), GraphQL isn’t spreading very fast. The reason, according to the author, is that “GraphQL’s biggest problem may be that, although it’s better, it’s not ‘better enough’” with respect to REST. He then makes some in-depth arguments about why GraphQL is better. It would be more interesting to know why it is not better enough.
- React 16 makes some performance metrics available via the browser's User Timing API. This opens a lot of possibilities for profiling components. For example, it could be interesting to see performance tools integrated into our CI processes. Here is a report from Zalando engineers about their real-world experience with performance optimization in React.
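As a rough idea of what the CI integration could look like, here is a small Python sketch. It assumes that a test run has exported the result of `performance.getEntriesByType("measure")` as JSON, and it compares the slowest duration of each measure against a budget. The measure names, the budgets and the function name are hypothetical; only the `name`, `entryType`, `startTime` and `duration` fields come from the User Timing API.

```python
import json

# Hypothetical per-component budgets in milliseconds.
BUDGETS_MS = {"App [mount]": 50.0, "ProductList [update]": 16.0}


def over_budget(entries, budgets):
    """Return (name, worst_duration, budget) for every measure over budget."""
    worst = {}
    for entry in entries:
        # Ignore marks and any other entry type; only measures have a duration.
        if entry.get("entryType") != "measure":
            continue
        name = entry["name"]
        worst[name] = max(worst.get(name, 0.0), entry["duration"])
    return [
        (name, duration, budgets[name])
        for name, duration in sorted(worst.items())
        if name in budgets and duration > budgets[name]
    ]


# Stand-in for a file exported from performance.getEntriesByType("measure").
exported = json.loads("""[
  {"name": "App [mount]", "entryType": "measure", "startTime": 12.0, "duration": 41.5},
  {"name": "ProductList [update]", "entryType": "measure", "startTime": 80.2, "duration": 22.3},
  {"name": "ProductList [update]", "entryType": "measure", "startTime": 140.9, "duration": 9.8}
]""")

violations = over_budget(exported, BUDGETS_MS)
for name, duration, budget in violations:
    print(f"{name}: {duration:.1f} ms exceeds budget of {budget:.1f} ms")
```

A CI step could then exit with a non-zero status whenever `violations` is not empty, so that a slow component fails the build instead of going unnoticed.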