Performance Evaluation of Serverless Applications and Infrastructures


Context. Cloud computing has become the de facto standard for deploying modern web-based software systems, which makes its performance crucial to the efficient functioning of many applications. However, the unabated growth of established cloud services, such as Infrastructure-as-a-Service (IaaS), and the emergence of new serverless services, such as Function-as-a-Service (FaaS), has led to an unprecedented diversity of cloud services with different performance characteristics. Measuring these characteristics is difficult in dynamic cloud environments due to performance variability in large-scale distributed systems with limited observability.
Objective. This thesis aims to enable reproducible performance evaluation of serverless applications and their underlying cloud infrastructure.
Method. A combination of literature review and empirical research established a consolidated view on serverless applications and their performance. New solutions were developed through engineering research and used to conduct performance benchmarking field experiments in cloud environments.
Findings. The review of 112 FaaS performance studies from academic and industrial sources found a strong focus on a single cloud platform using artificial micro-benchmarks and discovered that most studies do not follow reproducibility principles for cloud experimentation. Characterizing 89 serverless applications revealed that they are most commonly used for short-running tasks with low data volume and bursty workloads. A novel trace-based serverless application benchmark shows that external service calls often dominate the median end-to-end latency and cause long tail latency. The latency breakdown analysis further identifies performance challenges of serverless applications, such as long delays introduced by asynchronous function triggers, substantial runtime initialization for cold starts, increased performance variability under bursty workloads, and heavily provider-dependent performance characteristics. The evaluation of different cloud benchmarking methodologies has shown that only selected micro-benchmarks are suitable for estimating application performance, that performance variability depends on the resource type, and that batch testing on the same instance with repetitions should be used for reliable performance testing.
Conclusions. The insights of this thesis can guide practitioners in building performance-optimized serverless applications and researchers in reproducibly evaluating cloud performance using suitable execution methodologies and different benchmark types.

Chalmers University of Technology
Thesis Goal: My PhD thesis aims to enable reproducible performance evaluation of serverless applications and their underlying cloud infrastructure.

Public Defense

My public PhD defense takes place in a hybrid format onsite at Chalmers and online via Zoom (PDF Invitation).

Date and Time: Thursday September 8th, 2022, 13:00 (CEST)

Opponent: Prof. Petr Tůma, Charles University Prague, Czech Republic

Committee Members:

Room Jupiter 243 (Chalmers Map), Chalmers Campus Lindholmen
Hörselgången 5 (Google Maps) 417 56 Gothenburg, Sweden

Zoom Link: Available at Chalmers Research.

We kindly ask you to join the meeting well before it starts (~5 min) so we can admit you promptly. The chairperson leads the meeting and announces when questions from the audience are welcome.


This PhD thesis was published by Chalmers University of Technology at Chalmers Research.

PhD Thesis Frontcover

Thesis Synopsis: The published PDF contains only the synopsis. The full texts of the included papers are linked below where published. Feel free to contact me for a personal copy of the full thesis containing all papers; unfortunately, I cannot publish the full thesis due to potential copyright issues.

Included Papers

[𝛂] Function-as-a-Service Performance Evaluation

Function-as-a-Service Performance Evaluation: A Multivocal Literature Review

This JSS'20 journal paper describes a multivocal literature review (MLR) covering 112 performance studies of Function-as-a-Service (FaaS) platforms. It consolidates the results from 61 industrial and 51 academic performance studies and provides actionable recommendations on reproducible FaaS experimentation. The study concludes that future work needs to go beyond over-simplified micro-benchmarks and focus on more realistic application-level benchmarks and workloads.

[𝛃] Serverless Application Characteristics

The State of Serverless Applications: Collection, Characterization, and Community Consensus

This TSE'21 journal paper (extending the IEEE Software article and technical report) studies the state of serverless applications. It contributes the largest collection of 89 serverless applications to date, systematically characterizes these applications along 16 characteristics, and presents a meta-study across 10 related studies towards building a community consensus about typical serverless applications.

[𝛄] Serverless Application Benchmark

Let’s Trace It: Fine-Grained Serverless Benchmarking using Synchronous and Asynchronous Applications (preprint)

This contribution, under submission at a journal, proposes a comprehensive application-level benchmark suite, designs novel algorithms for fine-grained latency breakdown analysis based on distributed tracing, conducts a large-scale empirical performance study, and releases a FAIR replication package of the software, data, and results. It addresses research gaps identified in Paper 𝛂 by presenting solutions that build on the insights from Paper 𝛃. The results show that the median end-to-end latency of serverless applications is often dominated not by function computation but by external service calls, orchestration, or trigger-based coordination.
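To illustrate the general idea of a trace-based latency breakdown, the following sketch attributes end-to-end latency to external service calls versus everything else. The span structure and field names are illustrative assumptions, not the actual trace model or algorithms from the paper.

```python
# Hypothetical latency breakdown over spans of a distributed trace.
# Each span is a dict with 'kind', 'start', and 'end' (milliseconds).

def latency_breakdown(spans):
    """Split end-to-end latency into external-call time and the rest."""
    root = next(s for s in spans if s["kind"] == "root")
    e2e = root["end"] - root["start"]
    external = sum(s["end"] - s["start"] for s in spans if s["kind"] == "external")
    return {"end_to_end": e2e, "external": external, "other": e2e - external}

trace = [
    {"kind": "root", "start": 0, "end": 120},       # whole request
    {"kind": "external", "start": 20, "end": 80},   # e.g., a database call
    {"kind": "external", "start": 90, "end": 110},  # e.g., a storage call
]
print(latency_breakdown(trace))  # external time (80 ms) dominates the 120 ms total
```

A real breakdown must also handle overlapping and asynchronous spans, which is precisely where fine-grained tracing algorithms become non-trivial.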

[𝛅] Cross-provider Application Benchmarking

CrossFit: Fine-grained Benchmarking of Serverless Application Performance across Cloud Providers (preprint)

This contribution, under submission at a conference, presents an approach for detailed and fair cross-provider performance benchmarking of serverless applications based on a provider-independent tracing model. Further, an empirical study demonstrates how detailed distributed tracing enables drill-down analysis to explain performance differences between two leading cloud providers. It addresses research gaps identified in Paper 𝛂 and refines a specific application scenario from Paper 𝛄. The results for an asynchronous application reveal extensive trigger delays and show how increasing and bursty workloads affect performance stability, median latency, and tail latency.

[𝛆] Serverless Function Trigger Benchmark

TriggerBench: A Performance Benchmark for Serverless Function Triggers

This contribution, accepted at IC2E'22 as a short paper, quantifies the effect of serverless function triggers on trigger latency. Trigger latency is the delay to transition between an invoker and a receiver function given a specific trigger type. It addresses a gap that was identified in Paper 𝛂 and raised as a performance problem in Papers 𝛄 and 𝛅.
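Given the definition above, trigger latency can be sketched as the gap between the invoker finishing its trigger call and the receiver starting, aggregated over many invocations. The timestamp pairs and sample values below are illustrative assumptions, not TriggerBench's actual measurement methodology.

```python
import statistics

def trigger_latencies(pairs):
    """pairs: (invoker_end, receiver_start) timestamps in ms on a common clock."""
    return [receiver_start - invoker_end for invoker_end, receiver_start in pairs]

# Four hypothetical invocations of a queue-based trigger; one tail outlier.
samples = [(0, 130), (0, 95), (0, 110), (0, 480)]
lat = trigger_latencies(samples)
print(statistics.median(lat))  # 120.0 ms median; the 480 ms sample forms the tail
```

In practice, clock synchronization between the invoker and receiver is the hard part of such a measurement, since both functions run on separate hosts.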

[ζ] Cloud Benchmark Suite

A Cloud Benchmark Suite Combining Micro and Applications Benchmarks

This QUDOS'18 workshop paper presents a new execution methodology that combines micro- and application benchmarks into an integrated benchmark suite for IaaS clouds and reports results on cost-performance tradeoffs, performance stability, and resource utilization.

[η] Cloud Application Performance Estimation

Estimating Cloud Application Performance Based on Micro-Benchmark Profiling

This CLOUD'18 conference paper develops a cloud benchmarking methodology that uses micro-benchmarks to profile applications and subsequently predicts how an application performs on a wide range of cloud services. A study with a leading cloud provider quantitatively evaluated the estimation model with 38 metrics from 23 micro-benchmarks and 2 applications from different domains. It builds upon the benchmark suite from Paper ζ and highlights the connection between micro- and application benchmarks discussed in Paper 𝛂.

[θ] Software Microbenchmarking in the Cloud

Software microbenchmarking in the cloud. How bad is it really?

This EMSE'19 journal paper quantifies the effects of cloud environments on the variability of software performance test results and to what extent slowdowns can still be reliably detected even in a public cloud. It presents large-scale experiments across multiple providers, programming languages, software microbenchmarks, instance types, and execution methods that reveal substantial differences in variability between benchmarks and instance types. This contribution focuses on reproducibility concerns raised by Paper 𝛂.