Function-as-a-Service (FaaS) is one form of the serverless cloud computing paradigm in which FaaS platforms (e.g., AWS Lambda) execute event-triggered code snippets (i.e., functions). Many studies that empirically evaluate the performance of such FaaS platforms have started to appear, but a comprehensive understanding of the overall domain is still lacking. In this work, we survey existing research on FaaS performance evaluation and present results from a multivocal literature review (MLR) covering 112 studies from academic (51) and grey (61) literature. We find that existing work focuses heavily on AWS Lambda and predominantly uses micro-benchmarks with simple functions to measure CPU speed and FaaS platform overhead (i.e., container cold starts). Further, we identify a mismatch between academic and industrial sources regarding the platform configurations tested, conclude that function triggers remain insufficiently studied, and find that HTTP API gateways and cloud storage are the most commonly used external service integrations. Applying existing guidelines on experimentation in cloud systems, we discover many flaws that threaten the reproducibility of the experiments presented in the surveyed studies. We conclude with a discussion of gaps in the literature and highlight methodological suggestions that may serve to improve future FaaS performance evaluation studies.