Supervisor | Niklas Gollenstede |
Project | |
IBR Group | VSS (Prof. Dietrich) |
Type | Bachelor's thesis |
Status | preliminary |
Problem

In many cases, the memory layout of a program is determined more by happenstance than by the developer or by well-targeted compiler optimizations. Yet research has shown that the memory layout of a program can have a significant impact on its performance. So while developers would generally attribute reproducible performance differences after a source code change to the change in the implementation, it may well be that the changed code size happens to alter the memory layout in a way that makes the program faster or slower.

Background

The impact of memory layout on performance stems from the complexity of modern CPU and memory architectures, especially their multi-layered caches and other optimizations such as branch prediction. If a loop is executed 1000 times, it can make a significant difference whether or not it is aligned to fit into a single cache block. Our current programming abstractions, models, and languages, however, give the developer no means to express explicit decisions about memory layout. Additionally, many of the effects are hardware dependent: what is faster on one CPU may be slower on another.

Tasks

In a first step, this thesis will evaluate the actual impact of different kinds of memory layout changes: first by adding small snippets of local code, and second by adding functions and additional object files to a project. Additionally, changes to the stack layout (e.g., a fixed offset) should be simulated. Using these modifications, the impact on large, performance-critical software, e.g., Python or database software, will be evaluated (minimal sketches of such modifications follow below).

The second goal is to build an automated performance measurement suite that can execute source-available software benchmarks independently of the specific memory layout arbitrarily produced by the build process. It should achieve this by repeatedly applying the aforementioned modifications at random and averaging the results. The suite will need to be evaluated for its efficiency (overhead over the baseline) and its effectiveness (how well it normalizes performance). For this, several versions of the same software, presumably with code changes that happen to produce different layouts, will be compared to the randomized variants, both on average (efficiency) and in the amplitude of their performance progression (effectiveness). It may also make sense to evaluate on different architectures (old x86, new x86, ARM): when normalized, the performance progressions of the different software versions should track each other more closely across architectures.
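One way to apply the "additional code" modification without touching the program's source is to link a separately compiled padding unit into the benchmark, shifting the addresses of all code placed after it. The following is a minimal sketch of such a unit; the file name padding.c, the PAD_BYTES macro, and the x86-specific NOP filler are assumptions for illustration, not part of this proposal:

```c
/* padding.c: hypothetical translation unit the suite links into the
 * benchmark. PAD_BYTES would be chosen (e.g., at random) by a build
 * wrapper, for example:  cc -DPAD_BYTES=4096 -c padding.c  */
#define STR_(x) #x
#define STR(x) STR_(x)

#ifndef PAD_BYTES
#define PAD_BYTES 64
#endif

/* Top-level asm (GCC/Clang extension): emit PAD_BYTES inert bytes into
 * the .text section. 0x90 is the single-byte NOP on x86; other
 * architectures need a different filler byte. */
__asm__(".pushsection .text\n\t"
        ".skip " STR(PAD_BYTES) ", 0x90\n\t"
        ".popsection");
```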
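Changes to the stack layout could be simulated by shifting the benchmark's initial stack frame by a configurable amount. A minimal sketch, assuming the suite renames the program's real entry point to real_main() and passes the offset via a hypothetical STACK_PAD environment variable:

```c
/* shim.c: hypothetical wrapper around the benchmark's entry point. */
#include <stdlib.h>

int real_main(int argc, char **argv);  /* the benchmark's renamed main() */

int main(int argc, char **argv) {
    const char *env = getenv("STACK_PAD");      /* hypothetical knob */
    long pad = env ? strtol(env, NULL, 10) : 0;
    /* A variable-length array shifts everything placed below it on the
     * stack by roughly `pad` bytes. */
    volatile char shift[pad > 0 ? pad : 1];
    for (long i = 0; i < pad; i++)              /* touch it so it is */
        shift[i] = 0;                           /* not optimized out  */
    return real_main(argc, argv);
}
```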
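The measurement suite itself could then be built around a driver loop that re-links the benchmark with a fresh random layout before each run and averages the results. The sketch below assumes a rebuild script ("./rebuild.sh <pad>") and a resulting binary "./bench"; both names are placeholders:

```c
/* driver.c: rough sketch of the randomized measurement loop.
 * Requires POSIX clock_gettime(). */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    const int runs = 30;
    double total = 0.0;
    srand((unsigned)time(NULL));
    for (int i = 0; i < runs; i++) {
        char cmd[64];
        snprintf(cmd, sizeof cmd, "./rebuild.sh %d", rand() % 4096);
        if (system(cmd) != 0) return 1;     /* fresh random layout */

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        if (system("./bench") != 0) return 1;
        clock_gettime(CLOCK_MONOTONIC, &t1);
        total += (double)(t1.tv_sec - t0.tv_sec)
               + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    }
    /* Wall-clock time including process startup; the real suite would
     * use finer-grained, benchmark-internal timers. */
    printf("mean over %d random layouts: %.3f s\n", runs, total / runs);
    return 0;
}
```

Averaging over many such randomized layouts is what should make the reported numbers independent of whichever layout the build process happened to produce.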
Related / Prior Work