Technische Universität Braunschweig
  • Study & Teaching
    • Beginning your Studies
      • Prospective Students
      • Degree Programmes
      • Application
      • Fit4TU
      • Why Braunschweig?
    • During your Studies
      • Fresher's Hub
      • Term Dates
      • Courses
      • Practical Information
      • Beratungsnavi
      • Additional Qualifications
      • Financing and Costs
      • Special Circumstances
      • Health and Well-being
      • Campus life
    • At the End of your Studies
      • Discontinuation and Credentials Certification
      • After graduation
      • Alumni*ae
    • For Teaching Staff
      • Strategy, Offers and Information
      • Learning Management System Stud.IP
    • Contact
      • Study Service Centre
      • Academic Advice Service
      • Student Office
      • Career Service
  • Research
    • Research Profile
      • Core Research Areas
      • Clusters of Excellence at TU Braunschweig
      • Research Projects
      • Research Centres
      • Professors‘ Research Profiles
    • Early Career Researchers
      • Support in the early stages of an academic career
      • PhD-Students
      • Postdocs
      • Junior research group leaders
      • Junior Professorship and Tenure-Track
      • Habilitation
      • Service Offers for Scientists
    • Research Data & Transparency
      • Transparency in Research
      • Research Data
      • Open Access Strategy
      • Digital Research Announcement
    • Research Funding
      • Research Funding Network
      • Research funding
    • Contact
      • Research Services
      • Academy for Graduates
  • International
    • International Students
      • Why Braunschweig?
      • Degree seeking students
      • Exchange Studies
      • TU Braunschweig Summer School
      • Refugees
      • International Student Support
    • Going Abroad
      • Studying abroad
      • Internships abroad
      • Teaching and research abroad
      • Working abroad
    • International Researchers
      • Welcome Support
      • PhD Studies
      • Service for host institutes
    • Language and intercultural competence training
      • Learning German
      • Learning Foreign Languages
      • Intercultural Communication
    • International Profile
      • Internationalisation
      • International Cooperations
      • Strategic Partnerships
      • International networks
    • International House
      • About us
      • Contact & Office Hours
      • News and Events
      • International Days
      • 5th Student Conference: Internationalisation of Higher Education
      • Newsletter, Podcast & Videos
      • Job Advertisements
  • TU Braunschweig
    • Our Profile
      • Aims & Values
      • Regulations and Guidelines
      • Alliances & Partners
      • The University Development Initiative 2030
      • Foundation University
      • Facts & Figures
      • Our History
    • Career
      • Working at TU Braunschweig
      • Vacancies
    • Economy & Business
      • Entrepreneurship
      • Friends & Supporters
    • General Public
      • Check-in for Students
      • The Student House
      • Access to the University Library
    • Media Services
      • Communications and Press Service
      • Services for media
      • Film and photo permits
      • Advices for scientists
      • Topics and stories
    • Contact
      • General Contact
      • Getting here
  • Organisation
    • Presidency & Administration
      • Executive Board
      • Designated Offices
      • Administration
      • Committees
    • Faculties
      • Carl-Friedrich-Gauß-Fakultät
      • Faculty of Life Sciences
      • Faculty of Architecture, Civil Engineering and Environmental Sciences
      • Faculty of Mechanical Engineering
      • Faculty of Electrical Engineering, Information Technology, Physics
      • Faculty of Humanities and Education
    • Institutes
      • Institutes from A to Z
    • Facilities
      • University Library
      • Gauß-IT-Zentrum
      • Professional and Personnel Development
      • International House
      • The Project House of the TU Braunschweig
      • Transfer Service
      • University Sports Center
      • Facilities from A to Z
    • Equal Opportunity Office
      • Equal Opportunity Office
      • Family
      • Diversity for Students
  • Search
  • Quicklinks
    • People Search
    • Webmail
    • cloud.TU Braunschweig
    • Messenger
    • Cafeteria
    • Courses
    • Stud.IP
    • Library Catalogue
    • IT Services
    • Information Portal (employees)
    • Link Collection
    • DE
    • EN
    • IBR YouTube
    • Facebook
    • Instagram
    • YouTube
    • LinkedIn
    • Mastodon
Menu
  • Organisation
  • Faculties
  • Carl-Friedrich-Gauß-Fakultät
  • Institutes
  • Institute of Operating Systems and Computer Networks
  • Prof. Dr.-Ing. Christian Dietrich
  • Advent(2)
  • Pipes Full of Gravy
Logo IBR
IBR Login
  • Institute of Operating Systems and Computer Networks
    • News
    • About us
      • Whole Team
      • Directions
      • Floor Plan
      • Projects
      • Publications
      • Software
      • News Archive
    • Connected and Mobile Systems
      • Team
      • Courses
      • Theses
      • Projects
      • Publications
      • Software
      • Datasets
    • Reliable System Software
      • Overview
      • Team
      • Teaching
      • Theses & Jobs
      • Research
      • Publications
    • Algorithms
      • Team
      • Courses
      • Theses
      • Projects
      • Publications
    • Microprocessor Lab
    • Education
      • Winter 2025/2026
      • Summer 2025
      • Theses
    • Services
      • Library
      • Mailinglists
      • Webmail
      • Knowledge Base
      • Wiki
      • Account Management
      • Services Status
    • Spin-Offs
      • Docoloc
      • bliq (formerly AIPARK)
      • Confidential Technologies
    • Research Cooperations
      • IST.hub
  • Task Overview
  • Git repository
  • Mailing list
  • Matrix-Channel

Pipes Full of Gravy

☃️
Git-Repository: Template Solution Solution-Diff (Solution is posted at 18:00 CET)
Workload: 131 lines of code
Important System-Calls: epoll_create(2), epoll_ctl(2), epoll_wait(2), pipe(7), splice(2)
Recommended Reads:
  • Dec. 7: Select a Gift file 110 lines [select(2)]
Illustration for this exercsie

Christmas is also the season, where the kitchen in the ELF village is a very busy place. All the food and all the cookies for Christmas eve are prepared ahead of time and stored for the big feast when Santa comes home from the tour. And there is one thing that Santa is especially fond of. There is only one thing that both quenches the thirst and fills up the stomach: Gravy! Immense amount of gravy to drench meat and vegetables alike. But making this special ELF gravy is a very peculiar process as it involves multiple process and filter steps to ensure the delicious clarity of the brown molten gold.

However, last year, while the individiual steps of the process were already optimized, there were a few difficulties with the overall gravy production: the transfers stalled and blocked as there was only a single ELF with a single cart that moved the proto-gravy around. For this year, the ELF senate decided to come up with a better gravy-transportation process.

Of Pipes and Polls

Today, we want to look at two different system call interfaces, combine them, and help our elves with the gravy problem. We will start with epoll(7), which is a more modern variant of select(2), and splice(2), which allows the user to directly redirect data from one pipe to another one. In essence, our ELF gets a mechanism to determine which process of the gravy production is producing proto-gravy at its output (epoll), and a mechanism to directly establish pipelines between the output and the input buckets of the gravy production without loading or unloading of the cart.

epoll

Let's first have a look at epoll() and the problem that it solves: With select, we passed a bit-set of queried file descriptors to the kernel and got back the same set where only the bits of the "ready" descriptors are set. So, we have to rejuvenate the poll set for every select invocation and we have to iterate over the set to determine which descriptors are actually ready. Both steps require time and are inefficient as usually the set of polled descriptors is very large and does not change that often.

To solve this, epoll(7) was created in Linux 2.5.44 (2002): In a nutshell, we create a epoll-descriptor with epoll_create(2), which is a file-descriptor itself, and register the file-descriptors that are interesting for us with epoll_ctl(2). With epoll_wait(2), we query the set of registered descriptors and the kernel returns one or multiple "events" (becoming ready is an event) that we can work on. This is fundamentally different than select in two respects: (1) Unless we register a file descriptor explicitly as EPOLLONESHOT, it remains registered even if it already produced an event. (2) The return value is a dense stream of events with much fewer elements than the number of registered fds. In its functionality, epoll is similar to the poll(2) system call, which already solves (1) and allows us to reuse the set of interesting fds. Nevertheless, poll still forces us to iterate over the fd set to find the interesting ones.

In essence, an epoll event loop looks like this:

// Create epoll object
int epoll_fd = epoll_create(...);

// Register interesting FDs
epoll_ctl(epoll_fd, EPOLL_ADD, fd, ...)

// The event loop
while (true) {
    struct epoll_event events[10];
    int n = epoll_wait(epoll_fd, events,....);
    for (int i = 0; i < n; i++) {
        process(events[i])
    }
}

However, one thing is special about epoll: It does not tell us the file descriptor that produced an event, but it only hands back user-defined data (struct epoll_data_t) that we gave it when registering the fds. This could be the file descriptor number, but it also could be a pointer to some fancy context object.

Pipes and splice

The other part of today are Unix pipes (pipe(7) that we want to splice(2) together. For this, we first have to understand that a Unix pipe, which was championed by Doug Mellroy, is an connection between to file descriptors. These fds are not connected to a real file, but if we write some data into one fd, the data can be read from the other FD. This comes with the constraint that pipe are unidirectional and we have a "write end" and a "read end". And this is reflected in the API to create an pipe object with pipe(2) (or with flags: pipe2(2)):

int pipe(int pipefd[2]);
....
int P[2];
pipe(P);
// P[0]: read end
// P[1]: write end

If you use your shell to spawn a command with pipes, this system call is used by the shell to establish the connection between the processes. For example, for cat file | grep bar, the shell invokes pipe2() once, gives the write end (P[1]) as stdout to the cat-process and the read end (P[0]) to the grep process. Thereby, both processes communicate directly with each other and the shell is no longer involved.

However, sometimes, we want to have a more fine grained control over how and when data is moved between two processes. For example, if we want to measure the amount of gravy that flows between two production steps. For this, the shell must become an intermediator, for example by creating two pipe pairs: one from cat to the shell and one from the shell to grep. Nevertheless, it is not desirable to read() the data into the shells address space first, just to write() it out to the grep-pipe. And this is where splice(2) comes in handy:

ssize_t splice(int fd_in,  off64_t *off_in, 
               int fd_out, off64_t *off_out, 
               size_t len, unsigned int flags);

splice() connects two file descriptors (one or both of them must be a pipe) for len bytes. Furthermore, we can add some flags and set input and output file offsets. As a result, we get the number of bytes that were actually written.

Task

Write a single-threaded program that takes multiple filter commands as separate arguments:

seq 1 100000000000000 | ./epoll "grep [0-5]" "grep 1$"  > /dev/null

The program spawns both filters, splices its stdin data with splice() to the first filter, splices the first filter's output to the second filter's input, and so on, until the output of the last filter is written to stdout of the ./epoll program. Meanwhile, the epoll program should record and regularly print the throughput of the filter pipeline to determine how much gravy is flowing. This is similar to the behavior of the pv(1) program. As an output, you should expect something like:

[grep [0-5]] Started filter as pid 458367
[grep 1$] Started filter as pid 458368
 73.63MiB/s 72.70MiB/s 7.29MiB/s
 85.22MiB/s 85.06MiB/s 8.52MiB/s
 92.25MiB/s 92.25MiB/s 9.22MiB/s
 92.44MiB/s 92.46MiB/s 9.25MiB/s

Hints

  • Use epoll to find those pipe write ends that are ready and have data.
  • Start out with an implementation of copy_splice that uses read()/write()
  • As a simplification, use splice eagerly when data is ready and do not try to identify read ends that are ready to accept new data.
  • Enumerate the connected pipe pairs with an index and set that number as user data when adding file descriptors to the epoll object.

Last modified: 2023-12-01 15:52:28.025765, Last author: , Permalink: /p/advent-10-epoll


last changed 2023-12-01, 15:52 by Prof. Dr.-Ing. Christian Dietrich

For All Visitors

Vacancies of TU Braunschweig
Career Service' Job Exchange 
Merchandising

For Students

Term Dates
Courses
Degree Programmes
Information for Freshman
TUCard

Internal Tools

Glossary (GER-EN)
Change your Personal Data

Contact

Technische Universität Braunschweig
Universitätsplatz 2
38106 Braunschweig

P. O. Box: 38092 Braunschweig
GERMANY

Phone: +49 (0) 531 391-0

Getting here

© Technische Universität Braunschweig
Imprint Privacy Accessibility