Why libtorrent 2.0's use of memory mapped files was a bad idea #7551
I think it can be salvaged though. Recently described it in #6667 (comment).
Compile 2.0 with
This would still have very poor performance compared to libtorrent 1.2's disk I/O subsystem.
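(For anyone wanting to try it: 2.0 can also be pointed at the fallback backend at runtime rather than at compile time. A minimal sketch, assuming the `posix_disk_io_constructor` hook that libtorrent 2.0 exposes through `session_params`; check the docs for your exact version:)

```cpp
#include <libtorrent/session.hpp>
#include <libtorrent/session_params.hpp>
#include <libtorrent/posix_disk_io.hpp>

int main() {
    lt::session_params params;
    // Replace the default memory-mapped disk backend with the
    // single-threaded POSIX (pread/pwrite) backend.
    params.disk_io_constructor = lt::posix_disk_io_constructor;
    lt::session ses(params);
    // ... add torrents as usual; disk I/O now goes through plain
    // POSIX calls instead of mmap.
}
```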
Those on network shares have ostensibly encountered these issues, among others.
@HanabishiRecca Arvid has been trying to 'salvage' the mmap implementation for literally three years now, but it hasn't been successful. Mind you, a lot of these knobs are Linux-only and do nothing for the other OSes. Ultimately the lesson from the paper is that to have control over I/O behavior in your program you really have to do it yourself in userspace.
Well, yeah. I tried to address excessive memory usage in particular. But Arvid is kinda stubborn in this regard and I doubt we will see "back to the roots" any time soon.
@arvidn the paper outlines the cases of many DBMS projects initially opting for mmap but then switching away when its limitations became clear and they needed control over I/O performance. However your case with libtorrent is unique in that your trajectory is the reverse of many of these projects: you started out with your own buffer pool implementation, managing file I/O in userspace, but then switched to mmap with 2.0. What were the reasons that led you to make this curious decision?
Contributions are welcome!
Windows does have some counterparts, like
I'm working on it. It's not easy to get right and efficient; contributions are welcome:
They are mostly documented here: https://github.com/arvidn/libtorrent/wiki/memory-mapped-I-O

One aspect was that balancing the sizes of the write cache, the read cache, and read-back avoidance (i.e. blocks that will need to be read back from disk in order to compute their piece hash) is not possible to do well in user space. It turns out it's not so easy in kernel space either, though.

Another aspect was the expectation that emerging fast SSDs and persistent memory (DAX) would most likely be accessed much more efficiently via memory mapped files.

The major failure case of mmap (afaict) is network mounted drives, or any FUSE drive. On these, writing a partial page in a memory mapped file becomes very expensive: the kernel has to pull the page over the network, overwrite part of it, and then flush the whole page back again over the network. Preserving the fidelity of exactly which bytes are being written helps tremendously in this scenario.
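(To make the partial-page point concrete, here's a minimal POSIX sketch, illustrative only and not libtorrent code: dirtying a few bytes through a mapping taints the whole page, while pwrite() tells the kernel exactly which bytes changed:)

```cpp
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Write 100 bytes into the middle of the second 4 KiB page, two ways.
void write_via_mmap(int fd, size_t file_size, const char* buf) {
    char* p = static_cast<char*>(
        mmap(nullptr, file_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));
    // The store faults the page in first (over the network, on a remote
    // mount), and writeback later flushes the entire 4 KiB page, even
    // though only 100 bytes changed.
    std::memcpy(p + 4096 + 1000, buf, 100);
    munmap(p, file_size);
}

void write_via_pwrite(int fd, const char* buf) {
    // The kernel knows exactly which 100 bytes changed, so a network
    // filesystem can transfer just those bytes.
    pwrite(fd, buf, 100, 4096 + 1000);
}
```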
I'm personally fine with POSIX I/O. The OS filesystem cache does quite a good job. (Even prior to LT 2.0 I had the in-client cache disabled anyway.)
I would have helped, but I'm not a C++ guy.
Hmm, as this new implementation is taking a while, why not just copy the disk I/O subsystem from 1.2 wholesale for 2.1, and work on the new implementation afterwards? At first you said the new implementation you're working on wouldn't cache blocks, but more recently you've stated that it would use caching, so it seems to be getting more complex over time. In the interest of pragmatism, wouldn't it be prudent to use 1.2's I/O subsystem for now?
Did you mean to include a link here Arvid? I don't see it :(
Hmm, how far out was this DAX persistent memory for consumer PCs in this idealized future scenario? Just asking because I had never heard of DAX, and if it ever comes to consumer PCs it seems it will take a long time. Maybe you were getting ahead of future hardware developments with the memory-mapped implementation?
It's a pity I don't know C++ :(
Because a lot of other things have changed around it. The 1.2 implementation doesn't fit into 2.0+. Either option is a lot of work, and I don't have a lot of time.
Yes, that was a copy-paste failure. I updated my post.
It seems Intel Optane kind of failed in the market too.
It's never too late to start!
The thing is, most heavy-lifting seeders still use HDDs, simply because of the huge amounts of storage required. I know people seeding tens or even hundreds of terabytes of data, with 10000+ tasks in a single client. And I don't think that will change soon, as SSD space remains significantly more expensive.
My current plan is to only have a store-buffer and rely on the operating system for read cache.
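(In case the term is unfamiliar, a store buffer in this sense might look roughly like the sketch below, with invented names and not the actual libtorrent design: dirty blocks sit in a map until flushed, reads are served from it when possible, and everything else falls through to pread(), where the OS page cache does the read caching:)

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <unistd.h>
#include <utility>
#include <vector>

// Hypothetical sketch of a store buffer: pending writes are held in
// memory until flushed; there is no read cache of our own, the kernel's
// page cache serves repeated reads through pread().
struct StoreBuffer {
    // (piece index, block offset within piece) -> pending block data
    std::map<std::pair<int, int>, std::vector<std::uint8_t>> pending;

    void write(int piece, int block, std::vector<std::uint8_t> data) {
        pending[{piece, block}] = std::move(data);
    }

    bool read(int fd, int piece, int block, off_t file_offset,
              std::uint8_t* out, std::size_t len) {
        if (auto it = pending.find({piece, block}); it != pending.end()) {
            std::copy_n(it->second.data(), len, out);  // still in flight
            return true;
        }
        // Fall through to the file; the OS page cache is the read cache.
        return pread(fd, out, len, file_offset) == static_cast<ssize_t>(len);
    }

    template <class F>  // F maps (piece, block) -> off_t in the file
    void flush(int fd, F to_file_offset) {
        for (auto const& [key, data] : pending)
            pwrite(fd, data.data(), data.size(),
                   to_file_offset(key.first, key.second));
        pending.clear();
    }
};
```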
Persistent memory modules are for Intel servers only; you put them in DIMM slots. Only Intel® Xeon® CPUs have hardware support for PMEM in the memory controller.
So @arvidn this wiki page is mostly about the how, not the why. The first three lines give the goals, but everything that follows is about how the implementation will work. There isn't any detailed reasoning as to why the memory-mapped design was adopted in the first place, with a careful exploration of all the pros and cons. Perhaps this was never done, which would explain all the problems since...
Huh, has your current plan changed from a few months ago? Because in September you wanted to use
Oh, I've tried Arvid, but man is it hard! And modern C++ is a career: all of C++17 and its idioms, plus the other things you use in your codebase like boost.asio, whose documentation is horrendous! Not for newbies at all! As an example of how difficult it is for me, in a November PR someone mentioned adding support for passing

Nonetheless, being the fool that I am - as a small exercise - I tried to code up an assignment to this data type and a corresponding

Mind you, this is a very small idiom of the C++ you use in your code, and in the end I could not grok it enough to make working code out of it, no matter how much I tried. And there are so many much bigger pieces of modern C++ you use, not to mention the boost.asio library, which is quite formidable to grok in and of itself.
A paper published last year goes into the details of why database systems using mmap in lieu of implementing a buffer pool inevitably run into problems from both a performance and a correctness perspective. Of the four specific issues it details, the first - transactional safety - does not apply to libtorrent, since it is not a database in the conventional sense but works with immutable torrent files. But a lot of the reports the paper references are eerily similar to issues reported by people here (e.g. "They also faced other issues when running in containerized environments or on machines without direct-attached storage"; see #7480). And let's not even detail the additional issues this approach runs into on Windows, as reported by people here.
The paper Are You Sure You Want to Use MMAP in Your DBMS? says this in its abstract:
"mmap’s perceived ease of use has seduced DBMS developers for decades as a viable alternative to implementing a buffer pool. There are, however, severe correctness and performance issues with mmap that are not immediately apparent. Such problems make it difficult, if not impossible, to use mmap correctly and efficiently in a modern DBMS. In fact, several popular DBMSs initially used mmap to support larger-than-memory databases but soon encountered these hidden perils, forcing them to switch to managing file I/O themselves after significant engineering costs."
Though the paper has its detractors, namely the creators of the LMDB and RDB projects, which use mmap (see an attempted rebuttal here), none disagree that if you want substantive control over I/O behavior you have to implement it yourself in userspace rather than rely on the OS to do it for you.
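(As a small illustration of the kind of control they mean, plain POSIX and nothing libtorrent-specific: with a file descriptor you can decide exactly when data hits disk and when it leaves the cache, something a mapping only lets you hint at:)

```cpp
#include <fcntl.h>
#include <unistd.h>

// After a piece has been hashed it won't be read again soon, so drop it
// from the page cache explicitly. Through mmap, the closest equivalents
// (msync + madvise) are hints with weaker guarantees, and dirty-page
// writeback timing stays entirely up to the kernel.
void done_with_piece(int fd, off_t offset, off_t len) {
    fdatasync(fd);                                        // make it durable first
    posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);  // then evict
}
```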