• The_Decryptor@aussie.zone
    link
    fedilink
    English
    arrow-up
    1
    ·
    8 hours ago

    So one of the problems is the size of a “physical page”, on a stock x86 system that’s only 4KiB. If you allocate just 1MiB of RAM you need to back that with 256 “page table entries”, and to then load a virtual address within that allocation you need to walk that list of 256 entries to find the physical address in RAM that the CPU needs to request.

    Of course these days an app is more likely to use 1 GiB of RAM, that’s a mere 262,144 page table entries to scan through, on each memory load.

    Oh but then we’re also not running a single process, there’s multiple processes on the system, so there will be several million of these entries, each one indexed by address (Which can be duplicated, each process has its own private view of the address space), and then by process ID to disambiguate which entry belongs to each process.

    That’s where the TLB comes in handy, to avoid the million or so indexing operations on each and every memory load.

    But caching alone can’t solve everything, you need a smarter way to perform bookkeeping than simply using a flat list for when you don’t have a cached result. So the OS breaks down those mappings into smaller chunks and then provides a table that maps address ranges to those chunks. An OS might cap a list of PTEs at 4096 and have another table index that, so to resolve an address the CPU checks which block of PTEs to load from the first table and then only has to scan the list it points to.

    Like this, this is a 2 level scheme that Intel CPUs used before the Pentium Pro (iirc), the top 10 bits of an address selected an entry in the “page directory”, the CPU loads that and uses the next 10 bits to select the group of PTEs from that list, following that link that it finds the actual PTEs that describe the mappings and then it can scan that list to find the specific matching entry that describes the physical address to load (And it then promptly caches the result to avoid doing that again)

    So yes, for a given page size and CPU you have a fixed number of walks regardless of where the address lives in memory, but we also have more memory now. And much like a hoarder, the more space we have to store things, the more things we do store, and the more disorganised it gets. And even if you do clear a spot, the next thing you want to store might not fit there and you end up storing it someplace else. If you end up bouncing around looking for things you end up thrashing the TLB, throwing out cached entries you still need so now need to perform the entire table walk again (Just to invariably throw that result away soon after).

    Basically, you need to defrag your RAM periodically so that the mappings don’t get too complex and slow things down (Same is true for SSDs btw, you still need to defrag them to clean up the filesystem metadata itself, just less often than HDDs). Meta have been working on improvements to how Linux handles all this stuff (page table layout and memory compaction) for a while because they were seeing some of their long-lived servers ending up spending about 20% of CPU time simply wasted on doing repetitive walks due to a highly fragmented address space.