• Mniot@programming.dev · 22 up, 6 down · 12 days ago

    But this was arbitrary. It’s not like “why are there only 16 colors on this video game” (because of space constraints). They could have made it 257 users and nothing would overflow. Given that, I think they should have made a human-comfortable number (multiple of 10) instead of a machine-comfortable number (power of 2).

    • chonglibloodsport@lemmy.world · 13 up, 1 down · 12 days ago

      It’s only arbitrary if you ignore the history of computing and the eventual settling on a standard of 8-bit bytes as the smallest addressable value in most programming languages and operating system libraries (though not always addressable in hardware).

      Unless you’re making the very meta claim that it was arbitrary for us to settle on 8 bits instead of 10 or something. I think there are a lot of technical merits to 8-bit bytes (being a power of 2 is nice, and 4 bits is just too small).
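
For reference, the arithmetic behind the 256 figure: an unsigned 8-bit value has 2^8 = 256 possible values, 0 through 255. The Python sketch below assumes, hypothetically, that a member index is stored in a single unsigned byte; `fits_in_uint8` is an illustrative helper, not anything taken from WhatsApp.

```python
import struct

def fits_in_uint8(value: int) -> bool:
    """Return True if value can be packed as one unsigned byte (struct format "B")."""
    try:
        struct.pack("B", value)
        return True
    except struct.error:  # struct rejects anything outside 0..255
        return False

print(2 ** 8)              # 256 distinct values, 0..255
print(fits_in_uint8(255))  # True: index 255 serves the 256th member
print(fits_in_uint8(256))  # False: a 257th member would need index 256
# A C-style uint8_t counter would not raise; it would silently wrap to 0:
print((255 + 1) % 256)     # 0
```

Under that assumption, 257 members is exactly the point where one byte stops being enough.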

      • Tja@programming.dev · 11 up, 2 down · 11 days ago

        Yes, but this is not a historical piece of code; it is a 21st-century app. I very much doubt they are using a uint8 to represent the array size; it’s probably a 64-bit int. They might as well have used 300, 250, or 1000.

        • chonglibloodsport@lemmy.world · 7 up · 11 days ago

          WhatsApp’s back-end is written in Erlang. Erlang is a very old language with weird limitations. For one thing, it doesn’t have different machine-sized (16-, 32-, or 64-bit) integers the way C does. Arbitrary-precision integers are the only primitive integer type. This makes it quite a slow type to use for something like a group chat member ID.

          However, Erlang also has a type called a binary, which is used for space-efficient storage of binary data (along with primitive operations on bits). Binaries are stored as sequences of bytes. I’m guessing this is how WhatsApp does group chat IDs, which would make the 256-user limit perfectly understandable (keep every ID contained within a byte).
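
A rough illustration of the one-byte-per-member idea, written in Python rather than Erlang (in Erlang the same encoding would use bit syntax such as `<<Index:8>>`). It assumes, hypothetically, that each current member of a group has a small index from 0 to 255, so a list of members (recipients, read receipts, and so on) costs one byte per member instead of eight for a 64-bit ID. `encode_members` and `decode_members` are illustrative names only.

```python
import struct

def encode_members(indices: list[int]) -> bytes:
    """Pack per-group member indices, one byte each.

    bytes() raises ValueError if any index is outside 0..255.
    """
    return bytes(indices)

def decode_members(blob: bytes) -> list[int]:
    """Iterating over a bytes object yields ints in 0..255."""
    return list(blob)

blob = encode_members([0, 17, 255])
print(len(blob))             # 3 bytes for 3 members
print(decode_members(blob))  # [0, 17, 255]

# The same three members as 64-bit IDs (struct format "Q" = unsigned 8 bytes):
print(len(struct.pack("<3Q", 0, 17, 255)))  # 24 bytes
```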

          • Tja@programming.dev · 3 up, 1 down · 11 days ago

            I don’t think every user would have a 1-byte ID within the chat; that would be a nightmare when people leave and join the group, when IDs get reused, etc. Each user needs to be identified by their UUID (or whatever else they chose).

            Using a 32- or 64-bit size and limiting the value makes much more sense; any subsequent change would be a config tweak instead of a major refactor. I would guess the limit was a fun “Easter egg” type of thing rather than a hard technical limit.
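
A minimal sketch of the alternative described here, assuming the cap is just a configuration value checked against an ordinary integer count while members keep their global IDs. The `Group` class and its `max_members` field are hypothetical names for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Group:
    # Hypothetical config value: raising the cap is a one-line change.
    max_members: int = 256
    # Members are identified by global user IDs (plain ints, e.g. 64-bit values).
    members: set[int] = field(default_factory=set)

    def add(self, user_id: int) -> bool:
        """Add a member unless the configured cap has been reached."""
        if user_id in self.members:
            return True  # already a member; nothing to do
        if len(self.members) >= self.max_members:
            return False  # cap reached; the ID width plays no role here
        self.members.add(user_id)
        return True

group = Group(max_members=3)
for uid in (1_000_000_001, 1_000_000_002, 1_000_000_003, 1_000_000_004):
    print(uid, group.add(uid))  # the fourth add prints False
```

Here the limit and the width of the ID are independent, which is the forward-compatibility argument: the number 256 could change without touching the data layout.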

            • chonglibloodsport@lemmy.world · 2 up · 11 days ago (edited)

              WhatsApp has billions of users. Scaling to that level and maintaining perfect real-time chatting with arbitrary user-created groups is not trivial. Storing 64-bit UUIDs for every single message and other interaction in a group chat would be inefficient, not to mention unidiomatic in Erlang (due to the previously mentioned lack of machine-sized integers).

              The use case of a group having fewer than 256 current users but more than 256 historical users, combined with the desire to scroll back and read very old messages from people who left the group, is very uncommon. It makes perfect sense to put a situation like that on a slow path while optimizing for the common case of fewer than 256 users chatting right now.
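
To make the fast-path/slow-path idea concrete, here is a hypothetical sketch, not WhatsApp's actual design: current members hold one of 256 per-group slots, each message is assumed to store only its sender's slot and a timestamp, freed slots are reused, and messages from former members are resolved by scanning a history of past slot holders. `SlotTable`, `join`, `leave`, and `resolve` are illustrative names.

```python
import heapq
from typing import Optional

class SlotTable:
    """Hypothetical per-group table mapping 1-byte slots to global user IDs."""

    MAX_SLOTS = 256  # every slot number fits in one unsigned byte

    def __init__(self) -> None:
        self.free = list(range(self.MAX_SLOTS))  # a sorted list is already a valid min-heap
        self.current = {}  # slot -> (user_id, joined_at)
        self.history = []  # (slot, user_id, joined_at, left_at) for former members

    def join(self, user_id: int, now: int) -> int:
        """Give a new member the lowest free slot."""
        if not self.free:
            raise OverflowError("group is full: all 256 slots in use")
        slot = heapq.heappop(self.free)
        self.current[slot] = (user_id, now)
        return slot

    def leave(self, slot: int, now: int) -> None:
        """Record the departure and make the slot reusable."""
        user_id, joined_at = self.current.pop(slot)
        self.history.append((slot, user_id, joined_at, now))
        heapq.heappush(self.free, slot)

    def resolve(self, slot: int, sent_at: int) -> Optional[int]:
        """Return the global user ID that held `slot` when a message was sent."""
        occupant = self.current.get(slot)
        if occupant is not None and occupant[1] <= sent_at:
            return occupant[0]  # fast path: sender is a current member
        for s, user_id, joined_at, left_at in self.history:  # slow path: linear scan
            if s == slot and joined_at <= sent_at < left_at:
                return user_id
        return None

table = SlotTable()
alice = table.join(user_id=9_000_001, now=10)  # slot 0
bob = table.join(user_id=9_000_002, now=11)    # slot 1
table.leave(alice, now=20)
carol = table.join(user_id=9_000_003, now=30)  # reuses slot 0
print(table.resolve(0, sent_at=15))  # 9000001 (slow path: Alice's old message)
print(table.resolve(0, sent_at=35))  # 9000003 (fast path: Carol)
```

In this sketch the slot-reuse bookkeeping that makes join/leave tricky is confined to `leave` and the history scan in `resolve`.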

              • Tja@programming.dev · 1 up, 1 down · 11 days ago

                I disagree for various reasons:

                It’s not very uncommon; it would be an issue as soon as it happens, without going back that far. Even if it were uncommon, it is possible and something to take care of, making for super ugly “special case” code.

                Plus, you don’t need to sort the users’ IDs to deliver messages; it’s a foreach kind of operation.

                And finally, given the underlying hardware, sorting 8-bit integers wouldn’t be faster than sorting 64-bit ones (which we don’t need to do anyway); processors move all the bits in parallel. Unless WhatsApp runs on 8-bit microcontrollers.

                • chonglibloodsport@lemmy.world · 1 up · 11 days ago (edited)

                  I didn’t say sorting; I said “storting” and must have corrected the typo while you were writing your reply. I meant storing. Having a 64-bit UUID attached to every single one of trillions of messages (per day) is a huge amount of wasted space (8 TB per trillion messages, just to store 64-bit UUIDs without any message contents).

                  As an annoying aside, my phone now thinks “storting” is a word and helpfully autocorrects storing to that now. Good grief!
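
The space figure is a single multiplication: number of messages times bytes per identifier. A quick check, assuming decimal terabytes (10^12 bytes) and exactly one identifier per message; note that a standard RFC 4122 UUID is 128 bits, so it would double the 64-bit number. `storage_tb` is an illustrative helper.

```python
TB = 10**12  # decimal terabyte, in bytes

def storage_tb(messages: int, id_bytes: int) -> float:
    """Terabytes needed to store one identifier of id_bytes per message."""
    return messages * id_bytes / TB

one_trillion = 10**12
for label, width in [("8-bit index", 1), ("64-bit ID", 8), ("128-bit UUID", 16)]:
    print(f"{label}: {storage_tb(one_trillion, width):g} TB")
# 8-bit index: 1 TB
# 64-bit ID: 8 TB
# 128-bit UUID: 16 TB
```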

                  • JackbyDev@programming.dev · 2 up · 10 days ago

                    I nominate storting to mean storing and sorting at the same time. Like in a binary heap, binary tree, sorted array, etc. It’s a common thing and similar to other words like “upsert”.
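
Both kinds of structure mentioned here exist in Python's standard library: `bisect.insort` inserts into an already sorted list at the correct position, while `heapq.heappush` maintains a binary min-heap, which only guarantees that the smallest item sits at index 0. A small illustration with made-up IDs:

```python
import bisect
import heapq

sorted_ids = []  # kept fully sorted after every insert
heap = []        # kept as a binary min-heap
for user_id in [42, 7, 19, 3]:
    bisect.insort(sorted_ids, user_id)  # binary search for the position, then insert
    heapq.heappush(heap, user_id)       # sift the new item up to restore the heap property

print(sorted_ids)  # [3, 7, 19, 42]
print(heap[0])     # 3: the smallest item is always at index 0
```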

                  • Tja@programming.dev · 1 up · 11 days ago

                    I don’t see how a message uuid is related to the group membership storage…

                    I haven’t seen WhatsApp’s code, obviously, but I use a similar question to interview candidates. There are a few ways of implementing groups, and you have to store group membership somehow, but just once per group.

                    When a message is sent, it can be stored with a foreign key that relates it to the group, a message ID that should be unique for whatever DB is in use, plus a timestamp. When checking for new messages, a client provides the timestamp of the last retrieved message and the server returns all messages since then (per group). Even read confirmations can be implemented using timestamps. There’s no need to store all group members for every message (not that you claimed there is, just making sure).
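
A minimal sketch of the schema described above, using Python's built-in `sqlite3` with an in-memory database. Table and column names are hypothetical, and the `sender_id` column is an added assumption; the point is that membership is stored once per group while each message row carries only a group key, its own ID, and a timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE chat_groups (group_id INTEGER PRIMARY KEY);
CREATE TABLE memberships (                  -- stored once per group, not per message
    group_id INTEGER NOT NULL REFERENCES chat_groups(group_id),
    user_id  INTEGER NOT NULL,
    PRIMARY KEY (group_id, user_id)
);
CREATE TABLE messages (
    message_id INTEGER PRIMARY KEY,         -- unique within this database
    group_id   INTEGER NOT NULL REFERENCES chat_groups(group_id),
    sender_id  INTEGER NOT NULL,            -- assumption: sender recorded per message
    sent_at    INTEGER NOT NULL,            -- timestamp, e.g. milliseconds since epoch
    body       TEXT NOT NULL
);
CREATE INDEX messages_by_group_time ON messages (group_id, sent_at);
""")

def messages_since(conn: sqlite3.Connection, group_id: int, last_seen: int) -> list:
    """Return messages in a group newer than the client's last retrieved timestamp."""
    return conn.execute(
        "SELECT message_id, sender_id, sent_at, body FROM messages "
        "WHERE group_id = ? AND sent_at > ? ORDER BY sent_at",
        (group_id, last_seen),
    ).fetchall()

conn.execute("INSERT INTO chat_groups (group_id) VALUES (1)")
conn.executemany("INSERT INTO memberships VALUES (?, ?)", [(1, 100), (1, 200)])
conn.executemany(
    "INSERT INTO messages (group_id, sender_id, sent_at, body) VALUES (?, ?, ?, ?)",
    [(1, 100, 1000, "hi"), (1, 200, 2000, "hello"), (1, 100, 3000, "how are you?")],
)
print(messages_since(conn, group_id=1, last_seen=1500))
# [(2, 200, 2000, 'hello'), (3, 100, 3000, 'how are you?')]
```

Fetching new messages is then one indexed range query per group, and no member list is copied into any message row.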

        • Flax@feddit.uk · 1 up · 11 days ago

          Some programming languages and data storage types have 8 bit limits. You’d be surprised.

          • Tja@programming.dev · 1 up, 1 down · 11 days ago

            Yes, in any language that implements and enforces a uint8, but you don’t use those types for something like this, because of forward compatibility.

    • chicken@lemmy.dbzer0.com · 6 up · 12 days ago

      They could have made it 257 users and nothing would overflow

      It might, if the people writing the software are extremely old-school about their approach to memory management.

    • JackbyDev@programming.dev · 1 up · 10 days ago

      Dev here. Just because CPUs don’t directly use 8-bit numbers anymore doesn’t magically mean 257 wouldn’t overflow. If you’re storing the 8 bits as part of something else that’s 32 or 64 bits (or whatever), like maybe the ID of the chat, then you only have 8 bits. A lot of the time this comes down to making compact data representations to make uploads and downloads quicker. JSON is (probably) the most popular format for transferring data, but more compact binary formats like Avro, Protobuf, and even application-specific custom formats exist.
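
A minimal sketch of this point, assuming a hypothetical 64-bit key whose upper 56 bits hold a chat ID and whose lower 8 bits hold a member index. The CPU handles the whole 64-bit value natively, yet index 256 still has nowhere to go. The sketch also compares the fixed 8-byte binary encoding with the same data written as JSON text. `pack_key` and `unpack_key` are illustrative names, not any real library's API.

```python
import json
import struct

# Hypothetical layout: one 64-bit key = 56-bit chat ID + 8-bit member index.
MEMBER_BITS = 8
CHAT_ID_BITS = 64 - MEMBER_BITS

def pack_key(chat_id: int, member: int) -> int:
    """Combine a chat ID and a member index into a single 64-bit integer."""
    if not 0 <= member < (1 << MEMBER_BITS):
        raise OverflowError(f"member index {member} does not fit in {MEMBER_BITS} bits")
    if not 0 <= chat_id < (1 << CHAT_ID_BITS):
        raise OverflowError(f"chat ID {chat_id} does not fit in {CHAT_ID_BITS} bits")
    return (chat_id << MEMBER_BITS) | member

def unpack_key(key: int) -> tuple[int, int]:
    """Split a packed key back into (chat_id, member)."""
    return key >> MEMBER_BITS, key & ((1 << MEMBER_BITS) - 1)

key = pack_key(chat_id=123_456_789, member=255)
print(unpack_key(key))              # (123456789, 255)
print(len(struct.pack("<Q", key)))  # 8 bytes as a fixed-width binary field
print(len(json.dumps({"chat_id": 123_456_789, "member": 255})))  # 37 characters as JSON

try:
    pack_key(chat_id=123_456_789, member=256)  # the 257th member
except OverflowError as exc:
    print(exc)
```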