1. 18 May, 2022 1 commit
    • random: do not use input pool from hard IRQs · e3e33fc2
      Jason A. Donenfeld committed
      Years ago, a separate fast pool was added for interrupts, so that the
      cost associated with taking the input pool spinlocks and mixing into it
      would be avoided in places where latency is critical. However, one
      oversight was that add_input_randomness() and add_disk_randomness()
      still sometimes are called directly from the interrupt handler, rather
      than being deferred to a thread. This means that some unlucky interrupts
      will be caught doing a blake2s_compress() call and potentially spinning
      on input_pool.lock, which can also be taken by unprivileged users by
      writing into /dev/urandom.
      
      In order to fix this, add_timer_randomness() now checks whether it is
      being called from a hard IRQ and if so, just mixes into the per-cpu IRQ
      fast pool using fast_mix(), which is much faster and can be done
      lock-free. A nice consequence of this, as well, is that it means hard
      IRQ context FPU support is likely no longer useful.
      
      The entropy estimation algorithm used by add_timer_randomness() is also
      somewhat different than the one used for add_interrupt_randomness(). The
      former looks at deltas of deltas of deltas, while the latter just waits
      for 64 interrupts for one bit or for one second since the last bit. In
      order to bridge these, and since add_interrupt_randomness() runs after
      an add_timer_randomness() that's called from hard IRQ, we add to the
      fast pool credit the related amount, and then subtract one to account
      for add_interrupt_randomness()'s contribution.
      
      A downside of this, however, is that the num argument is potentially
      attacker controlled, which puts a bit more pressure on the fast_mix()
      sponge to do more than it's really intended to do. As a mitigating
      factor, the first 96 bits of input aren't attacker controlled (a cycle
      counter followed by zeros), which means it's essentially two rounds of
      siphash rather than one, which is somewhat better. It's also not that
      much different from add_interrupt_randomness()'s use of the irq stack
      instruction pointer register.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Filipe Manana <fdmanana@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  2. 16 May, 2022 1 commit
  3. 15 May, 2022 1 commit
    • random: do not pretend to handle premature next security model · e85c0fc1
      Jason A. Donenfeld committed
      Per the thread linked below, "premature next" is not considered to be a
      realistic threat model, and leads to more serious security problems.
      
      "Premature next" is the scenario in which:
      
      - Attacker compromises the current state of a fully initialized RNG via
        some kind of infoleak.
      - New bits of entropy are added directly to the key used to generate the
        /dev/urandom stream, without any buffering or pooling.
      - Attacker then, somehow having read access to /dev/urandom, samples RNG
        output and brute forces the individual new bits that were added.
      - Result: the RNG never "recovers" from the initial compromise, a
        so-called violation of what academics term "post-compromise security".
      
      The usual solutions to this involve some form of delaying when entropy
      gets mixed into the crng. With Fortuna, this involves multiple input
      buckets. With what the Linux RNG was trying to do prior, this involves
      entropy estimation.
      
      However, by delaying when entropy gets mixed in, it also means that RNG
      compromises are extremely dangerous during the window of time before
      the RNG has gathered enough entropy, during which time nonces may become
      predictable (or repeated), ephemeral keys may not be secret, and so
      forth. Moreover, it's unclear how realistic "premature next" is from an
      attack perspective, if these attacks even make sense in practice.
      
      Put together -- and discussed in more detail in the thread below --
      these constitute grounds for just doing away with the current code that
      pretends to handle premature next. I say "pretends" because it wasn't
      doing an especially great job at it either; should we change our mind
      about this direction, we would probably implement Fortuna to "fix" the
      "problem", in which case, removing the pretend solution still makes
      sense.
      
      This also reduces the crng reseed period from 5 minutes down to 1
      minute. The rationale from the thread might lead us toward reducing that
      even further in the future (or even eliminating it), but that remains a
      topic of a future commit.
      
      At a high level, this patch changes semantics from:
      
          Before: Seed for the first time after 256 "bits" of estimated
          entropy have been accumulated since the system booted. Thereafter,
          reseed once every five minutes, but only if 256 new "bits" have been
          accumulated since the last reseeding.
      
          After: Seed for the first time after 256 "bits" of estimated entropy
          have been accumulated since the system booted. Thereafter, reseed
          once every minute.
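
The before/after semantics can be sketched as two predicates. This is a minimal userspace model, not the kernel's code: the struct, the function names, and the second-granularity timestamps are assumptions made for illustration. Only the 256-bit threshold and the five-minute and one-minute periods come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants taken from the description above. */
#define INIT_BITS            256  /* estimated bits needed for the first seed */
#define OLD_RESEED_PERIOD_S  300  /* five minutes */
#define NEW_RESEED_PERIOD_S   60  /* one minute */

/* Hypothetical model of the state the reseed policy looks at. */
struct rng_model {
    bool     initialized;    /* has the first seeding happened? */
    uint64_t last_reseed_s;  /* time of the last (re)seed, in seconds */
    unsigned new_bits;       /* estimated bits gathered since then */
};

/* Before: once initialized, reseed every 5 minutes, but only with 256 fresh bits. */
static bool should_reseed_before(const struct rng_model *m, uint64_t now_s)
{
    assert(now_s >= m->last_reseed_s);
    if (!m->initialized)
        return m->new_bits >= INIT_BITS;
    return now_s - m->last_reseed_s >= OLD_RESEED_PERIOD_S &&
           m->new_bits >= INIT_BITS;
}

/* After: once initialized, reseed every minute regardless of the estimate. */
static bool should_reseed_after(const struct rng_model *m, uint64_t now_s)
{
    assert(now_s >= m->last_reseed_s);
    if (!m->initialized)
        return m->new_bits >= INIT_BITS;
    return now_s - m->last_reseed_s >= NEW_RESEED_PERIOD_S;
}
```

In this model the two policies only diverge after initialization: an initialized RNG with few freshly estimated bits is never reseeded under the old rule, while the new rule reseeds it as soon as a minute has passed.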
      
      Most of this patch is renaming and removing: POOL_MIN_BITS becomes
      POOL_INIT_BITS, credit_entropy_bits() becomes credit_init_bits(),
      crng_reseed() loses its "force" parameter since it's now always true,
      the drain_entropy() function no longer has any use so it's removed,
      entropy estimation is skipped if we've already init'd, the various
      notifiers for "low on entropy" are now only active prior to init, and
      finally, some documentation comments are cleaned up here and there.
      
      Link: https://lore.kernel.org/lkml/YmlMGx6+uigkGiZ0@zx2c4.com/
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Nadia Heninger <nadiah@cs.ucsd.edu>
      Cc: Tom Ristenpart <ristenpart@cornell.edu>
      Reviewed-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  4. 14 May, 2022 5 commits
    • random: use first 128 bits of input as fast init · 5c3b747e
      Jason A. Donenfeld committed
      Before, the first 64 bytes of input, regardless of how entropic it was,
      would be used to mutate the crng base key directly, and none of those
      bytes would be credited as having entropy. Then 256 bits of credited
      input would be accumulated, and only then would the rng transition from
      the earlier "fast init" phase into being actually initialized.
      
      The thinking was that by mixing and matching fast init and real init, an
      attacker who compromised the fast init state, considered easy to do
      given how little entropy might be in those first 64 bytes, would then be
      able to bruteforce bits from the actual initialization. By keeping these
      separate, bruteforcing became impossible.
      
      However, by not crediting potentially creditable bits from those first 64
      bytes of input, we delay initialization, and actually make the problem
      worse, because it means the user is drawing worse random numbers for a
      longer period of time.
      
      Instead, we can take the first 128 bits as fast init, and allow them to
      be credited, and then hold off on the next 128 bits until they've
      accumulated. This is still a wide enough margin to prevent bruteforcing
      the rng state, while still initializing much faster.
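
The two-step initialization described above can be modelled as a small state machine. This is a sketch under stated assumptions: the enum, struct, and function names are hypothetical, and only the 128-bit and 256-bit thresholds are taken from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical phase names; the thresholds come from the description above. */
enum init_phase { PHASE_EMPTY = 0, PHASE_FAST_INIT = 1, PHASE_READY = 2 };

#define FAST_INIT_BITS 128  /* first 128 credited bits: fast init */
#define FULL_INIT_BITS 256  /* next 128 credited bits: fully initialized */

struct init_model {
    unsigned credited_bits;  /* capped at FULL_INIT_BITS */
    enum init_phase phase;
};

/* Credit `bits` of estimated entropy and advance the phase when a threshold is crossed. */
static enum init_phase credit_bits(struct init_model *m, unsigned bits)
{
    unsigned room;

    assert(m != NULL);
    room = FULL_INIT_BITS - m->credited_bits;
    m->credited_bits += bits < room ? bits : room;

    if (m->credited_bits >= FULL_INIT_BITS)
        m->phase = PHASE_READY;
    else if (m->credited_bits >= FAST_INIT_BITS && m->phase == PHASE_EMPTY)
        m->phase = PHASE_FAST_INIT;
    return m->phase;
}
```

The point of the model is that the first 128 bits are both used and credited, so the second threshold is reached after 128 more bits rather than after a further 256.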
      
      Then, rather than trying to piecemeal inject into the base crng key at
      various points, instead just extract from the pool when we need it, for
      the crng_init==0 phase. Performance may even be better for the various
      inputs here, since there are likely more calls to mix_pool_bytes() than
      there are to get_random_bytes() during this phase of system execution.
      
      Since the preinit injection code is gone, bootloader randomness can then
      do something significantly more straightforward, removing the weird
      system_wq hack in hwgenerator randomness.
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: do not use batches when !crng_ready() · cbe89e5a
      Jason A. Donenfeld committed
      It's too hard to keep the batches synchronized, and pointless anyway,
      since in !crng_ready(), we're updating the base_crng key really often,
      where batching only hurts. So instead, if the crng isn't ready, just
      call into get_random_bytes(). At this stage nothing is performance
      critical anyhow.
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: mix in timestamps and reseed on system restore · b7b67d13
      Jason A. Donenfeld committed
      Since the RNG loses freshness with system suspend/hibernation, when we
      resume, immediately reseed using whatever data we can, which for this
      particular case is the various timestamps regarding system suspend time,
      in addition to more generally the RDSEED/RDRAND/RDTSC values that happen
      whenever the crng reseeds.
      
      On systems that suspend and resume automatically all the time -- such as
      Android -- we skip the reseeding on suspend resumption, since that could
      wind up being far too busy. This is the same trade-off made in
      WireGuard.
      
      In addition to reseeding upon resumption, always mix these various
      stamps into the pool on every power notification event.
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: vary jitter iterations based on cycle counter speed · 78c768e6
      Jason A. Donenfeld committed
      Currently, we do the jitter dance if two consecutive reads to the cycle
      counter return different values. If they do, then we consider the cycle
      counter to be fast enough that one trip through the scheduler will yield
      one "bit" of credited entropy. If those two reads return the same value,
      then we assume the cycle counter is too slow to show meaningful
      differences.
      
      This methodology is flawed for a variety of reasons, one of which Eric
      posted a patch to fix in [1]. The issue that patch solves is that on a
      system with a slow counter, you might be [un]lucky and read the counter
      _just_ before it changes, so that the second cycle counter you read
      differs from the first, even though there's usually quite a large period
      of time in between the two. For example:
      
      | real time | cycle counter |
      | --------- | ------------- |
      | 3         | 5             |
      | 4         | 5             |
      | 5         | 5             |
      | 6         | 5             |
      | 7         | 5             | <--- a
      | 8         | 6             | <--- b
      | 9         | 6             | <--- c
      
      If we read the counter at (a) and compare it to (b), we might be fooled
      into thinking that it's a fast counter, when in reality it is not. The
      solution in [1] is to also compare counter (b) to counter (c), on the
      theory that if the counter is _actually_ slow, and (a)!=(b), then
      certainly (b)==(c).
      
      This helps solve this particular issue, in one sense, but in another
      sense, it mostly functions to disallow jitter entropy on these systems,
      rather than simply taking more samples in that case.
      
      Instead, this patch takes a different approach. Right now we assume that
      a difference in one set of consecutive samples means one "bit" of
      credited entropy per scheduler trip. We can extend this so that a
      difference in two sets of consecutive samples means one "bit" of
      credited entropy per /two/ scheduler trips, and three for three, and
      four for four. In other words, we can increase the amount of jitter
      "work" we require for each "bit", depending on how slow the cycle
      counter is.
      
      So this patch takes a whole bunch of samples, sees how many of them are
      different, and divides to find the amount of work required per "bit",
      and also requires that at least some minimum of them are different in
      order to attempt any jitter entropy.
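
The "count the differences, then divide" idea can be sketched in userspace as follows. This is an illustration under assumptions, not the patch itself: the function names, the `min_different` parameter, and the use of a pre-recorded sample array instead of live cycle counter reads are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many consecutive pairs of counter samples differ. */
static size_t count_different(const unsigned long *samples, size_t n)
{
    size_t i, different = 0;

    assert(n >= 2);
    for (i = 1; i < n; i++)
        if (samples[i] != samples[i - 1])
            different++;
    return different;
}

/*
 * Scheduler trips required per credited "bit". Returns 0 when fewer than
 * `min_different` pairs differ, meaning the counter is considered too slow
 * to attempt jitter entropy at all.
 */
static size_t trips_per_bit(const unsigned long *samples, size_t n,
                            size_t min_different)
{
    size_t pairs = n - 1;
    size_t different = count_different(samples, n);

    if (different == 0 || different < min_different)
        return 0;
    /* Round up: the fewer pairs that differ, the more work per bit. */
    return (pairs + different - 1) / different;
}
```

With a counter that changes on every read this yields one trip per bit, matching the old behaviour; with the slow counter from the table above, where most consecutive reads are equal, the required work per bit grows accordingly.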
      
      Note that this approach is still far from perfect. It's not a real
      statistical estimate on how much these samples vary; it's not a
      real-time analysis of the relevant input data. That remains a project
      for another time. However, it makes the same (partly flawed) assumptions
      as the code that's there now, so it's probably not worse than the status
      quo, and it handles the issue Eric mentioned in [1]. But, again, it's
      probably a far cry from whatever a really robust version of this would
      be.
      
      [1] https://lore.kernel.org/lkml/20220421233152.58522-1-ebiggers@kernel.org/
          https://lore.kernel.org/lkml/20220421192939.250680-1-ebiggers@kernel.org/
      
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: insist on random_get_entropy() existing in order to simplify · 4b758eda
      Jason A. Donenfeld committed
      All platforms are now guaranteed to provide some value for
      random_get_entropy(). In case some bug leads to this not being so, we
      print a warning, because that indicates that something is really very
      wrong (and likely other things are impacted too). This should never be
      hit, but it's a good and cheap way of finding out if something ever is
      problematic.
      
      Since we now have viable fallback code for random_get_entropy() on all
      platforms, which is, in the worst case, not worse than jiffies, we can
      count on getting the best possible value out of it. That means there's
      no longer a use for using jiffies as entropy input. It also means we no
      longer have a reason for doing the round-robin register flow in the IRQ
      handler, which was always of fairly dubious value.
      
      Instead we can greatly simplify the IRQ handler inputs and also unify
      the construction between 64-bits and 32-bits. We now collect the cycle
      counter and the return address, since those are the two things that
      matter. Because the return address and the irq number are likely
      related, to the extent we mix in the irq number, we can just xor it into
      the top unchanging bytes of the return address, rather than the bottom
      changing bytes of the cycle counter as before. Then, we can do a fixed 2
      rounds of SipHash/HSipHash. Finally, we use the same construction of
      hashing only half of the [H]SipHash state on 32-bit and 64-bit. We're
      not actually discarding any entropy, since that entropy is carried
      through until the next time. And more importantly, it lets us do the
      same sponge-like construction everywhere.
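
A 64-bit userspace sketch of the mixing step might look like the following. The round function is the standard SipHash round; everything else is an assumption for illustration: the function names are hypothetical, applying one round per input word is one plausible reading of "a fixed 2 rounds", and shifting the irq number left by 32 merely stands in for "xor it into the top bytes of the return address". The half-state hashing step is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static inline uint64_t rol64(uint64_t x, unsigned r)
{
    return (x << r) | (x >> (64 - r));
}

/* One standard SipHash round over a 4-word state. */
static void sip_round(uint64_t s[4])
{
    s[0] += s[1]; s[1] = rol64(s[1], 13); s[1] ^= s[0]; s[0] = rol64(s[0], 32);
    s[2] += s[3]; s[3] = rol64(s[3], 16); s[3] ^= s[2];
    s[0] += s[3]; s[3] = rol64(s[3], 21); s[3] ^= s[0];
    s[2] += s[1]; s[1] = rol64(s[1], 17); s[1] ^= s[2]; s[2] = rol64(s[2], 32);
}

/* Absorb two input words, SipHash-style: xor in, permute, xor in again. */
static void fast_mix_sketch(uint64_t s[4], uint64_t v1, uint64_t v2)
{
    s[3] ^= v1; sip_round(s); s[0] ^= v1;
    s[3] ^= v2; sip_round(s); s[0] ^= v2;
}

/* Hypothetical helper: combine the inputs named above and mix them in. */
static void mix_irq_sample(uint64_t s[4], uint64_t cycles,
                           uint64_t ret_addr, uint32_t irq)
{
    assert(s != NULL);
    /* Illustrative placement of irq in the upper half of the address. */
    fast_mix_sketch(s, cycles, ret_addr ^ ((uint64_t)irq << 32));
}
```

Because the state is carried over between calls, words that are not extracted this time still influence later output, which is the sponge-like property the message refers to.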
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  5. 25 April, 2022 1 commit
  6. 16 April, 2022 1 commit
  7. 13 April, 2022 2 commits
    • random: make random_get_entropy() return an unsigned long · b0c3e796
      Jason A. Donenfeld committed
      Some implementations were returning type `unsigned long`, while others
      that fell back to get_cycles() were implicitly returning a `cycles_t` or
      an untyped constant int literal. That makes for weird and confusing
      code, and basically all code in the kernel already handled it like it
      was an `unsigned long`. I recently tried to handle it as the largest
      type it could be, a `cycles_t`, but doing so doesn't really help with
      much.
      
      Instead let's just make random_get_entropy() return an unsigned long all
      the time. This also matches the commonly used `arch_get_random_long()`
      function, so now RDRAND and RDTSC return the same sized integer, which
      means one can fallback to the other more gracefully.
      
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: allow partial reads if later user copies fail · 5209aed5
      Jason A. Donenfeld committed
      Rather than failing entirely if a copy_to_user() fails at some point,
      instead we should return a partial read for the amount that succeeded
      prior, unless none succeeded at all, in which case we return -EFAULT as
      before.
      
      This makes it consistent with other reader interfaces. For example, the
      following snippet for /dev/zero outputs "4" followed by "1":
      
        int fd;
        void *x = mmap(NULL, 4096, PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
        assert(x != MAP_FAILED);
        fd = open("/dev/zero", O_RDONLY);
        assert(fd >= 0);
        printf("%zd\n", read(fd, x, 4));
        printf("%zd\n", read(fd, x + 4095, 4));
        close(fd);
      
      This brings that same standard behavior to the various RNG reader
      interfaces.
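
The return-value rule can be modelled without touching a real device. In this sketch the function name and the `fault_at` parameter (the offset at which the destination stops being writable) are hypothetical; the rule itself, partial count if anything was copied and -EFAULT otherwise, is the one described above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Model of a chunked read into a user buffer that becomes unwritable at
 * byte offset `fault_at`. Returns the number of bytes that were copied
 * before the fault, or -EFAULT if nothing could be copied at all.
 */
static ssize_t read_model(size_t len, size_t chunk, size_t fault_at)
{
    size_t done = 0;

    assert(chunk > 0);
    if (len == 0)
        return 0;

    while (done < len) {
        size_t n = len - done < chunk ? len - done : chunk;
        size_t writable = fault_at > done ? fault_at - done : 0;
        size_t copied = n < writable ? n : writable;

        done += copied;
        if (copied < n)  /* the user copy faulted part-way through */
            break;
    }
    return done ? (ssize_t)done : -EFAULT;
}
```

Here `read_model(4, 32, 4096)` and `read_model(4, 32, 1)` mirror the two reads in the /dev/zero snippet: the first copies all 4 bytes, the second only the single byte that fits before the end of the mapping.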
      
      While we're at it, we can streamline the loop logic a little bit.
      
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jann Horn <jannh@google.com>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  8. 07 April, 2022 1 commit
    • random: check for signals every PAGE_SIZE chunk of /dev/[u]random · e3c1c4fd
      Jason A. Donenfeld committed
      In 1448769c ("random: check for signal_pending() outside of
      need_resched() check"), Jann pointed out that we previously were only
      checking the TIF_NOTIFY_SIGNAL and TIF_SIGPENDING flags if the process
      had TIF_NEED_RESCHED set, which meant in practice, super long reads to
      /dev/[u]random would delay signal handling by a long time. I tried this
      using the below program, and indeed I wasn't able to interrupt a
      /dev/urandom read until after several megabytes had been read. The bug
      he fixed has always been there, and so code that reads from /dev/urandom
      without checking the return value of read() has mostly worked for a long
      time, for most sizes, not just for <= 256.
      
      Maybe it makes sense to keep that code working. The reason it was so
      small prior, ignoring the fact that it didn't work anyway, was likely
      because /dev/random used to block, and that could happen for pretty
      large lengths of time while entropy was gathered. But now, it's just a
      chacha20 call, which is extremely fast and is just operating on pure
      data, without having to wait for some external event. In that sense,
      /dev/[u]random is a lot more like /dev/zero.
      
      Taking a page out of /dev/zero's read_zero() function, it always returns
      at least one chunk, and then checks for signals after each chunk. Chunk
      sizes there are of length PAGE_SIZE. Let's just copy the same thing for
      /dev/[u]random, and check for signals and cond_resched() for every
      PAGE_SIZE amount of data. This makes the behavior more consistent with
      expectations, and should mitigate the impact of Jann's fix for the
      age-old signal check bug.
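
The loop shape borrowed from read_zero() can be sketched like this. It is a userspace model under assumptions: `CHUNK_SIZE` stands in for PAGE_SIZE, the callback stands in for signal_pending(), and the names are hypothetical. Generating and copying the bytes is elided, and the cond_resched() call is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define CHUNK_SIZE 4096  /* stands in for PAGE_SIZE */

/* Hypothetical stand-in for signal_pending(). */
typedef bool (*signal_pending_fn)(void *ctx);

/* Example callback: reports a pending signal once *ctx earlier checks have passed. */
static bool pending_after(void *ctx)
{
    unsigned *remaining = ctx;

    if (*remaining == 0)
        return true;
    (*remaining)--;
    return false;
}

/* Returns how many bytes would be produced before a signal interrupts the read. */
static ssize_t chunked_read_model(size_t len, signal_pending_fn pending, void *ctx)
{
    size_t done = 0;

    assert(pending != NULL);
    while (done < len) {
        size_t n = len - done < CHUNK_SIZE ? len - done : CHUNK_SIZE;

        /* ...generate and copy n bytes here... */
        done += n;

        /* Check only after a chunk, so at least one chunk is always returned. */
        if (done < len && pending(ctx))
            break;
    }
    return (ssize_t)done;
}
```

Even with a signal already pending on entry, the model returns one full chunk, so any request of at most CHUNK_SIZE bytes completes in full.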
      
      ---- test program ----
      
        #include <unistd.h>
        #include <signal.h>
        #include <stdio.h>
        #include <sys/random.h>
      
        static unsigned char x[~0U];
      
        static void handle(int sig) { (void)sig; }
      
        int main(int argc, char *argv[])
        {
          pid_t pid = getpid(), child;
          signal(SIGUSR1, handle);
          if (!(child = fork())) {
            for (;;)
              kill(pid, SIGUSR1);
          }
          pause();
          printf("interrupted after reading %zd bytes\n", getrandom(x, sizeof(x), 0));
          kill(child, SIGTERM);
          return 0;
        }
      
      Cc: Jann Horn <jannh@google.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  9. 06 April, 2022 2 commits
    • random: check for signal_pending() outside of need_resched() check · 1448769c
      Jann Horn committed
      signal_pending() checks TIF_NOTIFY_SIGNAL and TIF_SIGPENDING, which
      signal that the task should bail out of the syscall when possible. This
      is a separate concept from need_resched(), which checks
      TIF_NEED_RESCHED, signaling that the task should preempt.
      
      In particular, with the current code, the signal_pending() bailout
      probably won't work reliably.
      
      Change this to look like other functions that read lots of data, such as
      read_zero().
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: do not allow user to keep crng key around on stack · aba120cc
      Jason A. Donenfeld committed
      The fast key erasure RNG design relies on the key that's used to be used
      and then discarded. We do this, making judicious use of
      memzero_explicit().  However, reads to /dev/urandom and calls to
      getrandom() involve a copy_to_user(), and userspace can use FUSE or
      userfaultfd, or make a massive call, dynamically remap memory addresses
      as it goes, and set the process priority to idle, in order to keep a
      kernel stack alive indefinitely. By probing
      /proc/sys/kernel/random/entropy_avail to learn when the crng key is
      refreshed, a malicious userspace could mount this attack every 5 minutes
      thereafter, breaking the crng's forward secrecy.
      
      In order to fix this, we just overwrite the stack's key with the first
      32 bytes of the "free" fast key erasure output. If we're returning <= 32
      bytes to the user, then we can still return those bytes directly, so
      that short reads don't become slower. And for long reads, the difference
      is hopefully lost in the amortization, so it doesn't change much, with
      that amortization helping variously for medium reads.
      
      We don't need to do this for get_random_bytes() and the various
      kernel-space callers, and later, if we ever switch to always batching,
      this won't be necessary either, so there's no need to change the API of
      these functions.
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jann Horn <jannh@google.com>
      Fixes: c92e040d ("random: add backtracking protection to the CRNG")
      Fixes: 186873c5 ("random: use simpler fast key erasure flow on per-cpu keys")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  10. 05 April, 2022 2 commits
    • random: opportunistically initialize on /dev/urandom reads · 48bff105
      Jason A. Donenfeld committed
      In 6f98a4bf ("random: block in /dev/urandom"), we tried to make a
      successful try_to_generate_entropy() call *required* if the RNG was not
      already initialized. Unfortunately, weird architectures and old
      userspaces combined in TCG test harnesses, making that change still not
      realistic, so it was reverted in 0313bc27 ("Revert "random: block in
      /dev/urandom"").
      
      However, rather than making a successful try_to_generate_entropy() call
      *required*, we can instead make it *best-effort*.
      
      If try_to_generate_entropy() fails, it fails, and nothing changes from
      the current behavior. If it succeeds, then /dev/urandom becomes safe to
      use for free. This way, we don't risk the regression potential that led
      to us reverting the required-try_to_generate_entropy() call before.
      
      Practically speaking, this means that at least on x86, /dev/urandom
      becomes safe. Probably other architectures with working cycle counters
      will also become safe. And architectures with slow or broken cycle
      counters at least won't be affected at all by this change.
      
      So it may not be the glorious "all things are unified!" change we were
      hoping for initially, but practically speaking, it makes a positive
      impact.
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: do not split fast init input in add_hwgenerator_randomness() · 527a9867
      Jan Varho committed
      add_hwgenerator_randomness() tries to only use the required amount of input
      for fast init, but credits all the entropy, rather than a fraction of
      it. Since it's hard to determine how much entropy is left over out of a
      non-uniformly random sample, either give it all to fast init or credit
      it, but don't attempt to do both. In the process, we can clean up the
      injection code to no longer need to return a value.
      
      Signed-off-by: Jan Varho <jan.varho@gmail.com>
      [Jason: expanded commit message]
      Fixes: 73c7733f ("random: do not throw away excess input to crng_fast_load")
      Cc: stable@vger.kernel.org # 5.17+, requires af704c85
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  11. 01 April, 2022 1 commit
    • random: mix build-time latent entropy into pool at init · 1754abb3
      Jason A. Donenfeld committed
      Prior, the "input_pool_data" array needed no real initialization, and so
      it was easy to mark it with __latent_entropy to populate it during
      compile-time. In switching to using a hash function, this required us to
      specifically initialize it to some specific state, which means we
      dropped the __latent_entropy attribute. An unfortunate side effect was
      this meant the pool was no longer seeded using compile-time random data.
      In order to bring this back, we declare an array in rand_initialize()
      with __latent_entropy and call mix_pool_bytes() on that at init, which
      accomplishes the same thing as before. We make this __initconst, so that
      it doesn't take up space at runtime after init.
      
      Fixes: 6e8ec255 ("random: use computational hash for entropy extraction")
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Reviewed-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  12. 25 March, 2022 3 commits
  13. 23 March, 2022 1 commit
  14. 13 March, 2022 11 commits
    • random: check for signal and try earlier when generating entropy · 3e504d20
      Jason A. Donenfeld committed
      Rather than waiting a full second in an interruptible waiter before
      trying to generate entropy, try to generate entropy first and wait
      second. While waiting one second might give an extra second for getting
      entropy from elsewhere, we're already pretty late in the init process
      here, and whatever else is generating entropy will still continue to
      contribute. This has implications on signal handling: we call
      try_to_generate_entropy() from wait_for_random_bytes(), and
      wait_for_random_bytes() always uses wait_event_interruptible_timeout()
      when waiting, since it's called by userspace code in restartable
      contexts, where signals can pend. Since try_to_generate_entropy() now
      runs first, if a signal is pending, it's necessary for
      try_to_generate_entropy() to check for signals, since it won't hit the
      wait until after try_to_generate_entropy() has returned. And even before
      this change, when entering a busy loop in try_to_generate_entropy(), we
      should have been checking to see if any signals are pending, so that a
      process doesn't get stuck in that loop longer than expected.
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: reseed more often immediately after booting · 7a7ff644
      Jason A. Donenfeld committed
      In order to chip away at the "premature first" problem, we augment our
      existing entropy accounting with more frequent reseedings at boot.
      
      The idea is that at boot, we're getting entropy from various places, and
      we're not very sure which of early boot entropy is good and which isn't.
      Even when we're crediting the entropy, we're still not totally certain
      that it's any good. Since boot is the one time (aside from a compromise)
      that we have zero entropy, it's important that we shepherd entropy into
      the crng fairly often.
      
      At the same time, we don't want a "premature next" problem, whereby an
      attacker can brute force individual bits of added entropy. In lieu of
      going full-on Fortuna (for now), we can pick a simpler strategy of just
      reseeding more often during the first 5 minutes after boot. This is
      still bounded by the 256-bit entropy credit requirement, so we'll skip a
      reseeding if we haven't reached that, but in case entropy /is/ coming
      in, this ensures that it makes its way into the crng rather rapidly
      during these early stages.
      
      Ordinarily we reseed if the previous reseeding is 300 seconds old. This
      commit changes things so that for the first 600 seconds of boot time, we
      reseed if the previous reseeding is uptime / 2 seconds old. That means
      that we'll reseed at the very least double the uptime of the previous
      reseeding.
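
The interval rule can be written out as a small model. This is a sketch working in whole seconds with hypothetical function names; it follows only the numbers given above (300 seconds ordinarily, uptime / 2 during the first 600 seconds) and leaves out any other bounds the real code may apply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RESEED_INTERVAL_S 300  /* ordinary reseed interval, in seconds */

/* How old the previous reseeding must be before a new one is due. */
static uint64_t reseed_interval_s(uint64_t uptime_s)
{
    if (uptime_s < 2 * RESEED_INTERVAL_S)  /* first 600 seconds of boot */
        return uptime_s / 2;
    return RESEED_INTERVAL_S;
}

/* True if the seed installed at `last_reseed_s` counts as old at `uptime_s`. */
static bool seed_is_old(uint64_t uptime_s, uint64_t last_reseed_s)
{
    assert(last_reseed_s <= uptime_s);
    return uptime_s - last_reseed_s >= reseed_interval_s(uptime_s);
}
```

In this model a seed installed at 10 seconds of uptime becomes old at 20 seconds, one installed at 20 seconds becomes old at 40, and so on, which is the doubling behaviour the last sentence above describes.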
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: make consistent usage of crng_ready() · a96cfe2d
      Jason A. Donenfeld committed
      Rather than sometimes checking `crng_init < 2`, we should always use the
      crng_ready() macro, so that should we change anything later, it's
      consistent. Additionally, that macro already has a likely() around it,
      which means we don't need to open code our own likely() and unlikely()
      annotations.
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: use SipHash as interrupt entropy accumulator · f5eab0e2
      Jason A. Donenfeld committed
      The current fast_mix() function is a piece of classic mailing list
      crypto, where it just sort of sprung up by an anonymous author without a
      lot of real analysis of what precisely it was accomplishing. As an ARX
      permutation alone, there are some easily searchable differential trails
      in it, and as a means of preventing malicious interrupts, it completely
      fails, since it xors new data into the entire state every time. It can't
      really be analyzed as a random permutation, because it clearly isn't,
      and it can't be analyzed as an interesting linear algebraic structure
      either, because it's also not that. There really is very little one can
      say about it in terms of entropy accumulation. It might diffuse bits,
      some of the time, maybe, we hope, I guess. But for the most part, it
      fails to accomplish anything concrete.
      
      As a reminder, the goal of add_interrupt_randomness() is simply to
      accumulate entropy until ~64 interrupts have elapsed, and then
      dump it into the main input pool, which uses a cryptographic hash.
      
      It would be nice to have something cryptographically strong in the
      interrupt handler itself, in case a malicious interrupt compromises a
      per-cpu fast pool within the 64 interrupts / 1 second window, and then
      inside of that same window somehow can control its return address and
      cycle counter, even if that's a bit far fetched. However, with a very
      CPU-limited budget, actually doing that remains an active research
      project (and perhaps there'll be something useful for Linux to come out
      of it). And while the abundance of caution would be nice, this isn't
      *currently* the security model, and we don't yet have a fast enough
      solution to make it our security model. Plus there's not exactly a
      pressing need to do that. (And for the avoidance of doubt, the actual
      cluster of 64 accumulated interrupts still gets dumped into our
      cryptographically secure input pool.)
      
      So, for now we are going to stick with the existing interrupt security
      model, which assumes that each cluster of 64 interrupt data samples is
      mostly non-malicious and not colluding with an infoleaker. With this as
      our goal, we have a few more choices, simply aiming to accumulate
      entropy, while discarding the least amount of it.
      
      We know from <https://eprint.iacr.org/2019/198> that random oracles,
      instantiated as computational hash functions, make good entropy
      accumulators and extractors, which is the justification for using
      BLAKE2s in the main input pool. As mentioned, we don't have that luxury
      here, but we also don't have the same security model requirements,
      because we're assuming that there aren't malicious inputs. A
      pseudorandom function instance can approximately behave like a random
      oracle, provided that the key is uniformly random. But since we're not
      concerned with malicious inputs, we can pick a fixed key, which is not
      secret, knowing that "nature" won't interact with a sufficiently chosen
      fixed key by accident. So we pick a PRF with a fixed initial key, and
      accumulate into it continuously, dumping the result every 64 interrupts
      into our cryptographically secure input pool.
      
      For this, we make use of SipHash-1-x on 64-bit and HalfSipHash-1-x on
      32-bit, which are already in use in the kernel's hsiphash family of
      functions and achieve the same performance as the function they replace.
      It would be nice to do two rounds, but we don't exactly have the CPU
      budget handy for that, and one round alone is already sufficient.
      
      As mentioned, we start with a fixed initial key (zeros is fine), and
      allow SipHash's symmetry breaking constants to turn that into a useful
      starting point. Also, since we're dumping the result (or half of it on
      64-bit so as to tax our hash function the same amount on all platforms)
      into the cryptographically secure input pool, there's no point in
      finalizing SipHash's output, since it'll wind up being finalized by
      something much stronger. This means that all we need to do is use the
      ordinary round function word-by-word, as normal SipHash does.
      Simplified, the flow is as follows:
      
      Initialize:
      
          siphash_state_t state;
          siphash_init(&state, key={0, 0, 0, 0});
      
      Update (accumulate) on interrupt:
      
          siphash_update(&state, interrupt_data_and_timing);
      
      Dump into input pool after 64 interrupts:
      
          blake2s_update(&input_pool, &state, sizeof(state) / 2);
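
      For readers who want to see the round function spelled out, here is a
      self-contained userspace sketch of the 64-bit flow. It is not the
      kernel implementation: the irq_acc names are hypothetical, each update
      absorbs a single 64-bit word, and the dump simply copies the first half
      of the raw state instead of calling into BLAKE2s.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accumulator type: the four 64-bit SipHash state words. */
struct irq_acc {
    uint64_t v[4];
};

static uint64_t rotl64(uint64_t x, unsigned int r)
{
    return (x << r) | (x >> (64 - r)); /* callers pass 0 < r < 64 */
}

/* One SipRound, the ARX permutation at the core of SipHash. */
static void sipround(uint64_t v[4])
{
    v[0] += v[1]; v[1] = rotl64(v[1], 13); v[1] ^= v[0]; v[0] = rotl64(v[0], 32);
    v[2] += v[3]; v[3] = rotl64(v[3], 16); v[3] ^= v[2];
    v[0] += v[3]; v[3] = rotl64(v[3], 21); v[3] ^= v[0];
    v[2] += v[1]; v[1] = rotl64(v[1], 17); v[1] ^= v[2]; v[2] = rotl64(v[2], 32);
}

/* Fixed all-zero key: only SipHash's symmetry-breaking constants remain. */
void irq_acc_init(struct irq_acc *a)
{
    a->v[0] = 0x736f6d6570736575ULL;
    a->v[1] = 0x646f72616e646f6dULL;
    a->v[2] = 0x6c7967656e657261ULL;
    a->v[3] = 0x7465646279746573ULL;
}

/* Absorb one word the way SipHash absorbs a message word, with one round. */
void irq_acc_update(struct irq_acc *a, uint64_t word)
{
    a->v[3] ^= word;
    sipround(a->v);
    a->v[0] ^= word;
}

/* No finalization: hand half of the raw state to the stronger hash. */
void irq_acc_dump(const struct irq_acc *a, uint64_t out[2])
{
    assert(a != 0 && out != 0);
    out[0] = a->v[0];
    out[1] = a->v[1];
}
```

      A caller would run irq_acc_init() once, irq_acc_update() per interrupt
      sample, and irq_acc_dump() after 64 updates, feeding the two output
      words to the input pool's hash.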
      
      The result of all of this is that the security model is unchanged from
      before -- we assume non-malicious inputs -- yet we now implement that
      model with a stronger argument. I would like to emphasize, again, that
      the purpose of this commit is to improve the existing design, by making
      it analyzable, without changing any fundamental assumptions. There may
      well be value down the road in changing up the existing design, using
      something cryptographically strong, or simply using a ring buffer of
      samples rather than having a fast_mix() at all, or changing which and
      how much data we collect each interrupt so that we can use something
      linear, or a variety of other ideas. This commit does not invalidate the
      potential for those in the future.
      
      For example, in the future, if we're able to characterize the data we're
      collecting on each interrupt, we may be able to inch toward information
      theoretic accumulators. <https://eprint.iacr.org/2021/523> shows that `s
      = ror32(s, 7) ^ x` and `s = ror64(s, 19) ^ x` make very good
      accumulators for 2-monotone distributions, which would apply to
      timestamp counters, like random_get_entropy() or jiffies, but would not
      apply to our current combination of the two values, or to the various
      function addresses and register values we mix in. Alternatively,
      <https://eprint.iacr.org/2021/1002> shows that max-period linear
      functions with no non-trivial invariant subspace make good extractors,
      used in the form `s = f(s) ^ x`. However, this only works if the input
      data is independent and identically distributed, and obviously a
      collection of address values and counters fails; so it goes with
      theoretical papers.
      Future directions here may involve trying to characterize more precisely
      what we actually need to collect in the interrupt handler, and building
      something specific around that.
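
      To make the rotate-and-xor accumulators quoted above concrete, here is
      a small userspace sketch. The rotation amounts 7 and 19 come from the
      text; the function names are hypothetical, and nothing here claims
      these accumulators suit the data the interrupt handler collects today.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right, written so that no shift count reaches the word width. */
static uint32_t ror32(uint32_t x, unsigned int r)
{
    r &= 31;
    return (x >> r) | (x << ((32 - r) & 31));
}

static uint64_t ror64(uint64_t x, unsigned int r)
{
    r &= 63;
    return (x >> r) | (x << ((64 - r) & 63));
}

/* s = ror32(s, 7) ^ x */
uint32_t acc32_update(uint32_t s, uint32_t x)
{
    return ror32(s, 7) ^ x;
}

/* s = ror64(s, 19) ^ x */
uint64_t acc64_update(uint64_t s, uint64_t x)
{
    return ror64(s, 19) ^ x;
}

/* Fold n samples into a 32-bit accumulator that starts at zero. */
uint32_t acc32_fold(const uint32_t *samples, unsigned int n)
{
    uint32_t s = 0;
    unsigned int i;

    assert(samples != 0 || n == 0);
    for (i = 0; i < n; i++)
        s = acc32_update(s, samples[i]);
    return s;
}
```

      Each step is linear and costs one rotate and one xor, which is why
      these constructions are attractive when the input distribution can be
      characterized.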
      
      However, as mentioned, the morass of data we're gathering at the
      interrupt handler presently defies characterization, and so we use
      SipHash for now, which works well and performs well.
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      f5eab0e2
    • J
      random: provide notifier for VM fork · f3c2682b
      Jason A. Donenfeld committed
      Drivers such as WireGuard need to learn when VMs fork in order to clear
      sessions. This commit provides a simple notifier_block for that, with a
      register and unregister function. When no VM fork detection is compiled
      in, this turns into a no-op, similar to how the power notifier works.
      
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      f3c2682b
    • J
      random: replace custom notifier chain with standard one · 5acd3548
      Jason A. Donenfeld committed
      We previously rolled our own randomness readiness notifier, which only
      has two users in the whole kernel. Replace this with a more standard
      atomic notifier block that serves the same purpose with less code. Also
      unexport the symbols, because no modules use it, only unconditional
      builtins. The only drawback is that it's possible for a notification
      handler returning the "stop" code to prevent further processing, but
      given that there are only two users, and that we're unexporting this
      anyway, that doesn't seem like a significant drawback for the
      simplification we receive here.
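
      The "stop" drawback can be illustrated with a toy userspace notifier
      chain. Everything below is a hypothetical model with demo_ names, not
      the kernel's notifier API; it only shows how one handler returning a
      stop code keeps later handlers from running.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes modelling "continue" and "stop". */
enum demo_notify { DEMO_NOTIFY_OK, DEMO_NOTIFY_STOP };

struct demo_notifier {
    enum demo_notify (*call)(struct demo_notifier *nb, void *data);
    struct demo_notifier *next;
    int calls; /* how often this handler ran, for illustration */
};

struct demo_chain {
    struct demo_notifier *head;
};

/* Append a handler to the end of the chain. */
void demo_chain_register(struct demo_chain *c, struct demo_notifier *nb)
{
    struct demo_notifier **p = &c->head;

    assert(nb != NULL && nb->call != NULL);
    while (*p)
        p = &(*p)->next;
    nb->next = NULL;
    *p = nb;
}

/* Walk the chain; a handler returning STOP ends the walk early. */
void demo_chain_call(struct demo_chain *c, void *data)
{
    struct demo_notifier *nb;

    for (nb = c->head; nb; nb = nb->next) {
        nb->calls++;
        if (nb->call(nb, data) == DEMO_NOTIFY_STOP)
            break;
    }
}

enum demo_notify demo_handler_ok(struct demo_notifier *nb, void *data)
{
    (void)nb;
    (void)data;
    return DEMO_NOTIFY_OK;
}

enum demo_notify demo_handler_stop(struct demo_notifier *nb, void *data)
{
    (void)nb;
    (void)data;
    return DEMO_NOTIFY_STOP;
}
```

      With only two in-tree users, it is easy to audit that neither handler
      returns the stop code, which is the argument the text makes.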
      
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      5acd3548
    • J
      random: do not export add_vmfork_randomness() unless needed · a4107d34
      Jason A. Donenfeld committed
      Since add_vmfork_randomness() is only called from vmgenid.o, we can
      guard it in CONFIG_VMGENID, similarly to how we do with
      add_disk_randomness() and CONFIG_BLOCK. If we ever have multiple things
      calling into add_vmfork_randomness(), we can add another shared Kconfig
      symbol for that, but for now, this is good enough. Even though
      add_vmfork_randomness() is a pretty small function, removing it means
      that there are only calls to crng_reseed(false) and none to
      crng_reseed(true), which means the compiler can constant propagate the
      false, removing branches from crng_reseed() and its descendants.
      
      Additionally, we don't even need the symbol to be exported if
      CONFIG_VMGENID is not a module, so conditionalize that too.
      
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      a4107d34
    • J
      random: add mechanism for VM forks to reinitialize crng · ae099e8e
      Jason A. Donenfeld committed
      When a VM forks, we must immediately mix in additional information to
      the stream of random output so that two forks or a rollback don't
      produce the same stream of random numbers, which could have catastrophic
      cryptographic consequences. This commit adds a simple API,
      add_vmfork_randomness(), for that, by force reseeding the crng.
      
      This has the added benefit of also draining the entropy pool and setting
      its timer back, so that any old entropy that was there prior -- which
      could have already been used by a different fork, or generally gone
      stale -- does not contribute to the accounting of the next 256 bits.
      
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Jann Horn <jannh@google.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      ae099e8e
    • J
      random: don't let 644 read-only sysctls be written to · 77553cf8
      Jason A. Donenfeld committed
      We leave around these old sysctls for compatibility, and we keep them
      "writable" for compatibility, but even after writing, we should keep
      reporting the same value. This is consistent with how userspaces tend to
      use sysctl_random_write_wakeup_bits, writing to it, and then later
      reading from it and using the value.
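
      The behaviour described above, accepting writes but always reporting
      the same value, can be modelled in a few lines of userspace C. The
      struct and function names are hypothetical and this is not the kernel's
      sysctl handler; it only shows the accept-and-discard idea.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a legacy tunable whose value no longer matters. */
struct legacy_sysctl {
    int value; /* the fixed value that reads always report */
};

/*
 * Handles one access. Writes succeed (returning 0) so that old userspace
 * does not see an error, but they leave the stored value untouched. Reads
 * copy the fixed value into *buf.
 */
int legacy_sysctl_access(struct legacy_sysctl *s, bool write, int *buf)
{
    assert(s != 0 && buf != 0);
    if (write)
        return 0; /* accept and discard */
    *buf = s->value;
    return 0;
}
```

      A program that writes 1024 and then reads the tunable back still sees
      the original value, with no error reported by either call.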
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      77553cf8
    • J
      random: give sysctl_random_min_urandom_seed a more sensible value · d0efdf35
      Jason A. Donenfeld committed
      This isn't used by anything or anywhere, but we can't delete it due to
      compatibility. So at least give it the correct value of what it's
      supposed to be instead of a garbage one.
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      d0efdf35
    • J
      random: block in /dev/urandom · 6f98a4bf
      Jason A. Donenfeld committed
      This topic has come up countless times, and usually doesn't go anywhere.
      This time I thought I'd bring it up with a slightly narrower focus,
      updated for some developments over the last three years: we finally can
      make /dev/urandom always secure, in light of the fact that our RNG is
      now always seeded.
      
      Ever since Linus' 50ee7529 ("random: try to actively add entropy
      rather than passively wait for it"), the RNG does a haveged-style jitter
      dance around the scheduler, in order to produce entropy (and credit it)
      for the case when we're stuck in wait_for_random_bytes(). However you
      feel about the Linus Jitter Dance is beside the point: it's been there
      for three years and usually gets the RNG initialized in a second or so.
      
      As a matter of fact, this is what happens currently when people use
      getrandom(). It's already there and working, and most people have been
      using it for years without realizing.
      
      So, given that the kernel has grown this mechanism for seeding itself
      from nothing, and that this procedure happens pretty fast, maybe there's
      no point any longer in having /dev/urandom give insecure bytes. In the
      past we didn't want the boot process to deadlock, which was
      understandable. But now, in the worst case, a second goes by, and the
      problem is resolved. It seems like maybe we're finally at a point when
      we can get rid of the infamous "urandom read hole".
      
      The one slight drawback is that the Linus Jitter Dance relies on
      random_get_entropy() being implemented. The first lines of
      try_to_generate_entropy() are:
      
      	stack.now = random_get_entropy();
      	if (stack.now == random_get_entropy())
      		return;
      
      On most platforms, random_get_entropy() is simply aliased to get_cycles().
      The number of machines without a cycle counter or some other
      implementation of random_get_entropy() in 2022, which can also run a
      mainline kernel, and at the same time have a userspace that is both
      broken and out of date and relies on /dev/urandom never blocking at
      boot, is thought
      to be exceedingly low. And to be clear: those museum pieces without
      cycle counters will continue to run Linux just fine, and even
      /dev/urandom will be operable just like before; the RNG just needs to be
      seeded first through the usual means, which should already be the case
      now.
      
      On systems that really do want unseeded randomness, we already offer
      getrandom(GRND_INSECURE), which is in use by, e.g., systemd for seeding
      their hash tables at boot. Nothing in this commit would affect
      GRND_INSECURE, and it remains the means of getting those types of random
      numbers.
      
      This patch goes a long way toward eliminating a long overdue userspace
      crypto footgun. After several decades of endless user confusion, we will
      finally be able to say, "use any single one of our random interfaces and
      you'll be fine. They're all the same. It doesn't matter." And that, I
      think, is really something. Finally all of those blog posts and
      disagreeing forums and contradictory articles will all become correct
      about whatever they happened to recommend, and along with it, a whole
      class of vulnerabilities eliminated.
      
      With very minimal downside, we're finally in a position where we can
      make this change.
      
      Cc: Dinh Nguyen <dinguyen@kernel.org>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Joshua Kinard <kumba@gentoo.org>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Lennart Poettering <mzxreary@0pointer.de>
      Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      6f98a4bf
  15. 28 Feb, 2022 3 commits
    • J
      random: do crng pre-init loading in worker rather than irq · c2a7de4f
      Jason A. Donenfeld committed
      Taking spinlocks from IRQ context is generally problematic for
      PREEMPT_RT. That is, in part, why we take trylocks instead. However, a
      spin_trylock() is also problematic since another spin_lock() invocation
      can potentially PI-boost the wrong task, as the spin_trylock() is
      invoked from IRQ context, so the task on the CPU (a random task or
      idle) is not the actual owner.
      
      Additionally, by deferring the crng pre-init loading to the worker, we
      can use the cryptographic hash function rather than xor, which is
      perhaps a meaningful difference when considering this data has only been
      through the relatively weak fast_mix() function.
      
      The biggest downside of this approach is that the pre-init loading is
      now deferred until later, which means things that need random numbers
      after interrupts are enabled, but before workqueues are running -- or
      before this particular worker manages to run -- are going to get into
      trouble. Hopefully in the real world, this window is rather small,
      especially since this code won't run until 64 interrupts have occurred.
      
      Cc: Sultan Alsawaf <sultan@kerneltoast.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      c2a7de4f
    • J
      random: unify cycles_t and jiffies usage and types · abded93e
      Jason A. Donenfeld committed
      random_get_entropy() returns a cycles_t rather than an unsigned long,
      and cycles_t is sometimes 64 bits on various 32-bit platforms, including
      x86.
      Conversely, jiffies is always unsigned long. This commit fixes things to
      use cycles_t for fields that use random_get_entropy(), named "cycles",
      and unsigned long for fields that use jiffies, named "now". It's also
      good to mix in a cycles_t and a jiffies in the same way for both
      add_device_randomness and add_timer_randomness, rather than using xor in
      one case. Finally, we unify the order of these volatile reads, always
      reading the more precise cycles counter, and then jiffies, so that the
      cycle counter is as close to the event as possible.
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      abded93e
    • J
      random: cleanup UUID handling · 64276a99
      Jason A. Donenfeld committed
      Rather than hard coding various lengths, we can use the right constants.
      Strings should be `char *` while buffers should be `u8 *`. Rather than
      have a nonsensical and unused maxlength, just remove it. Finally, use
      snprintf instead of sprintf, just out of good hygiene.
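
      As a rough userspace illustration of the same hygiene, the sketch below
      formats a binary UUID with snprintf() and named length constants. The
      uuid_to_string() helper is hypothetical and formats the bytes by hand;
      it is not how the kernel prints UUIDs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define UUID_SIZE       16 /* bytes in a binary UUID */
#define UUID_STRING_LEN 36 /* 32 hex digits plus 4 dashes, no terminator */

/*
 * Formats a binary UUID as text. Returns 0 on success, or -1 if out is too
 * small to hold the string and its terminating NUL.
 */
int uuid_to_string(const uint8_t uuid[UUID_SIZE], char *out, size_t outlen)
{
    int n;

    assert(uuid != NULL && out != NULL);
    /* snprintf never writes past outlen, unlike sprintf. */
    n = snprintf(out, outlen,
                 "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
                 "%02x%02x%02x%02x%02x%02x",
                 uuid[0], uuid[1], uuid[2], uuid[3],
                 uuid[4], uuid[5], uuid[6], uuid[7],
                 uuid[8], uuid[9], uuid[10], uuid[11],
                 uuid[12], uuid[13], uuid[14], uuid[15]);
    return (n == UUID_STRING_LEN && (size_t)n < outlen) ? 0 : -1;
}
```

      A caller sizes its buffer as UUID_STRING_LEN + 1 rather than a bare
      37, and an undersized buffer produces an error instead of an overflow.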
      
      As well, remove the old comment about returning a binary UUID via the
      binary sysctl syscall. That syscall was removed from the kernel in 5.5,
      and actually, the "uuid_strategy" function and related infrastructure
      for even serving it via the binary sysctl syscall was removed with
      894d2491 ("sysctl drivers: Remove dead binary sysctl support") back
      in 2.6.33.
      
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      64276a99
  16. 24 Feb, 2022 2 commits
  17. 22 Feb, 2022 2 commits