1. 10 Jun, 2022 5 commits
    • random: remove rng_has_arch_random() · e052a478
      Jason A. Donenfeld committed
      With arch randomness being used by every distro and enabled in
      defconfigs, the distinction between rng_has_arch_random() and
      rng_is_initialized() is now rather small. In fact, the places where they
      differ are now places where paranoid users and system builders really
      don't want arch randomness to be used, in which case we should respect
      that choice, or places where arch randomness is known to be broken, in
      which case that choice is all the more important. So this commit just
      removes the function and its one user.
      
      Reviewed-by: Petr Mladek <pmladek@suse.com> # for vsprintf.c
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: do not use jump labels before they are initialized · 60e5b288
      Jason A. Donenfeld committed
      Stephen reported that a static key warning splat appears during early
      boot on systems that credit randomness from device trees that contain an
      "rng-seed" property, because setup_machine_fdt() is called
      before jump_label_init() during setup_arch():
      
       static_key_enable_cpuslocked(): static key '0xffffffe51c6fcfc0' used before call to jump_label_init()
       WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:166 static_key_enable_cpuslocked+0xb0/0xb8
       Modules linked in:
       CPU: 0 PID: 0 Comm: swapper Not tainted 5.18.0+ #224 44b43e377bfc84bc99bb5ab885ff694984ee09ff
       pstate: 600001c9 (nZCv dAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
       pc : static_key_enable_cpuslocked+0xb0/0xb8
       lr : static_key_enable_cpuslocked+0xb0/0xb8
       sp : ffffffe51c393cf0
       x29: ffffffe51c393cf0 x28: 000000008185054c x27: 00000000f1042f10
       x26: 0000000000000000 x25: 00000000f10302b2 x24: 0000002513200000
       x23: 0000002513200000 x22: ffffffe51c1c9000 x21: fffffffdfdc00000
       x20: ffffffe51c2f0831 x19: ffffffe51c6fcfc0 x18: 00000000ffff1020
       x17: 00000000e1e2ac90 x16: 00000000000000e0 x15: ffffffe51b710708
       x14: 0000000000000066 x13: 0000000000000018 x12: 0000000000000000
       x11: 0000000000000000 x10: 00000000ffffffff x9 : 0000000000000000
       x8 : 0000000000000000 x7 : 61632065726f6665 x6 : 6220646573752027
       x5 : ffffffe51c641d25 x4 : ffffffe51c13142c x3 : ffff0a00ffffff05
       x2 : 40000000ffffe003 x1 : 00000000000001c0 x0 : 0000000000000065
       Call trace:
        static_key_enable_cpuslocked+0xb0/0xb8
        static_key_enable+0x2c/0x40
        crng_set_ready+0x24/0x30
        execute_in_process_context+0x80/0x90
        _credit_init_bits+0x100/0x154
        add_bootloader_randomness+0x64/0x78
        early_init_dt_scan_chosen+0x140/0x184
        early_init_dt_scan_nodes+0x28/0x4c
        early_init_dt_scan+0x40/0x44
        setup_machine_fdt+0x7c/0x120
        setup_arch+0x74/0x1d8
        start_kernel+0x84/0x44c
        __primary_switched+0xc0/0xc8
       ---[ end trace 0000000000000000 ]---
       random: crng init done
       Machine model: Google Lazor (rev1 - 2) with LTE
      
      A trivial fix went in to address this on arm64, 73e2d827 ("arm64:
      Initialize jump labels before setup_machine_fdt()"). I wrote patches as
      well for arm32 and risc-v. But still patches are needed on xtensa,
      powerpc, arc, and mips. So that's 7 platforms where things aren't quite
      right. This sort of points to larger issues that might need a larger
      solution.
      
      Instead, this commit just defers setting the static branch until later
      in the boot process. random_init() is called after jump_label_init() has
      been called, and so is always a safe place from which to adjust the
      static branch.
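
      The deferral described above can be sketched in userspace as follows.
      This is only an illustration under stated assumptions: every name here
      (fake_jump_label_init(), fake_random_init(), credit_from_bootloader(),
      crng_ready_key, and so on) is a hypothetical stand-in, not the kernel's
      actual symbol, and the "static branch" is modelled as a plain bool.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; none of these are real symbols. */
enum crng_state { CRNG_EMPTY, CRNG_EARLY, CRNG_READY };

static bool jump_labels_initialized;  /* set by the stand-in for jump_label_init() */
static bool crng_ready_key;           /* stand-in for the static branch */
static enum crng_state crng_init = CRNG_EMPTY;
static int early_key_writes;          /* counts key flips attempted too early */

static void set_ready_key(void)
{
	if (!jump_labels_initialized)
		early_key_writes++;   /* the situation that produced the warning */
	crng_ready_key = true;
}

/* Called from very early boot, e.g. while parsing an "rng-seed" property. */
static void credit_from_bootloader(void)
{
	crng_init = CRNG_READY;
	/* Deferred variant: only touch the key once jump labels exist. */
	if (jump_labels_initialized)
		set_ready_key();
}

static void fake_jump_label_init(void)
{
	jump_labels_initialized = true;
}

/* Runs after jump labels are set up, so flipping the key here is safe. */
static void fake_random_init(void)
{
	if (!crng_ready_key && crng_init >= CRNG_READY)
		set_ready_key();
}
```

      Calling credit_from_bootloader(), then fake_jump_label_init(), then
      fake_random_init() leaves the key enabled without it ever having been
      touched before the jump label stand-in ran.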
      
      Fixes: f5bda35f ("random: use static branch for crng_ready()")
      Reported-by: Stephen Boyd <swboyd@chromium.org>
      Reported-by: Phil Elwell <phil@raspberrypi.com>
      Tested-by: Phil Elwell <phil@raspberrypi.com>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: account for arch randomness in bits · 77fc95f8
      Jason A. Donenfeld committed
      Rather than accounting in bytes and multiplying (shifting), we can just
      account in bits and avoid the shift. The main motivation for this is
      there are other patches in flux that expand this code a bit, and
      avoiding the duplication of "* 8" everywhere makes things a bit clearer.
      
      Cc: stable@vger.kernel.org
      Fixes: 12e45a2a ("random: credit architectural init the exact amount")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: mark bootloader randomness code as __init · 39e0f991
      Jason A. Donenfeld committed
      add_bootloader_randomness() and the variables it touches are only used
      during __init and not after, so mark these as __init. At the same time,
      unexport this, since it's only called by other __init code that's
      built-in.
      
      Cc: stable@vger.kernel.org
      Fixes: 428826f5 ("fdt: add support for rng-seed")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: avoid checking crng_ready() twice in random_init() · 9b29b6b2
      Jason A. Donenfeld committed
      The current flow expands to:
      
          if (crng_ready())
              ...
          else if (...)
              if (!crng_ready())
                  ...
      
      The second crng_ready() call is redundant, but can't so easily be
      optimized out by the compiler.
      
      This commit simplifies that to:
      
          if (crng_ready())
              ...
          else if (...)
              ...
      
      Fixes: 560181c2 ("random: move initialization functions out of hot pages")
      Cc: stable@vger.kernel.org
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  2. 23 May, 2022 1 commit
    • random: check for signals after page of pool writes · 1ce6c8d6
      Jason A. Donenfeld committed
      get_random_bytes_user() checks for signals after producing a PAGE_SIZE
      worth of output, just like /dev/zero does. write_pool() is doing
      basically the same work (actually, slightly more expensive), and so
      should stop to check for signals in the same way. Let's also name it
      write_pool_user() to match get_random_bytes_user(), so this won't be
      misused in the future.
      
      Before this patch, massive writes to /dev/urandom would tie up the
      process for an extremely long time and make it unterminatable. After, it
      can be successfully interrupted. The following test program can be used
      to see this works as intended:
      
        #include <unistd.h>
        #include <fcntl.h>
        #include <signal.h>
        #include <stdio.h>
      
        static unsigned char x[~0U];
      
        static void handle(int sig) { (void)sig; }
      
        int main(int argc, char *argv[])
        {
          pid_t pid = getpid(), child;
          int fd;
          signal(SIGUSR1, handle);
          if (!(child = fork())) {
            for (;;)
              kill(pid, SIGUSR1);
          }
          fd = open("/dev/urandom", O_WRONLY);
          pause();
          printf("interrupted after writing %zd bytes\n", write(fd, x, sizeof(x)));
          close(fd);
          kill(child, SIGTERM);
          return 0;
        }
      
      Result before: "interrupted after writing 2147479552 bytes"
      Result after: "interrupted after writing 4096 bytes"
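
      The per-page signal check can be sketched in userspace as below. This is
      a simplified model, not the kernel function: sketch_signal_pending stands
      in for signal_pending(current), the 64-byte block size and 4096-byte page
      size are assumptions, and the copy-and-mix step is left as a comment.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE  4096u
#define SKETCH_BLOCK_SIZE 64u

/* Hypothetical stand-in for signal_pending(current). */
static bool sketch_signal_pending;

/* Consumes len bytes in block-sized pieces; returns how many were consumed. */
static size_t write_pool_user_sketch(size_t len)
{
	size_t done = 0;

	while (done < len) {
		size_t chunk = len - done < SKETCH_BLOCK_SIZE ?
			       len - done : SKETCH_BLOCK_SIZE;

		/* Real code would copy the chunk from userspace and mix it in. */
		done += chunk;

		/* Only look for signals on page boundaries, keeping the check cheap. */
		if (done < len && done % SKETCH_PAGE_SIZE == 0 &&
		    sketch_signal_pending)
			break;
	}
	return done;
}
```

      With a signal already pending, a large request stops after the first
      page in this model, which mirrors the "4096 bytes" result quoted above.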
      
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  3. 21 May, 2022 3 commits
    • random: wire up fops->splice_{read,write}_iter() · 79025e72
      Jens Axboe committed
      Now that random/urandom is using {read,write}_iter, we can wire it up to
      using the generic splice handlers.
      
      Fixes: 36e2c742 ("fs: don't allow splice read/write without explicit ops")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      [Jason: added the splice_write path. Note that sendfile() and such still
       does not work for read, though it does for write, because of a file
       type restriction in splice_direct_to_actor(), which I'll address
       separately.]
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: convert to using fops->write_iter() · 22b0a222
      Jens Axboe committed
      Now that the read side has been converted to fix a regression with
      splice, convert the write side as well to have some symmetry in the
      interface used (and help deprecate ->write()).
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      [Jason: cleaned up random_ioctl a bit, require full writes in
       RNDADDENTROPY since it's crediting entropy, simplify control flow of
       write_pool(), and incorporate suggestions from Al.]
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: convert to using fops->read_iter() · 1b388e77
      Jens Axboe committed
      This is a pre-requisite to wiring up splice() again for the random
      and urandom drivers. It also allows us to remove the INT_MAX check in
      getrandom(), because import_single_range() applies capping internally.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      [Jason: rewrote get_random_bytes_user() to simplify and also incorporate
       additional suggestions from Al.]
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  4. 19 May, 2022 7 commits
    • random: unify batched entropy implementations · 3092adce
      Jason A. Donenfeld committed
      There are currently two separate batched entropy implementations, for
      u32 and u64, with nearly identical code, with the goal of avoiding
      unaligned memory accesses and letting the buffers be used more
      efficiently. Having to maintain these two functions independently is a
      bit of a hassle though, considering that they always need to be kept in
      sync.
      
      This commit factors them out into a type-generic macro, so that the
      expansion produces the same code as before, such that diffing the
      assembly shows no differences. This will also make it easier in the
      future to add u16 and u8 batches.
      
      This was initially tested using an always_inline function and letting
      gcc constant fold the type size in, but the code gen was less efficient,
      and in general it was more verbose and harder to follow. So this patch
      goes with the boring macro solution, similar to what's already done for
      the _wait functions in random.h.
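
      A type-generic macro of the kind described above might look like the
      following userspace sketch. It is not the kernel's macro: the name
      DEFINE_BATCHED_SKETCH, the 64-byte batch size, and the counter-based
      sketch_refill() (which stands in for real RNG output) are all
      assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical refill source: a byte counter standing in for RNG output. */
static uint8_t sketch_counter;

static void sketch_refill(void *buf, size_t len)
{
	uint8_t *p = buf;

	for (size_t i = 0; i < len; i++)
		p[i] = sketch_counter++;
}

/*
 * One macro expands to a per-type batch plus its accessor, so the
 * variants for different integer widths cannot drift apart.
 */
#define DEFINE_BATCHED_SKETCH(type) \
static struct { \
	type entropy[64 / sizeof(type)]; \
	size_t position; \
	int filled; \
} batch_##type; \
static type get_random_##type##_sketch(void) \
{ \
	const size_t n = sizeof(batch_##type.entropy) / sizeof(type); \
	if (!batch_##type.filled || batch_##type.position >= n) { \
		sketch_refill(batch_##type.entropy, sizeof(batch_##type.entropy)); \
		batch_##type.position = 0; \
		batch_##type.filled = 1; \
	} \
	return batch_##type.entropy[batch_##type.position++]; \
}

DEFINE_BATCHED_SKETCH(uint32_t)
DEFINE_BATCHED_SKETCH(uint64_t)
```

      Each expansion yields an accessor such as get_random_uint32_t_sketch(),
      which refills its own batch only when every element has been handed out.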
      
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: move randomize_page() into mm where it belongs · 5ad7dd88
      Jason A. Donenfeld committed
      randomize_page is an mm function. It is documented like one. It contains
      the history of one. It has the naming convention of one. It looks
      just like another very similar function in mm, randomize_stack_top().
      And it has always been maintained and updated by mm people. There is no
      need for it to be in random.c. In the "which shape does not look like
      the other ones" test, pointing to randomize_page() is correct.
      
      So move randomize_page() into mm/util.c, right next to the similar
      randomize_stack_top() function.
      
      This commit contains no actual code changes.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: remove mostly unused async readiness notifier · 6701de6c
      Jason A. Donenfeld committed
      The register_random_ready_notifier() notifier is somewhat complicated,
      and was already recently rewritten to use notifier blocks. It is only
      used now by one consumer in the kernel, vsprintf.c, for which the async
      mechanism is really overly complex for what it actually needs. This
      commit removes register_random_ready_notifier() and
      unregister_random_ready_notifier(), because it just adds complication
      with little utility, and changes vsprintf.c to just check on
      `!rng_is_initialized() && !rng_has_arch_random()`, which will
      eventually be true. Performance-wise, that code was already using a
      static branch, so there's basically no overhead at all to this change.
      
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Acked-by: Petr Mladek <pmladek@suse.com> # for vsprintf.c
      Reviewed-by: Petr Mladek <pmladek@suse.com>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: remove get_random_bytes_arch() and add rng_has_arch_random() · 248561ad
      Jason A. Donenfeld committed
      The RNG incorporates RDRAND into its state at boot and every time it
      reseeds, so there's no reason for callers to use it directly. The
      hashing that the RNG does on it is preferable to using the bytes raw.
      
      The only current use case of get_random_bytes_arch() is vsprintf's
      siphash key for pointer hashing, which uses it to initialize the pointer
      secret earlier than usual if RDRAND is available. In order to replace
      this narrow use case, just expose whether RDRAND is mixed into the RNG,
      with a new function called rng_has_arch_random(). With that taken care
      of, there are no users of get_random_bytes_arch() left, so it can be
      removed.
      
      Later, if trust_cpu gets turned on by default (as most distros are
      doing), this one use of rng_has_arch_random() can probably go away as
      well.
      
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Acked-by: Petr Mladek <pmladek@suse.com> # for vsprintf.c
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: move initialization functions out of hot pages · 560181c2
      Jason A. Donenfeld committed
      Much of random.c is devoted to initializing the rng and accounting for
      when a sufficient amount of entropy has been added. In a perfect world,
      this would all happen during init, and so we could mark these functions
      as __init. But in reality, this isn't the case: sometimes the rng only
      finishes initializing some seconds after system init is finished.
      
      For this reason, at the moment, a whole host of functions that are only
      used relatively close to system init and then never again are intermixed
      with functions that are used in hot code all the time. This creates more
      cache misses than necessary.
      
      In order to pack the hot code closer together, this commit moves the
      initialization functions that can't be marked as __init into
      .text.unlikely by way of the __cold attribute.
      
      Of particular note is moving credit_init_bits() into a macro wrapper
      that inlines the crng_ready() static branch check. This avoids a
      function call to a nop+ret, and most notably prevents extra entropy
      arithmetic from being computed in mix_interrupt_randomness().
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: make consistent use of buf and len · a1940263
      Jason A. Donenfeld committed
      The current code was a mix of "nbytes", "count", "size", "buffer", "in",
      and so forth. Instead, let's clean this up by naming input parameters
      "buf" (or "ubuf") and "len", so that you always understand that you're
      reading this variety of function argument.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: use static branch for crng_ready() · f5bda35f
      Jason A. Donenfeld committed
      Since crng_ready() is only false briefly during initialization and then
      forever after becomes true, we don't need to evaluate it after, making
      it a prime candidate for a static branch.
      
      One complication, however, is that it changes state in a particular call
      to credit_init_bits(), which might be made from atomic context, which
      means we must kick off a workqueue to change the static key. Further
      complicating things, credit_init_bits() may be called sufficiently early
      on in system initialization such that system_wq is NULL.
      
      Fortunately, there exists the nice function execute_in_process_context(),
      which will immediately execute the function if !in_interrupt(), and
      otherwise defer it to a workqueue. During early init, before workqueues
      are available, in_interrupt() is always false, because interrupts
      haven't even been enabled yet, which means the function in that case
      executes immediately. Later on, after workqueues are available,
      in_interrupt() might be true, but in that case, the work is queued in
      system_wq and all goes well.
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Sultan Alsawaf <sultan@kerneltoast.com>
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  5. 18 May, 2022 10 commits
    • random: credit architectural init the exact amount · 12e45a2a
      Jason A. Donenfeld committed
      RDRAND and RDSEED can fail sometimes, which is fine. We currently
      initialize the RNG with 512 bits of RDRAND/RDSEED. We only need 256 bits
      of those to succeed in order to initialize the RNG. Instead of the
      current "all or nothing" approach, actually credit these contributions
      the amount that is actually contributed.
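
      As a rough userspace illustration of crediting only what succeeded,
      consider the sketch below. The helper sketch_arch_word() is a
      hypothetical stand-in for RDSEED/RDRAND that fails according to a bit
      mask, the 64-bit word size is an assumption, and the counting is done
      directly in bits.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_SEED_WORDS 8   /* 8 x 64-bit words = 512 bits requested */
#define SKETCH_INIT_BITS  256 /* bits needed to treat the RNG as initialized */

/* Hypothetical stand-in for RDSEED/RDRAND: fails when the mask bit is set. */
static bool sketch_arch_word(unsigned int fail_mask, size_t i,
			     unsigned long long *out)
{
	if (fail_mask & (1u << i))
		return false;
	*out = 0x9e3779b97f4a7c15ULL * (i + 1); /* placeholder, not random */
	return true;
}

/* Returns the bits to credit: only words that actually succeeded count. */
static unsigned int sketch_arch_bits(unsigned int fail_mask)
{
	unsigned int bits = SKETCH_SEED_WORDS * 64;
	unsigned long long word;

	for (size_t i = 0; i < SKETCH_SEED_WORDS; i++) {
		if (!sketch_arch_word(fail_mask, i, &word))
			bits -= 64; /* a failed word contributes nothing */
	}
	return bits;
}

static bool sketch_initialized(unsigned int fail_mask)
{
	return sketch_arch_bits(fail_mask) >= SKETCH_INIT_BITS;
}
```

      In this model, four failures out of eight still leave 256 bits, which is
      enough to initialize, whereas an all-or-nothing rule would credit zero.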
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: handle latent entropy and command line from random_init() · 2f14062b
      Jason A. Donenfeld committed
      Currently, start_kernel() adds latent entropy and the command line to
      the entropy pool *after* the RNG has been initialized, deferring when
      it's actually used by things like stack canaries until the next time
      the pool is seeded. This surely is not intended.
      
      Rather than splitting up which entropy gets added where and when between
      start_kernel() and random_init(), just do everything in random_init(),
      which should eliminate these kinds of bugs in the future.
      
      While we're at it, rename the awkwardly titled "rand_initialize()" to
      the more standard "random_init()" nomenclature.
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: use proper jiffies comparison macro · 8a5b8a4a
      Jason A. Donenfeld committed
      This expands to exactly the same code that it replaces, but makes things
      consistent by using the same macro for jiffy comparisons throughout.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: remove ratelimiting for in-kernel unseeded randomness · cc1e127b
      Jason A. Donenfeld committed
      The CONFIG_WARN_ALL_UNSEEDED_RANDOM debug option controls whether the
      kernel warns about all unseeded randomness or just the first instance.
      There's some complicated rate limiting and comparison to the previous
      caller, such that even with CONFIG_WARN_ALL_UNSEEDED_RANDOM enabled,
      developers still don't see all the messages or even an accurate count of
      how many were missed. This is the result of basically parallel
      mechanisms aimed at accomplishing more or less the same thing, added at
      different points in random.c history, which sort of compete with the
      first-instance-only limiting we have now.
      
      It turns out, however, that nobody cares about the first unseeded
      randomness instance of in-kernel users. The same first user has been
      there for ages now, and nobody is doing anything about it. It isn't even
      clear that anybody _can_ do anything about it. Most places that can do
      something about it have switched over to using get_random_bytes_wait()
      or wait_for_random_bytes(), which is the right thing to do, but there is
      still much code that needs randomness sometimes during init, and as a
      general rule, if you're not using one of the _wait functions or the
      readiness notifier callback, you're bound to be doing it wrong just
      based on that fact alone.
      
      So warning about this same first user that can't easily change is simply
      not an effective mechanism for anything at all. Users can't do anything
      about it, as the Kconfig text points out -- the problem isn't in
      userspace code -- and kernel developers don't or more often can't react
      to it.
      
      Instead, show the warning for all instances when CONFIG_WARN_ALL_UNSEEDED_RANDOM
      is set, so that developers can debug things if need be, or if it isn't set,
      don't show a warning at all.
      
      At the same time, CONFIG_WARN_ALL_UNSEEDED_RANDOM now implies setting
      random.ratelimit_disable=1 by default, since if you care about one
      you probably care about the other too. And we can clean up usage around
      the related urandom_warning ratelimiter as well (whose behavior isn't
      changing), so that it properly counts missed messages after the 10
      message threshold is reached.
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: move initialization out of reseeding hot path · 68c9c8b1
      Jason A. Donenfeld committed
      Initialization happens once -- by way of credit_init_bits() -- and then
      it never happens again. Therefore, it doesn't need to be in
      crng_reseed(), which is a hot path that is called multiple times. It
      also doesn't make sense to have there, as initialization activity is
      better associated with initialization routines.
      
      After the prior commit, crng_reseed() now won't be called by multiple
      concurrent callers, which means that we can safely move the
      "finalize_init" logic into credit_init_bits() unconditionally.
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: avoid initializing twice in credit race · fed7ef06
      Jason A. Donenfeld committed
      Since all changes of crng_init now go through credit_init_bits(), we can
      fix a long standing race in which two concurrent callers of
      credit_init_bits() have the new bit count >= some threshold, but are
      doing so with crng_init as a lower threshold, checked outside of a lock,
      resulting in crng_reseed() or similar being called twice.
      
      In order to fix this, we can use the original cmpxchg value of the bit
      count, and only change crng_init when the bit count transitions from
      below a threshold to meeting the threshold.
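
      The transition test can be sketched with C11 atomics as follows. This is
      a userspace approximation: the names are hypothetical, the 256-bit
      threshold is taken from elsewhere in this log, and
      atomic_compare_exchange_weak() stands in for the kernel's cmpxchg.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_INIT_BITS 256u

static _Atomic unsigned int sketch_init_bits;
static int sketch_ready_transitions; /* how many callers did the transition */

/*
 * Adds bits with a compare-and-swap loop. Returns true only for the one
 * caller whose update moved the count from below the threshold to meeting it.
 */
static bool sketch_credit_init_bits(unsigned int bits)
{
	unsigned int orig = atomic_load(&sketch_init_bits);
	unsigned int next;

	do {
		next = orig + bits;
		if (next > SKETCH_INIT_BITS)
			next = SKETCH_INIT_BITS; /* cap at the threshold */
	} while (!atomic_compare_exchange_weak(&sketch_init_bits, &orig, next));

	/* Decide from the value the exchange replaced, not from a stale read. */
	if (orig < SKETCH_INIT_BITS && next >= SKETCH_INIT_BITS) {
		sketch_ready_transitions++;
		return true;
	}
	return false;
}
```

      Because the decision uses the exact value that the successful exchange
      replaced, two racing callers cannot both observe the below-to-above
      transition.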
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: use symbolic constants for crng_init states · e3d2c5e7
      Jason A. Donenfeld committed
      crng_init represents a state machine, with three states, and various
      rules for transitions. For the longest time, we've been managing these
      with "0", "1", and "2", and expecting people to figure it out. To make
      the code more obvious, replace these with proper enum values
      representing the transition, and then redocument what each of these
      states mean.
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • siphash: use one source of truth for siphash permutations · e73aaae2
      Jason A. Donenfeld committed
      The SipHash family of permutations is currently used in three places:
      
      - siphash.c itself, used in the ordinary way it was intended.
      - random32.c, in a construction from an anonymous contributor.
      - random.c, as part of its fast_mix function.
      
      Each one of these places reinvents the wheel with the same C code, same
      rotation constants, and same symmetry-breaking constants.
      
      This commit tidies things up a bit by placing macros for the
      permutations and constants into siphash.h, where each of the three .c
      users can access them. It also leaves a note dissuading more users of
      them from emerging.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: help compiler out with fast_mix() by using simpler arguments · 791332b3
      Jason A. Donenfeld committed
      Now that fast_mix() has more than one caller, gcc no longer inlines it.
      That's fine. But it also doesn't handle the compound literal argument we
      pass it very efficiently, nor does it handle the loop as well as it
      could. So just expand the code to spell out this function so that it
      generates the same code as it did before. Performance-wise, this now
      behaves as it did before the last commit. The difference in actual code
      size on x86 is 45 bytes, which is less than a cache line.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: do not use input pool from hard IRQs · e3e33fc2
      Jason A. Donenfeld committed
      Years ago, a separate fast pool was added for interrupts, so that the
      cost associated with taking the input pool spinlocks and mixing into it
      would be avoided in places where latency is critical. However, one
      oversight was that add_input_randomness() and add_disk_randomness()
      still sometimes are called directly from the interrupt handler, rather
      than being deferred to a thread. This means that some unlucky interrupts
      will be caught doing a blake2s_compress() call and potentially spinning
      on input_pool.lock, which can also be taken by unprivileged users by
      writing into /dev/urandom.
      
      In order to fix this, add_timer_randomness() now checks whether it is
      being called from a hard IRQ and if so, just mixes into the per-cpu IRQ
      fast pool using fast_mix(), which is much faster and can be done
      lock-free. A nice consequence of this, as well, is that it means hard
      IRQ context FPU support is likely no longer useful.
      
      The entropy estimation algorithm used by add_timer_randomness() is also
      somewhat different than the one used for add_interrupt_randomness(). The
      former looks at deltas of deltas of deltas, while the latter just waits
      for 64 interrupts for one bit or for one second since the last bit. In
      order to bridge these, and since add_interrupt_randomness() runs after
      an add_timer_randomness() that's called from hard IRQ, we add to the
      fast pool credit the related amount, and then subtract one to account
      for add_interrupt_randomness()'s contribution.
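
      The "deltas of deltas of deltas" estimate can be approximated in
      userspace as below. This is a sketch under assumptions rather than the
      kernel routine: the struct and function names are hypothetical, and the
      cap of 11 bits per event is an assumed value.

```c
#include <stdlib.h>

struct sketch_timer_state {
	long last_time;
	long last_delta;
	long last_delta2;
};

/* Position of the highest set bit, 1-based; 0 for an input of 0. */
static unsigned int sketch_fls(unsigned long v)
{
	unsigned int n = 0;

	while (v) {
		n++;
		v >>= 1;
	}
	return n;
}

/* Returns a conservative bit estimate for one event timestamp. */
static unsigned int sketch_estimate_bits(struct sketch_timer_state *s, long now)
{
	long delta = now - s->last_time;
	long delta2 = delta - s->last_delta;
	long delta3 = delta2 - s->last_delta2;
	long smallest;
	unsigned int bits;

	s->last_time = now;
	s->last_delta = delta;
	s->last_delta2 = delta2;

	/* Take the smallest magnitude of the three differences. */
	smallest = labs(delta);
	if (labs(delta2) < smallest)
		smallest = labs(delta2);
	if (labs(delta3) < smallest)
		smallest = labs(delta3);

	/* Roughly log2 of the smallest difference; the cap is an assumption. */
	bits = sketch_fls((unsigned long)smallest >> 1);
	return bits > 11 ? 11 : bits;
}
```

      Perfectly periodic events drive the second-order difference to zero, so
      this estimator credits them nothing, while irregular timing earns a few
      bits.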
      
      A downside of this, however, is that the num argument is potentially
      attacker controlled, which puts a bit more pressure on the fast_mix()
      sponge to do more than it's really intended to do. As a mitigating
      factor, the first 96 bits of input aren't attacker controlled (a cycle
      counter followed by zeros), which means it's essentially two rounds of
      siphash rather than one, which is somewhat better. It's also not that
      much different from add_interrupt_randomness()'s use of the irq stack
      instruction pointer register.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Filipe Manana <fdmanana@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  6. 16 May, 2022 1 commit
  7. 15 May, 2022 1 commit
    • random: do not pretend to handle premature next security model · e85c0fc1
      Jason A. Donenfeld committed
      Per the thread linked below, "premature next" is not considered to be a
      realistic threat model, and leads to more serious security problems.
      
      "Premature next" is the scenario in which:
      
      - Attacker compromises the current state of a fully initialized RNG via
        some kind of infoleak.
      - New bits of entropy are added directly to the key used to generate the
        /dev/urandom stream, without any buffering or pooling.
      - Attacker then, somehow having read access to /dev/urandom, samples RNG
        output and brute forces the individual new bits that were added.
      - Result: the RNG never "recovers" from the initial compromise, a
        so-called violation of what academics term "post-compromise security".
      
      The usual solutions to this involve some form of delaying when entropy
      gets mixed into the crng. With Fortuna, this involves multiple input
      buckets. With what the Linux RNG was trying to do prior, this involves
      entropy estimation.
      
      However, by delaying when entropy gets mixed in, it also means that RNG
      compromises are extremely dangerous during the window of time before
      the RNG has gathered enough entropy, during which time nonces may become
      predictable (or repeated), ephemeral keys may not be secret, and so
      forth. Moreover, it's unclear how realistic "premature next" is from an
      attack perspective, if these attacks even make sense in practice.
      
      Put together -- and discussed in more detail in the thread below --
      these constitute grounds for just doing away with the current code that
      pretends to handle premature next. I say "pretends" because it wasn't
      doing an especially great job at it either; should we change our mind
      about this direction, we would probably implement Fortuna to "fix" the
      "problem", in which case, removing the pretend solution still makes
      sense.
      
      This also reduces the crng reseed period from 5 minutes down to 1
      minute. The rationale from the thread might lead us toward reducing that
      even further in the future (or even eliminating it), but that remains a
      topic of a future commit.
      
      At a high level, this patch changes semantics from:
      
          Before: Seed for the first time after 256 "bits" of estimated
          entropy have been accumulated since the system booted. Thereafter,
          reseed once every five minutes, but only if 256 new "bits" have been
          accumulated since the last reseeding.
      
          After: Seed for the first time after 256 "bits" of estimated entropy
          have been accumulated since the system booted. Thereafter, reseed
          once every minute.
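
      The change in reseeding policy after initialization can be written as
      two small predicates. These functions are hypothetical and only restate
      the Before/After description above; they are not taken from the patch.

```c
#include <stdbool.h>

#define SKETCH_RESEED_BITS 256u

/* Before: reseed every 5 minutes, and only if 256 new bits were estimated. */
static bool sketch_should_reseed_before(unsigned int secs_since_reseed,
					unsigned int new_bits)
{
	return secs_since_reseed >= 5 * 60 && new_bits >= SKETCH_RESEED_BITS;
}

/* After: reseed every minute, regardless of the entropy estimate. */
static bool sketch_should_reseed_after(unsigned int secs_since_reseed,
				       unsigned int new_bits)
{
	(void)new_bits; /* the estimate no longer gates reseeding */
	return secs_since_reseed >= 60;
}
```

      Under the old rule a system that gathered few estimated bits could go
      without a reseed indefinitely; under the new rule only elapsed time
      matters.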
      
      Most of this patch is renaming and removing: POOL_MIN_BITS becomes
      POOL_INIT_BITS, credit_entropy_bits() becomes credit_init_bits(),
      crng_reseed() loses its "force" parameter since it's now always true,
      the drain_entropy() function no longer has any use so it's removed,
      entropy estimation is skipped if we've already init'd, the various
      notifiers for "low on entropy" are now only active prior to init, and
      finally, some documentation comments are cleaned up here and there.
      
      Link: https://lore.kernel.org/lkml/YmlMGx6+uigkGiZ0@zx2c4.com/
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Nadia Heninger <nadiah@cs.ucsd.edu>
      Cc: Tom Ristenpart <ristenpart@cornell.edu>
      Reviewed-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      e85c0fc1
  8. 14 May, 2022 5 commits
    • J
      random: use first 128 bits of input as fast init · 5c3b747e
      Jason A. Donenfeld committed
      Before, the first 64 bytes of input, regardless of how entropic it was,
      would be used to mutate the crng base key directly, and none of those
      bytes would be credited as having entropy. Then 256 bits of credited
      input would be accumulated, and only then would the rng transition from
      the earlier "fast init" phase into being actually initialized.
      
      The thinking was that by mixing and matching fast init and real init, an
      attacker who compromised the fast init state, considered easy to do
      given how little entropy might be in those first 64 bytes, would then be
      able to bruteforce bits from the actual initialization. By keeping these
      separate, bruteforcing became impossible.
      
      However, by not crediting potentially creditable bits from those first 64
      bytes of input, we delay initialization, and actually make the problem
      worse, because it means the user is drawing worse random numbers for a
      longer period of time.
      
      Instead, we can take the first 128 bits as fast init, and allow them to
      be credited, and then hold off on the next 128 bits until they've
      accumulated. This is still a wide enough margin to prevent bruteforcing
      the rng state, while still initializing much faster.
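      
      As a rough illustration of the two thresholds described above, the
      following user-space sketch classifies a running total of credited
      bits. The enum and function names are hypothetical; only the 128 and
      256 figures come from this message.

```c
#include <assert.h>

/* Hypothetical names; the thresholds come from the description above. */
enum { FAST_INIT_BITS = 128, READY_BITS = 256 };
enum rng_phase { RNG_EMPTY, RNG_FAST_INIT, RNG_READY };

/*
 * Credit `bits` of estimated entropy to `*total` (saturating at READY_BITS)
 * and report which phase the running total now corresponds to.
 */
static enum rng_phase credit_and_classify(unsigned int *total, unsigned int bits)
{
	assert(*total <= READY_BITS);
	if (bits > READY_BITS - *total)
		*total = READY_BITS;
	else
		*total += bits;

	if (*total >= READY_BITS)
		return RNG_READY;
	if (*total >= FAST_INIT_BITS)
		return RNG_FAST_INIT;
	return RNG_EMPTY;
}
```

      The point of the model is that the first 128 credited bits count
      toward the 256 needed for full initialization instead of being
      discarded.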
      
      Then, rather than trying to piecemeal inject into the base crng key at
      various points, instead just extract from the pool when we need it, for
      the crng_init==0 phase. Performance may even be better for the various
      inputs here, since there are likely more calls to mix_pool_bytes() than
      there are to get_random_bytes() during this phase of system execution.
      
      Since the preinit injection code is gone, bootloader randomness can then
      do something significantly more straightforward, removing the weird
      system_wq hack in hwgenerator randomness.
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      5c3b747e
    • J
      random: do not use batches when !crng_ready() · cbe89e5a
      Jason A. Donenfeld committed
      It's too hard to keep the batches synchronized, and pointless anyway,
      since in !crng_ready(), we're updating the base_crng key really often,
      where batching only hurts. So instead, if the crng isn't ready, just
      call into get_random_bytes(). At this stage nothing is performance
      critical anyhow.
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      cbe89e5a
    • J
      random: mix in timestamps and reseed on system restore · b7b67d13
      Jason A. Donenfeld committed
      Since the RNG loses freshness with system suspend/hibernation, when we
      resume, immediately reseed using whatever data we can, which for this
      particular case is the various timestamps regarding system suspend time,
      in addition to more generally the RDSEED/RDRAND/RDTSC values that happen
      whenever the crng reseeds.
      
      On systems that suspend and resume automatically all the time -- such as
      Android -- we skip the reseeding on suspend resumption, since that could
      wind up being far too busy. This is the same trade-off made in
      WireGuard.
      
      In addition to reseeding upon resumption, always mix these various
      stamps into the pool on every power notification event.
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      b7b67d13
    • J
      random: vary jitter iterations based on cycle counter speed · 78c768e6
      Jason A. Donenfeld committed
      Currently, we do the jitter dance if two consecutive reads to the cycle
      counter return different values. If they do, then we consider the cycle
      counter to be fast enough that one trip through the scheduler will yield
      one "bit" of credited entropy. If those two reads return the same value,
      then we assume the cycle counter is too slow to show meaningful
      differences.
      
      This methodology is flawed for a variety of reasons, one of which Eric
      posted a patch to fix in [1]. The issue that patch solves is that on a
      system with a slow counter, you might be [un]lucky and read the counter
      _just_ before it changes, so that the second cycle counter you read
      differs from the first, even though there's usually quite a large period
      of time in between the two. For example:
      
      | real time | cycle counter |
      | --------- | ------------- |
      | 3         | 5             |
      | 4         | 5             |
      | 5         | 5             |
      | 6         | 5             |
      | 7         | 5             | <--- a
      | 8         | 6             | <--- b
      | 9         | 6             | <--- c
      
      If we read the counter at (a) and compare it to (b), we might be fooled
      into thinking that it's a fast counter, when in reality it is not. The
      solution in [1] is to also compare counter (b) to counter (c), on the
      theory that if the counter is _actually_ slow, and (a)!=(b), then
      certainly (b)==(c).
      
      This helps solve this particular issue, in one sense, but in another
      sense, it mostly functions to disallow jitter entropy on these systems,
      rather than simply taking more samples in that case.
      
      Instead, this patch takes a different approach. Right now we assume that
      a difference in one set of consecutive samples means one "bit" of
      credited entropy per scheduler trip. We can extend this so that a
      difference in two sets of consecutive samples means one "bit" of
      credited entropy per /two/ scheduler trips, and three for three, and
      four for four. In other words, we can increase the amount of jitter
      "work" we require for each "bit", depending on how slow the cycle
      counter is.
      
      So this patch takes a whole bunch of samples, sees how many of them are
      different, and divides to find the amount of work required per "bit",
      and also requires that at least some minimum of them are different in
      order to attempt any jitter entropy.
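      
      The sampling idea can be sketched in user space as follows. This is a
      simplified model under stated assumptions, not the patch itself: the
      function name, the `min_different` parameter, and the exact rounding
      are hypothetical choices made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Given `n` consecutive counter samples, return how many samples ("work")
 * to require per credited "bit", or 0 if fewer than `min_different`
 * consecutive pairs differ and jitter entropy should not be attempted.
 */
static size_t work_per_bit(const uint64_t *samples, size_t n, size_t min_different)
{
	size_t different = 0;
	size_t pairs;

	assert(samples != NULL || n == 0);
	if (n < 2)
		return 0;
	pairs = n - 1;

	for (size_t i = 1; i < n; i++)
		if (samples[i] != samples[i - 1])
			different++;

	if (different == 0 || different < min_different)
		return 0;

	/* Divide, rounding up: a slower counter means more work per "bit". */
	return (pairs + different - 1) / different;
}
```

      Feeding it the counter column from the table above (5, 5, 5, 5, 5, 6,
      6) gives one change in six pairs, so six samples per "bit", while a
      counter that changes on every read gives one.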
      
      Note that this approach is still far from perfect. It's not a real
      statistical estimate on how much these samples vary; it's not a
      real-time analysis of the relevant input data. That remains a project
      for another time. However, it makes the same (partly flawed) assumptions
      as the code that's there now, so it's probably not worse than the status
      quo, and it handles the issue Eric mentioned in [1]. But, again, it's
      probably a far cry from whatever a really robust version of this would
      be.
      
      [1] https://lore.kernel.org/lkml/20220421233152.58522-1-ebiggers@kernel.org/
          https://lore.kernel.org/lkml/20220421192939.250680-1-ebiggers@kernel.org/
      
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      78c768e6
    • J
      random: insist on random_get_entropy() existing in order to simplify · 4b758eda
      Jason A. Donenfeld committed
      All platforms are now guaranteed to provide some value for
      random_get_entropy(). In case some bug leads to this not being so, we
      print a warning, because that indicates that something is really very
      wrong (and likely other things are impacted too). This should never be
      hit, but it's a good and cheap way of finding out if something ever is
      problematic.
      
      Since we now have viable fallback code for random_get_entropy() on all
      platforms, which is, in the worst case, not worse than jiffies, we can
      count on getting the best possible value out of it. That means there's
      no longer a use for using jiffies as entropy input. It also means we no
      longer have a reason for doing the round-robin register flow in the IRQ
      handler, which was always of fairly dubious value.
      
      Instead we can greatly simplify the IRQ handler inputs and also unify
      the construction between 64-bits and 32-bits. We now collect the cycle
      counter and the return address, since those are the two things that
      matter. Because the return address and the irq number are likely
      related, to the extent we mix in the irq number, we can just xor it into
      the top unchanging bytes of the return address, rather than the bottom
      changing bytes of the cycle counter as before. Then, we can do a fixed 2
      rounds of SipHash/HSipHash. Finally, we use the same construction of
      hashing only half of the [H]SipHash state on 32-bit and 64-bit. We're
      not actually discarding any entropy, since that entropy is carried
      through until the next time. And more importantly, it lets us do the
      same sponge-like construction everywhere.
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      4b758eda
  9. 25 Apr, 2022 1 commit
  10. 16 Apr, 2022 1 commit
  11. 13 Apr, 2022 2 commits
    • J
      random: make random_get_entropy() return an unsigned long · b0c3e796
      Jason A. Donenfeld committed
      Some implementations were returning type `unsigned long`, while others
      that fell back to get_cycles() were implicitly returning a `cycles_t` or
      an untyped constant int literal. That makes for weird and confusing
      code, and basically all code in the kernel already handled it like it
      was an `unsigned long`. I recently tried to handle it as the largest
      type it could be, a `cycles_t`, but doing so doesn't really help with
      much.
      
      Instead let's just make random_get_entropy() return an unsigned long all
      the time. This also matches the commonly used `arch_get_random_long()`
      function, so now RDRAND and RDTSC return the same sized integer, which
      means one can fall back to the other more gracefully.
      
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      b0c3e796
    • J
      random: allow partial reads if later user copies fail · 5209aed5
      Jason A. Donenfeld committed
      Rather than failing entirely if a copy_to_user() fails at some point,
      instead we should return a partial read for the amount that succeeded
      prior, unless none succeeded at all, in which case we return -EFAULT as
      before.
      
      This makes it consistent with other reader interfaces. For example, the
      following snippet for /dev/zero outputs "4" followed by "1":
      
        int fd;
        char *x = mmap(NULL, 4096, PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
        assert(x != MAP_FAILED);
        fd = open("/dev/zero", O_RDONLY);
        assert(fd >= 0);
        printf("%zd\n", read(fd, x, 4));
        printf("%zd\n", read(fd, x + 4095, 4));
        close(fd);
      
      This brings that same standard behavior to the various RNG reader
      interfaces.
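      
      The return-value rule can be modeled in user space like this. It is a
      sketch under stated assumptions: `model_copy_out()` is a hypothetical
      stand-in for copy_to_user(), the chunk size is arbitrary, and the 0xA5
      fill is a placeholder for generated output.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

enum { CHUNK = 2 }; /* tiny on purpose so the example spans several chunks */

/*
 * Hypothetical stand-in for copy_to_user(): only `writable` bytes at `dst`
 * can be written. Returns the number of bytes that could NOT be copied.
 */
static size_t model_copy_out(unsigned char *dst, size_t writable,
			     const unsigned char *src, size_t len)
{
	size_t ok = len < writable ? len : writable;

	memcpy(dst, src, ok);
	return len - ok;
}

/* Return the bytes delivered so far, or -EFAULT only if none were. */
static ssize_t model_read(unsigned char *dst, size_t writable, size_t want)
{
	unsigned char chunk[CHUNK];
	size_t done = 0;

	assert(dst != NULL);
	while (done < want) {
		size_t len = want - done < CHUNK ? want - done : CHUNK;
		size_t left;

		memset(chunk, 0xA5, len); /* placeholder for generated output */
		left = model_copy_out(dst + done, writable - done, chunk, len);
		done += len - left;
		if (left)
			break; /* the copy faulted part-way: stop here */
	}
	if (want && !done)
		return -EFAULT;
	return (ssize_t)done;
}
```

      With a destination that has only one writable byte, a request for four
      bytes returns 1 in this model, mirroring the /dev/zero snippet above.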
      
      While we're at it, we can streamline the loop logic a little bit.
      
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jann Horn <jannh@google.com>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      5209aed5
  12. 07 Apr, 2022 1 commit
    • J
      random: check for signals every PAGE_SIZE chunk of /dev/[u]random · e3c1c4fd
      Jason A. Donenfeld committed
      In 1448769c ("random: check for signal_pending() outside of
      need_resched() check"), Jann pointed out that we previously were only
      checking the TIF_NOTIFY_SIGNAL and TIF_SIGPENDING flags if the process
      had TIF_NEED_RESCHED set, which meant in practice, super long reads to
      /dev/[u]random would delay signal handling by a long time. I tried this
      using the below program, and indeed I wasn't able to interrupt a
      /dev/urandom read until after several megabytes had been read. The bug
      he fixed has always been there, and so code that reads from /dev/urandom
      without checking the return value of read() has mostly worked for a long
      time, for most sizes, not just for <= 256.
      
      Maybe it makes sense to keep that code working. The reason it was so
      small prior, ignoring the fact that it didn't work anyway, was likely
      because /dev/random used to block, and that could happen for pretty
      large lengths of time while entropy was gathered. But now, it's just a
      chacha20 call, which is extremely fast and is just operating on pure
      data, without having to wait for some external event. In that sense,
      /dev/[u]random is a lot more like /dev/zero.
      
      Taking a page out of /dev/zero's read_zero() function, it always returns
      at least one chunk, and then checks for signals after each chunk. Chunk
      sizes there are of length PAGE_SIZE. Let's just copy the same thing for
      /dev/[u]random, and check for signals and cond_resched() for every
      PAGE_SIZE amount of data. This makes the behavior more consistent with
      expectations, and should mitigate the impact of Jann's fix for the
      age-old signal check bug.
      
      ---- test program ----
      
        #include <unistd.h>
        #include <signal.h>
        #include <stdio.h>
        #include <sys/random.h>
      
        static unsigned char x[~0U];
      
        /* Empty handler: exists only so SIGUSR1 interrupts the syscall. */
        static void handle(int sig) { (void)sig; }
      
        int main(int argc, char *argv[])
        {
          pid_t pid = getpid(), child;
          signal(SIGUSR1, handle);
          if (!(child = fork())) {
            for (;;)
              kill(pid, SIGUSR1);
          }
          pause();
          printf("interrupted after reading %zd bytes\n", getrandom(x, sizeof(x), 0));
          kill(child, SIGTERM);
          return 0;
        }
      
      Cc: Jann Horn <jannh@google.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      e3c1c4fd
  13. 06 Apr, 2022 2 commits
    • J
      random: check for signal_pending() outside of need_resched() check · 1448769c
      Jann Horn committed
      signal_pending() checks TIF_NOTIFY_SIGNAL and TIF_SIGPENDING, which
      signal that the task should bail out of the syscall when possible. This
      is a separate concept from need_resched(), which checks
      TIF_NEED_RESCHED, signaling that the task should preempt.
      
      In particular, with the current code, the signal_pending() bailout
      probably won't work reliably.
      
      Change this to look like other functions that read lots of data, such as
      read_zero().
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      1448769c
    • J
      random: do not allow user to keep crng key around on stack · aba120cc
      Jason A. Donenfeld committed
      The fast key erasure RNG design relies on each key being used once and
      then discarded. We do this, making judicious use of
      memzero_explicit(). However, reads from /dev/urandom and calls to
      getrandom() involve a copy_to_user(), and userspace can use FUSE or
      userfaultfd, or make a massive call, dynamically remap memory addresses
      as it goes, and set the process priority to idle, in order to keep a
      kernel stack alive indefinitely. By probing
      /proc/sys/kernel/random/entropy_avail to learn when the crng key is
      refreshed, a malicious userspace could mount this attack every 5 minutes
      thereafter, breaking the crng's forward secrecy.
      
      In order to fix this, we just overwrite the stack's key with the first
      32 bytes of the "free" fast key erasure output. If we're returning <= 32
      bytes to the user, then we can still return those bytes directly, so
      that short reads don't become slower. And for long reads, the difference
      is hopefully lost in the amortization, so it doesn't change much, with
      that amortization helping variously for medium reads.
      
      We don't need to do this for get_random_bytes() and the various
      kernel-space callers, and later, if we ever switch to always batching,
      this won't be necessary either, so there's no need to change the API of
      these functions.
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jann Horn <jannh@google.com>
      Fixes: c92e040d ("random: add backtracking protection to the CRNG")
      Fixes: 186873c5 ("random: use simpler fast key erasure flow on per-cpu keys")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      aba120cc