1. 30 September 2022 (6 commits)
    • prandom: make use of smaller types in prandom_u32_max · 4c95236a
      Committed by Jason A. Donenfeld
      When possible at compile-time, make use of smaller types in
      prandom_u32_max(), so that we can use smaller batches from random.c,
      which in turn leads to a 2x or 4x performance boost. This makes a
      difference, for example, in kfence, which needs a fast stream of small
      numbers (booleans).
      
      At the same time, we use the occasion to update the old documentation on
      these functions. prandom_u32() and prandom_bytes() now have direct
      replacements in random.h, while prandom_u32_max() remains useful as a
      prandom.h function: its output is not evenly distributed, so it should
      not be treated as cryptographically secure.
      
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Acked-by: Marco Elver <elver@google.com>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
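      The following is a rough, self-contained illustration of the idea described
      above, not the kernel's actual implementation: when the bound is a
      compile-time constant that fits in 8 or 16 bits, only a u8 or u16 is drawn
      from the (stubbed) batched getters, and a multiply-and-shift maps it into
      [0, ceil). The get_random_u8/u16/u32 stubs below merely wrap rand().

          #include <stdint.h>
          #include <stdio.h>
          #include <stdlib.h>

          /* Stand-ins for the kernel's batched getters; they just wrap rand(). */
          static inline uint8_t  get_random_u8(void)  { return (uint8_t)rand(); }
          static inline uint16_t get_random_u16(void) { return (uint16_t)rand(); }
          static inline uint32_t get_random_u32(void)
          {
                  return ((uint32_t)rand() << 16) ^ (uint32_t)rand();
          }

          /*
           * Map a random draw into [0, ceil) with a multiply-and-shift rather
           * than a modulus.  A compile-time-constant ceil that fits in 8 or 16
           * bits consumes only a u8 or u16, so a batch of random bytes drains
           * two to four times more slowly.
           */
          #define prandom_u32_max(ceil)                                              \
                  (__builtin_constant_p(ceil) && (ceil) <= (1u << 8) ?                \
                          (uint32_t)(((uint64_t)get_random_u8()  * (ceil)) >> 8)  :   \
                   __builtin_constant_p(ceil) && (ceil) <= (1u << 16) ?               \
                          (uint32_t)(((uint64_t)get_random_u16() * (ceil)) >> 16) :   \
                          (uint32_t)(((uint64_t)get_random_u32() * (ceil)) >> 32))

          int main(void)
          {
                  for (int i = 0; i < 8; i++)
                          printf("%u ", prandom_u32_max(10)); /* constant bound: u8 path */
                  printf("\n");
                  return 0;
          }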
    • random: add 8-bit and 16-bit batches · 585cd5fe
      Committed by Jason A. Donenfeld
      There are numerous places in the kernel that would be sped up by having
      smaller batches. Currently those callsites do `get_random_u32() & 0xff`
      or similar. Since these are pretty spread out, and will require patches
      to multiple different trees, let's get ahead of the curve and lay the
      foundation for `get_random_u8()` and `get_random_u16()`, so that it's
      then possible to start submitting conversion patches leisurely.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
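      A minimal userspace sketch of the batching pattern this lays the foundation
      for, under the assumption of a single-threaded caller (the kernel's batches
      are per-CPU and refilled from ChaCha output, not from the rand()-based stub
      used here): a small buffer per element width is refilled in bulk and handed
      out one element at a time, so callers no longer need get_random_u32() & 0xff.

          #include <stdint.h>
          #include <stdio.h>
          #include <stdlib.h>

          /* Stub bulk generator standing in for get_random_bytes(). */
          static void fill_random(void *buf, size_t len)
          {
                  unsigned char *p = buf;
                  while (len--)
                          *p++ = (unsigned char)rand();
          }

          static uint8_t get_random_u8(void)
          {
                  static uint8_t batch[64];
                  static size_t pos = sizeof(batch);      /* start "empty" */

                  if (pos == sizeof(batch)) {
                          fill_random(batch, sizeof(batch));
                          pos = 0;
                  }
                  return batch[pos++];
          }

          static uint16_t get_random_u16(void)
          {
                  static uint16_t batch[32];
                  static size_t pos = 32;                 /* element index, not bytes */

                  if (pos == 32) {
                          fill_random(batch, sizeof(batch));
                          pos = 0;
                  }
                  return batch[pos++];
          }

          int main(void)
          {
                  /* A caller that previously did get_random_u32() & 0xff: */
                  printf("u8 = %u, u16 = %u\n", get_random_u8(), get_random_u16());
                  return 0;
          }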
    • utsname: contribute changes to RNG · 37608ba3
      Committed by Jason A. Donenfeld
      On some small machines with little entropy, a quasi-unique hostname is
      sometimes a relevant factor. I've seen, for example, 8 character
      alpha-numeric serial numbers. In addition, the time at which the hostname
      is set is usually a decent measurement of how long early boot took. So,
      call add_device_randomness() on new hostnames, which feeds its arguments
      to the RNG in addition to a fresh cycle counter.
      
      Low cost hooks like this never hurt and can only ever help, and since
      this costs basically nothing for an operation that is never a fast path,
      this is an overall easy win.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
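      A stubbed, self-contained sketch of the hook described above; set_hostname()
      and add_device_randomness_stub() are simplified stand-ins for the kernel's
      sethostname path and add_device_randomness(), which mixes the buffer plus a
      fresh cycle counter into the input pool without crediting any entropy.

          #include <stdio.h>
          #include <string.h>

          static char hostname[65];

          /* Stand-in for the kernel's add_device_randomness(buf, len). */
          static void add_device_randomness_stub(const void *buf, size_t len)
          {
                  printf("mixing %zu hostname bytes into the pool (no entropy credited)\n", len);
          }

          /* Simplified sethostname(): store the new name, then feed it to the RNG.
           * On small machines a quasi-unique, serial-number-style hostname and the
           * moment it gets set are both worth folding in. */
          static int set_hostname(const char *name, size_t len)
          {
                  if (len >= sizeof(hostname))
                          return -1;
                  memcpy(hostname, name, len);
                  hostname[len] = '\0';
                  add_device_randomness_stub(hostname, len);
                  return 0;
          }

          int main(void)
          {
                  const char *name = "node-8f3k2q1z";   /* hypothetical serial-style name */
                  return set_hostname(name, strlen(name)) ? 1 : 0;
          }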
    • random: use init_utsname() instead of utsname() · dd54fd7d
      Committed by Jason A. Donenfeld
      Rather than going through the current-> indirection for utsname(), use
      init_utsname() directly: at this point in boot, init_utsname() == utsname()
      anyway. Additionally, init_utsname() appears to be available nearly always,
      so move the call into random_init_early().
      Suggested-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
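      Roughly, the change amounts to the following kernel-flavored sketch (an
      approximation, not the literal diff): mix the boot-time uts data in from
      random_init_early() via the global init_utsname(), which does not depend on
      'current' and is therefore safe this early.

          /* Approximate shape of the hook; surrounding code is elided. */
          void __init random_init_early(const char *command_line)
          {
                  /* init_utsname() == utsname() at this point, without the
                   * current->nsproxy indirection. */
                  add_device_randomness(init_utsname(), sizeof(*(init_utsname())));
                  /* ... arch randomness, command line, cycle counter, etc. ... */
          }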
    • kfence: use better stack hash seed · 08475dab
      Committed by Jason A. Donenfeld
      As of the prior commit, the RNG will have incorporated both a cycle
      counter value and RDRAND, in addition to various other environmental
      noise. Therefore, using get_random_u32() will supply a stronger seed
      than simply using random_get_entropy(). N.B.: random_get_entropy()
      should be considered an internal API of random.c and not generally
      consumed.
      
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Marco Elver <elver@google.com>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
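      In the spirit of the commit above, the sketch below seeds a stack-trace hash
      from get_random_u32() rather than from the raw cycle counter; the variable
      and function names are illustrative, not necessarily those used in mm/kfence.

          /* Illustrative only; the real seed lives in the kfence code. */
          static u32 stack_hash_seed __read_mostly;

          static void __init seed_stack_hash(void)
          {
                  /* Before: stack_hash_seed = (u32)random_get_entropy();
                   * random_get_entropy() is effectively internal to random.c, and
                   * by this point the RNG has already mixed in a cycle counter,
                   * RDRAND and other early environmental noise, so
                   * get_random_u32() gives a stronger seed. */
                  stack_hash_seed = get_random_u32();
          }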
    • random: split initialization into early step and later step · f6238499
      Committed by Jason A. Donenfeld
      The full RNG initialization relies on some timestamps, made possible
      with initialization functions like time_init() and timekeeping_init().
      However, these are only available rather late in initialization.
      Meanwhile, other things, such as memory allocator functions, make use of
      the RNG much earlier.
      
      So split RNG initialization into two phases. We can provide arch
      randomness very early on, and then later, after timekeeping and such are
      available, initialize the rest.
      
      This ensures that, for example, slabs are properly randomized if RDRAND
      is available. Without this, CONFIG_SLAB_FREELIST_RANDOM=y loses a degree
      of its security, because its random seed is potentially deterministic,
      since it hasn't yet incorporated RDRAND. It also makes it possible to
      use a better seed in kfence, which currently relies on only the cycle
      counter.
      
      Another positive consequence is that on systems with RDRAND, running
      with CONFIG_WARN_ALL_UNSEEDED_RANDOM=y results in no warnings at all.
      
      One subtle side effect of this change is that on systems with no RDRAND,
      RDTSC is now only queried by random_init() once, committing the moment
      of the function call, instead of multiple times as before. This is
      intentional, as the multiple RDTSCs in a loop before weren't
      accomplishing very much, with jitter being better provided by
      try_to_generate_entropy(). Plus, filling blocks with RDTSC is still
      being done in extract_entropy(), which is necessarily called before
      random bytes are served anyway.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
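      The two-phase shape reads roughly as follows; this is a conceptual sketch
      with simplified bodies and an invented helper name (seed_from_arch_random()),
      not the literal patch. The early step can rely only on arch-provided
      randomness, while the later step runs once timekeeping exists.

          /*
           * Phase 1: called very early in boot, before the allocators that want
           * randomness (CONFIG_SLAB_FREELIST_RANDOM, kfence).  Only arch
           * randomness (RDRAND/RDSEED, cycle counter) and firmware seeds are
           * usable here.
           */
          void __init random_init_early(const char *command_line)
          {
                  seed_from_arch_random();        /* invented name for the RDRAND path */
                  add_device_randomness(command_line, strlen(command_line));
          }

          /*
           * Phase 2: called after time_init()/timekeeping_init(), so real
           * timestamps exist.  A single cycle-counter read commits the moment of
           * the call; the old loop of repeated RDTSC reads is gone, jitter being
           * better provided by try_to_generate_entropy().
           */
          void __init random_init(void)
          {
                  unsigned long entropy = random_get_entropy();   /* one read, not a loop */

                  add_device_randomness(&entropy, sizeof(entropy));
                  /* ... credit arch entropy, mark the CRNG ready if warranted ... */
          }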
  2. 29 September 2022 (2 commits)
    • random: use expired timer rather than wq for mixing fast pool · 748bc4dd
      Committed by Jason A. Donenfeld
      Previously, the fast pool was dumped into the main pool periodically in
      the fast pool's hard IRQ handler. This worked fine and there weren't
      problems with it, until RT came around. Since RT converts spinlocks into
      sleeping locks, problems cropped up. Rather than switching to raw
      spinlocks, the RT developers preferred we make the transformation from
      originally doing:
      
          do_some_stuff()
          spin_lock()
          do_some_other_stuff()
          spin_unlock()
      
      to doing:
      
          do_some_stuff()
          queue_work_on(some_other_stuff_worker)
      
      This is an ordinary pattern done all over the kernel. However, Sherry
      noticed a 10% performance regression in qperf TCP over a 40gbps
      InfiniBand card. Quoting her message:
      
      > MT27500 Family [ConnectX-3] cards:
      > Infiniband device 'mlx4_0' port 1 status:
      > default gid: fe80:0000:0000:0000:0010:e000:0178:9eb1
      > base lid: 0x6
      > sm lid: 0x1
      > state: 4: ACTIVE
      > phys state: 5: LinkUp
      > rate: 40 Gb/sec (4X QDR)
      > link_layer: InfiniBand
      >
      > Cards are configured with IP addresses on private subnet for IPoIB
      > performance testing.
      > Regression identified in this bug is in TCP latency in this stack as reported
      > by qperf tcp_lat metric:
      >
      > We have one system listen as a qperf server:
      > [root@yourQperfServer ~]# qperf
      >
      > Have the other system connect to qperf server as a client (in this
      > case, it’s X7 server with Mellanox card):
      > [root@yourQperfClient ~]# numactl -m0 -N0 qperf 20.20.20.101 -v -uu -ub --time 60 --wait_server 20 -oo msg_size:4K:1024K:*2 tcp_lat
      
      Rather than incur the scheduling latency from queue_work_on, we can
      instead switch to running on the next timer tick, on the same core. This
      also batches things a bit more -- once per jiffy -- which is okay now
      that mix_interrupt_randomness() can credit multiple bits at once.
      Reported-by: Sherry Yang <sherry.yang@oracle.com>
      Tested-by: Paul Webb <paul.x.webb@oracle.com>
      Cc: Sherry Yang <sherry.yang@oracle.com>
      Cc: Phillip Goerl <phillip.goerl@oracle.com>
      Cc: Jack Vogel <jack.vogel@oracle.com>
      Cc: Nicky Veitch <nicky.veitch@oracle.com>
      Cc: Colm Harrington <colm.harrington@oracle.com>
      Cc: Ramanan Govindarajan <ramanan.govindarajan@oracle.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Sultan Alsawaf <sultan@kerneltoast.com>
      Cc: stable@vger.kernel.org
      Fixes: 58340f8e ("random: defer fast pool mixing to worker")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
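      A hedged kernel-flavored sketch of the pattern (field and function names
      follow the description above, and initialization via timer_setup() is
      elided; this is not the exact diff): the per-CPU fast pool owns a timer
      whose expiry is set to "now", so mixing runs from the timer softirq on the
      next tick on the same CPU, avoiding the queue_work_on() scheduling
      round-trip.

          /* Sketch only; the timer is assumed to have been set up elsewhere with
           * timer_setup(&fast_pool->mix, mix_interrupt_randomness, 0). */
          struct fast_pool {
                  unsigned long pool[4];
                  unsigned long last;
                  unsigned int count;
                  struct timer_list mix;          /* was: struct work_struct mix */
          };

          static void mix_interrupt_randomness(struct timer_list *t)
          {
                  struct fast_pool *fast_pool = container_of(t, struct fast_pool, mix);

                  /* ... dump the fast pool into the input pool and credit
                   * multiple bits at once ... */
          }

          static void schedule_mixing(struct fast_pool *fast_pool)
          {
                  /* An already-expired timer fires on the next tick, on this CPU. */
                  if (!timer_pending(&fast_pool->mix)) {
                          fast_pool->mix.expires = jiffies;
                          add_timer_on(&fast_pool->mix, raw_smp_processor_id());
                  }
          }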
    • random: avoid reading two cache lines on irq randomness · 9ee0507e
      Committed by Jason A. Donenfeld
      In order to avoid reading and dirtying two cache lines on every IRQ, move
      the work_struct to the bottom of the fast_pool struct.
      add_interrupt_randomness() always touches .pool and .count, which are
      currently split across cache lines because .mix pushes everything down.
      Instead, move .mix to the bottom, so that .pool and .count always sit in
      the first cache line, since .mix is only accessed when the pool is full.
      
      Fixes: 58340f8e ("random: defer fast pool mixing to worker")
      Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
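      The layout argument can be checked with a small standalone C program; the
      member sizes below are stand-ins rather than the kernel's actual struct
      fast_pool, but they show how moving the rarely-used large member to the end
      keeps the per-IRQ fields inside the first 64-byte cache line.

          #include <stdio.h>
          #include <stddef.h>

          /* Stand-in for a work_struct/timer_list sized member that is only
           * touched when the pool is full (64 bytes on a 64-bit build). */
          struct deferred_work { void *func; void *data; long pad[6]; };

          struct fast_pool_before {       /* .mix first: hot fields pushed past 64 bytes */
                  struct deferred_work mix;
                  unsigned long pool[4];
                  unsigned long last;
                  unsigned int count;
          };

          struct fast_pool_after {        /* hot fields first, cold field last */
                  unsigned long pool[4];
                  unsigned long last;
                  unsigned int count;
                  struct deferred_work mix;
          };

          int main(void)
          {
                  printf("before: .count at offset %zu\n", offsetof(struct fast_pool_before, count));
                  printf("after:  .count at offset %zu\n", offsetof(struct fast_pool_after, count));
                  return 0;
          }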
  3. 23 September 2022 (8 commits)
    • random: clamp credited irq bits to maximum mixed · e78a802a
      Committed by Jason A. Donenfeld
      Since at most sizeof(long) * 2 bytes are mixed into the pool, don't
      credit more than that many bytes of entropy.
      
      Fixes: e3e33fc2 ("random: do not use input pool from hard IRQs")
      Cc: stable@vger.kernel.org
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
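      In sketch form (the helper name is hypothetical): whatever estimate is
      derived from the interrupt count gets clamped to the number of bits actually
      mixed, i.e. sizeof(long) * 2 bytes' worth.

          /* Never credit more entropy than the bytes that were actually mixed in. */
          static unsigned int clamp_irq_credit(unsigned int estimated_bits)
          {
                  const unsigned int max_bits = sizeof(long) * 2 * 8;   /* bytes -> bits */

                  return estimated_bits < max_bits ? estimated_bits : max_bits;
          }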
    • random: throttle hwrng writes if no entropy is credited · d775335e
      Committed by Jason A. Donenfeld
      If a hwrng source does not provide an entropy estimate, it currently
      does not contribute at all to the CRNG. In order to help fix this, in
      case add_hwgenerator_randomness() is called with the entropy parameter
      set to zero, go to sleep until one reseed interval has passed.
      
      While the hwrng thread currently only runs under conditions where this
      is non-zero, this change is not harmful and prepares for future updates
      to the hwrng core.
      
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
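      Sketched out, the behavior looks roughly like this; the reseed-interval
      helper name and the internal mix/credit calls are assumptions about random.c
      internals rather than a quote of the patch. When a hardware RNG contributes
      bytes with a zero entropy estimate, the calling thread sleeps for about one
      reseed interval so it cannot busy-loop feeding the pool.

          /* Sketch of add_hwgenerator_randomness(); error/stop handling elided. */
          void add_hwgenerator_randomness_sketch(const void *buf, size_t len,
                                                 size_t entropy_bits)
          {
                  mix_pool_bytes(buf, len);                 /* always mix the input  */

                  if (entropy_bits) {
                          credit_init_bits(entropy_bits);   /* credit when estimated */
                  } else {
                          /* No credit: sleep roughly one reseed interval so a
                           * zero-entropy hwrng does not spin. */
                          schedule_timeout_interruptible(crng_reseed_interval());
                  }
          }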
    • random: use hwgenerator randomness more frequently at early boot · 745558f9
      Committed by Dominik Brodowski
      Mix in randomness from hw-rng sources more frequently during early
      boot, approximately once for every rng reseed.
      Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: restore O_NONBLOCK support · cd4f24ae
      Committed by Jason A. Donenfeld
      Prior to 5.6, when /dev/random was opened with O_NONBLOCK, it would
      return -EAGAIN if there was no entropy. When the pools were unified in
      5.6, this was lost. The post 5.6 behavior of blocking until the pool is
      initialized, and ignoring O_NONBLOCK in the process, went unnoticed,
      with no reports about the regression received for two and a half years.
      However, eventually this indeed did break somebody's userspace.
      
      So we restore the old behavior, by returning -EAGAIN if the pool is not
      initialized. Unlike the old /dev/random, this can only occur during
      early boot, after which it never blocks again.
      
      In order to make this O_NONBLOCK behavior consistent with other
      expectations, also respect users reading with preadv2(RWF_NOWAIT) and
      similar.
      
      Fixes: 30c08efe ("random: make /dev/random be almost like /dev/urandom")
      Reported-by: Guozihua <guozihua@huawei.com>
      Reported-by: Zhongguohua <zhongguohua1@huawei.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Andrew Lutomirski <luto@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
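      From userspace, the restored semantics can be exercised with ordinary POSIX
      I/O, as in the runnable example below: during early boot a non-blocking read
      of /dev/random may now fail with EAGAIN again instead of blocking, and once
      the pool is initialized it always succeeds.

          #include <errno.h>
          #include <fcntl.h>
          #include <stdio.h>
          #include <string.h>
          #include <unistd.h>

          int main(void)
          {
                  unsigned char buf[16];
                  ssize_t n;
                  int fd = open("/dev/random", O_RDONLY | O_NONBLOCK);

                  if (fd < 0) {
                          perror("open");
                          return 1;
                  }

                  n = read(fd, buf, sizeof(buf));
                  if (n < 0 && errno == EAGAIN) {
                          /* Pool not yet initialized: only possible during early boot. */
                          fprintf(stderr, "no entropy yet: %s\n", strerror(errno));
                  } else if (n < 0) {
                          perror("read");
                  } else {
                          printf("read %zd random bytes\n", n);
                  }

                  close(fd);
                  return 0;
          }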
    • Merge tag 'net-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 504c25cb
      Committed by Linus Torvalds
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from wifi, netfilter and can.
      
        A handful of awaited fixes here - revert of the FEC changes, bluetooth
        fix, fixes for iwlwifi spew.
      
        We added a warning in PHY/MDIO code which is triggering on a couple of
        platforms in a false-positive-ish way. If we can't iron that out over
        the week we'll drop it and re-add for 6.1.
      
        I've added a new "follow up fixes" section for fixes to fixes in
        6.0-rcs but it may actually give the false impression that those are
        problematic or that more testing time would have caught them. So
        likely a one time thing.
      
        Follow up fixes:
      
         - nf_tables_addchain: fix nft_counters_enabled underflow
      
         - ebtables: fix memory leak when blob is malformed
      
         - nf_ct_ftp: fix deadlock when nat rewrite is needed
      
        Current release - regressions:
      
         - Revert "fec: Restart PPS after link state change" and the related
           "net: fec: Use a spinlock to guard `fep->ptp_clk_on`"
      
         - Bluetooth: fix HCIGETDEVINFO regression
      
         - wifi: mt76: fix 5 GHz connection regression on mt76x0/mt76x2
      
         - mptcp: fix fwd memory accounting on coalesce
      
         - rwlock removal fall out:
            - ipmr: always call ip{,6}_mr_forward() from RCU read-side
              critical section
            - ipv6: fix crash when IPv6 is administratively disabled
      
         - tcp: read multiple skbs in tcp_read_skb()
      
         - mdio_bus_phy_resume state warning fallout:
            - eth: ravb: fix PHY state warning splat during system resume
            - eth: sh_eth: fix PHY state warning splat during system resume
      
        Current release - new code bugs:
      
         - wifi: iwlwifi: don't spam logs with NSS>2 messages
      
         - eth: mtk_eth_soc: enable XDP support just for MT7986 SoC
      
        Previous releases - regressions:
      
         - bonding: fix NULL deref in bond_rr_gen_slave_id
      
         - wifi: iwlwifi: mark IWLMEI as broken
      
        Previous releases - always broken:
      
         - nf_conntrack helpers:
            - irc: tighten matching on DCC message
            - sip: fix ct_sip_walk_headers
            - osf: fix possible bogus match in nf_osf_find()
      
         - ipvlan: fix out-of-bound bugs caused by unset skb->mac_header
      
         - core: fix flow symmetric hash
      
         - bonding, team: unsync device addresses on ndo_stop
      
         - phy: micrel: fix shared interrupt on LAN8814"
      
      * tag 'net-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
        selftests: forwarding: add shebang for sch_red.sh
        bnxt: prevent skb UAF after handing over to PTP worker
        net: marvell: Fix refcounting bugs in prestera_port_sfp_bind()
        net: sched: fix possible refcount leak in tc_new_tfilter()
        net: sunhme: Fix packet reception for len < RX_COPY_THRESHOLD
        udp: Use WARN_ON_ONCE() in udp_read_skb()
        selftests: bonding: cause oops in bond_rr_gen_slave_id
        bonding: fix NULL deref in bond_rr_gen_slave_id
        net: phy: micrel: fix shared interrupt on LAN8814
        net/smc: Stop the CLC flow if no link to map buffers on
        ice: Fix ice_xdp_xmit() when XDP TX queue number is not sufficient
        net: atlantic: fix potential memory leak in aq_ndev_close()
        can: gs_usb: gs_usb_set_phys_id(): return with error if identify is not supported
        can: gs_usb: gs_can_open(): fix race dev->can.state condition
        can: flexcan: flexcan_mailbox_read() fix return value for drop = true
        net: sh_eth: Fix PHY state warning splat during system resume
        net: ravb: Fix PHY state warning splat during system resume
        netfilter: nf_ct_ftp: fix deadlock when nat rewrite is needed
        netfilter: ebtables: fix memory leak when blob is malformed
        netfilter: nf_tables: fix percpu memory leak at nf_tables_addchain()
        ...
    • Merge tag 'efi-urgent-for-v6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi · 129e7152
      Committed by Linus Torvalds
      Pull EFI fixes from Ard Biesheuvel:
      
       - Use the right variable to check for shim insecure mode
      
       - Wipe setup_data field when booting via EFI
      
       - Add missing error check to efibc driver
      
      * tag 'efi-urgent-for-v6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
        efi: libstub: check Shim mode using MokSBStateRT
        efi: x86: Wipe setup_data on pure EFI boot
        efi: efibc: Guard against allocation failure
    • Merge tag 'gpio-fixes-for-v6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · 5e0a93e4
      Committed by Linus Torvalds
      Pull gpio fixes from Bartosz Golaszewski:
      
       - fix a NULL-pointer dereference at driver unbind and a potential
         resource leak in error path in gpio-mockup
      
       - make the irqchip immutable in gpio-ftgpio010
      
       - fix dereferencing a potentially uninitialized variable in gpio-tqmx86
      
       - fix interrupt registering in gpiolib's character device code
      
      * tag 'gpio-fixes-for-v6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
        gpiolib: cdev: Set lineevent_state::irq after IRQ register successfully
        gpio: tqmx86: fix uninitialized variable girq
        gpio: ftgpio010: Make irqchip immutable
        gpio: mockup: Fix potential resource leakage when register a chip
        gpio: mockup: fix NULL pointer dereference when removing debugfs
    • Merge tag 'perf-tools-fixes-for-v6.0-2022-09-21' of... · 9597f088
      Committed by Linus Torvalds
      Merge tag 'perf-tools-fixes-for-v6.0-2022-09-21' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull perf tools fixes from Arnaldo Carvalho de Melo:
      
       - Fix polling of system-wide events related to mixing per-cpu and
         per-thread events.
      
       - Do not check if /proc/modules is unchanged when copying /proc/kcore,
         that doesn't get in the way of post processing analysis.
      
       - Include program header in ELF files generated for JIT files, so that
         they can be opened by tools using elfutils libraries.
      
       - Enter namespaces when synthesizing build-ids.
      
       - Fix some bugs related to a recent cpu_map overhaul where we should be
         using an index and not the cpu number.
      
       - Fix BPF program ELF section name, using the naming expected by libbpf
         when using BPF counters in 'perf stat'.
      
       - Add a new test for perf stat cgroup BPF counter.
      
       - Adjust check on 'perf test wp' for older kernels, where the
         PERF_EVENT_IOC_MODIFY_ATTRIBUTES ioctl isn't supported.
      
       - Sync x86 cpufeatures with the kernel sources, no changes in tooling.
      
      * tag 'perf-tools-fixes-for-v6.0-2022-09-21' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
        perf tools: Honor namespace when synthesizing build-ids
        tools headers cpufeatures: Sync with the kernel sources
        perf kcore_copy: Do not check /proc/modules is unchanged
        libperf evlist: Fix polling of system-wide events
        perf record: Fix cpu mask bit setting for mixed mmaps
        perf test: Skip wp modify test on old kernels
        perf jit: Include program header in ELF files
        perf test: Add a new test for perf stat cgroup BPF counter
        perf stat: Use evsel->core.cpus to iterate cpus in BPF cgroup counters
        perf stat: Fix cpu map index in bperf cgroup code
        perf stat: Fix BPF program section name
  4. 22 September 2022 (24 commits)