1. 30 September 2022 (3 commits)
    • random: add 8-bit and 16-bit batches · 585cd5fe
      Jason A. Donenfeld committed
      There are numerous places in the kernel that would be sped up by having
      smaller batches. Currently those callsites do `get_random_u32() & 0xff`
      or similar. Since these are pretty spread out, and will require patches
      to multiple different trees, let's get ahead of the curve and lay the
      foundation for `get_random_u8()` and `get_random_u16()`, so that it's
      then possible to start submitting conversion patches at leisure (see
      the sketch after this entry).
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
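      A minimal sketch of the conversion this lays the groundwork for; the
      two callsites below are hypothetical, only the get_random_*() calls
      come from the commit:

          #include <linux/random.h>

          /* Before: a full 32-bit batched value is pulled and 24 of its
           * bits are thrown away. */
          static u8 random_byte_before(void)
          {
                  return get_random_u32() & 0xff;
          }

          /* After: the dedicated 8-bit batch serves the request directly,
           * so the batched entropy stretches four times as far. */
          static u8 random_byte_after(void)
          {
                  return get_random_u8();
          }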
    • random: use init_utsname() instead of utsname() · dd54fd7d
      Jason A. Donenfeld committed
      Rather than going through the current-> indirection for utsname: at
      this point in boot, init_utsname() == utsname(), so just use
      init_utsname() directly. Additionally, init_utsname() appears to be
      available nearly always, so move its use into random_init_early()
      (see the sketch after this entry).
      Suggested-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
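      For context, the two accessors differ only in where the name lives;
      the definitions below mirror include/linux/utsname.h, and the final
      function is a sketch of the call site (_mix_pool_bytes() is
      random.c's internal mixer):

          #include <linux/utsname.h>

          static inline struct new_utsname *utsname(void)
          {
                  return &current->nsproxy->uts_ns->name; /* current-> indirection */
          }

          static inline struct new_utsname *init_utsname(void)
          {
                  return &init_uts_ns.name;               /* static, always there */
          }

          /* Sketch of the usage in random_init_early(): */
          static void __init mix_utsname(void)
          {
                  _mix_pool_bytes(init_utsname(), sizeof(*(init_utsname())));
          }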
    • random: split initialization into early step and later step · f6238499
      Jason A. Donenfeld committed
      The full RNG initialization relies on some timestamps, made possible
      with initialization functions like time_init() and timekeeping_init().
      However, these are only available rather late in initialization.
      Meanwhile, other things, such as memory allocator functions, make use of
      the RNG much earlier.
      
      So split RNG initialization into two phases. We can provide arch
      randomness very early on, and then later, after timekeeping and such
      are available, initialize the rest (see the sketch after this entry).
      
      This ensures that, for example, slabs are properly randomized if RDRAND
      is available. Without this, CONFIG_SLAB_FREELIST_RANDOM=y loses a degree
      of its security, because its random seed is potentially deterministic,
      since it hasn't yet incorporated RDRAND. It also makes it possible to
      use a better seed in kfence, which currently relies on only the cycle
      counter.
      
      Another positive consequence is that on systems with RDRAND, running
      with CONFIG_WARN_ALL_UNSEEDED_RANDOM=y results in no warnings at all.
      
      One subtle side effect of this change is that on systems with no RDRAND,
      RDTSC is now only queried by random_init() once, committing the moment
      of the function call, instead of multiple times as before. This is
      intentional, as the multiple RDTSCs in a loop before weren't
      accomplishing very much, with jitter being better provided by
      try_to_generate_entropy(). Plus, filling blocks with RDTSC is still
      being done in extract_entropy(), which is necessarily called before
      random bytes are served anyway.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
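      A simplified sketch of the split; the function names follow the
      commit, but the bodies here are illustrative assumptions, not the
      exact upstream code:

          /* Phase 1, very early: arch randomness (e.g. RDRAND) only, so
           * early users such as the slab freelist randomizer already get
           * a non-deterministic seed. */
          void __init random_init_early(const char *command_line)
          {
                  unsigned long v;
                  int i;

                  for (i = 0; i < 4; i++)
                          if (arch_get_random_seed_long(&v) ||
                              arch_get_random_long(&v))
                                  _mix_pool_bytes(&v, sizeof(v));
                  _mix_pool_bytes(command_line, strlen(command_line));
          }

          /* Phase 2, after time_init()/timekeeping_init(): fold in clock
           * entropy, committing the moment of the call with one counter
           * read instead of the old RDTSC loop. */
          void __init random_init(void)
          {
                  unsigned long entropy = random_get_entropy();
                  ktime_t now = ktime_get_real();

                  _mix_pool_bytes(&now, sizeof(now));
                  _mix_pool_bytes(&entropy, sizeof(entropy));
          }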
  2. 29 September 2022 (2 commits)
    • random: use expired timer rather than wq for mixing fast pool · 748bc4dd
      Jason A. Donenfeld committed
      Previously, the fast pool was dumped into the main pool periodically in
      the fast pool's hard IRQ handler. This worked fine and there weren't
      problems with it, until RT came around. Since RT converts spinlocks into
      sleeping locks, problems cropped up. Rather than switching to raw
      spinlocks, the RT developers preferred we make the transformation from
      originally doing:
      
          do_some_stuff()
          spin_lock()
          do_some_other_stuff()
          spin_unlock()
      
      to doing:
      
          do_some_stuff()
          queue_work_on(some_other_stuff_worker)
      
      This is an ordinary pattern done all over the kernel. However, Sherry
      noticed a 10% performance regression in qperf TCP over a 40gbps
      InfiniBand card. Quoting her message:
      
      > MT27500 Family [ConnectX-3] cards:
      > Infiniband device 'mlx4_0' port 1 status:
      > default gid: fe80:0000:0000:0000:0010:e000:0178:9eb1
      > base lid: 0x6
      > sm lid: 0x1
      > state: 4: ACTIVE
      > phys state: 5: LinkUp
      > rate: 40 Gb/sec (4X QDR)
      > link_layer: InfiniBand
      >
      > Cards are configured with IP addresses on private subnet for IPoIB
      > performance testing.
      > Regression identified in this bug is in TCP latency in this stack as reported
      > by qperf tcp_lat metric:
      >
      > We have one system listen as a qperf server:
      > [root@yourQperfServer ~]# qperf
      >
      > Have the other system connect to qperf server as a client (in this
      > case, it’s X7 server with Mellanox card):
      > [root@yourQperfClient ~]# numactl -m0 -N0 qperf 20.20.20.101 -v -uu -ub --time 60 --wait_server 20 -oo msg_size:4K:1024K:*2 tcp_lat
      
      Rather than incur the scheduling latency of queue_work_on(), switch
      to running on the next timer tick, on the same core. This also
      batches things a bit more (once per jiffy), which is okay now that
      mix_interrupt_randomness() can credit multiple bits at once (see the
      sketch after this entry).
      Reported-by: Sherry Yang <sherry.yang@oracle.com>
      Tested-by: Paul Webb <paul.x.webb@oracle.com>
      Cc: Sherry Yang <sherry.yang@oracle.com>
      Cc: Phillip Goerl <phillip.goerl@oracle.com>
      Cc: Jack Vogel <jack.vogel@oracle.com>
      Cc: Nicky Veitch <nicky.veitch@oracle.com>
      Cc: Colm Harrington <colm.harrington@oracle.com>
      Cc: Ramanan Govindarajan <ramanan.govindarajan@oracle.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Sultan Alsawaf <sultan@kerneltoast.com>
      Cc: stable@vger.kernel.org
      Fixes: 58340f8e ("random: defer fast pool mixing to worker")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
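      A sketch of the replacement, with assumed field and helper names:
      arming a timer whose expiry is "now" makes it fire on the next tick,
      on the same core, without going through the scheduler at all:

          #include <linux/timer.h>

          struct fast_pool {
                  unsigned long pool[4];
                  unsigned long last;
                  unsigned int count;
                  struct timer_list mix;  /* was: struct work_struct mix */
          };

          /* Called from the hard IRQ handler once the pool is full. */
          static void defer_mixing(struct fast_pool *fast_pool)
          {
                  if (!timer_pending(&fast_pool->mix)) {
                          fast_pool->mix.expires = jiffies; /* already expired */
                          add_timer_on(&fast_pool->mix, raw_smp_processor_id());
                  }
          }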
    • random: avoid reading two cache lines on irq randomness · 9ee0507e
      Jason A. Donenfeld committed
      In order to avoid reading and dirtying two cache lines on every IRQ,
      move the work_struct to the bottom of the fast_pool struct.
      add_interrupt_randomness() always touches .pool and .count, which are
      currently split across cache lines because .mix pushes everything
      down. Instead, move .mix to the bottom, so that .pool and .count
      always share the first cache line; .mix is only accessed when the
      pool is full (see the layout sketch after this entry).
      
      Fixes: 58340f8e ("random: defer fast pool mixing to worker")
      Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
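      The layouts before and after, abridged; field names follow the
      commit text:

          #include <linux/workqueue.h>

          /* Before: the work_struct at the top pushes the hot members
           * apart, so the IRQ path touches two cache lines. */
          struct fast_pool_before {
                  struct work_struct mix;
                  unsigned long pool[4];
                  unsigned long last;
                  unsigned int count;
          };

          /* After: .pool and .count sit together in the first cache line;
           * .mix only matters when the pool fills and must be flushed. */
          struct fast_pool_after {
                  unsigned long pool[4];
                  unsigned long last;
                  unsigned int count;
                  struct work_struct mix;
          };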
  3. 23 September 2022 (4 commits)
  4. 22 September 2022 (8 commits)
    • bnxt: prevent skb UAF after handing over to PTP worker · c31f26c8
      Jakub Kicinski committed
      When reading the timestamp is required, bnxt_tx_int() hands
      ownership of the completed skb over to the PTP worker. The skb must
      not be used afterwards, as the worker may run before the rest of our
      code and free it, leading to a use-after-free.
      
      Since dev_kfree_skb_any() accepts NULL, make the loss of ownership
      more obvious by setting skb to NULL (see the sketch after this
      entry).
      
      Fixes: 83bb623c ("bnxt_en: Transmit and retrieve packet timestamps")
      Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
      Reviewed-by: Michael Chan <michael.chan@broadcom.com>
      Link: https://lore.kernel.org/r/20220921201005.335390-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
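      A sketch of the pattern; the hand-off helper is hypothetical, only
      the NULL-ing and the dev_kfree_skb_any() behavior come from the
      commit:

          #include <linux/netdevice.h>

          void ptp_hand_off_skb(struct sk_buff *skb); /* hypothetical hand-off */

          static void tx_complete_fragment(struct sk_buff *skb, bool needs_tstamp)
          {
                  if (needs_tstamp) {
                          ptp_hand_off_skb(skb);  /* worker owns the skb now */
                          skb = NULL;             /* any later use would be a UAF */
                  }

                  /* Common cleanup: dev_kfree_skb_any() accepts NULL, so
                   * the lost ownership needs no extra branching here. */
                  dev_kfree_skb_any(skb);
          }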
    • net: marvell: Fix refcounting bugs in prestera_port_sfp_bind() · 3aac7ada
      Liang He committed
      In prestera_port_sfp_bind(), there are two refcounting bugs:
      (1) we should call of_node_get() before of_find_node_by_name(), as
      that function automatically decrements the refcount of its 'from'
      argument;
      (2) we should call of_node_put() on 'child' when breaking out of
      for_each_child_of_node(), as the iterator automatically takes and
      drops a reference on each child, and a break skips the final drop
      (see the sketch after this entry).
      
      Fixes: 52323ef7 ("net: marvell: prestera: add phylink support")
      Signed-off-by: Liang He <windhl@126.com>
      Reviewed-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
      Link: https://lore.kernel.org/r/20220921133245.4111672-1-windhl@126.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
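      A sketch of the corrected pattern; the node and property names are
      illustrative, not the driver's actual ones:

          #include <linux/of.h>

          static int sfp_bind_sketch(struct device_node *np)
          {
                  struct device_node *ports, *child;

                  /* (1) of_find_node_by_name() drops a reference on its
                   * 'from' argument, so take one first if 'np' must
                   * survive the call. */
                  of_node_get(np);
                  ports = of_find_node_by_name(np, "ports");

                  for_each_child_of_node(ports, child) {
                          if (of_property_read_bool(child, "sfp")) {
                                  /* (2) the iterator holds a reference on
                                   * 'child' during each pass and drops it
                                   * on the next; breaking out skips that
                                   * drop, so do it by hand. */
                                  of_node_put(child);
                                  break;
                          }
                  }

                  of_node_put(ports);
                  return 0;
          }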
    • net: sunhme: Fix packet reception for len < RX_COPY_THRESHOLD · 878e2405
      Sean Anderson committed
      There is a separate receive path for small packets (under 256 bytes).
      Instead of allocating a new DMA-capable skb to be used for the next
      packet, this path allocates an skb and copies the data into it
      (reusing the existing skb for the next packet). There are two bytes
      of junk data at the beginning of every packet; I believe these are
      inserted to allow aligned DMA and IP headers, and we skip over them
      with skb_reserve. Before copying over the data, we must sync the
      buffer so that we see the whole packet. The current code only
      synchronizes len bytes, starting from the beginning of the packet
      and including the junk bytes, which leaves off the final two bytes
      of the packet. Synchronize the whole packet (see the sketch after
      this entry).
      
      To reproduce this problem, ping a HME with a payload size between 17
      and 214:
      
      	$ ping -s 17 <hme_address>
      
      which will complain rather loudly about the data mismatch. Small packets
      (below 60 bytes on the wire) do not have this issue. I suspect this is
      related to the padding added to increase the minimum packet size.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Sean Anderson <seanga2@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20220920235018.1675956-1-seanga2@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
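      A sketch of the copy path with the fix applied; names are simplified,
      and the two-byte pad size comes from the commit text:

          #include <linux/dma-mapping.h>
          #include <linux/skbuff.h>

          /* The packet occupies len + 2 bytes in the DMA buffer: two pad
           * bytes for alignment plus the payload. The CPU sync must cover
           * len + 2, not len, or the last two payload bytes may be stale. */
          static struct sk_buff *copy_small_packet(struct net_device *ndev,
                                                   struct device *dev,
                                                   const u8 *buf,
                                                   dma_addr_t mapping, int len)
          {
                  struct sk_buff *copy = netdev_alloc_skb(ndev, len + 2);

                  if (!copy)
                          return NULL;

                  /* was: ... mapping, len, DMA_FROM_DEVICE); */
                  dma_sync_single_for_cpu(dev, mapping, len + 2, DMA_FROM_DEVICE);

                  skb_reserve(copy, 2);              /* skip pad, align IP header */
                  skb_put_data(copy, buf + 2, len);  /* copy the payload only */

                  dma_sync_single_for_device(dev, mapping, len + 2, DMA_FROM_DEVICE);
                  return copy;
          }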
    • bonding: fix NULL deref in bond_rr_gen_slave_id · 0e400d60
      Jonathan Toppins committed
      Fix a NULL dereference of the struct bonding rr_tx_counter member:
      if a bond is initially created with a mode other than zero (Round
      Robin), the memory backing the counter is never allocated, and
      nothing verifies or allocates it when the mode is later switched.
      
      This causes the following Oops on an aarch64 machine:
          [  334.686773] Unable to handle kernel paging request at virtual address ffff2c91ac905000
          [  334.694703] Mem abort info:
          [  334.697486]   ESR = 0x0000000096000004
          [  334.701234]   EC = 0x25: DABT (current EL), IL = 32 bits
          [  334.706536]   SET = 0, FnV = 0
          [  334.709579]   EA = 0, S1PTW = 0
          [  334.712719]   FSC = 0x04: level 0 translation fault
          [  334.717586] Data abort info:
          [  334.720454]   ISV = 0, ISS = 0x00000004
          [  334.724288]   CM = 0, WnR = 0
          [  334.727244] swapper pgtable: 4k pages, 48-bit VAs, pgdp=000008044d662000
          [  334.733944] [ffff2c91ac905000] pgd=0000000000000000, p4d=0000000000000000
          [  334.740734] Internal error: Oops: 96000004 [#1] SMP
          [  334.745602] Modules linked in: bonding tls veth rfkill sunrpc arm_spe_pmu vfat fat acpi_ipmi ipmi_ssif ixgbe igb i40e mdio ipmi_devintf ipmi_msghandler arm_cmn arm_dsu_pmu cppc_cpufreq acpi_tad fuse zram crct10dif_ce ast ghash_ce sbsa_gwdt nvme drm_vram_helper drm_ttm_helper nvme_core ttm xgene_hwmon
          [  334.772217] CPU: 7 PID: 2214 Comm: ping Not tainted 6.0.0-rc4-00133-g64ae13ed #4
          [  334.779950] Hardware name: GIGABYTE R272-P31-00/MP32-AR1-00, BIOS F18v (SCP: 1.08.20211002) 12/01/2021
          [  334.789244] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
          [  334.796196] pc : bond_rr_gen_slave_id+0x40/0x124 [bonding]
          [  334.801691] lr : bond_xmit_roundrobin_slave_get+0x38/0xdc [bonding]
          [  334.807962] sp : ffff8000221733e0
          [  334.811265] x29: ffff8000221733e0 x28: ffffdbac8572d198 x27: ffff80002217357c
          [  334.818392] x26: 000000000000002a x25: ffffdbacb33ee000 x24: ffff07ff980fa000
          [  334.825519] x23: ffffdbacb2e398ba x22: ffff07ff98102000 x21: ffff07ff981029c0
          [  334.832646] x20: 0000000000000001 x19: ffff07ff981029c0 x18: 0000000000000014
          [  334.839773] x17: 0000000000000000 x16: ffffdbacb1004364 x15: 0000aaaabe2f5a62
          [  334.846899] x14: ffff07ff8e55d968 x13: ffff07ff8e55db30 x12: 0000000000000000
          [  334.854026] x11: ffffdbacb21532e8 x10: 0000000000000001 x9 : ffffdbac857178ec
          [  334.861153] x8 : ffff07ff9f6e5a28 x7 : 0000000000000000 x6 : 000000007c2b3742
          [  334.868279] x5 : ffff2c91ac905000 x4 : ffff2c91ac905000 x3 : ffff07ff9f554400
          [  334.875406] x2 : ffff2c91ac905000 x1 : 0000000000000001 x0 : ffff07ff981029c0
          [  334.882532] Call trace:
          [  334.884967]  bond_rr_gen_slave_id+0x40/0x124 [bonding]
          [  334.890109]  bond_xmit_roundrobin_slave_get+0x38/0xdc [bonding]
          [  334.896033]  __bond_start_xmit+0x128/0x3a0 [bonding]
          [  334.901001]  bond_start_xmit+0x54/0xb0 [bonding]
          [  334.905622]  dev_hard_start_xmit+0xb4/0x220
          [  334.909798]  __dev_queue_xmit+0x1a0/0x720
          [  334.913799]  arp_xmit+0x3c/0xbc
          [  334.916932]  arp_send_dst+0x98/0xd0
          [  334.920410]  arp_solicit+0xe8/0x230
          [  334.923888]  neigh_probe+0x60/0xb0
          [  334.927279]  __neigh_event_send+0x3b0/0x470
          [  334.931453]  neigh_resolve_output+0x70/0x90
          [  334.935626]  ip_finish_output2+0x158/0x514
          [  334.939714]  __ip_finish_output+0xac/0x1a4
          [  334.943800]  ip_finish_output+0x40/0xfc
          [  334.947626]  ip_output+0xf8/0x1a4
          [  334.950931]  ip_send_skb+0x5c/0x100
          [  334.954410]  ip_push_pending_frames+0x3c/0x60
          [  334.958758]  raw_sendmsg+0x458/0x6d0
          [  334.962325]  inet_sendmsg+0x50/0x80
          [  334.965805]  sock_sendmsg+0x60/0x6c
          [  334.969286]  __sys_sendto+0xc8/0x134
          [  334.972853]  __arm64_sys_sendto+0x34/0x4c
          [  334.976854]  invoke_syscall+0x78/0x100
          [  334.980594]  el0_svc_common.constprop.0+0x4c/0xf4
          [  334.985287]  do_el0_svc+0x38/0x4c
          [  334.988591]  el0_svc+0x34/0x10c
          [  334.991724]  el0t_64_sync_handler+0x11c/0x150
          [  334.996072]  el0t_64_sync+0x190/0x194
          [  334.999726] Code: b9001062 f9403c02 d53cd044 8b040042 (b8210040)
          [  335.005810] ---[ end trace 0000000000000000 ]---
          [  335.010416] Kernel panic - not syncing: Oops: Fatal exception in interrupt
          [  335.017279] SMP: stopping secondary CPUs
          [  335.021374] Kernel Offset: 0x5baca8eb0000 from 0xffff800008000000
          [  335.027456] PHYS_OFFSET: 0x80000000
          [  335.030932] CPU features: 0x0000,0085c029,19805c82
          [  335.035713] Memory Limit: none
          [  335.038756] Rebooting in 180 seconds..
      
      The fix is to allocate the counter in bond_open(), which is
      guaranteed to be called before any packets are processed (see the
      sketch after this entry).
      
      Fixes: 848ca918 ("net: bonding: Use per-cpu rr_tx_counter")
      CC: Jussi Maki <joamaki@gmail.com>
      Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
      Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
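      A sketch of the fix as described: allocate the per-CPU counter
      unconditionally on open, so a later switch to round-robin mode
      always finds it present; the bodies are abridged:

          #include <linux/netdevice.h>
          #include <linux/percpu.h>

          static int bond_open(struct net_device *bond_dev)
          {
                  struct bonding *bond = netdev_priv(bond_dev);

                  bond->rr_tx_counter = alloc_percpu(u32);
                  if (!bond->rr_tx_counter)
                          return -ENOMEM;

                  /* ... rest of bond_open() ... */
                  return 0;
          }

          static int bond_close(struct net_device *bond_dev)
          {
                  struct bonding *bond = netdev_priv(bond_dev);

                  free_percpu(bond->rr_tx_counter);
                  bond->rr_tx_counter = NULL;
                  /* ... */
                  return 0;
          }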
    • net: phy: micrel: fix shared interrupt on LAN8814 · 2002fbac
      Michael Walle committed
      Since commit ece19502 ("net: phy: micrel: 1588 support for LAN8814
      phy") the handler always returns IRQ_HANDLED, except in an error case.
      Before that commit, the interrupt status register was checked, and if
      it was empty, IRQ_NONE was returned. Restore that behavior to play
      nice with other devices sharing the interrupt line (see the sketch
      after this entry).
      
      Fixes: ece19502 ("net: phy: micrel: 1588 support for LAN8814 phy")
      Signed-off-by: Michael Walle <michael@walle.cc>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Reviewed-by: Divya Koppera <Divya.Koppera@microchip.com>
      Link: https://lore.kernel.org/r/20220920141619.808117-1-michael@walle.cc
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
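      A sketch of the restored convention; the status register name and
      address below are placeholders, not the driver's actual ones:

          #include <linux/phy.h>

          #define INT_STATUS_REG 0x1d /* placeholder register address */

          /* Shared-IRQ contract: return IRQ_NONE when our status register
           * is empty, so the IRQ core can try the other handlers on the
           * line (and detect a genuinely stuck interrupt). */
          static irqreturn_t handle_interrupt_sketch(struct phy_device *phydev)
          {
                  int status = phy_read(phydev, INT_STATUS_REG);

                  if (status < 0)         /* read error */
                          return IRQ_NONE;

                  if (status == 0)        /* not our interrupt */
                          return IRQ_NONE;

                  /* ... acknowledge and process the pending events ... */
                  return IRQ_HANDLED;
          }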
    • efi: libstub: check Shim mode using MokSBStateRT · 5f56a74c
      Ard Biesheuvel committed
      We currently check the MokSBState variable to decide whether we should
      treat UEFI secure boot as being disabled, even if the firmware thinks
      otherwise. This is used by shim to indicate that it is not checking
      signatures on boot images. In the kernel, we use this to relax lockdown
      policies.
      
      However, in cases where shim is not even being used, we don't want this
      variable to interfere with lockdown, given that the variable may be
      non-volatile and therefore persist across a reboot. This means setting
      it once will persistently disable lockdown checks on a given system.
      
      So switch to the mirrored version of this variable, called MokSBStateRT,
      which is supposed to be volatile, a property we can actually check
      (see the sketch after this entry).
      
      Cc: <stable@vger.kernel.org> # v4.19+
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Reviewed-by: Peter Jones <pjones@redhat.com>
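      A sketch of the check, using libstub's get_efi_var() helper; the
      exact attribute test is an assumption based on the commit's reasoning
      that the RT variable must be volatile to be trusted:

          #include <linux/efi.h>
          /* get_efi_var() is the libstub-internal wrapper from efistub.h */

          static bool shim_validation_disabled(void)
          {
                  efi_guid_t shim_guid = EFI_SHIM_LOCK_GUID;
                  unsigned long size = sizeof(u8);
                  efi_status_t status;
                  u32 attr;
                  u8 val;

                  status = get_efi_var(L"MokSBStateRT", &shim_guid,
                                       &attr, &size, &val);

                  return status == EFI_SUCCESS &&
                         !(attr & EFI_VARIABLE_NON_VOLATILE) && /* must be volatile */
                         val == 1;
          }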
    • efi: x86: Wipe setup_data on pure EFI boot · 63bf28ce
      Ard Biesheuvel committed
      When booting the x86 kernel via EFI using the LoadImage/StartImage boot
      services [as opposed to the deprecated EFI handover protocol], the setup
      header is taken from the image directly, and given that EFI's LoadImage
      has no Linux/x86-specific knowledge of struct boot_params or
      struct setup_header, any absolute addresses in the setup header must
      originate from the file and not from a prior loading stage.
      
      Since we cannot generally predict where LoadImage() decides to load an
      image (*), such absolute addresses must be treated as suspect: even if a
      prior boot stage intended to make them point somewhere inside the
      [signed] image, there is no way to validate that, and if they point at
      an arbitrary location in memory, the setup_data nodes will not be
      covered by any signatures or TPM measurements either, and could be made
      to contain an arbitrary sequence of SETUP_xxx nodes, which could
      interfere quite badly with the early x86 boot sequence (see the
      sketch after this entry).
      
      (*) Note that, while LoadImage() does take a buffer/size tuple in
      addition to a device path, which can be used to provide the image
      contents directly, it will re-allocate such images, as the memory
      footprint of an image is generally larger than the PE/COFF file
      representation.
      
      Cc: <stable@vger.kernel.org> # v5.10+
      Link: https://lore.kernel.org/all/20220904165321.1140894-1-Jason@zx2c4.com/
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
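      A sketch of the hardening; the placement in the stub's PE entry path
      is assumed from the commit text, and the function shown is a
      fragment, not the full entry point:

          #include <linux/efi.h>
          #include <asm/bootparam.h>

          static efi_status_t pe_entry_fragment(struct boot_params *boot_params)
          {
                  /* The SETUP_xxx chain is rooted here; on a pure
                   * LoadImage/StartImage boot nothing vouches for it,
                   * so drop it. */
                  boot_params->hdr.setup_data = 0;

                  return EFI_SUCCESS;
          }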
    • ice: Fix ice_xdp_xmit() when XDP TX queue number is not sufficient · 114f398d
      Larysa Zaremba committed
      The original patch added the static branch to handle the situation,
      when assigning an XDP TX queue to every CPU is not possible,
      so they have to be shared.
      
      However, in the XDP transmit handler ice_xdp_xmit(), an error was
      returned in such cases even before the static condition was checked,
      which still made queue sharing impossible (see the sketch after this
      entry).
      
      Fixes: 22bf877e ("ice: introduce XDP_TX fallback path")
      Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
      Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Link: https://lore.kernel.org/r/20220919134346.25030-1-larysa.zaremba@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
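      A sketch of the corrected ordering; the static key here stands in
      for the driver's ice_xdp_locking_key, and the helper is illustrative:

          #include <linux/jump_label.h>
          #include <linux/errno.h>

          DEFINE_STATIC_KEY_FALSE(xdp_locking_key);

          /* A CPU index beyond the queue count is only an error when
           * queue sharing (the static branch) is off; with sharing on,
           * remap onto the smaller queue set instead. */
          static int pick_xdp_queue(unsigned int cpu, unsigned int num_xdp_txq)
          {
                  if (static_branch_unlikely(&xdp_locking_key))
                          return cpu % num_xdp_txq; /* shared queues, lock on xmit */

                  if (cpu >= num_xdp_txq)
                          return -ENXIO; /* previously hit before the check above */

                  return cpu;
          }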
  5. 21 September 2022 (19 commits)
  6. 20 September 2022 (4 commits)