1. 05 Dec, 2022 2 commits
  2. 29 Nov, 2022 5 commits
  3. 27 Nov, 2022 1 commit
    • F
      x86/tsc: use topology_max_packages() in tsc watchdog check · 4f283abb
      Committed by Feng Tang
      hulk inclusion
      category: bugfix
      bugzilla: 187942, https://gitee.com/openeuler/kernel/issues/I5U037
      CVE: NA
      
      -------------------------------
      
      Commit b50db709 ("x86/tsc: Disable clocksource watchdog for TSC
      on qualified platorms") was introduced to solve the problem that
      the TSC clocksource is sometimes wrongly judged as unstable by
      watchdogs such as 'jiffies', HPET, etc.
      
      In it, the hardware socket number is a key factor in deciding
      whether to disable the watchdog for TSC, and 'nr_online_nodes' was
      chosen as an estimate because the value is needed in the early boot
      phase, before the 'tsc-early' clocksource is registered, when the
      non-boot CPUs have not been brought up yet.
      
      In a recent patch review, Dave Hansen pointed out that there are many
      cases in which 'nr_online_nodes' can be wrong, such as:
      * numa emulation (numa=fake=4 etc.)
      * numa=off
      * platforms with CPU+DRAM nodes, CPU-less HBM nodes, CPU-less
        persistent memory nodes.
      
      Peter Zijlstra suggested using logical package ids, but they are
      only usable after smp_init(), once all CPUs are initialized.
      
      One solution is to skip the watchdog for the 'tsc-early' clocksource
      and move the check to after smp_init() but before the 'tsc'
      clocksource is registered, where topology_max_packages() can
      be used as a much more accurate socket number.
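The difference between the two estimates can be illustrated with a small userspace sketch. It is not the kernel code: `struct tsc_platform`, both function names, and the caller-supplied `socket_limit` are hypothetical, and the CPU feature checks are collapsed into a single flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the inputs to the decision; not a kernel struct. */
struct tsc_platform {
	bool tsc_features_ok;        /* stand-in for the CPU feature checks */
	unsigned int online_nodes;   /* what nr_online_nodes would report */
	unsigned int max_packages;   /* what topology_max_packages() would report */
};

/* Earlier estimate: treat the NUMA node count as the socket count. */
bool skip_watchdog_by_nodes(const struct tsc_platform *p,
			    unsigned int socket_limit)
{
	assert(p != NULL);
	return p->tsc_features_ok && p->online_nodes <= socket_limit;
}

/* Later check: use the package count, which is known after smp_init(). */
bool skip_watchdog_by_packages(const struct tsc_platform *p,
			       unsigned int socket_limit)
{
	assert(p != NULL);
	return p->tsc_features_ok && p->max_packages <= socket_limit;
}
```

With `numa=fake=4` on a single-socket machine, `online_nodes` is 4 while `max_packages` is 1, so the two functions disagree for any limit below 4. With `numa=off` on a large multi-socket machine, the disagreement goes the other way.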
      Signed-off-by: Feng Tang <feng.tang@intel.com>
      
      Conflict:
      	arch/x86/kernel/tsc.c
      Signed-off-by: Yu Liao <liaoyu15@huawei.com>
      Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      4f283abb
  4. 26 Nov, 2022 2 commits
    • X
      scsi: hisi_sas: Set iptt aborted flag when receiving an abnormal CQ · 4cccc16a
      Committed by Xingui Yang
      driver inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I62ZXO
      CVE: NA
      
      ------------------------------------------------
      
      During write I/O, while the SAS PHY switch is being tested, the hardware
      may report two CQs for one I/O. The first CQ indicates an invalid port
      during DPH scheduling; the second CQ indicates that the response frame
      has been written to memory but the I/O ended abnormally due to I/O data
      underload. So set the IPTT aborted flag when receiving an abnormal CQ;
      the host will then discard the IPTT frame received from the SAS hard disk.
      Signed-off-by: Xingui Yang <yangxingui@huawei.com>
      Reviewed-by: kang fenglong <kangfenglong@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      4cccc16a
    • L
      ext4: fix bug in extents parsing when eh_entries == 0 and eh_depth > 0 · bc9ebdce
      Committed by Luís Henriques
      mainline inclusion
      from mainline-v6.0-rc7
      commit 29a5b8a1
      category: bugfix
      bugzilla: 187444, https://gitee.com/openeuler/kernel/issues/I6261Z
      CVE: NA
      
      --------------------------------
      
      When walking through an inode's extents, the ext4_ext_binsearch_idx() function
      assumes that the extent header has been previously validated.  However, there
      are no checks that verify that the number of entries (eh->eh_entries) is
      non-zero when depth is > 0.  And this will lead to problems because the
      EXT_FIRST_INDEX() and EXT_LAST_INDEX() will return garbage and result in this:
      
      [  135.245946] ------------[ cut here ]------------
      [  135.247579] kernel BUG at fs/ext4/extents.c:2258!
      [  135.249045] invalid opcode: 0000 [#1] PREEMPT SMP
      [  135.250320] CPU: 2 PID: 238 Comm: tmp118 Not tainted 5.19.0-rc8+ #4
      [  135.252067] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
      [  135.255065] RIP: 0010:ext4_ext_map_blocks+0xc20/0xcb0
      [  135.256475] Code:
      [  135.261433] RSP: 0018:ffffc900005939f8 EFLAGS: 00010246
      [  135.262847] RAX: 0000000000000024 RBX: ffffc90000593b70 RCX: 0000000000000023
      [  135.264765] RDX: ffff8880038e5f10 RSI: 0000000000000003 RDI: ffff8880046e922c
      [  135.266670] RBP: ffff8880046e9348 R08: 0000000000000001 R09: ffff888002ca580c
      [  135.268576] R10: 0000000000002602 R11: 0000000000000000 R12: 0000000000000024
      [  135.270477] R13: 0000000000000000 R14: 0000000000000024 R15: 0000000000000000
      [  135.272394] FS:  00007fdabdc56740(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
      [  135.274510] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  135.276075] CR2: 00007ffc26bd4f00 CR3: 0000000006261004 CR4: 0000000000170ea0
      [  135.277952] Call Trace:
      [  135.278635]  <TASK>
      [  135.279247]  ? preempt_count_add+0x6d/0xa0
      [  135.280358]  ? percpu_counter_add_batch+0x55/0xb0
      [  135.281612]  ? _raw_read_unlock+0x18/0x30
      [  135.282704]  ext4_map_blocks+0x294/0x5a0
      [  135.283745]  ? xa_load+0x6f/0xa0
      [  135.284562]  ext4_mpage_readpages+0x3d6/0x770
      [  135.285646]  read_pages+0x67/0x1d0
      [  135.286492]  ? folio_add_lru+0x51/0x80
      [  135.287441]  page_cache_ra_unbounded+0x124/0x170
      [  135.288510]  filemap_get_pages+0x23d/0x5a0
      [  135.289457]  ? path_openat+0xa72/0xdd0
      [  135.290332]  filemap_read+0xbf/0x300
      [  135.291158]  ? _raw_spin_lock_irqsave+0x17/0x40
      [  135.292192]  new_sync_read+0x103/0x170
      [  135.293014]  vfs_read+0x15d/0x180
      [  135.293745]  ksys_read+0xa1/0xe0
      [  135.294461]  do_syscall_64+0x3c/0x80
      [  135.295284]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      This patch simply adds an extra check in __ext4_ext_check(), verifying that
      eh_entries is not 0 when eh_depth is > 0.
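The added rule can be sketched in userspace roughly as follows. This is a simplified illustration, not the kernel's `__ext4_ext_check()`: `struct ext_header_sketch` and `ext_header_check()` are hypothetical names, the fields are in host byte order, and only a few of the header checks are modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the on-disk extent header (host byte order). */
struct ext_header_sketch {
	uint16_t eh_entries;  /* number of valid entries in this node */
	uint16_t eh_max;      /* capacity of this node */
	uint16_t eh_depth;    /* 0 = leaf node, > 0 = index node */
};

/* Returns NULL when the header passes, or a message describing the problem. */
const char *ext_header_check(const struct ext_header_sketch *eh)
{
	assert(eh != NULL);
	if (eh->eh_max == 0)
		return "invalid eh_max";
	if (eh->eh_entries > eh->eh_max)
		return "invalid eh_entries";
	/* The extra rule: an index node must reference at least one child. */
	if (eh->eh_entries == 0 && eh->eh_depth > 0)
		return "eh_entries is 0 but eh_depth is > 0";
	return NULL;
}
```

An empty leaf (`eh_depth == 0`) is still accepted, since a file with no extents is legitimate. Only the empty index node is rejected, before any binary search can walk its non-existent entries.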
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=215941
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216283
      Cc: Baokun Li <libaokun1@huawei.com>
      Cc: stable@kernel.org
      Signed-off-by: Luís Henriques <lhenriques@suse.de>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Baokun Li <libaokun1@huawei.com>
      Link: https://lore.kernel.org/r/20220822094235.2690-1-lhenriques@suse.de
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      bc9ebdce
  5. 21 Nov, 2022 1 commit
  6. 19 Nov, 2022 4 commits
  7. 15 Nov, 2022 1 commit
  8. 14 Nov, 2022 2 commits
  9. 08 Nov, 2022 22 commits
    • R
      init/main.c: return 1 from handled __setup() functions · d484e833
      Committed by Randy Dunlap
      stable inclusion
      from stable-4.19.238
      commit c7daf1b4ad809692d5c26f33c02ed8a031066548
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5X41F
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit f9a40b08 ]
      
      initcall_blacklist() should return 1 to indicate that it handled its
      cmdline arguments.
      
      set_debug_rodata() should return 1 to indicate that it handled its
      cmdline arguments.  Print a warning if the option string is invalid.
      
      This prevents these strings from being added to the 'init' program's
      environment as they are not init arguments/parameters.
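The return-value convention can be modeled with a small userspace sketch. The dispatcher and both handlers below are hypothetical and only imitate the idea: a handler that returns 1 consumes the parameter, while a return of 0 leaves it to be forwarded to the init program.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Handler convention: return 1 if the option was consumed, 0 otherwise. */
typedef int (*setup_fn)(const char *value);

struct setup_entry {
	const char *prefix;   /* e.g. "rodata=" */
	setup_fn handler;
};

/* Hypothetical handler that follows the convention and returns 1. */
int handle_rodata(const char *value)
{
	(void)value;          /* a real handler would parse and validate this */
	return 1;
}

/* Hypothetical handler with the bug: it did its work but reports 0. */
int handle_buggy(const char *value)
{
	(void)value;
	return 0;
}

/*
 * Returns 1 if some handler consumed the parameter, 0 if it is left over
 * and would therefore be passed on to the init program.
 */
int dispatch_param(const struct setup_entry *table, size_t n, const char *param)
{
	assert(table != NULL && param != NULL);
	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(table[i].prefix);

		if (strncmp(param, table[i].prefix, len) == 0 &&
		    table[i].handler(param + len))
			return 1;
	}
	return 0;
}
```

In this model `"buggy=x"` is indistinguishable from an unknown parameter, which is how a handled option can end up in the environment of init.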
      
      Link: https://lkml.kernel.org/r/20220221050901.23985-1-rdunlap@infradead.org
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Lin Yujun <linyujun809@huawei.com>
      Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      d484e833
    • P
      x86/pm: Save the MSR validity status at context setup · 47477ca7
      Committed by Pawan Gupta
      stable inclusion
      from stable-4.19.238
      commit c7daf1b4ad809692d5c26f33c02ed8a031066548
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5X41F
      CVE: NA
      
      --------------------------------
      
      commit 73924ec4 upstream.
      
      The mechanism to save/restore MSRs during S3 suspend/resume checks for
      the MSR validity during suspend, and only restores the MSR if it's a
      valid MSR.  This is not optimal, as an invalid MSR will unnecessarily
      throw an exception for every suspend cycle.  The more invalid MSRs,
      the higher the impact will be.
      
      Check and save the MSR validity at setup.  This ensures that only valid
      MSRs that are guaranteed to not throw an exception will be attempted
      during suspend.
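A userspace sketch of the "probe once at setup, trust the flag afterwards" idea follows. All names are hypothetical, and `fake_rdmsr()` merely simulates hardware in which only even-numbered registers exist; it does not model the kernel's MSR accessors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One saved register: its id, the saved value, and whether it is readable. */
struct saved_msr_sketch {
	uint32_t msr_no;
	uint64_t value;
	bool valid;
};

/* Read callback: returns 0 and fills *out on success, non-zero on a fault. */
typedef int (*msr_read_fn)(uint32_t msr_no, uint64_t *out);

/* Fake hardware for demonstration: only even-numbered MSRs exist. */
int fake_rdmsr(uint32_t msr_no, uint64_t *out)
{
	if (msr_no % 2 != 0)
		return -1;
	*out = (uint64_t)msr_no * 10u;
	return 0;
}

/* Setup time: probe each MSR once and remember whether it is usable. */
void msr_context_setup(struct saved_msr_sketch *msrs, size_t n, msr_read_fn probe)
{
	assert(msrs != NULL && probe != NULL);
	for (size_t i = 0; i < n; i++) {
		uint64_t dummy;

		msrs[i].valid = (probe(msrs[i].msr_no, &dummy) == 0);
		msrs[i].value = 0;
	}
}

/* Suspend time: read only MSRs marked valid; returns how many were read. */
size_t msr_save_context(struct saved_msr_sketch *msrs, size_t n, msr_read_fn read)
{
	size_t reads = 0;

	assert(msrs != NULL && read != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!msrs[i].valid)
			continue;   /* never touch an MSR that faulted at setup */
		if (read(msrs[i].msr_no, &msrs[i].value) == 0)
			reads++;
	}
	return reads;
}
```

The faulting access happens at most once per register, in `msr_context_setup()`, instead of once per suspend cycle.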
      
      Fixes: 7a9c2dd0 ("x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume")
      Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Borislav Petkov <bp@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Lin Yujun <linyujun809@huawei.com>
      Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      47477ca7
    • P
      x86/speculation: Restore speculation related MSRs during S3 resume · 2b91784d
      Committed by Pawan Gupta
      stable inclusion
      from stable-4.19.238
      commit c7daf1b4ad809692d5c26f33c02ed8a031066548
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5X41F
      CVE: NA
      
      --------------------------------
      
      commit e2a1256b upstream.
      
      After resuming from suspend-to-RAM, the MSRs that control CPU's
      speculative execution behavior are not being restored on the boot CPU.
      
      These MSRs are used to mitigate speculative execution vulnerabilities.
      Not restoring them correctly may leave the CPU vulnerable.  Secondary
      CPU's MSRs are correctly being restored at S3 resume by
      identify_secondary_cpu().
      
      During S3 resume, restore these MSRs for boot CPU when restoring its
      processor state.
      
      Fixes: 77243971 ("x86/bugs/intel: Set proper CPU features and setup RDS")
      Reported-by: Neelima Krishnan <neelima.krishnan@intel.com>
      Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
      Acked-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Lin Yujun <linyujun809@huawei.com>
      Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      2b91784d
    • B
      x86/cpu: Load microcode during restore_processor_state() · 27dd57ae
      Committed by Borislav Petkov
      stable inclusion
      from stable-4.19.238
      commit c7daf1b4ad809692d5c26f33c02ed8a031066548
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5X41F
      CVE: NA
      
      --------------------------------
      
      commit f9e14dbb upstream.
      
      When resuming from system sleep state, restore_processor_state()
      restores the boot CPU MSRs. These MSRs could be emulated by microcode.
      If microcode is not loaded yet, writing to emulated MSRs leads to
      unchecked MSR access error:
      
        ...
        PM: Calling lapic_suspend+0x0/0x210
        unchecked MSR access error: WRMSR to 0x10f (tried to write 0x0...0) at rIP: ... (native_write_msr)
        Call Trace:
          <TASK>
          ? restore_processor_state
          x86_acpi_suspend_lowlevel
          acpi_suspend_enter
          suspend_devices_and_enter
          pm_suspend.cold
          state_store
          kobj_attr_store
          sysfs_kf_write
          kernfs_fop_write_iter
          new_sync_write
          vfs_write
          ksys_write
          __x64_sys_write
          do_syscall_64
          entry_SYSCALL_64_after_hwframe
         RIP: 0033:0x7fda13c260a7
      
      To ensure microcode emulated MSRs are available for restoration, load
      the microcode on the boot CPU before restoring these MSRs.
      
        [ Pawan: write commit message and productize it. ]
      
      Fixes: e2a1256b ("x86/speculation: Restore speculation related MSRs during S3 resume")
      Reported-by: Kyle D. Pelton <kyle.d.pelton@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Tested-by: Kyle D. Pelton <kyle.d.pelton@intel.com>
      Cc: stable@vger.kernel.org
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=215841
      Link: https://lore.kernel.org/r/4350dfbf785cd482d3fafa72b2b49c83102df3ce.1650386317.git.pawan.kumar.gupta@linux.intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Lin Yujun <linyujun809@huawei.com>
      Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      27dd57ae
    • T
      genirq: Synchronize interrupt thread startup · 950faec0
      Committed by Thomas Pfaff
      stable inclusion
      from stable-4.19.238
      commit c7daf1b4ad809692d5c26f33c02ed8a031066548
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5X41F
      CVE: NA
      
      --------------------------------
      
      commit 8707898e upstream.
      
      A kernel hang can be observed when running setserial in a loop on a kernel
      with force threaded interrupts. The sequence of events is:
      
         setserial
           open("/dev/ttyXXX")
             request_irq()
           do_stuff()
            -> serial interrupt
               -> wake(irq_thread)
      	      desc->threads_active++;
           close()
             free_irq()
               kthread_stop(irq_thread)
           synchronize_irq() <- hangs because desc->threads_active != 0
      
      The thread is created in request_irq() and woken up, but does not get on a
      CPU to reach the actual thread function, which would handle the pending
      wake-up. kthread_stop() sets the should stop condition which makes the
      thread immediately exit, which in turn leaves the stale threads_active
      count around.
      
      This problem was introduced with commit 519cc865, which addressed an
      interrupt sharing issue in the PCIe code.
      
      Before that commit free_irq() invoked synchronize_irq(), which waits for
      the hard interrupt handler and also for associated threads to complete.
      
      To address the PCIe issue synchronize_irq() was replaced with
      __synchronize_hardirq(), which only waits for the hard interrupt handler to
      complete, but not for threaded handlers.
      
      This was done under the assumption, that the interrupt thread already
      reached the thread function and waits for a wake-up, which is guaranteed to
      be handled before acting on the stop condition. The problematic case, that
      the thread would not reach the thread function, was obviously overlooked.
      
      Make sure that the interrupt thread is really started and reaches
      thread_fn() before returning from __setup_irq().
      
      This utilizes the existing wait queue in the interrupt descriptor. The
      wait queue is unused for non-shared interrupts. For shared interrupts the
      usage might cause a spurious wake-up of a waiter in synchronize_irq() or the
      completion of a threaded handler might cause a spurious wake-up of the
      waiter for the ready flag. Both are harmless and have no functional impact.
      
      [ tglx: Amended changelog ]
      
      Fixes: 519cc865 ("genirq: Synchronize only with single thread on free_irq()")
      Signed-off-by: Thomas Pfaff <tpfaff@pcs.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Marc Zyngier <maz@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/552fe7b4-9224-b183-bb87-a8f36d335690@pcs.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Lin Yujun <linyujun809@huawei.com>
      Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      950faec0
    • M
      nvme: Fix IOC_PR_CLEAR and IOC_PR_RELEASE ioctls for nvme devices · 3b23e85f
      Committed by Michael Kelley
      stable inclusion
      from stable-v4.19.261
      commit 5f7fd71e5bebf337769f20dd125822ce63266e4d
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit c292a337 ]
      
      The IOC_PR_CLEAR and IOC_PR_RELEASE ioctls are
      non-functional on NVMe devices because the nvme_pr_clear()
      and nvme_pr_release() functions set the IEKEY field incorrectly.
      The IEKEY field should be set only when the key is zero (i.e.,
      not specified).  The current code does it backwards.
      
      Furthermore, the NVMe spec describes the persistent
      reservation "clear" function as an option on the reservation
      release command. The current implementation of nvme_pr_clear()
      erroneously uses the reservation register command.
      
      Fix these errors. Note that NVMe version 1.3 and later specify
      that setting the IEKEY field will return an error of Invalid
      Field in Command.  The fix will set IEKEY when the key is zero,
      which is appropriate as these ioctls consider a zero key to
      be "unspecified", and the intention of the spec change is
      to require a valid key.
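The corrected IEKEY rule can be sketched as plain bit arithmetic. The macro names and helper functions are hypothetical. The bit layout (action in the low bits, IEKEY at bit 3, reservation type from bit 8) is this sketch's reading of command dword 10 for the reservation release command and should be checked against the NVMe specification.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout assumed for reservation release command dword 10. */
#define RREL_ACTION_RELEASE 0u          /* release action */
#define RREL_ACTION_CLEAR   1u          /* clear action */
#define RREL_IEKEY          (1u << 3)   /* ignore existing key */
#define RREL_RTYPE_SHIFT    8u          /* reservation type field */

static_assert((RREL_IEKEY & 0x7u) == 0, "IEKEY must not overlap the action field");

/* Set IEKEY only when no key was supplied (key == 0). */
uint32_t pr_iekey_bit(uint64_t key)
{
	return key ? 0u : RREL_IEKEY;
}

/* "Clear" is expressed as an action of the release command. */
uint32_t pr_clear_cdw10(uint64_t key)
{
	return RREL_ACTION_CLEAR | pr_iekey_bit(key);
}

/* "Release" carries the reservation type alongside the action. */
uint32_t pr_release_cdw10(uint64_t key, uint8_t rtype)
{
	return ((uint32_t)rtype << RREL_RTYPE_SHIFT) |
	       RREL_ACTION_RELEASE | pr_iekey_bit(key);
}
```

The backwards behavior described above corresponds to swapping the two branches in `pr_iekey_bit()`, so that IEKEY would be set exactly when a real key is present.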
      
      Tested on a version 1.4 PCI NVMe device in an Azure VM.
      
      Fixes: 1673f1f0 ("nvme: move block_device_operations and ns/ctrl freeing to common code")
      Fixes: 1d277a63 ("NVMe: Add persistent reservation ops")
      Signed-off-by: Michael Kelley <mikelley@microsoft.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Conflicts:
      	drivers/nvme/host/core.c
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      Reviewed-by: Jason Yan <yanaijie@huawei.com>
      Reviewed-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      3b23e85f
    • E
      once: add DO_ONCE_SLOW() for sleepable contexts · 3743e9b5
      Committed by Eric Dumazet
      stable inclusion
      from stable-v4.19.262
      commit f5686a03b138f6330eeda082ee4f96c8109f56f3
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 62c07983 ]
      
      Christophe Leroy reported a ~80ms latency spike
      happening at first TCP connect() time.
      
      This is because __inet_hash_connect() uses get_random_once()
      to populate a perturbation table which became quite big
      after commit 4c2c8f03 ("tcp: increase source port perturb table to 2^16")
      
      get_random_once() uses DO_ONCE(), which blocks hard irqs for the duration
      of the operation.
      
      This patch adds DO_ONCE_SLOW() which uses a mutex instead of a spinlock
      for operations where we prefer to stay in process context.
      
      Then __inet_hash_connect() can use get_random_slow_once()
      to populate its perturbation table.
      
      Fixes: 4c2c8f03 ("tcp: increase source port perturb table to 2^16")
      Fixes: 190cc824 ("tcp: change source port randomizarion at connect() time")
      Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Link: https://lore.kernel.org/netdev/CANn89iLAEYBaoYajy0Y9UmGFff5GPxDUoG-ErVB2jDdRNQ5Tug@mail.gmail.com/T/#t
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      
      One conflict occurs because the commit 4c2c8f03 ("tcp: increase
      source port perturb table to 2^16") is integrated but the commit
      e9261476 ("tcp: dynamically allocate the perturb table used by
      source port") is not integrated.
      One conflict occurs because the commit 1027b96e ("once: Fix panic
      when module unload") is not integrated.
      
      Conflicts:
      	net/ipv4/inet_hashtables.c
      	lib/once.c
      Signed-off-by: Liu Jian <liujian56@huawei.com>
      Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      3743e9b5
    • E
      inet: fully convert sk->sk_rx_dst to RCU rules · 4c4298bf
      Committed by Eric Dumazet
      stable inclusion
      from stable-v4.19.262
      commit 75a578000ae5e511e5d0e8433c94a14d9c99c412
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      commit 8f905c0e upstream.
      
      syzbot reported various issues around early demux,
      one being included in this changelog [1]
      
      sk->sk_rx_dst is using RCU protection without clearly
      documenting it.
      
      And the following sequences in tcp_v4_do_rcv()/tcp_v6_do_rcv()
      do not follow standard RCU rules.
      
      [a]    dst_release(dst);
      [b]    sk->sk_rx_dst = NULL;
      
      They look wrong because a delete operation on an RCU-protected
      pointer is supposed to clear the pointer before
      the call_rcu()/synchronize_rcu() guarding actual memory freeing.
      
      In some cases indeed, dst could be freed before [b] is done.
      
      We could cheat by clearing sk_rx_dst before calling
      dst_release(), but this seems the right time to stick
      to standard RCU annotations and debugging facilities.
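The ordering problem between [a] and [b] can be illustrated with a userspace analogy built on C11 atomics. This is not RCU and not the kernel change itself: the types and functions are hypothetical, and the sketch shows only the "unpublish first, release second" order. Readers that already loaded the pointer would still need a grace period, which is not modeled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for a cached route entry. */
struct dst_sketch {
	atomic_int refcnt;
};

/* Hypothetical socket-like holder with one published pointer. */
struct sock_sketch {
	_Atomic(struct dst_sketch *) rx_dst;
};

/* Drop one reference; returns 1 when this was the last one. */
int dst_put(struct dst_sketch *dst)
{
	assert(dst != NULL);
	return atomic_fetch_sub(&dst->refcnt, 1) == 1;
}

/*
 * Unpublish first, release second: after the exchange, no new reader can
 * load the pointer from the holder, so the release cannot race with a
 * fresh lookup through sk->rx_dst.
 */
int sock_drop_rx_dst(struct sock_sketch *sk)
{
	struct dst_sketch *old;

	assert(sk != NULL);
	old = atomic_exchange(&sk->rx_dst, NULL);
	if (old == NULL)
		return 0;       /* nothing was cached */
	return dst_put(old);
}
```

Calling `sock_drop_rx_dst()` twice is harmless in this model, because the second call finds the pointer already cleared and does not drop a second reference.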
      
      [1]
      BUG: KASAN: use-after-free in dst_check include/net/dst.h:470 [inline]
      BUG: KASAN: use-after-free in tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792
      Read of size 2 at addr ffff88807f1cb73a by task syz-executor.5/9204
      
      CPU: 0 PID: 9204 Comm: syz-executor.5 Not tainted 5.16.0-rc5-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_address_description.constprop.0.cold+0x8d/0x320 mm/kasan/report.c:247
       __kasan_report mm/kasan/report.c:433 [inline]
       kasan_report.cold+0x83/0xdf mm/kasan/report.c:450
       dst_check include/net/dst.h:470 [inline]
       tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792
       ip_rcv_finish_core.constprop.0+0x15de/0x1e80 net/ipv4/ip_input.c:340
       ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583
       ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
       ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5508 [inline]
       __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556
       __netif_receive_skb_list net/core/dev.c:5608 [inline]
       netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699
       gro_normal_list net/core/dev.c:5853 [inline]
       gro_normal_list net/core/dev.c:5849 [inline]
       napi_complete_done+0x1f1/0x880 net/core/dev.c:6590
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557
       __napi_poll+0xaf/0x440 net/core/dev.c:7023
       napi_poll net/core/dev.c:7090 [inline]
       net_rx_action+0x801/0xb40 net/core/dev.c:7177
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
       invoke_softirq kernel/softirq.c:432 [inline]
       __irq_exit_rcu+0x123/0x180 kernel/softirq.c:637
       irq_exit_rcu+0x5/0x20 kernel/softirq.c:649
       common_interrupt+0x52/0xc0 arch/x86/kernel/irq.c:240
       asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:629
      RIP: 0033:0x7f5e972bfd57
      Code: 39 d1 73 14 0f 1f 80 00 00 00 00 48 8b 50 f8 48 83 e8 08 48 39 ca 77 f3 48 39 c3 73 3e 48 89 13 48 8b 50 f8 48 89 38 49 8b 0e <48> 8b 3e 48 83 c3 08 48 83 c6 08 eb bc 48 39 d1 72 9e 48 39 d0 73
      RSP: 002b:00007fff8a413210 EFLAGS: 00000283
      RAX: 00007f5e97108990 RBX: 00007f5e97108338 RCX: ffffffff81d3aa45
      RDX: ffffffff81d3aa45 RSI: 00007f5e97108340 RDI: ffffffff81d3aa45
      RBP: 00007f5e97107eb8 R08: 00007f5e97108d88 R09: 0000000093c2e8d9
      R10: 0000000000000000 R11: 0000000000000000 R12: 00007f5e97107eb0
      R13: 00007f5e97108338 R14: 00007f5e97107ea8 R15: 0000000000000019
       </TASK>
      
      Allocated by task 13:
       kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
       kasan_set_track mm/kasan/common.c:46 [inline]
       set_alloc_info mm/kasan/common.c:434 [inline]
       __kasan_slab_alloc+0x90/0xc0 mm/kasan/common.c:467
       kasan_slab_alloc include/linux/kasan.h:259 [inline]
       slab_post_alloc_hook mm/slab.h:519 [inline]
       slab_alloc_node mm/slub.c:3234 [inline]
       slab_alloc mm/slub.c:3242 [inline]
       kmem_cache_alloc+0x202/0x3a0 mm/slub.c:3247
       dst_alloc+0x146/0x1f0 net/core/dst.c:92
       rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613
       ip_route_input_slow+0x1817/0x3a20 net/ipv4/route.c:2340
       ip_route_input_rcu net/ipv4/route.c:2470 [inline]
       ip_route_input_noref+0x116/0x2a0 net/ipv4/route.c:2415
       ip_rcv_finish_core.constprop.0+0x288/0x1e80 net/ipv4/ip_input.c:354
       ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583
       ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
       ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5508 [inline]
       __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556
       __netif_receive_skb_list net/core/dev.c:5608 [inline]
       netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699
       gro_normal_list net/core/dev.c:5853 [inline]
       gro_normal_list net/core/dev.c:5849 [inline]
       napi_complete_done+0x1f1/0x880 net/core/dev.c:6590
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557
       __napi_poll+0xaf/0x440 net/core/dev.c:7023
       napi_poll net/core/dev.c:7090 [inline]
       net_rx_action+0x801/0xb40 net/core/dev.c:7177
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
      
      Freed by task 13:
       kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
       kasan_set_track+0x21/0x30 mm/kasan/common.c:46
       kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
       ____kasan_slab_free mm/kasan/common.c:366 [inline]
       ____kasan_slab_free mm/kasan/common.c:328 [inline]
       __kasan_slab_free+0xff/0x130 mm/kasan/common.c:374
       kasan_slab_free include/linux/kasan.h:235 [inline]
       slab_free_hook mm/slub.c:1723 [inline]
       slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1749
       slab_free mm/slub.c:3513 [inline]
       kmem_cache_free+0xbd/0x5d0 mm/slub.c:3530
       dst_destroy+0x2d6/0x3f0 net/core/dst.c:127
       rcu_do_batch kernel/rcu/tree.c:2506 [inline]
       rcu_core+0x7ab/0x1470 kernel/rcu/tree.c:2741
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
      
      Last potentially related work creation:
       kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
       __kasan_record_aux_stack+0xf5/0x120 mm/kasan/generic.c:348
       __call_rcu kernel/rcu/tree.c:2985 [inline]
       call_rcu+0xb1/0x740 kernel/rcu/tree.c:3065
       dst_release net/core/dst.c:177 [inline]
       dst_release+0x79/0xe0 net/core/dst.c:167
       tcp_v4_do_rcv+0x612/0x8d0 net/ipv4/tcp_ipv4.c:1712
       sk_backlog_rcv include/net/sock.h:1030 [inline]
       __release_sock+0x134/0x3b0 net/core/sock.c:2768
       release_sock+0x54/0x1b0 net/core/sock.c:3300
       tcp_sendmsg+0x36/0x40 net/ipv4/tcp.c:1441
       inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:724
       sock_write_iter+0x289/0x3c0 net/socket.c:1057
       call_write_iter include/linux/fs.h:2162 [inline]
       new_sync_write+0x429/0x660 fs/read_write.c:503
       vfs_write+0x7cd/0xae0 fs/read_write.c:590
       ksys_write+0x1ee/0x250 fs/read_write.c:643
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The buggy address belongs to the object at ffff88807f1cb700
       which belongs to the cache ip_dst_cache of size 176
      The buggy address is located 58 bytes inside of
       176-byte region [ffff88807f1cb700, ffff88807f1cb7b0)
      The buggy address belongs to the page:
      page:ffffea0001fc72c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7f1cb
      flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
      raw: 00fff00000000200 dead000000000100 dead000000000122 ffff8881413bb780
      raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112a20(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL), pid 5, ts 108466983062, free_ts 108048976062
       prep_new_page mm/page_alloc.c:2418 [inline]
       get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4149
       __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5369
       alloc_pages+0x1a7/0x300 mm/mempolicy.c:2191
       alloc_slab_page mm/slub.c:1793 [inline]
       allocate_slab mm/slub.c:1930 [inline]
       new_slab+0x32d/0x4a0 mm/slub.c:1993
       ___slab_alloc+0x918/0xfe0 mm/slub.c:3022
       __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3109
       slab_alloc_node mm/slub.c:3200 [inline]
       slab_alloc mm/slub.c:3242 [inline]
       kmem_cache_alloc+0x35c/0x3a0 mm/slub.c:3247
       dst_alloc+0x146/0x1f0 net/core/dst.c:92
       rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613
       __mkroute_output net/ipv4/route.c:2564 [inline]
       ip_route_output_key_hash_rcu+0x921/0x2d00 net/ipv4/route.c:2791
       ip_route_output_key_hash+0x18b/0x300 net/ipv4/route.c:2619
       __ip_route_output_key include/net/route.h:126 [inline]
       ip_route_output_flow+0x23/0x150 net/ipv4/route.c:2850
       ip_route_output_key include/net/route.h:142 [inline]
       geneve_get_v4_rt+0x3a6/0x830 drivers/net/geneve.c:809
       geneve_xmit_skb drivers/net/geneve.c:899 [inline]
       geneve_xmit+0xc4a/0x3540 drivers/net/geneve.c:1082
       __netdev_start_xmit include/linux/netdevice.h:4994 [inline]
       netdev_start_xmit include/linux/netdevice.h:5008 [inline]
       xmit_one net/core/dev.c:3590 [inline]
       dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3606
       __dev_queue_xmit+0x299a/0x3650 net/core/dev.c:4229
      page last free stack trace:
       reset_page_owner include/linux/page_owner.h:24 [inline]
       free_pages_prepare mm/page_alloc.c:1338 [inline]
       free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1389
       free_unref_page_prepare mm/page_alloc.c:3309 [inline]
       free_unref_page+0x19/0x690 mm/page_alloc.c:3388
       qlink_free mm/kasan/quarantine.c:146 [inline]
       qlist_free_all+0x5a/0xc0 mm/kasan/quarantine.c:165
       kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:272
       __kasan_slab_alloc+0xa2/0xc0 mm/kasan/common.c:444
       kasan_slab_alloc include/linux/kasan.h:259 [inline]
       slab_post_alloc_hook mm/slab.h:519 [inline]
       slab_alloc_node mm/slub.c:3234 [inline]
       kmem_cache_alloc_node+0x255/0x3f0 mm/slub.c:3270
       __alloc_skb+0x215/0x340 net/core/skbuff.c:414
       alloc_skb include/linux/skbuff.h:1126 [inline]
       alloc_skb_with_frags+0x93/0x620 net/core/skbuff.c:6078
       sock_alloc_send_pskb+0x783/0x910 net/core/sock.c:2575
       mld_newpack+0x1df/0x770 net/ipv6/mcast.c:1754
       add_grhead+0x265/0x330 net/ipv6/mcast.c:1857
       add_grec+0x1053/0x14e0 net/ipv6/mcast.c:1995
       mld_send_initial_cr.part.0+0xf6/0x230 net/ipv6/mcast.c:2242
       mld_send_initial_cr net/ipv6/mcast.c:1232 [inline]
       mld_dad_work+0x1d3/0x690 net/ipv6/mcast.c:2268
       process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298
       worker_thread+0x658/0x11f0 kernel/workqueue.c:2445
      
      Memory state around the buggy address:
       ffff88807f1cb600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88807f1cb680: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
      >ffff88807f1cb700: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                              ^
       ffff88807f1cb780: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
       ffff88807f1cb800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fixes: 41063e9d ("ipv4: Early TCP socket demux.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20211220143330.680945-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      [cmllamas: fixed trivial merge conflict]
      Signed-off-by: Carlos Llamas <cmllamas@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      Conflicts:
      	net/ipv4/af_inet.c
      Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
      Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      4c4298bf
    • J
      ext4: continue to expand file system when the target size doesn't reach · 3b8b0d3f
      Jerry Lee 李修賢 committed
      stable inclusion
      from stable-v4.19.262
      commit f2180ad6a43501597d20eacad0c6f146c51d4bbd
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      commit df3cb754 upstream.
      
      When expanding a file system from (16TiB-2MiB) to 18TiB, the operation
      exits early which leads to result inconsistency between resize2fs and
      Ext4 kernel driver.
      
      === before ===
      ○ → resize2fs /dev/mapper/thin
      resize2fs 1.45.5 (07-Jan-2020)
      Filesystem at /dev/mapper/thin is mounted on /mnt/test; on-line resizing required
      old_desc_blocks = 2048, new_desc_blocks = 2304
      The filesystem on /dev/mapper/thin is now 4831837696 (4k) blocks long.
      
      [  865.186308] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
      [  912.091502] dm-4: detected capacity change from 34359738368 to 38654705664
      [  970.030550] dm-5: detected capacity change from 34359734272 to 38654701568
      [ 1000.012751] EXT4-fs (dm-5): resizing filesystem from 4294966784 to 4831837696 blocks
      [ 1000.012878] EXT4-fs (dm-5): resized filesystem to 4294967296
      
      === after ===
      [  129.104898] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
      [  143.773630] dm-4: detected capacity change from 34359738368 to 38654705664
      [  198.203246] dm-5: detected capacity change from 34359734272 to 38654701568
      [  207.918603] EXT4-fs (dm-5): resizing filesystem from 4294966784 to 4831837696 blocks
      [  207.918754] EXT4-fs (dm-5): resizing filesystem from 4294967296 to 4831837696 blocks
      [  207.918758] EXT4-fs (dm-5): Converting file system to meta_bg
      [  207.918790] EXT4-fs (dm-5): resizing filesystem from 4294967296 to 4831837696 blocks
      [  221.454050] EXT4-fs (dm-5): resized to 4658298880 blocks
      [  227.634613] EXT4-fs (dm-5): resized filesystem to 4831837696
      Signed-off-by: Jerry Lee <jerrylee@qnap.com>
      Link: https://lore.kernel.org/r/PU1PR04MB22635E739BD21150DC182AC6A18C9@PU1PR04MB2263.apcprd04.prod.outlook.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      3b8b0d3f
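The block counts in the ext4 logs above can be reproduced from the device sizes. The following is a minimal arithmetic sketch, assuming the dm-5 "capacity change" values are 512-byte sectors and the file system uses 4 KiB blocks (as the resize2fs output suggests). The helper names are hypothetical and are not part of the ext4 code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: device capacities in the log are counted in 512-byte sectors. */
#define SECTOR_SIZE   512u
/* The resize2fs output reports "(4k) blocks". */
#define FS_BLOCK_SIZE 4096u

/* Convert a device capacity in sectors to a count of file system blocks. */
static uint64_t sectors_to_fs_blocks(uint64_t sectors)
{
	return sectors * SECTOR_SIZE / FS_BLOCK_SIZE;
}

/* Blocks still missing when a resize stops at `reached` instead of `target`. */
static uint64_t blocks_remaining(uint64_t reached, uint64_t target)
{
	assert(reached <= target);
	return target - reached;
}
```

Under these assumptions the "before" log stops exactly at the 2^32-block (16 TiB) boundary, which leaves 536870400 blocks of the requested size unallocated; the "after" log continues until the target is reached.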
    • K
      nvme: copy firmware_rev on each init · 8f5816c2
      Keith Busch committed
      stable inclusion
      from stable-v4.19.262
      commit 366a2b3110c69f919fb3277acc1a0bb8cd8a8dbd
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit a8eb6c1b ]
      
      The firmware revision can change after a reset so copy the most
      recent info each time instead of just the first time, otherwise the
      sysfs firmware_rev entry may contain stale data.
      Reported-by: Jeff Lien <jeff.lien@wdc.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
      Reviewed-by: Chao Leng <lengchao@huawei.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      8f5816c2
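The idea behind the nvme change above can be illustrated in user space. This is a simplified sketch, not the driver code: `struct ctrl_state`, `struct ident_data` and `ctrl_init_identify()` are hypothetical stand-ins for the controller state, the identify data and the init path.

```c
#include <assert.h>
#include <string.h>

#define FW_REV_LEN 8

/* Hypothetical stand-in for the data returned by the device on each init. */
struct ident_data {
	char fr[FW_REV_LEN];
};

/* Hypothetical stand-in for the long-lived controller state. */
struct ctrl_state {
	int identified;                 /* set after the first successful init */
	char firmware_rev[FW_REV_LEN];  /* value exposed to user space */
};

/* Refresh the firmware revision on every init, not only on the first one. */
static void ctrl_init_identify(struct ctrl_state *ctrl,
			       const struct ident_data *id)
{
	assert(ctrl != NULL && id != NULL);

	if (!ctrl->identified) {
		/* One-time setup would go here. */
		ctrl->identified = 1;
	}

	/* Copied unconditionally so a reset cannot leave stale data behind. */
	memcpy(ctrl->firmware_rev, id->fr, sizeof(ctrl->firmware_rev));
}
```

Keeping the copy outside the first-time branch is the whole point: the one-time setup stays one-time, while the reported revision always tracks the latest identify data.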
    • L
      net: If sock is dead don't access sock's sk_wq in sk_stream_wait_memory · 86774965
      Liu Jian committed
      stable inclusion
      from stable-v4.19.262
      commit 5fe03917bb017d9af68a95f989f1c122eebc69a6
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 3f8ef65a ]
      
      Fixes the below NULL pointer dereference:
      
        [...]
        [   14.471200] Call Trace:
        [   14.471562]  <TASK>
        [   14.471882]  lock_acquire+0x245/0x2e0
        [   14.472416]  ? remove_wait_queue+0x12/0x50
        [   14.473014]  ? _raw_spin_lock_irqsave+0x17/0x50
        [   14.473681]  _raw_spin_lock_irqsave+0x3d/0x50
        [   14.474318]  ? remove_wait_queue+0x12/0x50
        [   14.474907]  remove_wait_queue+0x12/0x50
        [   14.475480]  sk_stream_wait_memory+0x20d/0x340
        [   14.476127]  ? do_wait_intr_irq+0x80/0x80
        [   14.476704]  do_tcp_sendpages+0x287/0x600
        [   14.477283]  tcp_bpf_push+0xab/0x260
        [   14.477817]  tcp_bpf_sendmsg_redir+0x297/0x500
        [   14.478461]  ? __local_bh_enable_ip+0x77/0xe0
        [   14.479096]  tcp_bpf_send_verdict+0x105/0x470
        [   14.479729]  tcp_bpf_sendmsg+0x318/0x4f0
        [   14.480311]  sock_sendmsg+0x2d/0x40
        [   14.480822]  ____sys_sendmsg+0x1b4/0x1c0
        [   14.481390]  ? copy_msghdr_from_user+0x62/0x80
        [   14.482048]  ___sys_sendmsg+0x78/0xb0
        [   14.482580]  ? vmf_insert_pfn_prot+0x91/0x150
        [   14.483215]  ? __do_fault+0x2a/0x1a0
        [   14.483738]  ? do_fault+0x15e/0x5d0
        [   14.484246]  ? __handle_mm_fault+0x56b/0x1040
        [   14.484874]  ? lock_is_held_type+0xdf/0x130
        [   14.485474]  ? find_held_lock+0x2d/0x90
        [   14.486046]  ? __sys_sendmsg+0x41/0x70
        [   14.486587]  __sys_sendmsg+0x41/0x70
        [   14.487105]  ? intel_pmu_drain_pebs_core+0x350/0x350
        [   14.487822]  do_syscall_64+0x34/0x80
        [   14.488345]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
        [...]
      
      The test scenario has the following flow:
      
      thread1                               thread2
      -----------                           ---------------
       tcp_bpf_sendmsg
        tcp_bpf_send_verdict
         tcp_bpf_sendmsg_redir              sock_close
          tcp_bpf_push_locked                 __sock_release
           tcp_bpf_push                         //inet_release
            do_tcp_sendpages                    sock->ops->release
             sk_stream_wait_memory          	   // tcp_close
                sk_wait_event                      sk->sk_prot->close
                 release_sock(__sk);
                  ***
                                                      lock_sock(sk);
                                                        __tcp_close
                                                          sock_orphan(sk)
                                                            sk->sk_wq  = NULL
                                                      release_sock
                  ****
                 lock_sock(__sk);
                remove_wait_queue(sk_sleep(sk), &wait);
                   sk_sleep(sk)
                   //NULL pointer dereference
                   &rcu_dereference_raw(sk->sk_wq)->wait
      
      While thread1 waits for memory, the socket is released together with its
      wait queue because thread2 has closed it. This happens because
      tcp_bpf_send_verdict does not increase the f_count of
      psock->sk_redir->sk_socket->file in thread1.
      
      On wakeup, sk_stream_wait_memory should check whether the SOCK_DEAD flag
      is set before accessing the wait queue.
      Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Liu Jian <liujian56@huawei.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/bpf/20220823133755.314697-2-liujian56@huawei.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      86774965
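The guard described in the sk_stream_wait_memory commit above can be modelled in user space. This is only a sketch under stated assumptions: `struct sock_model`, `model_orphan()` and `model_finish_wait()` are hypothetical names that mimic the SOCK_DEAD flag, the orphaning done on close, and the wait-queue cleanup after wakeup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a wait queue with a waiter count. */
struct wait_queue {
	int waiters;
};

/* Hypothetical stand-in for the socket fields involved in the race. */
struct sock_model {
	bool dead;              /* models the SOCK_DEAD flag */
	struct wait_queue *wq;  /* models sk->sk_wq, NULL once orphaned */
};

/* Models the close path: mark the socket dead and drop the wait queue. */
static void model_orphan(struct sock_model *sk)
{
	sk->dead = true;
	sk->wq = NULL;
}

/*
 * Models the cleanup after wakeup. Returns true if the waiter was removed,
 * false if removal was skipped because the socket is already dead.
 */
static bool model_finish_wait(struct sock_model *sk)
{
	if (sk->dead)
		return false;   /* the wait queue pointer must not be touched */

	assert(sk->wq != NULL);
	sk->wq->waiters--;
	return true;
}
```

Without the `dead` check, the cleanup on an orphaned socket would dereference the NULL wait-queue pointer, which corresponds to the crash in the trace.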
    • Z
      can: bcm: check the result of can_send() in bcm_can_tx() · 4b29a149
      Ziyang Xuan committed
      stable inclusion
      from stable-v4.19.262
      commit dae06957f856eb699f2a504a46891718c9b1e0d3
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 3fd7bfd2 ]
      
      If can_send() fails, bcm_can_tx() should not update the frames_abs
      counter. Add a check of the can_send() result in bcm_can_tx().
      Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Suggested-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Link: https://lore.kernel.org/all/9851878e74d6d37aee2f1ee76d68361a46f89458.1663206163.git.william.xuanziyang@huawei.com
      Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      4b29a149
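The can/bcm change above amounts to counting a frame only when the send succeeded. Below is a minimal user-space sketch of that pattern. `struct tx_op`, `tx_one()` and the two send stubs are hypothetical; advancing `currframe` regardless of the send result is an assumption of this sketch and is not stated in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-operation transmit state. */
struct tx_op {
	unsigned long frames_abs;  /* frames actually handed to the stack */
	unsigned int currframe;    /* index of the next frame to send */
	unsigned int nframes;      /* number of frames in the sequence */
};

/* Send callback: returns 0 on success, a negative value on failure. */
typedef int (*send_fn)(void *frame);

static int send_ok(void *frame)   { (void)frame; return 0; }
static int send_fail(void *frame) { (void)frame; return -1; }

/* Transmit one frame and update the counters. */
static void tx_one(struct tx_op *op, send_fn send, void *frame)
{
	assert(op != NULL && send != NULL);

	/* Count the frame only if the send succeeded. */
	if (send(frame) == 0)
		op->frames_abs++;

	/* Assumption of this sketch: the sequence advances either way. */
	op->currframe++;
	if (op->currframe >= op->nframes)
		op->currframe = 0;
}
```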
    • K
      xfrm: Update ipcomp_scratches with NULL when freed · 16cebfda
      Khalid Masum committed
      stable inclusion
      from stable-v4.19.262
      commit 1e8abde895b3ac6a368cbdb372e8800c49e73a28
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 8a04d2fc ]
      
      Currently, if ipcomp_alloc_scratches() fails to allocate memory,
      ipcomp_scratches holds an obsolete address. So when we try to free the
      percpu scratches using ipcomp_free_scratches(), it tries to vfree a
      nonexistent vm area. Described below:
      
      static void * __percpu *ipcomp_alloc_scratches(void)
      {
              ...
              scratches = alloc_percpu(void *);
              if (!scratches)
                      return NULL;
      ipcomp_scratches does not know about this allocation failure.
      Therefore holding the old obsolete address.
              ...
      }
      
      So when we free,
      
      static void ipcomp_free_scratches(void)
      {
              ...
              scratches = ipcomp_scratches;
      Assigning obsolete address from ipcomp_scratches
      
              if (!scratches)
                      return;
      
              for_each_possible_cpu(i)
                     vfree(*per_cpu_ptr(scratches, i));
      Trying to free a nonexistent page, causing the warning: trying to vfree
      nonexistent vm area.
              ...
      }
      
      Fix this breakage by updating ipcomp_scratches with NULL when scratches
      is freed.
      Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
      Reported-by: syzbot+5ec9bb042ddfe9644773@syzkaller.appspotmail.com
      Tested-by: syzbot+5ec9bb042ddfe9644773@syzkaller.appspotmail.com
      Signed-off-by: Khalid Masum <khalid.masum.92@gmail.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      16cebfda
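The xfrm fix above is an instance of a general rule: a global that caches an allocation must be reset when that allocation is freed. The sketch below models this in user space with malloc/free instead of percpu and vmalloc memory. The names `scratches_global`, `alloc_scratches()` and `free_scratches()` are hypothetical, and the `fail_outer` parameter exists only to simulate an allocation failure.

```c
#include <assert.h>
#include <stdlib.h>

#define NSLOTS 4

/* Models the global pointer that caches the scratch array. */
static void **scratches_global;

/* Free every slot, then the array, then forget the stale address. */
static void free_scratches(void)
{
	void **s = scratches_global;

	if (!s)
		return;

	for (int i = 0; i < NSLOTS; i++)
		free(s[i]);
	free(s);

	/* The point of the fix: never leave a dangling pointer behind. */
	scratches_global = NULL;
}

/* Allocate the array and its slots; fail_outer simulates a failed allocation. */
static void **alloc_scratches(int fail_outer)
{
	/* Precondition of this sketch: the previous array was freed first. */
	assert(scratches_global == NULL);

	void **s = fail_outer ? NULL : calloc(NSLOTS, sizeof(*s));
	if (!s)
		return NULL;

	scratches_global = s;
	for (int i = 0; i < NSLOTS; i++) {
		s[i] = malloc(64);
		if (!s[i])
			return NULL;  /* caller cleans up via free_scratches() */
	}
	return s;
}
```

If `free_scratches()` did not reset the global, a failed allocation followed by a second free would walk the old array again, which is the double-free pattern the commit describes.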
    • E
      tcp: annotate data-race around tcp_md5sig_pool_populated · e68b9f9b
      Eric Dumazet committed
      stable inclusion
      from stable-v4.19.262
      commit 5c4e1b8939195fe27b05d791577f92445b139a3e
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit aacd467c ]
      
      tcp_md5sig_pool_populated can be read while another thread
      changes its value.
      
      The race has no consequence because allocations
      are protected with tcp_md5sig_mutex.
      
      This patch adds READ_ONCE() and WRITE_ONCE() to document
      the race and silence KCSAN.
      Reported-by: Abhishek Shah <abhishek.shah@columbia.edu>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      e68b9f9b
    • N
      tcp: fix tcp_cwnd_validate() to not forget is_cwnd_limited · 4e3f7a25
      Neal Cardwell committed
      stable inclusion
      from stable-v4.19.262
      commit a434d10e7a90e301ea4a1826ee758b53c79d7de8
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit f4ce91ce ]
      
      This commit fixes a bug in the tracking of max_packets_out and
      is_cwnd_limited. This bug can cause the connection to fail to remember
      that is_cwnd_limited is true, causing the connection to fail to grow
      cwnd when it should, causing throughput to be lower than it should be.
      
      The following event sequence is an example that triggers the bug:
      
       (a) The connection is cwnd_limited, but packets_out is not at its
           peak due to TSO deferral deciding not to send another skb yet.
           In such cases the connection can advance max_packets_seq and set
           tp->is_cwnd_limited to true and max_packets_out to a small
           number.
      
       (b) Then later in the round trip the connection is pacing-limited (not
           cwnd-limited), and packets_out is larger. In such cases the
           connection would raise max_packets_out to a bigger number but
           (unexpectedly) flip tp->is_cwnd_limited from true to false.
      
      This commit fixes that bug.
      
      One straightforward fix would be to separately track (a) the next
      window after max_packets_out reaches a maximum, and (b) the next
      window after tp->is_cwnd_limited is set to true. But this would
      require consuming an extra u32 sequence number.
      
      Instead, to save space we track only the most important
      information. Specifically, we track the strongest available signal of
      the degree to which the cwnd is fully utilized:
      
      (1) If the connection is cwnd-limited then we remember that fact for
      the current window.
      
      (2) If the connection is not cwnd-limited then we track the maximum
      number of outstanding packets in the current window.
      
      In particular, note that the new logic cannot trigger the buggy
      (a)/(b) sequence above because with the new logic a condition where
      tp->packets_out > tp->max_packets_out can only trigger an update of
      tp->is_cwnd_limited if tp->is_cwnd_limited is false.
      
      This first showed up in testing of a BBRv2 dev branch, but this
      buggy behavior highlighted a general issue with the
      tcp_cwnd_validate() logic that can cause cwnd to fail to increase at
      the proper rate for any TCP congestion control, including Reno or
      CUBIC.
      
      Fixes: ca8a2263 ("tcp: make cwnd-limited checks measurement-based, and gentler")
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Kevin(Yudong) Yang <yyd@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      4e3f7a25
    • B
      ext4: fix null-ptr-deref in ext4_write_info · 4700f752
      Baokun Li committed
      stable inclusion
      from stable-v4.19.262
      commit 947264e00c46de19a016fd81218118c708fed2f3
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      commit f9c1f248 upstream.
      
      I caught a null-ptr-deref bug as follows:
      
      ==================================================================
      KASAN: null-ptr-deref in range [0x0000000000000068-0x000000000000006f]
      CPU: 1 PID: 1589 Comm: umount Not tainted 5.10.0-02219-dirty #339
      RIP: 0010:ext4_write_info+0x53/0x1b0
      [...]
      Call Trace:
       dquot_writeback_dquots+0x341/0x9a0
       ext4_sync_fs+0x19e/0x800
       __sync_filesystem+0x83/0x100
       sync_filesystem+0x89/0xf0
       generic_shutdown_super+0x79/0x3e0
       kill_block_super+0xa1/0x110
       deactivate_locked_super+0xac/0x130
       deactivate_super+0xb6/0xd0
       cleanup_mnt+0x289/0x400
       __cleanup_mnt+0x16/0x20
       task_work_run+0x11c/0x1c0
       exit_to_user_mode_prepare+0x203/0x210
       syscall_exit_to_user_mode+0x5b/0x3a0
       do_syscall_64+0x59/0x70
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
       ==================================================================
      
      Above issue may happen as follows:
      -------------------------------------
      exit_to_user_mode_prepare
       task_work_run
        __cleanup_mnt
         cleanup_mnt
          deactivate_super
           deactivate_locked_super
            kill_block_super
             generic_shutdown_super
              shrink_dcache_for_umount
               dentry = sb->s_root
               sb->s_root = NULL              <--- Here set NULL
              sync_filesystem
               __sync_filesystem
                sb->s_op->sync_fs > ext4_sync_fs
                 dquot_writeback_dquots
                  sb->dq_op->write_info > ext4_write_info
                   ext4_journal_start(d_inode(sb->s_root), EXT4_HT_QUOTA, 2)
                    d_inode(sb->s_root)
                     s_root->d_inode          <--- Null pointer dereference
      
      To solve this problem, we use ext4_journal_start_sb directly
      to avoid s_root being used.
      
      Cc: stable@kernel.org
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20220805123947.565152-1-libaokun1@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      4700f752
    • S
      Revert "fs: check FMODE_LSEEK to control internal pipe splicing" · d5e86eaf
      Sasha Levin committed
      stable inclusion
      from stable-v4.19.262
      commit 6d43e94b8daf009694d709c5919d67067936577a
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      This reverts commit fd0a6e99b61e6c08fa5cf585d54fd956f70c73a6.
      
      Which was upstream commit 97ef77c5.
      
      The commit is missing dependencies and breaks NFS tests, remove it for
      now.
      Reported-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      d5e86eaf
    • T
      ima: Free the entire rule if it fails to parse · a8e008b9
      Tyler Hicks committed
      stable inclusion
      from stable-v4.19.261
      commit f039564c36ee609bc5327a52895dfee05c8f0f7c
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      commit 2bdd737c upstream.
      
      Use ima_free_rule() to fix memory leaks of allocated ima_rule_entry
      members, such as .fsname and .keyrings, when an error is encountered
      during rule parsing.
      
      Set the args_p pointer to NULL after freeing it in the error path of
      ima_lsm_rule_init() so that it isn't freed twice.
      
      This fixes a memory leak seen when loading a rule that contains an
      additional piece of allocated memory, such as an fsname, followed by an
      invalid conditional:
      
       # echo "measure fsname=tmpfs bad=cond" > /sys/kernel/security/ima/policy
       -bash: echo: write error: Invalid argument
       # echo scan > /sys/kernel/debug/kmemleak
       # cat /sys/kernel/debug/kmemleak
       unreferenced object 0xffff98e7e4ece6c0 (size 8):
         comm "bash", pid 672, jiffies 4294791843 (age 21.855s)
         hex dump (first 8 bytes):
           74 6d 70 66 73 00 6b a5                          tmpfs.k.
         backtrace:
           [<00000000abab7413>] kstrdup+0x2e/0x60
           [<00000000f11ede32>] ima_parse_add_rule+0x7d4/0x1020
           [<00000000f883dd7a>] ima_write_policy+0xab/0x1d0
           [<00000000b17cf753>] vfs_write+0xde/0x1d0
           [<00000000b8ddfdea>] ksys_write+0x68/0xe0
           [<00000000b8e21e87>] do_syscall_64+0x56/0xa0
           [<0000000089ea7b98>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: f1b08bbc ("ima: define a new policy condition based on the filesystem name")
      Fixes: 2b60c0ec ("IMA: Read keyrings= option from the IMA policy")
      Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
      Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
      Cc: <stable@vger.kernel.org> # 4.19+
      Signed-off-by: Gou Hao <gouhao@uniontech.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      a8e008b9
    • T
      ima: Free the entire rule when deleting a list of rules · f51a4e07
      Tyler Hicks committed
      stable inclusion
      from stable-v4.19.261
      commit 3d55a948aaec1ffd4ea329bc6e1a7ecd4f10e64f
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      commit 465aee77 upstream.
      
      Create a function, ima_free_rule(), to free all memory associated with
      an ima_rule_entry. Use the new function to fix memory leaks of allocated
      ima_rule_entry members, such as .fsname and .keyrings, when deleting a
      list of rules.
      
      Make the existing ima_lsm_free_rule() function specific to the LSM
      audit rule array of an ima_rule_entry and require that callers make an
      additional call to kfree to free the ima_rule_entry itself.
      
      This fixes a memory leak seen when loading a valid rule that contains
      an additional piece of allocated memory, such as an fsname, followed by
      an invalid rule that triggers a policy load failure:
      
       # echo -e "dont_measure fsname=securityfs\nbad syntax" > \
          /sys/kernel/security/ima/policy
       -bash: echo: write error: Invalid argument
       # echo scan > /sys/kernel/debug/kmemleak
       # cat /sys/kernel/debug/kmemleak
       unreferenced object 0xffff9bab67ca12c0 (size 16):
         comm "bash", pid 684, jiffies 4295212803 (age 252.344s)
         hex dump (first 16 bytes):
           73 65 63 75 72 69 74 79 66 73 00 6b 6b 6b 6b a5  securityfs.kkkk.
         backtrace:
           [<00000000adc80b1b>] kstrdup+0x2e/0x60
           [<00000000d504cb0d>] ima_parse_add_rule+0x7d4/0x1020
           [<00000000444825ac>] ima_write_policy+0xab/0x1d0
           [<000000002b7f0d6c>] vfs_write+0xde/0x1d0
           [<0000000096feedcf>] ksys_write+0x68/0xe0
           [<0000000052b544a2>] do_syscall_64+0x56/0xa0
           [<000000007ead1ba7>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: f1b08bbc ("ima: define a new policy condition based on the filesystem name")
      Fixes: 2b60c0ec ("IMA: Read keyrings= option from the IMA policy")
      Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
      Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
      Cc: <stable@vger.kernel.org> # 4.19+
      Signed-off-by: Gou Hao <gouhao@uniontech.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      f51a4e07
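The split between freeing the LSM part of a rule and freeing the whole rule, as described in the ima commit above, can be sketched in user space. The structure and function names below (`struct rule_entry`, `rule_free_lsm()`, `rule_free()`, `dup_str()`) are hypothetical, and plain `free()` stands in for whatever the kernel and the LSMs actually call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LSM_RULES 2

/* Hypothetical stand-in for a policy rule with separately allocated members. */
struct rule_entry {
	char *fsname;
	char *keyrings;
	void *lsm_rule[MAX_LSM_RULES];
};

/* Small strdup replacement so the sketch stays within standard C. */
static char *dup_str(const char *s)
{
	assert(s != NULL);
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (p)
		memcpy(p, s, n);
	return p;
}

/* Free only the LSM part; pointers are cleared so a second call is harmless. */
static void rule_free_lsm(struct rule_entry *e)
{
	for (int i = 0; i < MAX_LSM_RULES; i++) {
		free(e->lsm_rule[i]);
		e->lsm_rule[i] = NULL;
	}
}

/* Free every member of the rule and then the rule itself. */
static void rule_free(struct rule_entry *e)
{
	if (!e)
		return;

	free(e->fsname);
	free(e->keyrings);
	rule_free_lsm(e);
	free(e);
}
```

Routing every error and teardown path through one function such as `rule_free()` is what prevents members like the file system name from being leaked when only the container is freed.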
    • T
      ima: Have the LSM free its audit rule · 5fb445c2
      Tyler Hicks committed
      stable inclusion
      from stable-v4.19.261
      commit 7e290764624acfc807a9dae958b3e4ecc550b50c
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      commit 9ff8a616 upstream.
      
      Ask the LSM to free its audit rule rather than directly calling kfree().
      Both AppArmor and SELinux do additional work in their audit_rule_free()
      hooks. Fix memory leaks by allowing the LSMs to perform necessary work.
      
      Fixes: b1694245 ("ima: use the lsm policy update notifier")
      Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
      Cc: Janne Karhunen <janne.karhunen@gmail.com>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
      Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
      Cc: <stable@vger.kernel.org> # 4.19+
      Signed-off-by: Gou Hao <gouhao@uniontech.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      5fb445c2
    • A
      mm/migrate_device.c: flush TLB while holding PTL · 926a1eb4
      Alistair Popple committed
      stable inclusion
      from stable-v4.19.261
      commit acf4387e553ede843bdacf3616e4b750517b58db
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      commit 60bae737 upstream.
      
      When clearing a PTE the TLB should be flushed whilst still holding the PTL
      to avoid a potential race with madvise/munmap/etc.  For example consider
      the following sequence:
      
        CPU0                          CPU1
        ----                          ----
      
        migrate_vma_collect_pmd()
        pte_unmap_unlock()
                                      madvise(MADV_DONTNEED)
                                      -> zap_pte_range()
                                      pte_offset_map_lock()
                                      [ PTE not present, TLB not flushed ]
                                      pte_unmap_unlock()
                                      [ page is still accessible via stale TLB ]
        flush_tlb_range()
      
      In this case the page may still be accessed via the stale TLB entry after
      madvise returns.  Fix this by flushing the TLB while holding the PTL.
      
      Fixes: 8c3328f1 ("mm/migrate: migrate_vma() unmap page from vma while collecting pages")
      Link: https://lkml.kernel.org/r/9f801e9d8d830408f2ca27821f606e09aa856899.1662078528.git-series.apopple@nvidia.com
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Reported-by: Nadav Amit <nadav.amit@gmail.com>
      Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Cc: Alex Sierra <alex.sierra@amd.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: huang ying <huang.ying.caritas@gmail.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Karol Herbst <kherbst@redhat.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Lyude Paul <lyude@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      926a1eb4
    • M
      mm: prevent page_frag_alloc() from corrupting the memory · d6bd3a36
      Maurizio Lombardi committed
      stable inclusion
      from stable-v4.19.261
      commit 39a22a4ccd3a6c073ba2257629e752afa4e7ad08
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      commit dac22531 upstream.
      
      A number of drivers call page_frag_alloc() with a fragment's size >
      PAGE_SIZE.
      
      In low memory conditions, __page_frag_cache_refill() may fail the order
      3 cache allocation and fall back to order 0; in this case, the cache
      will be smaller than the fragment, causing memory corruptions.
      
      Prevent this from happening by checking if the newly allocated cache is
      large enough for the fragment; if not, the allocation will fail and
      page_frag_alloc() will return NULL.
      
      Link: https://lkml.kernel.org/r/20220715125013.247085-1-mlombard@redhat.com
      Fixes: b63ae8ca ("mm/net: Rename and move page fragment handling from net/ to mm/")
      Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Cc: Chen Lin <chen45464546@163.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      d6bd3a36
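The size check described in the page_frag_alloc() commit above can be modelled in user space. This sketch is heavily simplified and all names are hypothetical: `struct frag_cache`, `cache_refill()` and `frag_alloc()` stand in for the fragment cache and its refill path, `malloc()` replaces the page allocator, the `low_memory` flag simulates the order-0 fallback, and the cache sizes assume 4 KiB pages. Unlike the kernel, the refill here simply drops the old buffer instead of reference counting it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SMALL_CACHE 4096u   /* models the order-0 fallback */
#define LARGE_CACHE 32768u  /* models the order-3 cache */

/* Hypothetical stand-in for the fragment cache. */
struct frag_cache {
	unsigned char *base;  /* start of the cache buffer */
	size_t size;          /* size of the current buffer */
	size_t offset;        /* fragments are carved downwards from here */
};

/* Replace the cache buffer; low_memory selects the small fallback size. */
static int cache_refill(struct frag_cache *nc, int low_memory)
{
	size_t size = low_memory ? SMALL_CACHE : LARGE_CACHE;
	unsigned char *p = malloc(size);

	if (!p)
		return -1;

	free(nc->base);  /* simplification: no outstanding-fragment tracking */
	nc->base = p;
	nc->size = size;
	nc->offset = size;
	return 0;
}

/* Carve a fragment from the cache, or return NULL if it cannot fit. */
static void *frag_alloc(struct frag_cache *nc, size_t fragsz, int low_memory)
{
	assert(nc != NULL);

	if (!nc->base || nc->offset < fragsz) {
		if (cache_refill(nc, low_memory))
			return NULL;
		/* The check added by the fix: a fresh cache may still be too small. */
		if (nc->size < fragsz)
			return NULL;
	}

	nc->offset -= fragsz;
	return nc->base + nc->offset;
}
```

Without the size check after the refill, the subtraction would underflow for a fragment larger than the fallback cache and the returned pointer would lie outside the buffer, which is the corruption the commit prevents.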