1. 08 Dec, 2022 2 commits
    • X
      signal handling: don't use BUG_ON() for debugging · a2f88993
      Xia Fukun committed
      stable inclusion
      from stable-v4.19.267
      commit 93d9cef55f8fe463e3b9f6c73c7a32619222c657
      category: bugfix
      bugzilla: 187828, https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit a382f8fe ]
      
      These are indeed "should not happen" situations, but it turns out recent
      changes made the 'task_is_stopped_or_trace()' case trigger (fix for that
      exists, is pending more testing), and the BUG_ON() makes it
      unnecessarily hard to actually debug for no good reason.
      
      It's been that way for a long time, but let's make it clear: BUG_ON() is
      not good for debugging, and should never be used in situations where you
      could just say "this shouldn't happen, but we can continue".
      
      Use WARN_ON_ONCE() instead to make sure it gets logged, and then just
      continue running, instead of making the system basically unusable
      because you crashed the machine while potentially holding some very core
      locks (e.g. this function is commonly called while holding 'tasklist_lock'
      for writing).
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Xia Fukun <xiafukun@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      a2f88993
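The "log once and keep running" behaviour described in the signal-handling commit above can be illustrated with a minimal userspace sketch. This is not the kernel's WARN_ON_ONCE() implementation; `sketch_warn_once()`, `sketch_handle_state()` and `sketch_warn_count` are hypothetical names invented for the illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Number of warnings actually printed (illustration only). */
static int sketch_warn_count;

/*
 * Hypothetical stand-in for a warn-once helper: returns the condition,
 * but prints a message only the first time the condition is true.
 */
static bool sketch_warn_once(bool cond, bool *already_warned, const char *what)
{
    if (cond && !*already_warned) {
        *already_warned = true;
        sketch_warn_count++;
        fprintf(stderr, "WARNING (once): %s\n", what);
    }
    return cond;
}

/*
 * A caller that hits a "should not happen" state logs it once, bails
 * out of this call with an error, and lets the program keep running
 * instead of aborting.
 */
static int sketch_handle_state(int state)
{
    static bool warned;

    if (sketch_warn_once(state < 0, &warned, "unexpected negative state"))
        return -1;
    return 0;
}
```

Calling `sketch_handle_state()` repeatedly with a bad value returns an error each time but prints only one warning.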
    • X
      ida: don't use BUG_ON() for debugging · eaab6483
      Xia Fukun committed
      stable inclusion
      from stable-v4.19.267
      commit 33d2f83e3f2c1fdabb365d25bed3aa630041cbc0
      category: bugfix
      bugzilla: 188002, https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      commit fc82bbf4 upstream.
      
      This is another old BUG_ON() that just shouldn't exist (see also commit
      a382f8fe: "signal handling: don't use BUG_ON() for debugging").
      
      In fact, as Matthew Wilcox points out, this condition shouldn't really
      even result in a warning, since a negative id allocation result is just
      a normal allocation failure:
      
        "I wonder if we should even warn here -- sure, the caller is trying to
         free something that wasn't allocated, but we don't warn for
         kfree(NULL)"
      
      and goes on to point out how that current error check is only causing
      people to unnecessarily do their own index range checking before freeing
      it.
      
      This was noted by Itay Iellin, because the bluetooth HCI socket cookie
      code does *not* do that range checking, and ends up just freeing the
      error case too, triggering the BUG_ON().
      
      The HCI code requires CAP_NET_RAW, and seems to just result in an ugly
      splat, but there really is no reason to BUG_ON() here, and we have
      generally striven for allocation models where it's always ok to just do
      
          free(alloc());
      
      even if the allocation were to fail for some random reason (usually
      obviously that "random" reason being some resource limit).
      
      Fixes: 88eca020 ("ida: simplified functions for id allocation")
      Reported-by: Itay Iellin <ieitayie@gmail.com>
      Suggested-by: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Xia Fukun <xiafukun@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      eaab6483
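The `free(alloc());` model from the ida commit above can be sketched with a toy ID pool. This is an assumption-laden illustration, not the kernel's ida code: `sketch_id_alloc()`, `sketch_id_free()` and `SKETCH_MAX_IDS` are hypothetical names, and the pool is a plain array.

```c
#include <stdbool.h>

#define SKETCH_MAX_IDS 4   /* hypothetical pool size */

static bool sketch_used[SKETCH_MAX_IDS];

/* Returns the lowest free id, or -1 when the pool is exhausted. */
static int sketch_id_alloc(void)
{
    for (int i = 0; i < SKETCH_MAX_IDS; i++) {
        if (!sketch_used[i]) {
            sketch_used[i] = true;
            return i;
        }
    }
    return -1;
}

/*
 * Freeing a failed allocation result (negative id) or an out-of-range
 * id is a silent no-op, in the same spirit as free(NULL).
 */
static void sketch_id_free(int id)
{
    if (id < 0 || id >= SKETCH_MAX_IDS)
        return;
    sketch_used[id] = false;
}
```

With this shape, callers can write `sketch_id_free(sketch_id_alloc());` without doing their own range check first.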
  2. 06 Dec, 2022 1 commit
    • O
      !272 [openEuler-1.0-LTS] Add MWAIT Cx support for Zhaoxin CPUs. · 75ea48ac
      openeuler-ci-bot committed
      Merge Pull Request from: @leoliu-oc 
       
      When the processor is idle, low-power idle states (C-states) can be used to save power. For Zhaoxin processors, there are two methods to enter idle states. One is the HLT instruction together with the legacy method of I/O reads from the ACPI-defined register (known as P_LVLx); the other is the MWAIT instruction with idle-state hints.
      
      By default, as on legacy operating systems, HLT and P_LVLx I/O reads are used on Zhaoxin processors to enter idle states, but we have verified on some Zhaoxin platforms that the MWAIT instruction is more efficient than P_LVLx I/O reads and HLT, so we add MWAIT Cx support for Zhaoxin processors.
      
      ### Issue
      https://gitee.com/openeuler/kernel/issues/I62TOM
      
      ### Test
      N/A
      
      ### Known Issue
      N/A
      
      ### Default config change
      N/A
      
       
       
      Link:https://gitee.com/openeuler/kernel/pulls/272 
      Reviewed-by: Laibin Qiu <qiulaibin@huawei.com> 
      Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> 
      75ea48ac
  3. 05 Dec, 2022 3 commits
  4. 29 Nov, 2022 5 commits
  5. 27 Nov, 2022 1 commit
    • F
      x86/tsc: use topology_max_packages() in tsc watchdog check · 4f283abb
      Feng Tang committed
      hulk inclusion
      category: bugfix
      bugzilla: 187942, https://gitee.com/openeuler/kernel/issues/I5U037
      CVE: NA
      
      -------------------------------
      
      Commit b50db709 ("x86/tsc: Disable clocksource watchdog for TSC
      on qualified platorms") was introduced to solve the problem that
      the TSC clocksource is sometimes wrongly judged as unstable by
      watchdogs such as 'jiffies', HPET, etc.
      
      In it, the hardware socket number is a key factor in judging
      whether to disable the watchdog for TSC, and 'nr_online_nodes' was
      chosen as an estimate because it is needed in the early boot phase
      before registering the 'tsc-early' clocksource, when the non-boot
      CPUs have not been brought up yet.
      
      In a recent patch review, Dave Hansen pointed out that there are many
      cases where 'nr_online_nodes' could have issues, such as:
      * numa emulation (numa=fake=4 etc.)
      * numa=off
      * platforms with CPU+DRAM nodes, CPU-less HBM nodes, CPU-less
        persistent memory nodes.
      
      Peter Zijlstra suggested using logical package ids, but these are
      only usable after smp_init(), once all CPUs are initialized.
      
      One solution is to skip the watchdog for the 'tsc-early' clocksource
      and move the check to after smp_init() but before the 'tsc'
      clocksource is registered, where topology_max_packages() can
      be used as a much more accurate socket number.
      Signed-off-by: Feng Tang <feng.tang@intel.com>
      
      Conflict:
      	arch/x86/kernel/tsc.c
      Signed-off-by: Yu Liao <liaoyu15@huawei.com>
      Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      4f283abb
  6. 26 Nov, 2022 2 commits
    • X
      scsi: hisi_sas: Set iptt aborted flag when receiving an abnormal CQ · 4cccc16a
      Xingui Yang committed
      driver inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I62ZXO
      CVE: NA
      
      ------------------------------------------------
      
      During write I/O, when the SAS PHY switch is tested, the hardware
      may report two CQs for one I/O. The first CQ indicates an invalid port
      during DPH scheduling; the second CQ indicates that the response frame has
      been written to memory but the I/O ended abnormally due to an I/O data
      underload. So set the iptt aborted flag when receiving an abnormal CQ; the
      host will then discard the IPTT frame received from the SAS hard disk.
      Signed-off-by: Xingui Yang <yangxingui@huawei.com>
      Reviewed-by: kang fenglong <kangfenglong@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      4cccc16a
    • L
      ext4: fix bug in extents parsing when eh_entries == 0 and eh_depth > 0 · bc9ebdce
      Luís Henriques committed
      mainline inclusion
      from mainline-v6.0-rc7
      commit 29a5b8a1
      category: bugfix
      bugzilla: 187444, https://gitee.com/openeuler/kernel/issues/I6261Z
      CVE: NA
      
      --------------------------------
      
      When walking through an inode's extents, the ext4_ext_binsearch_idx() function
      assumes that the extent header has been previously validated.  However, there
      are no checks that verify that the number of entries (eh->eh_entries) is
      non-zero when depth is > 0.  And this will lead to problems because the
      EXT_FIRST_INDEX() and EXT_LAST_INDEX() will return garbage and result in this:
      
      [  135.245946] ------------[ cut here ]------------
      [  135.247579] kernel BUG at fs/ext4/extents.c:2258!
      [  135.249045] invalid opcode: 0000 [#1] PREEMPT SMP
      [  135.250320] CPU: 2 PID: 238 Comm: tmp118 Not tainted 5.19.0-rc8+ #4
      [  135.252067] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
      [  135.255065] RIP: 0010:ext4_ext_map_blocks+0xc20/0xcb0
      [  135.256475] Code:
      [  135.261433] RSP: 0018:ffffc900005939f8 EFLAGS: 00010246
      [  135.262847] RAX: 0000000000000024 RBX: ffffc90000593b70 RCX: 0000000000000023
      [  135.264765] RDX: ffff8880038e5f10 RSI: 0000000000000003 RDI: ffff8880046e922c
      [  135.266670] RBP: ffff8880046e9348 R08: 0000000000000001 R09: ffff888002ca580c
      [  135.268576] R10: 0000000000002602 R11: 0000000000000000 R12: 0000000000000024
      [  135.270477] R13: 0000000000000000 R14: 0000000000000024 R15: 0000000000000000
      [  135.272394] FS:  00007fdabdc56740(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
      [  135.274510] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  135.276075] CR2: 00007ffc26bd4f00 CR3: 0000000006261004 CR4: 0000000000170ea0
      [  135.277952] Call Trace:
      [  135.278635]  <TASK>
      [  135.279247]  ? preempt_count_add+0x6d/0xa0
      [  135.280358]  ? percpu_counter_add_batch+0x55/0xb0
      [  135.281612]  ? _raw_read_unlock+0x18/0x30
      [  135.282704]  ext4_map_blocks+0x294/0x5a0
      [  135.283745]  ? xa_load+0x6f/0xa0
      [  135.284562]  ext4_mpage_readpages+0x3d6/0x770
      [  135.285646]  read_pages+0x67/0x1d0
      [  135.286492]  ? folio_add_lru+0x51/0x80
      [  135.287441]  page_cache_ra_unbounded+0x124/0x170
      [  135.288510]  filemap_get_pages+0x23d/0x5a0
      [  135.289457]  ? path_openat+0xa72/0xdd0
      [  135.290332]  filemap_read+0xbf/0x300
      [  135.291158]  ? _raw_spin_lock_irqsave+0x17/0x40
      [  135.292192]  new_sync_read+0x103/0x170
      [  135.293014]  vfs_read+0x15d/0x180
      [  135.293745]  ksys_read+0xa1/0xe0
      [  135.294461]  do_syscall_64+0x3c/0x80
      [  135.295284]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      This patch simply adds an extra check in __ext4_ext_check(), verifying that
      eh_entries is not 0 when eh_depth is > 0.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=215941
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216283
      Cc: Baokun Li <libaokun1@huawei.com>
      Cc: stable@kernel.org
      Signed-off-by: Luís Henriques <lhenriques@suse.de>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Baokun Li <libaokun1@huawei.com>
      Link: https://lore.kernel.org/r/20220822094235.2690-1-lhenriques@suse.de
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      bc9ebdce
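The extra header check described in the ext4 commit above can be sketched as follows. This is a simplified illustration, not the code in __ext4_ext_check(): `struct sketch_extent_header` is a hypothetical, reduced header (the on-disk structure has more fields and little-endian types), and `sketch_header_valid()` is an invented name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced extent header used only for this sketch. */
struct sketch_extent_header {
    uint16_t eh_entries;  /* number of valid entries */
    uint16_t eh_max;      /* capacity of the node in entries */
    uint16_t eh_depth;    /* 0 for a leaf, > 0 for an index node */
};

static bool sketch_header_valid(const struct sketch_extent_header *eh)
{
    /* A node cannot hold more entries than its capacity. */
    if (eh->eh_entries > eh->eh_max)
        return false;

    /*
     * The check the commit describes: an index node (depth > 0) with
     * zero entries has no first/last index to search, so reject it.
     */
    if (eh->eh_entries == 0 && eh->eh_depth > 0)
        return false;

    return true;
}
```

An empty leaf (depth 0, no entries) still passes; only the empty index node is rejected.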
  7. 24 Nov, 2022 1 commit
    • L
      Add MWAIT Cx support for Zhaoxin CPUs. · e1b6487f
      leoliu committed
      zhaoxin inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I62TOM
      CVE: NA
      
      ----------------------------------------------------------------
      
      When the processor is idle, low-power idle states (C-states) can be used
      to save power. For Zhaoxin processors, there are two methods to enter idle
      states. One is the HLT instruction together with the legacy method of I/O
      reads from the ACPI-defined register (known as P_LVLx); the other is the
      MWAIT instruction with idle-state hints.
      
      By default, as on legacy operating systems, HLT and P_LVLx I/O reads are
      used on Zhaoxin processors to enter idle states, but we have verified on
      some Zhaoxin platforms that the MWAIT instruction is more efficient than
      P_LVLx I/O reads and HLT, so we add MWAIT Cx support for Zhaoxin processors.
      Signed-off-by: leoliu <leoliu@zhaoxin.com>
      e1b6487f
  8. 21 Nov, 2022 1 commit
  9. 19 Nov, 2022 4 commits
  10. 15 Nov, 2022 1 commit
  11. 14 Nov, 2022 2 commits
  12. 08 Nov, 2022 17 commits
    • R
      init/main.c: return 1 from handled __setup() functions · d484e833
      Randy Dunlap committed
      stable inclusion
      from stable-4.19.238
      commit c7daf1b4ad809692d5c26f33c02ed8a031066548
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5X41F
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit f9a40b08 ]
      
      initcall_blacklist() should return 1 to indicate that it handled its
      cmdline arguments.
      
      set_debug_rodata() should return 1 to indicate that it handled its
      cmdline arguments.  Print a warning if the option string is invalid.
      
      This prevents these strings from being added to the 'init' program's
      environment as they are not init arguments/parameters.
      
      Link: https://lkml.kernel.org/r/20220221050901.23985-1-rdunlap@infradead.org
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Lin Yujun <linyujun809@huawei.com>
      Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      d484e833
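The return-value convention in the __setup() commit above (1 means "handled", so the string is not passed on to 'init') can be sketched with a toy dispatcher. This is a userspace illustration under stated assumptions, not the kernel's parameter parsing: `sketch_dispatch()`, `sketch_set_rodata()` and the table layout are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef int (*sketch_setup_fn)(const char *val);

struct sketch_setup {
    const char *prefix;   /* e.g. "rodata=" */
    sketch_setup_fn fn;   /* returns 1 when the option was handled */
};

static int sketch_debug_rodata = 1;

/* Handler: always returns 1, and warns when the value is invalid. */
static int sketch_set_rodata(const char *val)
{
    if (strcmp(val, "on") == 0)
        sketch_debug_rodata = 1;
    else if (strcmp(val, "off") == 0)
        sketch_debug_rodata = 0;
    else
        fprintf(stderr, "rodata= option '%s' is invalid\n", val);
    return 1;
}

static const struct sketch_setup sketch_table[] = {
    { "rodata=", sketch_set_rodata },
};

/* Returns 1 if a handler consumed arg, 0 if it would be left for init. */
static int sketch_dispatch(const char *arg)
{
    for (size_t i = 0; i < sizeof(sketch_table) / sizeof(sketch_table[0]); i++) {
        size_t n = strlen(sketch_table[i].prefix);

        if (strncmp(arg, sketch_table[i].prefix, n) == 0)
            return sketch_table[i].fn(arg + n);
    }
    return 0;
}
```

Because the handler returns 1 even for an invalid value, a malformed `rodata=` string is reported but not forwarded.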
    • P
      x86/pm: Save the MSR validity status at context setup · 47477ca7
      Pawan Gupta committed
      stable inclusion
      from stable-4.19.238
      commit c7daf1b4ad809692d5c26f33c02ed8a031066548
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5X41F
      CVE: NA
      
      --------------------------------
      
      commit 73924ec4 upstream.
      
      The mechanism to save/restore MSRs during S3 suspend/resume checks for
      the MSR validity during suspend, and only restores the MSR if it's a
      valid MSR.  This is not optimal, as an invalid MSR will unnecessarily
      throw an exception for every suspend cycle.  The more invalid MSRs,
      the higher the impact will be.
      
      Check and save the MSR validity at setup.  This ensures that only valid
      MSRs that are guaranteed to not throw an exception will be attempted
      during suspend.
      
      Fixes: 7a9c2dd0 ("x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume")
      Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Borislav Petkov <bp@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Lin Yujun <linyujun809@huawei.com>
      Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      47477ca7
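The idea in the MSR-validity commit above, probing each MSR once at setup and skipping invalid ones on every later suspend, can be sketched in userspace. Everything here is hypothetical: `sketch_rdmsr_safe()` merely pretends that MSR index 0xBAD does not exist, and the struct and function names are invented for the illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Counts simulated faults from reading a non-existent MSR. */
static int sketch_faults;

/* Hypothetical fault-safe read: 0 on success, -1 on a (simulated) fault. */
static int sketch_rdmsr_safe(uint32_t index, uint64_t *val)
{
    if (index == 0xBAD) {
        sketch_faults++;
        return -1;
    }
    *val = (uint64_t)index * 2;   /* made-up register contents */
    return 0;
}

struct sketch_saved_msr {
    uint32_t index;
    bool valid;       /* decided once, at setup */
    uint64_t value;   /* saved at each suspend */
};

/* Setup: probe every MSR a single time and remember whether it is valid. */
static void sketch_msr_setup(struct sketch_saved_msr *msrs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t dummy;

        msrs[i].valid = (sketch_rdmsr_safe(msrs[i].index, &dummy) == 0);
    }
}

/* Suspend: only touch MSRs already known to be valid, so no new faults. */
static void sketch_msr_suspend(struct sketch_saved_msr *msrs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (msrs[i].valid)
            sketch_rdmsr_safe(msrs[i].index, &msrs[i].value);
    }
}
```

In this sketch the fault count stays at one per invalid MSR no matter how many suspend cycles follow.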
    • P
      x86/speculation: Restore speculation related MSRs during S3 resume · 2b91784d
      Pawan Gupta committed
      stable inclusion
      from stable-4.19.238
      commit c7daf1b4ad809692d5c26f33c02ed8a031066548
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5X41F
      CVE: NA
      
      --------------------------------
      
      commit e2a1256b upstream.
      
      After resuming from suspend-to-RAM, the MSRs that control CPU's
      speculative execution behavior are not being restored on the boot CPU.
      
      These MSRs are used to mitigate speculative execution vulnerabilities.
      Not restoring them correctly may leave the CPU vulnerable.  Secondary
      CPU's MSRs are correctly being restored at S3 resume by
      identify_secondary_cpu().
      
      During S3 resume, restore these MSRs for boot CPU when restoring its
      processor state.
      
      Fixes: 77243971 ("x86/bugs/intel: Set proper CPU features and setup RDS")
      Reported-by: Neelima Krishnan <neelima.krishnan@intel.com>
      Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
      Acked-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Lin Yujun <linyujun809@huawei.com>
      Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      2b91784d
    • B
      x86/cpu: Load microcode during restore_processor_state() · 27dd57ae
      Borislav Petkov committed
      stable inclusion
      from stable-4.19.238
      commit c7daf1b4ad809692d5c26f33c02ed8a031066548
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5X41F
      CVE: NA
      
      --------------------------------
      
      commit f9e14dbb upstream.
      
      When resuming from system sleep state, restore_processor_state()
      restores the boot CPU MSRs. These MSRs could be emulated by microcode.
      If microcode is not loaded yet, writing to emulated MSRs leads to
      unchecked MSR access error:
      
        ...
        PM: Calling lapic_suspend+0x0/0x210
        unchecked MSR access error: WRMSR to 0x10f (tried to write 0x0...0) at rIP: ... (native_write_msr)
        Call Trace:
          <TASK>
          ? restore_processor_state
          x86_acpi_suspend_lowlevel
          acpi_suspend_enter
          suspend_devices_and_enter
          pm_suspend.cold
          state_store
          kobj_attr_store
          sysfs_kf_write
          kernfs_fop_write_iter
          new_sync_write
          vfs_write
          ksys_write
          __x64_sys_write
          do_syscall_64
          entry_SYSCALL_64_after_hwframe
         RIP: 0033:0x7fda13c260a7
      
      To ensure microcode emulated MSRs are available for restoration, load
      the microcode on the boot CPU before restoring these MSRs.
      
        [ Pawan: write commit message and productize it. ]
      
      Fixes: e2a1256b ("x86/speculation: Restore speculation related MSRs during S3 resume")
      Reported-by: Kyle D. Pelton <kyle.d.pelton@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Tested-by: Kyle D. Pelton <kyle.d.pelton@intel.com>
      Cc: stable@vger.kernel.org
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=215841
      Link: https://lore.kernel.org/r/4350dfbf785cd482d3fafa72b2b49c83102df3ce.1650386317.git.pawan.kumar.gupta@linux.intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Lin Yujun <linyujun809@huawei.com>
      Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      27dd57ae
    • T
      genirq: Synchronize interrupt thread startup · 950faec0
      Thomas Pfaff committed
      stable inclusion
      from stable-4.19.238
      commit c7daf1b4ad809692d5c26f33c02ed8a031066548
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5X41F
      CVE: NA
      
      --------------------------------
      
      commit 8707898e upstream.
      
      A kernel hang can be observed when running setserial in a loop on a kernel
      with force threaded interrupts. The sequence of events is:
      
         setserial
           open("/dev/ttyXXX")
             request_irq()
           do_stuff()
            -> serial interrupt
               -> wake(irq_thread)
      	      desc->threads_active++;
           close()
             free_irq()
               kthread_stop(irq_thread)
           synchronize_irq() <- hangs because desc->threads_active != 0
      
      The thread is created in request_irq() and woken up, but does not get on a
      CPU to reach the actual thread function, which would handle the pending
      wake-up. kthread_stop() sets the should stop condition which makes the
      thread immediately exit, which in turn leaves the stale threads_active
      count around.
      
      This problem was introduced with commit 519cc865, which addressed an
      interrupt sharing issue in the PCIe code.
      
      Before that commit free_irq() invoked synchronize_irq(), which waits for
      the hard interrupt handler and also for associated threads to complete.
      
      To address the PCIe issue synchronize_irq() was replaced with
      __synchronize_hardirq(), which only waits for the hard interrupt handler to
      complete, but not for threaded handlers.
      
      This was done under the assumption, that the interrupt thread already
      reached the thread function and waits for a wake-up, which is guaranteed to
      be handled before acting on the stop condition. The problematic case, that
      the thread would not reach the thread function, was obviously overlooked.
      
      Make sure that the interrupt thread is really started and reaches
      thread_fn() before returning from __setup_irq().
      
      This utilizes the existing wait queue in the interrupt descriptor. The
      wait queue is unused for non-shared interrupts. For shared interrupts the
      usage might cause a spurious wake-up of a waiter in synchronize_irq() or the
      completion of a threaded handler might cause a spurious wake-up of the
      waiter for the ready flag. Both are harmless and have no functional impact.
      
      [ tglx: Amended changelog ]
      
      Fixes: 519cc865 ("genirq: Synchronize only with single thread on free_irq()")
      Signed-off-by: Thomas Pfaff <tpfaff@pcs.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Marc Zyngier <maz@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/552fe7b4-9224-b183-bb87-a8f36d335690@pcs.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Lin Yujun <linyujun809@huawei.com>
      Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      950faec0
    • M
      nvme: Fix IOC_PR_CLEAR and IOC_PR_RELEASE ioctls for nvme devices · 3b23e85f
      Michael Kelley committed
      stable inclusion
      from stable-v4.19.261
      commit 5f7fd71e5bebf337769f20dd125822ce63266e4d
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit c292a337 ]
      
      The IOC_PR_CLEAR and IOC_PR_RELEASE ioctls are
      non-functional on NVMe devices because the nvme_pr_clear()
      and nvme_pr_release() functions set the IEKEY field incorrectly.
      The IEKEY field should be set only when the key is zero (i.e.,
      not specified).  The current code does it backwards.
      
      Furthermore, the NVMe spec describes the persistent
      reservation "clear" function as an option on the reservation
      release command. The current implementation of nvme_pr_clear()
      erroneously uses the reservation register command.
      
      Fix these errors. Note that NVMe version 1.3 and later specify
      that setting the IEKEY field will return an error of Invalid
      Field in Command.  The fix will set IEKEY when the key is zero,
      which is appropriate as these ioctls consider a zero key to
      be "unspecified", and the intention of the spec change is
      to require a valid key.
      
      Tested on a version 1.4 PCI NVMe device in an Azure VM.
      
      Fixes: 1673f1f0 ("nvme: move block_device_operations and ns/ctrl freeing to common code")
      Fixes: 1d277a63 ("NVMe: Add persistent reservation ops")
      Signed-off-by: Michael Kelley <mikelley@microsoft.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Conflicts:
      	drivers/nvme/host/core.c
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      Reviewed-by: Jason Yan <yanaijie@huawei.com>
      Reviewed-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      3b23e85f
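The IEKEY rule from the nvme commit above (set the flag only when the key is zero) can be sketched as a small helper that builds a command dword. This is an illustration, not the driver code: `sketch_resv_release_cdw10()` is a hypothetical name, and the bit layout used here (action in the low bits, IEKEY at bit 3, reservation type from bit 8) is an assumption that should be checked against the NVMe specification.

```c
#include <stdint.h>

/* Assumed position of the IEKEY flag in command dword 10. */
#define SKETCH_IEKEY (1u << 3)

/*
 * Builds dword 10 for a reservation release style command.
 * key:    reservation key supplied by the caller (0 means "unspecified")
 * action: 0 for release, 1 for clear (assumed encoding)
 * rtype:  reservation type, placed from bit 8 upward (assumed encoding)
 */
static uint32_t sketch_resv_release_cdw10(uint64_t key, uint32_t action,
                                          uint32_t rtype)
{
    /* Ignore the existing key only when the caller gave no key at all. */
    uint32_t iekey = (key == 0) ? SKETCH_IEKEY : 0u;

    return action | iekey | (rtype << 8);
}
```

The bug described in the commit corresponds to inverting the `key == 0` test, which sets IEKEY exactly when a real key was supplied.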
    • E
      once: add DO_ONCE_SLOW() for sleepable contexts · 3743e9b5
      Eric Dumazet committed
      stable inclusion
      from stable-v4.19.262
      commit f5686a03b138f6330eeda082ee4f96c8109f56f3
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 62c07983 ]
      
      Christophe Leroy reported a ~80ms latency spike
      happening at first TCP connect() time.
      
      This is because __inet_hash_connect() uses get_random_once()
      to populate a perturbation table which became quite big
      after commit 4c2c8f03 ("tcp: increase source port perturb table to 2^16")
      
      get_random_once() uses DO_ONCE(), which blocks hard irqs for the duration
      of the operation.
      
      This patch adds DO_ONCE_SLOW() which uses a mutex instead of a spinlock
      for operations where we prefer to stay in process context.
      
      Then __inet_hash_connect() can use get_random_slow_once()
      to populate its perturbation table.
      
      Fixes: 4c2c8f03 ("tcp: increase source port perturb table to 2^16")
      Fixes: 190cc824 ("tcp: change source port randomizarion at connect() time")
      Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Link: https://lore.kernel.org/netdev/CANn89iLAEYBaoYajy0Y9UmGFff5GPxDUoG-ErVB2jDdRNQ5Tug@mail.gmail.com/T/#t
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      
      One conflict occurs because the commit 4c2c8f03 ("tcp: increase
      source port perturb table to 2^16") is integrated but the commit
      e9261476 ("tcp: dynamically allocate the perturb table used by
      source port") is not integrated.
      One conflict occurs because the commit 1027b96e ("once: Fix panic
      when module unload") is not integrated.
      
      Conflicts:
      	net/ipv4/inet_hashtables.c
      	lib/once.c
      Signed-off-by: Liu Jian <liujian56@huawei.com>
      Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      3743e9b5
    • E
      inet: fully convert sk->sk_rx_dst to RCU rules · 4c4298bf
      Eric Dumazet committed
      stable inclusion
      from stable-v4.19.262
      commit 75a578000ae5e511e5d0e8433c94a14d9c99c412
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      commit 8f905c0e upstream.
      
      syzbot reported various issues around early demux,
      one being included in this changelog [1]
      
      sk->sk_rx_dst is using RCU protection without clearly
      documenting it.
      
      And following sequences in tcp_v4_do_rcv()/tcp_v6_do_rcv()
      are not following standard RCU rules.
      
      [a]    dst_release(dst);
      [b]    sk->sk_rx_dst = NULL;
      
      They look wrong because a delete operation of RCU protected
      pointer is supposed to clear the pointer before
      the call_rcu()/synchronize_rcu() guarding actual memory freeing.
      
      In some cases indeed, dst could be freed before [b] is done.
      
      We could cheat by clearing sk_rx_dst before calling
      dst_release(), but this seems the right time to stick
      to standard RCU annotations and debugging facilities.
      
      [1]
      BUG: KASAN: use-after-free in dst_check include/net/dst.h:470 [inline]
      BUG: KASAN: use-after-free in tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792
      Read of size 2 at addr ffff88807f1cb73a by task syz-executor.5/9204
      
      CPU: 0 PID: 9204 Comm: syz-executor.5 Not tainted 5.16.0-rc5-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_address_description.constprop.0.cold+0x8d/0x320 mm/kasan/report.c:247
       __kasan_report mm/kasan/report.c:433 [inline]
       kasan_report.cold+0x83/0xdf mm/kasan/report.c:450
       dst_check include/net/dst.h:470 [inline]
       tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792
       ip_rcv_finish_core.constprop.0+0x15de/0x1e80 net/ipv4/ip_input.c:340
       ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583
       ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
       ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5508 [inline]
       __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556
       __netif_receive_skb_list net/core/dev.c:5608 [inline]
       netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699
       gro_normal_list net/core/dev.c:5853 [inline]
       gro_normal_list net/core/dev.c:5849 [inline]
       napi_complete_done+0x1f1/0x880 net/core/dev.c:6590
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557
       __napi_poll+0xaf/0x440 net/core/dev.c:7023
       napi_poll net/core/dev.c:7090 [inline]
       net_rx_action+0x801/0xb40 net/core/dev.c:7177
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
       invoke_softirq kernel/softirq.c:432 [inline]
       __irq_exit_rcu+0x123/0x180 kernel/softirq.c:637
       irq_exit_rcu+0x5/0x20 kernel/softirq.c:649
       common_interrupt+0x52/0xc0 arch/x86/kernel/irq.c:240
       asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:629
      RIP: 0033:0x7f5e972bfd57
      Code: 39 d1 73 14 0f 1f 80 00 00 00 00 48 8b 50 f8 48 83 e8 08 48 39 ca 77 f3 48 39 c3 73 3e 48 89 13 48 8b 50 f8 48 89 38 49 8b 0e <48> 8b 3e 48 83 c3 08 48 83 c6 08 eb bc 48 39 d1 72 9e 48 39 d0 73
      RSP: 002b:00007fff8a413210 EFLAGS: 00000283
      RAX: 00007f5e97108990 RBX: 00007f5e97108338 RCX: ffffffff81d3aa45
      RDX: ffffffff81d3aa45 RSI: 00007f5e97108340 RDI: ffffffff81d3aa45
      RBP: 00007f5e97107eb8 R08: 00007f5e97108d88 R09: 0000000093c2e8d9
      R10: 0000000000000000 R11: 0000000000000000 R12: 00007f5e97107eb0
      R13: 00007f5e97108338 R14: 00007f5e97107ea8 R15: 0000000000000019
       </TASK>
      
      Allocated by task 13:
       kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
       kasan_set_track mm/kasan/common.c:46 [inline]
       set_alloc_info mm/kasan/common.c:434 [inline]
       __kasan_slab_alloc+0x90/0xc0 mm/kasan/common.c:467
       kasan_slab_alloc include/linux/kasan.h:259 [inline]
       slab_post_alloc_hook mm/slab.h:519 [inline]
       slab_alloc_node mm/slub.c:3234 [inline]
       slab_alloc mm/slub.c:3242 [inline]
       kmem_cache_alloc+0x202/0x3a0 mm/slub.c:3247
       dst_alloc+0x146/0x1f0 net/core/dst.c:92
       rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613
       ip_route_input_slow+0x1817/0x3a20 net/ipv4/route.c:2340
       ip_route_input_rcu net/ipv4/route.c:2470 [inline]
       ip_route_input_noref+0x116/0x2a0 net/ipv4/route.c:2415
       ip_rcv_finish_core.constprop.0+0x288/0x1e80 net/ipv4/ip_input.c:354
       ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583
       ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
       ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5508 [inline]
       __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556
       __netif_receive_skb_list net/core/dev.c:5608 [inline]
       netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699
       gro_normal_list net/core/dev.c:5853 [inline]
       gro_normal_list net/core/dev.c:5849 [inline]
       napi_complete_done+0x1f1/0x880 net/core/dev.c:6590
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557
       __napi_poll+0xaf/0x440 net/core/dev.c:7023
       napi_poll net/core/dev.c:7090 [inline]
       net_rx_action+0x801/0xb40 net/core/dev.c:7177
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
      
      Freed by task 13:
       kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
       kasan_set_track+0x21/0x30 mm/kasan/common.c:46
       kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
       ____kasan_slab_free mm/kasan/common.c:366 [inline]
       ____kasan_slab_free mm/kasan/common.c:328 [inline]
       __kasan_slab_free+0xff/0x130 mm/kasan/common.c:374
       kasan_slab_free include/linux/kasan.h:235 [inline]
       slab_free_hook mm/slub.c:1723 [inline]
       slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1749
       slab_free mm/slub.c:3513 [inline]
       kmem_cache_free+0xbd/0x5d0 mm/slub.c:3530
       dst_destroy+0x2d6/0x3f0 net/core/dst.c:127
       rcu_do_batch kernel/rcu/tree.c:2506 [inline]
       rcu_core+0x7ab/0x1470 kernel/rcu/tree.c:2741
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
      
      Last potentially related work creation:
       kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
       __kasan_record_aux_stack+0xf5/0x120 mm/kasan/generic.c:348
       __call_rcu kernel/rcu/tree.c:2985 [inline]
       call_rcu+0xb1/0x740 kernel/rcu/tree.c:3065
       dst_release net/core/dst.c:177 [inline]
       dst_release+0x79/0xe0 net/core/dst.c:167
       tcp_v4_do_rcv+0x612/0x8d0 net/ipv4/tcp_ipv4.c:1712
       sk_backlog_rcv include/net/sock.h:1030 [inline]
       __release_sock+0x134/0x3b0 net/core/sock.c:2768
       release_sock+0x54/0x1b0 net/core/sock.c:3300
       tcp_sendmsg+0x36/0x40 net/ipv4/tcp.c:1441
       inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:724
       sock_write_iter+0x289/0x3c0 net/socket.c:1057
       call_write_iter include/linux/fs.h:2162 [inline]
       new_sync_write+0x429/0x660 fs/read_write.c:503
       vfs_write+0x7cd/0xae0 fs/read_write.c:590
       ksys_write+0x1ee/0x250 fs/read_write.c:643
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The buggy address belongs to the object at ffff88807f1cb700
       which belongs to the cache ip_dst_cache of size 176
      The buggy address is located 58 bytes inside of
       176-byte region [ffff88807f1cb700, ffff88807f1cb7b0)
      The buggy address belongs to the page:
      page:ffffea0001fc72c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7f1cb
      flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
      raw: 00fff00000000200 dead000000000100 dead000000000122 ffff8881413bb780
      raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112a20(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL), pid 5, ts 108466983062, free_ts 108048976062
       prep_new_page mm/page_alloc.c:2418 [inline]
       get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4149
       __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5369
       alloc_pages+0x1a7/0x300 mm/mempolicy.c:2191
       alloc_slab_page mm/slub.c:1793 [inline]
       allocate_slab mm/slub.c:1930 [inline]
       new_slab+0x32d/0x4a0 mm/slub.c:1993
       ___slab_alloc+0x918/0xfe0 mm/slub.c:3022
       __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3109
       slab_alloc_node mm/slub.c:3200 [inline]
       slab_alloc mm/slub.c:3242 [inline]
       kmem_cache_alloc+0x35c/0x3a0 mm/slub.c:3247
       dst_alloc+0x146/0x1f0 net/core/dst.c:92
       rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613
       __mkroute_output net/ipv4/route.c:2564 [inline]
       ip_route_output_key_hash_rcu+0x921/0x2d00 net/ipv4/route.c:2791
       ip_route_output_key_hash+0x18b/0x300 net/ipv4/route.c:2619
       __ip_route_output_key include/net/route.h:126 [inline]
       ip_route_output_flow+0x23/0x150 net/ipv4/route.c:2850
       ip_route_output_key include/net/route.h:142 [inline]
       geneve_get_v4_rt+0x3a6/0x830 drivers/net/geneve.c:809
       geneve_xmit_skb drivers/net/geneve.c:899 [inline]
       geneve_xmit+0xc4a/0x3540 drivers/net/geneve.c:1082
       __netdev_start_xmit include/linux/netdevice.h:4994 [inline]
       netdev_start_xmit include/linux/netdevice.h:5008 [inline]
       xmit_one net/core/dev.c:3590 [inline]
       dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3606
       __dev_queue_xmit+0x299a/0x3650 net/core/dev.c:4229
      page last free stack trace:
       reset_page_owner include/linux/page_owner.h:24 [inline]
       free_pages_prepare mm/page_alloc.c:1338 [inline]
       free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1389
       free_unref_page_prepare mm/page_alloc.c:3309 [inline]
       free_unref_page+0x19/0x690 mm/page_alloc.c:3388
       qlink_free mm/kasan/quarantine.c:146 [inline]
       qlist_free_all+0x5a/0xc0 mm/kasan/quarantine.c:165
       kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:272
       __kasan_slab_alloc+0xa2/0xc0 mm/kasan/common.c:444
       kasan_slab_alloc include/linux/kasan.h:259 [inline]
       slab_post_alloc_hook mm/slab.h:519 [inline]
       slab_alloc_node mm/slub.c:3234 [inline]
       kmem_cache_alloc_node+0x255/0x3f0 mm/slub.c:3270
       __alloc_skb+0x215/0x340 net/core/skbuff.c:414
       alloc_skb include/linux/skbuff.h:1126 [inline]
       alloc_skb_with_frags+0x93/0x620 net/core/skbuff.c:6078
       sock_alloc_send_pskb+0x783/0x910 net/core/sock.c:2575
       mld_newpack+0x1df/0x770 net/ipv6/mcast.c:1754
       add_grhead+0x265/0x330 net/ipv6/mcast.c:1857
       add_grec+0x1053/0x14e0 net/ipv6/mcast.c:1995
       mld_send_initial_cr.part.0+0xf6/0x230 net/ipv6/mcast.c:2242
       mld_send_initial_cr net/ipv6/mcast.c:1232 [inline]
       mld_dad_work+0x1d3/0x690 net/ipv6/mcast.c:2268
       process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298
       worker_thread+0x658/0x11f0 kernel/workqueue.c:2445
      
      Memory state around the buggy address:
       ffff88807f1cb600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88807f1cb680: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
      >ffff88807f1cb700: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                              ^
       ffff88807f1cb780: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
       ffff88807f1cb800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fixes: 41063e9d ("ipv4: Early TCP socket demux.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20211220143330.680945-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      [cmllamas: fixed trivial merge conflict]
      Signed-off-by: Carlos Llamas <cmllamas@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      Conflicts:
      	net/ipv4/af_inet.c
      Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
      Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      4c4298bf
    • J
      ext4: continue to expand file system when the target size doesn't reach · 3b8b0d3f
      Jerry Lee 李修賢 authored
      stable inclusion
      from stable-v4.19.262
      commit f2180ad6a43501597d20eacad0c6f146c51d4bbd
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      commit df3cb754 upstream.
      
      When expanding a file system from (16TiB-2MiB) to 18TiB, the operation
      exits early, which leaves resize2fs and the ext4 kernel driver reporting
      inconsistent results.
      
      === before ===
      ○ → resize2fs /dev/mapper/thin
      resize2fs 1.45.5 (07-Jan-2020)
      Filesystem at /dev/mapper/thin is mounted on /mnt/test; on-line resizing required
      old_desc_blocks = 2048, new_desc_blocks = 2304
      The filesystem on /dev/mapper/thin is now 4831837696 (4k) blocks long.
      
      [  865.186308] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
      [  912.091502] dm-4: detected capacity change from 34359738368 to 38654705664
      [  970.030550] dm-5: detected capacity change from 34359734272 to 38654701568
      [ 1000.012751] EXT4-fs (dm-5): resizing filesystem from 4294966784 to 4831837696 blocks
      [ 1000.012878] EXT4-fs (dm-5): resized filesystem to 4294967296
      
      === after ===
      [  129.104898] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
      [  143.773630] dm-4: detected capacity change from 34359738368 to 38654705664
      [  198.203246] dm-5: detected capacity change from 34359734272 to 38654701568
      [  207.918603] EXT4-fs (dm-5): resizing filesystem from 4294966784 to 4831837696 blocks
      [  207.918754] EXT4-fs (dm-5): resizing filesystem from 4294967296 to 4831837696 blocks
      [  207.918758] EXT4-fs (dm-5): Converting file system to meta_bg
      [  207.918790] EXT4-fs (dm-5): resizing filesystem from 4294967296 to 4831837696 blocks
      [  221.454050] EXT4-fs (dm-5): resized to 4658298880 blocks
      [  227.634613] EXT4-fs (dm-5): resized filesystem to 4831837696
      Signed-off-by: Jerry Lee <jerrylee@qnap.com>
      Link: https://lore.kernel.org/r/PU1PR04MB22635E739BD21150DC182AC6A18C9@PU1PR04MB2263.apcprd04.prod.outlook.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      3b8b0d3f
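The block counts in the ext4 resize logs above can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming the "detected capacity change" values are counts of 512-byte sectors (an assumption, though it matches the 4k block counts printed by resize2fs and the kernel). The helper names are hypothetical and are not kernel or e2fsprogs APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed unit of the "detected capacity change" values. */
#define SECTOR_SIZE	512ULL
/* File system block size reported by resize2fs ("(4k) blocks"). */
#define FS_BLOCK_SIZE	4096ULL

/* Convert a device capacity in 512-byte sectors to whole 4 KiB blocks. */
static inline uint64_t sectors_to_fs_blocks(uint64_t sectors)
{
	/* Guard the multiplication against 64-bit overflow. */
	assert(sectors <= UINT64_MAX / SECTOR_SIZE);
	return sectors * SECTOR_SIZE / FS_BLOCK_SIZE;
}

/* Number of 4 KiB blocks in the given number of TiB. */
static inline uint64_t tib_to_fs_blocks(uint64_t tib)
{
	/* Guard the shift against 64-bit overflow. */
	assert(tib < (1ULL << 24));
	return (tib << 40) / FS_BLOCK_SIZE;
}
```

Under that assumption, `sectors_to_fs_blocks(34359734272ULL)` gives the starting size of 4294966784 blocks and `sectors_to_fs_blocks(38654701568ULL)` gives the target of 4831837696 blocks. `tib_to_fs_blocks(16)` is 4294967296, the value at which the "before" log stops, which is exactly 2^32 blocks.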
    • K
      nvme: copy firmware_rev on each init · 8f5816c2
      Keith Busch authored
      stable inclusion
      from stable-v4.19.262
      commit 366a2b3110c69f919fb3277acc1a0bb8cd8a8dbd
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit a8eb6c1b ]
      
      The firmware revision can change after a reset so copy the most
      recent info each time instead of just the first time, otherwise the
      sysfs firmware_rev entry may contain stale data.
      Reported-by: Jeff Lien <jeff.lien@wdc.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
      Reviewed-by: Chao Leng <lengchao@huawei.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      8f5816c2
    • L
      net: If sock is dead don't access sock's sk_wq in sk_stream_wait_memory · 86774965
      Liu Jian authored
      stable inclusion
      from stable-v4.19.262
      commit 5fe03917bb017d9af68a95f989f1c122eebc69a6
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 3f8ef65a ]
      
      Fixes the below NULL pointer dereference:
      
        [...]
        [   14.471200] Call Trace:
        [   14.471562]  <TASK>
        [   14.471882]  lock_acquire+0x245/0x2e0
        [   14.472416]  ? remove_wait_queue+0x12/0x50
        [   14.473014]  ? _raw_spin_lock_irqsave+0x17/0x50
        [   14.473681]  _raw_spin_lock_irqsave+0x3d/0x50
        [   14.474318]  ? remove_wait_queue+0x12/0x50
        [   14.474907]  remove_wait_queue+0x12/0x50
        [   14.475480]  sk_stream_wait_memory+0x20d/0x340
        [   14.476127]  ? do_wait_intr_irq+0x80/0x80
        [   14.476704]  do_tcp_sendpages+0x287/0x600
        [   14.477283]  tcp_bpf_push+0xab/0x260
        [   14.477817]  tcp_bpf_sendmsg_redir+0x297/0x500
        [   14.478461]  ? __local_bh_enable_ip+0x77/0xe0
        [   14.479096]  tcp_bpf_send_verdict+0x105/0x470
        [   14.479729]  tcp_bpf_sendmsg+0x318/0x4f0
        [   14.480311]  sock_sendmsg+0x2d/0x40
        [   14.480822]  ____sys_sendmsg+0x1b4/0x1c0
        [   14.481390]  ? copy_msghdr_from_user+0x62/0x80
        [   14.482048]  ___sys_sendmsg+0x78/0xb0
        [   14.482580]  ? vmf_insert_pfn_prot+0x91/0x150
        [   14.483215]  ? __do_fault+0x2a/0x1a0
        [   14.483738]  ? do_fault+0x15e/0x5d0
        [   14.484246]  ? __handle_mm_fault+0x56b/0x1040
        [   14.484874]  ? lock_is_held_type+0xdf/0x130
        [   14.485474]  ? find_held_lock+0x2d/0x90
        [   14.486046]  ? __sys_sendmsg+0x41/0x70
        [   14.486587]  __sys_sendmsg+0x41/0x70
        [   14.487105]  ? intel_pmu_drain_pebs_core+0x350/0x350
        [   14.487822]  do_syscall_64+0x34/0x80
        [   14.488345]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
        [...]
      
      The test scenario has the following flow:
      
      thread1                               thread2
      -----------                           ---------------
       tcp_bpf_sendmsg
        tcp_bpf_send_verdict
         tcp_bpf_sendmsg_redir              sock_close
          tcp_bpf_push_locked                 __sock_release
           tcp_bpf_push                         //inet_release
            do_tcp_sendpages                    sock->ops->release
             sk_stream_wait_memory               // tcp_close
                sk_wait_event                      sk->sk_prot->close
                 release_sock(__sk);
                  ***
                                                      lock_sock(sk);
                                                        __tcp_close
                                                          sock_orphan(sk)
                                                            sk->sk_wq  = NULL
                                                      release_sock
                  ****
                 lock_sock(__sk);
                remove_wait_queue(sk_sleep(sk), &wait);
                   sk_sleep(sk)
                   //NULL pointer dereference
                   &rcu_dereference_raw(sk->sk_wq)->wait
      
      While thread1 is waiting for memory, thread2 closes the socket, which
      releases it together with its wait queue. This happens because
      tcp_bpf_send_verdict does not increase the f_count of
      psock->sk_redir->sk_socket->file in thread1.

      We should check whether the SOCK_DEAD flag is set on wakeup in
      sk_stream_wait_memory before accessing the wait queue.
      Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Liu Jian <liujian56@huawei.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/bpf/20220823133755.314697-2-liujian56@huawei.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      86774965
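The SOCK_DEAD check described in the commit above can be illustrated in user space. This is a minimal sketch with mock types, not the kernel code: `struct mock_sock`, `mock_sock_orphan()` and `mock_finish_wait()` are hypothetical stand-ins for the socket, for sock_orphan() clearing sk->sk_wq, and for the remove_wait_queue() step at the end of sk_stream_wait_memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a wait queue head. */
struct mock_wait_queue {
	int waiters;
};

/* Hypothetical stand-in for struct sock. */
struct mock_sock {
	bool dead;			/* models the SOCK_DEAD flag */
	struct mock_wait_queue *wq;	/* models sk->sk_wq */
};

/* Models what thread2 does: mark the socket dead and drop its wait queue. */
static inline void mock_sock_orphan(struct mock_sock *sk)
{
	sk->dead = true;
	sk->wq = NULL;
}

/*
 * Models the wakeup path in thread1. Returns true if the waiter was
 * removed from the queue, false if removal was skipped because the
 * socket is dead and its wait queue pointer may be NULL.
 */
static inline bool mock_finish_wait(struct mock_sock *sk)
{
	if (sk->dead)
		return false;	/* never touch sk->wq on a dead socket */

	assert(sk->wq != NULL);
	sk->wq->waiters--;
	return true;
}
```

Without the `sk->dead` test, the second half of the scenario (orphan first, then finish the wait) would dereference a NULL `wq`, which mirrors the crash in the call trace.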
    • Z
      can: bcm: check the result of can_send() in bcm_can_tx() · 4b29a149
      Ziyang Xuan authored
      stable inclusion
      from stable-v4.19.262
      commit dae06957f856eb699f2a504a46891718c9b1e0d3
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 3fd7bfd2 ]
      
      If can_send() fails, bcm_can_tx() should not update the frames_abs
      counter. Add a check of the can_send() result in bcm_can_tx().
      Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Suggested-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Link: https://lore.kernel.org/all/9851878e74d6d37aee2f1ee76d68361a46f89458.1663206163.git.william.xuanziyang@huawei.com
      Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      4b29a149
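The rule stated in the commit above (count a frame only when sending it succeeded) can be sketched as follows. This is an illustration with a mock structure, not the kernel's bcm_can_tx(): `struct mock_bcm_op` and `mock_bcm_can_tx()` are hypothetical, and the send result is passed in as a parameter instead of calling can_send().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-operation state holding frames_abs. */
struct mock_bcm_op {
	unsigned long frames_abs;	/* frames sent so far */
};

/*
 * Models the tail of the transmit path. send_result plays the role of
 * the can_send() return value: 0 on success, negative on failure.
 */
static inline int mock_bcm_can_tx(struct mock_bcm_op *op, int send_result)
{
	assert(op != NULL);

	/* Only a successful send may bump the absolute frame counter. */
	if (send_result == 0)
		op->frames_abs++;

	return send_result;
}
```

With this shape, a failed send leaves `frames_abs` unchanged, so the counter keeps matching the number of frames that actually went out.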
    • K
      xfrm: Update ipcomp_scratches with NULL when freed · 16cebfda
      Khalid Masum authored
      stable inclusion
      from stable-v4.19.262
      commit 1e8abde895b3ac6a368cbdb372e8800c49e73a28
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 8a04d2fc ]
      
      Currently, if ipcomp_alloc_scratches() fails to allocate memory,
      ipcomp_scratches holds an obsolete address. So when we try to free the
      percpu scratches using ipcomp_free_scratches(), it tries to vfree a
      nonexistent vm area. Described below:
      
      static void * __percpu *ipcomp_alloc_scratches(void)
      {
              ...
              scratches = alloc_percpu(void *);
              if (!scratches)
                      return NULL;
      ipcomp_scratches does not know about this allocation failure and
      therefore still holds the old, obsolete address.
              ...
      }
      
      So when we free,
      
      static void ipcomp_free_scratches(void)
      {
              ...
              scratches = ipcomp_scratches;
      Assigning obsolete address from ipcomp_scratches
      
              if (!scratches)
                      return;
      
              for_each_possible_cpu(i)
                     vfree(*per_cpu_ptr(scratches, i));
      Trying to free a nonexistent page, causing the warning: trying to vfree
      nonexistent vm area.
              ...
      }
      
      Fix this breakage by updating ipcomp_scratches with NULL when scratches
      is freed.
      Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
      Reported-by: syzbot+5ec9bb042ddfe9644773@syzkaller.appspotmail.com
      Tested-by: syzbot+5ec9bb042ddfe9644773@syzkaller.appspotmail.com
      Signed-off-by: Khalid Masum <khalid.masum.92@gmail.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      16cebfda
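The "reset the global to NULL when freeing" idea from the commit above can be sketched in user space. This is a simplified model under stated assumptions: `mock_scratches` stands in for ipcomp_scratches, a plain array of `MOCK_NCPUS` pointers stands in for the percpu allocation, and malloc()/free() stand in for vmalloc()/vfree(). All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MOCK_NCPUS 4	/* hypothetical CPU count for the sketch */

/* Stands in for the global ipcomp_scratches pointer. */
static void **mock_scratches;

/* Free every per-CPU buffer, then forget the array itself. */
static inline void mock_free_scratches(void)
{
	void **scratches = mock_scratches;
	int i;

	if (!scratches)
		return;

	for (i = 0; i < MOCK_NCPUS; i++)
		free(scratches[i]);	/* free(NULL) is a no-op */

	free(scratches);
	/* The point of the fix: do not leave a stale address behind. */
	mock_scratches = NULL;
}

/* Allocate one buffer per CPU; returns 0 on success, -1 on failure. */
static inline int mock_alloc_scratches(size_t size)
{
	int i;

	assert(mock_scratches == NULL);
	mock_scratches = calloc(MOCK_NCPUS, sizeof(*mock_scratches));
	if (!mock_scratches)
		return -1;

	for (i = 0; i < MOCK_NCPUS; i++) {
		mock_scratches[i] = malloc(size);
		if (!mock_scratches[i])
			return -1;	/* caller cleans up via mock_free_scratches() */
	}
	return 0;
}
```

Because the global is cleared, calling `mock_free_scratches()` a second time returns early at the NULL check instead of freeing the same buffers again.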
    • E
      tcp: annotate data-race around tcp_md5sig_pool_populated · e68b9f9b
      Eric Dumazet authored
      stable inclusion
      from stable-v4.19.262
      commit 5c4e1b8939195fe27b05d791577f92445b139a3e
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit aacd467c ]
      
      tcp_md5sig_pool_populated can be read while another thread
      changes its value.
      
      The race has no consequence because allocations
      are protected with tcp_md5sig_mutex.
      
      This patch adds READ_ONCE() and WRITE_ONCE() to document
      the race and silence KCSAN.
      Reported-by: Abhishek Shah <abhishek.shah@columbia.edu>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      e68b9f9b
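The annotation pattern named in the commit above can be approximated in user space. This is only a sketch: `MOCK_READ_ONCE()` and `MOCK_WRITE_ONCE()` are hypothetical simplifications of the kernel macros built on a volatile cast (they rely on the GCC/Clang `__typeof__` extension), and the flag, counter and function names are made up for illustration. The mutex that protects the allocation in the real code is represented only by a comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified user-space approximations of READ_ONCE()/WRITE_ONCE(). */
#define MOCK_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define MOCK_WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

/* Stands in for tcp_md5sig_pool_populated. */
static bool mock_pool_populated;
/* Counts how many times the (mock) allocation ran. */
static int mock_alloc_count;

/* In the real code this would run with the mutex held. */
static inline void mock_populate_pool(void)
{
	assert(!mock_pool_populated);
	mock_alloc_count++;
	/* Annotated store: pairs with the annotated loads below. */
	MOCK_WRITE_ONCE(mock_pool_populated, true);
}

static inline bool mock_alloc_pool(void)
{
	/* Lockless fast-path check, annotated because it can race. */
	if (!MOCK_READ_ONCE(mock_pool_populated)) {
		/* Real code: take the mutex here, then recheck the flag. */
		if (!mock_pool_populated)
			mock_populate_pool();
	}
	return MOCK_READ_ONCE(mock_pool_populated);
}
```

The annotations do not add locking. They mark the accesses that may race so that the compiler performs each one as a single access and a race detector such as KCSAN treats the race as intentional.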
    • N
      tcp: fix tcp_cwnd_validate() to not forget is_cwnd_limited · 4e3f7a25
      Neal Cardwell authored
      stable inclusion
      from stable-v4.19.262
      commit a434d10e7a90e301ea4a1826ee758b53c79d7de8
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit f4ce91ce ]
      
      This commit fixes a bug in the tracking of max_packets_out and
      is_cwnd_limited. This bug can cause the connection to fail to remember
      that is_cwnd_limited is true, causing the connection to fail to grow
      cwnd when it should, causing throughput to be lower than it should be.
      
      The following event sequence is an example that triggers the bug:
      
       (a) The connection is cwnd_limited, but packets_out is not at its
           peak due to TSO deferral deciding not to send another skb yet.
           In such cases the connection can advance max_packets_seq and set
           tp->is_cwnd_limited to true and max_packets_out to a small
           number.
      
       (b) Then later in the round trip the connection is pacing-limited (not
           cwnd-limited), and packets_out is larger. In such cases the
           connection would raise max_packets_out to a bigger number but
           (unexpectedly) flip tp->is_cwnd_limited from true to false.
      
      This commit fixes that bug.
      
      One straightforward fix would be to separately track (a) the next
      window after max_packets_out reaches a maximum, and (b) the next
      window after tp->is_cwnd_limited is set to true. But this would
      require consuming an extra u32 sequence number.
      
      Instead, to save space we track only the most important
      information. Specifically, we track the strongest available signal of
      the degree to which the cwnd is fully utilized:
      
      (1) If the connection is cwnd-limited then we remember that fact for
      the current window.
      
      (2) If the connection is not cwnd-limited then we track the maximum
      number of outstanding packets in the current window.
      
      In particular, note that the new logic cannot trigger the buggy
      (a)/(b) sequence above because with the new logic a condition where
      tp->packets_out > tp->max_packets_out can only trigger an update of
      tp->is_cwnd_limited if tp->is_cwnd_limited is false.
      
      This first showed up in testing of a BBRv2 dev branch, but this
      buggy behavior highlighted a general issue with the
      tcp_cwnd_validate() logic that can cause cwnd to fail to increase at
      the proper rate for any TCP congestion control, including Reno or
      CUBIC.
      
      Fixes: ca8a2263 ("tcp: make cwnd-limited checks measurement-based, and gentler")
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Kevin(Yudong) Yang <yyd@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      4e3f7a25
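Rules (1) and (2) from the commit above, and the reason the (a)/(b) sequence can no longer clear the flag, can be sketched with a small user-space model. This is an illustration consistent with the description, not the kernel's tcp_cwnd_validate(): `struct mock_tp`, `seq_before()` and `mock_cwnd_usage_update()` are hypothetical, and `max_packets_seq` is used here as the sequence number that marks the end of the current window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the per-connection state. */
struct mock_tp {
	uint32_t snd_una;		/* oldest unacknowledged sequence */
	uint32_t snd_nxt;		/* next sequence to send */
	uint32_t max_packets_seq;	/* end of the current tracking window */
	uint32_t packets_out;		/* packets currently in flight */
	uint32_t max_packets_out;	/* max in flight seen this window */
	bool is_cwnd_limited;		/* cwnd-limited seen this window */
};

/* Wraparound-safe "a is before b" for 32-bit sequence numbers. */
static inline bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/*
 * Update the usage tracking when:
 *  - a new window has started, or
 *  - the connection is cwnd-limited now (rule 1), or
 *  - it has not been cwnd-limited in this window and packets_out
 *    reached a new maximum (rule 2).
 */
static inline void mock_cwnd_usage_update(struct mock_tp *tp,
					  bool is_cwnd_limited)
{
	assert(tp != NULL);

	if (!seq_before(tp->snd_una, tp->max_packets_seq) ||
	    is_cwnd_limited ||
	    (!tp->is_cwnd_limited &&
	     tp->packets_out > tp->max_packets_out)) {
		tp->is_cwnd_limited = is_cwnd_limited;
		tp->max_packets_out = tp->packets_out;
		tp->max_packets_seq = tp->snd_nxt;
	}
}
```

In this model, step (a) sets `is_cwnd_limited` for the window. A later pacing-limited call with a larger `packets_out` in the same window, as in step (b), matches none of the three conditions, so the flag stays true until the next window begins.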
    • B
      ext4: fix null-ptr-deref in ext4_write_info · 4700f752
      Baokun Li authored
      stable inclusion
      from stable-v4.19.262
      commit 947264e00c46de19a016fd81218118c708fed2f3
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      commit f9c1f248 upstream.
      
      I caught a null-ptr-deref bug as follows:
      
      ==================================================================
      KASAN: null-ptr-deref in range [0x0000000000000068-0x000000000000006f]
      CPU: 1 PID: 1589 Comm: umount Not tainted 5.10.0-02219-dirty #339
      RIP: 0010:ext4_write_info+0x53/0x1b0
      [...]
      Call Trace:
       dquot_writeback_dquots+0x341/0x9a0
       ext4_sync_fs+0x19e/0x800
       __sync_filesystem+0x83/0x100
       sync_filesystem+0x89/0xf0
       generic_shutdown_super+0x79/0x3e0
       kill_block_super+0xa1/0x110
       deactivate_locked_super+0xac/0x130
       deactivate_super+0xb6/0xd0
       cleanup_mnt+0x289/0x400
       __cleanup_mnt+0x16/0x20
       task_work_run+0x11c/0x1c0
       exit_to_user_mode_prepare+0x203/0x210
       syscall_exit_to_user_mode+0x5b/0x3a0
       do_syscall_64+0x59/0x70
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
       ==================================================================
      
      Above issue may happen as follows:
      -------------------------------------
      exit_to_user_mode_prepare
       task_work_run
        __cleanup_mnt
         cleanup_mnt
          deactivate_super
           deactivate_locked_super
            kill_block_super
             generic_shutdown_super
              shrink_dcache_for_umount
               dentry = sb->s_root
               sb->s_root = NULL              <--- Here set NULL
              sync_filesystem
               __sync_filesystem
                sb->s_op->sync_fs > ext4_sync_fs
                 dquot_writeback_dquots
                  sb->dq_op->write_info > ext4_write_info
                   ext4_journal_start(d_inode(sb->s_root), EXT4_HT_QUOTA, 2)
                    d_inode(sb->s_root)
                     s_root->d_inode          <--- Null pointer dereference
      
      To solve this problem, we use ext4_journal_start_sb directly
      to avoid s_root being used.
      
      Cc: stable@kernel.org
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20220805123947.565152-1-libaokun1@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      4700f752
    • S
      Revert "fs: check FMODE_LSEEK to control internal pipe splicing" · d5e86eaf
      Sasha Levin authored
      stable inclusion
      from stable-v4.19.262
      commit 6d43e94b8daf009694d709c5919d67067936577a
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      This reverts commit fd0a6e99b61e6c08fa5cf585d54fd956f70c73a6.
      
      Which was upstream commit 97ef77c5.
      
      The commit is missing dependencies and breaks NFS tests, so remove it for
      now.
      Reported-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      d5e86eaf