1. 10 Jan, 2023 5 commits
    • W
      timekeeping: Avoiding false sharing in field access of tk_core · ae79e85a
      Committed by Wang ShaoBo
      hulk inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I47W8L
      CVE: NA
      
      ---------------------------
      
      We detected a performance regression when running UnixBench and used bisection
      to locate the patch 7e66740a ("MPAM / ACPI: Refactoring MPAM init process and set
      MPAM ACPI as entrance"). Comparing the two commits df5defd9 ("KVM: X86: MMU: Use
      the correct inherited permissions to get shadow page") and ac4dbb75 ("ACPI 6.x:
      Add definitions for MPAM table"), we got the following test results:
      
      CMD: ./Run -c xx context1
      RESULT:
      +-------------UnixBench context1-----------+
      +---------+--------------+-----------------+
      +         +   ac4dbb75   +    df5defd9     +
      +---------+--------------+-----------------+
      +  Cores  +    Score     +      Score      +
      +---------+--------------+-----------------+
      +    1    +    522.8     +      535.7      +
      +---------+--------------+-----------------+
      +   24    +   11231.5    +     12111.2     +
      +---------+--------------+-----------------+
      +   48    +   8535.1     +     8745.1      +
      +---------+--------------+-----------------+
      +   72    +   10821.9    +     10343.8     +
      +---------+--------------+-----------------+
      +   96    +   15238.5    +     42947.8     +
      +---------+--------------+-----------------+
      
      We found an unmistakable difference in the latency samples collected with the perf tool:
      
      HEAD:ac4dbb75                                   HEAD:df5defd9
      
      45.18% [kernel] [k] ktime_get_coarse_real_ts64  ->  1.78% [kernel] [k] ktime_get_coarse_real_ts64
                                                                                     ...
                                                                             65.87 │ dmb ishld //smp_rmb()
      
      Using ftrace we obtained the call trace and counted the calls to
      ktime_get_coarse_real_ts64, which frequently accesses tk_core->seq and
      tk_core->timekeeper->tkr_mono:
      
      -   48.86%  [kernel]                 [k] ktime_get_coarse_real_ts64
         - 5.76% ktime_get_coarse_real_ts64   #about 111437657 times per 10 seconds
            - 14.70% __audit_syscall_entry
                 syscall_trace_enter
                 el0_svc_common
                 el0_svc_handler
               + el0_svc
            - 2.85% current_time
      
      So the performance degradation may be caused by interference between accesses
      to different fields. We compared the .bss and .data sections of the two versions:
      
          HEAD:ac4dbb75
      `->
          ffff00000962e680 l     O .bss   0000000000000110 tk_core
          ffff000009355680 l     O .data  0000000000000078 tk_fast_mono
          ffff0000093557a0 l     O .data  0000000000000090 dummy_clock
          ffff000009355700 l     O .data  0000000000000078 tk_fast_raw
          ffff000009355778 l     O .data  0000000000000028 timekeeping_syscore_ops
          ffff00000962e640 l     O .bss   0000000000000008 cycles_at_suspend
      
          HEAD:df5defd9
      `->
          ffff00000957dbc0 l     O .bss   0000000000000110 tk_core
          ffff0000092b4e80 l     O .data  0000000000000078 tk_fast_mono
          ffff0000092b4fa0 l     O .data  0000000000000090 dummy_clock
          ffff0000092b4f00 l     O .data  0000000000000078 tk_fast_raw
          ffff0000092b4f78 l     O .data  0000000000000028 timekeeping_syscore_ops
          ffff00000957db80 l     O .bss   0000000000000008 cycles_at_suspend
      
      Comparing the address of tk_core in the two versions: in ac4dbb75, ffff00000962e680
      is 128-byte aligned, but in df5defd9, ffff00000957dbc0 is only 64-byte aligned, so the
      memory layout of tk_core has changed subtly:
      
          HEAD:ac4dbb75
      `->                     |<---------former 64Bytes---------->|<------------latter 64Bytes------------>|
          0xffff00000957dbc0_>|<-seq 8Bytes->|<-tkr_mono 56Bytes->|<-tkr_raw 56Bytes->|<-xtime_sec 8Bytes->|
          0xffff00000957dc00_>...
      
          HEAD:df5defd9
      `->                     |<-------former 64Bytes---->|<------------latter 64Bytes------->|
          0xffff00000962e680_>|<-Other variables 64Bytes->|<-seq 8Bytes->|<-tkr_mono 56Bytes->|
          0xffff00000962e6c0_>..
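The alignment claim can be checked with a little address arithmetic. The following is a minimal sketch, not part of the patch: the helper names are hypothetical, and the 128-byte granule is an assumption taken from the commit's own reasoning about 128-byte alignment. The two addresses are the tk_core addresses from the symbol listings above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when addr is a multiple of align; align must be a power of two. */
bool is_aligned(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Offset of addr inside its enclosing line of line_size bytes
 * (line_size must be a power of two). */
uint64_t offset_in_line(uint64_t addr, uint64_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    return addr & (line_size - 1);
}
```

For example, `is_aligned(0xffff00000962e680ULL, 128)` holds, while `0xffff00000957dbc0` is a multiple of 64 but sits 64 bytes into its 128-byte granule.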
      
      We verified that the tkr_raw and xtime_sec fields interfere strongly with the seq and
      tkr_mono fields because of frequent load/store operations; this is the well-known
      false sharing problem.

      We add a 64-byte padding field to tk_core, reserved for any future use, and keep
      tk_core 128-byte aligned, which keeps the layout of tk_core from changing.
      With this solution, the layout of tk_core is always:
      
      crash>  struct -o tk_core_t
      struct tk_core_t {
          [0] u64 padding[8];
         [64] seqcount_t seq;
         [72] struct timekeeper timekeeper;
      }
      SIZE: 336
      crash> struct -o timekeeper
      struct timekeeper {
          [0] struct tk_read_base tkr_mono;
         [56] struct tk_read_base tkr_raw;
        [112] u64 xtime_sec;
        [120] unsigned long ktime_sec;
        ...
      }
      SIZE: 264
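The offsets in the crash output can be reproduced in user space. This is a rough sketch under stated assumptions: every type below is a hypothetical stand-in sized to match the listing (56 bytes for tk_read_base, 8 bytes for the seq slot), and the 128-byte alignment uses the GCC/Clang `aligned` attribute. It is not the kernel definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct tk_read_base: only its 56-byte size matters. */
struct tk_read_base_sketch {
    uint64_t words[7];
};

/* Hypothetical stand-in for the leading fields of struct timekeeper. */
struct timekeeper_sketch {
    struct tk_read_base_sketch tkr_mono;   /* offset 0 */
    struct tk_read_base_sketch tkr_raw;    /* offset 56 */
    uint64_t xtime_sec;                    /* offset 112 */
};

/* Padded and 128-byte aligned, mirroring the layout shown by crash. */
struct tk_core_sketch {
    uint64_t padding[8];                   /* 64 bytes of reserved padding */
    uint64_t seq;                          /* stand-in for the 8-byte seqcount_t slot */
    struct timekeeper_sketch timekeeper;
} __attribute__((aligned(128)));

static_assert(offsetof(struct tk_core_sketch, seq) == 64, "seq follows the padding");
static_assert(offsetof(struct tk_core_sketch, timekeeper) == 72, "timekeeper follows seq");

/* Index of the line_size-byte line that contains the given offset. */
size_t line_index(size_t offset, size_t line_size)
{
    return offset / line_size;
}
```

With this layout seq and tkr_mono land in the first 128-byte line and tkr_raw and xtime_sec in the second, so the two groups of fields no longer share a line regardless of where the linker places the object.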
      
      After applying our solution:
      
      +---------+--------------+
      +         + Our solution +
      +---------+--------------+
      +  Cores  +    Score     +
      +---------+--------------+
      +    1    +    548.9     +
      +---------+--------------+
      +   24    +   11018.3    +
      +---------+--------------+
      +   48    +   8938.2     +
      +---------+--------------+
      +   72    +   14610.7    +
      +---------+--------------+
      +   96    +   40811.7    +
      +---------+--------------+
      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
      ae79e85a
    • N
      mm/hwpoison: put page in already hwpoisoned case with MF_COUNT_INCREASED · cc4d1b26
      Committed by Naoya Horiguchi
      mainline inclusion
      from mainline-v5.19-rc1
      commit f361e246
      category: bugfix
      bugzilla: 188200, https://gitee.com/openeuler/kernel/issues/I68OOI
      CVE: NA
      
      --------------------------------
      
      In already hwpoisoned case, memory_failure() is supposed to return with
      releasing the page refcount taken for error handling.  But currently the
      refcount is not released when called with MF_COUNT_INCREASED, which makes
      page refcount inconsistent.  This should be rare and non-critical, but it
      might be inconvenient in testing (unpoison doesn't work).
      
      Link: https://lkml.kernel.org/r/20220408135323.1559401-3-naoya.horiguchi@linux.dev
      Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Suggested-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      cc4d1b26
    • M
      mm/memory-failure.c: fix race with changing page more robustly · 626d06a2
      Committed by Miaohe Lin
      mainline inclusion
      from mainline-v5.18-rc1
      commit 75ee64b3
      category: bugfix
      bugzilla: 188200, https://gitee.com/openeuler/kernel/issues/I68OOI
      CVE: NA
      
      --------------------------------
      
      We only intend to deal with non-compound pages after we split the
      THP in memory_failure().  However, the page could have changed into a
      compound page due to a race window.  If this happens, we retry once in the
      hope of handling the page in the next round.  Also remove the unneeded
      orig_head.  It is always equal to hpage, so we can use hpage directly and
      remove the redundant variable.
      
      Link: https://lkml.kernel.org/r/20220218090118.1105-5-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      626d06a2
    • O
      mm,memory_failure: always pin the page in madvise_inject_error · f0c2fbe9
      Committed by Oscar Salvador
      mainline inclusion
      from mainline-v5.11-rc1
      commit 1e8aaedb
      category: bugfix
      bugzilla: 188200, https://gitee.com/openeuler/kernel/issues/I68OOI
      CVE: NA
      
      --------------------------------
      
      madvise_inject_error() uses get_user_pages_fast to translate the address
      we specified to a page.  After [1], we drop the extra reference count for
      memory_failure() path.  That commit says that memory_failure wanted to
      keep the pin in order to take the page out of circulation.
      
      The truth is that we need to keep the page pinned, otherwise the page
      might be re-used after the put_page() and we can end up messing with
      someone else's memory.
      
      E.g:
      
      CPU0
      process X					CPU1
       madvise_inject_error
        get_user_pages
         put_page
      					page gets reclaimed
      					process Y allocates the page
        memory_failure
         // We mess with process Y memory
      
      madvise() is meant to operate on a self address space, so messing with
      pages that do not belong to us seems the wrong thing to do.
      To avoid that, let us keep the page pinned for memory_failure as well.
      
      Pages for DAX mappings will release this extra refcount in
      memory_failure_dev_pagemap.
      
      [1] ("23e7b5c2: mm, madvise_inject_error:
            Let memory_failure() optionally take a page reference")
      
      Link: https://lkml.kernel.org/r/20201207094818.8518-1-osalvador@suse.de
      Fixes: 23e7b5c2 ("mm, madvise_inject_error: Let memory_failure() optionally take a page reference")
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Suggested-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Conflicts:
      	mm/madvise.c
      Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      f0c2fbe9
    • X
      kobject: Fix slab-out-of-bounds in fill_kobj_path() · 95e62156
      Committed by Xia Fukun
      hulk inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I697JG
      CVE: NA
      
      --------------------------------
      
      In kobject_get_path(), if kobj->name is changed between calls
      get_kobj_path_length() and fill_kobj_path() and the length becomes
      longer, then fill_kobj_path() will have an out-of-bounds bug.
      
      The problem currently shows up during the ixgbe probe.

      In ixgbe_mii_bus_init(), if netdev->dev.kobj.name becomes longer,
      an out-of-bounds write occurs.
      
      cpu0                                         cpu1
      ixgbe_probe
       register_netdev(netdev)
        netdev_register_kobject
         device_add
          kobject_uevent // Sending ADD events
                                                   systemd-udevd // rename netdev
                                                    dev_change_name
                                                     device_rename
                                                      kobject_rename
       ixgbe_mii_bus_init                             |
        mdiobus_register                              |
         __mdiobus_register                           |
          device_register                             |
           device_add                                 |
            kobject_uevent                            |
             kobject_get_path                         |
              len = get_kobj_path_length // old name  |
              path = kzalloc(len, gfp_mask);          |
                                                      kobj->name = name;
                                                      /* name length becomes
                                                       * longer
                                                       */
              fill_kobj_path /* kobj path length is
                              * longer than path,
                              * resulting in out of
                              * bounds when filling path
                              */
      
      This is the kasan report:
      
      ==================================================================
      BUG: KASAN: slab-out-of-bounds in fill_kobj_path+0x50/0xc0
      Write of size 7 at addr ff1100090573d1fd by task kworker/28:1/673
      
       Workqueue: events work_for_cpu_fn
       Call Trace:
       <TASK>
       dump_stack_lvl+0x34/0x48
       print_address_description.constprop.0+0x86/0x1e7
       print_report+0x36/0x4f
       kasan_report+0xad/0x130
       kasan_check_range+0x35/0x1c0
       memcpy+0x39/0x60
       fill_kobj_path+0x50/0xc0
       kobject_get_path+0x5a/0xc0
       kobject_uevent_env+0x140/0x460
       device_add+0x5c7/0x910
       __mdiobus_register+0x14e/0x490
       ixgbe_probe.cold+0x441/0x574 [ixgbe]
       local_pci_probe+0x78/0xc0
       work_for_cpu_fn+0x26/0x40
       process_one_work+0x3b6/0x6a0
       worker_thread+0x368/0x520
       kthread+0x165/0x1a0
       ret_from_fork+0x1f/0x30
      
      This reproducer triggers that bug:
      
      while :
      do
          rmmod ixgbe
          sleep 0.5
          modprobe ixgbe
          sleep 0.5
      done
      
      When calling fill_kobj_path() to fill path, if the name length of
      kobj becomes longer, return failure and retry. This fixes the problem.
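The "fail and retry" approach described above can be sketched in user-space C. This is only an illustration under stated assumptions: `node_sketch`, `path_length`, `fill_path` and `get_path` are hypothetical stand-ins for the kobject helpers, and only the case named in the commit message (a name growing between the two passes) is detected.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a kobject: a name plus a parent link. */
struct node_sketch {
    const char *name;
    const struct node_sketch *parent;
};

/* Bytes needed for a path such as "/a/b/c" plus the terminating NUL. */
size_t path_length(const struct node_sketch *n)
{
    size_t len = 1;
    for (; n; n = n->parent)
        len += strlen(n->name) + 1;     /* '/' + name */
    return len;
}

/*
 * Fill a zeroed buffer from the tail towards the head. Returns -EINVAL
 * instead of writing out of bounds when the names no longer fit.
 */
int fill_path(const struct node_sketch *n, char *path, size_t length)
{
    assert(path != NULL);
    if (length == 0)
        return -EINVAL;
    size_t pos = length - 1;            /* keep the last byte as NUL */
    for (; n; n = n->parent) {
        size_t cur = strlen(n->name);
        if (pos < cur + 1)              /* no room left for '/' + name */
            return -EINVAL;
        pos -= cur;
        memcpy(path + pos, n->name, cur);
        path[--pos] = '/';
    }
    return 0;
}

/* Measure, allocate, fill; start over if the fill step reports a mismatch. */
char *get_path(const struct node_sketch *n)
{
    for (;;) {
        size_t len = path_length(n);
        char *path = calloc(len, 1);
        if (!path)
            return NULL;
        if (fill_path(n, path, len) == 0)
            return path;
        free(path);                     /* a name grew in between: retry */
    }
}
```

The bounds check is what turns the out-of-bounds write into a recoverable error; the retry loop then simply measures again with the new name.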
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Wang Hai <wanghai38@huawei.com>
      Signed-off-by: Xia Fukun <xiafukun@huawei.com>
      Reviewed-by: songping yu <yusongping@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      95e62156
  2. 05 Jan, 2023 3 commits
  3. 30 Dec, 2022 1 commit
  4. 27 Dec, 2022 1 commit
    • Z
      dm thin: Use last transaction's pmd->root when commit failed · 97e4e6f4
      Committed by Zhihao Cheng
      mainline inclusion
      from mainline-v6.2-rc1
      commit 7991dbff
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I65M32
      CVE: NA
      
      --------------------------------
      
      Recently we found a softlockup problem in dm thin pool btree lookup
      code due to corrupted metadata:
      
       Kernel panic - not syncing: softlockup: hung tasks
       CPU: 7 PID: 2669225 Comm: kworker/u16:3
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
       Workqueue: dm-thin do_worker [dm_thin_pool]
       Call Trace:
         <IRQ>
         dump_stack+0x9c/0xd3
         panic+0x35d/0x6b9
         watchdog_timer_fn.cold+0x16/0x25
         __run_hrtimer+0xa2/0x2d0
         </IRQ>
         RIP: 0010:__relink_lru+0x102/0x220 [dm_bufio]
         __bufio_new+0x11f/0x4f0 [dm_bufio]
         new_read+0xa3/0x1e0 [dm_bufio]
         dm_bm_read_lock+0x33/0xd0 [dm_persistent_data]
         ro_step+0x63/0x100 [dm_persistent_data]
         btree_lookup_raw.constprop.0+0x44/0x220 [dm_persistent_data]
         dm_btree_lookup+0x16f/0x210 [dm_persistent_data]
         dm_thin_find_block+0x12c/0x210 [dm_thin_pool]
         __process_bio_read_only+0xc5/0x400 [dm_thin_pool]
         process_thin_deferred_bios+0x1a4/0x4a0 [dm_thin_pool]
         process_one_work+0x3c5/0x730
      
      The following process may generate a broken btree mixed with fresh and
      stale btree nodes, which could get dm thin trapped in an infinite loop
      while looking up data block:
       Transaction 1: pmd->root = A, A->B->C   // One path in btree
                      pmd->root = X, X->Y->Z   // Copy-up
       Transaction 2: X,Z is updated on disk, Y write failed.
                      // Commit failed, dm thin becomes read-only.
                      process_bio_read_only
      		 dm_thin_find_block
      		  __find_block
      		   dm_btree_lookup(pmd->root)
      The pmd->root points to a broken btree, Y may contain stale node
      pointing to any block, for example X, which gets dm thin trapped into
      a dead loop while looking up Z.
      
      Fix this by setting pmd->root in __open_metadata(), so that dm thin
      will use the last transaction's pmd->root if commit failed.
      
      Fetch a reproducer in [Link].
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216790
      Cc: stable@vger.kernel.org
      Fixes: 991d9fa0 ("dm: add thin provisioning target")
      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      97e4e6f4
  5. 24 Dec, 2022 4 commits
  6. 22 Dec, 2022 1 commit
  7. 20 Dec, 2022 1 commit
    • O
      !325 Support enabling dirty log gradually in small chunks · 7b1b5d4d
      Committed by openeuler-ci-bot
      Merge Pull Request from: @Mayunlong541 
       
      KVM: x86/arm64: enable dirty log gradually in small chunks
      Reducing performance loss during hugepage VM migration.
      
      (1) Add the dirty log reprotect function
      Patches:
            kvm: make KVM_CAP_ENABLE_CAP_VM architecture agnostic
            kvm: introduce manual dirty log reprotect

      (2) Fix some bugs in dirty_log_protect, plus comments and an argument name
      Patches:
            kvm: rename last argument to kvm_get_dirty_log_protect
            KVM: validate userspace input in kvm_clear_dirty_log_protect()
            Revert "KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()"
            KVM: fix KVM_CLEAR_DIRTY_LOG for memory slots of unaligned size
            kvm_main: fix some comments
            KVM: Fix the bitmap range to copy during clear dirty
            KVM: Fix kvm_clear_dirty_log_protect off-by-(minus-)one
            KVM: Introduce KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2

      (3) Support enabling dirty log gradually in small chunks
      Patches:
            KVM: x86: enable dirty log gradually in small chunks
            KVM: arm64: Support enabling dirty log gradually in small chunks
      
      https://gitee.com/openeuler/kernel/issues/I66COX 
       
      Link:https://gitee.com/openeuler/kernel/pulls/325 
      Reviewed-by: Kevin Zhu <zhukeqian1@huawei.com> 
      Reviewed-by: Laibin Qiu <qiulaibin@huawei.com> 
      Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> 
      7b1b5d4d
  8. 19 Dec, 2022 5 commits
  9. 17 Dec, 2022 12 commits
  10. 16 Dec, 2022 7 commits
    • J
      xen/netback: don't call kfree_skb() with interrupts disabled · 6271646d
      Committed by Juergen Gross
      mainline inclusion
      from mainline-v6.1
      commit 74e7e1ef
      category: bugfix
      bugzilla: https://gitee.com/src-openeuler/kernel/issues/I651DP
      CVE: CVE-2022-42328
      
      --------------------------------
      
      It is not allowed to call kfree_skb() from hardware interrupt
      context or with interrupts being disabled. So remove kfree_skb()
      from the spin_lock_irqsave() section and use the already existing
      "drop" label in xenvif_start_xmit() for dropping the SKB. At the
      same time replace the dev_kfree_skb() call there with a call of
      dev_kfree_skb_any(), as xenvif_start_xmit() can be called with
      disabled interrupts.
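The pattern described above (decide under the lock, free only after the lock is released) can be sketched in user space. The names below are hypothetical, a pthread mutex stands in for the interrupt-disabling spinlock, and `free()` stands in for the skb release call; this is not the netback code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an skb. */
struct packet_sketch {
    struct packet_sketch *next;
};

/* Hypothetical stand-in for the bounded rx queue. */
struct rx_queue_sketch {
    pthread_mutex_t lock;               /* stands in for spin_lock_irqsave() */
    struct packet_sketch *head, *tail;
    size_t len, max_len;
};

/*
 * Try to append a packet. Returns false when the queue is full and leaves
 * the packet untouched, so nothing is ever freed inside the locked section.
 */
bool rx_queue_tail_sketch(struct rx_queue_sketch *q, struct packet_sketch *p)
{
    bool queued = false;

    assert(q != NULL && p != NULL);
    pthread_mutex_lock(&q->lock);
    if (q->len < q->max_len) {
        p->next = NULL;
        if (q->tail)
            q->tail->next = p;
        else
            q->head = p;
        q->tail = p;
        q->len++;
        queued = true;
    }
    pthread_mutex_unlock(&q->lock);
    return queued;
}

/* Returns true if the packet was queued, false if it was dropped. */
bool start_xmit_sketch(struct rx_queue_sketch *q, struct packet_sketch *p)
{
    if (rx_queue_tail_sketch(q, p))
        return true;
    free(p);                            /* "drop" path: the lock is no longer held */
    return false;
}
```

Returning a status from the queueing helper keeps the drop decision and the actual free in different places, which is what allows the free to run outside the locked section.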
      
      This is XSA-424 / CVE-2022-42328 / CVE-2022-42329.
      
      Fixes: be81992f ("xen/netback: don't queue unlimited number of packages")
      Reported-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      
      conflict:
      	drivers/net/xen-netback/common.h
      Signed-off-by: Lu Wei <luwei32@huawei.com>
      Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      6271646d
    • J
      xen/netback: fix build warning · fa44095a
      Committed by Juergen Gross
      mainline inclusion
      from mainline-v6.1
      commit 7dfa764e
      category: bugfix
      bugzilla: https://gitee.com/src-openeuler/kernel/issues/I651EB
      CVE: CVE-2022-3643
      
      --------------------------------
      
      Commit ad7f402a ("xen/netback: Ensure protocol headers don't fall in
      the non-linear area") introduced a (valid) build warning. There have
      even been reports of this problem breaking networking of Xen guests.
      
      Fixes: ad7f402a ("xen/netback: Ensure protocol headers don't fall in the non-linear area")
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
      Tested-by: Jason Andryuk <jandryuk@gmail.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      fa44095a
    • R
      xen/netback: Ensure protocol headers don't fall in the non-linear area · 885cfe76
      Committed by Ross Lagerwall
      mainline inclusion
      from mainline-v6.1
      commit ad7f402a
      category: bugfix
      bugzilla: https://gitee.com/src-openeuler/kernel/issues/I651EB
      CVE: CVE-2022-3643
      
      --------------------------------
      
      In some cases, the frontend may send a packet where the protocol headers
      are spread across multiple slots. This would result in netback creating
      an skb where the protocol headers spill over into the non-linear area.
      Some drivers and NICs don't handle this properly resulting in an
      interface reset or worse.
      
      This issue was introduced by the removal of an unconditional skb pull in
      the tx path to improve performance.  Fix this without reintroducing the
      pull by setting up grant copy ops for as many slots as needed to reach
      the XEN_NETBACK_TX_COPY_LEN size. Adjust the rest of the code to handle
      multiple copy operations per skb.
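How many copy operations are needed to cover the header area can be worked out with a small loop. This is a hedged sketch: `plan_header_copies` is a hypothetical helper, the slot sizes are made-up inputs, and the copy length is passed as a parameter rather than taken from XEN_NETBACK_TX_COPY_LEN.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Walk the slots in order and count how many copy operations are needed
 * to gather up to copy_len bytes of headers. The number of bytes actually
 * covered is stored in *copied (less than copy_len if the packet is short).
 */
size_t plan_header_copies(const size_t *slot_len, size_t nslots,
                          size_t copy_len, size_t *copied)
{
    size_t ops = 0;
    size_t remaining = copy_len;

    assert(slot_len != NULL && copied != NULL);
    for (size_t i = 0; i < nslots && remaining > 0; i++) {
        size_t take = slot_len[i] < remaining ? slot_len[i] : remaining;
        if (take == 0)
            continue;                   /* an empty slot contributes nothing */
        remaining -= take;
        ops++;
    }
    *copied = copy_len - remaining;
    return ops;
}
```

For instance, with slots of 40, 60 and 200 bytes and a 128-byte target, three copy operations are needed (40 + 60 + 28 bytes), whereas a single 200-byte first slot needs only one.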
      
      This is XSA-423 / CVE-2022-3643.
      
      Fixes: 7e5d7753 ("xen-netback: remove unconditional __pskb_pull_tail() in guest Tx path")
      Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
      Reviewed-by: Paul Durrant <paul@xen.org>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      885cfe76
    • O
      !273 [openEuler-1.0-LTS] Fix mouse enumeration issue after wakeup from s4 · ed9df17a
      Committed by openeuler-ci-bot
      Merge Pull Request from: @leoliu-oc 
       
      A mouse is attached to an xHCI port. After the system goes into hibernation, unplug the mouse and plug it into a UHCI port. After the system wakes up from hibernation, the mouse is identified only intermittently.

      During S4 wakeup, the xHCI driver cleans up the disconnected mouse (which is no longer connected to the xHCI port). This delays the S4 wakeup process, and the UHCI root hub goes into auto suspend. After S4 wakeup completes, the USB hub threads are called to handle the root hub events of the USB controllers. However, there are too many USB controllers to guarantee the execution order of the EHCI and UHCI hub threads. If EHCI gives the port back to UHCI before the UHCI hub event check, UHCI tries to enumerate the mouse with the UHCI run bit not set, which causes the control transfer to fail during the enumeration phase.

      To fix this issue, set a larger auto suspend delay for the UHCI root hub. The UHCI run bit is then set after wakeup from S4 and the mouse is identified.
      
      ### Issue
      https://gitee.com/openeuler/kernel/issues/I62V77
      
      ### Test
      N/A
      
      ### Known Issue
      N/A
      
      ### Default config change
      N/A 
       
      Link:https://gitee.com/openeuler/kernel/pulls/273 
      Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> 
      Reviewed-by: Laibin Qiu <qiulaibin@huawei.com> 
      Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> 
      ed9df17a
    • R
      arm64: fix a concurrency issue in emulation_proc_handler() · f676dd4c
      Committed by ruanjinjie
      hulk inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I65T0J
      CVE: NA
      
      -------------------------------
      
      In emulation_proc_handler(), read and write operations are performed on
      insn->current_mode. In the concurrent scenario, the mutex only protects
      the write to insn->current_mode, not the read. Suppose there are
      two concurrent tasks: task1 updates insn->current_mode to INSN_EMULATE
      in the critical section, while the prev_mode of task2 still holds the old
      value INSN_UNDEF of insn->current_mode. As a result, both tasks call
      update_insn_emulation_mode with prev_mode = INSN_UNDEF and
      current_mode = INSN_EMULATE, and register_emulation_hooks is then called
      twice, resulting in a double list_add.
      
      Call trace:
       __list_add_valid+0xd8/0xe4
       register_undef_hook+0x94/0x13c
       update_insn_emulation_mode+0xd0/0x12c
       emulation_proc_handler+0xd8/0xf4
       proc_sys_call_handler+0x140/0x250
       proc_sys_write+0x1c/0x2c
       new_sync_write+0xec/0x18c
       vfs_write+0x214/0x2ac
       ksys_write+0x70/0xfc
       __arm64_sys_write+0x24/0x30
       el0_svc_common.constprop.0+0x7c/0x1bc
       do_el0_svc+0x2c/0x94
       el0_svc+0x20/0x30
       el0_sync_handler+0xb0/0xb4
       el0_sync+0x160/0x180
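One way to close the window described above is to read the previous mode inside the same critical section as the update. The commit message does not spell out the exact change, so the following is only a sketch under that assumption: all names are hypothetical, a pthread mutex stands in for the kernel mutex, and a counter stands in for hook registration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two modes involved in the report. */
enum insn_mode_sketch { MODE_UNDEF_SKETCH, MODE_EMULATE_SKETCH };

struct insn_sketch {
    pthread_mutex_t lock;
    enum insn_mode_sketch current_mode;
    int hooks_registered;               /* stands in for the hook list */
};

/* Returns 1 if this call changed the mode, 0 if it was already set. */
int set_mode_sketch(struct insn_sketch *insn, enum insn_mode_sketch new_mode)
{
    int changed = 0;

    assert(insn != NULL);
    pthread_mutex_lock(&insn->lock);
    /* Both the read of the old mode and the update happen under the lock. */
    enum insn_mode_sketch prev_mode = insn->current_mode;
    if (prev_mode != new_mode) {
        if (new_mode == MODE_EMULATE_SKETCH)
            insn->hooks_registered++;   /* register exactly once */
        else
            insn->hooks_registered--;   /* leaving emulate: unregister */
        insn->current_mode = new_mode;
        changed = 1;
    }
    pthread_mutex_unlock(&insn->lock);
    return changed;
}
```

Because the second caller now observes the mode written by the first, it sees no transition and does not register the hook again.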
      
      Fixes: 08f3f0b2 ("arm64: fix oops in concurrently setting insn_emulation sysctls")
      Signed-off-by: ruanjinjie <ruanjinjie@huawei.com>
      Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
      Reviewed-by: Liao Chang <liaochang1@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      f676dd4c
    • Z
      dm thin: Fix ABBA deadlock between shrink_slab and dm_pool_abort_metadata · 6ae2a8a9
      Committed by Zhihao Cheng
      mainline inclusion
      from mainline-v6.1
      commit 8111964f
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I65I9A
      CVE: NA
      
      -------------------------------
      
      Following concurrent processes:
      
                P1(drop cache)                P2(kworker)
      drop_caches_sysctl_handler
       drop_slab
        shrink_slab
         down_read(&shrinker_rwsem)  - LOCK A
         do_shrink_slab
          super_cache_scan
           prune_icache_sb
            dispose_list
             evict
              ext4_evict_inode
      	 ext4_clear_inode
      	  ext4_discard_preallocations
      	   ext4_mb_load_buddy_gfp
      	    ext4_mb_init_cache
      	     ext4_read_block_bitmap_nowait
      	      ext4_read_bh_nowait
      	       submit_bh
      	        dm_submit_bio
      		                 do_worker
      				  process_deferred_bios
      				   commit
      				    metadata_operation_failed
      				     dm_pool_abort_metadata
      				      down_write(&pmd->root_lock) - LOCK B
      		                      __destroy_persistent_data_objects
      				       dm_block_manager_destroy
      				        dm_bufio_client_destroy
      				         unregister_shrinker
      					  down_write(&shrinker_rwsem)
      		 thin_map                            |
      		  dm_thin_find_block                 ↓
      		   down_read(&pmd->root_lock) --> ABBA deadlock
      
      , which triggers hung task:
      
      [   76.974820] INFO: task kworker/u4:3:63 blocked for more than 15 seconds.
      [   76.976019]       Not tainted 6.1.0-rc4-00011-g8f17dd350364-dirty #910
      [   76.978521] task:kworker/u4:3    state:D stack:0     pid:63    ppid:2
      [   76.978534] Workqueue: dm-thin do_worker
      [   76.978552] Call Trace:
      [   76.978564]  __schedule+0x6ba/0x10f0
      [   76.978582]  schedule+0x9d/0x1e0
      [   76.978588]  rwsem_down_write_slowpath+0x587/0xdf0
      [   76.978600]  down_write+0xec/0x110
      [   76.978607]  unregister_shrinker+0x2c/0xf0
      [   76.978616]  dm_bufio_client_destroy+0x116/0x3d0
      [   76.978625]  dm_block_manager_destroy+0x19/0x40
      [   76.978629]  __destroy_persistent_data_objects+0x5e/0x70
      [   76.978636]  dm_pool_abort_metadata+0x8e/0x100
      [   76.978643]  metadata_operation_failed+0x86/0x110
      [   76.978649]  commit+0x6a/0x230
      [   76.978655]  do_worker+0xc6e/0xd90
      [   76.978702]  process_one_work+0x269/0x630
      [   76.978714]  worker_thread+0x266/0x630
      [   76.978730]  kthread+0x151/0x1b0
      [   76.978772] INFO: task test.sh:2646 blocked for more than 15 seconds.
      [   76.979756]       Not tainted 6.1.0-rc4-00011-g8f17dd350364-dirty #910
      [   76.982111] task:test.sh         state:D stack:0     pid:2646  ppid:2459
      [   76.982128] Call Trace:
      [   76.982139]  __schedule+0x6ba/0x10f0
      [   76.982155]  schedule+0x9d/0x1e0
      [   76.982159]  rwsem_down_read_slowpath+0x4f4/0x910
      [   76.982173]  down_read+0x84/0x170
      [   76.982177]  dm_thin_find_block+0x4c/0xd0
      [   76.982183]  thin_map+0x201/0x3d0
      [   76.982188]  __map_bio+0x5b/0x350
      [   76.982195]  dm_submit_bio+0x2b6/0x930
      [   76.982202]  __submit_bio+0x123/0x2d0
      [   76.982209]  submit_bio_noacct_nocheck+0x101/0x3e0
      [   76.982222]  submit_bio_noacct+0x389/0x770
      [   76.982227]  submit_bio+0x50/0xc0
      [   76.982232]  submit_bh_wbc+0x15e/0x230
      [   76.982238]  submit_bh+0x14/0x20
      [   76.982241]  ext4_read_bh_nowait+0xc5/0x130
      [   76.982247]  ext4_read_block_bitmap_nowait+0x340/0xc60
      [   76.982254]  ext4_mb_init_cache+0x1ce/0xdc0
      [   76.982259]  ext4_mb_load_buddy_gfp+0x987/0xfa0
      [   76.982263]  ext4_discard_preallocations+0x45d/0x830
      [   76.982274]  ext4_clear_inode+0x48/0xf0
      [   76.982280]  ext4_evict_inode+0xcf/0xc70
      [   76.982285]  evict+0x119/0x2b0
      [   76.982290]  dispose_list+0x43/0xa0
      [   76.982294]  prune_icache_sb+0x64/0x90
      [   76.982298]  super_cache_scan+0x155/0x210
      [   76.982303]  do_shrink_slab+0x19e/0x4e0
      [   76.982310]  shrink_slab+0x2bd/0x450
      [   76.982317]  drop_slab+0xcc/0x1a0
      [   76.982323]  drop_caches_sysctl_handler+0xb7/0xe0
      [   76.982327]  proc_sys_call_handler+0x1bc/0x300
      [   76.982331]  proc_sys_write+0x17/0x20
      [   76.982334]  vfs_write+0x3d3/0x570
      [   76.982342]  ksys_write+0x73/0x160
      [   76.982347]  __x64_sys_write+0x1e/0x30
      [   76.982352]  do_syscall_64+0x35/0x80
      [   76.982357]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Function metadata_operation_failed() is called when an operation on the
      dm pool metadata fails; the dm pool then destroys and recreates the metadata.
      So the shrinker is unregistered and registered again, which can take
      shrinker_rwsem for writing while the pmd write lock (pmd->root_lock) is held.
      
      Fix it by allocating dm_block_manager before locking pmd->root_lock
      and destroying old dm_block_manager after unlocking pmd->root_lock,
      then old dm_block_manager is replaced with new dm_block_manager under
      pmd->root_lock. So, shrinker register/unregister could be done without
      holding pmd->root_lock.
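The ordering described above (allocate before locking, swap under the lock, destroy after unlocking) can be sketched as follows. This is an illustration only: the names are hypothetical, a pthread mutex stands in for pmd->root_lock, and malloc/free stand in for creating and destroying a dm_block_manager.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dm_block_manager. */
struct block_manager_sketch {
    int generation;
};

/* Hypothetical stand-in for the pool metadata. */
struct pool_metadata_sketch {
    pthread_mutex_t root_lock;          /* stands in for pmd->root_lock */
    struct block_manager_sketch *bm;
};

int abort_metadata_sketch(struct pool_metadata_sketch *pmd)
{
    assert(pmd != NULL && pmd->bm != NULL);

    /* 1. Allocate the replacement before taking root_lock. */
    struct block_manager_sketch *new_bm = malloc(sizeof(*new_bm));
    if (!new_bm)
        return -1;

    /* 2. Only the swap itself happens under root_lock. */
    pthread_mutex_lock(&pmd->root_lock);
    struct block_manager_sketch *old_bm = pmd->bm;
    new_bm->generation = old_bm->generation + 1;
    pmd->bm = new_bm;
    pthread_mutex_unlock(&pmd->root_lock);

    /* 3. Destroy the old object after unlocking, so any lock taken
     *    during teardown is never nested inside root_lock. */
    free(old_bm);
    return 0;
}
```

Since neither creation nor teardown runs inside the locked section, a second lock acquired by those steps can no longer be ordered after root_lock, which removes one side of the ABBA cycle.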
      
      Fetch a reproducer in [Link].
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216676
      Fixes: e49e5829 ("dm thin: add read only and fail io modes")
      
      Conflicts:
      	drivers/md/dm-thin-metadata.c
      	[ 873f258b("dm thin metadata: do not write metadata if no
      	  changes occurred") is not applied.
      	  6a1b1ddc("dm thin metadata: add wrappers for managing
      	  write locking of metadata") is not applied. ]
      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      6ae2a8a9
    • Z
      sched/qos: Don't unthrottle cfs_rq when cfs_rq is throttled by qos · fbea24f5
      Committed by Zhang Qiao
      hulk inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I64OUS
      CVE: NA
      
      -------------------------------
      
      When a cfs_rq is throttled by qos, cfs_rq->throttled is set to 1, so
      cfs bandwidth control can unthrottle this cfs_rq by mistake, which causes
      a list_del_valid warning.
      So add the macro QOS_THROTTLED (=2): when a cfs_rq is throttled by
      qos, we mark cfs_rq->throttled as QOS_THROTTLED and check
      the value of cfs_rq->throttled before unthrottling a cfs_rq.
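The check can be illustrated with a tiny model. Only QOS_THROTTLED and its value 2 come from the commit message; the struct and the function name are hypothetical, and the real scheduler paths do much more than flip a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QOS_THROTTLED 2                 /* value given in the commit message */

/* Hypothetical stand-in for a cfs_rq: 0 = not throttled,
 * 1 = throttled by bandwidth control, QOS_THROTTLED = throttled by qos. */
struct cfs_rq_sketch {
    int throttled;
};

/* Bandwidth-control unthrottle path: returns true only if it unthrottled. */
bool bw_unthrottle_sketch(struct cfs_rq_sketch *rq)
{
    assert(rq != NULL);
    if (rq->throttled == QOS_THROTTLED)
        return false;                   /* owned by qos: leave it alone */
    if (!rq->throttled)
        return false;                   /* nothing to do */
    rq->throttled = 0;
    return true;
}
```

A distinct value lets the bandwidth path tell the two throttle sources apart without adding a second field.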
      Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
      Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
      Reviewed-by: zheng zucheng <zhengzucheng@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      fbea24f5