1. 08 3月, 2022 11 次提交
  2. 07 3月, 2022 15 次提交
  3. 06 3月, 2022 14 次提交
    • J
      Merge branch 'for-next' of... · a7637069
      Jens Axboe 提交于
      Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/colyli/linux-bcache into for-5.18/drivers
      
      Pull bcache updates from Coly:
      
      "We have 2 patches for Linux v5.18, both of them are from Mingzhe Zou.
       The first patch improves bcache initialization speed by avoid
       unnecessary cost of cache consistency, the second one fixes a potential
       NULL pointer deference in bcache initialization time."
      
      * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/colyli/linux-bcache:
        bcache: fixup multiple threads crash
        bcache: fixup bcache_dev_sectors_dirty_add() multithreaded CPU false sharing
      a7637069
    • M
      bcache: fixup multiple threads crash · 887554ab
      Mingzhe Zou 提交于
      When multiple threads to check btree nodes in parallel, the main
      thread wait for all threads to stop or CACHE_SET_IO_DISABLE flag:
      
      wait_event_interruptible(check_state->wait,
                               atomic_read(&check_state->started) == 0 ||
                               test_bit(CACHE_SET_IO_DISABLE, &c->flags));
      
      However, the bch_btree_node_read and bch_btree_node_read_done
      maybe call bch_cache_set_error, then the CACHE_SET_IO_DISABLE
      will be set. If the flag already set, the main thread return
      error. At the same time, maybe some threads still running and
      read NULL pointer, the kernel will crash.
      
      This patch change the event wait condition, the main thread must
      wait for all threads to stop.
      
      Fixes: 8e710227 ("bcache: make bch_btree_check() to be multithreaded")
      Signed-off-by: NMingzhe Zou <mingzhe.zou@easystack.cn>
      Cc: stable@vger.kernel.org # v5.7+
      Signed-off-by: NColy Li <colyli@suse.de>
      887554ab
    • M
      bcache: fixup bcache_dev_sectors_dirty_add() multithreaded CPU false sharing · 7b1002f7
      Mingzhe Zou 提交于
      When attaching a cached device (a.k.a backing device) to a cache
      device, bch_sectors_dirty_init() is called to count dirty sectors
      and stripes (see what bcache_dev_sectors_dirty_add() does) on the
      cache device.
      
      When bcache_dev_sectors_dirty_add() is called, set_bit(stripe,
      d->full_dirty_stripes) or clear_bit(stripe, d->full_dirty_stripes)
      operation will always be performed. In full_dirty_stripes, each 1bit
      represents stripe_size (8192) sectors (512B), so 1bit=4MB (8192*512),
      and each CPU cache line=64B=512bit=2048MB. When 20 threads process
      a cached disk with 100G dirty data, a single thread processes about
      23M at a time, and 20 threads total 460M. These full_dirty_stripes
      bits corresponding to the 460M data is likely to fall in the same CPU
      cache line. When one of these threads performs a set_bit or clear_bit
      operation, the same CPU cache line of other threads will become invalid
      and must read the full_dirty_stripes from the main memory again. Compared
      with single thread, the time of a bcache_dev_sectors_dirty_add()
      call is increased by about 50 times in our test (100G dirty data,
      20 threads, bcache_dev_sectors_dirty_add() is called more than
      20 million times).
      
      This patch tries to test_bit before set_bit or clear_bit operation.
      Therefore, a lot of force set and clear operations will be avoided,
      and most of bcache_dev_sectors_dirty_add() calls will only read CPU
      cache line.
      Signed-off-by: NMingzhe Zou <mingzhe.zou@easystack.cn>
      Signed-off-by: NColy Li <colyli@suse.de>
      7b1002f7
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · dcde98da
      Linus Torvalds 提交于
      Pull input updates from Dmitry Torokhov:
      
       - a fixup for Goodix touchscreen driver allowing it to work on certain
         Cherry Trail devices
      
       - a fix for imbalanced enable/disable regulator in Elam touchpad driver
         that became apparent when used with Asus TF103C 2-in-1 dock
      
       - a couple new input keycodes used on newer keyboards
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        HID: add mapping for KEY_ALL_APPLICATIONS
        HID: add mapping for KEY_DICTATE
        Input: elan_i2c - fix regulator enable count imbalance after suspend/resume
        Input: elan_i2c - move regulator_[en|dis]able() out of elan_[en|dis]able_power()
        Input: goodix - workaround Cherry Trail devices with a bogus ACPI Interrupt() resource
        Input: goodix - use the new soc_intel_is_byt() helper
        Input: samsung-keypad - properly state IOMEM dependency
      dcde98da
    • L
      Merge branch 'akpm' (patches from Andrew) · 0014404f
      Linus Torvalds 提交于
      Merge misc fixes from Andrew Morton:
       "8 patches.
      
        Subsystems affected by this patch series: mm (hugetlb, pagemap, and
        userfaultfd), memfd, selftests, and kconfig"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        configs/debug: set CONFIG_DEBUG_INFO=y properly
        proc: fix documentation and description of pagemap
        kselftest/vm: fix tests build with old libc
        memfd: fix F_SEAL_WRITE after shmem huge page allocated
        mm: fix use-after-free when anon vma name is used after vma is freed
        mm: prevent vm_area_struct::anon_name refcount saturation
        mm: refactor vm_area_struct::anon_vma_name usage code
        selftests/vm: cleanup hugetlb file after mremap test
      0014404f
    • L
      Merge tag 's390-5.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · f9026e19
      Linus Torvalds 提交于
      Pull s390 fixes from Vasily Gorbik:
      
       - Fix HAVE_DYNAMIC_FTRACE_WITH_ARGS implementation by providing correct
         switching between ftrace_caller/ftrace_regs_caller and supplying
         pt_regs only when ftrace_regs_caller is activated.
      
       - Fix exception table sorting.
      
       - Fix breakage of kdump tooling by preserving metadata it cannot
         function without.
      
      * tag 's390-5.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/extable: fix exception table sorting
        s390/ftrace: fix arch_ftrace_get_regs implementation
        s390/ftrace: fix ftrace_caller/ftrace_regs_caller generation
        s390/setup: preserve memory at OLDMEM_BASE and OLDMEM_SIZE
      f9026e19
    • Q
      configs/debug: set CONFIG_DEBUG_INFO=y properly · d1eff16d
      Qian Cai 提交于
      CONFIG_DEBUG_INFO can't be set by user directly, so set
      CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y instead.
      
      Otherwise, we end up with no debuginfo in vmlinux which is a big no-no
      for kernel debugging.
      
      Link: https://lkml.kernel.org/r/20220301202920.18488-1-quic_qiancai@quicinc.comSigned-off-by: NQian Cai <quic_qiancai@quicinc.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d1eff16d
    • Y
      proc: fix documentation and description of pagemap · dd21bfa4
      Yun Zhou 提交于
      Since bit 57 was exported for uffd-wp write-protected (commit
      fb8e37f3: "mm/pagemap: export uffd-wp protection information"),
      fixing it can reduce some unnecessary confusion.
      
      Link: https://lkml.kernel.org/r/20220301044538.3042713-1-yun.zhou@windriver.com
      Fixes: fb8e37f3 ("mm/pagemap: export uffd-wp protection information")
      Signed-off-by: NYun Zhou <yun.zhou@windriver.com>
      Reviewed-by: NPeter Xu <peterx@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Tiberiu A Georgescu <tiberiu.georgescu@nutanix.com>
      Cc: Florian Schmidt <florian.schmidt@nutanix.com>
      Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Colin Cross <ccross@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dd21bfa4
    • C
      kselftest/vm: fix tests build with old libc · b773827e
      Chengming Zhou 提交于
      The error message when I build vm tests on debian10 (GLIBC 2.28):
      
          userfaultfd.c: In function `userfaultfd_pagemap_test':
          userfaultfd.c:1393:37: error: `MADV_PAGEOUT' undeclared (first use
          in this function); did you mean `MADV_RANDOM'?
            if (madvise(area_dst, test_pgsize, MADV_PAGEOUT))
                                               ^~~~~~~~~~~~
                                               MADV_RANDOM
      
      This patch includes these newer definitions from UAPI linux/mman.h, is
      useful to fix tests build on systems without these definitions in glibc
      sys/mman.h.
      
      Link: https://lkml.kernel.org/r/20220227055330.43087-2-zhouchengming@bytedance.comSigned-off-by: NChengming Zhou <zhouchengming@bytedance.com>
      Reviewed-by: NShuah Khan <skhan@linuxfoundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b773827e
    • H
      memfd: fix F_SEAL_WRITE after shmem huge page allocated · f2b277c4
      Hugh Dickins 提交于
      Wangyong reports: after enabling tmpfs filesystem to support transparent
      hugepage with the following command:
      
        echo always > /sys/kernel/mm/transparent_hugepage/shmem_enabled
      
      the docker program tries to add F_SEAL_WRITE through the following
      command, but it fails unexpectedly with errno EBUSY:
      
        fcntl(5, F_ADD_SEALS, F_SEAL_WRITE) = -1.
      
      That is because memfd_tag_pins() and memfd_wait_for_pins() were never
      updated for shmem huge pages: checking page_mapcount() against
      page_count() is hopeless on THP subpages - they need to check
      total_mapcount() against page_count() on THP heads only.
      
      Make memfd_tag_pins() (compared > 1) as strict as memfd_wait_for_pins()
      (compared != 1): either can be justified, but given the non-atomic
      total_mapcount() calculation, it is better now to be strict.  Bear in
      mind that total_mapcount() itself scans all of the THP subpages, when
      choosing to take an XA_CHECK_SCHED latency break.
      
      Also fix the unlikely xa_is_value() case in memfd_wait_for_pins(): if a
      page has been swapped out since memfd_tag_pins(), then its refcount must
      have fallen, and so it can safely be untagged.
      
      Link: https://lkml.kernel.org/r/a4f79248-df75-2c8c-3df-ba3317ccb5da@google.comSigned-off-by: NHugh Dickins <hughd@google.com>
      Reported-by: NZeal Robot <zealci@zte.com.cn>
      Reported-by: Nwangyong <wang.yong12@zte.com.cn>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: CGEL ZTE <cgel.zte@gmail.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Yang Yang <yang.yang29@zte.com.cn>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f2b277c4
    • S
      mm: fix use-after-free when anon vma name is used after vma is freed · 942341dc
      Suren Baghdasaryan 提交于
      When adjacent vmas are being merged it can result in the vma that was
      originally passed to madvise_update_vma being destroyed.  In the current
      implementation, the name parameter passed to madvise_update_vma points
      directly to vma->anon_name and it is used after the call to vma_merge.
      In the cases when vma_merge merges the original vma and destroys it,
      this might result in UAF.  For that the original vma would have to hold
      the anon_vma_name with the last reference.  The following vma would need
      to contain a different anon_vma_name object with the same string.  Such
      scenario is shown below:
      
      madvise_vma_behavior(vma)
        madvise_update_vma(vma, ..., anon_name == vma->anon_name)
          vma_merge(vma)
            __vma_adjust(vma) <-- merges vma with adjacent one
              vm_area_free(vma) <-- frees the original vma
          replace_vma_anon_name(anon_name) <-- UAF of vma->anon_name
      
      Fix this by raising the name refcount and stabilizing it.
      
      Link: https://lkml.kernel.org/r/20220224231834.1481408-3-surenb@google.com
      Link: https://lkml.kernel.org/r/20220223153613.835563-3-surenb@google.com
      Fixes: 9a10064f ("mm: add a field to store names for private anonymous memory")
      Signed-off-by: NSuren Baghdasaryan <surenb@google.com>
      Reported-by: syzbot+aa7b3d4b35f9dc46a366@syzkaller.appspotmail.com
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Alexey Gladkov <legion@kernel.org>
      Cc: Chris Hyser <chris.hyser@oracle.com>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Colin Cross <ccross@google.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Sasha Levin <sashal@kernel.org>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Xiaofeng Cao <caoxiaofeng@yulong.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      942341dc
    • S
      mm: prevent vm_area_struct::anon_name refcount saturation · 96403e11
      Suren Baghdasaryan 提交于
      A deep process chain with many vmas could grow really high.  With
      default sysctl_max_map_count (64k) and default pid_max (32k) the max
      number of vmas in the system is 2147450880 and the refcounter has
      headroom of 1073774592 before it reaches REFCOUNT_SATURATED
      (3221225472).
      
      Therefore it's unlikely that an anonymous name refcounter will overflow
      with these defaults.  Currently the max for pid_max is PID_MAX_LIMIT
      (4194304) and for sysctl_max_map_count it's INT_MAX (2147483647).  In
      this configuration anon_vma_name refcount overflow becomes theoretically
      possible (that still require heavy sharing of that anon_vma_name between
      processes).
      
      kref refcounting interface used in anon_vma_name structure will detect a
      counter overflow when it reaches REFCOUNT_SATURATED value but will only
      generate a warning and freeze the ref counter.  This would lead to the
      refcounted object never being freed.  A determined attacker could leak
      memory like that but it would be rather expensive and inefficient way to
      do so.
      
      To ensure anon_vma_name refcount does not overflow, stop anon_vma_name
      sharing when the refcount reaches REFCOUNT_MAX (2147483647), which still
      leaves INT_MAX/2 (1073741823) values before the counter reaches
      REFCOUNT_SATURATED.  This should provide enough headroom for raising the
      refcounts temporarily.
      
      Link: https://lkml.kernel.org/r/20220223153613.835563-2-surenb@google.comSigned-off-by: NSuren Baghdasaryan <surenb@google.com>
      Suggested-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Alexey Gladkov <legion@kernel.org>
      Cc: Chris Hyser <chris.hyser@oracle.com>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Colin Cross <ccross@google.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Sasha Levin <sashal@kernel.org>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Xiaofeng Cao <caoxiaofeng@yulong.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      96403e11
    • S
      mm: refactor vm_area_struct::anon_vma_name usage code · 5c26f6ac
      Suren Baghdasaryan 提交于
      Avoid mixing strings and their anon_vma_name referenced pointers by
      using struct anon_vma_name whenever possible.  This simplifies the code
      and allows easier sharing of anon_vma_name structures when they
      represent the same name.
      
      [surenb@google.com: fix comment]
      
      Link: https://lkml.kernel.org/r/20220223153613.835563-1-surenb@google.com
      Link: https://lkml.kernel.org/r/20220224231834.1481408-1-surenb@google.comSigned-off-by: NSuren Baghdasaryan <surenb@google.com>
      Suggested-by: NMatthew Wilcox <willy@infradead.org>
      Suggested-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Colin Cross <ccross@google.com>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Alexey Gladkov <legion@kernel.org>
      Cc: Sasha Levin <sashal@kernel.org>
      Cc: Chris Hyser <chris.hyser@oracle.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Xiaofeng Cao <caoxiaofeng@yulong.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5c26f6ac
    • M
      selftests/vm: cleanup hugetlb file after mremap test · ff712a62
      Mike Kravetz 提交于
      The hugepage-mremap test will create a file in a hugetlb filesystem.  In
      a default 'run_vmtests' run, the file will contain all the hugetlb
      pages.  After the test, the file remains and there are no free hugetlb
      pages for subsequent tests.  This causes those hugetlb tests to fail.
      
      Change hugepage-mremap to take the name of the hugetlb file as an
      argument.  Unlink the file within the test, and just to be sure remove
      the file in the run_vmtests script.
      
      Link: https://lkml.kernel.org/r/20220201033459.156944-1-mike.kravetz@oracle.comSigned-off-by: NMike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: NShuah Khan <skhan@linuxfoundation.org>
      Acked-by: NYosry Ahmed <yosryahmed@google.com>
      Reviewed-by: NMuchun Song <songmuchun@bytedance.com>
      Reviewed-by: NMina Almasry <almasrymina@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ff712a62