1. 22 January 2022, 1 commit
  2. 15 January 2022, 4 commits
  3. 07 November 2021, 2 commits
  4. 18 October 2021, 1 commit
  5. 27 September 2021, 1 commit
  6. 04 September 2021, 3 commits
  7. 02 July 2021, 1 commit
  8. 30 June 2021, 3 commits
      mm, swap: remove unnecessary smp_rmb() in swap_type_to_swap_info() · a4b45114
      Authored by Huang Ying
      Before commit c10d38cc ("mm, swap: bounds check swap_info array
      accesses to avoid NULL derefs"), the typical code to reference the
      swap_info[] is as follows,
      
        type = swp_type(swp_entry);
        if (type >= nr_swapfiles)
                /* handle invalid swp_entry */;
        p = swap_info[type];
        /* access fields of *p.  OOPS! p may be NULL! */
      
      Because the ordering isn't guaranteed, swap_info[type] may be read
      before "nr_swapfiles", which can result in a NULL pointer
      dereference.
      
      So after commit c10d38cc, the code becomes,
      
        struct swap_info_struct *swap_type_to_swap_info(int type)
        {
      	  if (type >= READ_ONCE(nr_swapfiles))
      		  return NULL;
      	  smp_rmb();
      	  return READ_ONCE(swap_info[type]);
        }
      
        /* users */
        type = swp_type(swp_entry);
        p = swap_type_to_swap_info(type);
        if (!p)
      	  /* handle invalid swp_entry */;
        /* dereference p */
      
      Here the value of swap_info[type] (that is, "p") is checked to be
      non-NULL before being dereferenced, so a NULL dereference is
      impossible even if "nr_swapfiles" is read after swap_info[type].
      Therefore, the "smp_rmb()" becomes unnecessary.
      
      In fact, we don't even need to read "nr_swapfiles" here, because the
      non-NULL check on "p" is sufficient; we just need to make sure we
      never access beyond the bounds of the array.  With this change,
      nr_swapfiles is only accessed with swap_lock held, except in
      swapcache_free_entries(), where the absolute correctness of the
      value isn't needed, as described in the comments.
      
      We still need to guarantee that swap_info[type] is read before being
      dereferenced.  That is satisfied by the data dependency ordering
      enforced by READ_ONCE(swap_info[type]), which needs to be paired
      with a proper write barrier.  So smp_store_release() is used in
      alloc_swap_info() to guarantee that the fields of *swap_info[type]
      are initialized before swap_info[type] itself is written.  Note that
      the fields of *swap_info[type] are first zero-initialized via
      kvzalloc().  The assignment and dereferencing of swap_info[type] are
      analogous to rcu_assign_pointer() and rcu_dereference().
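The publish/lookup pairing described above can be sketched with userspace C11 atomics. This is a minimal illustration under stated assumptions, not the kernel code: `publish_info()`, `lookup_info()` and `struct info` are hypothetical names, a release store stands in for smp_store_release(), and a consume load stands in for READ_ONCE() plus the address dependency:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_SLOTS 8

struct info { int ready; };

/* Published pointers; NULL until a slot is initialized. */
static _Atomic(struct info *) slots[MAX_SLOTS];

/* Writer side: fully initialize the object, then publish it with a
 * release store -- the userspace analogue of smp_store_release(). */
void publish_info(int type, struct info *p)
{
        p->ready = 1;           /* initialize fields first */
        atomic_store_explicit(&slots[type], p, memory_order_release);
}

/* Reader side: a dependency-ordered (consume) load plays the role of
 * READ_ONCE() plus the address dependency; the caller's NULL check
 * replaces the read of nr_swapfiles, so no smp_rmb() is needed --
 * only the array-bounds check remains. */
struct info *lookup_info(int type)
{
        if (type < 0 || type >= MAX_SLOTS)
                return NULL;
        return atomic_load_explicit(&slots[type], memory_order_consume);
}
```

A caller then mirrors the changelog's pattern: look up, test for NULL, and only then dereference.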
      
      Link: https://lkml.kernel.org/r/20210520073301.1676294-1-ying.huang@intel.com
      Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Paul McKenney <paulmck@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      mm/swapfile: move get_swap_page_of_type() under CONFIG_HIBERNATION · bb243f7d
      Authored by Miaohe Lin
      Patch series "Cleanups for swap", v2.
      
      This series contains just cleanups to remove some unused variables, delete
      meaningless forward declarations and so on.  More details can be found in
      the respective changelogs.
      
      This patch (of 4):
      
      We should move get_swap_page_of_type() under CONFIG_HIBERNATION,
      since the only caller of this function is now the suspend routine.
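The resulting shape can be sketched as follows. This is a hypothetical standalone example: `CONFIG_HIBERNATION` is force-defined so it compiles on its own (in the kernel it comes from the build configuration), and `scan_map_sketch()`/`get_page_of_type_sketch()` are illustrative stand-ins for the folded scan_swap_map() and its caller:

```c
/* Pretend the Kconfig option is enabled; in a real kernel build this
 * comes from the configuration, never a hard-coded define. */
#define CONFIG_HIBERNATION 1

#ifdef CONFIG_HIBERNATION
/* Former helper, now folded into its only caller's #ifdef block. */
static int scan_map_sketch(void)
{
        return 7;               /* stand-in for a found swap slot */
}

/* Only the suspend routine calls this, so the whole block is compiled
 * out on kernels built without hibernation support. */
int get_page_of_type_sketch(int type)
{
        (void)type;             /* type selection elided for brevity */
        return scan_map_sketch();
}
#endif /* CONFIG_HIBERNATION */
```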
      
      [linmiaohe@huawei.com: move scan_swap_map() under CONFIG_HIBERNATION]
        Link: https://lkml.kernel.org/r/20210521070855.2015094-1-linmiaohe@huawei.com
      [linmiaohe@huawei.com: fold scan_swap_map() into the only caller get_swap_page_of_type()]
        Link: https://lkml.kernel.org/r/20210527120328.3935132-1-linmiaohe@huawei.com
      
      Link: https://lkml.kernel.org/r/20210520134022.1370406-1-linmiaohe@huawei.com
      Link: https://lkml.kernel.org/r/20210520134022.1370406-2-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      mm/swapfile: use percpu_ref to serialize against concurrent swapoff · 63d8620e
      Authored by Miaohe Lin
      Patch series "close various race windows for swap", v6.
      
      When I was investigating the swap code, I found some possible race
      windows.  This series aims to fix all of them.  But using the
      current get/put_swap_device() to guard against concurrent swapoff
      for swap_readpage() looks terrible, because swap_readpage() may take
      a really long time.  To reduce the performance overhead on the hot
      path as much as possible, we can instead use a percpu_ref to close
      this race window (as suggested by Huang, Ying).  Patch 1 adds
      percpu_ref support for swap, and most of the remaining patches use
      it to close various race windows.  More details can be found in the
      respective changelogs.
      
      This patch (of 4):
      
      Using the current get/put_swap_device() to guard against concurrent
      swapoff for some swap ops, e.g.  swap_readpage(), looks terrible
      because they might take a really long time.  This patch adds
      percpu_ref support to serialize against concurrent swapoff (as
      suggested by Huang, Ying).  We also remove the SWP_VALID flag,
      because it was only used together with the RCU solution.
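The percpu_ref idea can be approximated with a single shared counter. This is a hypothetical userspace sketch, not the kernel API: a real percpu_ref keeps per-CPU counters so the reader's hot path stays cheap, and the names `get_swap_dev()`/`put_swap_dev()`/`kill_swap_dev()` are illustrative stand-ins for get/put_swap_device() and the swapoff side:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct swap_dev {
        atomic_int  refs;       /* readers currently inside a swap op */
        atomic_bool dead;       /* set once swapoff has started */
};

/* Reader side: take a reference unless swapoff already started. */
bool get_swap_dev(struct swap_dev *d)
{
        if (atomic_load(&d->dead))
                return false;
        atomic_fetch_add(&d->refs, 1);
        if (atomic_load(&d->dead)) {    /* recheck: lost the race */
                atomic_fetch_sub(&d->refs, 1);
                return false;
        }
        return true;
}

void put_swap_dev(struct swap_dev *d)
{
        atomic_fetch_sub(&d->refs, 1);
}

/* swapoff side: mark the device dead, then wait for in-flight readers
 * (even slow ones, like a long swap_readpage()) to drain before
 * tearing anything down.  A real kernel would sleep, not spin. */
void kill_swap_dev(struct swap_dev *d)
{
        atomic_store(&d->dead, true);
        while (atomic_load(&d->refs) > 0)
                ;
}
```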
      
      Link: https://lkml.kernel.org/r/20210426123316.806267-1-linmiaohe@huawei.com
      Link: https://lkml.kernel.org/r/20210426123316.806267-2-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
      Cc: Alex Shi <alexs@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 17 June 2021, 1 commit
  10. 06 May 2021, 1 commit
  11. 03 March 2021, 1 commit
  12. 25 February 2021, 1 commit
  13. 10 February 2021, 1 commit
  14. 28 January 2021, 2 commits
  15. 21 January 2021, 1 commit
  16. 16 December 2020, 4 commits
  17. 07 December 2020, 1 commit
  18. 14 October 2020, 4 commits
  19. 27 September 2020, 1 commit
      mm, THP, swap: fix allocating cluster for swapfile by mistake · 41663430
      Authored by Gao Xiang
      SWP_FS is used to make swap_{read,write}page() go through the
      filesystem, and it is only used for swap files over NFS.  So !SWP_FS
      means non-NFS for now; it could be either file-backed or
      device-backed.  Something similar goes for the legacy SWP_FILE.
      
      So in order to achieve the goal of the original patch, SWP_BLKDEV should
      be used instead.
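The difference between the two flag tests can be sketched as follows. The flag values and function names here are illustrative (not the kernel's actual enum or code); the point is only which condition gates THP cluster allocation:

```c
#include <stdbool.h>

/* Illustrative flag bits (not the kernel's actual values). */
#define SWP_BLKDEV      (1UL << 0)  /* swap area is a block device */
#define SWP_FS          (1UL << 1)  /* swap I/O goes through the fs (NFS) */

/* Buggy condition: !SWP_FS also matches non-NFS, file-backed
 * swapfiles, so a fragmented swapfile on e.g. XFS could be given a
 * huge cluster whose blocks are not contiguous on disk. */
bool may_alloc_cluster_buggy(unsigned long flags)
{
        return !(flags & SWP_FS);
}

/* Fixed condition: only a real block device gets a THP swap cluster. */
bool may_alloc_cluster_fixed(unsigned long flags)
{
        return flags & SWP_BLKDEV;
}
```

For a plain file-backed swapfile (neither flag set), the buggy check wrongly allows cluster allocation while the fixed check refuses it.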
      
      FS corruption can be observed with an SSD device + XFS + a
      fragmented swapfile, due to CONFIG_THP_SWAP=y.
      
      I reproduced the issue with the following details:
      
      Environment:
      
        QEMU + upstream kernel + buildroot + NVMe (2 GB)
      
      Kernel config:
      
        CONFIG_BLK_DEV_NVME=y
        CONFIG_THP_SWAP=y
      
      Reproduction steps:
      
        mkfs.xfs -f /dev/nvme0n1
        mkdir /tmp/mnt
        mount /dev/nvme0n1 /tmp/mnt
        bs="32k"
        sz="1024m"    # doesn't matter too much, I also tried 16m
        xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
        xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
        xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
        xfs_io -f -c "pwrite -F -S 0 -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
        xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fsync" /tmp/mnt/sw
      
        mkswap /tmp/mnt/sw
        swapon /tmp/mnt/sw
      
        stress --vm 2 --vm-bytes 600M   # doesn't matter too much as well
      
      Symptoms:
       - FS corruption (e.g. checksum failure)
       - memory corruption at: 0xd2808010
       - segfault
      
      Fixes: f0eea189 ("mm, THP, swap: Don't allocate huge cluster for file backed swap device")
      Fixes: 38d8b4e6 ("mm, THP, swap: delay splitting THP during swap out")
      Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Acked-by: Rafael Aquini <aquini@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Eric Sandeen <esandeen@redhat.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20200820045323.7809-1-hsiangkao@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  20. 25 September 2020, 2 commits
  21. 24 September 2020, 2 commits
  22. 04 September 2020, 1 commit
      mm: Add arch hooks for saving/restoring tags · 8a84802e
      Authored by Steven Price
      Arm's Memory Tagging Extension (MTE) adds metadata (tags) to every
      physical page.  When swapping pages out to disk it is necessary to
      save these tags, and later to restore them when the pages are read
      back.
      
      Add some hooks, along with dummy implementations, to enable the
      arch code to handle this.
      
      Three new hooks are added to the swap code:
       * arch_prepare_to_swap()
       * arch_swap_invalidate_page() / arch_swap_invalidate_area()
      One new hook is added to shmem:
       * arch_swap_restore()
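A sketch of what the dummy (no-op) implementations look like for an architecture without tag metadata. The signatures here are illustrative assumptions (the real kernel uses `swp_entry_t` and takes these from arch headers); the point is that each hook does nothing unless an architecture such as arm64 MTE overrides it:

```c
#include <assert.h>
#include <stddef.h>

struct page;                    /* opaque here, as in generic code */

/* Called before a page is written to swap; an MTE arch would save the
 * page's tags here.  Returning 0 means "nothing to do, proceed". */
static inline int arch_prepare_to_swap(struct page *page)
{
        (void)page;
        return 0;
}

/* Called when swap slots are freed, so saved tags can be dropped. */
static inline void arch_swap_invalidate_page(int type, unsigned long offset)
{
        (void)type; (void)offset;
}

static inline void arch_swap_invalidate_area(int type)
{
        (void)type;
}

/* Called when a page is read back in; an MTE arch restores tags here. */
static inline void arch_swap_restore(unsigned long entry, struct page *page)
{
        (void)entry; (void)page;
}
```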
      Signed-off-by: Steven Price <steven.price@arm.com>
      [catalin.marinas@arm.com: add unlock_page() on the error path]
      [catalin.marinas@arm.com: dropped the _tags suffix]
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
  23. 15 August 2020, 1 commit
      mm/swapfile: fix and annotate various data races · a449bf58
      Authored by Qian Cai
      The swap_info_struct members si.highest_bit, si.swap_map[offset] and
      si.flags can each be accessed concurrently without synchronization,
      as noticed by KCSAN:
      === si.highest_bit ===
      
       write to 0xffff8d5abccdc4d4 of 4 bytes by task 5353 on cpu 24:
        swap_range_alloc+0x81/0x130
        swap_range_alloc at mm/swapfile.c:681
        scan_swap_map_slots+0x371/0xb90
        get_swap_pages+0x39d/0x5c0
        get_swap_page+0xf2/0x524
        add_to_swap+0xe4/0x1c0
        shrink_page_list+0x1795/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
      
       read to 0xffff8d5abccdc4d4 of 4 bytes by task 6672 on cpu 70:
        scan_swap_map_slots+0x4a6/0xb90
        scan_swap_map_slots at mm/swapfile.c:892
        get_swap_pages+0x39d/0x5c0
        get_swap_page+0xf2/0x524
        add_to_swap+0xe4/0x1c0
        shrink_page_list+0x1795/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 70 PID: 6672 Comm: oom01 Tainted: G        W    L 5.5.0-next-20200205+ #3
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      === si.swap_map[offset] ===
      
       write to 0xffffbc370c29a64c of 1 bytes by task 6856 on cpu 86:
        __swap_entry_free_locked+0x8c/0x100
        __swap_entry_free_locked at mm/swapfile.c:1209 (discriminator 4)
        __swap_entry_free.constprop.20+0x69/0xb0
        free_swap_and_cache+0x53/0xa0
        unmap_page_range+0x7f8/0x1d70
        unmap_single_vma+0xcd/0x170
        unmap_vmas+0x18b/0x220
        exit_mmap+0xee/0x220
        mmput+0x10e/0x270
        do_exit+0x59b/0xf40
        do_group_exit+0x8b/0x180
      
       read to 0xffffbc370c29a64c of 1 bytes by task 6855 on cpu 20:
        _swap_info_get+0x81/0xa0
        _swap_info_get at mm/swapfile.c:1140
        free_swap_and_cache+0x40/0xa0
        unmap_page_range+0x7f8/0x1d70
        unmap_single_vma+0xcd/0x170
        unmap_vmas+0x18b/0x220
        exit_mmap+0xee/0x220
        mmput+0x10e/0x270
        do_exit+0x59b/0xf40
        do_group_exit+0x8b/0x180
      
      === si.flags ===
      
       write to 0xffff956c8fc6c400 of 8 bytes by task 6087 on cpu 23:
        scan_swap_map_slots+0x6fe/0xb50
        scan_swap_map_slots at mm/swapfile.c:887
        get_swap_pages+0x39d/0x5c0
        get_swap_page+0x377/0x524
        add_to_swap+0xe4/0x1c0
        shrink_page_list+0x1795/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
      
       read to 0xffff956c8fc6c400 of 8 bytes by task 6207 on cpu 63:
        _swap_info_get+0x41/0xa0
        __swap_info_get at mm/swapfile.c:1114
        put_swap_page+0x84/0x490
        __remove_mapping+0x384/0x5f0
        shrink_page_list+0xff1/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
      
      The writes are under si->lock, but the reads are not.  For
      si.highest_bit and si.swap_map[offset], a data race could trigger
      logic bugs, so fix them by using WRITE_ONCE() for the writes and
      READ_ONCE() for the reads, except for those isolated reads that only
      compare against zero, where a data race would cause no harm; those
      are annotated as intentional data races using the data_race() macro.
      
      For si.flags, the readers are only interested in a single bit, so a
      data race there would cause no issue.
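The annotation pattern can be sketched with simplified userspace stand-ins for the kernel macros. These `WRITE_ONCE`/`READ_ONCE` definitions are approximations (volatile accesses that stop the compiler from tearing, caching, or duplicating the access), and `data_race()` is a no-op here, whereas the kernel's version tells KCSAN the race is intentional; `highest_bit`, `range_alloc()` and `map_is_free()` are hypothetical names:

```c
#include <assert.h>

/* Simplified userspace approximations of the kernel macros. */
#define WRITE_ONCE(x, val)  (*(volatile __typeof__(x) *)&(x) = (val))
#define READ_ONCE(x)        (*(volatile const __typeof__(x) *)&(x))
#define data_race(expr)     (expr)   /* no-op stand-in for KCSAN */

static unsigned int highest_bit;

/* Writer holds the lock, but lockless readers exist, so the store must
 * be a single, non-torn access. */
void range_alloc(unsigned int offset)
{
        WRITE_ONCE(highest_bit, offset);
}

/* Lockless reader: READ_ONCE() pairs with the WRITE_ONCE() above. */
unsigned int peek_highest_bit(void)
{
        return READ_ONCE(highest_bit);
}

/* Compare-against-zero read: a race here is harmless either way, so
 * annotate it as intentional rather than "fixing" it. */
int map_is_free(const unsigned char *map, unsigned long off)
{
        return data_race(map[off]) == 0;
}
```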
      
      [cai@lca.pw: add a missing annotation for si->flags in memory.c]
        Link: http://lkml.kernel.org/r/1581612647-5958-1-git-send-email-cai@lca.pw
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Link: http://lkml.kernel.org/r/1581095163-12198-1-git-send-email-cai@lca.pw
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>