1. 20 11月, 2019 26 次提交
  2. 19 11月, 2019 3 次提交
    • M
      block: fix .bi_size overflow · a74e2556
      Ming Lei 提交于
      commit 79d08f89bb1b5c2c1ff90d9bb95497ab9e8aa7e0 upstream
      
      'bio->bi_iter.bi_size' is 'unsigned int', which at most hold 4G - 1
      bytes.
      
      Before 07173c3ec276 ("block: enable multipage bvecs"), one bio can
      include very limited pages, and usually at most 256, so the fs bio
      size won't be bigger than 1M bytes most of times.
      
      Since we support multi-page bvec, in theory one fs bio really can
      be added > 1M pages, especially in case of hugepage, or big writeback
      with too many dirty pages. Then there is chance in which .bi_size
      is overflowed.
      
      Fixes this issue by using bio_full() to check if the added segment may
      overflow .bi_size.
      Signed-off-by: NHui Zhu <teawaterz@linux.alibaba.com>
      Cc: Liu Yiding <liuyd.fnst@cn.fujitsu.com>
      Cc: kernel test robot <rong.a.chen@intel.com>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: linux-xfs@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      Cc: stable@vger.kernel.org
      Fixes: 07173c3ec276 ("block: enable multipage bvecs")
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      a74e2556
    • H
      mm, swap: fix race between swapoff and some swap operations · 73c29467
      Huang Ying 提交于
      commit eb085574a7526c4375965c5fbf7e5b0c19cdd336 upstream.
      Change SWP_VALID to (1 << 12).
      
      When swapin is performed, after getting the swap entry information from
      the page table, system will swap in the swap entry, without any lock held
      to prevent the swap device from being swapoff.  This may cause the race
      like below,
      
      CPU 1				CPU 2
      -----				-----
      				do_swap_page
      				  swapin_readahead
      				    __read_swap_cache_async
      swapoff				      swapcache_prepare
        p->swap_map = NULL		        __swap_duplicate
      					  p->swap_map[?] /* !!! NULL pointer access */
      
      Because swapoff is usually done when system shutdown only, the race may
      not hit many people in practice.  But it is still a race need to be fixed.
      
      To fix the race, get_swap_device() is added to check whether the specified
      swap entry is valid in its swap device.  If so, it will keep the swap
      entry valid via preventing the swap device from being swapoff, until
      put_swap_device() is called.
      
      Because swapoff() is very rare code path, to make the normal path runs as
      fast as possible, rcu_read_lock/unlock() and synchronize_rcu() instead of
      reference count is used to implement get/put_swap_device().  >From
      get_swap_device() to put_swap_device(), RCU reader side is locked, so
      synchronize_rcu() in swapoff() will wait until put_swap_device() is
      called.
      
      In addition to swap_map, cluster_info, etc.  data structure in the struct
      swap_info_struct, the swap cache radix tree will be freed after swapoff,
      so this patch fixes the race between swap cache looking up and swapoff
      too.
      
      Races between some other swap cache usages and swapoff are fixed too via
      calling synchronize_rcu() between clearing PageSwapCache() and freeing
      swap cache data structure.
      
      Another possible method to fix this is to use preempt_off() +
      stop_machine() to prevent the swap device from being swapoff when its data
      structure is being accessed.  The overhead in hot-path of both methods is
      similar.  The advantages of RCU based method are,
      
      1. stop_machine() may disturb the normal execution code path on other
         CPUs.
      
      2. File cache uses RCU to protect its radix tree.  If the similar
         mechanism is used for swap cache too, it is easier to share code
         between them.
      
      3. RCU is used to protect swap cache in total_swapcache_pages() and
         exit_swap_address_space() already.  The two mechanisms can be
         merged to simplify the logic.
      
      Link: http://lkml.kernel.org/r/20190522015423.14418-1-ying.huang@intel.com
      Fixes: 235b6217 ("mm/swap: add cluster lock")
      Signed-off-by: NHui Zhu <teawaterz@linux.alibaba.com>
      Signed-off-by: N"Huang, Ying" <ying.huang@intel.com>
      Reviewed-by: NAndrea Parri <andrea.parri@amarulasolutions.com>
      Not-nacked-by: NHugh Dickins <hughd@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com>
      73c29467
    • W
      vmscan: return NODE_RECLAIM_NOSCAN in node_reclaim() when CONFIG_NUMA is n · e431b612
      Wei Yang 提交于
      commit 8b09549c2bfd9f3f8f4cdad74107ef4f4ff9cdd7 upstream.
      
      Commit fa5e084e ("vmscan: do not unconditionally treat zones that
      fail zone_reclaim() as full") changed the return value of
      node_reclaim().  The original return value 0 means NODE_RECLAIM_SOME
      after this commit.
      
      While the return value of node_reclaim() when CONFIG_NUMA is n is not
      changed.  This will leads to call zone_watermark_ok() again.
      
      This patch fixes the return value by adjusting to NODE_RECLAIM_NOSCAN.
      Since node_reclaim() is only called in page_alloc.c, move it to
      mm/internal.h.
      
      Link: http://lkml.kernel.org/r/20181113080436.22078-1-richard.weiyang@gmail.comSigned-off-by: NHui Zhu <teawaterz@linux.alibaba.com>
      Signed-off-by: NWei Yang <richard.weiyang@gmail.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: NMatthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>
      Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com>
      e431b612
  3. 07 11月, 2019 6 次提交
  4. 30 10月, 2019 5 次提交