1. 01 7月, 2020 1 次提交
  2. 10 6月, 2020 2 次提交
    • M
      mmap locking API: use coccinelle to convert mmap_sem rwsem call sites · d8ed45c5
      Michel Lespinasse 提交于
      This change converts the existing mmap_sem rwsem calls to use the new mmap
      locking API instead.
      
      The change is generated using coccinelle with the following rule:
      
      // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .
      
      @@
      expression mm;
      @@
      (
      -init_rwsem
      +mmap_init_lock
      |
      -down_write
      +mmap_write_lock
      |
      -down_write_killable
      +mmap_write_lock_killable
      |
      -down_write_trylock
      +mmap_write_trylock
      |
      -up_write
      +mmap_write_unlock
      |
      -downgrade_write
      +mmap_write_downgrade
      |
      -down_read
      +mmap_read_lock
      |
      -down_read_killable
      +mmap_read_lock_killable
      |
      -down_read_trylock
      +mmap_read_trylock
      |
      -up_read
      +mmap_read_unlock
      )
      -(&mm->mmap_sem)
      +(mm)
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com>
      Reviewed-by: NLaurent Dufour <ldufour@linux.ibm.com>
      Reviewed-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Liam Howlett <Liam.Howlett@oracle.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ying Han <yinghan@google.com>
      Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d8ed45c5
    • M
      mm: don't include asm/pgtable.h if linux/mm.h is already included · e31cf2f4
      Mike Rapoport 提交于
      Patch series "mm: consolidate definitions of page table accessors", v2.
      
      The low level page table accessors (pXY_index(), pXY_offset()) are
      duplicated across all architectures and sometimes more than once.  For
      instance, we have 31 definition of pgd_offset() for 25 supported
      architectures.
      
      Most of these definitions are actually identical and typically it boils
      down to, e.g.
      
      static inline unsigned long pmd_index(unsigned long address)
      {
              return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
      }
      
      static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
      {
              return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
      }
      
      These definitions can be shared among 90% of the arches provided
      XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.
      
      For architectures that really need a custom version there is always
      possibility to override the generic version with the usual ifdefs magic.
      
      These patches introduce include/linux/pgtable.h that replaces
      include/asm-generic/pgtable.h and add the definitions of the page table
      accessors to the new header.
      
      This patch (of 12):
      
      The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
      functions involving page table manipulations, e.g.  pte_alloc() and
      pmd_alloc().  So, there is no point to explicitly include <asm/pgtable.h>
      in the files that include <linux/mm.h>.
      
      The include statements in such cases are remove with a simple loop:
      
      	for f in $(git grep -l "include <linux/mm.h>") ; do
      		sed -i -e '/include <asm\/pgtable.h>/ d' $f
      	done
      Signed-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
      Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e31cf2f4
  3. 04 6月, 2020 5 次提交
  4. 03 6月, 2020 15 次提交
  5. 08 4月, 2020 1 次提交
    • A
      proc: faster open/read/close with "permanent" files · d919b33d
      Alexey Dobriyan 提交于
      Now that "struct proc_ops" exist we can start putting there stuff which
      could not fly with VFS "struct file_operations"...
      
      Most of fs/proc/inode.c file is dedicated to make open/read/.../close
      reliable in the event of disappearing /proc entries which usually happens
      if module is getting removed.  Files like /proc/cpuinfo which never
      disappear simply do not need such protection.
      
      Save 2 atomic ops, 1 allocation, 1 free per open/read/close sequence for such
      "permanent" files.
      
      Enable "permanent" flag for
      
      	/proc/cpuinfo
      	/proc/kmsg
      	/proc/modules
      	/proc/slabinfo
      	/proc/stat
      	/proc/sysvipc/*
      	/proc/swaps
      
      More will come once I figure out foolproof way to prevent out module
      authors from marking their stuff "permanent" for performance reasons
      when it is not.
      
      This should help with scalability: benchmark is "read /proc/cpuinfo R times
      by N threads scattered over the system".
      
      	N	R	t, s (before)	t, s (after)
      	-----------------------------------------------------
      	64	4096	1.582458	1.530502	-3.2%
      	256	4096	6.371926	6.125168	-3.9%
      	1024	4096	25.64888	24.47528	-4.6%
      
      Benchmark source:
      
      #include <chrono>
      #include <iostream>
      #include <thread>
      #include <vector>
      
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <unistd.h>
      
      const int NR_CPUS = sysconf(_SC_NPROCESSORS_ONLN);
      int N;
      const char *filename;
      int R;
      
      int xxx = 0;
      
      int glue(int n)
      {
      	cpu_set_t m;
      	CPU_ZERO(&m);
      	CPU_SET(n, &m);
      	return sched_setaffinity(0, sizeof(cpu_set_t), &m);
      }
      
      void f(int n)
      {
      	glue(n % NR_CPUS);
      
      	while (*(volatile int *)&xxx == 0) {
      	}
      
      	for (int i = 0; i < R; i++) {
      		int fd = open(filename, O_RDONLY);
      		char buf[4096];
      		ssize_t rv = read(fd, buf, sizeof(buf));
      		asm volatile ("" :: "g" (rv));
      		close(fd);
      	}
      }
      
      int main(int argc, char *argv[])
      {
      	if (argc < 4) {
      		std::cerr << "usage: " << argv[0] << ' ' << "N /proc/filename R
      ";
      		return 1;
      	}
      
      	N = atoi(argv[1]);
      	filename = argv[2];
      	R = atoi(argv[3]);
      
      	for (int i = 0; i < NR_CPUS; i++) {
      		if (glue(i) == 0)
      			break;
      	}
      
      	std::vector<std::thread> T;
      	T.reserve(N);
      	for (int i = 0; i < N; i++) {
      		T.emplace_back(f, i);
      	}
      
      	auto t0 = std::chrono::system_clock::now();
      	{
      		*(volatile int *)&xxx = 1;
      		for (auto& t: T) {
      			t.join();
      		}
      	}
      	auto t1 = std::chrono::system_clock::now();
      	std::chrono::duration<double> dt = t1 - t0;
      	std::cout << dt.count() << '
      ';
      
      	return 0;
      }
      
      P.S.:
      Explicit randomization marker is added because adding non-function pointer
      will silently disable structure layout randomization.
      
      [akpm@linux-foundation.org: coding style fixes]
      Reported-by: Nkbuild test robot <lkp@intel.com>
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Joe Perches <joe@perches.com>
      Link: http://lkml.kernel.org/r/20200222201539.GA22576@avx2Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d919b33d
  6. 03 4月, 2020 2 次提交
  7. 30 3月, 2020 1 次提交
    • N
      mm/swapfile.c: move inode_lock out of claim_swapfile · d795a90e
      Naohiro Aota 提交于
      claim_swapfile() currently keeps the inode locked when it is successful,
      or the file is already swapfile (with -EBUSY).  And, on the other error
      cases, it does not lock the inode.
      
      This inconsistency of the lock state and return value is quite confusing
      and actually causing a bad unlock balance as below in the "bad_swap"
      section of __do_sys_swapon().
      
      This commit fixes this issue by moving the inode_lock() and IS_SWAPFILE
      check out of claim_swapfile().  The inode is unlocked in
      "bad_swap_unlock_inode" section, so that the inode is ensured to be
      unlocked at "bad_swap".  Thus, error handling codes after the locking now
      jumps to "bad_swap_unlock_inode" instead of "bad_swap".
      
          =====================================
          WARNING: bad unlock balance detected!
          5.5.0-rc7+ #176 Not tainted
          -------------------------------------
          swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at: __do_sys_swapon+0x94b/0x3550
          but there are no more locks to release!
      
          other info that might help us debug this:
          no locks held by swapon/4294.
      
          stack backtrace:
          CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176
          Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014
          Call Trace:
           dump_stack+0xa1/0xea
           print_unlock_imbalance_bug.cold+0x114/0x123
           lock_release+0x562/0xed0
           up_write+0x2d/0x490
           __do_sys_swapon+0x94b/0x3550
           __x64_sys_swapon+0x54/0x80
           do_syscall_64+0xa4/0x4b0
           entry_SYSCALL_64_after_hwframe+0x49/0xbe
          RIP: 0033:0x7f15da0a0dc7
      
      Fixes: 1638045c ("mm: set S_SWAPFILE on blockdev swap devices")
      Signed-off-by: NNaohiro Aota <naohiro.aota@wdc.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Tested-by: NQais Youef <qais.yousef@arm.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200206090132.154869-1-naohiro.aota@wdc.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d795a90e
  8. 22 2月, 2020 1 次提交
  9. 04 2月, 2020 1 次提交
  10. 01 2月, 2020 1 次提交
  11. 01 12月, 2019 1 次提交
  12. 20 8月, 2019 2 次提交
  13. 13 7月, 2019 2 次提交
    • A
      mm, swap: use rbtree for swap_extent · 4efaceb1
      Aaron Lu 提交于
      swap_extent is used to map swap page offset to backing device's block
      offset.  For a continuous block range, one swap_extent is used and all
      these swap_extents are managed in a linked list.
      
      These swap_extents are used by map_swap_entry() during swap's read and
      write path.  To find out the backing device's block offset for a page
      offset, the swap_extent list will be traversed linearly, with
      curr_swap_extent being used as a cache to speed up the search.
      
      This works well as long as swap_extents are not huge or when the number
      of processes that access swap device are few, but when the swap device
      has many extents and there are a number of processes accessing the swap
      device concurrently, it can be a problem.  On one of our servers, the
      disk's remaining size is tight:
      
        $df -h
        Filesystem      Size  Used Avail Use% Mounted on
        ... ...
        /dev/nvme0n1p1  1.8T  1.3T  504G  72% /home/t4
      
      When creating a 80G swapfile there, there are as many as 84656 swap
      extents.  The end result is, kernel spends abou 30% time in
      map_swap_entry() and swap throughput is only 70MB/s.
      
      As a comparison, when I used smaller sized swapfile, like 4G whose
      swap_extent dropped to 2000, swap throughput is back to 400-500MB/s and
      map_swap_entry() is about 3%.
      
      One downside of using rbtree for swap_extent is, 'struct rbtree' takes
      24 bytes while 'struct list_head' takes 16 bytes, that's 8 bytes more
      for each swap_extent.  For a swapfile that has 80k swap_extents, that
      means 625KiB more memory consumed.
      
      Test:
      
      Since it's not possible to reboot that server, I can not test this patch
      diretly there.  Instead, I tested it on another server with NVMe disk.
      
      I created a 20G swapfile on an NVMe backed XFS fs.  By default, the
      filesystem is quite clean and the created swapfile has only 2 extents.
      Testing vanilla and this patch shows no obvious performance difference
      when swapfile is not fragmented.
      
      To see the patch's effects, I used some tweaks to manually fragment the
      swapfile by breaking the extent at 1M boundary.  This made the swapfile
      have 20K extents.
      
        nr_task=4
        kernel   swapout(KB/s) map_swap_entry(perf)  swapin(KB/s) map_swap_entry(perf)
        vanilla  165191           90.77%             171798          90.21%
        patched  858993 +420%      2.16%             715827 +317%     0.77%
      
        nr_task=8
        kernel   swapout(KB/s) map_swap_entry(perf)  swapin(KB/s) map_swap_entry(perf)
        vanilla  306783           92.19%             318145          87.76%
        patched  954437 +211%      2.35%            1073741 +237%     1.57%
      
      swapout: the throughput of swap out, in KB/s, higher is better 1st
      map_swap_entry: cpu cycles percent sampled by perf swapin: the
      throughput of swap in, in KB/s, higher is better.  2nd map_swap_entry:
      cpu cycles percent sampled by perf
      
      nr_task=1 doesn't show any difference, this is due to the curr_swap_extent
      can be effectively used to cache the correct swap extent for single task
      workload.
      
      [akpm@linux-foundation.org: s/BUG_ON(1)/BUG()/]
      Link: http://lkml.kernel.org/r/20190523142404.GA181@aaronluSigned-off-by: NAaron Lu <ziqian.lzq@antfin.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4efaceb1
    • H
      mm, swap: fix race between swapoff and some swap operations · eb085574
      Huang Ying 提交于
      When swapin is performed, after getting the swap entry information from
      the page table, system will swap in the swap entry, without any lock held
      to prevent the swap device from being swapoff.  This may cause the race
      like below,
      
      CPU 1				CPU 2
      -----				-----
      				do_swap_page
      				  swapin_readahead
      				    __read_swap_cache_async
      swapoff				      swapcache_prepare
        p->swap_map = NULL		        __swap_duplicate
      					  p->swap_map[?] /* !!! NULL pointer access */
      
      Because swapoff is usually done when system shutdown only, the race may
      not hit many people in practice.  But it is still a race need to be fixed.
      
      To fix the race, get_swap_device() is added to check whether the specified
      swap entry is valid in its swap device.  If so, it will keep the swap
      entry valid via preventing the swap device from being swapoff, until
      put_swap_device() is called.
      
      Because swapoff() is very rare code path, to make the normal path runs as
      fast as possible, rcu_read_lock/unlock() and synchronize_rcu() instead of
      reference count is used to implement get/put_swap_device().  >From
      get_swap_device() to put_swap_device(), RCU reader side is locked, so
      synchronize_rcu() in swapoff() will wait until put_swap_device() is
      called.
      
      In addition to swap_map, cluster_info, etc.  data structure in the struct
      swap_info_struct, the swap cache radix tree will be freed after swapoff,
      so this patch fixes the race between swap cache looking up and swapoff
      too.
      
      Races between some other swap cache usages and swapoff are fixed too via
      calling synchronize_rcu() between clearing PageSwapCache() and freeing
      swap cache data structure.
      
      Another possible method to fix this is to use preempt_off() +
      stop_machine() to prevent the swap device from being swapoff when its data
      structure is being accessed.  The overhead in hot-path of both methods is
      similar.  The advantages of RCU based method are,
      
      1. stop_machine() may disturb the normal execution code path on other
         CPUs.
      
      2. File cache uses RCU to protect its radix tree.  If the similar
         mechanism is used for swap cache too, it is easier to share code
         between them.
      
      3. RCU is used to protect swap cache in total_swapcache_pages() and
         exit_swap_address_space() already.  The two mechanisms can be
         merged to simplify the logic.
      
      Link: http://lkml.kernel.org/r/20190522015423.14418-1-ying.huang@intel.com
      Fixes: 235b6217 ("mm/swap: add cluster lock")
      Signed-off-by: N"Huang, Ying" <ying.huang@intel.com>
      Reviewed-by: NAndrea Parri <andrea.parri@amarulasolutions.com>
      Not-nacked-by: NHugh Dickins <hughd@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eb085574
  14. 21 5月, 2019 1 次提交
  15. 20 4月, 2019 3 次提交
    • H
      mm: swapoff: shmem_unuse() stop eviction without igrab() · af53d3e9
      Hugh Dickins 提交于
      The igrab() in shmem_unuse() looks good, but we forgot that it gives no
      protection against concurrent unmounting: a point made by Konstantin
      Khlebnikov eight years ago, and then fixed in 2.6.39 by 778dd893
      ("tmpfs: fix race between umount and swapoff").  The current 5.1-rc
      swapoff is liable to hit "VFS: Busy inodes after unmount of tmpfs.
      Self-destruct in 5 seconds.  Have a nice day..." followed by GPF.
      
      Once again, give up on using igrab(); but don't go back to making such
      heavy-handed use of shmem_swaplist_mutex as last time: that would spoil
      the new design, and I expect could deadlock inside shmem_swapin_page().
      
      Instead, shmem_unuse() just raise a "stop_eviction" count in the shmem-
      specific inode, and shmem_evict_inode() wait for that to go down to 0.
      Call it "stop_eviction" rather than "swapoff_busy" because it can be put
      to use for others later (huge tmpfs patches expect to use it).
      
      That simplifies shmem_unuse(), protecting it from both unlink and
      unmount; and in practice lets it locate all the swap in its first try.
      But do not rely on that: there's still a theoretical case, when
      shmem_writepage() might have been preempted after its get_swap_page(),
      before making the swap entry visible to swapoff.
      
      [hughd@google.com: remove incorrect list_del()]
        Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1904091133570.1898@eggly.anvils
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1904081259400.1523@eggly.anvils
      Fixes: b56a2d8a ("mm: rid swapoff of quadratic complexity")
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: "Alex Xu (Hello71)" <alex_y_xu@yahoo.ca>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Kelley Nielsen <kelleynnn@gmail.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Vineeth Pillai <vpillai@digitalocean.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      af53d3e9
    • H
      mm: swapoff: take notice of completion sooner · 64165b1a
      Hugh Dickins 提交于
      The old try_to_unuse() implementation was driven by find_next_to_unuse(),
      which terminated as soon as all the swap had been freed.
      
      Add inuse_pages checks now (alongside signal_pending()) to stop scanning
      mms and swap_map once finished.
      
      The same ought to be done in shmem_unuse() too, but never was before,
      and needs a different interface: so leave it as is for now.
      
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1904081258200.1523@eggly.anvils
      Fixes: b56a2d8a ("mm: rid swapoff of quadratic complexity")
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: "Alex Xu (Hello71)" <alex_y_xu@yahoo.ca>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Kelley Nielsen <kelleynnn@gmail.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Vineeth Pillai <vpillai@digitalocean.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      64165b1a
    • H
      mm: swapoff: remove too limiting SWAP_UNUSE_MAX_TRIES · dd862deb
      Hugh Dickins 提交于
      SWAP_UNUSE_MAX_TRIES 3 appeared to work well in earlier testing, but
      further testing has proved it to be a source of unnecessary swapoff
      EBUSY failures (which can then be followed by unmount EBUSY failures).
      
      When mmget_not_zero() or shmem's igrab() fails, there is an mm exiting
      or inode being evicted, freeing up swap independent of try_to_unuse().
      Those typically completed much sooner than the old quadratic swapoff,
      but now it's more common that swapoff may need to wait for them.
      
      It's possible to move those cases from init_mm.mmlist and shmem_swaplist
      to separate "exiting" swaplists, and try_to_unuse() then wait for those
      lists to be emptied; but we've not bothered with that in the past, and
      don't want to risk missing some other forgotten case.  So just revert to
      cycling around until the swap is gone, without any retries limit.
      
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1904081256170.1523@eggly.anvils
      Fixes: b56a2d8a ("mm: rid swapoff of quadratic complexity")
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: "Alex Xu (Hello71)" <alex_y_xu@yahoo.ca>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Kelley Nielsen <kelleynnn@gmail.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Vineeth Pillai <vpillai@digitalocean.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dd862deb
  16. 06 3月, 2019 1 次提交