1. 18 Aug 2015, 2 commits
  2. 15 Aug 2015, 4 commits
    • change sb_writers to use percpu_rw_semaphore · 8129ed29
      Committed by Oleg Nesterov
      We can remove everything from struct sb_writers except frozen
      and add the array of percpu_rw_semaphore's instead.
      
      This patch doesn't remove sb_writers->wait_unfrozen yet; we keep
      it for get_super_thawed(). We will probably remove it later.
      
      This change tries to address the following problems:
      
      	- Firstly, __sb_start_write() looks simply buggy. It does
      	  __sb_end_write() if it sees ->frozen, but if it migrates
      	  to another CPU before percpu_counter_dec(), sb_wait_write()
      	  can wrongly succeed if there is another task which holds
      	  the same "semaphore": sb_wait_write() can miss the result
      	  of the previous percpu_counter_inc() but see the result
      	  of this percpu_counter_dec().
      
      	- As Dave Hansen reports, it is suboptimal. The trivial
      	  microbenchmark that writes to a tmpfs file in a loop runs
      	  12% faster if we change this code to rely on RCU and kill
      	  the memory barriers.
      
      	- This code doesn't look simple. It would be better to rely
      	  on the generic locking code.
      
      	  According to Dave, this change adds the same performance
      	  improvement.
      
      Note: with this change both freeze_super() and thaw_super() will do
      synchronize_sched_expedited() 3 times. This is just ugly. But:
      
      	- This will be "fixed" by the rcu_sync changes we are going
      	  to merge. After that freeze_super()->percpu_down_write()
      	  will use synchronize_sched(), and thaw_super() won't use
      	  synchronize() at all.
      
      	  This doesn't need any changes in fs/super.c.
      
      	- Once we merge rcu_sync changes, we can also change super.c
      	  so that all sb_writers->rw_sem's will share the single ->rss
      	  in struct sb_writers, then freeze_super() will need only one
      	  synchronize_sched().
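
      A minimal sketch of where this leaves struct sb_writers, assuming the
      existing SB_FREEZE_* level constants (the rw_sem field name is an
      illustration; only frozen and wait_unfrozen are named above):

      	struct sb_writers {
      		int				frozen;		/* Is sb frozen? */
      		wait_queue_head_t		wait_unfrozen;	/* kept for get_super_thawed() */
      		struct percpu_rw_semaphore	rw_sem[SB_FREEZE_LEVELS];
      	};

      	/* sb_start_write()/sb_end_write() then reduce, in essence, to: */
      	percpu_down_read(sb->s_writers.rw_sem + SB_FREEZE_WRITE - 1);
      	...
      	percpu_up_read(sb->s_writers.rw_sem + SB_FREEZE_WRITE - 1);
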
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.com>
      8129ed29
    • shift percpu_counter_destroy() into destroy_super_work() · 853b39a7
      Committed by Oleg Nesterov
      Of course, this patch is ugly as hell. It will be (partially)
      reverted later. We add it to ensure that other WIP changes in
      percpu_rw_semaphore won't break fs/super.c.
      
      We do not even need this change right now; percpu_free_rwsem()
      is fine in atomic context. But we are going to change this: it
      will might_sleep() after we merge the rcu_sync patches.
      
      And even after that we do not really need destroy_super_work();
      we will kill it in any case. Instead, destroy_super_rcu() should
      just check that rss->cb_state == CB_IDLE and do call_rcu() again
      in the (very unlikely) case this is not true.
      
      So this is just the temporary kludge which helps us to avoid the
      conflicts with the changes which will be (hopefully) routed via
      rcu tree.
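
      A rough sketch of the deferral described above; the destroy_work field and
      the exact split between the RCU callback and the work item are assumptions
      based on this description, not the final fs/super.c code:

      	static void destroy_super_work(struct work_struct *work)
      	{
      		struct super_block *s = container_of(work, struct super_block,
      						     destroy_work);
      		int i;

      		/* process context: a sleeping percpu_free_rwsem() is fine here */
      		for (i = 0; i < SB_FREEZE_LEVELS; i++)
      			percpu_free_rwsem(&s->s_writers.rw_sem[i]);
      		kfree(s);
      	}

      	static void destroy_super_rcu(struct rcu_head *head)
      	{
      		struct super_block *s = container_of(head, struct super_block, rcu);

      		INIT_WORK(&s->destroy_work, destroy_super_work);
      		schedule_work(&s->destroy_work);
      	}
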
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.com>
      853b39a7
    • document rwsem_release() in sb_wait_write() · 0e28e01f
      Committed by Oleg Nesterov
      Not only do we need to avoid the warning from lockdep_sys_exit(); the
      caller of freeze_super() may never release this lock. Another thread
      can do it instead, so there is another reason for rwsem_release().
      
      Plus the comment should explain why we have to fool lockdep.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.com>
      0e28e01f
    • fix the broken lockdep logic in __sb_start_write() · f4b554af
      Committed by Oleg Nesterov
      1. wait_event(frozen < level) without rwsem_acquire_read() is just
         wrong from a lockdep perspective. If we are going to deadlock
         because the caller is buggy, lockdep can't detect this problem.
      
      2. __sb_start_write() can race with thaw_super() + freeze_super(),
         and after "goto retry" the 2nd acquire_freeze_lock() is wrong.
      
      3. The "tell lockdep we are doing trylock" hack doesn't look nice.
      
         I think this is correct, but this logic should be more explicit.
         Yes, the recursive read_lock() is fine if we hold the lock on a
         higher level. But we do not need to fool lockdep. If we cannot
         deadlock in this case then try-lock must not fail and we can
         use wait == F throughout this code.
      
      Note: as Dave Chinner explains, the "trylock" hack and the fat comment
      can be probably removed. But this needs a separate change and it will
      be trivial: just kill __sb_start_write() and rename do_sb_start_write()
      back to __sb_start_write().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.com>
      f4b554af
  3. 01 Jul 2015, 1 commit
  4. 15 Apr 2015, 1 commit
    • cleancache: remove limit on the number of cleancache enabled filesystems · 3cb29d11
      Committed by Vladimir Davydov
      The limit equals 32 and is imposed by the number of entries in the
      fs_poolid_map and shared_fs_poolid_map.  Nowadays it is insufficient,
      because with containers on board a Linux host can have hundreds of
      active fs mounts.
      
      These maps were introduced by commit 49a9ab81 ("mm: cleancache:
      lazy initialization to allow tmem backends to build/run as modules") in
      order to allow compiling cleancache drivers as modules.  Real pool ids
      are stored in these maps while super_block->cleancache_poolid points to
      an entry in the map, so that on cleancache registration we can walk over
      all (if there are <= 32 of them, of course) cleancache-enabled super
      blocks and assign real pool ids.
      
      Actually, there is absolutely no need for these maps, because we can
      iterate over all super blocks immediately using iterate_supers.  This is
      not racy, because cleancache_init_ops is called from mount_fs with
      super_block->s_umount held for writing, while iterate_supers takes this
      semaphore for reading, so if we call iterate_supers after setting
      cleancache_ops, all super blocks that had been created before
      cleancache_register_ops was called will be assigned pool ids by the
      action function of iterate_supers while all newer super blocks will
      receive it in cleancache_init_fs.
      
      This patch therefore removes the maps and hence the artificial limit on
      the number of cleancache enabled filesystems.
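
      The approach reads roughly like the sketch below; the callback name and the
      poolid marker are illustrative assumptions, only iterate_supers() and the
      locking argument come from the description above:

      	/* called from cleancache_register_ops() once real ops are installed */
      	static void cleancache_register_ops_sb(struct super_block *sb, void *unused)
      	{
      		/* super blocks mounted before a backend existed still carry a
      		 * "no backend yet" marker; give them a real pool id now */
      		if (sb->cleancache_poolid == CLEANCACHE_NO_BACKEND)
      			__cleancache_init_fs(sb);
      	}

      	/* mount_fs() holds sb->s_umount for writing around cleancache_init_fs(),
      	 * and iterate_supers() takes s_umount for reading, so this walk cannot
      	 * race with a new mount: */
      	iterate_supers(cleancache_register_ops_sb, NULL);
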
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Stefan Hengelein <ilendir@googlemail.com>
      Cc: Florian Schmaus <fschmaus@gmail.com>
      Cc: Andor Daam <andor.daam@googlemail.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Bob Liu <lliubbo@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3cb29d11
  5. 23 Feb 2015, 1 commit
    • trylock_super(): replacement for grab_super_passive() · eb6ef3df
      Committed by Konstantin Khlebnikov
      I've noticed significant locking contention in the memory reclaimer around
      sb_lock inside grab_super_passive(). grab_super_passive() is called from
      two places: from the icache/dcache shrinkers (super_cache_scan()) and
      from writeback (__writeback_inodes_wb()). Both are required for the
      memory allocator to make progress.
      
      Grab_super_passive() acquires sb_lock to increment sb->s_count and check
      sb->s_instances. It seems sb->s_umount locked for read is enough here:
      super-block deactivation always runs under sb->s_umount locked for write.
      Protecting the super-block itself isn't a problem: in super_cache_scan() the
      sb is protected by shrinker_rwsem, so it cannot be freed while its slab
      shrinkers are still active. In writeback, the super-block comes from an
      inode on the bdi writeback list, taken under wb->list_lock.
      
      This patch removes the sb_lock locking and checks s_instances under
      s_umount: generic_shutdown_super() unlinks it under sb->s_umount locked
      for write. The new variant is called trylock_super(), and since it only
      takes the semaphore, callers must call up_read(&sb->s_umount) instead of
      drop_super(sb) when they're done.
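
      A sketch of trylock_super() consistent with that description (the extra
      s_root and MS_BORN checks are assumptions about the surrounding code of
      that era, not quoted from the patch):

      	static bool trylock_super(struct super_block *sb)
      	{
      		if (down_read_trylock(&sb->s_umount)) {
      			if (!hlist_unhashed(&sb->s_instances) &&
      			    sb->s_root && (sb->s_flags & MS_BORN))
      				return true;
      			up_read(&sb->s_umount);
      		}
      		return false;
      	}

      A caller that gets true back holds only s_umount for read, which is why it
      must drop it with up_read(&sb->s_umount) rather than drop_super(sb):
      sb->s_count was never elevated.
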
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      eb6ef3df
  6. 13 Feb 2015, 5 commits
    • fs: shrinker: always scan at least one object of each type · 49e7e7ff
      Committed by Vladimir Davydov
      In super_cache_scan() we divide the number of objects of a particular type
      by the total number of objects in order to distribute pressure among them.
      As a result, in some corner cases we can get nr_to_scan=0 even if there are
      some objects to reclaim, e.g.  dentries=1, inodes=1, fs_objects=1,
      nr_to_scan=1/3=0.
      
      This is unacceptable for per-memcg kmem accounting, because it means
      that some objects may never get reclaimed after memcg death, preventing
      the memcg from being freed.
      
      This patch therefore assures that super_cache_scan() will scan at least
      one object of each type if any.
      
      [akpm@linux-foundation.org: add comment]
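
      Working the corner case through, with one possible way to enforce the
      minimum (a sketch, not the exact diff; the proportional split uses
      mult_frac() as in super_cache_scan()):

      	total_objects = dentries + inodes + fs_objects;			/* 1 + 1 + 1 = 3 */
      	dentries = mult_frac(sc->nr_to_scan, dentries, total_objects);	/* 1*1/3 = 0 */
      	inodes   = mult_frac(sc->nr_to_scan, inodes,   total_objects);	/* also 0 */

      	/* ensure forward progress for per-memcg kmem reclaim */
      	sc->nr_to_scan = dentries + 1;
      	freed = prune_dcache_sb(sb, sc);
      	sc->nr_to_scan = inodes + 1;
      	freed += prune_icache_sb(sb, sc);
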
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      49e7e7ff
    • fs: make shrinker memcg aware · 2acb60a0
      Committed by Vladimir Davydov
      Now, to make any list_lru-based shrinker memcg aware, all we need to do is
      initialize its list_lru as memcg aware.  Let's do it for the general FS
      shrinker (super_block::s_shrink).
      
      There are other FS-specific shrinkers that use list_lru for storing
      objects, such as XFS and GFS2 dquot cache shrinkers, but since they
      reclaim objects that are shared among different cgroups, there is no point
      making them memcg aware.  It's a big question whether we should account
      them to memcg at all.
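
      In alloc_super() terms the change amounts to roughly the following sketch
      (SHRINKER_MEMCG_AWARE and list_lru_init_memcg() are the memcg-aware pieces
      this series introduces; placement and error handling are abbreviated):

      	s->s_shrink.flags = SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE;

      	if (list_lru_init_memcg(&s->s_dentry_lru))
      		goto fail;
      	if (list_lru_init_memcg(&s->s_inode_lru))
      		goto fail;
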
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2acb60a0
    • list_lru: organize all list_lrus to list · c0a5b560
      Committed by Vladimir Davydov
      To make list_lru memcg aware, we need all list_lrus to be kept on a list
      protected by a mutex, so that we could sleep while walking over the
      list.
      
      Therefore, after this change list_lru_destroy may sleep.  Fortunately,
      there is only one user that calls it from an atomic context - it's
      put_super - and we can easily fix it by calling list_lru_destroy before
      put_super in destroy_locked_super - we no longer need the lrus by
      that time anyway.
      
      Another point that should be noted is that list_lru_destroy is allowed
      to be called on an uninitialized zeroed-out object, in which case it is
      a no-op.  Before this patch this was guaranteed by kfree, but now we
      need an explicit check there.
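
      The explicit check could look like this sketch (assuming ->node is the
      pointer that list_lru_init() allocates and that the global list is
      protected by a mutex as described; names are illustrative):

      	void list_lru_destroy(struct list_lru *lru)
      	{
      		/* an uninitialized, zeroed-out lru stays a no-op, as it
      		 * effectively was back when kfree(NULL) did the job */
      		if (!lru->node)
      			return;

      		mutex_lock(&list_lrus_mutex);
      		list_del(&lru->list);
      		mutex_unlock(&list_lrus_mutex);

      		kfree(lru->node);
      		lru->node = NULL;
      	}
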
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c0a5b560
    • fs: consolidate {nr,free}_cached_objects args in shrink_control · 4101b624
      Committed by Vladimir Davydov
      We are going to make FS shrinkers memcg-aware.  To achieve that, we will
      have to pass the memcg to scan to the nr_cached_objects and
      free_cached_objects VFS methods, which currently take only the NUMA node
      to scan.  Since the shrink_control structure already holds the node, and
      the memcg to scan will be added to it when we introduce memcg-aware
      vmscan, let us consolidate the methods' arguments in this structure to
      keep things clean.
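
      After the change the two super_operations hooks take the shrink_control
      directly; a sketch of the resulting prototypes and call sites (consistent
      with the description above, not quoted from the diff):

      	long (*nr_cached_objects)(struct super_block *, struct shrink_control *);
      	long (*free_cached_objects)(struct super_block *, struct shrink_control *);

      	/* in super_cache_count() / super_cache_scan(): */
      	total_objects = sb->s_op->nr_cached_objects(sb, sc);
      	freed = sb->s_op->free_cached_objects(sb, sc);
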
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Suggested-by: Dave Chinner <david@fromorbit.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4101b624
    • list_lru: introduce list_lru_shrink_{count,walk} · 503c358c
      Committed by Vladimir Davydov
      Kmem accounting of memcg is unusable now, because it lacks slab shrinker
      support.  That means when we hit the limit we will get ENOMEM w/o any
      chance to recover.  What we should do then is to call shrink_slab, which
      would reclaim old inode/dentry caches from this cgroup.  This is what
      this patch set is intended to do.
      
      Basically, it does two things.  First, it introduces the notion of
      per-memcg slab shrinker.  A shrinker that wants to reclaim objects per
      cgroup should mark itself as SHRINKER_MEMCG_AWARE.  Then it will be
      passed the memory cgroup to scan from in shrink_control->memcg.  For
      such shrinkers shrink_slab iterates over the whole cgroup subtree under
      the target cgroup and calls the shrinker for each kmem-active memory
      cgroup.
      
      Secondly, this patch set makes the list_lru structure per-memcg.  It's
      done transparently to list_lru users - all they have to do is
      tell list_lru_init that they want a memcg-aware list_lru.  Then the
      list_lru will automatically distribute objects among per-memcg lists
      based on which cgroup the object is accounted to.  This way, to make FS
      shrinkers (icache, dcache) memcg-aware we only need to make them use a
      memcg-aware list_lru, and this is what this patch set does.
      
      As before, this patch set only enables per-memcg kmem reclaim when the
      pressure goes from memory.limit, not from memory.kmem.limit.  Handling
      memory.kmem.limit is going to be tricky due to GFP_NOFS allocations, and
      it is still unclear whether we will have this knob in the unified
      hierarchy.
      
      This patch (of 9):
      
      NUMA aware slab shrinkers use the list_lru structure to distribute
      objects coming from different NUMA nodes to different lists.  Whenever
      such a shrinker needs to count or scan objects from a particular node,
      it issues commands like this:
      
              count = list_lru_count_node(lru, sc->nid);
              freed = list_lru_walk_node(lru, sc->nid, isolate_func,
                                         isolate_arg, &sc->nr_to_scan);
      
      where sc is an instance of the shrink_control structure passed to it
      from vmscan.
      
      To simplify this, let's add special list_lru functions to be used by
      shrinkers, list_lru_shrink_count() and list_lru_shrink_walk(), which
      consolidate the nid and nr_to_scan arguments in the shrink_control
      structure.
      
      This will also allow us to avoid patching shrinkers that use list_lru
      when we make shrink_slab() per-memcg - all we will have to do is extend
      the shrink_control structure to include the target memcg and make
      list_lru_shrink_{count,walk} handle this appropriately.
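
      With the new helpers the snippet above collapses to the following; both
      functions pull the nid (and, later, the memcg) out of the shrink_control
      themselves:

      	count = list_lru_shrink_count(lru, sc);
      	freed = list_lru_shrink_walk(lru, sc, isolate_func, isolate_arg);
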
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Suggested-by: Dave Chinner <david@fromorbit.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      503c358c
  7. 03 Feb 2015, 1 commit
  8. 26 Jan 2015, 1 commit
  9. 21 Jan 2015, 1 commit
  10. 10 Nov 2014, 2 commits
  11. 09 Oct 2014, 1 commit
  12. 08 Sep 2014, 1 commit
    • percpu_counter: add @gfp to percpu_counter_init() · 908c7f19
      Committed by Tejun Heo
      Percpu allocator now supports allocation mask.  Add @gfp to
      percpu_counter_init() so that !GFP_KERNEL allocation masks can be used
      with percpu_counters too.
      
      We could have left percpu_counter_init() alone and added
      percpu_counter_init_gfp(); however, the number of users isn't that
      high and introducing _gfp variants to all percpu data structures would
      be quite ugly, so let's just do the conversion.  This is the one with
      the most users.  Other percpu data structures are a lot easier to
      convert.
      
      This patch doesn't make any functional difference.
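
      A before/after view of a typical call site (the counter shown is
      illustrative):

      	/* before */
      	err = percpu_counter_init(&sbi->s_freeinodes_counter, 0);
      	/* after: the caller states the allocation context explicitly */
      	err = percpu_counter_init(&sbi->s_freeinodes_counter, 0, GFP_KERNEL);
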
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Jan Kara <jack@suse.cz>
      Acked-by: "David S. Miller" <davem@davemloft.net>
      Cc: x86@kernel.org
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      908c7f19
  13. 08 Aug 2014, 3 commits
  14. 16 Jul 2014, 1 commit
  15. 05 Jun 2014, 2 commits
    • fs/superblock: avoid locking counting inodes and dentries before reclaiming them · d23da150
      Committed by Tim Chen
      We remove the call to grab_super_passive in the call to super_cache_count.
      This becomes a scalability bottleneck when multiple threads are trying to
      do memory reclamation, e.g.  when we are doing a large amount of file reads
      and the page cache is under pressure.  The cached objects quickly get
      reclaimed down to 0 and we are aborting the cache_scan() reclaim.  But the
      counting creates a logjam acquiring the sb_lock.
      
      We are holding the shrinker_rwsem which ensures the safety of call to
      list_lru_count_node() and s_op->nr_cached_objects.  The shrinker is
      unregistered now before ->kill_sb() so the operation is safe when we are
      doing unmount.
      
      The impact will depend heavily on the machine and the workload but for a
      small machine using postmark tuned to use 4xRAM size the results were
      
                                        3.15.0-rc5            3.15.0-rc5
                                           vanilla         shrinker-v1r1
      Ops/sec Transactions         21.00 (  0.00%)       24.00 ( 14.29%)
      Ops/sec FilesCreate          39.00 (  0.00%)       44.00 ( 12.82%)
      Ops/sec CreateTransact       10.00 (  0.00%)       12.00 ( 20.00%)
      Ops/sec FilesDeleted       6202.00 (  0.00%)     6202.00 (  0.00%)
      Ops/sec DeleteTransact       11.00 (  0.00%)       12.00 (  9.09%)
      Ops/sec DataRead/MB          25.97 (  0.00%)       29.10 ( 12.05%)
      Ops/sec DataWrite/MB         49.99 (  0.00%)       56.02 ( 12.06%)
      
      ffsb running in a configuration that is meant to simulate a mail server showed
      
                                       3.15.0-rc5             3.15.0-rc5
                                          vanilla          shrinker-v1r1
      Ops/sec readall           9402.63 (  0.00%)      9567.97 (  1.76%)
      Ops/sec create            4695.45 (  0.00%)      4735.00 (  0.84%)
      Ops/sec delete             173.72 (  0.00%)       179.83 (  3.52%)
      Ops/sec Transactions     14271.80 (  0.00%)     14482.81 (  1.48%)
      Ops/sec Read                37.00 (  0.00%)        37.60 (  1.62%)
      Ops/sec Write               18.20 (  0.00%)        18.30 (  0.55%)
      Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Jan Kara <jack@suse.cz>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d23da150
    • fs/superblock: unregister sb shrinker before ->kill_sb() · 28f2cd4f
      Committed by Dave Chinner
      This series is aimed at regressions noticed during reclaim activity.  The
      first two patches are shrinker patches that were posted ages ago but never
      merged for reasons that are unclear to me.  I'm posting them again to see
      if there was a reason they were dropped or if they just got lost.  Dave?
      Tim?  The last patch adjusts proportional reclaim.  Yuanhan Liu, can you
      retest the vm scalability test cases on a larger machine?  Hugh, does this
      work for you on the memcg test cases?
      
      Based on ext4, I get the following results but unfortunately my larger
      test machines are all unavailable so this is based on a relatively small
      machine.
      
      postmark
                                        3.15.0-rc5            3.15.0-rc5
                                           vanilla       proportion-v1r4
      Ops/sec Transactions         21.00 (  0.00%)       25.00 ( 19.05%)
      Ops/sec FilesCreate          39.00 (  0.00%)       45.00 ( 15.38%)
      Ops/sec CreateTransact       10.00 (  0.00%)       12.00 ( 20.00%)
      Ops/sec FilesDeleted       6202.00 (  0.00%)     6202.00 (  0.00%)
      Ops/sec DeleteTransact       11.00 (  0.00%)       12.00 (  9.09%)
      Ops/sec DataRead/MB          25.97 (  0.00%)       30.02 ( 15.59%)
      Ops/sec DataWrite/MB         49.99 (  0.00%)       57.78 ( 15.58%)
      
      ffsb (mail server simulator)
                                       3.15.0-rc5             3.15.0-rc5
                                          vanilla        proportion-v1r4
      Ops/sec readall           9402.63 (  0.00%)      9805.74 (  4.29%)
      Ops/sec create            4695.45 (  0.00%)      4781.39 (  1.83%)
      Ops/sec delete             173.72 (  0.00%)       177.23 (  2.02%)
      Ops/sec Transactions     14271.80 (  0.00%)     14764.37 (  3.45%)
      Ops/sec Read                37.00 (  0.00%)        38.50 (  4.05%)
      Ops/sec Write               18.20 (  0.00%)        18.50 (  1.65%)
      
      dd of a large file
                                      3.15.0-rc5            3.15.0-rc5
                                         vanilla       proportion-v1r4
      WallTime DownloadTar       75.00 (  0.00%)       61.00 ( 18.67%)
      WallTime DD               423.00 (  0.00%)      401.00 (  5.20%)
      WallTime Delete             2.00 (  0.00%)        5.00 (-150.00%)
      
      stutter (times mmap latency during large amounts of IO)
      
                                  3.15.0-rc5            3.15.0-rc5
                                     vanilla       proportion-v1r4
      Unit >5ms Delays  80252.0000 (  0.00%)  81523.0000 ( -1.58%)
      Unit Mmap min         8.2118 (  0.00%)      8.3206 ( -1.33%)
      Unit Mmap mean       17.4614 (  0.00%)     17.2868 (  1.00%)
      Unit Mmap stddev     24.9059 (  0.00%)     34.6771 (-39.23%)
      Unit Mmap max      2811.6433 (  0.00%)   2645.1398 (  5.92%)
      Unit Mmap 90%        20.5098 (  0.00%)     18.3105 ( 10.72%)
      Unit Mmap 93%        22.9180 (  0.00%)     20.1751 ( 11.97%)
      Unit Mmap 95%        25.2114 (  0.00%)     22.4988 ( 10.76%)
      Unit Mmap 99%        46.1430 (  0.00%)     43.5952 (  5.52%)
      Unit Ideal  Tput     85.2623 (  0.00%)     78.8906 (  7.47%)
      Unit Tput min        44.0666 (  0.00%)     43.9609 (  0.24%)
      Unit Tput mean       45.5646 (  0.00%)     45.2009 (  0.80%)
      Unit Tput stddev      0.9318 (  0.00%)      1.1084 (-18.95%)
      Unit Tput max        46.7375 (  0.00%)     46.7539 ( -0.04%)
      
      This patch (of 3):
      
      We would like to unregister the sb shrinker before ->kill_sb().  This will
      allow cached objects to be counted without a call to grab_super_passive()
      to update the ref count on the sb.  We want to avoid locking during memory
      reclamation, especially when we are skipping the memory reclaim because we
      are out of cached objects.
      
      This is safe because grab_super_passive does a try-lock on
      sb->s_umount now, and so if we are in the unmount process, it won't ever
      block.  That means the deadlock and races we used to avoid by using
      grab_super_passive() now look like this:
      
              shrinker                        umount
      
              down_read(shrinker_rwsem)
                                              down_write(sb->s_umount)
                                              shrinker_unregister
                                                down_write(shrinker_rwsem)
                                                  <blocks>
              grab_super_passive(sb)
                down_read_trylock(sb->s_umount)
                  <fails>
              <shrinker aborts>
              ....
              <shrinkers finish running>
              up_read(shrinker_rwsem)
                                                <unblocks>
                                                <removes shrinker>
                                                up_write(shrinker_rwsem)
                                              ->kill_sb()
                                              ....
      
      So it is safe to deregister the shrinker before ->kill_sb().
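
      In deactivate_locked_super() the new ordering is simply the following
      (a sketch; neighbouring lines elided):

      	if (atomic_dec_and_test(&s->s_active)) {
      		unregister_shrinker(&s->s_shrink);	/* now before ->kill_sb() */
      		fs->kill_sb(s);
      		...
      		put_filesystem(fs);
      		put_super(s);
      	}
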
      Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Jan Kara <jack@suse.cz>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      28f2cd4f
  16. 17 Apr 2014, 1 commit
  17. 13 Mar 2014, 1 commit
    • fs: push sync_filesystem() down to the file system's remount_fs() · 02b9984d
      Committed by Theodore Ts'o
      Previously, the no-op "mount -o remount /dev/xxx" operation when the
      file system is already mounted read-write caused an implied,
      unconditional syncfs().  This seems pretty stupid, and it's certainly not
      documented or guaranteed to do this, nor is it particularly useful,
      except in the case where the file system was mounted rw and is getting
      remounted read-only.
      
      However, it's possible that there might be some file systems that are
      actually depending on this behavior.  In most file systems, it's
      probably fine to only call sync_filesystem() when transitioning from
      read-write to read-only, and there are some file systems where this is
      not needed at all (for example, for a pseudo-filesystem or something
      like romfs).
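
      For a file system that does want the old behaviour, the pushed-down call is
      a one-liner at the top of its remount handler; a sketch using a hypothetical
      examplefs:

      	static int examplefs_remount(struct super_block *sb, int *flags, char *data)
      	{
      		sync_filesystem(sb);	/* previously implied by the VFS on every remount */
      		/* ... parse options, handle the rw -> ro transition ... */
      		return 0;
      	}
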
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Artem Bityutskiy <dedekind1@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Evgeniy Dushistov <dushistov@mail.ru>
      Cc: Jan Kara <jack@suse.cz>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Anders Larsen <al@alarsen.net>
      Cc: Phillip Lougher <phillip@squashfs.org.uk>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Cc: Petr Vandrovec <petr@vandrovec.name>
      Cc: xfs@oss.sgi.com
      Cc: linux-btrfs@vger.kernel.org
      Cc: linux-cifs@vger.kernel.org
      Cc: samba-technical@lists.samba.org
      Cc: codalist@coda.cs.cmu.edu
      Cc: linux-ext4@vger.kernel.org
      Cc: linux-f2fs-devel@lists.sourceforge.net
      Cc: fuse-devel@lists.sourceforge.net
      Cc: cluster-devel@redhat.com
      Cc: linux-mtd@lists.infradead.org
      Cc: jfs-discussion@lists.sourceforge.net
      Cc: linux-nfs@vger.kernel.org
      Cc: linux-nilfs@vger.kernel.org
      Cc: linux-ntfs-dev@lists.sourceforge.net
      Cc: ocfs2-devel@oss.oracle.com
      Cc: reiserfs-devel@vger.kernel.org
      02b9984d
  18. 01 Feb 2014, 1 commit
  19. 22 Jan 2014, 1 commit
  20. 09 Nov 2013, 1 commit
  21. 25 Oct 2013, 2 commits
  22. 02 Oct 2013, 1 commit
    • fs/super.c: fix lru_list leak for real · c2d22ecd
      Committed by Al Viro
      Freeing ->s_{inode,dentry}_lru in deactivate_locked_super() is wrong;
      the right place is destroy_super().  As it is, we leak them if sget()
      decides that new superblock it has allocated (and never shown to
      anybody) isn't needed and should be freed.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      c2d22ecd
  23. 11 Sep 2013, 5 commits
    • super: fix for destroy lrus · f5e1dd34
      Committed by Glauber Costa
      This patch adds the missing call to list_lru_destroy (spotted by Li Zhong)
      and moves the deletion to after the shrinker is unregistered, as correctly
      spotted by Dave.
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      f5e1dd34
    • list_lru: dynamically adjust node arrays · 5ca302c8
      Committed by Glauber Costa
      We currently use a compile-time constant to size the node array for the
      list_lru structure.  Due to this, we don't need to allocate any memory at
      initialization time.  But as a consequence, the structures that contain
      embedded list_lru lists can become way too big (the superblock for
      instance contains two of them).
      
      This patch aims at ameliorating this situation by dynamically allocating
      the node arrays with the firmware provided nr_node_ids.
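
      The allocation the text describes is on the order of this sketch (error
      handling trimmed; with a real allocation inside, callers of list_lru_init()
      now have to check its result):

      	int list_lru_init(struct list_lru *lru)
      	{
      		int i;

      		lru->node = kzalloc(nr_node_ids * sizeof(*lru->node), GFP_KERNEL);
      		if (!lru->node)
      			return -ENOMEM;
      		for (i = 0; i < nr_node_ids; i++) {
      			spin_lock_init(&lru->node[i].lock);
      			INIT_LIST_HEAD(&lru->node[i].list);
      			lru->node[i].nr_items = 0;
      		}
      		return 0;
      	}
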
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      5ca302c8
    • fs: convert inode and dentry shrinking to be node aware · 9b17c623
      Committed by Dave Chinner
      Now that the shrinker is passing a node in the scan control structure, we
      can pass this to the generic LRU list code to isolate reclaim to the
      lists on matching nodes.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@parallels.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      9b17c623
    • dcache: convert to use new lru list infrastructure · f6041567
      Committed by Dave Chinner
      [glommer@openvz.org: don't reintroduce double decrement of nr_unused_dentries, adapted for new LRU return codes]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      f6041567
    • inode: convert inode lru list to generic lru list code. · bc3b14cb
      Committed by Dave Chinner
      [glommer@openvz.org: adapted for new LRU return codes]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      bc3b14cb