1. 17 Nov 2015, 1 commit
  2. 04 Sep 2015, 1 commit
  3. 09 Jun 2015, 1 commit
  4. 03 Jun 2015, 2 commits
    • gfs2: limit quota log messages · 9cde2898
      Committed by Abhi Das
      This patch makes the quota subsystem only report once that a
      particular user/group has exceeded their allotted quota.
      
      Previously, it was possible for a program to continuously try to
      exceed its quota (despite receiving EDQUOT) and in turn trigger
      gfs2 to issue a kernel log message about the quota being exceeded.
      In theory, this could get out of hand and flood the log and the
      filesystem hosting the log files.
      Signed-off-by: Abhi Das <adas@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
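      A minimal sketch of the report-once pattern this commit describes;
      the flag name and helper are hypothetical, not the actual patch:

              /* Warn only on the first EDQUOT per quota entry. */
              static void quota_warn_once(struct gfs2_quota_data *qd)
              {
                      /* test_and_set_bit() returns the old bit value, so
                       * only the first caller emits the log message. */
                      if (!test_and_set_bit(QDF_PRINTED, &qd->qd_flags))
                              pr_warn("gfs2: quota exceeded for id %u\n",
                                      from_kqid(&init_user_ns, qd->qd_id));
              }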
    • gfs2: fix quota updates on block boundaries · 39a72580
      Committed by Abhi Das
      For smaller block sizes (512B, 1K, 2K), some quotas straddle block
      boundaries such that the usage value is on one block and the rest
      of the quota is on the previous block. In such cases, the value
      does not get updated correctly. This patch fixes that by addressing
      the boundary conditions correctly.
      
      This patch also adds a (s64) cast that was missing in a call to
      gfs2_quota_change() in inode.c
      Signed-off-by: Abhi Das <adas@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
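      A small sketch of the boundary condition in question (illustrative
      helper, not the actual fix): a quota record straddles a block
      boundary when its byte range crosses a multiple of the block size.

              static int straddles_boundary(u64 offset, unsigned int size,
                                            unsigned int bsize)
              {
                      /* True when the first and last byte of the record
                       * fall in different filesystem blocks. */
                      return (offset / bsize) !=
                             ((offset + size - 1) / bsize);
              }

      With 512B, 1K, or 2K blocks, such straddling records are exactly
      the ones the old code updated incorrectly.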
  5. 08 Apr 2015, 1 commit
    • gfs2: fix quota refresh race in do_glock() · 30133177
      Committed by Abhi Das
      quotad periodically syncs in-memory quotas to the on-disk quota file
      and sets the QDF_REFRESH flag so that a subsequent read of a synced
      quota is re-read from disk.
      
      gfs2_quota_lock() checks for this flag and sets a 'force' bit to
      force re-read from disk if requested. However, there is a race
      condition here. It is possible for gfs2_quota_lock() to find the
      QDF_REFRESH flag unset (i.e. force=0) just before quotad comes in
      and syncs the relevant quota, setting the QDF_REFRESH flag.
      gfs2_quota_lock() then resumes with force=0 and uses the stale
      in-memory quota usage values, which results in miscalculations.
      
      This patch fixes this race by moving the QDF_REFRESH flag check
      further out in the gfs2_quota_lock() process, i.e. into
      do_glock(), under the protection of the quota glock.
      Signed-off-by: Abhi Das <adas@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Acked-by: Steven Whitehouse <swhiteho@redhat.com>
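      A sketch of the ordering fix (not the actual patch): the flag is
      now tested under the quota glock, so a concurrent quotad sync
      cannot slip in between the check and the read.

              /* Caller holds the quota glock. test_and_clear_bit() is
               * atomic, so a sync by quotad either happens before the
               * check (and forces a re-read) or after the glock is
               * dropped (and is seen by the next locker). */
              static int qd_check_refresh(struct gfs2_quota_data *qd)
              {
                      return test_and_clear_bit(QDF_REFRESH, &qd->qd_flags);
              }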
  6. 19 Mar 2015, 2 commits
    • gfs2: allow quota_check and inplace_reserve to return available blocks · 25435e5e
      Committed by Abhi Das
      struct gfs2_alloc_parms is passed to gfs2_quota_check() and
      gfs2_inplace_reserve() with ap->target containing the number of
      blocks being requested for allocation in the current operation.
      
      We add a new field to struct gfs2_alloc_parms called 'allowed'.
      gfs2_quota_check() and gfs2_inplace_reserve() return the max
      blocks allowed by quota and the max blocks allowed by the chosen
      rgrp respectively in 'allowed'.
      
      A new field 'min_target', when non-zero, tells gfs2_quota_check()
      and gfs2_inplace_reserve() to not return -EDQUOT/-ENOSPC when
      there are at least 'min_target' blocks allowable/available. The
      assumption is that the caller is ok with just 'min_target' blocks
      and will likely proceed with allocating them.
      Signed-off-by: Abhi Das <adas@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Acked-by: Steven Whitehouse <swhiteho@redhat.com>
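      A sketch of the extended parameter block, with the two new fields
      annotated (comments are editorial; 'aflags' predates this patch):

              struct gfs2_alloc_parms {
                      u64 target;     /* blocks requested by this operation */
                      u32 min_target; /* if non-zero: accept at least this
                                         many blocks instead of failing with
                                         -EDQUOT/-ENOSPC */
                      u8 aflags;      /* allocation flags */
                      u64 allowed;    /* out: max blocks permitted by quota
                                         (gfs2_quota_check) or by the chosen
                                         rgrp (gfs2_inplace_reserve) */
              };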
    • gfs2: perform quota checks against allocation parameters · b8fbf471
      Committed by Abhi Das
      Use struct gfs2_alloc_parms as an argument to gfs2_quota_check()
      and gfs2_quota_lock_check() to check for quota violations while
      accounting for the new blocks requested by the current operation
      in ap->target.
      
      Previously, the number of new blocks requested during an operation
      was not accounted for during quota_check, which allowed operations
      to exceed quota. This was not very apparent, since most operations
      allocated only one block at a time and the quota would instead get
      violated in the next operation, i.e. the quota excess was only a
      block or so. With fallocate (where we allocate a bunch of blocks
      at once) the quota excess is non-trivial and is addressed by this
      patch.
      Signed-off-by: Abhi Das <adas@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Acked-by: Steven Whitehouse <swhiteho@redhat.com>
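      A sketch of a caller after the conversion, passing the allocation
      size through the parameter block (error handling trimmed):

              struct gfs2_alloc_parms ap = { .target = blocks_wanted };
              int error;

              /* The quota check now accounts for ap.target new blocks up
               * front, so a large fallocate fails with -EDQUOT here
               * rather than overshooting the limit. */
              error = gfs2_quota_lock_check(ip, &ap);
              if (error)
                      return error;
              error = gfs2_inplace_reserve(ip, &ap);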
  7. 04 Mar 2015, 1 commit
  8. 13 Feb 2015, 2 commits
    • list_lru: add helpers to isolate items · 3f97b163
      Committed by Vladimir Davydov
      Currently, the isolate callback passed to the list_lru_walk family of
      functions is supposed to just delete an item from the list upon returning
      LRU_REMOVED or LRU_REMOVED_RETRY, while the nr_items counter is fixed by
      __list_lru_walk_one after the callback returns.  Since the callback is
      allowed to drop the lock after removing an item (it has to return
      LRU_REMOVED_RETRY then), the nr_items can be less than the actual number
      of elements on the list even if we check them under the lock.  This makes
      it difficult to move items from one list_lru_one to another, which is
      required for per-memcg list_lru reparenting - we can't just splice the
      lists, we have to move entries one by one.
      
      This patch therefore introduces helpers that must be used by callback
      functions to isolate items instead of raw list_del/list_move.  These are
      list_lru_isolate and list_lru_isolate_move.  They not only remove the
      entry from the list, but also fix the nr_items counter, making sure
      nr_items always reflects the actual number of elements on the list if
      checked under the appropriate lock.
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
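      A sketch of an isolate callback using the new helpers (the dispose
      list and the callback name are illustrative):

              static enum lru_status demo_isolate(struct list_head *item,
                                                  struct list_lru_one *lru,
                                                  spinlock_t *lru_lock,
                                                  void *arg)
              {
                      struct list_head *dispose = arg;

                      /* Unlike a raw list_move(), this also decrements
                       * lru->nr_items, so the counter stays accurate even
                       * if lru_lock is dropped afterwards. */
                      list_lru_isolate_move(lru, item, dispose);
                      return LRU_REMOVED;
              }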
    • list_lru: introduce list_lru_shrink_{count,walk} · 503c358c
      Committed by Vladimir Davydov
      Kmem accounting of memcg is unusable now, because it lacks slab shrinker
      support.  That means when we hit the limit we will get ENOMEM w/o any
      chance to recover.  What we should do then is to call shrink_slab, which
      would reclaim old inode/dentry caches from this cgroup.  This is what
      this patch set is intended to do.
      
      Basically, it does two things.  First, it introduces the notion of
      per-memcg slab shrinker.  A shrinker that wants to reclaim objects per
      cgroup should mark itself as SHRINKER_MEMCG_AWARE.  Then it will be
      passed the memory cgroup to scan from in shrink_control->memcg.  For
      such shrinkers shrink_slab iterates over the whole cgroup subtree under
      the target cgroup and calls the shrinker for each kmem-active memory
      cgroup.
      
      Secondly, this patch set makes the list_lru structure per-memcg.  It's
      done transparently to list_lru users - all they have to do is tell
      list_lru_init that they want a memcg-aware list_lru.  Then the
      list_lru will automatically distribute objects among per-memcg lists
      based on which cgroup the object is accounted to.  This way, to make FS
      shrinkers (icache, dcache) memcg-aware we only need to make them use
      memcg-aware list_lru, and this is what this patch set does.
      
      As before, this patch set only enables per-memcg kmem reclaim when the
      pressure goes from memory.limit, not from memory.kmem.limit.  Handling
      memory.kmem.limit is going to be tricky due to GFP_NOFS allocations, and
      it is still unclear whether we will have this knob in the unified
      hierarchy.
      
      This patch (of 9):
      
      NUMA aware slab shrinkers use the list_lru structure to distribute
      objects coming from different NUMA nodes to different lists.  Whenever
      such a shrinker needs to count or scan objects from a particular node,
      it issues commands like this:
      
              count = list_lru_count_node(lru, sc->nid);
              freed = list_lru_walk_node(lru, sc->nid, isolate_func,
                                         isolate_arg, &sc->nr_to_scan);
      
      where sc is an instance of the shrink_control structure passed to it
      from vmscan.
      
      To simplify this, let's add special list_lru functions to be used by
      shrinkers, list_lru_shrink_count() and list_lru_shrink_walk(), which
      consolidate the nid and nr_to_scan arguments in the shrink_control
      structure.
      
      This will also allow us to avoid patching shrinkers that use list_lru
      when we make shrink_slab() per-memcg - all we will have to do is extend
      the shrink_control structure to include the target memcg and make
      list_lru_shrink_{count,walk} handle this appropriately.
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Suggested-by: Dave Chinner <david@fromorbit.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
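      A sketch of the resulting call sites (compare with the snippet
      above; the lru and isolate callback names are illustrative):

              static unsigned long demo_count(struct shrinker *shrink,
                                              struct shrink_control *sc)
              {
                      /* nid now travels inside sc */
                      return list_lru_shrink_count(&demo_lru, sc);
              }

              static unsigned long demo_scan(struct shrinker *shrink,
                                             struct shrink_control *sc)
              {
                      /* nid and nr_to_scan both come from sc */
                      return list_lru_shrink_walk(&demo_lru, sc,
                                                  demo_isolate, NULL);
              }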
  9. 28 Jan 2015, 1 commit
    • quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space units · 14bf61ff
      Committed by Jan Kara
      Currently ->get_dqblk() and ->set_dqblk() use struct fs_disk_quota which
      tracks space limits and usage in 512-byte blocks. However VFS quotas
      track usage in bytes (as some filesystems require that) and we need to
      somehow pass this information. Up to now it wasn't a problem because
      we didn't do any unit conversion (thus the VFS quota routines happily
      stuck the number of bytes into the d_bcount field of struct
      fs_disk_quota). Only if you tried to use Q_XGETQUOTA or Q_XSETQLIM
      for VFS quotas (or Q_GETQUOTA / Q_SETQUOTA for XFS quotas) did you
      get bogus results. Hardly anyone tried this, but reportedly some
      Samba users hit the problem in practice. So if we want the
      interfaces to be compatible, we need to fix this.
      
      We bite the bullet and define another quota structure used for passing
      information from/to ->get_dqblk()/->set_dqblk(). It is somewhat sad
      that we need more conversion routines in fs/quota/quota.c, and the
      extra copying of the quota structure slows down getting quota
      information by about 2%, but it seems cleaner than overloading,
      e.g., the units of d_bcount to mean bytes.
      
      CC: stable@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
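      A sketch of the byte-based structure the commit introduces (field
      names approximate; treat this as illustrative, not the exact
      definition):

              struct qc_dqblk {
                      u64 d_spc_hardlimit; /* absolute space limit, bytes */
                      u64 d_spc_softlimit; /* preferred space limit, bytes */
                      u64 d_space;         /* space currently used, bytes */
                      u64 d_ino_hardlimit; /* maximum number of inodes */
                      u64 d_ino_softlimit; /* preferred inode limit */
                      u64 d_ino_count;     /* inodes currently allocated */
                      int d_fieldmask;     /* which fields are valid */
              };

      Conversion to and from the 512-byte units of struct fs_disk_quota
      now happens once, in fs/quota/quota.c, instead of each filesystem
      overloading d_bcount.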
  10. 20 Nov 2014, 1 commit
  11. 14 May 2014, 1 commit
    • GFS2: remove transaction glock · 24972557
      Committed by Benjamin Marzinski
      GFS2 has a transaction glock, which must be grabbed for every
      transaction, whose purpose is to deal with freezing the filesystem.
      Aside from this involving a large amount of locking, it is very easy to
      make the current fsfreeze code hang on unfreezing.
      
      This patch rewrites how gfs2 handles freezing the filesystem. The
      transaction glock is removed. In its place is a freeze glock, which is
      cached (but not held) in a shared state by every node in the cluster
      when the filesystem is mounted. This lock only needs to be grabbed on
      freezing, and actions which need to be safe from freezing, like
      recovery.
      
      When a node wants to freeze the filesystem, it grabs this glock
      exclusively.  When the freeze glock state changes on the nodes (either
      from shared to unlocked, or shared to exclusive), the filesystem does a
      special log flush.  gfs2_log_flush() does all the work of flushing out
      and shutting down the incore log, and then it tries to grab the
      freeze glock in a shared state again.  Since the filesystem is stuck in
      gfs2_log_flush, no new transaction can start, and nothing can be written
      to disk. Unfreezing the filesystem simply involves dropping the freeze
      glock, allowing gfs2_log_flush() to grab and then release the shared
      lock, so it is cached for next time.
      
      However, in order for the unfreezing ioctl to occur, gfs2 needs to get a
      shared lock on the filesystem root directory inode to check permissions.
      If that glock has already been grabbed exclusively, fsfreeze will be
      unable to get the shared lock and unfreeze the filesystem.
      
      In order to allow the unfreeze, this patch makes gfs2 grab a shared lock
      on the filesystem root directory during the freeze, and hold it until it
      unfreezes the filesystem.  The functions which need to grab a shared
      lock in order to allow the unfreeze ioctl to be issued now use the lock
      grabbed by the freeze code instead.
      
      The freeze and unfreeze code take care to make sure that this shared
      lock will not be dropped while another process is using it.
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
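      The locking protocol, condensed (an editorial summary of the commit
      message, not actual code):

              /*
               * mount:    every node caches the freeze glock SHARED
               * freeze:   one node grabs the freeze glock EXCLUSIVE;
               *           the demote forces each node through
               *           gfs2_log_flush(), which flushes and shuts down
               *           the incore log and then blocks trying to
               *           reacquire SHARED, so no new transactions start
               * unfreeze: drop the EXCLUSIVE hold; gfs2_log_flush()
               *           obtains and releases SHARED, re-caching it
               */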
  12. 17 Apr 2014, 1 commit
  13. 31 Mar 2014, 1 commit
  14. 07 Mar 2014, 3 commits
  15. 27 Feb 2014, 1 commit
  16. 15 Jan 2014, 5 commits
    • GFS2: Fix kbuild test robot reported warning · 1e3d3620
      Committed by Steven Whitehouse
      I don't see the same warning locally that the kbuild robot does,
      but this should fix the problem anyway.
      Here is the warning:
      
      head:   2d9e7230
      commit: ee2411a8 [19/20] GFS2: Clean up quota slot allocation
      config: make ARCH=powerpc allmodconfig
      
      All error/warnings:
      
         fs/gfs2/quota.c: In function 'gfs2_quota_init':
      >> fs/gfs2/quota.c:1246:3: error: implicit declaration of function '__vmalloc' [-Werror=implicit-function-declaration]
            sdp->sd_quota_bitmap = __vmalloc(bm_size, GFP_NOFS, PAGE_KERNEL);
            ^
      >> fs/gfs2/quota.c:1246:24: warning: assignment makes pointer from integer without a cast [enabled by default]
            sdp->sd_quota_bitmap = __vmalloc(bm_size, GFP_NOFS, PAGE_KERNEL);
                                 ^
         fs/gfs2/quota.c: In function 'gfs2_quota_cleanup':
      >> fs/gfs2/quota.c:1361:4: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]
             vfree(sdp->sd_quota_bitmap);
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
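      Both errors are implicit-declaration complaints about __vmalloc()
      and vfree(), which points at a missing header; the likely one-line
      fix (an assumption, since the diff is not shown here):

              #include <linux/vmalloc.h>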
    • GFS2: Move quota bitmap operations under their own lock · 2d9e7230
      Committed by Steven Whitehouse
      Gradually, the global qd_lock is being used for less and less.
      After this patch it will only be used for the per super block
      list whose purpose is to allow syncing of changes back to the
      master quota file from the local quota changes file. Fixing
      up that process to make it more efficient will be the subject
      of a later patch; however, this patch removes another barrier
      to doing that.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Abhijith Das <adas@redhat.com>
    • GFS2: Clean up quota slot allocation · ee2411a8
      Committed by Steven Whitehouse
      Quota slot allocation has historically used a vector of pages
      and a set of homegrown find/test/set/clear bit functions. Since
      the size of the bitmap is likely to be based on the default
      qc file size, that's a couple of pages at most. So we ought
      to be able to allocate that as a single chunk, with a vmalloc
      fallback, just in case of memory fragmentation.
      
      We are then able to use the kernel's own find/test/set/clear
      bit functions, rather than rolling our own.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Abhijith Das <adas@redhat.com>
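      A sketch of the allocation pattern described (variable names
      illustrative): try one physically contiguous chunk first, fall
      back to vmalloc under fragmentation, then use the stock bit ops.

              unsigned long *bitmap;
              unsigned int slot;

              bitmap = kzalloc(bm_size, GFP_NOFS | __GFP_NOWARN);
              if (!bitmap)
                      bitmap = __vmalloc(bm_size, GFP_NOFS | __GFP_ZERO,
                                         PAGE_KERNEL);

              /* Kernel bitmap helpers replace the homegrown ones. */
              slot = find_first_zero_bit(bitmap, nslots);
              if (slot < nslots)
                      set_bit(slot, bitmap);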
    • GFS2: Only run logd and quota when mounted read/write · 8ad151c2
      Committed by Steven Whitehouse
      While investigating a rather strange bit of code in the quota
      clean up function, I spotted that the reason for its existence
      was that when remounting read only, we were not stopping the
      quotad thread, and thus it was possible for it to still have
      a reference to some of the quotas in that case.
      
      This patch moves the logd and quota thread start and stop into
      the make_fs_rw/ro functions, so that we now stop those threads
      when mounted read only.
      
      This means that quotad will always be stopped before we call
      the quota clean up function, and we can thus dispose of the
      (rather hackish) code that waits for it to give up its
      reference on the quotas.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Abhijith Das <adas@redhat.com>
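      A sketch of the lifecycle change (bodies illustrative, error
      handling trimmed): the daemons now start in make_fs_rw and stop
      in make_fs_ro, rather than living for the whole mount.

              /* going read/write: start the daemons */
              sdp->sd_logd_process = kthread_run(gfs2_logd, sdp,
                                                 "gfs2_logd");
              sdp->sd_quotad_process = kthread_run(gfs2_quotad, sdp,
                                                   "gfs2_quotad");

              /* going read-only: stop them, so quotad has dropped all
               * quota references before gfs2_quota_cleanup() runs */
              kthread_stop(sdp->sd_quotad_process);
              kthread_stop(sdp->sd_logd_process);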
    • GFS2: Use RCU/hlist_bl based hash for quotas · c754fbbb
      Committed by Steven Whitehouse
      Prior to this patch, GFS2 kept all the quotas for each
      super block in a single linked list. This is rather slow
      when there are large numbers of quotas.
      
      This patch introduces a hlist_bl based hash table, similar
      to the one used for glocks. The initial look up of the quota
      is now lockless in the case where it is already cached,
      although we still have to take the per quota spinlock in
      order to bump the ref count. Either way though, this is a
      big improvement on what was there before.
      
      The qd_lock and the per-super-block list are preserved for
      the time being. However, since the lock is no longer used for
      its original role, it should be possible in due course to
      shrink the number of items on that list and to remove the
      requirement to take qd_lock in qd_get.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Abhijith Das <adas@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
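      A sketch of the lockless lookup shape this enables (names
      illustrative): RCU protects the traversal, and only the ref-count
      bump needs the per-quota lock.

              static struct gfs2_quota_data *
              qd_hash_lookup(struct hlist_bl_head *bucket, struct kqid qid)
              {
                      struct gfs2_quota_data *qd;
                      struct hlist_bl_node *h;

                      rcu_read_lock();
                      hlist_bl_for_each_entry_rcu(qd, h, bucket, qd_hlist) {
                              if (!qid_eq(qd->qd_id, qid))
                                      continue;
                              /* take the per-quota lock just long enough
                               * to bump the reference count */
                              if (lockref_get_not_dead(&qd->qd_lockref)) {
                                      rcu_read_unlock();
                                      return qd;
                              }
                      }
                      rcu_read_unlock();
                      return NULL;
              }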
  17. 03 Jan 2014, 1 commit
  18. 16 Nov 2013, 1 commit
  19. 04 Nov 2013, 3 commits
  20. 04 Oct 2013, 4 commits
  21. 02 Oct 2013, 2 commits
  22. 11 Sep 2013, 2 commits
    • fs: convert fs shrinkers to new scan/count API · 1ab6c499
      Committed by Dave Chinner
      Convert the filesystem shrinkers to use the new API, and standardise some
      of the behaviours of the shrinkers at the same time.  For example,
      nr_to_scan means the number of objects to scan, not the number of objects
      to free.
      
      I refactored the CIFS idmap shrinker a little - it really needs to
      be broken up into a shrinker per tree, keeping an item count with
      the tree root, so that we don't need to walk the tree every time
      the shrinker needs to count the number of objects in the tree
      (i.e. all the time under memory pressure).
      
      [glommer@openvz.org: fixes for ext4, ubifs, nfs, cifs and glock. Fixes are needed mainly due to new code merged in the tree]
      [assorted fixes folded in]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Acked-by: Jan Kara <jack@suse.cz>
      Acked-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
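      A sketch of a shrinker after the conversion (the demo_* names are
      illustrative): counting and scanning are separate callbacks, and
      nr_to_scan means objects to scan, not objects to free.

              static unsigned long demo_count_objects(struct shrinker *s,
                                                      struct shrink_control *sc)
              {
                      return demo_object_count(); /* cheap, no reclaim */
              }

              static unsigned long demo_scan_objects(struct shrinker *s,
                                                     struct shrink_control *sc)
              {
                      /* return the number actually freed, or SHRINK_STOP */
                      return demo_reclaim(sc->nr_to_scan);
              }

              static struct shrinker demo_shrinker = {
                      .count_objects = demo_count_objects,
                      .scan_objects  = demo_scan_objects,
                      .seeks         = DEFAULT_SEEKS,
              };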
    • super: fix calculation of shrinkable objects for small numbers · 55f841ce
      Committed by Glauber Costa
      The sysctl knob sysctl_vfs_cache_pressure is used to determine which
      percentage of the shrinkable objects in our cache we should actively try
      to shrink.
      
      It works well when we have many objects (at least more than 100),
      because the approximation errors will be negligible.  But if this
      is not the case, especially when total_objects < 100, we may end up
      concluding that we have no objects at all (total / 100 = 0 whenever
      total < 100).
      
      This is certainly not the biggest killer in the world, but may matter in
      very low kernel memory situations.
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
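      The fix multiplies before dividing, so small counts no longer
      truncate to zero; a sketch of the resulting helper:

              static inline unsigned long vfs_pressure_ratio(unsigned long val)
              {
                      /* mult_frac(x, n, d) computes x * n / d without the
                       * (val / 100) truncation that loses counts < 100 */
                      return mult_frac(val, sysctl_vfs_cache_pressure, 100);
              }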
  23. 05 Jun 2013, 1 commit
  24. 24 May 2013, 1 commit