1. 03 Jun, 2015: 1 commit
    • gfs2: fix quota updates on block boundaries · 39a72580
      Abhi Das committed
      For smaller block sizes (512B, 1K, 2K), some quotas straddle block
      boundaries such that the usage value is on one block and the rest
      of the quota is on the previous block. In such cases, the value
      does not get updated correctly. This patch fixes that by addressing
      the boundary conditions correctly.
      
      This patch also adds a (s64) cast that was missing in a call to
      gfs2_quota_change() in inode.c.
      Signed-off-by: Abhi Das <adas@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
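To make the boundary condition concrete, here is a minimal userspace sketch, assuming a file modelled as a fixed array of 512-byte blocks. The names `blocks`, `write_across_blocks()` and `read_across_blocks()` are hypothetical and this is not the GFS2 code; it only shows a write at an arbitrary byte offset being split so that each block receives just the bytes that fall inside it, which is what a value straddling two blocks requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 512 /* hypothetical small block size (the 512B case) */
#define NUM_BLOCKS 4

/* Hypothetical backing store: each row is one filesystem block. */
static uint8_t blocks[NUM_BLOCKS][BLOCK_SIZE];

/* Write len bytes at byte offset loc, splitting at every block boundary.
 * Returns 0 on success, -1 if the range runs past the last block. */
static int write_across_blocks(uint64_t loc, const void *buf, size_t len)
{
    const uint8_t *src = buf;

    assert(buf != NULL);
    if (loc + len > (uint64_t)NUM_BLOCKS * BLOCK_SIZE)
        return -1;

    while (len > 0) {
        size_t blk = (size_t)(loc / BLOCK_SIZE);  /* block holding loc */
        size_t off = (size_t)(loc % BLOCK_SIZE);  /* offset in that block */
        size_t chunk = BLOCK_SIZE - off;          /* room left in block */

        if (chunk > len)
            chunk = len;
        memcpy(&blocks[blk][off], src, chunk);
        src += chunk;
        loc += chunk;
        len -= chunk;
    }
    return 0;
}

/* Read len bytes from byte offset loc using the same splitting rule. */
static int read_across_blocks(uint64_t loc, void *buf, size_t len)
{
    uint8_t *dst = buf;

    assert(buf != NULL);
    if (loc + len > (uint64_t)NUM_BLOCKS * BLOCK_SIZE)
        return -1;

    while (len > 0) {
        size_t blk = (size_t)(loc / BLOCK_SIZE);
        size_t off = (size_t)(loc % BLOCK_SIZE);
        size_t chunk = BLOCK_SIZE - off;

        if (chunk > len)
            chunk = len;
        memcpy(dst, &blocks[blk][off], chunk);
        dst += chunk;
        loc += chunk;
        len -= chunk;
    }
    return 0;
}
```

For an 8-byte value written at offset `BLOCK_SIZE - 3`, the first 3 bytes land at the end of block 0 and the remaining 5 at the start of block 1; treating such a write as if it fit inside one block is the kind of boundary mistake the commit message describes.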
  2. 19 May, 2015: 1 commit
  3. 06 May, 2015: 5 commits
    • gfs2: kerneldoc warning fixes · 1272574b
      Fabian Frederick committed
      Fixes the following kernel-doc warnings:
      Warning(fs/gfs2/aops.c:180): No description found for parameter 'wbc'
      Warning(fs/gfs2/aops.c:236): No description found for parameter 'end'
      Warning(fs/gfs2/aops.c:236): No description found for parameter 'done_index'
      Warning(fs/gfs2/aops.c:236): Excess function parameter 'writepage' description in 'gfs2_write_jdata_pagevec'
      Warning(fs/gfs2/aops.c:346): Excess function parameter 'writepage' description in 'gfs2_write_cache_jdata'
      Warning(fs/gfs2/aops.c:346): Excess function parameter 'data' description in 'gfs2_write_cache_jdata'
      Warning(fs/gfs2/aops.c:605): No description found for parameter 'file'
      Warning(fs/gfs2/aops.c:605): No description found for parameter 'mapping'
      Warning(fs/gfs2/aops.c:605): No description found for parameter 'pages'
      Warning(fs/gfs2/aops.c:605): No description found for parameter 'nr_pages'
      Warning(fs/gfs2/aops.c:870): No description found for parameter 'copied'
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: convert simple_str to kstr · e50ead48
      Fabian Frederick committed
      - Remove the obsolete simple_strto*() functions.
      - Return an error code when a kstrto*() conversion fails.
      - Call the kstrto*() variant that matches the destination type.
      
      Thanks to Alexey Dobriyan for suggesting improvements in
      block_store() and wdack_store().
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
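The practical difference is error reporting: `simple_strtoul()` gives no way to tell malformed input from a real value, while the `kstrto*()` family returns 0, `-EINVAL` or `-ERANGE`. The sketch below approximates that contract in userspace on top of `strtoul()`. `parse_uint()` is a hypothetical name, it is not the kernel implementation, and it accepts only decimal digits plus one optional trailing newline. A sysfs store handler such as block_store() can then return the error instead of acting on garbage.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical userspace approximation of a kstrtouint()-style parser:
 * the whole string must be a decimal number, optionally followed by a
 * single '\n' (writes made with `echo` usually carry one).
 * Returns 0 and stores the value on success, -EINVAL for malformed input,
 * -ERANGE if the value does not fit in an unsigned int.
 */
static int parse_uint(const char *s, unsigned int *res)
{
    char *end;
    unsigned long val;

    assert(s != NULL && res != NULL);

    /* strtoul() would skip whitespace and accept a '-' sign; refuse both. */
    if (*s < '0' || *s > '9')
        return -EINVAL;

    errno = 0;
    val = strtoul(s, &end, 10);
    if (errno == ERANGE)
        return -ERANGE;          /* does not even fit in unsigned long */

    if (*end == '\n')
        end++;
    if (*end != '\0')
        return -EINVAL;          /* trailing junk such as "4x2" */

    if (val > UINT_MAX)
        return -ERANGE;          /* fits unsigned long, not the destination */

    *res = (unsigned int)val;    /* only written on success */
    return 0;
}
```

Checking the range against the destination type (here `unsigned int`) mirrors the commit's point about calling the variant that corresponds to the destination type.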
    • GFS2: make sure S_NOSEC flag isn't overwritten · 01e64ee4
      Benjamin Marzinski committed
      At the end of gfs2_set_inode_flags(), inode->i_flags is set to flags,
      so we should be modifying flags instead of inode->i_flags; otherwise
      the change is overwritten.
      
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
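The bug is a classic lost update: a bit is set on the live field while the function is building a replacement value in a local variable that later overwrites that field. A small sketch under stated assumptions: the `DEMO_*` flag values, `struct demo_inode` and both functions are hypothetical stand-ins, not GFS2 code, and the single `can_nosec` argument replaces whatever conditions GFS2 really uses to decide on S_NOSEC.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the values are illustrative, not the kernel's. */
#define DEMO_S_SYNC   (1u << 0)
#define DEMO_S_APPEND (1u << 1)
#define DEMO_S_NOSEC  (1u << 2)

/* Hypothetical on-disk flag saying the file is synchronous. */
#define DEMO_DISK_SYNC (1u << 0)

struct demo_inode {
    unsigned int i_flags;
};

/* Buggy shape: S_NOSEC goes straight into inode->i_flags ... */
static void set_inode_flags_buggy(struct demo_inode *inode,
                                  unsigned int disk_flags, bool can_nosec)
{
    unsigned int flags = inode->i_flags;

    flags &= ~(DEMO_S_SYNC | DEMO_S_APPEND | DEMO_S_NOSEC);
    if (can_nosec)
        inode->i_flags |= DEMO_S_NOSEC;  /* lost: overwritten below */
    if (disk_flags & DEMO_DISK_SYNC)
        flags |= DEMO_S_SYNC;
    inode->i_flags = flags;              /* ... and is clobbered here */
}

/* Fixed shape: every change goes into the local copy that is stored last. */
static void set_inode_flags_fixed(struct demo_inode *inode,
                                  unsigned int disk_flags, bool can_nosec)
{
    unsigned int flags = inode->i_flags;

    flags &= ~(DEMO_S_SYNC | DEMO_S_APPEND | DEMO_S_NOSEC);
    if (can_nosec)
        flags |= DEMO_S_NOSEC;
    if (disk_flags & DEMO_DISK_SYNC)
        flags |= DEMO_S_SYNC;
    inode->i_flags = flags;
}
```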
    • GFS2: add support for rename2 and RENAME_EXCHANGE · a63b7bbc
      Benjamin Marzinski committed
      gfs2 now uses the rename2 directory iop and supports the
      RENAME_EXCHANGE flag (as well as RENAME_NOREPLACE, which the VFS
      takes care of).
      
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: handle NULL rgd in set_rgrp_preferences · 959b6717
      Abhi Das committed
      The function set_rgrp_preferences() does not handle the (rarely
      returned) NULL value from gfs2_rgrpd_get_next() and this patch
      fixes that.
      
      The fs image in question is only 150MB in size which allows for
      only 1 rgrp to be created. The in-memory rb tree has only 1 node
      and when gfs2_rgrpd_get_next() is called on this sole rgrp, it
      returns NULL. (Default behavior is to wrap around the rb tree and
      return the first node to give the illusion of a circular linked
      list. In the case of only 1 rgrp, we can't have
      gfs2_rgrpd_get_next() return the same rgrp (first, last, next all
      point to the same rgrp)... that would cause unintended consequences
      and infinite loops.)
      Signed-off-by: Abhi Das <adas@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
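The behaviour described above can be modelled with an array instead of the rb tree. The types and functions (`struct demo_rgrp`, `demo_rgrp_get_next()`, `demo_set_preferences()`) are hypothetical, and the loop is only an approximation of set_rgrp_preferences(), not a copy: the "next" lookup wraps around but returns NULL when the only successor would be the starting rgrp itself, and every caller checks for that NULL instead of dereferencing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_rgrp {
    int id;
    bool preferred;
};

struct demo_rgrp_set {
    struct demo_rgrp *rgrps;  /* sorted, like the in-memory rb tree */
    size_t count;             /* must be at least 1 */
};

/* Next rgrp with wrap-around; NULL if that would be rgd itself. */
static struct demo_rgrp *demo_rgrp_get_next(struct demo_rgrp_set *s,
                                            struct demo_rgrp *rgd)
{
    size_t idx, next;

    assert(s->count > 0 && rgd >= s->rgrps && rgd < s->rgrps + s->count);
    idx = (size_t)(rgd - s->rgrps);
    next = (idx + 1) % s->count;
    if (next == idx)
        return NULL;  /* only one rgrp: refuse to loop back onto it */
    return &s->rgrps[next];
}

/* Mark every `journals`-th rgrp as preferred, starting at offset `jid`. */
static void demo_set_preferences(struct demo_rgrp_set *s, unsigned int jid,
                                 unsigned int journals)
{
    struct demo_rgrp *rgd = &s->rgrps[0], *first, *next;
    unsigned int i;

    /* Skip ahead by journal id; stay put if there is nowhere to go. */
    for (i = 0; i < jid; i++) {
        next = demo_rgrp_get_next(s, rgd);
        if (!next)
            break;
        rgd = next;
    }
    first = rgd;

    do {
        rgd->preferred = true;
        for (i = 0; i < journals; i++) {
            rgd = demo_rgrp_get_next(s, rgd);
            if (!rgd || rgd == first)
                break;  /* NULL is handled, never dereferenced */
        }
    } while (rgd && rgd != first);
}
```

With a single rgrp (the 150MB image case), the walk marks that rgrp once and stops as soon as the lookup returns NULL, rather than crashing or spinning forever.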
  4. 02 May, 2015: 1 commit
  5. 01 May, 2015: 1 commit
    • GFS2: mark the journal idle to fix ro mounts · 086cc672
      Benjamin Marzinski committed
      When gfs2 was mounted read-only and then unmounted, it was writing a
      header block to the journal in the syncing gfs2_log_flush() call from
      kill_sb(). This is because the journal was not being marked as idle
      until the first log header was written out, and on a read-only mount
      there never was a log header written out. Since the journal was not
      marked idle, gfs2_log_flush() was writing out a header block to make
      sure it was empty during the sync.  Not only did this cause IO to a
      read-only filesystem, but the journalling isn't completely initialized
      on read-only mounts, and so gfs2 was writing out the wrong sequence
      number in the log header.
      
      Now, the journal is marked idle on mount, and gfs2_log_flush() won't
      write out anything until there are transactions to flush.
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Acked-by: Steven Whitehouse <swhiteho@redhat.com>
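The essence of the fix is the initial value of an "idle" flag. A hypothetical model (the `demo_*` names and counters are invented for illustration; GFS2 tracks this state differently) shows why starting idle matters: a flush on a journal that has never seen a transaction becomes a no-op, so a read-only mount followed by an unmount issues no writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_sb {
    bool read_only;
    bool journal_idle;       /* true when the journal has nothing to flush */
    size_t pending_trans;    /* transactions not yet flushed */
    size_t headers_written;  /* log headers written so far (i.e. disk IO) */
};

static void demo_mount(struct demo_sb *sb, bool ro)
{
    sb->read_only = ro;
    sb->pending_trans = 0;
    sb->headers_written = 0;
    sb->journal_idle = true;  /* the fix: mark the journal idle at mount */
}

static void demo_add_transaction(struct demo_sb *sb)
{
    assert(!sb->read_only);   /* no transactions on a read-only mount */
    sb->pending_trans++;
    sb->journal_idle = false;
}

/* Write a header marking the journal empty, but only if there was work. */
static void demo_log_flush(struct demo_sb *sb)
{
    if (sb->journal_idle)
        return;               /* nothing to flush: no IO at all */
    sb->pending_trans = 0;
    sb->headers_written++;
    sb->journal_idle = true;
}
```

If `journal_idle` started out false, the first `demo_log_flush()` on the read-only mount would increment `headers_written`, which corresponds to the stray header write described above.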
  6. 24 Apr, 2015: 2 commits
  7. 08 Apr, 2015: 1 commit
    • gfs2: fix quota refresh race in do_glock() · 30133177
      Abhi Das committed
      quotad periodically syncs in-memory quotas to the ondisk quota file
      and sets the QDF_REFRESH flag so that a subsequent read of a synced
      quota is re-read from disk.
      
      gfs2_quota_lock() checks for this flag and sets a 'force' bit to
      force re-read from disk if requested. However, there is a race
      condition here. It is possible for gfs2_quota_lock() to find the
      QDF_REFRESH flag unset (i.e., force=0) and for quotad to come in
      immediately after, sync the relevant quota and set the QDF_REFRESH
      flag. gfs2_quota_lock() then resumes with force=0 and uses the stale
      in-memory quota usage values, which results in miscalculations.
      
      This patch fixes this race by moving the QDF_REFRESH flag check
      further out in the gfs2_quota_lock() process, i.e., into
      do_glock(), under the protection of the quota glock.
      Signed-off-by: Abhi Das <adas@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Acked-by: Steven Whitehouse <swhiteho@redhat.com>
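The race is a check-then-act bug: the flag was tested before the lock that serialises against quotad was held. A minimal userspace sketch, assuming a pthread mutex as a stand-in for the quota glock and a plain `refresh` boolean for QDF_REFRESH (all `demo_*` names are hypothetical), shows the fixed ordering: the flag is tested and cleared, and the value reloaded, while holding the same lock the syncer takes.

```c
#include <pthread.h>
#include <stdbool.h>

struct demo_quota {
    pthread_mutex_t lock;  /* stand-in for the quota glock */
    bool refresh;          /* stand-in for QDF_REFRESH */
    long cached_usage;     /* in-memory usage */
    long disk_usage;       /* usage as stored in the quota file */
};

/* quotad side: store the synced value and ask readers to re-read it. */
static void demo_quotad_sync(struct demo_quota *q, long synced_usage)
{
    pthread_mutex_lock(&q->lock);
    q->disk_usage = synced_usage;
    q->refresh = true;
    pthread_mutex_unlock(&q->lock);
}

/* Reader side: the refresh check happens under the lock, so quotad cannot
 * set the flag between our check and our use of cached_usage. */
static long demo_quota_lock_and_read(struct demo_quota *q)
{
    long usage;

    pthread_mutex_lock(&q->lock);
    if (q->refresh) {
        q->cached_usage = q->disk_usage;  /* forced re-read */
        q->refresh = false;
    }
    usage = q->cached_usage;
    pthread_mutex_unlock(&q->lock);
    return usage;
}
```

In the racy ordering, the `if (q->refresh)` test would sit before `pthread_mutex_lock()`, leaving a window in which `demo_quotad_sync()` could run and the reader would return the stale `cached_usage`.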
  8. 30 Mar, 2015: 1 commit
  9. 26 Mar, 2015: 1 commit
  10. 19 Mar, 2015: 6 commits
  11. 23 Feb, 2015: 1 commit
    • VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) · e36cb0b8
      David Howells committed
      Convert the following where appropriate:
      
       (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).
      
       (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).
      
       (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry).  This is actually more
           complicated than it appears as some calls should be converted to
           d_can_lookup() instead.  The difference is whether the directory in
           question is a real dir with a ->lookup op or whether it's a fake dir with
           a ->d_automount op.
      
      In some circumstances, we can subsume checks for dentry->d_inode not being
      NULL into this, provided the code isn't in a filesystem that expects
      d_inode to be NULL if the dirent really *is* negative (i.e. if we're going to
      use d_inode() rather than d_backing_inode() to get the inode pointer).
      
      Note that the dentry type field may be set to something other than
      DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
      manages the fall-through from a negative dentry to a lower layer.  In such a
      case, the dentry type of the negative union dentry is set to the same as the
      type of the lower dentry.
      
      However, if you know d_inode is not NULL at the call site, then you can use
      the d_is_xxx() functions even in a filesystem.
      
      There is one further complication: a 0,0 chardev dentry may be labelled
      DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE.  Strictly, this was
      intended for special directory entry types that don't have attached inodes.
      
      The following perl+coccinelle script was used:
      
      use strict;
      
      my @callers;
      open(my $fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
          die "Can't grep for S_ISDIR and co. callers";
      @callers = <$fd>;
      close($fd);
      unless (@callers) {
          print "No matches\n";
          exit(0);
      }
      
      my @cocci = (
          '@@',
          'expression E;',
          '@@',
          '',
          '- S_ISLNK(E->d_inode->i_mode)',
          '+ d_is_symlink(E)',
          '',
          '@@',
          'expression E;',
          '@@',
          '',
          '- S_ISDIR(E->d_inode->i_mode)',
          '+ d_is_dir(E)',
          '',
          '@@',
          'expression E;',
          '@@',
          '',
          '- S_ISREG(E->d_inode->i_mode)',
          '+ d_is_reg(E)' );
      
      my $coccifile = "tmp.sp.cocci";
      open($fd, ">$coccifile") || die $coccifile;
      print($fd "$_\n") || die $coccifile foreach (@cocci);
      close($fd);
      
      foreach my $file (@callers) {
          chomp $file;
          print "Processing ", $file, "\n";
          system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
      	die "spatch failed";
      }
      
      [AV: overlayfs parts skipped]
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
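The distinction in point (3) can be illustrated with a simplified model of the dentry type field. This is a hypothetical sketch: the `DEMO_*` enum, `struct demo_dentry` and the helpers only mirror the idea behind the kernel's DCACHE_*_TYPE values and d_is_*() helpers, not their actual definitions. Both real and automount directories count as directories, but only a real directory can be looked up in.

```c
#include <stdbool.h>

/* Simplified dentry types, loosely modelled on the DCACHE_*_TYPE idea. */
enum demo_dentry_type {
    DEMO_MISS_TYPE,       /* negative dentry: no inode */
    DEMO_WHITEOUT_TYPE,   /* whiteout entry, no attached inode */
    DEMO_DIRECTORY_TYPE,  /* real directory with a ->lookup op */
    DEMO_AUTODIR_TYPE,    /* fake directory with a ->d_automount op */
    DEMO_REGULAR_TYPE,
    DEMO_SPECIAL_TYPE,
    DEMO_SYMLINK_TYPE,
};

struct demo_dentry {
    enum demo_dentry_type type;  /* cached, so callers need not touch d_inode */
};

static bool demo_d_can_lookup(const struct demo_dentry *d)
{
    return d->type == DEMO_DIRECTORY_TYPE;
}

static bool demo_d_is_autodir(const struct demo_dentry *d)
{
    return d->type == DEMO_AUTODIR_TYPE;
}

/* "Is a directory" covers both real and automount directories. */
static bool demo_d_is_dir(const struct demo_dentry *d)
{
    return demo_d_can_lookup(d) || demo_d_is_autodir(d);
}

static bool demo_d_is_reg(const struct demo_dentry *d)
{
    return d->type == DEMO_REGULAR_TYPE;
}

static bool demo_d_is_symlink(const struct demo_dentry *d)
{
    return d->type == DEMO_SYMLINK_TYPE;
}
```

This is why a blanket S_ISDIR to d_is_dir rewrite needs review: a call site that goes on to perform a lookup wants the narrower "can lookup" test.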
  12. 13 Feb, 2015: 2 commits
    • list_lru: add helpers to isolate items · 3f97b163
      Vladimir Davydov committed
      Currently, the isolate callback passed to the list_lru_walk family of
      functions is supposed to just delete an item from the list upon returning
      LRU_REMOVED or LRU_REMOVED_RETRY, while the nr_items counter is fixed by
      __list_lru_walk_one after the callback returns.  Since the callback is
      allowed to drop the lock after removing an item (it has to return
      LRU_REMOVED_RETRY then), the nr_items can be less than the actual number
      of elements on the list even if we check them under the lock.  This makes
      it difficult to move items from one list_lru_one to another, which is
      required for per-memcg list_lru reparenting - we can't just splice the
      lists, we have to move entries one by one.
      
      This patch therefore introduces helpers that must be used by callback
      functions to isolate items instead of raw list_del/list_move.  These are
      list_lru_isolate and list_lru_isolate_move.  They not only remove the
      entry from the list, but also fix the nr_items counter, making sure
      nr_items always reflects the actual number of elements on the list if
      checked under the appropriate lock.
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
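The point of the helpers is that unlinking an item and adjusting the counter happen together, under the lock the walker already holds. A simplified userspace sketch: the list primitives, `struct demo_lru_one` and the `demo_lru_isolate*()` functions are hypothetical stand-ins that only mirror the behaviour described for list_lru_isolate() and list_lru_isolate_move(); locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

struct demo_list_head {
    struct demo_list_head *prev, *next;
};

static void demo_list_init(struct demo_list_head *h)
{
    h->prev = h->next = h;
}

/* Insert n right after head h. */
static void demo_list_add(struct demo_list_head *n, struct demo_list_head *h)
{
    n->next = h->next;
    n->prev = h;
    h->next->prev = n;
    h->next = n;
}

static void demo_list_del_init(struct demo_list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    demo_list_init(e);
}

static size_t demo_list_count(const struct demo_list_head *h)
{
    size_t n = 0;

    for (const struct demo_list_head *p = h->next; p != h; p = p->next)
        n++;
    return n;
}

/* One per-node (or per-memcg) list plus its cached length. */
struct demo_lru_one {
    struct demo_list_head list;
    long nr_items;
};

static void demo_lru_add(struct demo_lru_one *l, struct demo_list_head *item)
{
    demo_list_add(item, &l->list);
    l->nr_items++;
}

/* Like list_lru_isolate(): unlink the item AND fix the counter. */
static void demo_lru_isolate(struct demo_lru_one *l, struct demo_list_head *item)
{
    assert(l->nr_items > 0);
    demo_list_del_init(item);
    l->nr_items--;
}

/* Like list_lru_isolate_move(): move the item to another list (e.g. a
 * private dispose list) and fix the counter of the list it left. */
static void demo_lru_isolate_move(struct demo_lru_one *l,
                                  struct demo_list_head *item,
                                  struct demo_list_head *head)
{
    assert(l->nr_items > 0);
    demo_list_del_init(item);
    demo_list_add(item, head);
    l->nr_items--;
}
```

Because the counter is updated at the moment of removal, `nr_items` matches `demo_list_count()` whenever both are read under the same lock, which is the invariant reparenting relies on.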
    • list_lru: introduce list_lru_shrink_{count,walk} · 503c358c
      Vladimir Davydov committed
      Kmem accounting of memcg is unusable now because it lacks slab shrinker
      support.  That means that when we hit the limit we will get ENOMEM without
      any chance to recover.  What we should do then is call shrink_slab, which
      would reclaim old inode/dentry caches from this cgroup.  This is what
      this patch set is intended to do.
      
      Basically, it does two things.  First, it introduces the notion of
      per-memcg slab shrinker.  A shrinker that wants to reclaim objects per
      cgroup should mark itself as SHRINKER_MEMCG_AWARE.  Then it will be
      passed the memory cgroup to scan from in shrink_control->memcg.  For
      such shrinkers shrink_slab iterates over the whole cgroup subtree under
      the target cgroup and calls the shrinker for each kmem-active memory
      cgroup.
      
      Secondly, this patch set makes the list_lru structure per-memcg.  It's
      done transparently to list_lru users - all they have to do is
      tell list_lru_init that they want a memcg-aware list_lru.  Then the
      list_lru will automatically distribute objects among per-memcg lists
      based on which cgroup the object is accounted to.  This way, to make FS
      shrinkers (icache, dcache) memcg-aware, we only need to make them use
      memcg-aware list_lru, and this is what this patch set does.
      
      As before, this patch set only enables per-memcg kmem reclaim when the
      pressure goes from memory.limit, not from memory.kmem.limit.  Handling
      memory.kmem.limit is going to be tricky due to GFP_NOFS allocations, and
      it is still unclear whether we will have this knob in the unified
      hierarchy.
      
      This patch (of 9):
      
      NUMA aware slab shrinkers use the list_lru structure to distribute
      objects coming from different NUMA nodes to different lists.  Whenever
      such a shrinker needs to count or scan objects from a particular node,
      it issues commands like this:
      
              count = list_lru_count_node(lru, sc->nid);
              freed = list_lru_walk_node(lru, sc->nid, isolate_func,
                                         isolate_arg, &sc->nr_to_scan);
      
      where sc is an instance of the shrink_control structure passed to it
      from vmscan.
      
      To simplify this, let's add special list_lru functions to be used by
      shrinkers, list_lru_shrink_count() and list_lru_shrink_walk(), which
      consolidate the nid and nr_to_scan arguments in the shrink_control
      structure.
      
      This will also allow us to avoid patching shrinkers that use list_lru
      when we make shrink_slab() per-memcg - all we will have to do is extend
      the shrink_control structure to include the target memcg and make
      list_lru_shrink_{count,walk} handle this appropriately.
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Suggested-by: Dave Chinner <david@fromorbit.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
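The new wrappers simply read the node id and scan budget out of the shrinker's control structure. A hypothetical sketch of that consolidation: `struct demo_shrink_control`, `struct demo_list_lru` and all `demo_*` functions are invented, items are modelled as bare per-node counters, and every scanned item is assumed to be freed. The kernel's list_lru_shrink_walk() additionally takes an isolate callback and its argument, which are omitted here.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_NODES 4

/* Per-invocation parameters handed to a shrinker (simplified). */
struct demo_shrink_control {
    int nid;                   /* NUMA node to work on */
    unsigned long nr_to_scan;  /* remaining scan budget */
};

/* Per-node item counts stand in for the per-node lists. */
struct demo_list_lru {
    unsigned long nr_items[DEMO_MAX_NODES];
};

static unsigned long demo_lru_count_node(const struct demo_list_lru *lru, int nid)
{
    assert(nid >= 0 && nid < DEMO_MAX_NODES);
    return lru->nr_items[nid];
}

/* Scan up to *nr_to_walk items on one node; pretend each one is freed. */
static unsigned long demo_lru_walk_node(struct demo_list_lru *lru, int nid,
                                        unsigned long *nr_to_walk)
{
    unsigned long freed = 0;

    assert(nid >= 0 && nid < DEMO_MAX_NODES);
    while (*nr_to_walk > 0 && lru->nr_items[nid] > 0) {
        lru->nr_items[nid]--;
        (*nr_to_walk)--;
        freed++;
    }
    return freed;
}

/* Shrinker-facing wrappers: nid and nr_to_scan come from the control
 * structure, so a later addition (such as a target memcg) stays in one place. */
static unsigned long demo_lru_shrink_count(const struct demo_list_lru *lru,
                                           const struct demo_shrink_control *sc)
{
    return demo_lru_count_node(lru, sc->nid);
}

static unsigned long demo_lru_shrink_walk(struct demo_list_lru *lru,
                                          struct demo_shrink_control *sc)
{
    return demo_lru_walk_node(lru, sc->nid, &sc->nr_to_scan);
}
```

Shrinkers that call only the two `*_shrink_*` wrappers do not need to change when the control structure grows, which is the rationale given in the commit message.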
  13. 11 Feb, 2015: 1 commit
  14. 10 Feb, 2015: 1 commit
  15. 05 Feb, 2015: 1 commit
    • vfs: add support for a lazytime mount option · 0ae45f63
      Theodore Ts'o committed
      Add a new mount option which enables a new "lazytime" mode.  This mode
      causes atime, mtime, and ctime updates to only be made to the
      in-memory version of the inode.  The on-disk times will only get
      updated (a) if the inode needs to be updated for some non-time
      related change, (b) if userspace calls fsync(), syncfs() or sync(), or
      (c) just before an undeleted inode is evicted from memory.
      
      This is OK according to POSIX because there are no guarantees after a
      crash unless userspace explicitly requests them via an fsync(2) call.
      
      For workloads which feature a large number of random writes to a
      preallocated file, the lazytime mount option significantly reduces
      writes to the inode table.  The repeated 4k writes to a single block
      will result in undesirable stress on flash devices and SMR disk
      drives.  Even on conventional HDDs, the repeated writes to the inode
      table block will trigger Adjacent Track Interference (ATI) remediation
      latencies, which very negatively impact long tail latencies --- which
      is a very big deal for web serving tiers (for example).
      
      Google-Bug-Id: 18297052
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
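The rule can be modelled as a "timestamps dirty" bit that defers the inode-table write. This is a hypothetical sketch (`struct lazy_inode` and the `lazy_*` functions are invented, and only mtime is tracked), not the VFS implementation; it shows the three flush points listed above and how many inode writes a burst of updates costs with and without lazytime.

```c
#include <stdbool.h>
#include <time.h>

struct lazy_inode {
    time_t mem_mtime;          /* in-memory timestamp */
    time_t disk_mtime;         /* timestamp last written to the inode table */
    bool time_dirty;           /* in-memory time newer than on-disk time */
    unsigned int disk_writes;  /* inode-table writes issued */
};

static void lazy_write_inode(struct lazy_inode *ino)
{
    ino->disk_mtime = ino->mem_mtime;
    ino->time_dirty = false;
    ino->disk_writes++;
}

/* A data write updates mtime; with lazytime only memory is touched. */
static void lazy_update_time(struct lazy_inode *ino, time_t now, bool lazytime)
{
    ino->mem_mtime = now;
    if (lazytime)
        ino->time_dirty = true;
    else
        lazy_write_inode(ino);
}

/* (a) a non-time change writes the inode, carrying the times with it */
static void lazy_nontime_change(struct lazy_inode *ino)
{
    lazy_write_inode(ino);
}

/* (b) fsync()/syncfs()/sync() and (c) eviction flush pending times */
static void lazy_sync_or_evict(struct lazy_inode *ino)
{
    if (ino->time_dirty)
        lazy_write_inode(ino);
}
```

In this model, 100 timestamp updates cost 100 inode-table writes without lazytime and a single write at the next sync or eviction with it.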
  16. 04 Feb, 2015: 1 commit
  17. 28 Jan, 2015: 1 commit
    • quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space units · 14bf61ff
      Jan Kara committed
      Currently ->get_dqblk() and ->set_dqblk() use struct fs_disk_quota which
      tracks space limits and usage in 512-byte blocks. However VFS quotas
      track usage in bytes (as some filesystems require that) and we need to
      somehow pass this information. Up to now it wasn't a problem because we
      didn't do any unit conversion (thus VFS quota routines happily stuck the
      number of bytes into the d_bcount field of struct fs_disk_quota). Only if
      you tried to use Q_XGETQUOTA or Q_XSETQLIM for VFS quotas (or Q_GETQUOTA
      / Q_SETQUOTA for XFS quotas) did you get bogus results. Hardly anyone
      tried this, but reportedly some Samba users hit the problem in practice.
      So if we want the interfaces to be compatible, we need to fix this.
      
      We bite the bullet and define another quota structure used for passing
      information from/to ->get_dqblk()/->set_dqblk(). It's somewhat sad that we
      need more conversion routines in fs/quota/quota.c and that another copy
      of the quota structure slows down getting quota information by about 2%,
      but it seems cleaner than overloading e.g. the units of d_bcount to bytes.
      
      CC: stable@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
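The unit mismatch is easy to quantify. A minimal sketch, assuming hypothetical helper and struct names (`demo_bytes_to_bb()`, `demo_bb_to_bytes()`, `struct demo_disk_quota`, `struct demo_byte_quota`) and rounding up so that a non-zero byte count never reports as zero blocks: converting at the interface keeps both sides consistent, while copying a byte count unconverted into a 512-byte-block field inflates it by a factor of 512.

```c
#include <stdint.h>

#define DEMO_BB_SHIFT 9                       /* 512-byte basic blocks */
#define DEMO_BB_SIZE  (1ULL << DEMO_BB_SHIFT)

/* Bytes -> 512-byte blocks, rounding a partial block up. */
static uint64_t demo_bytes_to_bb(uint64_t bytes)
{
    return (bytes + DEMO_BB_SIZE - 1) >> DEMO_BB_SHIFT;
}

/* 512-byte blocks -> bytes. */
static uint64_t demo_bb_to_bytes(uint64_t bb)
{
    return bb << DEMO_BB_SHIFT;
}

/* Hypothetical legacy record (512-byte blocks) and byte-based record. */
struct demo_disk_quota { uint64_t d_bcount; };
struct demo_byte_quota { uint64_t space; };

/* Convert at the boundary instead of copying the raw byte count. */
static void demo_to_legacy(const struct demo_byte_quota *src,
                           struct demo_disk_quota *dst)
{
    dst->d_bcount = demo_bytes_to_bb(src->space);
}
```

For 1 MiB of usage, the converted `d_bcount` is 2048 blocks; storing the raw 1048576 instead would be read back as 512 MiB, which is the kind of bogus result the commit message mentions.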
  18. 27 Jan, 2015: 1 commit
  19. 21 Jan, 2015: 2 commits
  20. 13 Jan, 2015: 1 commit
  21. 09 Jan, 2015: 1 commit
  22. 20 Nov, 2014: 7 commits