1. 16 12月, 2010 1 次提交
    • S
      GFS2: Don't flush delete workqueue when releasing the transaction lock · 846f4045
      Steven Whitehouse 提交于
      There is no requirement to flush the delete workqueue before a
      gfs2 filesystem is suspended. The workqueue's work will just
      be suspended along with the rest of the tasks on the filesystem.
      
      The resolves a deadlock situation where the transaction lock's
      demotion code was trying to flush the delete workqueue while at
      the same time, the workqueue was waiting for the transaction
      lock.
      
      The delete workqueue is flushed by gfs2_make_fs_ro() already, so
      that umount/remount are correctly protected anyway.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      846f4045
  2. 08 12月, 2010 1 次提交
    • B
      GFS2: fsck.gfs2 reported statfs error after gfs2_grow · bcd7278d
      Bob Peterson 提交于
      When you do gfs2_grow it failed to take the very last
      rgrp into account when adding up the new free space due
      to an off-by-one error.  It was not reading the last
      rgrp from the rindex because of a check for "<=" that
      should have been "<".  Therefore, fsck.gfs2 was finding
      (and fixing) an error with the system statfs file.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      bcd7278d
  3. 30 11月, 2010 13 次提交
  4. 19 11月, 2010 1 次提交
  5. 15 11月, 2010 1 次提交
    • S
      GFS2: Fix inode deallocation race · 044b9414
      Steven Whitehouse 提交于
      This area of the code has always been a bit delicate due to the
      subtleties of lock ordering. The problem is that for "normal"
      alloc/dealloc, we always grab the inode locks first and the rgrp lock
      later.
      
      In order to ensure no races in looking up the unlinked, but still
      allocated inodes, we need to hold the rgrp lock when we do the lookup,
      which means that we can't take the inode glock.
      
      The solution is to borrow the technique already used by NFS to solve
      what is essentially the same problem (given an inode number, look up
      the inode carefully, checking that it really is in the expected
      state).
      
      We cannot do that directly from the allocation code (lock ordering
      again) so we give the job to the pre-existing delete workqueue and
      carry on with the allocation as normal.
      
      If we find there is no space, we do a journal flush (required anyway
      if space from a deallocation is to be released) which should block
      against the pending deallocations, so we should always get the space
      back.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      044b9414
  6. 31 10月, 2010 2 次提交
  7. 29 10月, 2010 1 次提交
  8. 27 10月, 2010 1 次提交
    • W
      writeback: remove nonblocking/encountered_congestion references · 1b430bee
      Wu Fengguang 提交于
      This removes more dead code that was somehow missed by commit 0d99519e
      (writeback: remove unused nonblocking and congestion checks).  There are
      no behavior change except for the removal of two entries from one of the
      ext4 tracing interface.
      
      The nonblocking checks in ->writepages are no longer used because the
      flusher now prefer to block on get_request_wait() than to skip inodes on
      IO congestion.  The latter will lead to more seeky IO.
      
      The nonblocking checks in ->writepage are no longer used because it's
      redundant with the WB_SYNC_NONE check.
      
      We no long set ->nonblocking in VM page out and page migration, because
      a) it's effectively redundant with WB_SYNC_NONE in current code
      b) it's old semantic of "Don't get stuck on request queues" is mis-behavior:
         that would skip some dirty inodes on congestion and page out others, which
         is unfair in terms of LRU age.
      
      Inspired by Christoph Hellwig. Thanks!
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Sage Weil <sage@newdream.net>
      Cc: Steve French <sfrench@samba.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1b430bee
  9. 26 10月, 2010 3 次提交
  10. 18 10月, 2010 1 次提交
  11. 15 10月, 2010 1 次提交
    • A
      llseek: automatically add .llseek fop · 6038f373
      Arnd Bergmann 提交于
      All file_operations should get a .llseek operation so we can make
      nonseekable_open the default for future file operations without a
      .llseek pointer.
      
      The three cases that we can automatically detect are no_llseek, seq_lseek
      and default_llseek. For cases where we can we can automatically prove that
      the file offset is always ignored, we use noop_llseek, which maintains
      the current behavior of not returning an error from a seek.
      
      New drivers should normally not use noop_llseek but instead use no_llseek
      and call nonseekable_open at open time.  Existing drivers can be converted
      to do the same when the maintainer knows for certain that no user code
      relies on calling seek on the device file.
      
      The generated code is often incorrectly indented and right now contains
      comments that clarify for each added line why a specific variant was
      chosen. In the version that gets submitted upstream, the comments will
      be gone and I will manually fix the indentation, because there does not
      seem to be a way to do that using coccinelle.
      
      Some amount of new code is currently sitting in linux-next that should get
      the same modifications, which I will do at the end of the merge window.
      
      Many thanks to Julia Lawall for helping me learn to write a semantic
      patch that does all this.
      
      ===== begin semantic patch =====
      // This adds an llseek= method to all file operations,
      // as a preparation for making no_llseek the default.
      //
      // The rules are
      // - use no_llseek explicitly if we do nonseekable_open
      // - use seq_lseek for sequential files
      // - use default_llseek if we know we access f_pos
      // - use noop_llseek if we know we don't access f_pos,
      //   but we still want to allow users to call lseek
      //
      @ open1 exists @
      identifier nested_open;
      @@
      nested_open(...)
      {
      <+...
      nonseekable_open(...)
      ...+>
      }
      
      @ open exists@
      identifier open_f;
      identifier i, f;
      identifier open1.nested_open;
      @@
      int open_f(struct inode *i, struct file *f)
      {
      <+...
      (
      nonseekable_open(...)
      |
      nested_open(...)
      )
      ...+>
      }
      
      @ read disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      <+...
      (
         *off = E
      |
         *off += E
      |
         func(..., off, ...)
      |
         E = *off
      )
      ...+>
      }
      
      @ read_no_fpos disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ write @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      <+...
      (
        *off = E
      |
        *off += E
      |
        func(..., off, ...)
      |
        E = *off
      )
      ...+>
      }
      
      @ write_no_fpos @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ fops0 @
      identifier fops;
      @@
      struct file_operations fops = {
       ...
      };
      
      @ has_llseek depends on fops0 @
      identifier fops0.fops;
      identifier llseek_f;
      @@
      struct file_operations fops = {
      ...
       .llseek = llseek_f,
      ...
      };
      
      @ has_read depends on fops0 @
      identifier fops0.fops;
      identifier read_f;
      @@
      struct file_operations fops = {
      ...
       .read = read_f,
      ...
      };
      
      @ has_write depends on fops0 @
      identifier fops0.fops;
      identifier write_f;
      @@
      struct file_operations fops = {
      ...
       .write = write_f,
      ...
      };
      
      @ has_open depends on fops0 @
      identifier fops0.fops;
      identifier open_f;
      @@
      struct file_operations fops = {
      ...
       .open = open_f,
      ...
      };
      
      // use no_llseek if we call nonseekable_open
      ////////////////////////////////////////////
      @ nonseekable1 depends on !has_llseek && has_open @
      identifier fops0.fops;
      identifier nso ~= "nonseekable_open";
      @@
      struct file_operations fops = {
      ...  .open = nso, ...
      +.llseek = no_llseek, /* nonseekable */
      };
      
      @ nonseekable2 depends on !has_llseek @
      identifier fops0.fops;
      identifier open.open_f;
      @@
      struct file_operations fops = {
      ...  .open = open_f, ...
      +.llseek = no_llseek, /* open uses nonseekable */
      };
      
      // use seq_lseek for sequential files
      /////////////////////////////////////
      @ seq depends on !has_llseek @
      identifier fops0.fops;
      identifier sr ~= "seq_read";
      @@
      struct file_operations fops = {
      ...  .read = sr, ...
      +.llseek = seq_lseek, /* we have seq_read */
      };
      
      // use default_llseek if there is a readdir
      ///////////////////////////////////////////
      @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier readdir_e;
      @@
      // any other fop is used that changes pos
      struct file_operations fops = {
      ... .readdir = readdir_e, ...
      +.llseek = default_llseek, /* readdir is present */
      };
      
      // use default_llseek if at least one of read/write touches f_pos
      /////////////////////////////////////////////////////////////////
      @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read.read_f;
      @@
      // read fops use offset
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = default_llseek, /* read accesses f_pos */
      };
      
      @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ... .write = write_f, ...
      +	.llseek = default_llseek, /* write accesses f_pos */
      };
      
      // Use noop_llseek if neither read nor write accesses f_pos
      ///////////////////////////////////////////////////////////
      
      @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      identifier write_no_fpos.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ...
       .write = write_f,
       .read = read_f,
      ...
      +.llseek = noop_llseek, /* read and write both use no f_pos */
      };
      
      @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write_no_fpos.write_f;
      @@
      struct file_operations fops = {
      ... .write = write_f, ...
      +.llseek = noop_llseek, /* write uses no f_pos */
      };
      
      @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      @@
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = noop_llseek, /* read uses no f_pos */
      };
      
      @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      @@
      struct file_operations fops = {
      ...
      +.llseek = noop_llseek, /* no read or write fn */
      };
      ===== End semantic patch =====
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Julia Lawall <julia@diku.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      6038f373
  12. 11 10月, 2010 1 次提交
    • T
      workqueue: add and use WQ_MEM_RECLAIM flag · 6370a6ad
      Tejun Heo 提交于
      Add WQ_MEM_RECLAIM flag which currently maps to WQ_RESCUER, mark
      WQ_RESCUER as internal and replace all external WQ_RESCUER usages to
      WQ_MEM_RECLAIM.
      
      This makes the API users express the intent of the workqueue instead
      of indicating the internal mechanism used to guarantee forward
      progress.  This is also to make it cleaner to add more semantics to
      WQ_MEM_RECLAIM.  For example, if deemed necessary, memory reclaim
      workqueues can be made highpri.
      
      This patch doesn't introduce any functional change.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Jeff Garzik <jgarzik@pobox.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      6370a6ad
  13. 06 10月, 2010 1 次提交
  14. 05 10月, 2010 1 次提交
    • A
      fs/locks.c: prepare for BKL removal · b89f4321
      Arnd Bergmann 提交于
      This prepares the removal of the big kernel lock from the
      file locking code. We still use the BKL as long as fs/lockd
      uses it and ceph might sleep, but we can flip the definition
      to a private spinlock as soon as that's done.
      All users outside of fs/lockd get converted to use
      lock_flocks() instead of lock_kernel() where appropriate.
      
      Based on an earlier patch to use a spinlock from Matthew
      Wilcox, who has attempted this a few times before, the
      earliest patch from over 10 years ago turned it into
      a semaphore, which ended up being slower than the BKL
      and was subsequently reverted.
      
      Someone should do some serious performance testing when
      this becomes a spinlock, since this has caused problems
      before. Using a spinlock should be at least as good
      as the BKL in theory, but who knows...
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NMatthew Wilcox <willy@linux.intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Sage Weil <sage@newdream.net>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      b89f4321
  15. 01 10月, 2010 1 次提交
    • B
      GFS2 fatal: filesystem consistency error on rename · 46290341
      Bob Peterson 提交于
      This patch fixes a GFS2 problem whereby the first rename after a
      mount can result in a file system consistency error being flagged
      improperly and cause the file system to withdraw.  The problem is
      that the rename code tries to run the rgrp list with function
      gfs2_blk2rgrpd before the rgrp list is guaranteed to be read in
      from disk.  The patch makes the rename function hold the rindex
      glock (as the gfs2_unlink code does today) which reads in the rgrp
      list if need be.  There were a total of three places in the rename
      code that improperly referenced the rgrp list without the rindex
      glock and this patch fixes all three.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      46290341
  16. 29 9月, 2010 3 次提交
    • S
      GFS2: Improve journal allocation via sysfs · feb47ca9
      Steven Whitehouse 提交于
      Recently a feature was added to GFS2 to allow journal id allocation
      via sysfs. This patch builds upon that so that a negative journal id
      will be treated as an error code to be passed back as the return code
      from mount. This allows termination of the mount process if there is
      a failure.
      
      Also, the process has been updated so that the kernel will wait
      for a journal id, even in the "spectator" case. This is required
      in order to avoid mounting a filesystem in case there is an error
      while joining the cluster. In the spectator case, 0 is written into
      the file to indicate that all is well, and that mount should continue.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      feb47ca9
    • S
      GFS2: Add "norecovery" mount option as a synonym for "spectator" · 43f74c19
      Steven Whitehouse 提交于
      XFS supports the "norecovery" mount option which is basically the
      same as the GFS2 spectator mode. This adds support for "norecovery"
      as a synonym for spectator mode, which is hopefully a more obvious
      description of what it actually does.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      43f74c19
    • S
      GFS2: Fix spectator umount issue · c741c455
      Steven Whitehouse 提交于
      The tests further down the recovery function relating to
      unlocking the journal need to be updated to match the
      intial test. Also, a test in the umount code which was
      surplus to requirements has been removed. Umounting
      spectator mounts now works correctly, as expected.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      c741c455
  17. 28 9月, 2010 2 次提交
  18. 27 9月, 2010 1 次提交
  19. 24 9月, 2010 1 次提交
  20. 23 9月, 2010 2 次提交
    • S
      GFS2: Remove localcaching mount option · c2048b00
      Steven Whitehouse 提交于
      This option defaulted to on for lock_nolock mounts and off
      otherwise. The only function was to avoid the revalidation of
      dentries. In the cluster case, that is entirely pointless and
      liable to cause coherency problems.
      
      The patch changes the revalidation to depend upon whether the
      fs is a local or cluster fs (i.e. it follows the existing default
      behaviour). I very much doubt anybody ever used this option as
      there is no reason to. Even so we will continue to accept it
      on the mount command line, but ignore it.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      c2048b00
    • S
      GFS2: Remove ignore_local_fs mount argument · f57a024e
      Steven Whitehouse 提交于
      This is been a no-op for a very long time now. I'm pretty sure
      nobody uses it, but just in case we'll still accept it on the
      command line, but ignore it.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      f57a024e
  21. 20 9月, 2010 1 次提交