1. 21 10月, 2011 3 次提交
    • S
      GFS2: Use ->dirty_inode() · ab9bbda0
      Steven Whitehouse 提交于
      The aim of this patch is to use the newly enhanced ->dirty_inode()
      super block operation to deal with atime updates, rather than
      piggy backing that code into ->write_inode() as is currently
      done.
      
      The net result is a simplification of the code in various places
      and a reduction of the number of gfs2_dinode_out() calls since
      this is now implied by ->dirty_inode().
      
      Some of the mark_inode_dirty() calls have been moved under glocks
      in order to take advantage of then being able to avoid locking in
      ->dirty_inode() when we already have suitable locks.
      
      One consequence is that generic_write_end() now correctly deals
      with file size updates, so that we do not need a separate check
      for that afterwards. This also, indirectly, means that fdatasync
      should work correctly on GFS2 - the current code always syncs the
      metadata whether it needs to or not.
      
      Has survived testing with postmark (with and without atime) and
      also fsx.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      ab9bbda0
    • S
      GFS2: Fix bug trap and journaled data fsync · f1818529
      Steven Whitehouse 提交于
      Journaled data requires that a complete flush of all dirty data for
      the file is done, in order that the ail flush which comes after
      will succeed.
      
      Also the recently enhanced bug trap can trigger falsely in case
      an ail flush from fsync races with a page read. This updates the
      bug trap such that it will ignore buffers which are locked and
      only trigger on dirty and/or pinned buffers when the ail flush
      is run from fsync. The original bug trap is retained when ail
      flush is run from ->go_sync()
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      f1818529
    • S
      GFS2: Split data write & wait in fsync · 2f0264d5
      Steven Whitehouse 提交于
      Now that the data writing is part of fsync proper, we can split
      the waiting part out and do it later on. This reduces the
      number of waits that we do during fsync on average.
      
      There is also no need to take the i_mutex unless we are flushing
      metadata to disk, so we can move that to within the metadata
      flushing code.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      2f0264d5
  2. 21 7月, 2011 1 次提交
    • J
      fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers · 02c24a82
      Josef Bacik 提交于
      Btrfs needs to be able to control how filemap_write_and_wait_range() is called
      in fsync to make it less of a painful operation, so push down taking i_mutex and
      the calling of filemap_write_and_wait() down into the ->fsync() handlers.  Some
      file systems can drop taking the i_mutex altogether it seems, like ext3 and
      ocfs2.  For correctness sake I just pushed everything down in all cases to make
      sure that we keep the current behavior the same for everybody, and then each
      individual fs maintainer can make up their mind about what to do from there.
      Thanks,
      Acked-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      02c24a82
  3. 20 7月, 2011 1 次提交
  4. 15 7月, 2011 1 次提交
  5. 03 5月, 2011 1 次提交
    • B
      GFS2: make sure fallocate bytes is a multiple of blksize · 6905d9e4
      Benjamin Marzinski 提交于
      The GFS2 fallocate code chooses a target size to for allocating chunks of
      space.  Whenever it can't find any resource groups with enough space free, it
      halves its target. Since this target is in bytes, eventually it will no longer
      be a multiple of blksize.  As long as there is more space available in the
      resource group than the target, this isn't a problem, since gfs2 will use the
      actual space available, which is always a multiple of blksize.  However,
      when gfs couldn't fallocate a bigger chunk than the target, it was using the
      non-blksize aligned number. This caused a BUG in later code that required
      blksize aligned offsets.  GFS2 now ensures that bytes is always a multiple of
      blksize
      Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      6905d9e4
  6. 20 4月, 2011 1 次提交
    • S
      GFS2: Clean up fsync() · dba898b0
      Steven Whitehouse 提交于
      This patch is designed to clean up GFS2's fsync
      implementation and ensure that it really does get everything on
      disk. Since ->write_inode() has been updated, we can call that
      via the vfs library function sync_inode_metadata() and the only
      remaining thing that has to be done is to ensure that we get
      any revoke records in the log after the inode has been written back.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      dba898b0
  7. 18 4月, 2011 1 次提交
  8. 24 3月, 2011 1 次提交
  9. 11 3月, 2011 1 次提交
  10. 09 3月, 2011 1 次提交
  11. 02 2月, 2011 1 次提交
    • S
      GFS2: Improve cluster mmap scalability · b9c93bb7
      Steven Whitehouse 提交于
      The mmap system call grabs a glock when an update to atime maybe
      required. It does this in order to ensure that the flags on the
      inode are uptodate, but since it will only mark atime for a future
      update, an exclusive lock is not required here (one will be taken
      later when the actual update is performed).
      
      Also, the lock can be skipped when the mount is marked noatime in
      addition to the original check which only looked at the noatime
      flag for the inode itself.
      
      This should increase the scalability of the mmap call when multiple
      nodes are all mmaping the same file.
      Reported-by: NScooter Morris <scooter@cgl.ucsf.edu>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      b9c93bb7
  12. 17 1月, 2011 1 次提交
    • C
      fallocate should be a file operation · 2fe17c10
      Christoph Hellwig 提交于
      Currently all filesystems except XFS implement fallocate asynchronously,
      while XFS forced a commit.  Both of these are suboptimal - in case of O_SYNC
      I/O we really want our allocation on disk, especially for the !KEEP_SIZE
      case where we actually grow the file with user-visible zeroes.  On the
      other hand always commiting the transaction is a bad idea for fast-path
      uses of fallocate like for example in recent Samba versions.   Given
      that block allocation is a data plane operation anyway change it from
      an inode operation to a file operation so that we have the file structure
      available that lets us check for O_SYNC.
      
      This also includes moving the code around for a few of the filesystems,
      and remove the already unnedded S_ISDIR checks given that we only wire
      up fallocate for regular files.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      2fe17c10
  13. 07 1月, 2011 1 次提交
  14. 31 10月, 2010 2 次提交
  15. 15 10月, 2010 1 次提交
    • A
      llseek: automatically add .llseek fop · 6038f373
      Arnd Bergmann 提交于
      All file_operations should get a .llseek operation so we can make
      nonseekable_open the default for future file operations without a
      .llseek pointer.
      
      The three cases that we can automatically detect are no_llseek, seq_lseek
      and default_llseek. For cases where we can we can automatically prove that
      the file offset is always ignored, we use noop_llseek, which maintains
      the current behavior of not returning an error from a seek.
      
      New drivers should normally not use noop_llseek but instead use no_llseek
      and call nonseekable_open at open time.  Existing drivers can be converted
      to do the same when the maintainer knows for certain that no user code
      relies on calling seek on the device file.
      
      The generated code is often incorrectly indented and right now contains
      comments that clarify for each added line why a specific variant was
      chosen. In the version that gets submitted upstream, the comments will
      be gone and I will manually fix the indentation, because there does not
      seem to be a way to do that using coccinelle.
      
      Some amount of new code is currently sitting in linux-next that should get
      the same modifications, which I will do at the end of the merge window.
      
      Many thanks to Julia Lawall for helping me learn to write a semantic
      patch that does all this.
      
      ===== begin semantic patch =====
      // This adds an llseek= method to all file operations,
      // as a preparation for making no_llseek the default.
      //
      // The rules are
      // - use no_llseek explicitly if we do nonseekable_open
      // - use seq_lseek for sequential files
      // - use default_llseek if we know we access f_pos
      // - use noop_llseek if we know we don't access f_pos,
      //   but we still want to allow users to call lseek
      //
      @ open1 exists @
      identifier nested_open;
      @@
      nested_open(...)
      {
      <+...
      nonseekable_open(...)
      ...+>
      }
      
      @ open exists@
      identifier open_f;
      identifier i, f;
      identifier open1.nested_open;
      @@
      int open_f(struct inode *i, struct file *f)
      {
      <+...
      (
      nonseekable_open(...)
      |
      nested_open(...)
      )
      ...+>
      }
      
      @ read disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      <+...
      (
         *off = E
      |
         *off += E
      |
         func(..., off, ...)
      |
         E = *off
      )
      ...+>
      }
      
      @ read_no_fpos disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ write @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      <+...
      (
        *off = E
      |
        *off += E
      |
        func(..., off, ...)
      |
        E = *off
      )
      ...+>
      }
      
      @ write_no_fpos @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ fops0 @
      identifier fops;
      @@
      struct file_operations fops = {
       ...
      };
      
      @ has_llseek depends on fops0 @
      identifier fops0.fops;
      identifier llseek_f;
      @@
      struct file_operations fops = {
      ...
       .llseek = llseek_f,
      ...
      };
      
      @ has_read depends on fops0 @
      identifier fops0.fops;
      identifier read_f;
      @@
      struct file_operations fops = {
      ...
       .read = read_f,
      ...
      };
      
      @ has_write depends on fops0 @
      identifier fops0.fops;
      identifier write_f;
      @@
      struct file_operations fops = {
      ...
       .write = write_f,
      ...
      };
      
      @ has_open depends on fops0 @
      identifier fops0.fops;
      identifier open_f;
      @@
      struct file_operations fops = {
      ...
       .open = open_f,
      ...
      };
      
      // use no_llseek if we call nonseekable_open
      ////////////////////////////////////////////
      @ nonseekable1 depends on !has_llseek && has_open @
      identifier fops0.fops;
      identifier nso ~= "nonseekable_open";
      @@
      struct file_operations fops = {
      ...  .open = nso, ...
      +.llseek = no_llseek, /* nonseekable */
      };
      
      @ nonseekable2 depends on !has_llseek @
      identifier fops0.fops;
      identifier open.open_f;
      @@
      struct file_operations fops = {
      ...  .open = open_f, ...
      +.llseek = no_llseek, /* open uses nonseekable */
      };
      
      // use seq_lseek for sequential files
      /////////////////////////////////////
      @ seq depends on !has_llseek @
      identifier fops0.fops;
      identifier sr ~= "seq_read";
      @@
      struct file_operations fops = {
      ...  .read = sr, ...
      +.llseek = seq_lseek, /* we have seq_read */
      };
      
      // use default_llseek if there is a readdir
      ///////////////////////////////////////////
      @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier readdir_e;
      @@
      // any other fop is used that changes pos
      struct file_operations fops = {
      ... .readdir = readdir_e, ...
      +.llseek = default_llseek, /* readdir is present */
      };
      
      // use default_llseek if at least one of read/write touches f_pos
      /////////////////////////////////////////////////////////////////
      @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read.read_f;
      @@
      // read fops use offset
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = default_llseek, /* read accesses f_pos */
      };
      
      @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ... .write = write_f, ...
      +	.llseek = default_llseek, /* write accesses f_pos */
      };
      
      // Use noop_llseek if neither read nor write accesses f_pos
      ///////////////////////////////////////////////////////////
      
      @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      identifier write_no_fpos.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ...
       .write = write_f,
       .read = read_f,
      ...
      +.llseek = noop_llseek, /* read and write both use no f_pos */
      };
      
      @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write_no_fpos.write_f;
      @@
      struct file_operations fops = {
      ... .write = write_f, ...
      +.llseek = noop_llseek, /* write uses no f_pos */
      };
      
      @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      @@
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = noop_llseek, /* read uses no f_pos */
      };
      
      @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      @@
      struct file_operations fops = {
      ...
      +.llseek = noop_llseek, /* no read or write fn */
      };
      ===== End semantic patch =====
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Julia Lawall <julia@diku.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      6038f373
  16. 05 10月, 2010 1 次提交
    • A
      fs/locks.c: prepare for BKL removal · b89f4321
      Arnd Bergmann 提交于
      This prepares the removal of the big kernel lock from the
      file locking code. We still use the BKL as long as fs/lockd
      uses it and ceph might sleep, but we can flip the definition
      to a private spinlock as soon as that's done.
      All users outside of fs/lockd get converted to use
      lock_flocks() instead of lock_kernel() where appropriate.
      
      Based on an earlier patch to use a spinlock from Matthew
      Wilcox, who has attempted this a few times before, the
      earliest patch from over 10 years ago turned it into
      a semaphore, which ended up being slower than the BKL
      and was subsequently reverted.
      
      Someone should do some serious performance testing when
      this becomes a spinlock, since this has caused problems
      before. Using a spinlock should be at least as good
      as the BKL in theory, but who knows...
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NMatthew Wilcox <willy@linux.intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Sage Weil <sage@newdream.net>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      b89f4321
  17. 28 9月, 2010 1 次提交
  18. 20 9月, 2010 1 次提交
  19. 29 7月, 2010 1 次提交
  20. 28 5月, 2010 1 次提交
  21. 24 5月, 2010 1 次提交
  22. 12 3月, 2010 1 次提交
  23. 08 1月, 2010 1 次提交
    • S
      GFS2: Ensure uptodate inode size when using O_APPEND · 56aa616a
      Steven Whitehouse 提交于
      The VFS reads the inode size during generic_file_aio_write() but
      with no locking around it. In order to get the expected result
      from O_APPEND opens, this patch updated the inode size before
      calling generic_file_aio_write()
      
      There is of course still a race here, in that there is nothing to
      prevent another node coming in and extending the file in the
      mean time. On the other hand, when used with file locking this
      will ensure that the expected results are obtained.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      56aa616a
  24. 28 9月, 2009 1 次提交
  25. 27 8月, 2009 1 次提交
    • S
      GFS2: Clean up of extended attribute support · 40b78a32
      Steven Whitehouse 提交于
      This has been on my list for some time. We need to change the way
      in which we handle extended attributes to allow faster file creation
      times (by reducing the number of transactions required) and the
      extended attribute code is the main obstacle to this.
      
      In addition to that, the VFS provides a way to demultiplex the xattr
      calls which we ought to be using, rather than rolling our own. This
      patch changes the GFS2 code to use that VFS feature and as a result
      the code shrinks by a couple of hundred lines or so, and becomes
      easier to read.
      
      I'm planning on doing further clean up work in this area, but this
      patch is a good start. The cleaned up code also uses the more usual
      "xattr" shorthand, I plan to eliminate the use of "eattr" eventually
      and in the mean time it serves as a flag as to which bits of the code
      have been updated.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      40b78a32
  26. 03 6月, 2009 1 次提交
  27. 02 6月, 2009 1 次提交
  28. 22 5月, 2009 1 次提交
    • S
      GFS2: Clean up some file names · b1e71b06
      Steven Whitehouse 提交于
      This patch renames the ops_*.c files which have no counterpart
      without the ops_ prefix in order to shorten the name and make
      it more readable. In addition, ops_address.h (which was very
      small) is moved into inode.h and inode.h is cleaned up by
      adding extern where required.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      b1e71b06
  29. 11 5月, 2009 1 次提交
    • S
      GFS2: Something nonlinear this way comes! · 48bf2b17
      Steven Whitehouse 提交于
      For some reason GFS2 has been missing support for non-linear
      mappings. This patch fixes that, and also avoids taking any
      locks for mmap in the O_NOATIME case. In fact we don't actually need
      to take the lock here at all - just doing file_accessed() would be
      enough, but we have to take the lock eventually and this helps
      it hit disk (and thus be seen by other nodes) faster.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      48bf2b17
  30. 20 4月, 2009 1 次提交
  31. 15 4月, 2009 1 次提交
  32. 01 4月, 2009 1 次提交
    • N
      mm: page_mkwrite change prototype to match fault · c2ec175c
      Nick Piggin 提交于
      Change the page_mkwrite prototype to take a struct vm_fault, and return
      VM_FAULT_xxx flags.  There should be no functional change.
      
      This makes it possible to return much more detailed error information to
      the VM (and also can provide more information eg.  virtual_address to the
      driver, which might be important in some special cases).
      
      This is required for a subsequent fix.  And will also make it easier to
      merge page_mkwrite() with fault() in future.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Cc: Artem Bityutskiy <dedekind@infradead.org>
      Cc: Felix Blyakher <felixb@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c2ec175c
  33. 24 3月, 2009 3 次提交
    • S
      Fix a minor bug in the previous patch · 9c538837
      Steven Whitehouse 提交于
      The logic requires that we mark the glock dirty in page_mkwrite
      otherwise we might not flush correctly in the case that no
      allocation was required in the process of dirying the page.
      Also we need to set the shared write flag early for the same
      reason.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      9c538837
    • S
      GFS2: Clean up of glops.c · 6bac243f
      Steven Whitehouse 提交于
      This cleans up a number of bits of code mostly based in glops.c.
      A couple of simple functions have been merged into the callers
      to make it more obvious what is going on, the mysterious raising
      of i_writecount around the truncate_inode_pages() call has been
      removed. The meta_go_* operations have been renamed rgrp_go_*
      since that is the only lock type that they are used with.
      
      The unused argument of gfs2_read_sb has been removed. Also
      a bug has been fixed where a check for the rindex inode was
      in the wrong callback. More comments are added, and the
      debugging code is improved too.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      6bac243f
    • S
      GFS2: Merge lock_dlm module into GFS2 · f057f6cd
      Steven Whitehouse 提交于
      This is the big patch that I've been working on for some time
      now. There are many reasons for wanting to make this change
      such as:
       o Reducing overhead by eliminating duplicated fields between structures
       o Simplifcation of the code (reduces the code size by a fair bit)
       o The locking interface is now the DLM interface itself as proposed
         some time ago.
       o Fewer lookups of glocks when processing replies from the DLM
       o Fewer memory allocations/deallocations for each glock
       o Scope to do further optimisations in the future (but this patch is
         more than big enough for now!)
      
      Please note that (a) this patch relates to the lock_dlm module and
      not the DLM itself, that is still a separate module; and (b) that
      we retain the ability to build GFS2 as a standalone single node
      filesystem with out requiring the DLM.
      
      This patch needs a lot of testing, hence my keeping it I restarted
      my -git tree after the last merge window. That way, this has the maximum
      exposure before its merged. This is (modulo a few minor bug fixes) the
      same patch that I've been posting on and off the the last three months
      and its passed a number of different tests so far.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      f057f6cd
  34. 07 1月, 2009 1 次提交
  35. 05 1月, 2009 1 次提交