  1. 10 Oct 2007 (3 commits)
    • [GFS2] Clean up journaled data writing · 16615be1
      Authored by Steven Whitehouse
      This patch cleans up the code for writing journaled data into the log.
      It also removes the need to allocate a small "tag" structure for each
      block written into the log. Instead we just keep count of the outstanding
      I/O so that we can be sure that it has all been written at the correct time.
      Another result of this patch is that a number of ll_rw_block() calls
      have become submit_bh() calls, closing some races at the same time.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
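      A minimal sketch of the pattern described above, using the 2007-era
      buffer head API; the counter, helper names and completion handler are
      illustrative only and are not the actual GFS2 symbols:

          #include <linux/buffer_head.h>

          /* Count in-flight log writes instead of allocating a per-block
           * "tag" structure, and submit each buffer head directly. */
          static atomic_t log_io_outstanding = ATOMIC_INIT(0);

          static void log_write_endio(struct buffer_head *bh, int uptodate)
          {
                  end_buffer_write_sync(bh, uptodate);  /* unlock + put_bh */
                  atomic_dec(&log_io_outstanding);      /* one write done */
          }

          static void log_write_bh(struct buffer_head *bh)
          {
                  lock_buffer(bh);
                  bh->b_end_io = log_write_endio;
                  get_bh(bh);
                  atomic_inc(&log_io_outstanding);
                  submit_bh(WRITE, bh);  /* was: ll_rw_block(WRITE, 1, &bh) */
          }

      Waiting for the log then reduces to waiting for log_io_outstanding to
      reach zero, which is what makes the small per-block tag allocation
      unnecessary.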
    • [GFS2] Clean up ordered write code · d7b616e2
      Authored by Steven Whitehouse
      The following patch removes the ordered write processing from
      databuf_lo_before_commit() and moves it to log.c. This has the effect of
      greatly simplifying databuf_lo_before_commit() as well as potentially
      making the ordered write code more efficient.
      
      As a side effect of this, it is now possible to remove ordered buffers
      from the ordered buffer list at any time, so we now make use of this in
      invalidatepage and releasepage to ensure timely release of these
      buffers.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Clean up invalidatepage/releasepage · bb3b0e3d
      Authored by Steven Whitehouse
      This patch fixes some bugs relating to journaled data files by cleaning
      up the gfs2_invalidatepage() and gfs2_releasepage() functions. We now
      never block during gfs2_releasepage(); instead we always either release
      or refuse to release, depending on the status of the buffers.
      
      This fixes Red Hat bugzillas #248969 and #252392.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
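      A hedged sketch of the non-blocking releasepage behaviour described:
      walk the page's buffers and refuse (return 0) if any is still busy,
      rather than waiting on it. The function name is a placeholder, not the
      real gfs2_releasepage():

          #include <linux/mm.h>
          #include <linux/buffer_head.h>

          static int example_releasepage(struct page *page, gfp_t gfp_mask)
          {
                  struct buffer_head *bh, *head;

                  if (!page_has_buffers(page))
                          return 1;       /* nothing attached to release */

                  head = bh = page_buffers(page);
                  do {
                          if (buffer_dirty(bh) || buffer_locked(bh) ||
                              atomic_read(&bh->b_count))
                                  return 0;  /* busy: refuse, never block */
                          bh = bh->b_this_page;
                  } while (bh != head);

                  return try_to_free_buffers(page);
          }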
  2. 14 Aug 2007 (1 commit)
  3. 20 Jul 2007 (1 commit)
    • mm: merge populate and nopage into fault (fixes nonlinear) · 54cb8821
      Authored by Nick Piggin
      Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes
      the virtual address -> file offset mapping differently from linear mappings.
      
      ->populate is a layering violation because the filesystem/pagecache code
      should not need to know anything about the virtual memory mapping.  The
      hitch here is that the ->nopage handler didn't pass down enough information
      (i.e. pgoff).  But it is more logical to pass pgoff rather than have the
      ->nopage function calculate it itself anyway (because that's a similar
      layering violation).
      
      Having the populate handler install the pte itself is likewise a nasty thing
      to be doing.
      
      This patch introduces a new fault handler that replaces ->nopage and
      ->populate and (later) ->nopfn.  Most of the old mechanism is still in place
      so there is a lot of duplication and nice cleanups that can be removed if
      everyone switches over.
      
      The rationale for doing this in the first place is that nonlinear mappings are
      subject to the pagefault vs invalidate/truncate race too, and it seemed stupid
      to duplicate the synchronisation logic rather than just consolidate the two.
      
      After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
      pagecache.  Seems like a fringe functionality anyway.
      
      NOPAGE_REFAULT is removed.  This should be implemented with ->fault, and no
      users have hit mainline yet.
      
      [akpm@linux-foundation.org: cleanup]
      [randy.dunlap@oracle.com: doc. fixes for readahead]
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
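      A hedged, minimal sketch of the ->fault interface this patch introduces
      (2.6.23-era signatures), not the patch itself: the VM hands the file
      offset to the handler in struct vm_fault and installs the pte from the
      returned page, so the handler never touches page tables. The function
      and vm_ops names are illustrative:

          #include <linux/mm.h>
          #include <linux/pagemap.h>

          /* Trivial ->fault handler: look the page up in the page cache
           * using the pgoff supplied by the VM. */
          static int example_fault(struct vm_area_struct *vma,
                                   struct vm_fault *vmf)
          {
                  struct address_space *mapping = vma->vm_file->f_mapping;
                  struct page *page;

                  page = find_get_page(mapping, vmf->pgoff);
                  if (!page)
                          return VM_FAULT_SIGBUS; /* real code reads it in */

                  vmf->page = page;       /* the VM installs the pte */
                  return 0;
          }

          static struct vm_operations_struct example_vm_ops = {
                  .fault = example_fault, /* replaces ->nopage and ->populate */
          };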
  4. 09 Jul 2007 (7 commits)
    • [GFS2] Use zero_user_page() in stuffed_readpage() · 2840501a
      Authored by Steven Whitehouse
      As suggested by Robert P. J. Day <rpjday@mindspring.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Robert P. J. Day <rpjday@mindspring.com>
    • [GFS2] Journaled file write/unstuff bug · 8fb68595
      Authored by Robert Peterson
      This patch is for bugzilla bug 283162, which uncovered a number of
      bugs pertaining to writing to files that have the journaled bit on.
      These bugs happen most often when writing to the meta_fs because
      the files are always journaled.  So operations like gfs2_grow were
      particularly vulnerable, although many of the problems could be
      recreated with normal files after setting the journaled bit on.
      The problems fixed are:
      
      -GFS2 wasn't ever writing unstuffed journaled data blocks to their
       in-place location on disk. Now it does.
      
      -If you unmounted too quickly after doing IO to a journaled file,
       GFS2 was crashing because you would discard a buffer whose bufdata
       was still on the active items list.  GFS2 now deals with this
       gracefully.
      
      -GFS2 was losing track of the bufdata for journaled data blocks,
       and it wasn't getting freed, causing an error when you tried to
       unmount the module.  GFS2 now frees all the bufdata structures.
      
      -There was a memory corruption occurring because GFS2 wrote
       twice as many log entries for journaled buffers.
      
      -It was occasionally trying to write journal headers in buffers
       that weren't currently mapped.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] fix jdata issues · ddf4b426
      Authored by Benjamin Marzinski
      This is a patch for the first three issues of RHBZ #238162
      
      The first issue is that when you allocate a new page for a file, it will not
      start off uptodate. This makes sense, since you haven't written anything to that
      part of the file yet.  Unfortunately, gfs2_pin() checks to make sure that the
      buffers are uptodate.  The solution to this is to mark the buffers uptodate in
      gfs2_commit_write(), after they have been zeroed out and have the data written
      into them.  I'm pretty confident with this fix, although it's not completely
      obvious that there is no problem with marking the buffers uptodate here.
      
      The second issue is simply that you can try to pin a data buffer that is already
      on the incore log, and thus, already pinned. This patch checks to see if this
      buffer is already on the log, and exits databuf_lo_add() if it is, just like
      buf_lo_add() does.
      
      The third issue is that gfs2_log_flush() doesn't do its block accounting
      correctly.  Both metadata and journaled data are logged, but gfs2_log_flush()
      only compares the number of metadata blocks with the number of blocks to commit
      to the ondisk journal.  This patch also counts the journaled data blocks.
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
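      As a hedged illustration of the second fix (skip a data buffer that is
      already on the incore log), here is the general shape of the early
      return; the structures and list names are placeholders, not the real
      gfs2_bufdata layout:

          #include <linux/list.h>

          struct example_sbd {
                  struct list_head sd_log_databuf; /* logged data buffers */
          };

          struct example_bufdata {
                  struct list_head bd_list; /* initialised empty; non-empty
                                             * once the buffer is logged */
          };

          static void example_databuf_add(struct example_sbd *sdp,
                                          struct example_bufdata *bd)
          {
                  if (!list_empty(&bd->bd_list))
                          return;         /* already pinned: nothing to do */

                  /* the real code pins the buffer here, then tracks it */
                  list_add(&bd->bd_list, &sdp->sd_log_databuf);
          }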
    • [GFS2] Clean up inode number handling · dbb7cae2
      Authored by Steven Whitehouse
      This patch cleans up the inode number handling code. The main difference
      is that instead of looking up the inodes using a struct gfs2_inum_host
      we now use just the no_addr member of this structure. The tests relating
      to no_formal_ino can then be done by the calling code. This has
      advantages in that we want to do different things in different code
      paths if the no_formal_ino doesn't match. In the NFS patch we want to
      return -ESTALE, but in the ->lookup() path, it's a bug in the fs if the
      no_formal_ino doesn't match and thus we can withdraw in this case.
      
      In order to later fix bz #201012, we need to be able to look up an inode
      without knowing no_formal_ino, as the only information that is known to
      us is the on-disk location of the inode in question.
      
      This patch will also help us to fix bz #236099 at a later date by
      cleaning up a lot of the code in that area.
      
      There are no user visible changes as a result of this patch and there
      are no changes to the on-disk format either.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
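      A hedged sketch of the calling convention described above: the lookup
      is done by disk address (no_addr) alone and each caller decides what a
      no_formal_ino mismatch means. Only the -ESTALE/withdraw split is taken
      from the text; the helper names are illustrative:

          #include <linux/fs.h>
          #include <linux/err.h>
          #include <linux/errno.h>

          /* NFS file handle path: a mismatch just means a stale handle. */
          static struct inode *nfs_path_check(struct inode *inode,
                                              u64 got, u64 expected)
          {
                  if (got != expected) {
                          iput(inode);
                          return ERR_PTR(-ESTALE);
                  }
                  return inode;
          }

          /* ->lookup() path: a mismatch means on-disk corruption, so the
           * real filesystem would withdraw here. */
          static struct inode *lookup_path_check(struct inode *inode,
                                                 u64 got, u64 expected)
          {
                  if (got != expected) {
                          iput(inode);
                          return ERR_PTR(-EIO);
                  }
                  return inode;
          }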
    • [GFS2] Addendum patch 2 for gfs2_grow · cd81a4ba
      Authored by Robert Peterson
      This addendum patch 2 corrects three things:
      
      1. It fixes a stupid mistake in the previous addendum that broke gfs2.
         Ref: https://www.redhat.com/archives/cluster-devel/2007-May/msg00162.html
      2. It fixes a problem that Dave Teigland pointed out regarding the
         external declarations in ops_address.h being in the wrong place.
      3. It recasts a couple more %llu printks to (unsigned long long)
         as requested by Steve Whitehouse.
      
      I would have loved to put this all in one revised patch, but there was
      a rush to get some patches for RHEL5.  The previous patches were therefore
      applied to the git tree "as is", so I'm posting another addendum.  Sorry.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
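      For item 3, the cast is the usual way to print a u64 portably, since
      u64 is "unsigned long" on some 64-bit architectures and "unsigned long
      long" elsewhere; a minimal illustrative example (the function is
      hypothetical, the message format mirrors the gfs2_grow output quoted
      later in this log):

          #include <linux/kernel.h>
          #include <linux/types.h>

          static void report_growth(u64 new_blocks)
          {
                  printk(KERN_INFO
                         "GFS2: File system extended by %llu blocks\n",
                         (unsigned long long)new_blocks);
          }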
    • [GFS2] Kernel changes to support new gfs2_grow command (part 2) · 6c53267f
      Authored by Robert Peterson
      To avoid code redundancy, I separated out the operational "guts" into
      a new function called read_rindex_entry.  Then I made two functions:
      the closer-to-original gfs2_ri_update (without the special condition
      checks) and gfs2_ri_update_special that's designed with that condition
      in mind.  (I don't like the name, but if you have a suggestion, I'm
      all ears).
      
      Oh, and there's an added benefit:  we don't need all the ugly gotos
      anymore.  ;)
      
      This patch has been tested with gfs2_fsck_hellfire (which runs for
      three and a half hours, btw).
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
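      A hedged skeleton of the refactoring described: one helper reads a
      single rindex entry and the update function becomes a plain loop over
      it, which is what removes the gotos. Bodies and signatures here are
      placeholders, not the actual code:

          struct example_inode;   /* stands in for the rindex inode */

          /* Read one resource group entry: 0 on success, negative errno on
           * failure, positive at end of the rindex file. */
          static int read_rindex_entry(struct example_inode *ip)
          {
                  return 1;       /* placeholder body */
          }

          static int example_ri_update(struct example_inode *ip)
          {
                  int error;

                  do {
                          error = read_rindex_entry(ip);
                  } while (error == 0);   /* no gotos required */

                  return error < 0 ? error : 0;
          }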
    • [GFS2] kernel changes to support new gfs2_grow command · 7ae8fa84
      Authored by Robert Peterson
      This is another revision of my gfs2 kernel patch that allows
      gfs2_grow to function properly.
      
      Steve Whitehouse expressed some concerns about the previous
      patch and I restructured it based on his comments.
      The previous patch was doing the statfs_change at file close time,
      under its own transaction.  The current patch does the statfs_change
      inside the gfs2_commit_write function, which keeps it under the
      umbrella of the inode transaction.
      
      I can't call ri_update to re-read the rindex file during the
      transaction because the transaction may have outstanding unwritten
      buffers attached to the rgrps that would be otherwise blown away.
      So instead, I created a new function, gfs2_ri_total, that will
      re-read the rindex file just to total the file system space
      for the sake of the statfs_change.  The ri_update will happen
      later, when gfs2 realizes the version number has changed, as it
      happened before my patch.
      
      Since the statfs_change happens at write_commit time, and there may be
      multiple writes to the rindex file for one grow operation, one consequence
      of this restructuring is that instead of getting one kernel message to
      indicate the change, you may see several.
      For example, before when you did a gfs2_grow, you'd get a single
      message like:
      
      GFS2: File system extended by 247876 blocks (968MB)
      
      Now you get something like:
      
      GFS2: File system extended by 207896 blocks (812MB)
      GFS2: File system extended by 39980 blocks (156MB)
      
      This version has also been successfully run against the hours-long
      "gfs2_fsck_hellfire" test that does several gfs2_grow and gfs2_fsck
      while interjecting file system damage.  It does this repeatedly
      under a variety of Resource Group conditions.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
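      A hedged sketch of the space-totalling pass described above: re-read
      the resource index entries purely to sum up blocks for statfs, without
      rebuilding the in-memory rgrp list. The entry layout and names are
      placeholders:

          #include <linux/types.h>

          struct example_rindex_entry {
                  u64 ri_data;    /* data blocks in this resource group */
          };

          static u64 example_ri_total(const struct example_rindex_entry *e,
                                      unsigned int count)
          {
                  u64 total = 0;
                  unsigned int i;

                  for (i = 0; i < count; i++)
                          total += e[i].ri_data;  /* statfs total only */

                  return total;
          }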
  5. 01 May 2007 (2 commits)
    • [GFS2] Patch to fix mmap of stuffed files · bf126aee
      Authored by Steven Whitehouse
      If a stuffed file is mmaped and a page fault is generated at some offset
      above the initial page, we need to create a zero page to hang the buffer
      heads off before we can unstuff the file. This is a fix for bz #236087
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Fix bz 231380, unlock page before dequeing glocks in gfs2_commit_write · 1de91390
      Authored by Josef Whiter
      If we are writing a file, and in the middle of writing it another node
      attempts to get a shared lock on that file (by doing a du, for example),
      the process doing the writing will hang waiting on lock_page.  The reason
      for this is that when we have waiters on an exclusive glock, we will go
      through and flush out all dirty pages associated with that inode and
      release the lock.  The problem is that when we flush the dirty pages, we
      could hit a page that we have locked during the generic_file_buffered_write
      part of this operation.  This patch unlocks the page before we go to
      dequeue the lock and locks it immediately afterwards, since
      generic_file_buffered_write needs the page locked when the commit_write is
      completed.  This patch resolves the problem; however, if somebody sees a
      better way to do this, please don't hesitate to yell.
      Signed-off-by: Josef Whiter <jwhiter@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
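      A hedged sketch of the ordering change described (drop the page lock
      around the glock dequeue, then retake it because commit_write finishes
      with the page locked); the dequeue function is a stand-in for the real
      glock code:

          #include <linux/pagemap.h>

          static void example_glock_dq(void)
          {
                  /* hypothetical stand-in for the real glock dequeue */
          }

          static void example_commit_done(struct page *page)
          {
                  unlock_page(page);      /* let a demotion flush the inode */
                  example_glock_dq();
                  lock_page(page);        /* caller expects the page locked */
          }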
  6. 08 Mar 2007 (1 commit)
    • [GFS2] fix hangup when multiple processes are trying to write to the same file · a13cbe37
      Authored by Josef Whiter
      This fixes a problem I encountered while running bonnie++.  When you have one
      thread that opens a file and starts to write to it, and then another thread that
      tries to open and write to the same file, the second thread will loop forever
      trying to grab the inode lock for that inode.  Basically we come in through
      generic_buffered_file_write, which calls gfs2_prepare_write, which then attempts
      to grab the glock.  Because we don't own the lock, gfs2_prepare_write gets
      GLR_TRYFAILED, which returns AOP_TRUNCATED_PAGE to generic_buffered_file_write.
      At this point generic_buffered_file_write loops around again and immediately
      retries the prepare_write.  This means that the second process never gets off
      the processor in order to allow the process that holds the lock to finish its
      work and let go of the lock.  This patch makes gfs2_glock_nq schedule() if it
      gets back a GLR_TRYFAILED, which resolves this problem.
      Signed-off-by: Josef Whiter <jwhiter@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
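      A hedged sketch of the back-off idea (yield the CPU when the try-lock
      fails so the lock holder can run); the constant and functions are
      placeholders for the GFS2 internals named above:

          #include <linux/sched.h>

          #define EXAMPLE_TRYFAILED 1     /* stand-in for GLR_TRYFAILED */

          static int example_try_lock(void)
          {
                  return EXAMPLE_TRYFAILED;  /* pretend the try failed */
          }

          static int example_glock_nq(void)
          {
                  int ret = example_try_lock();

                  if (ret == EXAMPLE_TRYFAILED)
                          schedule();     /* give the holder a chance to run */

                  return ret;
          }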
  7. 07 Feb 2007 (2 commits)
  8. 06 Feb 2007 (3 commits)
    • [GFS2] Add writepages for "data=writeback" mounts · a8d638e3
      Authored by Steven Whitehouse
      It occurred to me that although a gfs2 specific writepages for ordered
      writes and journaled data would be tricky, by hooking writepages only
      for "data=writeback" mounts we could take advantage of not needing
      buffer heads (we don't use them on the read side, nor have we for some
      time) and create much larger I/Os for the block layer.
      
      Using blktrace both before and after, it's possible to see that for large
      I/Os, most of the requests generated through writepages are now 1024
      sectors after this patch is applied as opposed to 8 sectors before.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
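      A hedged sketch of the kind of hook described: on "data=writeback"
      mounts the .writepages hook routes everything through
      mpage_writepages(), which builds large I/Os without buffer heads; the
      other journaling modes simply do not install the hook. The get_block
      callback here is a placeholder:

          #include <linux/fs.h>
          #include <linux/mpage.h>
          #include <linux/writeback.h>
          #include <linux/buffer_head.h>
          #include <linux/errno.h>

          /* Placeholder get_block: the real one maps lblock to disk. */
          static int example_get_block(struct inode *inode, sector_t lblock,
                                       struct buffer_head *bh_result,
                                       int create)
          {
                  return -EIO;
          }

          /* Installed as ->writepages only for "data=writeback" mounts;
           * ordered and journaled data keep the default per-page path. */
          static int example_writepages(struct address_space *mapping,
                                        struct writeback_control *wbc)
          {
                  return mpage_writepages(mapping, wbc, example_get_block);
          }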
    • [GFS2] Fail over to readpage for stuffed files · e1d5b18a
      Authored by Steven Whitehouse
      This is partially derived from a patch written by Russell Cattelan.
      It fixes a bug where there is a race between readpages and truncate
      by ignoring readpages for stuffed files. This is ok because a stuffed
      file will never be more than one block (minus sizeof(struct gfs2_dinode))
      in size and block size is always less than page size, so we do not lose
      anything efficiency-wise by not doing readahead for stuffed files. They
      will have already been "read ahead" by the action of reading the inode
      in, in the first place.
      
      This is the remaining part of the fix for Red Hat bugzilla #218966
      which had not yet made it upstream.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Russell Cattelan <cattelan@redhat.com>
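      A hedged sketch of the fail-over described: the ->readpages hook simply
      declines for stuffed files, so those pages are filled one at a time by
      ->readpage instead of being read ahead. The stuffed test and readahead
      helper are placeholders:

          #include <linux/fs.h>
          #include <linux/pagemap.h>

          /* Placeholder: real code checks whether the file data lives
           * inside the on-disk inode block ("stuffed"). */
          static int example_is_stuffed(struct inode *inode)
          {
                  return inode->i_size <= 128;    /* illustrative bound */
          }

          static int example_do_readahead(struct address_space *mapping,
                                          struct list_head *pages,
                                          unsigned nr_pages)
          {
                  return 0;       /* placeholder for the normal path */
          }

          static int example_readpages(struct file *file,
                                       struct address_space *mapping,
                                       struct list_head *pages,
                                       unsigned nr_pages)
          {
                  if (example_is_stuffed(mapping->host))
                          return 0;  /* decline: ->readpage fills each page */

                  return example_do_readahead(mapping, pages, nr_pages);
          }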
    • [GFS2] Fix DIO deadlock · c7b33834
      Authored by Steven Whitehouse
      This patch fixes Red Hat bugzilla #212627 in which a deadlock occurs
      due to trying to take the i_mutex while holding a glock. The correct
      locking order is defined as i_mutex -> glock in all cases.
      
      I've left dealing with allocating writes. I know that we need to do
      that, but for now this should do the trick. We don't need to take the
      i_mutex on write, because the VFS has already taken it for us. On read
      we don't need it since the glock is enough protection. The reason that
      I've made some of the checks into a separate function is that we'll need
      to do the checks again in the allocating write case eventually, so this
      is partly in preparation for this. Likewise the return value test of !=
      1 might look a bit odd and that's because we'll need a third return value
      in case of requiring an allocation.
      
      I've made the change to deferred mode on the glock to ensure flushing
      read caches on other nodes. I notice that (using blktrace to look at
      what's going on) we appear to do a better job of large I/Os than ext3
      after this patch (in terms of not splitting up the I/Os).
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Wendy Cheng <wcheng@redhat.com>
  9. 30 Nov 2006 (7 commits)
  10. 04 Nov 2006 (1 commit)
  11. 20 Oct 2006 (1 commit)
  12. 13 Oct 2006 (3 commits)
  13. 03 Oct 2006 (1 commit)
    • [GFS2] Remove uneeded endian conversion · 48516ced
      Authored by Steven Whitehouse
      In many places GFS2 was calling the endian conversion routines
      for an inode even when only a single field, or a few fields might
      have changed. As a result we were copying lots of data needlessly.
      
      This patch replaces those calls with conversion of just the
      required fields in each case. This should be faster and easier
      to understand. There are still other places which suffer from this
      problem, but this is a start in the right direction.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
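      A hedged illustration of the change in approach: convert only the
      fields that actually changed to their on-disk endianness, rather than
      converting the whole inode every time. The structures below are
      placeholders, not the real GFS2 dinode:

          #include <linux/types.h>
          #include <asm/byteorder.h>

          struct example_disk_inode {
                  __be64 di_size;
                  __be64 di_mtime;
                  /* ...many more fields... */
          };

          struct example_mem_inode {
                  u64 i_size;
                  u64 i_mtime;
          };

          /* Only the field that changed gets converted and written back. */
          static void example_update_size(struct example_disk_inode *di,
                                          const struct example_mem_inode *ip)
          {
                  di->di_size = cpu_to_be64(ip->i_size);
          }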
  14. 25 Sep 2006 (1 commit)
  15. 22 Sep 2006 (1 commit)
    • [GFS2] Tidy up meta_io code · 7276b3b0
      Authored by Steven Whitehouse
      Fix a bug in the directory reading code, where we might have dereferenced
      a NULL pointer in case of OOM. Updated the directory code to use the new
      & improved version of gfs2_meta_ra() which now returns the first block
      that was being read. Previously it released that block, requiring the
      following code to grab it again at each point it was called.
      
      Also turned off readahead on directory lookups since we are reading a
      hash table, and therefore reading the entries in order is very
      unlikely. Readahead is still used for all other calls to the
      directory reading function (e.g. when growing the hash table).
      
      Removed the DIO_START constant. In all but a couple of places it was
      used to unconditionally start I/O, so I've removed it and turned those
      couple of exceptions into separate functions.
      
      Also hunted through the other DIO flags and removed them as arguments
      from functions which were always called with the same combination of
      arguments.
      
      Updated gfs2_meta_indirect_buffer to be a bit more efficient and
      hopefully also be a bit easier to read.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  16. 21 Sep 2006 (1 commit)
  17. 19 Sep 2006 (4 commits)
    • [GFS2] Export lm_interface to kernel headers · 7d308590
      Authored by Fabio Massimo Di Nitto
      
      lm_interface.h has a few out-of-tree clients such as GFS1
      and userland tools.
      
      Right now, these clients keep a copy of the file in their build tree
      that can go out of sync.
      
      Move lm_interface.h to include/linux, export it to userland and
      clean up fs/gfs2 to use the new location.
      Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Tweek unlock test in readpage() · 07903c02
      Authored by Steven Whitehouse
      This makes the unlock test a bit simpler.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Fix for mmap() bug in readpage · dc41aeed
      Authored by Russell Cattelan
      Fix for Red Hat bz 205307. We don't need to take the lock in readpage if
      the higher-level code has already grabbed the lock.
      Signed-off-by: Russell Cattelan <cattelan@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Map multiple blocks at once where possible · 7a6bbacb
      Authored by Steven Whitehouse
      This is a tidy up of the GFS2 bmap code. The main change is that the
      bh is passed to gfs2_block_map allowing the flags to be set directly
      rather than having to repeat that code several times in ops_address.c.
      
      At the same time, the extent mapping code from gfs2_extent_map has
      been moved into gfs2_block_map. This allows all calls to gfs2_block_map
      to map extents in the case that no allocation is taking place. As a
      result reads and non-allocating writes should be faster. A quick test
      with postmark appears to support this.
      
      There is a limit on the number of blocks mapped in a single bmap
      call in that it will only ever map blocks which are pointed to
      from a single pointer block. So in other words, it will never try
      to do additional i/o in order to satisfy read-ahead. The maximum
      number of blocks is thus somewhat less than 512 (the GFS2 4k block
      size minus the header divided by sizeof(u64)). I've further limited
      the mapping of "normal" blocks to 32 blocks (to avoid extra work)
      since readpages() will currently read a maximum of 32 blocks ahead (128k).
      
      Some further work will probably be needed to set a suitable value
      for DIO as well, but for now that's left at the maximum 512 (see
      ops_address.c:gfs2_get_block_direct).
      
      There is probably a lot more that can be done to improve bmap for GFS2,
      but this is a good first step.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
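      As a worked example of the limit quoted above, assuming the common
      24-byte GFS2 metadata header: (4096 - 24) / sizeof(u64) = 509 pointers
      per indirect block, hence "somewhat less than 512". A hedged sketch of
      how a multi-block mapping is usually reported through the buffer head
      follows; the lookup helper is a placeholder:

          #include <linux/fs.h>
          #include <linux/buffer_head.h>
          #include <linux/errno.h>

          /* Placeholder lookup: pretend blocks map 1:1 in 32-block extents,
           * mirroring the 32-block cap mentioned for readpages(). */
          static int example_lookup(struct inode *inode, sector_t lblock,
                                    sector_t *dblock, unsigned int *extlen)
          {
                  *dblock = lblock;
                  *extlen = 32;
                  return 0;
          }

          static int example_map_extent(struct inode *inode, sector_t lblock,
                                        struct buffer_head *bh_result,
                                        int create)
          {
                  sector_t dblock;
                  unsigned int extlen;

                  if (example_lookup(inode, lblock, &dblock, &extlen))
                          return -EIO;

                  map_bh(bh_result, inode->i_sb, dblock);
                  /* report how many contiguous blocks were mapped at once */
                  bh_result->b_size = (size_t)extlen << inode->i_blkbits;
                  return 0;
          }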