1. 10 10月, 2007 2 次提交
    • J
      [GFS2] Fix calculation of demote state · 26caee5b
      Josef Whiter 提交于
      If a glock is in the exclusive state and a request for demote to
      deferred has been received, then further requests for demote to
      shared are being ignored. This patch fixes that by ensuring that
      we demote to unlocked in that case.
      Signed-off-by: NJosef Whiter <jwhiter@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      26caee5b
    • S
      [GFS2] Fix two races relating to glock callbacks · 87124e58
      Steven Whitehouse 提交于
      One of the races relates to referencing a variable while not holding
      its protecting spinlock. The patch simply moves the test inside the
      spin lock. The other races occurs when a demote to unlocked request
      occurs during the time a demote to shared request is already running.
      This of course only happens in the case that the lock was in the
      exclusive mode to start with. The patch adds a check to see if another
      demote request has occurred in the mean time and if it has, then it
      performs a second demote.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      87124e58
  2. 14 8月, 2007 6 次提交
  3. 01 8月, 2007 1 次提交
  4. 20 7月, 2007 5 次提交
    • P
      mm: Remove slab destructors from kmem_cache_create(). · 20c2df83
      Paul Mundt 提交于
      Slab destructors were no longer supported after Christoph's
      c59def9f change. They've been
      BUGs for both slab and slub, and slob never supported them
      either.
      
      This rips out support for the dtor pointer from kmem_cache_create()
      completely and fixes up every single callsite in the kernel (there were
      about 224, not including the slab allocator definitions themselves,
      or the documentation references).
      Signed-off-by: NPaul Mundt <lethal@linux-sh.org>
      20c2df83
    • N
      mm: fault feedback #2 · 83c54070
      Nick Piggin 提交于
      This patch completes Linus's wish that the fault return codes be made into
      bit flags, which I agree makes everything nicer.  This requires requires
      all handle_mm_fault callers to be modified (possibly the modifications
      should go further and do things like fault accounting in handle_mm_fault --
      however that would be for another patch).
      
      [akpm@linux-foundation.org: fix alpha build]
      [akpm@linux-foundation.org: fix s390 build]
      [akpm@linux-foundation.org: fix sparc build]
      [akpm@linux-foundation.org: fix sparc64 build]
      [akpm@linux-foundation.org: fix ia64 build]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ian Molton <spyro@f2s.com>
      Cc: Bryan Wu <bryan.wu@analog.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Greg Ungerer <gerg@uclinux.org>
      Cc: Matthew Wilcox <willy@debian.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
      Cc: Richard Curnow <rc@rc0.org.uk>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
      Cc: Chris Zankel <chris@zankel.net>
      Acked-by: NKyle McMartin <kyle@mcmartin.ca>
      Acked-by: NHaavard Skinnemoen <hskinnemoen@atmel.com>
      Acked-by: NRalf Baechle <ralf@linux-mips.org>
      Acked-by: NAndi Kleen <ak@muc.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      [ Still apparently needs some ARM and PPC loving - Linus ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      83c54070
    • N
      mm: fault feedback #1 · d0217ac0
      Nick Piggin 提交于
      Change ->fault prototype.  We now return an int, which contains
      VM_FAULT_xxx code in the low byte, and FAULT_RET_xxx code in the next byte.
       FAULT_RET_ code tells the VM whether a page was found, whether it has been
      locked, and potentially other things.  This is not quite the way he wanted
      it yet, but that's changed in the next patch (which requires changes to
      arch code).
      
      This means we no longer set VM_CAN_INVALIDATE in the vma in order to say
      that a page is locked which requires filemap_nopage to go away (because we
      can no longer remain backward compatible without that flag), but we were
      going to do that anyway.
      
      struct fault_data is renamed to struct vm_fault as Linus asked. address
      is now a void __user * that we should firmly encourage drivers not to use
      without really good reason.
      
      The page is now returned via a page pointer in the vm_fault struct.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d0217ac0
    • N
      mm: merge populate and nopage into fault (fixes nonlinear) · 54cb8821
      Nick Piggin 提交于
      Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes
      the virtual address -> file offset differently from linear mappings.
      
      ->populate is a layering violation because the filesystem/pagecache code
      should need to know anything about the virtual memory mapping.  The hitch here
      is that the ->nopage handler didn't pass down enough information (ie.  pgoff).
       But it is more logical to pass pgoff rather than have the ->nopage function
      calculate it itself anyway (because that's a similar layering violation).
      
      Having the populate handler install the pte itself is likewise a nasty thing
      to be doing.
      
      This patch introduces a new fault handler that replaces ->nopage and
      ->populate and (later) ->nopfn.  Most of the old mechanism is still in place
      so there is a lot of duplication and nice cleanups that can be removed if
      everyone switches over.
      
      The rationale for doing this in the first place is that nonlinear mappings are
      subject to the pagefault vs invalidate/truncate race too, and it seemed stupid
      to duplicate the synchronisation logic rather than just consolidate the two.
      
      After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
      pagecache.  Seems like a fringe functionality anyway.
      
      NOPAGE_REFAULT is removed.  This should be implemented with ->fault, and no
      users have hit mainline yet.
      
      [akpm@linux-foundation.org: cleanup]
      [randy.dunlap@oracle.com: doc. fixes for readahead]
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      54cb8821
    • N
      mm: fix fault vs invalidate race for linear mappings · d00806b1
      Nick Piggin 提交于
      Fix the race between invalidate_inode_pages and do_no_page.
      
      Andrea Arcangeli identified a subtle race between invalidation of pages from
      pagecache with userspace mappings, and do_no_page.
      
      The issue is that invalidation has to shoot down all mappings to the page,
      before it can be discarded from the pagecache.  Between shooting down ptes to
      a particular page, and actually dropping the struct page from the pagecache,
      do_no_page from any process might fault on that page and establish a new
      mapping to the page just before it gets discarded from the pagecache.
      
      The most common case where such invalidation is used is in file truncation.
      This case was catered for by doing a sort of open-coded seqlock between the
      file's i_size, and its truncate_count.
      
      Truncation will decrease i_size, then increment truncate_count before
      unmapping userspace pages; do_no_page will read truncate_count, then find the
      page if it is within i_size, and then check truncate_count under the page
      table lock and back out and retry if it had subsequently been changed (ptl
      will serialise against unmapping, and ensure a potentially updated
      truncate_count is actually visible).
      
      Complexity and documentation issues aside, the locking protocol fails in the
      case where we would like to invalidate pagecache inside i_size.  do_no_page
      can come in anytime and filemap_nopage is not aware of the invalidation in
      progress (as it is when it is outside i_size).  The end result is that
      dangling (->mapping == NULL) pages that appear to be from a particular file
      may be mapped into userspace with nonsense data.  Valid mappings to the same
      place will see a different page.
      
      Andrea implemented two working fixes, one using a real seqlock, another using
      a page->flags bit.  He also proposed using the page lock in do_no_page, but
      that was initially considered too heavyweight.  However, it is not a global or
      per-file lock, and the page cacheline is modified in do_no_page to increment
      _count and _mapcount anyway, so a further modification should not be a large
      performance hit.  Scalability is not an issue.
      
      This patch implements this latter approach.  ->nopage implementations return
      with the page locked if it is possible for their underlying file to be
      invalidated (in that case, they must set a special vm_flags bit to indicate
      so).  do_no_page only unlocks the page after setting up the mapping
      completely.  invalidation is excluded because it holds the page lock during
      invalidation of each page (and ensures that the page is not mapped while
      holding the lock).
      
      This also allows significant simplifications in do_no_page, because we have
      the page locked in the right place in the pagecache from the start.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d00806b1
  5. 19 7月, 2007 1 次提交
  6. 18 7月, 2007 2 次提交
  7. 17 7月, 2007 1 次提交
    • A
      Remove capability.h from mm.h · aa0ac365
      Alexey Dobriyan 提交于
      I forgot to remove capability.h from mm.h while removing sched.h!  This
      patch remedies that, because the only inline function which was using
      CAP_something was made out of line.
      
      Cross-compile tested without regressions on:
      
      	all powerpc defconfigs
      	all mips defconfigs
      	all m68k defconfigs
      	all arm defconfigs
      	all ia64 defconfigs
      
      	alpha alpha-allnoconfig alpha-defconfig alpha-up
      	arm
      	i386 i386-allnoconfig i386-defconfig i386-up
      	ia64 ia64-allnoconfig ia64-defconfig ia64-up
      	m68k
      	mips
      	parisc parisc-allnoconfig parisc-defconfig parisc-up
      	powerpc powerpc-up
      	s390 s390-allnoconfig s390-defconfig s390-up
      	sparc sparc-allnoconfig sparc-defconfig sparc-up
      	sparc64 sparc64-allnoconfig sparc64-defconfig sparc64-up
      	um-x86_64
      	x86_64 x86_64-allnoconfig x86_64-defconfig x86_64-up
      
      as well as my two usual configs.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      aa0ac365
  8. 10 7月, 2007 2 次提交
    • S
      [GFS2] Accept old format NFS filehandles · 3ebf4490
      Steven Whitehouse 提交于
      On Tue, 2007-07-10 at 10:06 +0100, Christoph Hellwig wrote:
      > > -#define GFS2_LARGE_FH_SIZE 10
      > > -
      > > -struct gfs2_fh_obj {
      > > -   struct gfs2_inum_host this;
      > > -   u32 imode;
      > > -};
      > > +#define GFS2_LARGE_FH_SIZE 8
      >
      > Because gfs2_decode_fh only accepts file handles with GFS2_LARGE_FH_SIZE
      > or GFS2_LARGE_FH_SIZE you don't accept filehandles sent out by and older
      > gfs version anymore.  Stale filehandles because of a new kernel version
      > are a big no-no, so please add back code to handle the old filehandles
      > on the decode side.
      >
      
      This should fix that problem I think since its only relating to end of
      the fh we can just ignore that field in order to accept the older
      format.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Wendy Cheng <wcheng@redhat.com>
      3ebf4490
    • J
      sendfile: remove .sendfile from filesystems that use generic_file_sendfile() · 5ffc4ef4
      Jens Axboe 提交于
      They can use generic_file_splice_read() instead. Since sys_sendfile() now
      prefers that, there should be no change in behaviour.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      5ffc4ef4
  9. 09 7月, 2007 20 次提交
    • S
      [GFS2] Small fixes to logging code · a0a24741
      Steven Whitehouse 提交于
      This reverts part of an earlier patch which tried to reclaim
      gfs2_bufdata structures too early and resulted in a "use after free"
      case (this bit from me). Also a change to not write out log headers
      unless we really need to (in the case of flushing nothing we don't need
      a header) from Bob.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      a0a24741
    • W
      [GFS2] Remove i_mode passing from NFS File Handle · 35dcc52e
      Wendy Cheng 提交于
      GFS2 has been passing i_mode within NFS File Handle. Other than the
      wrong assumption that there is always room for this extra 16 bit value,
      the current gfs2_get_dentry doesn't really need the i_mode to work
      correctly. Note that GFS2 NFS code does go thru the same lookup code
      path as direct file access route (where the mode is obtained from name
      lookup) but gfs2_get_dentry() is coded for different purpose. It is not
      used during lookup time. It is part of the file access procedure call.
      When the call is invoked, if on-disk inode is not in-memory, it has to
      be read-in. This makes i_mode passing a useless overhead.
      Signed-off-by: NS. Wendy Cheng <wcheng@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      35dcc52e
    • W
      [GFS2] Obtaining no_formal_ino from directory entry · bb9bcf06
      Wendy Cheng 提交于
      GFS2 lookup code doesn't ask for inode shared glock. This implies during
      in-memory inode creation for existing file, GFS2 will not disk-read in
      the inode contents. This leaves no_formal_ino un-initialized during
      lookup time. The un-initialized no_formal_ino is subsequently encoded
      into file handle. Clients will get ESTALE error whenever it tries to
      access these files.
      Signed-off-by: NS. Wendy Cheng <wcheng@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      bb9bcf06
    • A
      [GFS2] System won't suspend with GFS2 file system mounted · b3657629
      Abhijith Das 提交于
      The kernel threads in gfs2, namely gfs2_scand, gfs2_logd, gfs2_quotad,
      gfs2_glockd, gfs2_recoverd weren't doing anything when the suspend
      mechanism was trying to freeze them.
      
      I put in calls to refrigerator() in the loops for all the daemons and
      suspend works as expected.
      Signed-off-by: NAbhijith Das <adas@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      b3657629
    • B
      [GFS2] remounting w/o acl option leaves acls enabled · 569a7b6c
      Bob Peterson 提交于
      This patch is for bugzilla bug #245663.  This crosswrites a fix from
      gfs1 (bz #210369) so that the mount options are reset properly upon
      remount.  This was tested on system trin-10.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      569a7b6c
    • W
      [GFS2] inode size inconsistency · 090ffaa5
      Wendy Cheng 提交于
      This should have been part of the NFS patch #1 but somehow I missed it
      when packaging the patches. It is not a critical issue as the others (I
      hope). RHEL 5.1 31.el5 kernel runs fine without this change.
      
      Our truncate code is chopped into two parts, one for vfs inode changes
      (in vmtruncate()) and one of gfs inode (in gfs2_truncatei()). These two
      operatons are, unfortunately, not atomic. So it could happens that
      vmtruncate() succeeds (inode->i_size is changed) but gfs2_truncatei
      fails (say kernel temporarily out of memory). This would leave gfs inode
      i_di.di_size out of sync with vfs inode i_size. It will later confuse
      gfs2_commit_write() if a write is issued. Last time I checked, it will
      cause file corruption.
      Signed-off-by: NS. Wendy Cheng <wcheng@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      090ffaa5
    • S
      [GFS2] Fix gfs2_block_truncate_page err return · 1875f2f3
      S. Wendy Cheng 提交于
      Code segment inside gfs2_block_truncate_page() doesn't set the return
      code correctly. This causes NFSD erroneously returns EIO back to client
      with setattr procedure call (truncate error).
      Signed-off-by: NS. Wendy Cheng <wcheng@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      1875f2f3
    • R
      [GFS2] Addendum to the journaled file/unmount patch · 773ed1a0
      Robert Peterson 提交于
      This patch is an addendum to the previous journaled file/unmount patch.
      It fixes a problem discovered during testing.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      773ed1a0
    • S
      [GFS2] Simplify multiple glock aquisition · eaf5bd3c
      Steven Whitehouse 提交于
      There is a bug in the code which acquires multiple glocks where if the
      initial out-of-order attempt fails part way though we can land up trying
      to acquire the wrong number of glocks. This is part of the fix for red
      hat bz #239737. The other part of the bz doesn't apply to upstream
      kernels since it was fixed by:
      
      http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d3717bdf8f08a0e1039158c8bab2c24d20f492b6
      
      Since the out-of-order code doesn't appear to add anything to the
      performance of GFS2, this patch just removed it rather than trying to
      fix it. It should be much easier to see whats going on here now. In
      addition, we don't allocate any memory unless we are using a lot of
      glocks (which is a relatively uncommon case).
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      eaf5bd3c
    • R
      [GFS2] assertion failure after writing to journaled file, umount · 2332c443
      Robert Peterson 提交于
      This patch passes all my nasty tests that were causing the code to
      fail under one circumstance or another.  Here is a complete summary
      of all changes from today's git tree, in order of appearance:
      
      1. There are now separate variables for metadata buffer accounting.
      2. Variable sd_log_num_hdrs is no longer needed, since the header
         accounting is taken care of by the reserve/refund sequence.
      3. Fixed a tiny grammatical problem in a comment.
      4. Added a new function "calc_reserved" to calculate the reserved
         log space.  This isn't entirely necessary, but it has two benefits:
         First, it simplifies the gfs2_log_refund function greatly.
         Second, it allows for easier debugging because I could sprinkle the
         code with calls to this function to make sure the accounting is
         proper (by adding asserts and printks) at strategic point of the code.
      5. In log_pull_tail there apparently was a kludge to fix up the
         accounting based on a "pull" parameter.  The buffer accounting is
         now done properly, so the kludge was removed.
      6. File sync operations were making a call to gfs2_log_flush that
         writes another journal header.  Since that header was unplanned
         for (reserved) by the reserve/refund sequence, the free space had
         to be decremented so that when log_pull_tail gets called, the free
         space is be adjusted properly.  (Did I hear you call that a kludge?
         well, maybe, but a lot more justifiable than the one I removed).
      7. In the gfs2_log_shutdown code, it optionally syncs the log by
         specifying the PULL parameter to log_write_header.  I'm not sure
         this is necessary anymore.  It just seems to me there could be
         cases where shutdown is called while there are outstanding log
         buffers.
      8. In the (data)buf_lo_before_commit functions, I changed some offset
         values from being calculated on the fly to being constants.	That
         simplified some code and we might as well let the compiler do the
         calculation once rather than redoing those cycles at run time.
      9. This version has my rewritten databuf_lo_add function.
         This version is much more like its predecessor, buf_lo_add, which
         makes it easier to understand.  Again, this might not be necessary,
         but it seems as if this one works as well as the previous one,
         maybe even better, so I decided to leave it in.
      10. In databuf_lo_before_commit, a previous data corruption problem
         was caused by going off the end of the buffer.  The proper solution
         is to have the proper limit in place, rather than stopping earlier.
         (Thus my previous attempt to fix it is wrong).
         If you don't wrap the buffer, you're stopping too early and that
         causes more log buffer accounting problems.
      11. In lops.h there are two new (previously mentioned) constants for
         figuring out the data offset for the journal buffers.
      12. There are also two new functions, buf_limit and databuf_limit to
         calculate how many entries will fit in the buffer.
      13. In function gfs2_meta_wipe, it needs to distinguish between pinned
         metadata buffers and journaled data buffers for proper journal buffer
         accounting.	It can't use the JDATA gfs2_inode flag because it's
         sometimes passed the "real" inode and sometimes the "metadata
         inode" and the inode flags will be random bits in a metadata
         gfs2_inode.	It needs to base its decision on which was passed in.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      2332c443
    • S
      [GFS2] Use zero_user_page() in stuffed_readpage() · 2840501a
      Steven Whitehouse 提交于
      As suggested by Robert P. J. Day <rpjday@mindspring.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Cc: Robert P. J. Day <rpjday@mindspring.com>
      2840501a
    • S
      [GFS2] Remove bogus '\0' in rgrp.c · c4201214
      Steven Whitehouse 提交于
      Not sure how it slipped in, but we don't want it anyway.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      c4201214
    • R
      [GFS2] Journaled file write/unstuff bug · 8fb68595
      Robert Peterson 提交于
      This patch is for bugzilla bug 283162, which uncovered a number of
      bugs pertaining to writing to files that have the journaled bit on.
      These bugs happen most often when writing to the meta_fs because
      the files are always journaled.  So operations like gfs2_grow were
      particularly vulnerable, although many of the problems could be
      recreated with normal files after setting the journaled bit on.
      The problems fixed are:
      
      -GFS2 wasn't ever writing unstuffed journaled data blocks to their
       in-place location on disk. Now it does.
      
      -If you unmounted too quickly after doing IO to a journaled file,
       GFS2 was crashing because you would discard a buffer whose bufdata
       was still on the active items list.  GFS2 now deals with this
       gracefully.
      
      -GFS2 was losing track of the bufdata for journaled data blocks,
       and it wasn't getting freed, causing an error when you tried to
       unmount the module.  GFS2 now frees all the bufdata structures.
      
      -There was a memory corruption occurring because GFS2 wrote
       twice as many log entries for journaled buffers.
      
      -It was occasionally trying to write journal headers in buffers
       that weren't currently mapped.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      8fb68595
    • A
      [GFS2] Fix deallocation issues · d93cfa98
      Abhijith Das 提交于
      There were two issues during deallocation of unlinked inodes. The
      first was relating to the use of a "try" lock which in the case of
      the inode lock wasn't trying hard enough to deallocate in all
      circumstances (now changed to a normal glock) and in the case of
      the iopen lock didn't wait for the demotion of the shared lock before
      attempting to get the exclusive lock, and thereby sometimes (timing dependent)
      not completing the deallocation when it should have done.
      
      The second issue related to the lack of a way to invalidate dcache entries
      on remote nodes (now fixed by this patch) which meant that unlinks were
      taking a long time to return disk space to the fs. By adding some code to
      invalidate the dcache entries across the cluster for unlinked inodes, that
      is now fixed.
      
      This patch was written jointly by Abhijith Das and Steven Whitehouse.
      Signed-off-by: NAbhijith Das <adas@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      d93cfa98
    • D
      [GFS2] return conflicts for GETLK · a7a2ff8a
      David Teigland 提交于
      We weren't returning the correct result when GETLK found a conflict,
      which is indicated by userspace passing back a 1.
      
      Signed-off-by: Abhijith Das <adas redhat com>
      Signed-off-by: David Teigland <teigland redhat com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      a7a2ff8a
    • D
      [GFS2] set plock owner in GETLK info · d88101d4
      David Teigland 提交于
      Set the owner field in the plock info sent to userspace for GETLK.
      Without this, gfs_controld won't correctly see when the GETLK from a
      process matches one of the process's existing locks.
      Signed-off-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      d88101d4
    • A
      [GFS2] gfs2_lookupi() uninitialised var fix · 037bcbb7
      akpm@linux-foundation.org 提交于
      fs/gfs2/inode.c: In function 'gfs2_lookupi':
      fs/gfs2/inode.c:392: warning: 'error' may be used uninitialized in this function
      
      Looks like a real bug to me.
      
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      037bcbb7
    • S
      [GFS2] Recovery for lost unlinked inodes · c8cdf479
      Steven Whitehouse 提交于
      Under certain circumstances its possible (though rather unlikely) that
      inodes which were unlinked by one node while still open on another might
      get "lost" in the sense that they don't get deallocated if the node
      which held the inode open crashed before it was unlinked.
      
      This patch adds the recovery code which allows automatic deallocation of
      the inode if its found during block allocation (the sensible time to
      look for such inodes since we are scanning the rgrp's bitmaps anyway at
      this time, so it adds no overhead to do this).
      
      Since the inode will have had its i_nlink set to zero, all we need to
      trigger recovery is a lookup and an iput(), and the normal deallocation
      code takes care of the rest.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      c8cdf479
    • R
      [GFS2] Can't mount GFS2 file system on AoE device · b35997d4
      Robert Peterson 提交于
      This patch fixes bug 243131: Can't mount GFS2 file system on AoE device.
      When using AoE devices with lock_nolock, there is no locking table, so
      gfs2 (and gfs1) uses the superblock s_id.  This turns out to be the device
      name in some cases.  In the case of AoE, the device contains a slash,
      (e.g. "etherd/e1.1p2") which is an invalid character when we try to
      register the table in sysfs.  This patch replaces the "/" with underscore.
      Rather than add a new variable to the stack, I'm just reusing a (char *)
      variable that's no longer used: table.
      
      This code has been tested on the failing system using a RHEL5 patch.
      The upstream code was tested by using gfs2_tool sb to interject a "/"
      into the table name of a clustered gfs2 file system.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      b35997d4
    • S
      [GFS2] Fix bug in error path of inode · e1cc8603
      Steven Whitehouse 提交于
      This fixes a bug in the ordering of operations in the error path of
      createi. Its not valid to do an iput() when holding the inode's glock
      since the iput() will (in this case) result in delete_inode() being
      called which needs to grab the lock itself. This was causing the
      recursive lock checking code to trigger.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      e1cc8603