1. 18 Mar, 2011 5 commits
    • Btrfs: use a slab for the free space entries · dc89e982
      Josef Bacik committed
      Since we alloc/free free space entries a whole lot, let's use a slab to keep
      track of them.  This makes some of my tests slightly faster.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: change reserved_extents to an atomic_t · 57a45ced
      Josef Bacik committed
      We track delayed allocation per inode via two counters,
      outstanding_extents and reserved_extents.  outstanding_extents is already an
      atomic_t, but reserved_extents is not and is protected by a spinlock.  So
      convert this to an atomic_t and instead of using a spinlock, use atomic_cmpxchg
      when releasing delalloc bytes.  This makes our inode 72 bytes smaller, and
      reduces locking overhead (albeit it was minimal to begin with).  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
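The lock-free release described in this commit can be illustrated with a compare-and-exchange retry loop. The sketch below is a userspace analogue that uses C11 `<stdatomic.h>` in place of the kernel's `atomic_t` and `atomic_cmpxchg`; the function name `release_reserved` and its clamping behaviour are assumptions for illustration, not the Btrfs code.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Release up to `want` reserved extents without taking a lock.
 * Returns how many were actually released (never more than were reserved).
 * Hypothetical helper: a userspace stand-in for a kernel cmpxchg loop.
 */
int release_reserved(atomic_int *reserved, int want)
{
    assert(want >= 0);

    int old = atomic_load(reserved);
    int drop;

    do {
        /* Recompute against the latest observed value on every retry. */
        drop = want < old ? want : old;
        /* On failure, `old` is refreshed with the current counter value. */
    } while (!atomic_compare_exchange_weak(reserved, &old, old - drop));

    return drop;
}
```

The loop only commits the new value if no other thread changed the counter in between, which is what lets the spinlock go away.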
    • Btrfs: fix how we deal with the pages array in the write path · 4a64001f
      Josef Bacik committed
      Really we don't need to memset the pages array at all, since we know how many
      pages we're going to use in the array and pass that around.  So don't memset,
      just trust we're not idiots and we pass num_pages around properly.
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: simplify our write path · d0215f3e
      Josef Bacik committed
      Our aio_write function is huge and kind of hard to follow at times.  So this
      patch fixes this by breaking the buffered and direct write paths out into
      separate functions so it's a little clearer what's going on.  I've also fixed
      some wrong typing that we had and added the ability to handle getting an error
      back from btrfs_set_extent_delalloc.  Tested this with xfstests and everything
      came out fine.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: fix formatting in file.c · 9f570b8d
      Josef Bacik committed
      Sorry, but these were bugging me.  Just clean up some of the formatting in
      file.c.
      Signed-off-by: Josef Bacik <josef@redhat.com>
  2. 15 Mar, 2011 1 commit
    • Fix corrupted OSF partition table parsing · 1eafbfeb
      Timo Warns committed
      The kernel automatically evaluates partition tables of storage devices.
      The code for evaluating OSF partitions contains a bug that leaks data
      from kernel heap memory to userspace for certain corrupted OSF
      partitions.
      
      In more detail:
      
        for (i = 0 ; i < le16_to_cpu(label->d_npartitions); i++, partition++) {
      
      iterates from 0 to d_npartitions - 1, where d_npartitions is read from
      the partition table without validation and partition is a pointer to an
      array of at most 8 d_partitions.
      
      Add the proper and obvious validation.
      Signed-off-by: Timo Warns <warns@pre-sense.de>
      Cc: stable@kernel.org
      [ Changed the patch trivially to not repeat the whole le16_to_cpu()
        thing, and to use an explicit constant for the magic value '8' ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
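The validation this commit calls for amounts to checking the on-disk count against the array capacity before looping. The following is a minimal userspace sketch, not the kernel patch: the struct layout is simplified, the `le16_to_cpu()` conversion is omitted, and the names `MAX_OSF_PARTITIONS` and `count_osf_partitions` are assumptions (the value 8 comes from the commit message).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OSF_PARTITIONS 8   /* array capacity; constant name is assumed */

struct osf_partition {
    uint32_t p_size;
    uint32_t p_offset;
};

struct osf_label {
    uint16_t d_npartitions;    /* untrusted: read straight from the disk */
    struct osf_partition d_partitions[MAX_OSF_PARTITIONS];
};

/*
 * Count non-empty partitions in a label.
 * Returns -1 for a corrupted label instead of reading past the array.
 */
int count_osf_partitions(const struct osf_label *label)
{
    assert(label != NULL);

    /* Read the count once, then validate it before using it as a bound. */
    unsigned int npartitions = label->d_npartitions;
    int found = 0;

    if (npartitions > MAX_OSF_PARTITIONS)
        return -1;

    for (unsigned int i = 0; i < npartitions; i++) {
        if (label->d_partitions[i].p_size != 0)
            found++;
    }
    return found;
}
```

Without the bound check, a label claiming more entries than the array holds would make the loop walk into adjacent heap memory, which is the leak described above.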
  3. 14 Mar, 2011 1 commit
  4. 12 Mar, 2011 7 commits
    • Btrfs: break out of shrink_delalloc earlier · 36e39c40
      Chris Mason committed
      Josef had changed shrink_delalloc to exit after three shrink
      attempts, which wasn't quite enough because new writers could
      race in and steal free space.
      
      But it also fixed deadlocks and stalls as we tried to recover
      delalloc reservations.  The code was tweaked to loop 1024
      times, and would reset the counter any time a small amount
      of progress was made.  This was too drastic, and with a
      lot of writers we can end up stuck in shrink_delalloc forever.
      
      The shrink_delalloc loop is fairly complex because the caller is looping
      too, and the caller will go ahead and force a transaction commit to make
      sure we reclaim space.
      
      This reworks things to exit shrink_delalloc when we've forced some
      writeback and the delalloc reservations have gone down.  This means
      the writeback has not just started but has also finished at
      least some of the metadata changes required to reclaim delalloc
      space.
      
      If we've got this wrong, we're returning ENOSPC too early, which
      is a big improvement over the current behavior of hanging the machine.
      
      Test 224 in xfstests hammers on this nicely, and with 1000 writers
      trying to fill a 1GB drive we get our first ENOSPC at 93% full.  The
      other writers are able to continue until we get 100%.
      
      This is a worst case test for btrfs because the 1000 writers are doing
      small IO, and the small FS size means we don't have a lot of room
      for metadata chunks.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • NFS: NFSROOT should default to "proto=udp" · 53d47375
      Chuck Lever committed
      There have been a number of recent reports that NFSROOT is no longer
      working with default mount options, but fails only with certain NICs.
      
      Brian Downing <bdowning@lavos.net> bisected to commit 56463e50 "NFS:
      Use super.c for NFSROOT mount option parsing".  Among other things,
      this commit changes the default mount options for NFSROOT to use TCP
      instead of UDP as the underlying transport.
      
      TCP seems less able to deal with NICs that are slow to initialize.
      The system logs that have accompanied reports of problems all show
      that NFSROOT attempts to establish a TCP connection before the NIC is
      fully initialized, and thus the TCP connection attempt fails.
      
      When a TCP connection attempt fails during a mount operation, the
      NFS stack needs to fail the operation.  Usually user space knows how
      and when to retry it.  The network layer does not report a distinct
      error code for this particular failure mode.  Thus, there isn't a
      clean way for the RPC client to see that it needs to retry in this
      case, but not in others.
      
      Because NFSROOT is used in some environments where it is not possible
      to update the kernel command line to specify "udp", the proper thing
      to do is change NFSROOT to use UDP by default, as it did before commit
      56463e50.
      
      To make it easier to see how to change default mount options for
      NFSROOT and to distinguish default settings from mandatory settings,
      I've adjusted a couple of areas to document the specifics.
      
      root_nfs_cat() is also modified to deal with commas properly when
      concatenating strings containing mount option lists.  This keeps
      root_nfs_cat() call sites simpler, now that we may be concatenating
      multiple mount option strings.
      Tested-by: Brian Downing <bdowning@lavos.net>
      Tested-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Cc: <stable@kernel.org> # 2.6.37
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
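The comma handling mentioned for root_nfs_cat() can be pictured as "insert a separator only when the destination already holds options". The sketch below is an illustration under that assumption; `opts_cat` is a hypothetical userspace helper and does not reproduce the kernel function's actual signature or error codes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Append the option string `src` to the comma-separated list in `dest`.
 * A comma is inserted only if `dest` is non-empty, so callers never have
 * to manage separators themselves.
 * Returns 0 on success, -1 if the result would not fit in `destlen` bytes
 * (in which case `dest` is left unchanged).
 */
int opts_cat(char *dest, const char *src, size_t destlen)
{
    assert(dest != NULL && src != NULL);

    size_t len = strlen(dest);
    size_t srclen = strlen(src);

    if (srclen == 0)
        return 0;                      /* nothing to add, no stray comma */

    /* existing text + optional comma + new text + terminating NUL */
    if (len + (len ? 1 : 0) + srclen + 1 > destlen)
        return -1;

    if (len)
        dest[len++] = ',';
    memcpy(dest + len, src, srclen + 1);
    return 0;
}
```

With a helper like this, default options and user-supplied options can be concatenated in sequence, for example `opts_cat(buf, "udp", sizeof(buf))` followed by `opts_cat(buf, "vers=2", sizeof(buf))`.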
    • nfs4: remove duplicated #include · 57df216b
      Huang Weiyi committed
      Remove duplicated #include('s) in
        fs/nfs/nfs4proc.c
      Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • NFSv4: nfs4_state_mark_reclaim_nograce() should be static · f9feab1e
      Trond Myklebust committed
      There are no more external users of nfs4_state_mark_reclaim_nograce() or
      nfs4_state_mark_reclaim_reboot(), so mark them as static.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • NFSv4: Fix the setlk error handler · ecac799a
      Trond Myklebust committed
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • NFSv4.1: Fix the handling of the SEQUENCE status bits · b4410c2f
      Trond Myklebust committed
      We want SEQUENCE status bits to be handled by the state manager in order
      to avoid threading issues.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • NFSv4/4.1: Fix nfs4_schedule_state_recovery abuses · 0400a6b0
      Trond Myklebust committed
      nfs4_schedule_state_recovery() should only be used when we need to force
      the state manager to check the lease. If we just want to start the
      state manager in order to handle a state recovery situation, we should be
      using nfs4_schedule_state_manager().
      
      This patch fixes the abuses of nfs4_schedule_state_recovery() by replacing
      its use with a set of helper functions that do the right thing.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
  5. 11 Mar, 2011 10 commits
  6. 10 Mar, 2011 10 commits
  7. 09 Mar, 2011 3 commits
  8. 08 Mar, 2011 3 commits
    • unfuck proc_sysctl ->d_compare() · dfef6dcd
      Al Viro committed
      a) struct inode is not going to be freed under ->d_compare();
      however, the thing PROC_I(inode)->sysctl points to just might.
      Fortunately, it's enough to make freeing that sucker delayed,
      provided that we don't step on its ->unregistering, clear
      the pointer to it in PROC_I(inode) before dropping the reference
      and check if it's NULL in ->d_compare().
      
      b) I'm not sure that we *can* walk into NULL inode here (we recheck
      dentry->seq between verifying that it's still hashed / fetching
      dentry->d_inode and passing it to ->d_compare() and there's no
      negative hashed dentries in /proc/sys/*), but if we can walk into
      that, we really should not have ->d_compare() return 0 on it!
      Said that, I really suspect that this check can be simply killed.
      Nick?
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • nfsd4: fix bad pointer on failure to find delegation · 32b007b4
      J. Bruce Fields committed
      In case of a nonempty list, the return on error here is obviously bogus;
      it ends up being a pointer to the list head instead of to any valid
      delegation on the list.
      
      In particular, if nfsd4_delegreturn() hits this case, and you're quite unlucky,
      then renew_client may oops, and it may take an embarrassingly long time to
      figure out why.  Facepalm.
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
      IP: [<ffffffff81292965>] nfsd4_delegreturn+0x125/0x200
      ...
      
      Cc: stable@kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
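The class of bug described here arises with circular intrusive lists: when a search loop runs to completion without a match, the cursor has wrapped around to the list head, so returning it yields a pointer that is not a real entry. The sketch below uses a hand-rolled minimal list rather than the kernel's list.h, and all names (`struct delegation`, `find_delegation`, `list_init`, `list_append`) are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

struct delegation {
    int id;
    struct list_head link;   /* embedded list node */
};

/* Recover the enclosing delegation from a pointer to its list node. */
#define entry_of(ptr) \
    ((struct delegation *)((char *)(ptr) - offsetof(struct delegation, link)))

void list_init(struct list_head *head)
{
    head->next = head->prev = head;
}

void list_append(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/*
 * Return the delegation with the given id, or NULL if none matches.
 * The explicit NULL on the not-found path is the point: falling out of
 * the loop and returning the cursor would hand back the list head.
 */
struct delegation *find_delegation(struct list_head *head, int id)
{
    assert(head != NULL);

    for (struct list_head *pos = head->next; pos != head; pos = pos->next) {
        struct delegation *dp = entry_of(pos);
        if (dp->id == id)
            return dp;
    }
    return NULL;
}
```

Callers can then test the result against NULL before dereferencing it.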
    • Btrfs: deal with short returns from copy_from_user · 31339acd
      Chris Mason committed
      When copy_from_user is only able to copy some of the bytes we requested,
      we may end up creating a partially up to date page.  To avoid garbage in
      the page, we need to treat a partial copy as a zero length copy.
      
      This makes the rest of the file_write code drop the page and
      retry the whole copy instead of marking the partially up to
      date page as dirty.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      cc: stable@kernel.org
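The "treat a partial copy as a zero length copy" rule can be sketched in userspace as follows. `fake_copy_from_user` is a hypothetical stand-in that mimics the kernel convention of returning the number of bytes that could not be copied; `copy_all_or_nothing` is likewise an assumed name, not a function from the Btrfs write path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Stand-in for copy_from_user(): copies at most `avail` of the requested
 * `len` bytes and returns the number of bytes NOT copied.
 */
static size_t fake_copy_from_user(char *dst, const char *src,
                                  size_t len, size_t avail)
{
    size_t n = len < avail ? len : avail;

    memcpy(dst, src, n);
    return len - n;
}

/*
 * Returns the number of bytes the caller may treat as valid in `page`:
 * either the full `len`, or 0 if the copy came up short. Reporting 0
 * makes the caller drop the page and retry instead of dirtying a page
 * that is only partially up to date.
 */
size_t copy_all_or_nothing(char *page, const char *src,
                           size_t len, size_t avail)
{
    assert(page != NULL && src != NULL);

    size_t left = fake_copy_from_user(page, src, len, avail);

    if (left != 0)
        return 0;
    return len;
}
```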