1. 01 Apr, 2012 5 commits
  2. 30 Mar, 2012 2 commits
    • Revert "ext4: don't release page refs in ext4_end_bio()" · 6268b325
      Linus Torvalds committed
      This reverts commit b43d17f3.
      
      Dave Jones reports that it causes lockups on his laptop, and his debug
      output showed a lot of processes hung waiting for page_writeback (or
      more commonly - processes hung waiting for a lock that was held during
      that writeback wait).
      
      The page_writeback hint made Ted suggest that Dave look at this commit,
      and Dave verified that reverting it makes his problems go away.
      
      Ted says:
       "That commit fixes a race which is seen when you write into fallocated
        (and hence uninitialized) disk blocks under *very* heavy memory
        pressure.  Furthermore, although theoretically it could trigger under
        normal direct I/O writes, it only seems to trigger if you are issuing
        a huge number of AIO writes, such that a just-written page can get
        evicted from memory, and then read back into memory, before the
        workqueue has a chance to update the extent tree.
      
        This race has been around for a little over a year, and no one noticed
        until two months ago; it only happens under fairly exotic conditions,
        and in fact even after trying very hard to create a simple repro under
        lab conditions, we could only reproduce the problem and confirm the
        fix on production servers running MySQL on very fast PCIe-attached
        flash devices.
      
        Given that Dave was able to hit this problem pretty quickly, if we
        confirm that this commit is at fault, the only reasonable thing to do
        is to revert it IMO."
      Reported-and-tested-by: Dave Jones <davej@redhat.com>
      Acked-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • pagemap: remove remaining unneeded spin_lock() · 10bdfb5e
      Naoya Horiguchi committed
      Commit 025c5b24 ("thp: optimize away unnecessary page table
      locking") moves spin_lock() into pmd_trans_huge_lock() in order to avoid
      locking unless pmd is for thp.  So this spin_lock() is a bug.
      Reported-by: Sasha Levin <levinsasha928@gmail.com>
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 29 Mar, 2012 9 commits
  4. 28 Mar, 2012 8 commits
  5. 27 Mar, 2012 3 commits
    • xfs: Account log unmount transaction correctly · 3948659e
      Dave Chinner committed
      There have been a few reports of this warning appearing recently:
      
      XFS (dm-4): xlog_space_left: head behind tail
       tail_cycle = 129, tail_bytes = 20163072
       GH   cycle = 129, GH   bytes = 20162880
      
      The common cause appears to be lots of freeze and unfreeze cycles,
      and the output from the warnings indicates that we are leaking
      around 8 bytes of log space per freeze/unfreeze cycle.
      
      When we freeze the filesystem, we write an unmount record and that
      uses xlog_write directly - a special type of transaction,
      effectively. What it doesn't do, however, is correctly account for
      the log space it uses. The unmount record writes an 8 byte structure
      with a special magic number into the log, and the space this
      consumes is not accounted for in the log ticket tracking the
      operation. Hence we leak 8 bytes for every unmount record that is
      written.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
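
As a rough illustration of the accounting described in this commit message, the following C sketch works through the numbers in the quoted warning. The helper names and the constant name are hypothetical, and attributing the whole gap to leaked unmount records is an assumption made for the example, not something the commit states.

```c
#include <stdint.h>

/* Size of the unmount record structure, as given in the commit message. */
#define DEMO_UNMOUNT_REC_BYTES 8

/* Bytes by which the grant head trails the tail within the same cycle. */
int64_t demo_head_deficit(int64_t tail_bytes, int64_t head_bytes)
{
    return tail_bytes - head_bytes;
}

/* Number of unaccounted 8-byte records that would explain a given deficit. */
int64_t demo_leaked_records(int64_t deficit_bytes)
{
    return deficit_bytes / DEMO_UNMOUNT_REC_BYTES;
}
```

With the values from the warning (tail_bytes = 20163072, GH bytes = 20162880), the deficit is 192 bytes, which would correspond to 24 unaccounted records of 8 bytes each under that assumption.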
    • xfs: don't cache inodes read through bulkstat · 5132ba8f
      Dave Chinner committed
      When we read inodes via bulkstat, we generally only read them once
      and then throw them away - they never get used again. If we retain
      them in cache, then it simply causes the working set of inodes and
      other cached items to be reclaimed just so the inode cache can grow.
      
      Avoid this problem by marking inodes read by bulkstat not to be
      cached and check this flag in .drop_inode to determine whether the
      inode should be added to the VFS LRU or not. If the inode lookup
      hits an already cached inode, then don't set the flag. If the inode
      lookup hits an inode marked with no cache flag, remove the flag and
      allow it to be cached once the current reference goes away.
      
      Inodes marked as not cached will get cleaned up by the background
      inode reclaim or via memory pressure, so they will still generate
      some short term cache pressure. They will, however, be reclaimed
      much sooner and in preference to cache hot inodes.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
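
The flag handling described in this commit message can be sketched in plain C as follows. This is a simplified model, not the XFS code: `struct demo_inode`, `DEMO_DONTCACHE`, `demo_lookup()` and `demo_drop_inode()` are hypothetical names, and the real implementation works with the VFS inode cache and locking that are omitted here.

```c
#include <stdbool.h>

/* Hypothetical "do not cache" flag for the illustration. */
#define DEMO_DONTCACHE 0x1u

struct demo_inode {
    unsigned int flags;
};

/*
 * Model of an inode lookup.
 *  - cache miss from bulkstat: mark the inode as not to be cached
 *  - cache hit: never set the flag, and clear it if it was set, so the
 *    inode can be cached once the current reference goes away
 */
void demo_lookup(struct demo_inode *ip, bool cache_hit, bool from_bulkstat)
{
    if (cache_hit) {
        ip->flags &= ~DEMO_DONTCACHE;
        return;
    }
    if (from_bulkstat)
        ip->flags |= DEMO_DONTCACHE;
}

/*
 * Model of the .drop_inode decision: true means "drop now, do not add
 * the inode to the LRU".
 */
bool demo_drop_inode(const struct demo_inode *ip)
{
    return (ip->flags & DEMO_DONTCACHE) != 0;
}
```

In this model an inode read only by bulkstat is dropped on its last reference, while one that is later looked up through the normal path loses the flag and is cached as usual.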
    • xfs: trace xfs_name strings correctly · f6161375
      Christoph Hellwig committed
      Strings stored in an xfs_name structure are often not NUL terminated,
      so print them using the correct printf specifiers that make use of the
      string length stored in the xfs_name structure.
      Reported-by: Brian Candler <B.Candler@pobox.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
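
A minimal userspace sketch of the idea, assuming a length-carrying name structure similar in spirit to xfs_name: the standard `%.*s` specifier takes the length as an argument and reads at most that many bytes, so no NUL terminator is needed. `struct demo_name` and `demo_format_name()` are hypothetical names for the illustration, not the kernel tracing code.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a name with an explicit length and no NUL. */
struct demo_name {
    const unsigned char *name;
    int len;
};

/*
 * Format the name into buf. "%.*s" consumes an int precision followed by
 * the string pointer and reads at most that many bytes from it.
 * Returns the value of snprintf().
 */
int demo_format_name(char *buf, size_t size, const struct demo_name *n)
{
    return snprintf(buf, size, "name=%.*s", n->len, (const char *)n->name);
}
```

Using a plain `%s` here would keep reading past the end of the name until it happened to find a zero byte.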
  6. 26 Mar, 2012 7 commits
    • nfsd4: allow numeric idmapping · e9541ce8
      J. Bruce Fields committed
      Mimic the client side by providing a module parameter that turns off
      idmapping in the auth_sys case, for backwards compatibility with NFSv2
      and NFSv3.
      
      Unlike in the client case, we don't have any way to negotiate, since the
      client cannot return an error to us if it doesn't like the id that we
      return to it in (for example) a getattr call.
      
      However, it has always been possible for servers to return numeric id's,
      and as far as we're aware clients have always been able to handle them.
      
      Also, in the auth_sys case clients already need to have numeric id's the
      same between client and server.
      
      Therefore we believe it's safe to default this to on; but the module
      parameter is available to return to previous behavior if this proves to
      be a problem in some unexpected setup.
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd: don't allow legacy client tracker init for anything but init_net · cc27e0d4
      Jeff Layton committed
      This code isn't set up for containers, so don't allow it to be
      used for anything but init_net.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd: add notifier to handle mount/unmount of rpc_pipefs sb · 813fd320
      Jeff Layton committed
      In the event that rpc_pipefs isn't mounted when nfsd starts, we
      must register a notifier to handle creating the dentry once it
      is mounted, and to remove the dentry on unmount.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd: add the infrastructure to handle the cld upcall · f3f80148
      Jeff Layton committed
      ...and add a mechanism for switching between the "legacy" tracker and
      the new one. The decision is made by looking to see whether the
      v4recoverydir exists. If it does, then the legacy client tracker is
      used.
      
      If it does not, then the kernel will create a "cld" pipe in rpc_pipefs.
      That pipe is used to talk to a daemon for handling the upcall.
      
      Most of the data structures for the new client tracker are handled on a
      per-namespace basis, so this upcall should be essentially ready for
      containerization. For now however, nfsd just starts it by calling the
      initialization and exit functions for init_net.
      
      I'm making the assumption that at some point in the future we'll be able
      to determine the net namespace from the nfs4_client. Until then, this
      patch hardcodes init_net in those places. I've sprinkled some "FIXME"
      comments around that code to attempt to make it clear where we'll need
      to fix that up later.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
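
The selection rule in this commit message (use the legacy tracker if the v4recoverydir exists, otherwise use the new "cld" one) can be modelled in userspace roughly as below. This is only an analogy: the kernel does not call `stat()`, and `enum demo_tracker` and `demo_choose_tracker()` are hypothetical names introduced for the sketch.

```c
#include <sys/stat.h>

enum demo_tracker {
    DEMO_TRACKER_LEGACY, /* recovery directory present */
    DEMO_TRACKER_CLD     /* fall back to the pipe-based upcall */
};

/* Pick a tracker based on whether the recovery directory exists. */
enum demo_tracker demo_choose_tracker(const char *recoverydir)
{
    struct stat st;

    if (stat(recoverydir, &st) == 0 && S_ISDIR(st.st_mode))
        return DEMO_TRACKER_LEGACY;
    return DEMO_TRACKER_CLD;
}
```

The point of the rule is that existing setups with a recovery directory keep their current behavior, and only systems without one switch to the new upcall.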
    • nfsd: add a per-net-namespace struct for nfsd · 7ea34ac1
      Jeff Layton committed
      Eventually, we'll need this when nfsd gets containerized fully. For
      now, create a struct on a per-net-namespace basis that will just hold
      a pointer to the cld_net structure. That struct will hold all of the
      per-net data that we need for the cld tracker.
      
      Eventually we can add other pernet objects to struct nfsd_net.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd: add nfsd4_client_tracking_ops struct and a way to set it · 2a4317c5
      Jeff Layton committed
      Abstract out the mechanism that we use to track clients into a set of
      client name tracking functions.
      
      This gives us a mechanism to plug in a new set of client tracking
      functions without disturbing the callers. It also gives us a way to
      decide on what tracking scheme to use at runtime.
      
      For now, this just looks like pointless abstraction, but later we'll
      add a new alternate scheme for tracking clients on stable storage.
      
      Note too that this patch anticipates the eventual containerization
      of this code by passing in struct net pointers in places. No attempt
      is made to containerize the legacy client tracker however.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
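
The general pattern this commit message describes, a table of function pointers that callers go through so the backend can be chosen at runtime, might look like the sketch below. The member set and all `demo_` names are assumptions for illustration; they are not the fields of the actual nfsd4_client_tracking_ops structure.

```c
#include <stddef.h>

struct demo_client {
    int recorded; /* 1 once the client has been recorded */
};

/* Hypothetical set of client tracking operations. */
struct demo_tracking_ops {
    int  (*init)(void);
    void (*create)(struct demo_client *clp);
    void (*remove)(struct demo_client *clp);
};

/* A trivial "legacy" backend for the example. */
static int legacy_init(void) { return 0; }
static void legacy_create(struct demo_client *clp) { clp->recorded = 1; }
static void legacy_remove(struct demo_client *clp) { clp->recorded = 0; }

const struct demo_tracking_ops demo_legacy_ops = {
    .init   = legacy_init,
    .create = legacy_create,
    .remove = legacy_remove,
};

/* Backend selected at runtime; NULL until initialization succeeds. */
static const struct demo_tracking_ops *demo_ops;

int demo_tracking_init(const struct demo_tracking_ops *ops)
{
    int ret = ops->init();

    if (ret == 0)
        demo_ops = ops;
    return ret;
}

/* Callers use these wrappers and never name a backend directly. */
void demo_client_record_create(struct demo_client *clp)
{
    if (demo_ops)
        demo_ops->create(clp);
}

void demo_client_record_remove(struct demo_client *clp)
{
    if (demo_ops)
        demo_ops->remove(clp);
}
```

Adding a second backend then only requires another ops table and a different argument to `demo_tracking_init()`; the callers stay unchanged.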
    • nfsd: convert nfs4_client->cl_cb_flags to a generic flags field · a52d726b
      Jeff Layton committed
      We'll need a way to flag the nfs4_client as already being recorded on
      stable storage so that we don't continually upcall. Currently, that's
      recorded in the cl_firststate field of the client struct. Using an
      entire u32 to store a flag is rather wasteful though.
      
      The cl_cb_flags field is only using 2 bits right now, so repurpose that
      to a generic flags field. Rename NFSD4_CLIENT_KILL to
      NFSD4_CLIENT_CB_KILL to make it evident that it's part of the callback
      flags. Add a mask that we can use for existing checks that look to see
      whether any flags are set, so that the new flags don't interfere.
      
      Convert all references to cl_firststate to the NFSD4_CLIENT_STABLE flag,
      and add a new NFSD4_CLIENT_RECLAIM_COMPLETE flag.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
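
A small sketch of the flag layout this commit message describes: several one-bit flags share a single word, and a mask keeps the existing "is any callback flag set" checks from being tripped by the new flags. The `demo_` names and the bit positions are assumptions for the illustration, and the plain bit operations stand in for whatever atomic helpers the kernel code uses.

```c
#include <stdbool.h>

/* Hypothetical bit numbers within the shared flags word. */
enum {
    DEMO_CLIENT_CB_UPDATE = 0,
    DEMO_CLIENT_CB_KILL,
    DEMO_CLIENT_STABLE,
    DEMO_CLIENT_RECLAIM_COMPLETE
};

#define DEMO_BIT(n) (1ul << (n))

/* Only the callback-related bits. */
#define DEMO_CLIENT_CB_FLAG_MASK \
    (DEMO_BIT(DEMO_CLIENT_CB_UPDATE) | DEMO_BIT(DEMO_CLIENT_CB_KILL))

struct demo_nfs4_client {
    unsigned long flags;
};

void demo_set_flag(struct demo_nfs4_client *clp, int bit)
{
    clp->flags |= DEMO_BIT(bit);
}

bool demo_test_flag(const struct demo_nfs4_client *clp, int bit)
{
    return (clp->flags & DEMO_BIT(bit)) != 0;
}

/* True only if a callback flag is set; other flags are masked out. */
bool demo_cb_flags_set(const struct demo_nfs4_client *clp)
{
    return (clp->flags & DEMO_CLIENT_CB_FLAG_MASK) != 0;
}
```

Without the mask, a check such as `clp->flags != 0` would start returning true as soon as the stable-storage flag was set, even with no callback work pending.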
  7. 25 Mar, 2012 3 commits
  8. 24 Mar, 2012 3 commits