1. 09 1月, 2011 1 次提交
  2. 07 1月, 2011 10 次提交
    • N
      fs: provide rcu-walk aware permission i_ops · b74c79e9
      Nick Piggin 提交于
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      b74c79e9
    • N
      fs: rcu-walk aware d_revalidate method · 34286d66
      Nick Piggin 提交于
      Require filesystems be aware of .d_revalidate being called in rcu-walk
      mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning
      -ECHILD from all implementations.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      34286d66
    • N
      fs: rcu-walk for path lookup · 31e6b01f
      Nick Piggin 提交于
      Perform common cases of path lookups without any stores or locking in the
      ancestor dentry elements. This is called rcu-walk, as opposed to the current
      algorithm which is a refcount based walk, or ref-walk.
      
      This results in far fewer atomic operations on every path element,
      significantly improving path lookup performance. It also avoids cacheline
      bouncing on common dentries, significantly improving scalability.
      
      The overall design is like this:
      * LOOKUP_RCU is set in nd->flags, which distinguishes rcu-walk from ref-walk.
      * Take the RCU lock for the entire path walk, starting with the acquiring
        of the starting path (eg. root/cwd/fd-path). So now dentry refcounts are
        not required for dentry persistence.
      * synchronize_rcu is called when unregistering a filesystem, so we can
        access d_ops and i_ops during rcu-walk.
      * Similarly take the vfsmount lock for the entire path walk. So now mnt
        refcounts are not required for persistence. Also we are free to perform mount
        lookups, and to assume dentry mount points and mount roots are stable up and
        down the path.
      * Have a per-dentry seqlock to protect the dentry name, parent, and inode,
        so we can load this tuple atomically, and also check whether any of its
        members have changed.
      * Dentry lookups (based on parent, candidate string tuple) recheck the parent
        sequence after the child is found in case anything changed in the parent
        during the path walk.
      * inode is also RCU protected so we can load d_inode and use the inode for
        limited things.
      * i_mode, i_uid, i_gid can be tested for exec permissions during path walk.
      * i_op can be loaded.
      
      When we reach the destination dentry, we lock it, recheck lookup sequence,
      and increment its refcount and mountpoint refcount. RCU and vfsmount locks
      are dropped. This is termed "dropping rcu-walk". If the dentry refcount does
      not match, we can not drop rcu-walk gracefully at the current point in the
      lokup, so instead return -ECHILD (for want of a better errno). This signals the
      path walking code to re-do the entire lookup with a ref-walk.
      
      Aside from the final dentry, there are other situations that may be encounted
      where we cannot continue rcu-walk. In that case, we drop rcu-walk (ie. take
      a reference on the last good dentry) and continue with a ref-walk. Again, if
      we can drop rcu-walk gracefully, we return -ECHILD and do the whole lookup
      using ref-walk. But it is very important that we can continue with ref-walk
      for most cases, particularly to avoid the overhead of double lookups, and to
      gain the scalability advantages on common path elements (like cwd and root).
      
      The cases where rcu-walk cannot continue are:
      * NULL dentry (ie. any uncached path element)
      * parent with d_inode->i_op->permission or ACLs
      * dentries with d_revalidate
      * Following links
      
      In future patches, permission checks and d_revalidate become rcu-walk aware. It
      may be possible eventually to make following links rcu-walk aware.
      
      Uncached path elements will always require dropping to ref-walk mode, at the
      very least because i_mutex needs to be grabbed, and objects allocated.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      31e6b01f
    • N
      fs: icache RCU free inodes · fa0d7e3d
      Nick Piggin 提交于
      RCU free the struct inode. This will allow:
      
      - Subsequent store-free path walking patch. The inode must be consulted for
        permissions when walking, so an RCU inode reference is a must.
      - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
        to take i_lock no longer need to take sb_inode_list_lock to walk the list in
        the first place. This will simplify and optimize locking.
      - Could remove some nested trylock loops in dcache code
      - Could potentially simplify things a bit in VM land. Do not need to take the
        page lock to follow page->mapping.
      
      The downsides of this is the performance cost of using RCU. In a simple
      creat/unlink microbenchmark, performance drops by about 10% due to inability to
      reuse cache-hot slab objects. As iterations increase and RCU freeing starts
      kicking over, this increases to about 20%.
      
      In cases where inode lifetimes are longer (ie. many inodes may be allocated
      during the average life span of a single inode), a lot of this cache reuse is
      not applicable, so the regression caused by this patch is smaller.
      
      The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
      however this adds some complexity to list walking and store-free path walking,
      so I prefer to implement this at a later date, if it is shown to be a win in
      real situations. I haven't found a regression in any non-micro benchmark so I
      doubt it will be a problem.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      fa0d7e3d
    • N
      fs: dcache remove dcache_lock · b5c84bf6
      Nick Piggin 提交于
      dcache_lock no longer protects anything. remove it.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      b5c84bf6
    • N
      fs: change d_hash for rcu-walk · b1e6a015
      Nick Piggin 提交于
      Change d_hash so it may be called from lock-free RCU lookups. See similar
      patch for d_compare for details.
      
      For in-tree filesystems, this is just a mechanical change.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      b1e6a015
    • N
      fs: change d_compare for rcu-walk · 621e155a
      Nick Piggin 提交于
      Change d_compare so it may be called from lock-free RCU lookups. This
      does put significant restrictions on what may be done from the callback,
      however there don't seem to have been any problems with in-tree fses.
      If some strange use case pops up that _really_ cannot cope with the
      rcu-walk rules, we can just add new rcu-unaware callbacks, which would
      cause name lookup to drop out of rcu-walk mode.
      
      For in-tree filesystems, this is just a mechanical change.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      621e155a
    • N
      fs: change d_delete semantics · fe15ce44
      Nick Piggin 提交于
      Change d_delete from a dentry deletion notification to a dentry caching
      advise, more like ->drop_inode. Require it to be constant and idempotent,
      and not take d_lock. This is how all existing filesystems use the callback
      anyway.
      
      This makes fine grained dentry locking of dput and dentry lru scanning
      much simpler.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      fe15ce44
    • N
      Documentation: update kernel-docs.txt · d5ba92b7
      Nicolas Kaiser 提交于
      Fixed typos, and removed duplicated entries.
      Signed-off-by: NNicolas Kaiser <nikai@nikai.net>
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d5ba92b7
    • M
      Documentation/dontdiff: add further autogenerated files to ignore list · a4064978
      Michael Prokop 提交于
      Mainly resulting from (but not limited to) autogenerated files of
      lib/raid6 and drivers/gpu/drm/radeon. List generated as result of
      a diff of a clean 2.6.36 tree against a built one.
      Signed-off-by: NMichael Prokop <mika@grml.org>
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a4064978
  3. 06 1月, 2011 1 次提交
    • J
      tools, perf: Documentation for the power events API · 4b95f135
      Jean Pihet 提交于
      Provides documentation for the following:
      - the new power trace API,
      - the old (legacy) power trace API,
      - the DEPRECATED Kconfig option usage.
      Signed-off-by: NJean Pihet <j-pihet@ti.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: trenn@suse.de
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: linux-pm@lists.linux-foundation.org
      LKML-Reference: <1294253342-29056-3-git-send-email-j-pihet@ti.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4b95f135
  4. 05 1月, 2011 1 次提交
  5. 03 1月, 2011 1 次提交
    • B
      watchdog: Improve initialisation error message and documentation · 55142374
      Ben Hutchings 提交于
      The error message 'NMI watchdog failed to create perf event...'
      does not make it clear that this is a fatal error for the
      watchdog.  It also currently prints the error value as a
      pointer, rather than extracting the error code with PTR_ERR().
      Fix that.
      
      Add a note to the description of the 'nowatchdog' kernel
      parameter to associate it with this message.
      Reported-by: NCesare Leonardi <celeonar@gmail.com>
      Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
      Cc: 599368@bugs.debian.org
      Cc: 608138@bugs.debian.org
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: <stable@kernel.org> # .37.x and later
      LKML-Reference: <1294009362.3167.126.camel@localhost>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      55142374
  6. 31 12月, 2010 1 次提交
  7. 29 12月, 2010 16 次提交
  8. 25 12月, 2010 1 次提交
  9. 23 12月, 2010 3 次提交
    • A
      [SCSI] megaraid_sas: Documentation update · 5f7bb3a4
      adam radford 提交于
      The following patch updates the
      Documentation/scsi/ChangeLog.megaraid_sas file.
      Signed-off-by: NAdam Radford <aradford@gmail.com>
      Signed-off-by: NJames Bottomley <James.Bottomley@suse.de>
      5f7bb3a4
    • J
      taskstats: pad taskstats netlink response for aligment issues on ia64 · 4be2c95d
      Jeff Mahoney 提交于
      The taskstats structure is internally aligned on 8 byte boundaries but the
      layout of the aggregrate reply, with two NLA headers and the pid (each 4
      bytes), actually force the entire structure to be unaligned.  This causes
      the kernel to issue unaligned access warnings on some architectures like
      ia64.  Unfortunately, some software out there doesn't properly unroll the
      NLA packet and assumes that the start of the taskstats structure will
      always be 20 bytes from the start of the netlink payload.  Aligning the
      start of the taskstats structure breaks this software, which we don't
      want.  So, for now the alignment only happens on architectures that
      require it and those users will have to update to fixed versions of those
      packages.  Space is reserved in the packet only when needed.  This ifdef
      should be removed in several years e.g.  2012 once we can be confident
      that fixed versions are installed on most systems.  We add the padding
      before the aggregate since the aggregate is already a defined type.
      
      Commit 85893120 ("delayacct: align to 8 byte boundary on 64-bit systems")
      previously addressed the alignment issues by padding out the pid field.
      This was supposed to be a compatible change but the circumstances
      described above mean that it wasn't.  This patch backs out that change,
      since it was a hack, and introduces a new NULL attribute type to provide
      the padding.  Padding the response with 4 bytes avoids allocating an
      aligned taskstats structure and copying it back.  Since the structure
      weighs in at 328 bytes, it's too big to do it on the stack.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Reported-by: NBrian Rogers <brian@xyzw.org>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Guillaume Chazarain <guichaz@gmail.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4be2c95d
    • M
      mm: vmscan: tracepoint: account for scanned pages similarly for both ftrace and vmstat · 7a2d19bc
      Mel Gorman 提交于
      When correlating ftrace results with /proc/vmstat, I noticed that the
      reporting scripts value for "pages scanned" differed significantly.  Both
      values were "right" depending on how you look at it.
      
      The difference is due to vmstat only counting scanning of the inactive
      list towards pages scanned.  The analysis script for the tracepoint counts
      active and inactive list yielding a far higher value than vmstat.  The
      resulting scanning/reclaim ratio looks much worse.  The tracepoint is ok
      but this patch updates the reporting script so that the report values for
      scanned are similar to vmstat.
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7a2d19bc
  10. 22 12月, 2010 1 次提交
  11. 21 12月, 2010 1 次提交
  12. 20 12月, 2010 1 次提交
  13. 18 12月, 2010 1 次提交
  14. 17 12月, 2010 1 次提交