1. 12 Jun, 2009 40 commits
    • N
      fs: introduce mnt_clone_write · 96029c4e
      npiggin@suse.de committed
      This patch speeds up lmbench lat_mmap test by about another 2% after the
      first patch.
      
      Before:
       avg = 462.286
       std = 5.46106
      
      After:
       avg = 453.12
       std = 9.58257
      
      (50 runs of each, stddev gives a reasonable confidence)
      
      It does this by introducing mnt_clone_write, which avoids some heavyweight
      operations of mnt_want_write if called on a vfsmount which we know already
      has a write count; and mnt_want_write_file, which can call mnt_clone_write
      if the file is open for write.
      
      After these two patches, mnt_want_write and mnt_drop_write go from 7% on
      the profile down to 1.3% (including mnt_clone_write).
      
      [AV: mnt_want_write_file() should take file alone and derive mnt from it;
      not only all callers have that form, but that's the only mnt about which
      we know that it's already held for write if file is opened for write]
      
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      96029c4e
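The fast path described in this commit can be pictured with a small userspace sketch. It is not the kernel code: the `toy_*` types and functions are hypothetical stand-ins, and the only point is that a caller who already holds a write reference (a file opened for write) can skip the full check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the kernel's struct vfsmount or struct file. */
struct toy_mount {
    bool read_only;   /* mount is currently read-only */
    int  writers;     /* outstanding write references */
    int  slow_calls;  /* how often the full check ran (illustration only) */
};

struct toy_file {
    struct toy_mount *mnt;
    bool open_for_write;
};

/* Full path: has to check that the mount is writable. */
static int toy_want_write(struct toy_mount *m)
{
    m->slow_calls++;
    if (m->read_only)
        return -1;
    m->writers++;
    return 0;
}

/* Fast path: the caller guarantees a write reference is already held. */
static int toy_clone_write(struct toy_mount *m)
{
    assert(m->writers > 0);
    m->writers++;
    return 0;
}

/* Takes the file alone and derives the mount from it. */
static int toy_want_write_file(struct toy_file *f)
{
    if (f->open_for_write)
        return toy_clone_write(f->mnt);
    return toy_want_write(f->mnt);
}

static void toy_drop_write(struct toy_mount *m)
{
    assert(m->writers > 0);
    m->writers--;
}
```

In this sketch the file-based helper only falls back to the full check when the file is not open for write, which mirrors the reasoning in the [AV] note above.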
    • N
      fs: mnt_want_write speedup · d3ef3d73
      npiggin@suse.de committed
      This patch speeds up lmbench lat_mmap test by about 8%. lat_mmap is set up
      basically to mmap a 64MB file on tmpfs, fault in its pages, then unmap it.
      A microbenchmark yes, but it exercises some important paths in the mm.
      
      Before:
       avg = 501.9
       std = 14.7773
      
      After:
       avg = 462.286
       std = 5.46106
      
      (50 runs of each, stddev gives a reasonable confidence, but there is quite
      a bit of variation there still)
      
      It does this by removing the complex per-cpu locking and counter-cache and
      replacing them with a percpu counter in struct vfsmount. This makes the code
      much simpler, and avoids spinlocks (although the msync is still pretty
      costly, unfortunately). It results in about 900 bytes smaller code too. It
      does increase the size of a vfsmount, however.
      
      It should also give a speedup on large systems if CPUs are frequently operating
      on different mounts (because the existing scheme has to operate on an atomic in
      the struct vfsmount when switching between mounts). But I'm most interested in
      the single threaded path performance for the moment.
      
      [AV: minor cleanup]
      
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      d3ef3d73
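A rough userspace illustration of a per-CPU writer count follows. The names (`toy_mount`, `TOY_NR_CPUS`) are assumptions made for this sketch and the real kernel uses its own percpu allocation helpers. Each CPU touches only its own slot, and only the sum over all slots is meaningful.

```c
#include <assert.h>

#define TOY_NR_CPUS 4   /* hypothetical CPU count for the sketch */

/* Hypothetical stand-in for the writer count kept in the mount. */
struct toy_mount {
    int writers[TOY_NR_CPUS];   /* one slot per CPU, no shared lock */
};

/* Increment the slot of the CPU we are running on. */
static void toy_inc_writers(struct toy_mount *m, int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    m->writers[cpu]++;
}

/* The drop may happen on a different CPU than the increment. */
static void toy_dec_writers(struct toy_mount *m, int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    m->writers[cpu]--;
}

/* Slow path (e.g. a remount read-only check): sum every slot. */
static int toy_count_writers(const struct toy_mount *m)
{
    int sum = 0;
    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        sum += m->writers[cpu];
    return sum;
}
```

An individual slot can go negative when a reference is taken on one CPU and dropped on another; the total is what a slow-path reader would consult.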
    • A
      Move junk from proc_fs.h to fs/proc/internal.h · 3174c21b
      Al Viro committed
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      3174c21b
    • A
      switch lookup_mnt() · 1c755af4
      Al Viro committed
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      1c755af4
    • A
      switch follow_mount() · 79ed0226
      Al Viro committed
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      79ed0226
    • A
      switch follow_down() · 9393bd07
      Al Viro committed
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      9393bd07
    • A
      Switch collect_mounts() to struct path · 589ff870
      Al Viro committed
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      589ff870
    • A
      switch follow_up() to struct path · bab77ebf
      Al Viro committed
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      bab77ebf
    • A
      switch rqst_exp_parent() · e64c390c
      Al Viro committed
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      e64c390c
    • A
      switch rqst_exp_get_by_name() · 91c9fa8f
      Al Viro committed
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      91c9fa8f
    • A
      switch exp_parent() to struct path · 5bf3bd2b
      Al Viro committed
      ... and lose the always-NULL last argument (non-NULL case had been
      split off a while ago).
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      5bf3bd2b
    • A
      nfsd struct path use: exp_get_by_name() · 55430e2e
      Al Viro committed
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      55430e2e
    • A
      Don't bother with check_mnt() in do_add_mount() on shrinkable ones · dd5cae6e
      Al Viro committed
      These guys are what we add as submounts; checks for "is that attached in
      our namespace" are simply irrelevant for those and counterproductive for
      use of private vfsmount trees a-la what NFS folks want.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      dd5cae6e
    • A
      Make vfs_path_lookup() use starting point as root · 5b857119
      Al Viro committed
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      5b857119
    • A
      Cache root in nameidata · 2a737871
      Al Viro committed
      New field: nd->root.  When pathname resolution wants to know the root,
      check if nd->root.mnt is non-NULL; use nd->root if it is, otherwise
      copy current->fs->root there.  After path_walk() is finished, we check
      if we'd got a cached value in nd->root and drop it.  Before calling
      path_walk() we should either set nd->root.mnt to NULL *or* copy (and
      pin down) some path to nd->root.  In the latter case we won't be
      looking at current->fs->root at all.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      2a737871
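The caching rule for nd->root can be sketched as below. This is a simplified userspace model under stated assumptions: the `toy_*` names are hypothetical, `toy_fs_root` stands in for current->fs->root, and reference counting (pinning and dropping the path) is reduced to setting and clearing a pointer.

```c
#include <stddef.h>

/* Hypothetical stand-ins; strings replace the real mount and dentry pointers. */
struct toy_path {
    const char *mnt;      /* NULL means "no root cached yet" */
    const char *dentry;
};

struct toy_nameidata {
    struct toy_path root;
};

/* Stands in for current->fs->root. */
static struct toy_path toy_fs_root = { "rootmnt", "/" };
static int toy_fs_root_reads;   /* counts how often the process root was consulted */

/* Called whenever pathname resolution needs the root. */
static void toy_set_root(struct toy_nameidata *nd)
{
    if (!nd->root.mnt) {
        nd->root = toy_fs_root;   /* the kernel would also pin the path here */
        toy_fs_root_reads++;
    }
}

/* Called after the walk finishes: drop a cached root if there is one. */
static void toy_path_walk_done(struct toy_nameidata *nd)
{
    if (nd->root.mnt) {
        nd->root.mnt = NULL;      /* the kernel would release the reference here */
        nd->root.dentry = NULL;
    }
}
```

A caller that pre-populates the root before the walk never causes the process root to be read in this model, which matches the "in the latter case" remark in the message.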
    • A
      Preparations to caching root in path_walk() · 9b4a9b14
      Al Viro committed
      Split do_path_lookup(), opencode the call from do_filp_open().
      do_filp_open() is the only caller of do_path_lookup() that
      cares about root afterwards (it keeps resolving symlinks on
      O_CREAT path after it'd done LOOKUP_PARENT walk).  So when
      we start caching fs->root in path_walk(), it'll need a different
      treatment.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      9b4a9b14
    • A
      Get rid of path_lookup in autofs4 · 4e44b685
      Al Viro committed
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      4e44b685
    • J
      reiserfs: allow exposing privroot w/ xattrs enabled · 73422811
      Jeff Mahoney committed
      This patch adds an -oexpose_privroot option to allow access to the privroot.
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      73422811
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable · a525890c
      Linus Torvalds committed
      * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (23 commits)
        Btrfs: fix extent_buffer leak during tree log replay
        Btrfs: fix oops when btrfs_inherit_iflags called with a NULL dir
        Btrfs: fix -o nodatasum printk spelling
        Btrfs: check duplicate backrefs for both data and metadata
        Btrfs: init worker struct fields before kthread-run
        Btrfs: pin buffers during write_dev_supers
        Btrfs: avoid races between super writeout and device list updates
        Fix btrfs when ACLs are configured out
        Btrfs: fdatasync should skip metadata writeout
        Btrfs: remove crc32c.h and use libcrc32c directly.
        Btrfs: implement FS_IOC_GETFLAGS/SETFLAGS/GETVERSION
        Btrfs: autodetect SSD devices
        Btrfs: add mount -o ssd_spread to spread allocations out
        Btrfs: avoid allocation clusters that are too spread out
        Btrfs: Add mount -o nossd
        Btrfs: avoid IO stalls behind congested devices in a multi-device FS
        Btrfs: don't allow WRITE_SYNC bios to starve out regular writes
        Btrfs: fix metadata dirty throttling limits
        Btrfs: reduce mount -o ssd CPU usage
        Btrfs: balance btree more often
        ...
      a525890c
    • L
      Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify · 3bb66d7f
      Linus Torvalds committed
      * 'for-linus' of git://git.infradead.org/users/eparis/notify:
        fsnotify: allow groups to set freeing_mark to null
        inotify/dnotify: should_send_event shouldn't match on FS_EVENT_ON_CHILD
        dnotify: do not bother to lock entry->lock when reading mask
        dnotify: do not use ?true:false when assigning to a bool
        fsnotify: move events should indicate the event was on a child
        inotify: reimplement inotify using fsnotify
        fsnotify: handle filesystem unmounts with fsnotify marks
        fsnotify: fsnotify marks on inodes pin them in core
        fsnotify: allow groups to add private data to events
        fsnotify: add correlations between events
        fsnotify: include pathnames with entries when possible
        fsnotify: generic notification queue and waitq
        dnotify: reimplement dnotify using fsnotify
        fsnotify: parent event notification
        fsnotify: add marks to inodes so groups can interpret how to handle those inodes
        fsnotify: unified filesystem notification backend
      3bb66d7f
    • L
      Merge branch 'for-linus' of git://linux-arm.org/linux-2.6 · 512626a0
      Linus Torvalds committed
      * 'for-linus' of git://linux-arm.org/linux-2.6:
        kmemleak: Add the corresponding MAINTAINERS entry
        kmemleak: Simple testing module for kmemleak
        kmemleak: Enable the building of the memory leak detector
        kmemleak: Remove some of the kmemleak false positives
        kmemleak: Add modules support
        kmemleak: Add kmemleak_alloc callback from alloc_large_system_hash
        kmemleak: Add the vmalloc memory allocation/freeing hooks
        kmemleak: Add the slub memory allocation/freeing hooks
        kmemleak: Add the slob memory allocation/freeing hooks
        kmemleak: Add the slab memory allocation/freeing hooks
        kmemleak: Add documentation on the memory leak detector
        kmemleak: Add the base support
      
      Manual conflict resolution (with the slab/earlyboot changes) in:
      	drivers/char/vt.c
      	init/main.c
      	mm/slab.c
      512626a0
    • L
      Merge branch 'perfcounters-for-linus' of... · 8a1ca8ce
      Linus Torvalds committed
      Merge branch 'perfcounters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
      
      * 'perfcounters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (574 commits)
        perf_counter: Turn off by default
        perf_counter: Add counter->id to the throttle event
        perf_counter: Better align code
        perf_counter: Rename L2 to LL cache
        perf_counter: Standardize event names
        perf_counter: Rename enums
        perf_counter tools: Clean up u64 usage
        perf_counter: Rename perf_counter_limit sysctl
        perf_counter: More paranoia settings
        perf_counter: powerpc: Implement generalized cache events for POWER processors
        perf_counters: powerpc: Add support for POWER7 processors
        perf_counter: Accurate period data
        perf_counter: Introduce struct for sample data
        perf_counter tools: Normalize data using per sample period data
        perf_counter: Annotate exit ctx recursion
        perf_counter tools: Propagate signals properly
        perf_counter tools: Small frequency related fixes
        perf_counter: More aggressive frequency adjustment
        perf_counter/x86: Fix the model number of Intel Core2 processors
        perf_counter, x86: Correct some event and umask values for Intel processors
        ...
      8a1ca8ce
    • L
      Merge branch 'topic/slab/earlyboot' of... · b640f042
      Linus Torvalds committed
      Merge branch 'topic/slab/earlyboot' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6
      
      * 'topic/slab/earlyboot' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
        vgacon: use slab allocator instead of the bootmem allocator
        irq: use kcalloc() instead of the bootmem allocator
        sched: use slab in cpupri_init()
        sched: use alloc_cpumask_var() instead of alloc_bootmem_cpumask_var()
        memcg: don't use bootmem allocator in setup code
        irq/cpumask: make memoryless node zero happy
        x86: remove some alloc_bootmem_cpumask_var calling
        vt: use kzalloc() instead of the bootmem allocator
        sched: use kzalloc() instead of the bootmem allocator
        init: introduce mm_init()
        vmalloc: use kzalloc() instead of alloc_bootmem()
        slab: setup allocators earlier in the boot sequence
        bootmem: fix slab fallback on numa
        bootmem: use slab if bootmem is no longer available
      b640f042
    • E
      fsnotify: allow groups to set freeing_mark to null · a092ee20
      Eric Paris committed
      Most fsnotify listeners (all but inotify) do not care about marks being
      freed.  Allow groups to set freeing_mark to null and do not call any
      function if it is set that way.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      a092ee20
    • E
      inotify/dnotify: should_send_event shouldn't match on FS_EVENT_ON_CHILD · e42e2773
      Eric Paris committed
      inotify and dnotify will both indicate that they want any event which came
      from a child inode.  The fix is to mask off FS_EVENT_ON_CHILD when deciding
      if inotify or dnotify is interested in a given event.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      e42e2773
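The masking step can be shown with a short sketch. The flag values are defined locally so the example is self-contained; treat them and the `toy_*` names as illustrative assumptions rather than the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values, defined locally for this sketch. */
#define TOY_FS_ACCESS          0x00000001u
#define TOY_FS_MODIFY          0x00000002u
#define TOY_FS_EVENT_ON_CHILD  0x08000000u

/*
 * Decide whether a watcher with `watch_mask` cares about `event_mask`.
 * The "happened on a child" bit is stripped first, so it can never be
 * the only bit the two masks have in common.
 */
static bool toy_should_send_event(uint32_t watch_mask, uint32_t event_mask)
{
    event_mask &= ~TOY_FS_EVENT_ON_CHILD;
    return (watch_mask & event_mask) != 0;
}
```

Without the masking line, a watcher interested only in access events on children would also match a modify event on a child, because both masks carry the on-child bit.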
    • E
      dnotify: do not bother to lock entry->lock when reading mask · ce61856b
      Eric Paris committed
      entry->lock is needed to make sure entry->mask does not change while
      manipulating it.  In dnotify_should_send_event() we don't care if we get an
      old or a new mask value out of this entry so there is no point in taking
      the lock.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      ce61856b
    • E
      dnotify: do not use ?true:false when assigning to a bool · 5ac697b7
      Eric Paris committed
      dnotify_should_send_event() assigned a bool using ?true:false when computing
      a bit operation.  This is pointless and the bool type does this for us.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      5ac697b7
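A minimal sketch of why the ternary is redundant: in C99 and later, converting any nonzero value to `bool` yields `true`. The function names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* The redundant form: explicit ?true:false on a bit test. */
static bool toy_with_ternary(uint32_t mask, uint32_t bit)
{
    return (mask & bit) ? true : false;
}

/* The plain form: conversion to bool already maps nonzero to true. */
static bool toy_plain(uint32_t mask, uint32_t bit)
{
    return mask & bit;
}
```

Both functions return the same result for every input, including high bits that would be lost if the return type were a narrower integer instead of `bool`.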
    • E
      fsnotify: move events should indicate the event was on a child · ff52cc21
      Eric Paris committed
      fsnotify tells its listeners explicitly when an event happened on the given
      inode versus on the child of the given inode.  (see __fsnotify_parent)
      However, the semantics of fsnotify_move() are such that we deliver events
      directly to the two parent directories in question (old_dir and new_dir)
      without using the __fsnotify_parent() call.  fsnotify should be
      adding FS_EVENT_ON_CHILD for the notifications to these parents.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      ff52cc21
    • E
      inotify: reimplement inotify using fsnotify · 63c882a0
      Eric Paris committed
      Reimplement inotify_user using fsnotify.  This should be feature for feature
      exactly the same as the original inotify_user.  This does not make any changes
      to the in kernel inotify feature used by audit.  Those patches (and the eventual
      removal of in kernel inotify) will come after the new inotify_user proves to be
      working correctly.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      63c882a0
    • E
      fsnotify: handle filesystem unmounts with fsnotify marks · 164bc619
      Eric Paris committed
      When an fs is unmounted with an fsnotify mark entry attached to one of its
      inodes we need to destroy that mark entry and we also (like inotify) send
      an unmount event.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      164bc619
    • E
      fsnotify: fsnotify marks on inodes pin them in core · 1ef5f13c
      Eric Paris committed
      This patch pins any inodes with an fsnotify mark in core.  The idea is that
      as soon as the mark is removed from the inode->fsnotify_mark_entries list
      the inode will be iput.  In reality it doesn't quite work exactly this way.
      The igrab will happen when the mark is added to an inode, but the iput will
      happen when the inode pointer is NULL'd inside the mark.
      
      It's possible that 2 racing things will try to remove the mark from
      different directions.  One may try to remove the mark because of an
      explicit request and one might try to remove it because the inode was
      deleted.  It's possible that the removal because of inode deletion will
      remove the mark from the inode's list, but the removal by explicit request
      will actually set entry->inode == NULL; and call the iput.  This is safe.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      1ef5f13c
    • E
      fsnotify: allow groups to add private data to events · e4aff117
      Eric Paris committed
      inotify needs per group information attached to events.  This patch allows
      groups to attach private information and implements a callback so that
      information can be freed when an event is being destroyed.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      e4aff117
    • E
      fsnotify: add correlations between events · 47882c6f
      Eric Paris committed
      Standard inotify events include a correlation cookie that ties together the
      two events generated by a dentry move.  This patch includes the same behaviour
      in fsnotify events.  It is needed so that inotify userspace can be
      implemented on top of fsnotify.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      47882c6f
    • E
      fsnotify: include pathnames with entries when possible · 62ffe5df
      Eric Paris committed
      When inotify wants to send events to a directory about a child it includes
      the name of the original file.  This patch collects that filename and makes
      it available for notification.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      62ffe5df
    • E
      fsnotify: generic notification queue and waitq · a2d8bc6c
      Eric Paris committed
      inotify needs to do async notification in which event information is stored
      on a queue until the listener is ready to receive it.  This patch
      implements a generic notification queue for inotify (and later fanotify) to
      store events to be sent at a later time.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      a2d8bc6c
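The idea of holding events until the listener is ready can be modelled with a small fixed-size FIFO. This is only a sketch under assumptions: the `toy_*` names and the capacity are hypothetical, and locking and the wait queue are left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_QUEUE_MAX 8   /* hypothetical capacity */

struct toy_event {
    uint32_t mask;        /* which filesystem event happened */
};

/* Ring buffer: `head` is the oldest entry, `len` the number stored. */
struct toy_queue {
    struct toy_event ev[TOY_QUEUE_MAX];
    size_t head;
    size_t len;
};

/* Producer side: store an event for later delivery. */
static bool toy_queue_add(struct toy_queue *q, struct toy_event e)
{
    if (q->len == TOY_QUEUE_MAX)
        return false;                       /* queue full, event not stored */
    q->ev[(q->head + q->len) % TOY_QUEUE_MAX] = e;
    q->len++;
    return true;
}

/* Consumer side: the listener pulls events in arrival order. */
static bool toy_queue_remove(struct toy_queue *q, struct toy_event *out)
{
    if (q->len == 0)
        return false;                       /* nothing pending */
    *out = q->ev[q->head];
    q->head = (q->head + 1) % TOY_QUEUE_MAX;
    q->len--;
    return true;
}
```

What happens when the queue is full is a policy decision; this sketch simply reports failure to the caller.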
    • E
      dnotify: reimplement dnotify using fsnotify · 3c5119c0
      Eric Paris committed
      Reimplement dnotify using fsnotify.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      3c5119c0
    • E
      fsnotify: parent event notification · c28f7e56
      Eric Paris committed
      inotify and dnotify both use a similar parent notification mechanism.  We
      add a generic parent notification mechanism to fsnotify for both of these
      to use.  This new mechanism also adds the dentry flag optimization which
      exists for inotify to dnotify.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      c28f7e56
    • E
      fsnotify: add marks to inodes so groups can interpret how to handle those inodes · 3be25f49
      Eric Paris committed
      This patch creates a way for fsnotify groups to attach marks to inodes.
      These marks have little meaning to the generic fsnotify infrastructure
      and thus their meaning should be interpreted by the group that attached
      them to the inode's list.
      
      dnotify and inotify will make use of these markings to indicate which
      inodes are of interest to their respective groups.  But this implementation
      has the useful property that in the future other listeners could actually
      use the marks for the exact opposite reason, aka to indicate which inodes
      it had NO interest in.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      3be25f49
    • E
      fsnotify: unified filesystem notification backend · 90586523
      Eric Paris committed
      fsnotify is a backend for filesystem notification.  fsnotify does
      not provide any userspace interface but does provide the basis
      needed for other notification schemes such as dnotify.  fsnotify
      can be extended to be the backend for inotify or the upcoming
      fanotify.  fsnotify provides a mechanism for "groups" to register for
      some set of filesystem events and to then deliver those events to
      those groups for processing.
      
      fsnotify has a number of benefits, the first being actually shrinking the size
      of an inode.  Before fsnotify, to support both dnotify and inotify, an inode had
      
              unsigned long           i_dnotify_mask; /* Directory notify events */
              struct dnotify_struct   *i_dnotify; /* for directory notifications */
              struct list_head        inotify_watches; /* watches on this inode */
              struct mutex            inotify_mutex;  /* protects the watches list */
      
      But with fsnotify this same functionality (and more) is done with just
      
              __u32                   i_fsnotify_mask; /* all events for this inode */
              struct hlist_head       i_fsnotify_mark_entries; /* marks on this inode */
      
      That's right, inotify, dnotify, and fanotify all in 64 bits.  We used that
      much space just in inotify_watches alone, before this patch set.
      
      fsnotify object lifetime and locking is MUCH better than what we have today.
      inotify locking is incredibly complex.  See 8f7b0ba1 as an example of
      what's been busted since inception.  inotify needs to know internal semantics
      of superblock destruction and unmounting to function.  The inode pinning and
      vfs contortions are horrible.
      
      No fsnotify implementers do allocation under locks.  This means things like
      f04b30de which (due to an overabundance of caution) changes GFP_KERNEL to
      GFP_NOFS can be reverted.  There are no longer any allocation rules when using
      or implementing your own fsnotify listener.
      
      fsnotify paves the way for fanotify.  In brief fanotify is a notification
      mechanism that delivers to the listener both an 'event' and an open file descriptor
      to the object in question.  This means that fanotify is pathname agnostic.
      Some on lkml may not care for the original companies or users that pushed for
      TALPA, but fanotify was designed with flexibility and input for other users in
      mind.  The readahead group expressed interest in fanotify as it could be used
      to profile disk access on boot without breaking the audit system.  The desktop
      search groups have also expressed interest in fanotify as it solves a number
      of the race conditions and problems present with managing inotify when more
      than a limited number of specific files are of interest.  fanotify can provide
      for a userspace access control system which makes it a clean interface for AV
      vendors to hook without trying to do binary patching on the syscall table,
      LSM, and everywhere else they do their things today.  With this patch series
      fanotify can be implemented in less than 1200 lines of easy to review code.
      Almost all of which is the socket based user interface.
      
      This patch series builds fsnotify to the point that it can implement
      dnotify and inotify_user.  Patches exist and will be sent soon after
      acceptance to finish the in kernel inotify conversion (audit) and implement
      fanotify.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      90586523
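The size comparison in this message can be checked roughly with stand-in types. Everything below is an assumption made for illustration: the `toy_*` structs only approximate the shape of the kernel types, and `toy_mutex` in particular is a guess at a lower bound, since the real mutex layout depends on the kernel configuration.

```c
#include <stdint.h>

/* Stand-ins approximating the shape of the kernel list types. */
struct toy_list_head  { struct toy_list_head *next, *prev; };
struct toy_hlist_head { void *first; };

/* Hypothetical minimal mutex: a counter plus a waiter list. */
struct toy_mutex {
    int count;
    struct toy_list_head wait_list;
};

/* Per-inode notification fields before fsnotify. */
struct toy_inode_fields_old {
    unsigned long        i_dnotify_mask;   /* Directory notify events */
    void                *i_dnotify;        /* for directory notifications */
    struct toy_list_head inotify_watches;  /* watches on this inode */
    struct toy_mutex     inotify_mutex;    /* protects the watches list */
};

/* Per-inode notification fields with fsnotify. */
struct toy_inode_fields_new {
    uint32_t              i_fsnotify_mask;          /* all events for this inode */
    struct toy_hlist_head i_fsnotify_mark_entries;  /* marks on this inode */
};
```

With 32-bit pointers the two new fields come to 8 bytes, which matches the "64 bits" in the message; on a typical 64-bit ABI alignment padding brings them to 16 bytes, still well below the old layout. Comparing `sizeof(struct toy_inode_fields_new)` with `sizeof(struct toy_inode_fields_old)` on a given machine shows the difference for that ABI.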
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6 · 871fa907
      Linus Torvalds committed
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6:
        jfs: Add missing mutex_unlock call to error path
        missing unlock in jfs_quota_write()
      871fa907