1. 12 6月, 2009 29 次提交
    • J
      vfs: Make sys_sync() use fsync_super() (version 4) · 5cee5815
      Jan Kara 提交于
      It is unnecessarily fragile to have two places (fsync_super() and do_sync())
      doing data integrity sync of the filesystem. Alter __fsync_super() to
      accommodate needs of both callers and use it. So after this patch
      __fsync_super() is the only place where we gather all the calls needed to
      properly send all data on a filesystem to disk.
      
      Nice bonus is that we get a complete livelock avoidance and write_supers()
      is now only used for periodic writeback of superblocks.
      
      sync_blockdevs() introduced a couple of patches ago is gone now.
      
      [build fixes folded]
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5cee5815
    • J
      vfs: Make __fsync_super() a static function (version 4) · 429479f0
      Jan Kara 提交于
      __fsync_super() does the same thing as fsync_super(). So change the only
      caller to use fsync_super() and make __fsync_super() static. This removes
      unnecessarily duplicated call to sync_blockdev() and prepares ground
      for the changes to __fsync_super() in the following patches.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      429479f0
    • C
      remove s_async_list · 876a9f76
      Christoph Hellwig 提交于
      Remove the unused s_async_list in the superblock, a leftover of the
      broken async inode deletion code that leaked into mainline.  Having this
      in the middle of the sync/unmount path is not helpful for the following
      cleanups.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      876a9f76
    • N
      fs: introduce mnt_clone_write · 96029c4e
      npiggin@suse.de 提交于
      This patch speeds up lmbench lat_mmap test by about another 2% after the
      first patch.
      
      Before:
       avg = 462.286
       std = 5.46106
      
      After:
       avg = 453.12
       std = 9.58257
      
      (50 runs of each, stddev gives a reasonable confidence)
      
      It does this by introducing mnt_clone_write, which avoids some heavyweight
      operations of mnt_want_write if called on a vfsmount which we know already
      has a write count; and mnt_want_write_file, which can call mnt_clone_write
      if the file is open for write.
      
      After these two patches, mnt_want_write and mnt_drop_write go from 7% on
      the profile down to 1.3% (including mnt_clone_write).
      
      [AV: mnt_want_write_file() should take file alone and derive mnt from it;
      not only all callers have that form, but that's the only mnt about which
      we know that it's already held for write if file is opened for write]
      
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      96029c4e
    • N
      fs: mnt_want_write speedup · d3ef3d73
      npiggin@suse.de 提交于
      This patch speeds up lmbench lat_mmap test by about 8%. lat_mmap is set up
      basically to mmap a 64MB file on tmpfs, fault in its pages, then unmap it.
      A microbenchmark yes, but it exercises some important paths in the mm.
      
      Before:
       avg = 501.9
       std = 14.7773
      
      After:
       avg = 462.286
       std = 5.46106
      
      (50 runs of each, stddev gives a reasonable confidence, but there is quite
      a bit of variation there still)
      
      It does this by removing the complex per-cpu locking and counter-cache and
      replaces it with a percpu counter in struct vfsmount. This makes the code
      much simpler, and avoids spinlocks (although the msync is still pretty
      costly, unfortunately). It results in about 900 bytes smaller code too. It
      does increase the size of a vfsmount, however.
      
      It should also give a speedup on large systems if CPUs are frequently operating
      on different mounts (because the existing scheme has to operate on an atomic in
      the struct vfsmount when switching between mounts). But I'm most interested in
      the single threaded path performance for the moment.
      
      [AV: minor cleanup]
      
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      d3ef3d73
    • A
      Move junk from proc_fs.h to fs/proc/internal.h · 3174c21b
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      3174c21b
    • A
      switch lookup_mnt() · 1c755af4
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      1c755af4
    • A
      switch follow_down() · 9393bd07
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      9393bd07
    • A
      Switch collect_mounts() to struct path · 589ff870
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      589ff870
    • A
      switch follow_up() to struct path · bab77ebf
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      bab77ebf
    • A
      switch rqst_exp_parent() · e64c390c
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      e64c390c
    • A
      switch rqst_exp_get_by_name() · 91c9fa8f
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      91c9fa8f
    • A
      Cache root in nameidata · 2a737871
      Al Viro 提交于
      New field: nd->root.  When pathname resolution wants to know the root,
      check if nd->root.mnt is non-NULL; use nd->root if it is, otherwise
      copy current->fs->root there.  After path_walk() is finished, we check
      if we'd got a cached value in nd->root and drop it.  Before calling
      path_walk() we should either set nd->root.mnt to NULL *or* copy (and
      pin down) some path to nd->root.  In the latter case we won't be
      looking at current->fs->root at all.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      2a737871
    • J
      reiserfs: allow exposing privroot w/ xattrs enabled · 73422811
      Jeff Mahoney 提交于
      This patch adds an -oexpose_privroot option to allow access to the privroot.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      73422811
    • E
      fsnotify: move events should indicate the event was on a child · ff52cc21
      Eric Paris 提交于
      fsnotify tells its listeners explicitly when an event happened on the given
      inode verses on the child of the given inode.  (see __fsnotify_parent)
      However, the semantics of fsnotify_move() are such that we deliver events
      directly to the two parent directories in question (old_dir and new_dir)
      directly without using the __fsnotify_parent() call.  fsnotify should be
      adding FS_EVENT_ON_CHILD for the notifications to these parents.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      ff52cc21
    • E
      inotify: reimplement inotify using fsnotify · 63c882a0
      Eric Paris 提交于
      Reimplement inotify_user using fsnotify.  This should be feature for feature
      exactly the same as the original inotify_user.  This does not make any changes
      to the in kernel inotify feature used by audit.  Those patches (and the eventual
      removal of in kernel inotify) will come after the new inotify_user proves to be
      working correctly.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      63c882a0
    • E
      fsnotify: handle filesystem unmounts with fsnotify marks · 164bc619
      Eric Paris 提交于
      When an fs is unmounted with an fsnotify mark entry attached to one of its
      inodes we need to destroy that mark entry and we also (like inotify) send
      an unmount event.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      164bc619
    • E
      fsnotify: allow groups to add private data to events · e4aff117
      Eric Paris 提交于
      inotify needs per group information attached to events.  This patch allows
      groups to attach private information and implements a callback so that
      information can be freed when an event is being destroyed.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      e4aff117
    • E
      fsnotify: add correlations between events · 47882c6f
      Eric Paris 提交于
      As part of the standard inotify events it includes a correlation cookie
      between two dentry move operations.  This patch includes the same behaviour
      in fsnotify events.  It is needed so that inotify userspace can be
      implemented on top of fsnotify.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      47882c6f
    • E
      fsnotify: include pathnames with entries when possible · 62ffe5df
      Eric Paris 提交于
      When inotify wants to send events to a directory about a child it includes
      the name of the original file.  This patch collects that filename and makes
      it available for notification.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      62ffe5df
    • E
      fsnotify: generic notification queue and waitq · a2d8bc6c
      Eric Paris 提交于
      inotify needs to do asyc notification in which event information is stored
      on a queue until the listener is ready to receive it.  This patch
      implements a generic notification queue for inotify (and later fanotify) to
      store events to be sent at a later time.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      a2d8bc6c
    • E
      dnotify: reimplement dnotify using fsnotify · 3c5119c0
      Eric Paris 提交于
      Reimplement dnotify using fsnotify.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      3c5119c0
    • E
      fsnotify: parent event notification · c28f7e56
      Eric Paris 提交于
      inotify and dnotify both use a similar parent notification mechanism.  We
      add a generic parent notification mechanism to fsnotify for both of these
      to use.  This new machanism also adds the dentry flag optimization which
      exists for inotify to dnotify.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      c28f7e56
    • E
      fsnotify: add marks to inodes so groups can interpret how to handle those inodes · 3be25f49
      Eric Paris 提交于
      This patch creates a way for fsnotify groups to attach marks to inodes.
      These marks have little meaning to the generic fsnotify infrastructure
      and thus their meaning should be interpreted by the group that attached
      them to the inode's list.
      
      dnotify and inotify  will make use of these markings to indicate which
      inodes are of interest to their respective groups.  But this implementation
      has the useful property that in the future other listeners could actually
      use the marks for the exact opposite reason, aka to indicate which inodes
      it had NO interest in.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      3be25f49
    • E
      fsnotify: unified filesystem notification backend · 90586523
      Eric Paris 提交于
      fsnotify is a backend for filesystem notification.  fsnotify does
      not provide any userspace interface but does provide the basis
      needed for other notification schemes such as dnotify.  fsnotify
      can be extended to be the backend for inotify or the upcoming
      fanotify.  fsnotify provides a mechanism for "groups" to register for
      some set of filesystem events and to then deliver those events to
      those groups for processing.
      
      fsnotify has a number of benefits, the first being actually shrinking the size
      of an inode.  Before fsnotify to support both dnotify and inotify an inode had
      
              unsigned long           i_dnotify_mask; /* Directory notify events */
              struct dnotify_struct   *i_dnotify; /* for directory notifications */
              struct list_head        inotify_watches; /* watches on this inode */
              struct mutex            inotify_mutex;  /* protects the watches list
      
      But with fsnotify this same functionallity (and more) is done with just
      
              __u32                   i_fsnotify_mask; /* all events for this inode */
              struct hlist_head       i_fsnotify_mark_entries; /* marks on this inode */
      
      That's right, inotify, dnotify, and fanotify all in 64 bits.  We used that
      much space just in inotify_watches alone, before this patch set.
      
      fsnotify object lifetime and locking is MUCH better than what we have today.
      inotify locking is incredibly complex.  See 8f7b0ba1 as an example of
      what's been busted since inception.  inotify needs to know internal semantics
      of superblock destruction and unmounting to function.  The inode pinning and
      vfs contortions are horrible.
      
      no fsnotify implementers do allocation under locks.  This means things like
      f04b30de which (due to an overabundance of caution) changes GFP_KERNEL to
      GFP_NOFS can be reverted.  There are no longer any allocation rules when using
      or implementing your own fsnotify listener.
      
      fsnotify paves the way for fanotify.  In brief fanotify is a notification
      mechanism that delivers the lisener both an 'event' and an open file descriptor
      to the object in question.  This means that fanotify is pathname agnostic.
      Some on lkml may not care for the original companies or users that pushed for
      TALPA, but fanotify was designed with flexibility and input for other users in
      mind.  The readahead group expressed interest in fanotify as it could be used
      to profile disk access on boot without breaking the audit system.  The desktop
      search groups have also expressed interest in fanotify as it solves a number
      of the race conditions and problems present with managing inotify when more
      than a limited number of specific files are of interest.  fanotify can provide
      for a userspace access control system which makes it a clean interface for AV
      vendors to hook without trying to do binary patching on the syscall table,
      LSM, and everywhere else they do their things today.  With this patch series
      fanotify can be implemented in less than 1200 lines of easy to review code.
      Almost all of which is the socket based user interface.
      
      This patch series builds fsnotify to the point that it can implement
      dnotify and inotify_user.  Patches exist and will be sent soon after
      acceptance to finish the in kernel inotify conversion (audit) and implement
      fanotify.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      90586523
    • Y
      x86: remove some alloc_bootmem_cpumask_var calling · 38c7fed2
      Yinghai Lu 提交于
      Now that we set up the slab allocator earlier, we can get rid of some
      alloc_bootmem_cpumask_var() calls in boot code.
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      38c7fed2
    • C
      kmemleak: Remove some of the kmemleak false positives · 2e1483c9
      Catalin Marinas 提交于
      There are allocations for which the main pointer cannot be found but
      they are not memory leaks. This patch fixes some of them. For more
      information on false positives, see Documentation/kmemleak.txt.
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      2e1483c9
    • C
      kmemleak: Add the slab memory allocation/freeing hooks · d5cff635
      Catalin Marinas 提交于
      This patch adds the callbacks to kmemleak_(alloc|free) functions from
      the slab allocator. The patch also adds the SLAB_NOLEAKTRACE flag to
      avoid recursive calls to kmemleak when it allocates its own data
      structures.
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: NPekka Enberg <penberg@cs.helsinki.fi>
      d5cff635
    • C
      kmemleak: Add the base support · 3c7b4e6b
      Catalin Marinas 提交于
      This patch adds the base support for the kernel memory leak
      detector. It traces the memory allocation/freeing in a way similar to
      the Boehm's conservative garbage collector, the difference being that
      the unreferenced objects are not freed but only shown in
      /sys/kernel/debug/kmemleak. Enabling this feature introduces an
      overhead to memory allocations.
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Acked-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      3c7b4e6b
  2. 11 6月, 2009 11 次提交