1. 14 5月, 2015 1 次提交
    • E
      mnt: Refactor the logic for mounting sysfs and proc in a user namespace · 1b852bce
      Eric W. Biederman 提交于
      Fresh mounts of proc and sysfs are a very special case that works very
      much like a bind mount.  Unfortunately the current structure can not
      preserve the MNT_LOCK... mount flags.  Therefore refactor the logic
      into a form that can be modified to preserve those lock bits.
      
      Add a new filesystem flag FS_USERNS_VISIBLE that requires some mount
      of the filesystem be fully visible in the current mount namespace,
      before the filesystem may be mounted.
      
      Move the logic for calling fs_fully_visible from proc and sysfs into
      fs/namespace.c where it has greater access to mount namespace state.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      1b852bce
  2. 10 5月, 2015 1 次提交
  3. 25 4月, 2015 4 次提交
    • A
      RCU pathwalk breakage when running into a symlink overmounting something · 3cab989a
      Al Viro 提交于
      Calling unlazy_walk() in walk_component() and do_last() when we find
      a symlink that needs to be followed doesn't acquire a reference to vfsmount.
      That's fine when the symlink is on the same vfsmount as the parent directory
      (which is almost always the case), but it's not always true - one _can_
      manage to bind a symlink on top of something.  And in such cases we end up
      with excessive mntput().
      
      Cc: stable@vger.kernel.org # since 2.6.39
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      3cab989a
    • J
      direct-io: only inc/dec inode->i_dio_count for file systems · fe0f07d0
      Jens Axboe 提交于
      do_blockdev_direct_IO() increments and decrements the inode
      ->i_dio_count for each IO operation. It does this to protect against
      truncate of a file. Block devices don't need this sort of protection.
      
      For a capable multiqueue setup, this atomic int is the only shared
      state between applications accessing the device for O_DIRECT, and it
      presents a scaling wall for that. In my testing, as much as 30% of
      system time is spent incrementing and decrementing this value. A mixed
      read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with
      better latencies too. Before:
      
      clat percentiles (usec):
       |  1.00th=[   33],  5.00th=[   34], 10.00th=[   34], 20.00th=[   34],
       | 30.00th=[   34], 40.00th=[   34], 50.00th=[   35], 60.00th=[   35],
       | 70.00th=[   35], 80.00th=[   35], 90.00th=[   37], 95.00th=[   80],
       | 99.00th=[   98], 99.50th=[  151], 99.90th=[  155], 99.95th=[  155],
       | 99.99th=[  165]
      
      After:
      
      clat percentiles (usec):
       |  1.00th=[   95],  5.00th=[  108], 10.00th=[  129], 20.00th=[  149],
       | 30.00th=[  155], 40.00th=[  161], 50.00th=[  167], 60.00th=[  171],
       | 70.00th=[  177], 80.00th=[  185], 90.00th=[  201], 95.00th=[  270],
       | 99.00th=[  390], 99.50th=[  398], 99.90th=[  418], 99.95th=[  422],
       | 99.99th=[  438]
      
      In other setups, Robert Elliott reported seeing good performance
      improvements:
      
      https://lkml.org/lkml/2015/4/3/557
      
      The more applications accessing the device, the worse it gets.
      
      Add a new direct-io flags, DIO_SKIP_DIO_COUNT, which tells
      do_blockdev_direct_IO() that it need not worry about incrementing
      or decrementing the inode i_dio_count for this caller.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Elliott, Robert (Server Storage) <elliott@hp.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      fe0f07d0
    • J
      fs/9p: fix readdir() · 8e3c5005
      Johannes Berg 提交于
      Al Viro's IOV changes broke 9p readdir() because the new code
      didn't abort the read when it returned nothing. The original
      code checked if the combined error/length was <= 0 but in the
      new code that accidentally got changed to just an error check.
      
      Add back the return from the function when nothing is read.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Fixes: e1200fe6 ("9p: switch p9_client_read() to passing struct iov_iter *")
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      8e3c5005
    • C
      Btrfs: prevent list corruption during free space cache processing · a3bdccc4
      Chris Mason 提交于
      __btrfs_write_out_cache is holding the ctl->tree_lock while it prepares
      a list of bitmaps to record in the free space cache.  It was dropping
      the lock while it worked on other components, which made a window for
      free_bitmap() to free the bitmap struct without removing it from the
      list.
      
      This changes things to hold the lock the whole time, and also makes sure
      we hold the lock during enospc cleanup.
      Reported-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      a3bdccc4
  4. 24 4月, 2015 15 次提交
  5. 22 4月, 2015 9 次提交
  6. 20 4月, 2015 10 次提交