1. 24 6月, 2015 2 次提交
  2. 19 6月, 2015 1 次提交
    • D
      overlayfs: Make f_path always point to the overlay and f_inode to the underlay · 4bacc9c9
      David Howells 提交于
      Make file->f_path always point to the overlay dentry so that the path in
      /proc/pid/fd is correct and to ensure that label-based LSMs have access to the
      overlay as well as the underlay (path-based LSMs probably don't need it).
      
      Using my union testsuite to set things up, before the patch I see:
      
      	[root@andromeda union-testsuite]# bash 5</mnt/a/foo107
      	[root@andromeda union-testsuite]# ls -l /proc/$$/fd/
      	...
      	lr-x------. 1 root root 64 Jun  5 14:38 5 -> /a/foo107
      	[root@andromeda union-testsuite]# stat /mnt/a/foo107
      	...
      	Device: 23h/35d Inode: 13381       Links: 1
      	...
      	[root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
      	...
      	Device: 23h/35d Inode: 13381       Links: 1
      	...
      
      After the patch:
      
      	[root@andromeda union-testsuite]# bash 5</mnt/a/foo107
      	[root@andromeda union-testsuite]# ls -l /proc/$$/fd/
      	...
      	lr-x------. 1 root root 64 Jun  5 14:22 5 -> /mnt/a/foo107
      	[root@andromeda union-testsuite]# stat /mnt/a/foo107
      	...
      	Device: 23h/35d Inode: 40346       Links: 1
      	...
      	[root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
      	...
      	Device: 23h/35d Inode: 40346       Links: 1
      	...
      
      Note the change in where /proc/$$/fd/5 points to in the ls command.  It was
      pointing to /a/foo107 (which doesn't exist) and now points to /mnt/a/foo107
      (which is correct).
      
      The inode accessed, however, is the lower layer.  The union layer is on device
      25h/37d and the upper layer on 24h/36d.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      4bacc9c9
  3. 15 5月, 2015 2 次提交
  4. 11 5月, 2015 5 次提交
  5. 25 4月, 2015 2 次提交
    • E
      fix I_DIO_WAKEUP definition · ac74d8d6
      Eric Sandeen 提交于
      I_DIO_WAKEUP is never directly used, but fix it up anyway.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      ac74d8d6
    • J
      direct-io: only inc/dec inode->i_dio_count for file systems · fe0f07d0
      Jens Axboe 提交于
      do_blockdev_direct_IO() increments and decrements the inode
      ->i_dio_count for each IO operation. It does this to protect against
      truncate of a file. Block devices don't need this sort of protection.
      
      For a capable multiqueue setup, this atomic int is the only shared
      state between applications accessing the device for O_DIRECT, and it
      presents a scaling wall for that. In my testing, as much as 30% of
      system time is spent incrementing and decrementing this value. A mixed
      read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with
      better latencies too. Before:
      
      clat percentiles (usec):
       |  1.00th=[   33],  5.00th=[   34], 10.00th=[   34], 20.00th=[   34],
       | 30.00th=[   34], 40.00th=[   34], 50.00th=[   35], 60.00th=[   35],
       | 70.00th=[   35], 80.00th=[   35], 90.00th=[   37], 95.00th=[   80],
       | 99.00th=[   98], 99.50th=[  151], 99.90th=[  155], 99.95th=[  155],
       | 99.99th=[  165]
      
      After:
      
      clat percentiles (usec):
       |  1.00th=[   95],  5.00th=[  108], 10.00th=[  129], 20.00th=[  149],
       | 30.00th=[  155], 40.00th=[  161], 50.00th=[  167], 60.00th=[  171],
       | 70.00th=[  177], 80.00th=[  185], 90.00th=[  201], 95.00th=[  270],
       | 99.00th=[  390], 99.50th=[  398], 99.90th=[  418], 99.95th=[  422],
       | 99.99th=[  438]
      
      In other setups, Robert Elliott reported seeing good performance
      improvements:
      
      https://lkml.org/lkml/2015/4/3/557
      
      The more applications accessing the device, the worse it gets.
      
      Add a new direct-io flags, DIO_SKIP_DIO_COUNT, which tells
      do_blockdev_direct_IO() that it need not worry about incrementing
      or decrementing the inode i_dio_count for this caller.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Elliott, Robert (Server Storage) <elliott@hp.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      fe0f07d0
  6. 17 4月, 2015 2 次提交
    • A
      proc: show locks in /proc/pid/fdinfo/X · 6c8c9031
      Andrey Vagin 提交于
      Let's show locks which are associated with a file descriptor in
      its fdinfo file.
      
      Currently we don't have a reliable way to determine who holds a lock.  We
      can find some information in /proc/locks, but PID which is reported there
      can be wrong.  For example, a process takes a lock, then forks a child and
      dies.  In this case /proc/locks contains the parent pid, which can be
      reused by another process.
      
      $ cat /proc/locks
      ...
      6: FLOCK  ADVISORY  WRITE 324 00:13:13431 0 EOF
      ...
      
      $ ps -C rpcbind
        PID TTY          TIME CMD
        332 ?        00:00:00 rpcbind
      
      $ cat /proc/332/fdinfo/4
      pos:	0
      flags:	0100000
      mnt_id:	22
      lock:	1: FLOCK  ADVISORY  WRITE 324 00:13:13431 0 EOF
      
      $ ls -l /proc/332/fd/4
      lr-x------ 1 root root 64 Mar  5 14:43 /proc/332/fd/4 -> /run/rpcbind.lock
      
      $ ls -l /proc/324/fd/
      total 0
      lrwx------ 1 root root 64 Feb 27 14:50 0 -> /dev/pts/0
      lrwx------ 1 root root 64 Feb 27 14:50 1 -> /dev/pts/0
      lrwx------ 1 root root 64 Feb 27 14:49 2 -> /dev/pts/0
      
      You can see that the process with the 324 pid doesn't hold the lock.
      
      This information is required for proper dumping and restoring file
      locks.
      Signed-off-by: NAndrey Vagin <avagin@openvz.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Acked-by: NJeff Layton <jlayton@poochiereds.net>
      Acked-by: N"J. Bruce Fields" <bfields@fieldses.org>
      Acked-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6c8c9031
    • K
      mm: rcu-protected get_mm_exe_file() · 90f31d0e
      Konstantin Khlebnikov 提交于
      This patch removes mm->mmap_sem from mm->exe_file read side.
      Also it kills dup_mm_exe_file() and moves exe_file duplication into
      dup_mmap() where both mmap_sems are locked.
      
      [akpm@linux-foundation.org: fix comment typo]
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      90f31d0e
  7. 16 4月, 2015 2 次提交
  8. 12 4月, 2015 11 次提交
  9. 07 4月, 2015 1 次提交
    • A
      fix mremap() vs. ioctx_kill() race · b2edffdd
      Al Viro 提交于
      teach ->mremap() method to return an error and have it fail for
      aio mappings in process of being killed
      
      Note that in case of ->mremap() failure we need to undo move_page_tables()
      we'd already done; we could call ->mremap() first, but then the failure of
      move_page_tables() would require undoing whatever _successful_ ->mremap()
      has done, which would be a lot more headache in general.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      b2edffdd
  10. 03 4月, 2015 1 次提交
    • J
      locks: change lm_get_owner and lm_put_owner prototypes · cae80b30
      Jeff Layton 提交于
      The current prototypes for these operations are somewhat awkward as they
      deal with fl_owners but take struct file_lock arguments. In the future,
      we'll want to be able to take references without necessarily dealing
      with a struct file_lock.
      
      Change them to take fl_owner_t arguments instead and have the callers
      deal with assigning the values to the file_lock structs.
      Signed-off-by: NJeff Layton <jlayton@primarydata.com>
      cae80b30
  11. 26 3月, 2015 1 次提交
  12. 18 3月, 2015 1 次提交
    • T
      fs: make sure the timestamps for lazytime inodes eventually get written · a2f48706
      Theodore Ts'o 提交于
      Jan Kara pointed out that if there is an inode which is constantly
      getting dirtied with I_DIRTY_PAGES, an inode with an updated timestamp
      will never be written since inode->dirtied_when is constantly getting
      updated.  We fix this by adding an extra field to the inode,
      dirtied_time_when, so inodes with a stale dirtytime can get detected
      and handled.
      
      In addition, if we have a dirtytime inode caused by an atime update,
      and there is no write activity on the file system, we need to have a
      secondary system to make sure these inodes get written out.  We do
      this by setting up a second delayed work structure which wakes up the
      CPU much more rarely compared to writeback_expire_centisecs.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      a2f48706
  13. 17 2月, 2015 9 次提交