1. 12 4月, 2015 6 次提交
  2. 26 3月, 2015 1 次提交
  3. 18 3月, 2015 1 次提交
    • T
      fs: make sure the timestamps for lazytime inodes eventually get written · a2f48706
      Theodore Ts'o 提交于
      Jan Kara pointed out that if there is an inode which is constantly
      getting dirtied with I_DIRTY_PAGES, an inode with an updated timestamp
      will never be written since inode->dirtied_when is constantly getting
      updated.  We fix this by adding an extra field to the inode,
      dirtied_time_when, so inodes with a stale dirtytime can get detected
      and handled.
      
      In addition, if we have a dirtytime inode caused by an atime update,
      and there is no write activity on the file system, we need to have a
      secondary system to make sure these inodes get written out.  We do
      this by setting up a second delayed work structure which wakes up the
      CPU much more rarely compared to writeback_expire_centisecs.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      a2f48706
  4. 17 2月, 2015 9 次提交
  5. 13 2月, 2015 1 次提交
    • V
      fs: consolidate {nr,free}_cached_objects args in shrink_control · 4101b624
      Vladimir Davydov 提交于
      We are going to make FS shrinkers memcg-aware.  To achieve that, we will
      have to pass the memcg to scan to the nr_cached_objects and
      free_cached_objects VFS methods, which currently take only the NUMA node
      to scan.  Since the shrink_control structure already holds the node, and
      the memcg to scan will be added to it when we introduce memcg-aware
      vmscan, let us consolidate the methods' arguments in this structure to
      keep things clean.
      Signed-off-by: NVladimir Davydov <vdavydov@parallels.com>
      Suggested-by: NDave Chinner <david@fromorbit.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4101b624
  6. 11 2月, 2015 3 次提交
    • K
      rmap: drop support of non-linear mappings · 27ba0644
      Kirill A. Shutemov 提交于
      We don't create non-linear mappings anymore.  Let's drop code which
      handles them in rmap.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      27ba0644
    • K
      mm: drop vm_ops->remap_pages and generic_file_remap_pages() stub · d83a08db
      Kirill A. Shutemov 提交于
      Nobody uses it anymore.
      
      [akpm@linux-foundation.org: fix filemap_xip.c]
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d83a08db
    • K
      mm: replace remap_file_pages() syscall with emulation · c8d78c18
      Kirill A. Shutemov 提交于
      remap_file_pages(2) was invented to be able efficiently map parts of
      huge file into limited 32-bit virtual address space such as in database
      workloads.
      
      Nonlinear mappings are pain to support and it seems there's no
      legitimate use-cases nowadays since 64-bit systems are widely available.
      
      Let's drop it and get rid of all these special-cased code.
      
      The patch replaces the syscall with emulation which creates new VMA on
      each remap_file_pages(), unless they it can be merged with an adjacent
      one.
      
      I didn't find *any* real code that uses remap_file_pages(2) to test
      emulation impact on.  I've checked Debian code search and source of all
      packages in ALT Linux.  No real users: libc wrappers, mentions in
      strace, gdb, valgrind and this kind of stuff.
      
      There are few basic tests in LTP for the syscall.  They work just fine
      with emulation.
      
      To test performance impact, I've written small test case which
      demonstrate pretty much worst case scenario: map 4G shmfs file, write to
      begin of every page pgoff of the page, remap pages in reverse order,
      read every page.
      
      The test creates 1 million of VMAs if emulation is in use, so I had to
      set vm.max_map_count to 1100000 to avoid -ENOMEM.
      
      Before:		23.3 ( +-  4.31% ) seconds
      After:		43.9 ( +-  0.85% ) seconds
      Slowdown:	1.88x
      
      I believe we can live with that.
      
      Test case:
      
              #define _GNU_SOURCE
              #include <assert.h>
              #include <stdlib.h>
              #include <stdio.h>
              #include <sys/mman.h>
      
              #define MB	(1024UL * 1024)
              #define SIZE	(4096 * MB)
      
              int main(int argc, char **argv)
              {
                      unsigned long *p;
                      long i, pass;
      
                      for (pass = 0; pass < 10; pass++) {
                              p = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
                                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
                              if (p == MAP_FAILED) {
                                      perror("mmap");
                                      return -1;
                              }
      
                              for (i = 0; i < SIZE / 4096; i++)
                                      p[i * 4096 / sizeof(*p)] = i;
      
                              for (i = 0; i < SIZE / 4096; i++) {
                                      if (remap_file_pages(p + i * 4096 / sizeof(*p), 4096,
                                                      0, (SIZE - 4096 * (i + 1)) >> 12, 0)) {
                                              perror("remap_file_pages");
                                              return -1;
                                      }
                              }
      
                              for (i = SIZE / 4096 - 1; i >= 0; i--)
                                      assert(p[i * 4096 / sizeof(*p)] == SIZE / 4096 - i - 1);
      
                              munmap(p, SIZE);
                      }
      
                      return 0;
              }
      
      [akpm@linux-foundation.org: fix spello]
      [sasha.levin@oracle.com: initialize populate before usage]
      [sasha.levin@oracle.com: grab file ref to prevent race while mmaping]
      Signed-off-by: N"Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Armin Rigo <arigo@tunes.org>
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c8d78c18
  7. 05 2月, 2015 2 次提交
    • T
      vfs: add find_inode_nowait() function · fe032c42
      Theodore Ts'o 提交于
      Add a new function find_inode_nowait() which is an even more general
      version of ilookup5_nowait().  It is designed for callers which need
      very fine grained control over when the function is allowed to block
      or increment the inode's reference count.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      fe032c42
    • T
      vfs: add support for a lazytime mount option · 0ae45f63
      Theodore Ts'o 提交于
      Add a new mount option which enables a new "lazytime" mode.  This mode
      causes atime, mtime, and ctime updates to only be made to the
      in-memory version of the inode.  The on-disk times will only get
      updated when (a) if the inode needs to be updated for some non-time
      related change, (b) if userspace calls fsync(), syncfs() or sync(), or
      (c) just before an undeleted inode is evicted from memory.
      
      This is OK according to POSIX because there are no guarantees after a
      crash unless userspace explicitly requests via a fsync(2) call.
      
      For workloads which feature a large number of random write to a
      preallocated file, the lazytime mount option significantly reduces
      writes to the inode table.  The repeated 4k writes to a single block
      will result in undesirable stress on flash devices and SMR disk
      drives.  Even on conventional HDD's, the repeated writes to the inode
      table block will trigger Adjacent Track Interference (ATI) remediation
      latencies, which very negatively impact long tail latencies --- which
      is a very big deal for web serving tiers (for example).
      
      Google-Bug-Id: 18297052
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      0ae45f63
  8. 03 2月, 2015 2 次提交
    • C
      fs: add FL_LAYOUT lease type · 11afe9f7
      Christoph Hellwig 提交于
      This (ab-)uses the file locking code to allow filesystems to recall
      outstanding pNFS layouts on a file.  This new lease type is similar but
      not quite the same as FL_DELEG.  A FL_LAYOUT lease can always be granted,
      an a per-filesystem lock (XFS iolock for the initial implementation)
      ensures not FL_LAYOUT leases granted when we would need to recall them.
      
      Also included are changes that allow multiple outstanding read
      leases of different types on the same file as long as they have a
      differnt owner.  This wasn't a problem until now as nfsd never set
      FL_LEASE leases, and no one else used FL_DELEG leases, but given that
      nfsd will also issues FL_LAYOUT leases we will have to handle it now.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      11afe9f7
    • A
      Make super_blocks and sb_lock static · 15d0f5ea
      Al Viro 提交于
      The only user outside of fs/super.c is gone now
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Acked-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      15d0f5ea
  9. 29 1月, 2015 1 次提交
  10. 23 1月, 2015 1 次提交
    • P
      audit: replace getname()/putname() hacks with reference counters · 55422d0b
      Paul Moore 提交于
      In order to ensure that filenames are not released before the audit
      subsystem is done with the strings there are a number of hacks built
      into the fs and audit subsystems around getname() and putname().  To
      say these hacks are "ugly" would be kind.
      
      This patch removes the filename hackery in favor of a more
      conventional reference count based approach.  The diffstat below tells
      most of the story; lots of audit/fs specific code is replaced with a
      traditional reference count based approach that is easily understood,
      even by those not familiar with the audit and/or fs subsystems.
      
      CC: viro@zeniv.linux.org.uk
      CC: linux-fsdevel@vger.kernel.org
      Signed-off-by: NPaul Moore <pmoore@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      55422d0b
  11. 22 1月, 2015 1 次提交
  12. 21 1月, 2015 2 次提交
  13. 17 1月, 2015 8 次提交
  14. 09 1月, 2015 1 次提交
  15. 14 12月, 2014 1 次提交
    • P
      aio: Make it possible to remap aio ring · e4a0d3e7
      Pavel Emelyanov 提交于
      There are actually two issues this patch addresses. Let me start with
      the one I tried to solve in the beginning.
      
      So, in the checkpoint-restore project (criu) we try to dump tasks'
      state and restore one back exactly as it was. One of the tasks' state
      bits is rings set up with io_setup() call. There's (almost) no problems
      in dumping them, there's a problem restoring them -- if I dump a task
      with aio ring originally mapped at address A, I want to restore one
      back at exactly the same address A. Unfortunately, the io_setup() does
      not allow for that -- it mmaps the ring at whatever place mm finds
      appropriate (it calls do_mmap_pgoff() with zero address and without
      the MAP_FIXED flag).
      
      To make restore possible I'm going to mremap() the freshly created ring
      into the address A (under which it was seen before dump). The problem is
      that the ring's virtual address is passed back to the user-space as the
      context ID and this ID is then used as search key by all the other io_foo()
      calls. Reworking this ID to be just some integer doesn't seem to work, as
      this value is already used by libaio as a pointer using which this library
      accesses memory for aio meta-data.
      
      So, to make restore work we need to make sure that
      
      a) ring is mapped at desired virtual address
      b) kioctx->user_id matches this value
      
      Having said that, the patch makes mremap() on aio region update the
      kioctx's user_id and mmap_base values.
      
      Here appears the 2nd issue I mentioned in the beginning of this mail.
      If (regardless of the C/R dances I do) someone creates an io context
      with io_setup(), then mremap()-s the ring and then destroys the context,
      the kill_ioctx() routine will call munmap() on wrong (old) address.
      This will result in a) aio ring remaining in memory and b) some other
      vma get unexpectedly unmapped.
      
      What do you think?
      Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
      Acked-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>
      e4a0d3e7