1. 09 Feb, 2016 1 commit
    • xfs: introduce inode log format object · f8d55aa0
      Committed by Dave Chinner
      We currently carry around and log an entire inode core in the
      struct xfs_inode. A lot of the information in the inode core is
      duplicated in the VFS inode, but we cannot remove this duplication
      of information because the inode core is logged directly in
      xfs_inode_item_format().
      
      Add a new function xfs_inode_item_format_core() that copies the
      inode core data into a struct xfs_icdinode that is pulled directly
      from the log vector buffer. This means we no longer directly
      copy the inode core, but copy the structures one member at a time.
      This will be slightly less efficient than copying, but will allow us
      to remove duplicate and unnecessary items from the struct xfs_inode.
      
      To enable us to do this, call the new structure a xfs_log_dinode,
      so that we know it's different to the physical xfs_dinode and the
      in-core xfs_icdinode.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
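The member-by-member copy described in this commit can be pictured with a small user-space sketch. The names `demo_icdinode`, `demo_log_dinode` and `demo_format_core()`, and the reduced field set, are assumptions made for illustration only; they are not the XFS definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-core inode core: carries a field that is never logged. */
struct demo_icdinode {
    uint16_t di_mode;
    uint32_t di_uid;
    uint64_t di_size;
    uint32_t incore_only;   /* illustration only: no counterpart in the log */
};

/* Hypothetical log format object: only the fields written to the log. */
struct demo_log_dinode {
    uint16_t di_mode;
    uint32_t di_uid;
    uint64_t di_size;
};

/*
 * Fill the log format object one member at a time instead of copying the
 * in-core structure wholesale, so the two layouts are free to diverge.
 */
static void demo_format_core(const struct demo_icdinode *from,
                             struct demo_log_dinode *to)
{
    assert(from != NULL && to != NULL);
    memset(to, 0, sizeof(*to));     /* clear padding before filling members */
    to->di_mode = from->di_mode;
    to->di_uid = from->di_uid;
    to->di_size = from->di_size;
}
```

Because nothing outside `demo_format_core()` depends on the two structures sharing a layout, a field such as `incore_only` could later be dropped from the in-core side without touching the log format.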
  2. 30 Jan, 2016 1 commit
  3. 27 Jan, 2016 2 commits
  4. 26 Jan, 2016 3 commits
    • Revert "btrfs: clear PF_NOFREEZE in cleaner_kthread()" · 80ad623e
      Committed by David Sterba
      This reverts commit 69624913. The
      cleaner thread can block freezing when there's a snapshot cleaning in
      progress and the other threads get suspended first. From the logs
      provided by Martin we're waiting for reading extent pages:
      
      kernel: PM: Syncing filesystems ... done.
      kernel: Freezing user space processes ... (elapsed 0.015 seconds) done.
      kernel: Freezing remaining freezable tasks ...
      kernel: Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
      kernel: btrfs-cleaner   D ffff88033dd13bc0     0   152      2 0x00000000
      kernel: ffff88032ebc2e00 ffff88032e750000 ffff88032e74fa50 7fffffffffffffff
      kernel: ffffffff814a58df 0000000000000002 ffffea000934d580 ffffffff814a5451
      kernel: 7fffffffffffffff ffffffff814a6e8f 0000000000000000 0000000000000020
      kernel: Call Trace:
      kernel: [<ffffffff814a58df>] ? bit_wait+0x2c/0x2c
      kernel: [<ffffffff814a5451>] ? schedule+0x6f/0x7c
      kernel: [<ffffffff814a6e8f>] ? schedule_timeout+0x2f/0xd8
      kernel: [<ffffffff81076f94>] ? timekeeping_get_ns+0xa/0x2e
      kernel: [<ffffffff81077603>] ? ktime_get+0x36/0x44
      kernel: [<ffffffff814a4f6c>] ? io_schedule_timeout+0x94/0xf2
      kernel: [<ffffffff814a4f6c>] ? io_schedule_timeout+0x94/0xf2
      kernel: [<ffffffff814a590b>] ? bit_wait_io+0x2c/0x30
      kernel: [<ffffffff814a5694>] ? __wait_on_bit+0x41/0x73
      kernel: [<ffffffff8109eba8>] ? wait_on_page_bit+0x6d/0x72
      kernel: [<ffffffff8105d718>] ? autoremove_wake_function+0x2a/0x2a
      kernel: [<ffffffff811a02d7>] ? read_extent_buffer_pages+0x1bd/0x203
      kernel: [<ffffffff8117d9e9>] ? free_root_pointers+0x4c/0x4c
      kernel: [<ffffffff8117e831>] ? btree_read_extent_buffer_pages.constprop.57+0x5a/0xe9
      kernel: [<ffffffff8117f4f3>] ? read_tree_block+0x2d/0x45
      kernel: [<ffffffff8116782a>] ? read_block_for_search.isra.34+0x22a/0x26b
      kernel: [<ffffffff811656c3>] ? btrfs_set_path_blocking+0x1e/0x4a
      kernel: [<ffffffff8116919b>] ? btrfs_search_slot+0x648/0x736
      kernel: [<ffffffff81170559>] ? btrfs_lookup_extent_info+0xb7/0x2c7
      kernel: [<ffffffff81170ee5>] ? walk_down_proc+0x9c/0x1ae
      kernel: [<ffffffff81171c9d>] ? walk_down_tree+0x40/0xa4
      kernel: [<ffffffff8117375f>] ? btrfs_drop_snapshot+0x2da/0x664
      kernel: [<ffffffff8104ff21>] ? finish_task_switch+0x126/0x167
      kernel: [<ffffffff811850f8>] ? btrfs_clean_one_deleted_snapshot+0xa6/0xb0
      kernel: [<ffffffff8117eaba>] ? cleaner_kthread+0x13e/0x17b
      kernel: [<ffffffff8117e97c>] ? btrfs_item_end+0x33/0x33
      kernel: [<ffffffff8104d256>] ? kthread+0x95/0x9d
      kernel: [<ffffffff8104d1c1>] ? kthread_parkme+0x16/0x16
      kernel: [<ffffffff814a7b5f>] ? ret_from_fork+0x3f/0x70
      kernel: [<ffffffff8104d1c1>] ? kthread_parkme+0x16/0x16
      
      As this affects a released kernel (4.4) we need a minimal fix for
      stable kernels.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=108361
      Reported-by: Martin Ziegler <ziegler@uni-freiburg.de>
      CC: stable@vger.kernel.org # 4.4
      CC: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • btrfs: async-thread: Fix a use-after-free error for trace · 0a95b851
      Committed by Qu Wenruo
      The parameter of trace_btrfs_work_queued() can be freed in its workqueue,
      so no one should use that pointer after queue_work().
      
      Fix the use-after-free bug by moving the trace line before queue_work().
      Reported-by: Dave Jones <davej@codemonkey.org.uk>
      Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
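The ordering rule behind this fix (read everything you need from an object before handing it to code that may free it) can be shown in a user-space sketch. Every name below is hypothetical; `demo_queue_work()` stands in for a queueing call whose worker runs and frees the item immediately, which is the worst case the caller has to assume.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_work {
    int id;
};

/* Last id seen by the hypothetical trace hook. */
static int demo_last_traced_id = -1;

static void demo_trace_work_queued(const struct demo_work *work)
{
    demo_last_traced_id = work->id;     /* dereferences the work item */
}

/* Stand-in for queueing: the worker may run and free the item at once. */
static void demo_queue_work(struct demo_work *work)
{
    free(work);
}

static void demo_submit(struct demo_work *work)
{
    assert(work != NULL);
    demo_trace_work_queued(work);   /* trace first, while work is still ours */
    demo_queue_work(work);          /* ownership passes to the worker here */
    /* work must not be dereferenced from this point on */
}
```

Swapping the two calls in `demo_submit()` would reproduce the bug pattern: the trace hook would read memory that has already been freed.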
    • Btrfs: fix race between fsync and lockless direct IO writes · de0ee0ed
      Committed by Filipe Manana
      An fsync, using the fast path, can race with a concurrent lockless direct
      IO write and end up logging a file extent item that points to an extent
      that wasn't written to yet. This is because the fast fsync path collects
      ordered extents into a local list and then collects all the new extent
      maps to log file extent items based on them, while the direct IO write
      path creates the new extent map before it creates the corresponding
      ordered extent (and submits the respective bio(s)).
      
      So fix this by making the direct IO write path create ordered extents
      before the extent maps and make the fast fsync path collect any new
      ordered extents after it collects the extent maps.
      Note that making the fsync handler call inode_dio_wait() (after acquiring
      the inode's i_mutex) would not work and lead to a deadlock when doing
      AIO, as through AIO we end up in a path where the fsync handler is called
      (through dio_aio_complete_work() -> dio_complete() -> vfs_fsync_range())
      before the inode's dio counter is decremented (inode_dio_wait() waits
      for this counter to have a value of zero).
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
  5. 25 Jan, 2016 2 commits
  6. 23 Jan, 2016 13 commits
    • vfs: abort dedupe loop if fatal signals are pending · e62e560f
      Committed by Darrick J. Wong
      If the program running dedupe receives a fatal signal during the
      dedupe loop, we should bail out to avoid tying up the system.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
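A user-space analogue of bailing out of a long loop when a fatal signal arrives might look like the sketch below. In the kernel such a check is normally made with `fatal_signal_pending(current)`; here a flag set by a signal handler plays that role, and all `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Set asynchronously by the handler, polled by the loop. */
static volatile sig_atomic_t demo_fatal_pending;

static void demo_on_fatal_signal(int signo)
{
    (void)signo;
    demo_fatal_pending = 1;
}

/* Hypothetical per-range work; here it only counts how often it ran. */
static int demo_ranges_done;

static void demo_dedupe_one(size_t index)
{
    (void)index;
    demo_ranges_done++;
}

/* Returns 0 when every range was processed, -1 when the loop was aborted. */
static int demo_dedupe_loop(size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (demo_fatal_pending)
            return -1;      /* stop early instead of tying up the system */
        demo_dedupe_one(i);
    }
    return 0;
}
```

The check sits at the top of each iteration, so the worst-case delay before the loop notices the signal is the cost of one range.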
    • tree wide: use kvfree() than conditional kfree()/vfree() · 1d5cfdb0
      Committed by Tetsuo Handa
      There are many locations that do
      
        if (memory_was_allocated_by_vmalloc)
          vfree(ptr);
        else
          kfree(ptr);
      
      but kvfree() can handle both kmalloc()ed memory and vmalloc()ed memory
      using is_vmalloc_addr().  Unless callers have special reasons, we can
      replace this branch with kvfree().  Please check and reply if you find
      problems.
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Jan Kara <jack@suse.com>
      Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
      Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Oleg Drokin <oleg.drokin@intel.com>
      Cc: Boris Petkov <bp@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
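How a single free routine can pick the right deallocator from the address alone can be mocked up in user space. The sketch below is not the kernel implementation: the "vmalloc area" is a static arena, `malloc()`/`free()` stand in for `kmalloc()`/`kfree()`, and every `demo_*` name is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the vmalloc address range: a static arena. */
#define DEMO_VMALLOC_AREA_SIZE 4096
static unsigned char demo_vmalloc_area[DEMO_VMALLOC_AREA_SIZE];
static size_t demo_vmalloc_used;

static int demo_kfree_calls;
static int demo_vfree_calls;

/* Bump allocator inside the arena. */
static void *demo_vmalloc(size_t size)
{
    if (size > DEMO_VMALLOC_AREA_SIZE - demo_vmalloc_used)
        return NULL;
    void *p = demo_vmalloc_area + demo_vmalloc_used;
    demo_vmalloc_used += size;
    return p;
}

/* Address-range test, the idea behind is_vmalloc_addr(). */
static int demo_is_vmalloc_addr(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t base = (uintptr_t)demo_vmalloc_area;
    return addr >= base && addr < base + DEMO_VMALLOC_AREA_SIZE;
}

static void demo_vfree(void *p)
{
    (void)p;                /* the toy arena never reclaims memory */
    demo_vfree_calls++;
}

static void demo_kfree(void *p)
{
    free(p);
    demo_kfree_calls++;
}

/* One entry point: the caller no longer tracks how the memory was obtained. */
static void demo_kvfree(void *p)
{
    if (demo_is_vmalloc_addr(p))
        demo_vfree(p);
    else
        demo_kfree(p);
}
```

With this shape, the `if (memory_was_allocated_by_vmalloc)` branch quoted in the commit message collapses to a single `demo_kvfree(ptr)` call at each site.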
    • dax: never rely on bh.b_dev being set by get_block() · eab95db6
      Committed by Ross Zwisler
      Previously in DAX we assumed that calls to get_block() would set
      bh.b_bdev, and we would then use that value even in error cases for
      debugging.  This caused a NULL pointer dereference in __dax_dbg() which
      was fixed by a previous commit, but that commit only changed the one
      place where we were hitting an error.
      
      Instead, update dax.c so that we always initialize bh.b_bdev as best we
      can based on the information that DAX has.  get_block() may or may not
      update to a new value, but this at least lets us get something helpful
      from bh.b_bdev for error messages and not have to worry about whether it
      was set by get_block() or not.
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reported-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • xfs: call dax_pfn_mkwrite() for DAX fsync/msync · 5eb88dca
      Committed by Ross Zwisler
      To properly support the new DAX fsync/msync infrastructure filesystems
      need to call dax_pfn_mkwrite() so that DAX can track when user pages are
      dirtied.
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Layton <jlayton@poochiereds.net>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ext4: call dax_pfn_mkwrite() for DAX fsync/msync · d5be7a03
      Committed by Ross Zwisler
      To properly support the new DAX fsync/msync infrastructure filesystems
      need to call dax_pfn_mkwrite() so that DAX can track when user pages are
      dirtied.
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Layton <jlayton@poochiereds.net>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ext2: call dax_pfn_mkwrite() for DAX fsync/msync · 80b4adca
      Committed by Ross Zwisler
      To properly support the new DAX fsync/msync infrastructure filesystems
      need to call dax_pfn_mkwrite() so that DAX can track when user pages are
      dirtied.
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Layton <jlayton@poochiereds.net>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dax: add support for fsync/sync · 9973c98e
      Committed by Ross Zwisler
      To properly handle fsync/msync in an efficient way DAX needs to track
      dirty pages so it is able to flush them durably to media on demand.
      
      The tracking of dirty pages is done via the radix tree in struct
      address_space.  This radix tree is already used by the page writeback
      infrastructure for tracking dirty pages associated with an open file,
      and it already has support for exceptional (non struct page*) entries.
      We build upon these features to add exceptional entries to the radix
      tree for DAX dirty PMD or PTE pages at fault time.
      
      [dan.j.williams@intel.com: fix dax_pmd_dbg build warning]
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Layton <jlayton@poochiereds.net>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dax: support dirty DAX entries in radix tree · f9fe48be
      Committed by Ross Zwisler
      Add support for tracking dirty DAX entries in the struct address_space
      radix tree.  This tree is already used for dirty page writeback, and it
      already supports the use of exceptional (non struct page*) entries.
      
      In order to properly track dirty DAX pages we will insert new
      exceptional entries into the radix tree that represent dirty DAX PTE or
      PMD pages.  These exceptional entries will also contain the writeback
      addresses for the PTE or PMD faults that we can use at fsync/msync time.
      
      There are currently two types of exceptional entries (shmem and shadow)
      that can be placed into the radix tree, and this adds a third.  We rely
      on the fact that only one type of exceptional entry can be found in a
      given radix tree based on its usage.  This happens for free with DAX vs
      shmem but we explicitly prevent shadow entries from being added to radix
      trees for DAX mappings.
      
      The only shadow entries that would be generated for DAX radix trees
      would be to track zero page mappings that were created for holes.  These
      pages would receive minimal benefit from having shadow entries, and the
      choice to have only one type of exceptional entry in a given radix tree
      makes the logic simpler both in clear_exceptional_entry() and in the
      rest of DAX.
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Layton <jlayton@poochiereds.net>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
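An exceptional (non struct page*) entry is essentially a value packed into a pointer-sized slot and marked with a low tag bit, which works because real pointers to aligned objects have those bits clear. The sketch below illustrates the idea only; the bit positions, the shift and all `demo_*` names are assumptions and do not reproduce the kernel's encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bit 1 marks an exceptional entry, bit 2 marks PMD. */
#define DEMO_EXCEPTIONAL_BIT ((uintptr_t)0x2)
#define DEMO_PMD_BIT         ((uintptr_t)0x4)
#define DEMO_SECTOR_SHIFT    4

/* Pack a sector number and an entry type into a pointer-sized slot value. */
static void *demo_dax_entry(uintptr_t sector, int is_pmd)
{
    /* The top DEMO_SECTOR_SHIFT bits of sector are lost by the shift. */
    uintptr_t v = (sector << DEMO_SECTOR_SHIFT) | DEMO_EXCEPTIONAL_BIT;

    if (is_pmd)
        v |= DEMO_PMD_BIT;
    return (void *)v;
}

/* A slot holds either a real pointer (tag bit clear) or a packed entry. */
static int demo_is_exceptional(const void *entry)
{
    return ((uintptr_t)entry & DEMO_EXCEPTIONAL_BIT) != 0;
}

static int demo_entry_is_pmd(const void *entry)
{
    return ((uintptr_t)entry & DEMO_PMD_BIT) != 0;
}

static uintptr_t demo_entry_sector(const void *entry)
{
    return (uintptr_t)entry >> DEMO_SECTOR_SHIFT;
}
```

Keeping a single kind of packed entry per tree, as the commit argues, means a lookup only has to test the tag bit and never has to guess which decoder applies.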
    • dax: fix conversion of holes to PMDs · de14b9cb
      Committed by Ross Zwisler
      When we get a DAX PMD fault for a write it is possible that there could
      be some number of 4k zero pages already present for the same range that
      were inserted to service reads from a hole.  These 4k zero pages need to
      be unmapped from the VMAs and removed from the struct address_space
      radix tree before the real DAX PMD entry can be inserted.
      
      For PTE faults this same use case also exists and is handled by a
      combination of unmap_mapping_range() to unmap the VMAs and
      delete_from_page_cache() to remove the page from the address_space radix
      tree.
      
      For PMD faults we do have a call to unmap_mapping_range() (protected by
      a buffer_new() check), but nothing clears out the radix tree entry.  The
      buffer_new() check is also incorrect as the current ext4 and XFS
      filesystem code will never return a buffer_head with BH_New set, even
      when allocating new blocks over a hole.  Instead the filesystem will
      zero the blocks manually and return a buffer_head with only BH_Mapped
      set.
      
      Fix this situation by removing the buffer_new() check and adding a call
      to truncate_inode_pages_range() to clear out the radix tree entries
      before we insert the DAX PMD.
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reported-by: Dan Williams <dan.j.williams@intel.com>
      Tested-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jeff Layton <jlayton@poochiereds.net>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dax: fix NULL pointer dereference in __dax_dbg() · d4bbe706
      Committed by Ross Zwisler
      In __dax_pmd_fault() we currently assume that get_block() will always
      set bh.b_bdev and we unconditionally dereference it in __dax_dbg().
      
      This assumption isn't always true - when called for reads of holes
      ext4_dax_mmap_get_block() returns a buffer head where bh->b_bdev is
      never set.  I hit this BUG while testing the DAX PMD fault path.
      
      Instead, initialize bh.b_bdev before passing bh into get_block().  It is
      possible that the filesystem's get_block() will update bh.b_bdev, and
      this is fine - we just want to initialize bh.b_bdev to something
      reasonable so that the calls to __dax_dbg() work and print something
      useful.
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reported-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Jan Kara <jack@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
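The defensive pattern in this fix, giving a field a sensible default before calling a callback that may or may not overwrite it, can be sketched in user space as follows. The structures and `demo_*` functions are hypothetical simplifications, not the kernel's `struct buffer_head` or any filesystem's `get_block()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_bdev {
    const char *name;
};

struct demo_bh {
    struct demo_bdev *b_bdev;
};

typedef int (*demo_get_block_t)(struct demo_bh *bh);

/* Models a read of a hole: the callback leaves b_bdev untouched. */
static int demo_get_block_hole(struct demo_bh *bh)
{
    (void)bh;
    return 0;
}

static struct demo_bdev demo_other_bdev = { "other" };

/* Models a mapped block: the callback updates b_bdev. */
static int demo_get_block_mapped(struct demo_bh *bh)
{
    bh->b_bdev = &demo_other_bdev;
    return 0;
}

/* Returns the device name a debug message would print. */
static const char *demo_fault(struct demo_bdev *default_bdev,
                              demo_get_block_t get_block)
{
    struct demo_bh bh;

    assert(default_bdev != NULL);
    memset(&bh, 0, sizeof(bh));
    bh.b_bdev = default_bdev;   /* initialise before calling get_block() */
    get_block(&bh);
    return bh.b_bdev->name;     /* never NULL, whatever the callback did */
}
```

Without the assignment before `get_block()`, the hole case would leave `b_bdev` NULL and the final dereference would crash, which mirrors the bug described above.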
    • wrappers for ->i_mutex access · 5955102c
      Committed by Al Viro
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
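The value of such wrappers is that callers stop naming the lock type, so the type can change later behind the same function names. A user-space sketch of the idea is given below; the toy `demo_mutex` built on C11 atomics and the `demo_inode_*` functions are assumptions for illustration and only mirror the shape of `inode_lock()`, `inode_unlock()`, `inode_trylock()` and `inode_is_locked()`.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy spinning mutex standing in for the kernel's struct mutex. */
struct demo_mutex {
    atomic_int locked;
};

static inline int demo_mutex_trylock(struct demo_mutex *m)
{
    int expected = 0;
    /* Succeeds only if the lock was free; returns nonzero on success. */
    return atomic_compare_exchange_strong(&m->locked, &expected, 1);
}

static inline void demo_mutex_lock(struct demo_mutex *m)
{
    while (!demo_mutex_trylock(m))
        ;   /* spin until the holder releases the lock */
}

static inline void demo_mutex_unlock(struct demo_mutex *m)
{
    atomic_store(&m->locked, 0);
}

static inline int demo_mutex_is_locked(struct demo_mutex *m)
{
    return atomic_load(&m->locked) != 0;
}

struct demo_inode {
    struct demo_mutex i_mutex;
};

/* Wrappers: inode_foo(inode) is mutex_foo(&inode->i_mutex). */
static inline void demo_inode_lock(struct demo_inode *inode)
{
    demo_mutex_lock(&inode->i_mutex);
}

static inline void demo_inode_unlock(struct demo_inode *inode)
{
    demo_mutex_unlock(&inode->i_mutex);
}

static inline int demo_inode_trylock(struct demo_inode *inode)
{
    return demo_mutex_trylock(&inode->i_mutex);
}

static inline int demo_inode_is_locked(struct demo_inode *inode)
{
    return demo_mutex_is_locked(&inode->i_mutex);
}
```

Once every caller goes through the `demo_inode_*` functions, replacing `i_mutex` with a different lock type only requires editing the four wrapper bodies.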
    • btrfs: tweak free space tree bitmap allocation · 79b134a2
      Committed by David Sterba
      The requested bitmap size varies, observed numbers were < 4K up to 16K.
      Using vmalloc unconditionally would be too heavy, we'll try contiguous
      allocations first and fall back to vmalloc if there's no contig memory.
      Signed-off-by: David Sterba <dsterba@suse.com>
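The try-contiguous-then-fall-back strategy can be mocked up in user space as below. Both allocators are hypothetical stand-ins backed by `calloc()`, and `DEMO_CONTIG_LIMIT` is an arbitrary threshold chosen so that the fallback path can be exercised; the kernel's actual allocation calls and flags are not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical: contiguous requests above this size are made to fail. */
#define DEMO_CONTIG_LIMIT 8192

/* Stand-in for a zeroing contiguous allocation that can fail for big sizes. */
static void *demo_alloc_contig_zeroed(size_t size)
{
    if (size > DEMO_CONTIG_LIMIT)
        return NULL;
    return calloc(1, size);
}

/* Stand-in for a zeroing vmalloc-style allocation (heavier, more reliable). */
static void *demo_vzalloc(size_t size)
{
    return calloc(1, size);
}

/*
 * Try the cheap contiguous allocation first and fall back only on failure.
 * *used_fallback reports which path supplied the memory.
 */
static void *demo_alloc_bitmap(size_t size, int *used_fallback)
{
    void *bitmap;

    assert(used_fallback != NULL);
    *used_fallback = 0;
    bitmap = demo_alloc_contig_zeroed(size);
    if (!bitmap) {
        bitmap = demo_vzalloc(size);
        *used_fallback = 1;
    }
    return bitmap;
}
```

In this mock both paths return `calloc()` memory, so a plain `free()` releases either; in the kernel the matching release for memory of unknown origin would be `kvfree()`.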
    • pNFS/flexfiles: Fix an XDR encoding bug in layoutreturn · 082fa37d
      Committed by Trond Myklebust
      We must not skip encoding the statistics, or the server will see an
      XDR encoding error.
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: stable@vger.kernel.org # 4.0+
  7. 22 Jan, 2016 15 commits
  8. 21 Jan, 2016 3 commits