1. 09 May 2017, 10 commits
  2. 06 May 2017, 2 commits
    • B
      GFS2: Allow glocks to be unlocked after withdraw · ed17545d
      Committed by Bob Peterson
      This patch fixes a regression introduced by patch 0d1c7ae9.
      
      The intent of the patch was to stop promoting glocks after a
      file system is withdrawn due to a variety of errors, because doing
      so results in a BUG(). (You should be able to unmount after a
      withdraw rather than having the kernel panic.)
      
      Unfortunately, it also stopped demotions, so glocks could not be
      unlocked after withdraw, which means the unmount would hang.
      
      This patch allows function do_xmote to demote locks to an
      unlocked state after a withdraw, but not promote them.
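      
      A minimal sketch of the gating rule, assuming a simplified model: the
      state names, `xmote_allowed()` and the `withdrawn` flag below are
      hypothetical. They are not the real do_xmote() code or the kernel's
      LM_ST_* constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified glock states (not the real LM_ST_* values). */
enum lock_state { ST_UNLOCKED = 0, ST_SHARED = 1, ST_EXCLUSIVE = 2 };

/*
 * Decide whether a state transition may proceed.  Refusing every
 * transition after a withdraw means locks can never be dropped and
 * unmount hangs; here only demotion to the unlocked state is still
 * permitted once the file system has withdrawn.
 */
static bool xmote_allowed(bool withdrawn, enum lock_state target)
{
    if (!withdrawn)
        return true;               /* normal operation: any transition */
    return target == ST_UNLOCKED;  /* withdrawn: unlock only, never promote */
}
```

      With this predicate a withdrawn file system can still drop locks to
      ST_UNLOCKED, which is what unmount needs, while promotions stay blocked.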
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • E
      xfs: fix use-after-free in xfs_finish_page_writeback · 161f55ef
      Committed by Eryu Guan
      Commit 28b783e4 ("xfs: bufferhead chains are invalid after
      end_page_writeback") fixed one use-after-free issue by
      pre-calculating the loop conditionals before calling bh->b_end_io()
      in the end_io processing loop, but it assigned the 'next' pointer
      before checking the end-offset boundary and breaking out of the loop.
      At that point the bh might already have been freed, which caused a
      use-after-free.
      
      This is caught by KASAN when running fstests generic/127 on sub-page
      block size XFS.
      
      [ 2517.244502] run fstests generic/127 at 2017-04-27 07:30:50
      [ 2747.868840] ==================================================================
      [ 2747.876949] BUG: KASAN: use-after-free in xfs_destroy_ioend+0x3d3/0x4e0 [xfs] at addr ffff8801395ae698
      ...
      [ 2747.918245] Call Trace:
      [ 2747.920975]  dump_stack+0x63/0x84
      [ 2747.924673]  kasan_object_err+0x21/0x70
      [ 2747.928950]  kasan_report+0x271/0x530
      [ 2747.933064]  ? xfs_destroy_ioend+0x3d3/0x4e0 [xfs]
      [ 2747.938409]  ? end_page_writeback+0xce/0x110
      [ 2747.943171]  __asan_report_load8_noabort+0x19/0x20
      [ 2747.948545]  xfs_destroy_ioend+0x3d3/0x4e0 [xfs]
      [ 2747.953724]  xfs_end_io+0x1af/0x2b0 [xfs]
      [ 2747.958197]  process_one_work+0x5ff/0x1000
      [ 2747.962766]  worker_thread+0xe4/0x10e0
      [ 2747.966946]  kthread+0x2d3/0x3d0
      [ 2747.970546]  ? process_one_work+0x1000/0x1000
      [ 2747.975405]  ? kthread_create_on_node+0xc0/0xc0
      [ 2747.980457]  ? syscall_return_slowpath+0xe6/0x140
      [ 2747.985706]  ? do_page_fault+0x30/0x80
      [ 2747.989887]  ret_from_fork+0x2c/0x40
      [ 2747.993874] Object at ffff8801395ae690, in cache buffer_head size: 104
      [ 2748.001155] Allocated:
      [ 2748.003782] PID = 8327
      [ 2748.006411]  save_stack_trace+0x1b/0x20
      [ 2748.010688]  save_stack+0x46/0xd0
      [ 2748.014383]  kasan_kmalloc+0xad/0xe0
      [ 2748.018370]  kasan_slab_alloc+0x12/0x20
      [ 2748.022648]  kmem_cache_alloc+0xb8/0x1b0
      [ 2748.027024]  alloc_buffer_head+0x22/0xc0
      [ 2748.031399]  alloc_page_buffers+0xd1/0x250
      [ 2748.035968]  create_empty_buffers+0x30/0x410
      [ 2748.040730]  create_page_buffers+0x120/0x1b0
      [ 2748.045493]  __block_write_begin_int+0x17a/0x1800
      [ 2748.050740]  iomap_write_begin+0x100/0x2f0
      [ 2748.055308]  iomap_zero_range_actor+0x253/0x5c0
      [ 2748.060362]  iomap_apply+0x157/0x270
      [ 2748.064347]  iomap_zero_range+0x5a/0x80
      [ 2748.068624]  iomap_truncate_page+0x6b/0xa0
      [ 2748.073227]  xfs_setattr_size+0x1f7/0xa10 [xfs]
      [ 2748.078312]  xfs_vn_setattr_size+0x68/0x140 [xfs]
      [ 2748.083589]  xfs_file_fallocate+0x4ac/0x820 [xfs]
      [ 2748.088838]  vfs_fallocate+0x2cf/0x780
      [ 2748.093021]  SyS_fallocate+0x48/0x80
      [ 2748.097006]  do_syscall_64+0x18a/0x430
      [ 2748.101186]  return_from_SYSCALL_64+0x0/0x6a
      [ 2748.105948] Freed:
      [ 2748.108189] PID = 8327
      [ 2748.110816]  save_stack_trace+0x1b/0x20
      [ 2748.115093]  save_stack+0x46/0xd0
      [ 2748.118788]  kasan_slab_free+0x73/0xc0
      [ 2748.122969]  kmem_cache_free+0x7a/0x200
      [ 2748.127247]  free_buffer_head+0x41/0x80
      [ 2748.131524]  try_to_free_buffers+0x178/0x250
      [ 2748.136316]  xfs_vm_releasepage+0x2e9/0x3d0 [xfs]
      [ 2748.141563]  try_to_release_page+0x100/0x180
      [ 2748.146325]  invalidate_inode_pages2_range+0x7da/0xcf0
      [ 2748.152087]  xfs_shift_file_space+0x37d/0x6e0 [xfs]
      [ 2748.157557]  xfs_collapse_file_space+0x49/0x120 [xfs]
      [ 2748.163223]  xfs_file_fallocate+0x2a7/0x820 [xfs]
      [ 2748.168462]  vfs_fallocate+0x2cf/0x780
      [ 2748.172642]  SyS_fallocate+0x48/0x80
      [ 2748.176629]  do_syscall_64+0x18a/0x430
      [ 2748.180810]  return_from_SYSCALL_64+0x0/0x6a
      
      Fix it by checking the offset against the end and breaking out first,
      so that bh is dereferenced only if there are still buffer heads to process.
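      
      The ordering problem can be illustrated with a small user-space
      simulation. This is a hypothetical sketch, not the XFS code:
      `struct fake_bh`, `fake_end_io()`, `finish_buggy()` and `finish_fixed()`
      are invented for illustration, and "freeing" is modelled with a flag so
      stale dereferences can be counted instead of crashing. It assumes, as in
      the report above, that completing the last in-range buffer can release
      the page together with every buffer head on it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the circular list of buffer heads on a page. */
struct fake_bh {
    struct fake_bh *this_page; /* next buffer on the same page (circular) */
    unsigned size;             /* block size in bytes */
    bool freed;                /* models the object having been freed */
};

static int uaf_count; /* dereferences of a buffer that was already "freed" */

/* Route every loop dereference through here so stale accesses are counted. */
static struct fake_bh *deref(struct fake_bh *bh)
{
    if (bh->freed)
        uaf_count++;
    return bh;
}

/* Models end_io: completing the last pending buffer ends page writeback,
 * after which the page (and every buffer head on it) may be released. */
static void fake_end_io(struct fake_bh *head, int *pending)
{
    if (--*pending == 0) {
        struct fake_bh *p = head;
        do {
            p->freed = true;
            p = p->this_page;
        } while (p != head);
    }
}

/* Link n buffers of `size` bytes into a ring and return its head. */
static struct fake_bh *make_ring(struct fake_bh *bufs, int n, unsigned size)
{
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        bufs[i].this_page = &bufs[(i + 1) % n];
        bufs[i].size = size;
        bufs[i].freed = false;
    }
    return &bufs[0];
}

/* Pre-fix ordering: bh is dereferenced before the boundary check. */
static int finish_buggy(struct fake_bh *head, unsigned off, unsigned end,
                        int *pending)
{
    struct fake_bh *bh = head, *next;
    int done = 0;

    do {
        off += deref(bh)->size;     /* may touch an already freed bh */
        next = bh->this_page;
        if (off > end)
            break;
        fake_end_io(head, pending);
        done++;
    } while ((bh = next) != head);
    return done;
}

/* Post-fix ordering: check the range first, touch bh only if in range. */
static int finish_fixed(struct fake_bh *head, unsigned off, unsigned end,
                        int *pending)
{
    struct fake_bh *bh = head, *next;
    int done = 0;

    do {
        if (off >= end)
            break;                  /* out of range: bh is never touched */
        next = deref(bh)->this_page;
        off += bh->size;
        fake_end_io(head, pending);
        done++;
    } while ((bh = next) != head);
    return done;
}
```

      With four 1 KiB buffers on a 4 KiB page and a 2 KiB range, both loops
      complete two buffers, but only the pre-fix ordering reads the third
      buffer after the page has been released.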
      Signed-off-by: Eryu Guan <eguan@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  3. 05 May 2017, 5 commits
  4. 04 May 2017, 16 commits
    • J
      nfs: Fix bdi handling for cloned superblocks · 9052c7cf
      Committed by Jan Kara
      In commit 0d3b12584972 "nfs: Convert to separately allocated bdi" I
      wrongly cloned the bdi reference in nfs_clone_super(). Further inspection
      has shown that the original code actually allocated a new bdi (in the
      ->clone_server callback), which was later registered in
      nfs_fs_mount_common() and used for sb->s_bdi in nfs_initialise_sb().
      The cloned reference could result in the bdi of the original superblock
      not getting unregistered when that superblock was shut down (as the
      cloned sb still held a bdi reference). Later, when a new superblock was
      created under the same anonymous device number, bdi registration hit a
      clash in sysfs:
      
      ------------[ cut here ]------------
      WARNING: CPU: 1 PID: 10284 at /linux-next/fs/sysfs/dir.c:31 sysfs_warn_dup+0x64/0x74
      sysfs: cannot create duplicate filename '/devices/virtual/bdi/0:32'
      Modules linked in: axp20x_usb_power gpio_axp209 nvmem_sunxi_sid sun4i_dma sun4i_ss virt_dma
      CPU: 1 PID: 10284 Comm: mount.nfs Not tainted 4.11.0-rc4+ #14
      Hardware name: Allwinner sun7i (A20) Family
      [<c010f19c>] (unwind_backtrace) from [<c010bc74>] (show_stack+0x10/0x14)
      [<c010bc74>] (show_stack) from [<c03c6e24>] (dump_stack+0x78/0x8c)
      [<c03c6e24>] (dump_stack) from [<c0122200>] (__warn+0xe8/0x100)
      [<c0122200>] (__warn) from [<c0122250>] (warn_slowpath_fmt+0x38/0x48)
      [<c0122250>] (warn_slowpath_fmt) from [<c02ac178>] (sysfs_warn_dup+0x64/0x74)
      [<c02ac178>] (sysfs_warn_dup) from [<c02ac254>] (sysfs_create_dir_ns+0x84/0x94)
      [<c02ac254>] (sysfs_create_dir_ns) from [<c03c8b8c>] (kobject_add_internal+0x9c/0x2ec)
      [<c03c8b8c>] (kobject_add_internal) from [<c03c8e24>] (kobject_add+0x48/0x98)
      [<c03c8e24>] (kobject_add) from [<c048d75c>] (device_add+0xe4/0x5a0)
      [<c048d75c>] (device_add) from [<c048ddb4>] (device_create_groups_vargs+0xac/0xbc)
      [<c048ddb4>] (device_create_groups_vargs) from [<c048dde4>] (device_create_vargs+0x20/0x28)
      [<c048dde4>] (device_create_vargs) from [<c02075c8>] (bdi_register_va+0x44/0xfc)
      [<c02075c8>] (bdi_register_va) from [<c023d378>] (super_setup_bdi_name+0x48/0xa4)
      [<c023d378>] (super_setup_bdi_name) from [<c0312ef4>] (nfs_fill_super+0x1a4/0x204)
      [<c0312ef4>] (nfs_fill_super) from [<c03133f0>] (nfs_fs_mount_common+0x140/0x1e8)
      [<c03133f0>] (nfs_fs_mount_common) from [<c03335cc>] (nfs4_remote_mount+0x50/0x58)
      [<c03335cc>] (nfs4_remote_mount) from [<c023ef98>] (mount_fs+0x14/0xa4)
      [<c023ef98>] (mount_fs) from [<c025cba0>] (vfs_kern_mount+0x54/0x128)
      [<c025cba0>] (vfs_kern_mount) from [<c033352c>] (nfs_do_root_mount+0x80/0xa0)
      [<c033352c>] (nfs_do_root_mount) from [<c0333818>] (nfs4_try_mount+0x28/0x3c)
      [<c0333818>] (nfs4_try_mount) from [<c0313874>] (nfs_fs_mount+0x2cc/0x8c4)
      [<c0313874>] (nfs_fs_mount) from [<c023ef98>] (mount_fs+0x14/0xa4)
      [<c023ef98>] (mount_fs) from [<c025cba0>] (vfs_kern_mount+0x54/0x128)
      [<c025cba0>] (vfs_kern_mount) from [<c02600f0>] (do_mount+0x158/0xc7c)
      [<c02600f0>] (do_mount) from [<c0260f98>] (SyS_mount+0x8c/0xb4)
      [<c0260f98>] (SyS_mount) from [<c0107840>] (ret_fast_syscall+0x0/0x3c)
      
      Fix the problem by always creating a new bdi for each superblock, as we
      used to do.
      Reported-and-tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
      Fixes: 0d3b12584972ce5781179ad3f15cca3cdb5cae05
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • S
      SMB3: Work around mount failure when using SMB3 dialect to Macs · 7db0a6ef
      Committed by Steve French
      Macs send the maximum buffer size in their response to the ioctl that
      validates negotiate security information, which causes us to fail the
      mount because the response buffer is larger than the expected response.
      
      Change ioctl response processing to allow for padding of the validate
      negotiate ioctl response, and limit the maximum response size to the
      maximum buffer size.
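      
      The relaxed length check can be sketched as follows. The names and the
      24-byte expected size are illustrative assumptions rather than the
      actual cifs code; the sketch contrasts an exact-size check with one that
      tolerates trailing padding up to the negotiated maximum buffer size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: size of the fixed validate-negotiate response structure. */
#define VALIDATE_NEGOTIATE_RSP_LEN 24u

/* Strict check: any response that is not exactly the expected size,
 * including a padded one such as a Mac sends, is rejected. */
static bool rsp_len_ok_strict(size_t rsp_len)
{
    return rsp_len == VALIDATE_NEGOTIATE_RSP_LEN;
}

/* Relaxed check: accept trailing padding, but never a response shorter
 * than the structure or longer than the maximum buffer size. */
static bool rsp_len_ok_padded(size_t rsp_len, size_t max_buf)
{
    return rsp_len >= VALIDATE_NEGOTIATE_RSP_LEN && rsp_len <= max_buf;
}
```

      Keeping the upper bound at the maximum buffer size means padding is
      tolerated without letting the server dictate an arbitrarily large reply.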
      Signed-off-by: Steve French <steve.french@primarydata.com>
      CC: Stable <stable@vger.kernel.org>
    • D
      cifs: fix CIFS_IOC_GET_MNT_INFO oops · d8a6e505
      Committed by David Disseldorp
      An open directory may have a NULL private_data pointer prior to readdir.
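      
      A minimal sketch of the guard, using hypothetical cut-down stand-ins
      for struct file and cifsFileInfo. The function name and the -EINVAL
      return value are assumptions for illustration, not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for cifsFileInfo and struct file. */
struct fake_file_info { int tid; };
struct fake_file { void *private_data; };

/* An open directory may not have private_data set until readdir runs,
 * so check it before dereferencing. */
static int get_mnt_info(const struct fake_file *filep, int *tid_out)
{
    const struct fake_file_info *info = filep->private_data;

    if (info == NULL)
        return -EINVAL;   /* assumed error code; without the check: oops */
    *tid_out = info->tid;
    return 0;
}
```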
      
      Fixes: 0de1f4c6 ("Add way to query server fs info for smb3")
      Cc: stable@vger.kernel.org
      Signed-off-by: David Disseldorp <ddiss@suse.de>
      Signed-off-by: Steve French <smfrench@gmail.com>
    • B
      CIFS: fix mapping of SFM_SPACE and SFM_PERIOD · b704e70b
      Committed by Björn Jacke
      - trailing space maps to 0xF028
      - trailing period maps to 0xF029
      
      This fix corrects the mapping of file names which have a trailing character
      that would otherwise be illegal (period or space) but is allowed by POSIX.
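      
      The mapping rule can be sketched for a single character. SFM_SPACE and
      SFM_PERIOD carry the values listed above; `map_name_char()` is a
      hypothetical helper, and treating plain ASCII as passing through
      unchanged is a simplifying assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SFM_SPACE  0xF028u  /* trailing space */
#define SFM_PERIOD 0xF029u  /* trailing period */

/*
 * Map character i of a name of length len to a UTF-16 code unit.  A space
 * or period is remapped only when it is the last character, because only
 * there is it illegal on the server side.
 */
static uint16_t map_name_char(const char *name, size_t i, size_t len)
{
    assert(i < len);
    char c = name[i];

    if (i == len - 1) {
        if (c == ' ')
            return SFM_SPACE;
        if (c == '.')
            return SFM_PERIOD;
    }
    return (uint16_t)(unsigned char)c;  /* ASCII passes through unchanged */
}
```

      For example, in "a.b. " only the final space is remapped; the two
      periods are not trailing and stay as '.'.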
      Signed-off-by: Bjoern Jacke <bjacke@samba.org>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: Steve French <smfrench@gmail.com>
    • A
      fs/block_dev: always invalidate cleancache in invalidate_bdev() · a5f6a6a9
      Committed by Andrey Ryabinin
      invalidate_bdev() calls cleancache_invalidate_inode() iff ->nrpages != 0,
      which doesn't make any sense.
      
      Make sure that invalidate_bdev() always calls cleancache_invalidate_inode()
      regardless of mapping->nrpages value.
      
      Fixes: c515e1fd ("mm/fs: add hooks to support cleancache")
      Link: http://lkml.kernel.org/r/20170424164135.22350-3-aryabinin@virtuozzo.com
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Alexey Kuznetsov <kuznet@virtuozzo.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Nikolay Borisov <n.borisov.lkml@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • A
      fs: fix data invalidation in the cleancache during direct IO · 55635ba7
      Committed by Andrey Ryabinin
      Patch series "Properly invalidate data in the cleancache", v2.
      
      We've noticed that after direct IO write, buffered read sometimes gets
      stale data which is coming from the cleancache.  The reason for this is
      that some direct write hooks call invalidate_inode_pages2[_range]()
      conditionally iff mapping->nrpages is not zero, so we may not invalidate
      data in the cleancache.
      
      Another odd thing is that we check only for ->nrpages and don't check
      for ->nrexceptional, but invalidate_inode_pages2[_range] also
      invalidates exceptional entries as well.  So we invalidate exceptional
      entries only if ->nrpages != 0? This doesn't feel right.
      
       - Patch 1 fixes direct IO writes by removing ->nrpages check.
       - Patch 2 fixes similar case in invalidate_bdev().
           Note: I only fixed conditional cleancache_invalidate_inode() here.
             Do we also need to add an ->nrexceptional check into invalidate_bdev()?
      
       - Patches 3-4: some optimizations.
      
      This patch (of 4):
      
      Some direct IO write fs hooks call invalidate_inode_pages2[_range]()
      conditionally iff mapping->nrpages is not zero.  This can't be right,
      because invalidate_inode_pages2[_range]() also invalidates data in the
      cleancache via cleancache_invalidate_inode() call.  So if page cache is
      empty but there is some data in the cleancache, buffered read after
      direct IO write would get stale data from the cleancache.
      
      Also it doesn't feel right to check only for ->nrpages because
      invalidate_inode_pages2[_range] invalidates exceptional entries as well.
      
      Fix this by calling invalidate_inode_pages2[_range]() regardless of
      nrpages state.
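      
      The stale-read scenario can be modelled with two booleans. This is a
      hypothetical sketch: `struct fake_mapping` and the helper functions are
      invented stand-ins, not kernel structures, and they only capture the
      fact stated above that invalidate_inode_pages2[_range]() also drops the
      cleancache copy.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one inode's caches; not the kernel structures. */
struct fake_mapping {
    unsigned long nrpages;   /* pages currently in the page cache */
    bool cleancache_valid;   /* cleancache still holds the old data */
};

/* Stand-in for invalidate_inode_pages2(): drops page-cache pages and,
 * like the real function, also invalidates the cleancache copy. */
static void fake_invalidate_pages2(struct fake_mapping *m)
{
    m->nrpages = 0;
    m->cleancache_valid = false;
}

/* Old direct-IO write hook: skips invalidation if the page cache is empty. */
static void dio_write_old(struct fake_mapping *m)
{
    if (m->nrpages)
        fake_invalidate_pages2(m);
}

/* Fixed hook: invalidate regardless of nrpages. */
static void dio_write_new(struct fake_mapping *m)
{
    fake_invalidate_pages2(m);
}

/* A buffered read that misses the page cache is served from cleancache
 * if it still holds data, which is stale after a direct-IO write. */
static bool buffered_read_is_stale(const struct fake_mapping *m)
{
    return m->nrpages == 0 && m->cleancache_valid;
}
```

      Starting from an empty page cache with data in the cleancache, the old
      hook leaves the stale copy in place while the fixed hook removes it.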
      
      Note: nfs, cifs and 9p don't need a similar fix because they never call
      cleancache_get_page() (neither directly nor via mpage_readpage[s]()), so
      they are not affected by this bug.
      
      Fixes: c515e1fd ("mm/fs: add hooks to support cleancache")
      Link: http://lkml.kernel.org/r/20170424164135.22350-2-aryabinin@virtuozzo.com
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Alexey Kuznetsov <kuznet@virtuozzo.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Nikolay Borisov <n.borisov.lkml@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • M
      jbd2: make the whole kjournald2 kthread NOFS safe · eb52da3f
      Committed by Michal Hocko
      kjournald2 is central to the transaction commit processing.  As such, any
      potential allocation from this kernel thread has to be GFP_NOFS.  Make
      sure to mark the whole kernel thread GFP_NOFS by using memalloc_nofs_save().
      
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/20170306131408.9828-8-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Suggested-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Chris Mason <clm@fb.com>
      Cc: David Sterba <dsterba@suse.cz>
      Cc: Brian Foster <bfoster@redhat.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: Nikolay Borisov <nborisov@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • M
      jbd2: mark the transaction context with the scope GFP_NOFS context · 81378da6
      Committed by Michal Hocko
      Now that we have the memalloc_nofs_{save,restore} API, we can mark the whole
      transaction context as implicitly GFP_NOFS.  All allocations will
      automatically inherit GFP_NOFS this way.  This means that we do not have
      to mark any of those requests with GFP_NOFS and moreover all the
      ext4_kv[mz]alloc(GFP_NOFS) are also safe now because even the hardcoded
      GFP_KERNEL allocations deep inside the vmalloc will be NOFS now.
      
      [akpm@linux-foundation.org: tweak comments]
      Link: http://lkml.kernel.org/r/20170306131408.9828-7-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Chris Mason <clm@fb.com>
      Cc: David Sterba <dsterba@suse.cz>
      Cc: Brian Foster <bfoster@redhat.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: Nikolay Borisov <nborisov@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • M
      xfs: use memalloc_nofs_{save,restore} instead of memalloc_noio* · 9ba1fb2c
      Committed by Michal Hocko
      kmem_zalloc_large and _xfs_buf_map_pages use the memalloc_noio_{save,restore}
      API to prevent reclaim recursion into the fs, because vmalloc can
      invoke unconditional GFP_KERNEL allocations and these functions might be
      called from NOFS contexts.  memalloc_noio_save enforces a GFP_NOIO
      context, which is even weaker (more restrictive) than GFP_NOFS, and that
      seems to be unnecessary.  Let's use memalloc_nofs_{save,restore} instead,
      as it should provide exactly what we need here: an implicit GFP_NOFS context.
      
      Link: http://lkml.kernel.org/r/20170306131408.9828-6-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Chris Mason <clm@fb.com>
      Cc: David Sterba <dsterba@suse.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Nikolay Borisov <nborisov@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • M
      mm: introduce memalloc_nofs_{save,restore} API · 7dea19f9
      Committed by Michal Hocko
      The GFP_NOFS context is currently used for the following 5 reasons:
      
       - to prevent deadlocks when a lock held by the allocation
         context would be needed during memory reclaim
      
       - to prevent stack overflows during reclaim because the
         allocation is already performed from a deep context
      
       - to prevent lockups when the allocation context depends on other
         reclaimers to make forward progress indirectly
      
       - just in case, because this would be safe from the fs POV
      
       - to silence lockdep false positives
      
      Unfortunately, overuse of this allocation context brings some problems to
      the MM.  Memory reclaim is much weaker (especially during heavy FS
      metadata workloads), and the OOM killer cannot be invoked because the MM
      layer doesn't have enough information about how much memory is freeable
      by the FS layer.
      
      In many cases it is far from clear why the weaker context is even used
      and so it might be used unnecessarily.  We would like to get rid of
      those as much as possible.  One way to do that is to use the flag in
      scopes rather than isolated cases.  Such a scope is declared when really
      necessary, tracked per task and all the allocation requests from within
      the context will simply inherit the GFP_NOFS semantic.
      
      Not only is this easier to understand and maintain, because there are
      far fewer problematic contexts than specific allocation requests, it
      also helps code paths where the FS layer interacts with other layers (e.g.
      crypto, security modules, MM etc.) and there is no easy way to convey
      the allocation context between the layers.
      
      Introduce memalloc_nofs_{save,restore} API to control the scope of
      GFP_NOFS allocation context.  This is basically copying
      memalloc_noio_{save,restore} API we have for other restricted allocation
      context GFP_NOIO.  The PF_MEMALLOC_NOFS flag already exists and it is
      just an alias for PF_FSTRANS which has been xfs specific until recently.
      There are no PF_FSTRANS users left, so let's just drop it.
      
      PF_MEMALLOC_NOFS is now checked in the MM layer and drops __GFP_FS
      implicitly, just as PF_MEMALLOC_NOIO drops __GFP_IO.  memalloc_noio_flags
      is renamed to current_gfp_context because it now cares about both the
      PF_MEMALLOC_NOFS and PF_MEMALLOC_NOIO contexts.  XFS code paths preserve
      their semantics.  kmem_flags_convert() doesn't need to evaluate the flag
      anymore.
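      
      The scope mechanism can be modelled in user space. This is a sketch
      under stated assumptions: all SIM_* flag values, `task_flags` and the
      sim_* functions are illustrative stand-ins for current->flags,
      memalloc_nofs_{save,restore}() and current_gfp_context(). It shows the
      save/restore nesting and how a NOFS scope drops only the FS bit while a
      NOIO scope drops both the IO and FS bits.

```c
#include <assert.h>

/* Illustrative bit values only; the kernel's real values differ. */
#define SIM_GFP_IO     0x1u  /* stands in for __GFP_IO */
#define SIM_GFP_FS     0x2u  /* stands in for __GFP_FS */
#define SIM_GFP_KERNEL (SIM_GFP_IO | SIM_GFP_FS)
#define SIM_PF_NOIO    0x10u /* stands in for PF_MEMALLOC_NOIO */
#define SIM_PF_NOFS    0x20u /* stands in for PF_MEMALLOC_NOFS */

static unsigned task_flags; /* stands in for current->flags */

/* Enter a NOFS scope; return the previous NOFS bit so scopes can nest. */
static unsigned sim_nofs_save(void)
{
    unsigned old = task_flags & SIM_PF_NOFS;
    task_flags |= SIM_PF_NOFS;
    return old;
}

/* Leave the scope, restoring whatever NOFS state was in effect before. */
static void sim_nofs_restore(unsigned old)
{
    task_flags = (task_flags & ~SIM_PF_NOFS) | old;
}

/* Strip the gfp bits the task's scope forbids: NOIO drops both IO and
 * FS, NOFS drops only FS. */
static unsigned sim_gfp_context(unsigned gfp)
{
    if (task_flags & SIM_PF_NOIO)
        gfp &= ~(SIM_GFP_IO | SIM_GFP_FS);
    else if (task_flags & SIM_PF_NOFS)
        gfp &= ~SIM_GFP_FS;
    return gfp;
}
```

      A caller that must not recurse into the fs, such as kjournald2 in the
      commit above, would bracket the whole region with a save/restore pair,
      so every allocation inside inherits the NOFS restriction without each
      call site passing GFP_NOFS explicitly.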
      
      This patch shouldn't introduce any functional changes.
      
      Let's hope that filesystems will drop direct GFP_NOFS (resp.  ~__GFP_FS)
      usage as much as possible and only use properly documented
      memalloc_nofs_{save,restore} checkpoints where they are appropriate.
      
      [akpm@linux-foundation.org: fix comment typo, reflow comment]
      Link: http://lkml.kernel.org/r/20170306131408.9828-5-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Chris Mason <clm@fb.com>
      Cc: David Sterba <dsterba@suse.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Brian Foster <bfoster@redhat.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: Nikolay Borisov <nborisov@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • M
      xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFS · 9070733b
      Committed by Michal Hocko
      xfs has defined PF_FSTRANS to declare a scope GFP_NOFS semantic quite
      some time ago.  We would like to make this concept more generic and use
      it for other filesystems as well.  Let's start by giving the flag a more
      generic name PF_MEMALLOC_NOFS, which is in line with the existing
      PF_MEMALLOC_NOIO already used for the same purpose for GFP_NOIO
      contexts.  Replace all PF_FSTRANS usage from the xfs code in the first
      step before we introduce a full API for it as xfs uses the flag directly
      anyway.
      
      This patch doesn't introduce any functional change.
      
      Link: http://lkml.kernel.org/r/20170306131408.9828-4-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Chris Mason <clm@fb.com>
      Cc: David Sterba <dsterba@suse.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Nikolay Borisov <nborisov@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • S
      proc: show MADV_FREE pages info in smaps · cf8496ea
      Committed by Shaohua Li
      Show MADV_FREE pages info of each vma in smaps.  The interface is for
      diagnostic or monitoring purposes; userspace can use it to understand
      what happens in the application.  Since userspace can dirty MADV_FREE
      pages without notice from the kernel, this interface is the only place
      where we can get accurate accounting info about MADV_FREE pages.
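      
      A monitoring tool could total the per-VMA counters. This sketch assumes
      the field is printed as `LazyFree:` followed by a value in kB, as in
      Documentation/filesystems/proc.txt for kernels carrying this change;
      `sum_lazyfree_kb()` is a hypothetical helper that works on a text
      buffer so it can be exercised without reading /proc.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sum the "LazyFree:" values (in kB) over all VMAs in smaps-format text. */
static unsigned long sum_lazyfree_kb(const char *text)
{
    unsigned long total = 0;
    const char *line = text;

    assert(text != NULL);
    while (line && *line) {
        unsigned long kb;

        /* Only lines starting with "LazyFree:" match; others return 0. */
        if (sscanf(line, "LazyFree: %lu kB", &kb) == 1)
            total += kb;
        line = strchr(line, '\n');  /* advance to the next line */
        if (line)
            line++;
    }
    return total;
}
```

      On a live system the buffer would be the contents of
      /proc/<pid>/smaps for the process being monitored.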
      
      [mhocko@kernel.org: update Documentation/filesystems/proc.txt]
      Link: http://lkml.kernel.org/r/89efde633559de1ec07444f2ef0f4963a97a2ce8.1487965799.git.shli@fb.com
      Signed-off-by: Shaohua Li <shli@fb.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • G
      fs/ocfs2/cluster: use offset_in_page() macro · d47736fa
      Committed by Geliang Tang
      Use offset_in_page() macro instead of open-coding.
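      
      The macro keeps only the low, in-page bits of an address. The sketch
      below assumes 4 KiB pages and uses SIM_-prefixed stand-ins for
      PAGE_SIZE and PAGE_MASK; `open_coded_offset()` is one hypothetical
      equivalent open-coded form, not necessarily the exact code the ocfs2
      cleanup replaced.

```c
#include <assert.h>

/* Assumption: 4 KiB pages; the kernel derives these from PAGE_SHIFT. */
#define SIM_PAGE_SIZE 4096UL
#define SIM_PAGE_MASK (~(SIM_PAGE_SIZE - 1))

/* Same shape as the kernel macro: mask off everything above the page. */
#define sim_offset_in_page(p) ((unsigned long)(p) & ~SIM_PAGE_MASK)

/* An equivalent open-coded computation of the same offset. */
static unsigned long open_coded_offset(const void *p)
{
    return (unsigned long)p % SIM_PAGE_SIZE;
}
```

      Using the named macro states the intent directly instead of leaving
      readers to decode the mask or modulo arithmetic.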
      
      Link: http://lkml.kernel.org/r/4dbc77ccaaed98b183cf4dba58a4fa325fd65048.1492758503.git.geliangtang@gmail.com
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • J
      ocfs2: o2hb: revert hb threshold to keep compatible · 33496c3c
      Committed by Junxiao Bi
      Configfs is the interface ocfs2-tools uses to pass configuration to the
      kernel, and $configfs_dir/cluster/$clustername/heartbeat/dead_threshold
      is the attribute used to configure the heartbeat dead threshold.  The
      kernel has a default value for it, but the user can set
      O2CB_HEARTBEAT_THRESHOLD in /etc/sysconfig/o2cb to override it.
      
      Commit 45b99773 ("ocfs2/cluster: use per-attribute show and store
      methods") changed the heartbeat dead threshold name while ocfs2-tools did
      not, so ocfs2-tools no longer sets this value and the default is
      always used.  So revert it.
      
      Fixes: 45b99773 ("ocfs2/cluster: use per-attribute show and store methods")
      Link: http://lkml.kernel.org/r/1490665245-15374-1-git-send-email-junxiao.bi@oracle.com
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Acked-by: Joseph Qi <jiangqi903@gmail.com>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • G
      667b8a37
    • D
      xfs: reserve enough blocks to handle btree splits when remapping · fe0be23e
      Committed by Darrick J. Wong
      In xfs_reflink_end_cow, we erroneously reserve only enough blocks to
      handle adding 1 extent.  This is problematic if we fragment free space,
      have to do CoW, and then have to perform multiple bmap btree expansions.
      Furthermore, the BUI recovery routine doesn't reserve /any/ blocks to
      handle btree splits, so log recovery fails after our first error causes
      the filesystem to go down.
      
      Therefore, refactor the transaction block reservation macros until we
      have a macro that works for our deferred (re)mapping activities, and fix
      both problems by using that macro.
      
      With 1k blocks we can hit this fairly often in g/187 if the scratch fs
      is big enough.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  5. 03 May 2017, 7 commits
    • R
      CIFS: fix oplock break deadlocks · 3998e6b8
      Committed by Rabin Vincent
      When the final cifsFileInfo_put() is called from cifsiod and an oplock
      break work is queued, lockdep complains loudly:
      
       =============================================
       [ INFO: possible recursive locking detected ]
       4.11.0+ #21 Not tainted
       ---------------------------------------------
       kworker/0:2/78 is trying to acquire lock:
        ("cifsiod"){++++.+}, at: flush_work+0x215/0x350
      
       but task is already holding lock:
        ("cifsiod"){++++.+}, at: process_one_work+0x255/0x8e0
      
       other info that might help us debug this:
        Possible unsafe locking scenario:
      
              CPU0
              ----
         lock("cifsiod");
         lock("cifsiod");
      
        *** DEADLOCK ***
      
        May be due to missing lock nesting notation
      
       2 locks held by kworker/0:2/78:
        #0:  ("cifsiod"){++++.+}, at: process_one_work+0x255/0x8e0
        #1:  ((&wdata->work)){+.+...}, at: process_one_work+0x255/0x8e0
      
       stack backtrace:
       CPU: 0 PID: 78 Comm: kworker/0:2 Not tainted 4.11.0+ #21
       Workqueue: cifsiod cifs_writev_complete
       Call Trace:
        dump_stack+0x85/0xc2
        __lock_acquire+0x17dd/0x2260
        ? match_held_lock+0x20/0x2b0
        ? trace_hardirqs_off_caller+0x86/0x130
        ? mark_lock+0xa6/0x920
        lock_acquire+0xcc/0x260
        ? lock_acquire+0xcc/0x260
        ? flush_work+0x215/0x350
        flush_work+0x236/0x350
        ? flush_work+0x215/0x350
        ? destroy_worker+0x170/0x170
        __cancel_work_timer+0x17d/0x210
        ? ___preempt_schedule+0x16/0x18
        cancel_work_sync+0x10/0x20
        cifsFileInfo_put+0x338/0x7f0
        cifs_writedata_release+0x2a/0x40
        ? cifs_writedata_release+0x2a/0x40
        cifs_writev_complete+0x29d/0x850
        ? preempt_count_sub+0x18/0xd0
        process_one_work+0x304/0x8e0
        worker_thread+0x9b/0x6a0
        kthread+0x1b2/0x200
        ? process_one_work+0x8e0/0x8e0
        ? kthread_create_on_node+0x40/0x40
        ret_from_fork+0x31/0x40
      
      This is a real warning.  Since the oplock break work is queued on the same
      workqueue, this can deadlock if there is only one worker thread active
      for the workqueue (which will be the case during memory pressure when
      the rescuer thread is handling it).
      
      Furthermore, there is at least one other kind of hang possible due to
      the oplock break handling if there is only one worker.  (This can be
      reproduced without introducing memory pressure by passing 1 for
      the max_active parameter of cifsiod.) cifs_oplock_break() can wait
      indefinitely in filemap_fdatawait() while the cifs_writev_complete()
      work is blocked:
      
       sysrq: SysRq : Show Blocked State
         task                        PC stack   pid father
       kworker/0:1     D    0    16      2 0x00000000
       Workqueue: cifsiod cifs_oplock_break
       Call Trace:
        __schedule+0x562/0xf40
        ? mark_held_locks+0x4a/0xb0
        schedule+0x57/0xe0
        io_schedule+0x21/0x50
        wait_on_page_bit+0x143/0x190
        ? add_to_page_cache_lru+0x150/0x150
        __filemap_fdatawait_range+0x134/0x190
        ? do_writepages+0x51/0x70
        filemap_fdatawait_range+0x14/0x30
        filemap_fdatawait+0x3b/0x40
        cifs_oplock_break+0x651/0x710
        ? preempt_count_sub+0x18/0xd0
        process_one_work+0x304/0x8e0
        worker_thread+0x9b/0x6a0
        kthread+0x1b2/0x200
        ? process_one_work+0x8e0/0x8e0
        ? kthread_create_on_node+0x40/0x40
        ret_from_fork+0x31/0x40
       dd              D    0   683    171 0x00000000
       Call Trace:
        __schedule+0x562/0xf40
        ? mark_held_locks+0x29/0xb0
        schedule+0x57/0xe0
        io_schedule+0x21/0x50
        wait_on_page_bit+0x143/0x190
        ? add_to_page_cache_lru+0x150/0x150
        __filemap_fdatawait_range+0x134/0x190
        ? do_writepages+0x51/0x70
        filemap_fdatawait_range+0x14/0x30
        filemap_fdatawait+0x3b/0x40
        filemap_write_and_wait+0x4e/0x70
        cifs_flush+0x6a/0xb0
        filp_close+0x52/0xa0
        __close_fd+0xdc/0x150
        SyS_close+0x33/0x60
        entry_SYSCALL_64_fastpath+0x1f/0xbe
      
       Showing all locks held in the system:
       2 locks held by kworker/0:1/16:
        #0:  ("cifsiod"){.+.+.+}, at: process_one_work+0x255/0x8e0
        #1:  ((&cfile->oplock_break)){+.+.+.}, at: process_one_work+0x255/0x8e0
      
       Showing busy workqueues and worker pools:
       workqueue cifsiod: flags=0xc
         pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
           in-flight: 16:cifs_oplock_break
           delayed: cifs_writev_complete, cifs_echo_request
       pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 750 3
      
      Fix these problems by creating a new workqueue (with a rescuer) for
      the oplock break work.
      Signed-off-by: Rabin Vincent <rabinv@axis.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
      CC: Stable <stable@vger.kernel.org>
      3998e6b8
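The dependency behind the max_active=1 hang can be illustrated with a toy userspace model. This is not kernel code: `struct work` and `all_work_finishes()` are hypothetical, and the model captures only the per-queue max_active limit, not the rescuer thread or memory pressure. Items start in FIFO order while their queue has a free slot, and a started item finishes only once the item it waits for has finished, mirroring cifs_oplock_break() waiting on page writeback that cifs_writev_complete() would end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy work item: which queue it sits on and which item it waits for. */
struct work {
    int queue;      /* index of the (hypothetical) workqueue */
    int depends_on; /* index of the item that must finish first, or -1 */
    bool started;   /* occupies one of its queue's max_active slots */
    bool done;
};

/*
 * Returns true if every item eventually finishes, false if the model
 * stalls: items are started in FIFO order while their queue has a free
 * slot, and a started item finishes only once its dependency is done.
 */
static bool all_work_finishes(struct work *w, size_t n,
                              const int *max_active, size_t nqueues)
{
    for (;;) {
        bool progress = false;
        size_t remaining = 0;

        for (size_t q = 0; q < nqueues; q++) {
            int active = 0;

            for (size_t i = 0; i < n; i++)
                if (w[i].queue == (int)q && w[i].started && !w[i].done)
                    active++;
            for (size_t i = 0; i < n && active < max_active[q]; i++)
                if (w[i].queue == (int)q && !w[i].started) {
                    w[i].started = true;
                    active++;
                    progress = true;
                }
        }
        for (size_t i = 0; i < n; i++) {
            if (w[i].started && !w[i].done &&
                (w[i].depends_on < 0 || w[w[i].depends_on].done)) {
                w[i].done = true;
                progress = true;
            }
            if (!w[i].done)
                remaining++;
        }
        if (remaining == 0)
            return true;
        if (!progress)
            return false;
    }
}
```

With both items on one queue and max_active = 1, the oplock-break item holds the only slot while the completion item stays queued behind it, so the model stalls. Moving the oplock-break item to its own queue lets the completion item run first.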
    • D
      cifs: fix CIFS_ENUMERATE_SNAPSHOTS oops · 6026685d
      David Disseldorp committed
      As with 61876395, an open directory may have a NULL private_data
      pointer prior to readdir. CIFS_ENUMERATE_SNAPSHOTS must check for this
      before dereferencing the pointer.
      
      Fixes: 834170c8 ("Enable previous version support")
      Signed-off-by: David Disseldorp <ddiss@suse.de>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: Steve French <smfrench@gmail.com>
      6026685d
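The guard described above can be sketched in userspace C. `struct open_file`, `struct file_info` and `ioctl_enumerate_snapshots()` are hypothetical stand-ins for the kernel's struct file, struct cifsFileInfo and ioctl path, and returning -EINVAL when private_data is NULL is an assumption, not necessarily what the patch returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-open CIFS file info. */
struct file_info {
    int tid; /* placeholder field read by the ioctl */
};

/* Hypothetical stand-in for struct file. */
struct open_file {
    void *private_data; /* may still be NULL for a directory before readdir */
};

/* Sketch of the ioctl path: check private_data before dereferencing it. */
static int ioctl_enumerate_snapshots(struct open_file *filp, int *tid_out)
{
    struct file_info *info = filp->private_data;

    if (info == NULL)
        return -EINVAL; /* assumed error code */
    *tid_out = info->tid; /* safe: info is known to be non-NULL here */
    return 0;
}
```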
    • D
      cifs: fix leak in FSCTL_ENUM_SNAPS response handling · 0e5c7955
      David Disseldorp committed
      The server may respond with success but return an output buffer
      shorter than sizeof(struct smb_snapshot_array). Do not leak the output
      buffer in this case.
      
      Fixes: 834170c8 ("Enable previous version support")
      Signed-off-by: David Disseldorp <ddiss@suse.de>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: Steve French <smfrench@gmail.com>
      0e5c7955
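A minimal sketch of the leak-free pattern, assuming the handler owns the response buffer and releases it on a single exit path so the "success but too short" case cannot skip the free. `struct snapshot_array_hdr`, `release_buffer()` and `handle_enum_snaps_response()` are hypothetical names, and -ERANGE is an assumed error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the fixed header of struct smb_snapshot_array. */
struct snapshot_array_hdr {
    unsigned int number_of_snapshots;
    unsigned int number_of_snapshots_returned;
    unsigned int snapshot_array_size;
};

/* Counts released buffers so the sketch can be checked for leaks. */
static int buffers_freed;

static void release_buffer(void *buf)
{
    free(buf);
    buffers_freed++;
}

/* The response buffer `out` is owned here and released on every path. */
static int handle_enum_snaps_response(void *out, size_t out_len,
                                      struct snapshot_array_hdr *hdr_out)
{
    int rc = 0;

    if (out_len < sizeof(struct snapshot_array_hdr)) {
        rc = -ERANGE; /* assumed error code for a too-short response */
        goto out_free;
    }
    memcpy(hdr_out, out, sizeof(*hdr_out));
out_free:
    release_buffer(out);
    return rc;
}
```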
    • S
      Set unicode flag on cifs echo request to avoid Mac error · 26c9cb66
      Steve French committed
      Mac requires the Unicode flag to be set for cifs, even for the SMB
      echo request (which doesn't contain strings).
      
      Without this, Mac rejects the periodic echo requests (when mounting
      with cifs) that we use to check whether the server is down.
      Signed-off-by: Steve French <smfrench@gmail.com>
      CC: Stable <stable@vger.kernel.org>
      26c9cb66
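A sketch of setting the flag on an SMB1 echo header. The constants follow the MS-CIFS specification (SMB_FLAGS2_UNICODE = 0x8000, CAP_UNICODE = 0x00000004, SMB_COM_ECHO = 0x2B); `struct smb1_hdr` and `build_echo_hdr()` are hypothetical, wire byte order is ignored, and conditioning the flag on the negotiated CAP_UNICODE capability is an assumption about how the patch decides.

```c
#include <assert.h>
#include <stdint.h>

/* Values from the MS-CIFS specification. */
#define SMB_FLAGS2_UNICODE 0x8000u     /* Flags2 bit: strings are UTF-16LE */
#define CAP_UNICODE        0x00000004u /* server capability bit */
#define SMB_COM_ECHO       0x2B

/* Minimal stand-in for the SMB1 header fields this sketch touches. */
struct smb1_hdr {
    uint8_t  command;
    uint16_t flags2;
};

/*
 * Build an echo request header, setting the Unicode flag when the server
 * negotiated Unicode support even though the echo carries no strings.
 */
static struct smb1_hdr build_echo_hdr(uint32_t server_caps)
{
    struct smb1_hdr hdr = { .command = SMB_COM_ECHO, .flags2 = 0 };

    if (server_caps & CAP_UNICODE)
        hdr.flags2 |= SMB_FLAGS2_UNICODE;
    return hdr;
}
```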
    • P
      CIFS: Add asynchronous write support through kernel AIO · c610c4b6
      Pavel Shilovsky committed
      This patch adds support for processing write calls passed by
      io_submit() asynchronously. It is based on the previously introduced
      async context, which allows I/O responses to be processed in a
      separate thread and returns to the caller immediately for
      asynchronous calls.
      
      This improves the write performance of single-threaded applications
      as the I/O queue depth increases.
      Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
      c610c4b6
    • P
      CIFS: Add asynchronous read support through kernel AIO · 6685c5e2
      Pavel Shilovsky committed
      This patch adds support for processing read calls passed by
      io_submit() asynchronously. It is based on the previously introduced
      async context, which allows I/O responses to be processed in a
      separate thread and returns to the caller immediately for
      asynchronous calls.
      
      This improves the read performance of single-threaded applications
      as the I/O queue depth increases.
      Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
      6685c5e2
    • P
      CIFS: Add asynchronous context to support kernel AIO · ccf7f408
      Pavel Shilovsky committed
      Currently the code doesn't recognize asynchronous calls passed
      by io_submit() and processes all calls synchronously. This is not
      what kernel AIO expects. This patch introduces a new async context
      that keeps track of all issued I/O requests and moves response
      collection to a separate thread. This allows returning to the
      caller immediately for async calls and calling iocb->ki_complete()
      once all requests are completed. For sync calls the current thread
      simply waits until all requests are completed.
      Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
      ccf7f408
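The completion-tracking idea can be sketched with C11 atomics: each issued sub-request decrements a pending counter when its response arrives, and only the response that drops the counter to zero reports the result, once, for async callers. `struct aio_ctx`, `ctx_init()`, `ctx_request_done()`, `ctx_is_done()` and the callback shape are hypothetical and do not match the kernel's kiocb->ki_complete() signature; a real sync path would sleep on a completion rather than poll `ctx_is_done()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct aio_ctx;
typedef void (*complete_fn)(struct aio_ctx *ctx, long res);

/* Hypothetical async context tracking the sub-requests of one I/O call. */
struct aio_ctx {
    atomic_int pending;      /* sub-requests still outstanding */
    atomic_long total_bytes; /* bytes transferred by completed sub-requests */
    bool is_sync;            /* true if the issuing thread waits itself */
    complete_fn ki_complete; /* async completion callback (hypothetical shape) */
};

static void ctx_init(struct aio_ctx *ctx, int nreq, bool is_sync, complete_fn cb)
{
    assert(nreq > 0);
    atomic_init(&ctx->pending, nreq);
    atomic_init(&ctx->total_bytes, 0);
    ctx->is_sync = is_sync;
    ctx->ki_complete = cb;
}

/*
 * Called once per sub-request response (e.g. from a worker thread).
 * Only the call that drops `pending` to zero reports completion, and
 * only for async callers.
 */
static void ctx_request_done(struct aio_ctx *ctx, long bytes)
{
    atomic_fetch_add(&ctx->total_bytes, bytes);
    if (atomic_fetch_sub(&ctx->pending, 1) == 1 && !ctx->is_sync)
        ctx->ki_complete(ctx, atomic_load(&ctx->total_bytes));
}

/* A sync caller checks this (a real implementation would sleep instead). */
static bool ctx_is_done(struct aio_ctx *ctx)
{
    return atomic_load(&ctx->pending) == 0;
}

/* Example callback that records what it was given. */
static int completions;
static long completed_res;

static void record_completion(struct aio_ctx *ctx, long res)
{
    (void)ctx;
    completions++;
    completed_res = res;
}
```

For an async call with three sub-requests of 100, 200 and 50 bytes, the callback fires exactly once, after the third response, with 350; a sync context never invokes the callback and is simply observed as done.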