1. 07 3月, 2014 2 次提交
    • D
      xfs: use NOIO contexts for vm_map_ram · ae687e58
      Dave Chinner 提交于
      When we map pages in the buffer cache, we can do so in GFP_NOFS
      contexts. However, the vmap interfaces do not provide any method of
      communicating this information to memory reclaim, and hence we get
      lockdep complaining about it regularly and occassionally see hangs
      that may be vmap related reclaim deadlocks. We can also see these
      same problems from anywhere where we use vmalloc for a large buffer
      (e.g. attribute code) inside a transaction context.
      
      A typical lockdep report shows up as a reclaim state warning like so:
      
      [14046.101458] =================================
      [14046.102850] [ INFO: inconsistent lock state ]
      [14046.102850] 3.14.0-rc4+ #2 Not tainted
      [14046.102850] ---------------------------------
      [14046.102850] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
      [14046.102850] kswapd0/14 [HC0[0]:SC0[0]:HE1:SE1] takes:
      [14046.102850]  (&xfs_dir_ilock_class){++++?+}, at: [<791a04bb>] xfs_ilock+0xff/0x16a
      [14046.102850] {RECLAIM_FS-ON-W} state was registered at:
      [14046.102850]   [<7904cdb1>] mark_held_locks+0x81/0xe7
      [14046.102850]   [<7904d390>] lockdep_trace_alloc+0x5c/0xb4
      [14046.102850]   [<790c2c28>] kmem_cache_alloc_trace+0x2b/0x11e
      [14046.102850]   [<790ba7f4>] vm_map_ram+0x119/0x3e6
      [14046.102850]   [<7914e124>] _xfs_buf_map_pages+0x5b/0xcf
      [14046.102850]   [<7914ed74>] xfs_buf_get_map+0x67/0x13f
      [14046.102850]   [<7917506f>] xfs_attr_rmtval_set+0x396/0x4d5
      [14046.102850]   [<7916e8bb>] xfs_attr_leaf_addname+0x18f/0x37d
      [14046.102850]   [<7916ed9e>] xfs_attr_set_int+0x2f5/0x3e8
      [14046.102850]   [<7916eefc>] xfs_attr_set+0x6b/0x74
      [14046.102850]   [<79168355>] xfs_xattr_set+0x61/0x81
      [14046.102850]   [<790e5b10>] generic_setxattr+0x59/0x68
      [14046.102850]   [<790e4c06>] __vfs_setxattr_noperm+0x58/0xce
      [14046.102850]   [<790e4d0a>] vfs_setxattr+0x8e/0x92
      [14046.102850]   [<790e4ddd>] setxattr+0xcf/0x159
      [14046.102850]   [<790e5423>] SyS_lsetxattr+0x88/0xbb
      [14046.102850]   [<79268438>] sysenter_do_call+0x12/0x36
      
      Now, we can't completely remove these traces - mainly because
      vm_map_ram() will do GFP_KERNEL allocation and that generates the
      above warning before we get into the reclaim code, but we can turn
      them all into false positive warnings.
      
      To do that, use the method that DM and other IO context code uses to
      avoid this problem: there is a process flag to tell memory reclaim
      not to do IO that we can set appropriately. That prevents GFP_KERNEL
      context reclaim being done from deep inside the vmalloc code in
      places we can't directly pass a GFP_NOFS context to. That interface
      has a pair of wrapper functions: memalloc_noio_save() and
      memalloc_noio_restore().
      
      Adding them around vm_map_ram and the vzalloc call in
      kmem_alloc_large() will prevent deadlocks and most lockdep reports
      for this issue. Also, convert the vzalloc() call in
      kmem_alloc_large() to use __vmalloc() so that we can pass the
      correct gfp context to the data page allocation routine inside
      __vmalloc() so that it is clear that GFP_NOFS context is important
      to this vmalloc call.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      ae687e58
    • D
      xfs: don't leak EFSBADCRC to userspace · ac75a1f7
      Dave Chinner 提交于
      While the verifier routines may return EFSBADCRC when a buffer has
      a bad CRC, we need to translate that to EFSCORRUPTED so that the
      higher layers treat the error appropriately and we return a
      consistent error to userspace. This fixes a xfs/005 regression.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      ac75a1f7
  2. 03 2月, 2014 2 次提交
    • M
      hpfs: optimize quad buffer loading · 1c0b8a7a
      Mikulas Patocka 提交于
      HPFS needs to load 4 consecutive 512-byte sectors when accessing the
      directory nodes or bitmaps.  We can't switch to 2048-byte block size
      because files are allocated in the units of 512-byte sectors.
      
      Previously, the driver would allocate a 2048-byte area using kmalloc,
      copy the data from four buffers to this area and eventually copy them
      back if they were modified.
      
      In the current implementation of the buffer cache, buffers are allocated
      in the pagecache.  That means that 4 consecutive 512-byte buffers are
      stored in consecutive areas in the kernel address space.  So, we don't
      need to allocate extra memory and copy the content of the buffers there.
      
      This patch optimizes the code to avoid copying the buffers.  It checks
      if the four buffers are stored in contiguous memory - if they are not,
      it falls back to allocating a 2048-byte area and copying data there.
      Signed-off-by: NMikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1c0b8a7a
    • M
      hpfs: remember free space · 2cbe5c76
      Mikulas Patocka 提交于
      Previously, hpfs scanned all bitmaps each time the user asked for free
      space using statfs.  This patch changes it so that hpfs scans the
      bitmaps only once, remembes the free space and on next invocation of
      statfs it returns the value instantly.
      
      New versions of wine are hammering on the statfs syscall very heavily,
      making some games unplayable when they're stored on hpfs, with load
      times in minutes.
      
      This should be backported to the stable kernels because it fixes
      user-visible problem (excessive level load times in wine).
      Signed-off-by: NMikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Cc: stable@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2cbe5c76
  3. 02 2月, 2014 1 次提交
  4. 01 2月, 2014 5 次提交
  5. 31 1月, 2014 5 次提交
  6. 30 1月, 2014 7 次提交
  7. 29 1月, 2014 18 次提交
    • C
      Btrfs: fix spin_unlock in check_ref_cleanup · cf93da7b
      Chris Mason 提交于
      Our goto out should have gone a little farther.
      Signed-off-by: NChris Mason <clm@fb.com>
      cf93da7b
    • C
      Btrfs: setup inode location during btrfs_init_inode_locked · 90d3e592
      Chris Mason 提交于
      We have a race during inode init because the BTRFS_I(inode)->location is setup
      after the inode hash table lock is dropped.  btrfs_find_actor uses the location
      field, so our search might not find an existing inode in the hash table if we
      race with the inode init code.
      
      This commit changes things to setup the location field sooner.  Also the find actor now
      uses only the location objectid to match inodes.  For inode hashing, we just
      need a unique and stable test, it doesn't have to reflect the inode numbers we
      show to userland.
      Signed-off-by: NChris Mason <clm@fb.com>
      CC: stable@vger.kernel.org
      90d3e592
    • C
      Btrfs: don't use ram_bytes for uncompressed inline items · 514ac8ad
      Chris Mason 提交于
      If we truncate an uncompressed inline item, ram_bytes isn't updated to reflect
      the new size.  The fixe uses the size directly from the item header when
      reading uncompressed inlines, and also fixes truncate to update the
      size as it goes.
      Reported-by: NJens Axboe <axboe@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      CC: stable@vger.kernel.org
      514ac8ad
    • F
      Btrfs: fix btrfs_search_slot_for_read backwards iteration · 23c6bf6a
      Filipe David Borba Manana 提交于
      If the current path's leaf slot is 0, we do search for the previous
      leaf (via btrfs_prev_leaf) and set the new path's leaf slot to a
      value corresponding to the number of items - 1 of the former leaf.
      Fix this by using the slot set by btrfs_prev_leaf, decrementing it
      by 1 if it's equal to the leaf's number of items.
      
      Use of btrfs_search_slot_for_read() for backward iteration is used in
      particular by the send feature, which could miss items when the input
      leaf has less items than its previous leaf.
      
      This could be reproduced by running btrfs/007 from xfstests in a loop.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      23c6bf6a
    • W
      Btrfs: do not export ulist functions · 49fc647a
      Wang Shilong 提交于
      There are not any users that use ulist except Btrfs,don't
      export them.
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      49fc647a
    • W
      Btrfs: rework ulist with list+rb_tree · 4c7a6f74
      Wang Shilong 提交于
      We are really suffering from now ulist's implementation, some developers
      gave their try, and i just gave some of my ideas for things:
      
       1. use list+rb_tree instead of arrary+rb_tree
      
       2. add cur_list to iterator rather than ulist structure.
      
       3. add seqnum into every node when they are added, this is
       used to do selfcheck when iterating node.
      
      I noticed Zach Brown's comments before, long term is to kick off
      ulist implementation, however, for now, we need at least avoid
      arrary from ulist.
      
      Cc: Liu Bo <bo.li.liu@oracle.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Zach Brown <zab@redhat.com>
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      4c7a6f74
    • W
      Btrfs: fix memory leaks on walking backrefs failure · f05c4746
      Wang Shilong 提交于
      When walking backrefs, we may iterate every inode's extent
      and add/merge them into ulist, and the caller will free memory
      from ulist.
      
      However, if we fail to allocate inode's extents element
      memory or ulist_add() fail to allocate memory, we won't
      add allocated memory into ulist, and the caller won't
      free some allocated memory thus memory leaks happen.
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      f05c4746
    • F
      Btrfs: fix send file hole detection leading to data corruption · bf54f412
      Filipe David Borba Manana 提交于
      There was a case where file hole detection was incorrect and it would
      cause an incremental send to override a section of a file with zeroes.
      
      This happened in the case where between the last leaf we processed which
      contained a file extent item for our current inode and the leaf we're
      currently are at (and has a file extent item for our current inode) there
      are only leafs containing exclusively file extent items for our current
      inode, and none of them was updated since the previous send operation.
      The file hole detection code would incorrectly consider the file range
      covered by these leafs as a hole.
      
      A test case for xfstests follows soon.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      bf54f412
    • W
      Btrfs: add a reschedule point in btrfs_find_all_roots() · bca1a290
      Wang Shilong 提交于
      I can easily trigger the following warnings when enabling quota
      in my virtual machine(running Opensuse), Steps are firstly creating
      a subvolume full of fragment extents, and then create many snapshots
      (500 in my test case).
      
      [ 2362.808459] BUG: soft lockup - CPU#0 stuck for 22s! [btrfs-qgroup-re:1970]
      
      [ 2362.809023] task: e4af8450 ti: e371c000 task.ti: e371c000
      [ 2362.809026] EIP: 0060:[<fa38f4ae>] EFLAGS: 00000246 CPU: 0
      [ 2362.809049] EIP is at __merge_refs+0x5e/0x100 [btrfs]
      [ 2362.809051] EAX: 00000000 EBX: cfadbcf0 ECX: 00000000 EDX: cfadbcb0
      [ 2362.809052] ESI: dd8d3370 EDI: e371dde0 EBP: e371dd6c ESP: e371dd5c
      [ 2362.809054]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      [ 2362.809055] CR0: 80050033 CR2: ac454d50 CR3: 009a9000 CR4: 001407d0
      [ 2362.809099] Stack:
      [ 2362.809100]  00000001 e371dde0 dfcc6890 f29f8000 e371de28 fa39016d 00000011 00000001
      [ 2362.809105]  99bfc000 00000000 93928000 00000000 00000001 00000050 e371dda8 00000001
      [ 2362.809109]  f3a31000 f3413000 00000001 e371ddb8 000040a8 00000202 00000000 00000023
      [ 2362.809113] Call Trace:
      [ 2362.809136]  [<fa39016d>] find_parent_nodes+0x34d/0x1280 [btrfs]
      [ 2362.809156]  [<fa391172>] btrfs_find_all_roots+0xb2/0x110 [btrfs]
      [ 2362.809174]  [<fa3934a8>] btrfs_qgroup_rescan_worker+0x358/0x7a0 [btrfs]
      [ 2362.809180]  [<c024d0ce>] ? lock_timer_base.isra.39+0x1e/0x40
      [ 2362.809199]  [<fa3648df>] worker_loop+0xff/0x470 [btrfs]
      [ 2362.809204]  [<c027a88a>] ? __wake_up_locked+0x1a/0x20
      [ 2362.809221]  [<fa3647e0>] ? btrfs_queue_worker+0x2b0/0x2b0 [btrfs]
      [ 2362.809225]  [<c025ebbc>] kthread+0x9c/0xb0
      [ 2362.809229]  [<c06b487b>] ret_from_kernel_thread+0x1b/0x30
      [ 2362.809233]  [<c025eb20>] ? kthread_create_on_node+0x110/0x110
      
      By adding a reschedule point at the end of btrfs_find_all_roots(), i no longer
      hit these warnings.
      
      Cc: Josef Bacik <jbacik@fb.com>
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      bca1a290
    • F
      Btrfs: make send's file extent item search more efficient · 7fdd29d0
      Filipe David Borba Manana 提交于
      Instead of looking for a file extent item, process it, release the path
      and do a btree search for the next file extent item, just process all
      file extent items in a leaf without intermediate btree searches. This way
      we save cpu and we're not blocking other tasks or affecting concurrency on
      the btree, because send's paths use the commit root and skip btree node/leaf
      locking.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      7fdd29d0
    • W
      Btrfs: fix to catch all errors when resolving indirect ref · 95def2ed
      Wang Shilong 提交于
      We can only tolerate ENOENT here, for other errors, we should
      return directly.
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      95def2ed
    • W
      Btrfs: fix protection between walking backrefs and root deletion · 538f72cd
      Wang Shilong 提交于
      There is a race condition between resolving indirect ref and root deletion,
      and we should gurantee that root can not be destroyed to avoid accessing
      broken tree here.
      
      Here we fix it by holding @subvol_srcu, and we will release it as soon
      as we have held root node lock.
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      538f72cd
    • G
      btrfs: fix warning while merging two adjacent extents · 3c9665df
      Gui Hecheng 提交于
      When we have two adjacent extents in relink_extent_backref,
      we try to merge them. When we use btrfs_search_slot to locate the
      slot for the current extent, we shouldn't set "ins_len = 1",
      because we will merge it into the previous extent rather than
      insert a new item. Otherwise, we may happen to create a new leaf
      in btrfs_search_slot and path->slot[0] will be 0. Then we try to
      fetch the previous item using "path->slots[0]--", and it will cause
      a warning as follows:
      
      	[  145.713385] WARNING: CPU: 3 PID: 1796 at fs/btrfs/extent_io.c:5043 map_private_extent_buffer+0xd4/0xe0
      	[  145.713387] btrfs bad mapping eb start 53370886 len 4096, wanted 167772306 8
      	...
      	[  145.713462]  [<ffffffffa034b1f4>] map_private_extent_buffer+0xd4/0xe0
      	[  145.713476]  [<ffffffffa030097a>] ? btrfs_free_path+0x2a/0x40
      	[  145.713485]  [<ffffffffa0340864>] btrfs_get_token_64+0x64/0xf0
      	[  145.713498]  [<ffffffffa033472c>] relink_extent_backref+0x41c/0x820
      	[  145.713508]  [<ffffffffa0334d69>] btrfs_finish_ordered_io+0x239/0xa80
      
      I encounter this warning when running defrag having mkfs.btrfs
      with option -M. At the same time there are read/writes & snapshots
      running at background.
      Signed-off-by: NGui Hecheng <guihc.fnst@cn.fujitsu.com>
      Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      3c9665df
    • F
      Btrfs: fix infinite path build loops in incremental send · 9f03740a
      Filipe David Borba Manana 提交于
      The send operation processes inodes by their ascending number, and assumes
      that any rename/move operation can be successfully performed (sent to the
      caller) once all previous inodes (those with a smaller inode number than the
      one we're currently processing) were processed.
      
      This is not true when an incremental send had to process an hierarchical change
      between 2 snapshots where the parent-children relationship between directory
      inodes was reversed - that is, parents became children and children became
      parents. This situation made the path building code go into an infinite loop,
      which kept allocating more and more memory that eventually lead to a krealloc
      warning being displayed in dmesg:
      
        WARNING: CPU: 1 PID: 5705 at mm/page_alloc.c:2477 __alloc_pages_nodemask+0x365/0xad0()
        Modules linked in: btrfs raid6_pq xor pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) snd_hda_codec_hdmi snd_hda_codec_realtek joydev radeon snd_hda_intel snd_hda_codec snd_hwdep snd_seq_midi snd_pcm psmouse i915 snd_rawmidi serio_raw snd_seq_midi_event lpc_ich snd_seq snd_timer ttm snd_seq_device rfcomm drm_kms_helper parport_pc bnep bluetooth drm ppdev snd soundcore i2c_algo_bit snd_page_alloc binfmt_misc video lp parport r8169 mii hid_generic usbhid hid
        CPU: 1 PID: 5705 Comm: btrfs Tainted: G           O 3.13.0-rc7-fdm-btrfs-next-18+ #3
        Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Pro4, BIOS P1.50 09/04/2012
        [ 5381.660441]  00000000000009ad ffff8806f6f2f4e8 ffffffff81777434 0000000000000007
        [ 5381.660447]  0000000000000000 ffff8806f6f2f528 ffffffff8104a9ec ffff8807038f36f0
        [ 5381.660452]  0000000000000000 0000000000000206 ffff8807038f2490 ffff8807038f36f0
        [ 5381.660457] Call Trace:
        [ 5381.660464]  [<ffffffff81777434>] dump_stack+0x4e/0x68
        [ 5381.660471]  [<ffffffff8104a9ec>] warn_slowpath_common+0x8c/0xc0
        [ 5381.660476]  [<ffffffff8104aa3a>] warn_slowpath_null+0x1a/0x20
        [ 5381.660480]  [<ffffffff81144995>] __alloc_pages_nodemask+0x365/0xad0
        [ 5381.660487]  [<ffffffff8108313f>] ? local_clock+0x4f/0x60
        [ 5381.660491]  [<ffffffff811430e8>] ? free_one_page+0x98/0x440
        [ 5381.660495]  [<ffffffff8108313f>] ? local_clock+0x4f/0x60
        [ 5381.660502]  [<ffffffff8113fae4>] ? __get_free_pages+0x14/0x50
        [ 5381.660508]  [<ffffffff81095fb8>] ? trace_hardirqs_off_caller+0x28/0xd0
        [ 5381.660515]  [<ffffffff81183caf>] alloc_pages_current+0x10f/0x1f0
        [ 5381.660520]  [<ffffffff8113fae4>] ? __get_free_pages+0x14/0x50
        [ 5381.660524]  [<ffffffff8113fae4>] __get_free_pages+0x14/0x50
        [ 5381.660530]  [<ffffffff8115dace>] kmalloc_order_trace+0x3e/0x100
        [ 5381.660536]  [<ffffffff81191ea0>] __kmalloc_track_caller+0x220/0x230
        [ 5381.660560]  [<ffffffffa0729fdb>] ? fs_path_ensure_buf.part.12+0x6b/0x200 [btrfs]
        [ 5381.660564]  [<ffffffff8178085c>] ? retint_restore_args+0xe/0xe
        [ 5381.660569]  [<ffffffff811580ef>] krealloc+0x6f/0xb0
        [ 5381.660586]  [<ffffffffa0729fdb>] fs_path_ensure_buf.part.12+0x6b/0x200 [btrfs]
        [ 5381.660601]  [<ffffffffa072a208>] fs_path_prepare_for_add+0x98/0xb0 [btrfs]
        [ 5381.660615]  [<ffffffffa072a2bc>] fs_path_add_path+0x2c/0x60 [btrfs]
        [ 5381.660628]  [<ffffffffa072c55c>] get_cur_path+0x7c/0x1c0 [btrfs]
      
      Even without this loop, the incremental send couldn't succeed, because it would attempt
      to send a rename/move operation for the lower inode before the highest inode number was
      renamed/move. This issue is easy to trigger with the following steps:
      
        $ mkfs.btrfs -f /dev/sdb3
        $ mount /dev/sdb3 /mnt/btrfs
        $ mkdir -p /mnt/btrfs/a/b/c/d
        $ mkdir /mnt/btrfs/a/b/c2
        $ btrfs subvol snapshot -r /mnt/btrfs /mnt/btrfs/snap1
        $ mv /mnt/btrfs/a/b/c/d /mnt/btrfs/a/b/c2/d2
        $ mv /mnt/btrfs/a/b/c /mnt/btrfs/a/b/c2/d2/cc
        $ btrfs subvol snapshot -r /mnt/btrfs /mnt/btrfs/snap2
        $ btrfs send -p /mnt/btrfs/snap1 /mnt/btrfs/snap2 > /tmp/incremental.send
      
      The structure of the filesystem when the first snapshot is taken is:
      
      	 .                       (ino 256)
      	 |-- a                   (ino 257)
      	     |-- b               (ino 258)
      	         |-- c           (ino 259)
      	         |   |-- d       (ino 260)
                       |
      	         |-- c2          (ino 261)
      
      And its structure when the second snapshot is taken is:
      
      	 .                       (ino 256)
      	 |-- a                   (ino 257)
      	     |-- b               (ino 258)
      	         |-- c2          (ino 261)
      	             |-- d2      (ino 260)
      	                 |-- cc  (ino 259)
      
      Before the move/rename operation is performed for the inode 259, the
      move/rename for inode 260 must be performed, since 259 is now a child
      of 260.
      
      A test case for xfstests, with a more complex scenario, will follow soon.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      9f03740a
    • J
      fanotify: Fix use after free for permission events · 85816794
      Jan Kara 提交于
      Currently struct fanotify_event_info has been destroyed immediately
      after reporting its contents to userspace. However that is wrong for
      permission events because those need to stay around until userspace
      provides response which is filled back in fanotify_event_info. So change
      to code to free permission events only after we have got the response
      from userspace.
      Reported-and-tested-by: NJiri Kosina <jkosina@suse.cz>
      Reported-and-tested-by: NDave Jones <davej@fedoraproject.org>
      Signed-off-by: NJan Kara <jack@suse.cz>
      85816794
    • J
      fsnotify: Do not return merged event from fsnotify_add_notify_event() · 83c0e1b4
      Jan Kara 提交于
      The event returned from fsnotify_add_notify_event() cannot ever be used
      safely as the event may be freed by the time the function returns (after
      dropping notification_mutex). So change the prototype to just return
      whether the event was added or merged into some existing event.
      Reported-and-tested-by: NJiri Kosina <jkosina@suse.cz>
      Reported-and-tested-by: NDave Jones <davej@fedoraproject.org>
      Signed-off-by: NJan Kara <jack@suse.cz>
      83c0e1b4
    • J
      fanotify: Fix use after free in mask checking · 13116dfd
      Jan Kara 提交于
      We cannot use the event structure returned from
      fsnotify_add_notify_event() because that event can be freed by the time
      that function returns. Use the mask argument passed into the event
      handler directly instead. This also fixes a possible problem when we
      could unnecessarily wait for permission response for a normal fanotify
      event which got merged with a permission event.
      
      We also disallow merging of permission event with any other event so
      that we know the permission event which we just created is the one on
      which we should wait for permission response.
      Reported-and-tested-by: NJiri Kosina <jkosina@suse.cz>
      Reported-and-tested-by: NDave Jones <davej@fedoraproject.org>
      Signed-off-by: NJan Kara <jack@suse.cz>
      13116dfd
    • L
      ceph: Fix up after semantic merge conflict · 4db658ea
      Linus Torvalds 提交于
      The previous ceph-client merge resulted in ceph not even building,
      because there was a merge conflict that wasn't visible as an actual data
      conflict: commit 7221fe4c ("ceph: add acl for cephfs") added support
      for POSIX ACL's into Ceph, but unluckily we also had the VFS tree change
      a lot of the POSIX ACL helper functions to be much more helpful to
      filesystems (see for example commits 2aeccbe9 "fs: add generic
      xattr_acl handlers", 5bf3258f "fs: make posix_acl_chmod more useful"
      and 37bc1539 "fs: make posix_acl_create more useful")
      
      The reason this conflict wasn't obvious was many-fold: because it was a
      semantic conflict rather than a data conflict, it wasn't visible in the
      git merge as a conflict.  And because the VFS tree hadn't been in
      linux-next, people hadn't become aware of it that way.  And because I
      was at jury duty this morning, I was using my laptop and as a result not
      doing constant "allmodconfig" builds.
      
      Anyway, this fixes the build and generally removes a fair chunk of the
      Ceph POSIX ACL support code, since the improved helpers seem to match
      really well for Ceph too.  But I don't actually have any way to *test*
      the end result, and I was really hoping for some ACK's for this.  Oh,
      well.
      
      Not compiling certainly doesn't make things easier to test, so I'm
      committing this without the acks after having waited for four hours...
      Plus it's what I would have done for the merge had I noticed the
      semantic conflict..
      Reported-by: NDave Jones <davej@redhat.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Guangliang Zhao <lucienchao@gmail.com>
      Cc: Li Wang <li.wang@ubuntykylin.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4db658ea