1. 22 10月, 2015 15 次提交
  2. 11 10月, 2015 1 次提交
    • T
      namei: results of d_is_negative() should be checked after dentry revalidation · daf3761c
      Trond Myklebust 提交于
      Leandro Awa writes:
       "After switching to version 4.1.6, our parallelized and distributed
        workflows now fail consistently with errors of the form:
      
        T34: ./regex.c:39:22: error: config.h: No such file or directory
      
        From our 'git bisect' testing, the following commit appears to be the
        possible cause of the behavior we've been seeing: commit 766c4cbf"
      
      Al Viro says:
       "What happens is that 766c4cbf got the things subtly wrong.
      
        We used to treat d_is_negative() after lookup_fast() as "fall with
        ENOENT".  That was wrong - checking ->d_flags outside of ->d_seq
        protection is unreliable and failing with hard error on what should've
        fallen back to non-RCU pathname resolution is a bug.
      
        Unfortunately, we'd pulled the test too far up and ran afoul of
        another kind of staleness.  The dentry might have been absolutely
        stable from the RCU point of view (and we might be on UP, etc), but
        stale from the remote fs point of view.  If ->d_revalidate() returns
        "it's actually stale", dentry gets thrown away and the original code
        wouldn't even have looked at its ->d_flags.
      
        What we need is to check ->d_flags where 766c4cbf does (prior to
        ->d_seq validation) but only use the result in cases where we do not
        discard this dentry outright"
      Reported-by: NLeandro Awa <lawa@nvidia.com>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=104911
      Fixes: 766c4cbf ("namei: d_is_negative() should be checked...")
      Tested-by: NLeandro Awa <lawa@nvidia.com>
      Cc: stable@vger.kernel.org # v4.1+
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      daf3761c
  3. 07 10月, 2015 1 次提交
  4. 06 10月, 2015 5 次提交
    • N
      BTRFS: support NFSv2 export · 7d35199e
      NeilBrown 提交于
      The "fh_len" passed to ->fh_to_* is not guaranteed to be that same as
      that returned by encode_fh - it may be larger.
      
      With NFSv2, the filehandle is fixed length, so it may appear longer
      than expected and be zero-padded.
      
      So we must test that fh_len is at least some value, not exactly equal
      to it.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Acked-by: NDavid Sterba <dsterba@suse.cz>
      7d35199e
    • C
      Btrfs: open_ctree: Fix possible memory leak · e5fffbac
      chandan 提交于
      After reading one of chunk or tree root tree's root node from disk, if the
      root node does not have EXTENT_BUFFER_UPTODATE flag set, we fail to release
      the memory used by the root node. Fix this.
      Signed-off-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
      e5fffbac
    • F
      Btrfs: fix deadlock when finalizing block group creation · d9a0540a
      Filipe Manana 提交于
      Josef ran into a deadlock while a transaction handle was finalizing the
      creation of its block groups, which produced the following trace:
      
        [260445.593112] fio             D ffff88022a9df468     0  8924   4518 0x00000084
        [260445.593119]  ffff88022a9df468 ffffffff81c134c0 ffff880429693c00 ffff88022a9df488
        [260445.593126]  ffff88022a9e0000 ffff8803490d7b00 ffff8803490d7b18 ffff88022a9df4b0
        [260445.593132]  ffff8803490d7af8 ffff88022a9df488 ffffffff8175a437 ffff8803490d7b00
        [260445.593137] Call Trace:
        [260445.593145]  [<ffffffff8175a437>] schedule+0x37/0x80
        [260445.593189]  [<ffffffffa0850f37>] btrfs_tree_lock+0xa7/0x1f0 [btrfs]
        [260445.593197]  [<ffffffff810db7c0>] ? prepare_to_wait_event+0xf0/0xf0
        [260445.593225]  [<ffffffffa07eac44>] btrfs_lock_root_node+0x34/0x50 [btrfs]
        [260445.593253]  [<ffffffffa07eff6b>] btrfs_search_slot+0x88b/0xa00 [btrfs]
        [260445.593295]  [<ffffffffa08389df>] ? free_extent_buffer+0x4f/0x90 [btrfs]
        [260445.593324]  [<ffffffffa07f1a06>] btrfs_insert_empty_items+0x66/0xc0 [btrfs]
        [260445.593351]  [<ffffffffa07ea94a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
        [260445.593394]  [<ffffffffa08403b9>] btrfs_finish_chunk_alloc+0x1c9/0x570 [btrfs]
        [260445.593427]  [<ffffffffa08002ab>] btrfs_create_pending_block_groups+0x11b/0x200 [btrfs]
        [260445.593459]  [<ffffffffa0800964>] do_chunk_alloc+0x2a4/0x2e0 [btrfs]
        [260445.593491]  [<ffffffffa0803815>] find_free_extent+0xa55/0xd90 [btrfs]
        [260445.593524]  [<ffffffffa0803c22>] btrfs_reserve_extent+0xd2/0x220 [btrfs]
        [260445.593532]  [<ffffffff8119fe5d>] ? account_page_dirtied+0xdd/0x170
        [260445.593564]  [<ffffffffa0803e78>] btrfs_alloc_tree_block+0x108/0x4a0 [btrfs]
        [260445.593597]  [<ffffffffa080c9de>] ? btree_set_page_dirty+0xe/0x10 [btrfs]
        [260445.593626]  [<ffffffffa07eb5cd>] __btrfs_cow_block+0x12d/0x5b0 [btrfs]
        [260445.593654]  [<ffffffffa07ebbff>] btrfs_cow_block+0x11f/0x1c0 [btrfs]
        [260445.593682]  [<ffffffffa07ef8c7>] btrfs_search_slot+0x1e7/0xa00 [btrfs]
        [260445.593724]  [<ffffffffa08389df>] ? free_extent_buffer+0x4f/0x90 [btrfs]
        [260445.593752]  [<ffffffffa07f1a06>] btrfs_insert_empty_items+0x66/0xc0 [btrfs]
        [260445.593830]  [<ffffffffa07ea94a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
        [260445.593905]  [<ffffffffa08403b9>] btrfs_finish_chunk_alloc+0x1c9/0x570 [btrfs]
        [260445.593946]  [<ffffffffa08002ab>] btrfs_create_pending_block_groups+0x11b/0x200 [btrfs]
        [260445.593990]  [<ffffffffa0815798>] btrfs_commit_transaction+0xa8/0xb40 [btrfs]
        [260445.594042]  [<ffffffffa085abcd>] ? btrfs_log_dentry_safe+0x6d/0x80 [btrfs]
        [260445.594089]  [<ffffffffa082bc84>] btrfs_sync_file+0x294/0x350 [btrfs]
        [260445.594115]  [<ffffffff8123e29b>] vfs_fsync_range+0x3b/0xa0
        [260445.594133]  [<ffffffff81023891>] ? syscall_trace_enter_phase1+0x131/0x180
        [260445.594149]  [<ffffffff8123e35d>] do_fsync+0x3d/0x70
        [260445.594169]  [<ffffffff81023bb8>] ? syscall_trace_leave+0xb8/0x110
        [260445.594187]  [<ffffffff8123e600>] SyS_fsync+0x10/0x20
        [260445.594204]  [<ffffffff8175de6e>] entry_SYSCALL_64_fastpath+0x12/0x71
      
      This happened because the same transaction handle created a large number
      of block groups and while finalizing their creation (inserting new items
      and updating existing items in the chunk and device trees) a new metadata
      extent had to be allocated and no free space was found in the current
      metadata block groups, which made find_free_extent() attempt to allocate
      a new block group via do_chunk_alloc(). However at do_chunk_alloc() we
      ended up allocating a new system chunk too and exceeded the threshold
      of 2Mb of reserved chunk bytes, which makes do_chunk_alloc() enter the
      final part of block group creation again (at
      btrfs_create_pending_block_groups()) and attempt to lock again the root
      of the chunk tree when it's already write locked by the same task.
      
      Similarly we can deadlock on extent tree nodes/leafs if while we are
      running delayed references we end up creating a new metadata block group
      in order to allocate a new node/leaf for the extent tree (as part of
      a CoW operation or growing the tree), as btrfs_create_pending_block_groups
      inserts items into the extent tree as well. In this case we get the
      following trace:
      
        [14242.773581] fio             D ffff880428ca3418     0  3615   3100 0x00000084
        [14242.773588]  ffff880428ca3418 ffff88042d66b000 ffff88042a03c800 ffff880428ca3438
        [14242.773594]  ffff880428ca4000 ffff8803e4b20190 ffff8803e4b201a8 ffff880428ca3460
        [14242.773600]  ffff8803e4b20188 ffff880428ca3438 ffffffff8175a437 ffff8803e4b20190
        [14242.773606] Call Trace:
        [14242.773613]  [<ffffffff8175a437>] schedule+0x37/0x80
        [14242.773656]  [<ffffffffa057ff07>] btrfs_tree_lock+0xa7/0x1f0 [btrfs]
        [14242.773664]  [<ffffffff810db7c0>] ? prepare_to_wait_event+0xf0/0xf0
        [14242.773692]  [<ffffffffa0519c44>] btrfs_lock_root_node+0x34/0x50 [btrfs]
        [14242.773720]  [<ffffffffa051ef6b>] btrfs_search_slot+0x88b/0xa00 [btrfs]
        [14242.773750]  [<ffffffffa0520a06>] btrfs_insert_empty_items+0x66/0xc0 [btrfs]
        [14242.773758]  [<ffffffff811ef4a2>] ? kmem_cache_alloc+0x1d2/0x200
        [14242.773786]  [<ffffffffa0520ad1>] btrfs_insert_item+0x71/0xf0 [btrfs]
        [14242.773818]  [<ffffffffa052f292>] btrfs_create_pending_block_groups+0x102/0x200 [btrfs]
        [14242.773850]  [<ffffffffa052f96e>] do_chunk_alloc+0x2ae/0x2f0 [btrfs]
        [14242.773934]  [<ffffffffa0532825>] find_free_extent+0xa55/0xd90 [btrfs]
        [14242.773998]  [<ffffffffa0532c22>] btrfs_reserve_extent+0xc2/0x1d0 [btrfs]
        [14242.774041]  [<ffffffffa0532e38>] btrfs_alloc_tree_block+0x108/0x4a0 [btrfs]
        [14242.774078]  [<ffffffffa051a5cd>] __btrfs_cow_block+0x12d/0x5b0 [btrfs]
        [14242.774118]  [<ffffffffa051abff>] btrfs_cow_block+0x11f/0x1c0 [btrfs]
        [14242.774155]  [<ffffffffa051e8c7>] btrfs_search_slot+0x1e7/0xa00 [btrfs]
        [14242.774194]  [<ffffffffa0528021>] ? __btrfs_free_extent.isra.70+0x2e1/0xcb0 [btrfs]
        [14242.774235]  [<ffffffffa0520a06>] btrfs_insert_empty_items+0x66/0xc0 [btrfs]
        [14242.774274]  [<ffffffffa051994a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
        [14242.774318]  [<ffffffffa052c433>] __btrfs_run_delayed_refs+0xbb3/0x1020 [btrfs]
        [14242.774358]  [<ffffffffa052f404>] btrfs_run_delayed_refs.part.78+0x74/0x280 [btrfs]
        [14242.774391]  [<ffffffffa052f627>] btrfs_run_delayed_refs+0x17/0x20 [btrfs]
        [14242.774432]  [<ffffffffa05be236>] commit_cowonly_roots+0x8d/0x2bd [btrfs]
        [14242.774474]  [<ffffffffa059d07f>] ? __btrfs_run_delayed_items+0x1cf/0x210 [btrfs]
        [14242.774516]  [<ffffffffa05adac3>] ? btrfs_qgroup_account_extents+0x83/0x130 [btrfs]
        [14242.774558]  [<ffffffffa0544c40>] btrfs_commit_transaction+0x590/0xb40 [btrfs]
        [14242.774599]  [<ffffffffa0589b9d>] ? btrfs_log_dentry_safe+0x6d/0x80 [btrfs]
        [14242.774642]  [<ffffffffa055ac54>] btrfs_sync_file+0x294/0x350 [btrfs]
        [14242.774650]  [<ffffffff8123e29b>] vfs_fsync_range+0x3b/0xa0
        [14242.774657]  [<ffffffff81023891>] ? syscall_trace_enter_phase1+0x131/0x180
        [14242.774663]  [<ffffffff8123e35d>] do_fsync+0x3d/0x70
        [14242.774669]  [<ffffffff81023bb8>] ? syscall_trace_leave+0xb8/0x110
        [14242.774675]  [<ffffffff8123e600>] SyS_fsync+0x10/0x20
        [14242.774681]  [<ffffffff8175de6e>] entry_SYSCALL_64_fastpath+0x12/0x71
      
      Fix this by never recursing into the finalization phase of block group
      creation and making sure we never trigger the finalization of block group
      creation while running delayed references.
      Reported-by: NJosef Bacik <jbacik@fb.com>
      Fixes: 00d80e34 ("Btrfs: fix quick exhaustion of the system array in the superblock")
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      d9a0540a
    • F
      Btrfs: update fix for read corruption of compressed and shared extents · 808f80b4
      Filipe Manana 提交于
      My previous fix in commit 005efedf ("Btrfs: fix read corruption of
      compressed and shared extents") was effective only if the compressed
      extents cover a file range with a length that is not a multiple of 16
      pages. That's because the detection of when we reached a different range
      of the file that shares the same compressed extent as the previously
      processed range was done at extent_io.c:__do_contiguous_readpages(),
      which covers subranges with a length up to 16 pages, because
      extent_readpages() groups the pages in clusters no larger than 16 pages.
      So fix this by tracking the start of the previously processed file
      range's extent map at extent_readpages().
      
      The following test case for fstests reproduces the issue:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
        tmp=/tmp/$$
        status=1	# failure is the default!
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        _cleanup()
        {
            rm -f $tmp.*
        }
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
      
        # real QA test starts here
        _need_to_be_root
        _supported_fs btrfs
        _supported_os Linux
        _require_scratch
        _require_cloner
      
        rm -f $seqres.full
      
        test_clone_and_read_compressed_extent()
        {
            local mount_opts=$1
      
            _scratch_mkfs >>$seqres.full 2>&1
            _scratch_mount $mount_opts
      
            # Create our test file with a single extent of 64Kb that is going to
            # be compressed no matter which compression algo is used (zlib/lzo).
            $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 64K" \
                $SCRATCH_MNT/foo | _filter_xfs_io
      
            # Now clone the compressed extent into an adjacent file offset.
            $CLONER_PROG -s 0 -d $((64 * 1024)) -l $((64 * 1024)) \
                $SCRATCH_MNT/foo $SCRATCH_MNT/foo
      
            echo "File digest before unmount:"
            md5sum $SCRATCH_MNT/foo | _filter_scratch
      
            # Remount the fs or clear the page cache to trigger the bug in
            # btrfs. Because the extent has an uncompressed length that is a
            # multiple of 16 pages, all the pages belonging to the second range
            # of the file (64K to 128K), which points to the same extent as the
            # first range (0K to 64K), had their contents full of zeroes instead
            # of the byte 0xaa. This was a bug exclusively in the read path of
            # compressed extents, the correct data was stored on disk, btrfs
            # just failed to fill in the pages correctly.
            _scratch_remount
      
            echo "File digest after remount:"
            # Must match the digest we got before.
            md5sum $SCRATCH_MNT/foo | _filter_scratch
        }
      
        echo -e "\nTesting with zlib compression..."
        test_clone_and_read_compressed_extent "-o compress=zlib"
      
        _scratch_unmount
      
        echo -e "\nTesting with lzo compression..."
        test_clone_and_read_compressed_extent "-o compress=lzo"
      
        status=0
        exit
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Tested-by: NTimofey Titovets <nefelim4ag@gmail.com>
      808f80b4
    • F
      Btrfs: send, fix corner case for reference overwrite detection · b786f16a
      Filipe Manana 提交于
      When the inode given to did_overwrite_ref() matches the current progress
      and has a reference that collides with the reference of other inode that
      has the same number as the current progress, we were always telling our
      caller that the inode's reference was overwritten, which is incorrect
      because the other inode might be a new inode (different generation number)
      in which case we must return false from did_overwrite_ref() so that its
      callers don't use an orphanized path for the inode (as it will never be
      orphanized, instead it will be unlinked and the new inode created later).
      
      The following test case for fstests reproduces the issue:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
      
        tmp=/tmp/$$
        status=1	# failure is the default!
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        _cleanup()
        {
            rm -fr $send_files_dir
            rm -f $tmp.*
        }
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
      
        # real QA test starts here
        _supported_fs btrfs
        _supported_os Linux
        _require_scratch
        _need_to_be_root
      
        send_files_dir=$TEST_DIR/btrfs-test-$seq
      
        rm -f $seqres.full
        rm -fr $send_files_dir
        mkdir $send_files_dir
      
        _scratch_mkfs >>$seqres.full 2>&1
        _scratch_mount
      
        # Create our test file with a single extent of 64K.
        mkdir -p $SCRATCH_MNT/foo
        $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 64K" $SCRATCH_MNT/foo/bar \
            | _filter_xfs_io
      
        _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT \
            $SCRATCH_MNT/mysnap1
        _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT \
            $SCRATCH_MNT/mysnap2
      
        echo "File digest before being replaced:"
        md5sum $SCRATCH_MNT/mysnap1/foo/bar | _filter_scratch
      
        # Remove the file and then create a new one in the same location with
        # the same name but with different content. This new file ends up
        # getting the same inode number as the previous one, because that inode
        # number was the highest inode number used by the snapshot's root and
        # therefore when attempting to find the a new inode number for the new
        # file, we end up reusing the same inode number. This happens because
        # currently btrfs uses the highest inode number summed by 1 for the
        # first inode created once a snapshot's root is loaded (done at
        # fs/btrfs/inode-map.c:btrfs_find_free_objectid in the linux kernel
        # tree).
        # Having these two different files in the snapshots with the same inode
        # number (but different generation numbers) caused the btrfs send code
        # to emit an incorrect path for the file when issuing an unlink
        # operation because it failed to realize they were different files.
        rm -f $SCRATCH_MNT/mysnap2/foo/bar
        $XFS_IO_PROG -f -c "pwrite -S 0xbb 0 96K" \
            $SCRATCH_MNT/mysnap2/foo/bar | _filter_xfs_io
      
        _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT/mysnap2 \
            $SCRATCH_MNT/mysnap2_ro
      
        _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $send_files_dir/1.snap
        _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 \
            $SCRATCH_MNT/mysnap2_ro -f $send_files_dir/2.snap
      
        echo "File digest in the original filesystem after being replaced:"
        md5sum $SCRATCH_MNT/mysnap2_ro/foo/bar | _filter_scratch
      
        # Now recreate the filesystem by receiving both send streams and verify
        # we get the same file contents that the original filesystem had.
        _scratch_unmount
        _scratch_mkfs >>$seqres.full 2>&1
        _scratch_mount
      
        _run_btrfs_util_prog receive -vv $SCRATCH_MNT -f $send_files_dir/1.snap
        _run_btrfs_util_prog receive -vv $SCRATCH_MNT -f $send_files_dir/2.snap
      
        echo "File digest in the new filesystem:"
        # Must match the digest from the new file.
        md5sum $SCRATCH_MNT/mysnap2_ro/foo/bar | _filter_scratch
      
        status=0
        exit
      Reported-by: NMartin Raiber <martin@urbackup.org>
      Fixes: 8b191a68 ("Btrfs: incremental send, check if orphanized dir inode needs delayed rename")
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      b786f16a
  5. 04 10月, 2015 1 次提交
  6. 03 10月, 2015 6 次提交
  7. 02 10月, 2015 2 次提交
    • S
      [SMB3] Do not fall back to SMBWriteX in set_file_size error cases · 646200a0
      Steve French 提交于
      The error paths in set_file_size for cifs and smb3 are incorrect.
      
      In the unlikely event that a server did not support set file info
      of the file size, the code incorrectly falls back to trying SMBWriteX
      (note that only the original core SMB Write, used for example by DOS,
      can set the file size this way - this actually  does not work for the more
      recent SMBWriteX).  The idea was since the old DOS SMB Write could set
      the file size if you write zero bytes at that offset then use that if
      server rejects the normal set file info call.
      
      Fortunately the SMBWriteX will never be sent on the wire (except when
      file size is zero) since the length and offset fields were reversed
      in the two places in this function that call SMBWriteX causing
      the fall back path to return an error. It is also important to never call
      an SMB request from an SMB2/sMB3 session (which theoretically would
      be possible, and can cause a brief session drop, although the client
      recovers) so this should be fixed.  In practice this path does not happen
      with modern servers but the error fall back to SMBWriteX is clearly wrong.
      
      Removing the calls to SMBWriteX in the error paths in cifs_set_file_size
      
      Pointed out by PaX/grsecurity team
      Signed-off-by: NSteve French <steve.french@primarydata.com>
      Reported-by: NPaX Team <pageexec@freemail.hu>
      CC: Emese Revfy <re.emese@gmail.com>
      CC: Brad Spengler <spender@grsecurity.net>
      CC: Stable <stable@vger.kernel.org>
      646200a0
    • R
      dax: fix NULL pointer in __dax_pmd_fault() · 8346c416
      Ross Zwisler 提交于
      Commit 46c043ed ("mm: take i_mmap_lock in unmap_mapping_range() for
      DAX") moved some code in __dax_pmd_fault() that was responsible for
      zeroing newly allocated PMD pages.  The new location didn't properly set
      up 'kaddr', so when run this code resulted in a NULL pointer BUG.
      
      Fix this by getting the correct 'kaddr' via bdev_direct_access().
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Reported-by: NDan Williams <dan.j.williams@intel.com>
      Reviewed-by: NDan Williams <dan.j.williams@intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8346c416
  8. 29 9月, 2015 1 次提交
    • R
      UBIFS: Kill unneeded locking in ubifs_init_security · cf6f54e3
      Richard Weinberger 提交于
      Fixes the following lockdep splat:
      [    1.244527] =============================================
      [    1.245193] [ INFO: possible recursive locking detected ]
      [    1.245193] 4.2.0-rc1+ #37 Not tainted
      [    1.245193] ---------------------------------------------
      [    1.245193] cp/742 is trying to acquire lock:
      [    1.245193]  (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff812b3f69>] ubifs_init_security+0x29/0xb0
      [    1.245193]
      [    1.245193] but task is already holding lock:
      [    1.245193]  (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff81198e7f>] path_openat+0x3af/0x1280
      [    1.245193]
      [    1.245193] other info that might help us debug this:
      [    1.245193]  Possible unsafe locking scenario:
      [    1.245193]
      [    1.245193]        CPU0
      [    1.245193]        ----
      [    1.245193]   lock(&sb->s_type->i_mutex_key#9);
      [    1.245193]   lock(&sb->s_type->i_mutex_key#9);
      [    1.245193]
      [    1.245193]  *** DEADLOCK ***
      [    1.245193]
      [    1.245193]  May be due to missing lock nesting notation
      [    1.245193]
      [    1.245193] 2 locks held by cp/742:
      [    1.245193]  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff811ad37f>] mnt_want_write+0x1f/0x50
      [    1.245193]  #1:  (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff81198e7f>] path_openat+0x3af/0x1280
      [    1.245193]
      [    1.245193] stack backtrace:
      [    1.245193] CPU: 2 PID: 742 Comm: cp Not tainted 4.2.0-rc1+ #37
      [    1.245193] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140816_022509-build35 04/01/2014
      [    1.245193]  ffffffff8252d530 ffff88007b023a38 ffffffff814f6f49 ffffffff810b56c5
      [    1.245193]  ffff88007c30cc80 ffff88007b023af8 ffffffff810a150d ffff88007b023a68
      [    1.245193]  000000008101302a ffff880000000000 00000008f447e23f ffffffff8252d500
      [    1.245193] Call Trace:
      [    1.245193]  [<ffffffff814f6f49>] dump_stack+0x4c/0x65
      [    1.245193]  [<ffffffff810b56c5>] ? console_unlock+0x1c5/0x510
      [    1.245193]  [<ffffffff810a150d>] __lock_acquire+0x1a6d/0x1ea0
      [    1.245193]  [<ffffffff8109fa78>] ? __lock_is_held+0x58/0x80
      [    1.245193]  [<ffffffff810a1a93>] lock_acquire+0xd3/0x270
      [    1.245193]  [<ffffffff812b3f69>] ? ubifs_init_security+0x29/0xb0
      [    1.245193]  [<ffffffff814fc83b>] mutex_lock_nested+0x6b/0x3a0
      [    1.245193]  [<ffffffff812b3f69>] ? ubifs_init_security+0x29/0xb0
      [    1.245193]  [<ffffffff812b3f69>] ? ubifs_init_security+0x29/0xb0
      [    1.245193]  [<ffffffff812b3f69>] ubifs_init_security+0x29/0xb0
      [    1.245193]  [<ffffffff8128e286>] ubifs_create+0xa6/0x1f0
      [    1.245193]  [<ffffffff81198e7f>] ? path_openat+0x3af/0x1280
      [    1.245193]  [<ffffffff81195d15>] vfs_create+0x95/0xc0
      [    1.245193]  [<ffffffff8119929c>] path_openat+0x7cc/0x1280
      [    1.245193]  [<ffffffff8109ffe3>] ? __lock_acquire+0x543/0x1ea0
      [    1.245193]  [<ffffffff81088f20>] ? sched_clock_cpu+0x90/0xc0
      [    1.245193]  [<ffffffff81088c00>] ? calc_global_load_tick+0x60/0x90
      [    1.245193]  [<ffffffff81088f20>] ? sched_clock_cpu+0x90/0xc0
      [    1.245193]  [<ffffffff811a9cef>] ? __alloc_fd+0xaf/0x180
      [    1.245193]  [<ffffffff8119ac55>] do_filp_open+0x75/0xd0
      [    1.245193]  [<ffffffff814ffd86>] ? _raw_spin_unlock+0x26/0x40
      [    1.245193]  [<ffffffff811a9cef>] ? __alloc_fd+0xaf/0x180
      [    1.245193]  [<ffffffff81189bd9>] do_sys_open+0x129/0x200
      [    1.245193]  [<ffffffff81189cc9>] SyS_open+0x19/0x20
      [    1.245193]  [<ffffffff81500717>] entry_SYSCALL_64_fastpath+0x12/0x6f
      
      While the lockdep splat is a false positive, becuase path_openat holds i_mutex
      of the parent directory and ubifs_init_security() tries to acquire i_mutex
      of a new inode, it reveals that taking i_mutex in ubifs_init_security() is
      in vain because it is only being called in the inode allocation path
      and therefore nobody else can see the inode yet.
      
      Cc: stable@vger.kernel.org # 3.20-
      Reported-and-tested-by: NBoris Brezillon <boris.brezillon@free-electrons.com>
      Reviewed-and-tested-by: NDongsheng Yang <yangds.fnst@cn.fujitsu.com>
      Signed-off-by: NRichard Weinberger <richard@nod.at>
      Signed-off-by: dedekind1@gmail.com
      cf6f54e3
  9. 26 9月, 2015 1 次提交
  10. 24 9月, 2015 2 次提交
  11. 23 9月, 2015 5 次提交
    • P
      NFS41: make close wait for layoutreturn · 500d701f
      Peng Tao 提交于
      If we send a layoutreturn asynchronously before close, the close
      might reach server first and layoutreturn would fail with BADSTATEID
      because there is nothing keeping the layout stateid alive.
      
      Also do not pretend sending layoutreturn if we are not.
      Signed-off-by: NPeng Tao <tao.peng@primarydata.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      500d701f
    • J
      ocfs2/dlm: fix deadlock when dispatch assert master · 012572d4
      Joseph Qi 提交于
      The order of the following three spinlocks should be:
      dlm_domain_lock < dlm_ctxt->spinlock < dlm_lock_resource->spinlock
      
      But dlm_dispatch_assert_master() is called while holding
      dlm_ctxt->spinlock and dlm_lock_resource->spinlock, and then it calls
      dlm_grab() which will take dlm_domain_lock.
      
      Once another thread (for example, dlm_query_join_handler) has already
      taken dlm_domain_lock, and tries to take dlm_ctxt->spinlock deadlock
      happens.
      Signed-off-by: NJoseph Qi <joseph.qi@huawei.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: "Junxiao Bi" <junxiao.bi@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      012572d4
    • A
      userfaultfd: revert "userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key" · ac5be6b4
      Andrea Arcangeli 提交于
      This reverts commit 51360155 and adapts
      fs/userfaultfd.c to use the old version of that function.
      
      It didn't look robust to call __wake_up_common with "nr == 1" when we
      absolutely require wakeall semantics, but we've full control of what we
      insert in the two waitqueue heads of the blocked userfaults.  No
      exclusive waitqueue risks to be inserted into those two waitqueue heads
      so we can as well stick to "nr == 1" of the old code and we can rely
      purely on the fact no waitqueue inserted in one of the two waitqueue
      heads we must enforce as wakeall, has wait->flags WQ_FLAG_EXCLUSIVE set.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Shuah Khan <shuahkh@osg.samsung.com>
      Cc: Thierry Reding <treding@nvidia.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ac5be6b4
    • K
      NFS: Skip checking ds_cinfo.buckets when lseg's commit_through_mds is set · 834e465b
      Kinglong Mee 提交于
      When lseg's commit_through_mds is set, pnfs client always WARN once
      in nfs_direct_select_verf after checking ds_cinfo.nbuckets.
      
      nfs should use the DS verf except commit_through_mds is set for
      layout segment where nbuckets is zero.
      
      [17844.666094] ------------[ cut here ]------------
      [17844.667071] WARNING: CPU: 0 PID: 21758 at /root/source/linux-pnfs/fs/nfs/direct.c:174 nfs_direct_select_verf+0x5a/0x70 [nfs]()
      [17844.668650] Modules linked in: nfs_layout_nfsv41_files(OE) nfsv4(OE) nfs(OE) fscache(E) nfsd(OE) xfs libcrc32c btrfs ppdev coretemp crct10dif_pclmul auth_rpcgss crc32_pclmul crc32c_intel nfs_acl ghash_clmulni_intel lockd vmw_balloon xor vmw_vmci grace raid6_pq shpchp sunrpc parport_pc i2c_piix4 parport vmwgfx drm_kms_helper ttm drm serio_raw mptspi e1000 scsi_transport_spi mptscsih mptbase ata_generic pata_acpi [last unloaded: fscache]
      [17844.686676] CPU: 0 PID: 21758 Comm: kworker/0:1 Tainted: G        W  OE   4.3.0-rc1-pnfs+ #245
      [17844.687352] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
      [17844.698502] Workqueue: nfsiod rpc_async_release [sunrpc]
      [17844.699212]  0000000000000009 0000000043e58010 ffff8800454fbc10 ffffffff813680c4
      [17844.699990]  ffff8800454fbc48 ffffffff8108b49d ffff88004eb20000 ffff88004eb20000
      [17844.700844]  ffff880062e26000 0000000000000000 0000000000000001 ffff8800454fbc58
      [17844.701637] Call Trace:
      [17844.725252]  [<ffffffff813680c4>] dump_stack+0x19/0x25
      [17844.732693]  [<ffffffff8108b49d>] warn_slowpath_common+0x7d/0xb0
      [17844.733855]  [<ffffffff8108b5da>] warn_slowpath_null+0x1a/0x20
      [17844.735015]  [<ffffffffa04a27ca>] nfs_direct_select_verf+0x5a/0x70 [nfs]
      [17844.735999]  [<ffffffffa04a2b83>] nfs_direct_set_hdr_verf+0x23/0x90 [nfs]
      [17844.736846]  [<ffffffffa04a2e17>] nfs_direct_write_completion+0x227/0x260 [nfs]
      [17844.737782]  [<ffffffffa04a433c>] nfs_pgio_release+0x1c/0x20 [nfs]
      [17844.738597]  [<ffffffffa0502df3>] pnfs_generic_rw_release+0x23/0x30 [nfsv4]
      [17844.739486]  [<ffffffffa01cbbea>] rpc_free_task+0x2a/0x70 [sunrpc]
      [17844.740326]  [<ffffffffa01cbcd5>] rpc_async_release+0x15/0x20 [sunrpc]
      [17844.741173]  [<ffffffff810a387c>] process_one_work+0x21c/0x4c0
      [17844.741984]  [<ffffffff810a37cd>] ? process_one_work+0x16d/0x4c0
      [17844.742837]  [<ffffffff810a3b6a>] worker_thread+0x4a/0x440
      [17844.743639]  [<ffffffff810a3b20>] ? process_one_work+0x4c0/0x4c0
      [17844.744399]  [<ffffffff810a3b20>] ? process_one_work+0x4c0/0x4c0
      [17844.745176]  [<ffffffff810a8d75>] kthread+0xf5/0x110
      [17844.745927]  [<ffffffff810a8c80>] ? kthread_create_on_node+0x240/0x240
      [17844.747105]  [<ffffffff8172ce1f>] ret_from_fork+0x3f/0x70
      [17844.747856]  [<ffffffff810a8c80>] ? kthread_create_on_node+0x240/0x240
      [17844.748642] ---[ end trace 336a2845d42b83f0 ]---
      Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      834e465b
    • P
      cifs: use server timestamp for ntlmv2 authentication · 98ce94c8
      Peter Seiderer 提交于
      Linux cifs mount with ntlmssp against an Mac OS X (Yosemite
      10.10.5) share fails in case the clocks differ more than +/-2h:
      
      digest-service: digest-request: od failed with 2 proto=ntlmv2
      digest-service: digest-request: kdc failed with -1561745592 proto=ntlmv2
      
      Fix this by (re-)using the given server timestamp for the
      ntlmv2 authentication (as Windows 7 does).
      
      A related problem was also reported earlier by Namjae Jaen (see below):
      
      Windows machine has extended security feature which refuse to allow
      authentication when there is time difference between server time and
      client time when ntlmv2 negotiation is used. This problem is prevalent
      in embedded enviornment where system time is set to default 1970.
      
      Modern servers send the server timestamp in the TargetInfo Av_Pair
      structure in the challenge message [see MS-NLMP 2.2.2.1]
      In [MS-NLMP 3.1.5.1.2] it is explicitly mentioned that the client must
      use the server provided timestamp if present OR current time if it is
      not
      Reported-by: NNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: NPeter Seiderer <ps.report@gmx.net>
      Signed-off-by: NSteve French <smfrench@gmail.com>
      CC: Stable <stable@vger.kernel.org>
      98ce94c8