1. 09 8月, 2015 5 次提交
    • F
      Btrfs: fix stale dir entries after removing a link and fsync · 18aa0922
      Filipe Manana 提交于
      We have one more case where after a log tree is replayed we get
      inconsistent metadata leading to stale directory entries, due to
      some directories having entries pointing to some inode while the
      inode does not have a matching BTRFS_INODE_[REF|EXTREF]_KEY item.
      
      To trigger the problem we need to have a file with multiple hard links
      belonging to different parent directories. Then if one of those hard
      links is removed and we fsync the file using one of its other links
      that belongs to a different parent directory, we end up not logging
      the fact that the removed hard link doesn't exists anymore in the
      parent directory.
      
      Simple reproducer:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
        tmp=/tmp/$$
        status=1	# failure is the default!
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        _cleanup()
        {
            _cleanup_flakey
            rm -f $tmp.*
        }
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
        . ./common/dmflakey
      
        # real QA test starts here
        _need_to_be_root
        _supported_fs generic
        _supported_os Linux
        _require_scratch
        _require_dm_flakey
        _require_metadata_journaling $SCRATCH_DEV
      
        rm -f $seqres.full
      
        _scratch_mkfs >>$seqres.full 2>&1
        _init_flakey
        _mount_flakey
      
        # Create our test directory and file.
        mkdir $SCRATCH_MNT/testdir
        touch $SCRATCH_MNT/foo
        ln $SCRATCH_MNT/foo $SCRATCH_MNT/testdir/foo2
        ln $SCRATCH_MNT/foo $SCRATCH_MNT/testdir/foo3
      
        # Make sure everything done so far is durably persisted.
        sync
      
        # Now we remove one of our file's hardlinks in the directory testdir.
        unlink $SCRATCH_MNT/testdir/foo3
      
        # We now fsync our file using the "foo" link, which has a parent that
        # is not the directory "testdir".
        $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo
      
        # Silently drop all writes and unmount to simulate a crash/power
        # failure.
        _load_flakey_table $FLAKEY_DROP_WRITES
        _unmount_flakey
      
        # Allow writes again, mount to trigger journal/log replay.
        _load_flakey_table $FLAKEY_ALLOW_WRITES
        _mount_flakey
      
        # After the journal/log is replayed we expect to not see the "foo3"
        # link anymore and we should be able to remove all names in the
        # directory "testdir" and then remove it (no stale directory entries
        # left after the journal/log replay).
        echo "Entries in testdir:"
        ls -1 $SCRATCH_MNT/testdir
      
        rm -f $SCRATCH_MNT/testdir/*
        rmdir $SCRATCH_MNT/testdir
      
        _unmount_flakey
      
        status=0
        exit
      
      The test fails with:
      
        $ ./check generic/107
        FSTYP         -- btrfs
        PLATFORM      -- Linux/x86_64 debian3 4.1.0-rc6-btrfs-next-11+
        MKFS_OPTIONS  -- /dev/sdc
        MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
      
        generic/107 3s ... - output mismatch (see .../results/generic/107.out.bad)
          --- tests/generic/107.out	2015-08-01 01:39:45.807462161 +0100
          +++ /home/fdmanana/git/hub/xfstests/results//generic/107.out.bad
          @@ -1,3 +1,5 @@
           QA output created by 107
           Entries in testdir:
           foo2
          +foo3
          +rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/testdir': Directory not empty
          ...
          _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent \
            (see /home/fdmanana/git/hub/xfstests/results//generic/107.full)
          _check_dmesg: something found in dmesg (see .../results/generic/107.dmesg)
        Ran: generic/107
        Failures: generic/107
        Failed 1 of 1 tests
      
        $ cat /home/fdmanana/git/hub/xfstests/results//generic/107.full
        (...)
        checking fs roots
        root 5 inode 257 errors 200, dir isize wrong
      	unresolved ref dir 257 index 3 namelen 4 name foo3 filetype 1 errors 5, no dir item, no inode ref
        (...)
      
      And produces the following warning in dmesg:
      
        [127298.759064] BTRFS info (device dm-0): failed to delete reference to foo3, inode 258 parent 257
        [127298.762081] ------------[ cut here ]------------
        [127298.763311] WARNING: CPU: 10 PID: 7891 at fs/btrfs/inode.c:3956 __btrfs_unlink_inode+0x182/0x35a [btrfs]()
        [127298.767327] BTRFS: Transaction aborted (error -2)
        (...)
        [127298.788611] Call Trace:
        [127298.789137]  [<ffffffff8145f077>] dump_stack+0x4f/0x7b
        [127298.790090]  [<ffffffff81095de5>] ? console_unlock+0x356/0x3a2
        [127298.791157]  [<ffffffff8104b3b0>] warn_slowpath_common+0xa1/0xbb
        [127298.792323]  [<ffffffffa065ad09>] ? __btrfs_unlink_inode+0x182/0x35a [btrfs]
        [127298.793633]  [<ffffffff8104b410>] warn_slowpath_fmt+0x46/0x48
        [127298.794699]  [<ffffffffa065ad09>] __btrfs_unlink_inode+0x182/0x35a [btrfs]
        [127298.797640]  [<ffffffffa065be8f>] btrfs_unlink_inode+0x1e/0x40 [btrfs]
        [127298.798876]  [<ffffffffa065bf11>] btrfs_unlink+0x60/0x9b [btrfs]
        [127298.800154]  [<ffffffff8116fb48>] vfs_unlink+0x9c/0xed
        [127298.801303]  [<ffffffff81173481>] do_unlinkat+0x12b/0x1fb
        [127298.802450]  [<ffffffff81253855>] ? lockdep_sys_exit_thunk+0x12/0x14
        [127298.803797]  [<ffffffff81174056>] SyS_unlinkat+0x29/0x2b
        [127298.805017]  [<ffffffff81465197>] system_call_fastpath+0x12/0x6f
        [127298.806310] ---[ end trace bbfddacb7aaada7b ]---
        [127298.807325] BTRFS warning (device dm-0): __btrfs_unlink_inode:3956: Aborting unused transaction(No such entry).
      
      So fix this by logging all parent inodes, current and old ones, to make
      sure we do not get stale entries after log replay. This is not a simple
      solution such as triggering a full transaction commit because it would
      imply full transaction commit when an inode is fsynced in the same
      transaction that modified it and reloaded it after eviction (because its
      last_unlink_trans is set to the same value as its last_trans as of the
      commit with the title "Btrfs: fix stale dir entries after unlink, inode
      eviction and fsync"), and it would also make fstest generic/066 fail
      since one of the fsyncs triggers a full commit and the next fsync will
      not find the inode in the log anymore (therefore not removing the xattr).
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      18aa0922
    • N
      btrfs: fix search key advancing condition · dd81d459
      Naohiro Aota 提交于
      The search key advancing condition used in copy_to_sk() is loose. It can
      advance the key even if it reaches sk->max_*: e.g. when the max key = (512,
      1024, -1) and the current key = (512, 1025, 10), it increments the
      offset by 1, continues hopeless search from (512, 1025, 11). This issue
      make ioctl() to take unexpectedly long time scanning all the leaf a blocks
      one by one.
      
      This commit fix the problem using standard way of key comparison:
      btrfs_comp_cpu_keys()
      Signed-off-by: NNaohiro Aota <naota@elisp.net>
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      dd81d459
    • F
      Btrfs: teach backref walking about backrefs with underflowed offset values · d6589101
      Filipe Manana 提交于
      When cloning/deduplicating file extents (through the clone and extent_same
      ioctls) we can get data back references with offset values that are a
      result of an unsigned integer arithmetic underflow, that is, values that
      are much larger then they could be otherwise.
      
      This is not a problem when decrementing or dropping the back references
      (happens when we overwrite the extents or punch a hole for example, through
      __btrfs_drop_extents()), since we compute the same too large offset value,
      but it is a problem for the backref walking code, used by an incremental
      send and the ioctls that are used by the btrfs tool "inspect-internal"
      commands, as it makes it miss the corresponding file extent items because
      the search key is set for an extent item that starts at an offset matching
      the exceptionally large offset value of the data back reference. For an
      incremental send this causes the send ioctl to fail with -EIO.
      
      So teach the backref walking code to deal with these cases by setting the
      search key's offset to 0 if the backref's offset value is larger than
      LLONG_MAX (the largest possible file offset). This makes sure the backref
      walking code finds the corresponding file extent items at the expense of
      scanning more items and leafs in the btree.
      
      Fixing the clone/dedup ioctls to not produce such underflowed results would
      require major changes breaking backward compatibility, updating user space
      tools, etc.
      
      Simple reproducer case for fstests:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
      
        tmp=/tmp/$$
        status=1	# failure is the default!
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        _cleanup()
        {
            rm -fr $send_files_dir
            rm -f $tmp.*
        }
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
      
        # real QA test starts here
        _supported_fs btrfs
        _supported_os Linux
        _require_scratch
        _require_cloner
        _need_to_be_root
      
        send_files_dir=$TEST_DIR/btrfs-test-$seq
      
        rm -f $seqres.full
        rm -fr $send_files_dir
        mkdir $send_files_dir
      
        _scratch_mkfs >>$seqres.full 2>&1
        _scratch_mount
      
        # Create our test file with a single extent of 64K starting at file
        # offset 128K.
        $XFS_IO_PROG -f -c "pwrite -S 0xaa 128K 64K" $SCRATCH_MNT/foo \
            | _filter_xfs_io
      
        _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT \
            $SCRATCH_MNT/mysnap1
      
        # Now clone parts of the original extent into lower offsets of the file.
        #
        # The first clone operation adds a file extent item to file offset 0
        # that points to our initial extent with a data offset of 16K. The
        # corresponding data back reference in the extent tree has an offset of
        # 18446744073709535232, which is the result of file_offset - data_offset
        # = 0 - 16K.
        #
        # The second clone operation adds a file extent item to file offset 16K
        # that points to our initial extent with a data offset of 48K. The
        # corresponding data back reference in the extent tree has an offset of
        # 18446744073709518848, which is the result of file_offset - data_offset
        # = 16K - 48K.
        #
        # Those large back reference offsets (result of unsigned arithmetic
        # underflow) confused the back reference walking code (used by an
        # incremental send and the multiple inspect-internal ioctls) and made it
        # miss the back references, which for the case of an incremental send it
        # made it fail with -EIO and print a message like the following to
        # dmesg:
        #
        # "BTRFS error (device sdc): did not find backref in send_root. \
        #  inode=257, offset=0, disk_byte=12845056 found extent=12845056"
        #
        $CLONER_PROG -s $(((128 + 16) * 1024)) -d 0 -l $((16 * 1024)) \
            $SCRATCH_MNT/foo $SCRATCH_MNT/foo
        $CLONER_PROG -s $(((128 + 48) * 1024)) -d $((16 * 1024)) \
            -l $((16 * 1024)) $SCRATCH_MNT/foo $SCRATCH_MNT/foo
      
        _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT \
            $SCRATCH_MNT/mysnap2
      
        _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $send_files_dir/1.snap
        _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \
            -f $send_files_dir/2.snap
      
        echo "File digest in the original filesystem:"
        md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch
      
        # Now recreate the filesystem by receiving both send streams and verify
        # we get the same file contents that the original filesystem had.
        _scratch_unmount
        _scratch_mkfs >>$seqres.full 2>&1
        _scratch_mount
      
        _run_btrfs_util_prog receive $SCRATCH_MNT -f $send_files_dir/1.snap
        _run_btrfs_util_prog receive $SCRATCH_MNT -f $send_files_dir/2.snap
      
        echo "File digest in the new filesystem:"
        md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch
      
        status=0
        exit
      
      The test's expected golden output is:
      
        wrote 65536/65536 bytes at offset 131072
        XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
        File digest in the original filesystem:
        6c6079335cff141b8a31233ead04cbff  SCRATCH_MNT/mysnap2/foo
        File digest in the new filesystem:
        6c6079335cff141b8a31233ead04cbff  SCRATCH_MNT/mysnap2/foo
      
      But it failed with:
      
          (...)
          @@ -1,7 +1,5 @@
           QA output created by 097
           wrote 65536/65536 bytes at offset 131072
           XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
          -File digest in the original filesystem:
          -6c6079335cff141b8a31233ead04cbff  SCRATCH_MNT/mysnap2/foo
          -File digest in the new filesystem:
          -6c6079335cff141b8a31233ead04cbff  SCRATCH_MNT/mysnap2/foo
          ...
      
        $ cat /home/fdmanana/git/hub/xfstests/results//btrfs/097.full
        (...)
        ERROR: send ioctl failed with -5: Input/output error
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      d6589101
    • F
      Btrfs: fix stale dir entries after unlink, inode eviction and fsync · bde6c242
      Filipe Manana 提交于
      If we remove a hard link from an inode, the inode gets evicted, then
      we fsync the inode and then power fail/crash, when the log tree is
      replayed, the parent directory inode still has entries pointing to
      the name that no longer exists, while our inode no longer has the
      BTRFS_INODE_REF_KEY item matching the deleted hard link (as expected),
      leaving the filesystem in an inconsistent state. The stale directory
      entries can not be deleted (an attempt to delete them causes -ESTALE
      errors), which makes it impossible to delete the parent directory.
      
      This happens because we track the id of the transaction where the last
      unlink operation for the inode happened (last_unlink_trans) in an
      in-memory only field of the inode, that is, a value that is never
      persisted in the inode item stored on the fs/subvol btree. So if an
      inode is evicted and loaded again, the value for last_unlink_trans is
      set to 0, which prevents the fsync from logging the parent directory
      at btrfs_log_inode_parent(). So fix this by setting last_unlink_trans
      to the id of the transaction that last modified the inode when we
      load the inode. This is a pessimistic approach but it always ensures
      correctness with the trade off of ocassional full transaction commits
      when an fsync is done against the inode in the same transaction where
      it was evicted and reloaded when our inode is a directory and often
      logging its parent unnecessarily when our inode is not a directory.
      
      The following test case for fstests triggers the problem:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
        tmp=/tmp/$$
        status=1	# failure is the default!
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        _cleanup()
        {
            _cleanup_flakey
            rm -f $tmp.*
        }
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
        . ./common/dmflakey
      
        # real QA test starts here
        _need_to_be_root
        _supported_fs generic
        _supported_os Linux
        _require_scratch
        _require_dm_flakey
        _require_metadata_journaling $SCRATCH_DEV
      
        rm -f $seqres.full
      
        _scratch_mkfs >>$seqres.full 2>&1
        _init_flakey
        _mount_flakey
      
        # Create our test file with 2 hard links.
        mkdir $SCRATCH_MNT/testdir
        touch $SCRATCH_MNT/testdir/foo
        ln $SCRATCH_MNT/testdir/foo $SCRATCH_MNT/testdir/bar
      
        # Make sure everything done so far is durably persisted.
        sync
      
        # Now remove one of the links, trigger inode eviction and then fsync
        # our inode.
        unlink $SCRATCH_MNT/testdir/bar
        echo 2 > /proc/sys/vm/drop_caches
        $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir/foo
      
        # Silently drop all writes on our scratch device to simulate a power failure.
        _load_flakey_table $FLAKEY_DROP_WRITES
        _unmount_flakey
      
        # Allow writes again and mount the fs to trigger log/journal replay.
        _load_flakey_table $FLAKEY_ALLOW_WRITES
        _mount_flakey
      
        # Now verify our directory entries.
        echo "Entries in testdir:"
        ls -1 $SCRATCH_MNT/testdir
      
        # If we remove our inode, its parent should become empty and therefore we should
        # be able to remove the parent.
        rm -f $SCRATCH_MNT/testdir/*
        rmdir $SCRATCH_MNT/testdir
      
        _unmount_flakey
      
        # The fstests framework will call fsck against our filesystem which will verify
        # that all metadata is in a consistent state.
      
        status=0
        exit
      
      The test failed on btrfs with:
      
        generic/098 4s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//generic/098.out.bad)
          --- tests/generic/098.out	2015-07-23 18:01:12.616175932 +0100
          +++ /home/fdmanana/git/hub/xfstests/results//generic/098.out.bad	2015-07-23 18:04:58.924138308 +0100
          @@ -1,3 +1,6 @@
           QA output created by 098
           Entries in testdir:
          +bar
           foo
          +rm: cannot remove '/home/fdmanana/btrfs-tests/scratch_1/testdir/foo': Stale file handle
          +rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/testdir': Directory not empty
          ...
          (Run 'diff -u tests/generic/098.out /home/fdmanana/git/hub/xfstests/results//generic/098.out.bad'  to see the entire diff)
        _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent (see /home/fdmanana/git/hub/xfstests/results//generic/098.full)
      
        $ cat /home/fdmanana/git/hub/xfstests/results//generic/098.full
        (...)
        checking fs roots
        root 5 inode 258 errors 2001, no inode item, link count wrong
           unresolved ref dir 257 index 0 namelen 3 name foo filetype 1 errors 6, no dir index, no inode ref
           unresolved ref dir 257 index 3 namelen 3 name bar filetype 1 errors 5, no dir item, no inode ref
        Checking filesystem on /dev/sdc
        (...)
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      bde6c242
    • F
      Btrfs: fix stale directory entries after fsync log replay · bb53eda9
      Filipe Manana 提交于
      We have another case where after an fsync log replay we get an inode with
      a wrong link count (smaller than it should be) and a number of directory
      entries greater than its link count. This happens when we add a new link
      hard link to our inode A and then we fsync some other inode B that has
      the side effect of logging the parent directory inode too. In this case
      at log replay time we add the new hard link to our inode (the item with
      key BTRFS_INODE_REF_KEY) when processing the parent directory but we
      never adjust the link count of our inode A. As a result we get stale dir
      entries for our inode A that can never be deleted and therefore it makes
      it impossible to remove the parent directory (as its i_size can never
      decrease back to 0).
      
      A simple reproducer for fstests that triggers this issue:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
        tmp=/tmp/$$
        status=1	# failure is the default!
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        _cleanup()
        {
            _cleanup_flakey
            rm -f $tmp.*
        }
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
        . ./common/dmflakey
      
        # real QA test starts here
        _need_to_be_root
        _supported_fs generic
        _supported_os Linux
        _require_scratch
        _require_dm_flakey
        _require_metadata_journaling $SCRATCH_DEV
      
        rm -f $seqres.full
      
        _scratch_mkfs >>$seqres.full 2>&1
        _init_flakey
        _mount_flakey
      
        # Create our test directory and files.
        mkdir $SCRATCH_MNT/testdir
        touch $SCRATCH_MNT/testdir/foo
        touch $SCRATCH_MNT/testdir/bar
      
        # Make sure everything done so far is durably persisted.
        sync
      
        # Create one hard link for file foo and another one for file bar. After
        # that fsync only the file bar.
        ln $SCRATCH_MNT/testdir/bar $SCRATCH_MNT/testdir/bar_link
        ln $SCRATCH_MNT/testdir/foo $SCRATCH_MNT/testdir/foo_link
        $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir/bar
      
        # Silently drop all writes on scratch device to simulate power failure.
        _load_flakey_table $FLAKEY_DROP_WRITES
        _unmount_flakey
      
        # Allow writes again and mount the fs to trigger log/journal replay.
        _load_flakey_table $FLAKEY_ALLOW_WRITES
        _mount_flakey
      
        # Now verify both our files have a link count of 2.
        echo "Link count for file foo: $(stat --format=%h $SCRATCH_MNT/testdir/foo)"
        echo "Link count for file bar: $(stat --format=%h $SCRATCH_MNT/testdir/bar)"
      
        # We should be able to remove all the links of our files in testdir, and
        # after that the parent directory should become empty and therefore
        # possible to remove it.
        rm -f $SCRATCH_MNT/testdir/*
        rmdir $SCRATCH_MNT/testdir
      
        _unmount_flakey
      
        # The fstests framework will call fsck against our filesystem which will verify
        # that all metadata is in a consistent state.
      
        status=0
        exit
      
      The test fails with:
      
       -Link count for file foo: 2
       +Link count for file foo: 1
        Link count for file bar: 2
       +rm: cannot remove '/home/fdmanana/btrfs-tests/scratch_1/testdir/foo_link': Stale file handle
       +rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/testdir': Directory not empty
       (...)
       _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
      
      And fsck's output:
      
        (...)
        checking fs roots
        root 5 inode 258 errors 2001, no inode item, link count wrong
            unresolved ref dir 257 index 5 namelen 8 name foo_link filetype 1 errors 4, no inode ref
        Checking filesystem on /dev/sdc
        (...)
      
      So fix this by marking inodes for link count fixup at log replay time
      whenever a directory entry is replayed if the entry was created in the
      transaction where the fsync was made and if it points to a non-directory
      inode.
      
      This isn't a new problem/regression, the issue exists for a long time,
      possibly since the log tree feature was added (2008).
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      bb53eda9
  2. 03 8月, 2015 5 次提交
  3. 02 8月, 2015 5 次提交
  4. 01 8月, 2015 13 次提交
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 7c764cec
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) Must teardown SR-IOV before unregistering netdev in igb driver, from
          Alex Williamson.
      
       2) Fix ipv6 route unreachable crash in IPVS, from Alex Gartrell.
      
       3) Default route selection in ipv4 should take the prefix length, table
          ID, and TOS into account, from Julian Anastasov.
      
       4) sch_plug must have a reset method in order to purge all buffered
          packets when the qdisc is reset, likewise for sch_choke, from WANG
          Cong.
      
       5) Fix deadlock and races in slave_changelink/br_setport in bridging.
          From Nikolay Aleksandrov.
      
       6) mlx4 bug fixes (wrong index in port even propagation to VFs,
          overzealous BUG_ON assertion, etc.) from Ido Shamay, Jack
          Morgenstein, and Or Gerlitz.
      
       7) Turn off klog message about SCTP userspace interface compat that
          makes no sense at all, from Daniel Borkmann.
      
       8) Fix unbounded restarts of inet frag eviction process, causing NMI
          watchdog soft lockup messages, from Florian Westphal.
      
       9) Suspend/resume fixes for r8152 from Hayes Wang.
      
      10) Fix busy loop when MSG_WAITALL|MSG_PEEK is used in TCP recv, from
          Sabrina Dubroca.
      
      11) Fix performance regression when removing a lot of routes from the
          ipv4 routing tables, from Alexander Duyck.
      
      12) Fix device leak in AF_PACKET, from Lars Westerhoff.
      
      13) AF_PACKET also has a header length comparison bug due to signedness,
          from Alexander Drozdov.
      
      14) Fix bug in EBPF tail call generation on x86, from Daniel Borkmann.
      
      15) Memory leaks, TSO stats, watchdog timeout and other fixes to
          thunderx driver from Sunil Goutham and Thanneeru Srinivasulu.
      
      16) act_bpf can leak memory when replacing programs, from Daniel
          Borkmann.
      
      17) WOL packet fixes in gianfar driver, from Claudiu Manoil.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (79 commits)
        stmmac: fix missing MODULE_LICENSE in stmmac_platform
        gianfar: Enable device wakeup when appropriate
        gianfar: Fix suspend/resume for wol magic packet
        gianfar: Fix warning when CONFIG_PM off
        act_pedit: check binding before calling tcf_hash_release()
        net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket
        net: sched: fix refcount imbalance in actions
        r8152: reset device when tx timeout
        r8152: add pre_reset and post_reset
        qlcnic: Fix corruption while copying
        act_bpf: fix memory leaks when replacing bpf programs
        net: thunderx: Fix for crash while BGX teardown
        net: thunderx: Add PCI driver shutdown routine
        net: thunderx: Fix crash when changing rss with mutliple traffic flows
        net: thunderx: Set watchdog timeout value
        net: thunderx: Wakeup TXQ only if CQE_TX are processed
        net: thunderx: Suppress alloc_pages() failure warnings
        net: thunderx: Fix TSO packet statistic
        net: thunderx: Fix memory leak when changing queue count
        net: thunderx: Fix RQ_DROP miscalculation
        ...
      7c764cec
    • L
      Merge branch 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · acea568f
      Linus Torvalds 提交于
      Pull btrfs fixes from Chris Mason:
       "Filipe fixed up a hard to trigger ENOSPC regression from our merge
        window pull, and we have a few other smaller fixes"
      
      * 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        Btrfs: fix quick exhaustion of the system array in the superblock
        btrfs: its btrfs_err() instead of btrfs_error()
        btrfs: Avoid NULL pointer dereference of free_extent_buffer when read_tree_block() fail
        btrfs: Fix lockdep warning of btrfs_run_delayed_iputs()
      acea568f
    • L
      Merge tag 'sound-4.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · c6fd4fc7
      Linus Torvalds 提交于
      Pull sound fixes from Takashi Iwai:
       "This became a relative big update as it includes the collected ASoC
        fixes.  There are a few fixes in ASoC core side, mostly for DAPM and
        the new topology API.  The rest are various ASoC driver-specific
        fixes, as well as the usual HD-audio and USB-audio quirks"
      
      * tag 'sound-4.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (29 commits)
        ALSA: hda - Fix MacBook Pro 5,2 quirk
        ALSA: hda - Fix race between PM ops and HDA init/probe
        ALSA: usb-audio: add dB range mapping for some devices
        ALSA: hda - Apply a fixup to Dell Vostro 5480
        ALSA: hda - Add pin quirk for the headset mic jack detection on Dell laptop
        ALSA: hda - Apply fixup for another Toshiba Satellite S50D
        ALSA: fireworks: add support for AudioFire2 quirk
        ALSA: hda - Fix the headset mic that will not work on Dell desktop machine
        ALSA: hda - fix cs4210_spdif_automute()
        ASoC: pcm1681: Fix setting de-emphasis sampling rate selection
        ASoC: ssm4567: Keep TDM_BCLKS in ssm4567_set_dai_fmt
        ASoC: sgtl5000: Fix up define for SGTL5000_SMALL_POP
        ASoC: dapm: Don't add prefix to widget stream name
        ASoC: rt5645: Check if codec is initialized in workqueue handler
        ASoC: Intel: Get correct usage_count value to load firmware
        ASoC: topology: Fix to add dapm mixer info
        ASoC: zx: spdif: Fix devm_ioremap_resource return value check
        ASoC: zx: i2s: Fix devm_ioremap_resource return value check
        ASoC: mediatek: Use platform_of_node for machine drivers
        ASoC: Free card DAPM context on snd_soc_instantiate_card() error path
        ...
      c6fd4fc7
    • J
      stmmac: fix missing MODULE_LICENSE in stmmac_platform · ea111545
      Joachim Eastwood 提交于
      Commit 50649ab1 ("stmmac: drop driver from stmmac platform code")
      was a bit overzealous in removing code and dropped the MODULE_*
      macro's that are still needed since stmmac_platform can be a module.
      Fix this by putting the macro's remvoed in 50649ab1 back.
      
      This fixes the following errors when used as a module:
        stmmac_platform: module license 'unspecified' taints kernel.
        Disabling lock debugging due to kernel taint
        stmmac_platform: Unknown symbol devm_kmalloc (err 0)
        stmmac_platform: Unknown symbol stmmac_suspend (err 0)
        stmmac_platform: Unknown symbol platform_get_irq_byname (err 0)
        stmmac_platform: Unknown symbol stmmac_dvr_remove (err 0)
        stmmac_platform: Unknown symbol platform_get_resource (err 0)
        stmmac_platform: Unknown symbol of_get_phy_mode (err 0)
        stmmac_platform: Unknown symbol of_property_read_u32_array (err 0)
        stmmac_platform: Unknown symbol of_alias_get_id (err 0)
        stmmac_platform: Unknown symbol stmmac_resume (err 0)
        stmmac_platform: Unknown symbol stmmac_dvr_probe (err 0)
      
      Fixes: 50649ab1 ("stmmac: drop driver from stmmac platform code")
      Reported-by: NIgor Gnatenko <i.gnatenko.brain@gmail.com>
      Signed-off-by: NJoachim Eastwood <manabian@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ea111545
    • D
      Merge branch 'gianfar-wol-fixes' · ef1f4364
      David S. Miller 提交于
      Claudiu Manoil says:
      
      ====================
      gianfar: wol magic packet fixes
      
      These changes were already validated as part of FSL SDK.
      Patch 2 fixes occasional wake-on magic packet failures during
      traffic, probably due to incorrect traffic stop/ device halt
      sequence and incorrect usage of txlock.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ef1f4364
    • C
      gianfar: Enable device wakeup when appropriate · b0734b6d
      Claudiu Manoil 提交于
      The wol_en flag is 0 by default anyway, and we have the
      following inconsistency: a MAGIC packet wol capable eth
      interface is registered as a wake-up source but unable
      to wake-up the system as wol_en is 0 (wake-on flag set to 'd').
      Calling set_wakeup_enable() at netdev open is just redundant
      because wol_en is 0 by default.
      Let only ethtool call set_wakeup_enable() for now.
      
      The bflock is obviously obsoleted, its utility has been corroded
      over time.  The bitfield flags used today in gianfar are accessed
      only on the init/ config path, with no real possibility of
      concurrency - nothing that would justify smth. like bflock.
      Signed-off-by: NClaudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b0734b6d
    • C
      gianfar: Fix suspend/resume for wol magic packet · 614b4242
      Claudiu Manoil 提交于
      If we disable NAPI in the first place we can mask the device's
      interrupts (and halt it) without fearing that imask may be
      concurrently accessed from interrupt context, so there's
      no need to do local_irq_save() around gfar_halt_nodisable().
      lock_rx_qs()/unlock_tx_qs() are just obsoleted and potentially
      buggy routines.  The txlock is currently used in the driver only
      to manage TX congestion, it has nothing to do with halting the
      device.  With these changes, the TX processing is stopped before
      gfar_halt().
      
      Compact gfar_halt() is used instead of gfar_halt_nodisable(),
      as it disables Rx/TX DMA h/w blocks and the Rx/TX h/w queues.
      gfar_start() re-enables all these blocks on resume.  Enabling
      the magic-packet mode remains the same, note that the RX block
      is re-enabled just before entering sleep mode.
      
      Add IRQF_NO_SUSPEND flag for the error interrupt line, to signal
      that the interrupt line must remain active during sleep in order
      to wake the system by magic packet (MAG) reception interrupt.
      (On some systems the MAG interrupt did trigger w/o this flag
      as well, but on others it didn't.)
      
      Without these fixes, when suspended during fair Tx traffic the
      interface occasionally failed to be woken up by magic packet.
      Signed-off-by: NClaudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      614b4242
    • C
      gianfar: Fix warning when CONFIG_PM off · 84868305
      Claudiu Manoil 提交于
      CC      drivers/net/ethernet/freescale/gianfar.o
      drivers/net/ethernet/freescale/gianfar.c:568:13: warning: 'lock_tx_qs'
      defined but not used [-Wunused-function]
       static void lock_tx_qs(struct gfar_private *priv)
                   ^
      drivers/net/ethernet/freescale/gianfar.c:576:13: warning: 'unlock_tx_qs'
      defined but not used [-Wunused-function]
       static void unlock_tx_qs(struct gfar_private *priv)
                   ^
      Reported-by: NScott Wood <scottwood@freescale.com>
      Signed-off-by: NClaudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      84868305
    • W
      act_pedit: check binding before calling tcf_hash_release() · 5175f710
      WANG Cong 提交于
      When we share an action within a filter, the bind refcnt
      should increase, therefore we should not call tcf_hash_release().
      
      Fixes: 1a29321e ("net_sched: act: Dont increment refcnt on replace")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NCong Wang <cwang@twopensource.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5175f710
    • M
      ARM: dts: keystone: fix dt bindings to use post div register for mainpll · c1bfa985
      Murali Karicheri 提交于
      All of the keystone devices have a separate register to hold post
      divider value for main pll clock. Currently the fixed-postdiv
      value used for k2hk/l/e SoCs works by sheer luck as u-boot happens to
      use a value of 2 for this. Now that we have fixed this in the pll
      clock driver change the dt bindings for the same.
      Signed-off-by: NMurali Karicheri <m-karicheri2@ti.com>
      Acked-by: NSantosh Shilimkar <ssantosh@kernel.org>
      Signed-off-by: NOlof Johansson <olof@lixom.net>
      c1bfa985
    • L
      Merge tag 'iommu-fixes-v4.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 5e49e0be
      Linus Torvalds 提交于
      Pull IOMMU fixes from Joerg Roedel:
       "These fixes are all for the AMD IOMMU driver:
      
         - A regression with HSA caused by the conversion of the driver to
           default domains.  The fixes make sure that an HSA device can still
           be attached to an IOMMUv2 domain and that these domains also allow
           non-IOMMUv2 capable devices.
      
         - Fix iommu=pt mode which did not work because the dma_ops where set
           to nommu_ops, which breaks devices that can only do 32bit DMA.
      
         - Fix an issue with non-PCI devices not working, because there are no
           dma_ops for them.  This issue was discovered recently as new AMD
           x86 platforms have non-PCI devices too"
      
      * tag 'iommu-fixes-v4.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu/amd: Allow non-ATS devices in IOMMUv2 domains
        iommu/amd: Set global dma_ops if swiotlb is disabled
        iommu/amd: Use swiotlb in passthrough mode
        iommu/amd: Allow non-IOMMUv2 devices in IOMMUv2 domains
        iommu/amd: Use iommu core for passthrough mode
        iommu/amd: Use iommu_attach_group()
      5e49e0be
    • L
      Merge tag 'drm-intel-fixes-2015-07-31' of git://anongit.freedesktop.org/drm-intel · 23ff9e19
      Linus Torvalds 提交于
      Pull drm intel fixes from Daniel Vetter:
       "I delayed my -fixes pull a bit hoping that I could include a fix for
        the dp mst stuff but looks a bit more nasty than that.  So just 3
        other regression fixes, one 4.2 other two cc: stable"
      
      * tag 'drm-intel-fixes-2015-07-31' of git://anongit.freedesktop.org/drm-intel:
        drm/i915: Declare the swizzling unknown for L-shaped configurations
        drm/i915: Mark PIN_USER binding as GLOBAL_BIND without the aliasing ppgtt
        drm/i915: Replace WARN inside I915_READ64_2x32 with retry loop
      23ff9e19
    • L
      Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · fd56d1d6
      Linus Torvalds 提交于
      Pull drm fixes from Dave Airlie:
       "This has a bunch of nouveau fixes, as Ben has been hibernating and has
        lots of small fixes for lots of bugs across nouveau.
      
        Radeon has one major fix for hdmi/dp audio regression that is larger
        than Alex would like, but seems to fix up a fair few bugs, along with
        some misc fixes.
      
        And a few msm fixes, one of which is also a bit large.
      
        But nothing in here seems insane or crazy for this stage, just more
        than I'd like"
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (33 commits)
        drm/msm/mdp5: release SMB (shared memory blocks) in various cases
        drm/msm: change to uninterruptible wait in atomic commit
        drm/msm: mdp4: Fix drm_framebuffer dereference crash
        drm/msm: fix msm_gem_prime_get_sg_table()
        drm/amdgpu: add new parameter to seperate map and unmap
        drm/amdgpu: hdp_flush is not needed for inside IB
        drm/amdgpu: different emit_ib for gfx and compute
        drm/amdgpu: information leak in amdgpu_info_ioctl()
        drm/amdgpu: clean up init sequence for failures
        drm/radeon/combios: add some validation of lvds values
        drm/radeon: rework audio modeset to handle non-audio hdmi features
        drm/radeon: rework audio detect (v4)
        drm/amdgpu: Drop drm/ prefix for including drm.h in amdgpu_drm.h
        drm/radeon: Drop drm/ prefix for including drm.h in radeon_drm.h
        drm/nouveau/nouveau/ttm: fix tiled system memory with Maxwell
        drm/nouveau/kms/nv50-: guard against enabling cursor on disabled heads
        drm/nouveau/fbcon/g80: reduce PUSH_SPACE alloc, fire ring on accel init
        drm/nouveau/fbcon/gf100-: reduce RING_SPACE allocation
        drm/nouveau/fbcon/nv11-: correctly account for ring space usage
        drm/nouveau/bios: add proper support for opcode 0x59
        ...
      fd56d1d6
  5. 31 7月, 2015 12 次提交
    • J
      Revert "dmaengine: virt-dma: don't always free descriptor upon completion" · 8c8fe97b
      Jun Nie 提交于
      This reverts commit b9855f03.
      The patch break existing DMA usage case. For example, audio SOC
      dmaengine never release channel and cause virt-dma to cache too
      much memory in descriptor to exhaust system memory.
      Signed-off-by: NVinod Koul <vinod.koul@intel.com>
      8c8fe97b
    • T
      dmaengine: mv_xor: fix big endian operation in register mode · 0ec9ebc7
      Thomas Petazzoni 提交于
      Commit 6f166312 ("dmaengine: mv_xor: add support for a38x command
      in descriptor mode") introduced the support for a feature that
      appeared in Armada 38x: specifying the operation to be performed in a
      per-descriptor basis rather than globally per channel.
      
      However, when doing so, it changed the function mv_chan_set_mode() to
      use:
      
        if (IS_ENABLED(__BIG_ENDIAN))
      
      instead of:
      
        #if defined(__BIG_ENDIAN)
      
      While IS_ENABLED() is perfectly fine for CONFIG_* symbols, it is not
      for other symbols such as __BIG_ENDIAN that is provided directly by
      the compiler. Consequently, the commit broke support for big-endian,
      as the XOR_DESCRIPTOR_SWAP flag was not set in the XOR channel
      configuration register.
      
      The primarily visible effect was some nasty warnings and failures
      appearing during the self-test of the XOR unit:
      
      [    1.197368] mv_xor d0060900.xor: error on chan 0. intr cause 0x00000082
      [    1.197393] mv_xor d0060900.xor: config       0x00008440
      [    1.197410] mv_xor d0060900.xor: activation   0x00000000
      [    1.197427] mv_xor d0060900.xor: intr cause   0x00000082
      [    1.197443] mv_xor d0060900.xor: intr mask    0x000003f7
      [    1.197460] mv_xor d0060900.xor: error cause  0x00000000
      [    1.197477] mv_xor d0060900.xor: error addr   0x00000000
      [    1.197491] ------------[ cut here ]------------
      [    1.197513] WARNING: CPU: 0 PID: 1 at ../drivers/dma/mv_xor.c:664 mv_xor_interrupt_handler+0x14c/0x170()
      
      See also:
      
        http://storage.kernelci.org/next/next-20150617/arm-mvebu_v7_defconfig+CONFIG_CPU_BIG_ENDIAN=y/lab-khilman/boot-armada-xp-openblocks-ax3-4.txtSigned-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Fixes: 6f166312 ("dmaengine: mv_xor: add support for a38x command in descriptor mode")
      Reviewed-by: NMaxime Ripard <maxime.ripard@free-electrons.com>
      Signed-off-by: NVinod Koul <vinod.koul@intel.com>
      0ec9ebc7
    • R
      dmaengine: xgene-dma: Fix the resource map to handle overlapping · cda8e937
      Rameshwar Prasad Sahu 提交于
      There is an overlap in dma ring cmd csr region due to sharing of ethernet
      ring cmd csr region. This patch fix the resource overlapping by mapping
      the entire dma ring cmd csr region.
      Signed-off-by: NRameshwar Prasad Sahu <rsahu@apm.com>
      Signed-off-by: NVinod Koul <vinod.koul@intel.com>
      cda8e937
    • C
      dmaengine: at_xdmac: fix transfer data width in at_xdmac_prep_slave_sg() · 1c8a38b1
      Cyrille Pitchen 提交于
      This patch adds the missing update of the transfer data width in
      at_xdmac_prep_slave_sg().
      
      Indeed, for each item in the scatter-gather list, we check whether the
      transfer length is aligned with the data width provided by
      dmaengine_slave_config(). If so, we directly use this data width for the
      current part of the transfer we are preparing. Otherwise, the data width
      is reduced to 8 bits (1 byte). Of course, the actual number of register
      accesses must also be updated to match the new data width.
      
      So one chunk was missing in the original patch (see Fixes tag below): the
      number of register accesses was correctly set to (len >> fixed_dwidth) in
      mbr_ubc but the real data width was not updated in mbr_cfg. Since mbr_cfg
      may change for each part of the scatter-gather transfer this also explains
      why the original patch used the Descriptor View 2 instead of the
      Descriptor View 1.
      
      Let's take the example of a DMA transfer to write 8bit data into an Atmel
      USART with FIFOs. When FIFOs are enabled in the USART, its Transmit
      Holding Register (THR) works in multidata mode, that is to say that up to
      4 8bit data can be written into the THR in a single 32bit access and it is
      still possible to write only one data with a 8bit access. To take
      advantage of this new feature, the DMA driver was modified to allow
      multiple dwidths when doing slave transfers.
      For instance, when the total length is 22 bytes, the USART driver splits
      the transfer into 2 parts:
      
      First part: 20 bytes transferred through 5 32bit writes into THR
      Second part: 2 bytes transferred though 2 8bit writes into THR
      
      For the second part, the data width was first set to 4_BYTES by the USART
      driver thanks to dmaengine_slave_config() then at_xdmac_prep_slave_sg()
      reduces this data width to 1_BYTE because the 2 byte length is not aligned
      with the original 4_BYTES data width. Since the data width is modified,
      the actual number of writes into THR must be set accordingly.
      Signed-off-by: NCyrille Pitchen <cyrille.pitchen@atmel.com>
      Fixes: 6d3a7d9e ("dmaengine: at_xdmac: allow muliple dwidths when doing slave transfers")
      Cc: stable@vger.kernel.org #4.0 and later
      Acked-by: NNicolas Ferre <nicolas.ferre@atmel.com>
      Acked-by: NLudovic Desroches <ludovic.desroches@atmel.com>
      Signed-off-by: NVinod Koul <vinod.koul@intel.com>
      1c8a38b1
    • C
      dmaengine: at_hdmac: fix residue computation · 93dce3a6
      Cyrille Pitchen 提交于
      As claimed by the programmer datasheet and confirmed by the IP designer,
      the Block Transfer Size (BTSIZE) bitfield of the Channel x Control A
      Register (CTRLAx) always refers to a number of Source Width (SRC_WIDTH)
      transfers.
      
      Both the SRC_WIDTH and BTSIZE bitfields can be extacted from the CTRLAx
      register to compute the DMA residue. So the 'tx_width' field is useless
      and can be removed from the struct at_desc.
      
      Before this patch, atc_prep_slave_sg() was not consistent: BTSIZE was
      correctly initialized according to the SRC_WIDTH but 'tx_width' was always
      set to reg_width, which was incorrect for MEM_TO_DEV transfers. It led to
      bad DMA residue when 'tx_width' != SRC_WIDTH.
      
      Also the 'tx_width' field was mostly set only in the first and last
      descriptors. Depending on the kind of DMA transfer, this field remained
      uninitialized for intermediate descriptors. The accurate DMA residue was
      computed only when the currently processed descriptor was the first or the
      last of the chain. This algorithm was a little bit odd. An accurate DMA
      residue can always be computed using the SRC_WIDTH and BTSIZE bitfields
      in the CTRLAx register.
      
      Finally, the test to check whether the currently processed descriptor is
      the last of the chain was wrong: for cyclic transfer, last_desc->lli.dscr
      is NOT equal to zero, since set_desc_eol() is never called, but logically
      equal to first_desc->txd.phys. This bug has a side effect on the
      drivers/tty/serial/atmel_serial.c driver, which uses cyclic DMA transfer
      to receive data. Since the DMA residue was wrong each time the DMA
      transfer reaches the second (and last) period of the transfer, no more
      data were received by the USART driver till the cyclic DMA transfer loops
      back to the first period.
      Signed-off-by: NCyrille Pitchen <cyrille.pitchen@atmel.com>
      Acked-by: NTorsten Fleischer <torfl6749@gmail.com>
      Tested-by: NJirí Prchal <jiri.prchal@aksignal.cz>
      Acked-by: NNicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: NVinod Koul <vinod.koul@intel.com>
      93dce3a6
    • L
      dmaengine: at_xdmac: fix bug about channel configuration · 20cadcb4
      Ludovic Desroches 提交于
      When using descriptor view 2 or higher, we don't write the configuration
      into AT_XDMAC_CC register because this configuration will be fetch from
      the descriptor. Unfortunately, the PROT bit is not updated with this
      method, we have to do it manually before enabling the channel.
      Signed-off-by: NLudovic Desroches <ludovic.desroches@atmel.com>
      Signed-off-by: NVinod Koul <vinod.koul@intel.com>
      20cadcb4
    • J
      iommu/amd: Allow non-ATS devices in IOMMUv2 domains · 1c1cc454
      Joerg Roedel 提交于
      With the grouping of multi-function devices a non-ATS
      capable device might also end up in the same domain as an
      IOMMUv2 capable device.
      So handle this situation gracefully and don't consider it a
      bug anymore.
      Tested-by: NOded Gabbay <oded.gabbay@gmail.com>
      Signed-off-by: NJoerg Roedel <jroedel@suse.de>
      1c1cc454
    • A
      x86/ldt: Make modify_ldt synchronous · 37868fe1
      Andy Lutomirski 提交于
      modify_ldt() has questionable locking and does not synchronize
      threads.  Improve it: redesign the locking and synchronize all
      threads' LDTs using an IPI on all modifications.
      
      This will dramatically slow down modify_ldt in multithreaded
      programs, but there shouldn't be any multithreaded programs that
      care about modify_ldt's performance in the first place.
      
      This fixes some fallout from the CVE-2015-5157 fixes.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: security@kernel.org <security@kernel.org>
      Cc: <stable@vger.kernel.org>
      Cc: xen-devel <xen-devel@lists.xen.org>
      Link: http://lkml.kernel.org/r/4c6978476782160600471bd865b318db34c7b628.1438291540.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      37868fe1
    • A
      x86/xen: Probe target addresses in set_aliased_prot() before the hypercall · aa1acff3
      Andy Lutomirski 提交于
      The update_va_mapping hypercall can fail if the VA isn't present
      in the guest's page tables.  Under certain loads, this can
      result in an OOPS when the target address is in unpopulated vmap
      space.
      
      While we're at it, add comments to help explain what's going on.
      
      This isn't a great long-term fix.  This code should probably be
      changed to use something like set_memory_ro.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Vrabel <dvrabel@cantab.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: security@kernel.org <security@kernel.org>
      Cc: <stable@vger.kernel.org>
      Cc: xen-devel <xen-devel@lists.xen.org>
      Link: http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      aa1acff3
    • I
      Merge tag 'efi-urgent' of... · 1adb9123
      Ingo Molnar 提交于
      Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent
      
      Pull EFI fixes from Matt Fleming:
      
       * Fix an EFI boot issue preventing a Parallels virtual machine from
         booting because the upper 32-bits of the EFI memmap pointer were
         being discarded in setup_e820(). (Dmitry Skorodumov)
      
       * Validate that the "efi" kernel parameter gets used with an argument,
         otherwise we will oops. (Ricardo Neri)
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      1adb9123
    • L
      Merge tag 'xfs-for-linus-4.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs · 84009357
      Linus Torvalds 提交于
      Pull xfs fixes from Dave Chinner:
       "There are a couple of recently found, long standing remote attribute
        corruption fixes caused by log recovery getting confused after a
        crash, and the new DAX code in XFS (merged in 4.2-rc1) needs to
        actually use the DAX fault path on read faults.
      
        Summary:
      
         - remote attribute log recovery corruption fixes
      
         - DAX page faults need to use direct mappings, not a page cache
           mapping"
      
      * tag 'xfs-for-linus-4.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
        xfs: remote attributes need to be considered data
        xfs: remote attribute headers contain an invalid LSN
        xfs: call dax_fault on read page faults for DAX
      84009357
    • S
      net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket · 8a681736
      Sowmini Varadhan 提交于
      The newsk returned by sk_clone_lock should hold a get_net()
      reference if, and only if, the parent is not a kernel socket
      (making this similar to sk_alloc()).
      
      E.g,. for the SYN_RECV path, tcp_v4_syn_recv_sock->..inet_csk_clone_lock
      sets up the syn_recv newsk from sk_clone_lock. When the parent (listen)
      socket is a kernel socket (defined in sk_alloc() as having
      sk_net_refcnt == 0), then the newsk should also have a 0 sk_net_refcnt
      and should not hold a get_net() reference.
      
      Fixes: 26abe143 ("net: Modify sk_alloc to not reference count the
            netns of kernel sockets.")
      Acked-by: NEric Dumazet <edumazet@google.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8a681736