1. 21 Nov, 2018 40 commits
    • ext4: fix buffer leak in ext4_xattr_move_to_block() on error path · 29ee4d62
      Vasily Averin committed
      commit 6bdc9977fcdedf47118d2caf7270a19f4b6d8a8f upstream.
      
      Fixes: 3f2571c1 ("ext4: factor out xattr moving")
      Fixes: 6dd4ee7c ("ext4: Expand extra_inodes space per ...")
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 2.6.23
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: release bs.bh before re-using in ext4_xattr_block_find() · 4648dcb2
      Vasily Averin committed
      commit 45ae932d246f721e6584430017176cbcadfde610 upstream.
      
      bs.bh was taken in the previous ext4_xattr_block_find() call;
      it should be released before being reused.
      
      Fixes: 7e01c8e5 ("ext3/4: fix uninitialized bs in ...")
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 2.6.26
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: fix buffer leak in ext4_xattr_get_block() on error path · 0f0d1c16
      Vasily Averin committed
      commit ecaaf408478b6fb4d9986f9b6652f3824e374f4c upstream.
      
      Fixes: dec214d0 ("ext4: xattr inode deduplication")
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 4.13
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: fix possible leak of s_journal_flag_rwsem in error path · 0a992da5
      Vasily Averin committed
      commit af18e35bfd01e6d65a5e3ef84ffe8b252d1628c5 upstream.
      
      Fixes: c8585c6f ("ext4: fix races between changing inode journal ...")
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 4.7
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: fix possible leak of sbi->s_group_desc_leak in error path · 0d339ced
      Theodore Ts'o committed
      commit 9e463084cdb22e0b56b2dfbc50461020409a5fd3 upstream.
      
      Fixes: bfe0a5f4 ("ext4: add more mount time checks of the superblock")
      Reported-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 4.18
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: avoid possible double brelse() in add_new_gdb() on error path · 64a3d537
      Theodore Ts'o committed
      commit 4f32c38b4662312dd3c5f113d8bdd459887fb773 upstream.
      
      Fixes: b4097142 ("ext4: add error checking to calls to ...")
      Reported-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 2.6.38
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: fix missing cleanup if ext4_alloc_flex_bg_array() fails while resizing · 110a1994
      Vasily Averin committed
      commit f348e2241fb73515d65b5d77dd9c174128a7fbf2 upstream.
      
      Fixes: 117fff10 ("ext4: grow the s_flex_groups array as needed ...")
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 3.7
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: avoid buffer leak in ext4_orphan_add() after prior errors · 656b121b
      Vasily Averin committed
      commit feaf264ce7f8d54582e2f66eb82dd9dd124c94f3 upstream.
      
      Fixes: d745a8c2 ("ext4: reduce contention on s_orphan_lock")
      Fixes: 6e3617e5 ("ext4: Handle non empty on-disk orphan link")
      Cc: Dmitry Monakhov <dmonakhov@gmail.com>
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 2.6.34
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: avoid buffer leak on shutdown in ext4_mark_iloc_dirty() · d65b7d33
      Vasily Averin committed
      commit a6758309a005060b8297a538a457c88699cb2520 upstream.
      
      ext4_mark_iloc_dirty() callers expect that it releases iloc->bh
      even if it returns an error.
      
      Fixes: 0db1ff22 ("ext4: add shutdown bit and check for it")
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 4.11
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: fix possible inode leak in the retry loop of ext4_resize_fs() · 36b1ba6a
      Vasily Averin committed
      commit db6aee62406d9fbb53315fcddd81f1dc271d49fa upstream.
      
      Fixes: 1c6bd717 ("ext4: convert file system to meta_bg if needed ...")
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 3.7
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: missing !bh check in ext4_xattr_inode_write() · 4903c091
      Vasily Averin committed
      commit eb6984fa4ce2837dcb1f66720a600f31b0bb3739 upstream.
      
      According to Ted Ts'o, ext4_getblk() called in ext4_xattr_inode_write()
      should not return bh = NULL:
      
      The only time that bh could be NULL, then, would be in the case of
      something really going wrong; a programming error elsewhere (perhaps a
      wild pointer dereference) or I/O error causing on-disk file system
      corruption (although that would be highly unlikely given that we had
      *just* allocated the blocks and so the metadata blocks in question
      probably would still be in the cache).
      
      Fixes: e50e5129 ("ext4: xattr-in-inode support")
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 4.13
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: avoid potential extra brelse in setup_new_flex_group_blocks() · 20dd2c4e
      Vasily Averin committed
      commit 9e4028935cca3f9ef9b6a90df9da6f1f94853536 upstream.
      
      Currently bh is set to NULL only during the first iteration of the for
      loop, and the pointer is not cleared after its last use.
      Therefore rollback after an error can lead to an extra brelse(bh) call,
      which decrements the bh reference counter and can later trigger an
      unexpected warning in __brelse().
      
      The patch moves the brelse() calls into the body of the loop so that the
      rollback no longer needs to call brelse().
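
      The pattern can be modelled outside the kernel. The following is a
      minimal sketch in Python, not the ext4 code: BufferHead, ExtraRelease
      and both loop functions are hypothetical names, and the refcount check
      stands in for the warning that __brelse() would print.

```python
class ExtraRelease(Exception):
    """Stand-in for the warning the kernel prints on an unbalanced release."""


class BufferHead:
    def __init__(self, name):
        self.name = name
        self.refcount = 1  # reference taken when the buffer is obtained


def brelse(bh):
    """Drop one reference; a None pointer is ignored, as in the kernel helper."""
    if bh is None:
        return
    if bh.refcount <= 0:
        raise ExtraRelease(bh.name)
    bh.refcount -= 1


def stale_pointer_rollback(count, fail_at):
    """Release inside the loop but keep the stale pointer for the rollback."""
    bh = None  # only NULL before the first iteration
    failed = False
    for i in range(count):
        if i == fail_at:
            failed = True
            break
        bh = BufferHead(f"block-{i}")
        brelse(bh)  # done with the buffer, but bh is not reset to None
    if failed:
        brelse(bh)  # rollback releases the same buffer a second time
    return not failed


def release_in_body(count, fail_at):
    """Release on every path inside the loop; the rollback releases nothing."""
    for i in range(count):
        bh = BufferHead(f"block-{i}")
        if i == fail_at:
            brelse(bh)
            return False
        brelse(bh)
    return True
```

      In this model a failure on the first iteration is harmless because the
      pointer is still None, while a failure on any later iteration hits the
      extra release.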
      
      Fixes: 33afdcc5 ("ext4: add a function which sets up group blocks ...")
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 3.3+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: add missing brelse() add_new_gdb_meta_bg()'s error path · 2aa79d31
      Vasily Averin committed
      commit 61a9c11e5e7a0dab5381afa5d9d4dd5ebf18f7a0 upstream.
      
      Fixes: 01f795f9 ("ext4: add online resizing support for meta_bg ...")
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 3.7
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: add missing brelse() in set_flexbg_block_bitmap()'s error path · cd18d6e0
      Vasily Averin committed
      commit cea5794122125bf67559906a0762186cf417099c upstream.
      
      Fixes: 33afdcc5 ("ext4: add a function which sets up group blocks ...")
      Cc: stable@kernel.org # 3.3
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: add missing brelse() update_backups()'s error path · f7b6459e
      Vasily Averin committed
      commit ea0abbb648452cdb6e1734b702b6330a7448fcf8 upstream.
      
      Fixes: ac27a0ec ("ext4: initial copy of files from ext3")
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 2.6.19
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • clockevents/drivers/i8253: Add support for PIT shutdown quirk · ebbc6fce
      Michael Kelley committed
      commit 35b69a420bfb56b7b74cb635ea903db05e357bec upstream.
      
      Add support for platforms where pit_shutdown() doesn't work because of a
      quirk in the PIT emulation. On these platforms setting the counter register
      to zero causes the PIT to start running again, negating the shutdown.
      
      Provide a global variable that controls whether the counter register is
      zero'ed, which platform specific code can override.
      Signed-off-by: Michael Kelley <mikelley@microsoft.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>
      Cc: "devel@linuxdriverproject.org" <devel@linuxdriverproject.org>
      Cc: "daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>
      Cc: "virtualization@lists.linux-foundation.org" <virtualization@lists.linux-foundation.org>
      Cc: "jgross@suse.com" <jgross@suse.com>
      Cc: "akataria@vmware.com" <akataria@vmware.com>
      Cc: "olaf@aepfle.de" <olaf@aepfle.de>
      Cc: "apw@canonical.com" <apw@canonical.com>
      Cc: vkuznets <vkuznets@redhat.com>
      Cc: "jasowang@redhat.com" <jasowang@redhat.com>
      Cc: "marcelo.cerri@canonical.com" <marcelo.cerri@canonical.com>
      Cc: KY Srinivasan <kys@microsoft.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1541303219-11142-2-git-send-email-mikelley@microsoft.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: tree-checker: Fix misleading group system information · f2589f9a
      Shaokun Zhang committed
      commit 761333f2f50ccc887aa9957ae829300262c0d15b upstream.
      
      block_group_err shows the group system as a decimal value with a '0x'
      prefix, which is somewhat misleading.
      
      Fix it to print hexadecimal, as was intended.
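
      As a rough illustration of why this is misleading (in Python rather
      than the kernel's format strings, and with an arbitrary sample value
      that is not an actual btrfs flag), decimal digits after a '0x' prefix
      read as a different hexadecimal number:

```python
def decimal_with_hex_prefix(value: int) -> str:
    """Mimic the misleading output: decimal digits after a '0x' prefix."""
    return f"0x{value:d}"


def hexadecimal(value: int) -> str:
    """The intended output: hexadecimal digits after the '0x' prefix."""
    return f"0x{value:x}"


sample = 16  # arbitrary sample value chosen for illustration
print(decimal_with_hex_prefix(sample))  # prints 0x16
print(hexadecimal(sample))              # prints 0x10
```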
      
      Fixes: fce466ea ("btrfs: tree-checker: Verify block_group_item")
      CC: stable@vger.kernel.org # 4.19+
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: fix data corruption due to cloning of eof block · ec6d90a4
      Filipe Manana committed
      commit ac765f83f1397646c11092a032d4f62c3d478b81 upstream.
      
      We currently allow cloning a range from a file which includes the last
      block of the file even if the file's size is not aligned to the block
      size. This is fine and useful when the destination file has the same size,
      but when it does not and the range ends somewhere in the middle of the
      destination file, it leads to corruption because the bytes between the EOF
      and the end of the block have undefined data (when there is support for
      discard/trimming they have a value of 0x00).
      
      Example:
      
       $ mkfs.btrfs -f /dev/sdb
       $ mount /dev/sdb /mnt
      
       $ export foo_size=$((256 * 1024 + 100))
       $ xfs_io -f -c "pwrite -S 0x3c 0 $foo_size" /mnt/foo
       $ xfs_io -f -c "pwrite -S 0xb5 0 1M" /mnt/bar
      
       $ xfs_io -c "reflink /mnt/foo 0 512K $foo_size" /mnt/bar
      
       $ od -A d -t x1 /mnt/bar
       0000000 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5
       *
       0524288 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c
       *
       0786528 3c 3c 3c 3c 00 00 00 00 00 00 00 00 00 00 00 00
       0786544 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       *
       0790528 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5
       *
       1048576
      
      The bytes in the range from 786532 (512Kb + 256Kb + 100 bytes) to 790527
      (512Kb + 256Kb + 4Kb - 1) got corrupted, having now a value of 0x00 instead
      of 0xb5.
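
      The offsets quoted above can be re-derived from the example's
      parameters. A small worked calculation in Python, assuming the 4 KiB
      block size implied by the "4Kb" term (the variable names are only
      illustrative):

```python
KIB = 1024
BLOCK_SIZE = 4 * KIB  # assumed block size, taken from the "4Kb" term in the text

foo_size = 256 * KIB + 100  # size of /mnt/foo in the example
dest_offset = 512 * KIB     # destination offset passed to reflink

# First byte in /mnt/bar past the cloned data.
clone_end = dest_offset + foo_size
# End of the last block touched by the clone, rounded up to the block size.
block_end = (clone_end + BLOCK_SIZE - 1) // BLOCK_SIZE * BLOCK_SIZE

first_corrupted = clone_end
last_corrupted = block_end - 1
print(first_corrupted, last_corrupted)  # prints 786532 790527
```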
      
      This is similar to the problem we had for deduplication that got recently
      fixed by commit de02b9f6 ("Btrfs: fix data corruption when
      deduplicating between different files").
      
      Fix this by not allowing such operations to be performed and return the
      errno -EINVAL to user space. This is what XFS is doing as well at the VFS
      level. This change however now makes us return -EINVAL instead of
      -EOPNOTSUPP for cases where the source range maps to an inline extent and
      the destination range's end is smaller than the destination file's size,
      since the detection of inline extents is done during the actual process of
      dropping file extent items (at __btrfs_drop_extents()). Returning the
      -EINVAL error is done early on and solely based on the input parameters
      (offsets and length) and destination file's size. This makes us consistent
      with XFS and anyone else supporting cloning since this case is now checked
      at a higher level in the VFS and is where the -EINVAL will be returned
      from starting with kernel 4.20 (the VFS change was introduced in 4.20-rc1
      by commit 07d19dc9fbe9 ("vfs: avoid problematic remapping requests into
      partial EOF block")). So this change is more geared towards stable kernels,
      as it's unlikely the new VFS checks get removed intentionally.
      
      A test case for fstests follows soon, as well as an update to filter
      existing tests that expect -EOPNOTSUPP to accept -EINVAL as well.
      
      CC: <stable@vger.kernel.org> # 4.4+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: fix infinite loop on inode eviction after deduplication of eof block · bafd5b78
      Filipe Manana committed
      commit 11023d3f5fdf89bba5e1142127701ca6e6014587 upstream.
      
      If we attempt to deduplicate the last block of a file A into the middle of
      a file B, and file A's size is not a multiple of the block size, we end up
      rounding the deduplication length to 0 bytes, to avoid the data corruption
      issue fixed by commit de02b9f6 ("Btrfs: fix data corruption when
      deduplicating between different files"). However a length of zero will
      cause the insertion of an extent state with a start value greater (by 1)
      than the end value, leading to a corrupt extent state that will trigger a
      warning and cause chaos such as an infinite loop during inode eviction.
      Example trace:
      
       [96049.833585] ------------[ cut here ]------------
       [96049.833714] WARNING: CPU: 0 PID: 24448 at fs/btrfs/extent_io.c:436 insert_state+0x101/0x120 [btrfs]
       [96049.833767] CPU: 0 PID: 24448 Comm: xfs_io Not tainted 4.19.0-rc7-btrfs-next-39 #1
       [96049.833768] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
       [96049.833780] RIP: 0010:insert_state+0x101/0x120 [btrfs]
       [96049.833783] RSP: 0018:ffffafd2c3707af0 EFLAGS: 00010282
       [96049.833785] RAX: 0000000000000000 RBX: 000000000004dfff RCX: 0000000000000006
       [96049.833786] RDX: 0000000000000007 RSI: ffff99045c143230 RDI: ffff99047b2168a0
       [96049.833787] RBP: ffff990457851cd0 R08: 0000000000000001 R09: 0000000000000000
       [96049.833787] R10: ffffafd2c3707ab8 R11: 0000000000000000 R12: ffff9903b93b12c8
       [96049.833788] R13: 000000000004e000 R14: ffffafd2c3707b80 R15: ffffafd2c3707b78
       [96049.833790] FS:  00007f5c14e7d700(0000) GS:ffff99047b200000(0000) knlGS:0000000000000000
       [96049.833791] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [96049.833792] CR2: 00007f5c146abff8 CR3: 0000000115f4c004 CR4: 00000000003606f0
       [96049.833795] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       [96049.833796] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       [96049.833796] Call Trace:
       [96049.833809]  __set_extent_bit+0x46c/0x6a0 [btrfs]
       [96049.833823]  lock_extent_bits+0x6b/0x210 [btrfs]
       [96049.833831]  ? _raw_spin_unlock+0x24/0x30
       [96049.833841]  ? test_range_bit+0xdf/0x130 [btrfs]
       [96049.833853]  lock_extent_range+0x8e/0x150 [btrfs]
       [96049.833864]  btrfs_double_extent_lock+0x78/0xb0 [btrfs]
       [96049.833875]  btrfs_extent_same_range+0x14e/0x550 [btrfs]
       [96049.833885]  ? rcu_read_lock_sched_held+0x3f/0x70
       [96049.833890]  ? __kmalloc_node+0x2b0/0x2f0
       [96049.833899]  ? btrfs_dedupe_file_range+0x19a/0x280 [btrfs]
       [96049.833909]  btrfs_dedupe_file_range+0x270/0x280 [btrfs]
       [96049.833916]  vfs_dedupe_file_range_one+0xd9/0xe0
       [96049.833919]  vfs_dedupe_file_range+0x131/0x1b0
       [96049.833924]  do_vfs_ioctl+0x272/0x6e0
       [96049.833927]  ? __fget+0x113/0x200
       [96049.833931]  ksys_ioctl+0x70/0x80
       [96049.833933]  __x64_sys_ioctl+0x16/0x20
       [96049.833937]  do_syscall_64+0x60/0x1b0
       [96049.833939]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
       [96049.833941] RIP: 0033:0x7f5c1478ddd7
       [96049.833943] RSP: 002b:00007ffe15b196a8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
       [96049.833945] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5c1478ddd7
       [96049.833946] RDX: 00005625ece322d0 RSI: 00000000c0189436 RDI: 0000000000000004
       [96049.833947] RBP: 0000000000000000 R08: 00007f5c14a46f48 R09: 0000000000000040
       [96049.833948] R10: 0000000000000541 R11: 0000000000000202 R12: 0000000000000000
       [96049.833949] R13: 0000000000000000 R14: 0000000000000004 R15: 00005625ece322d0
       [96049.833954] irq event stamp: 6196
       [96049.833956] hardirqs last  enabled at (6195): [<ffffffff91b00663>] console_unlock+0x503/0x640
       [96049.833958] hardirqs last disabled at (6196): [<ffffffff91a037dd>] trace_hardirqs_off_thunk+0x1a/0x1c
       [96049.833959] softirqs last  enabled at (6114): [<ffffffff92600370>] __do_softirq+0x370/0x421
       [96049.833964] softirqs last disabled at (6095): [<ffffffff91a8dd4d>] irq_exit+0xcd/0xe0
       [96049.833965] ---[ end trace db7b05f01b7fa10c ]---
       [96049.935816] R13: 0000000000000000 R14: 00005562e5259240 R15: 00007ffff092b910
       [96049.935822] irq event stamp: 6584
       [96049.935823] hardirqs last  enabled at (6583): [<ffffffff91b00663>] console_unlock+0x503/0x640
       [96049.935825] hardirqs last disabled at (6584): [<ffffffff91a037dd>] trace_hardirqs_off_thunk+0x1a/0x1c
       [96049.935827] softirqs last  enabled at (6328): [<ffffffff92600370>] __do_softirq+0x370/0x421
       [96049.935828] softirqs last disabled at (6313): [<ffffffff91a8dd4d>] irq_exit+0xcd/0xe0
       [96049.935829] ---[ end trace db7b05f01b7fa123 ]---
       [96049.935840] ------------[ cut here ]------------
       [96049.936065] WARNING: CPU: 1 PID: 24463 at fs/btrfs/extent_io.c:436 insert_state+0x101/0x120 [btrfs]
       [96049.936107] CPU: 1 PID: 24463 Comm: umount Tainted: G        W         4.19.0-rc7-btrfs-next-39 #1
       [96049.936108] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
       [96049.936117] RIP: 0010:insert_state+0x101/0x120 [btrfs]
       [96049.936119] RSP: 0018:ffffafd2c3637bc0 EFLAGS: 00010282
       [96049.936120] RAX: 0000000000000000 RBX: 000000000004dfff RCX: 0000000000000006
       [96049.936121] RDX: 0000000000000007 RSI: ffff990445cf88e0 RDI: ffff99047b2968a0
       [96049.936122] RBP: ffff990457851cd0 R08: 0000000000000001 R09: 0000000000000000
       [96049.936123] R10: ffffafd2c3637b88 R11: 0000000000000000 R12: ffff9904574301e8
       [96049.936124] R13: 000000000004e000 R14: ffffafd2c3637c50 R15: ffffafd2c3637c48
       [96049.936125] FS:  00007fe4b87e72c0(0000) GS:ffff99047b280000(0000) knlGS:0000000000000000
       [96049.936126] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [96049.936128] CR2: 00005562e52618d8 CR3: 00000001151c8005 CR4: 00000000003606e0
       [96049.936129] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       [96049.936131] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       [96049.936131] Call Trace:
       [96049.936141]  __set_extent_bit+0x46c/0x6a0 [btrfs]
       [96049.936154]  lock_extent_bits+0x6b/0x210 [btrfs]
       [96049.936167]  btrfs_evict_inode+0x1e1/0x5a0 [btrfs]
       [96049.936172]  evict+0xbf/0x1c0
       [96049.936174]  dispose_list+0x51/0x80
       [96049.936176]  evict_inodes+0x193/0x1c0
       [96049.936180]  generic_shutdown_super+0x3f/0x110
       [96049.936182]  kill_anon_super+0xe/0x30
       [96049.936189]  btrfs_kill_super+0x13/0x100 [btrfs]
       [96049.936191]  deactivate_locked_super+0x3a/0x70
       [96049.936193]  cleanup_mnt+0x3b/0x80
       [96049.936195]  task_work_run+0x93/0xc0
       [96049.936198]  exit_to_usermode_loop+0xfa/0x100
       [96049.936201]  do_syscall_64+0x17f/0x1b0
       [96049.936202]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
       [96049.936204] RIP: 0033:0x7fe4b80cfb37
       [96049.936206] RSP: 002b:00007ffff092b688 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
       [96049.936207] RAX: 0000000000000000 RBX: 00005562e5259060 RCX: 00007fe4b80cfb37
       [96049.936208] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00005562e525faa0
       [96049.936209] RBP: 00005562e525faa0 R08: 00005562e525f770 R09: 0000000000000015
       [96049.936210] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007fe4b85d1e64
       [96049.936211] R13: 0000000000000000 R14: 00005562e5259240 R15: 00007ffff092b910
       [96049.936211] R13: 0000000000000000 R14: 00005562e5259240 R15: 00007ffff092b910
       [96049.936216] irq event stamp: 6616
       [96049.936219] hardirqs last  enabled at (6615): [<ffffffff91b00663>] console_unlock+0x503/0x640
       [96049.936219] hardirqs last disabled at (6616): [<ffffffff91a037dd>] trace_hardirqs_off_thunk+0x1a/0x1c
       [96049.936222] softirqs last  enabled at (6328): [<ffffffff92600370>] __do_softirq+0x370/0x421
       [96049.936222] softirqs last disabled at (6313): [<ffffffff91a8dd4d>] irq_exit+0xcd/0xe0
       [96049.936223] ---[ end trace db7b05f01b7fa124 ]---
      
      The second stack trace, from inode eviction, is repeated forever due to
      the infinite loop during eviction.
      
      This is the same type of problem fixed way back in 2015 by commit
      113e8283 ("Btrfs: fix inode eviction infinite loop after extent_same
      ioctl") and commit ccccf3d6 ("Btrfs: fix inode eviction infinite loop
      after cloning into it").
      
      So fix this by returning immediately if the deduplication range length
      gets rounded down to 0 bytes, as there is nothing that needs to be done in
      such case.
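
      A minimal sketch of the arithmetic behind this, assuming a 4 KiB block
      size and an inclusive [start, end] range. The function names are
      hypothetical, the real code only rounds when the source range ends at
      EOF, and this models the range computation rather than the btrfs
      implementation:

```python
BLOCK_SIZE = 4096  # assumed block size


def unguarded_lock_range(dst_offset, length, block_size=BLOCK_SIZE):
    """Inclusive range computed without checking for a zero rounded length."""
    aligned = length // block_size * block_size  # round down to the block size
    return dst_offset, dst_offset + aligned - 1


def guarded_lock_range(dst_offset, length, block_size=BLOCK_SIZE):
    """Return None when the rounded length is zero, as there is nothing to do."""
    aligned = length // block_size * block_size
    if aligned == 0:
        return None
    return dst_offset, dst_offset + aligned - 1


# Parameters from the reproducer: 100 bytes deduplicated to offset 500K.
start, end = unguarded_lock_range(500 * 1024, 100)
print(start, end)  # prints 512000 511999, start is greater than end by 1
print(guarded_lock_range(500 * 1024, 100))  # prints None
```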
      
      Example reproducer:
      
       $ mkfs.btrfs -f /dev/sdb
       $ mount /dev/sdb /mnt
      
       $ xfs_io -f -c "pwrite -S 0xe6 0 100" /mnt/foo
       $ xfs_io -f -c "pwrite -S 0xe6 0 1M" /mnt/bar
      
       # Unmount the filesystem and mount it again so that we start without any
       # extent state records when we ask for the deduplication.
       $ umount /mnt
       $ mount /dev/sdb /mnt
      
       $ xfs_io -c "dedupe /mnt/foo 0 500K 100" /mnt/bar
      
       # This unmount triggers the infinite loop.
       $ umount /mnt
      
      A test case for fstests will follow soon.
      
      Fixes: de02b9f6 ("Btrfs: fix data corruption when deduplicating between different files")
      CC: <stable@vger.kernel.org> # 4.19+
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: fix cur_offset in the error case for nocow · db39065c
      Robbie Ko committed
      commit 506481b20e818db40b6198815904ecd2d6daee64 upstream.
      
      When the cow_file_range fails, the related resources are unlocked
      according to the range [start..end), so the unlock cannot be repeated in
      run_delalloc_nocow.
      
      In some cases (e.g. cur_offset <= end && cow_start != -1), cur_offset is
      not updated correctly, so move the cur_offset update before
      cow_file_range.
      
        kernel BUG at mm/page-writeback.c:2663!
        Internal error: Oops - BUG: 0 [#1] SMP
        CPU: 3 PID: 31525 Comm: kworker/u8:7 Tainted: P O
        Hardware name: Realtek_RTD1296 (DT)
        Workqueue: writeback wb_workfn (flush-btrfs-1)
        task: ffffffc076db3380 ti: ffffffc02e9ac000 task.ti: ffffffc02e9ac000
        PC is at clear_page_dirty_for_io+0x1bc/0x1e8
        LR is at clear_page_dirty_for_io+0x14/0x1e8
        pc : [<ffffffc00033c91c>] lr : [<ffffffc00033c774>] pstate: 40000145
        sp : ffffffc02e9af4f0
        Process kworker/u8:7 (pid: 31525, stack limit = 0xffffffc02e9ac020)
        Call trace:
        [<ffffffc00033c91c>] clear_page_dirty_for_io+0x1bc/0x1e8
        [<ffffffbffc514674>] extent_clear_unlock_delalloc+0x1e4/0x210 [btrfs]
        [<ffffffbffc4fb168>] run_delalloc_nocow+0x3b8/0x948 [btrfs]
        [<ffffffbffc4fb948>] run_delalloc_range+0x250/0x3a8 [btrfs]
        [<ffffffbffc514c0c>] writepage_delalloc.isra.21+0xbc/0x1d8 [btrfs]
        [<ffffffbffc516048>] __extent_writepage+0xe8/0x248 [btrfs]
        [<ffffffbffc51630c>] extent_write_cache_pages.isra.17+0x164/0x378 [btrfs]
        [<ffffffbffc5185a8>] extent_writepages+0x48/0x68 [btrfs]
        [<ffffffbffc4f5828>] btrfs_writepages+0x20/0x30 [btrfs]
        [<ffffffc00033d758>] do_writepages+0x30/0x88
        [<ffffffc0003ba0f4>] __writeback_single_inode+0x34/0x198
        [<ffffffc0003ba6c4>] writeback_sb_inodes+0x184/0x3c0
        [<ffffffc0003ba96c>] __writeback_inodes_wb+0x6c/0xc0
        [<ffffffc0003bac20>] wb_writeback+0x1b8/0x1c0
        [<ffffffc0003bb0f0>] wb_workfn+0x150/0x250
        [<ffffffc0002b0014>] process_one_work+0x1dc/0x388
        [<ffffffc0002b02f0>] worker_thread+0x130/0x500
        [<ffffffc0002b6344>] kthread+0x10c/0x110
        [<ffffffc000284590>] ret_from_fork+0x10/0x40
        Code: d503201f a9025bb5 a90363b7 f90023b9 (d4210000)
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Robbie Ko <robbieko@synology.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: fix missing data checksums after a ranged fsync (msync) · fa625a48
      Filipe Manana committed
      commit 008c6753f7e070c77c70d708a6bf0255b4381763 upstream.
      
      Recently we got a massive simplification for fsync, where for the fast
      path we no longer log new extents while their respective ordered extents
      are still running.
      
      However that simplification introduced a subtle regression for the case
      where we use a ranged fsync (msync). Consider the following example:
      
                     CPU 0                                    CPU 1
      
                                                  mmap write to range [2Mb, 4Mb[
        mmap write to range [512Kb, 1Mb[
        msync range [512K, 1Mb[
          --> triggers fast fsync
              (BTRFS_INODE_NEEDS_FULL_SYNC
               not set)
          --> creates extent map A for this
              range and adds it to list of
              modified extents
          --> starts ordered extent A for
              this range
          --> waits for it to complete
      
                                                  writeback triggered for range
                                                  [2Mb, 4Mb[
                                                    --> create extent map B and
                                                        adds it to the list of
                                                        modified extents
                                                    --> creates ordered extent B
      
          --> start looking for and logging
              modified extents
          --> logs extent maps A and B
          --> finds checksums for extent A
              in the csum tree, but not for
              extent B
        fsync (msync) finishes
      
                                                    --> ordered extent B
                                                        finishes and its
                                                        checksums are added
                                                        to the csum tree
      
                                      <power cut>
      
      After replaying the log, we have the extent covering the range [2Mb, 4Mb[
      but do not have the data checksum items covering that file range.
      
      This happens because at the very beginning of an fsync (btrfs_sync_file())
      we start and wait for IO in the given range [512Kb, 1Mb[ and therefore
      wait for any ordered extents in that range to complete before we start
      logging the extents. However if right before we start logging the extent
      in our range [512Kb, 1Mb[, writeback is started for any other dirty range,
      such as the range [2Mb, 4Mb[ due to memory pressure or a concurrent fsync
      or msync (btrfs_sync_file() starts writeback before acquiring the inode's
      lock), an ordered extent is created for that other range and a new extent
      map is created to represent that range and added to the inode's list of
      modified extents.
      
      That means that we will see that other extent in that list when collecting
      extents for logging (done at btrfs_log_changed_extents()) and log the
      extent before the respective ordered extent finishes - namely before the
      checksum items are added to the checksums tree, which is where
      log_extent_csums() looks for the checksums, therefore making us log an
      extent without logging its checksums. Before that massive simplification
      of fsync, this wasn't a problem because besides looking for checksums in
      the checksums tree, we also looked for them in any ordered extent still
      running.
      
      The consequence of data checksums missing for a file range is that users
      attempting to read the affected file range will get -EIO errors and dmesg
      reports the following:
      
       [10188.358136] BTRFS info (device sdc): no csum found for inode 297 start 57344
       [10188.359278] BTRFS warning (device sdc): csum failed root 5 ino 297 off 57344 csum 0x98f94189 expected csum 0x00000000 mirror 1
      
      So fix this by skipping extents outside of our logging range at
      btrfs_log_changed_extents() and leaving them on the list of modified
      extents so that any subsequent ranged fsync may collect them if needed.
      Also, if we find a hole extent outside of the range still log it, just
      to prevent having gaps between extent items after replaying the log,
      otherwise fsck will complain when we are not using the NO_HOLES feature
      (fstest btrfs/056 triggers such case).
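
      The selection rule described above can be sketched as follows. This is
      a simplified model in Python, not the btrfs_log_changed_extents()
      code: ExtentMap and split_for_logging are hypothetical names, the
      range is treated as inclusive, and holes are simply always logged.

```python
from dataclasses import dataclass

KIB = 1024
MIB = 1024 * KIB


@dataclass(frozen=True)
class ExtentMap:
    name: str
    start: int
    length: int
    is_hole: bool = False


def split_for_logging(modified, range_start, range_end):
    """Split modified extents into those to log now and those left on the list."""
    to_log, left_on_list = [], []
    for em in modified:
        em_end = em.start + em.length - 1
        outside = em_end < range_start or em.start > range_end
        if outside and not em.is_hole:
            # Outside the fsync range: keep it for a later ranged fsync.
            left_on_list.append(em)
        else:
            # Inside the range, or a hole that is logged to avoid gaps.
            to_log.append(em)
    return to_log, left_on_list


# Extents A and B from the scenario, with an msync of [512K, 1M).
extent_a = ExtentMap("A", 512 * KIB, 512 * KIB)
extent_b = ExtentMap("B", 2 * MIB, 2 * MIB)
to_log, left_on_list = split_for_logging([extent_a, extent_b], 512 * KIB, MIB - 1)
print([em.name for em in to_log], [em.name for em in left_on_list])
```

      With the scenario's ranges, only extent A is logged and extent B stays
      on the list until its own range is synced.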
      
      Fixes: e7175a69 ("btrfs: remove the wait ordered logic in the log_one_extent path")
      CC: stable@vger.kernel.org # 4.19+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fa625a48
    • L
      btrfs: fix pinned underflow after transaction aborted · ec26ad25
      Lu Fengqi committed
      commit fcd5e74288f7d36991b1f0fb96b8c57079645e38 upstream.
      
      When running generic/475, we may get the following warning in dmesg:
      
      [ 6902.102154] WARNING: CPU: 3 PID: 18013 at fs/btrfs/extent-tree.c:9776 btrfs_free_block_groups+0x2af/0x3b0 [btrfs]
      [ 6902.109160] CPU: 3 PID: 18013 Comm: umount Tainted: G        W  O      4.19.0-rc8+ #8
      [ 6902.110971] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
      [ 6902.112857] RIP: 0010:btrfs_free_block_groups+0x2af/0x3b0 [btrfs]
      [ 6902.118921] RSP: 0018:ffffc9000459bdb0 EFLAGS: 00010286
      [ 6902.120315] RAX: ffff880175050bb0 RBX: ffff8801124a8000 RCX: 0000000000170007
      [ 6902.121969] RDX: 0000000000000002 RSI: 0000000000170007 RDI: ffffffff8125fb74
      [ 6902.123716] RBP: ffff880175055d10 R08: 0000000000000000 R09: 0000000000000000
      [ 6902.125417] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880175055d88
      [ 6902.127129] R13: ffff880175050bb0 R14: 0000000000000000 R15: dead000000000100
      [ 6902.129060] FS:  00007f4507223780(0000) GS:ffff88017ba00000(0000) knlGS:0000000000000000
      [ 6902.130996] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 6902.132558] CR2: 00005623599cac78 CR3: 000000014b700001 CR4: 00000000003606e0
      [ 6902.134270] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 6902.135981] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 6902.137836] Call Trace:
      [ 6902.138939]  close_ctree+0x171/0x330 [btrfs]
      [ 6902.140181]  ? kthread_stop+0x146/0x1f0
      [ 6902.141277]  generic_shutdown_super+0x6c/0x100
      [ 6902.142517]  kill_anon_super+0x14/0x30
      [ 6902.143554]  btrfs_kill_super+0x13/0x100 [btrfs]
      [ 6902.144790]  deactivate_locked_super+0x2f/0x70
      [ 6902.146014]  cleanup_mnt+0x3b/0x70
      [ 6902.147020]  task_work_run+0x9e/0xd0
      [ 6902.148036]  do_syscall_64+0x470/0x600
      [ 6902.149142]  ? trace_hardirqs_off_thunk+0x1a/0x1c
      [ 6902.150375]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [ 6902.151640] RIP: 0033:0x7f45077a6a7b
      [ 6902.157324] RSP: 002b:00007ffd589f3e68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
      [ 6902.159187] RAX: 0000000000000000 RBX: 000055e8eec732b0 RCX: 00007f45077a6a7b
      [ 6902.160834] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055e8eec73490
      [ 6902.162526] RBP: 0000000000000000 R08: 000055e8eec734b0 R09: 00007ffd589f26c0
      [ 6902.164141] R10: 0000000000000000 R11: 0000000000000246 R12: 000055e8eec73490
      [ 6902.165815] R13: 00007f4507ac61a4 R14: 0000000000000000 R15: 00007ffd589f40d8
      [ 6902.167553] irq event stamp: 0
      [ 6902.168998] hardirqs last  enabled at (0): [<0000000000000000>]           (null)
      [ 6902.170731] hardirqs last disabled at (0): [<ffffffff810cd810>] copy_process.part.55+0x3b0/0x1f00
      [ 6902.172773] softirqs last  enabled at (0): [<ffffffff810cd810>] copy_process.part.55+0x3b0/0x1f00
      [ 6902.174671] softirqs last disabled at (0): [<0000000000000000>]           (null)
      [ 6902.176407] ---[ end trace 463138c2986b275c ]---
      [ 6902.177636] BTRFS info (device dm-3): space_info 4 has 273465344 free, is not full
      [ 6902.179453] BTRFS info (device dm-3): space_info total=276824064, used=4685824, pinned=18446744073708158976, reserved=0, may_use=0, readonly=65536
      
      In the above line there's "pinned=18446744073708158976" which is an
      unsigned u64 value of -1392640, an obvious underflow.
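
      The wrapped value can be checked with a small sketch. The helper names below are hypothetical and only illustrate unsigned 64-bit wraparound; they are not btrfs code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: subtract len from a pinned byte counter.
 * Unsigned 64-bit arithmetic wraps modulo 2^64 instead of going negative.
 */
static uint64_t demo_pinned_sub(uint64_t pinned, uint64_t len)
{
	return pinned - len;
}

/*
 * Reinterpret a wrapped counter as signed to see the size of the underflow.
 * The conversion is implementation-defined in C; on the usual
 * two's-complement targets it yields the negative value.
 */
static int64_t demo_as_signed(uint64_t v)
{
	return (int64_t)v;
}
```

      Subtracting 1392640 from a counter that is already 0 gives 2^64 - 1392640 = 18446744073708158976, which matches the "pinned=" value in the log above.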
      
      When transaction_kthread is running cleanup_transaction(), another
      fsstress is running btrfs_commit_transaction(). The
      btrfs_finish_extent_commit() may get the same range as
      btrfs_destroy_pinned_extent() got, which causes the pinned underflow.
      
      Fixes: d4b450cd ("Btrfs: fix race between transaction commit and empty block group removal")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ec26ad25
    • M
      watchdog/core: Add missing prototypes for weak functions · cb7c993f
      Mathieu Malaterre committed
      commit 81bd415c91eb966118d773dddf254aebf3022411 upstream.
      
      The split out of the hard lockup detector exposed two new weak functions,
      but no prototypes for them, which triggers the build warning:
      
        kernel/watchdog.c:109:12: warning: no previous prototype for ‘watchdog_nmi_enable’ [-Wmissing-prototypes]
        kernel/watchdog.c:115:13: warning: no previous prototype for ‘watchdog_nmi_disable’ [-Wmissing-prototypes]
      
      Add the prototypes.
      
      Fixes: 73ce0511 ("kernel/watchdog.c: move hardlockup detector to separate file")
      Signed-off-by: Mathieu Malaterre <malat@debian.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Babu Moger <babu.moger@oracle.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20180606194232.17653-1-malat@debian.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cb7c993f
    • H
      arch/alpha, termios: implement BOTHER, IBSHIFT and termios2 · 139ca3da
      H. Peter Anvin (Intel) committed
      commit d0ffb805b729322626639336986bc83fc2e60871 upstream.
      
      Alpha has had c_ispeed and c_ospeed, but still set speeds in c_cflag
      using arbitrary flags. Because BOTHER is not defined, the general
      Linux code doesn't allow setting arbitrary baud rates, and because
      CBAUDEX == 0, we can have an array overrun of the baud_rate[] table in
      drivers/tty/tty_baudrate.c if (c_cflag & CBAUD) == 037.
      
      Resolve both problems by #defining BOTHER to 037 on Alpha.
      
      However, userspace still needs to know if setting BOTHER is actually
      safe given legacy kernels (does anyone actually care about that on
      Alpha anymore?), so enable the TCGETS2/TCSETS*2 ioctls on Alpha, even
      though they use the same structure. Define struct termios2 just for
      compatibility; it is the exact same structure as struct termios. In a
      future patchset, this will be cleaned up so the uapi headers are
      usable from libc.

      Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
      Cc: Jiri Slaby <jslaby@suse.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: Philippe Ombredanne <pombredanne@nexb.com>
      Cc: Eugene Syromiatnikov <esyr@redhat.com>
      Cc: <linux-alpha@vger.kernel.org>
      Cc: <linux-serial@vger.kernel.org>
      Cc: Johan Hovold <johan@kernel.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      139ca3da
    • H
      termios, tty/tty_baudrate.c: fix buffer overrun · 8851e11f
      H. Peter Anvin committed
      commit 991a25194097006ec1e0d2e0814ff920e59e3465 upstream.
      
      On architectures with CBAUDEX == 0 (Alpha and PowerPC), the code in
      tty_baudrate.c does not do any limit checking on the tty_baudrate[]
      array, and in fact a buffer overrun is possible on both architectures.
      Add a limit check to prevent that situation.
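
      A minimal sketch of such a limit check, assuming a made-up 16-entry table and hypothetical names (demo_baud_table, DEMO_CBAUD, demo_baud_rate). Returning 0 for an out-of-range index is an assumption of this sketch, not a statement about what the patch does.

```c
#include <stddef.h>

/* Illustrative table only; not the kernel's actual baud table. */
static const unsigned int demo_baud_table[] = {
	0, 50, 75, 110, 134, 150, 200, 300, 600, 1200, 1800, 2400, 4800, 9600,
	19200, 38400,
};

#define DEMO_N_BAUD (sizeof(demo_baud_table) / sizeof(demo_baud_table[0]))
#define DEMO_CBAUD 037u /* 5-bit mask: index values 0..31 are possible */

/* Look up a baud rate, refusing indexes that fall outside the table. */
static unsigned int demo_baud_rate(unsigned int cflag)
{
	unsigned int idx = cflag & DEMO_CBAUD;

	if (idx >= DEMO_N_BAUD) /* the limit check */
		return 0;
	return demo_baud_table[idx];
}
```

      With a 5-bit mask and only 16 table entries, any index from 16 to 31 would read past the end of the array if the check were missing.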
      
      This will be followed by a much bigger cleanup/simplification patch.

      Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
      Requested-by: Cc: Johan Hovold <johan@kernel.org>
      Cc: Jiri Slaby <jslaby@suse.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: Philippe Ombredanne <pombredanne@nexb.com>
      Cc: Eugene Syromiatnikov <esyr@redhat.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8851e11f
    • M
      x86/hyper-v: Enable PIT shutdown quirk · 2deb55aa
      Michael Kelley committed
      commit 1de72c706488b7be664a601cf3843bd01e327e58 upstream.
      
      Hyper-V emulation of the PIT has a quirk such that the normal PIT shutdown
      path doesn't work, because clearing the counter register restarts the
      timer.
      
      Disable the counter clearing on PIT shutdown.

      Signed-off-by: Michael Kelley <mikelley@microsoft.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>
      Cc: "devel@linuxdriverproject.org" <devel@linuxdriverproject.org>
      Cc: "daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>
      Cc: "virtualization@lists.linux-foundation.org" <virtualization@lists.linux-foundation.org>
      Cc: "jgross@suse.com" <jgross@suse.com>
      Cc: "akataria@vmware.com" <akataria@vmware.com>
      Cc: "olaf@aepfle.de" <olaf@aepfle.de>
      Cc: "apw@canonical.com" <apw@canonical.com>
      Cc: vkuznets <vkuznets@redhat.com>
      Cc: "jasowang@redhat.com" <jasowang@redhat.com>
      Cc: "marcelo.cerri@canonical.com" <marcelo.cerri@canonical.com>
      Cc: KY Srinivasan <kys@microsoft.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1541303219-11142-3-git-send-email-mikelley@microsoft.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2deb55aa
    • S
      x86/cpu/vmware: Do not trace vmware_sched_clock() · e73cb6a6
      Steven Rostedt (VMware) committed
      commit 15035388439f892017d38b05214d3cda6578af64 upstream.
      
      When running function tracing on a Linux guest running on VMware
      Workstation, the guest would crash. This is due to tracing of the
      sched_clock internal call of the VMware vmware_sched_clock(), which
      causes an infinite recursion within the tracing code (clock calls must
      not be traced).
      
      Make vmware_sched_clock() not traced by ftrace.
      
      Fixes: 80e9a4f2 ("x86/vmware: Add paravirt sched clock")
      Reported-by: GwanYeong Kim <gy741.kim@gmail.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      CC: Alok Kataria <akataria@vmware.com>
      CC: GwanYeong Kim <gy741.kim@gmail.com>
      CC: "H. Peter Anvin" <hpa@zytor.com>
      CC: Ingo Molnar <mingo@kernel.org>
      Cc: stable@vger.kernel.org
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: virtualization@lists.linux-foundation.org
      CC: x86-ml <x86@kernel.org>
      Link: http://lkml.kernel.org/r/20181109152207.4d3e7d70@gandalf.local.home
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e73cb6a6
    • J
      of, numa: Validate some distance map rules · 3cbdaf13
      John Garry committed
      commit 89c38422e072bb453e3045b8f1b962a344c3edea upstream.
      
      Currently the NUMA distance map parsing does not validate the distance
      table for the distance-matrix rules 1-2 in [1].
      
      However the arch NUMA code may enforce some of these rules, but not all.
      Such is the case for the arm64 port, which does not enforce the rule that
      the distance between separate nodes cannot equal LOCAL_DISTANCE.
      
      The patch adds the following rules validation:
      - distance of node to self equals LOCAL_DISTANCE
      - distance of separate nodes > LOCAL_DISTANCE
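
      The two rules can be sketched as a single predicate. DEMO_LOCAL_DISTANCE and demo_distance_valid() are hypothetical names for illustration; the value 10 is assumed here and the real parsing code is not reproduced.

```c
#include <stdbool.h>

/* Assumed value for illustration; stands in for LOCAL_DISTANCE. */
#define DEMO_LOCAL_DISTANCE 10

/*
 * Return true when one distance-map entry obeys both rules:
 *  - a node's distance to itself equals the local distance
 *  - the distance between separate nodes is greater than the local distance
 */
static bool demo_distance_valid(int from, int to, int distance)
{
	if (from == to)
		return distance == DEMO_LOCAL_DISTANCE;
	return distance > DEMO_LOCAL_DISTANCE;
}
```

      An entry such as (0, 1, 10) fails the second rule, which is the case the commit message says the arm64 port did not enforce.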
      
      This change avoids a yet-unresolved crash reported in [2].
      
      A note on dealing with symmetrical distances between nodes:
      
      Validating symmetrical distances between nodes is difficult. If it were
      mandated in the bindings that every distance must be recorded in the
      table, then it would be easy. However, it isn't.
      
      In addition to this, it is also possible to record [b, a] distance only
      (and not [a, b]). So, when processing the table for [b, a], we cannot
      assert that current distance of [a, b] != [b, a] as invalid, as [a, b]
      distance may not be present in the table and current distance would be
      default at REMOTE_DISTANCE.
      
      As such, we maintain the policy that we overwrite distance [a, b] = [b, a]
      for b > a. This policy is different to kernel ACPI SLIT validation, which
      allows non-symmetrical distances (ACPI spec SLIT rules allow it). However,
      the distance debug message is dropped as it may be misleading (for a distance
      which is later overwritten).
      
      Some final notes on semantics:
      
      - It is implied that it is the responsibility of the arch NUMA code to
        reset the NUMA distance map for an error in distance map parsing.
      
      - It is the responsibility of the FW NUMA topology parsing (whether OF or
        ACPI) to enforce NUMA distance rules, and not arch NUMA code.
      
      [1] Documentation/devicetree/bindings/numa.txt
      [2] https://www.spinics.net/lists/arm-kernel/msg683304.html
      
      Cc: stable@vger.kernel.org # 4.7
      Signed-off-by: John Garry <john.garry@huawei.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3cbdaf13
    • A
      perf intel-pt: Insert callchain context into synthesized callchains · 73c660f3
      Adrian Hunter committed
      commit 242483068b4b9ad02f1653819b6e683577681e0e upstream.
      
      In the absence of a fallback, callchains must also encode the callchain
      context. Do that now that there is no fallback.

      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: stable@vger.kernel.org # 4.19
      Link: http://lkml.kernel.org/r/100ea2ec-ed14-b56d-d810-e0a6d2f4b069@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      73c660f3
    • A
      perf intel-pt/bts: Calculate cpumode for synthesized samples · f3de8640
      Adrian Hunter committed
      commit 5d4f0edaa3ac4f1844ed7c64cd2bae6f1912bac5 upstream.
      
      In the absence of a fallback, samples must provide a correct cpumode for
      the 'ip'. Do that now that there is no fallback.

      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: stable@vger.kernel.org # 4.19
      Link: http://lkml.kernel.org/r/20181031091043.23465-6-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f3de8640
    • D
      perf callchain: Honour the ordering of PERF_CONTEXT_{USER,KERNEL,etc} · 1b913453
      David S. Miller committed
      commit e9024d519d892b38176cafd46f68a7cdddd77412 upstream.
      
      When processing using 'perf report -g caller', which is the default, we
      ended up reversing the callchain entries received from the kernel, but
      simply reversing throws away the information that tells that from a
      point onwards the addresses are for userspace, kernel, guest kernel,
      guest user, hypervisor.
      
      The idea is that if we are walking backwards, for each cluster of
      non-cpumode entries we have to first scan backwards for the next one and
      use that for the cluster.
      
      This seems silly and more expensive than it needs to be but it is enough
      for an initial fix.
      
      The code here is really complicated because it is intimately intertwined
      with the lbr and branch handling, as well as this callchain order,
      further fixes will be needed to properly take into account the cpumode
      in those cases.
      
      Another problem with ORDER_CALLER is that the NULL "0" IP that is at the
      end of most callchains shows up at the top of the histogram because
      every callchain contains it and with ORDER_CALLER it is the first entry.

      Signed-off-by: David S. Miller <davem@davemloft.net>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Souvik Banerjee <souvik1997@gmail.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: stable@vger.kernel.org # 4.19
      Link: https://lkml.kernel.org/n/tip-2wt3ayp6j2y2f2xowixa8y6y@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1b913453
    • T
      perf stat: Handle different PMU names with common prefix · 7b0131a0
      Thomas Richter committed
      commit ea1fa48c055f833eb25f0c33188feecb7002ada5 upstream.
      
      On s390 the CPU Measurement Facility for counters now supports
      2 PMUs named cpum_cf (CPU Measurement Facility for counters) and
      cpum_cf_diag (CPU Measurement Facility for diagnostic counters)
      for one and the same CPU.
      
      Running command
      
       [root@s35lp76 perf]# ./perf stat -e tx_c_tend \
      	 -- ~/mytests/cf-tx-events 1
      
       Measuring transactions
       TX_C_TABORT_NO_SPECIAL: 0 expected:0
       TX_C_TABORT_SPECIAL: 0 expected:0
       TX_C_TEND: 1 expected:1
       TX_NC_TABORT: 11 expected:11
       TX_NC_TEND: 1 expected:1
      
       Performance counter stats for '/root/mytests/cf-tx-events 1':
      
        2      tx_c_tend
      
            0.002120091 seconds time elapsed
      
            0.000121000 seconds user
            0.002127000 seconds sys
      
       [root@s35lp76 perf]#
      
      displays output which is unexpected (and wrong):
      
        2      tx_c_tend
      
      The test program definitely triggers only one transaction, as shown
      in line 'TX_C_TEND: 1 expected:1'.
      
      This is caused by the following call sequence:
      
      pmu_lookup() scans and installs a PMU.
      +--> pmu_aliases() parses all aliases in directory
      		.../<pmu-name>/events/* which are file names.
           +--> pmu_aliases_parse() Read each file in directory and create
                            a new alias entry. This is done with
                +--> perf_pmu__new_alias() and
      	       +--> __perf_pmu__new_alias() which also check for
      	                   identical alias names.
      
      After pmu_aliases() returns, a complete list of event names
      for this pmu has been created. Now function
      
      pmu_add_cpu_aliases()   is called to add the events listed in the json
      |                       files to the alias list of the cpu.
      +--> perf_pmu__find_map()  Returns a pointer to the json events.
      
      Now function pmu_add_cpu_aliases() scans through all events listed
      in the JSON files for this CPU.
      Each json event pmu name is compared with the current PMU being
      built up and if they mismatch, the json event is added to the
      current PMUs alias list.
      To avoid duplicate entries the following comparison is done:
      
      	if (!is_arm_pmu_core(name)) {
      		pname = pe->pmu ? pe->pmu : "cpu";
      		if (strncmp(pname, name, strlen(pname)))
      			continue;
      	}
      
      The culprit is the strncmp() function.
      
      Using current s390 PMU naming, the first PMU is 'cpum_cf'
      and a long list of events is added, among them 'tx_c_tend'
      
      When the second PMU named 'cpum_cf_diag' is added, only one event
      named 'CF_DIAG' is added by the pmu_aliases()  function.
      
      Now function pmu_add_cpu_aliases() is invoked for PMU 'cpum_cf_diag'.
      Since the CPUID string is the same for both PMUs, json file events
      for PMU named 'cpum_cf' are added to the PMU 'cpum_cf_diag'
      
      This happens because the strncmp() actually compares:
      
           strncmp("cpum_cf", "cpum_cf_diag", 7);
      
      The first parameter is the pmu name taken from the event in
      the json file. The second parameter is the pmu name of the PMU
      currently being built.
      They are different, but the compare only covers the length of the
      first name, which is a common prefix of both, so it returns 0 (match)
      when it should return non-zero.
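
      The prefix problem can be reproduced in isolation. The helpers below are hypothetical; demo_exact_match() is just one possible stricter comparison and is not claimed to be the upstream fix.

```c
#include <stdbool.h>
#include <string.h>

/* Prefix compare with the same shape as the check quoted above. */
static bool demo_prefix_match(const char *pname, const char *name)
{
	return strncmp(pname, name, strlen(pname)) == 0;
}

/* Assumed alternative for illustration: require the whole name to match. */
static bool demo_exact_match(const char *pname, const char *name)
{
	return strcmp(pname, name) == 0;
}
```

      demo_prefix_match("cpum_cf", "cpum_cf_diag") reports a match because only the first strlen("cpum_cf") characters are compared, while demo_exact_match() tells the two PMUs apart. A plain exact compare may be too strict wherever a prefix match is intended, so a real fix would need to keep those cases working.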
      
      Now all events for PMU cpum_cf are added to the alias list for pmu
      cpum_cf_diag.
      
      Later on in function parse_events_add_pmu() the event 'tx_c_tend' is
      searched in all available PMUs and found twice, adding it two
      times to the evsel_list global variable which is the root
      of all events. This results in a counter value of 2 instead
      of 1.
      
      Output with this patch:
      
       [root@s35lp76 perf]# ./perf stat -e tx_c_tend \
      			-- ~/mytests/cf-tx-events 1
       Measuring transactions
       TX_C_TABORT_NO_SPECIAL: 0 expected:0
       TX_C_TABORT_SPECIAL: 0 expected:0
       TX_C_TEND: 1 expected:1
       TX_NC_TABORT: 11 expected:11
       TX_NC_TEND: 1 expected:1
      
       Performance counter stats for '/root/mytests/cf-tx-events 1':
      
                        1      tx_c_tend
      
            0.001815365 seconds time elapsed
      
            0.000123000 seconds user
            0.001756000 seconds sys
      
       [root@s35lp76 perf]#

      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
      Reviewed-by: Sebastien Boisvert <sboisvert@gydle.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: stable@vger.kernel.org
      Fixes: 292c34c1 ("perf pmu: Fix core PMU alias list for X86 platform")
      Link: http://lkml.kernel.org/r/20181023151616.78193-1-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7b0131a0
    • L
      perf cs-etm: Correct CPU mode for samples · 29414ff3
      Leo Yan committed
      commit d6c9c05fe1eb4b213b183d8a1e79416256dc833a upstream.
      
      Since commit edeb0c90 ("perf tools: Stop fallbacking to kallsyms for
      vdso symbols lookup"), kernel addresses can no longer be resolved to
      kernel symbols with the command 'perf script -k vmlinux'.  The reason is
      that CoreSight samples always set the CPU mode to PERF_RECORD_MISC_USER,
      so the lookup fails to find the corresponding map/dso in the flow below:
      
        process_sample_event()
          `-> machine__resolve()
      	  `-> thread__find_map(thread, sample->cpumode, sample->ip, al);
      
      This flow passes 'sample->cpumode' to indicate the CPU mode. Previously
      it always passed PERF_RECORD_MISC_USER, which caused no failure until
      commit edeb0c90 ("perf tools: Stop fallbacking to kallsyms for vdso
      symbols lookup") was merged.  The reason is that, even with the wrong
      CPU mode, thread__find_map() would first fail to find the map and then
      fall back to the kernel map for the vdso symbols lookup.  The latest code
      has removed that fallback, so with the CPU mode set to
      PERF_RECORD_MISC_USER a kernel address can no longer be mapped.
      
      This patch corrects the CPU mode setting for samples. It adds a new
      helper function, cs_etm__cpu_mode(), which determines the CPU mode from
      the address using information in the machine structure. The helper
      checks not only for kernel and user mode but also for host/guest and
      hypervisor mode.  Finally, the patch uses the helper for instruction and
      branch samples, and also in cs_etm__mem_access() as a minor cleanup.

      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: stable@kernel.org # v4.19
      Link: http://lkml.kernel.org/r/1540883908-17018-1-git-send-email-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      29414ff3
    • D
      hwmon: (core) Fix double-free in __hwmon_device_register() · a63fffbd
      Dmitry Osipenko committed
      commit 74e3512731bd5c9673176425a76a7cc5efa8ddb6 upstream.
      
      Fix double-free that happens when thermal zone setup fails, see KASAN log
      below.
      
      ==================================================================
      BUG: KASAN: double-free or invalid-free in __hwmon_device_register+0x5dc/0xa7c
      
      CPU: 0 PID: 132 Comm: kworker/0:2 Tainted: G    B             4.19.0-rc8-next-20181016-00042-gb52cd80401e9-dirty #41
      Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
      Workqueue: events deferred_probe_work_func
      Backtrace:
      [<c0110540>] (dump_backtrace) from [<c0110944>] (show_stack+0x20/0x24)
      [<c0110924>] (show_stack) from [<c105cb08>] (dump_stack+0x9c/0xb0)
      [<c105ca6c>] (dump_stack) from [<c02fdaec>] (print_address_description+0x68/0x250)
      [<c02fda84>] (print_address_description) from [<c02fd4ac>] (kasan_report_invalid_free+0x68/0x88)
      [<c02fd444>] (kasan_report_invalid_free) from [<c02fc85c>] (__kasan_slab_free+0x1f4/0x200)
      [<c02fc668>] (__kasan_slab_free) from [<c02fd0c0>] (kasan_slab_free+0x14/0x18)
      [<c02fd0ac>] (kasan_slab_free) from [<c02f9c6c>] (kfree+0x90/0x294)
      [<c02f9bdc>] (kfree) from [<c0b41bbc>] (__hwmon_device_register+0x5dc/0xa7c)
      [<c0b415e0>] (__hwmon_device_register) from [<c0b421e8>] (hwmon_device_register_with_info+0xa0/0xa8)
      [<c0b42148>] (hwmon_device_register_with_info) from [<c0b42324>] (devm_hwmon_device_register_with_info+0x74/0xb4)
      [<c0b422b0>] (devm_hwmon_device_register_with_info) from [<c0b4481c>] (lm90_probe+0x414/0x578)
      [<c0b44408>] (lm90_probe) from [<c0aeeff4>] (i2c_device_probe+0x35c/0x384)
      [<c0aeec98>] (i2c_device_probe) from [<c08776cc>] (really_probe+0x290/0x3e4)
      [<c087743c>] (really_probe) from [<c0877a2c>] (driver_probe_device+0x80/0x1c4)
      [<c08779ac>] (driver_probe_device) from [<c0877da8>] (__device_attach_driver+0x104/0x11c)
      [<c0877ca4>] (__device_attach_driver) from [<c0874dd8>] (bus_for_each_drv+0xa4/0xc8)
      [<c0874d34>] (bus_for_each_drv) from [<c08773b0>] (__device_attach+0xf0/0x15c)
      [<c08772c0>] (__device_attach) from [<c0877e24>] (device_initial_probe+0x1c/0x20)
      [<c0877e08>] (device_initial_probe) from [<c08762f4>] (bus_probe_device+0xdc/0xec)
      [<c0876218>] (bus_probe_device) from [<c0876a08>] (deferred_probe_work_func+0xa8/0xd4)
      [<c0876960>] (deferred_probe_work_func) from [<c01527c4>] (process_one_work+0x3dc/0x96c)
      [<c01523e8>] (process_one_work) from [<c01541e0>] (worker_thread+0x4ec/0x8bc)
      [<c0153cf4>] (worker_thread) from [<c015b238>] (kthread+0x230/0x240)
      [<c015b008>] (kthread) from [<c01010bc>] (ret_from_fork+0x14/0x38)
      Exception stack(0xcf743fb0 to 0xcf743ff8)
      3fa0:                                     00000000 00000000 00000000 00000000
      3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
      
      Allocated by task 132:
       kasan_kmalloc.part.1+0x58/0xf4
       kasan_kmalloc+0x90/0xa4
       kmem_cache_alloc_trace+0x90/0x2a0
       __hwmon_device_register+0xbc/0xa7c
       hwmon_device_register_with_info+0xa0/0xa8
       devm_hwmon_device_register_with_info+0x74/0xb4
       lm90_probe+0x414/0x578
       i2c_device_probe+0x35c/0x384
       really_probe+0x290/0x3e4
       driver_probe_device+0x80/0x1c4
       __device_attach_driver+0x104/0x11c
       bus_for_each_drv+0xa4/0xc8
       __device_attach+0xf0/0x15c
       device_initial_probe+0x1c/0x20
       bus_probe_device+0xdc/0xec
       deferred_probe_work_func+0xa8/0xd4
       process_one_work+0x3dc/0x96c
       worker_thread+0x4ec/0x8bc
       kthread+0x230/0x240
       ret_from_fork+0x14/0x38
         (null)
      
      Freed by task 132:
       __kasan_slab_free+0x12c/0x200
       kasan_slab_free+0x14/0x18
       kfree+0x90/0x294
       hwmon_dev_release+0x1c/0x20
       device_release+0x4c/0xe8
       kobject_put+0xac/0x11c
       device_unregister+0x2c/0x30
       __hwmon_device_register+0xa58/0xa7c
       hwmon_device_register_with_info+0xa0/0xa8
       devm_hwmon_device_register_with_info+0x74/0xb4
       lm90_probe+0x414/0x578
       i2c_device_probe+0x35c/0x384
       really_probe+0x290/0x3e4
       driver_probe_device+0x80/0x1c4
       __device_attach_driver+0x104/0x11c
       bus_for_each_drv+0xa4/0xc8
       __device_attach+0xf0/0x15c
       device_initial_probe+0x1c/0x20
       bus_probe_device+0xdc/0xec
       deferred_probe_work_func+0xa8/0xd4
       process_one_work+0x3dc/0x96c
       worker_thread+0x4ec/0x8bc
       kthread+0x230/0x240
       ret_from_fork+0x14/0x38
         (null)
      
      Cc: <stable@vger.kernel.org> # v4.15+
      Fixes: 47c332de ("hwmon: Deal with errors from the thermal subsystem")
      Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a63fffbd
    • A
      mtd: docg3: don't set conflicting BCH_CONST_PARAMS option · 14e58f9d
      Arnd Bergmann committed
      commit be2e1c9dcf76886a83fb1c433a316e26d4ca2550 upstream.
      
      I noticed during the creation of another bugfix that the BCH_CONST_PARAMS
      option that is set by DOCG3 breaks setting variable parameters for any
      other users of the BCH library code.
      
      The only other user we have today is the MTD_NAND software BCH
      implementation (most flash controllers use hardware BCH these days
      and are not affected). I considered removing BCH_CONST_PARAMS entirely
      because of the inherent conflict, but according to the description in
      lib/bch.c there is a significant performance benefit in keeping it.
      
      To avoid the immediate problem of the conflict between MTD_NAND_BCH
      and DOCG3, this only sets the constant parameters if MTD_NAND_BCH
      is disabled, which should fix the problem for all cases that
      are affected. This should also work for all stable kernels.
      
      Note that there is only one machine that actually seems to use the
      DOCG3 driver (arch/arm/mach-pxa/mioa701.c), so most users should have
      the driver disabled, but the problem almost certainly shows up if we
      test random kernels on machines that use software BCH in MTD.
      
      Fixes: d13d19ec ("mtd: docg3: add ECC correction code")
      Cc: stable@vger.kernel.org
      Cc: Robert Jarzmik <robert.jarzmik@free.fr>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      14e58f9d
    • B
      mtd: nand: Fix nanddev_neraseblocks() · 9bec0c3d
      Boris Brezillon committed
      commit d098093ba06eb032057d1aca1c2e45889e099d00 upstream.
      
      nanddev_neraseblocks() currently returns the number of pages per LUN
      instead of the total number of eraseblocks.
      
      Fixes: 9c3736a3 ("mtd: nand: Add core infrastructure to deal with NAND devices")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
      Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9bec0c3d
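
The difference between the two quantities named in the nanddev_neraseblocks() commit above can be sketched as follows. This is only an illustration: the struct and its field names are hypothetical stand-ins for a NAND memory-organization description, not the kernel's actual definitions, and the two helpers simply spell out "pages per LUN" versus "total eraseblocks".

```c
#include <stdint.h>

/* Hypothetical description of a NAND layout (names are assumptions). */
struct memorg_sketch {
	unsigned int pages_per_eraseblock;
	unsigned int eraseblocks_per_lun;
	unsigned int luns_per_target;
	unsigned int ntargets;
};

/* Number of pages in one LUN: the quantity the commit says was returned. */
static uint64_t pages_per_lun(const struct memorg_sketch *m)
{
	return (uint64_t)m->pages_per_eraseblock * m->eraseblocks_per_lun;
}

/* Total number of eraseblocks across all targets and LUNs. */
static uint64_t total_eraseblocks(const struct memorg_sketch *m)
{
	return (uint64_t)m->ntargets * m->luns_per_target *
	       m->eraseblocks_per_lun;
}
```

For a layout with 64 pages per eraseblock, 1024 eraseblocks per LUN, 2 LUNs per target and 1 target, the first helper yields 65536 while the second yields 2048, so any caller iterating over eraseblocks would use a wildly wrong bound.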
    • C
      mtd: spi-nor: cadence-quadspi: Return error code in cqspi_direct_read_execute() · 9e9dd0f1
      Christophe JAILLET committed
      commit 91d7b67000c6e9bd605624079fee5a084238ad92 upstream.
      
      We return 0 unconditionally in 'cqspi_direct_read_execute()'.
      However, 'ret' is set to some error codes in several error handling
      paths.
      
      Return 'ret' instead to propagate the error code.
      
      Fixes: ffa639e0 ("mtd: spi-nor: cadence-quadspi: Add DMA support for direct mode reads")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9e9dd0f1
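
The error-propagation problem described in the cqspi_direct_read_execute() commit above follows a common pattern, sketched here in isolation. Everything in this block is hypothetical (the helper, the function names, the choice of -EIO); it is not the driver's code, only a minimal illustration of returning 0 unconditionally versus returning 'ret'.

```c
#include <errno.h>

/* Hypothetical stand-in for one transfer step; fails with -EIO on request. */
static int do_transfer_sketch(int should_fail)
{
	return should_fail ? -EIO : 0;
}

/* Pre-fix shape: 'ret' records the failure, but 0 is returned anyway. */
static int read_execute_buggy(int should_fail)
{
	int ret;

	ret = do_transfer_sketch(should_fail);
	if (ret)
		goto err_cleanup;

err_cleanup:
	/* resource release would go here (omitted in this sketch) */
	return 0;
}

/* Post-fix shape: the shared exit path returns 'ret', so errors propagate. */
static int read_execute_fixed(int should_fail)
{
	int ret;

	ret = do_transfer_sketch(should_fail);
	if (ret)
		goto err_cleanup;

err_cleanup:
	/* resource release would go here (omitted in this sketch) */
	return ret;
}
```

With the first variant a caller cannot distinguish a failed read from a successful one; with the second, the negative errno value reaches the caller unchanged.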
    • J
      bonding/802.3ad: fix link_failure_count tracking · 218b6e82
      Jarod Wilson committed
      commit ea53abfab960909d622ca37bcfb8e1c5378d21cc upstream.
      
      Commit 4d2c0cda ("bonding: speed/duplex update at NETDEV_UP event")
      set slave->link to BOND_LINK_DOWN for 802.3ad bonds whenever invalid
      speed/duplex values were read, to fix a problem with slaves getting
      into weird states. In the process, it broke tracking of link failures.
      Going straight to BOND_LINK_DOWN when a link is indeed down (cable
      pulled, switch rebooted) means we break out of bond_miimon_inspect()'s
      BOND_LINK_DOWN case because !link_state is already true, so we never
      increment commit and never get a chance to call bond_miimon_commit(),
      where slave->link_failure_count would be incremented. I believe the
      simple fix here is to mark the slave as BOND_LINK_FAIL and let
      bond_miimon_inspect() transition the link from _FAIL to either _UP or
      _DOWN; in the latter case, link_failure_count is properly incremented
      again.
      
      Fixes: 4d2c0cda ("bonding: speed/duplex update at NETDEV_UP event")
      CC: Mahesh Bandewar <maheshb@google.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: netdev@vger.kernel.org
      CC: stable@vger.kernel.org
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      218b6e82
    • A
      ARM: 8809/1: proc-v7: fix Thumb annotation of cpu_v7_hvc_switch_mm · 9333523b
      Ard Biesheuvel committed
      commit 6282e916f774e37845c65d1eae9f8c649004f033 upstream.
      
      Due to what appears to be a copy/paste error, the opening ENTRY()
      of cpu_v7_hvc_switch_mm() lacks a matching ENDPROC(), and instead,
      the one for cpu_v7_smc_switch_mm() is duplicated.
      
      Given that it is ENDPROC() that emits the Thumb annotation, the
      cpu_v7_hvc_switch_mm() routine will be called in ARM mode on a
      Thumb2 kernel, resulting in the following splat:
      
        Internal error: Oops - undefined instruction: 0 [#1] SMP THUMB2
        Modules linked in:
        CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc1-00030-g4d28ad89189d-dirty #488
        Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
        PC is at cpu_v7_hvc_switch_mm+0x12/0x18
        LR is at flush_old_exec+0x31b/0x570
        pc : [<c0316efe>]    lr : [<c04117c7>]    psr: 00000013
        sp : ee899e50  ip : 00000000  fp : 00000001
        r10: eda28f34  r9 : eda31800  r8 : c12470e0
        r7 : eda1fc00  r6 : eda53000  r5 : 00000000  r4 : ee88c000
        r3 : c0316eec  r2 : 00000001  r1 : eda53000  r0 : 6da6c000
        Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
      
      Note the 'ISA ARM' in the last line.
      
      Fix this by using the correct name in ENDPROC().
      
      Cc: <stable@vger.kernel.org>
      Fixes: 10115105 ("ARM: spectre-v2: add firmware based hardening")
      Reviewed-by: Dave Martin <Dave.Martin@arm.com>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9333523b
    • V
      netfilter: conntrack: fix calculation of next bucket number in early_drop · 1be1576a
      Vasily Khoruzhick committed
      commit f393808dc64149ccd0e5a8427505ba2974a59854 upstream.
      
      If there's no entry to drop in the bucket that corresponds to the hash,
      early_drop() should look for it in other buckets. But since it increments
      the hash instead of the bucket number, it actually looks in the same
      bucket 8 times: hsize is 16k by default (14 bits) and hash is a 32-bit
      value, so reciprocal_scale(hash, hsize) returns the same value for
      hash..hash+7 in most cases.
      
      Fix it by increasing bucket number instead of hash and rename _hash
      to bucket to avoid future confusion.
      
      Fixes: 3e86638e ("netfilter: conntrack: consider ct netns in early_drop logic")
      Cc: <stable@vger.kernel.org> # v4.7+
      Signed-off-by: Vasily Khoruzhick <vasilykh@arista.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1be1576a
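
The arithmetic behind the early_drop commit above can be checked with a small userspace sketch. The reciprocal_scale() below reproduces the multiply-and-shift mapping of a 32-bit value into [0, ep_ro); the two counting helpers and their names are hypothetical and exist only to contrast stepping the hash with stepping the bucket number. The wrap-around via modulo in the second helper is an assumption about how one might step buckets, not a quote of the kernel fix.

```c
#include <stdint.h>

/* Maps a 32-bit value into [0, ep_ro) using a multiply and a shift. */
static uint32_t reciprocal_scale(uint32_t val, uint32_t ep_ro)
{
	return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}

/* Pre-fix behaviour: step the hash, rescale each time, count bucket changes. */
static unsigned int buckets_seen_stepping_hash(uint32_t hash, uint32_t hsize,
					       unsigned int tries)
{
	unsigned int i, seen = 0;
	uint32_t prev = 0;

	for (i = 0; i < tries; i++) {
		uint32_t bucket = reciprocal_scale(hash + i, hsize);

		if (i == 0 || bucket != prev)
			seen++;
		prev = bucket;
	}
	return seen;
}

/* Post-fix behaviour: scale once, then step the bucket number itself. */
static unsigned int buckets_seen_stepping_bucket(uint32_t hash, uint32_t hsize,
						 unsigned int tries)
{
	unsigned int i, seen = 0;
	uint32_t bucket = reciprocal_scale(hash, hsize);

	for (i = 0; i < tries; i++) {
		seen++;				/* every step is a new bucket */
		bucket = (bucket + 1) % hsize;	/* wrap at the table size */
	}
	return seen;
}
```

With hsize = 16384 the scaling reduces to hash >> 18, so eight consecutive hash values land in one bucket unless they straddle a multiple of 2^18. For hash 0x12345678 the first helper reports a single bucket over 8 tries, while the second visits 8.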