1. 08 2月, 2017 8 次提交
  2. 04 2月, 2017 1 次提交
    • W
      tick/broadcast: Reduce lock cacheline contention · 668802c2
      Waiman Long 提交于
      It was observed that on an Intel x86 system without the ARAT (Always
      running APIC timer) feature and with fairly large number of CPUs as
      well as CPUs coming in and out of intel_idle frequently, the lock
      contention on the tick_broadcast_lock can become significant.
      
      To reduce contention, the lock is put into its own cacheline and all
      the cpumask_var_t variables are put into the __read_mostly section.
      
      Running the SP benchmark of the NAS Parallel Benchmarks on a 4-socket
      16-core 32-thread Nehalam system, the performance number improved
      from 3353.94 Mop/s to 3469.31 Mop/s when this patch was applied on
      a 4.9.6 kernel.  This is a 3.4% improvement.
      Signed-off-by: NWaiman Long <longman@redhat.com>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1485799063-20857-1-git-send-email-longman@redhat.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      668802c2
  3. 30 1月, 2017 1 次提交
  4. 28 1月, 2017 1 次提交
  5. 22 1月, 2017 1 次提交
    • J
      x86/timer: Make delay() work during early bootup · 4c45c516
      Jiri Slaby 提交于
      When a panic happens during bootup, "Rebooting in X seconds.." is
      shown, but reboot happens immediatelly. It is because panic() uses mdelay()
      and mdelay() calls __const_udelay() immediately, which does not
      work while booting.
      
      The per_cpu cpu_info.loops_per_jiffy value is not initialized yet, so
      __const_udelay() actually multiplies the number of loops by zero. This
      results in __const_udelay() to delay the execution only by a nanosecond
      or so.
      
      So check whether cpu_info.loops_per_jiffy is zero and use
      loops_per_jiffy in that case. mdelay() will not be so precise without
      proper calibration, but it works relatively well.
      
      Before:
      
        [    0.170039] delaying 100ms
        [    0.170828] done
      
      After
      
        [    0.214042] delaying 100ms
        [    0.313974] done
      
      I do not think the added check matters given we are about to spin the
      processor in the next few hundred cycles.
      Signed-off-by: NJiri Slaby <jslaby@suse.cz>
      Reviewed-by: NAndy Shevchenko <andy.shevchenko@gmail.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20170119114730.2670-1-jslaby@suse.cz
      [ Minor edits. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      4c45c516
  6. 21 1月, 2017 1 次提交
    • R
      delay: Add explanation of udelay() inaccuracy · 9f819798
      Russell King 提交于
      There seems to be some misunderstanding that udelay() and friends will
      always guarantee the specified delay.  This is a false understanding.
      When udelay() is based on CPU cycles, it can return early for many
      reasons which are detailed by Linus' reply to me in a thread in 2011:
      
        http://lists.openwall.net/linux-kernel/2011/01/12/372
      
      However, a udelay test module was created in 2014 which allows udelay()
      to only be 0.5% fast, which is outside of the CPU-cycles udelay()
      results I measured back in 2011, which were deemed to be in the "we
      don't care" region.
      
      test_udelay() should be fixed to reflect the real allowable tolerance
      on udelay(), rather than 0.5%.
      
      Cc: David Riley <davidriley@chromium.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      9f819798
  7. 20 1月, 2017 1 次提交
  8. 14 1月, 2017 8 次提交
    • P
      math64, timers: Fix 32bit mul_u64_u32_shr() and friends · 9e3d6223
      Peter Zijlstra 提交于
      It turns out that while GCC-4.4 manages to generate 32x32->64 mult
      instructions for the 32bit mul_u64_u32_shr() code, any GCC after that
      fails horribly.
      
      Fix this by providing an explicit mul_u32_u32() function which can be
      architcture provided.
      Reported-by: NChris Metcalf <cmetcalf@mellanox.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
      Cc: Christopher S. Hall <christopher.s.hall@intel.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Laurent Vivier <lvivier@redhat.com>
      Cc: Liav Rehana <liavr@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Parit Bhargava <prarit@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20161209083011.GD15765@worktop.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9e3d6223
    • L
      Merge branch 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · e96f8f18
      Linus Torvalds 提交于
      Pull btrfs fixes from Chris Mason:
       "These are all over the place.
      
        The tracepoint part of the pull fixes a crash and adds a little more
        information to two tracepoints, while the rest are good old fashioned
        fixes"
      
      * 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        btrfs: make tracepoint format strings more compact
        Btrfs: add truncated_len for ordered extent tracepoints
        Btrfs: add 'inode' for extent map tracepoint
        btrfs: fix crash when tracepoint arguments are freed by wq callbacks
        Btrfs: adjust outstanding_extents counter properly when dio write is split
        Btrfs: fix lockdep warning about log_mutex
        Btrfs: use down_read_nested to make lockdep silent
        btrfs: fix locking when we put back a delayed ref that's too new
        btrfs: fix error handling when run_delayed_extent_op fails
        btrfs: return the actual error value from  from btrfs_uuid_tree_iterate
      e96f8f18
    • L
      Merge tag 'ceph-for-4.10-rc4' of git://github.com/ceph/ceph-client · 04e39627
      Linus Torvalds 提交于
      Pull ceph fixes from Ilya Dryomov:
       "Two small fixups for the filesystem changes that went into this merge
        window"
      
      * tag 'ceph-for-4.10-rc4' of git://github.com/ceph/ceph-client:
        ceph: fix get_oldest_context()
        ceph: fix mds cluster availability check
      04e39627
    • L
      Merge tag 'vfio-v4.10-rc4' of git://github.com/awilliam/linux-vfio · af54efa4
      Linus Torvalds 提交于
      Pull VFIO fixes from Alex Williamson:
      
       - Cleanups and bug fixes for the mtty sample driver (Dan Carpenter)
      
       - Export and make use of has_capability() to fix incorrect use of
         ns_capable() for testing task capabilities (Jike Song)
      
      * tag 'vfio-v4.10-rc4' of git://github.com/awilliam/linux-vfio:
        vfio/type1: Remove pid_namespace.h include
        vfio iommu type1: fix the testing of capability for remote task
        capability: export has_capability
        vfio-mdev: remove some dead code
        vfio-mdev: buffer overflow in ioctl()
        vfio-mdev: return -EFAULT if copy_to_user() fails
      af54efa4
    • L
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 406732c9
      Linus Torvalds 提交于
      Pull KVM fixes from Paolo Bonzini:
      
       - fix for module unload vs deferred jump labels (note: there might be
         other buggy modules!)
      
       - two NULL pointer dereferences from syzkaller
      
       - also syzkaller: fix emulation of fxsave/fxrstor/sgdt/sidt, problem
         made worse during this merge window, "just" kernel memory leak on
         releases
      
       - fix emulation of "mov ss" - somewhat serious on AMD, less so on Intel
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86: fix emulation of "MOV SS, null selector"
        KVM: x86: fix NULL deref in vcpu_scan_ioapic
        KVM: eventfd: fix NULL deref irqbypass consumer
        KVM: x86: Introduce segmented_write_std
        KVM: x86: flush pending lapic jump label updates on module unload
        jump_labels: API for flushing deferred jump label updates
      406732c9
    • L
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · a65c9259
      Linus Torvalds 提交于
      Pull arm64 fixes from Catalin Marinas:
      
       - Fix huge_ptep_set_access_flags() to return "changed" when any of the
         ptes in the contiguous range is changed, not just the last one
      
       - Fix the adr_l assembly macro to work in modules under KASLR
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: assembler: make adr_l work in modules under KASLR
        arm64: hugetlb: fix the wrong return value for huge_ptep_set_access_flags
      a65c9259
    • L
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · c79d47f1
      Linus Torvalds 提交于
      Pull SCSI fixes from James Bottomley:
       "The major fix is the bfa firmware, since the latest 10Gb cards fail
        probing with the current firmware.
      
        The rest is a set of minor fixes: one missed Kconfig dependency
        causing randconfig failures, a missed error return on an error leg, a
        change for how multiqueue waits on a blocked device and a don't reset
        while in reset fix"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: bfa: Increase requested firmware version to 3.2.5.1
        scsi: snic: Return error code on memory allocation failure
        scsi: fnic: Avoid sending reset to firmware when another reset is in progress
        scsi: qedi: fix build, depends on UIO
        scsi: scsi-mq: Wait for .queue_rq() if necessary
      c79d47f1
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 6d90b4f9
      Linus Torvalds 提交于
      Pull input updates from Dmitry Torokhov:
       "Small driver fixups"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: elants_i2c - avoid divide by 0 errors on bad touchscreen data
        Input: adxl34x - make it enumerable in ACPI environment
        Input: ALPS - fix TrackStick Y axis handling for SS5 hardware
        Input: synaptics-rmi4 - fix F03 build error when serio is module
        Input: xpad - use correct product id for x360w controllers
        Input: synaptics_i2c - change msleep to usleep_range for small msecs
        Input: i8042 - add Pegatron touchpad to noloop table
        Input: joydev - remove unused linux/miscdevice.h include
      6d90b4f9
  9. 13 1月, 2017 12 次提交
  10. 12 1月, 2017 6 次提交
    • P
      KVM: x86: fix emulation of "MOV SS, null selector" · 33ab9110
      Paolo Bonzini 提交于
      This is CVE-2017-2583.  On Intel this causes a failed vmentry because
      SS's type is neither 3 nor 7 (even though the manual says this check is
      only done for usable SS, and the dmesg splat says that SS is unusable!).
      On AMD it's worse: svm.c is confused and sets CPL to 0 in the vmcb.
      
      The fix fabricates a data segment descriptor when SS is set to a null
      selector, so that CPL and SS.DPL are set correctly in the VMCS/vmcb.
      Furthermore, only allow setting SS to a NULL selector if SS.RPL < 3;
      this in turn ensures CPL < 3 because RPL must be equal to CPL.
      
      Thanks to Andy Lutomirski and Willy Tarreau for help in analyzing
      the bug and deciphering the manuals.
      Reported-by: NXiaohan Zhang <zhangxiaohan1@huawei.com>
      Fixes: 79d5b4c3
      Cc: stable@nongnu.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      33ab9110
    • J
      capability: export has_capability · 19c816e8
      Jike Song 提交于
      has_capability() is sometimes needed by modules to test capability
      for specified task other than current, so export it.
      
      Cc: Kirti Wankhede <kwankhede@nvidia.com>
      Signed-off-by: NJike Song <jike.song@intel.com>
      Acked-by: NSerge Hallyn <serge@hallyn.com>
      Acked-by: NJames Morris <james.l.morris@oracle.com>
      Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
      19c816e8
    • W
      KVM: x86: fix NULL deref in vcpu_scan_ioapic · 546d87e5
      Wanpeng Li 提交于
      Reported by syzkaller:
      
          BUG: unable to handle kernel NULL pointer dereference at 00000000000001b0
          IP: _raw_spin_lock+0xc/0x30
          PGD 3e28eb067
          PUD 3f0ac6067
          PMD 0
          Oops: 0002 [#1] SMP
          CPU: 0 PID: 2431 Comm: test Tainted: G           OE   4.10.0-rc1+ #3
          Call Trace:
           ? kvm_ioapic_scan_entry+0x3e/0x110 [kvm]
           kvm_arch_vcpu_ioctl_run+0x10a8/0x15f0 [kvm]
           ? pick_next_task_fair+0xe1/0x4e0
           ? kvm_arch_vcpu_load+0xea/0x260 [kvm]
           kvm_vcpu_ioctl+0x33a/0x600 [kvm]
           ? hrtimer_try_to_cancel+0x29/0x130
           ? do_nanosleep+0x97/0xf0
           do_vfs_ioctl+0xa1/0x5d0
           ? __hrtimer_init+0x90/0x90
           ? do_nanosleep+0x5b/0xf0
           SyS_ioctl+0x79/0x90
           do_syscall_64+0x6e/0x180
           entry_SYSCALL64_slow_path+0x25/0x25
          RIP: _raw_spin_lock+0xc/0x30 RSP: ffffa43688973cc0
      
      The syzkaller folks reported a NULL pointer dereference due to
      ENABLE_CAP succeeding even without an irqchip.  The Hyper-V
      synthetic interrupt controller is activated, resulting in a
      wrong request to rescan the ioapic and a NULL pointer dereference.
      
          #include <sys/ioctl.h>
          #include <sys/mman.h>
          #include <sys/types.h>
          #include <linux/kvm.h>
          #include <pthread.h>
          #include <stddef.h>
          #include <stdint.h>
          #include <stdlib.h>
          #include <string.h>
          #include <unistd.h>
      
          #ifndef KVM_CAP_HYPERV_SYNIC
          #define KVM_CAP_HYPERV_SYNIC 123
          #endif
      
          void* thr(void* arg)
          {
      	struct kvm_enable_cap cap;
      	cap.flags = 0;
      	cap.cap = KVM_CAP_HYPERV_SYNIC;
      	ioctl((long)arg, KVM_ENABLE_CAP, &cap);
      	return 0;
          }
      
          int main()
          {
      	void *host_mem = mmap(0, 0x1000, PROT_READ|PROT_WRITE,
      			MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
      	int kvmfd = open("/dev/kvm", 0);
      	int vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0);
      	struct kvm_userspace_memory_region memreg;
      	memreg.slot = 0;
      	memreg.flags = 0;
      	memreg.guest_phys_addr = 0;
      	memreg.memory_size = 0x1000;
      	memreg.userspace_addr = (unsigned long)host_mem;
      	host_mem[0] = 0xf4;
      	ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &memreg);
      	int cpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0);
      	struct kvm_sregs sregs;
      	ioctl(cpufd, KVM_GET_SREGS, &sregs);
      	sregs.cr0 = 0;
      	sregs.cr4 = 0;
      	sregs.efer = 0;
      	sregs.cs.selector = 0;
      	sregs.cs.base = 0;
      	ioctl(cpufd, KVM_SET_SREGS, &sregs);
      	struct kvm_regs regs = { .rflags = 2 };
      	ioctl(cpufd, KVM_SET_REGS, &regs);
      	ioctl(vmfd, KVM_CREATE_IRQCHIP, 0);
      	pthread_t th;
      	pthread_create(&th, 0, thr, (void*)(long)cpufd);
      	usleep(rand() % 10000);
      	ioctl(cpufd, KVM_RUN, 0);
      	pthread_join(th, 0);
      	return 0;
          }
      
      This patch fixes it by failing ENABLE_CAP if without an irqchip.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Fixes: 5c919412 (kvm/x86: Hyper-V synthetic interrupt controller)
      Cc: stable@vger.kernel.org # 4.5+
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      546d87e5
    • W
      KVM: eventfd: fix NULL deref irqbypass consumer · 4f3dbdf4
      Wanpeng Li 提交于
      Reported syzkaller:
      
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
          IP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
          PGD 0
      
          Oops: 0002 [#1] SMP
          CPU: 1 PID: 125 Comm: kworker/1:1 Not tainted 4.9.0+ #1
          Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm]
          task: ffff9bbe0dfbb900 task.stack: ffffb61802014000
          RIP: 0010:irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
          Call Trace:
           irqfd_shutdown+0x66/0xa0 [kvm]
           process_one_work+0x16b/0x480
           worker_thread+0x4b/0x500
           kthread+0x101/0x140
           ? process_one_work+0x480/0x480
           ? kthread_create_on_node+0x60/0x60
           ret_from_fork+0x25/0x30
          RIP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] RSP: ffffb61802017e20
          CR2: 0000000000000008
      
      The syzkaller folks reported a NULL pointer dereference that due to
      unregister an consumer which fails registration before. The syzkaller
      creates two VMs w/ an equal eventfd occasionally. So the second VM
      fails to register an irqbypass consumer. It will make irqfd as inactive
      and queue an workqueue work to shutdown irqfd and unregister the irqbypass
      consumer when eventfd is closed. However, the second consumer has been
      initialized though it fails registration. So the token(same as the first
      VM's) is taken to unregister the consumer through the workqueue, the
      consumer of the first VM is found and unregistered, then NULL deref incurred
      in the path of deleting consumer from the consumers list.
      
      This patch fixes it by making irq_bypass_register/unregister_consumer()
      looks for the consumer entry based on consumer pointer itself instead of
      token matching.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Suggested-by: NAlex Williamson <alex.williamson@redhat.com>
      Cc: stable@vger.kernel.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      4f3dbdf4
    • S
      KVM: x86: Introduce segmented_write_std · 129a72a0
      Steve Rutherford 提交于
      Introduces segemented_write_std.
      
      Switches from emulated reads/writes to standard read/writes in fxsave,
      fxrstor, sgdt, and sidt.  This fixes CVE-2017-2584, a longstanding
      kernel memory leak.
      
      Since commit 283c95d0 ("KVM: x86: emulate FXSAVE and FXRSTOR",
      2016-11-09), which is luckily not yet in any final release, this would
      also be an exploitable kernel memory *write*!
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: stable@vger.kernel.org
      Fixes: 96051572
      Fixes: 283c95d0Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NSteve Rutherford <srutherford@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      129a72a0
    • D
      KVM: x86: flush pending lapic jump label updates on module unload · cef84c30
      David Matlack 提交于
      KVM's lapic emulation uses static_key_deferred (apic_{hw,sw}_disabled).
      These are implemented with delayed_work structs which can still be
      pending when the KVM module is unloaded. We've seen this cause kernel
      panics when the kvm_intel module is quickly reloaded.
      
      Use the new static_key_deferred_flush() API to flush pending updates on
      module unload.
      Signed-off-by: NDavid Matlack <dmatlack@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      cef84c30