1. 10 Mar, 2016 2 commits
    • P
      KVM: MMU: fix reserved bit check for ept=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 · 5f0b8199
      Paolo Bonzini authored
      KVM has special logic to handle pages with pte.u=1 and pte.w=0 when
      CR0.WP=1.  These pages' SPTEs flip continuously between two states:
      U=1/W=0 (user and supervisor reads allowed, supervisor writes not allowed)
      and U=0/W=1 (supervisor reads and writes allowed, user writes not allowed).
      
      When SMEP is in effect, however, U=0 will enable kernel execution of
      this page.  To avoid this, KVM also sets NX=1 in the shadow PTE together
      with U=0, making the two states U=1/W=0/NX=gpte.NX and U=0/W=1/NX=1.
      When guest EFER has the NX bit cleared, the reserved bit check thinks
      that the latter state is invalid; teach it that the smep_andnot_wp case
      will also use the NX bit of SPTEs.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      Fixes: c258b62b
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      5f0b8199
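
The reserved-bit rule described in this commit can be sketched as follows. This is a minimal model, not the kernel's code: it looks only at the NX bit, and the helper names `shadow_uses_nx`, `spte_reserved_mask` and `spte_has_reserved_bits` are hypothetical. The only facts taken from the commit message are that NX is treated as reserved when guest EFER.NX is clear, and that the smep_andnot_wp case must be exempt because KVM itself sets NX there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 63 is the NX (execute-disable) bit in x86 long-mode page table entries. */
#define SPTE_NX (1ULL << 63)

/* Hypothetical helper: can a shadow PTE legitimately carry NX=1? */
static bool shadow_uses_nx(bool guest_efer_nx, bool smep_andnot_wp)
{
    /* Either the guest enabled NX, or KVM sets NX itself for SMEP && !WP. */
    return guest_efer_nx || smep_andnot_wp;
}

/* Hypothetical helper: NX is only a reserved bit when nothing uses it. */
static uint64_t spte_reserved_mask(bool guest_efer_nx, bool smep_andnot_wp)
{
    return shadow_uses_nx(guest_efer_nx, smep_andnot_wp) ? 0 : SPTE_NX;
}

/* Simplified check: true if the SPTE has a bit set that counts as reserved. */
static bool spte_has_reserved_bits(uint64_t spte, bool guest_efer_nx,
                                   bool smep_andnot_wp)
{
    return (spte & spte_reserved_mask(guest_efer_nx, smep_andnot_wp)) != 0;
}
```

With guest EFER.NX=0, an SPTE in the U=0/W=1/NX=1 state is flagged as invalid unless `smep_andnot_wp` is passed as true, which mirrors the behaviour the commit message describes.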
    • P
      KVM: MMU: fix ept=0/pte.u=1/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo · 844a5fe2
      Paolo Bonzini authored
      Yes, all of these are needed. :) This is admittedly a bit odd, but
      kvm-unit-tests access.flat tests this if you run it with "-cpu host"
      and of course ept=0.
      
      KVM runs the guest with CR0.WP=1, so it must handle supervisor writes
      specially when pte.u=1/pte.w=0/CR0.WP=0.  Such writes cause a fault
      when U=1 and W=0 in the SPTE, but they must succeed because CR0.WP=0.
      When KVM gets the fault, it sets U=0 and W=1 in the shadow PTE and
      restarts execution.  This will still cause a user write to fault, while
      supervisor writes will succeed.  User reads will fault spuriously now,
      and KVM will then flip U and W again in the SPTE (U=1, W=0).  User reads
      will be enabled and supervisor writes disabled, going back to the
      original situation where supervisor writes fault spuriously.
      
      When SMEP is in effect, however, U=0 will enable kernel execution of
      this page.  To avoid this, KVM also sets NX=1 in the shadow PTE together
      with U=0.  If the guest has not enabled NX, the result is a continuous
      stream of page faults due to the NX bit being reserved.
      
      The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER
      switch.  (All machines with SMEP have the CPU_LOAD_IA32_EFER vm-entry
      control, so they do not use user-return notifiers for EFER---if they did,
      EFER.NX would be forced to the same value as the host).
      
      There is another bug in the reserved bit check, which I've split to a
      separate patch for easier application to stable kernels.
      
      Cc: stable@vger.kernel.org
      Cc: Andy Lutomirski <luto@amacapital.net>
      Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      Fixes: f6577a5f
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      844a5fe2
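
The two shadow PTE states that KVM flips between can be modelled roughly as below. This is an illustrative sketch only: `struct spte_bits` and both functions are hypothetical names, and leaving NX at the guest's value when SMEP is off is an assumption, since the commit message only describes the SMEP case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduced view of a shadow PTE: only the bits discussed above. */
struct spte_bits {
    bool u;   /* user-accessible */
    bool w;   /* writable */
    bool nx;  /* no-execute */
};

/* State after a user read fault: user reads work, supervisor writes fault. */
static struct spte_bits spte_user_readable(bool gpte_nx)
{
    struct spte_bits s = { .u = true, .w = false, .nx = gpte_nx };
    return s;
}

/*
 * State after a supervisor write fault: supervisor writes work, user
 * accesses fault. With SMEP, U=0 would let the kernel execute the page,
 * so NX is forced to 1. Without SMEP, NX is assumed to follow the guest PTE.
 */
static struct spte_bits spte_supervisor_writable(bool gpte_nx, bool smep)
{
    struct spte_bits s = { .u = false, .w = true, .nx = smep ? true : gpte_nx };
    return s;
}
```

The second state is the one that carries NX=1 even when the guest has EFER.NX=0, which is why the fix forces EFER.NX=1 while the guest runs.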
  2. 09 Mar, 2016 1 commit
    • D
      kvm: cap halt polling at exactly halt_poll_ns · 313f636d
      David Matlack authored
      When growing halt-polling, there is no check that the poll time exceeds
      the limit. It's possible for vcpu->halt_poll_ns to grow once past
      halt_poll_ns, and stay there until a halt which takes longer than
      vcpu->halt_poll_ns. For example, booting a Linux guest with
      halt_poll_ns=11000:
      
       ... kvm:kvm_halt_poll_ns: vcpu 0: halt_poll_ns 0 (shrink 10000)
       ... kvm:kvm_halt_poll_ns: vcpu 0: halt_poll_ns 10000 (grow 0)
       ... kvm:kvm_halt_poll_ns: vcpu 0: halt_poll_ns 20000 (grow 10000)
      Signed-off-by: David Matlack <dmatlack@google.com>
      Fixes: aca6ff29
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      313f636d
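
A capped grow step along the lines of this commit might look like the sketch below. The function name and signature are hypothetical; the 10000 ns starting value is taken from the trace above, and the multiplicative growth is an assumption consistent with the 10000 to 20000 step shown there.

```c
#include <assert.h>

/*
 * Hypothetical sketch: grow the per-vCPU poll window, never past `limit`.
 * `cur` is the current window in ns, `grow` the growth factor, and
 * `limit` plays the role of the halt_poll_ns module parameter.
 */
static unsigned int grow_halt_poll_ns(unsigned int cur, unsigned int grow,
                                      unsigned int limit)
{
    unsigned int val = cur;

    if (val == 0 && grow)
        val = 10000;        /* starting window seen in the trace */
    else
        val *= grow;        /* assumed multiplicative growth */

    if (val > limit)        /* the cap that the fix adds */
        val = limit;

    return val;
}
```

With `limit` set to 11000, the sequence from the trace would become 0, 10000, 11000 instead of overshooting to 20000.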
  3. 08 Mar, 2016 4 commits
    • D
      KVM: s390: correct fprs on SIGP (STOP AND) STORE STATUS · 9522b37f
      David Hildenbrand authored
      With MACHINE_HAS_VX, we convert the floating point registers from the
      vector registers when storing the status. For other VCPUs, these are
      stored to vcpu->run->s.regs.vrs, but we are using current->thread.fpu.vxrs,
      which resolves to the currently loaded VCPU.
      
      So kvm_s390_store_status_unloaded() currently writes the wrong floating
      point registers (converted from the vector registers) when called from
      another VCPU on a z13.
      
      This is only the case for old user space not handling SIGP STORE STATUS and
      SIGP STOP AND STORE STATUS, but relying on the kernel implementation. All
      other calls come from the loaded VCPU via kvm_s390_store_status().
      
      Fixes: 9abc2a08 (KVM: s390: fix memory overwrites when vx is disabled)
      Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: stable@vger.kernel.org # v4.4+
      Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      9522b37f
    • P
      Merge branch 'kvm-ppc-fixes' of... · 8bb9b9cc
      Paolo Bonzini authored
      Merge branch 'kvm-ppc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
      8bb9b9cc
    • R
      KVM: VMX: disable PEBS before a guest entry · 7099e2e1
      Radim Krčmář authored
      Linux guests on Haswell (and also SandyBridge and Broadwell, at least)
      would crash if you decided to run a host command that uses PEBS, like
        perf record -e 'cpu/mem-stores/pp' -a
      
      This happens because KVM is using VMX MSR switching to disable PEBS, but
      SDM [2015-12] 18.4.4.4 Re-configuring PEBS Facilities explains why it
      isn't safe:
        When software needs to reconfigure PEBS facilities, it should allow a
        quiescent period between stopping the prior event counting and setting
        up a new PEBS event. The quiescent period is to allow any latent
        residual PEBS records to complete its capture at their previously
        specified buffer address (provided by IA32_DS_AREA).
      
      There might not be a quiescent period after the MSR switch, so a CPU
      ends up using host's MSR_IA32_DS_AREA to access an area in guest's
      memory.  (Or MSR switching is just buggy on some models.)
      
      The guest can learn something about the host this way:
      If the guest doesn't map address pointed by MSR_IA32_DS_AREA, it results
      in #PF where we leak host's MSR_IA32_DS_AREA through CR2.
      
      After that, a malicious guest can map and configure memory where
      MSR_IA32_DS_AREA is pointing and can therefore get an output from
      host's tracing.
      
      This is not a critical leak as the host must initiate with PEBS tracing
      and I have not been able to get a record from more than one instruction
      before vmentry in vmx_vcpu_run() (that place has most registers already
      overwritten with guest's).
      
      We could disable PEBS just a few instructions before vmentry, but
      disabling it earlier shouldn't affect host tracing too much.
      We also don't need to switch MSR_IA32_PEBS_ENABLE on VMENTRY, but that
      optimization isn't worth its code, IMO.
      
      (If you are implementing PEBS for guests, be sure to handle the case
       where both host and guest enable PEBS, because this patch doesn't.)
      
      Fixes: 26a4f3c0 ("perf/x86: disable PEBS on a guest entry.")
      Cc: <stable@vger.kernel.org>
      Reported-by: Jiří Olša <jolsa@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      7099e2e1
    • P
      KVM: PPC: Book3S HV: Sanitize special-purpose register values on guest exit · ccec4456
      Paul Mackerras authored
      Thomas Huth discovered that a guest could cause a hard hang of a
      host CPU by setting the Instruction Authority Mask Register (IAMR)
      to a suitable value.  It turns out that this is because when the
      code was added to context-switch the new special-purpose registers
      (SPRs) that were added in POWER8, we forgot to add code to ensure
      that they were restored to a sane value on guest exit.
      
      This adds code to set those registers where a bad value could
      compromise the execution of the host kernel to a suitable neutral
      value on guest exit.
      
      Cc: stable@vger.kernel.org # v3.14+
      Fixes: b005255e
      Reported-by: Thomas Huth <thuth@redhat.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      ccec4456
  4. 07 Mar, 2016 10 commits
  5. 06 Mar, 2016 7 commits
  6. 05 Mar, 2016 16 commits
    • L
      Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · a7c9b603
      Linus Torvalds authored
      Pull libnvdimm fix from Dan Williams:
       "One straggling fix for NVDIMM support.
      
        The KVM/QEMU enabling for NVDIMMs has recently reached the point where
        it is able to accept some ACPI _DSM requests from a guest VM.  However
        they immediately found that the 4.5-rc kernel is unusable because the
        kernel's 'nfit' driver fails to load upon seeing a valid "not
        supported" response from the virtual BIOS for an address range scrub
        command.
      
        It is not mandatory that a platform implement address range scrubbing,
        so this fix from Vishal properly treats the 'not supported' response
        as 'skip scrubbing and continue loading the driver'"
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        nfit: Continue init even if ARS commands are unimplemented
      a7c9b603
    • L
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · c12f83c3
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Two fairly simple fixes.
      
        One is a regression with ipr firmware loading caused by one of the
        trivial patches in the last merge window which failed to strip the \n
        from the file name string, so now the firmware loader no longer works
        leading to a lot of unhappy ipr users; fix by stripping the \n.
      
        The second is a memory leak within SCSI: the BLKPREP_INVALID state
        was introduced by a recent fix but we forgot to account for it correctly
        when freeing state, resulting in memory leakage.  Add the correct
        state freeing in scsi_prep_return()"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        ipr: Fix regression when loading firmware
        SCSI: Free resources when we return BLKPREP_INVALID
      c12f83c3
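
The ipr regression above comes down to a file name that still ends in "\n" when it reaches the firmware loader. A minimal sketch of stripping it, using a hypothetical helper name rather than the driver's actual code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: drop a single trailing newline, in place, from a
 * file name (for example one written through a sysfs attribute).
 */
static void strip_trailing_newline(char *name)
{
    size_t len;

    assert(name != NULL);
    len = strlen(name);
    if (len > 0 && name[len - 1] == '\n')
        name[len - 1] = '\0';
}
```

A name without a trailing newline, or an empty string, is left untouched.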
    • L
      Merge branch 'for-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata · fab3e94a
      Linus Torvalds authored
      Pull libata fixes from Tejun Heo:
       "Assorted fixes for libata drivers.
      
         - Turns out HDIO_GET_32BIT ioctl was subtly broken all along.
      
         - Recent update to ahci external port handling was incorrectly
           marking hotpluggable ports as external making userland handle
           devices connected to those ports incorrectly.
      
         - ahci_xgene needs its own irq handler to work around a hardware
           erratum.  libahci updated to allow irq handler override.
      
         - Misc driver specific updates"
      
      * 'for-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
        ata: ahci: don't mark HotPlugCapable Ports as external/removable
        ahci: Workaround for ThunderX Errata#22536
        libata: Align ata_device's id on a cacheline
        Adding Intel Lewisburg device IDs for SATA
        pata-rb532-cf: get rid of the irq_to_gpio() call
        libata: fix HDIO_GET_32BIT ioctl
        ahci_xgene: Implement the workaround to fix the missing of the edge interrupt for the HOST_IRQ_STAT.
        ata: Remove the AHCI_HFLAG_EDGE_IRQ support from libahci.
        libahci: Implement the capability to override the generic ahci interrupt handler.
      fab3e94a
    • L
      Merge branch 'for-linus2' of git://git.kernel.dk/linux-block · e5322c54
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "Round 2 of this.  I cut back to the bare necessities, the patch is
        still larger than it usually would be at this time, due to the number
        of NVMe fixes in there.  This pull request contains:
      
         - The 4 core fixes from Ming, that fix both problems with exceeding
           the virtual boundary limit in case of merging, and the gap checking
           for cloned bio's.
      
         - NVMe fixes from Keith and Christoph:
      
              - Regression on larger user commands, causing problems with
                reading log pages (for instance). This touches both NVMe,
                and the block core since that is now generally utilized also
                for these types of commands.
      
              - Hot removal fixes.
      
              - User exploitable issue with passthrough IO commands, if !length
                is given, causing us to fault on writing to the zero
                page.
      
              - Fix for a hang under error conditions
      
         - And finally, the current series regression for umount with cgroup
           writeback, where the final flush would happen async and hence open
           up a window after umount where the device wasn't consistent.  fsck
           right after umount would show this.  From Tejun"
      
      * 'for-linus2' of git://git.kernel.dk/linux-block:
        block: support large requests in blk_rq_map_user_iov
        block: fix blk_rq_get_max_sectors for driver private requests
        nvme: fix max_segments integer truncation
        nvme: set queue limits for the admin queue
        writeback: flush inode cgroup wb switches instead of pinning super_block
        NVMe: Fix 0-length integrity payload
        NVMe: Don't allow unsupported flags
        NVMe: Move error handling to failed reset handler
        NVMe: Simplify device reset failure
        NVMe: Fix namespace removal deadlock
        NVMe: Use IDA for namespace disk naming
        NVMe: Don't unmap controller registers on reset
        block: merge: get the 1st and last bvec via helpers
        block: get the 1st and last bvec via helpers
        block: check virt boundary in bio_will_gap()
        block: bio: introduce helpers to get the 1st and last bvec
      e5322c54
    • L
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma · bdf9d297
      Linus Torvalds authored
      Pull rdma fixes from Doug Ledford:
       "Additional 4.5-rc6 fixes.
      
        I have four patches today.  I had previously thought I had submitted
        two of them last week, but they were accidentally skipped :-(.
      
         - One fix to an error path in the core
         - One fix for RoCE in the core
         - Two related fixes for the core/mlx5"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
        IB/core: Use GRH when the path hop-limit > 0
        IB/{core, mlx5}: Fix input len in vendor part of create_qp/srq
        IB/mlx5: Avoid using user-index for SRQs
        IB/core: Fix missed clean call in registration path
      bdf9d297
    • L
      Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 638c201e
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "This contains one i915 patch twice, as I merged it locally for
        testing, and then pulled some stuff in on top, and then Jani sent to
        me, I didn't think it was worth redoing all the merges of what I had
        tested.
      
        Summary:
      
         - amdgpu/radeon fixes for some more power management and VM races.
      
         - Two i915 fixes, one for a recent regression, the other a power
           management fix for Skylake.
      
         - Two tegra dma mask fixes for a regression.
      
         - One ast fix for a typo I made transcribing the userspace driver,
           that I'd like to get into stable so I don't forget about it"
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
        gpu: host1x: Set DMA ops on device creation
        gpu: host1x: Set DMA mask
        drm/amdgpu: return from atombios_dp_get_dpcd only when error
        drm/amdgpu/cz: remove commented out call to enable vce pg
        drm/amdgpu/powerplay/cz: enable/disable vce dpm independent of vce pg
        drm/amdgpu/cz: enable/disable vce dpm even if vce pg is disabled
        drm/amdgpu/gfx8: specify which engine to wait before vm flush
        drm/amdgpu: apply gfx_v8 fixes to gfx_v7 as well
        drm/amd/powerplay: send event to notify powerplay all modules are initialized.
        drm/amd/powerplay: export AMD_PP_EVENT_COMPLETE_INIT task to amdgpu.
        drm/radeon/pm: update current crtc info after setting the powerstate
        drm/amdgpu/pm: update current crtc info after setting the powerstate
        drm/i915: Balance assert_rpm_wakelock_held() for !IS_ENABLED(CONFIG_PM)
        drm/i915/skl: Fix power domain suspend sequence
        drm/ast: Fix incorrect register check for DRAM width
        drm/i915: Balance assert_rpm_wakelock_held() for !IS_ENABLED(CONFIG_PM)
      638c201e
    • L
      Merge tag 'pm+acpi-4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · b80e8e28
      Linus Torvalds authored
      Pull power management and ACPI fixes from Rafael Wysocki:
       "Two build fixes for cpufreq drivers (including one for breakage
        introduced recently) and a fix for a graph tracer crash when used over
        suspend-to-RAM on x86.
      
        Specifics:
      
         - Prevent the graph tracer from crashing when used over suspend-to-
           RAM on x86 by pausing it before invoking do_suspend_lowlevel() and
           un-pausing it when that function has returned (Todd Brandt).
      
         - Fix build issues in the qoriq and mediatek cpufreq drivers related
           to broken dependencies on THERMAL (Arnd Bergmann)"
      
      * tag 'pm+acpi-4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        PM / sleep / x86: Fix crash on graph trace through x86 suspend
        cpufreq: mediatek: allow building as a module
        cpufreq: qoriq: allow building as module with THERMAL=m
      b80e8e28
    • L
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · ed385c7a
      Linus Torvalds authored
      Pull arm64 fix from Will Deacon:
       "Arm64 fix for -rc7.  Without it, our struct page array can overflow
        the vmemmap region on systems with a large PHYS_OFFSET.
      
        Nothing else on the radar at the moment, so hopefully that's it for
        4.5 from us.
      
        Summary: Ensure struct page array fits within vmemmap area"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: vmemmap: use virtual projection of linear region
      ed385c7a
    • L
      Merge tag 'for-linus-20160304' of git://git.infradead.org/linux-mtd · c51797d2
      Linus Torvalds authored
      Pull jffs2 fixes from David Woodhouse:
       "This contains two important JFFS2 fixes marked for stable:
      
         - a lock ordering problem between the page lock and the internal
           f->sem mutex, which was causing occasional deadlocks in garbage
           collection
      
         - a scan failure causing moved directories to sometimes end up
           appearing to have hard links.
      
        There are also a couple of trivial MAINTAINERS file updates"
      
      * tag 'for-linus-20160304' of git://git.infradead.org/linux-mtd:
        MAINTAINERS: add maintainer entry for FREESCALE GPMI NAND driver
        Fix directory hardlinks from deleted directories
        jffs2: Fix page lock / f->sem deadlock
        Revert "jffs2: Fix lock acquisition order bug in jffs2_write_begin"
        MAINTAINERS: update Han's email
      c51797d2
    • L
      Merge branch 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · 2cdcb2b5
      Linus Torvalds authored
      Pull btrfs fix from Chris Mason:
       "Filipe nailed down a problem where tree log replay would do some work
        that orphan code wasn't expecting to be done yet, leading to BUG_ON"
      
      * 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        Btrfs: fix loading of orphan roots leading to BUG_ON
      2cdcb2b5
    • L
      Merge tag 'trace-fixes-v4.5-rc6' of... · 78baab7a
      Linus Torvalds authored
      Merge tag 'trace-fixes-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
      
      Pull tracing fix from Steven Rostedt:
       "A feature was added in 4.3 that allowed users to filter trace points
        on a task's "comm" field.  But this prevented filtering on a comm field
        that is within a trace event (like sched_migrate_task).
      
        When trying to filter on when a program migrated, this change
        prevented the filtering of the sched_migrate_task.
      
        To fix this, the event fields are examined first, and then the extra
        fields like "comm" and "cpu" are examined.  Also, instead of testing
        to assign the comm filter function based on the field's name, the
        generic comm field is given a new filter type (FILTER_COMM).  When
        this field is used to filter the type is checked.  The same is done
        for the cpu filter field.
      
        Two new special filter types are added: "COMM" and "CPU".  This allows
        users to still filter the task's comm for events that have "comm" as
        one of their fields, in cases that users would like to filter
        sched_migrate_task on the comm of the task that called the event, and
        not the comm of the task that is being migrated"
      
      * tag 'trace-fixes-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing: Do not have 'comm' filter override event 'comm' field
      78baab7a
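
The lookup order described in this pull request (event fields first, then the generic "comm" and "cpu" fields, with "COMM" and "CPU" always meaning the generic ones) can be sketched as below. Only `FILTER_COMM` is named in the message above; `FILTER_CPU`, `FILTER_EVENT_FIELD`, `FILTER_NONE` and `resolve_filter_field` are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum filter_type {
    FILTER_NONE,         /* hypothetical: no such field */
    FILTER_EVENT_FIELD,  /* hypothetical: field defined by the trace event */
    FILTER_COMM,         /* comm of the task that triggered the event */
    FILTER_CPU           /* assumed counterpart for the generic cpu field */
};

/* Resolve a filter field name: the event's own fields take priority. */
static enum filter_type resolve_filter_field(const char *const *event_fields,
                                             size_t nr_fields,
                                             const char *name)
{
    size_t i;

    assert(name != NULL);

    /* 1. Fields that belong to the event itself win. */
    for (i = 0; i < nr_fields; i++)
        if (strcmp(event_fields[i], name) == 0)
            return FILTER_EVENT_FIELD;

    /* 2. Generic fields; the upper-case spellings always land here. */
    if (strcmp(name, "comm") == 0 || strcmp(name, "COMM") == 0)
        return FILTER_COMM;
    if (strcmp(name, "cpu") == 0 || strcmp(name, "CPU") == 0)
        return FILTER_CPU;

    return FILTER_NONE;
}
```

For an event such as sched_migrate_task that has its own "comm" field, "comm" resolves to the event field while "COMM" still selects the comm of the calling task.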
    • V
      nfit: Continue init even if ARS commands are unimplemented · 6e2452df
      Vishal Verma authored
      If firmware doesn't implement any of the ARS commands, take that to
      mean that ARS is unsupported, and continue to initialize regions without
      bad block lists. We cannot make the assumption that ARS commands will be
      unconditionally supported on all NVDIMMs.
      Reported-by: Haozhong Zhang <haozhong.zhang@intel.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Acked-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      Tested-by: Haozhong Zhang <haozhong.zhang@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      6e2452df
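
The behaviour this commit describes, treating "not supported" as a reason to skip scrubbing rather than to abort, can be sketched as follows. Every name here (`enum ars_status`, `struct region_state`, `init_region`) is hypothetical, and failing on other errors is an assumption that the commit message does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of querying address range scrub (ARS) support. */
enum ars_status { ARS_OK, ARS_NOT_SUPPORTED, ARS_ERROR };

/* Hypothetical per-region result of initialization. */
struct region_state {
    bool initialized;
    bool has_badblocks;   /* true only if ARS produced a bad block list */
};

/* Returns 0 on success, -1 on failure (assumed for genuine errors only). */
static int init_region(enum ars_status status, struct region_state *r)
{
    assert(r != NULL);
    r->initialized = false;
    r->has_badblocks = false;

    if (status == ARS_ERROR)
        return -1;

    /* "Not supported" is valid: continue without a bad block list. */
    r->has_badblocks = (status == ARS_OK);
    r->initialized = true;
    return 0;
}
```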
    • M
      ARM: 8544/1: set_memory_xx fixes · f474c8c8
      Mika Penttilä authored
      Allow zero size updates. This makes set_memory_xx() consistent with x86, s390 and arm64 and keeps apply_to_page_range() from hitting BUG() when loading modules.
      
      Signed-off-by: Mika Penttilä <mika.penttila@nextfour.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      f474c8c8
    • D
      Merge tag 'drm/tegra/for-4.5-rc7' of git://anongit.freedesktop.org/tegra/linux into drm-fixes · 26bae5e0
      Dave Airlie authored
      drm/tegra: Fixes for v4.5-rc7
      
      Two small fixes that restore PRIME support.
      
      * tag 'drm/tegra/for-4.5-rc7' of git://anongit.freedesktop.org/tegra/linux:
        gpu: host1x: Set DMA ops on device creation
        gpu: host1x: Set DMA mask
      26bae5e0
    • M
      MIPS: traps: Fix SIGFPE information leak from `do_ov' and `do_trap_or_bp' · e723e3f7
      Maciej W. Rozycki authored
      Avoid sending a partially initialised `siginfo_t' structure along SIGFPE
      signals issued from `do_ov' and `do_trap_or_bp', leading to information
      leaking from the kernel stack.
      Signed-off-by: Maciej W. Rozycki <macro@imgtec.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      e723e3f7
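
The general remedy for this class of leak is to clear the whole structure before filling in individual members. The sketch below shows that pattern in user-space C with the POSIX `siginfo_t`; the function name is hypothetical and this is not the MIPS trap handler code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose siginfo_t and FPE_* codes */
#endif

#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: build SIGFPE information for an integer overflow.
 * Zeroing first means members and padding that are never assigned cannot
 * carry stale stack contents to the receiver.
 */
static void fill_sigfpe_overflow(siginfo_t *info, void *fault_addr)
{
    assert(info != NULL);
    memset(info, 0, sizeof(*info));
    info->si_signo = SIGFPE;
    info->si_code = FPE_INTOVF;
    info->si_addr = fault_addr;
}
```

Members that are not set explicitly, such as `si_errno`, read back as zero instead of whatever happened to be on the stack.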
    • R
      Merge branches 'pm-cpufreq-fixes' and 'pm-sleep-fixes' · bfc6b97d
      Rafael J. Wysocki authored
      * pm-cpufreq-fixes:
        cpufreq: mediatek: allow building as a module
        cpufreq: qoriq: allow building as module with THERMAL=m
      
      * pm-sleep-fixes:
        PM / sleep / x86: Fix crash on graph trace through x86 suspend
      bfc6b97d