1. 20 Apr, 2019 7 commits
  2. 17 Apr, 2019 33 commits
    • G
      Linux 4.19.35 · 4b0e041c
      Greg Kroah-Hartman committed
      4b0e041c
    • M
      KVM: x86: nVMX: fix x2APIC VTPR read intercept · 59bf185a
      Marc Orr committed
      commit c73f4c998e1fd4249b9edfa39e23f4fda2b9b041 upstream.
      
      Referring to the "VIRTUALIZING MSR-BASED APIC ACCESSES" chapter of the
      SDM, when "virtualize x2APIC mode" is 1 and "APIC-register
      virtualization" is 0, a RDMSR of 808H should return the VTPR from the
      virtual APIC page.
      
      However, for nested, KVM currently fails to disable the read intercept
      for this MSR. This means that a RDMSR exit takes precedence over
      "virtualize x2APIC mode", and KVM passes through L1's TPR to L2,
      instead of sourcing the value from L2's virtual APIC page.
      
      This patch fixes the issue by disabling the read intercept, in VMCS02,
      for the VTPR when "APIC-register virtualization" is 0.
      
      The issue described above, and the fix prescribed here, were verified with
      a related patch in kvm-unit-tests titled "Test VMX's virtualize x2APIC
      mode w/ nested".
      Signed-off-by: Marc Orr <marcorr@google.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Fixes: c992384b ("KVM: vmx: speed up MSR bitmap merge")
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      59bf185a
    • M
      KVM: x86: nVMX: close leak of L0's x2APIC MSRs (CVE-2019-3887) · 119031be
      Marc Orr committed
      commit acff78477b9b4f26ecdf65733a4ed77fe837e9dc upstream.
      
      The nested_vmx_prepare_msr_bitmap() function doesn't directly guard the
      x2APIC MSR intercepts with the "virtualize x2APIC mode" control. As a
      result, we discovered the potential for a buggy or malicious L1 to get
      access to L0's x2APIC MSRs, via an L2, as follows.
      
      1. L1 executes WRMSR(IA32_SPEC_CTRL, 1). This causes the spec_ctrl
      variable, in nested_vmx_prepare_msr_bitmap() to become true.
      2. L1 disables "virtualize x2APIC mode" in VMCS12.
      3. L1 enables "APIC-register virtualization" in VMCS12.
      
      Now, KVM will set VMCS02's x2APIC MSR intercepts from VMCS12, and then
      set "virtualize x2APIC mode" to 0 in VMCS02. Oops.
      
      This patch closes the leak by explicitly guarding VMCS02's x2APIC MSR
      intercepts with VMCS12's "virtualize x2APIC mode" control.
      
      The scenario outlined above, and the fix prescribed here, were verified with
      a related patch in kvm-unit-tests titled "Add leak scenario to
      virt_x2apic_mode_test".
      
      Note, it looks like this issue may have been introduced inadvertently
      during a merge---see 15303ba5.
      Signed-off-by: Marc Orr <marcorr@google.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      119031be
    • E
      ACPICA: AML interpreter: add region addresses in global list during initialization · f8053df6
      Erik Schmauss committed
      commit 4abb951b73ff0a8a979113ef185651aa3c8da19b upstream.
      
      The table load process omitted adding the operation region address
      range to the global list. This omission is problematic because the OS
      queries the global list to check for address range conflicts before
      deciding which drivers to load. This commit may result in warning
      messages that look like the following:
      
      [    7.871761] ACPI Warning: system_IO range 0x00000428-0x0000042F conflicts with op_region 0x00000400-0x0000047F (\PMIO) (20180531/utaddress-213)
      [    7.871769] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
      
      However, these messages do not signify regressions. They are a result of
      properly adding address ranges within the global address list.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=200011
      Tested-by: Jean-Marc Lenoir <archlinux@jihemel.com>
      Signed-off-by: Erik Schmauss <erik.schmauss@intel.com>
      Cc: All applicable <stable@vger.kernel.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f8053df6
    • T
      arm64: dts: rockchip: Fix vcc_host1_5v GPIO polarity on rk3328-rock64 · fad502a9
      Tomohiro Mayama committed
      commit a8772e5d826d0f61f8aa9c284b3ab49035d5273d upstream.
      
      This patch makes the USB ports function again.
      
      Fixes: 955bebde ("arm64: dts: rockchip: add rk3328-rock64 board")
      Cc: stable@vger.kernel.org
      Suggested-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Tomohiro Mayama <parly-gh@iris.mystia.org>
      Tested-by: Katsuhiro Suzuki <katsuhiro@katsuster.net>
      Signed-off-by: Heiko Stuebner <heiko@sntech.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fad502a9
    • K
      arm64: dts: rockchip: fix vcc_host1_5v pin assign on rk3328-rock64 · c9634759
      Katsuhiro Suzuki committed
      commit ef05bcb60c1a8841e38c91923ba998181117a87c upstream.
      
      This patch fixes pin assign of vcc_host1_5v. This regulator is
      controlled by USB20_HOST_DRV signal.
      
      ROCK64 schematic says that GPIO0_A2 pin is used as USB20_HOST_DRV.
      GPIO0_D3 pin is for SPDIF_TX_M0.
      Signed-off-by: Katsuhiro Suzuki <katsuhiro@katsuster.net>
      Signed-off-by: Heiko Stuebner <heiko@sntech.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c9634759
    • M
      dm integrity: fix deadlock with overlapping I/O · aa9ee4b1
      Mikulas Patocka committed
      commit 4ed319c6ac08e9a28fca7ac188181ac122f4de84 upstream.
      
      dm-integrity will deadlock if overlapping I/O is issued to it; the bug
      was introduced by commit 724376a0 ("dm integrity: implement fair
      range locks").  Users rarely use overlapping I/O so this bug went
      undetected until now.
      
      Fix this bug by correcting, likely cut-n-paste, typos in
      ranges_overlap() and also remove a flawed ranges_overlap() check in
      remove_range_unlocked().  This condition could leave unprocessed bios
      hanging on wait_list forever.
      
      Cc: stable@vger.kernel.org # v4.19+
      Fixes: 724376a0 ("dm integrity: implement fair range locks")
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      aa9ee4b1
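The overlap test that the dm-integrity fix above revolves around can be illustrated with a small stand-alone sketch. The struct and function names below are hypothetical and are not the kernel's definitions; the sketch assumes each range is a start sector plus a non-zero sector count.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical range type; n_sectors is assumed to be non-zero. */
struct sector_range {
    uint64_t start;      /* first sector of the range */
    uint64_t n_sectors;  /* number of sectors in the range */
};

/* Two ranges overlap when each one starts before the other one ends. */
static bool sector_ranges_overlap(const struct sector_range *a,
                                  const struct sector_range *b)
{
    return a->start < b->start + b->n_sectors &&
           b->start < a->start + a->n_sectors;
}
```

Mixing up the two operands in either comparison, the kind of cut-and-paste slip the commit mentions, would make adjacent or disjoint ranges compare incorrectly.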
    • I
      dm table: propagate BDI_CAP_STABLE_WRITES to fix sporadic checksum errors · 469b40a4
      Ilya Dryomov committed
      commit eb40c0acdc342b815d4d03ae6abb09e80c0f2988 upstream.
      
      Some devices don't use blk_integrity but still want stable pages
      because they do their own checksumming.  Examples include rbd and iSCSI
      when data digests are negotiated.  Stacking DM (and thus LVM) on top of
      these devices results in sporadic checksum errors.
      
      Set BDI_CAP_STABLE_WRITES if any underlying device has it set.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      469b40a4
    • M
      dm: revert 8f50e358 ("dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE") · 4f5c99e0
      Mikulas Patocka committed
      commit 75ae193626de3238ca5fb895868ec91c94e63b1b upstream.
      
      The limit was already incorporated to dm-crypt with commit 4e870e94
      ("dm crypt: fix error with too large bios"), so we don't need to apply
      it globally to all targets. The quantity BIO_MAX_PAGES * PAGE_SIZE is
      wrong anyway because the variable ti->max_io_len is supposed to be in
      units of 512-byte sectors, not in bytes.
      
      Reduction of the limit to 1048576 sectors could even cause data
      corruption in rare cases - suppose that we have a dm-striped device with
      stripe size 768MiB. The target will call dm_set_target_max_io_len with
      the value 1572864. The buggy code would reduce it to 1048576. Now, the
      dm-core will erroneously split the bios on 1048576-sector boundary
      instead of 1572864-sector boundary and pass these stripe-crossing bios
      to the striped target.
      
      Cc: stable@vger.kernel.org # v4.16+
      Fixes: 8f50e358 ("dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE")
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Acked-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4f5c99e0
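The sector arithmetic in the revert above can be checked with a short sketch. It assumes 512-byte sectors, a 4096-byte page and a BIO_MAX_PAGES value of 256; those two constants are assumptions chosen because they reproduce the 1048576 figure quoted in the commit, and the helper names are hypothetical.

```c
#include <stdint.h>

#define SECTOR_SIZE            512u   /* bytes per sector */
#define ASSUMED_PAGE_SIZE      4096u  /* assumption, not from the commit */
#define ASSUMED_BIO_MAX_PAGES  256u   /* assumption, not from the commit */

/* Convert a size in MiB to a count of 512-byte sectors. */
static uint64_t mib_to_sectors(uint64_t mib)
{
    return mib * 1024u * 1024u / SECTOR_SIZE;
}

/* The reverted limit: a byte quantity that was stored in a sector-counted field. */
static uint64_t reverted_limit(void)
{
    return (uint64_t)ASSUMED_BIO_MAX_PAGES * ASSUMED_PAGE_SIZE;
}
```

Under these assumptions a 768 MiB stripe is 1572864 sectors, while the reverted limit evaluates to 1048576, so the cap would land inside a stripe rather than on its boundary.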
    • M
      dm integrity: change memcmp to strncmp in dm_integrity_ctr · 30dc4d7b
      Mikulas Patocka committed
      commit 0d74e6a3b6421d98eeafbed26f29156d469bc0b5 upstream.
      
      If the string opt_string is small, the function memcmp can access bytes
      that are beyond the terminating nul character. In theory, it could cause
      segfault, if opt_string were located just below some unmapped memory.
      
      Change from memcmp to strncmp so that we don't read bytes beyond the end
      of the string.
      
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      30dc4d7b
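The difference between the two calls in the commit above can be sketched as a prefix test. The helper name is hypothetical and the option string in the usage note is only an illustration; this is not the code from dm_integrity_ctr.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Prefix test that never reads past the terminating NUL of opt_string:
 * strncmp() stops at the first differing byte or at a NUL, whereas
 * memcmp() is allowed to read all strlen(prefix) bytes of both buffers.
 */
static bool has_prefix(const char *opt_string, const char *prefix)
{
    return strncmp(opt_string, prefix, strlen(prefix)) == 0;
}
```

For example, `has_prefix("x", "internal_hash:")` compares only the two bytes of `"x"` (including its NUL) before returning false.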
    • S
      PCI: pciehp: Ignore Link State Changes after powering off a slot · 5be6e02c
      Sergey Miroshnichenko committed
      commit 3943af9d01e94330d0cfac6fccdbc829aad50c92 upstream.
      
      During a safe hot remove, the OS powers off the slot, which may cause a
      Data Link Layer State Changed event.  The slot has already been set to
      OFF_STATE, so that event results in re-enabling the device, making it
      impossible to safely remove it.
      
      Clear out the Presence Detect Changed and Data Link Layer State Changed
      events when the disabled slot has settled down.
      
      It is still possible to re-enable the device if it remains in the slot
      after pressing the Attention Button by pressing it again.
      
      Fixes the problem that Micah reported below: an NVMe drive power button may
      not actually turn off the drive.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=203237
      Reported-by: Micah Parrish <micah.parrish@hpe.com>
      Tested-by: Micah Parrish <micah.parrish@hpe.com>
      Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
      [bhelgaas: changelog, add bugzilla URL]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Lukas Wunner <lukas@wunner.de>
      Cc: stable@vger.kernel.org # v4.19+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5be6e02c
    • A
      PCI: Add function 1 DMA alias quirk for Marvell 9170 SATA controller · 250fef8d
      Andre Przywara committed
      commit 9cde402a59770a0669d895399c13407f63d7d209 upstream.
      
      There is a Marvell 88SE9170 PCIe SATA controller I found on a board here.
      Some quick testing with the ARM SMMU enabled reveals that it suffers from
      the same requester ID mixup problems as the other Marvell chips listed
      already.
      
      Add the PCI vendor/device ID to the list of chips which need the
      workaround.
      Signed-off-by: Andre Przywara <andre.przywara@arm.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      CC: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      250fef8d
    • L
      x86/perf/amd: Remove need to check "running" bit in NMI handler · 05626465
      Lendacky, Thomas committed
      commit 3966c3feca3fd10b2935caa0b4a08c7dd59469e5 upstream.
      
      Spurious interrupt support was added to perf in the following commit, almost
      a decade ago:
      
        63e6be6d ("perf, x86: Catch spurious interrupts after disabling counters")
      
      The two previous patches (resolving the race condition when disabling a
      PMC and NMI latency mitigation) allow for the removal of this older
      spurious interrupt support.
      
      Currently in x86_pmu_stop(), the bit for the PMC in the active_mask bitmap
      is cleared before disabling the PMC, which sets up a race condition. This
      race condition was mitigated by introducing the running bitmap. That race
      condition can be eliminated by first disabling the PMC, waiting for PMC
      reset on overflow and then clearing the bit for the PMC in the active_mask
      bitmap. The NMI handler will not re-enable a disabled counter.
      
      If x86_pmu_stop() is called from the perf NMI handler, the NMI latency
      mitigation support will guard against any unhandled NMI messages.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org> # 4.14.x-
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/Message-ID:
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      05626465
    • L
      x86/perf/amd: Resolve NMI latency issues for active PMCs · 23d39b0a
      Lendacky, Thomas committed
      commit 6d3edaae16c6c7d238360f2841212c2b26774d5e upstream.
      
      On AMD processors, the detection of an overflowed PMC counter in the NMI
      handler relies on the current value of the PMC. So, for example, to check
      for overflow on a 48-bit counter, bit 47 is checked to see if it is 1 (not
      overflowed) or 0 (overflowed).
      
      When the perf NMI handler executes it does not know in advance which PMC
      counters have overflowed. As such, the NMI handler will process all active
      PMC counters that have overflowed. NMI latency in newer AMD processors can
      result in multiple overflowed PMC counters being processed in one NMI and
      then a subsequent NMI, that does not appear to be a back-to-back NMI, not
      finding any PMC counters that have overflowed. This may appear to be an
      unhandled NMI resulting in either a panic or a series of messages,
      depending on how the kernel was configured.
      
      To mitigate this issue, add an AMD handle_irq callback function,
      amd_pmu_handle_irq(), that will invoke the common x86_pmu_handle_irq()
      function and upon return perform some additional processing that will
      indicate if the NMI has been handled or would have been handled had an
      earlier NMI not handled the overflowed PMC. Using a per-CPU variable, a
      minimum value of the number of active PMCs or 2 will be set whenever a
      PMC is active. This is used to indicate the possible number of NMIs that
      can still occur. The value of 2 is used for when an NMI does not arrive
      at the LAPIC in time to be collapsed into an already pending NMI. Each
      time the function is called without having handled an overflowed counter,
      the per-CPU value is checked. If the value is non-zero, it is decremented
      and the NMI indicates that it handled the NMI. If the value is zero, then
      the NMI indicates that it did not handle the NMI.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org> # 4.14.x-
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/Message-ID:
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      23d39b0a
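The bookkeeping described in the NMI latency commit above can be sketched roughly as follows. This is a single-CPU simplification: a plain static variable stands in for the per-CPU value, and the function and variable names are hypothetical rather than those used in the kernel.

```c
#include <stdbool.h>

/* Stand-in for the per-CPU variable; a real kernel keeps one per CPU. */
static unsigned int nmi_budget;

static unsigned int min_uint(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/*
 * handled: number of overflowed counters the common handler processed.
 * active:  number of PMCs currently active.
 * Returns true if this NMI should be reported as handled.
 */
static bool claim_nmi(unsigned int handled, unsigned int active)
{
    if (handled) {
        /* Later NMIs may find nothing left, so remember how many to absorb. */
        nmi_budget = min_uint(active, 2);
        return true;
    }
    if (nmi_budget == 0)
        return false;   /* no earlier NMI can account for this one */
    nmi_budget--;
    return true;        /* an earlier NMI already did the work */
}
```

With three active counters, one NMI that handles overflows can be followed by at most two "empty" NMIs that are still claimed; a third empty one is reported as unhandled.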
    • L
      x86/perf/amd: Resolve race condition when disabling PMC · e5a791b4
      Lendacky, Thomas committed
      commit 914123fa39042e651d79eaf86bbf63a1b938dddf upstream.
      
      On AMD processors, the detection of an overflowed counter in the NMI
      handler relies on the current value of the counter. So, for example, to
      check for overflow on a 48 bit counter, bit 47 is checked to see if it
      is 1 (not overflowed) or 0 (overflowed).
      
      There is currently a race condition present when disabling and then
      updating the PMC. Increased NMI latency in newer AMD processors makes this
      race condition more pronounced. If the counter value has overflowed, it is
      possible to update the PMC value before the NMI handler can run. The
      updated PMC value is not an overflowed value, so when the perf NMI handler
      does run, it will not find an overflowed counter. This may appear as an
      unknown NMI resulting in either a panic or a series of messages, depending
      on how the kernel is configured.
      
      To eliminate this race condition, the PMC value must be checked after
      disabling the counter. Add an AMD function, amd_pmu_disable_all(), that
      will wait for the NMI handler to reset any active and overflowed counter
      after calling x86_pmu_disable_all().
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org> # 4.14.x-
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/Message-ID:
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e5a791b4
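The bit-47 overflow test that both AMD PMC commits above rely on can be sketched in a few lines. The helper names are hypothetical, and the sketch assumes the counter is started at the negated sampling period so that it counts up toward zero.

```c
#include <stdbool.h>
#include <stdint.h>

#define COUNTER_BITS 48

/* Bit 47 set: still counting up toward overflow. Bit 47 clear: overflowed. */
static bool counter_overflowed(uint64_t value)
{
    return (value & (1ULL << (COUNTER_BITS - 1))) == 0;
}

/* Start value that overflows after `period` events (non-zero, below 2^47). */
static uint64_t counter_start_value(uint64_t period)
{
    return (0 - period) & ((1ULL << COUNTER_BITS) - 1);
}
```

Because the check depends only on the current value, rewriting the counter with a fresh start value before the NMI handler runs hides the overflow, which is the race the commit describes.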
    • A
      x86/asm: Use stricter assembly constraints in bitops · 4b004504
      Alexander Potapenko committed
      commit 5b77e95dd7790ff6c8fbf1cd8d0104ebed818a03 upstream.
      
      There's a number of problems with how arch/x86/include/asm/bitops.h
      is currently using assembly constraints for the memory region
      bitops are modifying:
      
      1) Use memory clobber in bitops that touch arbitrary memory
      
      Certain bit operations that read/write bits take a base pointer and an
      arbitrarily large offset to address the bit relative to that base.
      Inline assembly constraints aren't expressive enough to tell the
      compiler that the assembly directive is going to touch a specific memory
      location of unknown size, therefore we have to use the "memory" clobber
      to indicate that the assembly is going to access memory locations other
      than those listed in the inputs/outputs.
      
      To indicate that BTR/BTS instructions don't necessarily touch the first
      sizeof(long) bytes of the argument, we also move the address to assembly
      inputs.
      
      This particular change leads to size increase of 124 kernel functions in
      a defconfig build. For some of them the diff is in NOP operations, others
      end up re-reading values from memory and may potentially slow down the
      execution. But without these clobbers the compiler is free to cache
      the contents of the bitmaps and use them as if they weren't changed by
      the inline assembly.
      
      2) Use byte-sized arguments for operations touching single bytes.
      
      Passing a long value to ANDB/ORB/XORB instructions makes the compiler
      treat sizeof(long) bytes as being clobbered, which isn't the case. This
      may theoretically lead to worse code in the case of heavy optimization.
      
      Practical impact:
      
      I've built a defconfig kernel and looked through some of the functions
      generated by GCC 7.3.0 with and without this clobber, and didn't spot
      any miscompilations.
      
      However there is a (trivial) theoretical case where this code leads to
      miscompilation:
      
        https://lkml.org/lkml/2019/3/28/393
      
      using just GCC 8.3.0 with -O2.  It isn't hard to imagine someone writes
      such a function in the kernel someday.
      
      So the primary motivation is to fix an existing misuse of the asm
      directive, which happens to work in certain configurations now, but
      isn't guaranteed to work under different circumstances.
      
      [ --mingo: Added -stable tag because defconfig only builds a fraction
        of the kernel and the trivial testcase looks normal enough to
        be used in existing or in-development code. ]
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: James Y Knight <jyknight@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20190402112813.193378-1-glider@google.com
      [ Edited the changelog, tidied up one of the defines. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4b004504
    • R
      x86/asm: Remove dead __GNUC__ conditionals · 356ae4de
      Rasmus Villemoes committed
      commit 88ca66d8540ca26119b1428cddb96b37925bdf01 upstream.
      
      The minimum supported gcc version is >= 4.6, so these can be removed.
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190111084931.24601-1-linux@rasmusvillemoes.dk
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      356ae4de
    • M
      xtensa: fix return_address · f7b778b9
      Max Filippov committed
      commit ada770b1e74a77fff2d5f539bf6c42c25f4784db upstream.
      
      return_address returns the address that is one level higher in the call
      stack than requested in its argument, because level 0 corresponds to its
      caller's return address. Use requested level as the number of stack
      frames to skip.
      
      This fixes the address reported by might_sleep and friends.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f7b778b9
    • M
      sched/fair: Do not re-read ->h_load_next during hierarchical load calculation · cb75a0c5
      Mel Gorman committed
      commit 0e9f02450da07fc7b1346c8c32c771555173e397 upstream.
      
      A NULL pointer dereference bug was reported on a distribution kernel but
      the same issue should be present on mainline kernel. It occurred on s390
      but should not be arch-specific.  A partial oops looks like:
      
        Unable to handle kernel pointer dereference in virtual kernel address space
        ...
        Call Trace:
          ...
          try_to_wake_up+0xfc/0x450
          vhost_poll_wakeup+0x3a/0x50 [vhost]
          __wake_up_common+0xbc/0x178
          __wake_up_common_lock+0x9e/0x160
          __wake_up_sync_key+0x4e/0x60
          sock_def_readable+0x5e/0x98
      
      The bug hits any time between 1 hour and 3 days. The dereference occurs
      in update_cfs_rq_h_load when accumulating h_load. The problem is that
      cfs_rq->h_load_next is not protected by any locking and can be updated
      by parallel calls to task_h_load. Depending on the compiler, code may be
      generated that re-reads cfs_rq->h_load_next after the check for NULL and
      then oops when reading se->avg.load_avg. The disassembly showed that it
      was possible to reread h_load_next after the check for NULL.
      
      While this does not appear to be an issue for later compilers, it's still
      an accident if the correct code is generated. Full locking in this path
      would have high overhead so this patch uses READ_ONCE to read h_load_next
      only once and check for NULL before dereferencing. It was confirmed that
      there were no further oops after 10 days of testing.
      
      As Peter pointed out, it is also necessary to use WRITE_ONCE() to avoid any
      potential problems with store tearing.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@vger.kernel.org>
      Fixes: 68520796 ("sched: Move h_load calculation to task_h_load()")
      Link: https://lkml.kernel.org/r/20190319123610.nsivgf3mjbjjesxb@techsingularity.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cb75a0c5
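The read-once pattern from the scheduler fix above can be sketched in user-space C. The macros below are simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE() and rely on the GNU C `__typeof__` extension; the struct layouts and function name are hypothetical and heavily trimmed.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel macros (GNU C __typeof__). */
#define MY_READ_ONCE(x)      (*(volatile __typeof__(x) *)&(x))
#define MY_WRITE_ONCE(x, v)  (*(volatile __typeof__(x) *)&(x) = (v))

struct entity { long load_avg; };
struct runqueue { struct entity *h_load_next; };  /* hypothetical, trimmed */

/* Load the pointer once, then test and dereference that same copy. */
static long next_load_or_zero(struct runqueue *rq)
{
    struct entity *se = MY_READ_ONCE(rq->h_load_next);

    return se ? se->load_avg : 0;
}
```

The point of the local copy is that the NULL check and the dereference both use `se`, so a concurrent update of `h_load_next` between the two cannot turn a checked pointer into NULL.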
    • D
      xen: Prevent buffer overflow in privcmd ioctl · ed3adb56
      Dan Carpenter committed
      commit 42d8644bd77dd2d747e004e367cb0c895a606f39 upstream.
      
      The "call" variable comes from the user in privcmd_ioctl_hypercall().
      It's an offset into the hypercall_page[] which has (PAGE_SIZE / 32)
      elements.  We need to put an upper bound on it to prevent an out of
      bounds access.
      
      Cc: stable@vger.kernel.org
      Fixes: 1246ae0b ("xen: add variable hypercall caller")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed3adb56
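The upper-bound check described in the privcmd commit above might look roughly like this sketch. It assumes a 4096-byte page, so the table has 128 entries; the function name and the choice of -EINVAL as the error code are illustrative assumptions, not taken from the commit text.

```c
#include <errno.h>

#define ASSUMED_PAGE_SIZE 4096u                    /* assumption */
#define HYPERCALL_SLOTS   (ASSUMED_PAGE_SIZE / 32) /* entries in the table */

/* Reject a user-supplied index before it is used to index the table. */
static int check_hypercall_index(unsigned long call)
{
    if (call >= HYPERCALL_SLOTS)
        return -EINVAL;
    return 0;
}
```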
    • W
      arm64: backtrace: Don't bother trying to unwind the userspace stack · 84c6c2af
      Will Deacon committed
      commit 1e6f5440a6814d28c32d347f338bfef68bc3e69d upstream.
      
      Calling dump_backtrace() with a pt_regs argument corresponding to
      userspace doesn't make any sense and our unwinder will simply print
      "Call trace:" before unwinding the stack looking for user frames.
      
      Rather than go through this song and dance, just return early if we're
      passed a user register state.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 1149aad1 ("arm64: Add dump_backtrace() in show_regs")
      Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      84c6c2af
    • P
      arm64: dts: rockchip: fix rk3328 rgmii high tx error rate · 1ec54cee
      Peter Geis committed
      commit 6fd8b9780ec1a49ac46e0aaf8775247205e66231 upstream.
      
      Several rk3328 based boards experience high rgmii tx error rates.
      This is due to several pins in the rk3328.dtsi rgmii pinmux that are
      missing a defined pull strength setting.
      This causes the pinmux driver to default to 2ma (bit mask 00).
      
      These pins are only defined in the rk3328.dtsi, and are not listed in
      the rk3328 specification.
      The TRM only lists them as "Reserved"
      (RK3328 TRM V1.1, 3.3.3 Detail Register Description, GRF_GPIO0B_IOMUX,
      GRF_GPIO0C_IOMUX, GRF_GPIO0D_IOMUX).
      However, removal of these pins from the rgmii pinmux definition causes
      the interface to fail to transmit.
      
      Also, the rgmii tx and rx pins defined in the dtsi are not consistent
      with the rk3328 specification, with tx pins currently set to 12ma and
      rx pins set to 2ma.
      
      Fix this by setting tx pins to 8ma and the rx pins to 4ma, consistent
      with the specification.
      Defining the drive strength for the undefined pins eliminated the high
      tx packet error rate observed under heavy data transfers.
      Aligning the drive strength to the TRM values eliminated the occasional
      packet retry errors under iperf3 testing.
      This allows much higher data rates with no recorded tx errors.
      
      Tested on the rk3328-roc-cc board.
      
      Fixes: 52e02d37 ("arm64: dts: rockchip: add core dtsi file for RK3328 SoCs")
      Cc: stable@vger.kernel.org
      Signed-off-by: Peter Geis <pgwipeout@gmail.com>
      Signed-off-by: Heiko Stuebner <heiko@sntech.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1ec54cee
    • W
      arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value · 82a30a5d
      Will Deacon committed
      commit 045afc24124d80c6998d9c770844c67912083506 upstream.
      
      Rather embarrassingly, our futex() FUTEX_WAKE_OP implementation doesn't
      explicitly set the return value on the non-faulting path and instead
      leaves it holding the result of the underlying atomic operation. This
      means that any FUTEX_WAKE_OP atomic operation which computes a non-zero
      value will be reported as having failed. Regrettably, I wrote the buggy
      code back in 2011 and it was upstreamed as part of the initial arm64
      support in 2012.
      
      The reasons we appear to get away with this are:
      
        1. FUTEX_WAKE_OP is rarely used and therefore doesn't appear to get
           exercised by futex() test applications
      
        2. If the result of the atomic operation is zero, the system call
           behaves correctly
      
        3. Prior to version 2.25, the only operation used by GLIBC set the
           futex to zero, and therefore worked as expected. From 2.25 onwards,
           FUTEX_WAKE_OP is not used by GLIBC at all.
      
      Fix the implementation by ensuring that the return value is either 0
      to indicate that the atomic operation completed successfully, or -EFAULT
      if we encountered a fault when accessing the user mapping.
      
      Cc: <stable@kernel.org>
      Fixes: 6170a974 ("arm64: Atomic operations")
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      82a30a5d
    • D
      ARM: dts: at91: Fix typo in ISC_D0 on PC9 · 4362ff97
      David Engraf committed
      commit e7dfb6d04e4715be1f3eb2c60d97b753fd2e4516 upstream.
      
      The function argument for the ISC_D0 on PC9 was incorrect. According to
      the documentation it should be 'C' aka 3.
      Signed-off-by: David Engraf <david.engraf@sysgo.com>
      Reviewed-by: Nicolas Ferre <nicolas.ferre@microchip.com>
      Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com>
      Fixes: 7f16cb67 ("ARM: at91/dt: add sama5d2 pinmux")
      Cc: <stable@vger.kernel.org> # v4.4+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4362ff97
    • P
      ARM: dts: am335x-evm: Correct the regulators for the audio codec · 627a7d5a
      Peter Ujfalusi committed
      commit 4f96dc0a3e79ec257a2b082dab3ee694ff88c317 upstream.
      
      Correctly map the regulators used by tlv320aic3106.
      Both 1.8V and 3.3V for the codec are derived from VBAT via fixed regulators.
      
      Cc: <Stable@vger.kernel.org> # v4.14+
      Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
      Signed-off-by: Tony Lindgren <tony@atomide.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      627a7d5a
    • P
      ARM: dts: am335x-evmsk: Correct the regulators for the audio codec · 57a9c1f4
      Peter Ujfalusi committed
      commit 6691370646e844be98bb6558c024269791d20bd7 upstream.
      
      Correctly map the regulators used by tlv320aic3106.
      Both 1.8V and 3.3V for the codec are derived from VBAT via fixed regulators.
      
      Cc: <Stable@vger.kernel.org> # v4.14+
      Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
      Signed-off-by: Tony Lindgren <tony@atomide.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      57a9c1f4
    • J
      ARM: dts: rockchip: fix rk3288 cpu opp node reference · 3ba48b3c
      Jonas Karlman committed
      commit 6b2fde3dbfab6ebc45b0cd605e17ca5057ff9a3b upstream.
      
      The following error can be seen during boot:
      
        of: /cpus/cpu@501: Couldn't find opp node
      
      Change cpu nodes to use operating-points-v2 in order to fix this.
      
      Fixes: ce76de98 ("ARM: dts: rockchip: convert rk3288 to operating-points-v2")
      Cc: stable@vger.kernel.org
      Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
      Signed-off-by: Heiko Stuebner <heiko@sntech.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3ba48b3c
    • C
      virtio: Honour 'may_reduce_num' in vring_create_virtqueue · 32fdac09
      Cornelia Huck committed
      commit cf94db21905333e610e479688add629397a4b384 upstream.
      
      vring_create_virtqueue() allows the caller to specify via the
      may_reduce_num parameter whether the vring code is allowed to
      allocate a smaller ring than specified.
      
      However, the split ring allocation code tries to allocate a
      smaller ring on allocation failure regardless of what the
      caller specified. This may cause trouble for e.g. virtio-pci
      in legacy mode, which does not support ring resizing. (The
      packed ring code does not resize in any case.)
      
      Let's fix this by bailing out immediately in the split ring code
      if the requested size cannot be allocated and may_reduce_num has
      not been specified.
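      
      The bail-out behaviour described above can be sketched as follows.
      This is a simplified model, not the vring code itself: create_ring(),
      try_alloc_ring() and the 'limit' parameter are hypothetical stand-ins
      for the split ring allocation and for an allocator that fails above a
      certain size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator: any request above 'limit' entries fails. */
static void *try_alloc_ring(unsigned int num, unsigned int limit)
{
    return num <= limit ? malloc(num) : NULL;
}

/*
 * Try to allocate a ring of *num entries. The size is halved on failure
 * only if the caller allowed it; otherwise the first failure is final.
 * On success *num holds the size actually allocated.
 */
static void *create_ring(unsigned int *num, bool may_reduce_num,
                         unsigned int limit)
{
    unsigned int n;

    assert(num != NULL);
    for (n = *num; n > 0; n /= 2) {
        void *ring = try_alloc_ring(n, limit);

        if (ring) {
            *num = n;
            return ring;
        }
        if (!may_reduce_num)
            return NULL;        /* caller cannot cope with a smaller ring */
    }
    return NULL;
}
```

      A caller that cannot resize (such as the legacy virtio-pci case
      mentioned above) would pass false and treat NULL as a hard failure.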
      
      While at it, fix a typo in the usage instructions.
      
      Fixes: 2a2d1382 ("virtio: Add improved queue allocation API")
      Cc: stable@vger.kernel.org # v4.6+
      Signed-off-by: Cornelia Huck <cohuck@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
      Reviewed-by: Jens Freimann <jfreimann@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      32fdac09
    • K
      genirq: Initialize request_mutex if CONFIG_SPARSE_IRQ=n · 8b4f68b4
      Kefeng Wang committed
      commit e8458e7afa855317b14915d7b86ab3caceea7eb6 upstream.
      
      When CONFIG_SPARSE_IRQ is disabled, the request_mutex in struct irq_desc
      is not initialized, which causes a malfunction.
      
      Fixes: 9114014c ("genirq: Add mutex to irq desc to serialize request/free_irq()")
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: <linux-arm-kernel@lists.infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190404074512.145533-1-wangkefeng.wang@huawei.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b4f68b4
    • S
      genirq: Respect IRQCHIP_SKIP_SET_WAKE in irq_chip_set_wake_parent() · cd5b06a9
      Stephen Boyd committed
      commit 325aa19598e410672175ed50982f902d4e3f31c5 upstream.
      
      If a child irqchip calls irq_chip_set_wake_parent() but its parent irqchip
      has the IRQCHIP_SKIP_SET_WAKE flag set, an error is returned.
      
      This is inconsistent behaviour vs. set_irq_wake_real() which returns 0 when
      the irqchip has the IRQCHIP_SKIP_SET_WAKE flag set. It doesn't attempt to
      walk the chain of parents and set irq wake on any chips that don't have the
      flag set either. If the intent is to call the .irq_set_wake() callback of
      the parent irqchip, then we expect irqchip implementations to omit the
      IRQCHIP_SKIP_SET_WAKE flag and implement an .irq_set_wake() function that
      calls irq_chip_set_wake_parent().
      
      The problem has been observed on a Qualcomm sdm845 device where set wake
      fails on any GPIO interrupts after applying work in progress wakeup irq
      patches to the GPIO driver. The chain of chips looks like this:
      
           QCOM GPIO -> QCOM PDC (SKIP) -> ARM GIC (SKIP)
      
      The GPIO controller's parent is the QCOM PDC irqchip which in turn has ARM
      GIC as parent.  The QCOM PDC irqchip has the IRQCHIP_SKIP_SET_WAKE flag
      set, and so does the grandparent ARM GIC.
      
      The GPIO driver doesn't know if the parent needs to set wake or not, so it
      unconditionally calls irq_chip_set_wake_parent() causing this function to
      return a failure because the parent irqchip (PDC) doesn't have the
      .irq_set_wake() callback set. Returning 0 instead makes everything work and
      irqs from the GPIO controller can be configured for wakeup.
      
      Make it consistent by returning 0 (success) from irq_chip_set_wake_parent()
      when a parent chip has IRQCHIP_SKIP_SET_WAKE set.
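      
      The resulting check order can be modelled in a few lines of C. This
      is a sketch under assumptions, not the genirq source: struct chip,
      SKIP_SET_WAKE and set_wake_parent() are simplified, hypothetical
      stand-ins for the irqchip structures and irq_chip_set_wake_parent().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKIP_SET_WAKE 0x1u      /* stands in for IRQCHIP_SKIP_SET_WAKE */

struct chip {
    unsigned int flags;
    int (*set_wake)(struct chip *self, unsigned int on);
    struct chip *parent;
};

/* Forward a set-wake request from a child chip to its parent. */
static int set_wake_parent(struct chip *child, unsigned int on)
{
    struct chip *parent;

    assert(child != NULL);
    parent = child->parent;
    if (parent == NULL)
        return -ENOSYS;

    /* A parent that opts out of set-wake is treated as success. */
    if (parent->flags & SKIP_SET_WAKE)
        return 0;

    if (parent->set_wake)
        return parent->set_wake(parent, on);

    return -ENOSYS;
}
```

      For the GPIO -> PDC (SKIP) -> GIC (SKIP) chain above, the call made
      on behalf of the GPIO chip returns 0 in this model.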
      
      [ tglx: Massaged changelog ]
      
      Fixes: 08b55e2a ("genirq: Add irqchip_set_wake_parent")
      Signed-off-by: Stephen Boyd <swboyd@chromium.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-gpio@vger.kernel.org
      Cc: Lina Iyer <ilina@codeaurora.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190325181026.247796-1-swboyd@chromium.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cd5b06a9
    • J
      block: fix the return errno for direct IO · 543bb48d
      Jason Yan committed
      commit a89afe58f1a74aac768a5eb77af95ef4ee15beaa upstream.
      
      If the last bio returned is not dio->bio, the status of that bio is
      not assigned to dio->bio when it is an error. This causes the status
      of the whole IO to be wrong.
      
          ksoftirqd/21-117   [021] ..s.  4017.966090:   8,0    C   N 4883648 [0]
                <idle>-0     [018] ..s.  4017.970888:   8,0    C  WS 4924800 + 1024 [0]
                <idle>-0     [018] ..s.  4017.970909:   8,0    D  WS 4935424 + 1024 [<idle>]
                <idle>-0     [018] ..s.  4017.970924:   8,0    D  WS 4936448 + 321 [<idle>]
          ksoftirqd/21-117   [021] ..s.  4017.995033:   8,0    C   R 4883648 + 336 [65475]
          ksoftirqd/21-117   [021] d.s.  4018.001988: myprobe1: (blkdev_bio_end_io+0x0/0x168) bi_status=7
          ksoftirqd/21-117   [021] d.s.  4018.001992: myprobe: (aio_complete_rw+0x0/0x148) x0=0xffff802f2595ad80 res=0x12a000 res2=0x0
      
      We always have to assign bio->bi_status to dio->bio.bi_status because we
      will only check dio->bio.bi_status when we return the whole IO to
      the upper layer.
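      
      The status propagation described above can be sketched with a reduced
      model. The struct layouts and bio_end_io() here are hypothetical
      simplifications of the block layer types, and keeping only the first
      error seen is an assumption of this sketch.

```c
#include <assert.h>

struct bio {
    int status;                 /* 0 means success */
};

struct dio {
    struct bio bio;             /* embedded bio whose status is reported upward */
    int remaining;              /* bios of this request still in flight */
};

/*
 * Completion handler for any bio belonging to the request.
 * Returns 1 once the whole request has completed, 0 otherwise.
 */
static int bio_end_io(struct dio *dio, const struct bio *bio)
{
    assert(dio->remaining > 0);

    /* Always record an error on dio->bio, whichever bio carried it. */
    if (bio->status && !dio->bio.status)
        dio->bio.status = bio->status;

    return --dio->remaining == 0;
}
```

      If the embedded bio completes cleanly and a second bio then fails
      with status 7, the request still finishes with dio->bio.status == 7.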
      
      Fixes: 542ff7bf ("block: new direct I/O implementation")
      Cc: stable@vger.kernel.org
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jason Yan <yanaijie@huawei.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      543bb48d
    • J
      block: do not leak memory in bio_copy_user_iov() · 2591bfc6
      Jérôme Glisse committed
      commit a3761c3c91209b58b6f33bf69dd8bb8ec0c9d925 upstream.
      
      When bio_add_pc_page() fails in bio_copy_user_iov() we should free
      the page we just allocated otherwise we are leaking it.
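      
      The error path described above follows a common pattern: a resource
      allocated just before a failing call must be released on that path.
      A minimal sketch, in which page_alloc(), page_free(), add_page() and
      fill_pages() are hypothetical stand-ins rather than block layer
      functions:

```c
#include <assert.h>
#include <stdlib.h>

static int live_pages;          /* pages allocated and not yet freed */

static void *page_alloc(void)
{
    void *p = malloc(64);

    if (p)
        live_pages++;
    return p;
}

static void page_free(void *p)
{
    if (p) {
        free(p);
        live_pages--;
    }
}

/* Stand-in for bio_add_pc_page(): fails once all slots are used. */
static int add_page(void **slots, int *used, int cap, void *page)
{
    if (*used >= cap)
        return 0;
    slots[(*used)++] = page;
    return 1;
}

/* Attach up to 'want' pages; returns the number actually attached. */
static int fill_pages(void **slots, int cap, int want)
{
    int used = 0;

    assert(cap >= 0 && want >= 0);
    while (used < want) {
        void *page = page_alloc();

        if (!page)
            break;
        if (!add_page(slots, &used, cap, page)) {
            page_free(page);    /* without this the page is leaked */
            break;
        }
    }
    return used;
}
```

      Asking for three pages with room for two leaves exactly two live
      pages in this model; dropping the page_free() call would leave three.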
      
      Cc: linux-block@vger.kernel.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: stable@vger.kernel.org
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2591bfc6
    • D
      riscv: Fix syscall_get_arguments() and syscall_set_arguments() · 7af20b60
      Dmitry V. Levin committed
      commit 10a16997db3d99fc02c026cf2c6e6c670acafab0 upstream.
      
      RISC-V syscall arguments are located in orig_a0,a1..a5 fields
      of struct pt_regs.
      
      Due to an off-by-one bug and a bug in pointer arithmetic
      syscall_get_arguments() was reading s3..s7 fields instead of a1..a5.
      Likewise, syscall_set_arguments() was writing s3..s7 fields
      instead of a1..a5.
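      
      The intended mapping (argument 0 from orig_a0, arguments 1..5 from
      a1..a5) can be sketched as below. This is an illustration under
      assumptions: struct regs_model is a hypothetical, reduced register
      set rather than the real struct pt_regs, and the copy goes through a
      temporary array instead of the pointer arithmetic used in the kernel.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical reduced register set; not the real struct pt_regs layout. */
struct regs_model {
    unsigned long a0, a1, a2, a3, a4, a5;
    unsigned long orig_a0;      /* a0 as it was at syscall entry */
};

/* Copy syscall arguments i .. i+n-1 into args[0..n-1]. */
static void get_arguments(const struct regs_model *regs, unsigned int i,
                          unsigned int n, unsigned long *args)
{
    /* Argument 0 is orig_a0, not a0; arguments 1..5 are a1..a5. */
    const unsigned long all[6] = {
        regs->orig_a0, regs->a1, regs->a2,
        regs->a3, regs->a4, regs->a5,
    };

    assert(i + n <= 6);
    memcpy(args, &all[i], n * sizeof(args[0]));
}
```

      Indexing the source in units of whole registers avoids both the
      off-by-one and the scaled-twice pointer offset described above.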
      
      Link: http://lkml.kernel.org/r/20190329171221.GA32456@altlinux.org
      
      Fixes: e2c0cdfb ("RISC-V: User-facing API")
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Will Drewry <wad@chromium.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: linux-riscv@lists.infradead.org
      Cc: stable@vger.kernel.org # v4.15+
      Acked-by: Palmer Dabbelt <palmer@sifive.com>
      Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7af20b60