1. 06 Dec, 2018 40 commits
    • ALSA: hda/realtek - Support ALC300 · 094c0089
      Committed by Kailang Yang
      commit 1078bef0cd9291355a20369b21cd823026ab8eaa upstream.
      
      This patch will enable ALC300.
      
      [ It's almost equivalent with other ALC269-compatible ones, and
        apparently has no loopback mixer -- tiwai ]

      Signed-off-by: Kailang Yang <kailang@realtek.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: hda: Add ASRock N68C-S UCC the power_save blacklist · bb951d8d
      Committed by Hans de Goede
      commit 39070a98d668db8fbaa2a6a6752f732cbcbb14b1 upstream.
      
      Power-saving is causing plops on audio start/stop on the built-in audio
      of the nForce 430 based ASRock N68C-S UCC motherboard, so add this model
      to the power_save blacklist.
      
      BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1525104
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: sparc: Fix invalid snd_free_pages() at error path · 15c5fb33
      Committed by Takashi Iwai
      commit 9a20332ab373b1f8f947e0a9c923652b32dab031 upstream.
      
      Some spurious calls of snd_free_pages() have been overlooked and
      remain in the error paths of the sparc cs4231 driver code.  Since
      runtime->dma_area is managed by the PCM core helper, we shouldn't
      release it manually.
      
      Drop the superfluous calls.

      Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: control: Fix race between adding and removing a user element · d8a2dca0
      Committed by Takashi Iwai
      commit e1a7bfe3807974e66f971f2589d4e0197ec0fced upstream.
      
      The procedure for adding a user control element leaves a window open
      for a race against the concurrent removal of a user element.  This was
      caught by syzkaller, hitting a KASAN use-after-free error.

      This patch addresses the bug by wrapping the whole procedure to add a
      user control element with the card->controls_rwsem, instead of only
      around the increment of card->user_ctl_count.

      This required a slight code refactoring, too.  The function
      snd_ctl_add() is split into two parts: a core function to add the
      control element and a part calling it.  The former is called from the
      function for adding a user control element inside the controls_rwsem.

      One change to be noted is that snd_ctl_notify() for adding a control
      element now gets called inside the controls_rwsem as well, whereas it
      was previously called outside the rwsem.  But this should be OK, as
      snd_ctl_notify() takes another (finer) rwlock instead of the rwsem, and
      snd_ctl_notify() is already called inside the rwsem in another code path.
      
      Reported-by: syzbot+dc09047bce3820621ba2@syzkaller.appspotmail.com
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: ac97: Fix incorrect bit shift at AC97-SPSA control write · b77c35ef
      Committed by Takashi Iwai
      commit 7194eda1ba0872d917faf3b322540b4f57f11ba5 upstream.
      
      The function snd_ac97_put_spsa() gets the bit shift value from the
      associated private_value, but it extracts too much; the current code
      extracts an 8-bit value from bits 8-15, but this is a combination of two
      nibbles (bits 8-11 and bits 12-15) for left and right shifts.
      Due to the incorrect bit extraction, the actual shift may go beyond
      the 32-bit value, as spotted recently by a UBSAN check:
       UBSAN: Undefined behaviour in sound/pci/ac97/ac97_codec.c:836:7
       shift exponent 68 is too large for 32-bit type 'int'

      This patch fixes the shift value extraction by masking it properly
      with 0x0f instead of 0xff.

      Reported-and-tested-by: Meelis Roos <mroos@linux.ee>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
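To make the mask change in the ac97 fix above concrete, here is a minimal C sketch. The helper names are hypothetical, and the choice of the low nibble (bits 8-11) as the wanted shift is an assumption read off the commit text, not copied from the driver source.

```c
/* Hypothetical helpers illustrating the mask change; not the driver code. */

/* Pre-fix behaviour: takes all of bits 8-15, i.e. both nibbles at once. */
unsigned int spsa_shift_buggy(unsigned long private_value)
{
    return (private_value >> 8) & 0xff;
}

/* Post-fix behaviour: keeps only bits 8-11, so the result is at most 15. */
unsigned int spsa_shift_fixed(unsigned long private_value)
{
    return (private_value >> 8) & 0x0f;
}
```

With an illustrative value of 0x44 in bits 8-15, the 0xff mask produces a shift of 68, the same exponent that appears in the UBSAN report, while the 0x0f mask produces 4.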
    • ALSA: wss: Fix invalid snd_free_pages() at error path · e83c4405
      Committed by Takashi Iwai
      commit 7b69154171b407844c273ab4c10b5f0ddcd6aa29 upstream.
      
      Some spurious calls of snd_free_pages() have been overlooked and
      remain in the error paths of the wss driver code.  Since runtime->dma_area
      is managed by the PCM core helper, we shouldn't release it manually.
      
      Drop the superfluous calls.

      Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fs: fix lost error code in dio_complete · adcd35a3
      Committed by Maximilian Heyne
      commit 41e817bca3acd3980efe5dd7d28af0e6f4ab9247 upstream.
      
      commit e2592217 ("fs: simplify the
      generic_write_sync prototype") reworked callers of generic_write_sync(),
      and ended up dropping the error return for the directio path. Prior to
      that commit, in dio_complete(), an error would be bubbled up the stack,
      but after that commit, errors passed on to dio_complete were eaten up.
      
      This was reported on the list earlier, and a fix was proposed in
      https://lore.kernel.org/lkml/20160921141539.GA17898@infradead.org/, but
      never followed up with.  We recently hit this bug in our testing where
      fencing io errors, which were previously erroring out with EIO, were
      being returned as success operations after this commit.
      
      The fix proposed on the list earlier was a little short -- it would have
      still called generic_write_sync() in case `ret` already contained an
      error. This fix ensures generic_write_sync() is only called when there's
      no pending error in the write. Additionally, transferred is replaced
      with ret to bring this code in line with other callers.
      
      Fixes: e2592217 ("fs: simplify the generic_write_sync prototype")
      Reported-by: Ravi Nankani <rnankani@amazon.com>
      Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      CC: Torsten Mehlan <tomeh@amazon.de>
      CC: Uwe Dannowski <uwed@amazon.de>
      CC: Amit Shah <aams@amazon.de>
      CC: David Woodhouse <dwmw@amazon.co.uk>
      CC: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
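The control flow described in the dio_complete fix above can be sketched in plain C. This is only an illustration under stated assumptions: write_sync_stub() is a made-up stand-in for generic_write_sync(), and complete_write_sketch() is a hypothetical name, not the kernel function.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for generic_write_sync(): returns a negative
 * errno if the sync fails, otherwise passes the byte count through.
 */
long write_sync_stub(long count, int sync_err)
{
    return sync_err ? sync_err : count;
}

/*
 * Sketch of the fixed completion path: the sync step runs only when
 * there is no pending error, so a negative ret is never overwritten.
 */
long complete_write_sketch(long ret, int sync_err)
{
    if (ret > 0)
        ret = write_sync_stub(ret, sync_err);
    return ret;
}
```

A pending -EIO passed in as ret comes back unchanged, while a successful write still reports an error raised by the sync step.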
    • perf/x86/intel: Disallow precise_ip on BTS events · 205af59e
      Committed by Jiri Olsa
      commit 472de49fdc53365c880ab81ae2b5cfdd83db0b06 upstream.
      
      Vince reported a crash in the BTS flush code when touching the callchain
      data, which was supposed to be initialized as an 'early' callchain,
      but intel_pmu_drain_bts_buffer() does not do that:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
        ...
        Call Trace:
         <IRQ>
         intel_pmu_drain_bts_buffer+0x151/0x220
         ? intel_get_event_constraints+0x219/0x360
         ? perf_assign_events+0xe2/0x2a0
         ? select_idle_sibling+0x22/0x3a0
         ? __update_load_avg_se+0x1ec/0x270
         ? enqueue_task_fair+0x377/0xdd0
         ? cpumask_next_and+0x19/0x20
         ? load_balance+0x134/0x950
         ? check_preempt_curr+0x7a/0x90
         ? ttwu_do_wakeup+0x19/0x140
         x86_pmu_stop+0x3b/0x90
         x86_pmu_del+0x57/0x160
         event_sched_out.isra.106+0x81/0x170
         group_sched_out.part.108+0x51/0xc0
         __perf_event_disable+0x7f/0x160
         event_function+0x8c/0xd0
         remote_function+0x3c/0x50
         flush_smp_call_function_queue+0x35/0xe0
         smp_call_function_single_interrupt+0x3a/0xd0
         call_function_single_interrupt+0xf/0x20
         </IRQ>
      
      It was triggered by a fuzzer but can be easily reproduced by:
      
        # perf record -e cpu/branch-instructions/pu -g -c 1
      
      Peter suggested not to allow branch tracing for precise events:
      
       > Now arguably, this is really stupid behaviour. Who in his right mind
       > wants callchain output on BTS entries. And even if they do, BTS +
       > precise_ip is nonsensical.
       >
       > So in my mind disallowing precise_ip on BTS would be the simplest fix.

      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Reported-by: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 6cbc304f ("perf/x86/intel: Fix unwind errors from PEBS entries (mk-II)")
      Link: http://lkml.kernel.org/r/20181121101612.16272-3-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf/x86/intel: Add generic branch tracing check to intel_pmu_has_bts() · be0e2e24
      Committed by Jiri Olsa
      commit 67266c1080ad56c31af72b9c18355fde8ccc124a upstream.
      
      Currently we check the branch tracing only by checking for the
      PERF_COUNT_HW_BRANCH_INSTRUCTIONS event of PERF_TYPE_HARDWARE
      type. But we can define the same event with the PERF_TYPE_RAW
      type.
      
      Change the intel_pmu_has_bts() code to check the event's final
      hw config value, so both HW types are covered.

      Add unlikely() to the intel_pmu_has_bts() condition calls, because
      it was used in the original code in intel_bts_constraints.

      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/20181121101612.16272-2-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf/x86/intel: Move branch tracing setup to the Intel-specific source file · ad65b548
      Committed by Jiri Olsa
      commit ed6101bbf6266ee83e620b19faa7c6ad56bb41ab upstream.
      
      Move the branch tracing setup to the Intel core object, into a separate
      intel_pmu_bts_config() function, because it's Intel specific.

      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/20181121101612.16272-1-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/fpu: Disable bottom halves while loading FPU registers · 33448a8b
      Committed by Sebastian Andrzej Siewior
      commit 68239654acafe6aad5a3c1dc7237e60accfebc03 upstream.
      
      The sequence
      
        fpu->initialized = 1;		/* step A */
        preempt_disable();		/* step B */
        fpu__restore(fpu);
        preempt_enable();
      
      in __fpu__restore_sig() is racy in regard to a context switch.
      
      For 32bit frames, __fpu__restore_sig() prepares the FPU state within
      fpu->state. To ensure that a context switch (switch_fpu_prepare() in
      particular) does not modify fpu->state it uses fpu__drop() which sets
      fpu->initialized to 0.
      
      After fpu->initialized is cleared, the CPU's FPU state is not saved
      to fpu->state during a context switch. The new state is loaded via
      fpu__restore(). It gets loaded into fpu->state from userland and
      is checked to be sane. fpu->initialized is then set to 1 in order to
      avoid fpu__initialize(), which is part of fpu__restore(), doing
      anything (overwriting the new state).
      
      A context switch between step A and B above would save CPU's current FPU
      registers to fpu->state and overwrite the newly prepared state. This
      looks like a tiny race window but the Kernel Test Robot reported this
      back in 2016 while we had lazy FPU support. Borislav Petkov made the
      link between that report and another patch that has been posted. Since
      the removal of the lazy FPU support, this race goes unnoticed because
      the warning has been removed.
      
      Disable bottom halves around the restore sequence to avoid the race. BHs
      need to be disabled because they are allowed to run (even with preemption
      disabled) and might invoke kernel_fpu_begin() by doing IPsec.
      
       [ bp: massage commit message a bit. ]

      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: kvm ML <kvm@vger.kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: stable@vger.kernel.org
      Cc: x86-ml <x86@kernel.org>
      Link: http://lkml.kernel.org/r/20181120102635.ddv3fvavxajjlfqk@linutronix.de
      Link: https://lkml.kernel.org/r/20160226074940.GA28911@pd.tnic
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/MCE/AMD: Fix the thresholding machinery initialization order · 00f91adf
      Committed by Borislav Petkov
      commit 60c8144afc287ef09ce8c1230c6aa972659ba1bb upstream.
      
      Currently, the code sets up the thresholding interrupt vector and only
      then goes about initializing the thresholding banks. This is wrong,
      because an early thresholding interrupt would cause a NULL pointer
      dereference when accessing those banks and prevent the machine from
      booting.
      
      Therefore, set the thresholding interrupt vector only *after* having
      initialized the banks successfully.
      
      Fixes: 18807ddb ("x86/mce/AMD: Reset Threshold Limit after logging error")
      Reported-by: Rafał Miłecki <rafal@milecki.pl>
      Reported-by: John Clemens <clemej@gmail.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Tested-by: Rafał Miłecki <rafal@milecki.pl>
      Tested-by: John Clemens <john@deater.net>
      Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
      Cc: linux-edac@vger.kernel.org
      Cc: stable@vger.kernel.org
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: x86@kernel.org
      Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
      Link: https://lkml.kernel.org/r/20181127101700.2964-1-zajec5@gmail.com
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=201291
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: dts: rockchip: Fix PCIe reset polarity for rk3399-puma-haikou. · 8af02415
      Committed by Christoph Muellner
      commit c1d91f86a1b4c9c05854d59c6a0abd5d0f75b849 upstream.
      
      This patch fixes the wrong polarity setting for the PCIe host driver's
      pre-reset pin for rk3399-puma-haikou. Without this patch link training
      will most likely fail.
      
      Fixes: 60fd9f72 ("arm64: dts: rockchip: add Haikou baseboard with RK3399-Q7 SoM")
      Cc: stable@vger.kernel.org
      Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
      Signed-off-by: Heiko Stuebner <heiko@sntech.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI: Fix incorrect value returned from pcie_get_speed_cap() · ab770216
      Committed by Mikulas Patocka
      commit f1f90e254e46e0a14220e4090041f68256fbe297 upstream.
      
      The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks.  We must mask
      the register and compare it against them.
      
      This fixes errors like this:
      
        amdgpu: [powerplay] failed to send message 261 ret is 0
      
      when a PCIe-v3 card is plugged into a PCIe-v1 slot, because the slot is
      being incorrectly reported as PCIe-v3 capable.
      
      6cf57be0, which appeared in v4.17, added pcie_get_speed_cap() with the
      incorrect test of PCI_EXP_LNKCAP_SLS as a bitmask.  5d9a6330, which
      appeared in v4.19, changed amdgpu to use pcie_get_speed_cap(), so the
      amdgpu bug reports below are regressions in v4.19.
      
      Fixes: 6cf57be0 ("PCI: Add pcie_get_speed_cap() to find max supported link speed")
      Fixes: 5d9a6330 ("drm/amdgpu: use pcie functions for link width and speed")
      Link: https://bugs.freedesktop.org/show_bug.cgi?id=108704
      Link: https://bugs.freedesktop.org/show_bug.cgi?id=108778
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      [bhelgaas: update comment, remove use of PCI_EXP_LNKCAP_SLS_8_0GB and
      PCI_EXP_LNKCAP_SLS_16_0GB since those should be covered by PCI_EXP_LNKCAP2,
      remove test of PCI_EXP_LNKCAP for zero, since that register is required]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org	# v4.17+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
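The difference between testing a field value as a bit mask and comparing the masked field, as described in the pcie_get_speed_cap() fix above, can be sketched as follows. The constants mirror the values of the PCI_EXP_LNKCAP_SLS* macros as I understand them from the kernel's pci_regs.h; the function and enum names are hypothetical, and the sketch ignores PCI_EXP_LNKCAP2 entirely, which the real function also consults.

```c
#include <stdint.h>

/* Local copies of the Link Capabilities speed field encoding (assumed values). */
#define LNKCAP_SLS        0x0000000fu  /* Supported Link Speeds field mask */
#define LNKCAP_SLS_2_5GB  0x00000001u  /* field value, not a bit */
#define LNKCAP_SLS_5_0GB  0x00000002u
#define LNKCAP_SLS_8_0GB  0x00000003u

enum link_speed { SPEED_UNKNOWN, SPEED_2_5GT, SPEED_5_0GT, SPEED_8_0GT };

/* Buggy pattern: treats the field values as if they were bit masks. */
enum link_speed speed_cap_buggy(uint32_t lnkcap)
{
    if (lnkcap & LNKCAP_SLS_8_0GB)
        return SPEED_8_0GT;
    if (lnkcap & LNKCAP_SLS_5_0GB)
        return SPEED_5_0GT;
    if (lnkcap & LNKCAP_SLS_2_5GB)
        return SPEED_2_5GT;
    return SPEED_UNKNOWN;
}

/* Fixed pattern: mask the register first, then compare against the values. */
enum link_speed speed_cap_fixed(uint32_t lnkcap)
{
    switch (lnkcap & LNKCAP_SLS) {
    case LNKCAP_SLS_8_0GB:
        return SPEED_8_0GT;
    case LNKCAP_SLS_5_0GB:
        return SPEED_5_0GT;
    case LNKCAP_SLS_2_5GB:
        return SPEED_2_5GT;
    default:
        return SPEED_UNKNOWN;
    }
}
```

A 2.5 GT/s slot reports the field value 1; since 1 & 3 is non-zero, the buggy variant classifies it as 8.0 GT/s, which matches the symptom of a PCIe-v1 slot being reported as PCIe-v3 capable.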
    • PCI: dwc: Fix MSI-X EP framework address calculation bug · 1ce69ec3
      Committed by Gustavo Pimentel
      commit 15cb127e3c8f6232096d5dba6a5b4046bc292d70 upstream.
      
      Fix an error caused by 3-bit right rotation on offset address
      calculation of MSI-X table in dw_pcie_ep_raise_msix_irq().
      
      The initial testing code was setting by default the offset address of
      MSI-X table to zero, so that even with a 3-bit right rotation the
      computed result would still be zero and valid, therefore this bug went
      unnoticed.
      
      Fixes: beb4641a ("PCI: dwc: Add MSI-X callbacks handler")
      Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
      [lorenzo.pieralisi@arm.com: updated commit log]
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
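Why the extra 3-bit shift in the MSI-X commit above went unnoticed with a zero table offset can be shown with a small C sketch. It assumes the usual MSI-X Table Offset/BIR register layout (low 3 bits select the BAR, the remaining bits hold the 8-byte-aligned offset); the function names are hypothetical and do not come from the dwc driver.

```c
#include <stdint.h>

#define MSIX_TABLE_BIR_MASK 0x7u  /* low 3 bits: BAR indicator (assumed layout) */

/* Buggy pattern: strips the BIR bits and then shifts the offset right by 3. */
uint32_t msix_tbl_offset_buggy(uint32_t reg)
{
    return (reg & ~MSIX_TABLE_BIR_MASK) >> 3;
}

/* Fixed pattern: the offset is just the register with the BIR bits cleared. */
uint32_t msix_tbl_offset_fixed(uint32_t reg)
{
    return reg & ~MSIX_TABLE_BIR_MASK;
}
```

Both variants agree when the offset is zero, which is the default the commit message says the initial testing used; they diverge as soon as the table sits at a non-zero offset.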
    • PCI: layerscape: Fix wrong invocation of outbound window disable accessor · b391ed73
      Committed by Hou Zhiqiang
      commit c6fd6fe9dea44732cdcd970f1130b8cc50ad685a upstream.
      
      The order of parameters is not correct when invoking the outbound
      window disable routine. Fix it.
      
      Fixes: 4a2745d7 ("PCI: layerscape: Disable outbound windows configured by bootloader")
      Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
      [lorenzo.pieralisi@arm.com: commit log]
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: relocation: set trans to be NULL after ending transaction · 59065765
      Committed by Pan Bian
      commit 42a657f57628402c73237547f0134e083e2f6764 upstream.
      
      The function relocate_block_group calls btrfs_end_transaction to release
      trans when update_backref_cache returns 1, and then continues the loop
      body. If btrfs_block_rsv_refill fails this time, it will jump out of the
      loop and the freed trans will be accessed. This may result in a
      use-after-free bug. The patch assigns NULL to trans after trans is
      released so that it will not be accessed.
      
      Fixes: 0647bf56 ("Btrfs: improve forever loop when doing balance relocation")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Pan Bian <bianpan2016@163.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
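The pattern behind the btrfs relocation fix above, clearing a handle right after releasing it so that later cleanup cannot touch it, can be sketched in userspace C. Everything here (struct txn, txn_start(), txn_end(), relocate_sketch() and its two flags) is hypothetical scaffolding that only imitates the loop shape described in the commit message.

```c
#include <stdlib.h>

struct txn { int id; };

static int live_txns;  /* number of handles started but not yet ended */

struct txn *txn_start(void)
{
    struct txn *t = malloc(sizeof(*t));
    if (t)
        live_txns++;
    return t;
}

void txn_end(struct txn *t)
{
    free(t);
    live_txns--;
}

/*
 * restart_once: the first iteration ends its transaction and loops again.
 * refill_fails: the reservation step fails on the iteration after a restart.
 * Returns the number of handles still live after cleanup (0 when balanced).
 */
int relocate_sketch(int restart_once, int refill_fails)
{
    struct txn *trans = NULL;
    int restarted = 0;

    for (;;) {
        if (refill_fails && restarted)
            break;                 /* error exit before a new handle exists */

        trans = txn_start();
        if (!trans)
            break;

        if (restart_once && !restarted) {
            txn_end(trans);
            trans = NULL;          /* the fix: forget the released handle */
            restarted = 1;
            continue;
        }
        break;
    }

    if (trans)                     /* cleanup only touches a live handle */
        txn_end(trans);
    return live_txns;
}
```

Without the `trans = NULL` line, the error exit on the second iteration would reach the cleanup with a pointer that has already been freed.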
    • Btrfs: fix race between enabling quotas and subvolume creation · 172a94eb
      Committed by Filipe Manana
      commit 552f0329c75b3e1d7f9bb8c9e421d37403f192cd upstream.
      
      We have a race between enabling quotas and subvolume creation that causes
      subvolume creation to fail with -EINVAL, and the following diagram shows
      how it happens:
      
                    CPU 0                                          CPU 1
      
       btrfs_ioctl()
        btrfs_ioctl_quota_ctl()
         btrfs_quota_enable()
          mutex_lock(fs_info->qgroup_ioctl_lock)
      
                                                        btrfs_ioctl()
                                                         create_subvol()
                                                          btrfs_qgroup_inherit()
                                                           -> save fs_info->quota_root
                                                              into quota_root
                                                           -> stores a NULL value
                                                           -> tries to lock the mutex
                                                              qgroup_ioctl_lock
                                                              -> blocks waiting for
                                                                 the task at CPU0
      
         -> sets BTRFS_FS_QUOTA_ENABLED in fs_info
         -> sets quota_root in fs_info->quota_root
            (non-NULL value)
      
         mutex_unlock(fs_info->qgroup_ioctl_lock)
      
                                                           -> checks quota enabled
                                                              flag is set
                                                           -> returns -EINVAL because
                                                              fs_info->quota_root was
                                                              NULL before it acquired
                                                              the mutex
                                                              qgroup_ioctl_lock
                                                         -> ioctl returns -EINVAL
      
      Returning -EINVAL to user space will be confusing if all the arguments
      passed to the subvolume creation ioctl were valid.
      
      Fix it by grabbing the value from fs_info->quota_root after acquiring
      the mutex.
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: fix rare chances for data loss when doing a fast fsync · 715608db
      Committed by Filipe Manana
      commit aab15e8ec25765cf7968c72cbec7583acf99d8a4 upstream.
      
      After the simplification of the fast fsync patch done recently by commit
      b5e6c3e1 ("btrfs: always wait on ordered extents at fsync time") and
      commit e7175a69 ("btrfs: remove the wait ordered logic in the
      log_one_extent path"), we got a very short time window where we can get
      extents logged without writeback completing first or extents logged
      without logging the respective data checksums. Both issues can only happen
      when doing a non-full (fast) fsync.
      
      As soon as we enter btrfs_sync_file() we trigger writeback, then lock the
      inode and then wait for the writeback to complete before starting to log
      the inode. However before we acquire the inode's lock and after we started
      writeback, it's possible that more writes happened and dirtied more pages.
      If that happened and those pages get writeback triggered while we are
      logging the inode (for example, the VM subsystem triggering it due to
      memory pressure, or another concurrent fsync), we end up seeing the
      respective extent maps in the inode's list of modified extents and will
      log matching file extent items without waiting for the respective
      ordered extents to complete, meaning that either of the following will
      happen:
      
      1) We log an extent after its writeback finishes but before its checksums
         are added to the csum tree, leading to -EIO errors when attempting to
         read the extent after a log replay.
      
      2) We log an extent before its writeback finishes.
         Therefore after the log replay we will have a file extent item pointing
         to an unwritten extent (and without the respective data checksums as
         well).
      
      This could not happen before the fast fsync patch simplification, because
      for any extent we found in the list of modified extents, we would wait for
      its respective ordered extent to finish writeback or collect its checksums
      for logging if it did not complete yet.
      
      Fix this by triggering writeback again after acquiring the inode's lock
      and before waiting for ordered extents to complete.
      
      Fixes: e7175a69 ("btrfs: remove the wait ordered logic in the log_one_extent path")
      Fixes: b5e6c3e1 ("btrfs: always wait on ordered extents at fsync time")
      CC: stable@vger.kernel.org # 4.19+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: ensure path name is null terminated at btrfs_control_ioctl · 78a2890f
      Committed by Filipe Manana
      commit f505754fd6599230371cb01b9332754ddc104be1 upstream.
      
      We were using the path name received from user space without checking that
      it is null terminated. While btrfs-progs is well behaved and does proper
      validation and null termination, someone could call the ioctl and pass
      a non-null terminated path, leading to buffer overrun problems in the
      kernel.  The ioctl is protected by CAP_SYS_ADMIN.
      
      So just set the last byte of the path to a null character, similar to what
      we do in other ioctls (add/remove/resize device, snapshot creation, etc).
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
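The approach described in the btrfs_control_ioctl fix above, forcing the last byte of a user-supplied path buffer to a null character, looks roughly like this in plain C. The struct, the buffer size, and the function name are hypothetical and much smaller than the real ioctl argument; only the technique is taken from the commit text.

```c
#include <string.h>

#define SKETCH_PATH_NAME_MAX 15  /* hypothetical; the real limit is far larger */

/* Hypothetical stand-in for the ioctl argument copied in from user space. */
struct vol_args_sketch {
    long long fd;
    char name[SKETCH_PATH_NAME_MAX + 1];
};

/*
 * Force termination regardless of what the caller supplied, so that later
 * string functions cannot run past the end of the buffer.
 */
void terminate_path(struct vol_args_sketch *vol)
{
    vol->name[SKETCH_PATH_NAME_MAX] = '\0';
}
```

Writing the terminator unconditionally is simpler than validating the input: a well-formed path is unaffected, and an unterminated one is truncated to the buffer size.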
    • btrfs: Always try all copies when reading extent buffers · aaf249e3
      Committed by Nikolay Borisov
      commit f8397d69daef06d358430d3054662fb597e37c00 upstream.
      
      When a metadata read is served, the endio routine btree_readpage_end_io_hook
      is called, which eventually runs the tree-checker. If the tree-checker fails
      to validate the read eb, it sets the EXTENT_BUFFER_CORRUPT flag. This
      leads to btree_read_extent_buffer_pages wrongly assuming that all
      available copies of this extent buffer are wrong and failing prematurely.
      Fix this by modifying btree_read_extent_buffer_pages to read all copies of
      the data.

      This failure was exhibited in xfstests btrfs/124, which would
      spuriously fail its balance operations. The reason was that when balance
      was run following re-introduction of the missing raid1 disk,
      __btrfs_map_block would map the read request to stripe 0, which
      corresponded to devid 2 (the disk which is being removed in the test):
      
          item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 3553624064) itemoff 15975 itemsize 112
      	length 1073741824 owner 2 stripe_len 65536 type DATA|RAID1
      	io_align 65536 io_width 65536 sector_size 4096
      	num_stripes 2 sub_stripes 1
      		stripe 0 devid 2 offset 2156920832
      		dev_uuid 8466c350-ed0c-4c3b-b17d-6379b445d5c8
      		stripe 1 devid 1 offset 3553624064
      		dev_uuid 1265d8db-5596-477e-af03-df08eb38d2ca
      
      This caused read requests for a checksum item to be routed to the
      stale disk, which triggered the aforementioned logic involving the
      EXTENT_BUFFER_CORRUPT flag. This then triggered cascading failures of
      the balance operation.
      
      Fixes: a826d6dc ("Btrfs: check items for correctness as we search")
      CC: stable@vger.kernel.org # 4.4+
      Suggested-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
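The "try every copy before giving up" behaviour described in the extent buffer commit above can be sketched as a simple loop. This is a toy model, not the btrfs code: read_any_copy() is a hypothetical name, each array entry stands for whether that mirror would pass validation, and the 1-based mirror numbering is an assumption made for the sketch.

```c
/*
 * copy_ok[i] is non-zero if copy i+1 would pass validation.
 * Returns the 1-based number of the first good copy, or -1 if none is good.
 */
int read_any_copy(const int *copy_ok, int num_copies)
{
    int mirror;

    for (mirror = 1; mirror <= num_copies; mirror++) {
        if (copy_ok[mirror - 1])
            return mirror;   /* a valid copy was found, stop here */
        /* a corrupt copy is not fatal: fall through to the next mirror */
    }
    return -1;               /* only now is the read a real failure */
}
```

In the scenario from the commit message, the first stripe lives on the stale disk; with this loop the read still succeeds from the second copy instead of failing on the first corrupt result.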
    • udf: Allow mounting volumes with incorrect identification strings · 949ddf80
      Committed by Jan Kara
      commit b54e41f5efcb4316b2f30b30c2535cc194270373 upstream.
      
      Commit c26f6c61 ("udf: Fix conversion of 'dstring' fields to UTF8")
      started to be more strict when checking whether converted strings are
      properly formatted. Sudip reports that there are DVDs where the volume
      identification string is actually too long - UDF reports:
      
      [  632.309320] UDF-fs: incorrect dstring lengths (32/32)
      
      during mount and fails the mount. This is a mostly harmless failure as we
      don't need volume identification (and even less volume set
      identification) for anything. So just truncate the volume identification
      string if it is too long and replace it with 'Invalid' if we just cannot
      convert it for other reasons. This keeps slightly incorrect media still
      mountable.
      
      CC: stable@vger.kernel.org
      Fixes: c26f6c61 ("udf: Fix conversion of 'dstring' fields to UTF8")
      Reported-and-tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xtensa: fix coprocessor part of ptrace_{get,set}xregs · 01fb21bf
      Committed by Max Filippov
      commit 38a35a78c5e270cbe53c4fef6b0d3c2da90dd849 upstream.
      
      Layout of coprocessor registers in the elf_xtregs_t and
      xtregs_coprocessor_t may be different due to alignment. Thus it is not
      always possible to copy data between the xtregs_coprocessor_t structure
      and the elf_xtregs_t and get correct values for all registers.
      Use a table of offsets and sizes of individual coprocessor register
      groups to do coprocessor context copying in the ptrace_getxregs and
      ptrace_setxregs.
      This fixes native gdb reading incorrect coprocessor register values from
      the user process on an xtensa core with multiple coprocessors and
      registers with high alignment requirements.
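
      The table-driven copy described above can be sketched in plain C as
      follows. This is only an illustration under stated assumptions: the two
      struct layouts, the group names cp0/cp1 and the helper names are
      hypothetical stand-ins for xtregs_coprocessor_t and elf_xtregs_t, not the
      kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical kernel-side layout: cp1 is 16-byte aligned, so padding follows cp0. */
struct kernel_cp_ctx {
    uint32_t cp0[2];
    _Alignas(16) uint8_t cp1[16];
};

/* Hypothetical user-visible layout: the same groups, packed back to back. */
struct user_cp_ctx {
    uint32_t cp0[2];
    uint8_t  cp1[16];
};

/* One entry per register group: where it lives in each layout and how big it is. */
struct cp_group {
    size_t kernel_off;
    size_t user_off;
    size_t size;
};

static const struct cp_group cp_groups[] = {
    { offsetof(struct kernel_cp_ctx, cp0), offsetof(struct user_cp_ctx, cp0), 8 },
    { offsetof(struct kernel_cp_ctx, cp1), offsetof(struct user_cp_ctx, cp1), 16 },
};

#define CP_GROUP_COUNT (sizeof(cp_groups) / sizeof(cp_groups[0]))

/* Copy group by group instead of one memcpy over the whole structure. */
static void cp_ctx_to_user(struct user_cp_ctx *u, const struct kernel_cp_ctx *k)
{
    for (size_t i = 0; i < CP_GROUP_COUNT; i++)
        memcpy((char *)u + cp_groups[i].user_off,
               (const char *)k + cp_groups[i].kernel_off,
               cp_groups[i].size);
}

static void cp_ctx_from_user(struct kernel_cp_ctx *k, const struct user_cp_ctx *u)
{
    for (size_t i = 0; i < CP_GROUP_COUNT; i++)
        memcpy((char *)k + cp_groups[i].kernel_off,
               (const char *)u + cp_groups[i].user_off,
               cp_groups[i].size);
}
```

      A single memcpy() between the two structures would shift cp1 by the
      padding bytes; the per-group table avoids that in both directions.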
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      01fb21bf
    • M
      xtensa: fix coprocessor context offset definitions · 5f84a996
      Max Filippov committed
      commit 03bc996af0cc71c7f30c384d8ce7260172423b34 upstream.
      
      Coprocessor context offsets are used by the assembly code that moves
      coprocessor context between the individual fields of the
      thread_info::xtregs_cp structure and coprocessor registers.
      This fixes coprocessor context clobbering on flushing and reloading
      during normal user code execution and user process debugging in the
      presence of more than one coprocessor in the core configuration.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f84a996
    • M
      xtensa: enable coprocessors that are being flushed · 4ec1039f
      Max Filippov committed
      commit 2958b66694e018c552be0b60521fec27e8d12988 upstream.
      
      coprocessor_flush_all may be called from a context of a thread that is
      different from the thread being flushed. In that case contents of the
      cpenable special register may not match ti->cpenable of the target
      thread, resulting in unhandled coprocessor exception in the kernel
      context.
      Set the cpenable special register to the ti->cpenable of the target thread
      for the duration of the flush and restore it afterwards.
      This fixes the following crash caused by coprocessor register inspection
      in native gdb:
      
        (gdb) p/x $w0
        Illegal instruction in kernel: sig: 9 [#1] PREEMPT
        Call Trace:
          ___might_sleep+0x184/0x1a4
          __might_sleep+0x41/0xac
          exit_signals+0x14/0x218
          do_exit+0xc9/0x8b8
          die+0x99/0xa0
          do_illegal_instruction+0x18/0x6c
          common_exception+0x77/0x77
          coprocessor_flush+0x16/0x3c
          arch_ptrace+0x46c/0x674
          sys_ptrace+0x2ce/0x3b4
          system_call+0x54/0x80
          common_exception+0x77/0x77
        note: gdb[100] exited with preempt_count 1
        Killed
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4ec1039f
    • L
      KVM: VMX: re-add ple_gap module parameter · bbe23c4b
      Luiz Capitulino committed
      commit a87c99e6 upstream.
      
      Apparently, the ple_gap parameter was accidentally removed by commit
      c8e88717. Add it back.
      Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: c8e88717
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      bbe23c4b
    • W
      KVM: X86: Fix scan ioapic use-before-initialization · 61c42d65
      Wanpeng Li committed
      commit e97f852fd4561e77721bb9a4e0ea9d98305b1e93 upstream.
      
      Reported by syzkaller:
      
       BUG: unable to handle kernel NULL pointer dereference at 00000000000001c8
       PGD 80000003ec4da067 P4D 80000003ec4da067 PUD 3f7bfa067 PMD 0
       Oops: 0000 [#1] PREEMPT SMP PTI
       CPU: 7 PID: 5059 Comm: debug Tainted: G           OE     4.19.0-rc5 #16
       RIP: 0010:__lock_acquire+0x1a6/0x1990
       Call Trace:
        lock_acquire+0xdb/0x210
        _raw_spin_lock+0x38/0x70
        kvm_ioapic_scan_entry+0x3e/0x110 [kvm]
        vcpu_enter_guest+0x167e/0x1910 [kvm]
        kvm_arch_vcpu_ioctl_run+0x35c/0x610 [kvm]
        kvm_vcpu_ioctl+0x3e9/0x6d0 [kvm]
        do_vfs_ioctl+0xa5/0x690
        ksys_ioctl+0x6d/0x80
        __x64_sys_ioctl+0x1a/0x20
        do_syscall_64+0x83/0x6e0
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The reason is that the testcase writes hyperv synic HV_X64_MSR_SINT6 msr
      and triggers scan ioapic logic to load synic vectors into EOI exit bitmap.
      However, the irqchip is not initialized by this simple testcase, so the
      ioapic/apic objects should not be accessed.
      This can be triggered by the following program:
      
          #define _GNU_SOURCE
      
          #include <endian.h>
          #include <stdint.h>
          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>
          #include <sys/syscall.h>
          #include <sys/types.h>
          #include <unistd.h>
      
          uint64_t r[3] = {0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff};
      
          int main(void)
          {
          	syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
          	long res = 0;
          	memcpy((void*)0x20000040, "/dev/kvm", 9);
          	res = syscall(__NR_openat, 0xffffffffffffff9c, 0x20000040, 0, 0);
          	if (res != -1)
          		r[0] = res;
          	res = syscall(__NR_ioctl, r[0], 0xae01, 0);
          	if (res != -1)
          		r[1] = res;
          	res = syscall(__NR_ioctl, r[1], 0xae41, 0);
          	if (res != -1)
          		r[2] = res;
          	memcpy(
          			(void*)0x20000080,
          			"\x01\x00\x00\x00\x00\x5b\x61\xbb\x96\x00\x00\x40\x00\x00\x00\x00\x01\x00"
          			"\x08\x00\x00\x00\x00\x00\x0b\x77\xd1\x78\x4d\xd8\x3a\xed\xb1\x5c\x2e\x43"
          			"\xaa\x43\x39\xd6\xff\xf5\xf0\xa8\x98\xf2\x3e\x37\x29\x89\xde\x88\xc6\x33"
          			"\xfc\x2a\xdb\xb7\xe1\x4c\xac\x28\x61\x7b\x9c\xa9\xbc\x0d\xa0\x63\xfe\xfe"
          			"\xe8\x75\xde\xdd\x19\x38\xdc\x34\xf5\xec\x05\xfd\xeb\x5d\xed\x2e\xaf\x22"
          			"\xfa\xab\xb7\xe4\x42\x67\xd0\xaf\x06\x1c\x6a\x35\x67\x10\x55\xcb",
          			106);
          	syscall(__NR_ioctl, r[2], 0x4008ae89, 0x20000080);
          	syscall(__NR_ioctl, r[2], 0xae80, 0);
          	return 0;
          }
      
      This patch fixes it by bailing out of the ioapic scan if the ioapic is not
      initialized in the kernel.
      Reported-by: Wei Wu <ww9210@gmail.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Wei Wu <ww9210@gmail.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      61c42d65
    • W
      KVM: LAPIC: Fix pv ipis use-before-initialization · ffb01e73
      Wanpeng Li committed
      commit 38ab012f109caf10f471db1adf284e620dd8d701 upstream.
      
      Reported by syzkaller:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
       PGD 800000040410c067 P4D 800000040410c067 PUD 40410d067 PMD 0
       Oops: 0000 [#1] PREEMPT SMP PTI
       CPU: 3 PID: 2567 Comm: poc Tainted: G           OE     4.19.0-rc5 #16
       RIP: 0010:kvm_pv_send_ipi+0x94/0x350 [kvm]
       Call Trace:
        kvm_emulate_hypercall+0x3cc/0x700 [kvm]
        handle_vmcall+0xe/0x10 [kvm_intel]
        vmx_handle_exit+0xc1/0x11b0 [kvm_intel]
        vcpu_enter_guest+0x9fb/0x1910 [kvm]
        kvm_arch_vcpu_ioctl_run+0x35c/0x610 [kvm]
        kvm_vcpu_ioctl+0x3e9/0x6d0 [kvm]
        do_vfs_ioctl+0xa5/0x690
        ksys_ioctl+0x6d/0x80
        __x64_sys_ioctl+0x1a/0x20
        do_syscall_64+0x83/0x6e0
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The reason is that the apic map has not yet been initialized when the
      testcase triggers the pv_send_ipi interface via vmcall, so a NULL
      kvm->arch.apic_map is dereferenced. This patch fixes it by checking whether
      the apic map is NULL and bailing out immediately if that is the case.
      
      Fixes: 4180bf1b (KVM: X86: Implement "send IPI" hypercall)
      Reported-by: Wei Wu <ww9210@gmail.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Wei Wu <ww9210@gmail.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ffb01e73
    • L
      KVM: x86: Fix kernel info-leak in KVM_HC_CLOCK_PAIRING hypercall · 6d772df4
      Liran Alon committed
      commit bcbfbd8e upstream.
      
      kvm_pv_clock_pairing() allocates local var
      "struct kvm_clock_pairing clock_pairing" on stack and initializes
      all its fields besides padding (clock_pairing.pad[]).
      
      Because clock_pairing var is written completely (including padding)
      to guest memory, failure to init struct padding results in kernel
      info-leak.
      
      Fix the issue by making sure to also init the padding with zeroes.
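
      A minimal userspace sketch of the idea, assuming a stand-in structure:
      clock_pairing_sketch and fill_clock_pairing() are hypothetical names, and
      only the pad[] member is taken from the description above. Zeroing the
      whole object before filling it in is one way to make sure no stale stack
      bytes reach the copy; it is a variant of the approach, not the upstream
      diff.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct kvm_clock_pairing; only pad[] is from the text. */
struct clock_pairing_sketch {
    int64_t  sec;
    int64_t  nsec;
    uint64_t tsc;
    uint32_t flags;
    uint32_t pad[9];
};

/*
 * Fill the structure that will later be copied out in full.
 * The memset() clears pad[] (and any compiler-inserted padding) first,
 * so bytes the caller never assigns cannot carry old stack contents.
 */
static void fill_clock_pairing(struct clock_pairing_sketch *cp,
                               int64_t sec, int64_t nsec,
                               uint64_t tsc, uint32_t flags)
{
    memset(cp, 0, sizeof(*cp));
    cp->sec = sec;
    cp->nsec = nsec;
    cp->tsc = tsc;
    cp->flags = flags;
}
```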
      
      Fixes: 55dd00a7 ("KVM: x86: add KVM_HC_CLOCK_PAIRING hypercall")
      Reported-by: syzbot+a8ef68d71211ba264f56@syzkaller.appspotmail.com
      Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6d772df4
    • L
      KVM: nVMX/nSVM: Fix bug which sets vcpu->arch.tsc_offset to L1 tsc_offset · 76c8476c
      Leonid Shatz committed
      commit 326e7425 upstream.
      
      Since commit e79f245d ("X86/KVM: Properly update 'tsc_offset' to
      represent the running guest"), the meaning of vcpu->arch.tsc_offset has
      changed: it always reflects the tsc_offset value set in the active VMCS,
      regardless of whether the vCPU is currently running L1 or L2.
      
      However, the above-mentioned commit failed to also change
      kvm_vcpu_write_tsc_offset() to set vcpu->arch.tsc_offset correctly.
      vmx_write_tsc_offset() may set the tsc_offset value in the active VMCS
      to the given offset parameter *plus vmcs12->tsc_offset*, whereas
      kvm_vcpu_write_tsc_offset() just sets vcpu->arch.tsc_offset to the given
      offset parameter, without taking into account the possible addition of
      vmcs12->tsc_offset. (The same is true for the SVM case.)
      
      Fix this issue by changing kvm_x86_ops->write_tsc_offset() to return the
      tsc_offset actually set in the active VMCS, and modify
      kvm_vcpu_write_tsc_offset() to store the returned value in
      vcpu->arch.tsc_offset.
      In addition, rename the write_tsc_offset() callback to write_l1_tsc_offset()
      to make it clear that it is meant to set the L1 TSC offset.
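
      The shape of the fix can be sketched as below. The structure and both
      function bodies are simplified, hypothetical stand-ins written for
      illustration; only the idea (the callback returns the offset it really
      programmed, and the caller stores that value) comes from the description
      above.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified vCPU state. */
struct vcpu_sketch {
    int      in_l2;          /* non-zero while the nested (L2) guest runs */
    uint64_t l2_tsc_offset;  /* stands in for vmcs12->tsc_offset */
    uint64_t hw_tsc_offset;  /* value programmed into the active VMCS */
    uint64_t tsc_offset;     /* stands in for vcpu->arch.tsc_offset */
};

/*
 * Program the L1 offset and return what was actually written to hardware,
 * which includes the L2 offset while the nested guest is running.
 */
static uint64_t write_l1_tsc_offset(struct vcpu_sketch *v, uint64_t l1_offset)
{
    uint64_t active = l1_offset;

    if (v->in_l2)
        active += v->l2_tsc_offset;
    v->hw_tsc_offset = active;
    return active;
}

/* Cache the returned value rather than the raw parameter. */
static void vcpu_write_tsc_offset(struct vcpu_sketch *v, uint64_t l1_offset)
{
    v->tsc_offset = write_l1_tsc_offset(v, l1_offset);
}
```

      With the pre-fix behaviour (storing l1_offset directly), tsc_offset and
      hw_tsc_offset would disagree whenever in_l2 is set.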
      
      Fixes: e79f245d ("X86/KVM: Properly update 'tsc_offset' to represent the running guest")
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
      Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Signed-off-by: Leonid Shatz <leonid.shatz@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      76c8476c
    • J
      kvm: svm: Ensure an IBPB on all affected CPUs when freeing a vmcb · b8b0c871
      Jim Mattson committed
      commit fd65d314 upstream.
      
      Previously, we only called indirect_branch_prediction_barrier on the
      logical CPU that freed a vmcb. This function should be called on all
      logical CPUs that last loaded the vmcb in question.
      
      Fixes: 15d45071 ("KVM/x86: Add IBPB support")
      Reported-by: Neel Natu <neelnatu@google.com>
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b8b0c871
    • J
      kvm: mmu: Fix race in emulated page table writes · 471aca57
      Junaid Shahid committed
      commit 0e0fee5c539b61fdd098332e0e2cc375d9073706 upstream.
      
      When a guest page table is updated via an emulated write,
      kvm_mmu_pte_write() is called to update the shadow PTE using the just
      written guest PTE value. But if two emulated guest PTE writes happened
      concurrently, it is possible that the guest PTE and the shadow PTE end
      up being out of sync. Emulated writes do not mark the shadow page as
      unsync-ed, so this inconsistency will not be resolved even by a guest TLB
      flush (unless the page was marked as unsync-ed at some other point).
      
      This is fixed by re-reading the current value of the guest PTE after the
      MMU lock has been acquired instead of just using the value that was
      written prior to calling kvm_mmu_pte_write().
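
      The re-read-under-lock pattern can be illustrated with a small,
      self-contained C11 sketch. Everything here is an assumption made for the
      example: mmu_sketch, its spinlock built on atomic_flag, and
      emulated_pte_write() are hypothetical and only model the ordering, not
      the KVM MMU.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model: one guest PTE, one shadow PTE, one lock. */
struct mmu_sketch {
    atomic_flag      lock;        /* stands in for the MMU lock */
    _Atomic uint64_t guest_pte;   /* written by emulated guest stores */
    uint64_t         shadow_pte;  /* only touched with the lock held */
};

static void mmu_init(struct mmu_sketch *m)
{
    atomic_flag_clear(&m->lock);
    atomic_init(&m->guest_pte, 0);
    m->shadow_pte = 0;
}

static void mmu_lock(struct mmu_sketch *m)
{
    while (atomic_flag_test_and_set(&m->lock))
        ; /* spin until the holder releases the lock */
}

static void mmu_unlock(struct mmu_sketch *m)
{
    atomic_flag_clear(&m->lock);
}

static void emulated_pte_write(struct mmu_sketch *m, uint64_t new_pte)
{
    /* The emulated guest write itself happens outside the lock. */
    atomic_store(&m->guest_pte, new_pte);

    mmu_lock(m);
    /*
     * Re-read the guest PTE instead of reusing new_pte: a concurrent
     * writer may have stored a newer value before we got the lock.
     */
    m->shadow_pte = atomic_load(&m->guest_pte);
    mmu_unlock(m);
}
```

      Whichever writer takes the lock last copies the latest guest value, so
      the shadow entry cannot end up older than the guest entry.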
      Signed-off-by: Junaid Shahid <junaids@google.com>
      Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      471aca57
    • A
      userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas · 34b7a7cc
      Andrea Arcangeli committed
      commit 29ec9066 upstream.
      
      After the VMA to register the uffd onto is found, check that it has
      VM_MAYWRITE set before allowing registration.  This way we inherit all
      common code checks before allowing to fill file holes in shmem and
      hugetlbfs with UFFDIO_COPY.
      
      The userfaultfd memory model is not applicable for readonly files unless
      it's a MAP_PRIVATE.
      
      Link: http://lkml.kernel.org/r/20181126173452.26955-4-aarcange@redhat.com
      Fixes: ff62a342 ("hugetlb: implement memfd sealing")
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: Hugh Dickins <hughd@google.com>
      Reported-by: Jann Horn <jannh@google.com>
      Fixes: 4c27fe4c ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
      Cc: <stable@vger.kernel.org>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      34b7a7cc
    • T
      x86/speculation: Provide IBPB always command line options · 9f3baace
      Thomas Gleixner committed
      commit 55a974021ec952ee460dc31ca08722158639de72 upstream
      
      Provide the possibility to enable IBPB always in combination with 'prctl'
      and 'seccomp'.
      
      Add the extra command line options and rework the IBPB selection to
      evaluate the command instead of the mode selected by the STIBP switch case.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185006.144047038@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9f3baace
    • T
      x86/speculation: Add seccomp Spectre v2 user space protection mode · d1ec2354
      Thomas Gleixner committed
      commit 6b3e64c237c072797a9ec918654a60e3a46488e2 upstream
      
      If 'prctl' mode of user space protection from spectre v2 is selected
      on the kernel command-line, STIBP and IBPB are applied on tasks which
      restrict their indirect branch speculation via prctl.
      
      SECCOMP enables the SSBD mitigation for sandboxed tasks already, so it
      makes sense to prevent spectre v2 user space to user space attacks as
      well.
      
      The Intel mitigation guide documents how STIBP works:
          
         Setting bit 1 (STIBP) of the IA32_SPEC_CTRL MSR on a logical processor
         prevents the predicted targets of indirect branches on any logical
         processor of that core from being controlled by software that executes
         (or executed previously) on another logical processor of the same core.
      
      Ergo setting STIBP protects the task itself from being attacked from a task
      running on a different hyper-thread and protects the tasks running on
      different hyper-threads from being attacked.
      
      While the document suggests that the branch predictors are shielded between
      the logical processors, the observed performance regressions suggest that
      STIBP simply disables the branch predictor more or less completely. Of
      course the document wording is vague, but the fact that there is also no
      requirement for issuing IBPB when STIBP is used points clearly in that
      direction. The kernel still issues IBPB even when STIBP is used until Intel
      clarifies the whole mechanism.
      
      IBPB is issued when the task switches out, so malicious sandbox code cannot
      mistrain the branch predictor for the next user space task on the same
      logical processor.
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185006.051663132@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d1ec2354
    • T
      x86/speculation: Enable prctl mode for spectre_v2_user · 7b62ef14
      Thomas Gleixner committed
      commit 7cc765a67d8e04ef7d772425ca5a2a1e2b894c15 upstream
      
      Now that all prerequisites are in place:
      
       - Add the prctl command line option
      
       - Default the 'auto' mode to 'prctl'
      
       - When SMT state changes, update the static key which controls the
         conditional STIBP evaluation on context switch.
      
       - At init update the static key which controls the conditional IBPB
         evaluation on context switch.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.958421388@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7b62ef14
    • T
      x86/speculation: Add prctl() control for indirect branch speculation · 238ba6e7
      Thomas Gleixner committed
      commit 9137bb27e60e554dab694eafa4cca241fa3a694f upstream
      
      Add the PR_SPEC_INDIRECT_BRANCH option for the PR_GET_SPECULATION_CTRL and
      PR_SET_SPECULATION_CTRL prctls to allow fine grained per task control of
      indirect branch speculation via STIBP and IBPB.
      
      Invocations:
       Check indirect branch speculation status with
       - prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, 0, 0, 0);
      
       Enable indirect branch speculation with
       - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_ENABLE, 0, 0);
      
       Disable indirect branch speculation with
       - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_DISABLE, 0, 0);
      
       Force disable indirect branch speculation with
       - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_FORCE_DISABLE, 0, 0);
      
      See Documentation/userspace-api/spec_ctrl.rst.
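
      As a rough userspace illustration of the "check status" invocation above,
      the sketch below queries the state and maps the returned bits to a label.
      The helper names spec_state_name() and indirect_branch_spec_state() are
      hypothetical; the #ifndef fallbacks cover older headers and are assumed
      to match the kernel UAPI values.

```c
#include <sys/prctl.h>

/* Fallback definitions for older userspace headers (assumed UAPI values). */
#ifndef PR_GET_SPECULATION_CTRL
#define PR_GET_SPECULATION_CTRL 52
#endif
#ifndef PR_SPEC_INDIRECT_BRANCH
#define PR_SPEC_INDIRECT_BRANCH 1
#endif
#ifndef PR_SPEC_PRCTL
#define PR_SPEC_NOT_AFFECTED  0
#define PR_SPEC_PRCTL         (1UL << 0)
#define PR_SPEC_ENABLE        (1UL << 1)
#define PR_SPEC_DISABLE       (1UL << 2)
#define PR_SPEC_FORCE_DISABLE (1UL << 3)
#endif

/* Map a PR_GET_SPECULATION_CTRL result to a short human-readable label. */
static const char *spec_state_name(long state)
{
    if (state < 0)
        return "unavailable";      /* prctl failed, e.g. unsupported kernel */
    if (state == PR_SPEC_NOT_AFFECTED)
        return "not affected";
    if (state & PR_SPEC_FORCE_DISABLE)
        return "force-disabled";
    if (state & PR_SPEC_DISABLE)
        return "disabled";
    if (state & PR_SPEC_ENABLE)
        return "enabled";
    return "unknown";
}

/* Query the calling task's indirect branch speculation state. */
static long indirect_branch_spec_state(void)
{
    return prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, 0, 0, 0);
}
```

      A caller could print spec_state_name(indirect_branch_spec_state()); on
      kernels without this option the prctl fails and the label is
      "unavailable".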
      Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.866780996@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      238ba6e7
    • T
      x86/speculation: Prepare arch_smt_update() for PRCTL mode · f67fafb8
      Thomas Gleixner committed
      commit 6893a959d7fdebbab5f5aa112c277d5a44435ba1 upstream
      
      The upcoming fine grained per task STIBP control needs to be updated on CPU
      hotplug as well.
      
      Split out the code which controls the strict mode so the prctl control code
      can be added later. Mark the SMP function call argument __unused while at it.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.759457117@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f67fafb8
    • T
      x86/speculation: Prevent stale SPEC_CTRL msr content · e8412401
      Thomas Gleixner committed
      commit 6d991ba509ebcfcc908e009d1db51972a4f7a064 upstream
      
      The seccomp speculation control operates on all tasks of a process, but
      only the current task of a process can update the MSR immediately. For the
      other threads the update is deferred to the next context switch.
      
      This creates the following situation with Process A and B:
      
      Process A task 2 and Process B task 1 are pinned on CPU1. Process A task 2
      does not have the speculation control TIF bit set. Process B task 1 has the
      speculation control TIF bit set.
      
      CPU0					CPU1
      					MSR bit is set
      					ProcB.T1 schedules out
      					ProcA.T2 schedules in
      					MSR bit is cleared
      ProcA.T1
        seccomp_update()
        set TIF bit on ProcA.T2
      					ProcB.T1 schedules in
      					MSR is not updated  <-- FAIL
      
      This happens because the context switch code tries to avoid the MSR update
      if the speculation control TIF bits of the incoming and the outgoing task
      are the same. In the worst case ProcB.T1 and ProcA.T2 are the only tasks
      scheduling back and forth on CPU1, which keeps the MSR stale forever.
      
      In theory this could be remedied by IPIs, but chasing the remote task which
      could be migrated is complex and full of races.
      
      The straightforward solution is to avoid the asynchronous update of the TIF
      bit and defer it to the next context switch. The speculation control state
      is stored in task_struct::atomic_flags by the prctl and seccomp updates
      already.
      
      Add a new TIF_SPEC_FORCE_UPDATE bit and set this after updating the
      atomic_flags. Check the bit on context switch and force a synchronous
      update of the speculation control if set. Use the same mechanism for
      updating the current task.
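
      The deferred-update scheme can be modelled in a few lines of C. This is a
      toy simulation under stated assumptions: task_sketch, cpu_sketch,
      spec_ctrl_request() and context_switch() are hypothetical names, one
      boolean stands in for the MSR bit, and the current-task path is not
      modelled.

```c
#include <stdbool.h>

/* Hypothetical per-task state. */
struct task_sketch {
    bool tif_spec;      /* speculation-control TIF bit compared on switch */
    bool want_spec;     /* desired state, stands in for atomic_flags */
    bool force_update;  /* stands in for TIF_SPEC_FORCE_UPDATE */
};

/* Hypothetical per-CPU state: the one MSR bit of interest. */
struct cpu_sketch {
    bool msr_bit;
};

/*
 * Update requested from another CPU: record the desired state and flag
 * the task, but leave tif_spec alone until the task is switched in.
 */
static void spec_ctrl_request(struct task_sketch *t, bool on)
{
    t->want_spec = on;
    t->force_update = true;
}

static void context_switch(struct cpu_sketch *cpu,
                           const struct task_sketch *prev,
                           struct task_sketch *next)
{
    /* Synchronously fold the deferred state into the TIF bit. */
    if (next->force_update) {
        next->tif_spec = next->want_spec;
        next->force_update = false;
    }
    /* The MSR write is still skipped when both TIF bits agree. */
    if (prev->tif_spec != next->tif_spec)
        cpu->msr_bit = next->tif_spec;
}
```

      Replaying the Process A/B scenario with this model: because ProcA.T2's
      TIF bit stays clear until it is next scheduled in, the switch back to
      ProcB.T1 sees differing bits and rewrites the MSR instead of leaving it
      stale.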
      Reported-by: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1811272247140.1875@nanos.tec.linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e8412401
    • T
      x86/speculation: Split out TIF update · 59028be1
      Thomas Gleixner committed
      commit e6da8bb6f9abb2628381904b24163c770e630bac upstream
      
      The update of the TIF_SSBD flag and the conditional speculation control MSR
      update is done in the ssb_prctl_set() function directly. The upcoming prctl
      support for controlling indirect branch speculation via STIBP needs the
      same mechanism.
      
      Split the code out and make it reusable. Reword the comment about updates
      for other tasks.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.652305076@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      59028be1