1. 06 Dec, 2018 40 commits
    • sh/function_graph: Simplify with function_graph_enter() · 56c1dd92
      Steven Rostedt (VMware) committed
      commit bc715ee4dbc5db462c59b9cfba92d31b3274fe3a upstream.
      
      The function_graph_enter() function does the work of calling the function
      graph hook function and the management of the shadow stack, simplifying the
      work done in the architecture dependent prepare_ftrace_return().
      
      Have superh use the new code, and remove the shadow stack management as well as
      having to set up the trace structure.
      
      This is needed to prepare for a fix of a design bug on how the curr_ret_stack
      is used.
      
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: linux-sh@vger.kernel.org
      Cc: stable@kernel.org
      Fixes: 03274a3f ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc/function_graph: Simplify with function_graph_enter() · 5478648e
      Steven Rostedt (VMware) committed
      commit fe60522ec60082a1dd735691b82c64f65d4ad15e upstream.
      
      The function_graph_enter() function does the work of calling the function
      graph hook function and the management of the shadow stack, simplifying the
      work done in the architecture dependent prepare_ftrace_return().
      
      Have powerpc use the new code, and remove the shadow stack management as well as
      having to set up the trace structure.
      
      This is needed to prepare for a fix of a design bug on how the curr_ret_stack
      is used.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: stable@kernel.org
      Fixes: 03274a3f ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nds32: function_graph: Simplify with function_graph_enter() · 25ac02d0
      Steven Rostedt (VMware) committed
      commit d48ebb24866edea2c35be02a878f25bc65529370 upstream.
      
      The function_graph_enter() function does the work of calling the function
      graph hook function and the management of the shadow stack, simplifying the
      work done in the architecture dependent prepare_ftrace_return().
      
      Have nds32 use the new code, and remove the shadow stack management as well as
      having to set up the trace structure.
      
      This is needed to prepare for a fix of a design bug on how the curr_ret_stack
      is used.
      
      Cc: Greentime Hu <greentime@andestech.com>
      Cc: stable@kernel.org
      Fixes: 03274a3f ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/function_graph: Simplify with function_graph_enter() · 21761499
      Steven Rostedt (VMware) committed
      commit 07f7175b43827640d1e69c9eded89aa089a234b4 upstream.
      
      The function_graph_enter() function does the work of calling the function
      graph hook function and the management of the shadow stack, simplifying the
      work done in the architecture dependent prepare_ftrace_return().
      
      Have x86 use the new code, and remove the shadow stack management as well as
      having to set up the trace structure.
      
      This is needed to prepare for a fix of a design bug on how the curr_ret_stack
      is used.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Cc: stable@kernel.org
      Fixes: 03274a3f ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • microblaze: function_graph: Simplify with function_graph_enter() · e7deeabe
      Steven Rostedt (VMware) committed
      commit 556763e5a500d71879d632867b75826551acd49c upstream.
      
      The function_graph_enter() function does the work of calling the function
      graph hook function and the management of the shadow stack, simplifying the
      work done in the architecture dependent prepare_ftrace_return().
      
      Have microblaze use the new code, and remove the shadow stack management as well as
      having to set up the trace structure.
      
      This is needed to prepare for a fix of a design bug on how the curr_ret_stack
      is used.
      
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: stable@kernel.org
      Fixes: 03274a3f ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ARM: function_graph: Simplify with function_graph_enter() · fbbee0cf
      Steven Rostedt (VMware) committed
      commit f1f5b14afd7cce39e6a9b25c685e1ea34c231096 upstream.
      
      The function_graph_enter() function does the work of calling the function
      graph hook function and the management of the shadow stack, simplifying the
      work done in the architecture dependent prepare_ftrace_return().
      
      Have ARM use the new code, and remove the shadow stack management as well as
      having to set up the trace structure.
      
      This is needed to prepare for a fix of a design bug on how the curr_ret_stack
      is used.
      
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: stable@kernel.org
      Fixes: 03274a3f ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • function_graph: Create function_graph_enter() to consolidate architecture code · 67d7bec3
      Steven Rostedt (VMware) committed
      commit 8114865ff82e200b383e46821c25cb0625b842b5 upstream.
      
      Currently all the architectures do basically the same thing in preparing the
      function graph tracer on entry to a function. This code can be pulled into a
      generic location and then this will allow the function graph tracer to be
      fixed, as well as extended.
      
      Create a new function graph helper function_graph_enter() that will call the
      hook function (ftrace_graph_entry) and the shadow stack operation
      (ftrace_push_return_trace), and remove the need of the architecture code to
      manage the shadow stack.
      
      This is needed to prepare for a fix of a design bug on how the curr_ret_stack
      is used.
      
      Cc: stable@kernel.org
      Fixes: 03274a3f ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
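The consolidation described in the function_graph_enter() commit can be pictured with a small user-space model. This is only a sketch under stated assumptions: `task_model`, `graph_entry_hook`, `push_return_trace_model()`, `graph_enter_model()` and the tiny `RET_STACK_DEPTH` are hypothetical names invented here, and the order of the two steps (hook first, then shadow-stack push) is an assumption, not something the commit message specifies. It is not the kernel implementation.

```c
#include <errno.h>

#define RET_STACK_DEPTH 4   /* hypothetical depth, kept tiny for the example */

struct graph_ent {
    unsigned long func;     /* address of the traced function */
    int depth;              /* call depth reported to the hook */
};

struct ret_entry {
    unsigned long ret;      /* original return address */
    unsigned long func;
};

struct task_model {
    int curr_ret_stack;     /* index of the top entry, -1 when empty */
    struct ret_entry ret_stack[RET_STACK_DEPTH];
};

/* Stand-in for the tracer's entry hook: nonzero means "trace this call". */
typedef int (*graph_entry_hook)(struct graph_ent *trace);

/* Shadow-stack push: remember where the traced function should return. */
int push_return_trace_model(struct task_model *t,
                            unsigned long ret, unsigned long func)
{
    if (t->curr_ret_stack + 1 >= RET_STACK_DEPTH)
        return -EBUSY;
    t->curr_ret_stack++;
    t->ret_stack[t->curr_ret_stack].ret = ret;
    t->ret_stack[t->curr_ret_stack].func = func;
    return 0;
}

/*
 * Generic entry helper: fills in the trace structure, calls the hook and
 * manages the shadow stack, so per-architecture code only has to call it.
 */
int graph_enter_model(struct task_model *t, graph_entry_hook hook,
                      unsigned long ret, unsigned long func)
{
    struct graph_ent trace;

    trace.func = func;
    trace.depth = t->curr_ret_stack + 1;

    if (!hook(&trace))
        return -EBUSY;      /* tracer declined this call */
    if (push_return_trace_model(t, ret, func))
        return -EBUSY;      /* shadow stack is full */
    return 0;
}

/* Two trivial hooks for exercising the helper. */
int hook_accept(struct graph_ent *trace) { (void)trace; return 1; }
int hook_reject(struct graph_ent *trace) { (void)trace; return 0; }
```

In this model an architecture's entry stub reduces to one call to `graph_enter_model()` and a check of its return value, which is the kind of simplification the per-architecture commits above describe.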
    • ALSA: hda/realtek - Add auto-mute quirk for HP Spectre x360 laptop · b72fc1c3
      Girija Kumar Kasinadhuni committed
      commit e8ed64b08eddc05043e556832616a478bbe4bb00 upstream.
      
      This device makes a loud buzzing sound when a headphone is inserted while
      playing audio at full volume through the speaker.
      
      Fixes: bbf8ff6b ("ALSA: hda/realtek - Fixup for HP x360 laptops with B&O speakers")
      Signed-off-by: Girija Kumar Kasinadhuni <gkumar@neverware.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: hda/realtek - fix the pop noise on headphone for lenovo laptops · dcd51305
      Hui Wang committed
      commit c4cfcf6f4297c9256b53790bacbbbd6901fef468 upstream.
      
      We have several Lenovo laptops with the ALC285 codec. When playing
      sound via headphone, we can hear click/pop noise in the headphone.
      If we let the headphone share the DAC of NID 0x2 with the speaker,
      the noise disappears.
      
      The Lenovo laptops here include P52, P72, X1 yoda2 and X1 carbon.
      
      I tried setting preferred_dacs and override_conn, but neither of
      them worked. Thanks to Kailang, who told me to invalidate NID 0x3
      through override_wcaps.
      
      BugLink: https://bugs.launchpad.net/bugs/1805079
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Kailang Yang <kailang@realtek.com>
      Signed-off-by: Hui Wang <hui.wang@canonical.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: hda/realtek - fix headset mic detection for MSI MS-B171 · 52484115
      Anisse Astier committed
      commit 8cd65271f8e545ddeed10ecc2e417936bdff168e upstream.
      
      MSI Cubi N 8GL (MS-B171) needs the same fixup as its older model, the
      MS-B120, in order for the headset mic to be properly detected.
      
      They both use a single 3-way jack for both mic and headset with an
      ALC283 codec, with the same pins used.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Anisse Astier <anisse@astier.eu>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: hda/realtek - Support ALC300 · 094c0089
      Kailang Yang committed
      commit 1078bef0cd9291355a20369b21cd823026ab8eaa upstream.
      
      This patch will enable ALC300.
      
      [ It's almost equivalent with other ALC269-compatible ones, and
        apparently has no loopback mixer -- tiwai ]

      Signed-off-by: Kailang Yang <kailang@realtek.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: hda: Add ASRock N68C-S UCC the power_save blacklist · bb951d8d
      Hans de Goede committed
      commit 39070a98d668db8fbaa2a6a6752f732cbcbb14b1 upstream.
      
      Power-saving is causing plops on audio start/stop on the built-in audio
      of the nForce 430 based ASRock N68C-S UCC motherboard, add this model to
      the power_save blacklist.
      
      BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1525104
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: sparc: Fix invalid snd_free_pages() at error path · 15c5fb33
      Takashi Iwai committed
      commit 9a20332ab373b1f8f947e0a9c923652b32dab031 upstream.
      
      Some spurious calls of snd_free_pages() have been overlooked and
      remain in the error paths of sparc cs4231 driver code.  Since
      runtime->dma_area is managed by the PCM core helper, we shouldn't
      release manually.
      
      Drop the superfluous calls.

      Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: control: Fix race between adding and removing a user element · d8a2dca0
      Takashi Iwai committed
      commit e1a7bfe3807974e66f971f2589d4e0197ec0fced upstream.
      
      The procedure for adding a user control element leaves a window open
      for a race against the concurrent removal of a user element.  This was
      caught by syzkaller, hitting a KASAN use-after-free error.
      
      This patch addresses the bug by wrapping the whole procedure to add a
      user control element with the card->controls_rwsem, instead of only
      around the increment of card->user_ctl_count.
      
      This required a slight code refactoring, too.  The function
      snd_ctl_add() is split into two parts: a core function to add the
      control element and a part calling it.  The former is called from the
      function for adding a user control element inside the controls_rwsem.
      
      One change to be noted is that snd_ctl_notify() for adding a control
      element gets called inside the controls_rwsem as well while it was
      called outside the rwsem.  But this should be OK, as snd_ctl_notify()
      takes another (finer) rwlock instead of rwsem, and the call of
      snd_ctl_notify() inside rwsem is already done in another code path.
      
      Reported-by: syzbot+dc09047bce3820621ba2@syzkaller.appspotmail.com
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: ac97: Fix incorrect bit shift at AC97-SPSA control write · b77c35ef
      Takashi Iwai committed
      commit 7194eda1ba0872d917faf3b322540b4f57f11ba5 upstream.
      
      The function snd_ac97_put_spsa() gets the bit shift value from the
      associated private_value, but it extracts too much; the current code
      extracts 8 bit values in bits 8-15, but this is a combination of two
      nibbles (bits 8-11 and bits 12-15) for left and right shifts.
      Due to the incorrect bits extraction, the actual shift may go beyond
      the 32bit value, as spotted recently by UBSAN check:
       UBSAN: Undefined behaviour in sound/pci/ac97/ac97_codec.c:836:7
       shift exponent 68 is too large for 32-bit type 'int'
      
      This patch fixes the shift value extraction by masking properly
      with 0x0f instead of 0xff.

      Reported-and-tested-by: Meelis Roos <mroos@linux.ee>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
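The AC97-SPSA shift bug can be reproduced with a few lines of arithmetic. The sketch below assumes a packing in which bits 8-11 of `private_value` hold one shift and bits 12-15 hold the other, as the commit message describes. The function names `spsa_shift_buggy()` and `spsa_shift_fixed()` are hypothetical and are not the driver's own code.

```c
#include <stdint.h>

/*
 * Assumed layout, taken from the commit message: bits 8-11 and bits 12-15
 * of private_value each hold a 4-bit shift (left and right channel).
 */

/* Old behaviour: an 8-bit mask merges both nibbles into one number. */
unsigned int spsa_shift_buggy(uint32_t private_value)
{
    return (private_value >> 8) & 0xff;
}

/* Fixed behaviour: a 4-bit mask keeps only the first nibble. */
unsigned int spsa_shift_fixed(uint32_t private_value)
{
    return (private_value >> 8) & 0x0f;
}
```

With both nibbles set to 4, the buggy extraction yields 0x44, which is 68 in decimal and matches the "shift exponent 68" in the UBSAN report. The fixed extraction yields 4.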
    • ALSA: wss: Fix invalid snd_free_pages() at error path · e83c4405
      Takashi Iwai committed
      commit 7b69154171b407844c273ab4c10b5f0ddcd6aa29 upstream.
      
      Some spurious calls of snd_free_pages() have been overlooked and
      remain in the error paths of wss driver code.  Since runtime->dma_area
      is managed by the PCM core helper, we shouldn't release manually.
      
      Drop the superfluous calls.

      Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fs: fix lost error code in dio_complete · adcd35a3
      Maximilian Heyne committed
      commit 41e817bca3acd3980efe5dd7d28af0e6f4ab9247 upstream.
      
      commit e2592217 ("fs: simplify the
      generic_write_sync prototype") reworked callers of generic_write_sync(),
      and ended up dropping the error return for the directio path. Prior to
      that commit, in dio_complete(), an error would be bubbled up the stack,
      but after that commit, errors passed on to dio_complete were eaten up.
      
      This was reported on the list earlier, and a fix was proposed in
      https://lore.kernel.org/lkml/20160921141539.GA17898@infradead.org/, but
      never followed up with.  We recently hit this bug in our testing where
      fencing io errors, which were previously erroring out with EIO, were
      being returned as success operations after this commit.
      
      The fix proposed on the list earlier was a little short -- it would have
      still called generic_write_sync() in case `ret` already contained an
      error. This fix ensures generic_write_sync() is only called when there's
      no pending error in the write. Additionally, transferred is replaced
      with ret to bring this code in line with other callers.
      
      Fixes: e2592217 ("fs: simplify the generic_write_sync prototype")
      Reported-by: Ravi Nankani <rnankani@amazon.com>
      Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      CC: Torsten Mehlan <tomeh@amazon.de>
      CC: Uwe Dannowski <uwed@amazon.de>
      CC: Amit Shah <aams@amazon.de>
      CC: David Woodhouse <dwmw@amazon.co.uk>
      CC: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
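The lost-error pattern in the dio_complete commit can be modelled in user space. This is a sketch, not the kernel code: `write_sync_fn`, `finish_write_buggy()`, `finish_write_fixed()` and `sync_ok()` are hypothetical stand-ins. The sync callback is assumed to return the byte count on success, which is how the commit message implies generic_write_sync() behaves.

```c
#include <errno.h>
#include <sys/types.h>   /* ssize_t */

/* Stand-in for a write-sync step: returns count on success or -errno. */
typedef ssize_t (*write_sync_fn)(ssize_t count);

/*
 * Old behaviour: the sync result unconditionally replaces ret, so an
 * error already recorded in ret is overwritten by a successful sync.
 */
ssize_t finish_write_buggy(ssize_t ret, ssize_t transferred,
                           write_sync_fn sync)
{
    ret = sync(transferred);
    return ret;
}

/*
 * Fixed behaviour: sync only when the write itself succeeded, and pass
 * ret (not a separate "transferred" value) to the sync step.
 */
ssize_t finish_write_fixed(ssize_t ret, write_sync_fn sync)
{
    if (ret > 0)
        ret = sync(ret);
    return ret;
}

/* A sync step that always succeeds, for demonstration. */
ssize_t sync_ok(ssize_t count)
{
    return count;
}
```

Calling `finish_write_buggy(-EIO, 0, sync_ok)` reports success even though the write failed, while `finish_write_fixed(-EIO, sync_ok)` preserves the -EIO.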
    • perf/x86/intel: Disallow precise_ip on BTS events · 205af59e
      Jiri Olsa committed
      commit 472de49fdc53365c880ab81ae2b5cfdd83db0b06 upstream.
      
      Vince reported a crash in the BTS flush code when touching the callchain
      data, which was supposed to be initialized as an 'early' callchain,
      but intel_pmu_drain_bts_buffer() does not do that:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
        ...
        Call Trace:
         <IRQ>
         intel_pmu_drain_bts_buffer+0x151/0x220
         ? intel_get_event_constraints+0x219/0x360
         ? perf_assign_events+0xe2/0x2a0
         ? select_idle_sibling+0x22/0x3a0
         ? __update_load_avg_se+0x1ec/0x270
         ? enqueue_task_fair+0x377/0xdd0
         ? cpumask_next_and+0x19/0x20
         ? load_balance+0x134/0x950
         ? check_preempt_curr+0x7a/0x90
         ? ttwu_do_wakeup+0x19/0x140
         x86_pmu_stop+0x3b/0x90
         x86_pmu_del+0x57/0x160
         event_sched_out.isra.106+0x81/0x170
         group_sched_out.part.108+0x51/0xc0
         __perf_event_disable+0x7f/0x160
         event_function+0x8c/0xd0
         remote_function+0x3c/0x50
         flush_smp_call_function_queue+0x35/0xe0
         smp_call_function_single_interrupt+0x3a/0xd0
         call_function_single_interrupt+0xf/0x20
         </IRQ>
      
      It was triggered by a fuzzer but can be easily reproduced by:
      
        # perf record -e cpu/branch-instructions/pu -g -c 1
      
      Peter suggested not to allow branch tracing for precise events:
      
       > Now arguably, this is really stupid behaviour. Who in his right mind
       > wants callchain output on BTS entries. And even if they do, BTS +
       > precise_ip is nonsensical.
       >
       > So in my mind disallowing precise_ip on BTS would be the simplest fix.

      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Reported-by: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 6cbc304f ("perf/x86/intel: Fix unwind errors from PEBS entries (mk-II)")
      Link: http://lkml.kernel.org/r/20181121101612.16272-3-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf/x86/intel: Add generic branch tracing check to intel_pmu_has_bts() · be0e2e24
      Jiri Olsa committed
      commit 67266c1080ad56c31af72b9c18355fde8ccc124a upstream.
      
      Currently we check the branch tracing only by checking for the
      PERF_COUNT_HW_BRANCH_INSTRUCTIONS event of PERF_TYPE_HARDWARE
      type. But we can define the same event with the PERF_TYPE_RAW
      type.
      
      Change intel_pmu_has_bts() to check the event's final hw config
      value, so both HW types are covered.

      Add unlikely() to the intel_pmu_has_bts() condition calls, because
      it was used in the original code in intel_bts_constraints.

      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/20181121101612.16272-2-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf/x86/intel: Move branch tracing setup to the Intel-specific source file · ad65b548
      Jiri Olsa committed
      commit ed6101bbf6266ee83e620b19faa7c6ad56bb41ab upstream.
      
      Move the branch tracing setup into the Intel core object, as a separate
      intel_pmu_bts_config() function, because it is Intel-specific.

      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/20181121101612.16272-1-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/fpu: Disable bottom halves while loading FPU registers · 33448a8b
      Sebastian Andrzej Siewior committed
      commit 68239654acafe6aad5a3c1dc7237e60accfebc03 upstream.
      
      The sequence
      
        fpu->initialized = 1;		/* step A */
        preempt_disable();		/* step B */
        fpu__restore(fpu);
        preempt_enable();
      
      in __fpu__restore_sig() is racy in regard to a context switch.
      
      For 32bit frames, __fpu__restore_sig() prepares the FPU state within
      fpu->state. To ensure that a context switch (switch_fpu_prepare() in
      particular) does not modify fpu->state it uses fpu__drop() which sets
      fpu->initialized to 0.
      
      After fpu->initialized is cleared, the CPU's FPU state is not saved
      to fpu->state during a context switch. The new state is loaded via
      fpu__restore(). It gets loaded into fpu->state from userland and is
      checked for sanity. fpu->initialized is then set to 1 in order to avoid
      fpu__initialize() doing anything (overwrite the new state) which is part
      of fpu__restore().
      
      A context switch between step A and B above would save CPU's current FPU
      registers to fpu->state and overwrite the newly prepared state. This
      looks like a tiny race window but the Kernel Test Robot reported this
      back in 2016 while we had lazy FPU support. Borislav Petkov made the
      link between that report and another patch that has been posted. Since
      the removal of the lazy FPU support, this race goes unnoticed because
      the warning has been removed.
      
      Disable bottom halves around the restore sequence to avoid the race.
      Bottom halves need to be disabled because they are allowed to run (even
      with preemption disabled) and might invoke kernel_fpu_begin() by doing IPsec.
      
       [ bp: massage commit message a bit. ]

      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: kvm ML <kvm@vger.kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: stable@vger.kernel.org
      Cc: x86-ml <x86@kernel.org>
      Link: http://lkml.kernel.org/r/20181120102635.ddv3fvavxajjlfqk@linutronix.de
      Link: https://lkml.kernel.org/r/20160226074940.GA28911@pd.tnic
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/MCE/AMD: Fix the thresholding machinery initialization order · 00f91adf
      Borislav Petkov committed
      commit 60c8144afc287ef09ce8c1230c6aa972659ba1bb upstream.
      
      Currently, the code sets up the thresholding interrupt vector and only
      then goes about initializing the thresholding banks. This is wrong,
      because an early thresholding interrupt would cause a NULL pointer
      dereference when accessing those banks and prevent the machine from
      booting.
      
      Therefore, set the thresholding interrupt vector only *after* having
      initialized the banks successfully.
      
      Fixes: 18807ddb ("x86/mce/AMD: Reset Threshold Limit after logging error")
      Reported-by: Rafał Miłecki <rafal@milecki.pl>
      Reported-by: John Clemens <clemej@gmail.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Tested-by: Rafał Miłecki <rafal@milecki.pl>
      Tested-by: John Clemens <john@deater.net>
      Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
      Cc: linux-edac@vger.kernel.org
      Cc: stable@vger.kernel.org
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: x86@kernel.org
      Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
      Link: https://lkml.kernel.org/r/20181127101700.2964-1-zajec5@gmail.com
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=201291
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: dts: rockchip: Fix PCIe reset polarity for rk3399-puma-haikou. · 8af02415
      Christoph Muellner committed
      commit c1d91f86a1b4c9c05854d59c6a0abd5d0f75b849 upstream.
      
      This patch fixes the wrong polarity setting for the PCIe host driver's
      pre-reset pin for rk3399-puma-haikou. Without this patch link training
      will most likely fail.
      
      Fixes: 60fd9f72 ("arm64: dts: rockchip: add Haikou baseboard with RK3399-Q7 SoM")
      Cc: stable@vger.kernel.org
      Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
      Signed-off-by: Heiko Stuebner <heiko@sntech.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI: Fix incorrect value returned from pcie_get_speed_cap() · ab770216
      Mikulas Patocka committed
      commit f1f90e254e46e0a14220e4090041f68256fbe297 upstream.
      
      The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks.  We must mask
      the register and compare it against them.
      
      This fixes errors like this:
      
        amdgpu: [powerplay] failed to send message 261 ret is 0
      
      when a PCIe-v3 card is plugged into a PCIe-v1 slot, because the slot is
      being incorrectly reported as PCIe-v3 capable.
      
      6cf57be0, which appeared in v4.17, added pcie_get_speed_cap() with the
      incorrect test of PCI_EXP_LNKCAP_SLS as a bitmask.  5d9a6330, which
      appeared in v4.19, changed amdgpu to use pcie_get_speed_cap(), so the
      amdgpu bug reports below are regressions in v4.19.
      
      Fixes: 6cf57be0 ("PCI: Add pcie_get_speed_cap() to find max supported link speed")
      Fixes: 5d9a6330 ("drm/amdgpu: use pcie functions for link width and speed")
      Link: https://bugs.freedesktop.org/show_bug.cgi?id=108704
      Link: https://bugs.freedesktop.org/show_bug.cgi?id=108778
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      [bhelgaas: update comment, remove use of PCI_EXP_LNKCAP_SLS_8_0GB and
      PCI_EXP_LNKCAP_SLS_16_0GB since those should be covered by PCI_EXP_LNKCAP2,
      remove test of PCI_EXP_LNKCAP for zero, since that register is required]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org	# v4.17+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
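The "values, not bit masks" point in the pcie_get_speed_cap() commit can be shown with a small stand-alone sketch. The constants mirror, as an assumption, the field encoding of the Link Capabilities register (a 4-bit field where 1 means 2.5 GT/s, 2 means 5.0 GT/s, 3 means 8.0 GT/s). The names `LNKCAP_*`, `enum link_speed`, `speed_cap_buggy()` and `speed_cap_fixed()` are hypothetical. The sketch does not model the PCI_EXP_LNKCAP2 lookup mentioned in the maintainer's note.

```c
#include <stdint.h>

/* Assumed encoding of the Supported Link Speeds field (low 4 bits). */
#define LNKCAP_SLS        0x0000000fu   /* field mask */
#define LNKCAP_SLS_2_5GB  0x1u          /* field value, not a bit flag */
#define LNKCAP_SLS_5_0GB  0x2u
#define LNKCAP_SLS_8_0GB  0x3u

enum link_speed { SPEED_UNKNOWN, SPEED_2_5GT, SPEED_5_0GT, SPEED_8_0GT };

/* Buggy: treats each field value as if it were an independent bit mask. */
enum link_speed speed_cap_buggy(uint32_t lnkcap)
{
    if (lnkcap & LNKCAP_SLS_8_0GB)
        return SPEED_8_0GT;     /* 0x1 & 0x3 is nonzero, so v1 matches here */
    if (lnkcap & LNKCAP_SLS_5_0GB)
        return SPEED_5_0GT;
    if (lnkcap & LNKCAP_SLS_2_5GB)
        return SPEED_2_5GT;
    return SPEED_UNKNOWN;
}

/* Fixed: mask out the field first, then compare against each value. */
enum link_speed speed_cap_fixed(uint32_t lnkcap)
{
    switch (lnkcap & LNKCAP_SLS) {
    case LNKCAP_SLS_8_0GB:
        return SPEED_8_0GT;
    case LNKCAP_SLS_5_0GB:
        return SPEED_5_0GT;
    case LNKCAP_SLS_2_5GB:
        return SPEED_2_5GT;
    default:
        return SPEED_UNKNOWN;
    }
}
```

A register whose speed field is 1 (a 2.5 GT/s slot) is reported as 8.0 GT/s by the buggy version, which is consistent with the commit's description of a PCIe-v1 slot being reported as PCIe-v3 capable.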
    • PCI: dwc: Fix MSI-X EP framework address calculation bug · 1ce69ec3
      Gustavo Pimentel committed
      commit 15cb127e3c8f6232096d5dba6a5b4046bc292d70 upstream.
      
      Fix an error caused by 3-bit right rotation on offset address
      calculation of MSI-X table in dw_pcie_ep_raise_msix_irq().
      
      The initial testing code was setting by default the offset address of
      MSI-X table to zero, so that even with a 3-bit right rotation the
      computed result would still be zero and valid, therefore this bug went
      unnoticed.
      
      Fixes: beb4641a ("PCI: dwc: Add MSI-X callbacks handler")
      Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
      [lorenzo.pieralisi@arm.com: updated commit log]
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI: layerscape: Fix wrong invocation of outbound window disable accessor · b391ed73
      Hou Zhiqiang committed
      commit c6fd6fe9dea44732cdcd970f1130b8cc50ad685a upstream.
      
      The order of parameters is not correct when invoking the outbound
      window disable routine. Fix it.
      
      Fixes: 4a2745d7 ("PCI: layerscape: Disable outbound windows configured by bootloader")
      Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
      [lorenzo.pieralisi@arm.com: commit log]
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: relocation: set trans to be NULL after ending transaction · 59065765
      Pan Bian committed
      commit 42a657f57628402c73237547f0134e083e2f6764 upstream.
      
      The function relocate_block_group calls btrfs_end_transaction to release
      trans when update_backref_cache returns 1, and then continues the loop
      body. If btrfs_block_rsv_refill fails this time, it will jump out of the
      loop and the freed trans will be accessed. This may result in a
      use-after-free bug. The patch assigns NULL to trans after trans is
      released so that it will not be accessed.
      
      Fixes: 0647bf56 ("Btrfs: improve forever loop when doing balance relocation")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Pan Bian <bianpan2016@163.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • F
      Btrfs: fix race between enabling quotas and subvolume creation · 172a94eb
      Filipe Manana committed
      commit 552f0329c75b3e1d7f9bb8c9e421d37403f192cd upstream.
      
      We have a race between enabling quotas and subvolume creation that causes
      subvolume creation to fail with -EINVAL, and the following diagram shows
      how it happens:
      
                    CPU 0                                          CPU 1
      
       btrfs_ioctl()
        btrfs_ioctl_quota_ctl()
         btrfs_quota_enable()
          mutex_lock(fs_info->qgroup_ioctl_lock)
      
                                                        btrfs_ioctl()
                                                         create_subvol()
                                                          btrfs_qgroup_inherit()
                                                           -> save fs_info->quota_root
                                                              into quota_root
                                                           -> stores a NULL value
                                                           -> tries to lock the mutex
                                                              qgroup_ioctl_lock
                                                              -> blocks waiting for
                                                                 the task at CPU0
      
         -> sets BTRFS_FS_QUOTA_ENABLED in fs_info
         -> sets quota_root in fs_info->quota_root
            (non-NULL value)
      
         mutex_unlock(fs_info->qgroup_ioctl_lock)
      
                                                           -> checks quota enabled
                                                              flag is set
                                                           -> returns -EINVAL because
                                                              fs_info->quota_root was
                                                              NULL before it acquired
                                                              the mutex
                                                              qgroup_ioctl_lock
                                                         -> ioctl returns -EINVAL
      
      Returning -EINVAL to user space will be confusing if all the arguments
      passed to the subvolume creation ioctl were valid.
      
      Fix it by grabbing the value from fs_info->quota_root after acquiring
      the mutex.
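The ordering problem in the diagram can be modelled without real threads. The sketch below is an assumption-laden illustration: `struct fs_state`, `lock_after_enabler()`, `inherit_racy()` and `inherit_fixed()` are hypothetical names, and the "lock" is a stand-in that simply applies the other task's updates, as if we had blocked until it finished. Unlocking is omitted for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified state; not the real btrfs_fs_info. */
struct fs_state {
	void *quota_root;
	int quota_enabled;
};

static int dummy_root;

/*
 * Stand-in for blocking on the mutex: by the time the lock is granted,
 * the task that held it has finished enabling quotas.
 */
static void lock_after_enabler(struct fs_state *fs)
{
	fs->quota_root = &dummy_root;
	fs->quota_enabled = 1;
}

/* Racy order: snapshot the pointer, then take the lock. */
static int inherit_racy(struct fs_state *fs)
{
	void *quota_root = fs->quota_root;	/* still NULL here */

	lock_after_enabler(fs);
	if (!fs->quota_enabled)
		return 0;	/* quotas off: nothing to inherit */
	if (!quota_root)
		return -EINVAL;	/* stale snapshot taken before the lock */
	return 0;
}

/* Fixed order: take the lock, then read the pointer. */
static int inherit_fixed(struct fs_state *fs)
{
	void *quota_root;

	lock_after_enabler(fs);
	quota_root = fs->quota_root;
	if (!fs->quota_enabled)
		return 0;
	if (!quota_root)
		return -EINVAL;
	return 0;
}

static int run_racy(void)
{
	struct fs_state fs = { NULL, 0 };

	return inherit_racy(&fs);
}

static int run_fixed(void)
{
	struct fs_state fs = { NULL, 0 };

	return inherit_fixed(&fs);
}
```

With the same starting state, the racy order reports `-EINVAL` while the fixed order succeeds, which mirrors the spurious failure described above.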
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      172a94eb
    • F
      Btrfs: fix rare chances for data loss when doing a fast fsync · 715608db
      Filipe Manana committed
      commit aab15e8ec25765cf7968c72cbec7583acf99d8a4 upstream.
      
      After the simplification of the fast fsync patch done recently by commit
      b5e6c3e1 ("btrfs: always wait on ordered extents at fsync time") and
      commit e7175a69 ("btrfs: remove the wait ordered logic in the
      log_one_extent path"), we got a very short time window where we can get
      extents logged without writeback completing first or extents logged
      without logging the respective data checksums. Both issues can only happen
      when doing a non-full (fast) fsync.
      
      As soon as we enter btrfs_sync_file() we trigger writeback, then lock the
      inode and then wait for the writeback to complete before starting to log
      the inode. However before we acquire the inode's lock and after we started
      writeback, it's possible that more writes happened and dirtied more pages.
      If that happened and those pages get writeback triggered while we are
      logging the inode (for example, the VM subsystem triggering it due to
      memory pressure, or another concurrent fsync), we end up seeing the
      respective extent maps in the inode's list of modified extents and will
      log matching file extent items without waiting for the respective
      ordered extents to complete, meaning that either of the following will
      happen:
      
      1) We log an extent after its writeback finishes but before its checksums
         are added to the csum tree, leading to -EIO errors when attempting to
         read the extent after a log replay.
      
      2) We log an extent before its writeback finishes.
         Therefore after the log replay we will have a file extent item pointing
         to an unwritten extent (and without the respective data checksums as
         well).
      
      This could not happen before the fast fsync patch simplification, because
      for any extent we found in the list of modified extents, we would wait for
      its respective ordered extent to finish writeback or collect its checksums
      for logging if it did not complete yet.
      
      Fix this by triggering writeback again after acquiring the inode's lock
      and before waiting for ordered extents to complete.
      
      Fixes: e7175a69 ("btrfs: remove the wait ordered logic in the log_one_extent path")
      Fixes: b5e6c3e1 ("btrfs: always wait on ordered extents at fsync time")
      CC: stable@vger.kernel.org # 4.19+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      715608db
    • F
      Btrfs: ensure path name is null terminated at btrfs_control_ioctl · 78a2890f
      Filipe Manana committed
      commit f505754fd6599230371cb01b9332754ddc104be1 upstream.
      
      We were using the path name received from user space without checking that
      it is null terminated. While btrfs-progs is well behaved and does proper
      validation and null termination, someone could call the ioctl and pass
      a non-null-terminated path, leading to buffer overrun problems in the
      kernel.  The ioctl is protected by CAP_SYS_ADMIN.
      
      So just set the last byte of the path to a null character, similar to what
      we do in other ioctls (add/remove/resize device, snapshot creation, etc).
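As a rough illustration of the approach, the sketch below forces a terminator into a fixed-size buffer before any string function reads it. `struct demo_vol_args`, `DEMO_PATH_MAX` and the function names are hypothetical and chosen for this example; they are not the kernel's definitions.

```c
#include <assert.h>
#include <string.h>

#define DEMO_PATH_MAX 16	/* hypothetical size, for illustration only */

struct demo_vol_args {
	char name[DEMO_PATH_MAX];
};

/* Force termination before any string function touches the buffer. */
static size_t terminated_name_len(struct demo_vol_args *args)
{
	args->name[sizeof(args->name) - 1] = '\0';
	return strlen(args->name);
}

/* A caller that fills every byte and supplies no terminator. */
static size_t demo_unterminated_input(void)
{
	struct demo_vol_args args;

	memset(args.name, 'A', sizeof(args.name));
	return terminated_name_len(&args);
}
```

The cost of this choice is that a name using every byte of the buffer loses its last character, which is usually preferable to reading past the end of the buffer.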
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      78a2890f
    • N
      btrfs: Always try all copies when reading extent buffers · aaf249e3
      Nikolay Borisov committed
      commit f8397d69daef06d358430d3054662fb597e37c00 upstream.
      
      When a metadata read is served the endio routine btree_readpage_end_io_hook
      is called which eventually runs the tree-checker. If tree-checker fails
      to validate the read eb then it sets EXTENT_BUFFER_CORRUPT flag. This
      leads to btree_read_extent_buffer_pages wrongly assuming that all
      available copies of this extent buffer are wrong and failing prematurely.
      Fix this by modifying btree_read_extent_buffer_pages to read all copies of
      the data.
      
      This failure was exhibited in xfstests btrfs/124 which would
      spuriously fail its balance operations. The reason was that when balance
      was run following re-introduction of the missing raid1 disk
      __btrfs_map_block would map the read request to stripe 0, which
      corresponded to devid 2 (the disk which is being removed in the test):
      
          item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 3553624064) itemoff 15975 itemsize 112
      	length 1073741824 owner 2 stripe_len 65536 type DATA|RAID1
      	io_align 65536 io_width 65536 sector_size 4096
      	num_stripes 2 sub_stripes 1
      		stripe 0 devid 2 offset 2156920832
      		dev_uuid 8466c350-ed0c-4c3b-b17d-6379b445d5c8
      		stripe 1 devid 1 offset 3553624064
      		dev_uuid 1265d8db-5596-477e-af03-df08eb38d2ca
      
      This caused read requests for a checksum item to be routed to the
      stale disk which triggered the aforementioned logic involving
      EXTENT_BUFFER_CORRUPT flag. This then triggered cascading failures of
      the balance operation.
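The retry idea can be sketched as a loop that keeps going after a copy fails validation instead of giving up on the first failure. This is an illustrative model only: `read_any_copy()`, the callback type, and the 1-based mirror numbering are assumptions made for this example, not the btrfs implementation.

```c
#include <assert.h>

enum { COPY_OK = 0, COPY_CORRUPT = -1 };

/* Hypothetical callback: read and validate one copy of a block. */
typedef int (*read_copy_fn)(int mirror);

/*
 * Try every available copy in turn. Returns the number of the first
 * mirror that validates, or 0 if none does. A corrupt copy is not
 * treated as proof that the remaining copies are corrupt too.
 */
static int read_any_copy(int num_copies, read_copy_fn read_copy)
{
	for (int mirror = 1; mirror <= num_copies; mirror++) {
		if (read_copy(mirror) == COPY_OK)
			return mirror;
	}
	return 0;
}

/* Models the test scenario: the first copy lives on the stale disk. */
static int stale_first_copy(int mirror)
{
	return mirror == 1 ? COPY_CORRUPT : COPY_OK;
}

static int all_corrupt(int mirror)
{
	(void)mirror;
	return COPY_CORRUPT;
}
```

With two copies and a stale first one, `read_any_copy(2, stale_first_copy)` falls through to the second mirror rather than failing.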
      
      Fixes: a826d6dc ("Btrfs: check items for correctness as we search")
      CC: stable@vger.kernel.org # 4.4+
      Suggested-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      aaf249e3
    • J
      udf: Allow mounting volumes with incorrect identification strings · 949ddf80
      Jan Kara committed
      commit b54e41f5efcb4316b2f30b30c2535cc194270373 upstream.
      
      Commit c26f6c61 ("udf: Fix conversion of 'dstring' fields to UTF8")
      started to be more strict when checking whether converted strings are
      properly formatted. Sudip reports that there are DVDs where the volume
      identification string is actually too long - UDF reports:
      
      [  632.309320] UDF-fs: incorrect dstring lengths (32/32)
      
      during mount and fails the mount. This is a mostly harmless failure as we
      don't need volume identification (and even less volume set
      identification) for anything. So just truncate the volume identification
      string if it is too long and replace it with 'Invalid' if we just cannot
      convert it for other reasons. This keeps slightly incorrect media still
      mountable.
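A simplified model of the lenient handling might look like the following. It assumes a fixed-size field whose final byte records the used length, and it treats any NUL or non-ASCII byte as "cannot convert"; both the function `load_volume_ident()` and that conversion rule are stand-ins invented for this sketch, not the UDF driver's logic.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy an identification string out of a fixed-size field whose last
 * byte holds the recorded length. An over-long length is truncated
 * rather than rejected; unconvertible content becomes "Invalid".
 * Requires field_size >= 1 and out_size >= 8.
 */
static void load_volume_ident(const unsigned char *field, size_t field_size,
			      char *out, size_t out_size)
{
	size_t len = field[field_size - 1];	/* recorded length */

	if (len > field_size - 1)
		len = field_size - 1;	/* too long: truncate, do not fail */
	if (len > out_size - 1)
		len = out_size - 1;

	for (size_t i = 0; i < len; i++) {
		/* Stand-in for a failed character-set conversion. */
		if (field[i] == 0 || field[i] > 0x7e) {
			snprintf(out, out_size, "Invalid");
			return;
		}
		out[i] = (char)field[i];
	}
	out[len] = '\0';
}

/* A 32-byte field that claims a length of 32, as in the log line. */
static size_t demo_overlong(void)
{
	unsigned char field[32];
	char out[32];

	memset(field, 'A', sizeof(field) - 1);
	field[31] = 32;
	load_volume_ident(field, sizeof(field), out, sizeof(out));
	return strlen(out);
}

static int demo_unconvertible(void)
{
	unsigned char field[32];
	char out[32];

	memset(field, 'A', sizeof(field) - 1);
	field[0] = 0xff;
	field[31] = 5;
	load_volume_ident(field, sizeof(field), out, sizeof(out));
	return strcmp(out, "Invalid") == 0;
}
```

Neither case returns an error to the caller, which is the behaviour the commit is after: the mount proceeds with a usable, if imperfect, label.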
      
      CC: stable@vger.kernel.org
      Fixes: c26f6c61 ("udf: Fix conversion of 'dstring' fields to UTF8")
      Reported-and-tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      949ddf80
    • M
      xtensa: fix coprocessor part of ptrace_{get,set}xregs · 01fb21bf
      Max Filippov committed
      commit 38a35a78c5e270cbe53c4fef6b0d3c2da90dd849 upstream.
      
      Layout of coprocessor registers in the elf_xtregs_t and
      xtregs_coprocessor_t may be different due to alignment. Thus it is not
      always possible to copy data between the xtregs_coprocessor_t structure
      and the elf_xtregs_t and get correct values for all registers.
      Use a table of offsets and sizes of individual coprocessor register
      groups to do coprocessor context copying in the ptrace_getxregs and
      ptrace_setxregs.
      This fixes incorrect coprocessor register values being read from the user
      process by the native gdb on an xtensa core with multiple coprocessors
      and registers with high alignment requirements.
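The table-driven copy can be illustrated with two made-up layouts that hold the same register groups at different offsets. Everything here (`struct kern_cp_regs`, `struct user_xregs`, `cp_groups`, `get_xregs()`, `set_xregs()`) is hypothetical and only demonstrates the technique of copying group by group instead of copying the structures wholesale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layouts: same register groups, different offsets. */
struct kern_cp_regs {
	uint32_t cp0[2];
	uint64_t cp1[2];
};

struct user_xregs {
	uint32_t opt[3];	/* other registers come first in this view */
	uint32_t cp0[2];
	uint64_t cp1[2];
};

/* One table entry per coprocessor register group. */
struct cp_group {
	size_t user_off;
	size_t kern_off;
	size_t size;
};

#define CP_GROUP(f) \
	{ offsetof(struct user_xregs, f), offsetof(struct kern_cp_regs, f), \
	  sizeof(((struct kern_cp_regs *)0)->f) }

static const struct cp_group cp_groups[] = {
	CP_GROUP(cp0),
	CP_GROUP(cp1),
};

#define N_GROUPS (sizeof(cp_groups) / sizeof(cp_groups[0]))

/* Kernel view -> user view, one group at a time. */
static void get_xregs(struct user_xregs *u, const struct kern_cp_regs *k)
{
	for (size_t i = 0; i < N_GROUPS; i++)
		memcpy((char *)u + cp_groups[i].user_off,
		       (const char *)k + cp_groups[i].kern_off,
		       cp_groups[i].size);
}

/* User view -> kernel view, using the same table. */
static void set_xregs(struct kern_cp_regs *k, const struct user_xregs *u)
{
	for (size_t i = 0; i < N_GROUPS; i++)
		memcpy((char *)k + cp_groups[i].kern_off,
		       (const char *)u + cp_groups[i].user_off,
		       cp_groups[i].size);
}

static int demo_round_trip(void)
{
	struct kern_cp_regs k = { { 1, 2 }, { 3, 4 } }, back;
	struct user_xregs u;

	memset(&u, 0, sizeof(u));
	memset(&back, 0, sizeof(back));
	get_xregs(&u, &k);
	set_xregs(&back, &u);
	return u.cp1[1] == 4 && back.cp0[0] == 1 && back.cp1[1] == 4;
}
```

Because each entry carries its own pair of offsets, alignment padding in either structure no longer matters to the copy.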
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      01fb21bf
    • M
      xtensa: fix coprocessor context offset definitions · 5f84a996
      Max Filippov committed
      commit 03bc996af0cc71c7f30c384d8ce7260172423b34 upstream.
      
      Coprocessor context offsets are used by the assembly code that moves
      coprocessor context between the individual fields of the
      thread_info::xtregs_cp structure and coprocessor registers.
      This fixes coprocessor context clobbering on flushing and reloading
      during normal user code execution and user process debugging in the
      presence of more than one coprocessor in the core configuration.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f84a996
    • M
      xtensa: enable coprocessors that are being flushed · 4ec1039f
      Max Filippov committed
      commit 2958b66694e018c552be0b60521fec27e8d12988 upstream.
      
      coprocessor_flush_all may be called from a context of a thread that is
      different from the thread being flushed. In that case contents of the
      cpenable special register may not match ti->cpenable of the target
      thread, resulting in unhandled coprocessor exception in the kernel
      context.
      Set cpenable special register to the ti->cpenable of the target thread
      for the duration of the flush and restore it afterwards.
      This fixes the following crash caused by coprocessor register inspection
      in native gdb:
      
        (gdb) p/x $w0
        Illegal instruction in kernel: sig: 9 [#1] PREEMPT
        Call Trace:
          ___might_sleep+0x184/0x1a4
          __might_sleep+0x41/0xac
          exit_signals+0x14/0x218
          do_exit+0xc9/0x8b8
          die+0x99/0xa0
          do_illegal_instruction+0x18/0x6c
          common_exception+0x77/0x77
          coprocessor_flush+0x16/0x3c
          arch_ptrace+0x46c/0x674
          sys_ptrace+0x2ce/0x3b4
          system_call+0x54/0x80
          common_exception+0x77/0x77
        note: gdb[100] exited with preempt_count 1
        Killed
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4ec1039f
    • L
      KVM: VMX: re-add ple_gap module parameter · bbe23c4b
      Luiz Capitulino committed
      commit a87c99e61236ba8ca962ce97a19fab5ebd588d35 upstream.
      
      Apparently, the ple_gap parameter was accidentally removed
      by commit c8e88717. Add it back.
      
      Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: c8e88717
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      bbe23c4b
    • W
      KVM: X86: Fix scan ioapic use-before-initialization · 61c42d65
      Wanpeng Li committed
      commit e97f852fd4561e77721bb9a4e0ea9d98305b1e93 upstream.
      
      Reported by syzkaller:
      
       BUG: unable to handle kernel NULL pointer dereference at 00000000000001c8
       PGD 80000003ec4da067 P4D 80000003ec4da067 PUD 3f7bfa067 PMD 0
       Oops: 0000 [#1] PREEMPT SMP PTI
       CPU: 7 PID: 5059 Comm: debug Tainted: G           OE     4.19.0-rc5 #16
       RIP: 0010:__lock_acquire+0x1a6/0x1990
       Call Trace:
        lock_acquire+0xdb/0x210
        _raw_spin_lock+0x38/0x70
        kvm_ioapic_scan_entry+0x3e/0x110 [kvm]
        vcpu_enter_guest+0x167e/0x1910 [kvm]
        kvm_arch_vcpu_ioctl_run+0x35c/0x610 [kvm]
        kvm_vcpu_ioctl+0x3e9/0x6d0 [kvm]
        do_vfs_ioctl+0xa5/0x690
        ksys_ioctl+0x6d/0x80
        __x64_sys_ioctl+0x1a/0x20
        do_syscall_64+0x83/0x6e0
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The reason is that the testcase writes hyperv synic HV_X64_MSR_SINT6 msr
      and triggers scan ioapic logic to load synic vectors into EOI exit bitmap.
      However, irqchip is not initialized by this simple testcase, so ioapic/apic
      objects should not be accessed.
      This can be triggered by the following program:
      
          #define _GNU_SOURCE
      
          #include <endian.h>
          #include <stdint.h>
          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>
          #include <sys/syscall.h>
          #include <sys/types.h>
          #include <unistd.h>
      
          uint64_t r[3] = {0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff};
      
          int main(void)
          {
          	syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
          	long res = 0;
          	memcpy((void*)0x20000040, "/dev/kvm", 9);
          	res = syscall(__NR_openat, 0xffffffffffffff9c, 0x20000040, 0, 0);
          	if (res != -1)
          		r[0] = res;
          	res = syscall(__NR_ioctl, r[0], 0xae01, 0);
          	if (res != -1)
          		r[1] = res;
          	res = syscall(__NR_ioctl, r[1], 0xae41, 0);
          	if (res != -1)
          		r[2] = res;
          	memcpy(
          			(void*)0x20000080,
          			"\x01\x00\x00\x00\x00\x5b\x61\xbb\x96\x00\x00\x40\x00\x00\x00\x00\x01\x00"
          			"\x08\x00\x00\x00\x00\x00\x0b\x77\xd1\x78\x4d\xd8\x3a\xed\xb1\x5c\x2e\x43"
          			"\xaa\x43\x39\xd6\xff\xf5\xf0\xa8\x98\xf2\x3e\x37\x29\x89\xde\x88\xc6\x33"
          			"\xfc\x2a\xdb\xb7\xe1\x4c\xac\x28\x61\x7b\x9c\xa9\xbc\x0d\xa0\x63\xfe\xfe"
          			"\xe8\x75\xde\xdd\x19\x38\xdc\x34\xf5\xec\x05\xfd\xeb\x5d\xed\x2e\xaf\x22"
          			"\xfa\xab\xb7\xe4\x42\x67\xd0\xaf\x06\x1c\x6a\x35\x67\x10\x55\xcb",
          			106);
          	syscall(__NR_ioctl, r[2], 0x4008ae89, 0x20000080);
          	syscall(__NR_ioctl, r[2], 0xae80, 0);
          	return 0;
          }
      
      This patch fixes it by bailing out of the ioapic scan if the ioapic is not
      initialized in the kernel.
      Reported-by: Wei Wu <ww9210@gmail.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Wei Wu <ww9210@gmail.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      61c42d65
    • W
      KVM: LAPIC: Fix pv ipis use-before-initialization · ffb01e73
      Wanpeng Li committed
      commit 38ab012f109caf10f471db1adf284e620dd8d701 upstream.
      
      Reported by syzkaller:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
       PGD 800000040410c067 P4D 800000040410c067 PUD 40410d067 PMD 0
       Oops: 0000 [#1] PREEMPT SMP PTI
       CPU: 3 PID: 2567 Comm: poc Tainted: G           OE     4.19.0-rc5 #16
       RIP: 0010:kvm_pv_send_ipi+0x94/0x350 [kvm]
       Call Trace:
        kvm_emulate_hypercall+0x3cc/0x700 [kvm]
        handle_vmcall+0xe/0x10 [kvm_intel]
        vmx_handle_exit+0xc1/0x11b0 [kvm_intel]
        vcpu_enter_guest+0x9fb/0x1910 [kvm]
        kvm_arch_vcpu_ioctl_run+0x35c/0x610 [kvm]
        kvm_vcpu_ioctl+0x3e9/0x6d0 [kvm]
        do_vfs_ioctl+0xa5/0x690
        ksys_ioctl+0x6d/0x80
        __x64_sys_ioctl+0x1a/0x20
        do_syscall_64+0x83/0x6e0
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The reason is that the apic map has not yet been initialized when the
      testcase triggers the pv_send_ipi interface by vmcall, which results in a
      NULL kvm->arch.apic_map being dereferenced. This patch fixes it by checking
      whether or not apic map is NULL and bailing out immediately if that is the
      case.
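The guard amounts to an early return before the first dereference. A minimal sketch under assumed names follows: `struct demo_vm`, `struct demo_apic_map`, `demo_send_ipi()` and the error value `DEMO_ERR_NO_MAP` are all hypothetical and do not reflect the KVM data structures or the error code the real patch returns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the KVM structures. */
struct demo_apic_map {
	unsigned int max_apic_id;
};

struct demo_vm {
	struct demo_apic_map *apic_map;	/* NULL until the irqchip is set up */
};

#define DEMO_ERR_NO_MAP (-1)	/* illustrative error value */

/* Returns the number of IPIs delivered, or a negative error. */
static int demo_send_ipi(const struct demo_vm *vm, unsigned int dest_id)
{
	const struct demo_apic_map *map = vm->apic_map;

	if (!map)
		return DEMO_ERR_NO_MAP;	/* bail out before any dereference */
	if (dest_id > map->max_apic_id)
		return 0;
	return 1;
}

static int demo_uninitialized_vm(void)
{
	struct demo_vm vm = { NULL };

	return demo_send_ipi(&vm, 0);
}
```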
      
      Fixes: 4180bf1b (KVM: X86: Implement "send IPI" hypercall)
      Reported-by: Wei Wu <ww9210@gmail.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Wei Wu <ww9210@gmail.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ffb01e73
    • L
      KVM: x86: Fix kernel info-leak in KVM_HC_CLOCK_PAIRING hypercall · 6d772df4
      Liran Alon committed
      commit bcbfbd8ec21096027f1ee13ce6c185e8175166f6 upstream.
      
      kvm_pv_clock_pairing() allocates local var
      "struct kvm_clock_pairing clock_pairing" on stack and initializes
      all its fields besides padding (clock_pairing.pad[]).
      
      Because clock_pairing var is written completely (including padding)
      to guest memory, failure to init struct padding results in kernel
      info-leak.
      
      Fix the issue by making sure to also init the padding with zeroes.
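The leak and its fix can be demonstrated with a stand-in structure. `struct demo_clock_pairing` and the helper names below are illustrative assumptions, and the `0x5a` fill merely simulates stale stack contents; the sketch shows that zeroing the padding before copying the whole structure out leaves no stale bytes in the destination.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in struct with explicit padding; names are illustrative. */
struct demo_clock_pairing {
	int64_t sec;
	int64_t nsec;
	uint64_t tsc;
	uint32_t flags;
	uint32_t pad[9];
};

/* Initialize every field, including the padding array. */
static void fill_pairing(struct demo_clock_pairing *cp,
			 int64_t sec, int64_t nsec, uint64_t tsc)
{
	cp->sec = sec;
	cp->nsec = nsec;
	cp->tsc = tsc;
	cp->flags = 0;
	memset(cp->pad, 0, sizeof(cp->pad));	/* the fix: no stale bytes */
}

static int demo_no_stale_bytes(void)
{
	struct demo_clock_pairing cp;
	unsigned char guest[sizeof(cp)];

	memset(&cp, 0x5a, sizeof(cp));	/* simulate stale stack contents */
	fill_pairing(&cp, 1, 2, 3);
	memcpy(guest, &cp, sizeof(cp));	/* whole struct is copied out */

	for (size_t i = offsetof(struct demo_clock_pairing, pad);
	     i < sizeof(cp); i++) {
		if (guest[i] != 0)
			return 0;
	}
	return 1;
}
```

Removing the `memset()` in `fill_pairing()` would make the check fail, since the `0x5a` bytes would then reach the destination buffer.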
      
      Fixes: 55dd00a7 ("KVM: x86: add KVM_HC_CLOCK_PAIRING hypercall")
      Reported-by: syzbot+a8ef68d71211ba264f56@syzkaller.appspotmail.com
      Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6d772df4
    • L
      KVM: nVMX/nSVM: Fix bug which sets vcpu->arch.tsc_offset to L1 tsc_offset · 76c8476c
      Leonid Shatz committed
      commit 326e742533bf0a23f0127d8ea62fb558ba665f08 upstream.
      
      Since commit e79f245d ("X86/KVM: Properly update 'tsc_offset' to
      represent the running guest"), the meaning of vcpu->arch.tsc_offset was
      changed to always reflect the tsc_offset value set on the active VMCS,
      regardless of whether the vCPU is currently running L1 or L2.
      
      However, the above-mentioned commit failed to also change
      kvm_vcpu_write_tsc_offset() to set vcpu->arch.tsc_offset correctly.
      This is because vmx_write_tsc_offset() could set the tsc_offset value
      in the active VMCS to the given offset parameter *plus vmcs12->tsc_offset*.
      However, kvm_vcpu_write_tsc_offset() just sets vcpu->arch.tsc_offset
      to the given offset parameter, without taking into account the possible
      addition of vmcs12->tsc_offset. (The same is true for the SVM case.)
      
      Fix this issue by changing kvm_x86_ops->write_tsc_offset() to return
      the tsc_offset actually set in the active VMCS, and modify
      kvm_vcpu_write_tsc_offset() to store the returned value in
      vcpu->arch.tsc_offset.
      In addition, rename the write_tsc_offset() callback to write_l1_tsc_offset()
      to make it clear that it is meant to set the L1 TSC offset.
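The shape of the fix can be sketched as a callback that reports what it actually programmed, with the caller caching that value rather than its own argument. The structure and function names below (`struct demo_vcpu`, `demo_write_l1_tsc_offset()`, `demo_vcpu_write_tsc_offset()`) are hypothetical simplifications, not the KVM code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified vCPU state. */
struct demo_vcpu {
	uint64_t tsc_offset;		/* cached copy of the active offset */
	uint64_t hw_tsc_offset;		/* value programmed into "hardware" */
	uint64_t vmcs12_tsc_offset;	/* extra offset applied while in L2 */
	int in_l2;
};

/*
 * Program the L1 offset. While a nested guest runs, the active value
 * is the L1 offset plus the nested offset. The value actually written
 * is returned so the caller can cache it.
 */
static uint64_t demo_write_l1_tsc_offset(struct demo_vcpu *v,
					 uint64_t l1_offset)
{
	uint64_t active = l1_offset;

	if (v->in_l2)
		active += v->vmcs12_tsc_offset;
	v->hw_tsc_offset = active;
	return active;
}

/* Cache the returned value, not the requested one. */
static void demo_vcpu_write_tsc_offset(struct demo_vcpu *v, uint64_t offset)
{
	v->tsc_offset = demo_write_l1_tsc_offset(v, offset);
}

static int demo_l2_write(void)
{
	struct demo_vcpu v = {
		.vmcs12_tsc_offset = 100,
		.in_l2 = 1,
	};

	demo_vcpu_write_tsc_offset(&v, 1000);
	return v.tsc_offset == 1100 && v.tsc_offset == v.hw_tsc_offset;
}
```

Had the caller stored `offset` directly, the cached value would be 1000 while the programmed value is 1100, which is the mismatch the commit describes.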
      
      Fixes: e79f245d ("X86/KVM: Properly update 'tsc_offset' to represent the running guest")
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
      Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Signed-off-by: Leonid Shatz <leonid.shatz@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      76c8476c