1. 27 12月, 2019 40 次提交
    • M
      net/mlx5: Fix atomic_mode enum values · 12d4e210
      Moni Shoua 提交于
      mainline inclusion
      from mainline-4.20-rc1
      commit aa7e80b220f3a543eefbe4b7e2c5d2b73e2e2ef7
      category: bugfix
      bugzilla: 5560
      CVE: NA
      
      ---------------------------
      
      The field atomic_mode is 4 bits wide and therefore can hold values
      from 0x0 to 0xf. Remove the unnecessary 20 bit shift that made the values
      be incorrect. While that, remove unused enum values.
      
      Fixes: 57cda166 ("net/mlx5: Add DCT command interface")
      Signed-off-by: NMoni Shoua <monis@mellanox.com>
      Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Reviewed-by: Nzhengbin <zhengbin13@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      12d4e210
    • R
      gpio: fix kernel-doc notation warning for 'request_key' · 3f446a97
      Randy Dunlap 提交于
      mainline inclusion
      from mainline-4.20-rc1
      commit 02ad0437decf2e5dba975c23b1a89775f4b211e1
      category: bugfix
      bugzilla: 5562
      CVE: NA
      
      ---------------------------
      
      Fix kernel-doc warning for missing struct member 'request_key':
      
      ../include/linux/gpio/driver.h:142: warning: Function parameter or member 'request_key' not described in 'gpio_irq_chip'
      
      Fixes: 39c3fd58 ("kernel/irq: Extend lockdep class for request mutex")
      Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: linux-gpio@vger.kernel.org
      Signed-off-by: NLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Reviewed-by: Nzhengbin <zhengbin13@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      3f446a97
    • K
      kernfs: update comment about kernfs_path() return value · 0435096f
      Konstantin Khlebnikov 提交于
      mainline inclusion
      from mainline-4.20-rc1
      commit 8f5be0ec23bb9ef3f96659c8dff1340b876600bf
      category: bugfix
      bugzilla: 5564
      CVE: NA
      
      ---------------------------
      
      Now it returns the length of the full path or error code.
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Fixes: 3abb1d90 ("kernfs: make kernfs_path*() behave in the style of strlcpy()")
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Reviewed-by: Nzhengbin <zhengbin13@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      0435096f
    • M
      build_bug.h: remove most of dummy BUILD_BUG_ON stubs for Sparse · 9baadf68
      Masahiro Yamada 提交于
      mainline inclusion
      from mainline-5.0-rc1
      commit 527edbc18a70e745740ef31edb0ffefb2f161afa
      category: bugfix
      bugzilla: 6820
      CVE: NA
      
      ---------------------------
      
      The introduction of these dummy BUILD_BUG_ON stubs dates back to commmit
      903c0c7c ("sparse: define dummy BUILD_BUG_ON definition for
      sparse").
      
      At that time, BUILD_BUG_ON() was implemented with the negative array
      trick *and* the link-time trick, like this:
      
        extern int __build_bug_on_failed;
        #define BUILD_BUG_ON(condition)                                \
                do {                                                   \
                        ((void)sizeof(char[1 - 2*!!(condition)]));     \
                        if (condition) __build_bug_on_failed = 1;      \
                } while(0)
      
      Sparse is more strict about the negative array trick than GCC because
      Sparse requires the array length to be really constant.
      
      Here is the simple test code for the macro above:
      
        static const int x = 0;
        BUILD_BUG_ON(x);
      
      GCC is absolutely fine with it (-Wvla was enabled only very recently),
      but Sparse warns like this:
      
        error: bad constant expression
        error: cannot size expression
      
      (If you are using a newer version of Sparse, you will see a different
      warning message, "warning: Variable length array is used".)
      
      Anyway, Sparse was producing many false positives, and noisier than it
      should be at that time.
      
      With the previous commit, the leftover negative array trick is gone.
      Sparse is fine with the current BUILD_BUG_ON(), which is implemented by
      using the 'error' attribute.
      
      I am keeping the stub for BUILD_BUG_ON_ZERO().  Otherwise, Sparse would
      complain about the following code, which GCC is fine with:
      
        static const int x = 0;
        int y = BUILD_BUG_ON_ZERO(x);
      
      Link: http://lkml.kernel.org/r/1542856462-18836-3-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Reviewed-by: NLuc Van Oostenryck <luc.vanoostenryck@gmail.com>
      Reviewed-by: NNick Desaulniers <ndesaulniers@google.com>
      Tested-by: NNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Reviewed-by: Nzhengbin <zhengbin13@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      9baadf68
    • M
      include/linux/compiler*.h: fix OPTIMIZER_HIDE_VAR · 05505fc0
      Michael S. Tsirkin 提交于
      mainline inclusion
      from mainline-v5.0-rc3
      commit 3e2ffd655cc6a694608d997738989ff5572a8266
      category: bugfix
      bugzilla: 7094
      CVE: NA
      
      ---------------------------
      
      Since commit 815f0ddb ("include/linux/compiler*.h: make compiler-*.h
      mutually exclusive") clang no longer reuses the OPTIMIZER_HIDE_VAR macro
      from compiler-gcc - instead it gets the version in
      include/linux/compiler.h.  Unfortunately that version doesn't actually
      prevent compiler from optimizing out the variable.
      
      Fix up by moving the macro out from compiler-gcc.h to compiler.h.
      Compilers without incline asm support will keep working
      since it's protected by an ifdef.
      
      Also fix up comments to match reality since we are no longer overriding
      any macros.
      
      Build-tested with gcc and clang.
      
      Fixes: 815f0ddb ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
      Cc: Eli Friedman <efriedma@codeaurora.org>
      Cc: Joe Perches <joe@perches.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Reviewed-by: NNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NMiguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Reviewed-by: Nzhengbin <zhengbin13@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      05505fc0
    • R
      of: Fix property name in of_node_get_device_type · 4e3cc0cf
      Rob Herring 提交于
      mainline inclusion
      from mainline-4.20-rc1
      commit 5d5a0ab1a7918fce5ca5c0fb1871a3e2000f85de
      category: bugfix
      bugzilla: 5547
      CVE: NA
      
      ---------------------------
      
      Commit 0413beda ("of: Add device_type access helper functions")
      added a new helper not yet used in preparation for some treewide clean
      up of accesses to 'device_type' properties. Unfortunately, there's an
      error and 'type' was used for the property name. Fix this.
      
      Fixes: 0413beda ("of: Add device_type access helper functions")
      Cc: Frank Rowand <frowand.list@gmail.com>
      Signed-off-by: NRob Herring <robh@kernel.org>
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Reviewed-by: Nzhengbin <zhengbin13@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      4e3cc0cf
    • J
      gpio: drop broken to_gpio_irq_chip() helper · 7c7db3d1
      Johan Hovold 提交于
      mainline inclusion
      from mainline-5.0-rc1
      commit eee3919c5f2949a8b7b1e9fa239d153be1538656
      category: bugfix
      bugzilla: 5548
      CVE: NA
      
      ---------------------------
      
      Drop the broken to_gpio_irq_chip() container_of() helper, which would
      break the build for anyone who tries to use it.
      
      Specifically, struct gpio_irq_chip only holds a pointer to a struct
      irq_chip so using container_of() on an irq-chip pointer makes no sense.
      
      Fixes: da80ff81 ("gpio: Move irqchip into struct gpio_irq_chip")
      Cc: Thierry Reding <treding@nvidia.com>
      Cc: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: NJohan Hovold <johan@kernel.org>
      Reviewed-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
      Signed-off-by: NLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Reviewed-by: Nzhengbin <zhengbin13@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      7c7db3d1
    • C
      xprtrdma: Squelch a sparse warning · c163fd45
      Chuck Lever 提交于
      mainline inclusion
      from mainline-4.20-rc1
      commit 470443e0b379b070305629f911cc09562bdf324f
      category: bugfix
      bugzilla: 5551
      CVE: NA
      
      ---------------------------
      
      linux/include/trace/events/rpcrdma.h:501:1: warning: expression using sizeof bool
      linux/include/trace/events/rpcrdma.h:501:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
      
      Fixes: ab03eff5 ("xprtrdma: Add trace points in RPC Call ... ")
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Reviewed-by: Nzhengbin <zhengbin13@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      c163fd45
    • R
      srcu: Fix kernel-doc missing notation · 6533f24f
      Randy Dunlap 提交于
      mainline inclusion
      from mainline-5.0-rc1
      commit f3e763c3e544b73ae5c4a3842cedb9ff6ca37715
      category: bugfix
      bugzilla: 5552
      CVE: NA
      
      ---------------------------
      
      Fix kernel-doc warnings for missing parameter descriptions:
      
      ../include/linux/srcu.h:175: warning: Function parameter or member 'p' not described in 'srcu_dereference_notrace'
      ../include/linux/srcu.h:175: warning: Function parameter or member 'sp' not described in 'srcu_dereference_notrace'
      
      Fixes: 0b764a6e ("srcu: Add notrace variant of srcu_dereference")
      Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Reviewed-by: Nzhengbin <zhengbin13@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      6533f24f
    • M
      compiler.h: give up __compiletime_assert_fallback() · bca68b4a
      Masahiro Yamada 提交于
      mainline inclusion
      from mainline-4.20-rc1
      commit 81b45683487a51b0f4d3b29d37f20d6d078544e4
      category: bugfix
      bugzilla: 5553
      CVE: NA
      
      ---------------------------
      
      __compiletime_assert_fallback() is supposed to stop building earlier
      by using the negative-array-size method in case the compiler does not
      support "error" attribute, but has never worked like that.
      
      You can simply try:
      
          BUILD_BUG_ON(1);
      
      GCC immediately terminates the build, but Clang does not report
      anything because Clang does not support the "error" attribute now.
      It will later fail at link time, but __compiletime_assert_fallback()
      is not working at least.
      
      The root cause is commit 1d6a0d19 ("bug.h: prevent double evaluation
      of `condition' in BUILD_BUG_ON").  Prior to that commit, BUILD_BUG_ON()
      was checked by the negative-array-size method *and* the link-time trick.
      Since that commit, the negative-array-size is not effective because
      '__cond' is no longer constant.  As the comment in <linux/build_bug.h>
      says, GCC (and Clang as well) only emits the error for obvious cases.
      
      When '__cond' is a variable,
      
          ((void)sizeof(char[1 - 2 * __cond]))
      
      ... is not obvious for the compiler to know the array size is negative.
      
      Reverting that commit would break BUILD_BUG() because negative-size-array
      is evaluated before the code is optimized out.
      
      Let's give up __compiletime_assert_fallback().  This commit does not
      change the current behavior since it just rips off the useless code.
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Reviewed-by: NNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Reviewed-by: Nzhengbin <zhengbin13@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      bca68b4a
    • K
      gpiolib: Fix return value of gpio_to_desc() stub if !GPIOLIB · f46a293c
      Krzysztof Kozlowski 提交于
      mainline inclusion
      from mainline-5.0-rc1
      commit c5510b8dafce5f3f5a039c9b262ebcae0092c462
      category: bugfix
      bugzilla: 5539
      CVE: NA
      
      ---------------------------
      
      If CONFIG_GPOILIB is not set, the stub of gpio_to_desc() should return
      the same type of error as regular version: NULL.  All the callers
      compare the return value of gpio_to_desc() against NULL, so returned
      ERR_PTR would be treated as non-error case leading to dereferencing of
      error value.
      
      Fixes: 79a9becd ("gpiolib: export descriptor-based GPIO interface")
      Signed-off-by: NKrzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: NLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Reviewed-by: Nzhengbin <zhengbin13@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      f46a293c
    • J
      cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM · d396de12
      Josh Poimboeuf 提交于
      commit b284909abad48b07d3071a9fc9b5692b3e64914b upstream.
      
      With the following commit:
      
        73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
      
      ... the hotplug code attempted to detect when SMT was disabled by BIOS,
      in which case it reported SMT as permanently disabled.  However, that
      code broke a virt hotplug scenario, where the guest is booted with only
      primary CPU threads, and a sibling is brought online later.
      
      The problem is that there doesn't seem to be a way to reliably
      distinguish between the HW "SMT disabled by BIOS" case and the virt
      "sibling not yet brought online" case.  So the above-mentioned commit
      was a bit misguided, as it permanently disabled SMT for both cases,
      preventing future virt sibling hotplugs.
      
      Going back and reviewing the original problems which were attempted to
      be solved by that commit, when SMT was disabled in BIOS:
      
        1) /sys/devices/system/cpu/smt/control showed "on" instead of
           "notsupported"; and
      
        2) vmx_vm_init() was incorrectly showing the L1TF_MSG_SMT warning.
      
      I'd propose that we instead consider #1 above to not actually be a
      problem.  Because, at least in the virt case, it's possible that SMT
      wasn't disabled by BIOS and a sibling thread could be brought online
      later.  So it makes sense to just always default the smt control to "on"
      to allow for that possibility (assuming cpuid indicates that the CPU
      supports SMT).
      
      The real problem is #2, which has a simple fix: change vmx_vm_init() to
      query the actual current SMT state -- i.e., whether any siblings are
      currently online -- instead of looking at the SMT "control" sysfs value.
      
      So fix it by:
      
        a) reverting the original "fix" and its followup fix:
      
           73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
           bc2d8d26 ("cpu/hotplug: Fix SMT supported evaluation")
      
           and
      
        b) changing vmx_vm_init() to query the actual current SMT state --
           instead of the sysfs control value -- to determine whether the L1TF
           warning is needed.  This also requires the 'sched_smt_present'
           variable to exported, instead of 'cpu_smt_control'.
      
      Fixes: 73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
      Reported-by: NIgor Mammedov <imammedo@redhat.com>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kvm@vger.kernel.org
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/e3a85d585da28cc333ecbc1e78ee9216e6da9396.1548794349.git.jpoimboe@redhat.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      d396de12
    • C
      ALSA: compress: Fix stop handling on compressed capture streams · 767046e0
      Charles Keepax 提交于
      commit 4f2ab5e1d13d6aa77c55f4914659784efd776eb4 upstream.
      
      It is normal user behaviour to start, stop, then start a stream
      again without closing it. Currently this works for compressed
      playback streams but not capture ones.
      
      The states on a compressed capture stream go directly from OPEN to
      PREPARED, unlike a playback stream which moves to SETUP and waits
      for a write of data before moving to PREPARED. Currently however,
      when a stop is sent the state is set to SETUP for both types of
      streams. This leaves a capture stream in the situation where a new
      start can't be sent as that requires the state to be PREPARED and
      a new set_params can't be sent as that requires the state to be
      OPEN. The only option being to close the stream, and then reopen.
      
      Correct this issues by allowing snd_compr_drain_notify to set the
      state depending on the stream direction, as we already do in
      set_params.
      
      Fixes: 49bb6402 ("ALSA: compress_core: Add support for capture streams")
      Signed-off-by: NCharles Keepax <ckeepax@opensource.cirrus.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      767046e0
    • J
      kvm: Change offset in kvm_write_guest_offset_cached to unsigned · d7023bb1
      Jim Mattson 提交于
      [ Upstream commit 7a86dab8cf2f0fdf508f3555dddfc236623bff60 ]
      
      Since the offset is added directly to the hva from the
      gfn_to_hva_cache, a negative offset could result in an out of bounds
      write. The existing BUG_ON only checks for addresses beyond the end of
      the gfn_to_hva_cache, not for addresses before the start of the
      gfn_to_hva_cache.
      
      Note that all current call sites have non-negative offsets.
      
      Fixes: 4ec6e863 ("kvm: Introduce kvm_write_guest_offset_cached()")
      Reported-by: NCfir Cohen <cfir@google.com>
      Signed-off-by: NJim Mattson <jmattson@google.com>
      Reviewed-by: NCfir Cohen <cfir@google.com>
      Reviewed-by: NPeter Shier <pshier@google.com>
      Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
      Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      d7023bb1
    • N
      drbd: Avoid Clang warning about pointless switch statment · 2b750730
      Nathan Chancellor 提交于
      [ Upstream commit a52c5a16cf19d8a85831bb1b915a221dd4ffae3c ]
      
      There are several warnings from Clang about no case statement matching
      the constant 0:
      
      In file included from drivers/block/drbd/drbd_receiver.c:48:
      In file included from drivers/block/drbd/drbd_int.h:48:
      In file included from ./include/linux/drbd_genl_api.h:54:
      In file included from ./include/linux/genl_magic_struct.h:236:
      ./include/linux/drbd_genl.h:321:1: warning: no case matching constant
      switch condition '0'
      GENL_struct(DRBD_NLA_HELPER, 24, drbd_helper_info,
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      ./include/linux/genl_magic_struct.h:220:10: note: expanded from macro
      'GENL_struct'
              switch (0) {
                      ^
      
      Silence this warning by adding a 'case 0:' statement. Additionally,
      adjust the alignment of the statements in the ct_assert_unique macro to
      avoid a checkpatch warning.
      
      This solution was originally sent by Arnd Bergmann with a default case
      statement: https://lore.kernel.org/patchwork/patch/756723/
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/43Suggested-by: NLars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: NNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      2b750730
    • S
      net/mlx5: EQ, Use the right place to store/read IRQ affinity hint · 64042b03
      Saeed Mahameed 提交于
      [ Upstream commit 1e86ace4c140fd5a693e266c9b23409358f25381 ]
      
      Currently the cpu affinity hint mask for completion EQs is stored and
      read from the wrong place, since reading and storing is done from the
      same index, there is no actual issue with that, but internal irq_info
      for completion EQs stars at MLX5_EQ_VEC_COMP_BASE offset in irq_info
      array, this patch changes the code to use the correct offset to store
      and read the IRQ affinity hint.
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
      Reviewed-by: NTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      64042b03
    • M
      gpiolib: Fix possible use after free on label · b11d82db
      Muchun Song 提交于
      [ Upstream commit 18534df419041e6c1f4b41af56ee7d41f757815c ]
      
      gpiod_request_commit() copies the pointer to the label passed as
      an argument only to be used later. But there's a chance the caller
      could immediately free the passed string(e.g., local variable).
      This could trigger a use after free when we use gpio label(e.g.,
      gpiochip_unlock_as_irq(), gpiochip_is_requested()).
      
      To be on the safe side: duplicate the string with kstrdup_const()
      so that if an unaware user passes an address to a stack-allocated
      buffer, we won't get the arbitrary label.
      
      Also fix gpiod_set_consumer_name().
      Signed-off-by: NMuchun Song <smuchun@gmail.com>
      Signed-off-by: NLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      b11d82db
    • V
      HID: debug: fix the ring buffer implementation · 6b0c9002
      Vladis Dronov 提交于
      mainline inclusion
      from mainline-5.0
      commit 13054abbaa4f1fd4e6f3b4b63439ec033b4c8035
      category: bugfix
      bugzilla: NA
      CVE: CVE-2019-3819
      
      -------------------------------------------------
      
      Ring buffer implementation in hid_debug_event() and hid_debug_events_read()
      is strange allowing lost or corrupted data. After commit 717adfda
      ("HID: debug: check length before copy_to_user()") it is possible to enter
      an infinite loop in hid_debug_events_read() by providing 0 as count, this
      locks up a system. Fix this by rewriting the ring buffer implementation
      with kfifo and simplify the code.
      
      This fixes CVE-2019-3819.
      
      v2: fix an execution logic and add a comment
      v3: use __set_current_state() instead of set_current_state()
      
      Link: https://bugzilla.redhat.com/show_bug.cgi?id=1669187
      Cc: stable@vger.kernel.org # v4.18+
      Fixes: cd667ce2 ("HID: use debugfs for events/reports dumping")
      Fixes: 717adfda ("HID: debug: check length before copy_to_user()")
      Signed-off-by: NVladis Dronov <vdronov@redhat.com>
      Reviewed-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NBenjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      Acked-by: Nzhangyi (F) <yi.zhang@huawei.com>
      6b0c9002
    • Q
      mm/memblock.c: skip kmemleak for kasan_init() · ad5aa6ec
      Qian Cai 提交于
      mainline inclusion
      from mainline-5.0-rc1
      commit fed84c78527009d4f799a3ed9a566502fa026d82
      category: bugfix
      bugzilla: 7440
      CVE: NA
      
      -------------------------------------------------
      
      Kmemleak does not play well with KASAN (tested on both HPE Apollo 70 and
      Huawei TaiShan 2280 aarch64 servers).
      
      After calling start_kernel()->setup_arch()->kasan_init(), kmemleak early
      log buffer went from something like 280 to 260000 which caused kmemleak
      disabled and crash dump memory reservation failed.  The multitude of
      kmemleak_alloc() calls is from nested loops while KASAN is setting up full
      memory mappings, so let early kmemleak allocations skip those
      memblock_alloc_internal() calls came from kasan_init() given that those
      early KASAN memory mappings should not reference to other memory.  Hence,
      no kmemleak false positives.
      
      kasan_init
        kasan_map_populate [1]
          kasan_pgd_populate [2]
            kasan_pud_populate [3]
              kasan_pmd_populate [4]
                kasan_pte_populate [5]
                  kasan_alloc_zeroed_page
                    memblock_alloc_try_nid
                      memblock_alloc_internal
                        kmemleak_alloc
      
      [1] for_each_memblock(memory, reg)
      [2] while (pgdp++, addr = next, addr != end)
      [3] while (pudp++, addr = next, addr != end && pud_none(READ_ONCE(*pudp)))
      [4] while (pmdp++, addr = next, addr != end && pmd_none(READ_ONCE(*pmdp)))
      [5] while (ptep++, addr = next, addr != end && pte_none(READ_ONCE(*ptep)))
      
      Link: http://lkml.kernel.org/r/1543442925-17794-1-git-send-email-cai@gmx.usSigned-off-by: NQian Cai <cai@gmx.us>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NZengruan Ye <yezengruan@huawei.com>
      Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
      [yyl: adjust context]
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      ad5aa6ec
    • F
      of: overlay: add tests to validate kfrees from overlay removal · 5969ad33
      Frank Rowand 提交于
      commit 144552c786925314c1e7cb8f91a71dae1aca8798 upstream.
      
      Add checks:
        - attempted kfree due to refcount reaching zero before overlay
          is removed
        - properties linked to an overlay node when the node is removed
        - node refcount > one during node removal in a changeset destroy,
          if the node was created by the changeset
      
      After applying this patch, several validation warnings will be
      reported from the devicetree unittest during boot due to
      pre-existing devicetree bugs. The warnings will be similar to:
      
        OF: ERROR: of_node_release(), unexpected properties in /testcase-data/overlay-node/test-bus/test-unittest11
        OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /testcase-data-2/substation@100/
        hvac-medium-2
      Tested-by: NAlan Tull <atull@kernel.org>
      Signed-off-by: NFrank Rowand <frank.rowand@sony.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      5969ad33
    • T
      oom, oom_reaper: do not enqueue same task twice · 0b13c78a
      Tetsuo Handa 提交于
      commit 9bcdeb51bd7d2ae9fe65ea4d60643d2aeef5bfe3 upstream.
      
      Arkadiusz reported that enabling memcg's group oom killing causes
      strange memcg statistics where there is no task in a memcg despite the
      number of tasks in that memcg is not 0.  It turned out that there is a
      bug in wake_oom_reaper() which allows enqueuing same task twice which
      makes impossible to decrease the number of tasks in that memcg due to a
      refcount leak.
      
      This bug existed since the OOM reaper became invokable from
      task_will_free_mem(current) path in out_of_memory() in Linux 4.7,
      
        T1@P1     |T2@P1     |T3@P1     |OOM reaper
        ----------+----------+----------+------------
                                         # Processing an OOM victim in a different memcg domain.
                              try_charge()
                                mem_cgroup_out_of_memory()
                                  mutex_lock(&oom_lock)
                   try_charge()
                     mem_cgroup_out_of_memory()
                       mutex_lock(&oom_lock)
        try_charge()
          mem_cgroup_out_of_memory()
            mutex_lock(&oom_lock)
                                  out_of_memory()
                                    oom_kill_process(P1)
                                      do_send_sig_info(SIGKILL, @P1)
                                      mark_oom_victim(T1@P1)
                                      wake_oom_reaper(T1@P1) # T1@P1 is enqueued.
                                  mutex_unlock(&oom_lock)
                       out_of_memory()
                         mark_oom_victim(T2@P1)
                         wake_oom_reaper(T2@P1) # T2@P1 is enqueued.
                       mutex_unlock(&oom_lock)
            out_of_memory()
              mark_oom_victim(T1@P1)
              wake_oom_reaper(T1@P1) # T1@P1 is enqueued again due to oom_reaper_list == T2@P1 && T1@P1->oom_reaper_list == NULL.
            mutex_unlock(&oom_lock)
                                         # Completed processing an OOM victim in a different memcg domain.
                                         spin_lock(&oom_reaper_lock)
                                         # T1P1 is dequeued.
                                         spin_unlock(&oom_reaper_lock)
      
      but memcg's group oom killing made it easier to trigger this bug by
      calling wake_oom_reaper() on the same task from one out_of_memory()
      request.
      
      Fix this bug using an approach used by commit 855b0183 ("oom,
      oom_reaper: disable oom_reaper for oom_kill_allocating_task").  As a
      side effect of this patch, this patch also avoids enqueuing multiple
      threads sharing memory via task_will_free_mem(current) path.
      
      Link: http://lkml.kernel.org/r/e865a044-2c10-9858-f4ef-254bc71d6cc2@i-love.sakura.ne.jp
      Link: http://lkml.kernel.org/r/5ee34fc6-1485-34f8-8790-903ddabaa809@i-love.sakura.ne.jp
      Fixes: af8e15cc ("oom, oom_reaper: do not enqueue task if it is on the oom_reaper_list head")
      Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: NArkadiusz Miskiewicz <arekm@maven.pl>
      Tested-by: NArkadiusz Miskiewicz <arekm@maven.pl>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NRoman Gushchin <guro@fb.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Aleksa Sarai <asarai@suse.de>
      Cc: Jay Kamat <jgkamat@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      0b13c78a
    • D
      ipvlan, l3mdev: fix broken l3s mode wrt local routes · 15ef6c67
      Daniel Borkmann 提交于
      [ Upstream commit d5256083f62e2720f75bb3c5a928a0afe47d6bc3 ]
      
      While implementing ipvlan l3 and l3s mode for kubernetes CNI plugin,
      I ran into the issue that while l3 mode is working fine, l3s mode
      does not have any connectivity to kube-apiserver and hence all pods
      end up in Error state as well. The ipvlan master device sits on
      top of a bond device and hostns traffic to kube-apiserver (also running
      in hostns) is DNATed from 10.152.183.1:443 to 139.178.29.207:37573
      where the latter is the address of the bond0. While in l3 mode, a
      curl to https://10.152.183.1:443 or to https://139.178.29.207:37573
      works fine from hostns, neither of them do in case of l3s. In the
      latter only a curl to https://127.0.0.1:37573 appeared to work where
      for local addresses of bond0 I saw kernel suddenly starting to emit
      ARP requests to query HW address of bond0 which remained unanswered
      and neighbor entries in INCOMPLETE state. These ARP requests only
      happen while in l3s.
      
      Debugging this further, I found the issue is that l3s mode is piggy-
      backing on l3 master device, and in this case local routes are using
      l3mdev_master_dev_rcu(dev) instead of net->loopback_dev as per commit
      f5a0aab8 ("net: ipv4: dst for local input routes should use l3mdev
      if relevant") and 5f02ce24 ("net: l3mdev: Allow the l3mdev to be
      a loopback"). I found that reverting them back into using the
      net->loopback_dev fixed ipvlan l3s connectivity and got everything
      working for the CNI.
      
      Now judging from 4fbae7d8 ("ipvlan: Introduce l3s mode") and the
      l3mdev paper in [0] the only sole reason why ipvlan l3s is relying
      on l3 master device is to get the l3mdev_ip_rcv() receive hook for
      setting the dst entry of the input route without adding its own
      ipvlan specific hacks into the receive path, however, any l3 domain
      semantics beyond just that are breaking l3s operation. Note that
      ipvlan also has the ability to dynamically switch its internal
      operation from l3 to l3s for all ports via ipvlan_set_port_mode()
      at runtime. In any case, l3 vs l3s soley distinguishes itself by
      'de-confusing' netfilter through switching skb->dev to ipvlan slave
      device late in NF_INET_LOCAL_IN before handing the skb to L4.
      
      Minimal fix taken here is to add a IFF_L3MDEV_RX_HANDLER flag which,
      if set from ipvlan setup, gets us only the wanted l3mdev_l3_rcv() hook
      without any additional l3mdev semantics on top. This should also have
      minimal impact since dev->priv_flags is already hot in cache. With
      this set, l3s mode is working fine and I also get things like
      masquerading pod traffic on the ipvlan master properly working.
      
        [0] https://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf
      
      Fixes: f5a0aab8 ("net: ipv4: dst for local input routes should use l3mdev if relevant")
      Fixes: 5f02ce24 ("net: l3mdev: Allow the l3mdev to be a loopback")
      Fixes: 4fbae7d8 ("ipvlan: Introduce l3s mode")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Cc: Mahesh Bandewar <maheshb@google.com>
      Cc: David Ahern <dsa@cumulusnetworks.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Martynas Pumputis <m@lambda.lt>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      15ef6c67
    • D
      Input: input_event - fix the CONFIG_SPARC64 mixup · aec9091e
      Deepa Dinamani 提交于
      commit 141e5dcaa7356077028b4cd48ec351a38c70e5e5 upstream.
      
      Arnd Bergmann pointed out that CONFIG_* cannot be used in a uapi header.
      Override with an equivalent conditional.
      
      Fixes: 2e746942ebac ("Input: input_event - provide override for sparc64")
      Fixes: 152194fe ("Input: extend usable life of event timestamps to 2106 on 32 bit systems")
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      aec9091e
    • D
      bpf: fix sanitation of alu op with pointer / scalar type from different paths · 52cf1114
      Daniel Borkmann 提交于
      [ commit d3bd7413e0ca40b60cf60d4003246d067cafdeda upstream ]
      
      While 979d63d50c0c ("bpf: prevent out of bounds speculation on pointer
      arithmetic") took care of rejecting alu op on pointer when e.g. pointer
      came from two different map values with different map properties such as
      value size, Jann reported that a case was not covered yet when a given
      alu op is used in both "ptr_reg += reg" and "numeric_reg += reg" from
      different branches where we would incorrectly try to sanitize based
      on the pointer's limit. Catch this corner case and reject the program
      instead.
      
      Fixes: 979d63d50c0c ("bpf: prevent out of bounds speculation on pointer arithmetic")
      Reported-by: NJann Horn <jannh@google.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      52cf1114
    • D
      bpf: prevent out of bounds speculation on pointer arithmetic · cedaa820
      Daniel Borkmann 提交于
      [ commit 979d63d50c0c0f7bc537bf821e056cc9fe5abd38 upstream ]
      
      Jann reported that the original commit back in b2157399
      ("bpf: prevent out-of-bounds speculation") was not sufficient
      to stop CPU from speculating out of bounds memory access:
      While b2157399 only focussed on masking array map access
      for unprivileged users for tail calls and data access such
      that the user provided index gets sanitized from BPF program
      and syscall side, there is still a more generic form affected
      from BPF programs that applies to most maps that hold user
      data in relation to dynamic map access when dealing with
      unknown scalars or "slow" known scalars as access offset, for
      example:
      
        - Load a map value pointer into R6
        - Load an index into R7
        - Do a slow computation (e.g. with a memory dependency) that
          loads a limit into R8 (e.g. load the limit from a map for
          high latency, then mask it to make the verifier happy)
        - Exit if R7 >= R8 (mispredicted branch)
        - Load R0 = R6[R7]
        - Load R0 = R6[R0]
      
      For unknown scalars there are two options in the BPF verifier
      where we could derive knowledge from in order to guarantee
      safe access to the memory: i) While </>/<=/>= variants won't
      allow to derive any lower or upper bounds from the unknown
      scalar where it would be safe to add it to the map value
      pointer, it is possible through ==/!= test however. ii) another
      option is to transform the unknown scalar into a known scalar,
      for example, through ALU ops combination such as R &= <imm>
      followed by R |= <imm> or any similar combination where the
      original information from the unknown scalar would be destroyed
      entirely leaving R with a constant. The initial slow load still
      precedes the latter ALU ops on that register, so the CPU
      executes speculatively from that point. Once we have the known
      scalar, any compare operation would work then. A third option
      only involving registers with known scalars could be crafted
      as described in [0] where a CPU port (e.g. Slow Int unit)
      would be filled with many dependent computations such that
      the subsequent condition depending on its outcome has to wait
      for evaluation on its execution port and thereby executing
      speculatively if the speculated code can be scheduled on a
      different execution port, or any other form of mistraining
      as described in [1], for example. Given this is not limited
      to only unknown scalars, not only map but also stack access
      is affected since both is accessible for unprivileged users
      and could potentially be used for out of bounds access under
      speculation.
      
      In order to prevent any of these cases, the verifier is now
      sanitizing pointer arithmetic on the offset such that any
      out of bounds speculation would be masked in a way where the
      pointer arithmetic result in the destination register will
      stay unchanged, meaning offset masked into zero similar as
      in array_index_nospec() case. With regards to implementation,
      there are three options that were considered: i) new insn
      for sanitation, ii) push/pop insn and sanitation as inlined
      BPF, iii) reuse of ax register and sanitation as inlined BPF.
      
      Option i) has the downside that we end up using from reserved
      bits in the opcode space, but also that we would require
      each JIT to emit masking as native arch opcodes meaning
      mitigation would have slow adoption till everyone implements
      it eventually which is counter-productive. Option ii) and iii)
      have both in common that a temporary register is needed in
      order to implement the sanitation as inlined BPF since we
      are not allowed to modify the source register. While a push /
      pop insn in ii) would be useful to have in any case, it
      requires once again that every JIT needs to implement it
      first. While possible, amount of changes needed would also
      be unsuitable for a -stable patch. Therefore, the path which
      has fewer changes, less BPF instructions for the mitigation
      and does not require anything to be changed in the JITs is
      option iii) which this work is pursuing. The ax register is
      already mapped to a register in all JITs (modulo arm32 where
      it's mapped to stack as various other BPF registers there)
      and used in constant blinding for JITs-only so far. It can
      be reused for verifier rewrites under certain constraints.
      The interpreter's tmp "register" has therefore been remapped
      into extending the register set with hidden ax register and
      reusing that for a number of instructions that needed the
      prior temporary variable internally (e.g. div, mod). This
      allows for zero increase in stack space usage in the interpreter,
      and enables (restricted) generic use in rewrites otherwise as
      long as such a patchlet does not make use of these instructions.
      The sanitation mask is dynamic and relative to the offset the
      map value or stack pointer currently holds.
      
      There are various cases that need to be taken under consideration
      for the masking, e.g. such operation could look as follows:
      ptr += val or val += ptr or ptr -= val. Thus, the value to be
      sanitized could reside either in source or in destination
      register, and the limit is different depending on whether
      the ALU op is addition or subtraction and depending on the
      current known and bounded offset. The limit is derived as
      follows: limit := max_value_size - (smin_value + off). For
      subtraction: limit := umax_value + off. This holds because
      we do not allow any pointer arithmetic that would
      temporarily go out of bounds or would have an unknown
      value with mixed signed bounds where it is unclear at
      verification time whether the actual runtime value would
      be either negative or positive. For example, we have a
      derived map pointer value with constant offset and bounded
      one, so limit based on smin_value works because the verifier
      requires that statically analyzed arithmetic on the pointer
      must be in bounds, and thus it checks if resulting
      smin_value + off and umax_value + off is still within map
      value bounds at time of arithmetic in addition to time of
      access. Similarly, for the case of stack access we derive
      the limit as follows: MAX_BPF_STACK + off for subtraction
      and -off for the case of addition where off := ptr_reg->off +
      ptr_reg->var_off.value. Subtraction is a special case for
      the masking which can be in form of ptr += -val, ptr -= -val,
      or ptr -= val. In the first two cases where we know that
      the value is negative, we need to temporarily negate the
      value in order to do the sanitation on a positive value
      where we later swap the ALU op, and restore original source
      register if the value was in source.
      
      The sanitation of pointer arithmetic alone is still not fully
      sufficient as is, since a scenario like the following could
      happen ...
      
        PTR += 0x1000 (e.g. K-based imm)
        PTR -= BIG_NUMBER_WITH_SLOW_COMPARISON
        PTR += 0x1000
        PTR -= BIG_NUMBER_WITH_SLOW_COMPARISON
        [...]
      
      ... which under speculation could end up as ...
      
        PTR += 0x1000
        PTR -= 0 [ truncated by mitigation ]
        PTR += 0x1000
        PTR -= 0 [ truncated by mitigation ]
        [...]
      
      ... and therefore still access out of bounds. To prevent such
      case, the verifier is also analyzing safety for potential out
      of bounds access under speculative execution. Meaning, it is
      also simulating pointer access under truncation. We therefore
      "branch off" and push the current verification state after the
      ALU operation with known 0 to the verification stack for later
      analysis. Given the current path analysis succeeded it is
      likely that the one under speculation can be pruned. In any
      case, it is also subject to existing complexity limits and
      therefore anything beyond this point will be rejected. In
      terms of pruning, it needs to be ensured that the verification
      state from speculative execution simulation must never prune
      a non-speculative execution path, therefore, we mark verifier
      state accordingly at the time of push_stack(). If verifier
      detects out of bounds access under speculative execution from
      one of the possible paths that includes a truncation, it will
      reject such program.
      
      Given we mask every reg-based pointer arithmetic for
      unprivileged programs, we've been looking into how it could
      affect real-world programs in terms of size increase. As the
      majority of programs are targeted for privileged-only use
      case, we've unconditionally enabled masking (with its alu
      restrictions on top of it) for privileged programs for the
      sake of testing in order to check i) whether they get rejected
      in its current form, and ii) by how much the number of
      instructions and size will increase. We've tested this by
      using Katran, Cilium and test_l4lb from the kernel selftests.
      For Katran we've evaluated balancer_kern.o, Cilium bpf_lxc.o
      and an older test object bpf_lxc_opt_-DUNKNOWN.o and l4lb
      we've used test_l4lb.o as well as test_l4lb_noinline.o. We
      found that none of the programs got rejected by the verifier
      with this change, and that impact is rather minimal to none.
      balancer_kern.o had 13,904 bytes (1,738 insns) xlated and
      7,797 bytes JITed before and after the change. Most complex
      program in bpf_lxc.o had 30,544 bytes (3,817 insns) xlated
      and 18,538 bytes JITed before and after and none of the other
      tail call programs in bpf_lxc.o had any changes either. For
      the older bpf_lxc_opt_-DUNKNOWN.o object we found a small
      increase from 20,616 bytes (2,576 insns) and 12,536 bytes JITed
      before to 20,664 bytes (2,582 insns) and 12,558 bytes JITed
      after the change. Other programs from that object file had
      similar small increase. Both test_l4lb.o had no change and
      remained at 6,544 bytes (817 insns) xlated and 3,401 bytes
      JITed and for test_l4lb_noinline.o constant at 5,080 bytes
      (634 insns) xlated and 3,313 bytes JITed. This can be explained
      in that LLVM typically optimizes stack based pointer arithmetic
      by using K-based operations and that use of dynamic map access
      is not overly frequent. However, in future we may decide to
      optimize the algorithm further under known guarantees from
      branch and value speculation. Latter seems also unclear in
      terms of prediction heuristics that today's CPUs apply as well
      as whether there could be collisions in e.g. the predictor's
      Value History/Pattern Table for triggering out of bounds access,
      thus masking is performed unconditionally at this point but could
      be subject to relaxation later on. We were generally also
      brainstorming various other approaches for mitigation, but the
      blocker was always lack of available registers at runtime and/or
      overhead for runtime tracking of limits belonging to a specific
      pointer. Thus, we found this to be minimally intrusive under
      given constraints.
      
      With that in place, a simple example with sanitized access on
      unprivileged load at post-verification time looks as follows:
      
        # bpftool prog dump xlated id 282
        [...]
        28: (79) r1 = *(u64 *)(r7 +0)
        29: (79) r2 = *(u64 *)(r7 +8)
        30: (57) r1 &= 15
        31: (79) r3 = *(u64 *)(r0 +4608)
        32: (57) r3 &= 1
        33: (47) r3 |= 1
        34: (2d) if r2 > r3 goto pc+19
        35: (b4) (u32) r11 = (u32) 20479  |
        36: (1f) r11 -= r2                | Dynamic sanitation for pointer
        37: (4f) r11 |= r2                | arithmetic with registers
        38: (87) r11 = -r11               | containing bounded or known
        39: (c7) r11 s>>= 63              | scalars in order to prevent
        40: (5f) r11 &= r2                | out of bounds speculation.
        41: (0f) r4 += r11                |
        42: (71) r4 = *(u8 *)(r4 +0)
        43: (6f) r4 <<= r1
        [...]
      
      For the case where the scalar sits in the destination register
      as opposed to the source register, the following code is emitted
      for the above example:
      
        [...]
        16: (b4) (u32) r11 = (u32) 20479
        17: (1f) r11 -= r2
        18: (4f) r11 |= r2
        19: (87) r11 = -r11
        20: (c7) r11 s>>= 63
        21: (5f) r2 &= r11
        22: (0f) r2 += r0
        23: (61) r0 = *(u32 *)(r2 +0)
        [...]
      
      JIT blinding example with non-conflicting use of r10:
      
        [...]
         d5:	je     0x0000000000000106    _
         d7:	mov    0x0(%rax),%edi       |
         da:	mov    $0xf153246,%r10d     | Index load from map value and
         e0:	xor    $0xf153259,%r10      | (const blinded) mask with 0x1f.
         e7:	and    %r10,%rdi            |_
         ea:	mov    $0x2f,%r10d          |
         f0:	sub    %rdi,%r10            | Sanitized addition. Both use r10
         f3:	or     %rdi,%r10            | but do not interfere with each
         f6:	neg    %r10                 | other. (Neither do these instructions
         f9:	sar    $0x3f,%r10           | interfere with the use of ax as temp
         fd:	and    %r10,%rdi            | in interpreter.)
        100:	add    %rax,%rdi            |_
        103:	mov    0x0(%rdi),%eax
       [...]
      
      Tested that it fixes Jann's reproducer, and also checked that test_verifier
      and test_progs suite with interpreter, JIT and JIT with hardening enabled
      on x86-64 and arm64 runs successfully.
      
        [0] Speculose: Analyzing the Security Implications of Speculative
            Execution in CPUs, Giorgi Maisuradze and Christian Rossow,
            https://arxiv.org/pdf/1801.04084.pdf
      
        [1] A Systematic Evaluation of Transient Execution Attacks and
            Defenses, Claudio Canella, Jo Van Bulck, Michael Schwarz,
            Moritz Lipp, Benjamin von Berg, Philipp Ortner, Frank Piessens,
            Dmitry Evtyushkin, Daniel Gruss,
            https://arxiv.org/pdf/1811.05441.pdf
      
      Fixes: b2157399 ("bpf: prevent out-of-bounds speculation")
      Reported-by: NJann Horn <jannh@google.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      cedaa820
    • D
      bpf: enable access to ax register also from verifier rewrite · 2facace8
      Daniel Borkmann 提交于
      [ commit 9b73bfdd08e73231d6a90ae6db4b46b3fbf56c30 upstream ]
      
      Right now we are using BPF ax register in JIT for constant blinding as
      well as in interpreter as temporary variable. Verifier will not be able
      to use it simply because its use will get overridden from the former in
      bpf_jit_blind_insn(). However, it can be made to work in that blinding
      will be skipped if there is prior use in either source or destination
      register on the instruction. Taking constraints of ax into account, the
      verifier is then open to use it in rewrites under some constraints. Note,
      ax register already has mappings in every eBPF JIT.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      2facace8
    • D
      bpf: move tmp variable into ax register in interpreter · 648f6859
      Daniel Borkmann 提交于
      [ commit 144cd91c4c2bced6eb8a7e25e590f6618a11e854 upstream ]
      
      This change moves the on-stack 64 bit tmp variable in ___bpf_prog_run()
      into the hidden ax register. The latter is currently only used in JITs
      for constant blinding as a temporary scratch register, meaning the BPF
      interpreter will never see the use of ax. Therefore it is safe to use
      it for the cases where tmp has been used earlier. This is needed to later
      on allow restricted hidden use of ax in both interpreter and JITs.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      648f6859
    • D
      bpf: move {prev_,}insn_idx into verifier env · f38a7d20
      Daniel Borkmann 提交于
      [ commit c08435ec7f2bc8f4109401f696fd55159b4b40cb upstream ]
      
      Move prev_insn_idx and insn_idx from the do_check() function into
      the verifier environment, so they can be read inside the various
      helper functions for handling the instructions. It's easier to put
      this into the environment rather than changing all call-sites only
      to pass it along. insn_idx is useful in particular since this later
      on allows to hold state in env->insn_aux_data[env->insn_idx].
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      f38a7d20
    • D
      Input: input_event - provide override for sparc64 · 716056f3
      Deepa Dinamani 提交于
      commit 2e746942ebacf1565caa72cf980745e5ce297c48 upstream.
      
      The usec part of the timeval is defined as
      __kernel_suseconds_t	tv_usec; /* microseconds */
      
      Arnd noticed that sparc64 is the only architecture that defines
      __kernel_suseconds_t as int rather than long.
      
      This breaks the current y2038 fix for kernel as we only access and define
      the timeval struct for non-kernel use cases.  But, this was hidden by an
      another typo in the use of __KERNEL__ qualifier.
      
      Fix the typo, and provide an override for sparc64.
      
      Fixes: 152194fe ("Input: extend usable life of event timestamps to 2106 on 32 bit systems")
      Reported-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      716056f3
    • D
      Drivers: hv: vmbus: Check for ring when getting debug info · 028e002b
      Dexuan Cui 提交于
      commit ba50bf1ce9a51fc97db58b96d01306aa70bc3979 upstream.
      
      fc96df16a1ce is good and can already fix the "return stack garbage" issue,
      but let's also improve hv_ringbuffer_get_debuginfo(), which would silently
      return stack garbage, if people forget to check channel->state or
      ring_info->ring_buffer, when using the function in the future.
      
      Having an error check in the function would eliminate the potential risk.
      
      Add a Fixes tag to indicate the patch depdendency.
      
      Fixes: fc96df16a1ce ("Drivers: hv: vmbus: Return -EINVAL for the sys files for unopened channels")
      Cc: stable@vger.kernel.org
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: NStephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: NDexuan Cui <decui@microsoft.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      028e002b
    • I
      net: ipv4: Fix memory leak in network namespace dismantle · 31091745
      Ido Schimmel 提交于
      [ Upstream commit f97f4dd8b3bb9d0993d2491e0f22024c68109184 ]
      
      IPv4 routing tables are flushed in two cases:
      
      1. In response to events in the netdev and inetaddr notification chains
      2. When a network namespace is being dismantled
      
      In both cases only routes associated with a dead nexthop group are
      flushed. However, a nexthop group will only be marked as dead in case it
      is populated with actual nexthops using a nexthop device. This is not
      the case when the route in question is an error route (e.g.,
      'blackhole', 'unreachable').
      
      Therefore, when a network namespace is being dismantled such routes are
      not flushed and leaked [1].
      
      To reproduce:
      # ip netns add blue
      # ip -n blue route add unreachable 192.0.2.0/24
      # ip netns del blue
      
      Fix this by not skipping error routes that are not marked with
      RTNH_F_DEAD when flushing the routing tables.
      
      To prevent the flushing of such routes in case #1, add a parameter to
      fib_table_flush() that indicates if the table is flushed as part of
      namespace dismantle or not.
      
      Note that this problem does not exist in IPv6 since error routes are
      associated with the loopback device.
      
      [1]
      unreferenced object 0xffff888066650338 (size 56):
        comm "ip", pid 1206, jiffies 4294786063 (age 26.235s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 b0 1c 62 61 80 88 ff ff  ..........ba....
          e8 8b a1 64 80 88 ff ff 00 07 00 08 fe 00 00 00  ...d............
        backtrace:
          [<00000000856ed27d>] inet_rtm_newroute+0x129/0x220
          [<00000000fcdfc00a>] rtnetlink_rcv_msg+0x397/0xa20
          [<00000000cb85801a>] netlink_rcv_skb+0x132/0x380
          [<00000000ebc991d2>] netlink_unicast+0x4c0/0x690
          [<0000000014f62875>] netlink_sendmsg+0x929/0xe10
          [<00000000bac9d967>] sock_sendmsg+0xc8/0x110
          [<00000000223e6485>] ___sys_sendmsg+0x77a/0x8f0
          [<000000002e94f880>] __sys_sendmsg+0xf7/0x250
          [<00000000ccb1fa72>] do_syscall_64+0x14d/0x610
          [<00000000ffbe3dae>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
          [<000000003a8b605b>] 0xffffffffffffffff
      unreferenced object 0xffff888061621c88 (size 48):
        comm "ip", pid 1206, jiffies 4294786063 (age 26.235s)
        hex dump (first 32 bytes):
          6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
          6b 6b 6b 6b 6b 6b 6b 6b d8 8e 26 5f 80 88 ff ff  kkkkkkkk..&_....
        backtrace:
          [<00000000733609e3>] fib_table_insert+0x978/0x1500
          [<00000000856ed27d>] inet_rtm_newroute+0x129/0x220
          [<00000000fcdfc00a>] rtnetlink_rcv_msg+0x397/0xa20
          [<00000000cb85801a>] netlink_rcv_skb+0x132/0x380
          [<00000000ebc991d2>] netlink_unicast+0x4c0/0x690
          [<0000000014f62875>] netlink_sendmsg+0x929/0xe10
          [<00000000bac9d967>] sock_sendmsg+0xc8/0x110
          [<00000000223e6485>] ___sys_sendmsg+0x77a/0x8f0
          [<000000002e94f880>] __sys_sendmsg+0xf7/0x250
          [<00000000ccb1fa72>] do_syscall_64+0x14d/0x610
          [<00000000ffbe3dae>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
          [<000000003a8b605b>] 0xffffffffffffffff
      
      Fixes: 8cced9ef ("[NETNS]: Enable routing configuration in non-initial namespace.")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      31091745
    • R
      net: Fix usage of pskb_trim_rcsum · ff3f0d43
      Ross Lagerwall 提交于
      [ Upstream commit 6c57f0458022298e4da1729c67bd33ce41c14e7a ]
      
      In certain cases, pskb_trim_rcsum() may change skb pointers.
      Reinitialize header pointers afterwards to avoid potential
      use-after-frees. Add a note in the documentation of
      pskb_trim_rcsum(). Found by KASAN.
      Signed-off-by: NRoss Lagerwall <ross.lagerwall@citrix.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      ff3f0d43
    • X
      sched.h: depend on resctl & intel_rdt · 21d8578b
      Xie XiuQi 提交于
      hulk inclusion
      category: feature
      bugzilla: 5510
      CVE: NA
      Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com>
      Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      21d8578b
    • Y
      resctrlfs: mpam: init struct for mpam · 1abcabe9
      Yang Yingliang 提交于
      hulk inclusion
      category: feature
      bugzilla: 5510
      CVE: NA
      Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com>
      Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      1abcabe9
    • X
      resctrlfs: init support resctrlfs · e1b8071e
      Xie XiuQi 提交于
      hulk inclusion
      category: feature
      bugzilla: 5510
      CVE: NA
      Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com>
      Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      e1b8071e
    • X
      cpufreq / cppc: Work around for Hisilicon CPPC cpufreq · 5f085d45
      Xiongfeng Wang 提交于
      euler inclusion
      category: feature
      Bugzilla: 5520
      CVE: N/A
      
      ------------------------------------------------------------------------
      
      Hisilicon chips do not support delivered performance counter register
      and reference performance counter register. But the platform can
      calculate the real performance using its own method. This patch provide
      a workaround for this problem, and other platforms can also use this
      workaround framework. We reuse the desired performance register to
      store the real performance calculated by the platform. After the
      platform finished the frequency adjust, it gets the real performance and
      writes it into desired performance register. OS can use it to calculate
      the real frequency.
      Signed-off-by: NXiongfeng Wang <wangxiongfeng2@huawei.com>
      Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      5f085d45
    • X
      firmware: arm_sdei: make 'sdei_api_event_disable/enable' public · ac271e1a
      Xiongfeng Wang 提交于
      euler inclusion
      category: feature
      Bugzilla: 5515
      CVE: N/A
      
      ----------------------------------------
      
      NMI Watchdog need to enable the event for each core individually. But the
      existing public api 'sdei_event_enable' enable events for all cores when
      the event type is private.
      Signed-off-by: NXiongfeng Wang <wangxiongfeng2@huawei.com>
      Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      ac271e1a
    • X
      firmware: arm_sdei: add interrupt binding api · e5c4311e
      Xiongfeng Wang 提交于
      euler inclusion
      category: feature
      Bugzilla: 5515
      CVE: N/A
      
      ----------------------------------------
      
      This patch add a interrupt binding api function which returns the binded
      event number.
      Signed-off-by: NXiongfeng Wang <wangxiongfeng2@huawei.com>
      Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      e5c4311e
    • X
      watchdog: make hardlockup detect code public · b8163908
      Xiongfeng Wang 提交于
      euler inclusion
      category: feature
      Bugzilla: 5515
      CVE: N/A
      
      ----------------------------------------
      
      In current code, the hardlockup detect code is contained by
      CONFIG_HARDLOCKUP_DETECTOR_PERF. This patch makes this code public so
      that other arch hardlockup detector can use it.
      Signed-off-by: NXiongfeng Wang <wangxiongfeng2@huawei.com>
      Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      b8163908
    • A
      mm/swap: use nr_node_ids for avail_lists in swap_info_struct · 08c9196f
      Aaron Lu 提交于
      [ Upstream commit 66f71da9dd38af17dc17209cdde7987d4679a699 ]
      
      Since a2468cc9 ("swap: choose swap device according to numa node"),
      avail_lists field of swap_info_struct is changed to an array with
      MAX_NUMNODES elements.  This made swap_info_struct size increased to 40KiB
      and needs an order-4 page to hold it.
      
      This is not optimal in that:
      1 Most systems have way less than MAX_NUMNODES(1024) nodes so it
        is a waste of memory;
      2 It could cause swapon failure if the swap device is swapped on
        after system has been running for a while, due to no order-4
        page is available as pointed out by Vasily Averin.
      
      Solve the above two issues by using nr_node_ids(which is the actual
      possible node number the running system has) for avail_lists instead of
      MAX_NUMNODES.
      
      nr_node_ids is unknown at compile time so can't be directly used when
      declaring this array.  What I did here is to declare avail_lists as zero
      element array and allocate space for it when allocating space for
      swap_info_struct.  The reason why keep using array but not pointer is
      plist_for_each_entry needs the field to be part of the struct, so pointer
      will not work.
      
      This patch is on top of Vasily Averin's fix commit.  I think the use of
      kvzalloc for swap_info_struct is still needed in case nr_node_ids is
      really big on some systems.
      
      Link: http://lkml.kernel.org/r/20181115083847.GA11129@intel.comSigned-off-by: NAaron Lu <aaron.lu@intel.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Vasily Averin <vvs@virtuozzo.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      08c9196f