1. 06 Apr, 2019 40 commits
    • S
      scsi: fcoe: make use of fip_mode enum complete · 1ef1b20f
      Sedat Dilek committed
      [ Upstream commit 8beb90aaf334a6efa3e924339926b5f93a234dbb ]
      
      commit 1917d42d ("fcoe: use enum for fip_mode") introduces a separate
      enum for the fip_mode that shall be used during initialisation handling
      until it is passed to fcoe_ctlr_link_up to set the initial fip_state.  That
      change was incomplete and gcc quietly converted in various places between
      the fip_mode and the fip_state enum values with implicit enum conversions,
      which fortunately cannot cause any issues in the actual code's execution.
      
      clang however warns about these implicit enum conversions in the scsi
      drivers. This commit consolidates the use of the two enums, guided by
      clang's enum-conversion warnings.
      
      This commit now completes the use of the fip_mode: It expects and uses
      fip_mode in {bnx2fc,fcoe}_interface_create and fcoe_ctlr_init, and it calls
      fcoe_ctlr_set_state() with the correct values in fcoe_ctlr_link_up().  It
      also breaks the association between FIP_MODE_AUTO and FIP_ST_AUTO to
      indicate these two enums are distinct.
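
The separation described above can be sketched as follows. This is a minimal user-space illustration, not the driver code: the enum names, their members, and the helper initial_state_for() are hypothetical stand-ins, and the mapping shown is an assumption made for the example.

```c
#include <assert.h>

/* Hypothetical stand-in for the configuration-time mode enum. */
enum fip_mode_sketch {
	MODE_AUTO,
	MODE_NON_FIP,
	MODE_FABRIC,
	MODE_VN2VN,
};

/* Hypothetical stand-in for the run-time state enum. */
enum fip_state_sketch {
	ST_DISABLED,
	ST_LINK_WAIT,
	ST_AUTO,
	ST_NON_FIP,
	ST_ENABLED,
	ST_VNMP_START,
};

/*
 * Explicit mapping from the configured mode to the initial state.
 * No value of one enum is ever assigned to a variable of the other,
 * so the compiler has no implicit enum conversion to warn about.
 */
static enum fip_state_sketch initial_state_for(enum fip_mode_sketch mode)
{
	switch (mode) {
	case MODE_AUTO:
		return ST_AUTO;
	case MODE_NON_FIP:
		return ST_NON_FIP;
	case MODE_FABRIC:
		return ST_ENABLED;
	case MODE_VN2VN:
		return ST_VNMP_START;
	}
	assert(!"unknown mode");
	return ST_DISABLED;
}
```

Because the numeric values of the two enums are unrelated in this sketch (MODE_AUTO is 0, ST_AUTO is 2), any code that silently relied on them being equal would break visibly, which is the point of keeping the types distinct.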
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/151
      Fixes: 1917d42d ("fcoe: use enum for fip_mode")
      Reported-by: Dmitry Golovin <dima@golovin.in>
      Original-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      CC: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      CC: Nick Desaulniers <ndesaulniers@google.com>
      CC: Nathan Chancellor <natechancellor@gmail.com>
      Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
      Tested-by: Nathan Chancellor <natechancellor@gmail.com>
      Suggested-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • J
      scsi: megaraid_sas: return error when create DMA pool failed · 8032fc91
      Jason Yan committed
      [ Upstream commit bcf3b67d16a4c8ffae0aa79de5853435e683945c ]
      
      When creating the DMA pool for cmd frames fails, we should return -ENOMEM
      instead of 0.
      Otherwise, in some cases in:
      
          megasas_init_adapter_fusion()
      
          -->megasas_alloc_cmds()
             -->megasas_create_frame_pool
                create DMA pool failed,
              --> megasas_free_cmds() [1]
      
          -->megasas_alloc_cmds_fusion()
             failed, then goto fail_alloc_cmds.
          -->megasas_free_cmds() [2]
      
      we will call megasas_free_cmds twice: [1] will kfree cmd_list and
      [2] will then use cmd_list. This causes the following problem:
      
      Unable to handle kernel NULL pointer dereference at virtual address
      00000000
      pgd = ffffffc000f70000
      [00000000] *pgd=0000001fbf893003, *pud=0000001fbf893003,
      *pmd=0000001fbf894003, *pte=006000006d000707
      Internal error: Oops: 96000005 [#1] SMP
       Modules linked in:
       CPU: 18 PID: 1 Comm: swapper/0 Not tainted
       task: ffffffdfb9290000 ti: ffffffdfb923c000 task.ti: ffffffdfb923c000
       PC is at megasas_free_cmds+0x30/0x70
       LR is at megasas_free_cmds+0x24/0x70
       ...
       Call trace:
       [<ffffffc0005b779c>] megasas_free_cmds+0x30/0x70
       [<ffffffc0005bca74>] megasas_init_adapter_fusion+0x2f4/0x4d8
       [<ffffffc0005b926c>] megasas_init_fw+0x2dc/0x760
       [<ffffffc0005b9ab0>] megasas_probe_one+0x3c0/0xcd8
       [<ffffffc0004a5abc>] local_pci_probe+0x4c/0xb4
       [<ffffffc0004a5c40>] pci_device_probe+0x11c/0x14c
       [<ffffffc00053a5e4>] driver_probe_device+0x1ec/0x430
       [<ffffffc00053a92c>] __driver_attach+0xa8/0xb0
       [<ffffffc000538178>] bus_for_each_dev+0x74/0xc8
       [<ffffffc000539e88>] driver_attach+0x28/0x34
       [<ffffffc000539a18>] bus_add_driver+0x16c/0x248
       [<ffffffc00053b234>] driver_register+0x6c/0x138
       [<ffffffc0004a5350>] __pci_register_driver+0x5c/0x6c
       [<ffffffc000ce3868>] megasas_init+0xc0/0x1a8
       [<ffffffc000082a58>] do_one_initcall+0xe8/0x1ec
       [<ffffffc000ca7be8>] kernel_init_freeable+0x1c8/0x284
       [<ffffffc0008d90b8>] kernel_init+0x1c/0xe4
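
The call chain above can be mimicked in user space to show why propagating the error matters. This is only a sketch under stated assumptions: the struct and all function names below are hypothetical, and the pool failure is simulated with a flag rather than a real DMA pool.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the adapter instance. */
struct adapter_sketch {
	int *cmd_list;   /* stands in for the driver's command list */
	int free_calls;  /* counts cleanup invocations for the demo */
};

/* Cleanup helper: releases the command list exactly once per call. */
static void free_cmds(struct adapter_sketch *a)
{
	a->free_calls++;
	free(a->cmd_list);
	a->cmd_list = NULL;
}

/* Simulated pool creation; fails when pool_ok is zero. */
static int create_frame_pool(int pool_ok)
{
	return pool_ok ? 0 : -ENOMEM;
}

static int alloc_cmds(struct adapter_sketch *a, int pool_ok)
{
	int ret;

	a->cmd_list = calloc(4, sizeof(*a->cmd_list));
	if (!a->cmd_list)
		return -ENOMEM;

	ret = create_frame_pool(pool_ok);
	if (ret) {
		free_cmds(a);
		return ret;  /* propagate the failure instead of returning 0 */
	}
	return 0;
}

static int init_adapter(struct adapter_sketch *a, int pool_ok)
{
	int ret = alloc_cmds(a, pool_ok);

	if (ret)
		return ret;  /* alloc_cmds() already cleaned up after itself */
	return 0;
}
```

If alloc_cmds() returned 0 on pool failure, the caller would carry on with a freed command list and a later error path would run the cleanup a second time on it, which matches the crash in the trace.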
      Signed-off-by: Jason Yan <yanaijie@huawei.com>
      Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • S
      s390/ism: ignore some errors during deregistration · 2c6e3ec8
      Sebastian Ott committed
      [ Upstream commit 0ff06c44efeede4acd068847d3bf8cf894b6c664 ]
      
      Prior to dma unmap/free operations the ism driver tries to ensure
      that the memory is no longer accessed by the HW. When errors
      during deregistration of memory regions from the HW occur the ism
      driver will not unmap/free this memory.
      
      When we receive notification from the hypervisor that a PCI function
      has been detached we can no longer access the device and would never
      unmap/free these memory regions which led to complaints by the DMA
      debug API.
      
      Treat these kinds of errors during the deregistration of memory regions
      from the HW as success since it is already ensured that the memory
      is no longer accessed by HW.
      Reported-by: Karsten Graul <kgraul@linux.ibm.com>
      Reported-by: Hans Wippel <hwippel@linux.ibm.com>
      Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • R
      efi: cper: Fix possible out-of-bounds access · d60f458e
      Ross Lagerwall committed
      [ Upstream commit 45b14a4ffcc1e0b5caa246638f942cbe7eaea7ad ]
      
      When checking a generic status block, we iterate over all the generic
      data blocks. The loop condition only checks that the start of the
      generic data block is valid (within estatus->data_length) but not the
      whole block. Because the size of data blocks (excluding error data) may
      vary depending on the revision and the revision is contained within the
      data block, ensure that enough of the current data block is valid before
      dereferencing any members otherwise an out-of-bounds access may occur if
      estatus->data_length is invalid.
      
      This relies on the fact that struct acpi_hest_generic_data_v300 is a
      superset of the earlier version.  Also rework the other checks to avoid
      potential underflow.
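
The checking order described above can be illustrated with a small user-space sketch. The record layouts rec_v2 and rec_v3 and the function check_records() are hypothetical and much simpler than the ACPI structures; the only property they share with the text is that the newer header is a superset of the older one and that the revision lives inside the record.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical older record header. */
struct rec_v2 {
	uint16_t revision;
	uint32_t payload_len;
};

/* Hypothetical newer header: same leading fields plus one more. */
struct rec_v3 {
	uint16_t revision;
	uint32_t payload_len;
	uint64_t timestamp;
};

/* Returns 0 if every record lies fully inside data_len, -1 otherwise. */
static int check_records(const unsigned char *buf, size_t data_len)
{
	size_t off = 0;

	while (off < data_len) {
		size_t remaining = data_len - off;
		size_t hdr_size;
		struct rec_v2 hdr;

		/* Enough bytes to read the revision at all? */
		if (remaining < sizeof(struct rec_v2))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr));

		/* The header size depends on the revision just read. */
		hdr_size = hdr.revision >= 3 ? sizeof(struct rec_v3)
					     : sizeof(struct rec_v2);
		if (remaining < hdr_size)
			return -1;

		/* Compare against what is left; never subtract past zero. */
		if (hdr.payload_len > remaining - hdr_size)
			return -1;

		off += hdr_size + hdr.payload_len;
	}
	return 0;
}
```

Each comparison is written so that the subtraction on the right-hand side has already been proven non-negative, which is one way to avoid the unsigned underflow the commit message mentions.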
      Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
      Acked-by: Borislav Petkov <bp@suse.de>
      Tested-by: Tyler Baicar <baicar.tyler@gmail.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • E
      cpufreq: acpi-cpufreq: Report if CPU doesn't support boost technologies · e57f4676
      Erwan Velu committed
      [ Upstream commit 1222d527f314c86a3b59a522115d62facc5a7965 ]
      
      There are some rare cases where CPB (and possibly IDA) are missing on
      processors.
      
      This is the case fixed by commit f7f3dc00 ("x86/cpu/AMD: Fix
      erratum 1076 (CPB bit)") and following.
      
      In such context, the boost status isn't reported by
      /sys/devices/system/cpu/cpufreq/boost.
      
      This commit is about printing a message to report that the CPU
      doesn't expose the boost capabilities.
      
      This message could help debugging platforms hit by this phenomenon.
      Signed-off-by: Erwan Velu <e.velu@criteo.com>
      [ rjw: Change the message text somewhat ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • T
      ASoC: qcom: Fix of-node refcount unbalance in qcom_snd_parse_of() · eb70531a
      Takashi Iwai committed
      [ Upstream commit 70b773219a32c7b8f3e53e041bc023ad99fd81f4 ]
      
      Although qcom_snd_parse_of() tries to manage the of-node refcount,
      there are still a few places that lead to an unbalanced refcount in
      the error code path.  Namely,
      
      - for_each_child_of_node() needs to unreference the iterator node if
        aborting the loop in the middle,
      - cpu, codec and platform node objects have to be unreferenced at each
        iteration,
      - platform and codec node objects have to be referred before jumping
        to the error handling code that unreference them unconditionally.
      
      This patch tries to address these by moving the assignment of platform
      and codec node objects to the beginning of the loop and adding the
      of_node_put() calls adequately.
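
The balancing rule behind these points can be shown with a toy reference counter. This is a user-space sketch only: node_sketch, link_sketch, node_get(), node_put() and parse_links() are hypothetical names, and the real device-tree helpers take and drop references in more places than this model shows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-counted node. */
struct node_sketch {
	int refcount;
};

/* One link with its iterator node and three referenced nodes. */
struct link_sketch {
	struct node_sketch self, cpu, codec, platform;
};

static struct node_sketch *node_get(struct node_sketch *n)
{
	n->refcount++;
	return n;
}

static void node_put(struct node_sketch *n)
{
	if (n) {
		assert(n->refcount > 0);
		n->refcount--;
	}
}

/* Walks all links; fail_at simulates an error on that iteration. */
static int parse_links(struct link_sketch *links, int count, int fail_at)
{
	int ret = 0;

	for (int i = 0; i < count; i++) {
		/* Take every reference at the top of the iteration. */
		struct node_sketch *it = node_get(&links[i].self);
		struct node_sketch *cpu = node_get(&links[i].cpu);
		struct node_sketch *platform = node_get(&links[i].platform);
		struct node_sketch *codec = node_get(&links[i].codec);

		if (i == fail_at)
			ret = -1;

		/* Drop them on every path, success or failure. */
		node_put(cpu);
		node_put(codec);
		node_put(platform);
		node_put(it);  /* also covers leaving the loop early */

		if (ret)
			break;
	}
	return ret;
}
```

Acquiring all references before any point that can fail means the cleanup code can drop them unconditionally, which mirrors the restructuring the commit message describes.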
      
      Fixes: c25e295c ("ASoC: qcom: Add support to parse common audio device nodes")
      Cc: Patrick Lai <plai@codeaurora.org>
      Cc: Banajit Goswami <bgoswami@codeaurora.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • W
      perf annotate: Fix getting source line failure · e6786f86
      Wei Li committed
      [ Upstream commit 11db1ad4513d6205d2519e1a30ff4cef746e3243 ]
      
      The output of "perf annotate -l --stdio xxx" changed since commit 425859ff
      ("perf annotate: No need to calculate notes->start twice") removed the notes->start
      assignment in symbol__calc_lines(). The lookup now fails in
      find_address_in_section(), called from symbol__tty_annotate(), because
      a2l->addr is wrong. As a result, the annotate summary doesn't report the source
      line numbers correctly.
      
      Before fix:
      
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ cat common_while_1.c
        void hotspot_1(void)
        {
      	volatile int i;
      
      	for (i = 0; i < 0x10000000; i++);
      	for (i = 0; i < 0x10000000; i++);
      	for (i = 0; i < 0x10000000; i++);
        }
      
        int main(void)
        {
      	hotspot_1();
      
      	return 0;
        }
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ gcc common_while_1.c -g -o common_while_1
      
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1
        [ perf record: Woken up 2 times to write data ]
        [ perf record: Captured and wrote 0.488 MB perf.data (12498 samples) ]
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio
      
        Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1
        ----------------------------------------------
      
         19.30 common_while_1[32]
         19.03 common_while_1[4e]
         19.01 common_while_1[16]
          5.04 common_while_1[13]
          4.99 common_while_1[4b]
          4.78 common_while_1[2c]
          4.77 common_while_1[10]
          4.66 common_while_1[2f]
          4.59 common_while_1[51]
          4.59 common_while_1[35]
          4.52 common_while_1[19]
          4.20 common_while_1[56]
          0.51 common_while_1[48]
         Percent |      Source code & Disassembly of common_while_1 for cycles:ppp (12480 samples, percent: local period)
        -----------------------------------------------------------------------------------------------------------------
               :
               :
               :
               :         Disassembly of section .text:
               :
               :         00000000000005fa <hotspot_1>:
               :         hotspot_1():
               :         void hotspot_1(void)
               :         {
          0.00 :   5fa:   push   %rbp
          0.00 :   5fb:   mov    %rsp,%rbp
               :                 volatile int i;
               :
               :                 for (i = 0; i < 0x10000000; i++);
          0.00 :   5fe:   movl   $0x0,-0x4(%rbp)
          0.00 :   605:   jmp    610 <hotspot_1+0x16>
          0.00 :   607:   mov    -0x4(%rbp),%eax
         common_while_1[10]    4.77 :   60a:   add    $0x1,%eax
         common_while_1[13]    5.04 :   60d:   mov    %eax,-0x4(%rbp)
         common_while_1[16]   19.01 :   610:   mov    -0x4(%rbp),%eax
         common_while_1[19]    4.52 :   613:   cmp    $0xfffffff,%eax
            0.00 :   618:   jle    607 <hotspot_1+0xd>
                 :                 for (i = 0; i < 0x10000000; i++);
        ...
      
      After fix:
      
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1
        [ perf record: Woken up 2 times to write data ]
        [ perf record: Captured and wrote 0.488 MB perf.data (12500 samples) ]
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio
      
        Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1
        ----------------------------------------------
      
         33.34 common_while_1.c:5
         33.34 common_while_1.c:6
         33.32 common_while_1.c:7
         Percent |      Source code & Disassembly of common_while_1 for cycles:ppp (12482 samples, percent: local period)
        -----------------------------------------------------------------------------------------------------------------
               :
               :
               :
               :         Disassembly of section .text:
               :
               :         00000000000005fa <hotspot_1>:
               :         hotspot_1():
               :         void hotspot_1(void)
               :         {
          0.00 :   5fa:   push   %rbp
          0.00 :   5fb:   mov    %rsp,%rbp
               :                 volatile int i;
               :
               :                 for (i = 0; i < 0x10000000; i++);
          0.00 :   5fe:   movl   $0x0,-0x4(%rbp)
          0.00 :   605:   jmp    610 <hotspot_1+0x16>
          0.00 :   607:   mov    -0x4(%rbp),%eax
         common_while_1.c:5    4.70 :   60a:   add    $0x1,%eax
          4.89 :   60d:   mov    %eax,-0x4(%rbp)
         common_while_1.c:5   19.03 :   610:   mov    -0x4(%rbp),%eax
         common_while_1.c:5    4.72 :   613:   cmp    $0xfffffff,%eax
          0.00 :   618:   jle    607 <hotspot_1+0xd>
               :                 for (i = 0; i < 0x10000000; i++);
          0.00 :   61a:   movl   $0x0,-0x4(%rbp)
          0.00 :   621:   jmp    62c <hotspot_1+0x32>
          0.00 :   623:   mov    -0x4(%rbp),%eax
         common_while_1.c:6    4.54 :   626:   add    $0x1,%eax
          4.73 :   629:   mov    %eax,-0x4(%rbp)
         common_while_1.c:6   19.54 :   62c:   mov    -0x4(%rbp),%eax
         common_while_1.c:6    4.54 :   62f:   cmp    $0xfffffff,%eax
        ...
      Signed-off-by: Wei Li <liwei391@huawei.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Fixes: 425859ff ("perf annotate: No need to calculate notes->start twice")
      Link: http://lkml.kernel.org/r/20190221095716.39529-1-liwei391@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • K
      clk: fractional-divider: check parent rate only if flag is set · 763a895a
      Katsuhiro Suzuki committed
      [ Upstream commit d13501a2bedfbea0983cc868d3f1dc692627f60d ]
      
      Custom approximation of fractional-divider may not need parent clock
      rate checking. For example Rockchip SoCs work fine using grand parent
      clock rate even if target rate is greater than parent.
      
      This patch checks parent clock rate only if CLK_SET_RATE_PARENT flag
      is set.
      
      As a detailed example, consider the clock tree of the Rockchip I2S audio hardware.
        - Clock rate of CPLL is 1.2GHz, GPLL is 491.52MHz.
        - i2s1_div is an integer divider that can divide by N (N is 1~128).
          Input clock is CPLL or GPLL. Initial divider value is N = 1.
          Ex) PLL = CPLL, N = 10, i2s1_div output rate is
            CPLL / 10 = 1.2GHz / 10 = 120MHz
        - i2s1_frac is a fractional divider that can scale its input by x/y, where
          x and y are 16bit integers.
      
      CPLL --> | selector | ---> i2s1_div -+--> | selector | --> I2S1 MCLK
      GPLL --> |          | ,--------------'    |          |
                            `--> i2s1_frac ---> |          |
      
      The clock mux system tries to choose the more suitable of i2s1_div and
      i2s1_frac as the master clock (MCLK) of I2S1.
      
      Bad scenario as follows:
        - Try to set MCLK to 8.192MHz (32kHz audio replay)
          Candidate setting is
          - i2s1_div: GPLL / 60 = 8.192MHz
          The i2s1_div candidate is exactly the same as the target clock rate, so
          the mux chooses this clock source. The i2s1_div output rate is changed
          491.52MHz -> 8.192MHz
      
        - After that try to set to 11.2896MHz (44.1kHz audio replay)
          Candidate settings are
          - i2s1_div : CPLL / 107 = 11.214945MHz
          - i2s1_frac: i2s1_div   = 8.192MHz
            This is because clk_fd_round_rate() thinks target rate
            (11.2896MHz) is higher than parent rate (i2s1_div = 8.192MHz)
            and returns parent clock rate.
      
      The above is the current upstream behavior. The clock mux system chooses
      i2s1_div, but this clock rate is not acceptable for the I2S driver, so
      users cannot replay audio.
      
      Expected behavior is:
        - Try to set master clock to 11.2896MHz (44.1kHz audio replay)
          Candidate settings are
          - i2s1_div : CPLL / 107          = 11.214945MHz
          - i2s1_frac: i2s1_div * 147/6400 = 11.2896MHz
                       Change i2s1_div to GPLL / 1 = 491.52MHz at same
                       time.
      
      If this commit is applied, clk_fd_round_rate() calls the custom approximation
      function of Rockchip even if the target rate is higher than the parent rate.
      The custom function changes both the grandparent (i2s1_div) and parent
      (i2s1_frac) settings at the same time. The clock mux system can then choose
      i2s1_frac and audio works fine.
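
The rates quoted in the scenarios above can be checked with plain integer arithmetic. The helpers int_div_rate() and frac_rate() are hypothetical names for this sketch and are not part of the clock framework; they only reproduce the divisions written out in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Output rate of an integer divider: parent / n. */
static uint64_t int_div_rate(uint64_t parent_hz, uint64_t n)
{
	assert(n > 0);
	return parent_hz / n;
}

/* Output rate of a fractional divider: parent * num / den. */
static uint64_t frac_rate(uint64_t parent_hz, uint64_t num, uint64_t den)
{
	assert(den > 0);
	/* Both terms must fit the 16-bit registers described above. */
	assert(num <= UINT16_MAX && den <= UINT16_MAX);
	return parent_hz * num / den;
}
```

With GPLL at 491520000 Hz, int_div_rate(491520000, 60) gives 8192000 Hz (8.192MHz), and frac_rate(491520000, 147, 6400) gives 11289600 Hz (11.2896MHz). The second result is only reachable when i2s1_div is set back to GPLL / 1, so that the fractional divider sees the full 491.52MHz as its parent.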
      Signed-off-by: Katsuhiro Suzuki <katsuhiro@katsuster.net>
      Reviewed-by: Heiko Stuebner <heiko@sntech.de>
      [sboyd@kernel.org: Make function into a macro instead]
      Signed-off-by: Stephen Boyd <sboyd@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • H
      IB/mlx4: Increase the timeout for CM cache · d3ec442d
      Håkon Bugge committed
      [ Upstream commit 2612d723aadcf8281f9bf8305657129bd9f3cd57 ]
      
      Using CX-3 virtual functions, either from a bare-metal machine or
      pass-through from a VM, MAD packets are proxied through the PF driver.
      
      Since the VF drivers have separate name spaces for MAD Transaction Ids
      (TIDs), the PF driver has to re-map the TIDs and keep the book keeping
      in a cache.
      
      Following the RDMA Connection Manager (CM) protocol, it is clear when
      an entry has to be evicted from the cache. But life is not perfect:
      remote peers may die or be rebooted. Hence, a timeout is used to wipe out
      a cache entry when the PF driver assumes the remote peer has gone.
      
      During workloads where a high number of QPs are destroyed concurrently,
      an excessive amount of CM DREQ retries has been observed.
      
      The problem can be demonstrated in a bare-metal environment, where two
      nodes have instantiated 8 VFs each. These use dual-ported HCAs, so we
      have 16 vPorts per physical server.
      
      64 processes are associated with each vPort and create and destroy
      one QP for each of the remote 64 processes. That is, 1024 QPs per
      vPort, all in all 16K QPs. The QPs are created/destroyed using the
      CM.
      
      When tearing down these 16K QPs, excessive CM DREQ retries (and
      duplicates) are observed. With some cat/paste/awk wizardry on the
      infiniband_cm sysfs, we observe as sum of the 16 vPorts on one of the
      nodes:
      
      cm_rx_duplicates:
            dreq  2102
      cm_rx_msgs:
            drep  1989
            dreq  6195
             rep  3968
             req  4224
             rtu  4224
      cm_tx_msgs:
            drep  4093
            dreq 27568
             rep  4224
             req  3968
             rtu  3968
      cm_tx_retries:
            dreq 23469
      
      Note that the active/passive side is equally distributed between the
      two nodes.
      
      Enabling pr_debug in cm.c gives tons of:
      
      [171778.814239] <mlx4_ib> mlx4_ib_multiplex_cm_handler: id{slave:
      1,sl_cm_id: 0xd393089f} is NULL!
      
      By increasing the CM_CLEANUP_CACHE_TIMEOUT from 5 to 30 seconds, the
      tear-down phase of the application is reduced from approximately 90 to
      50 seconds. Retries/duplicates are also significantly reduced:
      
      cm_rx_duplicates:
            dreq  2460
      []
      cm_tx_retries:
            dreq  3010
             req    47
      
      Increasing the timeout further didn't help, as these duplicates and
      retries stem from a CMA timeout that is too short, which was 20 (~4 seconds)
      on the systems. By increasing the CMA timeout to 22 (~17 seconds), the
      numbers fell to about 10 for both of them.
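
The two durations quoted above are consistent with the exponential timeout encoding used by InfiniBand, where a value n stands for 4.096 microseconds times 2^n. The sketch below assumes that encoding; ib_timeout_ns() is a hypothetical helper name, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert an encoded timeout exponent to nanoseconds, assuming the
 * encoding 4.096 us * 2^exponent (4.096 us is 4096 ns).
 */
static uint64_t ib_timeout_ns(unsigned int exponent)
{
	assert(exponent < 32);  /* the encoded field is assumed to be 5 bits */
	return 4096ULL << exponent;
}
```

Under that assumption, ib_timeout_ns(20) is 4294967296 ns, roughly 4.3 seconds, and ib_timeout_ns(22) is 17179869184 ns, roughly 17.2 seconds. Each step of one in the encoded value doubles the timeout, so going from 20 to 22 quadruples it.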
      
      Adjustment of the CMA timeout is not part of this commit.
      Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
      Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • D
      loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part() · 61584032
      Dongli Zhang committed
      [ Upstream commit 758a58d0bc67457f1215321a536226654a830eeb ]
      
      Commit 0da03cab87e6
      ("loop: Fix deadlock when calling blkdev_reread_part()") moves
      blkdev_reread_part() out of the loop_ctl_mutex. However,
      GENHD_FL_NO_PART_SCAN is set before __blkdev_reread_part(). As a result,
      __blkdev_reread_part() will fail the check of GENHD_FL_NO_PART_SCAN and
      will not rescan the loop device to delete all partitions.
      
      Below are steps to reproduce the issue:
      
      step1 # dd if=/dev/zero of=tmp.raw bs=1M count=100
      step2 # losetup -P /dev/loop0 tmp.raw
      step3 # parted /dev/loop0 mklabel gpt
      step4 # parted -a none -s /dev/loop0 mkpart primary 64s 1
      step5 # losetup -d /dev/loop0
      
      Step5 will not be able to delete /dev/loop0p1 (introduced by step4) and
      the kernel prints the warning message below:
      
      [  464.414043] __loop_clr_fd: partition scan of loop0 failed (rc=-22)
      
      This patch sets GENHD_FL_NO_PART_SCAN after blkdev_reread_part().
      
      Fixes: 0da03cab87e6 ("loop: Fix deadlock when calling blkdev_reread_part()")
      Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • V
      platform/mellanox: mlxreg-hotplug: Fix KASAN warning · 07a31820
      Vadim Pasternak committed
      [ Upstream commit e4c275f77624961b56cce397814d9d770a45ac59 ]
      
      Fix the following KASAN warning produced when booting a 64-bit kernel:
      [   13.334750] BUG: KASAN: stack-out-of-bounds in find_first_bit+0x19/0x70
      [   13.342166] Read of size 8 at addr ffff880235067178 by task kworker/2:1/42
      [   13.342176] CPU: 2 PID: 42 Comm: kworker/2:1 Not tainted 4.20.0-rc1+ #106
      [   13.342179] Hardware name: Mellanox Technologies Ltd. MSN2740/Mellanox x86 SFF board, BIOS 5.6.5 06/07/2016
      [   13.342190] Workqueue: events deferred_probe_work_func
      [   13.342194] Call Trace:
      [   13.342206]  dump_stack+0xc7/0x15b
      [   13.342214]  ? show_regs_print_info+0x5/0x5
      [   13.342220]  ? kmsg_dump_rewind_nolock+0x59/0x59
      [   13.342234]  ? _raw_write_lock_irqsave+0x100/0x100
      [   13.351593]  print_address_description+0x73/0x260
      [   13.351603]  kasan_report+0x260/0x380
      [   13.351611]  ? find_first_bit+0x19/0x70
      [   13.351619]  find_first_bit+0x19/0x70
      [   13.351630]  mlxreg_hotplug_work_handler+0x73c/0x920 [mlxreg_hotplug]
      [   13.351639]  ? __lock_text_start+0x8/0x8
      [   13.351646]  ? _raw_write_lock_irqsave+0x80/0x100
      [   13.351656]  ? mlxreg_hotplug_remove+0x1e0/0x1e0 [mlxreg_hotplug]
      [   13.351663]  ? regmap_volatile+0x40/0xb0
      [   13.351668]  ? regcache_write+0x4c/0x90
      [   13.351676]  ? mlxplat_mlxcpld_reg_write+0x24/0x30 [mlx_platform]
      [   13.351681]  ? _regmap_write+0xea/0x220
      [   13.351688]  ? __mutex_lock_slowpath+0x10/0x10
      [   13.351696]  ? devm_add_action+0x70/0x70
      [   13.351701]  ? mutex_unlock+0x1d/0x40
      [   13.351710]  mlxreg_hotplug_probe+0x82e/0x989 [mlxreg_hotplug]
      [   13.351723]  ? mlxreg_hotplug_work_handler+0x920/0x920 [mlxreg_hotplug]
      [   13.351731]  ? sysfs_do_create_link_sd.isra.2+0xf4/0x190
      [   13.351737]  ? sysfs_rename_link_ns+0xf0/0xf0
      [   13.351743]  ? devres_close_group+0x2b0/0x2b0
      [   13.351749]  ? pinctrl_put+0x20/0x20
      [   13.351755]  ? acpi_dev_pm_attach+0x2c/0xd0
      [   13.351763]  platform_drv_probe+0x70/0xd0
      [   13.351771]  really_probe+0x480/0x6e0
      [   13.351778]  ? device_attach+0x10/0x10
      [   13.351784]  ? __lock_text_start+0x8/0x8
      [   13.351790]  ? _raw_write_lock_irqsave+0x80/0x100
      [   13.351797]  ? _raw_write_lock_irqsave+0x80/0x100
      [   13.351806]  ? __driver_attach+0x190/0x190
      [   13.351812]  driver_probe_device+0x17d/0x1a0
      [   13.351819]  ? __driver_attach+0x190/0x190
      [   13.351825]  bus_for_each_drv+0xd6/0x130
      [   13.351831]  ? bus_rescan_devices+0x20/0x20
      [   13.351837]  ? __mutex_lock_slowpath+0x10/0x10
      [   13.351845]  __device_attach+0x18c/0x230
      [   13.351852]  ? device_bind_driver+0x70/0x70
      [   13.351859]  ? __mutex_lock_slowpath+0x10/0x10
      [   13.351866]  bus_probe_device+0xea/0x110
      [   13.351874]  deferred_probe_work_func+0x1c9/0x290
      [   13.351882]  ? driver_deferred_probe_add+0x1d0/0x1d0
      [   13.351889]  ? preempt_notifier_dec+0x20/0x20
      [   13.351897]  ? read_word_at_a_time+0xe/0x20
      [   13.351904]  ? strscpy+0x151/0x290
      [   13.351912]  ? set_work_pool_and_clear_pending+0x9c/0xf0
      [   13.351918]  ? __switch_to_asm+0x34/0x70
      [   13.351924]  ? __switch_to_asm+0x40/0x70
      [   13.351929]  ? __switch_to_asm+0x34/0x70
      [   13.351935]  ? __switch_to_asm+0x40/0x70
      [   13.351942]  process_one_work+0x5cc/0xa00
      [   13.351952]  ? pwq_dec_nr_in_flight+0x1e0/0x1e0
      [   13.351960]  ? pci_mmcfg_check_reserved+0x80/0xb8
      [   13.351967]  ? run_rebalance_domains+0x250/0x250
      [   13.351980]  ? stack_access_ok+0x35/0x80
      [   13.351986]  ? deref_stack_reg+0xa1/0xe0
      [   13.351994]  ? schedule+0xcd/0x250
      [   13.352000]  ? worker_enter_idle+0x2d6/0x330
      [   13.352006]  ? __schedule+0xeb0/0xeb0
      [   13.352014]  ? fork_usermode_blob+0x130/0x130
      [   13.352019]  ? mutex_lock+0xa7/0x100
      [   13.352026]  ? _raw_spin_lock_irq+0x98/0xf0
      [   13.352032]  ? _raw_read_unlock_irqrestore+0x30/0x30
      [   13.352037] i2c i2c-2: Added multiplexed i2c bus 11
      [   13.352043]  worker_thread+0x181/0xa80
      [   13.352052]  ? __switch_to_asm+0x34/0x70
      [   13.352058]  ? __switch_to_asm+0x40/0x70
      [   13.352064]  ? process_one_work+0xa00/0xa00
      [   13.352070]  ? __switch_to_asm+0x34/0x70
      [   13.352076]  ? __switch_to_asm+0x40/0x70
      [   13.352081]  ? __switch_to_asm+0x34/0x70
      [   13.352086]  ? __switch_to_asm+0x40/0x70
      [   13.352092]  ? __switch_to_asm+0x34/0x70
      [   13.352097]  ? __switch_to_asm+0x40/0x70
      [   13.352105]  ? __schedule+0x3d6/0xeb0
      [   13.352112]  ? migrate_swap_stop+0x470/0x470
      [   13.352119]  ? save_stack+0x89/0xb0
      [   13.352127]  ? kmem_cache_alloc_trace+0xe5/0x570
      [   13.352132]  ? kthread+0x59/0x1d0
      [   13.352138]  ? ret_from_fork+0x35/0x40
      [   13.352154]  ? __schedule+0xeb0/0xeb0
      [   13.352161]  ? remove_wait_queue+0x150/0x150
      [   13.352169]  ? _raw_write_lock_irqsave+0x80/0x100
      [   13.352175]  ? __lock_text_start+0x8/0x8
      [   13.352183]  ? process_one_work+0xa00/0xa00
      [   13.352188]  kthread+0x1a4/0x1d0
      [   13.352195]  ? kthread_create_worker_on_cpu+0xc0/0xc0
      [   13.352202]  ret_from_fork+0x35/0x40
      
      [   13.353879] The buggy address belongs to the page:
      [   13.353885] page:ffffea0008d419c0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
      [   13.353890] flags: 0x2ffff8000000000()
      [   13.353897] raw: 02ffff8000000000 ffffea0008d419c8 ffffea0008d419c8 0000000000000000
      [   13.353903] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
      [   13.353905] page dumped because: kasan: bad access detected
      
      [   13.353908] Memory state around the buggy address:
      [   13.353912]  ffff880235067000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   13.353917]  ffff880235067080: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 04
      [   13.353921] >ffff880235067100: f2 f2 f2 f2 f2 f2 f2 04 f2 f2 f2 f2 f2 f2 f2 04
      [   13.353923]                                                                 ^
      [   13.353927]  ffff880235067180: f2 f2 f2 f2 f2 f2 f2 04 f2 f2 f2 00 00 00 00 00
      [   13.353931]  ffff880235067200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   13.353933] ==================================================================
      
      The warning is caused by the below loop:
      	for_each_set_bit(bit, (unsigned long *)&asserted, 8) {
      while "asserted" is declared as 'unsigned'.
      
      The loop casts a pointer to a 32-bit unsigned integer to a pointer to a
      64-bit unsigned long. There are two problems here:
      - It causes an access to four extra bytes, which can corrupt memory.
      - The 32-bit pointer address may not be 64-bit aligned.
      
      The fix changes variable "asserted" to "unsigned long".
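
The idea of the fix, widening the value instead of casting the pointer, can be sketched in user space. collect_set_bits() and asserted_bits() are hypothetical helpers written for this example; they stand in for the kernel's bit-iteration macro rather than reproduce it.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)

/* Store the positions of the set bits among the low nbits of word. */
static int collect_set_bits(unsigned long word, unsigned int nbits,
			    unsigned int *out)
{
	int n = 0;

	assert(nbits <= BITS_PER_LONG_SKETCH);
	for (unsigned int bit = 0; bit < nbits; bit++)
		if (word & (1UL << bit))
			out[n++] = bit;
	return n;
}

/* regval models a register value read as a 32-bit quantity. */
static int asserted_bits(unsigned int regval, unsigned int *out)
{
	/*
	 * Copy into a full-width unsigned long. Every byte the bit
	 * scan touches now belongs to this variable, unlike the
	 * (unsigned long *)&regval cast, which reads past a 4-byte
	 * object on a 64-bit build.
	 */
	unsigned long asserted = regval;

	return collect_set_bits(asserted, 8, out);
}
```

For example, a register value of 0x05 yields two positions, 0 and 2, while 0x100 yields none because only the low 8 bits are scanned.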
      
      Fixes: 1f976f69 ("platform/x86: Move Mellanox platform hotplug driver to platform/mellanox")
      Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
      Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Y
      platform/x86: ideapad-laptop: Fix no_hw_rfkill_list for Lenovo RESCUER R720-15IKBN · 0bacfb4a
      Yang Fan committed
      [ Upstream commit 4d9b2864a415fec39150bc13efc730c7eb88711e ]
      
      Commit ae7c8cba ("platform/x86: ideapad-laptop: add lenovo RESCUER
      R720-15IKBN to no_hw_rfkill_list") added
          DMI_MATCH(DMI_BOARD_NAME, "80WW")
      for Lenovo RESCUER R720-15IKBN.
      
      But DMI_BOARD_NAME does not match 80WW on Lenovo RESCUER R720-15IKBN,
      so the Wireless LAN remains hard blocked.
      
      On Lenovo RESCUER R720-15IKBN:
          ~$ cat /sys/class/dmi/id/sys_vendor
          LENOVO
          ~$ cat /sys/class/dmi/id/board_name
          Provence-5R3
          ~$ cat /sys/class/dmi/id/product_name
          80WW
          ~$ cat /sys/class/dmi/id/product_version
          Lenovo R720-15IKBN
      
      So on Lenovo RESCUER R720-15IKBN:
          DMI_SYS_VENDOR should match "LENOVO",
          DMI_BOARD_NAME should match "Provence-5R3",
          DMI_PRODUCT_NAME should match "80WW",
          DMI_PRODUCT_VERSION should match "Lenovo R720-15IKBN".
      
      Fix it and, in accordance with the other entries in no_hw_rfkill_list,
      use DMI_PRODUCT_VERSION instead of DMI_BOARD_NAME.
      
      Fixes: ae7c8cba ("platform/x86: ideapad-laptop: add lenovo RESCUER R720-15IKBN to no_hw_rfkill_list")
      Signed-off-by: Yang Fan <nullptr.cpp@gmail.com>
      Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      0bacfb4a
    • F
      mlxsw: spectrum: Avoid -Wformat-truncation warnings · a64ffbaf
      Florian Fainelli committed
      [ Upstream commit ab2c4e2581ad32c28627235ff0ae8c5a5ea6899f ]
      
      Give precision specifiers to the two snprintf() calls that format the
      priority and TC strings to avoid producing these two warnings:
      
      drivers/net/ethernet/mellanox/mlxsw/spectrum.c: In function
      'mlxsw_sp_port_get_prio_strings':
      drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2132:37: warning: '%d'
      directive output may be truncated writing between 1 and 3 bytes into a
      region of size between 0 and 31 [-Wformat-truncation=]
         snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
                                           ^~
      drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2132:3: note: 'snprintf'
      output between 3 and 36 bytes into a destination of size 32
         snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           mlxsw_sp_port_hw_prio_stats[i].str, prio);
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      drivers/net/ethernet/mellanox/mlxsw/spectrum.c: In function
      'mlxsw_sp_port_get_tc_strings':
      drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2143:37: warning: '%d'
      directive output may be truncated writing between 1 and 11 bytes into a
      region of size between 0 and 31 [-Wformat-truncation=]
         snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
                                           ^~
      drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2143:3: note: 'snprintf'
      output between 3 and 44 bytes into a destination of size 32
         snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           mlxsw_sp_port_hw_tc_stats[i].str, tc);
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
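
A minimal user-space sketch of the technique, not the patch itself. The precision values ("%.29s", "%.1d") and the helper name format_stat_name() are assumptions chosen for a 32-byte destination; the text above does not quote the format strings the commit uses.

```c
#include <assert.h>
#include <stdio.h>

#define ETH_GSTRING_LEN 32	/* destination size reported in the warnings */

/*
 * Formats "<name>_<index>" into a 32-byte buffer. The "%.29s" precision
 * caps the name at 29 bytes, leaving room for '_', one digit and the
 * terminating NUL. "%.1d" only sets a minimum digit count, so the
 * single-digit bound is enforced by the assertion (0 <= index <= 9).
 */
static int format_stat_name(char buf[ETH_GSTRING_LEN], const char *name,
			    int index)
{
	assert(index >= 0 && index <= 9);
	return snprintf(buf, ETH_GSTRING_LEN, "%.29s_%.1d", name, index);
}
```

With the name capped at 29 bytes, the longest possible output is 31 characters plus the NUL, so the result always fits in the buffer.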

      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a64ffbaf
    • F
      e1000e: Fix -Wformat-truncation warnings · 49dd86f0
      Florian Fainelli committed
      [ Upstream commit 135e7245479addc6b1f5d031e3d7e2ddb3d2b109 ]
      
      Provide precision hints to snprintf() since we know the destination
      buffer size of the RX/TX ring names is IFNAMSIZ + 5 - 1. This fixes the
      following warnings:
      
      drivers/net/ethernet/intel/e1000e/netdev.c: In function
      'e1000_request_msix':
      drivers/net/ethernet/intel/e1000e/netdev.c:2109:13: warning: 'snprintf'
      output may be truncated before the last format character
      [-Wformat-truncation=]
           "%s-rx-0", netdev->name);
                   ^
      drivers/net/ethernet/intel/e1000e/netdev.c:2107:3: note: 'snprintf'
      output between 6 and 21 bytes into a destination of size 20
         snprintf(adapter->rx_ring->name,
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           sizeof(adapter->rx_ring->name) - 1,
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           "%s-rx-0", netdev->name);
           ~~~~~~~~~~~~~~~~~~~~~~~~
      drivers/net/ethernet/intel/e1000e/netdev.c:2125:13: warning: 'snprintf'
      output may be truncated before the last format character
      [-Wformat-truncation=]
           "%s-tx-0", netdev->name);
                   ^
      drivers/net/ethernet/intel/e1000e/netdev.c:2123:3: note: 'snprintf'
      output between 6 and 21 bytes into a destination of size 20
         snprintf(adapter->tx_ring->name,
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           sizeof(adapter->tx_ring->name) - 1,
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           "%s-tx-0", netdev->name);
           ~~~~~~~~~~~~~~~~~~~~~~~~

      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      49dd86f0
    • A
      net: dsa: mv88e6xxx: Add lockdep classes to fix false positive splat · c6fb45d8
      Andrew Lunn committed
      [ Upstream commit f6d9758b12660484b6639364cc406da92a918c96 ]
      
      The following false positive lockdep splat has been observed.
      
      ======================================================
      WARNING: possible circular locking dependency detected
      4.20.0+ #302 Not tainted
      ------------------------------------------------------
      systemd-udevd/160 is trying to acquire lock:
      edea6080 (&chip->reg_lock){+.+.}, at: __setup_irq+0x640/0x704
      
      but task is already holding lock:
      edff0340 (&desc->request_mutex){+.+.}, at: __setup_irq+0xa0/0x704
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (&desc->request_mutex){+.+.}:
             mutex_lock_nested+0x1c/0x24
             __setup_irq+0xa0/0x704
             request_threaded_irq+0xd0/0x150
             mv88e6xxx_probe+0x41c/0x694 [mv88e6xxx]
             mdio_probe+0x2c/0x54
             really_probe+0x200/0x2c4
             driver_probe_device+0x5c/0x174
             __driver_attach+0xd8/0xdc
             bus_for_each_dev+0x58/0x7c
             bus_add_driver+0xe4/0x1f0
             driver_register+0x7c/0x110
             mdio_driver_register+0x24/0x58
             do_one_initcall+0x74/0x2e8
             do_init_module+0x60/0x1d0
             load_module+0x1968/0x1ff4
             sys_finit_module+0x8c/0x98
             ret_fast_syscall+0x0/0x28
             0xbedf2ae8
      
      -> #0 (&chip->reg_lock){+.+.}:
             __mutex_lock+0x50/0x8b8
             mutex_lock_nested+0x1c/0x24
             __setup_irq+0x640/0x704
             request_threaded_irq+0xd0/0x150
             mv88e6xxx_g2_irq_setup+0xcc/0x1b4 [mv88e6xxx]
             mv88e6xxx_probe+0x44c/0x694 [mv88e6xxx]
             mdio_probe+0x2c/0x54
             really_probe+0x200/0x2c4
             driver_probe_device+0x5c/0x174
             __driver_attach+0xd8/0xdc
             bus_for_each_dev+0x58/0x7c
             bus_add_driver+0xe4/0x1f0
             driver_register+0x7c/0x110
             mdio_driver_register+0x24/0x58
             do_one_initcall+0x74/0x2e8
             do_init_module+0x60/0x1d0
             load_module+0x1968/0x1ff4
             sys_finit_module+0x8c/0x98
             ret_fast_syscall+0x0/0x28
             0xbedf2ae8
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&desc->request_mutex);
                                     lock(&chip->reg_lock);
                                     lock(&desc->request_mutex);
        lock(&chip->reg_lock);
      
      &desc->request_mutex refers to two different mutexes. #1 is the GPIO for
      the chip interrupt. #2 is the chained interrupt between global 1 and
      global 2.
      
      Add lockdep classes to the GPIO interrupt to avoid this.

      Reported-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c6fb45d8
    • A
      mmc: omap: fix the maximum timeout setting · 194b888a
      Aaro Koskinen committed
      [ Upstream commit a6327b5e57fdc679c842588c3be046c0b39cc127 ]
      
      When running OMAP1 kernel on QEMU, MMC access is annoyingly noisy:
      
      	MMC: CTO of 0xff and 0xfe cannot be used!
      	MMC: CTO of 0xff and 0xfe cannot be used!
      	MMC: CTO of 0xff and 0xfe cannot be used!
      	[ad inf.]
      
      Emulator warnings appear to be valid. The TI document SPRU680 [1]
      ("OMAP5910 Dual-Core Processor MultiMedia Card/Secure Data Memory Card
      (MMC/SD) Reference Guide") page 36 states that the maximum timeout is 253
      cycles and "0xff and 0xfe cannot be used".
      
      Fix by using 0xfd as the maximum timeout.
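
The limit can be sketched as a simple clamp. This is an illustrative user-space helper, not the driver code: clamp_cto() and MMC_CTO_MAX are hypothetical names, and the real driver may apply additional scaling before writing the register.

```c
#include <assert.h>

/* Largest usable CTO value; 0xfe and 0xff cannot be used per SPRU680. */
#define MMC_CTO_MAX 0xfdU

/*
 * Hypothetical helper: clamp a requested command timeout (in cycles)
 * to the largest value the CTO register accepts.
 */
static unsigned int clamp_cto(unsigned int requested)
{
	return requested > MMC_CTO_MAX ? MMC_CTO_MAX : requested;
}
```

Any request above 253 cycles, including the previously used 0xff, is reduced to 0xfd, while smaller values pass through unchanged.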
      
      Tested using QEMU 2.5 (Siemens SX1 machine, OMAP310), and also checked on
      real hardware using Palm TE (OMAP310), Nokia 770 (OMAP1710) and Nokia N810
      (OMAP2420) that MMC works as before.
      
      [1] http://www.ti.com/lit/ug/spru680/spru680.pdf
      
      Fixes: 730c9b7e ("[MMC] Add OMAP MMC host driver")
      Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      194b888a
    • Q
      btrfs: qgroup: Make qgroup async transaction commit more aggressive · dcedd379
      Qu Wenruo committed
      [ Upstream commit f5fef4593653dfa2a865c485bb81415de51d5c99 ]
      
      [BUG]
      Btrfs qgroup will still hit EDQUOT under the following case:
      
        $ dev=/dev/test/test
        $ mnt=/mnt/btrfs
        $ umount $mnt &> /dev/null
        $ umount $dev &> /dev/null
      
        $ mkfs.btrfs -f $dev
        $ mount $dev $mnt -o nospace_cache
      
        $ btrfs subv create $mnt/subv
        $ btrfs quota enable $mnt
        $ btrfs quota rescan -w $mnt
        $ btrfs qgroup limit -e 1G $mnt/subv
      
        $ fallocate -l 900M $mnt/subv/padding
        $ sync
      
        $ rm $mnt/subv/padding
      
        # Hit EDQUOT
        $ xfs_io -f -c "pwrite 0 512M" $mnt/subv/real_file
      
      [CAUSE]
      Since commit a514d638 ("btrfs: qgroup: Commit transaction in advance
      to reduce early EDQUOT"), btrfs is no longer forced to commit a
      transaction to reclaim more quota space.
      
      Instead, we just check the pertrans metadata reservation against a
      threshold and try to commit the transaction asynchronously.
      
      However, in the above case the pertrans metadata reservation is so small
      that it never triggers an asynchronous transaction commit.
      
      [FIX]
      Instead of only accounting pertrans metadata reservation, we calculate
      how much free space we have, and if there isn't much free space left,
      commit transaction asynchronously to try to free some space.
      
      This may slow down the fs when we have less than 32M free qgroup space,
      but should reduce a lot of false EDQUOT, so the cost should be
      acceptable.
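
The decision described in [FIX] can be sketched as follows. This is a simplified illustration, not the btrfs implementation: should_commit_async() is a hypothetical helper, the 32M figure is taken from the text above, and the real code also has to account for reserved space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 32 MiB threshold mentioned in the text above. */
#define QGROUP_FREE_THRESHOLD (32ULL * 1024 * 1024)

/*
 * Hypothetical decision helper: returns true when an asynchronous
 * transaction commit should be kicked off because the qgroup has little
 * free space left. 'limit' and 'used' are byte counts.
 */
static bool should_commit_async(uint64_t limit, uint64_t used)
{
	uint64_t free_space = used < limit ? limit - used : 0;

	return free_space < QGROUP_FREE_THRESHOLD;
}
```

In the reproducer, a 1G limit with 900M still accounted leaves 124M free and does not trigger a commit, whereas a write that brings free space under 32M does.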

      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      dcedd379
    • A
      powerpc/hugetlb: Handle mmap_min_addr correctly in get_unmapped_area callback · 6cf5f631
      Aneesh Kumar K.V committed
      [ Upstream commit 5330367fa300742a97e20e953b1f77f48392faae ]
      
      After we ALIGN up the address, we need to make sure it did not overflow
      and result in a zero address. In that case, we need to make sure that
      the returned address is greater than mmap_min_addr.
      
      This fixes selftest va_128TBswitch --run-hugetlb reporting failures when
      run as non root user for
      
      mmap(-1, MAP_HUGETLB)
      
      The bug is that a non-root user requesting address -1 will be given address 0
      which will then fail, whereas they should have been given something else that
      would have succeeded.
      
      With this change we also avoid the first mmap(-1, MAP_HUGETLB) returning
      a NULL address as the mmap address. We think this is not a security issue,
      because it only affects whether we choose an address below mmap_min_addr,
      not whether we actually allow that address to be mapped, i.e. there are
      existing capability checks to prevent a user mapping below mmap_min_addr
      and those will still be honoured even without this fix.
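
The overflow described above can be reproduced in user space. This is a sketch under stated assumptions: align_up() and hint_is_usable() are hypothetical helpers, and min_addr stands in for mmap_min_addr; the kernel callback is structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round 'addr' up to a multiple of 'align' (a power of two); may wrap to 0. */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Hypothetical check: an address hint is only usable as-is if aligning
 * it did not wrap around and the result is not below 'min_addr'.
 */
static bool hint_is_usable(uintptr_t hint, uintptr_t align, uintptr_t min_addr)
{
	uintptr_t aligned = align_up(hint, align);

	return aligned >= hint && aligned >= min_addr;
}
```

A hint of -1 aligned to a huge page size wraps to 0, so the check rejects it and the caller can fall back to searching for a free area instead of returning address 0.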
      
      Fixes: 48483760 ("powerpc/mm: Add radix support for hugetlb")
      Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      6cf5f631
    • N
      iommu/io-pgtable-arm-v7s: Only kmemleak_ignore L2 tables · fc96b44c
      Nicolas Boichat committed
      [ Upstream commit 032ebd8548c9d05e8d2bdc7a7ec2fe29454b0ad0 ]
      
      L1 tables are allocated with __get_dma_pages, and therefore already
      ignored by kmemleak.
      
      Without this, the kernel would print this error message on boot,
      when the first L1 table is allocated:
      
      [    2.810533] kmemleak: Trying to color unknown object at 0xffffffd652388000 as Black
      [    2.818190] CPU: 5 PID: 39 Comm: kworker/5:0 Tainted: G S                4.19.16 #8
      [    2.831227] Workqueue: events deferred_probe_work_func
      [    2.836353] Call trace:
      ...
      [    2.852532]  paint_ptr+0xa0/0xa8
      [    2.855750]  kmemleak_ignore+0x38/0x6c
      [    2.859490]  __arm_v7s_alloc_table+0x168/0x1f4
      [    2.863922]  arm_v7s_alloc_pgtable+0x114/0x17c
      [    2.868354]  alloc_io_pgtable_ops+0x3c/0x78
      ...
      
      Fixes: e5fc9753 ("iommu/io-pgtable: Add ARMv7 short descriptor support")
      Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      fc96b44c
    • S
      ARM: 8840/1: use a raw_spinlock_t in unwind · d81bdb3c
      Sebastian Andrzej Siewior committed
      [ Upstream commit 74ffe79ae538283bbf7c155e62339f1e5c87b55a ]
      
      Unwinding is mostly done with irqs enabled; however, SLUB may call it with
      irqs disabled while creating a new SLUB cache.
      
      I had a system freeze while loading a module which called
      kmem_cache_create() on init. That means SLUB's __slab_alloc() disabled
      interrupts and then
      
      ->new_slab_objects()
       ->new_slab()
        ->setup_object()
         ->setup_object_debug()
          ->init_tracking()
           ->set_track()
            ->save_stack_trace()
             ->save_stack_trace_tsk()
              ->walk_stackframe()
               ->unwind_frame()
                ->unwind_find_idx()
                 =>spin_lock_irqsave(&unwind_lock);

      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d81bdb3c
    • L
      serial: 8250_pxa: honor the port number from devicetree · 95130717
      Lubomir Rintel committed
      [ Upstream commit fe9ed6d2483fda55465f32924fb15bce0fac3fac ]
      
      Like the other OF-enabled drivers, use the port number from the firmware if
      the devicetree specifies an alias:
      
        aliases {
            ...
            serial2 = &uart2; /* Should be ttyS2 */
        }
      
      This is how the deprecated pxa.c driver behaved; switching to 8250_pxa
      messes up the numbering.

      Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      95130717
    • S
      coresight: etm4x: Add support to enable ETMv4.2 · 2636ccec
      Sai Prakash Ranjan committed
      [ Upstream commit 5666dfd1d8a45a167f0d8b4ef47ea7f780b1f24a ]
      
      SDM845 has ETMv4.2 and can use the existing etm4x driver.
      But the current etm driver checks only for ETMv4.0 and
      errors out for other etm4x versions. This patch adds the
      missing support so that SoCs with ETMv4.x can use the same
      driver, by checking only the ETM architecture major version
      number.
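
A minimal sketch of a major-version-only check. It assumes the architecture byte encodes the major version in the high nibble and the minor version in the low nibble; the constant and function names here are illustrative and may not match the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: major version in the high nibble, minor in the low. */
#define ETM_ARCH_MAJOR_V4 0x40

/* Hypothetical check that accepts any ETMv4.x architecture byte. */
static bool etm_arch_is_v4(uint8_t arch)
{
	/* Mask out the minor version so v4.0, v4.2, ... are all accepted. */
	return (arch & 0xf0) == ETM_ARCH_MAJOR_V4;
}
```

Under that assumption, 0x40 (v4.0) and 0x42 (v4.2) both pass, while a value with a different major nibble is still rejected.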
      
      Without this change, we get the below error during etm probe:
      
      / # dmesg | grep etm
      [    6.660093] coresight-etm4x: probe of 7040000.etm failed with error -22
      [    6.666902] coresight-etm4x: probe of 7140000.etm failed with error -22
      [    6.673708] coresight-etm4x: probe of 7240000.etm failed with error -22
      [    6.680511] coresight-etm4x: probe of 7340000.etm failed with error -22
      [    6.687313] coresight-etm4x: probe of 7440000.etm failed with error -22
      [    6.694113] coresight-etm4x: probe of 7540000.etm failed with error -22
      [    6.700914] coresight-etm4x: probe of 7640000.etm failed with error -22
      [    6.707717] coresight-etm4x: probe of 7740000.etm failed with error -22
      
      With this change, etm probe is successful:
      
      / # dmesg | grep etm
      [    6.659198] coresight-etm4x 7040000.etm: CPU0: ETM v4.2 initialized
      [    6.665848] coresight-etm4x 7140000.etm: CPU1: ETM v4.2 initialized
      [    6.672493] coresight-etm4x 7240000.etm: CPU2: ETM v4.2 initialized
      [    6.679129] coresight-etm4x 7340000.etm: CPU3: ETM v4.2 initialized
      [    6.685770] coresight-etm4x 7440000.etm: CPU4: ETM v4.2 initialized
      [    6.692403] coresight-etm4x 7540000.etm: CPU5: ETM v4.2 initialized
      [    6.699024] coresight-etm4x 7640000.etm: CPU6: ETM v4.2 initialized
      [    6.705646] coresight-etm4x 7740000.etm: CPU7: ETM v4.2 initialized

      Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
      Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      2636ccec
    • N
      powerpc/xmon: Fix opcode being uninitialized in print_insn_powerpc · c70214d5
      Nathan Chancellor committed
      [ Upstream commit e7140639b1de65bba435a6bd772d134901141f86 ]
      
      When building with -Wsometimes-uninitialized, Clang warns:
      
        arch/powerpc/xmon/ppc-dis.c:157:7: warning: variable 'opcode' is used
        uninitialized whenever 'if' condition is false
        [-Wsometimes-uninitialized]
          if (cpu_has_feature(CPU_FTRS_POWER9))
              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        arch/powerpc/xmon/ppc-dis.c:167:7: note: uninitialized use occurs here
          if (opcode == NULL)
              ^~~~~~
        arch/powerpc/xmon/ppc-dis.c:157:3: note: remove the 'if' if its
        condition is always true
          if (cpu_has_feature(CPU_FTRS_POWER9))
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        arch/powerpc/xmon/ppc-dis.c:132:38: note: initialize the variable
        'opcode' to silence this warning
          const struct powerpc_opcode *opcode;
                                             ^
                                              = NULL
        1 warning generated.
      
      This warning seems to make no sense on the surface because opcode is set
      to NULL right below this statement. However, there is a comma instead of
      semicolon to end the dialect assignment, meaning that the opcode
      assignment only happens in the if statement. Properly terminate that
      line so that Clang no longer warns.
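
The effect of the stray comma can be shown in isolation. The following is an illustrative sketch, not the xmon code: struct result and both functions are hypothetical, and the pointer starts out as "stale" rather than uninitialized so that the behaviour stays defined.

```c
#include <assert.h>
#include <stddef.h>

struct result {
	int dialect;
	const char *opcode;
};

static struct result with_comma(int has_feature)
{
	struct result r = { 0, "stale" };

	if (has_feature)
		r.dialect |= 1,	/* comma: the next line joins this statement */
	r.opcode = NULL;

	return r;
}

static struct result with_semicolon(int has_feature)
{
	struct result r = { 0, "stale" };

	if (has_feature)
		r.dialect |= 1;	/* semicolon: the if body ends here */
	r.opcode = NULL;

	return r;
}
```

With the comma, the reset of opcode is part of the if body and is skipped when the condition is false; with the semicolon it always runs.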
      
      Fixes: 5b102782 ("powerpc/xmon: Enable disassembly files (compilation changes)")
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c70214d5
    • M
      kbuild: invoke syncconfig if include/config/auto.conf.cmd is missing · 638ecaf5
      Masahiro Yamada committed
      [ Upstream commit 9390dff66a52d1a60c6e517d8fa6cdbdffc83cb1 ]
      
      If include/config/auto.conf.cmd is lost for some reason, it is not
      self-healing, so the top Makefile fails to run syncconfig.
      Move include/config/auto.conf.cmd to the target side.
      
      I used a pattern rule instead of a normal rule here although it is
      a bit gross.
      
      If the rule were written with a normal rule like this,
      
        include/config/auto.conf \
        include/config/auto.conf.cmd \
        include/config/tristate.conf: $(KCONFIG_CONFIG)
                $(Q)$(MAKE) -f $(srctree)/Makefile syncconfig
      
      ... syncconfig would be executed per target.
      
      Using a pattern rule makes sure that syncconfig is executed just once
      because Make assumes the recipe will create all of the targets.
      
      Here is a quote from the GNU Make manual [1]:
      
      "Pattern rules may have more than one target. Unlike normal rules,
      this does not act as many different rules with the same prerequisites
      and recipe. If a pattern rule has multiple targets, make knows that
      the rule's recipe is responsible for making all of the targets. The
      recipe is executed only once to make all the targets. When searching
      for a pattern rule to match a target, the target patterns of a rule
      other than the one that matches the target in need of a rule are
      incidental: make worries only about giving a recipe and prerequisites
      to the file presently in question. However, when this file's recipe is
      run, the other targets are marked as having been updated themselves."
      
      [1]: https://www.gnu.org/software/make/manual/html_node/Pattern-Intro.html

      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      638ecaf5
    • B
      scsi: core: replace GFP_ATOMIC with GFP_KERNEL in scsi_scan.c · 5db10748
      Benjamin Block committed
      [ Upstream commit 1749ef00f7312679f76d5e9104c5d1e22a829038 ]
      
      We had a test-report where, under memory pressure, adding LUNs to the
      systems would fail (the tests add LUNs strictly in sequence):
      
      [ 5525.853432] scsi 0:0:1:1088045124: Direct-Access     IBM      2107900          .148 PQ: 0 ANSI: 5
      [ 5525.853826] scsi 0:0:1:1088045124: alua: supports implicit TPGS
      [ 5525.853830] scsi 0:0:1:1088045124: alua: device naa.6005076303ffd32700000000000044da port group 0 rel port 43
      [ 5525.853931] sd 0:0:1:1088045124: Attached scsi generic sg10 type 0
      [ 5525.854075] sd 0:0:1:1088045124: [sdk] Disabling DIF Type 1 protection
      [ 5525.855495] sd 0:0:1:1088045124: [sdk] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
      [ 5525.855606] sd 0:0:1:1088045124: [sdk] Write Protect is off
      [ 5525.855609] sd 0:0:1:1088045124: [sdk] Mode Sense: ed 00 00 08
      [ 5525.855795] sd 0:0:1:1088045124: [sdk] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
      [ 5525.857838]  sdk: sdk1
      [ 5525.859468] sd 0:0:1:1088045124: [sdk] Attached SCSI disk
      [ 5525.865073] sd 0:0:1:1088045124: alua: transition timeout set to 60 seconds
      [ 5525.865078] sd 0:0:1:1088045124: alua: port group 00 state A preferred supports tolusnA
      [ 5526.015070] sd 0:0:1:1088045124: alua: port group 00 state A preferred supports tolusnA
      [ 5526.015213] sd 0:0:1:1088045124: alua: port group 00 state A preferred supports tolusnA
      [ 5526.587439] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
      [ 5526.588562] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
      
      Looking at the code of scsi_alloc_sdev(), and all the calling contexts,
      there seems to be no reason to use GFP_ATOMIC here. All the different
      call-contexts use a mutex at some point, and nothing in between
      requires non-sleeping allocation, as far as I could see. Additionally,
      the code that later allocates the block queue for the device
      (scsi_mq_alloc_queue()) already uses GFP_KERNEL.
      
      There are similar allocations in two other functions,
      scsi_probe_and_add_lun() and scsi_add_lun(), that can also be done with
      GFP_KERNEL.
      
      Here are the contexts for the three functions so far:
      
          scsi_alloc_sdev()
              scsi_probe_and_add_lun()
                  scsi_sequential_lun_scan()
                      __scsi_scan_target()
                          scsi_scan_target()
                              mutex_lock()
                          scsi_scan_channel()
                              scsi_scan_host_selected()
                                  mutex_lock()
                  scsi_report_lun_scan()
                      __scsi_scan_target()
                          ...
                  __scsi_add_device()
                      mutex_lock()
                  __scsi_scan_target()
                      ...
              scsi_report_lun_scan()
                  ...
              scsi_get_host_dev()
                  mutex_lock()
      
          scsi_probe_and_add_lun()
              ...
      
          scsi_add_lun()
              scsi_probe_and_add_lun()
                  ...
      
      So replace all these, and give them a bit of a better chance to succeed,
      with more chances of reclaim.

      Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      5db10748
    • A
      powerpc/powernv/ioda: Fix locked_vm counting for memory used by IOMMU tables · 4acf7974
      Alexey Kardashevskiy committed
      [ Upstream commit 11f5acce2fa43b015a8120fa7620fa4efd0a2952 ]
      
      We store 2 multilevel tables in iommu_table - one for the hardware and
      one with the corresponding userspace addresses. Before allocating
      the tables, the iommu_table_group_ops::get_table_size() hook returns
      the combined size of the two and VFIO SPAPR TCE IOMMU driver adjusts
      the locked_vm counter correctly. When the table is actually allocated,
      the amount of allocated memory is stored in iommu_table::it_allocated_size
      and used to decrement the locked_vm counter when we release the memory
      used by the table; .get_table_size() and .create_table() calculate it
      independently but the result is expected to be the same.
      
      However, the allocator does not add the userspace table size to
      .it_allocated_size, so when we destroy the table because of VFIO PCI
      unplug (i.e. VFIO container is gone but the userspace keeps running),
      we decrement locked_vm by only half the size of the memory we are
      releasing.
      
      To make things worse, since we enabled on-demand allocation of
      indirect levels, it_allocated_size contains only the amount of memory
      actually allocated at the table creation time which can just be a
      fraction. It is not a problem with incrementing locked_vm (as
      get_table_size() value is used) but it is with decrementing.
      
      As a result, we leak locked_vm and may not be able to allocate more
      IOMMU tables after a few iterations of hotplug/unplug.
      
      This sets it_allocated_size in the pnv_pci_ioda2_ops::create_table()
      hook to what pnv_pci_ioda2_get_table_size() returns so from now on we
      have a single place which calculates the maximum memory a table can
      occupy. The original meaning of it_allocated_size is somewhat lost now
      though.
      
      We do not ditch it_allocated_size whatsoever here and we do not call
      get_table_size() from vfio_iommu_spapr_tce.c when decrementing
      locked_vm as we may have multiple IOMMU groups per container and even
      though they all are supposed to have the same get_table_size()
      implementation, there is a small chance for failure or confusion.
      
      Fixes: 090bad39 ("powerpc/powernv: Add indirect levels to it_userspace")
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      4acf7974
    • P
      usb: chipidea: Grab the (legacy) USB PHY by phandle first · 6030bcc0
      Paul Kocialkowski committed
      [ Upstream commit 68ef236274793066b9ba3154b16c0acc1c891e5c ]
      
      According to the chipidea driver bindings, the USB PHY is specified via
      the "phys" phandle node. However, this only takes effect for USB PHYs
      that use the common PHY framework. For legacy USB PHYs, a simple lookup
      based on the USB PHY type is done instead.
      
      This does not play out well when more than one USB PHY is registered,
      since the first registered PHY matching the type will always be
      returned regardless of what the driver was bound to.
      
      Fix this by looking up the PHY based on the "phys" phandle node.
      Although generic PHYs are rather matched by their "phys-name" and not
      the "phys" phandle directly, there is no helper for similar lookup on
      legacy PHYs and it's probably not worth the effort to add it.
      
      When no legacy USB PHY is found by phandle, fall back to grabbing any
      registered USB2 PHY. This ensures backward compatibility if some users
      were actually relying on this mechanism.

      Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
      Signed-off-by: Peter Chen <peter.chen@nxp.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      6030bcc0
    • E
      crypto: cavium/zip - fix collision with generic cra_driver_name · b142c797
      Eric Biggers committed
      [ Upstream commit 41798036430015ad45137db2d4c213cd77fd0251 ]
      
      The cavium/zip implementation of the deflate compression algorithm is
      incorrectly being registered under the generic driver name, which
      prevents the generic implementation from being registered with the
      crypto API when CONFIG_CRYPTO_DEV_CAVIUM_ZIP=y.  Similarly the lzs
      algorithm (which does not currently have a generic implementation...)
      is incorrectly being registered as lzs-generic.
      
      Fix the naming collision by adding a suffix "-cavium" to the
      cra_driver_name of the cavium/zip algorithms.
      
      Fixes: 640035a2 ("crypto: zip - Add ThunderX ZIP driver core")
      Cc: Mahipal Challa <mahipalreddy2006@gmail.com>
      Cc: Jan Glauber <jglauber@cavium.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b142c797
    • J
      crypto: crypto4xx - add missing of_node_put after of_device_is_available · d401d121
      Julia Lawall committed
      [ Upstream commit 8c2b43d2d85b48a97d2f8279278a4aac5b45f925 ]
      
      Add an of_node_put when a tested device node is not available.
      
      The semantic patch that fixes this problem is as follows
      (http://coccinelle.lip6.fr):
      
      // <smpl>
      @@
      identifier f;
      local idexpression e;
      expression x;
      @@
      
      e = f(...);
      ... when != of_node_put(e)
          when != x = e
          when != e = x
          when any
      if (<+...of_device_is_available(e)...+>) {
        ... when != of_node_put(e)
      (
        return e;
      |
      + of_node_put(e);
        return ...;
      )
      }
      // </smpl>
      
      Fixes: 5343e674 ("crypto4xx: integrate ppc4xx-rng into crypto4xx")
      Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d401d121
    • W
      mt76: fix a leaked reference by adding a missing of_node_put · 241ebd2e
      Wen Yang committed
      [ Upstream commit 34e022d8b780a03902d82fb3997ba7c7b1f40c81 ]
      
      The call to of_find_node_by_phandle returns a node pointer with refcount
      incremented thus it must be explicitly decremented after the last
      usage.
      
      Detected by coccinelle with the following warnings:
      ./drivers/net/wireless/mediatek/mt76/eeprom.c:58:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 48, but without a corresponding object release within this function.
      ./drivers/net/wireless/mediatek/mt76/eeprom.c:61:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 48, but without a corresponding object release within this function.
      ./drivers/net/wireless/mediatek/mt76/eeprom.c:67:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 48, but without a corresponding object release within this function.
      ./drivers/net/wireless/mediatek/mt76/eeprom.c:70:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 48, but without a corresponding object release within this function.
      ./drivers/net/wireless/mediatek/mt76/eeprom.c:72:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 48, but without a corresponding object release within this function.
      
      Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
      Cc: Felix Fietkau <nbd@nbd.name>
      Cc: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
      Cc: Kalle Valo <kvalo@codeaurora.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Matthias Brugger <matthias.bgg@gmail.com>
      Cc: linux-wireless@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-mediatek@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • A
      wil6210: check null pointer in _wil_cfg80211_merge_extra_ies · 6115055b
      Alexei Avshalom Lazar committed
      [ Upstream commit de77a53c2d1e8fb3621e63e8e1f0f0c9a1a99ff7 ]
      
      ies1 or ies2 might be NULL when the code inside
      _wil_cfg80211_merge_extra_ies() accesses them.
      Add an explicit NULL check and make sure ies1/ies2 are not
      accessed in that case.
      
      spos might also be NULL when it is accessed inside
      _wil_cfg80211_merge_extra_ies().
      Add an explicit NULL check to the while condition
      and make sure spos is not accessed in that case.
      
      Signed-off-by: Alexei Avshalom Lazar <ailizaro@codeaurora.org>
      Signed-off-by: Maya Erez <merez@codeaurora.org>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • R
      PCI/PME: Fix hotplug/sysfs remove deadlock in pcie_pme_remove() · 9546c366
      Rafael J. Wysocki committed
      [ Upstream commit 95c80bc6952b6a5badc7b702d23e5bf14d251e7c ]
      
      Dongdong reported a deadlock triggered by a hotplug event during a sysfs
      "remove" operation:
      
        pciehp 0000:00:0c.0:pcie004: Slot(0-1): Link Up
        # echo 1 > 0000:00:0c.0/remove
      
        PME and hotplug share an MSI/MSI-X vector.  The sysfs "remove" side is:
      
          remove_store
            pci_stop_and_remove_bus_device_locked
              pci_lock_rescan_remove
              pci_stop_and_remove_bus_device
                ...
                pcie_pme_remove
                  pcie_pme_suspend
                    synchronize_irq        # wait for hotplug IRQ handler
              pci_unlock_rescan_remove
      
        The hotplug side is:
      
          pciehp_ist
            pciehp_handle_presence_or_link_change
              pciehp_configure_device
                pci_lock_rescan_remove     # wait for pci_unlock_rescan_remove()
      
        INFO: task bash:10913 blocked for more than 120 seconds.
      
        # ps -ax |grep D
         PID TTY      STAT   TIME COMMAND
        10913 ttyAMA0  Ds+    0:00 -bash
        14022 ?        D      0:00 [irq/745-pciehp]
      
        # cat /proc/14022/stack
        __switch_to+0x94/0xd8
        pci_lock_rescan_remove+0x20/0x28
        pciehp_configure_device+0x30/0x140
        pciehp_handle_presence_or_link_change+0x324/0x458
        pciehp_ist+0x1dc/0x1e0
      
        # cat /proc/10913/stack
        __switch_to+0x94/0xd8
        synchronize_irq+0x8c/0xc0
        pcie_pme_suspend+0xa4/0x118
        pcie_pme_remove+0x20/0x40
        pcie_port_remove_service+0x3c/0x58
        ...
        pcie_port_device_remove+0x2c/0x48
        pcie_portdrv_remove+0x68/0x78
        pci_device_remove+0x48/0x120
        ...
        pci_stop_bus_device+0x84/0xc0
        pci_stop_and_remove_bus_device_locked+0x24/0x40
        remove_store+0xa4/0xb8
        dev_attr_store+0x44/0x60
        sysfs_kf_write+0x58/0x80
      
      It is incorrect to call pcie_pme_suspend() from pcie_pme_remove() for two
      reasons.
      
      First, pcie_pme_suspend() calls synchronize_irq(), which will wait for the
      native hotplug interrupt handler as well as for the PME one, because they
      share one IRQ (as per the spec).  That may deadlock if hotplug is signaled
      while pcie_pme_remove() is running and the latter calls
      pci_lock_rescan_remove() before the former.
      
      Second, if pcie_pme_suspend() figures out that wakeup needs to be enabled
      for the port, it will return without disabling the interrupt as expected by
      pcie_pme_remove() which was overlooked by commit c7b5a4e6 ("PCI / PM:
      Fix native PME handling during system suspend/resume").
      
      To fix that, rework pcie_pme_remove() to disable the PME interrupt, clear
      its status and prevent the PME worker function from re-enabling it before
      calling free_irq() on it, which should be sufficient.
      
      Fixes: c7b5a4e6 ("PCI / PM: Fix native PME handling during system suspend/resume")
      Link: https://lore.kernel.org/linux-pci/c7697e7c-e1af-13e4-8491-0a3996e6ab5d@huawei.com
      Reported-by: Dongdong Liu <liudongdong3@huawei.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      [bhelgaas: add URL and deadlock details from Dongdong]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • T
      tools lib traceevent: Fix buffer overflow in arg_eval · 224c996e
      Tony Jones committed
      [ Upstream commit 7c5b019e3a638a5a290b0ec020f6ca83d2ec2aaa ]
      
      Fix buffer overflow observed when running perf test.
      
      The overflow is when trying to evaluate "1ULL << (64 - 1)" which is
      resulting in -9223372036854775808 which overflows the 20 character
      buffer.
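      
      The arithmetic can be checked with a small standalone sketch: the
      value prints as a sign plus 19 digits, so it needs 20 characters and
      a 21st byte for the terminating NUL. decimal_len() and format_ll()
      are hypothetical helpers written for this illustration, not
      libtraceevent functions.

```c
#include <limits.h>
#include <stdio.h>

/*
 * Number of characters (excluding the terminating NUL) needed to print
 * a long long in decimal. snprintf() with a zero-size buffer only
 * computes the length and writes nothing.
 */
static int decimal_len(long long v)
{
	return snprintf(NULL, 0, "%lld", v);
}

/* Format into a caller-supplied buffer; returns 0 if it fit, -1 if truncated. */
static int format_ll(char *buf, size_t size, long long v)
{
	int n = snprintf(buf, size, "%lld", v);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

      With LLONG_MIN as input, a 20-byte buffer is one byte short, while
      any buffer of at least 21 bytes is enough.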
      
      It is possible this bug has been reported before, but I still don't see
      any fix checked in:
      
      See: https://www.spinics.net/lists/linux-perf-users/msg07714.html
      
      Reported-by: Michael Sartain <mikesart@fastmail.com>
      Reported-by: Mathias Krause <minipli@googlemail.com>
      Signed-off-by: Tony Jones <tonyj@suse.de>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Fixes: f7d82350 ("tools/events: Add files to create libtraceevent.a")
      Link: http://lkml.kernel.org/r/20190228015532.8941-1-tonyj@suse.de
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • C
      fs: fix guard_bio_eod to check for real EOD errors · 83c39533
      Carlos Maiolino committed
      [ Upstream commit dce30ca9e3b676fb288c33c1f4725a0621361185 ]
      
      guard_bio_eod() can truncate a segment in bio to allow it to do IO on
      odd last sectors of a device.
      
      It already checks if the IO starts past EOD, but it does not consider
      the possibility that an IO request starting within device boundaries
      can contain more than one segment past EOD.
      
      In such cases, truncated_bytes can be bigger than PAGE_SIZE, and will
      underflow bvec->bv_len.
      
      Fix this by checking if truncated_bytes is lower than PAGE_SIZE.
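      
      The underflow is ordinary unsigned wraparound, which a userspace
      sketch can show. shrink_segment() is a hypothetical helper, and
      comparing against the segment length (rather than PAGE_SIZE) is a
      simplification made for this example only.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: shrink the last segment of a request by
 * truncated_bytes. Both values are unsigned, so subtracting a larger
 * value would wrap around to a huge length instead of going negative.
 * Returns false (leaving *seg_len untouched) when the truncation does
 * not fit inside the segment.
 */
static bool shrink_segment(unsigned int *seg_len, unsigned int truncated_bytes)
{
	if (truncated_bytes > *seg_len)
		return false;
	*seg_len -= truncated_bytes;
	return true;
}
```

      Without the guard, truncating 8192 bytes from a 4096-byte segment
      would leave a length close to UINT_MAX rather than an error.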
      
      This situation has been found on filesystems such as isofs and vfat,
      which don't check the device size before mount. If the device is
      smaller than the filesystem itself, a readahead on such a filesystem
      that spans EOD can trigger this situation, leading to a call to
      zero_user() with a wrong size, possibly corrupting memory.
      
      I didn't see any crash, and didn't let the system run long enough to
      check whether memory corruption is hit somewhere, but adding
      instrumentation to guard_bio_eod() to check the size of
      truncated_bytes was enough to see the error.
      
      The following script can trigger the error.
      
      MNT=/mnt
      IMG=./DISK.img
      DEV=/dev/loop0
      
      mkfs.vfat $IMG
      mount $IMG $MNT
      cp -R /etc $MNT &> /dev/null
      umount $MNT
      
      losetup -D
      
      losetup --find --show --sizelimit 16247280 $IMG
      mount $DEV $MNT
      
      find $MNT -type f -exec cat {} + >/dev/null
      
      Kudos to Eric Sandeen for coming up with the reproducer above.
      
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • L
      jbd2: fix invalid descriptor block checksum · 6a817a7a
      luojiajun committed
      [ Upstream commit 6e876c3dd205d30b0db6850e97a03d75457df007 ]
      
      In jbd2_journal_commit_transaction(), if we are in abort mode,
      we may flush the buffer without setting the descriptor block checksum
      by jumping to start_journal_io. When the fs is mounted afterwards,
      jbd2_descriptor_block_csum_verify() fails.
      
      [  271.379811] EXT4-fs (vdd): shut down requested (2)
      [  271.381827] Aborting journal on device vdd-8.
      [  271.597136] JBD2: Invalid checksum recovering block 22199 in log
      [  271.598023] JBD2: recovery failed
      [  271.598484] EXT4-fs (vdd): error loading journal
      
      Fix this problem by always setting the descriptor block checksum if the
      descriptor buffer is not NULL.
      
      This checksum problem can be reproduced by xfstests generic/388.
      
      Signed-off-by: luojiajun <luojiajun3@huawei.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • F
      netfilter: conntrack: tcp: only close if RST matches exact sequence · ca66f667
      Florian Westphal committed
      [ Upstream commit be0502a3f2e94211a8809a09ecbc3a017189b8fb ]
      
      TCP resets cause instant transition from established to closed state
      provided the reset is in-window.  Endpoints that implement RFC 5961
      require resets to match the next expected sequence number.
      RST segments that are in-window (but that do not match RCV.NXT) are
      ignored, and a "challenge ACK" is sent back.
      
      The main problem for conntrack is that it's a middlebox, i.e. whereas an end
      host might have ACK'd SEQ (and would thus accept an RST with this
      sequence number), conntrack might not have seen this ACK (yet).
      
      Therefore we can't simply flag RSTs with non-exact match as invalid.
      
      This updates RST processing as follows:
      
      1. If the connection is in a state other than ESTABLISHED, nothing is
         changed, RST is subject to normal in-window check.
      
      2. If the RST's sequence number exactly matches RCV.NXT, the
         connection state moves to CLOSE.
      
      3. The same applies if the RST sequence number aligns with a previous
         packet in the same direction.
      
      In all other cases, the connection remains in ESTABLISHED state.
      If the normal-in-window check passes, the timeout will be lowered
      to that of CLOSE.
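      
      The rules above can be condensed into a small decision function. This
      is a simplified sketch under stated assumptions, not the conntrack
      code: the enums, classify_rst() and its parameters are hypothetical,
      the window check is reduced to a boolean, and any in-window RST
      outside ESTABLISHED is treated as closing the connection.

```c
#include <stdbool.h>
#include <stdint.h>

enum ct_state { CT_ESTABLISHED, CT_OTHER };

/* Hypothetical verdicts for an RST seen by the tracker. */
enum rst_verdict {
	RST_INVALID,		/* out of window: ignore */
	RST_CLOSE,		/* move the connection to CLOSE */
	RST_KEEP_SHORT_TIMEOUT,	/* stay ESTABLISHED, use the CLOSE timeout */
};

static enum rst_verdict classify_rst(enum ct_state state, bool in_window,
				     uint32_t seq, uint32_t rcv_nxt,
				     uint32_t last_seq_same_dir)
{
	if (!in_window)
		return RST_INVALID;
	/* Rule 1: outside ESTABLISHED only the in-window check applies. */
	if (state != CT_ESTABLISHED)
		return RST_CLOSE;
	/* Rules 2 and 3: exact match, or match with an earlier packet. */
	if (seq == rcv_nxt || seq == last_seq_same_dir)
		return RST_CLOSE;
	/* Otherwise: in-window but inexact. */
	return RST_KEEP_SHORT_TIMEOUT;
}
```

      For an established connection expecting sequence 1001, an in-window
      RST with sequence 1010 keeps the state and shortens the timeout,
      while an RST with sequence 1001 closes the connection.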
      
      If the peer sends a challenge ack, connection timeout will be reset.
      
      If the challenge ACK triggers another RST (RST was valid after all),
      this 2nd RST will match expected sequence and conntrack state changes to
      CLOSE.
      
      If no challenge ACK is received, the connection will time out after
      CLOSE seconds (10 seconds by default), just like without this patch.
      
      Packetdrill test case:
      
      0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      0.000 bind(3, ..., ...) = 0
      0.000 listen(3, 1) = 0
      
      0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
      0.100 > S. 0:0(0) ack 1 win 64240 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      0.200 < . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
      
      // Receive a segment.
      0.210 < P. 1:1001(1000) ack 1 win 46
      0.210 > . 1:1(0) ack 1001
      
      // Application writes 1000 bytes.
      0.250 write(4, ..., 1000) = 1000
      0.250 > P. 1:1001(1000) ack 1001
      
      // First reset, old sequence. Conntrack (correctly) considers this
      // invalid due to failed window validation (regardless of this patch).
      0.260 < R  2:2(0) ack 1001 win 260
      
      // 2nd reset, but too far ahead sequence.  Same: correctly handled
      // as invalid.
      0.270 < R 99990001:99990001(0) ack 1001 win 260
      
      // in-window, but not exact sequence.
      // Current Linux kernels might reply with a challenge ack, and do not
      // remove connection.
      // Without this patch, conntrack state moves to CLOSE.
      // With patch, timeout is lowered like CLOSE, but connection stays
      // in ESTABLISHED state.
      0.280 < R 1010:1010(0) ack 1001 win 260
      
      // Expect challenge ACK
      0.281 > . 1001:1001(0) ack 1001 win 501
      
      // With or without this patch, RST will cause connection
      // to move to CLOSE (sequence number matches)
      // 0.282 < R 1001:1001(0) ack 1001 win 260
      
      // ACK
      0.300 < . 1001:1001(0) ack 1001 win 257
      
      // more data could be exchanged here, connection
      // is still established
      
      // Client closes the connection.
      0.610 < F. 1001:1001(0) ack 1001 win 260
      0.650 > . 1001:1001(0) ack 1002
      
      // Close the connection without reading outstanding data
      0.700 close(4) = 0
      
      // so one more reset.  Will be deemed acceptable with patch as well:
      // connection is already closing.
      0.701 > R. 1001:1001(0) ack 1002 win 501
      // End packetdrill test case.
      
      With patch, this generates following conntrack events:
         [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [UNREPLIED]
      [UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80
      [UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
      [UPDATE] 120 FIN_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
      [UPDATE] 60 CLOSE_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
      [UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
      
      Without patch, first RST moves connection to close, whereas socket state
      does not change until FIN is received.
         [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [UNREPLIED]
      [UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80
      [UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]
      [UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]
      
      Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • L
      netfilter: nf_tables: check the result of dereferencing base_chain->stats · 709aaa09
      Li RongQing committed
      [ Upstream commit a9f5e78c403d2d62ade4f4c85040efc85f4049b8 ]
      
      Check the result of dereferencing base_chain->stats, instead of result
      of this_cpu_ptr with NULL.
      
      base_chain->stats may be changed to NULL when a chain is updated and a
      new NULL counter is attached.
      
      We do not need to check the return value of this_cpu_ptr: if
      base_chain->stats is non-NULL it comes from the percpu allocator, so
      this_cpu_ptr returns a valid pointer.
      
      Also fix two sparse errors by replacing rcu_access_pointer and
      rcu_dereference with READ_ONCE under rcu_read_lock.
      
      Thanks for Eric's help to finish this patch.
      
      Fixes: 00924094 ("netfilter: nf_tables: don't assume chain stats are set when jumplabel is set")
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
      Signed-off-by: Li RongQing <lirongqing@baidu.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Y
      cifs: Fix NULL pointer dereference of devname · 36a3219e
      Yao Liu committed
      [ Upstream commit 68e2672f8fbd1e04982b8d2798dd318bf2515dd2 ]
      
      There is a NULL pointer dereference of devname in strspn()
      
      The oops looks something like:
      
        CIFS: Attempting to mount (null)
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
        ...
        RIP: 0010:strspn+0x0/0x50
        ...
        Call Trace:
         ? cifs_parse_mount_options+0x222/0x1710 [cifs]
         ? cifs_get_volume_info+0x2f/0x80 [cifs]
         cifs_setup_volume_info+0x20/0x190 [cifs]
         cifs_get_volume_info+0x50/0x80 [cifs]
         cifs_smb3_do_mount+0x59/0x630 [cifs]
         ? ida_alloc_range+0x34b/0x3d0
         cifs_do_mount+0x11/0x20 [cifs]
         mount_fs+0x52/0x170
         vfs_kern_mount+0x6b/0x170
         do_mount+0x216/0xdc0
         ksys_mount+0x83/0xd0
         __x64_sys_mount+0x25/0x30
         do_syscall_64+0x65/0x220
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fix this by adding a NULL check on devname in cifs_parse_devname()
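      
      A minimal userspace sketch of the guard follows. leading_delims() is
      a hypothetical helper, not the cifs function, and rejecting an empty
      string as well as NULL is an extra assumption of this example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical parser entry point: count the leading path delimiters of
 * a device name such as "//server/share". strspn() must not be handed a
 * NULL pointer, so the check comes first and reports an error instead.
 * Returns the number of leading delimiters, or -1 for a missing name.
 */
static int leading_delims(const char *devname)
{
	if (devname == NULL || *devname == '\0')
		return -1;
	return (int)strspn(devname, "/\\");
}
```

      A caller can then turn the -1 into a mount error instead of
      dereferencing a missing device name.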
      
      Signed-off-by: Yao Liu <yotta.liu@ucloud.cn>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • N
      cifs: Accept validate negotiate if server return NT_STATUS_NOT_SUPPORTED · d579b4ea
      Namjae Jeon committed
      [ Upstream commit 969ae8e8d4ee54c99134d3895f2adf96047f5bee ]
      
      Older Windows versions and NetApp SMB servers return
      NT_STATUS_NOT_SUPPORTED since they do not allow or implement
      FSCTL_VALIDATE_NEGOTIATE_INFO. The client should accept the response
      provided it's properly signed.
      
      See
      https://blogs.msdn.microsoft.com/openspecification/2012/06/28/smb3-secure-dialect-negotiation/
      
      and
      
      MS-SMB2 validate negotiate response processing:
      https://msdn.microsoft.com/en-us/library/hh880630.aspx
      
      Samba client had already handled it.
      https://bugzilla.samba.org/attachment.cgi?id=13285&action=edit
      
      Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • C
      f2fs: fix to check inline_xattr_size boundary correctly · 4ab78f4d
      Chao Yu committed
      [ Upstream commit 500e0b28ecd3c5aade98f3c3a339d18dcb166bb6 ]
      
      We use the condition below to check the inline_xattr_size boundary:
      
      	if (!F2FS_OPTION(sbi).inline_xattr_size ||
      		F2FS_OPTION(sbi).inline_xattr_size >=
      				DEF_ADDRS_PER_INODE -
      				F2FS_TOTAL_EXTRA_ATTR_SIZE -
      				DEF_INLINE_RESERVED_SIZE -
      				DEF_MIN_INLINE_SIZE)
      
      There are three problems in that check:
      - we should allow inline_xattr_size to equal the min size of the inline
      {data,dentry} area.
      - F2FS_TOTAL_EXTRA_ATTR_SIZE and inline_xattr_size are based on
      different size units: the former is 4 bytes, the latter is 1 byte.
      - DEF_MIN_INLINE_SIZE only indicates the min size of the inline data area;
      however, we need to consider the min size of the inline dentry area as
      well. A minimal inline dentry should contain at least two entries, '.'
      and '..', so the min inline_dentry size is 40 bytes.
      
      .bitmap		1 * 1 = 1
      .reserved	1 * 1 = 1
      .dentry		11 * 2 = 22
      .filename	8 * 2 = 16
      total		40
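      
      The breakdown above can be written out as a short calculation. The
      constant names and min_inline_dentry_bytes() are hypothetical and
      only mirror the rows of the table; they are not f2fs macros.

```c
/* Sizes in bytes, taken from the breakdown in the commit message. */
enum {
	BITMAP_BYTES   = 1,	/* .bitmap row */
	RESERVED_BYTES = 1,	/* .reserved row */
	DENTRY_BYTES   = 11,	/* per directory entry */
	FILENAME_BYTES = 8,	/* per filename slot */
	MIN_ENTRIES    = 2,	/* "." and ".." */
};

/* Minimum inline dentry area, in bytes. */
static int min_inline_dentry_bytes(void)
{
	return BITMAP_BYTES + RESERVED_BYTES +
	       DENTRY_BYTES * MIN_ENTRIES +
	       FILENAME_BYTES * MIN_ENTRIES;
}
```

      Summing the rows this way gives 1 + 1 + 22 + 16 = 40 bytes.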
      
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>