1. 17 Dec, 2018 40 commits
    • RDMA/rdmavt: Fix rvt_create_ah function signature · 8653ffc3
      Kamal Heib committed
      [ Upstream commit 4f32fb921b153ae9ea280e02a3e91509fffc03d3 ]
      
      rdmavt uses a crazy system that loses the type checking when assigning
      functions to struct ib_device function pointers. Because of this the
      signature to this function was not changed when the below commit revised
      things.
      
      Fix the signature so we are not calling a function pointer with a
      mismatched signature.
      
      Fixes: 477864c8 ("IB/core: Let create_ah return extended response to user")
      Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
      Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
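
The hazard in the rdmavt commit above can be shown in plain C. This is a minimal sketch, not the rdmavt code: `struct dev_ops`, `struct udata`, `demo_create_ah` and `demo_register` are hypothetical names, and the commit does not spell out how rdmavt loses the check, so the cast mentioned in the comment is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real ib_device or rdmavt types. */
struct udata { int len; };

struct dev_ops {
	/* The "revised" callback type takes an extra udata argument. */
	int (*create_ah)(int pd, int attr, struct udata *udata);
};

/* Correct: the definition matches the pointer type exactly. */
static int demo_create_ah(int pd, int attr, struct udata *udata)
{
	return pd + attr + (udata ? udata->len : 0);
}

/*
 * Assigning without a cast lets the compiler compare the signatures.
 * A stale two-argument function would be diagnosed here, whereas
 * routing the assignment through a cast or a void pointer would hide
 * the mismatch until the call misbehaves at run time.
 */
static void demo_register(struct dev_ops *ops)
{
	assert(ops != NULL);
	ops->create_ah = demo_create_ah;
}
```

Keeping the assignment direct means a later change to the callback type breaks the build instead of silently producing a mismatched call.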
    • RDMA/bnxt_re: Avoid accessing the device structure after it is freed · 59315d0c
      Selvin Xavier committed
      [ Upstream commit a6c66d6a08b88cc10aca9d3f65cfae31e7652a99 ]
      
      When bnxt_re_ib_reg returns failure, the device structure gets
      freed. Driver tries to access the device pointer
      after it is freed.
      
      [ 4871.034744] Failed to register with netedev: 0xffffffa1
      [ 4871.034765] infiniband (null): Failed to register with IB: 0xffffffea
      [ 4871.046430] ==================================================================
      [ 4871.046437] BUG: KASAN: use-after-free in bnxt_re_task+0x63/0x180 [bnxt_re]
      [ 4871.046439] Write of size 4 at addr ffff880fa8406f48 by task kworker/u48:2/17813
      
      [ 4871.046443] CPU: 20 PID: 17813 Comm: kworker/u48:2 Kdump: loaded Tainted: G B OE  4.20.0-rc1+ #42
      [ 4871.046444] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.0.4 08/28/2014
      [ 4871.046447] Workqueue: bnxt_re bnxt_re_task [bnxt_re]
      [ 4871.046449] Call Trace:
      [ 4871.046454]  dump_stack+0x91/0xeb
      [ 4871.046458]  print_address_description+0x6a/0x2a0
      [ 4871.046461]  kasan_report+0x176/0x2d0
      [ 4871.046463]  ? bnxt_re_task+0x63/0x180 [bnxt_re]
      [ 4871.046466]  bnxt_re_task+0x63/0x180 [bnxt_re]
      [ 4871.046470]  process_one_work+0x216/0x5b0
      [ 4871.046471]  ? process_one_work+0x189/0x5b0
      [ 4871.046475]  worker_thread+0x4e/0x3d0
      [ 4871.046479]  kthread+0x10e/0x140
      [ 4871.046480]  ? process_one_work+0x5b0/0x5b0
      [ 4871.046482]  ? kthread_stop+0x220/0x220
      [ 4871.046486]  ret_from_fork+0x3a/0x50
      
      [ 4871.046492] The buggy address belongs to the page:
      [ 4871.046494] page:ffffea003ea10180 count:0 mapcount:0 mapping:0000000000000000 index:0x0
      [ 4871.046495] flags: 0x57ffffc0000000()
      [ 4871.046498] raw: 0057ffffc0000000 0000000000000000 ffffea003ea10188 0000000000000000
      [ 4871.046500] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
      [ 4871.046501] page dumped because: kasan: bad access detected
      
      Avoid accessing the device structure once it is freed.
      
      Fixes: 497158aa ("RDMA/bnxt_re: Fix the ib_reg failure cleanup")
      Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • RDMA/bnxt_re: Fix system hang when registration with L2 driver fails · f4515855
      Selvin Xavier committed
      [ Upstream commit 3c4b1419c33c2417836a63f8126834ee36968321 ]
      
      The driver doesn't release the rtnl lock if registration with the
      L2 driver (bnxt_re_register_netdev) fails, and this causes a
      hang on the next attempt to take the lock.
      
      [  371.635416] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  371.635417] kworker/u48:1   D    0   634      2 0x80000000
      [  371.635423] Workqueue: bnxt_re bnxt_re_task [bnxt_re]
      [  371.635424] Call Trace:
      [  371.635426]  ? __schedule+0x36b/0xbd0
      [  371.635429]  schedule+0x39/0x90
      [  371.635430]  schedule_preempt_disabled+0x11/0x20
      [  371.635431]  __mutex_lock+0x45b/0x9c0
      [  371.635433]  ? __mutex_lock+0x16d/0x9c0
      [  371.635435]  ? bnxt_re_ib_reg+0x2b/0xb30 [bnxt_re]
      [  371.635438]  ? wake_up_klogd+0x37/0x40
      [  371.635442]  bnxt_re_ib_reg+0x2b/0xb30 [bnxt_re]
      [  371.635447]  bnxt_re_task+0xfd/0x180 [bnxt_re]
      [  371.635449]  process_one_work+0x216/0x5b0
      [  371.635450]  ? process_one_work+0x189/0x5b0
      [  371.635453]  worker_thread+0x4e/0x3d0
      [  371.635455]  kthread+0x10e/0x140
      [  371.635456]  ? process_one_work+0x5b0/0x5b0
      [  371.635458]  ? kthread_stop+0x220/0x220
      [  371.635460]  ret_from_fork+0x3a/0x50
      [  371.635477] INFO: task NetworkManager:1228 blocked for more than 120 seconds.
      [  371.635478]       Tainted: G    B      OE     4.20.0-rc1+ #42
      [  371.635479] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      
      Release the rtnl_lock correctly in the failure path.
      
      Fixes: de5c95d0 ("RDMA/bnxt_re: Fix system crash during RDMA resource initialization")
      Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
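
The failure-path rule from the bnxt_re hang fix can be sketched in user-space C. Everything here is hypothetical: `struct demo_lock` stands in for the rtnl lock, and `demo_register_netdev` and `demo_ib_reg` only mimic the shape of a registration routine. It is an illustration of the pattern, not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock; stands in for rtnl_lock()/rtnl_unlock(). */
struct demo_lock { int depth; };

static void demo_lock_acquire(struct demo_lock *l) { l->depth++; }

static void demo_lock_release(struct demo_lock *l)
{
	assert(l->depth > 0);
	l->depth--;
}

/* Hypothetical registration step that can be told to fail. */
static int demo_register_netdev(bool fail) { return fail ? -1 : 0; }

static int demo_ib_reg(struct demo_lock *l, bool fail)
{
	int rc;

	demo_lock_acquire(l);
	rc = demo_register_netdev(fail);
	if (rc)
		goto out_unlock;	/* never return with the lock held */

	/* ... further setup would go here ... */
out_unlock:
	demo_lock_release(l);
	return rc;
}
```

Routing every exit through a single unlock label makes it hard to add a new early return that forgets the lock.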
    • RDMA/core: Add GIDs while changing MAC addr only for registered ndev · 5a49ef98
      Parav Pandit committed
      [ Upstream commit d52ef88a9f4be523425730da3239cf87bee936da ]
      
      Currently when MAC address is changed, regardless of the netdev reg_state,
      GID entries are removed and added to reflect the new MAC address and new
      default GID entries.
      
      When a bonding device is used and the underlying PCI device is removed
      several netdevice events are generated. Two events of interest are the
      CHANGEADDR and UNREGISTER events on the lower (slave) netdevice of the
      bond netdevice.
      
      Sometimes CHANGEADDR event is generated when netdev state is
      UNREGISTERING (after UNREGISTER event is generated). In this scenario, GID
      entries for default GIDs are added and never deleted because GID entries
      are deleted only when netdev state is < UNREGISTERED.
      
      This leads to non zero reference count on the netdevice. Due to this, PCI
      device unbind operation is getting stuck.
      
      To avoid it, when changing mac address, add GID entries only if netdev is
      in REGISTERED state.
      
      Fixes: 03db3a2d ("IB/core: Add RoCE GID table management")
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • RDMA/mlx5: Fix fence type for IB_WR_LOCAL_INV WR · 7c736fee
      Majd Dibbiny committed
      [ Upstream commit 074fca3a18e7e1e0d4d7dcc9d7badc43b90232f4 ]
      
      Currently, for IB_WR_LOCAL_INV WR, when the next fence is None, the
      current fence will be SMALL instead of Normal Fence.
      
      Without this patch, krping doesn't work on CX-5 devices and throws
      the following error.

      The error messages from the CX5 driver (server side) are:
      [ 710.434014] mlx5_0:dump_cqe:278:(pid 2712): dump error cqe
      [ 710.434016] 00000000 00000000 00000000 00000000
      [ 710.434016] 00000000 00000000 00000000 00000000
      [ 710.434017] 00000000 00000000 00000000 00000000
      [ 710.434018] 00000000 93003204 100000b8 000524d2
      [ 710.434019] krping: cq completion failed with wr_id 0 status 4 opcode 128 vender_err 32
      
      Fixed the logic to set the correct fence type.
      
      Fixes: 6e8484c5 ("RDMA/mlx5: set UMR wqe fence according to HCA cap")
      Signed-off-by: Majd Dibbiny <majd@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Btrfs: send, fix infinite loop due to directory rename dependencies · 91f6a9aa
      Robbie Ko committed
      [ Upstream commit a4390aee72713d9e73f1132bcdeb17d72fbbf974 ]
      
      When doing an incremental send, due to the need of delaying directory move
      (rename) operations we can end up in an infinite loop at
      apply_children_dir_moves().
      
      An example scenario that triggers this problem is described below, where
      directory names correspond to the numbers of their respective inodes.
      
      Parent snapshot:
      
       .
       |--- 261/
             |--- 271/
                   |--- 266/
                         |--- 259/
                         |--- 260/
                         |     |--- 267
                         |
                         |--- 264/
                         |     |--- 258/
                         |           |--- 257/
                         |
                         |--- 265/
                         |--- 268/
                         |--- 269/
                         |     |--- 262/
                         |
                         |--- 270/
                         |--- 272/
                         |     |--- 263/
                         |     |--- 275/
                         |
                         |--- 274/
                               |--- 273/
      
      Send snapshot:
      
       .
       |-- 275/
            |-- 274/
                 |-- 273/
                      |-- 262/
                           |-- 269/
                                |-- 258/
                                     |-- 271/
                                          |-- 268/
                                               |-- 267/
                                                    |-- 270/
                                                         |-- 259/
                                                         |    |-- 265/
                                                         |
                                                         |-- 272/
                                                              |-- 257/
                                                                   |-- 260/
                                                                   |-- 264/
                                                                        |-- 263/
                                                                             |-- 261/
                                                                                  |-- 266/
      
      When processing inode 257 we delay its move (rename) operation because its
      new parent in the send snapshot, inode 272, was not yet processed. Then
      when processing inode 272, we delay the move operation for that inode
      because inode 274 is its ancestor in the send snapshot. Finally we delay
      the move operation for inode 274 when processing it because inode 275 is
      its new parent in the send snapshot and was not yet moved.
      
      When finishing processing inode 275, we start to do the move operations
      that were previously delayed (at apply_children_dir_moves()), resulting in
      the following iterations:
      
      1) We issue the move operation for inode 274;
      
      2) Because inode 262 depended on the move operation of inode 274 (it was
         delayed because 274 is its ancestor in the send snapshot), we issue the
         move operation for inode 262;
      
      3) We issue the move operation for inode 272, because it was delayed by
         inode 274 too (ancestor of 272 in the send snapshot);
      
      4) We issue the move operation for inode 269 (it was delayed by 262);
      
      5) We issue the move operation for inode 257 (it was delayed by 272);
      
      6) We issue the move operation for inode 260 (it was delayed by 272);
      
      7) We issue the move operation for inode 258 (it was delayed by 269);
      
      8) We issue the move operation for inode 264 (it was delayed by 257);
      
      9) We issue the move operation for inode 271 (it was delayed by 258);
      
      10) We issue the move operation for inode 263 (it was delayed by 264);
      
      11) We issue the move operation for inode 268 (it was delayed by 271);
      
      12) We verify if we can issue the move operation for inode 270 (it was
          delayed by 271). We detect a path loop in the current state, because
          inode 267 needs to be moved first before we can issue the move
          operation for inode 270. So we delay again the move operation for
          inode 270, this time we will attempt to do it after inode 267 is
          moved;
      
      13) We issue the move operation for inode 261 (it was delayed by 263);
      
      14) We verify if we can issue the move operation for inode 266 (it was
          delayed by 263). We detect a path loop in the current state, because
          inode 270 needs to be moved first before we can issue the move
          operation for inode 266. So we delay again the move operation for
          inode 266, this time we will attempt to do it after inode 270 is
          moved (its move operation was delayed in step 12);
      
      15) We issue the move operation for inode 267 (it was delayed by 268);
      
      16) We verify if we can issue the move operation for inode 266 (it was
          delayed by 270). We detect a path loop in the current state, because
          inode 270 needs to be moved first before we can issue the move
          operation for inode 266. So we delay again the move operation for
          inode 266, this time we will attempt to do it after inode 270 is
          moved (its move operation was delayed in step 12). So here we added
          again the same delayed move operation that we added in step 14;
      
      17) We attempt again to see if we can issue the move operation for inode
          266, and as in step 16, we realize we can not due to a path loop in
          the current state due to a dependency on inode 270. Again we delay
          inode 266's rename to happen after inode 270's move operation, adding
          the same dependency to the empty stack that we did in steps 14 and 16.
          The next iteration will pick the same move dependency on the stack
          (the only entry) and realize again there is still a path loop and then
          add again the same dependency to the stack, over and over, resulting in
          an infinite loop.
      
      So fix this by preventing adding the same move dependency entries to the
      stack by removing each pending move record from the red black tree of
      pending moves. This way the next call to get_pending_dir_moves() will
      not return anything for the current parent inode.
      
      A test case for fstests, with this reproducer, follows soon.
      Signed-off-by: Robbie Ko <robbieko@synology.com>
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      [Wrote changelog with example and more clear explanation]
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ARM: dts: at91: sama5d2: use the divided clock for SMC · b3159470
      Romain Izard committed
      [ Upstream commit 4ab7ca092c3c7ac8b16aa28eba723a8868f82f14 ]
      
      The SAMA5D2 is different from SAMA5D3 and SAMA5D4, as there are two
      different clocks for the peripherals in the SoC. The Static Memory
      controller is connected to the divided master clock.
      
      Unfortunately, the device tree does not correctly show this and uses the
      master clock directly. This clock is then used by the code for the NAND
      controller to calculate the timings for the controller, and we end up with
      slow NAND Flash access.
      
      Fix the device tree, and the performance of Flash access is improved.
      Signed-off-by: Romain Izard <romain.izard.pro@gmail.com>
      Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • phy: qcom-qusb2: Fix HSTX_TRIM tuning with fused value for SDM845 · 4724b50f
      Manu Gautam committed
      [ Upstream commit c88520db18ba0b9a41326c3b8680e7c09eb4c381 ]
      
      Tune1 register on sdm845 is used to update HSTX_TRIM with fused
      setting. Enable same by specifying update_tune1_with_efuse flag
      for sdm845, otherwise driver ends up programming tune2 register.
      
      Fixes: ef17f6e2 ("phy: qcom-qusb2: Add QUSB2 PHYs support for sdm845")
      Signed-off-by: Manu Gautam <mgautam@codeaurora.org>
      Reviewed-by: Douglas Anderson <dianders@chromium.org>
      Reviewed-by: Stephen Boyd <swboyd@chromium.org>
      Acked-by: Vivek Gautam <vivek.gautam@codeaurora.org>
      Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • phy: qcom-qusb2: Use HSTX_TRIM fused value as is · d801a3ef
      Manu Gautam committed
      [ Upstream commit 6e34d358b24ffc40764eb3681164c79091765429 ]
      
      Fix HSTX_TRIM tuning logic which instead of using fused value
      as HSTX_TRIM, incorrectly performs bitwise OR operation with
      existing default value.
      
      Fixes: ca04d9d3 ("phy: qcom-qusb2: New driver for QUSB2 PHY on Qcom chips")
      Signed-off-by: Manu Gautam <mgautam@codeaurora.org>
      Reviewed-by: Douglas Anderson <dianders@chromium.org>
      Reviewed-by: Stephen Boyd <swboyd@chromium.org>
      Acked-by: Vivek Gautam <vivek.gautam@codeaurora.org>
      Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
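
The difference between OR-ing a fused value into a register field and writing it "as is" can be seen in a small sketch. The field position and width below (`DEMO_TRIM_SHIFT`, `DEMO_TRIM_MASK`) are assumptions made for illustration and are not taken from the QUSB2 PHY register map.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field layout: a 4-bit trim value in bits 7:4. */
#define DEMO_TRIM_SHIFT 4
#define DEMO_TRIM_MASK  (0xFu << DEMO_TRIM_SHIFT)

/* Buggy variant: OR-ing leaves bits of the old default set. */
static uint8_t demo_trim_or(uint8_t reg, uint8_t fused)
{
	return reg | (uint8_t)((fused << DEMO_TRIM_SHIFT) & DEMO_TRIM_MASK);
}

/* Fixed variant: clear the field, then write the fused value as is. */
static uint8_t demo_trim_replace(uint8_t reg, uint8_t fused)
{
	assert((fused & ~0xFu) == 0);	/* value must fit the field */
	reg &= (uint8_t)~DEMO_TRIM_MASK;
	reg |= (uint8_t)((fused << DEMO_TRIM_SHIFT) & DEMO_TRIM_MASK);
	return reg;
}
```

With a default register value of 0xF3 and a fused value of 0x2, the OR variant leaves the register at 0xF3, while the clear-then-set variant yields 0x23.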
    • objtool: Fix segfault in .cold detection with -ffunction-sections · 3d2d2ba0
      Artem Savkov committed
      [ Upstream commit 22566c1603030f0a036ad564634b064ad1a55db2 ]
      
      Because find_symbol_by_name() traverses the same lists as
      read_symbols(), changing sym->name in place without copying it affects
      the result of find_symbol_by_name().  In the case where a ".cold"
      function precedes its parent in sec->symbol_list, it can result in a
      function being considered a parent of itself. This leads to function
      length being set to 0 and other consequent side-effects including a
      segfault in add_switch_table().  The effects of this bug are only
      visible when building with -ffunction-sections in KCFLAGS.
      
      Fix by copying the search string instead of modifying it in place.
      Signed-off-by: Artem Savkov <asavkov@redhat.com>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 13810435 ("objtool: Support GCC 8's cold subfunctions")
      Link: http://lkml.kernel.org/r/910abd6b5a4945130fd44f787c24e07b9e07c8da.1542736240.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
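
The "copy the search string instead of modifying it in place" idea can be sketched as follows. `demo_parent_name` is a hypothetical helper written for this illustration; objtool's actual code is structured differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly allocated copy of name with a trailing ".cold..."
 * suffix removed, or NULL if there is no such suffix or the
 * allocation fails. The input string is never modified, so other
 * code walking the same symbol list still sees the original name.
 */
static char *demo_parent_name(const char *name)
{
	const char *cold;
	size_t len;
	char *copy;

	assert(name != NULL);
	cold = strstr(name, ".cold");
	if (!cold)
		return NULL;

	len = (size_t)(cold - name);
	copy = malloc(len + 1);
	if (!copy)
		return NULL;
	memcpy(copy, name, len);
	copy[len] = '\0';
	return copy;
}
```

The caller owns the returned buffer and releases it with `free()`. Because the original name is left intact, a lookup by name cannot end up matching the symbol that is being processed.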
    • objtool: Fix double-free in .cold detection error path · 79cd7b0e
      Artem Savkov committed
      [ Upstream commit 0b9301fb632f7111a3293a30cc5b20f1b82ed08d ]
      
      If read_symbols() fails during second list traversal (the one dealing
      with ".cold" subfunctions) it frees the symbol, but never deletes it
      from the list/hash_table resulting in symbol being freed again in
      elf_close(). Fix it by just returning an error, leaving cleanup to
      elf_close().
      Signed-off-by: Artem Savkov <asavkov@redhat.com>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 13810435 ("objtool: Support GCC 8's cold subfunctions")
      Link: http://lkml.kernel.org/r/beac5a9b7da9e8be90223459dcbe07766ae437dd.1542736240.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ASoC: acpi: fix: continue searching when machine is ignored · a8657e68
      Keyon Jie committed
      [ Upstream commit a3e620f8422832afd832ad5e20aa97d0c72bada8 ]
      
      The machine_quirk may return NULL, which means the ACPI entry should be
      skipped and the search should continue with the next matching entry. Add
      a check of the return value and continue in the NULL case.
      Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
      Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
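
The search behaviour described in the ASoC ACPI fix can be sketched with a simplified table walk. `struct demo_mach`, `demo_reject` and `demo_find` are hypothetical and much simpler than the real machine table, which is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified machine table entry. */
struct demo_mach {
	const char *id;
	/* Optional hook; may return NULL to say "ignore this entry". */
	const struct demo_mach *(*quirk)(const struct demo_mach *m);
};

/* A quirk that always rejects its entry. */
static const struct demo_mach *demo_reject(const struct demo_mach *m)
{
	(void)m;
	return NULL;
}

/* Walk a table terminated by an entry whose id is NULL. */
static const struct demo_mach *demo_find(const struct demo_mach *tbl,
					 const char *id)
{
	const struct demo_mach *m, *res;

	assert(tbl != NULL && id != NULL);
	for (m = tbl; m->id; m++) {
		if (strcmp(m->id, id) != 0)
			continue;
		if (!m->quirk)
			return m;
		res = m->quirk(m);
		if (res)
			return res;
		/* Quirk returned NULL: skip this entry, keep searching. */
	}
	return NULL;
}
```

Returning the quirk's NULL directly would end the search at the first rejected entry, even when a later entry with the same ID would have matched.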
    • PCI: imx6: Fix link training status detection in link up check · 2a031cab
      Trent Piepho committed
      [ Upstream commit 68bc10bf992180f269816ff3d22eb30383138577 ]
      
      This bug was introduced in the interaction for two commits on either
      branch of the merge commit 562df5c8 ("Merge branch
      'pci/host-designware' into next").
      
      Commit 4d107d3b ("PCI: imx6: Move link up check into
      imx6_pcie_wait_for_link()"), changed imx6_pcie_wait_for_link() to poll
      the link status register directly, checking for link up and not
      training, and made imx6_pcie_link_up() only check the link up bit (once,
      not a polling loop).
      
      While commit 886bc5ce ("PCI: designware: Add generic
      dw_pcie_wait_for_link()"), replaced the loop in
      imx6_pcie_wait_for_link() with a call to a new dwc core function, which
      polled imx6_pcie_link_up(), which still checked both link up and not
      training in a loop.
      
      When these two commits were merged, the version of
      imx6_pcie_wait_for_link() from 886bc5ce was kept, which eliminated
      the link training check placed there by 4d107d3b. However, the
      version of imx6_pcie_link_up() from 4d107d3b was kept, which
      eliminated the link training check that had been there and was moved to
      imx6_pcie_wait_for_link().
      
      The result was the link training check got lost for the imx6 driver.
      
      Eliminate imx6_pcie_link_up() so that the default handler,
      dw_pcie_link_up(), is used instead. The default handler has the correct
      code, which checks for link up and also that it still is not training,
      fixing the regression.
      
      Fixes: 562df5c8 ("Merge branch 'pci/host-designware' into next")
      Signed-off-by: Trent Piepho <tpiepho@impinj.com>
      [lorenzo.pieralisi@arm.com: rewrote the commit log]
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Joao Pinto <Joao.Pinto@synopsys.com>
      Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: Richard Zhu <hongxing.zhu@nxp.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
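
The lost condition in the imx6 link check amounts to one extra bit test. The sketch below uses hypothetical bit positions (`DEMO_LINK_UP`, `DEMO_LINK_TRAINING`); the real debug register layout is not given in the commit message and is not assumed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions; the real register layout may differ. */
#define DEMO_LINK_UP       (1u << 0)
#define DEMO_LINK_TRAINING (1u << 1)

/* Regressed check: reports "up" even while training is in progress. */
static bool demo_link_up_only(uint32_t status)
{
	return (status & DEMO_LINK_UP) != 0;
}

/* Intended check: the link is up and no longer in training. */
static bool demo_link_up(uint32_t status)
{
	return (status & DEMO_LINK_UP) && !(status & DEMO_LINK_TRAINING);
}
```

For a status word with both bits set, the first function reports the link as usable while the second correctly keeps waiting.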
    • perf tools: Restore proper cwd on return from mnt namespace · 67707627
      Jiri Olsa committed
      [ Upstream commit b01c1f69c8660eaeab7d365cd570103c5c073a02 ]
      
      When reporting on 'record' server we try to retrieve/use the mnt
      namespace of the profiled tasks. We use following API with cookie to
      hold the return namespace, roughly:
      
        nsinfo__mountns_enter(struct nsinfo *nsi, struct nscookie *nc)
          setns(newns, 0);
        ...
        new ns related open..
        ...
        nsinfo__mountns_exit(struct nscookie *nc)
          setns(nc->oldns)
      
      Once finished we setns to old namespace, which also sets the current
      working directory (cwd) to "/", trashing the cwd we had.
      
      This is mostly fine, because we use absolute paths almost everywhere,
      but it screws up 'perf diff':
      
        # perf diff
        failed to open perf.data: No such file or directory  (try 'perf record' first)
        ...
      
      Adding the current working directory to be part of the cookie and
      restoring it in the nsinfo__mountns_exit call.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Krister Johansen <kjlx@templeofstupid.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Fixes: 843ff37b ("perf symbols: Find symbols in different mount namespace")
      Link: http://lkml.kernel.org/r/20181101170001.30019-1-jolsa@kernel.org
      [ No need to check for NULL args for free(), use zfree() for struct members ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
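
The save-and-restore of the working directory described in the perf commit can be sketched without namespaces. In this illustration `chdir("/")` stands in for the side effect of `setns()`, which needs privileges; `struct demo_cookie`, `demo_enter`, `demo_exit` and `DEMO_PATH_MAX` are hypothetical and do not mirror perf's `struct nscookie`.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Assumed buffer size for this sketch. */
#define DEMO_PATH_MAX 4096

/* Hypothetical cookie; the real struct nscookie has other members. */
struct demo_cookie {
	char oldcwd[DEMO_PATH_MAX];
	int  have_cwd;
};

/* Remember the cwd, then do something that resets it to "/". */
static int demo_enter(struct demo_cookie *nc)
{
	assert(nc != NULL);
	nc->have_cwd = getcwd(nc->oldcwd, sizeof(nc->oldcwd)) != NULL;
	/* Stand-in for setns(), which would leave the cwd at "/". */
	return chdir("/");
}

/* Restore the cwd that was saved on entry. */
static int demo_exit(struct demo_cookie *nc)
{
	assert(nc != NULL);
	if (!nc->have_cwd)
		return -1;
	return chdir(nc->oldcwd);
}
```

After a `demo_enter()`/`demo_exit()` pair, relative paths such as `perf.data` resolve against the same directory as before.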
    • hwmon: (w83795) temp4_type has writable permission · f3ff2ac4
      Huacai Chen committed
      [ Upstream commit 09aaf6813cfca4c18034fda7a43e68763f34abb1 ]
      
      Both datasheet and comments of store_temp_mode() tell us that temp1~4_type
      is writable, so fix it.
      Signed-off-by: Yao Wang <wangyao@lemote.com>
      Signed-off-by: Huacai Chen <chenhc@lemote.com>
      Fixes: 39deb699 ("hwmon: (w83795) Simplify temperature sensor type handling")
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • netfilter: xt_hashlimit: fix a possible memory leak in htable_create() · fb0fc90c
      Taehee Yoo committed
      [ Upstream commit b4e955e9f372035361fbc6f07b21fe2cc6a5be4a ]
      
      In htable_create(), hinfo is allocated by vmalloc(), so if an error
      occurs afterwards, hinfo should be freed.
      
      Fixes: 11d5f157 ("netfilter: xt_hashlimit: Create revision 2 to support higher pps rates")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
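
The leak pattern fixed in htable_create() is easy to reproduce in user space. This sketch uses `malloc()` in place of `vmalloc()` and a hypothetical `demo_cfg_copy()` step that can fail; the `demo_live` counter exists only so that the absence of a leak can be observed.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts live allocations so the sketch can show the leak is gone. */
static int demo_live;

struct demo_htable { unsigned int size; };

/* Hypothetical config step that fails for a zero size. */
static int demo_cfg_copy(struct demo_htable *h, unsigned int size)
{
	if (size == 0)
		return -1;
	h->size = size;
	return 0;
}

static struct demo_htable *demo_htable_create(unsigned int size)
{
	struct demo_htable *h = malloc(sizeof(*h));

	if (!h)
		return NULL;
	demo_live++;

	if (demo_cfg_copy(h, size) < 0) {
		/* Error after allocation: free before returning. */
		free(h);
		demo_live--;
		return NULL;
	}
	return h;
}

static void demo_htable_destroy(struct demo_htable *h)
{
	assert(h != NULL);
	free(h);
	demo_live--;
}
```

Without the `free()` in the error branch, each failed create call would leave one allocation behind.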
    • aio: fix failure to put the file pointer · df66ef67
      Jens Axboe committed
      [ Upstream commit 53fffe29a9e664a999dd3787e4428da8c30533e0 ]
      
      If the ioprio capability check fails, we return without putting
      the file pointer.
      
      Fixes: d9a08a9e ("fs: Add aio iopriority support")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • bpf: allocate local storage buffers using GFP_ATOMIC · 5689666a
      Roman Gushchin committed
      [ Upstream commit 569a933b03f3c48b392fe67c0086b3a6b9306b5a ]
      
      Naresh reported an issue with the non-atomic memory allocation of
      cgroup local storage buffers:
      
      [   73.047526] BUG: sleeping function called from invalid context at
      /srv/oe/build/tmp-rpb-glibc/work-shared/intel-corei7-64/kernel-source/mm/slab.h:421
      [   73.060915] in_atomic(): 1, irqs_disabled(): 0, pid: 3157, name: test_cgroup_sto
      [   73.068342] INFO: lockdep is turned off.
      [   73.072293] CPU: 2 PID: 3157 Comm: test_cgroup_sto Not tainted
      4.20.0-rc2-next-20181113 #1
      [   73.080548] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
      2.0b 07/27/2017
      [   73.088018] Call Trace:
      [   73.090463]  dump_stack+0x70/0xa5
      [   73.093783]  ___might_sleep+0x152/0x240
      [   73.097619]  __might_sleep+0x4a/0x80
      [   73.101191]  __kmalloc_node+0x1cf/0x2f0
      [   73.105031]  ? cgroup_storage_update_elem+0x46/0x90
      [   73.109909]  cgroup_storage_update_elem+0x46/0x90
      
      cgroup_storage_update_elem() (as well as other map update
      callbacks) is called with preemption disabled, so GFP_ATOMIC
      allocation should be used: e.g. alloc_htab_elem() in hashtab.c.
      Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • hwmon: (mlxreg-fan) Fix macros for tacho fault reading · 0d4ff099
      Vadim Pasternak committed
      [ Upstream commit 243cfe3fb8978c7eec24511aba7dac98819ed896 ]
      
      Fix macros for tachometer fault reading.
      This fix is relevant for three Mellanox systems MQMB7, MSN37, MSN34,
      which are about to be released to the customers.
      At the moment, none of them is at customers sites.
      
      Fixes: 65afb4c8 ("hwmon: (mlxreg-fan) Add support for Mellanox FAN driver")
      Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • spi: omap2-mcspi: Add missing suspend and resume calls · 842aeeac
      Tony Lindgren committed
      [ Upstream commit 91b9deefedf4c35a01027ce38bed7299605026a3 ]
      
      I've been wondering still about omap2-mcspi related suspend and resume
      flakiness, and it looks like we're missing calls to spi_master_suspend()
      and spi_master_resume(). Adding those and using pm_runtime_force_suspend()
      and pm_runtime_force_resume() makes things work for suspend and resume
      and allows us to stop using noirq suspend and resume.
      
      And while at it, let's use SET_SYSTEM_SLEEP_PM_OPS to simplify things
      further.
      Signed-off-by: Tony Lindgren <tony@atomide.com>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ASoC: dapm: Recalculate audio map forcely when card instantiated · fa3ceb3b
      Tzung-Bi Shih committed
      [ Upstream commit 882eab6c28d23a970ae73b7eb831b169a672d456 ]
      
      The audio map may be in a wrong state before card->instantiated has
      been set to true.  Imagine the following example:
      
      time 1: at the beginning
      
        in:-1    in:-1    in:-1    in:-1
       out:-1   out:-1   out:-1   out:-1
       SIGGEN        A        B      Spk
      
      time 2: after someone called snd_soc_dapm_new_widgets()
      (e.g. create_fill_widget_route_map() in sound/soc/codecs/hdac_hdmi.c)
      
         in:1     in:0     in:0     in:0
        out:0    out:0    out:0    out:1
       SIGGEN        A        B      Spk
      
      time 3: routes added
      
         in:1     in:0     in:0     in:0
        out:0    out:0    out:0    out:1
       SIGGEN -----> A -----> B ---> Spk
      
      In the end, the path should be powered on but it is not.  At time 3,
      "in" of SIGGEN and "out" of Spk did not propagate to their neighbors
      because snd_soc_dapm_add_path() will not invalidate the paths if
      the card has not been instantiated (i.e. card->instantiated is false).
      To correct the state of the audio map, forcibly recalculate the whole map.
      Signed-off-by: Tzung-Bi Shih <tzungbi@google.com>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ASoC: omap-abe-twl6040: Fix missing audio card caused by deferred probing · abbd01b7
      Peter Ujfalusi committed
      [ Upstream commit 76836fd354922ebe4798a64fda01f8dc6a8b0984 ]
      
      The machine driver fails to probe in next-20181113 with:
      
      [    2.539093] omap-abe-twl6040 sound: ASoC: CODEC DAI twl6040-legacy not registered
      [    2.546630] omap-abe-twl6040 sound: devm_snd_soc_register_card() failed: -517
      ...
      [    3.693206] omap-abe-twl6040 sound: ASoC: Both platform name/of_node are set for TWL6040
      [    3.701446] omap-abe-twl6040 sound: ASoC: failed to init link TWL6040
      [    3.708007] omap-abe-twl6040 sound: devm_snd_soc_register_card() failed: -22
      [    3.715148] omap-abe-twl6040: probe of sound failed with error -22
      
      Bisect pointed to a merge commit:
      first bad commit: [0f688ab20a540aafa984c5dbd68a71debebf4d7f] Merge remote-tracking branch 'net-next/master'
      
      and a diff against a working kernel does not reveal anything which would
      explain the change in behavior.
      
      Further investigation showed that the second probe attempt fails
      because dai_link->platform is no longer NULL and might be pointing
      to uninitialized memory.
      
      The fix is to move the snd_soc_dai_link and snd_soc_card inside of the
      abe_twl6040 struct, which is dynamically allocated every time the driver
      probes.
      
      Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      abbd01b7
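The deferred-probe failure described above can be illustrated with a small userspace sketch. It is a toy model, not the driver code: `struct link`, `register_link()`, `probe_static()` and `probe_dynamic()` are hypothetical names, and the only thing borrowed from the log is the sequence of return values (-517 on the first attempt, -22 on the retry). The assumption is that registration writes into the caller's structure before it defers, so a file-scope structure carries stale state into the next probe while a freshly zeroed allocation does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a DAI link; registration fills in `platform`. */
struct link {
    const char *platform;
};

/* Toy registration: rejects a link that already carries a platform,
 * otherwise writes one and defers if a dependency is not ready yet. */
static int register_link(struct link *l, int deps_ready)
{
    if (l->platform != NULL)
        return -22;                 /* -EINVAL: stale state from an earlier try */
    l->platform = "filled-in-by-core";
    return deps_ready ? 0 : -517;   /* -EPROBE_DEFER while a dependency is missing */
}

/* File-scope state survives a failed probe and is reused on the retry. */
static struct link static_link;

int probe_static(int deps_ready)
{
    return register_link(&static_link, deps_ready);
}

/* Per-probe state: every attempt starts from zeroed memory. */
int probe_dynamic(int deps_ready)
{
    struct link *l = calloc(1, sizeof(*l));
    int ret;

    if (l == NULL)
        return -12;                 /* -ENOMEM */
    ret = register_link(l, deps_ready);
    free(l);                        /* simplified: a real driver keeps it on success */
    return ret;
}
```

With the static structure, a deferred first attempt followed by a retry returns -517 and then -22; with the per-probe allocation the retry succeeds.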
    • N
      hwmon: (ina2xx) Fix current value calculation · 3ef0d19c
      Nicolin Chen committed
      [ Upstream commit 38cd989ee38c16388cde89db5b734f9d55b905f9 ]
      
      The current register (04h) has a sign bit at MSB. The comments
      for this calculation also mention that it's a signed register.
      
      However, regval is an unsigned type, so the calculation yields
      an incorrect value when the current is negative.
      
      This patch fixes this by adding a cast to s16.
      
      Fixes: 5d389b12 ("hwmon: (ina2xx) Make calibration register value fixed")
      Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      3ef0d19c
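A minimal sketch of why the cast matters, assuming a 16-bit two's-complement register value held in an unsigned variable. The scale factor `CURRENT_LSB_UA` and both function names are hypothetical and chosen only for illustration; `int16_t` plays the role of the kernel's `s16`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scale: microamps per register LSB. */
#define CURRENT_LSB_UA 1000

/* Without the cast, 0xFFFF is treated as 65535 instead of -1. */
long current_ua_unsigned(unsigned int regval)
{
    return (long)regval * CURRENT_LSB_UA;
}

/* With the cast, the low 16 bits are reinterpreted as a signed value.
 * ISO C leaves the out-of-range conversion implementation-defined;
 * GCC and Clang wrap as two's complement, which is what the kernel
 * relies on. */
long current_ua_signed(unsigned int regval)
{
    return (long)(int16_t)regval * CURRENT_LSB_UA;
}
```

A register value of 0xFFFF then reads as -1000 uA rather than 65535000 uA, while positive readings are unchanged.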
    • T
      s390/cpum_cf: Reject request for sampling in event initialization · d70a6605
      Thomas Richter committed
      [ Upstream commit 613a41b0d16e617f46776a93b975a1eeea96417c ]
      
      On s390, the command perf top fails:
      [root@s35lp76 perf] # ./perf top -F100000  --stdio
         Error:
         cycles: PMU Hardware doesn't support sampling/overflow-interrupts.
         	Try 'perf stat'
      [root@s35lp76 perf] #
      
      Using event -e rb0000 works as designed.  Event rb0000 is the event
      number of the sampling facility for basic sampling.
      
      During system start up the following PMUs are installed in the kernel's
      PMU list (from head to tail):
         cpum_cf --> s390 PMU counter facility device driver
         cpum_sf --> s390 PMU sampling facility device driver
         uprobe
         kprobe
         tracepoint
         task_clock
         cpu_clock
      
      Perf top executes the following functions and calls the perf_event_open(2)
      system call with different parameters many times:
      
      cmd_top
      --> __cmd_top
          --> perf_evlist__add_default
              --> __perf_evlist__add_default
                  --> perf_evlist__new_cycles (creates event type:0 (HW)
      			    		config 0 (CPU_CYCLES)
      	        --> perf_event_attr__set_max_precise_ip
      		    Uses perf_event_open(2) to detect correct
      		    precise_ip level. Fails 3 times on s390 which is ok.
      
      Then functions cmd_top
      --> __cmd_top
          --> perf_top__start_counters
              -->perf_evlist__config
      	   --> perf_can_comm_exec
                     --> perf_probe_api
      	           This function tests support for the following events:
      		   "cycles:u", "instructions:u", "cpu-clock:u" using
      		   --> perf_do_probe_api
      		       --> perf_event_open_cloexec
      		           Test the close on exec flag support with
      			   perf_event_open(2).
      	               perf_do_probe_api returns true if the event is
      		       supported.
      		       The function returns true because event cpu-clock is
      		       supported by the PMU cpu_clock.
      	               This is achieved by many calls to perf_event_open(2).
      
      Function perf_top__start_counters now calls perf_evsel__open() for every
      event, which here is the default event cpu_cycles (config:0) of type
      HARDWARE (type:0) with a predefined frequency of 4000.
      
      Given the above order of the PMU list, the PMU cpum_cf gets called first
      and returns 0, which indicates support for this sampling. The event is
      fully allocated in the function perf_event_open() (file
      kernel/events/core.c, near line 10521), and the following check fails:
      
              event = perf_event_alloc(&attr, cpu, task, group_leader, NULL,
      		                 NULL, NULL, cgroup_fd);
      	if (IS_ERR(event)) {
      		err = PTR_ERR(event);
      		goto err_cred;
      	}
      
              if (is_sampling_event(event)) {
      		if (event->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT) {
      			err = -EOPNOTSUPP;
      			goto err_alloc;
      		}
      	}
      
      The check for the interrupt capabilities fails and the system call
      perf_event_open() returns -EOPNOTSUPP (-95).
      
      Add a check to return -ENODEV when sampling is requested in PMU cpum_cf.
      This allows common kernel code in the perf_event_open() system call to
      test the next PMU in above list.
      
      Fixes: 97b1198f ("s390, perf: Use common PMU interrupt disabled code")
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d70a6605
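The dispatch behavior described in the cpum_cf message can be modeled with a toy PMU list. This is only a sketch under stated assumptions: `DECLINE` is a hypothetical "not mine, try the next PMU" sentinel standing in for the errno named in the message, the `no_interrupt` field models PERF_PMU_CAP_NO_INTERRUPT, and all function names are invented. It shows how a PMU that accepts a sampling event it cannot serve shadows a later PMU that could.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { ACCEPT = 0, DECLINE = -1 };   /* DECLINE is a hypothetical sentinel */
#define ERR_NOT_SUPPORTED (-95)      /* -EOPNOTSUPP, as quoted in the message */

struct pmu {
    const char *name;
    bool no_interrupt;               /* models PERF_PMU_CAP_NO_INTERRUPT */
    int (*event_init)(bool sampling);
};

/* Counter facility before the fix: claims every event. */
static int cf_init_old(bool sampling) { (void)sampling; return ACCEPT; }
/* Counter facility after the fix: declines sampling requests. */
static int cf_init_new(bool sampling) { return sampling ? DECLINE : ACCEPT; }
/* Sampling facility: only takes sampling requests. */
static int sf_init(bool sampling) { return sampling ? ACCEPT : DECLINE; }

/* Returns the index of the PMU that owns the event, or a negative error. */
int open_event(const struct pmu *pmus, size_t n, bool sampling)
{
    for (size_t i = 0; i < n; i++) {
        if (pmus[i].event_init(sampling) == DECLINE)
            continue;                /* try the next PMU in the list */
        /* The first PMU that accepts owns the event; the capability
         * check only happens afterwards. */
        if (sampling && pmus[i].no_interrupt)
            return ERR_NOT_SUPPORTED;
        return (int)i;
    }
    return DECLINE;
}

int open_sampling_cycles(bool fixed)
{
    const struct pmu pmus[] = {
        { "cpum_cf", true,  fixed ? cf_init_new : cf_init_old },
        { "cpum_sf", false, sf_init },
    };
    return open_event(pmus, 2, true);
}
```

In this model the unfixed list yields -95 for a sampling request, and the fixed list hands the event to the second entry.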
    • R
      ASoC: qcom: Set dai_link id to each dai_link · 2309636d
      Rohit kumar committed
      [ Upstream commit 67fd1437d11620de8768b22fe20942e752ed52e9 ]
      
      The frontend dai_link id is used for closing ADM sessions.
      In a concurrent use case, closing one session also closes the
      ADM session associated with another use case. dai_link->id should
      always point to the frontend DAI id. Set the cpu_dai id as the
      dai_link id to fix the issue.
      
      Signed-off-by: Rohit kumar <rohitkr@codeaurora.org>
      Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      2309636d
    • P
      ASoC: Intel: Power down links before turning off display audio power · 88e8e3c7
      Pierre-Louis Bossart committed
      [ Upstream commit 4c10473d6ddf12ec124c9ff71a5d23bb5388478b ]
      
      On certain platforms, the display HDMI HDA codec did not enter the sleep
      state after use when the links were powered down after turning off the
      display power. As per the HW recommendation, power down the links before
      turning off the display power to ensure that the codec goes to sleep.
      
      This patch was updated from an earlier version submitted upstream [1]
      which conflicted with the changes merged for HDaudio codec support
      with the Intel DSP.
      
      [1] https://patchwork.kernel.org/patch/10540213/
      
      Signed-off-by: Sriram Periyasamy <sriramx.periyasamy@intel.com>
      Signed-off-by: Sanyog Kale <sanyog.r.kale@intel.com>
      Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      88e8e3c7
    • R
      ASoC: wm_adsp: Fix dma-unsafe read of scratch registers · 737f3bb3
      Richard Fitzgerald committed
      [ Upstream commit 20e00db2f59bdddf8a8e241473ef8be94631d3ae ]
      
      Stack memory isn't DMA-safe so it isn't safe to use either
      regmap_raw_read or regmap_bulk_read to read into stack memory.
      
      The two functions to read the scratch registers were using
      stack memory and regmap_raw_read. It's not worth allocating
      memory just for this trivial read, and it isn't time-critical.
      A simple regmap_read for each register is sufficient.
      
      Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      737f3bb3
    • K
      ASoC: rockchip: add missing slave_config setting for I2S · e4777c2e
      Katsuhiro Suzuki committed
      [ Upstream commit 16a8ee4c80b45984b6de1f90a49edcf336b7c621 ]
      
      This patch adds the missing prepare_slave_config, which is needed to
      set up the DMA slave channel for I2S.
      
      Signed-off-by: Katsuhiro Suzuki <katsuhiro@katsuster.net>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e4777c2e
    • S
      hwmon: (raspberrypi) Fix initial notify · dbc62bd3
      Stefan Wahren committed
      [ Upstream commit 35fdc3902179366489a12cae4cb3ccc3b95f0afe ]
      
      If an under-voltage happens before probing, the driver won't
      write the critical warning into the kernel log. So don't initialize
      last_throttled during probe, which fixes this issue.
      
      Fixes: 74d1e007 ("hwmon: Add support for RPi voltage sensor")
      Reported-by: "Noralf Trønnes" <noralf@tronnes.org>
      Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      dbc62bd3
    • N
      hwmon (ina2xx) Fix NULL id pointer in probe() · 08cff351
      Nicolin Chen committed
      [ Upstream commit 70df9ebbd82c794ddfbb49d45b337f18d5588dc2 ]
      
      When using DT configurations, the id pointer might turn out to
      be NULL. Then the driver encounters NULL pointer access:
      
        Unable to handle kernel read from unreadable memory at vaddr 00000018
        [...]
        PC is at ina2xx_probe+0x114/0x200
        LR is at ina2xx_probe+0x10c/0x200
        [...]
        Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
      
      The reason is that i2c core returns the id pointer by matching
      id_table with client->name, while the client->name is actually
      using the name from the first string in the DT compatible list,
      not the best one. So i2c core would fail to match the id_table
      if the best matched compatible string isn't the first one, and
      then would return a NULL id pointer.
      
      This probably should be fixed in i2c core. But it doesn't hurt
      to make the driver robust. So this patch fixes it by using the
      "chip" that's added to unify both DT and non-DT configurations.
      
      Additionally, since the id pointer could be NULL, so could id->name:
        ina2xx 10-0047: power monitor (null) (Rshunt = 1000 uOhm)
        ina2xx 10-0048: power monitor (null) (Rshunt = 10000 uOhm)
      
      So this patch also fixes NULL name pointer, using client->name
      to play safe and to align with hwmon->name.
      
      Fixes: bd0ddd4d ("hwmon: (ina2xx) Add OF device ID table")
      Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      08cff351
    • E
      s390/cio: Fix cleanup when unsupported IDA format is used · 61170596
      Eric Farman committed
      [ Upstream commit b89e242eee8d4cd8261d8d821c62c5d1efc454d0 ]
      
      Direct returns from within a loop are rude, but it doesn't mean it gets
      to avoid releasing the memory acquired beforehand.
      
      Signed-off-by: Eric Farman <farman@linux.ibm.com>
      Message-Id: <20181109023937.96105-3-farman@linux.ibm.com>
      Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
      Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
      Acked-by: Halil Pasic <pasic@linux.ibm.com>
      Signed-off-by: Cornelia Huck <cohuck@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      61170596
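The cleanup rule stated in the s390/cio message above (an early exit from a loop must still release what was acquired before the loop) is commonly written with a single exit label. The sketch below is a generic illustration, not the cio code: `process_items()`, `tracked_alloc()`, `tracked_free()` and the `live_buffers` counter are hypothetical names, and the counter exists only to make a leak observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts outstanding allocations so that a leak is visible to the caller. */
int live_buffers;

static void *tracked_alloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        live_buffers++;
    return p;
}

static void tracked_free(void *p)
{
    if (p != NULL) {
        live_buffers--;
        free(p);
    }
}

/* Returns 0 on success, -1 if the item at `bad_index` is unsupported
 * or the allocation fails. `count` is expected to be positive. */
int process_items(int count, int bad_index)
{
    int ret = 0;
    int *table = tracked_alloc(sizeof(*table) * (size_t)count);

    if (table == NULL)
        return -1;

    for (int i = 0; i < count; i++) {
        if (i == bad_index) {
            ret = -1;
            goto out_free;   /* a bare `return -1;` here would leak table */
        }
        table[i] = i;
    }

out_free:
    /* Simplified: the buffer is released on success as well. */
    tracked_free(table);
    return ret;
}
```

Both the success path and the failing path leave `live_buffers` at zero, which is the property the fix restores.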
    • E
      s390/cio: Fix cleanup of pfn_array alloc failure · a4f21114
      Eric Farman committed
      [ Upstream commit 806212f91c874b24cf9eb4a9f180323671b6c5ed ]
      
      If pfn_array_alloc fails somehow, we need to release the pfn_array_table
      that was malloc'd earlier.
      
      Signed-off-by: Eric Farman <farman@linux.ibm.com>
      Message-Id: <20181109023937.96105-2-farman@linux.ibm.com>
      Acked-by: Halil Pasic <pasic@linux.ibm.com>
      Signed-off-by: Cornelia Huck <cohuck@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a4f21114
    • F
      netfilter: nf_tables: fix use-after-free when deleting compat expressions · 00bac44c
      Florian Westphal committed
      [ Upstream commit 29e3880109e357fdc607b4393f8308cef6af9413 ]
      
      nft_compat ops do not have static storage duration, unlike all other
      expressions.
      
      When nf_tables_expr_destroy() returns, expr->ops might have been
      free'd already, so we need to store next address before calling
      expression destructor.
      
      For the same reason, we can't deref match pointer after nft_xt_put().
      
      This can be easily reproduced by adding msleep() before
      nft_match_destroy() returns.
      
      Fixes: 0ca743a5 ("netfilter: nf_tables: add compatibility layer for x_tables")
      Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      00bac44c
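The "store next address before calling expression destructor" rule from the nft_compat message can be sketched in userspace as follows. This is a simplified model with hypothetical types: each `struct expr` points at heap-allocated `struct ops`, and `ops->size` gives the stride to the next expression (counted in elements here, whereas the kernel uses a byte size). The destructor frees the ops, so the stride has to be read first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct ops  { size_t size; };       /* heap-allocated, no static storage */
struct expr { struct ops *ops; };

int destroyed;

static void expr_destroy(struct expr *e)
{
    free(e->ops);                   /* e->ops must not be dereferenced afterwards */
    e->ops = NULL;
    destroyed++;
}

/* Destroy all expressions in [first, end), stored back to back. */
void destroy_all(struct expr *first, struct expr *end)
{
    struct expr *e = first;
    struct expr *next;

    while (e != end) {
        next = e + e->ops->size;    /* read the stride BEFORE the destructor runs */
        expr_destroy(e);
        e = next;
    }
}

int demo_destroy(void)
{
    struct expr exprs[3];

    for (int i = 0; i < 3; i++) {
        exprs[i].ops = malloc(sizeof(struct ops));
        if (exprs[i].ops == NULL)
            abort();
        exprs[i].ops->size = 1;     /* every expression occupies one slot */
    }
    destroyed = 0;
    destroy_all(exprs, exprs + 3);
    return destroyed;
}
```

Computing `next` after `expr_destroy()` would read freed memory, which is the pattern the commit removes.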
    • T
      netfilter: xt_RATEEST: remove netns exit routine · e947f9aa
      Taehee Yoo committed
      [ Upstream commit 0fbcc5b568edab7d848b7c7fa66d44ffbd4133c0 ]
      
      xt_rateest_net_exit() was added to check whether rules are flushed
      successfully, but the ->net_exit() callback is called earlier than
      the ->destroy() callback, so ->net_exit() cannot check that.
      
      test commands:
         %ip netns add vm1
         %ip netns exec vm1 iptables -t mangle -I PREROUTING -p udp \
      	   --dport 1111 -j RATEEST --rateest-name ap \
      	   --rateest-interval 250ms --rateest-ewma 0.5s
         %ip netns del vm1
      
      splat looks like:
      [  668.813518] WARNING: CPU: 0 PID: 87 at net/netfilter/xt_RATEEST.c:210 xt_rateest_net_exit+0x210/0x340 [xt_RATEEST]
      [  668.813518] Modules linked in: xt_RATEEST xt_tcpudp iptable_mangle bpfilter ip_tables x_tables
      [  668.813518] CPU: 0 PID: 87 Comm: kworker/u4:2 Not tainted 4.19.0-rc7+ #21
      [  668.813518] Workqueue: netns cleanup_net
      [  668.813518] RIP: 0010:xt_rateest_net_exit+0x210/0x340 [xt_RATEEST]
      [  668.813518] Code: 00 48 8b 85 30 ff ff ff 4c 8b 23 80 38 00 0f 85 24 01 00 00 48 8b 85 30 ff ff ff 4d 85 e4 4c 89 a5 58 ff ff ff c6 00 f8 74 b2 <0f> 0b 48 83 c3 08 4c 39 f3 75 b0 48 b8 00 00 00 00 00 fc ff df 49
      [  668.813518] RSP: 0018:ffff8801156c73f8 EFLAGS: 00010282
      [  668.813518] RAX: ffffed0022ad8e85 RBX: ffff880118928e98 RCX: 5db8012a00000000
      [  668.813518] RDX: ffff8801156c7428 RSI: 00000000cb1d185f RDI: ffff880115663b74
      [  668.813518] RBP: ffff8801156c74d0 R08: ffff8801156633c0 R09: 1ffff100236440be
      [  668.813518] R10: 0000000000000001 R11: ffffed002367d852 R12: ffff880115142b08
      [  668.813518] R13: 1ffff10022ad8e81 R14: ffff880118928ea8 R15: dffffc0000000000
      [  668.813518] FS:  0000000000000000(0000) GS:ffff88011b200000(0000) knlGS:0000000000000000
      [  668.813518] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  668.813518] CR2: 0000563aa69f4f28 CR3: 0000000105a16000 CR4: 00000000001006f0
      [  668.813518] Call Trace:
      [  668.813518]  ? unregister_netdevice_many+0xe0/0xe0
      [  668.813518]  ? xt_rateest_net_init+0x2c0/0x2c0 [xt_RATEEST]
      [  668.813518]  ? default_device_exit+0x1ca/0x270
      [  668.813518]  ? remove_proc_entry+0x1cd/0x390
      [  668.813518]  ? dev_change_net_namespace+0xd00/0xd00
      [  668.813518]  ? __init_waitqueue_head+0x130/0x130
      [  668.813518]  ops_exit_list.isra.10+0x94/0x140
      [  668.813518]  cleanup_net+0x45b/0x900
      [  668.813518]  ? net_drop_ns+0x110/0x110
      [  668.813518]  ? swapgs_restore_regs_and_return_to_usermode+0x3c/0x80
      [  668.813518]  ? save_trace+0x300/0x300
      [  668.813518]  ? lock_acquire+0x196/0x470
      [  668.813518]  ? lock_acquire+0x196/0x470
      [  668.813518]  ? process_one_work+0xb60/0x1de0
      [  668.813518]  ? _raw_spin_unlock_irq+0x29/0x40
      [  668.813518]  ? _raw_spin_unlock_irq+0x29/0x40
      [  668.813518]  ? __lock_acquire+0x4500/0x4500
      [  668.813518]  ? __lock_is_held+0xb4/0x140
      [  668.813518]  process_one_work+0xc13/0x1de0
      [  668.813518]  ? pwq_dec_nr_in_flight+0x3c0/0x3c0
      [  668.813518]  ? set_load_weight+0x270/0x270
      [ ... ]
      
      Fixes: 3427b2ab ("netfilter: make xt_rateest hash table per net")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e947f9aa
    • J
      perf tools: Fix crash on synthesizing the unit · f8328abb
      Jiri Olsa committed
      [ Upstream commit fb50c09e923870a358d68b0d58891bd145b8d7c7 ]
      
      Adam reported a record command crash for a simple session like:
      
        $ perf record -e cpu-clock ls
      
      with following backtrace:
      
        Program received signal SIGSEGV, Segmentation fault.
        3543            ev = event_update_event__new(size + 1, PERF_EVENT_UPDATE__UNIT, evsel->id[0]);
        (gdb) bt
        #0  perf_event__synthesize_event_update_unit
        #1  0x000000000051e469 in perf_event__synthesize_extra_attr
        #2  0x00000000004445cb in record__synthesize
        #3  0x0000000000444bc5 in __cmd_record
        ...
      
      We synthesize an update event that needs to touch the evsel id array,
      which is not defined at that time. Fix this by forcing the id allocation
      for events with their unit defined.
      
      Reflect the possible read_format ID bit in the attr tests.
      
      Reported-by: Yongxin Liu <yongxin.liu@outlook.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adam Lee <leeadamrobert@gmail.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201477
      Fixes: bfd8f72c ("perf record: Synthesize unit/scale/... in event update")
      Link: http://lkml.kernel.org/r/20181112130012.5424-1-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f8328abb
    • F
      selftests: add script to stress-test nft packet path vs. control plane · d15443a1
      Florian Westphal committed
      [ Upstream commit 25d8bcedbf4329895dbaf9dd67baa6f18dad918c ]
      
      Start flood ping for each cpu while loading/flushing rulesets to make
      sure we do not access already-free'd rules from nf_tables evaluation loop.
      
      Also add this to TARGETS so 'make run_tests' in selftest dir runs it
      automatically.
      
      This would have caught the bug fixed in previous change
      ("netfilter: nf_tables: do not skip inactive chains during generation update")
      sooner.
      
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d15443a1
    • F
      netfilter: nf_tables: don't skip inactive chains during update · 8fe8940f
      Florian Westphal committed
      [ Upstream commit 0fb39bbe43d4481fcf300d2b5822de60942fd189 ]
      
      There is no synchronization between packet path and the configuration plane.
      
      The packet path uses two arrays with rules, one contains the current (active)
      generation.  The other either contains the last (obsolete) generation or
      the future one.
      
      Consider:
      cpu1               cpu2
                         nft_do_chain(c);
      delete c
      net->gen++;
                         genbit = !!net->gen;
                         rules = c->rg[genbit];
      
      cpu1 ignores c when updating if c is not active anymore in the new
      generation.
      
      On cpu2, we now use rules from wrong generation, as c->rg[old]
      contains the rules matching 'c' whereas c->rg[new] was not updated and
      can even point to rules that have been free'd already, causing a crash.
      
      To fix this, make sure that the 'current' and the 'next' generation are
      identical for chains that are going away so that c->rg[new] will just
      use the matching rules even if genbit was incremented already.
      
      Fixes: 0cbc06b3 ("netfilter: nf_tables: remove synchronize_rcu in commit phase")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8fe8940f
    • T
      netfilter: nf_conncount: fix unexpected permanent node of list. · 4a3b49f0
      Taehee Yoo committed
      [ Upstream commit 3c5cdb17c3be76714dfd0d03e384f70579545614 ]
      
      When list->count is 0, the list is deleted by GC. But list->count
      never reaches 0 because the initial count value is 1 and it is increased
      when a node is inserted. So the initial value of list->count should be 0.
      
      Originally, GC always found a zero-count list by deleting a node and
      decreasing the count. However, a list may be left empty because node
      insertion may fail, e.g. on an allocation problem. To solve this, the GC
      routine also finds zero-count lists without deleting a node.
      
      Fixes: cb2b36f5 ("netfilter: nf_conncount: Switch to plain list")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      4a3b49f0
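The counting problem in the nf_conncount message above comes down to simple arithmetic, shown here as a toy model. The type and function names are hypothetical and only mirror the description: the count is incremented on insertion, decremented on deletion, and GC may free the list only when the count is 0.

```c
#include <assert.h>
#include <stdbool.h>

struct count_list {
    unsigned int count;
};

void count_list_init(struct count_list *l, unsigned int initial)
{
    l->count = initial;
}

void count_list_insert(struct count_list *l)
{
    l->count++;
}

void count_list_remove(struct count_list *l)
{
    assert(l->count > 0);
    l->count--;
}

/* GC may only free a list whose count has dropped to zero. */
bool gc_can_free(const struct count_list *l)
{
    return l->count == 0;
}

/* One insertion followed by one removal, starting from `initial`. */
bool collectable_after_insert_remove(unsigned int initial)
{
    struct count_list l;

    count_list_init(&l, initial);
    count_list_insert(&l);
    count_list_remove(&l);
    return gc_can_free(&l);
}

/* A list whose only insertion failed: nothing was ever added. */
bool collectable_when_never_filled(unsigned int initial)
{
    struct count_list l;

    count_list_init(&l, initial);
    return gc_can_free(&l);
}
```

Starting from 1, the count settles at 1 after the last node is gone, so the list is never collected; starting from 0, both the emptied list and the never-filled list become collectable.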
    • T
      netfilter: nf_conncount: fix list_del corruption in conn_free · ae60f470
      Taehee Yoo committed
      [ Upstream commit 31568ec09ea02a050249921698c9729419539cce ]
      
      nf_conncount_tuple is an element of nft_connlimit and is deleted by
      conn_free(). Elements can be deleted by both the GC routine and the data
      path functions (nf_conncount_lookup, nf_conncount_add), which call
      conn_free() to free elements. But conn_free() only protects lists, not
      each element, so list_del corruption could occur.
      
      conn_free() doesn't check whether an element is already deleted. To
      protect elements, a dead flag is added. If an element is deleted, the
      dead flag is set. Only conn_free() can delete elements, so the list
      lock and the dead flag together are enough to protect it.
      
      test commands:
         %nft add table ip filter
         %nft add chain ip filter input { type filter hook input priority 0\; }
         %nft add rule filter input meter test { ip id ct count over 2 } counter
      
      splat looks like:
      [ 1779.495778] list_del corruption, ffff8800b6e12008->prev is LIST_POISON2 (dead000000000200)
      [ 1779.505453] ------------[ cut here ]------------
      [ 1779.506260] kernel BUG at lib/list_debug.c:50!
      [ 1779.515831] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [ 1779.516772] CPU: 0 PID: 33 Comm: kworker/0:2 Not tainted 4.19.0-rc6+ #22
      [ 1779.516772] Workqueue: events_power_efficient nft_rhash_gc [nf_tables_set]
      [ 1779.516772] RIP: 0010:__list_del_entry_valid+0xd8/0x150
      [ 1779.516772] Code: 39 48 83 c4 08 b8 01 00 00 00 5b 5d c3 48 89 ea 48 c7 c7 00 c3 5b 98 e8 0f dc 40 ff 0f 0b 48 c7 c7 60 c3 5b 98 e8 01 dc 40 ff <0f> 0b 48 c7 c7 c0 c3 5b 98 e8 f3 db 40 ff 0f 0b 48 c7 c7 20 c4 5b
      [ 1779.516772] RSP: 0018:ffff880119127420 EFLAGS: 00010286
      [ 1779.516772] RAX: 000000000000004e RBX: dead000000000200 RCX: 0000000000000000
      [ 1779.516772] RDX: 000000000000004e RSI: 0000000000000008 RDI: ffffed0023224e7a
      [ 1779.516772] RBP: ffff88011934bc10 R08: ffffed002367cea9 R09: ffffed002367cea9
      [ 1779.516772] R10: 0000000000000001 R11: ffffed002367cea8 R12: ffff8800b6e12008
      [ 1779.516772] R13: ffff8800b6e12010 R14: ffff88011934bc20 R15: ffff8800b6e12008
      [ 1779.516772] FS:  0000000000000000(0000) GS:ffff88011b200000(0000) knlGS:0000000000000000
      [ 1779.516772] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1779.516772] CR2: 00007fc876534010 CR3: 000000010da16000 CR4: 00000000001006f0
      [ 1779.516772] Call Trace:
      [ 1779.516772]  conn_free+0x9f/0x2b0 [nf_conncount]
      [ 1779.516772]  ? nf_ct_tmpl_alloc+0x2a0/0x2a0 [nf_conntrack]
      [ 1779.516772]  ? nf_conncount_add+0x520/0x520 [nf_conncount]
      [ 1779.516772]  ? do_raw_spin_trylock+0x1a0/0x1a0
      [ 1779.516772]  ? do_raw_spin_trylock+0x10/0x1a0
      [ 1779.516772]  find_or_evict+0xe5/0x150 [nf_conncount]
      [ 1779.516772]  nf_conncount_gc_list+0x162/0x360 [nf_conncount]
      [ 1779.516772]  ? nf_conncount_lookup+0xee0/0xee0 [nf_conncount]
      [ 1779.516772]  ? _raw_spin_unlock_irqrestore+0x45/0x50
      [ 1779.516772]  ? trace_hardirqs_off+0x6b/0x220
      [ 1779.516772]  ? trace_hardirqs_on_caller+0x220/0x220
      [ 1779.516772]  nft_rhash_gc+0x16b/0x540 [nf_tables_set]
      [ ... ]
      
      Fixes: 5c789e13 ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ae60f470
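The dead-flag idea from the conn_free message above can be sketched with a small doubly linked list. This is a single-threaded illustration with hypothetical names (`conn_node`, `conn_list`, `conn_free_once`); the list lock that the real code holds around the check is omitted and only noted in a comment. The point is that a second delete request for the same element becomes a no-op instead of unlinking poisoned pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct conn_node {
    struct conn_node *prev, *next;
    bool dead;
};

struct conn_list {
    struct conn_node head;   /* sentinel node */
    unsigned int count;
};

void conn_list_init(struct conn_list *l)
{
    l->head.prev = l->head.next = &l->head;
    l->head.dead = false;
    l->count = 0;
}

void conn_list_add(struct conn_list *l, struct conn_node *n)
{
    n->dead = false;
    n->next = l->head.next;
    n->prev = &l->head;
    l->head.next->prev = n;
    l->head.next = n;
    l->count++;
}

/* Returns true if this call unlinked the node, false if it was already dead.
 * A real implementation would hold the list lock around the whole body. */
bool conn_free_once(struct conn_list *l, struct conn_node *n)
{
    if (n->dead)
        return false;
    n->dead = true;
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = NULL;   /* stand-in for list poisoning */
    assert(l->count > 0);
    l->count--;
    return true;
}

/* Two callers race to delete the same element; returns how many unlinked it. */
int unlink_twice(void)
{
    struct conn_list l;
    struct conn_node n;
    int unlinked = 0;

    conn_list_init(&l);
    conn_list_add(&l, &n);
    unlinked += conn_free_once(&l, &n);
    unlinked += conn_free_once(&l, &n);
    return (l.count == 0 && l.head.next == &l.head) ? unlinked : -1;
}
```

Without the flag, the second call would dereference the poisoned `prev` and `next` pointers, which corresponds to the list_del corruption in the splat.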
    • T
      netfilter: nf_conncount: use spin_lock_bh instead of spin_lock · 08c7e68a
      Taehee Yoo committed
      [ Upstream commit fd3e71a9f71e232181a225301a75936373636ccc ]
      
      conn_free() takes the lock with spin_lock() and is called by both
      nf_conncount_lookup() and nf_conncount_gc_list(). nf_conncount_lookup()
      is called from bottom-half context and nf_conncount_gc_list() from
      process context, so the spin_lock() call is not safe. Hence
      conn_free() should use spin_lock_bh() instead of spin_lock().
      
      test commands:
         %nft add table ip filter
         %nft add chain ip filter input { type filter hook input priority 0\; }
         %nft add rule filter input meter test { ip saddr ct count over 2 } \
      	   counter
      
      splat looks like:
      [  461.996507] ================================
      [  461.998999] WARNING: inconsistent lock state
      [  461.998999] 4.19.0-rc6+ #22 Not tainted
      [  461.998999] --------------------------------
      [  461.998999] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
      [  461.998999] kworker/0:2/134 [HC0[0]:SC0[0]:HE1:SE1] takes:
      [  461.998999] 00000000a71a559a (&(&list->list_lock)->rlock){+.?.}, at: conn_free+0x69/0x2b0 [nf_conncount]
      [  461.998999] {IN-SOFTIRQ-W} state was registered at:
      [  461.998999]   _raw_spin_lock+0x30/0x70
      [  461.998999]   nf_conncount_add+0x28a/0x520 [nf_conncount]
      [  461.998999]   nft_connlimit_eval+0x401/0x580 [nft_connlimit]
      [  461.998999]   nft_dynset_eval+0x32b/0x590 [nf_tables]
      [  461.998999]   nft_do_chain+0x497/0x1430 [nf_tables]
      [  461.998999]   nft_do_chain_ipv4+0x255/0x330 [nf_tables]
      [  461.998999]   nf_hook_slow+0xb1/0x160
      [ ... ]
      [  461.998999] other info that might help us debug this:
      [  461.998999]  Possible unsafe locking scenario:
      [  461.998999]
      [  461.998999]        CPU0
      [  461.998999]        ----
      [  461.998999]   lock(&(&list->list_lock)->rlock);
      [  461.998999]   <Interrupt>
      [  461.998999]     lock(&(&list->list_lock)->rlock);
      [  461.998999]
      [  461.998999]  *** DEADLOCK ***
      [  461.998999]
      [ ... ]
      
      Fixes: 5c789e13 ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      08c7e68a