1. 29 10月, 2019 2 次提交
  2. 08 10月, 2019 1 次提交
  3. 05 10月, 2019 1 次提交
  4. 19 9月, 2019 1 次提交
    • M
      driver core: Fix use-after-free and double free on glue directory · e1666bcb
      Muchun Song 提交于
      commit ac43432cb1f5c2950408534987e57c2071e24d8f upstream.
      
      There is a race condition between removing glue directory and adding a new
      device under the glue dir. It can be reproduced in following test:
      
      CPU1:                                         CPU2:
      
      device_add()
        get_device_parent()
          class_dir_create_and_add()
            kobject_add_internal()
              create_dir()    // create glue_dir
      
                                                    device_add()
                                                      get_device_parent()
                                                        kobject_get() // get glue_dir
      
      device_del()
        cleanup_glue_dir()
          kobject_del(glue_dir)
      
                                                      kobject_add()
                                                        kobject_add_internal()
                                                          create_dir() // in glue_dir
                                                            sysfs_create_dir_ns()
                                                              kernfs_create_dir_ns(sd)
      
            sysfs_remove_dir() // glue_dir->sd=NULL
            sysfs_put()        // free glue_dir->sd
      
                                                                // sd is freed
                                                                kernfs_new_node(sd)
                                                                  kernfs_get(glue_dir)
                                                                  kernfs_add_one()
                                                                  kernfs_put()
      
      Before CPU1 remove last child device under glue dir, if CPU2 add a new
      device under glue dir, the glue_dir kobject reference count will be
      increase to 2 via kobject_get() in get_device_parent(). And CPU2 has
      been called kernfs_create_dir_ns(), but not call kernfs_new_node().
      Meanwhile, CPU1 call sysfs_remove_dir() and sysfs_put(). This result in
      glue_dir->sd is freed and it's reference count will be 0. Then CPU2 call
      kernfs_get(glue_dir) will trigger a warning in kernfs_get() and increase
      it's reference count to 1. Because glue_dir->sd is freed by CPU1, the next
      call kernfs_add_one() by CPU2 will fail(This is also use-after-free)
      and call kernfs_put() to decrease reference count. Because the reference
      count is decremented to 0, it will also call kmem_cache_free() to free
      the glue_dir->sd again. This will result in double free.
      
      In order to avoid this happening, we also should make sure that kernfs_node
      for glue_dir is released in CPU1 only when refcount for glue_dir kobj is
      1 to fix this race.
      
      The following calltrace is captured in kernel 4.14 with the following patch
      applied:
      
      commit 726e4109 ("drivers: core: Remove glue dirs from sysfs earlier")
      
      --------------------------------------------------------------------------
      [    3.633703] WARNING: CPU: 4 PID: 513 at .../fs/kernfs/dir.c:494
                      Here is WARN_ON(!atomic_read(&kn->count) in kernfs_get().
      ....
      [    3.633986] Call trace:
      [    3.633991]  kernfs_create_dir_ns+0xa8/0xb0
      [    3.633994]  sysfs_create_dir_ns+0x54/0xe8
      [    3.634001]  kobject_add_internal+0x22c/0x3f0
      [    3.634005]  kobject_add+0xe4/0x118
      [    3.634011]  device_add+0x200/0x870
      [    3.634017]  _request_firmware+0x958/0xc38
      [    3.634020]  request_firmware_into_buf+0x4c/0x70
      ....
      [    3.634064] kernel BUG at .../mm/slub.c:294!
                      Here is BUG_ON(object == fp) in set_freepointer().
      ....
      [    3.634346] Call trace:
      [    3.634351]  kmem_cache_free+0x504/0x6b8
      [    3.634355]  kernfs_put+0x14c/0x1d8
      [    3.634359]  kernfs_create_dir_ns+0x88/0xb0
      [    3.634362]  sysfs_create_dir_ns+0x54/0xe8
      [    3.634366]  kobject_add_internal+0x22c/0x3f0
      [    3.634370]  kobject_add+0xe4/0x118
      [    3.634374]  device_add+0x200/0x870
      [    3.634378]  _request_firmware+0x958/0xc38
      [    3.634381]  request_firmware_into_buf+0x4c/0x70
      --------------------------------------------------------------------------
      
      Fixes: 726e4109 ("drivers: core: Remove glue dirs from sysfs earlier")
      Signed-off-by: NMuchun Song <smuchun@gmail.com>
      Reviewed-by: NMukesh Ojha <mojha@codeaurora.org>
      Signed-off-by: NPrateek Sood <prsood@codeaurora.org>
      Link: https://lore.kernel.org/r/20190727032122.24639-1-smuchun@gmail.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e1666bcb
  5. 09 8月, 2019 2 次提交
    • D
      drivers/base: Introduce kill_device() · c23106d4
      Dan Williams 提交于
      commit 00289cd87676e14913d2d8492d1ce05c4baafdae upstream.
      
      The libnvdimm subsystem arranges for devices to be destroyed as a result
      of a sysfs operation. Since device_unregister() cannot be called from
      an actively running sysfs attribute of the same device libnvdimm
      arranges for device_unregister() to be performed in an out-of-line async
      context.
      
      The driver core maintains a 'dead' state for coordinating its own racing
      async registration / de-registration requests. Rather than add local
      'dead' state tracking infrastructure to libnvdimm device objects, export
      the existing state tracking via a new kill_device() helper.
      
      The kill_device() helper simply marks the device as dead, i.e. that it
      is on its way to device_del(), or returns that the device was already
      dead. This can be used in advance of calling device_unregister() for
      subsystems like libnvdimm that might need to handle multiple user
      threads racing to delete a device.
      
      This refactoring does not change any behavior, but it is a pre-requisite
      for follow-on fixes and therefore marked for -stable.
      
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Fixes: 4d88a97a ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver...")
      Cc: <stable@vger.kernel.org>
      Tested-by: NJane Chu <jane.chu@oracle.com>
      Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Link: https://lore.kernel.org/r/156341207332.292348.14959761496009347574.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      c23106d4
    • A
      driver core: Establish order of operations for device_add and device_del via bitflag · 7c43f84e
      Alexander Duyck 提交于
      commit 3451a495ef244a88ed6317a035299d835554d579 upstream.
      
      Add an additional bit flag to the device_private struct named "dead".
      
      This additional flag provides a guarantee that when a device_del is
      executed on a given interface an async worker will not attempt to attach
      the driver following the earlier device_del call. Previously this
      guarantee was not present and could result in the device_del call
      attempting to remove a driver from an interface only to have the async
      worker attempt to probe the driver later when it finally completes the
      asynchronous probe call.
      
      One additional change added was that I pulled the check for dev->driver
      out of the __device_attach_driver call and instead placed it in the
      __device_attach_async_helper call. This was motivated by the fact that the
      only other caller of this, __device_attach, had already taken the
      device_lock() and checked for dev->driver. Instead of testing for this
      twice in this path it makes more sense to just consolidate the dev->dead
      and dev->driver checks together into one set of checks.
      Reviewed-by: NDan Williams <dan.j.williams@intel.com>
      Reviewed-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      7c43f84e
  6. 26 7月, 2019 2 次提交
    • S
      regmap: fix bulk writes on paged registers · b01bf44c
      Srinivas Kandagatla 提交于
      [ Upstream commit db057679de3e9e6a03c1bcd5aee09b0d25fd9f5b ]
      
      On buses like SlimBus and SoundWire which does not support
      gather_writes yet in regmap, A bulk write on paged register
      would be silently ignored after programming page.
      This is because local variable 'ret' value in regmap_raw_write_impl()
      gets reset to 0 once page register is written successfully and the
      code below checks for 'ret' value to be -ENOTSUPP before linearising
      the write buffer to send to bus->write().
      
      Fix this by resetting the 'ret' value to -ENOTSUPP in cases where
      gather_writes() is not supported or single register write is
      not possible.
      Signed-off-by: NSrinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Signed-off-by: NMark Brown <broonie@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      b01bf44c
    • D
      regmap: debugfs: Fix memory leak in regmap_debugfs_init · 83d133c9
      Daniel Baluta 提交于
      [ Upstream commit 2899872b627e99b7586fe3b6c9f861da1b4d5072 ]
      
      As detected by kmemleak running on i.MX6ULL board:
      
      nreferenced object 0xd8366600 (size 64):
        comm "swapper/0", pid 1, jiffies 4294937370 (age 933.220s)
        hex dump (first 32 bytes):
          64 75 6d 6d 79 2d 69 6f 6d 75 78 63 2d 67 70 72  dummy-iomuxc-gpr
          40 32 30 65 34 30 30 30 00 e3 f3 ab fe d1 1b dd  @20e4000........
        backtrace:
          [<b0402aec>] kasprintf+0x2c/0x54
          [<a6fbad2c>] regmap_debugfs_init+0x7c/0x31c
          [<9c8d91fa>] __regmap_init+0xb5c/0xcf4
          [<5b1c3d2a>] of_syscon_register+0x164/0x2c4
          [<596a5d80>] syscon_node_to_regmap+0x64/0x90
          [<49bd597b>] imx6ul_init_machine+0x34/0xa0
          [<250a4dac>] customize_machine+0x1c/0x30
          [<2d19fdaf>] do_one_initcall+0x7c/0x398
          [<e6084469>] kernel_init_freeable+0x328/0x448
          [<168c9101>] kernel_init+0x8/0x114
          [<913268aa>] ret_from_fork+0x14/0x20
          [<ce7b131a>] 0x0
      
      Root cause is that map->debugfs_name is allocated using kasprintf
      and then the pointer is lost by assigning it other memory address.
      Reported-by: NStefan Wahren <stefan.wahren@i2se.com>
      Signed-off-by: NDaniel Baluta <daniel.baluta@nxp.com>
      Signed-off-by: NMark Brown <broonie@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      83d133c9
  7. 21 7月, 2019 3 次提交
  8. 31 5月, 2019 1 次提交
  9. 26 5月, 2019 1 次提交
    • J
      driver core: Postpone DMA tear-down until after devres release for probe failure · 90110ffd
      John Garry 提交于
      commit 0b777eee88d712256ba8232a9429edb17c4f9ceb upstream.
      
      In commit 376991db4b64 ("driver core: Postpone DMA tear-down until after
      devres release"), we changed the ordering of tearing down the device DMA
      ops and releasing all the device's resources; this was because the DMA ops
      should be maintained until we release the device's managed DMA memories.
      
      However, we have seen another crash on an arm64 system when a
      device driver probe fails:
      
        hisi_sas_v3_hw 0000:74:02.0: Adding to iommu group 2
        scsi host1: hisi_sas_v3_hw
        BUG: Bad page state in process swapper/0  pfn:313f5
        page:ffff7e0000c4fd40 count:1 mapcount:0
        mapping:0000000000000000 index:0x0
        flags: 0xfffe00000001000(reserved)
        raw: 0fffe00000001000 ffff7e0000c4fd48 ffff7e0000c4fd48
      0000000000000000
        raw: 0000000000000000 0000000000000000 00000001ffffffff
      0000000000000000
        page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
        bad because of flags: 0x1000(reserved)
        Modules linked in:
        CPU: 49 PID: 1 Comm: swapper/0 Not tainted
      5.1.0-rc1-43081-g22d97fd-dirty #1433
        Hardware name: Huawei D06/D06, BIOS Hisilicon D06 UEFI
      RC0 - V1.12.01 01/29/2019
        Call trace:
        dump_backtrace+0x0/0x118
        show_stack+0x14/0x1c
        dump_stack+0xa4/0xc8
        bad_page+0xe4/0x13c
        free_pages_check_bad+0x4c/0xc0
        __free_pages_ok+0x30c/0x340
        __free_pages+0x30/0x44
        __dma_direct_free_pages+0x30/0x38
        dma_direct_free+0x24/0x38
        dma_free_attrs+0x9c/0xd8
        dmam_release+0x20/0x28
        release_nodes+0x17c/0x220
        devres_release_all+0x34/0x54
        really_probe+0xc4/0x2c8
        driver_probe_device+0x58/0xfc
        device_driver_attach+0x68/0x70
        __driver_attach+0x94/0xdc
        bus_for_each_dev+0x5c/0xb4
        driver_attach+0x20/0x28
        bus_add_driver+0x14c/0x200
        driver_register+0x6c/0x124
        __pci_register_driver+0x48/0x50
        sas_v3_pci_driver_init+0x20/0x28
        do_one_initcall+0x40/0x25c
        kernel_init_freeable+0x2b8/0x3c0
        kernel_init+0x10/0x100
        ret_from_fork+0x10/0x18
        Disabling lock debugging due to kernel taint
        BUG: Bad page state in process swapper/0  pfn:313f6
        page:ffff7e0000c4fd80 count:1 mapcount:0
      mapping:0000000000000000 index:0x0
      [   89.322983] flags: 0xfffe00000001000(reserved)
        raw: 0fffe00000001000 ffff7e0000c4fd88 ffff7e0000c4fd88
      0000000000000000
        raw: 0000000000000000 0000000000000000 00000001ffffffff
      0000000000000000
      
      The crash occurs for the same reason.
      
      In this case, on the really_probe() failure path, we are still clearing
      the DMA ops prior to releasing the device's managed memories.
      
      This patch fixes this issue by reordering the DMA ops teardown and the
      call to devres_release_all() on the failure path.
      Reported-by: NXiang Chen <chenxiang66@hisilicon.com>
      Tested-by: NXiang Chen <chenxiang66@hisilicon.com>
      Signed-off-by: NJohn Garry <john.garry@huawei.com>
      Reviewed-by: NRobin Murphy <robin.murphy@arm.com>
      [jpg: backport to 4.19.x and earlier]
      Signed-off-by: NJohn Garry <john.garry@huawei.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      90110ffd
  10. 15 5月, 2019 1 次提交
  11. 20 4月, 2019 2 次提交
    • K
      mm: hide incomplete nr_indirectly_reclaimable in sysfs · 9e91db59
      Konstantin Khlebnikov 提交于
      In upstream branch this fixed by commit b29940c1abd7 ("mm: rename and
      change semantics of nr_indirectly_reclaimable_bytes").
      
      This fixes /sys/devices/system/node/node*/vmstat format:
      
      ...
      nr_dirtied 6613155
      nr_written 5796802
       11089216
      ...
      
      Cc: <stable@vger.kernel.org> # 4.19.y
      Fixes: 7aaf7727 ("mm: don't show nr_indirectly_reclaimable in /proc/vmstat")
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9e91db59
    • J
      PM / Domains: Avoid a potential deadlock · b63df738
      Jiada Wang 提交于
      [ Upstream commit 2071ac985d37efe496782c34318dbead93beb02f ]
      
      Lockdep warns that prepare_lock and genpd->mlock can cause a deadlock
      the deadlock scenario is like following:
      First thread is probing cs2000
      cs2000_probe()
        clk_register()
          __clk_core_init()
            clk_prepare_lock()                            ----> acquires prepare_lock
              cs2000_recalc_rate()
                i2c_smbus_read_byte_data()
                  rcar_i2c_master_xfer()
                    dma_request_chan()
                      rcar_dmac_of_xlate()
                        rcar_dmac_alloc_chan_resources()
                          pm_runtime_get_sync()
                            __pm_runtime_resume()
                              rpm_resume()
                                rpm_callback()
                                  genpd_runtime_resume()   ----> acquires genpd->mlock
      
      Second thread is attaching any device to the same PM domain
      genpd_add_device()
        genpd_lock()                                       ----> acquires genpd->mlock
          cpg_mssr_attach_dev()
            of_clk_get_from_provider()
              __of_clk_get_from_provider()
                __clk_create_clk()
                  clk_prepare_lock()                       ----> acquires prepare_lock
      
      Since currently no PM provider access genpd's critical section
      in .attach_dev, and .detach_dev callbacks, so there is no need to protect
      these two callbacks with genpd->mlock.
      This patch avoids a potential deadlock by moving out .attach_dev and .detach_dev
      from genpd->mlock, so that genpd->mlock won't be held when prepare_lock is acquired
      in .attach_dev and .detach_dev
      Signed-off-by: NJiada Wang <jiada_wang@mentor.com>
      Reviewed-by: NUlf Hansson <ulf.hansson@linaro.org>
      Tested-by: NGeert Uytterhoeven <geert+renesas@glider.be>
      Reviewed-by: NGeert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      b63df738
  12. 24 3月, 2019 1 次提交
    • V
      PM / wakeup: Rework wakeup source timer cancellation · cd738246
      Viresh Kumar 提交于
      commit 1fad17fb1bbcd73159c2b992668a6957ecc5af8a upstream.
      
      If wakeup_source_add() is called right after wakeup_source_remove()
      for the same wakeup source, timer_setup() may be called for a
      potentially scheduled timer which is incorrect.
      
      To avoid that, move the wakeup source timer cancellation from
      wakeup_source_drop() to wakeup_source_remove().
      
      Moreover, make wakeup_source_remove() clear the timer function after
      canceling the timer to let wakeup_source_not_registered() treat
      unregistered wakeup sources in the same way as the ones that have
      never been registered.
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Cc: 4.4+ <stable@vger.kernel.org> # 4.4+
      [ rjw: Subject, changelog, merged two patches together ]
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cd738246
  13. 14 3月, 2019 1 次提交
    • G
      driver core: Postpone DMA tear-down until after devres release · 7053a6fa
      Geert Uytterhoeven 提交于
      commit 376991db4b6464e906d699ef07681e2ffa8ab08c upstream.
      
      When unbinding the (IOMMU-enabled) R-Car SATA device on Salvator-XS
      (R-Car H3 ES2.0), in preparation of rebinding against vfio-platform for
      device pass-through for virtualization:
      
          echo ee300000.sata > /sys/bus/platform/drivers/sata_rcar/unbind
      
      the kernel crashes with:
      
          Unable to handle kernel paging request at virtual address ffffffbf029ffffc
          Mem abort info:
            ESR = 0x96000006
            Exception class = DABT (current EL), IL = 32 bits
            SET = 0, FnV = 0
            EA = 0, S1PTW = 0
          Data abort info:
            ISV = 0, ISS = 0x00000006
            CM = 0, WnR = 0
          swapper pgtable: 4k pages, 39-bit VAs, pgdp = 000000007e8c586c
          [ffffffbf029ffffc] pgd=000000073bfc6003, pud=000000073bfc6003, pmd=0000000000000000
          Internal error: Oops: 96000006 [#1] SMP
          Modules linked in:
          CPU: 0 PID: 1098 Comm: bash Not tainted 5.0.0-rc5-salvator-x-00452-g37596f884f4318ef #287
          Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)
          pstate: 60400005 (nZCv daif +PAN -UAO)
          pc : __free_pages+0x8/0x58
          lr : __dma_direct_free_pages+0x50/0x5c
          sp : ffffff801268baa0
          x29: ffffff801268baa0 x28: 0000000000000000
          x27: ffffffc6f9c60bf0 x26: ffffffc6f9c60bf0
          x25: ffffffc6f9c60810 x24: 0000000000000000
          x23: 00000000fffff000 x22: ffffff8012145000
          x21: 0000000000000800 x20: ffffffbf029fffc8
          x19: 0000000000000000 x18: ffffffc6f86c42c8
          x17: 0000000000000000 x16: 0000000000000070
          x15: 0000000000000003 x14: 0000000000000000
          x13: ffffff801103d7f8 x12: 0000000000000028
          x11: ffffff8011117604 x10: 0000000000009ad8
          x9 : ffffff80110126d0 x8 : ffffffc6f7563000
          x7 : 6b6b6b6b6b6b6b6b x6 : 0000000000000018
          x5 : ffffff8011cf3cc8 x4 : 0000000000004000
          x3 : 0000000000080000 x2 : 0000000000000001
          x1 : 0000000000000000 x0 : ffffffbf029fffc8
          Process bash (pid: 1098, stack limit = 0x00000000c38e3e32)
          Call trace:
           __free_pages+0x8/0x58
           __dma_direct_free_pages+0x50/0x5c
           arch_dma_free+0x1c/0x98
           dma_direct_free+0x14/0x24
           dma_free_attrs+0x9c/0xdc
           dmam_release+0x18/0x20
           release_nodes+0x25c/0x28c
           devres_release_all+0x48/0x4c
           device_release_driver_internal+0x184/0x1f0
           device_release_driver+0x14/0x1c
           unbind_store+0x70/0xb8
           drv_attr_store+0x24/0x34
           sysfs_kf_write+0x4c/0x64
           kernfs_fop_write+0x154/0x1c4
           __vfs_write+0x34/0x164
           vfs_write+0xb4/0x16c
           ksys_write+0x5c/0xbc
           __arm64_sys_write+0x14/0x1c
           el0_svc_common+0x98/0x114
           el0_svc_handler+0x1c/0x24
           el0_svc+0x8/0xc
          Code: d51b4234 17fffffa a9bf7bfd 910003fd (b9403404)
          ---[ end trace 8c564cdd3a1a840f ]---
      
      While I've bisected this to commit e8e683ae9a736407 ("iommu/of: Fix
      probe-deferral"), and reverting that commit on post-v5.0-rc4 kernels
      does fix the problem, this turned out to be a red herring.
      
      On arm64, arch_teardown_dma_ops() resets dev->dma_ops to NULL.
      Hence if a driver has used a managed DMA allocation API, the allocated
      DMA memory will be freed using the direct DMA ops, while it may have
      been allocated using a custom DMA ops (iommu_dma_ops in this case).
      
      Fix this by reversing the order of the calls to devres_release_all() and
      arch_teardown_dma_ops().
      Signed-off-by: NGeert Uytterhoeven <geert+renesas@glider.be>
      Acked-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: stable <stable@vger.kernel.org>
      Reviewed-by: NRobin Murphy <robin.murphy@arm.com>
      [rm: backport for 4.12-4.19 - kernels before 5.0 will not see
       the crash above, but may get silent memory corruption instead]
      Signed-off-by: NRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7053a6fa
  14. 13 2月, 2019 4 次提交
  15. 26 1月, 2019 1 次提交
    • D
      sysfs: Disable lockdep for driver bind/unbind files · a13daf03
      Daniel Vetter 提交于
      [ Upstream commit 4f4b374332ec0ae9c738ff8ec9bed5cd97ff9adc ]
      
      This is the much more correct fix for my earlier attempt at:
      
      https://lkml.org/lkml/2018/12/10/118
      
      Short recap:
      
      - There's not actually a locking issue, it's just lockdep being a bit
        too eager to complain about a possible deadlock.
      
      - Contrary to what I claimed the real problem is recursion on
        kn->count. Greg pointed me at sysfs_break_active_protection(), used
        by the scsi subsystem to allow a sysfs file to unbind itself. That
        would be a real deadlock, which isn't what's happening here. Also,
        breaking the active protection means we'd need to manually handle
        all the lifetime fun.
      
      - With Rafael we discussed the task_work approach, which kinda works,
        but has two downsides: It's a functional change for a lockdep
        annotation issue, and it won't work for the bind file (which needs
        to get the errno from the driver load function back to userspace).
      
      - Greg also asked why this never showed up: To hit this you need to
        unregister a 2nd driver from the unload code of your first driver. I
        guess only gpus do that. The bug has always been there, but only
        with a recent patch series did we add more locks so that lockdep
        built a chain from unbinding the snd-hda driver to the
        acpi_video_unregister call.
      
      Full lockdep splat:
      
      [12301.898799] ============================================
      [12301.898805] WARNING: possible recursive locking detected
      [12301.898811] 4.20.0-rc7+ #84 Not tainted
      [12301.898815] --------------------------------------------
      [12301.898821] bash/5297 is trying to acquire lock:
      [12301.898826] 00000000f61c6093 (kn->count#39){++++}, at: kernfs_remove_by_name_ns+0x3b/0x80
      [12301.898841] but task is already holding lock:
      [12301.898847] 000000005f634021 (kn->count#39){++++}, at: kernfs_fop_write+0xdc/0x190
      [12301.898856] other info that might help us debug this:
      [12301.898862]  Possible unsafe locking scenario:
      [12301.898867]        CPU0
      [12301.898870]        ----
      [12301.898874]   lock(kn->count#39);
      [12301.898879]   lock(kn->count#39);
      [12301.898883] *** DEADLOCK ***
      [12301.898891]  May be due to missing lock nesting notation
      [12301.898899] 5 locks held by bash/5297:
      [12301.898903]  #0: 00000000cd800e54 (sb_writers#4){.+.+}, at: vfs_write+0x17f/0x1b0
      [12301.898915]  #1: 000000000465e7c2 (&of->mutex){+.+.}, at: kernfs_fop_write+0xd3/0x190
      [12301.898925]  #2: 000000005f634021 (kn->count#39){++++}, at: kernfs_fop_write+0xdc/0x190
      [12301.898936]  #3: 00000000414ef7ac (&dev->mutex){....}, at: device_release_driver_internal+0x34/0x240
      [12301.898950]  #4: 000000003218fbdf (register_count_mutex){+.+.}, at: acpi_video_unregister+0xe/0x40
      [12301.898960] stack backtrace:
      [12301.898968] CPU: 1 PID: 5297 Comm: bash Not tainted 4.20.0-rc7+ #84
      [12301.898974] Hardware name: Hewlett-Packard HP EliteBook 8460p/161C, BIOS 68SCF Ver. F.01 03/11/2011
      [12301.898982] Call Trace:
      [12301.898989]  dump_stack+0x67/0x9b
      [12301.898997]  __lock_acquire+0x6ad/0x1410
      [12301.899003]  ? kernfs_remove_by_name_ns+0x3b/0x80
      [12301.899010]  ? find_held_lock+0x2d/0x90
      [12301.899017]  ? mutex_spin_on_owner+0xe4/0x150
      [12301.899023]  ? find_held_lock+0x2d/0x90
      [12301.899030]  ? lock_acquire+0x90/0x180
      [12301.899036]  lock_acquire+0x90/0x180
      [12301.899042]  ? kernfs_remove_by_name_ns+0x3b/0x80
      [12301.899049]  __kernfs_remove+0x296/0x310
      [12301.899055]  ? kernfs_remove_by_name_ns+0x3b/0x80
      [12301.899060]  ? kernfs_name_hash+0xd/0x80
      [12301.899066]  ? kernfs_find_ns+0x6c/0x100
      [12301.899073]  kernfs_remove_by_name_ns+0x3b/0x80
      [12301.899080]  bus_remove_driver+0x92/0xa0
      [12301.899085]  acpi_video_unregister+0x24/0x40
      [12301.899127]  i915_driver_unload+0x42/0x130 [i915]
      [12301.899160]  i915_pci_remove+0x19/0x30 [i915]
      [12301.899169]  pci_device_remove+0x36/0xb0
      [12301.899176]  device_release_driver_internal+0x185/0x240
      [12301.899183]  unbind_store+0xaf/0x180
      [12301.899189]  kernfs_fop_write+0x104/0x190
      [12301.899195]  __vfs_write+0x31/0x180
      [12301.899203]  ? rcu_read_lock_sched_held+0x6f/0x80
      [12301.899209]  ? rcu_sync_lockdep_assert+0x29/0x50
      [12301.899216]  ? __sb_start_write+0x13c/0x1a0
      [12301.899221]  ? vfs_write+0x17f/0x1b0
      [12301.899227]  vfs_write+0xb9/0x1b0
      [12301.899233]  ksys_write+0x50/0xc0
      [12301.899239]  do_syscall_64+0x4b/0x180
      [12301.899247]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [12301.899253] RIP: 0033:0x7f452ac7f7a4
      [12301.899259] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00 00 8b 05 aa f0 2c 00 48 63 ff 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 55 53 48 89 d5 48 89 f3 48 83
      [12301.899273] RSP: 002b:00007ffceafa6918 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [12301.899282] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f452ac7f7a4
      [12301.899288] RDX: 000000000000000d RSI: 00005612a1abf7c0 RDI: 0000000000000001
      [12301.899295] RBP: 00005612a1abf7c0 R08: 000000000000000a R09: 00005612a1c46730
      [12301.899301] R10: 000000000000000a R11: 0000000000000246 R12: 000000000000000d
      [12301.899308] R13: 0000000000000001 R14: 00007f452af4a740 R15: 000000000000000d
      
      Looking around I've noticed that usb and i2c already handle similar
      recursion problems, where a sysfs file can unbind the same type of
      sysfs somewhere else in the hierarchy. Relevant commits are:
      
      commit 356c05d5
      Author: Alan Stern <stern@rowland.harvard.edu>
      Date:   Mon May 14 13:30:03 2012 -0400
      
          sysfs: get rid of some lockdep false positives
      
      commit e9b526fe
      Author: Alexander Sverdlin <alexander.sverdlin@nsn.com>
      Date:   Fri May 17 14:56:35 2013 +0200
      
          i2c: suppress lockdep warning on delete_device
      
      Implement the same trick for driver bind/unbind.
      
      v2: Put the macro into bus.c (Greg).
      Reviewed-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Ramalingam C <ramalingam.c@intel.com>
      Cc: Arend van Spriel <aspriel@gmail.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: Bartosz Golaszewski <brgl@bgdev.pl>
      Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
      Cc: Vivek Gautam <vivek.gautam@codeaurora.org>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      a13daf03
  16. 13 1月, 2019 1 次提交
  17. 10 1月, 2019 1 次提交
    • M
      platform-msi: Free descriptors in platform_msi_domain_free() · 6c56e89e
      Miquel Raynal 提交于
      commit 81b1e6e6a8590a19257e37a1633bec098d499c57 upstream.
      
      Since the addition of platform MSI support, there were two helpers
      supposed to allocate/free IRQs for a device:
      
          platform_msi_domain_alloc_irqs()
          platform_msi_domain_free_irqs()
      
      In these helpers, IRQ descriptors are allocated in the "alloc" routine
      while they are freed in the "free" one.
      
      Later, two other helpers have been added to handle IRQ domains on top
      of MSI domains:
      
          platform_msi_domain_alloc()
          platform_msi_domain_free()
      
      Seen from the outside, the logic is pretty close with the former
      helpers and people used it with the same logic as before: a
      platform_msi_domain_alloc() call should be balanced with a
      platform_msi_domain_free() call. While this is probably what was
      intended to do, the platform_msi_domain_free() does not remove/free
      the IRQ descriptor(s) created/inserted in
      platform_msi_domain_alloc().
      
      One effect of such situation is that removing a module that requested
      an IRQ will let one orphaned IRQ descriptor (with an allocated MSI
      entry) in the device descriptors list. Next time the module will be
      inserted back, one will observe that the allocation will happen twice
      in the MSI domain, one time for the remaining descriptor, one time for
      the new one. It also has the side effect to quickly overshoot the
      maximum number of allocated MSI and then prevent any module requesting
      an interrupt in the same domain to be inserted anymore.
      
      This situation has been met with loops of insertion/removal of the
      mvpp2.ko module (requesting 15 MSIs each time).
      
      Fixes: 552c494a ("platform-msi: Allow creation of a MSI-based stacked irq domain")
      Cc: stable@vger.kernel.org
      Signed-off-by: NMiquel Raynal <miquel.raynal@bootlin.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6c56e89e
  18. 05 10月, 2018 1 次提交
  19. 30 9月, 2018 1 次提交
  20. 12 9月, 2018 1 次提交
    • R
      firmware: Fix security issue with request_firmware_into_buf() · 422b3db2
      Rishabh Bhatnagar 提交于
      When calling request_firmware_into_buf() with the FW_OPT_NOCACHE flag
      it is expected that firmware is loaded into buffer from memory.
      But inside alloc_lookup_fw_priv every new firmware that is loaded is
      added to the firmware cache (fwc) list head. So if any driver requests
      a firmware that is already loaded the code iterates over the above
      mentioned list and it can end up giving a pointer to other device driver's
      firmware buffer.
      Also the existing copy may either be modified by drivers, remote processors
      or even freed. This causes a potential security issue with batched requests
      when using request_firmware_into_buf.
      
      Fix alloc_lookup_fw_priv to not add to the fwc head list if FW_OPT_NOCACHE
      is set, and also don't do the lookup in the list.
      
      Fixes: 0e742e92 ("firmware: provide infrastructure to make fw caching optional")
      [mcgrof: broken since feature introduction on v4.8]
      
      Cc: stable@vger.kernel.org # v4.8+
      Signed-off-by: NVikram Mulukutla <markivx@codeaurora.org>
      Signed-off-by: NRishabh Bhatnagar <rishabhb@codeaurora.org>
      Signed-off-by: NLuis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      422b3db2
  21. 05 9月, 2018 1 次提交
    • M
      memory_hotplug: fix kernel_panic on offline page processing · 4e8346d0
      Mikhail Zaslonko 提交于
      Within show_valid_zones() the function test_pages_in_a_zone() should be
      called for online memory blocks only.
      
      Otherwise it might lead to the VM_BUG_ON due to uninitialized struct
      pages (when CONFIG_DEBUG_VM_PGFLAGS kernel option is set):
      
       page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
       ------------[ cut here ]------------
       Call Trace:
       ([<000000000038f91e>] test_pages_in_a_zone+0xe6/0x168)
        [<0000000000923472>] show_valid_zones+0x5a/0x1a8
        [<0000000000900284>] dev_attr_show+0x3c/0x78
        [<000000000046f6f0>] sysfs_kf_seq_show+0xd0/0x150
        [<00000000003ef662>] seq_read+0x212/0x4b8
        [<00000000003bf202>] __vfs_read+0x3a/0x178
        [<00000000003bf3ca>] vfs_read+0x8a/0x148
        [<00000000003bfa3a>] ksys_read+0x62/0xb8
        [<0000000000bc2220>] system_call+0xdc/0x2d8
      
      That VM_BUG_ON was triggered by the page poisoning introduced in
      mm/sparse.c with the git commit d0dc12e8 ("mm/memory_hotplug:
      optimize memory hotplug").
      
      With the same commit the new 'nid' field has been added to the struct
      memory_block in order to store and later on derive the node id for
      offline pages (instead of accessing struct page which might be
      uninitialized).  But one reference to nid in show_valid_zones() function
      has been overlooked.  Fixed with current commit.  Also, nr_pages will
      not be used any more after test_pages_in_a_zone() call, do not update
      it.
      
      Link: http://lkml.kernel.org/r/20180828090539.41491-1-zaslonko@linux.ibm.com
      Fixes: d0dc12e8 ("mm/memory_hotplug: optimize memory hotplug")
      Signed-off-by: NMikhail Zaslonko <zaslonko@linux.ibm.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: NPavel Tatashin <pavel.tatashin@microsoft.com>
      Cc: <stable@vger.kernel.org>	[4.17+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4e8346d0
  22. 24 8月, 2018 1 次提交
  23. 18 8月, 2018 4 次提交
  24. 09 8月, 2018 1 次提交
    • C
      regmap: Add regmap_noinc_read API · 74fe7b55
      Crestez Dan Leonard 提交于
      The regmap API usually assumes that bulk read operations will read a
      range of registers but some I2C/SPI devices have certain registers for
      which a such a read operation will return data from an internal FIFO
      instead. Add an explicit API to support bulk read without range semantics.
      
      Some linux drivers use regmap_bulk_read or regmap_raw_read for such
      registers, for example mpu6050 or bmi150 from IIO. This only happens to
      work because when caching is disabled a single regmap read op will map
      to a single bus read op (as desired). This breaks if caching is enabled and
      reg+1 happens to be a cacheable register.
      
      Without regmap support refactoring a driver to enable regmap caching
      requires separate I2C and SPI paths. This is exactly what regmap is
      supposed to help avoid.
      Suggested-by: NJonathan Cameron <jic23@kernel.org>
      Signed-off-by: NCrestez Dan Leonard <leonard.crestez@intel.com>
      Signed-off-by: NStefan Popa <stefan.popa@analog.com>
      Signed-off-by: NMark Brown <broonie@kernel.org>
      74fe7b55
  25. 24 7月, 2018 1 次提交
  26. 21 7月, 2018 3 次提交