1. 07 Dec, 2022: 1 commit
    • ata: libahci_platform: ahci_platform_find_clk: oops, NULL pointer · d95d140e
      Committed by Anders Roxell
      When booting an ARM 32-bit kernel with CONFIG_AHCI_DWC enabled on an
      am57xx-evm board, the kernel oopses with a NULL pointer dereference.
      This happens when the clock references are unnamed in DT: strcmp() is
      then called with a NULL clock id. See the following oops:
      
      [    4.673950] Unable to handle kernel NULL pointer dereference at virtual address 00000000
      [    4.682098] [00000000] *pgd=00000000
      [    4.685699] Internal error: Oops: 5 [#1] SMP ARM
      [    4.690338] Modules linked in:
      [    4.693420] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc7 #1
      [    4.699615] Hardware name: Generic DRA74X (Flattened Device Tree)
      [    4.705749] PC is at strcmp+0x0/0x34
      [    4.709350] LR is at ahci_platform_find_clk+0x3c/0x5c
      [    4.714416] pc : [<c130c494>]    lr : [<c0c230e0>]    psr: 20000013
      [    4.720703] sp : f000dda8  ip : 00000001  fp : c29b1840
      [    4.725952] r10: 00000020  r9 : c1b23380  r8 : c1b23368
      [    4.731201] r7 : c1ab4cc4  r6 : 00000001  r5 : c3c66040  r4 : 00000000
      [    4.737762] r3 : 00000080  r2 : 00000080  r1 : c1ab4cc4  r0 : 00000000
      [...]
      [    4.998870]  strcmp from ahci_platform_find_clk+0x3c/0x5c
      [    5.004302]  ahci_platform_find_clk from ahci_dwc_probe+0x1f0/0x54c
      [    5.010589]  ahci_dwc_probe from platform_probe+0x64/0xc0
      [    5.016021]  platform_probe from really_probe+0xe8/0x41c
      [    5.021362]  really_probe from __driver_probe_device+0xa4/0x204
      [    5.027313]  __driver_probe_device from driver_probe_device+0x38/0xc8
      [    5.033782]  driver_probe_device from __driver_attach+0xb4/0x1ec
      [    5.039825]  __driver_attach from bus_for_each_dev+0x78/0xb8
      [    5.045532]  bus_for_each_dev from bus_add_driver+0x17c/0x220
      [    5.051300]  bus_add_driver from driver_register+0x90/0x124
      [    5.056915]  driver_register from do_one_initcall+0x48/0x1e8
      [    5.062591]  do_one_initcall from kernel_init_freeable+0x1cc/0x234
      [    5.068817]  kernel_init_freeable from kernel_init+0x20/0x13c
      [    5.074584]  kernel_init from ret_from_fork+0x14/0x2c
      [    5.079681] Exception stack(0xf000dfb0 to 0xf000dff8)
      [    5.084747] dfa0:                                     00000000 00000000 00000000 00000000
      [    5.092956] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      [    5.101165] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
      [    5.107818] Code: e5e32001 e3520000 1afffffb e12fff1e (e4d03001)
      [    5.114013] ---[ end trace 0000000000000000 ]---
      
      Add an extra check to the if-statement that hpriv->clks[i].id is non-NULL.
      
      Fixes: 6ce73f3a ("ata: libahci_platform: Add function returning a clock-handle by id")
      Suggested-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
      Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
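      A minimal userspace sketch of the fixed lookup, assuming a simplified
      clock table in which an unnamed DT clock has a NULL id. The struct and
      function names are hypothetical stand-ins, not the libahci_platform
      code; the point is only that the NULL test must come before strcmp()
      in the same condition, so unnamed entries are skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a bulk clock entry: 'id' is NULL when the DT
 * clock reference has no matching clock-names entry. */
struct example_clk {
	const char *id;
	int handle; /* placeholder for the real clock pointer */
};

/* Lookup in the spirit of ahci_platform_find_clk(): unnamed entries are
 * skipped instead of being passed to strcmp(). */
static const struct example_clk *
example_find_clk(const struct example_clk *clks, size_t n, const char *con_id)
{
	for (size_t i = 0; i < n; i++) {
		if (clks[i].id && !strcmp(clks[i].id, con_id))
			return &clks[i];
	}
	return NULL;
}
```

      With the short-circuiting `&&`, strcmp() is never reached for an entry
      whose id is NULL, which is exactly the case that crashed above.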
  2. 12 Nov, 2022: 1 commit
    • ata: libata-core: do not issue non-internal commands once EH is pending · e20e81a2
      Committed by Niklas Cassel
      While the ATA specification states that a device should return command
      aborted for all commands queued after the device has entered error state,
      since ATA only keeps the sense data for the latest command (in non-NCQ
      case), we really don't want to send block layer commands to the device
      after it has entered error state. (Only ATA EH commands should be sent,
      to read the sense data etc.)
      
      Currently, scsi_queue_rq() will check if scsi_host_in_recovery()
      (state is SHOST_RECOVERY), and if so, it will _not_ issue a command via:
      scsi_dispatch_cmd() -> host->hostt->queuecommand() (ata_scsi_queuecmd())
      -> __ata_scsi_queuecmd() -> ata_scsi_translate() -> ata_qc_issue()
      
      Before commit e494f6a7 ("[SCSI] improved eh timeout handler"),
      when receiving a TFES error IRQ, the call chain looked like this:
      ahci_error_intr() -> ata_port_abort() -> ata_do_link_abort() ->
      ata_qc_complete() -> ata_qc_schedule_eh() -> blk_abort_request() ->
      blk_rq_timed_out() -> q->rq_timed_out_fn() (scsi_times_out()) ->
      scsi_eh_scmd_add() -> scsi_host_set_state(shost, SHOST_RECOVERY)
      
      Which meant that as soon as an error IRQ was serviced, SHOST_RECOVERY
      would be set.
      
      However, after commit e494f6a7 ("[SCSI] improved eh timeout handler"),
      scsi_times_out() will instead call scsi_abort_command() which will queue
      delayed work, and the worker function scmd_eh_abort_handler() will call
      scsi_eh_scmd_add(), which calls scsi_host_set_state(shost, SHOST_RECOVERY).
      
      So now, after the TFES error IRQ has been serviced, we need to wait for
      the SCSI workqueue to run its work before SHOST_RECOVERY gets set.
      
      It is worth noting that, even before commit e494f6a7 ("[SCSI] improved
      eh timeout handler"), we could receive an error IRQ from the time when
      scsi_queue_rq() checks scsi_host_in_recovery(), to the time when
      ata_scsi_queuecmd() is actually called.
      
      In order to handle both the delayed setting of SHOST_RECOVERY and the
      window where we can receive an error IRQ, add a check against
      ATA_PFLAG_EH_PENDING (which gets set when servicing the error IRQ),
      inside ata_scsi_queuecmd() itself, while holding the ap->lock.
      (Since the ap->lock is held while servicing IRQs.)
      
      Fixes: e494f6a7 ("[SCSI] improved eh timeout handler")
      Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
      Tested-by: John Garry <john.g.garry@oracle.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
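      The locking idea can be sketched in userspace as follows. This is a
      hypothetical model, not libata code: a pthread mutex plays the role of
      ap->lock, a bool plays the role of ATA_PFLAG_EH_PENDING, and the
      "busy" return value is illustrative only. Because the flag is set and
      tested under the same lock, the queue path sees the error as soon as
      the error handler has run, without waiting for SHOST_RECOVERY.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Illustrative results: issued now, or "busy, retry later". */
enum example_qc_result { EXAMPLE_ISSUED = 0, EXAMPLE_HOST_BUSY = 1 };

struct example_port {
	pthread_mutex_t lock;   /* plays the role of ap->lock */
	bool eh_pending;        /* plays the role of ATA_PFLAG_EH_PENDING */
	unsigned int issued;    /* commands sent to the "device" */
};

/* Error "IRQ": sets the pending flag while holding the port lock. */
static void example_error_irq(struct example_port *ap)
{
	pthread_mutex_lock(&ap->lock);
	ap->eh_pending = true;
	pthread_mutex_unlock(&ap->lock);
}

/* Queue path: refuse non-internal commands once EH is pending. */
static enum example_qc_result example_queuecmd(struct example_port *ap)
{
	enum example_qc_result ret = EXAMPLE_ISSUED;

	pthread_mutex_lock(&ap->lock);
	if (ap->eh_pending)
		ret = EXAMPLE_HOST_BUSY;
	else
		ap->issued++;
	pthread_mutex_unlock(&ap->lock);
	return ret;
}
```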
  3. 11 Nov, 2022: 4 commits
    • ata: libata-transport: fix error handling in ata_tdev_add() · 1ff36351
      Committed by Yang Yingliang
      In ata_tdev_add(), the return value of transport_add_device() is
      not checked. As a result, a NULL pointer dereference occurs when the
      module is removed, because transport_remove_device() is called on a
      device that was never added.
      
      Unable to handle kernel NULL pointer dereference at virtual address 00000000000000d0
      CPU: 13 PID: 13603 Comm: rmmod Kdump: loaded Tainted: G        W          6.1.0-rc3+ #36
      pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      pc : device_del+0x48/0x3a0
      lr : device_del+0x44/0x3a0
      Call trace:
       device_del+0x48/0x3a0
       attribute_container_class_device_del+0x28/0x40
       transport_remove_classdev+0x60/0x7c
       attribute_container_device_trigger+0x118/0x120
       transport_remove_device+0x20/0x30
       ata_tdev_delete+0x24/0x50 [libata]
       ata_tlink_delete+0x40/0xa0 [libata]
       ata_tport_delete+0x2c/0x60 [libata]
       ata_port_detach+0x148/0x1b0 [libata]
       ata_pci_remove_one+0x50/0x80 [libata]
       ahci_remove_one+0x4c/0x8c [ahci]
      
      Fix this by checking and handling the return value of
      transport_add_device() in ata_tdev_add(). In the error path, device_del()
      is called to delete the device that was added earlier in this function,
      and ata_tdev_free() is called to free ata_dev.
      
      Fixes: d9027470 ("[libata] Add ATA transport class")
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
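      The error-handling pattern described above (check each registration
      step and unwind the earlier ones in reverse order) can be sketched in
      userspace. Every name below is hypothetical: calloc()/free() stand in
      for allocating ata_dev and for ata_tdev_free(), and the fake helpers
      stand in for device_add(), device_del() and transport_add_device().
      The same shape applies to the ata_tlink_add() and ata_tport_add()
      fixes that follow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a transport device; not libata or driver-core API. */
struct example_tdev {
	bool added;            /* "device_add()" succeeded */
	bool transport_added;  /* "transport_add_device()" succeeded */
};

static bool example_fail_transport; /* test knob: force the failure */

static int example_device_add(struct example_tdev *d) { d->added = true; return 0; }
static void example_device_del(struct example_tdev *d) { d->added = false; }

static int example_transport_add_device(struct example_tdev *d)
{
	if (example_fail_transport)
		return -1;
	d->transport_added = true;
	return 0;
}

/* Check every step; on failure undo the earlier steps in reverse order so
 * teardown never sees a half-registered device. */
static int example_tdev_add(struct example_tdev **out)
{
	struct example_tdev *d = calloc(1, sizeof(*d));
	int err;

	*out = NULL;
	if (!d)
		return -1;
	err = example_device_add(d);
	if (err)
		goto free_dev;
	err = example_transport_add_device(d);
	if (err)
		goto del_dev;
	*out = d;
	return 0;

del_dev:
	example_device_del(d);
free_dev:
	free(d); /* stands in for ata_tdev_free() */
	return err;
}
```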
    • ata: libata-transport: fix error handling in ata_tlink_add() · cf0816f6
      Committed by Yang Yingliang
      In ata_tlink_add(), the return value of transport_add_device() is
      not checked. As a result, a NULL pointer dereference occurs when the
      module is removed, because transport_remove_device() is called on a
      device that was never added.
      
      Unable to handle kernel NULL pointer dereference at virtual address 00000000000000d0
      CPU: 33 PID: 13850 Comm: rmmod Kdump: loaded Tainted: G        W          6.1.0-rc3+ #12
      pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      pc : device_del+0x48/0x39c
      lr : device_del+0x44/0x39c
      Call trace:
       device_del+0x48/0x39c
       attribute_container_class_device_del+0x28/0x40
       transport_remove_classdev+0x60/0x7c
       attribute_container_device_trigger+0x118/0x120
       transport_remove_device+0x20/0x30
       ata_tlink_delete+0x88/0xb0 [libata]
       ata_tport_delete+0x2c/0x60 [libata]
       ata_port_detach+0x148/0x1b0 [libata]
       ata_pci_remove_one+0x50/0x80 [libata]
       ahci_remove_one+0x4c/0x8c [ahci]
      
      Fix this by checking and handling the return value of
      transport_add_device() in ata_tlink_add().
      
      Fixes: d9027470 ("[libata] Add ATA transport class")
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    • ata: libata-transport: fix error handling in ata_tport_add() · 3613dbe3
      Committed by Yang Yingliang
      In ata_tport_add(), the return value of transport_add_device() is
      not checked. As a result, a NULL pointer dereference occurs when the
      module is removed, because transport_remove_device() is called on a
      device that was never added.
      
      Unable to handle kernel NULL pointer dereference at virtual address 00000000000000d0
      CPU: 12 PID: 13605 Comm: rmmod Kdump: loaded Tainted: G        W          6.1.0-rc3+ #8
      pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      pc : device_del+0x48/0x39c
      lr : device_del+0x44/0x39c
      Call trace:
       device_del+0x48/0x39c
       attribute_container_class_device_del+0x28/0x40
       transport_remove_classdev+0x60/0x7c
       attribute_container_device_trigger+0x118/0x120
       transport_remove_device+0x20/0x30
       ata_tport_delete+0x34/0x60 [libata]
       ata_port_detach+0x148/0x1b0 [libata]
       ata_pci_remove_one+0x50/0x80 [libata]
       ahci_remove_one+0x4c/0x8c [ahci]
      
      Fix this by checking and handling the return value of
      transport_add_device() in ata_tport_add().
      
      Fixes: d9027470 ("[libata] Add ATA transport class")
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    • ata: libata-transport: fix double ata_host_put() in ata_tport_add() · 8c763107
      Committed by Yang Yingliang
      In the error path of ata_tport_add(), calling put_device() invokes
      ata_tport_release(), which drops a reference on 'ap->host'.

      ata_host_put() is then called again, which decreases the refcount to
      0, so ata_host_release() is called and all ports are freed and set to
      NULL.

      When the device is unbound after the failure, ata_host_stop() is called
      to release resources, and it dereferences a NULL pointer because all
      the ports have already been freed and set to NULL.
      
      Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
      CPU: 7 PID: 18671 Comm: modprobe Kdump: loaded Tainted: G            E      6.1.0-rc3+ #8
      pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      pc : ata_host_stop+0x3c/0x84 [libata]
      lr : release_nodes+0x64/0xd0
      Call trace:
       ata_host_stop+0x3c/0x84 [libata]
       release_nodes+0x64/0xd0
       devres_release_all+0xbc/0x1b0
       device_unbind_cleanup+0x20/0x70
       really_probe+0x158/0x320
       __driver_probe_device+0x84/0x120
       driver_probe_device+0x44/0x120
       __driver_attach+0xb4/0x220
       bus_for_each_dev+0x78/0xdc
       driver_attach+0x2c/0x40
       bus_add_driver+0x184/0x240
       driver_register+0x80/0x13c
       __pci_register_driver+0x4c/0x60
       ahci_pci_driver_init+0x30/0x1000 [ahci]
      
      Fix this by removing the redundant ata_host_put() in the error path.
      
      Fixes: 2623c7a5 ("libata: add refcounting to ata_host")
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
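      A small refcount model may make the double put easier to see. It is a
      hypothetical sketch, not libata code: the port's release callback
      owns one host reference (as ata_tport_release() is described doing
      above), so the error path must drop only the port reference. The
      starting count of two host references is an assumption chosen for
      illustration.

```c
#include <assert.h>
#include <stdbool.h>

struct example_host { int refcount; bool released; };
struct example_port { struct example_host *host; int refcount; };

static void example_host_put(struct example_host *h)
{
	if (--h->refcount == 0)
		h->released = true; /* ata_host_release() would free ports here */
}

/* Dropping the last port reference runs the "release callback", which
 * in turn puts the host reference held by the port. */
static void example_port_put(struct example_port *p)
{
	if (--p->refcount == 0)
		example_host_put(p->host);
}

/* Error path after the fix: put the port device only. */
static void example_tport_add_error_path(struct example_port *p)
{
	example_port_put(p);
	/* example_host_put(p->host);  <- the redundant put that was removed */
}
```

      After the fixed error path the host still holds the driver's
      reference, so a later teardown such as ata_host_stop() finds its
      ports intact; one more put (the removed line) would release it early.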
  4. 08 Nov, 2022: 1 commit
  5. 31 Oct, 2022: 2 commits
  6. 18 Oct, 2022: 5 commits
  7. 17 Oct, 2022: 3 commits
    • ata: ahci_st: Fix compilation warning · 17cc1ee6
      Committed by Damien Le Moal
      If CONFIG_OF is disabled and the ahci_st driver is builtin (or
      CONFIG_MODULES is disabled), then using the macro of_match_ptr()
      results in the st_ahci_match variable being unused, which generates a
      compilation warning and a compilation error if CONFIG_WERROR is enabled.
      
      Fix this by directly assigning st_ahci_match to .of_match_table in the
      st_ahci_driver platform driver definition.

      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
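      The mechanism can be reproduced in userspace with a re-creation of the
      of_match_ptr() idea: the macro yields its argument when OF support is
      configured and NULL otherwise. EXAMPLE_CONFIG_OF is a hypothetical
      stand-in for CONFIG_OF, and the table contents are illustrative. With
      the option off, the "before" form never references the match table,
      which is what triggers the unused-variable warning; the "after" form
      always references it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical re-creation of the of_match_ptr() pattern. */
#ifdef EXAMPLE_CONFIG_OF
#define example_of_match_ptr(ptr) (ptr)
#else
#define example_of_match_ptr(ptr) NULL
#endif

struct example_of_device_id { const char *compatible; };
struct example_driver { const struct example_of_device_id *of_match_table; };

static const struct example_of_device_id example_match[] = {
	{ "st,ahci" }, /* illustrative entry */
	{ NULL }
};

/* Before: without EXAMPLE_CONFIG_OF this expands to NULL, so on its own
 * it would leave example_match unreferenced. */
static const struct example_driver example_driver_before = {
	.of_match_table = example_of_match_ptr(example_match),
};

/* After: assign the table directly so it is always referenced. */
static const struct example_driver example_driver_after = {
	.of_match_table = example_match,
};
```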
    • ata: ahci: Match EM_MAX_SLOTS with SATA_PMP_MAX_PORTS · 1e41e693
      Committed by Kai-Heng Feng
      UBSAN complains about array-index-out-of-bounds:
      [ 1.980703] kernel: UBSAN: array-index-out-of-bounds in /build/linux-9H675w/linux-5.15.0/drivers/ata/libahci.c:968:41
      [ 1.980709] kernel: index 15 is out of range for type 'ahci_em_priv [8]'
      [ 1.980713] kernel: CPU: 0 PID: 209 Comm: scsi_eh_8 Not tainted 5.15.0-25-generic #25-Ubuntu
      [ 1.980716] kernel: Hardware name: System manufacturer System Product Name/P5Q3, BIOS 1102 06/11/2010
      [ 1.980718] kernel: Call Trace:
      [ 1.980721] kernel: <TASK>
      [ 1.980723] kernel: show_stack+0x52/0x58
      [ 1.980729] kernel: dump_stack_lvl+0x4a/0x5f
      [ 1.980734] kernel: dump_stack+0x10/0x12
      [ 1.980736] kernel: ubsan_epilogue+0x9/0x45
      [ 1.980739] kernel: __ubsan_handle_out_of_bounds.cold+0x44/0x49
      [ 1.980742] kernel: ahci_qc_issue+0x166/0x170 [libahci]
      [ 1.980748] kernel: ata_qc_issue+0x135/0x240
      [ 1.980752] kernel: ata_exec_internal_sg+0x2c4/0x580
      [ 1.980754] kernel: ? vprintk_default+0x1d/0x20
      [ 1.980759] kernel: ata_exec_internal+0x67/0xa0
      [ 1.980762] kernel: sata_pmp_read+0x8d/0xc0
      [ 1.980765] kernel: sata_pmp_read_gscr+0x3c/0x90
      [ 1.980768] kernel: sata_pmp_attach+0x8b/0x310
      [ 1.980771] kernel: ata_eh_revalidate_and_attach+0x28c/0x4b0
      [ 1.980775] kernel: ata_eh_recover+0x6b6/0xb30
      [ 1.980778] kernel: ? ahci_do_hardreset+0x180/0x180 [libahci]
      [ 1.980783] kernel: ? ahci_stop_engine+0xb0/0xb0 [libahci]
      [ 1.980787] kernel: ? ahci_do_softreset+0x290/0x290 [libahci]
      [ 1.980792] kernel: ? trace_event_raw_event_ata_eh_link_autopsy_qc+0xe0/0xe0
      [ 1.980795] kernel: sata_pmp_eh_recover.isra.0+0x214/0x560
      [ 1.980799] kernel: sata_pmp_error_handler+0x23/0x40
      [ 1.980802] kernel: ahci_error_handler+0x43/0x80 [libahci]
      [ 1.980806] kernel: ata_scsi_port_error_handler+0x2b1/0x600
      [ 1.980810] kernel: ata_scsi_error+0x9c/0xd0
      [ 1.980813] kernel: scsi_error_handler+0xa1/0x180
      [ 1.980817] kernel: ? scsi_unjam_host+0x1c0/0x1c0
      [ 1.980820] kernel: kthread+0x12a/0x150
      [ 1.980823] kernel: ? set_kthread_struct+0x50/0x50
      [ 1.980826] kernel: ret_from_fork+0x22/0x30
      [ 1.980831] kernel: </TASK>
      
      This happens because sata_pmp_init_links() initializes link->pmp up to
      SATA_PMP_MAX_PORTS, while em_priv is declared as an 8-element array.
      
      I can't find the maximum Enclosure Management ports specified in AHCI
      spec v1.3.1, but "12.2.1 LED message type" states that "Port Multiplier
      Information" can utilize 4 bits, which implies it can support up to 16
      ports. Hence, use SATA_PMP_MAX_PORTS as EM_MAX_SLOTS to resolve the
      issue.
      
      BugLink: https://bugs.launchpad.net/bugs/1970074
      Cc: stable@vger.kernel.org
      Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
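      The sizing argument reduces to simple arithmetic: a 4-bit "Port
      Multiplier Information" field can encode 2^4 = 16 port numbers, so an
      array with one entry per encodable value cannot be overrun by any
      index read from that field, including the index 15 from the UBSAN
      report. The constants below are hypothetical illustrations of that
      reasoning, not the values of EM_MAX_SLOTS or SATA_PMP_MAX_PORTS.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: width of the PMP field and one slot per encodable value. */
#define EXAMPLE_PMP_FIELD_BITS 4
#define EXAMPLE_EM_SLOTS (1u << EXAMPLE_PMP_FIELD_BITS)

struct example_em_priv { unsigned int led_state; };

/* Index 15 fits here, whereas an 8-entry array would be overrun. */
static struct example_em_priv example_em[EXAMPLE_EM_SLOTS];

/* Bounds check for a port number before indexing the array. */
static bool example_em_index_ok(unsigned int pmp)
{
	return pmp < EXAMPLE_EM_SLOTS;
}
```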
    • ata: ahci-imx: Fix MODULE_ALIAS · 979556f1
      Committed by Alexander Stein
      'ahci:' is an invalid prefix, preventing the module from autoloading.
      Fix this by using the 'platform:' prefix and DRV_NAME.
      
      Fixes: 9e54eae2 ("ahci_imx: add ahci sata support on imx platforms")
      Cc: stable@vger.kernel.org
      Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
      Reviewed-by: Fabio Estevam <festevam@gmail.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
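      Platform devices announce a modalias of the form "platform:<name>",
      so an alias built as "platform:" DRV_NAME can match it, while an
      "ahci:" prefix never will. The sketch below only shows how the C
      string-literal concatenation forms that alias; the value "ahci-imx"
      for DRV_NAME is an assumption to check against the driver source.

```c
#include <assert.h>
#include <string.h>

/* Assumption: the driver's DRV_NAME is "ahci-imx". */
#define EXAMPLE_DRV_NAME "ahci-imx"

/* Adjacent string literals are joined at compile time, which is how an
 * alias like "platform:" DRV_NAME becomes a single string. */
static const char example_alias[] = "platform:" EXAMPLE_DRV_NAME;
```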
  8. 15 Oct, 2022: 6 commits
    • clk: tegra: Fix Tegra PWM parent clock · c461c677
      Committed by Jon Hunter
      Commit 8c193f47 ("pwm: tegra: Optimize period calculation") updated
      the period calculation in the Tegra PWM driver, which now returns an
      error if the requested period is less than the minimum supported
      period. This breaks PWM support on various Tegra platforms. For
      example, on the Tegra210 Jetson Nano platform it breaks PWM fan
      support, and probing the PWM fan driver now fails:
      
       pwm-fan pwm-fan: Failed to configure PWM: -22
       pwm-fan: probe of pwm-fan failed with error -22
      
      The problem is that the default parent clock for the PWM on Tegra210 is
      a 32kHz clock and is unable to support the requested PWM period.
      
      Fix PWM support on Tegra20, Tegra30, Tegra114, Tegra124 and Tegra210 by
      updating the parent clock for the PWM to be the PLL_P.
      
      Fixes: 8c193f47 ("pwm: tegra: Optimize period calculation")
      Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
      Tested-by: Robert Eckelmann <longnoserob@gmail.com> # TF101 T20
      Tested-by: Antoni Aloy Torrens <aaloytorrens@gmail.com> # TF101 T20
      Tested-by: Svyatoslav Ryhel <clamor95@gmail.com> # TF201 T30
      Tested-by: Andreas Westman Dorcsak <hedmoo@yahoo.com> # TF700T T3
      Link: https://lore.kernel.org/r/20221010100046.6477-1-jonathanh@nvidia.com
      Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Signed-off-by: Stephen Boyd <sboyd@kernel.org>
    • clk: qcom: gcc-msm8660: Drop hardcoded fixed board clocks · 8c7bc6ca
      Committed by Linus Walleij
      These two clocks are now registered in the device tree as fixed clocks,
      causing a regression in the driver as the clock already exists with
      e.g. the name "pxo_board" as the MSM8660 GCC driver probes.
      
      Fix this by just not hard-coding this anymore and everything works
      like a charm.
      
      Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
      Fixes: baecbda5 ("ARM: dts: qcom: msm8660: fix node names for fixed clocks")
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Link: https://lore.kernel.org/r/20221013140745.7801-1-linus.walleij@linaro.org
      Signed-off-by: Stephen Boyd <sboyd@kernel.org>
    • clk: mediatek: clk-mux: Add .determine_rate() callback · b05ea331
      Committed by AngeloGioacchino Del Regno
      Since commit 262ca38f ("clk: Stop forwarding clk_rate_requests
      to the parent"), the clk_rate_request is, as the title says, no longer
      forwarded to the parent. This causes an issue with the MediaTek clock
      MUX driver during GPU DVFS on MT8195, but not on MT8192 or others.

      This is because, unlike on other SoCs such as MT8192, where all of
      the clocks in the MFG parent tree are of mtk_mux type, the parent tree
      of MT8195's MFG clock contains one mtk_mux clock and one generic (clk
      framework) mux clock, like so:

      names: mfg_bg3d -> mfg_ck_fast_ref -> top_mfg_core_tmp (or) mfgpll
      types: mtk_gate ->      mux        ->     mtk_mux      (or) mtk_pll

      To solve this issue and keep the GPU DVFS clocks code working as
      expected, wire up a .determine_rate() callback for the mtk_mux ops,
      reusing the standard clk_mux_determine_rate_flags() helper.
      
      This commit was successfully tested on MT6795 Xperia M5, MT8173 Elm,
      MT8192 Spherion and MT8195 Tomato; no regressions were seen.
      
      For the sake of some more documentation about this issue here's the
      trace of it:
      
      [   12.211587] ------------[ cut here ]------------
      [   12.211589] WARNING: CPU: 6 PID: 78 at drivers/clk/clk.c:1462 clk_core_init_rate_req+0x84/0x90
      [   12.211593] Modules linked in: stp crct10dif_ce mtk_adsp_common llc rfkill snd_sof_xtensa_dsp
                     panfrost(+) sbs_battery cros_ec_lid_angle cros_ec_sensors snd_sof_of
                     cros_ec_sensors_core hid_multitouch cros_usbpd_logger snd_sof gpu_sched
                     snd_sof_utils fuse ipv6
      [   12.211614] CPU: 6 PID: 78 Comm: kworker/u16:2 Tainted: G        W          6.0.0-next-20221011+ #58
      [   12.211616] Hardware name: Acer Tomato (rev2) board (DT)
      [   12.211617] Workqueue: devfreq_wq devfreq_monitor
      [   12.211620] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      [   12.211622] pc : clk_core_init_rate_req+0x84/0x90
      [   12.211625] lr : clk_core_forward_rate_req+0xa4/0xe4
      [   12.211627] sp : ffff80000893b8e0
      [   12.211628] x29: ffff80000893b8e0 x28: ffffdddf92f9b000 x27: ffff46a2c0e8bc05
      [   12.211632] x26: ffff46a2c1041200 x25: 0000000000000000 x24: 00000000173eed80
      [   12.211636] x23: ffff80000893b9c0 x22: ffff80000893b940 x21: 0000000000000000
      [   12.211641] x20: ffff46a2c1039f00 x19: ffff46a2c1039f00 x18: 0000000000000000
      [   12.211645] x17: 0000000000000038 x16: 000000000000d904 x15: 0000000000000003
      [   12.211649] x14: ffffdddf9357ce48 x13: ffffdddf935e71c8 x12: 000000000004803c
      [   12.211653] x11: 00000000a867d7ad x10: 00000000a867d7ad x9 : ffffdddf90c28df4
      [   12.211657] x8 : ffffdddf9357a980 x7 : 0000000000000000 x6 : 0000000000000004
      [   12.211661] x5 : ffffffffffffffc8 x4 : 00000000173eed80 x3 : ffff80000893b940
      [   12.211665] x2 : 00000000173eed80 x1 : ffff80000893b940 x0 : 0000000000000000
      [   12.211669] Call trace:
      [   12.211670]  clk_core_init_rate_req+0x84/0x90
      [   12.211673]  clk_core_round_rate_nolock+0xe8/0x10c
      [   12.211675]  clk_mux_determine_rate_flags+0x174/0x1f0
      [   12.211677]  clk_mux_determine_rate+0x1c/0x30
      [   12.211680]  clk_core_determine_round_nolock+0x74/0x130
      [   12.211682]  clk_core_round_rate_nolock+0x58/0x10c
      [   12.211684]  clk_core_round_rate_nolock+0xf4/0x10c
      [   12.211686]  clk_core_set_rate_nolock+0x194/0x2ac
      [   12.211688]  clk_set_rate+0x40/0x94
      [   12.211691]  _opp_config_clk_single+0x38/0xa0
      [   12.211693]  _set_opp+0x1b0/0x500
      [   12.211695]  dev_pm_opp_set_rate+0x120/0x290
      [   12.211697]  panfrost_devfreq_target+0x3c/0x50 [panfrost]
      [   12.211705]  devfreq_set_target+0x8c/0x2d0
      [   12.211707]  devfreq_update_target+0xcc/0xf4
      [   12.211708]  devfreq_monitor+0x40/0x1d0
      [   12.211710]  process_one_work+0x294/0x664
      [   12.211712]  worker_thread+0x7c/0x45c
      [   12.211713]  kthread+0x104/0x110
      [   12.211716]  ret_from_fork+0x10/0x20
      [   12.211718] irq event stamp: 7102
      [   12.211719] hardirqs last  enabled at (7101): [<ffffdddf904ea5a0>] finish_task_switch.isra.0+0xec/0x2f0
      [   12.211723] hardirqs last disabled at (7102): [<ffffdddf91794b74>] el1_dbg+0x24/0x90
      [   12.211726] softirqs last  enabled at (6716): [<ffffdddf90410be4>] __do_softirq+0x414/0x588
      [   12.211728] softirqs last disabled at (6507): [<ffffdddf904171d8>] ____do_softirq+0x18/0x24
      [   12.211730] ---[ end trace 0000000000000000 ]---
      
      Fixes: 262ca38f ("clk: Stop forwarding clk_rate_requests to the parent")
      Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
      Link: https://lore.kernel.org/r/20221011135548.318323-1-angelogioacchino.delregno@collabora.com
      Signed-off-by: Stephen Boyd <sboyd@kernel.org>
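      What a mux .determine_rate() callback decides can be modelled as
      "pick the parent whose rate best matches the request". The sketch
      below is a simplified userspace model that, as far as we understand
      it, follows the selection rule of the generic mux helper: by default
      the highest parent rate not above the request, or the closest rate
      when a round-closest flag is set. All names are hypothetical, and the
      real helper's propagation of requests to parents is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag mirroring the idea of "round to closest". */
#define EXAMPLE_ROUND_CLOSEST 0x1u

static unsigned long example_absdiff(unsigned long a, unsigned long b)
{
	return a > b ? a - b : b - a;
}

/* Is parent rate 'now' a better match for 'rate' than 'best'? */
static bool example_is_better(unsigned long rate, unsigned long now,
			      unsigned long best, unsigned int flags)
{
	if (flags & EXAMPLE_ROUND_CLOSEST)
		return example_absdiff(now, rate) < example_absdiff(best, rate);
	return now <= rate && now > best;
}

/* Return the index of the best parent, or -1 if none qualifies. */
static int example_mux_determine(const unsigned long *parents, size_t n,
				 unsigned long rate, unsigned int flags)
{
	unsigned long best = 0;
	int best_idx = -1;

	for (size_t i = 0; i < n; i++) {
		if (example_is_better(rate, parents[i], best, flags)) {
			best = parents[i];
			best_idx = (int)i;
		}
	}
	return best_idx;
}
```

      For parents of 26, 400 and 800 MHz and a 700 MHz request, the default
      rule picks 400 MHz while round-closest picks 800 MHz.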
    • Revert "PCI: Distribute available resources for root buses, too" · 5632e2be
      Committed by Bjorn Helgaas
      This reverts commit e96e27fc.
      
      Jonathan reported that this commit broke this topology, where all the space
      available on bus 02 was assigned to the 02:00.0 bridge window, leaving none
      for the e1000 device at 02:00.1:
      
        pci 0000:00:04.0: bridge window [mem 0x10200000-0x103fffff] to [bus 02-04]
        pci 0000:02:00.0: bridge window [mem 0x10200000-0x103fffff] to [bus 03-04]
        pci 0000:02:00.1: BAR 0: failed to assign [mem size 0x00020000]
        e1000 0000:02:00.1: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]
      
      Link: https://lore.kernel.org/r/20221014124553.0000696f@huawei.com
      Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    • drm/amd/display: Fix build breakage with CONFIG_DEBUG_FS=n · 2130b87b
      Committed by Nathan Chancellor
      After commit 8799c0be ("drm/amd/display: Fix vblank refcount in vrr
      transition"), a build with CONFIG_DEBUG_FS=n is broken due to a
      misplaced brace, along the lines of:
      
        In file included from drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_trace.h:39,
                         from drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:41:
        drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c: At top level:
        ./include/drm/drm_atomic.h:864:9: error: expected identifier or ‘(’ before ‘for’
          864 |         for ((__i) = 0;                                                 \
              |         ^~~
        drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:8317:9: note: in expansion of macro ‘for_each_new_crtc_in_state’
         8317 |         for_each_new_crtc_in_state(state, crtc, new_crtc_state, j)
              |         ^~~~~~~~~~~~~~~~~~~~~~~~~~
      
      Move the brace within the #ifdef so that the file can be built with or
      without CONFIG_DEBUG_FS.
      
      Fixes: 8799c0be ("drm/amd/display: Fix vblank refcount in vrr transition")
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
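      The rule behind this fix is that a brace opened inside an #ifdef must
      also be closed inside it, so both preprocessor branches stay balanced.
      The function below is a hypothetical illustration, not amdgpu_dm.c
      code; EXAMPLE_DEBUG_FS stands in for CONFIG_DEBUG_FS. The comment
      shows the broken shape: with the option off, the stray closing brace
      ends the function early and the following loop lands at file scope,
      which matches the "expected identifier or '(' before 'for'" error.

```c
#include <assert.h>

/* Broken shape (do not copy):
 *   #ifdef EXAMPLE_DEBUG_FS
 *       if (cond) {
 *           ...
 *   #endif
 *       }    <- without the option, this closes the function instead
 */
static int example_commit_tail(int crtcs)
{
	int work = 0;

#ifdef EXAMPLE_DEBUG_FS
	if (crtcs > 0) {
		work += 100; /* debugfs-only bookkeeping */
	}            /* closing brace kept inside the #ifdef */
#endif

	for (int j = 0; j < crtcs; j++)
		work++;

	return work;
}
```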
    • parisc: Fix spelling mistake "mis-match" -> "mismatch" in eisa driver · 34314cd6
      Committed by Colin Ian King
      There are several spelling mistakes in kernel error messages. Fix them.

      Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
      Signed-off-by: Helge Deller <deller@gmx.de>
  9. 14 Oct, 2022: 11 commits
  10. 13 Oct, 2022: 6 commits
    • virtio_pci: use irq to detect interrupt support · 2145ab51
      Committed by Michael S. Tsirkin
      commit 71491c54 ("virtio_pci: don't try to use intxif pin is zero")
      breaks virtio_pci on powerpc, when running as a qemu guest.
      
      vp_find_vqs() bails out because pci_dev->pin == 0.
      
      But pci_dev->irq is populated correctly, so vp_find_vqs_intx() would
      succeed if we called it - which is what the code used to do.
      
      This seems to happen because pci_dev->pin is not populated in
      pci_assign_irq(). A PCI core bug? Maybe.
      
      However, Linus said:
      	I really think that that is basically the only time you should use
      	that 'pci_dev->pin' thing: it basically exists not for "does this
      	device have an IRQ", but for "what is the routing of this irq on this
      	device".
      
      and
      	The correct way to check for "no irq" doesn't use NO_IRQ at all, it just does
      		if (dev->irq) ...
      
      so let's just check irq and be done with it.

      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>
      Fixes: 71491c54 ("virtio_pci: don't try to use intxif pin is zero")
      Cc: "Angus Chen" <angus.chen@jaguarmicro.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Tested-by: Michael Ellerman <mpe@ellerman.id.au>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Message-Id: <20221012220312.308522-1-mst@redhat.com>
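      The difference between the two checks can be shown with a minimal
      stand-in for the two struct pci_dev fields discussed above. This is a
      userspace sketch: the struct and helpers are hypothetical, and the
      IRQ number 23 is an arbitrary example. It models the reported powerpc
      guest, where pin reads as 0 but an IRQ has been assigned.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-in for the relevant struct pci_dev fields. */
struct example_pci_dev {
	unsigned char pin;  /* interrupt pin; describes routing, may be 0 */
	unsigned int irq;   /* assigned IRQ number; 0 means none */
};

/* Old gate: wrongly treats pin == 0 as "no interrupt". */
static bool example_can_use_intx_old(const struct example_pci_dev *dev)
{
	return dev->pin != 0;
}

/* New gate: the "if (dev->irq)" check recommended above. */
static bool example_can_use_intx_new(const struct example_pci_dev *dev)
{
	return dev->irq != 0;
}
```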
    • zram: always expose rw_page · 94541bc3
      Committed by Brian Geffon
      Currently zram will adjust its fops to a version which does not contain
      rw_page when a backing device has been assigned.  This is done to prevent
      upper layers from assuming a synchronous operation when a page may have
      been written back.  This forces every operation through bio which has
      overhead associated with bio_alloc/frees.
      
      The code can be simplified to always expose an rw_page method; only in
      the rare event that a page has been written back does it instead return
      -EOPNOTSUPP, forcing the upper layer to fall back to bio.
      
      Link: https://lkml.kernel.org/r/20221003144832.2906610-1-bgeffon@google.com
      Signed-off-by: Brian Geffon <bgeffon@google.com>
      Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Rom Lemarchand <romlem@google.com>
      Cc: Suleiman Souhlal <suleiman@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
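      The "try the fast path, fall back on -EOPNOTSUPP" contract can be
      sketched in userspace. This is a hypothetical model, not zram code:
      each slot records whether its page was written back, and the counter
      stands in for building and submitting a bio on the slower path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: per-slot written-back state plus a bio counter. */
struct example_zram {
	bool written_back[4];
	unsigned int bio_reads;
};

/* Synchronous fast path in the spirit of rw_page: decline written-back
 * pages instead of hiding rw_page for the whole device. */
static int example_rw_page(struct example_zram *z, unsigned int index)
{
	if (z->written_back[index])
		return -EOPNOTSUPP;
	return 0; /* served synchronously from compressed memory */
}

/* Upper layer: try rw_page first, fall back to the bio path. */
static int example_read(struct example_zram *z, unsigned int index)
{
	int ret = example_rw_page(z, index);

	if (ret == -EOPNOTSUPP) {
		z->bio_reads++; /* stands in for submitting a bio */
		ret = 0;
	}
	return ret;
}
```

      Only written-back pages pay the bio cost; every other page keeps the
      cheaper synchronous path.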
    • nouveau/dmem: evict device private memory during release · 24988123
      Committed by Alistair Popple
      When the module is unloaded or a GPU is unbound from the module, it is
      possible for device private pages to still be mapped in currently running
      processes.  This can lead to hangs and RCU stall warnings when unbinding
      the device, as memunmap_pages() will wait in an uninterruptible state
      until all device pages have been freed, which may never happen.
      
      Fix this by migrating device mappings back to normal CPU memory prior to
      freeing the GPU memory chunks and associated device private pages.
      
      Link: https://lkml.kernel.org/r/66277601fb8fda9af408b33da9887192bf895bda.1664366292.git-series.apopple@nvidia.com
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Cc: Lyude Paul <lyude@redhat.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Alex Sierra <alex.sierra@amd.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • nouveau/dmem: refactor nouveau_dmem_fault_copy_one() · d9b71939
      Committed by Alistair Popple
      nouveau_dmem_fault_copy_one() is used during handling of CPU faults via
      the migrate_to_ram() callback and is used to copy data from GPU to CPU
      memory.  It is currently specific to fault handling, however a future
      patch implementing eviction of data during teardown needs similar
      functionality.
      
      Refactor out the core functionality so that it is not specific to fault
      handling.
      
      Link: https://lkml.kernel.org/r/20573d7b4e641a78fde9935f948e64e71c9e709e.1664366292.git-series.apopple@nvidia.com
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Reviewed-by: Lyude Paul <lyude@redhat.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Alex Sierra <alex.sierra@amd.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • A
      mm: free device private pages have zero refcount · ef233450
      Alistair Popple committed
      Since 27674ef6 ("mm: remove the extra ZONE_DEVICE struct page
      refcount"), device private pages no longer have an extra reference
      count when the page is in use.  However, before handing them back to the
      owning device driver, we add an extra reference count such that free pages
      have a reference count of one.
      
      This makes it difficult to tell whether a page is free because both free
      and in-use pages have a non-zero refcount.  Instead, we should return
      pages to the driver's page allocator with a zero reference count.  Kernel
      code can then safely use functions such as get_page_unless_zero().
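      Why a zero refcount matters can be shown with a userspace analogue of get_page_unless_zero() built on C11 atomics. This is a sketch, not the kernel implementation: struct model_page_ref, model_get_page_unless_zero() and model_put_page() are hypothetical. A reference is taken only if the count is non-zero, so a free page (count 0) can never be resurrected.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of a page reference count. */
struct model_page_ref {
    atomic_int refcount;
};

/* Increment the count only if it is currently non-zero. */
static bool model_get_page_unless_zero(struct model_page_ref *page)
{
    int old = atomic_load(&page->refcount);
    while (old != 0) {
        /* On failure, old is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&page->refcount, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; returns true when the last one is released, i.e. the
 * page would go back to the driver's allocator with a refcount of zero. */
static bool model_put_page(struct model_page_ref *page)
{
    return atomic_fetch_sub(&page->refcount, 1) == 1;
}
```

      With the old convention (free pages at refcount one), the same check would happily take a reference on a free page, which is exactly the ambiguity the commit removes.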
      
      Link: https://lkml.kernel.org/r/cf70cf6f8c0bdb8aaebdbfb0d790aea4c683c3c6.1664366292.git-series.apopple@nvidia.com
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Lyude Paul <lyude@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Alex Sierra <alex.sierra@amd.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • A
      mm/memory.c: fix race when faulting a device private page · 16ce101d
      Alistair Popple committed
      Patch series "Fix several device private page reference counting issues",
      v2
      
      This series aims to fix a number of page reference counting issues in
      drivers dealing with device private ZONE_DEVICE pages.  These result in
      use-after-free type bugs, either from accessing a struct page which no
      longer exists because it has been removed or accessing fields within the
      struct page which are no longer valid because the page has been freed.
      
      During normal usage it is unlikely these will cause any problems.  However,
      without these fixes it is possible to crash the kernel from userspace.
      These crashes can be triggered either by unloading the kernel module or
      unbinding the device from the driver prior to a userspace task exiting. 
      In modules such as Nouveau it is also possible to trigger some of these
      issues by explicitly closing the device file-descriptor prior to the task
      exiting and then accessing device private memory.
      
      This involves some minor changes to both PowerPC and AMD GPU code.
      Unfortunately I lack hardware to test either of those, so any help there
      would be appreciated.  The changes mimic what is done for both Nouveau
      and hmm-tests, though, so I doubt they will cause problems.
      
      
      This patch (of 8):
      
      When the CPU tries to access a device private page, the migrate_to_ram()
      callback associated with the pgmap for the page is called.  However, no
      reference is taken on the faulting page.  Therefore a concurrent migration
      of the device private page can free the page and possibly the underlying
      pgmap.  This results in a race which can crash the kernel due to the
      migrate_to_ram() function pointer becoming invalid.  It also means drivers
      can't reliably read the zone_device_data field because the page may have
      been freed with memunmap_pages().
      
      Close the race by getting a reference on the page while holding the ptl to
      ensure it has not been freed.  Unfortunately the elevated reference count
      will cause the migration required to handle the fault to fail.  To avoid
      this failure, pass the faulting page into the migrate_vma functions so that
      if an elevated reference count is found, it can be checked to see whether
      it is expected.
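      The "is this elevated reference count expected?" check can be sketched as a simplified model. The struct and model_migrate_check_page() are hypothetical: one extra reference is assumed to be held by the migration code itself, and if the page is the one the CPU faulted on, the fault handler's reference is tolerated too; any remaining references beyond the page's mappings block migration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a page as seen by migration. */
struct model_mpage {
    int refcount; /* total references */
    int mapcount; /* references that come from page-table mappings */
};

/* Returns true if the page may be migrated. fault_page may be NULL when
 * migration is not driven by a CPU fault. */
static bool model_migrate_check_page(const struct model_mpage *page,
                                     const struct model_mpage *fault_page)
{
    /* One reference held by the migration code, plus one held by the
     * fault handler when this is the faulting page. */
    int extra = 1 + (page == fault_page);
    return page->refcount - extra <= page->mapcount;
}
```

      In the model, a page with one mapping, the migration reference and the fault handler's reference (refcount 3) migrates only when it is passed in as the faulting page; any other page with the same count is treated as having an unexpected extra user.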
      
      [mpe@ellerman.id.au: fix build]
        Link: https://lkml.kernel.org/r/87fsgbf3gh.fsf@mpe.ellerman.id.au
      Link: https://lkml.kernel.org/r/cover.60659b549d8509ddecafad4f498ee7f03bb23c69.1664366292.git-series.apopple@nvidia.com
      Link: https://lkml.kernel.org/r/d3e813178a59e565e8d78d9b9a4e2562f6494f90.1664366292.git-series.apopple@nvidia.com
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Lyude Paul <lyude@redhat.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Alex Sierra <alex.sierra@amd.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>