1. 31 May, 2018 1 commit
    • platform/x86: asus-wmi: Fix NULL pointer dereference · 32ffd6e8
      Committed by João Paulo Rechi Vita
      Do not perform the rfkill cleanup routine when
      (asus->driver->wlan_ctrl_by_user && ashs_present()) is true, since
      nothing is registered with the rfkill subsystem in that case. Doing so
      leads to the following kernel NULL pointer dereference:
      
        BUG: unable to handle kernel NULL pointer dereference at           (null)
        IP: [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
        PGD 1a3aa8067
        PUD 1a3b3d067
        PMD 0
      
        Oops: 0002 [#1] PREEMPT SMP
        Modules linked in: bnep ccm binfmt_misc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core hid_a4tech videodev x86_pkg_temp_thermal intel_powerclamp coretemp ath3k btusb btrtl btintel bluetooth kvm_intel snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass crc32c_intel arc4 i915 snd_hda_intel snd_hda_codec ath9k ath9k_common ath9k_hw ath i2c_algo_bit snd_hwdep mac80211 ghash_clmulni_intel snd_hda_core snd_pcm snd_timer cfg80211 ehci_pci xhci_pci drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm xhci_hcd ehci_hcd asus_nb_wmi(-) asus_wmi sparse_keymap r8169 rfkill mxm_wmi serio_raw snd mii mei_me lpc_ich i2c_i801 video soundcore mei i2c_smbus wmi i2c_core mfd_core
        CPU: 3 PID: 3275 Comm: modprobe Not tainted 4.9.34-gentoo #34
        Hardware name: ASUSTeK COMPUTER INC. K56CM/K56CM, BIOS K56CM.206 08/21/2012
        task: ffff8801a639ba00 task.stack: ffffc900014cc000
        RIP: 0010:[<ffffffff816c7348>]  [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
        RSP: 0018:ffffc900014cfce0  EFLAGS: 00010282
        RAX: 0000000000000000 RBX: ffff8801a54315b0 RCX: 00000000c0000100
        RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8801a54315b4
        RBP: ffffc900014cfd30 R08: 0000000000000000 R09: 0000000000000002
        R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801a54315b4
        R13: ffff8801a639ba00 R14: 00000000ffffffff R15: ffff8801a54315b8
        FS:  00007faa254fb700(0000) GS:ffff8801aef80000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 00000001a3b1b000 CR4: 00000000001406e0
        Stack:
         ffff8801a54315b8 0000000000000000 ffffffff814733ae ffffc900014cfd28
         ffffffff8146a28c ffff8801a54315b0 0000000000000000 ffff8801a54315b0
         ffff8801a66f3820 0000000000000000 ffffc900014cfd48 ffffffff816c73e7
        Call Trace:
         [<ffffffff814733ae>] ? acpi_ut_release_mutex+0x5d/0x61
         [<ffffffff8146a28c>] ? acpi_ns_get_node+0x49/0x52
         [<ffffffff816c73e7>] mutex_lock+0x17/0x30
         [<ffffffffa00a3bb4>] asus_rfkill_hotplug+0x24/0x1a0 [asus_wmi]
         [<ffffffffa00a4421>] asus_wmi_rfkill_exit+0x61/0x150 [asus_wmi]
         [<ffffffffa00a49f1>] asus_wmi_remove+0x61/0xb0 [asus_wmi]
         [<ffffffff814a5128>] platform_drv_remove+0x28/0x40
         [<ffffffff814a2901>] __device_release_driver+0xa1/0x160
         [<ffffffff814a29e3>] device_release_driver+0x23/0x30
         [<ffffffff814a1ffd>] bus_remove_device+0xfd/0x170
         [<ffffffff8149e5a9>] device_del+0x139/0x270
         [<ffffffff814a5028>] platform_device_del+0x28/0x90
         [<ffffffff814a50a2>] platform_device_unregister+0x12/0x30
         [<ffffffffa00a4209>] asus_wmi_unregister_driver+0x19/0x30 [asus_wmi]
         [<ffffffffa00da0ea>] asus_nb_wmi_exit+0x10/0xf26 [asus_nb_wmi]
         [<ffffffff8110c692>] SyS_delete_module+0x192/0x270
         [<ffffffff810022b2>] ? exit_to_usermode_loop+0x92/0xa0
         [<ffffffff816ca560>] entry_SYSCALL_64_fastpath+0x13/0x94
        Code: e8 5e 30 00 00 8b 03 83 f8 01 0f 84 93 00 00 00 48 8b 43 10 4c 8d 7b 08 48 89 63 10 41 be ff ff ff ff 4c 89 3c 24 48 89 44 24 08 <48> 89 20 4c 89 6c 24 10 eb 1d 4c 89 e7 49 c7 45 08 02 00 00 00
        RIP  [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
         RSP <ffffc900014cfce0>
        CR2: 0000000000000000
        ---[ end trace 8d484233fa7cb512 ]---
        note: modprobe[3275] exited with preempt_count 2
      
      https://bugzilla.kernel.org/show_bug.cgi?id=196467
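      The guard described at the top of this message can be sketched in
      user-space C. This is only an illustration under stated assumptions:
      struct fake_driver and both functions are hypothetical stand-ins, not
      the real asus-wmi code. The point is that the exit path repeats the
      condition under which the init path skipped registration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver state; not the real struct asus_wmi. */
struct fake_driver {
    bool wlan_ctrl_by_user;  /* mirrors asus->driver->wlan_ctrl_by_user */
    bool ashs_present;       /* stands in for the result of ashs_present() */
    int *rfkill_lock;        /* stays NULL when nothing was registered */
    int cleanups_run;        /* counts how often cleanup really ran */
};

/* Registration is skipped in the user-controlled case, so the lock stays NULL. */
void fake_rfkill_init(struct fake_driver *d, int *lock)
{
    if (d->wlan_ctrl_by_user && d->ashs_present)
        return;
    d->rfkill_lock = lock;
}

/* Cleanup repeats the same condition, so it never touches the NULL lock. */
void fake_rfkill_exit(struct fake_driver *d)
{
    if (d->wlan_ctrl_by_user && d->ashs_present)
        return;
    assert(d->rfkill_lock != NULL);
    *d->rfkill_lock = 0;
    d->rfkill_lock = NULL;
    d->cleanups_run++;
}
```

      Without the early return in the exit path, the sketch would trip its
      own assertion, which is the user-space analogue of the oops above.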
      
      Reported-by: red.f0xyz@gmail.com
      Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com>
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
  2. 30 May, 2018 3 commits
  3. 29 May, 2018 1 commit
    • IB: Revert "remove redundant INFINIBAND kconfig dependencies" · 533d1dae
      Committed by Arnd Bergmann
      Several subsystems depend on INFINIBAND_ADDR_TRANS, which in turn depends
      on INFINIBAND. However, with CONFIG_INFINIBAND=m, this leads to a
      link error when another driver using it is built-in. The
      INFINIBAND_ADDR_TRANS dependency is insufficient here as this is
      a 'bool' symbol that does not force anything to be a module in turn.
      
      fs/cifs/smbdirect.o: In function `smbd_disconnect_rdma_work':
      smbdirect.c:(.text+0x1e4): undefined reference to `rdma_disconnect'
      net/9p/trans_rdma.o: In function `rdma_request':
      trans_rdma.c:(.text+0x7bc): undefined reference to `rdma_disconnect'
      net/9p/trans_rdma.o: In function `rdma_destroy_trans':
      trans_rdma.c:(.text+0x830): undefined reference to `ib_destroy_qp'
      trans_rdma.c:(.text+0x858): undefined reference to `ib_dealloc_pd'
      
      Fixes: 9533b292 ("IB: remove redundant INFINIBAND kconfig dependencies")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Greg Thelen <gthelen@google.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  4. 26 May, 2018 4 commits
    • crypto: inside-secure - do not use memset on MMIO · bf4407f0
      Committed by Antoine Tenart
      This patch fixes the Inside Secure driver which uses a memset() call to
      set an MMIO area from the cryptographic engine to 0. This is wrong as
      memset() isn't guaranteed to work on MMIO for many reasons. This led to
      kernel paging request panics in certain cases. Use memset_io() instead.
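      To illustrate the difference, here is a minimal user-space sketch of
      clearing a register window with explicit word-sized stores. The helper
      mmio_zero32() is hypothetical and is not the kernel's memset_io(); it
      only shows the idea that every store goes through a volatile pointer,
      so the compiler may not merge, widen or drop the accesses the way an
      optimized memset() is free to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the kernel's memset_io(): clears nwords
 * 32-bit words starting at base, one volatile store per word.
 */
void mmio_zero32(volatile uint32_t *base, size_t nwords)
{
    assert(base != NULL || nwords == 0);
    for (size_t i = 0; i < nwords; i++)
        base[i] = 0;
}
```

      In the sketch an ordinary array can stand in for the device window,
      for example uint32_t regs[4]; mmio_zero32(regs, 4);.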
      
      Fixes: 1b44c5a6 ("crypto: inside-secure - add SafeXcel EIP197 crypto engine driver")
      Reported-by: Ofer Heifetz <oferh@marvell.com>
      Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • mm/memory_hotplug: fix leftover use of struct page during hotplug · a2155861
      Committed by Jonathan Cameron
      The new NUMA node case was missed when the code was changed to avoid
      using the node info from struct page during hotplug.  In this path
      register_mem_sect_under_node (which lets the caller specify that this
      is hotplug, so the node must not be changed) is called via
      link_mem_sections, which unfortunately does not.
      
      Fix is to pass check_nid through link_mem_sections as well and disable
      it in the new numa node path.
      
      Note the bug only 'sometimes' manifests depending on what happens to be
      in the struct page structures - there are lots of them and it only needs
      to match one of them.
      
      The result of the bug is that (with a new memory only node) we never
      successfully call register_mem_sect_under_node so don't get the memory
      associated with the node in sysfs and meminfo for the node doesn't
      report it.
      
      It came up whilst testing some arm64 hotplug patches, but appears to be
      universal.  Whilst I'm triggering it by removing then reinserting memory
      to a node with no other elements (thus making the node disappear then
      appear again), it appears it would happen on hotplugging memory where
      there was none before and it doesn't seem to be related to the arm64
      patches.
      
      These patches call __add_pages (where most of the issue was fixed by
      Pavel's patch).  If there is a node at the time of the __add_pages call
      then all is well as it calls register_mem_sect_under_node from there
      with check_nid set to false.  Without a node that function returns
      having not done the sysfs related stuff as there is no node to use.
      This is expected but it is the resulting path that fails...
      
      Exact path to the problem is as follows:
      
       mm/memory_hotplug.c: add_memory_resource()
      
         The node is not online so we enter the 'if (new_node)' twice, on the
         second such block there is a call to link_mem_sections which calls
         into
      
        drivers/node.c: link_mem_sections() which calls
      
        drivers/node.c: register_mem_sect_under_node() which calls
           get_nid_for_pfn and keeps trying until the output of that matches
           the expected node (passed all the way down from
           add_memory_resource)
      
      It is effectively the same fix as the one referred to in the Fixes tag,
      just in the code path for a new node, where the comments point out that
      we have to rerun the link creation because it will have failed in
      register_new_memory (as there was no node at the time).  (Actually, that
      comment is now wrong: register_new_memory no longer exists, as it was
      renamed to hotplug_memory_register in Pavel's patch.)
      
      Link: http://lkml.kernel.org/r/20180504085311.1240-1-Jonathan.Cameron@huawei.com
      Fixes: fc44f7f9 ("mm/memory_hotplug: don't read nid from struct page during hotplug")
      Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ibmvnic: Fix partial success login retries · eb110410
      Committed by Thomas Falcon
      In its current state, the driver will handle backing device
      login in a loop for a certain number of retries while the
      device returns a partial success, indicating that the driver
      may need to try again using a smaller number of resources.
      
      The variable it checks to continue retrying may change
      over the course of operations, so the driver can reallocate
      resources but then exit the loop without sending the login attempt.
      Guard against this by introducing a boolean variable that
      retains the state indicating that the driver needs to
      reattempt login with the backing device firmware.
      
      Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes · 6e04b103
      Committed by Devesh Sharma
      Recent changes in Broadcom's Ethernet driver (the L2 driver) broke
      RoCE functionality in terms of MSI-X vector allocation and
      de-allocation.
      
      The L2 driver may initiate MSI-X vector reallocation in response to
      requests from the administrator. In such cases the L2 driver needs
      to free all the previously allocated MSI-X vectors and
      reallocate/initialize them.
      
      If the RoCE driver is loaded when such reshuffling is attempted, the
      kernel crashes, because the RoCE driver is still holding the MSI-X
      vectors while the L2 driver attempts to free the in-use vectors.
      
      Change the RoCE driver to fix the crashes described above.
      As part of the solution, the L2 driver tells the RoCE driver to
      release the MSI-X vectors whenever needed. When the RoCE driver
      gets the message, it syncs up with all the running tasklets and IRQ
      handlers and releases the vectors. The L2 driver then sends one more
      message to the RoCE driver to resume the MSI-X vectors. The L2 driver
      guarantees that the RoCE vectors do not change during reshuffling.
      
      Fixes: ec86f14e ("bnxt_en: Add ULP calls to stop and restart IRQs.")
      Fixes: 08654eb2 ("bnxt_en: Change IRQ assignment for RDMA driver.")
      Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  5. 25 May, 2018 7 commits
    • mlx4_core: allocate ICM memory in page size chunks · 1383cb81
      Committed by Qing Huang
      When a system is under memory pressure (high usage with fragmentation),
      the original 256KB ICM chunk allocations will likely push the kernel
      memory management into the slow path, doing memory compaction/migration
      in order to complete the high-order allocations.
      
      When that happens, user processes calling uverb APIs may get stuck
      for more than 120s easily even though there are a lot of free pages
      in smaller chunks available in the system.
      
      Syslog:
      ...
      Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
      oracle_205573_e:205573 blocked for more than 120 seconds.
      ...
      
      With 4KB ICM chunk size on x86_64 arch, the above issue is fixed.
      
      However in order to support smaller ICM chunk size, we need to fix
      another issue in large size kcalloc allocations.
      
      E.g.
      Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
      size, each ICM chunk can only hold 512 mtt entries (8 bytes for each mtt
      entry). So we need a 16MB allocation for a table->icm pointer array to
      hold 2M pointers which can easily cause kcalloc to fail.
      
      The solution is to use kvzalloc to replace kcalloc which will fall back
      to vmalloc automatically if kmalloc fails.
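      The arithmetic in the example above can be checked with a short
      user-space sketch. The constants come from this message; the function
      names are hypothetical, and the pointer size is fixed at 8 bytes on the
      assumption of a 64-bit system.

```c
#include <assert.h>
#include <stdint.h>

#define MTT_ENTRY_SIZE 8ULL     /* bytes per MTT entry */
#define ICM_CHUNK_SIZE 4096ULL  /* 4KB ICM chunk */
#define POINTER_SIZE   8ULL     /* assumed: 64-bit pointers */

/* MTT entries that fit in one ICM chunk: 4096 / 8 = 512. */
uint64_t mtt_per_chunk(void)
{
    return ICM_CHUNK_SIZE / MTT_ENTRY_SIZE;
}

/* Number of ICM chunks needed to hold 2^log_num_mtt entries. */
uint64_t icm_chunks(unsigned int log_num_mtt)
{
    assert(log_num_mtt < 64);
    return (1ULL << log_num_mtt) / mtt_per_chunk();
}

/* Bytes needed for the array holding one pointer per chunk. */
uint64_t icm_array_bytes(unsigned int log_num_mtt)
{
    return icm_chunks(log_num_mtt) * POINTER_SIZE;
}
```

      For log_num_mtt=30 this gives 2^30 / 512 = 2^21 (2M) chunks and a
      pointer array of 2^21 * 8 bytes = 16MB, matching the figures above.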
      Signed-off-by: Qing Huang <qing.huang@oracle.com>
      Acked-by: Daniel Jurgens <danielj@mellanox.com>
      Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • firmware: qcom: scm: Fix crash in qcom_scm_call_atomic1() · 5ec3444c
      Committed by Niklas Cassel
      qcom_scm_call_atomic1() can crash with a NULL pointer dereference at
      qcom_scm_call_atomic1+0x30/0x48.
      
      disassembly of qcom_scm_call_atomic1():
      ...
      0xc08d73b0 <+12>: ldr r3, [r12]
      ... (no instruction explicitly modifies r12)
      0xc08d73cc <+40>: smc 0
      ... (no instruction explicitly modifies r12)
      0xc08d73d4 <+48>: ldr r3, [r12] <- crashing instruction
      ...
      
      Since the first ldr succeeds, and since r12 isn't explicitly
      modified by any instruction between the first and the second ldr,
      it must have been modified by the smc call, which is OK,
      since r12 is caller-saved according to the AAPCS.
      
      Add r12 to the clobber list so that the compiler knows that the
      callee potentially overwrites the value in r12.
      Clobber descriptions may not in any way overlap with an input or
      output operand.
      
      Signed-off-by: Niklas Cassel <niklas.cassel@linaro.org>
      Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Reviewed-by: Stephen Boyd <sboyd@kernel.org>
      Signed-off-by: Andy Gross <andy.gross@linaro.org>
    • enic: set DMA mask to 47 bit · 322eaa06
      Committed by Govindarajulu Varadarajan
      In commit 624dbf55 ("driver/net: enic: Try DMA 64 first, then
      failover to DMA") DMA mask was changed from 40 bits to 64 bits.
      Hardware actually supports only 47 bits.
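      What a 47-bit mask means in practice can be shown with a small
      user-space sketch. DMA_BIT_MASK is redefined here with the same shape
      as the kernel macro so the example builds on its own; dma_addr_fits()
      is a hypothetical helper that is not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the kernel macro, redefined so the sketch is self-contained. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical check: is addr reachable by a device with a bits-wide mask? */
int dma_addr_fits(uint64_t addr, unsigned int bits)
{
    assert(bits <= 64);
    return (addr & ~DMA_BIT_MASK(bits)) == 0;
}
```

      With a 64-bit mask an address with bit 47 set is accepted; with the
      47-bit mask it is rejected, which is the case the hardware cannot
      handle.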
      
      Fixes: 624dbf55 ("driver/net: enic: Try DMA 64 first, then failover to DMA")
      Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ppp: remove the PPPIOCDETACH ioctl · af8d3c7c
      Committed by Eric Biggers
      The PPPIOCDETACH ioctl effectively tries to "close" the given ppp file
      before f_count has reached 0, which is fundamentally a bad idea.  It
      does check 'f_count < 2', which excludes concurrent operations on the
      file since they would only be possible with a shared fd table, in which
      case each fdget() would take a file reference.  However, it fails to
      account for the fact that even with 'f_count == 1' the file can still be
      linked into epoll instances.  As reported by syzbot, this can trivially
      be used to cause a use-after-free.
      
      Yet, the only known user of PPPIOCDETACH is pppd versions older than
      ppp-2.4.2, which was released almost 15 years ago (November 2003).
      Also, PPPIOCDETACH apparently stopped working reliably at around the
      same time, when the f_count check was added to the kernel, e.g. see
      https://lkml.org/lkml/2002/12/31/83.  Also, the current 'f_count < 2'
      check makes PPPIOCDETACH only work in single-threaded applications; it
      always fails if called from a multithreaded application.
      
      All pppd versions released in the last 15 years just close() the file
      descriptor instead.
      
      Therefore, instead of hacking around this bug by exporting epoll
      internals to modules, and probably missing other related bugs, just
      remove the PPPIOCDETACH ioctl and see if anyone actually notices.  Leave
      a stub in place that prints a one-time warning and returns EINVAL.
      
      Reported-by: syzbot+16363c99d4134717c05b@syzkaller.appspotmail.com
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Acked-by: Paul Mackerras <paulus@ozlabs.org>
      Reviewed-by: Guillaume Nault <g.nault@alphalink.fr>
      Tested-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vhost: synchronize IOTLB message with dev cleanup · 1b15ad68
      Committed by Jason Wang
      DaeRyong Jeong reports a race between vhost_dev_cleanup() and
      vhost_process_iotlb_msg():
      
      Thread interleaving:
      CPU0 (vhost_process_iotlb_msg)			CPU1 (vhost_dev_cleanup)
      (In the case of both VHOST_IOTLB_UPDATE and
      VHOST_IOTLB_INVALIDATE)
      
      =====						=====
      						vhost_umem_clean(dev->iotlb);
      if (!dev->iotlb) {
              ret = -EFAULT;
              break;
      }
      						dev->iotlb = NULL;
      
      The reason is that the two are not synchronized. Fix this by protecting
      vhost_process_iotlb_msg() with the dev mutex.
      
      Reported-by: DaeRyong Jeong <threeearcat@gmail.com>
      Fixes: 6b1e6cc7 ("vhost: new device IOTLB API")
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5: IPSec, Fix a race between concurrent sandbox QP commands · 1dcbc01f
      Committed by Yossi Kuperman
      Sandbox QP Commands are retired in the order they are sent. Outstanding
      commands are stored in a linked-list in the order they appear. Once a
      response is received and the callback gets called, we pull the first
      element off the pending list, assuming they correspond.
      
      Sending a message and adding it to the pending list is not done atomically,
      hence there is an opportunity for a race between concurrent requests.
      
      Bind both send and add under a critical section.
      
      Fixes: bebb23e6 ("net/mlx5: Accel, Add IPSec acceleration interface")
      Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
      Signed-off-by: Adi Nissim <adin@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: When RXFCS is set, add FCS data into checksum calculation · 902a5459
      Committed by Eran Ben Elisha
      When the RXFCS feature is enabled, the HW does not strip the FCS data;
      however, it is not included in the checksum calculated by the HW.
      
      Fix that by manually calculating the FCS checksum and adding it to the SKB
      checksum field.
      
      Add helper function to find the FCS data for all SKB forms (linear,
      one fragment or more).
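      The idea of extending an existing checksum by the FCS bytes can be
      sketched in user-space C with a plain 16-bit ones'-complement sum.
      This is not the driver's code: csum16_add() is a hypothetical helper,
      and the sketch assumes the FCS starts at an even byte offset, so no
      byte-swapping of the partial sum is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: continue a folded 16-bit ones'-complement sum
 * (seed) over len more bytes, read as big-endian 16-bit words.
 */
uint16_t csum16_add(uint16_t seed, const uint8_t *buf, size_t len)
{
    uint64_t sum = seed;
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;  /* pad the odd last byte */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);  /* fold the carries back in */
    return (uint16_t)sum;
}
```

      If hw is the sum over the frame without the FCS, then
      csum16_add(hw, fcs, 4) equals the sum over the frame including the
      FCS, which is the correction this patch applies to the SKB checksum.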
      
      Fixes: 102722fc ("net/mlx5e: Add support for RXFCS feature flag")
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
  6. 24 May, 2018 19 commits
  7. 23 May, 2018 5 commits
    • drm/vmwgfx: Schedule an fb dirty update after resume · 6a93cea1
      Committed by Thomas Hellstrom
      We have had problems displaying fbdev after a resume, and as a
      workaround we have had to call vmw_fb_refresh(). This has had
      a number of unwanted side effects. The root of the problem was,
      however, that the coalesced fbdev dirty region was not empty on
      the first dirty_mark() after a resume, so a flush was never
      scheduled.
      
      Fix this by force scheduling an fbdev flush after resume, and
      remove the workaround.
      
      Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
      Reviewed-by: Brian Paul <brianp@vmware.com>
      Reviewed-by: Deepak Rawat <drawat@vmware.com>
    • drm/vmwgfx: Fix host logging / guestinfo reading error paths · f37230c0
      Committed by Thomas Hellstrom
      The error paths were leaking opened channels.
      Fix by using dedicated error paths.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
      Reviewed-by: Brian Paul <brianp@vmware.com>
      Reviewed-by: Sinclair Yeh <syeh@vmware.com>
    • drm/vmwgfx: Fix 32-bit VMW_PORT_HB_[IN|OUT] macros · 938ae725
      Committed by Thomas Hellstrom
      Depending on whether the kernel is compiled with frame-pointer or not,
      the temporary memory location used for the bp parameter in these macros
      is referenced relative to the stack pointer or the frame pointer.
      Hence we can never reference that parameter when we've modified either
      the stack pointer or the frame pointer, because then the compiler would
      generate an incorrect stack reference.
      
      Fix this by pushing the temporary memory parameter on a known location on
      the stack before modifying the stack- and frame pointers.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
      Reviewed-by: Brian Paul <brianp@vmware.com>
      Reviewed-by: Sinclair Yeh <syeh@vmware.com>
    • s390/dasd: use blk_mq_rq_from_pdu for per request data · f0f59a2f
      Committed by Sebastian Ott
      Dasd uses completion_data from struct request to store per request
      private data - this is problematic since this member is part of a
      union which is also used by IO schedulers.
      Let the block layer maintain space for per request data behind each
      struct request.
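      The layout idea behind keeping per-request data behind each request
      can be sketched in user-space C. The structs and function names below
      are hypothetical, heavily reduced stand-ins, not the kernel's
      definitions; the sketch only shows that the private area sits directly
      after the request, so each can be found from the other with pointer
      arithmetic and no shared union member is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-ins; not the kernel's struct request. */
struct request {
    int tag;
    void *sched_data;  /* pointer member keeps the trailing area aligned */
};

struct drv_pdu {
    void *cqr;         /* driver-private per-request data */
};

/* Allocate a zeroed request with room for private data right behind it. */
struct request *alloc_request(size_t pdu_size)
{
    return calloc(1, sizeof(struct request) + pdu_size);
}

/* The private area starts immediately after the request. */
void *rq_to_pdu(struct request *rq)
{
    assert(rq != NULL);
    return rq + 1;
}

/* Stepping back over the request recovers it from the private area. */
struct request *rq_from_pdu(void *pdu)
{
    assert(pdu != NULL);
    return (struct request *)((char *)pdu - sizeof(struct request));
}
```

      Because the private area has its own storage, nothing an I/O scheduler
      writes into the request can overwrite it in this sketch.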
      
      Fixes crashes on block layer timeouts like this one:
      
      Unable to handle kernel pointer dereference in virtual kernel address space
      Failing address: 0000000000000000 TEID: 0000000000000483
      Fault in home space mode while using kernel ASCE.
      AS:0000000001308007 R3:00000000fffc8007 S:00000000fffcc000 P:000000000000013d
      Oops: 0004 ilc:2 [#1] PREEMPT SMP
      Modules linked in: [...]
      CPU: 0 PID: 1480 Comm: kworker/0:2H Not tainted 4.17.0-rc4-00046-gaa3bcd43b5af #203
      Hardware name: IBM 3906 M02 702 (LPAR)
      Workqueue: kblockd blk_mq_timeout_work
      Krnl PSW : 0000000067ac406b 00000000b6960308 (do_raw_spin_trylock+0x30/0x78)
                 R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
      Krnl GPRS: 0000000000000c00 0000000000000000 0000000000000000 0000000000000001
                 0000000000b9d3c8 0000000000000000 0000000000000001 00000000cf9639d8
                 0000000000000000 0700000000000000 0000000000000000 000000000099f09e
                 0000000000000000 000000000076e9d0 000000006247bb08 000000006247bae0
      Krnl Code: 00000000001c159c: b90400c2           lgr     %r12,%r2
                 00000000001c15a0: a7180000           lhi     %r1,0
                #00000000001c15a4: 583003a4           l       %r3,932
                >00000000001c15a8: ba132000           cs      %r1,%r3,0(%r2)
                 00000000001c15ac: a7180001           lhi     %r1,1
                 00000000001c15b0: a784000b           brc     8,1c15c6
                 00000000001c15b4: c0e5004e72aa       brasl   %r14,b8fb08
                 00000000001c15ba: 1812               lr      %r1,%r2
      Call Trace:
      ([<0700000000000000>] 0x700000000000000)
       [<0000000000b9d3d2>] _raw_spin_lock_irqsave+0x7a/0xb8
       [<000000000099f09e>] dasd_times_out+0x46/0x278
       [<000000000076ea6e>] blk_mq_terminate_expired+0x9e/0x108
       [<000000000077497a>] bt_for_each+0x102/0x130
       [<0000000000774e54>] blk_mq_queue_tag_busy_iter+0x74/0xd8
       [<000000000076fea0>] blk_mq_timeout_work+0x260/0x320
       [<0000000000169dd4>] process_one_work+0x3bc/0x708
       [<000000000016a382>] worker_thread+0x262/0x408
       [<00000000001723a8>] kthread+0x160/0x178
       [<0000000000b9e73a>] kernel_thread_starter+0x6/0xc
       [<0000000000b9e734>] kernel_thread_starter+0x0/0xc
      INFO: lockdep is turned off.
      Last Breaking-Event-Address:
       [<0000000000b9d3cc>] _raw_spin_lock_irqsave+0x74/0xb8
      
      Kernel panic - not syncing: Fatal exception: panic_on_oops
      
      Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
      Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • mfd: cros_ec: Retry commands when EC is known to be busy · 11799564
      Committed by Brian Norris
      Commit 001dde94 ("mfd: cros ec: spi: Fix "in progress" error
      signaling") pointed out some bad code, but its analysis and conclusion
      were not 100% correct.
      
      It *is* correct that we should not propagate result==EC_RES_IN_PROGRESS
      for transport errors, because this has a special meaning -- that we
      should follow up with EC_CMD_GET_COMMS_STATUS until the EC is no longer
      busy. This is definitely the wrong thing for many commands, because
      among other problems, EC_CMD_GET_COMMS_STATUS doesn't actually retrieve
      any RX data from the EC, so commands that expected some data back will
      instead start processing junk.
      
      For such commands, the right answer is to either propagate the error
      (and return that error to the caller) or resend the original command
      (*not* EC_CMD_GET_COMMS_STATUS).
      
      Unfortunately, commit 001dde94 forgets a crucial point: that for
      some long-running operations, the EC physically cannot respond to
      commands any more. For example, with EC_CMD_FLASH_ERASE, the EC may be
      re-flashing its own code regions, so it can't respond to SPI interrupts.
      Instead, the EC prepares us ahead of time for being busy for a "long"
      time, and fills its hardware buffer with EC_SPI_PAST_END. Thus, we
      expect to see several "transport" errors (or, messages filled with
      EC_SPI_PAST_END). So we should really translate that to a retryable
      error (-EAGAIN) and continue sending EC_CMD_GET_COMMS_STATUS until we
      get a ready status.
      
      IOW, it is actually important to treat some of these "junk" values as
      retryable errors.
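      The retry behaviour described above can be sketched in user-space C.
      Everything here is a hypothetical illustration, not the cros_ec code:
      the callback type, poll_until_ready() and busy_countdown() are made-up
      names, and the only convention borrowed from the text is that -EAGAIN
      means "busy, ask again" while any other value ends the loop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical status callback: 0 when ready, -EAGAIN while busy. */
typedef int (*status_fn)(void *ctx);

/* Poll the status until it stops reporting -EAGAIN or tries run out. */
int poll_until_ready(status_fn get_status, void *ctx, int max_tries)
{
    int ret = -EAGAIN;

    assert(get_status != NULL && max_tries > 0);
    for (int i = 0; i < max_tries && ret == -EAGAIN; i++)
        ret = get_status(ctx);
    return ret;
}

/* Example callback: reports busy until the counter in ctx reaches zero. */
int busy_countdown(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return -EAGAIN;
    }
    return 0;
}
```

      A caller that gets -EAGAIN back after max_tries knows the device was
      still busy, rather than being handed junk data to process.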
      
      Together with commit 001dde94, this resolves bugs like the
      following:
      
      1. EC_CMD_FLASH_ERASE now works again (with commit 001dde94, we
         would abort the first time we saw EC_SPI_PAST_END)
      2. Before commit 001dde94, transport errors (e.g.,
         EC_SPI_RX_BAD_DATA) seen in other commands (e.g.,
         EC_CMD_RTC_GET_VALUE) used to yield junk data in the RX buffer; they
         will now yield -EAGAIN return values, and tools like 'hwclock' will
         simply fail instead of retrieving and re-programming undefined time
         values
      
      Fixes: 001dde94 ("mfd: cros ec: spi: Fix "in progress" error signaling")
      Signed-off-by: Brian Norris <briannorris@chromium.org>
      Signed-off-by: Lee Jones <lee.jones@linaro.org>