1. 21 Dec, 2019 37 commits
    • L
      ARM: dts: s3c64xx: Fix init order of clock providers · 7e14f038
      Lihua Yao committed
      commit d60d0cff4ab01255b25375425745c3cff69558ad upstream.
      
      fin_pll is the parent of clock-controller@7e00f000, specify
      the dependency to ensure proper initialization order of clock
      providers.
      
      without this patch:
      [    0.000000] S3C6410 clocks: apll = 0, mpll = 0
      [    0.000000]  epll = 0, arm_clk = 0
      
      with this patch:
      [    0.000000] S3C6410 clocks: apll = 532000000, mpll = 532000000
      [    0.000000]  epll = 24000000, arm_clk = 532000000
      
      Cc: <stable@vger.kernel.org>
      Fixes: 3f6d439f ("clk: reverse default clk provider initialization order in of_clk_init()")
      Signed-off-by: Lihua Yao <ylhuajnu@outlook.com>
      Reviewed-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
      Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • P
      CIFS: Close open handle after interrupted close · e8b26877
      Pavel Shilovsky committed
      commit 9150c3adbf24d77cfba37f03639d4a908ca4ac25 upstream.
      
      If Close command is interrupted before sending a request
      to the server the client ends up leaking an open file
      handle. This wastes server resources and can potentially
      block applications that try to remove the file or any
      directory containing this file.
      
      Fix this by putting the close command into a worker queue,
      so another thread retries it later.
      
      Cc: Stable <stable@vger.kernel.org>
      Tested-by: Frank Sorenson <sorenson@redhat.com>
      Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • P
      CIFS: Respect O_SYNC and O_DIRECT flags during reconnect · 3ddc09c8
      Pavel Shilovsky committed
      commit 44805b0e62f15e90d233485420e1847133716bdc upstream.
      
      Currently the client translates O_SYNC and O_DIRECT flags
      into corresponding SMB create options when opening a file.
      The problem is that on reconnect when the file is being
      re-opened the client doesn't set those flags and it causes
      a server to reject re-open requests because create options
      don't match. The latter means that any subsequent system
      call against that open file fails until a share is re-mounted.
      
      Fix this by properly setting SMB create options when
      re-opening files after reconnects.
      
      Fixes: 1013e760: ("SMB3: Don't ignore O_SYNC/O_DSYNC and O_DIRECT flags")
      Cc: Stable <stable@vger.kernel.org>
      Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • L
      cifs: Don't display RDMA transport on reconnect · 5948e7ec
      Long Li committed
      commit 14cc639c17ab0b6671526a7459087352507609e4 upstream.
      
      On reconnect, the transport data structure is NULL and its information is not
      available.
      Signed-off-by: Long Li <longli@microsoft.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • L
      cifs: smbd: Return -EINVAL when the number of iovs exceeds SMBDIRECT_MAX_SGE · 33852a95
      Long Li committed
      commit 37941ea17d3f8eb2f5ac2f59346fab9e8439271a upstream.
      
      While it's not friendly to fail user processes that issue more iovs
      than we support, at least we should return the correct error code so the
      user process gets a chance to retry with smaller number of iovs.
      Signed-off-by: Long Li <longli@microsoft.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • L
      cifs: smbd: Add messages on RDMA session destroy and reconnection · 674b7b6c
      Long Li committed
      commit d63cdbae60ac6fbb2864bd3d8df7404f12b7407d upstream.
      
      Log these activities to help production support.
      Signed-off-by: Long Li <longli@microsoft.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • L
      cifs: smbd: Return -EAGAIN when transport is reconnecting · 5cceead7
      Long Li committed
      commit 4357d45f50e58672e1d17648d792f27df01dfccd upstream.
      
      During reconnecting, the transport may have already been destroyed and is in
      the process of being reconnected. In this case, return -EAGAIN to not fail and
      to retry this I/O.
      Signed-off-by: Long Li <longli@microsoft.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • B
      rpmsg: glink: Free pending deferred work on remove · 7438617d
      Bjorn Andersson committed
      commit 278bcb7300f61785dba63840bd2a8cf79f14554c upstream.
      
      By just cancelling the deferred rx worker during GLINK instance teardown
      any pending deferred commands are leaked, so free them.
      
      Fixes: b4f8e52b ("rpmsg: Introduce Qualcomm RPM glink driver")
      Cc: stable@vger.kernel.org
      Acked-by: Chris Lew <clew@codeaurora.org>
      Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • B
      rpmsg: glink: Don't send pending rx_done during remove · 6f482295
      Bjorn Andersson committed
      commit c3dadc19b7564c732598b30d637c6f275c3b77b6 upstream.
      
      Attempting to transmit rx_done messages after the GLINK instance is
      being torn down will cause use after free and memory leaks. So cancel
      the intent_work and free up the pending intents.
      
      With this there are no concurrent accessors of the channel left during
      qcom_glink_native_remove() and there is therefore no need to hold the
      spinlock during this operation - which would prohibit the use of
      cancel_work_sync() in the release function. So remove this.
      
      Fixes: 1d2ea36e ("rpmsg: glink: Add rx done command")
      Cc: stable@vger.kernel.org
      Acked-by: Chris Lew <clew@codeaurora.org>
      Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • C
      rpmsg: glink: Fix rpmsg_register_device err handling · a033a2a6
      Chris Lew committed
      commit f7e714988edaffe6ac578318e99501149b067ba0 upstream.
      
      The device release function is set before registering with rpmsg. If
      rpmsg registration fails, the framework will call device_put(), which
      invokes the release function. The channel create logic does not need to
      free rpdev if rpmsg_register_device() fails and release is called.
      
      Fixes: b4f8e52b ("rpmsg: Introduce Qualcomm RPM glink driver")
      Cc: stable@vger.kernel.org
      Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Signed-off-by: Chris Lew <clew@codeaurora.org>
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • C
      rpmsg: glink: Put an extra reference during cleanup · 478963b1
      Chris Lew committed
      commit b646293e272816dd0719529dcebbd659de0722f7 upstream.
      
      In a remote processor crash scenario, there is no guarantee the remote
      processor sent close requests before it went into a bad state. Remove
      the reference that is normally handled by the close command so
      channel resources can be released.
      
      Fixes: b4f8e52b ("rpmsg: Introduce Qualcomm RPM glink driver")
      Cc: stable@vger.kernel.org
      Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Signed-off-by: Chris Lew <clew@codeaurora.org>
      Reported-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • A
      rpmsg: glink: Fix use after free in open_ack TIMEOUT case · 8a5b99ad
      Arun Kumar Neelakantam committed
      commit ac74ea01860170699fb3b6ea80c0476774c8e94f upstream.
      
      Extra channel reference put when remote sending OPEN_ACK after timeout
      causes use-after-free while handling next remote CLOSE command.
      
      Remove extra reference put in timeout case to avoid use-after-free.
      
      Fixes: b4f8e52b ("rpmsg: Introduce Qualcomm RPM glink driver")
      Cc: stable@vger.kernel.org
      Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Signed-off-by: Arun Kumar Neelakantam <aneela@codeaurora.org>
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • A
      rpmsg: glink: Fix reuse intents memory leak issue · b909f12e
      Arun Kumar Neelakantam committed
      commit b85f6b601407347f5425c4c058d1b7871f5bf4f0 upstream.
      
      Memory allocated for re-usable intents is not freed during channel
      cleanup, which causes a memory leak in the system.
      
      Check and free all re-usable memory to avoid memory leak.
      
      Fixes: 933b45da ("rpmsg: glink: Add support for TX intents")
      Cc: stable@vger.kernel.org
      Acked-By: Chris Lew <clew@codeaurora.org>
      Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Signed-off-by: Arun Kumar Neelakantam <aneela@codeaurora.org>
      Reported-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • C
      rpmsg: glink: Set tail pointer to 0 at end of FIFO · 6c456036
      Chris Lew committed
      commit 4623e8bf1de0b86e23a56cdb39a72f054e89c3bd upstream.
      
      When wrapping around the FIFO, the remote expects the tail pointer to
      be reset to 0 on the edge case where the tail equals the FIFO length.
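
      A minimal userspace sketch of the wrap-around rule described above. The function name fifo_advance_tail and its parameters are hypothetical and not taken from the driver; the sketch assumes tail < length and count <= length on entry.

```c
#include <stdint.h>

/* Advance a FIFO tail index by count bytes, wrapping at length. */
static inline uint32_t fifo_advance_tail(uint32_t tail, uint32_t count,
					 uint32_t length)
{
	tail += count;
	/* ">=" rather than ">": a tail equal to length must become 0. */
	if (tail >= length)
		tail -= length;
	return tail;
}
```

      With a 16-byte FIFO, advancing a tail of 12 by 4 gives 0 rather than 16, which is the edge case the commit message refers to.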
      
      Fixes: caf989c3 ("rpmsg: glink: Introduce glink smem based transport")
      Cc: stable@vger.kernel.org
      Signed-off-by: Chris Lew <clew@codeaurora.org>
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • M
      xtensa: fix TLB sanity checker · 2f5d27dd
      Max Filippov committed
      commit 36de10c4788efc6efe6ff9aa10d38cb7eea4c818 upstream.
      
      Virtual and translated addresses retrieved by the xtensa TLB sanity
      checker must be consistent, i.e. correspond to the same state of the
      checked TLB entry. KASAN shadow memory is mapped dynamically using
      auto-refill TLB entries and thus may change TLB state between the
      virtual and translated address retrieval, resulting in false TLB
      insanity report.
      Move read_xtlb_translation close to read_xtlb_virtual to make sure that
      read values are consistent.
      
      Cc: stable@vger.kernel.org
      Fixes: a99e07ee ("xtensa: check TLB sanity on return to userspace")
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • G
      PCI: Apply Cavium ACS quirk to ThunderX2 and ThunderX3 · dc6f9b00
      George Cherian committed
      commit f338bb9f0179cb959977b74e8331b312264d720b upstream.
      
      Enhance the ACS quirk for Cavium Processors. Add the root port vendor IDs
      for ThunderX2 and ThunderX3 series of processors.
      
      [bhelgaas: add Fixes: and stable tag]
      Fixes: f2ddaf8d ("PCI: Apply Cavium ThunderX ACS quirk to more Root Ports")
      Link: https://lore.kernel.org/r/20191111024243.GA11408@dc5-eodlnx05.marvell.com
      Signed-off-by: George Cherian <george.cherian@marvell.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Robert Richter <rrichter@marvell.com>
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • J
      PCI/MSI: Fix incorrect MSI-X masking on resume · 8d588ce7
      Jian-Hong Pan committed
      commit e045fa29e89383c717e308609edd19d2fd29e1be upstream.
      
      When a driver enables MSI-X, msix_program_entries() reads the MSI-X Vector
      Control register for each vector and saves it in desc->masked.  Each
      register is 32 bits and bit 0 is the actual Mask bit.
      
      When we restored these registers during resume, we previously set the Mask
      bit if *any* bit in desc->masked was set instead of when the Mask bit
      itself was set:
      
        pci_restore_state
          pci_restore_msi_state
            __pci_restore_msix_state
              for_each_pci_msi_entry
                msix_mask_irq(entry, entry->masked)   <-- entire u32 word
                  __pci_msix_desc_mask_irq(desc, flag)
                    mask_bits = desc->masked & ~PCI_MSIX_ENTRY_CTRL_MASKBIT
                    if (flag)       <-- testing entire u32, not just bit 0
                      mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT
                    writel(mask_bits, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL)
      
      This means that after resume, MSI-X vectors were masked when they shouldn't
      be, which leads to timeouts like this:
      
        nvme nvme0: I/O 978 QID 3 timeout, completion polled
      
      On resume, set the Mask bit only when the saved Mask bit from suspend was
      set.
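
      The difference between the two tests can be sketched in plain userspace C. This is an illustration under assumptions, not the kernel code: restore_ctrl_old and restore_ctrl_fixed are hypothetical names, and each takes the saved Vector Control word (desc->masked in the trace above) and returns the value that would be written back.

```c
#include <stdint.h>

/* Bit 0 of the MSI-X Vector Control register is the Mask bit. */
#define PCI_MSIX_ENTRY_CTRL_MASKBIT 0x00000001u

/* Old behaviour: any non-zero bit in the saved word sets the Mask bit. */
static inline uint32_t restore_ctrl_old(uint32_t masked)
{
	uint32_t mask_bits = masked & ~PCI_MSIX_ENTRY_CTRL_MASKBIT;

	if (masked)	/* tests the entire u32 */
		mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
	return mask_bits;
}

/* Fixed behaviour: only the saved Mask bit decides. */
static inline uint32_t restore_ctrl_fixed(uint32_t masked)
{
	uint32_t mask_bits = masked & ~PCI_MSIX_ENTRY_CTRL_MASKBIT;

	if (masked & PCI_MSIX_ENTRY_CTRL_MASKBIT)	/* tests bit 0 only */
		mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
	return mask_bits;
}
```

      For a hypothetical saved word of 0x2 (Mask bit clear, another bit set), the old test writes back 0x3 and masks the vector, while the fixed test writes back 0x2 and leaves it unmasked.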
      
      This should remove the need for 19ea025e1d28 ("nvme: Add quirk for Kingston
      NVME SSD running FW E8FK11.T").
      
      [bhelgaas: commit log, move fix to __pci_msix_desc_mask_irq()]
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=204887
      Link: https://lore.kernel.org/r/20191008034238.2503-1-jian-hong@endlessm.com
      Fixes: f2440d9a ("PCI MSI: Refactor interrupt masking code")
      Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • S
      PCI: Fix Intel ACS quirk UPDCR register address · de8ecdd2
      Steffen Liebergeld committed
      commit d8558ac8c93d429d65d7490b512a3a67e559d0d4 upstream.
      
      According to documentation [0] the correct offset for the Upstream Peer
      Decode Configuration Register (UPDCR) is 0x1014.  It was previously defined
      as 0x1114.
      
      d99321b6 ("PCI: Enable quirks for PCIe ACS on Intel PCH root ports")
      intended to enforce isolation between PCI devices allowing them to be put
      into separate IOMMU groups.  Due to the wrong register offset the intended
      isolation was not fully enforced.  This is fixed with this patch.
      
      Please note that I did not test this patch because I have no hardware that
      implements this register.
      
      [0] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/4th-gen-core-family-mobile-i-o-datasheet.pdf (page 325)
      Fixes: d99321b6 ("PCI: Enable quirks for PCIe ACS on Intel PCH root ports")
      Link: https://lore.kernel.org/r/7a3505df-79ba-8a28-464c-88b83eefffa6@kernkonzept.com
      Signed-off-by: Steffen Liebergeld <steffen.liebergeld@kernkonzept.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Andrew Murray <andrew.murray@arm.com>
      Acked-by: Ashok Raj <ashok.raj@intel.com>
      Cc: stable@vger.kernel.org # v3.15+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • L
      PCI: pciehp: Avoid returning prematurely from sysfs requests · 248e65f3
      Lukas Wunner committed
      commit 157c1062fcd86ade3c674503705033051fd3d401 upstream.
      
      A sysfs request to enable or disable a PCIe hotplug slot should not
      return before it has been carried out.  That is sought to be achieved by
      waiting until the controller's "pending_events" have been cleared.
      
      However the IRQ thread pciehp_ist() clears the "pending_events" before
      it acts on them.  If pciehp_sysfs_enable_slot() / _disable_slot() happen
      to check the "pending_events" after they have been cleared but while
      pciehp_ist() is still running, the functions may return prematurely
      with an incorrect return value.
      
      Fix by introducing an "ist_running" flag which must be false before a sysfs
      request is allowed to return.
      
      Fixes: 32a8cef2 ("PCI: pciehp: Enable/disable exclusively from IRQ thread")
      Link: https://lore.kernel.org/linux-pci/1562226638-54134-1-git-send-email-wangxiongfeng2@huawei.com
      Link: https://lore.kernel.org/r/4174210466e27eb7e2243dd1d801d5f75baaffd8.1565345211.git.lukas@wunner.de
      Reported-and-tested-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org # v4.19+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • D
      PCI/PM: Always return devices to D0 when thawing · 253c77b5
      Dexuan Cui committed
      commit f2c33ccacb2d4bbeae2a255a7ca0cbfd03017b7c upstream.
      
      pci_pm_thaw_noirq() is supposed to return the device to D0 and restore its
      configuration registers, but previously it only did that for devices whose
      drivers implemented the new power management ops.
      
      Hibernation, e.g., via "echo disk > /sys/power/state", involves freezing
      devices, creating a hibernation image, thawing devices, writing the image,
      and powering off.  The fact that thawing did not return devices with legacy
      power management to D0 caused errors, e.g., in this path:
      
        pci_pm_thaw_noirq
          if (pci_has_legacy_pm_support(pci_dev)) # true for Mellanox VF driver
            return pci_legacy_resume_early(dev)   # ... legacy PM skips the rest
          pci_set_power_state(pci_dev, PCI_D0)
          pci_restore_state(pci_dev)
        pci_pm_thaw
          if (pci_has_legacy_pm_support(pci_dev))
            pci_legacy_resume
              drv->resume
                mlx4_resume
                  ...
                    pci_enable_msix_range
                      ...
                        if (dev->current_state != PCI_D0)  # <---
                          return -EINVAL;
      
      which caused these warnings:
      
        mlx4_core a6d1:00:02.0: INTx is not supported in multi-function mode, aborting
        PM: dpm_run_callback(): pci_pm_thaw+0x0/0xd7 returns -95
        PM: Device a6d1:00:02.0 failed to thaw: error -95
      
      Return devices to D0 and restore config registers for all devices, not just
      those whose drivers support new power management.
      
      [bhelgaas: also call pci_restore_state() before pci_legacy_resume_early(),
      update comment, add stable tag, commit log]
      Link: https://lore.kernel.org/r/KU1P153MB016637CAEAD346F0AA8E3801BFAD0@KU1P153MB0166.APCP153.PROD.OUTLOOK.COM
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: stable@vger.kernel.org # v4.13+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • C
      mmc: block: Add CMD13 polling for MMC IOCTLS with R1B response · 12e8ae94
      Chaotian Jing committed
      commit a0d4c7eb71dd08a89ad631177bb0cbbabd598f84 upstream.
      
      MMC IOCTLS with R1B responses may cause the card to enter the busy state,
      which means it's not ready to receive a new request. To prevent new
      requests from being sent to the card, use a CMD13 polling loop to verify
      that the card returns to the transfer state, before completing the request.
      Signed-off-by: Chaotian Jing <chaotian.jing@mediatek.com>
      Reviewed-by: Avri Altman <avri.altman@wdc.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • C
      mmc: block: Make card_busy_detect() a bit more generic · 848fd6b1
      Chaotian Jing committed
      commit 3869468e0c4800af52bfe1e0b72b338dcdae2cfc upstream.
      
      To prepare for more users of card_busy_detect(), let's drop the struct
      request * as an in-parameter and convert to log the error message via
      dev_err() instead of pr_err().
      Signed-off-by: Chaotian Jing <chaotian.jing@mediatek.com>
      Reviewed-by: Avri Altman <avri.altman@wdc.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • G
      Revert "arm64: preempt: Fix big-endian when checking preempt count in assembly" · 45ede4b1
      Greg Kroah-Hartman committed
      This reverts commit 64694b27 which is
      commit 7faa313f05cad184e8b17750f0cbe5216ac6debb upstream.
      
      Turns out one of the prerequisite patches wasn't in 4.19.y, so this
      patch didn't make sense.  So let's revert it.
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Reported-by: Will Deacon <will@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Kevin Hilman <khilman@baylibre.com>
      Cc: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • G
      tcp: Protect accesses to .ts_recent_stamp with {READ,WRITE}_ONCE() · fbcf85b0
      Guillaume Nault committed
      [ Upstream commit 721c8dafad26ccfa90ff659ee19755e3377b829d ]
      
      Syncookies borrow the ->rx_opt.ts_recent_stamp field to store the
      timestamp of the last synflood. Protect them with READ_ONCE() and
      WRITE_ONCE() since reads and writes aren't serialised.
      
      Use of .rx_opt.ts_recent_stamp for storing the synflood timestamp was
      introduced by a0f82f64 ("syncookies: remove last_synq_overflow from
      struct tcp_sock"). But unprotected accesses were already there when
      timestamp was stored in .last_synq_overflow.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • G
      tcp: tighten acceptance of ACKs not matching a child socket · 4b8a9869
      Guillaume Nault committed
      [ Upstream commit cb44a08f8647fd2e8db5cc9ac27cd8355fa392d8 ]
      
      When no synflood occurs, the synflood timestamp isn't updated.
      Therefore it can be so old that time_after32() can consider it to be
      in the future.
      
      That's a problem for tcp_synq_no_recent_overflow() as it may report
      that a recent overflow occurred while, in fact, it's just that jiffies
      has grown past 'last_overflow' + TCP_SYNCOOKIE_VALID + 2^31.
      
      Spurious detection of recent overflows leads to extra syncookie
      verification in cookie_v[46]_check(). At that point, the verification
      should fail and the packet should be dropped. But we should have dropped the
      packet earlier as we didn't even send a syncookie.
      
      Let's refine tcp_synq_no_recent_overflow() to report a recent overflow
      only if jiffies is within the
      [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval. This
      way, no spurious recent overflow is reported when jiffies wraps and
      'last_overflow' becomes in the future from the point of view of
      time_after32().
      
      However, if jiffies wraps and enters the
      [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval (with
      'last_overflow' being a stale synflood timestamp), then
      tcp_synq_no_recent_overflow() still erroneously reports an
      overflow. In such cases, we have to rely on syncookie verification
      to drop the packet. We unfortunately have no way to differentiate
      between a fresh and a stale syncookie timestamp.
      
      In practice, using last_overflow as lower bound is problematic.
      If the synflood timestamp is concurrently updated between the time
      we read jiffies and the moment we store the timestamp in
      'last_overflow', then 'now' becomes smaller than 'last_overflow' and
      tcp_synq_no_recent_overflow() returns true, potentially dropping a
      valid syncookie.
      
      Reading jiffies after loading the timestamp could fix the problem,
      but that'd require a memory barrier. Let's just accommodate for
      potential timestamp growth instead and extend the interval using
      'last_overflow - HZ' as lower bound.
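
      The resulting interval test can be sketched in userspace C. This is an illustration under assumptions, not the kernel source: time_between32 is written here as a plain function mirroring the wrap-safe unsigned comparison, no_recent_overflow is a hypothetical stand-in for tcp_synq_no_recent_overflow(), and HZ = 1000 with TCP_SYNCOOKIE_VALID = 120 * HZ are assumed values.

```c
#include <stdbool.h>
#include <stdint.h>

#define HZ 1000u				/* assumed tick rate */
#define TCP_SYNCOOKIE_VALID (120u * HZ)		/* assumed: 120 seconds */

/* Wrap-safe test for l <= t <= h on 32-bit jiffies values. */
static inline bool time_between32(uint32_t t, uint32_t l, uint32_t h)
{
	return (uint32_t)(h - l) >= (uint32_t)(t - l);
}

/*
 * Returns true when there was no recent overflow, that is, when now lies
 * outside [last_overflow - HZ, last_overflow + TCP_SYNCOOKIE_VALID].
 * The "- HZ" slack tolerates a timestamp updated just after now was read.
 */
static inline bool no_recent_overflow(uint32_t now, uint32_t last_overflow)
{
	return !time_between32(now, last_overflow - HZ,
			       last_overflow + TCP_SYNCOOKIE_VALID);
}
```

      With last_overflow = 1000000, a now of 999500 (read slightly before a concurrent update) still counts as a recent overflow, while a stale timestamp more than 2^31 jiffies old does not.
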
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • G
      tcp: fix rejected syncookies due to stale timestamps · bac9e8f3
      Guillaume Nault committed
      [ Upstream commit 04d26e7b159a396372646a480f4caa166d1b6720 ]
      
      If no synflood happens for a long enough period of time, then the
      synflood timestamp isn't refreshed and jiffies can advance so much
      that time_after32() can't accurately compare them any more.
      
      Therefore, we can end up in a situation where time_after32(now,
      last_overflow + HZ) returns false, just because these two values are
      too far apart. In that case, the synflood timestamp isn't updated as
      it should be, which can trick tcp_synq_no_recent_overflow() into
      rejecting valid syncookies.
      
      For example, let's consider the following scenario on a system
      with HZ=1000:
      
        * The synflood timestamp is 0, either because that's the timestamp
          of the last synflood or, more commonly, because we're working with
          a freshly created socket.
      
        * We receive a new SYN, which triggers synflood protection. Let's say
          that this happens when jiffies == 2147484649 (that is,
          'synflood timestamp' + HZ + 2^31 + 1).
      
        * Then tcp_synq_overflow() doesn't update the synflood timestamp,
          because time_after32(2147484649, 1000) returns false.
          With:
            - 2147484649: the value of jiffies, aka. 'now'.
            - 1000: the value of 'last_overflow' + HZ.
      
        * A bit later, we receive the ACK completing the 3WHS. But
          cookie_v[46]_check() rejects it because tcp_synq_no_recent_overflow()
          says that we're not under synflood. That's because
          time_after32(2147484649, 120000) returns true.
          With:
            - 2147484649: the value of jiffies, aka. 'now'.
            - 120000: the value of 'last_overflow' + TCP_SYNCOOKIE_VALID.
      
          Of course, in reality jiffies would have increased a bit, but this
          condition will last for the next 119 seconds, which is long enough
          to accommodate jiffies' growth.
      
      Fix this by updating the overflow timestamp whenever jiffies isn't
      within the [last_overflow, last_overflow + HZ] range. That shouldn't
      have any performance impact since the update still happens at most once
      per second.
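
      The first comparison in the scenario, and the effect of the range-based check, can be reproduced in userspace C. This is a sketch under assumptions: time_after32 and time_between32 are written as plain functions mirroring the 32-bit wrap-safe comparisons, needs_update_old and needs_update_fixed are hypothetical names for the two update conditions, and HZ = 1000 as in the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define HZ 1000u	/* assumption: HZ=1000, as in the scenario above */

/* True if a is after b, treating the difference as a signed 32-bit value. */
static inline bool time_after32(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Wrap-safe test for l <= t <= h. */
static inline bool time_between32(uint32_t t, uint32_t l, uint32_t h)
{
	return (uint32_t)(h - l) >= (uint32_t)(t - l);
}

/* Old condition: refresh only if now is after last_overflow + HZ. */
static inline bool needs_update_old(uint32_t now, uint32_t last_overflow)
{
	return time_after32(now, last_overflow + HZ);
}

/* Fixed condition: refresh unless now is in [last_overflow, last_overflow + HZ]. */
static inline bool needs_update_fixed(uint32_t now, uint32_t last_overflow)
{
	return !time_between32(now, last_overflow, last_overflow + HZ);
}
```

      With now = 2147484649 and last_overflow = 0, the old condition is false, so the stale timestamp is kept, while the fixed condition is true and the timestamp is refreshed. Within one second of the last update (for example now = 500), the fixed condition stays false, so updates remain limited to about one per second.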
      
      Now we're guaranteed to have fresh timestamps while under synflood, so
      tcp_synq_no_recent_overflow() can safely use it with time_after32() in
      such situations.
      
      Stale timestamps can still make tcp_synq_no_recent_overflow() return
      the wrong verdict when not under synflood. This will be handled in the
      next patch.
      
      For 64-bit architectures, the problem was introduced with the
      conversion of ->tw_ts_recent_stamp to a 32-bit integer by commit
      cca9bab1 ("tcp: use monotonic timestamps for PAWS").
      The problem has always been there on 32-bit architectures.
      
      Fixes: cca9bab1 ("tcp: use monotonic timestamps for PAWS")
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • H
      net/mlx5e: Query global pause state before setting prio2buffer · c5fc25e6
      Huy Nguyen committed
      [ Upstream commit 73e6551699a32fac703ceea09214d6580edcf2d5 ]
      
      When the user changes prio2buffer mapping while global pause is
      enabled, mlx5 driver incorrectly sets all active buffers
      (buffer that has at least one priority mapped) to lossy.
      
      Solution:
      If global pause is enabled, set all the active buffers to lossless
      in prio2buffer command.
      Also, add error message when buffer size is not enough to meet
      xoff threshold.
      
      Fixes: 0696d608 ("net/mlx5e: Receive buffer configuration")
      Signed-off-by: Huy Nguyen <huyn@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • T
      tipc: fix ordering of tipc module init and exit routine · 9430afbc
      Taehee Yoo committed
      [ Upstream commit 9cf1cd8ee3ee09ef2859017df2058e2f53c5347f ]
      
      In order to set/get/dump, the tipc uses the generic netlink
      infrastructure. So, when tipc module is inserted, init function
      calls genl_register_family().
      After genl_register_family(), set/get/dump commands are immediately
      allowed and these callbacks internally use the net_generic.
      net_generic is allocated by register_pernet_device() but this
      is called after genl_register_family() in the __init function.
      So, these callbacks would use un-initialized net_generic.
      
      Test commands:
          #SHELL1
          while :
          do
              modprobe tipc
              modprobe -rv tipc
          done
      
          #SHELL2
          while :
          do
              tipc link list
          done
      
      Splat looks like:
      [   59.616322][ T2788] kasan: CONFIG_KASAN_INLINE enabled
      [   59.617234][ T2788] kasan: GPF could be caused by NULL-ptr deref or user memory access
      [   59.618398][ T2788] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [   59.619389][ T2788] CPU: 3 PID: 2788 Comm: tipc Not tainted 5.4.0+ #194
      [   59.620231][ T2788] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   59.621428][ T2788] RIP: 0010:tipc_bcast_get_broadcast_mode+0x131/0x310 [tipc]
      [   59.622379][ T2788] Code: c7 c6 ef 8b 38 c0 65 ff 0d 84 83 c9 3f e8 d7 a5 f2 e3 48 8d bb 38 11 00 00 48 b8 00 00 00 00
      [   59.622550][ T2780] NET: Registered protocol family 30
      [   59.624627][ T2788] RSP: 0018:ffff88804b09f578 EFLAGS: 00010202
      [   59.624630][ T2788] RAX: dffffc0000000000 RBX: 0000000000000011 RCX: 000000008bc66907
      [   59.624631][ T2788] RDX: 0000000000000229 RSI: 000000004b3cf4cc RDI: 0000000000001149
      [   59.624633][ T2788] RBP: ffff88804b09f588 R08: 0000000000000003 R09: fffffbfff4fb3df1
      [   59.624635][ T2788] R10: fffffbfff50318f8 R11: ffff888066cadc18 R12: ffffffffa6cc2f40
      [   59.624637][ T2788] R13: 1ffff11009613eba R14: ffff8880662e9328 R15: ffff8880662e9328
      [   59.624639][ T2788] FS:  00007f57d8f7b740(0000) GS:ffff88806cc00000(0000) knlGS:0000000000000000
      [   59.624645][ T2788] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   59.625875][ T2780] tipc: Started in single node mode
      [   59.626128][ T2788] CR2: 00007f57d887a8c0 CR3: 000000004b140002 CR4: 00000000000606e0
      [   59.633991][ T2788] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   59.635195][ T2788] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   59.636478][ T2788] Call Trace:
      [   59.637025][ T2788]  tipc_nl_add_bc_link+0x179/0x1470 [tipc]
      [   59.638219][ T2788]  ? lock_downgrade+0x6e0/0x6e0
      [   59.638923][ T2788]  ? __tipc_nl_add_link+0xf90/0xf90 [tipc]
      [   59.639533][ T2788]  ? tipc_nl_node_dump_link+0x318/0xa50 [tipc]
      [   59.640160][ T2788]  ? mutex_lock_io_nested+0x1380/0x1380
      [   59.640746][ T2788]  tipc_nl_node_dump_link+0x4fd/0xa50 [tipc]
      [   59.641356][ T2788]  ? tipc_nl_node_reset_link_stats+0x340/0x340 [tipc]
      [   59.642088][ T2788]  ? __skb_ext_del+0x270/0x270
      [   59.642594][ T2788]  genl_lock_dumpit+0x85/0xb0
      [   59.643050][ T2788]  netlink_dump+0x49c/0xed0
      [   59.643529][ T2788]  ? __netlink_sendskb+0xc0/0xc0
      [   59.644044][ T2788]  ? __netlink_dump_start+0x190/0x800
      [   59.644617][ T2788]  ? __mutex_unlock_slowpath+0xd0/0x670
      [   59.645177][ T2788]  __netlink_dump_start+0x5a0/0x800
      [   59.645692][ T2788]  genl_rcv_msg+0xa75/0xe90
      [   59.646144][ T2788]  ? __lock_acquire+0xdfe/0x3de0
      [   59.646692][ T2788]  ? genl_family_rcv_msg_attrs_parse+0x320/0x320
      [   59.647340][ T2788]  ? genl_lock_dumpit+0xb0/0xb0
      [   59.647821][ T2788]  ? genl_unlock+0x20/0x20
      [   59.648290][ T2788]  ? genl_parallel_done+0xe0/0xe0
      [   59.648787][ T2788]  ? find_held_lock+0x39/0x1d0
      [   59.649276][ T2788]  ? genl_rcv+0x15/0x40
      [   59.649722][ T2788]  ? lock_contended+0xcd0/0xcd0
      [   59.650296][ T2788]  netlink_rcv_skb+0x121/0x350
      [   59.650828][ T2788]  ? genl_family_rcv_msg_attrs_parse+0x320/0x320
      [   59.651491][ T2788]  ? netlink_ack+0x940/0x940
      [   59.651953][ T2788]  ? lock_acquire+0x164/0x3b0
      [   59.652449][ T2788]  genl_rcv+0x24/0x40
      [   59.652841][ T2788]  netlink_unicast+0x421/0x600
      [ ... ]
      
      Fixes: 7e436905 ("tipc: fix a slab object leak")
      Fixes: a62fbcce ("tipc: make subscriber server support net namespace")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9430afbc
    • E
      tcp: md5: fix potential overestimation of TCP option space · a148815a
      Eric Dumazet committed
      [ Upstream commit 9424e2e7ad93ffffa88f882c9bc5023570904b55 ]
      
      Back in 2008, Adam Langley fixed the corner case of packets for flows
      having all of the following options : MD5 TS SACK
      
      Since MD5 needs 20 bytes, and TS needs 12 bytes, no sack block
      can be cooked from the remaining 8 bytes.
      
      tcp_established_options() correctly sets opts->num_sack_blocks
      to zero, but returns 36 instead of 32.
      
      This means TCP cooks packets with 4 extra bytes at the end
      of options, containing uninitialized bytes.
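
The arithmetic above can be checked with a small user-space sketch. The constants are the values used by the kernel's TCP headers. The function names are hypothetical, and the early return in `option_size_fixed()` is an assumption about the shape of the fix, not a copy of the patch.

```c
#include <assert.h>

#define MAX_TCP_OPTION_SPACE       40
#define TCPOLEN_MD5SIG_ALIGNED     20
#define TCPOLEN_TSTAMP_ALIGNED     12
#define TCPOLEN_SACK_BASE_ALIGNED   4
#define TCPOLEN_SACK_PERBLOCK       8

static unsigned int min_u(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/* Number of SACK blocks that fit once `size` option bytes are already used. */
static unsigned int sack_blocks_that_fit(unsigned int size, unsigned int eff_sacks)
{
    /* Precondition: room for the SACK base, so the subtraction cannot wrap. */
    assert(size + TCPOLEN_SACK_BASE_ALIGNED <= MAX_TCP_OPTION_SPACE);
    unsigned int remaining = MAX_TCP_OPTION_SPACE - size;
    return min_u(eff_sacks,
                 (remaining - TCPOLEN_SACK_BASE_ALIGNED) / TCPOLEN_SACK_PERBLOCK);
}

/* Old accounting: the 4-byte SACK base is charged even when no block fits. */
static unsigned int option_size_old(unsigned int size, unsigned int eff_sacks)
{
    if (eff_sacks) {
        unsigned int blocks = sack_blocks_that_fit(size, eff_sacks);
        size += TCPOLEN_SACK_BASE_ALIGNED + blocks * TCPOLEN_SACK_PERBLOCK;
    }
    return size;
}

/* Sketch of the corrected accounting: charge nothing when no block fits. */
static unsigned int option_size_fixed(unsigned int size, unsigned int eff_sacks)
{
    if (eff_sacks) {
        unsigned int blocks = sack_blocks_that_fit(size, eff_sacks);
        if (blocks == 0)
            return size;
        size += TCPOLEN_SACK_BASE_ALIGNED + blocks * TCPOLEN_SACK_PERBLOCK;
    }
    return size;
}
```

With MD5 and TS both present, 20 + 12 = 32 bytes are in use. The old accounting returns 36 for that input and the sketch of the fix returns 32, matching the numbers in the commit message.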
      
      Fixes: 33ad798c ("tcp: options clean up")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a148815a
    • A
      openvswitch: support asymmetric conntrack · 6f99afcc
      Aaron Conole committed
      [ Upstream commit 5d50aa83e2c8e91ced2cca77c198b468ca9210f4 ]
      
      The openvswitch module shares a common conntrack and NAT infrastructure
      exposed via netfilter.  It's possible that a packet needs both SNAT and
      DNAT manipulation, due to e.g. tuple collision.  Netfilter can support
      this because it runs through the NAT table twice - once on ingress and
      again after egress.  The openvswitch module doesn't have such capability.
      
      Like netfilter hook infrastructure, we should run through NAT twice to
      keep the symmetry.
      
      Fixes: 05752523 ("openvswitch: Interface with NAT.")
      Signed-off-by: Aaron Conole <aconole@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f99afcc
    • M
      net: thunderx: start phy before starting autonegotiation · 13156081
      Mian Yousaf Kaukab committed
      [ Upstream commit a350d2e7adbb57181d33e3aa6f0565632747feaa ]
      
      Since commit 2b3e88ea6528 ("net: phy: improve phy state checking")
      phy_start_aneg() expects phy state to be >= PHY_UP. Call phy_start()
      before calling phy_start_aneg() during probe so that autonegotiation
      is initiated.
      
      As phy_start() takes care of calling phy_start_aneg(), drop the explicit
      call to phy_start_aneg().
      
      Network fails without this patch on Octeon TX.
      
      Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking")
      Signed-off-by: Mian Yousaf Kaukab <ykaukab@suse.de>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      13156081
    • D
      net: sched: fix dump qlen for sch_mq/sch_mqprio with NOLOCK subqueues · 0c5a4dd6
      Dust Li committed
      [ Upstream commit 2f23cd42e19c22c24ff0e221089b7b6123b117c5 ]
      
      In mq_dump() and mqprio_dump(), sch->q.len is not set if the
      subqueue is a NOLOCK qdisc.
      
      Fixes: ce679e8d ("net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio")
      Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
      Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0c5a4dd6
    • G
      net: ethernet: ti: cpsw: fix extra rx interrupt · 64334e4f
      Grygorii Strashko committed
      [ Upstream commit 51302f77bedab8768b761ed1899c08f89af9e4e2 ]
      
      Currently the RX interrupt is triggered twice every time, because
      cpsw_rx_interrupt() acks it first and only then disables it. So there is
      always a pending interrupt when the RX interrupt is enabled again in the
      NAPI handler.
      
      Fix it by disabling the IRQ first and then acking it.
      
      Fixes: 870915fe ("drivers: net: cpsw: remove disable_irq/enable_irq as irq can be masked from cpsw itself")
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      64334e4f
    • A
      net: dsa: fix flow dissection on Tx path · a7d80e75
      Alexander Lobakin committed
      [ Upstream commit 8bef0af09a5415df761b04fa487a6c34acae74bc ]
      
      Commit 43e66528 ("net-next: dsa: fix flow dissection") added an
      ability to override protocol and network offset during flow dissection
      for DSA-enabled devices (i.e. controllers shipped as switch CPU ports)
      in order to fix skb hashing for RPS on Rx path.
      
      However, skb_hash() and the added code can be invoked not only on the
      Rx path, but also on the Tx path if we have a multi-queued device and:
       - the kernel is running on a UP system or
       - XPS is not configured.
      
      The call stack in these two cases will be like: dev_queue_xmit() ->
      __dev_queue_xmit() -> netdev_core_pick_tx() -> netdev_pick_tx() ->
      skb_tx_hash() -> skb_get_hash().
      
      The problem is that skbs queued for Tx have both network offset and
      correct protocol already set up even after inserting a CPU tag by DSA
      tagger, so calling tag_ops->flow_dissect() on this path actually only
      breaks flow dissection and hashing.
      
      This can be observed by adding debug prints just before and right after
      tag_ops->flow_dissect() call to the related block of code:
      
      Before the patch:
      
      Rx path (RPS):
      
      [   19.240001] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   19.244271] tag_ops->flow_dissect()
      [   19.247811] Rx: proto: 0x0800, nhoff: 8	/* ETH_P_IP */
      
      [   19.215435] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   19.219746] tag_ops->flow_dissect()
      [   19.223241] Rx: proto: 0x0806, nhoff: 8	/* ETH_P_ARP */
      
      [   18.654057] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   18.658332] tag_ops->flow_dissect()
      [   18.661826] Rx: proto: 0x8100, nhoff: 8	/* ETH_P_8021Q */
      
      Tx path (UP system):
      
      [   18.759560] Tx: proto: 0x0800, nhoff: 26	/* ETH_P_IP */
      [   18.763933] tag_ops->flow_dissect()
      [   18.767485] Tx: proto: 0x920b, nhoff: 34	/* junk */
      
      [   22.800020] Tx: proto: 0x0806, nhoff: 26	/* ETH_P_ARP */
      [   22.804392] tag_ops->flow_dissect()
      [   22.807921] Tx: proto: 0x920b, nhoff: 34	/* junk */
      
      [   16.898342] Tx: proto: 0x86dd, nhoff: 26	/* ETH_P_IPV6 */
      [   16.902705] tag_ops->flow_dissect()
      [   16.906227] Tx: proto: 0x920b, nhoff: 34	/* junk */
      
      After:
      
      Rx path (RPS):
      
      [   16.520993] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   16.525260] tag_ops->flow_dissect()
      [   16.528808] Rx: proto: 0x0800, nhoff: 8	/* ETH_P_IP */
      
      [   15.484807] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   15.490417] tag_ops->flow_dissect()
      [   15.495223] Rx: proto: 0x0806, nhoff: 8	/* ETH_P_ARP */
      
      [   17.134621] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   17.138895] tag_ops->flow_dissect()
      [   17.142388] Rx: proto: 0x8100, nhoff: 8	/* ETH_P_8021Q */
      
      Tx path (UP system):
      
      [   15.499558] Tx: proto: 0x0800, nhoff: 26	/* ETH_P_IP */
      
      [   20.664689] Tx: proto: 0x0806, nhoff: 26	/* ETH_P_ARP */
      
      [   18.565782] Tx: proto: 0x86dd, nhoff: 26	/* ETH_P_IPV6 */
      
      In order to fix that we can add the check 'proto == htons(ETH_P_XDSA)'
      to prevent code from calling tag_ops->flow_dissect() on Tx.
      I also decided to initialize the 'offset' variable so that tagger
      callbacks can now safely leave it untouched without causing chaos.
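
The guard described above can be sketched in user space as follows. `ETH_P_XDSA` is 0x00F8, matching the `proto: 0x00f8` lines in the log. The helper names are hypothetical, and the 8-byte tag length is taken from the Rx log lines (nhoff 0 to 8) for this particular tagger, not from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>   /* htons() */

#define ETH_P_XDSA 0x00F8   /* EtherType seen on Rx before the tag is parsed */
#define ETH_P_IP   0x0800

/* Hypothetical tag length; the Rx log above shows nhoff moving from 0 to 8. */
#define MODEL_TAG_LEN 8

/* True when the tagger's flow_dissect() hook should run for this skb. */
static bool needs_tag_dissect(uint16_t proto_be)
{
    return proto_be == htons(ETH_P_XDSA);
}

/* Network offset after the (modelled) dissection step. */
static int nhoff_after_dissect(uint16_t proto_be, int nhoff)
{
    assert(nhoff >= 0);
    if (needs_tag_dissect(proto_be))
        nhoff += MODEL_TAG_LEN;   /* Rx: skip the switch tag */
    return nhoff;                 /* Tx: protocol and offset are already right */
}
```

In this model an Rx frame still carrying `ETH_P_XDSA` has its offset advanced past the tag, while a Tx frame that already reports `ETH_P_IP` at offset 26 is left alone, which is the behaviour shown in the "After" log.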
      
      Fixes: 43e66528 ("net-next: dsa: fix flow dissection")
      Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a7d80e75
    • N
      net: bridge: deny dev_set_mac_address() when unregistering · bb168ebe
      Nikolay Aleksandrov committed
      [ Upstream commit c4b4c421857dc7b1cf0dccbd738472360ff2cd70 ]
      
      We have an interesting memory leak in the bridge when it is being
      unregistered while enslaved to a master device that changes the MAC
      address of its slaves on unregister (e.g. bond, team). This is a very
      unusual setup, but we do end up leaking one fdb entry, because
      dev_set_mac_address() causes the bridge to insert the new MAC address
      into its table after all fdbs have been flushed. That is, after dellink()
      on the bridge has finished and NETDEV_UNREGISTER is sent, the bond/team
      releases the bridge and calls dev_set_mac_address() to restore its
      original address, which in turn adds an fdb entry to the bridge.
      One fix is to check the bridge dev's reg_state in its
      ndo_set_mac_address callback and return an error if the bridge is not in
      NETREG_REGISTERED.
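
The reg_state check can be illustrated with a minimal user-space model. Everything here is hypothetical except the `NETREG_*` state names: the enum is a subset of the kernel's, `struct model_bridge` is invented for the sketch, and `-EBUSY` is an assumed error code, since the text above only says "an error".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Subset of the kernel's netdev registration states. */
enum reg_state {
    NETREG_UNINITIALIZED,
    NETREG_REGISTERED,
    NETREG_UNREGISTERING,
    NETREG_UNREGISTERED,
};

/* Hypothetical model of a bridge device and its forwarding database. */
struct model_bridge {
    enum reg_state reg_state;
    int fdb_entries;
};

static int model_set_mac_address(struct model_bridge *br)
{
    assert(br != NULL);
    /* Assumption: -EBUSY is this sketch's choice of error code. */
    if (br->reg_state != NETREG_REGISTERED)
        return -EBUSY;
    br->fdb_entries++;   /* stands in for inserting the new local fdb entry */
    return 0;
}

/* fdb entries left behind if set_mac is attempted in the given state. */
static int fdb_entries_after_set_mac(enum reg_state state)
{
    struct model_bridge br = { .reg_state = state, .fdb_entries = 0 };
    (void)model_set_mac_address(&br);
    return br.fdb_entries;
}
```

With the check in place, a MAC change requested during unregistration adds nothing to the already flushed table, so there is no entry left to leak.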
      
      Easy steps to reproduce:
       1. add bond in mode != A/B
       2. add any slave to the bond
       3. add bridge dev as a slave to the bond
       4. destroy the bridge device
      
      Trace:
       unreferenced object 0xffff888035c4d080 (size 128):
         comm "ip", pid 4068, jiffies 4296209429 (age 1413.753s)
         hex dump (first 32 bytes):
           41 1d c9 36 80 88 ff ff 00 00 00 00 00 00 00 00  A..6............
           d2 19 c9 5e 3f d7 00 00 00 00 00 00 00 00 00 00  ...^?...........
         backtrace:
           [<00000000ddb525dc>] kmem_cache_alloc+0x155/0x26f
           [<00000000633ff1e0>] fdb_create+0x21/0x486 [bridge]
           [<0000000092b17e9c>] fdb_insert+0x91/0xdc [bridge]
           [<00000000f2a0f0ff>] br_fdb_change_mac_address+0xb3/0x175 [bridge]
           [<000000001de02dbd>] br_stp_change_bridge_id+0xf/0xff [bridge]
           [<00000000ac0e32b1>] br_set_mac_address+0x76/0x99 [bridge]
           [<000000006846a77f>] dev_set_mac_address+0x63/0x9b
           [<00000000d30738fc>] __bond_release_one+0x3f6/0x455 [bonding]
           [<00000000fc7ec01d>] bond_netdev_event+0x2f2/0x400 [bonding]
           [<00000000305d7795>] notifier_call_chain+0x38/0x56
           [<0000000028885d4a>] call_netdevice_notifiers+0x1e/0x23
           [<000000008279477b>] rollback_registered_many+0x353/0x6a4
           [<0000000018ef753a>] unregister_netdevice_many+0x17/0x6f
           [<00000000ba854b7a>] rtnl_delete_link+0x3c/0x43
           [<00000000adf8618d>] rtnl_dellink+0x1dc/0x20a
           [<000000009b6395fd>] rtnetlink_rcv_msg+0x23d/0x268
      
      Fixes: 43598813 ("bridge: add local MAC address to forwarding table (v2)")
      Reported-by: syzbot+2add91c08eb181fea1bf@syzkaller.appspotmail.com
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      bb168ebe
    • V
      mqprio: Fix out-of-bounds access in mqprio_dump · 588fac83
      Vladyslav Tarasiuk committed
      [ Upstream commit 9f104c7736904ac72385bbb48669e0c923ca879b ]
      
      When a user runs a command like
      tc qdisc add dev eth1 root mqprio
      a KASAN stack-out-of-bounds warning is emitted.
      Currently, the NLA_ALIGN macro used in mqprio_dump passes too large a
      buffer size as the argument to nla_put and to memcpy down the call stack.
      The flow looks like this:
      1. nla_put expects exact object size as an argument;
      2. Later it provides this size to memcpy;
      3. To calculate correct padding for SKB, nla_put applies NLA_ALIGN
         macro itself.
      
      Therefore, NLA_ALIGN should not be applied to the nla_put parameter.
      Otherwise it will lead to out-of-bounds memory access in memcpy.
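
The size mismatch can be reproduced with the alignment macro alone. The macro definitions below follow the netlink UAPI header, while `overread_bytes()` is a hypothetical helper written for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Same definitions as in linux/netlink.h. */
#define NLA_ALIGNTO     4
#define NLA_ALIGN(len)  (((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))

/*
 * Bytes memcpy() would read past an object of obj_size bytes if the caller
 * passed NLA_ALIGN(obj_size) as the length instead of the exact size.
 */
static size_t overread_bytes(size_t obj_size)
{
    size_t aligned = NLA_ALIGN(obj_size);
    assert(aligned >= obj_size);
    return aligned - obj_size;
}
```

Any object whose size is not a multiple of four gives a non-zero result, and that is the case where the copy runs past the end of the source object. Passing the exact size and leaving the padding to nla_put avoids it.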
      
      Fixes: 4e8b86c0 ("mqprio: Introduce new hardware offload mode and shaper in mqprio")
      Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      588fac83
    • E
      inet: protect against too small mtu values. · d80d67cd
      Eric Dumazet committed
      [ Upstream commit 501a90c945103e8627406763dac418f20f3837b2 ]
      
      syzbot was once again able to crash a host by setting a very small mtu
      on the loopback device.
      
      Let's make inetdev_valid_mtu() available in include/net/ip.h,
      and use it in ip_setup_cork(), so that we protect both ip_append_page()
      and __ip_append_data().
      
      Also add a READ_ONCE() when the device mtu is read.
      
      Pair this lockless read with one WRITE_ONCE() in __dev_set_mtu(),
      even if other code paths might write over this field.
      
      Add a big comment in include/linux/netdevice.h about dev->mtu
      needing READ_ONCE()/WRITE_ONCE() annotations.
      
      Hopefully we will add the missing ones in followup patches.
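
A user-space sketch of the check follows, under stated assumptions. `IPV4_MIN_MTU` is 68, the RFC 791 minimum. The `READ_ONCE`/`WRITE_ONCE` macros are simplified volatile-cast stand-ins for the kernel's, `dev_mtu` and the `model_*` helpers are hypothetical, and `-ENETUNREACH` is this sketch's choice of error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define IPV4_MIN_MTU 68   /* minimum IPv4 MTU, RFC 791 */

/* Simplified stand-ins for the kernel macros: force a single access. */
#define READ_ONCE(x)      (*(const volatile unsigned int *)&(x))
#define WRITE_ONCE(x, v)  (*(volatile unsigned int *)&(x) = (v))

static unsigned int dev_mtu = 1500;   /* stands in for dev->mtu */

static bool inetdev_valid_mtu(unsigned int mtu)
{
    return mtu >= IPV4_MIN_MTU;
}

/* Writer side, paired with the lockless read below. */
static void model_dev_set_mtu(unsigned int new_mtu)
{
    WRITE_ONCE(dev_mtu, new_mtu);
}

/* Reader side: validate the MTU once before any fragment sizing uses it. */
static int model_setup_cork(void)
{
    unsigned int mtu = READ_ONCE(dev_mtu);

    if (!inetdev_valid_mtu(mtu))
        return -ENETUNREACH;   /* assumption: error code chosen for the sketch */
    return 0;
}

/* Convenience wrapper: set the MTU, then attempt the cork setup. */
static int cork_result_for_mtu(unsigned int mtu)
{
    model_dev_set_mtu(mtu);
    return model_setup_cork();
}
```

Reading the value once into a local means the validity check and any later use of the MTU see the same number, even if another path rewrites the field in between.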
      
      [1]
      
      refcount_t: saturated; leaking memory.
      WARNING: CPU: 0 PID: 9464 at lib/refcount.c:22 refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 0 PID: 9464 Comm: syz-executor850 Not tainted 5.4.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x197/0x210 lib/dump_stack.c:118
       panic+0x2e3/0x75c kernel/panic.c:221
       __warn.cold+0x2f/0x3e kernel/panic.c:582
       report_bug+0x289/0x300 lib/bug.c:195
       fixup_bug arch/x86/kernel/traps.c:174 [inline]
       fixup_bug arch/x86/kernel/traps.c:169 [inline]
       do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:267
       do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:286
       invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
      RIP: 0010:refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22
      Code: 06 31 ff 89 de e8 c8 f5 e6 fd 84 db 0f 85 6f ff ff ff e8 7b f4 e6 fd 48 c7 c7 e0 71 4f 88 c6 05 56 a6 a4 06 01 e8 c7 a8 b7 fd <0f> 0b e9 50 ff ff ff e8 5c f4 e6 fd 0f b6 1d 3d a6 a4 06 31 ff 89
      RSP: 0018:ffff88809689f550 EFLAGS: 00010286
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffffff815e4336 RDI: ffffed1012d13e9c
      RBP: ffff88809689f560 R08: ffff88809c50a3c0 R09: fffffbfff15d31b1
      R10: fffffbfff15d31b0 R11: ffffffff8ae98d87 R12: 0000000000000001
      R13: 0000000000040100 R14: ffff888099041104 R15: ffff888218d96e40
       refcount_add include/linux/refcount.h:193 [inline]
       skb_set_owner_w+0x2b6/0x410 net/core/sock.c:1999
       sock_wmalloc+0xf1/0x120 net/core/sock.c:2096
       ip_append_page+0x7ef/0x1190 net/ipv4/ip_output.c:1383
       udp_sendpage+0x1c7/0x480 net/ipv4/udp.c:1276
       inet_sendpage+0xdb/0x150 net/ipv4/af_inet.c:821
       kernel_sendpage+0x92/0xf0 net/socket.c:3794
       sock_sendpage+0x8b/0xc0 net/socket.c:936
       pipe_to_sendpage+0x2da/0x3c0 fs/splice.c:458
       splice_from_pipe_feed fs/splice.c:512 [inline]
       __splice_from_pipe+0x3ee/0x7c0 fs/splice.c:636
       splice_from_pipe+0x108/0x170 fs/splice.c:671
       generic_splice_sendpage+0x3c/0x50 fs/splice.c:842
       do_splice_from fs/splice.c:861 [inline]
       direct_splice_actor+0x123/0x190 fs/splice.c:1035
       splice_direct_to_actor+0x3b4/0xa30 fs/splice.c:990
       do_splice_direct+0x1da/0x2a0 fs/splice.c:1078
       do_sendfile+0x597/0xd00 fs/read_write.c:1464
       __do_sys_sendfile64 fs/read_write.c:1525 [inline]
       __se_sys_sendfile64 fs/read_write.c:1511 [inline]
       __x64_sys_sendfile64+0x1dd/0x220 fs/read_write.c:1511
       do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x441409
      Code: e8 ac e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fffb64c4f78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441409
      RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000005
      RBP: 0000000000073b8a R08: 0000000000000010 R09: 0000000000000010
      R10: 0000000000010001 R11: 0000000000000246 R12: 0000000000402180
      R13: 0000000000402210 R14: 0000000000000000 R15: 0000000000000000
      Kernel Offset: disabled
      Rebooting in 86400 seconds..
      
      Fixes: 1470ddf7 ("inet: Remove explicit write references to sk/inet in ip_append_data")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d80d67cd
  2. 18 Dec, 2019 3 commits