1. 25 5月, 2018 6 次提交
    • Q
      mlx4_core: allocate ICM memory in page size chunks · 1383cb81
      Qing Huang 提交于
      When a system is under memory presure (high usage with fragments),
      the original 256KB ICM chunk allocations will likely trigger kernel
      memory management to enter slow path doing memory compact/migration
      ops in order to complete high order memory allocations.
      
      When that happens, user processes calling uverb APIs may get stuck
      for more than 120s easily even though there are a lot of free pages
      in smaller chunks available in the system.
      
      Syslog:
      ...
      Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
      oracle_205573_e:205573 blocked for more than 120 seconds.
      ...
      
      With 4KB ICM chunk size on x86_64 arch, the above issue is fixed.
      
      However in order to support smaller ICM chunk size, we need to fix
      another issue in large size kcalloc allocations.
      
      E.g.
      Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
      size, each ICM chunk can only hold 512 mtt entries (8 bytes for each mtt
      entry). So we need a 16MB allocation for a table->icm pointer array to
      hold 2M pointers which can easily cause kcalloc to fail.
      
      The solution is to use kvzalloc to replace kcalloc which will fall back
      to vmalloc automatically if kmalloc fails.
      Signed-off-by: NQing Huang <qing.huang@oracle.com>
      Acked-by: NDaniel Jurgens <danielj@mellanox.com>
      Reviewed-by: NZhu Yanjun <yanjun.zhu@oracle.com>
      Reviewed-by: NTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1383cb81
    • G
      enic: set DMA mask to 47 bit · 322eaa06
      Govindarajulu Varadarajan 提交于
      In commit 624dbf55 ("driver/net: enic: Try DMA 64 first, then
      failover to DMA") DMA mask was changed from 40 bits to 64 bits.
      Hardware actually supports only 47 bits.
      
      Fixes: 624dbf55 ("driver/net: enic: Try DMA 64 first, then failover to DMA")
      Signed-off-by: NGovindarajulu Varadarajan <gvaradar@cisco.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      322eaa06
    • E
      ppp: remove the PPPIOCDETACH ioctl · af8d3c7c
      Eric Biggers 提交于
      The PPPIOCDETACH ioctl effectively tries to "close" the given ppp file
      before f_count has reached 0, which is fundamentally a bad idea.  It
      does check 'f_count < 2', which excludes concurrent operations on the
      file since they would only be possible with a shared fd table, in which
      case each fdget() would take a file reference.  However, it fails to
      account for the fact that even with 'f_count == 1' the file can still be
      linked into epoll instances.  As reported by syzbot, this can trivially
      be used to cause a use-after-free.
      
      Yet, the only known user of PPPIOCDETACH is pppd versions older than
      ppp-2.4.2, which was released almost 15 years ago (November 2003).
      Also, PPPIOCDETACH apparently stopped working reliably at around the
      same time, when the f_count check was added to the kernel, e.g. see
      https://lkml.org/lkml/2002/12/31/83.  Also, the current 'f_count < 2'
      check makes PPPIOCDETACH only work in single-threaded applications; it
      always fails if called from a multithreaded application.
      
      All pppd versions released in the last 15 years just close() the file
      descriptor instead.
      
      Therefore, instead of hacking around this bug by exporting epoll
      internals to modules, and probably missing other related bugs, just
      remove the PPPIOCDETACH ioctl and see if anyone actually notices.  Leave
      a stub in place that prints a one-time warning and returns EINVAL.
      
      Reported-by: syzbot+16363c99d4134717c05b@syzkaller.appspotmail.com
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Acked-by: NPaul Mackerras <paulus@ozlabs.org>
      Reviewed-by: NGuillaume Nault <g.nault@alphalink.fr>
      Tested-by: NGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      af8d3c7c
    • J
      vhost: synchronize IOTLB message with dev cleanup · 1b15ad68
      Jason Wang 提交于
      DaeRyong Jeong reports a race between vhost_dev_cleanup() and
      vhost_process_iotlb_msg():
      
      Thread interleaving:
      CPU0 (vhost_process_iotlb_msg)			CPU1 (vhost_dev_cleanup)
      (In the case of both VHOST_IOTLB_UPDATE and
      VHOST_IOTLB_INVALIDATE)
      
      =====						=====
      						vhost_umem_clean(dev->iotlb);
      if (!dev->iotlb) {
      	        ret = -EFAULT;
      		        break;
      }
      						dev->iotlb = NULL;
      
      The reason is we don't synchronize between them, fixing by protecting
      vhost_process_iotlb_msg() with dev mutex.
      Reported-by: NDaeRyong Jeong <threeearcat@gmail.com>
      Fixes: 6b1e6cc7 ("vhost: new device IOTLB API")
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1b15ad68
    • Y
      net/mlx5: IPSec, Fix a race between concurrent sandbox QP commands · 1dcbc01f
      Yossi Kuperman 提交于
      Sandbox QP Commands are retired in the order they are sent. Outstanding
      commands are stored in a linked-list in the order they appear. Once a
      response is received and the callback gets called, we pull the first
      element off the pending list, assuming they correspond.
      
      Sending a message and adding it to the pending list is not done atomically,
      hence there is an opportunity for a race between concurrent requests.
      
      Bind both send and add under a critical section.
      
      Fixes: bebb23e6 ("net/mlx5: Accel, Add IPSec acceleration interface")
      Signed-off-by: NYossi Kuperman <yossiku@mellanox.com>
      Signed-off-by: NAdi Nissim <adin@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      1dcbc01f
    • E
      net/mlx5e: When RXFCS is set, add FCS data into checksum calculation · 902a5459
      Eran Ben Elisha 提交于
      When RXFCS feature is enabled, the HW do not strip the FCS data,
      however it is not present in the checksum calculated by the HW.
      
      Fix that by manually calculating the FCS checksum and adding it to the SKB
      checksum field.
      
      Add helper function to find the FCS data for all SKB forms (linear,
      one fragment or more).
      
      Fixes: 102722fc ("net/mlx5e: Add support for RXFCS feature flag")
      Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      902a5459
  2. 24 5月, 2018 10 次提交
    • J
      net/mlx4: Fix irq-unsafe spinlock usage · d546b67c
      Jack Morgenstein 提交于
      spin_lock/unlock was used instead of spin_un/lock_irq
      in a procedure used in process space, on a spinlock
      which can be grabbed in an interrupt.
      
      This caused the stack trace below to be displayed (on kernel
      4.17.0-rc1 compiled with Lock Debugging enabled):
      
      [  154.661474] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
      [  154.668909] 4.17.0-rc1-rdma_rc_mlx+ #3 Tainted: G          I
      [  154.675856] -----------------------------------------------------
      [  154.682706] modprobe/10159 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
      [  154.690254] 00000000f3b0e495 (&(&qp_table->lock)->rlock){+.+.}, at: mlx4_qp_remove+0x20/0x50 [mlx4_core]
      [  154.700927]
      and this task is already holding:
      [  154.707461] 0000000094373b5d (&(&cq->lock)->rlock/1){....}, at: destroy_qp_common+0x111/0x560 [mlx4_ib]
      [  154.718028] which would create a new lock dependency:
      [  154.723705]  (&(&cq->lock)->rlock/1){....} -> (&(&qp_table->lock)->rlock){+.+.}
      [  154.731922]
      but this new dependency connects a SOFTIRQ-irq-safe lock:
      [  154.740798]  (&(&cq->lock)->rlock){..-.}
      [  154.740800]
      ... which became SOFTIRQ-irq-safe at:
      [  154.752163]   _raw_spin_lock_irqsave+0x3e/0x50
      [  154.757163]   mlx4_ib_poll_cq+0x36/0x900 [mlx4_ib]
      [  154.762554]   ipoib_tx_poll+0x4a/0xf0 [ib_ipoib]
      ...
      to a SOFTIRQ-irq-unsafe lock:
      [  154.815603]  (&(&qp_table->lock)->rlock){+.+.}
      [  154.815604]
      ... which became SOFTIRQ-irq-unsafe at:
      [  154.827718] ...
      [  154.827720]   _raw_spin_lock+0x35/0x50
      [  154.833912]   mlx4_qp_lookup+0x1e/0x50 [mlx4_core]
      [  154.839302]   mlx4_flow_attach+0x3f/0x3d0 [mlx4_core]
      
      Since mlx4_qp_lookup() is called only in process space, we can
      simply replace the spin_un/lock calls with spin_un/lock_irq calls.
      
      Fixes: 6dc06c08 ("net/mlx4: Fix the check in attaching steering rules")
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d546b67c
    • F
      net: phy: broadcom: Fix bcm_write_exp() · 79fb218d
      Florian Fainelli 提交于
      On newer PHYs, we need to select the expansion register to write with
      setting bits [11:8] to 0xf. This was done correctly by bcm7xxx.c prior
      to being migrated to generic code under bcm-phy-lib.c which
      unfortunately used the older implementation from the BCM54xx days.
      
      Fix this by creating an inline stub: bcm_write_exp_sel() which adds the
      correct value (MII_BCM54XX_EXP_SEL_ER) and update both the Cygnus PHY
      and BCM7xxx PHY drivers which require setting these bits.
      
      broadcom.c is unchanged because some PHYs even use a different selector
      method, so let them specify it directly (e.g: SerDes secondary selector).
      
      Fixes: a1cba561 ("net: phy: Add Broadcom phy library for common interfaces")
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      79fb218d
    • F
      net: phy: broadcom: Fix auxiliary control register reads · 733a969a
      Florian Fainelli 提交于
      We are currently doing auxiliary control register reads with the shadow
      register value 0b111 (0x7) which incidentally is also the selector value
      that should be present in bits [2:0]. Fix this by using the appropriate
      selector mask which is defined (MII_BCM54XX_AUXCTL_SHDWSEL_MASK).
      
      This does not have a functional impact yet because we always access the
      MII_BCM54XX_AUXCTL_SHDWSEL_MISC (0x7) register in the current code.
      This might change at some point though.
      
      Fixes: 5b4e2900 ("net: phy: broadcom: add bcm54xx_auxctl_read")
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      733a969a
    • C
      net/mlx4: fix spelling mistake: "Inrerface" -> "Interface" and rephrase message · 4f7f56b6
      Colin Ian King 提交于
      Trivial fix to spelling mistake in mlx4_dbg debug message and also
      change the phrasing of the message so that is is more readable
      Signed-off-by: NColin Ian King <colin.king@canonical.com>
      Reviewed-by: NTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4f7f56b6
    • N
      ibmvnic: Only do H_EOI for mobility events · 73f9d364
      Nathan Fontenot 提交于
      When enabling the sub-CRQ IRQ a previous update sent a H_EOI prior
      to the enablement to clear any pending interrupts that may be present
      across a partition migration. This fixed a firmware bug where a
      migration could erroneously indicate that a H_EOI was pending.
      
      The H_EOI should only be sent when enabling during a mobility
      event though. Doing so at other time could wrong and can produce
      extra driver output when IRQs are enabled when doing TX completion.
      Signed-off-by: NNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      73f9d364
    • J
      tuntap: correctly set SOCKWQ_ASYNC_NOSPACE · 2f3ab622
      Jason Wang 提交于
      When link is down, writes to the device might fail with
      -EIO. Userspace needs an indication when the status is resolved.  As a
      fix, tun_net_open() attempts to wake up writers - but that is only
      effective if SOCKWQ_ASYNC_NOSPACE has been set in the past. This is
      not the case of vhost_net which only poll for EPOLLOUT after it meets
      errors during sendmsg().
      
      This patch fixes this by making sure SOCKWQ_ASYNC_NOSPACE is set when
      socket is not writable or device is down to guarantee EPOLLOUT will be
      raised in either tun_chr_poll() or tun_sock_write_space() after device
      is up.
      
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Fixes: 1bd4978a ("tun: honor IFF_UP in tun_get_user()")
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2f3ab622
    • J
      virtio-net: fix leaking page for gso packet during mergeable XDP · 3d62b2a0
      Jason Wang 提交于
      We need to drop refcnt to xdp_page if we see a gso packet. Otherwise
      it will be leaked. Fixing this by moving the check of gso packet above
      the linearizing logic. While at it, remove useless comment as well.
      
      Cc: John Fastabend <john.fastabend@gmail.com>
      Fixes: 72979a6c ("virtio_net: xdp, add slowpath case for non contiguous buffers")
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3d62b2a0
    • J
      virtio-net: correctly check num_buf during err path · 850e088d
      Jason Wang 提交于
      If we successfully linearize the packet, num_buf will be set to zero
      which may confuse error handling path which assumes num_buf is at
      least 1 and this can lead the code tries to pop the descriptor of next
      buffer. Fixing this by checking num_buf against 1 before decreasing.
      
      Fixes: 4941d472 ("virtio-net: do not reset during XDP set")
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      850e088d
    • J
      virtio-net: correctly transmit XDP buff after linearizing · 5d458a13
      Jason Wang 提交于
      We should not go for the error path after successfully transmitting a
      XDP buffer after linearizing. Since the error path may try to pop and
      drop next packet and increase the drop counters. Fixing this by simply
      drop the refcnt of original page and go for xmit path.
      
      Fixes: 72979a6c ("virtio_net: xdp, add slowpath case for non contiguous buffers")
      Cc: John Fastabend <john.fastabend@gmail.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5d458a13
    • J
      virtio-net: correctly redirect linearized packet · 6890418b
      Jason Wang 提交于
      After a linearized packet was redirected by XDP, we should not go for
      the err path which will try to pop buffers for the next packet and
      increase the drop counter. Fixing this by just drop the page refcnt
      for the original page.
      
      Fixes: 186b3c99 ("virtio-net: support XDP_REDIRECT")
      Reported-by: NDavid Ahern <dsahern@gmail.com>
      Tested-by: NDavid Ahern <dsahern@gmail.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6890418b
  3. 23 5月, 2018 5 次提交
    • B
      pcnet32: add an error handling path in pcnet32_probe_pci() · d7db3186
      Bo Chen 提交于
      Make sure to invoke pci_disable_device() when errors occur in
      pcnet32_probe_pci().
      Signed-off-by: NBo Chen <chenbo@pdx.edu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7db3186
    • S
      qed: Fix mask for physical address in ILT entry · fdd13dd3
      Shahed Shaikh 提交于
      ILT entry requires 12 bit right shifted physical address.
      Existing mask for ILT entry of physical address i.e.
      ILT_ENTRY_PHY_ADDR_MASK is not sufficient to handle 64bit
      address because upper 8 bits of 64 bit address were getting
      masked which resulted in completer abort error on
      PCIe bus due to invalid address.
      
      Fix that mask to handle 64bit physical address.
      
      Fixes: fe56b9e6 ("qed: Add module with basic common support")
      Signed-off-by: NShahed Shaikh <shahed.shaikh@cavium.com>
      Signed-off-by: NAriel Elior <ariel.elior@cavium.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fdd13dd3
    • W
      isdn: eicon: fix a missing-check bug · 6009d1fe
      Wenwen Wang 提交于
      In divasmain.c, the function divas_write() firstly invokes the function
      diva_xdi_open_adapter() to open the adapter that matches with the adapter
      number provided by the user, and then invokes the function diva_xdi_write()
      to perform the write operation using the matched adapter. The two functions
      diva_xdi_open_adapter() and diva_xdi_write() are located in diva.c.
      
      In diva_xdi_open_adapter(), the user command is copied to the object 'msg'
      from the userspace pointer 'src' through the function pointer 'cp_fn',
      which eventually calls copy_from_user() to do the copy. Then, the adapter
      number 'msg.adapter' is used to find out a matched adapter from the
      'adapter_queue'. A matched adapter will be returned if it is found.
      Otherwise, NULL is returned to indicate the failure of the verification on
      the adapter number.
      
      As mentioned above, if a matched adapter is returned, the function
      diva_xdi_write() is invoked to perform the write operation. In this
      function, the user command is copied once again from the userspace pointer
      'src', which is the same as the 'src' pointer in diva_xdi_open_adapter() as
      both of them are from the 'buf' pointer in divas_write(). Similarly, the
      copy is achieved through the function pointer 'cp_fn', which finally calls
      copy_from_user(). After the successful copy, the corresponding command
      processing handler of the matched adapter is invoked to perform the write
      operation.
      
      It is obvious that there are two copies here from userspace, one is in
      diva_xdi_open_adapter(), and one is in diva_xdi_write(). Plus, both of
      these two copies share the same source userspace pointer, i.e., the 'buf'
      pointer in divas_write(). Given that a malicious userspace process can race
      to change the content pointed by the 'buf' pointer, this can pose potential
      security issues. For example, in the first copy, the user provides a valid
      adapter number to pass the verification process and a valid adapter can be
      found. Then the user can modify the adapter number to an invalid number.
      This way, the user can bypass the verification process of the adapter
      number and inject inconsistent data.
      
      This patch reuses the data copied in
      diva_xdi_open_adapter() and passes it to diva_xdi_write(). This way, the
      above issues can be avoided.
      Signed-off-by: NWenwen Wang <wang6495@umn.edu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6009d1fe
    • F
      net: fec: Add a SPDX identifier · 1f508124
      Fabio Estevam 提交于
      Currently there is no license information in the header of
      this file.
      
      The MODULE_LICENSE field contains ("GPL"), which means
      GNU Public License v2 or later, so add a corresponding
      SPDX license identifier.
      Signed-off-by: NFabio Estevam <fabio.estevam@nxp.com>
      Acked-by: NFugang Duan <fugang.duan@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1f508124
    • F
      net: fec: ptp: Switch to SPDX identifier · 9fcca5ef
      Fabio Estevam 提交于
      Adopt the SPDX license identifier headers to ease license compliance
      management.
      Signed-off-by: NFabio Estevam <fabio.estevam@nxp.com>
      Acked-by: NFugang Duan <fugang.duan@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9fcca5ef
  4. 22 5月, 2018 1 次提交
  5. 19 5月, 2018 4 次提交
  6. 18 5月, 2018 4 次提交
  7. 17 5月, 2018 10 次提交