1. 26 May, 2018 3 commits
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · d2f30f51
      David S. Miller committed
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-05-24
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix a bug in the original fix to prevent out of bounds speculation when
         multiple tail call maps from different branches or calls end up at the
         same tail call helper invocation, from Daniel.
      
      2) Two selftest fixes, one in reuseport_bpf_numa where test is skipped in
         case of missing numa support and another one to update kernel config to
         properly support xdp_meta.sh test, from Anders.
      
       ...
      
      Would be great if you have a chance to merge net into net-next after that.
      
      The verifier fix would be needed later as a dependency in bpf-next for
      upcoming work there. When you do the merge there's a trivial conflict on
      BPF side with 849fa506 ("bpf/verifier: refine retval R0 state for
      bpf_get_stack helper"): Resolution is to keep both functions, the
      do_refine_retval_range() and record_func_map().
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • S
      selftests/net: Add missing config options for PMTU tests · 24e4b075
      Stefano Brivio committed
      PMTU tests in pmtu.sh need support for VTI, VTI6 and dummy
      interfaces: add them to config file.
      Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Fixes: d1f1b9cb ("selftests: net: Introduce first PMTU test")
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge tag 'batadv-net-for-davem-20180524' of git://git.open-mesh.org/linux-merge · e3ffec48
      David S. Miller committed
      Simon Wunderlich says:
      
      ====================
      Here are some batman-adv bugfixes:
      
       - prevent hardif_put call with NULL parameter, by Colin Ian King
      
       - Avoid race in Translation Table allocator, by Sven Eckelmann
      
       - Fix Translation Table sync flags for intermediate Responses,
         by Linus Luessing
      
       - prevent sending inconsistent Translation Table TVLVs,
         by Marek Lindner
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 25 May, 2018 10 commits
    • Q
      mlx4_core: allocate ICM memory in page size chunks · 1383cb81
      Qing Huang committed
      When a system is under memory pressure (high usage with fragments),
      the original 256KB ICM chunk allocations will likely trigger kernel
      memory management to enter slow path doing memory compact/migration
      ops in order to complete high order memory allocations.
      
      When that happens, user processes calling uverb APIs may get stuck
      for more than 120s easily even though there are a lot of free pages
      in smaller chunks available in the system.
      
      Syslog:
      ...
      Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
      oracle_205573_e:205573 blocked for more than 120 seconds.
      ...
      
      With 4KB ICM chunk size on x86_64 arch, the above issue is fixed.
      
      However in order to support smaller ICM chunk size, we need to fix
      another issue in large size kcalloc allocations.
      
      E.g.
      Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
      size, each ICM chunk can only hold 512 mtt entries (8 bytes for each mtt
      entry). So we need a 16MB allocation for a table->icm pointer array to
      hold 2M pointers which can easily cause kcalloc to fail.
      
      The solution is to use kvzalloc to replace kcalloc which will fall back
      to vmalloc automatically if kmalloc fails.
      Signed-off-by: Qing Huang <qing.huang@oracle.com>
      Acked-by: Daniel Jurgens <danielj@mellanox.com>
      Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
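The arithmetic in the mlx4_core message above (1G MTT entries, 512 entries per 4KB chunk, a 16MB pointer array) can be checked with a small userspace sketch. This is not driver code: the function names are hypothetical, and the 8-byte pointer size is an assumption that matches a 64-bit build.

```c
#include <assert.h>
#include <stdint.h>

/* Number of MTT entries that fit in one ICM chunk. */
static inline uint64_t mtts_per_chunk(uint64_t chunk_bytes, uint64_t mtt_bytes)
{
    assert(mtt_bytes != 0 && chunk_bytes >= mtt_bytes);
    return chunk_bytes / mtt_bytes;
}

/* Size in bytes of the pointer array needed to track every chunk
 * when the table holds 2^log_num_mtt entries. */
static inline uint64_t icm_ptr_array_bytes(unsigned int log_num_mtt,
                                           uint64_t chunk_bytes,
                                           uint64_t mtt_bytes,
                                           uint64_t ptr_bytes)
{
    uint64_t num_mtt = 1ULL << log_num_mtt;
    uint64_t num_chunks = num_mtt / mtts_per_chunk(chunk_bytes, mtt_bytes);

    return num_chunks * ptr_bytes;
}
```

Calling `icm_ptr_array_bytes(30, 4096, 8, 8)` reproduces the 16MB figure from the message, while the same call with a 256KB chunk size yields only 256KB, which is why the pointer array only became a problem once the chunk size shrank.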
    • G
      enic: set DMA mask to 47 bit · 322eaa06
      Govindarajulu Varadarajan committed
      In commit 624dbf55 ("driver/net: enic: Try DMA 64 first, then
      failover to DMA") DMA mask was changed from 40 bits to 64 bits.
      Hardware actually supports only 47 bits.
      
      Fixes: 624dbf55 ("driver/net: enic: Try DMA 64 first, then failover to DMA")
      Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
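To make the 47-bit limit in the enic message concrete, here is a minimal userspace sketch of what an n-bit DMA mask allows. The helper names are hypothetical; the mask expression follows the same shape as the kernel's `DMA_BIT_MASK(n)` macro, but nothing here is taken from the enic driver itself.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low n bits set, for n in 1..64. */
static inline uint64_t dma_bit_mask(unsigned int n)
{
    assert(n >= 1 && n <= 64);
    return n == 64 ? ~0ULL : (1ULL << n) - 1;
}

/* Returns 1 if a bus address is reachable under an n-bit DMA mask. */
static inline int dma_addr_fits(uint64_t addr, unsigned int n)
{
    return (addr & ~dma_bit_mask(n)) == 0;
}
```

With a 64-bit mask an address such as `1ULL << 47` is accepted, although hardware limited to 47 address bits cannot reach it; with a 47-bit mask the same address is rejected.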
    • E
      ppp: remove the PPPIOCDETACH ioctl · af8d3c7c
      Eric Biggers committed
      The PPPIOCDETACH ioctl effectively tries to "close" the given ppp file
      before f_count has reached 0, which is fundamentally a bad idea.  It
      does check 'f_count < 2', which excludes concurrent operations on the
      file since they would only be possible with a shared fd table, in which
      case each fdget() would take a file reference.  However, it fails to
      account for the fact that even with 'f_count == 1' the file can still be
      linked into epoll instances.  As reported by syzbot, this can trivially
      be used to cause a use-after-free.
      
      Yet, the only known user of PPPIOCDETACH is pppd versions older than
      ppp-2.4.2, which was released almost 15 years ago (November 2003).
      Also, PPPIOCDETACH apparently stopped working reliably at around the
      same time, when the f_count check was added to the kernel, e.g. see
      https://lkml.org/lkml/2002/12/31/83.  Also, the current 'f_count < 2'
      check makes PPPIOCDETACH only work in single-threaded applications; it
      always fails if called from a multithreaded application.
      
      All pppd versions released in the last 15 years just close() the file
      descriptor instead.
      
      Therefore, instead of hacking around this bug by exporting epoll
      internals to modules, and probably missing other related bugs, just
      remove the PPPIOCDETACH ioctl and see if anyone actually notices.  Leave
      a stub in place that prints a one-time warning and returns EINVAL.
      
      Reported-by: syzbot+16363c99d4134717c05b@syzkaller.appspotmail.com
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Acked-by: Paul Mackerras <paulus@ozlabs.org>
      Reviewed-by: Guillaume Nault <g.nault@alphalink.fr>
      Tested-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • W
      ipv4: remove warning in ip_recv_error · 730c54d5
      Willem de Bruijn committed
      A precondition check in ip_recv_error triggered on an otherwise benign
      race. Remove the warning.
      
      The warning triggers when passing an ipv6 socket to this ipv4 error
      handling function. RaceFuzzer was able to trigger it due to a race
      in setsockopt IPV6_ADDRFORM.
      
        ---
        CPU0
          do_ipv6_setsockopt
            sk->sk_socket->ops = &inet_dgram_ops;
      
        ---
        CPU1
          sk->sk_prot->recvmsg
            udp_recvmsg
              ip_recv_error
                WARN_ON_ONCE(sk->sk_family == AF_INET6);
      
        ---
        CPU0
          do_ipv6_setsockopt
            sk->sk_family = PF_INET;
      
      This socket option converts a v6 socket that is connected to a v4 peer
      to a v4 socket. It updates the socket on the fly, changing fields in
      sk as well as other structs. This is inherently non-atomic. It races
      with the lockless udp_recvmsg path.
      
      No other code makes an assumption that these fields are updated
      atomically. It is benign here, too, as ip_recv_error cares only about
      the protocol of the skbs enqueued on the error queue, for which
      sk_family is not a precise predictor (thanks to another issue with
      IPV6_ADDRFORM).
      
      Link: http://lkml.kernel.org/r/20180518120826.GA19515@dragonet.kaist.ac.kr
      Fixes: 7ce875e5 ("ipv4: warn once on passing AF_INET6 socket to ip_recv_error")
      Reported-by: DaeRyong Jeong <threeearcat@gmail.com>
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • O
      net : sched: cls_api: deal with egdev path only if needed · f8f4bef3
      Or Gerlitz committed
      When dealing with ingress rule on a netdev, if we did fine through the
      conventional path, there's no need to continue into the egdev route,
      and we can stop right there.
      
      Not doing so may cause a 2nd rule to be added by the cls api layer
      with the ingress being the egdev.
      
      For example, under sriov switchdev scheme, a user rule of VFR A --> VFR B
      will end up with two HW rules (1) VF A --> VF B and (2) uplink --> VF B
      
      Fixes: 208c0f4b ('net: sched: use tc_setup_cb_call to call per-block callbacks')
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      vhost: synchronize IOTLB message with dev cleanup · 1b15ad68
      Jason Wang committed
      DaeRyong Jeong reports a race between vhost_dev_cleanup() and
      vhost_process_iotlb_msg():
      
      Thread interleaving:
      CPU0 (vhost_process_iotlb_msg)			CPU1 (vhost_dev_cleanup)
      (In the case of both VHOST_IOTLB_UPDATE and
      VHOST_IOTLB_INVALIDATE)
      
      =====						=====
      						vhost_umem_clean(dev->iotlb);
      if (!dev->iotlb) {
      	        ret = -EFAULT;
      		        break;
      }
      						dev->iotlb = NULL;
      
      The reason is we don't synchronize between them, fixing by protecting
      vhost_process_iotlb_msg() with dev mutex.
      Reported-by: DaeRyong Jeong <threeearcat@gmail.com>
      Fixes: 6b1e6cc7 ("vhost: new device IOTLB API")
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge tag 'mlx5-fixes-2018-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · d681bc02
      David S. Miller committed
      Saeed Mahameed says:
      
      ====================
      Mellanox, mlx5 fixes 2018-05-24
      
      This series includes two mlx5 fixes.
      
      1) add FCS data to checksum complete when required, from Eran Ben
      Elisha.
      
      2) Fix a race in IPSec sandbox QP commands, from Yossi Kuperman.
      
      Please pull and let me know if there's any problem.
      
      for -stable v4.15
      ("net/mlx5e: When RXFCS is set, add FCS data into checksum calculation")
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • W
      packet: fix reserve calculation · 9aad13b0
      Willem de Bruijn committed
      Commit b84bbaf7 ("packet: in packet_snd start writing at link
      layer allocation") ensures that packet_snd always starts writing
      the link layer header in reserved headroom allocated for this
      purpose.
      
      This is needed because packets may be shorter than hard_header_len,
      in which case the space up to hard_header_len may be zeroed. But
      that necessary padding is not accounted for in skb->len.
      
      The fix, however, is buggy. It calls skb_push, which grows skb->len
      when moving skb->data back. But in this case packet length should not
      change.
      
      Instead, call skb_reserve, which moves both skb->data and skb->tail
      back, without changing length.
      
      Fixes: b84bbaf7 ("packet: in packet_snd start writing at link layer allocation")
      Reported-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
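The difference between the two calls named in the packet message above can be modelled in userspace. This is a toy sketch, not the kernel's `struct sk_buff`: `toy_skb`, `toy_push` and `toy_reserve` are hypothetical names, and the buffer is reduced to three offsets. It assumes, as the message states, that a push grows the length while a reserve shifts data and tail together.

```c
#include <assert.h>
#include <stddef.h>

/* Toy buffer: offsets into a flat allocation, not the real struct sk_buff. */
struct toy_skb {
    size_t data;  /* offset of the first payload byte */
    size_t tail;  /* offset one past the last payload byte */
    size_t len;   /* bytes of payload accounted for */
};

/* Push-style: move data back by n bytes and grow len by n. */
static inline void toy_push(struct toy_skb *skb, size_t n)
{
    assert(skb->data >= n);
    skb->data -= n;
    skb->len += n;
}

/* Reserve-style: shift data and tail by a signed delta, len unchanged. */
static inline void toy_reserve(struct toy_skb *skb, long delta)
{
    assert(delta >= 0 || skb->data >= (size_t)(-delta));
    skb->data = (size_t)((long)skb->data + delta);
    skb->tail = (size_t)((long)skb->tail + delta);
}
```

Starting from an empty buffer with 14 bytes of headroom, `toy_push(&skb, 4)` leaves `len == 4`, whereas `toy_reserve(&skb, -4)` moves the write position back by the same amount and leaves `len == 0`, which is the behaviour the fix relies on.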
    • Y
      net/mlx5: IPSec, Fix a race between concurrent sandbox QP commands · 1dcbc01f
      Yossi Kuperman committed
      Sandbox QP Commands are retired in the order they are sent. Outstanding
      commands are stored in a linked-list in the order they appear. Once a
      response is received and the callback gets called, we pull the first
      element off the pending list, assuming they correspond.
      
      Sending a message and adding it to the pending list is not done atomically,
      hence there is an opportunity for a race between concurrent requests.
      
      Bind both send and add under a critical section.
      
      Fixes: bebb23e6 ("net/mlx5: Accel, Add IPSec acceleration interface")
      Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
      Signed-off-by: Adi Nissim <adin@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • E
      net/mlx5e: When RXFCS is set, add FCS data into checksum calculation · 902a5459
      Eran Ben Elisha committed
      When RXFCS feature is enabled, the HW does not strip the FCS data,
      however it is not present in the checksum calculated by the HW.
      
      Fix that by manually calculating the FCS checksum and adding it to the SKB
      checksum field.
      
      Add helper function to find the FCS data for all SKB forms (linear,
      one fragment or more).
      
      Fixes: 102722fc ("net/mlx5e: Add support for RXFCS feature flag")
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
  3. 24 May, 2018 14 commits
    • D
      bpf: properly enforce index mask to prevent out-of-bounds speculation · c93552c4
      Daniel Borkmann committed
      While reviewing the verifier code, I recently noticed that the
      following two program variants in relation to tail calls can be
      loaded.
      
      Variant 1:
      
        # bpftool p d x i 15
          0: (15) if r1 == 0x0 goto pc+3
          1: (18) r2 = map[id:5]
          3: (05) goto pc+2
          4: (18) r2 = map[id:6]
          6: (b7) r3 = 7
          7: (35) if r3 >= 0xa0 goto pc+2
          8: (54) (u32) r3 &= (u32) 255
          9: (85) call bpf_tail_call#12
         10: (b7) r0 = 1
         11: (95) exit
      
        # bpftool m s i 5
          5: prog_array  flags 0x0
              key 4B  value 4B  max_entries 4  memlock 4096B
        # bpftool m s i 6
          6: prog_array  flags 0x0
              key 4B  value 4B  max_entries 160  memlock 4096B
      
      Variant 2:
      
        # bpftool p d x i 20
          0: (15) if r1 == 0x0 goto pc+3
          1: (18) r2 = map[id:8]
          3: (05) goto pc+2
          4: (18) r2 = map[id:7]
          6: (b7) r3 = 7
          7: (35) if r3 >= 0x4 goto pc+2
          8: (54) (u32) r3 &= (u32) 3
          9: (85) call bpf_tail_call#12
         10: (b7) r0 = 1
         11: (95) exit
      
        # bpftool m s i 8
          8: prog_array  flags 0x0
              key 4B  value 4B  max_entries 160  memlock 4096B
        # bpftool m s i 7
          7: prog_array  flags 0x0
              key 4B  value 4B  max_entries 4  memlock 4096B
      
      In both cases the index masking inserted by the verifier in order
      to control out of bounds speculation from a CPU via b2157399
      ("bpf: prevent out-of-bounds speculation") seems to be incorrect
      in what it is enforcing. In the 1st variant, the mask is applied
      from the map with the significantly larger number of entries where
      we would allow to a certain degree out of bounds speculation for
      the smaller map, and in the 2nd variant where the mask is applied
      from the map with the smaller number of entries, we get buggy
      behavior since we truncate the index of the larger map.
      
      The original intent from commit b2157399 is to reject such
      occasions where two or more different tail call maps are used
      in the same tail call helper invocation. However, the check on
      the BPF_MAP_PTR_POISON is never hit since we never poisoned the
      saved pointer in the first place! We do this explicitly for map
      lookups but in case of tail calls we basically used the tail
      call map in insn_aux_data that was processed in the most recent
      path which the verifier walked. Thus any prior path that stored
      a pointer in insn_aux_data at the helper location was always
      overridden.
      
      Fix it by moving the map pointer poison logic into a small helper
      that covers both BPF helpers with the same logic. After that in
      fixup_bpf_calls() the poison check is then hit for tail calls
      and the program rejected. Latter only happens in unprivileged
      case since this is the *only* occasion where a rewrite needs to
      happen, and where such rewrite is specific to the map (max_entries,
      index_mask). In the privileged case the rewrite is generic for
      the insn->imm / insn->code update so multiple maps from different
      paths can be handled just fine since all the remaining logic
      happens in the instruction processing itself. This is similar
      to the case of map lookups: in case there is a collision of
      maps in fixup_bpf_calls() we must skip the inlined rewrite since
      this will turn the generic instruction sequence into a non-
      generic one. Thus the patch_call_imm will simply update the
      insn->imm location where the bpf_map_lookup_elem() will later
      take care of the dispatch. Given we need this 'poison' state
      as a check, the information of whether a map is an unpriv_array
      gets lost, so enforcing it prior to that needs an additional
      state. In general this check is needed since there are some
      complex and tail call intensive BPF programs out there where
      LLVM tends to generate such code occasionally. We therefore
      convert the map_ptr rather into map_state to store all this
      w/o extra memory overhead, and the bit whether one of the maps
      involved in the collision was from an unpriv_array thus needs
      to be retained as well there.
      
      Fixes: b2157399 ("bpf: prevent out-of-bounds speculation")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
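The two variants in the bpf message above come down to applying one map's index mask to another map. The sketch below is a userspace illustration, not verifier code: the function names are hypothetical, and the mask rule (the smallest 2^k - 1 covering all valid indices) is an assumption inferred from the dumps, where 160 entries pair with mask 255 and 4 entries pair with mask 3.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest (2^k - 1) that covers indices 0..max_entries-1. */
static inline uint32_t index_mask_for(uint32_t max_entries)
{
    uint32_t mask = 0;

    assert(max_entries > 0);
    while (mask < max_entries - 1)
        mask = (mask << 1) | 1;
    return mask;
}

/* Returns 1 if (idx & mask) can still land outside a map
 * that has max_entries slots. */
static inline int masked_index_out_of_bounds(uint32_t idx, uint32_t mask,
                                             uint32_t max_entries)
{
    return (idx & mask) >= max_entries;
}
```

In the first variant, index 7 masked with 255 stays 7, which is outside the 4-entry map. In the second variant, index 100 masked with 3 becomes 0, so a valid slot of the 160-entry map is silently replaced by a different one.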
    • J
      net/mlx4: Fix irq-unsafe spinlock usage · d546b67c
      Jack Morgenstein committed
      spin_lock/unlock was used instead of spin_un/lock_irq
      in a procedure used in process space, on a spinlock
      which can be grabbed in an interrupt.
      
      This caused the stack trace below to be displayed (on kernel
      4.17.0-rc1 compiled with Lock Debugging enabled):
      
      [  154.661474] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
      [  154.668909] 4.17.0-rc1-rdma_rc_mlx+ #3 Tainted: G          I
      [  154.675856] -----------------------------------------------------
      [  154.682706] modprobe/10159 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
      [  154.690254] 00000000f3b0e495 (&(&qp_table->lock)->rlock){+.+.}, at: mlx4_qp_remove+0x20/0x50 [mlx4_core]
      [  154.700927]
      and this task is already holding:
      [  154.707461] 0000000094373b5d (&(&cq->lock)->rlock/1){....}, at: destroy_qp_common+0x111/0x560 [mlx4_ib]
      [  154.718028] which would create a new lock dependency:
      [  154.723705]  (&(&cq->lock)->rlock/1){....} -> (&(&qp_table->lock)->rlock){+.+.}
      [  154.731922]
      but this new dependency connects a SOFTIRQ-irq-safe lock:
      [  154.740798]  (&(&cq->lock)->rlock){..-.}
      [  154.740800]
      ... which became SOFTIRQ-irq-safe at:
      [  154.752163]   _raw_spin_lock_irqsave+0x3e/0x50
      [  154.757163]   mlx4_ib_poll_cq+0x36/0x900 [mlx4_ib]
      [  154.762554]   ipoib_tx_poll+0x4a/0xf0 [ib_ipoib]
      ...
      to a SOFTIRQ-irq-unsafe lock:
      [  154.815603]  (&(&qp_table->lock)->rlock){+.+.}
      [  154.815604]
      ... which became SOFTIRQ-irq-unsafe at:
      [  154.827718] ...
      [  154.827720]   _raw_spin_lock+0x35/0x50
      [  154.833912]   mlx4_qp_lookup+0x1e/0x50 [mlx4_core]
      [  154.839302]   mlx4_flow_attach+0x3f/0x3d0 [mlx4_core]
      
      Since mlx4_qp_lookup() is called only in process space, we can
      simply replace the spin_un/lock calls with spin_un/lock_irq calls.
      
      Fixes: 6dc06c08 ("net/mlx4: Fix the check in attaching steering rules")
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • F
      net: phy: broadcom: Fix bcm_write_exp() · 79fb218d
      Florian Fainelli committed
      On newer PHYs, we need to select the expansion register to write with
      setting bits [11:8] to 0xf. This was done correctly by bcm7xxx.c prior
      to being migrated to generic code under bcm-phy-lib.c which
      unfortunately used the older implementation from the BCM54xx days.
      
      Fix this by creating an inline stub: bcm_write_exp_sel() which adds the
      correct value (MII_BCM54XX_EXP_SEL_ER) and update both the Cygnus PHY
      and BCM7xxx PHY drivers which require setting these bits.
      
      broadcom.c is unchanged because some PHYs even use a different selector
      method, so let them specify it directly (e.g: SerDes secondary selector).
      
      Fixes: a1cba561 ("net: phy: Add Broadcom phy library for common interfaces")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
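The selector described in the bcm_write_exp() message above is a plain bit operation. The following sketch only illustrates that step: the macro name comes from the message, its value is derived from the statement that bits [11:8] are set to 0xf, and `exp_sel` is a hypothetical helper rather than the driver's `bcm_write_exp_sel()`.

```c
#include <assert.h>
#include <stdint.h>

/* Bits [11:8] = 0xf, as described in the commit message. */
#define MII_BCM54XX_EXP_SEL_ER 0x0f00

/* Compose the expansion register selector that the newer PHYs expect. */
static inline uint16_t exp_sel(uint16_t reg)
{
    /* The selector bits must be clear in the raw register number. */
    assert((reg & MII_BCM54XX_EXP_SEL_ER) == 0);
    return (uint16_t)(reg | MII_BCM54XX_EXP_SEL_ER);
}
```

For example, register number 0x34 becomes 0x0f34 once the selector bits are added.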
    • F
      net: phy: broadcom: Fix auxiliary control register reads · 733a969a
      Florian Fainelli committed
      We are currently doing auxiliary control register reads with the shadow
      register value 0b111 (0x7) which incidentally is also the selector value
      that should be present in bits [2:0]. Fix this by using the appropriate
      selector mask which is defined (MII_BCM54XX_AUXCTL_SHDWSEL_MASK).
      
      This does not have a functional impact yet because we always access the
      MII_BCM54XX_AUXCTL_SHDWSEL_MISC (0x7) register in the current code.
      This might change at some point though.
      
      Fixes: 5b4e2900 ("net: phy: broadcom: add bcm54xx_auxctl_read")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • R
    • C
      net/mlx4: fix spelling mistake: "Inrerface" -> "Interface" and rephrase message · 4f7f56b6
      Colin Ian King committed
      Trivial fix to spelling mistake in mlx4_dbg debug message and also
      change the phrasing of the message so that it is more readable.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • N
      ibmvnic: Only do H_EOI for mobility events · 73f9d364
      Nathan Fontenot committed
      When enabling the sub-CRQ IRQ a previous update sent a H_EOI prior
      to the enablement to clear any pending interrupts that may be present
      across a partition migration. This fixed a firmware bug where a
      migration could erroneously indicate that a H_EOI was pending.
      
      The H_EOI should only be sent when enabling during a mobility
      event though. Doing so at other times could be wrong and can produce
      extra driver output when IRQs are enabled when doing TX completion.
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge tag 'wireless-drivers-for-davem-2018-05-22' of... · ab1f1786
      David S. Miller committed
      Merge tag 'wireless-drivers-for-davem-2018-05-22' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
      
      Kalle Valo says:
      
      ====================
      wireless-drivers fixes for 4.17
      
      Hopefully the last fixes for 4.17. ssb is again causing problems so we
      had to revert a commit and fix it better. Also a small fix to bcma and
      some MAINTAINERS file updates.
      
      ssb
      
      * fix regression with all module PCI cards, for example using b43 and
        b44 drivers
      
      * try again fixing a MIPS linker error
      
      bcma
      
      * fix truncated info log messages
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      tuntap: correctly set SOCKWQ_ASYNC_NOSPACE · 2f3ab622
      Jason Wang committed
      When link is down, writes to the device might fail with
      -EIO. Userspace needs an indication when the status is resolved.  As a
      fix, tun_net_open() attempts to wake up writers - but that is only
      effective if SOCKWQ_ASYNC_NOSPACE has been set in the past. This is
      not the case for vhost_net, which only polls for EPOLLOUT after it meets
      errors during sendmsg().
      
      This patch fixes this by making sure SOCKWQ_ASYNC_NOSPACE is set when
      socket is not writable or device is down to guarantee EPOLLOUT will be
      raised in either tun_chr_poll() or tun_sock_write_space() after device
      is up.
      
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Fixes: 1bd4978a ("tun: honor IFF_UP in tun_get_user()")
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge branch 'virtio_net-mergeable-XDP' · a43ad59a
      David S. Miller committed
      Jason Wang says:
      
      ====================
      Fix several issues of virtio-net mergeable XDP
      
      Please review the patches that try to fix several issues of
      virtio-net mergeable XDP.
      
      Changes from V1:
      - check against 1 before decreasing instead of resetting to 1
      - typo fixes
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      virtio-net: fix leaking page for gso packet during mergeable XDP · 3d62b2a0
      Jason Wang committed
      We need to drop the refcnt to xdp_page if we see a gso packet. Otherwise
      it will be leaked. Fix this by moving the check for a gso packet above
      the linearizing logic. While at it, remove a useless comment as well.
      
      Cc: John Fastabend <john.fastabend@gmail.com>
      Fixes: 72979a6c ("virtio_net: xdp, add slowpath case for non contiguous buffers")
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      virtio-net: correctly check num_buf during err path · 850e088d
      Jason Wang committed
      If we successfully linearize the packet, num_buf will be set to zero,
      which may confuse the error handling path, which assumes num_buf is at
      least 1. This can lead the code to try to pop the descriptor of the next
      buffer. Fix this by checking num_buf against 1 before decreasing.
      
      Fixes: 4941d472 ("virtio-net: do not reset during XDP set")
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
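The num_buf fix above is about the order of the test and the decrement. The sketch below is a userspace illustration with hypothetical function names, not the virtio-net code; it assumes an unsigned 16-bit counter, and the `cap` argument exists only to keep the faulty variant from looping tens of thousands of times.

```c
#include <assert.h>

/* Count the extra buffers an error path would pop: compare against 1
 * first, then decrease. A counter of 0 or 1 pops nothing. */
static inline unsigned int buffers_to_pop_checked(unsigned short num_buf)
{
    unsigned int popped = 0;

    while (num_buf > 1) {
        num_buf--;
        popped++;
    }
    return popped;
}

/* Decrease-then-test variant: when num_buf is already 0 the counter
 * wraps to 65535, so the loop keeps popping until the cap stops it. */
static inline unsigned int buffers_to_pop_unchecked(unsigned short num_buf,
                                                    unsigned int cap)
{
    unsigned int popped = 0;

    while (--num_buf != 0 && popped < cap)
        popped++;
    return popped;
}
```

Both variants agree for a counter of 3 (two buffers popped), but for a counter of 0 the checked version pops nothing while the unchecked one runs into buffers that belong to the next packet.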
    • J
      virtio-net: correctly transmit XDP buff after linearizing · 5d458a13
      Jason Wang committed
      We should not go for the error path after successfully transmitting an
      XDP buffer after linearizing, since the error path may try to pop and
      drop the next packet and increase the drop counters. Fix this by simply
      dropping the refcnt of the original page and going for the xmit path.
      
      Fixes: 72979a6c ("virtio_net: xdp, add slowpath case for non contiguous buffers")
      Cc: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      virtio-net: correctly redirect linearized packet · 6890418b
      Jason Wang committed
      After a linearized packet was redirected by XDP, we should not go for
      the err path, which will try to pop buffers for the next packet and
      increase the drop counter. Fix this by just dropping the page refcnt
      for the original page.
      
      Fixes: 186b3c99 ("virtio-net: support XDP_REDIRECT")
      Reported-by: David Ahern <dsahern@gmail.com>
      Tested-by: David Ahern <dsahern@gmail.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 23 May, 2018 10 commits
    • D
      Merge tag 'mac80211-for-davem-2018-05-23' of... · 419fc888
      David S. Miller committed
      Merge tag 'mac80211-for-davem-2018-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg says:
      
      ====================
      A handful of fixes:
       * hwsim radio dump wasn't working for the first radio
       * mesh was updating statistics incorrectly
       * a netlink message allocation was possibly too short
       * wiphy name limit was still too long
       * in certain cases regdb query could find a NULL pointer
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      selftests: net: reuseport_bpf_numa: don't fail if no numa support · 1a2b80ec
      Anders Roxell committed
      The reuseport_bpf_numa test case fails if there's no NUMA support.  The
      test shouldn't fail if there's no support; it should be skipped.
      
      Fixes: 3c2c3c16 ("reuseport, bpf: add test case for bpf_get_numa_node_id")
      Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • B
      pcnet32: add an error handling path in pcnet32_probe_pci() · d7db3186
      Bo Chen committed
      Make sure to invoke pci_disable_device() when errors occur in
      pcnet32_probe_pci().
      Signed-off-by: Bo Chen <chenbo@pdx.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • S
      qed: Fix mask for physical address in ILT entry · fdd13dd3
      Shahed Shaikh committed
      An ILT entry requires the physical address right-shifted by 12 bits.
      The existing mask for the physical address in an ILT entry,
      ILT_ENTRY_PHY_ADDR_MASK, is not sufficient to handle a 64-bit
      address: the upper 8 bits of the 64-bit address were masked off,
      which resulted in a completer abort error on the PCIe bus due to
      an invalid address.
      
      Fix the mask to handle 64-bit physical addresses.
      
      Fixes: fe56b9e6 ("qed: Add module with basic common support")
      Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com>
      Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
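The effect described in the qed message above can be reproduced with two masks. This is an illustrative userspace sketch, not the driver's definitions: the macro and function names are hypothetical, and the mask widths are derived from the message (a 12-bit shift leaves 52 significant bits, and losing the upper 8 address bits leaves 44).

```c
#include <assert.h>
#include <stdint.h>

#define ILT_SHIFT       12
#define NARROW_MASK_44  ((1ULL << 44) - 1)      /* drops the upper 8 address bits */
#define FULL_MASK_52    (~0ULL >> ILT_SHIFT)    /* keeps every shifted address bit */

/* Shift the physical address into ILT form and apply the given mask. */
static inline uint64_t ilt_entry_addr(uint64_t phys, uint64_t mask)
{
    return (phys >> ILT_SHIFT) & mask;
}
```

An address below 2^56 produces the same entry with either mask, while an address with any of its top 8 bits set is truncated by the narrow mask and preserved by the full one.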
    • E
      ipmr: properly check rhltable_init() return value · 66fb3325
      Eric Dumazet committed
      commit 8fb472c0 ("ipmr: improve hash scalability")
      added a call to rhltable_init() without checking its return value.
      
      This problem was then later copied to IPv6 and factorized in commit
      0bbbf0e7 ("ipmr, ip6mr: Unite creation of new mr_table")
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 1 PID: 31552 Comm: syz-executor7 Not tainted 4.17.0-rc5+ #60
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:rht_key_hashfn include/linux/rhashtable.h:277 [inline]
      RIP: 0010:__rhashtable_lookup include/linux/rhashtable.h:630 [inline]
      RIP: 0010:rhltable_lookup include/linux/rhashtable.h:716 [inline]
      RIP: 0010:mr_mfc_find_parent+0x2ad/0xbb0 net/ipv4/ipmr_base.c:63
      RSP: 0018:ffff8801826aef70 EFLAGS: 00010203
      RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffc90001ea0000
      RDX: 0000000000000079 RSI: ffffffff8661e859 RDI: 000000000000000c
      RBP: ffff8801826af1c0 R08: ffff8801b2212000 R09: ffffed003b5e46c2
      R10: ffffed003b5e46c2 R11: ffff8801daf23613 R12: dffffc0000000000
      R13: ffff8801826af198 R14: ffff8801cf8225c0 R15: ffff8801826af658
      FS:  00007ff7fa732700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000003ffffff9c CR3: 00000001b0210000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       ip6mr_cache_find_parent net/ipv6/ip6mr.c:981 [inline]
       ip6mr_mfc_delete+0x1fe/0x6b0 net/ipv6/ip6mr.c:1221
       ip6_mroute_setsockopt+0x15c6/0x1d70 net/ipv6/ip6mr.c:1698
       do_ipv6_setsockopt.isra.9+0x422/0x4660 net/ipv6/ipv6_sockglue.c:163
       ipv6_setsockopt+0xbd/0x170 net/ipv6/ipv6_sockglue.c:922
       rawv6_setsockopt+0x59/0x140 net/ipv6/raw.c:1060
       sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3039
       __sys_setsockopt+0x1bd/0x390 net/socket.c:1903
       __do_sys_setsockopt net/socket.c:1914 [inline]
       __se_sys_setsockopt net/socket.c:1911 [inline]
       __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1911
       do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: 8fb472c0 ("ipmr: improve hash scalability")
      Fixes: 0bbbf0e7 ("ipmr, ip6mr: Unite creation of new mr_table")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Cc: Yuval Mintz <yuvalm@mellanox.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      66fb3325
    • A
      dccp: don't free ccid2_hc_tx_sock struct in dccp_disconnect() · 2677d206
      Alexey Kodanev committed
      Syzbot reported a use-after-free in timer_is_static_object() [1].
      
      This can happen because the structure for the rto timer (ccid2_hc_tx_sock)
      is removed in dccp_disconnect(), and ccid2_hc_tx_rto_expire() can be
      called after that.
      
      The report [1] is similar to the one in commit 120e9dab ("dccp:
      defer ccid_hc_tx_delete() at dismantle time"), and the fix is the same:
      delay freeing the ccid2_hc_tx_sock structure so that it is freed in
      dccp_sk_destruct().
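
The lifetime rule can be sketched as follows, under the assumption that a timer callback may still run after disconnect. The types and functions (`fake_sock`, `fake_ccid`, `fake_disconnect()`, `fake_rto_expire()`, `fake_sk_destruct()`) are hypothetical and only model who is allowed to free the timer's state.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the per-socket CCID state and the socket. */
struct fake_ccid {
	int rto_pending;	/* a retransmit timer may still fire */
};

struct fake_sock {
	int connected;
	struct fake_ccid *tx_ccid;
};

/* Disconnect resets the connection but keeps tx_ccid allocated. */
static void fake_disconnect(struct fake_sock *sk)
{
	sk->connected = 0;
	/* no free(sk->tx_ccid) here: the timer callback may still run */
}

/* Timer callback: dereferences tx_ccid, so it must still be valid. */
static int fake_rto_expire(struct fake_sock *sk)
{
	return sk->tx_ccid->rto_pending;
}

/* Final teardown: no timer can run any more, so freeing is safe. */
static void fake_sk_destruct(struct fake_sock *sk)
{
	free(sk->tx_ccid);
	sk->tx_ccid = NULL;
}
```

Freeing in the disconnect path instead would leave `fake_rto_expire()` reading freed memory, which is the class of bug reported in [1].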
      
      [1]
      
      ==================================================================
      BUG: KASAN: use-after-free in timer_is_static_object+0x80/0x90
      kernel/time/timer.c:607
      Read of size 8 at addr ffff8801bebb5118 by task syz-executor2/25299
      
      CPU: 1 PID: 25299 Comm: syz-executor2 Not tainted 4.17.0-rc5+ #54
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
        <IRQ>
        __dump_stack lib/dump_stack.c:77 [inline]
        dump_stack+0x1b9/0x294 lib/dump_stack.c:113
        print_address_description+0x6c/0x20b mm/kasan/report.c:256
        kasan_report_error mm/kasan/report.c:354 [inline]
        kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
        __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
        timer_is_static_object+0x80/0x90 kernel/time/timer.c:607
        debug_object_activate+0x2d9/0x670 lib/debugobjects.c:508
        debug_timer_activate kernel/time/timer.c:709 [inline]
        debug_activate kernel/time/timer.c:764 [inline]
        __mod_timer kernel/time/timer.c:1041 [inline]
        mod_timer+0x4d3/0x13b0 kernel/time/timer.c:1102
        sk_reset_timer+0x22/0x60 net/core/sock.c:2742
        ccid2_hc_tx_rto_expire+0x587/0x680 net/dccp/ccids/ccid2.c:147
        call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
        expire_timers kernel/time/timer.c:1363 [inline]
        __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
        run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
        __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
        invoke_softirq kernel/softirq.c:365 [inline]
        irq_exit+0x1d1/0x200 kernel/softirq.c:405
        exiting_irq arch/x86/include/asm/apic.h:525 [inline]
        smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
        apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
        </IRQ>
      ...
      Allocated by task 25374:
        save_stack+0x43/0xd0 mm/kasan/kasan.c:448
        set_track mm/kasan/kasan.c:460 [inline]
        kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
        kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
        kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
        ccid_new+0x25b/0x3e0 net/dccp/ccid.c:151
        dccp_hdlr_ccid+0x27/0x150 net/dccp/feat.c:44
        __dccp_feat_activate+0x184/0x270 net/dccp/feat.c:344
        dccp_feat_activate_values+0x3a7/0x819 net/dccp/feat.c:1538
        dccp_create_openreq_child+0x472/0x610 net/dccp/minisocks.c:128
        dccp_v4_request_recv_sock+0x12c/0xca0 net/dccp/ipv4.c:408
        dccp_v6_request_recv_sock+0x125d/0x1f10 net/dccp/ipv6.c:415
        dccp_check_req+0x455/0x6a0 net/dccp/minisocks.c:197
        dccp_v4_rcv+0x7b8/0x1f3f net/dccp/ipv4.c:841
        ip_local_deliver_finish+0x2e3/0xd80 net/ipv4/ip_input.c:215
        NF_HOOK include/linux/netfilter.h:288 [inline]
        ip_local_deliver+0x1e1/0x720 net/ipv4/ip_input.c:256
        dst_input include/net/dst.h:450 [inline]
        ip_rcv_finish+0x81b/0x2200 net/ipv4/ip_input.c:396
        NF_HOOK include/linux/netfilter.h:288 [inline]
        ip_rcv+0xb70/0x143d net/ipv4/ip_input.c:492
        __netif_receive_skb_core+0x26f5/0x3630 net/core/dev.c:4592
        __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4657
        process_backlog+0x219/0x760 net/core/dev.c:5337
        napi_poll net/core/dev.c:5735 [inline]
        net_rx_action+0x7b7/0x1930 net/core/dev.c:5801
        __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
      
      Freed by task 25374:
        save_stack+0x43/0xd0 mm/kasan/kasan.c:448
        set_track mm/kasan/kasan.c:460 [inline]
        __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
        kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
        __cache_free mm/slab.c:3498 [inline]
        kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
        ccid_hc_tx_delete+0xc3/0x100 net/dccp/ccid.c:190
        dccp_disconnect+0x130/0xc66 net/dccp/proto.c:286
        dccp_close+0x3bc/0xe60 net/dccp/proto.c:1045
        inet_release+0x104/0x1f0 net/ipv4/af_inet.c:427
        inet6_release+0x50/0x70 net/ipv6/af_inet6.c:460
        sock_release+0x96/0x1b0 net/socket.c:594
        sock_close+0x16/0x20 net/socket.c:1149
        __fput+0x34d/0x890 fs/file_table.c:209
        ____fput+0x15/0x20 fs/file_table.c:243
        task_work_run+0x1e4/0x290 kernel/task_work.c:113
        tracehook_notify_resume include/linux/tracehook.h:191 [inline]
        exit_to_usermode_loop+0x2bd/0x310 arch/x86/entry/common.c:166
        prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
        syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
        do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:290
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8801bebb4cc0
        which belongs to the cache ccid2_hc_tx_sock of size 1240
      The buggy address is located 1112 bytes inside of
        1240-byte region [ffff8801bebb4cc0, ffff8801bebb5198)
      The buggy address belongs to the page:
      page:ffffea0006faed00 count:1 mapcount:0 mapping:ffff8801bebb41c0
      index:0xffff8801bebb5240 compound_mapcount: 0
      flags: 0x2fffc0000008100(slab|head)
      raw: 02fffc0000008100 ffff8801bebb41c0 ffff8801bebb5240 0000000100000003
      raw: ffff8801cdba3138 ffffea0007634120 ffff8801cdbaab40 0000000000000000
      page dumped because: kasan: bad access detected
      ...
      ==================================================================
      
      Reported-by: syzbot+5d47e9ec91a6f15dbd6f@syzkaller.appspotmail.com
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2677d206
    • W
      isdn: eicon: fix a missing-check bug · 6009d1fe
      Wenwen Wang committed
      In divasmain.c, the function divas_write() first invokes
      diva_xdi_open_adapter() to open the adapter that matches the adapter
      number provided by the user, and then invokes diva_xdi_write() to
      perform the write operation using the matched adapter. Both
      diva_xdi_open_adapter() and diva_xdi_write() are located in diva.c.
      
      In diva_xdi_open_adapter(), the user command is copied to the object 'msg'
      from the userspace pointer 'src' through the function pointer 'cp_fn',
      which eventually calls copy_from_user() to do the copy. Then, the adapter
      number 'msg.adapter' is used to find out a matched adapter from the
      'adapter_queue'. A matched adapter will be returned if it is found.
      Otherwise, NULL is returned to indicate the failure of the verification on
      the adapter number.
      
      As mentioned above, if a matched adapter is returned, the function
      diva_xdi_write() is invoked to perform the write operation. In this
      function, the user command is copied once again from the userspace pointer
      'src', which is the same as the 'src' pointer in diva_xdi_open_adapter() as
      both of them are from the 'buf' pointer in divas_write(). Similarly, the
      copy is achieved through the function pointer 'cp_fn', which finally calls
      copy_from_user(). After the successful copy, the corresponding command
      processing handler of the matched adapter is invoked to perform the write
      operation.
      
      There are thus two copies from userspace, one in
      diva_xdi_open_adapter() and one in diva_xdi_write(), and both read
      from the same source userspace pointer, i.e., the 'buf' pointer in
      divas_write(). Because a malicious userspace process can race to
      change the content pointed to by 'buf' between the two copies, this
      is a potential security issue. For example, the user can provide a
      valid adapter number for the first copy, so that verification passes
      and a valid adapter is found, and then change the adapter number to
      an invalid one before the second copy. This way, the user can bypass
      the verification of the adapter number and inject inconsistent data.
      
      This patch reuses the data copied in diva_xdi_open_adapter() and passes
      it to diva_xdi_write(). This way, the above issues can be avoided.
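
A minimal sketch of the single-fetch idea, not the driver's code. `fake_msg`, `fake_copy()`, `fake_open_adapter()` and `fake_write()` are hypothetical; `fake_copy()` stands in for copy_from_user() and the `volatile` buffer models memory that userspace can change at any time.

```c
/* Hypothetical user command; 'adapter' is chosen by the user. */
struct fake_msg {
	int adapter;
	int command;
};

#define FAKE_MAX_ADAPTER 4

/* Stand-in for copy_from_user(): one fetch from the shared buffer. */
static void fake_copy(struct fake_msg *dst,
		      const volatile struct fake_msg *src)
{
	dst->adapter = src->adapter;
	dst->command = src->command;
}

/* Fetch once, then validate the kernel-side copy. Returns 0 if valid. */
static int fake_open_adapter(struct fake_msg *msg,
			     const volatile struct fake_msg *user_buf)
{
	fake_copy(msg, user_buf);
	if (msg->adapter < 0 || msg->adapter >= FAKE_MAX_ADAPTER)
		return -1;
	return 0;
}

/* The write path reuses the validated copy and never re-reads user_buf. */
static int fake_write(const struct fake_msg *msg)
{
	return msg->adapter;	/* adapter the command is applied to */
}
```

Because the second step works on the copy that was validated, a later change to the user buffer cannot alter the adapter number in use.
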
      Signed-off-by: Wenwen Wang <wang6495@umn.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6009d1fe
    • F
      net: fec: Add a SPDX identifier · 1f508124
      Fabio Estevam committed
      Currently there is no license information in the header of
      this file.
      
      The MODULE_LICENSE field contains ("GPL"), which means
      GNU Public License v2 or later, so add a corresponding
      SPDX license identifier.
      Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
      Acked-by: Fugang Duan <fugang.duan@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1f508124
    • F
      net: fec: ptp: Switch to SPDX identifier · 9fcca5ef
      Fabio Estevam committed
      Adopt the SPDX license identifier headers to ease license compliance
      management.
      Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
      Acked-by: Fugang Duan <fugang.duan@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9fcca5ef
    • X
      sctp: fix the issue that flags are ignored when using kernel_connect · 644fbdea
      Xin Long committed
      Currently sctp uses inet_dgram_connect as its proto_ops .connect, so the
      flags param can't be passed into its proto .connect, where the flags are
      actually needed.
      
      sctp works around it by getting flags from the socket file in
      __sctp_connect. This works for connections from userspace, as a user
      sock inherently has a socket file and it passes f_flags as the flags
      param into the proto_ops .connect.
      
      However, the sock created by sock_create_kern doesn't have a socket file,
      and it passes the flags (like O_NONBLOCK) by using the flags param in
      kernel_connect, which calls proto_ops .connect later.
      
      So to fix it, this patch defines a new proto_ops .connect for sctp,
      sctp_inet_connect, which calls __sctp_connect() directly with this
      flags param. After this, sctp's proto .connect can be removed.
      
      Note that sctp_inet_connect skips some checks that are not needed
      for sctp, which makes things better than with inet_dgram_connect.
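
The difference between the two ways of obtaining the flags can be sketched as below. `fake_socket`, `fake_file`, `flags_from_file()` and `connect_is_nonblocking()` are hypothetical names; only `O_NONBLOCK` from `<fcntl.h>` is real. A NULL `file` models a socket created inside the kernel.

```c
#include <fcntl.h>
#include <stddef.h>

struct fake_file {
	int f_flags;
};

struct fake_socket {
	struct fake_file *file;	/* NULL for kernel-created sockets */
};

/* Workaround style: recover the flags from the socket file, if any. */
static int flags_from_file(const struct fake_socket *sock)
{
	return sock->file ? sock->file->f_flags : 0;
}

/* Fixed style: the connect entry point uses the caller's flags. */
static int connect_is_nonblocking(const struct fake_socket *sock, int flags)
{
	(void)sock;	/* the socket file is no longer consulted */
	return (flags & O_NONBLOCK) != 0;
}
```

For a socket without a file, the first helper silently loses `O_NONBLOCK`, while the second sees whatever the caller passed.
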
      Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      644fbdea
  5. 22 May, 2018 3 commits