1. 29 Oct, 2017 5 commits
    • net_sched: avoid matching qdisc with zero handle · 50317fce
      Cong Wang committed
      Davide found the following script triggers a NULL pointer
      dereference:
      
      ip l a name eth0 type dummy
      tc q a dev eth0 parent :1 handle 1: htb
      
      This happens because a freshly created netdevice has noop_qdisc
      attached. When 'parent :1' is passed, the kernel tries to match
      the major handle, which is 0, and noop_qdisc also has handle 0,
      so it is matched by mistake. Commit 69012ae4 tried to fix a
      similar bug but still missed this case.
      
      Handle 0 is not a valid handle and should simply be skipped. In
      fact, the kernel uses it as TC_H_UNSPEC.
      
      Fixes: 69012ae4 ("net: sched: fix handling of singleton qdiscs with qdisc_hash")
      Fixes: 59cc1f61 ("net: sched:convert qdisc linked list to hashtable")
      Reported-by: Davide Caratti <dcaratti@redhat.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
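The lookup rule described in this commit can be sketched in standalone userspace C. This is only an illustration under stated assumptions, not the kernel code: `TC_H_UNSPEC` and `TC_H_MAJ` carry the same values as in the kernel's `<linux/pkt_sched.h>`, while `struct toy_qdisc` and `toy_qdisc_lookup()` are hypothetical names invented here.

```c
#include <stddef.h>
#include <stdint.h>

/* Same values as in the kernel's <linux/pkt_sched.h>. */
#define TC_H_UNSPEC 0U
#define TC_H_MAJ(h) ((h) & 0xFFFF0000U)

/* Hypothetical stand-in for the kernel's struct Qdisc. */
struct toy_qdisc {
    uint32_t handle;
    const char *name;
};

/*
 * Return the entry whose handle equals `handle`, or NULL.
 * Handle 0 (TC_H_UNSPEC) is never treated as a match, so an entry
 * that still carries handle 0 cannot be picked up by accident.
 */
const struct toy_qdisc *toy_qdisc_lookup(const struct toy_qdisc *tbl,
                                         size_t n, uint32_t handle)
{
    if (handle == TC_H_UNSPEC)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        if (tbl[i].handle == handle)
            return &tbl[i];
    }
    return NULL;
}
```

With 'parent :1' the classid is 0x00000001, so `TC_H_MAJ()` of it is 0. In this sketch the lookup then returns NULL instead of an entry that stands in for noop_qdisc with handle 0.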
    • sctp: reset owner sk for data chunks on out queues when migrating a sock · d04adf1b
      Xin Long committed
      Currently, when a sock is migrated to another one in
      sctp_sock_migrate(), the owner sk is reset only for the data in the
      receive queues, not for the chunks on the out queues.
      
      As a result, the data chunk length on the sock becomes inconsistent
      with the sk's sk_wmem_alloc. When the sock is closed or these chunks
      are freed, the old sk is never freed, and the new sock may crash
      because sk_wmem_alloc overflows.
      
      syzbot found this issue with this series:
      
        r0 = socket$inet_sctp()
        sendto$inet(r0)
        listen(r0)
        accept4(r0)
        close(r0)
      
      Although listen() should have returned an error when a TCP-style
      socket is connecting (I may fix this in another patch), the issue can
      also be reproduced by peeling off an assoc.
      
      This issue has been there since the very beginning.
      
      This patch resets the owner sk for the chunks on the out queues so
      that sk_wmem_alloc has the correct value after accepting a sock or
      peeling off an assoc to a sock.
      
      Note that when resetting the owner sk for chunks on the outqueue, the
      chunks must first go through sctp_clear_owner_w/skb_orphan before
      assoc->base.sk is changed, and then through sctp_set_owner_w after it
      is changed, because sctp_wfree and its callees use assoc->base.sk.
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sockmap-fixes' · 151516fa
      David S. Miller committed
      John Fastabend says:
      
      ====================
      net: sockmap fixes
      
      Last two fixes (as far as I know) for sockmap code this round.
      
      First, we are using the qdisc cb structure when making the data end
      calculation. This is really just wrong, so store it with the other
      metadata in the correct tcp_skb_cb struct to avoid breaking things.
      
      Next, with recent work to attach multiple programs to a cgroup a
      specific enumeration of return codes was agreed upon. However,
      I wrote the sk_skb program types before seeing this work and used
      a different convention. Patch 2 in the series aligns the return
      codes to avoid breaking with this infrastructure and also aligns
      with other programming conventions to avoid being the odd duck out
      forcing programs to remember SK_SKB programs are different. Pushing
      to net because it's a user visible change. With this, SK_SKB program
      return codes are the same as other cgroup program types.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: rename sk_actions to align with bpf infrastructure · bfa64075
      John Fastabend committed
      Recent additions to support multiple programs in cgroups impose
      a strict requirement, "all yes is yes, any no is no". To enforce
      this the infrastructure requires the 'no' return code, SK_DROP in
      this case, to be 0.
      
      To apply these rules to SK_SKB program types the sk_actions return
      codes need to be adjusted.
      
      This fix adds SK_PASS and makes 'SK_DROP = 0'. Finally, remove
      SK_ABORTED to remove any chance that the API may allow aborted
      program flows to be passed up the stack. This would be incorrect
      behavior and allow programs to break existing policies.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
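The "all yes is yes, any no is no" rule can be illustrated with a short standalone C sketch. The enum mirrors the values the commit message states (SK_DROP is 0, SK_PASS follows it). `toy_combine_verdicts()` is a hypothetical helper written for this example and is not the kernel's actual mechanism for running multiple programs.

```c
#include <stddef.h>

/* Values as described in the commit message: the 'no' verdict is 0. */
enum sk_action {
    SK_DROP = 0,
    SK_PASS,
};

/*
 * Hypothetical helper: combine the verdicts of several programs.
 * Because the 'no' code is 0, a plain bitwise AND over the return
 * codes yields 0 as soon as any single program says 'no'.
 */
enum sk_action toy_combine_verdicts(const enum sk_action *verdicts, size_t n)
{
    unsigned int allow = 1;

    for (size_t i = 0; i < n; i++)
        allow &= (unsigned int)verdicts[i];

    return allow ? SK_PASS : SK_DROP;
}
```

This shows why the 'no' code needs to be 0: with any other numbering, the AND over raw return codes would not reduce to "drop if anyone dropped".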
    • bpf: bpf_compute_data uses incorrect cb structure · 8108a775
      John Fastabend committed
      SK_SKB program types use bpf_compute_data to store the end of the
      packet data. However, bpf_compute_data assumes the cb is stored in the
      qdisc layer format. But, for SK_SKB this is the wrong layer of the
      stack for this type.
      
      It happens to work (sort of!) because in most cases nothing happens
      to be overwritten today. This is very fragile and error prone.
      Fortunately, we have another hole in tcp_skb_cb we can use, so let's
      put the data_end value there.
      
      Note, SK_SKB program types do not use data_meta; accesses to it are
      rejected by sk_skb_is_valid_access().
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 28 Oct, 2017 3 commits
    • tap: reference to KVA of an unloaded module causes kernel panic · dea6e19f
      Girish Moodalbail committed
      The commit 9a393b5d ("tap: tap as an independent module") created a
      separate tap module that implements tap functionality and exports
      interfaces that will be used by macvtap and ipvtap modules to create
      their respective tap devices.
      
      However, that patch introduced a regression wherein the macvtap and
      ipvtap modules can be removed (through modprobe -r) while there are
      applications using the respective /dev/tapX devices. These applications
      cause the kernel to hold a reference to /dev/tapX through 'struct cdev
      macvtap_cdev' and 'struct cdev ipvtap_dev' defined in the macvtap and
      ipvtap modules respectively. So, when the application is later closed,
      the kernel panics because we are referencing a KVA that is present in
      the unloaded modules.
      
      ----------8<------- Example ----------8<----------
      $ sudo ip li add name mv0 link enp7s0 type macvtap
      $ sudo ip li show mv0 |grep mv0| awk -e '{print $1 $2}'
        14:mv0@enp7s0:
      $ cat /dev/tap14 &
      $ lsmod |egrep -i 'tap|vlan'
      macvtap                16384  0
      macvlan                24576  1 macvtap
      tap                    24576  3 macvtap
      $ sudo modprobe -r macvtap
      $ fg
      cat /dev/tap14
      ^C
      
      <...system panics...>
      BUG: unable to handle kernel paging request at ffffffffa038c500
      IP: cdev_put+0xf/0x30
      ----------8<-----------------8<----------
      
      The fix is to set cdev.owner to the module that creates the tap device
      (either macvtap or ipvtap). With this set, the operations (in
      fs/char_dev.c) on the char device hold and release the module through
      cdev_get() and cdev_put() and will not allow the module to unload
      prematurely.
      
      Fixes: 9a393b5d (tap: tap as an independent module)
      Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: refresh tp timestamp before tcp_mtu_probe() · ee1836ae
      Eric Dumazet committed
      In the unlikely event tcp_mtu_probe() is sending a packet, we
      want tp->tcp_mstamp to be as accurate as possible.
      
      This means we need to call tcp_mstamp_refresh() a bit earlier in
      tcp_write_xmit().
      
      Fixes: 385e2070 ("tcp: use tp->tcp_mstamp in output path")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tuntap: properly align skb->head before building skb · 63b9ab65
      Jason Wang committed
      An unaligned alloc_frag->offset caused by a previous allocation will
      result in an unaligned skb->head. This leads to an unaligned
      skb_shared_info and then an unaligned dataref, which must be
      aligned to be accessed on some architectures. Fix this by aligning
      alloc_frag->offset before the frag refilling.
      
      Fixes: 0bbd7dad ("tun: make tun_build_skb() thread safe")
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
      Cc: Wei Wei <dotweiba@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Reported-by: Wei Wei <dotweiba@gmail.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
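The idea of aligning the fragment offset before carving out the next buffer can be sketched in standalone C. Everything here is an assumption made for illustration: `struct toy_page_frag`, `toy_frag_reserve()`, `ALIGN_UP` and the 64-byte `TOY_CACHE_BYTES` are hypothetical and only mimic the shape of the kernel's page-fragment bookkeeping.

```c
#include <stddef.h>

/* Round x up to the next multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + ((size_t)(a) - 1)) & ~((size_t)(a) - 1))

/* Assumed alignment for this sketch; the real value is arch-specific. */
#define TOY_CACHE_BYTES 64

/* Hypothetical stand-in for a page fragment allocator's state. */
struct toy_page_frag {
    size_t offset; /* next free byte in the fragment */
    size_t size;   /* total size of the fragment */
};

/*
 * Align the current offset first, then reserve `len` bytes.
 * Returns the aligned start offset, or (size_t)-1 if the request
 * does not fit in what is left of the fragment.
 */
size_t toy_frag_reserve(struct toy_page_frag *frag, size_t len)
{
    size_t start = ALIGN_UP(frag->offset, TOY_CACHE_BYTES);

    if (start > frag->size || len > frag->size - start)
        return (size_t)-1;

    frag->offset = start + len;
    return start;
}
```

After a 100-byte reservation leaves the offset at 100, the next reservation in this sketch starts at 128 rather than at the unaligned offset left behind by the previous allocation.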
  3. 27 Oct, 2017 8 commits
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue · 8ab190fb
      David S. Miller committed
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2017-10-26
      
      This series contains fixes to e1000, igb, ixgbe and i40e.
      
      Vincenzo Maffione fixes a potential race condition which would result in
      the interface being up but transmits are disabled in the hardware.
      
      Colin Ian King fixes a possible NULL pointer dereference in e1000, which
      was found by Coverity.
      
      Jean-Philippe Brucker fixes a possible kernel panic when a driver cannot
      map a transmit buffer, which is caused by an erroneous test.
      
      Alex provides a fix for ixgbe, which is a partial revert of the commit
      ffed21bc ("ixgbe: Don't bother clearing buffer memory for descriptor rings")
      because the previous commit messed up the exception handling path by
      adding the count back in when we did not need to.  Also fixed a typo,
      where the transmit ITR setting was being used to determine if we were
      using adaptive receive interrupt moderation or not.  Lastly, fixed a
      memory leak by including programming descriptors in the cleaned count.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6_gre: update dst pmtu if dev mtu has been updated by toobig in __gre6_xmit · 8aec4959
      Xin Long committed
      When receiving a Toobig icmpv6 packet, ip6gre_err only sets the
      tunnel dev's mtu, which is not enough: skb_dst(skb)'s pmtu may
      still be using the old value and has no chance to be updated with
      the tunnel dev's mtu.
      
      Jianlin found this issue by reducing the route's mtu while running
      netperf; the performance went to 0.
      
      ip6ip6 and ip4ip6 tunnels work well in this case, as they look up
      the upper dst and update its pmtu, or icmpv6_send a Toobig to the
      upper socket, after setting the tunnel dev's mtu.
      
      We couldn't do that for ip6_gre, as gre's inner packet could be
      any protocol, so it's difficult to handle them (like looking up the
      upper dst) in a good way.
      
      So this patch fixes it by updating skb_dst(skb)'s pmtu when
      dev->mtu < skb_dst(skb)'s pmtu in the tx path. It's safe to do this
      update there, as usually dev->mtu <= skb_dst(skb)'s pmtu and no
      performance regression can be caused by this.
      
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
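The condition described in this commit, lowering the cached path MTU only when the device MTU has dropped below it, can be sketched as follows. `struct toy_dst` and `toy_sync_pmtu()` are hypothetical names for this illustration; the real code works on the skb's dst entry in the tx path.

```c
/* Hypothetical stand-in for a cached route entry. */
struct toy_dst {
    unsigned int pmtu; /* cached path MTU */
};

/*
 * Lower the cached path MTU to the device MTU when the device MTU
 * is smaller. Returns 1 if the cached value was updated, else 0.
 */
int toy_sync_pmtu(struct toy_dst *dst, unsigned int dev_mtu)
{
    if (dev_mtu < dst->pmtu) {
        dst->pmtu = dev_mtu;
        return 1;
    }
    return 0;
}
```

In the common case the comparison fails and nothing is written, which matches the commit message's point that the check adds no meaningful cost on the tx path.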
    • ip6_gre: only increase err_count for some certain type icmpv6 in ip6gre_err · f8d20b46
      Xin Long committed
      A fix similar to the one in patch 'ipip: only increase err_count for
      some certain type icmp in ipip_err' is needed for ip6gre_err.
      
      In Jianlin's case, udp netperf broke even when receiving a TooBig
      icmpv6 packet.
      
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipip: only increase err_count for some certain type icmp in ipip_err · f3594f0a
      Xin Long committed
      t->err_count is used to count link failures on the tunnel, and an
      error is reported to the user socket in the tx path if t->err_count
      is not 0. A udp socket could even return EHOSTUNREACH to users.
      
      Since commit fd58156e ("IPIP: Use ip-tunneling code.") removed
      the 'switch check' for the icmp type in ipip_err(), err_count is
      increased by icmp packets with the ICMP_EXC_FRAGTIME code, and a
      link failure is reported because of this.
      
      In Jianlin's case, when receiving an ICMP_EXC_FRAGTIME icmp packet,
      udp netperf failed with the error:
        send_data: data send error: No route to host (errno 113)
      
      We expect this error to be reported from the tunnel to the socket
      when receiving certain types of icmp, but not ICMP_EXC_FRAGTIME,
      ICMP_SR_FAILED or ICMP_PARAMETERPROB ones.
      
      This patch brings the 'switch check' for the icmp type back to
      ipip_err so that it only reports a link failure for the right icmp
      types, just as in ipgre_err() and ipip6_err().
      
      Fixes: fd58156e ("IPIP: Use ip-tunneling code.")
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
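A rough sketch of the kind of 'switch check' the message refers to, written as standalone C. The constants use the standard ICMP type and code numbers (as in the kernel's `<linux/icmp.h>`). `toy_counts_as_link_failure()` is a hypothetical function, and the exact set of cases is an assumption derived only from the text above: ICMP_EXC_FRAGTIME, ICMP_SR_FAILED and ICMP_PARAMETERPROB must not count, and treating other destination-unreachable codes and TTL-exceeded as failures is a guess at the rest.

```c
/* Standard ICMP type numbers. */
#define ICMP_DEST_UNREACH   3
#define ICMP_TIME_EXCEEDED  11
#define ICMP_PARAMETERPROB  12

/* Standard ICMP code numbers used below. */
#define ICMP_SR_FAILED      5 /* code for ICMP_DEST_UNREACH */
#define ICMP_EXC_TTL        0 /* code for ICMP_TIME_EXCEEDED */
#define ICMP_EXC_FRAGTIME   1 /* code for ICMP_TIME_EXCEEDED */

/*
 * Hypothetical filter: return 1 if an ICMP error with this type and
 * code should bump the tunnel's error counter, 0 if it is ignored.
 */
int toy_counts_as_link_failure(int type, int code)
{
    switch (type) {
    case ICMP_DEST_UNREACH:
        /* Source-route failure is not a link failure. */
        return code != ICMP_SR_FAILED;
    case ICMP_TIME_EXCEEDED:
        /* Only TTL expiry counts; fragment reassembly timeout does not. */
        return code == ICMP_EXC_TTL;
    default:
        /* ICMP_PARAMETERPROB and everything else are ignored. */
        return 0;
    }
}
```

Under this sketch, the ICMP_EXC_FRAGTIME packet from Jianlin's case returns 0, so it would no longer turn into "No route to host" on the sending socket.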
    • net: stmmac: First Queue must always be in DCB mode · 6d9f0790
      Jose Abreu committed
      According to the DWMAC databook, the first queue operating mode
      must always be DCB.
      
      As MTL_QUEUE_DCB = 1, we need to always set the first queue
      operating mode to DCB; otherwise the driver will think that the
      queue is in AVB mode (because MTL_QUEUE_AVB = 0).
      Signed-off-by: Jose Abreu <joabreu@synopsys.com>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
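Why the explicit assignment matters can be shown with a small standalone C sketch. The two constants use the values given in the commit message. `struct toy_queue_cfg`, its `mode_to_use` field and `toy_force_first_queue_dcb()` are hypothetical names for this illustration and are not taken from the driver.

```c
#include <stddef.h>

/* Values as stated in the commit message. */
#define MTL_QUEUE_AVB 0
#define MTL_QUEUE_DCB 1

/* Hypothetical per-queue configuration. */
struct toy_queue_cfg {
    unsigned int mode_to_use;
};

/*
 * A zero-initialized configuration reads as AVB because
 * MTL_QUEUE_AVB is 0, so the first queue has to be set to DCB
 * explicitly. Other queues are left untouched.
 */
void toy_force_first_queue_dcb(struct toy_queue_cfg *queues, size_t n)
{
    if (n > 0)
        queues[0].mode_to_use = MTL_QUEUE_DCB;
}
```

A zeroed array of queue configurations therefore starts with every queue looking like AVB, and only the explicit assignment moves queue 0 to DCB.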
    • net: stmmac: dwc-qos-eth: Fix typo in DT bindings parsing · 4894ac6b
      Jose Abreu committed
      According to DT bindings documentation we are expecting a
      property called "snps,read-requests" but we are parsing
      instead a property called "read,read-requests".
      
      This is clearly a typo. Fix it.
      Signed-off-by: Jose Abreu <joabreu@synopsys.com>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-fixes-2017-10-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 5be9541a
      David S. Miller committed
      Saeed Mahameed says:
      
      ====================
      Mellanox, mlx5 fixes 2017-10-26
      
      The series includes some misc fixes for mlx5 core and ethernet driver.
      Please pull and let me know if there's any problem.
      
      For -Stable:
      net/mlx5e: Properly deal with encap flows add/del under neigh update (kernels >= 4.12)
      net/mlx5: Fix health work queue spin lock to IRQ safe  (kernels >= 4.13)
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mac80211-for-davem-2017-10-25' of... · 9618aec3
      David S. Miller committed
      Merge tag 'mac80211-for-davem-2017-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg says:
      
      ====================
      pull-request: mac80211 2017-10-25
      
      Here are:
       * follow-up fixes for the WoWLAN security issue, to fix a
         partial TKIP key material problem and to use crypto_memneq()
       * a change for better enforcement of FQ's memory limit
       * a disconnect/connect handling fix, and
       * a user rate mask validation fix
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 26 Oct, 2017 22 commits
  5. 25 Oct, 2017 2 commits