1. 15 Nov, 2013 4 commits
    • tcp: tsq: restore minimal amount of queueing · 98e09386
      Eric Dumazet committed
      After commit c9eeec26 ("tcp: TSQ can use a dynamic limit"), several
      users reported throughput regressions, notably on mvneta and wifi
      adapters.
      
      802.11 AMPDU requires a fair amount of queueing to be effective.
      
      This patch partially reverts the change done in tcp_write_xmit()
      so that the minimal amount is sysctl_tcp_limit_output_bytes.
      
      It also removes the use of this sysctl while building skbs stored
      in the write queue, as TSO autosizing does the right thing anyway.
      
      Users with well-behaved NICs and a correct qdisc (like sch_fq)
      can then lower the default sysctl_tcp_limit_output_bytes value from
      128KB to 8KB.
      
      This new usage of sysctl_tcp_limit_output_bytes lets driver
      authors check how their driver performs when/if the value is set
      to a minimum of 4KB.
      
      Normally, line rate for a single TCP flow should be possible,
      but some drivers rely on timers to perform TX completion, and
      overly long TX completion delays prevent reaching full throughput.
      
      Fixes: c9eeec26 ("tcp: TSQ can use a dynamic limit")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Sujith Manoharan <sujith@msujith.org>
      Reported-by: Arnaud Ebalard <arno@natisbad.org>
      Tested-by: Sujith Manoharan <sujith@msujith.org>
      Cc: Felix Fietkau <nbd@openwrt.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
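The limit described in commit 98e09386 above can be illustrated with a minimal sketch. It assumes the dynamic part of the limit is roughly one millisecond of data at the socket's pacing rate (rate shifted right by 10), with sysctl_tcp_limit_output_bytes acting as a floor. The helper name `tsq_limit` and its parameters are hypothetical and are not kernel symbols.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): returns how many bytes a
 * single TCP socket may keep queued below the TCP layer.
 *
 * Assumption: the dynamic limit is pacing_rate >> 10, which is roughly the
 * number of bytes sent in one millisecond. The sysctl value is used as the
 * minimum, so slow flows still get enough queueing for AMPDU-style
 * aggregation.
 */
static uint32_t tsq_limit(uint32_t sysctl_limit_output_bytes,
                          uint32_t pacing_rate_bytes_per_sec)
{
    uint32_t dynamic = pacing_rate_bytes_per_sec >> 10;

    return dynamic > sysctl_limit_output_bytes ? dynamic
                                               : sysctl_limit_output_bytes;
}
```

With a 128KB floor, a flow paced at about 1 MB/s keeps the full 128KB budget; lowering the floor to 8KB only matters for flows whose one-millisecond budget is smaller than that.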
    • bridge: Fix memory leak when deleting bridge with vlan filtering enabled · b4e09b29
      Toshiaki Makita committed
      We currently don't call br_vlan_flush() when deleting a bridge, which
      leads to a memory leak if br->vlan_info is allocated.
      
      Steps to reproduce:
        while :
        do
          brctl addbr br0
          bridge vlan add dev br0 vid 10 self
          brctl delbr br0
        done
      We can observe that the cache size of the corresponding slab entry
      (kmalloc-2048 in SLUB) keeps increasing.
      
      kmemleak output:
      unreferenced object 0xffff8800b68a7000 (size 2048):
        comm "bridge", pid 2086, jiffies 4295774704 (age 47.656s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 48 9b 36 00 88 ff ff  .........H.6....
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff815eb6ae>] kmemleak_alloc+0x4e/0xb0
          [<ffffffff8116a1ca>] kmem_cache_alloc_trace+0xca/0x220
          [<ffffffffa03eddd6>] br_vlan_add+0x66/0xe0 [bridge]
          [<ffffffffa03e543c>] br_setlink+0x2dc/0x340 [bridge]
          [<ffffffff8150e481>] rtnl_bridge_setlink+0x101/0x200
          [<ffffffff8150d9d9>] rtnetlink_rcv_msg+0x99/0x260
          [<ffffffff81528679>] netlink_rcv_skb+0xa9/0xc0
          [<ffffffff8150d938>] rtnetlink_rcv+0x28/0x30
          [<ffffffff81527ccd>] netlink_unicast+0xdd/0x190
          [<ffffffff8152807f>] netlink_sendmsg+0x2ff/0x740
          [<ffffffff814e8368>] sock_sendmsg+0x88/0xc0
          [<ffffffff814e8ac8>] ___sys_sendmsg.part.14+0x298/0x2b0
          [<ffffffff814e91de>] __sys_sendmsg+0x4e/0x90
          [<ffffffff814e922e>] SyS_sendmsg+0xe/0x10
          [<ffffffff81601669>] system_call_fastpath+0x16/0x1b
          [<ffffffffffffffff>] 0xffffffffffffffff
      Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: Call vlan_vid_del for all vids at nbp_vlan_flush · dbbaf949
      Toshiaki Makita committed
      We should call vlan_vid_del for all vids at nbp_vlan_flush to prevent
      vid_info->refcount from being leaked when detaching a bridge port.
      Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: Use vlan_vid_[add/del] instead of direct ndo_vlan_rx_[add/kill]_vid calls · 19236837
      Toshiaki Makita committed
      We should use the wrapper functions vlan_vid_[add/del] instead of
      ndo_vlan_rx_[add/kill]_vid. Otherwise, we might not be able to communicate
      over a VLAN interface in certain situations.
      
      Example of problematic case:
        vconfig add eth0 10
        brctl addif br0 eth0
        bridge vlan add dev eth0 vid 10
        bridge vlan del dev eth0 vid 10
        brctl delif br0 eth0
      In this case, we cannot communicate via eth0.10 because VLAN 10 is
      filtered by a NIC that has the VLAN filtering feature.
      Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: David S. Miller <davem@davemloft.net>
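The problematic case in commit 19236837 above comes down to reference counting: two users (the VLAN interface and the bridge) share one hardware filter entry. The following is a minimal sketch of that idea, not the kernel implementation. The `struct nic` type and the `vid_add`/`vid_del` helpers are hypothetical stand-ins for the refcounted wrappers.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_VID 4096

/* Hypothetical model of a NIC with a per-VID hardware filter. */
struct nic {
    unsigned int refcount[MAX_VID]; /* number of users per VLAN ID */
    bool hw_filter[MAX_VID];        /* true: NIC accepts frames for this VID */
};

/* Refcounted add: only the first user programs the hardware filter. */
static void vid_add(struct nic *n, uint16_t vid)
{
    if (n->refcount[vid]++ == 0)
        n->hw_filter[vid] = true;
}

/* Refcounted delete: only the last user clears the hardware filter. */
static void vid_del(struct nic *n, uint16_t vid)
{
    if (n->refcount[vid] == 0)
        return;                     /* nothing to release */
    if (--n->refcount[vid] == 0)
        n->hw_filter[vid] = false;
}
```

In this model, adding VID 10 twice (once for eth0.10, once for the bridge) and deleting it once leaves the filter entry in place, so eth0.10 keeps working. Clearing the filter directly on the bridge delete, without the count, is what breaks it.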
  2. 14 Nov, 2013 1 commit
    • core/dev: do not ignore dmac in dev_forward_skb() · 81b9eab5
      Alexei Starovoitov committed
      Commit 06a23fe3
      ("core/dev: set pkt_type after eth_type_trans() in dev_forward_skb()")
      and the refactoring in 64261f23
      ("dev: move skb_scrub_packet() after eth_type_trans()")
      force pkt_type to be PACKET_HOST when an skb traverses veth,
      which means that IP forwarding will kick in inside the netns
      even if skb->eth->h_dest != dev->dev_addr.
      
      Fix the order of eth_type_trans() and skb_scrub_packet() in
      dev_forward_skb() and in ip_tunnel_rcv().
      
      Fixes: 06a23fe3 ("core/dev: set pkt_type after eth_type_trans() in dev_forward_skb()")
      CC: Isaku Yamahata <yamahatanetdev@gmail.com>
      CC: Maciej Zenczykowski <zenczykowski@gmail.com>
      CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
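The reason the destination MAC matters in commit 81b9eab5 above can be shown with a simplified classifier. This is only a sketch of the kind of decision eth_type_trans() makes, under the assumption that the broadcast address is all ones. The enum and the `classify_dmac` function are hypothetical names, not kernel symbols.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified packet classes. */
enum pkt_class { PKT_HOST, PKT_BROADCAST, PKT_MULTICAST, PKT_OTHERHOST };

/*
 * Classify a frame by its destination MAC relative to the receiving
 * device's own address. Only frames classified as PKT_HOST should be
 * candidates for local delivery or IP forwarding.
 */
static enum pkt_class classify_dmac(const uint8_t dmac[6],
                                    const uint8_t dev_addr[6])
{
    static const uint8_t bcast[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

    if (memcmp(dmac, bcast, 6) == 0)
        return PKT_BROADCAST;
    if (dmac[0] & 0x01)             /* group bit set: multicast */
        return PKT_MULTICAST;
    if (memcmp(dmac, dev_addr, 6) == 0)
        return PKT_HOST;
    return PKT_OTHERHOST;           /* addressed to some other station */
}
```

If a later step unconditionally resets the class to PKT_HOST, the PKT_OTHERHOST result is lost, which matches the symptom described in the commit message.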
  3. 11 Nov, 2013 4 commits
  4. 10 Nov, 2013 1 commit
  5. 09 Nov, 2013 7 commits
  6. 08 Nov, 2013 10 commits
  7. 06 Nov, 2013 4 commits
  8. 05 Nov, 2013 5 commits
  9. 04 Nov, 2013 4 commits
    • net: sctp: do not trigger BUG_ON in sctp_cmd_delete_tcb · 7926c1d5
      Daniel Borkmann committed
      Introduced in f9e42b85 ("net: sctp: sideeffect: throw BUG if
      primary_path is NULL"), the BUG_ON was intended to catch a buggy
      assoc that is part of the assoc hash table with a primary_path
      that is NULL. However, we had better remove the BUG_ON for now and
      find a more suitable place to assert these things, as Mark reports
      that it also triggers when duplicate cookie processing happens and
      the assoc is not part of the hash table (so all is good in this
      case). Such a situation can, for example, easily be reproduced by:
      
        tc qdisc add dev eth0 root handle 1: prio bands 2 priomap 1 1 1 1 1 1
        tc qdisc add dev eth0 parent 1:2 handle 20: netem loss 20%
        tc filter add dev eth0 protocol ip parent 1: prio 2 u32 match ip \
                  protocol 132 0xff match u8 0x0b 0xff at 32 flowid 1:2
      
      This drops 20% of COOKIE-ACK packets. After some follow-up
      discussion with Vlad, we came to the conclusion that for now we
      should still remove this BUG_ON() assertion and come up
      with two follow-ups later on, that is, i) find a more suitable
      place for this assertion, and possibly ii) have a special
      allocator/initializer for this kind of temporary assoc.
      Reported-by: Mark Thomas <Mark.Thomas@metaswitch.com>
      Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0) · f421436a
      Arvid Brodin committed
      High-availability Seamless Redundancy ("HSR") provides instant failover
      redundancy for Ethernet networks. It requires a special network topology where
      all nodes are connected in a ring (each node having two physical network
      interfaces). It is suited for applications that demand high availability and
      very short reaction time.
      
      HSR acts on the Ethernet layer, using a registered Ethernet protocol type to
      send special HSR frames in both directions over the ring. The driver creates
      virtual network interfaces that can be used just like any ordinary Linux
      network interface, for IP/TCP/UDP traffic etc. All nodes in the network ring
      must be HSR capable.
      
      This code is a "best effort" to comply with the HSR standard as described in
      IEC 62439-3:2010 (HSRv0).
      Signed-off-by: Arvid Brodin <arvid.brodin@xdin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: extend net_device allocation to vmalloc() · 74d332c1
      Eric Dumazet committed
      Joby Poriyath provided a xen-netback patch to reduce the size of the
      xenvif structure, as some netdev allocations could fail under
      memory pressure/fragmentation.
      
      This patch handles the problem at the core level, allowing
      any netdev structure to fall back to vmalloc() if kmalloc() fails.
      
      As vmalloc() adds overhead on a critical network path, add __GFP_REPEAT
      to the kzalloc() flags so that this fallback happens only when really needed.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Joby Poriyath <joby.poriyath@citrix.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
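The allocation strategy in commit 74d332c1 above (try the cheaper contiguous allocator first, fall back only on failure) can be sketched in user space. This is an analogy under stated assumptions, not kernel code: `contig_zalloc` is a hypothetical stand-in for kzalloc() that fails above an arbitrary size limit to mimic fragmentation, and `fallback_zalloc` stands in for vmalloc()-style allocation.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Tracks which allocator produced the memory, since real kernel code must
 * release kmalloc and vmalloc memory through different functions. */
struct buf {
    void *mem;
    bool from_fallback;
};

/* Stand-in for the contiguous allocator: refuses large requests. */
static void *contig_zalloc(size_t size)
{
    const size_t contig_limit = 4096;   /* arbitrary limit for the sketch */

    return size <= contig_limit ? calloc(1, size) : NULL;
}

/* Stand-in for the slower, non-contiguous allocator. */
static void *fallback_zalloc(size_t size)
{
    return calloc(1, size);
}

/* Try the fast path first; use the fallback only if it fails. */
static struct buf buf_alloc(size_t size)
{
    struct buf b = { contig_zalloc(size), false };

    if (!b.mem) {
        b.mem = fallback_zalloc(size);
        b.from_fallback = (b.mem != NULL);
    }
    return b;
}

static void buf_free(struct buf *b)
{
    free(b->mem);                       /* both stand-ins use calloc() */
    b->mem = NULL;
    b->from_fallback = false;
}
```

Recording the origin of the memory mirrors the design concern in the commit: the fallback path is more expensive, so it should be rare, and the free path has to know which allocator was used.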
    • net: sctp: fix and consolidate SCTP checksumming code · e6d8b64b
      Daniel Borkmann committed
      This fixes an outstanding bug found through IPVS, where SCTP packets
      with skb->data_len > 0 (non-linearized) and empty frag_list, but data
      accumulated in the frags[] member, are forwarded with an incorrect checksum,
      causing the SCTP initial handshake to fail on some systems. Linearizing each
      SCTP skb in IPVS to prevent that would not be a good solution as
      this leads to an additional and unnecessary performance penalty on
      the load-balancer itself for no good reason (as we actually only want
      to update the checksum, and can do that in a different/better way
      presented here).
      
      The actual problem is elsewhere, namely, that SCTP's checksumming
      in sctp_compute_cksum() does not take frags[] into account like
      skb_checksum() does. So while we are fixing this up, we had better reuse
      the existing code that we have anyway in __skb_checksum() and use it
      for walking through the data while checksumming. This will not only
      fix this issue, but also consolidate some SCTP code with core
      sk_buff code, bringing them closer together and avoiding a
      reimplementation of skb_checksum() for no good reason.
      
      As crc32c() can use hardware implementation within the crypto layer,
      we leave that intact (it wraps around / falls back to e.g. slice-by-8
      algorithm in __crc32c_le() otherwise); plus use the __crc32c_le_combine()
      combinator for crc32c blocks.
      
      Also, we remove all other SCTP checksumming code, so that we only
      have to use sctp_compute_cksum() from now on; for doing that, we need
      to transform SCTP checksumming in the output path slightly, and can leave
      the rest intact.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
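The bug class fixed in commit e6d8b64b above (a checksum that covers only the linear part of a buffer and skips data held in fragments) can be illustrated with a small CRC32c sketch. Note the swapped technique: the commit combines per-block CRCs with __crc32c_le_combine(), whereas this sketch simply carries the running CRC state from one fragment to the next, which gives the same result for in-order data. The `struct frag` type and both function names are hypothetical, and SCTP-specific details such as zeroing the checksum field are not modeled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fragment descriptor standing in for skb frags[]. */
struct frag {
    const uint8_t *data;
    size_t len;
};

/* Bitwise CRC32c (Castagnoli) update, reflected polynomial 0x82F63B78. */
static uint32_t crc32c_update(uint32_t crc, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Checksum all fragments in order, so no payload bytes are skipped. */
static uint32_t crc32c_frags(const struct frag *frags, size_t nfrags)
{
    uint32_t crc = 0xFFFFFFFFu;         /* standard CRC32c initial value */

    for (size_t i = 0; i < nfrags; i++)
        crc = crc32c_update(crc, frags[i].data, frags[i].len);
    return ~crc;                        /* final inversion */
}
```

Splitting the same bytes across several fragments, including empty ones, must not change the result; a checksum routine that stops after the first fragment would fail that property.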