1. 05 11月, 2013 2 次提交
  2. 04 11月, 2013 8 次提交
    • D
      net: sctp: do not trigger BUG_ON in sctp_cmd_delete_tcb · 7926c1d5
      Daniel Borkmann 提交于
      Introduced in f9e42b85 ("net: sctp: sideeffect: throw BUG if
      primary_path is NULL"), we intended to find a buggy assoc that's
      part of the assoc hash table with a primary_path that is NULL.
      However, we better remove the BUG_ON for now and find a more
      suitable place to assert for these things as Mark reports that
      this also triggers the bug when duplication cookie processing
      happens, and the assoc is not part of the hash table (so all
      good in this case). Such a situation can for example easily be
      reproduced by:
      
        tc qdisc add dev eth0 root handle 1: prio bands 2 priomap 1 1 1 1 1 1
        tc qdisc add dev eth0 parent 1:2 handle 20: netem loss 20%
        tc filter add dev eth0 protocol ip parent 1: prio 2 u32 match ip \
                  protocol 132 0xff match u8 0x0b 0xff at 32 flowid 1:2
      
      This drops 20% of COOKIE-ACK packets. After some follow-up
      discussion with Vlad we came to the conclusion that for now we
      should still better remove this BUG_ON() assertion, and come up
      with two follow-ups later on, that is, i) find a more suitable
      place for this assertion, and possibly ii) have a special
      allocator/initializer for such kind of temporary assocs.
      Reported-by: NMark Thomas <Mark.Thomas@metaswitch.com>
      Signed-off-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7926c1d5
    • A
      net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0) · f421436a
      Arvid Brodin 提交于
      High-availability Seamless Redundancy ("HSR") provides instant failover
      redundancy for Ethernet networks. It requires a special network topology where
      all nodes are connected in a ring (each node having two physical network
      interfaces). It is suited for applications that demand high availability and
      very short reaction time.
      
      HSR acts on the Ethernet layer, using a registered Ethernet protocol type to
      send special HSR frames in both directions over the ring. The driver creates
      virtual network interfaces that can be used just like any ordinary Linux
      network interface, for IP/TCP/UDP traffic etc. All nodes in the network ring
      must be HSR capable.
      
      This code is a "best effort" to comply with the HSR standard as described in
      IEC 62439-3:2010 (HSRv0).
      Signed-off-by: NArvid Brodin <arvid.brodin@xdin.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f421436a
    • E
      net: extend net_device allocation to vmalloc() · 74d332c1
      Eric Dumazet 提交于
      Joby Poriyath provided a xen-netback patch to reduce the size of
      xenvif structure as some netdev allocation could fail under
      memory pressure/fragmentation.
      
      This patch is handling the problem at the core level, allowing
      any netdev structures to use vmalloc() if kmalloc() failed.
      
      As vmalloc() adds overhead on a critical network path, add __GFP_REPEAT
      to kzalloc() flags to do this fallback only when really needed.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NJoby Poriyath <joby.poriyath@citrix.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      74d332c1
    • D
      net: sctp: fix and consolidate SCTP checksumming code · e6d8b64b
      Daniel Borkmann 提交于
      This fixes an outstanding bug found through IPVS, where SCTP packets
      with skb->data_len > 0 (non-linearized) and empty frag_list, but data
      accumulated in frags[] member, are forwarded with incorrect checksum
      letting SCTP initial handshake fail on some systems. Linearizing each
      SCTP skb in IPVS to prevent that would not be a good solution as
      this leads to an additional and unnecessary performance penalty on
      the load-balancer itself for no good reason (as we actually only want
      to update the checksum, and can do that in a different/better way
      presented here).
      
      The actual problem is elsewhere, namely, that SCTP's checksumming
      in sctp_compute_cksum() does not take frags[] into account like
      skb_checksum() does. So while we are fixing this up, we better reuse
      the existing code that we have anyway in __skb_checksum() and use it
      for walking through the data doing checksumming. This will not only
      fix this issue, but also consolidates some SCTP code with core
      sk_buff code, bringing it closer together and removing respectively
      avoiding reimplementation of skb_checksum() for no good reason.
      
      As crc32c() can use hardware implementation within the crypto layer,
      we leave that intact (it wraps around / falls back to e.g. slice-by-8
      algorithm in __crc32c_le() otherwise); plus use the __crc32c_le_combine()
      combinator for crc32c blocks.
      
      Also, we remove all other SCTP checksumming code, so that we only
      have to use sctp_compute_cksum() from now on; for doing that, we need
      to transform SCTP checkumming in output path slightly, and can leave
      the rest intact.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e6d8b64b
    • D
      net: skb_checksum: allow custom update/combine for walking skb · 2817a336
      Daniel Borkmann 提交于
      Currently, skb_checksum walks over 1) linearized, 2) frags[], and
      3) frag_list data and calculats the one's complement, a 32 bit
      result suitable for feeding into itself or csum_tcpudp_magic(),
      but unsuitable for SCTP as we're calculating CRC32c there.
      
      Hence, in order to not re-implement the very same function in
      SCTP (and maybe other protocols) over and over again, use an
      update() + combine() callback internally to allow for walking
      over the skb with different algorithms.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2817a336
    • W
      netfilter: nf_tables: remove duplicated include from nf_tables_ipv4.c · ca0e8bd6
      Wei Yongjun 提交于
      Remove duplicated include.
      Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      ca0e8bd6
    • H
      netfilter: ctnetlink: account both directions in one step · 4542fa47
      Holger Eitzenberger 提交于
      With the intent to dump other accounting data later.
      This patch is a cleanup.
      Signed-off-by: NHolger Eitzenberger <holger@eitzenberger.org>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      4542fa47
    • H
      netfilter: introduce nf_conn_acct structure · f7b13e43
      Holger Eitzenberger 提交于
      Encapsulate counters for both directions into nf_conn_acct. During
      that process also consistently name pointers to the extend 'acct',
      not 'counters'. This patch is a cleanup.
      Signed-off-by: NHolger Eitzenberger <holger@eitzenberger.org>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      f7b13e43
  3. 02 11月, 2013 5 次提交
  4. 01 11月, 2013 1 次提交
  5. 31 10月, 2013 7 次提交
  6. 30 10月, 2013 5 次提交
    • Y
      tcp: temporarily disable Fast Open on SYN timeout · c968601d
      Yuchung Cheng 提交于
      Fast Open currently has a fall back feature to address SYN-data being
      dropped but it requires the middle-box to pass on regular SYN retry
      after SYN-data. This is implemented in commit aab48743 ("net-tcp:
      Fast Open client - detecting SYN-data drops")
      
      However some NAT boxes will drop all subsequent packets after first
      SYN-data and blackholes the entire connections.  An example is in
      commit 356d7d88 "netfilter: nf_conntrack: fix tcp_in_window for Fast
      Open".
      
      The sender should note such incidents and fall back to use the regular
      TCP handshake on subsequent attempts temporarily as well: after the
      second SYN timeouts the original Fast Open SYN is most likely lost.
      When such an event recurs Fast Open is disabled based on the number of
      recurrences exponentially.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c968601d
    • D
      net: ipvs: sctp: do not recalc sctp csum when ports didn't change · 97203abe
      Daniel Borkmann 提交于
      Unlike UDP or TCP, we do not take the pseudo-header into
      account in SCTP checksums. So in case port mapping is the
      very same, we do not need to recalculate the whole SCTP
      checksum in software, which is very expensive.
      
      Also, similarly as in TCP, take into account when a private
      helper mangled the packet. In that case, we also need to
      recalculate the checksum even if ports might be same.
      
      Thanks for feedback regarding skb->ip_summed checks from
      Julian Anastasov; here's a discussion on these checks for
      snat and dnat:
      
      * For snat_handler(), we can see CHECKSUM_PARTIAL from
        virtual devices, and from LOCAL_OUT, otherwise it
        should be CHECKSUM_UNNECESSARY. In general, in snat it
        is more complex. skb contains the original route and
        ip_vs_route_me_harder() can change the route after
        snat_handler. So, for locally generated replies from
        local server we can not preserve the CHECKSUM_PARTIAL
        mode. It is an chicken or egg dilemma: snat_handler
        needs the device after rerouting (to check for
        NETIF_F_SCTP_CSUM), while ip_route_me_harder() wants
        the snat_handler() to put the new saddr for proper
        rerouting.
      
      * For dnat_handler(), we should not see CHECKSUM_COMPLETE
        for SCTP, in fact the small set of drivers that support
        SCTP offloading return CHECKSUM_UNNECESSARY on correctly
        received SCTP csum. We can see CHECKSUM_PARTIAL from
        local stack or received from virtual drivers. The idea is
        that SCTP decides to avoid csum calculation if hardware
        supports offloading. IPVS can change the device after
        rerouting to real server but we can preserve the
        CHECKSUM_PARTIAL mode if the new device supports
        offloading too. This works because skb dst is changed
        before dnat_handler and we see the new device. So, checks
        in the 'if' part will decide whether it is ok to keep
        CHECKSUM_PARTIAL for the output. If the packet was with
        CHECKSUM_NONE, hence we deal with unknown checksum. As we
        recalculate the sum for IP header in all cases, it should
        be safe to use CHECKSUM_UNNECESSARY. We can forward wrong
        checksum in this case (without cp->app). In case of
        CHECKSUM_UNNECESSARY, the csum was valid on receive.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NSimon Horman <horms@verge.net.au>
      97203abe
    • V
      bridge: pass correct vlan id to multicast code · 06499098
      Vlad Yasevich 提交于
      Currently multicast code attempts to extrace the vlan id from
      the skb even when vlan filtering is disabled.  This can lead
      to mdb entries being created with the wrong vlan id.
      Pass the already extracted vlan id to the multicast
      filtering code to make the correct id is used in
      creation as well as lookup.
      Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
      Acked-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      06499098
    • M
      net: x25: Fix dead URLs in Kconfig · 706e282b
      Michael Drüing 提交于
      Update the URLs in the Kconfig file to the new pages at sangoma.com and cisco.com
      Signed-off-by: NMichael Drüing <michael@drueing.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      706e282b
    • D
      net: sched: cls_bpf: add BPF-based classifier · 7d1d65cb
      Daniel Borkmann 提交于
      This work contains a lightweight BPF-based traffic classifier that can
      serve as a flexible alternative to ematch-based tree classification, i.e.
      now that BPF filter engine can also be JITed in the kernel. Naturally, tc
      actions and policies are supported as well with cls_bpf. Multiple BPF
      programs/filter can be attached for a class, or they can just as well be
      written within a single BPF program, that's really up to the user how he
      wishes to run/optimize the code, e.g. also for inversion of verdicts etc.
      The notion of a BPF program's return/exit codes is being kept as follows:
      
           0: No match
          -1: Select classid given in "tc filter ..." command
        else: flowid, overwrite the default one
      
      As a minimal usage example with iproute2, we use a 3 band prio root qdisc
      on a router with sfq each as leave, and assign ssh and icmp bpf-based
      filters to band 1, http traffic to band 2 and the rest to band 3. For the
      first two bands we load the bytecode from a file, in the 2nd we load it
      inline as an example:
      
      echo 1 > /proc/sys/net/core/bpf_jit_enable
      
      tc qdisc del dev em1 root
      tc qdisc add dev em1 root handle 1: prio bands 3 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
      
      tc qdisc add dev em1 parent 1:1 sfq perturb 16
      tc qdisc add dev em1 parent 1:2 sfq perturb 16
      tc qdisc add dev em1 parent 1:3 sfq perturb 16
      
      tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/ssh.bpf flowid 1:1
      tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/icmp.bpf flowid 1:1
      tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/http.bpf flowid 1:2
      tc filter add dev em1 parent 1: bpf run bytecode "`bpfc -f tc -i misc.ops`" flowid 1:3
      
      BPF programs can be easily created and passed to tc, either as inline
      'bytecode' or 'bytecode-file'. There are a couple of front-ends that can
      compile opcodes, for example:
      
      1) People familiar with tcpdump-like filters:
      
         tcpdump -iem1 -ddd port 22 | tr '\n' ',' > /etc/tc/ssh.bpf
      
      2) People that want to low-level program their filters or use BPF
         extensions that lack support by libpcap's compiler:
      
         bpfc -f tc -i ssh.ops > /etc/tc/ssh.bpf
      
         ssh.ops example code:
         ldh [12]
         jne #0x800, drop
         ldb [23]
         jneq #6, drop
         ldh [20]
         jset #0x1fff, drop
         ldxb 4 * ([14] & 0xf)
         ldh [%x + 14]
         jeq #0x16, pass
         ldh [%x + 16]
         jne #0x16, drop
         pass: ret #-1
         drop: ret #0
      
      It was chosen to load bytecode into tc, since the reverse operation,
      tc filter list dev em1, is then able to show the exact commands again.
      Possible follow-up work could also include a small expression compiler
      for iproute2. Tested with the help of bmon. This idea came up during
      the Netfilter Workshop 2013 in Copenhagen. Also thanks to feedback from
      Eric Dumazet!
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7d1d65cb
  7. 29 10月, 2013 12 次提交