  1. 08 Nov 2013 (6 commits)
    • net: Add layer 2 hardware acceleration operations for macvlan devices · a6cc0cfa
      Committed by John Fastabend
      Add an operations structure that allows a network interface to export
      the fact that it supports packet forwarding in hardware between
      physical interfaces and other MAC-layer devices assigned to it (such
      as macvlans). This operations structure can be used by virtual MAC
      devices to bypass software switching so that forwarding can be done
      in hardware more efficiently.
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
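      A minimal userspace sketch of the idea, assuming an operations table of function pointers. A lower device that supports hardware L2 forwarding fills in the table. An upper (macvlan-like) device uses the table when it is present and otherwise stays on software switching. struct l2_fwd_ops, fwd_add/fwd_del and the two-slot toy NIC are hypothetical names for illustration, not the kernel interface this commit adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ops a lower (physical) device may export so that an upper
 * MAC-layer device can have its traffic forwarded in hardware. */
struct l2_fwd_ops {
    /* Reserve a hardware forwarding slot; returns its id or -1 if full. */
    int (*fwd_add)(void *hw_priv);
    /* Release a slot previously returned by fwd_add. */
    void (*fwd_del)(void *hw_priv, int slot);
};

struct lower_dev {
    const struct l2_fwd_ops *fwd_ops;  /* NULL: no hardware offload */
    void *hw_priv;
};

struct upper_dev {
    struct lower_dev *lower;
    int hw_slot;                       /* -1: forwarded in software */
};

/* Prefer hardware forwarding when offered, else fall back to software. */
static void upper_attach(struct upper_dev *up, struct lower_dev *low)
{
    up->lower = low;
    up->hw_slot = -1;
    if (low->fwd_ops && low->fwd_ops->fwd_add)
        up->hw_slot = low->fwd_ops->fwd_add(low->hw_priv);
}

static void upper_detach(struct upper_dev *up)
{
    struct lower_dev *low = up->lower;

    if (up->hw_slot >= 0 && low->fwd_ops && low->fwd_ops->fwd_del)
        low->fwd_ops->fwd_del(low->hw_priv, up->hw_slot);
    up->hw_slot = -1;
}

static bool upper_uses_hw(const struct upper_dev *up)
{
    return up->hw_slot >= 0;
}

/* Toy NIC with two forwarding slots; bit i set means slot i is in use. */
struct toy_nic { unsigned int slots; };

static int toy_fwd_add(void *priv)
{
    struct toy_nic *nic = priv;

    for (int i = 0; i < 2; i++) {
        if (!(nic->slots & (1u << i))) {
            nic->slots |= 1u << i;
            return i;
        }
    }
    return -1;                         /* hardware full: use software */
}

static void toy_fwd_del(void *priv, int slot)
{
    struct toy_nic *nic = priv;

    nic->slots &= ~(1u << slot);
}

static const struct l2_fwd_ops toy_ops = { toy_fwd_add, toy_fwd_del };
```

      When a device lacks the capability, its ops pointer stays NULL. Callers can then detect support with a single pointer check and keep the software path as the default.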
    • 6lowpan: release device on error path · 78032f9b
      Committed by Dan Carpenter
      We recently added a new error path and it needs a dev_put().
      
      Fixes: 7adac1ec ('6lowpan: Only make 6lowpan links to IEEE802154 devices')
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
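      A hypothetical userspace model of this class of bug: once a device reference has been taken, every error return must release it. fake_dev, fake_dev_hold(), fake_dev_put() and lowpan_newlink_sketch() are illustrative stand-ins, not the 6lowpan code. The IEEE 802.15.4 check mirrors the error path named in the Fixes: tag, and the -1 return value is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in device with a reference count (hypothetical, not net_device). */
struct fake_dev {
    int refcnt;
    bool is_ieee802154;    /* only such devices may carry 6LoWPAN links */
};

static struct fake_dev *fake_dev_hold(struct fake_dev *d)
{
    d->refcnt++;
    return d;
}

static void fake_dev_put(struct fake_dev *d)
{
    d->refcnt--;
}

/* On success the new link keeps the reference; every error return after
 * the hold must drop it again, or the device can never be released. */
static int lowpan_newlink_sketch(struct fake_dev *real, struct fake_dev **out)
{
    struct fake_dev *d = fake_dev_hold(real);

    if (!d->is_ieee802154) {
        fake_dev_put(d);   /* the release the new error path was missing */
        return -1;
    }
    *out = d;
    return 0;
}
```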
    • net/vlan: Provide read access to the vlan egress map · d3243539
      Committed by Eyal Perry
      Provide a method for read-only access to the vlan device egress mapping.
      
      Do this by refactoring vlan_dev_get_egress_qos_mask() so that it now
      takes the skb priority as its argument instead of a pointer to the skb.
      
      Such access is needed by the IBoE stack, where the control plane
      goes through the network stack. This is an add-on step on top of commit
      d4a96865 ("net/route: export symbol ip_tos2prio"), which allowed the RDMA-CM
      to use ip_tos2prio.
      Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
      Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
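      A userspace sketch of a priority-keyed egress map lookup. It assumes 16 hash buckets indexed by priority & 0xF, a linked list per bucket, and the 802.1p value stored pre-shifted into the PCP bits (15..13) of the VLAN TCI. The struct and function names are hypothetical. The point is that the lookup needs only the priority value, not an skb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One skb-priority -> 802.1p mapping in a bucket's linked list. */
struct egress_mapping {
    uint32_t priority;              /* skb priority */
    uint16_t vlan_qos;              /* PCP already shifted left by 13 */
    struct egress_mapping *next;
};

struct vlan_model {
    struct egress_mapping *egress_map[16];  /* bucket = priority & 0xF */
};

/* Takes the priority itself rather than a packet, so callers without an
 * skb (such as a control plane) can query the map read-only. */
static uint16_t egress_qos_mask(const struct vlan_model *vlan, uint32_t skprio)
{
    const struct egress_mapping *mp = vlan->egress_map[skprio & 0xF];

    for (; mp; mp = mp->next)
        if (mp->priority == skprio)
            return mp->vlan_qos;
    return 0;                       /* unmapped priorities get PCP 0 */
}
```

      Priorities 5 and 21 share bucket 5 (21 & 0xF == 5). The list walk compares the full priority, so the two still resolve to different PCP values.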
    • tipc: reassembly failures should cause link reset · a715b49e
      Committed by Erik Hugne
      If appending a received fragment to the pending fragment chain
      in a unicast link fails, the current code tries to force a retransmission
      of the fragment by decrementing the 'next received sequence number'
      field in the link. This is done under the assumption that the failure
      is caused by an out-of-memory situation, an assumption that does
      not hold true after the previous patch in this series.
      
      A failure to append a fragment can now only be caused by a protocol
      violation by the sending peer, and it must hence be assumed that it
      is either malicious or buggy.  Either way, the correct behavior is now
      to reset the link instead of trying to revert its sequence number.
      So, this is what we do in this commit.
      Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
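      A hypothetical state-machine sketch of the behaviour change. A failed append is treated as a protocol violation and resets the link, and the expected sequence number is no longer rolled back. link_model, frag_result and link_handle_frag() are illustrative names, not TIPC's own.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical receive-side state of a TIPC unicast link. */
struct link_model {
    unsigned int next_in_no;   /* next expected sequence number */
    int pending_frags;         /* fragments held in the reassembly chain */
    bool is_up;
};

/* Outcome of appending one in-sequence fragment to the chain. */
enum frag_result { FRAG_PENDING, FRAG_COMPLETE, FRAG_VIOLATION };

static void link_reset(struct link_model *l)
{
    l->pending_frags = 0;      /* discard the partial message */
    l->is_up = false;          /* the link must be re-established */
}

/* Called after the fragment was accepted and next_in_no advanced. */
static void link_handle_frag(struct link_model *l, enum frag_result res)
{
    switch (res) {
    case FRAG_PENDING:
        l->pending_frags++;
        break;
    case FRAG_COMPLETE:
        l->pending_frags = 0;  /* whole message delivered upwards */
        break;
    case FRAG_VIOLATION:
        /* Before: l->next_in_no-- to provoke a retransmission, assuming
         * memory ran out. Now a failure can only mean a misbehaving peer,
         * so the link is reset instead. */
        link_reset(l);
        break;
    }
}
```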
    • tipc: message reassembly using fragment chain · 40ba3cdf
      Committed by Erik Hugne
      When the first fragment of a long data message is received on a link, a
      reassembly buffer large enough to hold the data from this and all subsequent
      fragments of the message is allocated. The payload of each new fragment is
      copied into this buffer upon arrival. When the last fragment is received, the
      reassembled message is delivered upwards to the port/socket layer.
      
      Not only is this an inefficient approach, but it may also cause bursts of
      reassembly failures in low-memory situations, since we may fail to allocate
      the necessary large buffer in the first place. Furthermore, after 100 subsequent
      such failures the link will be reset, something that in reality aggravates the
      situation.
      
      To remedy this problem, this patch introduces a different approach. Instead of
      allocating a big reassembly buffer, we now append the arriving fragments
      to a reassembly chain on the link, and deliver the whole chain up to the
      socket layer once the last fragment has been received. This is safe because
      the retransmission layer of a TIPC link always delivers packets in strict
      uninterrupted order, to the reassembly layer as well as to all other upper layers.
      Hence there can never be more than one fragment chain pending reassembly at
      any given time in a link, and we can trust (but still verify) that the
      fragments will be chained up in the correct order.
      Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
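      A minimal sketch of chain-based reassembly under the strict-ordering guarantee described in this commit. Fragments are linked rather than copied, and only the first/last markers are checked ("trust but verify"). The fragment types and reasm_append() are hypothetical simplifications, not TIPC's fragment handling.

```c
#include <assert.h>
#include <stddef.h>

/* Fragment roles, loosely modelled on first/middle/last fragments. */
enum frag_type { FIRST_FRAGMENT, FRAGMENT, LAST_FRAGMENT };

struct frag {
    enum frag_type type;
    size_t len;
    struct frag *next;
};

/* Per-link reassembly state: at most one chain is pending at a time.
 * total_len keeps the length of the last completed message. */
struct reasm_chain {
    struct frag *head, *tail;
    size_t total_len;
};

/* Append an in-order fragment. Returns 1 when the message is complete
 * (the chain is handed over in *msg and the state cleared), 0 when more
 * fragments are expected, and -1 on a protocol violation. */
static int reasm_append(struct reasm_chain *c, struct frag *f, struct frag **msg)
{
    f->next = NULL;
    if (f->type == FIRST_FRAGMENT) {
        if (c->head)                 /* a chain is already pending */
            return -1;
        c->head = c->tail = f;
        c->total_len = f->len;
        return 0;
    }
    if (!c->head)                    /* continuation without a first fragment */
        return -1;
    c->tail->next = f;               /* no copy: just link the buffer */
    c->tail = f;
    c->total_len += f->len;
    if (f->type != LAST_FRAGMENT)
        return 0;
    *msg = c->head;                  /* deliver the whole chain upwards */
    c->head = c->tail = NULL;
    return 1;
}
```

      Fragments arrive in strict order, so a single head/tail pair per link is enough. Two cases are reported as a violation for the caller to handle: a FIRST fragment while a chain is pending, and a continuation with no chain.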
    • tipc: don't reroute message fragments · 528f6f4b
      Committed by Erik Hugne
      When a message fragment is received in a broadcast or unicast link,
      the reception code will append the fragment payload to a big reassembly
      buffer through a call to the function tipc_recv_fragm(). However, after
      the return of that call, the logic goes on and passes the fragment
      buffer to the function tipc_net_route_msg(), which will simply drop it.
      This behavior is a remnant from the now obsolete multi-cluster
      functionality, and has no relevance in the current code base.
      
      Although currently harmless, this unnecessary call would be fatal
      after applying the next patch in this series, which introduces
      a completely new reassembly algorithm. So we change the code to
      eliminate the redundant call.
      Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 06 Nov 2013 (4 commits)
  3. 05 Nov 2013 (5 commits)
  4. 04 Nov 2013 (8 commits)
    • net: sctp: do not trigger BUG_ON in sctp_cmd_delete_tcb · 7926c1d5
      Committed by Daniel Borkmann
      Introduced in f9e42b85 ("net: sctp: sideeffect: throw BUG if
      primary_path is NULL"), the BUG_ON was intended to catch a buggy
      assoc that is part of the assoc hash table but has a NULL
      primary_path. However, we had better remove the BUG_ON for now and
      find a more suitable place to assert these things, as Mark reports
      that it also triggers when duplicate cookie processing happens and
      the assoc is not part of the hash table (so all is good in this
      case). Such a situation can, for example, easily be
      reproduced by:
      
        tc qdisc add dev eth0 root handle 1: prio bands 2 priomap 1 1 1 1 1 1
        tc qdisc add dev eth0 parent 1:2 handle 20: netem loss 20%
        tc filter add dev eth0 protocol ip parent 1: prio 2 u32 match ip \
                  protocol 132 0xff match u8 0x0b 0xff at 32 flowid 1:2
      
      This drops 20% of COOKIE-ACK packets. After some follow-up
      discussion with Vlad we came to the conclusion that for now we
      should still remove this BUG_ON() assertion, and come up with
      two follow-ups later on, that is, i) find a more suitable
      place for this assertion, and possibly ii) have a special
      allocator/initializer for such temporary assocs.
      Reported-by: Mark Thomas <Mark.Thomas@metaswitch.com>
      Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0) · f421436a
      Committed by Arvid Brodin
      High-availability Seamless Redundancy ("HSR") provides instant failover
      redundancy for Ethernet networks. It requires a special network topology where
      all nodes are connected in a ring (each node having two physical network
      interfaces). It is suited for applications that demand high availability and
      very short reaction time.
      
      HSR acts on the Ethernet layer, using a registered Ethernet protocol type to
      send special HSR frames in both directions over the ring. The driver creates
      virtual network interfaces that can be used just like any ordinary Linux
      network interface, for IP/TCP/UDP traffic etc. All nodes in the network ring
      must be HSR capable.
      
      This code is a "best effort" to comply with the HSR standard as described in
      IEC 62439-3:2010 (HSRv0).
      Signed-off-by: Arvid Brodin <arvid.brodin@xdin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
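      Because HSR sends each frame in both directions over the ring, a node normally receives it twice and passes only the first copy up. The sketch below models that duplicate discard with a per-source table of the last accepted 16-bit sequence number. The table size, the names and the exact acceptance rule are simplifying assumptions, not the IEC 62439-3 text or the hsr driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_NODES 8   /* hypothetical table size */

/* Last sequence number accepted from each known source MAC. */
struct node_entry {
    uint8_t mac[6];
    uint16_t last_seq;
    bool valid;
};

struct node_table { struct node_entry e[MAX_NODES]; };

/* True if a is "after" b in 16-bit sequence space (handles wrap-around). */
static bool seq_after(uint16_t a, uint16_t b)
{
    return (int16_t)(uint16_t)(a - b) > 0;
}

/* Returns true if the frame should be delivered, false if it is the
 * duplicate that travelled the other way round the ring. */
static bool hsr_accept(struct node_table *t, const uint8_t mac[6], uint16_t seq)
{
    struct node_entry *free_slot = NULL;

    for (int i = 0; i < MAX_NODES; i++) {
        struct node_entry *n = &t->e[i];

        if (!n->valid) {
            if (!free_slot)
                free_slot = n;
            continue;
        }
        if (memcmp(n->mac, mac, 6) == 0) {
            if (!seq_after(seq, n->last_seq))
                return false;      /* this or a newer frame already seen */
            n->last_seq = seq;
            return true;
        }
    }
    if (free_slot) {               /* first frame from this node */
        memcpy(free_slot->mac, mac, 6);
        free_slot->last_seq = seq;
        free_slot->valid = true;
    }
    return true;
}
```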
    • net: extend net_device allocation to vmalloc() · 74d332c1
      Committed by Eric Dumazet
      Joby Poriyath provided a xen-netback patch to reduce the size of
      the xenvif structure, as some netdev allocations could fail under
      memory pressure/fragmentation.
      
      This patch handles the problem at the core level, allowing
      any netdev structure to use vmalloc() if kmalloc() fails.
      
      As vmalloc() adds overhead on a critical network path, add __GFP_REPEAT
      to the kzalloc() flags so that the fallback happens only when really needed.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Joby Poriyath <joby.poriyath@citrix.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
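      A userspace model of the allocation strategy. It assumes a contiguous allocator that fails above a hypothetical 4 KiB limit, standing in for kzalloc() under fragmentation, and uses calloc() as the stand-in for vzalloc(). The kernel can tell the two kinds of memory apart by address (for example with is_vmalloc_addr()). This model records an explicit tag instead.

```c
#include <assert.h>
#include <stdlib.h>

#define CONTIG_LIMIT 4096   /* hypothetical: larger requests "fragment" */

enum alloc_kind { ALLOC_CONTIG, ALLOC_FALLBACK };

struct tagged_alloc {
    void *ptr;
    enum alloc_kind kind;   /* remembered so the right free routine runs */
};

/* Stand-in for a physically contiguous allocation that can fail. */
static void *contiguous_alloc(size_t size)
{
    return size <= CONTIG_LIMIT ? calloc(1, size) : NULL;
}

/* Try the cheap contiguous path first; use the fallback only on failure. */
static struct tagged_alloc netdev_alloc_sketch(size_t size)
{
    struct tagged_alloc a = { contiguous_alloc(size), ALLOC_CONTIG };

    if (!a.ptr) {
        a.ptr = calloc(1, size);
        a.kind = ALLOC_FALLBACK;
    }
    return a;
}

/* In the kernel the two cases need kfree() vs vfree(); both are free()
 * in this model, but the dispatch is kept to show the pattern. */
static void netdev_free_sketch(struct tagged_alloc a)
{
    switch (a.kind) {
    case ALLOC_CONTIG:
        free(a.ptr);
        break;
    case ALLOC_FALLBACK:
        free(a.ptr);
        break;
    }
}
```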
    • net: sctp: fix and consolidate SCTP checksumming code · e6d8b64b
      Committed by Daniel Borkmann
      This fixes an outstanding bug found through IPVS, where SCTP packets
      with skb->data_len > 0 (non-linearized) and an empty frag_list, but with
      data accumulated in the frags[] member, are forwarded with an incorrect
      checksum, causing the SCTP initial handshake to fail on some systems.
      Linearizing each SCTP skb in IPVS to prevent that would not be a good
      solution, as it leads to an additional and unnecessary performance
      penalty on the load balancer itself for no good reason (as we actually
      only want to update the checksum, and can do that in a different/better
      way presented here).
      
      The actual problem is elsewhere, namely, that SCTP's checksumming
      in sctp_compute_cksum() does not take frags[] into account like
      skb_checksum() does. So while we are fixing this up, we had better reuse
      the existing code that we have anyway in __skb_checksum() and use it
      for walking through the data doing checksumming. This will not only
      fix this issue, but also consolidate some SCTP code with core
      sk_buff code, bringing them closer together and avoiding a
      reimplementation of skb_checksum() for no good reason.
      
      As crc32c() can use a hardware implementation within the crypto layer,
      we leave that intact (otherwise it wraps around / falls back to e.g. the
      slice-by-8 algorithm in __crc32c_le()); plus we use the
      __crc32c_le_combine() combinator for crc32c blocks.
      
      Also, we remove all other SCTP checksumming code, so that we only
      have to use sctp_compute_cksum() from now on; for doing that, we need
      to transform SCTP checksumming in the output path slightly, and can leave
      the rest intact.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
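      A reference CRC32c in C that shows why non-linear data must be walked in full. The running CRC is threaded through the linear part and then each fragment in order, which gives the same result as one contiguous pass. This is a bitwise sketch, not the kernel's crc32c()/slice-by-8 code. It threads a running value instead of using a combine step such as __crc32c_le_combine(). struct seg is a hypothetical stand-in for the skb layout. The standard CRC32c check value for "123456789" is 0xE3069283.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32c (Castagnoli, reflected polynomial 0x82F63B78). Passing
 * the previous return value as crc continues the computation, so a
 * buffer split into pieces gives the same result as one pass. */
static uint32_t crc32c_update(uint32_t crc, const uint8_t *p, size_t len)
{
    crc = ~crc;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* A packet split into a linear head plus page fragments (frags[]). */
struct seg { const uint8_t *data; size_t len; };

/* Checksum every segment in order; stopping after the linear part, as
 * happens when frags[] are not taken into account, yields a wrong CRC. */
static uint32_t crc32c_segments(const struct seg *segs, size_t n)
{
    uint32_t crc = 0;

    for (size_t i = 0; i < n; i++)
        crc = crc32c_update(crc, segs[i].data, segs[i].len);
    return crc;
}
```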
    • net: skb_checksum: allow custom update/combine for walking skb · 2817a336
      Committed by Daniel Borkmann
      Currently, skb_checksum walks over 1) linearized, 2) frags[], and
      3) frag_list data and calculates the one's complement sum, a 32-bit
      result suitable for feeding into itself or csum_tcpudp_magic(),
      but unsuitable for SCTP, as we are calculating CRC32c there.
      
      Hence, in order not to re-implement the very same function in
      SCTP (and maybe other protocols) over and over again, use an
      update() + combine() callback internally to allow for walking
      over the skb with different algorithms.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
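      A sketch of the callback split. It assumes a walker that checksums each segment independently and merges the result at that segment's byte offset. struct csum_ops and its members are hypothetical names modelled on the update()/combine() description. The one's complement ops show why combine() needs the offset: a segment that starts at an odd offset must be byte-swapped before it is added.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The walker knows how to visit segments; the ops know the arithmetic. */
struct csum_ops {
    uint32_t (*update)(const uint8_t *p, size_t len, uint32_t csum);
    uint32_t (*combine)(uint32_t csum, uint32_t csum2, size_t offset, size_t len);
};

struct seg { const uint8_t *data; size_t len; };

/* Checksum each segment on its own, then merge it at its byte offset. */
static uint32_t walk_checksum(const struct seg *segs, size_t n, uint32_t csum,
                              const struct csum_ops *ops)
{
    size_t offset = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t part = ops->update(segs[i].data, segs[i].len, 0);

        csum = ops->combine(csum, part, offset, segs[i].len);
        offset += segs[i].len;
    }
    return csum;
}

/* Fold carries back in to keep a 16-bit one's complement sum. */
static uint32_t fold16(uint32_t acc)
{
    while (acc >> 16)
        acc = (acc & 0xFFFF) + (acc >> 16);
    return acc;
}

/* RFC 1071-style sum of big-endian 16-bit words. */
static uint32_t oc_update(const uint8_t *p, size_t len, uint32_t csum)
{
    uint32_t acc = csum;
    size_t i = 0;

    for (; i + 1 < len; i += 2)
        acc = fold16(acc + (((uint32_t)p[i] << 8) | p[i + 1]));
    if (i < len)                          /* odd trailing byte: high half */
        acc = fold16(acc + ((uint32_t)p[i] << 8));
    return acc;
}

/* A segment at an odd offset has every byte in the opposite half of its
 * 16-bit word, so its sum is byte-swapped before being added. */
static uint32_t oc_combine(uint32_t csum, uint32_t csum2, size_t offset, size_t len)
{
    (void)len;                            /* unused by this algorithm */
    if (offset & 1)
        csum2 = ((csum2 & 0xFF) << 8) | (csum2 >> 8);
    return fold16(csum + csum2);
}

static const struct csum_ops oc_ops = { oc_update, oc_combine };
```

      A CRC32c implementation of the same two callbacks would use the len argument that the one's complement version ignores. Merging two CRCs depends on the length of the second block.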
    • netfilter: nf_tables: remove duplicated include from nf_tables_ipv4.c · ca0e8bd6
      Committed by Wei Yongjun
      Remove duplicated include.
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ctnetlink: account both directions in one step · 4542fa47
      Committed by Holger Eitzenberger
      This is done with the intent to dump other accounting data later.
      This patch is a cleanup.
      Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: introduce nf_conn_acct structure · f7b13e43
      Committed by Holger Eitzenberger
      Encapsulate counters for both directions into nf_conn_acct. During
      that process, also consistently name pointers to the extension 'acct',
      not 'counters'. This patch is a cleanup.
      Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
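      A hypothetical sketch of the encapsulation. One structure holds a counter pair per direction, indexed by direction, so a single 'acct' pointer covers both. The names are illustrative, not the netfilter definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Direction indices: original and reply side of a connection. */
enum ct_dir { DIR_ORIGINAL, DIR_REPLY, DIR_MAX };

struct conn_counter {
    uint64_t packets;
    uint64_t bytes;
};

/* Both directions' counters live in one structure. */
struct conn_acct {
    struct conn_counter counter[DIR_MAX];
};

/* Account one packet of len bytes in the given direction. */
static void conn_account(struct conn_acct *acct, enum ct_dir dir, uint64_t len)
{
    acct->counter[dir].packets++;
    acct->counter[dir].bytes += len;
}
```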
  5. 02 Nov 2013 (5 commits)
  6. 01 Nov 2013 (1 commit)
  7. 31 Oct 2013 (7 commits)
  8. 30 Oct 2013 (4 commits)
    • tcp: temporarily disable Fast Open on SYN timeout · c968601d
      Committed by Yuchung Cheng
      Fast Open currently has a fallback feature to address SYN-data being
      dropped, but it requires the middlebox to pass on the regular SYN retry
      after the SYN-data. This is implemented in commit aab48743 ("net-tcp:
      Fast Open client - detecting SYN-data drops").
      
      However, some NAT boxes will drop all subsequent packets after the first
      SYN-data and blackhole the entire connection. An example is in
      commit 356d7d88 "netfilter: nf_conntrack: fix tcp_in_window for Fast
      Open".
      
      The sender should note such incidents and temporarily fall back to the
      regular TCP handshake on subsequent attempts as well: after the second
      SYN times out, the original Fast Open SYN is most likely lost. When
      such an event recurs, Fast Open is disabled for a period that grows
      exponentially with the number of recurrences.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
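      A hypothetical model of this back-off. Each recurrence of a presumed SYN-data loss doubles the period during which Fast Open is skipped for that destination. The 60-second base, the shift cap and the field names are assumptions for illustration, not the values the kernel uses. Resetting the count after a successful Fast Open is also left out.

```c
#include <assert.h>
#include <stdbool.h>

#define BASE_DISABLE 60u   /* assumed base period in seconds */
#define MAX_SHIFT 10u      /* cap so the shift cannot overflow */

/* Hypothetical per-destination Fast Open state. */
struct tfo_dest_state {
    unsigned int syn_loss;     /* recurrences of a presumed SYN-data loss */
    unsigned long last_loss;   /* time of the most recent loss (s) */
};

/* Record that the retried SYN also timed out, so the SYN-data was lost. */
static void tfo_note_syn_loss(struct tfo_dest_state *s, unsigned long now)
{
    s->syn_loss++;
    s->last_loss = now;
}

/* Fast Open is skipped while "now" is inside a window of
 * BASE_DISABLE << (syn_loss - 1) seconds after the last loss. */
static bool tfo_allowed(const struct tfo_dest_state *s, unsigned long now)
{
    unsigned int shift;

    if (s->syn_loss == 0)
        return true;
    shift = s->syn_loss - 1 < MAX_SHIFT ? s->syn_loss - 1 : MAX_SHIFT;
    return now >= s->last_loss + ((unsigned long)BASE_DISABLE << shift);
}
```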
    • net: ipvs: sctp: do not recalc sctp csum when ports didn't change · 97203abe
      Committed by Daniel Borkmann
      Unlike UDP or TCP, SCTP checksums do not take the pseudo-header
      into account. So when the port mapping stays the same, we do not
      need to recalculate the whole SCTP checksum in software, which is
      very expensive.
      
      Also, as in TCP, take into account when a private helper mangled
      the packet. In that case, we also need to recalculate the checksum
      even if the ports are the same.
      
      Thanks to Julian Anastasov for feedback regarding the skb->ip_summed
      checks; here is a discussion of these checks for snat and dnat:
      
      * For snat_handler(), we can see CHECKSUM_PARTIAL from
        virtual devices and from LOCAL_OUT; otherwise it
        should be CHECKSUM_UNNECESSARY. In general, snat is
        more complex. The skb contains the original route, and
        ip_vs_route_me_harder() can change the route after
        snat_handler(). So, for locally generated replies from
        the local server we cannot preserve the CHECKSUM_PARTIAL
        mode. It is a chicken-and-egg dilemma: snat_handler()
        needs the device after rerouting (to check for
        NETIF_F_SCTP_CSUM), while ip_route_me_harder() wants
        snat_handler() to put in the new saddr for proper
        rerouting.
      
      * For dnat_handler(), we should not see CHECKSUM_COMPLETE
        for SCTP; in fact, the small set of drivers that support
        SCTP offloading return CHECKSUM_UNNECESSARY on a correctly
        received SCTP csum. We can see CHECKSUM_PARTIAL from the
        local stack or on packets received from virtual drivers. The
        idea is that SCTP decides to avoid csum calculation if hardware
        supports offloading. IPVS can change the device after
        rerouting to the real server, but we can preserve the
        CHECKSUM_PARTIAL mode if the new device supports
        offloading too. This works because skb dst is changed
        before dnat_handler() and we see the new device. So, checks
        in the 'if' part will decide whether it is ok to keep
        CHECKSUM_PARTIAL for the output. If the packet arrived with
        CHECKSUM_NONE, we are dealing with an unknown checksum. As we
        recalculate the sum for the IP header in all cases, it should
        be safe to use CHECKSUM_UNNECESSARY, although in this case
        (without cp->app) we may forward a wrong checksum. In the case
        of CHECKSUM_UNNECESSARY, the csum was valid on receive.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
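      A hypothetical helper that captures the first two paragraphs of this commit. The SCTP checksum has no pseudo-header, so NAT that rewrites only addresses leaves it valid. The expensive recomputation is therefore needed only when the port changes or an application helper mangled the packet. The names are illustrative, and the ip_summed handling discussed above is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs to the recalculation decision. */
struct sctp_nat_ctx {
    uint16_t old_port, new_port;   /* port before and after NAT */
    bool app_mangled;              /* an application helper rewrote data */
};

/* SCTP's CRC32c covers only the SCTP packet, with no IP pseudo-header,
 * so address rewriting alone leaves it valid. */
static bool sctp_csum_needs_recalc(const struct sctp_nat_ctx *c)
{
    return c->old_port != c->new_port || c->app_mangled;
}
```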
    • bridge: pass correct vlan id to multicast code · 06499098
      Committed by Vlad Yasevich
      Currently, the multicast code attempts to extract the vlan id from
      the skb even when vlan filtering is disabled. This can lead
      to mdb entries being created with the wrong vlan id.
      Pass the already extracted vlan id to the multicast
      filtering code to make sure the correct id is used in
      creation as well as lookup.
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: x25: Fix dead URLs in Kconfig · 706e282b
      Committed by Michael Drüing
      Update the URLs in the Kconfig file to the new pages at sangoma.com and cisco.com
      Signed-off-by: Michael Drüing <michael@drueing.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>