1. 19 7月, 2013 1 次提交
    • E
      vlan: mask vlan prio bits · d4b812de
      Eric Dumazet 提交于
      In commit 48cc32d3
      ("vlan: don't deliver frames for unknown vlans to protocols")
      Florian made sure we set pkt_type to PACKET_OTHERHOST
      if the vlan id is set and we could find a vlan device for this
      particular id.
      
      But we also have a problem if prio bits are set.
      
      Steinar reported an issue on a router receiving IPv6 frames with a
      vlan tag of 4000 (id 0, prio 2), and tunneled into a sit device,
      because skb->vlan_tci is set.
      
      Forwarded frame is completely corrupted : We can see (8100:4000)
      being inserted in the middle of IPv6 source address :
      
      16:48:00.780413 IP6 2001:16d8:8100:4000:ee1c:0:9d9:bc87 >
      9f94:4d95:2001:67c:29f4::: ICMP6, unknown icmp6 type (0), length 64
             0x0000:  0000 0029 8000 c7c3 7103 0001 a0ae e651
             0x0010:  0000 0000 ccce 0b00 0000 0000 1011 1213
             0x0020:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223
             0x0030:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233
      
      It seems we are not really ready to properly cope with this right now.
      
      We can probably do better in future kernels :
      vlan_get_ingress_priority() should be a netdev property instead of
      a per vlan_dev one.
      
      For stable kernels, lets clear vlan_tci to fix the bugs.
      Reported-by: NSteinar H. Gunderson <sesse@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d4b812de
  2. 17 7月, 2013 1 次提交
  3. 15 7月, 2013 1 次提交
    • P
      net: delete __cpuinit usage from all net files · 013dbb32
      Paul Gortmaker 提交于
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      This removes all the net/* uses of the __cpuinit macros
      from all C files.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: netdev@vger.kernel.org
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      013dbb32
  4. 13 7月, 2013 1 次提交
  5. 12 7月, 2013 1 次提交
    • A
      gso: Update tunnel segmentation to support Tx checksum offload · cdbaa0bb
      Alexander Duyck 提交于
      This change makes it so that the GRE and VXLAN tunnels can make use of Tx
      checksum offload support provided by some drivers via the hw_enc_features.
      Without this fix enabling GSO means sacrificing Tx checksum offload and
      this actually leads to a performance regression as shown below:
      
                  Utilization
                  Send
      Throughput  local         GSO
      10^6bits/s  % S           state
        6276.51   8.39          enabled
        7123.52   8.42          disabled
      
      To resolve this it was necessary to address two items.  First
      netif_skb_features needed to be updated so that it would correctly handle
      the Trans Ether Bridging protocol without impacting the need to check for
      Q-in-Q tagging.  To do this it was necessary to update harmonize_features
      so that it used skb_network_protocol instead of just using the outer
      protocol.
      
      Second it was necessary to update the GRE and UDP tunnel segmentation
      offloads so that they would reset the encapsulation bit and inner header
      offsets after the offload was complete.
      
      As a result of this change I have seen the following results on a interface
      with Tx checksum enabled for encapsulated frames:
      
                  Utilization
                  Send
      Throughput  local         GSO
      10^6bits/s  % S           state
        7123.52   8.42          disabled
        8321.75   5.43          enabled
      
      v2: Instead of replacing refrence to skb->protocol with
          skb_network_protocol just replace the protocol reference in
          harmonize_features to allow for double VLAN tag checks.
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cdbaa0bb
  6. 11 7月, 2013 2 次提交
  7. 09 7月, 2013 1 次提交
  8. 04 7月, 2013 1 次提交
  9. 03 7月, 2013 1 次提交
    • I
      core/dev: set pkt_type after eth_type_trans() in dev_forward_skb() · 06a23fe3
      Isaku Yamahata 提交于
      The dev_forward_skb() assignment of pkt_type should be done
      after the call to eth_type_trans().
      
      ip-encapsulated packets can be handled by localhost. But skb->pkt_type
      can be PACKET_OTHERHOST when packet comes via veth into ip tunnel device.
      In that case, the packet is dropped by ip_rcv().
      Although this example uses gretap. l2tp-eth also has same issue.
      For l2tp-eth case, add dummy device for ip address and ip l2tp command.
      
      netns A |                     root netns                      | netns B
         veth<->veth=bridge=gretap <-loop back-> gretap=bridge=veth<->veth
      
      arp packet ->
      pkt_type
               BROADCAST------------>ip_rcv()------------------------>
      
                                                                   <- arp reply
                                                                      pkt_type
                                     ip_rcv()<-----------------OTHERHOST
                                     drop
      
      sample operations
        ip link add tapa type gretap remote 172.17.107.4 local 172.17.107.3
        ip link add tapb type gretap remote 172.17.107.3 local 172.17.107.4
        ip link set tapa up
        ip link set tapb up
        ip address add 172.17.107.3 dev tapa
        ip address add 172.17.107.4 dev tapb
        ip route get 172.17.107.3
        > local 172.17.107.3 dev lo  src 172.17.107.3
        >    cache <local>
        ip route get 172.17.107.4
        > local 172.17.107.4 dev lo  src 172.17.107.4
        >    cache <local>
        ip link add vetha type veth peer name vetha-peer
        ip link add vethb type veth peer name vethb-peer
        brctl addbr bra
        brctl addbr brb
        brctl addif bra tapa
        brctl addif bra vetha-peer
        brctl addif brb tapb
        brctl addif brb vethb-peer
        brctl show
        > bridge name     bridge id               STP enabled     interfaces
        > bra             8000.6ea21e758ff1       no              tapa
        >                                                         vetha-peer
        > brb             8000.420020eb92d5       no              tapb
        >                                                         vethb-peer
        ip link set vetha-peer up
        ip link set vethb-peer up
        ip link set bra up
        ip link set brb up
        ip netns add a
        ip netns add b
        ip link set vetha netns a
        ip link set vethb netns b
        ip netns exec a ip address add 10.0.0.3/24 dev vetha
        ip netns exec b ip address add 10.0.0.4/24 dev vethb
        ip netns exec a ip link set vetha up
        ip netns exec b ip link set vethb up
        ip netns exec a arping -I vetha 10.0.0.4
        ARPING 10.0.0.4 from 10.0.0.3 vetha
        ^CSent 2 probes (2 broadcast(s))
        Received 0 response(s)
      
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Hong Zhiguo <honkiko@gmail.com>
      Cc: Rami Rosen <ramirose@gmail.com>
      Cc: Tom Parkin <tparkin@katalix.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: dev@openvswitch.org
      Signed-off-by: NIsaku Yamahata <yamahata@valinux.co.jp>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      06a23fe3
  10. 02 7月, 2013 2 次提交
  11. 28 6月, 2013 1 次提交
  12. 27 6月, 2013 1 次提交
    • N
      net: fix kernel deadlock with interface rename and netdev name retrieval. · 5dbe7c17
      Nicolas Schichan 提交于
      When the kernel (compiled with CONFIG_PREEMPT=n) is performing the
      rename of a network interface, it can end up waiting for a workqueue
      to complete. If userland is able to invoke a SIOCGIFNAME ioctl or a
      SO_BINDTODEVICE getsockopt in between, the kernel will deadlock due to
      the fact that read_secklock_begin() will spin forever waiting for the
      writer process (the one doing the interface rename) to update the
      devnet_rename_seq sequence.
      
      This patch fixes the problem by adding a helper (netdev_get_name())
      and using it in the code handling the SIOCGIFNAME ioctl and
      SO_BINDTODEVICE setsockopt.
      
      The netdev_get_name() helper uses raw_seqcount_begin() to avoid
      spinning forever, waiting for devnet_rename_seq->sequence to become
      even. cond_resched() is used in the contended case, before retrying
      the access to give the writer process a chance to finish.
      
      The use of raw_seqcount_begin() will incur some unneeded work in the
      reader process in the contended case, but this is better than
      deadlocking the system.
      Signed-off-by: NNicolas Schichan <nschichan@freebox.fr>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5dbe7c17
  13. 26 6月, 2013 3 次提交
  14. 24 6月, 2013 2 次提交
  15. 20 6月, 2013 3 次提交
  16. 18 6月, 2013 3 次提交
  17. 14 6月, 2013 2 次提交
    • R
      net/core: Add VF link state control · 1d8faf48
      Rony Efraim 提交于
      Add netlink directives and ndo entry to allow for controling
      VF link, which can be in one of three states:
      
      Auto - VF link state reflects the PF link state (default)
      
      Up - VF link state is up, traffic from VF to VF works even if
      the actual PF link is down
      
      Down - VF link state is down, no traffic from/to this VF, can be of
      use while configuring the VF
      Signed-off-by: NRony Efraim <ronye@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1d8faf48
    • W
      net-rps: fixes for rps flow limit · 5f121b9a
      Willem de Bruijn 提交于
      Caught by sparse:
      - __rcu: missing annotation to sd->flow_limit
      - __user: direct access in cpumask_scnprintf
      
      Also
      - add endline character when printing bitmap if room in buffer
      - avoid bucket overflow by reducing FLOW_LIMIT_HISTORY
      
      The last item warrants some explanation. The hashtable buckets are
      subject to overflow if FLOW_LIMIT_HISTORY is larger than or equal
      to bucket size, since all packets may end up in a single bucket. The
      current (rather arbitrary) history value of 256 happens to match the
      buffer size (u8).
      
      As a result, with a single flow, the first 128 packets are accepted
      (correct), the second 128 packets dropped (correct) and then the
      history[] array has filled, so that each subsequent new packet
      causes an increment in the bucket for new_flow plus a decrement
      for old_flow: a steady state.
      
      This is fine if packets are dropped, as the steady state goes away
      as soon as a mix of traffic reappears. But, because the 256th packet
      overflowed the bucket to 0: no packets are dropped.
      
      Instead of explicitly adding an overflow check, this patch changes
      FLOW_LIMIT_HISTORY to never be able to overflow a single bucket.
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      (first item)
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5f121b9a
  18. 13 6月, 2013 2 次提交
  19. 12 6月, 2013 1 次提交
  20. 11 6月, 2013 6 次提交
  21. 06 6月, 2013 1 次提交
  22. 05 6月, 2013 3 次提交