1. 11 Jul 2013, 2 commits
  2. 09 Jul 2013, 1 commit
  3. 04 Jul 2013, 1 commit
  4. 03 Jul 2013, 1 commit
    • core/dev: set pkt_type after eth_type_trans() in dev_forward_skb() · 06a23fe3
      Committed by Isaku Yamahata
      The dev_forward_skb() assignment of pkt_type should be done
      after the call to eth_type_trans().
      
      IP-encapsulated packets can be handled by localhost, but skb->pkt_type
      can be PACKET_OTHERHOST when a packet comes via veth into an IP tunnel
      device. In that case, the packet is dropped by ip_rcv().
      Although this example uses gretap, l2tp-eth has the same issue.
      For the l2tp-eth case, add a dummy device to hold the IP address and
      use the ip l2tp command.
      
      netns A |                     root netns                      | netns B
         veth<->veth=bridge=gretap <-loop back-> gretap=bridge=veth<->veth
      
      arp packet ->
      pkt_type
               BROADCAST------------>ip_rcv()------------------------>
      
                                                                   <- arp reply
                                                                      pkt_type
                                     ip_rcv()<-----------------OTHERHOST
                                     drop
      
      sample operations
        ip link add tapa type gretap remote 172.17.107.4 local 172.17.107.3
        ip link add tapb type gretap remote 172.17.107.3 local 172.17.107.4
        ip link set tapa up
        ip link set tapb up
        ip address add 172.17.107.3 dev tapa
        ip address add 172.17.107.4 dev tapb
        ip route get 172.17.107.3
        > local 172.17.107.3 dev lo  src 172.17.107.3
        >    cache <local>
        ip route get 172.17.107.4
        > local 172.17.107.4 dev lo  src 172.17.107.4
        >    cache <local>
        ip link add vetha type veth peer name vetha-peer
        ip link add vethb type veth peer name vethb-peer
        brctl addbr bra
        brctl addbr brb
        brctl addif bra tapa
        brctl addif bra vetha-peer
        brctl addif brb tapb
        brctl addif brb vethb-peer
        brctl show
        > bridge name     bridge id               STP enabled     interfaces
        > bra             8000.6ea21e758ff1       no              tapa
        >                                                         vetha-peer
        > brb             8000.420020eb92d5       no              tapb
        >                                                         vethb-peer
        ip link set vetha-peer up
        ip link set vethb-peer up
        ip link set bra up
        ip link set brb up
        ip netns add a
        ip netns add b
        ip link set vetha netns a
        ip link set vethb netns b
        ip netns exec a ip address add 10.0.0.3/24 dev vetha
        ip netns exec b ip address add 10.0.0.4/24 dev vethb
        ip netns exec a ip link set vetha up
        ip netns exec b ip link set vethb up
        ip netns exec a arping -I vetha 10.0.0.4
        ARPING 10.0.0.4 from 10.0.0.3 vetha
        ^CSent 2 probes (2 broadcast(s))
        Received 0 response(s)
      
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Hong Zhiguo <honkiko@gmail.com>
      Cc: Rami Rosen <ramirose@gmail.com>
      Cc: Tom Parkin <tparkin@katalix.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: dev@openvswitch.org
      Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
      Signed-off-by: David S. Miller <davem@davemloft.net>
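The ordering problem can be illustrated with a small user-space toy model in C. It is not kernel code: `toy_skb`, `toy_dev`, `toy_eth_type_trans()`, `forward_old()` and `forward_fixed()` are hypothetical names, and the classification step only mimics the destination-MAC check that eth_type_trans() performs. It shows that a pkt_type assigned before classification can be overwritten with OTHERHOST, while assigning it afterwards sticks.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the kernel's PACKET_* values
 * (same numbering as <linux/if_packet.h>). */
enum pkt_type {
    PKT_HOST = 0,
    PKT_BROADCAST = 1,
    PKT_MULTICAST = 2,
    PKT_OTHERHOST = 3,
};

/* Hypothetical, heavily reduced packet and device structures. */
struct toy_skb {
    uint8_t dst_mac[6];
    enum pkt_type pkt_type;
};

struct toy_dev {
    uint8_t mac[6];
};

/* Toy model of the part of eth_type_trans() that matters here:
 * classify the frame by destination MAC, possibly overwriting pkt_type. */
static void toy_eth_type_trans(struct toy_skb *skb, const struct toy_dev *dev)
{
    static const uint8_t bcast[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

    if (skb->dst_mac[0] & 1)  /* group bit set: broadcast or multicast */
        skb->pkt_type = memcmp(skb->dst_mac, bcast, 6) == 0 ? PKT_BROADCAST
                                                           : PKT_MULTICAST;
    else if (memcmp(skb->dst_mac, dev->mac, 6) != 0)
        skb->pkt_type = PKT_OTHERHOST;
    /* otherwise pkt_type is left unchanged */
}

/* Old ordering: the PKT_HOST assignment gets clobbered afterwards. */
static void forward_old(struct toy_skb *skb, const struct toy_dev *dev)
{
    skb->pkt_type = PKT_HOST;
    toy_eth_type_trans(skb, dev);
}

/* Fixed ordering: classify first, then reset pkt_type to PKT_HOST. */
static void forward_fixed(struct toy_skb *skb, const struct toy_dev *dev)
{
    toy_eth_type_trans(skb, dev);
    skb->pkt_type = PKT_HOST;
}
```

With a destination MAC that differs from the receiving device, `forward_old()` leaves the packet marked OTHERHOST (the state that makes ip_rcv() drop it), while `forward_fixed()` leaves it marked HOST.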
  5. 02 Jul 2013, 2 commits
  6. 28 Jun 2013, 1 commit
  7. 27 Jun 2013, 1 commit
    • net: fix kernel deadlock with interface rename and netdev name retrieval. · 5dbe7c17
      Committed by Nicolas Schichan
      When the kernel (compiled with CONFIG_PREEMPT=n) is renaming a
      network interface, it can end up waiting for a workqueue to
      complete. If userland invokes a SIOCGIFNAME ioctl or a
      SO_BINDTODEVICE getsockopt in between, the kernel deadlocks because
      read_seqcount_begin() spins forever waiting for the writer process
      (the one doing the interface rename) to update the
      devnet_rename_seq sequence.
      
      This patch fixes the problem by adding a helper (netdev_get_name())
      and using it in the code handling the SIOCGIFNAME ioctl and the
      SO_BINDTODEVICE getsockopt.
      
      The netdev_get_name() helper uses raw_seqcount_begin() to avoid
      spinning forever waiting for devnet_rename_seq->sequence to become
      even. In the contended case, cond_resched() is called before
      retrying the access, giving the writer process a chance to finish.
      
      The use of raw_seqcount_begin() will incur some unneeded work in the
      reader process in the contended case, but this is better than
      deadlocking the system.
      Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
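The retry pattern described in this commit can be sketched in user-space C11. This is a toy model, not the kernel implementation: `toy_rename()`, `toy_raw_seqcount_begin()`, `toy_get_name()`, `rename_seq` and `dev_name` are hypothetical names, C11 atomics and fences stand in for the kernel's seqcount barriers, and `sched_yield()` stands in for cond_resched(). The key point is that the reader never waits for an even sequence: it masks the low bit, so an overlapping rename just makes the final check fail, and the reader yields before retrying.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <string.h>

#define NAME_LEN 16  /* stand-in for IFNAMSIZ */

/* Hypothetical stand-ins for devnet_rename_seq and dev->name. The name is
 * stored as atomic chars so concurrent reads are not a C11 data race. */
static atomic_uint rename_seq;
static _Atomic char dev_name[NAME_LEN];

/* Writer: the sequence is odd while the rename is in progress. */
static void toy_rename(const char *newname)
{
    unsigned seq = atomic_load_explicit(&rename_seq, memory_order_relaxed);
    size_t len = 0;

    while (len < NAME_LEN - 1 && newname[len] != '\0')
        len++;

    atomic_store_explicit(&rename_seq, seq + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    for (size_t i = 0; i < NAME_LEN; i++)
        atomic_store_explicit(&dev_name[i], (char)(i < len ? newname[i] : '\0'),
                              memory_order_relaxed);
    atomic_store_explicit(&rename_seq, seq + 2, memory_order_release);
}

/* Like raw_seqcount_begin(): do not wait for an even value, just clear the
 * low bit so that a read overlapping a write is guaranteed to retry. */
static unsigned toy_raw_seqcount_begin(void)
{
    return atomic_load_explicit(&rename_seq, memory_order_acquire) & ~1u;
}

/* Reader modeled on netdev_get_name(): copy, validate, yield and retry. */
static void toy_get_name(char *out)
{
    unsigned seq;

retry:
    seq = toy_raw_seqcount_begin();
    for (size_t i = 0; i < NAME_LEN; i++)
        out[i] = atomic_load_explicit(&dev_name[i], memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    if (atomic_load_explicit(&rename_seq, memory_order_relaxed) != seq) {
        sched_yield();  /* user-space stand-in for cond_resched() */
        goto retry;
    }
}
```

The extra copies performed on a failed check are the "unneeded work in the reader" the commit mentions; the trade-off is that the reader can never block the writer indefinitely.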
  8. 26 Jun 2013, 3 commits
  9. 24 Jun 2013, 2 commits
  10. 20 Jun 2013, 3 commits
  11. 18 Jun 2013, 3 commits
  12. 14 Jun 2013, 2 commits
    • net/core: Add VF link state control · 1d8faf48
      Committed by Rony Efraim
      Add netlink directives and an ndo entry to allow controlling the
      VF link, which can be in one of three states:

      Auto - VF link state reflects the PF link state (default)

      Up - VF link state is up; traffic from VF to VF works even if
      the actual PF link is down

      Down - VF link state is down; no traffic flows from/to this VF.
      This can be useful while configuring the VF
      Signed-off-by: Rony Efraim <ronye@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
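The semantics of the three states can be captured in a few lines of C. This is only an illustrative sketch: the enum constants and `vf_link_is_up()` are hypothetical names chosen here, not the netlink attribute or ndo names introduced by the commit, and a real driver applies the state in hardware rather than computing it in a helper like this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three administrative states described above. */
enum vf_link_state {
    VF_LINK_STATE_AUTO,   /* follow the PF link (default) */
    VF_LINK_STATE_UP,     /* forced up, even if the PF link is down */
    VF_LINK_STATE_DOWN,   /* forced down, no traffic to/from the VF */
};

/* Effective VF link state for a given admin setting and PF link state. */
static bool vf_link_is_up(enum vf_link_state admin, bool pf_link_up)
{
    switch (admin) {
    case VF_LINK_STATE_AUTO:
        return pf_link_up;
    case VF_LINK_STATE_UP:
        return true;
    case VF_LINK_STATE_DOWN:
        return false;
    }
    return false;  /* unknown value: treat as down */
}
```

Only the Auto state depends on the PF link; Up and Down override it, which is what allows VF-to-VF traffic with the PF link down, or isolating a VF while it is being configured.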
    • net-rps: fixes for rps flow limit · 5f121b9a
      Committed by Willem de Bruijn
      Caught by sparse:
      - __rcu: missing annotation to sd->flow_limit
      - __user: direct access in cpumask_scnprintf
      
      Also:
      - add a newline character when printing the bitmap, if there is room in the buffer
      - avoid bucket overflow by reducing FLOW_LIMIT_HISTORY
      
      The last item warrants some explanation. The hashtable buckets are
      subject to overflow if FLOW_LIMIT_HISTORY is larger than or equal
      to the bucket size, since all packets may end up in a single bucket.
      The current (rather arbitrary) history value of 256 happens to match
      the bucket size (u8).

      As a result, with a single flow, the first 128 packets are accepted
      (correct), the next 128 packets are dropped (correct), and then the
      history[] array is full, so that each subsequent packet causes an
      increment in the bucket for new_flow plus a decrement for old_flow:
      a steady state.

      This would be fine if packets were being dropped, as the steady state
      goes away as soon as a mix of traffic reappears. But because the 256th
      packet overflowed the bucket to 0, no packets are dropped.
      
      Instead of explicitly adding an overflow check, this patch changes
      FLOW_LIMIT_HISTORY to never be able to overflow a single bucket.
      Reported-by: Fengguang Wu <fengguang.wu@intel.com> (first item)
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
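The overflow can be reproduced with a small single-flow simulation in C. It is a sketch under stated assumptions, not the net/core/dev.c code: `simulate_drops()` is a hypothetical name, the flow is assumed to hash to a non-zero bucket, the history ring starts zero-filled, the old bucket is decremented before the new one is incremented, and a packet is dropped once its bucket exceeds half the history length. The queue-length check that gates flow limiting in the kernel is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_HISTORY 256
#define NUM_BUCKETS 16

/* Count how many of `npackets` packets from one flow (hashed to bucket
 * `flow`) are dropped, for a history ring of `history_len` entries.
 * history_len must be a power of two no larger than MAX_HISTORY. */
static unsigned simulate_drops(unsigned history_len, unsigned npackets,
                               unsigned flow)
{
    uint8_t buckets[NUM_BUCKETS];   /* u8 counters, as in the commit text */
    uint8_t history[MAX_HISTORY];   /* bucket index of each recent packet */
    unsigned head = 0, drops = 0;

    memset(buckets, 0, sizeof(buckets));
    memset(history, 0, sizeof(history));

    for (unsigned i = 0; i < npackets; i++) {
        unsigned old_flow = history[head];

        history[head] = (uint8_t)flow;
        head = (head + 1) & (history_len - 1);

        if (buckets[old_flow])
            buckets[old_flow]--;
        buckets[flow]++;            /* a u8 wraps from 255 to 0 */
        if (buckets[flow] > (history_len >> 1))
            drops++;
    }
    return drops;
}
```

In this model, a history of 256 lets the bucket wrap to 0 on the 256th packet, after which it settles at 1, so only 127 of 1000 packets are dropped. With a history of 128 the bucket never exceeds 128, and once the first 64 packets pass, the flow stays throttled (936 of 1000 dropped).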
  13. 13 Jun 2013, 2 commits
  14. 12 Jun 2013, 1 commit
  15. 11 Jun 2013, 6 commits
  16. 06 Jun 2013, 1 commit
  17. 05 Jun 2013, 5 commits
  18. 01 Jun 2013, 3 commits
    • net: clean up skb headers code · 35d04610
      Committed by Cong Wang
      Commit 1a37e412 ("net: Use 16bits for *_headers fields of struct
      skbuff") converted skb->*_header to u16, so some
      #if NET_SKBUFF_DATA_USES_OFFSET blocks are now useless. To be safe,
      we can simply use "X = (typeof(X)) ~0U;", as suggested by David.
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Simon Horman <horms@verge.net.au>
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
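The `(typeof(X)) ~0U` idiom can be tried out in plain C. The sketch below uses the GCC/Clang spelling `__typeof__`; `MARK_UNSET`, `toy_headers` and `mac_header_was_set()` are hypothetical names. Casting `~0U` to the field's own type yields the all-ones value of that type (0xFFFF for a u16), so the "unset" marker stays correct if the field width changes. This only holds for types no wider than `unsigned int`: for a 64-bit field, `~0U` would become 0x00000000FFFFFFFF.

```c
#include <assert.h>
#include <stdint.h>

/* Set a field to the all-ones value of its own type (hypothetical macro). */
#define MARK_UNSET(x) ((x) = (__typeof__(x)) ~0U)

struct toy_headers {
    uint16_t transport_header;
    uint16_t network_header;
    uint16_t mac_header;
};

static void reset_headers(struct toy_headers *h)
{
    MARK_UNSET(h->transport_header);
    MARK_UNSET(h->network_header);
    MARK_UNSET(h->mac_header);
}

/* A header offset counts as "set" once it differs from the marker. */
static int mac_header_was_set(const struct toy_headers *h)
{
    return h->mac_header != (__typeof__(h->mac_header)) ~0U;
}
```

The same comparison works unchanged whether the fields are u16 or u32, which is why the idiom removes the need for a width-specific #if.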
    • net/core: dev_mc_sync_multiple calls wrong helper · b190a508
      Committed by Jay Vosburgh
      The dev_mc_sync_multiple function is currently calling
      __hw_addr_sync, and not __hw_addr_sync_multiple. This results in
      addresses only being synced to the first device from the set.

      Corrected by calling the _multiple variant.
      Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
      Reviewed-by: Vlad Yasevich <vyasevic@redhat.com>
      Tested-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
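The symptom can be illustrated with a toy model in C. It does not reproduce the kernel's __hw_addr_sync() or __hw_addr_sync_multiple(): `toy_addr`, `toy_dev_list`, `sync_single()` and `sync_multiple()` are hypothetical, and the model assumes only what the commit message states. A helper that treats an address as done once it has been synced anywhere pushes it to the first device only, while a multi-device helper has to decide per destination.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ADDRS 8

/* Hypothetical, heavily simplified model: addresses are small integers. */
struct toy_addr {
    int id;
    int sync_cnt;   /* how many devices this address has been pushed to */
};

struct toy_dev_list {
    int ids[MAX_ADDRS];
    int count;
};

static bool dev_has(const struct toy_dev_list *d, int id)
{
    for (int i = 0; i < d->count; i++)
        if (d->ids[i] == id)
            return true;
    return false;
}

/* Append an address; silently ignored when the toy list is full. */
static void dev_add(struct toy_dev_list *d, int id)
{
    if (d->count < MAX_ADDRS)
        d->ids[d->count++] = id;
}

/* Single-target style: skip any address already synced somewhere, so after
 * the first device every other device is skipped. */
static void sync_single(struct toy_dev_list *to, struct toy_addr *from, int n)
{
    for (int i = 0; i < n; i++)
        if (from[i].sync_cnt == 0) {
            dev_add(to, from[i].id);
            from[i].sync_cnt++;
        }
}

/* Multi-target style: decide per destination device. */
static void sync_multiple(struct toy_dev_list *to, struct toy_addr *from, int n)
{
    for (int i = 0; i < n; i++)
        if (!dev_has(to, from[i].id)) {
            dev_add(to, from[i].id);
            from[i].sync_cnt++;
        }
}
```

Syncing the same source list to two devices with `sync_single()` fills only the first one; `sync_multiple()` fills both.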
    • net/core: __hw_addr_sync_one / _multiple broken · 29ca2f8f
      Committed by Jay Vosburgh
      Currently, __hw_addr_sync_one is called in a loop by
      __hw_addr_sync_multiple to sync each of a "from" device's hw addresses
      to a "to" device. __hw_addr_sync_one calls __hw_addr_add_ex to attempt
      to add each address. __hw_addr_add_ex is called with global=false and
      sync=true.

      __hw_addr_add_ex checks whether the new address matches an address
      already on the list. If so, it tests global and sync. In this case,
      sync=true, so it then checks whether the address is already synced,
      and if so, returns 0.

      This 0 return causes __hw_addr_sync_one to increment the sync_cnt
      and refcount for the "from" list's address entry, even though the address
      is already synced and has a reference and sync_cnt. As a result,
      sync_cnt and refcount grow without bound every time an address is
      added to the "from" device and synced to the "to" device.

      The fix has two parts:

      First, when __hw_addr_add_ex finds that the address already exists
      and is synced, it returns -EEXIST instead of 0.

      Second, __hw_addr_sync_one checks the error return for -EEXIST,
      and if so, it (a) does not add a refcount/sync_cnt, and (b) returns 0
      itself so that __hw_addr_sync_multiple will not return an error.
      Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
      Reviewed-by: Vlad Yasevich <vyasevic@redhat.com>
      Tested-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
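The two-part fix can be sketched as a toy model in C. The types and functions (`toy_hw_addr`, `toy_add_synced()`, `toy_sync_one()`) are hypothetical simplifications and do not mirror the kernel's struct netdev_hw_addr or the exact bookkeeping in __hw_addr_add_ex(). They only encode the behavior described above: the add step reports -EEXIST for an address that is already synced, and the sync step treats -EEXIST as success without taking another reference.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define ADDR_LEN 6
#define MAX_ADDRS 8

/* Hypothetical, simplified address entry (not struct netdev_hw_addr). */
struct toy_hw_addr {
    unsigned char addr[ADDR_LEN];
    int refcount;
    int sync_cnt;   /* "from" side: number of successful syncs */
    bool synced;    /* "to" side: entry was created or claimed by a sync */
};

struct toy_hw_addr_list {
    struct toy_hw_addr entries[MAX_ADDRS];
    int count;
};

static struct toy_hw_addr *toy_find(struct toy_hw_addr_list *l,
                                    const unsigned char *addr)
{
    for (int i = 0; i < l->count; i++)
        if (memcmp(l->entries[i].addr, addr, ADDR_LEN) == 0)
            return &l->entries[i];
    return NULL;
}

/* Toy version of the add-with-sync path, with fix part 1 applied. */
static int toy_add_synced(struct toy_hw_addr_list *to, const unsigned char *addr)
{
    struct toy_hw_addr *ha = toy_find(to, addr);

    if (ha) {
        if (ha->synced)
            return -EEXIST;         /* before the fix this returned 0 */
        ha->synced = true;
        ha->refcount++;
        return 0;
    }
    if (to->count == MAX_ADDRS)
        return -ENOMEM;
    ha = &to->entries[to->count++];
    memcpy(ha->addr, addr, ADDR_LEN);
    ha->refcount = 1;
    ha->sync_cnt = 0;
    ha->synced = true;
    return 0;
}

/* Toy version of the per-address sync step, with fix part 2 applied. */
static int toy_sync_one(struct toy_hw_addr_list *to, struct toy_hw_addr *from_ha)
{
    int err = toy_add_synced(to, from_ha->addr);

    if (err == -EEXIST)
        return 0;                   /* already synced: take no new reference */
    if (err)
        return err;
    from_ha->sync_cnt++;
    from_ha->refcount++;
    return 0;
}
```

Repeating the sync for the same address leaves sync_cnt at 1 and refcount at 2 on the "from" entry, instead of incrementing them on every pass.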