1. 30 8月, 2013 4 次提交
    • V
      net: add netdev_upper_get_next_dev_rcu(dev, iter) · 48311f46
      Veaceslav Falico 提交于
      This function returns the next dev in the dev->upper_dev_list after the
      struct list_head **iter position, and updates *iter accordingly. Returns
      NULL if there are no devices left.
      
      Caller must hold RCU read lock.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Cong Wang <amwang@redhat.com>
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      48311f46
    • V
      net: remove search_list from netdev_adjacent · 620f3186
      Veaceslav Falico 提交于
      We already don't need it cause we see every upper/lower device in the list
      already.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Cong Wang <amwang@redhat.com>
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      620f3186
    • V
      net: add lower_dev_list to net_device and make a full mesh · 5d261913
      Veaceslav Falico 提交于
      This patch adds lower_dev_list list_head to net_device, which is the same
      as upper_dev_list, only for lower devices, and begins to use it in the same
      way as the upper list.
      
      It also changes the way the whole adjacent device lists work - now they
      contain *all* of upper/lower devices, not only the first level. The first
      level devices are distinguished by the bool neighbour field in
      netdev_adjacent, also added by this patch.
      
      There are cases when a device can be added several times to the adjacent
      list, the simplest would be:
      
           /---- eth0.10 ---\
      eth0-		       --- bond0
           \---- eth0.20 ---/
      
      where both bond0 and eth0 'see' each other in the adjacent lists two times.
      To avoid duplication of netdev_adjacent structures ref_nr is being kept as
      the number of times the device was added to the list.
      
      The 'full view' is achieved by adding, on link creation, all of the
      upper_dev's upper_dev_list devices as upper devices to all of the
      lower_dev's lower_dev_list devices (and to the lower_dev itself), and vice
      versa. On unlink they are removed using the same logic.
      
      I've tested it with thousands vlans/bonds/bridges, everything works ok and
      no observable lags even on a huge number of interfaces.
      
      Memory footprint for 128 devices interconnected with each other via both
      upper and lower (which is impossible, but for the comparison) lists would be:
      
      128*128*2*sizeof(netdev_adjacent) = 1.5MB
      
      but in the real world we usualy have at most several devices with slaves
      and a lot of vlans, so the footprint will be much lower.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Cong Wang <amwang@redhat.com>
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5d261913
    • V
      net: rename netdev_upper to netdev_adjacent · aa9d8560
      Veaceslav Falico 提交于
      Rename the structure to reflect the upcoming addition of lower_dev_list.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Cong Wang <amwang@redhat.com>
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aa9d8560
  2. 15 8月, 2013 1 次提交
  3. 31 7月, 2013 1 次提交
  4. 25 7月, 2013 1 次提交
  5. 19 7月, 2013 1 次提交
    • E
      vlan: mask vlan prio bits · d4b812de
      Eric Dumazet 提交于
      In commit 48cc32d3
      ("vlan: don't deliver frames for unknown vlans to protocols")
      Florian made sure we set pkt_type to PACKET_OTHERHOST
      if the vlan id is set and we could find a vlan device for this
      particular id.
      
      But we also have a problem if prio bits are set.
      
      Steinar reported an issue on a router receiving IPv6 frames with a
      vlan tag of 4000 (id 0, prio 2), and tunneled into a sit device,
      because skb->vlan_tci is set.
      
      Forwarded frame is completely corrupted : We can see (8100:4000)
      being inserted in the middle of IPv6 source address :
      
      16:48:00.780413 IP6 2001:16d8:8100:4000:ee1c:0:9d9:bc87 >
      9f94:4d95:2001:67c:29f4::: ICMP6, unknown icmp6 type (0), length 64
             0x0000:  0000 0029 8000 c7c3 7103 0001 a0ae e651
             0x0010:  0000 0000 ccce 0b00 0000 0000 1011 1213
             0x0020:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223
             0x0030:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233
      
      It seems we are not really ready to properly cope with this right now.
      
      We can probably do better in future kernels :
      vlan_get_ingress_priority() should be a netdev property instead of
      a per vlan_dev one.
      
      For stable kernels, lets clear vlan_tci to fix the bugs.
      Reported-by: NSteinar H. Gunderson <sesse@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d4b812de
  6. 12 7月, 2013 1 次提交
    • A
      gso: Update tunnel segmentation to support Tx checksum offload · cdbaa0bb
      Alexander Duyck 提交于
      This change makes it so that the GRE and VXLAN tunnels can make use of Tx
      checksum offload support provided by some drivers via the hw_enc_features.
      Without this fix enabling GSO means sacrificing Tx checksum offload and
      this actually leads to a performance regression as shown below:
      
                  Utilization
                  Send
      Throughput  local         GSO
      10^6bits/s  % S           state
        6276.51   8.39          enabled
        7123.52   8.42          disabled
      
      To resolve this it was necessary to address two items.  First
      netif_skb_features needed to be updated so that it would correctly handle
      the Trans Ether Bridging protocol without impacting the need to check for
      Q-in-Q tagging.  To do this it was necessary to update harmonize_features
      so that it used skb_network_protocol instead of just using the outer
      protocol.
      
      Second it was necessary to update the GRE and UDP tunnel segmentation
      offloads so that they would reset the encapsulation bit and inner header
      offsets after the offload was complete.
      
      As a result of this change I have seen the following results on a interface
      with Tx checksum enabled for encapsulated frames:
      
                  Utilization
                  Send
      Throughput  local         GSO
      10^6bits/s  % S           state
        7123.52   8.42          disabled
        8321.75   5.43          enabled
      
      v2: Instead of replacing refrence to skb->protocol with
          skb_network_protocol just replace the protocol reference in
          harmonize_features to allow for double VLAN tag checks.
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cdbaa0bb
  7. 03 7月, 2013 1 次提交
    • I
      core/dev: set pkt_type after eth_type_trans() in dev_forward_skb() · 06a23fe3
      Isaku Yamahata 提交于
      The dev_forward_skb() assignment of pkt_type should be done
      after the call to eth_type_trans().
      
      ip-encapsulated packets can be handled by localhost. But skb->pkt_type
      can be PACKET_OTHERHOST when packet comes via veth into ip tunnel device.
      In that case, the packet is dropped by ip_rcv().
      Although this example uses gretap. l2tp-eth also has same issue.
      For l2tp-eth case, add dummy device for ip address and ip l2tp command.
      
      netns A |                     root netns                      | netns B
         veth<->veth=bridge=gretap <-loop back-> gretap=bridge=veth<->veth
      
      arp packet ->
      pkt_type
               BROADCAST------------>ip_rcv()------------------------>
      
                                                                   <- arp reply
                                                                      pkt_type
                                     ip_rcv()<-----------------OTHERHOST
                                     drop
      
      sample operations
        ip link add tapa type gretap remote 172.17.107.4 local 172.17.107.3
        ip link add tapb type gretap remote 172.17.107.3 local 172.17.107.4
        ip link set tapa up
        ip link set tapb up
        ip address add 172.17.107.3 dev tapa
        ip address add 172.17.107.4 dev tapb
        ip route get 172.17.107.3
        > local 172.17.107.3 dev lo  src 172.17.107.3
        >    cache <local>
        ip route get 172.17.107.4
        > local 172.17.107.4 dev lo  src 172.17.107.4
        >    cache <local>
        ip link add vetha type veth peer name vetha-peer
        ip link add vethb type veth peer name vethb-peer
        brctl addbr bra
        brctl addbr brb
        brctl addif bra tapa
        brctl addif bra vetha-peer
        brctl addif brb tapb
        brctl addif brb vethb-peer
        brctl show
        > bridge name     bridge id               STP enabled     interfaces
        > bra             8000.6ea21e758ff1       no              tapa
        >                                                         vetha-peer
        > brb             8000.420020eb92d5       no              tapb
        >                                                         vethb-peer
        ip link set vetha-peer up
        ip link set vethb-peer up
        ip link set bra up
        ip link set brb up
        ip netns add a
        ip netns add b
        ip link set vetha netns a
        ip link set vethb netns b
        ip netns exec a ip address add 10.0.0.3/24 dev vetha
        ip netns exec b ip address add 10.0.0.4/24 dev vethb
        ip netns exec a ip link set vetha up
        ip netns exec b ip link set vethb up
        ip netns exec a arping -I vetha 10.0.0.4
        ARPING 10.0.0.4 from 10.0.0.3 vetha
        ^CSent 2 probes (2 broadcast(s))
        Received 0 response(s)
      
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Hong Zhiguo <honkiko@gmail.com>
      Cc: Rami Rosen <ramirose@gmail.com>
      Cc: Tom Parkin <tparkin@katalix.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: dev@openvswitch.org
      Signed-off-by: NIsaku Yamahata <yamahata@valinux.co.jp>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      06a23fe3
  8. 28 6月, 2013 1 次提交
  9. 27 6月, 2013 1 次提交
    • N
      net: fix kernel deadlock with interface rename and netdev name retrieval. · 5dbe7c17
      Nicolas Schichan 提交于
      When the kernel (compiled with CONFIG_PREEMPT=n) is performing the
      rename of a network interface, it can end up waiting for a workqueue
      to complete. If userland is able to invoke a SIOCGIFNAME ioctl or a
      SO_BINDTODEVICE getsockopt in between, the kernel will deadlock due to
      the fact that read_secklock_begin() will spin forever waiting for the
      writer process (the one doing the interface rename) to update the
      devnet_rename_seq sequence.
      
      This patch fixes the problem by adding a helper (netdev_get_name())
      and using it in the code handling the SIOCGIFNAME ioctl and
      SO_BINDTODEVICE setsockopt.
      
      The netdev_get_name() helper uses raw_seqcount_begin() to avoid
      spinning forever, waiting for devnet_rename_seq->sequence to become
      even. cond_resched() is used in the contended case, before retrying
      the access to give the writer process a chance to finish.
      
      The use of raw_seqcount_begin() will incur some unneeded work in the
      reader process in the contended case, but this is better than
      deadlocking the system.
      Signed-off-by: NNicolas Schichan <nschichan@freebox.fr>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5dbe7c17
  10. 24 6月, 2013 1 次提交
  11. 11 6月, 2013 1 次提交
  12. 05 6月, 2013 1 次提交
  13. 29 5月, 2013 4 次提交
  14. 28 5月, 2013 2 次提交
    • D
      netpoll: remove return value from netpoll_rx_disable() · da6e378b
      dingtianhong 提交于
      The netpoll_rx_disable() will always return 0, it is no use and looks wordy,
      so remove the unnecessary code and get rid of it in _dev_open and _dev_close.
      Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      da6e378b
    • S
      MPLS: Add limited GSO support · 0d89d203
      Simon Horman 提交于
      In the case where a non-MPLS packet is received and an MPLS stack is
      added it may well be the case that the original skb is GSO but the
      NIC used for transmit does not support GSO of MPLS packets.
      
      The aim of this code is to provide GSO in software for MPLS packets
      whose skbs are GSO.
      
      SKB Usage:
      
      When an implementation adds an MPLS stack to a non-MPLS packet it should do
      the following to skb metadata:
      
      * Set skb->inner_protocol to the old non-MPLS ethertype of the packet.
        skb->inner_protocol is added by this patch.
      
      * Set skb->protocol to the new MPLS ethertype of the packet.
      
      * Set skb->network_header to correspond to the
        end of the L3 header, including the MPLS label stack.
      
      I have posted a patch, "[PATCH v3.29] datapath: Add basic MPLS support to
      kernel" which adds MPLS support to the kernel datapath of Open vSwtich.
      That patch sets the above requirements in datapath/actions.c:push_mpls()
      and was used to exercise this code.  The datapath patch is against the Open
      vSwtich tree but it is intended that it be added to the Open vSwtich code
      present in the mainline Linux kernel at some point.
      
      Features:
      
      I believe that the approach that I have taken is at least partially
      consistent with the handling of other protocols.  Jesse, I understand that
      you have some ideas here.  I am more than happy to change my implementation.
      
      This patch adds dev->mpls_features which may be used by devices
      to advertise features supported for MPLS packets.
      
      A new NETIF_F_MPLS_GSO feature is added for devices which support
      hardware MPLS GSO offload.  Currently no devices support this
      and MPLS GSO always falls back to software.
      
      Alternate Implementation:
      
      One possible alternate implementation is to teach netif_skb_features()
      and skb_network_protocol() about MPLS, in a similar way to their
      understanding of VLANs. I believe this would avoid the need
      for net/mpls/mpls_gso.c and in particular the calls to
      __skb_push() and __skb_push() in mpls_gso_segment().
      
      I have decided on the implementation in this patch as it should
      not introduce any overhead in the case where mpls_gso is not compiled
      into the kernel or inserted as a module.
      
      MPLS GSO suggested by Jesse Gross.
      Based in part on "v4 GRE: Add TCP segmentation offload for GRE"
      by Pravin B Shelar.
      
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NSimon Horman <horms@verge.net.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0d89d203
  15. 26 5月, 2013 1 次提交
  16. 21 5月, 2013 1 次提交
    • W
      rps: selective flow shedding during softnet overflow · 99bbc707
      Willem de Bruijn 提交于
      A cpu executing the network receive path sheds packets when its input
      queue grows to netdev_max_backlog. A single high rate flow (such as a
      spoofed source DoS) can exceed a single cpu processing rate and will
      degrade throughput of other flows hashed onto the same cpu.
      
      This patch adds a more fine grained hashtable. If the netdev backlog
      is above a threshold, IRQ cpus track the ratio of total traffic of
      each flow (using 4096 buckets, configurable). The ratio is measured
      by counting the number of packets per flow over the last 256 packets
      from the source cpu. Any flow that occupies a large fraction of this
      (set at 50%) will see packet drop while above the threshold.
      
      Tested:
      Setup is a muli-threaded UDP echo server with network rx IRQ on cpu0,
      kernel receive (RPS) on cpu0 and application threads on cpus 2--7
      each handling 20k req/s. Throughput halves when hit with a 400 kpps
      antagonist storm. With this patch applied, antagonist overload is
      dropped and the server processes its complete load.
      
      The patch is effective when kernel receive processing is the
      bottleneck. The above RPS scenario is a extreme, but the same is
      reached with RFS and sufficient kernel processing (iptables, packet
      socket tap, ..).
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      99bbc707
  17. 18 5月, 2013 1 次提交
  18. 09 5月, 2013 1 次提交
  19. 03 5月, 2013 1 次提交
  20. 30 4月, 2013 1 次提交
  21. 25 4月, 2013 1 次提交
  22. 23 4月, 2013 1 次提交
  23. 20 4月, 2013 4 次提交
    • B
      net: rate-limit warn-bad-offload splats. · c846ad9b
      Ben Greear 提交于
      If one does do something unfortunate and allow a
      bad offload bug into the kernel, this the
      skb_warn_bad_offload can effectively live-lock the
      system, filling the logs with the same error over
      and over.
      
      Add rate limitation to this so that box remains otherwise
      functional in this case.
      Signed-off-by: NBen Greear <greearb@candelatech.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c846ad9b
    • P
      net: vlan: add 802.1ad support · 8ad227ff
      Patrick McHardy 提交于
      Add support for 802.1ad VLAN devices. This mainly consists of checking for
      ETH_P_8021AD in addition to ETH_P_8021Q in a couple of places and check
      offloading capabilities based on the used protocol.
      
      Configuration is done using "ip link":
      
      # ip link add link eth0 eth0.1000 \
      	type vlan proto 802.1ad id 1000
      # ip link add link eth0.1000 eth0.1000.1000 \
      	type vlan proto 802.1q id 1000
      
      52:54:00:12:34:56 > 92:b1:54:28:e4:8c, ethertype 802.1Q (0x8100), length 106: vlan 1000, p 0, ethertype 802.1Q, vlan 1000, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
          20.1.0.2 > 20.1.0.1: ICMP echo request, id 3003, seq 8, length 64
      92:b1:54:28:e4:8c > 52:54:00:12:34:56, ethertype 802.1Q-QinQ (0x88a8), length 106: vlan 1000, p 0, ethertype 802.1Q, vlan 1000, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 47944, offset 0, flags [none], proto ICMP (1), length 84)
          20.1.0.1 > 20.1.0.2: ICMP echo reply, id 3003, seq 8, length 64
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8ad227ff
    • P
      net: vlan: add protocol argument to packet tagging functions · 86a9bad3
      Patrick McHardy 提交于
      Add a protocol argument to the VLAN packet tagging functions. In case of HW
      tagging, we need that protocol available in the ndo_start_xmit functions,
      so it is stored in a new field in the skb. The new field fits into a hole
      (on 64 bit) and doesn't increase the sks's size.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      86a9bad3
    • P
      net: vlan: rename NETIF_F_HW_VLAN_* feature flags to NETIF_F_HW_VLAN_CTAG_* · f646968f
      Patrick McHardy 提交于
      Rename the hardware VLAN acceleration features to include "CTAG" to indicate
      that they only support CTAGs. Follow up patches will introduce 802.1ad
      server provider tagging (STAGs) and require the distinction for hardware not
      supporting acclerating both.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f646968f
  24. 06 4月, 2013 1 次提交
    • P
      netfilter: don't reset nf_trace in nf_reset() · 124dff01
      Patrick McHardy 提交于
      Commit 130549fe ("netfilter: reset nf_trace in nf_reset") added code
      to reset nf_trace in nf_reset(). This is wrong and unnecessary.
      
      nf_reset() is used in the following cases:
      
      - when passing packets up the the socket layer, at which point we want to
        release all netfilter references that might keep modules pinned while
        the packet is queued. nf_trace doesn't matter anymore at this point.
      
      - when encapsulating or decapsulating IPsec packets. We want to continue
        tracing these packets after IPsec processing.
      
      - when passing packets through virtual network devices. Only devices on
        that encapsulate in IPv4/v6 matter since otherwise nf_trace is not
        used anymore. Its not entirely clear whether those packets should
        be traced after that, however we've always done that.
      
      - when passing packets through virtual network devices that make the
        packet cross network namespace boundaries. This is the only cases
        where we clearly want to reset nf_trace and is also what the
        original patch intended to fix.
      
      Add a new function nf_reset_trace() and use it in dev_forward_skb() to
      fix this properly.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      124dff01
  25. 30 3月, 2013 2 次提交
    • E
      net: add a synchronize_net() in netdev_rx_handler_unregister() · 00cfec37
      Eric Dumazet 提交于
      commit 35d48903 (bonding: fix rx_handler locking) added a race
      in bonding driver, reported by Steven Rostedt who did a very good
      diagnosis :
      
      <quoting Steven>
      
      I'm currently debugging a crash in an old 3.0-rt kernel that one of our
      customers is seeing. The bug happens with a stress test that loads and
      unloads the bonding module in a loop (I don't know all the details as
      I'm not the one that is directly interacting with the customer). But the
      bug looks to be something that may still be present and possibly present
      in mainline too. It will just be much harder to trigger it in mainline.
      
      In -rt, interrupts are threads, and can schedule in and out just like
      any other thread. Note, mainline now supports interrupt threads so this
      may be easily reproducible in mainline as well. I don't have the ability
      to tell the customer to try mainline or other kernels, so my hands are
      somewhat tied to what I can do.
      
      But according to a core dump, I tracked down that the eth irq thread
      crashed in bond_handle_frame() here:
      
              slave = bond_slave_get_rcu(skb->dev);
              bond = slave->bond; <--- BUG
      
      the slave returned was NULL and accessing slave->bond caused a NULL
      pointer dereference.
      
      Looking at the code that unregisters the handler:
      
      void netdev_rx_handler_unregister(struct net_device *dev)
      {
      
              ASSERT_RTNL();
              RCU_INIT_POINTER(dev->rx_handler, NULL);
              RCU_INIT_POINTER(dev->rx_handler_data, NULL);
      }
      
      Which is basically:
              dev->rx_handler = NULL;
              dev->rx_handler_data = NULL;
      
      And looking at __netif_receive_skb() we have:
      
              rx_handler = rcu_dereference(skb->dev->rx_handler);
              if (rx_handler) {
                      if (pt_prev) {
                              ret = deliver_skb(skb, pt_prev, orig_dev);
                              pt_prev = NULL;
                      }
                      switch (rx_handler(&skb)) {
      
      My question to all of you is, what stops this interrupt from happening
      while the bonding module is unloading?  What happens if the interrupt
      triggers and we have this:
      
              CPU0                    CPU1
              ----                    ----
        rx_handler = skb->dev->rx_handler
      
                              netdev_rx_handler_unregister() {
                                 dev->rx_handler = NULL;
                                 dev->rx_handler_data = NULL;
      
        rx_handler()
         bond_handle_frame() {
          slave = skb->dev->rx_handler;
          bond = slave->bond; <-- NULL pointer dereference!!!
      
      What protection am I missing in the bond release handler that would
      prevent the above from happening?
      
      </quoting Steven>
      
      We can fix bug this in two ways. First is adding a test in
      bond_handle_frame() and others to check if rx_handler_data is NULL.
      
      A second way is adding a synchronize_net() in
      netdev_rx_handler_unregister() to make sure that a rcu protected reader
      has the guarantee to see a non NULL rx_handler_data.
      
      The second way is better as it avoids an extra test in fast path.
      Reported-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Jiri Pirko <jpirko@redhat.com>
      Cc: Paul E. McKenney <paulmck@us.ibm.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      00cfec37
    • S
      net: core: Remove redundant call to 'nf_reset' in 'dev_forward_skb' · a561cf7e
      Shmulik Ladkani 提交于
      'nf_reset' is called just prior calling 'netif_rx'.
      No need to call it twice.
      Reported-by: NIgor Michailov <rgohita@gmail.com>
      Signed-off-by: NShmulik Ladkani <shmulik.ladkani@gmail.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a561cf7e
  26. 27 3月, 2013 1 次提交
  27. 25 3月, 2013 1 次提交
    • E
      net: remove a WARN_ON() in net_enable_timestamp() · 9979a55a
      Eric Dumazet 提交于
      The WARN_ON(in_interrupt()) in net_enable_timestamp() can get false
      positive, in socket clone path, run from softirq context :
      
      [ 3641.624425] WARNING: at net/core/dev.c:1532 net_enable_timestamp+0x7b/0x80()
      [ 3641.668811] Call Trace:
      [ 3641.671254]  <IRQ>  [<ffffffff80286817>] warn_slowpath_common+0x87/0xc0
      [ 3641.677871]  [<ffffffff8028686a>] warn_slowpath_null+0x1a/0x20
      [ 3641.683683]  [<ffffffff80742f8b>] net_enable_timestamp+0x7b/0x80
      [ 3641.689668]  [<ffffffff80732ce5>] sk_clone_lock+0x425/0x450
      [ 3641.695222]  [<ffffffff8078db36>] inet_csk_clone_lock+0x16/0x170
      [ 3641.701213]  [<ffffffff807ae449>] tcp_create_openreq_child+0x29/0x820
      [ 3641.707663]  [<ffffffff807d62e2>] ? ipt_do_table+0x222/0x670
      [ 3641.713354]  [<ffffffff807aaf5b>] tcp_v4_syn_recv_sock+0xab/0x3d0
      [ 3641.719425]  [<ffffffff807af63a>] tcp_check_req+0x3da/0x530
      [ 3641.724979]  [<ffffffff8078b400>] ? inet_hashinfo_init+0x60/0x80
      [ 3641.730964]  [<ffffffff807ade6f>] ? tcp_v4_rcv+0x79f/0xbe0
      [ 3641.736430]  [<ffffffff807ab9bd>] tcp_v4_do_rcv+0x38d/0x4f0
      [ 3641.741985]  [<ffffffff807ae14a>] tcp_v4_rcv+0xa7a/0xbe0
      
      Its safe at this point because the parent socket owns a reference
      on the netstamp_needed, so we cant have a 0 -> 1 transition, which
      requires to lock a mutex.
      
      Instead of refining the check, lets remove it, as all known callers
      are safe. If it ever changes in the future, static_key_slow_inc()
      will complain anyway.
      Reported-by: NLaurent Chavey <chavey@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9979a55a
  28. 19 3月, 2013 1 次提交
  29. 12 3月, 2013 1 次提交