1. 20 4月, 2013 3 次提交
    • P
      net: vlan: add 802.1ad support · 8ad227ff
      Patrick McHardy 提交于
      Add support for 802.1ad VLAN devices. This mainly consists of checking for
      ETH_P_8021AD in addition to ETH_P_8021Q in a couple of places and check
      offloading capabilities based on the used protocol.
      
      Configuration is done using "ip link":
      
      # ip link add link eth0 eth0.1000 \
      	type vlan proto 802.1ad id 1000
      # ip link add link eth0.1000 eth0.1000.1000 \
      	type vlan proto 802.1q id 1000
      
      52:54:00:12:34:56 > 92:b1:54:28:e4:8c, ethertype 802.1Q (0x8100), length 106: vlan 1000, p 0, ethertype 802.1Q, vlan 1000, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
          20.1.0.2 > 20.1.0.1: ICMP echo request, id 3003, seq 8, length 64
      92:b1:54:28:e4:8c > 52:54:00:12:34:56, ethertype 802.1Q-QinQ (0x88a8), length 106: vlan 1000, p 0, ethertype 802.1Q, vlan 1000, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 47944, offset 0, flags [none], proto ICMP (1), length 84)
          20.1.0.1 > 20.1.0.2: ICMP echo reply, id 3003, seq 8, length 64
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8ad227ff
    • P
      net: vlan: add protocol argument to packet tagging functions · 86a9bad3
      Patrick McHardy 提交于
      Add a protocol argument to the VLAN packet tagging functions. In case of HW
      tagging, we need that protocol available in the ndo_start_xmit functions,
      so it is stored in a new field in the skb. The new field fits into a hole
      (on 64 bit) and doesn't increase the sks's size.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      86a9bad3
    • P
      net: vlan: rename NETIF_F_HW_VLAN_* feature flags to NETIF_F_HW_VLAN_CTAG_* · f646968f
      Patrick McHardy 提交于
      Rename the hardware VLAN acceleration features to include "CTAG" to indicate
      that they only support CTAGs. Follow up patches will introduce 802.1ad
      server provider tagging (STAGs) and require the distinction for hardware not
      supporting acclerating both.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f646968f
  2. 06 4月, 2013 1 次提交
    • P
      netfilter: don't reset nf_trace in nf_reset() · 124dff01
      Patrick McHardy 提交于
      Commit 130549fe ("netfilter: reset nf_trace in nf_reset") added code
      to reset nf_trace in nf_reset(). This is wrong and unnecessary.
      
      nf_reset() is used in the following cases:
      
      - when passing packets up the the socket layer, at which point we want to
        release all netfilter references that might keep modules pinned while
        the packet is queued. nf_trace doesn't matter anymore at this point.
      
      - when encapsulating or decapsulating IPsec packets. We want to continue
        tracing these packets after IPsec processing.
      
      - when passing packets through virtual network devices. Only devices on
        that encapsulate in IPv4/v6 matter since otherwise nf_trace is not
        used anymore. Its not entirely clear whether those packets should
        be traced after that, however we've always done that.
      
      - when passing packets through virtual network devices that make the
        packet cross network namespace boundaries. This is the only cases
        where we clearly want to reset nf_trace and is also what the
        original patch intended to fix.
      
      Add a new function nf_reset_trace() and use it in dev_forward_skb() to
      fix this properly.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      124dff01
  3. 30 3月, 2013 2 次提交
    • E
      net: add a synchronize_net() in netdev_rx_handler_unregister() · 00cfec37
      Eric Dumazet 提交于
      commit 35d48903 (bonding: fix rx_handler locking) added a race
      in bonding driver, reported by Steven Rostedt who did a very good
      diagnosis :
      
      <quoting Steven>
      
      I'm currently debugging a crash in an old 3.0-rt kernel that one of our
      customers is seeing. The bug happens with a stress test that loads and
      unloads the bonding module in a loop (I don't know all the details as
      I'm not the one that is directly interacting with the customer). But the
      bug looks to be something that may still be present and possibly present
      in mainline too. It will just be much harder to trigger it in mainline.
      
      In -rt, interrupts are threads, and can schedule in and out just like
      any other thread. Note, mainline now supports interrupt threads so this
      may be easily reproducible in mainline as well. I don't have the ability
      to tell the customer to try mainline or other kernels, so my hands are
      somewhat tied to what I can do.
      
      But according to a core dump, I tracked down that the eth irq thread
      crashed in bond_handle_frame() here:
      
              slave = bond_slave_get_rcu(skb->dev);
              bond = slave->bond; <--- BUG
      
      the slave returned was NULL and accessing slave->bond caused a NULL
      pointer dereference.
      
      Looking at the code that unregisters the handler:
      
      void netdev_rx_handler_unregister(struct net_device *dev)
      {
      
              ASSERT_RTNL();
              RCU_INIT_POINTER(dev->rx_handler, NULL);
              RCU_INIT_POINTER(dev->rx_handler_data, NULL);
      }
      
      Which is basically:
              dev->rx_handler = NULL;
              dev->rx_handler_data = NULL;
      
      And looking at __netif_receive_skb() we have:
      
              rx_handler = rcu_dereference(skb->dev->rx_handler);
              if (rx_handler) {
                      if (pt_prev) {
                              ret = deliver_skb(skb, pt_prev, orig_dev);
                              pt_prev = NULL;
                      }
                      switch (rx_handler(&skb)) {
      
      My question to all of you is, what stops this interrupt from happening
      while the bonding module is unloading?  What happens if the interrupt
      triggers and we have this:
      
              CPU0                    CPU1
              ----                    ----
        rx_handler = skb->dev->rx_handler
      
                              netdev_rx_handler_unregister() {
                                 dev->rx_handler = NULL;
                                 dev->rx_handler_data = NULL;
      
        rx_handler()
         bond_handle_frame() {
          slave = skb->dev->rx_handler;
          bond = slave->bond; <-- NULL pointer dereference!!!
      
      What protection am I missing in the bond release handler that would
      prevent the above from happening?
      
      </quoting Steven>
      
      We can fix bug this in two ways. First is adding a test in
      bond_handle_frame() and others to check if rx_handler_data is NULL.
      
      A second way is adding a synchronize_net() in
      netdev_rx_handler_unregister() to make sure that a rcu protected reader
      has the guarantee to see a non NULL rx_handler_data.
      
      The second way is better as it avoids an extra test in fast path.
      Reported-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Jiri Pirko <jpirko@redhat.com>
      Cc: Paul E. McKenney <paulmck@us.ibm.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      00cfec37
    • S
      net: core: Remove redundant call to 'nf_reset' in 'dev_forward_skb' · a561cf7e
      Shmulik Ladkani 提交于
      'nf_reset' is called just prior calling 'netif_rx'.
      No need to call it twice.
      Reported-by: NIgor Michailov <rgohita@gmail.com>
      Signed-off-by: NShmulik Ladkani <shmulik.ladkani@gmail.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a561cf7e
  4. 27 3月, 2013 1 次提交
  5. 25 3月, 2013 1 次提交
    • E
      net: remove a WARN_ON() in net_enable_timestamp() · 9979a55a
      Eric Dumazet 提交于
      The WARN_ON(in_interrupt()) in net_enable_timestamp() can get false
      positive, in socket clone path, run from softirq context :
      
      [ 3641.624425] WARNING: at net/core/dev.c:1532 net_enable_timestamp+0x7b/0x80()
      [ 3641.668811] Call Trace:
      [ 3641.671254]  <IRQ>  [<ffffffff80286817>] warn_slowpath_common+0x87/0xc0
      [ 3641.677871]  [<ffffffff8028686a>] warn_slowpath_null+0x1a/0x20
      [ 3641.683683]  [<ffffffff80742f8b>] net_enable_timestamp+0x7b/0x80
      [ 3641.689668]  [<ffffffff80732ce5>] sk_clone_lock+0x425/0x450
      [ 3641.695222]  [<ffffffff8078db36>] inet_csk_clone_lock+0x16/0x170
      [ 3641.701213]  [<ffffffff807ae449>] tcp_create_openreq_child+0x29/0x820
      [ 3641.707663]  [<ffffffff807d62e2>] ? ipt_do_table+0x222/0x670
      [ 3641.713354]  [<ffffffff807aaf5b>] tcp_v4_syn_recv_sock+0xab/0x3d0
      [ 3641.719425]  [<ffffffff807af63a>] tcp_check_req+0x3da/0x530
      [ 3641.724979]  [<ffffffff8078b400>] ? inet_hashinfo_init+0x60/0x80
      [ 3641.730964]  [<ffffffff807ade6f>] ? tcp_v4_rcv+0x79f/0xbe0
      [ 3641.736430]  [<ffffffff807ab9bd>] tcp_v4_do_rcv+0x38d/0x4f0
      [ 3641.741985]  [<ffffffff807ae14a>] tcp_v4_rcv+0xa7a/0xbe0
      
      Its safe at this point because the parent socket owns a reference
      on the netstamp_needed, so we cant have a 0 -> 1 transition, which
      requires to lock a mutex.
      
      Instead of refining the check, lets remove it, as all known callers
      are safe. If it ever changes in the future, static_key_slow_inc()
      will complain anyway.
      Reported-by: NLaurent Chavey <chavey@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9979a55a
  6. 19 3月, 2013 1 次提交
  7. 12 3月, 2013 1 次提交
  8. 10 3月, 2013 2 次提交
  9. 09 3月, 2013 1 次提交
  10. 06 3月, 2013 3 次提交
  11. 28 2月, 2013 1 次提交
    • S
      hlist: drop the node parameter from iterators · b67bfe0d
      Sasha Levin 提交于
      I'm not sure why, but the hlist for each entry iterators were conceived
      
              list_for_each_entry(pos, head, member)
      
      The hlist ones were greedy and wanted an extra parameter:
      
              hlist_for_each_entry(tpos, pos, head, member)
      
      Why did they need an extra pos parameter? I'm not quite sure. Not only
      they don't really need it, it also prevents the iterator from looking
      exactly like the list iterator, which is unfortunate.
      
      Besides the semantic patch, there was some manual work required:
      
       - Fix up the actual hlist iterators in linux/list.h
       - Fix up the declaration of other iterators based on the hlist ones.
       - A very small amount of places were using the 'node' parameter, this
       was modified to use 'obj->member' instead.
       - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
       properly, so those had to be fixed up manually.
      
      The semantic patch which is mostly the work of Peter Senna Tschudin is here:
      
      @@
      iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
      
      type T;
      expression a,c,d,e;
      identifier b;
      statement S;
      @@
      
      -T b;
          <+... when != b
      (
      hlist_for_each_entry(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue(a,
      - b,
      c) S
      |
      hlist_for_each_entry_from(a,
      - b,
      c) S
      |
      hlist_for_each_entry_rcu(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_rcu_bh(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue_rcu_bh(a,
      - b,
      c) S
      |
      for_each_busy_worker(a, c,
      - b,
      d) S
      |
      ax25_uid_for_each(a,
      - b,
      c) S
      |
      ax25_for_each(a,
      - b,
      c) S
      |
      inet_bind_bucket_for_each(a,
      - b,
      c) S
      |
      sctp_for_each_hentry(a,
      - b,
      c) S
      |
      sk_for_each(a,
      - b,
      c) S
      |
      sk_for_each_rcu(a,
      - b,
      c) S
      |
      sk_for_each_from
      -(a, b)
      +(a)
      S
      + sk_for_each_from(a) S
      |
      sk_for_each_safe(a,
      - b,
      c, d) S
      |
      sk_for_each_bound(a,
      - b,
      c) S
      |
      hlist_for_each_entry_safe(a,
      - b,
      c, d, e) S
      |
      hlist_for_each_entry_continue_rcu(a,
      - b,
      c) S
      |
      nr_neigh_for_each(a,
      - b,
      c) S
      |
      nr_neigh_for_each_safe(a,
      - b,
      c, d) S
      |
      nr_node_for_each(a,
      - b,
      c) S
      |
      nr_node_for_each_safe(a,
      - b,
      c, d) S
      |
      - for_each_gfn_sp(a, c, d, b) S
      + for_each_gfn_sp(a, c, d) S
      |
      - for_each_gfn_indirect_valid_sp(a, c, d, b) S
      + for_each_gfn_indirect_valid_sp(a, c, d) S
      |
      for_each_host(a,
      - b,
      c) S
      |
      for_each_host_safe(a,
      - b,
      c, d) S
      |
      for_each_mesh_entry(a,
      - b,
      c, d) S
      )
          ...+>
      
      [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
      [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foudnation.org: redo intrusive kvm changes]
      Tested-by: NPeter Senna Tschudin <peter.senna@gmail.com>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b67bfe0d
  12. 23 2月, 2013 1 次提交
  13. 20 2月, 2013 1 次提交
  14. 19 2月, 2013 4 次提交
  15. 16 2月, 2013 3 次提交
  16. 15 2月, 2013 1 次提交
  17. 07 2月, 2013 4 次提交
    • E
      net: reset mac header in dev_start_xmit() · 6d1ccff6
      Eric Dumazet 提交于
      On 64 bit arches :
      
      There is a off-by-one error in qdisc_pkt_len_init() because
      mac_header is not set in xmit path.
      
      skb_mac_header() returns an out of bound value that was
      harmless because hdr_len is an 'unsigned int'
      
      On 32bit arches, the error is abysmal.
      
      This patch is also a prereq for "macvlan: add multicast filter"
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Ben Greear <greearb@candelatech.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6d1ccff6
    • C
      net: adjust skb_gso_segment() for calling in rx path · 12b0004d
      Cong Wang 提交于
      skb_gso_segment() is almost always called in tx path,
      except for openvswitch. It calls this function when
      it receives the packet and tries to queue it to user-space.
      In this special case, the ->ip_summed check inside
      skb_gso_segment() is no longer true, as ->ip_summed value
      has different meanings on rx path.
      
      This patch adjusts skb_gso_segment() so that we can at least
      avoid such warnings on checksum.
      
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NCong Wang <amwang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      12b0004d
    • N
      netpoll: protect napi_poll and poll_controller during dev_[open|close] · ca99ca14
      Neil Horman 提交于
      Ivan Vercera was recently backporting commit
      9c13cb8b to a RHEL kernel, and I noticed that,
      while this patch protects the tg3 driver from having its ndo_poll_controller
      routine called during device initalization, it does nothing for the driver
      during shutdown. I.e. it would be entirely possible to have the
      ndo_poll_controller method (or subsequently the ndo_poll) routine called for a
      driver in the netpoll path on CPU A while in parallel on CPU B, the ndo_close or
      ndo_open routine could be called.  Given that the two latter routines tend to
      initizlize and free many data structures that the former two rely on, the result
      can easily be data corruption or various other crashes.  Furthermore, it seems
      that this is potentially a problem with all net drivers that support netpoll,
      and so this should ideally be fixed in a common path.
      
      As Ben H Pointed out to me, we can't preform dev_open/dev_close in atomic
      context, so I've come up with this solution.  We can use a mutex to sleep in
      open/close paths and just do a mutex_trylock in the napi poll path and abandon
      the poll attempt if we're locked, as we'll just retry the poll on the next send
      anyway.
      
      I've tested this here by flooding netconsole with messages on a system whos nic
      driver I modfied to periodically return NETDEV_TX_BUSY, so that the netpoll tx
      workqueue would be forced to send frames and poll the device.  While this was
      going on I rapidly ifdown/up'ed the interface and watched for any problems.
      I've not found any.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      CC: Ivan Vecera <ivecera@redhat.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Ben Hutchings <bhutchings@solarflare.com>
      CC: Francois Romieu <romieu@fr.zoreil.com>
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ca99ca14
    • J
      net: core: Remove unnecessary alloc/OOM messages · 62b5942a
      Joe Perches 提交于
      alloc failures already get standardized OOM
      messages and a dump_stack.
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      62b5942a
  18. 30 1月, 2013 1 次提交
  19. 28 1月, 2013 1 次提交
    • E
      net: fix possible wrong checksum generation · cef401de
      Eric Dumazet 提交于
      Pravin Shelar mentioned that GSO could potentially generate
      wrong TX checksum if skb has fragments that are overwritten
      by the user between the checksum computation and transmit.
      
      He suggested to linearize skbs but this extra copy can be
      avoided for normal tcp skbs cooked by tcp_sendmsg().
      
      This patch introduces a new SKB_GSO_SHARED_FRAG flag, set
      in skb_shinfo(skb)->gso_type if at least one frag can be
      modified by the user.
      
      Typical sources of such possible overwrites are {vm}splice(),
      sendfile(), and macvtap/tun/virtio_net drivers.
      
      Tested:
      
      $ netperf -H 7.7.8.84
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
      7.7.8.84 () port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    3959.52
      
      $ netperf -H 7.7.8.84 -t TCP_SENDFILE
      TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 ()
      port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    3216.80
      
      Performance of the SENDFILE is impacted by the extra allocation and
      copy, and because we use order-0 pages, while the TCP_STREAM uses
      bigger pages.
      Reported-by: NPravin Shelar <pshelar@nicira.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cef401de
  20. 22 1月, 2013 1 次提交
  21. 16 1月, 2013 1 次提交
  22. 12 1月, 2013 2 次提交
  23. 11 1月, 2013 3 次提交
    • A
      net: Add support for XPS without sysfs being defined · 024e9679
      Alexander Duyck 提交于
      This patch makes it so that we can support transmit packet steering without
      sysfs needing to be enabled.  The reason for making this change is to make
      it so that a driver can make use of the XPS even while the sysfs portion of
      the interface is not present.
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      024e9679
    • A
      net: Rewrite netif_set_xps_queues to address several issues · 01c5f864
      Alexander Duyck 提交于
      This change is meant to address several issues I found within the
      netif_set_xps_queues function.
      
      If the allocation of one of the maps to be assigned to new_dev_maps failed
      we could end up with the device map in an inconsistent state since we had
      already worked through a number of CPUs and removed or added the queue.  To
      address that I split the process into several steps.  The first of which is
      just the allocation of updated maps for CPUs that will need larger maps to
      store the queue.  By doing this we can fail gracefully without actually
      altering the contents of the current device map.
      
      The second issue I found was the fact that we were always allocating a new
      device map even if we were not adding any queues.  I have updated the code
      so that we only allocate a new device map if we are adding queues,
      otherwise if we are not adding any queues to CPUs we just skip to the
      removal process.
      
      The last change I made was to reuse the code from remove_xps_queue to remove
      the queue from the CPU.  By making this change we can be consistent in how
      we go about adding and removing the queues from the CPUs.
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      01c5f864
    • A
      net: Rewrite netif_reset_xps_queue to allow for better code reuse · 10cdc3f3
      Alexander Duyck 提交于
      This patch does a minor refactor on netif_reset_xps_queue to address a few
      items I noticed.
      
      First is the fact that we are doing removal of queues in both
      netif_reset_xps_queue and netif_set_xps_queue.  Since there is no need to
      have the code in two places I am pushing it out into a separate function
      and will come back in another patch and reuse the code in
      netif_set_xps_queue.
      
      The second item this change addresses is the fact that the Tx queues were
      not getting their numa_node value cleared as a part of the XPS queue reset.
      This patch resolves that by resetting the numa_node value if the dev_maps
      value is set.
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      10cdc3f3