1. 04 Jun 2014, 1 commit
  2. 17 May 2014, 4 commits
    • bonding: Fix stacked device detection in arp monitoring · 44a40855
      Committed by Vlad Yasevich
      Prior to commit fbd929f2
      	bonding: support QinQ for bond arp interval
      
      the arp monitoring code allowed for proper detection of devices
      stacked on top of vlans.  Since the above commit, the
      code can still detect a device stacked on top of a single
      vlan, but not a device stacked on top of a Q-in-Q configuration.
      The search will only set the inner vlan tag if the route
      device is the vlan device.  However, this is not always the
      case, as it is possible to extend the stacked configuration.
      
      With this patch it is possible to provision devices on
      top of a Q-in-Q vlan configuration that should be used as
      a source of ARP monitoring information.
      
      For example:
      ip link add link bond0 vlan10 type vlan proto 802.1q id 10
      ip link add link vlan10 vlan100 type vlan proto 802.1q id 100
      ip link add link vlan100 type macvlan
      
      Note:  This patch limits the number of stacked VLANs to 2,
      just like before.  The original, however, had another issue
      in that if we had more than 2 levels of VLANs, we would end
      up generating incorrectly tagged traffic.  This is no longer
      possible.
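
      The fix conceptually walks the device chain between the ARP source
      device and the bond, collecting at most two VLAN tags (outer and
      inner) and refusing deeper stacks instead of emitting mistagged
      frames.  Below is a minimal userspace sketch of that idea; it is
      not the bonding driver code, and the names in it are made up.

      #include <stdio.h>

      struct dev {                    /* toy stand-in for struct net_device */
              const char *name;
              struct dev *lower;      /* next device towards the bond       */
              int is_vlan;
              unsigned short vlan_id;
      };

      /* Collect up to two VLAN tags, innermost first; return -1 if the
       * stack is deeper than Q-in-Q. */
      static int collect_vlan_tags(struct dev *start, struct dev *bond,
                                   unsigned short tags[2])
      {
              int n = 0;

              for (struct dev *d = start; d && d != bond; d = d->lower) {
                      if (!d->is_vlan)
                              continue;       /* e.g. a macvlan adds no tag */
                      if (n == 2)
                              return -1;      /* more than 2 levels: refuse */
                      tags[n++] = d->vlan_id;
              }
              return n;
      }

      int main(void)
      {
              struct dev bond  = { "bond0",    NULL,  0, 0 };
              struct dev v10   = { "vlan10",   &bond, 1, 10 };
              struct dev v100  = { "vlan100",  &v10,  1, 100 };
              struct dev mvlan = { "macvlan0", &v100, 0, 0 };
              unsigned short tags[2] = { 0, 0 };
              int n = collect_vlan_tags(&mvlan, &bond, tags);

              printf("%d tag(s), inner %u, outer %u\n", n, tags[0], tags[1]);
              return 0;
      }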
      
      Fixes: fbd929f2 (bonding: support QinQ for bond arp interval)
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@redhat.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: Ding Tianhong <dingtianhong@huawei.com>
      CC: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      44a40855
    • vlan: Fix lockdep warning with stacked vlan devices. · d38569ab
      Committed by Vlad Yasevich
      This reverts commit dc8eaaa0.
      	vlan: Fix lockdep warning when vlan dev handle notification
      
      Instead, we use the new API to find the lock subclass of
      our vlan device.  This way we can support configurations where
      vlans are interspersed with other devices:
        bond -> vlan -> macvlan -> vlan
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d38569ab
    • net: Find the nesting level of a given device by type. · 4085ebe8
      Committed by Vlad Yasevich
      Multiple devices in the kernel can be stacked/nested and they
      need to know their nesting level for the purposes of lockdep.
      This patch provides a generic function that determines the nesting
      level of a particular device by its type (e.g. vlan, macvlan, etc.).
      We only care about nesting of devices of the same type.
      
      For example:
        eth0 <- vlan0.10 <- macvlan0 <- vlan1.20
      
      The nesting level of vlan1.20 would be 1, since there is another vlan
      in the stack under it.
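
      A minimal userspace sketch of the counting rule (not the kernel
      implementation; the structures here are simplified stand-ins):

      #include <stdio.h>

      enum dev_type { ETH, VLAN, MACVLAN };

      struct dev {
              const char *name;
              enum dev_type type;
              struct dev *lower;      /* single lower device, for brevity */
      };

      /* Nesting level = number of devices of the same type below us. */
      static int nest_level(const struct dev *dev)
      {
              int level = 0;

              for (const struct dev *d = dev->lower; d; d = d->lower)
                      if (d->type == dev->type)
                              level++;
              return level;
      }

      int main(void)
      {
              struct dev eth0 = { "eth0",     ETH,     NULL  };
              struct dev v10  = { "vlan0.10", VLAN,    &eth0 };
              struct dev mv0  = { "macvlan0", MACVLAN, &v10  };
              struct dev v20  = { "vlan1.20", VLAN,    &mv0  };

              printf("%s nest level = %d\n", v20.name, nest_level(&v20));
              return 0;       /* prints 1: one other vlan below vlan1.20 */
      }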
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4085ebe8
    • net: gro: make sure skb->cb[] initial content has not to be zero · 29e98242
      Committed by Eric Dumazet
      Starting from linux-3.13, GRO attempts to build full size skbs.
      
      The problem is that the commit assumed one particular field in skb->cb[]
      was clean, but that is not the case on some stacked devices.
      
      Timo reported a crash when traffic is decrypted before
      reaching a GRE device.
      
      Fix this by initializing NAPI_GRO_CB(skb)->last at the right place,
      this also removes one conditional.
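
      A hedged illustration of the bug class (not the GRO code itself):
      state kept in a shared scratch area like skb->cb[] has to be
      initialized where the subsystem takes ownership of the buffer,
      because earlier layers (tunnels, decryption) may leave their own
      data behind.

      #include <stdio.h>
      #include <string.h>

      struct pkt {
              /* keep cb[] pointer-aligned, like skb->cb is in practice */
              _Alignas(void *) char cb[48];   /* contents undefined here */
      };

      struct gro_cb {
              struct pkt *last;       /* tail of the chain GRO builds */
      };

      #define GRO_CB(p) ((struct gro_cb *)(p)->cb)

      static void gro_receive(struct pkt *p)
      {
              /* the fix: set ->last explicitly on entry instead of
               * assuming the previous layer left cb[] zeroed */
              GRO_CB(p)->last = p;
      }

      int main(void)
      {
              struct pkt p;

              memset(p.cb, 0xa5, sizeof(p.cb));  /* dirty cb[] left behind
                                                    by a decrypting layer */
              gro_receive(&p);
              printf("last points to self: %d\n", GRO_CB(&p)->last == &p);
              return 0;
      }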
      
      Thanks a lot to Timo for providing full reports and bisecting this.
      
      Fixes: 8a29111c ("net: gro: allow to build full sized skb")
      Bisected-by: Timo Teras <timo.teras@iki.fi>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Tested-by: Timo Teräs <timo.teras@iki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      29e98242
  3. 16 May 2014, 1 commit
  4. 08 May 2014, 1 commit
  5. 06 May 2014, 1 commit
    • unregister_netdevice : move RTM_DELLINK to until after ndo_uninit · 56bfa7ee
      Committed by Roopa Prabhu
      This patch fixes the ordering of rtnl notifications during unregister_netdevice
      by moving the RTM_DELLINK notification until after ndo_uninit.
      
      The problem was seen with unregistering bond netdevices.
      
      The bond ndo_uninit callback generates a few RTM_NEWLINK notifications for
      NETDEV_CHANGEADDR and NETDEV_FEAT_CHANGE. This is seen mostly when the
      bond is deleted with slaves still enslaved to it.
      
      During netdevice unregister (rollback_registered_many, to be specific),
      the bond ndo_uninit is called after the RTM_DELLINK notification goes out.
      This results in userspace seeing RTM_DELLINK followed by a couple of
      RTM_NEWLINKs.
      
      In userspace the problem was seen with libnl. The libnl cache deletes the
      bond when it sees RTM_DELLINK and re-adds it on the following RTM_NEWLINK,
      resulting in a stale bond entry in the libnl cache when the kernel has
      already deleted the bond.
      
      This patch has been tested for bond, bridges and vlan devices.
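
      A hedged userspace model of why the ordering matters (no kernel or
      libnl code; every name below is made up): a cache that drops an
      entry on DELLINK and re-adds it on NEWLINK is left stale unless
      DELLINK comes last.

      #include <stdio.h>

      enum msg { NEWLINK, DELLINK };

      static int cached;                      /* does the cache hold the bond? */

      static void cache_handle(enum msg m)
      {
              cached = (m == NEWLINK);
      }

      static void unregister_bond(int dellink_last)
      {
              cached = 1;
              if (!dellink_last)
                      cache_handle(DELLINK);  /* old order: DELLINK first   */
              cache_handle(NEWLINK);          /* ndo_uninit side effects    */
              cache_handle(NEWLINK);
              if (dellink_last)
                      cache_handle(DELLINK);  /* fixed order: DELLINK last  */
              printf("dellink_last=%d -> stale entry: %s\n",
                     dellink_last, cached ? "yes" : "no");
      }

      int main(void)
      {
              unregister_bond(0);             /* old ordering: stale entry  */
              unregister_bond(1);             /* new ordering: cache clean  */
              return 0;
      }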
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      56bfa7ee
  6. 21 Apr 2014, 1 commit
  7. 19 Apr 2014, 1 commit
    • vlan: Fix lockdep warning when vlan dev handle notification · dc8eaaa0
      Committed by dingtianhong
      When I enable the LOCKDEP config and run these steps:
      
      modprobe 8021q
      vconfig add eth2 20
      vconfig add eth2.20 30
      ifconfig eth2 xx.xx.xx.xx
      
      then the following call trace appears:
      
      [32524.386288] =============================================
      [32524.386293] [ INFO: possible recursive locking detected ]
      [32524.386298] 3.14.0-rc2-0.7-default+ #35 Tainted: G           O
      [32524.386302] ---------------------------------------------
      [32524.386306] ifconfig/3103 is trying to acquire lock:
      [32524.386310]  (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff814275f4>] dev_mc_sync+0x64/0xb0
      [32524.386326]
      [32524.386326] but task is already holding lock:
      [32524.386330]  (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff8141af83>] dev_set_rx_mode+0x23/0x40
      [32524.386341]
      [32524.386341] other info that might help us debug this:
      [32524.386345]  Possible unsafe locking scenario:
      [32524.386345]
      [32524.386350]        CPU0
      [32524.386352]        ----
      [32524.386354]   lock(&vlan_netdev_addr_lock_key/1);
      [32524.386359]   lock(&vlan_netdev_addr_lock_key/1);
      [32524.386364]
      [32524.386364]  *** DEADLOCK ***
      [32524.386364]
      [32524.386368]  May be due to missing lock nesting notation
      [32524.386368]
      [32524.386373] 2 locks held by ifconfig/3103:
      [32524.386376]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff81431d42>] rtnl_lock+0x12/0x20
      [32524.386387]  #1:  (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff8141af83>] dev_set_rx_mode+0x23/0x40
      [32524.386398]
      [32524.386398] stack backtrace:
      [32524.386403] CPU: 1 PID: 3103 Comm: ifconfig Tainted: G           O 3.14.0-rc2-0.7-default+ #35
      [32524.386409] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      [32524.386414]  ffffffff81ffae40 ffff8800d9625ae8 ffffffff814f68a2 ffff8800d9625bc8
      [32524.386421]  ffffffff810a35fb ffff8800d8a8d9d0 00000000d9625b28 ffff8800d8a8e5d0
      [32524.386428]  000003cc00000000 0000000000000002 ffff8800d8a8e5f8 0000000000000000
      [32524.386435] Call Trace:
      [32524.386441]  [<ffffffff814f68a2>] dump_stack+0x6a/0x78
      [32524.386448]  [<ffffffff810a35fb>] __lock_acquire+0x7ab/0x1940
      [32524.386454]  [<ffffffff810a323a>] ? __lock_acquire+0x3ea/0x1940
      [32524.386459]  [<ffffffff810a4874>] lock_acquire+0xe4/0x110
      [32524.386464]  [<ffffffff814275f4>] ? dev_mc_sync+0x64/0xb0
      [32524.386471]  [<ffffffff814fc07a>] _raw_spin_lock_nested+0x2a/0x40
      [32524.386476]  [<ffffffff814275f4>] ? dev_mc_sync+0x64/0xb0
      [32524.386481]  [<ffffffff814275f4>] dev_mc_sync+0x64/0xb0
      [32524.386489]  [<ffffffffa0500cab>] vlan_dev_set_rx_mode+0x2b/0x50 [8021q]
      [32524.386495]  [<ffffffff8141addf>] __dev_set_rx_mode+0x5f/0xb0
      [32524.386500]  [<ffffffff8141af8b>] dev_set_rx_mode+0x2b/0x40
      [32524.386506]  [<ffffffff8141b3cf>] __dev_open+0xef/0x150
      [32524.386511]  [<ffffffff8141b177>] __dev_change_flags+0xa7/0x190
      [32524.386516]  [<ffffffff8141b292>] dev_change_flags+0x32/0x80
      [32524.386524]  [<ffffffff8149ca56>] devinet_ioctl+0x7d6/0x830
      [32524.386532]  [<ffffffff81437b0b>] ? dev_ioctl+0x34b/0x660
      [32524.386540]  [<ffffffff814a05b0>] inet_ioctl+0x80/0xa0
      [32524.386550]  [<ffffffff8140199d>] sock_do_ioctl+0x2d/0x60
      [32524.386558]  [<ffffffff81401a52>] sock_ioctl+0x82/0x2a0
      [32524.386568]  [<ffffffff811a7123>] do_vfs_ioctl+0x93/0x590
      [32524.386578]  [<ffffffff811b2705>] ? rcu_read_lock_held+0x45/0x50
      [32524.386586]  [<ffffffff811b39e5>] ? __fget_light+0x105/0x110
      [32524.386594]  [<ffffffff811a76b1>] SyS_ioctl+0x91/0xb0
      [32524.386604]  [<ffffffff815057e2>] system_call_fastpath+0x16/0x1b
      
      ========================================================================
      
      The reason is that the addr_lock_key of every vlan device has the same
      class, so when we change the state of a vlan device, the vlan device and
      its real device end up holding locks of the same addr_lock_key class at
      the same time, which triggers the warning.

      We should distinguish the lock nesting of a vlan device from that of its
      real device.
      
      v1->v2: Convert vlan_netdev_addr_lock_key to an array of eight elements, which
      	supports adding 8 vlan ids on the same vlan dev. I think that is enough for
      	the current scene, because a netdev's name is limited to IFNAMSIZ and could
      	not hold 8 vlan ids, so the vlan dev would never share the same class key
      	with its real dev.

      	The new function vlan_dev_get_lockdep_subkey() returns the subkey so that
      	the vlan dev can get a suitable class key.
      
      v2->v3: Following David's suggestion, I used the subclass to distinguish the lock
      	key of the vlan dev from its real dev, but that makes no sense, because a
      	different subclass within a lock_class_key does not mean a different lock
      	class. So I use lock_depth to distinguish the nesting depth of every vlan
      	dev; vlan devs at the same depth may share the same lock_class_key. I import
      	MAX_LOCK_DEPTH from include/linux/sched.h, which should be enough here, since
      	lockdep should never exceed that value.
      
      v3->v4: Adding a huge array of locking keys would waste static kernel memory and
      	is not an appropriate method. Instead we can use the _nested() variants to fix
      	the problem: calculate the depth for every vlan dev and use that depth as the
      	subclass for addr_lock_key.
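
      A hedged userspace sketch of that v4 idea (not the 8021q code; in the
      kernel the computed value would be handed to a _nested() locking
      variant such as spin_lock_nested()):

      #include <stdio.h>

      struct vdev {
              const char *name;
              struct vdev *real;      /* lower device, NULL for the NIC */
      };

      /* depth below the physical device == lockdep subclass to use for
       * this device's address list lock */
      static int lock_subclass(const struct vdev *d)
      {
              int depth = 0;

              while ((d = d->real))
                      depth++;
              return depth;
      }

      int main(void)
      {
              struct vdev eth2   = { "eth2",    NULL    };
              struct vdev vlan20 = { "eth2.20", &eth2   };
              struct vdev vlan30 = { "vlan30",  &vlan20 };  /* stacked on eth2.20 */

              printf("%s -> subclass %d\n", eth2.name,   lock_subclass(&eth2));
              printf("%s -> subclass %d\n", vlan20.name, lock_subclass(&vlan20));
              printf("%s -> subclass %d\n", vlan30.name, lock_subclass(&vlan30));
              return 0;       /* 0, 1 and 2: parent and child never collide */
      }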
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dc8eaaa0
  8. 15 Apr 2014, 1 commit
    • net: Start with correct mac_len in skb_network_protocol · 1e785f48
      Committed by Vlad Yasevich
      Sometimes, when the packet arrives at skb_mac_gso_segment(),
      its skb->mac_len already accounts for some of the mac-level
      headers in the packet.  This seems to happen when forwarding
      through an OpenSSL tunnel.
      
      When we start looking for any vlan headers in skb_network_protocol()
      we seem to ignore any of the already known mac headers and start
      with ETH_HLEN.  This results in an incorrect offset, dropped
      TSO frames and general slowness of the connection.
      
      We can start counting from the known skb->mac_len
      and return at least that much if all mac level headers
      are known and accounted for.
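
      A hedged model of the change (not the kernel function itself): start
      the mac-header accounting from the mac_len the skb already knows
      about instead of a bare ETH_HLEN, and never return less than that.

      #include <stdio.h>

      #define ETH_HLEN  14
      #define VLAN_HLEN 4

      static int network_header_offset(int known_mac_len, int unparsed_vlans)
      {
              int len = known_mac_len > ETH_HLEN ? known_mac_len : ETH_HLEN;

              /* only VLAN headers not already accounted for add to it */
              return len + unparsed_vlans * VLAN_HLEN;
      }

      int main(void)
      {
              /* forwarded via a tunnel: mac_len already covers eth + vlan */
              printf("%d\n", network_header_offset(18, 0));   /* 18, not 14 */
              /* plain skb with one not-yet-parsed VLAN header             */
              printf("%d\n", network_header_offset(14, 1));   /* 18         */
              return 0;
      }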
      
      Fixes: 53d6471c (net: Account for all vlan headers in skb_mac_gso_segment)
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Daniel Borkmann <dborkman@redhat.com>
      Tested-by: Martin Filip <nexus+kernel@smoula.net>
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1e785f48
  9. 08 Apr 2014, 1 commit
  10. 04 Apr 2014, 2 commits
  11. 01 Apr 2014, 2 commits
    • net-gro: restore frag0 optimization · a50e233c
      Committed by Eric Dumazet
      The main difference between napi_frags_skb() and napi_gro_receive() is that
      the latter is called after the ethernet header has already been pulled by the
      NIC driver (eth_type_trans() was called before napi_gro_receive()).
      
      Jerry Chu in commit 299603e8 ("net-gro: Prepare GRO stack for the
      upcoming tunneling support") tried to remove this difference by calling
      eth_type_trans() from napi_frags_skb() instead of doing this later from
      napi_frags_finish()
      
      The goal was that napi_gro_complete() could call
      ptype->callbacks.gro_complete(skb, 0) (offset of the first network
      header = 0).
      
      Also, xxx_gro_receive() handlers all use off = skb_gro_offset(skb) to
      point to their own header, for the current skb and ones held in gro_list
      
      The problem is that this cleanup work defeated the frag0 optimization:
      it turns out the consecutive pskb_may_pull() calls are too expensive.
      
      This patch brings back the frag0 stuff in napi_frags_skb().
      
      As all skbs now have their mac header in the skb head, we no longer need
      skb_gro_mac_header().
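
      A hedged illustration of the frag0 idea (not the GRO code): when the
      header we need lies entirely inside the first page fragment, parse it
      in place; only fall back to pulling it into the linear head when it
      does not.

      #include <stdio.h>

      struct frag { const unsigned char *data; unsigned int len; };

      /* returns a pointer to parse in place, or NULL if the caller has to
       * pull the header into the linear area first (the slow path) */
      static const unsigned char *gro_header_fast(const struct frag *frag0,
                                                  unsigned int hlen)
      {
              if (frag0->data && frag0->len >= hlen)
                      return frag0->data;
              return NULL;
      }

      int main(void)
      {
              unsigned char page[64] = { 0 };
              struct frag f = { page, sizeof(page) };

              printf("fast path for 14-byte header: %s\n",
                     gro_header_fast(&f, 14) ? "yes" : "no");
              return 0;
      }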
      Reported-by: Michal Schmidt <mschmidt@redhat.com>
      Fixes: 299603e8 ("net-gro: Prepare GRO stack for the upcoming tunneling support")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Jerry Chu <hkchu@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a50e233c
    • 1ee481fb
  12. 30 Mar 2014, 2 commits
  13. 29 Mar 2014, 2 commits
  14. 27 Mar 2014, 1 commit
  15. 18 Mar 2014, 1 commit
    • netpoll: Remove dead packet receive code (CONFIG_NETPOLL_TRAP) · 9c62a68d
      Committed by Eric W. Biederman
      The netpoll packet receive code only becomes active if the netpoll
      rx_skb_hook is implemented, and there is not a single implementation
      of the netpoll rx_skb_hook in the kernel.
      
      All of the out-of-tree implementations I have found call
      netpoll_poll, which was removed from the kernel in 2011, so this
      change should not add any additional breakage.
      
      There are problems with the netpoll packet receive code.  __netpoll_rx
      does not call dev_kfree_skb_irq or dev_kfree_skb_any in hard irq
      context.  netpoll_neigh_reply leaks every skb it receives.  Reception
      of packets does not work successfully on stacked devices (aka bonding,
      team, bridge, and vlans).
      
      Given that the netpoll packet receive code is buggy, there are no
      out-of-tree users that will be merged soon, and the code has
      not been used in tree for a decade, let's just remove it.
      
      Reverting this commit can serve as a starting point for anyone
      who wants to resurrect netpoll packet reception support.
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9c62a68d
  16. 13 Mar 2014, 1 commit
  17. 25 Feb 2014, 2 commits
    • smp: Rename __smp_call_function_single() to smp_call_function_single_async() · c46fff2a
      Committed by Frederic Weisbecker
      The name __smp_call_function_single() doesn't tell much about the
      properties of this function, especially when compared to
      smp_call_function_single().
      
      The comments above the implementation are also misleading. The main
      point of this function is actually not to be able to embed the csd
      in an object. This is actually a requirement that results from the
      purpose of this function, which is to raise an IPI asynchronously.
      
      As such, it can be called with interrupts disabled. This feature
      comes at the cost of the caller, who then needs to serialize the
      IPIs on this csd.
      
      Let's rename the function and enhance the comments so that they reflect
      these properties.
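
      A hedged kernel-side sketch of the intended usage pattern (not code
      from this patch; the structure and function names here are made up):
      embed the csd in your own object and raise the IPI asynchronously,
      with the caller responsible for not reusing the csd before the
      previous IPI has completed.

      #include <linux/smp.h>
      #include <linux/printk.h>

      struct my_work {
              struct call_single_data csd;    /* call_single_data_t in newer trees */
              int value;
      };

      static void my_ipi_func(void *info)
      {
              struct my_work *w = info;       /* runs on the target CPU */

              pr_info("got value %d\n", w->value);
      }

      /* Safe with irqs disabled; does not wait for completion.  The caller
       * must guarantee the previous IPI on this csd has finished. */
      static void kick_cpu(struct my_work *w, int cpu)
      {
              w->csd.func = my_ipi_func;
              w->csd.info = w;
              smp_call_function_single_async(cpu, &w->csd);
      }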
      Suggested-by: Christoph Hellwig <hch@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@fb.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      c46fff2a
    • smp: Remove wait argument from __smp_call_function_single() · fce8ad15
      Committed by Frederic Weisbecker
      The main point of calling __smp_call_function_single() is to send
      an IPI in a purely asynchronous way. By embedding a csd in an object,
      a caller can send the IPI without waiting for a previous one to complete
      as is required by smp_call_function_single() for example. As such,
      sending this kind of IPI can be safe even when irqs are disabled.
      
      This flexibility comes at the expense of the caller, who then needs to
      synchronize the csd lifecycle itself and make sure that IPIs on a
      single csd are serialized.
      
      This is how __smp_call_function_single() works when wait = 0, and this
      use case is the relevant one.
      
      Now there doesn't seem to be any use case with wait = 1 that can't be
      covered by smp_call_function_single() instead, which is safer. Let's look
      at the two possible scenarios:
      
      1) The user calls __smp_call_function_single(wait = 1) on a csd embedded
         in an object. It looks like a nice and convenient pattern at first
         sight because we can then retrieve the object from the IPI handler easily.
      
         But actually it is a waste of memory space in the object since the csd
         can be allocated from the stack by smp_call_function_single(wait = 1)
         and the object can be passed as the IPI argument.
      
         Besides that, embedding the csd in an object is more error prone
         because the caller must take care of the serialization of the IPIs
         for this csd.
      
      2) The user calls __smp_call_function_single(wait = 1) on a csd that
         is allocated on the stack. It's ok, but smp_call_function_single()
         can do it as well and it already takes care of the allocation on the
         stack. Again it's simpler and less error prone.
      
      Therefore, using the underscore-prefixed API version with wait = 1
      is a bad pattern and a sign that the caller can do something safer and
      simpler.
      
      There was a single user of that, which has just been converted.
      So let's remove this option to discourage further users.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@fb.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      fce8ad15
  18. 19 Feb 2014, 1 commit
  19. 14 Feb 2014, 1 commit
  20. 10 Feb 2014, 1 commit
  21. 22 Jan 2014, 3 commits
  22. 20 Jan 2014, 1 commit
  23. 17 Jan 2014, 3 commits
    • net-sysfs: add support for device-specific rx queue sysfs attributes · a953be53
      Committed by Michael Dalton
      Extend existing support for netdevice receive queue sysfs attributes to
      permit a device-specific attribute group. Initial use case for this
      support will be to allow the virtio-net device to export per-receive
      queue mergeable receive buffer size.
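
      A hedged sketch of how a driver might hook such a group (modelled on
      the virtio-net usage of that era; the attribute names here are made
      up and the show() signature differs in later kernels):

      #include <linux/netdevice.h>
      #include <linux/sysfs.h>

      static ssize_t my_stat_show(struct netdev_rx_queue *queue,
                                  struct rx_queue_attribute *attr, char *buf)
      {
              /* "queue" tells us which queues/rx-N directory is being read */
              return sprintf(buf, "%u\n", 0 /* per-queue value goes here */);
      }

      static struct rx_queue_attribute my_stat_attribute =
              __ATTR(my_stat, 0444, my_stat_show, NULL);

      static struct attribute *my_rxq_attrs[] = {
              &my_stat_attribute.attr,
              NULL
      };

      static const struct attribute_group my_rxq_group = {
              .name  = "my_driver",           /* appears under queues/rx-N/ */
              .attrs = my_rxq_attrs,
      };

      /* in the probe path, before register_netdev():
       *      dev->sysfs_rx_queue_group = &my_rxq_group;
       */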
      Signed-off-by: Michael Dalton <mwdalton@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a953be53
    • net: add NETDEV_PRECHANGEMTU to notify before mtu change happens · 1d486bfb
      Committed by Veaceslav Falico
      Currently, if a device changes its mtu, first the change happens (involving
      all the side effects), and after that NETDEV_CHANGEMTU is sent so that
      other devices can catch up with the new mtu. However, if they return
      NOTIFY_BAD, then the change is reverted and an error is returned.
      
      This can be a really long and costly operation. To fix this, add a
      NETDEV_PRECHANGEMTU notification which is sent prior to any change
      actually happening; if any callee returns NOTIFY_BAD, the change is
      aborted. This way we skip all the playing with apply/revert of the mtu.
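
      A hedged sketch of a consumer (not code from this patch; the policy
      and names below are made up): a notifier that vetoes an mtu change
      up front, before any work has been done.

      #include <linux/netdevice.h>
      #include <linux/notifier.h>
      #include <linux/string.h>

      static int my_mtu_event(struct notifier_block *nb,
                              unsigned long event, void *ptr)
      {
              struct net_device *dev = netdev_notifier_info_to_dev(ptr);

              /* dev->mtu still holds the old value here: nothing changed yet */
              if (event == NETDEV_PRECHANGEMTU && !strcmp(dev->name, "mydev0"))
                      return NOTIFY_BAD;      /* veto: dev_set_mtu() bails out
                                                 before touching anything */
              return NOTIFY_DONE;
      }

      static struct notifier_block my_mtu_nb = {
              .notifier_call = my_mtu_event,
      };

      /* register_netdevice_notifier(&my_mtu_nb) in module init */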
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      CC: Cong Wang <amwang@redhat.com>
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1d486bfb
    • net: Check skb->rxhash in gro_receive · 0b4cec8c
      Committed by Tom Herbert
      When initializing a gro_list for a packet, first check the rxhash of
      the incoming skb against that of the skbs in the list. This should be
      a very strong indicator of whether the flow is going to be matched,
      and potentially allows a lot of other checks to be short-circuited.
      Use skb_hash_raw so that we don't force the hash to be calculated.
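
      A hedged userspace model of the short circuit (not the GRO code):
      compare a precomputed flow hash first and only fall back to the
      expensive field-by-field comparison when the hashes match.

      #include <stdio.h>
      #include <stdint.h>
      #include <string.h>

      struct flow { uint32_t hash; uint8_t hdr[40]; };

      static int same_flow(const struct flow *a, const struct flow *b)
      {
              if (a->hash != b->hash)
                      return 0;               /* cheap short circuit */
              return memcmp(a->hdr, b->hdr, sizeof(a->hdr)) == 0;   /* slow */
      }

      int main(void)
      {
              struct flow a = { 0x1234, { 1 } };
              struct flow b = { 0x5678, { 1 } };

              printf("%d\n", same_flow(&a, &b));      /* 0: hashes differ */
              b.hash = a.hash;
              printf("%d\n", same_flow(&a, &b));      /* 1: full match    */
              return 0;
      }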
      
      Tested by running netperf with 200 TCP_STREAMs between two machines with
      GRO, HW rxhash, and 1G. Saw no performance degradation and a slight
      reduction of time in dev_gro_receive.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0b4cec8c
  24. 16 Jan 2014, 2 commits
  25. 15 Jan 2014, 3 commits