1. 30 Aug, 2013 17 commits
    • V
      bonding: make bond_arp_send_all use upper device list · 27bc11e6
      Veaceslav Falico committed
      Currently, bond_arp_send_all() is aware only of vlans, which breaks
      configurations like bond <- bridge (or any other 'upper' device) with IP
      (which is quite a common scenario for virt setups).
      
      To fix this, convert bond_arp_send_all() to first verify whether the rt
      device is the bond itself. If it is not, go through the bond's list of upper
      vlans and their respective upper devices (if a vlan's upper device matches,
      tag the packet). If there is still no match, go through all of our upper
      devices to see if any of them matches the route device for the target. If
      the match is a vlan device, we also save its vlan_id and tag the packet in
      bond_arp_send().
      
      Also, clean up the function a bit to make it more readable.
      
      CC: Vlad Yasevich <vyasevic@redhat.com>
      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
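The lookup order described in this commit message can be sketched in userspace C. This is a simplified model, not the kernel code: `struct dev`, `add_upper()` and `find_route_tag()` are hypothetical names, the upper list is a plain array, and no locking or RCU is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_UPPER 4

/* Hypothetical userspace stand-in for struct net_device. */
struct dev {
    const char *name;
    bool is_vlan;
    unsigned short vlan_id;        /* meaningful only when is_vlan is set */
    struct dev *upper[MAX_UPPER];  /* devices stacked above this one */
    size_t n_upper;
};

static void add_upper(struct dev *lower, struct dev *up)
{
    assert(lower->n_upper < MAX_UPPER);
    lower->upper[lower->n_upper++] = up;
}

/*
 * Decide whether a target routed via rt_dev is reachable through bond,
 * and which vlan tag to use (0 means untagged).
 */
static bool find_route_tag(const struct dev *bond, const struct dev *rt_dev,
                           unsigned short *vlan_id)
{
    size_t i, j;

    *vlan_id = 0;

    /* 1. The route points straight at the bond. */
    if (rt_dev == bond)
        return true;

    /* 2. Upper vlans and the devices stacked on top of them. */
    for (i = 0; i < bond->n_upper; i++) {
        const struct dev *vlan = bond->upper[i];

        if (!vlan->is_vlan)
            continue;
        for (j = 0; j < vlan->n_upper; j++) {
            if (vlan->upper[j] == rt_dev) {
                *vlan_id = vlan->vlan_id;
                return true;
            }
        }
    }

    /* 3. Any upper device; a vlan match keeps its own vlan_id. */
    for (i = 0; i < bond->n_upper; i++) {
        const struct dev *up = bond->upper[i];

        if (up == rt_dev) {
            if (up->is_vlan)
                *vlan_id = up->vlan_id;
            return true;
        }
    }
    return false;
}
```

With a stack such as bond0 <- bond0.10 <- br0, a route via br0 resolves to tag 10 in step 2, a route via bond0 itself stays untagged, and an unrelated device is rejected.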
    • V
      bonding: use netdev_upper list in bond_vlan_used · c752af2c
      Veaceslav Falico committed
      Convert bond_vlan_used() to traverse the upper device list to see if we
      have any vlans above us. It's protected by rcu, and in case we are holding
      rtnl_lock we should call vlan_uses_dev() instead - it's faster.
      
      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • V
      net: add netdev_for_each_upper_dev_rcu() · 8b5be856
      Veaceslav Falico committed
      The new macro netdev_for_each_upper_dev_rcu(dev, upper, iter) iterates
      through the dev->upper_dev_list starting from the first element, using
      the netdev_upper_get_next_dev_rcu(dev, &iter).
      
      Must be called under RCU read lock.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Cong Wang <amwang@redhat.com>
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • V
      net: add netdev_upper_get_next_dev_rcu(dev, iter) · 48311f46
      Veaceslav Falico committed
      This function returns the next dev in the dev->upper_dev_list after the
      struct list_head **iter position, and updates *iter accordingly. Returns
      NULL if there are no devices left.
      
      Caller must hold RCU read lock.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Cong Wang <amwang@redhat.com>
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
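The iterator contract described here (return the next device after `*iter`, advance `*iter`, return NULL at the end) and a for-each macro built on top of it can be modelled in userspace. This is a sketch under assumptions: `struct net_dev`, `struct adjacent`, `upper_get_next()` and `for_each_upper()` are hypothetical stand-ins, and the RCU accessors and read lock the kernel versions require are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, in the style of the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

struct net_dev {                      /* hypothetical stand-in for net_device */
    const char *name;
    struct list_head upper_dev_list;  /* head of the upper device list */
};

struct adjacent {                     /* hypothetical stand-in for netdev_adjacent */
    struct net_dev *dev;
    struct list_head list;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    assert(n != h);
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Return the device after *iter and advance *iter; NULL once back at the head. */
static struct net_dev *upper_get_next(struct net_dev *dev, struct list_head **iter)
{
    struct list_head *next = (*iter)->next;
    struct adjacent *adj;

    if (next == &dev->upper_dev_list)
        return NULL;
    adj = (struct adjacent *)((char *)next - offsetof(struct adjacent, list));
    *iter = next;
    return adj->dev;
}

/* Start at the list head and walk until upper_get_next() returns NULL. */
#define for_each_upper(dev, upper, iter)              \
    for ((iter) = &(dev)->upper_dev_list,             \
         (upper) = upper_get_next((dev), &(iter));    \
         (upper);                                     \
         (upper) = upper_get_next((dev), &(iter)))

static int count_uppers(struct net_dev *dev)
{
    struct list_head *iter;
    struct net_dev *upper;
    int n = 0;

    for_each_upper(dev, upper, iter)
        n++;
    return n;
}
```

Keeping the cursor as a `struct list_head **` lets the caller hold the iteration state, so the same helper serves both one-off lookups and the macro.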
    • V
      net: remove search_list from netdev_adjacent · 620f3186
      Veaceslav Falico committed
      We no longer need it, because every upper/lower device is already visible
      in the list.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Cong Wang <amwang@redhat.com>
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • V
      net: add lower_dev_list to net_device and make a full mesh · 5d261913
      Veaceslav Falico committed
      This patch adds lower_dev_list list_head to net_device, which is the same
      as upper_dev_list, only for lower devices, and begins to use it in the same
      way as the upper list.
      
      It also changes the way the whole adjacent device lists work - now they
      contain *all* of upper/lower devices, not only the first level. The first
      level devices are distinguished by the bool neighbour field in
      netdev_adjacent, also added by this patch.
      
      There are cases when a device can be added several times to the adjacent
      list, the simplest would be:
      
           /---- eth0.10 ---\
      eth0-                  --- bond0
           \---- eth0.20 ---/
      
      where both bond0 and eth0 'see' each other in the adjacent lists two times.
      To avoid duplication of netdev_adjacent structures ref_nr is being kept as
      the number of times the device was added to the list.
      
      The 'full view' is achieved by adding, on link creation, all of the
      upper_dev's upper_dev_list devices as upper devices to all of the
      lower_dev's lower_dev_list devices (and to the lower_dev itself), and vice
      versa. On unlink they are removed using the same logic.
      
      I've tested it with thousands of vlans/bonds/bridges; everything works and
      there are no observable lags even with a huge number of interfaces.
      
      Memory footprint for 128 devices interconnected with each other via both
      upper and lower lists (which is impossible, but useful for comparison)
      would be:
      
      128*128*2*sizeof(netdev_adjacent) = 1.5MB
      
      but in the real world we usually have at most several devices with slaves
      and a lot of vlans, so the footprint will be much lower.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Cong Wang <amwang@redhat.com>
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
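The ref_nr bookkeeping and the 'full view' linking rule can be illustrated with a small userspace model. It is a sketch under assumptions, not the kernel implementation: devices are integer indices, `upper_of`, `link_dev()` and `unlink_dev()` are hypothetical names, only the upper direction is stored (lower lists are the transpose of the matrix), and the device graph is assumed to be acyclic.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEV 8

/* Models one netdev_adjacent entry: how often it was added, and whether
 * the two devices are directly linked (first level). */
struct adj_entry {
    int ref_nr;
    bool neighbour;
};

/* upper_of[d][u] is the entry for device u in the upper list of device d. */
static struct adj_entry upper_of[MAX_DEV][MAX_DEV];

/*
 * Apply delta to every pair (i, j) where i is 'lower' or anything below it
 * and j is 'up' or anything above it. Link and unlink share this logic.
 */
static void mesh_update(int lower, int up, int delta)
{
    int i, j;

    for (i = 0; i < MAX_DEV; i++) {
        if (i != lower && upper_of[i][lower].ref_nr == 0)
            continue;
        for (j = 0; j < MAX_DEV; j++) {
            if (j != up && upper_of[up][j].ref_nr == 0)
                continue;
            upper_of[i][j].ref_nr += delta;
            assert(upper_of[i][j].ref_nr >= 0);
        }
    }
}

static void link_dev(int lower, int up)
{
    assert(lower != up);
    mesh_update(lower, up, +1);
    upper_of[lower][up].neighbour = true;
}

static void unlink_dev(int lower, int up)
{
    upper_of[lower][up].neighbour = false;
    mesh_update(lower, up, -1);
}
```

Building the eth0 / eth0.10 / eth0.20 / bond0 diamond from the message with this model leaves the eth0-to-bond0 entry with a ref_nr of 2 and neighbour unset; removing one of the vlan-to-bond links drops it to 1 rather than deleting the entry.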
    • V
      net: rename netdev_upper to netdev_adjacent · aa9d8560
      Veaceslav Falico committed
      Rename the structure to reflect the upcoming addition of lower_dev_list.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Cong Wang <amwang@redhat.com>
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · 6d508cce
      David S. Miller committed
      Jeff Kirsher says:
      
      ====================
      This series contains updates to ixgbe.
      
      Jacob provides a fix for 82599 devices where it can potentially keep link
      lights up when the adapter has gone down.
      
      Mark provides a fix to resolve the possible use of uninitialized memory
      by checking the return value on EEPROM reads.
      
      Don provides 2 patches. The first fixes an issue where we were traversing
      the Tx ring with the value of IXGBE_NUM_RX_QUEUES, which currently happens
      to have the correct value but is misleading.  A later change could easily
      make this no longer correct, so when traversing the Tx ring, use
      netdev->num_tx_queues.  His second patch does some minor cleanups of log
      messages.
      
      Emil provides the remaining ixgbe patches.  First he fixes the link test
      where forcing the laser before the link check can lead to inconsistent
      results because it does not guarantee that the link will be negotiated
      correctly.  Then he initializes the message buffer array to 0 in order
      to avoid using random numbers from the memory as a MAC address for the
      VF.  Emil also fixes the read loop for the I2C data to account for the
      offset for SFP+ modules.  Lastly, Emil provides several patches to add
      support for QSFP modules where 1Gbps support is added as well as support
      for older QSFP active direct attach cables which pre-date SFF-8436 v3.6.
      
      v2: Fixed patch 4 description and added blank line based on feedback from
          Sergei Shtylyov
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • F
      fec: Use NAPI_POLL_WEIGHT · 322555f5
      Fabio Estevam committed
      Instead of using a custom 'FEC_NAPI_WEIGHT', just use the generic
      'NAPI_POLL_WEIGHT' definition.
      Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      net: sctp: sctp_verify_init: clean up mandatory checks and add comment · 7613f5fe
      Daniel Borkmann committed
      Add a comment related to RFC4960 explaining why we do not check for initial
      TSN, and while at it, remove yoda notation checks and clean up code from
      checks of mandatory conditions. That's probably just really minor, but makes
      reviewing easier.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      tcp: TSO packets automatic sizing · 95bd09eb
      Eric Dumazet committed
      After hearing many people over past years complaining against TSO being
      bursty or even buggy, we are proud to present automatic sizing of TSO
      packets.
      
      One part of the problem is that tcp_tso_should_defer() uses a heuristic
      relying on upcoming ACKs instead of a timer, but more generally, having
      big TSO packets makes little sense for low rates, as it tends to create
      micro bursts on the network, and the general consensus is to reduce the
      amount of buffering.
      
      This patch introduces a per socket sk_pacing_rate, that approximates
      the current sending rate, and allows us to size the TSO packets so
      that we try to send one packet every ms.
      
      This field could be set by other transports.
      
      Patch has no impact for high speed flows, where having large TSO packets
      makes sense to reach line rate.
      
      For other flows, this helps better packet scheduling and ACK clocking.
      
      This patch increases performance of TCP flows in lossy environments.
      
      A new sysctl (tcp_min_tso_segs) is added, to specify the
      minimal size of a TSO packet (default being 2).
      
      A follow-up patch will provide a new packet scheduler (FQ), using
      sk_pacing_rate as an input to perform optional per flow pacing.
      
      This explains why we chose to set sk_pacing_rate to twice the current
      rate, allowing 'slow start' ramp up.
      
      sk_pacing_rate = 2 * cwnd * mss / srtt
      
      v2: Neal Cardwell reported a suspect deferring of last two segments on
      initial write of 10 MSS, I had to change tcp_tso_should_defer() to take
      into account tp->xmit_size_goal_segs
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Van Jacobson <vanj@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
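The sizing rule in this message (pacing rate of twice the current rate, TSO packets sized for roughly one packet per millisecond, with a minimum segment count) can be worked through with a short C sketch. The function names, the microsecond unit for srtt, and the `max_segs` cap are assumptions made for illustration; the kernel's own arithmetic and units may differ.

```c
#include <assert.h>
#include <stdint.h>

/*
 * sk_pacing_rate = 2 * cwnd * mss / srtt, expressed in bytes per second.
 * srtt_us is the smoothed RTT in microseconds (an assumption of this sketch).
 * The intermediate product fits in 64 bits for realistic cwnd/mss values.
 */
static uint64_t pacing_rate(uint32_t cwnd, uint32_t mss, uint32_t srtt_us)
{
    assert(srtt_us > 0);
    return 2ULL * cwnd * mss * 1000000ULL / srtt_us;
}

/*
 * Number of MSS-sized segments per TSO packet so that about one packet is
 * sent every millisecond, clamped to [min_segs, max_segs].
 */
static uint32_t tso_segs(uint64_t rate, uint32_t mss,
                         uint32_t min_segs, uint32_t max_segs)
{
    uint64_t bytes_per_ms = rate / 1000;
    uint64_t segs;

    assert(mss > 0 && min_segs <= max_segs);
    segs = bytes_per_ms / mss;
    if (segs < min_segs)
        segs = min_segs;
    if (segs > max_segs)
        segs = max_segs;
    return (uint32_t)segs;
}
```

For example, cwnd = 10, mss = 1448 and srtt = 100 ms give a pacing rate of 289600 bytes/s, which is less than one MSS per millisecond, so the minimum of 2 segments applies. A fast flow with cwnd = 1000 and srtt = 1 ms is limited only by the cap, which matches the statement that high speed flows are unaffected.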
    • H
      ipv6: drop fragmented ndisc packets by default (RFC 6980) · b800c3b9
      Hannes Frederic Sowa committed
      This patch implements RFC6980: Drop fragmented ndisc packets by
      default. If a fragmented ndisc packet is received, the user is informed
      that it is possible to disable the check.
      
      Cc: Fernando Gont <fernando@gont.com.ar>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • B
      ARM: at91/dt: fix phy address in sama5xmb to match the reg property · a3a975b1
      Boris BREZILLON committed
      Fix phy0 address to match the reg property defined in phy0 node.
      Signed-off-by: Boris BREZILLON <b.brezillon@overkiz.com>
      Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • B
      net/cadence/macb: fix invalid 0 return if no phy is discovered on mii init · 7daa78e3
      Boris BREZILLON committed
      Replace the misleading -1 (-EPERM) with a more appropriate return code
      (-ENXIO) in the macb_mii_probe function.
      Save the macb_mii_probe return value before branching to
      err_out_unregister to avoid an erroneous 0 return.
      Signed-off-by: Boris BREZILLON <b.brezillon@overkiz.com>
      Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
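The error-path pattern this fix describes (store the probe result before jumping to the cleanup label so the caller does not see a stale 0) looks roughly like the following userspace sketch. `fake_mii_probe()`, `fake_mii_init()` and the `registered` flag are hypothetical stand-ins, not the driver's actual functions.

```c
#include <errno.h>
#include <stdbool.h>

static bool registered;   /* stands in for a registered MDIO bus */

/* Hypothetical probe: fails with -ENXIO when no PHY is found. */
static int fake_mii_probe(bool phy_present)
{
    return phy_present ? 0 : -ENXIO;
}

static int fake_mii_init(bool phy_present)
{
    int err;

    registered = true;                  /* earlier setup step succeeded */

    /* Save the probe result so the cleanup path returns the real error. */
    err = fake_mii_probe(phy_present);
    if (err)
        goto err_out_unregister;

    return 0;

err_out_unregister:
    registered = false;                 /* undo the earlier setup step */
    return err;
}
```

Without the assignment to `err`, the cleanup label would return whatever value the previous successful step left behind, which is how a failed probe could end up reported as success.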
    • F
      bridge: inherit slave devices needed_headroom · fd094808
      Florian Fainelli committed
      Some slave devices may have set a dev->needed_headroom value which is
      different than the default one, most likely in order to prepend a
      hardware descriptor in front of the Ethernet frame to send. Whenever a
      new slave is added to a bridge, ensure that we update the
      needed_headroom value accordingly to account for the slave
      needed_headroom value.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
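One plausible reading of "update the needed_headroom value accordingly" is that the bridge keeps the largest headroom requested by any of its slaves. The sketch below assumes that interpretation; `struct dev` and `bridge_add_slave()` are hypothetical names, and the real patch may compute the value differently.

```c
/* Hypothetical stand-in for the one net_device field of interest. */
struct dev {
    unsigned short needed_headroom;
};

/* When a slave joins, grow the bridge headroom if the slave needs more. */
static void bridge_add_slave(struct dev *br, const struct dev *slave)
{
    if (slave->needed_headroom > br->needed_headroom)
        br->needed_headroom = slave->needed_headroom;
}
```

Under this assumption, frames allocated for the bridge always leave enough room for the most demanding slave to prepend its hardware descriptor without reallocating.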
    • D
      net: sctp: reorder sctp_globals to reduce cacheline usage · 76bfd898
      Daniel Borkmann committed
      Reduce cacheline usage from 2 to 1 cacheline for sctp_globals structure. By
      reordering elements, we can close gaps and simply achieve the following:
      
      Current situation:
        /* size: 80, cachelines: 2, members: 10 */
        /* sum members: 57, holes: 4, sum holes: 16 */
        /* padding: 7 */
        /* last cacheline: 16 bytes */
      
      Afterwards:
        /* size: 64, cachelines: 1, members: 10 */
        /* padding: 7 */
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
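The effect of reordering members to close alignment holes can be seen on a toy structure. The two structs below are illustrative only and are not the layout of sctp_globals; exact sizes depend on the ABI, so the sketch compares them instead of hard-coding numbers. `cachelines()` is a hypothetical helper that rounds a size up to whole cache lines.

```c
#include <stddef.h>
#include <stdint.h>

/* Small members interleaved with large ones: the compiler pads each gap. */
struct before {
    uint8_t  a;
    uint64_t b;
    uint8_t  c;
    uint32_t d;
    uint8_t  e;
};

/* Same members sorted from largest to smallest: the holes disappear. */
struct after {
    uint64_t b;
    uint32_t d;
    uint8_t  a;
    uint8_t  c;
    uint8_t  e;
};

/* Number of cache lines of line_size bytes that a size-byte object spans
 * when it starts on a line boundary. */
static size_t cachelines(size_t size, size_t line_size)
{
    return (size + line_size - 1) / line_size;
}
```

Applied to the figures quoted in the message, an 80-byte structure spans two 64-byte cache lines while the reordered 64-byte one fits in a single line.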
    • J
      net: mdio-sun4i: Convert to devm_* api · 03536cc3
      Jisheng Zhang committed
      Use devm_ioremap_resource() instead of of_iomap() and devm_kzalloc()
      instead of kmalloc() to make cleanup paths simpler. This patch also
      fixes the resource leak caused by a missing iounmap() matching
      the of_iomap().
      Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
      Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 29 Aug, 2013 17 commits
  3. 28 Aug, 2013 6 commits