1. 16 6月, 2010 3 次提交
  2. 14 6月, 2010 2 次提交
  3. 13 6月, 2010 1 次提交
    • B
      net: Enable 64-bit net device statistics on 32-bit architectures · be1f3c2c
      Ben Hutchings 提交于
      Use struct rtnl_link_stats64 as the statistics structure.
      
      On 32-bit architectures, insert 32 bits of padding after/before each
      field of struct net_device_stats to make its layout compatible with
      struct rtnl_link_stats64.  Add an anonymous union in net_device; move
      stats into the union and add struct rtnl_link_stats64 stats64.
      
      Add net_device_ops::ndo_get_stats64, implementations of which will
      return a pointer to struct rtnl_link_stats64.  Drivers that implement
      this operation must not update the structure asynchronously.
      
      Change dev_get_stats() to call ndo_get_stats64 if available, and to
      return a pointer to struct rtnl_link_stats64.  Change callers of
      dev_get_stats() accordingly.
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      be1f3c2c
  4. 12 6月, 2010 2 次提交
  5. 11 6月, 2010 3 次提交
    • D
      pktgen: Fix accuracy of inter-packet delay. · 07a0f0f0
      Daniel Turull 提交于
      This patch correct a bug in the delay of pktgen. 
      It makes sure the inter-packet interval is accurate.
      Signed-off-by: NDaniel Turull <daniel.turull@gmail.com>
      Signed-off-by: NRobert Olsson <robert.olsson@its.uu.se>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      07a0f0f0
    • E
      pkt_sched: gen_estimator: add a new lock · ae638c47
      Eric Dumazet 提交于
      gen_kill_estimator() / gen_new_estimator() is not always called with
      RTNL held.
      
      net/netfilter/xt_RATEEST.c is one user of these API that do not hold
      RTNL, so random corruptions can occur between "tc" and "iptables".
      
      Add a new fine grained lock instead of trying to use RTNL in netfilter.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae638c47
    • J
      net: deliver skbs on inactive slaves to exact matches · 597a264b
      John Fastabend 提交于
      Currently, the accelerated receive path for VLAN's will
      drop packets if the real device is an inactive slave and
      is not one of the special pkts tested for in
      skb_bond_should_drop().  This behavior is different then
      the non-accelerated path and for pkts over a bonded vlan.
      
      For example,
      
      vlanx -> bond0 -> ethx
      
      will be dropped in the vlan path and not delivered to any
      packet handlers at all.  However,
      
      bond0 -> vlanx -> ethx
      
      and
      
      bond0 -> ethx
      
      will be delivered to handlers that match the exact dev,
      because the VLAN path checks the real_dev which is not a
      slave and netif_recv_skb() doesn't drop frames but only
      delivers them to exact matches.
      
      This patch adds a sk_buff flag which is used for tagging
      skbs that would previously been dropped and allows the
      skb to continue to skb_netif_recv().  Here we add
      logic to check for the deliver_no_wcard flag and if it
      is set only deliver to handlers that match exactly.  This
      makes both paths above consistent and gives pkt handlers
      a way to identify skbs that come from inactive slaves.
      Without this patch in some configurations skbs will be
      delivered to handlers with exact matches and in others
      be dropped out right in the vlan path.
      
      I have tested the following 4 configurations in failover modes
      and load balancing modes.
      
      # bond0 -> ethx
      
      # vlanx -> bond0 -> ethx
      
      # bond0 -> vlanx -> ethx
      
      # bond0 -> ethx
                  |
        vlanx -> --
      Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      597a264b
  6. 10 6月, 2010 1 次提交
  7. 08 6月, 2010 1 次提交
    • E
      anycast: Some RCU conversions · bb69ae04
      Eric Dumazet 提交于
      - dev_get_by_flags() changed to dev_get_by_flags_rcu()
      
      - ipv6_sock_ac_join() dont touch dev & idev refcounts
      - ipv6_sock_ac_drop() dont touch dev & idev refcounts
      - ipv6_sock_ac_close() dont touch dev & idev refcounts
      - ipv6_dev_ac_dec() dount touch idev refcount
      - ipv6_chk_acast_addr() dont touch idev refcount
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bb69ae04
  8. 07 6月, 2010 2 次提交
  9. 05 6月, 2010 1 次提交
  10. 02 6月, 2010 4 次提交
    • J
      net: replace hooks in __netif_receive_skb V5 · ab95bfe0
      Jiri Pirko 提交于
      What this patch does is it removes two receive frame hooks (for bridge and for
      macvlan) from __netif_receive_skb. These are replaced them with a single
      hook for both. It only supports one hook per device because it makes no
      sense to do bridging and macvlan on the same device.
      
      Then a network driver (of virtual netdev like macvlan or bridge) can register
      an rx_handler for needed net device.
      Signed-off-by: NJiri Pirko <jpirko@redhat.com>
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ab95bfe0
    • E
      net: add additional lock to qdisc to increase throughput · 79640a4c
      Eric Dumazet 提交于
      When many cpus compete for sending frames on a given qdisc, the qdisc
      spinlock suffers from very high contention.
      
      The cpu owning __QDISC_STATE_RUNNING bit has same priority to acquire
      the lock, and cannot dequeue packets fast enough, since it must wait for
      this lock for each dequeued packet.
      
      One solution to this problem is to force all cpus spinning on a second
      lock before trying to get the main lock, when/if they see
      __QDISC_STATE_RUNNING already set.
      
      The owning cpu then compete with at most one other cpu for the main
      lock, allowing for higher dequeueing rate.
      
      Based on a previous patch from Alexander Duyck. I added the heuristic to
      avoid the atomic in fast path, and put the new lock far away from the
      cache line used by the dequeue worker. Also try to release the busylock
      lock as late as possible.
      
      Tests with following script gave a boost from ~50.000 pps to ~600.000
      pps on a dual quad core machine (E5450 @3.00GHz), tg3 driver.
      (A single netperf flow can reach ~800.000 pps on this platform)
      
      for j in `seq 0 3`; do
        for i in `seq 0 7`; do
          netperf -H 192.168.0.1 -t UDP_STREAM -l 60 -N -T $i -- -m 6 &
        done
      done
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Acked-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      79640a4c
    • J
      net: fix conflict between null_or_orig and null_or_bond · 2df4a0fa
      John Fastabend 提交于
      If a skb is received on an inactive bond that does not meet
      the special cases checked for by skb_bond_should_drop it should
      only be delivered to exact matches as the comment in
      netif_receive_skb() says.
      
      However because null_or_bond could also be null this is not
      always true.  This patch renames null_or_bond to orig_or_bond
      and initializes it to orig_dev.  This keeps the intent of
      null_or_bond to pass frames received on VLAN interfaces stacked
      on bonding interfaces without invalidating the statement for
      null_or_orig.
      Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2df4a0fa
    • E
      net: Define accessors to manipulate QDISC_STATE_RUNNING · bc135b23
      Eric Dumazet 提交于
      Define three helpers to manipulate QDISC_STATE_RUNNIG flag, that a
      second patch will move on another location.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bc135b23
  11. 01 6月, 2010 1 次提交
  12. 31 5月, 2010 2 次提交
  13. 29 5月, 2010 2 次提交
  14. 28 5月, 2010 3 次提交
  15. 27 5月, 2010 1 次提交
    • E
      net: fix lock_sock_bh/unlock_sock_bh · 8a74ad60
      Eric Dumazet 提交于
      This new sock lock primitive was introduced to speedup some user context
      socket manipulation. But it is unsafe to protect two threads, one using
      regular lock_sock/release_sock, one using lock_sock_bh/unlock_sock_bh
      
      This patch changes lock_sock_bh to be careful against 'owned' state.
      If owned is found to be set, we must take the slow path.
      lock_sock_bh() now returns a boolean to say if the slow path was taken,
      and this boolean is used at unlock_sock_bh time to call the appropriate
      unlock function.
      
      After this change, BH are either disabled or enabled during the
      lock_sock_bh/unlock_sock_bh protected section. This might be misleading,
      so we rename these functions to lock_sock_fast()/unlock_sock_fast().
      Reported-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Tested-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8a74ad60
  16. 24 5月, 2010 4 次提交
    • H
      tun: Update classid on packet injection · 82862742
      Herbert Xu 提交于
      This patch makes tun update its socket classid every time we
      inject a packet into the network stack.  This is so that any
      updates made by the admin to the process writing packets to
      tun is effected.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      82862742
    • H
      cls_cgroup: Store classid in struct sock · f8451725
      Herbert Xu 提交于
      Up until now cls_cgroup has relied on fetching the classid out of
      the current executing thread.  This runs into trouble when a packet
      processing is delayed in which case it may execute out of another
      thread's context.
      
      Furthermore, even when a packet is not delayed we may fail to
      classify it if soft IRQs have been disabled, because this scenario
      is indistinguishable from one where a packet unrelated to the
      current thread is processed by a real soft IRQ.
      
      In fact, the current semantics is inherently broken, as a single
      skb may be constructed out of the writes of two different tasks.
      A different manifestation of this problem is when the TCP stack
      transmits in response of an incoming ACK.  This is currently
      unclassified.
      
      As we already have a concept of packet ownership for accounting
      purposes in the skb->sk pointer, this is a natural place to store
      the classid in a persistent manner.
      
      This patch adds the cls_cgroup classid in struct sock, filling up
      an existing hole on 64-bit :)
      
      The value is set at socket creation time.  So all sockets created
      via socket(2) automatically gains the ID of the thread creating it.
      Whenever another process touches the socket by either reading or
      writing to it, we will change the socket classid to that of the
      process if it has a valid (non-zero) classid.
      
      For sockets created on inbound connections through accept(2), we
      inherit the classid of the original listening socket through
      sk_clone, possibly preceding the actual accept(2) call.
      
      In order to minimise risks, I have not made this the authoritative
      classid.  For now it is only used as a backup when we execute
      with soft IRQs disabled.  Once we're completely happy with its
      semantics we can use it as the sole classid.
      
      Footnote: I have rearranged the error path on cls_group module
      creation.  If we didn't do this, then there is a window where
      someone could create a tc rule using cls_group before the cgroup
      subsystem has been registered.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f8451725
    • D
      net-2.6 : V2 - fix dev_get_valid_name · 8ce6cebc
      Daniel Lezcano 提交于
      the commit:
      
      commit d9031024
      Author: Octavian Purdila <opurdila@ixiacom.com>
      Date:   Wed Nov 18 02:36:59 2009 +0000
      
          net: device name allocation cleanups
      
      introduced a bug when there is a hash collision making impossible
      to rename a device with eth%d. This bug is very hard to reproduce
      and appears rarely.
      
      The problem is coming from we don't pass a temporary buffer to
      __dev_alloc_name but 'dev->name' which is modified by the function.
      
      A detailed explanation is here:
      
      http://marc.info/?l=linux-netdev&m=127417784011987&w=2
      
      Changelog:
       V2 : replaced strings comparison by pointers comparison
      Signed-off-by: NDaniel Lezcano <daniel.lezcano@free.fr>
      Reviewed-by: NOctavian Purdila <opurdila@ixiacom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8ce6cebc
    • D
      rtnetlink: Fix error handling in do_setlink() · 253683bb
      David Howells 提交于
      Commit c02db8c6:
      
      	Author:  Chris Wright <chrisw@sous-sol.org>
      	Date:    Sun May 16 01:05:45 2010 -0700
      	Subject: rtnetlink: make SR-IOV VF interface symmetric
      
      adds broken error handling to do_setlink() in net/core/rtnetlink.c.  The
      problem is the following chunk of code:
      
      	if (tb[IFLA_VFINFO_LIST]) {
      		struct nlattr *attr;
      		int rem;
      		nla_for_each_nested(attr, tb[IFLA_VFINFO_LIST], rem) {
      			if (nla_type(attr) != IFLA_VF_INFO)
        ---->				goto errout;
      			err = do_setvfinfo(dev, attr);
      			if (err < 0)
      				goto errout;
      			modified = 1;
      		}
      	}
      
      which can get to errout without setting err, resulting in the following error:
      
      net/core/rtnetlink.c: In function 'do_setlink':
      net/core/rtnetlink.c:904: warning: 'err' may be used uninitialized in this function
      
      Change the code to return -EINVAL in this case.  Note that this might not be
      the appropriate error though.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      cc: Chris Wright <chrisw@sous-sol.org>
      cc: David S. Miller <davem@davemloft.net>
      Acked-by: NChris Wright <chrisw@sous-sol.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      253683bb
  17. 22 5月, 2010 4 次提交
    • J
      pipe: add support for shrinking and growing pipes · 35f3d14d
      Jens Axboe 提交于
      This patch adds F_GETPIPE_SZ and F_SETPIPE_SZ fcntl() actions for
      growing and shrinking the size of a pipe and adjusts pipe.c and splice.c
      (and relay and network splice) usage to work with these larger (or smaller)
      pipes.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      35f3d14d
    • E
      net: Expose all network devices in a namespaces in sysfs · a1b3f594
      Eric W. Biederman 提交于
      This reverts commit aaf8cdc3.
      
      Drivers like the ipw2100 call device_create_group when they
      are initialized and device_remove_group when they are shutdown.
      Moving them between namespaces deletes their sysfs groups early.
      
      In particular the following call chain results.
      netdev_unregister_kobject -> device_del -> kobject_del -> sysfs_remove_dir
      With sysfs_remove_dir recursively deleting all of it's subdirectories,
      and nothing adding them back.
      
      Ouch!
      
      Therefore we need to call something that ultimate calls sysfs_mv_dir
      as that sysfs function can move sysfs directories between namespaces
      without deleting their subdirectories or their contents.   Allowing
      us to avoid placing extra boiler plate into every driver that does
      something interesting with sysfs.
      
      Currently the function that provides that capability is device_rename.
      That is the code works without nasty side effects as originally written.
      
      So remove the misguided fix for moving devices between namespaces.  The
      bug in the kobject layer that inspired it has now been recognized and
      fixed.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      a1b3f594
    • E
      net/sysfs: Fix the bitrot in network device kobject namespace support · d6523ddf
      Eric W. Biederman 提交于
      I had a couple of stupid bugs in:
      netns: Teach network device kobjects which namespace they are in.
      
      - I duplicated the Kconfig for the NET_NS
      - The build was broken when sysfs was not compiled in
      
      The sysfs breakage is because after I moved the operations
      for the sysfs to the kobject layer, to make things cleaner
      I forgot to move the ifdefs.  Opps.
      
      I'm not quite certain how I got introduced a second NET_NS Kconfig,
      but it was probably a 3 way merge somewhere along the way that
      did not notice that the NET_NS Kconfig option had mvoed and thout
      that was a bug.  It probably slipped in because it used to be the
      sysfs patches were the first patches in my network namespace patches.
      Some things just don't go like you would expect.
      
      Neither of these bugs actually affect anything in the common case
      but they should be fixed.
      
      Thanks to Serge for noticing they were present.
      Reported-by: NSerge E. Hallyn <serue@us.ibm.com>
      Signed-off-by: NEric W. Biederman <ebiederm@aristanetworks.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      
      d6523ddf
    • E
      netns: Teach network device kobjects which namespace they are in. · 608b4b95
      Eric W. Biederman 提交于
      The problem.  Network devices show up in sysfs and with the network
      namespace active multiple devices with the same name can show up in
      the same directory, ouch!
      
      To avoid that problem and allow existing applications in network namespaces
      to see the same interface that is currently presented in sysfs, this
      patch enables the tagging directory support in sysfs.
      
      By using the network namespace pointers as tags to separate out the
      the sysfs directory entries we ensure that we don't have conflicts
      in the directories and applications only see a limited set of
      the network devices.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      608b4b95
  18. 21 5月, 2010 2 次提交
  19. 18 5月, 2010 1 次提交
    • S
      net: Add netlink support for virtual port management (was iovnl) · 57b61080
      Scott Feldman 提交于
      Add new netdev ops ndo_{set|get}_vf_port to allow setting of
      port-profile on a netdev interface.  Extends netlink socket RTM_SETLINK/
      RTM_GETLINK with two new sub msgs called IFLA_VF_PORTS and IFLA_PORT_SELF
      (added to end of IFLA_cmd list).  These are both nested atrtibutes
      using this layout:
      
                    [IFLA_NUM_VF]
                    [IFLA_VF_PORTS]
                            [IFLA_VF_PORT]
                                    [IFLA_PORT_*], ...
                            [IFLA_VF_PORT]
                                    [IFLA_PORT_*], ...
                            ...
                    [IFLA_PORT_SELF]
                            [IFLA_PORT_*], ...
      
      These attributes are design to be set and get symmetrically.  VF_PORTS
      is a list of VF_PORTs, one for each VF, when dealing with an SR-IOV
      device.  PORT_SELF is for the PF of the SR-IOV device, in case it wants
      to also have a port-profile, or for the case where the VF==PF, like in
      enic patch 2/2 of this patch set.
      
      A port-profile is used to configure/enable the external switch virtual port
      backing the netdev interface, not to configure the host-facing side of the
      netdev.  A port-profile is an identifier known to the switch.  How port-
      profiles are installed on the switch or how available port-profiles are
      made know to the host is outside the scope of this patch.
      
      There are two types of port-profiles specs in the netlink msg.  The first spec
      is for 802.1Qbg (pre-)standard, VDP protocol.  The second spec is for devices
      that run a similar protocol as VDP but in firmware, thus hiding the protocol
      details.  In either case, the specs have much in common and makes sense to
      define the netlink msg as the union of the two specs.  For example, both specs
      have a notition of associating/deassociating a port-profile.  And both specs
      require some information from the hypervisor manager, such as client port
      instance ID.
      
      The general flow is the port-profile is applied to a host netdev interface
      using RTM_SETLINK, the receiver of the RTM_SETLINK msg communicates with the
      switch, and the switch virtual port backing the host netdev interface is
      configured/enabled based on the settings defined by the port-profile.  What
      those settings comprise, and how those settings are managed is again
      outside the scope of this patch, since this patch only deals with the
      first step in the flow.
      Signed-off-by: NScott Feldman <scofeldm@cisco.com>
      Signed-off-by: NRoopa Prabhu <roprabhu@cisco.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      57b61080