1. 11 Oct, 2007 9 commits
    • [PKT_SCHED]: Add stateless NAT · b4219952
      Committed by Herbert Xu
      Stateless NAT is useful in controlled environments where restrictions are
      placed on through traffic such that we don't need connection tracking to
      correctly NAT protocol-specific data.
      
      In particular, this is of interest when the number of flows or the number
      of addresses being NATed is large, or if connection tracking information
      has to be replicated and where it is not practical to do so.
      
      Previously we had stateless NAT functionality which was integrated into
      the IPv4 routing subsystem.  This was a great solution as long as the NAT
      worked on a subnet to subnet basis such that the number of NAT rules was
      relatively small.  The reason is that for SNAT the routing based system
      had to perform a linear scan through the rules.
      
      If the number of rules is large then major renovations would have to take
      place in the routing subsystem to make this practical.
      
      For the time being, the least intrusive way of achieving this is to use
      the u32 classifier written by Alexey Kuznetsov along with the actions
      infrastructure implemented by Jamal Hadi Salim.
      
      The following patch is an attempt at this problem by creating a new nat
      action that can be invoked from u32 hash tables, which would allow a large
      number of stateless NAT rules that can be used/updated in constant time.
      
      The actual NAT code is mostly based on the previous stateless NAT code
      written by Alexey.  In future we might be able to utilise the protocol
      NAT code from netfilter to improve support for other protocols.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
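The stateless rewrite described above can be sketched in user-space C. This is a minimal illustration, not the code from the patch: `struct nat_rule`, `nat_apply()`, and the checksum helpers are hypothetical names, addresses are kept in host byte order for readability, and the checksum fix-up follows the incremental update rule from RFC 1624.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rule: map addresses in old_addr/mask onto new_addr/mask. */
struct nat_rule {
	uint32_t old_addr; /* prefix to match */
	uint32_t new_addr; /* replacement prefix */
	uint32_t mask;     /* netmask selecting the prefix bits */
};

/* Fold a 32-bit accumulator into 16 bits of one's-complement arithmetic. */
static uint16_t csum_fold32(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full checksum over n 16-bit words, used here only to verify the result. */
static uint16_t ip_checksum(const uint16_t *words, int n)
{
	uint32_t sum = 0;

	for (int i = 0; i < n; i++)
		sum += words[i];
	return (uint16_t)~csum_fold32(sum);
}

/* RFC 1624 incremental update: HC' = ~(~HC + ~m + m'), word by word. */
static uint16_t csum_replace32(uint16_t check, uint32_t from, uint32_t to)
{
	uint32_t sum = (uint16_t)~check;

	sum += (uint16_t)~(from >> 16);
	sum += (uint16_t)~(from & 0xffff);
	sum += to >> 16;
	sum += to & 0xffff;
	return (uint16_t)~csum_fold32(sum);
}

/*
 * Apply one rule with no per-flow state: keep the host bits, swap the
 * prefix, and patch the checksum. Returns 1 on a rewrite, 0 on no match.
 */
static int nat_apply(const struct nat_rule *r, uint32_t *addr, uint16_t *check)
{
	uint32_t old = *addr;
	uint32_t rewritten;

	assert(r != 0 && addr != 0 && check != 0);
	if ((old & r->mask) != (r->old_addr & r->mask))
		return 0;
	rewritten = (old & ~r->mask) | (r->new_addr & r->mask);
	*check = csum_replace32(*check, old, rewritten);
	*addr = rewritten;
	return 1;
}
```

Because each rule touches only the packet in hand, the cost per packet does not depend on the number of flows; selecting the rule in constant time is left to the classifier (the u32 hash tables in the commit message).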
    • [NET]: Move hardware header operations out of netdevice. · 3b04ddde
      Committed by Stephen Hemminger
      Since hardware header operations are part of the protocol class
      not the device instance, make them into a separate object and
      save memory.
      Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Wrap netdevice hardware header creation. · 0c4e8581
      Committed by Stephen Hemminger
      Add an inline for the common usage of hardware header creation, and
      fix a bug in IPv6 mcast where a negative return was assumed to be
      an errno. A negative return from hard_header means not enough space
      was available (i.e. -N bytes).
      Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
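The two header commits above can be pictured with a small user-space sketch: one operations table shared by every device of a link type, and an inline wrapper that callers use instead of poking at the function pointer. All names here (`mock_dev`, `mock_header_ops`, `mock_hard_header()`) are hypothetical stand-ins, not the kernel structures, and reading "-N" as the negated number of bytes required is an interpretation of the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct mock_dev;

/* One table per link-layer type, shared by all devices of that type. */
struct mock_header_ops {
	int (*create)(unsigned char *buf, size_t room,
		      const struct mock_dev *dev,
		      const void *daddr, const void *saddr);
};

struct mock_dev {
	const struct mock_header_ops *header_ops; /* NULL: no link header */
	size_t addr_len;
};

/* Wrapper: devices without header ops simply produce no header. */
static inline int mock_hard_header(unsigned char *buf, size_t room,
				   const struct mock_dev *dev,
				   const void *daddr, const void *saddr)
{
	if (!dev->header_ops || !dev->header_ops->create)
		return 0;
	return dev->header_ops->create(buf, room, dev, daddr, saddr);
}

/*
 * Example create(): destination then source address. Returns the header
 * length, or -N when N bytes were needed but did not fit. The negative
 * value is a size, so callers must not treat it as an errno.
 */
static int mock_eth_create(unsigned char *buf, size_t room,
			   const struct mock_dev *dev,
			   const void *daddr, const void *saddr)
{
	size_t need = 2 * dev->addr_len;

	assert(buf != NULL);
	if (room < need)
		return -(int)need;
	memcpy(buf, daddr, dev->addr_len);
	memcpy(buf + dev->addr_len, saddr, dev->addr_len);
	return (int)need;
}

static const struct mock_header_ops mock_eth_ops = {
	.create = mock_eth_create,
};
```

Every device of the same type points at the single `mock_eth_ops` table, which is where the memory saving mentioned in the first commit would come from.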
    • [NET_SCHED]: explict hold dev tx lock · 8236632f
      Committed by Jamal Hadi Salim
      For N CPUs, with full throttle traffic on all N CPUs, funneling traffic
      to the same ethernet device, the device's queue lock is contended by all
      N CPUs constantly. The TX lock is only contended by a max of 2 CPUs.
      In the current mode of operation, after all the work of entering the
      dequeue region, we may end up aborting the path if we are unable to get
      the tx lock and go back to contend for the queue lock. As N goes up,
      this gets worse.
      
      The changes in this patch result in a small increase in performance
      on a 4-CPU (2x dual-core) machine with no IRQ binding. Both e1000 and
      tg3 showed similar behavior.
      Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Nuke SET_MODULE_OWNER macro. · 10d024c1
      Committed by Ralf Baechle
      It's been a useless no-op for long enough in 2.6 so I figured it's time to
      remove it.  The number of people that could object because they're
      maintaining unified 2.4 and 2.6 drivers is probably rather small.
      
      [ Handled drivers added by netdev tree and some missed IRDA cases... -DaveM ]
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET_SCHED]: Cleanup L2T macros and handle oversized packets · e9bef55d
      Committed by Jesper Dangaard Brouer
      Change the L2T (length to time) macros, in all rate based schedulers, to
      call a common function qdisc_l2t() that does the rate table lookup.
      This function handles the case where the packet size lookup falls beyond
      the end of the rate table, which often occurs with TSO enabled.
      Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
      Acked-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
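A length-to-time lookup with an out-of-range guard might look like the sketch below. The struct and function names are hypothetical, and the extrapolation rule for oversized packets (whole multiples of the last table entry plus the entry for the remainder) is an assumption made for illustration; the commit message only states that the oversized case is handled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rate table: 256 slots of precomputed transmission times. */
struct mock_rate_table {
	uint8_t  cell_log;  /* packet length >> cell_log selects the slot */
	uint32_t data[256]; /* time needed to send a packet in each slot */
};

/* Map a packet length to a transmission time without reading past slot 255. */
static uint32_t mock_l2t(const struct mock_rate_table *rtab, unsigned int pktlen)
{
	unsigned int slot;

	assert(rtab != 0);
	slot = pktlen >> rtab->cell_log;
	if (slot > 255)
		/* Oversized (e.g. a TSO super-packet): extrapolate from the table. */
		return rtab->data[255] * (slot >> 8) + rtab->data[slot & 0xff];
	return rtab->data[slot];
}
```

Routing every scheduler through one helper means the bounds check lives in a single place instead of in each copy of the macro.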
    • [NET]: Make the device list and device lookups per namespace. · 881d966b
      Committed by Eric W. Biederman
      This patch makes most of the generic device layer network
      namespace safe.  This patch makes dev_base_head a
      network namespace variable, and then it picks up
      a few associated variables.  The functions:
      dev_getbyhwaddr
      dev_getfirsthwbytype
      dev_get_by_flags
      dev_get_by_name
      __dev_get_by_name
      dev_get_by_index
      __dev_get_by_index
      dev_ioctl
      dev_ethtool
      dev_load
      wireless_process_ioctl
      
      were modified to take a network namespace argument, and
      deal with it.
      
      vlan_ioctl_set and brioctl_set were modified so their
      hooks will receive a network namespace argument.
      
      So basically anything in the core of the network stack that was
      affected by the change of dev_base was modified to handle
      multiple network namespaces.  The rest of the network stack was
      simply modified to explicitly use &init_net, the initial network
      namespace.  This can be fixed when those components of the network
      stack are modified to handle multiple network namespaces.
      
      For now the ifindex generator is left global.
      
      Fundamentally ifindex numbers are per namespace, or else
      we will have corner case problems with migration when
      we get that far.
      
      At the same time there are assumptions in the network stack
      that the ifindex of a network device won't change.  Making
      the ifindex number global seems a good compromise until
      the network stack can cope with ifindex changes when
      you change namespaces, and the like.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
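The shape of the change can be illustrated with a user-space mock: the device list hangs off a namespace object, lookups take that namespace as their first argument, and code that has not been converted passes the initial namespace explicitly. The types and helpers below (`mock_net`, `mock_netdev`, `mock_dev_get_by_name()`, `mock_legacy_lookup()`) are hypothetical and only mirror the idea, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct mock_netdev {
	const char *name;
	int ifindex;              /* still allocated from one global counter */
	struct mock_netdev *next;
};

/* The device list is a member of the namespace rather than a global. */
struct mock_net {
	struct mock_netdev *dev_base_head;
};

/* Stand-in for the initial network namespace. */
static struct mock_net mock_init_net;

static void mock_register(struct mock_net *net, struct mock_netdev *dev)
{
	assert(net != NULL && dev != NULL);
	dev->next = net->dev_base_head;
	net->dev_base_head = dev;
}

/* Lookups only see devices that belong to the namespace they are given. */
static struct mock_netdev *mock_dev_get_by_name(struct mock_net *net,
						const char *name)
{
	for (struct mock_netdev *d = net->dev_base_head; d; d = d->next)
		if (strcmp(d->name, name) == 0)
			return d;
	return NULL;
}

/* Unconverted callers keep their old behaviour by naming the initial one. */
static struct mock_netdev *mock_legacy_lookup(const char *name)
{
	return mock_dev_get_by_name(&mock_init_net, name);
}
```

Two namespaces can each hold a device called `eth0` without clashing, while the globally allocated `ifindex` values stay unique across both.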
    • [NET]: Make /proc/net per network namespace · 457c4cbc
      Committed by Eric W. Biederman
      This patch makes /proc/net per network namespace.  It modifies the global
      variables proc_net and proc_net_stat to be per network namespace.
      The proc_net file helpers are modified to take a network namespace argument,
      and all of their callers are fixed to pass &init_net for that argument.
      This ensures that all of the /proc/net files are only visible and
      usable in the initial network namespace until the code behind them
      has been updated to handle multiple network namespaces.
      
      Making /proc/net per namespace is necessary as at least some files
      in /proc/net depend upon the set of network devices which is per
      network namespace, and even more files in /proc/net have contents
      that are relevant to a single network namespace.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Make NAPI polling independent of struct net_device objects. · bea3348e
      Committed by Stephen Hemminger
      Several devices have multiple independent RX queues per net
      device, and some have a single interrupt doorbell for several
      queues.
      
      In either case, it's easier to support layouts like that if the
      structure representing the poll is independent from the net
      device itself.
      
      The signature of the ->poll() callback goes from:
      
      	int foo_poll(struct net_device *dev, int *budget)
      
      to
      
      	int foo_poll(struct napi_struct *napi, int budget)
      
      The caller is returned the number of RX packets processed (or
      the number of "NAPI credits" consumed if you want to get
      abstract).  The callee no longer messes around bumping
      dev->quota, *budget, etc. because that is all handled in the
      caller upon return.
      
      The napi_struct is to be embedded in the device driver private data
      structures.
      
      Furthermore, it is the driver's responsibility to disable all NAPI
      instances in its ->stop() device close handler.  Since the
      napi_struct is privatized into the driver's private data structures,
      only the driver knows how to get at all of the napi_struct instances
      it may have per-device.
      
      With lots of help and suggestions from Rusty Russell, Roland Dreier,
      Michael Chan, Jeff Garzik, and Jamal Hadi Salim.
      
      Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra,
      Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan.
      
      [ Ported to current tree and all drivers converted.  Integrated
        Stephen's follow-on kerneldoc additions, and restored poll_list
        handling to the old style to fix mutual exclusion issues.  -DaveM ]
      Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
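The new calling convention can be mocked in user space as follows. `mock_napi`, `foo_priv`, and `mock_run_poll()` are hypothetical stand-ins for the kernel types and the core polling loop; the sketch only shows the two points the message makes: the poll context is embedded in the driver's private data, and budget accounting happens in the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the poll context, independent of any net device. */
struct mock_napi {
	int (*poll)(struct mock_napi *napi, int budget);
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define mock_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Driver private data embeds the poll context (one per RX queue if needed). */
struct foo_priv {
	int rx_pending;        /* packets waiting in this queue's ring */
	struct mock_napi napi;
};

/* New-style poll: process at most `budget` packets, return the count. */
static int foo_poll(struct mock_napi *napi, int budget)
{
	struct foo_priv *priv = mock_container_of(napi, struct foo_priv, napi);
	int done = 0;

	while (done < budget && priv->rx_pending > 0) {
		priv->rx_pending--; /* pretend one packet went up the stack */
		done++;
	}
	return done;
}

/* Caller side: the quota bookkeeping lives here, not in the driver. */
static int mock_run_poll(struct mock_napi *napi, int *total_budget, int weight)
{
	int work;

	assert(napi != NULL && napi->poll != NULL);
	work = napi->poll(napi, weight);
	*total_budget -= work;
	return work;
}
```

A driver with several RX queues would embed one `mock_napi` per queue in its private structure, which is also why only the driver can find and disable all of them when the device is closed.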
  2. 08 Oct, 2007 1 commit
  3. 02 Oct, 2007 1 commit
  4. 21 Sep, 2007 1 commit
  5. 17 Sep, 2007 1 commit
  6. 15 Sep, 2007 1 commit
    • [NET_SCHED] protect action config/dump from irqs · e1e992e5
      Committed by Jamal Hadi Salim
      (with no apologies to C Heston)
      
      On Mon, 2007-10-09 at 21:00 +0800, Herbert Xu wrote:
      On Sun, Sep 02, 2007 at 01:11:29PM +0000, Christian Kujau wrote:
      > >
      > > after upgrading to 2.6.23-rc5 (and applying davem's fix [0]), lockdep
      > > was quite noisy when I tried to shape my external (wireless) interface:
      > >
      > > [ 6400.534545] FahCore_78.exe/3552 just changed the state of lock:
      > > [ 6400.534713]  (&dev->ingress_lock){-+..}, at: [<c038d595>]
      > > netif_receive_skb+0x2d5/0x3c0
      > > [ 6400.534941] but this lock took another, soft-read-irq-unsafe lock in the
      > > past:
      > > [ 6400.535145]  (police_lock){-.--}
      >
      > This is a genuine dead-lock.  The police lock can be taken
      > for reading with softirqs on.  If a second CPU tries to take
      > the police lock for writing, while holding the ingress lock,
      > then a softirq on the first CPU can dead-lock when it tries
      > to get the ingress lock.
      Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 31 Aug, 2007 1 commit
    • [NET_SCHED] sch_prio.c: remove duplicate call of tc_classify() · dbaaa07a
      Committed by Lucas Nussbaum
      When CONFIG_NET_CLS_ACT is enabled, tc_classify() is called twice in
      prio_classify(). This causes "interesting" behaviour: with the setup
      below, packets are duplicated, sent twice to ifb0, and then loop in and
      out of ifb0.
      
      The patch uses the previously calculated return value in the switch,
      which is probably what Patrick had in mind in commit
      bdba91ec -- maybe Patrick can
      double-check this?
      
      -- example setup --
      ifconfig ifb0 up
      tc qdisc add dev ifb0 root netem delay 2s
      tc qdisc add dev $ETH root handle 1: prio
      tc filter add dev $ETH parent 1: protocol ip prio 10 u32 \
       match ip dst 172.24.110.6/32 flowid 1:1 \
       action mirred egress redirect dev ifb0
      ping -c1 172.24.110.6
      Signed-off-by: Lucas Nussbaum <lucas.nussbaum@imag.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
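Why a second classification call matters can be seen in a toy model where classifying a packet also runs its attached action. Everything below is a hypothetical mock (`mock_classify()`, `classify_twice()`, `classify_once()`), not the sch_prio.c code; it only contrasts calling the classifier again inside the `switch` with reusing the value already computed.

```c
#include <assert.h>

enum mock_verdict { MOCK_OK, MOCK_STOLEN };

/* Counts side effects, e.g. how many times a packet was redirected. */
static int mock_actions_run;

/* Stand-in classifier: each call also executes the attached action. */
static enum mock_verdict mock_classify(void)
{
	mock_actions_run++;
	return MOCK_STOLEN;
}

/* Problem pattern: the result is computed, then the classifier runs again. */
static int classify_twice(void)
{
	enum mock_verdict err = mock_classify();

	(void)err; /* computed but never consulted */
	switch (mock_classify()) {
	case MOCK_STOLEN:
		return 0; /* packet consumed by the action */
	default:
		return 1; /* packet should be queued */
	}
}

/* Fixed pattern: switch on the value that was already calculated. */
static int classify_once(void)
{
	enum mock_verdict err = mock_classify();

	assert(mock_actions_run > 0);
	switch (err) {
	case MOCK_STOLEN:
		return 0;
	default:
		return 1;
	}
}
```

In the mock, the first variant triggers the action twice per packet, which is one way to picture the duplicated packets described in the commit message.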
  8. 14 Aug, 2007 1 commit
  9. 31 Jul, 2007 3 commits
  10. 18 Jul, 2007 2 commits
  11. 15 Jul, 2007 6 commits
  12. 12 Jul, 2007 1 commit
  13. 11 Jul, 2007 10 commits
  14. 08 Jun, 2007 1 commit
  15. 04 Jun, 2007 1 commit