1. 29 Jan 2008, 16 commits
  2. 10 Jan 2008, 1 commit
  3. 09 Jan 2008, 1 commit
    • [IPV4] ROUTE: ip_rt_dump() is unnecessarily slow · d8c92830
      Committed by Eric Dumazet
      I noticed "ip route list cache x.y.z.t" can be *very* slow.
      
      While strace-ing it with -T, I also noticed that the first part of the
      route cache is fetched quite fast:
      
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3772 <0.000047>
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3736 <0.000042>
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3740 <0.000055>
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3712 <0.000043>
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3732 <0.000053>
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3708 <0.000052>
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3680 <0.000041>
      
      while the part at the end of the table is more expensive:
      
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3656 <0.003857>
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3772 <0.003891>
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3712 <0.003765>
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3700 <0.003879>
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3676 <0.003797>
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3724 <0.003856>
      recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3736 <0.003848>
      
      The following patch corrects this performance/latency problem,
      removing quadratic behavior.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d8c92830
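      To illustrate the quadratic-behavior point above, here is a minimal,
      hedged C sketch with hypothetical types and names (it is not the
      actual patch): a netlink-style dump that remembers a (bucket, entry)
      cursor and skips already-dumped buckets outright, instead of
      re-walking them on every recvmsg() call.

      struct cache_entry { struct cache_entry *next; };
      struct cache_bucket { struct cache_entry *chain; };

      /* cb_args[0]/cb_args[1] play the role of the netlink callback cursor
       * (bucket index, entry index within that bucket). */
      static int dump_cache(struct cache_bucket *table, unsigned long nbuckets,
                            unsigned long *cb_args,
                            int (*emit)(struct cache_entry *, void *), void *arg)
      {
              unsigned long h, e;

              for (h = cb_args[0]; h < nbuckets; h++) {
                      struct cache_entry *p;

                      e = 0;
                      for (p = table[h].chain; p; p = p->next, e++) {
                              /* only the resumed bucket needs per-entry skipping */
                              if (h == cb_args[0] && e < cb_args[1])
                                      continue;
                              if (emit(p, arg) < 0) {
                                      /* out of buffer space: remember where to resume */
                                      cb_args[0] = h;
                                      cb_args[1] = e;
                                      return -1;
                              }
                      }
              }
              cb_args[0] = nbuckets;  /* dump complete */
              cb_args[1] = 0;
              return 0;
      }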
  4. 07 Dec 2007, 2 commits
  5. 19 Nov 2007, 1 commit
  6. 15 Nov 2007, 1 commit
    • [NET]: rt_check_expire() can take a long time, add a cond_resched() · d90bf5a9
      Committed by Eric Dumazet
      In commit 39c90ece:
      
      	[IPV4]: Convert rt_check_expire() from softirq processing to workqueue.
      
      we converted rt_check_expire() from softirq to workqueue, allowing the
      function to perform all work it was supposed to do.
      
      When the IP route cache is big, rt_check_expire() can take a long time
      to run.  (With default settings, 20% of the hash table is scanned at
      each invocation.)
      
      Adding cond_resched() helps give the CPU to higher-priority tasks
      when necessary.
      
      Using a "if (need_resched())" test before calling "cond_resched();" is
      necessary to avoid spending too much time doing the resched check.
      (My tests gave a time reduction from 88 ms to 25 ms per
      rt_check_expire() run on my i686 test machine)
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d90bf5a9
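      A minimal, hedged sketch of the pattern described above (the loop body
      is hypothetical; rt_check_expire() itself walks the route-cache hash
      table): need_resched() is just a cheap flag test, so the full
      cond_resched() call is only paid for when another task actually wants
      the CPU.

      #include <linux/sched.h>

      /* Scan a large table from process (workqueue) context, periodically
       * offering the CPU back to higher-priority tasks. */
      static void scan_table_sketch(unsigned int nbuckets)
      {
              unsigned int i;

              for (i = 0; i < nbuckets; i++) {
                      if (need_resched())
                              cond_resched();

                      /* ... scan and expire entries in bucket i ... */
              }
      }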
  7. 11 Nov 2007, 2 commits
  8. 11 Oct 2007, 7 commits
    • [NET]: Make core networking code use seq_open_private · cf7732e4
      Committed by Pavel Emelyanov
      This concerns the ipv4 and ipv6 code mostly, but also the netlink
      and unix sockets.
      
      The netlink code is an example of how to use the __seq_open_private()
      call - it saves the net namespace in this private data.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cf7732e4
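      A hedged sketch of the __seq_open_private() usage pattern mentioned
      above (the seq_operations and the state struct are hypothetical): the
      helper allocates and zeroes a private area of the requested size,
      attaches it to the seq_file, and returns it, so the open handler can
      stash per-instance state such as a struct net pointer.

      #include <linux/errno.h>
      #include <linux/seq_file.h>
      #include <net/net_namespace.h>

      static const struct seq_operations foo_seq_ops; /* assumed defined elsewhere */

      /* Hypothetical per-open state kept in the seq_file private area. */
      struct foo_seq_state {
              struct net *net;
              int bucket;
      };

      static int foo_seq_open(struct inode *inode, struct file *file)
      {
              struct foo_seq_state *st;

              st = __seq_open_private(file, &foo_seq_ops, sizeof(*st));
              if (!st)
                      return -ENOMEM;

              st->net = &init_net;    /* used later by the show/next callbacks */
              return 0;
      }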
    • [NET]: sparse warning fixes · cfcabdcc
      Committed by Stephen Hemminger
      Fix a bunch of sparse warnings, mostly about 0 used as a NULL
      pointer and shadowed variable declarations.
      One notable case was that the hash size should have been unsigned.
      Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cfcabdcc
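      For illustration only, a hypothetical snippet (not from the patch)
      showing the warning classes mentioned above: 0 used where NULL is
      meant, a shadowed declaration, and a signed variable used as a size.

      struct item { int key; };

      static struct item *find_item(struct item *table, unsigned int hash_size,
                                    int key)
      {
              struct item *found = NULL;      /* "= 0" here would draw a sparse
                                               * plain-integer-as-NULL warning */
              unsigned int i;                 /* sizes/indices kept unsigned */

              for (i = 0; i < hash_size; i++) {
                      /* redeclaring "int i" in this block would shadow the
                       * loop counter and be flagged as well */
                      if (table[i].key == key)
                              found = &table[i];
              }
              return found;
      }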
    • [NET]: Make the loopback device per network namespace. · 2774c7ab
      Committed by Eric W. Biederman
      This patch makes loopback_dev per network namespace.  It adds
      code to create a different loopback device for each network
      namespace and to free that loopback device
      when its network namespace exits.
      
      This patch modifies all users of loopback_dev so they
      access it as init_net.loopback_dev, keeping all of the
      code compiling and working.  A later pass will be needed to
      update the users to use something other than the initial network
      namespace.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2774c7ab
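      A hedged one-function sketch of the access pattern this commit leaves
      in place for not-yet-converted code (the helper name is hypothetical):
      the loopback device is reached through the initial namespace rather
      than through a single global variable.

      #include <linux/netdevice.h>
      #include <net/net_namespace.h>

      static struct net_device *grab_loopback_sketch(void)
      {
              return init_net.loopback_dev;   /* per-namespace loopback */
      }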
    • [NET]: Dynamically allocate the loopback device, part 1. · de3cb747
      Committed by Daniel Lezcano
      This patch replaces all references to the static variable
      loopback_dev with a pointer loopback_dev.  That provides the
      mindless, trivial, uninteresting part of the change needed for the
      dynamic allocation of the loopback device.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
      Acked-by: Kirill Korotaev <dev@sw.ru>
      Acked-by: Benjamin Thery <benjamin.thery@bull.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      de3cb747
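      For illustration, a hedged sketch of the mechanical substitution
      described above (the function is hypothetical): loopback_dev becomes a
      struct net_device pointer, so former "&loopback_dev" references turn
      into plain "loopback_dev".

      #include <linux/netdevice.h>
      #include <linux/skbuff.h>

      extern struct net_device *loopback_dev;        /* now a pointer */

      static void send_to_loopback_sketch(struct sk_buff *skb)
      {
              skb->dev = loopback_dev;        /* previously: &loopback_dev */
      }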
    • [IPV4]: Convert rt_check_expire() from softirq processing to workqueue. · 39c90ece
      Committed by Eric Dumazet
      On loaded/big hosts, rt_check_expire() is of little use, because it
      generally breaks out of its main loop because of a jiffies change.
      
      It can take a long time (read: many timer invocations) to actually
      scan the whole hash table, freeing unused entries.
      
      Converting it to use a workqueue instead of softirq is a nice
      move because we can allow rt_check_expire() to do the scan
      it is supposed to do, without hogging the CPU.
      
      This has an impact on the average number of entries in the cache,
      reducing RAM usage.  The cache is also more responsive to parameter
      changes (/proc/sys/net/ipv4/route/gc_timeout and
      /proc/sys/net/ipv4/route/gc_interval).
      
      Note: Maybe the default value of gc_interval (60 seconds)
      is too high, since this means we actually need 5 (300/60)
      invocations to scan the whole table.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      39c90ece
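      A hedged sketch of the delayed-workqueue pattern the commit above
      describes (names are hypothetical; the real patch wires the interval
      to the route-cache gc parameters): work run from the shared workqueue
      executes in process context, so it may be preempted and can finish a
      full scan instead of bailing out the way a softirq or timer handler must.

      #include <linux/init.h>
      #include <linux/jiffies.h>
      #include <linux/workqueue.h>

      static void cache_expire_work(struct work_struct *work);
      static DECLARE_DELAYED_WORK(expire_work, cache_expire_work);

      static void cache_expire_work(struct work_struct *work)
      {
              /* ... scan part of the hash table, freeing expired entries ... */

              /* re-arm, playing the role of the old periodic softirq/timer */
              schedule_delayed_work(&expire_work, 60 * HZ);
      }

      static int __init cache_gc_init_sketch(void)
      {
              schedule_delayed_work(&expire_work, 60 * HZ);
              return 0;
      }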
    • [NET]: Make the device list and device lookups per namespace. · 881d966b
      Committed by Eric W. Biederman
      This patch makes most of the generic device layer network
      namespace safe.  It makes dev_base_head a
      network namespace variable, and along the way it picks up
      a few associated variables.  The functions:
      dev_getbyhwaddr
      dev_getfirstbyhwtype
      dev_get_by_flags
      dev_get_by_name
      __dev_get_by_name
      dev_get_by_index
      __dev_get_by_index
      dev_ioctl
      dev_ethtool
      dev_load
      wireless_process_ioctl
      
      were modified to take a network namespace argument, and
      deal with it.
      
      vlan_ioctl_set and brioctl_set were modified so their
      hooks will receive a network namespace argument.
      
      So basically anything in the core of the network stack that was
      affected by the change of dev_base was modified to handle
      multiple network namespaces.  The rest of the network stack was
      simply modified to explicitly use &init_net, the initial network
      namespace.  This can be fixed when those components of the network
      stack are modified to handle multiple network namespaces.
      
      For now the ifindex generator is left global.
      
      Fundamentally, ifindex numbers are per namespace, or else
      we will have corner-case problems with migration when
      we get that far.
      
      At the same time there are assumptions in the network stack
      that the ifindex of a network device won't change.  Making
      the ifindex number global seems a good compromise until
      the network stack can cope with ifindex changes when
      you change namespaces, and the like.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      881d966b
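      A sketch of the post-patch calling convention for the lookups listed
      above, assuming a caller that has not been made namespace-aware yet
      and therefore passes &init_net explicitly (the helper itself is
      hypothetical):

      #include <linux/errno.h>
      #include <linux/netdevice.h>
      #include <net/net_namespace.h>

      static int read_mtu_sketch(const char *name)
      {
              struct net_device *dev;
              int mtu;

              dev = dev_get_by_name(&init_net, name); /* previously took only the name */
              if (!dev)
                      return -ENODEV;

              mtu = dev->mtu;
              dev_put(dev);   /* dev_get_by_name() took a reference */
              return mtu;
      }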
    • [NET]: Make /proc/net per network namespace · 457c4cbc
      Committed by Eric W. Biederman
      This patch makes /proc/net per network namespace.  It modifies the global
      variables proc_net and proc_net_stat to be per network namespace.
      The proc_net file helpers are modified to take a network namespace argument,
      and all of their callers are fixed to pass &init_net for that argument.
      This ensures that all of the /proc/net files are only visible and
      usable in the initial network namespace until the code behind them
      has been updated to handle multiple network namespaces.
      
      Making /proc/net per namespace is necessary as at least some files
      in /proc/net depend upon the set of network devices which is per
      network namespace, and even more files in /proc/net have contents
      that are relevant to a single network namespace.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      457c4cbc
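      A hedged sketch of the proc_net helper usage after this change (file
      name and fops are hypothetical, and the helper signatures are the ones
      I believe were current at the time): entries are created against an
      explicit namespace, and unconverted code passes &init_net so the file
      only appears in the initial namespace.

      #include <linux/errno.h>
      #include <linux/init.h>
      #include <linux/proc_fs.h>
      #include <linux/stat.h>
      #include <net/net_namespace.h>

      static const struct file_operations foo_proc_fops; /* assumed defined elsewhere */

      static int __init foo_proc_init_sketch(void)
      {
              if (!proc_net_fops_create(&init_net, "foo_sketch", S_IRUGO, &foo_proc_fops))
                      return -ENOMEM;
              return 0;
      }

      static void foo_proc_exit_sketch(void)
      {
              proc_net_remove(&init_net, "foo_sketch");
      }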
  9. 03 Aug 2007, 1 commit
  10. 20 Jul 2007, 1 commit
    • mm: Remove slab destructors from kmem_cache_create(). · 20c2df83
      Committed by Paul Mundt
      Slab destructors were no longer supported after Christoph's
      c59def9f change. They've been
      BUGs for both slab and slub, and slob never supported them
      either.
      
      This rips out support for the dtor pointer from kmem_cache_create()
      completely and fixes up every single callsite in the kernel (there were
      about 224, not including the slab allocator definitions themselves,
      or the documentation references).
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      20c2df83
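      For illustration, the shape of a typical callsite fix-up (cache name
      and struct are hypothetical): the trailing destructor argument, which
      was almost always NULL, simply disappears.

      #include <linux/errno.h>
      #include <linux/init.h>
      #include <linux/slab.h>

      struct foo { int x; };
      static struct kmem_cache *foo_cachep;

      static int __init foo_cache_init_sketch(void)
      {
              /* before: kmem_cache_create("foo_cache", sizeof(struct foo), 0,
               *                           SLAB_HWCACHE_ALIGN, NULL, NULL); */
              foo_cachep = kmem_cache_create("foo_cache", sizeof(struct foo), 0,
                                             SLAB_HWCACHE_ALIGN, NULL);
              return foo_cachep ? 0 : -ENOMEM;
      }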
  11. 11 Jul 2007, 2 commits
  12. 08 Jun 2007, 1 commit
  13. 25 May 2007, 1 commit
    • [XFRM]: Allow packet drops during larval state resolution. · 14e50e57
      Committed by David S. Miller
      The current IPSEC rule resolution behavior we have does not work for a
      lot of people, even though technically it's an improvement over the
      -EAGAIN business we had before.
      
      Right now we'll block until the key manager resolves the route.  That
      works for simple cases, but many folks would rather packets get
      silently dropped until the key manager resolves the IPSEC rules.
      
      We can't tell these folks to "set the socket non-blocking" because
      they don't have control over the non-block setting of things like the
      sockets used to resolve DNS deep inside of the resolver libraries in
      libc.
      
      With that in mind I coded up the patch below, with some help from
      Herbert Xu, which provides packet-drop behavior during larval state
      resolution, controllable via sysctl and off by default.
      
      This lays the framework to either:
      
      1) Make this default at some point or...
      
      2) Move this logic into xfrm{4,6}_policy.c and implement the
         ARP-like resolution queue we've all been dreaming of.
         The idea would be to queue packets to the policy, then
         once the larval state is resolved by the key manager we
         re-resolve the route and push the packets out.  The
         packets would timeout if the rule didn't get resolved
         in a certain amount of time.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      14e50e57
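      A hedged sketch of the decision point this patch adds (the helper and
      its argument are hypothetical stand-ins, the errno choice is
      illustrative, and the sysctl flag is assumed to be the knob introduced
      by the commit, off by default):

      #include <linux/errno.h>

      extern int sysctl_xfrm_larval_drop;     /* assumed: the new sysctl, 0 by default */

      /* Hypothetical stand-in for blocking on the key manager. */
      static int wait_for_key_manager_sketch(void)
      {
              return 0;
      }

      static int resolve_or_drop_sketch(int state_is_larval)
      {
              if (!state_is_larval)
                      return 0;               /* states resolved: proceed as usual */

              if (sysctl_xfrm_larval_drop)
                      return -EREMOTE;        /* drop the packet instead of blocking */

              return wait_for_key_manager_sketch();   /* old behavior: wait */
      }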
  14. 18 May 2007, 1 commit
  15. 26 Apr 2007, 2 commits