1. 25 Nov 2010 · 1 commit
  2. 16 Nov 2010 · 1 commit
    • IPVS: Add persistence engine to connection entry · e9e5eee8
      Committed by Simon Horman
      The dest of a connection may not exist if it has been created as the result
      of connection synchronisation. But in order for connection entries for
      templates with persistence engine data created through connection
      synchronisation to be valid, access to the persistence engine pointer is
      required.  So add the persistence engine to the connection itself.
      Signed-off-by: Simon Horman <horms@verge.net.au>
  3. 21 Oct 2010 · 1 commit
    • ipvs: changes for local real server · fc604767
      Committed by Julian Anastasov
      This patch deals with local real servers:
      
      - Add support for DNAT to local address (different real server port).
      It needs ip_vs_out hook in LOCAL_OUT for both families because
      skb->protocol is not set for locally generated packets and can not
      be used to set 'af'.
      
      - Skip packets in ip_vs_in marked with skb->ipvs_property because
      ip_vs_out processing can be executed in LOCAL_OUT but we still
      have the conn_out_get check in ip_vs_in.
      
      - Ignore packets with inet->nodefrag from local stack
      
      - Require skb_dst(skb) != NULL because we use it to get struct net
      
      - Add support for changing the route to local IPv4 stack after DNAT
      depending on the source address type. Local client sets output
      route and the remote client sets input route. It looks like
      IPv6 does not need such rerouting because the replies use
      addresses from initial incoming header, not from skb route.
      
      - All transmitters now have strict checks for the destination
      address type: redirect from non-local address to local real
      server requires NAT method, local address can not be used as
      source address when talking to remote real server.
      
      - Now LOCALNODE is not set explicitly as forwarding
      method in real server to allow the connections to provide
      correct forwarding method to the backup server. Not sure if
      this breaks tools that expect to see 'Local' real server type.
      If needed, this can be supported with new flag IP_VS_DEST_F_LOCAL.
      Now it should be possible for connections in backup that lost
      their fwmark information during sync to be forwarded properly
      to their daddr, even if it is a local address in the backup server.
      This way the backup could be used as real server for DR or TUN;
      for NAT there are some restrictions because tuple collisions
      in conntracks can create problems for the traffic.
      
      - Call ip_vs_dst_reset when destination is updated in case
      some real server IP type is changed between local and remote.
      
      [ horms@verge.net.au: removed trailing whitespace ]
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
  4. 19 Oct 2010 · 1 commit
  5. 04 Oct 2010 · 3 commits
  6. 22 Sep 2010 · 1 commit
    • ipvs: changes related to service usecnt · 26c15cfd
      Committed by Julian Anastasov
      Change the usage of svc usecnt during command execution:
      
      - we check if svc is registered but we do not need to hold usecnt
      reference while under __ip_vs_mutex, only the packet handling needs
      it during scheduling
      
      - change __ip_vs_service_get to __ip_vs_service_find and
      __ip_vs_svc_fwm_get to __ip_vs_svc_fwm_find because now caller
      will increase svc->usecnt
      
      - put common code that calls update_service in __ip_vs_update_dest
      
      - put common code in ip_vs_unlink_service() and use it to unregister
      the service
      
      - add comment that svc should not be accessed after ip_vs_del_service
      anymore
      
      - all IP_VS_WAIT_WHILE calls are now unified: usecnt > 0
      
      - Properly log the app ports
      
      As a result, some problems are fixed:
      
      - possible use-after-free of svc in ip_vs_genl_set_cmd after
      ip_vs_del_service because our usecnt reference does not guarantee that
      svc is not freed on refcnt==0, e.g. when no dests are moved to trash
      
      - possible usecnt leak in do_ip_vs_set_ctl after ip_vs_del_service
      when the service is not freed now, for example, when some
      destinations are moved into trash and svc->refcnt remains above 0.
      It is harmless because svc is not in hash anymore.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Acked-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
  7. 21 Sep 2010 · 2 commits
    • ipvs: make rerouting optional with snat_reroute · 8a803040
      Committed by Julian Anastasov
      Add new sysctl flag "snat_reroute". Recent kernels use
      ip_route_me_harder() to route LVS-NAT responses properly by
      VIP when there are multiple paths to client. But setups
      that do not have alternative default routes can skip this
      routing lookup by using snat_reroute=0.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
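
As a rough illustration of how the "snat_reroute" flag from this commit might be inspected or changed from user space, here is a minimal sketch. It assumes the IPVS sysctls are exposed under /proc/sys/net/ipv4/vs; the helper names (`sysctl_path`, `read_flag`, `write_flag`) are hypothetical and not part of IPVS or any existing tool.

```python
from pathlib import Path

# Assumption: IPVS sysctl entries live in this procfs directory.
IPVS_SYSCTL_DIR = Path("/proc/sys/net/ipv4/vs")


def sysctl_path(name, base=IPVS_SYSCTL_DIR):
    """Build the path of an IPVS sysctl entry (hypothetical helper)."""
    return Path(base) / name


def read_flag(name, base=IPVS_SYSCTL_DIR):
    """Return the integer value of a sysctl entry, or None if it is unavailable."""
    try:
        return int(sysctl_path(name, base).read_text().strip())
    except (FileNotFoundError, PermissionError):
        return None


def write_flag(name, value, base=IPVS_SYSCTL_DIR):
    """Write an integer value to a sysctl entry (needs root on a real system)."""
    sysctl_path(name, base).write_text(f"{int(value)}\n")


if __name__ == "__main__":
    # Prints None when IPVS is not loaded or the entry is not readable.
    print("snat_reroute =", read_flag("snat_reroute"))
```

On a director without alternative default routes, calling `write_flag("snat_reroute", 0)` as root would correspond to the snat_reroute=0 setting the commit message describes.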
    • ipvs: netfilter connection tracking changes · f4bc17cd
      Committed by Julian Anastasov
      Add more code to IPVS to work with Netfilter connection
      tracking and fix some problems.
      
      - Allow IPVS to be compiled without connection tracking as in
      2.6.35 and before. This can avoid keeping conntracks for all
      IPVS connections because this costs memory. ip_vs_ftp still
      depends on connection tracking and NAT as implemented for 2.6.36.
      
      - Add sysctl var "conntrack" to enable connection tracking for
      all IPVS connections. For loaded IPVS directors it needs
      tuning of nf_conntrack_max limit.
      
      - Add IP_VS_CONN_F_NFCT connection flag to request the connection
      to use connection tracking. This allows user space to provide this
      flag, for example, in dest->conn_flags. This can be useful to
      request connection tracking per real server instead of forcing it
      for all connections with the "conntrack" sysctl. This flag is
      set currently only by ip_vs_ftp and of course by "conntrack" sysctl.
      
      - Add ip_vs_nfct.c file to hold all connection tracking code;
      this way the main code should not depend on netfilter conntrack
      support.
      
      - Return back the ip_vs_post_routing handler as in 2.6.35 and use
      skb->ipvs_property=1 to allow IPVS to work without connection
      tracking
      
      Connection tracking:
      
      - most of the code is already in 2.6.36-rc
      
      - alter conntrack reply tuple for LVS-NAT connections when first packet
      from client is forwarded and conntrack state is NEW or RELATED.
      Additionally, alter reply for RELATED connections from real server,
      again for packet in original direction.
      
      - add IP_VS_XMIT_TUNNEL to confirm conntrack (without altering
      reply) for LVS-TUN early because we want to call nf_reset. It is
      needed because we add IPIP header and the original conntrack
      should be preserved, not destroyed. The transmitted IPIP packets
      can reuse same conntrack, so we do not set skb->ipvs_property.
      
      - try to destroy conntrack when the IPVS connection is destroyed.
      It is not fatal if conntrack disappears before that, it depends
      on the used timers.
      
      Fix long-standing problems:
      
      - add skb->ip_summed = CHECKSUM_NONE for the LVS-TUN transmitters
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
  8. 17 Sep 2010 · 1 commit
  9. 27 Aug 2010 · 2 commits
  10. 26 Aug 2010 · 1 commit
  11. 22 Jun 2010 · 1 commit
  12. 30 Mar 2010 · 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Committed by Tejun Heo
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
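
The first rule above (gfp.h when only gfp is used, slab.h when slab is used) can be sketched roughly as follows. This is not the slabh-sweep.py script itself: the identifier patterns and the `needed_include` helper are simplified assumptions for illustration, and the real script is considerably more thorough.

```python
import re

# Assumption: a small, hypothetical sample of slab API names.
SLAB_RE = re.compile(r"\b(?:kmalloc|kzalloc|kcalloc|kfree|kmem_cache_\w+)\s*\(")
# Assumption: gfp usage is detected by GFP_* / __GFP_* flag names.
GFP_RE = re.compile(r"\b(?:__)?GFP_[A-Z_]+\b")


def needed_include(source):
    """Return the single header a file should include directly, or None.

    slab.h wins when any slab API is used; gfp.h is chosen only when
    gfp flags appear without slab usage.
    """
    if SLAB_RE.search(source):
        return "linux/slab.h"
    if GFP_RE.search(source):
        return "linux/gfp.h"
    return None


if __name__ == "__main__":
    print(needed_include("buf = kmalloc(len, GFP_KERNEL);"))
    print(needed_include("page = alloc_page(GFP_KERNEL);"))
    print(needed_include("int x = 1;"))
```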
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  13. 18 Feb 2010 · 1 commit
  14. 05 Jan 2010 · 1 commit
    • IPVS: Allow boot time change of hash size · 6f7edb48
      Committed by Catalin(ux) M. BOIE
      I was very frustrated about the fact that I have to recompile the kernel
      to change the hash size. So, I created this patch.
      
      If IPVS is built-in you can append ip_vs.conn_tab_bits=?? to kernel
      command line, or, if you built IPVS as modules, you can add
      options ip_vs conn_tab_bits=??.
      
      To keep everything backward compatible, you still can select the size at
      compile time, and that will be used as default.
      
      It has been about a year since this patch was originally posted
      and subsequently dropped on the basis of insufficient test data.
      
      Mark Bergsma has provided the following test results which seem
      to strongly support the need for larger hash table sizes:
      
      We do however run into the same problem with the default setting (2^12 =
      4096 entries), as most of our LVS balancers handle around a million
      connections/SLAB entries at any point in time (around 100-150 kpps
      load). With only 4096 hash table entries this implies that each entry
      consists of a linked list of 256 connections *on average*.
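
The arithmetic behind that estimate can be checked with a short sketch. The helper names are hypothetical, and "around a million" connections is taken as 2^20 so that the figures come out as round numbers.

```python
def buckets(tab_bits):
    """Number of hash table entries for a given number of table bits."""
    return 1 << tab_bits


def average_chain_length(connections, tab_bits):
    """Mean number of connections per hash bucket, assuming an even spread."""
    return connections / buckets(tab_bits)


if __name__ == "__main__":
    connections = 1 << 20  # roughly one million connections
    for bits in (12, 18):
        print(bits, buckets(bits), average_chain_length(connections, bits))
    # 12 4096 256.0
    # 18 262144 4.0
```

With 12 bits each lookup walks a chain of about 256 entries on average, while 18 bits brings that down to about 4.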
      
      To provide some statistics, I did an oprofile run on a 2.6.31 kernel,
      with both the default 4096 table size, and the same kernel recompiled
      with IP_VS_CONN_TAB_BITS set to 18 (2^18 = 262144 entries). I built a
      quick test setup with a part of Wikimedia/Wikipedia's live traffic
      mirrored by the switch to the test host.
      
      With the default setting, at ~ 120 kpps packet load we saw a typical %si
      CPU usage of around 30-35%, and oprofile reported a hot spot in
      ip_vs_conn_in_get:
      
      samples  %        image name  app name  symbol name
      1719761  42.3741  ip_vs.ko    ip_vs.ko  ip_vs_conn_in_get
      302577    7.4554  bnx2        bnx2      /bnx2
      181984    4.4840  vmlinux     vmlinux   __ticket_spin_lock
      128636    3.1695  vmlinux     vmlinux   ip_route_input
      74345     1.8318  ip_vs.ko    ip_vs.ko  ip_vs_conn_out_get
      68482     1.6874  vmlinux     vmlinux   mwait_idle
      
      After loading the recompiled kernel with 2^18 entries, %si CPU usage
      dropped in half to around 12-18%, and oprofile looks much healthier,
      with only 7% spent in ip_vs_conn_in_get:
      
      samples  %        image name  app name  symbol name
      265641   14.4616  bnx2        bnx2      /bnx2
      143251    7.7986  vmlinux     vmlinux   __ticket_spin_lock
      140661    7.6576  ip_vs.ko    ip_vs.ko  ip_vs_conn_in_get
      94364     5.1372  vmlinux     vmlinux   mwait_idle
      86267     4.6964  vmlinux     vmlinux   ip_route_input
      
      [ horms@verge.net.au: trivial up-port and minor style fixes ]
      Signed-off-by: Catalin(ux) M. BOIE <catab@embedromix.ro>
      Cc: Mark Bergsma <mark@wikimedia.org>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
  15. 04 Jan 2010 · 1 commit
  16. 16 Dec 2009 · 1 commit
    • ipvs: zero usvc and udest · 258c8893
      Committed by Simon Horman
      Make sure that any otherwise uninitialised fields of usvc are zero.
      
      This has been observed to cause a problem whereby the port of
      fwmark services may end up as a non-zero value which causes
      scheduling of a destination server to fail for persistent services.
      
      As observed by Deon van der Merwe <dvdm@truteq.co.za>.
      This fix suggested by Julian Anastasov <ja@ssi.bg>.
      
      For good measure also zero udest.
      
      Cc: Deon van der Merwe <dvdm@truteq.co.za>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Cc: stable@kernel.org
      Signed-off-by: Patrick McHardy <kaber@trash.net>
  17. 12 Nov 2009 · 1 commit
    • sysctl net: Remove unused binary sysctl code · f8572d8f
      Committed by Eric W. Biederman
      Now that sys_sysctl is a compatibility wrapper around /proc/sys,
      all sysctl strategy routines, and all ctl_name and strategy
      entries in the sysctl tables are unused, and can be
      removed.
      
      In addition neigh_sysctl_register has been modified to no longer
      take a strategy argument and its callers have been modified not
      to pass one.
      
      Cc: "David Miller" <davem@davemloft.net>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: netdev@vger.kernel.org
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
  18. 24 Sep 2009 · 1 commit
  19. 03 Aug 2009 · 1 commit
  20. 31 Jul 2009 · 1 commit
  21. 13 Jul 2009 · 1 commit
    • genetlink: make netns aware · 134e6375
      Committed by Johannes Berg
      This makes generic netlink network namespace aware. No
      generic netlink families except for the controller family
      are made namespace aware, they need to be checked one by
      one and then set the family->netnsok member to true.
      
      A new function genlmsg_multicast_netns() is introduced to
      allow sending a multicast message in a given namespace,
      for example when it applies to an object that lives in
      that namespace, a new function genlmsg_multicast_allns()
      to send a message to all network namespaces (for objects
      that do not have an associated netns).
      
      The function genlmsg_multicast() is changed to multicast
      the message in just init_net, which is currently correct
      for all generic netlink families since they only work in
      init_net right now. Some will later want to work in all
      net namespaces because they do not care about the netns
      at all -- those will have to be converted to use one of
      the new functions genlmsg_multicast_allns() or
      genlmsg_multicast_netns() whenever they are made netns
      aware in some way.
      
      After this patch families can easily decide whether or
      not they should be available in all net namespaces. Many
      genl families use it for objects not related to networking
      and should therefore be available in all namespaces, but
      that will have to be done on a per family basis.
      
      Note that this doesn't touch on the checkpoint/restart
      problem where network namespaces could be used, genl
      families and multicast groups are numbered globally and
      I see no easy way of changing that, especially since it
      must be possible to multicast to all network namespaces
      for those families that do not care about netns.
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 22 May 2009 · 1 commit
  23. 04 Nov 2008 · 2 commits
  24. 31 Oct 2008 · 1 commit
  25. 30 Oct 2008 · 1 commit
  26. 29 Oct 2008 · 1 commit
  27. 07 Oct 2008 · 1 commit
  28. 22 Sep 2008 · 1 commit
  29. 17 Sep 2008 · 2 commits
  30. 09 Sep 2008 · 1 commit
  31. 08 Sep 2008 · 2 commits
  32. 05 Sep 2008 · 2 commits