1. 13 1月, 2011 14 次提交
  2. 25 11月, 2010 4 次提交
  3. 16 11月, 2010 2 次提交
  4. 21 10月, 2010 5 次提交
    • J
      ipvs: provide address family for debugging · 0d79641a
      Julian Anastasov 提交于
       	As skb->protocol is not valid in LOCAL_OUT add
      parameter for address family in packet debugging functions.
      Even if ports are not present in AH and ESP change them to
      use ip_vs_tcpudp_debug_packet to show at least valid addresses
      as before. This patch removes the last user of skb->protocol
      in IPVS.
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NSimon Horman <horms@verge.net.au>
      0d79641a
    • J
      ipvs: changes for local real server · fc604767
      Julian Anastasov 提交于
       	This patch deals with local real servers:
      
      - Add support for DNAT to local address (different real server port).
      It needs ip_vs_out hook in LOCAL_OUT for both families because
      skb->protocol is not set for locally generated packets and can not
      be used to set 'af'.
      
      - Skip packets in ip_vs_in marked with skb->ipvs_property because
      ip_vs_out processing can be executed in LOCAL_OUT but we still
      have the conn_out_get check in ip_vs_in.
      
      - Ignore packets with inet->nodefrag from local stack
      
      - Require skb_dst(skb) != NULL because we use it to get struct net
      
      - Add support for changing the route to local IPv4 stack after DNAT
      depending on the source address type. Local client sets output
      route and the remote client sets input route. It looks like
      IPv6 does not need such rerouting because the replies use
      addresses from initial incoming header, not from skb route.
      
      - All transmitters now have strict checks for the destination
      address type: redirect from non-local address to local real
      server requires NAT method, local address can not be used as
      source address when talking to remote real server.
      
      - Now LOCALNODE is not set explicitly as forwarding
      method in real server to allow the connections to provide
      correct forwarding method to the backup server. Not sure if
      this breaks tools that expect to see 'Local' real server type.
      If needed, this can be supported with new flag IP_VS_DEST_F_LOCAL.
      Now it should be possible connections in backup that lost
      their fwmark information during sync to be forwarded properly
      to their daddr, even if it is local address in the backup server.
      By this way backup could be used as real server for DR or TUN,
      for NAT there are some restrictions because tuple collisions
      in conntracks can create problems for the traffic.
      
      - Call ip_vs_dst_reset when destination is updated in case
      some real server IP type is changed between local and remote.
      
      [ horms@verge.net.au: removed trailing whitespace ]
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NSimon Horman <horms@verge.net.au>
      fc604767
    • J
      ipvs: do not schedule conns from real servers · 190ecd27
      Julian Anastasov 提交于
       	This patch is needed to avoid scheduling of
      packets from local real server when we add ip_vs_in
      in LOCAL_OUT hook to support local client.
      
       	Currently, when ip_vs_in can not find existing
      connection it tries to create new one by calling ip_vs_schedule.
      
       	The default indication from ip_vs_schedule was if
      connection was scheduled to real server. If real server is
      not available we try to use the bypass forwarding method
      or to send ICMP error. But in some cases we do not want to use
      the bypass feature. So, add flag 'ignored' to indicate if
      the scheduler ignores this packet.
      
       	Make sure we do not create new connections from replies.
      We can hit this problem for persistent services and local real
      server when ip_vs_in is added to LOCAL_OUT hook to handle
      local clients.
      
       	Also, make sure ip_vs_schedule ignores SYN packets
      for Active FTP DATA from local real server. The FTP DATA
      connection should be created on SYN+ACK from client to assign
      correct connection daddr.
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NSimon Horman <horms@verge.net.au>
      190ecd27
    • J
      ipvs: switch to notrack mode · cf356d69
      Julian Anastasov 提交于
       	Change skb->ipvs_property semantic. This is preparation
      to support ip_vs_out processing in LOCAL_OUT. ipvs_property=1
      will be used to avoid expensive lookups for traffic sent by
      transmitters. Now when conntrack support is not used we call
      ip_vs_notrack method to avoid problems in OUTPUT and
      POST_ROUTING hooks instead of exiting POST_ROUTING as before.
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NSimon Horman <horms@verge.net.au>
      cf356d69
    • J
      ipvs: optimize checksums for apps · 8b27b10f
      Julian Anastasov 提交于
       	Avoid full checksum calculation for apps that can provide
      info whether csum was broken after payload mangling. For now only
      ip_vs_ftp mangles payload and it updates the csum, so the full
      recalculation is avoided for all packets.
      
       	Add CHECKSUM_UNNECESSARY for snat_handler (TCP and UDP).
      It is needed to support SNAT from local address for the case
      when csum is fully recalculated.
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NSimon Horman <horms@verge.net.au>
      8b27b10f
  5. 19 10月, 2010 1 次提交
    • H
      ipvs: IPv6 tunnel mode · 714f095f
      Hans Schillstrom 提交于
      IPv6 encapsulation uses a bad source address for the tunnel.
      i.e. VIP will be used as local-addr and encap. dst addr.
      Decapsulation will not accept this.
      
      Example
      LVS (eth1 2003::2:0:1/96, VIP 2003::2:0:100)
         (eth0 2003::1:0:1/96)
      RS  (ethX 2003::1:0:5/96)
      
      tcpdump
      2003::2:0:100 > 2003::1:0:5: IP6 (hlim 63, next-header TCP (6) payload length: 40)  2003::3:0:10.50991 > 2003::2:0:100.http: Flags [S], cksum 0x7312 (correct), seq 3006460279, win 5760, options [mss 1440,sackOK,TS val 1904932 ecr 0,nop,wscale 3], length 0
      
      In Linux IPv6 impl. you can't have a tunnel with an any cast address
      receiving packets (I have not tried to interpret RFC 2473)
      To have receive capabilities the tunnel must have:
       - Local address set as multicast addr or an unicast addr
       - Remote address set as an unicast addr.
       - Loop back addres or Link local address are not allowed.
      
      This causes us to setup a tunnel in the Real Server with the
      LVS as the remote address, here you can't use the VIP address since it's
      used inside the tunnel.
      
      Solution
      Use outgoing interface IPv6 address (match against the destination).
      i.e. use ip6_route_output() to look up the route cache and
      then use ipv6_dev_get_saddr(...) to set the source address of the
      encapsulated packet.
      
      Additionally, cache the results in new destination
      fields: dst_cookie and dst_saddr and properly check the
      returned dst from ip6_route_output. We now add xfrm_lookup
      call only for the tunneling method where the source address
      is a local one.
      Signed-off-by: NHans Schillstrom <hans.schillstrom@ericsson.com>
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      714f095f
  6. 04 10月, 2010 5 次提交
  7. 21 9月, 2010 2 次提交
    • J
      ipvs: make rerouting optional with snat_reroute · 8a803040
      Julian Anastasov 提交于
      	Add new sysctl flag "snat_reroute". Recent kernels use
      ip_route_me_harder() to route LVS-NAT responses properly by
      VIP when there are multiple paths to client. But setups
      that do not have alternative default routes can skip this
      routing lookup by using snat_reroute=0.
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      8a803040
    • J
      ipvs: netfilter connection tracking changes · f4bc17cd
      Julian Anastasov 提交于
      	Add more code to IPVS to work with Netfilter connection
      tracking and fix some problems.
      
      - Allow IPVS to be compiled without connection tracking as in
      2.6.35 and before. This can avoid keeping conntracks for all
      IPVS connections because this costs memory. ip_vs_ftp still
      depends on connection tracking and NAT as implemented for 2.6.36.
      
      - Add sysctl var "conntrack" to enable connection tracking for
      all IPVS connections. For loaded IPVS directors it needs
      tuning of nf_conntrack_max limit.
      
      - Add IP_VS_CONN_F_NFCT connection flag to request the connection
      to use connection tracking. This allows user space to provide this
      flag, for example, in dest->conn_flags. This can be useful to
      request connection tracking per real server instead of forcing it
      for all connections with the "conntrack" sysctl. This flag is
      set currently only by ip_vs_ftp and of course by "conntrack" sysctl.
      
      - Add ip_vs_nfct.c file to hold all connection tracking code,
      by this way main code should not depend of netfilter conntrack
      support.
      
      - Return back the ip_vs_post_routing handler as in 2.6.35 and use
      skb->ipvs_property=1 to allow IPVS to work without connection
      tracking
      
      Connection tracking:
      
      - most of the code is already in 2.6.36-rc
      
      - alter conntrack reply tuple for LVS-NAT connections when first packet
      from client is forwarded and conntrack state is NEW or RELATED.
      Additionally, alter reply for RELATED connections from real server,
      again for packet in original direction.
      
      - add IP_VS_XMIT_TUNNEL to confirm conntrack (without altering
      reply) for LVS-TUN early because we want to call nf_reset. It is
      needed because we add IPIP header and the original conntrack
      should be preserved, not destroyed. The transmitted IPIP packets
      can reuse same conntrack, so we do not set skb->ipvs_property.
      
      - try to destroy conntrack when the IPVS connection is destroyed.
      It is not fatal if conntrack disappears before that, it depends
      on the used timers.
      
      Fix problems from long time:
      
      - add skb->ip_summed = CHECKSUM_NONE for the LVS-TUN transmitters
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      f4bc17cd
  8. 17 9月, 2010 1 次提交
  9. 09 9月, 2010 1 次提交
    • J
      ipvs: fix active FTP · 6523ce15
      Julian Anastasov 提交于
      - Do not create expectation when forwarding the PORT
        command to avoid blocking the connection. The problem is that
        nf_conntrack_ftp.c:help() tries to create the same expectation later in
        POST_ROUTING and drops the packet with "dropping packet" message after
        failure in nf_ct_expect_related.
      
      - Change ip_vs_update_conntrack to alter the conntrack
        for related connections from real server. If we do not alter the reply in
        this direction the next packet from client sent to vport 20 comes as NEW
        connection. We alter it but may be some collision happens for both
        conntracks and the second conntrack gets destroyed immediately. The
        connection stucks too.
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NSimon Horman <horms@verge.net.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6523ce15
  10. 02 8月, 2010 1 次提交
  11. 23 7月, 2010 1 次提交
    • H
      IPVS: make FTP work with full NAT support · 7f1c4075
      Hannes Eder 提交于
      Use nf_conntrack/nf_nat code to do the packet mangling and the TCP
      sequence adjusting.  The function 'ip_vs_skb_replace' is now dead
      code, so it is removed.
      
      To SNAT FTP, use something like:
      
      % iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 \
          --vport 21 -j SNAT --to-source 192.168.10.10
      and for the data connections in passive mode:
      
      % iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 \
          --vportctl 21 -j SNAT --to-source 192.168.10.10
      using '-m state --state RELATED' would also works.
      
      Make sure the kernel modules ip_vs_ftp, nf_conntrack_ftp, and
      nf_nat_ftp are loaded.
      
      [ up-port and minor fixes by Simon Horman <horms@verge.net.au> ]
      Signed-off-by: NHannes Eder <heder@google.com>
      Signed-off-by: NSimon Horman <horms@verge.net.au>
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      7f1c4075
  12. 18 2月, 2010 1 次提交
  13. 05 1月, 2010 1 次提交
    • C
      IPVS: Allow boot time change of hash size · 6f7edb48
      Catalin(ux) M. BOIE 提交于
      I was very frustrated about the fact that I have to recompile the kernel
      to change the hash size. So, I created this patch.
      
      If IPVS is built-in you can append ip_vs.conn_tab_bits=?? to kernel
      command line, or, if you built IPVS as modules, you can add
      options ip_vs conn_tab_bits=??.
      
      To keep everything backward compatible, you still can select the size at
      compile time, and that will be used as default.
      
      It has been about a year since this patch was originally posted
      and subsequently dropped on the basis of insufficient test data.
      
      Mark Bergsma has provided the following test results which seem
      to strongly support the need for larger hash table sizes:
      
      We do however run into the same problem with the default setting (212 =
      4096 entries), as most of our LVS balancers handle around a million
      connections/SLAB entries at any point in time (around 100-150 kpps
      load). With only 4096 hash table entries this implies that each entry
      consists of a linked list of 256 connections *on average*.
      
      To provide some statistics, I did an oprofile run on an 2.6.31 kernel,
      with both the default 4096 table size, and the same kernel recompiled
      with IP_VS_CONN_TAB_BITS set to 18 (218 = 262144 entries). I built a
      quick test setup with a part of Wikimedia/Wikipedia's live traffic
      mirrored by the switch to the test host.
      
      With the default setting, at ~ 120 kpps packet load we saw a typical %si
      CPU usage of around 30-35%, and oprofile reported a hot spot in
      ip_vs_conn_in_get:
      
      samples  %        image name               app name
      symbol name
      1719761  42.3741  ip_vs.ko                 ip_vs.ko      ip_vs_conn_in_get
      302577    7.4554  bnx2                     bnx2          /bnx2
      181984    4.4840  vmlinux                  vmlinux       __ticket_spin_lock
      128636    3.1695  vmlinux                  vmlinux       ip_route_input
      74345     1.8318  ip_vs.ko                 ip_vs.ko      ip_vs_conn_out_get
      68482     1.6874  vmlinux                  vmlinux       mwait_idle
      
      After loading the recompiled kernel with 218 entries, %si CPU usage
      dropped in half to around 12-18%, and oprofile looks much healthier,
      with only 7% spent in ip_vs_conn_in_get:
      
      samples  %        image name               app name
      symbol name
      265641   14.4616  bnx2                     bnx2         /bnx2
      143251    7.7986  vmlinux                  vmlinux      __ticket_spin_lock
      140661    7.6576  ip_vs.ko                 ip_vs.ko     ip_vs_conn_in_get
      94364     5.1372  vmlinux                  vmlinux      mwait_idle
      86267     4.6964  vmlinux                  vmlinux      ip_route_input
      
      [ horms@verge.net.au: trivial up-port and minor style fixes ]
      Signed-off-by: NCatalin(ux) M. BOIE <catab@embedromix.ro>
      Cc: Mark Bergsma <mark@wikimedia.org>
      Signed-off-by: NSimon Horman <horms@verge.net.au>
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      6f7edb48
  14. 04 11月, 2009 1 次提交