1. 19 Mar, 2013 1 commit
    • ipvs: add backup_only flag to avoid loops · 0c12582f
      Committed by Julian Anastasov
      Dmitry Akindinov reported a problem where SYNs loop
      between the master and backup server when the backup server is used as
      real server in DR mode and has IPVS rules to function as director.
      
      Even when the backup function is enabled we continue to forward
      traffic and schedule new connections when the current master is using
      the backup server as real server. While this is not a problem for NAT,
      for DR and TUN method the backup server can not determine if a request
      comes from client or from director.
      
      To avoid such loops add new sysctl flag backup_only. It can be needed
      for DR/TUN setups that do not need backup and director function at the
      same time. When the backup function is enabled we stop any forwarding
      and pass the traffic to the local stack (real server mode). The flag
      disables the director function when the backup function is enabled.
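
The effect of the flag can be sketched as a small decision function. This is an illustrative Python model of the behavior described above, not the kernel code; the function name `acts_as_director` and its boolean parameters are hypothetical.

```python
def acts_as_director(backup_running: bool, backup_only: bool) -> bool:
    """Return True if this node should schedule and forward traffic.

    Hypothetical model of the commit text: with backup_only set, the
    director function is disabled while the backup function is enabled,
    and traffic is passed to the local stack instead.
    """
    return not (backup_running and backup_only)


if __name__ == "__main__":
    for backup_running in (False, True):
        for backup_only in (False, True):
            role = ("director" if acts_as_director(backup_running, backup_only)
                    else "real server only")
            print(f"backup={backup_running} backup_only={backup_only}: {role}")
```

Only the combination of a running backup function and `backup_only` set turns the director function off; every other combination keeps the previous behavior.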
      
      Setups that enable the backup function for some virtual services and
      the director function for other virtual services will need another,
      more complex solution to support DR/TUN mode, maybe assigning a
      per-virtual-service syncid value so that we can differentiate the
      requests.
      Reported-by: Dmitry Akindinov <dimak@stalker.com>
      Tested-by: German Myzovsky <lawyer@sipnet.ru>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
  2. 23 Oct, 2012 1 commit
  3. 28 Sep, 2012 5 commits
  4. 10 Aug, 2012 2 commits
  5. 17 Jul, 2012 1 commit
    • ipvs: fix oops on NAT reply in br_nf context · 9e33ce45
      Committed by Lin Ming
      IPVS should not reset skb->nf_bridge in FORWARD hook
      by calling nf_reset for NAT replies. It triggers oops in
      br_nf_forward_finish.
      
      [  579.781508] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
      [  579.781669] IP: [<ffffffff817b1ca5>] br_nf_forward_finish+0x58/0x112
      [  579.781792] PGD 218f9067 PUD 0
      [  579.781865] Oops: 0000 [#1] SMP
      [  579.781945] CPU 0
      [  579.781983] Modules linked in:
      [  579.782047]
      [  579.782080]
      [  579.782114] Pid: 4644, comm: qemu Tainted: G        W    3.5.0-rc5-00006-g95e69f9 #282 Hewlett-Packard  /30E8
      [  579.782300] RIP: 0010:[<ffffffff817b1ca5>]  [<ffffffff817b1ca5>] br_nf_forward_finish+0x58/0x112
      [  579.782455] RSP: 0018:ffff88007b003a98  EFLAGS: 00010287
      [  579.782541] RAX: 0000000000000008 RBX: ffff8800762ead00 RCX: 000000000001670a
      [  579.782653] RDX: 0000000000000000 RSI: 000000000000000a RDI: ffff8800762ead00
      [  579.782845] RBP: ffff88007b003ac8 R08: 0000000000016630 R09: ffff88007b003a90
      [  579.782957] R10: ffff88007b0038e8 R11: ffff88002da37540 R12: ffff88002da01a02
      [  579.783066] R13: ffff88002da01a80 R14: ffff88002d83c000 R15: ffff88002d82a000
      [  579.783177] FS:  0000000000000000(0000) GS:ffff88007b000000(0063) knlGS:00000000f62d1b70
      [  579.783306] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
      [  579.783395] CR2: 0000000000000004 CR3: 00000000218fe000 CR4: 00000000000027f0
      [  579.783505] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  579.783684] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  579.783795] Process qemu (pid: 4644, threadinfo ffff880021b20000, task ffff880021aba760)
      [  579.783919] Stack:
      [  579.783959]  ffff88007693cedc ffff8800762ead00 ffff88002da01a02 ffff8800762ead00
      [  579.784110]  ffff88002da01a02 ffff88002da01a80 ffff88007b003b18 ffffffff817b26c7
      [  579.784260]  ffff880080000000 ffffffff81ef59f0 ffff8800762ead00 ffffffff81ef58b0
      [  579.784477] Call Trace:
      [  579.784523]  <IRQ>
      [  579.784562]
      [  579.784603]  [<ffffffff817b26c7>] br_nf_forward_ip+0x275/0x2c8
      [  579.784707]  [<ffffffff81704b58>] nf_iterate+0x47/0x7d
      [  579.784797]  [<ffffffff817ac32e>] ? br_dev_queue_push_xmit+0xae/0xae
      [  579.784906]  [<ffffffff81704bfb>] nf_hook_slow+0x6d/0x102
      [  579.784995]  [<ffffffff817ac32e>] ? br_dev_queue_push_xmit+0xae/0xae
      [  579.785175]  [<ffffffff8187fa95>] ? _raw_write_unlock_bh+0x19/0x1b
      [  579.785179]  [<ffffffff817ac417>] __br_forward+0x97/0xa2
      [  579.785179]  [<ffffffff817ad366>] br_handle_frame_finish+0x1a6/0x257
      [  579.785179]  [<ffffffff817b2386>] br_nf_pre_routing_finish+0x26d/0x2cb
      [  579.785179]  [<ffffffff817b2cf0>] br_nf_pre_routing+0x55d/0x5c1
      [  579.785179]  [<ffffffff81704b58>] nf_iterate+0x47/0x7d
      [  579.785179]  [<ffffffff817ad1c0>] ? br_handle_local_finish+0x44/0x44
      [  579.785179]  [<ffffffff81704bfb>] nf_hook_slow+0x6d/0x102
      [  579.785179]  [<ffffffff817ad1c0>] ? br_handle_local_finish+0x44/0x44
      [  579.785179]  [<ffffffff81551525>] ? sky2_poll+0xb35/0xb54
      [  579.785179]  [<ffffffff817ad62a>] br_handle_frame+0x213/0x229
      [  579.785179]  [<ffffffff817ad417>] ? br_handle_frame_finish+0x257/0x257
      [  579.785179]  [<ffffffff816e3b47>] __netif_receive_skb+0x2b4/0x3f1
      [  579.785179]  [<ffffffff816e69fc>] process_backlog+0x99/0x1e2
      [  579.785179]  [<ffffffff816e6800>] net_rx_action+0xdf/0x242
      [  579.785179]  [<ffffffff8107e8a8>] __do_softirq+0xc1/0x1e0
      [  579.785179]  [<ffffffff8135a5ba>] ? trace_hardirqs_off_thunk+0x3a/0x6c
      [  579.785179]  [<ffffffff8188812c>] call_softirq+0x1c/0x30
      
      Steps to reproduce:
      
      1. On Host1, set up bridge br0 (192.168.1.106)
      2. Boot a KVM guest (192.168.1.105) on Host1 and start httpd
      3. Start IPVS service on Host1
         ipvsadm -A -t 192.168.1.106:80 -s rr
         ipvsadm -a -t 192.168.1.106:80 -r 192.168.1.105:80 -m
      4. Run apache benchmark on Host2(192.168.1.101)
         ab -n 1000 http://192.168.1.106/
      
      ip_vs_reply4
        ip_vs_out
          handle_response
            ip_vs_notrack
              nf_reset()
              {
                skb->nf_bridge = NULL;
              }
      
      In this case IPVS only wants to replace nfct with the
      untracked version. So replace the nf_reset(skb) call
      in ip_vs_notrack() with a nf_conntrack_put(skb->nfct) call.
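
The difference between the two calls can be illustrated with a toy model. The sketch below is Python, not kernel code: the `Skb` class and both helper functions are hypothetical stand-ins, and the only behavior taken from the text above is that nf_reset() clears skb->nf_bridge while the fix touches nfct alone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Skb:
    """Toy packet buffer holding only the two fields discussed above."""
    nfct: Optional[str]
    nf_bridge: Optional[str]


def notrack_with_nf_reset(skb: Skb) -> None:
    # Models the old behavior: the reset drops conntrack AND bridge state.
    skb.nfct = None
    skb.nf_bridge = None
    skb.nfct = "untracked"


def notrack_with_conntrack_put(skb: Skb) -> None:
    # Models the fix: release only the conntrack reference, then attach
    # the untracked entry. Bridge state is left alone.
    skb.nfct = None
    skb.nfct = "untracked"


if __name__ == "__main__":
    old = Skb(nfct="conn", nf_bridge="bridge-info")
    new = Skb(nfct="conn", nf_bridge="bridge-info")
    notrack_with_nf_reset(old)
    notrack_with_conntrack_put(new)
    print("old:", old)
    print("new:", new)
```

In the model, only the second variant leaves `nf_bridge` available to later bridge netfilter code, which is the state br_nf_forward_finish needed in the oops above.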
      Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  6. 09 May, 2012 3 commits
    • ipvs: add support for sync threads · f73181c8
      Committed by Pablo Neira Ayuso
      	Allow master and backup servers to use many threads
      for sync traffic. Add sysctl var "sync_ports" to define the
      number of threads. Every thread uses a single UDP port:
      thread 0 uses the default port 8848, while the last thread
      uses port 8848+sync_ports-1.
      
      	The sync traffic for connections is scheduled to many
      master threads based on the cp address but one connection is
      always assigned to same thread to avoid reordering of the
      sync messages.
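
The port layout and the "same connection, same thread" rule can be sketched as follows. This is an illustrative Python model; `sync_port`, `select_thread`, and the integer `conn_key` are hypothetical, and the plain modulo stands in for whatever mapping the kernel really applies to the cp address.

```python
IP_VS_SYNC_BASE_PORT = 8848  # default sync port named in the commit message


def sync_port(thread_id: int, sync_ports: int) -> int:
    """UDP port used by a sync thread: thread 0 gets 8848, the last
    thread gets 8848 + sync_ports - 1."""
    if not 0 <= thread_id < sync_ports:
        raise ValueError("thread_id must be in range(sync_ports)")
    return IP_VS_SYNC_BASE_PORT + thread_id


def select_thread(conn_key: int, sync_ports: int) -> int:
    """Map a connection to one master thread.

    Assumption: conn_key is any stable per-connection integer. Because
    the mapping is deterministic, one connection always lands on the
    same thread, so its sync messages are not reordered.
    """
    return conn_key % sync_ports


if __name__ == "__main__":
    threads = 4
    print([sync_port(i, threads) for i in range(threads)])
    print(select_thread(0x7F3A10, threads))
```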
      
      	Remove ip_vs_sync_switch_mode because this check
      for sync mode change is still risky. Instead, check for mode
      change under sync_buff_lock.
      
      	Make sure the backup socks do not block on reading.
      
      Special thanks to Aleksey Chudov for helping in all tests.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • ipvs: reduce sync rate with time thresholds · 749c42b6
      Committed by Julian Anastasov
      	Add two new sysctl vars to control the sync rate with the
      main idea to reduce the rate for connection templates because
      currently it depends on the packet rate for controlled connections.
      This mechanism should be useful also for normal connections
      with high traffic.
      
      sync_refresh_period: in seconds, difference in reported connection
      	timer that triggers new sync message. It can be used to
      	avoid sync messages for the specified period (or half of
      	the connection timeout if it is lower) if connection state
      	is not changed from last sync.
      
      sync_retries: integer, 0..3, defines sync retries with period of
      	sync_refresh_period/8. Useful to protect against loss of
      	sync messages.
      
      	Allow sysctl_sync_threshold to be used with
      sysctl_sync_period=0, so that only a single sync message is sent
      if sync_refresh_period is also 0.
      
      	Add new field "sync_endtime" in connection structure to
      hold the reported time when connection expires. The 2 lowest
      bits will represent the retry count.
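
Storing the retry count in the two lowest bits of the end time can be sketched like this. It is a minimal Python illustration, not the kernel implementation; the helper names are hypothetical, and the only facts taken from the text are the two-bit retry count (0..3) and the retry period of sync_refresh_period/8.

```python
from typing import Tuple

RETRY_MASK = 0x3  # two lowest bits hold the retry count (0..3)


def pack_sync_endtime(expires: int, retries: int) -> int:
    """Combine an expiry time and a retry count into one integer.

    The two lowest bits of the time are overwritten, so the stored
    time loses that much precision.
    """
    if not 0 <= retries <= RETRY_MASK:
        raise ValueError("retries must be between 0 and 3")
    return (expires & ~RETRY_MASK) | retries


def unpack_sync_endtime(value: int) -> Tuple[int, int]:
    """Split a packed value back into (expiry time, retry count)."""
    return value & ~RETRY_MASK, value & RETRY_MASK


def retry_period(sync_refresh_period: float) -> float:
    """Retries are spaced sync_refresh_period/8 apart."""
    return sync_refresh_period / 8


if __name__ == "__main__":
    packed = pack_sync_endtime(1003, 2)
    print(packed, unpack_sync_endtime(packed), retry_period(8))
```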
      
      	As the sysctl_sync_period now can be 0 use ACCESS_ONCE to
      avoid division by zero.
      
      	Special thanks to Aleksey Chudov for being patient with me,
      for his extensive reports and helping in all tests.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • ipvs: wakeup master thread · 1c003b15
      Committed by Pablo Neira Ayuso
      	A high rate of sync messages in the master can overflow
      the socket buffer and cause messages to be dropped.
      A fixed sleep of 1 second without wakeup events is not suitable
      for loaded masters.
      
      	Use delayed_work to schedule sending of queued messages
      and limit the delay to IPVS_SYNC_SEND_DELAY (20ms). This will
      reduce the rate of wakeups, but to avoid sending long bursts we
      wake up the master thread after IPVS_SYNC_WAKEUP_RATE (8) messages.
      
      	Add hard limit for the queued messages before sending
      by using "sync_qlen_max" sysctl var. It defaults to 1/32 of
      the memory pages but actually represents number of messages.
      It will protect us from allocating large parts of memory
      when the sending rate is lower than the queuing rate.
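
The queue limit and the wakeup rate described above can be sketched together. This is a simplified Python model under stated assumptions: the `SyncQueue` class is hypothetical, a wakeup is counted whenever the queue length reaches a multiple of IPVS_SYNC_WAKEUP_RATE, and the delayed-work timer is left out.

```python
from collections import deque

IPVS_SYNC_WAKEUP_RATE = 8  # value given in the commit message


class SyncQueue:
    """Bounded queue of sync messages with rate-limited wakeups."""

    def __init__(self, qlen_max: int) -> None:
        self.qlen_max = qlen_max  # plays the role of sync_qlen_max
        self.queue = deque()
        self.dropped = 0
        self.wakeups = 0

    def enqueue(self, msg: bytes) -> bool:
        """Queue a message; return False if the hard limit was hit."""
        if len(self.queue) >= self.qlen_max:
            self.dropped += 1  # limit reached: do not allocate more
            return False
        self.queue.append(msg)
        # Assumption: wake the sender once per IPVS_SYNC_WAKEUP_RATE
        # queued messages instead of once per message.
        if len(self.queue) % IPVS_SYNC_WAKEUP_RATE == 0:
            self.wakeups += 1
        return True


if __name__ == "__main__":
    q = SyncQueue(qlen_max=10)
    for i in range(12):
        q.enqueue(b"sync-%d" % i)
    print(len(q.queue), q.dropped, q.wakeups)
```

The limit counts messages rather than bytes, which matches the note above that sync_qlen_max "actually represents number of messages".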
      
      	As suggested by Pablo, add new sysctl var
      "sync_sock_size" to configure the SNDBUF (master) or
      RCVBUF (slave) socket limit. Default value is 0 (preserve
      system defaults).
      
      	Change the master thread to detect and block on
      SNDBUF overflow, so that we do not drop messages when
      the socket limit is low but the sync_qlen_max limit is
      not reached. On ENOBUFS or other errors just drop the
      messages.
      
      	Change master thread to enter TASK_INTERRUPTIBLE
      state early, so that we do not miss wakeups due to messages or
      kthread_should_stop event.
      
      Thanks to Pablo Neira Ayuso for his valuable feedback!
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
  7. 30 Apr, 2012 2 commits
  8. 21 Apr, 2012 1 commit
  9. 16 Apr, 2012 1 commit
  10. 05 Mar, 2012 1 commit
    • BUG: headers with BUG/BUG_ON etc. need linux/bug.h · 187f1882
      Committed by Paul Gortmaker
      If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any
      other BUG variant in a static inline (i.e. not in a #define) then
      that header really should be including <linux/bug.h> and not just
      expecting it to be implicitly present.
      
      We can make this change risk-free, since if the files using these
      headers didn't have exposure to linux/bug.h already, they would have
      been causing compile failures/warnings.
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
  11. 31 Dec, 2011 1 commit
  12. 23 Nov, 2011 1 commit
  13. 01 Nov, 2011 4 commits
  14. 13 Oct, 2011 1 commit
  15. 27 Jul, 2011 1 commit
  16. 14 Jun, 2011 1 commit
  17. 13 Jun, 2011 2 commits
  18. 27 May, 2011 1 commit
  19. 13 May, 2011 1 commit
    • ipvs: Remove all remaining references to rt->rt_{src,dst} · c92f5ca2
      Committed by Julian Anastasov
      Remove all remaining references to rt->rt_{src,dst}
      by using dest->dst_saddr to cache saddr (used for TUN mode).
      For ICMP in FORWARD hook just restrict the rt_mode for NAT
      to disable LOCALNODE. All other modes do not allow
      IP_VS_RT_MODE_RDR, so we should be safe with the ICMP
      forwarding. Using cp->daddr as replacement for rt_dst
      is safe for all modes except BYPASS, even when cp->dest is
      NULL because it is cp->daddr that is used to assign cp->dest
      for sync-ed connections.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 10 May, 2011 1 commit
    • IPVS: init and cleanup restructuring · 7a4f0761
      Committed by Hans Schillstrom
      DESCRIPTION
      This patch tries to restore the initial init and cleanup
      sequences that existed before the namespace patch.
      Netns also requires action when net devices unregister,
      which has never been implemented. I.e., this patch also
      covers the case where a device moves into a network namespace
      and has to be released.
      
      IMPLEMENTATION
      The number of calls to register_pernet_device has been
      reduced to one for ip_vs.ko.
      Schedulers still have their own calls.
      
      This patch adds a function __ip_vs_service_cleanup()
      and an enable flag for the netfilter hooks.
      
      The nf hooks will be enabled when the first service is loaded
      and never disabled again, except when a namespace exit starts.
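
The lifetime of the enable flag can be sketched as a tiny state model. This is illustrative Python, not the kernel code; the `NetnsIpvs` class and its method names are hypothetical, and only the rule itself (enabled on first service, cleared only when namespace exit starts) comes from the text above.

```python
class NetnsIpvs:
    """Toy per-namespace state with an enable flag for the hooks."""

    def __init__(self) -> None:
        self.enable = False
        self.services = []

    def add_service(self, name: str) -> None:
        self.services.append(name)
        self.enable = True  # first service switches the hooks on

    def del_service(self, name: str) -> None:
        self.services.remove(name)
        # The flag is deliberately left set: hooks are never disabled
        # again while the namespace lives.

    def hook(self, packet: str) -> str:
        if not self.enable:
            return "pass"  # hooks do nothing until enabled
        return "handled"

    def namespace_exit(self) -> None:
        self.enable = False  # disable hooks before cleanup starts
        self.services.clear()


if __name__ == "__main__":
    ns = NetnsIpvs()
    print(ns.hook("pkt"))
    ns.add_service("vip:80")
    ns.del_service("vip:80")
    print(ns.hook("pkt"))
    ns.namespace_exit()
    print(ns.hook("pkt"))
```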
      Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      [horms@verge.net.au: minor edit to changelog]
      Signed-off-by: Simon Horman <horms@verge.net.au>
  21. 09 May, 2011 1 commit
    • IPVS: init and cleanup restructuring · 74973f6f
      Committed by Hans Schillstrom
      DESCRIPTION
      This patch tries to restore the initial init and cleanup
      sequences that existed before the namespace patch.
      Netns also requires action when net devices unregister,
      which has never been implemented. I.e., this patch also
      covers the case where a device moves into a network namespace
      and has to be released.
      
      IMPLEMENTATION
      The number of calls to register_pernet_device has been
      reduced to one for ip_vs.ko.
      Schedulers still have their own calls.
      
      This patch adds a function __ip_vs_service_cleanup()
      and an enable flag for the netfilter hooks.
      
      The nf hooks will be enabled when the first service is loaded
      and never disabled again, except when a namespace exit starts.
      Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      [horms@verge.net.au: minor edit to changelog]
      Signed-off-by: Simon Horman <horms@verge.net.au>
  22. 25 Apr, 2011 1 commit
  23. 04 Apr, 2011 1 commit
  24. 31 Mar, 2011 1 commit
  25. 22 Mar, 2011 1 commit
    • IPVS: Use global mutex in ip_vs_app.c · 736561a0
      Committed by Simon Horman
      As part of the work to make IPVS network namespace aware
      __ip_vs_app_mutex was replaced by a per-namespace lock,
      ipvs->app_mutex. ipvs->app_key is also supplied for debugging purposes.
      
      Unfortunately this implementation results in ipvs->app_key residing
      in non-static storage which at the very least causes a lockdep warning.
      
      This patch takes the rather heavy-handed approach of reinstating
      __ip_vs_app_mutex which will cover access to the ipvs->list_head
      of all network namespaces.
      
      [   12.610000] IPVS: Creating netns size=2456 id=0
      [   12.630000] IPVS: Registered protocols (TCP, UDP, SCTP, AH, ESP)
      [   12.640000] BUG: key ffff880003bbf1a0 not in .data!
      [   12.640000] ------------[ cut here ]------------
      [   12.640000] WARNING: at kernel/lockdep.c:2701 lockdep_init_map+0x37b/0x570()
      [   12.640000] Hardware name: Bochs
      [   12.640000] Pid: 1, comm: swapper Tainted: G        W 2.6.38-kexec-06330-g69b7efe-dirty #122
      [   12.650000] Call Trace:
      [   12.650000]  [<ffffffff8102e685>] warn_slowpath_common+0x75/0xb0
      [   12.650000]  [<ffffffff8102e6d5>] warn_slowpath_null+0x15/0x20
      [   12.650000]  [<ffffffff8105967b>] lockdep_init_map+0x37b/0x570
      [   12.650000]  [<ffffffff8105829d>] ? trace_hardirqs_on+0xd/0x10
      [   12.650000]  [<ffffffff81055ad8>] debug_mutex_init+0x38/0x50
      [   12.650000]  [<ffffffff8104bc4c>] __mutex_init+0x5c/0x70
      [   12.650000]  [<ffffffff81685ee7>] __ip_vs_app_init+0x64/0x86
      [   12.660000]  [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff
      [   12.660000]  [<ffffffff811b1c33>] T.620+0x43/0x170
      [   12.660000]  [<ffffffff811b1e9a>] ? register_pernet_subsys+0x1a/0x40
      [   12.660000]  [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff
      [   12.660000]  [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff
      [   12.660000]  [<ffffffff811b1db7>] register_pernet_operations+0x57/0xb0
      [   12.660000]  [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff
      [   12.670000]  [<ffffffff811b1ea9>] register_pernet_subsys+0x29/0x40
      [   12.670000]  [<ffffffff81685f19>] ip_vs_app_init+0x10/0x12
      [   12.670000]  [<ffffffff81685a87>] ip_vs_init+0x4c/0xff
      [   12.670000]  [<ffffffff8166562c>] do_one_initcall+0x7a/0x12e
      [   12.670000]  [<ffffffff8166583e>] kernel_init+0x13e/0x1c2
      [   12.670000]  [<ffffffff8128c134>] kernel_thread_helper+0x4/0x10
      [   12.670000]  [<ffffffff8128ad40>] ? restore_args+0x0/0x30
      [   12.680000]  [<ffffffff81665700>] ? kernel_init+0x0/0x1c2
      [   12.680000]  [<ffffffff8128c130>] ? kernel_thread_helper+0x0/0x10
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Cc: Hans Schillstrom <hans@schillstrom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  26. 15 Mar, 2011 3 commits