1. 23 Nov 2007, 1 commit
    • [TCP]: MTUprobe: receiver window & data available checks fixed · 91cc17c0
      Committed by Ilpo Järvinen
      It seems that the range checked by the receiver window check should
      begin from the first rather than from the last skb that is going
      to be included in the probe. That can be achieved without
      referencing skbs at all: snd_nxt and write_seq already provide the
      correct seqno. Plus, it SHOULD account for the packets that are
      necessary to trigger fast retransmit [RFC4821].
      
      The location of the snd_wnd < probe_size/size_needed check is bogus
      because it causes the other if() to match as well (due to the
      snd_nxt >= snd_una invariant).
      
      Removed a dead-obvious comment.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  2. 21 Nov 2007, 4 commits
  3. 20 Nov 2007, 7 commits
    • [NETFILTER]: Fix kernel panic with REDIRECT target. · 1f305323
      Committed by Evgeniy Polyakov
      When a connection tracking entry (nf_conn) is about to copy itself, some
      of its extension users (like NAT) may already have been freed and thus
      do not need to be copied.

      Actually, looking at this function, I suspect it was copied from
      nf_nat_setup_info(), which is how the bug was introduced.
      
      Report and testing from David <david@unsolicited.net>.
      
      [ Patrick McHardy states:
      
      	I now understand what's happening:
      
      	- new connection is allocated without helper
      	- connection is REDIRECTed to localhost
      	- nf_nat_setup_info adds NAT extension, but doesn't initialize it yet
      	- nf_conntrack_alter_reply performs a helper lookup based on the
      	   new tuple, finds the SIP helper and allocates a helper extension,
      	   causing reallocation because of too little space
      	- nf_nat_move_storage is called with the uninitialized nat extension
      
      	So your fix is entirely correct, thanks a lot :)  ]
      Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
      Acked-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
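A minimal sketch of the failure mode Patrick McHardy describes, assuming a heavily simplified model: one connection with one zero-filled NAT extension area that gets reallocated when another extension is added. All types, function names and the NAT_DONE bit are hypothetical; the point is only that the move callback must skip an extension that was added but never initialized.

```c
#include <assert.h>
#include <stdlib.h>

#define NAT_DONE 0x1u /* hypothetical "NAT has been set up" status bit */

struct conn;

/* Hypothetical NAT extension: its back-pointer is set only once NAT
 * setup has completed; before that the area is merely zero-filled. */
struct nat_ext {
	struct conn *owner;
};

/* Hypothetical connection tracking entry with one extension area. */
struct conn {
	unsigned int status;
	struct nat_ext *nat;
	int moves; /* number of extension fix-ups actually performed */
};

/* Move callback, run when the extension area is reallocated. */
static void nat_move_storage(struct conn *ct, struct nat_ext *new_ext,
			     const struct nat_ext *old_ext)
{
	assert(ct != NULL && new_ext != NULL && old_ext != NULL);
	/* The guard: an extension that was added but not yet initialized
	 * has nothing to move, so leave it alone. */
	if (old_ext->owner == NULL || !(ct->status & NAT_DONE))
		return;
	new_ext->owner = old_ext->owner;
	ct->moves++;
}

/* Grow the extension area, e.g. because a helper extension was added. */
static int conn_realloc_ext(struct conn *ct)
{
	struct nat_ext *new_ext = calloc(1, sizeof(*new_ext));

	if (new_ext == NULL)
		return -1;
	nat_move_storage(ct, new_ext, ct->nat);
	free(ct->nat);
	ct->nat = new_ext;
	return 0;
}
```

In this model, a REDIRECTed connection whose helper lookup forces a reallocation before NAT setup finishes takes the early-return path instead of dereferencing an uninitialized extension.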
    • [IPV4]: Add missing "space" · 464c4f18
      Committed by Joe Perches
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [TCP]: Problem bug with sysctl_tcp_congestion_control function · 5487796f
      Committed by Sam Jansen
      From: "Sam Jansen" <sjansen@google.com>
      
      sysctl_tcp_congestion_control seems to have a bug that prevents it
      from actually calling the tcp_set_default_congestion_control
      function. This is not so apparent because it does not return an error
      and generally the /proc interface is used to configure the default TCP
      congestion control algorithm. This is present from 2.6.18 onwards and
      probably earlier, though I have not inspected 2.6.15 through 2.6.17.
      
      sysctl_tcp_congestion_control calls sysctl_string and expects a successful
      return code of 0. In such a case it actually sets the congestion control
      algorithm with tcp_set_default_congestion_control. Otherwise, it returns the
      value returned by sysctl_string. This was correct in 2.6.14, as sysctl_string
      returned 0 on success. However, sysctl_string was updated to return 1 on
      success around 2.6.15, and sysctl_tcp_congestion_control was not updated.
      Even though sysctl_tcp_congestion_control returns 1, do_sysctl_strategy
      converts this return code to '0', so the caller never notices the error.
      Signed-off-by: David S. Miller <davem@davemloft.net>
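The return-code mismatch can be reproduced with a small hypothetical model. mock_sysctl_string(), set_default_cc() and both handlers are illustrative stand-ins, not the kernel functions; they only assume the convention stated above (1 on success after about 2.6.15). In the kernel, do_sysctl_strategy then turns that 1 into 0, which is why the buggy handler looked successful to its caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

static int default_changes; /* how often the default was really changed */

/* Hypothetical stand-in for the "set default algorithm" step. */
static int set_default_cc(const char *name)
{
	assert(name != NULL);
	default_changes++;
	return 0;
}

/* Hypothetical stand-in for sysctl_string() after ~2.6.15: returns 1 when
 * a new value was stored, 0 when there was nothing to store, -errno on error. */
static int mock_sysctl_string(char *buf, size_t len, const char *newval)
{
	if (newval == NULL)
		return 0;
	if (strlen(newval) >= len)
		return -EINVAL;
	strcpy(buf, newval);
	return 1;
}

/* Buggy handler: still expects the pre-2.6.15 "0 means success". */
static int handler_buggy(char *buf, size_t len, const char *newval)
{
	int ret = mock_sysctl_string(buf, len, newval);

	if (ret == 0)
		ret = set_default_cc(buf);
	return ret;
}

/* Fixed handler: a return value of 1 means a new value was stored. */
static int handler_fixed(char *buf, size_t len, const char *newval)
{
	int ret = mock_sysctl_string(buf, len, newval);

	if (ret == 1)
		ret = set_default_cc(buf);
	return ret;
}
```

With the buggy handler the new name is stored but the setter never runs; the fixed handler stores it and applies it.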
    • [TCP] MTUprobe: fix potential sk_send_head corruption · 6e421410
      Committed by Ilpo Järvinen
      When the abstraction functions were added, the conversion here was
      done incorrectly. As a result, the skb may end up pointing to an skb
      that was included in the probe skb and then freed. For this to
      trigger, however, skb_transmit must also fail to send.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [IPVS]: Move remaining sysctl handlers over to CTL_UNNUMBERED · 9055fa1f
      Committed by Simon Horman
      Switch the remaining IPVS sysctl entries over to use CTL_UNNUMBERED.
      I strongly doubt that anyone is using the sys_sysctl interface to
      these variables.
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [IPVS]: Fix sysctl warnings about missing strategy in schedulers · 9e103fa6
      Committed by Simon Horman
      sysctl table check failed: /net/ipv4/vs/lblc_expiration .3.5.21.19 Missing strategy
      [...]
      sysctl table check failed: /net/ipv4/vs/lblcr_expiration .3.5.21.20 Missing strategy
      
      Switch these entries over to use CTL_UNNUMBERED, as clearly
      the sys_sysctl portion wasn't working.

      This is along the same lines as Christian Borntraeger's patch that fixes
      up entries with no strategy in net/ipv4/ipvs/ip_vs_ctl.c.
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [IPVS]: Fix sysctl warnings about missing strategy · 611cd55b
      Committed by Christian Borntraeger
      Running the latest git code I get the following messages during boot:
      sysctl table check failed: /net/ipv4/vs/drop_entry .3.5.21.4 Missing strategy
      [...]
      sysctl table check failed: /net/ipv4/vs/drop_packet .3.5.21.5 Missing strategy
      [...]
      sysctl table check failed: /net/ipv4/vs/secure_tcp .3.5.21.6 Missing strategy
      [...]
      sysctl table check failed: /net/ipv4/vs/sync_threshold .3.5.21.24 Missing strategy
      
      I removed the binary sysctl handler for those messages and also removed
      the definitions in ip_vs.h. The alternative would be to implement a
      proper strategy handler, but the sysctl syscall is deprecated.

      There are other sysctl definitions that are commented out or that work
      with the default sysctl_data strategy. I did not touch these.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 19 Nov 2007, 1 commit
  5. 15 Nov 2007, 3 commits
  6. 14 Nov 2007, 2 commits
  7. 13 Nov 2007, 3 commits
  8. 11 Nov 2007, 9 commits
  9. 07 Nov 2007, 10 commits
    • [INET]: Remove per bucket rwlock in tcp/dccp ehash table. · 230140cf
      Committed by Eric Dumazet
      As was done two years ago for the IP route cache table (commit
      22c047cc), we can avoid using one lock per hash bucket for the huge
      TCP/DCCP hash tables.

      On a typical x86_64 platform, this saves about 2MB or 4MB of RAM, for
      little performance difference (we hit a different cache line for the
      rwlock, but then the bucket cache line has a better sharing factor
      among CPUs, since we dirty it less often). For netstat or ss commands
      that want a full scan of the hash table, we perform fewer memory accesses.
      
      Using a 'small' table of hashed rwlocks should be more than enough to
      provide correct SMP concurrency between different buckets, without
      using too much memory. Sizing of this table depends on
      num_possible_cpus() and various CONFIG settings.
      
      This patch provides some locking abstraction that may ease future
      work using a different model for the TCP/DCCP table.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
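The locking scheme can be sketched in user space with POSIX rwlocks, assuming both table sizes are powers of two. This illustrates lock striping in general, not the kernel's ehash code; the struct and all function names are hypothetical. Because the lock count divides the bucket count, every entry of a bucket always maps to the same lock, while many buckets share each lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hash table: many buckets, few shared rwlocks. */
struct striped_table {
	void **buckets;
	size_t nbuckets;         /* large, power of two */
	pthread_rwlock_t *locks;
	size_t nlocks;           /* small, power of two, <= nbuckets */
};

static int table_init(struct striped_table *t, size_t nbuckets, size_t nlocks)
{
	/* Power-of-two sizes let us use masks instead of modulo. */
	assert(nbuckets != 0 && (nbuckets & (nbuckets - 1)) == 0);
	assert(nlocks != 0 && (nlocks & (nlocks - 1)) == 0 && nlocks <= nbuckets);

	t->buckets = calloc(nbuckets, sizeof(*t->buckets));
	t->locks = calloc(nlocks, sizeof(*t->locks));
	if (t->buckets == NULL || t->locks == NULL) {
		free(t->buckets);
		free(t->locks);
		return -1;
	}
	for (size_t i = 0; i < nlocks; i++)
		pthread_rwlock_init(&t->locks[i], NULL);
	t->nbuckets = nbuckets;
	t->nlocks = nlocks;
	return 0;
}

static size_t bucket_index(const struct striped_table *t, size_t hash)
{
	return hash & (t->nbuckets - 1);
}

/* Since nlocks divides nbuckets, all hashes landing in one bucket
 * also select the same lock. */
static pthread_rwlock_t *bucket_lock(struct striped_table *t, size_t hash)
{
	return &t->locks[hash & (t->nlocks - 1)];
}

static void *table_get(struct striped_table *t, size_t hash)
{
	pthread_rwlock_t *lock = bucket_lock(t, hash);
	void *v;

	pthread_rwlock_rdlock(lock);
	v = t->buckets[bucket_index(t, hash)];
	pthread_rwlock_unlock(lock);
	return v;
}

static void table_set(struct striped_table *t, size_t hash, void *v)
{
	pthread_rwlock_t *lock = bucket_lock(t, hash);

	pthread_rwlock_wrlock(lock);
	t->buckets[bucket_index(t, hash)] = v;
	pthread_rwlock_unlock(lock);
}
```

With 1024 buckets and 16 locks, only 16 rwlocks are allocated instead of 1024, which is the kind of memory saving the commit message describes; the lock count would be sized from the CPU count in practice.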
    • [IPVS]: Synchronize closing of Connections · efac5276
      Committed by Rumen G. Bogdanovski
      This patch makes the master daemon sync the connection when it is about
      to close. This makes the connections on the backup close or time out
      according to their state. Before, the sync was performed only if the
      connection was in the ESTABLISHED state, which always made the
      connections time out after the hard-coded 3 minutes. However, Andy
      Gospodarek's patch ([IPVS]: use proper timeout instead of fixed value)
      effectively did nothing more than increase this to 15 minutes (the
      ESTABLISHED state timeout). So this patch makes use of the proper
      timeout, since it syncs the connections on state changes to FIN_WAIT
      (2 min timeout) and CLOSE (10 s timeout). If the backup misses CLOSE,
      hopefully it did not miss FIN_WAIT; otherwise we just have to wait for
      the ESTABLISHED state timeout, as happens without this patch. This way
      the number of hanging connections on the backup is kept to a minimum,
      and very few of them are left to time out with a long timeout.
      
      This is important if we want to make use of the fix for the real server
      overcommit on master/backup fail-over.
      Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
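The sync policy change can be modelled as a small decision function. This is a simplified sketch: the state names and functions are hypothetical, the real daemon also applies a packet-count threshold to ESTABLISHED connections, and the timeouts are the values quoted in the message above. The 3-minute fallback is an assumption borrowed from the old hard-coded value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the IPVS TCP connection states. */
enum conn_state { ST_ESTABLISHED, ST_FIN_WAIT, ST_CLOSE, ST_OTHER };

/* Timeout (seconds) the backup applies to a connection synced in state s. */
static unsigned int backup_timeout(enum conn_state s)
{
	switch (s) {
	case ST_ESTABLISHED:
		return 15 * 60;
	case ST_FIN_WAIT:
		return 2 * 60;
	case ST_CLOSE:
		return 10;
	default:
		return 3 * 60; /* assumption: the old hard-coded value */
	}
}

/* Old policy: sync only connections in the ESTABLISHED state. */
static bool should_sync_old(enum conn_state new_s)
{
	return new_s == ST_ESTABLISHED;
}

/* New policy: additionally sync on transitions into FIN_WAIT or CLOSE,
 * so the backup picks up the short timeouts of closing connections. */
static bool should_sync_new(enum conn_state old_s, enum conn_state new_s)
{
	if (new_s == ST_ESTABLISHED)
		return true;
	return old_s != new_s && (new_s == ST_FIN_WAIT || new_s == ST_CLOSE);
}
```

A connection going ESTABLISHED, FIN_WAIT, CLOSE is synced three times under the new policy, so if the backup sees the last update it expires the entry after 10 seconds instead of 15 minutes.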
    • [IPVS]: Bind connections on standby if the destination exists · 1e356f9c
      Committed by Rumen G. Bogdanovski
      This patch fixes the problem of node overload on director fail-over.
      Consider the scenario: 2 nodes, each accepting 3 connections at a time,
      and 2 directors. Director failover occurs when the nodes are fully
      loaded (6 connections to the cluster). In this case, the new director
      will assign another 6 connections to the cluster if the same real
      servers exist there.

      The problem turned out to be that the inherited connections were not
      bound to the real servers (destinations) on the backup director.
      Therefore, "ipvsadm -l" reports 0 connections:
      root@test2:~# ipvsadm -l
      IP Virtual Server version 1.2.1 (size=4096)
      Prot LocalAddress:Port Scheduler Flags
        -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
      TCP  test2.local:5999 wlc
        -> node473.local:5999           Route   1000   0          0
        -> node484.local:5999           Route   1000   0          0
      
      while "ipvsadm -lnc" is right:
      root@test2:~# ipvsadm -lnc
      IPVS connection entries
      pro expire state       source             virtual            destination
      TCP 14:56  ESTABLISHED 192.168.0.10:39164 192.168.0.222:5999 192.168.0.51:5999
      TCP 14:59  ESTABLISHED 192.168.0.10:39165 192.168.0.222:5999 192.168.0.52:5999
      
      So the patch I am sending fixes the problem by binding the received
      connections to the appropriate service on the backup director, if it
      exists; otherwise the connection is handled the old way. So if the
      master and the backup directors are synchronized in terms of real
      services, there will be no problem with server over-committing, since
      new connections will not be created on nonexistent real services
      on the backup. However, if the service is created later on the backup,
      the binding will be performed when the next connection update is
      received. With this patch, the inherited connections show as
      inactive on the backup:
      
      root@test2:~# ipvsadm -l
      IP Virtual Server version 1.2.1 (size=4096)
      Prot LocalAddress:Port Scheduler Flags
        -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
      TCP  test2.local:5999 wlc
        -> node473.local:5999           Route   1000   0          1
        -> node484.local:5999           Route   1000   0          1
      
      rumen@test2:~$ cat /proc/net/ip_vs
      IP Virtual Server version 1.2.1 (size=4096)
      Prot LocalAddress:Port Scheduler Flags
        -> RemoteAddress:Port Forward Weight ActiveConn InActConn
      TCP  C0A800DE:176F wlc
        -> C0A80033:176F      Route   1000   0          1
        -> C0A80032:176F      Route   1000   0          1
      
      Regards,
      Rumen Bogdanovski
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
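The binding rule can be sketched as follows. The structs, find_dest() and sync_update() are hypothetical simplifications of the backup director's state; the sketch only mirrors the behaviour described above: bind to an existing destination (counted as inactive), otherwise leave the connection unbound and retry on the next sync update.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical real-server (destination) record on the backup director. */
struct dest {
	const char *addr; /* "ip:port" */
	int activeconns;
	int inactconns;
};

/* Hypothetical connection received via the sync protocol. */
struct sync_conn {
	const char *daddr;
	struct dest *bound; /* NULL until bound to a destination */
};

static struct dest *find_dest(struct dest *dests, size_t n, const char *addr)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(dests[i].addr, addr) == 0)
			return &dests[i];
	return NULL;
}

/* Called for every sync update of a connection. */
static void sync_update(struct sync_conn *c, struct dest *dests, size_t n)
{
	assert(c != NULL);
	if (c->bound != NULL)
		return; /* already bound: counters were updated earlier */
	c->bound = find_dest(dests, n, c->daddr);
	if (c->bound != NULL)
		c->bound->inactconns++; /* inherited conns count as inactive */
	/* else: no such real server yet; retry on the next update */
}
```

Counting bound connections against each destination is what lets a scheduler such as wlc see the existing load after fail-over, instead of treating every real server as idle.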
    • [IPSEC]: Fix crypto_alloc_comp error checking · 4999f362
      Committed by Herbert Xu
      The function crypto_alloc_comp returns an error pointer (an errno
      encoded in the pointer value) instead of NULL to indicate an error,
      so it needs to be tested with IS_ERR.

      This is based on a patch by Vicenç Beltran Querol.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
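The error-pointer convention behind this fix can be reproduced in user space. The helpers below mirror the kernel's ERR_PTR/IS_ERR/PTR_ERR idea, using intptr_t instead of long for portability, and alloc_comp() is a hypothetical allocator with crypto_alloc_comp()-style semantics, not the crypto API itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* The last MAX_ERRNO values of the address space encode -errno codes. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(intptr_t error) { return (void *)error; }
static inline intptr_t PTR_ERR(const void *ptr) { return (intptr_t)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct comp_tfm { int dummy; };

/* Hypothetical allocator: returns a valid pointer or ERR_PTR(-errno),
 * never NULL, just like the function the commit fixes callers of. */
static struct comp_tfm *alloc_comp(const char *alg)
{
	struct comp_tfm *tfm;

	if (alg == NULL || alg[0] == '\0')
		return ERR_PTR(-ENOENT);
	tfm = malloc(sizeof(*tfm));
	if (tfm == NULL)
		return ERR_PTR(-ENOMEM);
	return tfm;
}

static int setup(const char *alg, struct comp_tfm **out)
{
	struct comp_tfm *tfm = alloc_comp(alg);

	/* Wrong: "if (!tfm)" never fires, because errors are not NULL. */
	if (IS_ERR(tfm))
		return (int)PTR_ERR(tfm);
	*out = tfm;
	return 0;
}
```

A NULL check would let ERR_PTR(-ENOENT) through as if it were a valid object, and the first dereference would then fault.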
    • [IPV4]: Compact some ifdefs in the fib code. · c3e9a353
      Committed by Pavel Emelyanov
      There are places that check for CONFIG_IP_MULTIPLE_TABLES
      twice in the same file, but the internals of these #ifdefs
      can be merged.
      
      As a side effect, remove one ifdef from inside a function.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [IPV4]: Use the {DEFINE|REF}_PROTO_INUSE infrastructure · 47a31a6f
      Committed by Eric Dumazet
      Trivial patch to make the tcp, udp, udplite and raw protocols use the
      fast "inuse sockets" infrastructure.

      Each protocol then uses a static percpu var instead of a dynamic one.
      This saves some RAM and some CPU cycles.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Define infrastructure to keep 'inuse' changes in an efficient SMP/NUMA way. · 286ab3d4
      Committed by Eric Dumazet
      "struct proto" currently uses an array stats[NR_CPUS] to track changes
      to 'inuse' sockets per protocol.

      If NR_CPUS is big, this means we use a big memory area for this.
      Moreover, all this memory area is located on a single node on NUMA
      machines, increasing memory pressure on the boot node.

      In this patch, I tried to:

      - Keep a fast !CONFIG_SMP implementation
      - Keep a fast CONFIG_SMP implementation for often used protocols
      (tcp, udp, raw, ...)
      - Introduce a NUMA efficient implementation

      Some helper macros are defined in include/net/sock.h.
      These macros take CONFIG_SMP into account.

      If a "struct proto" is declared without using the DEFINE_PROTO_INUSE /
      REF_PROTO_INUSE macros, it will automatically use a default
      implementation, using a dynamically allocated percpu zone.
      This default implementation will be NUMA efficient, but might use
      32/64 bytes per possible CPU because of the current alloc_percpu()
      implementation. However, it still should be better than the previous
      implementation based on the stats[NR_CPUS] field.

      When a "struct proto" is changed to use the new macros, we use a single
      static "int" percpu variable, lowering the memory and CPU costs while
      still preserving NUMA efficiency.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
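The counting pattern shared by the old stats[NR_CPUS] array and the new per-CPU variables can be simulated in user space: each CPU updates only its own padded slot, and a reader sums all slots. What the patch actually changes, namely where this memory lives on NUMA machines, cannot be shown here; NR_CPUS_SIM, the 64-byte line size and the function names are assumptions.

```c
#include <assert.h>

#define NR_CPUS_SIM 4  /* hypothetical CPU count for the simulation */
#define CACHE_LINE 64  /* assumed cache-line size in bytes */

/* One counter per CPU, padded so that no two CPUs share a cache line. */
struct inuse_slot {
	int count;
	char pad[CACHE_LINE - sizeof(int)];
};

struct proto_inuse {
	struct inuse_slot slot[NR_CPUS_SIM];
};

/* Socket creation (+1) or destruction (-1) on a given CPU touches
 * only that CPU's slot: no lock, no shared cache line. */
static void inuse_add(struct proto_inuse *p, int cpu, int delta)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SIM);
	p->slot[cpu].count += delta;
}

/* A reader sums all slots. One slot may be negative (a socket opened on
 * one CPU and closed on another); only the total is meaningful. */
static int inuse_get(const struct proto_inuse *p)
{
	int sum = 0;

	for (int cpu = 0; cpu < NR_CPUS_SIM; cpu++)
		sum += p->slot[cpu].count;
	return sum;
}
```

Writes stay cheap because they never contend, while the rare reader pays for a full scan; that trade-off favours the frequent socket create/close path.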
    • [IPV4]: Clean the ip_sockglue.c from some ugly ifdefs · 6a9fb947
      Committed by Pavel Emelyanov
      The #ifdef CONFIG_IP_MROUTE is sometimes placed inside the if-s,
      which looks completely bad. Similar ifdefs inside functions
      look a bit better, but they are also not recommended.
      
      Provide an ifdef-ed ip_mroute_opt() helper to cleanup the code.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
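A sketch of the helper pattern, assuming MRT_BASE is 200 (its value in <linux/mroute.h>) and that the multicast routing options span MRT_BASE to MRT_BASE + 10; classify_sockopt() is a hypothetical caller. The #ifdef moves out of the caller and into a pair of inline definitions, so the caller's if() reads the same whether or not CONFIG_IP_MROUTE is set.

```c
#include <assert.h>
#include <stdbool.h>

#define MRT_BASE 200 /* assumed value, as in <linux/mroute.h> */

/* Define CONFIG_IP_MROUTE to simulate a kernel built with multicast routing. */
#ifdef CONFIG_IP_MROUTE
static inline bool ip_mroute_opt(int opt)
{
	return opt >= MRT_BASE && opt <= MRT_BASE + 10;
}
#else
static inline bool ip_mroute_opt(int opt)
{
	(void)opt;
	return false; /* compiled out: no option is a multicast routing one */
}
#endif

/* Hypothetical caller: no #ifdef needed inside the if() any more. */
static int classify_sockopt(int optname)
{
	if (ip_mroute_opt(optname))
		return 1; /* hand off to the multicast routing code */
	return 0;     /* handle as an ordinary IP option */
}
```

When the feature is compiled out, the stub returns a constant false and the compiler can drop the dead branch, so the cleanup costs nothing at run time.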
    • [IPV4]: Consolidate the ip cork destruction in ip_output.c · 429f08e9
      Committed by Pavel Emelyanov
      The ip_push_pending_frames and ip_flush_pending_frames functions do the
      same things to flush the sock's cork. Move this into a separate
      function and save ~80 bytes of .text.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
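The consolidation follows a common refactoring pattern, sketched here with a hypothetical, heavily simplified cork struct and hypothetical function names; free() stands in for releasing the options copy and the route reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified per-socket cork state. */
struct cork {
	void *opt;  /* pending IP options copy */
	void *dst;  /* cached route (stand-in for a reference) */
	int corked;
};

/* Shared teardown, previously duplicated in both callers below. */
static void cork_release(struct cork *c)
{
	assert(c != NULL);
	free(c->opt);
	c->opt = NULL;
	free(c->dst);
	c->dst = NULL;
	c->corked = 0;
}

static int push_pending_frames(struct cork *c)
{
	/* ... build and send the queued datagram here ... */
	cork_release(c);
	return 0;
}

static void flush_pending_frames(struct cork *c)
{
	/* ... drop the queued fragments here ... */
	cork_release(c);
}
```

Because cork_release() clears every pointer it frees, calling it twice is harmless, and both exit paths are now guaranteed to tear the cork down the same way.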
    • [NETFILTER]: remove unneeded rcu_dereference() calls · d1332e0a
      Committed by Patrick McHardy
      As noticed by Paul McKenney, the rcu_dereference calls in the init path
      of NAT modules are unneeded, remove them.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>