1. 07 Nov, 2007 12 commits
    • E
      [INET]: Remove per bucket rwlock in tcp/dccp ehash table. · 230140cf
      Eric Dumazet committed
      As done two years ago on IP route cache table (commit
      22c047cc), we can avoid using one
      lock per hash bucket for the huge TCP/DCCP hash tables.
      
      On a typical x86_64 platform, this saves about 2MB or 4MB of ram, for
      little performance difference. (We hit a different cache line for the
      rwlock, but then the bucket cache line has a better sharing factor
      among cpus, since we dirty it less often.) For netstat or ss commands
      that want a full scan of hash table, we perform fewer memory accesses.
      
      Using a 'small' table of hashed rwlocks should be more than enough to
      provide correct SMP concurrency between different buckets, without
      using too much memory. Sizing of this table depends on
      num_possible_cpus() and various CONFIG settings.
      
      This patch provides some locking abstraction that may ease a future
      work using a different model for TCP/DCCP table.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      230140cf
    • R
      [IPVS]: Synchronize closing of Connections · efac5276
      Rumen G. Bogdanovski committed
      This patch makes the master daemon sync a connection when it is about
      to close.  This lets the connections on the backup close or time out
      according to their state.  Previously the sync was performed only if the
      connection was in the ESTABLISHED state, which always made the
      connections time out after the hard-coded 3 minutes. However, Andy
      Gospodarek's patch ([IPVS]: use proper timeout instead of fixed value)
      effectively did nothing more than increase this to 15 minutes (the
      ESTABLISHED state timeout).  So this patch makes use of the proper
      timeout, since it syncs the connections on status changes to FIN_WAIT
      (2min timeout) and CLOSE (10sec timeout). If the backup misses CLOSE,
      hopefully it did not miss FIN_WAIT. Otherwise we will just have to wait
      for the ESTABLISHED state timeout, as is the case without this patch.
      This way the number of hanging connections on the backup is kept to a
      minimum, and very few of them will be left to expire with a long
      timeout.
      
      This is important if we want to make use of the fix for the real server
      overcommit on master/backup fail-over.
      Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      efac5276
    • R
      [IPVS]: Bind connections on standby if the destination exists · 1e356f9c
      Rumen G. Bogdanovski committed
      This patch fixes the problem with node overload on director fail-over.
      Consider the scenario of 2 nodes, each accepting 3 connections at a
      time, and 2 directors. If director failover occurs when the nodes are
      fully loaded (6 connections to the cluster), the new director will
      assign another 6 connections to the cluster, if the same real servers
      exist there.
      
      The problem turned out to be that the inherited connections were not
      bound to the real servers (destinations) on the backup director.
      Therefore "ipvsadm -l" reports 0 connections:
      root@test2:~# ipvsadm -l
      IP Virtual Server version 1.2.1 (size=4096)
      Prot LocalAddress:Port Scheduler Flags
        -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
      TCP  test2.local:5999 wlc
        -> node473.local:5999           Route   1000   0          0
        -> node484.local:5999           Route   1000   0          0
      
      while "ipvsadm -lnc" is right:
      root@test2:~# ipvsadm -lnc
      IPVS connection entries
      pro expire state       source             virtual            destination
      TCP 14:56  ESTABLISHED 192.168.0.10:39164 192.168.0.222:5999 192.168.0.51:5999
      TCP 14:59  ESTABLISHED 192.168.0.10:39165 192.168.0.222:5999 192.168.0.52:5999
      
      So the patch I am sending fixes the problem by binding the received
      connections to the appropriate service on the backup director, if it
      exists, else the connection will be handled the old way. So if the
      master and the backup directors are synchronized in terms of real
      services there will be no problem with server over-committing since
      new connections will not be created on the nonexistent real services
      on the backup. However if the service is created later on the backup,
      the binding will be performed when the next connection update is
      received. With this patch the inherited connections will show as
      inactive on the backup:
      
      root@test2:~# ipvsadm -l
      IP Virtual Server version 1.2.1 (size=4096)
      Prot LocalAddress:Port Scheduler Flags
        -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
      TCP  test2.local:5999 wlc
        -> node473.local:5999           Route   1000   0          1
        -> node484.local:5999           Route   1000   0          1
      
      rumen@test2:~$ cat /proc/net/ip_vs
      IP Virtual Server version 1.2.1 (size=4096)
      Prot LocalAddress:Port Scheduler Flags
        -> RemoteAddress:Port Forward Weight ActiveConn InActConn
      TCP  C0A800DE:176F wlc
        -> C0A80033:176F      Route   1000   0          1
        -> C0A80032:176F      Route   1000   0          1
      
      Regards,
      Rumen Bogdanovski
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      1e356f9c
    • H
      [IPSEC]: Fix crypto_alloc_comp error checking · 4999f362
      Herbert Xu committed
      The function crypto_alloc_comp returns an errno instead of NULL
      to indicate error.  So it needs to be tested with IS_ERR.
      
      This is based on a patch by Vicenç Beltran Querol.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4999f362
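The error-pointer convention behind this fix can be shown with a user-space imitation. This is a sketch under assumptions, not the kernel's `linux/err.h`: the `ERR_PTR`, `PTR_ERR` and `IS_ERR` helpers below are simplified re-implementations, and `alloc_comp()` with its "deflate" name check is a hypothetical stand-in for `crypto_alloc_comp()`.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ERRNO 4095

/* Simplified user-space imitations of the kernel helpers. */
static inline void *ERR_PTR(intptr_t error) { return (void *)error; }
static inline intptr_t PTR_ERR(const void *ptr) { return (intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
    /* Error codes occupy the top MAX_ERRNO values of the address space. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct comp_tfm { int id; };

/* Hypothetical allocator: reports failure as an encoded errno and
 * never returns NULL, so a NULL check would miss every failure. */
static struct comp_tfm *alloc_comp(const char *name)
{
    struct comp_tfm *tfm;

    if (!name || strcmp(name, "deflate") != 0)
        return ERR_PTR(-ENOENT);
    tfm = malloc(sizeof(*tfm));
    if (!tfm)
        return ERR_PTR(-ENOMEM);
    tfm->id = 1;
    return tfm;
}
```

With this convention a caller writes `if (IS_ERR(tfm)) return PTR_ERR(tfm);` rather than `if (!tfm)`, because a failed allocation yields a non-NULL pointer.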
    • P
      [IPV4]: Compact some ifdefs in the fib code. · c3e9a353
      Pavel Emelyanov committed
      There are places that check for CONFIG_IP_MULTIPLE_TABLES
      twice in the same file, but the internals of these #ifdefs
      can be merged.
      
      As a side effect - remove one ifdef from inside a function.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c3e9a353
    • E
      [IPV4]: Use the {DEFINE|REF}_PROTO_INUSE infrastructure · 47a31a6f
      Eric Dumazet committed
      Trivial patch to make the "tcp,udp,udplite,raw" protocols use the fast
      "inuse sockets" infrastructure.
      
      Each protocol then uses a static percpu var, instead of a dynamic one.
      This saves some ram and some cpu cycles.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      47a31a6f
    • E
      [NET]: Define infrastructure to keep 'inuse' changes in an efficient SMP/NUMA way. · 286ab3d4
      Eric Dumazet committed
      "struct proto" currently uses an array stats[NR_CPUS] to track change on
      'inuse' sockets per protocol.
      
      If NR_CPUS is big, this means we use a big memory area for this.
      Moreover, all this memory area is located on a single node on NUMA
      machines, increasing memory pressure on the boot node.
      
      In this patch, I tried to:
      
      - Keep a fast !CONFIG_SMP implementation
      - Keep a fast CONFIG_SMP implementation for often used protocols
        (tcp,udp,raw,...)
      - Introduce a NUMA efficient implementation
      
      Some helper macros are defined in include/net/sock.h.
      These macros take CONFIG_SMP into account.
      
      If a "struct proto" is declared without using the DEFINE_PROTO_INUSE /
      REF_PROTO_INUSE macros, it will automatically use a default
      implementation, using a dynamically allocated percpu zone.
      This default implementation will be NUMA efficient, but might use 32/64
      bytes per possible cpu because of the current alloc_percpu()
      implementation. However, it should still be better than the previous
      implementation based on the stats[NR_CPUS] field.
      
      When a "struct proto" is changed to use the new macros, we use a single
      static "int" percpu variable, lowering the memory and cpu costs while
      still preserving NUMA efficiency.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      286ab3d4
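The counting scheme this commit describes, per-CPU deltas that are summed only when a total is needed, can be sketched in plain C. Everything here is an assumption for illustration: `NCPUS`, `struct proto_inuse`, `inuse_add()` and `inuse_total()` are hypothetical names, and a plain array stands in for the kernel's percpu storage, so the sketch shows the arithmetic and not the NUMA placement.

```c
/* Hypothetical CPU count for this sketch. */
#define NCPUS 4

/* One signed delta per CPU; each CPU only ever touches its own slot,
 * so updates need no lock and do not bounce a shared cache line. */
struct proto_inuse {
    int delta[NCPUS];
};

/* Record a change (+1 on socket creation, -1 on destruction)
 * on the CPU where the event happened. */
static void inuse_add(struct proto_inuse *p, int cpu, int val)
{
    p->delta[cpu] += val;
}

/* The number of sockets in use is the sum of all per-CPU deltas.
 * A single slot may be negative if sockets are created on one CPU
 * and destroyed on another; only the sum is meaningful. */
static int inuse_total(const struct proto_inuse *p)
{
    int sum = 0;

    for (int cpu = 0; cpu < NCPUS; cpu++)
        sum += p->delta[cpu];
    return sum;
}
```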
    • P
      [IPV4]: Clean the ip_sockglue.c from some ugly ifdefs · 6a9fb947
      Pavel Emelyanov committed
      The #ifdef CONFIG_IP_MROUTE is sometimes placed inside the if-s,
      which looks completely bad. Similar ifdefs inside the functions
      look a bit better, but they are also not recommended.
      
      Provide an ifdef-ed ip_mroute_opt() helper to clean up the code.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6a9fb947
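The general pattern of an ifdef-ed helper can be sketched as follows. This is not the kernel's ip_mroute_opt(); `mroute_opt_sketch()`, `setsockopt_sketch()` and the `MRT_OPT_BASE` / `MRT_OPT_MAX` range are hypothetical and only show how the conditional compilation moves out of the function body.

```c
#include <stdbool.h>

/* Hypothetical option range reserved for multicast routing. */
#define MRT_OPT_BASE 200
#define MRT_OPT_MAX  210

/* The #ifdef lives here, once, instead of inside every if-statement. */
#ifdef CONFIG_IP_MROUTE
static inline bool mroute_opt_sketch(int optname)
{
    return optname >= MRT_OPT_BASE && optname <= MRT_OPT_MAX;
}
#else
static inline bool mroute_opt_sketch(int optname)
{
    (void)optname;   /* feature compiled out: never a mroute option */
    return false;
}
#endif

/* The caller stays free of preprocessor conditionals.
 * Returns 1 if the option would go to the multicast-routing handler,
 * 0 if it is handled as a regular socket option. */
static int setsockopt_sketch(int optname)
{
    if (mroute_opt_sketch(optname))
        return 1;
    return 0;
}
```

When the feature is compiled out, the stub returns a constant, so the compiler can drop the dead branch in the caller.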
    • P
      [IPV4]: Consolidate the ip cork destruction in ip_output.c · 429f08e9
      Pavel Emelyanov committed
      The ip_push_pending_frames and ip_flush_pending_frames do the
      same things to flush the sock's cork. Move this into a separate
      function and save ~80 bytes from the .text
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      429f08e9
    • P
      [NETFILTER]: remove unneeded rcu_dereference() calls · d1332e0a
      Patrick McHardy committed
      As noticed by Paul McKenney, the rcu_dereference calls in the init path
      of NAT modules are unneeded; remove them.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d1332e0a
    • J
      [NETFILTER]: Clean up Makefile · 0795c65d
      Jan Engelhardt committed
      Sort matches and targets in the NF makefiles.
      Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0795c65d
    • A
      [NETFILTER]: ip{,6}_queue: convert to seq_file interface · 7351a22a
      Alexey Dobriyan committed
      I plan to kill ->get_info which means killing proc_net_create().
      Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7351a22a
  2. 02 Nov, 2007 2 commits
    • J
      [SG] Get rid of __sg_mark_end() · c46f2334
      Jens Axboe committed
      sg_mark_end() overwrites the page_link information, but all users want
      __sg_mark_end() behaviour where we just set the end bit. That is the most
      natural way to use the sg list, since you'll fill it in and then mark the
      end point.
      
      So change sg_mark_end() to only set the termination bit. Add a sg_magic
      debug check as well, and clear a chain pointer if it is set.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      c46f2334
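The behaviour described here, setting only the termination bit while keeping the page pointer, can be sketched in user space. The names and layout are assumptions for illustration: `struct sg_entry`, the `SG_CHAIN` / `SG_END` bit values and the `*_sketch` helpers are hypothetical, and the sg_magic debug check mentioned in the commit is left out.

```c
#include <stdint.h>

/* Assumed flag bits stored in the low bits of the pointer word;
 * pages are assumed to be at least 4-byte aligned. */
#define SG_CHAIN ((uintptr_t)0x01)
#define SG_END   ((uintptr_t)0x02)

struct sg_entry {
    uintptr_t page_link;   /* page pointer plus flag bits */
    unsigned int offset;
    unsigned int length;
};

/* Store a page pointer while preserving any flag bits already set. */
static void sg_set_page_sketch(struct sg_entry *sg, void *page)
{
    uintptr_t flags = sg->page_link & (SG_CHAIN | SG_END);

    sg->page_link = flags | (uintptr_t)page;
}

/* Recover the page pointer by masking off the flag bits. */
static void *sg_page_sketch(const struct sg_entry *sg)
{
    return (void *)(sg->page_link & ~(SG_CHAIN | SG_END));
}

/* Mark the entry as the end of the list: set the termination bit and
 * clear a chain marker if present, leaving the page pointer intact. */
static void sg_mark_end_sketch(struct sg_entry *sg)
{
    sg->page_link |= SG_END;
    sg->page_link &= ~SG_CHAIN;
}

static int sg_is_last_sketch(const struct sg_entry *sg)
{
    return (sg->page_link & SG_END) != 0;
}
```

This matches the fill-then-mark usage the commit message calls most natural: the caller sets up every entry first and marks the last one afterwards without losing its page.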
    • A
      cleanup asm/scatterlist.h includes · 87ae9afd
      Adrian Bunk committed
      Non-architecture-specific code should not #include <asm/scatterlist.h>.
      
      This patch therefore either replaces them with
      #include <linux/scatterlist.h> or simply removes them if they were
      unused.
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      87ae9afd
  3. 01 Nov, 2007 3 commits
    • P
      [NET]: Forget the zero_it argument of sk_alloc() · 6257ff21
      Pavel Emelyanov committed
      Finally, the zero_it argument can be completely removed from
      the callers and from the function prototype.
      
      Besides, fix the checkpatch.pl warnings about using
      assignments inside if-s.
      
      This patch is rather big, and it is a part of the previous one.
      I split it to make the patches more readable. Hope
      this particular split helped.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6257ff21
    • I
      [TCP]: Another TAGBITS -> SACKED_ACKED|LOST conversion · 261ab365
      Ilpo Järvinen committed
      Similar to commit 3eec0047, the point of this is to avoid
      skipping R-bit skbs.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      261ab365
    • I
      [TCP]: Process DSACKs that reside within a SACK block · e56d6cd6
      Ilpo Järvinen committed
      A DSACK inside another SACK block was missed if the start_seq of the
      DSACK was larger than the SACK block's, because sorting prioritizes full
      processing of the SACK block before the DSACK. After SACK block
      sorting the situation is like this:
      
                   SSSSSSSSS
                        D
                              SSSSSS
                                     SSSSSSS
      
      Because write_queue is walked in-order, when the first SACK block
      has been processed, TCP is already past the skb for which the
      DSACK arrived and we haven't taught it to backtrack (nor should
      we), so TCP just continues processing by going to the next SACK
      block after the DSACK (if any).
      
      Whenever such DSACK is present, do an embedded checking during
      the previous SACK block.
      
      If the DSACK is below snd_una, there won't be overlapping SACK
      block, and thus no problem in that case. Also if start_seq of
      the DSACK is equal to the actual block, it will be processed
      first.
      
      Tested this by using netem to duplicate 15% of packets, and
      by printing SACK block when found_dup_sack is true and the 
      selected skb in the dup_sack = 1 branch (if taken):
      
        SACK block 0: 4344-5792 (relative to snd_una 2019137317)
        SACK block 1: 4344-5792 (relative to snd_una 2019137317) 
      
      equal start seqnos => next_dup = 0, dup_sack = 1 won't occur...
      
        SACK block 0: 5792-7240 (relative to snd_una 2019214061)
        SACK block 1: 2896-7240 (relative to snd_una 2019214061)
        DSACK skb match 5792-7240 (relative to snd_una)
      
      ...and next_dup = 1 case (after the not shown start_seq sort),
      went to dup_sack = 1 branch.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e56d6cd6
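The condition under which a DSACK needs the embedded check can be written as a small predicate. This is a sketch, not the code in the TCP input path: `struct sack_block_sketch`, `seq_before()` and `dsack_embedded()` are hypothetical names, and the comparison uses the usual wrap-safe 32-bit sequence arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

struct sack_block_sketch {
    uint32_t start_seq;
    uint32_t end_seq;
};

/* Wrap-safe "a is before b" for 32-bit sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* True when the DSACK starts strictly after the SACK block's start and
 * ends no later than its end. With equal start_seq the DSACK is
 * processed first anyway, so that case is excluded. */
static bool dsack_embedded(const struct sack_block_sketch *dsack,
                           const struct sack_block_sketch *sack)
{
    return seq_before(sack->start_seq, dsack->start_seq) &&
           !seq_before(sack->end_seq, dsack->end_seq);
}
```

Applied to the two test traces in the commit message, the 4344-5792 pair has equal start sequence numbers and is not embedded, while the DSACK 5792-7240 lies inside the SACK block 2896-7240 and is.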
  4. 31 Oct, 2007 3 commits
    • D
      [NET]: Fix incorrect sg_mark_end() calls. · 51c739d1
      David S. Miller committed
      This fixes scatterlist corruptions added by
      
      	commit 68e3f5dd
      	[CRYPTO] users: Fix up scatterlist conversion errors
      
      The issue is that the code calls sg_mark_end() which clobbers the
      sg_page() pointer of the final scatterlist entry.
      
      The first part of the fix makes skb_to_sgvec() do __sg_mark_end().
      
      After considering all skb_to_sgvec() call sites the most correct
      solution is to call __sg_mark_end() in skb_to_sgvec() since that is
      what all of the callers would end up doing anyways.
      
      I suspect this might have fixed some problems in virtio_net which is
      the sole non-crypto user of skb_to_sgvec().
      
      Other similar sg_mark_end() cases were converted over to
      __sg_mark_end() as well.
      
      Arguably sg_mark_end() is a poorly named function because it doesn't
      just "mark", it clears out the page pointer as a side effect, which is
      what led to these bugs in the first place.
      
      The one remaining plain sg_mark_end() call is in scsi_alloc_sgtable()
      and arguably it could be converted to __sg_mark_end() if only so that
      we can delete this confusing interface from linux/scatterlist.h
      Signed-off-by: David S. Miller <davem@davemloft.net>
      51c739d1
    • A
      [IPVS]: Remove /proc/net/ip_vs_lblcr · 07afa040
      Alexey Dobriyan committed
      It's under the CONFIG_IP_VS_LBLCR_DEBUG option, which never existed.
      Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      07afa040
    • D
      Kbuild/doc: fix links to Documentation files · e403149c
      Dirk Hohndel committed
      Fix links to files in Documentation/* in various Kconfig files.
      Signed-off-by: Dirk Hohndel <hohndel@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e403149c
  5. 30 Oct, 2007 5 commits
  6. 27 Oct, 2007 2 commits
  7. 26 Oct, 2007 9 commits
  8. 24 Oct, 2007 3 commits
  9. 22 Oct, 2007 1 commit