1. 11 Nov, 2007 27 commits
  2. 07 Nov, 2007 13 commits
    • [NETLINK]: Fix unicast timeouts · c3d8d1e3
      Patrick McHardy committed
      Commit ed6dcf4a in the history.git tree broke netlink_unicast timeouts
      by moving the schedule_timeout() call to a new function that doesn't
      propagate the remaining timeout back to the caller. This means on each
      retry we start with the full timeout again.
      
      ipc/mqueue.c seems to actually want to wait indefinitely so this
      behaviour is retained.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
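The timeout behaviour described in this commit can be illustrated with a small user-space sketch. This is not the kernel code: `wait_once` and `send_with_retries` are hypothetical names, and the "wait" is simulated by subtracting ticks. The point is only that the helper hands the remaining time back to the caller, so every retry draws from one shared budget instead of starting over with the full timeout.

```c
/* Hypothetical stand-in for a blocking wait. It consumes up to `slice`
 * ticks of the budget and returns the ticks that remain, mirroring the
 * way schedule_timeout() reports the time left. */
static long wait_once(long timeout, long slice)
{
    return timeout > slice ? timeout - slice : 0;
}

/* Retry loop that carries the remaining timeout across attempts.
 * `attempts_needed` simulates how many waits pass before the send
 * can succeed. Returns the number of waits performed, or -1 if the
 * timeout budget ran out first. */
static int send_with_retries(long timeout, long slice, int attempts_needed)
{
    int waits = 0;

    while (waits < attempts_needed) {
        if (timeout <= 0)
            return -1;                        /* budget exhausted */
        timeout = wait_once(timeout, slice);  /* keep what is left */
        waits++;
    }
    return waits;
}
```

If the helper did not return the remaining time, each pass through the loop would wait the full `timeout` again, which is the regression the commit message describes.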
    • [INET]: Remove per bucket rwlock in tcp/dccp ehash table. · 230140cf
      Eric Dumazet committed
      As done two years ago on the IP route cache table (commit
      22c047cc), we can avoid using one
      lock per hash bucket for the huge TCP/DCCP hash tables.
      
      On a typical x86_64 platform, this saves about 2MB or 4MB of RAM, for
      little performance difference. (We hit a different cache line for the
      rwlock, but then the bucket cache line has a better sharing factor
      among CPUs, since we dirty it less often.) For netstat or ss commands
      that want a full scan of the hash table, we perform fewer memory accesses.
      
      Using a 'small' table of hashed rwlocks should be more than enough to
      provide correct SMP concurrency between different buckets, without
      using too much memory. Sizing of this table depends on
      num_possible_cpus() and various CONFIG settings.
      
      This patch provides a locking abstraction that may ease future
      work on a different model for the TCP/DCCP table.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
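The idea of a small table of hashed locks shared by many buckets can be sketched as follows. This is a minimal illustration under stated assumptions, not the patch itself: `DEMO_NLOCKS`, `struct demo_lock`, `demo_table`, `demo_lock_index` and `demo_bucket_lock` are hypothetical names, the lock is a placeholder type rather than a real rwlock, and the table size is a fixed power of two here, whereas the commit says the real size depends on num_possible_cpus() and CONFIG settings.

```c
#include <stddef.h>

/* Assumed table size. It must be a power of two so that the mask
 * below works; the real sizing logic is not reproduced here. */
#define DEMO_NLOCKS 256

/* Placeholder standing in for an rwlock. */
struct demo_lock {
    int state;
};

struct demo_lock_table {
    struct demo_lock locks[DEMO_NLOCKS];
};

static struct demo_lock_table demo_table;

/* Map a hash bucket number onto one of the shared locks. */
static size_t demo_lock_index(size_t bucket)
{
    return bucket & (DEMO_NLOCKS - 1);
}

/* Return the lock that protects the given bucket. Buckets whose
 * numbers differ by a multiple of DEMO_NLOCKS share a lock. */
static struct demo_lock *demo_bucket_lock(struct demo_lock_table *t,
                                          size_t bucket)
{
    return &t->locks[demo_lock_index(bucket)];
}
```

With this layout the bucket array no longer carries one lock per entry, which is where the memory saving mentioned in the commit comes from; the cost is that unrelated buckets occasionally contend for the same lock.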
    • [IPVS]: Synchronize closing of Connections · efac5276
      Rumen G. Bogdanovski committed
      This patch makes the master daemon sync a connection when it is about
      to close, so that connections on the backup close or time out
      according to their state. Previously the sync was performed only if the
      connection was in the ESTABLISHED state, which always made the connections
      time out after the hard-coded 3 minutes. Andy Gospodarek's patch
      ([IPVS]: use proper timeout instead of fixed value) effectively did nothing
      more than increase this to 15 minutes (the ESTABLISHED state timeout). This
      patch makes use of the proper timeout, since it syncs the connections on
      status changes to FIN_WAIT (2min timeout) and CLOSE (10sec timeout).
      If the backup misses CLOSE, hopefully it did not miss FIN_WAIT;
      otherwise we just have to wait for the ESTABLISHED state timeout, as
      happens without this patch. This way the number of hanging connections
      on the backup is kept to a minimum, and very few of them are left to
      expire with a long timeout.
      
      This is important if we want to make use of the fix for the real server
      overcommit on master/backup fail-over.
      Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [IPVS]: Bind connections on standby if the destination exists · 1e356f9c
      Rumen G. Bogdanovski committed
      This patch fixes the problem with node overload on director fail-over.
      Consider a scenario with 2 nodes, each accepting 3 connections at a time,
      and 2 directors. If director fail-over occurs when the nodes are fully
      loaded (6 connections to the cluster), the new director will assign
      another 6 connections to the cluster, if the same real servers exist
      there.
      
      The problem turned out to be that the inherited connections were not
      bound to the real servers (destinations) on the backup director. Therefore
      "ipvsadm -l" reports 0 connections:
      root@test2:~# ipvsadm -l
      IP Virtual Server version 1.2.1 (size=4096)
      Prot LocalAddress:Port Scheduler Flags
        -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
      TCP  test2.local:5999 wlc
        -> node473.local:5999           Route   1000   0          0
        -> node484.local:5999           Route   1000   0          0
      
      while "ipvsadm -lnc" is right:
      root@test2:~# ipvsadm -lnc
      IPVS connection entries
      pro expire state       source             virtual            destination
      TCP 14:56  ESTABLISHED 192.168.0.10:39164 192.168.0.222:5999 192.168.0.51:5999
      TCP 14:59  ESTABLISHED 192.168.0.10:39165 192.168.0.222:5999 192.168.0.52:5999
      
      So the patch I am sending fixes the problem by binding the received
      connections to the appropriate service on the backup director, if it
      exists, else the connection will be handled the old way. So if the
      master and the backup directors are synchronized in terms of real
      services there will be no problem with server over-committing since
      new connections will not be created on the nonexistent real services
      on the backup. However if the service is created later on the backup,
      the binding will be performed when the next connection update is
      received. With this patch the inherited connections will show as
      inactive on the backup:
      
      root@test2:~# ipvsadm -l
      IP Virtual Server version 1.2.1 (size=4096)
      Prot LocalAddress:Port Scheduler Flags
        -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
      TCP  test2.local:5999 wlc
        -> node473.local:5999           Route   1000   0          1
        -> node484.local:5999           Route   1000   0          1
      
      rumen@test2:~$ cat /proc/net/ip_vs
      IP Virtual Server version 1.2.1 (size=4096)
      Prot LocalAddress:Port Scheduler Flags
        -> RemoteAddress:Port Forward Weight ActiveConn InActConn
      TCP  C0A800DE:176F wlc
        -> C0A80033:176F      Route   1000   0          1
        -> C0A80032:176F      Route   1000   0          1
      
      Regards,
      Rumen Bogdanovski
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • [NET]: Clean proto_(un)register from in-code ifdefs · b733c007
      Pavel Emelyanov committed
      The struct proto has the per-cpu "inuse" counter, which is handled
      with special care. All the handling code hides under the ifdef
      CONFIG_SMP, which introduces some code duplication and makes it
      look worse than it could.
      
      Clean this.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [IPSEC]: Fix crypto_alloc_comp error checking · 4999f362
      Herbert Xu committed
      The function crypto_alloc_comp returns an errno instead of NULL
      to indicate error.  So it needs to be tested with IS_ERR.
      
      This is based on a patch by Vicen Beltran Querol.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
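The difference between a NULL check and an IS_ERR check can be shown with a user-space sketch of the kernel's error-pointer convention. The three helpers below are simplified re-creations of ERR_PTR, PTR_ERR and IS_ERR for illustration only, and `demo_alloc` and `demo_init` are hypothetical names standing in for an allocator such as crypto_alloc_comp and its caller.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_ERRNO 4095

/* Simplified user-space versions of the kernel helpers: a small
 * negative errno is encoded in the pointer value itself. */
static inline void *ERR_PTR(long error)
{
    return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical allocator: on failure it returns an encoded errno,
 * never NULL. */
static void *demo_alloc(int fail)
{
    static int object;

    return fail ? ERR_PTR(-ENOMEM) : (void *)&object;
}

/* Caller that checks the result correctly. A `!tfm` test here would
 * treat the error pointer as a valid object. */
static int demo_init(int fail)
{
    void *tfm = demo_alloc(fail);

    if (IS_ERR(tfm))
        return (int)PTR_ERR(tfm);
    return 0;
}
```

Because the failing return value is a non-NULL pointer, only the IS_ERR test catches it, which is the bug class this commit fixes.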
    • [VLAN]: Fix SET_VLAN_INGRESS_PRIORITY_CMD ioctl · fffe470a
      Patrick McHardy committed
      Based on report and patch by Doug Kehn <rdkehn@yahoo.com>:
      
      vconfig returns the following error when attempting to execute the
      set_ingress_map command:
      
      vconfig: socket or ioctl error for set_ingress_map: Operation not permitted
      
      In vlan.c, vlan_ioctl_handler for SET_VLAN_INGRESS_PRIORITY_CMD
      sets err = -EPERM and calls vlan_dev_set_ingress_priority.
      vlan_dev_set_ingress_priority is a void function so err remains
      at -EPERM and results in the vconfig error (even though the ingress
      map was set).
      
      Fix by setting err = 0 after the vlan_dev_set_ingress_priority call.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
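The control flow behind this bug can be sketched in a few lines of user-space C. Everything here is hypothetical (`DEMO_SET_INGRESS`, `demo_map`, `demo_set_ingress_priority`, `demo_ioctl`, and the `privileged` flag standing in for a capability check); it only mirrors the pattern the commit describes: a pessimistic error code is set first, a void helper does the work, and the code must then be cleared explicitly.

```c
#include <errno.h>

enum { DEMO_SET_INGRESS = 1 };

/* Toy ingress map: VLAN priority (0..7) to internal priority. */
static int demo_map[8];

/* Void helper, like vlan_dev_set_ingress_priority: it cannot report
 * success or failure to its caller. */
static void demo_set_ingress_priority(int skb_prio, int vlan_prio)
{
    demo_map[vlan_prio & 7] = skb_prio;
}

static int demo_ioctl(int cmd, int privileged, int skb_prio, int vlan_prio)
{
    int err = -EINVAL;

    switch (cmd) {
    case DEMO_SET_INGRESS:
        err = -EPERM;
        if (!privileged)
            break;
        demo_set_ingress_priority(skb_prio, vlan_prio);
        err = 0;  /* without this line the caller still sees -EPERM */
        break;
    }
    return err;
}
```

Dropping the `err = 0;` assignment reproduces the reported symptom: the map is updated, yet the caller gets "Operation not permitted".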
    • [NETNS]: Fix compiler error in net_namespace.c · 45a19b0a
      Johann Felix Soden committed
      Because net_free is called by copy_net_ns before its declaration, the
      compiler gives an error. This patch puts net_free before copy_net_ns
      to fix this.
      
      The compiler error:
      net/core/net_namespace.c: In function 'copy_net_ns':
      net/core/net_namespace.c:97: error: implicit declaration of function 'net_free'
      net/core/net_namespace.c: At top level:
      net/core/net_namespace.c:104: warning: conflicting types for 'net_free'
      net/core/net_namespace.c:104: error: static declaration of 'net_free' follows non-static declaration
      net/core/net_namespace.c:97: error: previous implicit declaration of 'net_free' was here
      
      The error was introduced by the '[NET]: Hide the dead code in the
      net_namespace.c' patch (6a1a3b9f).
      Signed-off-by: Johann Felix Soden <johfel@users.sourceforge.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [PKT_SCHED] CLS_U32: Fix endianness problem with u32 classifier hash masks. · 543821c6
      Radu Rendec committed
      While trying to implement u32 hashes in my shaping machine I ran into
      a possible bug in the u32 hash/bucket computing algorithm
      (net/sched/cls_u32.c).
      
      The problem occurs only with hash masks that extend over the octet
      boundary, on little endian machines (where htonl() actually does
      something).
      
      Let's say that I would like to use 0x3fc0 as the hash mask. This means
      8 contiguous "1" bits starting at b6. With such a mask, the expected
      (and logical) behavior is to hash any address in, for instance,
      192.168.0.0/26 in bucket 0, then any address in 192.168.0.64/26 in
      bucket 1, then 192.168.0.128/26 in bucket 2 and so on.
      
      This is exactly what would happen on a big endian machine, but on
      little endian machines, what would actually happen with current
      implementation is 0x3fc0 being reversed (into 0xc03f0000) by htonl()
      in the userspace tool and then applied to 192.168.x.x in the u32
      classifier. When shifting right by 16 bits (rank of first "1" bit in
      the reversed mask) and applying the divisor mask (0xff for divisor
      256), what would actually remain is 0x3f applied on the "168" octet of
      the address.
      
      One could say this can be easily worked around by taking endianness
      into account in userspace and supplying an appropriate mask (0xfc03)
      that would be turned into contiguous "1" bits when reversed
      (0x03fc0000). But the actual problem is the network address (inside
      the packet) not being converted to host order, but used as a
      host-order value when computing the bucket.
      
      Let's say the network address is written as n31 n30 ... n0, with n0
      being the least significant bit. When used directly (without any
      conversion) on a little endian machine, it becomes n7 ... n0 n15 ... n8
      etc. in the machine's registers. Thus bits n7 and n8 would no longer be
      adjacent and 192.168.0.64/26 and 192.168.0.128/26 would no longer be
      consecutive.
      
      The fix is to apply ntohl() on the hmask before computing fshift,
      and in u32_hash_fold() convert the packet data to host order before
      shifting down by fshift.
      
      With helpful feedback from Jamal Hadi Salim and Jarek Poplawski.
      Signed-off-by: David S. Miller <davem@davemloft.net>
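The bucket computation described in this commit can be sketched in user-space C. This is an illustration under assumptions, not the code from net/sched/cls_u32.c: `demo_fshift` and `demo_bucket` are hypothetical names, and both the mask and the key are assumed to arrive in network byte order, as the message says they do. Both values are converted to host order before the shift, which is the essence of the fix.

```c
#include <stdint.h>
#include <strings.h>    /* ffs() */
#include <arpa/inet.h>  /* htonl(), ntohl() */

/* Position of the lowest set bit of the mask, computed on the
 * host-order value. */
static unsigned int demo_fshift(uint32_t hmask_net)
{
    uint32_t h = ntohl(hmask_net);

    return h ? (unsigned int)ffs((int)h) - 1 : 0;
}

/* Bucket for a key taken from the packet (network order). The masked
 * key is converted to host order before shifting, so adjacent address
 * ranges land in adjacent buckets on any endianness. */
static unsigned int demo_bucket(uint32_t key_net, uint32_t hmask_net,
                                unsigned int divisor_mask)
{
    uint32_t h = ntohl(key_net & hmask_net);

    return (h >> demo_fshift(hmask_net)) & divisor_mask;
}
```

With the mask 0x3fc0 and divisor mask 0xff from the example in the message, `demo_bucket(htonl(0xC0A80040), htonl(0x3fc0), 0xff)` (an address in 192.168.0.64/26) yields bucket 1 and `demo_bucket(htonl(0xC0A80080), htonl(0x3fc0), 0xff)` (192.168.0.128/26) yields bucket 2, on little and big endian hosts alike.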
    • [NET]: Removing duplicate #includes · 40208d71
      Jiri Olsa committed
      Removing duplicate #includes for net/
      Signed-off-by: Jiri Olsa <olsajiri@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [IPV4]: Compact some ifdefs in the fib code. · c3e9a353
      Pavel Emelyanov committed
      There are places that check for CONFIG_IP_MULTIPLE_TABLES
      twice in the same file, but the internals of these #ifdefs
      can be merged.
      
      As a side effect - remove one ifdef from inside a function.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [IPV6]: Convert /proc/net/ipv6_route to seq_file interface · 33120b30
      Alexey Dobriyan committed
      This removes the last proc_net_create() user. Kudos to Benjamin Thery and
      Stephen Hemminger for comments on the previous version.
      Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [PKT_SCHED]: Fix OOPS when removing devices from a teql queuing discipline · 4f9f8311
      Evgeniy Polyakov committed
      teql_reset() is called from deactivate and the qdisc is already set to noop,
      but a subsequent teql_xmit does not know about it, dereferences the private
      data as a teql qdisc, and thus oopses.
      not catch it first :)
      Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>