1. 29 Jan, 2008 2 commits
  2. 05 Dec, 2007 2 commits
  3. 21 Nov, 2007 1 commit
    • [IPVS]: Fix compiler warning about unused register_ip_vs_protocol · d535a916
      Pavel Emelyanov committed
      This is silly, but I have turned the CONFIG_IP_VS to m,
      to check the compilation of one (recently sent) fix
      and set all the CONFIG_IP_VS_PROTO_XXX options to n to
      speed up the compilation.
      
      In this configuration the compiler warns me about
      
        CC [M]  net/ipv4/ipvs/ip_vs_proto.o
      net/ipv4/ipvs/ip_vs_proto.c:49: warning: 'register_ip_vs_protocol' defined but not used
      
      Indeed. With no protocols selected there are no
      calls to this function - all are compiled out with
      ifdefs.
      
      Maybe the best fix would be to surround this function with
      ifdefs or tune the Kconfig dependencies, but I think that
      marking this register function as __used is enough. No?
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Acked-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
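The warning described in this commit can be reproduced outside the kernel. The following is a minimal sketch, assuming GCC or Clang. `KEEP_USED`, `struct proto`, `register_proto`, `proto_init` and `HAVE_PROTO_TCP` are hypothetical stand-ins, and `KEEP_USED` only approximates what the kernel's `__used` annotation does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's __used annotation (GCC/Clang). */
#define KEEP_USED __attribute__((used))

struct proto {
    const char *name;
    struct proto *next;
};

static struct proto *proto_list;

/*
 * Without KEEP_USED, building with -Wall and no HAVE_PROTO_* macro defined
 * reports "defined but not used", because every caller is compiled out.
 */
static int KEEP_USED register_proto(struct proto *p)
{
    assert(p != NULL);
    p->next = proto_list;
    proto_list = p;
    return 0;
}

#ifdef HAVE_PROTO_TCP
static struct proto proto_tcp = { "TCP", NULL };
#endif

/* Returns how many protocols were registered for this configuration. */
int proto_init(void)
{
    int registered = 0;
#ifdef HAVE_PROTO_TCP
    register_proto(&proto_tcp);
    registered++;
#endif
    return registered;
}
```

Building with `-DHAVE_PROTO_TCP` brings the caller back; the attribute then has no visible effect.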
  4. 20 Nov, 2007 4 commits
  5. 13 Nov, 2007 1 commit
  6. 07 Nov, 2007 2 commits
    • [IPVS]: Synchronize closing of Connections · efac5276
      Rumen G. Bogdanovski committed
      This patch makes the master daemon sync the connection when it is about
      to close.  This makes the connections on the backup close or time out
      according to their state.  Previously the sync was performed only if the
      connection was in ESTABLISHED state, which always made the connections
      time out after the hard-coded 3 minutes. However, Andy Gospodarek's patch
      ([IPVS]: use proper timeout instead of fixed value) effectively did nothing
      more than increase this to 15 minutes (the ESTABLISHED state timeout).  So
      this patch makes use of the proper timeout, since it syncs the connections
      on status changes to FIN_WAIT (2 min timeout) and CLOSE (10 sec timeout).
      If the backup misses CLOSE, hopefully it did not miss FIN_WAIT.
      Otherwise we will just have to wait for the ESTABLISHED state timeout, as
      is the case without this patch.  This way the number of hanging
      connections on the backup is kept to a minimum, and very few of them will
      be left to expire with a long timeout.
      
      This is important if we want to make use of the fix for the real server
      overcommit on master/backup fail-over.
      Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
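The sync rule this commit describes can be sketched as a small decision function. This is an illustration only, not the kernel code: the state names, `should_sync_old`, `should_sync_new` and `backup_timeout` are hypothetical, and the timeouts are the figures quoted in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified subset of connection states; names are illustrative. */
enum conn_state { ST_ESTABLISHED, ST_FIN_WAIT, ST_CLOSE, ST_COUNT };

/* Timeouts in seconds, taken from the figures quoted in the commit message. */
static const int state_timeout[ST_COUNT] = {
    [ST_ESTABLISHED] = 15 * 60,
    [ST_FIN_WAIT]    = 2 * 60,
    [ST_CLOSE]       = 10,
};

/* Old rule: only connections in ESTABLISHED state were synced. */
bool should_sync_old(enum conn_state new_state)
{
    return new_state == ST_ESTABLISHED;
}

/* New rule: also sync when the state changes to FIN_WAIT or CLOSE. */
bool should_sync_new(enum conn_state old_state, enum conn_state new_state)
{
    if (new_state == ST_ESTABLISHED)
        return true;
    return old_state != new_state &&
           (new_state == ST_FIN_WAIT || new_state == ST_CLOSE);
}

/* Timeout the backup would apply after receiving a sync for new_state. */
int backup_timeout(enum conn_state new_state)
{
    assert((int)new_state < ST_COUNT);
    return state_timeout[new_state];
}
```

Under the old rule a closing connection stays on the backup for the full ESTABLISHED timeout; under the new rule it expires after the FIN_WAIT or CLOSE timeout, provided at least one of the two sync messages arrives.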
    • [IPVS]: Bind connections on stanby if the destination exists · 1e356f9c
      Rumen G. Bogdanovski committed
      This patch fixes the problem with node overload on director fail-over.
      Consider the scenario of 2 directors and 2 nodes, each node accepting 3
      connections at a time. If director failover occurs when the nodes are
      fully loaded (6 connections to the cluster), the new director will
      assign another 6 connections to the cluster, if the same real servers
      exist there.
      
      The problem turned out to be that the inherited connections were not
      bound to the real servers (destinations) on the backup director.
      Therefore "ipvsadm -l" reports 0 connections:
      root@test2:~# ipvsadm -l
      IP Virtual Server version 1.2.1 (size=4096)
      Prot LocalAddress:Port Scheduler Flags
        -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
      TCP  test2.local:5999 wlc
        -> node473.local:5999           Route   1000   0          0
        -> node484.local:5999           Route   1000   0          0
      
      while "ipvsadm -lnc" is right:
      root@test2:~# ipvsadm -lnc
      IPVS connection entries
      pro expire state       source             virtual            destination
      TCP 14:56  ESTABLISHED 192.168.0.10:39164 192.168.0.222:5999 192.168.0.51:5999
      TCP 14:59  ESTABLISHED 192.168.0.10:39165 192.168.0.222:5999 192.168.0.52:5999
      
      So the patch I am sending fixes the problem by binding the received
      connections to the appropriate service on the backup director, if it
      exists, else the connection will be handled the old way. So if the
      master and the backup directors are synchronized in terms of real
      services there will be no problem with server over-committing since
      new connections will not be created on the nonexistent real services
      on the backup. However if the service is created later on the backup,
      the binding will be performed when the next connection update is
      received. With this patch the inherited connections will show as
      inactive on the backup:
      
      root@test2:~# ipvsadm -l
      IP Virtual Server version 1.2.1 (size=4096)
      Prot LocalAddress:Port Scheduler Flags
        -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
      TCP  test2.local:5999 wlc
        -> node473.local:5999           Route   1000   0          1
        -> node484.local:5999           Route   1000   0          1
      
      rumen@test2:~$ cat /proc/net/ip_vs
      IP Virtual Server version 1.2.1 (size=4096)
      Prot LocalAddress:Port Scheduler Flags
        -> RemoteAddress:Port Forward Weight ActiveConn InActConn
      TCP  C0A800DE:176F wlc
        -> C0A80033:176F      Route   1000   0          1
        -> C0A80032:176F      Route   1000   0          1
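The /proc listing prints addresses and ports in hexadecimal, so `C0A800DE:176F` is the virtual service 192.168.0.222:5999 from the earlier output. A small decoding sketch; `decode_endpoint` is a hypothetical helper, not part of ipvsadm or the kernel.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Decode an "AABBCCDD:PPPP" endpoint, as printed in the listing above, into
 * four address octets and a port. Returns 0 on success, -1 on a parse error.
 */
int decode_endpoint(const char *hex, unsigned octets[4], unsigned *port)
{
    assert(hex != NULL && octets != NULL && port != NULL);
    if (sscanf(hex, "%2x%2x%2x%2x:%4x",
               &octets[0], &octets[1], &octets[2], &octets[3], port) != 5)
        return -1;
    return 0;
}
```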
      
      Regards,
      Rumen Bogdanovski
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
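The binding behaviour described in this commit can be modelled in a few lines. This is a simplified sketch under stated assumptions, not the kernel implementation: `struct dest`, `struct conn`, `find_dest` and `backup_receive_sync` are hypothetical names, and the destination table is a plain array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct dest {
    uint32_t addr;
    uint16_t port;
    int inactconns;    /* connections inherited via sync count as inactive */
};

struct conn {
    uint32_t daddr;
    uint16_t dport;
    struct dest *dest; /* NULL while the connection is unbound */
};

static struct dest *find_dest(struct dest *tbl, size_t n,
                              uint32_t addr, uint16_t port)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].addr == addr && tbl[i].port == port)
            return &tbl[i];
    return NULL;
}

/*
 * Called for every sync message, for new and for already known connections.
 * Binds the connection if it is not bound yet and the destination is
 * configured locally; otherwise leaves it unbound, as before the patch.
 */
void backup_receive_sync(struct conn *cp, struct dest *tbl, size_t n)
{
    assert(cp != NULL);
    if (cp->dest != NULL)
        return;                      /* already bound by an earlier update */
    cp->dest = find_dest(tbl, n, cp->daddr, cp->dport);
    if (cp->dest != NULL)
        cp->dest->inactconns++;
}
```

Because the lookup is repeated on each update, a destination added to the backup later is picked up when the next sync message for that connection arrives.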
  7. 31 Oct, 2007 1 commit
  8. 30 Oct, 2007 1 commit
  9. 24 Oct, 2007 1 commit
  10. 20 Oct, 2007 2 commits
  11. 16 Oct, 2007 3 commits
  12. 11 Oct, 2007 5 commits
  13. 11 Sep, 2007 1 commit
    • [NETFILTER]: Fix/improve deadlock condition on module removal netfilter · 16fcec35
      Neil Horman committed
      So I've had a deadlock reported to me.  I've found that the sequence of
      events goes like this:
      
      1) process A (modprobe) runs to remove ip_tables.ko
      
      2) process B (iptables-restore) runs and calls setsockopt on a netfilter socket,
      increasing the ip_tables socket_ops use count
      
      3) process A acquires a file lock on the file ip_tables.ko, calls remove_module
      in the kernel, which in turn executes the ip_tables module cleanup routine,
      which calls nf_unregister_sockopt
      
      4) nf_unregister_sockopt, seeing that the use count is non-zero, puts the
      calling process into uninterruptible sleep, expecting the process using the
      socket option code to wake it up when it exits the kernel
      
      5) the user of the socket option code (process B) in do_ipt_get_ctl, calls
      ipt_find_table_lock, which in this case calls request_module to load
      ip_tables_nat.ko
      
      6) request_module forks a copy of modprobe (process C) to load the module and
      blocks until modprobe exits.
      
      7) Process C, forked by request_module, processes the dependencies of
      ip_tables_nat.ko, of which ip_tables.ko is one.
      
      8) Process C attempts to lock the requested module and all its dependencies; it
      blocks when it attempts to lock ip_tables.ko (which was previously locked in
      step 3)
      
      There's not really any great permanent solution to this that I can see, but I've
      developed a two part solution that corrects the problem
      
      Part 1) Modifies the nf_sockopt registration code so that, instead of using a
      use counter internal to the nf_sockopt_ops structure, we instead use a pointer
      to the registering module's owner to do module reference counting when nf_sockopt
      calls a module's set/get routine.  This prevents the deadlock by preventing step 4
      from happening.
      
      Part 2) Enhances the modprobe utility so that by default it performs non-blocking
      remove operations (the same way rmmod does), and adds an option to explicitly
      request blocking operation.  So if you select blocking operation in modprobe you
      can still cause the above deadlock, but only if you explicitly try (and since
      root can do any old stupid thing it would like....  :)  ).
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
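The idea behind Part 1 and Part 2 can be modelled in user space. The sketch below is a hypothetical model and not kernel code: `struct module_model` and all `model_*` functions are invented for illustration. A handler call pins the owning module, and a non-blocking remove fails while the module is pinned instead of sleeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a module with a reference count. */
struct module_model {
    int refcnt;
    bool going;   /* set once removal has started */
};

/* Take a reference unless the module is already being removed. */
bool model_try_get(struct module_model *m)
{
    assert(m != NULL);
    if (m->going)
        return false;
    m->refcnt++;
    return true;
}

void model_put(struct module_model *m)
{
    assert(m != NULL && m->refcnt > 0);
    m->refcnt--;
}

/*
 * Non-blocking removal: return -1 while the module is in use instead of
 * sleeping until the count drops. Returns 0 once removal can proceed.
 */
int model_remove_nonblocking(struct module_model *m)
{
    assert(m != NULL);
    if (m->refcnt > 0)
        return -1;
    m->going = true;
    return 0;
}

/* A sockopt call pins the owning module for the duration of the handler. */
int model_sockopt_call(struct module_model *owner, int (*handler)(void))
{
    int ret;

    assert(handler != NULL);
    if (!model_try_get(owner))
        return -1;
    ret = handler();
    model_put(owner);
    return ret;
}

/* Example handler used to exercise model_sockopt_call(). */
int model_sample_handler(void)
{
    return 42;
}
```

In this model nothing ever waits on the use count, so the sleeping step of the deadlock sequence cannot occur; the remover simply gets an error and can retry.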
  14. 14 Aug, 2007 2 commits
  15. 31 Jul, 2007 1 commit
  16. 20 Jul, 2007 1 commit
    • mm: Remove slab destructors from kmem_cache_create(). · 20c2df83
      Paul Mundt committed
      Slab destructors were no longer supported after Christoph's
      c59def9f change. They've been
      BUGs for both slab and slub, and slob never supported them
      either.
      
      This rips out support for the dtor pointer from kmem_cache_create()
      completely and fixes up every single callsite in the kernel (there were
      about 224, not including the slab allocator definitions themselves,
      or the documentation references).
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
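The shape of the resulting interface, a cache that takes a constructor but has no destructor slot, can be illustrated with a user-space stand-in. Everything here is hypothetical (`struct obj_cache`, `obj_cache_create`, `obj_cache_alloc`, `obj_cache_free`, `zero16_ctor`); the real kmem_cache_create() takes more arguments and its constructor has a different signature.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space stand-in for a slab cache; not the kernel API. */
struct obj_cache {
    const char *name;
    size_t size;
    void (*ctor)(void *obj);   /* constructor only; no destructor slot */
};

/* The create call takes a constructor and no destructor argument. */
struct obj_cache *obj_cache_create(const char *name, size_t size,
                                   void (*ctor)(void *obj))
{
    struct obj_cache *c = malloc(sizeof(*c));

    if (c == NULL)
        return NULL;
    c->name = name;
    c->size = size;
    c->ctor = ctor;
    return c;
}

/* Allocate one object and run the constructor on it, if there is one. */
void *obj_cache_alloc(struct obj_cache *c)
{
    void *obj;

    assert(c != NULL);
    obj = malloc(c->size);
    if (obj != NULL && c->ctor != NULL)
        c->ctor(obj);
    return obj;
}

/* Freeing runs no per-object callback; any cleanup is the caller's job. */
void obj_cache_free(struct obj_cache *c, void *obj)
{
    (void)c;
    free(obj);
}

/* Example constructor: zero a fixed-size 16-byte object. */
void zero16_ctor(void *obj)
{
    memset(obj, 0, 16);
}
```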
  17. 11 Jul, 2007 1 commit
  18. 19 Jun, 2007 1 commit
    • [IPVS]: Fix state variable on failure to start ipvs threads · cc0191ae
      Neil Horman committed
      ip_vs currently fails to reset its ip_vs_sync_state variable if the
      sync thread fails to start properly.  The result is that the kernel
      will report a running daemon when there actually is none.
      
      If you issue the following commands:
      
      1. ipvsadm --start-daemon master --mcast-interface bla
      2. ipvsadm -L --daemon
      3. ipvsadm --stop-daemon master
      
      Assuming that bla is not an actual interface, step 2 should return no
      data, but instead returns:
      
      $ ipvsadm -L --daemon
      master sync daemon (mcast=bla, syncid=0)
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
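The bug pattern, a state flag that is set before a start attempt and not rolled back on failure, can be sketched as follows. The names are hypothetical (`sync_state`, `start_sync_thread`, `start_daemon`, `daemon_running`), and the rule that only "eth0" is a valid interface is an assumption made purely so that the failure path can be exercised.

```c
#include <assert.h>
#include <string.h>

#define SYNC_STATE_NONE   0
#define SYNC_STATE_MASTER 1

static int sync_state = SYNC_STATE_NONE;

/* Hypothetical thread starter: fails unless the interface is "eth0". */
static int start_sync_thread(const char *ifname)
{
    return strcmp(ifname, "eth0") == 0 ? 0 : -1;
}

/* Record the state, then roll it back if the thread cannot start. */
int start_daemon(int state, const char *ifname)
{
    int ret;

    assert(ifname != NULL);
    sync_state |= state;
    ret = start_sync_thread(ifname);
    if (ret < 0)
        sync_state &= ~state;   /* the reset that was missing */
    return ret;
}

/* Non-zero when a daemon is recorded as running. */
int daemon_running(void)
{
    return sync_state != SYNC_STATE_NONE;
}
```

Without the rollback line, `daemon_running()` would keep reporting a master daemon after a failed start on the nonexistent interface `bla`.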
  19. 25 May, 2007 1 commit
  20. 10 May, 2007 2 commits
  21. 09 May, 2007 1 commit
  22. 26 Apr, 2007 4 commits
    • [NET]: Treat CHECKSUM_PARTIAL as CHECKSUM_UNNECESSARY · 60476372
      Herbert Xu committed
      When a transmitted packet is looped back directly, CHECKSUM_PARTIAL
      maps to the semantics of CHECKSUM_UNNECESSARY.  Therefore we should
      treat it as such in the stack.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
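One way to make a single test accept both states is to pick an encoding in which PARTIAL shares a bit with UNNECESSARY. The sketch below shows that idea; the enum names, the numeric values and `csum_unnecessary` are assumptions chosen for the example rather than a statement of what the patch does.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative encoding: PARTIAL shares the low bit with UNNECESSARY, so a
 * single mask test accepts both. The numeric values are an assumption.
 */
enum csum_state {
    CSUM_NONE        = 0,
    CSUM_UNNECESSARY = 1,
    CSUM_COMPLETE    = 2,
    CSUM_PARTIAL     = 3,
};

/* True when the receive path may skip software checksum verification. */
bool csum_unnecessary(enum csum_state ip_summed)
{
    assert(ip_summed <= CSUM_PARTIAL);
    return (ip_summed & CSUM_UNNECESSARY) != 0;
}
```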
    • [SK_BUFF]: Introduce skb_copy_to_linear_data{_offset} · 27d7ff46
      Arnaldo Carvalho de Melo committed
      To clearly state the intent of copying to linear sk_buffs, with _offset being
      an overly long variant that is still worthwhile for the sake of saving some bytes.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
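Helpers of this kind are thin wrappers around memcpy() into the linear data area. A user-space sketch, assuming a heavily reduced buffer type: `struct mini_skb` and the `mini_*` functions are hypothetical, and the bounds assertions are an addition for the example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, heavily reduced buffer: only the linear data area. */
struct mini_skb {
    unsigned char storage[64];
    unsigned char *data;   /* start of the linear data */
    unsigned int len;      /* bytes available from data */
};

void mini_skb_init(struct mini_skb *skb)
{
    memset(skb->storage, 0, sizeof(skb->storage));
    skb->data = skb->storage;
    skb->len = sizeof(skb->storage);
}

/* Copy len bytes to the start of the linear data. */
void mini_copy_to_linear_data(struct mini_skb *skb, const void *from,
                              unsigned int len)
{
    assert(len <= skb->len);
    memcpy(skb->data, from, len);
}

/* Same, but starting offset bytes into the linear data. */
void mini_copy_to_linear_data_offset(struct mini_skb *skb, unsigned int offset,
                                     const void *from, unsigned int len)
{
    assert(offset <= skb->len && len <= skb->len - offset);
    memcpy(skb->data + offset, from, len);
}
```

The named helpers make the intent visible at the call site, which also makes such copies easy to find when the representation of the data pointers changes.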
    • [SK_BUFF]: Convert skb->tail to sk_buff_data_t · 27a884dc
      Arnaldo Carvalho de Melo committed
      So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
      on 64bit architectures, allowing us to combine the 4 bytes hole left by the
      layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
      64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
      :-)
      
      Many calculations that previously required that skb->{transport,network,
      mac}_header be first converted to a pointer now can be done directly, being
      meaningful as offsets or pointers.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
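Storing the tail as an offset from the head means a pointer is only materialised when it is needed. A minimal sketch of that representation; `buf_off_t`, `struct off_buf` and the `off_buf_*` helpers are hypothetical analogues, and the `end` field is included only to bound the append.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical analogue of a 4-byte offset type on 64-bit machines. */
typedef uint32_t buf_off_t;

struct off_buf {
    unsigned char *head;   /* start of the allocation */
    unsigned char *data;   /* start of the payload */
    buf_off_t tail;        /* end of the payload, as an offset from head */
    buf_off_t end;         /* end of the allocation, as an offset from head */
};

/* Turn the stored offset back into a pointer when one is needed. */
unsigned char *off_buf_tail_pointer(const struct off_buf *b)
{
    return b->head + b->tail;
}

/* Make the payload empty: the tail coincides with data. */
void off_buf_reset_tail(struct off_buf *b)
{
    b->tail = (buf_off_t)(b->data - b->head);
}

/* Append len bytes and return a pointer to the start of the new area. */
unsigned char *off_buf_put(struct off_buf *b, unsigned int len)
{
    unsigned char *old_tail = off_buf_tail_pointer(b);

    assert(b->tail + len <= b->end);
    b->tail += len;   /* plain integer arithmetic, no pointer conversion */
    return old_tail;
}
```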
    • [SK_BUFF]: Use offsets for skb->{mac,network,transport}_header on 64bit architectures · 2e07fa9c
      Arnaldo Carvalho de Melo committed
      With this we save 8 bytes per network packet, leaving a 4-byte hole to be used
      in further shrinking work, likely with the offsetization of other pointers,
      such as ->{data,tail,end}. The cost is extra adds, which are minimized by the
      usual practice of setting skb->{mac,nh,h}.raw to a local variable that is then
      accessed multiple times in each function. It is also no more expensive than
      before for most of the handling of such headers, like setting one of these
      headers to another (transport to network, etc.), adding to or subtracting
      from it, comparing them, etc.
      
      Now we have this layout for sk_buff on a x86_64 machine:
      
      [acme@mica net-2.6.22]$ pahole vmlinux sk_buff
      struct sk_buff {
      	struct sk_buff *       next;             /*   0   8 */
      	struct sk_buff *       prev;             /*   8   8 */
      	struct rb_node         rb;               /*  16  24 */
      	struct sock *          sk;               /*  40   8 */
      	ktime_t                tstamp;           /*  48   8 */
      	struct net_device *    dev;              /*  56   8 */
      	/* --- cacheline 1 boundary (64 bytes) --- */
      	struct net_device *    input_dev;        /*  64   8 */
      	sk_buff_data_t         transport_header; /*  72   4 */
      	sk_buff_data_t         network_header;   /*  76   4 */
      	sk_buff_data_t         mac_header;       /*  80   4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	struct dst_entry *     dst;              /*  88   8 */
      	struct sec_path *      sp;               /*  96   8 */
      	char                   cb[48];           /* 104  48 */
      	/* cacheline 2 boundary (128 bytes) was 24 bytes ago*/
      	unsigned int           len;              /* 152   4 */
      	unsigned int           data_len;         /* 156   4 */
      	unsigned int           mac_len;          /* 160   4 */
      	union {
      		__wsum         csum;             /*       4 */
      		__u32          csum_offset;      /*       4 */
      	};                                       /* 164   4 */
      	__u32                  priority;         /* 168   4 */
      	__u8                   local_df:1;       /* 172   1 */
      	__u8                   cloned:1;         /* 172   1 */
      	__u8                   ip_summed:2;      /* 172   1 */
      	__u8                   nohdr:1;          /* 172   1 */
      	__u8                   nfctinfo:3;       /* 172   1 */
      	__u8                   pkt_type:3;       /* 173   1 */
      	__u8                   fclone:2;         /* 173   1 */
      	__u8                   ipvs_property:1;  /* 173   1 */
      
      	/* XXX 2 bits hole, try to pack */
      
      	__be16                 protocol;         /* 174   2 */
      	void    (*destructor)(struct sk_buff *); /* 176   8 */
      	struct nf_conntrack *  nfct;             /* 184   8 */
      	/* --- cacheline 3 boundary (192 bytes) --- */
      	struct sk_buff *       nfct_reasm;       /* 192   8 */
      	struct nf_bridge_info *nf_bridge;        /* 200   8 */
      	__u16                  tc_index;         /* 208   2 */
      	__u16                  tc_verd;          /* 210   2 */
      	dma_cookie_t           dma_cookie;       /* 212   4 */
      	__u32                  secmark;          /* 216   4 */
      	__u32                  mark;             /* 220   4 */
      	unsigned int           truesize;         /* 224   4 */
      	atomic_t               users;            /* 228   4 */
      	unsigned char *        head;             /* 232   8 */
      	unsigned char *        data;             /* 240   8 */
      	unsigned char *        tail;             /* 248   8 */
      	/* --- cacheline 4 boundary (256 bytes) --- */
      	unsigned char *        end;              /* 256   8 */
      }; /* size: 264, cachelines: 5 */
         /* sum members: 260, holes: 1, sum holes: 4 */
         /* bit holes: 1, sum bit holes: 2 bits */
         /* last cacheline: 8 bytes */
      
      On 32 bits nothing changes, and pointers continue to be used with the compiler
      turning all this abstraction layer into dust. But there are some sk_buff
      validation tricks that are now possible, humm... :-)
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
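The 8-byte saving and the 4-byte hole reported by pahole follow from alignment arithmetic: three 8-byte pointers (24 bytes) become three 4-byte offsets (12 bytes), and the next pointer must start on an 8-byte boundary. The sketch below reproduces only the affected region of the struct; `ptr64_t` is a hypothetical 8-byte, 8-aligned stand-in for a pointer so that the result does not depend on the build machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 8-byte, 8-aligned stand-in for a pointer on a 64-bit machine. */
typedef struct { _Alignas(8) uint64_t v; } ptr64_t;

static_assert(sizeof(ptr64_t) == 8, "stand-in must be pointer sized");

/* Header positions stored as pointers (before the change). */
struct hdrs_as_pointers {
    ptr64_t input_dev;
    ptr64_t transport_header;
    ptr64_t network_header;
    ptr64_t mac_header;
    ptr64_t dst;
};

/* Header positions stored as 4-byte offsets (after the change). */
struct hdrs_as_offsets {
    ptr64_t  input_dev;
    uint32_t transport_header;
    uint32_t network_header;
    uint32_t mac_header;
    /* 4-byte hole here: dst must start on an 8-byte boundary */
    ptr64_t  dst;
};

/* Bytes saved by switching the three headers to offsets. */
size_t bytes_saved(void)
{
    return sizeof(struct hdrs_as_pointers) - sizeof(struct hdrs_as_offsets);
}

/* Size of the padding between mac_header and dst. */
size_t hole_before_dst(void)
{
    return offsetof(struct hdrs_as_offsets, dst) -
           (offsetof(struct hdrs_as_offsets, mac_header) + sizeof(uint32_t));
}
```

The offsets line up with the pahole listing relative to `input_dev`: `mac_header` sits 16 bytes in and `dst` 24 bytes in, matching 80 and 88 in the full struct.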