1. 16 Apr, 2008 6 commits
  2. 15 Apr, 2008 2 commits
    • [NETNS]: The generic per-net pointers. · dec827d1
      Pavel Emelyanov committed
      Add an elastic array of void * pointers to struct net.
      The access rules are simple:
      
       1. register the ops with register_pernet_gen_device to get
          the id of your private pointer
       2. call net_assign_generic() to put the private data on the
          struct net (most preferably this should be done in the
          ->init callback of the ops registered)
       3. do not store any private reference on the net_generic array;
       4. do not change this pointer while the net is alive;
       5. use net_generic() to get the pointer.
      
      When adding a new pointer, I copy the old array, replace it
      with a new one and schedule the old for kfree after an RCU
      grace period.
      
      Since net_generic() reads the net->gen array inside an RCU
      read-side section, and the net->gen->ptr[x] pointer never changes
      once set, this gives us safe access to the generic pointers.
      
      Quoting Paul: "... RCU is protecting -only- the net_generic 
      structure that net_generic() is traversing, and the [pointer]
      returned by net_generic() is protected by a reference counter 
      in the upper-level struct net."
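The access rules and the copy-on-grow step described above can be sketched in user space roughly as follows. This is a minimal illustration, not the kernel code: `struct demo_net`, `struct demo_gen`, `demo_net_assign_generic()` and `demo_net_generic()` are hypothetical names, ids are assumed to start at 1, and the immediate `free()` stands in for what the commit describes as a kfree deferred until after an RCU grace period.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space stand-ins; none of these are kernel definitions. */
struct demo_gen {
	unsigned int len;	/* number of slots in ptr[] */
	void *ptr[];		/* one private pointer per registered id */
};

struct demo_net {
	struct demo_gen *gen;
};

/* Read side: look up the private pointer for an id (ids start at 1 here). */
void *demo_net_generic(const struct demo_net *net, unsigned int id)
{
	const struct demo_gen *g = net->gen;

	if (id == 0 || g == NULL || id > g->len)
		return NULL;
	return g->ptr[id - 1];
}

/* Write side: set the slot, growing the array by copy when it is too small. */
int demo_net_assign_generic(struct demo_net *net, unsigned int id, void *data)
{
	struct demo_gen *old = net->gen;
	struct demo_gen *grown;

	if (id == 0)
		return -1;
	if (old != NULL && id <= old->len) {
		old->ptr[id - 1] = data;
		return 0;
	}

	grown = calloc(1, sizeof(*grown) + id * sizeof(void *));
	if (grown == NULL)
		return -1;
	grown->len = id;
	if (old != NULL)
		memcpy(grown->ptr, old->ptr, old->len * sizeof(void *));
	grown->ptr[id - 1] = data;

	net->gen = grown;	/* the commit publishes the new array under RCU */
	free(old);		/* the commit defers this past an RCU grace period */
	return 0;
}
```

Assigning id 1 and then id 3 grows the array once; the slot for id 2 stays NULL until something is assigned to it.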
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NETNS]: The net-subsys IDs generator. · c93cf61f
      Pavel Emelyanov committed
      To create per-net generic pointers, we need some way to address
      them, i.e. IDs. This is a simple IDA-based ID generator for pernet
      subsystems.
      
      Addressing questions about potential checkpoint/restart problems: 
      these IDs are "lite-offsets" within the net structure and are by no 
      means supposed to be exported to the userspace.
      
      Since it will be used in the near future by devices only (tun,
      vlan, tunnels, bridge, etc.), I make it resemble the functionality
      of register_pernet_device().
      
      The new id is stored in the *id pointer _before_ calling the init
      callback, so that the id is available inside that callback.
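The ordering described above (write the id, then call init) can be illustrated with a small user-space sketch. All names here (`demo_pernet_ops`, `demo_register_gen()`, `demo_tun_init()`) are hypothetical, and a plain counter stands in for the IDA allocator, so ids are never released or reused in this sketch.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's types. */
struct demo_pernet_ops {
	int (*init)(void *net);
};

static int demo_next_id = 1;	/* plain counter standing in for the IDA */

/* Store the new id in *id first, then run the init callback. */
int demo_register_gen(int *id, const struct demo_pernet_ops *ops, void *net)
{
	*id = demo_next_id++;
	if (ops->init != NULL)
		return ops->init(net);	/* a real allocator would release the id on error */
	return 0;
}

/* Example subsystem whose init callback already relies on its id. */
int demo_tun_id;
int demo_tun_id_seen_in_init;

int demo_tun_init(void *net)
{
	(void)net;
	demo_tun_id_seen_in_init = demo_tun_id;	/* already valid at this point */
	return 0;
}
```

Had the id been written only after init returned, `demo_tun_init()` would have observed 0 instead of the allocated id.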
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 14 Apr, 2008 3 commits
  4. 10 Apr, 2008 3 commits
    • [SKFILTER]: Add SKF_ADF_NLATTR instruction · 4738c1db
      Patrick McHardy committed
      SKF_AD_NLATTR searches for a netlink attribute, which avoids manually
      parsing and walking attributes. It takes the offset at which to start
      searching in the 'A' register and the attribute type in the 'X' register
      and returns the offset in the 'A' register. When the attribute is not
      found it returns zero.
      
      A top-level attribute can be located using a filter like this
      (example for nfnetlink, using struct nfgenmsg):
      
      	...
      	{
      		/* A = offset of first attribute */
      		.code	= BPF_LD | BPF_IMM,
      		.k	= sizeof(struct nlmsghdr) + sizeof(struct nfgenmsg)
      	},
      	{
      		/* X = CTA_PROTOINFO */
      		.code	= BPF_LDX | BPF_IMM,
      		.k	= CTA_PROTOINFO,
      	},
      	{
      		/* A = netlink attribute offset */
      		.code	= BPF_LD | BPF_B | BPF_ABS,
      		.k	= SKF_AD_OFF + SKF_AD_NLATTR
      	},
      	{
      		/* Exit if not found */
      		.code   = BPF_JMP | BPF_JEQ | BPF_K,
      		.k	= 0,
      		.jt	= <error>
      	},
      	...
      
      A nested attribute below the CTA_PROTOINFO attribute would then
      be parsed like this:
      
      	...
      	{
      		/* A += sizeof(struct nlattr) */
      		.code	= BPF_ALU | BPF_ADD | BPF_K,
      		.k	= sizeof(struct nlattr),
      	},
      	{
      		/* X = CTA_PROTOINFO_TCP */
      		.code	= BPF_LDX | BPF_IMM,
      		.k	= CTA_PROTOINFO_TCP,
      	},
      	{
      		/* A = netlink attribute offset */
      		.code	= BPF_LD | BPF_B | BPF_ABS,
      		.k	= SKF_AD_OFF + SKF_AD_NLATTR
      	},
      	...
      
      The data of an attribute can be loaded into 'A' like this:
      
      	...
      	{
      		/* X = A (attribute offset) */
      		.code	= BPF_MISC | BPF_TAX,
      	},
      	{
      		/* A = skb->data[X + k] */
      		.code 	= BPF_LD | BPF_B | BPF_IND,
      		.k	= sizeof(struct nlattr),
      	},
      	...
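To make the search semantics concrete, here is a user-space sketch of the kind of lookup the instruction performs: start at an offset, walk the attributes, and return the offset of the first one with a matching type, or zero if none is found. It is an assumption-laden illustration, not the kernel implementation: `struct demo_nlattr`, `DEMO_NLA_ALIGN` and `demo_find_nlattr()` are hypothetical names, and the 4-byte alignment and header layout are assumed to mirror the usual netlink attribute format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of a netlink attribute header (length, then type). */
struct demo_nlattr {
	uint16_t nla_len;	/* header plus payload, before padding */
	uint16_t nla_type;
};

/* Attributes are assumed to be padded to 4-byte boundaries. */
#define DEMO_NLA_ALIGN(len) (((size_t)(len) + 3) & ~(size_t)3)

/*
 * Walk the attributes in buf starting at offset start and return the
 * offset of the first attribute of the given type, or 0 if none exists.
 */
size_t demo_find_nlattr(const unsigned char *buf, size_t buflen,
			size_t start, uint16_t type)
{
	size_t off = start;

	while (off <= buflen && buflen - off >= sizeof(struct demo_nlattr)) {
		struct demo_nlattr nla;
		size_t step;

		memcpy(&nla, buf + off, sizeof(nla));
		if (nla.nla_len < sizeof(nla) || nla.nla_len > buflen - off)
			break;		/* malformed or truncated attribute */
		if (nla.nla_type == type)
			return off;

		step = DEMO_NLA_ALIGN(nla.nla_len);
		if (step > buflen - off)
			break;
		off += step;
	}
	return 0;
}
```

Returning zero for "not found" is unambiguous in the filter examples because a valid attribute never sits at offset 0; the netlink message header comes first.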
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • socket: sk_filter deinline · 43db6d65
      Stephen Hemminger committed
      The sk_filter function is too big to be inlined. This saves 2296 bytes
      of text on allyesconfig.
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • socket: sk_filter minor cleanups · b715631f
      Stephen Hemminger committed
      Some minor style cleanups:
        * Move __KERNEL__ definitions to one place in filter.h
        * Use const for sk_filter_len
        * Line wrapping
        * Put EXPORT_SYMBOL next to function definition
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 01 Apr, 2008 3 commits
  6. 29 Mar, 2008 5 commits
  7. 28 Mar, 2008 8 commits
  8. 26 Mar, 2008 6 commits
  9. 25 Mar, 2008 2 commits
    • [NETNS]: Minor information leak via /proc/net/ptype file. · 2feb27db
      Pavel Emelyanov committed
      This file displays the registered packet types, but some of them
      (such as those created by packet sockets) can be bound to a net
      device, and showing them in the wrong namespace is incorrect.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NEIGH]: Fix race between pneigh deletion and ipv6's ndisc_recv_ns (v3). · fa86d322
      Pavel Emelyanov committed
      Proxy neighbors do not have any reference counting, so any caller
      of pneigh_lookup (unless it's a netlink triggered add/del routine)
      should _not_ perform any actions on the found proxy entry. 
      
      There's one exception to this rule - ipv6's ndisc_recv_ns()
      uses the found entry to check the flags for NTF_ROUTER.
      
      This creates a race between the ndisc and pneigh_delete - after 
      the pneigh is returned to the caller, the nd_tbl.lock is dropped 
      and the deleting procedure may proceed.
      
      One possible fix would be to add reference counting, but this
      problem exists for ndisc only. Besides, such a patch would be too
      big for -rc4.
      
      So I propose to introduce __pneigh_lookup(), which is supposed
      to be called with the lock held, and to use it in the ndisc code to
      check the flags on a live pneigh entry.
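The idea of checking the flags while the table lock is still held can be sketched in user space as follows. This is only an illustration of the locking pattern: a pthread mutex stands in for the neighbour table lock, and `demo_pneigh`, `demo_pneigh_lookup_locked()`, `demo_check_router()` and `DEMO_NTF_ROUTER` are hypothetical names rather than the kernel's.

```c
#include <pthread.h>
#include <stddef.h>

#define DEMO_NTF_ROUTER 0x80	/* illustrative flag bit */

/* Hypothetical proxy-neighbour entry: no reference count, like pneigh. */
struct demo_pneigh {
	int key;
	unsigned char flags;
	struct demo_pneigh *next;
};

static pthread_mutex_t demo_tbl_lock = PTHREAD_MUTEX_INITIALIZER;
static struct demo_pneigh *demo_tbl;

/* Caller must hold demo_tbl_lock; the result is valid only while it is held. */
struct demo_pneigh *demo_pneigh_lookup_locked(int key)
{
	struct demo_pneigh *n;

	for (n = demo_tbl; n != NULL; n = n->next)
		if (n->key == key)
			return n;
	return NULL;
}

void demo_pneigh_add(struct demo_pneigh *n)
{
	pthread_mutex_lock(&demo_tbl_lock);
	n->next = demo_tbl;
	demo_tbl = n;
	pthread_mutex_unlock(&demo_tbl_lock);
}

/* Returns 1 if the entry is a router, 0 if it is not, -1 if there is no entry. */
int demo_check_router(int key)
{
	struct demo_pneigh *n;
	int ret = -1;

	pthread_mutex_lock(&demo_tbl_lock);
	n = demo_pneigh_lookup_locked(key);
	if (n != NULL)
		ret = (n->flags & DEMO_NTF_ROUTER) != 0;	/* read under the lock */
	pthread_mutex_unlock(&demo_tbl_lock);
	return ret;
}
```

Because only a copied flag value leaves the critical section, a concurrent delete cannot leave the caller holding a dangling entry pointer.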
      
      
      Changes from v2:
      As David noticed, exported __pneigh_lookup() to the ipv6 module.
      checkpatch generates a warning on it, since the EXPORT_SYMBOL
      does not follow the symbol itself, but in this file all the
      exports come at the end, so I decided not to break this harmony.
      
      Changes from v1:
      Fixed comments from YOSHIFUJI - indentation of the prototype in the
      header and the pndisc_check_router() name - and a compilation fix,
      pointed out by Daniel - the is_routed was (falsely) considered as
      uninitialized by gcc.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 21 Mar, 2008 2 commits
    • [NET]: Add per-connection option to set max TSO frame size · 82cc1a7a
      Peter P Waskiewicz Jr committed
      Update: My mailer ate one of Jarek's feedback mails...  Fixed the
      parameter in netif_set_gso_max_size() to be u32, not u16.  Fixed the
      whitespace issue due to a patch import botch.  Changed the types from
      u32 to unsigned int to be more consistent with other variables in the
      area.  Also brought the patch up to the latest net-2.6.26 tree.
      
      Update: Made gso_max_size container 32 bits, not 16.  Moved the
      location of gso_max_size within netdev to be less hotpath.  Made more
      consistent names between the sock and netdev layers, and added a
      define for the max GSO size.
      
      Update: Respun for net-2.6.26 tree.
      
      Update: changed max_gso_frame_size and sk_gso_max_size from signed to
      unsigned - thanks Stephen!
      
      This patch adds the ability for device drivers to control the size of
      the TSO frames being sent to them, per TCP connection.  By setting the
      netdevice's gso_max_size value, the socket layer will set the GSO
      frame size based on that value.  This will propagate into the TCP
      layer, and send TSO's of that size to the hardware.
      
      This can be desirable to help tune the bursty nature of TSO on a
      per-adapter basis, where one may have 1 GbE and 10 GbE devices
      coexisting in a system, one running multiqueue and the other not, etc.
      
      This can also be desirable for devices that cannot support full 64 KB
      TSO's, but still want to benefit from some level of segmentation
      offloading.
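The flow described above (driver sets a per-device limit, the socket copies it, the send path caps each TSO frame) can be sketched as below. The structures and functions are hypothetical user-space stand-ins; only the field names gso_max_size and sk_gso_max_size and the helper name netif_set_gso_max_size() come from the text, and the 64 KB default is an assumption based on the "full 64 KB TSO's" wording.

```c
/* Assumed default upper bound, matching the "full 64 KB" mentioned above. */
#define DEMO_GSO_MAX_SIZE 65536u

/* Hypothetical stand-ins for the netdev and sock structures. */
struct demo_netdev {
	unsigned int gso_max_size;
};

struct demo_sock {
	unsigned int sk_gso_max_size;
};

/* Driver side: advertise the largest TSO frame the device should receive. */
void demo_netif_set_gso_max_size(struct demo_netdev *dev, unsigned int size)
{
	dev->gso_max_size = size;
}

/* Socket side: pick up the device limit when the route/device is attached. */
void demo_sk_setup_caps(struct demo_sock *sk, const struct demo_netdev *dev)
{
	sk->sk_gso_max_size = dev->gso_max_size;
}

/* Send path: never build a TSO frame larger than the socket's limit. */
unsigned int demo_tso_frame_len(const struct demo_sock *sk,
				unsigned int pending_bytes)
{
	if (pending_bytes > sk->sk_gso_max_size)
		return sk->sk_gso_max_size;
	return pending_bytes;
}
```

With a device limit of 16384 bytes, 100000 pending bytes would go out in frames of at most 16384 bytes, while a device left at the default could still receive frames of up to 65536 bytes.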
      Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netpoll: zap_completion_queue: adjust skb->users counter · 8a455b08
      Jarek Poplawski committed
      zap_completion_queue() retrieves skbs from the completion_queue, where
      their skb->users counter is zero.  The counter should be non-zero before
      dev_kfree_skb_any() is called, so it is now incremented first.
      Reported-and-tested-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>