1. 09 7月, 2012 1 次提交
    • G
      cgroup: fix panic in netprio_cgroup · b761c9b1
      Gao feng 提交于
      we set max_prioidx to the first zero bit index of prioidx_map in
      function get_prioidx.
      
      So when we delete the low index netprio cgroup and adding a new
      netprio cgroup again,the max_prioidx will be set to the low index.
      
      when we set the high index cgroup's net_prio.ifpriomap,the function
      write_priomap will call update_netdev_tables to alloc memory which
      size is sizeof(struct netprio_map) + sizeof(u32) * (max_prioidx + 1),
      so the size of array that map->priomap point to is max_prioidx +1,
      which is low than what we actually need.
      
      fix this by adding check in get_prioidx,only set max_prioidx when
      max_prioidx low than the new prioidx.
      Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b761c9b1
  2. 29 6月, 2012 1 次提交
  3. 16 6月, 2012 1 次提交
  4. 14 6月, 2012 2 次提交
    • E
      netpoll: fix netpoll_send_udp() bugs · 954fba02
      Eric Dumazet 提交于
      Bogdan Hamciuc diagnosed and fixed following bug in netpoll_send_udp() :
      
      "skb->len += len;" instead of "skb_put(skb, len);"
      
      Meaning that _if_ a network driver needs to call skb_realloc_headroom(),
      only packet headers would be copied, leaving garbage in the payload.
      
      However the skb_realloc_headroom() must be avoided as much as possible
      since it requires memory and netpoll tries hard to work even if memory
      is exhausted (using a pool of preallocated skbs)
      
      It appears netpoll_send_udp() reserved 16 bytes for the ethernet header,
      which happens to work for typicall drivers but not all.
      
      Right thing is to use LL_RESERVED_SPACE(dev)
      (And also add dev->needed_tailroom of tailroom)
      
      This patch combines both fixes.
      
      Many thanks to Bogdan for raising this issue.
      Reported-by: NBogdan Hamciuc <bogdan.hamciuc@freescale.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Tested-by: NBogdan Hamciuc <bogdan.hamciuc@freescale.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Reviewed-by: NNeil Horman <nhorman@tuxdriver.com>
      Reviewed-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      954fba02
    • E
      splice: fix racy pipe->buffers uses · 047fe360
      Eric Dumazet 提交于
      Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered
      by splice_shrink_spd() called from vmsplice_to_pipe()
      
      commit 35f3d14d (pipe: add support for shrinking and growing pipes)
      added capability to adjust pipe->buffers.
      
      Problem is some paths don't hold pipe mutex and assume pipe->buffers
      doesn't change for their duration.
      
      Fix this by adding nr_pages_max field in struct splice_pipe_desc, and
      use it in place of pipe->buffers where appropriate.
      
      splice_shrink_spd() loses its struct pipe_inode_info argument.
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Tom Herbert <therbert@google.com>
      Cc: stable <stable@vger.kernel.org> # 2.6.35
      Tested-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      047fe360
  5. 09 6月, 2012 1 次提交
  6. 08 6月, 2012 1 次提交
  7. 04 6月, 2012 1 次提交
    • E
      drop_monitor: dont sleep in atomic context · bec4596b
      Eric Dumazet 提交于
      drop_monitor calls several sleeping functions while in atomic context.
      
       BUG: sleeping function called from invalid context at mm/slub.c:943
       in_atomic(): 1, irqs_disabled(): 0, pid: 2103, name: kworker/0:2
       Pid: 2103, comm: kworker/0:2 Not tainted 3.5.0-rc1+ #55
       Call Trace:
        [<ffffffff810697ca>] __might_sleep+0xca/0xf0
        [<ffffffff811345a3>] kmem_cache_alloc_node+0x1b3/0x1c0
        [<ffffffff8105578c>] ? queue_delayed_work_on+0x11c/0x130
        [<ffffffff815343fb>] __alloc_skb+0x4b/0x230
        [<ffffffffa00b0360>] ? reset_per_cpu_data+0x160/0x160 [drop_monitor]
        [<ffffffffa00b022f>] reset_per_cpu_data+0x2f/0x160 [drop_monitor]
        [<ffffffffa00b03ab>] send_dm_alert+0x4b/0xb0 [drop_monitor]
        [<ffffffff810568e0>] process_one_work+0x130/0x4c0
        [<ffffffff81058249>] worker_thread+0x159/0x360
        [<ffffffff810580f0>] ? manage_workers.isra.27+0x240/0x240
        [<ffffffff8105d403>] kthread+0x93/0xa0
        [<ffffffff816be6d4>] kernel_thread_helper+0x4/0x10
        [<ffffffff8105d370>] ? kthread_freezable_should_stop+0x80/0x80
        [<ffffffff816be6d0>] ? gs_change+0xb/0xb
      
      Rework the logic to call the sleeping functions in right context.
      
      Use standard timer/workqueue api to let system chose any cpu to perform
      the allocation and netlink send.
      
      Also avoid a loop if reset_per_cpu_data() cannot allocate memory :
      use mod_timer() to wait 1/10 second before next try.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Reviewed-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bec4596b
  8. 01 6月, 2012 1 次提交
  9. 30 5月, 2012 1 次提交
  10. 20 5月, 2012 1 次提交
    • E
      net: introduce skb_try_coalesce() · bad43ca8
      Eric Dumazet 提交于
      Move tcp_try_coalesce() protocol independent part to
      skb_try_coalesce().
      
      skb_try_coalesce() can be used in IPv4 defrag and IPv6 reassembly,
      to build optimized skbs (less sk_buff, and possibly less 'headers')
      
      skb_try_coalesce() is zero copy, unless the copy can fit in destination
      header (its a rare case)
      
      kfree_skb_partial() is also moved to net/core/skbuff.c and exported,
      because IPv6 will need it in patch (ipv6: use skb coalescing in
      reassembly).
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bad43ca8
  11. 19 5月, 2012 3 次提交
  12. 18 5月, 2012 2 次提交
    • N
      drop_monitor: convert to modular building · cad456d5
      Neil Horman 提交于
      When I first wrote drop monitor I wrote it to just build monolithically.  There
      is no reason it can't be built modularly as well, so lets give it that
      flexibiity.
      
      I've tested this by building it as both a module and monolithically, and it
      seems to work quite well
      
      Change notes:
      
      v2)
      * fixed for_each_present_cpu loops to be more correct as per Eric D.
      * Converted exit path failures to BUG_ON as per Ben H.
      
      v3)
      * Converted del_timer to del_timer_sync to close race noted by Ben H.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Ben Hutchings <bhutchings@solarflare.com>
      Reviewed-by: NBen Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cad456d5
    • E
      net: netdev_alloc_skb() use build_skb() · a1c7fff7
      Eric Dumazet 提交于
      netdev_alloc_skb() is used by networks driver in their RX path to
      allocate an skb to receive an incoming frame.
      
      With recent skb->head_frag infrastructure, it makes sense to change
      netdev_alloc_skb() to use build_skb() and a frag allocator.
      
      This permits a zero copy splice(socket->pipe), and better GRO or TCP
      coalescing.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a1c7fff7
  13. 17 5月, 2012 3 次提交
  14. 16 5月, 2012 2 次提交
  15. 11 5月, 2012 2 次提交
    • E
      pktgen: fix crash at module unload · c57b5468
      Eric Dumazet 提交于
      commit 7d3d43da (net: In unregister_netdevice_notifier unregister
      the netdevices.) makes pktgen crashing at module unload.
      
      [  296.820578] BUG: spinlock bad magic on CPU#6, rmmod/3267
      [  296.820719]  lock: ffff880310c38000, .magic: ffff8803, .owner: <none>/-1, .owner_cpu: -1
      [  296.820943] Pid: 3267, comm: rmmod Not tainted 3.4.0-rc5+ #254
      [  296.821079] Call Trace:
      [  296.821211]  [<ffffffff8168a715>] spin_dump+0x8a/0x8f
      [  296.821345]  [<ffffffff8168a73b>] spin_bug+0x21/0x26
      [  296.821507]  [<ffffffff812b4741>] do_raw_spin_lock+0x131/0x140
      [  296.821648]  [<ffffffff8169188e>] _raw_spin_lock+0x1e/0x20
      [  296.821786]  [<ffffffffa00cc0fd>] __pktgen_NN_threads+0x4d/0x140 [pktgen]
      [  296.821928]  [<ffffffffa00ccf8d>] pktgen_device_event+0x10d/0x1e0 [pktgen]
      [  296.822073]  [<ffffffff8154ed4f>] unregister_netdevice_notifier+0x7f/0x100
      [  296.822216]  [<ffffffffa00d2a0b>] pg_cleanup+0x48/0x73 [pktgen]
      [  296.822357]  [<ffffffff8109528e>] sys_delete_module+0x17e/0x2a0
      [  296.822502]  [<ffffffff81699652>] system_call_fastpath+0x16/0x1b
      
      Hold the pktgen_thread_lock while splicing pktgen_threads, and test
      pktgen_exiting in pktgen_device_event() to make unload faster.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c57b5468
    • D
      Revert "net: maintain namespace isolation between vlan and real device" · 59b9997b
      David S. Miller 提交于
      This reverts commit 8a83a00b.
      
      It causes regressions for S390 devices, because it does an
      unconditional DST drop on SKBs for vlans and the QETH device
      needs the neighbour entry hung off the DST for certain things
      on transmit.
      
      Arnd can't remember exactly why he even needed this change.
      
      Conflicts:
      
      	drivers/net/macvlan.c
      	net/8021q/vlan_dev.c
      	net/core/dev.c
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      59b9997b
  16. 10 5月, 2012 2 次提交
  17. 09 5月, 2012 1 次提交
  18. 07 5月, 2012 3 次提交
  19. 04 5月, 2012 2 次提交
  20. 03 5月, 2012 4 次提交
  21. 01 5月, 2012 4 次提交
    • E
      net: add a prefetch in socket backlog processing · e4cbb02a
      Eric Dumazet 提交于
      TCP or UDP stacks have big enough latencies that prefetching next
      pointer is worth it.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e4cbb02a
    • E
      net: makes skb_splice_bits() aware of skb->head_frag · 1d0c0b32
      Eric Dumazet 提交于
      __skb_splice_bits() can check if skb to be spliced has its skb->head
      mapped to a page fragment, instead of a kmalloc() area.
      
      If so we can avoid a copy of the skb head and get a reference on
      underlying page.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Matt Carlson <mcarlson@broadcom.com>
      Cc: Michael Chan <mchan@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1d0c0b32
    • E
      net: make GRO aware of skb->head_frag · d7e8883c
      Eric Dumazet 提交于
      GRO can check if skb to be merged has its skb->head mapped to a page
      fragment, instead of a kmalloc() area.
      
      We 'upgrade' skb->head as a fragment in itself
      
      This avoids the frag_list fallback, and permits to build true GRO skb
      (one sk_buff and up to 16 fragments), using less memory.
      
      This reduces number of cache misses when user makes its copy, since a
      single sk_buff is fetched.
      
      This is a followup of patch "net: allow skb->head to be a page fragment"
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Matt Carlson <mcarlson@broadcom.com>
      Cc: Michael Chan <mchan@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7e8883c
    • E
      net: allow skb->head to be a page fragment · d3836f21
      Eric Dumazet 提交于
      skb->head is currently allocated from kmalloc(). This is convenient but
      has the drawback the data cannot be converted to a page fragment if
      needed.
      
      We have three spots were it hurts :
      
      1) GRO aggregation
      
       When a linear skb must be appended to another skb, GRO uses the
      frag_list fallback, very inefficient since we keep all struct sk_buff
      around. So drivers enabling GRO but delivering linear skbs to network
      stack aren't enabling full GRO power.
      
      2) splice(socket -> pipe).
      
       We must copy the linear part to a page fragment.
       This kind of defeats splice() purpose (zero copy claim)
      
      3) TCP coalescing.
      
       Recently introduced, this permits to group several contiguous segments
      into a single skb. This shortens queue lengths and save kernel memory,
      and greatly reduce probabilities of TCP collapses. This coalescing
      doesnt work on linear skbs (or we would need to copy data, this would be
      too slow)
      
      Given all these issues, the following patch introduces the possibility
      of having skb->head be a fragment in itself. We use a new skb flag,
      skb->head_frag to carry this information.
      
      build_skb() is changed to accept a frag_size argument. Drivers willing
      to provide a page fragment instead of kmalloc() data will set a non zero
      value, set to the fragment size.
      
      Then, on situations we need to convert the skb head to a frag in itself,
      we can check if skb->head_frag is set and avoid the copies or various
      fallbacks we have.
      
      This means drivers currently using frags could be updated to avoid the
      current skb->head allocation and reduce their memory footprint (aka skb
      truesize). (thats 512 or 1024 bytes saved per skb). This also makes
      bpf/netfilter faster since the 'first frag' will be part of skb linear
      part, no need to copy data.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Matt Carlson <mcarlson@broadcom.com>
      Cc: Michael Chan <mchan@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d3836f21
  22. 29 4月, 2012 1 次提交