  1. 02 Sep, 2015: 5 commits
  2. 01 Sep, 2015: 2 commits
      tun_dst: Remove opts_size · 63b6c13d
      Pravin B Shelar committed
      opts_size is only written and never read. This patch
      removes the unused variable.
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      tcp: use dctcp if enabled on the route to the initiator · c3a8d947
      Daniel Borkmann committed
      Currently, the following case doesn't use DCTCP, even if it should:
      a responder has, e.g., Cubic as its system-wide default, but for a specific
      route to the initiating host, DCTCP is set in RTAX_CC_ALGO. The
      initiating host then uses DCTCP as congestion control, but since the
      initiator sets ECT(0), tcp_ecn_create_request() doesn't set ecn_ok,
      and we have to fall back to Reno after the 3WHS completes.
      
      We considered how to solve this in a minimal, non-intrusive
      way without needlessly bloating tcp_ecn_create_request(): let's cache
      the CA ecn option flag in RTAX_FEATURES. In other words, when ECT(0)
      is set on the SYN packet, set ecn_ok=1 iff the route's RTAX_FEATURES
      contains the unexposed (internal-only) DST_FEATURE_ECN_CA. This allows
      us to do only a single metric feature lookup inside tcp_ecn_create_request().
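      To make the decision concrete, here is a minimal C sketch of the check described above. It is a simplified model, not the kernel's tcp_ecn_create_request(): the function syn_ecn_ok(), its parameters, and the bit value used for DST_FEATURE_ECN_CA are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value chosen for this sketch; the kernel's real
 * DST_FEATURE_ECN_CA constant may differ. */
#define DST_FEATURE_ECN_CA (1u << 31)

/* Simplified model (not the kernel code) of the responder's ecn_ok decision
 * for a SYN that already requests ECN via ECE+CWR.
 *   syn_has_ect0:   the SYN's IP header carries ECT(0), as a DCTCP initiator sends
 *   ecn_enabled:    system-wide classic ECN setting (assumed boolean here)
 *   route_features: RTAX_FEATURES metric cached on the route to the initiator */
static bool syn_ecn_ok(bool syn_has_ect0, bool ecn_enabled,
                       uint32_t route_features)
{
    /* Classic RFC 3168 SYNs are Not-ECT: accept if ECN is enabled. */
    if (!syn_has_ect0)
        return ecn_enabled;

    /* An ECT(0) SYN is accepted only when the route's congestion control
     * needs ECN: a single bit test on the cached feature metric. */
    return (route_features & DST_FEATURE_ECN_CA) != 0;
}
```

      With this model, a responder whose system default is not ECN-capable still sets ecn_ok for an ECT(0) SYN when the route carries the cached flag, which is the DCTCP case the commit targets.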
      
      Joint work with Florian Westphal.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 30 Aug, 2015: 2 commits
  4. 29 Aug, 2015: 1 commit
  5. 28 Aug, 2015: 4 commits
  6. 27 Aug, 2015: 1 commit
  7. 26 Aug, 2015: 2 commits
      route: fix a use-after-free · e252b3d1
      WANG Cong committed
      This patch fixes the following crash:
      
       general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
       CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc7+ #166
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
       task: ffff88010656d280 ti: ffff880106570000 task.ti: ffff880106570000
       RIP: 0010:[<ffffffff8182f91b>]  [<ffffffff8182f91b>] dst_destroy+0xa6/0xef
       RSP: 0018:ffff880107603e38  EFLAGS: 00010202
       RAX: 0000000000000001 RBX: ffff8800d225a000 RCX: ffffffff82250fd0
       RDX: 0000000000000001 RSI: ffffffff82250fd0 RDI: 6b6b6b6b6b6b6b6b
       RBP: ffff880107603e58 R08: 0000000000000001 R09: 0000000000000001
       R10: 000000000000b530 R11: ffff880107609000 R12: 0000000000000000
       R13: ffffffff82343c40 R14: 0000000000000000 R15: ffffffff8182fb4f
       FS:  0000000000000000(0000) GS:ffff880107600000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 00007fcabd9d3000 CR3: 00000000d7279000 CR4: 00000000000006e0
       Stack:
        ffffffff82250fd0 ffff8801077d6f00 ffffffff82253c40 ffff8800d225a000
        ffff880107603e68 ffffffff8182fb5d ffff880107603f08 ffffffff810d795e
        ffffffff810d7648 ffff880106574000 ffff88010656d280 ffff88010656d280
       Call Trace:
        <IRQ>
        [<ffffffff8182fb5d>] dst_destroy_rcu+0xe/0x1d
        [<ffffffff810d795e>] rcu_process_callbacks+0x618/0x7eb
        [<ffffffff810d7648>] ? rcu_process_callbacks+0x302/0x7eb
        [<ffffffff8182fb4f>] ? dst_gc_task+0x1eb/0x1eb
        [<ffffffff8107e11b>] __do_softirq+0x178/0x39f
        [<ffffffff8107e52e>] irq_exit+0x41/0x95
        [<ffffffff81a4f215>] smp_apic_timer_interrupt+0x34/0x40
        [<ffffffff81a4d5cd>] apic_timer_interrupt+0x6d/0x80
        <EOI>
        [<ffffffff8100b968>] ? default_idle+0x21/0x32
        [<ffffffff8100b966>] ? default_idle+0x1f/0x32
        [<ffffffff8100bf19>] arch_cpu_idle+0xf/0x11
        [<ffffffff810b0bc7>] default_idle_call+0x1f/0x21
        [<ffffffff810b0dce>] cpu_startup_entry+0x1ad/0x273
        [<ffffffff8102fe67>] start_secondary+0x135/0x156
      
      dst is freed right before lwtstate_put() is called, which is not correct.
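      The ordering problem can be illustrated with a small userspace C sketch. The structures and helpers below (struct dst_sketch, struct lwt_state, lwtstate_put_sketch(), dst_destroy_sketch()) are hypothetical stand-ins, not the kernel's dst_destroy(); they only show that the reference reached through the dst must be dropped before the dst itself is freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for the kernel's dst_entry and
 * lwtunnel_state; only the fields needed to show the ordering are kept. */
struct lwt_state { int refcnt; };
struct dst_sketch { struct lwt_state *lwtstate; };

static int lwt_states_freed; /* illustration only: counts released states */

static void lwtstate_put_sketch(struct lwt_state *lws)
{
    if (lws && --lws->refcnt == 0) {
        free(lws);
        lwt_states_freed++;
    }
}

/* Correct ordering: drop the reference held through d->lwtstate while d is
 * still valid, then free d. Freeing d first and then reading d->lwtstate
 * would read freed memory (a use-after-free). */
static void dst_destroy_sketch(struct dst_sketch *d)
{
    lwtstate_put_sketch(d->lwtstate);
    free(d);
}
```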
      
      Fixes: 61adedf3 ("route: move lwtunnel state to dst_entry")
      Acked-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Cong Wang <cwang@twopensource.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      net-next: Fix warning while make xmldocs caused by skbuff.c · d7499160
      Masanari Iida committed
      This patch fixes the following warnings.
      
      .//net/core/skbuff.c:407: warning: No description found for parameter 'len'
      .//net/core/skbuff.c:407: warning: Excess function parameter 'length' description in '__netdev_alloc_skb'
      .//net/core/skbuff.c:476: warning: No description found for parameter 'len'
      .//net/core/skbuff.c:476: warning: Excess function parameter 'length' description in '__napi_alloc_skb'
      Signed-off-by: Masanari Iida <standby24x7@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 25 Aug, 2015: 1 commit
  9. 22 Aug, 2015: 1 commit
      mm: make page pfmemalloc check more robust · 2f064f34
      Michal Hocko committed
      Commit c48a11c7 ("netvm: propagate page->pfmemalloc to skb") added
      checks for page->pfmemalloc to __skb_fill_page_desc():
      
              if (page->pfmemalloc && !page->mapping)
                      skb->pfmemalloc = true;
      
      It assumes page->mapping == NULL implies that page->pfmemalloc can be
      trusted.  However, __delete_from_page_cache() can set page->mapping
      to NULL and leave the page->index value alone.  Because the two fields
      share a union, a non-zero page->index will be interpreted as a true
      page->pfmemalloc.
      
      So the assumption is invalid if the networking code can see such a page,
      and it seems it can.  We have encountered this with an NFS-over-loopback
      setup when such a page is attached to a new skb.  No copying takes place
      in this case, so the page confuses __skb_fill_page_desc(), which
      interprets the index as the pfmemalloc flag.  The network stack drops
      packets that have been allocated using the reserves unless they are to
      be queued on sockets handling swapping.  Here the packets are dropped,
      which leads to hangs when the NFS client waits for a response from the
      server that has been dropped and thus never arrives.
      
      The struct page is already heavily packed, so rather than finding another
      hole to put it in, let's do a trick instead.  We can reuse the index
      again but set it to an impossible value (-1UL).  Since this is the page
      index, it should never reach such a large value.  Replace all direct
      users of page->pfmemalloc with page_is_pfmemalloc(), which will hide this
      nastiness from unspoiled eyes.
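      The encoding trick can be sketched in userspace C as follows. struct page_sketch and the set/clear helpers are hypothetical simplifications (in the real struct page the pfmemalloc information shares storage with index); the sketch contrasts the old "any non-zero value means pfmemalloc" reading with a check against the impossible index -1UL, loosely mirroring the page_is_pfmemalloc() helper named above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct page: one field plays both the
 * page-cache index role and the pfmemalloc-marker role. */
struct page_sketch {
    void *mapping;
    unsigned long index;
};

/* Old reading (the bug): with mapping == NULL, any non-zero value in the
 * shared word, including a stale page-cache index, looks like pfmemalloc. */
static bool old_pfmemalloc_check(const struct page_sketch *p)
{
    return p->mapping == NULL && p->index != 0;
}

/* Hypothetical helpers: mark or unmark a page with the impossible index. */
static void set_pfmemalloc_sketch(struct page_sketch *p)   { p->index = -1UL; }
static void clear_pfmemalloc_sketch(struct page_sketch *p) { p->index = 0; }

/* New reading: only the value a real index can never take counts. */
static bool page_is_pfmemalloc_sketch(const struct page_sketch *p)
{
    return p->index == -1UL;
}
```

      A page left with mapping == NULL and a stale index of 42 is misread by the old check but not by the new one.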
      
      The information will obviously get lost if somebody wants to use
      page->index, but that was the case before, and the original code expected
      that the information would be persisted somewhere else if it is
      really needed (e.g. what SLAB and SLUB do).
      
      [akpm@linux-foundation.org: fix blooper in slub]
      Fixes: c48a11c7 ("netvm: propagate page->pfmemalloc to skb")
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Debugged-by: Vlastimil Babka <vbabka@suse.com>
      Debugged-by: Jiri Bohac <jbohac@suse.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: <stable@vger.kernel.org>	[3.6+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 21 Aug, 2015: 2 commits
  11. 19 Aug, 2015: 1 commit
      net: warn if drivers set tx_queue_len = 0 · 906470c1
      Phil Sutter committed
      With the introduction of IFF_NO_QUEUE, there is a better way for
      drivers to indicate that no qdisc should be attached by default. However,
      the old convention can't be dropped, since ignoring that setting would
      break drivers still using it. Instead, add a warning so out-of-tree
      driver maintainers get a chance to adjust their code before we finally
      get rid of any special handling of tx_queue_len == 0.
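      A hedged C sketch of the precedence described above: the explicit flag wins, and the legacy tx_queue_len == 0 convention is still honored but triggers a warning. The structure, the flag's bit value (IFF_NO_QUEUE_SKETCH), and the warning text are all hypothetical; this is not the kernel's actual check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical bit value; the real IFF_NO_QUEUE constant differs. */
#define IFF_NO_QUEUE_SKETCH (1u << 0)

/* Hypothetical, reduced stand-in for struct net_device. */
struct netdev_sketch {
    const char *name;
    unsigned int priv_flags;
    unsigned long tx_queue_len;
};

/* Returns true if no qdisc should be attached by default. */
static bool wants_noqueue(const struct netdev_sketch *dev)
{
    /* Preferred: the driver sets the explicit flag. */
    if (dev->priv_flags & IFF_NO_QUEUE_SKETCH)
        return true;

    /* Legacy convention: still honored so old drivers keep working,
     * but warn so maintainers can migrate. Message text is illustrative. */
    if (dev->tx_queue_len == 0) {
        fprintf(stderr, "%s: tx_queue_len == 0 is deprecated, set IFF_NO_QUEUE\n",
                dev->name);
        return true;
    }
    return false;
}
```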
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 18 Aug, 2015: 4 commits
  13. 14 Aug, 2015: 2 commits
  14. 11 Aug, 2015: 2 commits
      inet: fix races with reqsk timers · 2235f2ac
      Eric Dumazet committed
      reqsk_queue_destroy() and reqsk_queue_unlink() should use
      del_timer_sync() instead of del_timer() before calling reqsk_put();
      otherwise we could free a req still in use by another CPU.
      
      But before doing so, reqsk_queue_destroy() must release the syn_wait_lock
      spinlock, or risk a deadlock: del_timer_sync() waits for a running
      reqsk_timer_handler() to finish, and the handler might need to take this
      same spinlock from reqsk_queue_unlink() (called from
      inet_csk_reqsk_queue_drop()).
      
      Fixes: fa76ce73 ("inet: get rid of central tcp/dccp listener timer")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      net: add explicit logging and stat for neighbour table overflow · fb811395
      Rick Jones committed
      Add an explicit (ratelimited) neighbour table overflow message and
      statistic to make diagnosing neighbour table overflows tractable in
      the wild.
      
      Diagnosing a neighbour table overflow can be quite difficult in the wild
      because no explicit message is logged to dmesg.  Callers of the neighbour
      code seem to use net_dbg_ratelimited when the neighbour call fails, which
      means the "base message" is not emitted, and the "callbacks suppressed"
      messages from the ratelimiting can end up juxtaposed with unrelated
      messages.  Further, a forced garbage collection increments a statistic on
      each call whether or not it succeeds in freeing a table entry, so that
      statistic is only a hint.  So, add a net_info_ratelimited message and an
      explicit statistic to the neighbour code.
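      The idea of pairing an always-incremented counter with a rate-limited log line can be sketched in C. Everything below is hypothetical (names, the simple fixed-window limiter, the message text); the kernel uses its own ratelimit infrastructure and statistics, which this sketch does not reproduce.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical fixed-window limiter: at most 'burst' messages per
 * 'interval' time units. */
struct ratelimit_sketch {
    long interval;
    long begin;
    int burst;
    int printed;
};

static int ratelimit_ok(struct ratelimit_sketch *rl, long now)
{
    if (now - rl->begin >= rl->interval) {
        rl->begin = now;   /* start a new window */
        rl->printed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return 1;
    }
    return 0;
}

/* Hypothetical per-table statistics. */
struct neigh_stats_sketch {
    unsigned long table_fulls;
};

/* Called when a new neighbour entry cannot be allocated: the statistic is
 * bumped on every overflow, while the log line is rate limited. */
static void neigh_overflow(struct neigh_stats_sketch *st,
                           struct ratelimit_sketch *rl, long now)
{
    st->table_fulls++;
    if (ratelimit_ok(rl, now))
        fprintf(stderr, "neighbour: table overflow!\n");
}
```

      The counter stays exact even when most log lines are suppressed, which is what makes it usable for diagnosis.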
      Signed-off-by: Rick Jones <rick.jones2@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 08 Aug, 2015: 1 commit
      net: Fix race condition in store_rps_map · 10e4ea75
      Tom Herbert committed
      There is a race condition in store_rps_map that allows the jump label
      count in rps_needed to go below zero. This can happen when
      concurrently attempting to set and clear a map.
      
      Scenario:
      
      1. rps_needed count is zero
      2. New map is assigned by setting thread, but rps_needed count _not_ yet
         incremented (rps_needed count still zero)
      3. Map is cleared by second thread, old_map set to that just assigned
      4. Second thread performs static_key_slow_dec, rps_needed count now goes
         negative
      
      The fix is to increment or decrement rps_needed while holding the spinlock.
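      A minimal userspace sketch of the fix, assuming a pthread mutex in place of the kernel spinlock and a plain int in place of the rps_needed static key (store_map() and current_map are hypothetical names): because the map swap and the counter update happen in one critical section, a clearing thread can never decrement for a map whose increment has not happened yet.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins: rps_needed models the static key count and
 * current_map models the device's RPS map pointer. */
static pthread_mutex_t rps_map_lock = PTHREAD_MUTEX_INITIALIZER;
static int rps_needed;
static void *current_map;

/* Install new_map (NULL clears it). The counter is adjusted inside the same
 * critical section as the pointer swap, so the increment for a newly
 * assigned map always happens before any thread can see it as old_map. */
static void store_map(void *new_map)
{
    pthread_mutex_lock(&rps_map_lock);
    void *old_map = current_map;
    current_map = new_map;
    if (new_map)
        rps_needed++;
    if (old_map)
        rps_needed--;
    pthread_mutex_unlock(&rps_map_lock);
}
```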
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 07 Aug, 2015: 2 commits
  17. 04 Aug, 2015: 1 commit
  18. 03 Aug, 2015: 1 commit
      ebpf: add skb->hash to offset map for usage in {cls, act}_bpf or filters · ba7591d8
      Daniel Borkmann committed
      Add skb->hash to the __sk_buff offset map, so it can be accessed from
      an eBPF program. We already do this for classic BPF filters,
      but not yet for eBPF. It might be useful as a demuxer in combination with
      helpers like bpf_clone_redirect(). A toy example:
      
        __section("cls-lb") int ingress_main(struct __sk_buff *skb)
        {
          unsigned int which = 3 + (skb->hash & 7);
          /* bpf_skb_store_bytes(skb, ...); */
          /* bpf_l{3,4}_csum_replace(skb, ...); */
          bpf_clone_redirect(skb, which, 0);
          return -1;
        }
      
      I considered adding skb_get_hash(), but concluded that the
      raw skb->hash is fine in this case: we can access the hash directly
      without an extra eBPF helper function call, many NICs fill it in on
      ingress, and if the entropy is not sufficient, people
      can still implement their own software fallback hash mix anyway.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  19. 01 Aug, 2015: 2 commits
      net: Add functions to get skb->hash based on flow structures · f70ea018
      Tom Herbert committed
      Add skb_get_hash_flowi6 and skb_get_hash_flowi4 which derive an sk_buff
      hash from flowi6 and flowi4 structures respectively. These functions
      can be called when creating a packet in the output path where the new
      sk_buff does not yet contain a fully formed packet that is parsable by
      flow dissector.
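      The point of these helpers is that the hash is derived from fields the output path already has in its flow structure, instead of dissecting packet bytes. A hedged C sketch follows; struct flow4 is a hypothetical reduced stand-in for flowi4, and FNV-1a is swapped in purely for illustration (it is not the hash function the kernel uses).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct flowi4: only the fields a flow
 * hash typically mixes. */
struct flow4 {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t proto;
};

/* Feed the low 'nbytes' bytes of v into a 32-bit FNV-1a state. */
static uint32_t fnv1a_step(uint32_t h, uint32_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 16777619u;            /* FNV-1a 32-bit prime */
    }
    return h;
}

/* Hash the flow fields directly; no packet parsing is needed. */
static uint32_t flow4_hash(const struct flow4 *fl)
{
    uint32_t h = 2166136261u;      /* FNV-1a 32-bit offset basis */
    h = fnv1a_step(h, fl->saddr, 4);
    h = fnv1a_step(h, fl->daddr, 4);
    h = fnv1a_step(h, fl->sport, 2);
    h = fnv1a_step(h, fl->dport, 2);
    h = fnv1a_step(h, fl->proto, 1);
    return h ? h : 1;              /* assumption: keep 0 meaning "no hash" */
}
```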
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bpf: add helpers to access tunnel metadata · d3aa45ce
      Alexei Starovoitov committed
      Introduce helpers to let eBPF programs attached to TC manipulate tunnel metadata:
      bpf_skb_[gs]et_tunnel_key(skb, key, size, flags)
      skb: pointer to skb
      key: pointer to 'struct bpf_tunnel_key'
      size: size of 'struct bpf_tunnel_key'
      flags: room for future extensions
      
      The first eBPF program that uses these helpers allocates per-CPU
      metadata_dst structures that are used on TX.
      On RX, metadata_dst is allocated by the tunnel driver.
      
      Typical usage for TX:
      struct bpf_tunnel_key tkey;
      ... populate tkey ...
      bpf_skb_set_tunnel_key(skb, &tkey, sizeof(tkey), 0);
      bpf_clone_redirect(skb, vxlan_dev_ifindex, 0);
      
      RX:
      struct bpf_tunnel_key tkey = {};
      bpf_skb_get_tunnel_key(skb, &tkey, sizeof(tkey), 0);
      ... lookup or redirect based on tkey ...
      
      'struct bpf_tunnel_key' will be extended in the future by adding
      elements to the end, and the 'size' argument will indicate which fields
      are populated, thereby keeping backward compatibility.
      The 'flags' argument may also be used when 'size' is not enough, or
      to indicate a completely different layout of bpf_tunnel_key.
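      The size-based versioning described above is a common extensible-struct pattern. Below is a hedged C sketch of how a consumer might accept both an older, shorter layout and a newer one; struct tunnel_key_v2, its fields, get_key_from_user(), and the rejection of non-zero flags are all assumptions, not the kernel's bpf_skb_set_tunnel_key() implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "newer" layout: older callers only know tunnel_id and
 * remote_ipv4; tos was appended at the end later. */
struct tunnel_key_v2 {
    uint32_t tunnel_id;
    uint32_t remote_ipv4;
    uint8_t tos;
};

/* Size of the older layout: everything before the appended field. */
#define TUNNEL_KEY_V1_SIZE offsetof(struct tunnel_key_v2, tos)

/* Copy a caller-supplied key of 'size' bytes into the full internal layout.
 * Fields the caller did not know about are left zeroed. Unknown sizes are
 * rejected, and in this sketch non-zero flags are treated as reserved. */
static int get_key_from_user(struct tunnel_key_v2 *out, const void *user,
                             size_t size, uint64_t flags)
{
    if (flags != 0)
        return -1;
    if (size != TUNNEL_KEY_V1_SIZE && size != sizeof(*out))
        return -1;
    memset(out, 0, sizeof(*out));
    memcpy(out, user, size);
    return 0;
}
```

      An old caller passing an 8-byte key keeps working, and the newer tos field simply defaults to zero.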
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 31 Jul, 2015: 1 commit
  21. 30 Jul, 2015: 2 commits