1. 23 Aug, 2015 1 commit
    • 9p: ensure err is initialized to 0 in p9_client_read/write · 999b8b88
      Vincent Bernat committed
      Some callers of those functions were passing in uninitialized values.
      Notably, when reading 0 bytes from an empty file on a 9P filesystem,
      the return code of read() was not 0.
      
      Tested with this simple program:
      
          #include <assert.h>
          #include <sys/types.h>
          #include <sys/stat.h>
          #include <fcntl.h>
          #include <unistd.h>
      
          int main(int argc, const char **argv)
          {
              assert(argc == 2);
              char buffer[256];
              int fd = open(argv[1], O_RDONLY|O_NOCTTY);
              assert(fd >= 0);
              assert(read(fd, buffer, 0) == 0);
              return 0;
          }
      
      Cc: stable@vger.kernel.org # v4.1
      Signed-off-by: Vincent Bernat <vincent@bernat.im>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      999b8b88
  2. 22 Aug, 2015 1 commit
    • mm: make page pfmemalloc check more robust · 2f064f34
      Michal Hocko committed
      Commit c48a11c7 ("netvm: propagate page->pfmemalloc to skb") added
      checks for page->pfmemalloc to __skb_fill_page_desc():
      
              if (page->pfmemalloc && !page->mapping)
                      skb->pfmemalloc = true;
      
      It assumes page->mapping == NULL implies that page->pfmemalloc can be
      trusted.  However, __delete_from_page_cache() can set page->mapping
      to NULL and leave the page->index value alone.  Because the two fields
      share a union, a non-zero page->index will be interpreted as a true
      page->pfmemalloc.
      
      So the assumption is invalid if the networking code can see such a page,
      and it seems it can.  We have encountered this with an NFS over loopback
      setup when such a page is attached to a new skbuf.  There is no copying
      going on in this case, so the page confuses __skb_fill_page_desc(), which
      interprets the index as the pfmemalloc flag.  The network stack drops
      packets that have been allocated using the reserves unless they are to
      be queued on sockets handling the swapping.  That is what happens here,
      and it leads to hangs when the NFS client waits for a response from the
      server that has been dropped and thus never arrives.
      
      The struct page is already heavily packed so rather than finding another
      hole to put it in, let's do a trick instead.  We can reuse the index
      again but define it to an impossible value (-1UL).  This is the page
      index so it should never see the value that large.  Replace all direct
      users of page->pfmemalloc by page_is_pfmemalloc which will hide this
      nastiness from unspoiled eyes.
      
      The information will get lost if somebody wants to use page->index
      obviously but that was the case before and the original code expected
      that the information should be persisted somewhere else if that is
      really needed (e.g.  what SLAB and SLUB do).
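The sentinel trick described above can be sketched in user space as follows. This is a minimal model, not the kernel patch itself: `struct page_model` is a hypothetical stand-in for `struct page`, and the `set_`/`clear_` helper names are assumptions chosen to sit alongside the `page_is_pfmemalloc` accessor that the message names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-in for the kernel's struct page. */
struct page_model {
    void *mapping;        /* NULL once the page has left the page cache */
    unsigned long index;  /* page-cache offset, reused as the marker    */
};

/* A page counts as pfmemalloc only when index holds the impossible value. */
static bool page_is_pfmemalloc(const struct page_model *page)
{
    return page->index == -1UL;
}

/* Assumed helper name: mark a freshly allocated page as pfmemalloc. */
static void set_page_pfmemalloc(struct page_model *page)
{
    page->index = -1UL;
}

/* Assumed helper name: drop the marker again. */
static void clear_page_pfmemalloc(struct page_model *page)
{
    page->index = 0;
}
```

With this encoding a stale, non-zero page-cache index such as 42 no longer reads as "pfmemalloc"; only the dedicated sentinel does.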
      
      [akpm@linux-foundation.org: fix blooper in slub]
      Fixes: c48a11c7 ("netvm: propagate page->pfmemalloc to skb")
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Debugged-by: Vlastimil Babka <vbabka@suse.com>
      Debugged-by: Jiri Bohac <jbohac@suse.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: <stable@vger.kernel.org> [3.6+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2f064f34
  3. 19 Aug, 2015 1 commit
    • batman-adv: Fix memory leak on tt add with invalid vlan · fd7dec25
      Sven Eckelmann committed
      The object tt_local is allocated with kmalloc and not initialized when the
      function batadv_tt_local_add checks for the vlan. But this function can
      only cleanup the object when the (not yet initialized) reference counter of
      the object is 1. This is unlikely and thus the object would leak when the
      vlan could not be found.
      
      Instead the uninitialized object tt_local has to be freed manually and the
      pointer has to be set to NULL to avoid calling the function which would try
      to decrement the reference counter of the nonexistent object.
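A user-space sketch of that cleanup pattern is shown below. All names here (`tt_entry`, `tt_add`, `tt_entry_put`, `live_entries`, `table_slot`) are hypothetical and only model the idea; the real function is batadv_tt_local_add and uses kernel allocators and reference counters.

```c
#include <assert.h>
#include <stdlib.h>

struct tt_entry { int refcount; };

static int live_entries;            /* allocation counter used to spot leaks */
static struct tt_entry *table_slot; /* one-slot stand-in for the hash table  */

/* Drops one reference and frees the entry when the last one is gone. */
static void tt_entry_put(struct tt_entry *e)
{
    if (--e->refcount == 0) {
        free(e);
        live_entries--;
    }
}

/* Returns 1 when the entry was added, 0 otherwise. */
static int tt_add(int vlan_found)
{
    struct tt_entry *e = malloc(sizeof(*e)); /* contents uninitialized */
    int ret = 0;

    if (!e)
        return 0;
    live_entries++;

    if (!vlan_found) {
        /* refcount is still garbage, so the put path cannot be trusted:
         * free directly and clear the pointer so "out" skips the put. */
        free(e);
        live_entries--;
        e = NULL;
        goto out;
    }

    e->refcount = 2; /* one reference for the table, one for this function */
    table_slot = e;
    ret = 1;
out:
    if (e)
        tt_entry_put(e);
    return ret;
}
```

Clearing the pointer is what keeps the shared exit path from touching a counter that was never initialized.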
      
      CID: 1316518
      Fixes: 354136bc ("batman-adv: fix kernel crash due to missing NULL checks")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fd7dec25
  4. 18 Aug, 2015 4 commits
    • ipv6: Fix a potential deadlock when creating pcpu rt · 9c7370a1
      Martin KaFai Lau committed
      rt6_make_pcpu_route() is called under read_lock(&table->tb6_lock).
      rt6_make_pcpu_route() calls ip6_rt_pcpu_alloc(rt) which then
      calls dst_alloc().  dst_alloc() _may_ call ip6_dst_gc() which takes
      the write_lock(&table->tb6_lock).  A visualized version:
      
      read_lock(&table->tb6_lock);
      rt6_make_pcpu_route();
      => ip6_rt_pcpu_alloc();
      => dst_alloc();
      => ip6_dst_gc();
      => write_lock(&table->tb6_lock); /* oops */
      
      The fix is to do a read_unlock first before calling ip6_rt_pcpu_alloc().
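
The ordering problem can be illustrated with a toy lock model. This is only a sketch under stated assumptions: `struct rwlock_model` does no real blocking, and every function name below is hypothetical rather than taken from the kernel. A real write_lock() would spin forever where the model returns false.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy rwlock state: no real blocking, it only records who holds the lock. */
struct rwlock_model {
    int readers;
    bool writer;
};

static void read_lock_model(struct rwlock_model *l)   { l->readers++; }
static void read_unlock_model(struct rwlock_model *l) { l->readers--; }

/* A real write_lock() would spin while readers remain; the model reports
 * that situation as failure instead of hanging. */
static bool write_lock_would_succeed(const struct rwlock_model *l)
{
    return l->readers == 0 && !l->writer;
}

/* Stand-in for an allocation that decides to run garbage collection,
 * which in turn needs the write side of the table lock. */
static bool alloc_with_gc(struct rwlock_model *table_lock)
{
    return write_lock_would_succeed(table_lock);
}

/* Problematic ordering: allocate while still holding the read lock. */
static bool make_pcpu_route_locked(struct rwlock_model *table_lock)
{
    bool ok;

    read_lock_model(table_lock);
    ok = alloc_with_gc(table_lock);
    read_unlock_model(table_lock);
    return ok;
}

/* Fixed ordering: drop the read lock before allocating. */
static bool make_pcpu_route_unlocked(struct rwlock_model *table_lock)
{
    read_lock_model(table_lock);
    /* ... route lookup happens here ... */
    read_unlock_model(table_lock);
    return alloc_with_gc(table_lock);
}
```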
      
      A reported stack:
      
      [141625.537638] INFO: rcu_sched self-detected stall on CPU { 27}  (t=60000 jiffies g=4159086 c=4159085 q=2139)
      [141625.547469] Task dump for CPU 27:
      [141625.550881] mtr             R  running task        0 22121  22081 0x00000008
      [141625.558069]  0000000000000000 ffff88103f363d98 ffffffff8106e488 000000000000001b
      [141625.565641]  ffffffff81684900 ffff88103f363db8 ffffffff810702b0 0000000008000000
      [141625.573220]  ffffffff81684900 ffff88103f363de8 ffffffff8108df9f ffff88103f375a00
      [141625.580803] Call Trace:
      [141625.583345]  <IRQ>  [<ffffffff8106e488>] sched_show_task+0xc1/0xc6
      [141625.589650]  [<ffffffff810702b0>] dump_cpu_task+0x35/0x39
      [141625.595144]  [<ffffffff8108df9f>] rcu_dump_cpu_stacks+0x6a/0x8c
      [141625.601320]  [<ffffffff81090606>] rcu_check_callbacks+0x1f6/0x5d4
      [141625.607669]  [<ffffffff810940c8>] update_process_times+0x2a/0x4f
      [141625.613925]  [<ffffffff8109fbee>] tick_sched_handle+0x32/0x3e
      [141625.619923]  [<ffffffff8109fc2f>] tick_sched_timer+0x35/0x5c
      [141625.625830]  [<ffffffff81094a1f>] __hrtimer_run_queues+0x8f/0x18d
      [141625.632171]  [<ffffffff81094c9e>] hrtimer_interrupt+0xa0/0x166
      [141625.638258]  [<ffffffff8102bf2a>] local_apic_timer_interrupt+0x4e/0x52
      [141625.645036]  [<ffffffff8102c36f>] smp_apic_timer_interrupt+0x39/0x4a
      [141625.651643]  [<ffffffff8140b9e8>] apic_timer_interrupt+0x68/0x70
      [141625.657895]  <EOI>  [<ffffffff81346ee8>] ? dst_destroy+0x7c/0xb5
      [141625.664188]  [<ffffffff813d45b5>] ? fib6_flush_trees+0x20/0x20
      [141625.670272]  [<ffffffff81082b45>] ? queue_write_lock_slowpath+0x60/0x6f
      [141625.677140]  [<ffffffff8140aa33>] _raw_write_lock_bh+0x23/0x25
      [141625.683218]  [<ffffffff813d4553>] __fib6_clean_all+0x40/0x82
      [141625.689124]  [<ffffffff813d45b5>] ? fib6_flush_trees+0x20/0x20
      [141625.695207]  [<ffffffff813d6058>] fib6_clean_all+0xe/0x10
      [141625.700854]  [<ffffffff813d60d3>] fib6_run_gc+0x79/0xc8
      [141625.706329]  [<ffffffff813d0510>] ip6_dst_gc+0x85/0xf9
      [141625.711718]  [<ffffffff81346d68>] dst_alloc+0x55/0x159
      [141625.717105]  [<ffffffff813d09b5>] __ip6_dst_alloc.isra.32+0x19/0x63
      [141625.723620]  [<ffffffff813d1830>] ip6_pol_route+0x36a/0x3e8
      [141625.729441]  [<ffffffff813d18d6>] ip6_pol_route_output+0x11/0x13
      [141625.735700]  [<ffffffff813f02c8>] fib6_rule_action+0xa7/0x1bf
      [141625.741698]  [<ffffffff813d18c5>] ? ip6_pol_route_input+0x17/0x17
      [141625.748043]  [<ffffffff81357c48>] fib_rules_lookup+0xb5/0x12a
      [141625.754050]  [<ffffffff81141628>] ? poll_select_copy_remaining+0xf9/0xf9
      [141625.761002]  [<ffffffff813f0535>] fib6_rule_lookup+0x37/0x5c
      [141625.766914]  [<ffffffff813d18c5>] ? ip6_pol_route_input+0x17/0x17
      [141625.773260]  [<ffffffff813d008c>] ip6_route_output+0x7a/0x82
      [141625.779177]  [<ffffffff813c44c8>] ip6_dst_lookup_tail+0x53/0x112
      [141625.785437]  [<ffffffff813c45c3>] ip6_dst_lookup_flow+0x2a/0x6b
      [141625.791604]  [<ffffffff813ddaab>] rawv6_sendmsg+0x407/0x9b6
      [141625.797423]  [<ffffffff813d7914>] ? do_ipv6_setsockopt.isra.8+0xd87/0xde2
      [141625.804464]  [<ffffffff8139d4b4>] inet_sendmsg+0x57/0x8e
      [141625.810028]  [<ffffffff81329ba3>] sock_sendmsg+0x2e/0x3c
      [141625.815588]  [<ffffffff8132be57>] SyS_sendto+0xfe/0x143
      [141625.821063]  [<ffffffff813dd551>] ? rawv6_setsockopt+0x5e/0x67
      [141625.827146]  [<ffffffff8132c9f8>] ? sock_common_setsockopt+0xf/0x11
      [141625.833660]  [<ffffffff8132c08c>] ? SyS_setsockopt+0x81/0xa2
      [141625.839565]  [<ffffffff8140ac17>] entry_SYSCALL_64_fastpath+0x12/0x6a
      
      Fixes: d52d3997 ("ipv6: Create percpu rt6_info")
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9c7370a1
    • ipv6: Add rt6_make_pcpu_route() · a73e4195
      Martin KaFai Lau committed
      This is preparatory work for fixing a potential deadlock when creating
      a pcpu rt.
      
      The current rt6_get_pcpu_route() will also create a pcpu rt if one does not
      exist.  This patch moves the pcpu rt creation logic into another function,
      rt6_make_pcpu_route().
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a73e4195
    • ipv6: Remove un-used argument from ip6_dst_alloc() · ad706862
      Martin KaFai Lau committed
      After 4b32b5ad ("ipv6: Stop rt6_info from using inet_peer's metrics"),
      ip6_dst_alloc() does not need the 'table' argument.  This patch
      cleans it up.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ad706862
    • Revert "net: limit tcp/udp rmem/wmem to SOCK_{RCV,SND}BUF_MIN" · 5d37852b
      Calvin Owens committed
      Commit 8133534c ("net: limit tcp/udp rmem/wmem to
      SOCK_{RCV,SND}BUF_MIN") modified four sysctls to enforce that the values
      written to them are not less than SOCK_MIN_{RCV,SND}BUF.
      
      That change causes 4096 to no longer be accepted as a valid value for
      'min' in tcp_wmem and udp_wmem_min. 4096 has been the default for both
      of those sysctls for a long time, and unfortunately seems to be an
      extremely popular setting. This change breaks a large number of sysctl
      configurations at Facebook.
      
      That commit referred to b1cb59cf ("net: sysctl_net_core: check
      SNDBUF and RCVBUF for min length"), which chose to use the SOCK_MIN
      constants as the lower limits to avoid nasty bugs. But AFAICS, a limit
      of SOCK_MIN_SNDBUF isn't necessary to do that: the BUG_ON cited in the
      commit message seems to have happened because unix_stream_sendmsg()
      expects a minimum of a full page (ie SK_MEM_QUANTUM) and the math broke,
      not because it had less than SOCK_MIN_SNDBUF allocated.
      
      This particular issue doesn't seem to affect TCP however: using a
      setting of "1 1 1" for tcp_{r,w}mem works, although it's obviously
      suboptimal. SK_MEM_QUANTUM would be a nice minimum, but it's 64K on
      some archs, so there would still be breakage.
      
      Since a value of one doesn't seem to cause any problems, we can drop the
      minimum 8133534c added to fix this.
      
      This reverts commit 8133534c.
      
      Fixes: 8133534c ("net: limit tcp/udp rmem/wmem to SOCK_MIN...")
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Sorin Dumitru <sorin@returnze.ro>
      Signed-off-by: Calvin Owens <calvinowens@fb.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5d37852b
  5. 14 Aug, 2015 3 commits
    • inet: fix potential deadlock in reqsk_queue_unlink() · 83fccfc3
      Eric Dumazet committed
      When replacing del_timer() with del_timer_sync(), I introduced
      a deadlock condition :
      
      reqsk_queue_unlink() is called from inet_csk_reqsk_queue_drop()
      
      inet_csk_reqsk_queue_drop() can be called from many contexts,
      one being the timer handler itself (reqsk_timer_handler()).
      
      In this case, del_timer_sync() loops forever.
      
      Simple fix is to test if timer is pending.
      
      Fixes: 2235f2ac ("inet: fix races with reqsk timers")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      83fccfc3
    • ipv4: off-by-one in continuation handling in /proc/net/route · 25b97c01
      Andy Whitcroft committed
      When generating /proc/net/route we emit a header followed by a line for
      each route.  When a short read is performed we will restart this process
      based on the open file descriptor.  When calculating the start point we
      fail to take into account that the 0th entry is the header.  This leads
      us to skip the first entry when doing a continuation read.
      
      This can be easily seen with the comparison below:
      
        while read l; do echo "$l"; done </proc/net/route >A
        cat /proc/net/route >B
        diff -bu A B | grep '^[+-]'
      
      On my example machine I have approximately 10KB of route output.  There we
      see the very first non-title element is lost in the while read case,
      and an entry around the 8K mark in the cat case:
      
        +wlan0 00000000 02021EAC 0003 0 0 400 00000000 0 0 0
        -tun1  00C0AC0A 00000000 0001 0 0 950 00C0FFFF 0 0 0
      
      Fix up the off-by-one when reacquiring position on continuation.
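
The position bookkeeping can be sketched as follows, assuming a sequence position where 0 denotes the header line and route k is emitted at position k + 1. Both function names are hypothetical and do not come from the fib_trie code; they only contrast the two mappings.

```c
#include <assert.h>

/* Position 0 of the output is the header line; route k lives at k + 1.
 * Returns the route index to resume at, or -1 to emit the header. */
static long route_index_from_pos(long pos)
{
    return pos - 1;
}

/* Off-by-one variant: forgets the header and so skips one route
 * whenever a read is continued. */
static long route_index_from_pos_buggy(long pos)
{
    return pos;
}
```

For example, after the header and three routes have been emitted, the next position is 4: the corrected mapping resumes at route index 3, while the off-by-one variant resumes at index 4 and drops a route.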
      
      Fixes: 8be33e95 ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf")
      BugLink: http://bugs.launchpad.net/bugs/1483440
      Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: Andy Whitcroft <apw@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      25b97c01
    • net: fix wrong skb_get() usage / crash in IGMP/MLD parsing code · a516993f
      Linus Lüssing committed
      The recent refactoring of the IGMP and MLD parsing code into
      ipv6_mc_check_mld() / ip_mc_check_igmp() introduced a potential crash /
      BUG() invocation for bridges:
      
      I wrongly assumed that skb_get() could be used as a simple reference
      counter for an skb which is not the case. skb_get() bears additional
      semantics, a user count. This leads to a BUG() invocation in
      pskb_expand_head() / kernel panic if pskb_may_pull() is called on an skb
      with a user count greater than one - unfortunately the refactoring did
      just that.
      
      Fixing this by removing the skb_get() call and changing the API: The
      caller of ipv6_mc_check_mld() / ip_mc_check_igmp() now needs to
      additionally check whether the returned skb_trimmed is a clone.
      
      Fixes: 9afd85c9 ("net: Export IGMP/MLD message validation code")
      Reported-by: Brenden Blanco <bblanco@plumgrid.com>
      Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a516993f
  6. 13 Aug, 2015 2 commits
  7. 11 Aug, 2015 4 commits
    • inet: fix possible request socket leak · 3257d8b1
      Eric Dumazet committed
      In commit b357a364 ("inet: fix possible panic in
      reqsk_queue_unlink()"), I missed the fact that tcp_check_req()
      can return the listener socket in one case, and that we must
      release the request socket refcount or we leak it.
      
      Tested:
      
       The following packetdrill test template shows the issue:
      
      0     socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0    setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +0    bind(3, ..., ...) = 0
      +0    listen(3, 1) = 0
      
      +0    < S 0:0(0) win 2920 <mss 1460,sackOK,nop,nop>
      +0    > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK>
      +.002 < . 1:1(0) ack 21 win 2920
      +0    > R 21:21(0)
      
      Fixes: b357a364 ("inet: fix possible panic in reqsk_queue_unlink()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3257d8b1
    • inet: fix races with reqsk timers · 2235f2ac
      Eric Dumazet committed
      reqsk_queue_destroy() and reqsk_queue_unlink() should use
      del_timer_sync() instead of del_timer() before calling reqsk_put(),
      otherwise we could free a req still used by another cpu.
      
      But before doing so, reqsk_queue_destroy() must release syn_wait_lock
      spinlock or risk a deadlock, as reqsk_timer_handler() might
      need to take this same spinlock from reqsk_queue_unlink() (called from
      inet_csk_reqsk_queue_drop())
      
      Fixes: fa76ce73 ("inet: get rid of central tcp/dccp listener timer")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2235f2ac
    • ipv6: don't reject link-local nexthop on other interface · 330567b7
      Florian Westphal committed
      48ed7b26 ("ipv6: reject locally assigned nexthop addresses") is too
      strict; it rejects the following corner case:
      
      ip -6 route add default via fe80::1:2:3 dev eth1
      
      [ where fe80::1:2:3 is assigned to a local interface, but not eth1 ]
      
      Fix this by restricting the search to the given device if the nexthop is
      link-local.
      
      Joint work with Hannes Frederic Sowa.
      
      Fixes: 48ed7b26 ("ipv6: reject locally assigned nexthop addresses")
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      330567b7
    • netlink: make sure -EBUSY won't escape from netlink_insert · 4e7c1330
      Daniel Borkmann committed
      Linus reports the following deadlock on rtnl_mutex; triggered only
      once so far (extract):
      
      [12236.694209] NetworkManager  D 0000000000013b80     0  1047      1 0x00000000
      [12236.694218]  ffff88003f902640 0000000000000000 ffffffff815d15a9 0000000000000018
      [12236.694224]  ffff880119538000 ffff88003f902640 ffffffff81a8ff84 00000000ffffffff
      [12236.694230]  ffffffff81a8ff88 ffff880119c47f00 ffffffff815d133a ffffffff81a8ff80
      [12236.694235] Call Trace:
      [12236.694250]  [<ffffffff815d15a9>] ? schedule_preempt_disabled+0x9/0x10
      [12236.694257]  [<ffffffff815d133a>] ? schedule+0x2a/0x70
      [12236.694263]  [<ffffffff815d15a9>] ? schedule_preempt_disabled+0x9/0x10
      [12236.694271]  [<ffffffff815d2c3f>] ? __mutex_lock_slowpath+0x7f/0xf0
      [12236.694280]  [<ffffffff815d2cc6>] ? mutex_lock+0x16/0x30
      [12236.694291]  [<ffffffff814f1f90>] ? rtnetlink_rcv+0x10/0x30
      [12236.694299]  [<ffffffff8150ce3b>] ? netlink_unicast+0xfb/0x180
      [12236.694309]  [<ffffffff814f5ad3>] ? rtnl_getlink+0x113/0x190
      [12236.694319]  [<ffffffff814f202a>] ? rtnetlink_rcv_msg+0x7a/0x210
      [12236.694331]  [<ffffffff8124565c>] ? sock_has_perm+0x5c/0x70
      [12236.694339]  [<ffffffff814f1fb0>] ? rtnetlink_rcv+0x30/0x30
      [12236.694346]  [<ffffffff8150d62c>] ? netlink_rcv_skb+0x9c/0xc0
      [12236.694354]  [<ffffffff814f1f9f>] ? rtnetlink_rcv+0x1f/0x30
      [12236.694360]  [<ffffffff8150ce3b>] ? netlink_unicast+0xfb/0x180
      [12236.694367]  [<ffffffff8150d344>] ? netlink_sendmsg+0x484/0x5d0
      [12236.694376]  [<ffffffff810a236f>] ? __wake_up+0x2f/0x50
      [12236.694387]  [<ffffffff814cad23>] ? sock_sendmsg+0x33/0x40
      [12236.694396]  [<ffffffff814cb05e>] ? ___sys_sendmsg+0x22e/0x240
      [12236.694405]  [<ffffffff814cab75>] ? ___sys_recvmsg+0x135/0x1a0
      [12236.694415]  [<ffffffff811a9d12>] ? eventfd_write+0x82/0x210
      [12236.694423]  [<ffffffff811a0f9e>] ? fsnotify+0x32e/0x4c0
      [12236.694429]  [<ffffffff8108cb70>] ? wake_up_q+0x60/0x60
      [12236.694434]  [<ffffffff814cba09>] ? __sys_sendmsg+0x39/0x70
      [12236.694440]  [<ffffffff815d4797>] ? entry_SYSCALL_64_fastpath+0x12/0x6a
      
      The recursive call into rtnetlink_rcv() in this trace looks suspicious.
      One way this could trigger is that the sender's NETLINK_CB(skb).portid
      was wrongly 0 (which is the rtnetlink socket), so the rtnl_getlink()
      request's answer would be sent to the kernel instead of to the actual
      user process, thus grabbing rtnl_mutex() twice.
      
      One theory would be that netlink_autobind() triggered via netlink_sendmsg()
      internally overwrites the -EBUSY error to 0, but where it is wrongly
      originating from __netlink_insert() instead. That would reset the
      socket's portid to 0, which is then filled into NETLINK_CB(skb).portid
      later on. As commit d470e3b4 ("[NETLINK]: Fix two socket hashing bugs.")
      also puts it, -EBUSY should not be propagated from netlink_insert().
      
      It looks like it's very unlikely to reproduce. We need to trigger the
      rhashtable_insert_rehash() handler under a situation where rehashing
      currently occurs (one /rare/ way would be to hit ht->elasticity limits
      while not filled enough to expand the hashtable, but that would rather
      require a specifically crafted bind() sequence with knowledge about
      destination slots, seems unlikely). It probably makes sense to guard
      __netlink_insert() in any case and remap that error. It was suggested
      that EOVERFLOW might be better than an already overloaded ENOMEM.
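
The remapping suggested above amounts to a small error translation at the point where the insert result is returned. The sketch below is a user-space illustration only; `remap_insert_error` is a hypothetical helper name, not a function from the netlink code.

```c
#include <assert.h>
#include <errno.h>

/* Translate the hashtable backend's -EBUSY into -EOVERFLOW so that a
 * caller treating -EBUSY specially never sees it from the insert path.
 * All other values, including success (0), pass through unchanged. */
static int remap_insert_error(int err)
{
    if (err == -EBUSY)
        return -EOVERFLOW;
    return err;
}
```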
      
      Reference: http://thread.gmane.org/gmane.linux.network/372676
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4e7c1330
  8. 10 Aug, 2015 2 commits
  9. 07 Aug, 2015 4 commits
  10. 06 Aug, 2015 1 commit
    • Bluetooth: fix MGMT_EV_NEW_LONG_TERM_KEY event · cb92205b
      Jakub Pawlowski committed
      This patch fixes how the MGMT_EV_NEW_LONG_TERM_KEY event is built. Right
      now the val field is filled with only 1 byte instead of the whole value.
      This bug was introduced in
      commit 1fc62c52 ("Bluetooth: Fix exposing full value of shortened LTKs")
      
      Before this patch, if you paired with a device through bluetoothd using
      simple pairing and then restarted bluetoothd, you would be able to
      re-connect, but the device would fail to establish encryption and would
      terminate the connection. After this patch, connecting after a bluetoothd
      restart works fine.
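
One plausible way a key value ends up truncated to a single byte is taking sizeof of a one-byte length field instead of using the length itself. The sketch below illustrates that pattern under stated assumptions: `struct ltk_model`, `LTK_LEN`, and both function names are hypothetical, and the code is a user-space model rather than the actual mgmt code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LTK_LEN 16

/* Hypothetical stand-in for a stored long term key. */
struct ltk_model {
    uint8_t enc_size;       /* number of significant key bytes */
    uint8_t val[LTK_LEN];
};

/* Buggy pattern: sizeof() of the one-byte field yields 1, so only the
 * first byte of the key is copied. */
static void fill_event_val_buggy(uint8_t out[LTK_LEN],
                                 const struct ltk_model *key)
{
    memset(out, 0, LTK_LEN);
    memcpy(out, key->val, sizeof(key->enc_size));
}

/* Intended behaviour: copy the significant bytes, zero the remainder. */
static void fill_event_val_fixed(uint8_t out[LTK_LEN],
                                 const struct ltk_model *key)
{
    size_t n = key->enc_size < LTK_LEN ? key->enc_size : LTK_LEN;

    memcpy(out, key->val, n);
    memset(out + n, 0, LTK_LEN - n);
}
```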
      Signed-off-by: Jakub Pawlowski <jpawlowski@google.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
      cb92205b
  11. 05 Aug, 2015 5 commits
  12. 04 Aug, 2015 4 commits
  13. 03 Aug, 2015 1 commit
    • fq_codel: explicitly reset flows in ->reset() · 3d0e0af4
      Eric Dumazet committed
      Alex reported the following crash when using fq_codel
      with htb:
      
        crash> bt
        PID: 630839  TASK: ffff8823c990d280  CPU: 14  COMMAND: "tc"
         [... snip ...]
         #8 [ffff8820ceec17a0] page_fault at ffffffff8160a8c2
            [exception RIP: htb_qlen_notify+24]
            RIP: ffffffffa0841718  RSP: ffff8820ceec1858  RFLAGS: 00010282
            RAX: 0000000000000000  RBX: 0000000000000000  RCX: ffff88241747b400
            RDX: ffff88241747b408  RSI: 0000000000000000  RDI: ffff8811fb27d000
            RBP: ffff8820ceec1868   R8: ffff88120cdeff24   R9: ffff88120cdeff30
            R10: 0000000000000bd4  R11: ffffffffa0840919  R12: ffffffffa0843340
            R13: 0000000000000000  R14: 0000000000000001  R15: ffff8808dae5c2e8
            ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
         #9 [...] qdisc_tree_decrease_qlen at ffffffff81565375
        #10 [...] fq_codel_dequeue at ffffffffa084e0a0 [sch_fq_codel]
        #11 [...] fq_codel_reset at ffffffffa084e2f8 [sch_fq_codel]
        #12 [...] qdisc_destroy at ffffffff81560d2d
        #13 [...] htb_destroy_class at ffffffffa08408f8 [sch_htb]
        #14 [...] htb_put at ffffffffa084095c [sch_htb]
        #15 [...] tc_ctl_tclass at ffffffff815645a3
        #16 [...] rtnetlink_rcv_msg at ffffffff81552cb0
        [... snip ...]
      
      As Jamal pointed out, there is actually no need to call dequeue
      to purge the queued skb's in reset, data structures can be just
      reset explicitly. Therefore, we reset everything except config's
      and stats, so that we would have a fresh start after device flipping.
      
      Fixes: 4b549a2e ("fq_codel: Fair Queue Codel AQM")
      Reported-by: Alex Gartrell <agartrell@fb.com>
      Cc: Alex Gartrell <agartrell@fb.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      [xiyou.wangcong@gmail.com: added codel_vars_init() and qdisc_qstats_backlog_dec()]
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3d0e0af4
  14. 01 Aug, 2015 1 commit
  15. 31 Jul, 2015 2 commits
    • net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket · 8a681736
      Sowmini Varadhan committed
      The newsk returned by sk_clone_lock should hold a get_net()
      reference if, and only if, the parent is not a kernel socket
      (making this similar to sk_alloc()).
      
      For example, in the SYN_RECV path, tcp_v4_syn_recv_sock->..inet_csk_clone_lock
      sets up the syn_recv newsk from sk_clone_lock. When the parent (listen)
      socket is a kernel socket (defined in sk_alloc() as having
      sk_net_refcnt == 0), then the newsk should also have a 0 sk_net_refcnt
      and should not hold a get_net() reference.
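
The rule can be modelled in user space as below. This is a sketch only: `struct net_model`, `struct sock_model`, `get_net_model`, and `sk_clone_model` are hypothetical stand-ins, and the clone is a plain struct copy rather than the real sk_clone_lock() logic.

```c
#include <assert.h>

struct net_model { int count; };          /* namespace reference count */

struct sock_model {
    struct net_model *net;
    int sk_net_refcnt;                     /* 0 marks a kernel socket */
};

static void get_net_model(struct net_model *n)
{
    n->count++;
}

/* The clone inherits sk_net_refcnt from its parent and takes a namespace
 * reference only when that flag is set, i.e. for non-kernel sockets. */
static struct sock_model sk_clone_model(const struct sock_model *parent)
{
    struct sock_model newsk = *parent;

    if (newsk.sk_net_refcnt)
        get_net_model(newsk.net);
    return newsk;
}
```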
      
      Fixes: 26abe143 ("net: Modify sk_alloc to not reference count the
            netns of kernel sockets.")
      Acked-by: Eric Dumazet <edumazet@google.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8a681736
    • net: sched: fix refcount imbalance in actions · 28e6b67f
      Daniel Borkmann committed
      Since commit 55334a5d ("net_sched: act: refuse to remove bound action
      outside"), we end up with a wrong reference count for a tc action.
      
      Test case 1:
      
        FOO="1,6 0 0 4294967295,"
        BAR="1,6 0 0 4294967294,"
        tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 \
           action bpf bytecode "$FOO"
        tc actions show action bpf
          action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
          index 1 ref 1 bind 1
        tc actions replace action bpf bytecode "$BAR" index 1
        tc actions show action bpf
          action order 0: bpf bytecode '1,6 0 0 4294967294' default-action pipe
          index 1 ref 2 bind 1
        tc actions replace action bpf bytecode "$FOO" index 1
        tc actions show action bpf
          action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
          index 1 ref 3 bind 1
      
      Test case 2:
      
        FOO="1,6 0 0 4294967295,"
        tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action ok
        tc actions show action gact
          action order 0: gact action pass
          random type none pass val 0
           index 1 ref 1 bind 1
        tc actions add action drop index 1
          RTNETLINK answers: File exists [...]
        tc actions show action gact
          action order 0: gact action pass
           random type none pass val 0
           index 1 ref 2 bind 1
        tc actions add action drop index 1
          RTNETLINK answers: File exists [...]
        tc actions show action gact
          action order 0: gact action pass
           random type none pass val 0
           index 1 ref 3 bind 1
      
      What happens is that in tcf_hash_check(), we check tcf_common for a given
      index and increase tcfc_refcnt and conditionally tcfc_bindcnt when we've
      found an existing action. Now there are the following cases:
      
        1) We do a late binding of an action. In that case, we leave the
           tcfc_refcnt/tcfc_bindcnt increased and are done with the ->init()
           handler. This is correctly handled.
      
        2) We replace the given action, or we try to add one without replacing
           and find out that the action at a specific index already exists
           (thus, we go out with error in that case).
      
      In case of 2), we have to undo the reference count increase from
      tcf_hash_check() in the tcf_hash_check() function. Currently, we fail to
      do so because of the 'tcfc_bindcnt > 0' check which bails out early with
      an -EPERM error.
      
      Now, while commit 55334a5d prevents 'tc actions del action ...' on an
      already classifier-bound action to drop the reference count (which could
      then become negative, wrap around etc), this restriction only accounts for
      invocations outside a specific action's ->init() handler.
      
      One possible solution would be to add a flag so that we trigger
      the -EPERM only in situations where it is indeed relevant.
      
      After the patch, above test cases have correct reference count again.
      
      Fixes: 55334a5d ("net_sched: act: refuse to remove bound action outside")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Cong Wang <cwang@twopensource.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      28e6b67f
  16. 30 Jul, 2015 4 commits
    • netfilter: nf_conntrack: checking for IS_ERR() instead of NULL · 1a727c63
      Dan Carpenter committed
      We recently changed this from nf_conntrack_alloc() to nf_ct_tmpl_alloc(),
      so the error handling needs to be changed to check for NULL instead of
      IS_ERR().
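
Why the two checks are not interchangeable can be seen from a user-space re-implementation of the error-pointer test. The sketch assumes the usual convention that error pointers occupy the top 4095 addresses; `is_err_model` and `MAX_ERRNO_MODEL` are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO_MODEL 4095

/* True only for pointers that encode a negative errno in the top
 * MAX_ERRNO_MODEL addresses; NULL is far below that range. */
static bool is_err_model(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_MODEL;
}
```

Because NULL does not fall in the error-pointer range, an error-pointer check on an allocator that reports failure with NULL never fires, and the failure goes unnoticed until the pointer is dereferenced.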
      
      Fixes: 0838aa7f ('netfilter: fix netns dependencies with conntrack templates')
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      1a727c63
    • netfilter: nf_conntrack: silence warning on falling back to vmalloc() · f0ad4621
      Pablo Neira Ayuso committed
      Since 88eab472 ("netfilter: conntrack: adjust nf_conntrack_buckets default
      value"), the hashtable can easily hit this warning. We got reports from
      users who are being spammed with this message, so silence it.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: Florian Westphal <fw@strlen.de>
      f0ad4621
    • act_bpf: fix memory leaks when replacing bpf programs · f4eaed28
      Daniel Borkmann committed
      We currently trigger multiple memory leaks when replacing bpf
      actions, besides others:
      
        comm "tc", pid 1909, jiffies 4294851310 (age 1602.796s)
        hex dump (first 32 bytes):
          01 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00  ................
          18 b0 98 6d 00 88 ff ff 00 00 00 00 00 00 00 00  ...m............
        backtrace:
          [<ffffffff817e623e>] kmemleak_alloc+0x4e/0xb0
          [<ffffffff8120a22d>] __vmalloc_node_range+0x1bd/0x2c0
          [<ffffffff8120a37a>] __vmalloc+0x4a/0x50
          [<ffffffff811a8d0a>] bpf_prog_alloc+0x3a/0xa0
          [<ffffffff816c0684>] bpf_prog_create+0x44/0xa0
          [<ffffffffa09ba4eb>] tcf_bpf_init+0x28b/0x3c0 [act_bpf]
          [<ffffffff816d7001>] tcf_action_init_1+0x191/0x1b0
          [<ffffffff816d70a2>] tcf_action_init+0x82/0xf0
          [<ffffffff816d4d12>] tcf_exts_validate+0xb2/0xc0
          [<ffffffffa09b5838>] cls_bpf_modify_existing+0x98/0x340 [cls_bpf]
          [<ffffffffa09b5cd6>] cls_bpf_change+0x1a6/0x274 [cls_bpf]
          [<ffffffff816d56e5>] tc_ctl_tfilter+0x335/0x910
          [<ffffffff816b9145>] rtnetlink_rcv_msg+0x95/0x240
          [<ffffffff816df34f>] netlink_rcv_skb+0xaf/0xc0
          [<ffffffff816b909e>] rtnetlink_rcv+0x2e/0x40
          [<ffffffff816deaaf>] netlink_unicast+0xef/0x1b0
      
      Issue is that the old content from tcf_bpf is allocated and needs
      to be released when we replace it. We seem to do that since the
      beginning of act_bpf on the filter and insns, later on the name as
      well.
      
      Example test case, after patch:
      
        # FOO="1,6 0 0 4294967295,"
        # BAR="1,6 0 0 4294967294,"
        # tc actions add action bpf bytecode "$FOO" index 2
        # tc actions show action bpf
         action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
         index 2 ref 1 bind 0
        # tc actions replace action bpf bytecode "$BAR" index 2
        # tc actions show action bpf
         action order 0: bpf bytecode '1,6 0 0 4294967294' default-action pipe
         index 2 ref 1 bind 0
        # tc actions replace action bpf bytecode "$FOO" index 2
        # tc actions show action bpf
         action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
         index 2 ref 1 bind 0
        # tc actions del action bpf index 2
        [...]
        # echo "scan" > /sys/kernel/debug/kmemleak
        # cat /sys/kernel/debug/kmemleak | grep "comm \"tc\"" | wc -l
        0
      
      Fixes: d23b8ad8 ("tc: add BPF based action")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f4eaed28
    • ipv6: flush nd cache on IFF_NOARP change · c8507fb2
      Eric Dumazet committed
      This patch is the IPv6 equivalent of commit
      6c8b4e3f ("arp: flush arp cache on IFF_NOARP change")
      
      Without it, we keep buggy neighbours in the cache, with destination
      MAC address equal to our own MAC address.
      
      Tested:
       tcpdump -i eth0 -s 0 ip6 -n -e &
       ip link set dev eth0 arp off
       ping6 remote   // sends buggy frames
       ip link set dev eth0 arp on
       ping6 remote   // should work once kernel is patched
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Mario Fanelli <mariofanelli@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c8507fb2