1. 29 6月, 2016 2 次提交
  2. 28 6月, 2016 3 次提交
  3. 27 6月, 2016 3 次提交
  4. 25 6月, 2016 4 次提交
    • K
      Revert "mm: make faultaround produce old ptes" · 315d09bf
      Kirill A. Shutemov 提交于
      This reverts commit 5c0a85fa.
      
      The commit causes ~6% regression in unixbench.
      
      Let's revert it for now and consider other solution for reclaim problem
      later.
      
      Link: http://lkml.kernel.org/r/1465893750-44080-2-git-send-email-kirill.shutemov@linux.intel.comSigned-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: N"Huang, Ying" <ying.huang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Vinayak Menon <vinmenon@codeaurora.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      315d09bf
    • A
      mm: mempool: kasan: don't poot mempool objects in quarantine · 9b75a867
      Andrey Ryabinin 提交于
      Currently we may put reserved by mempool elements into quarantine via
      kasan_kfree().  This is totally wrong since quarantine may really free
      these objects.  So when mempool will try to use such element,
      use-after-free will happen.  Or mempool may decide that it no longer
      need that element and double-free it.
      
      So don't put object into quarantine in kasan_kfree(), just poison it.
      Rename kasan_kfree() to kasan_poison_kfree() to respect that.
      
      Also, we shouldn't use kasan_slab_alloc()/kasan_krealloc() in
      kasan_unpoison_element() because those functions may update allocation
      stacktrace.  This would be wrong for the most of the remove_element call
      sites.
      
      (The only call site where we may want to update alloc stacktrace is
       in mempool_alloc(). Kmemleak solves this by calling
       kmemleak_update_trace(), so we could make something like that too.
       But this is out of scope of this patch).
      
      Fixes: 55834c59 ("mm: kasan: initial memory quarantine implementation")
      Link: http://lkml.kernel.org/r/575977C3.1010905@virtuozzo.comSigned-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Reported-by: NKuthonuzo Luruo <kuthonuzo.luruo@hpe.com>
      Acked-by: NAlexander Potapenko <glider@google.com>
      Cc: Dmitriy Vyukov <dvyukov@google.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9b75a867
    • L
      fix up initial thread stack pointer vs thread_info confusion · 7f1a00b6
      Linus Torvalds 提交于
      The INIT_TASK() initializer was similarly confused about the stack vs
      thread_info allocation that the allocators had, and that were fixed in
      commit b235beea ("Clarify naming of thread info/stack allocators").
      
      The task ->stack pointer only incidentally ends up having the same value
      as the thread_info, and in fact that will change.
      
      So fix the initial task struct initializer to point to 'init_stack'
      instead of 'init_thread_info', and make sure the ia64 definition for
      that exists.
      
      This actually makes the ia64 tsk->stack pointer be sensible for the
      initial task, but not for any other task.  As mentioned in commit
      b235beea, that whole pointer isn't actually used on ia64, since
      task_stack_page() there just points to the (single) allocation.
      
      All the other architectures seem to have copied the 'init_stack'
      definition, even if it tended to be generally unusued.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7f1a00b6
    • L
      Clarify naming of thread info/stack allocators · b235beea
      Linus Torvalds 提交于
      We've had the thread info allocated together with the thread stack for
      most architectures for a long time (since the thread_info was split off
      from the task struct), but that is about to change.
      
      But the patches that move the thread info to be off-stack (and a part of
      the task struct instead) made it clear how confused the allocator and
      freeing functions are.
      
      Because the common case was that we share an allocation with the thread
      stack and the thread_info, the two pointers were identical.  That
      identity then meant that we would have things like
      
      	ti = alloc_thread_info_node(tsk, node);
      	...
      	tsk->stack = ti;
      
      which certainly _worked_ (since stack and thread_info have the same
      value), but is rather confusing: why are we assigning a thread_info to
      the stack? And if we move the thread_info away, the "confusing" code
      just gets to be entirely bogus.
      
      So remove all this confusion, and make it clear that we are doing the
      stack allocation by renaming and clarifying the function names to be
      about the stack.  The fact that the thread_info then shares the
      allocation is an implementation detail, and not really about the
      allocation itself.
      
      This is a pure renaming and type fix: we pass in the same pointer, it's
      just that we clarify what the pointer means.
      
      The ia64 code that actually only has one single allocation (for all of
      task_struct, thread_info and kernel thread stack) now looks a bit odd,
      but since "tsk->stack" is actually not even used there, that oddity
      doesn't matter.  It would be a separate thing to clean that up, I
      intentionally left the ia64 changes as a pure brute-force renaming and
      type change.
      Acked-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b235beea
  5. 24 6月, 2016 4 次提交
  6. 23 6月, 2016 2 次提交
  7. 22 6月, 2016 1 次提交
    • D
      rxrpc: Fix exclusive connection handling · cc8feb8e
      David Howells 提交于
      "Exclusive connections" are meant to be used for a single client call and
      then scrapped.  The idea is to limit the use of the negotiated security
      context.  The current code, however, isn't doing this: it is instead
      restricting the socket to a single virtual connection and doing all the
      calls over that.
      
      This is changed such that the socket no longer maintains a special virtual
      connection over which it will do all the calls, but rather gets a new one
      each time a new exclusive call is made.
      
      Further, using a socket option for this is a poor choice.  It should be
      done on sendmsg with a control message marker instead so that calls can be
      marked exclusive individually.  To that end, add RXRPC_EXCLUSIVE_CALL
      which, if passed to sendmsg() as a control message element, will cause the
      call to be done on an single-use connection.
      
      The socket option (RXRPC_EXCLUSIVE_CONNECTION) still exists and, if set,
      will override any lack of RXRPC_EXCLUSIVE_CALL being specified so that
      programs using the setsockopt() will appear to work the same.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      cc8feb8e
  8. 20 6月, 2016 1 次提交
  9. 19 6月, 2016 4 次提交
  10. 18 6月, 2016 4 次提交
  11. 17 6月, 2016 1 次提交
  12. 16 6月, 2016 11 次提交
    • D
      bpf, maps: flush own entries on perf map release · 3b1efb19
      Daniel Borkmann 提交于
      The behavior of perf event arrays are quite different from all
      others as they are tightly coupled to perf event fds, f.e. shown
      recently by commit e03e7ee3 ("perf/bpf: Convert perf_event_array
      to use struct file") to make refcounting on perf event more robust.
      A remaining issue that the current code still has is that since
      additions to the perf event array take a reference on the struct
      file via perf_event_get() and are only released via fput() (that
      cleans up the perf event eventually via perf_event_release_kernel())
      when the element is either manually removed from the map from user
      space or automatically when the last reference on the perf event
      map is dropped. However, this leads us to dangling struct file's
      when the map gets pinned after the application owning the perf
      event descriptor exits, and since the struct file reference will
      in such case only be manually dropped or via pinned file removal,
      it leads to the perf event living longer than necessary, consuming
      needlessly resources for that time.
      
      Relations between perf event fds and bpf perf event map fds can be
      rather complex. F.e. maps can act as demuxers among different perf
      event fds that can possibly be owned by different threads and based
      on the index selection from the program, events get dispatched to
      one of the per-cpu fd endpoints. One perf event fd (or, rather a
      per-cpu set of them) can also live in multiple perf event maps at
      the same time, listening for events. Also, another requirement is
      that perf event fds can get closed from application side after they
      have been attached to the perf event map, so that on exit perf event
      map will take care of dropping their references eventually. Likewise,
      when such maps are pinned, the intended behavior is that a user
      application does bpf_obj_get(), puts its fds in there and on exit
      when fd is released, they are dropped from the map again, so the map
      acts rather as connector endpoint. This also makes perf event maps
      inherently different from program arrays as described in more detail
      in commit c9da161c ("bpf: fix clearing on persistent program
      array maps").
      
      To tackle this, map entries are marked by the map struct file that
      added the element to the map. And when the last reference to that map
      struct file is released from user space, then the tracked entries
      are purged from the map. This is okay, because new map struct files
      instances resp. frontends to the anon inode are provided via
      bpf_map_new_fd() that is called when we invoke bpf_obj_get_user()
      for retrieving a pinned map, but also when an initial instance is
      created via map_create(). The rest is resolved by the vfs layer
      automatically for us by keeping reference count on the map's struct
      file. Any concurrent updates on the map slot are fine as well, it
      just means that perf_event_fd_array_release() needs to delete less
      of its own entires.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3b1efb19
    • D
      bpf, maps: extend map_fd_get_ptr arguments · d056a788
      Daniel Borkmann 提交于
      This patch extends map_fd_get_ptr() callback that is used by fd array
      maps, so that struct file pointer from the related map can be passed
      in. It's safe to remove map_update_elem() callback for the two maps since
      this is only allowed from syscall side, but not from eBPF programs for these
      two map types. Like in per-cpu map case, bpf_fd_array_map_update_elem()
      needs to be called directly here due to the extra argument.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d056a788
    • D
      bpf, maps: add release callback · 61d1b6a4
      Daniel Borkmann 提交于
      Add a release callback for maps that is invoked when the last
      reference to its struct file is gone and the struct file about
      to be released by vfs. The handler will be used by fd array maps.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      61d1b6a4
    • A
      bpf: fix matching of data/data_end in verifier · 19de99f7
      Alexei Starovoitov 提交于
      The ctx structure passed into bpf programs is different depending on bpf
      program type. The verifier incorrectly marked ctx->data and ctx->data_end
      access based on ctx offset only. That caused loads in tracing programs
      int bpf_prog(struct pt_regs *ctx) { .. ctx->ax .. }
      to be incorrectly marked as PTR_TO_PACKET which later caused verifier
      to reject the program that was actually valid in tracing context.
      Fix this by doing program type specific matching of ctx offsets.
      
      Fixes: 969bf05e ("bpf: direct packet access")
      Reported-by: NSasha Goldshtein <goldshtn@gmail.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      19de99f7
    • J
      net: Don't forget pr_fmt on net_dbg_ratelimited for CONFIG_DYNAMIC_DEBUG · daddef76
      Jason A. Donenfeld 提交于
      The implementation of net_dbg_ratelimited in the CONFIG_DYNAMIC_DEBUG
      case was added with 2c94b537 ("net: Implement net_dbg_ratelimited() for
      CONFIG_DYNAMIC_DEBUG case"). The implementation strategy was to take the
      usual definition of the dynamic_pr_debug macro, but alter it by adding a
      call to "net_ratelimit()" in the if statement. This is, in fact, the
      correct approach.
      
      However, while doing this, the author of the commit forgot to surround
      fmt by pr_fmt, resulting in unprefixed log messages appearing in the
      console. So, this commit adds back the pr_fmt(fmt) invocation, making
      net_dbg_ratelimited properly consistent across DEBUG, no DEBUG, and
      DYNAMIC_DEBUG cases, and bringing parity with the behavior of
      dynamic_pr_debug as well.
      
      Fixes: 2c94b537 ("net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case")
      Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
      Cc: Tim Bingham <tbingham@akamai.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      daddef76
    • A
      ipv6: introduce neighbour discovery ops · f997c55c
      Alexander Aring 提交于
      This patch introduces neighbour discovery ops callback structure. The
      idea is to separate the handling for 6LoWPAN into the 6lowpan module.
      
      These callback offers 6lowpan different handling, such as 802.15.4 short
      address handling or RFC6775 (Neighbor Discovery Optimization for IPv6
      over 6LoWPANs).
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: James Morris <jmorris@namei.org>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Acked-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: NAlexander Aring <aar@pengutronix.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f997c55c
    • A
      6lowpan: add private neighbour data · 8626a0c8
      Alexander Aring 提交于
      This patch will introduce a 6lowpan neighbour private data. Like the
      interface private data we handle private data for generic 6lowpan and
      for link-layer specific 6lowpan.
      
      The current first use case if to save the short address for a 802.15.4
      6lowpan neighbour.
      
      Cc: David S. Miller <davem@davemloft.net>
      Reviewed-by: NStefan Schmidt <stefan@osg.samsung.com>
      Acked-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: NAlexander Aring <aar@pengutronix.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8626a0c8
    • E
      net_sched: add the ability to defer skb freeing · 1b5c5493
      Eric Dumazet 提交于
      qdisc are changed under RTNL protection and often
      while blocking BH and root qdisc spinlock.
      
      When lots of skbs need to be dropped, we free
      them under these locks causing TX/RX freezes,
      and more generally latency spikes.
      
      This commit adds rtnl_kfree_skbs(), used to queue
      skbs for deferred freeing.
      
      Actual freeing happens right after RTNL is released,
      with appropriate scheduling points.
      
      rtnl_qdisc_drop() can also be used in place
      of disc_drop() when RTNL is held.
      
      qdisc_reset_queue() and __qdisc_reset_queue() get
      the new behavior, so standard qdiscs like pfifo, pfifo_fast...
      have their ->reset() method automatically handled.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1b5c5493
    • M
      skb_array: resize support · 7d7072e3
      Michael S. Tsirkin 提交于
      Update skb_array after ptr_ring API changes.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Tested-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7d7072e3
    • M
      ptr_ring: resize support · 5d49de53
      Michael S. Tsirkin 提交于
      This adds ring resize support. Seems to be necessary as
      users such as tun allow userspace control over queue size.
      
      If resize is used, this costs us ability to peek at queue without
      consumer lock - should not be a big deal as peek and consumer are
      usually run on the same CPU.
      
      If ring is made bigger, ring contents is preserved.  If ring is made
      smaller, extra pointers are passed to an optional destructor callback.
      
      Cleanup function also gains destructor callback such that
      all pointers in queue can be cleaned up.
      
      This changes some APIs but we don't have any users yet,
      so it won't break bisect.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5d49de53
    • M
      skb_array: array based FIFO for skbs · ad69f35d
      Michael S. Tsirkin 提交于
      A simple array based FIFO of pointers.  Intended for net stack so uses
      skbs for type safety. Implemented as a set of wrappers around ptr_ring.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Tested-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ad69f35d