1. 15 2月, 2017 2 次提交
  2. 11 2月, 2017 1 次提交
  3. 03 2月, 2017 8 次提交
    • S
      ftrace: Have set_graph_function handle multiple functions in one write · e704eff3
      Steven Rostedt (VMware) 提交于
      Currently, only one function can be written to set_graph_function and
      set_graph_notrace. The last function in the list will have saved, even
      though other functions will be added then removed.
      
      Change the behavior to be the same as set_ftrace_function as to allow
      multiple functions to be written. If any one fails, none of them will be
      added. The addition of the functions are done at the end when the file is
      closed.
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      e704eff3
    • S
      ftrace: Do not hold references of ftrace_graph_{notrace_}hash out of graph_lock · 649b988b
      Steven Rostedt (VMware) 提交于
      The hashs ftrace_graph_hash and ftrace_graph_notrace_hash are modified
      within the graph_lock being held. Holding a pointer to them and passing them
      along can lead to a use of a stale pointer (fgd->hash). Move assigning the
      pointer and its use to within the holding of the lock. Note, it's an
      rcu_sched protected data, and other instances of referencing them are done
      with preemption disabled. But the file manipuation code must be protected by
      the lock.
      
      The fgd->hash pointer is set to NULL when the lock is being released.
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      649b988b
    • S
      tracing: Reset parser->buffer to allow multiple "puts" · 0e684b65
      Steven Rostedt (VMware) 提交于
      trace_parser_put() simply frees the allocated parser buffer. But it does not
      reset the pointer that was freed. This means that if trace_parser_put() is
      called on the same parser more than once, it will corrupt the allocation
      system. Setting parser->buffer to NULL after free allows it to be called
      more than once without any ill effect.
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      0e684b65
    • S
      ftrace: Have set_graph_functions handle write with RDWR · ae98d27a
      Steven Rostedt (VMware) 提交于
      Since reading the set_graph_functions uses seq functions, which sets the
      file->private_data pointer to a seq_file descriptor. On writes the
      ftrace_graph_data descriptor is set to file->private_data. But if the file
      is opened for RDWR, the ftrace_graph_write() will incorrectly use the
      file->private_data descriptor instead of
      ((struct seq_file *)file->private_data)->private pointer, and this can crash
      the kernel.
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      ae98d27a
    • S
      ftrace: Reset fgd->hash in ftrace_graph_write() · d4ad9a1c
      Steven Rostedt (VMware) 提交于
      fgd->hash is saved and then freed, but is never reset to either
      ftrace_graph_hash nor ftrace_graph_notrace_hash. But if multiple writes are
      performed, then the freed hash could be accessed again.
      
       # cd /sys/kernel/debug/tracing
       # head -1000 available_filter_functions > /tmp/funcs
       # cat /tmp/funcs > set_graph_function
      
      Causes:
      
       general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
       Modules linked in:  [...]
       CPU: 2 PID: 1337 Comm: cat Not tainted 4.10.0-rc2-test-00010-g6b052e9 #32
       Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
       task: ffff880113a12200 task.stack: ffffc90001940000
       RIP: 0010:free_ftrace_hash+0x7c/0x160
       RSP: 0018:ffffc90001943db0 EFLAGS: 00010246
       RAX: 6b6b6b6b6b6b6b6b RBX: 6b6b6b6b6b6b6b6b RCX: 6b6b6b6b6b6b6b6b
       RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff8800ce1e1d40
       RBP: ffff8800ce1e1d50 R08: 0000000000000000 R09: 0000000000006400
       R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
       R13: ffff8800ce1e1d40 R14: 0000000000004000 R15: 0000000000000001
       FS:  00007f9408a07740(0000) GS:ffff88011e500000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000aee1f0 CR3: 0000000116bb4000 CR4: 00000000001406e0
       Call Trace:
        ? ftrace_graph_write+0x150/0x190
        ? __vfs_write+0x1f6/0x210
        ? __audit_syscall_entry+0x17f/0x200
        ? rw_verify_area+0xdb/0x210
        ? _cond_resched+0x2b/0x50
        ? __sb_start_write+0xb4/0x130
        ? vfs_write+0x1c8/0x330
        ? SyS_write+0x62/0xf0
        ? do_syscall_64+0xa3/0x1b0
        ? entry_SYSCALL64_slow_path+0x25/0x25
       Code: 01 48 85 db 0f 84 92 00 00 00 b8 01 00 00 00 d3 e0 85 c0 7e 3f 83 e8 01 48 8d 6f 10 45 31 e4 4c 8d 34 c5 08 00 00 00 49 8b 45 08 <4a> 8b 34 20 48 85 f6 74 13 48 8b 1e 48 89 ef e8 20 fa ff ff 48
       RIP: free_ftrace_hash+0x7c/0x160 RSP: ffffc90001943db0
       ---[ end trace 999b48216bf4b393 ]---
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      d4ad9a1c
    • S
      ftrace: Replace (void *)1 with a meaningful macro name FTRACE_GRAPH_EMPTY · 555fc781
      Steven Rostedt (VMware) 提交于
      When the set_graph_function or set_graph_notrace contains no records, a
      banner is displayed of either "#### all functions enabled ####" or
      "#### all functions disabled ####" respectively. To tell the seq operations
      to do this, (void *)1 is passed as a return value. Instead of using a
      hardcoded meaningless variable, define it as a macro.
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      555fc781
    • S
      ftrace: Create a slight optimization on searching the ftrace_hash · 2b2c279c
      Steven Rostedt (VMware) 提交于
      This is a micro-optimization, but as it has to deal with a fast path of the
      function tracer, these optimizations can be noticed.
      
      The ftrace_lookup_ip() returns true if the given ip is found in the hash. If
      it's not found or the hash is NULL, it returns false. But there's some cases
      that a NULL hash is a true, and the ftrace_hash_empty() is tested before
      calling ftrace_lookup_ip() in those cases. But as ftrace_lookup_ip() tests
      that first, that adds a few extra unneeded instructions in those cases.
      
      A new static "always_inlined" function is created that does not perform the
      hash empty test. This most only be used by callers that do the check first
      anyway, as an empty or NULL hash could cause a crash if a lookup is
      performed on it.
      
      Also add kernel doc for the ftrace_lookup_ip() main function.
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      2b2c279c
    • S
      tracing: Add ftrace_hash_key() helper function · 2b0cce0e
      Steven Rostedt (VMware) 提交于
      Replace the couple of use cases that has small logic to produce the ftrace
      function key id with a helper function. No need for duplicate code.
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      2b0cce0e
  4. 21 1月, 2017 3 次提交
  5. 19 1月, 2017 2 次提交
  6. 18 1月, 2017 2 次提交
    • S
      tracing: Process constants for (un)likely() profiler · d45ae1f7
      Steven Rostedt (VMware) 提交于
      When running the likely/unlikely profiler, one of the results did not look
      accurate. It noted that the unlikely() in link_path_walk() was 100%
      incorrect. When I added a trace_printk() to see what was happening there, it
      became 80% correct! Looking deeper into what whas happening, I found that
      gcc split that if statement into two paths. One where the if statement
      became a constant, the other path a variable. The other path had the if
      statement always hit (making the unlikely there, always false), but since
      the #define unlikely() has:
      
        #define unlikely() (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0))
      
      Where constants are ignored by the branch profiler, the "constant" path
      made by the compiler was ignored, even though it was hit 80% of the time.
      
      By just passing the constant value to the __branch_check__() function and
      tracing it out of line (as always correct, as likely/unlikely isn't a factor
      for constants), then we get back the accurate readings of branches that were
      optimized by gcc causing part of the execution to become constant.
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      d45ae1f7
    • K
      uprobe: Find last occurrence of ':' when parsing uprobe PATH:OFFSET · 6496bb72
      Kenny Yu 提交于
      Previously, `create_trace_uprobe` found the *first* occurence
      of the ':' character when parsing `PATH:OFFSET` for a uprobe.
      However, if the path contains a ':' character, then the function
      would parse the path incorrectly. Even worse, if the path does not
      exist, the subsequent call to `kern_path()` would set `ret` to
      `ENOENT`, leading to very cryptic errno values in user space.
      
      The fix is to find the *last* occurence of ':'.
      
      How to repro:: The write fails with "No such file or directory", suggesting
      incorrectly that the `uprobe_events` file does not exist.
      
        $ mkdir testing && cd testing
        $ cp /bin/bash .
        $ cp /bin/bash ./bash:with:colon
        $ echo "p:uprobes/p__root_testing_bash_0x6 /root/testing/bash:0x6" > /sys/kernel/debug/tracing/uprobe_events     # this works
        $ echo "p:uprobes/p__root_testing_bash_with_colon_0x6 /root/testing/bash:with:colon:0x6" >> /sys/kernel/debug/tracing/uprobe_events     # this doesn't
        -bash: echo: write error: No such file or directory
      
      With the patch:
      
        $ echo "p:uprobes/p__root_testing_bash_0x6 /root/testing/bash:0x6" > /sys/kernel/debug/tracing/uprobe_events     # this still works
        $ echo "p:uprobes/p__root_testing_bash_with_colon_0x6 /root/testing/bash:with:colon:0x6" >> /sys/kernel/debug/tracing/uprobe_events     # this works now too!
        $ cat /sys/kernel/debug/tracing/uprobe_events
        p:uprobes/p__root_testing_bash_0x6 /root/testing/bash:0x0000000000000006
        p:uprobes/p__root_testing_bash_with_colon_0x6 /root/testing/bash:with:colon:0x0000000000000006
      
      Link: http://lkml.kernel.org/r/20170113165834.4081016-1-kennyyu@fb.comSigned-off-by: NKenny Yu <kennyyu@fb.com>
      Reviewed-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      6496bb72
  7. 02 1月, 2017 2 次提交
    • L
      Linux 4.10-rc2 · 0c744ea4
      Linus Torvalds 提交于
      0c744ea4
    • L
      Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 4759d386
      Linus Torvalds 提交于
      Pull DAX updates from Dan Williams:
       "The completion of Jan's DAX work for 4.10.
      
        As I mentioned in the libnvdimm-for-4.10 pull request, these are some
        final fixes for the DAX dirty-cacheline-tracking invalidation work
        that was merged through the -mm, ext4, and xfs trees in -rc1. These
        patches were prepared prior to the merge window, but we waited for
        4.10-rc1 to have a stable merge base after all the prerequisites were
        merged.
      
        Quoting Jan on the overall changes in these patches:
      
           "So I'd like all these 6 patches to go for rc2. The first three
            patches fix invalidation of exceptional DAX entries (a bug which
            is there for a long time) - without these patches data loss can
            occur on power failure even though user called fsync(2). The other
            three patches change locking of DAX faults so that ->iomap_begin()
            is called in a more relaxed locking context and we are safe to
            start a transaction there for ext4"
      
        These have received a build success notification from the kbuild
        robot, and pass the latest libnvdimm unit tests. There have not been
        any -next releases since -rc1, so they have not appeared there"
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        ext4: Simplify DAX fault path
        dax: Call ->iomap_begin without entry lock during dax fault
        dax: Finish fault completely when loading holes
        dax: Avoid page invalidation races and unnecessary radix tree traversals
        mm: Invalidate DAX radix tree entries only if appropriate
        ext2: Return BH_New buffers for zeroed blocks
      4759d386
  8. 31 12月, 2016 2 次提交
  9. 30 12月, 2016 2 次提交
    • O
      mm/filemap: fix parameters to test_bit() · 98473f9f
      Olof Johansson 提交于
       mm/filemap.c: In function 'clear_bit_unlock_is_negative_byte':
        mm/filemap.c:933:9: error: too few arguments to function 'test_bit'
          return test_bit(PG_waiters);
               ^~~~~~~~
      
      Fixes: b91e1302 ('mm: optimize PageWaiters bit use for unlock_page()')
      Signed-off-by: NOlof Johansson <olof@lixom.net>
      Brown-paper-bag-by: NLinus Torvalds <dummy@duh.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      98473f9f
    • L
      mm: optimize PageWaiters bit use for unlock_page() · b91e1302
      Linus Torvalds 提交于
      In commit 62906027 ("mm: add PageWaiters indicating tasks are
      waiting for a page bit") Nick Piggin made our page locking no longer
      unconditionally touch the hashed page waitqueue, which not only helps
      performance in general, but is particularly helpful on NUMA machines
      where the hashed wait queues can bounce around a lot.
      
      However, the "clear lock bit atomically and then test the waiters bit"
      sequence turns out to be much more expensive than it needs to be,
      because you get a nasty stall when trying to access the same word that
      just got updated atomically.
      
      On architectures where locking is done with LL/SC, this would be trivial
      to fix with a new primitive that clears one bit and tests another
      atomically, but that ends up not working on x86, where the only atomic
      operations that return the result end up being cmpxchg and xadd.  The
      atomic bit operations return the old value of the same bit we changed,
      not the value of an unrelated bit.
      
      On x86, we could put the lock bit in the high bit of the byte, and use
      "xadd" with that bit (where the overflow ends up not touching other
      bits), and look at the other bits of the result.  However, an even
      simpler model is to just use a regular atomic "and" to clear the lock
      bit, and then the sign bit in eflags will indicate the resulting state
      of the unrelated bit #7.
      
      So by moving the PageWaiters bit up to bit #7, we can atomically clear
      the lock bit and test the waiters bit on x86 too.  And architectures
      with LL/SC (which is all the usual RISC suspects), the particular bit
      doesn't matter, so they are fine with this approach too.
      
      This avoids the extra access to the same atomic word, and thus avoids
      the costly stall at page unlock time.
      
      The only downside is that the interface ends up being a bit odd and
      specialized: clear a bit in a byte, and test the sign bit.  Nick doesn't
      love the resulting name of the new primitive, but I'd rather make the
      name be descriptive and very clear about the limitation imposed by
      trying to work across all relevant architectures than make it be some
      generic thing that doesn't make the odd semantics explicit.
      
      So this introduces the new architecture primitive
      
          clear_bit_unlock_is_negative_byte();
      
      and adds the trivial implementation for x86.  We have a generic
      non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)"
      combination) which can be overridden by any architecture that can do
      better.  According to Nick, Power has the same hickup x86 has, for
      example, but some other architectures may not even care.
      
      All these optimizations mean that my page locking stress-test (which is
      just executing a lot of small short-lived shell scripts: "make test" in
      the git source tree) no longer makes our page locking look horribly bad.
      Before all these optimizations, just the unlock_page() costs were just
      over 3% of all CPU overhead on "make test".  After this, it's down to
      0.66%, so just a quarter of the cost it used to be.
      
      (The difference on NUMA is bigger, but there this micro-optimization is
      likely less noticeable, since the big issue on NUMA was not the accesses
      to 'struct page', but the waitqueue accesses that were already removed
      by Nick's earlier commit).
      Acked-by: NNick Piggin <npiggin@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Andrew Lutomirski <luto@kernel.org>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b91e1302
  10. 28 12月, 2016 10 次提交
  11. 27 12月, 2016 6 次提交
    • L
      crypto: testmgr - Use heap buffer for acomp test input · 02608e02
      Laura Abbott 提交于
      Christopher Covington reported a crash on aarch64 on recent Fedora
      kernels:
      
      kernel BUG at ./include/linux/scatterlist.h:140!
      Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
      Modules linked in:
      CPU: 2 PID: 752 Comm: cryptomgr_test Not tainted 4.9.0-11815-ge93b1cc8 #162
      Hardware name: linux,dummy-virt (DT)
      task: ffff80007c650080 task.stack: ffff800008910000
      PC is at sg_init_one+0xa0/0xb8
      LR is at sg_init_one+0x24/0xb8
      ...
      [<ffff000008398db8>] sg_init_one+0xa0/0xb8
      [<ffff000008350a44>] test_acomp+0x10c/0x438
      [<ffff000008350e20>] alg_test_comp+0xb0/0x118
      [<ffff00000834f28c>] alg_test+0x17c/0x2f0
      [<ffff00000834c6a4>] cryptomgr_test+0x44/0x50
      [<ffff0000080dac70>] kthread+0xf8/0x128
      [<ffff000008082ec0>] ret_from_fork+0x10/0x50
      
      The test vectors used for input are part of the kernel image. These
      inputs are passed as a buffer to sg_init_one which eventually blows up
      with BUG_ON(!virt_addr_valid(buf)). On arm64, virt_addr_valid returns
      false for the kernel image since virt_to_page will not return the
      correct page. Fix this by copying the input vectors to heap buffer
      before setting up the scatterlist.
      Reported-by: NChristopher Covington <cov@codeaurora.org>
      Fixes: d7db7a88 ("crypto: acomp - update testmgr with support for acomp")
      Signed-off-by: NLaura Abbott <labbott@redhat.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      02608e02
    • J
      ext4: Simplify DAX fault path · 1db17542
      Jan Kara 提交于
      Now that dax_iomap_fault() calls ->iomap_begin() without entry lock, we
      can use transaction starting in ext4_iomap_begin() and thus simplify
      ext4_dax_fault(). It also provides us proper retries in case of ENOSPC.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      1db17542
    • J
      dax: Call ->iomap_begin without entry lock during dax fault · 9f141d6e
      Jan Kara 提交于
      Currently ->iomap_begin() handler is called with entry lock held. If the
      filesystem held any locks between ->iomap_begin() and ->iomap_end()
      (such as ext4 which will want to hold transaction open), this would cause
      lock inversion with the iomap_apply() from standard IO path which first
      calls ->iomap_begin() and only then calls ->actor() callback which grabs
      entry locks for DAX (if it faults when copying from/to user provided
      buffers).
      
      Fix the problem by nesting grabbing of entry lock inside ->iomap_begin()
      - ->iomap_end() pair.
      Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      9f141d6e
    • J
      dax: Finish fault completely when loading holes · f449b936
      Jan Kara 提交于
      The only case when we do not finish the page fault completely is when we
      are loading hole pages into a radix tree. Avoid this special case and
      finish the fault in that case as well inside the DAX fault handler. It
      will allow us for easier iomap handling.
      Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      f449b936
    • J
      dax: Avoid page invalidation races and unnecessary radix tree traversals · e3fce68c
      Jan Kara 提交于
      Currently dax_iomap_rw() takes care of invalidating page tables and
      evicting hole pages from the radix tree when write(2) to the file
      happens. This invalidation is only necessary when there is some block
      allocation resulting from write(2). Furthermore in current place the
      invalidation is racy wrt page fault instantiating a hole page just after
      we have invalidated it.
      
      So perform the page invalidation inside dax_iomap_actor() where we can
      do it only when really necessary and after blocks have been allocated so
      nobody will be instantiating new hole pages anymore.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      e3fce68c
    • J
      mm: Invalidate DAX radix tree entries only if appropriate · c6dcf52c
      Jan Kara 提交于
      Currently invalidate_inode_pages2_range() and invalidate_mapping_pages()
      just delete all exceptional radix tree entries they find. For DAX this
      is not desirable as we track cache dirtiness in these entries and when
      they are evicted, we may not flush caches although it is necessary. This
      can for example manifest when we write to the same block both via mmap
      and via write(2) (to different offsets) and fsync(2) then does not
      properly flush CPU caches when modification via write(2) was the last
      one.
      
      Create appropriate DAX functions to handle invalidation of DAX entries
      for invalidate_inode_pages2_range() and invalidate_mapping_pages() and
      wire them up into the corresponding mm functions.
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      c6dcf52c