1. 11 5月, 2018 1 次提交
  2. 27 4月, 2018 1 次提交
  3. 19 4月, 2018 2 次提交
  4. 12 4月, 2018 1 次提交
    • S
      mm, vmscan, tracing: use pointer to reclaim_stat struct in trace event · d51d1e64
      Steven Rostedt 提交于
      The trace event trace_mm_vmscan_lru_shrink_inactive() currently has 12
      parameters! Seven of them are from the reclaim_stat structure.  This
      structure is currently local to mm/vmscan.c.  By moving it to the global
      vmstat.h header, we can also reference it from the vmscan tracepoints.
      In moving it, it brings down the overhead of passing so many arguments
      to the trace event.  In the future, we may limit the number of arguments
      that a trace event may pass (ideally just 6, but more realistically it
      may be 8).
      
      Before this patch, the code to call the trace event is this:
      
       0f 83 aa fe ff ff       jae    ffffffff811e6261 <shrink_inactive_list+0x1e1>
       48 8b 45 a0             mov    -0x60(%rbp),%rax
       45 8b 64 24 20          mov    0x20(%r12),%r12d
       44 8b 6d d4             mov    -0x2c(%rbp),%r13d
       8b 4d d0                mov    -0x30(%rbp),%ecx
       44 8b 75 cc             mov    -0x34(%rbp),%r14d
       44 8b 7d c8             mov    -0x38(%rbp),%r15d
       48 89 45 90             mov    %rax,-0x70(%rbp)
       8b 83 b8 fe ff ff       mov    -0x148(%rbx),%eax
       8b 55 c0                mov    -0x40(%rbp),%edx
       8b 7d c4                mov    -0x3c(%rbp),%edi
       8b 75 b8                mov    -0x48(%rbp),%esi
       89 45 80                mov    %eax,-0x80(%rbp)
       65 ff 05 e4 f7 e2 7e    incl   %gs:0x7ee2f7e4(%rip)        # 15bd0 <__preempt_count>
       48 8b 05 75 5b 13 01    mov    0x1135b75(%rip),%rax        # ffffffff8231bf68 <__tracepoint_mm_vmscan_lru_shrink_inactive+0x28>
       48 85 c0                test   %rax,%rax
       74 72                   je     ffffffff811e646a <shrink_inactive_list+0x3ea>
       48 89 c3                mov    %rax,%rbx
       4c 8b 10                mov    (%rax),%r10
       89 f8                   mov    %edi,%eax
       48 89 85 68 ff ff ff    mov    %rax,-0x98(%rbp)
       89 f0                   mov    %esi,%eax
       48 89 85 60 ff ff ff    mov    %rax,-0xa0(%rbp)
       89 c8                   mov    %ecx,%eax
       48 89 85 78 ff ff ff    mov    %rax,-0x88(%rbp)
       89 d0                   mov    %edx,%eax
       48 89 85 70 ff ff ff    mov    %rax,-0x90(%rbp)
       8b 45 8c                mov    -0x74(%rbp),%eax
       48 8b 7b 08             mov    0x8(%rbx),%rdi
       48 83 c3 18             add    $0x18,%rbx
       50                      push   %rax
       41 54                   push   %r12
       41 55                   push   %r13
       ff b5 78 ff ff ff       pushq  -0x88(%rbp)
       41 56                   push   %r14
       41 57                   push   %r15
       ff b5 70 ff ff ff       pushq  -0x90(%rbp)
       4c 8b 8d 68 ff ff ff    mov    -0x98(%rbp),%r9
       4c 8b 85 60 ff ff ff    mov    -0xa0(%rbp),%r8
       48 8b 4d 98             mov    -0x68(%rbp),%rcx
       48 8b 55 90             mov    -0x70(%rbp),%rdx
       8b 75 80                mov    -0x80(%rbp),%esi
       41 ff d2                callq  *%r10
      
      After the patch:
      
       0f 83 a8 fe ff ff       jae    ffffffff811e626d <shrink_inactive_list+0x1cd>
       8b 9b b8 fe ff ff       mov    -0x148(%rbx),%ebx
       45 8b 64 24 20          mov    0x20(%r12),%r12d
       4c 8b 6d a0             mov    -0x60(%rbp),%r13
       65 ff 05 f5 f7 e2 7e    incl   %gs:0x7ee2f7f5(%rip)        # 15bd0 <__preempt_count>
       4c 8b 35 86 5b 13 01    mov    0x1135b86(%rip),%r14        # ffffffff8231bf68 <__tracepoint_mm_vmscan_lru_shrink_inactive+0x28>
       4d 85 f6                test   %r14,%r14
       74 2a                   je     ffffffff811e6411 <shrink_inactive_list+0x371>
       49 8b 06                mov    (%r14),%rax
       8b 4d 8c                mov    -0x74(%rbp),%ecx
       49 8b 7e 08             mov    0x8(%r14),%rdi
       49 83 c6 18             add    $0x18,%r14
       4c 89 ea                mov    %r13,%rdx
       45 89 e1                mov    %r12d,%r9d
       4c 8d 45 b8             lea    -0x48(%rbp),%r8
       89 de                   mov    %ebx,%esi
       51                      push   %rcx
       48 8b 4d 98             mov    -0x68(%rbp),%rcx
       ff d0                   callq  *%rax
      
      Link: http://lkml.kernel.org/r/2559d7cb-ec60-1200-2362-04fa34fd02bb@fb.com
      Link: http://lkml.kernel.org/r/20180322121003.4177af15@gandalf.local.homeSigned-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Reported-by: NAlexei Starovoitov <ast@fb.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d51d1e64
  5. 11 4月, 2018 3 次提交
  6. 10 4月, 2018 3 次提交
    • D
      afs: Trace protocol errors · 5f702c8e
      David Howells 提交于
      Trace protocol errors detected in afs.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      5f702c8e
    • D
      afs: Locally edit directory data for mkdir/create/unlink/... · 63a4681f
      David Howells 提交于
      Locally edit the contents of an AFS directory upon a successful inode
      operation that modifies that directory (such as mkdir, create and unlink)
      so that we can avoid the current practice of re-downloading the directory
      after each change.
      
      This is viable provided that the directory version number we get back from
      the modifying RPC op is exactly incremented by 1 from what we had
      previously.  The data in the directory contents is in a defined format that
      we have to parse locally to perform lookups and readdir, so modifying isn't
      a problem.
      
      If the edit fails, we just clear the VALID flag on the directory and it
      will be reloaded next time it is needed.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      63a4681f
    • D
      afs: Prospectively look up extra files when doing a single lookup · 5cf9dd55
      David Howells 提交于
      When afs_lookup() is called, prospectively look up the next 50 uncached
      fids also from that same directory and cache the results, rather than just
      looking up the one file requested.
      
      This allows us to use the FS.InlineBulkStatus RPC op to increase efficiency
      by fetching up to 50 file statuses at a time.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      5cf9dd55
  7. 06 4月, 2018 4 次提交
  8. 04 4月, 2018 9 次提交
    • D
      fscache: Add more tracepoints · 08c2e3d0
      David Howells 提交于
      Add more tracepoints to fscache, including:
      
       (*) fscache_page - Tracks netfs pages known to fscache.
      
       (*) fscache_check_page - Tracks the netfs querying whether a page is
           pending storage.
      
       (*) fscache_wake_cookie - Tracks cookies being woken up after a page
           completes/aborts storage in the cache.
      
       (*) fscache_op - Tracks operations being initialised.
      
       (*) fscache_wrote_page - Tracks return of the backend write_page op.
      
       (*) fscache_gang_lookup - Tracks lookup of pages to be stored in the write
           operation.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      08c2e3d0
    • D
      fscache: Add tracepoints · a18feb55
      David Howells 提交于
      Add some tracepoints to fscache:
      
       (*) fscache_cookie - Tracks a cookie's usage count.
      
       (*) fscache_netfs - Logs registration of a network filesystem, including
           the pointer to the cookie allocated.
      
       (*) fscache_acquire - Logs cookie acquisition.
      
       (*) fscache_relinquish - Logs cookie relinquishment.
      
       (*) fscache_enable - Logs enablement of a cookie.
      
       (*) fscache_disable - Logs disablement of a cookie.
      
       (*) fscache_osm - Tracks execution of states in the object state machine.
      
      and cachefiles:
      
       (*) cachefiles_ref - Tracks a cachefiles object's usage count.
      
       (*) cachefiles_lookup - Logs result of lookup_one_len().
      
       (*) cachefiles_mkdir - Logs result of vfs_mkdir().
      
       (*) cachefiles_create - Logs result of vfs_create().
      
       (*) cachefiles_unlink - Logs calls to vfs_unlink().
      
       (*) cachefiles_rename - Logs calls to vfs_rename().
      
       (*) cachefiles_mark_active - Logs an object becoming active.
      
       (*) cachefiles_wait_active - Logs a wait for an old object to be
           destroyed.
      
       (*) cachefiles_mark_inactive - Logs an object becoming inactive.
      
       (*) cachefiles_mark_buried - Logs the burial of an object.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      a18feb55
    • C
      svc: Report xprt dequeue latency · 55f5088c
      Chuck Lever 提交于
      Record the time between when a rqstp is enqueued on a transport
      and when it is dequeued. This includes how long the rqstp waits on
      the queue and how long it takes the kernel scheduler to wake a
      nfsd thread to service it.
      
      The svc_xprt_dequeue trace point is altered to include the number
      of microseconds between xprt_enqueue and xprt_dequeue.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      55f5088c
    • C
      sunrpc: Report per-RPC execution stats · aaba72cd
      Chuck Lever 提交于
      Introduce a mechanism to report the server-side execution latency of
      each RPC. The goal is to enable user space to filter the trace
      record for latency outliers, build histograms, etc.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      aaba72cd
    • C
      sunrpc: Re-purpose trace_svc_process · 0b9547bf
      Chuck Lever 提交于
      Currently, trace_svc_process has two call sites:
      
      1. Just after a call to svc_send. svc_send already invokes
         trace_svc_send with the same arguments just before returning
      
      2. Just before a call to svc_drop. svc_drop already invokes
         trace_svc_drop with the same arguments just after it is called
      
      Therefore trace_svc_process does not provide any additional
      information not already provided by these other trace points.
      
      However, it would be useful to record the incoming RPC procedure.
      So reuse trace_svc_process for this purpose.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      0b9547bf
    • C
      sunrpc: Save remote presentation address in svc_xprt for trace events · ece200dd
      Chuck Lever 提交于
      TP_printk defines a format string that is passed to user space for
      converting raw trace event records to something human-readable.
      
      My user space's printf (Oracle Linux 7), however, does not have a
      %pI format specifier. The result is that what is supposed to be an
      IP address in the output of "trace-cmd report" is just a string that
      says the field couldn't be displayed.
      
      To fix this, adopt the same approach as the client: maintain a pre-
      formated presentation address for occasions when %pI is not
      available.
      
      The location of the trace_svc_send trace point is adjusted so that
      rqst->rq_xprt is not NULL when the trace event is recorded.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      ece200dd
    • C
      sunrpc: Simplify trace_svc_recv · 41f306d0
      Chuck Lever 提交于
      There doesn't seem to be a lot of value in calling trace_svc_recv
      in the failing case.
      
      1. There are two very common cases: one is the transport is not
      ready, and the other is shutdown. Neither is terribly interesting.
      
      2. The trace record for the failing case contains nothing but
      the status code.
      
      Therefore the trace point call site in the error exit is removed.
      Since the trace point is now recording a length instead of a
      status, rename the status field and remove the case that records a
      zero XID.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      41f306d0
    • C
      sunrpc: Move trace_svc_xprt_dequeue() · caa3e106
      Chuck Lever 提交于
      Reduce the amount of noise generated by trace_svc_xprt_dequeue by
      moving it to the end of svc_get_next_xprt. This generates exactly
      one trace event when a ready xprt is found, rather than spurious
      events when there is no work to do. The empty events contain no
      information that can't be obtained simply by tracing function calls
      to svc_xprt_dequeue.
      
      A small additional benefit is simplification of the svc_xprt_event
      trace class, which no longer has to handle the case when the @xprt
      parameter is NULL.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      caa3e106
    • C
      sunrpc: Update show_svc_xprt_flags() to include recently added flags · 03edb90f
      Chuck Lever 提交于
      XPT_KILL_TEMP was added by commit 546125d1 ("sunrpc: don't call
      sleeping functions from the notifier block callbacks"), and
      XPT_CONG_CTRL was added by commit 362142b2 ("sunrpc: flag
      transports as having congestion control") .
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      03edb90f
  9. 31 3月, 2018 5 次提交
  10. 29 3月, 2018 2 次提交
    • A
      bpf: introduce BPF_RAW_TRACEPOINT · c4f6699d
      Alexei Starovoitov 提交于
      Introduce BPF_PROG_TYPE_RAW_TRACEPOINT bpf program type to access
      kernel internal arguments of the tracepoints in their raw form.
      
      >From bpf program point of view the access to the arguments look like:
      struct bpf_raw_tracepoint_args {
             __u64 args[0];
      };
      
      int bpf_prog(struct bpf_raw_tracepoint_args *ctx)
      {
        // program can read args[N] where N depends on tracepoint
        // and statically verified at program load+attach time
      }
      
      kprobe+bpf infrastructure allows programs access function arguments.
      This feature allows programs access raw tracepoint arguments.
      
      Similar to proposed 'dynamic ftrace events' there are no abi guarantees
      to what the tracepoints arguments are and what their meaning is.
      The program needs to type cast args properly and use bpf_probe_read()
      helper to access struct fields when argument is a pointer.
      
      For every tracepoint __bpf_trace_##call function is prepared.
      In assembler it looks like:
      (gdb) disassemble __bpf_trace_xdp_exception
      Dump of assembler code for function __bpf_trace_xdp_exception:
         0xffffffff81132080 <+0>:     mov    %ecx,%ecx
         0xffffffff81132082 <+2>:     jmpq   0xffffffff811231f0 <bpf_trace_run3>
      
      where
      
      TRACE_EVENT(xdp_exception,
              TP_PROTO(const struct net_device *dev,
                       const struct bpf_prog *xdp, u32 act),
      
      The above assembler snippet is casting 32-bit 'act' field into 'u64'
      to pass into bpf_trace_run3(), while 'dev' and 'xdp' args are passed as-is.
      All of ~500 of __bpf_trace_*() functions are only 5-10 byte long
      and in total this approach adds 7k bytes to .text.
      
      This approach gives the lowest possible overhead
      while calling trace_xdp_exception() from kernel C code and
      transitioning into bpf land.
      Since tracepoint+bpf are used at speeds of 1M+ events per second
      this is valuable optimization.
      
      The new BPF_RAW_TRACEPOINT_OPEN sys_bpf command is introduced
      that returns anon_inode FD of 'bpf-raw-tracepoint' object.
      
      The user space looks like:
      // load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type
      prog_fd = bpf_prog_load(...);
      // receive anon_inode fd for given bpf_raw_tracepoint with prog attached
      raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd);
      
      Ctrl-C of tracing daemon or cmdline tool that uses this feature
      will automatically detach bpf program, unload it and
      unregister tracepoint probe.
      
      On the kernel side the __bpf_raw_tp_map section of pointers to
      tracepoint definition and to __bpf_trace_*() probe function is used
      to find a tracepoint with "xdp_exception" name and
      corresponding __bpf_trace_xdp_exception() probe function
      which are passed to tracepoint_probe_register() to connect probe
      with tracepoint.
      
      Addition of bpf_raw_tracepoint doesn't interfere with ftrace and perf
      tracepoint mechanisms. perf_event_open() can be used in parallel
      on the same tracepoint.
      Multiple bpf_raw_tracepoint_open("xdp_exception", prog_fd) are permitted.
      Each with its own bpf program. The kernel will execute
      all tracepoint probes and all attached bpf programs.
      
      In the future bpf_raw_tracepoints can be extended with
      query/introspection logic.
      
      __bpf_raw_tp_map section logic was contributed by Steven Rostedt
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Acked-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      c4f6699d
    • A
      treewide: remove large struct-pass-by-value from tracepoint arguments · c1055475
      Alexei Starovoitov 提交于
      - fix trace_hfi1_ctxt_info() to pass large struct by reference instead of by value
      - convert 'type array[]' tracepoint arguments into 'type *array',
        since compiler will warn that sizeof('type array[]') == sizeof('type *array')
        and later should be used instead
      
      The CAST_TO_U64 macro in the later patch will enforce that tracepoint
      arguments can only be integers, pointers, or less than 8 byte structures.
      Larger structures should be passed by reference.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      c1055475
  11. 28 3月, 2018 3 次提交
    • D
      rxrpc: Trace call completion · 1bae5d22
      David Howells 提交于
      Add a tracepoint to track rxrpc calls moving into the completed state and
      to log the completion type and the recorded error value and abort code.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      1bae5d22
    • D
      rxrpc, afs: Use debug_ids rather than pointers in traces · a25e21f0
      David Howells 提交于
      In rxrpc and afs, use the debug_ids that are monotonically allocated to
      various objects as they're allocated rather than pointers as kernel
      pointers are now hashed making them less useful.  Further, the debug ids
      aren't reused anywhere nearly as quickly.
      
      In addition, allow kernel services that use rxrpc, such as afs, to take
      numbers from the rxrpc counter, assign them to their own call struct and
      pass them in to rxrpc for both client and service calls so that the trace
      lines for each will have the same ID tag.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      a25e21f0
    • D
      rxrpc: Trace resend · 827efed6
      David Howells 提交于
      Add a tracepoint to trace packet resend events and to dump the Tx
      annotation buffer for added illumination.
      Signed-off-by: NDavid Howells <dhowells@rdhat.com>
      827efed6
  12. 26 3月, 2018 1 次提交
  13. 15 3月, 2018 1 次提交
  14. 23 2月, 2018 1 次提交
  15. 21 2月, 2018 2 次提交
  16. 19 2月, 2018 1 次提交