1. 01 Dec, 2019 2 commits
    • rss_stat: add support to detect RSS updates of external mm · e4dcad20
      Joel Fernandes (Google) committed
      When a process updates the RSS of a different process, the rss_stat
      tracepoint appears in the context of the process doing the update.  This
      can mislead userspace into thinking that the RSS of the process doing
      the update changed, when in reality a different process's RSS was updated.
      
      This issue happens in reclaim paths such as with direct reclaim or
      background reclaim.
      
      This patch adds more information to the tracepoint about whether the mm
      being updated belongs to the current process's context (curr field).  We
      also include a hash of the mm pointer so that the process that owns the
      mm can be uniquely identified (mm_id field).
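
The two fields described above can be modeled in plain C. This is a minimal userspace sketch, not the kernel code: `sketch_mm`, `sketch_rss_event`, `sketch_ptr_hash` and `sketch_make_event` are hypothetical names, and the hash is a generic 64-bit mixing function standing in for whatever keyed hash the kernel applies to pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's mm_struct; only its address matters. */
struct sketch_mm { int unused; };

/* Hypothetical event record carrying the two fields the commit describes. */
struct sketch_rss_event {
    uint32_t mm_id; /* hash of the mm pointer, never the raw address */
    bool curr;      /* true when the updated mm is the caller's own mm */
    long size;      /* new counter value */
};

/*
 * Stand-in pointer hash (a generic 64-bit mixer). The kernel's real hash
 * is keyed; this only illustrates "same pointer gives same id".
 */
static uint32_t sketch_ptr_hash(const void *ptr)
{
    uint64_t x = (uint64_t)(uintptr_t)ptr;

    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    x *= 0xc4ceb9fe1a85ec53ULL;
    x ^= x >> 33;
    return (uint32_t)x;
}

/* Build an event for an update to `updated`, made by a task whose mm is `current_mm`. */
static struct sketch_rss_event sketch_make_event(const struct sketch_mm *updated,
                                                 const struct sketch_mm *current_mm,
                                                 long size)
{
    struct sketch_rss_event ev;

    assert(updated != NULL);
    ev.mm_id = sketch_ptr_hash(updated);
    ev.curr = (updated == current_mm);
    ev.size = size;
    return ev;
}
```

A consumer of such events would group them by `mm_id` and treat `curr == false` as a sign that the emitting task (for example a reclaim path) is not the owner of the updated RSS.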
      
      Also vsprintf.c is refactored a bit to allow reuse of hashing code.
      
      [akpm@linux-foundation.org: remove unused local `str']
      [joelaf@google.com: inline call to ptr_to_hashval]
        Link: http://lore.kernel.org/r/20191113153816.14b95acd@gandalf.local.home
        Link: http://lkml.kernel.org/r/20191114164622.GC233237@google.com
      Link: http://lkml.kernel.org/r/20191106024452.81923-1-joel@joelfernandes.org
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Reported-by: Ioannis Ilkos <ilkos@google.com>
      Acked-by: Petr Mladek <pmladek@suse.com>	[lib/vsprintf.c]
      Cc: Tim Murray <timmurray@google.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Carmen Jackson <carmenjackson@google.com>
      Cc: Mayank Gupta <mayankgupta@google.com>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: emit tracepoint when RSS changes · b3d1411b
      Joel Fernandes (Google) committed
      Useful to track how RSS is changing per TGID to detect spikes in RSS and
      memory hogs.  Several Android teams have been using this patch in
      various kernel trees for half a year now.  Many reported to me it is
      really useful so I'm posting it upstream.
      
      Initial patch developed by Tim Murray.  Changes I made from original
      patch:
      o Prevent any additional space consumed by mm_struct.
      
      Regarding the concern that the RSS may change too often and flood the
      traces: note that there is already some hysteresis.  That
      is, we update the counter only after receiving 64 page faults, due to
      SPLIT_RSS_ACCOUNTING.  However, during zapping or copying of a pte range,
      the RSS is updated immediately, which can become noisy.  In a
      previous discussion, we agreed that BPF or ftrace can be used to rate
      limit the signal if this becomes an issue.
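
The batching behaviour described above can be sketched as follows. This is a simplified userspace model under stated assumptions, not the kernel's accounting code: the struct and function names are hypothetical, and the only detail taken from the text is the threshold of 64 page faults before the shared counter (and hence the trace event) is updated.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BATCH 64 /* threshold quoted in the commit message */

/* Hypothetical shared counter; trace_events counts emitted trace records. */
struct sketch_shared_rss {
    long pages;
    unsigned long trace_events;
};

/* Hypothetical per-task cache of not-yet-published page faults. */
struct sketch_task_cache {
    int pending;
};

/* Fault path: accumulate locally, publish (and trace) once per batch. */
static void sketch_fault(struct sketch_shared_rss *mm, struct sketch_task_cache *tc)
{
    assert(mm != NULL && tc != NULL);
    tc->pending++;
    if (tc->pending >= SKETCH_BATCH) {
        mm->pages += tc->pending;
        mm->trace_events++; /* stand-in for emitting the tracepoint */
        tc->pending = 0;
    }
}

/* Zap/copy path: the shared counter changes immediately, one event per call. */
static void sketch_update_now(struct sketch_shared_rss *mm, long delta)
{
    assert(mm != NULL);
    mm->pages += delta;
    mm->trace_events++;
}
```

In this model 64 faults produce a single event, while every call on the immediate path produces one, which is where rate limiting on the consumer side would matter.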
      
      Also note that I added wrappers to trace_rss_stat to prevent compiler
      errors where linux/mm.h is included from tracing code, causing errors
      such as:
      
          CC      kernel/trace/power-traces.o
        In file included from ./include/trace/define_trace.h:102,
                         from ./include/trace/events/kmem.h:342,
                         from ./include/linux/mm.h:31,
                         from ./include/linux/ring_buffer.h:5,
                         from ./include/linux/trace_events.h:6,
                         from ./include/trace/events/power.h:12,
                         from kernel/trace/power-traces.c:15:
        ./include/trace/trace_events.h:113:22: error: field `ent' has incomplete type
           struct trace_entry ent;    \
      
      Link: http://lore.kernel.org/r/20190903200905.198642-1-joel@joelfernandes.org
      Link: http://lkml.kernel.org/r/20191001172817.234886-1-joel@joelfernandes.org
      Co-developed-by: Tim Murray <timmurray@google.com>
      Signed-off-by: Tim Murray <timmurray@google.com>
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Carmen Jackson <carmenjackson@google.com>
      Cc: Mayank Gupta <mayankgupta@google.com>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 26 Nov, 2019 1 commit
  3. 25 Nov, 2019 1 commit
    • writeback: fix -Wformat compilation warnings · 40363cf1
      Qian Cai committed
      Commit f05499a0 ("writeback: use ino_t for inodes in
      tracepoints") introduced many GCC compilation warnings on s390:
      
      In file included from ./include/trace/define_trace.h:102,
                       from ./include/trace/events/writeback.h:904,
                       from fs/fs-writeback.c:82:
      ./include/trace/events/writeback.h: In function
      'trace_raw_output_writeback_page_template':
      ./include/trace/events/writeback.h:76:12: warning: format '%lu' expects
      argument of type 'long unsigned int', but argument 4 has type 'ino_t'
      {aka 'unsigned int'} [-Wformat=]
        TP_printk("bdi %s: ino=%lu index=%lu",
                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~
      ./include/trace/trace_events.h:360:22: note: in definition of macro
      'DECLARE_EVENT_CLASS'
        trace_seq_printf(s, print);     \
                            ^~~~~
      ./include/trace/events/writeback.h:76:2: note: in expansion of macro
      'TP_printk'
        TP_printk("bdi %s: ino=%lu index=%lu",
        ^~~~~~~~~
      
      Fix them by adding necessary casts where ino_t could be either "unsigned
      int" or "unsigned long".
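
The kind of cast the fix describes can be shown in a small userspace sketch. `sketch_format_ino` is a hypothetical helper, and `snprintf` stands in for the tracing print macro; the point is only that casting to `unsigned long` makes the argument match `%lu` whatever the underlying width of `ino_t`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Format an inode number with %lu. The explicit cast keeps the argument
 * type in step with the format specifier on platforms where ino_t is
 * "unsigned int" as well as on those where it is "unsigned long".
 */
static int sketch_format_ino(char *buf, size_t len, ino_t ino, unsigned long index)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "ino=%lu index=%lu", (unsigned long)ino, index);
}
```

One caveat for userspace builds: if `ino_t` is wider than `unsigned long`, this cast truncates, so a wider format and cast would be needed there.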
      
      Fixes: f05499a0 ("writeback: use ino_t for inodes in tracepoints")
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  4. 23 Nov, 2019 1 commit
    • SUNRPC: Capture completion of all RPC tasks · a264abad
      Chuck Lever committed
      RPC tasks on the backchannel never invoke xprt_complete_rqst(), so
      there is no way to report their tk_status at completion. Also, any
      RPC task that exits via rpc_exit_task() before it is replied to will
      also disappear without a trace.
      
      Introduce a trace point that is symmetrical with rpc_task_begin that
      captures the termination status of each RPC task.
      
      Sample trace output for callback requests initiated on the server:
         kworker/u8:12-448   [003]   127.025240: rpc_task_end:         task:50@3 flags=ASYNC|DYNAMIC|SOFT|SOFTCONN|SENT runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task
         kworker/u8:12-448   [002]   127.567310: rpc_task_end:         task:51@3 flags=ASYNC|DYNAMIC|SOFT|SOFTCONN|SENT runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task
         kworker/u8:12-448   [001]   130.506817: rpc_task_end:         task:52@3 flags=ASYNC|DYNAMIC|SOFT|SOFTCONN|SENT runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task
      
      Odd, though, that I never see trace_rpc_task_complete, either in the
      forward or backchannel. Should it be removed?
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
  5. 21 Nov, 2019 1 commit
  6. 19 Nov, 2019 5 commits
  7. 18 Nov, 2019 3 commits
  8. 17 Nov, 2019 1 commit
  9. 15 Nov, 2019 1 commit
    • y2038: itimer: change implementation to timespec64 · bd40a175
      Arnd Bergmann committed
      There is no 64-bit version of getitimer/setitimer since that is not
      actually needed. However, the implementation is built around the
      deprecated 'struct timeval' type.
      
      Change the code to use timespec64 internally to reduce the dependencies
      on timeval and associated helper functions.
      
      Minor adjustments in the code are needed to make the native and compat
      version work the same way, and to keep the range check working after
      the conversion.
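
A conversion of this kind can be sketched in userspace C. The types and helpers below are hypothetical stand-ins (not the kernel's `timespec64` API), and the range check shown is an assumption about what "keeping the range check working" could look like: validate the microsecond field before scaling it to nanoseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the legacy and the 64-bit time types. */
struct sketch_timeval { long tv_sec; long tv_usec; };
struct sketch_timespec64 { int64_t tv_sec; long tv_nsec; };

#define SKETCH_USEC_PER_SEC 1000000L
#define SKETCH_NSEC_PER_USEC 1000L

/* Validate in the microsecond domain, then widen seconds and scale usec to nsec. */
static bool sketch_timeval_to_ts64(const struct sketch_timeval *tv,
                                   struct sketch_timespec64 *ts)
{
    assert(tv != NULL && ts != NULL);
    if (tv->tv_sec < 0 || tv->tv_usec < 0 || tv->tv_usec >= SKETCH_USEC_PER_SEC)
        return false;
    ts->tv_sec = tv->tv_sec;
    ts->tv_nsec = tv->tv_usec * SKETCH_NSEC_PER_USEC;
    return true;
}

/* Convert back for the user-visible interface, dropping sub-microsecond digits. */
static struct sketch_timeval sketch_ts64_to_timeval(const struct sketch_timespec64 *ts)
{
    struct sketch_timeval tv;

    assert(ts != NULL);
    tv.tv_sec = (long)ts->tv_sec;
    tv.tv_usec = ts->tv_nsec / SKETCH_NSEC_PER_USEC;
    return tv;
}
```

Keeping the internal representation in the 64-bit type and converting only at the boundary is the design the commit message points at; the exact checks in the kernel may differ from this sketch.
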
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
  10. 13 Nov, 2019 3 commits
    • cgroup: use cgrp->kn->id as the cgroup ID · 74321038
      Tejun Heo committed
      cgroup ID is currently allocated using a dedicated per-hierarchy idr
      and used internally and exposed through tracepoints and bpf.  This is
      confusing because there are tracepoints and other interfaces which use
      the cgroupfs ino as IDs.
      
      The preceding changes expose kn->id as the ino: a 64bit ino on supported
      archs, or ino+gen elsewhere (ino in the low 32bits, gen in the high).  There's no
      reason for cgroup to use different IDs.  The kernfs IDs are unique and
      userland can easily discover them and map them back to paths using
      standard file operations.
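
As a rough illustration of discovering such an ID with standard file operations, userland can read the inode number of a directory with `stat()`. This is a sketch under assumptions: `sketch_path_ino` is a hypothetical helper, and whether the returned inode number equals the full cgroup ID on a given architecture is something to verify, since the text above distinguishes 64bit inos from ino+gen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>

/*
 * Store the inode number of `path` in *ino_out.
 * Returns 0 on success and -1 if stat() fails.
 */
static int sketch_path_ino(const char *path, uint64_t *ino_out)
{
    struct stat st;

    assert(path != NULL && ino_out != NULL);
    if (stat(path, &st) != 0)
        return -1;
    *ino_out = (uint64_t)st.st_ino;
    return 0;
}
```

Pointing this at a directory inside a mounted cgroup hierarchy would give the value to compare against IDs seen in tracepoints.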
      
      This patch replaces cgroup IDs with kernfs IDs.
      
      * cgroup_id() is added and all cgroup ID users are converted to use it.
      
      * kernfs_node creation is moved to earlier during cgroup init so that
        cgroup_id() is available during init.
      
      * While at it, s/cgroup/cgrp/ in psi helpers for consistency.
      
      * Fallback ID value is changed to 1 to be consistent with root cgroup
        ID.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
    • kernfs: convert kernfs_node->id from union kernfs_node_id to u64 · 67c0496e
      Tejun Heo committed
      kernfs_node->id is currently a union kernfs_node_id which represents
      either a 32bit (ino, gen) pair or u64 value.  I can't see much value
      in the usage of the union - all that's needed is a 64bit ID which the
      current code is already limited to.  Using a union makes the code
      unnecessarily complicated and prevents using 64bit ino without adding
      practical benefits.
      
      This patch drops union kernfs_node_id and makes kernfs_node->id a u64.
      ino is stored in the lower 32bits and gen in the upper 32bits.  Accessors -
      kernfs[_id]_ino() and kernfs[_id]_gen() - are added to retrieve the
      ino and gen.  This makes ID handling less cumbersome and will
      allow using 64bit inos on supported archs.
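
The packing described above (ino in the low half, gen in the high half of a single 64bit value) can be sketched as follows. The `sketch_` helpers are hypothetical and only mirror the layout stated in the text; they are not the kernel's accessors.

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits of the ID hold the inode number. */
static uint32_t sketch_id_ino(uint64_t id)
{
    return (uint32_t)id;
}

/* High 32 bits of the ID hold the generation number. */
static uint32_t sketch_id_gen(uint64_t id)
{
    return (uint32_t)(id >> 32);
}

/* Pack (ino, gen) into one 64-bit ID and sanity-check the round trip. */
static uint64_t sketch_make_id(uint32_t ino, uint32_t gen)
{
    uint64_t id = ((uint64_t)gen << 32) | ino;

    assert(sketch_id_ino(id) == ino);
    assert(sketch_id_gen(id) == gen);
    return id;
}
```

With a plain integer, comparing or hashing IDs needs no knowledge of the two halves, which is the simplification a union-free representation buys.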
      
      This patch doesn't make any functional changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Alexei Starovoitov <ast@kernel.org>
    • writeback: use ino_t for inodes in tracepoints · f05499a0
      Tejun Heo committed
      Writeback TPs currently use a mix of 32 and 64bits for inos.  This isn't
      currently broken because only cgroup inos are using 32bits and they're
      limited to 32bits.  cgroup inos will make use of 64bits.  Let's
      uniformly use ino_t.
      
      While at it, switch the default cgroup ino value used when cgroup is
      disabled to 1 instead of -1U as root cgroup always uses ino 1.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Namhyung Kim <namhyung@kernel.org>
  11. 10 Nov, 2019 1 commit
  12. 08 Nov, 2019 2 commits
  13. 06 Nov, 2019 2 commits
  14. 04 Nov, 2019 1 commit
  15. 02 Nov, 2019 2 commits
  16. 31 Oct, 2019 1 commit
  17. 30 Oct, 2019 5 commits
    • P
    • P
      d01f8620
    • P
    • io_uring: replace workqueue usage with io-wq · 561fb04a
      Jens Axboe committed
      Drop various work-arounds we have for workqueues:
      
      - We no longer need the async_list for tracking sequential IO.
      
      - We don't have to maintain our own mm tracking/setting.
      
      - We don't need a separate workqueue for buffered writes. This didn't
        even work that well to begin with, as it was suboptimal for multiple
        buffered writers on multiple files.
      
      - We can properly cancel pending interruptible work. This fixes
        deadlocks with particularly socket IO, where we cannot cancel them
        when the io_uring is closed. Hence the ring will wait forever for
        these requests to complete, which may never happen. This is different
        from disk IO where we know requests will complete in a finite amount
        of time.
      
      - Because we can cancel interruptible work that is already
        running, we can implement file table support for work. We need that
        for supporting system calls that add to a process file table.
      
      - It gets us one step closer to adding async support for any system
        call.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: add set of tracing events · c826bd7a
      Dmitrii Dolgov committed
      To trace io_uring activity one can get information from workqueue and
      io trace events, but some parts can be hard to identify via
      this approach. Making what happens inside io_uring more transparent is
      important for reasoning about many aspects of it, hence introduce
      a set of tracing events.
      
      All such events can be roughly divided into two categories:
      
      * those that help to understand correctness (from both a kernel
        and an application point of view). E.g. a ring creation, file
        registration, or waiting for available CQE. Proposed approach is to
        get a pointer to an original structure of interest (ring context, or
        request), and then find relevant events. io_uring_queue_async_work
        also exposes a pointer to work_struct, to be able to track down
        corresponding workqueue events.
      
      * those that provide performance-related information. Mostly it's about
        events that change the flow of requests, e.g. whether an async work
        was queued, or delayed due to some dependencies. Another important
        case is how io_uring optimizations (e.g. registered files) are
        utilized.
      Signed-off-by: Dmitrii Dolgov <9erthalion6@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  18. 24 Oct, 2019 7 commits