1. 31 Jul, 2017 1 commit
  2. 13 Jul, 2017 1 commit
    • mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic · dcda9b04
      Michal Hocko committed
      __GFP_REPEAT was designed to allow retry-but-eventually-fail semantics in
      the page allocator.  This has been true, but only for allocation
      requests larger than PAGE_ALLOC_COSTLY_ORDER.  It has always been
      ignored for smaller sizes.  This is a bit unfortunate because there is
      no way to express the same semantics for those requests, and they are
      considered too important to fail, so they might end up looping in the
      page allocator forever, similarly to GFP_NOFAIL requests.
      
      Now that the whole tree has been cleaned up and accidental or misled
      usage of the __GFP_REPEAT flag has been removed for !costly requests, we
      can give the original flag a better name and, more importantly, more
      useful semantics.  Let's rename it to __GFP_RETRY_MAYFAIL, which tells
      the user that the allocator will try really hard but there is no promise
      of success.  This works independently of the order and overrides the
      default allocator behavior.  Page allocator users have several levels of
      guarantee vs. cost options (take GFP_KERNEL as an example):
      
       - GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_
         attempt to free memory at all. The most lightweight mode, which does
         not even kick the background reclaim. Should be used carefully because
         it might deplete the memory and the next user might hit the more
         aggressive reclaim.
      
       - GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT) - optimistic
         allocation without any attempt to free memory from the current
         context but can wake kswapd to reclaim memory if the zone is below
         the low watermark. Can be used from either atomic contexts or when
         the request is a performance optimization and there is another
         fallback for a slow path.
      
       - (GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) -
         non sleeping allocation with an expensive fallback so it can access
         some portion of memory reserves. Usually used from interrupt/bh
         context with an expensive slow path fallback.
      
       - GFP_KERNEL - both background and direct reclaim are allowed and the
         _default_ page allocator behavior is used. That means that !costly
         allocation requests are basically nofail but there is no guarantee of
         that behavior so failures have to be checked properly by callers
         (e.g. OOM killer victim is allowed to fail currently).
      
       - GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior
         and all allocation requests fail early rather than cause disruptive
         reclaim (one round of reclaim in this implementation). The OOM killer
         is not invoked.
      
       - GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator
         behavior and all allocation requests try really hard. The request
         will fail if the reclaim cannot make any progress. The OOM killer
         won't be triggered.
      
       - GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior
         and all allocation requests will loop endlessly until they succeed.
         This might be really dangerous especially for larger orders.
      
      Existing users of __GFP_REPEAT are changed to __GFP_RETRY_MAYFAIL
      because they already relied on this semantic.  No new users are added.
      __alloc_pages_slowpath is changed to bail out for __GFP_RETRY_MAYFAIL if
      there is no progress and we have already passed the OOM point.
      
      This means that all the reclaim opportunities have been exhausted except
      the most disruptive one (the OOM killer), and a user-defined fallback
      behavior is more sensible than retrying in the page allocator.
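
      The retry levels listed above can be read as a small decision table.
      What follows is a user-space toy model in C, not kernel code: the enum,
      COSTLY_ORDER and gives_up() are hypothetical names, the two inputs
      (rounds of reclaim done, whether reclaim made progress) are
      simplifications, and the behaviour for costly orders in the default
      mode is an assumption because the message does not spell it out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the retry-related GFP modifiers.
 * These are NOT the kernel's flag bits. */
enum retry_mode {
    RETRY_DEFAULT,  /* plain GFP_KERNEL */
    RETRY_NORETRY,  /* GFP_KERNEL | __GFP_NORETRY */
    RETRY_MAYFAIL,  /* GFP_KERNEL | __GFP_RETRY_MAYFAIL */
    RETRY_NOFAIL    /* GFP_KERNEL | __GFP_NOFAIL */
};

/* Same value as PAGE_ALLOC_COSTLY_ORDER; the name is local to this sketch. */
#define COSTLY_ORDER 3

/* Returns true when the modelled allocator would stop retrying and fail. */
bool gives_up(enum retry_mode mode, int order, int rounds_done, bool progress)
{
    assert(order >= 0 && rounds_done >= 0);

    switch (mode) {
    case RETRY_NORETRY:
        /* Fails early: one round of reclaim, then give up. */
        return rounds_done >= 1;
    case RETRY_MAYFAIL:
        /* Tries hard for any order, fails once reclaim stops progressing. */
        return !progress;
    case RETRY_NOFAIL:
        /* Loops until it succeeds. */
        return false;
    case RETRY_DEFAULT:
    default:
        /* !costly requests are "basically nofail". */
        if (order <= COSTLY_ORDER)
            return false;
        /* Assumption: costly requests fail once reclaim stops progressing. */
        return !progress;
    }
}
```

      Callers of the real flag still have to check for failure and provide
      their own fallback; the model only shows where each level stops.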
      
      [akpm@linux-foundation.org: fix arch/sparc/kernel/mdesc.c]
      [mhocko@suse.com: semantic fix]
        Link: http://lkml.kernel.org/r/20170626123847.GM11534@dhcp22.suse.cz
      [mhocko@kernel.org: address other thing spotted by Vlastimil]
        Link: http://lkml.kernel.org/r/20170626124233.GN11534@dhcp22.suse.cz
      Link: http://lkml.kernel.org/r/20170623085345.11304-3-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Alex Belits <alex.belits@cavium.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David Daney <david.daney@cavium.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: NeilBrown <neilb@suse.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 11 Jul, 2017 2 commits
    • mm/oom_kill.c: add tracepoints for oom reaper-related events · 422580c3
      Roman Gushchin committed
      During the debugging of the problem described in
      https://lkml.org/lkml/2017/5/17/542 and fixed by Tetsuo Handa in
      https://lkml.org/lkml/2017/5/19/383 , I found that the existing debug
      output is not really useful for understanding issues related to the oom
      reaper.
      
      So I assume that adding some tracepoints might help with debugging
      similar issues.
      
      Trace the following events:
       1) a process is marked as an oom victim,
       2) a process is added to the oom reaper list,
       3) the oom reaper starts reaping process's mm,
       4) the oom reaper finished reaping,
       5) the oom reaper skips reaping.
      
      How does it work in practice? Below is an example which shows how the
      problem mentioned above can be found: one process is added twice to the
      oom_reaper list:
      
        $ cd /sys/kernel/debug/tracing
        $ echo "oom:mark_victim" > set_event
        $ echo "oom:wake_reaper" >> set_event
        $ echo "oom:skip_task_reaping" >> set_event
        $ echo "oom:start_task_reaping" >> set_event
        $ echo "oom:finish_task_reaping" >> set_event
        $ cat trace_pipe
                allocate-502   [001] ....    91.836405: mark_victim: pid=502
                allocate-502   [001] .N..    91.837356: wake_reaper: pid=502
                allocate-502   [000] .N..    91.871149: wake_reaper: pid=502
              oom_reaper-23    [000] ....    91.871177: start_task_reaping: pid=502
              oom_reaper-23    [000] .N..    91.879511: finish_task_reaping: pid=502
              oom_reaper-23    [000] ....    91.879580: skip_task_reaping: pid=502
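
      Spotting the duplicate by eye works for six lines; for a longer capture
      the same check can be scripted. The sketch below is a hypothetical
      helper (count_event() is not part of any kernel tool) that counts how
      often a given event was logged for a given pid, assuming the
      "event: pid=N" layout shown in the trace above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Count occurrences of " <event>: pid=<pid>" in a captured trace buffer. */
int count_event(const char *trace, const char *event, int pid)
{
    char needle[128];
    int n = snprintf(needle, sizeof(needle), " %s: pid=%d", event, pid);

    /* The needle must have been formatted without truncation. */
    assert(n > 0 && (size_t)n < sizeof(needle));

    int hits = 0;
    for (const char *p = strstr(trace, needle); p != NULL;
         p = strstr(p + n, needle)) {
        /* Reject longer pids, e.g. pid=5021 when searching for pid=502. */
        if (!isdigit((unsigned char)p[n]))
            hits++;
    }
    return hits;
}
```

      A count above one for wake_reaper on the same pid corresponds to the
      "added twice to the oom_reaper list" case described above.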
      
      Link: http://lkml.kernel.org/r/20170530185231.GA13412@castle
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • oom, trace: remove ENUM evaluation of COMPACTION_FEEDBACK · 7ab0e50a
      Steven Rostedt (VMware) committed
      After enabling CONFIG_TRACE_ENUM_MAP_FILE (which will soon be renamed to
      CONFIG_TRACE_EVAL_MAP_FILE), I am able to examine the enums that have
      been evaluated:
      
       # cat /sys/kernel/debug/tracing/enum_map
      
      (which will soon be renamed to eval_map)
      
      And it showed some interesting results:
      
        [..]
        ZONE_MOVABLE 3 (oom)
        ZONE_NORMAL 2 (oom)
        ZONE_DMA32 1 (oom)
        ZONE_DMA 0 (oom)
        3 3 (oom)
        2 2 (oom)
        1 1 (oom)
        COMPACT_PRIO_ASYNC 2 (oom)
        COMPACT_PRIO_SYNC_LIGHT 1 (oom)
        COMPACT_PRIO_SYNC_FULL 0 (oom)
        [..]
        ZONE_DMA 0 (vmscan)
        3 3 (vmscan)
        2 2 (vmscan)
        1 1 (vmscan)
        COMPACT_PRIO_ASYNC 2 (vmscan)
        [..]
        ZONE_DMA 0 (kmem)
        3 3 (kmem)
        2 2 (kmem)
        1 1 (kmem)
        COMPACT_PRIO_ASYNC 2 (kmem)
        [..]
        ZONE_DMA 0 (compaction)
        3 3 (compaction)
        2 2 (compaction)
        1 1 (compaction)
        COMPACT_PRIO_ASYNC 2 (compaction)
        [..]
      
      The names within the parentheses are the trace systems that the enum/eval
      maps are associated with. When there's a number evaluated to another
      number, that tells me that TRACE_DEFINE_ENUM() was used on a #define
      and not an enum. As #defines get converted normally, they do not need
      to be evaluated.
      
      Each of the above trace systems with the number to number evaluation
      included the file include/trace/events/mmflags.h which has:
      
       /* High-level compaction status feedback */
       #define COMPACTION_FAILED       1
       #define COMPACTION_WITHDRAWN    2
       #define COMPACTION_PROGRESS     3
      
      [..]
      
       #define COMPACTION_FEEDBACK             \
              EM(COMPACTION_FAILED,           "failed")       \
              EM(COMPACTION_WITHDRAWN,        "withdrawn")    \
              EMe(COMPACTION_PROGRESS,        "progress")
      
      This is still needed for the __print_symbolic() usage in the
      trace_event, but it does not need to be evaluated.
      
      Removing the evaluation part removes the unnecessary evaluations of
      numbers to numbers.
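
      The difference between a #define and an enum can be seen with plain
      preprocessor stringification. This is a standalone illustration, not
      the tracing macros themselves: STR(), DEMO_ZONE_DMA and the two helper
      functions are hypothetical names, and only the COMPACTION_FAILED value
      is taken from the header quoted above.

```c
#include <assert.h>
#include <string.h>

#define COMPACTION_FAILED 1              /* a #define, as in mmflags.h */
enum demo_zone { DEMO_ZONE_DMA = 0 };    /* hypothetical enum for contrast */

#define STR_(x) #x
#define STR(x)  STR_(x)                  /* expand x first, then stringify */

/* The preprocessor has already replaced the macro name with its value. */
const char *define_as_text(void) { return STR(COMPACTION_FAILED); }

/* Enum names are invisible to the preprocessor, so the name survives. */
const char *enum_as_text(void)   { return STR(DEMO_ZONE_DMA); }
```

      The #define arrives in the string as "1", so mapping it again only
      produces the number-to-number entries shown in the listing, while the
      enum name still needs a table to be turned into a number.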
      
      Link: http://lkml.kernel.org/r/20170615074944.7be9a647@gandalf.local.home
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 06 Jul, 2017 1 commit
    • fs: new infrastructure for writeback error handling and reporting · 5660e13d
      Jeff Layton committed
      Most filesystems currently use mapping_set_error and
      filemap_check_errors for setting and reporting/clearing writeback errors
      at the mapping level. filemap_check_errors is indirectly called from
      most of the filemap_fdatawait_* functions and from
      filemap_write_and_wait*. These functions are called from all sorts of
      contexts to wait on writeback to finish -- e.g. mostly in fsync, but
      also in truncate calls, getattr, etc.
      
      The non-fsync callers are problematic. We should be reporting writeback
      errors during fsync, but many places spread over the tree clear out
      errors before they can be properly reported, or report errors at
      nonsensical times.
      
      If I get -EIO on a stat() call, there is no reason for me to assume that
      it is because some previous writeback failed. The fact that it also
      clears out the error such that a subsequent fsync returns 0 is a bug,
      and a nasty one since that's potentially silent data corruption.
      
      This patch adds a small bit of new infrastructure for setting and
      reporting errors during address_space writeback. While the above was my
      original impetus for adding this, I think it's also the case that
      current fsync semantics are just problematic for userland. Most
      applications that call fsync do so to ensure that the data they wrote
      has hit the backing store.
      
      In the case where there are multiple writers to the file at the same
      time, this is really hard to determine. The first one to call fsync will
      see any stored error, and the rest get back 0. The processes with open
      fds may not be associated with one another in any way. They could even
      be in different containers, so ensuring coordination between all fsync
      callers is not really an option.
      
      One way to remedy this would be to track what file descriptor was used
      to dirty the file, but that's rather cumbersome and would likely be
      slow. However, there is a simpler way to improve the semantics here
      without incurring too much overhead.
      
      This set adds an errseq_t to struct address_space, and a corresponding
      one is added to struct file. Writeback errors are recorded in the
      mapping's errseq_t, and the one in struct file is used as the "since"
      value.
      
      This changes the semantics of the Linux fsync implementation such that
      applications can now use it to determine whether there were any
      writeback errors since fsync(fd) was last called (or since the file was
      opened in the case of fsync having never been called).
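
      The "since" cursor can be sketched with a plain counter. This is a
      deliberately simplified model, not the kernel's errseq_t (which packs
      its state into a single value): toy_mapping, toy_file and the three
      functions are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Shared per-file-content state, standing in for struct address_space. */
struct toy_mapping {
    unsigned int seq;   /* bumped on every recorded writeback error */
    int err;            /* most recent error, as a negative errno value */
};

/* Per-open-file state, standing in for the cursor kept in struct file. */
struct toy_file {
    unsigned int since; /* value of seq when this fd last looked */
};

/* On open, start the cursor at the mapping's current position. */
void toy_open(struct toy_file *f, const struct toy_mapping *m)
{
    f->since = m->seq;
}

/* Record a writeback error in the mapping. */
void toy_set_error(struct toy_mapping *m, int err)
{
    assert(err < 0);
    m->err = err;
    m->seq++;
}

/* What fsync would report for this fd: an error if one was recorded since
 * the cursor was last advanced, otherwise 0. Advances the cursor. */
int toy_fsync_check(struct toy_file *f, const struct toy_mapping *m)
{
    if (f->since == m->seq)
        return 0;
    f->since = m->seq;
    return m->err;
}
```

      With two descriptors open before an error is recorded, both see the
      error once, which is the behaviour the old clear-on-first-report scheme
      could not provide.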
      
      Note that those writeback errors may have occurred when writing data
      that was dirtied via an entirely different fd, but that's the case now
      with the current mapping_set_error/filemap_check_errors infrastructure.
      This will at least prevent you from getting a false report of success.
      
      The new behavior is still consistent with the POSIX spec, and is more
      reliable for application developers. This patch just adds some basic
      infrastructure for doing this, and ensures that the f_wb_err "cursor"
      is properly set when a file is opened. Later patches will change the
      existing code to use this new infrastructure for reporting errors at
      fsync time.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
  5. 21 Jun, 2017 1 commit
  6. 20 Jun, 2017 1 commit
  7. 14 Jun, 2017 5 commits
  8. 09 Jun, 2017 2 commits
  9. 08 Jun, 2017 1 commit
    • rcu: Prevent rcu_barrier() from starting needless grace periods · f92c734f
      Paul E. McKenney committed
      Currently rcu_barrier() uses call_rcu() to enqueue new callbacks
      on each CPU with a non-empty callback list.  This works, but means
      that rcu_barrier() forces grace periods that are not otherwise needed.
      The key point is that rcu_barrier() never needs to wait for a grace
      period, but instead only for all pre-existing callbacks to be invoked.
      This means that rcu_barrier()'s new callbacks should be placed in
      the callback-list segment containing the last pre-existing callback.
      
      This commit makes this change using the new rcu_segcblist_entrain()
      function.
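
      The placement rule can be pictured with a toy segmented list that only
      tracks how many callbacks each segment holds. This is an illustration
      under simplifying assumptions, not the rcu_segcblist code: the struct
      and both functions are hypothetical, and real segments are delimited by
      tail pointers rather than counters.

```c
#include <assert.h>

#define NSEGS 4  /* the kernel's segmented list also has four segments */

/* Toy list: len[i] is the number of callbacks queued in segment i,
 * with lower indexes closer to being invoked. */
struct toy_segcblist {
    int len[NSEGS];
};

/* Ordinary enqueue: a new callback always goes to the last segment and
 * therefore has to wait for a future grace period. */
int toy_enqueue(struct toy_segcblist *l)
{
    l->len[NSEGS - 1]++;
    return NSEGS - 1;
}

/* Entrain: place the callback in the segment that holds the last
 * pre-existing callback, so it runs right after it without asking for a
 * new grace period. Returns the segment used, or -1 if the list is empty. */
int toy_entrain(struct toy_segcblist *l)
{
    for (int i = NSEGS - 1; i >= 0; i--) {
        if (l->len[i] > 0) {
            l->len[i]++;
            return i;
        }
    }
    return -1;
}
```
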
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  10. 05 Jun, 2017 1 commit
    • rxrpc: Add service upgrade support for client connections · 4e255721
      David Howells committed
      Make it possible for a client to use AuriStor's service upgrade facility.
      
      The client does this by adding an RXRPC_UPGRADE_SERVICE control message to
      the first sendmsg() of a call.  This takes no parameters.
      
      When recvmsg() starts returning data from the call, the service ID field in
      the returned msg_name will reflect the result of the upgrade attempt.  If
      the upgrade was ignored, srx_service will match what was set in the
      sendmsg(); if the upgrade happened the srx_service will be altered to
      indicate the service the server upgraded to.
      
      Note that:
      
       (1) The choice of upgrade service is up to the server
      
       (2) Further client calls to the same server that would share a connection
           are blocked if an upgrade probe is in progress.
      
       (3) This should only be used to probe the service.  Clients should then
           use the returned service ID in all subsequent communications with that
           server (and not set the upgrade).  Note that the kernel will not
           retain this information should the connection expire from its cache.
      
       (4) If a server that supports upgrading is replaced by one that doesn't,
           whilst a connection is live, and if the replacement is running, say,
           OpenAFS 1.6.4 or older or an older IBM AFS, then the replacement
           server will not respond to packets sent to the upgraded connection.
      
           At this point, calls will time out and the server must be reprobed.
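
      A control message that "takes no parameters" is one whose payload
      length is zero. The sketch below builds such a message with the
      standard CMSG macros; set_flag_cmsg() is a hypothetical helper, and the
      level and type are passed in by the caller rather than hard-coded. For
      an upgrade probe they would be the rxrpc socket level and
      RXRPC_UPGRADE_SERVICE, and a real first sendmsg() would carry the
      call's other control messages as well, which are not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Point msg at buf and fill it with one payload-free control message.
 * buf must be suitably aligned for struct cmsghdr.
 * Returns 0 on success, -1 if buf is too small. */
int set_flag_cmsg(struct msghdr *msg, void *buf, size_t buflen,
                  int level, int type)
{
    assert(msg != NULL && buf != NULL);

    if (buflen < CMSG_SPACE(0))
        return -1;

    memset(buf, 0, buflen);
    msg->msg_control = buf;
    msg->msg_controllen = CMSG_SPACE(0);

    struct cmsghdr *c = CMSG_FIRSTHDR(msg);
    c->cmsg_level = level;
    c->cmsg_type = type;
    c->cmsg_len = CMSG_LEN(0);   /* header only, no data */
    return 0;
}
```
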
      Signed-off-by: David Howells <dhowells@redhat.com>
  11. 01 Jun, 2017 1 commit
  12. 24 May, 2017 2 commits
  13. 09 May, 2017 6 commits
    • dax: add tracepoint to dax_insert_mapping() · b4440734
      Ross Zwisler committed
      Add a tracepoint to dax_insert_mapping(), following the same logging
      conventions as the rest of DAX.  This tracepoint, along with the one in
      dax_load_hole(), lets us know how a DAX PTE fault was serviced.
      
      Here is an example DAX fault that inserts a PTE mapping:
      
        small-1126  [007] ....
         145.451604: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220
      
        small-1126  [007] ....
         145.452317: dax_insert_mapping: dev 259:0 ino 0x1003 shared write address 0x10420000 radix_entry 0x100006
      
        small-1126  [007] ....
         145.452399: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220 MAJOR|NOPAGE
      
      Link: http://lkml.kernel.org/r/20170221195116.13278-7-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dax: add tracepoint to dax_writeback_one() · f9bc3a07
      Ross Zwisler committed
      Add a tracepoint to dax_writeback_one(), following the same logging
      conventions as the rest of DAX.
      
      Here is an example range writeback which ends up flushing one PMD and
      one PTE:
      
        test-1265  [003] ....
         496.615250: dax_writeback_range: dev 259:0 ino 0x1003 pgoff 0x0-0x7ffffffffffff
      
        test-1265  [003] ....
         496.616263: dax_writeback_one: dev 259:0 ino 0x1003 pgoff 0x0 pglen 0x200
      
        test-1265  [003] ....
         496.616270: dax_writeback_one: dev 259:0 ino 0x1003 pgoff 0x305 pglen 0x1
      
        test-1265  [003] ....
         496.616272: dax_writeback_range_done: dev 259:0 ino 0x1003 pgoff 0x0-0x7ffffffffffff
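
      The "one PMD and one PTE" reading follows from the pglen values if one
      assumes 4 KiB base pages and 2 MiB PMDs (typical for x86-64, but an
      assumption here, since the trace does not state the page size).
      pglen_bytes() is a hypothetical helper for that arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a page count from the trace into bytes. */
uint64_t pglen_bytes(uint64_t pglen, uint64_t page_size)
{
    /* Page sizes are non-zero powers of two. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return pglen * page_size;
}
```

      Under that assumption pglen 0x200 is 512 pages, or 2 MiB (one PMD), and
      pglen 0x1 is a single 4 KiB page (one PTE).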
      
      [akpm@linux-foundation.org: struct blk_dax_ctl has disappeared]
      Link: http://lkml.kernel.org/r/20170221195116.13278-6-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dax: add tracepoints to dax_writeback_mapping_range() · d14a3f48
      Ross Zwisler committed
      Add tracepoints to dax_writeback_mapping_range(), following the same
      logging conventions as the rest of DAX.
      
      Here is an example writeback call:
      
        msync-1085  [006] ....
         200.902565: dax_writeback_range: dev 259:0 ino 0x1003 pgoff 0x200-0x2ff
      
        msync-1085  [006] ....
         200.902579: dax_writeback_range_done: dev 259:0 ino 0x1003 pgoff 0x200-0x2ff
      
      [ross.zwisler@linux.intel.com: fix regression in dax_writeback_mapping_range()]
        Link: http://lkml.kernel.org/r/20170314215358.31451-1-ross.zwisler@linux.intel.com
      Link: http://lkml.kernel.org/r/20170221195116.13278-5-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dax: add tracepoints to dax_load_hole() · 678c9fd0
      Ross Zwisler committed
      Add tracepoints to dax_load_hole(), following the same logging conventions
      as the rest of DAX.
      
      Here is the logging generated by a PTE read from a hole:
      
        read-1075  [002] ....
          62.362108: dax_pte_fault: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280
      
        read-1075  [002] ....
          62.362140: dax_load_hole: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280 NOPAGE
      
        read-1075  [002] ....
          62.362141: dax_pte_fault_done: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280 NOPAGE
      
      Link: http://lkml.kernel.org/r/20170221195116.13278-4-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dax: add tracepoints to dax_pfn_mkwrite() · c3ff68d7
      Ross Zwisler committed
      Add tracepoints to dax_pfn_mkwrite(), following the same logging
      conventions as the rest of DAX.
      
      Here is an example PTE fault followed by a pfn_mkwrite:
      
        small_aligned-1094  [002] ....
         374.084998: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200
      
        small_aligned-1094  [002] ....
         374.085145: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200 MAJOR|NOPAGE
      
        small_aligned-1094  [002] ....
         374.085165: dax_pfn_mkwrite: dev 259:0 ino 0x1003 shared WRITE|MKWRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200 NOPAGE
      
      Link: http://lkml.kernel.org/r/20170221195116.13278-3-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dax: add tracepoints to dax_iomap_pte_fault() · a9c42b33
      Ross Zwisler committed
      Patch series "second round of tracepoints for DAX".
      
      This second round of DAX tracepoint patches adds tracing to the PTE
      fault path (dax_iomap_pte_fault(), dax_pfn_mkwrite(), dax_load_hole(),
      dax_insert_mapping()) and to the writeback path
      (dax_writeback_mapping_range(), dax_writeback_one()).
      
      The purpose of this tracing is to give us a high level view of what DAX
      is doing, whether faults are being serviced by PMDs or PTEs, and by real
      storage or by zero pages covering holes.
      
      I do have some patches nearly ready which also add tracing to
      grab_mapping_entry() and dax_insert_mapping_entry().  These are more
      targeted at logging how we are interacting with the radix tree, how we
      use empty entries for locking, whether we "downgrade" huge zero pages to
      4k PTE sized allocations, etc.  In the end it seemed to me that this
      might be too detailed to have as constantly present tracepoints, but if
      anyone sees value in having tracepoints like this in the DAX code
      permanently (Jan?), please let me know and I'll add those last two
      patches.
      
      All these tracepoints were done to be consistent with the style of the
      XFS tracepoints and with the existing DAX PMD tracepoints.
      
      This patch (of 6):
      
      Add tracepoints to dax_iomap_pte_fault(), following the same logging
      conventions as the rest of DAX.
      
      Here is an example fault that initially tries to be serviced by the PMD
      fault handler but which falls back to PTEs because the VMA isn't large
      enough to hold a PMD:
      
        small-1086  [005] ....
         71.140014: xfs_filemap_huge_fault: dev 259:0 ino 0x1003
      
        small-1086  [005] ....
          71.140027: dax_pmd_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 vm_start 0x10200000 vm_end 0x10500000 pgoff 0x220 max_pgoff 0x1400
      
        small-1086  [005] ....
          71.140028: dax_pmd_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 vm_start 0x10200000 vm_end 0x10500000 pgoff 0x220 max_pgoff 0x1400 FALLBACK
      
        small-1086  [005] ....
          71.140035: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220
      
        small-1086  [005] ....
          71.140396: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220 MAJOR|NOPAGE
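
      The FALLBACK result can be checked against the numbers in the
      dax_pmd_fault line. The sketch assumes a 2 MiB PMD (as on x86-64 with
      4 KiB pages); TOY_PMD_SIZE and pmd_fits_in_vma() are hypothetical names
      and the function is a simplified stand-in for the range check, not the
      DAX fault handler itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PMD_SIZE (2ULL << 20)  /* assumption: 2 MiB PMD */

/* True when the PMD-aligned region containing addr lies entirely inside
 * the VMA [vm_start, vm_end). */
bool pmd_fits_in_vma(uint64_t addr, uint64_t vm_start, uint64_t vm_end)
{
    assert(vm_start < vm_end);

    /* Round the faulting address down to a PMD boundary. */
    uint64_t pmd_start = addr & ~(TOY_PMD_SIZE - 1);

    return pmd_start >= vm_start && pmd_start + TOY_PMD_SIZE <= vm_end;
}
```

      For address 0x10420000 the aligned region is 0x10400000-0x10600000,
      which runs past vm_end 0x10500000, so a PMD mapping does not fit and
      the fault is retried with PTEs.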
      
      Link: http://lkml.kernel.org/r/20170221195116.13278-2-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 05 May, 2017 1 commit
    • trace: thermal: add another parameter 'power' to the tracing function · 771ffa14
      Lukasz Luba committed
      This patch adds another parameter to the trace function
      trace_thermal_power_devfreq_get_power().
      
      When we call the driver's code directly for the real power, we do not
      have the static_power/dynamic_power values. Instead we get the total
      power in the '*power' value. The 'static_power' and 'dynamic_power'
      are set to 0.
      
      Therefore, we have to trace that '*power' value in this scenario.
      
      CC: Steven Rostedt <rostedt@goodmis.org>
      CC: Ingo Molnar <mingo@redhat.com>
      CC: Zhang Rui <rui.zhang@intel.com>
      CC: Eduardo Valentin <edubezval@gmail.com>
      Acked-by: Javi Merino <javi.merino@kernel.org>
      Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
  15. 04 May, 2017 2 commits
  16. 30 Apr, 2017 1 commit
  17. 29 Apr, 2017 1 commit
  18. 25 Apr, 2017 1 commit
  19. 21 Apr, 2017 2 commits
  20. 20 Apr, 2017 1 commit
  21. 18 Apr, 2017 4 commits
  22. 15 Apr, 2017 1 commit
  23. 06 Apr, 2017 1 commit