1. 10 5月, 2016 1 次提交
  2. 12 4月, 2016 2 次提交
  3. 04 4月, 2016 1 次提交
  4. 02 4月, 2016 1 次提交
  5. 21 3月, 2016 1 次提交
    • D
      ipv6, trace: fix tos reporting on fib6_table_lookup · 69716a2b
      Daniel Borkmann 提交于
      flowi6_tos of struct flowi6 is unused in IPv6, therefore dumping tos on
      that tracepoint will also give incorrect information wrt traffic class.
      
      If we want to fix it, we need to extract it via ip6_tclass(flp->flowlabel).
      While for the same test case I get a count of 0 non-zero tos values before
      the change, they now start to show up after the change:
      
        # ./perf record -e fib6:fib6_table_lookup -a sleep 10
        # ./perf script | grep -v "tos 0" | wc -l
        60
      
      Since there's no user in the kernel tree anymore of flowi6_tos, remove the
      define to avoid any future confusion on this.
      
      Fixes: b811580d ("net: IPv6 fib lookup tracepoint")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      69716a2b
  6. 18 3月, 2016 3 次提交
    • J
      mm/page_ref: add tracepoint to track down page reference manipulation · 95813b8f
      Joonsoo Kim 提交于
      CMA allocation should be guaranteed to succeed by definition, but,
      unfortunately, it would be failed sometimes.  It is hard to track down
      the problem, because it is related to page reference manipulation and we
      don't have any facility to analyze it.
      
      This patch adds tracepoints to track down page reference manipulation.
      With it, we can find exact reason of failure and can fix the problem.
      Following is an example of tracepoint output.  (note: this example is
      stale version that printing flags as the number.  Recent version will
      print it as human readable string.)
      
      <...>-9018  [004]    92.678375: page_ref_set:         pfn=0x17ac9 flags=0x0 count=1 mapcount=0 mapping=(nil) mt=4 val=1
      <...>-9018  [004]    92.678378: kernel_stack:
       => get_page_from_freelist (ffffffff81176659)
       => __alloc_pages_nodemask (ffffffff81176d22)
       => alloc_pages_vma (ffffffff811bf675)
       => handle_mm_fault (ffffffff8119e693)
       => __do_page_fault (ffffffff810631ea)
       => trace_do_page_fault (ffffffff81063543)
       => do_async_page_fault (ffffffff8105c40a)
       => async_page_fault (ffffffff817581d8)
      [snip]
      <...>-9018  [004]    92.678379: page_ref_mod:         pfn=0x17ac9 flags=0x40048 count=2 mapcount=1 mapping=0xffff880015a78dc1 mt=4 val=1
      [snip]
      ...
      ...
      <...>-9131  [001]    93.174468: test_pages_isolated:  start_pfn=0x17800 end_pfn=0x17c00 fin_pfn=0x17ac9 ret=fail
      [snip]
      <...>-9018  [004]    93.174843: page_ref_mod_and_test: pfn=0x17ac9 flags=0x40068 count=0 mapcount=0 mapping=0xffff880015a78dc1 mt=4 val=-1 ret=1
       => release_pages (ffffffff8117c9e4)
       => free_pages_and_swap_cache (ffffffff811b0697)
       => tlb_flush_mmu_free (ffffffff81199616)
       => tlb_finish_mmu (ffffffff8119a62c)
       => exit_mmap (ffffffff811a53f7)
       => mmput (ffffffff81073f47)
       => do_exit (ffffffff810794e9)
       => do_group_exit (ffffffff81079def)
       => SyS_exit_group (ffffffff81079e74)
       => entry_SYSCALL_64_fastpath (ffffffff817560b6)
      
      This output shows that problem comes from exit path.  In exit path, to
      improve performance, pages are not freed immediately.  They are gathered
      and processed by batch.  During this process, migration cannot be
      possible and CMA allocation is failed.  This problem is hard to find
      without this page reference tracepoint facility.
      
      Enabling this feature bloat kernel text 30 KB in my configuration.
      
         text    data     bss     dec     hex filename
      12127327        2243616 1507328 15878271         f2487f vmlinux_disabled
      12157208        2258880 1507328 15923416         f2f8d8 vmlinux_enabled
      
      Note that, due to header file dependency problem between mm.h and
      tracepoint.h, this feature has to open code the static key functions for
      tracepoints.  Proposed by Steven Rostedt in following link.
      
      https://lkml.org/lkml/2015/12/9/699
      
      [arnd@arndb.de: crypto/async_pq: use __free_page() instead of put_page()]
      [iamjoonsoo.kim@lge.com: fix build failure for xtensa]
      [akpm@linux-foundation.org: tweak Kconfig text, per Vlastimil]
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: NMichal Nazarewicz <mina86@mina86.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      95813b8f
    • K
      mm, tracing: refresh __def_vmaflag_names · bcf66917
      Kirill A. Shutemov 提交于
      Get list of VMA flags up-to-date and sort it to match VM_* definition
      order.
      
      [vbabka@suse.cz: add a note above vmaflag definitions to update the names when changing]
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bcf66917
    • V
      mm, compaction: introduce kcompactd · 698b1b30
      Vlastimil Babka 提交于
      Memory compaction can be currently performed in several contexts:
      
       - kswapd balancing a zone after a high-order allocation failure
       - direct compaction to satisfy a high-order allocation, including THP
         page fault attemps
       - khugepaged trying to collapse a hugepage
       - manually from /proc
      
      The purpose of compaction is two-fold.  The obvious purpose is to
      satisfy a (pending or future) high-order allocation, and is easy to
      evaluate.  The other purpose is to keep overal memory fragmentation low
      and help the anti-fragmentation mechanism.  The success wrt the latter
      purpose is more
      
      The current situation wrt the purposes has a few drawbacks:
      
       - compaction is invoked only when a high-order page or hugepage is not
         available (or manually).  This might be too late for the purposes of
         keeping memory fragmentation low.
       - direct compaction increases latency of allocations.  Again, it would
         be better if compaction was performed asynchronously to keep
         fragmentation low, before the allocation itself comes.
       - (a special case of the previous) the cost of compaction during THP
         page faults can easily offset the benefits of THP.
       - kswapd compaction appears to be complex, fragile and not working in
         some scenarios.  It could also end up compacting for a high-order
         allocation request when it should be reclaiming memory for a later
         order-0 request.
      
      To improve the situation, we should be able to benefit from an
      equivalent of kswapd, but for compaction - i.e. a background thread
      which responds to fragmentation and the need for high-order allocations
      (including hugepages) somewhat proactively.
      
      One possibility is to extend the responsibilities of kswapd, which could
      however complicate its design too much.  It should be better to let
      kswapd handle reclaim, as order-0 allocations are often more critical
      than high-order ones.
      
      Another possibility is to extend khugepaged, but this kthread is a
      single instance and tied to THP configs.
      
      This patch goes with the option of a new set of per-node kthreads called
      kcompactd, and lays the foundations, without introducing any new
      tunables.  The lifecycle mimics kswapd kthreads, including the memory
      hotplug hooks.
      
      For compaction, kcompactd uses the standard compaction_suitable() and
      ompact_finished() criteria and the deferred compaction functionality.
      Unlike direct compaction, it uses only sync compaction, as there's no
      allocation latency to minimize.
      
      This patch doesn't yet add a call to wakeup_kcompactd.  The kswapd
      compact/reclaim loop for high-order pages will be replaced by waking up
      kcompactd in the next patch with the description of what's wrong with
      the old approach.
      
      Waking up of the kcompactd threads is also tied to kswapd activity and
      follows these rules:
       - we don't want to affect any fastpaths, so wake up kcompactd only from
         the slowpath, as it's done for kswapd
       - if kswapd is doing reclaim, it's more important than compaction, so
         don't invoke kcompactd until kswapd goes to sleep
       - the target order used for kswapd is passed to kcompactd
      
      Future possible future uses for kcompactd include the ability to wake up
      kcompactd on demand in special situations, such as when hugepages are
      not available (currently not done due to __GFP_NO_KSWAPD) or when a
      fragmentation event (i.e.  __rmqueue_fallback()) occurs.  It's also
      possible to perform periodic compaction with kcompactd.
      
      [arnd@arndb.de: fix build errors with kcompactd]
      [paul.gortmaker@windriver.com: don't use modular references for non modular code]
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      698b1b30
  7. 16 3月, 2016 2 次提交
    • V
      mm, tracing: unify mm flags handling in tracepoints and printk · 420adbe9
      Vlastimil Babka 提交于
      In tracepoints, it's possible to print gfp flags in a human-friendly
      format through a macro show_gfp_flags(), which defines a translation
      array and passes is to __print_flags().  Since the following patch will
      introduce support for gfp flags printing in printk(), it would be nice
      to reuse the array.  This is not straightforward, since __print_flags()
      can't simply reference an array defined in a .c file such as mm/debug.c
      - it has to be a macro to allow the macro magic to communicate the
      format to userspace tools such as trace-cmd.
      
      The solution is to create a macro __def_gfpflag_names which is used both
      in show_gfp_flags(), and to define the gfpflag_names[] array in
      mm/debug.c.
      
      On the other hand, mm/debug.c also defines translation tables for page
      flags and vma flags, and desire was expressed (but not implemented in
      this series) to use these also from tracepoints.  Thus, this patch also
      renames the events/gfpflags.h file to events/mmflags.h and moves the
      table definitions there, using the same macro approach as for gfpflags.
      This allows translating all three kinds of mm-specific flags both in
      tracepoints and printk.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NMichal Hocko <mhocko@suse.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      420adbe9
    • V
      mm, tracing: make show_gfp_flags() up to date · 1f7866b4
      Vlastimil Babka 提交于
      The show_gfp_flags() macro provides human-friendly printing of gfp flags
      in tracepoints.  However, it is somewhat out of date and missing several
      flags.  This patches fills in the missing flags, and distinguishes
      properly between GFP_ATOMIC and __GFP_ATOMIC which were both translated
      to "GFP_ATOMIC".  More generally, all __GFP_X flags which were
      previously printed as GFP_X, are now printed as __GFP_X, since ommiting
      the underscores results in output that doesn't actually match the source
      code, and can only lead to confusion.  Where both variants are defined
      equal (e.g.  _DMA and _DMA32), the variant without underscores are
      preferred.
      
      Also add a note in gfp.h so hopefully future changes will be synced
      better.
      
      __GFP_MOVABLE is defined twice in include/linux/gfp.h with different
      comments.  Leave just the newer one, which was intended to replace the
      old one.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1f7866b4
  8. 15 3月, 2016 1 次提交
    • M
      thermal: trace: migrating thermal traces to use TRACE_DEFINE_ENUM() macros · d0b45880
      Michele Di Giorgio 提交于
      Userspace tools are not aware of how to convert the enums provided by
      the tracepoints to their corresponding strings.
      
      Adding TRACE_DEFINE_ENUM() macros allows to make the enums available
      to userspace to let the tools know what those enum values represent.
      
      In particular, for thermal zone trip types what we obtained before was
      something like:
      
      kworker/1:1-460   [001]   320.372732: thermal_zone_trip:    thermal_zone=soc
      				id=0 trip=1 trip_type=1
      
      Unfortunately, userspace tools do not know how to convert enum values to
      strings and as a consequence they can only forward the enum value to the
      output. By using TRACE_DEFINE_ENUM() macros for thermal traces we get the
      following trace line:
      
      kworker/1:1-460   [001]   320.372732: thermal_zone_trip:    thermal_zone=soc
      				id=0 trip=1 trip_type=PASSIVE
      
      Userspace tools are now able to better understand the meaning of the trip_type
      and provide the user with more readable information.
      
      CC: Steven Rostedt <rostedt@goodmis.org>
      CC: Eduardo Valentin <edubezval@gmail.com>
      Signed-off-by: NMichele Di Giorgio <michele.digiorgio@arm.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Acked-by: NJavi Merino <javi.merino@arm.com>
      Signed-off-by: NZhang Rui <rui.zhang@intel.com>
      d0b45880
  9. 09 3月, 2016 2 次提交
    • Y
      tracing, writeback: Replace cgroup path to cgroup ino · a664edb3
      Yang Shi 提交于
      commit 5634cc2a ("writeback: update writeback
      tracepoints to report cgroup") made writeback tracepoints print out cgroup
      path when CGROUP_WRITEBACK is enabled, but it may trigger the below bug on -rt
      kernel since kernfs_path and kernfs_path_len are called by tracepoints, which
      acquire spin lock that is sleepable on -rt kernel.
      
      BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:930
      in_atomic(): 1, irqs_disabled(): 0, pid: 625, name: kworker/u16:3
      INFO: lockdep is turned off.
      Preemption disabled at:[<ffffffc000374a5c>] wb_writeback+0xec/0x830
      
      CPU: 7 PID: 625 Comm: kworker/u16:3 Not tainted 4.4.1-rt5 #20
      Hardware name: Freescale Layerscape 2085a RDB Board (DT)
      Workqueue: writeback wb_workfn (flush-7:0)
      Call trace:
      [<ffffffc00008d708>] dump_backtrace+0x0/0x200
      [<ffffffc00008d92c>] show_stack+0x24/0x30
      [<ffffffc0007b0f40>] dump_stack+0x88/0xa8
      [<ffffffc000127d74>] ___might_sleep+0x2ec/0x300
      [<ffffffc000d5d550>] rt_spin_lock+0x38/0xb8
      [<ffffffc0003e0548>] kernfs_path_len+0x30/0x90
      [<ffffffc00036b360>] trace_event_raw_event_writeback_work_class+0xe8/0x2e8
      [<ffffffc000374f90>] wb_writeback+0x620/0x830
      [<ffffffc000376224>] wb_workfn+0x61c/0x950
      [<ffffffc000110adc>] process_one_work+0x3ac/0xb30
      [<ffffffc0001112fc>] worker_thread+0x9c/0x7a8
      [<ffffffc00011a9e8>] kthread+0x190/0x1b0
      [<ffffffc000086ca0>] ret_from_fork+0x10/0x30
      
      With unlocked kernfs_* functions, synchronize_sched() has to be called in
      kernfs_rename which could be called in syscall path, but it is problematic.
      So, print out cgroup ino instead of path name, which could be converted to
      path name by userland.
      
      Withouth CGROUP_WRITEBACK enabled, it just prints out root dir. But, root
      dir ino vary from different filesystems, so printing out -1U to indicate
      an invalid cgroup ino.
      
      Link: http://lkml.kernel.org/r/1456996137-8354-1-git-send-email-yang.shi@linaro.orgAcked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NYang Shi <yang.shi@linaro.org>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      a664edb3
    • S
      tracing: Remove duplicate checks for online CPUs · 633f6f58
      Steven Rostedt (Red Hat) 提交于
      Some trace events have conditions that check if the current CPU is online or
      not before recording the tracepoint. That's because certain trace events are
      in locations that can be called as the CPU is going offline and when RCU no
      longer monitors it (like kfree and friends). The check was added because
      trace events require RCU to be active.
      
      This is a trace event infrastructure issue and not something that individual
      trace events should worry about. The tracepoint.h code now has added a check
      to see if the current CPU is considered online, and it only does the
      tracepoint if it is. There's no more need for individual trace events to
      also include this check. It is now redundant.
      
      Cc: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      633f6f58
  10. 02 3月, 2016 2 次提交
    • F
      nohz: Use enum code for tick stop failure tracing message · e6e6cc22
      Frederic Weisbecker 提交于
      It makes nohz tracing more lightweight, standard and easier to parse.
      
      Examples:
      
             user_loop-2904  [007] d..1   517.701126: tick_stop: success=1 dependency=NONE
             user_loop-2904  [007] dn.1   518.021181: tick_stop: success=0 dependency=SCHED
          posix_timers-6142  [007] d..1  1739.027400: tick_stop: success=0 dependency=POSIX_TIMER
             user_loop-5463  [007] dN.1  1185.931939: tick_stop: success=0 dependency=PERF_EVENTS
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Reviewed-by: NChris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      e6e6cc22
    • T
      cpu/hotplug: Add tracepoints · 5ba9ac8e
      Thomas Gleixner 提交于
      We want to trace the hotplug machinery. Add tracepoints to track the
      invocation of callbacks and their result.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/20160226182340.593563875@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      5ba9ac8e
  11. 26 2月, 2016 1 次提交
    • A
      ASoC: trace: fix printing jack name · f4833a51
      Arnd Bergmann 提交于
      After a change to the snd_jack structure, the 'name' member
      is no longer available in all configurations, which results in a
      build failure in the tracing code:
      
      include/trace/events/asoc.h: In function 'trace_event_raw_event_snd_soc_jack_report':
      include/trace/events/asoc.h:240:32: error: 'struct snd_jack' has no member named 'name'
      
      The name field is normally initialized from the card shortname and
      the jack "id" field:
      
              snprintf(jack->name, sizeof(jack->name), "%s %s",
                       card->shortname, jack->id);
      
      This changes the tracing output to just contain the 'id' by
      itself, which slightly changes the output format but avoids the
      link error and is hopefully still enough to see what is going on.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Fixes: fe0d128c ("ALSA: jack: Allow building the jack layer without input device")
      Signed-off-by: NMark Brown <broonie@kernel.org>
      f4833a51
  12. 23 2月, 2016 2 次提交
    • C
      f2fs: trace old block address for CoWed page · 7a9d7548
      Chao Yu 提交于
      This patch enables to trace old block address of CoWed page for better
      debugging.
      
      f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f0, oldaddr = 0xfe8ab, newaddr = 0xfee90 rw = WRITE_SYNC, type = NODE
      f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f8, oldaddr = 0xfe8b0, newaddr = 0xfee91 rw = WRITE_SYNC, type = NODE
      f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4fa, oldaddr = 0xfe8ae, newaddr = 0xfee92 rw = WRITE_SYNC, type = NODE
      
      f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x96, oldaddr = 0xf049b, newaddr = 0x2bbe rw = WRITE, type = DATA
      f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x97, oldaddr = 0xf049c, newaddr = 0x2bbf rw = WRITE, type = DATA
      f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x98, oldaddr = 0xf049d, newaddr = 0x2bc0 rw = WRITE, type = DATA
      
      f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x47, oldaddr = 0xffffffff, newaddr = 0xf2631 rw = WRITE, type = DATA
      f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x48, oldaddr = 0xffffffff, newaddr = 0xf2632 rw = WRITE, type = DATA
      f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x49, oldaddr = 0xffffffff, newaddr = 0xf2633 rw = WRITE, type = DATA
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      7a9d7548
    • C
      f2fs: support revoking atomic written pages · 28bc106b
      Chao Yu 提交于
      f2fs support atomic write with following semantics:
      1. open db file
      2. ioctl start atomic write
      3. (write db file) * n
      4. ioctl commit atomic write
      5. close db file
      
      With this flow we can avoid file becoming corrupted when abnormal power
      cut, because we hold data of transaction in referenced pages linked in
      inmem_pages list of inode, but without setting them dirty, so these data
      won't be persisted unless we commit them in step 4.
      
      But we should still hold journal db file in memory by using volatile
      write, because our semantics of 'atomic write support' is incomplete, in
      step 4, we could fail to submit all dirty data of transaction, once
      partial dirty data was committed in storage, then after a checkpoint &
      abnormal power-cut, db file will be corrupted forever.
      
      So this patch tries to improve atomic write flow by adding a revoking flow,
      once inner error occurs in committing, this gives another chance to try to
      revoke these partial submitted data of current transaction, it makes
      committing operation more like aotmical one.
      
      If we're not lucky, once revoking operation was failed, EAGAIN will be
      reported to user for suggesting doing the recovery with held journal file,
      or retrying current transaction again.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      28bc106b
  13. 17 2月, 2016 1 次提交
  14. 08 2月, 2016 1 次提交
  15. 05 2月, 2016 1 次提交
  16. 26 1月, 2016 1 次提交
  17. 22 1月, 2016 1 次提交
  18. 16 1月, 2016 1 次提交
  19. 15 1月, 2016 4 次提交
  20. 09 1月, 2016 1 次提交
  21. 18 12月, 2015 3 次提交
  22. 12 12月, 2015 1 次提交
  23. 08 12月, 2015 2 次提交
    • J
      ext4: implement allocation of pre-zeroed blocks · c86d8db3
      Jan Kara 提交于
      DAX page fault path needs to get blocks that are pre-zeroed to avoid
      races when two concurrent page faults happen in the same block of a
      file. Implement support for this in ext4_map_blocks().
      Signed-off-by: NJan Kara <jack@suse.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      c86d8db3
    • J
      ext4: get rid of EXT4_GET_BLOCKS_NO_LOCK flag · 2dcba478
      Jan Kara 提交于
      When dioread_nolock mode is enabled, we grab i_data_sem in
      ext4_ext_direct_IO() and therefore we need to instruct _ext4_get_block()
      not to grab i_data_sem again using EXT4_GET_BLOCKS_NO_LOCK. However
      holding i_data_sem over overwrite direct IO isn't needed these days. We
      have exclusion against truncate / hole punching because we increase
      i_dio_count under i_mutex in ext4_ext_direct_IO() so once
      ext4_file_write_iter() verifies blocks are allocated & written, they are
      guaranteed to stay so during the whole direct IO even after we drop
      i_mutex.
      
      So we can just remove this locking abuse and the no longer necessary
      EXT4_GET_BLOCKS_NO_LOCK flag.
      Signed-off-by: NJan Kara <jack@suse.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      2dcba478
  24. 23 11月, 2015 1 次提交
  25. 07 11月, 2015 3 次提交
    • H
      nilfs2: add tracepoints for analyzing reading and writing metadata files · a9cd207c
      Hitoshi Mitake 提交于
      This patch adds tracepoints for analyzing requests of reading and writing
      metadata files.  The tracepoints cover every in-place mdt files (cpfile,
      sufile, and datfile).
      
      Example of tracing mdt_insert_new_block():
                    cp-14635 [000] ...1 30598.199309: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 155
                    cp-14635 [000] ...1 30598.199520: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 5
                    cp-14635 [000] ...1 30598.200828: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 253
      Signed-off-by: NHitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: TK Kato <TK.Kato@wdc.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9cd207c
    • H
      nilfs2: add tracepoints for analyzing sufile manipulation · 83eec5e6
      Hitoshi Mitake 提交于
      This patch adds tracepoints which would be useful for analyzing segment
      usage from a perspective of high level sufile manipulation (check, alloc,
      free).  sufile is an important in-place updated metadata file, so
      analyzing the behavior would be useful for performance turning.
      
      example of usage (a case of allocation):
      
      $ sudo bin/tpoint nilfs2:nilfs2_segment_usage_allocated
      Tracing nilfs2:nilfs2_segment_usage_allocated. Ctrl-C to end.
              segctord-17800 [002] ...1 10671.867294: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 2
              segctord-17800 [002] ...1 10675.073477: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 3
      Signed-off-by: NHitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benixon Dhas <benixon.dhas@wdc.com>
      Cc: TK Kato <TK.Kato@wdc.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      83eec5e6
    • H
      nilfs2: add a tracepoint for transaction events · 44fda114
      Hitoshi Mitake 提交于
      This patch adds a tracepoint for transaction events of nilfs.  With the
      tracepoint, these events can be tracked: begin, abort, commit, trylock,
      lock, and unlock.  Basically, these events have corresponding functions
      e.g.  begin event corresponds nilfs_transaction_begin().  The unlock event
      is an exception.  It corresponds to the iteration in
      nilfs_transaction_lock().
      
      Only one tracepoint is introcued: nilfs2_transaction_transition.  The
      above events are distinguished with newly introduced enum.  With this
      tracepoint, we can analyse a critical section of segment constructoin.
      
      Sample output by tpoint of perf-tools:
                    cp-4457  [000] ...1    63.266220: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 1 flags = 9 state = BEGIN
                    cp-4457  [000] ...1    63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT
                    cp-4457  [000] ...1    63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT
              segctord-4371  [001] ...1    68.261196: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK
              segctord-4371  [001] ...1    68.261280: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = LOCK
              segctord-4371  [001] ...1    68.261877: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 1 flags = 10 state = BEGIN
              segctord-4371  [001] ...1    68.262116: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = COMMIT
              segctord-4371  [001] ...1    68.265032: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = UNLOCK
              segctord-4371  [001] ...1   132.376847: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK
      
      This patch also does trivial cleaning of comma usage in collection stage
      transition event for consistent coding style.
      Signed-off-by: NHitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      44fda114