1. 12 Feb 2015, 3 commits
  2. 08 Feb 2015, 1 commit
    • x86/tlb/trace: Do not trace on CPU that is offline · 6c8465a8
      Steven Rostedt (Red Hat) authored
      When taking a CPU down for suspend and resume, a tracepoint may be called
      when the CPU has been designated offline. As tracepoints require RCU for
      protection, they must not be called if the current CPU is offline.
      
      Unfortunately, trace_tlb_flush() is called in this scenario as was noted
      by LOCKDEP:
      
      ...
      
       Disabling non-boot CPUs ...
       intel_pstate CPU 1 exiting
      
       ===============================
       smpboot: CPU 1 didn't die...
       [ INFO: suspicious RCU usage. ]
       3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
       -------------------------------
       include/trace/events/tlb.h:35 suspicious rcu_dereference_check() usage!
      
       other info that might help us debug this:
      
       RCU used illegally from offline CPU!
       rcu_scheduler_active = 1, debug_locks = 0
       no locks held by swapper/1/0.
      
       stack backtrace:
       CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.1-iniza-small #1
       Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
        0000000000000001 ffff88011a44fe18 ffffffff817e370d 0000000000000011
        ffff88011a448290 ffff88011a44fe48 ffffffff810d6847 ffff8800c66b9600
        0000000000000001 ffff88011a44c000 ffffffff81cb3900 ffff88011a44fe78
       Call Trace:
        [<ffffffff817e370d>] dump_stack+0x4c/0x65
        [<ffffffff810d6847>] lockdep_rcu_suspicious+0xe7/0x120
        [<ffffffff810b71a5>] idle_task_exit+0x205/0x2c0
        [<ffffffff81054c4e>] play_dead_common+0xe/0x50
        [<ffffffff81054ca5>] native_play_dead+0x15/0x140
        [<ffffffff8102963f>] arch_cpu_idle_dead+0xf/0x20
        [<ffffffff810cd89e>] cpu_startup_entry+0x37e/0x580
        [<ffffffff81053e20>] start_secondary+0x140/0x150
       intel_pstate CPU 2 exiting
      
      ...
      
      By converting the tlb_flush tracepoint to a TRACE_EVENT_CONDITION whose
      condition is cpu_online(smp_processor_id()), we can avoid calling
      RCU-protected code when the CPU is offline.
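      A condensed sketch of the resulting event definition (abbreviated; the
      real event lives in include/trace/events/tlb.h and prints the reason
      symbolically). TP_CONDITION is evaluated before the RCU-protected
      tracepoint machinery runs, so an offline CPU never enters it:

          TRACE_EVENT_CONDITION(tlb_flush,

                  TP_PROTO(int reason, unsigned long pages),
                  TP_ARGS(reason, pages),

                  /* Fire only when the executing CPU is online. */
                  TP_CONDITION(cpu_online(smp_processor_id())),

                  TP_STRUCT__entry(
                          __field(int, reason)
                          __field(unsigned long, pages)
                  ),

                  TP_fast_assign(
                          __entry->reason = reason;
                          __entry->pages  = pages;
                  ),

                  TP_printk("pages:%lu reason:%d",
                            __entry->pages, __entry->reason)
          );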
      
      Link: http://lkml.kernel.org/r/CA+icZUUGiGDoL5NU8RuxKzFjoLjEKRtUWx=JB8B9a0EQv-eGzQ@mail.gmail.com
      
      Cc: stable@vger.kernel.org # 3.17+
      Fixes: d17d8f9d "x86/mm: Add tracepoints for TLB flushes"
      Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Dave Hansen <dave@sr71.net>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  3. 19 Jan 2015, 1 commit
  4. 14 Jan 2015, 2 commits
    • perf: Avoid horrible stack usage · 86038c5e
      Peter Zijlstra (Intel) authored
      Both Linus (most recently) and Steve (a while ago) reported that
      perf-related callbacks have massive stack bloat.
      
      The problem is that software events need a pt_regs in order to
      properly report the event location and unwind the stack. And because
      we could not assume one was present, we allocated one on the stack and
      filled it with the minimal bits required for operation.
      
      Now, pt_regs is quite large, so this is undesirable. Furthermore, it
      turns out that most sites actually have a pt_regs pointer available,
      making this even more onerous, as the stack space is a pointless waste.
      
      This patch addresses the problem by observing that software events
      have well defined nesting semantics, therefore we can use static
      per-cpu storage instead of on-stack.
      
      Linus made the further observation that all but the scheduler callers
      of perf_sw_event() have a pt_regs available, so we change the regular
      perf_sw_event() to require a valid pt_regs (where it used to be
      optional) and add perf_sw_event_sched() for the scheduler.
      
      We have a scheduler-specific call instead of a more generic
      _noregs()-like construct because we can assume non-recursion from the
      scheduler and thereby simplify the code further (_noregs would have to
      put the recursion context call inline in order to ascertain which
      __perf_regs element to use).
      
      One last note on the implementation of perf_trace_buf_prepare(): we
      allow .regs = NULL for those cases where we already have a pt_regs
      pointer available and do not need another.
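      In sketch form (close to, though not byte-for-byte, the code this
      commit adds), the per-CPU storage and the scheduler-only helper look
      like this:

          /* Static per-CPU pt_regs storage, one slot per software-event
           * recursion context, replacing the on-stack pt_regs. */
          static DEFINE_PER_CPU(struct pt_regs, __perf_regs[4]);

          /* Scheduler-only helper: the scheduler does not recurse as a
           * software event, so slot 0 is always free to use. */
          static __always_inline void
          perf_sw_event_sched(u32 event_id, u64 nr, u64 addr)
          {
                  if (static_key_false(&perf_swevent_enabled[event_id])) {
                          struct pt_regs *regs = this_cpu_ptr(&__perf_regs[0]);

                          perf_fetch_caller_regs(regs);
                          ___perf_sw_event(event_id, nr, regs, addr);
                  }
          }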
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Javi Merino <javi.merino@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Petr Mladek <pmladek@suse.cz>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Link: http://lkml.kernel.org/r/20141216115041.GW3337@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • net: rename vlan_tx_* helpers since "tx" is misleading there · df8a39de
      Jiri Pirko authored
      The same macros are used on the RX path as well, so rename them.
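      At a call site the rename looks like this (the skb_-prefixed helpers
      are the names this commit introduces):

          /* Before: "tx" in the name even when running on the RX path. */
          if (vlan_tx_tag_present(skb))
                  vid = vlan_tx_tag_get_id(skb);

          /* After this commit: */
          if (skb_vlan_tag_present(skb))
                  vid = skb_vlan_tag_get_id(skb);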
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 13 Dec 2014, 1 commit
    • tracing/sched: Check preempt_count() for current when reading task->state · aee4e5f3
      Steven Rostedt (Red Hat) authored
      When recording the state of a task for the sched_switch tracepoint, a
      check of task_preempt_count() is performed to see if PREEMPT_ACTIVE is
      set. This is because, technically, a task being preempted is really in
      the TASK_RUNNING state, and that is what should be recorded when
      tracing a sched_switch, even if the task put itself into another state
      (it hasn't scheduled out in that state yet).
      
      But with the change to use per_cpu preempt counts, the
      task_thread_info(p)->preempt_count is no longer used, and instead
      task_preempt_count(p) is used.
      
      The problem is that this does not use the current preempt count but a
      stale one from a previous sched_switch: task_preempt_count(p) reads
      saved_preempt_count, not preempt_count(). But for tracing sched_switch,
      if p is current, we really want preempt_count().
      
      I hit this bug when I was tracing sleep and the call from do_nanosleep()
      scheduled out in the "RUNNING" state.
      
                 sleep-4290  [000] 537272.259992: sched_switch:         sleep:4290 [120] R ==> swapper/0:0 [120]
                 sleep-4290  [000] 537272.260015: kernel_stack:         <stack trace>
      => __schedule (ffffffff8150864a)
      => schedule (ffffffff815089f8)
      => do_nanosleep (ffffffff8150b76c)
      => hrtimer_nanosleep (ffffffff8108d66b)
      => SyS_nanosleep (ffffffff8108d750)
      => return_to_handler (ffffffff8150e8e5)
      => tracesys_phase2 (ffffffff8150c844)
      
      After a bit of hair pulling, I found that the state was really
      TASK_INTERRUPTIBLE, but the saved_preempt_count had an old PREEMPT_ACTIVE
      set and caused the sched_switch tracepoint to show it as RUNNING.
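      The fix, in simplified sketch form (the helper lives in
      include/trace/events/sched.h), reads the live preempt_count() rather
      than the stale saved copy:

          static inline long __trace_sched_switch_state(struct task_struct *p)
          {
                  long state = p->state;

          #ifdef CONFIG_PREEMPT
                  /*
                   * For all intents and purposes a preempted task is a
                   * running task. 'p' is current here, so use the live
                   * preempt_count() instead of the stale value that
                   * task_preempt_count() would return.
                   */
                  if (preempt_count() & PREEMPT_ACTIVE)
                          state = TASK_RUNNING | TASK_STATE_MAX;
          #endif

                  return state;
          }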
      
      Link: http://lkml.kernel.org/r/20141210174428.3cb7542a@gandalf.local.home
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: stable@vger.kernel.org # 3.13+
      Cc: Peter Zijlstra <peterz@infradead.org>
      Fixes: 01028747 "sched: Create more preempt_count accessors"
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  6. 10 Dec 2014, 8 commits
  7. 04 Dec 2014, 1 commit
  8. 26 Nov 2014, 3 commits
    • ext4: change LRU to round-robin in extent status tree shrinker · edaa53ca
      Zheng Liu authored
      In this commit we discard the LRU algorithm for inodes with an extent
      status tree, because it takes significant effort to maintain an LRU
      list in the extent status tree shrinker, and the shrinker can take a
      long time to scan this LRU list in order to reclaim some objects.

      We replace the LRU ordering with a simple round-robin scan.  After
      that we never need to keep an LRU list, which means the list needn't
      be sorted even when the shrinker cannot reclaim any objects in the
      first round.
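      A rough sketch of the round-robin scan, with a hypothetical reclaim
      helper (the real shrinker in fs/ext4/extents_status.c carries more
      state, such as a scan budget and per-inode locking):

          /* Round-robin: rotate each visited inode to the tail so the next
           * shrinker invocation starts with a different victim; no LRU
           * ordering has to be maintained or sorted. */
          spin_lock(&sbi->s_es_lock);
          while (nr_to_scan > 0 && !list_empty(&sbi->s_es_list)) {
                  ei = list_first_entry(&sbi->s_es_list,
                                        struct ext4_inode_info, i_es_list);
                  list_move_tail(&ei->i_es_list, &sbi->s_es_list);
                  nr_shrunk += es_reclaim_extents(ei, &nr_to_scan); /* hypothetical */
          }
          spin_unlock(&sbi->s_es_lock);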
      
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: cache extent hole in extent status tree for ext4_da_map_blocks() · 2f8e0a7c
      Zheng Liu authored
      Currently the extent status tree doesn't cache an extent hole when a
      write looks up the extent tree to check whether a block has been
      allocated or not.  We don't put the extent hole in the extent cache
      because the extent might later be removed and a new delayed extent
      might be added back.  But this causes a defect when we do a lot of
      writes: if we don't cache the extent hole, each subsequent write must
      also access the extent tree to see whether the block has been
      allocated, which amounts to a cache miss.  This commit fixes that
      defect.  In addition, if the inode doesn't have any extents, the
      extent hole is cached as well.
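      Conceptually (a simplified sketch, not the exact ext4 code), the write
      path now records the hole in the status tree after a lookup miss:

          /* After the extent-tree lookup finds no mapping for this range,
           * cache it as a hole so the next write here hits the status tree
           * instead of walking the extent tree again. */
          retval = ext4_ext_map_blocks(handle, inode, map, 0);
          if (retval == 0)
                  ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
                                        ~0, EXTENT_STATUS_HOLE);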
      
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix block reservation for bigalloc filesystems · cbd7584e
      Jan Kara authored
      For bigalloc filesystems we have to check whether a newly requested
      inode block isn't already part of a cluster for which we already have
      a delayed allocation reservation. This check happens in
      ext4_ext_map_blocks(), and that function sets EXT4_MAP_FROM_CLUSTER
      when that's the case. However, if ext4_da_map_blocks() finds
      information about the block in the extent cache, we don't call into
      ext4_ext_map_blocks() and thus always end up taking a new reservation,
      even if the space for the cluster is already reserved. This results in
      overreservation and premature ENOSPC reports.
      
      Fix the problem by checking for an existing cluster reservation
      already in ext4_da_map_blocks(). That simplifies the logic and
      actually allows us to get rid of the EXT4_MAP_FROM_CLUSTER flag
      completely.
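      In sketch form, with a hypothetical helper name for the cluster check
      (the real test in ext4_da_map_blocks() walks the extent status tree):

          /* Only reserve space if this block's cluster does not already
           * carry a delayed-allocation reservation. */
          bool reserved = sbi->s_cluster_ratio > 1 &&
                          ext4_es_cluster_is_delayed(inode, map->m_lblk); /* hypothetical */

          if (!reserved) {
                  ret = ext4_da_reserve_space(inode, map->m_lblk);
                  if (ret)
                          return ret;     /* ENOSPC only when space truly runs out */
          }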
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  9. 25 Nov 2014, 4 commits
  10. 20 Nov 2014, 2 commits
  11. 13 Nov 2014, 1 commit
  12. 11 Nov 2014, 1 commit
  13. 09 Nov 2014, 1 commit
  14. 30 Oct 2014, 1 commit
  15. 29 Oct 2014, 1 commit
    • rcu: Make rcu_barrier() understand about missing rcuo kthreads · d7e29933
      Paul E. McKenney authored
      Commit 35ce7f29 (rcu: Create rcuo kthreads only for onlined CPUs)
      avoids creating rcuo kthreads for CPUs that never come online.  This
      fixes a bug in many instances of firmware: Instead of lying about their
      age, these systems instead lie about the number of CPUs that they have.
      Before commit 35ce7f29, this could result in huge numbers of useless
      rcuo kthreads being created.
      
      Experience indicates that I should have told the people suffering
      from this problem to fix their broken firmware, but I instead produced
      what turned out to be a partial fix.  The missing
      piece supplied by this commit makes sure that rcu_barrier() knows not to
      post callbacks for no-CBs CPUs that have not yet come online, because
      otherwise rcu_barrier() will hang on systems having firmware that lies
      about the number of CPUs.
      
      It is tempting to simply have rcu_barrier() refuse to post a callback on
      any no-CBs CPU that does not have an rcuo kthread.  This unfortunately
      does not work because rcu_barrier() is required to wait for all pending
      callbacks.  It is therefore required to wait even for those callbacks
      that cannot possibly be invoked, even if doing so hangs the system.
      
      Given that posting a callback to a no-CBs CPU that does not yet have an
      rcuo kthread can hang rcu_barrier(), it is tempting to report an error
      in this case.  Unfortunately, this will result in false positives at
      boot time, when it is perfectly legal to post callbacks to the boot CPU
      before the scheduler has started, in other words, before it is legal
      to invoke rcu_barrier().
      
      So this commit instead has rcu_barrier() avoid posting callbacks to
      CPUs having neither rcuo kthread nor pending callbacks, and has it
      complain bitterly if it finds CPUs having no rcuo kthread but some
      pending callbacks.  And when rcu_barrier() does find CPUs having no rcuo
      kthread but pending callbacks, as noted earlier, it has no choice but
      to hang indefinitely.
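      The no-CBs handling in _rcu_barrier(), in condensed sketch form
      (rcu_nocb_cpu_needs_barrier() is the helper this commit introduces;
      the complaint path is elided):

          for_each_possible_cpu(cpu) {
                  struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);

                  if (!rcu_is_nocb_cpu(cpu))
                          continue;       /* ordinary CPUs handled as before */

                  /* Neither rcuo kthread nor callbacks: safe to skip. */
                  if (!rcu_nocb_cpu_needs_barrier(rsp, cpu))
                          continue;

                  /* Pending callbacks: we must wait for them, even though
                   * this can hang if no rcuo kthread ever shows up. */
                  atomic_inc(&rsp->barrier_cpu_count);
                  __call_rcu(&rdp->barrier_head, rcu_barrier_callback,
                             rsp, cpu, 0);
          }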
      Reported-by: Yanko Kaneti <yaneti@declera.com>
      Reported-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Reported-by: Meelis Roos <mroos@linux.ee>
      Reported-by: Eric B Munson <emunson@akamai.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Eric B Munson <emunson@akamai.com>
      Tested-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Tested-by: Yanko Kaneti <yaneti@declera.com>
      Tested-by: Kevin Fenzi <kevin@scrye.com>
      Tested-by: Meelis Roos <mroos@linux.ee>
  16. 28 Oct 2014, 1 commit
  17. 08 Oct 2014, 1 commit
  18. 01 Oct 2014, 2 commits
  19. 25 Sep 2014, 1 commit
  20. 24 Sep 2014, 1 commit
  21. 18 Sep 2014, 3 commits