1. 10 12月, 2014 4 次提交
  2. 25 11月, 2014 3 次提交
  3. 29 10月, 2014 1 次提交
    • P
      rcu: Make rcu_barrier() understand about missing rcuo kthreads · d7e29933
      Paul E. McKenney 提交于
      Commit 35ce7f29 (rcu: Create rcuo kthreads only for onlined CPUs)
      avoids creating rcuo kthreads for CPUs that never come online.  This
      fixes a bug in many instances of firmware: Instead of lying about their
      age, these systems instead lie about the number of CPUs that they have.
      Before commit 35ce7f29, this could result in huge numbers of useless
      rcuo kthreads being created.
      
      It appears that experience indicates that I should have told the
      people suffering from this problem to fix their broken firmware, but
      I instead produced what turned out to be a partial fix.   The missing
      piece supplied by this commit makes sure that rcu_barrier() knows not to
      post callbacks for no-CBs CPUs that have not yet come online, because
      otherwise rcu_barrier() will hang on systems having firmware that lies
      about the number of CPUs.
      
      It is tempting to simply have rcu_barrier() refuse to post a callback on
      any no-CBs CPU that does not have an rcuo kthread.  This unfortunately
      does not work because rcu_barrier() is required to wait for all pending
      callbacks.  It is therefore required to wait even for those callbacks
      that cannot possibly be invoked.  Even if doing so hangs the system.
      
      Given that posting a callback to a no-CBs CPU that does not yet have an
      rcuo kthread can hang rcu_barrier(), It is tempting to report an error
      in this case.  Unfortunately, this will result in false positives at
      boot time, when it is perfectly legal to post callbacks to the boot CPU
      before the scheduler has started, in other words, before it is legal
      to invoke rcu_barrier().
      
      So this commit instead has rcu_barrier() avoid posting callbacks to
      CPUs having neither rcuo kthread nor pending callbacks, and has it
      complain bitterly if it finds CPUs having no rcuo kthread but some
      pending callbacks.  And when rcu_barrier() does find CPUs having no rcuo
      kthread but pending callbacks, as noted earlier, it has no choice but
      to hang indefinitely.
      Reported-by: NYanko Kaneti <yaneti@declera.com>
      Reported-by: NJay Vosburgh <jay.vosburgh@canonical.com>
      Reported-by: NMeelis Roos <mroos@linux.ee>
      Reported-by: NEric B Munson <emunson@akamai.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: NEric B Munson <emunson@akamai.com>
      Tested-by: NJay Vosburgh <jay.vosburgh@canonical.com>
      Tested-by: NYanko Kaneti <yaneti@declera.com>
      Tested-by: NKevin Fenzi <kevin@scrye.com>
      Tested-by: NMeelis Roos <mroos@linux.ee>
      d7e29933
  4. 08 10月, 2014 1 次提交
  5. 01 10月, 2014 2 次提交
  6. 25 9月, 2014 1 次提交
  7. 24 9月, 2014 1 次提交
  8. 18 9月, 2014 5 次提交
  9. 16 9月, 2014 1 次提交
    • Z
      kvm: ioapic: conditionally delay irq delivery duringeoi broadcast · 184564ef
      Zhang Haoyu 提交于
      Currently, we call ioapic_service() immediately when we find the irq is still
      active during eoi broadcast. But for real hardware, there's some delay between
      the EOI writing and irq delivery.  If we do not emulate this behavior, and
      re-inject the interrupt immediately after the guest sends an EOI and re-enables
      interrupts, a guest might spend all its time in the ISR if it has a broken
      handler for a level-triggered interrupt.
      
      Such livelock actually happens with Windows guests when resuming from
      hibernation.
      
      As there's no way to recognize the broken handle from new raised ones, this patch
      delays an interrupt if 10.000 consecutive EOIs found that the interrupt was
      still high.  The guest can then make a little forward progress, until a proper
      IRQ handler is set or until some detection routine in the guest (such as
      Linux's note_interrupt()) recognizes the situation.
      
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NZhang Haoyu <zhanghy@sangfor.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      184564ef
  10. 08 9月, 2014 1 次提交
  11. 06 9月, 2014 1 次提交
  12. 02 9月, 2014 2 次提交
    • Z
      ext4: track extent status tree shrinker delay statictics · eb68d0e2
      Zheng Liu 提交于
      This commit adds some statictics in extent status tree shrinker.  The
      purpose to add these is that we want to collect more details when we
      encounter a stall caused by extent status tree shrinker.  Here we count
      the following statictics:
        stats:
          the number of all objects on all extent status trees
          the number of reclaimable objects on lru list
          cache hits/misses
          the last sorted interval
          the number of inodes on lru list
        average:
          scan time for shrinking some objects
          the number of shrunk objects
        maximum:
          the inode that has max nr. of objects on lru list
          the maximum scan time for shrinking some objects
      
      The output looks like below:
        $ cat /proc/fs/ext4/sda1/es_shrinker_info
        stats:
          28228 objects
          6341 reclaimable objects
          5281/631 cache hits/misses
          586 ms last sorted interval
          250 inodes on lru list
        average:
          153 us scan time
          128 shrunk objects
        maximum:
          255 inode (255 objects, 198 reclaimable)
          125723 us max scan time
      
      If the lru list has never been sorted, the following line will not be
      printed:
          586ms last sorted interval
      If there is an empty lru list, the following lines also will not be
      printed:
          250 inodes on lru list
        ...
        maximum:
          255 inode (255 objects, 198 reclaimable)
          0 us max scan time
      
      Meanwhile in this commit a new trace point is defined to print some
      details in __ext4_es_shrink().
      
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jan Kara <jack@suse.cz>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      eb68d0e2
    • Z
      ext4: improve extents status tree trace point · e963bb1d
      Zheng Liu 提交于
      This commit improves the trace point of extents status tree.  We rename
      trace_ext4_es_shrink_enter in ext4_es_count() because it is also used
      in ext4_es_scan() and we can not identify them from the result.
      
      Further this commit fixes a variable name in trace point in order to
      keep consistency with others.
      
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jan Kara <jack@suse.cz>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      e963bb1d
  13. 13 8月, 2014 1 次提交
  14. 08 8月, 2014 1 次提交
  15. 07 8月, 2014 2 次提交
    • M
      mm: pagemap: avoid unnecessary overhead when tracepoints are deactivated · 24b7e581
      Mel Gorman 提交于
      This was formerly the series "Improve sequential read throughput" which
      noted some major differences in performance of tiobench since 3.0.
      While there are a number of factors, two that dominated were the
      introduction of the fair zone allocation policy and changes to CFQ.
      
      The behaviour of fair zone allocation policy makes more sense than
      tiobench as a benchmark and CFQ defaults were not changed due to
      insufficient benchmarking.
      
      This series is what's left.  It's one functional fix to the fair zone
      allocation policy when used on NUMA machines and a reduction of overhead
      in general.  tiobench was used for the comparison despite its flaws as
      an IO benchmark as in this case we are primarily interested in the
      overhead of page allocator and page reclaim activity.
      
      On UMA, it makes little difference to overhead
      
                3.16.0-rc3   3.16.0-rc3
                   vanilla lowercost-v5
      User          383.61      386.77
      System        403.83      401.74
      Elapsed      5411.50     5413.11
      
      On a 4-socket NUMA machine it's a bit more noticable
      
                3.16.0-rc3   3.16.0-rc3
                   vanilla lowercost-v5
      User          746.94      802.00
      System      65336.22    40852.33
      Elapsed     27553.52    27368.46
      
      This patch (of 6):
      
      The LRU insertion and activate tracepoints take PFN as a parameter
      forcing the overhead to the caller.  Move the overhead to the tracepoint
      fast-assign method to ensure the cost is only incurred when the
      tracepoint is active.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      24b7e581
    • M
      mm tracing: tell mm_migrate_pages event about numa_misplaced · 4a262d26
      Max Asbock 提交于
      The mm_migrate_pages trace event reports a reason for the migration,
      typically as a symbolic string.  The exception is the reason
      MR_NUMA_MISPLACED for which it just displays the numeric value:
      mm_migrate_pages: nr_succeeded=1 nr_failed=0 mode=MIGRATE_ASYNC
      reason=0x5
      
      This patch makes the output consistent by introducing a string value for
      MR_NUMA_MISPLACED.  The event is then reported as: mm_migrate_pages:
      nr_succeeded=1 nr_failed=0 mode=MIGRATE_ASYNC reason=numa_misplaced
      Signed-off-by: NMax Asbock <masbock@linux.vnet.ibm.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4a262d26
  16. 06 8月, 2014 1 次提交
    • P
      KVM: Move more code under CONFIG_HAVE_KVM_IRQFD · c77dcacb
      Paolo Bonzini 提交于
      Commits e4d57e1e (KVM: Move irq notifier implementation into
      eventfd.c, 2014-06-30) included the irq notifier code unconditionally
      in eventfd.c, while it was under CONFIG_HAVE_KVM_IRQCHIP before.
      
      Similarly, commit 297e2105 (KVM: Give IRQFD its own separate enabling
      Kconfig option, 2014-06-30) moved code from CONFIG_HAVE_IRQ_ROUTING
      to CONFIG_HAVE_KVM_IRQFD but forgot to move the pieces that used to be
      under CONFIG_HAVE_KVM_IRQCHIP.
      
      Together, this broke compilation without CONFIG_KVM_XICS.  Fix by adding
      or changing the #ifdefs so that they point at CONFIG_HAVE_KVM_IRQFD.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c77dcacb
  17. 05 8月, 2014 2 次提交
  18. 02 8月, 2014 1 次提交
  19. 31 7月, 2014 2 次提交
  20. 29 7月, 2014 3 次提交
  21. 09 7月, 2014 1 次提交
    • M
      fence: dma-buf cross-device synchronization (v18) · e941759c
      Maarten Lankhorst 提交于
      A fence can be attached to a buffer which is being filled or consumed
      by hw, to allow userspace to pass the buffer without waiting to another
      device.  For example, userspace can call page_flip ioctl to display the
      next frame of graphics after kicking the GPU but while the GPU is still
      rendering.  The display device sharing the buffer with the GPU would
      attach a callback to get notified when the GPU's rendering-complete IRQ
      fires, to update the scan-out address of the display, without having to
      wake up userspace.
      
      A driver must allocate a fence context for each execution ring that can
      run in parallel. The function for this takes an argument with how many
      contexts to allocate:
        + fence_context_alloc()
      
      A fence is transient, one-shot deal.  It is allocated and attached
      to one or more dma-buf's.  When the one that attached it is done, with
      the pending operation, it can signal the fence:
        + fence_signal()
      
      To have a rough approximation whether a fence is fired, call:
        + fence_is_signaled()
      
      The dma-buf-mgr handles tracking, and waiting on, the fences associated
      with a dma-buf.
      
      The one pending on the fence can add an async callback:
        + fence_add_callback()
      
      The callback can optionally be cancelled with:
        + fence_remove_callback()
      
      To wait synchronously, optionally with a timeout:
        + fence_wait()
        + fence_wait_timeout()
      
      When emitting a fence, call:
        + trace_fence_emit()
      
      To annotate that a fence is blocking on another fence, call:
        + trace_fence_annotate_wait_on(fence, on_fence)
      
      A default software-only implementation is provided, which can be used
      by drivers attaching a fence to a buffer when they have no other means
      for hw sync.  But a memory backed fence is also envisioned, because it
      is common that GPU's can write to, or poll on some memory location for
      synchronization.  For example:
      
        fence = custom_get_fence(...);
        if ((seqno_fence = to_seqno_fence(fence)) != NULL) {
          dma_buf *fence_buf = seqno_fence->sync_buf;
          get_dma_buf(fence_buf);
      
          ... tell the hw the memory location to wait ...
          custom_wait_on(fence_buf, seqno_fence->seqno_ofs, fence->seqno);
        } else {
          /* fall-back to sw sync * /
          fence_add_callback(fence, my_cb);
        }
      
      On SoC platforms, if some other hw mechanism is provided for synchronizing
      between IP blocks, it could be supported as an alternate implementation
      with it's own fence ops in a similar way.
      
      enable_signaling callback is used to provide sw signaling in case a cpu
      waiter is requested or no compatible hardware signaling could be used.
      
      The intention is to provide a userspace interface (presumably via eventfd)
      later, to be used in conjunction with dma-buf's mmap support for sw access
      to buffers (or for userspace apps that would prefer to do their own
      synchronization).
      
      v1: Original
      v2: After discussion w/ danvet and mlankhorst on #dri-devel, we decided
          that dma-fence didn't need to care about the sw->hw signaling path
          (it can be handled same as sw->sw case), and therefore the fence->ops
          can be simplified and more handled in the core.  So remove the signal,
          add_callback, cancel_callback, and wait ops, and replace with a simple
          enable_signaling() op which can be used to inform a fence supporting
          hw->hw signaling that one or more devices which do not support hw
          signaling are waiting (and therefore it should enable an irq or do
          whatever is necessary in order that the CPU is notified when the
          fence is passed).
      v3: Fix locking fail in attach_fence() and get_fence()
      v4: Remove tie-in w/ dma-buf..  after discussion w/ danvet and mlankorst
          we decided that we need to be able to attach one fence to N dma-buf's,
          so using the list_head in dma-fence struct would be problematic.
      v5: [ Maarten Lankhorst ] Updated for dma-bikeshed-fence and dma-buf-manager.
      v6: [ Maarten Lankhorst ] I removed dma_fence_cancel_callback and some comments
          about checking if fence fired or not. This is broken by design.
          waitqueue_active during destruction is now fatal, since the signaller
          should be holding a reference in enable_signalling until it signalled
          the fence. Pass the original dma_fence_cb along, and call __remove_wait
          in the dma_fence_callback handler, so that no cleanup needs to be
          performed.
      v7: [ Maarten Lankhorst ] Set cb->func and only enable sw signaling if
          fence wasn't signaled yet, for example for hardware fences that may
          choose to signal blindly.
      v8: [ Maarten Lankhorst ] Tons of tiny fixes, moved __dma_fence_init to
          header and fixed include mess. dma-fence.h now includes dma-buf.h
          All members are now initialized, so kmalloc can be used for
          allocating a dma-fence. More documentation added.
      v9: Change compiler bitfields to flags, change return type of
          enable_signaling to bool. Rework dma_fence_wait. Added
          dma_fence_is_signaled and dma_fence_wait_timeout.
          s/dma// and change exports to non GPL. Added fence_is_signaled and
          fence_enable_sw_signaling calls, add ability to override default
          wait operation.
      v10: remove event_queue, use a custom list, export try_to_wake_up from
          scheduler. Remove fence lock and use a global spinlock instead,
          this should hopefully remove all the locking headaches I was having
          on trying to implement this. enable_signaling is called with this
          lock held.
      v11:
          Use atomic ops for flags, lifting the need for some spin_lock_irqsaves.
          However I kept the guarantee that after fence_signal returns, it is
          guaranteed that enable_signaling has either been called to completion,
          or will not be called any more.
      
          Add contexts and seqno to base fence implementation. This allows you
          to wait for less fences, by testing for seqno + signaled, and then only
          wait on the later fence.
      
          Add FENCE_TRACE, FENCE_WARN, and FENCE_ERR. This makes debugging easier.
          An CONFIG_DEBUG_FENCE will be added to turn off the FENCE_TRACE
          spam, and another runtime option can turn it off at runtime.
      v12:
          Add CONFIG_FENCE_TRACE. Add missing documentation for the fence->context
          and fence->seqno members.
      v13:
          Fixup CONFIG_FENCE_TRACE kconfig description.
          Move fence_context_alloc to fence.
          Simplify fence_later.
          Kill priv member to fence_cb.
      v14:
          Remove priv argument from fence_add_callback, oops!
      v15:
          Remove priv from documentation.
          Explicitly include linux/atomic.h.
      v16:
          Add trace events.
          Import changes required by android syncpoints.
      v17:
          Use wake_up_state instead of try_to_wake_up. (Colin Cross)
          Fix up commit description for seqno_fence. (Rob Clark)
      v18:
          Rename release_fence to fence_release.
          Move to drivers/dma-buf/.
          Rename __fence_is_signaled and __fence_signal to *_locked.
          Rename __fence_init to fence_init.
          Make fence_default_wait return a signed long, and fix wait ops too.
      Signed-off-by: NMaarten Lankhorst <maarten.lankhorst@canonical.com>
      Signed-off-by: Thierry Reding <thierry.reding@gmail.com> #use smp_mb__before_atomic()
      Acked-by: NSumit Semwal <sumit.semwal@linaro.org>
      Acked-by: NDaniel Vetter <daniel@ffwll.ch>
      Reviewed-by: NRob Clark <robdclark@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e941759c
  22. 24 6月, 2014 1 次提交
  23. 22 6月, 2014 1 次提交
  24. 11 6月, 2014 1 次提交
    • T
      PM / sleep: trace events for device PM callbacks · e8bca479
      Todd E Brandt 提交于
      Adds two trace events which supply the same info that initcall_debug
      provides, but via ftrace instead of dmesg. The existing initcall_debug
      calls require the pm_print_times_enabled var to be set (either via
      sysfs or via the kernel cmd line). The new trace events provide all the
      same info as the initcall_debug prints but with less overhead, and also
      with coverage of device prepare and complete device callbacks.
      
      These events replace the device_pm_report_time event (which has been
      removed). device_pm_callback_start is called first and provides the device
      and callback info. device_pm_callback_end is called after with the
      device name and error info. The time and pid are gathered from the trace
      data headers.
      Signed-off-by: NTodd Brandt <todd.e.brandt@intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      e8bca479