  1. 11 Jun 2009: 14 commits
  2. 10 Jun 2009: 19 commits
    • L
      tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCK · f1db457c
      Li Zefan committed
      Fix build failures when CONFIG_BLOCK == n.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4A2F1520.8020003@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f1db457c
    • B
      spinlock: Add missing __raw_spin_lock_flags() stub for UP · 04dce7d9
      Benjamin Herrenschmidt committed
      This was only defined with CONFIG_DEBUG_SPINLOCK set, but some
      obscure arch/powerpc code wants it always.
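      As a hedged illustration (the exact header and fallback are assumptions, not
      quoted from the patch), a UP stub for this typically just forwards to the
      plain lock and ignores the flags argument:

              /* Illustrative UP stub: with only one CPU there is no contention,
               * so the saved interrupt flags can simply be ignored here. */
              #define __raw_spin_lock_flags(lock, flags)  __raw_spin_lock(lock)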
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      04dce7d9
    • M
      KVM: protect assigned dev workqueue, int handler and irq acker · 547de29e
      Marcelo Tosatti committed
      kvm_assigned_dev_ack_irq is vulnerable to a race condition with the
      interrupt handler function. It does:
      
              if (dev->host_irq_disabled) {
                      enable_irq(dev->host_irq);
                      dev->host_irq_disabled = false;
              }
      
      If an interrupt triggers before the dev->host_irq_disabled assignment,
      it will disable the interrupt and set dev->host_irq_disabled to true.
      
      On return to kvm_assigned_dev_ack_irq, dev->host_irq_disabled is set to
      false, and the next kvm_assigned_dev_ack_irq call will fail to reenable
      it.
      
      Beyond that, having the interrupt handler and the work handler run in
      parallel is asking for trouble (no obvious problem was spotted, but it is
      better not to have to look; it's fragile).
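      A minimal sketch of the kind of fix this implies (the lock and struct names
      are illustrative, not quoted from the patch): take one spinlock around every
      read-modify-write of host_irq_disabled, in both the interrupt handler and
      the ack notifier.

              /* Illustrative only: serialize the interrupt handler and the ack
               * notifier so their updates of host_irq_disabled cannot interleave. */
              static irqreturn_t assigned_dev_intr(int irq, void *dev_id)
              {
                      struct kvm_assigned_dev_kernel *dev = dev_id;
                      unsigned long flags;

                      spin_lock_irqsave(&dev->assigned_dev_lock, flags);
                      disable_irq_nosync(irq);
                      dev->host_irq_disabled = true;
                      spin_unlock_irqrestore(&dev->assigned_dev_lock, flags);

                      /* ... queue the work that injects the interrupt into the guest ... */
                      return IRQ_HANDLED;
              }

              static void assigned_dev_ack(struct kvm_assigned_dev_kernel *dev)
              {
                      unsigned long flags;

                      spin_lock_irqsave(&dev->assigned_dev_lock, flags);
                      if (dev->host_irq_disabled) {
                              enable_irq(dev->host_irq);
                              dev->host_irq_disabled = false;
                      }
                      spin_unlock_irqrestore(&dev->assigned_dev_lock, flags);
              }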
      
      CC: sheng.yang@intel.com
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      547de29e
    • M
      KVM: use smp_send_reschedule in kvm_vcpu_kick · 32f88400
      Marcelo Tosatti committed
      KVM uses a function-call IPI to cause the exit of a guest running on a
      physical cpu. For virtual interrupt notification there is no need to
      wait for IPI receipt, or to execute any function.
      
      This is exactly what the reschedule IPI does, without the overhead
      of function IPI. So use it instead of smp_call_function_single in
      kvm_vcpu_kick.
      
      Also change the "guest_mode" variable to a bit in vcpu->requests, and
      use that to collapse multiple IPIs that would be issued between the
      first one and the zeroing of guest mode.
      
      This allows kvm_vcpu_kick to be called with interrupts disabled.
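      A hedged sketch of the resulting shape (the request-bit name stands in for
      whatever the patch actually uses): test_and_set_bit() both marks the vcpu as
      kicked and collapses further kicks until the bit is cleared, and the
      cross-CPU notification is just smp_send_reschedule().

              /* Illustrative sketch: KVM_REQ_KICK replaces the old guest_mode flag. */
              void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
              {
                      int cpu = vcpu->cpu;
                      int me = get_cpu();     /* also disables preemption */

                      if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
                              if (!test_and_set_bit(KVM_REQ_KICK, &vcpu->requests))
                                      smp_send_reschedule(cpu);
                      put_cpu();
              }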
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      32f88400
    • S
      KVM: Enable snooping control for supported hardware · 522c68c4
      Sheng Yang committed
      Memory aliases with different memory types are a problem for the guest. For a
      guest without an assigned device, the memory type of guest memory is always
      the same as the host's (WB); but with an assigned device, part of the memory
      may be used for DMA and set to an uncacheable memory type (UC/WC), which
      conflicts with the host memory type and is thus a potential issue.
      
      Snooping control can guarantee cache correctness for memory that goes through
      the DMA engine of VT-d.
      
      [avi: fix build on ia64]
      Signed-off-by: Sheng Yang <sheng@linux.intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      522c68c4
    • N
      KVM: Make kvm header C++ friendly · 2f8b9ee1
      Nathan Binkert committed
      Two things needed fixing: 1) g++ does not allow a named structure type
      within an anonymous union, and 2) a name clash between two padding fields
      within the same struct is avoided by giving them different names, as is
      done elsewhere in the header.
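      A hedged illustration of both points (the struct below is hypothetical, not
      the actual KVM uapi layout): keep struct types unnamed inside anonymous
      unions, and give padding fields distinct names.

              struct kvm_example {            /* hypothetical */
                      __u32 reason;
                      union {
                              struct {        /* unnamed type: fine for g++ */
                                      __u64 gpa;
                              } mmio;
                              char pad1[64];  /* was "pad": renamed to avoid ... */
                      };
                      char pad2[16];          /* ... clashing with this "pad" */
              };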
      Signed-off-by: Nathan Binkert <nate@binkert.org>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      2f8b9ee1
    • G
      KVM: Fix interrupt unhalting a vcpu when it shouldn't · 78646121
      Gleb Natapov committed
      kvm_vcpu_block() unhalts the vcpu on an interrupt/timer without checking
      whether the interrupt window is actually open.
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      78646121
    • S
      KVM: Device assignment framework rework · e56d532f
      Sheng Yang committed
      After discussion with Marcelo, we decided to rework the device assignment
      framework together. The old problem was that the kernel logic was
      unnecessarily complex, so Marcelo suggested splitting it in a more elegant way:
      
      1. Split host IRQ assignment and guest IRQ assignment, and let userspace
      determine the combination. Also discard the msi2intx parameter; userspace can
      specify KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_INTX in assigned_irq->flags to
      enable MSI to INTx conversion.
      
      2. Split IRQ assignment and deassignment. Introduce two new ioctls:
      KVM_ASSIGN_DEV_IRQ and KVM_DEASSIGN_DEV_IRQ (see the hedged usage sketch below).
      
      This patch also fixes the reversed _IOR vs _IOW in the definitions (by
      deprecating the old interface).
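      A hedged userspace sketch of the new flow (the struct fields are assumptions
      based on this description, not quoted from the uapi header; only the ioctl
      and flag names come from the text above):

              #include <linux/kvm.h>
              #include <sys/ioctl.h>
              #include <string.h>

              /* Assign the device's host MSI to a guest INTx pin; userspace picks
               * the host/guest combination entirely via the flags field. */
              static int assign_msi_as_intx(int vm_fd, unsigned dev_id, unsigned guest_irq)
              {
                      struct kvm_assigned_irq irq;

                      memset(&irq, 0, sizeof(irq));
                      irq.assigned_dev_id = dev_id;        /* assumed field name */
                      irq.guest_irq = guest_irq;           /* assumed field name */
                      irq.flags = KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_INTX;

                      return ioctl(vm_fd, KVM_ASSIGN_DEV_IRQ, &irq);
                      /* KVM_DEASSIGN_DEV_IRQ takes the same struct to tear it down. */
              }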
      
      [avi: replace homemade bitcount() by hweight_long()]
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Sheng Yang <sheng@linux.intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      e56d532f
    • G
      KVM: APIC: get rid of deliver_bitmask · 58c2dde1
      Gleb Natapov committed
      Deliver the interrupt during the destination matching loop.
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      58c2dde1
    • G
      KVM: consolidate ioapic/ipi interrupt delivery logic · 343f94fe
      Gleb Natapov committed
      Use kvm_apic_match_dest() in kvm_get_intr_delivery_bitmask() instead
      of duplicating the same code. Use kvm_get_intr_delivery_bitmask() in
      apic_send_ipi() to figure out the IPI destination instead of reimplementing
      the logic.
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      343f94fe
    • G
      KVM: ioapic/msi interrupt delivery consolidation · a53c17d2
      Gleb Natapov committed
      ioapic_deliver() and kvm_set_msi() duplicate code. Move the shared code
      into an ioapic_deliver_entry() function and call it from both places.
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      a53c17d2
    • C
      KVM: declare ioapic functions only on affected hardware · b95b51d5
      Christian Borntraeger committed
      Since "KVM: Unify the delivery of IOAPIC and MSI interrupts"
      I get the following warnings:
      
        CC [M]  arch/s390/kvm/kvm-s390.o
      In file included from arch/s390/kvm/kvm-s390.c:22:
      include/linux/kvm_host.h:357: warning: 'struct kvm_ioapic' declared inside parameter list
      include/linux/kvm_host.h:357: warning: its scope is only this definition or declaration, which is probably not what you want
      
      This patch limits the IOAPIC function declarations to architectures that
      actually have an IOAPIC.
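      A hedged sketch of the resulting pattern (the guard symbol and the prototype
      are illustrative, not quoted from the patch): only declare IOAPIC-flavoured
      helpers where an IOAPIC exists, so architectures such as s390 never see
      struct kvm_ioapic in a parameter list.

              #ifdef __KVM_HAVE_IOAPIC        /* assumed guard symbol */
              void kvm_ioapic_example_hook(struct kvm_ioapic *ioapic, int irq);
              #endif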
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      b95b51d5
    • S
      KVM: Enable MSI-X for KVM assigned device · d510d6cc
      Sheng Yang committed
      This patch finally enables MSI-X.
      
      What we need for MSI-X:
      1. Intercept one page in the MMIO region of the device, so that we can get the
      guest's desired MSI-X table and set up the real one. This is now done by the
      guest side and transferred to the kernel using the ioctls KVM_SET_MSIX_NR and
      KVM_SET_MSIX_ENTRY.
      
      2. Information for the incoming interrupt. One device can now have more than
      one interrupt, and they are all handled by one workqueue structure, so we need
      to identify them. The previous patch enabled gsi_msg_pending_bitmap to get
      this done.
      
      3. Mapping from host IRQ to guest gsi, as well as from guest gsi to the real
      MSI/MSI-X message address/data. We use the same entry number for the host and
      the guest here, so that it's easy to find the correlated guest gsi (see the
      illustrative lookup sketch below).
      
      What we lack for now:
      1. The PCI spec says nothing can coexist with the MSI-X table in the same page
      of the MMIO region, except the pending bits. The patch ignores the pending bits
      as a first step (so they are always 0 - no pending).
      
      2. The PCI spec allows the MSI-X table to be changed dynamically. That means
      the OS can enable MSI-X, then mask one MSI-X entry, modify it, and unmask it.
      The patch doesn't support this, but Linux doesn't work that way either.
      
      3. The patch doesn't implement MSI-X mask-all or single-entry masking. The
      former would be implemented in drivers/pci/msi.c later, and for single entries,
      userspace should have the responsibility to handle them.
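      As a hedged illustration of point 3 above (names are hypothetical): keeping
      the same entry index on both sides makes the host-IRQ-to-guest-GSI lookup a
      simple scan over one small table.

              struct msix_map {               /* hypothetical per-vector record */
                      u32 host_irq;
                      u32 guest_gsi;
              };

              static int find_guest_gsi(struct msix_map *map, int nr, u32 host_irq)
              {
                      int i;

                      for (i = 0; i < nr; i++)
                              if (map[i].host_irq == host_irq)
                                      return map[i].guest_gsi;
                      return -1;              /* not one of this device's vectors */
              }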
      Signed-off-by: Sheng Yang <sheng@linux.intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      d510d6cc
    • S
      KVM: Add MSI-X interrupt injection logic · 2350bd1f
      Sheng Yang committed
      For MSI-X we have to handle more than one interrupt with one handler. Avi
      suggested using a flag to indicate a pending interrupt, so here it is.
      Signed-off-by: Sheng Yang <sheng@linux.intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      2350bd1f
    • S
      KVM: Ioctls for init MSI-X entry · c1e01514
      Sheng Yang committed
      Introduce two ioctls: KVM_SET_MSIX_NR and KVM_SET_MSIX_ENTRY.
      
      These two ioctls are used by userspace to specify the guest device's MSI-X
      entry count and to correlate MSI-X entries with GSIs during the initialization
      stage.
      
      MSI-X should be fully initialized before it is enabled.
      
      Changing the MSI-X entry count is not supported for now.
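      A hedged userspace sketch of the initialization sequence (the ioctl names are
      the ones given above; the argument struct layouts are assumptions for
      illustration only):

              #include <linux/kvm.h>
              #include <sys/ioctl.h>

              /* Assumed argument layouts, mirroring the description above. */
              struct msix_nr_arg    { __u32 assigned_dev_id; __u16 entry_nr; __u16 pad; };
              struct msix_entry_arg { __u32 assigned_dev_id; __u32 gsi; __u16 entry; __u16 pad[3]; };

              static int init_msix(int vm_fd, __u32 dev_id, const __u32 *gsi, int nr)
              {
                      struct msix_nr_arg n = { .assigned_dev_id = dev_id, .entry_nr = nr };
                      int i, ret;

                      ret = ioctl(vm_fd, KVM_SET_MSIX_NR, &n);     /* entry count first */
                      for (i = 0; !ret && i < nr; i++) {
                              struct msix_entry_arg e = {
                                      .assigned_dev_id = dev_id,
                                      .entry = i,                  /* same index host/guest */
                                      .gsi = gsi[i],
                              };
                              ret = ioctl(vm_fd, KVM_SET_MSIX_ENTRY, &e);
                      }
                      return ret;
              }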
      Signed-off-by: Sheng Yang <sheng@linux.intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      c1e01514
    • S
      KVM: Unify the delivery of IOAPIC and MSI interrupts · 116191b6
      Sheng Yang committed
      Signed-off-by: Sheng Yang <sheng@linux.intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      116191b6
    • S
      KVM: Split IOAPIC structure · cf9e4e15
      Sheng Yang committed
      Prepares for reusing ioapic_redir_entry for MSI.
      Signed-off-by: Sheng Yang <sheng@linux.intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      cf9e4e15
    • S
      tracing: add trace_seq_vprint interface · 725c624a
      Steven Rostedt committed
      The code to update the print formats for events requires a vprintf
      format in the trace_seq. This patch adds that interface.
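      A hedged sketch of how such an interface is typically consumed (the caller
      below is hypothetical; the new function is assumed, per the subject, to be
      trace_seq_vprintf()):

              /* A printf-style helper can now hand its va_list straight to the
               * trace_seq instead of formatting into an intermediate buffer. */
              static int example_seq_printf(struct trace_seq *s, const char *fmt, ...)
              {
                      va_list ap;
                      int ret;

                      va_start(ap, fmt);
                      ret = trace_seq_vprintf(s, fmt, ap);
                      va_end(ap);
                      return ret;
              }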
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      725c624a
    • L
      tracing/events: convert block trace points to TRACE_EVENT() · 55782138
      Li Zefan committed
      TRACE_EVENT is a more generic way to define tracepoints. Converting to it adds
      these new capabilities to the block tracepoints (a minimal TRACE_EVENT() sketch
      follows the list):
      
        - zero-copy and per-cpu splice() tracing
        - binary tracing without printf overhead
        - structured logging records exposed under /debug/tracing/events
        - trace events embedded in function tracer output and other plugins
        - user-defined, per tracepoint filter expressions
        ...
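      For orientation, a minimal TRACE_EVENT() definition in the style these block
      tracepoints were converted to (the event name and fields here are
      hypothetical, not the actual block events):

              TRACE_EVENT(block_example,

                      TP_PROTO(struct request_queue *q, sector_t sector, unsigned int nr),

                      TP_ARGS(q, sector, nr),

                      TP_STRUCT__entry(
                              __field( sector_t,      sector  )
                              __field( unsigned int,  nr      )
                      ),

                      TP_fast_assign(
                              __entry->sector = sector;
                              __entry->nr     = nr;
                      ),

                      TP_printk("%llu + %u",
                                (unsigned long long)__entry->sector, __entry->nr)
              );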
      
      Cons:
      
        - no dev_t info for the output of plug, unplug_timer and unplug_io events.
          no dev_t info for getrq and sleeprq events if bio == NULL.
          no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL.
      
          This is mainly because we can't get the device from a request queue.
          But this may change in the future.
      
        - A packet command is converted to a string in TP_assign, not TP_print,
          while blktrace does the conversion just before output.
      
          Since pc requests should be rather rare, this is not a big issue.
      
        - In blktrace, an event can have 2 different print formats, but a TRACE_EVENT
          has a unique format, which means we have some unused data in a trace entry.
      
          The overhead is minimized by using __dynamic_array() instead of __array().
      
      I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing:
      
            dd                   dd + ioctl blktrace       dd + TRACE_EVENT (splice)
      1     7.36s, 42.7 MB/s     7.50s, 42.0 MB/s          7.41s, 42.5 MB/s
      2     7.43s, 42.3 MB/s     7.48s, 42.1 MB/s          7.43s, 42.4 MB/s
      3     7.38s, 42.6 MB/s     7.45s, 42.2 MB/s          7.41s, 42.5 MB/s
      
      So the overhead of tracing is very small, and there is no regression when
      using these trace events vs blktrace.
      
      And the binary output of TRACE_EVENT is much smaller than blktrace:
      
       # ls -l -h
       -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0
       -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1
       -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out
      
      Following are some comparisons between TRACE_EVENT and blktrace:
      
      plug:
        kjournald-480   [000]   303.084981: block_plug: [kjournald]
        kjournald-480   [000]   303.084981:   8,0    P   N [kjournald]
      
      unplug_io:
        kblockd/0-118   [000]   300.052973: block_unplug_io: [kblockd/0] 1
        kblockd/0-118   [000]   300.052974:   8,0    U   N [kblockd/0] 1
      
      remap:
        kjournald-480   [000]   303.085042: block_remap: 8,0 W 102736992 + 8 <- (8,8) 33384
        kjournald-480   [000]   303.085043:   8,0    A   W 102736992 + 8 <- (8,8) 33384
      
      bio_backmerge:
        kjournald-480   [000]   303.085086: block_bio_backmerge: 8,0 W 102737032 + 8 [kjournald]
        kjournald-480   [000]   303.085086:   8,0    M   W 102737032 + 8 [kjournald]
      
      getrq:
        kjournald-480   [000]   303.084974: block_getrq: 8,0 W 102736984 + 8 [kjournald]
        kjournald-480   [000]   303.084975:   8,0    G   W 102736984 + 8 [kjournald]
      
        bash-2066  [001]  1072.953770:   8,0    G   N [bash]
        bash-2066  [001]  1072.953773: block_getrq: 0,0 N 0 + 0 [bash]
      
      rq_complete:
        konsole-2065  [001]   300.053184: block_rq_complete: 8,0 W () 103669040 + 16 [0]
        konsole-2065  [001]   300.053191:   8,0    C   W 103669040 + 16 [0]
      
        ksoftirqd/1-7   [001]  1072.953811:   8,0    C   N (5a 00 08 00 00 00 00 00 24 00) [0]
        ksoftirqd/1-7   [001]  1072.953813: block_rq_complete: 0,0 N (5a 00 08 00 00 00 00 00 24 00) 0 + 0 [0]
      
      rq_insert:
        kjournald-480   [000]   303.084985: block_rq_insert: 8,0 W 0 () 102736984 + 8 [kjournald]
        kjournald-480   [000]   303.084986:   8,0    I   W 102736984 + 8 [kjournald]
      
      Changelog from v2 -> v3:
      
      - use the newly introduced __dynamic_array().
      
      Changelog from v1 -> v2:
      
      - use __string() instead of __array() to minimize the memory required
        to store hex dump of rq->cmd().
      
      - support large pc requests.
      
      - add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT.
      
      - some cleanups.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4A2DF669.5070905@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      55782138
  3. 09 Jun 2009: 2 commits
    • Y
      cpumask: introduce zalloc_cpumask_var · 0281b5dc
      Yinghai Lu committed
      So callers can get a cpumask_var_t that has already been cleared with
      cpumask_clear().
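      A minimal sketch of the obvious shape such a helper takes (whether it clears
      via an explicit cpumask_clear() or a zeroing allocation flag is an
      implementation detail; the suffix marks this as illustrative):

              static inline bool zalloc_cpumask_var_sketch(cpumask_var_t *mask, gfp_t flags)
              {
                      if (!alloc_cpumask_var(mask, flags))
                              return false;
                      cpumask_clear(*mask);   /* hand back an already-empty mask */
                      return true;
              }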
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      0281b5dc
    • P
      ring-buffer: pass in lockdep class key for reader_lock · 1f8a6a10
      Peter Zijlstra committed
      On Sun, 7 Jun 2009, Ingo Molnar wrote:
      > Testing tracer sched_switch: <6>Starting ring buffer hammer
      > PASSED
      > Testing tracer sysprof: PASSED
      > Testing tracer function: PASSED
      > Testing tracer irqsoff:
      > =============================================
      > PASSED
      > Testing tracer preemptoff: PASSED
      > Testing tracer preemptirqsoff: [ INFO: possible recursive locking detected ]
      > PASSED
      > Testing tracer branch: 2.6.30-rc8-tip-01972-ge5b9078-dirty #5760
      > ---------------------------------------------
      > rb_consumer/431 is trying to acquire lock:
      >  (&cpu_buffer->reader_lock){......}, at: [<c109eef7>] ring_buffer_reset_cpu+0x37/0x70
      >
      > but task is already holding lock:
      >  (&cpu_buffer->reader_lock){......}, at: [<c10a019e>] ring_buffer_consume+0x7e/0xc0
      >
      > other info that might help us debug this:
      > 1 lock held by rb_consumer/431:
      >  #0:  (&cpu_buffer->reader_lock){......}, at: [<c10a019e>] ring_buffer_consume+0x7e/0xc0
      
      The ring buffer is a generic structure, and can be used outside of
      ftrace. If ftrace traces within the use of the ring buffer, it can produce
      false positives with lockdep.
      
      This patch passes in a static lock key into the allocation of the ring
      buffer, so that different ring buffers will have their own lock class.
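      A hedged sketch of the pattern (close to how per-call-site keys are usually
      wired up; exact names are illustrative): each allocation site supplies its
      own static lock_class_key, and the buffer's reader_lock is classed with it.

              /* Every caller of ring_buffer_alloc() gets a distinct static key,
               * so lockdep treats each ring buffer's reader_lock as its own class. */
              #define ring_buffer_alloc(size, flags)                          \
              ({                                                              \
                      static struct lock_class_key __key;                     \
                      __ring_buffer_alloc((size), (flags), &__key);           \
              })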
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1244477919.13761.9042.camel@twins>
      
      [ store key in ring buffer descriptor ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      1f8a6a10
  4. 07 Jun 2009: 1 commit
  5. 05 Jun 2009: 1 commit
    • O
      ptrace: tracehook_report_clone: fix false positives · 087eb437
      Oleg Nesterov committed
      The "trace || CLONE_PTRACE" check in tracehook_report_clone() is not right:
      
      - If an untraced task does clone(CLONE_PTRACE), the new child is not traced;
        we must not queue SIGSTOP.
      
      - If we forked the traced task, but the tracer exits and untraces both the
        forking task and the new child (after copy_process() drops tasklist_lock),
        we should not queue SIGSTOP either.
      
      Change the code to check task_ptrace() != 0 instead. This is still racy, but
      the race is harmless.
      
      We can race with another tracer attaching to this child, or the tracer can
      exit and detach in parallel. But given that we didn't do wake_up_new_task()
      yet, the child must have the pending SIGSTOP anyway.
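      A hedged sketch of the new condition (only the task_ptrace() check is named
      by this commit; how the SIGSTOP is queued below is an assumption for
      illustration):

              /* Queue SIGSTOP only if the child really is ptraced right now,
               * instead of inferring it from the clone flags. */
              if (unlikely(task_ptrace(child))) {
                      sigaddset(&child->pending.signal, SIGSTOP);
                      set_tsk_thread_flag(child, TIF_SIGPENDING);
              }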
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Roland McGrath <roland@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      087eb437
  6. 04 Jun 2009: 1 commit
  7. 03 Jun 2009: 1 commit
  8. 02 Jun 2009: 1 commit