1. 27 May 2011, 1 commit
    • xen: cleancache shim to Xen Transcendent Memory · 5bc20fc5
      Dan Magenheimer committed
      This patch provides a shim between the kernel-internal cleancache
      API (see Documentation/mm/cleancache.txt) and the Xen Transcendent
      Memory ABI (see http://oss.oracle.com/projects/tmem).
      
      Xen tmem provides "hypervisor RAM" as an ephemeral page-oriented
      pseudo-RAM store for cleancache pages, shared cleancache pages,
      and frontswap pages.  Tmem provides enterprise-quality concurrency,
      full save/restore and live migration support, compression
      and deduplication.
      
      A presentation showing up to 8% faster performance and up to 52%
      reduction in sectors read on a kernel compile workload, despite
      aggressive in-kernel page reclamation ("self-ballooning") can be
      found at:
      
      http://oss.oracle.com/projects/tmem/dist/documentation/presentations/TranscendentMemoryXenSummit2010.pdf
      Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
      Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Rik Van Riel <riel@redhat.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Andreas Dilger <adilger@sun.com>
      Cc: Ted Ts'o <tytso@mit.edu>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
  2. 26 May 2011, 13 commits
    • powerpc/4xx: Adding PCIe MSI support · 3fb79338
      Rupjyoti Sarmah committed
      This patch adds MSI support for 440SPe, 460Ex, 460Sx and 405Ex.
      Signed-off-by: Rupjyoti Sarmah <rsarmah@apm.com>
      Signed-off-by: Tirumala R Marri <tmarri@apm.com>
      Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Fix irq_free_virt by adjusting bounds before loop · 4dd60290
      Milton Miller committed
      Instead of looping over each irq and checking against the irq array
      bounds, adjust the bounds before looping.
      
      The old code will not free any irq if the irq + count is above
      irq_virq_count because the test in the loop is testing irq + count
      instead of irq + i.
      
      This code checks the limits to avoid unsigned integer overflows.
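The shape of the fix can be sketched in plain user-space C. The names `IRQ_VIRQ_COUNT`, `host_of`, and `free_virt_range` are illustrative stand-ins, not the kernel's actual symbols; the point is clamping the range once, with an overflow guard, before the loop runs:

```c
#include <assert.h>

#define IRQ_VIRQ_COUNT 512                    /* illustrative table size */

static unsigned char host_of[IRQ_VIRQ_COUNT]; /* stand-in for the irq map */

static unsigned int free_virt_range(unsigned int irq, unsigned int count)
{
    unsigned int i, freed = 0;

    /* Adjust the bounds first, guarding against unsigned wrap-around... */
    if (irq >= IRQ_VIRQ_COUNT || irq + count < irq)
        return 0;
    if (irq + count > IRQ_VIRQ_COUNT)
        count = IRQ_VIRQ_COUNT - irq;

    /* ...so the loop body only ever indexes with irq + i. */
    for (i = 0; i < count; i++) {
        host_of[irq + i] = 0;
        freed++;
    }
    return freed;
}
```

With this shape, a range that runs past the end is trimmed rather than rejected wholesale, and a wrapping `irq + count` is caught before the loop.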
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/irq: Protect irq_radix_revmap_lookup against irq_free_virt · 9b788251
      Milton Miller committed
      The radix-tree code uses call_rcu when freeing internal elements.
      We must protect against the elements being freed while we traverse
      the tree, even if the returned pointer will still be valid.
      
      While preparing a patch to expand the context in which
      irq_radix_revmap_lookup will be called, I realized that the
      radix tree was not locked.
      
      When asked
      
          For a normal call_rcu usage, is it allowed to read the structure in
          irq_enter / irq_exit, without additional rcu_read_lock?  Could an
          element freed with call_rcu advance with the cpu still between
          irq_enter/irq_exit (and irq_disabled())?
      
      Paul McKenney replied:
      
          Absolutely illegal to do so. OK for call_rcu_sched(), but a
          flaming bug for call_rcu().
      
          And thank you very much for finding this!!!
      
      Further analysis:
      
      In the current implementation, CONFIG_TREE_PREEMPT_RCU
      (and CONFIG_TINY_PREEMPT_RCU) use explicit counters.
      
      These counters are reflected from per-CPU to global in the
      scheduling-clock-interrupt handler, so disabling irq does prevent the
      grace period from completing. But there are real-time implementations
      (such as the one used by the Concurrent guys) where disabling irq
      does -not- prevent the grace period from completing.
      
      While an alternative fix would be to switch radix-tree to rcu_sched, I
      don't want to audit the other users of radix trees (nor put alternative
      freeing in the library).  The normal overhead for rcu_read_lock and
      unlock are a local counter increment and decrement.
      
      This does not show up in the rcu lockdep because in 2.6.34 commit
      2676a58c (radix-tree: Disable RCU lockdep checking in radix tree)
      deemed it too hard to pass the condition of the protecting lock
      to the library.
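The locking rule the commit describes can be sketched in self-contained C. Here `rcu_read_lock`/`rcu_read_unlock` are no-op user-space stand-ins for the real kernel primitives, and a flat array models the radix tree; the point is where the read-side critical section must sit:

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-ins: in the kernel these are the real RCU primitives,
 * and freeing of tree nodes is deferred through call_rcu(). */
#define rcu_read_lock()   ((void)0)
#define rcu_read_unlock() ((void)0)

#define HWIRQ_MAX 64

static int token;                                   /* dummy mapped entry */
static void *revmap[HWIRQ_MAX] = { [3] = &token };  /* stand-in for the tree */

static void *revmap_lookup(unsigned int hwirq)
{
    void *ptr;

    if (hwirq >= HWIRQ_MAX)
        return NULL;

    rcu_read_lock();        /* holds off call_rcu() reclamation of nodes */
    ptr = revmap[hwirq];    /* the tree traversal happens in here */
    rcu_read_unlock();

    return ptr;             /* the returned pointer itself stays valid */
}
```

Being between irq_enter/irq_exit is not a substitute for the explicit `rcu_read_lock()`: as Paul McKenney notes above, that only suffices for `call_rcu_sched()`, not `call_rcu()`.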
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/irq: Check desc in handle_one_irq and expand generic_handle_irq · 2e455257
      Milton Miller committed
      Look up the descriptor and check that it is found in handle_one_irq
      before checking if we are on the irq stack, and call the handler
      directly using the descriptor if we are on the stack.
      
      We need to check that irq_to_desc finds the descriptor to avoid a
      NULL pointer dereference.  It could have failed because the number
      from ppc_md.get_irq was above NR_IRQS, or because of various
      exceptional conditions with sparse irqs (eg race conditions while
      freeing an irq if it was not shut down in the controller).
      
      fe12bc2c (genirq: Uninline and sanity check generic_handle_irq())
      moved generic_handle_irq out of line to allow its use by interrupt
      controllers in modules.  However, handle_one_irq is core arch code.
      It already knows the details of struct irq_desc and handling irqs in
      the nested irq case.  This will avoid the extra stack frame to return
      the value we don't check.
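The control flow being described can be sketched in self-contained C. The struct layout, `demo_handler`, and table here are hypothetical models, not the kernel's `struct irq_desc`; the point is looking the descriptor up once, bailing on NULL, then calling through it directly:

```c
#include <assert.h>
#include <stddef.h>

#define NR_IRQS 32

struct irq_desc {
    void (*handle)(struct irq_desc *);
    int fired;
};

static void demo_handler(struct irq_desc *d) { d->fired++; }
static struct irq_desc demo_desc = { demo_handler, 0 };

/* Sparse table: most slots are NULL, as with sparse irqs. */
static struct irq_desc *irq_table[NR_IRQS] = { [5] = &demo_desc };

/* Stand-in for irq_to_desc(): NULL for out-of-range or unmapped irqs. */
static struct irq_desc *irq_to_desc(unsigned int irq)
{
    return irq < NR_IRQS ? irq_table[irq] : NULL;
}

static int handle_one_irq(unsigned int irq)
{
    struct irq_desc *desc = irq_to_desc(irq);

    if (!desc)              /* bail out instead of dereferencing NULL */
        return -1;

    desc->handle(desc);     /* call through the descriptor directly */
    return 0;
}
```

Calling through the descriptor also avoids the extra stack frame of a generic dispatch helper whose return value would go unchecked anyway.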
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/irq: Always free duplicate IRQ_LEGACY hosts · 3d1b5e20
      Milton Miller committed
      Since kmem caches are allocated before init_IRQ as noted in 3af259d1
      (powerpc: Radix trees are available before init_IRQ), we now call
      kmalloc in all cases and can always call kfree if we are asked
      to allocate a duplicate or conflicting IRQ_HOST_MAP_LEGACY host.
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/irq: Remove stale and misleading comment · 8142f032
      Milton Miller committed
      The comment claims we will call host->ops->map() to update the flags if
      we find a previously established mapping, but we never did.  We used
      to call remap, but that call was removed in da051980 (powerpc: Remove
      irq_host_ops->remap hook).
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/cell: Rename ipi functions to match current abstractions · d5a1c193
      Milton Miller committed
      Rename functions and arguments to reflect current usage.  iic_cause_ipi
      becomes iic_message_pass and iic_ipi_to_irq becomes iic_msg_to_irq,
      and iic_request_ipi now takes a message (msg) instead of an ipi number.
      Also mesg is renamed to msg.
      
      Commit f1072939 (powerpc: Remove checks for MSG_ALL and
      MSG_ALL_BUT_SELF) connected the smp_message_pass hook for cell to the
      underlying iic_cause_IPI, a platform unique name.  Later 23d72bfd
      (powerpc: Consolidate ipi message mux and demux) added a cause_ipi
      hook to the smp_ops, also used in message passing, but for controllers
      that can not send 4 unique messages and require multiplexing.  It is
      even more confusing that both take two arguments, but one is the
      small message ordinal and the other is an opaque long data associated
      with the cpu.
      
      Since cell iic maps messages one to one to ipi irqs, rename the
      function and argument to translate from ipi to message.  Also make it
      clear that iic_request_ipi takes a message number as the argument
      for which ipi to create and request.
      
      No functional change, just renames to avoid future confusion.
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/cell: Use common smp ipi actions · 7ef71d75
      Milton Miller committed
      The cell iic interrupt controller has enough software caused interrupts
      to use a unique interrupt for each of the 4 messages powerpc uses.
      This means each interrupt gets its own irq action/data combination.
      
      Use the separate, optimized, arch-common ipi action functions
      registered via the helper smp_request_message_ipi instead of
      passing the message as action data to a single action that then
      demultiplexes to the required action via a switch statement.
      
      smp_request_message_ipi will register the action as IRQF_PER_CPU
      and IRQF_DISABLED, and WARN if the allocation fails for some reason,
      so no need to print on that failure.  It will return positive if
      the message will not be used by the kernel, in which case we can
      free the virq.
      
      In addition to eliminating inefficient code, this also corrects the
      error that a kernel built with kexec but without a debugger would
      not register the ipi for kdump to notify the other cpus of a crash.
      
      This also restores the debugger action to be static to kernel/smp.c.
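The before/after dispatch difference can be sketched in self-contained C. The message names and `ipi_interrupt` below are illustrative models, not the kernel's symbols; the point is one dedicated action per message, as `smp_request_message_ipi` registers, with no demultiplexing switch on the hot path:

```c
#include <assert.h>

/* The four powerpc IPI messages (values here are illustrative). */
enum ipi_msg { MSG_CALL_FUNC, MSG_RESCHEDULE,
               MSG_CALL_FUNC_SINGLE, MSG_DEBUGGER, NR_MSGS };

static int handled[NR_MSGS];

/* One dedicated, message-specific action each, instead of one shared
 * action that switches on the message number passed as action data. */
static void call_func_action(void)        { handled[MSG_CALL_FUNC]++; }
static void reschedule_action(void)       { handled[MSG_RESCHEDULE]++; }
static void call_func_single_action(void) { handled[MSG_CALL_FUNC_SINGLE]++; }
static void debugger_action(void)         { handled[MSG_DEBUGGER]++; }

static void (*ipi_action[NR_MSGS])(void) = {
    call_func_action, reschedule_action,
    call_func_single_action, debugger_action,
};

/* With a unique irq per message, dispatch calls the registered action
 * directly; no demultiplexing switch statement runs. */
static int ipi_interrupt(enum ipi_msg msg)
{
    ipi_action[msg]();
    return handled[msg];
}
```

This mirrors the cell iic situation: four software-caused interrupts map one to one onto the four messages, so each message gets its own irq action/data pair.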
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/pseries: Update MAX_HCALL_OPCODE to reflect page coalescing · ca193150
      Brian King committed
      When page coalescing support was added recently, the MAX_HCALL_OPCODE
      define was not updated for the newly added H_GET_MPP_X hcall.
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/oprofile: Handle events that raise an exception without overflowing · ad5d5292
      Eric B Munson committed
      Commit 0837e324 fixes a situation on POWER7
      where events can roll back if a speculative event doesn't actually complete.
      This can raise a performance monitor exception.  We need to catch this to ensure
      that we reset the PMC.  In all cases the PMC will be less than 256 cycles from
      overflow.
      
      This patch lifts Anton's fix for the problem in perf and applies it to oprofile
      as well.
      Signed-off-by: Eric B Munson <emunson@mgebm.net>
      Cc: <stable@kernel.org> # as far back as it applies cleanly
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/ftrace: Implement raw syscall tracepoints on PowerPC · 02424d89
      Ian Munsie committed
      This patch implements the raw syscall tracepoints on PowerPC and exports
      them for ftrace syscalls to use.
      
      To minimise reworking existing code, I slightly re-ordered the thread
      info flags such that the new TIF_SYSCALL_TRACEPOINT bit would still fit
      within the 16 bits of the andi. instruction's UI field.  The
      instructions in question, in arch/powerpc/kernel/entry_{32,64}.S,
      AND the _TIF_SYSCALL_T_OR_A mask with the thread flags to see if
      system call tracing is enabled.
      
      In the case of 64bit PowerPC, arch_syscall_addr and
      arch_syscall_match_sym_name are overridden to allow ftrace syscalls to
      work given the unusual system call table structure and symbol names that
      start with a period.
      Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • arch/tile: prefer "tilepro" as the name of the 32-bit architecture · 6738d321
      Chris Metcalf committed
      With this change, you can (and should) build with ARCH=tilepro for the
      current 32-bit chips.  Building with ARCH=tile continues to work, but
      we've renamed the defconfig file to tilepro_defconfig for consistency.
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
    • i8k: Integrate with the hwmon subsystem · 949a9d70
      Jean Delvare committed
      Let i8k create an hwmon class device so that libsensors will expose
      the CPU temperature and fan speeds to monitoring applications.
      Signed-off-by: Jean Delvare <khali@linux-fr.org>
      Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Massimo Dal Zotto <dz@debian.org>
  3. 25 May 2011, 26 commits