1. 09 3月, 2012 3 次提交
  2. 07 3月, 2012 7 次提交
  3. 27 2月, 2012 4 次提交
  4. 24 2月, 2012 7 次提交
  5. 23 2月, 2012 15 次提交
    • M
      powerpc/perf: Move perf core & PMU code into a subdirectory · f2699491
      Michael Ellerman 提交于
      The perf code has grown a lot since it started, and is big enough to
      warrant its own subdirectory. For reference it's ~60% bigger than the
      oprofile code. It declutters the kernel directory, makes it simpler to
      grep for "just perf stuff", and allows us to shorten some filenames.
      
      While we're at it, make it more obvious that we have two implementations
      of the core perf logic. One for (roughly) Book3S CPUs, which was the
      original implementation, and the other for Freescale embedded CPUs.
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      f2699491
    • M
      fadump: Remove the phyp assisted dump code. · 12d92992
      Mahesh Salgaonkar 提交于
      Remove the phyp assisted dump implementation which is not is use.
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      12d92992
    • M
      fadump: Invalidate the fadump registration during machine shutdown. · 67b43b9d
      Mahesh Salgaonkar 提交于
      If dump is active during system reboot, shutdown or halt then invalidate
      the fadump registration as it does not get invalidated automatically.
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      67b43b9d
    • M
      fadump: Invalidate registration and release reserved memory for general use. · b500afff
      Mahesh Salgaonkar 提交于
      This patch introduces an sysfs interface '/sys/kernel/fadump_release_mem' to
      invalidate the last fadump registration, invalidate '/proc/vmcore', release
      the reserved memory for general use and re-register for future kernel dump.
      Once the dump is copied to the disk, unlike phyp dump, the userspace tool
      can release all the memory reserved for dump with one single operation of
      echo 1 to '/sys/kernel/fadump_release_mem'.
      
      Release the reserved memory region excluding the size of the memory required
      for future kernel dump registration. And therefore, unlike kdump, Fadump
      doesn't need a 2nd reboot to get back the system to the production
      configuration.
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      b500afff
    • M
      fadump: Add PT_NOTE program header for vmcoreinfo · d34c5f26
      Mahesh Salgaonkar 提交于
      Introduce a PT_NOTE program header that points to physical address of
      vmcoreinfo_note buffer declared in kernel/kexec.c. The vmcoreinfo
      note buffer is populated during crash_fadump() at the time of system
      crash.
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      d34c5f26
    • M
      fadump: Convert firmware-assisted cpu state dump data into elf notes. · ebaeb5ae
      Mahesh Salgaonkar 提交于
      When registered for firmware assisted dump on powerpc, firmware preserves
      the registers for the active CPUs during a system crash. This patch reads
      the cpu register data stored in Firmware-assisted dump format (except for
      crashing cpu) and converts it into elf notes and updates the PT_NOTE program
      header accordingly. The exact register state for crashing cpu is saved to
      fadump crash info structure in scratch area during crash_fadump() and read
      during second kernel boot.
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      ebaeb5ae
    • M
      fadump: Initialize elfcore header and add PT_LOAD program headers. · 2df173d9
      Mahesh Salgaonkar 提交于
      Build the crash memory range list by traversing through system memory during
      the first kernel before we register for firmware-assisted dump. After the
      successful dump registration, initialize the elfcore header and populate
      PT_LOAD program headers with crash memory ranges. The elfcore header is
      saved in the scratch area within the reserved memory. The scratch area starts
      at the end of the memory reserved for saving RMR region contents. The
      scratch area contains fadump crash info structure that contains magic number
      for fadump validation and physical address where the eflcore header can be
      found. This structure will also be used to pass some important crash info
      data to the second kernel which will help second kernel to populate ELF core
      header with correct data before it gets exported through /proc/vmcore. Since
      the firmware preserves the entire partition memory at the time of crash the
      contents of the scratch area will be preserved till second kernel boot.
      
      Since the memory dump exported through /proc/vmcore is in ELF format similar
      to kdump, it will help us to reuse the kdump infrastructure for dump capture
      and filtering. Unlike phyp dump, userspace tool does not need to refer any
      sysfs interface while reading /proc/vmcore.
      
      NOTE: The current design implementation does not address a possibility of
      introducing additional fields (in future) to this structure without affecting
      compatibility. It's on TODO list to come up with better approach to
      address this.
      
      Reserved dump area start => +-------------------------------------+
                                  |  CPU state dump data                |
                                  +-------------------------------------+
                                  |  HPTE region data                   |
                                  +-------------------------------------+
                                  |  RMR region data                    |
      Scratch area start       => +-------------------------------------+
                                  |  fadump crash info structure {      |
                                  |     magic nummber                   |
                           +------|---- elfcorehdr_addr                 |
                           |      |  }                                  |
                           +----> +-------------------------------------+
                                  |  ELF core header                    |
      Reserved dump area end   => +-------------------------------------+
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      2df173d9
    • M
      fadump: Register for firmware assisted dump. · 3ccc00a7
      Mahesh Salgaonkar 提交于
      On 2012-02-20 11:02:51 Mon, Paul Mackerras wrote:
      > On Thu, Feb 16, 2012 at 04:44:30PM +0530, Mahesh J Salgaonkar wrote:
      >
      > If I have read the code correctly, we are going to get this printk on
      > non-pSeries machines or on older pSeries machines, even if the user
      > has not put the fadump=on option on the kernel command line.  The
      > printk will be annoying since there is no actual error condition.  It
      > seems to me that the condition for the printk should include
      > fw_dump.fadump_enabled.  In other words you should probably add
      >
      > 	if (!fw_dump.fadump_enabled)
      > 		return 0;
      >
      > at the beginning of the function.
      
      Hi Paul,
      
      Thanks for pointing it out. Please find the updated patch below.
      
      The existing patches above this (4/10 through 10/10) cleanly applies
      on this update.
      
      Thanks,
      -Mahesh.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      3ccc00a7
    • M
      fadump: Reserve the memory for firmware assisted dump. · eb39c880
      Mahesh Salgaonkar 提交于
      Reserve the memory during early boot to preserve CPU state data, HPTE region
      and RMA (real mode area) region data in case of kernel crash. At the time of
      crash, powerpc firmware will store CPU state data, HPTE region data and move
      RMA region data to the reserved memory area.
      
      If the firmware-assisted dump fails to reserve the memory, then fallback
      to existing kexec-based kdump.
      
      Most of the code implementation to reserve memory has been
      adapted from phyp assisted dump implementation written by Linas Vepstas
      and Manish Ahuja
      
      This patch also introduces a config option CONFIG_FA_DUMP for firmware
      assisted dump feature on Powerpc (ppc64) architecture.
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      eb39c880
    • K
      powerpc/mpic: Remove duplicate MPIC_WANTS_RESET flag · e55d7f73
      Kyle Moffett 提交于
      There are two separate flags controlling whether or not the MPIC is
      reset during initialization, which is completely unnecessary, and only
      one of them can be specified in the device tree.
      
      Also, most platforms in-tree right now do actually want to reset the
      MPIC during initialization anyways, which means lots of duplicate code
      passing the MPIC_WANTS_RESET flag.
      
      Fix all of the callers which currently do not pass the MPIC_WANTS_RESET
      flag to pass the MPIC_NO_RESET flag, then remove the MPIC_WANTS_RESET
      flag and make the code reset the MPIC by default.
      Signed-off-by: NKyle Moffett <Kyle.D.Moffett@boeing.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      e55d7f73
    • K
      powerpc/mpic: Add "last-interrupt-source" property to override hardware · c1b8d45d
      Kyle Moffett 提交于
      The FreeScale PowerQUICC-III-compatible (mpc85xx/mpc86xx) MPICs do not
      correctly report the number of hardware interrupt sources, so software
      needs to override the detected value with "256".
      
      To avoid needing to write custom board-specific code to detect that
      scenario, allow it to be easily overridden in the device-tree.
      Signed-off-by: NKyle Moffett <Kyle.D.Moffett@boeing.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      c1b8d45d
    • K
      powerpc/mpic: Remove MPIC_BROKEN_FRR_NIRQS and duplicate irq_count · 5019609f
      Kyle Moffett 提交于
      The mpic->irq_count variable is only used as a software error-checking
      limit to determine whether or not an IRQ number is valid.  In board code
      which does not manually specify an IRQ count to mpic_alloc(), i.e. 0, it
      is automatically detected from the number of ISUs and the ISU size.
      
      In practice, all hardware ends up with irq_count == num_sources, so all
      of the runtime checks on mpic->irq_count should just check the value of
      mpic->num_sources instead.
      
      When platform hardware does not correctly report the number of IRQs,
      which only happens on the MPC85xx/MPC86xx, the MPIC_BROKEN_FRR_NIRQS
      flag is used to override the detected value of num_sources with the
      manual irq_count parameter.  Since there's no need to manually specify
      the number of IRQs except in this case, the extra flag can be eliminated
      and the test changed to "irq_count != 0".
      Signed-off-by: NKyle Moffett <Kyle.D.Moffett@boeing.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      5019609f
    • K
      fsl/mpic: Create and document the "single-cpu-affinity" device-tree flag · 9ca163c8
      Kyle Moffett 提交于
      The Freescale MPIC (and perhaps others in the future) is incapable of
      routing non-IPI interrupts to more than once CPU at a time.  Currently
      all of the Freescale boards msut pass the MPIC_SINGLE_DEST_CPU flag to
      mpic_alloc(), but that information should really be present in the
      device-tree.
      
      Older board code can't rely on the device-tree having the property set,
      but newer platforms won't need it manually specified in the code.
      
      [BenH: Remove unrelated changes, folded in a different patch]
      Signed-off-by: NKyle Moffett <Kyle.D.Moffett@boeing.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      9ca163c8
    • K
      fsl/mpic: Document and use the "big-endian" device-tree flag · 98cca250
      Kyle Moffett 提交于
      The MPIC code checks for a "big-endian" property and sets the flag
      MPIC_BIG_ENDIAN if one is present, although prior to the "mpic->flags"
      fixup that would never have worked anways.
      
      Unfortunately, even now that it works properly, the Freescale mpic
      device-node (the "PowerQUICC-III"-compatible one) does not specify it,
      so all of the board ports need to manually pass it to mpic_alloc().
      
      Document the flag and add it to the pq3 device tree.  Existing code will
      still need to pass the MPIC_BIG_ENDIAN flag because their dtb may not
      have this property, but new platforms shouldn't need to do so.
      Signed-off-by: NKyle Moffett <Kyle.D.Moffett@boeing.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      98cca250
    • K
      powerpc/mpic: Fix use of "flags" variable in mpic_alloc() · 3a7a7176
      Kyle Moffett 提交于
      The mpic_alloc() function takes a "flags" parameter and assigns it into
      the mpic->flags variable fairly early on, but several later pieces of
      code detect various device-tree properties and save them into the
      "mpic->flags" variable (EG: "big-endian" => MPIC_BIG_ENDIAN).
      
      Unfortunately, a number of codepaths (including several which test the
      flag MPIC_BIG_ENDIAN!) test "flags" instead of "mpic->flags", and get
      wrong answers as a result.
      
      Consolidate the device-tree flag tests early in mpic_alloc() and change
      all of the checks after "mpic->flags" is init'ed to use "mpic->flags".
      
      [BenH: Fixed up use of mpic->node before it's initialized]
      Signed-off-by: NKyle Moffett <Kyle.D.Moffett@boeing.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      3a7a7176
  6. 22 2月, 2012 4 次提交
    • B
      powerpc: Fix various issues with return to userspace · 18b246fa
      Benjamin Herrenschmidt 提交于
      We have a few problems when returning to userspace. This is a
      quick set of fixes for 3.3, I'll look into a more comprehensive
      rework for 3.4. This fixes:
      
       - We kept interrupts soft-disabled when schedule'ing or calling
      do_signal when returning to userspace as a result of a hardware
      interrupt.
      
       - Rename do_signal to do_notify_resume like all other archs (and
      do_signal_pending back to do_signal, which it was before Roland
      changed it).
      
       - Add the missing call to key_replace_session_keyring() to
      do_notify_resume().
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      ---
      18b246fa
    • M
      powerpc: Fix program check handling when lockdep is enabled · 922b9f86
      Michael Ellerman 提交于
      In commit 54321242 ("Disable interrupts early in Program Check"), we
      switched from enabling to disabling interrupts in program_check_common.
      
      Whereas ENABLE_INTS leaves r3 untouched, if lockdep is enabled DISABLE_INTS
      calls into lockdep code and will clobber r3. That means we pass a bogus
      struct pt_regs* into program_check_exception() and all hell breaks loose.
      
      So load our regs pointer into r3 after we call DISABLE_INTS.
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      922b9f86
    • R
      powerpc: Remove references to cpu_*_map · 07d2f1a5
      Rusty Russell 提交于
      This has been obsolescent for a while; time for the final push.
      
      In adjacent context, replaced old cpus_* with cpumask_*.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      07d2f1a5
    • L
      sys_poll: fix incorrect type for 'timeout' parameter · faf30900
      Linus Torvalds 提交于
      The 'poll()' system call timeout parameter is supposed to be 'int', not
      'long'.
      
      Now, the reason this matters is that right now 32-bit compat mode is
      broken on at least x86-64, because the 32-bit code just calls
      'sys_poll()' directly on x86-64, and the 32-bit argument will have been
      zero-extended, turning a signed 'int' into a large unsigned 'long'
      value.
      
      We could just introduce a 'compat_sys_poll()' function for this, and
      that may eventually be what we have to do, but since the actual standard
      poll() semantics is *supposed* to be 'int', and since at least on x86-64
      glibc sign-extends the argument before invocing the system call (so
      nobody can actually use a 64-bit timeout value in user space _anyway_,
      even in 64-bit binaries), the simpler solution would seem to be to just
      fix the definition of the system call to match what it should have been
      from the very start.
      
      If it turns out that somebody somehow circumvents the user-level libc
      64-bit sign extension and actually uses a large unsigned 64-bit timeout
      despite that not being how poll() is supposed to work, we will need to
      do the compat_sys_poll() approach.
      Reported-by: NThomas Meyer <thomas@m3y3r.de>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      faf30900