1. 05 Mar 2014 (1 commit)
  2. 29 Jan 2014 (4 commits)
    • powerpc: Make sure "cache" directory is removed when offlining cpu · 91b973f9
      Authored by Paul Mackerras
      The code in remove_cache_dir() is supposed to remove the "cache"
      subdirectory from the sysfs directory for a CPU when that CPU is
      being offlined.  It tries to do this by calling kobject_put() on
      the kobject for the subdirectory.  However, the subdirectory only
      gets removed once the last reference goes away, and the reference
      being put here may well not be the last reference.  That means
      that the "cache" subdirectory may still exist when the offlining
      operation has finished.  If the same CPU subsequently gets onlined,
      the code tries to add a new "cache" subdirectory.  If the old
      subdirectory has not yet been removed, we get a WARN_ON in the
      sysfs code, with stack trace, and an error message printed on the
      console.  Further, we ultimately end up with an online cpu with no
      "cache" subdirectory.
      
      This fixes it by doing an explicit kobject_del() at the point where
      we want the subdirectory to go away.  kobject_del() removes the sysfs
      directory even though the object still exists in memory.  The object
      will get freed at some point in the future.  A subsequent onlining
      operation can create a new sysfs directory, even if the old object
      still exists in memory, without causing any problems.
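
      A minimal sketch of the resulting pattern (the cache_dir structure and
      the kfree() call are illustrative; the kobject_del()/kobject_put()
      pairing is the point):

      static void remove_cache_dir(struct cache_dir *cache_dir)
      {
      	/* Tear down the sysfs directory right away, even though
      	 * other references to the kobject may still exist. */
      	kobject_del(cache_dir->kobj);

      	/* Drop our reference; the object itself is freed whenever
      	 * the last reference goes away. */
      	kobject_put(cache_dir->kobj);

      	kfree(cache_dir);
      }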
      
      Cc: stable@vger.kernel.org # v3.0+
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      91b973f9
    • powerpc/pseries/cpuidle: smt-snooze-delay cleanup. · 3fa8cad8
      Authored by Deepthi Dharwar
      smt-snooze-delay was designed to disable the NAP state, or to delay
      entry into NAP, before the adoption of the cpuidle framework. It is
      a per-cpu variable. With the cpuidle framework in place, states can
      be disabled on a per-cpu basis using the cpuidle sysfs entries.
      
      Also, with the cpuidle driver, each state's target residency is
      per-driver, unlike earlier when it was per-device. The per-cpu
      smt-snooze-delay sysfs entry, which decides the target residency of
      the idle state on a particular cpu, therefore only confuses the user,
      since we cannot have different smt-snooze-delay (target residency)
      values for each cpu.
      
      In the current code, the smt-snooze-delay functionality is completely
      broken, so with the cpuidle framework in place it makes sense to
      remove smt-snooze-delay from the idle driver. However, the sysfs
      files are retained because ppc64_util currently uses them; once
      ppc64_util is fixed, we propose to clean up the kernel code.
      Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      3fa8cad8
    • powerpc: Fix 32-bit frames for signals delivered when transactional · d765ff23
      Authored by Paul Mackerras
      Commit d31626f7 ("powerpc: Don't corrupt transactional state when
      using FP/VMX in kernel") introduced a bug where the uc_link and uc_regs
      fields of the ucontext_t that is created to hold the transactional
      values of the registers in a 32-bit signal frame didn't get set
      correctly.  The reason is that we now clear the MSR_TS bits in the MSR
      in save_tm_user_regs(), before the code that sets uc_link and uc_regs.
      To fix this, we move the setting of uc_link and uc_regs into the same
      if statement that selects whether to call save_tm_user_regs() or
      save_user_regs().
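
      A hedged sketch of the restructured logic (field and helper names
      approximate the 32-bit signal code rather than quoting it):

      	if (MSR_TM_ACTIVE(regs->msr)) {
      		/* Set uc_link and uc_regs in the same branch that
      		 * calls save_tm_user_regs(), since that function
      		 * clears the MSR_TS bits. */
      		err |= __put_user(&frame->uc_transact, &frame->uc.uc_link);
      		err |= __put_user(tm_mctx, &frame->uc_transact.uc_regs);
      		err |= save_tm_user_regs(regs, mctx, tm_mctx, sigret);
      	} else {
      		err |= __put_user(0, &frame->uc.uc_link);
      		err |= save_user_regs(regs, mctx, sigret);
      	}
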
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      d765ff23
    • powerpc: Fix hw breakpoints on !HAVE_HW_BREAKPOINT configurations · 1c430c06
      Authored by Andreas Schwab
      This fixes a logic error that caused a failure to update the hw breakpoint
      registers when not using the hw-breakpoint interface.
      Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      1c430c06
  3. 27 Jan 2014 (5 commits)
    • KVM: PPC: Book3S HV: Add new state for transactional memory · 7b490411
      Authored by Michael Neuling
      Add new state for transactional memory (TM) to kvm_vcpu_arch.  Also add
      asm-offset bits that are going to be required.
      
      This also moves the existing TFHAR, TFIAR and TEXASR SPRs into a
      CONFIG_PPC_TRANSACTIONAL_MEM section.  This requires some code changes
      to ensure we still compile with CONFIG_PPC_TRANSACTIONAL_MEM=N.  Many
      of the added #ifdefs are removed in a later patch when the bulk of the
      TM code is added.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      [agraf: fix merge conflict]
      Signed-off-by: Alexander Graf <agraf@suse.de>
      7b490411
    • KVM: PPC: Book3S HV: Basic little-endian guest support · d682916a
      Authored by Anton Blanchard
      We create a guest MSR from scratch when delivering exceptions in
      a few places.  Instead of extracting LPCR[ILE] and inserting it
      into MSR_LE each time, we simply create a new variable intr_msr which
      contains the entire MSR to use.  For a little-endian guest, userspace
      needs to set the ILE (interrupt little-endian) bit in the LPCR for
      each vcpu (or at least one vcpu in each virtual core).
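
      A hedged sketch of the idea, based on the description above (exact
      code may differ):

      	/* Recompute the exception-delivery MSR once, when userspace
      	 * updates the LPCR, instead of extracting LPCR[ILE] on every
      	 * interrupt injection. */
      	if (new_lpcr & LPCR_ILE)
      		vcpu->arch.intr_msr |= MSR_LE;
      	else
      		vcpu->arch.intr_msr &= ~MSR_LE;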
      
      [paulus@samba.org - removed H_SET_MODE implementation from original
      version of the patch, and made kvmppc_set_lpcr update vcpu->arch.intr_msr.]
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      d682916a
    • KVM: PPC: Book3S HV: Add support for DABRX register on POWER7 · 8563bf52
      Authored by Paul Mackerras
      The DABRX (DABR extension) register on POWER7 processors provides finer
      control over which accesses cause a data breakpoint interrupt.  It
      contains 3 bits which indicate whether to enable accesses in user,
      kernel and hypervisor modes respectively to cause data breakpoint
      interrupts, plus one bit that enables both real mode and virtual mode
      accesses to cause interrupts.  Currently, KVM sets DABRX to allow
      both kernel and user accesses to cause interrupts while in the guest.
      
      This adds support for the guest to specify other values for DABRX.
      PAPR defines a H_SET_XDABR hcall to allow the guest to set both DABR
      and DABRX with one call.  This adds a real-mode implementation of
      H_SET_XDABR, which shares most of its code with the existing H_SET_DABR
      implementation.  To support this, we add a per-vcpu field to store the
      DABRX value plus code to get and set it via the ONE_REG interface.
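
      A hedged sketch of the ONE_REG plumbing for the new per-vcpu field
      (the surrounding switch statements are elided):

      	/* in the get_one_reg path */
      	case KVM_REG_PPC_DABRX:
      		*val = get_reg_val(id, vcpu->arch.dabrx);
      		break;

      	/* in the set_one_reg path */
      	case KVM_REG_PPC_DABRX:
      		vcpu->arch.dabrx = set_reg_val(id, *val);
      		break;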
      
      For Linux guests to use this new hcall, userspace needs to add
      "hcall-xdabr" to the set of strings in the /chosen/hypertas-functions
      property in the device tree.  If userspace does this and then migrates
      the guest to a host where the kernel doesn't include this patch, then
      userspace will need to implement H_SET_XDABR by writing the specified
      DABR value to the DABR using the ONE_REG interface.  In that case, the
      old kernel will set DABRX to DABRX_USER | DABRX_KERNEL.  That should
      still work correctly, at least for Linux guests, since Linux guests
      cope with getting data breakpoint interrupts in modes that weren't
      requested by just ignoring the interrupt, and Linux guests never set
      DABRX_BTI.
      
      The other thing this does is to make H_SET_DABR and H_SET_XDABR work
      on POWER8, which has the DAWR and DAWRX instead of DABR/X.  Guests that
      know about POWER8 should use H_SET_MODE rather than H_SET_[X]DABR, but
      guests running in POWER7 compatibility mode will still use H_SET_[X]DABR.
      For them, this adds the logic to convert DABR/X values into DAWR/X values
      on POWER8.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      8563bf52
    • KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs · b005255e
      Authored by Michael Neuling
      This adds fields to the struct kvm_vcpu_arch to store the new
      guest-accessible SPRs on POWER8, adds code to the get/set_one_reg
      functions to allow userspace to access this state, and adds code to
      the guest entry and exit to context-switch these SPRs between host
      and guest.
      
      Note that DPDES (Directed Privileged Doorbell Exception State) is
      shared between threads on a core; hence we store it in struct
      kvmppc_vcore and have the master thread save and restore it.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      b005255e
    • KVM: PPC: Book3S HV: Align physical and virtual CPU thread numbers · e0b7ec05
      Authored by Paul Mackerras
      On a threaded processor such as POWER7, we group VCPUs into virtual
      cores and arrange that the VCPUs in a virtual core run on the same
      physical core.  Currently we don't enforce any correspondence between
      virtual thread numbers within a virtual core and physical thread
      numbers.  Physical threads are allocated starting at 0 on a first-come
      first-served basis to runnable virtual threads (VCPUs).
      
      POWER8 implements a new "msgsndp" instruction which guest kernels can
      use to interrupt other threads in the same core or sub-core.  Since
      the instruction takes the destination physical thread ID as a parameter,
      it becomes necessary to align the physical thread IDs with the virtual
      thread IDs, that is, to make sure virtual thread N within a virtual
      core always runs on physical thread N.
      
      This means that it's possible that thread 0, which is where we call
      __kvmppc_vcore_entry, may end up running some other vcpu than the
      one whose task called kvmppc_run_core(), or it may end up running
      no vcpu at all, if for example thread 0 of the virtual core is
      currently executing in userspace.  However, we do need thread 0
      to be responsible for switching the MMU -- a previous version of
      this patch that had other threads switching the MMU was found to
      be responsible for occasional memory corruption and machine check
      interrupts in the guest on POWER7 machines.
      
      To accommodate this, we no longer pass the vcpu pointer to
      __kvmppc_vcore_entry, but instead let the assembly code load it from
      the PACA.  Since the assembly code will need to know the kvm pointer
      and the thread ID for threads which don't have a vcpu, we move the
      thread ID into the PACA and we add a kvm pointer to the virtual core
      structure.
      
      In the case where thread 0 has no vcpu to run, it still calls into
      kvmppc_hv_entry in order to do the MMU switch, and then naps until
      either its vcpu is ready to run in the guest, or some other thread
      needs to exit the guest.  In the latter case, thread 0 jumps to the
      code that switches the MMU back to the host.  This control flow means
      that now we switch the MMU before loading any guest vcpu state.
      Similarly, on guest exit we now save all the guest vcpu state before
      switching the MMU back to the host.  This has required substantial
      code movement, making the diff rather large.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      e0b7ec05
  4. 16 Jan 2014 (1 commit)
  5. 15 Jan 2014 (9 commits)
    • powerpc/eeh: Handle multiple EEH errors · 7e4e7867
      Authored by Gavin Shan
      A single PCI-error-related OPAL event can correspond to multiple EEH
      errors, for example frozen PEs detected on different PHBs.
      Unfortunately, we didn't cover that case. The patch enumerates the
      return values from eeh_ops::next_error() and changes
      eeh_handle_special_event() and eeh_ops::next_error() to handle all
      existing EEH errors.
      
      As Ben pointed out, we don't need list_for_each_entry_safe() since we
      are not deleting any PHB from the hose_list, and the EEH serialized
      lock should be held while purging EEH events. The patch covers those
      suggestions as well.
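
      A hedged sketch of the enumerated return values and the resulting
      processing loop (constant names follow the patch description; the
      handling helper shown is one of several cases):

      enum {
      	EEH_NEXT_ERR_NONE = 0,
      	EEH_NEXT_ERR_INF,
      	EEH_NEXT_ERR_FROZEN_PE,
      	EEH_NEXT_ERR_FENCED_PHB,
      	EEH_NEXT_ERR_DEAD_PHB,
      	EEH_NEXT_ERR_DEAD_IOC
      };

      static void eeh_handle_special_event(void)
      {
      	struct eeh_pe *pe;
      	int rc;

      	do {
      		rc = eeh_ops->next_error(&pe);	/* fetch one pending error */
      		if (rc == EEH_NEXT_ERR_FROZEN_PE ||
      		    rc == EEH_NEXT_ERR_FENCED_PHB)
      			eeh_handle_normal_event(pe);	/* recover this PE/PHB */
      	} while (rc != EEH_NEXT_ERR_NONE);	/* drain all queued errors */
      }
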
      Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      7e4e7867
    • powerpc: Fix transactional FP/VMX/VSX unavailable handlers · 3ac8ff1c
      Authored by Paul Mackerras
      Currently, if a process starts a transaction and then takes an
      exception because the FPU, VMX or VSX unit is unavailable to it,
      we end up corrupting any FP/VMX/VSX state that was valid before
      the interrupt.  For example, if the process starts a transaction
      with the FPU available to it but VMX unavailable, and then does
      a VMX instruction inside the transaction, the FP state gets
      corrupted.
      
      Loading up the desired state generally involves doing a reclaim
      and a recheckpoint.  To avoid corrupting already-valid state, we have
      to be careful not to reload that state from the thread_struct
      between the reclaim and the recheckpoint (since the thread_struct
      values are stale by now), and we have to reload that state from
      the transact_fp/vr arrays after the recheckpoint to get back the
      current transactional values saved there by the reclaim.
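
      In outline, the careful ordering looks like this (a hedged sketch for
      the FP case; function signatures are approximate):

      	/* Reclaim: saves the current transactional FP state into
      	 * thread.transact_fp. */
      	tm_reclaim(&current->thread, current->thread.regs->msr,
      		   TM_CAUSE_FAC_UNAV);

      	/* Do NOT reload FP state from the thread_struct here:
      	 * those values are stale after the reclaim. */

      	/* Recheckpoint with MSR_FP set so the checkpointed state
      	 * records FP as valid. */
      	tm_recheckpoint(&current->thread,
      			current->thread.regs->msr | MSR_FP);

      	/* Reload the current transactional values that the reclaim
      	 * saved in the transact_fp array. */
      	load_fp_state(&current->thread.transact_fp);
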
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      3ac8ff1c
    • powerpc: Don't corrupt transactional state when using FP/VMX in kernel · d31626f7
      Authored by Paul Mackerras
      Currently, when we have a process using the transactional memory
      facilities on POWER8 (that is, the processor is in transactional
      or suspended state), and the process enters the kernel and the
      kernel then uses the floating-point or vector (VMX/Altivec) facility,
      we end up corrupting the user-visible FP/VMX/VSX state.  This
      happens, for example, if a page fault causes a copy-on-write
      operation, because the copy_page function will use VMX to do the
      copy on POWER8.  The test program below demonstrates the bug.
      
      The bug happens because when FP/VMX state for a transactional process
      is stored in the thread_struct, we store the checkpointed state in
      .fp_state/.vr_state and the transactional (current) state in
      .transact_fp/.transact_vr.  However, when the kernel wants to use
      FP/VMX, it calls enable_kernel_fp() or enable_kernel_altivec(),
      which saves the current state in .fp_state/.vr_state.  Furthermore,
      when we return to the user process we return with FP/VMX/VSX
      disabled.  The next time the process uses FP/VMX/VSX, we don't know
      which set of state (the current register values, .fp_state/.vr_state,
      or .transact_fp/.transact_vr) we should be using, since we have no
      way to tell if we are still in the same transaction, and if not,
      whether the previous transaction succeeded or failed.
      
      Thus it is necessary to strictly adhere to the rule that if FP has
      been enabled at any point in a transaction, we must keep FP enabled
      for the user process with the current transactional state in the
      FP registers, until we detect that it is no longer in a transaction.
      Similarly for VMX; once enabled it must stay enabled until the
      process is no longer transactional.
      
      In order to keep this rule, we add a new thread_info flag which we
      test when returning from the kernel to userspace, called TIF_RESTORE_TM.
      This flag indicates that there is FP/VMX/VSX state to be restored
      before entering userspace, and when it is set the .tm_orig_msr field
      in the thread_struct indicates what state needs to be restored.
      The restoration is done by restore_tm_state().  The TIF_RESTORE_TM
      bit is set by new giveup_fpu/altivec_maybe_transactional helpers,
      which are called from enable_kernel_fp/altivec, giveup_vsx, and
      flush_fp/altivec_to_thread instead of giveup_fpu/altivec.
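
      A hedged sketch of the new helper for the FP case, closely following
      the description above:

      static void giveup_fpu_maybe_transactional(struct task_struct *tsk)
      {
      	/* If the task is in an active transaction, flag that its FP
      	 * state must be restored before it returns to userspace. */
      	if (tsk == current && tsk->thread.regs &&
      	    MSR_TM_ACTIVE(tsk->thread.regs->msr) &&
      	    !test_thread_flag(TIF_RESTORE_TM)) {
      		tsk->thread.tm_orig_msr = tsk->thread.regs->msr;
      		set_thread_flag(TIF_RESTORE_TM);
      	}

      	giveup_fpu(tsk);
      }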
      
      The other thing to be done is to get the transactional FP/VMX/VSX
      state from .fp_state/.vr_state when doing reclaim, if that state
      has been saved there by giveup_fpu/altivec_maybe_transactional.
      Having done this, we set the FP/VMX bit in the thread's MSR after
      reclaim to indicate that that part of the state is now valid
      (having been reclaimed from the processor's checkpointed state).
      
      Finally, in the signal handling code, we move the clearing of the
      transactional state bits in the thread's MSR a bit earlier, before
      calling flush_fp_to_thread(), so that we don't unnecessarily set
      the TIF_RESTORE_TM bit.
      
      This is the test program:
      
      /* Michael Neuling 4/12/2013
       *
       * See if the altivec state is leaked out of an aborted transaction due to
       * kernel vmx copy loops.
       *
       *   gcc -m64 htm_vmxcopy.c -o htm_vmxcopy
       *
       */
      
      /* We don't use all of these, but for reference: */
      #include <assert.h>
      #include <inttypes.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <sys/mman.h>
      #include <unistd.h>
      
      /* Raw opcodes for the TM instructions, so the test builds without a
       * TM-aware assembler (same encodings as the kernel's TM selftests). */
      #define TBEGIN		".long 0x7C00051D ;"
      #define TEND		".long 0x7C00055D ;"
      #define TSUSPEND	".long 0x7C0005DD ;"
      #define TRESUME		".long 0x7C2005DD ;"
      #define TABORT		".long 0x7C00071D ;"
      
      int main(int argc, char *argv[])
      {
      	long double vecin = 1.3;
      	long double vecout;
      	unsigned long pgsize = getpagesize();
      	int i;
      	int fd;
      	int size = pgsize*16;
      	char tmpfile[] = "/tmp/page_faultXXXXXX";
      	char buf[pgsize];
      	char *a;
      	uint64_t aborted = 0;
      
      	fd = mkstemp(tmpfile);
      	assert(fd >= 0);
      
      	memset(buf, 0, pgsize);
      	for (i = 0; i < size; i += pgsize)
      		assert(write(fd, buf, pgsize) == pgsize);
      
      	unlink(tmpfile);
      
      	a = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
      	assert(a != MAP_FAILED);
      
      	asm __volatile__(
      		"lxvd2x 40,0,%[vecinptr] ; " // set 40 to initial value
      		TBEGIN
      		"beq	3f ;"
      		TSUSPEND
      		"xxlxor 40,40,40 ; " // set 40 to 0
      		"std	5, 0(%[map]) ;" // cause kernel vmx copy page
      		TABORT
      		TRESUME
      		TEND
      		"li	%[res], 0 ;"
      		"b	5f ;"
      		"3: ;" // Abort handler
      		"li	%[res], 1 ;"
      		"5: ;"
      		"stxvd2x 40,0,%[vecoutptr] ; "
      		: [res]"=r"(aborted)
      		: [vecinptr]"r"(&vecin),
      		  [vecoutptr]"r"(&vecout),
      		  [map]"r"(a)
      		: "memory", "r0", "r3", "r4", "r5", "r6", "r7");
      
      	if (aborted && (vecin != vecout)){
      		printf("FAILED: vector state leaked on abort %f != %f\n",
      		       (double)vecin, (double)vecout);
      		exit(1);
      	}
      
      	munmap(a, size);
      
      	close(fd);
      
      	printf("PASSED!\n");
      	return 0;
      }
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      d31626f7
    • powerpc: Fix races with irq_work · 0215f7d8
      Authored by Benjamin Herrenschmidt
      If we set irq_work on a processor and then, before the irq work has a
      chance to be processed, change the decrementer value, we can seriously
      delay the handling of that irq_work.
      
      Fix it by checking for pending irq work in a few places: before
      changing the decrementer in decrementer_set_next_event(), after
      changing it in the same function, and in timer_interrupt().
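
      A hedged sketch of the decrementer-side checks (following the
      description above; test_irq_work_pending() is assumed to be the
      existing per-cpu flag test in this file):

      static int decrementer_set_next_event(unsigned long evt,
      				      struct clock_event_device *dev)
      {
      	/* Leave a pending irq_work alone: don't delay it further */
      	if (test_irq_work_pending())
      		return 0;

      	__get_cpu_var(decrementers_next_tb) = get_tb_or_rtc() + evt;
      	set_dec(evt);

      	/* We may have raced with new irq work being set */
      	if (test_irq_work_pending())
      		set_dec(1);	/* fire almost immediately */

      	return 0;
      }
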
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      0215f7d8
    • Move processing of MCE queued event out from syscall exit path. · 30c82635
      Authored by Mahesh Salgaonkar
      Hugh Dickins reported an issue that b5ff4211
      "powerpc/book3s: Queue up and process delayed MCE events" breaks the
      PowerMac G5 boot. This patch fixes it by moving the MCE event
      processing away from syscall exit, which was the wrong place for it in
      the first place, and by using the irq_work framework to delay the
      processing of the MCE event.
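
      A hedged sketch of the irq_work usage (handler body elided):

      static void machine_check_process_queued_event(struct irq_work *work)
      {
      	/* drain and process the queued MCE events here */
      }

      static struct irq_work mce_event_process_work = {
      	.func = machine_check_process_queued_event,
      };

      void machine_check_queue_event(void)
      {
      	/* queue the event, then let irq_work process it soon after,
      	 * instead of hooking the syscall exit path */
      	irq_work_queue(&mce_event_process_work);
      }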
      
      Reported-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      30c82635
    • powerpc/iommu: Don't detach device without IOMMU group · 0c4b9e27
      Authored by Gavin Shan
      Some devices, for example PCI root ports, don't have an IOMMU table
      and group, so we needn't detach them from their IOMMU group.
      Otherwise, it can crash the kernel by dereferencing a NULL IOMMU
      group, as the following backtrace indicates:
      
        .iommu_group_remove_device+0x74/0x1b0
        .iommu_bus_notifier+0x94/0xb4
        .notifier_call_chain+0x78/0xe8
        .__blocking_notifier_call_chain+0x7c/0xbc
        .blocking_notifier_call_chain+0x38/0x48
        .device_del+0x50/0x234
        .pci_remove_bus_device+0x88/0x138
        .pci_stop_and_remove_bus_device+0x2c/0x40
        .pcibios_remove_pci_devices+0xcc/0xfc
        .pcibios_remove_pci_devices+0x3c/0xfc
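
      A hedged sketch of the guard in the bus notifier (following the
      description above):

      	case BUS_NOTIFY_DEL_DEVICE:
      		/* Only detach devices that actually joined a group;
      		 * e.g. PCI root ports have no IOMMU table/group. */
      		if (dev->iommu_group)
      			iommu_group_remove_device(dev);
      		return 0;
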
      Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      0c4b9e27
    • powerpc/eeh: Hotplug improvement · f26c7a03
      Authored by Gavin Shan
      When an EEH error hits a specific PCI device before its driver is
      loaded, we apply hotplug to recover from the error. During the plug,
      the PCI device is probed and its driver is loaded; we then wrongly
      call the error handlers if the driver supports EEH explicitly.
      
      The patch fixes this by introducing the flag EEH_DEV_NO_HANDLER and
      setting it before we remove the PCI device. In turn, we avoid wrongly
      calling the error handlers of the PCI device after its driver has
      been loaded.
      Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      f26c7a03
    • powerpc/eeh: Add restore_config operation · 1d350544
      Authored by Gavin Shan
      After a reset of a specific PE or PHB, we never configure AER
      correctly on the PowerNV platform (we needn't care about this on the
      pSeries platform). The patch introduces an additional EEH operation,
      eeh_ops::restore_config(), so that we have a chance to configure AER
      correctly on PowerNV.
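
      A hedged sketch of the new operation's shape (the device_node
      parameter is an assumption based on the other eeh_ops of that era):

      struct eeh_ops {
      	/* ... existing operations elided ... */
      	int (*restore_config)(struct device_node *dn);
      };
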
      Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      1d350544
    • powerpc: Delete non-required instances of include <linux/init.h> · c141611f
      Authored by Paul Gortmaker
      None of these files are actually using any __init type directives
      and hence don't need to include <linux/init.h>.  Most are just a
      left over from __devinit and __cpuinit removal, or simply due to
      code getting copied from one driver to the next.
      
      The one instance where we add an include for init.h covers off
      a case where that file was implicitly getting it from another
      header which itself didn't need it.
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      c141611f
  6. 13 Jan 2014 (2 commits)
    • powerpc: Check return value of instance-to-package OF call · 10348f59
      Authored by Benjamin Herrenschmidt
      On PA-Semi firmware, the instance-to-package callback doesn't seem
      to be implemented. We didn't check for an error, however, and thus
      subsequently passed the returned -1 value in stdout_node to things
      like prom_getprop etc...
      
      This caused the firmware to load values around 0 (physical)
      internally as node structures. It somewhat "worked" as long as we had
      a NULL in the right place (address 8) at the beginning of the kernel,
      so we didn't "see" the bug. But commit 5c0484e2
      "powerpc: Endian safe trampoline" changed the kernel entry point,
      causing that old bug to now crash early during boot.
      
      This fixes booting on PA-Semi boards by properly checking the return
      value from instance-to-package.
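
      A hedged sketch of the check, using prom_init-style helpers
      (PROM_ERROR is the firmware's -1 return value):

      	val = call_prom("instance-to-package", 1, 1, prom.stdout);
      	if (val != PROM_ERROR) {
      		stdout_node = val;
      		prom_setprop(prom.chosen, "/chosen",
      			     "linux,stdout-package",
      			     &stdout_node, sizeof(stdout_node));
      	}
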
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Tested-by: Olof Johansson <olof@lixom.net>
      10348f59
    • clk: mpc5xxx: switch to COMMON_CLK, retire PPC_CLOCK · 7d71d5b2
      Authored by Gerhard Sittig
      the setup before the change was
      - arch/powerpc/Kconfig had the PPC_CLOCK option, off by default
      - depending on the PPC_CLOCK option the arch/powerpc/kernel/clock.c file
        was built, which implements the clk.h API but always returns -ENOSYS
        unless a platform registers specific callbacks
      - the MPC52xx platform selected PPC_CLOCK but did not register any
        callbacks, thus all clk.h API calls keep resulting in -ENOSYS errors
        (which is OK, all peripheral drivers deal with the situation)
      - the MPC512x platform selected PPC_CLOCK and registered specific
        callbacks implemented in arch/powerpc/platforms/512x/clock.c, thus
        provided real support for the clock API
      - no other powerpc platform did select PPC_CLOCK
      
      the situation after the change is
      - the MPC512x platform implements the COMMON_CLK interface, and thus the
        PPC_CLOCK approach in arch/powerpc/platforms/512x/clock.c has become
        obsolete
      - the MPC52xx platform still lacks genuine support for the clk.h API,
        but this is not a change from the previous situation (the error
        code returned by the COMMON_CLK stubs differs, but every call still
        results in an error)
      - with all references gone, the arch/powerpc/kernel/clock.c wrapper and
        the PPC_CLOCK option have become obsolete, as did the clk_interface.h
        header file
      
      the switch from PPC_CLOCK to COMMON_CLK is done for all platforms within
      the same commit such that multiplatform kernels (the combination of 512x
      and 52xx within one executable) keep working
      
      Cc: Mike Turquette <mturquette@linaro.org>
      Cc: Anatolij Gustschin <agust@denx.de>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Gerhard Sittig <gsi@denx.de>
      Signed-off-by: Anatolij Gustschin <agust@denx.de>
      7d71d5b2
  7. 11 Jan 2014 (1 commit)
    • powerpc: Replaced tlbilx with tlbwe in the initialization code · ed2ddc56
      Authored by Diana Craciun
      On Freescale e6500 cores EPCR[DGTMI] controls whether guest supervisor
      state can execute TLB management instructions. If EPCR[DGTMI]=0
      tlbwe and tlbilx are allowed to execute normally in the guest state.
      
      A hypervisor may choose to virtualize TLB1, and for this purpose it
      may use IPROT to protect the entries from being invalidated by the
      guest. However, because tlbwe and tlbilx execution in the guest state
      are controlled by the same bit, it is not possible to have a scenario
      where tlbwe is allowed to execute in guest state while tlbilx traps.
      When guest TLB management instructions are allowed to execute in
      guest state, the guest cannot use tlbilx to invalidate TLB1 guest
      entries.
      
      Linux uses tlbilx in the boot code to invalidate the temporary
      entries it creates when initializing the MMU. The patch replaces the
      use of tlbilx in the initialization code with tlbwe with the VALID
      bit cleared.
      
      Linux also uses tlbilx in other contexts (like huge pages or indirect
      entries), but removing tlbilx from the initialization code makes it
      possible to run scenarios under a hypervisor which do not use huge
      pages or indirect entries.
      Signed-off-by: Diana Craciun <Diana.Craciun@freescale.com>
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      ed2ddc56
  8. 10 Jan 2014 (9 commits)
    • powerpc/e6500: TLB miss handler with hardware tablewalk support · 28efc35f
      Authored by Scott Wood
      There are a few things that make the existing hw tablewalk handlers
      unsuitable for e6500:
      
       - Indirect entries go in TLB1 (though the resulting direct entries go in
         TLB0).
      
       - It has threads, but no "tlbsrx." -- so we need a spinlock and
         a normal "tlbsx".  Because we need this lock, hardware tablewalk
         is mandatory on e6500 unless we want to add spinlock+tlbsx to
         the normal bolted TLB miss handler.
      
       - TLB1 has no HES (nor next-victim hint) so we need software round robin
         (TODO: integrate this round robin data with hugetlb/KVM)
      
       - The existing tablewalk handlers map half of a page table at a time,
         because IBM hardware has a fixed 1MiB indirect page size.  e6500
         has variable size indirect entries, with a minimum of 2MiB.
         So we can't do the half-page indirect mapping, and even if we
         could it would be less efficient than mapping the full page.
      
       - Like on e5500, the linear mapping is bolted, so we don't need the
         overhead of supporting nested tlb misses.
      
      Note that hardware tablewalk does not work in rev1 of e6500.
      We do not expect to support e6500 rev1 in mainline Linux.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Cc: Mihai Caraman <mihai.caraman@freescale.com>
      28efc35f
    • powerpc/fsl_booke: smp support for booting a relocatable kernel above 64M · 0be7d969
      Authored by Kevin Hao
      When booting a secondary cpu above 64M, we face the same issue as on
      the boot cpu: PAGE_OFFSET maps to two different physical addresses
      for the init TLB and the final map. So we have to use
      switch_to_as1/restore_to_as0 for the conversion between these two
      maps. When restoring to AS0 on a secondary cpu, we only need to
      return to the caller, so add a new parameter to restore_to_as0 for
      this purpose.
      
      Use LOAD_REG_ADDR_PIC to get the address of variables which may be
      used before we set the final map in the CAMs for the secondary cpu.
      Move the setting of the CAMs a bit earlier in order to avoid
      unnecessary use of LOAD_REG_ADDR_PIC.
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      0be7d969
    • powerpc/fsl_booke: make sure PAGE_OFFSET map to memstart_addr for relocatable kernel · 7d2471f9
      Authored by Kevin Hao
      This is always true for a non-relocatable kernel; otherwise the
      kernel would get stuck. For a relocatable kernel it is a little more
      complicated: when booting a relocatable kernel, we just align the
      kernel start address to 64M and map PAGE_OFFSET from there. The
      relocation is based on this virtual address. But if this address is
      not the same as memstart_addr, we have to change the mapping of
      PAGE_OFFSET to the real memstart_addr and perform another relocation.
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      [scottwood@freescale.com: make offset long and non-negative in simple case]
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      7d2471f9
    • powerpc: introduce early_get_first_memblock_info · b27652dd
      Authored by Kevin Hao
      Since a relocatable kernel can be loaded at any address, there is no
      relation between the kernel start address and memstart_addr, so we
      can't calculate memstart_addr from the kernel start address. We also
      can't wait to do the relocation until we get the real memstart_addr
      from the device tree, because that is too late. So introduce a new
      function we can use to get the first memblock address and size at a
      very early stage (before machine_init).
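
      A hedged usage sketch (the caller runs very early, with the flat
      device tree pointer in hand):

      	phys_addr_t size;

      	/* Find the base/size of the first memblock before machine_init,
      	 * so the relocatable kernel can compute the real memstart_addr. */
      	early_get_first_memblock_info(__va(dt_ptr), &size);
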
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      b27652dd
    • powerpc/fsl_booke: set the tlb entry for the kernel address in AS1 · 78a235ef
      Authored by Kevin Hao
      We use TLB1 entries to map low memory into the kernel space. The
      current code assumes that the first TLB entry covers the kernel
      image, but this is not true in some special cases, such as when we
      run a relocatable kernel above 64M or set CONFIG_KERNEL_START above
      64M. So we choose to switch to address space 1 before setting these
      TLB entries.
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      78a235ef
    • powerpc: enable the relocatable support for the fsl booke 32bit kernel · dd189692
      Authored by Kevin Hao
      This is based on the code in head_44x.S. The difference is that the
      init TLB size we use is 64M. With this patch we can only load the
      kernel at addresses between memstart_addr and memstart_addr + 64M; we
      will remove this restriction in the following patches.
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      dd189692
    • powerpc/fsl_booke: introduce get_phys_addr function · 99739611
      Authored by Kevin Hao
      Move the code which translates an effective address to a physical
      address into a separate function, so it can be reused by other code.
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      99739611
    • powerpc/fsl_booke: protect the access to MAS7 · 7c732cba
      Authored by Kevin Hao
      The e500v1 doesn't implement MAS7, so we should avoid accessing this
      register on that implementation. In the current kernel, accesses to
      MAS7 are protected by either CONFIG_PHYS_64BIT or MMU_FTR_BIG_PHYS.
      Since some code is executed before code patching, we have to use
      CONFIG_PHYS_64BIT in those cases.
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      7c732cba
    • powerpc/85xx: add sysfs for pw20 state and altivec idle · a7189483
      Authored by Wang Dongsheng
      Add a sysfs interface to enable/disable the PW20 state or AltiVec
      idle, and to control the wait entry time.
      
      Enable/disable interface (write 0 to disable, 1 to enable):
          /sys/devices/system/cpu/cpuX/pw20_state
          /sys/devices/system/cpu/cpuX/altivec_idle
      
      Wait time interface (in nanoseconds):
          /sys/devices/system/cpu/cpuX/pw20_wait_time
          /sys/devices/system/cpu/cpuX/altivec_idle_wait_time
      
      Example, based on a TB frequency of 41MHz:
          1~48(ns): TB[63]
          49~97(ns): TB[62]
          98~195(ns): TB[61]
          196~390(ns): TB[60]
          391~780(ns): TB[59]
          781~1560(ns): TB[58]
          ...
      Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
      [scottwood@freescale.com: change ifdef]
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      a7189483
  9. 09 Jan 2014 (3 commits)
  10. 08 Jan 2014 (5 commits)
    • powerpc/85xx: add hardware automatically enter pw20 state · 1d47ddf7
      Authored by Wang Dongsheng
      Use hardware features to make the core automatically enter the PW20
      state. A TB count is set in hardware; the effective count begins when
      PW10 is entered. When the effective period has expired, the core will
      proceed from PW10 to PW20 if no exit conditions have occurred during
      the period.
      Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      1d47ddf7
    • powerpc/85xx: add hardware automatically enter altivec idle state · 202e059c
      Authored by Wang Dongsheng
      Each core's AltiVec unit may be placed into a power-saving mode by
      turning off power to the unit. Core hardware will automatically power
      down the AltiVec unit after no AltiVec instructions have executed in
      N cycles. This AltiVec power control is triggered by hardware.
      Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      202e059c
    • powerpc/fsl-booke: Use SPRN_SPRGn rather than mfsprg/mtsprg · b58a7bd6
      Authored by Scott Wood
      This fixes a build break that was probably introduced with the removal
      of -Wa,-me500 (commit f49596a4), where
      the assembler refuses to recognize SPRG4-7 with a generic PPC target.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Cc: Dongsheng Wang <dongsheng.wang@freescale.com>
      Cc: Anton Vorontsov <avorontsov@mvista.com>
      Reviewed-by: Wang Dongsheng <dongsheng.wang@freescale.com>
      Tested-by: Wang Dongsheng <dongsheng.wang@freescale.com>
      b58a7bd6
    • powerpc: fix exception clearing in e500 SPE float emulation · 640e9225
      Authored by Joseph Myers
      The e500 SPE floating-point emulation code clears existing exceptions
      (__FPU_FPSCR &= ~FP_EX_MASK;) before ORing in the exceptions from the
      emulated operation.  However, these exception bits are the "sticky",
      cumulative exception bits, and should only be cleared by the user
      program setting SPEFSCR, not implicitly by any floating-point
      instruction (whether executed purely by the hardware or emulated).
      The spurious clearing of these bits shows up as missing exceptions in
      glibc testing.
      
      Fixing this, however, is not as simple as just not clearing the bits,
      because while the bits may be from previous floating-point operations
      (in which case they should not be cleared), the processor can also set
      the sticky bits itself before the interrupt for an exception occurs,
      and this can happen in cases when IEEE 754 semantics are that the
      sticky bit should not be set.  Specifically, the "invalid" sticky bit
      is set in various cases with non-finite operands, where IEEE 754
      semantics do not involve raising such an exception, and the
      "underflow" sticky bit is set in cases of exact underflow, whereas
      IEEE 754 semantics are that this flag is set only for inexact
      underflow.  Thus, for correct emulation the kernel needs to know the
      setting of these two sticky bits before the instruction being
      emulated.
      
      When a floating-point operation raises an exception, the kernel can
      note the state of the sticky bits immediately afterwards.  Some
      <fenv.h> functions that affect the state of these bits, such as
      fesetenv and feholdexcept, need to use prctl with PR_GET_FPEXC and
      PR_SET_FPEXC anyway, and so it is natural to record the state of those
      bits during that call into the kernel and so avoid any need for a
      separate call into the kernel to inform it of a change to those bits.
      Thus, the interface I chose to use (in this patch and the glibc port)
      is that one of those prctl calls must be made after any userspace
      change to those sticky bits, other than through a floating-point
      operation that traps into the kernel anyway.  feclearexcept and
      fesetexceptflag duly make those calls, which would not be required
      were it not for this issue.
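
      A hedged userspace sketch of the convention (PR_GET_FPEXC and
      PR_SET_FPEXC are the existing powerpc prctls; the exact glibc call
      sites differ):

      #include <sys/prctl.h>

      /* After changing the SPEFSCR sticky bits directly (as feclearexcept
       * and fesetexceptflag do), make a prctl call so the kernel can record
       * the new sticky-bit state for later emulation. */
      static void notify_kernel_of_sticky_bits(void)
      {
      	unsigned int fpexc_mode;

      	if (prctl(PR_GET_FPEXC, &fpexc_mode) == 0)
      		prctl(PR_SET_FPEXC, fpexc_mode | PR_FP_EXC_SW_ENABLE);
      }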
      
      The previous EGLIBC port, and the uClibc code copied from it, is
      fundamentally broken as regards any use of prctl for floating-point
      exceptions because it didn't use the PR_FP_EXC_SW_ENABLE bit in its
      prctl calls (and did various worse things, such as passing a pointer
      when prctl expected an integer).  If you avoid anything where prctl is
      used, the clearing of sticky bits still means it will never give
      anything approximating correct exception semantics with existing
      kernels.  I don't believe the patch makes things any worse for
      existing code that doesn't try to inform the kernel of changes to
      sticky bits - such code may get incorrect exceptions in some cases,
      but it would have done so anyway in other cases.
      Signed-off-by: Joseph Myers <joseph@codesourcery.com>
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      640e9225
    • powerpc/booke64: Add LRAT error exception handler · 228b1a47
      Authored by Mihai Caraman
      LRAT (Logical to Real Address Translation) present in MMU v2 provides hardware
      translation from a logical page number (LPN) to a real page number (RPN) when
      tlbwe is executed by a guest or when a page table translation occurs from a
      guest virtual address.
      
      Add LRAT error exception handler to Booke3E 64-bit kernel and the basic KVM
      handler to avoid build breakage. This is a prerequisite for KVM LRAT support
      that will follow.
      Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      228b1a47