- 05 10月, 2012 5 次提交
-
-
由 David Gibson 提交于
PAPR hypercalls should only be invoked from the guest kernel, not guest user programs, that is, with MSR[PR]=0. Currently we check this in spapr_hypercall, returning H_PRIVILEGE if MSR[PR]=1. However, under KVM the state of MSR[PR] is already checked by the host kernel before passing the hypercall to qemu, making this check redundant. Worse, however, we don't generally synchronize KVM and qemu state on the hypercall path, meaning that qemu could incorrectly reject a hypercall because it has a stale MSR value. This patch fixes the problem by moving the privilege test exclusively to the TCG hypercall path. Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au> CC: qemu-stable@nongnu.org Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Scott Wood 提交于
While investigating dtb pad issues, I noticed that initrd_base wasn't taking loadaddr into account the way dt_base was. This seems wrong. Signed-off-by: NScott Wood <scottwood@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Scott Wood 提交于
An allowance of 5 MiB for BSS is not enough for Linux kernels with certain debug options enabled (not sure exactly which one caused it, but I'd guess lockdep). The kernel I ran into this with had a BSS of around 6.4 MB. Unfortunately, uImage does not give us enough information to determine the actual BSS size. Increase the allowance to 18 MiB to give us plenty of room. Eventually this should be more intelligent, possibly packing initrd+dtb at the end of guest RAM. Signed-off-by: NScott Wood <scottwood@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
As per Peter's suggestion, we can use glib to write out a buffer in whole to a file, simplifying the code dramatically. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Alexander Graf 提交于
The dumpdtb code can be useful in more places than just for e500. Move it to a generic place. Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 04 10月, 2012 27 次提交
-
-
由 David Gibson 提交于
CPUPPCState includes a variable 'power_mode' which is used nowhere. This patch removes it. This includes saving a dummy zero in its place during vmsave, to avoid breaking the save format. Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 David Gibson 提交于
Currently the pseries machine code always attempts to set the size of the guests's hash page table to 16MB. However, because of the way the POWER MMU works, a suitable hash page table size should really depend on memory size. 16MB will be excessive for guests with <1GB and RAM, and may not be enough for guests with >2GB of RAM (depending on guest page size and other factors). The usual given rule of thumb is that the hash table should be 1/64 of the size of memory, but in fact the Linux guests we are aiming at don't really need that much. This patch, therefore, changes the hash table allocation code to aim for 1/128 of the size of RAM (rounding up). When using KVM, this size may still be adjusted by the host kernel if it is unable to allocate a suitable (contiguous) table. Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 David Gibson 提交于
In the paravirtualized environment provided by PAPR, there is a standard locking scheme so that hypercalls updating the hash page table from different guest threads don't corrupt the haah table state. We implement this HVLOCK bit in out page table hypercalls. However, it is not necessary in our case, since the hypercalls all run in the qemu environment under the big qemu lock. Therefore, this patch removes the locking code. This has the additional advantage of freeing up a hash PTE bit which will be useful for migration support. Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Stefan Weil 提交于
Report from smatch: ppc405_uc.c:209 dcr_read_pob(12) error: buffer overflow 'pob->besr' 2 <= 2 ppc405_uc.c:232 dcr_write_pob(12) error: buffer overflow 'pob->besr' 2 <= 2 The old code reads and writes besr[POB0_BESR1 - POB0_BESR0] or besr[2] which is one too much. Signed-off-by: NStefan Weil <sw@weilnetz.de> Reviewed-by: NAndreas Färber <afaerber@suse.de> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 David Gibson 提交于
The kvmppc_reset_htab() function invokes the KVM_PPC_ALLOCATE_HTAB vm ioctl to request KVM to allocate and reset a hash page table for the guest - it returns the size of hash table allocated, or 0 to indicate that qemu needs to allocate the hash table itself. In practice qemu needs to allocate the htab for full emulation and with Book3sPR KVM, but the kernel has to allocate it for Book3sHV KVM (the hash table needs to be physically contiguous in that case). Unfortunately, the logic in this function is incorrect for some existing kernels. Specifically: * at least some PR KVM versions advertise the relevant capability but don't actually implement the ioctl(), returning ENOTTY. * For old kernels which don't have the capability, we currently return 0. This is correct for PV KVM, where we need to allocate the htab, but not for HV KVM - kernels of this era always allocate a 16MB hash table per guest. This patch corrects both of these edge cases. Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 David Gibson 提交于
Currently the ibm,int-on and ibm,int-off RTAS functions are implemented as no-ops. This is because when implemented as specified in PAPR they caused Linux (which calls both int-on/off and set-xive) to end up with interrupts masked when they should not be. Since Linux's set-xive calls make the int-on/off calls redundant, making them nops worked around the problem. In fact, the problem was caused because there was a subtle bug in set-xive, PAPR specifies that as well as updating the current priority, it also needs to update the saved priority used by int-on/off. With this bug fixed the problem goes away. This patch implements this more correct fix. Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 David Gibson 提交于
On the pseries machine the IOMMU (aka TCE tables) is always active for all PCI and VIO devices. Mostly to simplify the SLOF firmware, we implement an extension which allows the IOMMU to be temporarily disabled for certain devices. Currently this is implemented by setting the device's DMAContext pointer to NULL (thus reverting to qemu's default no-IOMMU DMA behaviour), then replacing it when bypass mode is disabled. This approach causes a bunch of complications though. It complexifies the management of the DMAContext lifetimes, it's problematic for savevm/loadvm, and it means that while bypass is active we have nowhere to store the device's LIOBN (Logical IO Bus Number, used to identify DMA address spaces). At present we regenerate the LIOBN from other address information but this restricts how we can allocate LIOBNs. This patch gives up on this approach, replacing it with the much simpler one of having a 'bypass' boolean flag in the TCE state structure. Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 David Gibson 提交于
The general device state structure for PAPR VIO emulated devices includes a 'flags' field which was never used. This patch removes it. Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 David Gibson 提交于
Currently the XICS interrupt controller emulation uses a custom enum to specify whether a given interrupt is level-sensitive or message-triggered. This enum makes life awkward for saving the state, and isn't particularly useful since there are only two possibilities. This patch replaces the enum with a simple bool. Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 David Gibson 提交于
The XICS interrupt controller emulation uses some C bitfield variables in its internal state structure. This makes like awkward for saving the state because we don't have easy VMSTATE helpers for bitfields. This patch removes the bitfields, instead using explicit bit masking in a single status variable. Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 David Gibson 提交于
The H_CEDE hypercall implementation for the pseries machine doesn't trigger quite the right path in the main cpu exec loop. We should set exit_request to pop up one extra level and recheck state, and we should set the exception_index to EXCP_HLT (H_CEDE is roughly equivalent to the hlt instruction on x86). In practice, this doesn't really matter except for KVM, and KVM implements H_CEDE internally so we never hit this code path. But we might as well get it right, just in case it matters some day. Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 David Gibson 提交于
The XICS interrupt controller used on the pseries machine currently has no reset handler. We can get away with this under some circumstances, but it's not correct, and can cause failures if the XICS happens to be in the wrong state at the time of reset. This patch adds a hook to properly reset the XICS state. Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 David Gibson 提交于
The emulated PCI host bridge on the pseries machine incorporates an IOMMU (PAPR TCE table). Currently the mappings in this IOMMU are not cleared when we reset the system. This patch fixes this bug. To do this it adds a new reset function to the IOMMU emulation code. The VIO devices already reset their TCE tables, but they do so by destroying and re-creating their DMA context. This doesn't work for the PCI host bridge, because the infrastructure for PCI IOMMUs has already copied/cached the DMA pointer context into the subordinate PCI device structures. Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 David Gibson 提交于
When we reset the system, the reset method for VIO bus devices resets the state of their request queue (if present) as it should. However it was not resetting the state of their TCE table (DMA translation) if present. It was also not resetting the state of the per-device signal mask set with H_VIO_SIGNAL. This patch corrects both bugs, and also removes some small code duplication in the reset paths. Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 David Gibson 提交于
This adds support for then new "reset htab" ioctl which allows qemu to properly cleanup the MMU hash table when the guest is reset. With the corresponding kernel support, reset of a guest now works properly. This also paves the way for indicating a different size hash table to the kernel and for the kernel to be able to impose limits on the requested size. Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 David Gibson 提交于
A number of things need to occur during reset of the PAPR paravirtualized platform in a specific order. For example, the hash table needs to be cleared before the CPUs are reset, so that they initialize their register state correctly, and the CPUs need to have their main reset called before we set up the entry point state on the boot cpu. We also need to have the main qdev reset happen before the creation and installation of the device tree for the new boot, because we need the state of the devices settled to correctly construct the device tree. We currently do the pseries once-per-reset initializations done from a reset handler. However we can't adequately control when this handler is called during the reset - in particular we can't guarantee it happens after all the qdev resets (since qdevs might be registered after the machine init function has executed). This patch uses the new QEMUMachine reset method to to fix this problem, ensuring the various order dependent reset steps happen in the correct order. Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au> Reviewed-by: NAndreas Färber <afaerber@suse.de> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 David Gibson 提交于
The current pseries machine init function iterates over the CPUs at several points, doing various bits of initialization. This is messy; these can and should be merged into a single iteration doing all the necessary per cpu initialization. Worse, some of these initializations were setting up state which should be set on every reset, not just at machine init time. A few of the initializations simply weren't necessary at all. This patch, therefore, moves those things that need to be to the per-cpu reset handler, and combines the remainder into two loops over the cpus (which also creates them). The second loop is for setting up hash table information, and will be removed in a subsequent patch also making other fixes to the hash table setup. This exposes a bug in our start-cpu RTAS routine (called by the guest to start up CPUs other than CPU0) under kvm. Previously, this function did not make a call to ensure that it's changes to the new cpu's state were pushed into KVM in-kernel state. We sort-of got away with this because some of the initializations had already placed the secondary CPUs into the right starting state for the sorts of Linux guests we've been running. Nonetheless the start-cpu RTAS call's behaviour was not correct and could easily have been broken by guest changes. This patch also fixes it. Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au> Reviewed-by: NAndreas Färber <afaerber@suse.de> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 David Gibson 提交于
At least when invoked with high enough 'level' arguments, kvm_arch_put_registers() is supposed to copy essentially all the cpu state as encoded in qemu's internal structures into the kvm state. Currently the ppc version does not do this - it never calls KVM_SET_SREGS, for example, and therefore never sets the SDR1 and various other important though rarely changed registers. Instead, the code paths which need to set these registers need to explicitly make (conditional) kvm calls which transfer the changes to kvm. This breaks the usual model of handling state updates in qemu, where code just changes the internal model and has it flushed out to kvm automatically at some later point. This patch fixes this for Book S ppc CPUs by adding a suitable call to KVM_SET_SREGS and als to KVM_SET_ONE_REG to set the HIOR (the only register that is set with that call so far). This lets us remove the hacks to explicitly set these registers from the kvmppc_set_papr() function. The problem still exists for Book E CPUs (which use a different version of the kvm_sregs structure). But fixing that has some complications of its own so can be left to another day. Lkewise, there is still some ugly code for setting the PVR through special calls to SET_SREGS which is left in for now. The PVR needs to be set especially early because it can affect what other features are available on the CPU, so I need to do more thinking to see if it can be integrated into the normal paths or not. Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Aurelien Jarno 提交于
We can finally get rid of the ugly HANDLE_NAN{1,2,3} macros. Signed-off-by: NAurelien Jarno <aurelien@aurel32.net> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Aurelien Jarno 提交于
Use the new softfloat float32_muladd() function to implement the vmaddfp and vnmsubfp instructions. As a bonus we can get rid of the call to the HANDLE_NAN3 macro, as the NaN handling is directly done at the softfloat level. Signed-off-by: NAurelien Jarno <aurelien@aurel32.net> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Aurelien Jarno 提交于
Use the new softfloat float32_min() and float32_max() to implement the vminfp and vmaxfp instructions. As a bonus we can get rid of the call to the HANDLE_NAN2 macro, as the NaN handling is directly done at the softfloat level. Signed-off-by: NAurelien Jarno <aurelien@aurel32.net> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Aurelien Jarno 提交于
Commit e024e881 provided a pickNaN() function for PowerPC, implementing the correct NaN propagation rules. Therefore there is no need to test the operands manually, we can rely on the softfloat code to do that. Signed-off-by: NAurelien Jarno <aurelien@aurel32.net> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Andreas Färber 提交于
Place it in alphabetical order, there is a separate section for sharing ppc4xx devices now. Signed-off-by: NAndreas Färber <afaerber@suse.de> Acked-by: NEdgar E. Iglesias <edgar.iglesias@gmail.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Andreas Färber 提交于
Place it in alphabetical order and add new Devices section ppc4xx to share file rules with 405 and virtex_ml507. Signed-off-by: NAndreas Färber <afaerber@suse.de> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Andreas Färber 提交于
As requested by Alex. Signed-off-by: NAndreas Färber <afaerber@suse.de> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Andreas Färber 提交于
Signed-off-by: NAndreas Färber <afaerber@suse.de> Cc: Alexander Graf <agraf@suse.de> Cc: Scott Wood <scottwood@freescale.com> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
由 Andreas Färber 提交于
Signed-off-by: NAndreas Färber <afaerber@suse.de> Acked-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NAlexander Graf <agraf@suse.de>
-
- 02 10月, 2012 3 次提交
-
-
由 Peter Maydell 提交于
The uint64_to_float32() conversion function was incorrectly always returning numbers with the sign bit set (ie negative numbers). Correct this so we return positive numbers instead. Signed-off-by: NPeter Maydell <peter.maydell@linaro.org> Signed-off-by: NAurelien Jarno <aurelien@aurel32.net>
-
由 Peter Maydell 提交于
In float16_to_float32, when returning an infinity, just pass zero as the mantissa argument to packFloat32(), rather than shifting a value which we know must be zero. Signed-off-by: NPeter Maydell <peter.maydell@linaro.org> Reviewed-by: NRichard Henderson <rth@twiddle.net> Signed-off-by: NAurelien Jarno <aurelien@aurel32.net>
-
由 Anthony Liguori 提交于
We cannot cast directly from pointer to uint64. Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Alex Barcelo <abarcelo@ac.upc.edu> Reported-by: NAlex Barcelo <abarcelo@ac.upc.edu> Signed-off-by: NAnthony Liguori <aliguori@us.ibm.com>
-
- 01 10月, 2012 5 次提交
-
-
由 Alex Williamson 提交于
Enabled for all softmmu guests supporting PCI on Linux hosts. Note that currently only x86 hosts have the kernel side VFIO IOMMU support for this. PPC (g3beige) is the only non-x86 guest known to work. ARM (veratile) hangs in firmware, others untested. Signed-off-by: NAlex Williamson <alex.williamson@redhat.com> Signed-off-by: NAnthony Liguori <aliguori@us.ibm.com>
-
由 Alex Williamson 提交于
This adds the core of the QEMU VFIO-based PCI device assignment driver. To make use of this driver, enable CONFIG_VFIO, CONFIG_VFIO_IOMMU_TYPE1, and CONFIG_VFIO_PCI in your host Linux kernel config. Load the vfio-pci module. To assign device 0000:05:00.0 to a guest, do the following: for dev in $(ls /sys/bus/pci/devices/0000:05:00.0/iommu_group/devices); do vendor=$(cat /sys/bus/pci/devices/$dev/vendor) device=$(cat /sys/bus/pci/devices/$dev/device) if [ -e /sys/bus/pci/devices/$dev/driver ]; then echo $dev > /sys/bus/pci/devices/$dev/driver/unbind fi echo $vendor $device > /sys/bus/pci/drivers/vfio-pci/new_id done See Documentation/vfio.txt in the Linux kernel tree for further description of IOMMU groups and VFIO. Then launch qemu including the option: -device vfio-pci,host=0000:05:00.0 Legacy PCI interrupts (INTx) currently makes use of a kludge where we trap BAR accesses and assume the access is in response to an interrupt, therefore de-asserting and unmasking the interrupt. It's not quite as targetted as using the EOI for this, but it's self contained and seems to work across all architectures. The side-effect is a significant performance slow-down for device in INTx mode. Some devices, like graphics cards, don't really use their interrupt, so this can be turned off with the x-intx=off option, which disables INTx alltogether. This should be considered an experimental option until we refine this code. Both MSI and MSI-X are supported and avoid these issues. Signed-off-by: NAlex Williamson <alex.williamson@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NAnthony Liguori <aliguori@us.ibm.com>
-
由 Alex Williamson 提交于
Based on Linux as of 1a95620. Signed-off-by: NAlex Williamson <alex.williamson@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NAnthony Liguori <aliguori@us.ibm.com>
-
由 Alex Williamson 提交于
Signed-off-by: NAlex Williamson <alex.williamson@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NAnthony Liguori <aliguori@us.ibm.com>
-
由 H. Peter Anvin 提交于
This patch implements Supervisor Mode Execution Prevention (SMEP) and Supervisor Mode Access Prevention (SMAP) for x86. The purpose of the patch, obviously, is to help kernel developers debug the support for those features. A fair bit of the code relates to the handling of CPUID features. The CPUID code probably would get greatly simplified if all the feature bit words were unified into a single vector object, but in the interest of producing a minimal patch for SMEP/SMAP, and because I had very limited time for this project, I followed the existing style. [ v2: don't change the definition of the qemu64 CPU shorthand, since that breaks loading old snapshots. Per Anthony Liguori this can be fixed once the CPU feature set is snapshot. Change the coding style slightly to conform to checkpatch.pl. ] Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com> Signed-off-by: NAnthony Liguori <aliguori@us.ibm.com>
-