- 17 12月, 2015 23 次提交
-
-
由 Nathan Fontenot 提交于
Add the ability to dlpar remove CPUs via hotplug rtas events, either by specifying the drc-index of the CPU to remove or providing a count of cpus to remove. To remove multiple cpus in a single request we create a list of possible DR (Dynamic Reconfiguration) cpus and their drc indexes that can be removed. We can then traverse the list remove each cpu and easily clean up in any cases of failure. Signed-off-by: NNathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nathan Fontenot 提交于
Update the cpu dlpar add/remove paths to do better error recovery when a failure occurs during the add/remove operation. Signed-off-by: NNathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nathan Fontenot 提交于
Re-factor the cpu hotplug code to support doing cpu hotplug completely in the kernel and using the existing sysfs probe/release interfaces. This patch pulls out pieces of existing cpu hotplug code into common routines, dlpar_cpu_add() and dlpar_cpu_remove(), to be used by both interfaces. There are no functional changes introduced. Signed-off-by: NNathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nathan Fontenot 提交于
No functional changes, this patch is simply a move of the cpu hotplug code from pseries/dlpar.c to pseries/hotplug-cpu.c. This is in an effort to consolidate all of the cpu hotplug code in a common place. Signed-off-by: NNathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nathan Fontenot 提交于
When DLPAR adding a CPU we should verify that the CPU does not already exist. Failure to do so can generate a kernel oops; [ 9.465585] kernel BUG at arch/powerpc/platforms/pseries/dlpar.c:382! [ 9.465796] Oops: Exception in kernel mode, sig: 5 [#1] This oops can be generated by causing a probe to be performed on a cpu by writing to the sysfs cpu probe file (/sys/devices/system/cpu/probe). This patch adds a check for the existence of cpu prior to probing the cpu so userspace doing the wrong thing won't trigger a BUG_ON(). Signed-off-by: NNathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Alistair Popple 提交于
PPC476FPE has a different PVR from previous PPC476 processors. The kexec code checks the PVR in order to correctly setup the MMU. When the initial support for 476FPE processors was added the corresponding change in the kexec code was missed. This patch simply adds the check and solves the following bug on kexec: kexec: Starting new kernel Bye! Unable to handle kernel paging request for instruction fetch Faulting instruction address: 0xee9a50f8 cpu 0x0: Vector: 400 (Instruction Access) at [ee9d7d20] pc: ee9a50f8 lr: ee9a50e4 sp: ee9d7dd0 msr: 21020 current = 0xee40f000 pid = 960, comm = kexec enter ? for help [link register ] ee9a50e4 [ee9d7dd0] c0013748 default_machine_kexec+0x58/0x70 (unreliable) [ee9d7df0] c0012f04 machine_kexec+0x34/0x40 [ee9d7e00] c00aa1ec kernel_kexec+0x9c/0xb0 [ee9d7e20] c005d704 SyS_reboot+0x1f4/0x220 [ee9d7f40] c000db68 ret_from_syscall+0x0/0x3c Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Alistair Popple 提交于
NVLink is a high speed interconnect that is used in conjunction with a PCI-E connection to create an interface between CPU and GPU that provides very high data bandwidth. A PCI-E connection to a GPU is used as the control path to initiate and report status of large data transfers sent via the NVLink. On IBM Power systems the NVLink processing unit (NPU) is similar to the existing PHB3. This patch adds support for a new NPU PHB type. DMA operations on the NPU are not supported as this patch sets the TCE translation tables to be the same as the related GPU PCIe device for each NVLink. Therefore all DMA operations are setup and controlled via the PCIe device. EEH is not presently supported for the NPU devices, although it may be added in future. Signed-off-by: NAlistair Popple <alistair@popple.id.au> Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Alistair Popple 提交于
Move __raw_rm_writeq() from platforms/powernv/pci-ioda.c to include/asm/io.h so that it can be used by other code. Signed-off-by: NAlistair Popple <alistair@popple.id.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Alistair Popple 提交于
This commit removed the pcidev field from struct pci_dn as it was no longer in use by the kernel. However to support finding the association of Nvlink devices to GPU devices from the device-tree this field is required. This reverts commit 250c7b27 ("powerpc/pci: Remove unused struct pci_dn.pcidev field"). Signed-off-by: NAlistair Popple <alistair@popple.id.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Gavin Shan 提交于
The name of PCI root bus's M64 resource isn't initialized properly. When dumping "/proc/iomem", "<BAD>" is seen for those M64 resources on PCI root buses. ~# cat /proc/iomem | grep -e "BAD" 3b0000000000-3b0fefffffff : <BAD> 3b1000000000-3b1fefffffff : <BAD> 3c0000000000-3c0fefffffff : <BAD> 3c1000000000-3c1fefffffff : <BAD> 3c2000000000-3c2fefffffff : <BAD> This fixes the issue by setting the name of PCI root bus's M64 resource to that of PHB's device node full name. With the patch, no "<BAD>" is seen from "/proc/iomem". Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Laurent Dufour 提交于
User space checkpoint and restart tool (CRIU) needs the page's change to be soft tracked. This allows to do a pre checkpoint and then dump only touched pages. This is done by using a newly assigned PTE bit (_PAGE_SOFT_DIRTY) when the page is backed in memory, and a new _PAGE_SWP_SOFT_DIRTY bit when the page is swapped out. To introduce a new PTE _PAGE_SOFT_DIRTY bit value common to hash 4k and hash 64k pte, the bits already defined in hash-*4k.h should be shifted left by one. The _PAGE_SWP_SOFT_DIRTY bit is dynamically put after the swap type in the swap pte. A check is added to ensure that the bit is not overwritten by _PAGE_HPTEFLAGS. Signed-off-by: NLaurent Dufour <ldufour@linux.vnet.ibm.com> CC: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Ellerman 提交于
The STD_EXCEPTION_PSERIES macro takes both a vector number, and a location (memory address). However both are always identical, so combine them to save repeating ourselves. This does mean an exception handler must always exist at the location in memory that matches its vector number. But that's OK because this is the "STD" macro (standard), which does exactly that. We have other macros for the other cases, eg. STD_EXCEPTION_PSERIES_OOL (out of line). Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Ellerman 提交于
This is only used in one location, open code it. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Ellerman 提交于
HMT_MEDIUM_LOW_HAS_PPR is only used in once place, open code it. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Ellerman 提交于
HMT_MEDIUM_PPR_DISCARD is a macro which is present at the start of most of our first level exception handlers. It conditionally executes a HMT_MEDIUM instruction, which sets the processor priority to medium. On on modern systems, ie. Power7 and later, it is nop'ed out at boot. All it does is make the exception vectors more cramped, and consume 4 bytes of icache. On old systems it has the effect of boosting the processor priority at the start of exception processing. If we were previously in the idle loop for example, we may be at low or very low priority. This is desirable as we want to process the exception as fast as possible. However looking closely at the generated code, we see that in all cases we execute another HMT_MEDIUM just four instructions later. With code patching applied, the final code on an old (Power6) system will look like, eg: c000000000000300 <data_access_pSeries>: c000000000000300: 7c 42 13 78 mr r2,r2 <- c000000000000304: 7d b2 43 a6 mtsprg 2,r13 c000000000000308: 7d b1 42 a6 mfsprg r13,1 c00000000000030c: f9 2d 00 80 std r9,128(r13) c000000000000310: 60 00 00 00 nop c000000000000314: 7c 42 13 78 mr r2,r2 <- So I suggest that the added code complexity of HMT_MEDIUM_PPR_DISCARD is not justified by the benefit of boosting the processor priority for the duration of four instructions, and therefore we drop it. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Ellerman 提交于
There are no longer any users of enter_rtas() outside of rtas.c, so make it "private", by moving the declaration inside rtas.c. Hopefully this will encourage people to use one of the wrappers which takes the sharp edges off the RTAS calling sequence. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Ellerman 提交于
Although call_rtas_display_status() does actually want to use the regular RTAS locking, it doesn't want the extra logic that is in rtas_call(), so currently it open codes the logic. Instead we can use rtas_call_unlocked(), after taking the RTAS lock. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Ellerman 提交于
Avoid open coding the logic by using rtas_call_unlocked(). Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Ellerman 提交于
Avoid open coding the logic by using rtas_call_unlocked(). Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Ellerman 提交于
Most users of RTAS (Run-Time Abstraction Services) use rtas_call(), which deals with locking as well as endian handling. However we have two users outside of rtas.c that can't use rtas_call() because they have different locking requirements. The hotplug CPU code can't take the RTAS lock because the CPU would go offline with the lock held and no other CPUs would be able to call RTAS until the CPU came back online. The xmon code doesn't want to take the lock because it would risk dead locking when we are trying to recover from a crash. Both sites required multiple patches when we added little endian support, proving that programmers can't do endian right. Although that ship has sailed, we can still clean the code up by providing an unlocked version of rtas_call() which avoids the need to open code the logic elsewhere. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Stewart Smith 提交于
Long ago, only in the lab, there was OPALv1 and OPALv2. Now there is just OPALv3, with nobody ever expecting anything on pre-OPALv3 to be cared about or supported by mainline kernels. So, let's remove FW_FEATURE_OPALv3 and instead use FW_FEATURE_OPAL exclusively. Signed-off-by: NStewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Stewart Smith 提交于
OPALv2 only ever existed in the lab and didn't escape to the world. All OPAL systems in the wild are OPALv3. The probability of there being an OPALv2 system still powered on anywhere inside IBM is approximately zero, let alone anyone expecting to run mainline kernels. So, start to remove references to OPALv2. Signed-off-by: NStewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Stewart Smith 提交于
The OpenPower Abstraction Layer firmware went through a couple of iterations in the lab before being released. What we now know as OPAL advertises itself as OPALv3. OPALv2 and OPALv1 never made it outside the lab, and the possibility of anyone at all ever building a mainline kernel today and expecting it to boot on such hardware is zero. Signed-off-by: NStewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 16 12月, 2015 1 次提交
-
-
由 Daniel Axtens 提交于
GregorianDay() is supposed to calculate the day of the week (tm->tm_wday) for a given day/month/year. In that calcuation it indexed into an array called MonthOffset using tm->tm_mon-1. However tm_mon is zero-based, not one-based, so this is off-by-one. It also means that every January, GregoiranDay() will access element -1 of the MonthOffset array. It also doesn't appear to be a correct algorithm either: see in contrast kernel/time/timeconv.c's time_to_tm function. It's been broken forever, which suggests no-one in userland uses this. It looks like no-one in the kernel uses tm->tm_wday either (see e.g. drivers/rtc/rtc-ds1305.c:319). tm->tm_wday is conventionally set to -1 when not available in hardware so we can simply set it to -1 and drop the function. (There are over a dozen other drivers in drivers/rtc that do this.) Found using UBSAN. Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andrew Morton <akpm@linux-foundation.org> # as an example of what UBSan finds. Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com> Cc: rtc-linux@googlegroups.com Signed-off-by: NDaniel Axtens <dja@axtens.net> Acked-by: NAlexandre Belloni <alexandre.belloni@free-electrons.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 14 12月, 2015 16 次提交
-
-
由 Rashmica Gupta 提交于
Currently if you are in xmon without an oops etc. to view the kernel version you have to type "d $linux_banner" - not necessarily obvious. As this is useful information, append to the output of "e" command. Example output: $mon> e cpu 0x1: Vector: 0 at [c0000000f879ba80] pc: c000000000081718: sysrq_handle_xmon+0x68/0x80 lr: c000000000081718: sysrq_handle_xmon+0x68/0x80 sp: c0000000f879bbe0 msr: 8000000000009033 current = 0xc0000000f604d5c0 paca = 0xc00000000fdc0480 softe: 0 irq_happened: 0x01 pid = 2467, comm = bash Linux version 4.4.0-rc2-00008-gc51af91c3ab3-dirty (rashmica@circle) (gcc version 5.1.1 20150629 (GCC) ) #45 SMP Wed Nov 25 10:25:12 AEDT 2015 Signed-off-by: NRashmica Gupta <rashmicy@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Rashmica Gupta 提交于
All users of QPACE have upgraded to QPACE2 so remove the Cell QPACE code. Signed-off-by: NRashmica Gupta <rashmicy@gmail.com> Acked-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Vipin K Parashar 提交于
Kernel prints respective warnings about various EPOW events for user information/action after parsing EPOW interrupts. At times below EPOW reset event warning is seen to be flooding kernel log over a period of time. May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared These EPOW reset events are spurious in nature and are triggered by firmware without an actual EPOW event being reset. This patch avoids these multiple EPOW reset warnings by using a counter variable. This variable is incremented every time an EPOW event is reported. Upon receiving a EPOW reset event the same variable is checked to filter out spurious events and decremented accordingly. This patch also improves log messages to better describe EPOW event being reported. Merged adjacent log messages into single one to reduce number of lines printed per event. Signed-off-by: NKamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: NVipin K Parashar <vipin@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Neuling 提交于
Print MSR TM bits in oops messages. This appends them to the end like this: MSR: 8000000502823031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[TE]> You get the TM[] only if at least one TM MSR bit is set. Inside the TM[], E means Enabled (bit 32), S means Suspended (bit 33), and T means Transactional (bit 34) If no bits are set, you get no TM[] output. Include rework of printbits() to handle this case. Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Boqun Feng 提交于
According to memory-barriers.txt, xchg*, cmpxchg* and their atomic_ versions all need to be fully ordered, however they are now just RELEASE+ACQUIRE, which are not fully ordered. So also replace PPC_RELEASE_BARRIER and PPC_ACQUIRE_BARRIER with PPC_ATOMIC_ENTRY_BARRIER and PPC_ATOMIC_EXIT_BARRIER in __{cmp,}xchg_{u32,u64} respectively to guarantee fully ordered semantics of atomic{,64}_{cmp,}xchg() and {cmp,}xchg(), as a complement of commit b97021f8 ("powerpc: Fix atomic_xxx_return barrier semantics") This patch depends on patch "powerpc: Make value-returning atomics fully ordered" for PPC_ATOMIC_ENTRY_BARRIER definition. Cc: stable@vger.kernel.org # 3.2+ Signed-off-by: NBoqun Feng <boqun.feng@gmail.com> Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Boqun Feng 提交于
According to memory-barriers.txt: > Any atomic operation that modifies some state in memory and returns > information about the state (old or new) implies an SMP-conditional > general memory barrier (smp_mb()) on each side of the actual > operation ... Which mean these operations should be fully ordered. However on PPC, PPC_ATOMIC_ENTRY_BARRIER is the barrier before the actual operation, which is currently "lwsync" if SMP=y. The leading "lwsync" can not guarantee fully ordered atomics, according to Paul Mckenney: https://lkml.org/lkml/2015/10/14/970 To fix this, we define PPC_ATOMIC_ENTRY_BARRIER as "sync" to guarantee the fully-ordered semantics. This also makes futex atomics fully ordered, which can avoid possible memory ordering problems if userspace code relies on futex system call for fully ordered semantics. Fixes: b97021f8 ("powerpc: Fix atomic_xxx_return barrier semantics") Cc: stable@vger.kernel.org # 3.2+ Signed-off-by: NBoqun Feng <boqun.feng@gmail.com> Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
The slot information of base page size hash pte is stored in the pgtable_t w.r.t transparent hugepage. We need to make sure we don't index beyond pgtable_t size. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
This will bulk read 4 hash pte slot entries and should reduce the loop Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
Remove the related functions and #defines Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
They don't need to track 4k subpage slot details and hence don't need second half of pgtable_t. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
Use the #define instead of open-coding the same Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
pte and pmd table size are dependent on config items. Don't hard code the same. This make sure we use the right value when masking pmd entries and also while checking pmd_bad Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
For a pte entry we will have _PAGE_PTE set. Our pte page address have a minimum alignment requirement of HUGEPD_SHIFT_MASK + 1. We use the lower 7 bits to indicate hugepd. ie. For pmd and pgd we can find: 1) _PAGE_PTE set pte -> indicate PTE 2) bits [2..6] non zero -> indicate hugepd. They also encode the size. We skip bit 1 (_PAGE_PRESENT). 3) othewise pointer to next table. Acked-by: NScott Wood <scottwood@freescale.com> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
We support THP only with book3s_64 and 64K page size. Move THP details to hash64-64k.h to clarify the same. Acked-by: NScott Wood <scottwood@freescale.com> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
W.r.t hugetlb, we support two format for pmd. With book3s_64 and 64K linux page size, we can have pte at the pmd level. Hence we don't need to support hugepd there. For everything else hugepd is supported and pmd_huge is (0). Acked-by: NScott Wood <scottwood@freescale.com> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
Only difference here is, we apply the WIMG mapping early, so rflags passed to updatepp will also be changed. Acked-by: NScott Wood <scottwood@freescale.com> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-