- 17 2月, 2016 1 次提交
-
-
由 Alexey Kardashevskiy 提交于
Quite often drivers set only "write" permission assuming that this includes "read" permission as well and this works on plenty of platforms. However IODA2 is strict about this and produces an EEH when "read" permission is not set and reading happens. This adds a workaround in the IODA code to always add the "read" bit when the "write" bit is set. Fixes: 10b35b2b ("powerpc/powernv: Do not set "read" flag if direction==DMA_NONE") Cc: stable@vger.kernel.org # 4.2+ Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Tested-by: NDouglas Miller <dougmill@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 15 2月, 2016 4 次提交
-
-
由 Aneesh Kumar K.V 提交于
With ppc64 we use the deposited pgtable_t to store the hash pte slot information. We should not withdraw the deposited pgtable_t without marking the pmd none. This ensure that low level hash fault handling will skip this huge pte and we will handle them at upper levels. Recent change to pmd splitting changed the above in order to handle the race between pmd split and exit_mmap. The race is explained below. Consider following race: CPU0 CPU1 shrink_page_list() add_to_swap() split_huge_page_to_list() __split_huge_pmd_locked() pmdp_huge_clear_flush_notify() // pmd_none() == true exit_mmap() unmap_vmas() zap_pmd_range() // no action on pmd since pmd_none() == true pmd_populate() As result the THP will not be freed. The leak is detected by check_mm(): BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512 The above required us to not mark pmd none during a pmd split. The fix for ppc is to clear the huge pte of _PAGE_USER, so that low level fault handling code skip this pte. At higher level we do take ptl lock. That should serialze us against the pmd split. Once the lock is acquired we do check the pmd again using pmd_same. That should always return false for us and hence we should retry the access. We do the pmd_same check in all case after taking plt with THP (do_huge_pmd_wp_page, do_huge_pmd_numa_page and huge_pmd_set_accessed) Also make sure we wait for irq disable section in other cpus to finish before flipping a huge pte entry with a regular pmd entry. Code paths like find_linux_pte_or_hugepte depend on irq disable to get a stable pte_t pointer. A parallel thp split need to make sure we don't convert a pmd pte to a regular pmd entry without waiting for the irq disable section to finish. Fixes: eef1b3ba ("thp: implement split_huge_pmd()") Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Gavin Shan 提交于
When PCI bus is unplugged during full hotplug for EEH recovery, the platform PE instance (struct pnv_ioda_pe) isn't released and it dereferences the stale PCI bus that has been released. It leads to kernel crash when referring to the stale PCI bus. This fixes the issue by correcting the PE's primary bus when it's oneline at plugging time, in pnv_pci_dma_bus_setup() which is to be called by pcibios_fixup_bus(). Cc: stable@vger.kernel.org # v4.1+ Reported-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com> Reported-by: NPradipta Ghosh <pradghos@in.ibm.com> Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Tested-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Gavin Shan 提交于
When PE is created, its primary bus is cached to pe->bus. At later point, the cached primary bus is returned from eeh_pe_bus_get(). However, we could get stale cached primary bus and run into kernel crash in one case: full hotplug as part of fenced PHB error recovery releases all PCI busses under the PHB at unplugging time and recreate them at plugging time. pe->bus is still dereferencing the PCI bus that was released. This adds another PE flag (EEH_PE_PRI_BUS) to represent the validity of pe->bus. pe->bus is updated when its first child EEH device is online and the flag is set. Before unplugging in full hotplug for error recovery, the flag is cleared. Fixes: 8cdb2833 ("powerpc/eeh: Trace PCI bus from PE") Cc: stable@vger.kernel.org #v3.11+ Reported-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com> Reported-by: NPradipta Ghosh <pradghos@in.ibm.com> Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Tested-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Denis Kirjanov 提交于
If a cpu is hotplugged while the hcall trace points are active, it's possible to hit a warning from RCU due to the trace points calling into RCU from an offline cpu, eg: RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1 Make the hypervisor tracepoints conditional by using TRACE_EVENT_FN_COND. Acked-by: NSteven Rostedt <rostedt@goodmis.org> Signed-off-by: NDenis Kirjanov <kda@linux-powerpc.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 08 2月, 2016 1 次提交
-
-
由 Andreas Schwab 提交于
Since binutils 2.26 BFD is doing suffix merging on STRTAB sections. But dedotify modifies the symbol names in place, which can also modify unrelated symbols with a name that matches a suffix of a dotted name. To remove the leading dot of a symbol name we can just increment the pointer into the STRTAB section instead. Backport to all stables to avoid breakage when people update their binutils - mpe. Cc: stable@vger.kernel.org Signed-off-by: NAndreas Schwab <schwab@linux-m68k.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 31 1月, 2016 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
In file included from mm/vmscan.c:54:0: include/linux/swapops.h: In function ‘pte_to_swp_entry’: include/linux/swapops.h:69:2: error: implicit declaration of function ‘pte_swp_soft_dirty’ [-Werror=implicit-function-declaration] if (pte_swp_soft_dirty(pte)) ^ include/linux/swapops.h:70:3: error: implicit declaration of function ‘pte_swp_clear_soft_dirty’ [-Werror=implicit-function-declaration] pte = pte_swp_clear_soft_dirty(pte); We support soft dirty tracking only with book3s 64 for now. So change the Kconfig dependency accordingly. Also CHECKPOINT_RESTORE feature is not really dependent on SOFT_DIRTY. We track the dependency between MEM_SOFT_DIRTY and ARCH_SOFT_DIRTY through headers Fixes: 7207f436 ("powerpc/mm: Add page soft dirty tracking") Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 28 1月, 2016 2 次提交
-
-
由 Aneesh Kumar K.V 提交于
This was wrongly updated by commit 7aa9a23c ("powerpc, thp: remove infrastructure for handling splitting PMDs") during the last merge window. Fix it up. This could lead to incorrect behaviour in THP and/or mprotect(), at a minimum. Fixes: 7aa9a23c ("powerpc, thp: remove infrastructure for handling splitting PMDs") Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Madhavan Srinivasan 提交于
Commit 7a786832 ("powerpc/perf: Add an explict flag indicating presence of SLOT field") introduced the PPMU_HAS_SSLOT flag to remove the assumption that MMCRA[SLOT] was present when PPMU_ALT_SIPR was not set. That commit's changelog also mentions that Power8 does not support MMCRA[SLOT]. However when the Power8 PMU support was merged, it errnoeously included the PPMU_HAS_SSLOT flag. So remove PPMU_HAS_SSLOT from the Power8 flags. mpe: On systems where MMCRA[SLOT] exists, the field occupies bits 37:39 (IBM numbering). On Power8 bit 37 is reserved, and 38:39 overlap with the high bits of the Threshold Event Counter Mantissa. I am not aware of any published events which use the threshold counting mechanism, which would cause the mantissa bits to be set. So in practice this bug is unlikely to trigger. Fixes: e05b9b9e ("powerpc/perf: Power8 PMU support") Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 27 1月, 2016 1 次提交
-
-
由 Gavin Shan 提交于
In eeh_pe_loc_get(), the PE location code is retrieved from the "ibm,loc-code" property of the device node for the bridge of the PE's primary bus. It's not correct because the property indicates the parent PE's location code. This reads the correct PE location code from "ibm,io-base-loc-code" or "ibm,slot-location-code" property of PE parent bus's device node. Cc: stable@vger.kernel.org # v3.16+ Fixes: 357b2f3d ("powerpc/eeh: Dump PE location code") Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Tested-by: NRussell Currey <ruscur@russell.cc> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 25 1月, 2016 1 次提交
-
-
由 Vasant Hegde 提交于
With commit 90a545e9 (restrict /dev/mem to idle io memory ranges) mapping rtas_rmo_buf from user space is failing. Hence we are not able to make RTAS syscall. This patch calls page_is_rtas_user_buf before calling iomem_is_exclusive in devmem_is_allowed(). This will allow user space to map rtas_rmo_buf and we are able to make RTAS syscall. Reported-by: NBharata B Rao <bharata@linux.vnet.ibm.com> CC: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: NVasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 21 1月, 2016 3 次提交
-
-
由 Stephen Rothwell 提交于
Commit d5d6a443 ("arch/powerpc/include/asm/pgtable-ppc64.h: add pmd_[dirty|mkclean] for THP") added a new identical definition of pmd_dirty(). Remove it again. Cc: Minchan Kim <minchan@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Alan Modra 提交于
PowerPC64 uses the symbol .TOC. much as other targets use _GLOBAL_OFFSET_TABLE_. It identifies the value of the GOT pointer (or in powerpc parlance, the TOC pointer). Global offset tables are generally local to an executable or shared library, or in the kernel, module. Thus it does not make sense for a module to resolve a relocation against .TOC. to the kernel's .TOC. value. A module has its own .TOC., and indeed the powerpc64 module relocation processing ignores the kernel value of .TOC. and instead calculates a module-local value. This patch removes code involved in exporting the kernel .TOC., tweaks modpost to ignore an undefined .TOC., and the module loader to twiddle the section symbol so that .TOC. isn't seen as undefined. Note that if the kernel was compiled with -msingle-pic-base then ELFv2 would not have function global entry code setting up r2. In that case the module call stubs would need to be modified to set up r2 using the kernel .TOC. value, requiring some of this code to be reinstated. mpe: Furthermore a change in binutils master (not yet released) causes the current way we handle the TOC to no longer work when building with MODVERSIONS=y and RELOCATABLE=n. The symptom is that modules can not be loaded due to there being no version found for TOC. Cc: stable@vger.kernel.org # 3.16+ Signed-off-by: NAlan Modra <amodra@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Chandan Rajendra 提交于
Test runs on a ppc64 BE guest succeeded using modified fstests. Also tested on ppc64 LE using a home made test - mpe. Signed-off-by: NChandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 17 1月, 2016 1 次提交
-
-
由 Will Deacon 提交于
As illustrated by commit a3afe70b ("[S390] latencytop s390 support."), HAVE_LATENCYTOP_SUPPORT is defined by an architecture to advertise an implementation of save_stack_trace_tsk. However, as of 9212ddb5 ("stacktrace: provide save_stack_trace_tsk() weak alias") a dummy implementation is provided if STACKTRACE=y. Given that LATENCYTOP already depends on STACKTRACE_SUPPORT and selects STACKTRACE, we can remove HAVE_LATENCYTOP_SUPPORT altogether. Signed-off-by: NWill Deacon <will.deacon@arm.com> Acked-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: James Hogan <james.hogan@imgtec.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Helge Deller <deller@gmx.de> Acked-by: NMichael Ellerman <mpe@ellerman.id.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 1月, 2016 6 次提交
-
-
由 Dan Williams 提交于
For the purpose of communicating the optional presence of a 'struct page' for the pfn returned from ->direct_access(), introduce a type that encapsulates a page-frame-number plus flags. These flags contain the historical "page_link" encoding for a scatterlist entry, but can also denote "device memory". Where "device memory" is a set of pfns that are not part of the kernel's linear mapping by default, but are accessed via the same memory controller as ram. The motivation for this new type is large capacity persistent memory that needs struct page entries in the 'memmap' to support 3rd party DMA (i.e. O_DIRECT I/O with a persistent memory source/target). However, we also need it in support of maintaining a list of mapped inodes which need to be unmapped at driver teardown or freeze_bdev() time. Signed-off-by: NDan Williams <dan.j.williams@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave@sr71.net> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dan Williams 提交于
To date, we have implemented two I/O usage models for persistent memory, PMEM (a persistent "ram disk") and DAX (mmap persistent memory into userspace). This series adds a third, DAX-GUP, that allows DAX mappings to be the target of direct-i/o. It allows userspace to coordinate DMA/RDMA from/to persistent memory. The implementation leverages the ZONE_DEVICE mm-zone that went into 4.3-rc1 (also discussed at kernel summit) to flag pages that are owned and dynamically mapped by a device driver. The pmem driver, after mapping a persistent memory range into the system memmap via devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus page-backed pmem-pfns via flags in the new pfn_t type. The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the resulting pte(s) inserted into the process page tables with a new _PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys off _PAGE_DEVMAP to pin the device hosting the page range active. Finally, get_page() and put_page() are modified to take references against the device driver established page mapping. Finally, this need for "struct page" for persistent memory requires memory capacity to store the memmap array. Given the memmap array for a large pool of persistent may exhaust available DRAM introduce a mechanism to allocate the memmap from persistent memory. The new "struct vmem_altmap *" parameter to devm_memremap_pages() enables arch_add_memory() to use reserved pmem capacity rather than the page allocator. This patch (of 18): The core has developed a need for a "pfn_t" type [1]. Move the existing pfn_t in KVM to kvm_pfn_t [2]. [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html [2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.htmlSigned-off-by: NDan Williams <dan.j.williams@intel.com> Acked-by: NChristoffer Dall <christoffer.dall@linaro.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Minchan Kim 提交于
MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite of the contents since MADV_FREE syscall is called for THP page. This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE support. Signed-off-by: NMinchan Kim <minchan@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Shaohua Li <shli@kernel.org> Cc: <yalin.wang2010@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Gang <gang.chen.5i5j@gmail.com> Cc: Chris Zankel <chris@zankel.net> Cc: Daniel Micay <danielmicay@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: David S. Miller <davem@davemloft.net> Cc: Helge Deller <deller@gmx.de> Cc: Hugh Dickins <hughd@google.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Jason Evans <je@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mika Penttil <mika.penttila@nextfour.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Rik van Riel <riel@redhat.com> Cc: Roland Dreier <roland@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Shaohua Li <shli@kernel.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
With new refcounting we don't need to mark PMDs splitting. Let's drop code to handle this. pmdp_splitting_flush() is not needed too: on splitting PMD we will do pmdp_clear_flush() + set_pte_at(). pmdp_clear_flush() will do IPI as needed for fast_gup. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
Tail page refcounting is utterly complicated and painful to support. It uses ->_mapcount on tail pages to store how many times this page is pinned. get_page() bumps ->_mapcount on tail page in addition to ->_count on head. This information is required by split_huge_page() to be able to distribute pins from head of compound page to tails during the split. We will need ->_mapcount to account PTE mappings of subpages of the compound page. We eliminate need in current meaning of ->_mapcount in tail pages by forbidding split entirely if the page is pinned. The only user of tail page refcounting is THP which is marked BROKEN for now. Let's drop all this mess. It makes get_page() and put_page() much simpler. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: NSasha Levin <sasha.levin@oracle.com> Tested-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NJerome Marchand <jmarchan@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
We are going to decouple splitting THP PMD from splitting underlying compound page. This patch renames split_huge_page_pmd*() functions to split_huge_pmd*() to reflect the fact that it doesn't imply page splitting, only PMD. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: NSasha Levin <sasha.levin@oracle.com> Tested-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: NJerome Marchand <jmarchan@redhat.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 1月, 2016 1 次提交
-
-
由 Vladimir Davydov 提交于
Mark those kmem allocations that are known to be easily triggered from userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to memcg. For the list, see below: - threadinfo - task_struct - task_delay_info - pid - cred - mm_struct - vm_area_struct and vm_region (nommu) - anon_vma and anon_vma_chain - signal_struct - sighand_struct - fs_struct - files_struct - fdtable and fdtable->full_fds_bits - dentry and external_name - inode for all filesystems. This is the most tedious part, because most filesystems overwrite the alloc_inode method. The list is far from complete, so feel free to add more objects. Nevertheless, it should be close to "account everything" approach and keep most workloads within bounds. Malevolent users will be able to breach the limit, but this was possible even with the former "account everything" approach (simply because it did not account everything in fact). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NVladimir Davydov <vdavydov@virtuozzo.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Tejun Heo <tj@kernel.org> Cc: Greg Thelen <gthelen@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 1月, 2016 5 次提交
-
-
由 Ulrich Weigand 提交于
GCC 6 will include changes to generated code with -mcmodel=large, which is used to build kernel modules on powerpc64le. This was necessary because the large model is supposed to allow arbitrary sizes and locations of the code and data sections, but the ELFv2 global entry point prolog still made the unconditional assumption that the TOC associated with any particular function can be found within 2 GB of the function entry point: func: addis r2,r12,(.TOC.-func)@ha addi r2,r2,(.TOC.-func)@l .localentry func, .-func To remove this assumption, GCC will now generate instead this global entry point prolog sequence when using -mcmodel=large: .quad .TOC.-func func: .reloc ., R_PPC64_ENTRY ld r2, -8(r12) add r2, r2, r12 .localentry func, .-func The new .reloc triggers an optimization in the linker that will replace this new prolog with the original code (see above) if the linker determines that the distance between .TOC. and func is in range after all. Since this new relocation is now present in module object files, the kernel module loader is required to handle them too. This patch adds support for the new relocation and implements the same optimization done by the GNU linker. Cc: stable@vger.kernel.org Signed-off-by: NUlrich Weigand <ulrich.weigand@de.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Russell Currey 提交于
The recently added OPAL API call, OPAL_CONSOLE_FLUSH, originally took no parameters and returned nothing. The call was updated to accept the terminal number to flush, and returned various values depending on the state of the output buffer. The prototype has been updated and its usage in the OPAL kmsg dumper has been modified to support its new behaviour as an incremental flush. Signed-off-by: NRussell Currey <ruscur@russell.cc> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael S. Tsirkin 提交于
This defines __smp_xxx barriers for powerpc for use by virtualization. smp_xxx barriers are removed as they are defined correctly by asm-generic/barriers.h This reduces the amount of arch-specific boiler-plate code. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Acked-by: NArnd Bergmann <arnd@arndb.de> Acked-by: NBoqun Feng <boqun.feng@gmail.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
-
由 Michael S. Tsirkin 提交于
On powerpc read_barrier_depends, smp_read_barrier_depends smp_store_mb(), smp_mb__before_atomic and smp_mb__after_atomic match the asm-generic variants exactly. Drop the local definitions and pull in asm-generic/barrier.h instead. This is in preparation to refactoring this code area. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Acked-by: NArnd Bergmann <arnd@arndb.de> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Davidlohr Bueso 提交于
With commit b92b8b35 ("locking/arch: Rename set_mb() to smp_store_mb()") it was made clear that the context of this call (and thus set_mb) is strictly for CPU ordering, as opposed to IO. As such all archs should use the smp variant of mb(), respecting the semantics and saving a mandatory barrier on UP. Signed-off-by: NDavidlohr Bueso <dbueso@suse.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: <linux-arch@vger.kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: dave@stgolabs.net Link: http://lkml.kernel.org/r/1445975631-17047-3-git-send-email-dave@stgolabs.netSigned-off-by: NIngo Molnar <mingo@kernel.org> Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
- 12 1月, 2016 2 次提交
-
-
由 Hugh Dickins 提交于
Swapoff after swapping hangs on the G5, when CONFIG_CHECKPOINT_RESTORE=y but CONFIG_MEM_SOFT_DIRTY is not set. That's because the non-zero _PAGE_SWP_SOFT_DIRTY bit, added by CONFIG_HAVE_ARCH_SOFT_DIRTY=y, is not discounted when CONFIG_MEM_SOFT_DIRTY is not set: so swap ptes cannot be recognized. (I suspect that the peculiar dependence of HAVE_ARCH_SOFT_DIRTY on CHECKPOINT_RESTORE in arch/powerpc/Kconfig comes from an incomplete attempt to solve this problem.) It's true that the relationship between CONFIG_HAVE_ARCH_SOFT_DIRTY and and CONFIG_MEM_SOFT_DIRTY is too confusing, and it's true that swapoff should be made more robust; but nevertheless, fix up the powerpc ifdefs as x86_64 and s390 (which met the same problem) have them, defining the bits as 0 if CONFIG_MEM_SOFT_DIRTY is not set. Fixes: 7207f436 ("powerpc/mm: Add page soft dirty tracking") Signed-off-by: NHugh Dickins <hughd@google.com> Reviewed-by: NCyrill Gorcunov <gorcunov@openvz.org> Acked-by: NLaurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
Core kernel expects swp_entry_t to consist of only swap type and swap offset. We should not leak pte bits into swp_entry_t. This breaks swapoff which use the swap type and offset to build a swp_entry_t and later compare that to the swp_entry_t obtained from linux page table pte. Leaking pte bits into swp_entry_t breaks that comparison and results in us looping in try_to_unuse. The stack trace can be anywhere below try_to_unuse() in mm/swapfile.c, since swapoff is circling around and around that function, reading from each used swap block into a page, then trying to find where that page belongs, looking at every non-file pte of every mm that ever swapped. Fixes: 6a119eae ("powerpc/mm: Add a _PAGE_PTE bit") Reported-by: NHugh Dickins <hughd@google.com> Suggested-by: NHugh Dickins <hughd@google.com> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: NHugh Dickins <hughd@google.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 11 1月, 2016 6 次提交
-
-
由 Michael Ellerman 提交于
In order to support Power9 we need two new HWCAP bits. We are merging these ahead of the cputable entry so that glibc can start referring to them. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Alistair Popple 提交于
P8+ hardware reports all errors on PE#0. This patch ensures PE#0 is not assigned to NPU devices so that it can be used for EEH. Signed-off-by: NAlistair Popple <alistair@popple.id.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Alistair Popple 提交于
The P8+ hardware supports four partitionable endpoints (PEs) however the hardware reports all errors as occurring on PE#0. This means we need to reserve this PE for error handling (EEH) and not assign it to a NPU device, implying that some devices will need to share PEs. This patch changes the PE assignment for NPU devices such that NPU devices which connect to the same GPU are assigned to the same PE#. Signed-off-by: NAlistair Popple <alistair@popple.id.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Alistair Popple 提交于
The emulated NVLink PCI devices share the same IODA2 TCE tables but only support a single TVT (instead of the normal two for PCI devices). This requires the kernel to manually replace windows with either the bypass or non-bypass window depending on what the driver has requested. Unfortunately an incorrect optimisation was made in pnv_pci_ioda_dma_set_mask() which caused updating of some NPU device PEs to be skipped in certain configurations due to an incorrect assumption that a NULL peer PE in the array indicated there were no more peers present. This patch fixes the problem by ensuring all peer PEs are updated. Fixes: 5d2aa710 ("powerpc/powernv: Add support for Nvlink NPUs") Signed-off-by: NAlistair Popple <alistair@popple.id.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Russell Currey 提交于
PCI in powernv now supports quite a bit more than p5ioc2, so remove the outdated comment. Signed-off-by: NRussell Currey <ruscur@russell.cc> Acked-by: NStewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Steven Rostedt 提交于
It has come to my attention that kprobe event stack tracing does not work on powerpc. You can see with the following: # cd /sys/kernel/debug/tracing # echo stacktrace > trace_options # echo 'p kfree' > kprobe_events # echo 1 > events/kprobes/enable Will print the following warning: save_stack_trace_regs() not implemented yet. Although save_stack_trace() (which normal event stack traces use) is implemented, save_stack_trace_regs() which kprobe events use is not. This is a cheap attempt to implement that function. Note, This may have issues if a task tries to get a stack trace from another task with its regs, because it just passes in "current" to save_context_stack(). But this does solve the issue with stack tracing kprobe events. Reported-by: NChunyu Hu <chuhu@redhat.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 09 1月, 2016 3 次提交
-
-
由 Dan Williams 提交于
Let all the archs that implement devmem_is_allowed() opt-in to a common definition of CONFIG_STRICT_DEVM in lib/Kconfig.debug. Cc: Kees Cook <keescook@chromium.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "David S. Miller" <davem@davemloft.net> Acked-by: NCatalin Marinas <catalin.marinas@arm.com> Acked-by: NHeiko Carstens <heiko.carstens@de.ibm.com> [heiko: drop 'default y' for s390] Acked-by: NIngo Molnar <mingo@redhat.com> Suggested-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Michael Ellerman 提交于
Commit 2fc251a8 ("powerpc: Copy only required pieces of the mm_context_t to the paca") broke the build for CONFIG_PPC_STD_MMU_64=y and CONFIG_PPC_MM_SLICES=n. That only happens for a kernel built with 4K pages and HUGETLB disabled, which is why we missed it. Fix it by adding a mm_ctx_user_psize member to the paca and populating it in the appropriate places. Fixes: 2fc251a8 ("powerpc: Copy only required pieces of the mm_context_t to the paca") Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Paolo Bonzini 提交于
Since the numbers now overlap, it makes sense to enumerate them in asm/kvm_host.h rather than linux/kvm_host.h. Functions that refer to architecture-specific requests are also moved to arch/. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 08 1月, 2016 1 次提交
-
-
由 Andrew Lunn 提交于
Have mdio_alloc() create the array of interrupt numbers, and initialize it to POLLING. This is what most MDIO drivers want, so allowing code to be removed from the drivers. Signed-off-by: NAndrew Lunn <andrew@lunn.ch> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 1月, 2016 1 次提交
-
-
由 Rabin Vincent 提交于
The SKF_AD_ALU_XOR_X ancillary is not like the other ancillary data instructions since it XORs A with X while all the others replace A with some loaded value. All the BPF JITs fail to clear A if this is used as the first instruction in a filter. This was found using american fuzzy lop. Add a helper to determine if A needs to be cleared given the first instruction in a filter, and use this in the JITs. Except for ARM, the rest have only been compile-tested. Fixes: 34805931 ("net: filter: get rid of BPF_S_* enum") Signed-off-by: NRabin Vincent <rabin@rab.in> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-