1. 27 August 2009, 7 commits
  2. 01 July 2009, 1 commit
    • x86: only clear node_states for 64bit · 66918dcd
      Committed by Yinghai Lu
      Nathan reported that
      
      | commit 73d60b7f
      | Author: Yinghai Lu <yinghai@kernel.org>
      | Date:   Tue Jun 16 15:33:00 2009 -0700
      |
      |    page-allocator: clear N_HIGH_MEMORY map before we set it again
      |
      |    SRAT tables may contains nodes of very small size.  The arch code may
      |    decide to not activate such a node.  However, currently the early boot
      |    code sets N_HIGH_MEMORY for such nodes.  These nodes therefore seem to be
      |    active although these nodes have no present pages.
      |
      |    For 64bit N_HIGH_MEMORY == N_NORMAL_MEMORY, so that works for 64 bit too
      
      unintentionally and incorrectly clears the cpuset.mems cgroup attribute on
      an i386 kvm guest, meaning that cpuset.mems cannot be used.
      
      Fix this by clearing node_states[N_NORMAL_MEMORY] for 64-bit only, and by
      saving/restoring that state in find_zone_movable_pfn.
      Reported-by: Nathan Lynch <ntl@pobox.com>
      Tested-by: Nathan Lynch <ntl@pobox.com>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
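      A minimal sketch of the idea above (not the actual patch; it assumes the
      kernel's nodemask helpers nodes_clear() and node_states[]):
      
      	#ifdef CONFIG_64BIT
      	/* On 64-bit, N_HIGH_MEMORY == N_NORMAL_MEMORY, so clearing the map
      	 * before rebuilding it is safe; 32-bit must leave it alone so the
      	 * cpuset.mems attribute keeps working. */
      	nodes_clear(node_states[N_NORMAL_MEMORY]);
      	#endif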
  3. 23 June 2009, 1 commit
    • x86: Move init_gbpages() to setup_arch() · 854c879f
      Committed by Pekka J Enberg
      The init_gbpages() function is conditionally called from
      init_memory_mapping() function. There are two call-sites where
      this 'after_bootmem' condition can be true: setup_arch() and
      mem_init() via pci_iommu_alloc().
      
      Therefore, it's safe to move the call to init_gbpages() to
      setup_arch() as it's always called before mem_init().
      
      This removes an after_bootmem use - paving the way to remove
      all uses of that state variable.
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Acked-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <Pine.LNX.4.64.0906221731210.19474@melkki.cs.Helsinki.FI>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 22 June 2009, 3 commits
    • x86: fix pageattr handling for lpage percpu allocator and re-enable it · e59a1bb2
      Committed by Tejun Heo
      The lpage allocator aliases a PMD page for each cpu and returns whatever
      is unused to the page allocator.  When the pageattr of the recycled
      pages is changed, the two aliases end up pointing at overlapping
      regions with different attributes, which isn't allowed and is known to
      cause subtle data corruption in certain cases.
      
      This can be handled in a similar manner to the x86_64 highmap alias:
      the pageattr code should detect whether the target pages have a PMD
      alias and, if so, split the PMD alias and synchronize the attributes.
      
      pcpur allocator is updated to keep the allocated PMD pages map sorted
      in ascending address order and provide pcpu_lpage_remapped() function
      which binary searches the array to determine whether the given address
      is aliased and if so to which address.  pageattr is updated to use
      pcpu_lpage_remapped() to detect the PMD alias and split it up as
      necessary from cpa_process_alias().
      
      Jan Beulich spotted the original problem and incorrect usage of vaddr
      instead of laddr for lookup.
      
      With this, lpage percpu allocator should work correctly.  Re-enable
      it.
      
      [ Impact: fix subtle lpage pageattr bug and re-enable lpage ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Jan Beulich <JBeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
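      A hedged sketch of the sorted-array lookup described above (the struct,
      array and function names here are hypothetical; the real helper is
      pcpu_lpage_remapped() in the pcpur allocator):
      
      	/* Recycled PMD mappings, kept sorted by virtual address. */
      	struct lpage_map { unsigned long vaddr; unsigned long laddr; };
      	static struct lpage_map lpage_maps[NR_CPUS];
      	static int nr_lpage_maps;
      
      	/* Return the linear alias of @addr, or 0 if it is not remapped. */
      	static unsigned long lpage_remapped(unsigned long addr)
      	{
      		unsigned long base = addr & PMD_MASK;
      		int lo = 0, hi = nr_lpage_maps;
      
      		while (lo < hi) {
      			int mid = (lo + hi) / 2;
      
      			if (lpage_maps[mid].vaddr == base)
      				return lpage_maps[mid].laddr + (addr - base);
      			if (lpage_maps[mid].vaddr < base)
      				lo = mid + 1;
      			else
      				hi = mid;
      		}
      		return 0;
      	}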
    • x86: reorganize cpa_process_alias() · 992f4c1c
      Committed by Tejun Heo
      Reorganize cpa_process_alias() so that a new alias condition can be
      added easily.
      
      Jan Beulich spotted a problem in the original cleanup thread, which
      incorrectly assumed the two existing conditions were mutually
      exclusive.
      
      [ Impact: code reorganization ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
    • Move FAULT_FLAG_xyz into handle_mm_fault() callers · d06063cc
      Committed by Linus Torvalds
      This allows the callers to now pass down the full set of FAULT_FLAG_xyz
      flags to handle_mm_fault().  All callers have been (mechanically)
      converted to the new calling convention; there's almost certainly room
      for architectures to clean up their code and then add FAULT_FLAG_RETRY
      when that support is added.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
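      A minimal sketch of what a converted caller looks like (the flag value
      is the caller's choice; FAULT_FLAG_WRITE is the common case):
      
      	int fault;
      
      	/* The caller now decides the fault flags explicitly. */
      	fault = handle_mm_fault(mm, vma, address,
      				write ? FAULT_FLAG_WRITE : 0);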
  5. 21 June 2009, 1 commit
    • x86: don't use 'access_ok()' as a range check in get_user_pages_fast() · 7f818906
      Committed by Linus Torvalds
      It's really not right to use 'access_ok()', since that is meant for the
      normal "get_user()" and "copy_from/to_user()" accesses, which are done
      through the TLB, rather than through the page tables.
      
      Why? access_ok() does both too few, and too many checks.  Too many,
      because it is meant for regular kernel accesses that will not honor the
      'user' bit in the page tables, and because it honors the USER_DS vs
      KERNEL_DS distinction that we shouldn't care about in GUP.  And too few,
      because it doesn't do the 'canonical' check on the address on x86-64,
      since the TLB will do that for us.
      
      So instead of using a function that isn't meant for this, and does
      something else and much more complicated, just do the real rules: we
      don't want the range to overflow, and on x86-64, we want it to be a
      canonical low address (on 32-bit, all addresses are canonical).
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
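      A hedged sketch of the "real rules" described above (illustrative only;
      the constants follow the x86 headers of that era):
      
      	unsigned long start = addr & PAGE_MASK;
      	unsigned long end = start + ((unsigned long)nr_pages << PAGE_SHIFT);
      
      	/* Reject wrap-around of the range ... */
      	if (end < start)
      		goto slow;
      	#ifdef CONFIG_X86_64
      	/* ... and, on 64-bit, anything outside the canonical low range. */
      	if (end >> __VIRTUAL_MASK_SHIFT)
      		goto slow;
      	#endif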
  6. 19 June 2009, 1 commit
    • perf_counter, x86: Improve interactions with fast-gup · 0c871971
      Committed by Ingo Molnar
      Improve a few details in perfcounter call-chain recording that
      makes use of fast-GUP:
      
      - Use ACCESS_ONCE() to observe the pte value. ptes are fundamentally
        racy and can be changed on another CPU, so we have to be careful
        about how we access them. The PAE branch is already careful with
        read-barriers - but the non-PAE and 64-bit side needs an
        ACCESS_ONCE() to make sure the pte value is observed only once.
      
      - make the checks a bit stricter so that we can feed it any kind of
        cra^H^H^H user-space input ;-)
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
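      A small sketch of the ACCESS_ONCE() usage described above (the variable
      names are illustrative):
      
      	/* ptes can change under us on another CPU; read the value exactly
      	 * once and do all subsequent checks against this local copy. */
      	pte_t pte = ACCESS_ONCE(*ptep);
      
      	if (!pte_present(pte))
      		return 0;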
  7. 16 June 2009, 1 commit
    • x86: mm: Read cr2 before prefetching the mmap_lock · 5dfaf90f
      Committed by Ingo Molnar
      Prefetch instructions can generate spurious faults on certain
      models of older CPUs. The faults themselves cannot be stopped
      and they can occur pretty much anywhere - so the way we solve
      them is that we detect certain patterns and ignore the fault.
      
      There is one small path of code where we must not take faults
      though: the #PF handler execution leading up to the reading
      of the CR2 register (the faulting address). If we take a fault there,
      we overwrite the CR2 value with that of the prefetching
      instruction and possibly mishandle user-space or
      kernel-space pagefaults.
      
      It turns out that in current upstream we do exactly that:
      
      	prefetchw(&mm->mmap_sem);
      
      	/* Get the faulting address: */
      	address = read_cr2();
      
      This is not good.
      
      So turn the order around: first read the cr2, then prefetch
      the lock address. Reading cr2 is plenty fast (2 cycles) so
      delaying the prefetch by this amount shouldn't be a big issue
      performance-wise.
      
      [ And this might explain a mystery fault.c warning that sometimes
        occurs on an old AMD/Sempron based test-system I have -
        which does have such prefetch problems. ]
      
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      LKML-Reference: <20090616030522.GA22162@Krystal>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
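      The reordering described above, sketched for clarity (this mirrors the
      snippet quoted in the message, just with the two statements swapped):
      
      	/* Get the faulting address first, so a spurious prefetch fault
      	 * cannot clobber CR2 before we have read it: */
      	address = read_cr2();
      
      	prefetchw(&mm->mmap_sem);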
  8. 15 June 2009, 10 commits
  9. 13 June 2009, 2 commits
    • kmemcheck: include module.h to prevent warnings · 60e38393
      Committed by Randy Dunlap
      kmemcheck/shadow.c needs to include <linux/module.h> to prevent
      the following warnings:
      
      linux-next-20080724/arch/x86/mm/kmemcheck/shadow.c:64: warning: data definition has no type or storage class
      linux-next-20080724/arch/x86/mm/kmemcheck/shadow.c:64: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL'
      linux-next-20080724/arch/x86/mm/kmemcheck/shadow.c:64: warning: parameter names (without types) in function declaration
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: vegardno@ifi.uio.no
      Cc: penberg@cs.helsinki.fi
      Cc: akpm <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
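      The warnings come from using EXPORT_SYMBOL_GPL() without its declaration
      in scope; a sketch of the fix (the exported symbol name here is
      illustrative, not taken from the patch):
      
      	#include <linux/module.h>	/* provides EXPORT_SYMBOL_GPL() */
      
      	/* ... */
      	EXPORT_SYMBOL_GPL(kmemcheck_mark_initialized);	/* illustrative symbol */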
    • kmemcheck: add the kmemcheck core · dfec072e
      Committed by Vegard Nossum
      General description: kmemcheck is a patch to the linux kernel that
      detects use of uninitialized memory. It does this by trapping every
      read and write to memory that was allocated dynamically (e.g. using
      kmalloc()). If a memory address is read that has not previously been
      written to, a message is printed to the kernel log.
      
      Thanks to Andi Kleen for the set_memory_4k() solution.
      
      Andrew Morton suggested documenting the shadow member of struct page.
      Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      
      [export kmemcheck_mark_initialized]
      [build fix for setup_max_cpus]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      
      [rebased for mainline inclusion]
      Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
  10. 12 June 2009, 2 commits
    • x86: change kernel_physical_mapping_init() __init to __meminit · 41d840e2
      Committed by Shaohua Li
      kernel_physical_mapping_init() can be called from the memory hotplug path.
      
      [ Impact: fix potential crash with memory hotplug ]
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      LKML-Reference: <20090612045752.GA827@sli10-desk.sh.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
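      A hedged sketch of the annotation change (the parameter list shown is an
      assumption based on the 64-bit version of that era; only the __init ->
      __meminit swap is the point):
      
      	/* __meminit keeps the function around after boot when memory
      	 * hotplug is enabled; __init would have its text freed. */
      	unsigned long __meminit
      	kernel_physical_mapping_init(unsigned long start, unsigned long end,
      				     unsigned long page_size_mask);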
    • x86: make zap_low_mapping could be used early · 55cd6367
      Committed by Yinghai Lu
      Only one cpu is up at that point, so just call __flush_tlb for it. This
      fixes the following boot warning on x86:
      
        [    0.000000] Memory: 885032k/915540k available (5993k kernel code, 29844k reserved, 3842k data, 428k init, 0k highmem)
        [    0.000000] virtual kernel memory layout:
        [    0.000000]     fixmap  : 0xffe17000 - 0xfffff000   (1952 kB)
        [    0.000000]     vmalloc : 0xf8615000 - 0xffe15000   ( 120 MB)
        [    0.000000]     lowmem  : 0xc0000000 - 0xf7e15000   ( 894 MB)
        [    0.000000]       .init : 0xc19a5000 - 0xc1a10000   ( 428 kB)
        [    0.000000]       .data : 0xc15da4bb - 0xc199af6c   (3842 kB)
        [    0.000000]       .text : 0xc1000000 - 0xc15da4bb   (5993 kB)
        [    0.000000] Checking if this processor honours the WP bit even in supervisor mode...Ok.
        [    0.000000] ------------[ cut here ]------------
        [    0.000000] WARNING: at kernel/smp.c:369 smp_call_function_many+0x50/0x1b0()
        [    0.000000] Hardware name: System Product Name
        [    0.000000] Modules linked in:
        [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.30-tip #52504
        [    0.000000] Call Trace:
        [    0.000000]  [<c104aa16>] warn_slowpath_common+0x65/0x95
        [    0.000000]  [<c104aa58>] warn_slowpath_null+0x12/0x15
        [    0.000000]  [<c1073bbe>] smp_call_function_many+0x50/0x1b0
        [    0.000000]  [<c1037615>] ? do_flush_tlb_all+0x0/0x41
        [    0.000000]  [<c1037615>] ? do_flush_tlb_all+0x0/0x41
        [    0.000000]  [<c1073d4f>] smp_call_function+0x31/0x58
        [    0.000000]  [<c1037615>] ? do_flush_tlb_all+0x0/0x41
        [    0.000000]  [<c104f635>] on_each_cpu+0x26/0x65
        [    0.000000]  [<c10374b5>] flush_tlb_all+0x19/0x1b
        [    0.000000]  [<c1032ab3>] zap_low_mappings+0x4d/0x56
        [    0.000000]  [<c15d64b5>] ? printk+0x14/0x17
        [    0.000000]  [<c19b42a8>] mem_init+0x23d/0x245
        [    0.000000]  [<c19a56a1>] start_kernel+0x17a/0x2d5
        [    0.000000]  [<c19a5347>] ? unknown_bootoption+0x0/0x19a
        [    0.000000]  [<c19a5039>] __init_begin+0x39/0x41
        [    0.000000] ---[ end trace 4eaa2a86a8e2da22 ]---
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
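      A hedged sketch of the idea (the actual patch plumbs this differently;
      the point is simply to avoid cross-CPU IPIs before SMP is up):
      
      	if (num_online_cpus() == 1)
      		__flush_tlb();		/* local flush; nobody to IPI yet */
      	else
      		flush_tlb_all();	/* uses smp_call_function() */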
  11. 11 June 2009, 2 commits
  12. 09 June 2009, 1 commit
  13. 29 May 2009, 1 commit
    • x86: ignore VM_LOCKED when determining if hugetlb-backed page tables can be shared or not · 32b154c0
      Committed by Mel Gorman
      Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13302
      
      On x86 and x86-64, it is possible that page tables are shared between
      shared mappings backed by hugetlbfs.  As part of this,
      page_table_shareable() checks a pair of vma->vm_flags and they must match
      if they are to be shared.  All VMA flags are taken into account, including
      VM_LOCKED.
      
      The problem is that VM_LOCKED is cleared on fork().  When a process with a
      shared memory segment forks() to exec() a helper, there will be shared
      VMAs with different flags.  The impact is that the shared segment is
      sometimes considered shareable and other times not, depending on what
      process is checking.
      
      What happens is that the segment page tables are being shared but the
      count is inaccurate depending on the ordering of events.  As the page
      tables are freed with put_page(), bad pmd's are found when some of the
      children exit.  The hugepage counters also get corrupted and the Total and
      Free count will no longer match even when all the hugepage-backed regions
      are freed.  This requires a reboot of the machine to "fix".
      
      This patch addresses the problem by comparing all flags except VM_LOCKED
      when deciding if pagetables should be shared or not for hugetlbfs-backed
      mappings.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: <stable@kernel.org>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <starlight@binnacle.cx>
      Cc: Eric B Munson <ebmunson@us.ibm.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
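      A small sketch of the comparison change (illustrative only; the real
      check lives in page_table_shareable()):
      
      	/* Ignore VM_LOCKED: fork() clears it in the child, so it must not
      	 * decide whether two hugetlbfs mappings may share page tables. */
      	unsigned long vm_flags = vma->vm_flags & ~VM_LOCKED;
      	unsigned long svm_flags = svma->vm_flags & ~VM_LOCKED;
      
      	if (vm_flags != svm_flags)
      		return 0;	/* not shareable */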
  14. 27 May 2009, 1 commit
  15. 23 May 2009, 2 commits
  16. 18 May 2009, 2 commits
    • x86, mm: Fix node_possible_map logic · 7c43769a
      Committed by Yinghai Lu
      Recently there were some changes to the meaning of node_possible_map,
      and the result is quite strange:
      
      - a node without memory would be set in node_possible_map
      - but a node with less than NODE_MIN_SIZE of memory would be kicked out of node_possible_map.
      
      Fix it by adding strict_setup_node_bootmem().
      
      Also, remove unparse_node().
      
      So the result will be:
      
      1. cpu_to_node() will return an online node only (the nearest one)
      2. apicid_to_node() still returns a node that may not be online but is set
         in node_possible_map.
      3. node_possible_map will include nodes whose memory is less than NODE_MIN_SIZE
      
      v2: after the move_cpus_to_node change.
      
      [ Impact: get node_possible_map right ]
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Tested-by: Jack Steiner <steiner@sgi.com>
      LKML-Reference: <4A0C49BE.6080800@kernel.org>
      [ v3: various small cleanups and comment clarifications ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • mm, x86: remove MEMORY_HOTPLUG_RESERVE related code · 888a589f
      Committed by Yinghai Lu
      after:
      
       | commit b263295d
       | Author: Christoph Lameter <clameter@sgi.com>
       | Date:   Wed Jan 30 13:30:47 2008 +0100
       |
       |    x86: 64-bit, make sparsemem vmemmap the only memory model
      
      we don't have MEMORY_HOTPLUG_RESERVE anymore.
      
      Historically, x86-64 had an architecture-specific method for memory hotplug
      whereby it scanned the SRAT for physical memory ranges that could be
      potentially used for memory hot-add later. By reserving those ranges
      without physical memory, the memmap would be allocated and left dormant
      until needed. This depended on the DISCONTIG memory model which has been
      removed so the code implementing HOTPLUG_RESERVE is now dead.
      
      This patch removes the dead code used by MEMORY_HOTPLUG_RESERVE.
      
      (Changelog authored by Mel.)
      
      v2: updated changelog, and remove hotadd= in doc
      
      [ Impact: remove dead code ]
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
      Reviewed-by: Mel Gorman <mel@csn.ul.ie>
      Workflow-found-OK-by: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4A0C4910.7090508@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  17. 12 May 2009, 1 commit
  18. 11 May 2009, 1 commit