提交 · 9f888b3a10782de27864659d4ab48eb6ef2c0bf3 · openeuler / Kernel

31 5月, 2014 1 次提交

x86_64: expand kernel stack to 16K · 6538b8ea

由 Minchan Kim 提交于 5月 28, 2014

While I play inhouse patches with much memory pressure on qemu-kvm,
3.14 kernel was randomly crashed. The reason was kernel stack overflow.

When I investigated the problem, the callstack was a little bit deeper
by involve with reclaim functions but not direct reclaim path.

I tried to diet stack size of some functions related with alloc/reclaim
so did a hundred of byte but overflow was't disappeard so that I encounter
overflow by another deeper callstack on reclaim/allocator path.

Of course, we might sweep every sites we have found for reducing
stack usage but I'm not sure how long it saves the world(surely,
lots of developer start to add nice features which will use stack
agains) and if we consider another more complex feature in I/O layer
and/or reclaim path, it might be better to increase stack size(
meanwhile, stack usage on 64bit machine was doubled compared to 32bit
while it have sticked to 8K. Hmm, it's not a fair to me and arm64
already expaned to 16K. )

So, my stupid idea is just let's expand stack size and keep an eye
toward stack consumption on each kernel functions via stacktrace of ftrace.
For example, we can have a bar like that each funcion shouldn't exceed 200K
and emit the warning when some function consumes more in runtime.
Of course, it could make false positive but at least, it could make a
chance to think over it.

I guess this topic was discussed several time so there might be
strong reason not to increase kernel stack size on x86_64, for me not
knowing so Ccing x86_64 maintainers, other MM guys and virtio
maintainers.

Here's an example call trace using up the kernel stack:

         Depth    Size   Location    (51 entries)
         -----    ----   --------
   0)     7696      16   lookup_address
   1)     7680      16   _lookup_address_cpa.isra.3
   2)     7664      24   __change_page_attr_set_clr
   3)     7640     392   kernel_map_pages
   4)     7248     256   get_page_from_freelist
   5)     6992     352   __alloc_pages_nodemask
   6)     6640       8   alloc_pages_current
   7)     6632     168   new_slab
   8)     6464       8   __slab_alloc
   9)     6456      80   __kmalloc
  10)     6376     376   vring_add_indirect
  11)     6000     144   virtqueue_add_sgs
  12)     5856     288   __virtblk_add_req
  13)     5568      96   virtio_queue_rq
  14)     5472     128   __blk_mq_run_hw_queue
  15)     5344      16   blk_mq_run_hw_queue
  16)     5328      96   blk_mq_insert_requests
  17)     5232     112   blk_mq_flush_plug_list
  18)     5120     112   blk_flush_plug_list
  19)     5008      64   io_schedule_timeout
  20)     4944     128   mempool_alloc
  21)     4816      96   bio_alloc_bioset
  22)     4720      48   get_swap_bio
  23)     4672     160   __swap_writepage
  24)     4512      32   swap_writepage
  25)     4480     320   shrink_page_list
  26)     4160     208   shrink_inactive_list
  27)     3952     304   shrink_lruvec
  28)     3648      80   shrink_zone
  29)     3568     128   do_try_to_free_pages
  30)     3440     208   try_to_free_pages
  31)     3232     352   __alloc_pages_nodemask
  32)     2880       8   alloc_pages_current
  33)     2872     200   __page_cache_alloc
  34)     2672      80   find_or_create_page
  35)     2592      80   ext4_mb_load_buddy
  36)     2512     176   ext4_mb_regular_allocator
  37)     2336     128   ext4_mb_new_blocks
  38)     2208     256   ext4_ext_map_blocks
  39)     1952     160   ext4_map_blocks
  40)     1792     384   ext4_writepages
  41)     1408      16   do_writepages
  42)     1392      96   __writeback_single_inode
  43)     1296     176   writeback_sb_inodes
  44)     1120      80   __writeback_inodes_wb
  45)     1040     160   wb_writeback
  46)      880     208   bdi_writeback_workfn
  47)      672     144   process_one_work
  48)      528     112   worker_thread
  49)      416     240   kthread
  50)      176     176   ret_from_fork

[ Note: the problem is exacerbated by certain gcc versions that seem to
  generate much bigger stack frames due to apparently bad coalescing of
  temporaries and generating too many spills.  Rusty saw gcc-4.6.4 using
  35% more stack on the virtio path than 4.8.2 does, for example.

  Minchan not only uses such a bad gcc version (4.6.3 in his case), but
  some of the stack use is due to debugging (CONFIG_DEBUG_PAGEALLOC is
  what causes that kernel_map_pages() frame, for example). But we're
  clearly getting too close.

  The VM code also seems to have excessive stack frames partly for the
  same compiler reason, triggered by excessive inlining and lots of
  function arguments.

  We need to improve on our stack use, but in the meantime let's do this
  simple stack increase too.  Unlike most earlier reports, there is
  nothing simple that stands out as being really horribly wrong here,
  apart from the fact that the stack frames are just bigger than they
  should need to be.        - Linus ]
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S Tsirkin <mst@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: PJ Waskiewicz <pjwaskiewicz@gmail.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6538b8ea

27 5月, 2014 1 次提交

x86/xen: map foreign pfns for autotranslated guests · 77945ca7

由 Mukesh Rathor 提交于 5月 23, 2014

When running as a dom0 in PVH mode, foreign pfns that are accessed
must be added to our p2m which is managed by xen. This is done via
XENMEM_add_to_physmap_range hypercall. This is needed for toolstack
building guests and mapping guest memory, xentrace mapping xen pages,
etc.
Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

77945ca7

15 5月, 2014 8 次提交

x86/xen: do not use _PAGE_IOMAP in xen_remap_domain_mfn_range() · f59c5145

由 David Vrabel 提交于 1月 08, 2014

_PAGE_IOMAP is used in xen_remap_domain_mfn_range() to prevent the
pfn_pte() call in remap_area_mfn_pte_fn() from using the p2m to translate
the MFN.  If mfn_pte() is used instead, the p2m look up is avoided and
the use of _PAGE_IOMAP is no longer needed.
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

f59c5145

x86/xen: set regions above the end of RAM as 1:1 · 25b884a8

由 David Vrabel 提交于 1月 03, 2014

PCI devices may have BARs located above the end of RAM so mark such
frames as identity frames in the p2m (instead of the default of
missing).

PFNs outside the p2m (above MAX_P2M_PFN) are also considered to be
identity frames for the same reason.
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

25b884a8

x86/xen: only warn once if bad MFNs are found during setup · 2dcc9a3d

由 David Vrabel 提交于 1月 07, 2014

In xen_add_extra_mem(), if the WARN() checks for bad MFNs trigger it is
likely that they will trigger at lot, spamming the log.

Use WARN_ONCE() instead.
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

2dcc9a3d

x86/xen: compactly store large identity ranges in the p2m · 3cb83e46

由 David Vrabel 提交于 1月 07, 2014

Large (multi-GB) identity ranges currently require a unique middle page
(filled with p2m_identity entries) per 1 GB region.

Similar to the common p2m_mid_missing middle page for large missing
regions, introduce a p2m_mid_identity page (filled with p2m_identity
entries) which can be used instead.

set_phys_range_identity() thus only needs to allocate new middle pages
at the beginning and end of the range.
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

3cb83e46

x86/xen: fix set_phys_range_identity() if pfn_e > MAX_P2M_PFN · a9b5bff6

由 David Vrabel 提交于 1月 06, 2014

Allow set_phys_range_identity() to work with a range that overlaps
MAX_P2M_PFN by clamping pfn_e to MAX_P2M_PFN.
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

a9b5bff6

x86/xen: rename early_p2m_alloc() and early_p2m_alloc_middle() · fcca2e31

由 David Vrabel 提交于 1月 07, 2014

early_p2m_alloc_middle() allocates a new leaf page and
early_p2m_alloc() allocates a new middle page.  This is confusing.

Swap the names so they match what the functions actually do.
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

fcca2e31

xen/x86: set panic notifier priority to minimum · bc5eb201

由 Radim Krčmář 提交于 5月 13, 2014

Execution is not going to continue after telling Xen about the crash.
Let other panic notifiers run by postponing the final hypercall as much
as possible.
Signed-off-by: NAndrew Jones <drjones@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

bc5eb201

x86-64, modify_ldt: Make support for 16-bit segments a runtime option · fa81511b

由 Linus Torvalds 提交于 5月 14, 2014

Checkin:

b3b42ac2 x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels

disabled 16-bit segments on 64-bit kernels due to an information
leak.  However, it does seem that people are genuinely using Wine to
run old 16-bit Windows programs on Linux.

A proper fix for this ("espfix64") is coming in the upcoming merge
window, but as a temporary fix, create a sysctl to allow the
administrator to re-enable support for 16-bit segments.

It adds a "/proc/sys/abi/ldt16" sysctl that defaults to zero (off). If
you hit this issue and care about your old Windows program more than
you care about a kernel stack address information leak, you can do

   echo 1 > /proc/sys/abi/ldt16

as root (add it to your startup scripts), and you should be ok.

The sysctl table is only added if you have COMPAT support enabled on
x86-64, but I assume anybody who runs old windows binaries very much
does that ;)
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/CA%2B55aFw9BPoD10U1LfHbOMpHWZkvJTkMcfCs9s3urPr1YyWBxw@mail.gmail.com
Cc: <stable@vger.kernel.org>

fa81511b

14 5月, 2014 3 次提交

KVM: x86: disable master clock if TSC is reset during suspend · 16a96021

由 Marcelo Tosatti 提交于 5月 14, 2014

Updating system_time from the kernel clock once master clock
has been enabled can result in time backwards event, in case
kernel clock frequency is lower than TSC frequency.

Disable master clock in case it is necessary to update it
from the resume path.
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

16a96021

x86, mm, hugetlb: Add missing TLB page invalidation for hugetlb_cow() · 9844f546

由 Anthony Iliopoulos 提交于 5月 14, 2014

The invalidation is required in order to maintain proper semantics
under CoW conditions. In scenarios where a process clones several
threads, a thread operating on a core whose DTLB entry for a
particular hugepage has not been invalidated, will be reading from
the hugepage that belongs to the forked child process, even after
hugetlb_cow().

The thread will not see the updated page as long as the stale DTLB
entry remains cached, the thread attempts to write into the page,
the child process exits, or the thread gets migrated to a different
processor.
Signed-off-by: NAnthony Iliopoulos <anthony.iliopoulos@huawei.com>
Link: http://lkml.kernel.org/r/20140514092948.GA17391@server-36.huawei.corpSuggested-by: NShay Goikhman <shay.goikhman@huawei.com>
Acked-by: NDave Hansen <dave.hansen@intel.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v2.6.16+ (!)

9844f546

net: filter: x86: fix JIT address randomization · 773cd38f

由 Alexei Starovoitov 提交于 5月 13, 2014

bpf_alloc_binary() adds 128 bytes of room to JITed program image
and rounds it up to the nearest page size. If image size is close
to page size (like 4000), it is rounded to two pages:
round_up(4000 + 4 + 128) == 8192
then 'hole' is computed as 8192 - (4000 + 4) = 4188
If prandom_u32() % hole selects a number >= PAGE_SIZE - sizeof(*header)
then kernel will crash during bpf_jit_free():

kernel BUG at arch/x86/mm/pageattr.c:887!
Call Trace:
 [<ffffffff81037285>] change_page_attr_set_clr+0x135/0x460
 [<ffffffff81694cc0>] ? _raw_spin_unlock_irq+0x30/0x50
 [<ffffffff810378ff>] set_memory_rw+0x2f/0x40
 [<ffffffffa01a0d8d>] bpf_jit_free_deferred+0x2d/0x60
 [<ffffffff8106bf98>] process_one_work+0x1d8/0x6a0
 [<ffffffff8106bf38>] ? process_one_work+0x178/0x6a0
 [<ffffffff8106c90c>] worker_thread+0x11c/0x370

since bpf_jit_free() does:
  unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
  struct bpf_binary_header *header = (void *)addr;
to compute start address of 'bpf_binary_header'
and header->pages will pass junk to:
  set_memory_rw(addr, header->pages);

Fix it by making sure that &header->image[prandom_u32() % hole] and &header
are in the same page

Fixes: 314beb9b ("x86: bpf_jit_comp: secure bpf jit against spraying attacks")
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

773cd38f

13 5月, 2014 1 次提交

xen: refactor suspend pre/post hooks · aa8532c3

由 David Vrabel 提交于 5月 08, 2014

New architectures currently have to provide implementations of 5 different
functions: xen_arch_pre_suspend(), xen_arch_post_suspend(),
xen_arch_hvm_post_suspend(), xen_mm_pin_all(), and xen_mm_unpin_all().

Refactor the suspend code to only require xen_arch_pre_suspend() and
xen_arch_post_suspend().
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

aa8532c3

12 5月, 2014 1 次提交

x86, rdrand: When nordrand is specified, disable RDSEED as well · 7a5091d5

由 H. Peter Anvin 提交于 5月 11, 2014

One can logically expect that when the user has specified "nordrand",
the user doesn't want any use of the CPU random number generator,
neither RDRAND nor RDSEED, so disable both.
Reported-by: NStephan Mueller <smueller@chronox.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Link: http://lkml.kernel.org/r/21542339.0lFnPSyGRS@myon.chronox.deSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

7a5091d5

09 5月, 2014 3 次提交

x86, vdso, time: Cast tv_nsec to u64 for proper shifting in update_vsyscall() · 28b92e09

由 Boris Ostrovsky 提交于 5月 09, 2014

With tk->wall_to_monotonic.tv_nsec being a 32-bit value on 32-bit
systems, (tk->wall_to_monotonic.tv_nsec << tk->shift) in update_vsyscall()
may lose upper bits or, worse, add them since compiler will do this:
	(u64)(tk->wall_to_monotonic.tv_nsec << tk->shift)
instead of
	((u64)tk->wall_to_monotonic.tv_nsec << tk->shift)

So if, for example, tv_nsec is 0x800000 and shift is 8 we will end up
with 0xffffffff80000000 instead of 0x80000000. And then we are stuck in
the subsequent 'while' loop.

We need an explicit cast.
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1399648287-15178-1-git-send-email-boris.ostrovsky@oracle.comAcked-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <stable@vger.kernel.org> # v3.14
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

28b92e09

x86: Fix typo in MSR_IA32_MISC_ENABLE_LIMIT_CPUID macro · c45f7736

由 Andres Freund 提交于 5月 09, 2014

The spuriously added semicolon didn't have any effect because the
macro isn't currently in use.

c0a639adSigned-off-by: NAndres Freund <andres@anarazel.de>
Link: http://lkml.kernel.org/r/1399598957-7011-3-git-send-email-andres@anarazel.de
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

c45f7736

x86: Fix typo preventing msr_set/clear_bit from having an effect · 722a0d22

由 Andres Freund 提交于 5月 09, 2014

Due to a typo the msr accessor function introduced in
22085a66 didn't have any lasting
effects because they accidentally wrote the old value back.

After c0a639ad this at the very least
this causes cpuid limits not to be lifted on some cpus leading to
missing capabilities for those.
Signed-off-by: NAndres Freund <andres@anarazel.de>
Link: http://lkml.kernel.org/r/1399598957-7011-2-git-send-email-andres@anarazel.de
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

722a0d22

08 5月, 2014 3 次提交

x86/intel: Add quirk to disable HPET for the Baytrail platform · 62187910

由 Feng Tang 提交于 4月 24, 2014

HPET on current Baytrail platform has accuracy problem to be
used as reliable clocksource/clockevent, so add a early quirk to
disable it.
Signed-off-by: NFeng Tang <feng.tang@intel.com>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1398327498-13163-2-git-send-email-feng.tang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

62187910

x86/hpet: Make boot_hpet_disable extern · f10f383d

由 Feng Tang 提交于 4月 24, 2014

HPET on some platform has accuracy problem. Making
"boot_hpet_disable" extern so that we can runtime disable
the HPET timer by using quirk to check the platform.
Signed-off-by: NFeng Tang <feng.tang@intel.com>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1398327498-13163-1-git-send-email-feng.tang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

f10f383d

x86-64, build: Fix stack protector Makefile breakage with 32-bit userland · 14262d67

由 George Spelvin 提交于 5月 07, 2014

If you are using a 64-bit kernel with 32-bit userland, then
scripts/gcc-x86_64-has-stack-protector.sh invokes 32-bit gcc
with -mcmodel=kernel, which produces:

<stdin>:1:0: error: code model 'kernel' not supported in the 32 bit mode

and trips the "broken compiler" test at arch/x86/Makefile:120.

There are several places a fix is possible, but the following seems
cleanest.  (But it's minimal; it would also be possible to factor
out a bunch of stuff from the two branches of the if.)
Signed-off-by: NGeorge Spelvin <linux@horizon.com>
Link: http://lkml.kernel.org/r/20140507210552.7581.qmail@ns.horizon.com
Cc: <stable@vger.kernel.org> # v3.14
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

14262d67

07 5月, 2014 3 次提交

KVM: vmx: disable APIC virtualization in nested guests · 696dfd95

由 Paolo Bonzini 提交于 5月 07, 2014

While running a nested guest, we should disable APIC virtualization
controls (virtualized APIC register accesses, virtual interrupt
delivery and posted interrupts), because we do not expose them to
the nested guest.
Reported-by: NHu Yaohui <loki2441@gmail.com>
Suggested-by: NAbel Gordon <abel@stratoscale.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

696dfd95

perf/x86/intel: Fix Silvermont's event constraints · a4b4f11b

由 Yan, Zheng 提交于 4月 29, 2014

Event 0x013c is not the same as fixed counter2, remove it from
Silvermont's event constraints.
Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1398755081-12471-1-git-send-email-zheng.z.yan@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

a4b4f11b

x86/reboot: Add reboot quirk for Certec BPC600 · aadca6fa

由 Christian Gmeiner 提交于 5月 07, 2014

Certec BPC600 needs reboot=pci to actually reboot.
Signed-off-by: NChristian Gmeiner <christian.gmeiner@gmail.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Li Aubrey <aubrey.li@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1399446114-2147-1-git-send-email-christian.gmeiner@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

aadca6fa

06 5月, 2014 2 次提交

asmlinkage, x86: Add explicit __visible to arch/x86/* · 2605fc21

由 Andi Kleen 提交于 5月 02, 2014

As requested by Linus add explicit __visible to the asmlinkage users.
This marks all functions visible to assembler.

Tree sweep for arch/x86/*
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1398984278-29319-3-git-send-email-andi@firstfloor.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

2605fc21

x86, build: Don't get confused by local symbols · ac008fe0

由 H. Peter Anvin 提交于 5月 05, 2014

arch/x86/crypto/sha1_avx2_x86_64_asm.S introduced _end as a local
symbol, which broke the build under certain circumstances.  Although
the wisdom of _end as a local symbol can definitely be questioned, the
build should not break for that reason.

Thus, filter the output of nm to only get global symbols of
appropriate type.
Reported-by: NAndy Lutomirski <luto@amacapital.net>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-uxm3j3w3odglcwhafwq5tjqu@git.kernel.org

ac008fe0

03 5月, 2014 1 次提交

x86/efi: earlyprintk=efi,keep fix · 5f35eb0e

由 Dave Young 提交于 5月 01, 2014

earlyprintk=efi,keep will cause kernel hangs while freeing initmem like
below:

  VFS: Mounted root (ext4 filesystem) readonly on device 254:2.
  devtmpfs: mounted
  Freeing unused kernel memory: 880K (ffffffff817d4000 - ffffffff818b0000)

It is caused by efi earlyprintk use __init function which will be freed
later.  Such as early_efi_write is marked as __init, also it will use
early_ioremap which is init function as well.

To fix this issue, I added early initcall early_efi_map_fb which maps
the whole efi fb for later use. OTOH, adding a wrapper function
early_efi_map which calls early_ioremap before ioremap is available.

With this patch applied efi boot ok with earlyprintk=efi,keep console=efi
Signed-off-by: NDave Young <dyoung@redhat.com>
Signed-off-by: NMatt Fleming <matt.fleming@intel.com>

5f35eb0e

28 4月, 2014 3 次提交

genirq: x86: Ensure that dynamic irq allocation does not conflict · 62a08ae2

由 Thomas Gleixner 提交于 4月 24, 2014

On x86 the allocation of irq descriptors may allocate interrupts which
are in the range of the GSI interrupts. That's wrong as those
interrupts are hardwired and we don't have the irq domain translation
like PPC. So one of these interrupts can be hooked up later to one of
the devices which are hard wired to it and the io_apic init code for
that particular interrupt line happily reuses that descriptor with a
completely different configuration so hell breaks lose.

Inside x86 we allocate dynamic interrupts from above nr_gsi_irqs,
except for a few usage sites which have not yet blown up in our face
for whatever reason. But for drivers which need an irq range, like the
GPIO drivers, we have no limit in place and we don't want to expose
such a detail to a driver.

To cure this introduce a function which an architecture can implement
to impose a lower bound on the dynamic interrupt allocations.

Implement it for x86 and set the lower bound to nr_gsi_irqs, which is
the end of the hardwired interrupt space, so all dynamic allocations
happen above.

That not only allows the GPIO driver to work sanely, it also protects
the bogus callsites of create_irq_nr() in hpet, uv, irq_remapping and
htirq code. They need to be cleaned up as well, but that's a separate
issue.
Reported-by: NJin Yao <yao.jin@linux.intel.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Tested-by: NMika Westerberg <mika.westerberg@linux.intel.com>
Cc: Mathias Nyman <mathias.nyman@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Krogerus Heikki <heikki.krogerus@intel.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1404241617360.28206@ionos.tec.linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

62a08ae2

KVM: x86: Check for host supported fields in shadow vmcs · fe2b201b

由 Bandan Das 提交于 4月 21, 2014

We track shadow vmcs fields through two static lists,
one for read only and another for r/w fields. However, with
addition of new vmcs fields, not all fields may be supported on
all hosts. If so, copy_vmcs12_to_shadow() trying to vmwrite on
unsupported hosts will result in a vmwrite error. For example, commit
36be0b9d introduced GUEST_BNDCFGS, which is not supported
by all processors. Filter out host unsupported fields before
letting guests use shadow vmcs
Signed-off-by: NBandan Das <bsd@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fe2b201b

x86/vsmp: Fix irq routing · 39025ba3

由 Oren Twaig 提交于 4月 28, 2014

Correct IRQ routing in case a vSMP box is detected
but the  Interrupt Routing Comply (IRC) value is set to
"comply", which leads to incorrect IRQ routing.

Before the patch:

When a vSMP box was detected and IRC was set to "comply",
users (and the kernel) couldn't effectively set the
destination of the IRQs. This is because the hook inside
vsmp_64.c always setup all CPUs as the IRQ destination using
cpumask_setall() as the return value for IRQ allocation mask.
Later, this "overrided" mask caused the kernel to set the IRQ
destination to the lowest online CPU in the mask (CPU0 usually).

After the patch:

When the IRC is set to "comply", users (and the kernel) can control
the destination of the IRQs as we will not be changing the
default "apic->vector_allocation_domain".
Signed-off-by: NOren Twaig <oren@scalemp.com>
Acked-by: NShai Fultheim <shai@scalemp.com>
Link: http://lkml.kernel.org/r/1398669697-2123-1-git-send-email-oren@scalemp.com
[ Minor readability edits. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

39025ba3

24 4月, 2014 2 次提交

arm: xen: implement multicall hypercall support. · 5e40704e

由 Ian Campbell 提交于 4月 17, 2014

As part of this make the usual change to xen_ulong_t in place of unsigned long.
This change has no impact on x86.

The Linux definition of struct multicall_entry.result differs from the Xen
definition, I think for good reasons, and used a long rather than an unsigned
long. Therefore introduce a xen_long_t, which is a long on x86 architectures
and a signed 64-bit integer on ARM.

Use uint32_t nr_calls on x86 for consistency with the ARM definition.

Build tested on amd64 and i386 builds. Runtime tested on ARM.
Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

5e40704e

perf/x86: Fix RAPL rdmsrl_safe() usage · 9f7ff893

由 Stephane Eranian 提交于 4月 23, 2014

This patch fixes a bug introduced by:

  24223657 ("perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU")

The rdmsrl_safe() function returns 0 on success.
The current code was failing to detect the RAPL PMU
on real hardware  (missing /sys/devices/power) because
the return value of rdmsrl_safe() was misinterpreted.
Signed-off-by: NStephane Eranian <eranian@google.com>
Acked-by: NBorislav Petkov <bp@suse.de>
Acked-by: NVenkatesh Srinivas <venkateshs@google.com>
Cc: peterz@infradead.org
Cc: zheng.z.yan@intel.com
Link: http://lkml.kernel.org/r/20140423170418.GA12767@quadSigned-off-by: NIngo Molnar <mingo@kernel.org>

9f7ff893

22 4月, 2014 1 次提交

x86: LLVMLinux: Wrap -mno-80387 with cc-option · 8f2dd677

由 Behan Webster 提交于 4月 21, 2014

Wrap -mno-80387 gcc options with cc-option so they don't break
clang.
Signed-off-by: NBehan Webster <behanw@converseincode.com>
Cc: torvalds@linux-foundation.org
Cc: dwmw2@infradead.org
Cc: pageexec@freemail.hu
Link: http://lkml.kernel.org/r/1398145227-25053-1-git-send-email-behanw@converseincode.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

8f2dd677

18 4月, 2014 1 次提交

perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU · 24223657

由 Venkatesh Srinivas 提交于 3月 13, 2014

CPUs which should support the RAPL counters according to
Family/Model/Stepping may still issue #GP when attempting to access
the RAPL MSRs. This may happen when Linux is running under KVM and
we are passing-through host F/M/S data, for example. Use rdmsrl_safe
to first access the RAPL_POWER_UNIT MSR; if this fails, do not
attempt to use this PMU.
Signed-off-by: NVenkatesh Srinivas <venkateshs@google.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1394739386-22260-1-git-send-email-venkateshs@google.com
Cc: zheng.z.yan@intel.com
Cc: eranian@google.com
Cc: ak@linux.intel.com
Cc: linux-kernel@vger.kernel.org
[ The patch also silently fixes another bug: rapl_pmu_init() didn't handle the memory alloc failure case previously. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

24223657

17 4月, 2014 2 次提交

kprobes/x86: Fix page-fault handling logic · 6381c24c

由 Masami Hiramatsu 提交于 4月 17, 2014

Current kprobes in-kernel page fault handler doesn't
expect that its single-stepping can be interrupted by
an NMI handler which may cause a page fault(e.g. perf
with callback tracing).

In that case, the page-fault handled by kprobes and it
misunderstands the page-fault has been caused by the
single-stepping code and tries to recover IP address
to probed address.

But the truth is the page-fault has been caused by the
NMI handler, and do_page_fault failes to handle real
page fault because the IP address is modified and
causes Kernel BUGs like below.

 ----
 [ 2264.726905] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
 [ 2264.727190] IP: [<ffffffff813c46e0>] copy_user_generic_string+0x0/0x40

To handle this correctly, I fixed the kprobes fault
handler to ensure the faulted ip address is its own
single-step buffer instead of checking current kprobe
state.
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Sandeepa Prabhu <sandeepa.prabhu@linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: fche@redhat.com
Cc: systemtap@sourceware.org
Link: http://lkml.kernel.org/r/20140417081644.26341.52351.stgit@ltc230.yrl.intra.hitachi.co.jpSigned-off-by: NIngo Molnar <mingo@kernel.org>

6381c24c

x86/mce: Fix CMCI preemption bugs · ea431643

由 Ingo Molnar 提交于 4月 17, 2014

The following commit:

  27f6c573 ("x86, CMCI: Add proper detection of end of CMCI storms")

Added two preemption bugs:

 - machine_check_poll() does a get_cpu_var() without a matching
   put_cpu_var(), which causes preemption imbalance and crashes upon
   bootup.

 - it does percpu ops without disabling preemption. Preemption is not
   disabled due to the mistaken use of a raw spinlock.

To fix these bugs fix the imbalance and change
cmci_discover_lock to a regular spinlock.
Reported-by: NOwen Kibel <qmewlo@gmail.com>
Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Cc: Chen, Gong <gong.chen@linux.intel.com>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Todorov <atodorov@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/n/tip-jtjptvgigpfkpvtQxpEk1at2@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
--
 arch/x86/kernel/cpu/mcheck/mce.c       |    4 +---
 arch/x86/kernel/cpu/mcheck/mce_intel.c |   18 +++++++++---------
 2 files changed, 10 insertions(+), 12 deletions(-)

ea431643

16 4月, 2014 2 次提交

x86: Remove the PCI reboot method from the default chain · 5be44a6f

由 Ingo Molnar 提交于 4月 04, 2014

Steve reported a reboot hang and bisected it back to this commit:

  a4f1987e x86, reboot: Add EFI and CF9 reboot methods into the default list

He heroically tested all reboot methods and found the following:

  reboot=t       # triple fault                  ok
  reboot=k       # keyboard ctrl                 FAIL
  reboot=b       # BIOS                          ok
  reboot=a       # ACPI                          FAIL
  reboot=e       # EFI                           FAIL   [system has no EFI]
  reboot=p       # PCI 0xcf9                     FAIL

And I think it's pretty obvious that we should only try PCI 0xcf9 as a
last resort - if at all.

The other observation is that (on this box) we should never try
the PCI reboot method, but close with either the 'triple fault'
or the 'BIOS' (terminal!) reboot methods.

Thirdly, CF9_COND is a total misnomer - it should be something like
CF9_SAFE or CF9_CAREFUL, and 'CF9' should be 'CF9_FORCE' ...

So this patch fixes the worst problems:

 - it orders the actual reboot logic to follow the reboot ordering
   pattern - it was in a pretty random order before for no good
   reason.

 - it fixes the CF9 misnomers and uses BOOT_CF9_FORCE and
   BOOT_CF9_SAFE flags to make the code more obvious.

 - it tries the BIOS reboot method before the PCI reboot method.
   (Since 'BIOS' is a terminal reboot method resulting in a hang
    if it does not work, this is essentially equivalent to removing
    the PCI reboot method from the default reboot chain.)

 - just for the miraculous possibility of terminal (resulting
   in hang) reboot methods of triple fault or BIOS returning
   without having done their job, there's an ordering between
   them as well.
Reported-and-bisected-and-tested-by: NSteven Rostedt <rostedt@goodmis.org>
Cc: Li Aubrey <aubrey.li@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Link: http://lkml.kernel.org/r/20140404064120.GB11877@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

5be44a6f

xen/spinlock: Don't enable them unconditionally. · e0fc17a9

由 Konrad Rzeszutek Wilk 提交于 4月 04, 2014

The git commit a945928e
('xen: Do not enable spinlocks before jump_label_init() has executed')
was added to deal with the jump machinery. Earlier the code
that turned on the jump label was only called by Xen specific
functions. But now that it had been moved to the initcall machinery
it gets called on Xen, KVM, and baremetal - ouch!. And the detection
machinery to only call it on Xen wasn't remembered in the heat
of merge window excitement.

This means that the slowpath is enabled on baremetal while it should
not be.
Reported-by: NWaiman Long <waiman.long@hp.com>
Acked-by: NSteven Rostedt <rostedt@goodmis.org>
CC: stable@vger.kernel.org
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

e0fc17a9

15 4月, 2014 2 次提交

x86/xen: Fix 32-bit PV guests's usage of kernel_stack · 4461bbc0

由 Boris Ostrovsky 提交于 4月 10, 2014

Commit 198d208d ("x86: Keep
thread_info on thread stack in x86_32") made 32-bit kernels use
kernel_stack to point to thread_info. That change missed a couple of
updates needed by Xen's 32-bit PV guests:

1. kernel_stack needs to be initialized for secondary CPUs

2. GET_THREAD_INFO() now uses %fs register which may not be the
   kernel's version when executing xen_iret().

With respect to the second issue, we don't need GET_THREAD_INFO()
anymore: we used it as an intermediate step to get to per_cpu xen_vcpu
and avoid referencing %fs. Now that we are going to use %fs anyway we
may as well go directly to xen_vcpu.
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

4461bbc0

KVM: x86: remove WARN_ON from get_kernel_ns() · b351c39c

由 Marcelo Tosatti 提交于 4月 10, 2014

Function and callers can be preempted.

https://bugzilla.kernel.org/show_bug.cgi?id=73721Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>

b351c39c

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功