提交 · 0f97580da0ead5cfda2d67afcc8571a86b303516 · openeuler / Kernel

30 4月, 2013 10 次提交

mm/h8300: use common help functions to free reserved pages · 0f97580d

由 Jiang Liu 提交于 4月 29, 2013

Use common help functions to free reserved pages.
Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0f97580d

mm/FRV: use common help functions to free reserved pages · 0516f884

由 Jiang Liu 提交于 4月 29, 2013

Use common help functions to free reserved pages.
Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0516f884

mm/cris: use common help functions to free reserved pages · 2e529815

由 Jiang Liu 提交于 4月 29, 2013

Use common help functions to free reserved pages.
Also include <asm/sections.h> to avoid local declaration.
Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
Cc: Mikael Starvik <starvik@axis.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2e529815

mm/c6x: use common help functions to free reserved pages · 95a259ed

由 Jiang Liu 提交于 4月 29, 2013

Use common help functions to free reserved pages.
Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

95a259ed

mm/blackfin: use common help functions to free reserved pages · ed8a36f5

由 Jiang Liu 提交于 4月 29, 2013

Use common help functions to free reserved pages.
Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ed8a36f5

mm/avr32: use common help functions to free reserved pages · c15da0a8

由 Jiang Liu 提交于 4月 29, 2013

Use common help functions to free reserved pages.
Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
Acked-by: NHans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c15da0a8

mm/ARM: use common help functions to free reserved pages · 83db0384

由 Jiang Liu 提交于 4月 29, 2013

Use common help functions to free reserved pages.
Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

83db0384

mm/alpha: use common help functions to free reserved pages · f3beeb4a

由 Jiang Liu 提交于 4月 29, 2013

Use common help functions to free reserved pages.  Also include
<asm/sections.h> to avoid local declarations.
Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f3beeb4a

mm, show_mem: suppress page counts in non-blockable contexts · 4b59e6c4

由 David Rientjes 提交于 4月 29, 2013

On large systems with a lot of memory, walking all RAM to determine page
types may take a half second or even more.

In non-blockable contexts, the page allocator will emit a page allocation
failure warning unless __GFP_NOWARN is specified.  In such contexts, irqs
are typically disabled and such a lengthy delay may even result in NMI
watchdog timeouts.

To fix this, suppress the page walk in such contexts when printing the
page allocation failure warning.
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Acked-by: NMichal Hocko <mhocko@suse.cz>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4b59e6c4

mkcapflags.pl: convert to mkcapflags.sh · 0c0de199

由 Rob Landley 提交于 4月 29, 2013

Generate asm-x86/cpufeature.h with posix-2008 commands instead of perl.
Signed-off-by: NRob Landley <rob@landley.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowell@redhat.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0c0de199

26 4月, 2013 11 次提交

s390/pci: use pci_scan_root_bus · 1c21351b

由 Sebastian Ott 提交于 4月 25, 2013

The pci config space accessors on s390 are (now) smart enough to
figure out if a pci function is available. So instead of calling
pci_create_root_bus and then pci_scan_single_device for each
available function just call pci_scan_root_bus and let the pci core
do the scanning (via config reads on all possible functions) and
device creation.
Reviewed-by: NGerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: NSebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

1c21351b

s390: remove small stack config option · 581618f2

由 Heiko Carstens 提交于 4月 25, 2013

We've seen repeatedly that 8KB stack size on 64 bit kernels
is not sufficient.
So simply remove the config option.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

581618f2

s390: system call path micro optimization · 61649881

由 Martin Schwidefsky 提交于 4月 24, 2013

Add a pointer to the system call table to the thread_info structure.
The TIF_31BIT bit is set or cleared by SET_PERSONALITY exactly once
for the lifetime of a process. With the pointer to the correct system
call table in thread_info the system call code in entry64.S path can
drop the check for TIF_31BIT which saves a couple of instructions.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

61649881

s390: lowcore stack pointer offsets · dc7ee00d

由 Martin Schwidefsky 提交于 4月 24, 2013

Store the stack pointers in the lowcore for the kernel stack, the async
stack and the panic stack with the offset required for the first user.
This avoids an unnecessary add instruction on the system call path.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

dc7ee00d

parisc: use spin_lock_irqsave/spin_unlock_irqrestore for PTE updates · bda079d3

由 John David Anglin 提交于 4月 23, 2013

User applications running on SMP kernels have long suffered from instability
and random segmentation faults.  This patch improves the situation although
there is more work to be done.

One of the problems is the various routines in pgtable.h that update page table
entries use different locking mechanisms, or no lock at all (set_pte_at).  This
change modifies the routines to all use the same lock pa_dbit_lock.  This lock
is used for dirty bit updates in the interruption code. The patch also purges
the TLB entries associated with the PTE to ensure that inconsistent values are
not used after the page table entry is updated.  The UP and SMP code are now
identical.

The change also includes a minor update to the purge_tlb_entries function in
cache.c to improve its efficiency.
Signed-off-by: NJohn David Anglin <dave.anglin@bell.net>
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: NHelge Deller <deller@gmx.de>

bda079d3

parisc: disable -mlong-calls compiler option for kernel modules · cf71130d

由 Helge Deller 提交于 4月 23, 2013

CONFIG_MLONGCALLS was introduced in commit
ec758f98 to overcome linker issues when linking
huge linux kernels, e.g. with many modules linked in.

But in the kernel module loader there is no support yet for the new relocation
types, which is why modules built with -mlong-calls can't be loaded.
Furthermore, for modules long calls are not really necessary, since we already
use stub sections which resolve long distance calls.

So, let's just disable this compiler option when compiling kernel modules.
Signed-off-by: NHelge Deller <deller@gmx.de>

cf71130d

parisc: uaccess: fix compiler warnings caused by __put_user casting · 0f28b628

由 Will Deacon 提交于 4月 22, 2013

When targetting 32-bit processors, __put_user emits a pair of stw
instructions for the 8-byte case. If the type of __val is a pointer, the
marshalling code casts it to the wider integer type of u64, resulting
in the following compiler warnings:

  kernel/signal.c: In function 'copy_siginfo_to_user':
  kernel/signal.c:2752:11: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
  kernel/signal.c:2752:11: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
  [...]

This patch fixes the warnings by removing the marshalling code and using
the correct output modifiers in the __put_{user,kernel}_asm64 macros
so that GCC will allocate the right registers without the need to
extract the two words explicitly.

Cc: Helge Deller <deller@gmx.de>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NHelge Deller <deller@gmx.de>

0f28b628

parisc: Change kunmap macro to static inline function · 87be2f88

由 John David Anglin 提交于 4月 23, 2013

Change kunmap macro to static inline function to fix build error
compiling drivers/base/dma-buf.c.

Without the change, the following error can occur:

   CC      drivers/base/dma-buf.o
drivers/base/dma-buf.c: In function 'dma_buf_kunmap':
drivers/base/dma-buf.c:427:46:
error: macro "kunmap" passed 3 arguments, but takes just 1

I believe parisc is the only arch to implement kunmap using a macro.
Signed-off-by: NJohn David Anglin <dave.anglin@bell.net>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: NHelge Deller <deller@gmx.de>

87be2f88

parisc: Provide __ucmpdi2 to resolve undefined references in 32 bit builds. · ca0ad83d

由 John David Anglin 提交于 4月 20, 2013

The Debian experimental linux source package (3.8.5-1) build fails
with the following errors:
...
MODPOST 2016 modules
ERROR: "__ucmpdi2" [fs/btrfs/btrfs.ko] undefined!
ERROR: "__ucmpdi2" [drivers/md/dm-verity.ko] undefined!

The attached patch resolves this problem.  It is based on the s390
implementation of ucmpdi2.c.
Signed-off-by: NJohn David Anglin <dave.anglin@bell.net>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Signed-off-by: NHelge Deller <deller@gmx.de>

ca0ad83d

USB: OMAP: ISP1301 needs USB_PHY · c3c683ea

由 Arnd Bergmann 提交于 4月 25, 2013

The Kconfig entry for USB_OMAP unconditionally selects USB_ISP1301,
which is now only visible when USB_PHY is also enabled.

This adds an appropriate dependency and enables USB_PHY in the omap1
defconfig, avoiding these build warnings:

warning: (USB_OHCI_HCD && USB_OMAP) selects ISP1301_OMAP which has unmet direct dependencies (USB_SUPPORT && USB_PHY && I2C && ARCH_OMAP_OTG)

Also fix a Makefile typo while we're at it.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Cc: Felipe Balbi <balbi@ti.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

c3c683ea

USB: lpc32xx: ISP1301 needs USB_PHY · 64e98a79

由 Arnd Bergmann 提交于 4月 25, 2013

The Kconfig entry for USB_LPC32XX unconditionally selects USB_ISP1301,
which is now only visible when USB_PHY is also enabled.

This adds an appropriate dependency and enables USB_PHY in the msm
defconfig, avoiding these build errors:

warning: (USB_LPC32XX) selects USB_ISP1301 which has unmet direct dependencies (USB_SUPPORT && USB_PHY && (USB || USB_GADGET) && I2C)
drivers/built-in.o: In function `usb_hcd_nxp_probe':
drivers/usb/host/ohci-nxp.c:224: undefined reference to `isp1301_get_client'
drivers/built-in.o: In function `lpc32xx_udc_probe':
drivers/usb/gadget/lpc32xx_udc.c:3071: undefined reference to `isp1301_get_client'
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Roland Stigge <stigge@antcom.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

64e98a79

25 4月, 2013 1 次提交
- D
  sparc64: Fix missing put_cpu_var() in tlb_batch_add_one() when not batching. · f0af9707
  由 David S. Miller 提交于 4月 24, 2013
```
Reported-by: NMeelis Roos <mroos@linux.ee>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
  f0af9707
24 4月, 2013 2 次提交

efi: Check EFI revision in setup_efi_vars · f697036b

由 Josh Boyer 提交于 4月 24, 2013

We need to check the runtime sys_table for the EFI version the firmware
specifies instead of just checking for a NULL QueryVariableInfo. Older
implementations of EFI don't have QueryVariableInfo but the runtime is
a smaller structure, so the pointer to it may be pointing off into garbage.

This is apparently the case with several Apple firmwares that support EFI
1.10, and the current check causes them to no longer boot. Fix based on
a suggestion from Matthew Garrett.
Signed-off-by: NJosh Boyer <jwboyer@redhat.com>
Signed-off-by: NMatt Fleming <matt.fleming@intel.com>

f697036b

x86, efi: Fix a build warning · 51f8fbba

由 Borislav Petkov 提交于 4月 24, 2013

Fix this:

arch/x86/boot/compressed/eboot.c: In function ‘setup_efi_vars’:
arch/x86/boot/compressed/eboot.c:269:2: warning: passing argument 1 of ‘efi_call_phys’ makes pointer from integer without a cast [enabled by default]
In file included from arch/x86/boot/compressed/eboot.c:12:0:
/w/kernel/linux/arch/x86/include/asm/efi.h:8:33: note: expected ‘void *’ but argument is of type ‘long unsigned int’

after cc5a080c ("efi: Pass boot services variable info to runtime
code").
Reported-by: NPaul Bolle <pebolle@tiscali.nl>
Cc: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NMatt Fleming <matt.fleming@intel.com>

51f8fbba

23 4月, 2013 11 次提交

ARM: mxs_defconfig: add CONFIG_USB_PHY · added5fc

由 Shawn Guo 提交于 3月 24, 2013

Commit edc7cb2e (usb: phy: make it a menuconfig) makes USB_MXS_PHY
be a sub-item of menuconfig symbol USB_PHY.  This change gets the
selection of CONFIG_USB_MXS_PHY in mxs_defconfig lost.  Hence the
boot stops at the point below.

  [    1.600867] ci_hdrc ci_hdrc.0: doesn't support gadget
  [    1.606282] ci_hdrc ci_hdrc.0: EHCI Host Controller
  [    1.613522] ci_hdrc ci_hdrc.0: new USB bus registered, assigned bus number 1

Add CONFIG_USB_PHY to have the CONFIG_USB_MXS_PHY selection back to
work.
Signed-off-by: NShawn Guo <shawn.guo@linaro.org>
Signed-off-by: NFelipe Balbi <balbi@ti.com>

added5fc

ARM: imx_v6_v7_defconfig: add CONFIG_USB_PHY · a52f31e9

由 Shawn Guo 提交于 3月 24, 2013

Commit edc7cb2e (usb: phy: make it a menuconfig) makes USB_MXS_PHY
be a sub-item of menuconfig symbol USB_PHY.  This change gets the
selection of CONFIG_USB_MXS_PHY in imx_v6_v7_defconfig lost.  Hence the
boot stops at the point below.

  ci_hdrc ci_hdrc.0: doesn't support gadget
  ci_hdrc ci_hdrc.0: EHCI Host Controller
  ci_hdrc ci_hdrc.0: new USB bus registered, assigned bus number 1

Add CONFIG_USB_PHY to have the CONFIG_USB_MXS_PHY selection back to
work.
Signed-off-by: NShawn Guo <shawn.guo@linaro.org>
Signed-off-by: NFelipe Balbi <balbi@ti.com>

a52f31e9

s390/uapi: change struct statfs[64] member types to unsigned values · b8668fd0

由 Heiko Carstens 提交于 4月 22, 2013

Kay Sievers reported that coreutils' stat tool has a problem with
s390's statfs[64] definition:

> The definition of struct statfs::f_type needs a fix. s390 is the only
> architecture in the kernel that uses an int and expects magic
> constants lager than INT_MAX to fit into.
>
> A fix is needed to make Fedora boot on s390, it currently fails to do
> so. Userspace does not want to add code to paper-over this issue.

[...]

> Even coreutils cannot handle it:
>   #define RAMFS_MAGIC  0x858458f6
>   # stat -f -c%t /
>   ffffffff858458f6
>
>   #define BTRFS_SUPER_MAGIC 0x9123683E
>   # stat -f -c%t /mnt
>   ffffffff9123683e

The bug is caused by an implicit sign extension within the stat tool:

out_uint_x (pformat, prefix_len, statfsbuf->f_type);

where the format finally will be "%lx".
A similar problem can be found in the 'tail' tool.
s390 is the only architecture which has an int type f_type member in
struct statfs[64]. Other architectures have either unsigned ints or
long values, so that the problem doesn't occur there.

Therefore change the type of the f_type member to unsigned int, so
that we get zero extension instead of sign extension when assignment to
a long value happens.

This patch changes the s390 uapi struct stafs[64] definition in the kernel
to contain only unsigned values.
This was true for 32 bit builds anyway, since we use the generic uapi
header file in that case. So lets not include conditionally the generic
uapi header file but have the s390 implementation completely independent.

Also fix the types of struct compat_stafs to match reality and move the
definition of struct compat_statfs64 to asm/compat.h since it is not part
of the api.
Reported-by: NKay Sievers <kay@vrfy.org>
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

b8668fd0

s390/pci: return correct dma address for offset > PAGE_SIZE · 186f50fa

由 Gerald Schaefer 提交于 4月 22, 2013

For offset > PAGE_SIZE, s390_dma_map_pages() will issue a warning
and return a wrong dma address.

This patch removes the warning and fixes the dma return address
calculation.
Signed-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

186f50fa

s390/ptrace: remove empty ifdefs · 63dd9b44

由 Heiko Carstens 提交于 4月 20, 2013

Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

63dd9b44

s390/compat: remove ptrace compat definitions from uapi header file · e4371f60

由 Heiko Carstens 提交于 4月 20, 2013

The compat definitions are not part of the uapi. So move them to
s390's private compat header file.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

e4371f60

s390/compat: fix compile error for !COMPAT · 0f58104c

由 Heiko Carstens 提交于 4月 20, 2013

Fix this one for !COMPAT:

compat.h: In function ‘arch_compat_alloc_user_space’:
compat.h:292:2: error: implicit declaration of function ‘is_compat_task’
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

0f58104c

s390/compat: fix compat_sys_statfs() memory corruption · a2aec0d3

由 Heiko Carstens 提交于 4月 20, 2013

The f_spare field within struct compat_statfs is four bytes larger
than within the native 31 bit struct statfs.
compat_sys_statfs() clears the f_spare field in user space which
means that in compat mode four bytes that are behind the user space
supplied struct compat_statfs will be corrupted (zeroed).

According to Thomas Gleixner's Linux 2.6 history tree this bug is
present since v2.5.74 87880da124 "[PATCH] s390: 31 bit compat.".
So it get's fixed shortly before its 10th anniversary. Tough luck.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

a2aec0d3

s390/mm,gmap: segment mapping race · ab8e5235

由 Martin Schwidefsky 提交于 4月 16, 2013

The gmap_map_segment function creates a special invalid segment table
entry with the address of the requested target location in the process
address space. The first access will create the connection between the
gmap segment table and the target page table of the main process.
If two threads do this concurrently both will walk the page tables and
allocate a gmap_rmap structure for the same segment table entry.
To avoid the race recheck the segment table entry after taking to page
table lock.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

ab8e5235

s390/mm,gmap: implement gmap_translate() · c5034945

由 Heiko Carstens 提交于 9月 10, 2012

Implement gmap_translate() function which translates a guest absolute address
to a user space process address without establishing the guest page table
entries.

This is useful for kvm guest address translations where no memory access
is expected to happen soon (e.g. tprot exception handler).
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

c5034945

Revert "MIPS: page.h: Provide more readable definition for PAGE_MASK." · 3b5e50ed

由 Ralf Baechle 提交于 4月 22, 2013

This reverts commit c17a6554.

Manuel Lauss writes:

lmo commit c17a6554 (MIPS: page.h: Provide more readable definition for
PAGE_MASK) apparently breaks ioremap of 36-bit addresses on my Alchemy
systems (PCI and PCMCIA) The reason is that in arch/mips/mm/ioremap.c
line 157  (phys_addr &= PAGE_MASK) bits 32-35 are cut off.  Seems the
new PAGE_MASK is explicitly 32bit, or one could make it signed instead
of unsigned long.

3b5e50ed

20 4月, 2013 3 次提交

x86, microcode: Verify the family before dispatching microcode patching · 74c3e3fc

由 H. Peter Anvin 提交于 4月 19, 2013

For each CPU vendor that implements CPU microcode patching, there will
be a minimum family for which this is implemented.  Verify this
minimum level of support.

This can be done in the dispatch function or early in the application
functions.  Doing the latter turned out to be somewhat awkward because
of the ineviable split between the BSP and the AP paths, and rather
than pushing deep into the application functions, do this in
the dispatch function.
Reported-by: N"Bryan O'Donoghue" <bryan.odonoghue.lkml@nexus-software.ie>
Suggested-by: NBorislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1366392183-4149-1-git-send-email-bryan.odonoghue.lkml@nexus-software.ie

74c3e3fc

sparc64: Fix race in TLB batch processing. · f36391d2

由 David S. Miller 提交于 4月 19, 2013

As reported by Dave Kleikamp, when we emit cross calls to do batched
TLB flush processing we have a race because we do not synchronize on
the sibling cpus completing the cross call.

So meanwhile the TLB batch can be reset (tb->tlb_nr set to zero, etc.)
and either flushes are missed or flushes will flush the wrong
addresses.

Fix this by using generic infrastructure to synchonize on the
completion of the cross call.

This first required getting the flush_tlb_pending() call out from
switch_to() which operates with locks held and interrupts disabled.
The problem is that smp_call_function_many() cannot be invoked with
IRQs disabled and this is explicitly checked for with WARN_ON_ONCE().

We get the batch processing outside of locked IRQ disabled sections by
using some ideas from the powerpc port. Namely, we only batch inside
of arch_{enter,leave}_lazy_mmu_mode() calls.  If we're not in such a
region, we flush TLBs synchronously.

1) Get rid of xcall_flush_tlb_pending and per-cpu type
   implementations.

2) Do TLB batch cross calls instead via:

	smp_call_function_many()
		tlb_pending_func()
			__flush_tlb_pending()

3) Batch only in lazy mmu sequences:

	a) Add 'active' member to struct tlb_batch
	b) Define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
	c) Set 'active' in arch_enter_lazy_mmu_mode()
	d) Run batch and clear 'active' in arch_leave_lazy_mmu_mode()
	e) Check 'active' in tlb_batch_add_one() and do a synchronous
           flush if it's clear.

4) Add infrastructure for synchronous TLB page flushes.

	a) Implement __flush_tlb_page and per-cpu variants, patch
	   as needed.
	b) Likewise for xcall_flush_tlb_page.
	c) Implement smp_flush_tlb_page() to invoke the cross-call.
	d) Wire up global_flush_tlb_page() to the right routine based
           upon CONFIG_SMP

5) It turns out that singleton batches are very common, 2 out of every
   3 batch flushes have only a single entry in them.

   The batch flush waiting is very expensive, both because of the poll
   on sibling cpu completeion, as well as because passing the tlb batch
   pointer to the sibling cpus invokes a shared memory dereference.

   Therefore, in flush_tlb_pending(), if there is only one entry in
   the batch perform a completely asynchronous global_flush_tlb_page()
   instead.
Reported-by: NDave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: NDave Kleikamp <dave.kleikamp@oracle.com>

f36391d2

ARM: 7699/1: sched_clock: Add more notrace to prevent recursion · cea15092

由 Stephen Boyd 提交于 4月 18, 2013

cyc_to_sched_clock() is called by sched_clock() and cyc_to_ns()
is called by cyc_to_sched_clock(). I suspect that some compilers
inline both of these functions into sched_clock() and so we've
been getting away without having a notrace marking. It seems that
my compiler isn't inlining cyc_to_sched_clock() though, so I'm
hitting a recursion bug when I enable the function graph tracer,
causing my system to crash. Marking these functions notrace fixes
it. Technically cyc_to_ns() doesn't need the notrace because it's
already marked inline, but let's just add it so that if we ever
remove inline from that function it doesn't blow up.
Signed-off-by: NStephen Boyd <sboyd@codeaurora.org>
Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>

cea15092

19 4月, 2013 2 次提交

mutex: Back out architecture specific check for negative mutex count · cc189d25

由 Waiman Long 提交于 4月 17, 2013

Linus suggested that probably all the supported architectures can
allow a negative mutex count without incorrect behavior, so we can
then back out the architecture specific change and allow the
mutex count to go to any negative number. That should further
reduce contention for non-x86 architecture.
Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Chandramouleeswaran Aswin <aswin@hp.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Norton Scott J <scott.norton@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1366226594-5506-5-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

cc189d25

mutex: Make more scalable by doing less atomic operations · 0dc8c730

由 Waiman Long 提交于 4月 17, 2013

In the __mutex_lock_common() function, an initial entry into
the lock slow path will cause two atomic_xchg instructions to be
issued. Together with the atomic decrement in the fast path, a
total of three atomic read-modify-write instructions will be
issued in rapid succession. This can cause a lot of cache
bouncing when many tasks are trying to acquire the mutex at the
same time.

This patch will reduce the number of atomic_xchg instructions
used by checking the counter value first before issuing the
instruction. The atomic_read() function is just a simple memory
read. The atomic_xchg() function, on the other hand, can be up
to 2 order of magnitude or even more in cost when compared with
atomic_read(). By using atomic_read() to check the value first
before calling atomic_xchg(), we can avoid a lot of unnecessary
cache coherency traffic. The only downside with this change is
that a task on the slow path will have a tiny bit less chance of
getting the mutex when competing with another task in the fast
path.

The same is true for the atomic_cmpxchg() function in the
mutex-spin-on-owner loop. So an atomic_read() is also performed
before calling atomic_cmpxchg().

The mutex locking and unlocking code for the x86 architecture
can allow any negative number to be used in the mutex count to
indicate that some tasks are waiting for the mutex. I am not so
sure if that is the case for the other architectures. So the
default is to avoid atomic_xchg() if the count has already been
set to -1. For x86, the check is modified to include all
negative numbers to cover a larger case.

The following table shows the jobs per minutes (JPM) scalability
data on an 8-node 80-core Westmere box with a 3.7.10 kernel. The
numactl command is used to restrict the running of the
high_systime workloads to 1/2/4/8 nodes with hyperthreading on
and off.

+-----------------+-----------+------------+----------+
|  Configuration  | Mean JPM  |  Mean JPM  | % Change |
|		  | w/o patch | with patch |	      |
+-----------------+-----------------------------------+
|		  |      User Range 1100 - 2000	      |
+-----------------+-----------------------------------+
| 8 nodes, HT on  |    36980   |   148590  | +301.8%  |
| 8 nodes, HT off |    42799   |   145011  | +238.8%  |
| 4 nodes, HT on  |    61318   |   118445  |  +51.1%  |
| 4 nodes, HT off |   158481   |   158592  |   +0.1%  |
| 2 nodes, HT on  |   180602   |   173967  |   -3.7%  |
| 2 nodes, HT off |   198409   |   198073  |   -0.2%  |
| 1 node , HT on  |   149042   |   147671  |   -0.9%  |
| 1 node , HT off |   126036   |   126533  |   +0.4%  |
+-----------------+-----------------------------------+
|		  |       User Range 200 - 1000	      |
+-----------------+-----------------------------------+
| 8 nodes, HT on  |   41525    |   122349  | +194.6%  |
| 8 nodes, HT off |   49866    |   124032  | +148.7%  |
| 4 nodes, HT on  |   66409    |   106984  |  +61.1%  |
| 4 nodes, HT off |  119880    |   130508  |   +8.9%  |
| 2 nodes, HT on  |  138003    |   133948  |   -2.9%  |
| 2 nodes, HT off |  132792    |   131997  |   -0.6%  |
| 1 node , HT on  |  116593    |   115859  |   -0.6%  |
| 1 node , HT off |  104499    |   104597  |   +0.1%  |
+-----------------+------------+-----------+----------+

At low user range 10-100, the JPM differences were within +/-1%.
So they are not that interesting.

AIM7 benchmark run has a pretty large run-to-run variance due to
random nature of the subtests executed. So a difference of less
than +-5% may not be really significant.

This patch improves high_systime workload performance at 4 nodes
and up by maintaining transaction rates without significant
drop-off at high node count.  The patch has practically no
impact on 1 and 2 nodes system.

The table below shows the percentage time (as reported by perf
record -a -s -g) spent on the __mutex_lock_slowpath() function
by the high_systime workload at 1500 users for 2/4/8-node
configurations with hyperthreading off.

+---------------+-----------------+------------------+---------+
| Configuration | %Time w/o patch | %Time with patch | %Change |
+---------------+-----------------+------------------+---------+
|    8 nodes    |      65.34%     |      0.69%       |  -99%   |
|    4 nodes    |       8.70%	  |      1.02%	     |  -88%   |
|    2 nodes    |       0.41%     |      0.32%       |  -22%   |
+---------------+-----------------+------------------+---------+

It is obvious that the dramatic performance improvement at 8
nodes was due to the drastic cut in the time spent within the
__mutex_lock_slowpath() function.

The table below show the improvements in other AIM7 workloads
(at 8 nodes, hyperthreading off).

+--------------+---------------+----------------+-----------------+
|   Workload   | mean % change | mean % change  | mean % change   |
|              | 10-100 users  | 200-1000 users | 1100-2000 users |
+--------------+---------------+----------------+-----------------+
| alltests     |     +0.6%     |   +104.2%      |   +185.9%       |
| five_sec     |     +1.9%     |     +0.9%      |     +0.9%       |
| fserver      |     +1.4%     |     -7.7%      |     +5.1%       |
| new_fserver  |     -0.5%     |     +3.2%      |     +3.1%       |
| shared       |    +13.1%     |   +146.1%      |   +181.5%       |
| short        |     +7.4%     |     +5.0%      |     +4.2%       |
+--------------+---------------+----------------+-----------------+
Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
Reviewed-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
Reviewed-by: NRik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Chandramouleeswaran Aswin <aswin@hp.com>
Cc: Norton: Scott J <scott.norton@hp.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1366226594-5506-3-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

0dc8c730

openeuler / Kernel 接近 3 年 前同步成功

openeuler / Kernel
接近 3 年前同步成功