提交 · 19095548704ecd0f32fd5deba01d56430ad7a344 · openanolis / cloud-kernel

16 2月, 2011 15 次提交

x86-64, NUMA: Kill {acpi|amd}_get_nodes() · 19095548

由 Tejun Heo 提交于 2月 16, 2011

With common numa_nodes[], common code in numa_64.c can access it
directly.  Copy directly and kill {acpi|amd}_get_nodes().
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>

19095548

x86-64, NUMA: Use common numa_nodes[] · 206e4208

由 Tejun Heo 提交于 2月 16, 2011

ACPI and amd are using separate nodes[] array.  Add numa_nodes[] and
use them in all NUMA init methods.  cutoff_node() cleanup is moved
from srat_64.c to numa_64.c and applied in initmem_init() regardless
of init methods.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>

206e4208

x86-64, NUMA: Move apicid to numa mapping initialization from amd_scan_nodes() to amd_numa_init() · 45fe6c78

由 Tejun Heo 提交于 2月 16, 2011

This brings amd initialization behavior closer to that of acpi.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>

45fe6c78

x86-64, NUMA: Remove local variable found from amd_numa_init() · 99df738c

由 Tejun Heo 提交于 2月 16, 2011

Use weight count on mem_nodes_parsed instead.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>

99df738c

x86-64, NUMA: Use common {cpu|mem}_nodes_parsed · ec8cf29b

由 Tejun Heo 提交于 2月 16, 2011

ACPI and amd are using separate nodes_parsed masks.  Add
{cpu|mem}_nodes_parsed and use them in all NUMA init methods.
Initialization of the masks and building node_possible_map are now
handled commonly by initmem_init().

dummy_numa_init() is updated to set node 0 on both masks.  While at
it, move the info messages from scan to init.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>

ec8cf29b

x86-64, NUMA: Restructure initmem_init() · ffe77a46

由 Tejun Heo 提交于 2月 16, 2011

Reorganize initmem_init() such that,

* Different NUMA init methods are iterated in a consistent way.

* Each iteration re-initializes all the parameters and different
  method can be tried after a failure.

* Dummy init is handled the same as other methods.

Apart from how retry after failure, this patch doesn't change the
behavior.  The call sequences are kept equivalent across the
conversion.

After the change, bad_srat() doesn't need to clear apic to node
mapping or worry about numa_off.  Simplified accordingly.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>

ffe77a46

x86, NUMA: Move *_numa_init() invocations into initmem_init() · d8fc3afc

由 Tejun Heo 提交于 2月 16, 2011

There's no reason for these to live in setup_arch().  Move them inside
initmem_init().

- v2: x86-32 initmem_init() weren't updated breaking 32bit builds.
  Fixed.  Found by Ankita.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Ankita Garg <ankita@in.ibm.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>

d8fc3afc

x86-64, NUMA: Wrap acpi_numa_init() so that failure can be indicated by return value · a9aec56a

由 Tejun Heo 提交于 2月 16, 2011

Because of the way ACPI tables are parsed, the generic
acpi_numa_init() couldn't return failure when error was detected by
arch hooks.  Instead, the failure state was recorded and later arch
dependent init hook - acpi_scan_nodes() - would fail.

Wrap acpi_numa_init() with x86_acpi_numa_init() so that failure can be
indicated as return value immediately.  This is in preparation for
further NUMA init cleanups.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>

a9aec56a

x86-64, NUMA: Unify {acpi|amd}_{numa_init|scan_nodes}() arguments and return values · 940fed2e

由 Tejun Heo 提交于 2月 16, 2011

The functions used during NUMA initialization - *_numa_init() and
*_scan_nodes() - have different arguments and return values.  Unify
them such that they all take no argument and return 0 on success and
-errno on failure.  This is in preparation for further NUMA init
cleanups.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>

940fed2e

x86, NUMA: Drop @start/last_pfn from initmem_init() · 86ef4dbf

由 Tejun Heo 提交于 2月 16, 2011

initmem_init() extensively accesses and modifies global data
structures and the parameters aren't even followed depending on which
path is being used.  Drop @start/last_pfn and let it deal with
@max_pfn directly.  This is in preparation for further NUMA init
cleanups.

- v2: x86-32 initmem_init() weren't updated breaking 32bit builds.
  Fixed.  Found by Yinghai.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>

86ef4dbf

x86-64, NUMA: Simplify hotplug node handling in acpi_numa_memory_affinity_init() · 13081df5

由 Tejun Heo 提交于 2月 16, 2011

Hotplug node handling in acpi_numa_memory_affinity_init() was
unnecessarily complicated with storing the original nodes[] entry and
restoring it afterwards.  Simplify it by not modifying the nodes[]
entry for hotplug nodes from the beginning.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>

13081df5

x86-64, NUMA: Make dummy node initialization path similar to non-dummy ones · 7d36b7bc

由 Tejun Heo 提交于 2月 16, 2011

Dummy node initialization in initmem_init() didn't initialize apicid
to node mapping and set cpu to node mapping directly by caling
numa_set_node(), which is different from non-dummy init paths.

Update it such that they behave similarly.  Initialize apicid to node
mapping and call numa_init_array().  The actual cpu to node mapping is
handled by init_cpu_to_node() later.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NYinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>

7d36b7bc

Merge branch 'x86/amd-nb' into x86/mm · 275a88d3

由 Ingo Molnar 提交于 2月 16, 2011

Merge reason: consolidate it into the more generic x86/mm tree to prevent conflicts
              with ongoing NUMA work.
Signed-off-by: NIngo Molnar <mingo@elte.hu>

275a88d3

Merge branch 'x86/numa' into x86/mm · 52b8b8d7

由 Ingo Molnar 提交于 2月 16, 2011

Merge reason: consolidate it into the more generic x86/mm tree to prevent conflicts.
Signed-off-by: NIngo Molnar <mingo@elte.hu>

52b8b8d7

Merge branch 'x86/bootmem' into x86/mm · 02ac81a8

由 Ingo Molnar 提交于 2月 16, 2011

Merge reason: the topic is ready - consolidate it into the more generic x86/mm tree
              and prevent conflicts.
Signed-off-by: NIngo Molnar <mingo@elte.hu>

02ac81a8

15 2月, 2011 1 次提交

x86, amd: Initialize variable properly · 9e81509e

由 Borislav Petkov 提交于 2月 14, 2011

Commit d518573d ("x86, amd: Normalize compute unit IDs on
multi-node processors") introduced compute unit normalization
but causes a compiler warning:

 arch/x86/kernel/cpu/amd.c: In function 'amd_detect_cmp':
 arch/x86/kernel/cpu/amd.c:268: warning: 'cores_per_cu' may be used uninitialized in this function
 arch/x86/kernel/cpu/amd.c:268: note: 'cores_per_cu' was declared here

The compiler is right - initialize it with a proper value.

Also, fix up a comment while at it.
Reported-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20110214171451.GB10076@kryptos.osrc.amd.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

9e81509e

14 2月, 2011 10 次提交

x86, numa: Add error handling for bad cpu-to-node mappings · 14392fd3

由 David Rientjes 提交于 2月 07, 2011

CONFIG_DEBUG_PER_CPU_MAPS may return NUMA_NO_NODE when an
early_cpu_to_node() mapping hasn't been initialized.  In such a
case, it emits a warning and continues without an issue but
callers may try to use the return value to index into an array.

We can catch those errors and fail silently since a warning has
already been emitted.  No current user of numa_add_cpu()
requires this error checking to avoid a crash, but it's better
to be proactive in case a future user happens to have a bug and
a user tries to diagnose it with CONFIG_DEBUG_PER_CPU_MAPS.
Reported-by: NJesper Juhl <jj@chaosbits.net>
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Cc: Tejun Heo <tj@kernel.org>
LKML-Reference: <alpine.DEB.2.00.1102071407250.7812@chino.kir.corp.google.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

14392fd3

Merge commit 'v2.6.38-rc4' into x86/numa · b366801c

由 Ingo Molnar 提交于 2月 14, 2011

Merge reason: Merge latest fixes before applying new patch.
Signed-off-by: NIngo Molnar <mingo@elte.hu>

b366801c

x86: Emit "mem=nopentium ignored" warning when not supported · 9a6d44b9

由 Kamal Mostafa 提交于 2月 03, 2011

Emit warning when "mem=nopentium" is specified on any arch other
than x86_32 (the only that arch supports it).
Signed-off-by: NKamal Mostafa <kamal@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/553464
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
LKML-Reference: <1296783486-23033-2-git-send-email-kamal@canonical.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>

9a6d44b9

x86: Fix panic when handling "mem={invalid}" param · 77eed821

由 Kamal Mostafa 提交于 2月 03, 2011

Avoid removing all of memory and panicing when "mem={invalid}"
is specified, e.g. mem=blahblah, mem=0, or mem=nopentium (on
platforms other than x86_32).
Signed-off-by: NKamal Mostafa <kamal@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/553464
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: <stable@kernel.org> # .3x: as far back as it applies
LKML-Reference: <1296783486-23033-1-git-send-email-kamal@canonical.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

77eed821

x86: Avoid tlbstate lock if not enough cpus · 7064d865

由 Shaohua Li 提交于 1月 17, 2011

This one isn't related to previous patch. If online cpus are
below NUM_INVALIDATE_TLB_VECTORS, we don't need the lock. The
comments in the code declares we don't need the check, but a hot
lock still needs an atomic operation and expensive, so add the
check here.

Uses nr_cpu_ids here as suggested by Eric Dumazet.
Signed-off-by: NShaohua Li <shaohua.li@intel.com>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
LKML-Reference: <1295232730.1949.710.camel@sli10-conroe>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

7064d865

x86: Scale up the number of TLB invalidate vectors with NR_CPUs, up to 32 · 70e4a369

由 Shaohua Li 提交于 1月 17, 2011

Make the maxium TLB invalidate vectors depend on NR_CPUS linearly,
with a maximum of 32 vectors.

We currently only have 8 vectors for TLB invalidation and that is clearly
inadequate. If we have a lot of CPUs, the CPUs need share the 8 vectors and
tlbstate_lock is used to protect them. flush_tlb_page() is
heavily used in page reclaim, which will cause a lot of lock
contention for tlbstate_lock.

Andi Kleen suggested increasing the vectors number to 32, which should be
good for current typical systems to reduce the tlbstate_lock contention.

My test system has 4 sockets and 64G memory, and 64 CPUs. My
workload creates 64 processes. Each process mmap reads a big
empty sparse file. The total size of the files are 2*total_mem,
so this will cause a lot of page reclaim.

Below is the result I get from perf call-graph profiling:

 without the patch:
 ------------------

    24.25%           usemem  [kernel]                                   [k] _raw_spin_lock
                     |
                     --- _raw_spin_lock
                        |
                        |--42.15%-- native_flush_tlb_others

 with the patch:
 ------------------

    14.96%           usemem  [kernel]                                   [k] _raw_spin_lock
                     |
                     --- _raw_spin_lock
                        |--13.89%-- native_flush_tlb_others

So this heavily reduces the tlbstate_lock contention.
Suggested-by: NAndi Kleen <andi@firstfloor.org>
Signed-off-by: NShaohua Li <shaohua.li@intel.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1295232727.1949.709.camel@sli10-conroe>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

70e4a369

x86: Allocate 32 tlb_invalidate_interrupt handler stubs · 3a09fb45

由 Shaohua Li 提交于 1月 17, 2011

Add up to 32 invalidate_interrupt handlers. How many handlers are
added depends on NUM_INVALIDATE_TLB_VECTORS. So if
NUM_INVALIDATE_TLB_VECTORS is smaller than 32, we reduce code
size.
Signed-off-by: NShaohua Li <shaohua.li@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
LKML-Reference: <1295232725.1949.708.camel@sli10-conroe>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

3a09fb45

x86: Cleanup vector usage · 60f6e65d

由 Shaohua Li 提交于 1月 17, 2011

Cleanup the vector usage and make them continuous if possible.
Signed-off-by: NShaohua Li <shaohua.li@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
LKML-Reference: <1295232722.1949.707.camel@sli10-conroe>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

60f6e65d

Merge branch 'linus' into x86/bootmem · d2137d5a

由 Ingo Molnar 提交于 2月 14, 2011

Conflicts:
	arch/x86/mm/numa_64.c

Merge reason: fix the conflict, update to latest -rc and pick up this
              dependent fix from Yinghai:

  e6d2e2b2: memblock: don't adjust size in memblock_find_base()
Signed-off-by: NIngo Molnar <mingo@elte.hu>

d2137d5a

klist: Fix object alignment on 64-bit. · 795abaf1

由 David Miller 提交于 2月 13, 2011

Commit c0e69a5b ("klist.c: bit 0 in pointer can't be used as flag")
intended to make sure that all klist objects were at least pointer size
aligned, but used the constant "4" which only works on 32-bit.

Use "sizeof(void *)" which is correct in all cases.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: NJesper Nilsson <jesper.nilsson@axis.com>
Cc: stable <stable@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

795abaf1

13 2月, 2011 8 次提交

Merge branch 'spi/merge' of git://git.secretlab.ca/git/linux-2.6 · 091994cf

由 Linus Torvalds 提交于 2月 13, 2011

* 'spi/merge' of git://git.secretlab.ca/git/linux-2.6:
  devicetree-discuss is moderated for non-subscribers
  MAINTAINERS: Add entry for GPIO subsystem
  dt: add documentation of ARM dt boot interface
  dt: Remove obsolete description of powerpc boot interface
  dt: Move device tree documentation out of powerpc directory
  spi/spi_sh_msiof: fix wrong address calculation, which leads to an Oops

091994cf

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 · d8ed516f

由 Linus Torvalds 提交于 2月 13, 2011

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: hda - add quirk for Ordissimo EVE using a realtek ALC662
  ALSA: hrtimer: remove superfluous tasklet invocation
  ALSA: hrtimer: handle delayed timer interrupts
  ALSA: HDA: Add subwoofer quirk for Acer Aspire 8942G
  ALSA: hda - Don't handle empty patch files
  ALSA: hda - Fix missing CA initialization for HDMI/DP
  ALSA: usbaudio - Enable the E-MU 0204 USB
  ALSA: hda - switch lfe with side in mixer for 4930g
  ASoC: Improve WM8994 digital power sequencing
  ASoC: Create an AIF1ADCDAT signal widget to match AIF2
  asoc: davinci: da830/omap-l137: correct cpu_dai_name
  ASoC: fill in snd_soc_pcm_runtime.card before calling snd_soc_dai_link.init()

d8ed516f

Revert "pci: use security_capable() when checking capablities during config space read" · f00eaeea

由 Linus Torvalds 提交于 2月 13, 2011

This reverts commit 47970b1b.

It turns out it breaks several distributions.  Looks like the stricter
selinux checks fail due to selinux policies not being set to allow the
access - breaking X, but also lspci.

So while the change was clearly the RightThing(tm) to do in theory, in
practice we have backwards compatibility issues making it not work.
Reported-by: NDave Young <hidave.darkstar@gmail.com>
Acked-by: NDavid Airlie <airlied@linux.ie>
Acked-by: NAlex Riesen <raa.lkml@gmail.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f00eaeea

T

Merge branch 'fix/asoc' into for-linus · 61461241
由 Takashi Iwai 提交于 2月 13, 2011

61461241
G

Merge branch 'devicetree/merge' into spi/merge · c170093d
由 Grant Likely 提交于 2月 12, 2011

c170093d

devicetree-discuss is moderated for non-subscribers · 78bba987

由 Paul Bolle 提交于 2月 12, 2011

Signed-off-by: NPaul Bolle <pebolle@tiscali.nl>
Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>

78bba987

MAINTAINERS: Add entry for GPIO subsystem · a0dc00b4

由 Grant Likely 提交于 2月 12, 2011

I'll probably regret this....
Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a0dc00b4

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · c8e0b00e

由 Linus Torvalds 提交于 2月 12, 2011

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  jbd2: call __jbd2_log_start_commit with j_state_lock write locked
  ext4: serialize unaligned asynchronous DIO
  ext4: make grpinfo slab cache names static
  ext4: Fix data corruption with multi-block writepages support
  ext4: fix up ext4 error handling
  ext4: unregister features interface on module unload
  ext4: fix panic on module unload when stopping lazyinit thread

c8e0b00e

12 2月, 2011 6 次提交

jbd2: call __jbd2_log_start_commit with j_state_lock write locked · e4471831

由 Theodore Ts'o 提交于 2月 12, 2011

On an SMP ARM system running ext4, I've received a report that the
first J_ASSERT in jbd2_journal_commit_transaction has been triggering:

	J_ASSERT(journal->j_running_transaction != NULL);

While investigating possible causes for this problem, I noticed that
__jbd2_log_start_commit() is getting called with j_state_lock only
read-locked, in spite of the fact that it's possible for it might
j_commit_request.  Fix this by grabbing the necessary information so
we can test to see if we need to start a new transaction before
dropping the read lock, and then calling jbd2_log_start_commit() which
will grab the write lock.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

e4471831

ext4: serialize unaligned asynchronous DIO · e9e3bcec

由 Eric Sandeen 提交于 2月 12, 2011

ext4 has a data corruption case when doing non-block-aligned
asynchronous direct IO into a sparse file, as demonstrated
by xfstest 240.

The root cause is that while ext4 preallocates space in the
hole, mappings of that space still look "new" and 
dio_zero_block() will zero out the unwritten portions.  When
more than one AIO thread is going, they both find this "new"
block and race to zero out their portion; this is uncoordinated
and causes data corruption.

Dave Chinner fixed this for xfs by simply serializing all
unaligned asynchronous direct IO.  I've done the same here.
The difference is that we only wait on conversions, not all IO.
This is a very big hammer, and I'm not very pleased with
stuffing this into ext4_file_write().  But since ext4 is
DIO_LOCKING, we need to serialize it at this high level.

I tried to move this into ext4_ext_direct_IO, but by then
we have the i_mutex already, and we will wait on the
work queue to do conversions - which must also take the
i_mutex.  So that won't work.

This was originally exposed by qemu-kvm installing to
a raw disk image with a normal sector-63 alignment.  I've
tested a backport of this patch with qemu, and it does
avoid the corruption.  It is also quite a lot slower
(14 min for package installs, vs. 8 min for well-aligned)
but I'll take slow correctness over fast corruption any day.

Mingming suggested that we can track outstanding
conversions, and wait on those so that non-sparse
files won't be affected, and I've implemented that here;
unaligned AIO to nonsparse files won't take a perf hit.

[tytso@mit.edu: Keep the mutex as a hashed array instead
 of bloating the ext4 inode]

[tytso@mit.edu: Fix up namespace issues so that global
 variables are protected with an "ext4_" prefix.]
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

e9e3bcec

ext4: make grpinfo slab cache names static · 2892c15d

由 Eric Sandeen 提交于 2月 12, 2011

In 2.6.37 I was running into oopses with repeated module
loads & unloads.  I tracked this down to:

fb1813f4 ext4: use dedicated slab caches for group_info structures

(this was in addition to the features advert unload problem)

The kstrdup & subsequent kfree of the cache name was causing
a double free.  In slub, at least, if I read it right it allocates
& frees the name itself, slab seems to do something different...
so in slub I think we were leaking -our- cachep->name, and double
freeing the one allocated by slub.

After getting lost in slab/slub/slob a bit, I just looked at other
sized-caches that get allocated.  jbd2, biovec, sgpool all do it
more or less the way jbd2 does.  Below patch follows the jbd2
method of dynamically allocating a cache at mount time from
a list of static names.

(This might also possibly fix a race creating the caches with
parallel mounts running).

[Folded in a fix from Dan Carpenter which fixed an off-by-one error in
the original patch]

Cc: stable@kernel.org
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

2892c15d

G
MAINTAINERS: Add entry for GPIO subsystem · 557218e2
由 Grant Likely 提交于 2月 12, 2011
```
I'll probably regret this....
Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
```
557218e2

Merge branch 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 3c6c0d6c

由 Linus Torvalds 提交于 2月 11, 2011

* 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SVM: Make sure KERNEL_GS_BASE is valid when loading gs_index

3c6c0d6c

L
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp · 5b49378e
由 Linus Torvalds 提交于 2月 11, 2011
```
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  amd64_edac: Fix DIMMs per DCTs output
```
5b49378e

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功