Commits · 3b93c7aaefc05ee2a75e2726929b01a321402984 · openeuler / Kernel

Aug 24, 2010 · 9 commits

xfs: don't do memory allocation under the CIL context lock · 3b93c7aa

Committed by Dave Chinner on Aug 24, 2010

Formatting items requires memory allocation when using delayed
logging. Currently that memory allocation is done while holding the
CIL context lock in read mode. This means that if memory allocation
takes some time (e.g. enters reclaim), we cannot push on the CIL
until the allocation(s) required by formatting complete. This can
stall CIL pushes for some time, and once a push is stalled so are
all new transaction commits.

Fix this by splitting the item formatting into two steps. The first
step, which does the allocation and the memcpy() into the allocated
buffer, is now done outside the CIL context lock, and only the CIL
insert is done inside the CIL context lock. This avoids the stall.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
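
The following is a minimal userspace sketch of the two-step split described above. It assumes a pthread rwlock as a stand-in for the CIL context lock and a plain linked list for the CIL. The names `log_vec`, `format_item()`, `cil_insert()` and `commit_item()` are hypothetical and are not XFS functions. The only point it illustrates is that `malloc()`/`memcpy()` happen before the lock is taken.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the CIL structures. */
struct log_vec {
	struct log_vec *next;
	size_t len;
	char *buf;                  /* private formatted copy of the item */
};

struct cil {
	pthread_rwlock_t ctx_lock;  /* models the CIL context lock */
	pthread_mutex_t list_lock;  /* protects the list while ctx_lock is read-held */
	struct log_vec *head;
	int nitems;
};

static void cil_init(struct cil *cil)
{
	pthread_rwlock_init(&cil->ctx_lock, NULL);
	pthread_mutex_init(&cil->list_lock, NULL);
	cil->head = NULL;
	cil->nitems = 0;
}

/* Step 1: allocate and copy with no CIL lock held, so a slow allocation
 * (e.g. one that enters reclaim) cannot hold up a CIL push. */
static struct log_vec *format_item(const void *data, size_t len)
{
	struct log_vec *lv = malloc(sizeof(*lv));

	if (!lv)
		return NULL;
	lv->buf = malloc(len);
	if (!lv->buf) {
		free(lv);
		return NULL;
	}
	memcpy(lv->buf, data, len);
	lv->len = len;
	lv->next = NULL;
	return lv;
}

/* Step 2: only the cheap list insertion runs under the context lock. */
static void cil_insert(struct cil *cil, struct log_vec *lv)
{
	pthread_rwlock_rdlock(&cil->ctx_lock);
	pthread_mutex_lock(&cil->list_lock);
	lv->next = cil->head;
	cil->head = lv;
	cil->nitems++;
	pthread_mutex_unlock(&cil->list_lock);
	pthread_rwlock_unlock(&cil->ctx_lock);
}

/* Commit path: format first, then insert. */
static int commit_item(struct cil *cil, const void *data, size_t len)
{
	struct log_vec *lv = format_item(data, len);

	if (!lv)
		return -1;
	cil_insert(cil, lv);
	return 0;
}
```

In this model, a push that takes `ctx_lock` for writing can only be delayed by the short list insertion, not by the allocator.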

xfs: Reduce log force overhead for delayed logging · a44f13ed

Committed by Dave Chinner on Aug 24, 2010

Delayed logging adds some serialisation to the log force process to
ensure that it does not dereference a bad commit context structure
when determining whether a CIL push is necessary. It does this by
grabbing the CIL context lock exclusively, then dropping it before
pushing the CIL if necessary. This serialises all log forces and
pushes regardless of whether a force is necessary or not. As a
result, fsync-heavy workloads (like dbench) can be significantly
slower with delayed logging than without.

To avoid this penalty, copy the current sequence from the context to
the CIL structure when they are swapped. This allows us to do
unlocked checks on the current sequence without having to worry
about dereferencing context structures that may have already been
freed. Hence we can remove the CIL context locking in the forcing
code and only call into the push code if the current context matches
the sequence we need to force.

By passing the sequence into the push code, we can check the
sequence again once we hold the CIL lock exclusively, and abort if
the sequence has already been pushed. This avoids a lock round-trip
and unnecessary CIL pushes when push calls race.

The result is that the regression in dbench performance goes away:
this change improves dbench performance on a ramdisk from ~2100MB/s
to ~2500MB/s. This compares favourably to not using delayed logging,
which returns ~2500MB/s for the same workload.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
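
The unlocked check followed by a recheck under the lock can be sketched in userspace as follows. The sketch assumes C11 atomics for the copied sequence and a pthread rwlock for the context lock. `cil_force()`, `cil_push()` and the structure layout are hypothetical and are not the XFS code. Freeing the old context stands in for writing the checkpoint.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical, simplified CIL context and CIL structures. */
struct cil_ctx {
	unsigned long sequence;
};

struct cil {
	pthread_rwlock_t ctx_lock;               /* CIL context lock */
	struct cil_ctx *ctx;                     /* freed when swapped out */
	_Atomic unsigned long current_sequence;  /* copy of ctx->sequence */
	int pushes;                              /* checkpoints performed */
};

static int cil_init(struct cil *cil)
{
	cil->ctx = calloc(1, sizeof(*cil->ctx));
	if (!cil->ctx)
		return -1;
	pthread_rwlock_init(&cil->ctx_lock, NULL);
	cil->ctx->sequence = 1;
	atomic_init(&cil->current_sequence, 1);
	cil->pushes = 0;
	return 0;
}

/* Push: re-check the sequence once the lock is held exclusively, so we
 * back off if a racing caller has already pushed this sequence. */
static void cil_push(struct cil *cil, unsigned long push_seq)
{
	struct cil_ctx *new_ctx;

	pthread_rwlock_wrlock(&cil->ctx_lock);
	if (push_seq < cil->ctx->sequence)
		goto out;                       /* already pushed */
	new_ctx = calloc(1, sizeof(*new_ctx));
	if (!new_ctx)
		goto out;
	new_ctx->sequence = cil->ctx->sequence + 1;
	free(cil->ctx);                         /* checkpoint would be written here */
	cil->ctx = new_ctx;
	/* Publish the new sequence for unlocked readers. */
	atomic_store(&cil->current_sequence, new_ctx->sequence);
	cil->pushes++;
out:
	pthread_rwlock_unlock(&cil->ctx_lock);
}

/* Force: compare against the copied sequence without taking the context
 * lock and without dereferencing cil->ctx, which may already be freed. */
static void cil_force(struct cil *cil, unsigned long seq)
{
	if (seq == atomic_load(&cil->current_sequence))
		cil_push(cil, seq);
}
```

Calling `cil_force(&c, 1)` twice pushes only once. The second call sees a newer `current_sequence` and returns without touching the lock or `ctx`.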

xfs: dummy transactions should not dirty VFS state · 1a387d3b

Committed by Dave Chinner on Aug 24, 2010

When we need to cover the log, we issue dummy transactions to ensure
the current log tail is on disk. Unfortunately, we currently use the
root inode in the dummy transaction, and the act of committing the
transaction dirties the inode at the VFS level.

As a result, the VFS writeback of the dirty inode will prevent the
filesystem from idling long enough for the log covering state
machine to complete. The state machine gets stuck in a loop issuing
new dummy transactions to cover the log and never makes progress.

To avoid this problem, the dummy transactions should not cause
externally visible state changes. To ensure this, make the dummy
transactions log an unchanging field in the superblock, as its
state is never propagated outside the filesystem. This allows
the log covering state machine to complete successfully, and the
filesystem now correctly enters a fully idle state about 90s after
the last modification was made.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: ensure f_ffree returned by statfs() is non-negative · 2fe33661

Committed by Stuart Brodsky on Aug 24, 2010

Because of delayed updates to the sb_icount field in the superblock, it
is possible to allocate more than maxicount inodes. The arithmetic then
yields a negative number of free inodes in user commands like df or
stat -f.

Since maxicount is a somewhat arbitrary number, a slight over-allocation
is not critical, but the values displayed by user commands should be 0
or greater and never negative. To do this, the f_ffree value in the
statfs buffer is capped so that it never goes negative.

[ Modified to use max_t as per Christoph's comment. ]
Signed-off-by: Stu Brodsky <sbrodsky@sgi.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
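
Here is a sketch of the clamp. It assumes that the free count is derived as the reported inode limit minus the inodes in use (`icount - ifree`). The `max_t` macro below is a simplified userspace stand-in for the kernel macro of the same name. `statfs_ffree()` is a hypothetical helper, not the XFS statfs code.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified userspace stand-in for the kernel's max_t(type, a, b):
 * cast both operands to type, then take the larger. */
#define max_t(type, a, b) ((type)(a) > (type)(b) ? (type)(a) : (type)(b))

/* Hypothetical helper: files is the reported inode limit; icount and
 * ifree are the (lazily updated) allocated and free inode counters. */
static uint64_t statfs_ffree(uint64_t files, uint64_t icount, uint64_t ifree)
{
	/* Signed arithmetic, because icount can briefly exceed the limit. */
	int64_t ffree = (int64_t)files - ((int64_t)icount - (int64_t)ifree);

	return (uint64_t)max_t(int64_t, ffree, 0);   /* never report < 0 */
}
```

Doing the subtraction in a signed type is the essential part. With unsigned arithmetic, the over-allocation would wrap to a huge positive count instead of going negative.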

xfs: handle negative wbc->nr_to_write during sync writeback · efceab1d

Committed by Dave Chinner on Aug 24, 2010

During data integrity (WB_SYNC_ALL) writeback, wbc->nr_to_write will
go negative on inodes with more than 1024 dirty pages due to
implementation details of write_cache_pages(). Currently XFS will
abort page clustering in writeback once nr_to_write drops below
zero, and so for data integrity writeback we will do very
inefficient page-at-a-time allocation and IO submission for inodes
with large numbers of dirty pages.

Fix this by only aborting the page clustering code when
wbc->nr_to_write is negative and the sync mode is WB_SYNC_NONE.

Cc: <stable@kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
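
The revised abort condition reduces to a single predicate. Below is a hypothetical helper over a cut-down copy of the `writeback_control` fields; it is not the XFS writeback code itself. It follows the description above: clustering stops only when the budget is negative and the writeback is not data-integrity writeback.

```c
#include <assert.h>
#include <stdbool.h>

/* Cut-down, hypothetical copy of the relevant writeback_control fields. */
enum wb_sync_mode { WB_SYNC_NONE, WB_SYNC_ALL };

struct wbc_sketch {
	long nr_to_write;
	enum wb_sync_mode sync_mode;
};

/* Should page clustering stop adding pages to the current I/O?
 * Data integrity (WB_SYNC_ALL) writeback keeps clustering even after
 * nr_to_write has gone negative. */
static bool stop_clustering(const struct wbc_sketch *wbc)
{
	return wbc->nr_to_write < 0 && wbc->sync_mode == WB_SYNC_NONE;
}
```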

writeback: write_cache_pages doesn't terminate at nr_to_write <= 0 · 546a1924

Committed by Dave Chinner on Aug 24, 2010

I noticed XFS writeback in 2.6.36-rc1 was much slower than it should have
been. Enabling writeback tracing showed:

flush-253:16-8516 [007] 1342952.351608: wbc_writepage: bdi 253:16: towrt=1024 skip=0 mode=0 kupd=0 bgrd=1 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0
flush-253:16-8516 [007] 1342952.351654: wbc_writepage: bdi 253:16: towrt=1023 skip=0 mode=0 kupd=0 bgrd=1 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0
flush-253:16-8516 [000] 1342952.369520: wbc_writepage: bdi 253:16: towrt=0 skip=0 mode=0 kupd=0 bgrd=1 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0
flush-253:16-8516 [000] 1342952.369542: wbc_writepage: bdi 253:16: towrt=-1 skip=0 mode=0 kupd=0 bgrd=1 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0
flush-253:16-8516 [000] 1342952.369549: wbc_writepage: bdi 253:16: towrt=-2 skip=0 mode=0 kupd=0 bgrd=1 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0

Writeback is not terminating in background writeback if ->writepage is
returning with wbc->nr_to_write == 0, resulting in sub-optimal single page
writeback on XFS.

Fix the write_cache_pages loop to terminate correctly when this situation
occurs and so prevent this sub-optimal background writeback pattern. This
improves sustained sequential buffered write performance from around
250MB/s to 750MB/s for a 100GB file on an XFS filesystem on my 8p test VM.

Cc: <stable@kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
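
The following userspace model shows why the loop must test the shared counter for `<= 0`. In it, a hypothetical `cluster_writepage()` plays the role of XFS's clustering `->writepage` and charges the extra pages to `nr_to_write` itself. The structure and function names are assumptions for illustration; this is not the `write_cache_pages()` source.

```c
#include <assert.h>

enum wb_sync_mode { WB_SYNC_NONE, WB_SYNC_ALL };

/* Cut-down, hypothetical copy of the writeback_control fields used here. */
struct wbc_sketch {
	long nr_to_write;
	enum wb_sync_mode sync_mode;
};

/* Hypothetical ->writepage that, like XFS, clusters up to `cluster`
 * neighbouring dirty pages into one I/O and charges the extra pages to
 * wbc->nr_to_write itself. Returns the number of pages written. */
static int cluster_writepage(struct wbc_sketch *wbc, int remaining, int cluster)
{
	int n = remaining < cluster ? remaining : cluster;

	wbc->nr_to_write -= n - 1;   /* the caller charges the first page */
	return n;
}

/* Returns the number of ->writepage calls made for npages dirty pages. */
static int write_cache_pages_sketch(struct wbc_sketch *wbc, int npages,
				    int cluster)
{
	int calls = 0;

	for (int written = 0; written < npages; ) {
		written += cluster_writepage(wbc, npages - written, cluster);
		calls++;
		wbc->nr_to_write--;
		/* Test the shared counter, which ->writepage may already have
		 * driven to zero or below, rather than waiting for exactly 0. */
		if (wbc->nr_to_write <= 0 && wbc->sync_mode == WB_SYNC_NONE)
			break;
	}
	return calls;
}
```

With a budget of 1024 pages and 256-page clusters, background writeback stops after four calls. `WB_SYNC_ALL` continues through every dirty page even though `nr_to_write` goes negative.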

xfs: fix untrusted inode number lookup · 4536f2ad

Committed by Dave Chinner on Aug 24, 2010

Commit 7124fe0a ("xfs: validate untrusted inode
numbers during lookup") changed the inode lookup code to do btree lookups for
untrusted inode numbers. This change made an invalid assumption about the
alignment of inodes and hence incorrectly calculated the first inode in the
cluster. As a result, some inode numbers were being incorrectly considered
invalid when they were actually valid.

The issue was not picked up by the xfstests suite because it always runs fsr
and dump (the two utilities that utilise the bulkstat interface) on cache hot
inodes and hence the lookup code in the cold cache path was not sufficiently
exercised to uncover this intermittent problem.

Fix the issue by relaxing the btree lookup criteria and then checking whether the
record returned contains the inode number we are looking up. If we get an
incorrect record, then the inode number is invalid.

Cc: <stable@kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
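
Here is a sketch of the relaxed lookup. It assumes a sorted array of chunk start inode numbers in place of the inode btree, with 64 inodes per chunk. `lookup_le()` and `inode_is_valid()` are hypothetical helpers. The final containment check makes no assumption about chunk alignment, which is where the faulty assumption described above lay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INODES_PER_CHUNK 64   /* inodes per inode chunk (assumed here) */

/* Hypothetical stand-in for an inode btree record. */
struct inobt_rec {
	uint32_t startino;    /* first AG inode number in the chunk */
};

/* "Less than or equal" lookup over records sorted by startino: returns
 * the index of the last record with startino <= agino, or -1 if none. */
static long lookup_le(const struct inobt_rec *recs, size_t n, uint32_t agino)
{
	long lo = 0, hi = (long)n - 1, found = -1;

	while (lo <= hi) {
		long mid = lo + (hi - lo) / 2;

		if (recs[mid].startino <= agino) {
			found = mid;
			lo = mid + 1;
		} else {
			hi = mid - 1;
		}
	}
	return found;
}

/* An untrusted inode number is valid only if the record returned by the
 * relaxed lookup actually contains it. */
static bool inode_is_valid(const struct inobt_rec *recs, size_t n,
			   uint32_t agino)
{
	long i = lookup_le(recs, n, agino);

	if (i < 0)
		return false;
	return agino - recs[i].startino < INODES_PER_CHUNK;
}
```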

xfs: ensure we mark all inodes in a freed cluster XFS_ISTALE · 5b3eed75

Committed by Dave Chinner on Aug 24, 2010

Under heavy parallel metadata loads (e.g. dbench), we can fail
to mark all the inodes in a cluster being freed as XFS_ISTALE, as we
skip inodes for which we cannot get the XFS_ILOCK_EXCL or the flush lock.
When this happens and the inode cluster buffer has already been
marked stale and freed, inode reclaim can try to write the inode out
as it is dirty and not marked stale. This can result in writing the
metadata to a freed extent or, if that extent has already been
overwritten, in a magic number check failure that returns an
EUCLEAN error such as:

Filesystem "ram0": inode 0x442ba1 background reclaim flush failed with 117

Fix this by ensuring that we hoover up all in-memory inodes in the
cluster and mark them XFS_ISTALE when freeing the cluster.

Cc: <stable@kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
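
Below is a userspace sketch of the "mark every in-memory inode" rule. As a stand-in for kernel lock-ordering constraints, it assumes the caller may only try-lock each inode rather than block on it. `inode_sketch` and `mark_cluster_stale()` are hypothetical names. The difference from the buggy behaviour is that a contended lock leads to a retry rather than a skip.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory inode: just a lock and two flags. */
struct inode_sketch {
	pthread_mutex_t ilock;
	bool in_memory;   /* is this cluster slot cached in memory? */
	bool stale;       /* models XFS_ISTALE */
};

/* Mark every in-memory inode of the cluster stale. A contended lock is
 * retried after backing off instead of being skipped, so no cached inode
 * is left looking like ordinary dirty metadata. */
static void mark_cluster_stale(struct inode_sketch *ips, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!ips[i].in_memory)
			continue;
		while (pthread_mutex_trylock(&ips[i].ilock) != 0)
			sched_yield();   /* back off, then retry this inode */
		ips[i].stale = true;
		pthread_mutex_unlock(&ips[i].ilock);
	}
}
```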

xfs: unlock items before allowing the CIL to commit · d17c701c

Committed by Dave Chinner on Aug 24, 2010

When we commit a transaction using delayed logging, we need to
unlock the items in the transaction before we unlock the CIL context
and allow it to be checkpointed. If we unlock them after we release
the CIL context lock, the CIL can checkpoint and complete before
we free the log items. This breaks stale buffer item unlock and
unpin processing, as there is an implicit assumption that the unlock
will occur before the unpin.

Also, some log items need to store the LSN of the transaction commit
in the item (inodes and EFIs) and so can race with other transaction
completions if we don't prevent the CIL from checkpointing before
the unlock occurs.

Cc: <stable@kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

Aug 23, 2010 · 5 commits

Linux 2.6.36-rc2 · 76be97c1
Committed by Linus Torvalds on Aug 22, 2010

Merge branch 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 3dc8d7f0

Committed by Linus Torvalds on Aug 22, 2010

* 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: PIT: free irq source id in handling error path
  KVM: destroy workqueue on kvm_create_pit() failures
  KVM: fix poison overwritten caused by using wrong xstate size

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel · 4238a417

Committed by Linus Torvalds on Aug 22, 2010

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel: (58 commits)
  drm/i915,intel_agp: Add support for Sandybridge D0
  drm/i915: fix render pipe control notify on sandybridge
  agp/intel: set 40-bit dma mask on Sandybridge
  drm/i915: Remove the conflicting BUG_ON()
  drm/i915/suspend: s/IS_IRONLAKE/HAS_PCH_SPLIT/
  drm/i915/suspend: Flush register writes before busy-waiting.
  i915: disable DAC on Ironlake also when doing CRT load detection.
  drm/i915: wait for actual vblank, not just 20ms
  drm/i915: make sure eDP PLL is enabled at the right time
  drm/i915: fix VGA plane disable for Ironlake+
  drm/i915: eDP mode set sequence corrections
  drm/i915: add panel reset workaround
  drm/i915: Enable RC6 on Ironlake.
  drm/i915/sdvo: Only set is_lvds if we have a valid fixed mode.
  drm/i915: Set up a render context on Ironlake
  drm/i915 invalidate indirect state pointers at end of ring exec
  drm/i915: Wake-up wait_request() from elapsed hang-check (v2)
  drm/i915: Apply i830 errata for cursor alignment
  drm/i915: Only update i845/i865 CURBASE when disabled (v2)
  drm/i915: FBC is updated within set_base() so remove second call in mode_set()
  ...

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 · bc584c51

Committed by Linus Torvalds on Aug 22, 2010

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
  slab: fix object alignment
  slub: add missing __percpu markup in mm/slub_def.h

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 · a28e0852
Committed by Linus Torvalds on Aug 22, 2010

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
  nilfs2: wait for discard to finish

Aug 22, 2010 · 11 commits

drm/i915,intel_agp: Add support for Sandybridge D0 · 4fefe435
Committed by Zhenyu Wang on Aug 19, 2010

Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>

drm/i915: fix render pipe control notify on sandybridge · 3fdef020

Committed by Zhenyu Wang on Aug 19, 2010

This was missed in the last pipe control fix for Sandybridge: it
actually unmasks the interrupt bit for notify in the render engine IMR.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>

agp/intel: set 40-bit dma mask on Sandybridge · 877fdacf

Committed by Zhenyu Wang on Aug 19, 2010

Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>

drm/i915: Remove the conflicting BUG_ON() · 156dadc1

Committed by Chris Wilson on Aug 15, 2010

We now attempt to free "active" objects following a GPU hang, as either
the GPU will be reset or the hang is permanent. In either case, the GPU
writes will not be flushed to main memory and it should be safe to
return that memory to the system.

The BUG_ON(active) is thus overkill and can erroneously fire after an
EIO.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>

drm/i915/suspend: s/IS_IRONLAKE/HAS_PCH_SPLIT/ · 90eb77ba

Committed by Chris Wilson on Aug 14, 2010

For the shared paths on the next generation chipsets.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>

drm/i915/suspend: Flush register writes before busy-waiting. · 72bcb269
Committed by Chris Wilson on Aug 14, 2010

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>

i915: disable DAC on Ironlake also when doing CRT load detection. · d5dd96cb

Committed by Dave Airlie on Aug 04, 2010

Like on Sandybridge, disabling the DAC here when doing CRT load detect
avoids forever hangs waiting on the hardware.

Test procedure on HP 2740p:
boot with no VGA plugged in, start X,
plug in a VGA monitor (1280x1024),
chvt 3;
the machine hangs waiting forever.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Eric Anholt <eric@anholt.net>

drm/i915: wait for actual vblank, not just 20ms · 9d0498a2

Committed by Jesse Barnes on Aug 18, 2010

Waiting for a hard coded 20ms isn't always enough to make sure a vblank
period has actually occurred, so add code to make sure we really have
passed through a vblank period (or that the pipe is off when disabling).

This prevents problems with mode setting and link training, and seems to
fix a bug like https://bugs.freedesktop.org/show_bug.cgi?id=29278, but
on an HP 8440p instead. Hopefully also fixes
https://bugs.freedesktop.org/show_bug.cgi?id=29141.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
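
One way to wait for a real vblank, instead of sleeping for a fixed 20ms, is to clear a sticky vblank status bit and then poll until the hardware sets it again. The sketch below assumes such a status bit and simulates the hardware with a hypothetical `fake_pipe`. The bit position and the accessor names are illustrative, not the i915 register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VBLANK_STATUS (1u << 1)   /* illustrative bit, not the i915 layout */

/* Hypothetical simulated pipe: the status bit is sticky and is set by the
 * "hardware" after polls_until_vblank further reads. */
struct fake_pipe {
	uint32_t status;
	int polls_until_vblank;   /* 0 means the pipe never reaches vblank */
};

static uint32_t pipe_read_status(struct fake_pipe *p)
{
	if (p->polls_until_vblank > 0 && --p->polls_until_vblank == 0)
		p->status |= VBLANK_STATUS;
	return p->status;
}

static void pipe_clear_status(struct fake_pipe *p, uint32_t bits)
{
	p->status &= ~bits;       /* models a write-1-to-clear status bit */
}

/* Clear the stale status, then poll until a vblank really sets it again.
 * Returns false if max_polls reads pass without a vblank. */
static bool wait_for_vblank(struct fake_pipe *p, int max_polls)
{
	pipe_clear_status(p, VBLANK_STATUS);
	for (int i = 0; i < max_polls; i++) {
		if (pipe_read_status(p) & VBLANK_STATUS)
			return true;
		/* a real driver would sleep briefly between polls */
	}
	return false;
}
```

Clearing the bit first matters. A pipe whose status bit is already set from an earlier frame would otherwise make the wait return immediately, without a new vblank having occurred.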

workqueue: Add basic tracepoints to track workqueue execution · e36c886a

Committed by Arjan van de Ven on Aug 21, 2010

With the introduction of the new unified work queue thread pools,
we lost one feature: It's no longer possible to know which worker
is causing the CPU to wake out of idle. The result is that PowerTOP
now reports a lot of "kworker/a:b" instead of more readable results.

This patch adds a pair of tracepoints to the new workqueue code,
similar in style to the timer/hrtimer tracepoints.

With this pair of tracepoints, the next PowerTOP can correctly
report which work item caused the wakeup (and how long it took):

Interrupt (43) i915 time 3.51ms wakeups 141
Work ieee80211_iface_work time 0.81ms wakeups 29
Work do_dbs_timer time 0.55ms wakeups 24
Process Xorg time 21.36ms wakeups 4
Timer sched_rt_period_timer time 0.01ms wakeups 1
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge git://git.infradead.org/mtd-2.6 · 69b26c7a

Committed by Linus Torvalds on Aug 21, 2010

* git://git.infradead.org/mtd-2.6:
  mtd: nand: Fix probe of Samsung NAND chips
  mtd: nand: Fix regression in BBM detection
  pxa3xx: fix ns2cycle equation

Replace Configure with Enable in description of MAXSMP · ddb0c5a6

Committed by Samuel Thibault on Aug 21, 2010

The word "Configure" tends to make users believe they have to say 'yes'
to be able to choose the number of procs/nodes. "Enable" should be
unambiguous enough.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Aug 21, 2010 · 15 commits

mm: make stack guard page logic use vm_prev pointer · 0e8e50e2

Committed by Linus Torvalds on Aug 20, 2010

Like the mlock() change previously, this makes the stack guard check
code use vma->vm_prev to see what the mapping below the current stack
is, rather than have to look it up with find_vma().

Also, accept an abutting stack segment, since that happens naturally if
you split the stack with mlock or mprotect.
Tested-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
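
Here is a simplified model of the check. It assumes a `vma` structure that carries only its range, its flags and the `vm_prev` link. `check_stack_guard()` is a hypothetical name, and `PAGE_SIZE` and `VM_GROWSDOWN` are given illustrative values. The logic follows the description above: look at the mapping directly below through `vm_prev`, and allow it to abut the stack only if it is itself a grows-down stack segment.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define PAGE_SIZE    4096UL
#define PAGE_MASK    (~(PAGE_SIZE - 1))
#define VM_GROWSDOWN 0x00000100UL   /* illustrative flag value */

/* Hypothetical cut-down vma: range, flags and the back link. */
struct vma {
	unsigned long vm_start, vm_end, vm_flags;
	struct vma *vm_prev;
};

/* Return 0 if a fault at address may grow the stack, or -ENOMEM if the
 * stack would run straight into an unrelated mapping below it. */
static int check_stack_guard(const struct vma *vma, unsigned long address)
{
	const struct vma *prev;

	address &= PAGE_MASK;
	if (!(vma->vm_flags & VM_GROWSDOWN) || address != vma->vm_start)
		return 0;
	prev = vma->vm_prev;            /* O(1): no find_vma() walk needed */
	/* An abutting mapping is acceptable only if it is the same stack,
	 * split by mlock() or mprotect(). */
	if (prev && prev->vm_end == address)
		return (prev->vm_flags & VM_GROWSDOWN) ? 0 : -ENOMEM;
	return 0;                       /* the kernel would try to expand the stack here */
}
```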

mm: make the mlock() stack guard page checks stricter · 7798330a

Committed by Linus Torvalds on Aug 20, 2010

If we've split the stack vma, only the lowest one has the guard page.
Now that we have a doubly linked list of vma's, checking this is trivial.
Tested-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: make the vma list be doubly linked · 297c5eee

Committed by Linus Torvalds on Aug 20, 2010

It's a really simple list, and several of the users want to go backwards
in it to find the previous vma.  So rather than have to look up the
previous entry with 'find_vma_prev()' or something similar, just make it
doubly linked instead.
Tested-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
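
The doubly linked list maintenance can be sketched as follows. The structure is a cut-down `vma` with only the address range and the two links. `vma_link_after()` and `vma_unlink()` are hypothetical helpers, not the mm/ functions. Both links are updated together, so finding the previous vma becomes a single pointer load instead of a `find_vma_prev()` lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down vma: address range plus both list links. */
struct vma {
	unsigned long vm_start, vm_end;
	struct vma *vm_next, *vm_prev;
};

/* Link new after prev (or at the head when prev is NULL), keeping the
 * forward and backward links consistent. */
static void vma_link_after(struct vma **head, struct vma *prev,
			   struct vma *new)
{
	struct vma *next = prev ? prev->vm_next : *head;

	new->vm_prev = prev;
	new->vm_next = next;
	if (prev)
		prev->vm_next = new;
	else
		*head = new;
	if (next)
		next->vm_prev = new;
}

/* Remove vma from the list, fixing up both neighbours. */
static void vma_unlink(struct vma **head, struct vma *vma)
{
	if (vma->vm_prev)
		vma->vm_prev->vm_next = vma->vm_next;
	else
		*head = vma->vm_next;
	if (vma->vm_next)
		vma->vm_next->vm_prev = vma->vm_prev;
	vma->vm_next = vma->vm_prev = NULL;
}
```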

mtd: nand: Fix probe of Samsung NAND chips · cfe3fdad

Committed by Tilman Sauerbeck on Aug 20, 2010

Apparently, the check for a 6-byte ID string introduced by commit
426c457a ("mtd: nand: extend NAND flash
detection to new MLC chips") is NOT sufficient to determine whether or
not a Samsung chip uses their new MLC detection scheme or the old,
standard scheme. This adds a condition to check cell type.
Signed-off-by: Tilman Sauerbeck <tilman@code-monkey.de>
Signed-off-by: Brian Norris <norris@broadcom.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@kernel.org
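
Here is a sketch of the extra test. It rests on three assumptions: the ID is read as 8 bytes; a 6-byte ID is recognised by bytes 6 and 7 repeating bytes 0 and 1; and bits 2-3 of the third ID byte encode the cell type, with 0 meaning SLC. The constant names, mask values and `samsung_new_mlc_scheme()` are illustrative, not copied from nand_base.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants (assumptions, not copied from nand_base.c). */
#define MFR_SAMSUNG    0xecu   /* Samsung's manufacturer ID byte */
#define CELLTYPE_MASK  0x0cu   /* bits 2-3 of ID byte 2; 0 means SLC */

/* id[] holds 8 bytes from READID. We assume a chip with a 6-byte ID wraps
 * around, so bytes 6 and 7 repeat bytes 0 and 1. */
static bool samsung_new_mlc_scheme(const uint8_t id[8])
{
	bool six_byte_id = id[0] == id[6] && id[1] == id[7];
	bool is_mlc = (id[2] & CELLTYPE_MASK) != 0;   /* the added condition */

	return six_byte_id && id[0] == MFR_SAMSUNG && is_mlc;
}
```

Without the `is_mlc` term, an SLC part that happens to return a wrapping 6-byte ID would be decoded with the new MLC scheme.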

Merge branch 'x86-fixes-for-linus' of... · 36423a5e

Committed by Linus Torvalds on Aug 20, 2010

Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, apic: Fix apic=debug boot crash
  x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues
  x86-32: Fix dummy trampoline-related inline stubs
  x86-32: Separate 1:1 pagetables from swapper_pg_dir
  x86, cpu: Fix regression in AMD errata checking code

Documentation: fix ozlabs.org mailing list address · f6143a9b

Committed by Stephen Rothwell on Aug 20, 2010

This list moved to lists.ozlabs.org quite some time ago.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

MAINTAINERS: Fix ozlabs.org mailing list addresses · a4724ed6

Committed by Stephen Rothwell on Aug 20, 2010

All these lists moved to lists.ozlabs.org quite a while ago.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Documentation: kernel-locking: mutex_trylock cannot be used in interrupt context · 1ee41680

Committed by Stefan Richter on Aug 19, 2010

Chapter 6 is right about mutex_trylock, but chapter 10 wasn't.  This error
was introduced during semaphore-to-mutex conversion of the Unreliable
guide.  :-)

If user context which performs mutex_lock() or mutex_trylock() is
preempted by interrupt context which performs mutex_trylock() on the same
mutex instance, a deadlock occurs.  This is because these functions do not
disable local IRQs when they operate on mutex->wait_lock.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/scsi/qla4xxx: fix build · 626115cd

Committed by Andrew Morton on Aug 19, 2010

gcc-4.0.2:

drivers/scsi/qla4xxx/ql4_os.c: In function 'qla4_8xxx_error_recovery':
drivers/scsi/qla4xxx/ql4_glbl.h:135: sorry, unimplemented: inlining failed in call to 'qla4_8xxx_set_drv_active': function body not available
drivers/scsi/qla4xxx/ql4_os.c:2377: sorry, unimplemented: called from here
drivers/scsi/qla4xxx/ql4_glbl.h:135: sorry, unimplemented: inlining failed in call to 'qla4_8xxx_set_drv_active': function body not available
drivers/scsi/qla4xxx/ql4_os.c:2393: sorry, unimplemented: called from here

Cc: Ravi Anand <ravi.anand@qlogic.com>
Cc: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

uml: fix compile error in dma_get_cache_alignment() · f3c072ad

Committed by Miklos Szeredi on Aug 19, 2010

Fix uml compile error:

  include/linux/dma-mapping.h:145: error: redefinition of 'dma_get_cache_alignment'
  arch/um/include/asm/dma-mapping.h:99: note: previous definition of 'dma_get_cache_alignment' was here

Introduced by commit 4565f017 ("dma-mapping: unify
dma_get_cache_alignment implementations")
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

oom: __task_cred() need rcu_read_lock() · 8d6c83f0

Committed by KOSAKI Motohiro on Aug 19, 2010

dump_tasks() needs to hold the RCU read lock around its access of the
target task's UID.  To this end it should use task_uid() as it only needs
that one thing from the creds.

The fact that dump_tasks() holds tasklist_lock is insufficient to prevent the
target process replacing its credentials on another CPU.

This patch therefore calls rcu_read_lock() explicitly.

	===================================================
	[ INFO: suspicious rcu_dereference_check() usage. ]
	---------------------------------------------------
	mm/oom_kill.c:410 invoked rcu_dereference_check() without protection!

	other info that might help us debug this:

	rcu_scheduler_active = 1, debug_locks = 1
	4 locks held by kworker/1:2/651:
	 #0:  (events){+.+.+.}, at: [<ffffffff8106aae7>]
	process_one_work+0x137/0x4a0
	 #1:  (moom_work){+.+...}, at: [<ffffffff8106aae7>]
	process_one_work+0x137/0x4a0
	 #2:  (tasklist_lock){.+.+..}, at: [<ffffffff810fafd4>]
	out_of_memory+0x164/0x3f0
	 #3:  (&(&p->alloc_lock)->rlock){+.+...}, at: [<ffffffff810fa48e>]
	find_lock_task_mm+0x2e/0x70
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

oom: fix tasklist_lock leak · b52723c5

Committed by KOSAKI Motohiro on Aug 19, 2010

Commit 0aad4b31 ("oom: fold __out_of_memory into out_of_memory")
introduced a tasklist_lock leak, which caused the following obvious
warnings and a panic.

    ================================================
    [ BUG: lock held when returning to user space! ]
    ------------------------------------------------
    rsyslogd/1422 is leaving the kernel with locks still held!
    1 lock held by rsyslogd/1422:
     #0:  (tasklist_lock){.+.+.+}, at: [<ffffffff810faf64>] out_of_memory+0x164/0x3f0
    BUG: scheduling while atomic: rsyslogd/1422/0x00000002
    INFO: lockdep is turned off.

This patch fixes it.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

oom: fix NULL pointer dereference · be71cf22

Committed by KOSAKI Motohiro on Aug 19, 2010

Commit b940fd70 ("oom: remove unnecessary code and cleanup") added an
unnecessary NULL pointer dereference. Remove it.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/mmc/host/sdhci-s3c.c: use the correct mutex and card detect function · f522886e

Committed by Kyungmin Park on Aug 19, 2010

There was a merge problem between the sdhci core and the sdhci-s3c host
driver: the mutex was changed to a spinlock, so the driver needs to use
the spinlock functions and the correct card detection function.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

sdhci: add no hi-speed bit quirk support · 51932501

Committed by Kyungmin Park on Aug 19, 2010

Some SDHCI controllers like s5pc110 don't have an HISPD bit in the HOSTCTL
register.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
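
Here is a sketch of how such a quirk flag might be consumed when programming the host control register. The flag name, the bit values and the `host_ctrl_for_timing()` helper are assumptions for illustration; the real definitions live in drivers/mmc/host/sdhci.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values; see drivers/mmc/host/sdhci.h for the real ones. */
#define SDHCI_CTRL_HISPD          0x04u
#define SDHCI_QUIRK_NO_HISPD_BIT  (1u << 29)

/* Hypothetical cut-down host: only the quirk flags matter here. */
struct host_sketch {
	uint32_t quirks;
};

/* Return the new HOST_CONTROL value. On controllers flagged with the
 * quirk the HISPD bit does not exist, so it is never set. */
static uint8_t host_ctrl_for_timing(const struct host_sketch *host,
				    uint8_t ctrl, bool high_speed)
{
	if (high_speed && !(host->quirks & SDHCI_QUIRK_NO_HISPD_BIT))
		ctrl |= SDHCI_CTRL_HISPD;
	else
		ctrl &= (uint8_t)~SDHCI_CTRL_HISPD;
	return ctrl;
}
```

Keeping the decision in one place means controllers such as the s5pc110 need only set the quirk bit at probe time. The rest of the register programming is unaffected.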
