提交 · 550e2d9270e2f0a10c3b063899f70e4cca25fe72 · openeuler / Kernel

10 12月, 2009 3 次提交

drm/ttm: Have the TTM code return -ERESTARTSYS instead of -ERESTART. · 98ffc415

由 Thomas Hellstrom 提交于 15年前

Return -ERESTARTSYS instead of -ERESTART when interrupted by a signal.
The -ERESTARTSYS is converted to an -EINTR by the kernel signal layer
before returned to user-space.
Signed-off-by: NThomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: NJerome Glisse <jglisse@redhat.com>
Signed-off-by: NDave Airlie <airlied@redhat.com>

98ffc415

drm/ttm: Rework validation & memory space allocation (V3) · ca262a99

由 Jerome Glisse 提交于 15年前

This change allow driver to pass sorted memory placement,
from most prefered placement to least prefered placement.
In order to avoid long function prototype a structure is
used to gather memory placement informations such as range
restriction (if you need a buffer to be in given range).
Range restriction is determined by fpfn & lpfn which are
the first page and last page number btw which allocation
can happen. If those fields are set to 0 ttm will assume
buffer can be put anywhere in the address space (thus it
avoids putting a burden on the driver to always properly
set those fields).

This patch also factor few functions like evicting first
entry of lru list or getting a memory space. This avoid
code duplication.

V2: Change API to use placement flags and array instead
    of packing placement order into a quadword.
V3: Make sure we set the appropriate mem.placement flag
    when validating or allocation memory space.

[Pending Thomas Hellstrom further review but okay
from preliminary review so far].
Signed-off-by: NJerome Glisse <jglisse@redhat.com>
Signed-off-by: NDave Airlie <airlied@redhat.com>

ca262a99

drm: Add search/get functions to get a block in a specific range · a2e68e92

由 Jerome Glisse 提交于 15年前

These are required for changes to TTM.
Signed-off-by: NJerome Glisse <jglisse@redhat.com>
Signed-off-by: NDave Airlie <airlied@redhat.com>

a2e68e92

08 12月, 2009 4 次提交

drm/radeon/kms: add support for DP modesetting · 5801ead6

由 Alex Deucher 提交于 15年前

Signed-off-by: NAlex Deucher <alexdeucher@gmail.com>
Signed-off-by: NDave Airlie <airlied@redhat.com>

5801ead6

drm/radeon/kms: DP fixes and cleanup from the ddx · 1a66c95a

由 Alex Deucher 提交于 15年前

- dpcp -> dpcd
- fix up dig encoder routing
- aux transaction table takes delay in 10 usec units
Signed-off-by: NAlex Deucher <alexdeucher@gmail.com>
Signed-off-by: NDave Airlie <airlied@redhat.com>

1a66c95a

D
drm/radeon/kms: initial radeon displayport porting · 746c1aa4
由 Dave Airlie 提交于 15年前
```
This is enough to retrieve EDID and DPCP.
Signed-off-by: NDave Airlie <airlied@redhat.com>
```
746c1aa4

drm/intel: refactor DP i2c support and DP common header to drm helper · ab2c0672

由 Dave Airlie 提交于 15年前

Both radeon and nouveau can re-use this code so move it up a level
so they can. However the hw interfaces for aux ch are different
enough that the code to translate from mode, address, bytes
to actual hw interfaces isn't generic, so move that code into the
Intel driver.
Signed-off-by: NDave Airlie <airlied@redhat.com>

ab2c0672

07 12月, 2009 4 次提交

drm/ttm: Export symbols needed for the vmwgfx driver. · 4bfd75cb

由 Thomas Hellstrom 提交于 15年前

Signed-off-by: NThomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: NDave Airlie <airlied@redhat.com>

4bfd75cb

drm/ttm: Add TTM execbuf utilities. · c078aa2f

由 Thomas Hellstrom 提交于 15年前

Utilities to reserve, unreserve and fence a list of TTM
buffer objects in a deadlock-safe manner.

Used by the vmwgfx driver.
Signed-off-by: NThomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: NDave Airlie <airlied@redhat.com>

c078aa2f

drm/ttm: Add ttm lock functionality. · 4aff1013

由 Thomas Hellstrom 提交于 15年前

This is intended to be used by ttm-aware drivers to
1) Block clients to inactive masters when
they try to validate buffers for GPU use.
2) Optionally block clients to the current master when
there is thrashing due to GPU memory shortage.

Used by the vmwgfx driver.
Signed-off-by: NThomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: NDave Airlie <airlied@redhat.com>

4aff1013

drm/ttm: Add user-space objects. · 88071539

由 Thomas Hellstrom 提交于 15年前

Add objects needed for user-space to maintain reference counts on ttm objects.
This is used by the vmwgfx driver which allows user-space to maintain
map-counts on dma buffers, lock-counts on the ttm lock and ref-counts on
gpu surfaces, gpu contexts and dma buffer.
Signed-off-by: NThomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: NDave Airlie <airlied@redhat.com>

88071539

04 12月, 2009 7 次提交

drm: Add dirty ioctl and property · 884840aa

由 Jakob Bornecrantz 提交于 15年前

This commit adds a ioctl and property to allow userspace
to notify the kernel that a framebuffer has changed. Instead
of snooping the command stream this allows finer grained
tracking of which areas have changed.

The primary user for this functionality is virtual hardware
like the vmware svga device, but also Xen hardware likes to
be notify. There is also real hardware like DisplayLink and
DisplayPort that might take advantage of this ioctl.
Signed-off-by: NJakob Bornecrantz <jakob@vmware.com>
Signed-off-by: NDave Airlie <airlied@redhat.com>

884840aa

drm/ttm: Fix build failure due to missing struct page · c3a73ba1

由 Martin Michlmayr 提交于 15年前

drm/ttm fails to build on MIPS because "struct page" is not known:
| In file included from drivers/gpu/drm/ttm/ttm_memory.c:28:
| include/drm/ttm/ttm_memory.h:154: warning: 'struct page' declared inside parameter list
| include/drm/ttm/ttm_memory.h:154: warning: its scope is only this definition or declaration, which is probably not what you want
| include/drm/ttm/ttm_memory.h:156: warning: 'struct page' declared inside parameter list
| drivers/gpu/drm/ttm/ttm_memory.c:540: error: conflicting types for 'ttm_mem_global_alloc_page'
| include/drm/ttm/ttm_memory.h:154: error: previous declaration of 'ttm_mem_global_alloc_page' was here
| drivers/gpu/drm/ttm/ttm_memory.c:561: error: conflicting types for 'ttm_mem_global_free_page'
| include/drm/ttm/ttm_memory.h:156: error: previous declaration of 'ttm_mem_global_free_page' was here
Signed-off-by: NMartin Michlmayr <tbm@cyrius.com>
Cc: stable@kernel.org
Acked-by: NThomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: NDave Airlie <airlied@redhat.com>

c3a73ba1

drm: Add compatibility #ifdefs for *BSD · 1a95916f

由 Kristian Høgsberg 提交于 15年前

This let's use use the linux drm headers as the canonical source for
libdrm on all platforms.
Signed-off-by: NKristian Høgsberg <krh@bitplanet.net>
Signed-off-by: NDave Airlie <airlied@redhat.com>

1a95916f

drm: Add support for drm master_[set|drop] callbacks. · 862302ff

由 Thomas Hellstrom 提交于 15年前

The vmwgfx driver has a per master rw lock around TTM, to guarantee 
mutual exclusion when needed.

This is typically when all evictable buffers are evicted due to

1) vt switch
2) master switch
3) suspend / resume.

In the multi-master case, on master switch the new master takes the 
previously active master lock in write mode, and then evicts all 
buffers. Any clients to previous masters will then block on that lock 
when trying to validate a buffer. fbdev also acts as a virtual master
wrt this.
Signed-off-by: NThomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: NJakob Bornecrantz <jakob@vmware.com>
Signed-off-by: NDave Airlie <airlied@redhat.com>

862302ff

drm/edid: Decode 3-byte CVT codes from EDID 1.4 · 9340d8cf

由 Adam Jackson 提交于 15年前

Signed-off-by: NAdam Jackson <ajax@redhat.com>
Signed-off-by: NDave Airlie <airlied@redhat.com>

9340d8cf

A
drm/edid: Add new detailed block types from EDID 1.4 · 2dbdc52c
由 Adam Jackson 提交于 15年前
```
Signed-off-by: NAdam Jackson <ajax@redhat.com>
Signed-off-by: NDave Airlie <airlied@redhat.com>
```
2dbdc52c

drm/modes: Add drm_mode_hsync() · 7ac96a9c

由 Adam Jackson 提交于 15年前

Signed-off-by: NAdam Jackson <ajax@redhat.com>
Signed-off-by: NDave Airlie <airlied@redhat.com>

7ac96a9c

02 12月, 2009 5 次提交

drm/i915: Fix typo in ioctl struct name. · 04b2d218

由 Kristian Høgsberg 提交于 15年前

Signed-off-by: NKristian Høgsberg <krh@bitplanet.net>
Signed-off-by: NEric Anholt <eric@anholt.net>

04b2d218

drm/i915: Fix sync to vblank when VGA output is turned off · 778c9026

由 Li Peng 提交于 15年前

In current vblank-wait implementation, if we turn off VGA output,
drm_wait_vblank will still wait on the disabled pipe until timeout,
because vblank on the pipe is assumed be enabled. This would cause
slow system response on some system such as moblin.

This patch resolve the issue by adding a drm helper function
drm_vblank_off which explicitly clear vblank_enabled[crtc], wake up
any waiting queue and save last vblank counter before turning off
crtc. It also slightly change drm_vblank_get to ensure that we will
will return immediately if trying to wait on a disabled pipe.
Signed-off-by: NLi Peng <peng.li@intel.com>
Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org>
[anholt: hand-applied for conflicts with overlay changes]
Signed-off-by: NEric Anholt <eric@anholt.net>

778c9026

drm/i915: add GETPARAM request for page flipping · e9560f7c

由 Jesse Barnes 提交于 15年前

Add a GETPARAM request for checking if page flipping is supported.
Useful for the 2D driver to enable the flipping path.
Signed-off-by: NJesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: NEric Anholt <eric@anholt.net>

e9560f7c

drm: use page flip event to signal flip completion · 7bd4d7be

由 Jesse Barnes 提交于 15年前

We don't actually know which frame number the flip will complete on, so
userspace needs a specific flip notification to tell it when the last flip
completed.
Signed-off-by: NJesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: NEric Anholt <eric@anholt.net>
Acked-by: NKristian Høgsberg <krh@bitplanet.net>

7bd4d7be

SLOW_WORK: Move slow_work's proc file to debugfs · f13a48bd

由 David Howells 提交于 15年前

Move slow_work's debugging proc file to debugfs.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Requested-and-acked-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f13a48bd

01 12月, 2009 2 次提交

mfd: Correct WM831X_MAX_ISEL_VALUE · 7716977b

由 Mark Brown 提交于 15年前

There was confusion between the array size and the highest ISEL
value possible.
Reported-by: NDan Carpenter <error27@gmail.com>
Signed-off-by: NMark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>

7716977b

mac80211: fix spurious delBA handling · 827d42c9

由 Johannes Berg 提交于 15年前

Lennert Buytenhek noticed that delBA handling in mac80211
was broken and has remotely triggerable problems, some of
which are due to some code shuffling I did that ended up
changing the order in which things were done -- this was

  commit d75636ef
  Author: Johannes Berg <johannes@sipsolutions.net>
  Date:   Tue Feb 10 21:25:53 2009 +0100

    mac80211: RX aggregation: clean up stop session

and other parts were already present in the original

  commit d92684e6
  Author: Ron Rindjunsky <ron.rindjunsky@intel.com>
  Date:   Mon Jan 28 14:07:22 2008 +0200

      mac80211: A-MPDU Tx add delBA from recipient support

The first problem is that I moved a BUG_ON before various
checks -- thereby making it possible to hit. As the comment
indicates, the BUG_ON can be removed since the ampdu_action
callback must already exist when the state is != IDLE.

The second problem isn't easily exploitable but there's a
race condition due to unconditionally setting the state to
OPERATIONAL when a delBA frame is received, even when no
aggregation session was ever initiated. All the drivers
accept stopping the session even then, but that opens a
race window where crashes could happen before the driver
accepts it. Right now, a WARN_ON may happen with non-HT
drivers, while the race opens only for HT drivers.

For this case, there are two things necessary to fix it:
 1) don't process spurious delBA frames, and be more careful
    about the session state; don't drop the lock

 2) HT drivers need to be prepared to handle a session stop
    even before the session was really started -- this is
    true for all drivers (that support aggregation) but
    iwlwifi which can be fixed easily. The other HT drivers
    (ath9k and ar9170) are behaving properly already.
Reported-by: NLennert Buytenhek <buytenh@marvell.com>
Cc: stable@kernel.org
Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>

827d42c9

29 11月, 2009 1 次提交

sctp: on T3_RTX retransmit all the in-flight chunks · 5fdd4bae

由 Andrei Pelinescu-Onciul 提交于 15年前

When retransmitting due to T3 timeout, retransmit all the
in-flight chunks for the corresponding  transport/path, including
chunks sent less then 1 rto ago.
This is the correct behaviour according to rfc4960 section 6.3.3
E3 and
"Note: Any DATA chunks that were sent to the address for which the
 T3-rtx timer expired but did not fit in one MTU (rule E3 above)
 should be marked for retransmission and sent as soon as cwnd
 allows (normally, when a SACK arrives). ".

This fixes problems when more then one path is present and the T3
retransmission of the first chunk that timeouts stops the T3 timer
for the initial active path, leaving all the other in-flight
chunks waiting forever or until a new chunk is transmitted on the
same path and timeouts (and this will happen only if the cwnd
allows sending new chunks, but since cwnd was dropped to MTU by
the timeout => it will wait until the first heartbeat).

Example: 10 packets in flight, sent at 0.1 s intervals on the
primary path. The primary path is down and the first packet
timeouts. The first packet is retransmitted on another path, the
T3 timer for the primary path is stopped and cwnd is set to MTU.
All the other 9 in-flight packets will not be retransmitted
(unless more new packets are sent on the primary path which depend
on cwnd allowing it, and even in this case the 9 packets will be
retransmitted only after a new packet timeouts which even in the
best case would be more then RTO).

This commit reverts d0ce9291 and
also removes the now unused transport->last_rto, introduced in
 b6157d8e.

p.s  The problem is not only when multiple paths are there.  It
can happen in a single homed environment.  If the application
stops sending data, it possible to have a hung association.
Signed-off-by: NAndrei Pelinescu-Onciul <andrei@iptel.org>
Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5fdd4bae

26 11月, 2009 1 次提交

[SCSI] fix async scan add/remove race resulting in an oops · 860dc736

由 James Bottomley 提交于 15年前

Async scanning introduced a very wide window where the SCSI device is
up and running but has not yet been added to sysfs.  We delay the
adding until all scans have completed to retain the same ordering as
sync scanning.

This delay in visibility causes an oops if a device is removed before
we make it visible because the SCSI removal routines have an inbuilt
assumption that if a device is in SDEV_RUNNING state, it must be
visible (which is not necessarily true in the async scanning case).

Fix this by introducing an additional is_visible flag which we can use
to condition the tear down so we do the right thing for running but
not yet made visible.
Reported-by: NAlexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: NJames Bottomley <James.Bottomley@suse.de>

860dc736

25 11月, 2009 1 次提交

drm/i915: Replace a calloc followed by copying data over it with malloc. · c8e0f93a

由 Eric Anholt 提交于 15年前

Execbufs involve quite a bit of payload, to the extent that cache misses
show up in the profiles here, and a suspicion that some of those cachelines
may get evicted and then reloaded in the subsequent copy.

This is still abstracted like drm_calloc_large since we want to check for
size overflow, and because we want to choose between kmalloc and vmalloc
on the fly.  cairo's interface for malloc-with-calloc's-args was used as
the model.
Signed-off-by: NEric Anholt <eric@anholt.net>

c8e0f93a

20 11月, 2009 12 次提交

i2c: i2c-pnx: Made buf type unsigned to prevent sign extension · 4ced24c8

由 Kevin Wells 提交于 15年前

Made buf type unsigned to prevent sign extension
Signed-off-by: NKevin Wells <kevin.wells@nxp.com>
Signed-off-by: NBen Dooks <ben-linux@fluff.org>

4ced24c8

vt: Fix use of "new" in a struct field · 308efab5

由 Alan Cox 提交于 15年前

As this struct is exposed to user space and the API was added for this
release it's a bit of a pain for the C++ world and we still have time to
fix it. Rename the fields before we end up with that pain in an actual
release.
Signed-off-by: NAlan Cox <alan@linux.intel.com>
Reported-by: Olivier Goffart
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

308efab5

CacheFiles: Catch an overly long wait for an old active object · fee096de

由 David Howells 提交于 15年前

Catch an overly long wait for an old, dying active object when we want to
replace it with a new one.  The probability is that all the slow-work threads
are hogged, and the delete can't get a look in.

What we do instead is:

 (1) if there's nothing in the slow work queue, we sleep until either the dying
     object has finished dying or there is something in the slow work queue
     behind which we can queue our object.

 (2) if there is something in the slow work queue, we return ETIMEDOUT to
     fscache_lookup_object(), which then puts us back on the slow work queue,
     presumably behind the deletion that we're blocked by.  We are then
     deferred for a while until we work our way back through the queue -
     without blocking a slow-work thread unnecessarily.

A backtrace similar to the following may appear in the log without this patch:

	INFO: task kslowd004:5711 blocked for more than 120 seconds.
	"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
	kslowd004     D 0000000000000000     0  5711      2 0x00000080
	 ffff88000340bb80 0000000000000046 ffff88002550d000 0000000000000000
	 ffff88002550d000 0000000000000007 ffff88000340bfd8 ffff88002550d2a8
	 000000000000ddf0 00000000000118c0 00000000000118c0 ffff88002550d2a8
	Call Trace:
	 [<ffffffff81058e21>] ? trace_hardirqs_on+0xd/0xf
	 [<ffffffffa011c4d8>] ? cachefiles_wait_bit+0x0/0xd [cachefiles]
	 [<ffffffffa011c4e1>] cachefiles_wait_bit+0x9/0xd [cachefiles]
	 [<ffffffff81353153>] __wait_on_bit+0x43/0x76
	 [<ffffffff8111ae39>] ? ext3_xattr_get+0x1ec/0x270
	 [<ffffffff813531ef>] out_of_line_wait_on_bit+0x69/0x74
	 [<ffffffffa011c4d8>] ? cachefiles_wait_bit+0x0/0xd [cachefiles]
	 [<ffffffff8104c125>] ? wake_bit_function+0x0/0x2e
	 [<ffffffffa011bc79>] cachefiles_mark_object_active+0x203/0x23b [cachefiles]
	 [<ffffffffa011c209>] cachefiles_walk_to_object+0x558/0x827 [cachefiles]
	 [<ffffffffa011a429>] cachefiles_lookup_object+0xac/0x12a [cachefiles]
	 [<ffffffffa00aa1e9>] fscache_lookup_object+0x1c7/0x214 [fscache]
	 [<ffffffffa00aafc5>] fscache_object_state_machine+0xa5/0x52d [fscache]
	 [<ffffffffa00ab4ac>] fscache_object_slow_work_execute+0x5f/0xa0 [fscache]
	 [<ffffffff81082093>] slow_work_execute+0x18f/0x2d1
	 [<ffffffff8108239a>] slow_work_thread+0x1c5/0x308
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffff810821d5>] ? slow_work_thread+0x0/0x308
	 [<ffffffff8104be91>] kthread+0x7a/0x82
	 [<ffffffff8100beda>] child_rip+0xa/0x20
	 [<ffffffff8100b87c>] ? restore_args+0x0/0x30
	 [<ffffffff8104be17>] ? kthread+0x0/0x82
	 [<ffffffff8100bed0>] ? child_rip+0x0/0x20
	1 lock held by kslowd004/5711:
	 #0:  (&sb->s_type->i_mutex_key#7/1){+.+.+.}, at: [<ffffffffa011be64>] cachefiles_walk_to_object+0x1b3/0x827 [cachefiles]
Signed-off-by: NDavid Howells <dhowells@redhat.com>

fee096de

CacheFiles: Don't write a full page if there's only a partial page to cache · a17754fb

由 David Howells 提交于 15年前

cachefiles_write_page() writes a full page to the backing file for the last
page of the netfs file, even if the netfs file's last page is only a partial
page.

This causes the EOF on the backing file to be extended beyond the EOF of the
netfs, and thus the backing file will be truncated by cachefiles_attr_changed()
called from cachefiles_lookup_object().

So we need to limit the write we make to the backing file on that last page
such that it doesn't push the EOF too far.

Also, if a backing file that has a partial page at the end is expanded, we
discard the partial page and refetch it on the basis that we then have a hole
in the file with invalid data, and should the power go out... A better way to
deal with this could be to record a note that the partial page contains invalid
data until the correct data is written into it.

This isn't a problem for netfs's that discard the whole backing file if the
file size changes (such as NFS).
Signed-off-by: NDavid Howells <dhowells@redhat.com>

a17754fb

FS-Cache: Start processing an object's operations on that object's death · 60d543ca

由 David Howells 提交于 15年前

Start processing an object's operations when that object moves into the DYING
state as the object cannot be destroyed until all its outstanding operations
have completed.

Furthermore, make sure that read and allocation operations handle being woken
up on a dead object. Such events are recorded in the Allocs.abt and
Retrvls.abt statistics as viewable through /proc/fs/fscache/stats.

The code for waiting for object activation for the read and allocation
operations is also extracted into its own function as it is much the same in
all cases, differing only in the stats incremented.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

60d543ca

FS-Cache: Handle pages pending storage that get evicted under OOM conditions · 201a1542

由 David Howells 提交于 15年前

Handle netfs pages that the vmscan algorithm wants to evict from the pagecache
under OOM conditions, but that are waiting for write to the cache.  Under these
conditions, vmscan calls the releasepage() function of the netfs, asking if a
page can be discarded.

The problem is typified by the following trace of a stuck process:

	kslowd005     D 0000000000000000     0  4253      2 0x00000080
	 ffff88001b14f370 0000000000000046 ffff880020d0d000 0000000000000007
	 0000000000000006 0000000000000001 ffff88001b14ffd8 ffff880020d0d2a8
	 000000000000ddf0 00000000000118c0 00000000000118c0 ffff880020d0d2a8
	Call Trace:
	 [<ffffffffa00782d8>] __fscache_wait_on_page_write+0x8b/0xa7 [fscache]
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffffa0078240>] ? __fscache_check_page_write+0x63/0x70 [fscache]
	 [<ffffffffa00b671d>] nfs_fscache_release_page+0x4e/0xc4 [nfs]
	 [<ffffffffa00927f0>] nfs_release_page+0x3c/0x41 [nfs]
	 [<ffffffff810885d3>] try_to_release_page+0x32/0x3b
	 [<ffffffff81093203>] shrink_page_list+0x316/0x4ac
	 [<ffffffff8109372b>] shrink_inactive_list+0x392/0x67c
	 [<ffffffff813532fa>] ? __mutex_unlock_slowpath+0x100/0x10b
	 [<ffffffff81058df0>] ? trace_hardirqs_on_caller+0x10c/0x130
	 [<ffffffff8135330e>] ? mutex_unlock+0x9/0xb
	 [<ffffffff81093aa2>] shrink_list+0x8d/0x8f
	 [<ffffffff81093d1c>] shrink_zone+0x278/0x33c
	 [<ffffffff81052d6c>] ? ktime_get_ts+0xad/0xba
	 [<ffffffff81094b13>] try_to_free_pages+0x22e/0x392
	 [<ffffffff81091e24>] ? isolate_pages_global+0x0/0x212
	 [<ffffffff8108e743>] __alloc_pages_nodemask+0x3dc/0x5cf
	 [<ffffffff81089529>] grab_cache_page_write_begin+0x65/0xaa
	 [<ffffffff8110f8c0>] ext3_write_begin+0x78/0x1eb
	 [<ffffffff81089ec5>] generic_file_buffered_write+0x109/0x28c
	 [<ffffffff8103cb69>] ? current_fs_time+0x22/0x29
	 [<ffffffff8108a509>] __generic_file_aio_write+0x350/0x385
	 [<ffffffff8108a588>] ? generic_file_aio_write+0x4a/0xae
	 [<ffffffff8108a59e>] generic_file_aio_write+0x60/0xae
	 [<ffffffff810b2e82>] do_sync_write+0xe3/0x120
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffff810b18e1>] ? __dentry_open+0x1a5/0x2b8
	 [<ffffffff810b1a76>] ? dentry_open+0x82/0x89
	 [<ffffffffa00e693c>] cachefiles_write_page+0x298/0x335 [cachefiles]
	 [<ffffffffa0077147>] fscache_write_op+0x178/0x2c2 [fscache]
	 [<ffffffffa0075656>] fscache_op_execute+0x7a/0xd1 [fscache]
	 [<ffffffff81082093>] slow_work_execute+0x18f/0x2d1
	 [<ffffffff8108239a>] slow_work_thread+0x1c5/0x308
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffff810821d5>] ? slow_work_thread+0x0/0x308
	 [<ffffffff8104be91>] kthread+0x7a/0x82
	 [<ffffffff8100beda>] child_rip+0xa/0x20
	 [<ffffffff8100b87c>] ? restore_args+0x0/0x30
	 [<ffffffff8102ef83>] ? tg_shares_up+0x171/0x227
	 [<ffffffff8104be17>] ? kthread+0x0/0x82
	 [<ffffffff8100bed0>] ? child_rip+0x0/0x20

In the above backtrace, the following is happening:

 (1) A page storage operation is being executed by a slow-work thread
     (fscache_write_op()).

 (2) FS-Cache farms the operation out to the cache to perform
     (cachefiles_write_page()).

 (3) CacheFiles is then calling Ext3 to perform the actual write, using Ext3's
     standard write (do_sync_write()) under KERNEL_DS directly from the netfs
     page.

 (4) However, for Ext3 to perform the write, it must allocate some memory, in
     particular, it must allocate at least one page cache page into which it
     can copy the data from the netfs page.

 (5) Under OOM conditions, the memory allocator can't immediately come up with
     a page, so it uses vmscan to find something to discard
     (try_to_free_pages()).

 (6) vmscan finds a clean netfs page it might be able to discard (possibly the
     one it's trying to write out).

 (7) The netfs is called to throw the page away (nfs_release_page()) - but it's
     called with __GFP_WAIT, so the netfs decides to wait for the store to
     complete (__fscache_wait_on_page_write()).

 (8) This blocks a slow-work processing thread - possibly against itself.

The system ends up stuck because it can't write out any netfs pages to the
cache without allocating more memory.

To avoid this, we make FS-Cache cancel some writes that aren't in the middle of
actually being performed.  This means that some data won't make it into the
cache this time.  To support this, a new FS-Cache function is added
fscache_maybe_release_page() that replaces what the netfs releasepage()
functions used to do with respect to the cache.

The decisions fscache_maybe_release_page() makes are counted and displayed
through /proc/fs/fscache/stats on a line labelled "VmScan".  There are four
counters provided: "nos=N" - pages that weren't pending storage; "gon=N" -
pages that were pending storage when we first looked, but weren't by the time
we got the object lock; "bsy=N" - pages that we ignored as they were actively
being written when we looked; and "can=N" - pages that we cancelled the storage
of.

What I'd really like to do is alter the behaviour of the cancellation
heuristics, depending on how necessary it is to expel pages.  If there are
plenty of other pages that aren't waiting to be written to the cache that
could be ejected first, then it would be nice to hold up on immediate
cancellation of cache writes - but I don't see a way of doing that.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

201a1542

FS-Cache: Fix lock misorder in fscache_write_op() · 1bccf513

由 David Howells 提交于 15年前

FS-Cache has two structs internally for keeping track of the internal state of
a cached file: the fscache_cookie struct, which represents the netfs's state,
and fscache_object struct, which represents the cache's state.  Each has a
pointer that points to the other (when both are in existence), and each has a
spinlock for pointer maintenance.

Since netfs operations approach these structures from the cookie side, they get
the cookie lock first, then the object lock.  Cache operations, on the other
hand, approach from the object side, and get the object lock first.  It is not
then permitted for a cache operation to get the cookie lock whilst it is
holding the object lock lest deadlock occur; instead, it must do one of two
things:

 (1) increment the cookie usage counter, drop the object lock and then get both
     locks in order, or

 (2) simply hold the object lock as certain parts of the cookie may not be
     altered whilst the object lock is held.

It is also not permitted to follow either pointer without holding the lock at
the end you start with.  To break the pointers between the cookie and the
object, both locks must be held.

fscache_write_op(), however, violates the locking rules: It attempts to get the
cookie lock without (a) checking that the cookie pointer is a valid pointer,
and (b) holding the object lock to protect the cookie pointer whilst it follows
it.  This is so that it can access the pending page store tree without
interference from __fscache_write_page().

This is fixed by splitting the cookie lock, such that the page store tracking
tree is protected by its own lock, and checking that the cookie pointer is
non-NULL before we attempt to follow it whilst holding the object lock.

The new lock is subordinate to both the cookie lock and the object lock, and so
should be taken after those.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

1bccf513

FS-Cache: Allow the current state of all objects to be dumped · 4fbf4291

由 David Howells 提交于 15年前

Allow the current state of all fscache objects to be dumped by doing:

	cat /proc/fs/fscache/objects

By default, all objects and all fields will be shown.  This can be restricted
by adding a suitable key to one of the caller's keyrings (such as the session
keyring):

	keyctl add user fscache:objlist "<restrictions>" @s

The <restrictions> are:

	K	Show hexdump of object key (don't show if not given)
	A	Show hexdump of object aux data (don't show if not given)

And paired restrictions:

	C	Show objects that have a cookie
	c	Show objects that don't have a cookie
	B	Show objects that are busy
	b	Show objects that aren't busy
	W	Show objects that have pending writes
	w	Show objects that don't have pending writes
	R	Show objects that have outstanding reads
	r	Show objects that don't have outstanding reads
	S	Show objects that have slow work queued
	s	Show objects that don't have slow work queued

If neither side of a restriction pair is given, then both are implied.  For
example:

	keyctl add user fscache:objlist KB @s

shows objects that are busy, and lists their object keys, but does not dump
their auxiliary data.  It also implies "CcWwRrSs", but as 'B' is given, 'b' is
not implied.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

4fbf4291

FS-Cache: Annotate slow-work runqueue proc lines for FS-Cache work items · 440f0aff

由 David Howells 提交于 15年前

Annotate slow-work runqueue proc lines for FS-Cache work items. Objects
include the object ID and the state. Operations include the object ID, the
operation ID and the operation type and state.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

440f0aff

SLOW_WORK: Allow a requeueable work item to sleep till the thread is needed · 3bde31a4

由 David Howells 提交于 15年前

Add a function to allow a requeueable work item to sleep till the thread
processing it is needed by the slow-work facility to perform other work.

Sometimes a work item can't progress immediately, but must wait for the
completion of another work item that's currently being processed by another
slow-work thread.

In some circumstances, the waiting item could instead - theoretically - put
itself back on the queue and yield its thread back to the slow-work facility,
thus waiting till it gets processing time again before attempting to progress.
This would allow other work items processing time on that thread.

However, this only works if there is something on the queue for it to queue
behind - otherwise it will just get a thread again immediately, and will end
up cycling between the queue and the thread, eating up valuable CPU time.

So, slow_work_sleep_till_thread_needed() is provided such that an item can put
itself on a wait queue that will wake it up when the event it is actually
interested in occurs, then call this function in lieu of calling schedule().

This function will then sleep until either the item's event occurs or another
work item appears on the queue. If another work item is queued, but the
item's event hasn't occurred, then the work item should requeue itself and
yield the thread back to the slow-work facility by returning.

This can be used by CacheFiles for an object that is being created on one
thread to wait for an object being deleted on another thread where there is
nothing on the queue for the creation to go and wait behind. As soon as an
item appears on the queue that could be given thread time instead, CacheFiles
can stick the creating object back on the queue and return to the slow-work
facility - assuming the object deletion didn't also complete.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

3bde31a4

SLOW_WORK: Allow the owner of a work item to determine if it is queued or not · 31ba99d3

由 David Howells 提交于 15年前

Add a function (slow_work_is_queued()) to permit the owner of a work item to
determine if the item is queued or not.

The work item is counted as being queued if it is actually on the queue, not
just if it is pending. If it is executing and pending, then it is not on the
queue, but will rather be put back on the queue when execution finishes.

This permits a caller to quickly work out if it may be able to put another,
dependent work item on the queue behind it, or whether it will have to wait
till that is finished.

This can be used by CacheFiles to work out whether the creation a new object
can be immediately deferred when it has to wait for an old object to be
deleted, or whether a wait must take place. If a wait is necessary, then the
slow-work thread can otherwise get blocked, preventing the deletion from
taking place.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

31ba99d3

SLOW_WORK: Allow the work items to be viewed through a /proc file · 8fba10a4

由 David Howells 提交于 15年前

Allow the executing and queued work items to be viewed through a /proc file
for debugging purposes.  The contents look something like the following:

    THR PID   ITEM ADDR        FL MARK  DESC
    === ===== ================ == ===== ==========
      0  3005 ffff880023f52348  a 952ms FSC: OBJ17d3: LOOK
      1  3006 ffff880024e33668  2 160ms FSC: OBJ17e5 OP60d3b: Write1/Store fl=2
      2  3165 ffff8800296dd180  a 424ms FSC: OBJ17e4: LOOK
      3  4089 ffff8800262c8d78  a 212ms FSC: OBJ17ea: CRTN
      4  4090 ffff88002792bed8  2 388ms FSC: OBJ17e8 OP60d36: Write1/Store fl=2
      5  4092 ffff88002a0ef308  2 388ms FSC: OBJ17e7 OP60d2e: Write1/Store fl=2
      6  4094 ffff88002abaf4b8  2 132ms FSC: OBJ17e2 OP60d4e: Write1/Store fl=2
      7  4095 ffff88002bb188e0  a 388ms FSC: OBJ17e9: CRTN
    vsq     - ffff880023d99668  1 308ms FSC: OBJ17e0 OP60f91: Write1/EnQ fl=2
    vsq     - ffff8800295d1740  1 212ms FSC: OBJ16be OP4d4b6: Write1/EnQ fl=2
    vsq     - ffff880025ba3308  1 160ms FSC: OBJ179a OP58dec: Write1/EnQ fl=2
    vsq     - ffff880024ec83e0  1 160ms FSC: OBJ17ae OP599f2: Write1/EnQ fl=2
    vsq     - ffff880026618e00  1 160ms FSC: OBJ17e6 OP60d33: Write1/EnQ fl=2
    vsq     - ffff880025a2a4b8  1 132ms FSC: OBJ16a2 OP4d583: Write1/EnQ fl=2
    vsq     - ffff880023cbe6d8  9 212ms FSC: OBJ17eb: LOOK
    vsq     - ffff880024d37590  9 212ms FSC: OBJ17ec: LOOK
    vsq     - ffff880027746cb0  9 212ms FSC: OBJ17ed: LOOK
    vsq     - ffff880024d37ae8  9 212ms FSC: OBJ17ee: LOOK
    vsq     - ffff880024d37cb0  9 212ms FSC: OBJ17ef: LOOK
    vsq     - ffff880025036550  9 212ms FSC: OBJ17f0: LOOK
    vsq     - ffff8800250368e0  9 212ms FSC: OBJ17f1: LOOK
    vsq     - ffff880025036aa8  9 212ms FSC: OBJ17f2: LOOK

In the 'THR' column, executing items show the thread they're occupying and
queued threads indicate which queue they're on.  'PID' shows the process ID of
a slow-work thread that's executing something.  'FL' shows the work item flags.
'MARK' indicates how long since an item was queued or began executing.  Lastly,
the 'DESC' column permits the owner of an item to give some information.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

8fba10a4

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功