1. 20 January 2012, 2 commits
  2. 18 January 2012, 2 commits
  3. 15 January 2012, 2 commits
  4. 14 January 2012, 2 commits
  5. 13 January 2012, 29 commits
    • UBIFS: fix key printing · 515315a1
      Committed by Artem Bityutskiy
      Before commit 56e46742 we had locking around all printing macros and
      could use static buffers for creating key strings and printing them.
      However, now we do not have that locking and we cannot use static
      buffers. This commit removes the old DBGKEY() macros and introduces a
      few new helper macros for printing debugging messages plus a key at the
      end. Thankfully, all the messages are already structured so that the
      key is printed at the end.
      Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      515315a1
    • UBIFS: use snprintf instead of sprintf when printing keys · beba0060
      Committed by Artem Bityutskiy
      Switch to 'snprintf()', which is more secure and reliable. This is also
      a preparation for the subsequent key printing fixes.
      Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      beba0060
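      A minimal userspace sketch of the bounded-formatting idea (the function
      and key layout here are illustrative, not UBIFS's actual helpers):

          #include <stdio.h>

          /* Format a key description into a caller-supplied buffer.  With
           * snprintf() the output is truncated instead of overrunning the
           * buffer, which is what made sprintf() risky here. */
          static void key_to_str(char *buf, size_t len,
                                 unsigned int inum, unsigned int hash)
          {
                  snprintf(buf, len, "(%u, DATA, %u)", inum, hash);
          }

          int main(void)
          {
                  char small[16];

                  key_to_str(small, sizeof(small), 1234567, 89012345);
                  printf("%s\n", small);  /* possibly truncated, never overflowed */
                  return 0;
          }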
    • c/r: procfs: add start_data, end_data, start_brk members to /proc/$pid/stat v4 · b3f7f573
      Committed by Cyrill Gorcunov
      The mm->start_code/end_code, mm->start_data/end_data and mm->start_brk
      values are involved in calculating the program text/data segment sizes
      (which can be seen in /proc/<pid>/statm) and the final address of the
      brk() call.

      For restore we need to know all of these values.  While
      mm->start_code/end_code are already present in /proc/$pid/stat, the
      remaining members are not, so this patch adds them.
      
      The restore procedure of these members is addressed in another patch using
      prctl().
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Vagin <avagin@openvz.org>
      Cc: Vasiliy Kulikov <segoon@openwall.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b3f7f573
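      On kernels that include this change (3.3 and later), proc(5) documents
      start_data, end_data and start_brk as fields 45-47 of /proc/[pid]/stat.
      A small illustrative reader for them (parsing begins after the last ')'
      because the comm field may itself contain spaces):

          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>

          int main(void)
          {
                  char buf[4096], *p, *tok;
                  unsigned long long v[64];
                  int n = 0;
                  FILE *f = fopen("/proc/self/stat", "r");

                  if (!f || !fgets(buf, sizeof(buf), f))
                          return 1;
                  fclose(f);

                  p = strrchr(buf, ')');          /* comm may contain spaces */
                  if (!p)
                          return 1;
                  p += 2;                         /* first token is field 3 (state) */

                  for (tok = strtok(p, " "); tok && n < 64; tok = strtok(NULL, " "))
                          v[n++] = strtoull(tok, NULL, 10);

                  if (n >= 45)                    /* fields 45..47 land in v[42..44] */
                          printf("start_data=%llu end_data=%llu start_brk=%llu\n",
                                 v[42], v[43], v[44]);
                  return 0;
          }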
    • dio: optimize cache misses in the submission path · 65dd2aa9
      Committed by Andi Kleen
      Some investigation of a transaction processing workload showed that a
      major consumer of cycles in __blockdev_direct_IO is the cache miss while
      accessing the block size.  This is because it has to walk the chain from
      block_dev to gendisk to queue.
      
      The block size is needed early on to check alignment and sizes.  It's only
      done if the check for the inode block size fails.  But the costly block
      device state is unconditionally fetched.
      
      - Reorganize the code to only fetch block dev state when actually
        needed.
      
      Then do a prefetch on the block dev early on in the direct IO path.  This
      is worth it, because there is substantial code run before we actually
      touch the block dev now.
      
      - I also added some unlikely()s to make it clear to the compiler that the
        block device fetch code is not normally executed.
      
      This gave a small but measurable improvement on a large database
      benchmark (about 0.3%).
      
      [akpm@linux-foundation.org: coding-style fixes]
      [sfr@canb.auug.org.au: using prefetch requires including prefetch.h]
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      65dd2aa9
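      The kernel-side change issues prefetch() on the block device as soon as
      it is known.  A rough userspace analogue of the same idea using the
      compiler builtin (purely illustrative, not the direct-io code):

          #include <stdio.h>
          #include <stddef.h>

          /* Start fetching the cold pointer early, then do other work so the
           * cache line is (hopefully) resident by the time it is read. */
          static long sum_then_read(const long *work, size_t n, const long *cold)
          {
                  long total = 0;
                  size_t i;

                  __builtin_prefetch(cold, 0, 1);   /* issue the fetch now */
                  for (i = 0; i < n; i++)           /* substantial work runs first */
                          total += work[i];
                  return total + *cold;             /* so this miss is largely hidden */
          }

          int main(void)
          {
                  static long work[4096];
                  long cold = 42;

                  printf("%ld\n", sum_then_read(work, 4096, &cold));
                  return 0;
          }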
    • vfs: cache request_queue in struct block_device · 87192a2a
      Committed by Andi Kleen
      This makes it possible to get from the inode to the request_queue with one
      less cache miss.  Used in a follow-on optimization.

      The lifetime of the pointer is the same as that of the gendisk.

      This assumes that the queue will always stay the same in the gendisk while
      it's visible to block_devices.  I think that's safe, correct?
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      87192a2a
    • fs/direct-io.c: calculate fs_count correctly in get_more_blocks() · ae55e1aa
      Committed by Tao Ma
      In get_more_blocks(), we use dio_count to calculate fs_count and do some
      tricky things to increase fs_count if dio_count isn't aligned.  But
      actually it still has some corner cases that can't be covered.  See the
      following example:
      
      	dio_write foo -s 1024 -w 4096
      
      (direct write 4096 bytes at offset 1024).  The same goes if the offset
      isn't aligned to fs_blocksize.
      
      In this case, the old calculation counts fs_count to be 1, but actually we
      will write into 2 different blocks (if fs_blocksize=4096).  The old code
      just works, since it will call get_block twice (and may have to allocate
      and create extents twice for filesystems like ext4).  So we'd better call
      get_block just once with the proper fs_count.
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ae55e1aa
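      The corrected arithmetic in the commit's example boils down to counting
      the blocks that an [offset, offset+len) range touches.  A small
      standalone sketch (names are illustrative, not the direct-io code):

          #include <stdio.h>

          /* Number of filesystem blocks touched by 'len' bytes at 'offset',
           * with a block size of (1 << blkbits) bytes. */
          static unsigned long fs_blocks_spanned(unsigned long long offset,
                                                 unsigned long long len,
                                                 unsigned int blkbits)
          {
                  unsigned long long first = offset >> blkbits;
                  unsigned long long last  = (offset + len - 1) >> blkbits;

                  return (unsigned long)(last - first + 1);
          }

          int main(void)
          {
                  /* The commit's case: 4096 bytes at offset 1024, 4 KiB blocks. */
                  printf("%lu\n", fs_blocks_spanned(1024, 4096, 12));  /* prints 2 */
                  return 0;
          }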
    • mm: compaction: introduce sync-light migration for use by compaction · a6bc32b8
      Committed by Mel Gorman
      This patch adds MIGRATE_SYNC_LIGHT, a lightweight sync migration mode
      that avoids writing pages back to backing storage.  Async compaction
      maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT.
      For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is
      used.
      
      This avoids sync compaction stalling for an excessive length of time,
      particularly when copying files to a USB stick where there might be a
      large number of dirty pages backed by a filesystem that does not support
      ->writepages.
      
      [aarcange@redhat.com: This patch is heavily based on Andrea's work]
      [akpm@linux-foundation.org: fix fs/nfs/write.c build]
      [akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build]
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andy Isaacson <adi@hexapodia.org>
      Cc: Nai Xia <nai.xia@gmail.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a6bc32b8
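      The resulting three-level mode, roughly as introduced by this series (a
      sketch of the enum plus an illustrative helper, not the kernel header
      verbatim):

          enum migrate_mode {
                  MIGRATE_ASYNC,          /* never block */
                  MIGRATE_SYNC_LIGHT,     /* may block, but avoid dirty-page writeback */
                  MIGRATE_SYNC,           /* may block and write back (hotplug etc.) */
          };

          /* Compaction picks its mode from its own sync flag; other
           * migrate_pages() callers ask for full MIGRATE_SYNC. */
          static inline enum migrate_mode compaction_mode(int sync)
          {
                  return sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC;
          }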
    • mm: compaction: determine if dirty pages can be migrated without blocking within ->migratepage · b969c4ab
      Committed by Mel Gorman
      Asynchronous compaction is used when allocating transparent hugepages to
      avoid blocking for long periods of time.  Due to reports of stalling,
      there was a debate on disabling synchronous compaction but this severely
      impacted allocation success rates.  Part of the reason was that many dirty
      pages are skipped in asynchronous compaction by the following check:
      
      	if (PageDirty(page) && !sync &&
      		mapping->a_ops->migratepage != migrate_page)
      			rc = -EBUSY;
      
      This skips over all mapping aops using buffer_migrate_page() even though
      it is possible to migrate some of these pages without blocking.  This
      patch updates the ->migratepage callback with a "sync" parameter.  It is
      the responsibility of the callback to fail gracefully if migration would
      block.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andy Isaacson <adi@hexapodia.org>
      Cc: Nai Xia <nai.xia@gmail.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b969c4ab
    • epoll: limit paths · 28d82dc1
      Committed by Jason Baron
      The current epoll code can be tickled to run basically indefinitely in
      both loop detection path check (on ep_insert()), and in the wakeup paths.
      The programs that tickle this behavior set up deeply linked networks of
      epoll file descriptors that cause the epoll algorithms to traverse them
      indefinitely.  A couple of these sample programs have been previously
      posted in this thread: https://lkml.org/lkml/2011/2/25/297.
      
      To fix the loop detection path check algorithms, I simply keep track of
      the epoll nodes that have already been visited.  Thus, the loop detection
      becomes proportional to the number of epoll file descriptors and links.
      This dramatically decreases the run-time of the loop check algorithm.  In
      one diabolical case I tried it reduced the run-time from 15 minutes (all
      in kernel time) to 0.3 seconds.
      
      Fixing the wakeup paths could be done at wakeup time in a similar manner
      by keeping track of nodes that have already been visited, but the
      complexity is harder, since there can be multiple wakeups on different
      cpus...Thus, I've opted to limit the number of possible wakeup paths when
      the paths are created.
      
      This is accomplished, by noting that the end file descriptor points that
      are found during the loop detection pass (from the newly added link), are
      actually the sources for wakeup events.  I keep a list of these file
      descriptors and limit the number and length of these paths that emanate
      from these 'source file descriptors'.  In the current implementation I
      allow 1000 paths of length 1, 500 of length 2, 100 of length 3, 50 of
      length 4 and 10 of length 5.  Note that it is sufficient to check the
      'source file descriptors' reachable from the newly added link, since no
      other 'source file descriptors' will have newly added links.  This allows
      us to check only the wakeup paths that may have gotten too long, and not
      re-check all possible wakeup paths on the system.
      
      In terms of the path limit selection, I think it's first worth noting that
      the most common case for epoll is probably the model where you have 1
      epoll file descriptor that is monitoring n 'source file descriptors'.  In
      this case, each 'source file descriptor' has 1 path of length 1.  Thus, I
      believe that the limits I'm proposing are quite reasonable and in fact may
      be too generous.  Thus, I'm hoping that the proposed limits will not cause
      any workloads that currently work to fail.
      
      In terms of locking, I have extended the use of the 'epmutex' to all
      epoll_ctl add and remove operations.  Currently it's only used in a subset
      of the add paths.  I need to hold the epmutex so that we can correctly
      traverse a coherent graph to check the number of paths.  I believe that
      this additional locking is probably ok, since it's in the setup/teardown
      paths and doesn't affect the running paths, but it certainly is going to
      add some extra overhead.  Also worth noting is that the epmutex was
      recently added to the ep_ctl add operations in the initial path loop
      detection code using the argument that it was not on a critical path.
      
      Another thing to note here, is the length of epoll chains that is allowed.
      Currently, eventpoll.c defines:
      
      /* Maximum number of nesting allowed inside epoll sets */
      #define EP_MAX_NESTS 4
      
      This basically means that I am limited to a graph depth of 5 (EP_MAX_NESTS
      + 1).  However, this limit is currently only enforced during the loop
      check detection code, and only when the epoll file descriptors are added
      in a certain order.  Thus, this limit is currently easily bypassed.  The
      newly added check for wakeup paths strictly limits the wakeup paths to a
      length of 5, regardless of the order in which ep's are linked together.
      Thus, a side-effect of the new code is a more consistent enforcement of
      the graph depth.
      
      Thus far, I've tested this using the sample programs previously
      mentioned, which now either return quickly or return -EINVAL.  I've also
      tested using the piptest.c epoll tester, which showed no difference in
      performance.  I've also created a number of different epoll networks and
      tested that they behave as expected.
      
      I believe this solves the original diabolical test cases, while still
      preserving the sane epoll nesting.
      Signed-off-by: Jason Baron <jbaron@redhat.com>
      Cc: Nelson Elhage <nelhage@ksplice.com>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      28d82dc1
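      A small userspace probe of the nesting behaviour described above: keep
      wrapping one epoll fd inside another until the kernel refuses the link.
      On kernels with these checks the add is expected to fail once the depth
      limit is hit (typically with ELOOP or EINVAL); this is an illustrative
      test, not part of the patch:

          #include <errno.h>
          #include <stdio.h>
          #include <string.h>
          #include <sys/epoll.h>
          #include <unistd.h>

          int main(void)
          {
                  int depth = 0;
                  int prev = epoll_create1(0);

                  if (prev < 0)
                          return 1;
                  for (;;) {
                          struct epoll_event ev = { .events = EPOLLIN };
                          int next = epoll_create1(0);

                          if (next < 0)
                                  break;
                          ev.data.fd = prev;
                          /* Make 'next' watch 'prev', deepening the chain. */
                          if (epoll_ctl(next, EPOLL_CTL_ADD, prev, &ev) < 0) {
                                  printf("rejected at depth %d: %s\n",
                                         depth + 1, strerror(errno));
                                  close(next);
                                  break;
                          }
                          prev = next;
                          depth++;
                  }
                  printf("nesting depth reached: %d\n", depth);
                  return 0;
          }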
    • pipe: fail cleanly when root tries F_SETPIPE_SZ with big size · 2ccd4f4d
      Committed by Sasha Levin
      When a user with the CAP_SYS_RESOURCE cap tries to F_SETPIPE_SZ a pipe
      with a size bigger than kmalloc() can allocate, it spits out an ugly warning:
      
        ------------[ cut here ]------------
        WARNING: at mm/page_alloc.c:2095 __alloc_pages_nodemask+0x5d3/0x7a0()
        Pid: 733, comm: a.out Not tainted 3.2.0-rc1+ #4
        Call Trace:
           warn_slowpath_common+0x75/0xb0
           warn_slowpath_null+0x15/0x20
           __alloc_pages_nodemask+0x5d3/0x7a0
           __get_free_pages+0x12/0x50
           __kmalloc+0x12b/0x150
           pipe_set_size+0x75/0x120
           pipe_fcntl+0xf8/0x140
           do_fcntl+0x2d4/0x410
           sys_fcntl+0x66/0xa0
           system_call_fastpath+0x16/0x1b
        ---[ end trace 432f702e6db7b5ee ]---
      
      Instead, make kcalloc() handle the overflow case and fail quietly.
      
      [akpm@linux-foundation.org: switch to sizeof(*bufs) for 80-column niceness]
      Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Acked-by: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2ccd4f4d
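      A quick way to exercise the path from userspace: ask for an absurd pipe
      buffer and check that fcntl() fails cleanly.  The exact errno depends on
      privileges and limits (e.g. EPERM without CAP_SYS_RESOURCE, ENOMEM or
      EINVAL with it); illustrative test only:

          #define _GNU_SOURCE
          #include <errno.h>
          #include <fcntl.h>
          #include <stdio.h>
          #include <string.h>
          #include <unistd.h>

          int main(void)
          {
                  int fds[2];
                  long huge = 1L << 30;   /* request a 1 GiB pipe buffer */

                  if (pipe(fds) < 0)
                          return 1;
                  if (fcntl(fds[0], F_SETPIPE_SZ, huge) < 0)
                          printf("F_SETPIPE_SZ rejected: %s\n", strerror(errno));
                  else
                          printf("pipe resized to %d bytes\n",
                                 fcntl(fds[0], F_GETPIPE_SZ));
                  return 0;
          }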
    • proc: fix null pointer deref in proc_pid_permission() · a2ef990a
      Committed by Xiaotian Feng
      get_proc_task() can fail to find the task and return NULL;
      put_task_struct() will then bomb the kernel with the following oops:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
        IP: [<ffffffff81217d34>] proc_pid_permission+0x64/0xe0
        PGD 112075067 PUD 112814067 PMD 0
        Oops: 0002 [#1] PREEMPT SMP
      
      This is a regression introduced by commit 0499680a ("procfs: add hidepid=
      and gid= mount options").  The kernel should return -ESRCH if
      get_proc_task() failed.
      Signed-off-by: Xiaotian Feng <dannyfeng@tencent.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Vasiliy Kulikov <segoon@openwall.com>
      Cc: Stephen Wilson <wilsons@start.ca>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a2ef990a
    • module_param: make bool parameters really bool (drivers & misc) · 90ab5ee9
      Committed by Rusty Russell
      module_param(bool) used to counter-intuitively take an int.  In
      fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
      trick.
      
      It's time to remove the int/unsigned int option.  For this version
      it'll simply give a warning, but it'll break in the next kernel version.
      Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      90ab5ee9
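      After this change, a bool parameter must be backed by an actual bool.  A
      minimal module skeleton illustrating the expected form (the names here
      are placeholders, not from the patch):

          #include <linux/init.h>
          #include <linux/kernel.h>
          #include <linux/module.h>
          #include <linux/moduleparam.h>

          /* The variable behind a 'bool' parameter is now a real bool,
           * not an int.  Using an int here warns, and will later fail. */
          static bool enable_feature = true;
          module_param(enable_feature, bool, 0644);
          MODULE_PARM_DESC(enable_feature, "example boolean parameter");

          static int __init demo_init(void)
          {
                  pr_info("enable_feature=%d\n", enable_feature);
                  return 0;
          }

          static void __exit demo_exit(void)
          {
          }

          module_init(demo_init);
          module_exit(demo_exit);
          MODULE_LICENSE("GPL");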
    • module_param: avoid bool abuse, add bint for special cases. · 69116f27
      Committed by Rusty Russell
      For historical reasons, we allow module_param(bool) to take an int (or
      an unsigned int).  That's going away.
      
      A few drivers really want an int: they set it to -1 and a parameter
      will set it to 0 or 1.  This sucks: reading them from sysfs will give
      'Y' for both -1 and 1, but if we change it to an int, then the users
      might be broken (if they did "param" instead of "param=1").
      
      Use a new 'bint' parser for them.
      
      (ntfs has a different problem: it needs an int for debug_msgs because
      it's also exposed via sysctl.)
      
      Cc: Steve Glendinning <steve.glendinning@smsc.com>
      Cc: Jean Delvare <khali@linux-fr.org>
      Cc: Guenter Roeck <guenter.roeck@ericsson.com>
      Cc: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
      Cc: Christoph Raisch <raisch@de.ibm.com>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Sean Hefty <sean.hefty@intel.com>
      Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
      Cc: linux390@de.ibm.com
      Cc: Anton Altaparmakov <anton@tuxera.com>
      Cc: Jaroslav Kysela <perex@perex.cz>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: lm-sensors@lm-sensors.org
      Cc: linux-rdma@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-ntfs-dev@lists.sourceforge.net
      Cc: alsa-devel@alsa-project.org
      Acked-by: Takashi Iwai <tiwai@suse.de> (For the sound part)
      Acked-by: Guenter Roeck <guenter.roeck@ericsson.com> (For the hwmon driver)
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      69116f27
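      Dropped into the same module skeleton as the sketch above, a 'bint'
      parameter keeps its int storage (so -1 can still mean "not set") while
      parsing like a bool on the command line (illustrative only; the
      parameter name is hypothetical):

          static int debug_msgs = -1;     /* -1: not set by the user */
          module_param(debug_msgs, bint, 0444);
          MODULE_PARM_DESC(debug_msgs, "enable debug messages (bool-parsed int)");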
    • pnfsblock: alloc short extent before submit bio · 7c5465d6
      Committed by Peng Tao
      As discussed earlier, it is better for the block client to allocate memory
      for tracking extent state before submitting a bio. So the patch does this
      by allocating a short_extent for every INVALID extent touched by the write
      pagelist and for every zeroing page we create, saving them in the layout
      header. Then in end_io we can just use them to create commit list items
      and avoid memory allocation there.
      Signed-off-by: Peng Tao <peng_tao@emc.com>
      Signed-off-by: Benny Halevy <bhalevy@tonian.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      7c5465d6
    • pnfsblock: remove rpc_call_ops from struct parallel_io · c0411a94
      Committed by Peng Tao
      The block layout can just make use of the generic read/write_done.
      Signed-off-by: Peng Tao <peng_tao@emc.com>
      Signed-off-by: Benny Halevy <bhalevy@tonian.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      c0411a94
    • pnfsblock: move find lock page logic out of bl_write_pagelist · 72c50887
      Committed by Peng Tao
      Also avoid an unnecessary lock_page if the page is handled by others.
      Signed-off-by: Peng Tao <peng_tao@emc.com>
      Signed-off-by: Benny Halevy <bhalevy@tonian.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      72c50887
    • pnfsblock: cleanup bl_mark_sectors_init · 60c52e3a
      Committed by Peng Tao
      It does not need to operate on partially initialized blocks; the
      writeback code takes care of that.
      Signed-off-by: Peng Tao <peng_tao@emc.com>
      Signed-off-by: Benny Halevy <bhalevy@tonian.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      60c52e3a
    • pnfsblock: limit bio page count · 74a6eeb4
      Committed by Peng Tao
      One bio can have at most BIO_MAX_PAGES pages. We should limit it because
      otherwise bio_alloc() will fail when there are many pages in one
      read/write_pagelist.
      
      Cc: <stable@vger.kernel.org> #3.1+
      Signed-off-by: Peng Tao <peng_tao@emc.com>
      Signed-off-by: Benny Halevy <bhalevy@tonian.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      74a6eeb4
    • pnfsblock: don't spinlock when freeing block_dev · 93a3844e
      Committed by Peng Tao
      bl_free_block_dev() may sleep, so we cannot call it with a spinlock held.
      Besides, there is no need to take bm_lock, as we are the last user freeing
      bm_devlist.
      
      Cc: <stable@vger.kernel.org> #3.1+
      Signed-off-by: Peng Tao <peng_tao@emc.com>
      Signed-off-by: Benny Halevy <bhalevy@tonian.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      93a3844e
    • pnfsblock: clean up _add_entry · 57582b37
      Committed by Peng Tao
      It is wrong to kmalloc in _add_entry(), as it is called inside a spinlock;
      memory should already be allocated before _add_entry() is called.
      Signed-off-by: Peng Tao <peng_tao@emc.com>
      Signed-off-by: Benny Halevy <bhalevy@tonian.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      57582b37
    • pnfsblock: set read/write tk_status to pnfs_error · 82b906d6
      Committed by Peng Tao
      This passes the I/O status to the upper layer.
      Signed-off-by: Peng Tao <peng_tao@emc.com>
      Signed-off-by: Benny Halevy <bhalevy@tonian.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      82b906d6
    • pnfsblock: acquire im_lock in _preload_range · 39e567ae
      Committed by Peng Tao
      When calling _add_entry, we should take the im_lock to protect
      against other modifiers.
      
      Cc: <stable@vger.kernel.org> #3.1+
      Signed-off-by: Peng Tao <peng_tao@emc.com>
      Signed-off-by: Benny Halevy <bhalevy@tonian.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      39e567ae
    • NFS4: fix compile warnings in nfs4proc.c · de040bec
      Committed by Peng Tao
      Compiling the nfs-for-3.3 branch shows the following warnings; fix them here.
      
      fs/nfs/nfs4proc.c: In function ‘__nfs4_get_acl_uncached’:
      fs/nfs/nfs4proc.c:3589: warning: format ‘%ld’ expects type ‘long int’, but argument 4 has type ‘size_t’
      fs/nfs/nfs4proc.c:3589: warning: format ‘%ld’ expects type ‘long int’, but argument 6 has type ‘size_t’
      Signed-off-by: Peng Tao <peng_tao@emc.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      de040bec
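      The usual fix for this class of warning is the C99 'z' length modifier,
      which matches size_t regardless of whether size_t is 'long' on the
      target.  A trivial illustration (not the nfs4proc.c code itself):

          #include <stdio.h>

          int main(void)
          {
                  size_t buflen = 4096;

                  /* "%zu" matches size_t portably; "%ld" only happens to work
                   * on targets where size_t is 'unsigned long'. */
                  printf("acl buffer holds %zu bytes\n", buflen);
                  return 0;
          }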
    • nfs: check for integer overflow in decode_devicenotify_args() · 363e0df0
      Committed by Dan Carpenter
      On 32 bit, if n is too large then "n * sizeof(*args->devs)" could
      overflow and args->devs would be smaller than expected.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      363e0df0
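      The general pattern: bound-check the count against SIZE_MAX divided by
      the element size before multiplying, so the product cannot wrap on a
      32-bit size_t.  A standalone sketch (the struct and function names are
      made up for illustration):

          #include <stdint.h>
          #include <stdlib.h>

          struct dev_entry {
                  char data[128];
          };

          static struct dev_entry *alloc_devs(uint32_t n)
          {
                  /* Reject counts whose multiplication would overflow size_t
                   * (relevant on 32-bit); only then call the allocator. */
                  if (n > SIZE_MAX / sizeof(struct dev_entry))
                          return NULL;
                  return malloc((size_t)n * sizeof(struct dev_entry));
          }

          int main(void)
          {
                  struct dev_entry *devs = alloc_devs(16);

                  free(devs);
                  return 0;
          }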
    • NFS: cleanup endian type in decode_ds_addr() · 13fff2f3
      Committed by Dan Carpenter
      port is supposed to be a __be16 here.  The existing code should work
      fine, but this is a cleanup.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      13fff2f3
    • NFS: add an endian notation · 0e0243dc
      Committed by Dan Carpenter
      This function returns a big endian value.  The implementation in
      fs/nfs/callback_proc.c is declared with "__be32" but the .h file uses
      "unsigned" instead.  It makes sparse complain:
      
      fs/nfs/callback_proc.c:232:8: error:
      	symbol 'nfs4_callback_layoutrecall' redeclared with different
      	type (originally declared at fs/nfs/callback.h:165) - different
      	base types
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      0e0243dc
    • ceph: ensure prealloc_blob is in place when removing xattr · 83eb26af
      Committed by Alex Elder
      In __ceph_build_xattrs_blob(), if a ceph inode's extended attributes
      are marked dirty, all attributes recorded in its rb_tree index are
      formatted into a "blob" buffer.  The target buffer is recorded in
      ceph_inode->i_xattrs.prealloc_blob, and it is expected to exist and
      be of sufficient size to hold the attributes.
      
      The extended attributes are marked dirty in two cases: when a new
      attribute is added to the inode; or when one is removed.  In the
      former case work is done to ensure the prealloc_blob buffer is
      properly set up, but in the latter it is not.
      
      Change the logic in ceph_removexattr() so it matches what is
      done in ceph_setxattr().  Note that this is done in a way that
      keeps the two blocks of code nearly identical, in anticipation
      of a subsequent patch that encapsulates some of this logic into
      one or more helper routines.
      Signed-off-by: Alex Elder <elder@dreamhost.com>
      Signed-off-by: Sage Weil <sage@newdream.net>
      83eb26af
    • ceph: enable/disable dentry complete flags via mount option · a40dc6cc
      Committed by Sage Weil
      Enable/disable use of the dentry dir 'complete' flag via a mount option.
      This lets the admin control whether ceph uses the dcache to satisfy
      negative lookups or readdir when it has the entire directory contents in
      its cache.
      
      This is purely a performance optimization; correctness is guaranteed
      whether it is enabled or not.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sage Weil <sage@newdream.net>
      a40dc6cc
    • vfs: export symbol d_find_any_alias() · 46f72b34
      Committed by Sage Weil
      Ceph needs this.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sage Weil <sage@newdream.net>
      46f72b34
  6. 12 January 2012, 3 commits