提交 · 1b48b530dac3e600d92854d98a2a519243661f6c · openeuler / Kernel

17 2月, 2016 2 次提交

kernfs: define kernfs_node_dentry · fb3c8315

由 Aditya Kali 提交于 1月 29, 2016

Add a new kernfs api is added to lookup the dentry for a particular
kernfs path.
Signed-off-by: NAditya Kali <adityakali@google.com>
Signed-off-by: NSerge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NTejun Heo <tj@kernel.org>

fb3c8315

kernfs: Add API to generate relative kernfs path · 9f6df573

由 Aditya Kali 提交于 1月 29, 2016

The new function kernfs_path_from_node() generates and returns kernfs
path of a given kernfs_node relative to a given parent kernfs_node.
Signed-off-by: NAditya Kali <adityakali@google.com>
Signed-off-by: NSerge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NTejun Heo <tj@kernel.org>

9f6df573

08 2月, 2016 1 次提交

kernfs: make kernfs_walk_ns() use kernfs_pr_cont_buf[] · e56ed358

由 Tejun Heo 提交于 1月 15, 2016

kernfs_walk_ns() uses a static path_buf[PATH_MAX] to separate out path
components. Keeping around the 4k buffer just for kernfs_walk_ns() is
wasteful. This patch makes it piggyback on kernfs_pr_cont_buf[]
instead. This requires kernfs_walk_ns() to hold kernfs_rename_lock.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

e56ed358

23 1月, 2016 1 次提交

wrappers for ->i_mutex access · 5955102c

由 Al Viro 提交于 1月 22, 2016

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

5955102c

15 1月, 2016 1 次提交

Revert "kernfs: do not account ino_ida allocations to memcg" · b2a209ff

由 Vladimir Davydov 提交于 1月 14, 2016

Currently, all kmem allocations (namely every kmem_cache_alloc, kmalloc,
alloc_kmem_pages call) are accounted to memory cgroup automatically.
Callers have to explicitly opt out if they don't want/need accounting
for some reason.  Such a design decision leads to several problems:

 - kmalloc users are highly sensitive to failures, many of them
   implicitly rely on the fact that kmalloc never fails, while memcg
   makes failures quite plausible.

 - A lot of objects are shared among different containers by design.
   Accounting such objects to one of containers is just unfair.
   Moreover, it might lead to pinning a dead memcg along with its kmem
   caches, which aren't tiny, which might result in noticeable increase
   in memory consumption for no apparent reason in the long run.

 - There are tons of short-lived objects. Accounting them to memcg will
   only result in slight noise and won't change the overall picture, but
   we still have to pay accounting overhead.

For more info, see

 - http://lkml.kernel.org/r/20151105144002.GB15111%40dhcp22.suse.cz
 - http://lkml.kernel.org/r/20151106090555.GK29259@esperanza

Therefore this patchset switches to the white list policy.  Now kmalloc
users have to explicitly opt in by passing __GFP_ACCOUNT flag.

Currently, the list of accounted objects is quite limited and only
includes those allocations that (1) are known to be easily triggered
from userspace and (2) can fail gracefully (for the full list see patch
no.  6) and it still misses many object types.  However, accounting only
those objects should be a satisfactory approximation of the behavior we
used to have for most sane workloads.

This patch (of 6):

Revert 499611ed ("kernfs: do not account ino_ida allocations
to memcg").

Black-list kmem accounting policy (aka __GFP_NOACCOUNT) turned out to be
fragile and difficult to maintain, because there seem to be many more
allocations that should not be accounted than those that should be.
Besides, false accounting an allocation might result in much worse
consequences than not accounting at all, namely increased memory
consumption due to pinned dead kmem caches.

So it was decided to switch to the white-list policy.  This patch reverts
bits introducing the black-list policy.  The white-list policy will be
introduced later in the series.
Signed-off-by: NVladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b2a209ff

31 12月, 2015 1 次提交
- A
  switch ->get_link() to delayed_call, kill ->put_link() · fceef393
  由 Al Viro 提交于 12月 29, 2015
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  fceef393
30 12月, 2015 1 次提交

kill free_page_put_link() · cd3417c8

由 Al Viro 提交于 12月 29, 2015

all callers are better off with kfree_put_link()
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

cd3417c8

09 12月, 2015 1 次提交

replace ->follow_link() with new method that could stay in RCU mode · 6b255391

由 Al Viro 提交于 11月 17, 2015

new method: ->get_link(); replacement of ->follow_link().  The differences
are:
	* inode and dentry are passed separately
	* might be called both in RCU and non-RCU mode;
the former is indicated by passing it a NULL dentry.
	* when called that way it isn't allowed to block
and should return ERR_PTR(-ECHILD) if it needs to be called
in non-RCU mode.

It's a flagday change - the old method is gone, all in-tree instances
converted.  Conversion isn't hard; said that, so far very few instances
do not immediately bail out when called in RCU mode.  That'll change
in the next commits.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

6b255391

07 12月, 2015 2 次提交

tmpfs: listxattr should include POSIX ACL xattrs · 786534b9

由 Andreas Gruenbacher 提交于 12月 02, 2015

When a file on tmpfs has an ACL or a Default ACL, listxattr should include the
corresponding xattr name.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: NJames Morris <james.l.morris@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

786534b9

tmpfs: Use xattr handler infrastructure · aa7c5241

由 Andreas Gruenbacher 提交于 12月 02, 2015

Use the VFS xattr handler infrastructure and get rid of similar code in
the filesystem.  For implementing shmem_xattr_handler_set, we need a
version of simple_xattr_set which removes the attribute when value is
NULL.  Use this to implement kernfs_iop_removexattr as well.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: NJames Morris <james.l.morris@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

aa7c5241

21 11月, 2015 1 次提交

kernfs: implement kernfs_walk_and_get() · bd96f76a

由 Tejun Heo 提交于 11月 20, 2015

Implement kernfs_walk_and_get() which is similar to
kernfs_find_and_get() but can walk a path instead of just a name.

v2: Use strlcpy() instead of strlen() + memcpy() as suggested by
    David.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: David Miller <davem@davemloft.net>

bd96f76a

19 8月, 2015 1 次提交

kernfs: implement kernfs_path_len() · 9acee9c5

由 Tejun Heo 提交于 8月 18, 2015

Add a function to determine the path length of a kernfs node.  This
for now will be used by writeback tracepoint updates.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NJens Axboe <axboe@fb.com>

9acee9c5

01 7月, 2015 1 次提交

kernfs: Add support for always empty directories. · ea015218

由 Eric W. Biederman 提交于 5月 13, 2015

Add a new function kernfs_create_empty_dir that can be used to create
directory that can not be modified.

Update the code to use make_empty_dir_inode when reporting a
permanently empty directory to the vfs.

Update the code to not allow adding to permanently empty directories.

Cc: stable@vger.kernel.org
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

ea015218

19 6月, 2015 1 次提交

kernfs: make kernfs_get_inode() public · fb02915f

由 Tejun Heo 提交于 6月 18, 2015

Move kernfs_get_inode() prototype from fs/kernfs/kernfs-internal.h to
include/linux/kernfs.h.  It obtains the matching inode for a
kernfs_node.

It will be used by cgroup for inode based permission checks for now
but is generally useful.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

fb02915f

25 5月, 2015 1 次提交

kernfs: remove outdated and confusing comment · ba50150e

由 Wolfram Sang 提交于 4月 23, 2015

Grabbing the parent is not happening anymore since 2010 (e72ceb8c
"sysfs: Remove sysfs_get/put_active_two"). Remove this confusing
comment.
Signed-off-by: NWolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

ba50150e

15 5月, 2015 1 次提交

kernfs: do not account ino_ida allocations to memcg · 499611ed

由 Vladimir Davydov 提交于 5月 14, 2015

root->ino_ida is used for kernfs inode number allocations. Since IDA has
a layered structure, different IDs can reside on the same layer, which
is currently accounted to some memory cgroup. The problem is that each
kmem cache of a memory cgroup has its own directory on sysfs (under
/sys/fs/kernel/<cache-name>/cgroup). If the inode number of such a
directory or any file in it gets allocated from a layer accounted to the
cgroup which the cache is created for, the cgroup will get pinned for
good, because one has to free all kmem allocations accounted to a cgroup
in order to release it and destroy all its kmem caches. That said we
must not account layers of ino_ida to any memory cgroup.

Since per net init operations may create new sysfs entries directly
(e.g. lo device) or indirectly (nf_conntrack creates a new kmem cache
per each namespace, which, in turn, creates new sysfs entries), an easy
way to reproduce this issue is by creating network namespace(s) from
inside a kmem-active memory cgroup.
Signed-off-by: NVladimir Davydov <vdavydov@parallels.com>
Acked-by: NTejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>	[4.0.x]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

499611ed

11 5月, 2015 4 次提交

A
new helper: free_page_put_link() · ecc087ff
由 Al Viro 提交于 5月 07, 2015
```
similar to kfree_put_link()
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
ecc087ff

switch ->put_link() from dentry to inode · 5f2c4179

由 Al Viro 提交于 5月 07, 2015

only one instance looks at that argument at all; that sole
exception wants inode rather than dentry.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

5f2c4179

don't pass nameidata to ->follow_link() · 6e77137b

由 Al Viro 提交于 5月 02, 2015

its only use is getting passed to nd_jump_link(), which can obtain
it from current->nameidata
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

6e77137b

new ->follow_link() and ->put_link() calling conventions · 680baacb

由 Al Viro 提交于 5月 02, 2015

a) instead of storing the symlink body (via nd_set_link()) and returning
an opaque pointer later passed to ->put_link(), ->follow_link() _stores_
that opaque pointer (into void * passed by address by caller) and returns
the symlink body.  Returning ERR_PTR() on error, NULL on jump (procfs magic
symlinks) and pointer to symlink body for normal symlinks.  Stored pointer
is ignored in all cases except the last one.

Storing NULL for opaque pointer (or not storing it at all) means no call
of ->put_link().

b) the body used to be passed to ->put_link() implicitly (via nameidata).
Now only the opaque pointer is.  In the cases when we used the symlink body
to free stuff, ->follow_link() now should store it as opaque pointer in addition
to returning it.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

680baacb

16 4月, 2015 1 次提交

VFS: normal filesystems (and lustre): d_inode() annotations · 2b0143b5

由 David Howells 提交于 3月 17, 2015

that's the bulk of filesystem drivers dealing with inodes of their own
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

2b0143b5

17 3月, 2015 1 次提交

kernfs: handle poll correctly on 'direct_read' files. · 7cff4b18

由 NeilBrown 提交于 3月 16, 2015

Kernfs supports two styles of read: direct_read and seqfile_read.

The latter supports 'poll' correctly thanks to the update of
'->event' in kernfs_seq_show.
The former does not as '->event' is never updated on a read.

So add an appropriate update in kernfs_file_direct_read().

This was noticed because some 'md' sysfs attributes were
recently changed to use direct reads.
Reported-by: NPrakash Punnoor <prakash@punnoor.de>
Reported-by: NTorsten Kaiser <just.for.lkml@googlemail.com>
Fixes: 750f199eSigned-off-by: NNeilBrown <neilb@suse.de>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

7cff4b18

14 2月, 2015 2 次提交

kernfs: remove KERNFS_STATIC_NAME · dfeb0750

由 Tejun Heo 提交于 2月 13, 2015

When a new kernfs node is created, KERNFS_STATIC_NAME is used to avoid
making a separate copy of its name.  It's currently only used for sysfs
attributes whose filenames are required to stay accessible and unchanged.
There are rare exceptions where these names are allocated and formatted
dynamically but for the vast majority of cases they're consts in the
rodata section.

Now that kernfs is converted to use kstrdup_const() and kfree_const(),
there's little point in keeping KERNFS_STATIC_NAME around.  Remove it.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

dfeb0750

kernfs: convert node name allocation to kstrdup_const · 75287a67

由 Andrzej Hajda 提交于 2月 13, 2015

sysfs frequently performs duplication of strings located in read-only
memory section.  Replacing kstrdup by kstrdup_const allows to avoid such
operations.
Signed-off-by: NAndrzej Hajda <a.hajda@samsung.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Mike Turquette <mturquette@linaro.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: NTejun Heo <tj@kernel.org>
Cc: Greg KH <greg@kroah.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

75287a67

21 1月, 2015 2 次提交

fs: remove mapping->backing_dev_info · b83ae6d4

由 Christoph Hellwig 提交于 1月 14, 2015

Now that we never use the backing_dev_info pointer in struct address_space
we can simply remove it and save 4 to 8 bytes in every inode.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reviewed-by: NTejun Heo <tj@kernel.org>
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NJens Axboe <axboe@fb.com>

b83ae6d4

fs: deduplicate noop_backing_dev_info · a7a2c680

由 Christoph Hellwig 提交于 1月 14, 2015

hugetlbfs, kernfs and dlmfs can simply use noop_backing_dev_info instead
of creating a local duplicate.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJan Kara <jack@suse.cz>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <axboe@fb.com>

a7a2c680

10 1月, 2015 1 次提交

kernfs: Fix kernfs_name_compare · 72392ed0

由 Rasmus Villemoes 提交于 12月 05, 2014

Returning a difference from a comparison functions is usually wrong
(see acbbe6fb "kcmp: fix standard comparison bug" for the long
story). Here there is the additional twist that if the void pointers
ns and kn->ns happen to differ by a multiple of 2^32,
kernfs_name_compare returns 0, falsely reporting a match to the
caller.

Technically 'hash - kn->hash' is ok since the hashes are restricted to
31 bits, but it's better to avoid that subtlety.
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

72392ed0

17 12月, 2014 1 次提交

vm_area_operations: kill ->migrate() · 50062175

由 Al Viro 提交于 5月 15, 2014

the only instance this method has ever grown was one in kernfs -
one that call ->migrate() of another vm_ops if it exists.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

50062175

20 11月, 2014 1 次提交
- A
  switch d_materialise_unique() users to d_splice_alias() · 41d28bca
  由 Al Viro 提交于 10月 12, 2014
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  41d28bca
08 11月, 2014 2 次提交

sysfs/kernfs: make read requests on pre-alloc files use the buffer. · 4ef67a8c

由 NeilBrown 提交于 10月 14, 2014

To match the previous patch which used the pre-alloc buffer for
writes, this patch causes reads to use the same buffer.
This is not strictly necessary as the current seq_read() will allocate
on first read, so user-space can trigger the required pre-alloc.  But
consistency is valuable.

The read function is somewhat simpler than seq_read() and, for example,
does not support reading from an offset into the file: reads must be
at the start of the file.

As seq_read() does not use the prealloc buffer, ->seq_show is
incompatible with ->prealloc and caused an EINVAL return from open().
sysfs code which calls into kernfs always chooses the correct function.

As the buffer is shared with writes and other reads, the mutex is
extended to cover the copy_to_user.
Signed-off-by: NNeilBrown <neilb@suse.de>
Reviewed-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

4ef67a8c

sysfs/kernfs: allow attributes to request write buffer be pre-allocated. · 2b75869b

由 NeilBrown 提交于 10月 13, 2014

md/raid allows metadata management to be performed in user-space.
A various times, particularly on device failure, the metadata needs
to be updated before further writes can be permitted.
This means that the user-space program which updates metadata much
not block on writeout, and so must not allocate memory.

mlockall(MCL_CURRENT|MCL_FUTURE) and pre-allocation can avoid all
memory allocation issues for user-memory, but that does not help
kernel memory.
Several kernel objects can be pre-allocated.  e.g. files opened before
any writes to the array are permitted.
However some kernel allocation happens in places that cannot be
pre-allocated.
In particular, writes to sysfs files (to tell md that it can now
allow writes to the array) allocate a buffer using GFP_KERNEL.

This patch allows attributes to be marked as "PREALLOC".  In that case
the maximal buffer is allocated when the file is opened, and then used
on each write instead of allocating a new buffer.

As the same buffer is now shared for all writes on the same file
description, the mutex is extended to cover full use of the buffer
including the copy_from_user().

The new __ATTR_PREALLOC() 'or's a new flag in to the 'mode', which is
inspected by sysfs_add_file_mode_ns() to determine if the file should be
marked as requiring prealloc.

Despite the comment, we *do* use ->seq_show together with ->prealloc
in this patch.  The next patch fixes that.
Signed-off-by: NNeilBrown  <neilb@suse.de>
Reviewed-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

2b75869b

09 10月, 2014 1 次提交

vfs: Remove unnecessary calls of check_submounts_and_drop · 9b053f32

由 Eric W. Biederman 提交于 2月 13, 2014

Now that check_submounts_and_drop can not fail and is called from
d_invalidate there is no longer a need to call check_submounts_and_drom
from filesystem d_revalidate methods so remove it.
Reviewed-by: NMiklos Szeredi <miklos@szeredi.hu>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

9b053f32

10 7月, 2014 1 次提交

kernfs: kernel-doc warning fix · 8278bd3a

由 Fabian Frederick 提交于 7月 04, 2014

s/static_name/name_is_static
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

8278bd3a

03 7月, 2014 1 次提交

kernfs: kernfs_notify() must be useable from non-sleepable contexts · ecca47ce

由 Tejun Heo 提交于 7月 01, 2014

d911d987 ("kernfs: make kernfs_notify() trigger inotify events
too") added fsnotify triggering to kernfs_notify() which requires a
sleepable context.  There are already existing users of
kernfs_notify() which invoke it from an atomic context and in general
it's silly to require a sleepable context for triggering a
notification.

The following is an invalid context bug triggerd by md invoking
sysfs_notify() from IO completion path.

 BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
 2 locks held by swapper/1/0:
  #0:  (&(&vblk->vq_lock)->rlock){-.-...}, at: [<ffffffffa0039042>] virtblk_done+0x42/0xe0 [virtio_blk]
  #1:  (&(&bitmap->counts.lock)->rlock){-.....}, at: [<ffffffff81633718>] bitmap_endwrite+0x68/0x240
 irq event stamp: 33518
 hardirqs last  enabled at (33515): [<ffffffff8102544f>] default_idle+0x1f/0x230
 hardirqs last disabled at (33516): [<ffffffff818122ed>] common_interrupt+0x6d/0x72
 softirqs last  enabled at (33518): [<ffffffff810a1272>] _local_bh_enable+0x22/0x50
 softirqs last disabled at (33517): [<ffffffff810a29e0>] irq_enter+0x60/0x80
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.16.0-0.rc2.git2.1.fc21.x86_64 #1
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  0000000000000000 f90db13964f4ee05 ffff88007d403b80 ffffffff81807b4c
  0000000000000000 ffff88007d403ba8 ffffffff810d4f14 0000000000000000
  0000000000441800 ffff880078fa1780 ffff88007d403c38 ffffffff8180caf2
 Call Trace:
  <IRQ>  [<ffffffff81807b4c>] dump_stack+0x4d/0x66
  [<ffffffff810d4f14>] __might_sleep+0x184/0x240
  [<ffffffff8180caf2>] mutex_lock_nested+0x42/0x440
  [<ffffffff812d76a0>] kernfs_notify+0x90/0x150
  [<ffffffff8163377c>] bitmap_endwrite+0xcc/0x240
  [<ffffffffa00de863>] close_write+0x93/0xb0 [raid1]
  [<ffffffffa00df029>] r1_bio_write_done+0x29/0x50 [raid1]
  [<ffffffffa00e0474>] raid1_end_write_request+0xe4/0x260 [raid1]
  [<ffffffff813acb8b>] bio_endio+0x6b/0xa0
  [<ffffffff813b46c4>] blk_update_request+0x94/0x420
  [<ffffffff813bf0ea>] blk_mq_end_io+0x1a/0x70
  [<ffffffffa00392c2>] virtblk_request_done+0x32/0x80 [virtio_blk]
  [<ffffffff813c0648>] __blk_mq_complete_request+0x88/0x120
  [<ffffffff813c070a>] blk_mq_complete_request+0x2a/0x30
  [<ffffffffa0039066>] virtblk_done+0x66/0xe0 [virtio_blk]
  [<ffffffffa002535a>] vring_interrupt+0x3a/0xa0 [virtio_ring]
  [<ffffffff81116177>] handle_irq_event_percpu+0x77/0x340
  [<ffffffff8111647d>] handle_irq_event+0x3d/0x60
  [<ffffffff81119436>] handle_edge_irq+0x66/0x130
  [<ffffffff8101c3e4>] handle_irq+0x84/0x150
  [<ffffffff818146ad>] do_IRQ+0x4d/0xe0
  [<ffffffff818122f2>] common_interrupt+0x72/0x72
  <EOI>  [<ffffffff8105f706>] ? native_safe_halt+0x6/0x10
  [<ffffffff81025454>] default_idle+0x24/0x230
  [<ffffffff81025f9f>] arch_cpu_idle+0xf/0x20
  [<ffffffff810f5adc>] cpu_startup_entry+0x37c/0x7b0
  [<ffffffff8104df1b>] start_secondary+0x25b/0x300

This patch fixes it by punting the notification delivery through a
work item.  This ends up adding an extra pointer to kernfs_elem_attr
enlarging kernfs_node by a pointer, which is not ideal but not a very
big deal either.  If this turns out to be an actual issue, we can move
kernfs_elem_attr->size to kernfs_node->iattr later.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NJosh Boyer <jwboyer@fedoraproject.org>
Cc: Jens Axboe <axboe@kernel.dk>
Reviewed-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

ecca47ce

30 6月, 2014 1 次提交

kernfs: introduce kernfs_pin_sb() · 4e26445f

由 Li Zefan 提交于 6月 30, 2014

kernfs_pin_sb() tries to get a refcnt of the superblock.

This will be used by cgroupfs.

v2:
- make kernfs_pin_sb() return the superblock.
- drop kernfs_drop_sb().

tj: Updated the comment a bit.

[ This is a prerequisite for a bugfix. ]
Cc: <stable@vger.kernel.org> # 3.15
Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

4e26445f

03 6月, 2014 1 次提交

kernfs: move the last knowledge of sysfs out from kernfs · c9482a5b

由 Jianyu Zhan 提交于 4月 26, 2014

There is still one residue of sysfs remaining: the sb_magic
SYSFS_MAGIC. However this should be kernfs user specific,
so this patch moves it out. Kerrnfs user should specify their
magic number while mouting.
Signed-off-by: NJianyu Zhan <nasa4836@gmail.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c9482a5b

28 5月, 2014 1 次提交

kernfs: move the last knowledge of sysfs out from kernfs · 26fc9cd2

由 Jianyu Zhan 提交于 4月 26, 2014

There is still one residue of sysfs remaining: the sb_magic
SYSFS_MAGIC. However this should be kernfs user specific,
so this patch moves it out. Kerrnfs user should specify their
magic number while mouting.
Signed-off-by: NJianyu Zhan <nasa4836@gmail.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

26fc9cd2

13 5月, 2014 1 次提交

kernfs, sysfs, cgroup: restrict extra perm check on open to sysfs · 555724a8

由 Tejun Heo 提交于 5月 12, 2014

The kernfs open method - kernfs_fop_open() - inherited extra
permission checks from sysfs.  While the vfs layer allows ignoring the
read/write permissions checks if the issuer has CAP_DAC_OVERRIDE,
sysfs explicitly denied open regardless of the cap if the file doesn't
have any of the UGO perms of the requested access or doesn't implement
the requested operation.  It can be debated whether this was a good
idea or not but the behavior is too subtle and dangerous to change at
this point.

After cgroup got converted to kernfs, this extra perm check also got
applied to cgroup breaking libcgroup which opens write-only files with
O_RDWR as root.  This patch gates the extra open permission check with
a new flag KERNFS_ROOT_EXTRA_OPEN_PERM_CHECK and enables it for sysfs.
For sysfs, nothing changes.  For cgroup, root now can perform any
operation regardless of the permissions as it was before kernfs
conversion.  Note that kernfs still fails unimplemented operations
with -EINVAL.

While at it, add comments explaining KERNFS_ROOT flags.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NAndrey Wagin <avagin@gmail.com>
Tested-by: NAndrey Wagin <avagin@gmail.com>
Cc: Li Zefan <lizefan@huawei.com>
References: http://lkml.kernel.org/g/CANaxB-xUm3rJ-Cbp72q-rQJO5mZe1qK6qXsQM=vh0U8upJ44+A@mail.gmail.com
Fixes: 2bd59d48 ("cgroup: convert to kernfs")
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

555724a8

26 4月, 2014 2 次提交

kernfs: add back missing error check in kernfs_fop_mmap() · b44b2140

由 Tejun Heo 提交于 4月 20, 2014

While updating how mmap enabled kernfs files are handled by lockdep,
9b2db6e1 ("sysfs: bail early from kernfs_file_mmap() to avoid
spurious lockdep warning") inadvertently dropped error return check
from kernfs_file_mmap().  The intention was just dropping "if
(ops->mmap)" check as the control won't reach the point if the mmap
callback isn't implemented, but I mistakenly removed the error return
check together with it.

This led to Xorg crash on i810 which was reported and bisected to the
commit and then to the specific change by Tobias.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-and-bisected-by: NTobias Powalowski <tobias.powalowski@googlemail.com>
Tested-by: NTobias Powalowski <tobias.powalowski@googlemail.com>
References: http://lkml.kernel.org/g/533D01BD.1010200@googlemail.com
Cc: stable <stable@vger.kernel.org> # 3.14
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

b44b2140

kernfs: fix a subdir count leak · c1befb88

由 Jianyu Zhan 提交于 4月 17, 2014

Currently kernfs_link_sibling() increates parent->dir.subdirs before
adding the node into parent's chidren rb tree.

Because it is possible that kernfs_link_sibling() couldn't find
a suitable slot and bail out, this leads to a mismatch between
elevated subdir count with actual children node numbers.

This patches fix this problem, by moving the subdir accouting
after the actual addtion happening.
Signed-off-by: NJianyu Zhan <nasa4836@gmail.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

c1befb88

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功