提交 · 11fbf53d66ec302fe50b06bd7cb4863dbb98775a · openeuler / raspberrypi-kernel

27 4月, 2017 1 次提交

fs: constify tree_descr arrays passed to simple_fill_super() · cda37124

由 Eric Biggers 提交于 3月 25, 2017

simple_fill_super() is passed an array of tree_descr structures which
describe the files to create in the filesystem's root directory.  Since
these arrays are never modified intentionally, they should be 'const' so
that they are placed in .rodata and benefit from memory protection.
This patch updates the function signature and all users, and also
constifies tree_descr.name.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

cda37124

03 2月, 2017 1 次提交

debugfs: add debugfs_lookup() · a7c5437b

由 Omar Sandoval 提交于 1月 31, 2017

We don't always have easy access to the dentry of a file or directory we
created in debugfs. Add a helper which allows us to get a dentry we
previously created.

The motivation for this change is a problem with blktrace and the blk-mq
debugfs entries introduced in 07e4fead ("blk-mq: create debugfs
directory tree"). Namely, in some cases, the directory that blktrace
needs to create may already exist, but in other cases, it may not. We
_could_ rely on a bunch of implied knowledge to decide whether to create
the directory or not, but it's much cleaner on our end to just look it
up.
Signed-off-by: NOmar Sandoval <osandov@fb.com>
Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NJens Axboe <axboe@fb.com>

a7c5437b

01 2月, 2017 1 次提交

fs: Better permission checking for submounts · 93faccbb

由 Eric W. Biederman 提交于 2月 01, 2017

To support unprivileged users mounting filesystems two permission
checks have to be performed: a test to see if the user allowed to
create a mount in the mount namespace, and a test to see if
the user is allowed to access the specified filesystem.

The automount case is special in that mounting the original filesystem
grants permission to mount the sub-filesystems, to any user who
happens to stumble across the their mountpoint and satisfies the
ordinary filesystem permission checks.

Attempting to handle the automount case by using override_creds
almost works.  It preserves the idea that permission to mount
the original filesystem is permission to mount the sub-filesystem.
Unfortunately using override_creds messes up the filesystems
ordinary permission checks.

Solve this by being explicit that a mount is a submount by introducing
vfs_submount, and using it where appropriate.

vfs_submount uses a new mount internal mount flags MS_SUBMOUNT, to let
sget and friends know that a mount is a submount so they can take appropriate
action.

sget and sget_userns are modified to not perform any permission checks
on submounts.

follow_automount is modified to stop using override_creds as that
has proven problemantic.

do_mount is modified to always remove the new MS_SUBMOUNT flag so
that we know userspace will never by able to specify it.

autofs4 is modified to stop using current_real_cred that was put in
there to handle the previous version of submount permission checking.

cifs is modified to pass the mountpoint all of the way down to vfs_submount.

debugfs is modified to pass the mountpoint all of the way down to
trace_automount by adding a new parameter.  To make this change easier
a new typedef debugfs_automount_t is introduced to capture the type of
the debugfs automount function.

Cc: stable@vger.kernel.org
Fixes: 069d5ac9 ("autofs:  Fix automounts by using current_real_cred()->uid")
Fixes: aeaa4a79 ("fs: Call d_automount with the filesystems creds")
Reviewed-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: NSeth Forshee <seth.forshee@canonical.com>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

93faccbb

28 9月, 2016 1 次提交

fs: Replace current_fs_time() with current_time() · c2050a45

由 Deepa Dinamani 提交于 9月 14, 2016

current_fs_time() uses struct super_block* as an argument.
As per Linus's suggestion, this is changed to take struct
inode* as a parameter instead. This is because the function
is primarily meant for vfs inode timestamps.
Also the function was renamed as per Arnd's suggestion.

Change all calls to current_fs_time() to use the new
current_time() function instead. current_fs_time() will be
deleted.
Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

c2050a45

27 9月, 2016 1 次提交

libfs: support RENAME_NOREPLACE in simple_rename() · e0e0be8a

由 Miklos Szeredi 提交于 9月 27, 2016

This is trivial to do:

 - add flags argument to simple_rename()
 - check if flags doesn't have any other than RENAME_NOREPLACE
 - assign simple_rename() to .rename2 instead of .rename

Filesystems converted:

hugetlbfs, ramfs, bpf.

Debugfs uses simple_rename() to implement debugfs_rename(), which is for
debugfs instances to rename files internally, not for userspace filesystem
access.  For this case pass zero flags to simple_rename().
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>

e0e0be8a

30 5月, 2016 1 次提交
- A
  debugfs: ->d_parent is never NULL or negative · acc29fb8
  由 Al Viro 提交于 5月 29, 2016
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  acc29fb8
13 4月, 2016 4 次提交

debugfs: Make automount point inodes permanently empty · 87243deb

由 Seth Forshee 提交于 3月 09, 2016

Starting with 4.1 the tracing subsystem has its own filesystem
which is automounted in the tracing subdirectory of debugfs.
Prior to this debugfs could be bind mounted in a cloned mount
namespace, but if tracefs has been mounted under debugfs this
now fails because there is a locked child mount. This creates
a regression for container software which bind mounts debugfs
to satisfy the assumption of some userspace software.

In other pseudo filesystems such as proc and sysfs we're already
creating mountpoints like this in such a way that no dirents can
be created in the directories, allowing them to be exceptions to
some MNT_LOCKED tests. In fact we're already do this for the
tracefs mountpoint in sysfs.

Do the same in debugfs_create_automount(), since the intention
here is clearly to create a mountpoint. This fixes the regression,
as locked child mounts on permanently empty directories do not
cause a bind mount to fail.

Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: NSeth Forshee <seth.forshee@canonical.com>
Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

87243deb

debugfs: add support for self-protecting attribute file fops · c6468808

由 Nicolai Stange 提交于 3月 22, 2016

In order to protect them against file removal issues, debugfs_create_file()
creates a lifetime managing proxy around each struct file_operations
handed in.

In cases where this struct file_operations is able to manage file lifetime
by itself already, the proxy created by debugfs is a waste of resources.

The most common class of struct file_operations given to debugfs are those
defined by means of the DEFINE_SIMPLE_ATTRIBUTE() macro.

Introduce a DEFINE_DEBUGFS_ATTRIBUTE() macro to allow any
struct file_operations of this class to be easily made file lifetime aware
and thus, to be operated unproxied.

Specifically, introduce debugfs_attr_read() and debugfs_attr_write()
which wrap simple_attr_read() and simple_attr_write() under the protection
of a debugfs_use_file_start()/debugfs_use_file_finish() pair.

Make DEFINE_DEBUGFS_ATTRIBUTE() set the defined struct file_operations'
->read() and ->write() members to these wrappers.

Export debugfs_create_file_unsafe() in order to allow debugfs users to
create their files in non-proxying operation mode.
Signed-off-by: NNicolai Stange <nicstange@gmail.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

c6468808

debugfs: prevent access to removed files' private data · 49d200de

由 Nicolai Stange 提交于 3月 22, 2016

Upon return of debugfs_remove()/debugfs_remove_recursive(), it might
still be attempted to access associated private file data through
previously opened struct file objects. If that data has been freed by
the caller of debugfs_remove*() in the meanwhile, the reading/writing
process would either encounter a fault or, if the memory address in
question has been reassigned again, unrelated data structures could get
overwritten.

However, since debugfs files are seldomly removed, usually from module
exit handlers only, the impact is very low.

Currently, there are ~1000 call sites of debugfs_create_file() spread
throughout the whole tree and touching all of those struct file_operations
in order to make them file removal aware by means of checking the result of
debugfs_use_file_start() from within their methods is unfeasible.

Instead, wrap the struct file_operations by a lifetime managing proxy at
file open:
- In debugfs_create_file(), the original fops handed in has got stashed
  away in ->d_fsdata already.
- In debugfs_create_file(), install a proxy file_operations factory,
  debugfs_full_proxy_file_operations, at ->i_fop.

This proxy factory has got an ->open() method only. It carries out some
lifetime checks and if successful, dynamically allocates and sets up a new
struct file_operations proxy at ->f_op. Afterwards, it forwards to the
->open() of the original struct file_operations in ->d_fsdata, if any.

The dynamically set up proxy at ->f_op has got a lifetime managing wrapper
set for each of the methods defined in the original struct file_operations
in ->d_fsdata.

Its ->release()er frees the proxy again and forwards to the original
->release(), if any.

In order not to mislead the VFS layer, it is strictly necessary to leave
those fields blank in the proxy that have been NULL in the original
struct file_operations also, i.e. aren't supported. This is why there is a
need for dynamically allocated proxies. The choice made not to allocate a
proxy instance for every dentry at file creation, but for every
struct file object instantiated thereof is justified by the expected usage
pattern of debugfs, namely that in general very few files get opened more
than once at a time.

The wrapper methods set in the struct file_operations implement lifetime
managing by means of the SRCU protection facilities already in place for
debugfs:
They set up a SRCU read side critical section and check whether the dentry
is still alive by means of debugfs_use_file_start(). If so, they forward
the call to the original struct file_operation stored in ->d_fsdata, still
under the protection of the SRCU read side critical section.
This SRCU read side critical section prevents any pending debugfs_remove()
and friends to return to their callers. Since a file's private data must
only be freed after the return of debugfs_remove(), the ongoing proxied
call is guarded against any file removal race.

If, on the other hand, the initial call to debugfs_use_file_start() detects
that the dentry is dead, the wrapper simply returns -EIO and does not
forward the call. Note that the ->poll() wrapper is special in that its
signature does not allow for the return of arbitrary -EXXX values and thus,
POLLHUP is returned here.

In order not to pollute debugfs with wrapper definitions that aren't ever
needed, I chose not to define a wrapper for every struct file_operations
method possible. Instead, a wrapper is defined only for the subset of
methods which are actually set by any debugfs users.
Currently, these are:

  ->llseek()
  ->read()
  ->write()
  ->unlocked_ioctl()
  ->poll()

The ->release() wrapper is special in that it does not protect the original
->release() in any way from dead files in order not to leak resources.
Thus, any ->release() handed to debugfs must implement file lifetime
management manually, if needed.
For only 33 out of a total of 434 releasers handed in to debugfs, it could
not be verified immediately whether they access data structures that might
have been freed upon a debugfs_remove() return in the meanwhile.

Export debugfs_use_file_start() and debugfs_use_file_finish() in order to
allow any ->release() to manually implement file lifetime management.

For a set of common cases of struct file_operations implemented by the
debugfs_core itself, future patches will incorporate file lifetime
management directly within those in order to allow for their unproxied
operation. Rename the original, non-proxying "debugfs_create_file()" to
"debugfs_create_file_unsafe()" and keep it for future internal use by
debugfs itself. Factor out code common to both into the new
__debugfs_create_file().
Signed-off-by: NNicolai Stange <nicstange@gmail.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

49d200de

debugfs: prevent access to possibly dead file_operations at file open · 9fd4dcec

由 Nicolai Stange 提交于 3月 22, 2016

Nothing prevents a dentry found by path lookup before a return of
__debugfs_remove() to actually get opened after that return. Now, after
the return of __debugfs_remove(), there are no guarantees whatsoever
regarding the memory the corresponding inode's file_operations object
had been kept in.

Since __debugfs_remove() is seldomly invoked, usually from module exit
handlers only, the race is hard to trigger and the impact is very low.

A discussion of the problem outlined above as well as a suggested
solution can be found in the (sub-)thread rooted at

  http://lkml.kernel.org/g/20130401203445.GA20862@ZenIV.linux.org.uk
  ("Yet another pipe related oops.")

Basically, Greg KH suggests to introduce an intermediate fops and
Al Viro points out that a pointer to the original ones may be stored in
->d_fsdata.

Follow this line of reasoning:
- Add SRCU as a reverse dependency of DEBUG_FS.
- Introduce a srcu_struct object for the debugfs subsystem.
- In debugfs_create_file(), store a pointer to the original
  file_operations object in ->d_fsdata.
- Make debugfs_remove() and debugfs_remove_recursive() wait for a
  SRCU grace period after the dentry has been delete()'d and before they
  return to their callers.
- Introduce an intermediate file_operations object named
  "debugfs_open_proxy_file_operations". It's ->open() functions checks,
  under the protection of a SRCU read lock, whether the dentry is still
  alive, i.e. has not been d_delete()'d and if so, tries to acquire a
  reference on the owning module.
  On success, it sets the file object's ->f_op to the original
  file_operations and forwards the ongoing open() call to the original
  ->open().
- For clarity, rename the former debugfs_file_operations to
  debugfs_noop_file_operations -- they are in no way canonical.

The choice of SRCU over "normal" RCU is justified by the fact, that the
former may also be used to protect ->i_private data from going away
during the execution of a file's readers and writers which may (and do)
sleep.

Finally, introduce the fs/debugfs/internal.h header containing some
declarations internal to the debugfs implementation.
Signed-off-by: NNicolai Stange <nicstange@gmail.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

9fd4dcec

30 3月, 2016 2 次提交

fs: debugfs: Replace CURRENT_TIME by current_fs_time() · 1b48b530

由 Deepa Dinamani 提交于 2月 22, 2016

CURRENT_TIME macro is not appropriate for filesystems as it
doesn't use the right granularity for filesystem timestamps.
Use current_fs_time() instead.
Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

1b48b530

debugfs: fix inode i_nlink references for automount dentry · a8f324a4

由 Roman Pen 提交于 2月 09, 2016

Directory inodes should start off with i_nlink == 2 (one extra ref
for "." entry).  debugfs_create_automount() increases neither the
i_nlink reference for current inode nor for parent inode.

On attempt to remove the automount dentry, kernel complains:

  [   86.288070] WARNING: CPU: 1 PID: 3616 at fs/inode.c:273 drop_nlink+0x3e/0x50()
  [   86.288461] Modules linked in: debugfs_example2(O-)
  [   86.288745] CPU: 1 PID: 3616 Comm: rmmod Tainted: G           O    4.4.0-rc3-next-20151207+ #135
  [   86.289197] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150617_082717-anatol 04/01/2014
  [   86.289696]  ffffffff81be05c9 ffff8800b9e6fda0 ffffffff81352e2c 0000000000000000
  [   86.290110]  ffff8800b9e6fdd8 ffffffff81065142 ffff8801399175e8 ffff8800bb78b240
  [   86.290507]  ffff8801399175e8 ffff8800b73d7898 ffff8800b73d7840 ffff8800b9e6fde8
  [   86.290933] Call Trace:
  [   86.291080]  [<ffffffff81352e2c>] dump_stack+0x4e/0x82
  [   86.291340]  [<ffffffff81065142>] warn_slowpath_common+0x82/0xc0
  [   86.291640]  [<ffffffff8106523a>] warn_slowpath_null+0x1a/0x20
  [   86.291932]  [<ffffffff811ae62e>] drop_nlink+0x3e/0x50
  [   86.292208]  [<ffffffff811ba35b>] simple_unlink+0x4b/0x60
  [   86.292481]  [<ffffffff811ba3a7>] simple_rmdir+0x37/0x50
  [   86.292748]  [<ffffffff812d9808>] __debugfs_remove.part.16+0xa8/0xd0
  [   86.293082]  [<ffffffff812d9a0b>] debugfs_remove_recursive+0xdb/0x1c0
  [   86.293406]  [<ffffffffa00004dd>] cleanup_module+0x2d/0x3b [debugfs_example2]
  [   86.293762]  [<ffffffff810d959b>] SyS_delete_module+0x16b/0x220
  [   86.294077]  [<ffffffff818ef857>] entry_SYSCALL_64_fastpath+0x12/0x6a
  [   86.294405] ---[ end trace c9fc53353fe14a36 ]---
  [   86.294639] ------------[ cut here ]------------

To reproduce the issue it is enough to invoke these lines:

     autom = debugfs_create_automount("automount", NULL, vfsmount_cb, data);
     BUG_ON(IS_ERR_OR_NULL(autom));
     debugfs_remove(autom);

The issue is fixed by increasing inode i_nlink references for current
and parent inodes.
Signed-off-by: NRoman Pen <r.peniaev@gmail.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

a8f324a4

23 1月, 2016 1 次提交

wrappers for ->i_mutex access · 5955102c

由 Al Viro 提交于 1月 22, 2016

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

5955102c

11 11月, 2015 1 次提交

debugfs: fix refcount imbalance in start_creating · 0ee9608c

由 Daniel Borkmann 提交于 11月 05, 2015

In debugfs' start_creating(), we pin the file system to safely access
its root. When we failed to create a file, we unpin the file system via
failed_creating() to release the mount count and eventually the reference
of the vfsmount.

However, when we run into an error during lookup_one_len() when still
in start_creating(), we only release the parent's mutex but not so the
reference on the mount. Looks like it was done in the past, but after
splitting portions of __create_file() into start_creating() and
end_creating() via 190afd81 ("debugfs: split the beginning and the
end of __create_file() off"), this seemed missed. Noticed during code
review.

Fixes: 190afd81 ("debugfs: split the beginning and the end of __create_file() off")
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

0ee9608c

04 10月, 2015 1 次提交

debugfs: document that debugfs_remove*() accepts NULL and error values · 398dc4ad

由 Ulf Magnusson 提交于 9月 07, 2015

According to commit a59d6293 ("debugfs: change parameter check in
debugfs_remove() functions"), this is meant to make cleanup easier for
callers. In that case it ought to be documented.
Signed-off-by: NUlf Magnusson <ulfalizer@gmail.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

398dc4ad

29 9月, 2015 1 次提交

debugfs: document that debugfs_remove*() accepts NULL and error values · 636db7a9

由 Ulf Magnusson 提交于 9月 04, 2015

According to commit a59d6293 ("debugfs: change parameter check in
debugfs_remove() functions"), this is meant to make cleanup easier for
callers. In that case it ought to be documented.
Signed-off-by: NUlf Magnusson <ulfalizer@gmail.com>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

636db7a9

01 7月, 2015 1 次提交

sysfs: Create mountpoints with sysfs_create_mount_point · f9bb4882

由 Eric W. Biederman 提交于 5月 13, 2015

This allows for better documentation in the code and
it allows for a simpler and fully correct version of
fs_fully_visible to be written.

The mount points converted and their filesystems are:
/sys/hypervisor/s390/       s390_hypfs
/sys/kernel/config/         configfs
/sys/kernel/debug/          debugfs
/sys/firmware/efi/efivars/  efivarfs
/sys/fs/fuse/connections/   fusectl
/sys/fs/pstore/             pstore
/sys/kernel/tracing/        tracefs
/sys/fs/cgroup/             cgroup
/sys/kernel/security/       securityfs
/sys/fs/selinux/            selinuxfs
/sys/fs/smackfs/            smackfs

Cc: stable@vger.kernel.org
Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

f9bb4882

24 6月, 2015 1 次提交
- A
  make simple_positive() public · dc3f4198
  由 Al Viro 提交于 5月 18, 2015
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  dc3f4198
11 5月, 2015 1 次提交

debugfs: switch to simple_follow_link() · 5723cb01

由 Al Viro 提交于 5月 02, 2015

Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

5723cb01

16 4月, 2015 2 次提交

VFS: normal filesystems (and lustre): d_inode() annotations · 2b0143b5

由 David Howells 提交于 3月 17, 2015

that's the bulk of filesystem drivers dealing with inodes of their own
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

2b0143b5

VFS: Fix up debugfs to use d_is_dir() in place of S_ISDIR() · 7ceab50c

由 David Howells 提交于 3月 05, 2015

Fix up debugfs to use d_is_dir(dentry) in place of
S_ISDIR(dentry->d_inode->i_mode).
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

7ceab50c

03 4月, 2015 1 次提交

debugfs: allow bad parent pointers to be passed in · c9e15f25

由 Greg KH 提交于 3月 30, 2015

If something went wrong with creating a debugfs file/symlink/directory,
that value could be passed down into debugfs again as a parent dentry.
To make caller code simpler, just error out if this happens, and don't
crash the kernel.
Reported-by: NAlex Elder <elder@linaro.org>
Reviewed-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: NAlex Elder <elder@linaro.org>

c9e15f25

23 2月, 2015 2 次提交

debugfs: leave freeing a symlink body until inode eviction · 0db59e59

由 Al Viro 提交于 2月 21, 2015

As it is, we have debugfs_remove() racing with symlink traversals.
Supply ->evict_inode() and do freeing there - inode will remain
pinned until we are done with the symlink body.

And rip the idiocy with checking if dentry is positive right after
we'd verified debugfs_positive(), which is a stronger check...

Cc: stable@vger.kernel.org
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

0db59e59

VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) · e36cb0b8

由 David Howells 提交于 1月 29, 2015

Convert the following where appropriate:

 (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).

 (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).

 (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry).  This is actually more
     complicated than it appears as some calls should be converted to
     d_can_lookup() instead.  The difference is whether the directory in
     question is a real dir with a ->lookup op or whether it's a fake dir with
     a ->d_automount op.

In some circumstances, we can subsume checks for dentry->d_inode not being
NULL into this, provided we the code isn't in a filesystem that expects
d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
use d_inode() rather than d_backing_inode() to get the inode pointer).

Note that the dentry type field may be set to something other than
DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
manages the fall-through from a negative dentry to a lower layer.  In such a
case, the dentry type of the negative union dentry is set to the same as the
type of the lower dentry.

However, if you know d_inode is not NULL at the call site, then you can use
the d_is_xxx() functions even in a filesystem.

There is one further complication: a 0,0 chardev dentry may be labelled
DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE.  Strictly, this was
intended for special directory entry types that don't have attached inodes.

The following perl+coccinelle script was used:

use strict;

my @callers;
open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
    die "Can't grep for S_ISDIR and co. callers";
@callers = <$fd>;
close($fd);
unless (@callers) {
    print "No matches\n";
    exit(0);
}

my @cocci = (
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISLNK(E->d_inode->i_mode)',
    '+ d_is_symlink(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISDIR(E->d_inode->i_mode)',
    '+ d_is_dir(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISREG(E->d_inode->i_mode)',
    '+ d_is_reg(E)' );

my $coccifile = "tmp.sp.cocci";
open($fd, ">$coccifile") || die $coccifile;
print($fd "$_\n") || die $coccifile foreach (@cocci);
close($fd);

foreach my $file (@callers) {
    chomp $file;
    print "Processing ", $file, "\n";
    system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
	die "spatch failed";
}

[AV: overlayfs parts skipped]
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

e36cb0b8

18 2月, 2015 1 次提交

debugfs: Provide a file creation function that also takes an initial size · e59b4e91

由 David Howells 提交于 1月 21, 2015

Provide a file creation function that also takes an initial size so that the
caller doesn't have to set i_size, thus meaning that we don't have to call
deal with ->d_inode in the callers.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

e59b4e91

26 1月, 2015 12 次提交
- D
  debugfs: Provide a file creation function that also takes an initial size · 163f9eb9
  由 David Howells 提交于 1月 21, 2015
```
Provide a file creation function that also takes an initial size so that the
caller doesn't have to set i_size, thus meaning that we don't have to call
deal with ->d_inode in the callers.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
```
  163f9eb9
- A
  new primitive: debugfs_create_automount() · 77b3da6e
  由 Al Viro 提交于 1月 25, 2015
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  77b3da6e
- A
  debugfs: split end_creating() into success and failure cases · 5233e311
  由 Al Viro 提交于 1月 25, 2015
```
... and don't bother with dput(dentry) in the former and with
dget(dentry) preceding all its calls.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  5233e311
- A
  debugfs: take mode-dependent parts of debugfs_get_inode() into callers · edac65ea
  由 Al Viro 提交于 1月 25, 2015
```
... and trim the arguments list
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  edac65ea
- A
  fold debugfs_mknod() into callers · 680b3024
  由 Al Viro 提交于 1月 25, 2015
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  680b3024
- A
  fold debugfs_create() into caller · 3473cde5
  由 Al Viro 提交于 1月 25, 2015
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  3473cde5
- A
  fold debugfs_mkdir() into caller · 02538a75
  由 Al Viro 提交于 1月 25, 2015
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  02538a75
- A
  debugfs_mknod(): get rid useless arguments · 160f7592
  由 Al Viro 提交于 1月 25, 2015
```
dev is always zero, dir was only used to get its ->i_sb, which is
equal to ->d_sb of dentry...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  160f7592
- A
  fold debugfs_link() into caller · 9b73fab0
  由 Al Viro 提交于 1月 25, 2015
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  9b73fab0
- A
  debugfs: kill __create_file() · ad5abd5b
  由 Al Viro 提交于 1月 25, 2015
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  ad5abd5b
- A
  debugfs: split the beginning and the end of __create_file() off · 190afd81
  由 Al Viro 提交于 1月 25, 2015
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  190afd81
- A
  debugfs_{mkdir,create,link}(): get rid of redundant argument · e09ddf36
  由 Al Viro 提交于 1月 25, 2015
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  e09ddf36
04 11月, 2014 1 次提交
- A
  move d_rcu from overlapping d_child to overlapping d_alias · 946e51f2
  由 Al Viro 提交于 10月 26, 2014
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  946e51f2
10 7月, 2014 2 次提交

fs: debugfs: remove trailing whitespace · 88e412ea

由 Rahul Bedarkar 提交于 6月 06, 2014

fixes checkpatch.pl trailing whitespace errors
Signed-off-by: NRahul Bedarkar <rahulbedarkar89@gmail.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

88e412ea

debugfs: Fix corrupted loop in debugfs_remove_recursive · 485d4402

由 Steven Rostedt 提交于 6月 09, 2014

[ I'm currently running my tests on it now, and so far, after a few
 hours it has yet to blow up. I'll run it for 24 hours which it never
 succeeded in the past. ]

The tracing code has a way to make directories within the debugfs file
system as well as deleting them using mkdir/rmdir in the instance
directory. This is very limited in functionality, such as there is
no renames, and the parent directory "instance" can not be modified.
The tracing code creates the instance directory from the debugfs code
and then replaces the dentry->d_inode->i_op with its own to allow
for mkdir/rmdir to work.

When these are called, the d_entry and inode locks need to be released
to call the instance creation and deletion code. That code has its own
accounting and locking to serialize everything to prevent multiple
users from causing harm. As the parent "instance" directory can not
be modified this simplifies things.

I created a stress test that creates several threads that randomly
creates and deletes directories thousands of times a second. The code
stood up to this test and I submitted it a while ago.

Recently I added a new test that adds readers to the mix. While the
instance directories were being added and deleted, readers would read
from these directories and even enable tracing within them. This test
was able to trigger a bug:

 general protection fault: 0000 [#1] PREEMPT SMP
 Modules linked in: ...
 CPU: 3 PID: 17789 Comm: rmdir Tainted: G        W     3.15.0-rc2-test+ #41
 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
 task: ffff88003786ca60 ti: ffff880077018000 task.ti: ffff880077018000
 RIP: 0010:[<ffffffff811ed5eb>]  [<ffffffff811ed5eb>] debugfs_remove_recursive+0x1bd/0x367
 RSP: 0018:ffff880077019df8  EFLAGS: 00010246
 RAX: 0000000000000002 RBX: ffff88006f0fe490 RCX: 0000000000000000
 RDX: dead000000100058 RSI: 0000000000000246 RDI: ffff88003786d454
 RBP: ffff88006f0fe640 R08: 0000000000000628 R09: 0000000000000000
 R10: 0000000000000628 R11: ffff8800795110a0 R12: ffff88006f0fe640
 R13: ffff88006f0fe640 R14: ffffffff81817d0b R15: ffffffff818188b7
 FS:  00007ff13ae24700(0000) GS:ffff88007d580000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 0000003054ec7be0 CR3: 0000000076d51000 CR4: 00000000000007e0
 Stack:
  ffff88007a41ebe0 dead000000100058 00000000fffffffe ffff88006f0fe640
  0000000000000000 ffff88006f0fe678 ffff88007a41ebe0 ffff88003793a000
  00000000fffffffe ffffffff810bde82 ffff88006f0fe640 ffff88007a41eb28
 Call Trace:
  [<ffffffff810bde82>] ? instance_rmdir+0x15b/0x1de
  [<ffffffff81132e2d>] ? vfs_rmdir+0x80/0xd3
  [<ffffffff81132f51>] ? do_rmdir+0xd1/0x139
  [<ffffffff8124ad9e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
  [<ffffffff814fea62>] ? system_call_fastpath+0x16/0x1b
 Code: fe ff ff 48 8d 75 30 48 89 df e8 c9 fd ff ff 85 c0 75 13 48 c7 c6 b8 cc d2 81 48 c7 c7 b0 cc d2 81 e8 8c 7a f5 ff 48 8b 54 24 08 <48> 8b 82 a8 00 00 00 48 89 d3 48 2d a8 00 00 00 48 89 44 24 08
 RIP  [<ffffffff811ed5eb>] debugfs_remove_recursive+0x1bd/0x367
  RSP <ffff880077019df8>

It took a while, but every time it triggered, it was always in the
same place:

	list_for_each_entry_safe(child, next, &parent->d_subdirs, d_u.d_child) {

Where the child->d_u.d_child seemed to be corrupted.  I added lots of
trace_printk()s to see what was wrong, and sure enough, it was always
the child's d_u.d_child field. I looked around to see what touches
it and noticed that in __dentry_kill() which calls dentry_free():

static void dentry_free(struct dentry *dentry)
{
	/* if dentry was never visible to RCU, immediate free is OK */
	if (!(dentry->d_flags & DCACHE_RCUACCESS))
		__d_free(&dentry->d_u.d_rcu);
	else
		call_rcu(&dentry->d_u.d_rcu, __d_free);
}

I also noticed that __dentry_kill() unlinks the child->d_u.child
under the parent->d_lock spin_lock.

Looking back at the loop in debugfs_remove_recursive() it never takes the
parent->d_lock to do the list walk. Adding more tracing, I was able to
prove this was the issue:

 ftrace-t-15385   1.... 246662024us : dentry_kill <ffffffff81138b91>: free ffff88006d573600
    rmdir-15409   2.... 246662024us : debugfs_remove_recursive <ffffffff811ec7e5>: child=ffff88006d573600 next=dead000000100058

The dentry_kill freed ffff88006d573600 just as the remove recursive was walking
it.

In order to fix this, the list walk needs to be modified a bit to take
the parent->d_lock. The safe version is no longer necessary, as every
time we remove a child, the parent->d_lock must be released and the
list walk must start over. Each time a child is removed, even though it
may still be on the list, it should be skipped by the first check
in the loop:

		if (!debugfs_positive(child))
			continue;

Cc: stable@vger.kernel.org
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

485d4402