Commits · 09cbfeaf1a5a67bfb3201e0c83c810cecb2efa5a · openeuler / raspberrypi-kernel

05 Apr, 2016 1 commit

mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf

Committed by Kirill A. Shutemov on Apr 01, 2016

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
ago with the promise that one day it would be possible to implement the page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized, and it is unlikely that it ever will.

We have many places where PAGE_CACHE_SIZE is assumed to be equal to
PAGE_SIZE.  And it's a constant source of confusion on whether a
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
the script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is a revert of the changes to the
PAGE_CACHE_ALIGN definition: we are going to drop it later.

There are a few places in the code that coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation will
also be addressed in a separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

09cbfeaf
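
To illustrate why the conversion above is mechanical, here is a minimal userspace sketch. It assumes the removed macros were plain aliases of their PAGE_* counterparts, as the commit message implies; the PAGE_SHIFT value and the helper names `old_pgoff()` and `new_pgoff()` are made up for the example.

```c
#include <assert.h>

/* Illustrative value only; the real PAGE_SHIFT is architecture-specific. */
#define PAGE_SHIFT        12
#define PAGE_SIZE         (1UL << PAGE_SHIFT)

/* Assumption: the removed macros were simple aliases like these. */
#define PAGE_CACHE_SHIFT  PAGE_SHIFT
#define PAGE_CACHE_SIZE   PAGE_SIZE

/* With equal sizes, the two families of constants cannot diverge. */
static_assert(PAGE_CACHE_SIZE == PAGE_SIZE, "page cache pages are ordinary pages");

/* Old style: "convert" a page index to a page-cache index. */
static unsigned long old_pgoff(unsigned long index)
{
	return index >> (PAGE_CACHE_SHIFT - PAGE_SHIFT);	/* a shift by zero */
}

/* New style after the patch: the no-op shift is simply dropped. */
static unsigned long new_pgoff(unsigned long index)
{
	return index;
}
```

Both helpers return the same value for every input, which is what lets a semantic patch rewrite the expressions without changing behavior.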

16 Mar, 2016 1 commit

fuse: return partial success from fuse_direct_io() · 742f9927

Committed by Ashish Samant on Mar 14, 2016

If a user calls writev/readv in direct io mode with partially valid data
in the iovec array such that any vector other than the first one in the
array contains invalid data, we currently return the error for the invalid
iovec.

Instead, we should return the number of bytes already written/read and not
the error as we do in the non direct_io case.
Reported-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

742f9927
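
The return-value rule described above can be sketched in userspace as follows. This is a simplified model, not the kernel code: `struct seg_sketch` and `partial_io_sketch()` are hypothetical names, and a `valid` flag stands in for a user pointer that may fault.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical segment: 'valid' stands in for "the user buffer is accessible". */
struct seg_sketch {
	size_t len;
	int valid;
};

/*
 * Process segments in order. On the first failure, report the bytes
 * already transferred if there are any; report the error only when
 * nothing was transferred at all.
 */
static ssize_t partial_io_sketch(const struct seg_sketch *segs, size_t nsegs)
{
	ssize_t done = 0;
	int err = 0;

	for (size_t i = 0; i < nsegs; i++) {
		if (!segs[i].valid) {
			err = -EFAULT;
			break;
		}
		done += (ssize_t)segs[i].len;
	}
	return done > 0 ? done : err;
}
```

With a valid 100-byte segment followed by an invalid one, the sketch returns 100 rather than -EFAULT, which matches the behavior the commit message asks for.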

14 Mar, 2016 2 commits

fuse: Add reference counting for fuse_io_priv · 744742d6

Committed by Seth Forshee on Mar 11, 2016

The 'reqs' member of fuse_io_priv serves two purposes. First is to track
the number of outstanding async requests to the server and to signal that
the io request is completed. The second is to be a reference count on the
structure to know when it can be freed.

For sync io requests these purposes can be at odds. fuse_direct_IO() wants
to block until the request is done, and since the signal is sent when
'reqs' reaches 0 it cannot keep a reference to the object. Yet it needs to
use the object after the userspace server has completed processing
requests. This leads to some handshaking and special casing that is
needlessly complicated and responsible for at least one race condition.

It's much cleaner and safer to maintain a separate reference count for the
object lifecycle and to let 'reqs' just be a count of outstanding requests
to the userspace server. Then we can know for sure when it is safe to free
the object without any handshaking or special cases.

The catch here is that most of the time these objects are stack allocated
and should not be freed. Initializing these objects with a single reference
that is never released prevents accidental attempts to free the objects.

Fixes: 9d5722b7 ("fuse: handle synchronous iocbs internally")
Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

744742d6
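
A rough userspace sketch of the idea: one counter for object lifetime, a separate one for outstanding requests. Everything except the `reqs` name is an assumption made for illustration (`struct io_priv_sketch`, `refcnt`, `io_get()`, `io_put()`), and the counter is a plain int where kernel code would need an atomic reference count.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the structure discussed above. */
struct io_priv_sketch {
	int refcnt;	/* object lifetime only */
	int reqs;	/* outstanding requests only */
};

static void io_get(struct io_priv_sketch *io)
{
	io->refcnt++;
}

/* Returns 1 if this call dropped the last reference and freed the object. */
static int io_put(struct io_priv_sketch *io)
{
	assert(io->refcnt > 0);
	if (--io->refcnt == 0) {
		free(io);
		return 1;
	}
	return 0;
}
```

A stack-allocated object starts with `refcnt = 1` and that initial reference is never dropped, so balanced `io_get()`/`io_put()` pairs can never reach zero and `free()` is never called on stack memory. A heap-allocated object drops its initial reference when its owner is done with it.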

fuse: do not use iocb after it may have been freed · 7cabc61e

Committed by Robert Doebbelin on Mar 07, 2016

There's a race in fuse_direct_IO(), whereby is_sync_kiocb() is called on an
iocb that could have been freed if async io has already completed. The fix
in this case is simple and obvious: cache the result before starting io.

It was discovered by KASan:

kernel: ==================================================================
kernel: BUG: KASan: use after free in fuse_direct_IO+0xb1a/0xcc0 at addr ffff88036c414390
Signed-off-by: Robert Doebbelin <robert@quobyte.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: bcba24cc ("fuse: enable asynchronous processing direct IO")
Cc: <stable@vger.kernel.org> # 3.10+

7cabc61e
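
The "cache the result before starting io" pattern can be sketched like this. The types and functions are hypothetical (`struct iocb_sketch`, `submit_and_poison()`, `cached_sync_io_sketch()`); overwriting the field stands in for the memory being freed and reused once async completion has run.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical control block; 'sync' is what the sync check would read. */
struct iocb_sketch {
	bool sync;
};

/*
 * Simulated submission: by the time it returns, async completion may have
 * released the iocb. Scribbling over it models that invalidation.
 */
static long submit_and_poison(struct iocb_sketch *iocb)
{
	iocb->sync = false;
	return 42;
}

static long cached_sync_io_sketch(struct iocb_sketch *iocb, bool *waited)
{
	/* Read everything needed from the iocb BEFORE starting the I/O. */
	bool is_sync = iocb->sync;

	long ret = submit_and_poison(iocb);

	/* From here on, only the cached copy is used, never iocb itself. */
	*waited = is_sync;
	return ret;
}
```

Had the function read `iocb->sync` after submission, it would have seen the poisoned value, which is the userspace analogue of the use-after-free in the report.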

23 Jan, 2016 1 commit

wrappers for ->i_mutex access · 5955102c

Committed by Al Viro on Jan 22, 2016

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5955102c

15 Jan, 2016 1 commit

kmemcg: account certain kmem allocations to memcg · 5d097056

Committed by Vladimir Davydov on Jan 14, 2016

Mark those kmem allocations that are known to be easily triggered from
userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
memcg.  For the list, see below:

 - threadinfo
 - task_struct
 - task_delay_info
 - pid
 - cred
 - mm_struct
 - vm_area_struct and vm_region (nommu)
 - anon_vma and anon_vma_chain
 - signal_struct
 - sighand_struct
 - fs_struct
 - files_struct
 - fdtable and fdtable->full_fds_bits
 - dentry and external_name
 - inode for all filesystems. This is the most tedious part, because
   most filesystems overwrite the alloc_inode method.

The list is far from complete, so feel free to add more objects.
Nevertheless, it should be close to "account everything" approach and
keep most workloads within bounds.  Malevolent users will be able to
breach the limit, but this was possible even with the former "account
everything" approach (simply because it did not account everything in
fact).

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

5d097056

31 Dec, 2015 1 commit

switch ->get_link() to delayed_call, kill ->put_link() · fceef393

Committed by Al Viro on Dec 29, 2015

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fceef393

30 Dec, 2015 1 commit

kill free_page_put_link() · cd3417c8

Committed by Al Viro on Dec 29, 2015

all callers are better off with kfree_put_link()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

cd3417c8

09 Dec, 2015 1 commit

replace ->follow_link() with new method that could stay in RCU mode · 6b255391

Committed by Al Viro on Nov 17, 2015

new method: ->get_link(); replacement of ->follow_link().  The differences
are:
	* inode and dentry are passed separately
	* might be called both in RCU and non-RCU mode;
	  the former is indicated by passing it a NULL dentry.
	* when called that way it isn't allowed to block
	  and should return ERR_PTR(-ECHILD) if it needs to be called
	  in non-RCU mode.

It's a flagday change - the old method is gone, all in-tree instances
converted.  Conversion isn't hard; that said, so far very few instances
do not immediately bail out when called in RCU mode.  That'll change
in the next commits.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6b255391
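
The calling convention listed above can be sketched in userspace as follows. This is only a model: `err_ptr()` and `ptr_err()` imitate the kernel's ERR_PTR()/PTR_ERR() helpers, the struct types and `read_target_blocking()` are hypothetical, and the real method takes further arguments that are omitted here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace imitations of the kernel's ERR_PTR()/PTR_ERR() helpers. */
static inline void *err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long ptr_err(const void *p) { return (long)(intptr_t)p; }

struct dentry_sketch { int unused; };
struct inode_sketch {
	const char *cached_target;	/* link body already in memory, or NULL */
};

/* Hypothetical slow path that may block, e.g. reading the link from disk. */
static const char *read_target_blocking(struct inode_sketch *inode)
{
	inode->cached_target = "/hypothetical/target";
	return inode->cached_target;
}

/* A NULL dentry signals RCU mode, in which blocking is not allowed. */
static const char *get_link_sketch(struct dentry_sketch *dentry,
				   struct inode_sketch *inode)
{
	if (inode->cached_target)
		return inode->cached_target;	/* safe in both modes */
	if (!dentry)
		return err_ptr(-ECHILD);	/* ask to be called in non-RCU mode */
	return read_target_blocking(inode);
}
```

The caller is expected to retry in non-RCU mode, with a real dentry, when it sees the -ECHILD error pointer.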

10 Nov, 2015 3 commits

fuse: add support for SEEK_HOLE and SEEK_DATA in lseek · 0b5da8db

Committed by Ravishankar N on Jun 30, 2015

A useful performance improvement for accessing virtual machine images
via FUSE mount.

See https://bugzilla.redhat.com/show_bug.cgi?id=1220173 for a use-case
for glusterFS.
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>

0b5da8db

fuse: break infinite loop in fuse_fill_write_pages() · 3ca8138f

Committed by Roman Gushchin on Oct 12, 2015

I got a report about an unkillable task eating CPU. Further
investigation shows that the problem is in the fuse_fill_write_pages()
function. If iov's first segment has zero length, we get an infinite
loop, because we never reach the iov_iter_advance() call.

Fix this by calling iov_iter_advance() before repeating an attempt to
copy data from userspace.

A similar problem is described in 124d3b70 ("fix writev regression:
pan hanging unkillable and un-straceable"). If a zero-length segment
is followed by a segment with an invalid address,
iov_iter_fault_in_readable() checks only the first segment (zero-length),
iov_iter_copy_from_user_atomic() skips it, fails at the second and
returns zero -> goto again without skipping the zero-length segment.

The patch calls iov_iter_advance() before goto again: we'll skip the zero-length
segment at the second iteration and iov_iter_fault_in_readable() will detect
the invalid address.

Special thanks to Konstantin Khlebnikov, who helped a lot with the commit
description.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Maxim Patlasov <mpatlasov@parallels.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Fixes: ea9b9907 ("fuse: implement perform_write")
Cc: <stable@vger.kernel.org>

3ca8138f
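
The loop described above can be modelled in userspace. This is a deliberately simplified sketch, not the kernel code: segments are copied whole, a `valid` flag replaces real user memory, all names (`fault_in()`, `copy_atomic()`, `advance()`, `fill_sketch()`) are hypothetical, and a `max_tries` guard returns -ELOOP where the unfixed kernel loop would spin forever.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

struct seg_sketch { size_t len; int valid; };
struct iter_sketch { const struct seg_sketch *seg; size_t nsegs; };

/* Checks only the first segment; a zero-length one trivially passes. */
static int fault_in(const struct iter_sketch *it)
{
	if (it->nsegs == 0 || it->seg[0].len == 0)
		return 0;
	return it->seg[0].valid ? 0 : -EFAULT;
}

/* Skips empty segments, copies the first non-empty one; 0 means a fault. */
static size_t copy_atomic(const struct iter_sketch *it)
{
	for (size_t i = 0; i < it->nsegs; i++) {
		if (it->seg[i].len == 0)
			continue;
		return it->seg[i].valid ? it->seg[i].len : 0;
	}
	return 0;
}

/* Consumes n bytes and, even for n == 0, drops leading empty segments. */
static void advance(struct iter_sketch *it, size_t n)
{
	while (it->nsegs && it->seg[0].len <= n) {
		n -= it->seg[0].len;
		it->seg++;
		it->nsegs--;
	}
}

static ssize_t fill_sketch(struct iter_sketch *it, int apply_fix, int max_tries)
{
	ssize_t total = 0;

	while (it->nsegs) {
		if (max_tries-- == 0)
			return -ELOOP;		/* stands in for the endless loop */
		int err = fault_in(it);
		if (err)
			return total ? total : err;
		size_t n = copy_atomic(it);
		if (apply_fix)
			advance(it, n);		/* the fix: advance before retrying */
		if (n == 0)
			continue;		/* the "goto again" path */
		if (!apply_fix)
			advance(it, n);		/* old order: only after a good copy */
		total += (ssize_t)n;
	}
	return total;
}
```

With an empty first segment followed by an invalid one, the fixed ordering reports -EFAULT on the second pass, while the old ordering keeps retrying until the guard trips.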

cuse: fix memory leak · 2c5816b4

Committed by Miklos Szeredi on Nov 10, 2015

The problem is that fuse_dev_alloc() acquires an extra reference to cc.fc,
and the original ref count is never dropped.
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Fixes: cc080e9e ("fuse: introduce per-instance fuse_dev structure")
Cc: <stable@vger.kernel.org> # v4.2+

2c5816b4

23 Oct, 2015 1 commit

Move locks API users to locks_lock_inode_wait() · 4f656367

Committed by Benjamin Coddington on Oct 22, 2015

Instead of having users check for FL_POSIX or FL_FLOCK to call the correct
locks API function, use the check within locks_lock_inode_wait().  This
allows for some later cleanup.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>

4f656367
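
A sketch of the dispatch this commit centralizes, so that callers no longer test the lock type themselves. The flag values, `struct file_lock_sketch`, the two stub functions and `lock_inode_wait_sketch()` are all assumptions for illustration; the stubs return arbitrary marker values instead of taking real locks.

```c
#include <assert.h>

/* Flag values chosen for this sketch. */
#define FL_POSIX 1
#define FL_FLOCK 2

struct file_lock_sketch {
	unsigned int fl_flags;
};

/* Stubs standing in for the two lock APIs; return values are just markers. */
static int posix_lock_stub(struct file_lock_sketch *fl) { (void)fl; return 100; }
static int flock_lock_stub(struct file_lock_sketch *fl) { (void)fl; return 200; }

/* One entry point that picks the right API from the lock's own flags. */
static int lock_inode_wait_sketch(struct file_lock_sketch *fl)
{
	switch (fl->fl_flags & (FL_POSIX | FL_FLOCK)) {
	case FL_POSIX:
		return posix_lock_stub(fl);
	case FL_FLOCK:
		return flock_lock_stub(fl);
	default:
		return -1;	/* neither or both flags set: caller bug */
	}
}
```

Callers then make a single call and the type check lives in one place.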

17 Aug, 2015 1 commit

fs/fuse: fix ioctl type confusion · 8ed1f0e2

Committed by Jann Horn on Aug 16, 2015

fuse_dev_ioctl() performed fuse_get_dev() on a user-supplied fd,
leading to a type confusion issue. Fix it by checking file->f_op.
Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8ed1f0e2
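
The kind of check described above can be sketched as follows: a file's private data is only interpreted as a device object after its operations table is confirmed to be the expected one. All types and names here are hypothetical stand-ins, not the kernel structures.

```c
#include <assert.h>
#include <stddef.h>

struct file_ops_sketch { const char *name; };

struct file_sketch {
	const struct file_ops_sketch *f_op;
	void *private_data;	/* meaning depends entirely on f_op */
};

struct fuse_dev_sketch { int id; };

static const struct file_ops_sketch fuse_dev_ops_sketch = { "fuse_dev" };
static const struct file_ops_sketch other_ops_sketch = { "other" };

/* Trust private_data only if the file really is one of ours. */
static struct fuse_dev_sketch *get_dev_checked(const struct file_sketch *file)
{
	if (file->f_op != &fuse_dev_ops_sketch)
		return NULL;
	return file->private_data;
}
```

Without the comparison, a descriptor for any other kind of file would have its private data cast to the wrong type, which is the type confusion the commit fixes.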

01 Jul, 2015 26 commits

sysfs: Create mountpoints with sysfs_create_mount_point · f9bb4882

Committed by Eric W. Biederman on May 13, 2015

This allows for better documentation in the code and
it allows for a simpler and fully correct version of
fs_fully_visible to be written.

The mount points converted and their filesystems are:
/sys/hypervisor/s390/       s390_hypfs
/sys/kernel/config/         configfs
/sys/kernel/debug/          debugfs
/sys/firmware/efi/efivars/  efivarfs
/sys/fs/fuse/connections/   fusectl
/sys/fs/pstore/             pstore
/sys/kernel/tracing/        tracefs
/sys/fs/cgroup/             cgroup
/sys/kernel/security/       securityfs
/sys/fs/selinux/            selinuxfs
/sys/fs/smackfs/            smackfs

Cc: stable@vger.kernel.org
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

f9bb4882

fuse: separate pqueue for clones · c3696046

Committed by Miklos Szeredi on Jul 01, 2015

Make each fuse device clone refer to a separate processing queue.  The only
constraint on userspace code is that the request answer must be written to
the same device clone as it was read off.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

c3696046

fuse: introduce per-instance fuse_dev structure · cc080e9e

Committed by Miklos Szeredi on Jul 01, 2015

Allow fuse device clones to be distinguished.  This patch just
adds the infrastructure by associating a separate "struct fuse_dev" with
each clone.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>

cc080e9e

fuse: device fd clone · 00c570f4

Committed by Miklos Szeredi on Jul 01, 2015

Allow an open fuse device to be "cloned".  Userspace can create a clone by:

      newfd = open("/dev/fuse", O_RDWR);
      ioctl(newfd, FUSE_DEV_IOC_CLONE, &oldfd);

At this point newfd will refer to the same fuse connection as oldfd.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>

00c570f4

fuse: abort: no fc->lock needed for request ending · ee314a87

Committed by Miklos Szeredi on Jul 01, 2015

In fuse_abort_conn() when all requests are on private lists we no longer
need fc->lock protection.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>

ee314a87

fuse: no fc->lock for pqueue parts · 46c34a34

Committed by Miklos Szeredi on Jul 01, 2015

Remove fc->lock protection from processing queue members, now protected by
fpq->lock.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>

46c34a34

fuse: no fc->lock in request_end() · efe2800f

Committed by Miklos Szeredi on Jul 01, 2015

No longer need to call request_end() with the connection lock held.  We
still protect the background counters and queue with fc->lock, so acquire
it if necessary.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>

efe2800f

fuse: cleanup request_end() · 1e6881c3

Committed by Miklos Szeredi on Jul 01, 2015

Now that we atomically test having already done everything we no longer
need other protection.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>

1e6881c3

fuse: request_end(): do once · 365ae710

Committed by Miklos Szeredi on Jul 01, 2015

When the connection is aborted it is possible that request_end() will be
called twice.  Use atomic test and set to do the actual ending only once.

test_and_set_bit() also provides the necessary barrier semantics so no
explicit smp_wmb() is necessary.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>

365ae710
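
The "do once" idea can be sketched in userspace with C11 atomics in place of the kernel's test_and_set_bit(). The struct and function names are hypothetical, and `end_calls` exists only to make the effect observable.

```c
#include <assert.h>
#include <stdatomic.h>

struct req_sketch {
	atomic_flag finished;	/* plays the role of a "finished" flag bit */
	int end_calls;		/* how many times the real ending work ran */
};

static void request_end_sketch(struct req_sketch *req)
{
	/*
	 * Atomic test-and-set returns the previous value: true means some
	 * other caller already ended this request, so there is nothing to do.
	 */
	if (atomic_flag_test_and_set(&req->finished))
		return;

	req->end_calls++;	/* the actual ending happens exactly once */
}
```

`atomic_flag_test_and_set()` uses sequentially consistent ordering by default, which plays a role comparable to the barrier semantics the commit message attributes to test_and_set_bit().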

fuse: add req flag for private list · 77cd9d48

Committed by Miklos Szeredi on Jul 01, 2015

When an unlocked request is aborted, it is moved from fpq->io to a private
list.  Then, after unlocking fpq->lock, the private list is processed and
the requests are finished off.

To protect the private list, we need to mark the request with a flag, so if
in the meantime the request is unlocked the list is not corrupted.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>

77cd9d48

fuse: pqueue locking · 45a91cb1

Committed by Miklos Szeredi on Jul 01, 2015

Add a fpq->lock for protecting members of struct fuse_pqueue and FR_LOCKED
request flag.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>

45a91cb1

fuse: abort: group pqueue accesses · 24b4d33d

Committed by Miklos Szeredi on Jul 01, 2015

Rearrange fuse_abort_conn() so that processing queue accesses are grouped
together.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>

24b4d33d

fuse: cleanup fuse_dev_do_read() · 82cbdcd3

Committed by Miklos Szeredi on Jul 01, 2015

 - locked list_add() + list_del_init() cancel out

 - common handling of case when request is ended here in the read phase
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>

82cbdcd3

fuse: move list_del_init() from request_end() into callers · f377cb79

Committed by Miklos Szeredi on Jul 01, 2015

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

f377cb79

fuse: duplicate ->connected in pqueue · e96edd94

Committed by Miklos Szeredi on Jul 01, 2015

This will allow checking ->connected just with the processing queue lock.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>

e96edd94

fuse: separate out processing queue · 3a2b5b9c

Committed by Miklos Szeredi on Jul 01, 2015

This is just two fields: fc->io and fc->processing.

This patch just rearranges the fields, no functional change.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>

3a2b5b9c

fuse: simplify request_wait() · 5250921b

Committed by Miklos Szeredi on Jul 01, 2015

wait_event_interruptible_exclusive_locked() will do everything
request_wait() does, so replace it.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>

5250921b

fuse: no fc->lock for iqueue parts · fd22d62e

Committed by Miklos Szeredi on Jul 01, 2015

Remove fc->lock protection from input queue members, now protected by
fiq->waitq.lock.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>

fd22d62e

fuse: allow interrupt queuing without fc->lock · 8f7bb368

Committed by Miklos Szeredi on Jul 01, 2015

Interrupt is only queued after the request has been sent to userspace.
This is either done in request_wait_answer() or fuse_dev_do_read()
depending on which state the request is in at the time of the interrupt.
If it's not yet sent, then queuing the interrupt is postponed until the
request is read.  Otherwise (the request has already been read and is
waiting for an answer) the interrupt is queued immediately.

We want to call queue_interrupt() without fc->lock protection, in which
case there can be a race between the two functions:

 - neither of them queue the interrupt (thinking the other one has already
   done it).

 - both of them queue the interrupt

The first one is prevented by adding memory barriers, the second is
prevented by checking (under fiq->waitq.lock) if the interrupt has already
been queued.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

8f7bb368

fuse: iqueue locking · 4ce60812

Committed by Miklos Szeredi on Jul 01, 2015

Use fiq->waitq.lock for protecting members of struct fuse_iqueue and
FR_PENDING request flag, previously protected by fc->lock.

Following patches will remove fc->lock protection from these members.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>

4ce60812

fuse: dev read: split list_move · ef759258

Committed by Miklos Szeredi on Jul 01, 2015

Different lists will need different locks.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>

ef759258

fuse: abort: group iqueue accesses · 8c91189a

Committed by Miklos Szeredi on Jul 01, 2015

Rearrange fuse_abort_conn() so that input queue accesses are grouped
together.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>

8c91189a

fuse: duplicate ->connected in iqueue · e16714d8

Committed by Miklos Szeredi on Jul 01, 2015

This will allow checking ->connected just with the input queue lock.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>

e16714d8

fuse: separate out input queue · f88996a9

Committed by Miklos Szeredi on Jul 01, 2015

The input queue contains normal requests (fc->pending), forgets
(fc->forget_*) and interrupts (fc->interrupts).  There's also fc->waitq and
fc->fasync for waking up the readers of the fuse device when a request is
available.

The fc->reqctr is also moved to the input queue (assigned to the request
when the request is added to the input queue).

This patch just rearranges the fields, no functional change.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>

f88996a9

fuse: req state use flags · 33e14b4d

Committed by Miklos Szeredi on Jul 01, 2015

Use flags for representing the state in fuse_req.  This is needed since
req->list will be protected by different locks in different states, hence
we'll want the state itself to be split into distinct bits, each protected
with the relevant lock in that state.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

33e14b4d

fuse: simplify req states · 7a3b2c75

Committed by Miklos Szeredi on Jul 01, 2015

FUSE_REQ_INIT is actually the same state as FUSE_REQ_PENDING and
FUSE_REQ_READING and FUSE_REQ_WRITING can be merged into a common
FUSE_REQ_IO state.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>

7a3b2c75