提交 · c4159a75b64c0e67caededf4d7372c1b58a5f42a · OpenHarmony / kernel_linux

10 8月, 2016 1 次提交

mm: memcontrol: only mark charged pages with PageKmemcg · c4159a75

由 Vladimir Davydov 提交于 8月 08, 2016

To distinguish non-slab pages charged to kmemcg we mark them PageKmemcg,
which sets page->_mapcount to -512.  Currently, we set/clear PageKmemcg
in __alloc_pages_nodemask()/free_pages_prepare() for any page allocated
with __GFP_ACCOUNT, including those that aren't actually charged to any
cgroup, i.e. allocated from the root cgroup context.  To avoid overhead
in case cgroups are not used, we only do that if memcg_kmem_enabled() is
true.  The latter is set iff there are kmem-enabled memory cgroups
(online or offline).  The root cgroup is not considered kmem-enabled.

As a result, if a page is allocated with __GFP_ACCOUNT for the root
cgroup when there are kmem-enabled memory cgroups and is freed after all
kmem-enabled memory cgroups were removed, e.g.

  # no memory cgroups has been created yet, create one
  mkdir /sys/fs/cgroup/memory/test
  # run something allocating pages with __GFP_ACCOUNT, e.g.
  # a program using pipe
  dmesg | tail
  # remove the memory cgroup
  rmdir /sys/fs/cgroup/memory/test

we'll get bad page state bug complaining about page->_mapcount != -1:

  BUG: Bad page state in process swapper/0  pfn:1fd945c
  page:ffffea007f651700 count:0 mapcount:-511 mapping:          (null) index:0x0
  flags: 0x1000000000000000()

To avoid that, let's mark with PageKmemcg only those pages that are
actually charged to and hence pin a non-root memory cgroup.

Fixes: 4949148a ("mm: charge/uncharge kmemcg from generic page allocator paths")
Reported-and-tested-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NVladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c4159a75

27 7月, 2016 1 次提交

pipe: account to kmemcg · d86133bd

由 Vladimir Davydov 提交于 7月 26, 2016

Pipes can consume a significant amount of system memory, hence they
should be accounted to kmemcg.

This patch marks pipe_inode_info and anonymous pipe buffer page
allocations as __GFP_ACCOUNT so that they would be charged to kmemcg.
Note, since a pipe buffer page can be "stolen" and get reused for other
purposes, including mapping to userspace, we clear PageKmemcg thus
resetting page->_mapcount and uncharge it in anon_pipe_buf_steal, which
is introduced by this patch.

A note regarding anon_pipe_buf_steal implementation.  We allow to steal
the page if its ref count equals 1.  It looks racy, but it is correct
for anonymous pipe buffer pages, because:

 - We lock out all other pipe users, because ->steal is called with
   pipe_lock held, so the page can't be spliced to another pipe from
   under us.

 - The page is not on LRU and it never was.

 - Thus a parallel thread can access it only by PFN. Although this is
   quite possible (e.g. see page_idle_get_page and balloon_page_isolate)
   this is not dangerous, because all such functions do is increase page
   ref count, check if the page is the one they are looking for, and
   decrease ref count if it isn't. Since our page is clean except for
   PageKmemcg mark, which doesn't conflict with other _mapcount users,
   the worst that can happen is we see page_count > 2 due to a transient
   ref, in which case we false-positively abort ->steal, which is still
   fine, because ->steal is not guaranteed to succeed.

Link: http://lkml.kernel.org/r/20160527150313.GD26059@esperanzaSigned-off-by: NVladimir Davydov <vdavydov@virtuozzo.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d86133bd

05 4月, 2016 1 次提交

mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf

由 Kirill A. Shutemov 提交于 4月 01, 2016

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

09cbfeaf

20 1月, 2016 1 次提交

pipe: limit the per-user amount of pages allocated in pipes · 759c0114

由 Willy Tarreau 提交于 1月 18, 2016

On no-so-small systems, it is possible for a single process to cause an
OOM condition by filling large pipes with data that are never read. A
typical process filling 4000 pipes with 1 MB of data will use 4 GB of
memory. On small systems it may be tricky to set the pipe max size to
prevent this from happening.

This patch makes it possible to enforce a per-user soft limit above
which new pipes will be limited to a single page, effectively limiting
them to 4 kB each, as well as a hard limit above which no new pipes may
be created for this user. This has the effect of protecting the system
against memory abuse without hurting other users, and still allowing
pipes to work correctly though with less data at once.

The limit are controlled by two new sysctls : pipe-user-pages-soft, and
pipe-user-pages-hard. Both may be disabled by setting them to zero. The
default soft limit allows the default number of FDs per process (1024)
to create pipes of the default size (64kB), thus reaching a limit of 64MB
before starting to create only smaller pipes. With 256 processes limited
to 1024 FDs each, this results in 1024*64kB + (256*1024 - 1024) * 4kB =
1084 MB of memory allocated for a user. The hard limit is disabled by
default to avoid breaking existing applications that make intensive use
of pipes (eg: for splicing).

Reported-by: socketpair@gmail.com
Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Mitigates: CVE-2013-4312 (Linux 2.0+)
Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NWilly Tarreau <w@1wt.eu>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

759c0114

11 11月, 2015 2 次提交

fs/pipe.c: return error code rather than 0 in pipe_write() · 6ae08069

由 Eric Biggers 提交于 10月 17, 2015

pipe_write() would return 0 if it failed to merge the beginning of the
data to write with the last, partially filled pipe buffer.  It should
return an error code instead.  Userspace programs could be confused by
write() returning 0 when called with a nonzero 'count'.

The EFAULT error case was a regression from f0d1bec9 ("new helper:
copy_page_from_iter()"), while the ops->confirm() error case was a much
older bug.

Test program:

	#include <assert.h>
	#include <errno.h>
	#include <unistd.h>

	int main(void)
	{
		int fd[2];
		char data[1] = {0};

		assert(0 == pipe(fd));
		assert(1 == write(fd[1], data, 1));

		/* prior to this patch, write() returned 0 here  */
		assert(-1 == write(fd[1], NULL, 1));
		assert(errno == EFAULT);
	}

Cc: stable@vger.kernel.org # at least v3.15+
Signed-off-by: NEric Biggers <ebiggers3@gmail.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

6ae08069

fs/pipe.c: preserve alloc_file() error code · e9bb1f9b

由 Eric Biggers 提交于 10月 17, 2015

If sys_pipe() was unable to allocate a 'struct file', it always failed
with ENFILE, which means "The number of simultaneously open files in the
system would exceed a system-imposed limit." However, alloc_file()
actually returns an ERR_PTR value and might fail with other error codes.
Currently, in addition to ENFILE, it can fail with ENOMEM, potentially
when there are few open files in the system.  Update sys_pipe() to
preserve this error code.

In a prior submission of a similar patch (1) some concern was raised
about introducing a new error code for sys_pipe().  However, for most
system calls, programs cannot assume that new error codes will never be
introduced.  In addition, ENOMEM was, in fact, already a possible error
code for sys_pipe(), in the case where the file descriptor table could
not be expanded due to insufficient memory.

	(1) http://comments.gmane.org/gmane.linux.kernel/1357942Signed-off-by: NEric Biggers <ebiggers3@gmail.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

e9bb1f9b

16 4月, 2015 1 次提交
- D
  VFS: assorted weird filesystems: d_inode() annotations · 75c3cfa8
  由 David Howells 提交于 3月 17, 2015
```
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  75c3cfa8
12 4月, 2015 1 次提交

make new_sync_{read,write}() static · 5d5d5689

由 Al Viro 提交于 4月 03, 2015

All places outside of core VFS that checked ->read and ->write for being NULL or
called the methods directly are gone now, so NULL {read,write} with non-NULL
{read,write}_iter will do the right thing in all cases.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

5d5d5689

26 3月, 2015 1 次提交

fs: move struct kiocb to fs.h · e2e40f2c

由 Christoph Hellwig 提交于 2月 22, 2015

struct kiocb now is a generic I/O container, so move it to fs.h.
Also do a #include diet for aio.h while we're at it.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

e2e40f2c

07 5月, 2014 3 次提交

new helper: copy_page_from_iter() · f0d1bec9

由 Al Viro 提交于 4月 03, 2014

parallel to copy_page_to_iter().  pipe_write() switched to it (and became
->write_iter()).
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

f0d1bec9

A
pipe: switch to ->read_iter() · fb9096a3
由 Al Viro 提交于 4月 02, 2014
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
fb9096a3

start adding the tag to iov_iter · 71d8e532

由 Al Viro 提交于 3月 05, 2014

For now, just use the same thing we pass to ->direct_IO() - it's all
iovec-based at the moment.  Pass it explicitly to iov_iter_init() and
account for kvec vs. iovec in there, by the same kludge NFS ->direct_IO()
uses.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

71d8e532

02 4月, 2014 2 次提交
- A
  switch pipe_read() to copy_page_to_iter() · 637b58c2
  由 Al Viro 提交于 2月 03, 2014
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  637b58c2
- A
  pipe: kill ->map() and ->unmap() · fbb32750
  由 Al Viro 提交于 2月 02, 2014
```
all pipe_buffer_operations have the same instances of those...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  fbb32750
24 1月, 2014 1 次提交

fs/pipe.c: skip file_update_time on frozen fs · 7e775f46

由 Dmitry Monakhov 提交于 1月 23, 2014

Pipe has no data associated with fs so it is not good idea to block
pipe_write() if FS is frozen, but we can not update file's time on such
filesystem. Let's use same idea as we use in touch_time().

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=65701Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: NJan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7e775f46

03 12月, 2013 1 次提交

vfs: fix subtle use-after-free of pipe_inode_info · b0d8d229

由 Linus Torvalds 提交于 12月 02, 2013

The pipe code was trying (and failing) to be very careful about freeing
the pipe info only after the last access, with a pattern like:

        spin_lock(&inode->i_lock);
        if (!--pipe->files) {
                inode->i_pipe = NULL;
                kill = 1;
        }
        spin_unlock(&inode->i_lock);
        __pipe_unlock(pipe);
        if (kill)
                free_pipe_info(pipe);

where the final freeing is done last.

HOWEVER.  The above is actually broken, because while the freeing is
done at the end, if we have two racing processes releasing the pipe
inode info, the one that *doesn't* free it will decrement the ->files
count, and unlock the inode i_lock, but then still use the
"pipe_inode_info" afterwards when it does the "__pipe_unlock(pipe)".

This is *very* hard to trigger in practice, since the race window is
very small, and adding debug options seems to just hide it by slowing
things down.

Simon originally reported this way back in July as an Oops in
kmem_cache_allocate due to a single bit corruption (due to the final
"spin_unlock(pipe->mutex.wait_lock)" incrementing a field in a different
allocation that had re-used the free'd pipe-info), it's taken this long
to figure out.

Since the 'pipe->files' accesses aren't even protected by the pipe lock
(we very much use the inode lock for that), the simple solution is to
just drop the pipe lock early.  And since there were two users of this
pattern, create a helper function for it.

Introduced commit ba5bb147 ("pipe: take allocation and freeing of
pipe_inode_info out of ->i_mutex").
Reported-by: NSimon Kirby <sim@hostway.ca>
Reported-by: NIan Applegate <ia@cloudflare.com>
Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org   # v3.10+
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b0d8d229

08 5月, 2013 1 次提交

aio: don't include aio.h in sched.h · a27bb332

由 Kent Overstreet 提交于 5月 07, 2013

Faster kernel compiles by way of fewer unnecessary includes.

[akpm@linux-foundation.org: fix fallout]
[akpm@linux-foundation.org: fix build]
Signed-off-by: NKent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a27bb332

10 4月, 2013 11 次提交

get rid of the last free_pipe_info() callers · 4b8a8f1e

由 Al Viro 提交于 3月 21, 2013

and rename __free_pipe_info() to free_pipe_info()
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

4b8a8f1e

A
get rid of alloc_pipe_info() argument · 7bee130e
由 Al Viro 提交于 3月 21, 2013
```
not used anymore
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
7bee130e

get rid of pipe->inode · 6447a3cf

由 Al Viro 提交于 3月 21, 2013

it's used only as a flag to distinguish normal pipes/FIFOs from the
internal per-task one used by file-to-file splice.  And pipe->files
would work just as well for that purpose...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

6447a3cf

introduce variants of pipe_lock/pipe_unlock for real pipes/FIFOs · ebec73f4

由 Al Viro 提交于 3月 21, 2013

fs/pipe.c file_operations methods *know* that pipe is not an internal one;
no need to check pipe->inode for those callers.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

ebec73f4

pipe: set file->private_data to ->i_pipe · de32ec4c

由 Al Viro 提交于 3月 21, 2013

simplify get_pipe_info(), while we are at it
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

de32ec4c

pipe: don't use ->i_mutex · 72b0d9aa

由 Al Viro 提交于 3月 21, 2013

now it can be done - put mutex into pipe_inode_info, use it instead
of ->i_mutex
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

72b0d9aa

pipe: take allocation and freeing of pipe_inode_info out of ->i_mutex · ba5bb147

由 Al Viro 提交于 3月 21, 2013

* new field - pipe->files; number of struct file over that pipe (all
  sharing the same inode, of course); protected by inode->i_lock.
* pipe_release() decrements pipe->files, clears inode->i_pipe when
  if the counter has reached 0 (all under ->i_lock) and, in that case,
  frees pipe after having done pipe_unlock()
* fifo_open() starts with grabbing ->i_lock, and either bumps pipe->files
  if ->i_pipe was non-NULL or allocates a new pipe (dropping and regaining
  ->i_lock) and rechecks ->i_pipe; if it's still NULL, inserts new pipe
  there, otherwise bumps ->i_pipe->files and frees the one we'd allocated.
  At that point we know that ->i_pipe is non-NULL and won't go away, so
  we can do pipe_lock() on it and proceed as we used to.  If we end up
  failing, decrement pipe->files and if it reaches 0 clear ->i_pipe and
  free the sucker after pipe_unlock().
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

ba5bb147

pipe: preparation to new locking rules · 18c03cfd

由 Al Viro 提交于 3月 21, 2013

* use the fact that file_inode(file)->i_pipe doesn't change
  while the file is opened - no locks needed to access that.
* switch to pipe_lock/pipe_unlock where it's easy to do
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

18c03cfd

A
pipe: switch wait_for_partner() and wake_up_partner() to pipe_inode_info · fc7478a2
由 Al Viro 提交于 3月 21, 2013
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
fc7478a2
A
pipe: fold file_operations instances in one · 599a0ac1
由 Al Viro 提交于 3月 12, 2013
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
599a0ac1
A
fold fifo.c into pipe.c · f776c738
由 Al Viro 提交于 3月 12, 2013
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
f776c738

12 3月, 2013 1 次提交

vfs: fix pipe counter breakage · a930d879

由 Al Viro 提交于 3月 12, 2013

If you open a pipe for neither read nor write, the pipe code will not
add any usage counters to the pipe, causing the 'struct pipe_inode_info"
to be potentially released early.

That doesn't normally matter, since you cannot actually use the pipe,
but the pipe release code - particularly fasync handling - still expects
the actual pipe infrastructure to all be there.  And rather than adding
NULL pointer checks, let's just disallow this case, the same way we
already do for the named pipe ("fifo") case.

This is ancient going back to pre-2.4 days, and until trinity, nobody
naver noticed.
Reported-by: NDave Jones <davej@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a930d879

23 2月, 2013 2 次提交

fs: Preserve error code in get_empty_filp(), part 2 · 39b65252

由 Anatol Pomozov 提交于 9月 12, 2012

Allocating a file structure in function get_empty_filp() might fail because
of several reasons:
 - not enough memory for file structures
 - operation is not allowed
 - user is over its limit

Currently the function returns NULL in all cases and we loose the exact
reason of the error. All callers of get_empty_filp() assume that the function
can fail with ENFILE only.

Return error through pointer. Change all callers to preserve this error code.

[AV: cleaned up a bit, carved the get_empty_filp() part out into a separate commit
(things remaining here deal with alloc_file()), removed pipe(2) behaviour change]
Signed-off-by: NAnatol Pomozov <anatol.pomozov@gmail.com>
Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

39b65252

A
new helper: file_inode(file) · 496ad9aa
由 Al Viro 提交于 1月 23, 2013
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
496ad9aa

27 9月, 2012 1 次提交

pipe(2) - race-free error recovery · 5b249b1b

由 Al Viro 提交于 8月 19, 2012

don't mess with sys_close() if copy_to_user() fails; just postpone
fd_install() until we know it hasn't.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

5b249b1b

30 7月, 2012 1 次提交
- A
  consolidate pipe file creation · e4fad8e5
  由 Al Viro 提交于 7月 21, 2012
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  e4fad8e5
24 7月, 2012 1 次提交
- C
  pipe: remove KM_USER0 from comments · 2164d334
  由 Cong Wang 提交于 6月 23, 2012
```
Signed-off-by: NCong Wang <amwang@redhat.com>
```
  2164d334
02 6月, 2012 1 次提交

fs: introduce inode operation ->update_time · c3b2da31

由 Josef Bacik 提交于 3月 26, 2012

Btrfs has to make sure we have space to allocate new blocks in order to modify
the inode, so updating time can fail.  We've gotten around this by having our
own file_update_time but this is kind of a pain, and Christoph has indicated he
would like to make xfs do something different with atime updates.  So introduce
->update_time, where we will deal with i_version an a/m/c time updates and
indicate which changes need to be made.  The normal version just does what it
has always done, updates the time and marks the inode dirty, and then
filesystems can choose to do something different.

I've gone through all of the users of file_update_time and made them check for
errors with the exception of the fault code since it's complicated and I wasn't
quite sure what to do there, also Jan is going to be pushing the file time
updates into page_mkwrite for those who have it so that should satisfy btrfs and
make it not a big deal to check the file_update_time() return code in the
generic fault path. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

c3b2da31

01 6月, 2012 1 次提交

pipe: return -ENOIOCTLCMD instead of -EINVAL on unknown ioctl command · a1d49449

由 Will Deacon 提交于 5月 31, 2012

As described in commit 07d106d0 ("vfs: fix up ENOIOCTLCMD error
handling"), drivers should return -ENOIOCTLCMD if they receive an ioctl
command which they don't understand.  Doing so will result in -ENOTTY
being returned to userspace, which matches the behaviour of the compat
layer if it fails to translate an ioctl command.

This patch fixes the pipe ioctl to return -ENOIOCTLCMD instead of -EINVAL
when passed an unknown ioctl command.
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: NAlan Cox <alan@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a1d49449

31 5月, 2012 1 次提交

pipe: return -ENOIOCTLCMD instead of -EINVAL on unknown ioctl command · 46ce341b

由 Will Deacon 提交于 5月 25, 2012

As described in commit 07d106d0 ("vfs: fix up ENOIOCTLCMD error
handling"), drivers should return -ENOIOCTLCMD if they receive an ioctl
command which they don't understand. Doing so will result in -ENOTTY
being returned to userspace, which matches the behaviour of the compat
layer if it fails to translate an ioctl command.

This patch fixes the pipe ioctl to return -ENOIOCTLCMD instead of
-EINVAL when passed an unknown ioctl command.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

46ce341b

30 4月, 2012 1 次提交

pipes: add a "packetized pipe" mode for writing · 9883035a

由 Linus Torvalds 提交于 4月 29, 2012

The actual internal pipe implementation is already really about
individual packets (called "pipe buffers"), and this simply exposes that
as a special packetized mode.

When we are in the packetized mode (marked by O_DIRECT as suggested by
Alan Cox), a write() on a pipe will not merge the new data with previous
writes, so each write will get a pipe buffer of its own.  The pipe
buffer is then marked with the PIPE_BUF_FLAG_PACKET flag, which in turn
will tell the reader side to break the read at that boundary (and throw
away any partial packet contents that do not fit in the read buffer).

End result: as long as you do writes less than PIPE_BUF in size (so that
the pipe doesn't have to split them up), you can now treat the pipe as a
packet interface, where each read() system call will read one packet at
a time.  You can just use a sufficiently big read buffer (PIPE_BUF is
sufficient, since bigger than that doesn't guarantee atomicity anyway),
and the return value of the read() will naturally give you the size of
the packet.

NOTE! We do not support zero-sized packets, and zero-sized reads and
writes to a pipe continue to be no-ops.  Also note that big packets will
currently be split at write time, but that the size at which that
happens is not really specified (except that it's bigger than PIPE_BUF).
Currently that limit is the system page size, but we might want to
explicitly support bigger packets some day.

The main user for this is going to be the autofs packet interface,
allowing us to stop having to care so deeply about exact packet sizes
(which have had bugs with 32/64-bit compatibility modes).  But user
space can create packetized pipes with "pipe2(fd, O_DIRECT)", which will
fail with an EINVAL on kernels that do not support this interface.
Tested-by: NMichael Tokarev <mjt@tls.msk.ru>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Miller <davem@davemloft.net>
Cc: Ian Kent <raven@themaw.net>
Cc: Thomas Meyer <thomas@m3y3r.de>
Cc: stable@kernel.org  # needed for systemd/autofs interaction fix
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9883035a

24 3月, 2012 1 次提交

magic.h: move some FS magic numbers into magic.h · b502bd11

由 Muthu Kumar 提交于 3月 23, 2012

- Move open-coded filesystem magic numbers into magic.h

- Rearrange magic.h so that the filesystem-related constants are grouped
  together.
Signed-off-by: NMuthukumar R <muthur@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b502bd11

20 3月, 2012 1 次提交
- C
  fs: remove the second argument of k[un]map_atomic() · e8e3c3d6
  由 Cong Wang 提交于 11月 25, 2011
```
Acked-by: NBenjamin LaHaise <bcrl@kvack.org>
Signed-off-by: NCong Wang <amwang@redhat.com>
```
  e8e3c3d6

OpenHarmony / kernel_linux 上一次同步 4 年多

OpenHarmony / kernel_linux
上一次同步 4 年多