提交 · fd2f3a06d30c85a17cf035ebc60c88c2a13a8ece · openeuler / Kernel

23 8月, 2014 1 次提交

nfs: change nfs_page_group_lock argument · fd2f3a06

由 Weston Andros Adamson 提交于 8月 08, 2014

Flip the meaning of the second argument from 'wait' to 'nonblock' to
match related functions. Update all five calls to reflect this change.
Signed-off-by: NWeston Andros Adamson <dros@primarydata.com>
Reviewed-by: NPeng Tao <tao.peng@primarydata.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

fd2f3a06

20 8月, 2014 3 次提交

ext3: Count internal journal as bsddf overhead in ext3_statfs · e6d8fb34

由 Chin-Tsung Cheng 提交于 8月 15, 2014

The journal blocks of external journal device should not
be counted as overhead.
Signed-off-by: NChin-Tsung Cheng <chintzung@gmail.com>
Signed-off-by: NJan Kara <jack@suse.cz>

e6d8fb34

isofs: Fix unbounded recursion when processing relocated directories · 410dd3cf

由 Jan Kara 提交于 8月 17, 2014

We did not check relocated directory in any way when processing Rock
Ridge 'CL' tag. Thus a corrupted isofs image can possibly have a CL
entry pointing to another CL entry leading to possibly unbounded
recursion in kernel code and thus stack overflow or deadlocks (if there
is a loop created from CL entries).

Fix the problem by not allowing CL entry to point to a directory entry
with CL entry (such use makes no good sense anyway) and by checking
whether CL entry doesn't point to itself.

CC: stable@vger.kernel.org
Reported-by: NChris Evans <cevans@google.com>
Signed-off-by: NJan Kara <jack@suse.cz>

410dd3cf

udf: avoid unneeded up_write when fail to add entry in ->symlink · 85cd083b

由 Chao Yu 提交于 8月 09, 2014

We have released the ->i_data_sem before invoking udf_add_entry(),
so in following error path, we should not release this lock again.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJan Kara <jack@suse.cz>

85cd083b

18 8月, 2014 3 次提交

[SMB3] Enable fallocate -z support for SMB3 mounts · 30175628

由 Steve French 提交于 8月 17, 2014

fallocate -z (FALLOC_FL_ZERO_RANGE) can map to SMB3
FSCTL_SET_ZERO_DATA SMB3 FSCTL but FALLOC_FL_ZERO_RANGE
when called without the FALLOC_FL_KEEPSIZE flag set could want
the file size changed so we can not support that subcase unless
the file is cached (and thus we know the file size).
Signed-off-by: NSteve French <smfrench@gmail.com>
Reviewed-by: NPavel Shilovsky <pshilovsky@samba.org>

30175628

enable fallocate punch hole ("fallocate -p") for SMB3 · 31742c5a

由 Steve French 提交于 8月 17, 2014

Implement FALLOC_FL_PUNCH_HOLE (which does not change the file size
fortunately so this matches the behavior of the equivalent SMB3
fsctl call) for SMB3 mounts.  This allows "fallocate -p" to work.
It requires that the server support setting files as sparse
(which Windows allows).
Signed-off-by: NSteve French <smfrench@gmail.com>

31742c5a

Incorrect error returned on setting file compressed on SMB2 · ad3829cf

由 Steve French 提交于 8月 17, 2014

When the server (for an SMB2 or SMB3 mount) doesn't support
an ioctl (such as setting the compressed flag
on a file) we were incorrectly returning EIO instead
of EOPNOTSUPP, this is confusing e.g. doing chattr +c to a file
on a non-btrfs Samba partition, now the error returned is more
intuitive to the user.  Also fixes error mapping on setting
hardlink to servers which don't support that.
Signed-off-by: NSteve French <smfrench@gmail.com>
Reviewed-by: NDavid Disseldorp <ddiss@suse.de>

ad3829cf

17 8月, 2014 3 次提交

CIFS: Fix wrong directory attributes after rename · b46799a8

由 Pavel Shilovsky 提交于 8月 18, 2014

When we requests rename we also need to update attributes
of both source and target parent directories. Not doing it
causes generic/309 xfstest to fail on SMB2 mounts. Fix this
by marking these directories for force revalidating.

Cc: <stable@vger.kernel.org>
Signed-off-by: NPavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: NSteve French <smfrench@gmail.com>

b46799a8

CIFS: Fix SMB2 readdir error handling · 52755808

由 Pavel Shilovsky 提交于 8月 18, 2014

SMB2 servers indicates the end of a directory search with
STATUS_NO_MORE_FILE error code that is not processed now.
This causes generic/257 xfstest to fail. Fix this by triggering
the end of search by this error code in SMB2_query_directory.

Also when negotiating CIFS protocol we tell the server to close
the search automatically at the end and there is no need to do
it itself. In the case of SMB2 protocol, we need to close it
explicitly - separate close directory checks for different
protocols.

Cc: <stable@vger.kernel.org>
Signed-off-by: NPavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: NSteve French <smfrench@gmail.com>

52755808

[CIFS] Possible null ptr deref in SMB2_tcon · 18f39e7b

由 Steve French 提交于 8月 17, 2014

As Raphael Geissert pointed out, tcon_error_exit can dereference tcon
and there is one path in which tcon can be null.
Signed-off-by: NSteve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org> # v3.7+
Reported-by: NRaphael Geissert <geissert@debian.org>

18f39e7b

16 8月, 2014 3 次提交

[CIFS] Workaround MacOS server problem with SMB2.1 write · 754789a1

由 Steve French 提交于 8月 15, 2014

 response

Writes fail to Mac servers with SMB2.1 mounts (works with cifs though) due
to them sending an incorrect RFC1001 length for the SMB2.1 Write response.
Workaround this problem. MacOS server sends a write response with 3 bytes
of pad beyond the end of the SMB itself.  The RFC1001 length is 3 bytes
more than the sum of the SMB2.1 header length + the write reponse.

Incorporate feedback from Jeff and JRA to allow servers to send
a tcp frame that is even more than three bytes too long
(ie much longer than the SMB2/SMB3 request that it contains) but
we do log it once now. In the earlier version of the patch I had
limited how far off the length field could be before we fail the request.
Signed-off-by: NSteve French <smfrench@gmail.com>

754789a1

cifs: handle lease F_UNLCK requests properly · 02440806

由 Jeff Layton 提交于 8月 09, 2014

Currently any F_UNLCK request for a lease just gets back -EAGAIN. Allow
them to go immediately to generic_setlease instead.
Signed-off-by: NJeff Layton <jlayton@primarydata.com>
Signed-off-by: NSteve French <smfrench@gmail.com>

02440806

Cleanup sparse file support by creating worker function for it · d43cc793

由 Steve French 提交于 8月 13, 2014

Simply move code to new function (for clarity). Function sets or clears
the sparse file attribute flag.
Signed-off-by: NSteve French <smfrench@gmail.com>
Reviewed-by: NDavid Disseldorp <ddiss@samba.org>

d43cc793

15 8月, 2014 9 次提交

btrfs: disable strict file flushes for renames and truncates · 8d875f95

由 Chris Mason 提交于 8月 12, 2014

Truncates and renames are often used to replace old versions of a file
with new versions.  Applications often expect this to be an atomic
replacement, even if they haven't done anything to make sure the new
version is fully on disk.

Btrfs has strict flushing in place to make sure that renaming over an
old file with a new file will fully flush out the new file before
allowing the transaction commit with the rename to complete.

This ordering means the commit code needs to be able to lock file pages,
and there are a few paths in the filesystem where we will try to end a
transaction with the page lock held.  It's rare, but these things can
deadlock.

This patch removes the ordered flushes and switches to a best effort
filemap_flush like ext4 uses. It's not perfect, but it should fix the
deadlocks.
Signed-off-by: NChris Mason <clm@fb.com>

8d875f95

Btrfs: fix csum tree corruption, duplicate and outdated checksums · 27b9a812

由 Filipe Manana 提交于 8月 09, 2014

Under rare circumstances we can end up leaving 2 versions of a checksum
for the same file extent range.

The reason for this is that after calling btrfs_next_leaf we process
slot 0 of the leaf it returns, instead of processing the slot set in
path->slots[0]. Most of the time (by far) path->slots[0] is 0, but after
btrfs_next_leaf() releases the path and before it searches for the next
leaf, another task might cause a split of the next leaf, which migrates
some of its keys to the leaf we were processing before calling
btrfs_next_leaf(). In this case btrfs_next_leaf() returns again the
same leaf but with path->slots[0] having a slot number corresponding
to the first new key it got, that is, a slot number that didn't exist
before calling btrfs_next_leaf(), as the leaf now has more keys than
it had before. So we must really process the returned leaf starting at
path->slots[0] always, as it isn't always 0, and the key at slot 0 can
have an offset much lower than our search offset/bytenr.

For example, consider the following scenario, where we have:

sums->bytenr: 40157184, sums->len: 16384, sums end: 40173568
four 4kb file data blocks with offsets 40157184, 40161280, 40165376, 40169472

  Leaf N:

    slot = 0                           slot = btrfs_header_nritems() - 1
  |-------------------------------------------------------------------|
  | [(CSUM CSUM 39239680), size 8] ... [(CSUM CSUM 40116224), size 4] |
  |-------------------------------------------------------------------|

  Leaf N + 1:

      slot = 0                          slot = btrfs_header_nritems() - 1
  |--------------------------------------------------------------------|
  | [(CSUM CSUM 40161280), size 32] ... [((CSUM CSUM 40615936), size 8 |
  |--------------------------------------------------------------------|

Because we are at the last slot of leaf N, we call btrfs_next_leaf() to
find the next highest key, which releases the current path and then searches
for that next key. However after releasing the path and before finding that
next key, the item at slot 0 of leaf N + 1 gets moved to leaf N, due to a call
to ctree.c:push_leaf_left() (via ctree.c:split_leaf()), and therefore
btrfs_next_leaf() will returns us a path again with leaf N but with the slot
pointing to its new last key (CSUM CSUM 40161280). This new version of leaf N
is then:

    slot = 0                        slot = btrfs_header_nritems() - 2  slot = btrfs_header_nritems() - 1
  |----------------------------------------------------------------------------------------------------|
  | [(CSUM CSUM 39239680), size 8] ... [(CSUM CSUM 40116224), size 4]  [(CSUM CSUM 40161280), size 32] |
  |----------------------------------------------------------------------------------------------------|

And incorrecly using slot 0, makes us set next_offset to 39239680 and we jump
into the "insert:" label, which will set tmp to:

    tmp = min((sums->len - total_bytes) >> blocksize_bits,
        (next_offset - file_key.offset) >> blocksize_bits) =
    min((16384 - 0) >> 12, (39239680 - 40157184) >> 12) =
    min(4, (u64)-917504 = 18446744073708634112 >> 12) = 4

and

   ins_size = csum_size * tmp = 4 * 4 = 16 bytes.

In other words, we insert a new csum item in the tree with key
(CSUM_OBJECTID CSUM_KEY 40157184 = sums->bytenr) that contains the checksums
for all the data (4 blocks of 4096 bytes each = sums->len). Which is wrong,
because the item with key (CSUM CSUM 40161280) (the one that was moved from
leaf N + 1 to the end of leaf N) contains the old checksums of the last 12288
bytes of our data and won't get those old checksums removed.

So this leaves us 2 different checksums for 3 4kb blocks of data in the tree,
and breaks the logical rule:

   Key_N+1.offset >= Key_N.offset + length_of_data_its_checksums_cover

An obvious bad effect of this is that a subsequent csum tree lookup to get
the checksum of any of the blocks with logical offset of 40161280, 40165376
or 40169472 (the last 3 4kb blocks of file data), will get the old checksums.

Cc: stable@vger.kernel.org
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NChris Mason <clm@fb.com>

27b9a812

Btrfs: Fix memory corruption by ulist_add_merge() on 32bit arch · 4eb1f66d

由 Takashi Iwai 提交于 7月 28, 2014

We've got bug reports that btrfs crashes when quota is enabled on
32bit kernel, typically with the Oops like below:
 BUG: unable to handle kernel NULL pointer dereference at 00000004
 IP: [<f9234590>] find_parent_nodes+0x360/0x1380 [btrfs]
 *pde = 00000000
 Oops: 0000 [#1] SMP
 CPU: 0 PID: 151 Comm: kworker/u8:2 Tainted: G S      W 3.15.2-1.gd43d97e-default #1
 Workqueue: btrfs-qgroup-rescan normal_work_helper [btrfs]
 task: f1478130 ti: f147c000 task.ti: f147c000
 EIP: 0060:[<f9234590>] EFLAGS: 00010213 CPU: 0
 EIP is at find_parent_nodes+0x360/0x1380 [btrfs]
 EAX: f147dda8 EBX: f147ddb0 ECX: 00000011 EDX: 00000000
 ESI: 00000000 EDI: f147dda4 EBP: f147ddf8 ESP: f147dd38
  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
 CR0: 8005003b CR2: 00000004 CR3: 00bf3000 CR4: 00000690
 Stack:
  00000000 00000000 f147dda4 00000050 00000001 00000000 00000001 00000050
  00000001 00000000 d3059000 00000001 00000022 000000a8 00000000 00000000
  00000000 000000a1 00000000 00000000 00000001 00000000 00000000 11800000
 Call Trace:
  [<f923564d>] __btrfs_find_all_roots+0x9d/0xf0 [btrfs]
  [<f9237bb1>] btrfs_qgroup_rescan_worker+0x401/0x760 [btrfs]
  [<f9206148>] normal_work_helper+0xc8/0x270 [btrfs]
  [<c025e38b>] process_one_work+0x11b/0x390
  [<c025eea1>] worker_thread+0x101/0x340
  [<c026432b>] kthread+0x9b/0xb0
  [<c0712a71>] ret_from_kernel_thread+0x21/0x30
  [<c0264290>] kthread_create_on_node+0x110/0x110

This indicates a NULL corruption in prefs_delayed list.  The further
investigation and bisection pointed that the call of ulist_add_merge()
results in the corruption.

ulist_add_merge() takes u64 as aux and writes a 64bit value into
old_aux.  The callers of this function in backref.c, however, pass a
pointer of a pointer to old_aux.  That is, the function overwrites
64bit value on 32bit pointer.  This caused a NULL in the adjacent
variable, in this case, prefs_delayed.

Here is a quick attempt to band-aid over this: a new function,
ulist_add_merge_ptr() is introduced to pass/store properly a pointer
value instead of u64.  There are still ugly void ** cast remaining
in the callers because void ** cannot be taken implicitly.  But, it's
safer than explicit cast to u64, anyway.

Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=887046
Cc: <stable@vger.kernel.org> [v3.11+]
Signed-off-by: NTakashi Iwai <tiwai@suse.de>
Signed-off-by: NChris Mason <clm@fb.com>

4eb1f66d

Btrfs: fix compressed write corruption on enospc · ce62003f

由 Liu Bo 提交于 7月 24, 2014

When failing to allocate space for the whole compressed extent, we'll
fallback to uncompressed IO, but we've forgotten to redirty the pages
which belong to this compressed extent, and these 'clean' pages will
simply skip 'submit' part and go to endio directly, at last we got data
corruption as we write nothing.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Tested-By: NMartin Steigerwald <martin@lichtvoll.de>
Signed-off-by: NChris Mason <clm@fb.com>

ce62003f

btrfs: correctly handle return from ulist_add · f90e579c

由 Mark Fasheh 提交于 7月 17, 2014

ulist_add() can return '1' on sucess, which qgroup_subtree_accounting()
doesn't take into account. As a result, that value can be bubbled up to
callers, causing an error to be printed. Fix this by only returning the
value of ulist_add() when it indicates an error.
Signed-off-by: NMark Fasheh <mfasheh@suse.de>
Signed-off-by: NChris Mason <clm@fb.com>

f90e579c

btrfs: qgroup: account shared subtrees during snapshot delete · 1152651a

由 Mark Fasheh 提交于 7月 17, 2014

During its tree walk, btrfs_drop_snapshot() will skip any shared
subtrees it encounters. This is incorrect when we have qgroups
turned on as those subtrees need to have their contents
accounted. In particular, the case we're concerned with is when
removing our snapshot root leaves the subtree with only one root
reference.

In those cases we need to find the last remaining root and add
each extent in the subtree to the corresponding qgroup exclusive
counts.

This patch implements the shared subtree walk and a new qgroup
operation, BTRFS_QGROUP_OPER_SUB_SUBTREE. When an operation of
this type is encountered during qgroup accounting, we search for
any root references to that extent and in the case that we find
only one reference left, we go ahead and do the math on it's
exclusive counts.
Signed-off-by: NMark Fasheh <mfasheh@suse.de>
Reviewed-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

1152651a

Btrfs: read lock extent buffer while walking backrefs · 6f7ff6d7

由 Filipe Manana 提交于 7月 02, 2014

Before processing the extent buffer, acquire a read lock on it, so
that we're safe against concurrent updates on the extent buffer.
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NChris Mason <clm@fb.com>

6f7ff6d7

Btrfs: __btrfs_mod_ref should always use no_quota · e339a6b0

由 Josef Bacik 提交于 7月 02, 2014

Before I extended the no_quota arg to btrfs_dec/inc_ref because I didn't
understand how snapshot delete was using it and assumed that we needed the
quota operations there.  With Mark's work this has turned out to be not the
case, we _always_ need to use no_quota for btrfs_dec/inc_ref, so just drop the
argument and make __btrfs_mod_ref call it's process function with no_quota set
always.  Thanks,
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

e339a6b0

btrfs: adjust statfs calculations according to raid profiles · ba7b6e62

由 David Sterba 提交于 7月 01, 2014

This has been discussed in thread:
http://thread.gmane.org/gmane.comp.file-systems.btrfs/32528

and this patch implements this proposal:
http://thread.gmane.org/gmane.comp.file-systems.btrfs/32536

Works fine for "clean" raid profiles where the raid factor correction
does the right job. Otherwise it's pessimistic and may show low space
although there's still some left.

The df nubmers are lightly wrong in case of mixed block groups, but this
is not a major usecase and can be addressed later.

The RAID56 numbers are wrong almost the same way as before and will be
addressed separately.

CC: Hugo Mills <hugo@carfax.org.uk>
CC: cwillu <cwillu@cwillu.com>
CC: Josef Bacik <jbacik@fb.com>
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

ba7b6e62

14 8月, 2014 4 次提交

locks: move locks_free_lock calls in do_fcntl_add_lease outside spinlock · 2dfb928f

由 Jeff Layton 提交于 8月 11, 2014

There's no need to call locks_free_lock here while still holding the
i_lock. Defer that until the lock has been dropped.
Acked-by: NJ. Bruce Fields <bfields@fieldses.org>
Signed-off-by: NJeff Layton <jlayton@primarydata.com>

2dfb928f

locks: defer freeing locks in locks_delete_lock until after i_lock has been dropped · ed9814d8

由 Jeff Layton 提交于 8月 11, 2014

In commit 72f98e72 (locks: turn lock_flocks into a spinlock), we
moved from using the BKL to a global spinlock. With this change, we lost
the ability to block in the fl_release_private operation.

This is problematic for NFS (and probably some other filesystems as
well). Add a new list_head argument to locks_delete_lock. If that
argument is non-NULL, then queue any locks that we want to free to the
list instead of freeing them.

Then, add a new locks_dispose_list function that will walk such a list
and call locks_free_lock on them after the i_lock has been dropped.

Finally, change all of the callers of locks_delete_lock to pass in a
list_head, except for lease_modify. That function can be called long
after the i_lock has been acquired. Deferring the freeing of a lease
after unlocking it in that function is non-trivial until we overhaul
some of the spinlocking in the lease code.

Currently though, no filesystem that sets fl_release_private supports
leases, so this is not currently a problem. We'll eventually want to
make the same change in the lease code, but it needs a lot more work
before we can reasonably do so.
Acked-by: NJ. Bruce Fields <bfields@fieldses.org>
Signed-off-by: NJeff Layton <jlayton@primarydata.com>

ed9814d8

locks: don't reuse file_lock in __posix_lock_file · b84d49f9

由 Jeff Layton 提交于 8月 12, 2014

Currently in the case where a new file lock completely replaces the old
one, we end up overwriting the existing lock with the new info. This
means that we have to call fl_release_private inside i_lock. Change the
code to instead copy the info to new_fl, insert that lock into the
correct spot and then delete the old lock. In a later patch, we'll defer
the freeing of the old lock until after the i_lock has been dropped.
Acked-by: NJ. Bruce Fields <bfields@fieldses.org>
Signed-off-by: NJeff Layton <jlayton@primarydata.com>

b84d49f9

Add sparse file support to SMB2/SMB3 mounts · 3d1a3745

由 Steve French 提交于 8月 11, 2014

Many Linux filesystes make a file "sparse" when extending
a file with ftruncate. This does work for CIFS to Samba
(only) but not for SMB2/SMB3 (to Samba or Windows) since
there is a "set sparse" fsctl which is supposed to be
sent to mark a file as sparse.

This patch marks a file as sparse by sending this simple
set sparse fsctl if it is extended more than 2 pages.
It has been tested to Windows 8.1, Samba and various
SMB2/SMB3 servers which do support setting sparse (and
MacOS which does not appear to support the fsctl yet).
If a server share does not support setting a file
as sparse, then we do not retry setting sparse on that
share.

The disk space savings for sparse files can be quite
large (even more significant on Windows servers than Samba).
Signed-off-by: NSteve French <smfrench@gmail.com>
Reviewed-by: NShirish Pargaonkar <spargaonkar@suse.com>

3d1a3745

13 8月, 2014 1 次提交
- S
  Add missing definitions for CIFS File System Attributes · 8ae31240
  由 Steve French 提交于 8月 11, 2014
```
Signed-off-by: NSteve French <smfrench@gmail.com>
Acked-by: NShirish Pargaonkar <spargaonkar@suse.com>
```
  8ae31240
12 8月, 2014 4 次提交

reiserfs: Fix use after free in journal teardown · 01777836

由 Jan Kara 提交于 8月 06, 2014

If do_journal_release() races with do_journal_end() which requeues
delayed works for transaction flushing, we can leave work items for
flushing outstanding transactions queued while freeing them. That
results in use after free and possible crash in run_timers_softirq().

Fix the problem by not requeueing works if superblock is being shut down
(MS_ACTIVE not set) and using cancel_delayed_work_sync() in
do_journal_release().

CC: stable@vger.kernel.org
Signed-off-by: NJan Kara <jack@suse.cz>

01777836

locks: don't call locks_release_private from locks_copy_lock · 566709bd

由 Jeff Layton 提交于 8月 11, 2014

All callers of locks_copy_lock pass in a brand new file_lock struct, so
there's no need to call locks_release_private on it. Replace that with
a warning that fires in the event that we receive a target lock that
doesn't look like it's properly initialized.
Acked-by: NJ. Bruce Fields <bfields@fieldses.org>
Signed-off-by: NJeff Layton <jlayton@primarydata.com>

566709bd

locks: show delegations as "DELEG" in /proc/locks · 8144f1f6

由 Jeff Layton 提交于 8月 11, 2014

Now that they are a distinct lease type, show them as such.

Cc: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: NJeff Layton <jlayton@primarydata.com>

8144f1f6

fix copy_tree() regression · 12a5b529

由 Al Viro 提交于 8月 10, 2014

Since 3.14 we had copy_tree() get the shadowing wrong - if we had one
vfsmount shadowing another (i.e. if A is a slave of B, C is mounted
on A/foo, then D got mounted on B/foo creating D' on A/foo shadowed
by C), copy_tree() of A would make a copy of D' shadow the the copy of
C, not the other way around.

It's easy to fix, fortunately - just make sure that mount follows
the one that shadows it in mnt_child as well as in mnt_hash, and when
copy_tree() decides to attach a new mount, check if the last child
it has added to the same parent should be shadowing the new one.
And if it should, just use the same logics commit_tree() has - put the
new mount into the hash and children lists right after the one that
should shadow it.

Cc: stable@vger.kernel.org [3.14 and later]
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

12a5b529

11 8月, 2014 2 次提交

cifs: remove unused function cifs_oplock_break_wait · e91259f3

由 Vincent Stehlé 提交于 7月 21, 2014

Commit 74316201 ("sched: Remove proliferation of wait_on_bit() action
functions") has removed the call to cifs_oplock_break_wait, making this
function unused; remove it.

This fixes the following compilation warning:

  fs/cifs/misc.c:578:1: warning: ‘cifs_oplock_break_wait’ defined but not used [-Wunused-function]
Signed-off-by: NVincent Stehlé <vincent.stehle@laposte.net>
Cc: Steve French <sfrench@samba.org>
Signed-off-by: NSteve French <smfrench@gmail.com>

e91259f3

Revert "proc: Point /proc/{mounts,net} at /proc/thread-self/{mounts,net}... · 155134fe

由 Linus Torvalds 提交于 8月 10, 2014

Revert "proc: Point /proc/{mounts,net} at /proc/thread-self/{mounts,net} instead of /proc/self/{mounts,net}"

This reverts commits 344470ca and e8132440.

It turns out that the exact path in the symlink matters, if for somewhat
unfortunate reasons: some apparmor configurations don't allow dhclient
access to the per-thread /proc files. As reported by Jörg Otte:

audit: type=1400 audit(1407684227.003:28): apparmor="DENIED"
operation="open" profile="/sbin/dhclient"
name="/proc/1540/task/1540/net/dev" pid=1540 comm="dhclient"
requested_mask="r" denied_mask="r" fsuid=0 ouid=0

so we had better revert this for now. We might be able to work around
this in practice by only using the per-thread symlinks if the thread
isn't the thread group leader, and if the namespaces differ between
threads (which basically never happens).

We'll see. In the meantime, the revert was made to be intentionally easy.
Reported-by: NJörg Otte <jrg.otte@gmail.com>
Acked-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

155134fe

09 8月, 2014 7 次提交

shm: add sealing API · 40e041a2

由 David Herrmann 提交于 8月 08, 2014

If two processes share a common memory region, they usually want some
guarantees to allow safe access. This often includes:
  - one side cannot overwrite data while the other reads it
  - one side cannot shrink the buffer while the other accesses it
  - one side cannot grow the buffer beyond previously set boundaries

If there is a trust-relationship between both parties, there is no need
for policy enforcement.  However, if there's no trust relationship (eg.,
for general-purpose IPC) sharing memory-regions is highly fragile and
often not possible without local copies.  Look at the following two
use-cases:

  1) A graphics client wants to share its rendering-buffer with a
     graphics-server. The memory-region is allocated by the client for
     read/write access and a second FD is passed to the server. While
     scanning out from the memory region, the server has no guarantee that
     the client doesn't shrink the buffer at any time, requiring rather
     cumbersome SIGBUS handling.
  2) A process wants to perform an RPC on another process. To avoid huge
     bandwidth consumption, zero-copy is preferred. After a message is
     assembled in-memory and a FD is passed to the remote side, both sides
     want to be sure that neither modifies this shared copy, anymore. The
     source may have put sensible data into the message without a separate
     copy and the target may want to parse the message inline, to avoid a
     local copy.

While SIGBUS handling, POSIX mandatory locking and MAP_DENYWRITE provide
ways to achieve most of this, the first one is unproportionally ugly to
use in libraries and the latter two are broken/racy or even disabled due
to denial of service attacks.

This patch introduces the concept of SEALING.  If you seal a file, a
specific set of operations is blocked on that file forever.  Unlike locks,
seals can only be set, never removed.  Hence, once you verified a specific
set of seals is set, you're guaranteed that no-one can perform the blocked
operations on this file, anymore.

An initial set of SEALS is introduced by this patch:
  - SHRINK: If SEAL_SHRINK is set, the file in question cannot be reduced
            in size. This affects ftruncate() and open(O_TRUNC).
  - GROW: If SEAL_GROW is set, the file in question cannot be increased
          in size. This affects ftruncate(), fallocate() and write().
  - WRITE: If SEAL_WRITE is set, no write operations (besides resizing)
           are possible. This affects fallocate(PUNCH_HOLE), mmap() and
           write().
  - SEAL: If SEAL_SEAL is set, no further seals can be added to a file.
          This basically prevents the F_ADD_SEAL operation on a file and
          can be set to prevent others from adding further seals that you
          don't want.

The described use-cases can easily use these seals to provide safe use
without any trust-relationship:

  1) The graphics server can verify that a passed file-descriptor has
     SEAL_SHRINK set. This allows safe scanout, while the client is
     allowed to increase buffer size for window-resizing on-the-fly.
     Concurrent writes are explicitly allowed.
  2) For general-purpose IPC, both processes can verify that SEAL_SHRINK,
     SEAL_GROW and SEAL_WRITE are set. This guarantees that neither
     process can modify the data while the other side parses it.
     Furthermore, it guarantees that even with writable FDs passed to the
     peer, it cannot increase the size to hit memory-limits of the source
     process (in case the file-storage is accounted to the source).

The new API is an extension to fcntl(), adding two new commands:
  F_GET_SEALS: Return a bitset describing the seals on the file. This
               can be called on any FD if the underlying file supports
               sealing.
  F_ADD_SEALS: Change the seals of a given file. This requires WRITE
               access to the file and F_SEAL_SEAL may not already be set.
               Furthermore, the underlying file must support sealing and
               there may not be any existing shared mapping of that file.
               Otherwise, EBADF/EPERM is returned.
               The given seals are _added_ to the existing set of seals
               on the file. You cannot remove seals again.

The fcntl() handler is currently specific to shmem and disabled on all
files. A file needs to explicitly support sealing for this interface to
work. A separate syscall is added in a follow-up, which creates files that
support sealing. There is no intention to support this on other
file-systems. Semantics are unclear for non-volatile files and we lack any
use-case right now. Therefore, the implementation is specific to shmem.
Signed-off-by: NDavid Herrmann <dh.herrmann@gmail.com>
Acked-by: NHugh Dickins <hughd@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ryan Lortie <desrt@desrt.ca>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <zonque@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

40e041a2

mm: allow drivers to prevent new writable mappings · 4bb5f5d9

由 David Herrmann 提交于 8月 08, 2014

This patch (of 6):

The i_mmap_writable field counts existing writable mappings of an
address_space.  To allow drivers to prevent new writable mappings, make
this counter signed and prevent new writable mappings if it is negative.
This is modelled after i_writecount and DENYWRITE.

This will be required by the shmem-sealing infrastructure to prevent any
new writable mappings after the WRITE seal has been set.  In case there
exists a writable mapping, this operation will fail with EBUSY.

Note that we rely on the fact that iff you already own a writable mapping,
you can increase the counter without using the helpers.  This is the same
that we do for i_writecount.
Signed-off-by: NDavid Herrmann <dh.herrmann@gmail.com>
Acked-by: NHugh Dickins <hughd@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ryan Lortie <desrt@desrt.ca>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <zonque@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4bb5f5d9

fs/dlm/debug_fs.c: remove unnecessary null test before debugfs_remove · e0d9bf4c

由 Fabian Frederick 提交于 8月 08, 2014

This fixes checkpatch warning:

  WARNING: debugfs_remove(NULL) is safe this check is probably not required
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Cc: Christine Caulfield <ccaulfie@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e0d9bf4c

initramfs: support initramfs that is bigger than 2GiB · d97b07c5

由 Yinghai Lu 提交于 8月 08, 2014

Now with 64bit bzImage and kexec tools, we support ramdisk that size is
bigger than 2g, as we could put it above 4G.

Found compressed initramfs image could not be decompressed properly.  It
turns out that image length is int during decompress detection, and it
will become < 0 when length is more than 2G.  Furthermore, during
decompressing len as int is used for inbuf count, that has problem too.

Change len to long, that should be ok as on 32 bit platform long is
32bits.

Tested with following compressed initramfs image as root with kexec.
	gzip, bzip2, xz, lzma, lzop, lz4.
run time for populate_rootfs():
   size        name       Nehalem-EX  Westmere-EX  Ivybridge-EX
 9034400256 root_img     :   26s           24s          30s
 3561095057 root_img.lz4 :   28s           27s          27s
 3459554629 root_img.lzo :   29s           29s          28s
 3219399480 root_img.gz  :   64s           62s          49s
 2251594592 root_img.xz  :  262s          260s         183s
 2226366598 root_img.lzma:  386s          376s         277s
 2901482513 root_img.bz2 :  635s          599s
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Rashika Kheria <rashika.kheria@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kyungsik Lee <kyungsik.lee@lge.com>
Cc: P J P <ppandit@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: "Daniel M. Weeks" <dan@danweeks.net>
Cc: Alexandre Courbot <acourbot@nvidia.com>
Cc: Jan Beulich <JBeulich@suse.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d97b07c5

fs/qnx6: update debugging to current functions · fa5a7a41

由 Fabian Frederick 提交于 8月 08, 2014

Add DDEBUG in Makefile when CONFIG_QNX6FS_DEBUG is set.  All QNX6DEBUG
messages are replaced by pr_debug which means debugging will be emitted in
debug level only and no more in error and info levels.  debug uses now
pr_fmt and __func__

QNX6DEBUG definition has been removed.
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Cc: Joe Perches <joe@perches.com>
Cc: Kai Bankett <chaosman@ontika.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fa5a7a41

fs/qnx6: use pr_fmt and __func__ in logging · e6c32616

由 Fabian Frederick 提交于 8月 08, 2014

Remove "qnx6:" and "qnx6: " from each logging instruction.
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Cc: Joe Perches <joe@perches.com>
Cc: Kai Bankett <chaosman@ontika.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e6c32616

fs/qnx6: convert printk to pr_foo() · e00d5b5a

由 Fabian Frederick 提交于 8月 08, 2014

Use current logging functions.

Coalesce formats.
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Cc: Joe Perches <joe@perches.com>
Cc: Kai Bankett <chaosman@ontika.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e00d5b5a

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功