Commits · 55956b59df336f6738da916dbb520b6e37df9fbd · openanolis / cloud-kernel

May 25, 2018 · 2 commits

vfs: Allow userns root to call mknod on owned filesystems. · 55956b59

Committed by Eric W. Biederman on May 23, 2018

These filesystems already always set SB_I_NODEV, so mknod will not be
useful for gaining control of any devices, no matter their permissions.
This will allow overlayfs and applications like fakeroot to use
device nodes to represent things on disk.

Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

vfs: Don't allow changing the link count of an inode with an invalid uid or gid · 593d1ce8

Committed by Eric W. Biederman on Sep 14, 2017

Changing the link count of an inode via unlink or link will cause a
write back of that inode. If the uids or gids are invalid (i.e. not known
to the kernel), writing the inode back may change the uid or gid in the
filesystem. To prevent possible filesystem corruption, and to avoid the need
for filesystem maintainers to worry about it, don't allow operations on
inodes with an invalid uid or gid.
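A minimal user-space sketch of the rule, assuming an unmapped id is represented as (uid_t)-1 / (gid_t)-1 and that the refused operation returns -EPERM; both are assumptions for illustration. The kernel itself tests ids with uid_valid() and gid_valid(); the struct and function names below are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* Hypothetical stand-in for a kernel inode; only ids and link count matter. */
struct toy_inode {
	uid_t i_uid;
	gid_t i_gid;
	unsigned int i_nlink;
};

/* Assumption: an id with no mapping into the kernel is stored as -1. */
#define TOY_INVALID_UID ((uid_t)-1)
#define TOY_INVALID_GID ((gid_t)-1)

static int toy_has_unmapped_id(const struct toy_inode *inode)
{
	return inode->i_uid == TOY_INVALID_UID ||
	       inode->i_gid == TOY_INVALID_GID;
}

/* link(2)-like operation: refuse before touching i_nlink, so the inode is
 * never dirtied and written back with a bogus owner. */
static int toy_link(struct toy_inode *inode)
{
	if (toy_has_unmapped_id(inode))
		return -EPERM;
	inode->i_nlink++;
	return 0;
}
```

The check runs before the link count changes, which is the point: no change, no write-back.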

Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Apr 26, 2018 · 4 commits

ext4: add MODULE_SOFTDEP to ensure crc32c is included in the initramfs · 7ef79ad5

Committed by Theodore Ts'o on Apr 26, 2018

Fixes: a45403b5 ("ext4: always initialize the crc32c checksum driver")
Reported-by: François Valenduc <francoisvalenduc@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

cifs: smbd: Avoid allocating iov on the stack · 8bcda1d2

Committed by Long Li on Apr 17, 2018

It's not necessary to allocate another iov when going through the buffers
in smbd_send() through RDMA send.

Remove it to reduce stack size.

Thanks to Matt for spotting a printk typo in the earlier version of this.

CC: Matt Redfearn <matt.redfearn@mips.com>
Signed-off-by: Long Li <longli@microsoft.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <smfrench@gmail.com>

cifs: smbd: Don't use RDMA read/write when signing is used · bb4c0419

Committed by Long Li on Apr 17, 2018

SMB server will not sign data transferred through RDMA read/write. When
signing is used, it's a good idea to have all the data signed.

In this case, use RDMA send/recv for all data transfers. This will degrade
performance, as this is not how RDMA environments are generally configured,
so warn the user when signing is combined with RDMA send/recv.

Signed-off-by: Long Li <longli@microsoft.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <smfrench@gmail.com>

SMB311: Fix reconnect · 0d5ec281

Committed by Steve French on Apr 22, 2018

The preauth hash was not being recalculated properly on reconnect
of SMB3.11 dialect mounts (which caused access denied repeatedly
on auto-reconnect).

Fixes: 8bd68c6e ("CIFS: implement v3.11 preauth integrity")
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>

Apr 24, 2018 · 3 commits

ext4: fix bitmap position validation · 22be37ac

Committed by Lukas Czerner on Apr 24, 2018

Currently in ext4_valid_block_bitmap() we expect the bitmap to be
positioned anywhere between 0 and s_blocksize clusters, but that's
wrong because the bitmap can be placed anywhere in the block group. This
causes false positives when validating bitmaps on perfectly valid file
system layouts. Fix it by checking whether the bitmap is within the group
boundary.
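The corrected rule is a range test against the block group's own extent rather than against the first s_blocksize clusters. A minimal sketch, using absolute block numbers and a group that starts at group_first and spans per_group blocks; the names are hypothetical, and the real ext4_valid_block_bitmap() works in clusters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Is 'blk' inside the group that starts at 'group_first' and spans
 * 'per_group' blocks? Written as an offset comparison so that
 * 'group_first + per_group' can never overflow. */
static bool block_in_group(uint64_t blk, uint64_t group_first,
			   uint64_t per_group)
{
	if (blk < group_first)
		return false;
	return blk - group_first < per_group;
}
```

If we assume 4 KiB blocks and 32768 blocks per group (a common mkfs default, not stated in the report), block 2774529 from the log lies 22017 blocks into group 84: inside the group, but well beyond the first 4096 blocks the old check accepted.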

The problem can be reproduced using the following

mkfs -t ext3 -E stride=256 /dev/vdb1
mount /dev/vdb1 /mnt/test
cd /mnt/test
wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.16.3.tar.xz
tar xf linux-4.16.3.tar.xz

This will result in the warnings in the logs

EXT4-fs error (device vdb1): ext4_validate_block_bitmap:399: comm tar: bg 84: block 2774529: invalid block bitmap

[ Changed slightly for clarity and to not drop an overflow test -- TYT ]
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Ilya Dryomov <idryomov@gmail.com>
Fixes: 7dac4a17 ("ext4: add validity checks for bitmap block numbers")
Cc: stable@vger.kernel.org

SMB3: Fix 3.11 encryption to Windows and handle encrypted smb3 tcon · 23657ad7

Committed by Steve French on Apr 22, 2018

Temporarily disable AES-GCM, as AES-CCM is currently the only
mechanism enabled on the client side. This fixes SMB3.11
encrypted mounts to Windows.

Also, the tree connect request itself should be encrypted if
encryption was requested ("seal" on mount). In addition, we should
enable encryption in 3.11 based on whether we got any valid
encryption ciphers back in negprot (the corresponding session flag
is not set as it is in 3.0 and 3.02).

Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Stable <stable@vger.kernel.org>

CIFS: set *resp_buf_type to NO_BUFFER on error · 117e3b7f

Committed by Steve French on Apr 22, 2018

Dan Carpenter had pointed this out a while ago, but the code around
this had changed, so it wasn't causing any problems, since that field
was not used in this error path.

Still, it is cleaner to always initialize this field, so change
the error path to set it.

Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <smfrench@gmail.com>

Apr 23, 2018 · 1 commit

ceph: check if mds create snaprealm when setting quota · f1919826

Committed by Yan, Zheng on Apr 08, 2018

If the MDS does not create a snaprealm, return -EOPNOTSUPP.

Link: http://tracker.ceph.com/issues/23491
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Apr 21, 2018 · 13 commits

fs, elf: don't complain MAP_FIXED_NOREPLACE unless -EEXIST error · d23a61ee

Committed by Tetsuo Handa on Apr 20, 2018

Commit 4ed28639 ("fs, elf: drop MAP_FIXED usage from elf_map") is
printing spurious messages under memory pressure due to map_addr == -ENOMEM.

9794 (a.out): Uhuuh, elf segment at 00007f2e34738000(fffffffffffffff4) requested but the memory is mapped already
14104 (a.out): Uhuuh, elf segment at 00007f34fd76c000(fffffffffffffff4) requested but the memory is mapped already
16843 (a.out): Uhuuh, elf segment at 00007f930ecc7000(fffffffffffffff4) requested but the memory is mapped already

Complain only if -EEXIST, and use %px for printing the address.
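The value in the log, fffffffffffffff4, is -12 stored in an unsigned long, i.e. -ENOMEM rather than -EEXIST. A minimal sketch of the narrowed condition; the helper name is hypothetical, and only the errno comparison reflects the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* elf_map() yields either an address or a negative errno cast to an
 * unsigned long. Only a clash with an existing mapping (-EEXIST) is
 * worth the "memory is mapped already" message; -ENOMEM under memory
 * pressure is not. */
static bool should_complain(uint64_t map_addr)
{
	return map_addr == (uint64_t)-EEXIST;
}
```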

Link: http://lkml.kernel.org/r/201804182307.FAC17665.SFMOFJVFtHOLOQ@I-love.SAKURA.ne.jp
Fixes: 4ed28639 ("fs, elf: drop MAP_FIXED usage from elf_map")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrei Vagin <avagin@openvz.org>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Kees Cook <keescook@chromium.org>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

proc: fix /proc/loadavg regression · 9a1015b3

Committed by Alexey Dobriyan on Apr 20, 2018

Commit 95846ecf ("pid: replace pid bitmap implementation with IDR
API") changed last field of /proc/loadavg (last pid allocated) to be off
by one:

	# unshare -p -f --mount-proc cat /proc/loadavg
	0.00 0.00 0.00 1/60 2	<===

It should be 1 after the first fork into a pid namespace.

This is formally a regression but given how useless this field is I
don't think anyone is affected.
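One way such an off-by-one arises: if the printed value comes from an allocator cursor that records the next ID to hand out, the last allocated ID is the cursor minus one. The toy allocator below is a hypothetical model, not the kernel's IDR, but it reproduces the "2" in the example above.

```c
#include <assert.h>

/* Toy cyclic allocator whose cursor holds the next ID to try. */
struct toy_idr {
	int cursor;
};

static int toy_alloc(struct toy_idr *idr)
{
	return idr->cursor++;
}

/* Buggy report: the cursor is the next ID, not the last one handed out. */
static int last_pid_buggy(const struct toy_idr *idr)
{
	return idr->cursor;
}

/* Fixed report: the last allocated ID sits one below the cursor. */
static int last_pid_fixed(const struct toy_idr *idr)
{
	return idr->cursor - 1;
}
```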

Bug was found by /proc testsuite!

Link: http://lkml.kernel.org/r/20180413175408.GA27246@avx2
Fixes: 95846ecf ("pid: replace pid bitmap implementation with IDR API")
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gargi Sharma <gs051095@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

proc: revalidate kernel thread inodes to root:root · 2e0ad552

Committed by Alexey Dobriyan on Apr 20, 2018

task_dump_owner() has the following code:

	mm = task->mm;
	if (mm) {
		if (get_dumpable(mm) != SUID_DUMP_USER) {
			uid = ...
		}
	}

The check for ->mm is buggy: a kernel thread might be borrowing an mm,
and the inode will then get some random uid:gid pair.

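A user-space model of the ownership decision, assuming (per the title) that kernel threads should always come out as root:root and that the test is made before ->mm is consulted. The kernel marks kernel threads with PF_KTHREAD; the structures, the flag value and the ns_root_uid field below are simplified, hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PF_KTHREAD 0x1u  /* illustrative value, not the kernel's */

struct toy_mm {
	bool user_dumpable;        /* models get_dumpable(mm) == SUID_DUMP_USER */
	unsigned int ns_root_uid;  /* root uid of the mm's user namespace */
};

struct toy_task {
	unsigned int flags;
	struct toy_mm *mm;         /* a kernel thread may be borrowing this */
	unsigned int euid;
};

static unsigned int toy_dump_owner_uid(const struct toy_task *task)
{
	/* Kernel threads are root, whatever mm they currently hold. */
	if (task->flags & TOY_PF_KTHREAD)
		return 0;

	unsigned int uid = task->euid;
	if (task->mm && !task->mm->user_dumpable)
		uid = task->mm->ns_root_uid;
	return uid;
}
```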
Link: http://lkml.kernel.org/r/20180412220109.GA20978@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

autofs: mount point create should honour passed in mode · 1e630665

Committed by Ian Kent on Apr 20, 2018

The autofs file system mkdir inode operation blindly sets the created
directory mode to S_IFDIR | 0555, ignoring the passed in mode, which can
cause SELinux dac_override denials.

But the function also checks if the caller is the daemon (as no-one else
should be able to do anything here) so there's no point in not honouring
the passed in mode, allowing the daemon to set appropriate mode when
required.

Link: http://lkml.kernel.org/r/152361593601.8051.14014139124905996173.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

writeback: safer lock nesting · 2e898e4c

Committed by Greg Thelen on Apr 20, 2018

lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if
the page's memcg is undergoing move accounting, which occurs when a
process leaves its memcg for a new one that has
memory.move_charge_at_immigrate set.

unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if
the given inode is switching writeback domains.  Switches occur when
enough writes are issued from a new domain.

This existing pattern is thus suspicious:
    lock_page_memcg(page);
    unlocked_inode_to_wb_begin(inode, &locked);
    ...
    unlocked_inode_to_wb_end(inode, locked);
    unlock_page_memcg(page);

If both an inode switch and a process memcg migration are in flight, then
unlocked_inode_to_wb_end() will unconditionally enable interrupts while
still holding the lock_page_memcg() irq spinlock.  This suggests the
possibility of deadlock if an interrupt occurs before unlock_page_memcg().

    truncate
    __cancel_dirty_page
    lock_page_memcg
    unlocked_inode_to_wb_begin
    unlocked_inode_to_wb_end
    <interrupts mistakenly enabled>
                                    <interrupt>
                                    end_page_writeback
                                    test_clear_page_writeback
                                    lock_page_memcg
                                    <deadlock>
    unlock_page_memcg
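The hazard is restoring the wrong interrupt state. The toy model below tracks a single "interrupts enabled" flag with no real locking. An inner section using unconditional disable/enable (spin_lock_irq()/spin_unlock_irq() style) turns interrupts back on while the outer irqsave lock is still held, whereas save/restore semantics preserve the outer state. Reading "safer lock nesting" as a switch to save/restore is our interpretation; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the local CPU's interrupt-enable flag. */
static bool irqs_enabled = true;

/* Unconditional style: ignores the caller's state. */
static void toy_lock_irq(void)   { irqs_enabled = false; }
static void toy_unlock_irq(void) { irqs_enabled = true; }

/* Save/restore style: remembers and restores the caller's state. */
static bool toy_lock_irqsave(void)
{
	bool flags = irqs_enabled;

	irqs_enabled = false;
	return flags;
}
static void toy_unlock_irqrestore(bool flags) { irqs_enabled = flags; }

/* The outer lock_page_memcg() uses irqsave; the inner inode-switch section
 * uses either style. Returns whether interrupts were on while the outer
 * lock was still held. */
static bool irqs_on_inside_outer_lock(bool inner_uses_irqsave)
{
	irqs_enabled = true;
	bool outer = toy_lock_irqsave();          /* lock_page_memcg() */

	if (inner_uses_irqsave) {
		bool inner = toy_lock_irqsave();      /* wb_begin, save/restore */
		toy_unlock_irqrestore(inner);         /* wb_end, save/restore */
	} else {
		toy_lock_irq();                       /* wb_begin, unconditional */
		toy_unlock_irq();                     /* wb_end, unconditional */
	}
	bool result = irqs_enabled;
	toy_unlock_irqrestore(outer);             /* unlock_page_memcg() */
	return result;
}
```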

Due to configuration limitations this deadlock is not currently possible
because we don't mix cgroup writeback (a cgroupv2 feature) and
memory.move_charge_at_immigrate (a cgroupv1 feature).

If the kernel is hacked to always claim inode switching and memcg
moving_account, then this script triggers lockup in less than a minute:

  cd /mnt/cgroup/memory
  mkdir a b
  echo 1 > a/memory.move_charge_at_immigrate
  echo 1 > b/memory.move_charge_at_immigrate
  (
    echo $BASHPID > a/cgroup.procs
    while true; do
      dd if=/dev/zero of=/mnt/big bs=1M count=256
    done
  ) &
  while true; do
    sync
  done &
  sleep 1h &
  SLEEP=$!
  while true; do
    echo $SLEEP > a/cgroup.procs
    echo $SLEEP > b/cgroup.procs
  done

The deadlock does not seem possible, so it's debatable if there's any
reason to modify the kernel.  I suggest we should, to prevent future
surprises.  And Wang Long said "this deadlock occurs three times in our
environment", so there's more reason to apply this, even to stable.
Stable 4.4 has minor conflicts applying this patch.  For a clean 4.4 patch
see "[PATCH for-4.4] writeback: safer lock nesting"
https://lkml.org/lkml/2018/4/11/146

[gthelen@google.com: v4]
  Link: http://lkml.kernel.org/r/20180411084653.254724-1-gthelen@google.com
[akpm@linux-foundation.org: comment tweaks, struct initialization simplification]
Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613
Link: http://lkml.kernel.org/r/20180410005908.167976-1-gthelen@google.com
Fixes: 682aa8e1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
Signed-off-by: Greg Thelen <gthelen@google.com>
Reported-by: Wang Long <wanglong19@meituan.com>
Acked-by: Wang Long <wanglong19@meituan.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: <stable@vger.kernel.org> [v4.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm, pagemap: fix swap offset value for PMD migration entry · 88c28f24

Committed by Huang Ying on Apr 20, 2018

The swap offset reported by /proc/<pid>/pagemap may be incorrect for
PMD migration entries.  If the addr passed into pagemap_pmd_range() isn't
aligned with the PMD start address, the reported swap offset doesn't
reflect this.  And in the loop that reports information for each sub-page,
the swap offset isn't incremented the way the PFN is.

This may happen after opening /proc/<pid>/pagemap and seeking to a page
whose address doesn't align with a PMD start address.  I have verified
this with a simple test program.
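A sketch of the intended arithmetic, assuming 4 KiB pages and 2 MiB PMDs (x86-64 values; the constants are placeholders). The offset for a sub-page is the entry's base offset plus the sub-page's index within the PMD, and it advances by one per page in the reporting loop, just like the PFN. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder geometry: 4 KiB pages, 2 MiB PMDs. */
#define TOY_PAGE_SHIFT 12
#define TOY_PMD_SIZE   (1ULL << 21)
#define TOY_PMD_MASK   (~(TOY_PMD_SIZE - 1))

/* Offset for the sub-page containing 'addr', given the offset recorded
 * in the PMD migration entry for the first sub-page. */
static uint64_t subpage_swap_offset(uint64_t pmd_offset, uint64_t addr)
{
	return pmd_offset + ((addr & ~TOY_PMD_MASK) >> TOY_PAGE_SHIFT);
}

/* Report 'n' consecutive sub-pages starting at 'addr', advancing the
 * offset by one per page, as the PFN would be. */
static void report_range(uint64_t pmd_offset, uint64_t addr, int n,
			 uint64_t *out)
{
	uint64_t offset = subpage_swap_offset(pmd_offset, addr);

	for (int i = 0; i < n; i++)
		out[i] = offset++;
}
```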

BTW: migration swap entries have PFN information, do we need to restrict
whether to show them?

[akpm@linux-foundation.org: fix typo, per Huang, Ying]
Link: http://lkml.kernel.org/r/20180408033737.10897-1-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrei Vagin <avagin@openvz.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "Jerome Glisse" <jglisse@redhat.com>
Cc: Daniel Colascione <dancol@google.com>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

CIFS: fix typo in cifs_dbg · 596632de

Committed by Aurelien Aptel on Apr 19, 2018

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reported-by: Long Li <longli@microsoft.com>

cifs: do not allow creating sockets except with SMB1 posix extensions · 1d0cffa6

Committed by Steve French on Apr 20, 2018

RHBZ: 1453123

Since at least the 3.10 kernel, and likely a lot earlier, we have
not been able to create unix domain sockets in a cifs share
when mounted using the SFU mount option (except when mounted
with the cifs unix extensions to Samba, for example).
Trying to create a socket, for example using the af_unix command from
xfstests, will cause:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000040

Since no one uses or depends on being able to create unix domain sockets
on a cifs share, the easiest fix to stop this vulnerability is to simply
not allow the creation of any special files other than char or block devices
when sfu is used.

Added update to Ronnie's patch to handle a tcon link leak, and
to address a buf leak noticed by Gustavo and Colin.

Acked-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
CC: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Cc: stable@vger.kernel.org

cifs: smbd: Dump SMB packet when configured · ff30b89e

Committed by Long Li on Apr 17, 2018

When sending through SMB Direct, also dump the packet in SMB send path.

Also fixed a typo in debug message.

Signed-off-by: Long Li <longli@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>

btrfs: print-tree: debugging output enhancement · c0872323

Committed by Qu Wenruo on Apr 11, 2018

This patch enhances the following things:

- tree block header
  * add generation and owner output for node and leaf
- node pointer generation output
- allow btrfs_print_tree() to not follow nodes
  * just like btrfs-progs


Please note that although btrfs_print_tree() is not called by
anyone right now, it's still a pretty useful function for debugging the
kernel, so it is kept for later use.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Fix race condition between delayed refs and blockgroup removal · 5e388e95

Committed by Nikolay Borisov on Apr 18, 2018

When the delayed refs for a head are all run, eventually
cleanup_ref_head is called which (in case of deletion) obtains a
reference for the relevant btrfs_space_info struct by querying the bg
for the range. This is problematic because when the last extent of a
bg is deleted a race window emerges between removal of that bg and the
subsequent invocation of cleanup_ref_head. This can result in cache being null
and either a null pointer dereference or assertion failure.

	task: ffff8d04d31ed080 task.stack: ffff9e5dc10cc000
	RIP: 0010:assfail.constprop.78+0x18/0x1a [btrfs]
	RSP: 0018:ffff9e5dc10cfbe8 EFLAGS: 00010292
	RAX: 0000000000000044 RBX: 0000000000000000 RCX: 0000000000000000
	RDX: ffff8d04ffc1f868 RSI: ffff8d04ffc178c8 RDI: ffff8d04ffc178c8
	RBP: ffff8d04d29e5ea0 R08: 00000000000001f0 R09: 0000000000000001
	R10: ffff9e5dc0507d58 R11: 0000000000000001 R12: ffff8d04d29e5ea0
	R13: ffff8d04d29e5f08 R14: ffff8d04efe29b40 R15: ffff8d04efe203e0
	FS:  00007fbf58ead500(0000) GS:ffff8d04ffc00000(0000) knlGS:0000000000000000
	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	CR2: 00007fe6c6975648 CR3: 0000000013b2a000 CR4: 00000000000006f0
	DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
	DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
	Call Trace:
	 __btrfs_run_delayed_refs+0x10e7/0x12c0 [btrfs]
	 btrfs_run_delayed_refs+0x68/0x250 [btrfs]
	 btrfs_should_end_transaction+0x42/0x60 [btrfs]
	 btrfs_truncate_inode_items+0xaac/0xfc0 [btrfs]
	 btrfs_evict_inode+0x4c6/0x5c0 [btrfs]
	 evict+0xc6/0x190
	 do_unlinkat+0x19c/0x300
	 do_syscall_64+0x74/0x140
	 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
	RIP: 0033:0x7fbf589c57a7

To fix this, introduce a new flag "is_system" to head_ref structs,
which is populated at insertion time. This decouples the querying
for the spaceinfo from querying the possibly deleted bg.

Fixes: d7eae340 ("Btrfs: rework delayed ref total_bytes_pinned accounting")
CC: stable@vger.kernel.org # 4.14+
Suggested-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>

vfs: Undo an overly zealous MS_RDONLY -> SB_RDONLY conversion · a9e5b732

Committed by David Howells on Apr 20, 2018

In do_mount() when the MS_* flags are being converted to MNT_* flags,
MS_RDONLY got accidentally converted to SB_RDONLY.

Undo this change.
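A minimal sketch of the translation step: the input is in the MS_* namespace, so it should be tested against MS_RDONLY. In the kernel headers MS_RDONLY and SB_RDONLY both have the value 1, so the slip most likely changed no behaviour, only what the code says. All TOY_* values are illustrative; only the shared value of 1 is meant to match the kernel.

```c
#include <assert.h>

#define TOY_MS_RDONLY    0x1u   /* mount(2) flag */
#define TOY_MS_NOSUID    0x2u
#define TOY_SB_RDONLY    0x1u   /* internal superblock flag, same value */
#define TOY_MNT_READONLY 0x40u  /* per-mount flags (placeholders) */
#define TOY_MNT_NOSUID   0x01u

/* Translate MS_* flags from mount(2) into per-mount MNT_* flags. */
static unsigned int ms_to_mnt_flags(unsigned int ms_flags)
{
	unsigned int mnt_flags = 0;

	if (ms_flags & TOY_MS_RDONLY)   /* MS_*, not SB_*: input namespace */
		mnt_flags |= TOY_MNT_READONLY;
	if (ms_flags & TOY_MS_NOSUID)
		mnt_flags |= TOY_MNT_NOSUID;
	return mnt_flags;
}
```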

Fixes: e462ec50 ("VFS: Differentiate mount flags (MS_*) from internal superblock flags")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

afs: Fix server record deletion · 66062592

Committed by David Howells on Apr 18, 2018

AFS server records get removed from the net->fs_servers tree when
they're deleted, but not from the net->fs_addresses{4,6} lists, which
can lead to an oops in afs_find_server() when a server record has been
removed, for instance during rmmod.

Fix this by deleting the record from the by-address lists before posting
it for RCU destruction.

The reason this hasn't been noticed before is that the fileserver keeps
probing the local cache manager, thereby keeping the service record
alive, so the oops would only happen when a fileserver eventually gets
bored and stops pinging or if the module gets rmmod'd and a call comes
in from the fileserver during the window between the server records
being destroyed and the socket being closed.

The oops looks something like:

  BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
  ...
  Workqueue: kafsd afs_process_async_call [kafs]
  RIP: 0010:afs_find_server+0x271/0x36f [kafs]
  ...
  Call Trace:
   afs_deliver_cb_init_call_back_state3+0x1f2/0x21f [kafs]
   afs_deliver_to_call+0x1ee/0x5e8 [kafs]
   afs_process_async_call+0x5b/0xd0 [kafs]
   process_one_work+0x2c2/0x504
   worker_thread+0x1d4/0x2ac
   kthread+0x11f/0x127
   ret_from_fork+0x24/0x30

Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Apr 20, 2018 · 1 commit

Don't leak MNT_INTERNAL away from internal mounts · 16a34adb

Committed by Al Viro on Apr 19, 2018

We want it only for the stuff created by SB_KERNMOUNT mounts, *not* for
their copies.  As it is, creating a deep stack of bindings of /proc/*/ns/*
somewhere in a new namespace and exiting yields a stack overflow.

Cc: stable@kernel.org
Reported-by: Alexander Aring <aring@mojatatu.com>
Bisected-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Tested-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Apr 19, 2018 · 2 commits

cifs: smbd: Check for iov length on sending the last iov · ab60ee7b

Committed by Long Li on Apr 17, 2018

When sending the last iov that breaks into smaller buffers to fit the
transfer size, it's necessary to check if this is the last iov.

If it is the last iov, stop and proceed to send pages.

Signed-off-by: Long Li <longli@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>

btrfs: fix unaligned access in readdir · 92d32170

Committed by David Sterba on Apr 16, 2018

The last update to readdir introduced a temporary buffer to store the
emitted readdir data, but as there are file names of variable length,
there's a lot of unaligned access.

This was observed on a sparc64 machine:

  Kernel unaligned access at TPC[102f3080] btrfs_real_readdir+0x51c/0x718 [btrfs]
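On strict-alignment CPUs such as sparc64, a multi-byte store through a misaligned pointer can trap, which is what happens once variable-length names are packed back to back. A user-space sketch of the safe pattern: copy through memcpy() instead of dereferencing a cast pointer. In the kernel the usual helpers are put_unaligned() and get_unaligned(); the record layout and function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack a 64-bit inode number followed by a name into 'buf'. After an
 * odd-length name, the next record's ino field is misaligned. */
static size_t emit_entry(unsigned char *buf, uint64_t ino, const char *name)
{
	size_t len = strlen(name);

	/* Not "*(uint64_t *)buf = ino;", which can trap when misaligned. */
	memcpy(buf, &ino, sizeof(ino));
	memcpy(buf + sizeof(ino), name, len);
	return sizeof(ino) + len;
}

static uint64_t read_ino(const unsigned char *buf)
{
	uint64_t ino;

	memcpy(&ino, buf, sizeof(ino));
	return ino;
}
```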

Fixes: 23b5ec74 ("btrfs: fix readdir deadlock with pagefault")
CC: stable@vger.kernel.org # 4.14+
Reported-and-tested-by: René Rebe <rene@exactcode.com>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Apr 18, 2018 · 8 commits

ext4: set h_journal if there is a failure starting a reserved handle · b2569260

Committed by Theodore Ts'o on Apr 18, 2018

If ext4 tries to start a reserved handle via
jbd2_journal_start_reserved(), and the journal has been aborted, this
can result in a NULL pointer dereference.  This is because the fields
h_journal and h_transaction in the handle structure share the same
memory, via a union, so jbd2_journal_start_reserved() will clear
h_journal before calling start_this_handle().  If this function fails
due to an aborted handle, h_journal will still be NULL, and the call
to jbd2_journal_free_reserved() will pass a NULL journal to
sub_reserve_credits().
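The failure mode can be modelled with a small union that mirrors the description above: clearing h_journal also clears h_transaction, so a failed start leaves both NULL. Following the title, the sketch sets h_journal again on failure; saving the pointer up front is our assumption about how, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct toy_journal { int reserved_credits; };
struct toy_transaction { int id; };

/* As described: the two pointers share storage. */
struct toy_handle {
	union {
		struct toy_journal *h_journal;
		struct toy_transaction *h_transaction;
	};
};

/* Stand-in for start_this_handle(): fails if the journal is aborted. */
static int toy_start_this_handle(struct toy_handle *h, int aborted,
				 struct toy_transaction *t)
{
	if (aborted)
		return -1;
	h->h_transaction = t;
	return 0;
}

/* Remember the journal before the union is overwritten and restore it on
 * failure, so cleanup code sees a valid journal instead of NULL. */
static int toy_start_reserved(struct toy_handle *h, int aborted,
			      struct toy_transaction *t)
{
	struct toy_journal *journal = h->h_journal;

	h->h_journal = NULL;
	if (toy_start_this_handle(h, aborted, t) < 0) {
		h->h_journal = journal;
		return -1;
	}
	return 0;
}
```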

This can be reproduced by running "kvm-xfstests -c dioread_nolock
generic/475".

Cc: stable@kernel.org # 3.11
Fixes: 8f7d89f3 ("jbd2: transaction reservation support")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Jan Kara <jack@suse.cz>

btrfs: Fix wrong btrfs_delalloc_release_extents parameter · 336a8bb8

Committed by Qu Wenruo on Apr 17, 2018

Commit 43b18595 ("btrfs: qgroup: Use separate meta reservation type
for delalloc") as merged into mainline is not the latest version submitted
to the mailing list in Dec 2017.

It passes a fatally wrong @qgroup_free parameter, which results in increasing
qgroup metadata pertrans reserved space and causes a lot of early EDQUOT.

Fix it by applying the correct diff on top of current branch.

Fixes: 43b18595 ("btrfs: qgroup: Use separate meta reservation type for delalloc")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: delayed-inode: Remove wrong qgroup meta reservation calls · f218ea6c

Committed by Qu Wenruo on Apr 17, 2018

Commit 4f5427cc ("btrfs: delayed-inode: Use new qgroup meta rsv for
delayed inode and item") as merged into mainline was not the latest version
submitted to the mailing list in Dec 2017.

It lacks the following fixes:

1) Remove btrfs_qgroup_convert_reserved_meta() call in
   btrfs_delayed_item_release_metadata()
2) Remove btrfs_qgroup_reserve_meta_prealloc() call in
   btrfs_delayed_inode_reserve_metadata()

Those fixes will resolve unexpected EDQUOT problems.

Fixes: 4f5427cc ("btrfs: delayed-inode: Use new qgroup meta rsv for delayed inode and item")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: qgroup: Use independent and accurate per inode qgroup rsv · ff6bc37e

Committed by Qu Wenruo on Dec 21, 2017

Unlike the reservation calculation used in the inode rsv for metadata,
qgroup doesn't really need to care about things like csum size or extent
usage for the whole tree COW.

Qgroups care more about the net change of extent usage.
That is to say, if we're going to insert one file extent, it will mostly
find its place in a COWed tree block, leaving no change in extent usage.
Or it may cause a leaf split, resulting in one new net extent and increasing
the qgroup number by nodesize.
Or, in an even rarer case, it may increase the tree level, increasing the
qgroup number by 2 * nodesize.

So instead of using the complicated calculation of the extent allocator,
which must be accurate and must not fail, qgroup can avoid that
over-estimated reservation.

This patch maintains 2 new members in the btrfs_block_rsv structure for
qgroup, using a much smaller calculation for the qgroup rsv and reducing
false EDQUOT.

Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>

btrfs: qgroup: Commit transaction in advance to reduce early EDQUOT · a514d638

Committed by Qu Wenruo on Dec 22, 2017

Unlike the previous method, which tried to commit the transaction inside
qgroup_reserve(), this time we try to commit the transaction using
fs_info->transaction_kthread, which avoids a nested transaction and removes
the need to worry about locking context.

Since it's an asynchronous function call and, unlike the previous method,
we won't wait for the transaction commit, we must call it before we
hit the qgroup limit.

So this patch uses the ratio and size of the qgroup meta_pertrans
reservation as indicators to check whether we should trigger a transaction
commit.  (meta_prealloc won't be cleaned in a transaction commit, so it's
useless anyway)
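The message does not spell out the formula. One hypothetical way to combine a ratio and a size: commit early once the per-transaction reservation exceeds a fraction of the limit still available (ignoring data and meta_prealloc, which a commit would not release), capped at a fixed size. Both constants and the function name are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tuning knobs chosen for this sketch. */
#define TOY_PERTRANS_RATIO 32u            /* 1/32 of what is left */
#define TOY_PERTRANS_CAP   (32ull << 20)  /* but never above 32 MiB */

static bool should_commit_early(uint64_t limit, uint64_t data_rsv,
				uint64_t prealloc_rsv, uint64_t pertrans_rsv)
{
	uint64_t fixed = data_rsv + prealloc_rsv;
	uint64_t left = limit > fixed ? limit - fixed : 0;
	uint64_t threshold = left / TOY_PERTRANS_RATIO;

	if (threshold > TOY_PERTRANS_CAP)
		threshold = TOY_PERTRANS_CAP;
	return pertrans_rsv > threshold;
}
```

With a 64 MiB limit and nothing else reserved, the threshold is 2 MiB; with a 10 GiB limit, the cap keeps it at 32 MiB.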

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

udf: Fix leak of UTF-16 surrogates into encoded strings · 44f06ba8

Committed by Jan Kara on Apr 12, 2018

The OSTA UDF specification does not mention whether the CS0 charset, in the
case of the two-bytes-per-character encoding, should be treated as UTF-16 or
UCS-2. The sample code in the standard does not treat UTF-16 surrogates
in any special way, but on systems such as Windows, which work in UTF-16
internally, filenames would effectively be treated as UTF-16.
In Linux it is more difficult to handle characters outside of the Basic
Multilingual Plane (beyond 0xffff), as the NLS framework works with 2-byte
characters only. Just make sure we don't leak UTF-16 surrogates into the
resulting string when loading names from the filesystem for now.
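A minimal sketch of keeping surrogate code units (U+D800 to U+DFFF) out of a string of 2-byte characters. Replacing them with '?' is an assumption for illustration; the message only says surrogates must not leak into the result, not what takes their place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* True for UTF-16 surrogate code units. */
static int is_utf16_surrogate(uint16_t c)
{
	return c >= 0xD800 && c <= 0xDFFF;
}

/* Copy n 2-byte characters, substituting '?' for any surrogate. */
static size_t sanitize_ucs2(const uint16_t *in, size_t n, uint16_t *out)
{
	for (size_t i = 0; i < n; i++)
		out[i] = is_utf16_surrogate(in[i]) ? (uint16_t)'?' : in[i];
	return n;
}
```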

CC: stable@vger.kernel.org # >= v4.6
Reported-by: Mingye Wang <arthur200126@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>

fs: cifs: Adding new return type vm_fault_t · a5240cbd

Committed by Souptick Joarder on Apr 15, 2018

Use new return type vm_fault_t for page_mkwrite
handler.

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

cifs: smb2ops: Fix NULL check in smb2_query_symlink · 0d568cd3

Committed by Gustavo A. R. Silva on Apr 13, 2018

The current code null checks the variable err_buf, which is always null
when it is checked; hence utf16_path is freed and the function
returns -ENOENT every time it is called, making it impossible for the
execution path to reach the following code:

err_buf = err_iov.iov_base;

Fix this by null checking err_iov.iov_base instead of err_buf. Also,
notice that err_buf no longer needs to be initialized to NULL.

Addresses-Coverity-ID: 1467876 ("Logically dead code")
Fixes: 2d636199e400 ("cifs: Change SMB2_open to return an iov for the error parameter")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>

Apr 17, 2018 · 1 commit

eCryptfs: don't pass up plaintext names when using filename encryption · e86281e7

Committed by Tyler Hicks on Mar 28, 2018

Both ecryptfs_filldir() and ecryptfs_readlink_lower() use
ecryptfs_decode_and_decrypt_filename() to translate lower filenames to
upper filenames. The function correctly passes up lower filenames,
unchanged, when filename encryption isn't in use. However, it was also
passing up lower filenames when the filename wasn't encrypted or
when decryption failed. Since 88ae4ab9, eCryptfs refuses to look up
lower plaintext names when filename encryption is enabled, so this
resulted in a situation where userspace would see lower plaintext
filenames in calls to getdents(2) but then not be able to look up those
filenames.

An example of this can be seen when enabling filename encryption on an
eCryptfs mount at the root directory of an Ext4 filesystem:

$ ls -1i /lower
12 ECRYPTFS_FNEK_ENCRYPTED.FWYZD8TcW.5FV-TKTEYOHsheiHX9a-w.NURCCYIMjI8pn5BDB9-h3fXwrE--
11 lost+found
$ ls -1i /upper
ls: cannot access '/upper/lost+found': No such file or directory
 ? lost+found
12 test

With this change, the lower lost+found dentry is ignored:

$ ls -1i /lower
12 ECRYPTFS_FNEK_ENCRYPTED.FWYZD8TcW.5FV-TKTEYOHsheiHX9a-w.NURCCYIMjI8pn5BDB9-h3fXwrE--
11 lost+found
$ ls -1i /upper
12 test
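The filtering rule can be sketched as a predicate over lower names. This is a simplification: decrypt_ok stands in for the real decode-and-decrypt step, which is not implemented here, and recognising encrypted names by the ECRYPTFS_FNEK_ENCRYPTED. prefix shown in the listing above is our assumption about how they are told apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Prefix seen on encrypted lower names in the example listing. */
#define FNEK_PREFIX "ECRYPTFS_FNEK_ENCRYPTED."

/* Should filldir pass this lower entry up to userspace? */
static bool pass_up_entry(bool fn_encryption, const char *lower_name,
			  bool decrypt_ok)
{
	if (!fn_encryption)
		return true;   /* names pass through unchanged */
	if (strncmp(lower_name, FNEK_PREFIX, strlen(FNEK_PREFIX)) != 0)
		return false;  /* plaintext lower name such as lost+found */
	return decrypt_ok;     /* names that fail to decrypt are skipped too */
}
```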

Additionally, some potentially noisy error/info messages in the related
code paths are turned into debug messages so that the logs can't be
easily filled.

Fixes: 88ae4ab9 ("ecryptfs_lookup(): try either only encrypted or plaintext name")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>

Apr 16, 2018 · 5 commits

fs: ext2: Adding new return type vm_fault_t · 06856938

Committed by Souptick Joarder on Apr 15, 2018

Use new return type vm_fault_t for page_mkwrite,
pfn_mkwrite and fault handler.

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Jan Kara <jack@suse.cz>

isofs: fix potential memory leak in mount option parsing · 4f34a513

Committed by Chengguang Xu on Apr 14, 2018

When a string-type mount option (e.g., iocharset) is specified
several times in one mount, the current option parsing may
leak memory. Hence, kfree the previous value in this case.
Also, check the result of the memory allocation
for it.
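A user-space sketch of the pattern. The kernel parser works with match_strdup() and kfree(); here malloc() and free() play those roles, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for one string-valued mount option. */
struct toy_opts {
	char *iocharset;
};

/* Handle one "iocharset=<value>" occurrence: allocate the new copy and
 * report failure, then free any value left by an earlier occurrence so
 * that repeating the option does not leak. */
static int set_iocharset(struct toy_opts *opts, const char *value)
{
	size_t len = strlen(value) + 1;
	char *copy = malloc(len);

	if (!copy)
		return -ENOMEM;
	memcpy(copy, value, len);
	free(opts->iocharset);
	opts->iocharset = copy;
	return 0;
}
```

Allocating before freeing keeps the old value intact if the allocation fails.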

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Jan Kara <jack@suse.cz>

ceph: always update atime/mtime/ctime for new inode · ffdeec7a

Committed by Yan, Zheng on Mar 26, 2018

For a new inode, atime/mtime/ctime are uninitialized.  Don't compare
against them.

Cc: stable@kernel.org
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

mm,vmscan: Allow preallocating memory for register_shrinker(). · 8e04944f

Committed by Tetsuo Handa on Apr 04, 2018

syzbot is catching so many bugs triggered by commit 9ee332d9
("sget(): handle failures of register_shrinker()"). That commit expected
that calling kill_sb() from deactivate_locked_super() without successful
fill_super() is safe, but the reality was different; some callers assign
attributes which are needed for kill_sb() after sget() succeeds.

For example, [1] is a report where sb->s_mode (which seems to be either
FMODE_READ | FMODE_EXCL | FMODE_WRITE or FMODE_READ | FMODE_EXCL) is not
assigned unless sget() succeeds. But it is not worth complicating sget()
just so that the register_shrinker() failure path can safely call
kill_block_super() via kill_sb(). Making alloc_super() fail if memory
allocation for register_shrinker() fails is much simpler. Let's avoid
calling deactivate_locked_super() from sget_userns() by preallocating
memory for the shrinker and making register_shrinker() in sget_userns()
never fail.
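The split described above, a preallocation step that may fail early and a registration step that cannot, can be sketched in user space. The toy names below are hypothetical and only mirror that shape; simulate_oom exists so the failure path can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_shrinker {
	long *nr_deferred;   /* state that needs allocating */
	bool registered;
};

struct toy_super {
	struct toy_shrinker shrinker;
};

/* Step 1, may fail: allocate everything registration will need. */
static int toy_prealloc_shrinker(struct toy_shrinker *s, bool simulate_oom)
{
	s->nr_deferred = simulate_oom ? NULL : calloc(1, sizeof(long));
	s->registered = false;
	return s->nr_deferred ? 0 : -ENOMEM;
}

/* Step 2, cannot fail: publish the prepared shrinker. */
static void toy_register_prepared_shrinker(struct toy_shrinker *s)
{
	s->registered = true;
}

/* alloc_super()-like: fail before the superblock is visible to anyone,
 * so this error never needs a kill_sb()-style teardown. */
static struct toy_super *toy_alloc_super(bool simulate_oom)
{
	struct toy_super *sb = calloc(1, sizeof(*sb));

	if (!sb)
		return NULL;
	if (toy_prealloc_shrinker(&sb->shrinker, simulate_oom) != 0) {
		free(sb);
		return NULL;
	}
	return sb;
}
```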

[1] https://syzkaller.appspot.com/bug?id=588996a25a2587be2e3a54e8646728fb9cae44e7

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+5a170e19c963a2e0df79@syzkaller.appspotmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

orangefs_kill_sb(): deal with allocation failures · 65903842

Committed by Al Viro on Apr 03, 2018

orangefs_fill_sb() might've failed to allocate ORANGEFS_SB(s); don't
oops in that case.

Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
