1. 30 5月, 2018 1 次提交
  2. 05 5月, 2018 1 次提交
  3. 03 5月, 2018 6 次提交
    • J
      f2fs: clear PageError on writepage · 17c50035
      Jaegeuk Kim 提交于
      This patch clears PageError in some pages tagged by read path, but when we
      write the pages with valid contents, writepage should clear the bit likewise
      ext4.
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      17c50035
    • J
      f2fs: check cap_resource only for data blocks · a90a0884
      Jaegeuk Kim 提交于
      This patch changes the rule to check cap_resource for data blocks, not inode
      or node blocks in order to avoid selinux denial.
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      a90a0884
    • J
      Revert "f2fs: introduce f2fs_set_page_dirty_nobuffer" · b87078ad
      Jaegeuk Kim 提交于
      This patch reverts copied f2fs_set_page_dirty_nobuffer to use generic function
      for stability.
      
      This reverts commit fe76b796.
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      b87078ad
    • E
      f2fs: call unlock_new_inode() before d_instantiate() · ab3835aa
      Eric Biggers 提交于
      xfstest generic/429 sometimes hangs on f2fs, caused by a thread being
      unable to take a directory's i_rwsem for write in vfs_rmdir().  In the
      test, one thread repeatedly creates and removes a directory, and other
      threads repeatedly look up a file in the directory.  The bug is that
      f2fs_mkdir() calls d_instantiate() before unlock_new_inode(), resulting
      in the directory inode being exposed to lookups before it has been fully
      initialized.  And with CONFIG_DEBUG_LOCK_ALLOC, unlock_new_inode()
      reinitializes ->i_rwsem, corrupting its state when it is already held.
      
      Fix it by calling unlock_new_inode() before d_instantiate().  This
      matches what other filesystems do.
      
      Fixes: 57397d86 ("f2fs: add inode operations for special inodes")
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      ab3835aa
    • E
      f2fs: refactor read path to allow multiple postprocessing steps · 6dbb1796
      Eric Biggers 提交于
      Currently f2fs's ->readpage() and ->readpages() assume that either the
      data undergoes no postprocessing, or decryption only.  But with
      fs-verity, there will be an additional authenticity verification step,
      and it may be needed either by itself, or combined with decryption.
      
      To support this, store a 'struct bio_post_read_ctx' in ->bi_private
      which contains a work struct, a bitmask of postprocessing steps that are
      enabled, and an indicator of the current step.  The bio completion
      routine, if there was no I/O error, enqueues the first postprocessing
      step.  When that completes, it continues to the next step.  Pages that
      fail any postprocessing step have PageError set.  Once all steps have
      completed, pages without PageError set are set Uptodate, and all pages
      are unlocked.
      
      Also replace f2fs_encrypted_file() with a new function
      f2fs_post_read_required() in places like direct I/O and garbage
      collection that really should be testing whether the file needs special
      I/O processing, not whether it is encrypted specifically.
      
      This may also be useful for other future f2fs features such as
      compression.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      6dbb1796
    • E
      fscrypt: allow synchronous bio decryption · 0cb8dae4
      Eric Biggers 提交于
      Currently, fscrypt provides fscrypt_decrypt_bio_pages() which decrypts a
      bio's pages asynchronously, then unlocks them afterwards.  But, this
      assumes that decryption is the last "postprocessing step" for the bio,
      so it's incompatible with additional postprocessing steps such as
      authenticity verification after decryption.
      
      Therefore, rename the existing fscrypt_decrypt_bio_pages() to
      fscrypt_enqueue_decrypt_bio().  Then, add fscrypt_decrypt_bio() which
      decrypts the pages in the bio synchronously without unlocking the pages,
      nor setting them Uptodate; and add fscrypt_enqueue_decrypt_work(), which
      enqueues work on the fscrypt_read_workqueue.  The new functions will be
      used by filesystems that support both fscrypt and fs-verity.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      0cb8dae4
  4. 26 4月, 2018 4 次提交
  5. 24 4月, 2018 3 次提交
  6. 23 4月, 2018 1 次提交
  7. 21 4月, 2018 13 次提交
    • T
      fs, elf: don't complain MAP_FIXED_NOREPLACE unless -EEXIST error · d23a61ee
      Tetsuo Handa 提交于
      Commit 4ed28639 ("fs, elf: drop MAP_FIXED usage from elf_map") is
      printing spurious messages under memory pressure due to map_addr == -ENOMEM.
      
       9794 (a.out): Uhuuh, elf segment at 00007f2e34738000(fffffffffffffff4) requested but the memory is mapped already
       14104 (a.out): Uhuuh, elf segment at 00007f34fd76c000(fffffffffffffff4) requested but the memory is mapped already
       16843 (a.out): Uhuuh, elf segment at 00007f930ecc7000(fffffffffffffff4) requested but the memory is mapped already
      
      Complain only if -EEXIST, and use %px for printing the address.
      
      Link: http://lkml.kernel.org/r/201804182307.FAC17665.SFMOFJVFtHOLOQ@I-love.SAKURA.ne.jp
      Fixes: 4ed28639 ("fs, elf: drop MAP_FIXED usage from elf_map") is
      Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Andrei Vagin <avagin@openvz.org>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Joel Stanley <joel@jms.id.au>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d23a61ee
    • A
      proc: fix /proc/loadavg regression · 9a1015b3
      Alexey Dobriyan 提交于
      Commit 95846ecf ("pid: replace pid bitmap implementation with IDR
      API") changed last field of /proc/loadavg (last pid allocated) to be off
      by one:
      
      	# unshare -p -f --mount-proc cat /proc/loadavg
      	0.00 0.00 0.00 1/60 2	<===
      
      It should be 1 after first fork into pid namespace.
      
      This is formally a regression but given how useless this field is I
      don't think anyone is affected.
      
      Bug was found by /proc testsuite!
      
      Link: http://lkml.kernel.org/r/20180413175408.GA27246@avx2
      Fixes: 95846ecf ("pid: replace pid bitmap implementation with IDR API")
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Gargi Sharma <gs051095@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9a1015b3
    • A
      proc: revalidate kernel thread inodes to root:root · 2e0ad552
      Alexey Dobriyan 提交于
      task_dump_owner() has the following code:
      
      	mm = task->mm;
      	if (mm) {
      		if (get_dumpable(mm) != SUID_DUMP_USER) {
      			uid = ...
      		}
      	}
      
      Check for ->mm is buggy -- kernel thread might be borrowing mm
      and inode will go to some random uid:gid pair.
      
      Link: http://lkml.kernel.org/r/20180412220109.GA20978@avx2Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2e0ad552
    • I
      autofs: mount point create should honour passed in mode · 1e630665
      Ian Kent 提交于
      The autofs file system mkdir inode operation blindly sets the created
      directory mode to S_IFDIR | 0555, ingoring the passed in mode, which can
      cause selinux dac_override denials.
      
      But the function also checks if the caller is the daemon (as no-one else
      should be able to do anything here) so there's no point in not honouring
      the passed in mode, allowing the daemon to set appropriate mode when
      required.
      
      Link: http://lkml.kernel.org/r/152361593601.8051.14014139124905996173.stgit@pluto.themaw.netSigned-off-by: NIan Kent <raven@themaw.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1e630665
    • G
      writeback: safer lock nesting · 2e898e4c
      Greg Thelen 提交于
      lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if
      the page's memcg is undergoing move accounting, which occurs when a
      process leaves its memcg for a new one that has
      memory.move_charge_at_immigrate set.
      
      unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if
      the given inode is switching writeback domains.  Switches occur when
      enough writes are issued from a new domain.
      
      This existing pattern is thus suspicious:
          lock_page_memcg(page);
          unlocked_inode_to_wb_begin(inode, &locked);
          ...
          unlocked_inode_to_wb_end(inode, locked);
          unlock_page_memcg(page);
      
      If both inode switch and process memcg migration are both in-flight then
      unlocked_inode_to_wb_end() will unconditionally enable interrupts while
      still holding the lock_page_memcg() irq spinlock.  This suggests the
      possibility of deadlock if an interrupt occurs before unlock_page_memcg().
      
          truncate
          __cancel_dirty_page
          lock_page_memcg
          unlocked_inode_to_wb_begin
          unlocked_inode_to_wb_end
          <interrupts mistakenly enabled>
                                          <interrupt>
                                          end_page_writeback
                                          test_clear_page_writeback
                                          lock_page_memcg
                                          <deadlock>
          unlock_page_memcg
      
      Due to configuration limitations this deadlock is not currently possible
      because we don't mix cgroup writeback (a cgroupv2 feature) and
      memory.move_charge_at_immigrate (a cgroupv1 feature).
      
      If the kernel is hacked to always claim inode switching and memcg
      moving_account, then this script triggers lockup in less than a minute:
      
        cd /mnt/cgroup/memory
        mkdir a b
        echo 1 > a/memory.move_charge_at_immigrate
        echo 1 > b/memory.move_charge_at_immigrate
        (
          echo $BASHPID > a/cgroup.procs
          while true; do
            dd if=/dev/zero of=/mnt/big bs=1M count=256
          done
        ) &
        while true; do
          sync
        done &
        sleep 1h &
        SLEEP=$!
        while true; do
          echo $SLEEP > a/cgroup.procs
          echo $SLEEP > b/cgroup.procs
        done
      
      The deadlock does not seem possible, so it's debatable if there's any
      reason to modify the kernel.  I suggest we should to prevent future
      surprises.  And Wang Long said "this deadlock occurs three times in our
      environment", so there's more reason to apply this, even to stable.
      Stable 4.4 has minor conflicts applying this patch.  For a clean 4.4 patch
      see "[PATCH for-4.4] writeback: safer lock nesting"
      https://lkml.org/lkml/2018/4/11/146
      
      Wang Long said "this deadlock occurs three times in our environment"
      
      [gthelen@google.com: v4]
        Link: http://lkml.kernel.org/r/20180411084653.254724-1-gthelen@google.com
      [akpm@linux-foundation.org: comment tweaks, struct initialization simplification]
      Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613
      Link: http://lkml.kernel.org/r/20180410005908.167976-1-gthelen@google.com
      Fixes: 682aa8e1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
      Signed-off-by: NGreg Thelen <gthelen@google.com>
      Reported-by: NWang Long <wanglong19@meituan.com>
      Acked-by: NWang Long <wanglong19@meituan.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: <stable@vger.kernel.org>	[v4.2+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2e898e4c
    • H
      mm, pagemap: fix swap offset value for PMD migration entry · 88c28f24
      Huang Ying 提交于
      The swap offset reported by /proc/<pid>/pagemap may be not correct for
      PMD migration entries.  If addr passed into pagemap_pmd_range() isn't
      aligned with PMD start address, the swap offset reported doesn't
      reflect this.  And in the loop to report information of each sub-page,
      the swap offset isn't increased accordingly as that for PFN.
      
      This may happen after opening /proc/<pid>/pagemap and seeking to a page
      whose address doesn't align with a PMD start address.  I have verified
      this with a simple test program.
      
      BTW: migration swap entries have PFN information, do we need to restrict
      whether to show them?
      
      [akpm@linux-foundation.org: fix typo, per Huang, Ying]
      Link: http://lkml.kernel.org/r/20180408033737.10897-1-ying.huang@intel.comSigned-off-by: N"Huang, Ying" <ying.huang@intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrei Vagin <avagin@openvz.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: "Jerome Glisse" <jglisse@redhat.com>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Zi Yan <zi.yan@cs.rutgers.edu>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      88c28f24
    • A
      CIFS: fix typo in cifs_dbg · 596632de
      Aurelien Aptel 提交于
      Signed-off-by: NAurelien Aptel <aaptel@suse.com>
      Signed-off-by: NSteve French <smfrench@gmail.com>
      Reported-by: NLong Li <longli@microsoft.com>
      596632de
    • S
      cifs: do not allow creating sockets except with SMB1 posix exensions · 1d0cffa6
      Steve French 提交于
      RHBZ: 1453123
      
      Since at least the 3.10 kernel and likely a lot earlier we have
      not been able to create unix domain sockets in a cifs share
      when mounted using the SFU mount option (except when mounted
      with the cifs unix extensions to Samba e.g.)
      Trying to create a socket, for example using the af_unix command from
      xfstests will cause :
      BUG: unable to handle kernel NULL pointer dereference at 00000000
      00000040
      
      Since no one uses or depends on being able to create unix domains sockets
      on a cifs share the easiest fix to stop this vulnerability is to simply
      not allow creation of any other special files than char or block devices
      when sfu is used.
      
      Added update to Ronnie's patch to handle a tcon link leak, and
      to address a buf leak noticed by Gustavo and Colin.
      Acked-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
      CC:  Colin Ian King <colin.king@canonical.com>
      Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
      Reported-by: NEryu Guan <eguan@redhat.com>
      Signed-off-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: NSteve French <smfrench@gmail.com>
      Cc: stable@vger.kernel.org
      1d0cffa6
    • L
      cifs: smbd: Dump SMB packet when configured · ff30b89e
      Long Li 提交于
      When sending through SMB Direct, also dump the packet in SMB send path.
      
      Also fixed a typo in debug message.
      Signed-off-by: NLong Li <longli@microsoft.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NSteve French <smfrench@gmail.com>
      Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
      ff30b89e
    • Q
      btrfs: print-tree: debugging output enhancement · c0872323
      Qu Wenruo 提交于
      This patch enhances the following things:
      
      - tree block header
        * add generation and owner output for node and leaf
      - node pointer generation output
      - allow btrfs_print_tree() to not follow nodes
        * just like btrfs-progs
      
      Please note that, although function btrfs_print_tree() is not called by
      anyone right now, it's still a pretty useful function to debug kernel.
      So that function is still kept for later use.
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Reviewed-by: NLu Fengqi <lufq.fnst@cn.fujitsu.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      c0872323
    • N
      btrfs: Fix race condition between delayed refs and blockgroup removal · 5e388e95
      Nikolay Borisov 提交于
      When the delayed refs for a head are all run, eventually
      cleanup_ref_head is called which (in case of deletion) obtains a
      reference for the relevant btrfs_space_info struct by querying the bg
      for the range. This is problematic because when the last extent of a
      bg is deleted a race window emerges between removal of that bg and the
      subsequent invocation of cleanup_ref_head. This can result in cache being null
      and either a null pointer dereference or assertion failure.
      
      	task: ffff8d04d31ed080 task.stack: ffff9e5dc10cc000
      	RIP: 0010:assfail.constprop.78+0x18/0x1a [btrfs]
      	RSP: 0018:ffff9e5dc10cfbe8 EFLAGS: 00010292
      	RAX: 0000000000000044 RBX: 0000000000000000 RCX: 0000000000000000
      	RDX: ffff8d04ffc1f868 RSI: ffff8d04ffc178c8 RDI: ffff8d04ffc178c8
      	RBP: ffff8d04d29e5ea0 R08: 00000000000001f0 R09: 0000000000000001
      	R10: ffff9e5dc0507d58 R11: 0000000000000001 R12: ffff8d04d29e5ea0
      	R13: ffff8d04d29e5f08 R14: ffff8d04efe29b40 R15: ffff8d04efe203e0
      	FS:  00007fbf58ead500(0000) GS:ffff8d04ffc00000(0000) knlGS:0000000000000000
      	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      	CR2: 00007fe6c6975648 CR3: 0000000013b2a000 CR4: 00000000000006f0
      	DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      	DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      	Call Trace:
      	 __btrfs_run_delayed_refs+0x10e7/0x12c0 [btrfs]
      	 btrfs_run_delayed_refs+0x68/0x250 [btrfs]
      	 btrfs_should_end_transaction+0x42/0x60 [btrfs]
      	 btrfs_truncate_inode_items+0xaac/0xfc0 [btrfs]
      	 btrfs_evict_inode+0x4c6/0x5c0 [btrfs]
      	 evict+0xc6/0x190
      	 do_unlinkat+0x19c/0x300
      	 do_syscall_64+0x74/0x140
      	 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      	RIP: 0033:0x7fbf589c57a7
      
      To fix this, introduce a new flag "is_system" to head_ref structs,
      which is populated at insertion time. This allows to decouple the
      querying for the spaceinfo from querying the possibly deleted bg.
      
      Fixes: d7eae340 ("Btrfs: rework delayed ref total_bytes_pinned accounting")
      CC: stable@vger.kernel.org # 4.14+
      Suggested-by: NOmar Sandoval <osandov@osandov.com>
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      5e388e95
    • D
      vfs: Undo an overly zealous MS_RDONLY -> SB_RDONLY conversion · a9e5b732
      David Howells 提交于
      In do_mount() when the MS_* flags are being converted to MNT_* flags,
      MS_RDONLY got accidentally convered to SB_RDONLY.
      
      Undo this change.
      
      Fixes: e462ec50 ("VFS: Differentiate mount flags (MS_*) from internal superblock flags")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9e5b732
    • D
      afs: Fix server record deletion · 66062592
      David Howells 提交于
      AFS server records get removed from the net->fs_servers tree when
      they're deleted, but not from the net->fs_addresses{4,6} lists, which
      can lead to an oops in afs_find_server() when a server record has been
      removed, for instance during rmmod.
      
      Fix this by deleting the record from the by-address lists before posting
      it for RCU destruction.
      
      The reason this hasn't been noticed before is that the fileserver keeps
      probing the local cache manager, thereby keeping the service record
      alive, so the oops would only happen when a fileserver eventually gets
      bored and stops pinging or if the module gets rmmod'd and a call comes
      in from the fileserver during the window between the server records
      being destroyed and the socket being closed.
      
      The oops looks something like:
      
        BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
        ...
        Workqueue: kafsd afs_process_async_call [kafs]
        RIP: 0010:afs_find_server+0x271/0x36f [kafs]
        ...
        Call Trace:
         afs_deliver_cb_init_call_back_state3+0x1f2/0x21f [kafs]
         afs_deliver_to_call+0x1ee/0x5e8 [kafs]
         afs_process_async_call+0x5b/0xd0 [kafs]
         process_one_work+0x2c2/0x504
         worker_thread+0x1d4/0x2ac
         kthread+0x11f/0x127
         ret_from_fork+0x24/0x30
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      66062592
  8. 20 4月, 2018 1 次提交
  9. 19 4月, 2018 2 次提交
  10. 18 4月, 2018 8 次提交
    • T
      ext4: set h_journal if there is a failure starting a reserved handle · b2569260
      Theodore Ts'o 提交于
      If ext4 tries to start a reserved handle via
      jbd2_journal_start_reserved(), and the journal has been aborted, this
      can result in a NULL pointer dereference.  This is because the fields
      h_journal and h_transaction in the handle structure share the same
      memory, via a union, so jbd2_journal_start_reserved() will clear
      h_journal before calling start_this_handle().  If this function fails
      due to an aborted handle, h_journal will still be NULL, and the call
      to jbd2_journal_free_reserved() will pass a NULL journal to
      sub_reserve_credits().
      
      This can be reproduced by running "kvm-xfstests -c dioread_nolock
      generic/475".
      
      Cc: stable@kernel.org # 3.11
      Fixes: 8f7d89f3 ("jbd2: transaction reservation support")
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
      Reviewed-by: NJan Kara <jack@suse.cz>
      b2569260
    • Q
      btrfs: Fix wrong btrfs_delalloc_release_extents parameter · 336a8bb8
      Qu Wenruo 提交于
      Commit 43b18595 ("btrfs: qgroup: Use separate meta reservation type
      for delalloc") merged into mainline is not the latest version submitted
      to mail list in Dec 2017.
      
      It has a fatal wrong @qgroup_free parameter, which results increasing
      qgroup metadata pertrans reserved space, and causing a lot of early EDQUOT.
      
      Fix it by applying the correct diff on top of current branch.
      
      Fixes: 43b18595 ("btrfs: qgroup: Use separate meta reservation type for delalloc")
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      336a8bb8
    • Q
      btrfs: delayed-inode: Remove wrong qgroup meta reservation calls · f218ea6c
      Qu Wenruo 提交于
      Commit 4f5427cc ("btrfs: delayed-inode: Use new qgroup meta rsv for
      delayed inode and item") merged into mainline was not latest version
      submitted to the mail list in Dec 2017.
      
      Which lacks the following fixes:
      
      1) Remove btrfs_qgroup_convert_reserved_meta() call in
         btrfs_delayed_item_release_metadata()
      2) Remove btrfs_qgroup_reserve_meta_prealloc() call in
         btrfs_delayed_inode_reserve_metadata()
      
      Those fixes will resolve unexpected EDQUOT problems.
      
      Fixes: 4f5427cc ("btrfs: delayed-inode: Use new qgroup meta rsv for delayed inode and item")
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      f218ea6c
    • Q
      btrfs: qgroup: Use independent and accurate per inode qgroup rsv · ff6bc37e
      Qu Wenruo 提交于
      Unlike reservation calculation used in inode rsv for metadata, qgroup
      doesn't really need to care about things like csum size or extent usage
      for the whole tree COW.
      
      Qgroups care more about net change of the extent usage.
      That's to say, if we're going to insert one file extent, it will mostly
      find its place in COWed tree block, leaving no change in extent usage.
      Or causing a leaf split, resulting in one new net extent and increasing
      qgroup number by nodesize.
      Or in an even more rare case, increase the tree level, increasing qgroup
      number by 2 * nodesize.
      
      So here instead of using the complicated calculation for extent
      allocator, which cares more about accuracy and no error, qgroup doesn't
      need that over-estimated reservation.
      
      This patch will maintain 2 new members in btrfs_block_rsv structure for
      qgroup, using much smaller calculation for qgroup rsv, reducing false
      EDQUOT.
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      ff6bc37e
    • Q
      btrfs: qgroup: Commit transaction in advance to reduce early EDQUOT · a514d638
      Qu Wenruo 提交于
      Unlike previous method that tries to commit transaction inside
      qgroup_reserve(), this time we will try to commit transaction using
      fs_info->transaction_kthread to avoid nested transaction and no need to
      worry about locking context.
      
      Since it's an asynchronous function call and we won't wait for
      transaction commit, unlike previous method, we must call it before we
      hit the qgroup limit.
      
      So this patch will use the ratio and size of qgroup meta_pertrans
      reservation as indicator to check if we should trigger a transaction
      commit.  (meta_prealloc won't be cleaned in transaction committ, it's
      useless anyway)
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      a514d638
    • J
      udf: Fix leak of UTF-16 surrogates into encoded strings · 44f06ba8
      Jan Kara 提交于
      OSTA UDF specification does not mention whether the CS0 charset in case
      of two bytes per character encoding should be treated in UTF-16 or
      UCS-2. The sample code in the standard does not treat UTF-16 surrogates
      in any special way but on systems such as Windows which work in UTF-16
      internally, filenames would be treated as being in UTF-16 effectively.
      In Linux it is more difficult to handle characters outside of Base
      Multilingual plane (beyond 0xffff) as NLS framework works with 2-byte
      characters only. Just make sure we don't leak UTF-16 surrogates into the
      resulting string when loading names from the filesystem for now.
      
      CC: stable@vger.kernel.org # >= v4.6
      Reported-by: NMingye Wang <arthur200126@gmail.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      44f06ba8
    • D
      xfs: don't fail when converting shortform attr to long form during ATTR_REPLACE · 7b38460d
      Darrick J. Wong 提交于
      Kanda Motohiro reported that expanding a tiny xattr into a large xattr
      fails on XFS because we remove the tiny xattr from a shortform fork and
      then try to re-add it after converting the fork to extents format having
      not removed the ATTR_REPLACE flag.  This fails because the attr is no
      longer present, causing a fs shutdown.
      
      This is derived from the patch in his bug report, but we really
      shouldn't ignore a nonzero retval from the remove call.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199119
      Reported-by: kanda.motohiro@gmail.com
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      7b38460d
    • D
      xfs: prevent creating negative-sized file via INSERT_RANGE · 7d83fb14
      Darrick J. Wong 提交于
      During the "insert range" fallocate operation, i_size grows by the
      specified 'len' bytes.  XFS verifies that i_size + len < s_maxbytes, as
      it should.  But this comparison is done using the signed 'loff_t', and
      'i_size + len' can wrap around to a negative value, causing the check to
      incorrectly pass, resulting in an inode with "negative" i_size.  This is
      possible on 64-bit platforms, where XFS sets s_maxbytes = LLONG_MAX.
      ext4 and f2fs don't run into this because they set a smaller s_maxbytes.
      
      Fix it by using subtraction instead.
      
      Reproducer:
          xfs_io -f file -c "truncate $(((1<<63)-1))" -c "finsert 0 4096"
      
      Fixes: a904b1ca ("xfs: Add support FALLOC_FL_INSERT_RANGE for fallocate")
      Cc: <stable@vger.kernel.org> # v4.1+
      Originally-From: Eric Biggers <ebiggers@google.com>
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      [darrick: fix signed integer addition overflow too]
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      7d83fb14
新手
引导
客服 返回
顶部