提交 · b150f5c1c759d551da9146435d3dc9df5f7e15ef · openeuler / Kernel

16 8月, 2013 1 次提交

ceph: cleanup the logic in ceph_invalidatepage · b150f5c1

由 Milosz Tanski 提交于 8月 09, 2013

The invalidatepage code bails if it encounters a non-zero page offset. The
current logic that does is non-obvious with multiple if statements.

This should be logically and functionally equivalent.
Signed-off-by: NMilosz Tanski <milosz@adfin.com>
Reviewed-by: NSage Weil <sage@inktank.com>

b150f5c1

14 8月, 2013 7 次提交

fs/proc/task_mmu.c: fix buffer overflow in add_page_map() · 8c829622

由 yonghua zheng 提交于 8月 13, 2013

Recently we met quite a lot of random kernel panic issues after enabling
CONFIG_PROC_PAGE_MONITOR.  After debuggind we found this has something
to do with following bug in pagemap:

In struct pagemapread:

  struct pagemapread {
      int pos, len;
      pagemap_entry_t *buffer;
      bool v2;
  };

pos is number of PM_ENTRY_BYTES in buffer, but len is the size of
buffer, it is a mistake to compare pos and len in add_page_map() for
checking buffer is full or not, and this can lead to buffer overflow and
random kernel panic issue.

Correct len to be total number of PM_ENTRY_BYTES in buffer.

[akpm@linux-foundation.org: document pagemapread.pos and .len units, fix PM_ENTRY_BYTES definition]
Signed-off-by: NYonghua Zheng <younghua.zheng@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8c829622

ocfs2: fix null pointer dereference in ocfs2_dir_foreach_blk_id() · d6394b59

由 Jeff Liu 提交于 8月 13, 2013

Fix a NULL pointer deference while removing an empty directory, which
was introduced by commit 3704412b ("[readdir] convert ocfs2").

  BUG: unable to handle kernel NULL pointer dereference at (null)
  IP: [<(null)>]           (null)
  PGD 6da85067 PUD 6da89067 PMD 0
  Oops: 0010 [#1] SMP
  CPU: 0 PID: 6564 Comm: rmdir Tainted: G           O 3.11.0-rc1 #4
  RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
  Call Trace:
    ocfs2_dir_foreach+0x49/0x50 [ocfs2]
    ocfs2_empty_dir+0x12c/0x3e0 [ocfs2]
    ocfs2_unlink+0x56e/0xc10 [ocfs2]
    vfs_rmdir+0xd5/0x140
    do_rmdir+0x1cb/0x1e0
    SyS_rmdir+0x16/0x20
    system_call_fastpath+0x16/0x1b
  Code:  Bad RIP value.
  RIP  [<          (null)>]           (null)
  RSP <ffff88006daddc10>
  CR2: 0000000000000000

[dan.carpenter@oracle.com: fix pointer math]
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
Reported-by: NDavid Weber <wb@munzinger.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d6394b59

ocfs2: fix NULL pointer dereference in ocfs2_duplicate_clusters_by_page · c7dd3392

由 Tiger Yang 提交于 8月 13, 2013

Since ocfs2_cow_file_pos will invoke ocfs2_refcount_icow with a NULL as
the struct file pointer, it finally result in a null pointer dereference
in ocfs2_duplicate_clusters_by_page.

This patch replace file pointer with inode pointer in
cow_duplicate_clusters to fix this issue.

[jeff.liu@oracle.com: rebased patch against linux-next tree]
Signed-off-by: NTiger Yang <tiger.yang@oracle.com>
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Acked-by: NTao Ma <tm@tao.ma>
Tested-by: NDavid Weber <wb@munzinger.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c7dd3392

ocfs2: Revert to avoid regression in extended allocation · 6115ea28

由 Jie Liu 提交于 8月 13, 2013

Revert commit 40bd62eb ("fs/ocfs2/journal.h: add bits_wanted while
calculating credits in ocfs2_calc_extend_credits").

Unfortunately this change broke fallocate even if there is insufficient
disk space for the preallocation, which is a serious problem.

  # df -h
  /dev/sda8        22G  1.2G   21G   6% /ocfs2
  # fallocate -o 0 -l 200M /ocfs2/testfile
  fallocate: /ocfs2/test: fallocate failed: No space left on device

and a kernel warning:

  CPU: 3 PID: 3656 Comm: fallocate Tainted: G        W  O 3.11.0-rc3 #2
  Call Trace:
    dump_stack+0x77/0x9e
    warn_slowpath_common+0xc4/0x110
    warn_slowpath_null+0x2a/0x40
    start_this_handle+0x6c/0x640 [jbd2]
    jbd2__journal_start+0x138/0x300 [jbd2]
    jbd2_journal_start+0x23/0x30 [jbd2]
    ocfs2_start_trans+0x166/0x300 [ocfs2]
    __ocfs2_extend_allocation+0x38f/0xdb0 [ocfs2]
    ocfs2_allocate_unwritten_extents+0x3c9/0x520
    __ocfs2_change_file_space+0x5e0/0xa60 [ocfs2]
    ocfs2_fallocate+0xb1/0xe0 [ocfs2]
    do_fallocate+0x1cb/0x220
    SyS_fallocate+0x6f/0xb0
    system_call_fastpath+0x16/0x1b
  JBD2: fallocate wants too many credits (51216 > 4381)
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6115ea28

hugetlb: fix lockdep splat caused by pmd sharing · b610ded7

由 Michal Hocko 提交于 8月 13, 2013

Dave has reported the following lockdep splat:

  =================================
  [ INFO: inconsistent lock state ]
  3.11.0-rc1+ #9 Not tainted
  ---------------------------------
  inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
  kswapd0/49 [HC0[0]:SC0[0]:HE1:SE1] takes:
   (&mapping->i_mmap_mutex){+.+.?.}, at: [<c114971b>] page_referenced+0x87/0x5e3
  {RECLAIM_FS-ON-W} state was registered at:
     mark_held_locks+0x81/0xe7
     lockdep_trace_alloc+0x5e/0xbc
     __alloc_pages_nodemask+0x8b/0x9b6
     __get_free_pages+0x20/0x31
     get_zeroed_page+0x12/0x14
     __pmd_alloc+0x1c/0x6b
     huge_pmd_share+0x265/0x283
     huge_pte_alloc+0x5d/0x71
     hugetlb_fault+0x7c/0x64a
     handle_mm_fault+0x255/0x299
     __do_page_fault+0x142/0x55c
     do_page_fault+0xd/0x16
     error_code+0x6c/0x74
  irq event stamp: 3136917
  hardirqs last  enabled at (3136917):  _raw_spin_unlock_irq+0x27/0x50
  hardirqs last disabled at (3136916):  _raw_spin_lock_irq+0x15/0x78
  softirqs last  enabled at (3136180):  __do_softirq+0x137/0x30f
  softirqs last disabled at (3136175):  irq_exit+0xa8/0xaa
  other info that might help us debug this:
   Possible unsafe locking scenario:
         CPU0
         ----
    lock(&mapping->i_mmap_mutex);
    <Interrupt>
      lock(&mapping->i_mmap_mutex);

  *** DEADLOCK ***
  no locks held by kswapd0/49.

  stack backtrace:
  CPU: 1 PID: 49 Comm: kswapd0 Not tainted 3.11.0-rc1+ #9
  Hardware name: Dell Inc.                 Precision WorkStation 490    /0DT031, BIOS A08 04/25/2008
  Call Trace:
    dump_stack+0x4b/0x79
    print_usage_bug+0x1d9/0x1e3
    mark_lock+0x1e0/0x261
    __lock_acquire+0x623/0x17f2
    lock_acquire+0x7d/0x195
    mutex_lock_nested+0x6c/0x3a7
    page_referenced+0x87/0x5e3
    shrink_page_list+0x3d9/0x947
    shrink_inactive_list+0x155/0x4cb
    shrink_lruvec+0x300/0x5ce
    shrink_zone+0x53/0x14e
    kswapd+0x517/0xa75
    kthread+0xa8/0xaa
    ret_from_kernel_thread+0x1b/0x28

which is a false positive caused by hugetlb pmd sharing code which
allocates a new pmd from withing mapping->i_mmap_mutex.  If this
allocation causes reclaim then the lockdep detector complains that we
might self-deadlock.

This is not correct though, because hugetlb pages are not reclaimable so
their mapping will be never touched from the reclaim path.

The patch tells lockup detector that hugetlb i_mmap_mutex is special by
assigning it a separate lockdep class so it won't report possible
deadlocks on unrelated mappings.

[peterz@infradead.org: comment for annotation]
Reported-by: NDave Jones <davej@redhat.com>
Signed-off-by: NMichal Hocko <mhocko@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b610ded7

mm: save soft-dirty bits on file pages · 41bb3476

由 Cyrill Gorcunov 提交于 8月 13, 2013

Andy reported that if file page get reclaimed we lose the soft-dirty bit
if it was there, so save _PAGE_BIT_SOFT_DIRTY bit when page address get
encoded into pte entry.  Thus when #pf happens on such non-present pte
we can restore it back.
Reported-by: NAndy Lutomirski <luto@amacapital.net>
Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
Acked-by: NPavel Emelyanov <xemul@parallels.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

41bb3476

mm: save soft-dirty bits on swapped pages · 179ef71c

由 Cyrill Gorcunov 提交于 8月 13, 2013

Andy Lutomirski reported that if a page with _PAGE_SOFT_DIRTY bit set
get swapped out, the bit is getting lost and no longer available when
pte read back.

To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is saved in
pte entry for the page being swapped out.  When such page is to be read
back from a swap cache we check for bit presence and if it's there we
clear it and restore the former _PAGE_SOFT_DIRTY bit back.

One of the problem was to find a place in pte entry where we can save
the _PTE_SWP_SOFT_DIRTY bit while page is in swap.  The _PAGE_PSE was
chosen for that, it doesn't intersect with swap entry format stored in
pte.
Reported-by: NAndy Lutomirski <luto@amacapital.net>
Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
Acked-by: NPavel Emelyanov <xemul@parallels.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: NMinchan Kim <minchan@kernel.org>
Reviewed-by: NWanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

179ef71c

12 8月, 2013 1 次提交

ext4: flush the extent status cache during EXT4_IOC_SWAP_BOOT · cde2d7a7

由 Theodore Ts'o 提交于 8月 12, 2013

Previously we weren't swapping only some of the extent_status LRU
fields during the processing of the EXT4_IOC_SWAP_BOOT ioctl.  The
much safer thing to do is to just completely flush the extent status
tree when doing the swap.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: Zheng Liu <gnehzuil.liu@gmail.com>
Cc: stable@vger.kernel.org

cde2d7a7

10 8月, 2013 23 次提交

ceph: Remove bogus check in invalidatepage · fe2a801b

由 Milosz Tanski 提交于 8月 09, 2013

The early bug checks are moot because the VMA layer ensures those things.

1. It will not call invalidatepage unless PagePrivate (or PagePrivate2) are set
2. It will not call invalidatepage without taking a PageLock first.
3. Guantrees that the inode page is mapped.
Signed-off-by: NMilosz Tanski <milosz@adfin.com>
Reviewed-by: NSage Weil <sage@inktank.com>

fe2a801b

ceph: replace hold_mutex flag with goto · 2f75e9e1

由 Sage Weil 提交于 8月 09, 2013

All of the early exit paths need to drop the mutex; it is only the normal
path through the function that does not.  Skip the unlock in that case
with a goto out_unlocked.
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NJianpeng Ma <majianpeng@gmail.com>

2f75e9e1

ceph: Move the place for EOLDSNAPC handle in ceph_aio_write to easily understand · 0e5dd45c

由 majianpeng 提交于 8月 08, 2013

Only for ceph_sync_write, the osd can return EOLDSNAPC.so move the
related codes after the call ceph_sync_write.
Signed-off-by: NJianpeng Ma <majianpeng@gmail.com>
Reviewed-by: NSage Weil <sage@inktank.com>

0e5dd45c

ceph: fix freeing inode vs removing session caps race · 6f60f889

由 Yan, Zheng 提交于 7月 24, 2013

remove_session_caps() uses iterate_session_caps() to remove caps,
but iterate_session_caps() skips inodes that are being deleted.
So session->s_nr_caps can be non-zero after iterate_session_caps()
return.

We can fix the issue by waiting until deletions are complete.
__wait_on_freeing_inode() is designed for the job, but it is not
exported, so we use lookup inode function to access it.
Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>

6f60f889

ceph: Add check returned value on func ceph_calc_ceph_pg. · 2fbcbff1

由 majianpeng 提交于 8月 02, 2013

Func ceph_calc_ceph_pg maybe failed.So add check for returned value.
Signed-off-by: NJianpeng Ma <majianpeng@gmail.com>
Reviewed-by: NSage Weil <sage@inktank.com>
Signed-off-by: NSage Weil <sage@inktank.com>

2fbcbff1

ceph: Don't use ceph-sync-mode for synchronous-fs. · 7ab9b380

由 majianpeng 提交于 6月 27, 2013

Sending reads and writes through the sync read/write paths bypasses the
page cache, which is not expected or generally a good idea.  Removing
the write check is safe as there is a conditional vfs_fsync_range() later
in ceph_aio_write that already checks for the same flag (via
IS_SYNC(inode)).
Signed-off-by: NJianpeng Ma <majianpeng@gmail.com>
Reviewed-by: NSage Weil <sage@inktank.com>

7ab9b380

ceph: cleanup types in striped_read() · 688bac46

由 Dan Carpenter 提交于 7月 23, 2013

We pass in a u64 value for "len" and then immediately truncate away the
upper 32 bits.
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <alex.elder@linaro.org>

688bac46

ceph: trim deleted inode · ca20c991

由 Yan, Zheng 提交于 7月 21, 2013

The MDS uses caps message to notify clients about deleted inode.
when receiving a such message, invalidate any alias of the inode.
This makes the kernel release the inode ASAP.
Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: NSage Weil <sage@inktank.com>

ca20c991

ceph: wake up writer if vmtruncate work get blocked · 85ce127a

由 Yan, Zheng 提交于 7月 21, 2013

To write data, the writer first acquires the i_mutex, then try getting
caps. The writer may sleep while holding the i_mutex. If the MDS revokes
Fb cap in this case, vmtruncate work can't do its job because i_mutex
is locked. We should wake up the writer and let it truncate the pages.
Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: NSage Weil <sage@inktank.com>

85ce127a

ceph: drop CAP_LINK_SHARED when sending "link" request to MDS · ad88f23f

由 Yan, Zheng 提交于 7月 21, 2013

To handle "link" request, the MDS need to xlock inode's linklock,
which requires revoking any CAP_LINK_SHARED.
Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: NSage Weil <sage@inktank.com>

ad88f23f

ceph: fix null pointer dereference · c338c07c

由 Nathaniel Yazdani 提交于 8月 04, 2013

When register_session() is given an out-of-range argument for mds,
ceph_mdsmap_get_addr() will return a null pointer, which would be given to
ceph_con_open() & be dereferenced, causing a kernel oops. This fixes bug #4685
in the Ceph bug tracker <http://tracker.ceph.com/issues/4685>.
Signed-off-by: NNathaniel Yazdani <n1ght.4nd.d4y@gmail.com>
Reviewed-by: NSage Weil <sage@inktank.com>

c338c07c

ceph: Don't forget the 'up_read(&osdc->map_sem)' if met error. · 494ddd11

由 majianpeng 提交于 7月 16, 2013

CC: stable@vger.kernel.org
Signed-off-by: NJianpeng Ma <majianpeng@gmail.com>
Reviewed-by: NSage Weil <sage@inktank.com>

494ddd11

btrfs: don't loop on large offsets in readdir · db62efbb

由 Zach Brown 提交于 7月 11, 2013

When btrfs readdir() hits the last entry it sets the readdir offset to a
huge value to stop buggy apps from breaking when the same name is
returned by readdir() with concurrent rename()s.

But unconditionally setting the offset to INT_MAX causes readdir() to
loop returning any entries with offsets past INT_MAX.  It only takes a
few hours of constant file creation and removal to create entries past
INT_MAX.

So let's set the huge offset to LLONG_MAX if the last entry has already
overflowed 32bit loff_t.   Without large offsets behaviour is identical.
With large offsets 64bit apps will work and 32bit apps will be no more
broken than they currently are if they see large offsets.
Signed-off-by: NZach Brown <zab@redhat.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

db62efbb

Btrfs: check to see if root_list is empty before adding it to dead roots · cfad392b

由 Josef Bacik 提交于 7月 25, 2013

A user reported a panic when running with autodefrag and deleting snapshots.
This is because we could end up trying to add the root to the dead roots list
twice.  To fix this check to see if we are empty before adding ourselves to the
dead roots list.  Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

cfad392b

Btrfs: release both paths before logging dir/changed extents · f3b15ccd

由 Josef Bacik 提交于 7月 22, 2013

The ceph guys tripped over this bug where we were still holding onto the
original path that we used to copy the inode with when logging.  This is based
on Chris's fix which was reported to fix the problem.  We need to drop the paths
in two cases anyway so just move the drop up so that we don't have duplicate
code.  Thanks,

Cc: stable@vger.kernel.org
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

f3b15ccd

Btrfs: allow splitting of hole em's when dropping extent cache · ee20a983

由 Josef Bacik 提交于 7月 11, 2013

I noticed while running multi-threaded fsync tests that sometimes fsck would
complain about an improper gap. This happens because we fail to add a hole
extent to the file, which was happening when we'd split a hole EM because
btrfs_drop_extent_cache was just discarding the whole em instead of splitting
it. So this patch fixes this by allowing us to split a hole em properly, which
means that added holes actually get logged properly and we no longer see this
fsck error. Thankfully we're tolerant of these sort of problems so a user would
not see any adverse effects of this bug, other than fsck complaining. Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

ee20a983

Btrfs: make sure the backref walker catches all refs to our extent · ed8c4913

由 Josef Bacik 提交于 7月 05, 2013

Because we don't mess with the offset into the extent for compressed we will
properly find both extents for this case

[extent a][extent b][rest of extent a]

but because we already added a ref for the front half we won't add the inode
information for the second half.  This causes us to leak that memory and not
print out the other offset when we do logical-resolve.  So fix this by calling
ulist_add_merge and then add our eie to the existing entry if there is one.
With this patch we get both offsets out of logical-resolve.  With this and the
other 2 patches I've sent we now pass btrfs/276 on my vm with compress-force=lzo
set.  Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

ed8c4913

Btrfs: fix backref walking when we hit a compressed extent · 8ca15e05

由 Josef Bacik 提交于 7月 05, 2013

If you do btrfs inspect-internal logical-resolve on a compressed extent that has
been partly overwritten it won't find anything. This is because we try and
match the extent offset we've searched for based on the extent offset in the
data extent entry. However this doesn't work for compressed extents because the
offsets are for the uncompressed size, not the compressed size. So instead only
do this check if we are not compressed, that way we can get an actual entry for
the physical offset rather than nothing for compressed. Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

8ca15e05

Btrfs: do not offset physical if we're compressed · b76bb701

由 Josef Bacik 提交于 7月 05, 2013

xfstest btrfs/276 was freaking out on slower boxes partly because fiemap was
offsetting the physical based on the extent offset. This is perfectly fine with
uncompressed extents, however the extent offset is into the uncompressed area,
not the compressed. So we can return a physical value that isn't at all within
the area we have allocated on disk. Fix this by returning the start of the
extent if it is compressed no matter what the offset. Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

b76bb701

Btrfs: fix extent buffer leak after backref walking · b5b9b5b3

由 Liu Bo 提交于 7月 03, 2013

commit 47fb091f(Btrfs: fix unlock after free on rewinded tree blocks)
takes an extra increment on the reference of allocated dummy extent buffer, so now we
cannot free this dummy one, and end up with extent buffer leak.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Reviewed-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

b5b9b5b3

Btrfs: fix a bug of snapshot-aware defrag to make it work on partial extents · e68afa49

由 Liu Bo 提交于 7月 01, 2013

For partial extents, snapshot-aware defrag does not work as expected,
since
a) we use the wrong logical offset to search for parents, which should be
   disk_bytenr + extent_offset, not just disk_bytenr,
b) 'offset' returned by the backref walking just refers to key.offset, not
   the 'offset' stored in btrfs_extent_data_ref which is
   (key.offset - extent_offset).

The reproducer:
$ mkfs.btrfs sda
$ mount sda /mnt
$ btrfs sub create /mnt/sub
$ for i in `seq 5 -1 1`; do dd if=/dev/zero of=/mnt/sub/foo bs=5k count=1 seek=$i conv=notrunc oflag=sync; done
$ btrfs sub snap /mnt/sub /mnt/snap1
$ btrfs sub snap /mnt/sub /mnt/snap2
$ sync; btrfs filesystem defrag /mnt/sub/foo;
$ umount /mnt
$ btrfs-debug-tree sda (Here we can check whether the defrag operation is snapshot-awared.

This addresses the above two problems.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

e68afa49

btrfs: fix file truncation if FALLOC_FL_KEEP_SIZE is specified · 7cddc193

由 Jie Liu 提交于 6月 28, 2013

Create a small file and fallocate it to a big size with
FALLOC_FL_KEEP_SIZE option, then truncate it back to the
small size again, the disk free space is not changed back
in this case. i.e,

total 4
-rw-r--r-- 1 root root 512 Jun 28 11:35 test

Filesystem      Size  Used Avail Use% Mounted on
....
/dev/sdb1       8.0G   56K  7.2G   1% /mnt

-rw-r--r-- 1 root root 512 Jun 28 11:35 /mnt/test

Filesystem      Size  Used Avail Use% Mounted on
....
/dev/sdb1       8.0G  5.1G  2.2G  70% /mnt

Filesystem      Size  Used Avail Use% Mounted on
....
/dev/sdb1       8.0G  5.1G  2.2G  70% /mnt

With this fix, the truncated up space is back as:
Filesystem      Size  Used Avail Use% Mounted on
....
/dev/sdb1       8.0G   56K  7.2G   1% /mnt
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

7cddc193

dlm: kill the unnecessary and wrong device_close()->recalc_sigpending() · 201d3dfa

由 Oleg Nesterov 提交于 8月 09, 2013

device_close()->recalc_sigpending() is not needed, sigprocmask() takes
care of TIF_SIGPENDING correctly.

And without ->siglock it is racy and wrong, it can wrongly clear
TIF_SIGPENDING and miss a signal.

But even with this patch device_close() is still buggy:

  1. sigprocmask() should not be used, we have set_task_blocked(),
     but this is minor.

  2. We should never block SIGKILL or SIGSTOP, and this is what
     the code tries to do.

  3. This can't protect against SIGKILL or SIGSTOP anyway. Another
     thread can do signal_wake_up(), say, do_signal_stop() or
     complete_signal() or debugger.

  4. sigprocmask(SIG_BLOCK, allsigs) doesn't necessarily clears
     TIF_SIGPENDING, say, freezing() or ->jobctl.

  5. device_write() looks equally wrong by the same reason.

Looks like, this tries to protect some wait_event_interruptible() logic
from signals, it should be turned into uninterruptible wait.  Or we need
to implement something like signals_stop/start for such a use-case.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

201d3dfa

09 8月, 2013 2 次提交

ext4: fix mount/remount error messages for incompatible mount options · 6ae6514b

由 Piotr Sarna 提交于 8月 08, 2013

Commit 56889787 ("ext4: improve handling of conflicting mount options")
introduced incorrect messages shown while choosing wrong mount options.

First of all, both cases of incorrect mount options,
"data=journal,delalloc" and "data=journal,dioread_nolock" result in
the same error message.

Secondly, the problem above isn't solved for remount option: the
mismatched parameter is simply ignored.  Moreover, ext4_msg states
that remount with options "data=journal,delalloc" succeeded, which is
not true.

To fix it up, I added a simple check after parse_options() call to
ensure that data=journal and delalloc/dioread_nolock parameters are
not present at the same time.
Signed-off-by: NPiotr Sarna <p.sarna@partner.samsung.com>
Acked-by: NBartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: NKyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

6ae6514b

ext4: allow the mount options nodelalloc and data=journal · 59d9fa5c

由 Theodore Ts'o 提交于 8月 08, 2013

Commit 26092bf5 ("ext4: use a table-driven handler for mount options")
wrongly disallows the specifying the mount options nodelalloc and
data=journal simultaneously.  This is incorrect; it should have only
disallowed the combination of delalloc and data=journal
simultaneously.
Reported-by: NPiotr Sarna <p.sarna@partner.samsung.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

59d9fa5c

08 8月, 2013 6 次提交

NFSv4: Fix up nfs4_proc_lookup_mountpoint · b72888cb

由 Trond Myklebust 提交于 8月 07, 2013

Currently, we do not check the return value of client = rpc_clone_client(),
nor do we shut down the resulting cloned rpc_clnt in the case where a
NFS4ERR_WRONGSEC has caused nfs4_proc_lookup_common() to replace the
original value of 'client' (causing a memory leak).

Fix both issues and simplify the code by moving the call to
rpc_clone_client() until after nfs4_proc_lookup_common() has
done its business.
Reported-by: NAndy Adamson <andros@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

b72888cb

NFS: Remove unnecessary call to nfs_setsecurity in nfs_fhget() · eddffa40

由 Trond Myklebust 提交于 8月 07, 2013

We only need to call it on the creation of the inode.
Reported-by: NJulia Lawall <Julia.Lawall@lip6.fr>
Cc: Steve Dickson <SteveD@redhat.com>
Cc: Dave Quigley <dpquigl@davequigley.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

eddffa40

NFSv4: Fix the sync mount option for nfs4 mounts · e890db01

由 Scott Mayhew 提交于 7月 31, 2013

The sync mount option stopped working for NFSv4 mounts after commit
c02d7adf (NFSv4: Replace nfs4_path_walk() with
FS path lookup in a private namespace).  If MS_SYNCHRONOUS is set in the
super_block that we're cloning from, then it should be set in the new
super_block as well.
Signed-off-by: NScott Mayhew <smayhew@redhat.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

e890db01

NFS: Fix writeback performance issue on cache invalidation · f8806c84

由 Trond Myklebust 提交于 8月 05, 2013

If a cache invalidation is triggered, and we happen to have a lot of
writebacks cached at the time, then the call to invalidate_inode_pages2()
will end up calling ->launder_page() on each and every dirty page in order
to sync its contents to disk, thus defeating write coalescing.
The following patch ensures that we try to sync the inode to disk before
calling invalidate_inode_pages2() so that we do the writeback as efficiently
as possible.
Reported-by: NWilliam Dauchy <william@gandi.net>
Reported-by: NPascal Bouchareine <pascal@gandi.net>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: NWilliam Dauchy <william@gandi.net>
Reviewed-by: NJeff Layton <jlayton@redhat.com>

f8806c84

nfsd: Fix SP4_MACH_CRED negotiation in EXCHANGE_ID · 58cd57bf

由 Weston Andros Adamson 提交于 8月 05, 2013

 - don't BUG_ON() when not SP4_NONE
 - calculate recv and send reserve sizes correctly
Signed-off-by: NWeston Andros Adamson <dros@netapp.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

58cd57bf

nfsd4: Fix MACH_CRED NULL dereference · c4720591

由 J. Bruce Fields 提交于 8月 07, 2013

Fixes a NULL-dereference on attempts to use MACH_CRED protection over
auth_sys.
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

c4720591

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功