Commits · 2bc334dcc7c77be3700dd443d92a78603f76976b · xiphi1978 / linux

07 Jan, 2011: 7 commits

jfs: dont overwrite dentry name in d_revalidate · 2bc334dc

Authored by Nick Piggin on Jan 07, 2011

Use vfat's method for dealing with negative dentries to preserve case,
rather than overwriting the dentry name in d_revalidate, which is a bit
ugly and also gets in the way of doing lock-free path walking.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

cifs: dont overwrite dentry name in d_revalidate · 79eb4dde

Authored by Nick Piggin on Jan 07, 2011

fs: change d_delete semantics · fe15ce44

Authored by Nick Piggin on Jan 07, 2011

Change d_delete from a dentry deletion notification to a dentry caching
advice, more like ->drop_inode. Require it to be constant and idempotent,
and not take d_lock. This is how all existing filesystems use the callback
anyway.

This makes fine-grained dentry locking of dput and dentry LRU scanning
much simpler.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

config fs: avoid switching ->d_op on live dentry · fbc8d4c0

Authored by Nick Piggin on Jan 07, 2011

Switching d_op on a live dentry is racy in general, so avoid it. In this case
it is a negative dentry, which is safer, but there are still concurrent ops
which may be called on d_op in that case (e.g. d_revalidate). So in general
a filesystem may not do this. Fix configfs so as not to do this.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

fs: use fast counters for vfs caches · 3e880fb5

Authored by Nick Piggin on Jan 07, 2011

The percpu_counter library generates quite nasty code, so unless you need
to dynamically allocate counters or take a fast approximate value, a
simple per-CPU set of counters is much better.

The percpu_counter can never be made to work as well, because it has an
indirection from a pointer to percpu memory, and it can't use the direct
this_cpu_inc interfaces because it doesn't use static PER_CPU data, so the
code will always be worse.

In the fastpath, it is the difference between this:

        incl %gs:nr_dentry      # nr_dentry

and this:

        movl    percpu_counter_batch(%rip), %edx        # percpu_counter_batch,
        movl    $1, %esi        #,
        movq    $nr_dentry, %rdi        #,
        call    __percpu_counter_add    # (plus I clobber registers)

__percpu_counter_add:
        pushq   %rbp    #
        movq    %rsp, %rbp      #,
        subq    $32, %rsp       #,
        movq    %rbx, -24(%rbp) #,
        movq    %r12, -16(%rbp) #,
        movq    %r13, -8(%rbp)  #,
        movq    %rdi, %rbx      # fbc, fbc
#APP
# 216 "/home/npiggin/usr/src/linux-2.6/arch/x86/include/asm/thread_info.h" 1
        movq %gs:kernel_stack,%rax      #, pfo_ret__
# 0 "" 2
#NO_APP
        incl    -8124(%rax)     # <variable>.preempt_count
        movq    32(%rdi), %r12  # <variable>.counters, tcp_ptr__
#APP
# 78 "lib/percpu_counter.c" 1
        add %gs:this_cpu_off, %r12      # this_cpu_off, tcp_ptr__
# 0 "" 2
#NO_APP
        movslq  (%r12),%r13     #* tcp_ptr__, tmp73
        movslq  %edx,%rax       # batch, batch
        addq    %rsi, %r13      # amount, count
        cmpq    %rax, %r13      # batch, count
        jge     .L27    #,
        negl    %edx    # tmp76
        movslq  %edx,%rdx       # tmp76, tmp77
        cmpq    %rdx, %r13      # tmp77, count
        jg      .L28    #,
.L27:
        movq    %rbx, %rdi      # fbc,
        call    _raw_spin_lock  #
        addq    %r13, 8(%rbx)   # count, <variable>.count
        movq    %rbx, %rdi      # fbc,
        movl    $0, (%r12)      #,* tcp_ptr__
        call    _raw_spin_unlock        #
.L29:
#APP
# 216 "/home/npiggin/usr/src/linux-2.6/arch/x86/include/asm/thread_info.h" 1
        movq %gs:kernel_stack,%rax      #, pfo_ret__
# 0 "" 2
#NO_APP
        decl    -8124(%rax)     # <variable>.preempt_count
        movq    -8136(%rax), %rax       #, D.14625
        testb   $8, %al #, D.14625
        jne     .L32    #,
.L31:
        movq    -24(%rbp), %rbx #,
        movq    -16(%rbp), %r12 #,
        movq    -8(%rbp), %r13  #,
        leave
        ret
        .p2align 4,,10
        .p2align 3
.L28:
        movl    %r13d, (%r12)   # count,*
        jmp     .L29    #
.L32:
        call    preempt_schedule        #
        .p2align 4,,6
        jmp     .L31    #
        .size   __percpu_counter_add, .-__percpu_counter_add
        .p2align 4,,15

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

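The fast path and slow path that __percpu_counter_add compiles to above can be modeled in ordinary userspace C. This is a simplified, single-threaded sketch, not lib/percpu_counter.c: the fixed CPU count (SKETCH_NR_CPUS), the struct names and the explicit cpu argument are assumptions, and the spinlock the real slow path takes is only noted in a comment. It contrasts a batched counter, which folds a per-CPU delta into a shared total once the delta reaches plus or minus batch, with the plain per-CPU array the commit switches to, where an update touches only the local slot and an exact read sums every slot.

```c
#include <stdint.h>

#define SKETCH_NR_CPUS 4 /* assumption: fixed CPU count for the model */

/* Simplified model of a batched percpu_counter (no real locking). */
struct batched_counter {
	int64_t count;                 /* shared total; lock-protected in the kernel */
	int32_t local[SKETCH_NR_CPUS]; /* per-CPU deltas not yet folded into count */
};

/* Same decision as the assembly above: keep the delta local while it stays
 * strictly between -batch and batch, otherwise fold it into the total. */
static void batched_add(struct batched_counter *c, int cpu, int32_t amount,
			int32_t batch)
{
	int64_t sum = (int64_t)c->local[cpu] + amount;

	if (sum >= batch || sum <= -batch) {
		/* slow path: the kernel takes fbc->lock around these two stores */
		c->count += sum;
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = (int32_t)sum;
	}
}

/* Exact value: shared total plus every outstanding per-CPU delta. */
static int64_t batched_sum(const struct batched_counter *c)
{
	int64_t sum = c->count;

	for (int i = 0; i < SKETCH_NR_CPUS; i++)
		sum += c->local[i];
	return sum;
}

/* Plain per-CPU counters: an update touches only the local slot. */
struct plain_counter {
	long cpu[SKETCH_NR_CPUS];
};

static void plain_inc(struct plain_counter *c, int cpu)
{
	c->cpu[cpu]++; /* in the kernel this is a single this_cpu_inc() */
}

static long plain_sum(const struct plain_counter *c)
{
	long sum = 0;

	for (int i = 0; i < SKETCH_NR_CPUS; i++)
		sum += c->cpu[i];
	return sum;
}
```

Both designs pay a walk over every CPU slot for an exact read; the difference measured in the commit is on the update side, where the plain counter needs no compare, no call and no preemption bookkeeping.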

vfs: revert per-cpu nr_unused counters for dentry and inodes · 86c8749e

Authored by Nick Piggin on Jan 07, 2011

The nr_unused counters count the number of objects on an LRU, and as such they
are synchronized with LRU object insertion, removal and scanning, and
protected under the LRU lock.

Making them per-CPU does not actually yield any concurrency improvement because
of this lock; summing the counter is much slower, and
incrementing/decrementing it costs more code size and is slower too.

These counters should stay per-LRU, which currently means global.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

fs: d_validate fixes · 786a5e15

Authored by Nick Piggin on Jan 07, 2011

d_validate has been broken for a long time.

kmem_ptr_validate does not guarantee that a pointer can be dereferenced
if it can go away at any time. Even rcu_read_lock doesn't help, because
the pointer might be queued in RCU callbacks but not executed yet.

So the parent cannot be checked, nor the name hashed. The dentry pointer
cannot be touched until it can be verified under lock. Hashing simply
cannot be used.

Instead, verify the parent/child relationship by traversing the parent's
d_child list. It's slow, but only ncpfs and the destaged smbfs care
about it at this point.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

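The parent/child check described in this commit can be illustrated with a small userspace model. Everything in it is hypothetical (struct sketch_dentry, its first_child/next_sibling links and sketch_validate() are not the kernel's dentry or d_validate()); what it shows is that the untrusted candidate pointer is only ever compared, never dereferenced, while the trusted parent's child list is walked. In the kernel the walk happens under the parent's d_lock, which this single-threaded sketch leaves out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal stand-in for a dentry and its child list. */
struct sketch_dentry {
	struct sketch_dentry *parent;
	struct sketch_dentry *first_child;  /* head of this dentry's child list */
	struct sketch_dentry *next_sibling; /* link within the parent's child list */
};

/*
 * Return true if candidate is currently a child of parent.
 * Only the trusted parent's list is dereferenced; candidate is used purely
 * for pointer comparison, so a stale pointer is never touched.
 * (The kernel would hold parent->d_lock across this walk.)
 */
static bool sketch_validate(const struct sketch_dentry *parent,
			    const struct sketch_dentry *candidate)
{
	for (const struct sketch_dentry *child = parent->first_child;
	     child != NULL; child = child->next_sibling) {
		if (child == candidate)
			return true; /* the kernel would take a reference here */
	}
	return false;
}
```

The cost is linear in the number of children, which is the "slow, but" trade-off the commit accepts for its remaining users.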

05 Jan, 2011: 1 commit

Revert "fs: use RCU read side protection in d_validate" · d3a23e16

Authored by Nick Piggin on Jan 05, 2011

This reverts commit 3825bdb7.

You cannot dget() a dentry without having a reference, or holding
a lock that guarantees it remains valid.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

24 Dec, 2010: 1 commit

ext4: fix on-line resizing regression · 8a7411a2

Authored by Theodore Ts'o on Dec 20, 2010

https://bugzilla.kernel.org/show_bug.cgi?id=25352

This regression was caused by commit a31437b8 ("ext4: use
sb_issue_zeroout in setup_new_group_blocks"), which accidentally dropped
the code that reserved the block group descriptor and inode table
blocks.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

23 Dec, 2010: 2 commits

logfs: fix "Kernel BUG at readwrite.c:1193" · f06328d7

Authored by Prasad Joshi on Dec 21, 2010

This happens when __logfs_create() tries to write a new inode to a disk
that is full.

__logfs_create() associates the transaction pointer with the inode. During
the logfs_write_inode() call chain this transaction pointer is moved from
the inode to page->private by move_inode_to_page()
(do_write_inode() -> inode_to_page() -> move_inode_to_page()).

When writing the inode fails, the transaction is aborted and iput is called
on the failed inode. During delete_inode the same transaction pointer, still
associated with the page, is used again, causing the kernel BUG.

The patch checks for an error in write_inode() and resets page->private
to NULL.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=20162

Signed-off-by: Prasad Joshi <prasadjoshi124@gmail.com>
Cc: Joern Engel <joern@logfs.org>
Cc: Florian Mickler <florian@mickler.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

logfs: fix deadlock in logfs_get_wblocks, hold and wait on super->s_write_mutex · eabb26ca

Authored by Prasad Joshi on Dec 21, 2010

do_logfs_journal_wl_pass() should use GFP_NOFS for memory allocation. The GC
code calls btree_insert32 with GFP_KERNEL while holding the mutex
super->s_write_mutex.

The same mutex is used in address_space_operations->writepage(), and a
call to writepage() could be triggered as a result of memory allocation
in btree_insert32, causing a deadlock.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=20342

Signed-off-by: Prasad Joshi <prasadjoshi124@gmail.com>
Cc: Joern Engel <joern@logfs.org>
Cc: Florian Mickler <florian@mickler.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

22 Dec, 2010: 1 commit

ocfs2: Fix system inodes cache overflow. · 7d8f9876

Authored by Tao Ma on Dec 22, 2010

When we store the system inode cache in ocfs2_super,
we use an array for global system inodes. Unfortunately,
the range is calculated wrongly, which makes it overflow and
pollute ocfs2_super->local_system_inodes.
This patch fixes it by setting the range properly.

The corresponding bug is ossbug1303.
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1303

Cc: stable@kernel.org
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

21 Dec, 2010: 1 commit

Fix btrfs b0rkage · 3cb50ddf

Authored by Al Viro on Dec 20, 2010

Buggered-in: 76dda93c ("Btrfs: add snapshot/subvolume destroy
ioctl")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

18 Dec, 2010: 2 commits

ceph: mark user pages dirty on direct-io reads · b6aa5901

Authored by Henry C Chang on Dec 15, 2010

For read operations, we have to set the _write_ argument of get_user_pages
to 1, since we will write data into the pages. We also need to call
SetPageDirty before releasing these pages.

Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: fix null pointer dereference in ceph_init_dentry for nfs reexport · 92cf7652

Authored by Sage Weil on Dec 17, 2010

The fh_to_dentry etc. methods use ceph_init_dentry(), which assumes that
d_parent is defined. It isn't for those callers, so check for it!

Signed-off-by: Sage Weil <sage@newdream.net>

16 Dec, 2010: 6 commits

ocfs2: Hold ip_lock when set/clear flags for indexed dir. · 8ac33dc8

Authored by Tao Ma on Dec 15, 2010

When we set/clear the dyn_features for an inode we hold ip_lock,
so do the same when we set/clear OCFS2_INDEXED_DIR_FL.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2: Adjust masklog flag values · 41b41a26

Authored by Sunil Mushran on Dec 09, 2010

Two masklogs had the same flag value.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

nilfs2: fix regression of garbage collection ioctl · 947b10ae

Authored by Ryusuke Konishi on Dec 16, 2010

In 2.6.37-rc1, the garbage collection ioctl of nilfs was broken by
commit 263d90ce ("nilfs2: remove own inode hash used for GC"),
leading to filesystem corruption.

That commit doesn't queue gc-inodes for the log writer if they are reused
through the VFS inode cache. Here, a gc-inode is an inode which
buffers blocks to be relocated during GC. That commit queues gc-inodes in
the nilfs_init_gcinode() function, but this function is not called when
they don't have the I_NEW flag. Thus, some live blocks are wrongly
overwritten without being moved to new logs.

This patch resolves the problem by moving the gc-inode queueing to an outer
function to ensure it's done right.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

ceph: fix direct-io on non-page-aligned buffers · ab226e21

Authored by Henry C Chang on Dec 15, 2010

The user buffer may be 512-byte aligned, not page-aligned. We were
assuming the buffer was page-aligned and only accounting for
non-page-aligned I/O offsets.

Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>

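The accounting error is easiest to see as arithmetic: a buffer that starts partway into a page can touch one more page than its length alone suggests. The helpers below are a hypothetical illustration, not ceph's code, and assume 4 KiB pages. pages_spanned() counts the pages covered by [addr, addr + len), while pages_assuming_aligned() is the page-aligned shortcut the commit says was being relied on.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Pages touched by the byte range [addr, addr + len). */
static size_t pages_spanned(uintptr_t addr, size_t len)
{
	if (len == 0)
		return 0;
	uintptr_t first = addr / SKETCH_PAGE_SIZE;
	uintptr_t last = (addr + len - 1) / SKETCH_PAGE_SIZE;
	return (size_t)(last - first + 1);
}

/* Page count if the buffer is assumed to start on a page boundary. */
static size_t pages_assuming_aligned(size_t len)
{
	return (len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

For example, a 4096-byte read into a 512-byte-aligned buffer at 0x1200 spans two pages, while the aligned shortcut reports one.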

install_special_mapping skips security_file_mmap check. · 462e635e

Authored by Tavis Ormandy on Dec 09, 2010

The install_special_mapping routine (used, for example, to set up the
vdso) skips the security check before insert_vm_struct, allowing a local
attacker to bypass the mmap_min_addr security restriction by limiting
the available pages for special mappings.

bprm_mm_init() also skips the check, and although I don't think this can
be used to bypass any restrictions, I don't see any reason not to have
the security check.

  $ uname -m
  x86_64
  $ cat /proc/sys/vm/mmap_min_addr
  65536
  $ cat install_special_mapping.s
  section .bss
      resb BSS_SIZE
  section .text
      global _start
      _start:
          mov     eax, __NR_pause
          int     0x80
  $ nasm -D__NR_pause=29 -DBSS_SIZE=0xfffed000 -f elf -o install_special_mapping.o install_special_mapping.s
  $ ld -m elf_i386 -Ttext=0x10000 -Tbss=0x11000 -o install_special_mapping install_special_mapping.o
  $ ./install_special_mapping &
  [1] 14303
  $ cat /proc/14303/maps
  0000f000-00010000 r-xp 00000000 00:00 0                                  [vdso]
  00010000-00011000 r-xp 00001000 00:19 2453665                            /home/taviso/install_special_mapping
  00011000-ffffe000 rwxp 00000000 00:00 0                                  [stack]

It's worth noting that Red Hat are shipping with mmap_min_addr set to
4096.

Signed-off-by: Tavis Ormandy <taviso@google.com>
Acked-by: Kees Cook <kees@ubuntu.com>
Acked-by: Robert Swiecki <swiecki@google.com>
[ Changed to not drop the error code - akpm ]
Reviewed-by: James Morris <jmorris@namei.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

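The restriction being bypassed is a lower bound on where a mapping may start. The function below is a hypothetical model of that policy, not the kernel's security_file_mmap() hook, and it ignores the override that sufficiently privileged processes get. Applied to the transcript above, the [vdso] mapping at 0xf000 lies below an mmap_min_addr of 65536 (0x10000), which is what the missing check should have refused.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the mmap_min_addr policy: an unprivileged mapping
 * may not start below the configured minimum address. The kernel's real
 * check also lets sufficiently privileged callers through; that is omitted.
 */
static bool sketch_mapping_allowed(uint64_t start, uint64_t mmap_min_addr)
{
	return start >= mmap_min_addr;
}
```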

fanotify: fill in the metadata_len field on struct fanotify_event_metadata · 7d131623

Authored by Eric Paris on Dec 07, 2010

The fanotify_event_metadata now has a field which is supposed to
indicate the length of the metadata portion of the event. Fill in that
field as well.

Based-in-part-on-patch-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>

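A filled-in metadata_len lets a consumer find the end of the fixed-size metadata without hard-coding sizeof() of one particular struct version. The sketch below redeclares a struct whose fields are modeled on struct fanotify_event_metadata from <linux/fanotify.h> (event_len, vers, reserved, metadata_len, mask, fd, pid); treat that layout as an assumption. The sketch_ names and the fill helper are hypothetical and do not reproduce the kernel code that fills the real structure.

```c
#include <stdint.h>

/* Field layout modeled on struct fanotify_event_metadata (an assumption). */
struct sketch_fan_metadata {
	uint32_t event_len;    /* length of the whole event */
	uint8_t  vers;         /* metadata format version */
	uint8_t  reserved;
	uint16_t metadata_len; /* length of this fixed-size header */
	uint64_t mask;
	int32_t  fd;
	int32_t  pid;
};

/* Hypothetical helper: fill an event that carries no extra payload. */
static void sketch_fill_metadata(struct sketch_fan_metadata *m, uint8_t vers,
				 uint64_t mask, int32_t fd, int32_t pid)
{
	m->event_len = sizeof(*m);
	m->vers = vers;
	m->reserved = 0;
	m->metadata_len = sizeof(*m); /* the field this commit starts filling in */
	m->mask = mask;
	m->fd = fd;
	m->pid = pid;
}
```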

15 Dec, 2010: 2 commits

ext4: fix typo which broke '..' detection in ext4_find_entry() · 6d5c3aa8

Authored by Aaro Koskinen on Dec 14, 2010

There should be a check for the NUL character instead of '0'.

Fortunately the only thing that cares about this is NFS serving, which
is why we didn't notice this in the merge window testing.

Reported-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

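The bug class is a single missing backslash: '0' is the digit zero (0x30), while '\0' is the NUL terminator. The two hypothetical helpers below illustrate it with a ".." test on a NUL-terminated name. They are not the actual ext4_find_entry() condition, only a minimal reproduction of how the typo makes a correctly terminated ".." stop matching.

```c
#include <stdbool.h>

/* Hypothetical helpers; name must be a NUL-terminated string. */

/* Buggy variant: compares against the digit '0' instead of NUL. */
static bool is_dotdot_buggy(const char *name)
{
	return name[0] == '.' && name[1] == '.' && name[2] == '0';
}

/* Fixed variant: the third byte must be the NUL terminator. */
static bool is_dotdot_fixed(const char *name)
{
	return name[0] == '.' && name[1] == '.' && name[2] == '\0';
}
```

The short-circuit && keeps both helpers from reading past the terminator of shorter names such as ".".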

ext4: Turn off multiple page-io submission by default · 1449032b

Authored by Theodore Ts'o on Dec 14, 2010

Jon Nelson has found a test case which causes postgresql to fail with
the error:

psql:t.sql:4: ERROR: invalid page header in block 38269 of relation base/16384/16581

Under memory pressure, it looks like part of a file can end up getting
replaced by zeros. Until we can figure out the cause, we'll roll
back the change and use block_write_full_page() instead of
ext4_bio_write_page().  The new, more efficient writing function can
be used via the mount option mblk_io_submit, so we can test and fix
the new page I/O code.

To reproduce the problem, install postgres 8.4 or 9.0, and pin enough
memory that the system is just on the verge of triggering writeback,
then run the following SQL script:

begin;
create temporary table foo as select x as a, ARRAY[x] as b FROM
generate_series(1, 10000000 ) AS x;
create index foo_a_idx on foo (a);
create index foo_b_idx on foo USING GIN (b);
rollback;

If the temporary table is created on a hard drive partition which is
encrypted using dm_crypt, then under memory pressure, approximately
30-40% of the time, pgsql will issue the above failure.

This patch should fix this problem, and the problem will come back if
the file system is mounted with the mblk_io_submit mount option.

Reported-by: Jon Nelson <jnelson@jamponi.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

14 Dec, 2010: 3 commits

Btrfs: prevent RAID level downgrades when space is low · 83a50de9

Authored by Chris Mason on Dec 13, 2010

The extent allocator has code that allows us to fill
allocations from any available block group, even if it doesn't
match the raid level we've requested.

This was put in because adding a new drive to a filesystem
made with the default mkfs options actually upgrades the metadata from
single spindle dup to full RAID1.

But, the code also allows us to allocate from a raid0 chunk when we
really want a raid1 or raid10 chunk.  This can cause big trouble because
mkfs creates a small (4MB) raid0 chunk for data and metadata which then
goes unused for raid1/raid10 installs.

The allocator will happily wander in and allocate from that chunk when
things get tight, which is not correct.

The fix here is to make sure that we provide duplication when the
caller has asked for it. It still allows the dups to be any raid level,
which preserves the dup->raid1 upgrade ability.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

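The rule this fix adds can be expressed as a predicate over block-group flags: if the caller wants some form of duplication, a candidate block group must provide some form of duplication, but any duplicating level is acceptable. The flag names and bit values below are hypothetical stand-ins for btrfs's block-group profile bits, and sketch_group_acceptable() illustrates the rule as the commit message describes it, not the allocator code itself.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for block-group profile flags. */
#define SK_RAID0  (1u << 0)
#define SK_RAID1  (1u << 1)
#define SK_DUP    (1u << 2)
#define SK_RAID10 (1u << 3)

/* Any of these profiles keeps more than one copy of the data. */
#define SK_DUPLICATING (SK_RAID1 | SK_DUP | SK_RAID10)

/*
 * May a block group with profile group_flags satisfy a request for
 * wanted_flags when no exact match is available?
 */
static bool sketch_group_acceptable(unsigned wanted_flags, unsigned group_flags)
{
	/* Never fall back from a duplicating request to a non-duplicating group. */
	if ((wanted_flags & SK_DUPLICATING) && !(group_flags & SK_DUPLICATING))
		return false;
	/* Otherwise accept it; any duplicating level keeps dup -> raid1 working. */
	return true;
}
```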

Btrfs: account for missing devices in RAID allocation profiles · cd02dca5

Authored by Chris Mason on Dec 13, 2010

When we mount in RAID degraded mode without adding a new device to
replace the failed one, we can end up using the wrong RAID flags for
allocations.

This results in strange combinations of block groups (raid1 in a raid10
filesystem) and corruptions when we try to allocate blocks from single
spindle chunks on drives that are actually missing.

The first device has two small 4MB chunks in it that mkfs creates and
these are usually unused in a raid1 or raid10 setup.  But, in -o degraded,
the allocator will fall back to these because the mask of desired raid groups
isn't correct.

The fix here is to count the missing devices as we build up the list
of devices in the system.  This count is used when picking the
raid level to make sure we continue using the same levels that were
in place before we lost a drive.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

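Counting missing devices matters because the allocation profile is reduced according to how many devices the filesystem appears to have. The function below is a hypothetical sketch of such a reduction step: the flag values are made up, and the thresholds (one device rules out RAID0/RAID1, fewer than four rules out RAID10) are assumptions chosen for illustration. It shows that a four-device RAID10 filesystem mounted with one drive missing keeps its RAID10 profile only when the missing device is included in the count.

```c
/* Hypothetical profile flags. */
#define SK_RAID0  (1u << 0)
#define SK_RAID1  (1u << 1)
#define SK_RAID10 (1u << 2)

/*
 * Drop profiles the device count cannot support. num_devices includes
 * missing devices, so a degraded mount keeps the profiles that were in
 * use before the drive was lost.
 */
static unsigned sketch_reduce_profile(unsigned flags, unsigned rw_devices,
				      unsigned missing_devices)
{
	unsigned num_devices = rw_devices + missing_devices;

	if (num_devices == 1)
		flags &= ~(SK_RAID0 | SK_RAID1);
	if (num_devices < 4)
		flags &= ~SK_RAID10;
	return flags;
}
```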

Btrfs: EIO when we fail to read tree roots · 68433b73

Authored by Chris Mason on Dec 13, 2010

If we just get a plain IO error when we read tree roots, the code
wasn't properly sending that error up the chain. This allowed mounts to
continue when they should have failed, and allowed operations
on partially set up root structs. The end result was usually oopsen
on spinlocks that hadn't been spun up correctly.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

11 Dec, 2010: 8 commits

Btrfs: fix compiler warnings · 3dd1462e

Authored by Jan Beulich on Dec 07, 2010

... regarding an unused function when !MIGRATION, and regarding a
printk() format string vs argument mismatch.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Make async snapshot ioctl more generic · fdfb1e4f

Authored by Li Zefan on Dec 10, 2010

If we had reserved some bytes in struct btrfs_ioctl_vol_args, we
wouldn't have to create a new structure for async snapshot creation.

Here we convert the async snapshot ioctl to use a more generic ABI, as
we'll add more ioctls for snapshots/subvolumes in the future, read-only
snapshots for example.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: pwrite blocked when writing from the mmaped buffer of the same page · 914ee295

Authored by Xin Zhong on Dec 09, 2010

This problem was found in MeeGo testing:
http://bugs.meego.com/show_bug.cgi?id=6672
A file in btrfs is mmaped and the mmaped buffer is passed to pwrite to write to the same page
of the same file. In btrfs_file_aio_write(), the pages are locked by prepare_pages(). So when
btrfs_copy_from_user() is called, a page fault happens and the same page needs to be locked again
in filemap_fault(). The fix is to move iov_iter_fault_in_readable() before prepare_pages() so that the
page fault happens before the pages are locked, and also to disable page faults in the critical region
in btrfs_copy_from_user().

Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Zhong, Xin <xin.zhong@intel.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Fix a crash when mounting a subvolume · f106e82c

Authored by Li Zefan on Dec 07, 2010

We should drop the dentry before deactivating the superblock, otherwise
we can hit this bug:

BUG: Dentry f349a690{i=100,n=/} still in use (1) [unmount of btrfs loop1]
...

Steps to reproduce the bug:

  # mount /dev/loop1 /mnt
  # mkdir save
  # btrfs subvolume snapshot /mnt save/snap1
  # umount /mnt
  # mount -o subvol=save/snap1 /dev/loop1 /mnt
  (crash)

Reported-by: Michael Niederle <mniederle@gmx.at>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix sync subvol/snapshot creation · 75eaa0e2

Authored by Sage Weil on Dec 10, 2010

We were incorrectly taking the async path even for the sync ioctls by
passing in &transid unconditionally.

There's ample room for further cleanup here, but this keeps the fix simple.

Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Fix page leak in compressed writeback path · 24ae6365

Authored by Yan, Zheng on Dec 06, 2010

"start + num_bytes >= actual_end" can happen when compressed page writeback races
with file truncation. In that case we need to unlock and release pages past the end
of the file.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: do not BUG if we fail to remove the orphan item for dead snapshots · 84cd948c

Authored by Josef Bacik on Dec 08, 2010

Not being able to delete an orphan item isn't a horrible thing. The worst that
can happen is that the next time around we try to do the orphan cleanup, can't
find the referenced object, and just delete the item and move on.

Signed-off-by: Josef Bacik <josef@redhat.com>

NFS: Fix panic after nfs_umount() · 5b362ac3

Authored by Chuck Lever on Dec 10, 2010

After a few unsuccessful NFS mount attempts in which the client and
server cannot agree on an authentication flavor both support, the
client panics. nfs_umount() is invoked in the kernel in this case.

It turns out nfs_umount()'s UMNT RPC invocation causes the RPC client to
write off the end of the rpc_clnt's iostat array. This is because the
mount client's nrprocs field is initialized with the count of defined
procedures (two: MNT and UMNT), rather than the size of the client's
proc array (four).

The fix is to use the same initialization technique used by most other
upper layer clients in the kernel.

Introduced by commit 0b524123, which failed to update nrprocs when
support was added for UMNT in the kernel.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=24302
BugLink: http://bugs.launchpad.net/bugs/683938
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Tested-by: Stefan Bader <stefan.bader@canonical.com>
Cc: stable@kernel.org # >= 2.6.32
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

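The size mismatch is easy to reproduce with a sparse table indexed by procedure number. In the sketch below the names are hypothetical; only the MOUNT procedure numbers (MNT = 1, UMNT = 3) come from the MOUNT protocol. Designated initializers make the table four entries long even though only two procedures are defined, so a per-procedure statistics array sized by the defined-procedure count has no slot for UMNT. Sizing it with an ARRAY_SIZE-style count, the common kernel idiom and presumably the "same initialization technique" the commit refers to, covers every index.

```c
#include <stdbool.h>
#include <stddef.h>

#define SK_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* MOUNT protocol procedure numbers used by the mount client. */
enum { SK_MOUNTPROC_MNT = 1, SK_MOUNTPROC_UMNT = 3 };

struct sk_procinfo {
	const char *name;
};

/* Sparse table indexed by procedure number: slots 0 and 2 stay empty. */
static const struct sk_procinfo sk_mnt_procedures[] = {
	[SK_MOUNTPROC_MNT]  = { "MNT" },
	[SK_MOUNTPROC_UMNT] = { "UMNT" },
};

/* Buggy: number of defined procedures. */
static const size_t sk_nrprocs_buggy = 2;
/* Fixed: size of the table, i.e. highest procedure number + 1. */
static const size_t sk_nrprocs_fixed = SK_ARRAY_SIZE(sk_mnt_procedures);

/* Does an iostat array with nrprocs slots have a slot for proc? */
static bool sk_iostat_slot_exists(unsigned proc, size_t nrprocs)
{
	return proc < nrprocs;
}
```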

10 Dec, 2010: 6 commits

Ocfs2: Teach 'coherency=full' O_DIRECT writes to correctly up_read i_alloc_sem. · 39c99f12

Authored by Tristan Ye on Dec 07, 2010

Because the newly introduced 'coherency=full' O_DIRECT writes also take the EX
rw_lock, as buffered writes do (rw_level == 1), they mess up the use of
'level' in ocfs2_dio_end_io(), which caused i_alloc_sem not to be
up_read'd correctly.

This patch teaches ocfs2_dio_end_io to correctly track all of the locks it
must release by explicitly introducing a new bit for i_alloc_sem in the iocb's
private data, just as we did for rw_lock.

Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

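Recording each lock in its own bit lets the completion path release exactly what was taken, independent of the rw_lock level. The sketch below is a hypothetical userspace model: the bit names, struct sketch_iocb and the helpers are illustrative, not ocfs2's definitions. The submit side records rw_lock, its level and i_alloc_sem separately, and the end-io side decides whether to release i_alloc_sem from its own bit rather than inferring it from the level.

```c
#include <stdbool.h>

/* Hypothetical bit assignments in the iocb's private word. */
#define SK_IOCB_RW_LOCK       (1UL << 0) /* rw_lock is held */
#define SK_IOCB_RW_LOCK_LEVEL (1UL << 1) /* set for EX (level 1), clear for PR */
#define SK_IOCB_SEM           (1UL << 2) /* i_alloc_sem is held for read */

struct sketch_iocb {
	unsigned long private;
};

static void sketch_set_rw_locked(struct sketch_iocb *io, int level)
{
	io->private |= SK_IOCB_RW_LOCK;
	if (level)
		io->private |= SK_IOCB_RW_LOCK_LEVEL;
}

static void sketch_set_sem_locked(struct sketch_iocb *io)
{
	io->private |= SK_IOCB_SEM;
}

/* Completion: report what must be released, then clear the bits. */
static void sketch_dio_end_io(struct sketch_iocb *io, bool *up_read_sem,
			      bool *unlock_rw, int *rw_level)
{
	*up_read_sem = (io->private & SK_IOCB_SEM) != 0; /* independent of level */
	*unlock_rw = (io->private & SK_IOCB_RW_LOCK) != 0;
	*rw_level = (io->private & SK_IOCB_RW_LOCK_LEVEL) ? 1 : 0;
	io->private = 0;
}
```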

ocfs2/dlm: Migrate lockres with no locks if it has a reference · 388c4bcb

Authored by Sunil Mushran on Nov 19, 2010

o2dlm was not migrating resources with zero locks because it assumed that such a
resource would get purged by dlm_thread. However, some usage patterns involve
creating and dropping locks at a high rate, leading to the migrate thread seeing
zero locks but the purge thread seeing an active reference. When this happens,
the dlm_thread cannot purge the resource and the migrate thread sees no reason
to migrate that resource. The spell is broken when the migrate thread catches
the resource with a lock.

The fix is to make the migrate thread also consider the reference map.

This usage pattern can be triggered by userspace on userdlm locks and flocks.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

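The stall comes from two predicates that disagree, and the fix changes one of them. The functions below are a hypothetical model (the names and the two inputs are illustrative, not o2dlm's data structures): a resource can be purged only when it has neither locks nor references, and with the fix a non-empty reference map is also a reason to migrate, so a resource with zero locks but live references is no longer skipped by both threads.

```c
#include <stdbool.h>

/* Purge is only possible once nothing uses the resource. */
static bool sketch_can_purge(unsigned nlocks, bool refmap_nonempty)
{
	return nlocks == 0 && !refmap_nonempty;
}

/* Old rule: only resources with locks were worth migrating. */
static bool sketch_should_migrate_old(unsigned nlocks, bool refmap_nonempty)
{
	(void)refmap_nonempty; /* ignored, which is the bug */
	return nlocks > 0;
}

/* New rule: a set bit in the reference map also requires migration. */
static bool sketch_should_migrate_new(unsigned nlocks, bool refmap_nonempty)
{
	return nlocks > 0 || refmap_nonempty;
}
```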

xfs: log timestamp changes to the source inode in rename · 05340d4a

Authored by Christoph Hellwig on Dec 07, 2010

Now that we don't mark VFS inodes dirty anymore for internal
timestamp changes, but rely on the transaction subsystem to push
them out, we need to explicitly log the source inode in rename after
updating its timestamps to make sure the changes actually get
forced out by sync/fsync or an AIL push.

We already account for the fourth inode in the log reservation, as a
rename of directories needs to update the nlink field, so just
adding the xfs_trans_log_inode call is enough.

This fixes the xfsqa 065 regression introduced by:

	"xfs: don't use vfs writeback for pure metadata modifications"

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

Btrfs: fixup return code for btrfs_del_orphan_item · 7e1fea73

Authored by Josef Bacik on Dec 08, 2010

If the orphan item doesn't exist, we return 1, which doesn't make any sense to
the callers. Instead, return -ENOENT if we didn't find the item. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>

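The convention behind this change: a search helper in the style of btrfs_search_slot() returns 0 when the key is found, 1 when it is not, and a negative errno on failure, and that 1 is an internal detail that callers of a delete function should not see. The sketch below models this with a small hypothetical array of keys instead of a btree; sketch_search() and sketch_del_item() are illustrative names, not btrfs functions.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Search-style helper: 0 = found (index stored in *slot), 1 = not found. */
static int sketch_search(const int *keys, size_t n, int key, size_t *slot)
{
	for (size_t i = 0; i < n; i++) {
		if (keys[i] == key) {
			*slot = i;
			return 0;
		}
	}
	return 1;
}

/* Delete key; map "not found" to -ENOENT instead of leaking the 1. */
static int sketch_del_item(int *keys, size_t *n, int key)
{
	size_t slot;
	int ret = sketch_search(keys, *n, key, &slot);

	if (ret < 0)
		return ret;     /* real errors would pass through unchanged */
	if (ret > 0)
		return -ENOENT; /* the item does not exist */

	memmove(&keys[slot], &keys[slot + 1], (*n - slot - 1) * sizeof(keys[0]));
	(*n)--;
	return 0;
}
```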

Btrfs: do not do fast caching if we are allocating blocks for tree_root · b8399dee

Authored by Josef Bacik on Dec 08, 2010

Since the fast caching uses normal tree locking, we can possibly deadlock if we
get to the caching via a btrfs_search_slot() on the tree_root. So just check to
see if the root we are on is the tree root, and if so, don't do the fast caching.

Reported-by: Sage Weil <sage@newdream.net>
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: deal with space cache errors better · 2b20982e

Authored by Josef Bacik on Dec 03, 2010

Currently if the space cache inode generation number doesn't match the
generation number in the space cache header we will just fail to load the space
cache, but we won't mark the space cache as an error, so we'll keep getting that
error each time somebody tries to cache that block group until we actually clear
the thing. Fix this by marking the space cache as having an error so we only
get the message once. This patch also makes it so that we don't try to set up
the space cache for a block group that isn't cached, since we won't be able to
write it out anyway. None of these are actual problems; they are just
annoying and sub-optimal. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>

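Recording the failure in the block group's state is what turns a repeated message into a one-time one. The sketch below is a hypothetical model (the enum values, struct fields and sketch_load_cache() are illustrative, not btrfs's free-space-cache code): once a generation mismatch is seen, the cache is marked as errored, so later attempts skip the load without warning again.

```c
#include <stdint.h>

enum sketch_cache_state { SK_CACHE_UNKNOWN, SK_CACHE_OK, SK_CACHE_ERROR };

struct sketch_block_group {
	enum sketch_cache_state cache_state;
	uint64_t inode_generation; /* generation recorded in the cache inode */
	unsigned warnings;         /* how many times we complained */
};

/* Returns 1 if the on-disk cache was used, 0 if it must be rebuilt. */
static int sketch_load_cache(struct sketch_block_group *bg,
			     uint64_t header_generation)
{
	if (bg->cache_state == SK_CACHE_ERROR)
		return 0; /* already known bad: stay quiet */

	if (header_generation != bg->inode_generation) {
		bg->warnings++; /* stands in for the printk */
		bg->cache_state = SK_CACHE_ERROR;
		return 0;
	}
	bg->cache_state = SK_CACHE_OK;
	return 1;
}
```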