1. 24 Feb, 2013 6 commits
  2. 22 Feb, 2013 11 commits
    • binfmt_elf: remove unused argument in fill_elf_header · d3330cf0
      Authored by Zhang Yanfei
      In fill_elf_header(), elf->e_ident[EI_OSABI] is always set to ELF_OSABI,
      so remove the unused argument 'osabi'.
      Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ubifs: wait for page writeback to provide stable pages · 182dcfd6
      Authored by Jan Kara
      When stable pages are required, we have to wait if the page is just
      going to disk and we want to modify it.  Add proper callback to
      ubifs_vm_page_mkwrite().
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: Artem Bityutskiy <dedekind1@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Ron Minnich <rminnich@sandia.gov>
      Cc: Latchesar Ionkov <lucho@ionkov.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: wait for page writeback to provide stable pages · 1269529b
      Authored by Jan Kara
      When stable pages are required, we have to wait if the page is just
      going to disk and we want to modify it.  Add proper callback to
      ocfs2_grab_pages_for_write().
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Acked-by: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Artem Bityutskiy <dedekind1@gmail.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Ron Minnich <rminnich@sandia.gov>
      Cc: Latchesar Ionkov <lucho@ionkov.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • block: optionally snapshot page contents to provide stable pages during write · ffecfd1a
      Authored by Darrick J. Wong
      This provides a band-aid to provide stable page writes on jbd without
      needing to backport the fixed locking and page writeback bit handling
      schemes of jbd2.  The band-aid works by using bounce buffers to snapshot
      page contents instead of waiting.
      
      For those wondering about the ext3 bandage -- fixing the jbd locking
      (which was done as part of ext4dev years ago) is a lot of surgery, and
      setting PG_writeback on data pages when we actually hold the page lock
      dropped ext3 performance by nearly an order of magnitude.  If we're
      going to migrate iscsi and raid to use stable page writes, the
      complaints about high latency will likely return.  We might as well
      centralize their page snapshotting thing to one place.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Tested-by: Andy Lutomirski <luto@amacapital.net>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Artem Bityutskiy <dedekind1@gmail.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Ron Minnich <rminnich@sandia.gov>
      Cc: Latchesar Ionkov <lucho@ionkov.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • 9pfs: fix filesystem to wait for stable page writeback · 13575ca1
      Authored by Darrick J. Wong
      Fix up the ->page_mkwrite handler to provide stable page writes if necessary.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Artem Bityutskiy <dedekind1@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Ron Minnich <rminnich@sandia.gov>
      Cc: Latchesar Ionkov <lucho@ionkov.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: only enforce stable page writes if the backing device requires it · 1d1d1a76
      Authored by Darrick J. Wong
      Create a helper function that checks whether a backing device requires
      stable page writes and, if so, performs the necessary wait.  Then, make it so
      that all points in the memory manager that handle making pages writable
      use the helper function.  This should provide stable page write support
      to most filesystems, while eliminating unnecessary waiting for devices
      that don't require the feature.
      
      Before this patchset, all filesystems would block, regardless of whether
      or not it was necessary.  ext3 would wait, but still generate occasional
      checksum errors.  The network filesystems were left to do their own
      thing, so they'd wait too.
      
      After this patchset, all the disk filesystems except ext3 and btrfs will
      wait only if the hardware requires it.  ext3 (if necessary) snapshots
      pages instead of blocking, and btrfs provides its own bdi so the mm will
      never wait.  Network filesystems haven't been touched, so either they
      provide their own stable page guarantees or they don't block at all.
      The blocking behavior is back to what it was before 3.0 if you don't
      have a disk requiring stable page writes.
      
      Here's the result of using dbench to test latency on ext2:
      
      3.8.0-rc3:
       Operation      Count    AvgLat    MaxLat
       ----------------------------------------
       WriteX        109347     0.028    59.817
       ReadX         347180     0.004     3.391
       Flush          15514    29.828   287.283
      
      Throughput 57.429 MB/sec  4 clients  4 procs  max_latency=287.290 ms
      
      3.8.0-rc3 + patches:
       Operation      Count    AvgLat    MaxLat
       ----------------------------------------
       WriteX        105556     0.029     4.273
       ReadX         335004     0.005     4.112
       Flush          14982    30.540   298.634
      
      Throughput 55.4496 MB/sec  4 clients  4 procs  max_latency=298.650 ms
      
      As you can see, the maximum write latency drops considerably with this
      patch enabled.  The other filesystems (ext3/ext4/xfs/btrfs) behave
      similarly, but see the cover letter for those results.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Acked-by: Steven Whitehouse <swhiteho@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Artem Bityutskiy <dedekind1@gmail.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Ron Minnich <rminnich@sandia.gov>
      Cc: Latchesar Ionkov <lucho@ionkov.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
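The helper idea in the "mm: only enforce stable page writes" commit can be sketched in userspace as follows. This is a minimal model, not the kernel code: `demo_bdi`, `demo_page`, `DEMO_BDI_STABLE_WRITES` and both functions are hypothetical names, and the wait is modelled as a counter instead of real blocking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capability bit: the device needs page contents to stay
 * unchanged while the page is being written out. */
#define DEMO_BDI_STABLE_WRITES 0x1u

struct demo_bdi {
    unsigned int capabilities;
};

struct demo_page {
    const struct demo_bdi *bdi; /* backing device behind the page */
    bool under_writeback;       /* page is currently going to disk */
    int waits;                  /* number of times a writer had to wait */
};

/* Stand-in for blocking until writeback of the page completes. */
void demo_wait_on_writeback(struct demo_page *page)
{
    if (page->under_writeback) {
        page->waits++;
        page->under_writeback = false;
    }
}

/* Wait only when the backing device asks for stable pages. */
void demo_wait_for_stable_page(struct demo_page *page)
{
    if (!(page->bdi->capabilities & DEMO_BDI_STABLE_WRITES))
        return;
    demo_wait_on_writeback(page);
}
```

Every path that makes a page writable would call the second function, so devices without the capability bit never pay for the wait.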
    • ocfs2: unlock super lock if lockres refresh failed · 3278bb74
      Authored by Junxiao Bi
      If lockres refresh failed, the super lock will never be released, which
      will cause some processes on other cluster nodes to hang forever.
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: remove kfree() redundant null checks · d787ab09
      Authored by Tim Gardner
      smatch analysis indicates a number of redundant NULL checks before
      calling kfree(), eg:
      
        fs/ocfs2/alloc.c:6138 ocfs2_begin_truncate_log_recovery() info:
         redundant null check on *tl_copy calling kfree()
      
        fs/ocfs2/alloc.c:6755 ocfs2_zero_range_for_truncate() info:
         redundant null check on pages calling kfree()
      
      etc....
      
      [akpm@linux-foundation.org: revert dubious change in ocfs2_begin_truncate_log_recovery()]
      Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Acked-by: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
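The pattern smatch flags in the ocfs2 kfree() commit can be illustrated with a userspace analogue. This sketch uses `free()`, which, like `kfree()`, is a no-op when passed NULL; `demo_ctx` and both functions are hypothetical names, not ocfs2 code.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_ctx {
    char *name; /* may be NULL if it was never allocated */
};

/* Before: the NULL test adds nothing, because free(NULL) does nothing. */
void demo_release_checked(struct demo_ctx *ctx)
{
    if (ctx->name)
        free(ctx->name);
    ctx->name = NULL;
}

/* After: same behaviour without the redundant test. */
void demo_release(struct demo_ctx *ctx)
{
    free(ctx->name);
    ctx->name = NULL;
}
```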
    • configfs: move the dereference below the NULL test · 49deb4bc
      Authored by Wei Yongjun
      The dereference should be moved below the NULL test.
      
      This was found using spatch with a semantic match.
      (http://coccinelle.lip6.fr/)
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
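The class of bug fixed in the configfs commit is a pointer being dereferenced before it is tested for NULL. A minimal sketch of the corrected ordering, using a hypothetical `demo_item` structure rather than the actual configfs types:

```c
#include <assert.h>
#include <stddef.h>

struct demo_item {
    int refcount;
};

/* Returns the refcount, or -1 when item is NULL.
 * The field is read only after the NULL test has passed. */
int demo_item_refcount(const struct demo_item *item)
{
    int count;

    if (item == NULL)
        return -1;

    count = item->refcount; /* dereference placed below the NULL test */
    return count;
}
```

Had `count` been initialised from `item->refcount` at its declaration, the NULL test would run too late to protect the access.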
    • fs/block_dev.c: page cache wrongly left invalidated after revalidate_disk() · 7630b661
      Authored by MITSUNARI Shigeo
      We found that bdev->bd_invalidated was left set once revalidate_disk()
      is called, which results in a page cache flush every time the device is
      opened.
      
      Specifically, we found this problem with an MD block device.  Once we
      resize an MD device, mdadm --monitor periodically flushes all page cache
      for that device, every 60 or 1000 seconds, when it opens the device.
      
      This bug has existed since at least 3.2.0 and is still present in the
      latest kernel (3.6.2).  Patch is attached.
      
      The following steps will reproduce the problem.
      
      1. Prepare a block device (e.g. /dev/sdb).
      
      2. create two partitions:
      
         sudo parted /dev/sdb
         mklabel gpt
         mkpart primary 0% 50%
         mkpart primary 50% 100%
      
      3. create a md device.
      
         sudo mdadm -C /dev/md/hoge -l 1 -n 2 -e 1.2 --assume-clean --auto=md --symlink=no /dev/sdb1 /dev/sdb2
      
      4. create file system and mount it
      
         sudo mkfs.ext3 /dev/md/hoge
         sudo mkdir /mnt/test
         sudo mount /dev/md/hoge /mnt/test
      
      5. try to resize the device
      
         sudo mdadm -G /dev/md/hoge --size=max
      
      6. Create a file to fill the file cache.
      
        sudo dd if=/dev/urandom of=/mnt/test/data bs=1M count=10
      
      and verify the current state of the file cache with the free command.
      
      7. mdadm monitor will open the md device every 1000 seconds and you
         will find that all file cache for the device has been cleared.
      
      The timing can be reduced by the following steps.
      
      a) kill mdadm and restart it with --delay option
      
         /sbin/mdadm --monitor --delay=30 --pid-file /var/run/mdadm/monitor.pid --daemonise --scan --syslog
      
      or open the md device directly.
      
         sudo dd if=/dev/md/hoge of=/dev/null bs=4096 count=1
      Signed-off-by: MITSUNARI Shigeo <herumi@nifty.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • inotify: remove broken mask checks causing unmount to be EINVAL · 676a0675
      Authored by Jim Somerville
      Running the command:
      
      	inotifywait -e unmount /mnt/disk
      
      immediately aborts with a -EINVAL return code.  This is however a valid
      parameter.  This abort occurs only if unmount is the sole event
      parameter.  If other event parameters are supplied, then the unmount
      event wait will work.
      
      The problem was introduced by commit 44b350fc ("inotify: Fix mask
      checks").  In that commit, it states:
      
      	The mask checks in inotify_update_existing_watch() and
      	inotify_new_watch() are useless because inotify_arg_to_mask()
      	sets FS_IN_IGNORED and FS_EVENT_ON_CHILD bits anyway.
      
      But instead of removing the useless checks, it did this:
      
      	        mask = inotify_arg_to_mask(arg);
      	-       if (unlikely(!mask))
      	+       if (unlikely(!(mask & IN_ALL_EVENTS)))
      	                return -EINVAL;
      
      The problem is that IN_ALL_EVENTS doesn't include IN_UNMOUNT, and other
      parts of the code keep IN_UNMOUNT separate from IN_ALL_EVENTS.  So the
      check should be:
      
      	if (unlikely(!(mask & (IN_ALL_EVENTS | IN_UNMOUNT))))
      
      But inotify_arg_to_mask(arg) always sets the IN_UNMOUNT bit in the mask
      anyway, so the check is always going to pass and thus should simply be
      removed.  Also note that inotify_arg_to_mask() completely controls which
      mask bits get set from arg, so there's no way for invalid bits to get
      enabled there.
      
      Let's fix it by simply removing the useless broken checks.
      Signed-off-by: Jim Somerville <Jim.Somerville@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: John McCutchan <john@johnmccutchan.com>
      Cc: Robert Love <rlove@rlove.org>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: <stable@vger.kernel.org>		[2.6.37+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
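The failure described in the inotify commit can be reproduced with plain bit arithmetic. The sketch below is a simplified model, not the kernel source: the `DEMO_` constants are assumptions chosen so that the unmount bit lies outside the all-events mask, as the commit message describes, and `demo_arg_to_mask()` keeps only the behaviour the message mentions (the unmount bit is always set).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed bit layout: the unmount bit is not part of the all-events mask. */
#define DEMO_IN_ALL_EVENTS 0x00000fffu
#define DEMO_IN_UNMOUNT    0x00002000u

/* Simplified model: always sets the unmount bit, filters the rest. */
uint32_t demo_arg_to_mask(uint32_t arg)
{
    return DEMO_IN_UNMOUNT | (arg & DEMO_IN_ALL_EVENTS);
}

/* The broken check: rejects a request whose only event is unmount. */
int demo_broken_check(uint32_t arg)
{
    uint32_t mask = demo_arg_to_mask(arg);

    if (!(mask & DEMO_IN_ALL_EVENTS))
        return -EINVAL;
    return 0;
}

/* The fix: no check at all, since the mask can never be empty or invalid. */
int demo_fixed_check(uint32_t arg)
{
    (void)demo_arg_to_mask(arg);
    return 0;
}
```

With `arg` equal to the unmount bit alone, the broken variant returns -EINVAL; adding any other event bit makes it pass, which matches the behaviour reported for inotifywait.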
  3. 20 Feb, 2013 1 commit
  4. 19 Feb, 2013 2 commits
  5. 18 Feb, 2013 3 commits
    • umount oops when remove blocklayoutdriver first · 5a12cca6
      Authored by fanchaoting
      When the pNFS client uses the block layout, blocklayoutdriver may be
      removed first. If we umount afterwards, it can cause an oops in
      unset_pnfs_layoutdriver, because nfss->pnfs_curr_ld->clear_layoutdriver
      is no longer valid.
      
      To reproduce:
       modprobe blocklayoutdriver
       mount -t nfs4 -o minorversion=1 pnfsip:/ /mnt/
       rmmod blocklayoutdriver
       umount /mnt
      
      The following oops then appears:
      
      CPU 0
      Pid: 17023, comm: umount.nfs4 Tainted: GF          O 3.7.0-rc6-pnfs #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
      RIP: 0010:[<ffffffffa04cfe6d>]  [<ffffffffa04cfe6d>] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4]
      RSP: 0018:ffff8800022d9e48  EFLAGS: 00010286
      RAX: ffffffffa04a1b00 RBX: ffff88000b013800 RCX: 0000000000000001
      RDX: ffffffff81ae8ee0 RSI: ffff880001ee94b8 RDI: ffff88000b013800
      RBP: ffff8800022d9e58 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff880001ee9400
      R13: ffff8800105978c0 R14: 00007fff25846c08 R15: 0000000001bba550
      FS:  00007f45ae7f0700(0000) GS:ffff880012c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: ffffffffa04a1b38 CR3: 0000000002c0c000 CR4: 00000000000006f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process umount.nfs4 (pid: 17023, threadinfo ffff8800022d8000, task ffff880006e48aa0)
      Stack:
      ffff8800105978c0 ffff88000b013800 ffff8800022d9e78 ffffffffa04cd0ce
      ffff8800022d9e78 ffff88000b013800 ffff8800022d9ea8 ffffffffa04755a7
      ffff8800022d9ea8 ffff880002f96400 ffff88000b013800 ffff880002f96400
      Call Trace:
      [<ffffffffa04cd0ce>] nfs4_destroy_server+0x1e/0x30 [nfsv4]
      [<ffffffffa04755a7>] nfs_free_server+0xb7/0x150 [nfs]
      [<ffffffffa047d4d5>] nfs_kill_super+0x35/0x40 [nfs]
      [<ffffffff81178d35>] deactivate_locked_super+0x45/0x70
      [<ffffffff8117986a>] deactivate_super+0x4a/0x70
      [<ffffffff81193ee2>] mntput_no_expire+0xd2/0x130
      [<ffffffff81194d62>] sys_umount+0x72/0xe0
      [<ffffffff8154af59>] system_call_fastpath+0x16/0x1b
      Code: 06 e1 b8 ea ff ff ff eb 9e 0f 1f 44 00 00 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90 48 8b 87 80 03 00 00 48 89 fb 48 85 c0 74 29 <48> 8b 40 38 48 85 c0 74 02 ff d0 48 8b 03 3e ff 48 04 0f 94 c2
      RIP  [<ffffffffa04cfe6d>] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4]
      RSP <ffff8800022d9e48>
      CR2: ffffffffa04a1b38
      ---[ end trace 29f75aaedda058bf ]---
      
      Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@vger.kernel.org
    • nfs: remove kfree() redundant null checks · 96aa1549
      Authored by Tim Gardner
      smatch analysis:
      
      fs/nfs/getroot.c:130 nfs_get_root() info: redundant null
       check on name calling kfree()
      
      fs/nfs/unlink.c:272 nfs_async_unlink() info: redundant null
       check on devname_garbage calling kfree()
      
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: linux-nfs@vger.kernel.org
      Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • NFSv4.1: Don't decode skipped layoutgets · 085b7a45
      Authored by Weston Andros Adamson
      layoutget's prepare hook can call rpc_exit with status = NFS4_OK (0).
      Because of this, nfs4_proc_layoutget can't depend on a 0 status to mean
      that the RPC was successfully sent, received and parsed.
      
      To fix this, use the result's len member to see if parsing took place.
      
      This fixes the following OOPS -- calling xdr_init_decode() with a buffer length
      0 doesn't set the stream's 'p' member and ends up using uninitialized memory
      in filelayout_decode_layout.
      
      BUG: unable to handle kernel paging request at 0000000000008050
      IP: [<ffffffff81282e78>] memcpy+0x18/0x120
      PGD 0
      Oops: 0000 [#1] SMP
      last sysfs file: /sys/devices/pci0000:00/0000:00:11.0/0000:02:01.0/irq
      CPU 1
      Modules linked in: nfs_layout_nfsv41_files nfs lockd fscache auth_rpcgss nfs_acl autofs4 sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_mirror dm_region_hash dm_log dm_mod ppdev parport_pc parport snd_ens1371 snd_rawmidi snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000 microcode vmware_balloon i2c_piix4 i2c_core sg shpchp ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptspi mptscsih mptbase scsi_transport_spi [last unloaded: speedstep_lib]
      
      Pid: 1665, comm: flush-0:22 Not tainted 2.6.32-356-test-2 #2 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
      RIP: 0010:[<ffffffff81282e78>]  [<ffffffff81282e78>] memcpy+0x18/0x120
      RSP: 0018:ffff88003dfab588  EFLAGS: 00010206
      RAX: ffff88003dc42000 RBX: ffff88003dfab610 RCX: 0000000000000009
      RDX: 000000003f807ff0 RSI: 0000000000008050 RDI: ffff88003dc42000
      RBP: ffff88003dfab5b0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000080 R12: 0000000000000024
      R13: ffff88003dc42000 R14: ffff88003f808030 R15: ffff88003dfab6a0
      FS:  0000000000000000(0000) GS:ffff880003420000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      CR2: 0000000000008050 CR3: 000000003bc92000 CR4: 00000000001407e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process flush-0:22 (pid: 1665, threadinfo ffff88003dfaa000, task ffff880037f77540)
      Stack:
      ffffffffa0398ac1 ffff8800397c5940 ffff88003dfab610 ffff88003dfab6a0
      <d> ffff88003dfab5d0 ffff88003dfab680 ffffffffa01c150b ffffea0000d82e70
      <d> 000000508116713b 0000000000000000 0000000000000000 0000000000000000
      Call Trace:
      [<ffffffffa0398ac1>] ? xdr_inline_decode+0xb1/0x120 [sunrpc]
      [<ffffffffa01c150b>] filelayout_decode_layout+0xeb/0x350 [nfs_layout_nfsv41_files]
      [<ffffffffa01c17fc>] filelayout_alloc_lseg+0x8c/0x3c0 [nfs_layout_nfsv41_files]
      [<ffffffff8150e6ce>] ? __wait_on_bit+0x7e/0x90
      Signed-off-by: Weston Andros Adamson <dros@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@vger.kernel.org
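The decision rule from the layoutget commit, "a zero status is not proof that a reply was parsed; check the result length", can be sketched as a small predicate. The structure and function names here are hypothetical and only model the idea; they are not the NFS client's types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_layoutget_res {
    size_t len; /* bytes of layout data parsed; 0 if the RPC was skipped */
};

/* status == 0 alone is not enough: a skipped RPC can also exit with 0. */
bool demo_should_decode_layout(int status,
                               const struct demo_layoutget_res *res)
{
    return status == 0 && res->len > 0;
}
```

Decoding is attempted only when both conditions hold, so a zero-length buffer never reaches the decoder.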
  6. 15 Feb, 2013 5 commits
    • xfs: xfs_bmap_add_attrfork_local is too generic · 1e82379b
      Authored by Dave Chinner
      When we are converting local data to an extent format as a result of
      adding an attribute, the type of data contained in the local fork
      determines the behaviour that needs to occur.
      
      xfs_bmap_add_attrfork_local() already handles the directory data
      case specially by using S_ISDIR() and calling out to
      xfs_dir2_sf_to_block(), but with verifiers we now need to handle
      each different type of metadata specially and different metadata
      formats require different verifiers (and eventually block header
      initialisation).
      
      There is only a single place that we add an attribute fork to
      the inode, but that is in the attribute code and it knows nothing
      about the specific contents of the data fork. It is only the case of
      local data that is the issue here, so adding code to handle this
      case in the attribute specific code is wrong. Hence we are really
      stuck trying to detect the data fork contents in
      xfs_bmap_add_attrfork_local() and performing the correct callout
      there.
      
      Luckily the current cases can be determined by S_IS* macros, and we
      can push the work off to data specific callouts, but each of those
      callouts does a lot of work in common with
      xfs_bmap_local_to_extents(). The only reason that this fails for
      symlinks right now is that xfs_bmap_local_to_extents() assumes
      the data fork contains extent data, and so attaches a bmap extent
      data verifier to the buffer and simply copies the data fork
      information straight into it.
      
      To fix this, allow us to pass a "formatting" callback into
      xfs_bmap_local_to_extents() which is responsible for setting the
      buffer type, initialising it and copying the data fork contents over
      to the new buffer. This allows callers to specify how they want to
      format the new buffer (which is necessary for the upcoming CRC
      enabled metadata blocks) and hence make xfs_bmap_local_to_extents()
      useful for any type of data fork content.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
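The "formatting callback" approach from the xfs_bmap_add_attrfork_local commit can be sketched with a function pointer. Everything below is a hypothetical userspace model (`demo_buf`, `demo_init_fn`, the type tags); it only shows how a generic conversion routine can leave buffer typing and data copying to the caller.

```c
#include <assert.h>
#include <string.h>

#define DEMO_BUF_SIZE 64

enum { DEMO_TYPE_EXTENT = 1, DEMO_TYPE_SYMLINK = 2 };

struct demo_buf {
    int type;                 /* hypothetical buffer type tag */
    char data[DEMO_BUF_SIZE];
};

/* Callback: sets the buffer type and copies the local fork contents in. */
typedef void (*demo_init_fn)(struct demo_buf *bp, const char *local,
                             size_t len);

void demo_init_extent(struct demo_buf *bp, const char *local, size_t len)
{
    bp->type = DEMO_TYPE_EXTENT;
    memcpy(bp->data, local, len);
}

void demo_init_symlink(struct demo_buf *bp, const char *local, size_t len)
{
    bp->type = DEMO_TYPE_SYMLINK;
    memcpy(bp->data, local, len);
}

/* Generic conversion: common work here, format-specific work in init_fn. */
int demo_local_to_extents(struct demo_buf *bp, const char *local, size_t len,
                          demo_init_fn init_fn)
{
    if (len > DEMO_BUF_SIZE)
        return -1;
    memset(bp, 0, sizeof(*bp));
    init_fn(bp, local, len);
    return 0;
}
```

A symlink caller would pass `demo_init_symlink`, a regular data-fork caller `demo_init_extent`; the generic routine never needs to know which format it is producing.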
    • xfs: remove log force from xfs_buf_trylock() · fa5566e4
      Authored by Brian Foster
      The trylock log force invoked via xfs_buf_item_push() can attempt
      to acquire xa_lock, thus leading to a recursion bug when called
      with xa_lock held.
      
      This log force was originally added to xfs_buf_trylock() to address
      xfsaild stalls due to pinned and stale buffers. Since the addition
      of this behavior, the log item pushing code had been reworked to
      detect and track pinned items to inform xfsaild to issue a log
      force itself when necessary. As such, the log force on trylock
      failure is redundant and safe to remove.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: recheck buffer pinned status after push trylock failure · 5337fe9b
      Authored by Brian Foster
      The buffer pinned check and trylock sequence in xfs_buf_item_push()
      can race with an active transaction on marking the buffer pinned.
      This can result in the buffer becoming pinned and stale after the
      initial check and the trylock failure, but before the check in
      xfs_buf_trylock() that issues a log force. If the log force is
      issued from this context, a spinlock recursion occurs on xa_lock.
      
      Prepare xfs_buf_item_push() to handle the race by detecting a
      pinned buffer after the trylock failure so xfsaild issues a log
      force from a safe context. This, along with various previous fixes,
      renders the log force in xfs_buf_trylock() redundant.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
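The recheck described in the "recheck buffer pinned status" commit can be modelled in a few lines. This is a single-threaded sketch with hypothetical names; the race is simulated by a test knob (`pin_during_trylock`) that pins the buffer between the first pinned test and the trylock result.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_push_result { DEMO_ITEM_SUCCESS, DEMO_ITEM_PINNED, DEMO_ITEM_LOCKED };

struct demo_buf {
    bool pinned;             /* buffer is pinned in the log */
    bool locked;             /* lock is held by another context */
    bool pin_during_trylock; /* test knob: simulate the race window */
};

/* Non-blocking lock attempt; a concurrent pin may land here (simulated). */
bool demo_buf_trylock(struct demo_buf *bp)
{
    if (bp->pin_during_trylock)
        bp->pinned = true;
    if (bp->locked)
        return false;
    bp->locked = true;
    return true;
}

enum demo_push_result demo_buf_item_push(struct demo_buf *bp)
{
    if (bp->pinned)
        return DEMO_ITEM_PINNED;

    if (!demo_buf_trylock(bp)) {
        /* Recheck: the buffer may have been pinned after the first test. */
        if (bp->pinned)
            return DEMO_ITEM_PINNED;
        return DEMO_ITEM_LOCKED;
    }
    return DEMO_ITEM_SUCCESS;
}
```

Reporting the pinned state to the caller lets the caller issue the log force from a context where taking the lock again is safe.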
    • xfs: limit speculative prealloc size on sparse files · a1e16c26
      Authored by Dave Chinner
      Speculative preallocation based on the current file size works well
      for contiguous files, but is sub-optimal for sparse files where the
      EOF preallocation can fill holes and result in large amounts of
      zeros being written when it is not necessary.
      
      The algorithm is modified to prevent EOF speculative preallocation
      from triggering larger allocations on IO patterns of
      truncate-to-zero-seek-write-seek-write-..., which results in
      non-sparse files for large files. This, unfortunately, is the way cp
      now behaves when copying sparse files and so needs to be fixed.
      
      The code looks at the existing extent adjacent to the current EOF
      and, if it determines that it is a hole, disables speculative
      preallocation altogether. To prevent the next write from doing a
      large prealloc, it takes the size of subsequent preallocations from
      the current size of the existing EOF extent. In other words, if you
      leave a hole in the file, preallocation behaviour is reset to what
      it would be for a zero-size file.
      
      Example new behaviour:
      
      $ xfs_io -f -c "pwrite 0 31m" \
                  -c "pwrite 33m 1m" \
                  -c "pwrite 128m 1m" \
                  -c "fiemap -v" /mnt/scratch/blah
      wrote 32505856/32505856 bytes at offset 0
      31 MiB, 7936 ops; 0.0000 sec (1.608 GiB/sec and 421432.7439 ops/sec)
      wrote 1048576/1048576 bytes at offset 34603008
      1 MiB, 256 ops; 0.0000 sec (1.462 GiB/sec and 383233.5329 ops/sec)
      wrote 1048576/1048576 bytes at offset 134217728
      1 MiB, 256 ops; 0.0000 sec (1.719 GiB/sec and 450704.2254 ops/sec)
      /mnt/scratch/blah:
       EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
         0: [0..65535]:      96..65631        65536   0x0
         1: [65536..67583]:  hole              2048
         2: [67584..69631]:  67680..69727      2048   0x0
         3: [69632..262143]: hole             192512
         4: [262144..264191]: 262240..264287    2048   0x1
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • NFSv4.1: Fix bulk recall and destroy of layouts · fd9a8d71
      Authored by Trond Myklebust
      The current code in pnfs_destroy_all_layouts() assumes that removing
      the layout from the server->layouts list is sufficient to make it
      invisible to other processes. This ignores the fact that most
      users access the layout through the nfs_inode->layout...
      There is further breakage due to lack of reference counting of the
      layouts, meaning that the whole thing Oopses at the drop of a hat.
      
      The code in initiate_bulk_draining() is almost correct, and can be
      used as a model for pnfs_destroy_all_layouts(), so move that
      code to pnfs.c, and refactor the code to allow us to choose between
      a single filesystem bulk recall, and a recall of all layouts.
      Also note that initiate_bulk_draining() currently calls iput() while
      holding locks. Fix that too.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@vger.kernel.org
  7. 13 Feb, 2013 2 commits
  8. 12 Feb, 2013 7 commits
  9. 11 Feb, 2013 3 commits
    • fs/9p: Fix atomic_open · b6f4bee0
      Authored by M. Mohan Kumar
      Return EEXIST if the requested file already exists. Without this patch,
      the open call always succeeds even if the file exists and the user
      specified O_CREAT|O_EXCL.
      
      The following test code can be used to verify this patch. Without this
      patch, running it on a 9p mount always prints 'test case failed'.
      
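      #include <errno.h>
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      
      int main(void)
      {
              int fd;
      
              /* first create the file; O_CREAT requires a mode argument */
              fd = open("./file", O_CREAT|O_WRONLY, 0644);
              if (fd < 0) {
                      perror("open");
                      return -1;
              }
              close(fd);
      
              /* Now opening same file with O_CREAT|O_EXCL should fail */
              fd = open("./file", O_CREAT|O_EXCL, 0644);
              if (fd < 0 && errno == EEXIST)
                      printf("test case pass\n");
              else
                      printf("test case failed\n");
              /* only close the descriptor if the second open succeeded */
              if (fd >= 0)
                      close(fd);
              return 0;
      }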
      Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
      Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
    • fs/9p: Don't use O_TRUNC flag in TOPEN and TLOPEN request · 03f0e022
      Authored by Aneesh Kumar K.V
      We do the truncate via a setattr request, hence don't pass the O_TRUNC flag in
      the open request. Without this patch we end up sending a zero-sized write request
      to the server when we try to truncate. Some servers (VirtFS) were not handling that
      properly.
      Reported-by: M. Mohan Kumar <mohan@in.ibm.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
    • locking in fs/9p ->readdir() · 7ffdea7e
      Authored by Al Viro
      	... is really excessive.  First of all, ->readdir() is serialized by
      file->f_path.dentry->d_inode->i_mutex; playing with file->f_path.dentry->d_lock
      is not buying you anything.  Moreover, rdir->mutex is pointless for exactly
      the same reason - you'll never see contention on it.
      
      	While we are at it, there's no point in having rdir->buf a pointer -
      you have it point just past the end of rdir, so it might as well be a flex
      array (and no, it's not a gccism).
      
      	Absolutely untested patch follows:
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
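The flexible array member suggested in the fs/9p readdir commit is a standard C99 feature. A minimal userspace sketch, with a hypothetical `demo_rdir` layout and allocator (not the actual 9p structure):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_rdir {
    int head;
    int tail;
    uint8_t buf[]; /* C99 flexible array member: storage follows the struct */
};

/* One allocation covers the header and buflen bytes of buffer. */
struct demo_rdir *demo_rdir_alloc(size_t buflen)
{
    struct demo_rdir *rdir = malloc(sizeof(*rdir) + buflen);

    if (rdir == NULL)
        return NULL;
    rdir->head = 0;
    rdir->tail = 0;
    return rdir;
}
```

Compared with a separate `buf` pointer aimed just past the end of the structure, this removes one field and one pointer load while keeping the single allocation.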