1. 15 Jul, 2008 15 commits
    • configfs: Allow ->make_item() and ->make_group() to return detailed errors. · 11c3b792
      Joel Becker committed
      The configfs operations ->make_item() and ->make_group() currently
      return a new item/group.  A return of NULL signifies an error.  Because
      of this, -ENOMEM is the only return code bubbled up the stack.
      
      Multiple folks have requested the ability to return specific error codes
      when these operations fail.  This patch adds that ability by changing the
      ->make_item/group() ops to return an int.
      
      Also updated are the in-kernel users of configfs.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
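The change described in this commit can be illustrated with a small userspace sketch. It assumes a hypothetical `struct item` and hypothetical `make_item_old()` / `make_item_new()` helpers; these are not the configfs signatures, only one plausible shape of "return an int, hand the object back through an out-parameter".

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a config item; not the kernel structure. */
struct item {
    char name[32];
};

/* Old style: NULL is the only failure signal, so the caller can only
 * guess at the cause (and traditionally reports -ENOMEM). */
struct item *make_item_old(const char *name)
{
    struct item *it;

    if (strlen(name) >= sizeof(it->name))
        return NULL;    /* really "name too long", but the caller cannot tell */
    it = calloc(1, sizeof(*it));
    if (!it)
        return NULL;
    strcpy(it->name, name);
    return it;
}

/* New style: return 0 or a negative errno, and pass the item via *out. */
int make_item_new(const char *name, struct item **out)
{
    struct item *it;

    assert(out != NULL);
    *out = NULL;
    if (strlen(name) >= sizeof(it->name))
        return -ENAMETOOLONG;
    it = calloc(1, sizeof(*it));
    if (!it)
        return -ENOMEM;
    strcpy(it->name, name);
    *out = it;
    return 0;
}
```

With the second form, a caller can propagate `-ENAMETOOLONG` (or any other code) unchanged instead of collapsing every failure into `-ENOMEM`.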
    • configfs: Fix failing mkdir() making racing rmdir() fail · 6d8344ba
      Louis Rilling committed
      When fixing the rename() vs rmdir() deadlock, we stopped locking default groups'
      inodes in configfs_detach_prep(), letting racing mkdir() in default groups
      proceed concurrently. This allows races like the one below, which lead to a
      failing mkdir() making rmdir() fail, despite the group to remove having no
      user-created directory under it in the end.
      
      	process A: 			process B:
      	/* PWD=A/B */
      	mkdir("C")
      	  make_item("C")
      	  attach_group("C")
      					rmdir("A")
      					  detach_prep("A")
      					    detach_prep("B")
      					      error because of "C"
      					  return -ENOTEMPTY
      	    attach_group("C/D")
      	      error (eg -ENOMEM)
      	  return -ENOMEM
      
      This patch prevents such scenarios by making rmdir() wait as long as
      detach_prep() fails because a racing mkdir() is in the middle of attach_group().
      To achieve this, mkdir() sets a flag CONFIGFS_USET_IN_MKDIR in parent's
      configfs_dirent before calling attach_group(), and clears the flag once
      attach_group() is done. detach_prep() fails with -EAGAIN whenever the flag is
      hit and returns the guilty inode's mutex so that rmdir() can wait on it.
      Signed-off-by: Louis Rilling <Louis.Rilling@kerlabs.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
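The flag protocol in this commit message can be sketched in single-threaded userspace C. Everything here is an assumption made for illustration: the `struct node` tree, the `USET_*` constants, and a `detach_prep()` that reports the busy node through a pointer rather than returning an inode mutex. It only models the decision logic, not the locking or the waiting.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flags modelled on the ones named in the commit message. */
#define USET_DEFAULT  0x1u  /* default group, not created by the user */
#define USET_IN_MKDIR 0x2u  /* a mkdir() is in the middle of attach_group() */

/* Hypothetical directory node: first child plus next sibling. */
struct node {
    unsigned flags;
    struct node *child;
    struct node *sibling;
};

/* Returns 0 if the subtree can be removed, -ENOTEMPTY if it holds a
 * user-created directory, or -EAGAIN (with *busy set) if a mkdir() is
 * still in flight somewhere below and the caller should wait and retry. */
int detach_prep(struct node *dir, struct node **busy)
{
    struct node *c;
    int ret;

    assert(dir != NULL && busy != NULL);
    if (dir->flags & USET_IN_MKDIR) {
        *busy = dir;
        return -EAGAIN;
    }
    for (c = dir->child; c; c = c->sibling) {
        if (!(c->flags & USET_DEFAULT))
            return -ENOTEMPTY;
        ret = detach_prep(c, busy);
        if (ret)
            return ret;
    }
    return 0;
}
```

In the race from the diagram, rmdir("A") would see `-EAGAIN` while mkdir("C") is still running in B, wait on the busy node, and retry; if the mkdir() then fails and cleans up, the retry succeeds instead of reporting a spurious `-ENOTEMPTY`.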
    • configfs: Fix deadlock with racing rmdir() and rename() · b3e76af8
      Louis Rilling committed
      This patch fixes the deadlock between racing sys_rename() and configfs_rmdir().
      
      The idea is to avoid locking i_mutexes of default groups in
      configfs_detach_prep(), and rely instead on the new configfs_dirent_lock to
      protect against configfs_dirent's linkage mutations. To ensure that an mkdir()
      racing with rmdir() will not create new items in a to-be-removed default group,
      we make configfs_new_dirent() check for the CONFIGFS_USET_DROPPING flag right
      before linking the new dirent, and return error if the flag is set. This makes
      racing mkdir()/symlink()/dir_open() fail in places where errors could already
      happen, resp. in (attach_item()|attach_group())/create_link()/new_dirent().
      
      configfs_depend() remains safe since it locks all the path from configfs root,
      and is thus mutually exclusive with rmdir().
      
      An advantage of this is that now detach_groups() unconditionally takes the
      default groups i_mutex, which makes it more consistent with populate_groups().
      Signed-off-by: Louis Rilling <Louis.Rilling@kerlabs.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    • configfs: Make configfs_new_dirent() return error code instead of NULL · 107ed40b
      Louis Rilling committed
      This patch makes configfs_new_dirent() return a negative error code instead of NULL,
      which will be useful in the next patch to differentiate ENOMEM from ENOENT.
      Signed-off-by: Louis Rilling <Louis.Rilling@kerlabs.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
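Returning an error code through a pointer return value is commonly done by encoding small negative errno values in the top of the address range. The sketch below is a userspace imitation of that idiom, assuming hypothetical helpers `err_ptr()`, `ptr_err()`, `is_err()` and a hypothetical `new_dirent()`; mapping the "dropping" case to `-ENOENT` is also an assumption drawn from the wording of the message.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Encode a negative errno in a pointer, and decode it again. */
static inline void *err_ptr(long error) { return (void *)(intptr_t)error; }
static inline long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int is_err(const void *ptr)
{
    /* The last MAX_ERRNO addresses are reserved for error codes. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical directory entry with a "being removed" marker. */
struct cfg_dirent {
    int dropping;
};

/* Returns a new entry, or an encoded error that tells the caller why
 * creation failed. simulate_oom exists only to exercise the -ENOMEM path. */
struct cfg_dirent *new_dirent(const struct cfg_dirent *parent, int simulate_oom)
{
    struct cfg_dirent *sd;

    assert(parent != NULL);
    if (simulate_oom)
        return err_ptr(-ENOMEM);
    if (parent->dropping)
        return err_ptr(-ENOENT);    /* parent is going away */
    sd = calloc(1, sizeof(*sd));
    if (!sd)
        return err_ptr(-ENOMEM);
    return sd;
}
```

A NULL return could only say "failed"; the encoded pointer lets the caller distinguish an allocation failure from a parent that is being removed.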
    • configfs: Protect configfs_dirent s_links list mutations · 5301a77d
      Louis Rilling committed
      Symlinks to a config_item are listed under its configfs_dirent s_links, but the
      list mutations are not protected by any common lock.
      
      This patch uses the configfs_dirent_lock spinlock to add the necessary
      protection.
      
      Note: we should also protect the list_empty() test in configfs_detach_prep() but
      1/ the lock should not be released immediately because nothing would prevent the
      list from being filled after a successful list_empty() test, making the problem
      tricky,
      2/ this will be solved by the rmdir() vs rename() deadlock bugfix.
      Signed-off-by: Louis Rilling <Louis.Rilling@kerlabs.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    • configfs: Introduce configfs_dirent_lock · 6f610764
      Louis Rilling committed
      This patch introduces configfs_dirent_lock spinlock to protect configfs_dirent
      traversals against linkage mutations (add/del/move). This will allow
      configfs_detach_prep() to avoid locking i_mutexes.
      
      Locking rules for configfs_dirent linkage mutations are the same plus the
      requirement of taking configfs_dirent_lock. For configfs_dirent walking, one can
      either take appropriate i_mutex as before, or take configfs_dirent_lock.
      
      The spinlock could actually be a mutex, but the critical sections are either
      O(1) or should not be too long (default groups walking in last patch).
      
      ChangeLog:
        - Clarify the comment on configfs_dirent_lock usage
        - Move sd->s_element init before linking the new dirent
        - In lseek(), do not release configfs_dirent_lock before the dirent is
          relinked.
      Signed-off-by: Louis Rilling <Louis.Rilling@kerlabs.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    • ocfs2: Don't snprintf() without a format. · fe9f3877
      Joel Becker committed
      Some system files are per-slot.  Their names include the slot number.
      ocfs2_sprintf_system_inode_name() uses the system inode definitions to
      fill in the slot number with snprintf().
      
      For global system files, there is no node number, and the name was
      printed as a format with no arguments.  -Wformat-nonliteral and
      -Wformat-security don't like this.  Instead, use a static "%s" format
      and the name as the argument.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
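The "%s" fix can be shown with a minimal sketch. The helper `format_system_inode_name()` and its parameters are hypothetical and simplified: per-slot names here use a fixed literal format, whereas the message says the real code takes the format from the system inode definitions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: per-slot names get a zero-padded slot suffix,
 * global names are copied as plain data. */
int format_system_inode_name(char *buf, size_t len, const char *name,
                             int per_slot, unsigned slot)
{
    assert(buf != NULL && name != NULL && len > 0);
    if (per_slot)
        return snprintf(buf, len, "%s:%04u", name, slot);
    /* Never pass name as the format itself: a stray '%' in it would be
     * interpreted as a conversion with no matching argument. */
    return snprintf(buf, len, "%s", name);
}
```

Because the name is always an argument and never the format, a name containing `%` is copied verbatim, and `-Wformat-security` has nothing to complain about.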
    • ocfs2: Fix CONFIG_OCFS2_DEBUG_FS #ifdefs · e407e397
      Joel Becker committed
      A couple places use OCFS2_DEBUG_FS where they really mean
      CONFIG_OCFS2_DEBUG_FS.
      Reported-by: Robert P. J. Day <rpjday@crashcourse.ca>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    • ocfs2/net: Silence build warnings on sparc64 · 461c6a30
      Sunil Mushran committed
      suseconds_t is type long on most arches except sparc64 where it is type int.
      This patch silences the following warnings that are generated when building
      on it.
      
      netdebug.c: In function 'nst_seq_show':
      netdebug.c:152: warning: format '%lu' expects type 'long unsigned int', but argument 13 has type 'suseconds_t'
      netdebug.c:152: warning: format '%lu' expects type 'long unsigned int', but argument 15 has type 'suseconds_t'
      netdebug.c:152: warning: format '%lu' expects type 'long unsigned int', but argument 17 has type 'suseconds_t'
      netdebug.c: In function 'sc_seq_show':
      netdebug.c:332: warning: format '%lu' expects type 'long unsigned int', but argument 19 has type 'suseconds_t'
      netdebug.c:332: warning: format '%lu' expects type 'long unsigned int', but argument 21 has type 'suseconds_t'
      netdebug.c:332: warning: format '%lu' expects type 'long unsigned int', but argument 23 has type 'suseconds_t'
      netdebug.c:332: warning: format '%lu' expects type 'long unsigned int', but argument 25 has type 'suseconds_t'
      netdebug.c:332: warning: format '%lu' expects type 'long unsigned int', but argument 27 has type 'suseconds_t'
      netdebug.c:332: warning: format '%lu' expects type 'long unsigned int', but argument 29 has type 'suseconds_t'
      Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
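A common way to silence this class of warning is to cast the value to a type whose width is fixed by the format, so the code no longer depends on how each architecture defines `suseconds_t`. The sketch below assumes a hypothetical helper `format_timeval()` and is not the ocfs2 code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical helper: print a timeval as seconds.microseconds.
 * Both fields are cast to long so that "%ld" matches regardless of
 * whether suseconds_t is int or long on the target. */
int format_timeval(char *buf, size_t len, const struct timeval *tv)
{
    assert(buf != NULL && tv != NULL);
    return snprintf(buf, len, "%ld.%06ld",
                    (long)tv->tv_sec, (long)tv->tv_usec);
}
```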
    • ocfs2: Handle error during journal load · 01af4820
      Wengang Wang committed
      This patch ensures the mount fails if the fs is unable to load the journal.
      Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
      Acked-by: Sunil Mushran <sunil.mushran@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Silence an error message in ocfs2_file_aio_read() · 56753bd3
      Sunil Mushran committed
      This patch silences an EINVAL error message in ocfs2_file_aio_read()
      that is always due to a user error.
      Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • 7600c72b
    • ocfs2: fix printk format warnings with OCFS2_FS_STATS=n · dd25e55e
      Randy Dunlap committed
      Fix printk format warnings when OCFS2_FS_STATS=n:
      
      linux-next-20080528/fs/ocfs2/dlmglue.c: In function 'ocfs2_dlm_seq_show':
      linux-next-20080528/fs/ocfs2/dlmglue.c:2623: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'int'
      linux-next-20080528/fs/ocfs2/dlmglue.c:2623: warning: format '%llu' expects type 'long long unsigned int', but argument 4 has type 'int'
      linux-next-20080528/fs/ocfs2/dlmglue.c:2623: warning: format '%llu' expects type 'long long unsigned int', but argument 7 has type 'int'
      linux-next-20080528/fs/ocfs2/dlmglue.c:2623: warning: format '%llu' expects type 'long long unsigned int', but argument 8 has type 'int'
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • [PATCH 2/2] ocfs2: Instrument fs cluster locks · 8ddb7b00
      Sunil Mushran committed
      This patch adds code to track the number of times the fs takes
      various cluster locks as well as the times associated with it.
      The information is made available to users via debugfs.
      
      This patch was originally written by Jan Kara <jack@suse.cz>.
      Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • [PATCH 1/2] ocfs2: Add CONFIG_OCFS2_FS_STATS config option · ce7231e9
      Sunil Mushran committed
      This patch adds config option CONFIG_OCFS2_FS_STATS to allow building
      the fs with instrumentation enabled. An upcoming patch will provide
      support to instrument cluster locking, which is a crucial overhead in
      a cluster file system. This config option allows users to avoid the cpu
      and memory overhead that is involved in gathering such statistics.
      Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
  2. 13 Jul, 2008 2 commits
  3. 12 Jul, 2008 1 commit
    • Fix reference counting race on log buffers · 49641f1a
      Dave Chinner committed
      When we release the iclog, we do an atomic_dec_and_lock to determine if
      we are the last reference and need to trigger update of log headers and
      writeout.  However, in xlog_state_get_iclog_space() we also need to
      check if we have the last reference count there.  If we do, we release
      the log buffer, otherwise we decrement the reference count.
      
      But the compare and decrement in xlog_state_get_iclog_space() is not
      atomic, so both places can see a reference count of 2 and neither will
      release the iclog.  That leads to a filesystem hang.
      
      Close the race by replacing the atomic_read() and atomic_dec() pair with
      atomic_add_unless() to ensure that they are executed atomically.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Tim Shimmin <tes@sgi.com>
      Tested-by: Eric Sandeen <sandeen@sandeen.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 11 Jul, 2008 2 commits
    • exec: fix stack executability without PT_GNU_STACK · 96a8e13e
      Hugh Dickins committed
      Kernel Bugzilla #11063 points out that on some architectures (e.g. x86_32)
      exec'ing an ELF without a PT_GNU_STACK program header should default to an
      executable stack; but this got broken by the unlimited argv feature because
      stack vma is now created before the right personality has been established:
      so breaking old binaries using nested function trampolines.
      
      Therefore re-evaluate VM_STACK_FLAGS in setup_arg_pages, where stack
      vm_flags used to be set, before the mprotect_fixup.  Checking through
      our existing VM_flags, none would have changed since insert_vm_struct:
      so this seems safer than finding a way through the personality labyrinth.
      
      Reported-by: pageexec@freemail.hu
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: Fix flags in ocfs2_file_lock · e988cf1c
      Mark Fasheh committed
      The stack-glue merge changed the way we use flags in dlmglue in that we now
      use the fs/dlm equivalents. Unfortunately, a merge error left the new flock
      code only partially updated. This took a while to show up though, because
      the lock level constants are actually identical between o2dlm and fs/dlm.
      The *_CONVERT and *_NOQUEUE flags have different values though, which is
      eventually causing a crash in flags_to_o2dlm().
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
  5. 09 Jul, 2008 2 commits
    • reiserfs: discard prealloc in reiserfs_delete_inode · eb35c218
      Jeff Mahoney committed
      With the removal of struct file from the xattr code,
      reiserfs_file_release() isn't used anymore, so the prealloc isn't
      discarded.  This causes hangs later down the line.
      
      This patch adds it to reiserfs_delete_inode.  In most cases it will be a
      no-op due to it already having been called, but will avoid hangs with
      xattrs.
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • NFS: Fix readdir cache invalidation · 2aac05a9
      Trond Myklebust committed
      invalidate_inode_pages2_range() takes page offset arguments, not byte
      ranges.
      
      Another thought is that individual pages might perhaps get evicted by VM
      pressure, in which case we might perhaps want to re-read not only the
      evicted page, but all subsequent pages too (in case the server returns
      more/less data per page so that the alignment of the next entry
      changes). We should therefore remove the condition that we only do this on
      page->index==0.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
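The difference between byte ranges and page offsets is a shift by the page size. The sketch below assumes a 4 KiB page and a hypothetical helper `bytes_to_pages()`; it only shows the unit conversion a caller has to perform before handing a range to a page-indexed interface.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/* Inclusive range of page indexes. */
struct page_range {
    uint64_t start;
    uint64_t end;
};

/* Convert an inclusive byte range into the inclusive range of page
 * indexes that covers it. */
struct page_range bytes_to_pages(uint64_t first_byte, uint64_t last_byte)
{
    struct page_range r;

    assert(first_byte <= last_byte);
    r.start = first_byte >> SKETCH_PAGE_SHIFT;
    r.end = last_byte >> SKETCH_PAGE_SHIFT;
    return r;
}
```

Passing the raw byte values 4096 and 12287 where page indexes are expected would address pages thousands of entries away from the intended pages 1 and 2.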
  6. 08 Jul, 2008 1 commit
  7. 06 Jul, 2008 2 commits
  8. 05 Jul, 2008 7 commits
  9. 03 Jul, 2008 1 commit
    • 9p: fix O_APPEND in legacy mode · 2e4bef41
      Eric Van Hensbergen committed
      The legacy protocol's open operation doesn't handle an append operation
      (it is expected that the client take care of it).  We were incorrectly
      passing the extended protocol's flag through even in legacy mode.  This
      was reported in bugzilla report #10689.  This patch fixes the problem
      by disallowing extended protocol open modes from being passed in legacy
      mode and implementing append functionality on the client side by adding
      a seek after the open.
      Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
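The "seek after the open" approach can be sketched with POSIX calls. The function `open_emulating_append()` is hypothetical: it strips O_APPEND before opening, as a client would when the underlying open operation cannot express append, and then positions the file offset at the end.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open path without O_APPEND, then seek to the end
 * of the file if the caller asked for append. Returns a file descriptor,
 * or -1 with errno set by the failing call. */
int open_emulating_append(const char *path, int flags, mode_t mode)
{
    int want_append = flags & O_APPEND;
    int fd;

    assert(path != NULL);
    fd = open(path, flags & ~O_APPEND, mode);
    if (fd >= 0 && want_append && lseek(fd, 0, SEEK_END) == (off_t)-1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A single seek at open time is weaker than real O_APPEND semantics: if another writer extends the file afterwards, later writes through this descriptor do not automatically move to the new end.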
  10. 01 Jul, 2008 1 commit
    • Properly notify block layer of sync writes · 18ce3751
      Jens Axboe committed
      fsync_buffers_list() and sync_dirty_buffer() both issue async writes and
      then immediately wait on them. Conceptually, that makes them sync writes
      and we should treat them as such so that the IO schedulers can handle
      them appropriately.
      
      This patch fixes a write starvation issue that Lin Ming reported, where
      xx is stuck for more than 2 minutes because of a large number of
      synchronous IO in the system:
      
      INFO: task kjournald:20558 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
      message.
      kjournald     D ffff810010820978  6712 20558      2
      ffff81022ddb1d10 0000000000000046 ffff81022e7baa10 ffffffff803ba6f2
      ffff81022ecd0000 ffff8101e6dc9160 ffff81022ecd0348 000000008048b6cb
      0000000000000086 ffff81022c4e8d30 0000000000000000 ffffffff80247537
      Call Trace:
      [<ffffffff803ba6f2>] kobject_get+0x12/0x17
      [<ffffffff80247537>] getnstimeofday+0x2f/0x83
      [<ffffffff8029c1ac>] sync_buffer+0x0/0x3f
      [<ffffffff8066d195>] io_schedule+0x5d/0x9f
      [<ffffffff8029c1e7>] sync_buffer+0x3b/0x3f
      [<ffffffff8066d3f0>] __wait_on_bit+0x40/0x6f
      [<ffffffff8029c1ac>] sync_buffer+0x0/0x3f
      [<ffffffff8066d48b>] out_of_line_wait_on_bit+0x6c/0x78
      [<ffffffff80243909>] wake_bit_function+0x0/0x23
      [<ffffffff8029e3ad>] sync_dirty_buffer+0x98/0xcb
      [<ffffffff8030056b>] journal_commit_transaction+0x97d/0xcb6
      [<ffffffff8023a676>] lock_timer_base+0x26/0x4b
      [<ffffffff8030300a>] kjournald+0xc1/0x1fb
      [<ffffffff802438db>] autoremove_wake_function+0x0/0x2e
      [<ffffffff80302f49>] kjournald+0x0/0x1fb
      [<ffffffff802437bb>] kthread+0x47/0x74
      [<ffffffff8022de51>] schedule_tail+0x28/0x5d
      [<ffffffff8020cac8>] child_rip+0xa/0x12
      [<ffffffff80243774>] kthread+0x0/0x74
      [<ffffffff8020cabe>] child_rip+0x0/0x12
      
      Lin Ming confirms that this patch fixes the issue. I've run tests with
      it for the past week and no ill effects have been observed, so I'm
      proposing it for inclusion into 2.6.26.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  11. 25 Jun, 2008 1 commit
    • [GFS2] fix gfs2 block allocation (cleaned up) · 5af4e7a0
      Benjamin Marzinski committed
      This patch fixes bz 450641.
      
      This patch changes the computation for zero_metapath_length(), which it
      renames to metapath_branch_start(). When you are extending the metadata
      tree, the indirect blocks that point to the new data block must
      diverge from the existing tree either at the inode or at the first
      indirect block. They can diverge at the first indirect block because the
      inode has room for 483 pointers while the indirect blocks have room for
      509 pointers, so when the tree is grown, there is some free space in the
      first indirect block. What metapath_branch_start() now computes is the
      height where the first indirect block for the new data block is located.
      It can either be 1 (if the indirect block diverges from the inode) or 2
      (if it diverges from the first indirect block).
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
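The 483 and 509 figures follow from simple arithmetic: a block holds as many pointers as fit after its header. The sketch below assumes a 4096-byte block, 8-byte block pointers, a 232-byte inode header and a 24-byte indirect-block header; these sizes are assumptions chosen because they reproduce the numbers in the message, and `ptrs_per_block()` is a hypothetical helper.

```c
#include <assert.h>

/* Assumed sizes for this sketch, in bytes. */
enum {
    SK_BLOCK_SIZE = 4096,  /* filesystem block */
    SK_DINODE_HDR = 232,   /* on-disk inode header */
    SK_META_HDR   = 24,    /* indirect block header */
    SK_PTR_SIZE   = 8      /* one block pointer */
};

/* Number of block pointers that fit in a block after its header. */
unsigned ptrs_per_block(unsigned block_size, unsigned header_size)
{
    assert(header_size <= block_size);
    return (block_size - header_size) / SK_PTR_SIZE;
}
```

Under these assumptions the inode block has room for (4096 - 232) / 8 = 483 pointers and an indirect block for (4096 - 24) / 8 = 509, so when the inode's pointers are moved into a new first indirect block, 26 slots in that block remain free. That spare room is why a new branch can diverge at the first indirect block as well as at the inode.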
  12. 24 Jun, 2008 5 commits