1. 10 11月, 2010 2 次提交
  2. 09 11月, 2010 2 次提交
    • S
      ceph: fix update of ctime from MDS · d8672d64
      Sage Weil 提交于
      The client can have a newer ctime than the MDS due to AUTH_EXCL and
      XATTR_EXCL caps as well; update the check in ceph_fill_file_time
      appropriately.
      
      This fixes cases where ctime/mtime goes backward under the right sequence
      of local updates (e.g. chmod) and mds replies (e.g. subsequent stat that
      goes to the MDS).
      Signed-off-by: NSage Weil <sage@newdream.net>
      d8672d64
    • S
      ceph: fix version check on racing inode updates · 8bd59e01
      Sage Weil 提交于
      We may get updates on the same inode from multiple MDSs; generally we only
      pay attention if the update is newer than what we already have.  The
      exception is when an MDS sense unstable information, in which case we
      always update.
      
      The old > check got this wrong when our version was odd (e.g. 3) and the
      reply version was even (e.g. 2): the older stale (v2) info would be
      applied.  Fixed and clarified the comment.
      Signed-off-by: NSage Weil <sage@newdream.net>
      8bd59e01
  3. 08 11月, 2010 6 次提交
    • S
      ceph: fix uid/gid on resent mds requests · cb4276cc
      Sage Weil 提交于
      MDS requests can be rebuilt and resent in non-process context, but were
      filling in uid/gid from current_fsuid/gid.  Put that information in the
      request struct on request setup.
      
      This fixes incorrect (and root) uid/gid getting set for requests that
      are forwarded between MDSs, usually due to metadata migrations.
      Signed-off-by: NSage Weil <sage@newdream.net>
      cb4276cc
    • S
      ceph: fix rdcache_gen usage and invalidate · cd045cb4
      Sage Weil 提交于
      We used to use rdcache_gen to indicate whether we "might" have cached
      pages.  Now we just look at the mapping to determine that.  However, some
      old behavior remains from that transition.
      
      First, rdcache_gen == 0 no longer means we have no pages.  That can happen
      at any time (presumably when we carry FILE_CACHE).  We should not reset it
      to zero, and we should not check that it is zero.
      
      That means that the only purpose for rdcache_revoking is to resolve races
      between new issues of FILE_CACHE and an async invalidate.  If they are
      equal, we should invalidate.  On success, we decrement rdcache_revoking,
      so that it is no longer equal to rdcache_gen.  Similarly, if we success
      in doing a sync invalidate, set revoking = gen - 1.  (This is a small
      optimization to avoid doing unnecessary invalidate work and does not
      affect correctness.)
      Signed-off-by: NSage Weil <sage@newdream.net>
      cd045cb4
    • S
      ceph: re-request max_size if cap auth changes · feb4cc9b
      Sage Weil 提交于
      If the auth cap migrates to another MDS, clear requested_max_size so that
      we resend any pending max_size increase requests.  This fixes potential
      hangs on writes that extend a file and race with an cap migration between
      MDSs.
      Signed-off-by: NSage Weil <sage@newdream.net>
      feb4cc9b
    • S
      ceph: only let auth caps update max_size · 912a9b03
      Sage Weil 提交于
      Only the auth MDS has a meaningful max_size value for us, so only update it
      in fill_inode if we're being issued an auth cap.  Otherwise, a random
      stat result from a non-auth MDS can clobber a meaningful max_size, get
      the client<->mds cap state out of sync, and make writes hang.
      
      Specifically, even if the client re-requests a larger max_size (which it
      will), the MDS won't respond because as far as it knows we already have a
      sufficiently large value.
      Signed-off-by: NSage Weil <sage@newdream.net>
      912a9b03
    • S
      ceph: fix open for write on clustered mds · 7421ab80
      Sage Weil 提交于
      Normally when we open a file we already have a cap, and simply update the
      wanted set.  However, if we open a file for write, but don't have an auth
      cap, that doesn't work; we need to open a new cap with the auth MDS.  Only
      reuse existing caps if we are opening for read or the existing cap is auth.
      Signed-off-by: NSage Weil <sage@newdream.net>
      7421ab80
    • S
      ceph: fix bad pointer dereference in ceph_fill_trace · d8b16b3d
      Sage Weil 提交于
      We dereference *in a few lines down, but only set it on rename.  It is
      apparently pretty rare for this to trigger, but I have been hitting it
      with a clustered MDSs.
      Signed-off-by: NSage Weil <sage@newdream.net>
      d8b16b3d
  4. 28 10月, 2010 1 次提交
    • S
      Revert "ceph: update issue_seq on cap grant" · 2f56f56a
      Sage Weil 提交于
      This reverts commit d91f2438.
      
      The intent of issue_seq is to distinguish between mds->client messages that
      (re)create the cap and those that do not, which means we should _only_ be
      updating that value in the create paths.  By updating it in handle_cap_grant,
      we reset it to zero, which then breaks release.
      
      The larger question is what workload/problem made me think it should be
      updated here...
      Signed-off-by: NSage Weil <sage@newdream.net>
      2f56f56a
  5. 21 10月, 2010 14 次提交
  6. 15 10月, 2010 2 次提交
    • L
      Un-inline the core-dump helper functions · 3aa0ce82
      Linus Torvalds 提交于
      Tony Luck reports that the addition of the access_ok() check in commit
      0eead9ab ("Don't dump task struct in a.out core-dumps") broke the
      ia64 compile due to missing the necessary header file includes.
      
      Rather than add yet another include (<asm/unistd.h>) to make everything
      happy, just uninline the silly core dump helper functions and move the
      bodies to fs/exec.c where they make a lot more sense.
      
      dump_seek() in particular was too big to be an inline function anyway,
      and none of them are in any way performance-critical.  And we really
      don't need to mess up our include file headers more than they already
      are.
      Reported-and-tested-by: NTony Luck <tony.luck@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3aa0ce82
    • L
      Don't dump task struct in a.out core-dumps · 0eead9ab
      Linus Torvalds 提交于
      akiphie points out that a.out core-dumps have that odd task struct
      dumping that was never used and was never really a good idea (it goes
      back into the mists of history, probably the original core-dumping
      code).  Just remove it.
      
      Also do the access_ok() check on dump_write().  It probably doesn't
      matter (since normal filesystems all seem to do it anyway), but he
      points out that it's normally done by the VFS layer, so ...
      
      [ I suspect that we should possibly do "vfs_write()" instead of
        calling ->write directly.  That also does the whole fsnotify and write
        statistics thing, which may or may not be a good idea. ]
      
      And just to be anal, do this all for the x86-64 32-bit a.out emulation
      code too, even though it's not enabled (and won't currently even
      compile)
      Reported-by: Nakiphie <akiphie@lavabit.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0eead9ab
  7. 14 10月, 2010 1 次提交
  8. 12 10月, 2010 1 次提交
    • E
      fanotify: disable fanotify syscalls · 7c534773
      Eric Paris 提交于
      This patch disables the fanotify syscalls by just not building them and
      letting the cond_syscall() statements in kernel/sys_ni.c redirect them
      to sys_ni_syscall().
      
      It was pointed out by Tvrtko Ursulin that the fanotify interface did not
      include an explicit prioritization between groups.  This is necessary
      for fanotify to be usable for hierarchical storage management software,
      as they must get first access to the file, before inotify-like notifiers
      see the file.
      
      This feature can be added in an ABI compatible way in the next release
      (by using a number of bits in the flags field to carry the info) but it
      was suggested by Alan that maybe we should just hold off and do it in
      the next cycle, likely with an (new) explicit argument to the syscall.
      I don't like this approach best as I know people are already starting to
      use the current interface, but Alan is all wise and noone on list backed
      me up with just using what we have.  I feel this is needlessly ripping
      the rug out from under people at the last minute, but if others think it
      needs to be a new argument it might be the best way forward.
      
      Three choices:
      Go with what we got (and implement the new feature next cycle).  Add a
      new field right now (and implement the new feature next cycle).  Wait
      till next cycle to release the ABI (and implement the new feature next
      cycle).  This is number 3.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7c534773
  9. 08 10月, 2010 1 次提交
    • B
      exofs: Fix double page_unlock BUG in write_begin/end · f17b1f9f
      Boaz Harrosh 提交于
      This BUG is there since the first submit of the code, but only triggered
      in last Kernel. It's timing related do to the asynchronous object-creation
      behaviour of exofs. (Which should be investigated farther)
      
      The bug is obvious hence the fixed.
      
      Signed-off-by: Boaz Harrosh <Boaz Harrosh bharrosh@panasas.com>
      f17b1f9f
  10. 07 10月, 2010 7 次提交
  11. 04 10月, 2010 2 次提交
    • C
      writeback: always use sb->s_bdi for writeback purposes · aaead25b
      Christoph Hellwig 提交于
      We currently use struct backing_dev_info for various different purposes.
      Originally it was introduced to describe a backing device which includes
      an unplug and congestion function and various bits of readahead information
      and VM-relevant flags.  We're also using for tracking dirty inodes for
      writeback.
      
      To make writeback properly find all inodes we need to only access the
      per-filesystem backing_device pointed to by the superblock in ->s_bdi
      inside the writeback code, and not the instances pointeded to by
      inode->i_mapping->backing_dev which can be overriden by special devices
      or might not be set at all by some filesystems.
      
      Long term we should split out the writeback-relevant bits of struct
      backing_device_info (which includes more than the current bdi_writeback)
      and only point to it from the superblock while leaving the traditional
      backing device as a separate structure that can be overriden by devices.
      
      The one exception for now is the block device filesystem which really
      wants different writeback contexts for it's different (internal) inodes
      to handle the writeout more efficiently.  For now we do this with
      a hack in fs-writeback.c because we're so late in the cycle, but in
      the future I plan to replace this with a superblock method that allows
      for multiple writeback contexts per filesystem.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      aaead25b
    • G
      fuse: Initialize total_len in fuse_retrieve() · 0157443c
      Geert Uytterhoeven 提交于
      fs/fuse/dev.c:1357: warning: ‘total_len’ may be used uninitialized in this
      function
      
      Initialize total_len to zero, else its value will be undefined.
      Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      0157443c
  12. 02 10月, 2010 1 次提交
    • F
      reiserfs: fix unwanted reiserfs lock recursion · 9d8117e7
      Frederic Weisbecker 提交于
      Prevent from recursively locking the reiserfs lock in reiserfs_unpack()
      because we may call journal_begin() that requires the lock to be taken
      only once, otherwise it won't be able to release the lock while taking
      other mutexes, ending up in inverted dependencies between the journal
      mutex and the reiserfs lock for example.
      
      This fixes:
      
        =======================================================
        [ INFO: possible circular locking dependency detected ]
        2.6.35.4.4a #3
        -------------------------------------------------------
        lilo/1620 is trying to acquire lock:
         (&journal->j_mutex){+.+...}, at: [<d0325bff>] do_journal_begin_r+0x7f/0x340 [reiserfs]
      
        but task is already holding lock:
         (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a278>] reiserfs_write_lock+0x28/0x40 [reiserfs]
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
               [<c10562b7>] lock_acquire+0x67/0x80
               [<c12facad>] __mutex_lock_common+0x4d/0x410
               [<c12fb0c8>] mutex_lock_nested+0x18/0x20
               [<d032a278>] reiserfs_write_lock+0x28/0x40 [reiserfs]
               [<d0325c06>] do_journal_begin_r+0x86/0x340 [reiserfs]
               [<d0325f77>] journal_begin+0x77/0x140 [reiserfs]
               [<d0315be4>] reiserfs_remount+0x224/0x530 [reiserfs]
               [<c10b6a20>] do_remount_sb+0x60/0x110
               [<c10cee25>] do_mount+0x625/0x790
               [<c10cf014>] sys_mount+0x84/0xb0
               [<c12fca3d>] syscall_call+0x7/0xb
      
        -> #0 (&journal->j_mutex){+.+...}:
               [<c10560f6>] __lock_acquire+0x1026/0x1180
               [<c10562b7>] lock_acquire+0x67/0x80
               [<c12facad>] __mutex_lock_common+0x4d/0x410
               [<c12fb0c8>] mutex_lock_nested+0x18/0x20
               [<d0325bff>] do_journal_begin_r+0x7f/0x340 [reiserfs]
               [<d0325f77>] journal_begin+0x77/0x140 [reiserfs]
               [<d0326271>] reiserfs_persistent_transaction+0x41/0x90 [reiserfs]
               [<d030d06c>] reiserfs_get_block+0x22c/0x1530 [reiserfs]
               [<c10db9db>] __block_prepare_write+0x1bb/0x3a0
               [<c10dbbe6>] block_prepare_write+0x26/0x40
               [<d030b738>] reiserfs_prepare_write+0x88/0x170 [reiserfs]
               [<d03294d6>] reiserfs_unpack+0xe6/0x120 [reiserfs]
               [<d0329782>] reiserfs_ioctl+0x272/0x320 [reiserfs]
               [<c10c3188>] vfs_ioctl+0x28/0xa0
               [<c10c3bbd>] do_vfs_ioctl+0x32d/0x5c0
               [<c10c3eb3>] sys_ioctl+0x63/0x70
               [<c12fca3d>] syscall_call+0x7/0xb
      
        other info that might help us debug this:
      
        2 locks held by lilo/1620:
         #0:  (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [<d032945a>] reiserfs_unpack+0x6a/0x120 [reiserfs]
         #1:  (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a278>] reiserfs_write_lock+0x28/0x40 [reiserfs]
      
        stack backtrace:
        Pid: 1620, comm: lilo Not tainted 2.6.35.4.4a #3
        Call Trace:
         [<c10560f6>] __lock_acquire+0x1026/0x1180
         [<c10562b7>] lock_acquire+0x67/0x80
         [<c12facad>] __mutex_lock_common+0x4d/0x410
         [<c12fb0c8>] mutex_lock_nested+0x18/0x20
         [<d0325bff>] do_journal_begin_r+0x7f/0x340 [reiserfs]
         [<d0325f77>] journal_begin+0x77/0x140 [reiserfs]
         [<d0326271>] reiserfs_persistent_transaction+0x41/0x90 [reiserfs]
         [<d030d06c>] reiserfs_get_block+0x22c/0x1530 [reiserfs]
         [<c10db9db>] __block_prepare_write+0x1bb/0x3a0
         [<c10dbbe6>] block_prepare_write+0x26/0x40
         [<d030b738>] reiserfs_prepare_write+0x88/0x170 [reiserfs]
         [<d03294d6>] reiserfs_unpack+0xe6/0x120 [reiserfs]
         [<d0329782>] reiserfs_ioctl+0x272/0x320 [reiserfs]
         [<c10c3188>] vfs_ioctl+0x28/0xa0
         [<c10c3bbd>] do_vfs_ioctl+0x32d/0x5c0
         [<c10c3eb3>] sys_ioctl+0x63/0x70
         [<c12fca3d>] syscall_call+0x7/0xb
      Reported-by: NJarek Poplawski <jarkao2@gmail.com>
      Tested-by: NJarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: All since 2.6.32 <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9d8117e7