1. 17 June 2009, 2 commits
  2. 16 June 2009, 5 commits
  3. 10 June 2009, 2 commits
    • jbd: fix race in buffer processing in commit code · a61d90d7
      Jan Kara authored
      In the commit code, we scan buffers attached to a transaction.  During
      this scan, we sometimes have to drop j_list_lock, and afterwards we
      recheck whether the journal buffer head got freed by
      journal_try_to_free_buffers().  But checking buffer_jbd(bh) isn't
      enough, because a new journal head could have been attached to our
      buffer head in the meantime.  So also check that the journal head
      remained the same and that it is still on the same transaction and
      list (see the sketch below).
      
      This is a nasty bug and can cause problems like memory corruption (use after
      free) or trigger various assertions in JBD code (observed).
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: <stable@kernel.org>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a61d90d7
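
      The following is a minimal sketch of the recheck described above, not
      the literal patch hunk; jh, commit_transaction and the restart label
      stand in for the surrounding commit-loop context, while buffer_jbd(),
      bh2jh() and BJ_Locked are the real jbd identifiers:

        spin_lock(&journal->j_list_lock);
        /*
         * buffer_jbd(bh) alone is not enough: journal_try_to_free_buffers()
         * may have freed the old journal head, and a new one may have been
         * attached while j_list_lock was dropped.  Re-validate the whole
         * chain: same jh, same transaction, same list.
         */
        if (!buffer_jbd(bh) || bh2jh(bh) != jh ||
            jh->b_transaction != commit_transaction ||
            jh->b_jlist != BJ_Locked) {
                /* Buffer was released or moved; restart the scan. */
                spin_unlock(&journal->j_list_lock);
                goto restart;
        }
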
    • autofs4: remove hashed check in validate_wait() · 463aea1a
      Ian Kent authored
      The recent ->lookup() deadlock correction required the directory inode
      mutex to be dropped while waiting for expire completion.  We were
      concerned about side effects from this change and one has been identified.
      
      I saw several error messages.  They leave autofs quite confused and
      don't really point to the actual problem.
      
      Things like:
      
      handle_packet_missing_direct:1376: can't find map entry for (43,1827932)
      
      which is usually totally fatal (although in this case it wouldn't have
      been, except that I treat it as such because it normally is).
      
      do_mount_direct: direct trigger not valid or already mounted
      /test/nested/g3c/s1/ss1
      
      which is recoverable; however, when this problem is at play it can
      confuse autofs about the dependencies in the mount tree, because mount
      triggers end up mounted multiple times.  It's hard to accurately check
      for this over-mounting case, and automount shouldn't need to if the
      kernel module is doing its job.
      
      There was one other message, similar in consequence to this last one,
      but I can't locate a log example just now.
      
      When checking whether a mount has already completed, prior to adding a
      new mount request to the wait queue, we check whether the dentry is
      hashed and, if so, whether it is a mount point.  But if a mount
      successfully completed while we slept on the wait queue mutex, the
      dentry must exist for the mount to have completed, so the test is not
      really needed.
      
      Mounts can also be done on top of a global root dentry, so for the above
      case, where a mount request completes and the wait queue entry has already
      been removed, the hashed test returning false can cause an incorrect
      callback to the daemon.  Also, d_mountpoint() is not sufficient to
      check whether a mount has completed in the multi-mount case, when we
      don't have a real mount at the base of the tree (see the sketch of the
      removed check below).
      Signed-off-by: Ian Kent <raven@themaw.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      463aea1a
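
      For illustration, a sketch of the kind of test the patch removes
      (hypothetical placement; d_unhashed() and d_mountpoint() are the real
      dcache helpers):

        /*
         * Removed: if the mount completed while we slept on the wait queue
         * mutex, the dentry necessarily exists, so this adds nothing -- and
         * for a multi-mount without a real mount at the base of the tree,
         * d_mountpoint() returning false here causes a spurious callback
         * to the daemon.
         */
        if (!d_unhashed(dentry) && d_mountpoint(dentry))
                return 0;       /* treat as already mounted; don't wait */
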
  4. 07 June 2009, 1 commit
  5. 06 June 2009, 2 commits
    • ext3/4 with synchronous writes gets wedged by Postfix · 72a43d63
      Al Viro authored
      OK, that's probably the easiest way to do it, as much as I don't like
      it...  Since iget() et al. will not accept I_FREEING (they wait for it
      to go away and restart), and since we'd better have serialization
      between new/free on fs data structures anyway, we can afford to simply
      skip inodes in I_FREEING et al. in insert_inode_locked() (see the
      sketch below).

      We do that from new_inode, so it won't race with free_inode in any
      interesting way, and it won't race with iget (of any origin: nfsd, or,
      in case of fs corruption, a lookup) since both still wait for I_LOCK.
      Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
      Acked-by: NJan Kara <jack@suse.cz>
      Tested-by: NDavid Watson <dbwatson@ukfsn.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      72a43d63
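
      A sketch of the resulting hash-scan logic in insert_inode_locked()
      (simplified; the iteration follows the 2.6.30-era hlist API):

        struct inode *old;
        struct hlist_node *node;

        hlist_for_each_entry(old, node, head, i_hash) {
                if (old->i_ino != ino)
                        continue;
                if (old->i_sb != sb)
                        continue;
                /* Skip inodes on their way out instead of waiting on them:
                 * new/free are serialized on the fs's own structures, and
                 * any other iget() still waits for I_LOCK. */
                if (old->i_state & (I_FREEING | I_WILL_FREE))
                        continue;
                break;
        }
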
    • Fix nobh_truncate_page() to not pass stack garbage to get_block() · 460bcf57
      Theodore Ts'o authored
      The nobh_truncate_page() function is used by ext2, exofs, and jfs.  Of
      these three, only ext2's and jfs's get_block() functions pay attention
      to bh->b_size --- which is normally always the filesystem blocksize,
      except when get_block() is called by mpage_readpage(),
      mpage_readpages(), or the direct I/O routines in fs/direct-io.c.
      
      Unfortunately, nobh_truncate_page() does not initialize map_bh before
      calling the filesystem-supplied get_block() function.  So ext2 and jfs
      will try to calculate the number of blocks to map by taking stack
      garbage and shifting it left by inode->i_blkbits.  This should be
      *mostly* harmless (except the filesystem will do some unneeded work)
      unless the stack garbage is less than the filesystem's blocksize, in
      which case maxblocks will be zero, and the attempt to find out whether
      or not the filesystem has a hole at a given logical block will fail,
      and the page cache entry might not get zeroed out.
      
      Also, if the stack garbage in map_bh->b_state happens to have the
      BH_Mapped bit set, there could be an attempt to call readpage() on a
      non-existent page, which could cause nobh_truncate_page() to return an
      error when it should not.

      Fix this by initializing map_bh->b_state and map_bh->b_size (see the
      sketch below).
      
      Fortunately, it's probably fairly unlikely that ext2 and jfs users
      mount with nobh these days.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: linux-fsdevel@vger.kernel.org
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      460bcf57
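
      The fix itself is tiny; a sketch of the initialization described above
      (blocksize is computed earlier in nobh_truncate_page()):

        struct buffer_head map_bh;

        /* Without these two assignments, b_state and b_size hold stack
         * garbage: get_block() implementations that honor b_size (ext2,
         * jfs) compute a bogus maxblocks, and a stray BH_Mapped bit can
         * send us down the readpage path for a hole. */
        map_bh.b_size = blocksize;
        map_bh.b_state = 0;

        err = get_block(inode, iblock, &map_bh, 0);
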
  6. 05 June 2009, 1 commit
    • Btrfs: Fix oops and use after free during space balancing · 44fb5511
      Chris Mason authored
      The btrfs allocator uses list_for_each to walk the available block
      groups when searching for free blocks.  It starts off with a hint
      to help find the best block group for a given allocation.
      
      The hint is resolved into a block group, but we don't properly check
      that the block group we find isn't in the middle of being freed due to
      filesystem shrinking or balancing.  If it is being freed, the list
      pointers in it are bogus and can't be trusted, but the code happily
      goes along and uses them in the list_for_each loop, leading to all
      kinds of fun.

      The fix used here is to check that the block group we find really is
      on the list before we use it.  list_del_init is used when removing it
      from the list, so we can do a proper check (see the sketch below).
      
      The allocation clustering code has a similar bug where it will trust
      the block group in the current free space cluster.  If our allocation
      flags have changed (going from single spindle dup to raid1 for example)
      because the drives in the FS have changed, we're not allowed to use
      the old block group any more.
      
      The fix used here is to check the current cluster against the
      current allocation flags.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      44fb5511
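
      The membership test relies on a standard <linux/list.h> idiom:
      list_del_init() re-points the node at itself, so list_empty() on the
      node itself is a valid "am I still on a list?" check.  A generic
      sketch, with hypothetical lock and field names:

        spin_lock(&info->block_group_cache_lock);       /* hypothetical */
        if (block_group && list_empty(&block_group->list)) {
                /* The group is being torn down by shrink/balance; its list
                 * pointers must not be fed to list_for_each.  Fall back to
                 * a full search instead of trusting the hint. */
                block_group = NULL;
        }
        spin_unlock(&info->block_group_cache_lock);
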
  7. 04 June 2009, 1 commit
  8. 02 June 2009, 5 commits
    • xfs: prevent deadlock in xfs_qm_shake() · 1b17d766
      Felix Blyakher authored
      It's possible to recurse into the filesystem from a memory allocation,
      which deadlocks in xfs_qm_shake().  Add a check for __GFP_FS, and bail
      out if it is not set (see the sketch below).
      Signed-off-by: Felix Blyakher <felixb@sgi.com>
      Signed-off-by: Hedi Berriche <hedi@sgi.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Felix Blyakher <felixb@sgi.com>
      1b17d766
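
      A sketch of the guard, using the 2.6.30-era shrinker signature (the
      reclaim body is elided):

        STATIC int
        xfs_qm_shake(int nr_to_scan, gfp_t gfp_mask)
        {
                /* A shrinker invoked from a non-__GFP_FS allocation must
                 * not re-enter the filesystem, or it can deadlock on locks
                 * already held by the allocating task. */
                if (!(gfp_mask & __GFP_FS))
                        return 0;

                /* ... normal dquot reclaim ... */
        }
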
    • xfs: fix overflow in xfs_growfs_data_private · e6da7c9f
      Eric Sandeen authored
      In the case where growing a filesystem would leave the last AG too
      small, the fixup code has an overflow in the calculation of the new
      size with one fewer AG, because "nagcount" is a 32-bit number.  If the
      new filesystem has more than 2^32 blocks, this causes a problem
      resulting in an EINVAL return from growfs:
      
       # xfs_io -f -c "truncate 19998630180864" fsfile
       # mkfs.xfs -f -bsize=4096 -dagsize=76288719b,size=3905982455b fsfile
       # mount -o loop fsfile /mnt
       # xfs_growfs /mnt
      
      meta-data=/dev/loop0             isize=256    agcount=52, agsize=76288719 blks
               =                       sectsz=512   attr=2
      data     =                       bsize=4096   blocks=3905982455, imaxpct=5
               =                       sunit=0      swidth=0 blks
      naming   =version 2              bsize=4096   ascii-ci=0
      log      =internal               bsize=4096   blocks=32768, version=2
               =                       sectsz=512   sunit=0 blks, lazy-count=0
      realtime =none                   extsz=4096   blocks=0, rtextents=0
      xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: Invalid argument
      
      Reported-by: richard.ems@cape-horn-eng.com
      Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Felix Blyakher <felixb@sgi.com>
      Signed-off-by: Felix Blyakher <felixb@sgi.com>
      e6da7c9f
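
      The overflow is a classic 32x32 multiply evaluated in 32 bits.  A
      hedged sketch of the fix pattern (xfs_agnumber_t is 32-bit,
      xfs_rfsblock_t is 64-bit; surrounding logic simplified):

        xfs_agnumber_t  nagcount;       /* 32-bit AG count */
        xfs_rfsblock_t  nb;             /* 64-bit block count */

        /* Last AG would be too small: retry with one fewer AG.  The
         * multiply must be widened to 64 bits *before* it happens, or
         * filesystems beyond 2^32 blocks wrap and growfs sees EINVAL. */
        nagcount--;
        nb = (xfs_rfsblock_t)nagcount * mp->m_sb.sb_agblocks;
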
    • xfs: fix double unlock in xfs_swap_extents() · 1f23920d
      Felix Blyakher authored
      Regression from commit ef8f7fc5, which rearranged the code in
      xfs_swap_extents(), leading to a double unlock of the xfs inode ilock.
      That resulted in xfs_fsr deadlocking itself on platforms which don't
      handle a double unlock of an rw_semaphore nicely.  It caused the count
      to go negative, which represents a write holder, without really having
      one.  ia64 is one of the platforms where the deadlock was easily
      reproduced and the fix was tested.
      Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
      Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
      Signed-off-by: Felix Blyakher <felixb@sgi.com>
      1f23920d
    • NFSv4: kill off complicated macro 'PROC' · 0a93a47f
      Yu Zhiguo authored
      J. Bruce Fields wrote:
      ...
      > (This is extremely confusing code to track down: note that
      > proc->pc_decode is set to nfs4svc_decode_compoundargs() by the PROC()
      > macro at the end of fs/nfsd/nfs4proc.c.  Which means, for example, that
      > grepping for nfs4svc_decode_compoundargs() gets you nowhere.  Patches to
      > kill off that macro would be welcomed....)
      
      The macro 'PROC' is complicated and obscure; it had better be killed
      off to make the code clearer (see the sketch below).
      Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      0a93a47f
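
      The readability problem is token pasting: the macro builds symbol
      names that the source never spells out.  A simplified before/after
      sketch (struct fields abbreviated):

        /* Before: grep for nfs4svc_decode_compoundargs finds nothing,
         * because the name only exists after ##-pasting. */
        #define PROC(name, argt, rest)                                  \
        {                                                               \
                .pc_func   = (svc_procfunc)nfsd4_proc_##name,           \
                .pc_decode = (kxdrproc_t)nfs4svc_decode_##argt##args,   \
                /* ... */                                               \
        }

        /* After: the table is written out by hand, so every symbol is
         * grep-able. */
        static struct svc_procedure nfsd_procedures4[] = {
                [NFSPROC4_COMPOUND] = {
                        .pc_func   = (svc_procfunc)nfsd4_proc_compound,
                        .pc_decode = (kxdrproc_t)nfs4svc_decode_compoundargs,
                        /* ... */
                },
        };
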
    • NFSv4: do exact check about attribute specified · 3c8e0316
      Yu Zhiguo authored
      The server should return NFS4ERR_ATTRNOTSUPP if a specified attribute
      is not supported in the current environment.  The operations CREATE,
      NVERIFY, OPEN, SETATTR and VERIFY should all do this check.

      This bug was found by running the newpynfs tests.  The names of the
      failing tests are:
        CR12 NVF7a NVF7b NVF7c NVF7d NVF7f NVF7r NVF7s
        OPEN15 VF7a VF7b VF7c VF7d VF7f VF7r VF7s
      
      Add the function do_check_fattr() to do the exact check (see the
      sketch below):
      1. Check whether the attribute specified is supported by the NFSv4
         server.
      2. Check whether FATTR4_WORD0_ACL and FATTR4_WORD0_FS_LOCATIONS are
         supported in the current environment.
      3. Check whether the attribute specified is writable.

      Steps 1 and 3 were previously done in nfsd4_decode_fattr() but have
      been moved into this function.
      Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      3c8e0316
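
      A hedged sketch of the consolidated check; the signature and the
      *_mask / server_supports_acls() helpers are illustrative, not the
      exact patch:

        static __be32
        do_check_fattr(struct svc_rqst *rqstp, u32 *bmval, int writable_only)
        {
                /* 1. every requested attribute must be in the mask the
                 * server supports */
                if ((bmval[0] & ~supported_mask0) ||
                    (bmval[1] & ~supported_mask1))
                        return nfserr_attrnotsupp;
                /* 2. ACL / FS_LOCATIONS support depends on the current
                 * environment (hypothetical helper) */
                if ((bmval[0] & FATTR4_WORD0_ACL) && !server_supports_acls())
                        return nfserr_attrnotsupp;
                /* 3. callers that set attributes also require them to be
                 * writable */
                if (writable_only && ((bmval[0] & ~writable_mask0) ||
                                      (bmval[1] & ~writable_mask1)))
                        return nfserr_inval;
                return nfs_ok;
        }
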
  9. 30 May 2009, 1 commit
  10. 29 May 2009, 4 commits
  11. 28 May 2009, 6 commits
    • nfsd: fix hung up of nfs client while sync write data to nfs server · a0d24b29
      Wei Yongjun authored
      Commit 'Short write in nfsd becomes a full write to the client'
      (31dec253) broke sync writes.  Reproduce with the following commands:
      
        $ mount -t nfs -o sync 192.168.0.21:/nfsroot /mnt
        $ cd /mnt
        $ echo aaaa > temp.txt
      
      The NFS client then hangs.

      In SYNC mode the server always returns a write count of 0 to the
      client.  This is because the value of host_err in nfsd_vfs_write() is
      overwritten in SYNC mode by 'host_err = nfsd_sync(file);', and we then
      return host_err (which is now 0) as the write count.

      This patch fixes the problem (see the sketch below).
      Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      a0d24b29
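
      A sketch of the fix pattern in nfsd_vfs_write() (simplified; the
      point is capturing the byte count before host_err is reused):

        host_err = vfs_writev(file, (struct iovec __user *)vec, vlen, &offset);
        if (host_err < 0)
                goto out_nfserr;
        *cnt = host_err;        /* save the byte count now ... */

        if (stable && EX_ISSYNC(exp))
                host_err = nfsd_sync(file);     /* ... because this reuses
                                                 * host_err, and its 0 on
                                                 * success used to become the
                                                 * "write count" returned */
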
    • knfsd: remove unreported filehandle stats counters · 1dbd0d53
      Greg Banks authored
      The file nfsfh.c contains two static variables, nfsd_nr_verified and
      nfsd_nr_put.  These are counters which are incremented as a side
      effect of the fh_verify(), fh_compose() and fh_put() operations,
      i.e. at least twice per NFS call for any non-trivial workload.
      Needless to say, this makes the cacheline that contains them (and any
      other innocent victims) a very hot contention point indeed under high
      call-rate workloads on a multiprocessor NFS server.  It also turns out
      that these counters are not used anywhere.  They're not reported to
      userspace, they're not used in logic, they're not even exported from
      the object file (let alone the module).  All they do is waste CPU time.
      
      So this patch removes them.
      
      Tests on a 16-CPU Altix A4700 with two 10GbE Myricom cards, configured
      separately (no bonding).  The workload is 640 client threads doing
      directory traversals with random small reads, from server RAM.
      
      Before
      ======
      
      Kernel profile:
      
        %   cumulative   self              self     total
       time   samples   samples    calls   1/call   1/call  name
        6.05   2716.00  2716.00    30406     0.09     1.02  svc_process
        4.44   4706.00  1990.00     1975     1.01     1.01  spin_unlock_irqrestore
        3.72   6376.00  1670.00     1666     1.00     1.00  svc_export_put
        3.41   7907.00  1531.00     1786     0.86     1.02  nfsd_ofcache_lookup
        3.25   9363.00  1456.00    10965     0.13     1.01  nfsd_dispatch
        3.10  10752.00  1389.00     1376     1.01     1.01  nfsd_cache_lookup
        2.57  11907.00  1155.00     4517     0.26     1.03  svc_tcp_recvfrom
        ...
        2.21  15352.00  1003.00     1081     0.93     1.00  nfsd_choose_ofc  <----
        ^^^^
      
      Here the function nfsd_choose_ofc() reads a global variable
      which by accident happened to be located in the same cacheline as
      nfsd_nr_verified.
      
      Call rate:
      
      nullarbor:~ # pmdumptext nfs3.server.calls
      ...
      Thu Dec 13 00:15:27     184780.663
      Thu Dec 13 00:15:28     184885.881
      Thu Dec 13 00:15:29     184449.215
      Thu Dec 13 00:15:30     184971.058
      Thu Dec 13 00:15:31     185036.052
      Thu Dec 13 00:15:32     185250.475
      Thu Dec 13 00:15:33     184481.319
      Thu Dec 13 00:15:34     185225.737
      Thu Dec 13 00:15:35     185408.018
      Thu Dec 13 00:15:36     185335.764
      
      After
      =====
      
      Kernel profile:
      
        %   cumulative   self              self     total
       time   samples   samples    calls   1/call   1/call  name
        6.33   2813.00  2813.00    29979     0.09     1.01  svc_process
        4.66   4883.00  2070.00     2065     1.00     1.00  spin_unlock_irqrestore
        4.06   6687.00  1804.00     2182     0.83     1.00  nfsd_ofcache_lookup
        3.20   8110.00  1423.00    10932     0.13     1.00  nfsd_dispatch
        3.03   9456.00  1346.00     1343     1.00     1.00  nfsd_cache_lookup
        2.62  10622.00  1166.00     4645     0.25     1.01  svc_tcp_recvfrom
      [...]
        0.10  42586.00    44.00       74     0.59     1.00  nfsd_choose_ofc  <--- HA!!
        ^^^^
      
      Call rate:
      
      nullarbor:~ # pmdumptext nfs3.server.calls
      ...
      Thu Dec 13 01:45:28     194677.118
      Thu Dec 13 01:45:29     193932.692
      Thu Dec 13 01:45:30     194294.364
      Thu Dec 13 01:45:31     194971.276
      Thu Dec 13 01:45:32     194111.207
      Thu Dec 13 01:45:33     194999.635
      Thu Dec 13 01:45:34     195312.594
      Thu Dec 13 01:45:35     195707.293
      Thu Dec 13 01:45:36     194610.353
      Thu Dec 13 01:45:37     195913.662
      Thu Dec 13 01:45:38     194808.675
      
      i.e. about a 5.3% improvement in call rate.
      Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
      Reviewed-by: David Chinner <dgc@sgi.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      1dbd0d53
    • knfsd: fix reply cache memory corruption · cf0a586c
      Greg Banks authored
      Fix a regression in the reply cache introduced when the code was
      converted to use proper Linux lists.  When a new entry needs to be
      inserted, the case where all the entries are currently being used
      by threads is not correctly detected.  This can result in memory
      corruption and a crash.  In the current code this is an extremely
      unlikely corner case; it would require the machine to have 1024
      nfsd threads and all of them to be busy at the same time.  However,
      upcoming reply cache changes make this more likely; a crash due to
      this problem was actually observed in the field.
      Signed-off-by: Greg Banks <gnb@sgi.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      cf0a586c
    • knfsd: reply cache cleanups · fca4217c
      Greg Banks authored
      Make REQHASH() an inline function.  Rename hash_list to cache_hash.
      Fix an obsolete comment.
      Signed-off-by: Greg Banks <gnb@sgi.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      fca4217c
    • CacheFiles: Fixup renamed filenames in comments in internal.h · 911e690e
      David Howells authored
      Fix up renamed filenames in comments in fs/cachefiles/internal.h.
      
      Originally, the files were all called cf-xxx.c, but they got renamed to
      just xxx.c.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      911e690e
    • FS-Cache: Fixup renamed filenames in comments in internal.h · 348ca102
      David Howells authored
      Fix up renamed filenames in comments in fs/fscache/internal.h.
      
      Originally, the files were all called fsc-xxx.c, but they got renamed to
      just xxx.c.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      348ca102
  12. 27 May 2009, 2 commits
  13. 24 May 2009, 1 commit
    • [CIFS] Avoid open on possible directories since Samba now rejects them · 8db14ca1
      Steve French authored
      A small change (mostly formatting) to limit lookup-based open calls to
      file create only.
      
      After discussion yesterday on samba-technical about the posix lookup
      regression, and looking at a problem with cifs posix open to one
      particular Samba version, Jeff and JRA realized that the Samba
      server's behavior changed in this area (posix open behavior on files
      vs. directories).  To make this behavior consistent, JRA just made a
      fix to the Samba server to alter how it handles open of directories
      (now returning the equivalent of EISDIR instead of success).  Since we
      don't know at lookup time whether the inode is a directory or a file
      (and thus whether posix open will succeed with most current Samba
      servers), this change avoids the posix open code on lookup open (it
      just issues posix open on creates).  This gets the semantic benefits
      we want (atomicity, posix byte-range locks, improved write semantics
      on newly created files), file create is still fast, and we avoid the
      problem that Jeff noticed yesterday with "openat" (and some open
      directory calls) of non-cached directories against one version of the
      Samba server; it will also work with future Samba versions (which
      include the fix JRA just pushed into the Samba server).  I confirmed
      this approach with JRA yesterday and with Shirish today.
      
      Posix open is now only called (at lookup time) for file create.  For
      opens (rather than creates), because we do not yet know whether it is
      a file or a directory, and current Samba no longer allows us to do a
      posix open on directories, we could end up wasting an open call on
      what turns out to be a directory.  For file opens, we wait to call
      posix open until cifs_open.  It could be added here (lookup) in the
      future, but the performance tradeoff of the extra network request when
      EISDIR or EACCES is returned would have to be weighed against the 50%
      reduction in network traffic in the other paths.
      Reviewed-by: Shirish Pargaonkar <shirishp@us.ibm.com>
      Tested-by: Jeff Layton <jlayton@redhat.com>
      CC: Jeremy Allison <jra@samba.org>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
      8db14ca1
  14. 22 May 2009, 2 commits
  15. 19 May 2009, 2 commits
  16. 18 May 2009, 3 commits