1. 30 Sep, 2008 21 commits
  2. 17 Sep, 2008 7 commits
    • [XFS] Don't do I/O beyond eof when unreserving space · 2fd6f6ec
      Lachlan McIlroy committed
      When unreserving space with boundaries that are not block aligned we round
      up the start and round down the end boundaries and then use this function,
      xfs_zero_remaining_bytes(), to zero the parts of the blocks that got
      dropped during the rounding. The problem is we don't consider if these
      blocks are beyond eof. Worse still is if we encounter delayed allocations
      beyond eof we will try to use the magic delayed allocation block number as
      a real block number. If the file size is ever extended to expose these
      blocks then we'll go through xfs_zero_eof() to zero them anyway.
      
      SGI-PV: 983683
      
      SGI-Modid: xfs-linux-melb:xfs-kern:32055a
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
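The rounding described in this commit message can be sketched in plain C. This is only an illustration under stated assumptions, not the code of xfs_zero_remaining_bytes(): every `demo_` name is hypothetical, the range is treated as half-open `[start, end)`, and the clamp against `eof` is one simple way to express "do not touch anything beyond eof".

```c
#include <stdint.h>

/* Partial-block byte ranges left over after aligning [start, end). */
struct demo_edges {
    uint64_t head_start, head_end; /* from start up to the next block boundary */
    uint64_t tail_start, tail_end; /* from the last block boundary up to end */
};

static uint64_t demo_min(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t demo_max(uint64_t a, uint64_t b) { return a > b ? a : b; }

/*
 * Compute the unaligned edges of [start, end) for block size bsize and
 * clamp them so that no byte at or beyond eof is included.
 * An edge with start == end is empty.
 */
static struct demo_edges demo_edges_to_zero(uint64_t start, uint64_t end,
                                            uint64_t bsize, uint64_t eof)
{
    struct demo_edges e;
    uint64_t up = (start + bsize - 1) / bsize * bsize; /* start rounded up */
    uint64_t down = end / bsize * bsize;               /* end rounded down */

    e.head_start = start;
    e.head_end = demo_min(up, end);
    e.tail_start = demo_max(down, e.head_end);
    e.tail_end = end;

    /* Hypothetical eof handling: drop whatever lies beyond eof. */
    e.head_start = demo_min(e.head_start, eof);
    e.head_end = demo_min(e.head_end, eof);
    e.tail_start = demo_min(e.tail_start, eof);
    e.tail_end = demo_min(e.tail_end, eof);
    return e;
}
```

With a 4096-byte block size, a range of 100 to 10000 and eof at 5000, the head edge is bytes 100 to 4096 and the tail edge collapses to an empty range because it lies entirely beyond eof.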
    • [XFS] Fix use-after-free with buffers · e1f5dbd7
      Lachlan McIlroy committed
      We have a use-after-free issue where log completions access buffers via
      the buffer log item and the buffer has already been freed. Fix this by
      taking a reference on the buffer when attaching the buffer log item and
      release the hold when the buffer log item is detached and we no longer
      need the buffer. Also create a new function xfs_buf_item_free() to combine
      some common code.
      
      SGI-PV: 985757
      
      SGI-Modid: xfs-linux-melb:xfs-kern:32025a
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
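The hold-on-attach, release-on-detach pattern from this commit message might look roughly like the following. It is a user-space sketch with hypothetical `demo_` types, not the XFS buffer or buffer log item structures, and "freeing" is modelled by a flag so the lifetime can be observed.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_log_item;

struct demo_buf {
    int refcount;               /* number of holders */
    bool freed;                 /* stand-in for actually freeing the buffer */
    struct demo_log_item *item; /* attached log item, if any */
};

struct demo_log_item {
    struct demo_buf *buf;
};

static void demo_buf_hold(struct demo_buf *bp)
{
    bp->refcount++;
}

static void demo_buf_release(struct demo_buf *bp)
{
    if (--bp->refcount == 0)
        bp->freed = true;
}

/* Attaching the log item takes its own reference on the buffer. */
static void demo_item_attach(struct demo_log_item *ip, struct demo_buf *bp)
{
    demo_buf_hold(bp);
    ip->buf = bp;
    bp->item = ip;
}

/* Detaching drops that reference once the item no longer needs the buffer. */
static void demo_item_detach(struct demo_log_item *ip)
{
    struct demo_buf *bp = ip->buf;

    bp->item = NULL;
    ip->buf = NULL;
    demo_buf_release(bp);
}
```

Because the log item holds its own reference, the buffer stays alive when the original holder releases it and is only freed after the item is detached.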
    • [XFS] Prevent lockdep false positives when locking two inodes. · f9114eba
      David Chinner committed
      If we call xfs_lock_two_inodes() to grab both the iolock and the ilock,
      then drop the ilocks on both inodes, then grab them again (as
      xfs_swap_extents() does) then lockdep will report a locking order problem.
      This is a false positive.
      
      To avoid this, disallow xfs_lock_two_inodes() from locking both inode locks
      at once - force callers to make two separate calls. This means that nested
      dropping and regaining of the ilocks will retain the same lockdep subclass
      and so lockdep will not see anything wrong with this code.
      
      SGI-PV: 986238
      
      SGI-Modid: xfs-linux-melb:xfs-kern:31999a
      Signed-off-by: David Chinner <david@fromorbit.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Peter Leckie <pleckie@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
    • [XFS] Fix barrier status change detection. · b5b8c9ac
      David Chinner committed
      The current code in xlog_iodone() uses the wrong macro to check if the
      barrier has been cleared due to an EOPNOTSUPP error from the lower layer.
      
      SGI-PV: 986143
      
      SGI-Modid: xfs-linux-melb:xfs-kern:31984a
      Signed-off-by: David Chinner <david@fromorbit.com>
      Signed-off-by: Nathaniel W. Turner <nate@houseofnate.net>
      Signed-off-by: Peter Leckie <pleckie@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
    • [XFS] Prevent direct I/O from mapping extents beyond eof · 364f358a
      Lachlan McIlroy committed
      With help from some tracing I found that we try to map extents beyond
      eof when doing a direct I/O read. It appears that the way to inform the
      generic direct I/O path (i.e. do_direct_IO()) that we have breached eof is
      to return an unmapped buffer from xfs_get_blocks_direct(). This will cause
      do_direct_IO() to jump to the hole handling code where it will check for
      eof and then abort.
      
      This problem was found because a direct I/O read was trying to map beyond
      eof and was encountering delayed allocations. The delayed allocations
      beyond eof are speculative allocations and they didn't get converted when
      the direct I/O flushed the file because there was only enough space in the
      current AG to convert and write out the dirty pages within eof. Note that
      xfs_iomap_write_allocate() won't necessarily convert all the delayed
      allocation passed to it - it will return after allocating the first extent
      - so if the delayed allocation extends beyond eof then it will stay that
      way.
      
      SGI-PV: 983683
      
      SGI-Modid: xfs-linux-melb:xfs-kern:31929a
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
    • [XFS] Fix regression introduced by remount fixup · 6efdf281
      Christoph Hellwig committed
      Logically we would return an error in the xfs_fs_remount code to prevent
      users from believing that a remount changed mount options which cannot
      be changed.
      
      But unfortunately mount(8) adds all options from mtab and fstab to the
      mount arguments in some cases so we can't blindly reject options, but have
      to check for each specified option if it actually differs from the
      currently set option and only reject it if that's the case.
      
      Until that is implemented we return success for every remount request, and
      silently ignore all options that we can't actually change.
      
      SGI-PV: 985710
      
      SGI-Modid: xfs-linux-melb:xfs-kern:31908a
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
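The per-option check that this commit message describes as not yet implemented could be sketched as follows. This is a guess at the shape of such a check, not XFS code: the `demo_` names and the name/value option representation are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical representation of one mount option. */
struct demo_opt {
    const char *name;
    const char *value; /* empty string for flag-style options */
};

static const char *demo_lookup(const struct demo_opt *opts, size_t n,
                               const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(opts[i].name, name) == 0)
            return opts[i].value;
    return NULL;
}

/*
 * Accept a remount request only if every requested option matches what is
 * currently set. Options that mount(8) merely repeats from mtab or fstab
 * pass; an option that would actually change something is rejected.
 */
static bool demo_remount_ok(const struct demo_opt *cur, size_t ncur,
                            const struct demo_opt *req, size_t nreq)
{
    for (size_t i = 0; i < nreq; i++) {
        const char *v = demo_lookup(cur, ncur, req[i].name);

        if (v == NULL || strcmp(v, req[i].value) != 0)
            return false;
    }
    return true;
}
```

Repeating the current options unchanged is accepted, while a request that alters a value or adds an option that is not currently set is refused.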
    • [XFS] Move memory allocations for log tracing out of the critical path · 31bd61f2
      Lachlan McIlroy committed
      Memory allocations for log->l_grant_trace and iclog->ic_trace are done on
      demand when the first event is logged. In xlog_state_get_iclog_space() we
      call xlog_trace_iclog() under a spinlock and allocating memory here can
      cause us to sleep with a spinlock held and deadlock the system.
      
      For the log grant tracing we use KM_NOSLEEP but that means we can lose
      trace entries. Since there is no locking to serialize the log grant
      tracing we could race and have multiple allocations and leak memory.
      
      So move the allocations to where we initialize the log/iclog structures.
      Use KM_NOFS to avoid recursing into the filesystem and drop log->l_trace
      since it's not even used.
      
      SGI-PV: 983738
      
      SGI-Modid: xfs-linux-melb:xfs-kern:31896a
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
  3. 14 Sep, 2008 4 commits
    • rescan_partitions(): make device capacity errors non-fatal · 8d99f83b
      Andrew Morton committed
      Herton Krzesinski reports that the error-checking changes in
      04ebd4ae ("block/ioctl.c and
      fs/partition/check.c: check value returned by add_partition") cause his
      buggy USB camera to no longer mount.  "The camera is an Olympus X-840.
      The original issue comes from the camera itself: its format program
      creates a partition with an off by one error".
      
      Buggy devices happen.  It is better for the kernel to warn and to proceed
      with the mount.
      Reported-by: Herton Ronaldo Krzesinski <herton@mandriva.com.br>
      Cc: Abdel Benamrouche <draconux@gmail.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: David Brownell <david-b@pacbell.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: ifdef Quicklists in /proc/meminfo · d7a3e495
      Hugh Dickins committed
      A "Quicklists:          0 kB" line has just started appearing in
      /proc/meminfo, but most architectures (including x86) don't have
      them configured, so #ifdef it, like the highmem lines.
      
      And those architectures which do have quicklists configured are
      using them for page tables: so let's place it next to PageTables.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Christoph Lameter <cl@linux-foundation.org>
      Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
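The effect of this change can be illustrated with a small user-space sketch. The macro `DEMO_HAVE_QUICKLIST` and the function `demo_meminfo()` are hypothetical stand-ins for the architecture configuration and the /proc/meminfo formatting code; only the idea of compiling the line out and placing it next to PageTables is taken from the commit message.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format the PageTables line and, only when quicklists are configured,
 * a Quicklists line directly after it. Returns the number of characters
 * that the formatting calls reported.
 */
static int demo_meminfo(char *buf, size_t size,
                        unsigned long pagetables_kb,
                        unsigned long quicklists_kb)
{
    int n = snprintf(buf, size, "PageTables:   %8lu kB\n", pagetables_kb);

#ifdef DEMO_HAVE_QUICKLIST
    if (n >= 0 && (size_t)n < size)
        n += snprintf(buf + n, size - (size_t)n,
                      "Quicklists:   %8lu kB\n", quicklists_kb);
#else
    (void)quicklists_kb; /* not configured: emit no Quicklists line */
#endif
    return n;
}
```

Built without `DEMO_HAVE_QUICKLIST`, the output has no Quicklists line at all, rather than a line that always reads 0 kB.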
    • bfs: fix Lockdep warning · 1558182f
      Eric Sesterhenn committed
      This fixes:
      
        =============================================
        [ INFO: possible recursive locking detected ]
        2.6.27-rc5-00283-g70bb0896 #68
        ---------------------------------------------
        touch/6855 is trying to acquire lock:
         (&info->bfs_lock){--..}, at: [<c02262f5>] bfs_delete_inode+0x9e/0x18c
      
        but task is already holding lock:
         (&info->bfs_lock){--..}, at: [<c0226c00>] bfs_create+0x45/0x187
      
        other info that might help us debug this:
        2 locks held by touch/6855:
         #0:  (&type->i_mutex_dir_key#5){--..}, at: [<c018ad13>] do_filp_open+0x10b/0x62f
         #1:  (&info->bfs_lock){--..}, at: [<c0226c00>] bfs_create+0x45/0x187
      
        stack backtrace:
        Pid: 6855, comm: touch Not tainted 2.6.27-rc5-00283-g70bb0896 #68
         [<c013e769>] validate_chain+0x458/0x9f4
         [<c013bece>] ? trace_hardirqs_off+0xb/0xd
         [<c013f36b>] __lock_acquire+0x666/0x6e0
         [<c013f440>] lock_acquire+0x5b/0x77
         [<c02262f5>] ? bfs_delete_inode+0x9e/0x18c
         [<c06aab74>] mutex_lock_nested+0xbc/0x234
         [<c02262f5>] ? bfs_delete_inode+0x9e/0x18c
         [<c02262f5>] ? bfs_delete_inode+0x9e/0x18c
         [<c02262f5>] bfs_delete_inode+0x9e/0x18c
         [<c0226257>] ? bfs_delete_inode+0x0/0x18c
         [<c01925e1>] generic_delete_inode+0x94/0xfe
         [<c019265d>] generic_drop_inode+0x12/0x12f
         [<c0191b7e>] iput+0x4b/0x4e
         [<c0226d1e>] bfs_create+0x163/0x187
         [<c0188b42>] vfs_create+0xa6/0x114
         [<c018adb5>] do_filp_open+0x1ad/0x62f
         [<c0107cdc>] ? native_sched_clock+0x82/0x96
         [<c06ac309>] ? _spin_unlock+0x27/0x3c
         [<c019379e>] ? alloc_fd+0xbf/0xc9
         [<c06ae2f4>] ? sub_preempt_count+0x9d/0xab
         [<c019379e>] ? alloc_fd+0xbf/0xc9
         [<c0180391>] do_sys_open+0x42/0xb8
         [<c041d564>] ? trace_hardirqs_on_thunk+0xc/0x10
         [<c0180449>] sys_open+0x1e/0x26
         [<c01038bd>] sysenter_do_call+0x12/0x31
         =======================
      
      The problem is that we don't unlock the bfs->lock mutex before calling
      iput (we do in the other cases).
      Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
      Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: more debugging for "already registered" case · 665020c3
      Alexey Dobriyan committed
      Print parent directory name as well.
      
      The aim is to catch non-creation of the parent directory, when proc_mkdir
      returns NULL and all subsequent registrations go directly into /proc
      instead of the intended directory.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      [ Fixed insane printk string while at it.  - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 10 Sep, 2008 1 commit
    • ocfs2: Fix a bug in direct IO read. · 0e116227
      Tao Ma committed
      ocfs2 will become read-only if we try to read bytes that lie past
      the end of i_size. This can be easily reproduced with the following steps:
      1. mkfs an ocfs2 volume with bs=4k cs=4k and nosparse.
      2. create a small file (say less than 100 bytes); the file is
         allocated 1 cluster.
      3. read 8196 bytes from the kernel using O_DIRECT which exceeds the limit.
      4. The ocfs2 volume becomes read-only and dmesg shows:
      OCFS2: ERROR (device sda13): ocfs2_direct_IO_get_blocks:
      Inode 66010 has a hole at block 1
      File system is now read-only due to the potential of on-disk corruption.
      Please run fsck.ocfs2 once the file system is unmounted.
      
      So suppress the ERROR message.
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
  5. 09 Sep, 2008 2 commits
    • NFS: Restore missing hunk in NFS mount option parser · af904dea
      Chuck Lever committed
      Automounter maps can contain mount options valid for other NFS
      implementations but not for Linux.  The Linux automounter uses the
      mount command's "-s" command line option ("s" for "sloppy") so that
      mount requests containing such options are not rejected.
      
      Commit f45663ce attempted to address a
      known regression with text-based NFS mount option parsing.  Unrecognized
      mount options would cause mount requests to fail, even if the "-s"
      option was used on the mount command line.
      
      Unfortunately, this commit was not complete as submitted.  It adds a
      new mount option, "sloppy".  But it is missing a hunk, so it now allows
      NFS mounts with unrecognized mount options, even if the "sloppy" option
      is not present.  This could be a problem if a required critical mount
      option such as "sync" is misspelled, for example, and is considered a
      regression from 2.6.26.
      
      This patch restores the missing hunk.  Now, the default behavior of
      text-based NFS mount options is as before: any unrecognized mount option
      will cause the mount to fail.
      
      Please include this in 2.6.27-rc.
      
      Thanks to Neil Brown for reporting this.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
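The restored behaviour (fail on any unrecognized option unless "sloppy" is given) can be sketched in user-space C. This is not the NFS mount option parser: the `demo_` functions and the tiny option table are assumptions for illustration, and only the option names "sloppy" and "sync" come from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative table of recognized options; the real parser knows many more. */
static const char *const demo_known[] = { "ro", "rw", "sync", "async", "sloppy" };

static bool demo_is_known(const char *opt, size_t len)
{
    for (size_t i = 0; i < sizeof(demo_known) / sizeof(demo_known[0]); i++)
        if (strlen(demo_known[i]) == len &&
            strncmp(demo_known[i], opt, len) == 0)
            return true;
    return false;
}

/*
 * Walk a comma-separated option string. Returns true if the mount may
 * proceed: either every option is recognized, or "sloppy" was given.
 */
static bool demo_parse_options(const char *opts)
{
    bool sloppy = false;
    bool unknown = false;
    const char *p = opts;

    while (*p != '\0') {
        size_t len = strcspn(p, ",");

        if (len == 6 && strncmp(p, "sloppy", 6) == 0)
            sloppy = true;
        else if (len > 0 && !demo_is_known(p, len))
            unknown = true;

        p += len;
        if (*p == ',')
            p++;
    }
    return !unknown || sloppy;
}
```

A misspelled option such as "snyc" makes the mount fail by default, which is the safety property the commit restores, while adding "sloppy" lets the same string through.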
    • udf: add llseek method · 5c89468c
      Christoph Hellwig committed
      UDF currently doesn't set an llseek method for regular files, which
      means it will fall back to default_llseek.  This means no one can seek
      beyond 2 Gigabytes on udf, and that there's no protection against
      the i_size updates from writers.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
  6. 06 Sep, 2008 3 commits
    • UBIFS: make minimum fanout 3 · a5cb562d
      Artem Bityutskiy committed
      UBIFS does not really work correctly when fanout is 2,
      because of the way we manage the indexing tree. It may
      just become a list and UBIFS screws up.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
    • UBIFS: fix division by zero · f171d4d7
      Artem Bityutskiy committed
      If fanout is 3, we have division by zero in
      'ubifs_read_superblock()':
      
      divide error: 0000 [#1] PREEMPT SMP
      
      Pid: 28744, comm: mount Not tainted (2.6.27-rc4-ubifs-2.6 #23)
      EIP: 0060:[<f8f9e3ef>] EFLAGS: 00010202 CPU: 0
      EIP is at ubifs_reported_space+0x2d/0x69 [ubifs]
      EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
      ESI: 00000000 EDI: f0ae64b0 EBP: f1f9fcf4 ESP: f1f9fce0
       DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
    • sched: fix process time monotonicity · 49048622
      Balbir Singh committed
      Spencer reported a problem where utime and stime were going negative despite
      the fixes in commit b27f03d4. The suspected
      reason for the problem is that signal_struct maintains its own utime and
      stime (of exited tasks); these are not updated using the new task_utime()
      routine, hence sig->utime can go backwards and cause the same problem
      to occur (sig->utime adds tsk->utime and not task_utime()). This patch
      fixes the problem.
      
      TODO: using max(task->prev_utime, derived utime) works for now, but a more
      generic solution is to implement cputime_max() and use the cputime_gt()
      function for comparison.
      
      Reported-by: spencer@bluehost.com
      Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
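The `max(task->prev_utime, derived utime)` idea from the TODO note, and the reason for routing the exited-task accounting through it, can be sketched as follows. The `demo_` types and functions are hypothetical and use plain 64-bit counters rather than the kernel's cputime type.

```c
#include <stdint.h>

struct demo_task {
    uint64_t prev_utime; /* largest user time reported so far */
};

/*
 * Report a user time that never goes backwards: remember the largest
 * value seen and return it even if the newly derived value is smaller.
 */
static uint64_t demo_task_utime(struct demo_task *t, uint64_t derived)
{
    if (derived > t->prev_utime)
        t->prev_utime = derived;
    return t->prev_utime;
}

/*
 * When a task exits, add the monotonic value to the per-process total
 * instead of the raw derived value, so the total cannot shrink relative
 * to what was reported earlier.
 */
static void demo_account_exit(uint64_t *sig_utime, struct demo_task *t,
                              uint64_t derived)
{
    *sig_utime += demo_task_utime(t, derived);
}
```

Feeding the sequence 100, 90, 120 through `demo_task_utime()` yields 100, 100, 120, so a reader of the value never sees it decrease.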
  7. 03 Sep, 2008 2 commits
    • UBIFS: amend f_fsid · 7c7cbadf
      Artem Bityutskiy committed
      David Woodhouse suggested being consistent with other FSes
      and xoring the beginning and the end of the UUID.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
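One way to read "xor the beginning and the end of the UUID" is sketched below. It is an assumption, not the UBIFS code: the 16-byte UUID is split into two little-endian 64-bit halves, the halves are XORed, and the result is split into the two 32-bit words of an fsid. All `demo_` names are hypothetical.

```c
#include <stdint.h>

/* Read 8 bytes as a little-endian 64-bit value, independent of host order. */
static uint64_t demo_le64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Fold a 16-byte UUID into two 32-bit fsid words by XORing its halves. */
static void demo_fsid_from_uuid(const uint8_t uuid[16], uint32_t val[2])
{
    uint64_t key = demo_le64(uuid) ^ demo_le64(uuid + 8);

    val[0] = (uint32_t)key;         /* low 32 bits */
    val[1] = (uint32_t)(key >> 32); /* high 32 bits */
}
```

Folding both halves means every byte of the UUID influences the resulting identifier, instead of only the first eight bytes.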
    • mm: show quicklist usage in /proc/meminfo · 4b856152
      KOSAKI Motohiro committed
      Quicklists can consume several GB of memory.  We should provide a means of
      monitoring this.
      
      After this patch is applied, /proc/meminfo will output the following:
      
      % cat /proc/meminfo
      
      MemTotal:      7715392 kB
      MemFree:       5401600 kB
      Buffers:         80384 kB
      Cached:         300800 kB
      SwapCached:          0 kB
      Active:         235584 kB
      Inactive:       262656 kB
      SwapTotal:     2031488 kB
      SwapFree:      2031488 kB
      Dirty:            3520 kB
      Writeback:           0 kB
      AnonPages:      117696 kB
      Mapped:          38528 kB
      Slab:          1589952 kB
      SReclaimable:    23104 kB
      SUnreclaim:    1566848 kB
      PageTables:      14656 kB
      NFS_Unstable:        0 kB
      Bounce:              0 kB
      WritebackTmp:        0 kB
      CommitLimit:   5889152 kB
      Committed_AS:   393152 kB
      VmallocTotal: 17592177655808 kB
      VmallocUsed:     29056 kB
      VmallocChunk: 17592177626432 kB
      Quicklists:     130944 kB
      HugePages_Total:     0
      HugePages_Free:      0
      HugePages_Rsvd:      0
      HugePages_Surp:      0
      Hugepagesize:    262144 kB
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Keiichiro Tokunaga <tokunaga.keiich@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>