1. 14 2月, 2012 2 次提交
  2. 11 2月, 2012 1 次提交
    • C
      xfs: use a normal shrinker for the dquot freelist · 04da0c81
      Christoph Hellwig 提交于
      Stop reusing dquots from the freelist when allocating new ones directly, and
      implement a shrinker that actually follows the specifications for the
      interface.  The shrinker implementation is still highly suboptimal at this
      point, but we can gradually work on it.
      
      This also fixes an bug in the previous lock ordering, where we would take
      the hash and dqlist locks inside of the freelist lock against the normal
      lock ordering.  This is only solvable by introducing the dispose list,
      and thus not when using direct reclaim of unused dquots for new allocations.
      
      As a side-effect the quota upper bound and used to free ratio values in
      /proc/fs/xfs/xqm are set to 0 as these values don't make any sense in the
      new world order.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      04da0c81
  3. 09 2月, 2012 2 次提交
  4. 07 2月, 2012 2 次提交
    • T
      block: strip out locking optimization in put_io_context() · 11a3122f
      Tejun Heo 提交于
      put_io_context() performed a complex trylock dancing to avoid
      deferring ioc release to workqueue.  It was also broken on UP because
      trylock was always assumed to succeed which resulted in unbalanced
      preemption count.
      
      While there are ways to fix the UP breakage, even the most
      pathological microbench (forced ioc allocation and tight fork/exit
      loop) fails to show any appreciable performance benefit of the
      optimization.  Strip it out.  If there turns out to be workloads which
      are affected by this change, simpler optimization from the discussion
      thread can be applied later.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      LKML-Reference: <1328514611.21268.66.camel@sli10-conroe>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      11a3122f
    • H
      exec: fix use-after-free bug in setup_new_exec() · 96e02d15
      Heiko Carstens 提交于
      Setting the task name is done within setup_new_exec() by accessing
      bprm->filename. However this happens after flush_old_exec().
      This may result in a use after free bug, flush_old_exec() may
      "complete" vfork_done, which will wake up the parent which in turn
      may free the passed in filename.
      To fix this add a new tcomm field in struct linux_binprm which
      contains the now early generated task name until it is used.
      
      Fixes this bug on s390:
      
        Unable to handle kernel pointer dereference at virtual kernel address 0000000039768000
        Process kworker/u:3 (pid: 245, task: 000000003a3dc840, ksp: 0000000039453818)
        Krnl PSW : 0704000180000000 0000000000282e94 (setup_new_exec+0xa0/0x374)
        Call Trace:
        ([<0000000000282e2c>] setup_new_exec+0x38/0x374)
         [<00000000002dd12e>] load_elf_binary+0x402/0x1bf4
         [<0000000000280a42>] search_binary_handler+0x38e/0x5bc
         [<0000000000282b6c>] do_execve_common+0x410/0x514
         [<0000000000282cb6>] do_execve+0x46/0x58
         [<00000000005bce58>] kernel_execve+0x28/0x70
         [<000000000014ba2e>] ____call_usermodehelper+0x102/0x140
         [<00000000005bc8da>] kernel_thread_starter+0x6/0xc
         [<00000000005bc8d4>] kernel_thread_starter+0x0/0xc
        Last Breaking-Event-Address:
         [<00000000002830f0>] setup_new_exec+0x2fc/0x374
      
        Kernel panic - not syncing: Fatal exception: panic_on_oops
      Reported-by: NSebastian Ott <sebott@linux.vnet.ibm.com>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      96e02d15
  5. 03 2月, 2012 5 次提交
  6. 02 2月, 2012 4 次提交
  7. 01 2月, 2012 3 次提交
  8. 31 1月, 2012 2 次提交
  9. 28 1月, 2012 9 次提交
    • J
      Logfs: Allow NULL block_isbad() methods · f2933e86
      Joern Engel 提交于
      Not all mtd drivers define block_isbad().  Let's assume no bad blocks
      instead of refusing to mount.
      Signed-off-by: NJoern Engel <joern@logfs.org>
      f2933e86
    • J
      logfs: Grow inode in delete path · bbe01387
      Joern Engel 提交于
      Can be necessary if an inode gets deleted (through -ENOSPC) before being
      written.  Might be better to move this into logfs_write_rec(), but for
      now go with the stupid&safe patch.
      Signed-off-by: NJoern Engel <joern@logfs.org>
      bbe01387
    • J
      logfs: Free areas before calling generic_shutdown_super() · 1bcceaff
      Joern Engel 提交于
      Or hit an assertion in map_invalidatepage() instead.
      Signed-off-by: NJoern Engel <joern@logfs.org>
      1bcceaff
    • J
      logfs: remove useless BUG_ON · 6c69494f
      Joern Engel 提交于
      It prevents write sizes >4k.
      Signed-off-by: NJoern Engel <joern@logfs.org>
      6c69494f
    • P
      logfs: Propagate page parameter to __logfs_write_inode · 0bd90387
      Prasad Joshi 提交于
      During GC LogFS has to rewrite each valid block to a separate segment.
      Rewrite operation reads data from an old segment and writes it to a
      newly allocated segment. Since every write operation changes data
      block pointers maintained in inode, inode should also be rewritten.
      
      In GC path to avoid AB-BA deadlock LogFS marks a page with
      PG_pre_locked in addition to locking the page (PG_locked). The page
      lock is ignored iff the page is pre-locked.
      
      LogFS uses a special file called segment file. The segment file
      maintains an 8 bytes entry for every segment. It keeps track of erase
      count, level etc. for every segment.
      
      Bad things happen with a segment belonging to the segment file is GCed
      
       ------------[ cut here ]------------
      kernel BUG at /home/prasad/logfs/readwrite.c:297!
      invalid opcode: 0000 [#1] SMP
      Modules linked in: logfs joydev usbhid hid psmouse e1000 i2c_piix4
      		serio_raw [last unloaded: logfs]
      Pid: 20161, comm: mount Not tainted 3.1.0-rc3+ #3 innotek GmbH
      		VirtualBox
      EIP: 0060:[<f809132a>] EFLAGS: 00010292 CPU: 0
      EIP is at logfs_lock_write_page+0x6a/0x70 [logfs]
      EAX: 00000027 EBX: f73f5b20 ECX: c16007c8 EDX: 00000094
      ESI: 00000000 EDI: e59be6e4 EBP: c7337b28 ESP: c7337b18
      DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      Process mount (pid: 20161, ti=c7336000 task=eb323f70 task.ti=c7336000)
      Stack:
      f8099a3d c7337b24 f73f5b20 00001002 c7337b50 f8091f6d f8099a4d f80994e4
      00000003 00000000 c7337b68 00000000 c67e4400 00001000 c7337b80 f80935e5
      00000000 00000000 00000000 00000000 e1fcf000 0000000f e59be618 c70bf900
      Call Trace:
      [<f8091f6d>] logfs_get_write_page.clone.16+0xdd/0x100 [logfs]
      [<f80935e5>] logfs_mod_segment_entry+0x55/0x110 [logfs]
      [<f809460d>] logfs_get_segment_entry+0x1d/0x20 [logfs]
      [<f8091060>] ? logfs_cleanup_journal+0x50/0x50 [logfs]
      [<f809521b>] ostore_get_erase_count+0x1b/0x40 [logfs]
      [<f80965b8>] logfs_open_area+0xc8/0x150 [logfs]
      [<c141a7ec>] ? kmemleak_alloc+0x2c/0x60
      [<f809668e>] __logfs_segment_write.clone.16+0x4e/0x1b0 [logfs]
      [<c10dd563>] ? mempool_kmalloc+0x13/0x20
      [<c10dd563>] ? mempool_kmalloc+0x13/0x20
      [<f809696f>] logfs_segment_write+0x17f/0x1d0 [logfs]
      [<f8092e8c>] logfs_write_i0+0x11c/0x180 [logfs]
      [<f8092f35>] logfs_write_direct+0x45/0x90 [logfs]
      [<f80934cd>] __logfs_write_buf+0xbd/0xf0 [logfs]
      [<c102900e>] ? kmap_atomic_prot+0x4e/0xe0
      [<f809424b>] logfs_write_buf+0x3b/0x60 [logfs]
      [<f80947a9>] __logfs_write_inode+0xa9/0x110 [logfs]
      [<f8094cb0>] logfs_rewrite_block+0xc0/0x110 [logfs]
      [<f8095300>] ? get_mapping_page+0x10/0x60 [logfs]
      [<f8095aa0>] ? logfs_load_object_aliases+0x2e0/0x2f0 [logfs]
      [<f808e57d>] logfs_gc_segment+0x2ad/0x310 [logfs]
      [<f808e62a>] __logfs_gc_once+0x4a/0x80 [logfs]
      [<f808ed43>] logfs_gc_pass+0x683/0x6a0 [logfs]
      [<f8097a89>] logfs_mount+0x5a9/0x680 [logfs]
      [<c1126b21>] mount_fs+0x21/0xd0
      [<c10f6f6f>] ? __alloc_percpu+0xf/0x20
      [<c113da41>] ? alloc_vfsmnt+0xb1/0x130
      [<c113db4b>] vfs_kern_mount+0x4b/0xa0
      [<c113e06e>] do_kern_mount+0x3e/0xe0
      [<c113f60d>] do_mount+0x34d/0x670
      [<c10f2749>] ? strndup_user+0x49/0x70
      [<c113fcab>] sys_mount+0x6b/0xa0
      [<c142d87c>] syscall_call+0x7/0xb
      Code: f8 e8 8b 93 39 c9 8b 45 f8 3e 0f ba 28 00 19 d2 85 d2 74 ca eb d0 0f 0b 8d 45 fc 89 44 24 04 c7 04 24 3d 9a 09 f8 e8 09 92 39 c9 <0f> 0b 8d 74 26 00 55 89 e5 3e 8d 74 26 00 8b 10 80 e6 01 74 09
      EIP: [<f809132a>] logfs_lock_write_page+0x6a/0x70 [logfs] SS:ESP 0068:c7337b18
      ---[ end trace 96e67d5b3aa3d6ca ]---
      
      The patch passes locked page to __logfs_write_inode. It calls function
      logfs_get_wblocks() to pre-lock the page. This ensures any further
      attempts to lock the page are ignored (esp from get_erase_count).
      Acked-by: NJoern Engel <joern@logfs.org>
      Signed-off-by: NPrasad Joshi <prasadjoshi.linux@gmail.com>
      0bd90387
    • P
      logfs: set superblock shutdown flag after generic sb shutdown · ecfd8909
      Prasad Joshi 提交于
      While unmounting the file system LogFS calls generic_shutdown_super.
      The function does file system independent superblock shutdown.
      However, it might result in call file system specific inode eviction.
      
      LogFS marks FS shutting down by setting bit LOGFS_SB_FLAG_SHUTDOWN in
      super->s_flags. Since, inode eviction might call truncate on inode,
      following BUG is observed when file system is unmounted:
      
      ------------[ cut here ]------------
      kernel BUG at /home/prasad/logfs/segment.c:362!
      invalid opcode: 0000 [#1] PREEMPT SMP
      CPU 3
      Modules linked in: logfs binfmt_misc ppdev virtio_blk parport_pc lp
      	parport psmouse floppy virtio_pci serio_raw virtio_ring virtio
      
      Pid: 1933, comm: umount Not tainted 3.0.0+ #4 Bochs Bochs
      RIP: 0010:[<ffffffffa008c841>]  [<ffffffffa008c841>]
      		logfs_segment_write+0x211/0x230 [logfs]
      RSP: 0018:ffff880062d7b9e8  EFLAGS: 00010202
      RAX: 000000000000000e RBX: ffff88006eca9000 RCX: 0000000000000000
      RDX: ffff88006fd87c40 RSI: ffffea00014ff468 RDI: ffff88007b68e000
      RBP: ffff880062d7ba48 R08: 8000000020451430 R09: 0000000000000000
      R10: dead000000100100 R11: 0000000000000000 R12: ffff88006fd87c40
      R13: ffffea00014ff468 R14: ffff88005ad0a460 R15: 0000000000000000
      FS:  00007f25d50ea760(0000) GS:ffff88007fd80000(0000)
      	knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000d05e48 CR3: 0000000062c72000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process umount (pid: 1933, threadinfo ffff880062d7a000,
      	task ffff880070b44500)
      Stack:
      ffff880062d7ba38 ffff88005ad0a508 0000000000001000 0000000000000000
      8000000020451430 ffffea00014ff468 ffff880062d7ba48 ffff88005ad0a460
      ffff880062d7bad8 ffffea00014ff468 ffff88006fd87c40 0000000000000000
      Call Trace:
      [<ffffffffa0088fee>] logfs_write_i0+0x12e/0x190 [logfs]
      [<ffffffffa0089360>] __logfs_write_rec+0x140/0x220 [logfs]
      [<ffffffffa0089312>] __logfs_write_rec+0xf2/0x220 [logfs]
      [<ffffffffa00894a4>] logfs_write_rec+0x64/0xd0 [logfs]
      [<ffffffffa0089616>] __logfs_write_buf+0x106/0x110 [logfs]
      [<ffffffffa008a19e>] logfs_write_buf+0x4e/0x80 [logfs]
      [<ffffffffa008a6b8>] __logfs_write_inode+0x98/0x110 [logfs]
      [<ffffffffa008a7c4>] logfs_truncate+0x54/0x290 [logfs]
      [<ffffffffa008abfc>] logfs_evict_inode+0xdc/0x190 [logfs]
      [<ffffffff8115eef5>] evict+0x85/0x170
      [<ffffffff8115f126>] iput+0xe6/0x1b0
      [<ffffffff8115b4a8>] shrink_dcache_for_umount_subtree+0x218/0x280
      [<ffffffff8115ce91>] shrink_dcache_for_umount+0x51/0x90
      [<ffffffff8114796c>] generic_shutdown_super+0x2c/0x100
      [<ffffffffa008cc47>] logfs_kill_sb+0x57/0xf0 [logfs]
      [<ffffffff81147de5>] deactivate_locked_super+0x45/0x70
      [<ffffffff811487ea>] deactivate_super+0x4a/0x70
      [<ffffffff81163934>] mntput_no_expire+0xa4/0xf0
      [<ffffffff8116469f>] sys_umount+0x6f/0x380
      [<ffffffff814dd46b>] system_call_fastpath+0x16/0x1b
      Code: 55 c8 49 8d b6 a8 00 00 00 45 89 f9 45 89 e8 4c 89 e1 4c 89 55
      b8 c7 04 24 00 00 00 00 e8 68 fc ff ff 4c 8b 55 b8 e9 3c ff ff ff <0f>
      0b 0f 0b c7 45 c0 00 00 00 00 e9 44 fe ff ff 66 66 66 66 66
      RIP  [<ffffffffa008c841>] logfs_segment_write+0x211/0x230 [logfs]
      RSP <ffff880062d7b9e8>
      ---[ end trace fe6b040cea952290 ]---
      
      Therefore, move super->s_flags setting after the fs-indenpendent work
      has been finished.
      Reviewed-by: NJoern Engel <joern@logfs.org>
      Signed-off-by: NPrasad Joshi <prasadjoshi.linux@gmail.com>
      ecfd8909
    • P
      logfs: take write mutex lock during fsync and sync · 13ced29c
      Prasad Joshi 提交于
      LogFS uses super->s_write_mutex while writing data to disk. Taking the
      same mutex lock in sync and fsync code path solves the following BUG:
      
      ------------[ cut here ]------------
      kernel BUG at /home/prasad/logfs/dev_bdev.c:134!
      
      Pid: 2387, comm: flush-253:16 Not tainted 3.0.0+ #4 Bochs Bochs
      RIP: 0010:[<ffffffffa007deed>]  [<ffffffffa007deed>]
                      bdev_writeseg+0x25d/0x270 [logfs]
      Call Trace:
      [<ffffffffa007c381>] logfs_open_area+0x91/0x150 [logfs]
      [<ffffffff8128dcb2>] ? find_level.clone.9+0x62/0x100
      [<ffffffffa007c49c>] __logfs_segment_write.clone.20+0x5c/0x190 [logfs]
      [<ffffffff810ef005>] ? mempool_kmalloc+0x15/0x20
      [<ffffffff810ef383>] ? mempool_alloc+0x53/0x130
      [<ffffffffa007c7a4>] logfs_segment_write+0x1d4/0x230 [logfs]
      [<ffffffffa0078f8e>] logfs_write_i0+0x12e/0x190 [logfs]
      [<ffffffffa0079300>] __logfs_write_rec+0x140/0x220 [logfs]
      [<ffffffffa0079444>] logfs_write_rec+0x64/0xd0 [logfs]
      [<ffffffffa00795b6>] __logfs_write_buf+0x106/0x110 [logfs]
      [<ffffffffa007a13e>] logfs_write_buf+0x4e/0x80 [logfs]
      [<ffffffffa0073e33>] __logfs_writepage+0x23/0x80 [logfs]
      [<ffffffffa007410c>] logfs_writepage+0xdc/0x110 [logfs]
      [<ffffffff810f5ba7>] __writepage+0x17/0x40
      [<ffffffff810f6208>] write_cache_pages+0x208/0x4f0
      [<ffffffff810f5b90>] ? set_page_dirty+0x70/0x70
      [<ffffffff810f653a>] generic_writepages+0x4a/0x70
      [<ffffffff810f75d1>] do_writepages+0x21/0x40
      [<ffffffff8116b9d1>] writeback_single_inode+0x101/0x250
      [<ffffffff8116bdbd>] writeback_sb_inodes+0xed/0x1c0
      [<ffffffff8116c5fb>] writeback_inodes_wb+0x7b/0x1e0
      [<ffffffff8116cc23>] wb_writeback+0x4c3/0x530
      [<ffffffff814d984d>] ? sub_preempt_count+0x9d/0xd0
      [<ffffffff8116cd6b>] wb_do_writeback+0xdb/0x290
      [<ffffffff814d984d>] ? sub_preempt_count+0x9d/0xd0
      [<ffffffff814d6208>] ? _raw_spin_unlock_irqrestore+0x18/0x40
      [<ffffffff8105aa5a>] ? del_timer+0x8a/0x120
      [<ffffffff8116cfac>] bdi_writeback_thread+0x8c/0x2e0
      [<ffffffff8116cf20>] ? wb_do_writeback+0x290/0x290
      [<ffffffff8106d2e6>] kthread+0x96/0xa0
      [<ffffffff814de514>] kernel_thread_helper+0x4/0x10
      [<ffffffff8106d250>] ? kthread_worker_fn+0x190/0x190
      [<ffffffff814de510>] ? gs_change+0xb/0xb
      RIP  [<ffffffffa007deed>] bdev_writeseg+0x25d/0x270 [logfs]
      ---[ end trace 0211ad60a57657c4 ]---
      Reviewed-by: NJoern Engel <joern@logfs.org>
      Signed-off-by: NPrasad Joshi <prasadjoshi.linux@gmail.com>
      13ced29c
    • J
      logfs: Prevent memory corruption · 934eed39
      Joern Engel 提交于
      This is a bad one.  I wonder whether we were so far protected by
      no_free_segments(sb) usually being smaller than LOGFS_NO_AREAS.
      
      Found by Dan Carpenter <dan.carpenter@oracle.com> using smatch.
      Signed-off-by: NJoern Engel <joern@logfs.org>
      Signed-off-by: NPrasad Joshi <prasadjoshi.linux@gmail.com>
      934eed39
    • P
      logfs: update page reference count for pined pages · 96150606
      Prasad Joshi 提交于
      LogFS sets PG_private flag to indicate a pined page. We assumed that
      marking a page as private is enough to ensure its existence. But
      instead it is necessary to hold a reference count to the page.
      
      The change resolves the following BUG
      
      BUG: Bad page state in process flush-253:16  pfn:6a6d0
      page flags: 0x100000000000808(uptodate|private)
      Suggested-and-Acked-by: NJoern Engel <joern@logfs.org>
      Signed-off-by: NPrasad Joshi <prasadjoshi.linux@gmail.com>
      96150606
  10. 27 1月, 2012 10 次提交
    • C
      Btrfs: fix reservations in btrfs_page_mkwrite · 9998eb70
      Chris Mason 提交于
      Josef fixed btrfs_page_mkwrite to properly release reserved
      extents if there was an error.  But if we fail to get a reservation
      and we fail to dirty the inode (for ENOSPC reasons), we'll end up
      trying to release a reservation we never had.
      
      This makes sure we only release if we were able to reserve.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      9998eb70
    • J
      Btrfs: advance window_start if we're using a bitmap · 9b230628
      Josef Bacik 提交于
      If we span a long area in a bitmap we could end up taking a lot of time
      searching to the next free area if we're searching from the original
      window_start, so advance window_start in order to make sure we don't do any
      superficial searching.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      9b230628
    • D
      btrfs: mask out gfp flags in releasepage · 0c4e538b
      David Sterba 提交于
      btree_releasepage is a callback and can be passed unknown gfp flags and then
      they may end up in kmem_cache_alloc called from alloc_extent_state, slab
      allocator will BUG_ON when there is HIGHMEM or DMA32 flag set.
      
      This may happen when btrfs is mounted from a loop device, which masks out
      __GFP_IO flag. The check in try_release_extent_state
      
      3399                 if ((mask & GFP_NOFS) == GFP_NOFS)
      3400                         mask = GFP_NOFS;
      
      will not work and passes unfiltered flags further resulting in crash at
      mm/slab.c:2963
      
       [<000000000024ae4c>] cache_alloc_refill+0x3b4/0x5c8
       [<000000000024c810>] kmem_cache_alloc+0x204/0x294
       [<00000000001fd3c2>] mempool_alloc+0x52/0x170
       [<000003c000ced0b0>] alloc_extent_state+0x40/0xd4 [btrfs]
       [<000003c000cee5ae>] __clear_extent_bit+0x38a/0x4cc [btrfs]
       [<000003c000cee78c>] try_release_extent_state+0x9c/0xd4 [btrfs]
       [<000003c000cc4c66>] btree_releasepage+0x7e/0xd0 [btrfs]
       [<0000000000210d84>] shrink_page_list+0x6a0/0x724
       [<0000000000211394>] shrink_inactive_list+0x230/0x578
       [<0000000000211bb8>] shrink_list+0x6c/0x120
       [<0000000000211e4e>] shrink_zone+0x1e2/0x228
       [<0000000000211f24>] shrink_zones+0x90/0x254
       [<0000000000213410>] do_try_to_free_pages+0xac/0x420
       [<0000000000213ae0>] try_to_free_pages+0x13c/0x1b0
       [<0000000000204e6c>] __alloc_pages_nodemask+0x5b4/0x9a8
       [<00000000001fb04a>] grab_cache_page_write_begin+0x7e/0xe8
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      0c4e538b
    • M
      Btrfs: fix enospc error caused by wrong checks of the chunk · 9e622d6b
      Miao Xie 提交于
      When we did sysbench test for inline files, enospc error happened easily though
      there was lots of free disk space which could be allocated for new chunks.
      
      Reproduce steps:
       # mkfs.btrfs -b $((2 * 1024 * 1024 * 1024)) <test partition>
       # mount <test partition> /mnt
       # ulimit -n 102400
       # cd /mnt
       # sysbench --num-threads=1 --test=fileio --file-num=81920 \
       > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
       > --file-test-mode=seqwr prepare
       # sysbench --num-threads=1 --test=fileio --file-num=81920 \
       > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
       > --file-test-mode=seqwr run
       <soon later, BUG_ON() was triggered by enospc error>
      
      The reason of this bug is:
      Now, we can reserve space which is larger than the free space in the chunks if
      we have enough free disk space which can be used for new chunks. By this way,
      the space allocator should allocate a new chunk by force if there is no free
      space in the free space cache. But there are two wrong checks which break this
      operation.
      
      One is
      	if (ret == -ENOSPC && num_bytes > min_alloc_size)
      in btrfs_reserve_extent(), it is wrong, we should try to allocate a new chunk
      even we fail to allocate free space by minimum allocable size.
      
      The other is
      	if (space_info->force_alloc)
      		force = space_info->force_alloc;
      in do_chunk_alloc(). It makes the allocator ignore CHUNK_ALLOC_FORCE If someone
      sets ->force_alloc to CHUNK_ALLOC_LIMITED, and makes the enospc error happen.
      
      Fix these two wrong checks. Especially the second one, we fix it by changing
      the value of CHUNK_ALLOC_LIMITED and CHUNK_ALLOC_FORCE, and make
      CHUNK_ALLOC_FORCE greater than CHUNK_ALLOC_LIMITED since CHUNK_ALLOC_FORCE has
      higher priority. And if the value which is passed in by the caller is greater
      than ->force_alloc, use the passed value.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      9e622d6b
    • L
      Btrfs: do not defrag a file partially · 7ec31b54
      Liu Bo 提交于
      xfstests 218 complains that btrfs defrags a file partially:
       After: 1
       Write backwards sync, but contiguous - should defrag to 1 extent
       Before: 10
      -After: 1
      +After: 2
      
      To fix this, we need to set max_to_defrag count properly.
      Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      7ec31b54
    • S
      Btrfs: fix warning for 32-bit build of fs/btrfs/check-integrity.c · 0b485143
      Stefan Behrens 提交于
      There have been 4 warnings on 32-bit build, they are herewith fixed.
      Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      0b485143
    • J
      Btrfs: use cluster->window_start when allocating from a cluster bitmap · 0b4a9d24
      Josef Bacik 提交于
      We specifically set window_start in the cluster struct to indicate where the
      cluster starts in a bitmap, but we've been using min_start to indicate where
      we're searching from.  This is usually the start of the blockgroup, so
      essentially means we're constantly searching from the start of any bitmap we
      find, which completely negates all the trouble we go to in order to setup a
      cluster.  So start using window_start to make sure we actually use the area we
      found.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      0b4a9d24
    • M
      Btrfs: Check for NULL page in extent_range_uptodate · 8bedd51b
      Mitch Harder 提交于
      A user has encountered a NULL pointer kernel oops in btrfs when
      encountering media errors.  The problem has been identified
      as an unhandled NULL pointer returned from find_get_page().
      This modification simply checks for a NULL page, and returns
      with an error if found (the extent_range_uptodate() function
      returns 1 on errors).
      
      After testing this patch, the user reported that the error with
      the NULL pointer oops was solved.  However, there is still a
      remaining problem with a thread becoming stuck in
      wait_on_page_locked(page) in the read_extent_buffer_pages(...)
      function in extent_io.c
      
             for (i = start_i; i < num_pages; i++) {
                     page = extent_buffer_page(eb, i);
                     wait_on_page_locked(page);
                     if (!PageUptodate(page))
                             ret = -EIO;
             }
      
      This patch leaves the issue with the locked page yet to be resolved.
      Signed-off-by: NMitch Harder <mitch.harder@sabayonlinux.org>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      8bedd51b
    • J
      btrfs: Fix busyloops in transaction waiting code · 6dd70ce4
      Jan Kara 提交于
      wait_log_commit() and wait_for_writer() were using slightly different
      conditions for deciding whether they should call schedule() and whether they
      should continue in the wait loop. Thus it could happen that we busylooped when
      the first condition was not true while the second one was. That is burning CPU
      cycles needlessly and is deadly on UP machines...
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      6dd70ce4
    • J
      Btrfs: make sure a bitmap has enough bytes · 357b9784
      Josef Bacik 提交于
      We have only been checking for min_bytes available in bitmap entries, but we
      won't successfully setup a bitmap cluster unless it has at least bytes in the
      bitmap, so in the common case min_bytes is 4k and we want something like 2MB, so
      if there are a bunch of bitmap entries with less than 2mb's in them, we'll
      search all them anyway, which is suboptimal.  Fix this check.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      357b9784