1. 19 3月, 2012 1 次提交
  2. 28 1月, 2012 9 次提交
    • J
      Logfs: Allow NULL block_isbad() methods · f2933e86
      Joern Engel 提交于
      Not all mtd drivers define block_isbad().  Let's assume no bad blocks
      instead of refusing to mount.
      Signed-off-by: NJoern Engel <joern@logfs.org>
      f2933e86
    • J
      logfs: Grow inode in delete path · bbe01387
      Joern Engel 提交于
      Can be necessary if an inode gets deleted (through -ENOSPC) before being
      written.  Might be better to move this into logfs_write_rec(), but for
      now go with the stupid&safe patch.
      Signed-off-by: NJoern Engel <joern@logfs.org>
      bbe01387
    • J
      logfs: Free areas before calling generic_shutdown_super() · 1bcceaff
      Joern Engel 提交于
      Or hit an assertion in map_invalidatepage() instead.
      Signed-off-by: NJoern Engel <joern@logfs.org>
      1bcceaff
    • J
      logfs: remove useless BUG_ON · 6c69494f
      Joern Engel 提交于
      It prevents write sizes >4k.
      Signed-off-by: NJoern Engel <joern@logfs.org>
      6c69494f
    • P
      logfs: Propagate page parameter to __logfs_write_inode · 0bd90387
      Prasad Joshi 提交于
      During GC LogFS has to rewrite each valid block to a separate segment.
      Rewrite operation reads data from an old segment and writes it to a
      newly allocated segment. Since every write operation changes data
      block pointers maintained in inode, inode should also be rewritten.
      
      In GC path to avoid AB-BA deadlock LogFS marks a page with
      PG_pre_locked in addition to locking the page (PG_locked). The page
      lock is ignored iff the page is pre-locked.
      
      LogFS uses a special file called segment file. The segment file
      maintains an 8 bytes entry for every segment. It keeps track of erase
      count, level etc. for every segment.
      
      Bad things happen with a segment belonging to the segment file is GCed
      
       ------------[ cut here ]------------
      kernel BUG at /home/prasad/logfs/readwrite.c:297!
      invalid opcode: 0000 [#1] SMP
      Modules linked in: logfs joydev usbhid hid psmouse e1000 i2c_piix4
      		serio_raw [last unloaded: logfs]
      Pid: 20161, comm: mount Not tainted 3.1.0-rc3+ #3 innotek GmbH
      		VirtualBox
      EIP: 0060:[<f809132a>] EFLAGS: 00010292 CPU: 0
      EIP is at logfs_lock_write_page+0x6a/0x70 [logfs]
      EAX: 00000027 EBX: f73f5b20 ECX: c16007c8 EDX: 00000094
      ESI: 00000000 EDI: e59be6e4 EBP: c7337b28 ESP: c7337b18
      DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      Process mount (pid: 20161, ti=c7336000 task=eb323f70 task.ti=c7336000)
      Stack:
      f8099a3d c7337b24 f73f5b20 00001002 c7337b50 f8091f6d f8099a4d f80994e4
      00000003 00000000 c7337b68 00000000 c67e4400 00001000 c7337b80 f80935e5
      00000000 00000000 00000000 00000000 e1fcf000 0000000f e59be618 c70bf900
      Call Trace:
      [<f8091f6d>] logfs_get_write_page.clone.16+0xdd/0x100 [logfs]
      [<f80935e5>] logfs_mod_segment_entry+0x55/0x110 [logfs]
      [<f809460d>] logfs_get_segment_entry+0x1d/0x20 [logfs]
      [<f8091060>] ? logfs_cleanup_journal+0x50/0x50 [logfs]
      [<f809521b>] ostore_get_erase_count+0x1b/0x40 [logfs]
      [<f80965b8>] logfs_open_area+0xc8/0x150 [logfs]
      [<c141a7ec>] ? kmemleak_alloc+0x2c/0x60
      [<f809668e>] __logfs_segment_write.clone.16+0x4e/0x1b0 [logfs]
      [<c10dd563>] ? mempool_kmalloc+0x13/0x20
      [<c10dd563>] ? mempool_kmalloc+0x13/0x20
      [<f809696f>] logfs_segment_write+0x17f/0x1d0 [logfs]
      [<f8092e8c>] logfs_write_i0+0x11c/0x180 [logfs]
      [<f8092f35>] logfs_write_direct+0x45/0x90 [logfs]
      [<f80934cd>] __logfs_write_buf+0xbd/0xf0 [logfs]
      [<c102900e>] ? kmap_atomic_prot+0x4e/0xe0
      [<f809424b>] logfs_write_buf+0x3b/0x60 [logfs]
      [<f80947a9>] __logfs_write_inode+0xa9/0x110 [logfs]
      [<f8094cb0>] logfs_rewrite_block+0xc0/0x110 [logfs]
      [<f8095300>] ? get_mapping_page+0x10/0x60 [logfs]
      [<f8095aa0>] ? logfs_load_object_aliases+0x2e0/0x2f0 [logfs]
      [<f808e57d>] logfs_gc_segment+0x2ad/0x310 [logfs]
      [<f808e62a>] __logfs_gc_once+0x4a/0x80 [logfs]
      [<f808ed43>] logfs_gc_pass+0x683/0x6a0 [logfs]
      [<f8097a89>] logfs_mount+0x5a9/0x680 [logfs]
      [<c1126b21>] mount_fs+0x21/0xd0
      [<c10f6f6f>] ? __alloc_percpu+0xf/0x20
      [<c113da41>] ? alloc_vfsmnt+0xb1/0x130
      [<c113db4b>] vfs_kern_mount+0x4b/0xa0
      [<c113e06e>] do_kern_mount+0x3e/0xe0
      [<c113f60d>] do_mount+0x34d/0x670
      [<c10f2749>] ? strndup_user+0x49/0x70
      [<c113fcab>] sys_mount+0x6b/0xa0
      [<c142d87c>] syscall_call+0x7/0xb
      Code: f8 e8 8b 93 39 c9 8b 45 f8 3e 0f ba 28 00 19 d2 85 d2 74 ca eb d0 0f 0b 8d 45 fc 89 44 24 04 c7 04 24 3d 9a 09 f8 e8 09 92 39 c9 <0f> 0b 8d 74 26 00 55 89 e5 3e 8d 74 26 00 8b 10 80 e6 01 74 09
      EIP: [<f809132a>] logfs_lock_write_page+0x6a/0x70 [logfs] SS:ESP 0068:c7337b18
      ---[ end trace 96e67d5b3aa3d6ca ]---
      
      The patch passes locked page to __logfs_write_inode. It calls function
      logfs_get_wblocks() to pre-lock the page. This ensures any further
      attempts to lock the page are ignored (esp from get_erase_count).
      Acked-by: NJoern Engel <joern@logfs.org>
      Signed-off-by: NPrasad Joshi <prasadjoshi.linux@gmail.com>
      0bd90387
    • P
      logfs: set superblock shutdown flag after generic sb shutdown · ecfd8909
      Prasad Joshi 提交于
      While unmounting the file system LogFS calls generic_shutdown_super.
      The function does file system independent superblock shutdown.
      However, it might result in call file system specific inode eviction.
      
      LogFS marks FS shutting down by setting bit LOGFS_SB_FLAG_SHUTDOWN in
      super->s_flags. Since, inode eviction might call truncate on inode,
      following BUG is observed when file system is unmounted:
      
      ------------[ cut here ]------------
      kernel BUG at /home/prasad/logfs/segment.c:362!
      invalid opcode: 0000 [#1] PREEMPT SMP
      CPU 3
      Modules linked in: logfs binfmt_misc ppdev virtio_blk parport_pc lp
      	parport psmouse floppy virtio_pci serio_raw virtio_ring virtio
      
      Pid: 1933, comm: umount Not tainted 3.0.0+ #4 Bochs Bochs
      RIP: 0010:[<ffffffffa008c841>]  [<ffffffffa008c841>]
      		logfs_segment_write+0x211/0x230 [logfs]
      RSP: 0018:ffff880062d7b9e8  EFLAGS: 00010202
      RAX: 000000000000000e RBX: ffff88006eca9000 RCX: 0000000000000000
      RDX: ffff88006fd87c40 RSI: ffffea00014ff468 RDI: ffff88007b68e000
      RBP: ffff880062d7ba48 R08: 8000000020451430 R09: 0000000000000000
      R10: dead000000100100 R11: 0000000000000000 R12: ffff88006fd87c40
      R13: ffffea00014ff468 R14: ffff88005ad0a460 R15: 0000000000000000
      FS:  00007f25d50ea760(0000) GS:ffff88007fd80000(0000)
      	knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000d05e48 CR3: 0000000062c72000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process umount (pid: 1933, threadinfo ffff880062d7a000,
      	task ffff880070b44500)
      Stack:
      ffff880062d7ba38 ffff88005ad0a508 0000000000001000 0000000000000000
      8000000020451430 ffffea00014ff468 ffff880062d7ba48 ffff88005ad0a460
      ffff880062d7bad8 ffffea00014ff468 ffff88006fd87c40 0000000000000000
      Call Trace:
      [<ffffffffa0088fee>] logfs_write_i0+0x12e/0x190 [logfs]
      [<ffffffffa0089360>] __logfs_write_rec+0x140/0x220 [logfs]
      [<ffffffffa0089312>] __logfs_write_rec+0xf2/0x220 [logfs]
      [<ffffffffa00894a4>] logfs_write_rec+0x64/0xd0 [logfs]
      [<ffffffffa0089616>] __logfs_write_buf+0x106/0x110 [logfs]
      [<ffffffffa008a19e>] logfs_write_buf+0x4e/0x80 [logfs]
      [<ffffffffa008a6b8>] __logfs_write_inode+0x98/0x110 [logfs]
      [<ffffffffa008a7c4>] logfs_truncate+0x54/0x290 [logfs]
      [<ffffffffa008abfc>] logfs_evict_inode+0xdc/0x190 [logfs]
      [<ffffffff8115eef5>] evict+0x85/0x170
      [<ffffffff8115f126>] iput+0xe6/0x1b0
      [<ffffffff8115b4a8>] shrink_dcache_for_umount_subtree+0x218/0x280
      [<ffffffff8115ce91>] shrink_dcache_for_umount+0x51/0x90
      [<ffffffff8114796c>] generic_shutdown_super+0x2c/0x100
      [<ffffffffa008cc47>] logfs_kill_sb+0x57/0xf0 [logfs]
      [<ffffffff81147de5>] deactivate_locked_super+0x45/0x70
      [<ffffffff811487ea>] deactivate_super+0x4a/0x70
      [<ffffffff81163934>] mntput_no_expire+0xa4/0xf0
      [<ffffffff8116469f>] sys_umount+0x6f/0x380
      [<ffffffff814dd46b>] system_call_fastpath+0x16/0x1b
      Code: 55 c8 49 8d b6 a8 00 00 00 45 89 f9 45 89 e8 4c 89 e1 4c 89 55
      b8 c7 04 24 00 00 00 00 e8 68 fc ff ff 4c 8b 55 b8 e9 3c ff ff ff <0f>
      0b 0f 0b c7 45 c0 00 00 00 00 e9 44 fe ff ff 66 66 66 66 66
      RIP  [<ffffffffa008c841>] logfs_segment_write+0x211/0x230 [logfs]
      RSP <ffff880062d7b9e8>
      ---[ end trace fe6b040cea952290 ]---
      
      Therefore, move super->s_flags setting after the fs-indenpendent work
      has been finished.
      Reviewed-by: NJoern Engel <joern@logfs.org>
      Signed-off-by: NPrasad Joshi <prasadjoshi.linux@gmail.com>
      ecfd8909
    • P
      logfs: take write mutex lock during fsync and sync · 13ced29c
      Prasad Joshi 提交于
      LogFS uses super->s_write_mutex while writing data to disk. Taking the
      same mutex lock in sync and fsync code path solves the following BUG:
      
      ------------[ cut here ]------------
      kernel BUG at /home/prasad/logfs/dev_bdev.c:134!
      
      Pid: 2387, comm: flush-253:16 Not tainted 3.0.0+ #4 Bochs Bochs
      RIP: 0010:[<ffffffffa007deed>]  [<ffffffffa007deed>]
                      bdev_writeseg+0x25d/0x270 [logfs]
      Call Trace:
      [<ffffffffa007c381>] logfs_open_area+0x91/0x150 [logfs]
      [<ffffffff8128dcb2>] ? find_level.clone.9+0x62/0x100
      [<ffffffffa007c49c>] __logfs_segment_write.clone.20+0x5c/0x190 [logfs]
      [<ffffffff810ef005>] ? mempool_kmalloc+0x15/0x20
      [<ffffffff810ef383>] ? mempool_alloc+0x53/0x130
      [<ffffffffa007c7a4>] logfs_segment_write+0x1d4/0x230 [logfs]
      [<ffffffffa0078f8e>] logfs_write_i0+0x12e/0x190 [logfs]
      [<ffffffffa0079300>] __logfs_write_rec+0x140/0x220 [logfs]
      [<ffffffffa0079444>] logfs_write_rec+0x64/0xd0 [logfs]
      [<ffffffffa00795b6>] __logfs_write_buf+0x106/0x110 [logfs]
      [<ffffffffa007a13e>] logfs_write_buf+0x4e/0x80 [logfs]
      [<ffffffffa0073e33>] __logfs_writepage+0x23/0x80 [logfs]
      [<ffffffffa007410c>] logfs_writepage+0xdc/0x110 [logfs]
      [<ffffffff810f5ba7>] __writepage+0x17/0x40
      [<ffffffff810f6208>] write_cache_pages+0x208/0x4f0
      [<ffffffff810f5b90>] ? set_page_dirty+0x70/0x70
      [<ffffffff810f653a>] generic_writepages+0x4a/0x70
      [<ffffffff810f75d1>] do_writepages+0x21/0x40
      [<ffffffff8116b9d1>] writeback_single_inode+0x101/0x250
      [<ffffffff8116bdbd>] writeback_sb_inodes+0xed/0x1c0
      [<ffffffff8116c5fb>] writeback_inodes_wb+0x7b/0x1e0
      [<ffffffff8116cc23>] wb_writeback+0x4c3/0x530
      [<ffffffff814d984d>] ? sub_preempt_count+0x9d/0xd0
      [<ffffffff8116cd6b>] wb_do_writeback+0xdb/0x290
      [<ffffffff814d984d>] ? sub_preempt_count+0x9d/0xd0
      [<ffffffff814d6208>] ? _raw_spin_unlock_irqrestore+0x18/0x40
      [<ffffffff8105aa5a>] ? del_timer+0x8a/0x120
      [<ffffffff8116cfac>] bdi_writeback_thread+0x8c/0x2e0
      [<ffffffff8116cf20>] ? wb_do_writeback+0x290/0x290
      [<ffffffff8106d2e6>] kthread+0x96/0xa0
      [<ffffffff814de514>] kernel_thread_helper+0x4/0x10
      [<ffffffff8106d250>] ? kthread_worker_fn+0x190/0x190
      [<ffffffff814de510>] ? gs_change+0xb/0xb
      RIP  [<ffffffffa007deed>] bdev_writeseg+0x25d/0x270 [logfs]
      ---[ end trace 0211ad60a57657c4 ]---
      Reviewed-by: NJoern Engel <joern@logfs.org>
      Signed-off-by: NPrasad Joshi <prasadjoshi.linux@gmail.com>
      13ced29c
    • J
      logfs: Prevent memory corruption · 934eed39
      Joern Engel 提交于
      This is a bad one.  I wonder whether we were so far protected by
      no_free_segments(sb) usually being smaller than LOGFS_NO_AREAS.
      
      Found by Dan Carpenter <dan.carpenter@oracle.com> using smatch.
      Signed-off-by: NJoern Engel <joern@logfs.org>
      Signed-off-by: NPrasad Joshi <prasadjoshi.linux@gmail.com>
      934eed39
    • P
      logfs: update page reference count for pined pages · 96150606
      Prasad Joshi 提交于
      LogFS sets PG_private flag to indicate a pined page. We assumed that
      marking a page as private is enough to ensure its existence. But
      instead it is necessary to hold a reference count to the page.
      
      The change resolves the following BUG
      
      BUG: Bad page state in process flush-253:16  pfn:6a6d0
      page flags: 0x100000000000808(uptodate|private)
      Suggested-and-Acked-by: NJoern Engel <joern@logfs.org>
      Signed-off-by: NPrasad Joshi <prasadjoshi.linux@gmail.com>
      96150606
  3. 27 1月, 2012 11 次提交
    • C
      Btrfs: fix reservations in btrfs_page_mkwrite · 9998eb70
      Chris Mason 提交于
      Josef fixed btrfs_page_mkwrite to properly release reserved
      extents if there was an error.  But if we fail to get a reservation
      and we fail to dirty the inode (for ENOSPC reasons), we'll end up
      trying to release a reservation we never had.
      
      This makes sure we only release if we were able to reserve.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      9998eb70
    • J
      Btrfs: advance window_start if we're using a bitmap · 9b230628
      Josef Bacik 提交于
      If we span a long area in a bitmap we could end up taking a lot of time
      searching to the next free area if we're searching from the original
      window_start, so advance window_start in order to make sure we don't do any
      superficial searching.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      9b230628
    • D
      btrfs: mask out gfp flags in releasepage · 0c4e538b
      David Sterba 提交于
      btree_releasepage is a callback and can be passed unknown gfp flags and then
      they may end up in kmem_cache_alloc called from alloc_extent_state, slab
      allocator will BUG_ON when there is HIGHMEM or DMA32 flag set.
      
      This may happen when btrfs is mounted from a loop device, which masks out
      __GFP_IO flag. The check in try_release_extent_state
      
      3399                 if ((mask & GFP_NOFS) == GFP_NOFS)
      3400                         mask = GFP_NOFS;
      
      will not work and passes unfiltered flags further resulting in crash at
      mm/slab.c:2963
      
       [<000000000024ae4c>] cache_alloc_refill+0x3b4/0x5c8
       [<000000000024c810>] kmem_cache_alloc+0x204/0x294
       [<00000000001fd3c2>] mempool_alloc+0x52/0x170
       [<000003c000ced0b0>] alloc_extent_state+0x40/0xd4 [btrfs]
       [<000003c000cee5ae>] __clear_extent_bit+0x38a/0x4cc [btrfs]
       [<000003c000cee78c>] try_release_extent_state+0x9c/0xd4 [btrfs]
       [<000003c000cc4c66>] btree_releasepage+0x7e/0xd0 [btrfs]
       [<0000000000210d84>] shrink_page_list+0x6a0/0x724
       [<0000000000211394>] shrink_inactive_list+0x230/0x578
       [<0000000000211bb8>] shrink_list+0x6c/0x120
       [<0000000000211e4e>] shrink_zone+0x1e2/0x228
       [<0000000000211f24>] shrink_zones+0x90/0x254
       [<0000000000213410>] do_try_to_free_pages+0xac/0x420
       [<0000000000213ae0>] try_to_free_pages+0x13c/0x1b0
       [<0000000000204e6c>] __alloc_pages_nodemask+0x5b4/0x9a8
       [<00000000001fb04a>] grab_cache_page_write_begin+0x7e/0xe8
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      0c4e538b
    • M
      Btrfs: fix enospc error caused by wrong checks of the chunk · 9e622d6b
      Miao Xie 提交于
      When we did sysbench test for inline files, enospc error happened easily though
      there was lots of free disk space which could be allocated for new chunks.
      
      Reproduce steps:
       # mkfs.btrfs -b $((2 * 1024 * 1024 * 1024)) <test partition>
       # mount <test partition> /mnt
       # ulimit -n 102400
       # cd /mnt
       # sysbench --num-threads=1 --test=fileio --file-num=81920 \
       > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
       > --file-test-mode=seqwr prepare
       # sysbench --num-threads=1 --test=fileio --file-num=81920 \
       > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
       > --file-test-mode=seqwr run
       <soon later, BUG_ON() was triggered by enospc error>
      
      The reason of this bug is:
      Now, we can reserve space which is larger than the free space in the chunks if
      we have enough free disk space which can be used for new chunks. By this way,
      the space allocator should allocate a new chunk by force if there is no free
      space in the free space cache. But there are two wrong checks which break this
      operation.
      
      One is
      	if (ret == -ENOSPC && num_bytes > min_alloc_size)
      in btrfs_reserve_extent(), it is wrong, we should try to allocate a new chunk
      even we fail to allocate free space by minimum allocable size.
      
      The other is
      	if (space_info->force_alloc)
      		force = space_info->force_alloc;
      in do_chunk_alloc(). It makes the allocator ignore CHUNK_ALLOC_FORCE If someone
      sets ->force_alloc to CHUNK_ALLOC_LIMITED, and makes the enospc error happen.
      
      Fix these two wrong checks. Especially the second one, we fix it by changing
      the value of CHUNK_ALLOC_LIMITED and CHUNK_ALLOC_FORCE, and make
      CHUNK_ALLOC_FORCE greater than CHUNK_ALLOC_LIMITED since CHUNK_ALLOC_FORCE has
      higher priority. And if the value which is passed in by the caller is greater
      than ->force_alloc, use the passed value.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      9e622d6b
    • L
      Btrfs: do not defrag a file partially · 7ec31b54
      Liu Bo 提交于
      xfstests 218 complains that btrfs defrags a file partially:
       After: 1
       Write backwards sync, but contiguous - should defrag to 1 extent
       Before: 10
      -After: 1
      +After: 2
      
      To fix this, we need to set max_to_defrag count properly.
      Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      7ec31b54
    • S
      Btrfs: fix warning for 32-bit build of fs/btrfs/check-integrity.c · 0b485143
      Stefan Behrens 提交于
      There have been 4 warnings on 32-bit build, they are herewith fixed.
      Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      0b485143
    • J
      Btrfs: use cluster->window_start when allocating from a cluster bitmap · 0b4a9d24
      Josef Bacik 提交于
      We specifically set window_start in the cluster struct to indicate where the
      cluster starts in a bitmap, but we've been using min_start to indicate where
      we're searching from.  This is usually the start of the blockgroup, so
      essentially means we're constantly searching from the start of any bitmap we
      find, which completely negates all the trouble we go to in order to setup a
      cluster.  So start using window_start to make sure we actually use the area we
      found.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      0b4a9d24
    • M
      Btrfs: Check for NULL page in extent_range_uptodate · 8bedd51b
      Mitch Harder 提交于
      A user has encountered a NULL pointer kernel oops in btrfs when
      encountering media errors.  The problem has been identified
      as an unhandled NULL pointer returned from find_get_page().
      This modification simply checks for a NULL page, and returns
      with an error if found (the extent_range_uptodate() function
      returns 1 on errors).
      
      After testing this patch, the user reported that the error with
      the NULL pointer oops was solved.  However, there is still a
      remaining problem with a thread becoming stuck in
      wait_on_page_locked(page) in the read_extent_buffer_pages(...)
      function in extent_io.c
      
             for (i = start_i; i < num_pages; i++) {
                     page = extent_buffer_page(eb, i);
                     wait_on_page_locked(page);
                     if (!PageUptodate(page))
                             ret = -EIO;
             }
      
      This patch leaves the issue with the locked page yet to be resolved.
      Signed-off-by: NMitch Harder <mitch.harder@sabayonlinux.org>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      8bedd51b
    • J
      btrfs: Fix busyloops in transaction waiting code · 6dd70ce4
      Jan Kara 提交于
      wait_log_commit() and wait_for_writer() were using slightly different
      conditions for deciding whether they should call schedule() and whether they
      should continue in the wait loop. Thus it could happen that we busylooped when
      the first condition was not true while the second one was. That is burning CPU
      cycles needlessly and is deadly on UP machines...
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      6dd70ce4
    • J
      Btrfs: make sure a bitmap has enough bytes · 357b9784
      Josef Bacik 提交于
      We have only been checking for min_bytes available in bitmap entries, but we
      won't successfully setup a bitmap cluster unless it has at least bytes in the
      bitmap, so in the common case min_bytes is 4k and we want something like 2MB, so
      if there are a bunch of bitmap entries with less than 2mb's in them, we'll
      search all them anyway, which is suboptimal.  Fix this check.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      357b9784
    • J
      Btrfs: fix uninit warning in backref.c · b1375d64
      Jan Schmidt 提交于
      Added initialization with the declaration of ret. It isn't set later on the
      switch-default branch (which should never be taken).
      Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      b1375d64
  4. 26 1月, 2012 12 次提交
    • L
      eCryptfs: move misleading function comments · 1589cb1a
      Li Wang 提交于
       The data encryption was moved from ecryptfs_write_end into
      ecryptfs_writepage, this patch moves the corresponding function
      comments to be consistent with the modification.
      Signed-off-by: NLi Wang <liwang@nudt.edu.cn>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1589cb1a
    • T
      eCryptfs: Fix oops when printing debug info in extent crypto functions · 58ded24f
      Tyler Hicks 提交于
      If pages passed to the eCryptfs extent-based crypto functions are not
      mapped and the module parameter ecryptfs_verbosity=1 was specified at
      loading time, a NULL pointer dereference will occur.
      
      Note that this wouldn't happen on a production system, as you wouldn't
      pass ecryptfs_verbosity=1 on a production system. It leaks private
      information to the system logs and is for debugging only.
      
      The debugging info printed in these messages is no longer very useful
      and rather than doing a kmap() in these debugging paths, it will be
      better to simply remove the debugging paths completely.
      
      https://launchpad.net/bugs/913651Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
      Reported-by: Daniel DeFreez
      Cc: <stable@vger.kernel.org>
      58ded24f
    • T
      eCryptfs: Remove unused ecryptfs_read() · f2cb9335
      Tyler Hicks 提交于
      ecryptfs_read() has been ifdef'ed out for years now and it was
      apparently unused before then. It is time to get rid of it for good.
      Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
      f2cb9335
    • T
      eCryptfs: Check inode changes in setattr · a261a039
      Tyler Hicks 提交于
      Most filesystems call inode_change_ok() very early in ->setattr(), but
      eCryptfs didn't call it at all. It allowed the lower filesystem to make
      the call in its ->setattr() function. Then, eCryptfs would copy the
      appropriate inode attributes from the lower inode to the eCryptfs inode.
      
      This patch changes that and actually calls inode_change_ok() on the
      eCryptfs inode, fairly early in ecryptfs_setattr(). Ideally, the call
      would happen earlier in ecryptfs_setattr(), but there are some possible
      inode initialization steps that must happen first.
      
      Since the call was already being made on the lower inode, the change in
      functionality should be minimal, except for the case of a file extending
      truncate call. In that case, inode_newsize_ok() was never being
      called on the eCryptfs inode. Rather than inode_newsize_ok() catching
      maximum file size errors early on, eCryptfs would encrypt zeroed pages
      and write them to the lower filesystem until the lower filesystem's
      write path caught the error in generic_write_checks(). This patch
      introduces a new function, called ecryptfs_inode_newsize_ok(), which
      checks if the new lower file size is within the appropriate limits when
      the truncate operation will be growing the lower file.
      
      In summary this change prevents eCryptfs truncate operations (and the
      resulting page encryptions), which would exceed the lower filesystem
      limits or FSIZE rlimits, from ever starting.
      Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
      Reviewed-by: NLi Wang <liwang@nudt.edu.cn>
      Cc: <stable@vger.kernel.org>
      a261a039
    • T
      eCryptfs: Make truncate path killable · 5e6f0d76
      Tyler Hicks 提交于
      ecryptfs_write() handles the truncation of eCryptfs inodes. It grabs a
      page, zeroes out the appropriate portions, and then encrypts the page
      before writing it to the lower filesystem. It was unkillable and due to
      the lack of sparse file support could result in tying up a large portion
      of system resources, while encrypting pages of zeros, with no way for
      the truncate operation to be stopped from userspace.
      
      This patch adds the ability for ecryptfs_write() to detect a pending
      fatal signal and return as gracefully as possible. The intent is to
      leave the lower file in a useable state, while still allowing a user to
      break out of the encryption loop. If a pending fatal signal is detected,
      the eCryptfs inode size is updated to reflect the modified inode size
      and then -EINTR is returned.
      Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
      Cc: <stable@vger.kernel.org>
      5e6f0d76
    • L
      eCryptfs: Infinite loop due to overflow in ecryptfs_write() · 684a3ff7
      Li Wang 提交于
      ecryptfs_write() can enter an infinite loop when truncating a file to a
      size larger than 4G. This only happens on architectures where size_t is
      represented by 32 bits.
      
      This was caused by a size_t overflow due to it incorrectly being used to
      store the result of a calculation which uses potentially large values of
      type loff_t.
      
      [tyhicks@canonical.com: rewrite subject and commit message]
      Signed-off-by: NLi Wang <liwang@nudt.edu.cn>
      Signed-off-by: NYunchuan Wen <wenyunchuan@kylinos.com.cn>
      Reviewed-by: NCong Wang <xiyou.wangcong@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
      684a3ff7
    • T
      eCryptfs: Replace miscdev read/write magic numbers · 48399c0b
      Tyler Hicks 提交于
      ecryptfs_miscdev_read() and ecryptfs_miscdev_write() contained many
      magic numbers for specifying packet header field sizes and offsets. This
      patch defines those values and replaces the magic values.
      Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
      48399c0b
    • T
      eCryptfs: Report errors in writes to /dev/ecryptfs · 7f133504
      Tyler Hicks 提交于
      Errors in writes to /dev/ecryptfs were being incorrectly reported by
      returning 0 or the value of the original write count.
      
      This patch clears up the return code assignment in error paths.
      Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
      7f133504
    • T
      eCryptfs: Sanitize write counts of /dev/ecryptfs · db10e556
      Tyler Hicks 提交于
      A malicious count value specified when writing to /dev/ecryptfs may
      result in a a very large kernel memory allocation.
      
      This patch peeks at the specified packet payload size, adds that to the
      size of the packet headers and compares the result with the write count
      value. The resulting maximum memory allocation size is approximately 532
      bytes.
      Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
      Reported-by: NSasha Levin <levinsasha928@gmail.com>
      Cc: <stable@vger.kernel.org>
      db10e556
    • T
      ecryptfs: Remove unnecessary variable initialization · bb450361
      Tim Gardner 提交于
      Removes unneeded variable initialization in ecryptfs_read_metadata(). Also adds
      a small comment to help explain metadata reading logic.
      
      [tyhicks@canonical.com: Pulled out of for-stable patch and wrote commit msg]
      Signed-off-by: NTim Gardner <tim.gardner@canonical.com>
      Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
      bb450361
    • T
      ecryptfs: Improve metadata read failure logging · 30373dc0
      Tim Gardner 提交于
      Print inode on metadata read failure. The only real
      way of dealing with metadata read failures is to delete
      the underlying file system file. Having the inode
      allows one to 'find . -inum INODE`.
      
      [tyhicks@canonical.com: Removed some minor not-for-stable parts]
      Signed-off-by: NTim Gardner <tim.gardner@canonical.com>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
      30373dc0
    • J
      xfs: Fix missing xfs_iunlock() on error recovery path in xfs_readlink() · 9b025eb3
      Jan Kara 提交于
      Commit b52a360b forgot to call xfs_iunlock() when it detected corrupted
      symplink and bailed out. Fix it by jumping to 'out' instead of doing return.
      
      CC: stable@kernel.org
      CC: Carlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NAlex Elder <elder@kernel.org>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com> 
      9b025eb3
  5. 25 1月, 2012 2 次提交
  6. 24 1月, 2012 2 次提交
    • R
      kernel-doc: fix new warnings in debugfs · b5763acc
      Randy Dunlap 提交于
      Fix new kernel-doc warnings:
      
      Warning(fs/debugfs/file.c:556): No description found for parameter 'nregs'
      Warning(fs/debugfs/file.c:556): Excess function parameter 'mregs' description in 'debugfs_print_regs32'
      Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b5763acc
    • W
      proc: clear_refs: do not clear reserved pages · 85e72aa5
      Will Deacon 提交于
      /proc/pid/clear_refs is used to clear the Referenced and YOUNG bits for
      pages and corresponding page table entries of the task with PID pid, which
      includes any special mappings inserted into the page tables in order to
      provide things like vDSOs and user helper functions.
      
      On ARM this causes a problem because the vectors page is mapped as a
      global mapping and since ec706dab ("ARM: add a vma entry for the user
      accessible vector page"), a VMA is also inserted into each task for this
      page to aid unwinding through signals and syscall restarts.  Since the
      vectors page is required for handling faults, clearing the YOUNG bit (and
      subsequently writing a faulting pte) means that we lose the vectors page
      *globally* and cannot fault it back in.  This results in a system deadlock
      on the next exception.
      
      To see this problem in action, just run:
      
      	$ echo 1 > /proc/self/clear_refs
      
      on an ARM platform (as any user) and watch your system hang.  I think this
      has been the case since 2.6.37
      
      This patch avoids clearing the aforementioned bits for reserved pages,
      therefore leaving the vectors page intact on ARM.  Since reserved pages
      are not candidates for swap, this change should not have any impact on the
      usefulness of clear_refs.
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Reported-by: NMoussa Ba <moussaba@micron.com>
      Acked-by: NHugh Dickins <hughd@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Acked-by: NNicolas Pitre <nico@linaro.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: <stable@vger.kernel.org>		[2.6.37+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      85e72aa5
  7. 20 1月, 2012 3 次提交