1. 02 2月, 2011 1 次提交
    • E
      fs/vfs/security: pass last path component to LSM on inode creation · 2a7dba39
      Eric Paris 提交于
      SELinux would like to implement a new labeling behavior of newly created
      inodes.  We currently label new inodes based on the parent and the creating
      process.  This new behavior would also take into account the name of the
      new object when deciding the new label.  This is not the (supposed) full path,
      just the last component of the path.
      
      This is very useful because creating /etc/shadow is different than creating
      /etc/passwd but the kernel hooks are unable to differentiate these
      operations.  We currently require that userspace realize it is doing some
      difficult operation like that and than userspace jumps through SELinux hoops
      to get things set up correctly.  This patch does not implement new
      behavior, that is obviously contained in a seperate SELinux patch, but it
      does pass the needed name down to the correct LSM hook.  If no such name
      exists it is fine to pass NULL.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      2a7dba39
  2. 07 1月, 2011 3 次提交
    • N
      ext2,3,4: provide simple rcu-walk ACL implementation · 73598611
      Nick Piggin 提交于
      This simple implementation just checks for no ACLs on the inode, and
      if so, then the rcu-walk may proceed, otherwise fail it.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      73598611
    • N
      fs: provide rcu-walk aware permission i_ops · b74c79e9
      Nick Piggin 提交于
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      b74c79e9
    • N
      fs: icache RCU free inodes · fa0d7e3d
      Nick Piggin 提交于
      RCU free the struct inode. This will allow:
      
      - Subsequent store-free path walking patch. The inode must be consulted for
        permissions when walking, so an RCU inode reference is a must.
      - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
        to take i_lock no longer need to take sb_inode_list_lock to walk the list in
        the first place. This will simplify and optimize locking.
      - Could remove some nested trylock loops in dcache code
      - Could potentially simplify things a bit in VM land. Do not need to take the
        page lock to follow page->mapping.
      
      The downsides of this is the performance cost of using RCU. In a simple
      creat/unlink microbenchmark, performance drops by about 10% due to inability to
      reuse cache-hot slab objects. As iterations increase and RCU freeing starts
      kicking over, this increases to about 20%.
      
      In cases where inode lifetimes are longer (ie. many inodes may be allocated
      during the average life span of a single inode), a lot of this cache reuse is
      not applicable, so the regression caused by this patch is smaller.
      
      The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
      however this adds some complexity to list walking and store-free path walking,
      so I prefer to implement this at a later date, if it is shown to be a win in
      real situations. I haven't found a regression in any non-micro benchmark so I
      doubt it will be a problem.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      fa0d7e3d
  3. 24 12月, 2010 1 次提交
  4. 15 12月, 2010 2 次提交
    • A
      ext4: fix typo which broke '..' detection in ext4_find_entry() · 6d5c3aa8
      Aaro Koskinen 提交于
      There should be a check for the NUL character instead of '0'.
      
      Fortunately the only thing that cares about this is NFS serving, which
      is why we didn't notice this in the merge window testing.
      Reported-by: NPhil Carmody <ext-phil.2.carmody@nokia.com>
      Signed-off-by: NAaro Koskinen <aaro.koskinen@nokia.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      6d5c3aa8
    • T
      ext4: Turn off multiple page-io submission by default · 1449032b
      Theodore Ts'o 提交于
      Jon Nelson has found a test case which causes postgresql to fail with
      the error:
      
      psql:t.sql:4: ERROR: invalid page header in block 38269 of relation base/16384/16581
      
      Under memory pressure, it looks like part of a file can end up getting
      replaced by zero's.  Until we can figure out the cause, we'll roll
      back the change and use block_write_full_page() instead of
      ext4_bio_write_page().  The new, more efficient writing function can
      be used via the mount option mblk_io_submit, so we can test and fix
      the new page I/O code.
      
      To reproduce the problem, install postgres 8.4 or 9.0, and pin enough
      memory such that the system just at the end of triggering writeback
      before running the following sql script:
      
      begin;
      create temporary table foo as select x as a, ARRAY[x] as b FROM
      generate_series(1, 10000000 ) AS x;
      create index foo_a_idx on foo (a);
      create index foo_b_idx on foo USING GIN (b);
      rollback;
      
      If the temporary table is created on a hard drive partition which is
      encrypted using dm_crypt, then under memory pressure, approximately
      30-40% of the time, pgsql will issue the above failure.
      
      This patch should fix this problem, and the problem will come back if
      the file system is mounted with the mblk_io_submit mount option.
      Reported-by: NJon Nelson <jnelson@jamponi.net>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      1449032b
  5. 20 11月, 2010 2 次提交
    • L
      ext4: Add EXT4_IOC_TRIM ioctl to handle batched discard · e681c047
      Lukas Czerner 提交于
      Filesystem independent ioctl was rejected as not common enough to be in
      core vfs ioctl. Since we still need to access to this functionality this
      commit adds ext4 specific ioctl EXT4_IOC_TRIM to dispatch
      ext4_trim_fs().
      
      It takes fstrim_range structure as an argument. fstrim_range is definec in
      the include/linux/fs.h and its definition is as follows.
      
      struct fstrim_range {
      	__u64 start;
      	__u64 len;
      	__u64 minlen;
      }
      
      start	- first Byte to trim
      len	- number of Bytes to trim from start
      minlen	- minimum extent length to trim, free extents shorter than this
        number of Bytes will be ignored. This will be rounded up to fs
        block size.
      
      After the FITRIM is done, the number of actually discarded Bytes is stored
      in fstrim_range.len to give the user better insight on how much storage
      space has been really released for wear-leveling.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      e681c047
    • L
      fs: Do not dispatch FITRIM through separate super_operation · 93bb41f4
      Lukas Czerner 提交于
      There was concern that FITRIM ioctl is not common enough to be included
      in core vfs ioctl, as Christoph Hellwig pointed out there's no real point
      in dispatching this out to a separate vector instead of just through
      ->ioctl.
      
      So this commit removes ioctl_fstrim() from vfs ioctl and trim_fs
      from super_operation structure.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      93bb41f4
  6. 19 11月, 2010 1 次提交
    • D
      ext4: ext4_fill_super shouldn't return 0 on corruption · 5a9ae68a
      Darrick J. Wong 提交于
      At the start of ext4_fill_super, ret is set to -EINVAL, and any failure path
      out of that function returns ret.  However, the generic_check_addressable
      clause sets ret = 0 (if it passes), which means that a subsequent failure (e.g.
      a group checksum error) returns 0 even though the mount should fail.  This
      causes vfs_kern_mount in turn to think that the mount succeeded, leading to an
      oops.
      
      A simple fix is to avoid using ret for the generic_check_addressable check,
      which was last changed in commit 30ca22c7.
      Signed-off-by: NDarrick J. Wong <djwong@us.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      5a9ae68a
  7. 18 11月, 2010 2 次提交
  8. 09 11月, 2010 5 次提交
    • T
      ext4: Add new ext4 inode tracepoints · 7ff9c073
      Theodore Ts'o 提交于
      Add ext4_evict_inode, ext4_drop_inode, ext4_mark_inode_dirty, and
      ext4_begin_ordered_truncate()
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      7ff9c073
    • T
      ext4: Don't call sb_issue_discard() in ext4_free_blocks() · b56ff9d3
      Theodore Ts'o 提交于
      Commit 5c521830 (ext4: Support discard requests when running in
      no-journal mode) attempts to add sb_issue_discard() for data blocks
      (in data=writeback mode) and in no-journal mode.  Unfortunately, this
      no longer works, because in commit dd3932ed (block: remove
      BLKDEV_IFL_WAIT), sb_issue_discard() only presents a synchronous
      interface, and there are times when we call ext4_free_blocks() when we
      are are holding a spinlock, or are otherwise in an atomic context.
      
      For now, I've removed the call to sb_issue_discard() to prevent a
      deadlock or (if spinlock debugging is enabled) failures like this:
      
      BUG: scheduling while atomic: rc.sysinit/1376/0x00000002
      Pid: 1376, comm: rc.sysinit Not tainted 2.6.36-ARCH #1
      Call Trace:
      [<ffffffff810397ce>] __schedule_bug+0x5e/0x70
      [<ffffffff81403110>] schedule+0x950/0xa70
      [<ffffffff81060bad>] ? insert_work+0x7d/0x90
      [<ffffffff81060fbd>] ? queue_work_on+0x1d/0x30
      [<ffffffff81061127>] ? queue_work+0x37/0x60
      [<ffffffff8140377d>] schedule_timeout+0x21d/0x360
      [<ffffffff812031c3>] ? generic_make_request+0x2c3/0x540
      [<ffffffff81402680>] wait_for_common+0xc0/0x150
      [<ffffffff81041490>] ? default_wake_function+0x0/0x10
      [<ffffffff812034bc>] ? submit_bio+0x7c/0x100
      [<ffffffff810680a0>] ? wake_bit_function+0x0/0x40
      [<ffffffff814027b8>] wait_for_completion+0x18/0x20
      [<ffffffff8120a969>] blkdev_issue_discard+0x1b9/0x210
      [<ffffffff811ba03e>] ext4_free_blocks+0x68e/0xb60
      [<ffffffff811b1650>] ? __ext4_handle_dirty_metadata+0x110/0x120
      [<ffffffff811b098c>] ext4_ext_truncate+0x8cc/0xa70
      [<ffffffff810d713e>] ? pagevec_lookup+0x1e/0x30
      [<ffffffff81191618>] ext4_truncate+0x178/0x5d0
      [<ffffffff810eacbb>] ? unmap_mapping_range+0xab/0x280
      [<ffffffff810d8976>] vmtruncate+0x56/0x70
      [<ffffffff811925cb>] ext4_setattr+0x14b/0x460
      [<ffffffff811319e4>] notify_change+0x194/0x380
      [<ffffffff81117f80>] do_truncate+0x60/0x90
      [<ffffffff811e08fa>] ? security_inode_permission+0x1a/0x20
      [<ffffffff811eaec1>] ? tomoyo_path_truncate+0x11/0x20
      [<ffffffff81127539>] do_last+0x5d9/0x770
      [<ffffffff811278bd>] do_filp_open+0x1ed/0x680
      [<ffffffff8140644f>] ? page_fault+0x1f/0x30
      [<ffffffff81132bfc>] ? alloc_fd+0xec/0x140
      [<ffffffff81118db1>] do_sys_open+0x61/0x120
      [<ffffffff81118e8b>] sys_open+0x1b/0x20
      [<ffffffff81002e6b>] system_call_fastpath+0x16/0x1b
      
      https://bugzilla.kernel.org/show_bug.cgi?id=22302Reported-by: NMathias Burén <mathias.buren@gmail.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: jiayingz@google.com
      b56ff9d3
    • D
      ext4: do not try to grab the s_umount semaphore in ext4_quota_off · 87009d86
      Dmitry Monakhov 提交于
      It's not needed to sync the filesystem, and it fixes a lock_dep complaint.
      Signed-off-by: NDmitry Monakhov <dmonakhov@gmail.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      87009d86
    • T
      ext4: fix potential race when freeing ext4_io_page structures · 83668e71
      Theodore Ts'o 提交于
      Use an atomic_t and make sure we don't free the structure while we
      might still be submitting I/O for that page.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      83668e71
    • T
      ext4: handle writeback of inodes which are being freed · f7ad6d2e
      Theodore Ts'o 提交于
      The following BUG can occur when an inode which is getting freed when
      it still has dirty pages outstanding, and it gets deleted (in this
      because it was the target of a rename).  In ordered mode, we need to
      make sure the data pages are written just in case we crash before the
      rename (or unlink) is committed.  If the inode is being freed then
      when we try to igrab the inode, we end up tripping the BUG_ON at
      fs/ext4/page-io.c:146.
      
      To solve this problem, we need to keep track of the number of io
      callbacks which are pending, and avoid destroying the inode until they
      have all been completed.  That way we don't have to bump the inode
      count to keep the inode from being destroyed; an approach which
      doesn't work because the count could have already been dropped down to
      zero before the inode writeback has started (at which point we're not
      allowed to bump the count back up to 1, since it's already started
      getting freed).
      
      Thanks to Dave Chinner for suggesting this approach, which is also
      used by XFS.
      
        kernel BUG at /scratch_space/linux-2.6/fs/ext4/page-io.c:146!
        Call Trace:
         [<ffffffff811075b1>] ext4_bio_write_page+0x172/0x307
         [<ffffffff811033a7>] mpage_da_submit_io+0x2f9/0x37b
         [<ffffffff811068d7>] mpage_da_map_and_submit+0x2cc/0x2e2
         [<ffffffff811069b3>] mpage_add_bh_to_extent+0xc6/0xd5
         [<ffffffff81106c66>] write_cache_pages_da+0x2a4/0x3ac
         [<ffffffff81107044>] ext4_da_writepages+0x2d6/0x44d
         [<ffffffff81087910>] do_writepages+0x1c/0x25
         [<ffffffff810810a4>] __filemap_fdatawrite_range+0x4b/0x4d
         [<ffffffff810815f5>] filemap_fdatawrite_range+0xe/0x10
         [<ffffffff81122a2e>] jbd2_journal_begin_ordered_truncate+0x7b/0xa2
         [<ffffffff8110615d>] ext4_evict_inode+0x57/0x24c
         [<ffffffff810c14a3>] evict+0x22/0x92
         [<ffffffff810c1a3d>] iput+0x212/0x249
         [<ffffffff810bdf16>] dentry_iput+0xa1/0xb9
         [<ffffffff810bdf6b>] d_kill+0x3d/0x5d
         [<ffffffff810be613>] dput+0x13a/0x147
         [<ffffffff810b990d>] sys_renameat+0x1b5/0x258
         [<ffffffff81145f71>] ? _atomic_dec_and_lock+0x2d/0x4c
         [<ffffffff810b2950>] ? cp_new_stat+0xde/0xea
         [<ffffffff810b29c1>] ? sys_newlstat+0x2d/0x38
         [<ffffffff810b99c6>] sys_rename+0x16/0x18
         [<ffffffff81002a2b>] system_call_fastpath+0x16/0x1b
      Reported-by: NNick Bowler <nbowler@elliptictech.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Tested-by: NNick Bowler <nbowler@elliptictech.com>
      f7ad6d2e
  9. 04 11月, 2010 1 次提交
  10. 03 11月, 2010 2 次提交
  11. 02 11月, 2010 1 次提交
  12. 29 10月, 2010 3 次提交
  13. 28 10月, 2010 16 次提交