1. 26 1月, 2014 3 次提交
  2. 16 10月, 2013 2 次提交
  3. 29 8月, 2013 1 次提交
  4. 01 8月, 2013 1 次提交
    • E
      ext3: allow specifying external journal by pathname mount option · cf7eff46
      Eric Sandeen 提交于
      It's always been a hassle that if an external journal's
      device number changes, the filesystem won't mount.
      And since boot-time enumeration can change, device number
      changes aren't unusual.
      
      The current mechanism to update the journal location is by
      passing in a mount option w/ a new devnum, but that's a hassle;
      it's a manual approach, fixing things after the fact.
      
      Adding a mount option, "-o journal_path=/dev/$DEVICE" would
      help, since then we can do i.e.
      
      # mount -o journal_path=/dev/disk/by-label/$JOURNAL_LABEL ...
      
      and it'll mount even if the devnum has changed, as shown here:
      
      # losetup /dev/loop0 journalfile
      # mke2fs -L mylabel-journal -O journal_dev /dev/loop0
      # mkfs.ext3 -L mylabel -J device=/dev/loop0 /dev/sdb1
      
      Change the journal device number:
      
      # losetup -d /dev/loop0
      # losetup /dev/loop1 journalfile
      
      And today it will fail:
      
      # mount /dev/sdb1 /mnt/test
      mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
             missing codepage or helper program, or other error
             In some cases useful info is found in syslog - try
             dmesg | tail  or so
      
      # dmesg | tail -n 1
      [17343.240702] EXT3-fs (sdb1): error: couldn't read superblock of external journal
      
      But with this new mount option, we can specify the new path:
      
      # mount -o journal_path=/dev/loop1 /dev/sdb1 /mnt/test
      #
      
      (which does update the encoded device number, incidentally):
      
      # umount /dev/sdb1
      # dumpe2fs -h /dev/sdb1 | grep "Journal device"
      dumpe2fs 1.41.12 (17-May-2010)
      Journal device:	          0x0701
      
      But best of all we can just always mount by journal-path, and
      it'll always work:
      
      # mount -o journal_path=/dev/disk/by-label/mylabel-journal /dev/sdb1 /mnt/test
      #
      
      So the journal_path option can be specified in fstab, and as long as
      the disk is available somewhere, and findable by label (or by UUID),
      we can mount.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      cf7eff46
  5. 21 7月, 2013 1 次提交
    • Z
      ext3: fix a BUG when opening a file with O_TMPFILE flag · dda5690d
      Zheng Liu 提交于
      When we try to open a file with O_TMPFILE flag, we will trigger a bug.
      The root cause is that in ext4_orphan_add() we check ->i_nlink == 0 and
      this check always fails because we set ->i_nlink = 1 in
      inode_init_always().  We can use the following program to trigger it:
      
      int main(int argc, char *argv[])
      {
      	int fd;
      
      	fd = open(argv[1], O_TMPFILE, 0666);
      	if (fd < 0) {
      		perror("open ");
      		return -1;
      	}
      	close(fd);
      	return 0;
      }
      
      The oops message looks like this:
      
      kernel: kernel BUG at fs/ext3/namei.c:1992!
      kernel: invalid opcode: 0000 [#1] SMP
      kernel: Modules linked in: ext4 jbd2 crc16 cpufreq_ondemand ipv6 dm_mirror dm_region_hash dm_log dm_mod parport_pc parport serio_raw sg dcdbas pcspkr i2c_i801 ehci_pci ehci_hcd button acpi_cpufreq mperf e1000e ptp pps_core ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core ext3 jbd sd_mod ahci libahci libata scsi_mod uhci_hcd
      kernel: CPU: 0 PID: 2882 Comm: tst_tmpfile Not tainted 3.11.0-rc1+ #4
      kernel: Hardware name: Dell Inc. OptiPlex 780 /0V4W66, BIOS A05 08/11/2010
      kernel: task: ffff880112d30050 ti: ffff8801124d4000 task.ti: ffff8801124d4000
      kernel: RIP: 0010:[<ffffffffa00db5ae>] [<ffffffffa00db5ae>] ext3_orphan_add+0x6a/0x1eb [ext3]
      kernel: RSP: 0018:ffff8801124d5cc8  EFLAGS: 00010202
      kernel: RAX: 0000000000000000 RBX: ffff880111510128 RCX: ffff8801114683a0
      kernel: RDX: 0000000000000000 RSI: ffff880111510128 RDI: ffff88010fcf65a8
      kernel: RBP: ffff8801124d5d18 R08: 0080000000000000 R09: ffffffffa00d3b7f
      kernel: R10: ffff8801114683a0 R11: ffff8801032a2558 R12: 0000000000000000
      kernel: R13: ffff88010fcf6800 R14: ffff8801032a2558 R15: ffff8801115100d8
      kernel: FS:  00007f5d172b5700(0000) GS:ffff880117c00000(0000) knlGS:0000000000000000
      kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      kernel: CR2: 00007f5d16df15d0 CR3: 0000000110b1d000 CR4: 00000000000407f0
      kernel: Stack:
      kernel: 000000000000000c ffff8801048a7dc8 ffff8801114685a8 ffffffffa00b80d7
      kernel: ffff8801124d5e38 ffff8801032a2558 ffff88010ce24d68 0000000000000000
      kernel: ffff88011146b300 ffff8801124d5d44 ffff8801124d5d78 ffffffffa00db7e1
      kernel: Call Trace:
      kernel: [<ffffffffa00b80d7>] ? journal_start+0x8c/0xbd [jbd]
      kernel: [<ffffffffa00db7e1>] ext3_tmpfile+0xb2/0x13b [ext3]
      kernel: [<ffffffff821076f8>] path_openat+0x11f/0x5e7
      kernel: [<ffffffff821c86b4>] ? list_del+0x11/0x30
      kernel: [<ffffffff82065fa2>] ?  __dequeue_entity+0x33/0x38
      kernel: [<ffffffff82107cd5>] do_filp_open+0x3f/0x8d
      kernel: [<ffffffff82112532>] ? __alloc_fd+0x50/0x102
      kernel: [<ffffffff820f9296>] do_sys_open+0x13b/0x1cd
      kernel: [<ffffffff820f935c>] SyS_open+0x1e/0x20
      kernel: [<ffffffff82398c02>] system_call_fastpath+0x16/0x1b
      kernel: Code: 39 c7 0f 85 67 01 00 00 0f b7 03 25 00 f0 00 00 3d 00 40 00 00 74 18 3d 00 80 00 00 74 11 3d 00 a0 00 00 74 0a 83 7b 48 00 74 04 <0f> 0b eb fe 49 8b 85 50 03 00 00 4c 89 f6 48 c7 c7 c0 99 0e a0
      kernel: RIP  [<ffffffffa00db5ae>] ext3_orphan_add+0x6a/0x1eb [ext3]
      kernel: RSP <ffff8801124d5cc8>
      
      Here we couldn't call clear_nlink() directly because in d_tmpfile() we
      will call inode_dec_link_count() to decrease ->i_nlink.  So this commit
      tries to call d_tmpfile() before ext4_orphan_add() to fix this problem.
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      dda5690d
  6. 05 7月, 2013 1 次提交
  7. 04 7月, 2013 1 次提交
    • M
      mm: vmscan: take page buffers dirty and locked state into account · b4597226
      Mel Gorman 提交于
      Page reclaim keeps track of dirty and under writeback pages and uses it
      to determine if wait_iff_congested() should stall or if kswapd should
      begin writing back pages.  This fails to account for buffer pages that
      can be under writeback but not PageWriteback which is the case for
      filesystems like ext3 ordered mode.  Furthermore, PageDirty buffer pages
      can have all the buffers clean and writepage does no IO so it should not
      be accounted as congested.
      
      This patch adds an address_space operation that filesystems may
      optionally use to check if a page is really dirty or really under
      writeback.  An implementation is provided for for buffer_heads is added
      and used for block operations and ext3 in ordered mode.  By default the
      page flags are obeyed.
      
      Credit goes to Jan Kara for identifying that the page flags alone are
      not sufficient for ext3 and sanity checking a number of ideas on how the
      problem could be addressed.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Cc: Zlatko Calusic <zcalusic@bitsync.net>
      Cc: dormando <dormando@rydia.net>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b4597226
  8. 01 7月, 2013 1 次提交
  9. 29 6月, 2013 2 次提交
    • A
      ext3 ->tmpfile() support · e6bbef95
      Al Viro 提交于
      In this case we do need a bit more than usual, due to orphan
      list handling.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      e6bbef95
    • A
      [readdir] convert ext3 · 5ded75ec
      Al Viro 提交于
      new helper: dir_relax(inode).  Call when you are in location that will
      _not_ be invalidated by directory modifications (block boundary, in case
      of ext*).  Returns whether the directory has survived (dropping i_mutex
      allows rmdir to kill the sucker; if it returns false to us, ->iterate()
      is obviously done)
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5ded75ec
  10. 22 5月, 2013 2 次提交
    • L
      jbd: change journal_invalidatepage() to accept length · d8c8900a
      Lukas Czerner 提交于
      ->invalidatepage() aop now accepts range to invalidate so we can make
      use of it in journal_invalidatepage() and all the users in ext3 file
      system. Also update ext3 trace point to print out length argument.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      d8c8900a
    • L
      mm: change invalidatepage prototype to accept length · d47992f8
      Lukas Czerner 提交于
      Currently there is no way to truncate partial page where the end
      truncate point is not at the end of the page. This is because it was not
      needed and the functionality was enough for file system truncate
      operation to work properly. However more file systems now support punch
      hole feature and it can benefit from mm supporting truncating page just
      up to the certain point.
      
      Specifically, with this functionality truncate_inode_pages_range() can
      be changed so it supports truncating partial page at the end of the
      range (currently it will BUG_ON() if 'end' is not at the end of the
      page).
      
      This commit changes the invalidatepage() address space operation
      prototype to accept range to be invalidated and update all the instances
      for it.
      
      We also change the block_invalidatepage() in the same way and actually
      make a use of the new length argument implementing range invalidation.
      
      Actual file system implementations will follow except the file systems
      where the changes are really simple and should not change the behaviour
      in any way .Implementation for truncate_page_range() which will be able
      to accept page unaligned ranges will follow as well.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      d47992f8
  11. 08 5月, 2013 1 次提交
  12. 07 5月, 2013 1 次提交
  13. 30 4月, 2013 1 次提交
    • D
      mm: make snapshotting pages for stable writes a per-bio operation · 71368511
      Darrick J. Wong 提交于
      Walking a bio's page mappings has proved problematic, so create a new
      bio flag to indicate that a bio's data needs to be snapshotted in order
      to guarantee stable pages during writeback.  Next, for the one user
      (ext3/jbd) of snapshotting, hook all the places where writes can be
      initiated without PG_writeback set, and set BIO_SNAP_STABLE there.
      
      We must also flag journal "metadata" bios for stable writeout, since
      file data can be written through the journal.  Finally, the
      MS_SNAP_STABLE mount flag (only used by ext3) is now superfluous, so get
      rid of it.
      
      [akpm@linux-foundation.org: rename _submit_bh()'s `flags' to `bio_flags', delobotomize the _submit_bh declaration]
      [akpm@linux-foundation.org: teeny cleanup]
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Artem Bityutskiy <dedekind1@gmail.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      71368511
  14. 20 3月, 2013 1 次提交
    • J
      ext3: fix data=journal fast mount/umount hang · e6436921
      Jan Kara 提交于
      In data=journal mode, if we unmount the file system before a
      transaction has a chance to complete, when the journal inode is being
      evicted, we can end up calling into log_wait_commit() for the
      last transaction, after the journalling machinery has been shut down.
      That triggers the WARN_ONCE in __log_start_commit().
      
      Arguably we should adjust ext3_should_journal_data() to return FALSE
      for the journal inode, but the only place it matters is
      ext3_evict_inode(), and so it's to save a bit of CPU time, and to make
      the patch much more obviously correct by inspection(tm), we'll fix it
      by explicitly not trying to waiting for a journal commit when we are
      evicting the journal inode, since it's guaranteed to never succeed in
      this case.
      
      This can be easily replicated via:
      
           mount -t ext3 -o data=journal /dev/vdb /vdb ; umount /vdb
      
      This is a port of ext4 fix from Ted Ts'o.
      Signed-off-by: NJan Kara <jack@suse.cz>
      e6436921
  15. 12 3月, 2013 1 次提交
    • L
      ext3: Fix format string issues · 8d0c2d10
      Lars-Peter Clausen 提交于
      ext3_msg() takes the printk prefix as the second parameter and the
      format string as the third parameter. Two callers of ext3_msg omit the
      prefix and pass the format string as the second parameter and the first
      parameter to the format string as the third parameter. In both cases
      this string comes from an arbitrary source. Which means the string may
      contain format string characters, which will
      lead to undefined and potentially harmful behavior.
      
      The issue was introduced in commit 4cf46b67("ext3: Unify log messages
      in ext3") and is fixed by this patch.
      
      CC: stable@vger.kernel.org
      Signed-off-by: NLars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      8d0c2d10
  16. 04 3月, 2013 1 次提交
    • E
      fs: Limit sys_mount to only request filesystem modules. · 7f78e035
      Eric W. Biederman 提交于
      Modify the request_module to prefix the file system type with "fs-"
      and add aliases to all of the filesystems that can be built as modules
      to match.
      
      A common practice is to build all of the kernel code and leave code
      that is not commonly needed as modules, with the result that many
      users are exposed to any bug anywhere in the kernel.
      
      Looking for filesystems with a fs- prefix limits the pool of possible
      modules that can be loaded by mount to just filesystems trivially
      making things safer with no real cost.
      
      Using aliases means user space can control the policy of which
      filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
      with blacklist and alias directives.  Allowing simple, safe,
      well understood work-arounds to known problematic software.
      
      This also addresses a rare but unfortunate problem where the filesystem
      name is not the same as it's module name and module auto-loading
      would not work.  While writing this patch I saw a handful of such
      cases.  The most significant being autofs that lives in the module
      autofs4.
      
      This is relevant to user namespaces because we can reach the request
      module in get_fs_type() without having any special permissions, and
      people get uncomfortable when a user specified string (in this case
      the filesystem type) goes all of the way to request_module.
      
      After having looked at this issue I don't think there is any
      particular reason to perform any filtering or permission checks beyond
      making it clear in the module request that we want a filesystem
      module.  The common pattern in the kernel is to call request_module()
      without regards to the users permissions.  In general all a filesystem
      module does once loaded is call register_filesystem() and go to sleep.
      Which means there is not much attack surface exposed by loading a
      filesytem module unless the filesystem is mounted.  In a user
      namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
      which most filesystems do not set today.
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Reported-by: NKees Cook <keescook@google.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      7f78e035
  17. 23 2月, 2013 1 次提交
  18. 22 2月, 2013 1 次提交
    • D
      block: optionally snapshot page contents to provide stable pages during write · ffecfd1a
      Darrick J. Wong 提交于
      This provides a band-aid to provide stable page writes on jbd without
      needing to backport the fixed locking and page writeback bit handling
      schemes of jbd2.  The band-aid works by using bounce buffers to snapshot
      page contents instead of waiting.
      
      For those wondering about the ext3 bandage -- fixing the jbd locking
      (which was done as part of ext4dev years ago) is a lot of surgery, and
      setting PG_writeback on data pages when we actually hold the page lock
      dropped ext3 performance by nearly an order of magnitude.  If we're
      going to migrate iscsi and raid to use stable page writes, the
      complaints about high latency will likely return.  We might as well
      centralize their page snapshotting thing to one place.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Tested-by: NAndy Lutomirski <luto@amacapital.net>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Artem Bityutskiy <dedekind1@gmail.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Ron Minnich <rminnich@sandia.gov>
      Cc: Latchesar Ionkov <lucho@ionkov.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ffecfd1a
  19. 21 1月, 2013 5 次提交
  20. 18 12月, 2012 1 次提交
  21. 13 12月, 2012 2 次提交
  22. 20 11月, 2012 1 次提交
  23. 10 10月, 2012 4 次提交
  24. 03 10月, 2012 1 次提交
  25. 18 9月, 2012 3 次提交
    • E
      userns: Convert struct dquot dq_id to be a struct kqid · 4c376dca
      Eric W. Biederman 提交于
      Change struct dquot dq_id to a struct kqid and remove the now
      unecessary dq_type.
      
      Make minimal changes to dquot, quota_tree, quota_v1, quota_v2, ext3,
      ext4, and ocfs2 to deal with the change in quota structures and
      signatures.  The ocfs2 changes are larger than most because of the
      extensive tracing throughout the ocfs2 quota code that prints out
      dq_id.
      
      quota_tree.c:get_index is modified to take a struct kqid instead of a
      qid_t because all of it's callers pass in dquot->dq_id and it allows
      me to introduce only a single conversion.
      
      The rest of the changes are either just replacing dq_type with dq_id.type,
      adding conversions to deal with the change in type and occassionally
      adding qid_eq to allow quota id comparisons in a user namespace safe way.
      
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Theodore Tso <tytso@mit.edu>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      4c376dca
    • E
      userns: Convert extN to support kuids and kgids in posix acls · af84df93
      Eric W. Biederman 提交于
      Convert ext2, ext3, and ext4 to fully support the posix acl changes,
      using e_uid e_gid instead e_id.
      
      Enabled building with posix acls enabled, all filesystems supporting
      user namespaces, now also support posix acls when user namespaces are enabled.
      
      Cc: Theodore Tso <tytso@mit.edu>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      af84df93
    • E
      userns: Pass a userns parameter into posix_acl_to_xattr and posix_acl_from_xattr · 5f3a4a28
      Eric W. Biederman 提交于
       - Pass the user namespace the uid and gid values in the xattr are stored
         in into posix_acl_from_xattr.
      
       - Pass the user namespace kuid and kgid values should be converted into
         when storing uid and gid values in an xattr in posix_acl_to_xattr.
      
      - Modify all callers of posix_acl_from_xattr and posix_acl_to_xattr to
        pass in &init_user_ns.
      
      In the short term this change is not strictly needed but it makes the
      code clearer.  In the longer term this change is necessary to be able to
      mount filesystems outside of the initial user namespace that natively
      store posix acls in the linux xattr format.
      
      Cc: Theodore Tso <tytso@mit.edu>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      5f3a4a28