1. 03 May, 2012: 4 commits
  2. 26 Apr, 2012: 1 commit
    • userns: Rework the user_namespace adding uid/gid mapping support · 22d917d8
      Eric W. Biederman committed
      - Convert the old uid mapping functions into compatibility wrappers
      - Add a uid/gid mapping layer from user space uids and gids to kernel
        internal uids and gids that is extent based for simplicity and speed.
        * Working in the number space after mapping uids/gids into their
          kernel-internal versions adds only the mapping complexity over what
          we have today, leaving the kernel code easy to understand and test.
      - Add proc files /proc/self/uid_map and /proc/self/gid_map.
        These files display the mapping and allow a mapping to be added
        if a mapping does not exist.
      - Allow entering the user namespace without a uid or gid mapping.
        Since we are starting with an existing user, our uids and gids
        still have global mappings, so they are still valid and useful; they
        just don't have local mappings.  What things require in order to work
        is a global uid and gid, so it is odd but perfectly fine not to have a
        local uid and gid mapping.
        Not requiring local uid and gid mappings greatly simplifies
        the logic of setting up the uid and gid mappings by allowing
        the mappings to be set after the namespace is created, which makes the
        slight weirdness worth it.
      - Make the mappings in the initial user namespace to the global
        uid/gid space explicit.  Today it is an identity mapping,
        but in the future we may want to twist this for debugging, similar
        to what we do with jiffies.
      - Document the memory ordering requirements of setting the uid and
        gid mappings.  We only allow the mappings to be set once
        and there are no pointers involved, so the requirements are
        trivial but a little atypical.
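The extent-based mapping and the set-once publication rule described above can be sketched in user-space C. This is a minimal sketch, not the kernel's code. The names id_extent, id_map, map_set_once, map_id_down, map_id_up and parse_extent, and the MAX_EXTENTS limit, are all hypothetical. Each extent is assumed to correspond to one "first lower_first count" line written to /proc/self/uid_map, and writers are assumed to be serialized by the caller. A C11 release store of the extent count, paired with acquire loads in the readers, stands in for whatever barriers the kernel actually uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_EXTENTS 5                 /* hypothetical small, fixed limit */
#define INVALID_ID ((uint32_t)-1)     /* returned when an id has no mapping */

/* One extent: ids [first, first + count) inside the namespace map to
 * [lower_first, lower_first + count) in the parent namespace. */
struct id_extent {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

struct id_map {
	struct id_extent extent[MAX_EXTENTS];
	atomic_uint nr_extents;           /* stays 0 until the map is published */
};

/* Parse one "first lower_first count" line; returns 0 on success. */
static int parse_extent(const char *line, struct id_extent *e)
{
	unsigned int first, lower, count;

	if (sscanf(line, "%u %u %u", &first, &lower, &count) != 3 || count == 0)
		return -1;
	/* Reject ranges that would wrap past the top of the 32-bit id space. */
	if ((uint64_t)first + count > (UINT64_C(1) << 32) ||
	    (uint64_t)lower + count > (UINT64_C(1) << 32))
		return -1;
	e->first = first;
	e->lower_first = lower;
	e->count = count;
	return 0;
}

/* Set the map once. The extents are written first and the count is
 * published last with release ordering, so a reader that observes a
 * non-zero count also observes the extents. */
static int map_set_once(struct id_map *map, const struct id_extent *src,
			unsigned int n)
{
	assert(map != NULL && src != NULL);
	if (n == 0 || n > MAX_EXTENTS ||
	    atomic_load_explicit(&map->nr_extents, memory_order_relaxed) != 0)
		return -1;                /* bad size, or already set */
	for (unsigned int i = 0; i < n; i++)
		map->extent[i] = src[i];
	atomic_store_explicit(&map->nr_extents, n, memory_order_release);
	return 0;
}

/* Namespace id -> parent id. The acquire load pairs with the release store. */
static uint32_t map_id_down(struct id_map *map, uint32_t id)
{
	unsigned int n = atomic_load_explicit(&map->nr_extents, memory_order_acquire);

	for (unsigned int i = 0; i < n; i++) {
		const struct id_extent *e = &map->extent[i];
		if (id >= e->first && id - e->first < e->count)
			return e->lower_first + (id - e->first);
	}
	return INVALID_ID;
}

/* Parent id -> namespace id, e.g. when translating stat results back. */
static uint32_t map_id_up(struct id_map *map, uint32_t id)
{
	unsigned int n = atomic_load_explicit(&map->nr_extents, memory_order_acquire);

	for (unsigned int i = 0; i < n; i++) {
		const struct id_extent *e = &map->extent[i];
		if (id >= e->lower_first && id - e->lower_first < e->count)
			return e->first + (id - e->lower_first);
	}
	return INVALID_ID;
}
```

An identity extent such as {0, 0, 4294967295} models the explicit identity mapping of the initial user namespace. A second map_set_once() call on the same map fails, which mirrors the "set once" rule.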
      
      Performance:
      
      In this scheme the performance of the permission checks is expected to
      stay the same, as the actual machine instructions should remain the same.
      
      The worst case I could think of is ls -l on a large directory where
      all of the stat results need to be translated from kuids and
      kgids to uids and gids.  So I benchmarked that case on my laptop
      with a dual-core hyperthreaded Intel i5-2520M CPU with 3MB of CPU cache.
      
      My benchmark consisted of switching to single-user mode, where nothing
      else was running, opening 1,000,000 files on an ext4 filesystem, and
      looping through all of the files 1000 times, calling fstat on each
      individual file.  This was to ensure I was benchmarking stat times
      where the inodes were in the kernel's cache, but the inode values were
      not in the processor's cache.  My results:
      
      v3.4-rc1:         ~= 156ns (unmodified v3.4-rc1 with user namespace support disabled)
      v3.4-rc1-userns-: ~= 155ns (v3.4-rc1 with my user namespace patches and user namespace support disabled)
      v3.4-rc1-userns+: ~= 164ns (v3.4-rc1 with my user namespace patches and user namespace support enabled)
      
      All of the configurations ran in roughly 120ns when I performed tests
      that ran in the CPU cache.
      
      So, in summary, the performance impact is:
      1ns improvement in the worst case with user namespace support compiled out.
      8ns (about 5%) slowdown in the worst case with user namespace support compiled in.
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
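The worst-case benchmark described in this commit (open many files, then call fstat on each one repeatedly) can be sketched as follows. This is a scaled-down sketch, not the harness that produced the numbers above. bench_fstat() and the f<N> file names are hypothetical. The sketch creates the files itself in a caller-supplied directory, closes and unlinks them afterwards, and reports a mean per-call time that will vary by machine. Reproducing the 1,000,000-file setup would also require raising the open-file limit (RLIMIT_NOFILE) and running on an otherwise idle system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Create and open nfiles files in dir, then call fstat() on every open
 * descriptor `rounds` times. Returns the mean nanoseconds per fstat()
 * call, or -1.0 on error. The files are closed and unlinked afterwards. */
static double bench_fstat(const char *dir, int nfiles, int rounds)
{
	char path[4096];
	struct stat st;
	struct timespec t0, t1;
	double result = -1.0;
	int opened = 0;
	int *fds;

	assert(nfiles > 0 && rounds > 0);
	fds = calloc((size_t)nfiles, sizeof *fds);
	if (fds == NULL)
		return -1.0;

	for (; opened < nfiles; opened++) {
		snprintf(path, sizeof path, "%s/f%d", dir, opened);
		fds[opened] = open(path, O_RDONLY | O_CREAT, 0600);
		if (fds[opened] < 0)
			goto out;
	}

	/* Timed region: only the fstat() calls, no open() or close(). */
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int r = 0; r < rounds; r++)
		for (int i = 0; i < nfiles; i++)
			if (fstat(fds[i], &st) != 0)
				goto out;
	clock_gettime(CLOCK_MONOTONIC, &t1);

	result = ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
		  (double)(t1.tv_nsec - t0.tv_nsec)) / ((double)nfiles * rounds);
out:
	for (int i = 0; i < opened; i++) {
		close(fds[i]);
		snprintf(path, sizeof path, "%s/f%d", dir, i);
		unlink(path);
	}
	free(fds);
	return result;
}
```

The commit's numbers come from comparing kernels built with and without user namespace support on the same machine. Absolute figures from a sketch like this are only meaningful in that kind of side-by-side comparison.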
  3. 08 Apr, 2012: 3 commits
  4. 03 Apr, 2012: 1 commit
  5. 01 Apr, 2012: 20 commits
  6. 30 Mar, 2012: 3 commits
    • Revert "ext4: don't release page refs in ext4_end_bio()" · 6268b325
      Linus Torvalds committed
      This reverts commit b43d17f3.
      
      Dave Jones reports that it causes lockups on his laptop, and his debug
      output showed a lot of processes hung waiting for page_writeback (or
      more commonly - processes hung waiting for a lock that was held during
      that writeback wait).
      
      The page_writeback hint made Ted suggest that Dave look at this commit,
      and Dave verified that reverting it makes his problems go away.
      
      Ted says:
       "That commit fixes a race which is seen when you write into fallocated
        (and hence uninitialized) disk blocks under *very* heavy memory
        pressure.  Furthermore, although theoretically it could trigger under
        normal direct I/O writes, it only seems to trigger if you are issuing
        a huge number of AIO writes, such that a just-written page can get
        evicted from memory, and then read back into memory, before the
        workqueue has a chance to update the extent tree.
      
        This race has been around for a little over a year, and no one noticed
        until two months ago; it only happens under fairly exotic conditions,
        and in fact even after trying very hard to create a simple repro under
        lab conditions, we could only reproduce the problem and confirm the
        fix on production servers running MySQL on very fast PCIe-attached
        flash devices.
      
        Given that Dave was able to hit this problem pretty quickly, if we
        confirm that this commit is at fault, the only reasonable thing to do
        is to revert it IMO."
      Reported-and-tested-by: Dave Jones <davej@redhat.com>
      Acked-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • pagemap: remove remaining unneeded spin_lock() · 10bdfb5e
      Naoya Horiguchi committed
      Commit 025c5b24 ("thp: optimize away unnecessary page table
      locking") moved spin_lock() into pmd_trans_huge_lock() in order to avoid
      locking unless the pmd is for a thp.  So this remaining spin_lock() is a
      bug: the lock is already held when pmd_trans_huge_lock() returns 1.
      Reported-by: Sasha Levin <levinsasha928@gmail.com>
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Btrfs: update the checks for mixed block groups with big metadata blocks · bc3f116f
      Chris Mason committed
      Dave Sterba had put in patches to look for mixed data/metadata groups
      with metadata bigger than 4KB.  But these ended up in the wrong place
      and it wasn't testing the feature flag correctly.
      
      This updates the tests to make sure our sizes match.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  7. 29 Mar, 2012: 8 commits