1. 05 6月, 2012 1 次提交
    • L
      vfs: Fix /proc/<tid>/fdinfo/<fd> file handling · 0640113b
      Linus Torvalds 提交于
      Cyrill Gorcunov reports that I broke the fdinfo files with commit
      30a08bf2 ("proc: move fd symlink i_mode calculations into
      tid_fd_revalidate()"), and he's quite right.
      
      The tid_fd_revalidate() function is not just used for the <tid>/fd
      symlinks, it's also used for the <tid>/fdinfo/<fd> files, and the
      permission model for those are different.
      
      So do the dynamic symlink permission handling just for symlinks, making
      the fdinfo files once more appear as the proper regular files they are.
      
      Of course, Al Viro argued (probably correctly) that we shouldn't do the
      symlink permission games at all, and make the symlinks always just be
      the normal 'lrwxrwxrwx'.  That would have avoided this issue too, but
      since somebody noticed that the permissions had changed (which was the
      reason for that original commit 30a08bf2 in the first place), people
      do apparently use this feature.
      
      [ Basically, you can use the symlink permission data as a cheap "fdinfo"
        replacement, since you see whether the file is open for reading and/or
        writing by just looking at st_mode of the symlink.  So the feature
        does make sense, even if the pain it has caused means we probably
        shouldn't have done it to begin with. ]
      Reported-and-tested-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0640113b
  2. 01 6月, 2012 11 次提交
  3. 30 5月, 2012 2 次提交
  4. 19 5月, 2012 1 次提交
    • L
      proc: move fd symlink i_mode calculations into tid_fd_revalidate() · 30a08bf2
      Linus Torvalds 提交于
      Instead of doing the i_mode calculations at proc_fd_instantiate() time,
      move them into tid_fd_revalidate(), which is where the other inode state
      (notably uid/gid information) is updated too.
      
      Otherwise we'll end up with stale i_mode information if an fd is re-used
      while the dentry still hangs around.  Not that anything really *cares*
      (symlink permissions don't really matter), but Tetsuo Handa noticed that
      the owner read/write bits don't always match the state of the
      readability of the file descriptor, and we _used_ to get this right a
      long time ago in a galaxy far, far away.
      
      Besides, aside from fixing an ugly detail (that has apparently been this
      way since commit 61a28784: "proc: Remove the hard coded inode
      numbers" in 2006), this removes more lines of code than it adds.  And it
      just makes sense to update i_mode in the same place we update i_uid/gid.
      
      Al Viro correctly points out that we could just do the inode fill in the
      inode iops ->getattr() function instead.  However, that does require
      somewhat slightly more invasive changes, and adds yet *another* lookup
      of the file descriptor.  We need to do the revalidate() for other
      reasons anyway, and have the file descriptor handy, so we might as well
      fill in the information at this point.
      Reported-by: NTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Acked-by: NEric Biederman <ebiederm@xmission.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      30a08bf2
  5. 18 5月, 2012 1 次提交
    • C
      fs, proc: fix ABBA deadlock in case of execution attempt of map_files/ entries · eb94cd96
      Cyrill Gorcunov 提交于
      map_files/ entries are never supposed to be executed, still curious
      minds might try to run them, which leads to the following deadlock
      
        ======================================================
        [ INFO: possible circular locking dependency detected ]
        3.4.0-rc4-24406-g841e6a6 #121 Not tainted
        -------------------------------------------------------
        bash/1556 is trying to acquire lock:
         (&sb->s_type->i_mutex_key#8){+.+.+.}, at: do_lookup+0x267/0x2b1
      
        but task is already holding lock:
         (&sig->cred_guard_mutex){+.+.+.}, at: prepare_bprm_creds+0x2d/0x69
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #1 (&sig->cred_guard_mutex){+.+.+.}:
               validate_chain+0x444/0x4f4
               __lock_acquire+0x387/0x3f8
               lock_acquire+0x12b/0x158
               __mutex_lock_common+0x56/0x3a9
               mutex_lock_killable_nested+0x40/0x45
               lock_trace+0x24/0x59
               proc_map_files_lookup+0x5a/0x165
               __lookup_hash+0x52/0x73
               do_lookup+0x276/0x2b1
               walk_component+0x3d/0x114
               do_last+0xfc/0x540
               path_openat+0xd3/0x306
               do_filp_open+0x3d/0x89
               do_sys_open+0x74/0x106
               sys_open+0x21/0x23
               tracesys+0xdd/0xe2
      
        -> #0 (&sb->s_type->i_mutex_key#8){+.+.+.}:
               check_prev_add+0x6a/0x1ef
               validate_chain+0x444/0x4f4
               __lock_acquire+0x387/0x3f8
               lock_acquire+0x12b/0x158
               __mutex_lock_common+0x56/0x3a9
               mutex_lock_nested+0x40/0x45
               do_lookup+0x267/0x2b1
               walk_component+0x3d/0x114
               link_path_walk+0x1f9/0x48f
               path_openat+0xb6/0x306
               do_filp_open+0x3d/0x89
               open_exec+0x25/0xa0
               do_execve_common+0xea/0x2f9
               do_execve+0x43/0x45
               sys_execve+0x43/0x5a
               stub_execve+0x6c/0xc0
      
      This is because prepare_bprm_creds grabs task->signal->cred_guard_mutex
      and when do_lookup happens we try to grab task->signal->cred_guard_mutex
      again in lock_trace.
      
      Fix it using plain ptrace_may_access() helper in proc_map_files_lookup()
      and in proc_map_files_readdir() instead of lock_trace(), the caller must
      be CAP_SYS_ADMIN granted anyway.
      Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Reported-by: NSasha Levin <levinsasha928@gmail.com>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Vasiliy Kulikov <segoon@openwall.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eb94cd96
  6. 16 5月, 2012 2 次提交
  7. 11 5月, 2012 1 次提交
  8. 06 5月, 2012 1 次提交
  9. 03 5月, 2012 1 次提交
  10. 26 4月, 2012 2 次提交
    • E
      userns: Rework the user_namespace adding uid/gid mapping support · 22d917d8
      Eric W. Biederman 提交于
      - Convert the old uid mapping functions into compatibility wrappers
      - Add a uid/gid mapping layer from user space uid and gids to kernel
        internal uids and gids that is extent based for simplicty and speed.
        * Working with number space after mapping uids/gids into their kernel
          internal version adds only mapping complexity over what we have today,
          leaving the kernel code easy to understand and test.
      - Add proc files /proc/self/uid_map /proc/self/gid_map
        These files display the mapping and allow a mapping to be added
        if a mapping does not exist.
      - Allow entering the user namespace without a uid or gid mapping.
        Since we are starting with an existing user our uids and gids
        still have global mappings so are still valid and useful they just don't
        have local mappings.  The requirement for things to work are global uid
        and gid so it is odd but perfectly fine not to have a local uid
        and gid mapping.
        Not requiring global uid and gid mappings greatly simplifies
        the logic of setting up the uid and gid mappings by allowing
        the mappings to be set after the namespace is created which makes the
        slight weirdness worth it.
      - Make the mappings in the initial user namespace to the global
        uid/gid space explicit.  Today it is an identity mapping
        but in the future we may want to twist this for debugging, similar
        to what we do with jiffies.
      - Document the memory ordering requirements of setting the uid and
        gid mappings.  We only allow the mappings to be set once
        and there are no pointers involved so the requirments are
        trivial but a little atypical.
      
      Performance:
      
      In this scheme for the permission checks the performance is expected to
      stay the same as the actuall machine instructions should remain the same.
      
      The worst case I could think of is ls -l on a large directory where
      all of the stat results need to be translated with from kuids and
      kgids to uids and gids.  So I benchmarked that case on my laptop
      with a dual core hyperthread Intel i5-2520M cpu with 3M of cpu cache.
      
      My benchmark consisted of going to single user mode where nothing else
      was running. On an ext4 filesystem opening 1,000,000 files and looping
      through all of the files 1000 times and calling fstat on the
      individuals files.  This was to ensure I was benchmarking stat times
      where the inodes were in the kernels cache, but the inode values were
      not in the processors cache.  My results:
      
      v3.4-rc1:         ~= 156ns (unmodified v3.4-rc1 with user namespace support disabled)
      v3.4-rc1-userns-: ~= 155ns (v3.4-rc1 with my user namespace patches and user namespace support disabled)
      v3.4-rc1-userns+: ~= 164ns (v3.4-rc1 with my user namespace patches and user namespace support enabled)
      
      All of the configurations ran in roughly 120ns when I performed tests
      that ran in the cpu cache.
      
      So in summary the performance impact is:
      1ns improvement in the worst case with user namespace support compiled out.
      8ns aka 5% slowdown in the worst case with user namespace support compiled in.
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      22d917d8
    • W
      revert "proc: clear_refs: do not clear reserved pages" · 63f61a6f
      Will Deacon 提交于
      Revert commit 85e72aa5 ("proc: clear_refs: do not clear reserved
      pages"), which was a quick fix suitable for -stable until ARM had been
      moved over to the gate_vma mechanism:
      
      https://lkml.org/lkml/2012/1/14/55
      
      With commit f9d4861f ("ARM: 7294/1: vectors: use gate_vma for vectors user
      mapping"), ARM does now use the gate_vma, so the PageReserved check can be
      removed from the proc code.
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Cc: Nicolas Pitre <nico@linaro.org>
      Acked-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      63f61a6f
  11. 06 4月, 2012 1 次提交
  12. 30 3月, 2012 2 次提交
  13. 29 3月, 2012 4 次提交
  14. 24 3月, 2012 5 次提交
  15. 23 3月, 2012 1 次提交
    • L
      sysctl: protect poll() in entries that may go away · 4e474a00
      Lucas De Marchi 提交于
      Protect code accessing ctl_table by grabbing the header with grab_header()
      and after releasing with sysctl_head_finish().  This is needed if poll()
      is called in entries created by modules: currently only hostname and
      domainname support poll(), but this bug may be triggered when/if modules
      use it and if user called poll() in a file that doesn't support it.
      
      Dave Jones reported the following when using a syscall fuzzer while
      hibernating/resuming:
      
      RIP: 0010:[<ffffffff81233e3e>]  [<ffffffff81233e3e>] proc_sys_poll+0x4e/0x90
      RAX: 0000000000000145 RBX: ffff88020cab6940 RCX: 0000000000000000
      RDX: ffffffff81233df0 RSI: 6b6b6b6b6b6b6b6b RDI: ffff88020cab6940
      [ ... ]
      Code: 00 48 89 fb 48 89 f1 48 8b 40 30 4c 8b 60 e8 b8 45 01 00 00 49 83
      7c 24 28 00 74 2e 49 8b 74 24 30 48 85 f6 74 24 48 85 c9 75 32 <8b> 16
      b8 45 01 00 00 48 63 d2 49 39 d5 74 10 8b 06 48 98 48 89
      
      If an entry goes away while we are polling() it, ctl_table may not exist
      anymore.
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NLucas De Marchi <lucas.demarchi@profusion.mobi>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      4e474a00
  16. 22 3月, 2012 4 次提交
    • S
      procfs: mark thread stack correctly in proc/<pid>/maps · b7643757
      Siddhesh Poyarekar 提交于
      Stack for a new thread is mapped by userspace code and passed via
      sys_clone.  This memory is currently seen as anonymous in
      /proc/<pid>/maps, which makes it difficult to ascertain which mappings
      are being used for thread stacks.  This patch uses the individual task
      stack pointers to determine which vmas are actually thread stacks.
      
      For a multithreaded program like the following:
      
      	#include <pthread.h>
      
      	void *thread_main(void *foo)
      	{
      		while(1);
      	}
      
      	int main()
      	{
      		pthread_t t;
      		pthread_create(&t, NULL, thread_main, NULL);
      		pthread_join(t, NULL);
      	}
      
      proc/PID/maps looks like the following:
      
          00400000-00401000 r-xp 00000000 fd:0a 3671804                            /home/siddhesh/a.out
          00600000-00601000 rw-p 00000000 fd:0a 3671804                            /home/siddhesh/a.out
          019ef000-01a10000 rw-p 00000000 00:00 0                                  [heap]
          7f8a44491000-7f8a44492000 ---p 00000000 00:00 0
          7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0
          7f8a44c92000-7f8a44e3d000 r-xp 00000000 fd:00 2097482                    /lib64/libc-2.14.90.so
          7f8a44e3d000-7f8a4503d000 ---p 001ab000 fd:00 2097482                    /lib64/libc-2.14.90.so
          7f8a4503d000-7f8a45041000 r--p 001ab000 fd:00 2097482                    /lib64/libc-2.14.90.so
          7f8a45041000-7f8a45043000 rw-p 001af000 fd:00 2097482                    /lib64/libc-2.14.90.so
          7f8a45043000-7f8a45048000 rw-p 00000000 00:00 0
          7f8a45048000-7f8a4505f000 r-xp 00000000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
          7f8a4505f000-7f8a4525e000 ---p 00017000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
          7f8a4525e000-7f8a4525f000 r--p 00016000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
          7f8a4525f000-7f8a45260000 rw-p 00017000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
          7f8a45260000-7f8a45264000 rw-p 00000000 00:00 0
          7f8a45264000-7f8a45286000 r-xp 00000000 fd:00 2097348                    /lib64/ld-2.14.90.so
          7f8a45457000-7f8a4545a000 rw-p 00000000 00:00 0
          7f8a45484000-7f8a45485000 rw-p 00000000 00:00 0
          7f8a45485000-7f8a45486000 r--p 00021000 fd:00 2097348                    /lib64/ld-2.14.90.so
          7f8a45486000-7f8a45487000 rw-p 00022000 fd:00 2097348                    /lib64/ld-2.14.90.so
          7f8a45487000-7f8a45488000 rw-p 00000000 00:00 0
          7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0                          [stack]
          7fff627ff000-7fff62800000 r-xp 00000000 00:00 0                          [vdso]
          ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
      
      Here, one could guess that 7f8a44492000-7f8a44c92000 is a stack since
      the earlier vma that has no permissions (7f8a44e3d000-7f8a4503d000) but
      that is not always a reliable way to find out which vma is a thread
      stack.  Also, /proc/PID/maps and /proc/PID/task/TID/maps has the same
      content.
      
      With this patch in place, /proc/PID/task/TID/maps are treated as 'maps
      as the task would see it' and hence, only the vma that that task uses as
      stack is marked as [stack].  All other 'stack' vmas are marked as
      anonymous memory.  /proc/PID/maps acts as a thread group level view,
      where all thread stack vmas are marked as [stack:TID] where TID is the
      process ID of the task that uses that vma as stack, while the process
      stack is marked as [stack].
      
      So /proc/PID/maps will look like this:
      
          00400000-00401000 r-xp 00000000 fd:0a 3671804                            /home/siddhesh/a.out
          00600000-00601000 rw-p 00000000 fd:0a 3671804                            /home/siddhesh/a.out
          019ef000-01a10000 rw-p 00000000 00:00 0                                  [heap]
          7f8a44491000-7f8a44492000 ---p 00000000 00:00 0
          7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0                          [stack:1442]
          7f8a44c92000-7f8a44e3d000 r-xp 00000000 fd:00 2097482                    /lib64/libc-2.14.90.so
          7f8a44e3d000-7f8a4503d000 ---p 001ab000 fd:00 2097482                    /lib64/libc-2.14.90.so
          7f8a4503d000-7f8a45041000 r--p 001ab000 fd:00 2097482                    /lib64/libc-2.14.90.so
          7f8a45041000-7f8a45043000 rw-p 001af000 fd:00 2097482                    /lib64/libc-2.14.90.so
          7f8a45043000-7f8a45048000 rw-p 00000000 00:00 0
          7f8a45048000-7f8a4505f000 r-xp 00000000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
          7f8a4505f000-7f8a4525e000 ---p 00017000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
          7f8a4525e000-7f8a4525f000 r--p 00016000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
          7f8a4525f000-7f8a45260000 rw-p 00017000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
          7f8a45260000-7f8a45264000 rw-p 00000000 00:00 0
          7f8a45264000-7f8a45286000 r-xp 00000000 fd:00 2097348                    /lib64/ld-2.14.90.so
          7f8a45457000-7f8a4545a000 rw-p 00000000 00:00 0
          7f8a45484000-7f8a45485000 rw-p 00000000 00:00 0
          7f8a45485000-7f8a45486000 r--p 00021000 fd:00 2097348                    /lib64/ld-2.14.90.so
          7f8a45486000-7f8a45487000 rw-p 00022000 fd:00 2097348                    /lib64/ld-2.14.90.so
          7f8a45487000-7f8a45488000 rw-p 00000000 00:00 0
          7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0                          [stack]
          7fff627ff000-7fff62800000 r-xp 00000000 00:00 0                          [vdso]
          ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
      
      Thus marking all vmas that are used as stacks by the threads in the
      thread group along with the process stack.  The task level maps will
      however like this:
      
          00400000-00401000 r-xp 00000000 fd:0a 3671804                            /home/siddhesh/a.out
          00600000-00601000 rw-p 00000000 fd:0a 3671804                            /home/siddhesh/a.out
          019ef000-01a10000 rw-p 00000000 00:00 0                                  [heap]
          7f8a44491000-7f8a44492000 ---p 00000000 00:00 0
          7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0                          [stack]
          7f8a44c92000-7f8a44e3d000 r-xp 00000000 fd:00 2097482                    /lib64/libc-2.14.90.so
          7f8a44e3d000-7f8a4503d000 ---p 001ab000 fd:00 2097482                    /lib64/libc-2.14.90.so
          7f8a4503d000-7f8a45041000 r--p 001ab000 fd:00 2097482                    /lib64/libc-2.14.90.so
          7f8a45041000-7f8a45043000 rw-p 001af000 fd:00 2097482                    /lib64/libc-2.14.90.so
          7f8a45043000-7f8a45048000 rw-p 00000000 00:00 0
          7f8a45048000-7f8a4505f000 r-xp 00000000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
          7f8a4505f000-7f8a4525e000 ---p 00017000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
          7f8a4525e000-7f8a4525f000 r--p 00016000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
          7f8a4525f000-7f8a45260000 rw-p 00017000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
          7f8a45260000-7f8a45264000 rw-p 00000000 00:00 0
          7f8a45264000-7f8a45286000 r-xp 00000000 fd:00 2097348                    /lib64/ld-2.14.90.so
          7f8a45457000-7f8a4545a000 rw-p 00000000 00:00 0
          7f8a45484000-7f8a45485000 rw-p 00000000 00:00 0
          7f8a45485000-7f8a45486000 r--p 00021000 fd:00 2097348                    /lib64/ld-2.14.90.so
          7f8a45486000-7f8a45487000 rw-p 00022000 fd:00 2097348                    /lib64/ld-2.14.90.so
          7f8a45487000-7f8a45488000 rw-p 00000000 00:00 0
          7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0
          7fff627ff000-7fff62800000 r-xp 00000000 00:00 0                          [vdso]
          ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
      
      where only the vma that is being used as a stack by *that* task is
      marked as [stack].
      
      Analogous changes have been made to /proc/PID/smaps,
      /proc/PID/numa_maps, /proc/PID/task/TID/smaps and
      /proc/PID/task/TID/numa_maps. Relevant snippets from smaps and
      numa_maps:
      
          [siddhesh@localhost ~ ]$ pgrep a.out
          1441
          [siddhesh@localhost ~ ]$ cat /proc/1441/smaps | grep "\[stack"
          7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0                          [stack:1442]
          7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0                          [stack]
          [siddhesh@localhost ~ ]$ cat /proc/1441/task/1442/smaps | grep "\[stack"
          7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0                          [stack]
          [siddhesh@localhost ~ ]$ cat /proc/1441/task/1441/smaps | grep "\[stack"
          7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0                          [stack]
          [siddhesh@localhost ~ ]$ cat /proc/1441/numa_maps | grep "stack"
          7f8a44492000 default stack:1442 anon=2 dirty=2 N0=2
          7fff6273a000 default stack anon=3 dirty=3 N0=3
          [siddhesh@localhost ~ ]$ cat /proc/1441/task/1442/numa_maps | grep "stack"
          7f8a44492000 default stack anon=2 dirty=2 N0=2
          [siddhesh@localhost ~ ]$ cat /proc/1441/task/1441/numa_maps | grep "stack"
          7fff6273a000 default stack anon=3 dirty=3 N0=3
      
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix build]
      Signed-off-by: NSiddhesh Poyarekar <siddhesh.poyarekar@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Jamie Lokier <jamie@shareable.org>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b7643757
    • N
      pagemap: introduce data structure for pagemap entry · 092b50ba
      Naoya Horiguchi 提交于
      Currently a local variable of pagemap entry in pagemap_pte_range() is
      named pfn and typed with u64, but it's not correct (pfn should be unsigned
      long.)
      
      This patch introduces special type for pagemap entries and replaces code
      with it.
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      092b50ba
    • N
      pagemap: export KPF_THP · e873c49f
      Naoya Horiguchi 提交于
      This flag shows that a given page is a subpage of a transparent hugepage.
      It helps us debug and test the kernel by showing physical address of thp.
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: NWu Fengguang <fengguang.wu@intel.com>
      Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e873c49f
    • N
      thp: optimize away unnecessary page table locking · 025c5b24
      Naoya Horiguchi 提交于
      Currently when we check if we can handle thp as it is or we need to split
      it into regular sized pages, we hold page table lock prior to check
      whether a given pmd is mapping thp or not.  Because of this, when it's not
      "huge pmd" we suffer from unnecessary lock/unlock overhead.  To remove it,
      this patch introduces a optimized check function and replace several
      similar logics with it.
      
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      025c5b24