1. 01 2月, 2017 3 次提交
  2. 25 1月, 2017 1 次提交
  3. 10 1月, 2017 1 次提交
    • Z
      sysctl: Drop reference added by grab_header in proc_sys_readdir · 93362fa4
      Zhou Chengming 提交于
      Fixes CVE-2016-9191, proc_sys_readdir doesn't drop reference
      added by grab_header when return from !dir_emit_dots path.
      It can cause any path called unregister_sysctl_table will
      wait forever.
      
      The calltrace of CVE-2016-9191:
      
      [ 5535.960522] Call Trace:
      [ 5535.963265]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
      [ 5535.968817]  [<ffffffff817d33fb>] schedule_timeout+0x3db/0x6f0
      [ 5535.975346]  [<ffffffff817cf055>] ? wait_for_completion+0x45/0x130
      [ 5535.982256]  [<ffffffff817cf0d3>] wait_for_completion+0xc3/0x130
      [ 5535.988972]  [<ffffffff810d1fd0>] ? wake_up_q+0x80/0x80
      [ 5535.994804]  [<ffffffff8130de64>] drop_sysctl_table+0xc4/0xe0
      [ 5536.001227]  [<ffffffff8130de17>] drop_sysctl_table+0x77/0xe0
      [ 5536.007648]  [<ffffffff8130decd>] unregister_sysctl_table+0x4d/0xa0
      [ 5536.014654]  [<ffffffff8130deff>] unregister_sysctl_table+0x7f/0xa0
      [ 5536.021657]  [<ffffffff810f57f5>] unregister_sched_domain_sysctl+0x15/0x40
      [ 5536.029344]  [<ffffffff810d7704>] partition_sched_domains+0x44/0x450
      [ 5536.036447]  [<ffffffff817d0761>] ? __mutex_unlock_slowpath+0x111/0x1f0
      [ 5536.043844]  [<ffffffff81167684>] rebuild_sched_domains_locked+0x64/0xb0
      [ 5536.051336]  [<ffffffff8116789d>] update_flag+0x11d/0x210
      [ 5536.057373]  [<ffffffff817cf61f>] ? mutex_lock_nested+0x2df/0x450
      [ 5536.064186]  [<ffffffff81167acb>] ? cpuset_css_offline+0x1b/0x60
      [ 5536.070899]  [<ffffffff810fce3d>] ? trace_hardirqs_on+0xd/0x10
      [ 5536.077420]  [<ffffffff817cf61f>] ? mutex_lock_nested+0x2df/0x450
      [ 5536.084234]  [<ffffffff8115a9f5>] ? css_killed_work_fn+0x25/0x220
      [ 5536.091049]  [<ffffffff81167ae5>] cpuset_css_offline+0x35/0x60
      [ 5536.097571]  [<ffffffff8115aa2c>] css_killed_work_fn+0x5c/0x220
      [ 5536.104207]  [<ffffffff810bc83f>] process_one_work+0x1df/0x710
      [ 5536.110736]  [<ffffffff810bc7c0>] ? process_one_work+0x160/0x710
      [ 5536.117461]  [<ffffffff810bce9b>] worker_thread+0x12b/0x4a0
      [ 5536.123697]  [<ffffffff810bcd70>] ? process_one_work+0x710/0x710
      [ 5536.130426]  [<ffffffff810c3f7e>] kthread+0xfe/0x120
      [ 5536.135991]  [<ffffffff817d4baf>] ret_from_fork+0x1f/0x40
      [ 5536.142041]  [<ffffffff810c3e80>] ? kthread_create_on_node+0x230/0x230
      
      One cgroup maintainer mentioned that "cgroup is trying to offline
      a cpuset css, which takes place under cgroup_mutex.  The offlining
      ends up trying to drain active usages of a sysctl table which apprently
      is not happening."
      The real reason is that proc_sys_readdir doesn't drop reference added
      by grab_header when return from !dir_emit_dots path. So this cpuset
      offline path will wait here forever.
      
      See here for details: http://www.openwall.com/lists/oss-security/2016/11/04/13
      
      Fixes: f0c3b509 ("[readdir] convert procfs")
      Cc: stable@vger.kernel.org
      Reported-by: NCAI Qian <caiqian@redhat.com>
      Tested-by: NYang Shukui <yangshukui@huawei.com>
      Signed-off-by: NZhou Chengming <zhouchengming1@huawei.com>
      Acked-by: NAl Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      93362fa4
  4. 25 12月, 2016 1 次提交
  5. 13 12月, 2016 11 次提交
  6. 09 12月, 2016 2 次提交
  7. 17 11月, 2016 1 次提交
  8. 15 11月, 2016 1 次提交
  9. 04 11月, 2016 1 次提交
  10. 28 10月, 2016 1 次提交
  11. 25 10月, 2016 1 次提交
    • L
      proc: don't use FOLL_FORCE for reading cmdline and environment · 272ddc8b
      Linus Torvalds 提交于
      Now that Lorenzo cleaned things up and made the FOLL_FORCE users
      explicit, it becomes obvious how some of them don't really need
      FOLL_FORCE at all.
      
      So remove FOLL_FORCE from the proc code that reads the command line and
      arguments from user space.
      
      The mem_rw() function actually does want FOLL_FORCE, because gdd (and
      possibly many other debuggers) use it as a much more convenient version
      of PTRACE_PEEKDATA, but we should consider making the FOLL_FORCE part
      conditional on actually being a ptracer.  This does not actually do
      that, just moves adds a comment to that effect and moves the gup_flags
      settings next to each other.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      272ddc8b
  12. 20 10月, 2016 2 次提交
    • A
      fs/proc: Stop trying to report thread stacks · b18cb64e
      Andy Lutomirski 提交于
      This reverts more of:
      
        b7643757 ("procfs: mark thread stack correctly in proc/<pid>/maps")
      
      ... which was partially reverted by:
      
        65376df5 ("proc: revert /proc/<pid>/maps [stack:TID] annotation")
      
      Originally, /proc/PID/task/TID/maps was the same as /proc/TID/maps.
      
      In current kernels, /proc/PID/maps (or /proc/TID/maps even for
      threads) shows "[stack]" for VMAs in the mm's stack address range.
      
      In contrast, /proc/PID/task/TID/maps uses KSTK_ESP to guess the
      target thread's stack's VMA.  This is racy, probably returns garbage
      and, on arches with CONFIG_TASK_INFO_IN_THREAD=y, is also crash-prone:
      KSTK_ESP is not safe to use on tasks that aren't known to be running
      ordinary process-context kernel code.
      
      This patch removes the difference and just shows "[stack]" for VMAs
      in the mm's stack range.  This is IMO much more sensible -- the
      actual "stack" address really is treated specially by the VM code,
      and the current thread stack isn't even well-defined for programs
      that frequently switch stacks on their own.
      Reported-by: NJann Horn <jann@thejh.net>
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Linux API <linux-api@vger.kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tycho Andersen <tycho.andersen@canonical.com>
      Link: http://lkml.kernel.org/r/3e678474ec14e0a0ec34c611016753eea2e1b8ba.1475257877.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b18cb64e
    • A
      fs/proc: Stop reporting eip and esp in /proc/PID/stat · 0a1eb2d4
      Andy Lutomirski 提交于
      Reporting these fields on a non-current task is dangerous.  If the
      task is in any state other than normal kernel code, they may contain
      garbage or even kernel addresses on some architectures.  (x86_64
      used to do this.  I bet lots of architectures still do.)  With
      CONFIG_THREAD_INFO_IN_TASK=y, it can OOPS, too.
      
      As far as I know, there are no use programs that make any material
      use of these fields, so just get rid of them.
      Reported-by: NJann Horn <jann@thejh.net>
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Linux API <linux-api@vger.kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: Tycho Andersen <tycho.andersen@canonical.com>
      Link: http://lkml.kernel.org/r/a5fed4c3f4e33ed25d4bb03567e329bc5a712bcc.1475257877.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      0a1eb2d4
  13. 19 10月, 2016 1 次提交
  14. 08 10月, 2016 9 次提交
    • A
      cred: simpler, 1D supplementary groups · 81243eac
      Alexey Dobriyan 提交于
      Current supplementary groups code can massively overallocate memory and
      is implemented in a way so that access to individual gid is done via 2D
      array.
      
      If number of gids is <= 32, memory allocation is more or less tolerable
      (140/148 bytes).  But if it is not, code allocates full page (!)
      regardless and, what's even more fun, doesn't reuse small 32-entry
      array.
      
      2D array means dependent shifts, loads and LEAs without possibility to
      optimize them (gid is never known at compile time).
      
      All of the above is unnecessary.  Switch to the usual
      trailing-zero-len-array scheme.  Memory is allocated with
      kmalloc/vmalloc() and only as much as needed.  Accesses become simpler
      (LEA 8(gi,idx,4) or even without displacement).
      
      Maximum number of gids is 65536 which translates to 256KB+8 bytes.  I
      think kernel can handle such allocation.
      
      On my usual desktop system with whole 9 (nine) aux groups, struct
      group_info shrinks from 148 bytes to 44 bytes, yay!
      
      Nice side effects:
      
       - "gi->gid[i]" is shorter than "GROUP_AT(gi, i)", less typing,
      
       - fix little mess in net/ipv4/ping.c
         should have been using GROUP_AT macro but this point becomes moot,
      
       - aux group allocation is persistent and should be accounted as such.
      
      Link: http://lkml.kernel.org/r/20160817201927.GA2096@p183.telecom.bySigned-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Cc: Vasily Kulikov <segoon@openwall.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      81243eac
    • R
      mm, proc: fix region lost in /proc/self/smaps · 855af072
      Robert Ho 提交于
      Recently, Redhat reported that nvml test suite failed on QEMU/KVM,
      more detailed info please refer to:
      
         https://bugzilla.redhat.com/show_bug.cgi?id=1365721
      
      Actually, this bug is not only for NVDIMM/DAX but also for any other
      file systems.  This simple test case abstracted from nvml can easily
      reproduce this bug in common environment:
      
      -------------------------- testcase.c -----------------------------
      
      int
      is_pmem_proc(const void *addr, size_t len)
      {
              const char *caddr = addr;
      
              FILE *fp;
              if ((fp = fopen("/proc/self/smaps", "r")) == NULL) {
                      printf("!/proc/self/smaps");
                      return 0;
              }
      
              int retval = 0;         /* assume false until proven otherwise */
              char line[PROCMAXLEN];  /* for fgets() */
              char *lo = NULL;        /* beginning of current range in smaps file */
              char *hi = NULL;        /* end of current range in smaps file */
              int needmm = 0;         /* looking for mm flag for current range */
              while (fgets(line, PROCMAXLEN, fp) != NULL) {
                      static const char vmflags[] = "VmFlags:";
                      static const char mm[] = " wr";
      
                      /* check for range line */
                      if (sscanf(line, "%p-%p", &lo, &hi) == 2) {
                              if (needmm) {
                                      /* last range matched, but no mm flag found */
                                      printf("never found mm flag.\n");
                                      break;
                              } else if (caddr < lo) {
                                      /* never found the range for caddr */
                                      printf("#######no match for addr %p.\n", caddr);
                                      break;
                              } else if (caddr < hi) {
                                      /* start address is in this range */
                                      size_t rangelen = (size_t)(hi - caddr);
      
                                      /* remember that matching has started */
                                      needmm = 1;
      
                                      /* calculate remaining range to search for */
                                      if (len > rangelen) {
                                              len -= rangelen;
                                              caddr += rangelen;
                                              printf("matched %zu bytes in range "
                                                      "%p-%p, %zu left over.\n",
                                                              rangelen, lo, hi, len);
                                      } else {
                                              len = 0;
                                              printf("matched all bytes in range "
                                                              "%p-%p.\n", lo, hi);
                                      }
                              }
                      } else if (needmm && strncmp(line, vmflags,
                                              sizeof(vmflags) - 1) == 0) {
                              if (strstr(&line[sizeof(vmflags) - 1], mm) != NULL) {
                                      printf("mm flag found.\n");
                                      if (len == 0) {
                                              /* entire range matched */
                                              retval = 1;
                                              break;
                                      }
                                      needmm = 0;     /* saw what was needed */
                              } else {
                                      /* mm flag not set for some or all of range */
                                      printf("range has no mm flag.\n");
                                      break;
                              }
                      }
              }
      
              fclose(fp);
      
              printf("returning %d.\n", retval);
              return retval;
      }
      
      void *Addr;
      size_t Size;
      
      /*
       * worker -- the work each thread performs
       */
      static void *
      worker(void *arg)
      {
              int *ret = (int *)arg;
              *ret =  is_pmem_proc(Addr, Size);
              return NULL;
      }
      
      int main(int argc, char *argv[])
      {
              if (argc <  2 || argc > 3) {
                      printf("usage: %s file [env].\n", argv[0]);
                      return -1;
              }
      
              int fd = open(argv[1], O_RDWR);
      
              struct stat stbuf;
              fstat(fd, &stbuf);
      
              Size = stbuf.st_size;
              Addr = mmap(0, stbuf.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
      
              close(fd);
      
              pthread_t threads[NTHREAD];
              int ret[NTHREAD];
      
              /* kick off NTHREAD threads */
              for (int i = 0; i < NTHREAD; i++)
                      pthread_create(&threads[i], NULL, worker, &ret[i]);
      
              /* wait for all the threads to complete */
              for (int i = 0; i < NTHREAD; i++)
                      pthread_join(threads[i], NULL);
      
              /* verify that all the threads return the same value */
              for (int i = 1; i < NTHREAD; i++) {
                      if (ret[0] != ret[i]) {
                              printf("Error i %d ret[0] = %d ret[i] = %d.\n", i,
                                      ret[0], ret[i]);
                      }
              }
      
              printf("%d", ret[0]);
              return 0;
      }
      
      It failed as some threads can not find the memory region in
      "/proc/self/smaps" which is allocated in the main process
      
      It is caused by proc fs which uses 'file->version' to indicate the VMA that
      is the last one has already been handled by read() system call. When the
      next read() issues, it uses the 'version' to find the VMA, then the next
      VMA is what we want to handle, the related code is as follows:
      
              if (last_addr) {
                      vma = find_vma(mm, last_addr);
                      if (vma && (vma = m_next_vma(priv, vma)))
                              return vma;
              }
      
      However, VMA will be lost if the last VMA is gone, e.g:
      
      The process VMA list is A->B->C->D
      
      CPU 0                                  CPU 1
      read() system call
         handle VMA B
         version = B
      return to userspace
      
                                         unmap VMA B
      
      issue read() again to continue to get
      the region info
         find_vma(version) will get VMA C
         m_next_vma(C) will get VMA D
         handle D
         !!! VMA C is lost !!!
      
      In order to fix this bug, we make 'file->version' indicate the end address
      of the current VMA.  m_start will then look up a vma which with vma_start
      < last_vm_end and moves on to the next vma if we found the same or an
      overlapping vma.  This will guarantee that we will not miss an exclusive
      vma but we can still miss one if the previous vma was shrunk.  This is
      acceptable because guaranteeing "never miss a vma" is simply not feasible.
      User has to cope with some inconsistencies if the file is not read in one
      go.
      
      [mhocko@suse.com: changelog fixes]
      Link: http://lkml.kernel.org/r/1475296958-27652-1-git-send-email-robert.hu@intel.comAcked-by: NDave Hansen <dave.hansen@intel.com>
      Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
      Signed-off-by: NRobert Hu <robert.hu@intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      855af072
    • J
      proc: fix timerslack_ns CAP_SYS_NICE check when adjusting self · 4b2bd5fe
      John Stultz 提交于
      In changing from checking ptrace_may_access(p, PTRACE_MODE_ATTACH_FSCREDS)
      to capable(CAP_SYS_NICE), I missed that ptrace_my_access succeeds when p
      == current, but the CAP_SYS_NICE doesn't.
      
      Thus while the previous commit was intended to loosen the needed
      privileges to modify a processes timerslack, it needlessly restricted a
      task modifying its own timerslack via the proc/<tid>/timerslack_ns
      (which is permitted also via the PR_SET_TIMERSLACK method).
      
      This patch corrects this by checking if p == current before checking the
      CAP_SYS_NICE value.
      
      This patch applies on top of my two previous patches currently in -mm
      
      Link: http://lkml.kernel.org/r/1471906870-28624-1-git-send-email-john.stultz@linaro.orgSigned-off-by: NJohn Stultz <john.stultz@linaro.org>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Oren Laadan <orenl@cellrox.com>
      Cc: Ruchi Kandoi <kandoiruchi@google.com>
      Cc: Rom Lemarchand <romlem@android.com>
      Cc: Todd Kjos <tkjos@google.com>
      Cc: Colin Cross <ccross@android.com>
      Cc: Nick Kralevich <nnk@google.com>
      Cc: Dmitry Shmidt <dimitrysh@google.com>
      Cc: Elliott Hughes <enh@google.com>
      Cc: Android Kernel Team <kernel-team@android.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4b2bd5fe
    • J
      proc: add LSM hook checks to /proc/<tid>/timerslack_ns · 904763e1
      John Stultz 提交于
      As requested, this patch checks the existing LSM hooks
      task_getscheduler/task_setscheduler when reading or modifying the task's
      timerslack value.
      
      Previous versions added new get/settimerslack LSM hooks, but since they
      checked the same PROCESS__SET/GETSCHED values as existing hooks, it was
      suggested we just use the existing ones.
      
      Link: http://lkml.kernel.org/r/1469132667-17377-2-git-send-email-john.stultz@linaro.orgSigned-off-by: NJohn Stultz <john.stultz@linaro.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Oren Laadan <orenl@cellrox.com>
      Cc: Ruchi Kandoi <kandoiruchi@google.com>
      Cc: Rom Lemarchand <romlem@android.com>
      Cc: Todd Kjos <tkjos@google.com>
      Cc: Colin Cross <ccross@android.com>
      Cc: Nick Kralevich <nnk@google.com>
      Cc: Dmitry Shmidt <dimitrysh@google.com>
      Cc: Elliott Hughes <enh@google.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Android Kernel Team <kernel-team@android.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      904763e1
    • J
      proc: relax /proc/<tid>/timerslack_ns capability requirements · 7abbaf94
      John Stultz 提交于
      When an interface to allow a task to change another tasks timerslack was
      first proposed, it was suggested that something greater then
      CAP_SYS_NICE would be needed, as a task could be delayed further then
      what normally could be done with nice adjustments.
      
      So CAP_SYS_PTRACE was adopted instead for what became the
      /proc/<tid>/timerslack_ns interface.  However, for Android (where this
      feature originates), giving the system_server CAP_SYS_PTRACE would allow
      it to observe and modify all tasks memory.  This is considered too high
      a privilege level for only needing to change the timerslack.
      
      After some discussion, it was realized that a CAP_SYS_NICE process can
      set a task as SCHED_FIFO, so they could fork some spinning processes and
      set them all SCHED_FIFO 99, in effect delaying all other tasks for an
      infinite amount of time.
      
      So as a CAP_SYS_NICE task can already cause trouble for other tasks,
      using it as a required capability for accessing and modifying
      /proc/<tid>/timerslack_ns seems sufficient.
      
      Thus, this patch loosens the capability requirements to CAP_SYS_NICE and
      removes CAP_SYS_PTRACE, simplifying some of the code flow as well.
      
      This is technically an ABI change, but as the feature just landed in
      4.6, I suspect no one is yet using it.
      
      Link: http://lkml.kernel.org/r/1469132667-17377-1-git-send-email-john.stultz@linaro.orgSigned-off-by: NJohn Stultz <john.stultz@linaro.org>
      Reviewed-by: NNick Kralevich <nnk@google.com>
      Acked-by: NSerge Hallyn <serge@hallyn.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Oren Laadan <orenl@cellrox.com>
      Cc: Ruchi Kandoi <kandoiruchi@google.com>
      Cc: Rom Lemarchand <romlem@android.com>
      Cc: Todd Kjos <tkjos@google.com>
      Cc: Colin Cross <ccross@android.com>
      Cc: Nick Kralevich <nnk@google.com>
      Cc: Dmitry Shmidt <dimitrysh@google.com>
      Cc: Elliott Hughes <enh@google.com>
      Cc: Android Kernel Team <kernel-team@android.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7abbaf94
    • J
      meminfo: break apart a very long seq_printf with #ifdefs · e16e2d8e
      Joe Perches 提交于
      Use a specific routine to emit most lines so that the code is easier to
      read and maintain.
      
      akpm:
         text    data     bss     dec     hex filename
         2976       8       0    2984     ba8 fs/proc/meminfo.o before
         2669       8       0    2677     a75 fs/proc/meminfo.o after
      
      Link: http://lkml.kernel.org/r/8fce7fdef2ba081a4ef531594e97da8a9feebb58.1470810406.git.joe@perches.comSigned-off-by: NJoe Perches <joe@perches.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e16e2d8e
    • J
      seq/proc: modify seq_put_decimal_[u]ll to take a const char *, not char · 75ba1d07
      Joe Perches 提交于
      Allow some seq_puts removals by taking a string instead of a single
      char.
      
      [akpm@linux-foundation.org: update vmstat_show(), per Joe]
      Link: http://lkml.kernel.org/r/667e1cf3d436de91a5698170a1e98d882905e956.1470704995.git.joe@perches.comSigned-off-by: NJoe Perches <joe@perches.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      75ba1d07
    • A
      proc: faster /proc/*/status · f7a5f132
      Alexey Dobriyan 提交于
      top(1) opens the following files for every PID:
      
      	/proc/*/stat
      	/proc/*/statm
      	/proc/*/status
      
      This patch switches /proc/*/status away from seq_printf().
      The result is 13.5% speedup.
      
      Benchmark is open("/proc/self/status")+read+close 1.000.000 million times.
      
      				BEFORE
      $ perf stat -r 10 taskset -c 3 ./proc-self-status
      
       Performance counter stats for 'taskset -c 3 ./proc-self-status' (10 runs):
      
            10748.474301      task-clock (msec)         #    0.954 CPUs utilized            ( +-  0.91% )
                      12      context-switches          #    0.001 K/sec                    ( +-  1.09% )
                       1      cpu-migrations            #    0.000 K/sec
                     104      page-faults               #    0.010 K/sec                    ( +-  0.45% )
          37,424,127,876      cycles                    #    3.482 GHz                      ( +-  0.04% )
           8,453,010,029      stalled-cycles-frontend   #   22.59% frontend cycles idle     ( +-  0.12% )
           3,747,609,427      stalled-cycles-backend    #  10.01% backend cycles idle       ( +-  0.68% )
          65,632,764,147      instructions              #    1.75  insn per cycle
                                                        #    0.13  stalled cycles per insn  ( +-  0.00% )
          13,981,324,775      branches                  # 1300.773 M/sec                    ( +-  0.00% )
             138,967,110      branch-misses             #    0.99% of all branches          ( +-  0.18% )
      
            11.263885428 seconds time elapsed                                          ( +-  0.04% )
            ^^^^^^^^^^^^
      
      				AFTER
      $ perf stat -r 10 taskset -c 3 ./proc-self-status
      
       Performance counter stats for 'taskset -c 3 ./proc-self-status' (10 runs):
      
             9010.521776      task-clock (msec)         #    0.925 CPUs utilized            ( +-  1.54% )
                      11      context-switches          #    0.001 K/sec                    ( +-  1.54% )
                       1      cpu-migrations            #    0.000 K/sec                    ( +- 11.11% )
                     103      page-faults               #    0.011 K/sec                    ( +-  0.60% )
          32,352,310,603      cycles                    #    3.591 GHz                      ( +-  0.07% )
           7,849,199,578      stalled-cycles-frontend   #   24.26% frontend cycles idle     ( +-  0.27% )
           3,269,738,842      stalled-cycles-backend    #  10.11% backend cycles idle       ( +-  0.73% )
          56,012,163,567      instructions              #    1.73  insn per cycle
                                                        #    0.14  stalled cycles per insn  ( +-  0.00% )
          11,735,778,795      branches                  # 1302.453 M/sec                    ( +-  0.00% )
              98,084,459      branch-misses             #    0.84% of all branches          ( +-  0.28% )
      
             9.741247736 seconds time elapsed                                          ( +-  0.07% )
             ^^^^^^^^^^^
      
      Link: http://lkml.kernel.org/r/20160806125608.GB1187@p183.telecom.bySigned-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f7a5f132
    • J
      fs/proc/task_mmu.c: make the task_mmu walk_page_range() limit in clear_refs_write() obvious · 0f30206b
      James Morse 提交于
      Trying to walk all of virtual memory requires architecture specific
      knowledge.  On x86_64, addresses must be sign extended from bit 48,
      whereas on arm64 the top VA_BITS of address space have their own set of
      page tables.
      
      clear_refs_write() calls walk_page_range() on the range 0 to ~0UL, it
      provides a test_walk() callback that only expects to be walking over
      VMAs.  Currently walk_pmd_range() will skip memory regions that don't
      have a VMA, reporting them as a hole.
      
      As this call only expects to walk user address space, make it walk 0 to
      'highest_vm_end'.
      
      Link: http://lkml.kernel.org/r/1472655792-22439-1-git-send-email-james.morse@arm.comSigned-off-by: NJames Morse <james.morse@arm.com>
      Acked-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0f30206b
  15. 06 10月, 2016 1 次提交
  16. 28 9月, 2016 3 次提交