1. 13 10月, 2016 2 次提交
  2. 12 10月, 2016 27 次提交
  3. 11 10月, 2016 3 次提交
  4. 10 10月, 2016 1 次提交
  5. 08 10月, 2016 7 次提交
    • A
      vfs: Remove {get,set,remove}xattr inode operations · fd50ecad
      Andreas Gruenbacher 提交于
      These inode operations are no longer used; remove them.
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      fd50ecad
    • A
      cred: simpler, 1D supplementary groups · 81243eac
      Alexey Dobriyan 提交于
      Current supplementary groups code can massively overallocate memory and
      is implemented in a way so that access to individual gid is done via 2D
      array.
      
      If number of gids is <= 32, memory allocation is more or less tolerable
      (140/148 bytes).  But if it is not, code allocates full page (!)
      regardless and, what's even more fun, doesn't reuse small 32-entry
      array.
      
      2D array means dependent shifts, loads and LEAs without possibility to
      optimize them (gid is never known at compile time).
      
      All of the above is unnecessary.  Switch to the usual
      trailing-zero-len-array scheme.  Memory is allocated with
      kmalloc/vmalloc() and only as much as needed.  Accesses become simpler
      (LEA 8(gi,idx,4) or even without displacement).
      
      Maximum number of gids is 65536 which translates to 256KB+8 bytes.  I
      think kernel can handle such allocation.
      
      On my usual desktop system with whole 9 (nine) aux groups, struct
      group_info shrinks from 148 bytes to 44 bytes, yay!
      
      Nice side effects:
      
       - "gi->gid[i]" is shorter than "GROUP_AT(gi, i)", less typing,
      
       - fix little mess in net/ipv4/ping.c
         should have been using GROUP_AT macro but this point becomes moot,
      
       - aux group allocation is persistent and should be accounted as such.
      
      Link: http://lkml.kernel.org/r/20160817201927.GA2096@p183.telecom.bySigned-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Cc: Vasily Kulikov <segoon@openwall.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      81243eac
    • R
      mm, proc: fix region lost in /proc/self/smaps · 855af072
      Robert Ho 提交于
      Recently, Redhat reported that nvml test suite failed on QEMU/KVM,
      more detailed info please refer to:
      
         https://bugzilla.redhat.com/show_bug.cgi?id=1365721
      
      Actually, this bug is not only for NVDIMM/DAX but also for any other
      file systems.  This simple test case abstracted from nvml can easily
      reproduce this bug in common environment:
      
      -------------------------- testcase.c -----------------------------
      
      int
      is_pmem_proc(const void *addr, size_t len)
      {
              const char *caddr = addr;
      
              FILE *fp;
              if ((fp = fopen("/proc/self/smaps", "r")) == NULL) {
                      printf("!/proc/self/smaps");
                      return 0;
              }
      
              int retval = 0;         /* assume false until proven otherwise */
              char line[PROCMAXLEN];  /* for fgets() */
              char *lo = NULL;        /* beginning of current range in smaps file */
              char *hi = NULL;        /* end of current range in smaps file */
              int needmm = 0;         /* looking for mm flag for current range */
              while (fgets(line, PROCMAXLEN, fp) != NULL) {
                      static const char vmflags[] = "VmFlags:";
                      static const char mm[] = " wr";
      
                      /* check for range line */
                      if (sscanf(line, "%p-%p", &lo, &hi) == 2) {
                              if (needmm) {
                                      /* last range matched, but no mm flag found */
                                      printf("never found mm flag.\n");
                                      break;
                              } else if (caddr < lo) {
                                      /* never found the range for caddr */
                                      printf("#######no match for addr %p.\n", caddr);
                                      break;
                              } else if (caddr < hi) {
                                      /* start address is in this range */
                                      size_t rangelen = (size_t)(hi - caddr);
      
                                      /* remember that matching has started */
                                      needmm = 1;
      
                                      /* calculate remaining range to search for */
                                      if (len > rangelen) {
                                              len -= rangelen;
                                              caddr += rangelen;
                                              printf("matched %zu bytes in range "
                                                      "%p-%p, %zu left over.\n",
                                                              rangelen, lo, hi, len);
                                      } else {
                                              len = 0;
                                              printf("matched all bytes in range "
                                                              "%p-%p.\n", lo, hi);
                                      }
                              }
                      } else if (needmm && strncmp(line, vmflags,
                                              sizeof(vmflags) - 1) == 0) {
                              if (strstr(&line[sizeof(vmflags) - 1], mm) != NULL) {
                                      printf("mm flag found.\n");
                                      if (len == 0) {
                                              /* entire range matched */
                                              retval = 1;
                                              break;
                                      }
                                      needmm = 0;     /* saw what was needed */
                              } else {
                                      /* mm flag not set for some or all of range */
                                      printf("range has no mm flag.\n");
                                      break;
                              }
                      }
              }
      
              fclose(fp);
      
              printf("returning %d.\n", retval);
              return retval;
      }
      
      void *Addr;
      size_t Size;
      
      /*
       * worker -- the work each thread performs
       */
      static void *
      worker(void *arg)
      {
              int *ret = (int *)arg;
              *ret =  is_pmem_proc(Addr, Size);
              return NULL;
      }
      
      int main(int argc, char *argv[])
      {
              if (argc <  2 || argc > 3) {
                      printf("usage: %s file [env].\n", argv[0]);
                      return -1;
              }
      
              int fd = open(argv[1], O_RDWR);
      
              struct stat stbuf;
              fstat(fd, &stbuf);
      
              Size = stbuf.st_size;
              Addr = mmap(0, stbuf.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
      
              close(fd);
      
              pthread_t threads[NTHREAD];
              int ret[NTHREAD];
      
              /* kick off NTHREAD threads */
              for (int i = 0; i < NTHREAD; i++)
                      pthread_create(&threads[i], NULL, worker, &ret[i]);
      
              /* wait for all the threads to complete */
              for (int i = 0; i < NTHREAD; i++)
                      pthread_join(threads[i], NULL);
      
              /* verify that all the threads return the same value */
              for (int i = 1; i < NTHREAD; i++) {
                      if (ret[0] != ret[i]) {
                              printf("Error i %d ret[0] = %d ret[i] = %d.\n", i,
                                      ret[0], ret[i]);
                      }
              }
      
              printf("%d", ret[0]);
              return 0;
      }
      
      It failed as some threads can not find the memory region in
      "/proc/self/smaps" which is allocated in the main process
      
      It is caused by proc fs which uses 'file->version' to indicate the VMA that
      is the last one has already been handled by read() system call. When the
      next read() issues, it uses the 'version' to find the VMA, then the next
      VMA is what we want to handle, the related code is as follows:
      
              if (last_addr) {
                      vma = find_vma(mm, last_addr);
                      if (vma && (vma = m_next_vma(priv, vma)))
                              return vma;
              }
      
      However, VMA will be lost if the last VMA is gone, e.g:
      
      The process VMA list is A->B->C->D
      
      CPU 0                                  CPU 1
      read() system call
         handle VMA B
         version = B
      return to userspace
      
                                         unmap VMA B
      
      issue read() again to continue to get
      the region info
         find_vma(version) will get VMA C
         m_next_vma(C) will get VMA D
         handle D
         !!! VMA C is lost !!!
      
      In order to fix this bug, we make 'file->version' indicate the end address
      of the current VMA.  m_start will then look up a vma which with vma_start
      < last_vm_end and moves on to the next vma if we found the same or an
      overlapping vma.  This will guarantee that we will not miss an exclusive
      vma but we can still miss one if the previous vma was shrunk.  This is
      acceptable because guaranteeing "never miss a vma" is simply not feasible.
      User has to cope with some inconsistencies if the file is not read in one
      go.
      
      [mhocko@suse.com: changelog fixes]
      Link: http://lkml.kernel.org/r/1475296958-27652-1-git-send-email-robert.hu@intel.comAcked-by: NDave Hansen <dave.hansen@intel.com>
      Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
      Signed-off-by: NRobert Hu <robert.hu@intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      855af072
    • J
      proc: fix timerslack_ns CAP_SYS_NICE check when adjusting self · 4b2bd5fe
      John Stultz 提交于
      In changing from checking ptrace_may_access(p, PTRACE_MODE_ATTACH_FSCREDS)
      to capable(CAP_SYS_NICE), I missed that ptrace_my_access succeeds when p
      == current, but the CAP_SYS_NICE doesn't.
      
      Thus while the previous commit was intended to loosen the needed
      privileges to modify a processes timerslack, it needlessly restricted a
      task modifying its own timerslack via the proc/<tid>/timerslack_ns
      (which is permitted also via the PR_SET_TIMERSLACK method).
      
      This patch corrects this by checking if p == current before checking the
      CAP_SYS_NICE value.
      
      This patch applies on top of my two previous patches currently in -mm
      
      Link: http://lkml.kernel.org/r/1471906870-28624-1-git-send-email-john.stultz@linaro.orgSigned-off-by: NJohn Stultz <john.stultz@linaro.org>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Oren Laadan <orenl@cellrox.com>
      Cc: Ruchi Kandoi <kandoiruchi@google.com>
      Cc: Rom Lemarchand <romlem@android.com>
      Cc: Todd Kjos <tkjos@google.com>
      Cc: Colin Cross <ccross@android.com>
      Cc: Nick Kralevich <nnk@google.com>
      Cc: Dmitry Shmidt <dimitrysh@google.com>
      Cc: Elliott Hughes <enh@google.com>
      Cc: Android Kernel Team <kernel-team@android.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4b2bd5fe
    • J
      proc: add LSM hook checks to /proc/<tid>/timerslack_ns · 904763e1
      John Stultz 提交于
      As requested, this patch checks the existing LSM hooks
      task_getscheduler/task_setscheduler when reading or modifying the task's
      timerslack value.
      
      Previous versions added new get/settimerslack LSM hooks, but since they
      checked the same PROCESS__SET/GETSCHED values as existing hooks, it was
      suggested we just use the existing ones.
      
      Link: http://lkml.kernel.org/r/1469132667-17377-2-git-send-email-john.stultz@linaro.orgSigned-off-by: NJohn Stultz <john.stultz@linaro.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Oren Laadan <orenl@cellrox.com>
      Cc: Ruchi Kandoi <kandoiruchi@google.com>
      Cc: Rom Lemarchand <romlem@android.com>
      Cc: Todd Kjos <tkjos@google.com>
      Cc: Colin Cross <ccross@android.com>
      Cc: Nick Kralevich <nnk@google.com>
      Cc: Dmitry Shmidt <dimitrysh@google.com>
      Cc: Elliott Hughes <enh@google.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Android Kernel Team <kernel-team@android.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      904763e1
    • J
      proc: relax /proc/<tid>/timerslack_ns capability requirements · 7abbaf94
      John Stultz 提交于
      When an interface to allow a task to change another tasks timerslack was
      first proposed, it was suggested that something greater then
      CAP_SYS_NICE would be needed, as a task could be delayed further then
      what normally could be done with nice adjustments.
      
      So CAP_SYS_PTRACE was adopted instead for what became the
      /proc/<tid>/timerslack_ns interface.  However, for Android (where this
      feature originates), giving the system_server CAP_SYS_PTRACE would allow
      it to observe and modify all tasks memory.  This is considered too high
      a privilege level for only needing to change the timerslack.
      
      After some discussion, it was realized that a CAP_SYS_NICE process can
      set a task as SCHED_FIFO, so they could fork some spinning processes and
      set them all SCHED_FIFO 99, in effect delaying all other tasks for an
      infinite amount of time.
      
      So as a CAP_SYS_NICE task can already cause trouble for other tasks,
      using it as a required capability for accessing and modifying
      /proc/<tid>/timerslack_ns seems sufficient.
      
      Thus, this patch loosens the capability requirements to CAP_SYS_NICE and
      removes CAP_SYS_PTRACE, simplifying some of the code flow as well.
      
      This is technically an ABI change, but as the feature just landed in
      4.6, I suspect no one is yet using it.
      
      Link: http://lkml.kernel.org/r/1469132667-17377-1-git-send-email-john.stultz@linaro.orgSigned-off-by: NJohn Stultz <john.stultz@linaro.org>
      Reviewed-by: NNick Kralevich <nnk@google.com>
      Acked-by: NSerge Hallyn <serge@hallyn.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Oren Laadan <orenl@cellrox.com>
      Cc: Ruchi Kandoi <kandoiruchi@google.com>
      Cc: Rom Lemarchand <romlem@android.com>
      Cc: Todd Kjos <tkjos@google.com>
      Cc: Colin Cross <ccross@android.com>
      Cc: Nick Kralevich <nnk@google.com>
      Cc: Dmitry Shmidt <dimitrysh@google.com>
      Cc: Elliott Hughes <enh@google.com>
      Cc: Android Kernel Team <kernel-team@android.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7abbaf94
    • J
      meminfo: break apart a very long seq_printf with #ifdefs · e16e2d8e
      Joe Perches 提交于
      Use a specific routine to emit most lines so that the code is easier to
      read and maintain.
      
      akpm:
         text    data     bss     dec     hex filename
         2976       8       0    2984     ba8 fs/proc/meminfo.o before
         2669       8       0    2677     a75 fs/proc/meminfo.o after
      
      Link: http://lkml.kernel.org/r/8fce7fdef2ba081a4ef531594e97da8a9feebb58.1470810406.git.joe@perches.comSigned-off-by: NJoe Perches <joe@perches.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e16e2d8e