1. 14 Aug, 2013 3 commits
  2. 18 Jul, 2013 1 commit
  3. 04 Jul, 2013 14 commits
    • fs/proc/kcore.c: using strlcpy() instead of strncpy() · 30bc30df
      Zhao Hongjiang committed
      For a NUL-terminated string, set '\0' at the end.
      Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/proc/uptime.c:uptime_proc_show(): use get_monotonic_boottime() · 1d98a5fa
      Oleg Nesterov committed
      Change uptime_proc_show() to use get_monotonic_boottime() instead of
      do_posix_clock_monotonic_gettime() + monotonic_to_bootbased().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: John Stultz <johnstul@us.ibm.com>
      Cc: Tomas Janousek <tjanouse@redhat.com>
      Cc: Tomas Smetana <tsmetana@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vmcore: support mmap() on /proc/vmcore · 83086978
      HATAYAMA Daisuke committed
      This patch introduces mmap_vmcore().
      
      Permit neither writable nor executable mappings, even with mprotect(),
      because this mmap() is aimed at reading crash dump memory.  A non-writable
      mapping is also a requirement of remap_pfn_range() when mapping linear
      pages on non-consecutive physical pages; see is_cow_mapping().

      Set the VM_MIXEDMAP flag to remap memory by remap_pfn_range() and by
      remap_vmalloc_range_partial() at the same time for a single vma.
      do_munmap() can correctly clean up a vma partially remapped by the two
      functions in the abnormal case.  See zap_pte_range(), vm_normal_page() and
      their comments for details.
      
      On x86-32 PAE kernels, mmap() supports at most 16TB memory only.  This
      limitation comes from the fact that the third argument of
      remap_pfn_range(), pfn, is of 32-bit length on x86-32: unsigned long.
      
      [akpm@linux-foundation.org: use min(), switch to conventional error-unwinding approach]
      Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Lisa Mitchell <lisa.mitchell@hp.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Tested-by: Maxim Uvarov <muvarov@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
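The 16TB limit quoted in the message above can be checked with a few lines of arithmetic. This is only a sketch under two assumptions that the message does not spell out: 4 KiB pages (PAGE_SHIFT = 12) and a pfn argument that is exactly 32 bits wide.

```python
# Assumptions (not stated in the commit message): 4 KiB pages and a
# 32-bit unsigned long for the pfn argument of remap_pfn_range().
PAGE_SHIFT = 12
PFN_BITS = 32

# Number of distinct page frames a 32-bit pfn can name.
max_frames = 1 << PFN_BITS

# Bytes of physical memory those frames cover.
max_bytes = max_frames << PAGE_SHIFT

TIB = 1 << 40
print(max_bytes // TIB, "TiB addressable through a 32-bit pfn")
```

With larger pages the ceiling would scale accordingly, which is why the limit is tied to the pfn width and not to the PAE physical address width.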
    • vmcore: calculate vmcore file size from buffer size and total size of vmcore objects · 591ff716
      HATAYAMA Daisuke committed
      The previous patches newly added holes before each chunk of memory and
      the holes need to be counted in the vmcore file size.  There are two ways
      to compute the file size so that the holes are included:

      1) suppose m is a pointer to the last vmcore object in vmcore_list.
         Then the file size is (m->offset + m->size), or
      
      2) calculate sum of size of buffers for ELF header, program headers,
         ELF note segments and objects in vmcore_list.
      
      Although 1) is more direct and simpler than 2), 2) seems better in that
      it reflects internal object structure of /proc/vmcore.  Thus, this patch
      changes get_vmcore_size_elf{64, 32} so that it calculates size in the
      way of 2).
      
      As a result, both get_vmcore_size_elf{64, 32} have the same definition.
      Merge them as get_vmcore_size.
      Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Lisa Mitchell <lisa.mitchell@hp.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
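The two ways of computing the file size described above can be compared in a small model. This is a hypothetical sketch, not the kernel code: `VmcoreObject`, `lay_out()` and the sizes used are invented for illustration, and the ELF header buffer and note buffer are assumed to sit back to back at the start of the file.

```python
from dataclasses import dataclass


@dataclass
class VmcoreObject:
    offset: int  # file offset of the object within /proc/vmcore
    size: int    # page-aligned size of the object


def size_from_last_object(vmcore_list):
    """Way 1: end offset of the last object in the list."""
    m = vmcore_list[-1]
    return m.offset + m.size


def size_from_parts(elfcorebuf_sz, elfnotes_sz, vmcore_list):
    """Way 2: sum of the header buffer, note buffer and all objects."""
    return elfcorebuf_sz + elfnotes_sz + sum(m.size for m in vmcore_list)


def lay_out(elfcorebuf_sz, elfnotes_sz, sizes):
    """Hypothetical helper: place objects contiguously after the buffers."""
    offset = elfcorebuf_sz + elfnotes_sz
    objects = []
    for size in sizes:
        objects.append(VmcoreObject(offset, size))
        offset += size
    return objects


elfcorebuf_sz, elfnotes_sz = 4096, 8192
vmcore_list = lay_out(elfcorebuf_sz, elfnotes_sz, [0x10000, 0x3000])
print(size_from_last_object(vmcore_list))
print(size_from_parts(elfcorebuf_sz, elfnotes_sz, vmcore_list))
```

Both functions agree as long as the objects are laid out without gaps; way 2 additionally works when vmcore_list is empty, which way 1 cannot handle without a special case.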
    • vmcore: allow user process to remap ELF note segment buffer · ef9e78fd
      HATAYAMA Daisuke committed
      Now ELF note segment has been copied in the buffer on vmalloc memory.
      To allow user process to remap the ELF note segment buffer with
      remap_vmalloc_page, the corresponding VM area object has to have
      VM_USERMAP flag set.
      
      [akpm@linux-foundation.org: use the conventional comment layout]
      Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Lisa Mitchell <lisa.mitchell@hp.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vmcore: allocate ELF note segment in the 2nd kernel vmalloc memory · 087350c9
      HATAYAMA Daisuke committed
      We don't allocate the ELF note segment in the 1st kernel (old memory)
      on a page boundary for two reasons: to keep backward compatibility with
      old kernels, and because doing so would waste a considerable amount of
      memory on rounding up to page boundaries, since most of the buffers are
      in the per-cpu area.
      
      ELF notes are per-cpu, so total size of ELF note segments depends on
      number of CPUs.  The current maximum number of CPUs on x86_64 is 5192,
      and there's already system with 4192 CPUs in SGI, where total size
      amounts to 1MB.  This can be larger in the near future or possibly even
      now on another architecture that has larger size of note per a single
      cpu.  Thus, to avoid the case where memory allocation for large block
      fails, we allocate vmcore objects on vmalloc memory.
      
      This patch adds elfnotes_buf and elfnotes_sz variables to keep pointer
      to the ELF note segment buffer and its size.  There's no longer the
      vmcore object that corresponds to the ELF note segment in vmcore_list.
      Accordingly, read_vmcore() has a new case for the ELF note segment, and
      set_vmcore_list_offsets_elf{64,32}() and other helper functions start
      calculating offsets from the sum of the size of the ELF headers and the
      size of the ELF note segment.
      
      [akpm@linux-foundation.org: use min(), fix error-path vzalloc() leaks]
      Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Lisa Mitchell <lisa.mitchell@hp.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vmcore: treat memory chunks referenced by PT_LOAD program header entries in page-size boundary in vmcore_list · 7f614cd1
      HATAYAMA Daisuke committed
      Treat memory chunks referenced by PT_LOAD program header entries in
      page-size boundary in vmcore_list.  Formally, for each range [start,
      end], we set up the corresponding vmcore object in vmcore_list to
      [rounddown(start, PAGE_SIZE), roundup(end, PAGE_SIZE)].
      
      This change affects layout of /proc/vmcore.  The gaps generated by the
      rearrangement are newly made visible to applications as holes.
      Concretely, they are two ranges [rounddown(start, PAGE_SIZE), start] and
      [end, roundup(end, PAGE_SIZE)].
      
      Suppose variable m points at a vmcore object in vmcore_list, and
      variable phdr points at the program header of PT_LOAD type the variable
      m corresponds to.  Then, pictorially:
      
        m->offset                    +---------------+
                                     | hole          |
      phdr->p_offset =               +---------------+
        m->offset + (paddr - start)  |               |\
                                     | kernel memory | phdr->p_memsz
                                     |               |/
                                     +---------------+
                                     | hole          |
        m->offset + m->size          +---------------+
      
      where m->offset and m->offset + m->size are always page-size aligned.
      Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Lisa Mitchell <lisa.mitchell@hp.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
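The rounding rule and the two holes it creates, as described in the message above, can be sketched as follows. The helper names mirror the kernel's rounddown()/roundup() macros, but the code is an illustrative model; PAGE_SIZE = 4096 and the example addresses are assumptions.

```python
PAGE_SIZE = 4096  # assumed page size for the example


def rounddown(x, align):
    """Largest multiple of align that is <= x."""
    return x - (x % align)


def roundup(x, align):
    """Smallest multiple of align that is >= x."""
    return rounddown(x + align - 1, align)


def page_align_chunk(start, end):
    """Return the page-aligned object range plus the leading and trailing holes.

    The object covers [rounddown(start), roundup(end)]; whatever lies
    between those bounds and the original [start, end] is a hole.
    """
    obj_start = rounddown(start, PAGE_SIZE)
    obj_end = roundup(end, PAGE_SIZE)
    head_hole = (obj_start, start)
    tail_hole = (end, obj_end)
    return (obj_start, obj_end), head_hole, tail_hole


obj, head, tail = page_align_chunk(0x1200, 0x5E00)
print([hex(v) for v in obj], [hex(v) for v in head], [hex(v) for v in tail])
```

A chunk that is already page-aligned produces two empty holes, so the layout only changes for PT_LOAD entries whose bounds fall inside a page.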
    • vmcore: allocate buffer for ELF headers on page-size alignment · f2bdacdd
      HATAYAMA Daisuke committed
      Allocate ELF headers on page-size boundary using __get_free_pages()
      instead of kmalloc().
      
      A later patch will merge PT_NOTE entries into a single unique one and
      decrease the buffer size actually used.  Keep the original buffer size
      in variable elfcorebuf_sz_orig so that the buffer can be freed later,
      and keep the size actually used, rounded up to a page-size boundary,
      separately in variable elfcorebuf_sz.
      
      The size of part of the ELF buffer exported from /proc/vmcore is
      elfcorebuf_sz.
      
      The merged, removed PT_NOTE entries, i.e.  the range [elfcorebuf_sz,
      elfcorebuf_sz_orig], is filled with 0.
      
      Use size of the ELF headers as an initial offset value in
      set_vmcore_list_offsets_elf{64,32} and
      process_ptload_program_headers_elf{64,32} in order to indicate that the
      offset includes the holes towards the page boundary.
      
      As a result, both set_vmcore_list_offsets_elf{64,32} have the same
      definition.  Merge them as set_vmcore_list_offsets.
      
      [akpm@linux-foundation.org: add free_elfcorebuf(), cleanups]
      Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Lisa Mitchell <lisa.mitchell@hp.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vmcore: clean up read_vmcore() · b27eb186
      HATAYAMA Daisuke committed
      Rewrite the part of read_vmcore() that reads objects in vmcore_list in
      the same way as the part that reads ELF headers, which removes some
      duplicated and redundant code.
      Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Lisa Mitchell <lisa.mitchell@hp.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • pagemap: prepare to reuse constant bits with page-shift · 541c237c
      Pavel Emelyanov committed
      In order to reuse bits from pagemap entries gracefully, we leave the
      entries as is, but on pagemap open emit a warning in dmesg that bits
      55-60 are about to change in a couple of releases.  Next, if a user
      issues the soft-dirty clear command via the clear_refs file (it was
      disabled before v3.9), we assume that they are aware of the new pagemap
      format, note that fact and report the bits in pagemap in the new manner.
      
      The "migration strategy" looks like this then:
      
      1. existing users are not affected -- they don't touch soft-dirty feature, thus
         see old bits in pagemap, but are warned and have time to fix themselves
      2. those who use soft-dirty know about new pagemap format
      3. some time soon we get rid of any signs of page-shift in pagemap as well as
         this trick with clear-soft-dirty affecting pagemap format.
      Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: soft-dirty bits for user memory changes tracking · 0f8975ec
      Pavel Emelyanov committed
      The soft-dirty is a bit on a PTE which helps to track which pages a task
      writes to.  In order to do this tracking one should
      
        1. Clear soft-dirty bits from PTEs ("echo 4 > /proc/PID/clear_refs")
        2. Wait some time.
        3. Read soft-dirty bits (55'th in /proc/PID/pagemap2 entries)
      
      To do this tracking, the writable bit is cleared from PTEs when the
      soft-dirty bit is.  Thus, after this, when the task tries to modify a
      page at some virtual address the #PF occurs and the kernel sets the
      soft-dirty bit on the respective PTE.
      
      Note, that although all the task's address space is marked as r/o after
      the soft-dirty bits clear, the #PF-s that occur after that are processed
      fast.  This is so, since the pages are still mapped to physical memory,
      and thus all the kernel does is finds this fact out and puts back
      writable, dirty and soft-dirty bits on the PTE.
      
      Another thing to note, is that when mremap moves PTEs they are marked
      with soft-dirty as well, since from the user perspective mremap modifies
      the virtual memory at mremap's new address.
      Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
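Step 3 of the tracking procedure above (reading bit 55 of each entry) might look roughly like this from user space. This is a minimal sketch under several assumptions: entries are 8-byte native-endian integers, one per virtual page; the page size is 4096; `start` is page-aligned; and the file to read is passed in as a path (the message names /proc/PID/pagemap2, while mainline kernels report the bit through /proc/PID/pagemap). The function names are made up for the example.

```python
import struct

PAGE_SIZE = 4096      # assumed; real code could use os.sysconf("SC_PAGE_SIZE")
ENTRY_SIZE = 8        # one 64-bit entry per virtual page
SOFT_DIRTY_BIT = 55   # bit position given in the commit message


def is_soft_dirty(entry):
    """True if the soft-dirty bit is set in a 64-bit pagemap entry."""
    return bool((entry >> SOFT_DIRTY_BIT) & 1)


def soft_dirty_vaddrs(pagemap_path, start, end):
    """Return page addresses in [start, end) whose entry is soft-dirty.

    The entry for virtual page number n is stored at file offset n * 8.
    """
    dirty = []
    with open(pagemap_path, "rb") as f:
        f.seek((start // PAGE_SIZE) * ENTRY_SIZE)
        for vaddr in range(start, end, PAGE_SIZE):
            raw = f.read(ENTRY_SIZE)
            if len(raw) < ENTRY_SIZE:
                break  # ran past the end of the file
            (entry,) = struct.unpack("=Q", raw)
            if is_soft_dirty(entry):
                dirty.append(vaddr)
    return dirty
```

A caller would first write "4" to the task's clear_refs file, wait, and then call `soft_dirty_vaddrs()` over each mapped range to collect the pages written in between.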
    • pagemap: introduce pagemap_entry_t without pmshift bits · 2b0a9f01
      Pavel Emelyanov committed
      These bits are always constant (== PAGE_SHIFT) and just occupy space in
      the entry.  Moreover, in next patch we will need to report one more bit
      in the pagemap, but all bits are already busy on it.
      
      That said, describe the pagemap entry that has 6 more free zero bits.
      Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
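To see why the page-shift bits carry no information, consider a model of the older entry layout. The sketch assumes the layout documented for pagemap at the time (PFN in bits 0-54, page shift in bits 55-60, "present" in bit 63) and PAGE_SHIFT = 12; the function names are hypothetical and this is not the kernel's pagemap_entry_t code.

```python
PAGE_SHIFT = 12                     # assumed; constant for a given kernel build
PSHIFT_OFFSET = 55                  # first bit of the page-shift field
PSHIFT_BITS = 6                     # bits 55-60
PSHIFT_MASK = ((1 << PSHIFT_BITS) - 1) << PSHIFT_OFFSET
PFN_MASK = (1 << 55) - 1            # bits 0-54
PRESENT = 1 << 63


def legacy_entry(pfn):
    """Build an old-format entry for a present page."""
    return PRESENT | (PAGE_SHIFT << PSHIFT_OFFSET) | (pfn & PFN_MASK)


def pshift(entry):
    """Extract the page-shift field from an entry."""
    return (entry & PSHIFT_MASK) >> PSHIFT_OFFSET


def strip_pshift(entry):
    """Zero the page-shift field, leaving six free bits in the entry."""
    return entry & ~PSHIFT_MASK


e = legacy_entry(0x1234)
print(pshift(e), hex(strip_pshift(e)))
```

Every present entry yields the same `pshift()` value, so zeroing the field loses nothing and leaves room for new flags such as the soft-dirty bit.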
    • clear_refs: introduce private struct for mm_walk · af9de7eb
      Pavel Emelyanov committed
      In the next patch the clear-refs-type will be required in the
      clear_refs_pte_range function, so prepare the walk->private to carry
      this info.
      Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • clear_refs: sanitize accepted commands declaration · 040fa020
      Pavel Emelyanov committed
      This is the implementation of the soft-dirty bit concept that should
      help keep track of changes in user memory, which in turn is very-very
      required by the checkpoint-restore project (http://criu.org).
      
      To create a dump of an application(s) we save all the information about
      it to files, and the biggest part of such dump is the contents of tasks'
      memory.  However, there are usage scenarios where it's not required to
      get _all_ the task memory while creating a dump.  For example, when
      doing periodical dumps, a full memory dump is required only at the
      first step; after that, only incremental changes of memory are taken.
      Another example is live migration.  We copy all the memory to the
      destination node without stopping all tasks, then stop them, check
      which pages have changed, dump them and the rest of the state, then
      copy it to the destination node.  This decreases freeze time
      significantly.
      
      That said, some help from kernel to watch how processes modify the
      contents of their memory is required.
      
      The proposal is to track changes with the help of new soft-dirty bit
      this way:
      
      1. First do "echo 4 > /proc/$pid/clear_refs".
         At that point kernel clears the soft dirty _and_ the writable bits from all
         ptes of process $pid. From now on every write to any page will result in #pf
         and the subsequent call to pte_mkdirty/pmd_mkdirty, which in turn will set
         the soft dirty flag.
      
      2. Then read the /proc/$pid/pagemap2 and check the soft-dirty bit reported there
         (the 55'th one). If set, the respective pte was written to since last call
         to clear refs.
      
      The soft-dirty bit is the _PAGE_BIT_HIDDEN one.  Although it's used by
      kmemcheck, the latter marks kernel pages with it, while the soft-dirty
      bit is put on user pages, so they do not conflict with each other.
      
      This patch:
      
      A new clear-refs type will be added in the next patch, so prepare
      code for that.
      
      [akpm@linux-foundation.org: don't assume that sizeof(enum clear_refs_types) == sizeof(int)]
      Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 29 Jun, 2013 6 commits
  5. 13 Jun, 2013 1 commit
    • kmsg: honor dmesg_restrict sysctl on /dev/kmsg · 637241a9
      Kees Cook committed
      The dmesg_restrict sysctl currently covers the syslog method for
      accessing dmesg, however /dev/kmsg isn't covered by the same
      protections.  Most people haven't noticed because util-linux dmesg(1)
      defaults to using the syslog method for access in older versions.
      Newer versions of util-linux dmesg(1) default to reading directly from
      /dev/kmsg.
      
      To fix /dev/kmsg, let's compare the existing interfaces and what they
      allow:
      
       - /proc/kmsg allows:
        - open (SYSLOG_ACTION_OPEN) if CAP_SYSLOG since it uses a destructive
          single-reader interface (SYSLOG_ACTION_READ).
        - everything, after an open.
      
       - syslog syscall allows:
        - anything, if CAP_SYSLOG.
        - SYSLOG_ACTION_READ_ALL and SYSLOG_ACTION_SIZE_BUFFER, if
          dmesg_restrict==0.
        - nothing else (EPERM).
      
      The use-cases were:
       - dmesg(1) needs to do non-destructive SYSLOG_ACTION_READ_ALLs.
       - sysklog(1) needs to open /proc/kmsg, drop privs, and still issue the
         destructive SYSLOG_ACTION_READs.
      
      AIUI, dmesg(1) is moving to /dev/kmsg, and systemd-journald doesn't
      clear the ring buffer.
      
      Based on the comments in devkmsg_llseek, it sounds like actions besides
      reading aren't going to be supported by /dev/kmsg (i.e.
      SYSLOG_ACTION_CLEAR), so we have a strict subset of the non-destructive
      syslog syscall actions.
      
      To this end, move the check as Josh had done, but also rename the
      constants to reflect their new uses (SYSLOG_FROM_CALL becomes
      SYSLOG_FROM_READER, and SYSLOG_FROM_FILE becomes SYSLOG_FROM_PROC).
      SYSLOG_FROM_READER allows non-destructive actions, and SYSLOG_FROM_PROC
      allows destructive actions after a capabilities-constrained
      SYSLOG_ACTION_OPEN check.
      
       - /dev/kmsg allows:
        - open if CAP_SYSLOG or dmesg_restrict==0
        - reading/polling, after open
      
      Addresses https://bugzilla.redhat.com/show_bug.cgi?id=903192
      
      [akpm@linux-foundation.org: use pr_warn_once()]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reported-by: Christian Kujau <lists@nerdbynature.de>
      Tested-by: Josh Boyer <jwboyer@redhat.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
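The access rules listed in the message above can be restated as a small decision model. This is a hypothetical sketch of the policy as described, not the kernel's permission-check code: the function names and the string representation of actions are invented, and only the SYSLOG_ACTION_* names come from the message.

```python
# Actions that the syslog syscall permits to everyone when dmesg_restrict == 0.
UNRESTRICTED_ACTIONS = {"SYSLOG_ACTION_READ_ALL", "SYSLOG_ACTION_SIZE_BUFFER"}


def proc_kmsg_allowed(action, has_cap_syslog, already_open):
    """/proc/kmsg: open requires CAP_SYSLOG; everything is allowed after an open."""
    if already_open:
        return True
    return action == "SYSLOG_ACTION_OPEN" and has_cap_syslog


def syslog_syscall_allowed(action, has_cap_syslog, dmesg_restrict):
    """syslog(2): anything with CAP_SYSLOG, else only the unrestricted actions."""
    if has_cap_syslog:
        return True
    return dmesg_restrict == 0 and action in UNRESTRICTED_ACTIONS


def dev_kmsg_open_allowed(has_cap_syslog, dmesg_restrict):
    """/dev/kmsg: open if CAP_SYSLOG or dmesg_restrict == 0; reads follow the open."""
    return has_cap_syslog or dmesg_restrict == 0


print(dev_kmsg_open_allowed(has_cap_syslog=False, dmesg_restrict=1))
print(syslog_syscall_allowed("SYSLOG_ACTION_READ_ALL", False, 0))
```

Modeled this way, an unprivileged /dev/kmsg reader is refused exactly when an unprivileged SYSLOG_ACTION_READ_ALL caller is, which is the gap the patch sets out to close.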
  6. 28 May, 2013 1 commit
  7. 05 May, 2013 1 commit
  8. 02 May, 2013 12 commits
  9. 01 May, 2013 1 commit