You need to sign in or sign up before continuing.
  1. 20 1月, 2016 1 次提交
  2. 18 1月, 2016 1 次提交
    • S
      dmaengine: rcar-dmac: Document SoC specific bindings · 6bf64103
      Simon Horman 提交于
      In general Renesas hardware is not documented to the extent where the
      relationship between IP blocks on different SoCs can be assumed although
      they may appear to operate the same way. Furthermore the documentation
      typically does not specify a version for individual IP blocks. For these
      reasons a convention of using the SoC name in place of a version and
      providing SoC-specific compat strings has been adopted.
      
      Although not universally liked this convention is used in the bindings for
      most drivers for Renesas hardware. The purpose of this patch is to
      update the Renesas R-Car DMA Controller driver to follow this convention.
      
      Cc: devicetree@vger.kernel.org
      Cc: Magnus Damm <magnus.damm@gmail.com>
      Cc: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
      Signed-off-by: NSimon Horman <horms+renesas@verge.net.au>
      Acked-by: NGeert Uytterhoeven <geert+renesas@glider.be>
      Acked-by: NRob Herring <robh@kernel.org>
      Signed-off-by: NVinod Koul <vinod.koul@intel.com>
      6bf64103
  3. 17 1月, 2016 1 次提交
  4. 16 1月, 2016 3 次提交
  5. 15 1月, 2016 10 次提交
    • R
      Documentation/filesystems: describe the shared memory usage/accounting · 0bc126d4
      Rodrigo Freire 提交于
      The Shared Memory accounting support is present in Kernel since commit
      4b02108a ("mm: oom analysis: add shmem vmstat") and in userland
      free(1) since 2014.  This patch updates the Documentation to reflect
      this change.
      Signed-off-by: NRodrigo Freire <rfreire@redhat.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0bc126d4
    • J
      mm: memcontrol: account socket memory in unified hierarchy memory controller · f7e1cb6e
      Johannes Weiner 提交于
      Socket memory can be a significant share of overall memory consumed by
      common workloads.  In order to provide reasonable resource isolation in
      the unified hierarchy, this type of memory needs to be included in the
      tracking/accounting of a cgroup under active memory resource control.
      
      Overhead is only incurred when a non-root control group is created AND
      the memory controller is instructed to track and account the memory
      footprint of that group.  cgroup.memory=nosocket can be specified on the
      boot commandline to override any runtime configuration and forcibly
      exclude socket memory from active memory resource control.
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Reviewed-by: NVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f7e1cb6e
    • D
      mm: mmap: add new /proc tunable for mmap_base ASLR · d07e2259
      Daniel Cashman 提交于
      Address Space Layout Randomization (ASLR) provides a barrier to
      exploitation of user-space processes in the presence of security
      vulnerabilities by making it more difficult to find desired code/data
      which could help an attack.  This is done by adding a random offset to
      the location of regions in the process address space, with a greater
      range of potential offset values corresponding to better protection/a
      larger search-space for brute force, but also to greater potential for
      fragmentation.
      
      The offset added to the mmap_base address, which provides the basis for
      the majority of the mappings for a process, is set once on process exec
      in arch_pick_mmap_layout() and is done via hard-coded per-arch values,
      which reflect, hopefully, the best compromise for all systems.  The
      trade-off between increased entropy in the offset value generation and
      the corresponding increased variability in address space fragmentation
      is not absolute, however, and some platforms may tolerate higher amounts
      of entropy.  This patch introduces both new Kconfig values and a sysctl
      interface which may be used to change the amount of entropy used for
      offset generation on a system.
      
      The direct motivation for this change was in response to the
      libstagefright vulnerabilities that affected Android, specifically to
      information provided by Google's project zero at:
      
        http://googleprojectzero.blogspot.com/2015/09/stagefrightened.html
      
      The attack presented therein, by Google's project zero, specifically
      targeted the limited randomness used to generate the offset added to the
      mmap_base address in order to craft a brute-force-based attack.
      Concretely, the attack was against the mediaserver process, which was
      limited to respawning every 5 seconds, on an arm device.  The hard-coded
      8 bits used resulted in an average expected success rate of defeating
      the mmap ASLR after just over 10 minutes (128 tries at 5 seconds a
      piece).  With this patch, and an accompanying increase in the entropy
      value to 16 bits, the same attack would take an average expected time of
      over 45 hours (32768 tries), which makes it both less feasible and more
      likely to be noticed.
      
      The introduced Kconfig and sysctl options are limited by per-arch
      minimum and maximum values, the minimum of which was chosen to match the
      current hard-coded value and the maximum of which was chosen so as to
      give the greatest flexibility without generating an invalid mmap_base
      address, generally a 3-4 bits less than the number of bits in the
      user-space accessible virtual address space.
      
      When decided whether or not to change the default value, a system
      developer should consider that mmap_base address could be placed
      anywhere up to 2^(value) bits away from the non-randomized location,
      which would introduce variable-sized areas above and below the mmap_base
      address such that the maximum vm_area_struct size may be reduced,
      preventing very large allocations.
      
      This patch (of 4):
      
      ASLR only uses as few as 8 bits to generate the random offset for the
      mmap base address on 32 bit architectures.  This value was chosen to
      prevent a poorly chosen value from dividing the address space in such a
      way as to prevent large allocations.  This may not be an issue on all
      platforms.  Allow the specification of a minimum number of bits so that
      platforms desiring greater ASLR protection may determine where to place
      the trade-off.
      Signed-off-by: NDaniel Cashman <dcashman@google.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mark Salyzyn <salyzyn@android.com>
      Cc: Jeff Vander Stoep <jeffv@google.com>
      Cc: Nick Kralevich <nnk@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Hector Marco-Gisbert <hecmargi@upv.es>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d07e2259
    • J
      mm, procfs: breakdown RSS for anon, shmem and file in /proc/pid/status · 8cee852e
      Jerome Marchand 提交于
      There are several shortcomings with the accounting of shared memory
      (SysV shm, shared anonymous mapping, mapping of a tmpfs file).  The
      values in /proc/<pid>/status and <...>/statm don't allow to distinguish
      between shmem memory and a shared mapping to a regular file, even though
      theirs implication on memory usage are quite different: during reclaim,
      file mapping can be dropped or written back on disk, while shmem needs a
      place in swap.
      
      Also, to distinguish the memory occupied by anonymous and file mappings,
      one has to read the /proc/pid/statm file, which has a field for the file
      mappings (again, including shmem) and total memory occupied by these
      mappings (i.e.  equivalent to VmRSS in the <...>/status file.  Getting
      the value for anonymous mappings only is thus not exactly user-friendly
      (the statm file is intended to be rather efficiently machine-readable).
      
      To address both of these shortcomings, this patch adds a breakdown of
      VmRSS in /proc/<pid>/status via new fields RssAnon, RssFile and
      RssShmem, making use of the previous preparatory patch.  These fields
      tell the user the memory occupied by private anonymous pages, mapped
      regular files and shmem, respectively.  Other existing fields in /status
      and /statm files are left without change.  The /statm file can be
      extended in the future, if there's a need for that.
      
      Example (part of) /proc/pid/status output including the new Rss* fields:
      
      VmPeak:  2001008 kB
      VmSize:  2001004 kB
      VmLck:         0 kB
      VmPin:         0 kB
      VmHWM:      5108 kB
      VmRSS:      5108 kB
      RssAnon:              92 kB
      RssFile:            1324 kB
      RssShmem:           3692 kB
      VmData:      192 kB
      VmStk:       136 kB
      VmExe:         4 kB
      VmLib:      1784 kB
      VmPTE:      3928 kB
      VmPMD:        20 kB
      VmSwap:        0 kB
      HugetlbPages:          0 kB
      
      [vbabka@suse.cz: forward-porting, tweak changelog]
      Signed-off-by: NJerome Marchand <jmarchan@redhat.com>
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8cee852e
    • V
      mm, proc: account for shmem swap in /proc/pid/smaps · c261e7d9
      Vlastimil Babka 提交于
      Currently, /proc/pid/smaps will always show "Swap: 0 kB" for
      shmem-backed mappings, even if the mapped portion does contain pages
      that were swapped out.  This is because unlike private anonymous
      mappings, shmem does not change pte to swap entry, but pte_none when
      swapping the page out.  In the smaps page walk, such page thus looks
      like it was never faulted in.
      
      This patch changes smaps_pte_entry() to determine the swap status for
      such pte_none entries for shmem mappings, similarly to how
      mincore_page() does it.  Swapped out shmem pages are thus accounted for.
      For private mappings of tmpfs files that COWed some of the pages, swaped
      out status of the original shmem pages is naturally ignored.  If some of
      the private copies was also swapped out, they are accounted via their
      page table swap entries, so the resulting reported swap usage is then a
      sum of both swapped out private copies, and swapped out shmem pages that
      were not COWed.  No double accounting can thus happen.
      
      The accounting is arguably still not as precise as for private anonymous
      mappings, since now we will count also pages that the process in
      question never accessed, but another process populated them and then let
      them become swapped out.  I believe it is still less confusing and
      subtle than not showing any swap usage by shmem mappings at all.
      Swapped out counter might of interest of users who would like to prevent
      from future swapins during performance critical operation and pre-fault
      them at their convenience.  Especially for larger swapped out regions
      the cost of swapin is much higher than a fresh page allocation.  So a
      differentiation between pte_none vs.  swapped out is important for those
      usecases.
      
      One downside of this patch is that it makes /proc/pid/smaps more
      expensive for shmem mappings, as we consult the radix tree for each
      pte_none entry, so the overal complexity is O(n*log(n)).  I have
      measured this on a process that creates a 2GB mapping and dirties single
      pages with a stride of 2MB, and time how long does it take to cat
      /proc/pid/smaps of this process 100 times.
      
      Private anonymous mapping:
      
      real    0m0.949s
      user    0m0.116s
      sys     0m0.348s
      
      Mapping of a /dev/shm/file:
      
      real    0m3.831s
      user    0m0.180s
      sys     0m3.212s
      
      The difference is rather substantial, so the next patch will reduce the
      cost for shared or read-only mappings.
      
      In a less controlled experiment, I've gathered pids of processes on my
      desktop that have either '/dev/shm/*' or 'SYSV*' in smaps.  This
      included the Chrome browser and some KDE processes.  Again, I've run cat
      /proc/pid/smaps on each 100 times.
      
      Before this patch:
      
      real    0m9.050s
      user    0m0.518s
      sys     0m8.066s
      
      After this patch:
      
      real    0m9.221s
      user    0m0.541s
      sys     0m8.187s
      
      This suggests low impact on average systems.
      
      Note that this patch doesn't attempt to adjust the SwapPss field for
      shmem mappings, which would need extra work to determine who else could
      have the pages mapped.  Thus the value stays zero except for COWed
      swapped out pages in a shmem mapping, which are accounted as usual.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Acked-by: NJerome Marchand <jmarchan@redhat.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c261e7d9
    • V
      mm, documentation: clarify /proc/pid/status VmSwap limitations for shmem · bf9683d6
      Vlastimil Babka 提交于
      This series is based on Jerome Marchand's [1] so let me quote the first
      paragraph from there:
      
      There are several shortcomings with the accounting of shared memory
      (sysV shm, shared anonymous mapping, mapping to a tmpfs file).  The
      values in /proc/<pid>/status and statm don't allow to distinguish
      between shmem memory and a shared mapping to a regular file, even though
      their implications on memory usage are quite different: at reclaim, file
      mapping can be dropped or written back on disk while shmem needs a place
      in swap.  As for shmem pages that are swapped-out or in swap cache, they
      aren't accounted at all.
      
      The original motivation for myself is that a customer found (IMHO
      rightfully) confusing that e.g.  top output for process swap usage is
      unreliable with respect to swapped out shmem pages, which are not
      accounted for.
      
      The fundamental difference between private anonymous and shmem pages is
      that the latter has PTE's converted to pte_none, and not swapents.  As
      such, they are not accounted to the number of swapents visible e.g.  in
      /proc/pid/status VmSwap row.  It might be theoretically possible to use
      swapents when swapping out shmem (without extra cost, as one has to
      change all mappers anyway), and on swap in only convert the swapent for
      the faulting process, leaving swapents in other processes until they
      also fault (so again no extra cost).  But I don't know how many
      assumptions this would break, and it would be too disruptive change for
      a relatively small benefit.
      
      Instead, my approach is to document the limitation of VmSwap, and
      provide means to determine the swap usage for shmem areas for those who
      are interested and willing to pay the price, using /proc/pid/smaps.
      Because outside of ipcs, I don't think it's possible to currently to
      determine the usage at all.  The previous patchset [1] did introduce new
      shmem-specific fields into smaps output, and functions to determine the
      values.  I take a simpler approach, noting that smaps output already has
      a "Swap: X kB" line, where currently X == 0 always for shmem areas.  I
      think we can just consider this a bug and provide the proper value by
      consulting the radix tree, as e.g.  mincore_page() does.  In the patch
      changelog I explain why this is also not perfect (and cannot be without
      swapents), but still arguably much better than showing a 0.
      
      The last two patches are adapted from Jerome's patchset and provide a
      VmRSS breakdown to RssAnon, RssFile and RssShm in /proc/pid/status.
      Hugh noted that this is a welcome addition, and I agree that it might
      help e.g.  debugging process memory usage at albeit non-zero, but still
      rather low cost of extra per-mm counter and some page flag checks.
      
      [1] http://lwn.net/Articles/611966/
      
      This patch (of 6):
      
      The documentation for /proc/pid/status does not mention that the value
      of VmSwap counts only swapped out anonymous private pages, and not
      swapped out pages of the underlying shmem objects (for shmem mappings).
      This is not obvious, so document this limitation.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NJerome Marchand <jmarchan@redhat.com>
      Acked-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bf9683d6
    • A
      Make sure that highmem pages are not added to symlink page cache · e8ecde25
      Al Viro 提交于
      inode_nohighmem() is sufficient to make sure that page_get_link()
      won't try to allocate a highmem page.  Moreover, it is sufficient
      to make sure that page_symlink/__page_symlink won't do the same
      thing.  However, any filesystem that manually preseeds the symlink's
      page cache upon symlink(2) needs to make sure that the page it
      inserts there won't be a highmem one.
      
      Fortunately, only nfs and shmem have run afoul of that...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      e8ecde25
    • L
      thermal: add description for integral_cutoff unit · ec3fc58b
      Leo Yan 提交于
      Add more explicitly description for unit of integral_cutoff which used
      by power allocator governor.
      Signed-off-by: NLeo Yan <leo.yan@linaro.org>
      Acked-by: NJavi Merino <javi.merino@arm.com>
      Signed-off-by: NJonathan Corbet <corbet@lwn.net>
      ec3fc58b
    • S
      Documentation: update libhugetlbfs site url · ceec86ec
      SeongJae Park 提交于
      The site for libhugetlbfs has moved from sourceforge to github. This
      commit updates the old url.
      Signed-off-by: NSeongJae Park <sj38.park@gmail.com>
      Acked-by: NMike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: NJonathan Corbet <corbet@lwn.net>
      ceec86ec
    • B
      Documentation: Explain pci=conf1,conf2 more verbosely · afd8c084
      Borislav Petkov 提交于
      People complained that setting the PCI config space access mechanism
      through "pci=conf1" or "pci=conf2" on the command line is not really
      documented. Yeah, can you blame them? Look at what we have now.
      
      So try to improve the situation a bit by explaining what those "conf1"
      and "conf2" things actually mean.
      
      See http://wiki.osdev.org/PCI for more info.
      Suggested-by: NEric Morton <Eric.Morton@amd.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      [jc: Added the above URL to the document too]
      Signed-off-by: NJonathan Corbet <corbet@lwn.net>
      afd8c084
  6. 14 1月, 2016 4 次提交
  7. 13 1月, 2016 1 次提交
    • M
      asm-generic: implement virt_xxx memory barriers · 6a65d263
      Michael S. Tsirkin 提交于
      Guests running within virtual machines might be affected by SMP effects even if
      the guest itself is compiled without SMP support.  This is an artifact of
      interfacing with an SMP host while running an UP kernel.  Using mandatory
      barriers for this use-case would be possible but is often suboptimal.
      
      In particular, virtio uses a bunch of confusing ifdefs to work around
      this, while xen just uses the mandatory barriers.
      
      To better handle this case, low-level virt_mb() etc macros are made available.
      These are implemented trivially using the low-level __smp_xxx macros,
      the purpose of these wrappers is to annotate those specific cases.
      
      These have the same effect as smp_mb() etc when SMP is enabled, but generate
      identical code for SMP and non-SMP systems. For example, virtual machine guests
      should use virt_mb() rather than smp_mb() when synchronizing against a
      (possibly SMP) host.
      Suggested-by: NDavid Miller <davem@davemloft.net>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      6a65d263
  8. 12 1月, 2016 7 次提交
  9. 11 1月, 2016 12 次提交