1. 09 May 2018 (2 commits)
  2. 08 May 2018 (2 commits)
  3. 07 May 2018 (1 commit)
    • PCI: remove PCI_DMA_BUS_IS_PHYS · 325ef185
      Authored by Christoph Hellwig
      This was used by the ide, scsi and networking code in the past to
      determine if they should bounce payloads.  Now that the dma mapping
      layer always has to support dma to all physical memory (thanks to
      swiotlb for non-iommu systems), there is no need for this crude hack
      any more (a sketch of the mapping call drivers rely on instead
      follows below).
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Palmer Dabbelt <palmer@sifive.com> (for riscv)
      Reviewed-by: Jens Axboe <axboe@kernel.dk>
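
      Editor's note: a minimal sketch of the mapping pattern drivers rely
      on instead, assuming a generic struct device *dev (names here are
      illustrative, not from the patch). With swiotlb in place the DMA
      layer bounces transparently whenever the device cannot reach the
      buffer:

      #include <linux/dma-mapping.h>

      /* Map a kernel buffer for a device write; swiotlb bounces if the
       * device cannot address the memory directly. */
      static int map_for_device(struct device *dev, void *buf, size_t len,
                                dma_addr_t *out)
      {
              dma_addr_t addr = dma_map_single(dev, buf, len, DMA_TO_DEVICE);

              if (dma_mapping_error(dev, addr))
                      return -ENOMEM;
              *out = addr;
              return 0;
      }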
  4. 19 April 2018 (2 commits)
    • MIPS: uaccess: Add micromips clobbers to bzero invocation · b3d7e55c
      Authored by Matt Redfearn
      The micromips implementation of bzero additionally clobbers registers
      t7 & t8. Specify this in the clobber list when invoking bzero (see
      the sketch below).
      
      Fixes: 26c5e07d ("MIPS: microMIPS: Optimise 'memset' core library function.")
      Reported-by: James Hogan <jhogan@kernel.org>
      Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: <stable@vger.kernel.org> # 3.10+
      Patchwork: https://patchwork.linux-mips.org/patch/19110/
      Signed-off-by: James Hogan <jhogan@kernel.org>
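
      Editor's note: a sketch of the likely shape of the fix, assuming the
      extra clobbers are guarded by CONFIG_CPU_MICROMIPS; t7 and t8 are
      $15 and $24 in the numeric register naming, and __UA_t0/__UA_t1 are
      taken to be the existing helpers in uaccess.h:

      /* Registers the library bzero may scribble on: the inline asm that
       * invokes it must list them all, or the compiler may keep live
       * values there across the call. */
      #ifdef CONFIG_CPU_MICROMIPS
      #define bzero_clobbers "$4", "$5", "$6", __UA_t0, __UA_t1, \
                             "$15", "$24", "$31"      /* + t7, t8 */
      #else
      #define bzero_clobbers "$4", "$5", "$6", __UA_t0, __UA_t1, "$31"
      #endif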
    • MIPS: memset.S: Fix clobber of v1 in last_fixup · c96eebf0
      Authored by Matt Redfearn
      The label .Llast_fixup\@ is jumped to on page fault within the final
      byte set loop of memset (on < MIPSR6 architectures). For some reason, in
      this fault handler, the v1 register is randomly set to a2 & STORMASK.
      This clobbers v1 for the calling function. This can be observed with the
      following test code:
      
      #include <linux/init.h>
      #include <linux/kernel.h>
      #include <linux/uaccess.h>
      #include <linux/vmalloc.h>

      static int __init __attribute__((optimize("O0"))) test_clear_user(void)
      {
        register int t asm("v1");
        char *test;
        int j, k;
      
        pr_info("\n\n\nTesting clear_user\n");
        test = vmalloc(PAGE_SIZE);
      
        for (j = 256; j < 512; j++) {
          t = 0xa5a5a5a5;
          if ((k = clear_user(test + PAGE_SIZE - 256, j)) != j - 256) {
              pr_err("clear_user (%px %d) returned %d\n", test + PAGE_SIZE - 256, j, k);
          }
          if (t != 0xa5a5a5a5) {
             pr_err("v1 was clobbered to 0x%x!\n", t);
          }
        }
      
        return 0;
      }
      late_initcall(test_clear_user);
      
      Which demonstrates that v1 is indeed clobbered (MIPS64):
      
      Testing clear_user
      v1 was clobbered to 0x1!
      v1 was clobbered to 0x2!
      v1 was clobbered to 0x3!
      v1 was clobbered to 0x4!
      v1 was clobbered to 0x5!
      v1 was clobbered to 0x6!
      v1 was clobbered to 0x7!
      
      Since the number of bytes that could not be set is already contained
      in a2, the andi placing a value in v1 is not necessary and is
      actively harmful in clobbering v1 (a C model of the fault path
      follows below).
      Reported-by: James Hogan <jhogan@kernel.org>
      Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: stable@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/19109/
      Signed-off-by: James Hogan <jhogan@kernel.org>
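
      Editor's note: a C model of the fault path, with illustrative names
      (the real handler is a label in MIPS assembly). All the handler must
      do is leave the not-set byte count, already held in a2, as the
      return value; the andi the patch removes wrote a2 & STORMASK into
      the caller-owned v1 for no benefit:

      struct fixup_regs {
              unsigned long a2;   /* bytes that could not be set */
              unsigned long v1;   /* caller-owned: must not be touched */
      };

      static unsigned long last_fixup(struct fixup_regs *r)
      {
              /* buggy version also did:
               *         r->v1 = r->a2 & (sizeof(long) - 1);     */
              return r->a2;       /* the value __clear_user reports */
      }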
  5. 17 April 2018 (2 commits)
    • MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup · daf70d89
      Authored by Matt Redfearn
      The __clear_user function is defined to return the number of bytes that
      could not be cleared. From the underlying memset / bzero implementation
      this means setting register a2 to that number on return. Currently if a
      page fault is triggered within the memset_partial block, the value
      loaded into a2 on return is meaningless.
      
      The label .Lpartial_fixup\@ is jumped to on page fault. In order to
      work out how many bytes failed to copy, the exception handler should
      find how many bytes are left in the partial block (andi a2,
      STORMASK), add that to the partial block end address (a2), and
      subtract the faulting address to get the remainder. Currently it
      incorrectly subtracts the partial block start address (t1), which
      has additionally been clobbered to generate a jump target in
      memset_partial. Fix this by adding the block end address instead (a
      C model of the corrected arithmetic follows below).
      
      This issue was found with the following test code:
            int j, k;
            for (j = 0; j < 512; j++) {
              if ((k = clear_user(NULL, j)) != j) {
                 pr_err("clear_user (NULL %d) returned %d\n", j, k);
              }
            }
      Which now passes on Creator Ci40 (MIPS32) and Cavium Octeon II (MIPS64).
      Suggested-by: James Hogan <jhogan@kernel.org>
      Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: stable@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/19108/
      Signed-off-by: James Hogan <jhogan@kernel.org>
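
      Editor's note: a C model of the corrected arithmetic, assuming
      STORMASK is sizeof(long) - 1 and using illustrative names:

      /* Bytes not cleared = bytes left in the sub-long tail, plus the
       * distance from the faulting address to the partial block end. */
      static unsigned long partial_fixup_remainder(unsigned long len,
                                                   unsigned long block_end,
                                                   unsigned long fault_addr)
      {
              unsigned long tail = len & (sizeof(long) - 1);

              return tail + block_end - fault_addr;
      }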
    • MIPS: memset.S: EVA & fault support for small_memset · 8a8158c8
      Authored by Matt Redfearn
      The MIPS kernel memset / bzero implementation includes a small_memset
      branch which is used when the region to be set is smaller than a long (4
      bytes on 32bit, 8 bytes on 64bit). The current small_memset
      implementation uses a simple store byte loop to write the destination.
      There are 2 issues with this implementation:
      
      1. When EVA mode is active, user and kernel address spaces may overlap.
      Currently the use of the sb instruction means kernel mode addressing is
      always used and an intended write to userspace may actually overwrite
      some critical kernel data.
      
      2. If the write triggers a page fault, for example by calling
      __clear_user(NULL, 2), instead of gracefully handling the fault, an OOPS
      is triggered.
      
      Fix these issues by replacing the sb instruction with the EX()
      macro, which will emit EVA-compatible instructions as required.
      Additionally, implement a fault fixup for small_memset which sets a2
      to the number of bytes that could not be cleared (as defined by
      __clear_user); a C model follows below.
      Reported-by: Chuanhua Lei <chuanhua.lei@intel.com>
      Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: stable@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/18975/
      Signed-off-by: James Hogan <jhogan@kernel.org>
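
      Editor's note: a conceptual C model of the fixed byte loop; the real
      code is MIPS assembly using EX() to emit EVA-aware stores with
      exception-table fixups. Here __put_user stands in for a faultable
      store to the user address space:

      static unsigned long small_memset_model(char __user *dst,
                                              unsigned long n)
      {
              unsigned long i;

              for (i = 0; i < n; i++)
                      if (__put_user(0, dst + i)) /* store faulted */
                              return n - i;       /* bytes not cleared */
              return 0;
      }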
  6. 16 April 2018 (1 commit)
    • MIPS: dts: Boston: Fix PCI bus dtc warnings · 2c2bf522
      Authored by Matt Redfearn
      dtc recently (v1.4.4-8-g756ffc4f52f6) added PCI bus checks. Fix the
      warnings now emitted (the missing property is sketched below):
      
      arch/mips/boot/dts/img/boston.dtb: Warning (pci_bridge): /pci@10000000: missing bus-range for PCI bridge
      arch/mips/boot/dts/img/boston.dtb: Warning (pci_bridge): /pci@12000000: missing bus-range for PCI bridge
      arch/mips/boot/dts/img/boston.dtb: Warning (pci_bridge): /pci@14000000: missing bus-range for PCI bridge
      Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: linux-mips@linux-mips.org
      Cc: devicetree@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/19070/
      Signed-off-by: James Hogan <jhogan@kernel.org>
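
      Editor's note: a device-tree sketch of the property the warning asks
      for; the 0x00-0xff range is an assumption about the Boston bridges,
      not taken from the patch:

      pci0: pci@10000000 {
              /* existing properties elided */
              bus-range = <0x00 0xff>;  /* silences the pci_bridge warning */
      };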
  7. 14 April 2018 (2 commits)
  8. 13 April 2018 (1 commit)
  9. 12 April 2018 (2 commits)
    • mm: introduce MAP_FIXED_NOREPLACE · a4ff8e86
      Authored by Michal Hocko
      Patch series "mm: introduce MAP_FIXED_NOREPLACE", v2.
      
      This started as a follow-up discussion [3][4] to the runtime failure
      caused by the hardening patch [5], which removed MAP_FIXED from the
      ELF loader because MAP_FIXED is inherently dangerous: it might
      silently clobber an existing underlying mapping (e.g.  the stack).
      The reason for the failure is that some architectures enforce an
      alignment for the given address hint even without MAP_FIXED (e.g.
      for shared or file-backed mappings).
      
      One way around this would be to exclude those archs which do
      alignment tricks from the hardening [6].  The patch is really
      trivial, but it was objected, rightfully so, that this screams for a
      more generic solution.  We basically want a non-destructive
      MAP_FIXED.
      
      The first patch introduces MAP_FIXED_NOREPLACE, which enforces the
      given address but, unlike MAP_FIXED, fails with EEXIST if the given
      range conflicts with an existing one.  The flag is introduced as a
      completely new one rather than as a MAP_FIXED extension for backward
      compatibility: we really want a never-clobber semantic even on older
      kernels which do not recognize the flag.  Unfortunately mmap sucks
      wrt flags evaluation because we do not EINVAL on unknown flags.  On
      those kernels we would simply fall back to the traditional hint
      based semantic, so the caller can still get a different address
      (which sucks) but at least not silently corrupt an existing mapping.
      I do not see a good way around that, short of not exposing the new
      semantic to userspace at all.
      
      It seems there are users who would like to have something like that.
      Jemalloc has been mentioned by Michael Ellerman [7].
      
      Florian Weimer has mentioned the following:
      : glibc ld.so currently maps DSOs without hints.  This means that the kernel
      : will map them right next to each other, and the offsets between them are
      : completely predictable.  We would like to change that and supply a random
      : address in a window of the address space.  If there is a conflict, we do
      : not want the kernel to pick a non-random address.  Instead, we would try
      : again with a random address.
      
      John Hubbard has mentioned a CUDA example:
      : a) Searches /proc/<pid>/maps for a "suitable" region of available
      : VA space.  "Suitable" generally means it has to have a base address
      : within a certain limited range (a particular device model might
      : have odd limitations, for example), it has to be large enough, and
      : alignment has to be large enough (again, various devices may have
      : constraints that lead us to do this).
      :
      : This is of course subject to races with other threads in the process.
      :
      : Let's say it finds a region starting at va.
      :
      : b) Next it does:
      :     p = mmap(va, ...)
      :
      : *without* setting MAP_FIXED, of course (so va is just a hint), to
      : attempt to safely reserve that region. If p != va, then in most cases,
      : this is a failure (almost certainly due to another thread getting a
      : mapping from that region before we did), and so this layer now has to
      : call munmap(), before returning a "failure: retry" to upper layers.
      :
      :     IMPROVEMENT: --> if instead, we could call this:
      :
      :             p = mmap(va, ... MAP_FIXED_NOREPLACE ...)
      :
      :         , then we could skip the munmap() call upon failure. This
      :         is a small thing, but it is useful here. (Thanks to Piotr
      :         Jaroszynski and Mark Hairgrove for helping me get that detail
      :         exactly right, btw.)
      :
      : c) After that, CUDA suballocates from p, via:
      :
      :      q = mmap(sub_region_start, ... MAP_FIXED ...)
      :
      : Interestingly enough, "freeing" is also done via MAP_FIXED, and
      : setting PROT_NONE to the subregion. Anyway, I just included (c) for
      : general interest.
      
      Atomic address range probing in multithreaded programs in general
      sounds like an interesting thing to me.
      
      The second patch simply replaces the MAP_FIXED use in the ELF loader
      with MAP_FIXED_NOREPLACE.  I believe other places which rely on
      MAP_FIXED should follow.  Actually, real MAP_FIXED usages should be
      documented properly and should be more of an exception.
      
      [1] http://lkml.kernel.org/r/20171116101900.13621-1-mhocko@kernel.org
      [2] http://lkml.kernel.org/r/20171129144219.22867-1-mhocko@kernel.org
      [3] http://lkml.kernel.org/r/20171107162217.382cd754@canb.auug.org.au
      [4] http://lkml.kernel.org/r/1510048229.12079.7.camel@abdul.in.ibm.com
      [5] http://lkml.kernel.org/r/20171023082608.6167-1-mhocko@kernel.org
      [6] http://lkml.kernel.org/r/20171113094203.aofz2e7kueitk55y@dhcp22.suse.cz
      [7] http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au
      
      This patch (of 2):
      
      MAP_FIXED is used quite often to enforce mapping at a particular
      range.  The main problem of this flag is, however, that it is
      inherently dangerous because it unmaps existing mappings covered by
      the requested range.  This can cause silent memory corruptions, some
      of them even with serious security implications.  While the current
      semantic might be really desirable in many cases, there are others
      which would want to enforce the given range but would rather see a
      failure than a silent memory corruption on a clashing range.  Please
      note that there is no guarantee that a given range is obeyed by mmap
      even when it is free - e.g.  arch specific code is allowed to apply
      an alignment.
      
      Introduce a new MAP_FIXED_NOREPLACE flag for mmap to achieve this
      behavior.  It has the same semantic as MAP_FIXED wrt.  the given
      address request, with a single exception: it fails with EEXIST if
      the requested address is already covered by an existing mapping.  We
      still rely on get_unmapped_area to handle all the arch specific
      MAP_FIXED treatment and check for a conflicting vma after it returns
      (a minimal userspace usage sketch follows after this entry).
      
      The flag is introduced as a completely new one rather than as a
      MAP_FIXED extension for backward compatibility: we really want a
      never-clobber semantic even on older kernels which do not recognize
      the flag.  Unfortunately mmap sucks wrt.  flags evaluation because
      we do not EINVAL on unknown flags.  On those kernels we would simply
      fall back to the traditional hint based semantic, so the caller can
      still get a different address (which sucks) but at least not
      silently corrupt an existing mapping.  I do not see a good way
      around that.
      
      [mpe@ellerman.id.au: fix whitespace]
      [fail on clashing range with EEXIST as per Florian Weimer]
      [set MAP_FIXED before round_hint_to_min as per Khalid Aziz]
      Link: http://lkml.kernel.org/r/20171213092550.2774-2-mhocko@kernel.org
      Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Russell King - ARM Linux <linux@armlinux.org.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Joel Stanley <joel@jms.id.au>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Jason Evans <jasone@google.com>
      Cc: David Goldblatt <davidtgoldblatt@gmail.com>
      Cc: Edward Tomasz Napierała <trasz@FreeBSD.org>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
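
      Editor's note: a minimal userspace sketch of the never-clobber
      semantic described above.  The hint address is arbitrary, and the
      fallback #define matches the value the flag was merged with
      (0x100000); on kernels older than 4.17 the flag is ignored, which is
      why the p != hint check is still required:

      #include <errno.h>
      #include <stdio.h>
      #include <sys/mman.h>

      #ifndef MAP_FIXED_NOREPLACE
      #define MAP_FIXED_NOREPLACE 0x100000
      #endif

      int main(void)
      {
              void *hint = (void *)0x700000000000UL;  /* arbitrary hint */
              void *p = mmap(hint, 4096, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
                             -1, 0);

              if (p == MAP_FAILED) {
                      if (errno == EEXIST)
                              puts("range busy; retry with a new hint");
                      return 1;
              }
              if (p != hint) {        /* old kernel ignored the flag */
                      munmap(p, 4096);
                      return 1;
              }
              return 0;
      }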
    • exec: pass stack rlimit into mm layout functions · 8f2af155
      Authored by Kees Cook
      Patch series "exec: Pin stack limit during exec".
      
      Attempts to solve problems with the stack limit changing during exec
      continue to be frustrated[1][2].  In addition to the specific issues
      around the Stack Clash family of flaws, Andy Lutomirski pointed out[3]
      other places during exec where the stack limit is used and is assumed to
      be unchanging.  Given the many places it gets used and the fact that it
      can be manipulated/raced via setrlimit() and prlimit(), I think the only
      way to handle this is to move away from the "current" view of the stack
      limit and instead attach it to the bprm, and plumb this down into the
      functions that need to know the stack limits.  This series implements
      the approach.
      
      [1] 04e35f44 ("exec: avoid RLIMIT_STACK races with prlimit()")
      [2] 779f4e1c ("Revert "exec: avoid RLIMIT_STACK races with prlimit()"")
      [3] to security@kernel.org, "Subject: existing rlimit races?"
      
      This patch (of 3):
      
      Since it is possible that the stack rlimit can change externally
      during exec (either via another thread calling setrlimit() or
      another process calling prlimit()), provide a way to pass the rlimit
      down into the per-architecture mm layout functions so that the
      rlimit can stay in the bprm structure instead of sitting in the
      signal structure until exec is finalized (see the sketch below).
      
      Link: http://lkml.kernel.org/r/1518638796-20819-2-git-send-email-keescook@chromium.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Cc: Brad Spengler <spender@grsecurity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
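
      Editor's note: a sketch of the plumbing this series moves toward,
      with abridged, assumed signatures (the exact parameter lists are the
      series', not verified here):

      /* The layout decision consumes a stack rlimit snapshot taken at
       * exec time and carried in bprm, instead of re-reading the signal
       * struct, which another thread or process may change mid-exec. */
      void arch_pick_mmap_layout(struct mm_struct *mm,
                                 struct rlimit *rlim_stack);

      static void finalize_exec_sketch(struct linux_binprm *bprm)
      {
              arch_pick_mmap_layout(current->mm, &bprm->rlim_stack);
      }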
  10. 07 April 2018 (1 commit)
  11. 06 April 2018 (2 commits)
    • mm: fix races between swapoff and flush dcache · cb9f753a
      Authored by Huang Ying
      Thanks to commit 4b3ef9da ("mm/swap: split swap cache into 64MB
      trunks"), after swapoff the address_space associated with the swap
      device will be freed.  So page_mapping() users which may touch the
      address_space need some kind of mechanism to prevent it from being
      freed while it is accessed.
      
      The dcache flushing functions (flush_dcache_page(), etc.) in
      architecture specific code may access the address_space of the swap
      device for anonymous pages in swap cache via the page_mapping()
      function.  But in some cases there is no mechanism to prevent the
      swap device from being swapped off, for example:
      
        CPU1					CPU2
        __get_user_pages()			swapoff()
          flush_dcache_page()
            mapping = page_mapping()
              ...				  exit_swap_address_space()
              ...				    kvfree(spaces)
              mapping_mapped(mapping)
      
      The address space may be accessed after being freed.
      
      But per cachetlb.txt and Russell King, flush_dcache_page() only
      cares about file cache pages; for anonymous pages, flush_anon_page()
      should be used.  The implementation of flush_dcache_page() in all
      architectures follows this too: they check whether page_mapping() is
      NULL and whether mapping_mapped() is true to determine whether to
      flush the dcache immediately, and they use the interval tree
      (mapping->i_mmap) to find all user space mappings.  Neither
      mapping_mapped() nor mapping->i_mmap is used by anonymous pages in
      swap cache at all.
      
      So, to fix the race between swapoff and dcache flushing,
      page_mapping_file() is added to return the address_space for file
      cache pages and NULL otherwise (sketched below).  All page_mapping()
      calls in the dcache flushing functions are replaced with
      page_mapping_file().
      
      [akpm@linux-foundation.org: simplify page_mapping_file(), per Mike]
      Link: http://lkml.kernel.org/r/20180305083634.15174-1-ying.huang@intel.com
      Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Chen Liqin <liqin.linux@gmail.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
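
      Editor's note: a sketch of the helper as described above (in the
      simplified form credited to Mike); anonymous swap-cache pages are
      filtered out before the address_space is consulted:

      /* For file cache pages, return the address_space; otherwise NULL. */
      struct address_space *page_mapping_file(struct page *page)
      {
              if (unlikely(PageSwapCache(page)))
                      return NULL;
              return page_mapping(page);
      }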
    • zboot: fix stack protector in compressed boot phase · 7bbaf27d
      Authored by Huacai Chen
      Calling __stack_chk_guard_setup() in decompress_kernel() is too
      late: stack checking always fails for decompress_kernel() itself.
      So remove __stack_chk_guard_setup() and initialize __stack_chk_guard
      before decompress_kernel() is called (see the sketch below).
      
      The original code comes from ARM but is also used for MIPS and SH,
      so fix them together.  Without this fix, compressed booting on these
      architectures fails, because stack checking is enabled by default
      (>= 4.16).
      
      Link: http://lkml.kernel.org/r/1522226933-29317-1-git-send-email-chenhc@lemote.com
      Fixes: 8779657d ("stackprotector: Introduce CONFIG_CC_STACKPROTECTOR_STRONG")
      Signed-off-by: Huacai Chen <chenhc@lemote.com>
      Acked-by: James Hogan <jhogan@kernel.org>
      Acked-by: Kees Cook <keescook@chromium.org>
      Acked-by: Rich Felker <dalias@libc.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
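
      Editor's note: a minimal sketch of the approach in the decompressor's
      misc.c, assuming its existing error() helper; the guard gets a
      build-time value (the constant here is illustrative) so it is
      already valid when the first instrumented function runs:

      /* Initialized at build time; no setup call that would run too late. */
      const unsigned long __stack_chk_guard = 0x000a0dff;

      void __stack_chk_fail(void)
      {
              error("stack-protector: kernel stack is corrupted\n");
      }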
  12. 03 April 2018 (8 commits)
  13. 28 March 2018 (2 commits)
  14. 26 March 2018 (1 commit)
  15. 23 March 2018 (1 commit)
    • MIPS: Use the entry point from the ELF file header · 27c524d1
      Authored by Maciej W. Rozycki
      In order to fetch the correct entry point with the ISA bit included, for
      use by non-ELF boot loaders, parse the output of `objdump -f' for the
      start address recorded in the kernel executable itself, rather than
      using `nm' to get the value of the `kernel_entry' symbol.
      
      Sign-extend the address retrieved if it is 32-bit, so that execution
      is correctly started on 64-bit processors as well (modelled below).
      The tool always prints the entry point using either 8 or 16
      hexadecimal digits, matching the address width (aka class) of the
      ELF file, even in the presence of leading zeros.
      Signed-off-by: Maciej W. Rozycki <macro@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/18912/
      Signed-off-by: James Hogan <jhogan@kernel.org>
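
      Editor's note: a small C model of the sign-extension step, with
      assumed names; a 32-bit entry point such as 0x80100000 must become
      0xffffffff80100000 before a 64-bit boot loader can use it:

      #include <stdint.h>

      /* Widen a 32-bit MIPS entry point for 64-bit use. */
      static uint64_t widen_entry(uint32_t entry32)
      {
              return (uint64_t)(int64_t)(int32_t)entry32;
      }
      /* widen_entry(0x80100000) == 0xffffffff80100000 */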
  16. 22 March 2018 (8 commits)
  17. 20 March 2018 (2 commits)