1. 13 Jun, 2016 6 commits
  2. 23 May, 2016 1 commit
    • s390: fix info leak in do_sigsegv · cf0d44d5
      Authored by Michal Hocko
      Aleksa has reported an incorrect si_errno value when stracing a task which
      received SIGSEGV:
      [pid 20799] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_errno=2510266, si_addr=0x100000000000000}
      
      The reason seems to be that do_sigsegv does not completely initialize the
      siginfo structure defined on the stack, so it leaks 4 bytes of the
      previous stack content. Fix it simply by initializing si_errno
      to 0 (same as do_sigbus does already).
      
      Cc: stable # introduced pre-git times
      Reported-by: Aleksa Sarai <asarai@suse.de>
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
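A minimal user-space sketch of the bug class described in this commit, not the kernel code itself. `struct fake_siginfo`, `poison()`, `fill_buggy()` and `fill_fixed()` are hypothetical names introduced for illustration. `poison()` stands in for stale stack content, so the effect is deterministic and does not rely on reading uninitialized memory.

```c
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's siginfo; not the real layout. */
struct fake_siginfo {
	int si_signo;
	int si_errno;
	int si_code;
	void *si_addr;
};

/* Model "previous stack content" by filling the slot with a known pattern. */
void poison(struct fake_siginfo *si)
{
	assert(si != NULL);
	memset(si, 0x5a, sizeof(*si));
}

/* Buggy variant: si_errno is never written, so the stale bytes survive. */
void fill_buggy(struct fake_siginfo *si, int code, void *addr)
{
	si->si_signo = SIGSEGV;
	si->si_code = code;
	si->si_addr = addr;
}

/* Fixed variant: si_errno is explicitly set to 0, as the commit describes. */
void fill_fixed(struct fake_siginfo *si, int code, void *addr)
{
	si->si_signo = SIGSEGV;
	si->si_errno = 0;
	si->si_code = code;
	si->si_addr = addr;
}
```

After `poison()`, `fill_buggy()` leaves `si_errno` holding the poison pattern, while `fill_fixed()` leaves it at 0.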
  3. 11 May, 2016 2 commits
  4. 10 May, 2016 1 commit
  5. 21 Apr, 2016 1 commit
    • s390/mm: fix asce_bits handling with dynamic pagetable levels · 723cacbd
      Authored by Gerald Schaefer
      There is a race with multi-threaded applications between context switch and
      pagetable upgrade. In switch_mm() a new user_asce is built from mm->pgd and
      mm->context.asce_bits, without holding any locks. A concurrent mmap with a
      pagetable upgrade on another thread in crst_table_upgrade() could already
      have set new asce_bits, but not yet the new mm->pgd. This would result in a
      corrupt user_asce in switch_mm(), and eventually in a kernel panic from a
      translation exception.
      
      Fix this by storing the complete asce instead of just the asce_bits, which
      can then be read atomically from switch_mm(), so that it either sees the
      old value or the new value, but no mixture. Both cases are OK. Having the
      old value would result in a page fault on access to the higher level memory,
      but the fault handler would see the new mm->pgd, if it was a valid access
      after the mmap on the other thread has completed. So as a worst-case scenario
      we would have a page fault loop for the racing thread until the next time
      slice.
      
      Also remove dead code and simplify the upgrade/downgrade path: there are no
      upgrades from 2 levels, and only downgrades from 3 levels for compat tasks.
      There are also no concurrent upgrades, because the mmap_sem is held with
      down_write() in do_mmap, so the flush and table checks during upgrade can
      be removed.
      Reported-by: Michael Munday <munday@ca.ibm.com>
      Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
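The idea of storing the complete asce in one word can be sketched in user space with C11 atomics. This is only a model under stated assumptions: `struct mm_model`, `mm_publish()`, `mm_load_asce()` and the 12-bit `MODEL_ASCE_BITS_MASK` are hypothetical, and the kernel uses its own primitives rather than `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumption for illustration: the low 12 bits carry the control bits. */
#define MODEL_ASCE_BITS_MASK 0xfffULL

/* Hypothetical model of an mm context that stores the complete asce. */
struct mm_model {
	_Atomic uint64_t asce;
};

/* Writer side (table upgrade): publish origin and bits in a single store. */
void mm_publish(struct mm_model *mm, uint64_t table_origin, uint64_t bits)
{
	assert((table_origin & MODEL_ASCE_BITS_MASK) == 0);
	assert((bits & ~MODEL_ASCE_BITS_MASK) == 0);
	atomic_store(&mm->asce, table_origin | bits);
}

/* Reader side (context switch): a single load sees old or new, never a mix. */
uint64_t mm_load_asce(struct mm_model *mm)
{
	return atomic_load(&mm->asce);
}
```

Because origin and bits travel together, a reader can never combine a new set of bits with an old table origin, which is the mixture the commit message identifies as the cause of the corrupt user_asce.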
  6. 16 Apr, 2016 1 commit
  7. 05 Apr, 2016 1 commit
  8. 23 Mar, 2016 1 commit
  9. 17 Mar, 2016 2 commits
    • s390/mm: handle PTE-mapped tail pages in fast gup · fc897c95
      Authored by Gerald Schaefer
      With the THP refcounting rework it is possible to see THP compound tail
      pages mapped with PTEs during a THP split. This needs to be considered
      when using page_cache_get_speculative(), which will always fail on tail
      pages because ->_count is always zero. Commit 7aef4172 "mm: handle
      PTE-mapped tail pages in gerneric fast gup implementaiton" fixed it for
      the generic fast gup code by using compound_head(page) instead of page,
      but not for s390.
      
      This patch is a 1:1 adaptation of commit 7aef4172 for the s390 fast gup
      code. Without this fix, gup will fall back to the slow path or fail
      in the unlikely scenario that we hit a THP under splitting in-between
      the page table split and the compound page split.
      
      Cc: stable@vger.kernel.org # v4.5
      Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390: add DEBUG_RODATA support · 91d37211
      Authored by Heiko Carstens
      Git commit d2aa1aca ("mm/init: Add 'rodata=off' boot cmdline
      parameter to disable read-only kernel mappings") adds a bogus warning
      to the console which states that s390 does not support kernel memory
      protection.
      
      This, however, is not true. We have supported that for a couple of
      years, though in a different way than the author of the above-named
      patch expected.
      
      To get rid of the misleading message, implement the mark_rodata_ro
      function and emit a message which states the amount of memory that
      was already write-protected earlier.
      
      This is the same as what parisc currently does.
      
      We currently do not support the kernel parameter "rodata=off", which
      would allow writing to the rodata section again. However, since we
      have had this feature for years without any problems, there is no
      reason to add support for this.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
  10. 16 Mar, 2016 1 commit
  11. 08 Mar, 2016 3 commits
  12. 07 Mar, 2016 1 commit
  13. 02 Mar, 2016 2 commits
    • s390/fault: merge report_user_fault implementations · 5d7eccec
      Authored by Heiko Carstens
      We have two close to identical report_user_fault functions.
      Add a parameter to one and get rid of the other one in order
      to reduce code duplication.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390/kvm: simplify set_guest_storage_key · 443a8133
      Authored by Martin Schwidefsky
      Git commit ab3f285f
      "KVM: s390/mm: try a cow on read only pages for key ops"
      added a fixup_user_fault to set_guest_storage_key to force a copy on
      write if the page is mapped read-only. This is supposed to fix the
      problem of differing storage keys for shared mappings, e.g. the
      empty_zero_page.
      But if the storage key is set before the pte is mapped, the storage
      key update is done on the pgste. A later fault will happily map the
      shared page with the key from the pgste.
      
      Eventually git commit 2faee8ff
      "s390/mm: prevent and break zero page mappings in case of storage keys"
      fixed this problem for the empty_zero_page. The commit makes sure that
      guests enabled for storage keys will not use the empty_zero_page at all.
      
      As the call to fixup_user_fault in set_guest_storage_key depends on the
      order of the storage key operation vs. the fault that maps the pte,
      it does not really fix anything. Just remove it.
      Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
  14. 23 Feb, 2016 1 commit
  15. 17 Feb, 2016 1 commit
  16. 16 Feb, 2016 1 commit
    • mm/gup: Switch all callers of get_user_pages() to not pass tsk/mm · d4edcf0d
      Authored by Dave Hansen
      We will soon modify the vanilla get_user_pages() so it can no
      longer be used on mm/tasks other than 'current/current->mm',
      which is by far the most common way it is called.  For now,
      we allow the old-style calls, but warn when they are used.
      (implemented in previous patch)
      
      This patch switches all callers of:
      
      	get_user_pages()
      	get_user_pages_unlocked()
      	get_user_pages_locked()
      
      to stop passing tsk/mm so they will no longer see the warnings.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: jack@suse.cz
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210156.113E9407@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  17. 11 Feb, 2016 1 commit
  18. 19 Jan, 2016 4 commits
  19. 16 Jan, 2016 4 commits
    • s390/mm: enable fixup_user_fault retrying · fef8953a
      Authored by Dominik Dingel
      By passing a non-null flag we allow fixup_user_fault to retry, which
      enables userfaultfd.  As during these retries we might drop the mmap_sem
      we need to check if that happened and redo the complete chain of
      actions.
      Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: "Jason J. Herne" <jjherne@linux.vnet.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Eric B Munson <emunson@akamai.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
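The "check if that happened and redo the complete chain of actions" pattern can be sketched as a retry loop. This is a user-space model only: `struct walk_state`, `fake_fixup_fault()` and `resolve_with_retry()` are hypothetical stand-ins, and the real fixup_user_fault signature is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for the model. */
struct walk_state {
	int lookups;     /* how often the lock-dependent lookup was (re)done */
	int drops_left;  /* how many times the fault path will drop the lock */
};

/*
 * Hypothetical stand-in for the fault helper: it reports through
 * *unlocked whether the lock was dropped while handling the fault.
 */
int fake_fixup_fault(struct walk_state *st, bool *unlocked)
{
	assert(st != NULL && unlocked != NULL);
	if (st->drops_left > 0) {
		st->drops_left--;
		*unlocked = true;
	} else {
		*unlocked = false;
	}
	return 0;
}

/* Caller: anything derived under the lock is stale after a drop, so redo it. */
int resolve_with_retry(struct walk_state *st)
{
	bool unlocked;
	int rc;

	do {
		st->lookups++;  /* the chain of actions that depends on the lock */
		unlocked = false;
		rc = fake_fixup_fault(st, &unlocked);
		if (rc)
			return rc;
	} while (unlocked);

	return 0;
}
```

With two simulated lock drops the lookup runs three times: once initially and once after each drop.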
    • mm: bring in additional flag for fixup_user_fault to signal unlock · 4a9e1cda
      Authored by Dominik Dingel
      During Jason's work with postcopy migration support for s390 a problem
      regarding gmap faults was discovered.
      
      The gmap code will call fixup_user_fault which will end up always in
      handle_mm_fault.  Till now we never cared about retries, but as the
      userfaultfd code kind of relies on it, this needs some fix.
      
      This patchset does not take care of the futex code.  I will now look
      closer at this.
      
      This patch (of 2):
      
      With the introduction of userfaultfd, kvm on s390 needs fixup_user_fault
      to pass in FAULT_FLAG_ALLOW_RETRY and give feedback if during the
      faulting we ever unlocked mmap_sem.
      
      This patch brings in the logic to handle retries and cleans up the
      current documentation.  fixup_user_fault did not have the same
      semantics as filemap_fault.  It never indicated whether a retry
      happened, so a caller wasn't able to handle that case.  So we now
      changed the behaviour to always retry a locked mmap_sem.
      Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: "Jason J. Herne" <jjherne@linux.vnet.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Eric B Munson <emunson@akamai.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • s390, thp: remove infrastructure for handling splitting PMDs · fecffad2
      Authored by Kirill A. Shutemov
      With new refcounting we don't need to mark PMDs splitting.  Let's drop
      code to handle this.
      
      pmdp_splitting_flush() is not needed either: on splitting PMD we will do
      pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
      needed for fast_gup.
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: drop tail page refcounting · ddc58f27
      Authored by Kirill A. Shutemov
      Tail page refcounting is utterly complicated and painful to support.
      
      It uses ->_mapcount on tail pages to store how many times this page is
      pinned.  get_page() bumps ->_mapcount on tail page in addition to
      ->_count on head.  This information is required by split_huge_page() to
      be able to distribute pins from head of compound page to tails during
      the split.
      
      We will need ->_mapcount to account PTE mappings of subpages of the
      compound page.  We eliminate the need for the current meaning of ->_mapcount in
      tail pages by forbidding split entirely if the page is pinned.
      
      The only user of tail page refcounting is THP which is marked BROKEN for
      now.
      
      Let's drop all this mess.  It makes get_page() and put_page() much
      simpler.
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: Sasha Levin <sasha.levin@oracle.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Jerome Marchand <jmarchan@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  20. 15 Jan, 2016 1 commit
  21. 11 Jan, 2016 3 commits
    • s390: fix normalization bug in exception table sorting · bcb7825a
      Authored by Ard Biesheuvel
      The normalization pass in the sorting routine of the relative exception
      table serves two purposes:
      - it ensures that the address fields of the exception table entries are
        fully ordered, so that no ambiguities arise between entries with
        identical instruction offsets (i.e., when two instructions that are
        exactly 8 bytes apart each have an exception table entry associated with
        them)
      - it ensures that the offsets of both the instruction and the fixup fields
        of each entry are relative to their final location after sorting.
      
      Commit eb608fb3 ("s390/exceptions: switch to relative exception table
      entries") ported the relative exception table format from x86, but modified
      the sorting routine to only normalize the instruction offset field and not
      the fixup offset field. The result is that the fixup offset of each entry
      will be relative to the original location of the entry before sorting,
      likely leading to crashes when those entries are dereferenced.
      
      Fixes: eb608fb3 ("s390/exceptions: switch to relative exception table entries")
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
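The two-field normalization described in this commit can be sketched in user space as follows. This is an illustrative model, not the s390 or x86 kernel routine: `struct rel_entry`, `entry_set()`, `entry_insn()`, `entry_fixup()` and `sort_rel_table()` are hypothetical names, and the sketch assumes the table and its targets live in one object so that all offsets fit in 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Each field holds an offset relative to the address of the field itself. */
struct rel_entry {
	int32_t insn;
	int32_t fixup;
};

const char *entry_insn(const struct rel_entry *e)
{
	return (const char *)&e->insn + e->insn;
}

const char *entry_fixup(const struct rel_entry *e)
{
	return (const char *)&e->fixup + e->fixup;
}

void entry_set(struct rel_entry *e, const char *insn, const char *fixup)
{
	e->insn = (int32_t)(insn - (const char *)&e->insn);
	e->fixup = (int32_t)(fixup - (const char *)&e->fixup);
}

static int cmp_insn(const void *a, const void *b)
{
	const struct rel_entry *x = a, *y = b;

	return (x->insn > y->insn) - (x->insn < y->insn);
}

void sort_rel_table(struct rel_entry *tab, size_t n)
{
	size_t k;

	assert(tab != NULL || n == 0);

	/* Normalize: make BOTH offsets relative to the start of the table. */
	for (k = 0; k < n; k++) {
		tab[k].insn += (int32_t)(k * sizeof(*tab));
		tab[k].fixup += (int32_t)(k * sizeof(*tab) +
					  offsetof(struct rel_entry, fixup));
	}

	qsort(tab, n, sizeof(*tab), cmp_insn);

	/* Denormalize: make both offsets relative to the new entry location. */
	for (k = 0; k < n; k++) {
		tab[k].insn -= (int32_t)(k * sizeof(*tab));
		tab[k].fixup -= (int32_t)(k * sizeof(*tab) +
					  offsetof(struct rel_entry, fixup));
	}
}
```

Dropping the two `fixup` adjustments reproduces the failure mode the commit describes: after sorting, each fixup offset would still be relative to the entry's pre-sort location.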
    • s390: rename struct _lowcore to struct lowcore · c667aeac
      Authored by Heiko Carstens
      Finally get rid of the leading underscore. I tried this already two or
      three years ago, however Michael Holzheu objected since this would
      break the crash utility (again).
      
      However Michael integrated support for the new name into the crash
      utility back then, so it doesn't break if the name is changed
      now.  So finally get rid of the ever confusing leading underscore.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390/mem_detect: use unsigned longs · 423d5b36
      Authored by Heiko Carstens
      The memory detection code historically had to use unsigned long long
      since the machine reported the true memory size (>4GB) even if the
      virtual machine was running in ESA/390 mode.
      
      Since the old code is gone use unsigned long everywhere and also get
      rid of an unused ADDR2G define.
      
      (this patch converts all long longs within sclp_info to longs)
      
      There are many more possible conversions; however, that can be done if
      somebody touches the corresponding code.  Since people started to
      convert unrelated long types to long longs because of the types within
      struct sclp_info, convert this now.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
  22. 18 Dec, 2015 1 commit