1. 10 6月, 2016 13 次提交
  2. 20 5月, 2016 1 次提交
    • H
      arch: fix has_transparent_hugepage() · fd8cfd30
      Hugh Dickins 提交于
      I've just discovered that the useful-sounding has_transparent_hugepage()
      is actually an architecture-dependent minefield: on some arches it only
      builds if CONFIG_TRANSPARENT_HUGEPAGE=y, on others it's also there when
      not, but on some of those (arm and arm64) it then gives the wrong
      answer; and on mips alone it's marked __init, which would crash if
      called later (but so far it has not been called later).
      
      Straighten this out: make it available to all configs, with a sensible
      default in asm-generic/pgtable.h, removing its definitions from those
      arches (arc, arm, arm64, sparc, tile) which are served by the default,
      adding #define has_transparent_hugepage has_transparent_hugepage to
      those (mips, powerpc, s390, x86) which need to override the default at
      runtime, and removing the __init from mips (but maybe that kind of code
      should be avoided after init: set a static variable the first time it's
      called).
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Yang Shi <yang.shi@linaro.org>
      Cc: Ning Qu <quning@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: Vineet Gupta <vgupta@synopsys.com>		[arch/arc]
      Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>	[arch/s390]
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fd8cfd30
  3. 13 5月, 2016 2 次提交
    • C
      KVM: s390: set halt polling to 80 microseconds · c4a8de35
      Christian Borntraeger 提交于
      on s390 we disabled the halt polling with commit 920552b2
      ("KVM: disable halt_poll_ns as default for s390x"), as floating
      interrupts would let all CPUs have a successful poll, resulting
      in much higher CPU usage (on otherwise idle systems).
      
      With the improved selection of polls we can now retry halt polling.
      Performance measurements with different choices like 25,50,80,100,200
      microseconds showed that 80 microseconds seems to improve several cases
      without increasing the CPU costs too much. Higher values would improve
      the performance even more but increased the cpu time as well.
      So let's start small and use this value of 80 microseconds on s390 until
      we have a better understanding of cost/benefit of higher values.
      Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c4a8de35
    • C
      KVM: halt_polling: provide a way to qualify wakeups during poll · 3491caf2
      Christian Borntraeger 提交于
      Some wakeups should not be considered a sucessful poll. For example on
      s390 I/O interrupts are usually floating, which means that _ALL_ CPUs
      would be considered runnable - letting all vCPUs poll all the time for
      transactional like workload, even if one vCPU would be enough.
      This can result in huge CPU usage for large guests.
      This patch lets architectures provide a way to qualify wakeups if they
      should be considered a good/bad wakeups in regard to polls.
      
      For s390 the implementation will fence of halt polling for anything but
      known good, single vCPU events. The s390 implementation for floating
      interrupts does a wakeup for one vCPU, but the interrupt will be delivered
      by whatever CPU checks first for a pending interrupt. We prefer the
      woken up CPU by marking the poll of this CPU as "good" poll.
      This code will also mark several other wakeup reasons like IPI or
      expired timers as "good". This will of course also mark some events as
      not sucessful. As  KVM on z runs always as a 2nd level hypervisor,
      we prefer to not poll, unless we are really sure, though.
      
      This patch successfully limits the CPU usage for cases like uperf 1byte
      transactional ping pong workload or wakeup heavy workload like OLTP
      while still providing a proper speedup.
      
      This also introduced a new vcpu stat "halt_poll_no_tuning" that marks
      wakeups that are considered not good for polling.
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Radim Krčmář <rkrcmar@redhat.com> (for an earlier version)
      Cc: David Matlack <dmatlack@google.com>
      Cc: Wanpeng Li <kernellwp@gmail.com>
      [Rename config symbol. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3491caf2
  4. 10 5月, 2016 1 次提交
  5. 09 5月, 2016 3 次提交
  6. 04 5月, 2016 1 次提交
  7. 21 4月, 2016 2 次提交
    • M
      s390/fpu: allocate 'struct fpu' with the task_struct · 3f6813b9
      Martin Schwidefsky 提交于
      Analog to git commit 0c8c0f03
      "x86/fpu, sched: Dynamically allocate 'struct fpu'"
      move the struct fpu to the end of the struct thread_struct,
      set CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and add the
      setup_task_size() function to calculate the correct size
      fo the task struct.
      
      For the performance_defconfig this increases the size of
      struct task_struct from 7424 bytes to 7936 bytes (MACHINE_HAS_VX==1)
      or 7552 bytes (MACHINE_HAS_VX==0). The dynamic allocation of the
      struct fpu is removed. The slab cache uses an 8KB block for the
      task struct in all cases, there is enough room for the struct fpu.
      For MACHINE_HAS_VX==1 each task now needs 512 bytes less memory.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      3f6813b9
    • G
      s390/mm: fix asce_bits handling with dynamic pagetable levels · 723cacbd
      Gerald Schaefer 提交于
      There is a race with multi-threaded applications between context switch and
      pagetable upgrade. In switch_mm() a new user_asce is built from mm->pgd and
      mm->context.asce_bits, w/o holding any locks. A concurrent mmap with a
      pagetable upgrade on another thread in crst_table_upgrade() could already
      have set new asce_bits, but not yet the new mm->pgd. This would result in a
      corrupt user_asce in switch_mm(), and eventually in a kernel panic from a
      translation exception.
      
      Fix this by storing the complete asce instead of just the asce_bits, which
      can then be read atomically from switch_mm(), so that it either sees the
      old value or the new value, but no mixture. Both cases are OK. Having the
      old value would result in a page fault on access to the higher level memory,
      but the fault handler would see the new mm->pgd, if it was a valid access
      after the mmap on the other thread has completed. So as worst-case scenario
      we would have a page fault loop for the racing thread until the next time
      slice.
      
      Also remove dead code and simplify the upgrade/downgrade path, there are no
      upgrades from 2 levels, and only downgrades from 3 levels for compat tasks.
      There are also no concurrent upgrades, because the mmap_sem is held with
      down_write() in do_mmap, so the flush and table checks during upgrade can
      be removed.
      Reported-by: NMichael Munday <munday@ca.ibm.com>
      Reviewed-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      723cacbd
  8. 16 4月, 2016 3 次提交
  9. 13 4月, 2016 2 次提交
    • M
      locking/rwsem, s390: Provide __down_write_killable() · 4edab14e
      Michal Hocko 提交于
      Introduce ___down_write() for the fast path and reuse it for __down_write()
      resp. __down_write_killable() each using the respective generic slow path
      (rwsem_down_write_failed() resp. rwsem_down_write_failed_killable()).
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Signed-off-by: Jason Low <jason.low2@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-alpha@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-sh@vger.kernel.org
      Cc: linux-xtensa@linux-xtensa.org
      Cc: sparclinux@vger.kernel.org
      Link: http://lkml.kernel.org/r/1460041951-22347-10-git-send-email-mhocko@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      4edab14e
    • M
      locking/rwsem: Get rid of __down_write_nested() · f8e04d85
      Michal Hocko 提交于
      This is no longer used anywhere and all callers (__down_write()) use
      0 as a subclass. Ditch __down_write_nested() to make the code easier
      to follow.
      
      This shouldn't introduce any functional change.
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NDavidlohr Bueso <dave@stgolabs.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Signed-off-by: Jason Low <jason.low2@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-alpha@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-sh@vger.kernel.org
      Cc: linux-xtensa@linux-xtensa.org
      Cc: sparclinux@vger.kernel.org
      Link: http://lkml.kernel.org/r/1460041951-22347-2-git-send-email-mhocko@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f8e04d85
  10. 01 4月, 2016 3 次提交
    • S
      s390/seccomp: include generic seccomp header file · 4f375903
      Sudip Mukherjee 提交于
      Fixes this build error on linux-next:
      
      kernel/seccomp.c: In function '__secure_computing_strict':
      kernel/seccomp.c:526:3: error: implicit declaration of function
      				'get_compat_mode1_syscalls'
      
      The retrieval of compat syscall numbers were moved into inline function
      defined in asm-generic header but the asm-generic header is not being
      used by s390.
      
      [heiko.carstens@de.ibm.com]: even though the build error will trigger
      only in the next merge window it makes sense to include the generic
      header file already now.
      
      Fixes: ("seccomp: Get compat syscalls from asm-generic header")
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Signed-off-by: NSudip Mukherjee <sudip.mukherjee@codethink.co.uk>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      4f375903
    • S
      s390/pci: add extra padding to function measurement block · 9d89d9e6
      Sebastian Ott 提交于
      Newer machines might use a different (larger) format for function
      measurement blocks. To ensure that we comply with the alignment
      requirement on these machines and prevent memory corruption (when
      firmware writes more data than we expect) add 16 padding bytes
      at the end of the fmb.
      
      Cc: stable@vger.kernel.org # v4.1+
      Signed-off-by: NSebastian Ott <sebott@linux.vnet.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      9d89d9e6
    • J
      livepatch: reuse module loader code to write relocations · 425595a7
      Jessica Yu 提交于
      Reuse module loader code to write relocations, thereby eliminating the need
      for architecture specific relocation code in livepatch. Specifically, reuse
      the apply_relocate_add() function in the module loader to write relocations
      instead of duplicating functionality in livepatch's arch-dependent
      klp_write_module_reloc() function.
      
      In order to accomplish this, livepatch modules manage their own relocation
      sections (marked with the SHF_RELA_LIVEPATCH section flag) and
      livepatch-specific symbols (marked with SHN_LIVEPATCH symbol section
      index). To apply livepatch relocation sections, livepatch symbols
      referenced by relocs are resolved and then apply_relocate_add() is called
      to apply those relocations.
      
      In addition, remove x86 livepatch relocation code and the s390
      klp_write_module_reloc() function stub. They are no longer needed since
      relocation work has been offloaded to module loader.
      
      Lastly, mark the module as a livepatch module so that the module loader
      canappropriately identify and initialize it.
      Signed-off-by: NJessica Yu <jeyu@redhat.com>
      Reviewed-by: NMiroslav Benes <mbenes@suse.cz>
      Acked-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>   # for s390 changes
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      425595a7
  11. 23 3月, 2016 1 次提交
  12. 17 3月, 2016 1 次提交
  13. 14 3月, 2016 2 次提交
    • S
      s390/pci: enforce fmb page boundary rule · 80c544de
      Sebastian Ott 提交于
      The function measurement block must not cross a page boundary. Ensure
      that by raising the alignment requirement to the smallest power of 2
      larger than the size of the fmb.
      
      Fixes: d0b08853 ("s390/pci: performance statistics and debug infrastructure")
      Cc: stable@vger.kernel.org # v3.8+
      Signed-off-by: NSebastian Ott <sebott@linux.vnet.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      80c544de
    • A
      ipv4: Update parameters for csum_tcpudp_magic to their original types · 01cfbad7
      Alexander Duyck 提交于
      This patch updates all instances of csum_tcpudp_magic and
      csum_tcpudp_nofold to reflect the types that are usually used as the source
      inputs.  For example the protocol field is populated based on nexthdr which
      is actually an unsigned 8 bit value.  The length is usually populated based
      on skb->len which is an unsigned integer.
      
      This addresses an issue in which the IPv6 function csum_ipv6_magic was
      generating a checksum using the full 32b of skb->len while
      csum_tcpudp_magic was only using the lower 16 bits.  As a result we could
      run into issues when attempting to adjust the checksum as there was no
      protocol agnostic way to update it.
      
      With this change the value is still truncated as many architectures use
      "(len + proto) << 8", however this truncation only occurs for values
      greater than 16776960 in length and as such is unlikely to occur as we stop
      the inner headers at ~64K in size.
      
      I did have to make a few minor changes in the arm, mn10300, nios2, and
      score versions of the function in order to support these changes as they
      were either using things such as an OR to combine the protocol and length,
      or were using ntohs to convert the length which would have truncated the
      value.
      
      I also updated a few spots in terms of whitespace and type differences for
      the addresses.  Most of this was just to make sure all of the definitions
      were in sync going forward.
      Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      01cfbad7
  14. 10 3月, 2016 1 次提交
    • M
      s390/mm: four page table levels vs. fork · 3446c13b
      Martin Schwidefsky 提交于
      The fork of a process with four page table levels is broken since
      git commit 6252d702 "[S390] dynamic page tables."
      
      All new mm contexts are created with three page table levels and
      an asce limit of 4TB. If the parent has four levels dup_mmap will
      add vmas to the new context which are outside of the asce limit.
      The subsequent call to copy_page_range will walk the three level
      page table structure of the new process with non-zero pgd and pud
      indexes. This leads to memory clobbers as the pgd_index *and* the
      pud_index is added to the mm->pgd pointer without a pgd_deref
      in between.
      
      The init_new_context() function is selecting the number of page
      table levels for a new context. The function is used by mm_init()
      which in turn is called by dup_mm() and mm_alloc(). These two are
      used by fork() and exec(). The init_new_context() function can
      distinguish the two cases by looking at mm->context.asce_limit,
      for fork() the mm struct has been copied and the number of page
      table levels may not change. For exec() the mm_alloc() function
      set the new mm structure to zero, in this case a three-level page
      table is created as the temporary stack space is located at
      STACK_TOP_MAX = 4TB.
      
      This fixes CVE-2016-2143.
      Reported-by: NMarcin Kościelnicki <koriakin@0x04.net>
      Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      3446c13b
  15. 08 3月, 2016 4 次提交