1. 09 Oct, 2017 (1 commit)
  2. 23 Aug, 2017 (3 commits)
  3. 09 Aug, 2017 (2 commits)
    • s390/vmcp: make use of contiguous memory allocator · 3f429842
      Committed by Heiko Carstens
      If memory is fragmented it is unlikely that large order memory
      allocations succeed. This has been an issue with the vmcp device
      driver for a long time, since it requires large physically contiguous
      memory areas for large responses.
      
      To hopefully resolve this issue make use of the contiguous memory
      allocator (cma). This patch adds a vmcp-specific cma area with a
      default size of 4MB. The size can be changed either via the
      VMCP_CMA_SIZE config option at compile time or with the "vmcp_cma"
      kernel parameter (e.g. "vmcp_cma=16m").
      
      For any vmcp response buffer larger than 16k, memory from the cma area
      will be allocated. If such an allocation fails, there is a fallback to
      the buddy allocator.
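
      A minimal sketch of this allocation strategy, assuming a reserved
      vmcp_cma area and a helper name chosen for illustration; the exact
      cma_alloc() arguments differ between kernel versions:

        #include <linux/cma.h>
        #include <linux/gfp.h>
        #include <linux/mm.h>

        static struct cma *vmcp_cma;	/* reserved early via the cma allocator */

        /* Try the vmcp cma area for large responses, fall back to the
         * buddy allocator if the cma allocation fails. */
        static void *vmcp_alloc_response(size_t size, int order, struct page **page)
        {
        	if (size > 16 * 1024) {
        		*page = cma_alloc(vmcp_cma, 1 << order, 0, GFP_KERNEL);
        		if (*page)
        			return page_to_virt(*page);
        	}
        	*page = NULL;
        	return (void *)__get_free_pages(GFP_KERNEL | __GFP_NOWARN, order);
        }
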
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      3f429842
    • s390/cpcmd,vmcp: avoid GFP_DMA allocations · cd4386a9
      Committed by Heiko Carstens
      According to the CP Programming Services manual, Diagnose Code 8
      "Virtual Console Function" can be used in all addressing modes. Also,
      the input and output buffers are not required to be located below
      the 2GB line.
      
      This is true at least since z/VM 5.4.
      
      Therefore remove the sam31/64 instructions and allow for simple
      GFP_KERNEL allocations. This makes it easier to allocate a 1MB page
      if the user requested such a large return buffer.
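
      A hedged sketch of what the allocation can look like without the
      31-bit/DMA constraint (the helper name is illustrative):

        #include <linux/gfp.h>
        #include <linux/mm.h>

        /* No below-2GB restriction anymore: a plain GFP_KERNEL allocation
         * is sufficient, even for a 1MB response buffer. */
        static char *cpcmd_alloc_response(unsigned long len)
        {
        	return (char *)__get_free_pages(GFP_KERNEL | __GFP_NOWARN,
        					get_order(len));
        }
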
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      cd4386a9
  4. 26 Jul, 2017 (3 commits)
  5. 25 Jul, 2017 (2 commits)
    • s390/mm: tag normal pages vs pages used in page tables · c9b5ad54
      Committed by Martin Schwidefsky
      The ESSA instruction has a new option that allows tagging pages that
      are not used as page tables. Without the tag the hypervisor has to
      assume that any guest page could be used in a page table inside the
      guest. This forces the hypervisor to flush all guest TLB entries
      whenever a host page table entry is invalidated. With the tag
      the host can skip the TLB flush if the page is tagged as a normal page.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      c9b5ad54
    • signal: Remove kernel internal si_code magic · cc731525
      Committed by Eric W. Biederman
      struct siginfo is a union and the kernel since 2.4 has been hiding a union
      tag in the high 16bits of si_code using the values:
      __SI_KILL
      __SI_TIMER
      __SI_POLL
      __SI_FAULT
      __SI_CHLD
      __SI_RT
      __SI_MESGQ
      __SI_SYS
      
      While this looks plausible on the surface, in practice this situation has
      not worked well.
      
      - Injected positive signals are not copied to user space properly
        unless they have these magic high bits set.
      
      - Injected positive signals are not reported properly by signalfd
        unless they have these magic high bits set.
      
      - These kernel internal values leaked to userspace via ptrace_peek_siginfo.
      
      - It was possible to inject these kernel internal values and cause the
        kernel to misbehave.
      
      - Kernel developers got confused and expected these kernel internal values
        in userspace in kernel self tests.
      
      - Kernel developers got confused and set si_code to __SI_FAULT, which
        is SI_USER in userspace, causing userspace to think an ordinary user
        sent the signal and that it was not kernel generated.
      
      - The values make it impossible to reorganize the code to transform
        siginfo_copy_to_user into a plain copy_to_user, as si_code must
        be massaged before being passed to userspace.
      
      So remove these kernel internal si codes and make the kernel code simpler
      and more maintainable.
      
      To replace these kernel internal magic si_codes introduce the helper
      function siginfo_layout, which takes a signal number and an si_code and
      computes which union member of siginfo is being used.  Have
      siginfo_layout return an enumeration so that gcc will have enough
      information to warn if a switch statement does not handle all of the
      union members.
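
      A minimal userspace-style sketch of the siginfo_layout() idea; the
      real kernel helper covers more signals plus the architecture-specific
      SI_USER duplicates mentioned below:

        #include <signal.h>

        enum siginfo_layout {
        	SIL_KILL, SIL_TIMER, SIL_POLL, SIL_FAULT, SIL_CHLD, SIL_RT, SIL_SYS,
        };

        static enum siginfo_layout siginfo_layout(int sig, int si_code)
        {
        	if (si_code > 0) {			/* kernel generated signal */
        		switch (sig) {
        		case SIGCHLD: return SIL_CHLD;
        		case SIGSYS:  return SIL_SYS;
        		case SIGPOLL: return SIL_POLL;
        		case SIGILL:  case SIGFPE: case SIGSEGV:
        		case SIGBUS:  case SIGTRAP:
        			return SIL_FAULT;
        		default:      return SIL_KILL;
        		}
        	}
        	if (si_code == SI_TIMER)		/* POSIX timer expiry */
        		return SIL_TIMER;
        	if (si_code == SI_MESGQ)		/* mqueue notification */
        		return SIL_RT;
        	if (si_code == SI_QUEUE || sig >= SIGRTMIN)
        		return SIL_RT;			/* sigqueue / real-time signal */
        	return SIL_KILL;			/* plain kill/tkill */
        }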
      
      A couple of architectures have a messed up ABI that defines signal
      specific duplications of SI_USER which causes more special cases in
      siginfo_layout than I would like.  The good news is only problem
      architectures pay the cost.
      
      Update all of the code that used the previous magic __SI_ values to
      use the new SIL_ values and to call siginfo_layout to get those
      values.  Except where not all of the cases are handled, remove the
      defaults in the switch statements so that if a new case is missed in
      the future the lack will show up at compile time.
      
      Modify the code that copies siginfo si_code to userspace to just copy
      the value and not cast si_code to a short first.  The high bits are no
      longer used to hold a magic union member.
      
      Fixup the siginfo header files to stop including the __SI_ values in
      their constants and for the headers that were missing it to properly
      update the number of si_codes for each signal type.
      
      The fixes to the copy_siginfo_from_user32 implementations have the
      interesting property that several of them previously should never have
      worked, as the __SI_ values they depended upon were kernel internal.
      With that dependency gone those implementations should work much
      better.
      
      The idea of not passing the __SI_ values out to userspace and then
      not reinserting them has been tested with criu and criu worked without
      changes.
      
      Ref: 2.4.0-test1
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      cc731525
  6. 13 Jul, 2017 (2 commits)
  7. 29 Jun, 2017 (1 commit)
  8. 27 Jun, 2017 (2 commits)
  9. 21 Jun, 2017 (1 commit)
  10. 14 Jun, 2017 (1 commit)
    • s390/ipl: revert Load Normal semantics for LPAR CCW-type re-IPL · 4130b28f
      Committed by Heiko Carstens
      This reverts the two commits
      
      7afbeb6d ("s390/ipl: always use load normal for CCW-type re-IPL")
      0f7451ff ("s390/ipl: use load normal for LPAR re-ipl")
      
      The two commits did not take into account that the behavior of standby
      memory changes fundamentally if the re-IPL method is changed from
      Load Clear to Load Normal.
      
      In case of the old re-IPL clear method all memory that was initially
      in standby state will be put into standby state again within the
      re-IPL process. Or in other words: memory that was brought online
      before a re-IPL will be offline again after a reboot.
      
      Given that we use different re-IPL methods depending on the hypervisor
      and on CCW-type vs SCSI re-IPL, it is not easy to tell in advance when
      and why memory will stay online or go offline after a re-IPL.
      This also has other side effects, since memory that is online
      from the beginning will be in ZONE_NORMAL by default vs ZONE_MOVABLE
      for memory that is offline.
      
      Therefore, before the change, a user could online and offline memory
      easily since standby memory was always in ZONE_NORMAL.  After the
      change, and a re-IPL, this depended on which memory parts were online
      before the re-IPL.
      
      From a usability point of view the current behavior is more than
      suboptimal. Therefore revert these changes until we have a better
      solution and get back to a consistent behavior. The bad thing about
      this is that the time required for a re-IPL will be significantly
      increased for configurations with several 100GB or 1TB of memory.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      4130b28f
  11. 13 Jun, 2017 (2 commits)
    • s390/fpu: export save_fpu_regs for all configs · f044f4c5
      Committed by Martin Schwidefsky
      The save_fpu_regs function is a general API that is supposed to be
      usable for modules as well. Remove the #ifdef that hides the symbol
      for CONFIG_KVM=n.
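
      In effect the export simply loses its CONFIG_KVM guard. A sketch of
      the before/after, assuming the usual EXPORT_SYMBOL pattern (the real
      export sits next to the implementation):

        	/* before: only exported when KVM was enabled */
        	#if IS_ENABLED(CONFIG_KVM)
        	EXPORT_SYMBOL(save_fpu_regs);
        	#endif

        	/* after: exported unconditionally, usable by any module */
        	EXPORT_SYMBOL(save_fpu_regs);
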
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      f044f4c5
    • s390/kvm: avoid global config of vm.alloc_pgste=1 · 23fefe11
      Committed by Martin Schwidefsky
      The system control vm.alloc_pgste is used to control the size of the
      page tables, either 2K or 4K. The idea is that a KVM host sets the
      vm.alloc_pgste control to 1 which causes *all* new processes to run
      with 4K page tables. For a non-kvm system the control should stay off
      to save on memory used for page tables.
      
      Trouble is that distributions choose to set the control globally to
      be able to run KVM guests. This wastes memory on non-KVM systems.
      
      Introduce the PT_S390_PGSTE ELF segment type to "mark" the qemu
      executable with it. All executables with this (empty) segment in
      their ELF phdr array will be started with 4K page tables. Any executable
      without PT_S390_PGSTE will run with the default 2K page tables.
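
      A hedged sketch of the kernel-side check: while loading an ELF binary,
      scan the program headers for the marker segment (the helper name is
      chosen for illustration):

        #include <linux/elf.h>

        static bool binary_wants_4k_page_tables(const struct elf_phdr *phdr, int phnum)
        {
        	int i;

        	for (i = 0; i < phnum; i++)
        		if (phdr[i].p_type == PT_S390_PGSTE)
        			return true;	/* e.g. qemu marks itself with this */
        	return false;
        }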
      
      This removes the need to set vm.alloc_pgste=1 for a KVM host and
      minimizes the waste of memory for page tables.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      23fefe11
  12. 12 Jun, 2017 (10 commits)
    • s390: rename struct psw_bits members · a7525982
      Committed by Heiko Carstens
      Rename a couple of the struct psw_bits members so it is more obvious
      what they are good for. Initially I thought using the single character
      names from the PoP would be sufficient and obvious, but admittedly
      that is not true.
      
      The current implementation is not easy to use if one has to look into
      the source file to figure out which member represents the 'per' bit
      (which is the 'r' member).
      
      Therefore rename the members to sane names that are identical to the
      uapi psw mask defines:
      
      r -> per
      i -> io
      e -> ext
      t -> dat
      m -> mcheck
      w -> wait
      p -> pstate
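
      An illustrative call site before and after the rename (regs is assumed
      to be a struct pt_regs pointer):

        	per_on = psw_bits(regs->psw).r;		/* before: opaque one-letter name */
        	per_on = psw_bits(regs->psw).per;	/* after: matches the uapi PSW defines */
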
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      a7525982
    • s390: rename psw_bits enums · 8bb3fdd6
      Committed by Heiko Carstens
      The address space enums that must be used when modifying the address
      space part of a psw with the psw_bits() macro can easily be confused
      with the psw defines that are used to directly mask and compare the
      mask part of a psw.
      We have e.g. PSW_AS_PRIMARY vs PSW_ASC_PRIMARY.
      
      To avoid confusion rename the PSW_AS_* enums to PSW_BITS_AS_*.
      
      In addition also rename the PSW_AMODE_* enums, so they also follow the
      same naming scheme: PSW_BITS_AMODE_*.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      8bb3fdd6
    • s390/ipl: revert Load Normal semantics for LPAR CCW-type re-IPL · ead1dec8
      Committed by Heiko Carstens
      This reverts the two commits
      
      7afbeb6d ("s390/ipl: always use load normal for CCW-type re-IPL")
      0f7451ff ("s390/ipl: use load normal for LPAR re-ipl")
      
      The two commits did not take into account that the behavior of standby
      memory changes fundamentally if the re-IPL method is changed from
      Load Clear to Load Normal.
      
      In case of the old re-IPL clear method all memory that was initially
      in standby state will be put into standby state again within the
      re-IPL process. Or in other words: memory that was brought online
      before a re-IPL will be offline again after a reboot.
      
      Given that we use different re-IPL methods depending on the hypervisor
      and on CCW-type vs SCSI re-IPL, it is not easy to tell in advance when
      and why memory will stay online or go offline after a re-IPL.
      This also has other side effects, since memory that is online
      from the beginning will be in ZONE_NORMAL by default vs ZONE_MOVABLE
      for memory that is offline.
      
      Therefore, before the change, a user could online and offline memory
      easily since standby memory was always in ZONE_NORMAL.  After the
      change, and a re-IPL, this depended on which memory parts were online
      before the re-IPL.
      
      From a usability point of view the current behavior is more than
      suboptimal. Therefore revert these changes until we have a better
      solution and get back to a consistent behavior. The bad thing about
      this is that the time required for a re-IPL will be significantly
      increased for configurations with several 100GB or 1TB of memory.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      ead1dec8
    • s390/dumpstack: remove raw stack dump · 2b7b9817
      Committed by Heiko Carstens
      Remove raw stack dumps that are printed before call traces in case of
      a warning, or the 'l' sysrq trigger (show a stack backtrace for all
      active CPUs).
      
      Besides the fact that a raw stack dump should not be shown for the 'l'
      sysrq trigger at all, the value of the dump is close to zero. That is
      also why it has not been printed in case of a panic for ages. That it
      is still printed on warnings is just a leftover. So get rid of this
      completely.
      
      The following won't be printed anymore with this change:
      
      Stack:
             00000000bbc4fbc8 00000000bbc4fc58 0000000000000003 0000000000000000
             00000000bbc4fcf8 00000000bbc4fc70 00000000bbc4fc70 0000000000000020
             000000007fe00098 00000000bfe8be00 00000000bbc4fe94 000000000000000a
             000000000000000c 00000000bbc4fcc0 0000000000000000 0000000000000000
             000000000095b930 0000000000113366 00000000bbc4fc58 00000000bbc4fca0
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      2b7b9817
    • s390/perf: fix null string in perf list pmu command · c39457ff
      Committed by Thomas Richter
      Command 'perf list pmu' displays events which contain
      an invalid string "(null)=xxx", where xxx is the pmu event
      name, for example:
         cpum_cf/AES_BLOCKED_CYCLES,(null)=AES_BLOCKED_CYCLES/
      This is not correct; the invalid string should not be
      displayed at all.
      
      It is caused by an obsolete term in the
      sysfs attribute file for each s390 CPUMF counter event.
      Reading from the sysfs file also displays the event
      name.
      
      Fix this by omitting the event name.  This patch makes
      s390 CPUMF sysfs files consistent with other platforms.
      
      This is an interface change between user and kernel
      but does not break anything. Reading from a counter event
      sysfs file should only list terms mentioned in the
      /sys/bus/event_source/devices/<cpumf>/format directory;
      'name' is not listed there.
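
      A hedged sketch of the kind of sysfs show routine this implies (names
      are illustrative; the real attribute is generated by the CPUMF event
      macros):

        static ssize_t cpumf_events_sysfs_show(struct device *dev,
        				       struct device_attribute *attr, char *buf)
        {
        	struct perf_pmu_events_attr *pmu_attr =
        		container_of(attr, struct perf_pmu_events_attr, attr);

        	/* only emit terms that appear in the PMU's format directory;
        	 * the obsolete extra term made perf print "(null)=..." */
        	return sprintf(buf, "event=0x%04llx\n", pmu_attr->id);
        }
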
      Reported-by: Zvonko Kosic <zvonko.kosic@de.ibm.com>
      Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      c39457ff
    • s390/ptrace: guarded storage regset for the current task · f5bbd721
      Committed by Martin Schwidefsky
      The regset functions for guarded storage are supposed to work on
      the current task as well. For task == current add the required
      load and store instructions for the guarded storage control block.
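
      A hedged sketch of the idea, assuming helpers along the lines of
      save_gs_cb()/restore_gs_cb(); details of the real regset code differ:

        	/* regset get: for the current task the control block kept in
        	 * memory may be stale, store the live one first */
        	if (target == current)
        		save_gs_cb(data);

        	/* regset set: make the new control block take effect at once */
        	if (target == current)
        		restore_gs_cb(data);
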
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      f5bbd721
    • s390/smp: fix false positive kmemleak of mcesa data structure · 9cf8edb7
      Committed by Christian Borntraeger
      I get (number of CPUs - 1) kmemleak hits like:
      
      unreferenced object 0x37ec6f000 (size 1024):
        comm "swapper/0", pid 1, jiffies 4294937330 (age 889.690s)
        hex dump (first 32 bytes):
          6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
          6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
        backtrace:
          [<000000000034a848>] kmem_cache_alloc+0x2b8/0x3d0
          [<00000000001164de>] __cpu_up+0x456/0x488
          [<000000000016f60c>] bringup_cpu+0x4c/0xd0
          [<000000000016d5d2>] cpuhp_invoke_callback+0xe2/0x9e8
          [<000000000016f3c6>] cpuhp_up_callbacks+0x5e/0x110
          [<000000000016f988>] _cpu_up+0xe0/0x158
          [<000000000016faf0>] do_cpu_up+0xf0/0x110
          [<0000000000dae1ee>] smp_init+0x126/0x130
          [<0000000000d9bd04>] kernel_init_freeable+0x174/0x2e0
          [<000000000089fc62>] kernel_init+0x2a/0x148
          [<00000000008adce2>] kernel_thread_starter+0x6/0xc
          [<00000000008adcdc>] kernel_thread_starter+0x0/0xc
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      The pointer of this data structure is stored in the prefix page of that
      CPU together with some extra bits ORed into the low bits.
      Mark the data structure as non-leak.
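
      A minimal sketch of that annotation (the cache and variable names are
      illustrative):

        	/* needs <linux/kmemleak.h> */
        	mcesa = kmem_cache_alloc(pcpu_mcesa_cache, GFP_KERNEL);
        	if (!mcesa)
        		return -ENOMEM;
        	/* the only reference lives in the prefix page with low bits
        	 * ORed in, so kmemleak cannot see it - mark it as not a leak */
        	kmemleak_not_leak(mcesa);
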
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      9cf8edb7
    • s390/vdso: use _install_special_mapping to establish vdso · 35bb092a
      Committed by Martin Schwidefsky
      Switch to the improved _install_special_mapping function to install
      the vdso mapping. This has two advantages: the arch_vma_name function
      is not needed anymore, and the vdso vma still has its name after its
      memory location has been changed with mremap.
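
      A hedged sketch of the pattern (page list, sizes and addresses are
      illustrative; the call sits in the arch mmap hook):

        static struct page *vdso_pagelist[2];	/* pages backing the vdso image */

        static const struct vm_special_mapping vdso_mapping = {
        	.name  = "[vdso]",	/* vma keeps this name even after mremap */
        	.pages = vdso_pagelist,
        };

        	vma = _install_special_mapping(mm, vdso_base, vdso_len,
        				       VM_READ | VM_EXEC |
        				       VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC,
        				       &vdso_mapping);
        	if (IS_ERR(vma))
        		return PTR_ERR(vma);
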
      Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      35bb092a
    • s390/cputime: simplify account_system_index_scaled · b29e061b
      Committed by Martin Schwidefsky
      The account_system_index_scaled function gets two cputime values, a raw
      value derived from CPU timer deltas and a scaled value. The scaled value
      is always calculated from the raw value, so the code can be simplified
      by moving the scale_vtime call into account_system_index_scaled.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      b29e061b
    • s390: add missing header includes for type checking · 92acfb74
      Committed by Heiko Carstens
      Add missing include statements to make sure that prototypes match
      implementation. As reported by sparse:
      
      arch/s390/crypto/arch_random.c:18:1:
        warning: symbol 's390_arch_random_available' was not declared. Should it be static?
      arch/s390/kernel/traps.c:279:13: warning:
        symbol 'trap_init' was not declared. Should it be static?
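
      The fix is simply to include the header that declares each symbol in
      the file that implements it, e.g. (header name assumed):

        	/* arch/s390/crypto/arch_random.c */
        	#include <asm/archrandom.h>	/* declares s390_arch_random_available */
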
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      92acfb74
  13. 05 Jun, 2017 (1 commit)
  14. 26 May, 2017 (2 commits)
  15. 17 May, 2017 (1 commit)
  16. 11 May, 2017 (1 commit)
  17. 09 May, 2017 (3 commits)
    • s390: move _text symbol to address higher than zero · d04a4c76
      Committed by Heiko Carstens
      The perf tool assumes that kernel symbols are never present at address
      zero. In fact it assumes that if functions that map symbols to addresses
      return zero, the symbol was not found.
      
      Given that s390's _text symbol historically is located at address zero
      this yields at least a couple of false errors and warnings in one of
      perf's test cases about symbols that are not present ("perf test 1").
      
      To fix this simply move the _text symbol to address 0x200, just behind
      the initial psw and channel program located at the beginning of the
      kernel image. This is now hard coded within the linker script.
      
      I tried a nicer solution which moves the initial psw and channel
      program into their own section. However that would move the symbols
      within the "real" head.text section to different addresses, since the
      ".org" statements within head.S are relative to the head.text
      section. If there is a new section in front, everything else will be
      moved. Alternatively I could have adjusted all ".org" statements. But
      the current solution seems to be the easiest one, since nobody really
      cares where the _text symbol is actually located.
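
      A hedged sketch of the relevant part of the linker script; the real
      vmlinux.lds.S has many more output sections and differs in detail:

        SECTIONS
        {
        	. = 0;
        	.text : {
        		HEAD_TEXT	/* initial psw and channel program at 0x0 */
        		. = 0x200;	/* hard coded, as described above */
        		_text = .;	/* first symbol perf gets to see */
        		TEXT_TEXT
        	}
        }
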
      Reported-by: Zvonko Kosic <zkosic@linux.vnet.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      d04a4c76
    • s390: use set_memory.h header · e6c7c630
      Committed by Laura Abbott
      set_memory_* functions have moved to set_memory.h.  Switch to this
      explicitly.
      
      Link: http://lkml.kernel.org/r/1488920133-27229-5-git-send-email-labbott@redhat.com
      Signed-off-by: Laura Abbott <labbott@redhat.com>
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e6c7c630
    • cpumask: make "nr_cpumask_bits" unsigned · c311c797
      Committed by Alexey Dobriyan
      Bit searching functions accept "unsigned long" indices but
      "nr_cpumask_bits" is "int", which is signed, so inevitable sign
      extensions occur on x86_64.  Those MOVSX instructions are the #1 source
      of MOVSX bloat by number of uses across the whole kernel.
      
      Change "nr_cpumask_bits" to unsigned, this number can't be negative
      after all.  It allows to do implicit zero-extension on x86_64 without
      MOVSX.
      
      Change signed comparisons into unsigned comparisons where necessary.
      
      Other uses look fine because they are either arguments passed to a
      function or comparisons that are already unsigned.
      
      Net win on allyesconfig type of kernel: ~2.8 KB (!)
      
      	add/remove: 0/0 grow/shrink: 8/725 up/down: 93/-2926 (-2833)
      	function                                     old     new   delta
      	xen_exit_mmap                                691     735     +44
      	qstat_read                                   426     440     +14
      	__cpufreq_cooling_register                  1678    1687      +9
      	trace_rb_cpu_prepare                         447     455      +8
      	vermagic                                      54      60      +6
      	nfp_driver_version                            54      60      +6
      	rcu_torture_stats_print                     1147    1151      +4
      	find_next_push_cpu                           267     269      +2
      	xen_irq_resume                               961     960      -1
      				...
      	init_vp_index                                946     906     -40
      	od_set_powersave_bias                        328     281     -47
      	power_cpu_exit                               193     139     -54
      	arch_show_interrupts                        3538    3484     -54
      	select_idle_sibling                         1558    1471     -87
      	Total: Before=158358910, After=158356077, chg -0.00%
      
      Same arguments apply to "nr_cpu_ids" but I haven't yet found enough
      courage to delve into this issue (and a proper fix may require a new
      type "cpu_t", which is a whole separate story).
      
      Link: http://lkml.kernel.org/r/20170309205322.GA1728@avx2
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c311c797
  18. 03 May, 2017 (2 commits)