1. 22 4月, 2014 19 次提交
  2. 09 4月, 2014 1 次提交
    • H
      s390/smp: fix smp_stop_cpu() for !CONFIG_SMP · e7c46c66
      Heiko Carstens 提交于
      smp_stop_cpu() should stop the current cpu even for !CONFIG_SMP.
      Otherwise machine_halt() will return and and the machine generates a
      panic instread of simply stopping the current cpu:
      
      Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000
      
      CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 3.14.0-01527-g2b6ef16a6bc5 #10
      [...]
      Call Trace:
      ([<0000000000110db0>] show_trace+0xf8/0x158)
       [<0000000000110e7a>] show_stack+0x6a/0xe8
       [<000000000074dba8>] panic+0xe4/0x268
       [<0000000000140570>] do_exit+0xa88/0xb2c
       [<000000000016e12c>] SyS_reboot+0x1f0/0x234
       [<000000000075da70>] sysc_nr_ok+0x22/0x28
       [<000000007d5a09b4>] 0x7d5a09b4
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      e7c46c66
  3. 03 4月, 2014 5 次提交
    • H
      s390/uaccess: rework uaccess code - fix locking issues · 457f2180
      Heiko Carstens 提交于
      The current uaccess code uses a page table walk in some circumstances,
      e.g. in case of the in atomic futex operations or if running on old
      hardware which doesn't support the mvcos instruction.
      
      However it turned out that the page table walk code does not correctly
      lock page tables when accessing page table entries.
      In other words: a different cpu may invalidate a page table entry while
      the current cpu inspects the pte. This may lead to random data corruption.
      
      Adding correct locking however isn't trivial for all uaccess operations.
      Especially copy_in_user() is problematic since that requires to hold at
      least two locks, but must be protected against ABBA deadlock when a
      different cpu also performs a copy_in_user() operation.
      
      So the solution is a different approach where we change address spaces:
      
      User space runs in primary address mode, or access register mode within
      vdso code, like it currently already does.
      
      The kernel usually also runs in home space mode, however when accessing
      user space the kernel switches to primary or secondary address mode if
      the mvcos instruction is not available or if a compare-and-swap (futex)
      instruction on a user space address is performed.
      KVM however is special, since that requires the kernel to run in home
      address space while implicitly accessing user space with the sie
      instruction.
      
      So we end up with:
      
      User space:
      - runs in primary or access register mode
      - cr1 contains the user asce
      - cr7 contains the user asce
      - cr13 contains the kernel asce
      
      Kernel space:
      - runs in home space mode
      - cr1 contains the user or kernel asce
        -> the kernel asce is loaded when a uaccess requires primary or
           secondary address mode
      - cr7 contains the user or kernel asce, (changed with set_fs())
      - cr13 contains the kernel asce
      
      In case of uaccess the kernel changes to:
      - primary space mode in case of a uaccess (copy_to_user) and uses
        e.g. the mvcp instruction to access user space. However the kernel
        will stay in home space mode if the mvcos instruction is available
      - secondary space mode in case of futex atomic operations, so that the
        instructions come from primary address space and data from secondary
        space
      
      In case of kvm the kernel runs in home space mode, but cr1 gets switched
      to contain the gmap asce before the sie instruction gets executed. When
      the sie instruction is finished cr1 will be switched back to contain the
      user asce.
      
      A context switch between two processes will always load the kernel asce
      for the next process in cr1. So the first exit to user space is a bit
      more expensive (one extra load control register instruction) than before,
      however keeps the code rather simple.
      
      In sum this means there is no need to perform any error prone page table
      walks anymore when accessing user space.
      
      The patch seems to be rather large, however it mainly removes the
      the page table walk code and restores the previously deleted "standard"
      uaccess code, with a couple of changes.
      
      The uaccess without mvcos mode can be enforced with the "uaccess_primary"
      kernel parameter.
      Reported-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      457f2180
    • M
      s390/mm,tlb: optimize TLB flushing for zEC12 · 1b948d6c
      Martin Schwidefsky 提交于
      The zEC12 machines introduced the local-clearing control for the IDTE
      and IPTE instruction. If the control is set only the TLB of the local
      CPU is cleared of entries, either all entries of a single address space
      for IDTE, or the entry for a single page-table entry for IPTE.
      Without the local-clearing control the TLB flush is broadcasted to all
      CPUs in the configuration, which is expensive.
      
      The reset of the bit mask of the CPUs that need flushing after a
      non-local IDTE is tricky. As TLB entries for an address space remain
      in the TLB even if the address space is detached a new bit field is
      required to keep track of attached CPUs vs. CPUs in the need of a
      flush. After a non-local flush with IDTE the bit-field of attached CPUs
      is copied to the bit-field of CPUs in need of a flush. The ordering
      of operations on cpu_attach_mask, attach_count and mm_cpumask(mm) is
      such that an underindication in mm_cpumask(mm) is prevented but an
      overindication in mm_cpumask(mm) is possible.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      1b948d6c
    • M
      s390/mm,tlb: safeguard against speculative TLB creation · 02a8f3ab
      Martin Schwidefsky 提交于
      The principles of operations states that the CPU is allowed to create
      TLB entries for an address space anytime while an ASCE is loaded to
      the control register. This is true even if the CPU is running in the
      kernel and the user address space is not (actively) accessed.
      
      In theory this can affect two aspects of the TLB flush logic.
      For full-mm flushes the ASCE of the dying process is still attached.
      The approach to flush first with IDTE and then just free all page
      tables can in theory lead to stale TLB entries. Use the batched
      free of page tables for the full-mm flushes as well.
      
      For operations that can have a stale ASCE in the control register,
      e.g. a delayed update_user_asce in switch_mm, load the kernel ASCE
      to prevent invalid TLBs from being created.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      02a8f3ab
    • T
      s390/irq: Use defines for external interruption codes · 1dad093b
      Thomas Huth 提交于
      Use the new defines for external interruption codes to get rid
      of "magic" numbers in the s390 source code. And while we're at it,
      also rename the (un-)register_external_interrupt function to
      something shorter so that this patch does not exceed the 80
      columns all over the place.
      Signed-off-by: NThomas Huth <thuth@linux.vnet.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      1dad093b
    • T
      s390/irq: Add defines for external interruption codes · 072c2790
      Thomas Huth 提交于
      Introduce defines for external interruption codes so that we
      can get rid of some "magic" numbers in the s390 source code.
      Signed-off-by: NThomas Huth <thuth@linux.vnet.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      072c2790
  4. 01 4月, 2014 1 次提交
    • H
      s390/bitops,atomic: add missing memory barriers · 0ccc8b7a
      Heiko Carstens 提交于
      When reworking the bitops and atomic ops I missed that those instructions
      that got atomic behaviour only perform a "specific-operand-serialization"
      instead of a full "serialization".
      The compare-and-swap instruction used before performs a full serialization
      before and after the instruction is executed, which means it has full
      memory barrier semantics.
      In order to give the new bitops and atomic ops functions also full memory
      barrier semantics add a "bcr 14,0" before and after each of those new
      instructions which performs full serialization as well.
      
      This restores memory barrier semantics for bitops and atomic ops functions
      which return values, like e.g. atomic_add_return(), but not for functions
      which do not return a value, like e.g. atomic_add().
      This is consistent to other architectures and what common code requires.
      
      Cc: stable@vger.kernel.org # v3.13+
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      0ccc8b7a
  5. 25 3月, 2014 1 次提交
  6. 21 3月, 2014 4 次提交
  7. 20 3月, 2014 2 次提交
    • E
      audit: use uapi/linux/audit.h for AUDIT_ARCH declarations · 579ec9e1
      Eric Paris 提交于
      The syscall.h headers were including linux/audit.h but really only
      needed the uapi/linux/audit.h to get the requisite defines.  Switch to
      the uapi headers.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-mips@linux-mips.org
      Cc: linux-s390@vger.kernel.org
      Cc: x86@kernel.org
      579ec9e1
    • E
      syscall_get_arch: remove useless function arguments · 5e937a9a
      Eric Paris 提交于
      Every caller of syscall_get_arch() uses current for the task and no
      implementors of the function need args.  So just get rid of both of
      those things.  Admittedly, since these are inline functions we aren't
      wasting stack space, but it just makes the prototypes better.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-mips@linux-mips.org
      Cc: linux390@de.ibm.com
      Cc: x86@kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      5e937a9a
  8. 17 3月, 2014 1 次提交
  9. 14 3月, 2014 1 次提交
  10. 06 3月, 2014 2 次提交
    • H
      s390/compat: build error for large compat syscall args · 9a205286
      Heiko Carstens 提交于
      Enforce 32 bit types for all compat syscall argument types.
      
      This way we can make sure that all arguments get correct sign
      or zero extension. Otherwise incorrect code would be generated.
      
      E.g. for a 'long' type the COMPAT_SYSCALL_DEFINE macro wouldn't
      generate code that would cause sign extension of the passed in 32
      bit user space parameter.
      This can cause quite subtle bugs like e.g. the one that was fixed
      with dfd948e3 "fs/compat: fix parameter handling for compat
      readv/writev syscalls".
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      9a205286
    • H
      fs/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types · 932602e2
      Heiko Carstens 提交于
      Some fs compat system calls have unsigned long parameters instead of
      compat_ulong_t.
      In order to allow the COMPAT_SYSCALL_DEFINE macro generate code that
      performs proper zero and sign extension convert all 64 bit parameters
      their corresponding 32 bit counterparts.
      
      compat_sys_io_getevents() is a bit different: the non-compat version
      has signed parameters for the "min_nr" and "nr" parameters while the
      compat version has unsigned parameters.
      So change this as well. For all practical purposes this shouldn't make
      any difference (doesn't fix a real bug).
      Also introduce a generic compat_aio_context_t type which can be used
      everywhere.
      The access_ok() check within compat_sys_io_getevents() got also removed
      since the non-compat sys_io_getevents() should be able to handle
      everything anyway.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      932602e2
  11. 04 3月, 2014 3 次提交