1. 20 5月, 2014 2 次提交
  2. 11 4月, 2014 1 次提交
  3. 03 4月, 2014 1 次提交
    • H
      s390/uaccess: rework uaccess code - fix locking issues · 457f2180
      Heiko Carstens 提交于
      The current uaccess code uses a page table walk in some circumstances,
      e.g. in case of the in atomic futex operations or if running on old
      hardware which doesn't support the mvcos instruction.
      
      However it turned out that the page table walk code does not correctly
      lock page tables when accessing page table entries.
      In other words: a different cpu may invalidate a page table entry while
      the current cpu inspects the pte. This may lead to random data corruption.
      
      Adding correct locking however isn't trivial for all uaccess operations.
      Especially copy_in_user() is problematic since that requires to hold at
      least two locks, but must be protected against ABBA deadlock when a
      different cpu also performs a copy_in_user() operation.
      
      So the solution is a different approach where we change address spaces:
      
      User space runs in primary address mode, or access register mode within
      vdso code, like it currently already does.
      
      The kernel usually also runs in home space mode, however when accessing
      user space the kernel switches to primary or secondary address mode if
      the mvcos instruction is not available or if a compare-and-swap (futex)
      instruction on a user space address is performed.
      KVM however is special, since that requires the kernel to run in home
      address space while implicitly accessing user space with the sie
      instruction.
      
      So we end up with:
      
      User space:
      - runs in primary or access register mode
      - cr1 contains the user asce
      - cr7 contains the user asce
      - cr13 contains the kernel asce
      
      Kernel space:
      - runs in home space mode
      - cr1 contains the user or kernel asce
        -> the kernel asce is loaded when a uaccess requires primary or
           secondary address mode
      - cr7 contains the user or kernel asce, (changed with set_fs())
      - cr13 contains the kernel asce
      
      In case of uaccess the kernel changes to:
      - primary space mode in case of a uaccess (copy_to_user) and uses
        e.g. the mvcp instruction to access user space. However the kernel
        will stay in home space mode if the mvcos instruction is available
      - secondary space mode in case of futex atomic operations, so that the
        instructions come from primary address space and data from secondary
        space
      
      In case of kvm the kernel runs in home space mode, but cr1 gets switched
      to contain the gmap asce before the sie instruction gets executed. When
      the sie instruction is finished cr1 will be switched back to contain the
      user asce.
      
      A context switch between two processes will always load the kernel asce
      for the next process in cr1. So the first exit to user space is a bit
      more expensive (one extra load control register instruction) than before,
      however keeps the code rather simple.
      
      In sum this means there is no need to perform any error prone page table
      walks anymore when accessing user space.
      
      The patch seems to be rather large, however it mainly removes the
      the page table walk code and restores the previously deleted "standard"
      uaccess code, with a couple of changes.
      
      The uaccess without mvcos mode can be enforced with the "uaccess_primary"
      kernel parameter.
      Reported-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      457f2180
  4. 21 2月, 2014 6 次提交
  5. 22 1月, 2014 2 次提交
  6. 16 1月, 2014 1 次提交
  7. 25 11月, 2013 1 次提交
  8. 24 10月, 2013 3 次提交
    • M
      s390/uaccess: always run the kernel in home space · e258d719
      Martin Schwidefsky 提交于
      Simplify the uaccess code by removing the user_mode=home option.
      The kernel will now always run in the home space mode.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      e258d719
    • H
      s390/bitops: rename find_first_bit_left() to find_first_bit_inv() · 7d7c7b24
      Heiko Carstens 提交于
      find_first_bit_left() and friends have nothing to do with the normal
      LSB0 bit numbering for big endian machines used in Linux (least
      significant bit has bit number 0).
      Instead they use MSB0 bit numbering, where the most signficant bit has
      bit number 0. So rename find_first_bit_left() and friends to
      find_first_bit_inv(), to avoid any confusion.
      Also provide inv versions of set_bit, clear_bit and test_bit.
      
      This also removes the confusing use of e.g. set_bit() in airq.c which
      uses a "be_to_le" bit number conversion, which could imply that instead
      set_bit_le() could be used. But that is entirely wrong since the _le
      bitops variant uses yet another bit numbering scheme.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      7d7c7b24
    • H
      s390/bitops: use generic find bit functions / reimplement _left variant · 746479cd
      Heiko Carstens 提交于
      Just like all other architectures we should use out-of-line find bit
      operations, since the inline variant bloat the size of the kernel image.
      And also like all other architecures we should only supply optimized
      variants of the __ffs, ffs, etc. primitives.
      
      Therefore this patch removes the inlined s390 find bit functions and uses
      the generic out-of-line variants instead.
      
      The optimization of the primitives follows with the next patch.
      
      With this patch also the functions find_first_bit_left() and
      find_next_bit_left() have been reimplemented, since logically, they are
      nothing else but a find_first_bit()/find_next_bit() implementation that
      use an inverted __fls() instead of __ffs().
      Also the restriction that these functions only work on machines which
      support the "flogr" instruction is gone now.
      
      This reduces the size of the kernel image (defconfig, -march=z9-109)
      by 144,482 bytes.
      Alone the size of the function build_sched_domains() gets reduced from
      7 KB to 3,5 KB.
      
      We also git rid of unused functions like find_first_bit_le()...
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      746479cd
  9. 22 10月, 2013 1 次提交
    • M
      s390/time: correct use of store clock fast · 8c071b0f
      Martin Schwidefsky 提交于
      The result of the store-clock-fast (STCKF) instruction is a bit fuzzy.
      It can happen that the value stored on one CPU is smaller than the value
      stored on another CPU, although the order of the stores is the other
      way around. This can cause deltas of get_tod_clock() values to become
      negative when they should not be.
      
      We need to be more careful with store-clock-fast, this patch partially
      reverts git commit e4b7b4238e666682555461fa52eecd74652f36bb "time:
      always use stckf instead of stck if available". The get_tod_clock()
      function now uses the store-clock-extended (STCKE) instruction.
      get_tod_clock_fast() can be used if the fuzziness of store-clock-fast
      is acceptable e.g. for wait loops local to a CPU.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      8c071b0f
  10. 28 8月, 2013 1 次提交
  11. 22 8月, 2013 1 次提交
  12. 02 5月, 2013 1 次提交
  13. 01 5月, 2013 1 次提交
    • S
      Kconfig: consolidate CONFIG_DEBUG_STRICT_USER_COPY_CHECKS · 446f24d1
      Stephen Boyd 提交于
      The help text for this config is duplicated across the x86, parisc, and
      s390 Kconfig.debug files.  Arnd Bergman noted that the help text was
      slightly misleading and should be fixed to state that enabling this
      option isn't a problem when using pre 4.4 gcc.
      
      To simplify the rewording, consolidate the text into lib/Kconfig.debug
      and modify it there to be more explicit about when you should say N to
      this config.
      
      Also, make the text a bit more generic by stating that this option
      enables compile time checks so we can cover architectures which emit
      warnings vs.  ones which emit errors.  The details of how an
      architecture decided to implement the checks isn't as important as the
      concept of compile time checking of copy_from_user() calls.
      
      While we're doing this, remove all the copy_from_user_overflow() code
      that's duplicated many times and place it into lib/ so that any
      architecture supporting this option can get the function for free.
      Signed-off-by: NStephen Boyd <sboyd@codeaurora.org>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Acked-by: NHelge Deller <deller@gmx.de>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      446f24d1
  14. 02 4月, 2013 1 次提交
    • H
      s390/uaccess: fix page table walk · ea81531d
      Heiko Carstens 提交于
      When translating user space addresses to kernel addresses the follow_table()
      function had two bugs:
      
      - PROT_NONE mappings could be read accessed via the kernel mapping. That is
        e.g. putting a filename into a user page, then protecting the page with
        PROT_NONE and afterwards issuing the "open" syscall with a pointer to
        the filename would incorrectly succeed.
      
      - when walking the page tables it used the pgd/pud/pmd/pte primitives which
        with dynamic page tables give no indication which real level of page tables
        is being walked (region2, region3, segment or page table). So in case of an
        exception the translation exception code passed to __handle_fault() is not
        necessarily correct.
        This is not really an issue since __handle_fault() doesn't evaluate the code.
        Only in case of e.g. a SIGBUS this code gets passed to user space. If user
        space can do something sane with the value is a different question though.
      
      To fix these issues don't use any Linux primitives. Only walk the page tables
      like the hardware would do it, however we leave quite some checks away since
      we know that we only have full size page tables and each index is within bounds.
      
      In theory this should fix all issues...
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Reviewed-by: NGerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      ea81531d
  15. 21 3月, 2013 1 次提交
    • H
      s390/uaccess: fix clear_user_pt() · b7fef2dd
      Heiko Carstens 提交于
      The page table walker variant of clear_user() is supposed to copy the
      contents of the empty zero page to user space.
      However since 238ec4ef "[S390] zero page cache synonyms" empty_zero_page
      is not anymore the page itself but contains the pointer to the empty zero
      pages. Therefore the page table walker variant of clear_user() copied
      the address of the first empty zero page and afterwards more or less
      random data to user space instead of clearing the given user space range.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      b7fef2dd
  16. 28 2月, 2013 4 次提交
  17. 14 2月, 2013 2 次提交
    • M
      s390/mm: implement software dirty bits · abf09bed
      Martin Schwidefsky 提交于
      The s390 architecture is unique in respect to dirty page detection,
      it uses the change bit in the per-page storage key to track page
      modifications. All other architectures track dirty bits by means
      of page table entries. This property of s390 has caused numerous
      problems in the past, e.g. see git commit ef5d437f
      "mm: fix XFS oops due to dirty pages without buffers on s390".
      
      To avoid future issues in regard to per-page dirty bits convert
      s390 to a fault based software dirty bit detection mechanism. All
      user page table entries which are marked as clean will be hardware
      read-only, even if the pte is supposed to be writable. A write by
      the user process will trigger a protection fault which will cause
      the user pte to be marked as dirty and the hardware read-only bit
      is removed.
      
      With this change the dirty bit in the storage key is irrelevant
      for Linux as a host, but the storage key is still required for
      KVM guests. The effect is that page_test_and_clear_dirty and the
      related code can be removed. The referenced bit in the storage
      key is still used by the page_test_and_clear_young primitive to
      provide page age information.
      
      For page cache pages of mappings with mapping_cap_account_dirty
      there will not be any change in behavior as the dirty bit tracking
      already uses read-only ptes to control the amount of dirty pages.
      Only for swap cache pages and pages of mappings without
      mapping_cap_account_dirty there can be additional protection faults.
      To avoid an excessive number of additional faults the mk_pte
      primitive checks for PageDirty if the pgprot value allows for writes
      and pre-dirties the pte. That avoids all additional faults for
      tmpfs and shmem pages until these pages are added to the swap cache.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      abf09bed
    • H
      s390/time: rename tod clock access functions · 1aae0560
      Heiko Carstens 提交于
      Fix name clash with some common code device drivers and add "tod"
      to all tod clock access function names.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      1aae0560
  18. 26 10月, 2012 1 次提交
  19. 26 9月, 2012 1 次提交
  20. 17 9月, 2012 1 次提交
    • G
      s390/mm: fix user access page-table walk code · 4db84d4f
      Gerald Schaefer 提交于
      The s390 page-table walk code, used for user copy and futex, currently
      cannot handle huge pages. As far as user copy is concerned, that is
      not really a problem because those functions will only be used on old
      hardware that has no huge page support. But the futex code will also
      use pagetable walk functions on current hardware when user space runs
      in primary space mode. So, if a futex sits in a huge page, the futex
      operation on it will result in a page fault loop or even data
      corruption.
      
      This patch adds the code for resolving huge page mappings in the user
      access pagetable walk code on s390.
      Signed-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      4db84d4f
  21. 20 7月, 2012 2 次提交
    • M
      s390/vtimer: rework virtual timer interface · 27f6b416
      Martin Schwidefsky 提交于
      The current virtual timer interface is inherently per-cpu and hard to
      use. The sole user of the interface is appldata which uses it to execute
      a function after a specific amount of cputime has been used over all cpus.
      
      Rework the virtual timer interface to hook into the cputime accounting.
      This makes the interface independent from the CPU timer interrupts, and
      makes the virtual timers global as opposed to per-cpu.
      Overall the code is greatly simplified. The downside is that the accuracy
      is not as good as the original implementation, but it is still good enough
      for appldata.
      Reviewed-by: NJan Glauber <jang@linux.vnet.ibm.com>
      Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      27f6b416
    • H
      s390/comments: unify copyright messages and remove file names · a53c8fab
      Heiko Carstens 提交于
      Remove the file name from the comment at top of many files. In most
      cases the file name was wrong anyway, so it's rather pointless.
      
      Also unify the IBM copyright statement. We did have a lot of sightly
      different statements and wanted to change them one after another
      whenever a file gets touched. However that never happened. Instead
      people start to take the old/"wrong" statements to use as a template
      for new files.
      So unify all of them in one go.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      a53c8fab
  22. 24 5月, 2012 1 次提交
  23. 11 3月, 2012 2 次提交
    • M
      [S390] rework idle code · 4c1051e3
      Martin Schwidefsky 提交于
      Whenever the cpu loads an enabled wait PSW it will appear as idle to the
      underlying host system. The code in default_idle calls vtime_stop_cpu
      which does the necessary voodoo to get the cpu time accounting right.
      The udelay code just loads an enabled wait PSW. To correct this rework
      the vtime_stop_cpu/vtime_start_cpu logic and move the difficult parts
      to entry[64].S, vtime_stop_cpu can now be called from anywhere and
      vtime_start_cpu is gone. The correction of the cpu time during wakeup
      from an enabled wait PSW is done with a critical section in entry[64].S.
      As vtime_start_cpu is gone, s390_idle_check can be removed as well.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      4c1051e3
    • M
      [S390] rework smp code · 8b646bd7
      Martin Schwidefsky 提交于
      Define struct pcpu and merge some of the NR_CPUS arrays into it, including
      __cpu_logical_map, current_set and smp_cpu_state. Split smp related
      functions to those operating on physical cpus and the functions operating
      on a logical cpu number. Make the functions for physical cpus use a
      pointer to a struct pcpu. This hides the knowledge about cpu addresses in
      smp.c, entry[64].S and swsusp_asm64.S, thus remove the sigp.h header.
      
      The PSW restart mechanism is used to start secondary cpus, calling a
      function on an online cpu, calling a function on the ipl cpu, and for
      the nmi signal. Replace the different assembler functions with a
      single function restart_int_handler. The new entry point calls a function
      whose pointer is stored in the lowcore of the target cpu and it can wait
      for the source cpu to stop. This covers all existing use cases.
      
      Overall the code is now simpler and there are ~380 lines less code.
      Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      8b646bd7
  24. 30 10月, 2011 2 次提交