1. 17 Apr, 2013 17 commits
  2. 15 Apr, 2013 1 commit
    • M
      s390/kdump: Add PM notifier for kdump · b66ac63e
      Michael Holzheu committed
      For s390 the page table mapping for the crashkernel memory is removed to
      protect the pre-loaded kdump kernel and ramdisk. Because the crashkernel
      memory is not included in the page tables for suspend/resume, it is not
      included in the suspend image. Therefore, after resume, the system no
      longer contains the pre-loaded kdump kernel, and kdump fails when it is
      triggered.
      
      This patch adds a PM notifier that creates the page tables before suspend
      is done and removes them for resume. This ensures that the kdump kernel
      is included in the suspend image.
      Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      b66ac63e
  3. 02 Apr, 2013 2 commits
    • H
      s390/mm: provide empty check_pgt_cache() function · 765a0cac
      Heiko Carstens committed
      All architectures need to provide a check_pgt_cache() function. The s390 one
      got lost somewhere.
      So reintroduce it to prevent future compile errors e.g. if Thomas Gleixner's
      idle loop rework patches get merged.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      765a0cac
    • H
      s390/uaccess: fix page table walk · ea81531d
      Heiko Carstens committed
      When translating user space addresses to kernel addresses the follow_table()
      function had two bugs:
      
      - PROT_NONE mappings could be read via the kernel mapping. For example,
        putting a filename into a user page, then protecting the page with
        PROT_NONE and afterwards issuing the "open" syscall with a pointer to
        the filename would incorrectly succeed.
      
      - when walking the page tables it used the pgd/pud/pmd/pte primitives which
        with dynamic page tables give no indication which real level of page tables
        is being walked (region2, region3, segment or page table). So in case of an
        exception the translation exception code passed to __handle_fault() is not
        necessarily correct.
        This is not really an issue since __handle_fault() doesn't evaluate the code.
        Only in case of e.g. a SIGBUS does this code get passed to user space. Whether
        user space can do something sane with the value is a different question though.
      
      To fix these issues, don't use any Linux primitives. Only walk the page tables
      like the hardware would, but omit quite a few checks, since we know that we
      only have full size page tables and each index is within bounds.
      
      In theory this should fix all issues...
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      ea81531d
  4. 21 Mar, 2013 1 commit
    • H
      s390/uaccess: fix clear_user_pt() · b7fef2dd
      Heiko Carstens committed
      The page table walker variant of clear_user() is supposed to copy the
      contents of the empty zero page to user space.
      However, since 238ec4ef "[S390] zero page cache synonyms", empty_zero_page
      is no longer the page itself but contains the pointer to the empty zero
      pages. Therefore the page table walker variant of clear_user() copied
      the address of the first empty zero page and afterwards more or less
      random data to user space instead of clearing the given user space range.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      b7fef2dd
  5. 13 Mar, 2013 1 commit
  6. 11 Mar, 2013 1 commit
  7. 07 Mar, 2013 3 commits
  8. 05 Mar, 2013 3 commits
    • H
      s390/mm: fix flush_tlb_kernel_range() · f6a70a07
      Heiko Carstens committed
      Our flush_tlb_kernel_range() implementation calls __tlb_flush_mm() with
      &init_mm as argument. __tlb_flush_mm() however will only flush tlbs
      for the passed in mm if its mm_cpumask is not empty.
      
      The mm_cpumask of init_mm, however, never has any bits set, which in
      turn means that our flush_tlb_kernel_range() implementation doesn't
      work at all.
      
      This can be easily verified with a vmalloc/vfree loop which allocates
      a page, writes to it and then frees the page again. A crash will follow
      almost instantly.
      
      To fix this remove the cpumask_empty() check in __tlb_flush_mm() since
      there shouldn't be too many mms with a zero mm_cpumask, besides the
      init_mm of course.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      f6a70a07
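      The effect of the removed check can be illustrated with a small user-space
      model. This is only a sketch: `struct toy_mm`, `toy_flush_mm_old()` and
      `toy_flush_mm_new()` are hypothetical names invented here, and the cpumask
      is modelled as a plain bitmask rather than the kernel's cpumask type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an mm: only the fields needed for the illustration. */
struct toy_mm {
    unsigned long cpumask;  /* bit n set: CPU n has used this mm */
    unsigned int flushes;   /* number of flushes performed for this mm */
};

/* Behaviour described as broken: skip the flush when the cpumask is empty. */
bool toy_flush_mm_old(struct toy_mm *mm)
{
    assert(mm != NULL);
    if (mm->cpumask == 0)
        return false;       /* an mm like init_mm would never be flushed */
    mm->flushes++;
    return true;
}

/* Behaviour after the fix as described: flush unconditionally. */
bool toy_flush_mm_new(struct toy_mm *mm)
{
    assert(mm != NULL);
    mm->flushes++;
    return true;
}
```

      With an all-zero cpumask, the old variant returns without flushing, while
      the new one always flushes.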
    • H
      s390/mm: fix vmemmap size calculation · a7bb1ae7
      Heiko Carstens committed
      The size of the vmemmap must be a multiple of PAGES_PER_SECTION, since the
      common code always initializes the vmemmap in such pieces.
      So we must round up in order to not have a too small vmemmap.
      
      Fixes an IPL crash on 31 bit with more than 1920MB.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      a7bb1ae7
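      The rounding described above can be sketched in plain C. This is an
      illustration only: `round_up_pages()` is a hypothetical helper, and
      `EXAMPLE_PAGES_PER_SECTION` is an assumed value, since the real
      PAGES_PER_SECTION depends on the kernel configuration.

```c
#include <assert.h>

/* Assumed value for illustration only; not taken from the s390 sources. */
#define EXAMPLE_PAGES_PER_SECTION 65536UL

/* Round nr_pages up to the next multiple of `multiple` (must be non-zero). */
unsigned long round_up_pages(unsigned long nr_pages, unsigned long multiple)
{
    assert(multiple != 0);
    return ((nr_pages + multiple - 1) / multiple) * multiple;
}
```

      Rounding down instead would leave the last, partially covered section
      without a vmemmap, which matches the "too small vmemmap" problem the
      commit describes.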
    • M
      s390: critical section cleanup vs. machine checks · 6551fbdf
      Martin Schwidefsky committed
      The current machine check code uses the registers stored by the machine
      in the lowcore at __LC_GPREGS_SAVE_AREA as the registers of the interrupted
      context. The registers 0-7 of a user process can get clobbered if a machine
      check interrupts the execution of a critical section in entry[64].S.
      
      The reason is that the critical section cleanup code may need to modify
      the PSW and the registers for the previous context to get to the end of a
      critical section. If registers 0-7 have to be replaced the relevant copy
      will be in the registers, which invalidates the copy in the lowcore. The
      machine check handler needs to explicitly store registers 0-7 to the stack.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      6551fbdf
  9. 04 Mar, 2013 1 commit
    • E
      fs: Limit sys_mount to only request filesystem modules. · 7f78e035
      Eric W. Biederman committed
      Modify the request_module to prefix the file system type with "fs-"
      and add aliases to all of the filesystems that can be built as modules
      to match.
      
      A common practice is to build all of the kernel code and leave code
      that is not commonly needed as modules, with the result that many
      users are exposed to any bug anywhere in the kernel.
      
      Looking for filesystems with a fs- prefix limits the pool of possible
      modules that can be loaded by mount to just filesystems trivially
      making things safer with no real cost.
      
      Using aliases means user space can control the policy of which
      filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
      with blacklist and alias directives. This allows simple, safe,
      well understood work-arounds to known problematic software.
      
      This also addresses a rare but unfortunate problem where the filesystem
      name is not the same as its module name and module auto-loading
      would not work. While writing this patch I saw a handful of such
      cases. The most significant is autofs, which lives in the module
      autofs4.
      
      This is relevant to user namespaces because we can reach the request
      module in get_fs_type() without having any special permissions, and
      people get uncomfortable when a user specified string (in this case
      the filesystem type) goes all of the way to request_module.
      
      After having looked at this issue I don't think there is any
      particular reason to perform any filtering or permission checks beyond
      making it clear in the module request that we want a filesystem
      module.  The common pattern in the kernel is to call request_module()
      without regard to the user's permissions.  In general all a filesystem
      module does once loaded is call register_filesystem() and go to sleep.
      This means there is not much attack surface exposed by loading a
      filesystem module unless the filesystem is mounted.  In a user
      namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
      which most filesystems do not set today.
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Reported-by: Kees Cook <keescook@google.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      7f78e035
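      The naming scheme described in this commit (prefix the filesystem type
      with "fs-" before asking for a module) can be modelled in user space.
      This is a sketch, not the kernel code: `fs_module_alias()` is a
      hypothetical helper that only builds the alias string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the alias that would be requested for a filesystem type:
 * "fs-" followed by the type name. Returns 0 on success and -1 if the
 * result does not fit into buf (hypothetical convention for this sketch).
 */
int fs_module_alias(char *buf, size_t len, const char *fstype)
{
    int n;

    assert(buf != NULL && fstype != NULL);
    n = snprintf(buf, len, "fs-%s", fstype);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

      Because every request carries the "fs-" prefix, a user-supplied type
      string can only ever match modules that declare a matching filesystem
      alias, which is the restriction the commit message argues for.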
  10. 28 Feb, 2013 10 commits
    • H
      s390/module: fix compile warning · 72a6b43e
      Heiko Carstens committed
      Get rid of this one (false positive):
      arch/s390/kernel/module.c: In function ‘apply_relocate_add’:
      arch/s390/kernel/module.c:404:5: warning: ‘rc’ may be used uninitialized
                                       in this function [-Wmaybe-uninitialized]
      arch/s390/kernel/module.c:225:6: note: ‘rc’ was declared here
      
      Play safe and preinitialize rc with an error value, so we see an error
      if new users indeed don't initialize it.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      72a6b43e
    • H
      s390/uaccess: fix kernel ds access for page table walk · 066c4373
      Heiko Carstens committed
      When the kernel resides in home space and the mvcos instruction is not
      available uaccesses for kernel ds happen via simple strnlen() or memcpy()
      calls.
      This however can break badly, since uaccesses in kernel space may fail as
      well, especially if CONFIG_DEBUG_PAGEALLOC is turned on.
      
      To fix this implement strnlen_kernel() and copy_in_kernel() functions
      which can only be used by the page table uaccess functions. These two
      functions detect invalid memory accesses and return the correct length
      of processed data. Both functions are more or less a copy of the std
      variants without sacf calls.
      
      Fixes ipl crashes on 31 bit machines as well as on 64 bit machines without
      mvcos. Caused by changing the default address space of the kernel to be
      home space.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      066c4373
    • H
      s390/uaccess: fix strncpy_from_user string length check · 225cf8d6
      Heiko Carstens committed
      The "standard" and page table walk variants of strncpy_from_user() first
      check the length of the to be copied string in userspace.
      The string is then copied to kernel space and the length returned to the
      caller.
      However userspace can modify the string at any time while the kernel
      checks for the length of the string or copies the string. As a result the
      returned length of the string is not necessarily correct.
      Fix this by copying in a loop which mimics the mvcos variant of
      strncpy_from_user(), which handles this correctly.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      225cf8d6
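      The idea of deriving the length from the bytes actually copied, instead
      of from a separate earlier length check, can be sketched as follows. This
      is a simplified byte-wise user-space model, not the s390 code:
      `copy_string_single_pass()` is a hypothetical name, and the return
      convention (string length without the NUL, or `count` if no NUL was seen)
      is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy at most count bytes from src to dst, stopping after the NUL byte.
 * Every source byte is read exactly once, so the returned length always
 * describes what ended up in dst, even if src changes concurrently.
 */
long copy_string_single_pass(char *dst, const volatile char *src, long count)
{
    long done;

    assert(dst != NULL && src != NULL);
    for (done = 0; done < count; done++) {
        char c = src[done];     /* the only read of this source byte */

        dst[done] = c;
        if (c == '\0')
            return done;        /* length excluding the terminating NUL */
    }
    return count;               /* no NUL found within count bytes */
}
```

      A check-then-copy scheme reads the source twice, so a concurrent writer
      can make the reported length disagree with the copied data. A single
      pass avoids that by construction.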
    • S
      s390/dis: Fix invalid array size · db7760ad
      Syam Sidhardhan committed
      We are using sizeof operator for an array given as function argument,
      which is incorrect.
      Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      db7760ad
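      The pitfall named in this commit is general C behaviour: an array
      parameter is adjusted to a pointer, so `sizeof` inside the callee yields
      the pointer size. The sketch below uses hypothetical function names and
      is not taken from the s390 disassembler code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * A parameter written as "char buf[16]" is adjusted to "char *buf", so
 * sizeof(buf) in the callee is the size of a pointer, not of the array.
 * The parameter is spelled as a pointer here to make that explicit.
 */
size_t size_seen_in_callee(char *buf)
{
    return sizeof(buf);
}

/* The usual remedy: pass the array size as an explicit argument. */
size_t size_passed_explicitly(char *buf, size_t len)
{
    (void)buf;
    return len;
}
```

      Only the caller, where the array type is still visible, can compute the
      real size with `sizeof`.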
    • H
      s390/uaccess: remove pointless access_ok() checks · d12a2970
      Heiko Carstens committed
      access_ok() always returns 'true' on s390. Therefore all calls
      are quite pointless and can be removed.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      d12a2970
    • H
      s390/uaccess: fix strncpy_from_user/strnlen_user zero maxlen case · f45655f6
      Heiko Carstens committed
      If the maximum length specified for the to be accessed string for
      strncpy_from_user() and strnlen_user() is zero the following incorrect
      values would be returned or incorrect memory accesses would happen:
      
      strnlen_user_std() and strnlen_user_pt() incorrectly return "1"
      strncpy_from_user_pt() would incorrectly access "dst[maxlen - 1]"
      strncpy_from_user_mvcos() would incorrectly return "-EFAULT"
      
      Fix all these oddities by adding early checks.
      Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      f45655f6
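      An early check for a zero maximum length might look like the following
      user-space sketch. `toy_strnlen_user()` is a hypothetical stand-in, and
      its return convention (length including the NUL, 0 for a zero maximum, a
      value larger than `count` if no NUL is found) is an assumption of this
      sketch rather than something stated in the commit.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the string length including the terminating NUL, looking at no
 * more than count bytes. With count == 0 nothing may be accessed, so the
 * function returns 0 before touching src at all.
 */
size_t toy_strnlen_user(const char *src, size_t count)
{
    size_t len;

    if (count == 0)             /* early check for the zero maxlen case */
        return 0;
    assert(src != NULL);
    for (len = 0; len < count; len++) {
        if (src[len] == '\0')
            return len + 1;     /* length including the NUL byte */
    }
    return count + 1;           /* assumed convention: larger than count */
}
```

      Handling the zero case first also rules out index expressions such as
      `dst[maxlen - 1]` being evaluated with `maxlen == 0`.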
    • H
      s390/uaccess: shorten strncpy_from_user/strnlen_user · d7b788cd
      Heiko Carstens committed
      Always stay within page boundaries when copying from user within
      strlen_user_mvcos()/strncpy_from_user_mvcos(). This allows us to
      shorten the code a bit and may prevent unnecessary faults, since
      we copy quite large amounts of memory to kernel space.
      
      Also directly call the mvcos variants of copy_from_user() to
      avoid indirect branches.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      d7b788cd
    • H
      s390/mm: ignore change bit for vmemmap · 17ea345a
      Heiko Carstens committed
      Add hint to the page tables that we don't care about the change bit
      in storage keys that belong to vmemmap pages.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      17ea345a
    • H
    • S
      hlist: drop the node parameter from iterators · b67bfe0d
      Sasha Levin committed
      I'm not sure why, but the hlist for each entry iterators were conceived
      differently from the list ones. The list iterators take:
      
              list_for_each_entry(pos, head, member)
      
      The hlist ones were greedy and wanted an extra parameter:
      
              hlist_for_each_entry(tpos, pos, head, member)
      
      Why did they need an extra pos parameter? I'm not quite sure. Not only
      do they not really need it, it also prevents the iterator from looking
      exactly like the list iterator, which is unfortunate.
      
      Besides the semantic patch, there was some manual work required:
      
       - Fix up the actual hlist iterators in linux/list.h
       - Fix up the declaration of other iterators based on the hlist ones.
       - A very small number of places were using the 'node' parameter; these
       were modified to use 'obj->member' instead.
       - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
       properly, so those had to be fixed up manually.
      
      The semantic patch which is mostly the work of Peter Senna Tschudin is here:
      
      @@
      iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
      
      type T;
      expression a,c,d,e;
      identifier b;
      statement S;
      @@
      
      -T b;
          <+... when != b
      (
      hlist_for_each_entry(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue(a,
      - b,
      c) S
      |
      hlist_for_each_entry_from(a,
      - b,
      c) S
      |
      hlist_for_each_entry_rcu(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_rcu_bh(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue_rcu_bh(a,
      - b,
      c) S
      |
      for_each_busy_worker(a, c,
      - b,
      d) S
      |
      ax25_uid_for_each(a,
      - b,
      c) S
      |
      ax25_for_each(a,
      - b,
      c) S
      |
      inet_bind_bucket_for_each(a,
      - b,
      c) S
      |
      sctp_for_each_hentry(a,
      - b,
      c) S
      |
      sk_for_each(a,
      - b,
      c) S
      |
      sk_for_each_rcu(a,
      - b,
      c) S
      |
      sk_for_each_from
      -(a, b)
      +(a)
      S
      + sk_for_each_from(a) S
      |
      sk_for_each_safe(a,
      - b,
      c, d) S
      |
      sk_for_each_bound(a,
      - b,
      c) S
      |
      hlist_for_each_entry_safe(a,
      - b,
      c, d, e) S
      |
      hlist_for_each_entry_continue_rcu(a,
      - b,
      c) S
      |
      nr_neigh_for_each(a,
      - b,
      c) S
      |
      nr_neigh_for_each_safe(a,
      - b,
      c, d) S
      |
      nr_node_for_each(a,
      - b,
      c) S
      |
      nr_node_for_each_safe(a,
      - b,
      c, d) S
      |
      - for_each_gfn_sp(a, c, d, b) S
      + for_each_gfn_sp(a, c, d) S
      |
      - for_each_gfn_indirect_valid_sp(a, c, d, b) S
      + for_each_gfn_indirect_valid_sp(a, c, d) S
      |
      for_each_host(a,
      - b,
      c) S
      |
      for_each_host_safe(a,
      - b,
      c, d) S
      |
      for_each_mesh_entry(a,
      - b,
      c, d) S
      )
          ...+>
      
      [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
      [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foudnation.org: redo intrusive kvm changes]
      Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b67bfe0d
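      How an hlist iterator can work without a separate node cursor is shown
      in the user-space sketch below. It is a simplified model, not the code
      from linux/list.h: the `toy_` macros are hypothetical, the node has only
      a `next` pointer, and `__typeof__` is a GNU C extension.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next; };
struct hlist_head { struct hlist_node *first; };

/* Recover the containing structure from a pointer to its member. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Map a node pointer to its entry, or NULL at the end of the list. */
#define toy_entry_safe(ptr, type, member) \
    ((ptr) ? toy_container_of(ptr, type, member) : NULL)

/* Three-argument iterator: the entry pointer itself is the only cursor. */
#define toy_hlist_for_each_entry(pos, head, member)                       \
    for (pos = toy_entry_safe((head)->first, __typeof__(*(pos)), member); \
         pos;                                                             \
         pos = toy_entry_safe((pos)->member.next, __typeof__(*(pos)), member))

struct item {
    int value;
    struct hlist_node link;
};

/* Example user: sum the values of all items on the list. */
int sum_items(struct hlist_head *head)
{
    struct item *it;
    int sum = 0;

    assert(head != NULL);
    toy_hlist_for_each_entry(it, head, link)
        sum += it->value;
    return sum;
}
```

      The next node is always reachable through `pos->member.next`, so callers
      no longer have to declare a `struct hlist_node *` just to drive the loop.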