  1. 30 Jul 2016 · 1 commit
  2. 29 Jul 2016 · 10 commits
    • x86/power/64: Fix hibernation return address corruption · 4ce827b4
      Josh Poimboeuf committed
      In kernel bug 150021, a kernel panic was reported when restoring a
      hibernate image.  Only a picture of the oops was reported, so I can't
      paste the whole thing here.  But here are the most interesting parts:
      
        kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
        BUG: unable to handle kernel paging request at ffff8804615cfd78
        ...
        RIP: ffff8804615cfd78
        RSP: ffff8804615f0000
        RBP: ffff8804615cfdc0
        ...
        Call Trace:
         do_signal+0x23
         exit_to_usermode_loop+0x64
         ...
      
      The RIP is on the same page as RBP, so it apparently started executing
      on the stack.
      
      The bug was bisected to commit ef0f3ed5 (x86/asm/power: Create
      stack frames in hibernate_asm_64.S), which in retrospect seems quite
      dangerous, since that code saves and restores the stack pointer from a
      global variable ('saved_context').
      
      There are a lot of moving parts in the hibernate save and restore paths,
      so I don't know exactly what caused the panic.  Presumably, a FRAME_END
      was executed without the corresponding FRAME_BEGIN, or vice versa.  That
      would corrupt the return address on the stack and would be consistent
      with the details of the above panic.
      
      [ rjw: One major problem is that by the time the FRAME_BEGIN in
        restore_registers() is executed, the stack pointer value may not
        be valid any more.  Namely, the stack area pointed to by it
        previously may have been overwritten by some image memory contents
        and that page frame may now be used for whatever different purpose
        it had been allocated for before hibernation.  In that case, the
        FRAME_BEGIN will corrupt that memory. ]
      
      Instead of doing the frame pointer save/restore around the bounds of the
      affected functions, just do it around the call to swsusp_save().
      
      That has the same effect of ensuring that if swsusp_save() sleeps, the
      frame pointers will be correct.  It's also a much more obviously safe
      way to do it than the original patch.  And objtool still doesn't report
      any warnings.
      
      Fixes: ef0f3ed5 (x86/asm/power: Create stack frames in hibernate_asm_64.S)
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=150021
      Cc: 4.6+ <stable@vger.kernel.org> # 4.6+
      Reported-by: Andre Reinke <andre.reinke@mailbox.org>
      Tested-by: Andre Reinke <andre.reinke@mailbox.org>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • avr32: off by one in at32_init_pio() · 55f1cf83
      Dan Carpenter committed
      The pio_dev[] array has MAX_NR_PIO_DEVICES elements so the > should be
      >=.
      
      Fixes: 5f97f7f9 ('[PATCH] avr32 architecture')
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
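The off-by-one described in the avr32 commit above can be illustrated with a minimal sketch. The array size, the `struct pio_device` layout, and the `pio_lookup()` helper are hypothetical stand-ins, not the actual at32_init_pio() code; only the `>` versus `>=` comparison comes from the commit message.

```c
#include <stddef.h>

/* Illustrative size only; not taken from the avr32 sources. */
#define MAX_NR_PIO_DEVICES 8

/* Hypothetical stand-in for the real device structure. */
struct pio_device {
    int id;
};

static struct pio_device pio_dev[MAX_NR_PIO_DEVICES];

/*
 * Return the slot for a device index, or NULL if the index is out of range.
 * Valid indices are 0 .. MAX_NR_PIO_DEVICES - 1, so the rejection test must
 * use ">=". With ">" the value MAX_NR_PIO_DEVICES itself would slip through
 * and address one element past the end of the array.
 */
static struct pio_device *pio_lookup(unsigned int index)
{
    if (index >= MAX_NR_PIO_DEVICES)
        return NULL;
    return &pio_dev[index];
}
```

The last valid index is `MAX_NR_PIO_DEVICES - 1`; the index equal to the array size is the one the original `>` check failed to reject.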
    • avr32: fixup code style in unistd.h and syscall_table.S · 6ad4a21b
      Hans-Christian Noren Egtvedt committed
      This patch replaces the mix of tabs and spaces used to align comments
      after code with spaces only.
      
      Also document why recvmmsg was defined twice in the syscall_table.S
      table, but only once in unistd.h. In short, it was wired into the table
      by a generic arch patch, but forgotten in unistd.h (a review slip).
    • avr32: wire up preadv2 and pwritev2 syscalls · 389ce5a9
      Hans-Christian Noren Egtvedt committed
      This patch wires up the new preadv2 and pwritev2 syscalls on AVR32.
      
      On AVR32, all parameters beyond the 5th are passed on the stack. System
      calls don't use the stack -- they borrow a callee-saved register
      instead. This means that syscalls that take 6 parameters must be called
      through a stub that pushes the last parameter on the stack.
      
      Signed-off-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
    • sparc64 mm: Fix base TSB sizing when hugetlb pages are used · af1b1a9b
      Mike Kravetz committed
      do_sparc64_fault() calculates both the base and huge page RSS sizes and
      uses this information in calls to tsb_grow().  The calculation for base
      page TSB size is not correct if the task uses hugetlb pages.  hugetlb
      pages are not accounted for in RSS, therefore the call to get_mm_rss(mm)
      does not include hugetlb pages.  However, the number of pages based on
      huge_pte_count (which does include hugetlb pages) is subtracted from
      this value.  This will result in an artificially small and often negative
      RSS calculation.  The base TSB size is then often set to max_tsb_size
      as the passed RSS is unsigned, so a negative value looks really big.
      
      THP pages are also accounted for in huge_pte_count, and THP pages are
      accounted for in RSS so the calculation in do_sparc64_fault() is correct
      if a task only uses THP pages.
      
      A single huge_pte_count is not sufficient for TSB sizing if both hugetlb
      and THP pages can be used.  Instead of a single counter, use two:  one
      for hugetlb and one for THP.
      
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • arm64:acpi: fix the acpi alignment exception when 'mem=' specified · cb0a6502
      Dennis Chen committed
      When booting an ACPI enabled kernel with 'mem=x', there is the
      possibility that ACPI data regions from the firmware will lie above the
      memory limit.  Ordinarily these will be removed by
      memblock_enforce_memory_limit(.).
      
      Unfortunately, this means that these regions will then be mapped by
      acpi_os_ioremap(.) as device memory (instead of normal memory), so
      unaligned accesses will then provoke alignment faults.
      
      In this patch we adopt memblock_mem_limit_remove_map instead, and this
      preserves these ACPI data regions (marked NOMAP) thus ensuring that
      these regions are not mapped as device memory.
      
      For example, below is an alignment exception observed on ARM platform
      when booting the kernel with 'acpi=on mem=8G':
      
        ...
        Unable to handle kernel paging request at virtual address ffff0000080521e7
        pgd = ffff000008aa0000
        [ffff0000080521e7] *pgd=000000801fffe003, *pud=000000801fffd003, *pmd=000000801fffc003, *pte=00e80083ff1c1707
        Internal error: Oops: 96000021 [#1] PREEMPT SMP
        Modules linked in:
        CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc3-next-20160616+ #172
        Hardware name: AMD Overdrive/Supercharger/Default string, BIOS ROD1001A 02/09/2016
        task: ffff800001ef0000 ti: ffff800001ef8000 task.ti: ffff800001ef8000
        PC is at acpi_ns_lookup+0x520/0x734
        LR is at acpi_ns_lookup+0x4a4/0x734
        pc : [<ffff0000083b8b10>] lr : [<ffff0000083b8a94>] pstate: 60000045
        sp : ffff800001efb8b0
        x29: ffff800001efb8c0 x28: 000000000000001b
        x27: 0000000000000001 x26: 0000000000000000
        x25: ffff800001efb9e8 x24: ffff000008a10000
        x23: 0000000000000001 x22: 0000000000000001
        x21: ffff000008724000 x20: 000000000000001b
        x19: ffff0000080521e7 x18: 000000000000000d
        x17: 00000000000038ff x16: 0000000000000002
        x15: 0000000000000007 x14: 0000000000007fff
        x13: ffffff0000000000 x12: 0000000000000018
        x11: 000000001fffd200 x10: 00000000ffffff76
        x9 : 000000000000005f x8 : ffff000008725fa8
        x7 : ffff000008a8df70 x6 : ffff000008a8df70
        x5 : ffff000008a8d000 x4 : 0000000000000010
        x3 : 0000000000000010 x2 : 000000000000000c
        x1 : 0000000000000006 x0 : 0000000000000000
        ...
          acpi_ns_lookup+0x520/0x734
          acpi_ds_load1_begin_op+0x174/0x4fc
          acpi_ps_build_named_op+0xf8/0x220
          acpi_ps_create_op+0x208/0x33c
          acpi_ps_parse_loop+0x204/0x838
          acpi_ps_parse_aml+0x1bc/0x42c
          acpi_ns_one_complete_parse+0x1e8/0x22c
          acpi_ns_parse_table+0x8c/0x128
          acpi_ns_load_table+0xc0/0x1e8
          acpi_tb_load_namespace+0xf8/0x2e8
          acpi_load_tables+0x7c/0x110
          acpi_init+0x90/0x2c0
          do_one_initcall+0x38/0x12c
          kernel_init_freeable+0x148/0x1ec
          kernel_init+0x10/0xec
          ret_from_fork+0x10/0x40
        Code: b9009fbc 2a00037b 36380057 3219037b (b9400260)
        ---[ end trace 03381e5eb0a24de4 ]---
        Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
      
      With 'efi=debug', we can see those ACPI regions loaded by firmware on
      that board as:
      
        efi:   0x0083ff185000-0x0083ff1b4fff [Reserved           |   |  |  |  |  |  |  |   |WB|WT|WC|UC]*
        efi:   0x0083ff1b5000-0x0083ff1c2fff [ACPI Reclaim Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]*
        efi:   0x0083ff223000-0x0083ff224fff [ACPI Memory NVS    |   |  |  |  |  |  |  |   |WB|WT|WC|UC]*
      
      Link: http://lkml.kernel.org/r/1468475036-5852-3-git-send-email-dennis.chen@arm.com
      Acked-by: Steve Capper <steve.capper@arm.com>
      Signed-off-by: Dennis Chen <dennis.chen@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Kaly Xin <kaly.xin@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: move most file-based accounting to the node · 11fb9989
      Mel Gorman committed
      There are now a number of accounting oddities such as mapped file pages
      being accounted for on the node while the total number of file pages is
      accounted on the zone.  This can be coped with to some extent but it's
      confusing so this patch moves the relevant file-based accounting to the
      node.  Due to throttling logic in the page allocator for reliable OOM
      detection, it is still necessary to track dirty and writeback pages on a
      per-zone basis.
      
      [mgorman@techsingularity.net: fix NR_ZONE_WRITE_PENDING accounting]
        Link: http://lkml.kernel.org/r/1468404004-5085-5-git-send-email-mgorman@techsingularity.net
      Link: http://lkml.kernel.org/r/1467970510-21195-20-git-send-email-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@surriel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: move page mapped accounting to the node · 50658e2e
      Mel Gorman committed
      Reclaim makes decisions based on the number of pages that are mapped but
      it's mixing node and zone information.  Account NR_FILE_MAPPED and
      NR_ANON_PAGES pages on the node.
      
      Link: http://lkml.kernel.org/r/1467970510-21195-18-git-send-email-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@surriel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, vmscan: move LRU lists to node · 599d0c95
      Mel Gorman committed
      This moves the LRU lists from the zone to the node and related data such
      as counters, tracing, congestion tracking and writeback tracking.
      
      Unfortunately, due to reclaim and compaction retry logic, it is
      necessary to account for the number of LRU pages on both the zone and
      the node.  Most reclaim logic is based on the node counters but the retry
      logic uses the zone counters which do not distinguish inactive and
      active sizes.  It would be possible to leave the LRU counters on a
      per-zone basis but it's a heavier calculation across multiple cache
      lines that is much more frequent than the retry checks.
      
      Other than the LRU counters, this is mostly a mechanical patch but note
      that it introduces a number of anomalies.  For example, the scans are
      per-zone but using per-node counters.  We also mark a node as congested
      when a zone is congested.  This causes weird problems that are fixed
      later but is easier to review.
      
      In the event that there is excessive overhead on 32-bit systems due to
      the nodes being on LRU, there are two potential solutions:
      
      1. Long-term isolation of highmem pages when reclaim is lowmem
      
         When pages are skipped, they are immediately added back onto the LRU
         list. If lowmem reclaim persisted for long periods of time, the same
         highmem pages get continually scanned. The idea would be that lowmem
         keeps those pages on a separate list until a reclaim for highmem pages
         arrives that splices the highmem pages back onto the LRU. It potentially
         could be implemented similar to the UNEVICTABLE list.
      
         That would reduce the skip rate, with the potential corner case that
         highmem pages have to be scanned and reclaimed to free lowmem slab pages.
      
      2. Linear scan lowmem pages if the initial LRU shrink fails
      
         This will break LRU ordering but may be preferable and faster during
         memory pressure than skipping LRU pages.
      
      Link: http://lkml.kernel.org/r/1467970510-21195-4-git-send-email-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@surriel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ARC: mm: don't loose PTE_SPECIAL in pte_modify() · 3925a16a
      Vineet Gupta committed
      LTP madvise05 was generating an mm splat:
      
      | [ARCLinux]# /sd/ltp/testcases/bin/madvise05
      | BUG: Bad page map in process madvise05  pte:80e08211 pmd:9f7d4000
      | page:9fdcfc90 count:1 mapcount:-1 mapping:  (null) index:0x0 flags: 0x404(referenced|reserved)
      | page dumped because: bad pte
      | addr:200b8000 vm_flags:00000070 anon_vma:  (null) mapping:  (null) index:1005c
      | file:  (null) fault:  (null) mmap:  (null) readpage:  (null)
      | CPU: 2 PID: 6707 Comm: madvise05
      
      And for newer kernels, the system was rendered unusable afterwards.
      
      The problem was mprotect->pte_modify() clearing PTE_SPECIAL (which is
      set to identify the special zero page wired to the pte).
      When the pte was finally unmapped, the special casing for the zero page
      was not done, and instead it was treated as a "normal" page, tripping on
      the map counts etc.
      
      This fixes ARC STAR 9001053308
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
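The ARC commit above describes pte_modify() dropping PTE_SPECIAL when protections change. The sketch below models that failure with a preserve mask; the commit message does not show the diff, so the bit positions, the `PTE_*` names, and both helper functions are hypothetical, and keeping the special bit in the preserved mask is an assumption about how such a fix would typically look.

```c
#include <stdint.h>

typedef uint32_t pteval_t;

/* Hypothetical bit layout, chosen only for this sketch. */
#define PTE_PFN_MASK  0xfffff000u /* page frame number bits */
#define PTE_ACCESSED  0x001u
#define PTE_DIRTY     0x002u
#define PTE_SPECIAL   0x004u      /* marks e.g. the special zero page */
#define PTE_READ      0x010u
#define PTE_WRITE     0x020u

/* Bits that survive a protection change, without and with PTE_SPECIAL. */
#define PTE_CHG_MASK_OLD (PTE_PFN_MASK | PTE_ACCESSED | PTE_DIRTY)
#define PTE_CHG_MASK_NEW (PTE_CHG_MASK_OLD | PTE_SPECIAL)

/* Model of the problem: the special bit is not in the preserved mask. */
static pteval_t pte_modify_losing_special(pteval_t pte, pteval_t newprot)
{
    return (pte & PTE_CHG_MASK_OLD) | newprot;
}

/* Model of the fix: the special bit is carried over to the new pte. */
static pteval_t pte_modify_keeping_special(pteval_t pte, pteval_t newprot)
{
    return (pte & PTE_CHG_MASK_NEW) | newprot;
}
```

In this model, a pte marked special that goes through the first helper comes out looking like an ordinary mapping, which is the state that later trips the map-count checks on unmap.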
  3. 28 Jul 2016 · 4 commits
  4. 27 Jul 2016 · 14 commits
  5. 26 Jul 2016 · 9 commits
  6. 25 Jul 2016 · 2 commits