1. 21 Aug 2013, 1 commit
• xen/pvhvm: Initialize xen panic handler for PVHVM guests · 669b0ae9
  Committed by Vaughan Cao
The kernel uses the callbacks linked into panic_notifier_list to notify
others when a panic happens:
      NORET_TYPE void panic(const char * fmt, ...){
          ...
          atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
      }
      When Xen becomes aware of this, it will call xen_reboot(SHUTDOWN_crash) to
      send out an event with reason code - SHUTDOWN_crash.
      
xen_panic_handler_init() is defined to register on panic_notifier_list, but
we only call it in xen_arch_setup(), which is only called for PV guests; this
patch adds the registration for PVHVM as well.
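
For reference, a minimal sketch of the registration pattern (the handler and
names below are illustrative, not the actual Xen code; the real registration
happens inside xen_panic_handler_init()):

#include <linux/kernel.h>
#include <linux/notifier.h>

/* Illustrative handler: runs on the panic notifier chain. */
static int example_panic_event(struct notifier_block *nb,
                               unsigned long event, void *ptr)
{
        /* e.g. tell the hypervisor the guest has crashed */
        return NOTIFY_DONE;
}

static struct notifier_block example_panic_block = {
        .notifier_call = example_panic_event,
};

static void __init example_panic_handler_init(void)
{
        atomic_notifier_chain_register(&panic_notifier_list,
                                       &example_panic_block);
}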
      
Without this patch, setting 'on_crash=coredump-restart' in a PVHVM guest
config file won't lead to a vmcore being generated when the guest panics.
It can be reproduced with 'echo c > /proc/sysrq-trigger'.
Signed-off-by: Vaughan Cao <vaughan.cao@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Joe Jin <joe.jin@oracle.com>
2. 20 Aug 2013, 3 commits
• xen/m2p: use GNTTABOP_unmap_and_replace to reinstate the original mapping · ee072640
  Committed by Stefano Stabellini
GNTTABOP_unmap_grant_ref unmaps a grant and replaces it with a 0
mapping instead of reinstating the original mapping.
Reinstating the original mapping in a separate step would be racy.
      
      To unmap a grant and reinstate the original mapping atomically we use
      GNTTABOP_unmap_and_replace.
GNTTABOP_unmap_and_replace doesn't work with GNTMAP_contains_pte, so
don't use it for kmaps.  GNTTABOP_unmap_and_replace zeroes the mapping
passed in new_addr, so we have to reinstate it; however, that is a
per-cpu mapping only used for balloon scratch pages, so we can be sure that
it's not going to be accessed while the mapping is not valid.
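
A sketch of how the op is issued, assuming the standard fields of struct
gnttab_unmap_and_replace from xen/interface/grant_table.h (the wrapper
function name is illustrative; error handling trimmed):

#include <linux/errno.h>
#include <xen/interface/grant_table.h>
#include <asm/xen/hypercall.h>

/* Sketch: atomically unmap the grant mapped at 'addr' and move the
 * mapping at 'new_addr' into its place (Xen zeroes 'new_addr'). */
static int example_unmap_and_replace(unsigned long addr,
                                     unsigned long new_addr,
                                     grant_handle_t handle)
{
        struct gnttab_unmap_and_replace op;

        op.host_addr = addr;
        op.new_addr  = new_addr;
        op.handle    = handle;  /* from the original map operation */

        if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_and_replace,
                                      &op, 1))
                return -EFAULT;

        return op.status == GNTST_okay ? 0 : -EINVAL;
}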
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      CC: alex@alex.org.uk
      CC: dcrisan@flexiant.com
      
      [v1: Konrad fixed up the conflicts]
      Conflicts:
      	arch/x86/xen/p2m.c
• x86/xen: during early setup, only 1:1 map the ISA region · e201bfcc
  Committed by David Vrabel
During early setup, when the reserved regions and MMIO holes are being
set up as 1:1 in the p2m, clear any mappings instead of making them 1:1
(except for the ISA region, which is expected to be mapped).
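
A sketch of the intended behaviour, assuming the existing Xen p2m helpers
set_phys_to_machine() and INVALID_P2M_ENTRY (the function name and the ISA
bounds below are illustrative):

#include <linux/pfn.h>
#include <asm/xen/page.h>  /* set_phys_to_machine(), INVALID_P2M_ENTRY */

#define EX_ISA_START 0x0a0000UL /* illustrative ISA hole bounds */
#define EX_ISA_END   0x100000UL

/* Sketch: walk a non-RAM e820 range; identity-map only the ISA hole
 * and clear everything else so unusable regions are never mapped. */
static void __init example_set_identity_or_clear(unsigned long start_pfn,
                                                 unsigned long end_pfn)
{
        unsigned long pfn;

        for (pfn = start_pfn; pfn < end_pfn; pfn++) {
                if (pfn >= PFN_DOWN(EX_ISA_START) &&
                    pfn < PFN_UP(EX_ISA_END))
                        set_phys_to_machine(pfn, pfn);  /* 1:1 */
                else
                        set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
        }
}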
      
      This fixes a regression introduced in 3.5 by 83d51ab4 (xen/setup:
      update VA mapping when releasing memory during setup) which caused
      hosts with tboot to fail to boot.
      
tboot marks a region in the e820 map as unusable, the dom0 kernel
would attempt to map this region, and Xen does not permit unusable
regions to be mapped by guests.
      
      (XEN)  0000000000000000 - 0000000000060000 (usable)
      (XEN)  0000000000060000 - 0000000000068000 (reserved)
      (XEN)  0000000000068000 - 000000000009e000 (usable)
      (XEN)  0000000000100000 - 0000000000800000 (usable)
      (XEN)  0000000000800000 - 0000000000972000 (unusable)
      
      tboot marked this region as unusable.
      
      (XEN)  0000000000972000 - 00000000cf200000 (usable)
      (XEN)  00000000cf200000 - 00000000cf38f000 (reserved)
      (XEN)  00000000cf38f000 - 00000000cf3ce000 (ACPI data)
      (XEN)  00000000cf3ce000 - 00000000d0000000 (reserved)
      (XEN)  00000000e0000000 - 00000000f0000000 (reserved)
      (XEN)  00000000fe000000 - 0000000100000000 (reserved)
      (XEN)  0000000100000000 - 0000000630000000 (usable)
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
• x86/xen: disable preemption when enabling local irqs · fb58e300
  Committed by David Vrabel
      If CONFIG_PREEMPT is enabled then xen_enable_irq() (and
      xen_restore_fl()) could be preempted and rescheduled on a different
      VCPU in between the clear of the mask and the check for pending
      events.  This may result in events being lost as the upcall will check
      for pending events on the wrong VCPU.
      
      Fix this by disabling preemption around the unmask and check for
      events.
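
Sketched, the fixed fast path looks like this (modeled on xen_irq_enable()
in arch/x86/xen/irq.c; xen_vcpu and xen_force_evtchn_callback() are the
kernel's existing Xen plumbing, the function name here is illustrative):

/* Sketch of xen_irq_enable() with the preemption fix in place. */
static void example_xen_irq_enable(void)
{
        struct vcpu_info *vcpu;

        /* Stay on this VCPU between clearing the mask and checking for
         * pending events; otherwise the check may run on another VCPU. */
        preempt_disable();

        vcpu = this_cpu_read(xen_vcpu);
        vcpu->evtchn_upcall_mask = 0;           /* unmask events */

        barrier();      /* unmask must be visible before the check */

        if (unlikely(vcpu->evtchn_upcall_pending))
                xen_force_evtchn_callback();

        preempt_enable();
}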
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
3. 09 Aug 2013, 2 commits
• xen/p2m: avoid unnecessary TLB flush in m2p_remove_override() · 65a45fa2
  Committed by David Vrabel
In m2p_remove_override(), when removing the grant map from the kernel
mapping and replacing it with a mapping to the original page, the grant
unmap will already have flushed the TLB, so it is not necessary to flush
again after updating the mapping.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
• xen: Support 64-bit PV guest receiving NMIs · 6efa20e4
  Committed by Konrad Rzeszutek Wilk
This is based on a patch that Zhenzhong Duan had sent, which
was missing some of the remaining pieces. The kernel has the
logic to handle Xen-type exceptions using the paravirt interface
in the assembler code (see PARAVIRT_ADJUST_EXCEPTION_FRAME -
pv_irq_ops.adjust_exception_frame and INTERRUPT_RETURN -
pv_cpu_ops.iret).
      
      That means the nmi handler (and other exception handlers) use
      the hypervisor iret.
      
The other changes necessary for this are to translate the
NMI_VECTOR to one of the entries on the ipi_vector and to make
xen_send_IPI_mask_allbutself use different events.
      
      Fortunately for us commit 1db01b49
      (xen: Clean up apic ipi interface) implemented this and we piggyback
      on the cleanup such that the apic IPI interface will pass the right
      vector value for NMI.
      
      With this patch we can trigger NMIs within a PV guest (only tested
      x86_64).
      
      For this to work with normal PV guests (not initial domain)
      we need the domain to be able to use the APIC ops - they are
      already implemented to use the Xen event channels. For that
      to be turned on in a PV domU we need to remove the masking
      of X86_FEATURE_APIC.
      
      Incidentally that means kgdb will also now work within
      a PV guest without using the 'nokgdbroundup' workaround.
      
      Note that the 32-bit version is different and this patch
      does not enable that.
      
      CC: Lisa Nguyen <lisa@xenapiadmin.com>
      CC: Ben Guthro <benjamin.guthro@citrix.com>
      CC: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v1: Fixed up per David Vrabel comments]
Reviewed-by: Ben Guthro <benjamin.guthro@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
4. 01 Aug 2013, 1 commit
5. 30 Jul 2013, 1 commit
6. 24 Jul 2013, 1 commit
7. 19 Jul 2013, 1 commit
8. 18 Jul 2013, 1 commit
9. 17 Jul 2013, 1 commit
• x86: Make sure IDT is page aligned · 4df05f36
  Committed by Kees Cook
Since the IDT is referenced from a fixmap, make sure it is page aligned.
Merge it with the 32-bit one, since that was already aligned to deal with
the F00F bug. Since the bss is cleared before IDT setup, it can live there.
This also moves the other *_idt_table variables into common locations.
      
      This avoids the risk of the IDT ever being moved in the bss and having
      the mapping be offset, resulting in calling incorrect handlers. In the
      current upstream kernel this is not a manifested bug, but heavily patched
      kernels (such as those using the PaX patch series) did encounter this bug.
      
      The tables other than idt_table technically do not need to be page
      aligned, at least not at the current time, but using a common
      declaration avoids mistakes.  On 64 bits the table is exactly one page
      long, anyway.
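
The essence of the change is the declaration itself; a sketch, using the
x86 tree's gate_desc type and the __page_aligned_bss attribute:

/* Sketch: the IDT lives in .bss..page_aligned, so the fixmap mapping
 * of the IDT page can never be offset within the page. */
gate_desc idt_table[NR_VECTORS] __page_aligned_bss;

/* The other *_idt_table variables move to the same common, aligned
 * declaration style, even where alignment is not strictly required. */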
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20130716183441.GA14232@www.outflux.net
Reported-by: PaX Team <pageexec@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
10. 16 Jul 2013, 1 commit
11. 15 Jul 2013, 1 commit
• x86: delete __cpuinit usage from all x86 files · 148f9bb8
  Committed by Paul Gortmaker
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
Note that some harmless section mismatch warnings may result, since
notify_cpu_starting() and cpu_up(), which are arch independent
(kernel/cpu.c), are flagged as __cpuinit -- so if we remove the __cpuinit
from arch-specific callers, we will also get section mismatch warnings.
      As an intermediate step, we intend to turn the linux/init.h cpuinit
      content into no-ops as early as possible, since that will get rid
      of these warnings.  In any case, they are temporary and harmless.
      
      This removes all the arch/x86 uses of the __cpuinit macros from
      all C files.  x86 only had the one __CPUINIT used in assembly files,
      and it wasn't paired off with a .previous or a __FINIT, so we can
      delete it directly w/o any corresponding additional change there.
      
      [1] https://lkml.org/lkml/2013/5/20/589
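
The change is mechanical; a before/after sketch on a hypothetical hotplug
callback (the function name is illustrative):

#include <linux/cpu.h>
#include <linux/notifier.h>

#if 0   /* Before: tagged for the throwaway .cpuinit sections. */
static int __cpuinit example_cpu_callback(struct notifier_block *nfb,
                                          unsigned long action, void *hcpu)
{
        return NOTIFY_OK;
}
#endif

/* After: the annotation is simply deleted; the function lives in
 * plain .text and no section mismatch is possible. */
static int example_cpu_callback(struct notifier_block *nfb,
                                unsigned long action, void *hcpu)
{
        return NOTIFY_OK;
}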
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
12. 12 Jul 2013, 2 commits
13. 11 Jul 2013, 2 commits
14. 10 Jul 2013, 11 commits
15. 05 Jul 2013, 2 commits
16. 04 Jul 2013, 9 commits
• KVM: VMX: mark unusable segment as nonpresent · 03617c18
  Committed by Gleb Natapov
Some userspaces do not preserve the unusable property. Since a usable
segment has to be present according to the VMX spec, we can use the present
property to amend the userspace bug by making an unusable segment always
nonpresent. vmx_segment_access_rights() already marks a nonpresent segment
as unusable.
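
A sketch of the idea in the access-rights computation (bit layout per the
VMX segment access-rights word; the exact upstream diff may differ in
detail, and the function name here is illustrative):

/* Sketch: build the VMX access-rights word from a kvm_segment.
 * Bit 16 is "unusable"; bit 7 is present (P).  Forcing an unusable
 * segment to read back as nonpresent amends the userspace bug. */
static u32 example_segment_access_rights(struct kvm_segment *var)
{
        u32 ar;

        ar  = var->type & 15;
        ar |= (var->s & 1) << 4;
        ar |= (var->dpl & 3) << 5;
        ar |= (var->present & 1) << 7;
        ar |= (var->avl & 1) << 12;
        ar |= (var->l & 1) << 13;
        ar |= (var->db & 1) << 14;
        ar |= (var->g & 1) << 15;
        ar |= (var->unusable || !var->present) << 16;

        return ar;
}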
      
      Cc: stable@vger.kernel.org # 3.9+
Reported-by: Stefan Pietsch <stefan.pietsch@lsexperts.de>
Tested-by: Stefan Pietsch <stefan.pietsch@lsexperts.de>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
• rapidio: add modular build option for the subsystem core · fdf90abc
  Committed by Alexandre Bounine
      Add a configuration option to build RapidIO subsystem core code as a
      loadable kernel module.  Currently this option is available only for
      x86-based platforms, with the additional patch for PowerPC planned to be
      provided later.
      
      This patch replaces kernel command line parameter "riohdid=" with its
      module-specific analog "rapidio.hdid=".
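
For illustration, roughly how the parameter becomes module-scoped (a
sketch, not the literal driver code; the array bound and variable names are
illustrative). When the core is built as rapidio.ko, this is what makes
"rapidio.hdid=" work on the kernel command line:

#include <linux/module.h>
#include <linux/moduleparam.h>

#define EX_RIO_MAX_MPORTS 8     /* illustrative bound */

/* Destination IDs for local mports, settable at load time. */
static int hdid[EX_RIO_MAX_MPORTS];
static int hdid_num;
module_param_array(hdid, int, &hdid_num, 0);
MODULE_PARM_DESC(hdid, "Destination ID assignment to local RapidIO controllers");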
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Li Yang <leoli@freescale.com>
      Cc: Kumar Gala <galak@kernel.crashing.org>
      Cc: Andre van Herk <andre.van.herk@Prodrive.nl>
      Cc: Micha Nelissen <micha.nelissen@Prodrive.nl>
      Cc: Stef van Os <stef.van.os@Prodrive.nl>
      Cc: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• x86: kill TIF_DEBUG · 37f07655
  Committed by Oleg Nesterov
      Because it is not used.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• mm/x86: prepare for removing num_physpages and simplify mem_init() · 46a84132
  Committed by Jiang Liu
      Prepare for removing num_physpages and simplify mem_init().
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• mm: concentrate modification of totalram_pages into the mm core · 0c988534
  Committed by Jiang Liu
Concentrate the code that modifies totalram_pages into the mm core, so the
arch memory init code doesn't need to take care of it.  With these
changes applied, only the following functions from the mm core modify the
global variable totalram_pages: free_bootmem_late(), free_all_bootmem(),
free_all_bootmem_node(), adjust_managed_page_count().
      
With this patch applied, it will be much easier for us to keep
totalram_pages and zone->managed_pages consistent.
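
A sketch of the central helper, assuming the managed_page_count_lock
introduced alongside it (the real function is in mm/page_alloc.c; the name
below is marked as an example):

/* Sketch: the one place that adjusts both counters together. */
void example_adjust_managed_page_count(struct page *page, long count)
{
        spin_lock(&managed_page_count_lock);
        page_zone(page)->managed_pages += count;
        totalram_pages += count;
#ifdef CONFIG_HIGHMEM
        if (PageHighMem(page))
                totalhigh_pages += count;
#endif
        spin_unlock(&managed_page_count_lock);
}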
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: David Howells <dhowells@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: <sworddragon2@aol.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• mm: make __free_pages_bootmem() only available at boot time · 170a5a7e
  Committed by Jiang Liu
In order to simplify management of totalram_pages and
zone->managed_pages, make __free_pages_bootmem() only available at boot
time.  With this change applied, __free_pages_bootmem() will only be
used by bootmem.c and nobootmem.c at boot time, so mark it as __init.
      Other callers of __free_pages_bootmem() have been converted to use
      free_reserved_page(), which handles totalram_pages and
      zone->managed_pages in a safer way.
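
A sketch of the helper those callers now use (close in spirit to the
include/linux/mm.h definition of the time; shown here with an example name):

/* Sketch: return one boot-reserved page to the buddy allocator while
 * keeping totalram_pages and zone->managed_pages in step. */
static inline void example_free_reserved_page(struct page *page)
{
        ClearPageReserved(page);
        init_page_count(page);
        __free_page(page);
        adjust_managed_page_count(page, 1);
}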
      
This patch also fixes a bug in free_pagetable() for x86_64, which should
increase zone->managed_pages instead of zone->present_pages when freeing
reserved pages.
      
      And now we have managed_pages_count_lock to protect totalram_pages and
      zone->managed_pages, so remove the redundant ppb_lock lock in
      put_page_bootmem().  This greatly simplifies the locking rules.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: <sworddragon2@aol.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• mm: accurately calculate zone->managed_pages for highmem zones · 7b4b2a0d
  Committed by Jiang Liu
      Commit "mm: introduce new field 'managed_pages' to struct zone" assumes
      that all highmem pages will be freed into the buddy system by function
      mem_init().  But that's not always true, some architectures may reserve
      some highmem pages during boot.  For example PPC may allocate highmem
      pages for giagant HugeTLB pages, and several architectures have code to
      check PageReserved flag to exclude highmem pages allocated during boot
      when freeing highmem pages into the buddy system.
      
So treat highmem pages in the same way as normal pages, that is to:
1) reset zone->managed_pages to zero in mem_init().
2) recalculate managed_pages when freeing pages into the buddy system,
   as sketched below.
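
A sketch of the highmem freeing path under this scheme, modeled on the
free_highmem_page() helper added by the series (counter names as in the mm
core of the time; the function name below is an example):

/* Sketch: free one boot-reserved highmem page into the buddy system
 * and recalculate the managed-page accounting as we go. */
void example_free_highmem_page(struct page *page)
{
        ClearPageReserved(page);
        init_page_count(page);
        __free_page(page);

        totalram_pages++;
        page_zone(page)->managed_pages++;
        totalhigh_pages++;
}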
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: <sworddragon2@aol.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• mm/x86: use free_reserved_area() to simplify code · c88442ec
  Committed by Jiang Liu
Use the common helper function free_reserved_area() to simplify the code.
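
For example, freeing the kernel's init sections through the common helper
reduces to one call (a sketch of the pattern; the x86 path keeps some extra
handling around it):

#include <linux/mm.h>
#include <linux/poison.h>

extern char __init_begin[], __init_end[];

/* Sketch: free the kernel's init sections via the common helper. */
void free_initmem(void)
{
        free_reserved_area(&__init_begin, &__init_end,
                           POISON_FREE_INITMEM, "unused kernel");
}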
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: <sworddragon2@aol.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• mm: soft-dirty bits for user memory changes tracking · 0f8975ec
  Committed by Pavel Emelyanov
Soft-dirty is a bit on a PTE which helps to track which pages a task
writes to.  In order to do this tracking one should

  1. Clear the soft-dirty bits from the PTEs ("echo 4 > /proc/PID/clear_refs").
  2. Wait some time.
  3. Read the soft-dirty bits (the 55th bit in /proc/PID/pagemap2 entries).
      
      To do this tracking, the writable bit is cleared from PTEs when the
      soft-dirty bit is.  Thus, after this, when the task tries to modify a
      page at some virtual address the #PF occurs and the kernel sets the
      soft-dirty bit on the respective PTE.
      
Note that although all the task's address space is marked as r/o after
the soft-dirty bits are cleared, the #PF-s that occur after that are
processed fast.  This is because the pages are still mapped to physical
memory, so all the kernel does is find this fact out and put the
writable, dirty and soft-dirty bits back on the PTE.
      
Another thing to note is that when mremap moves PTEs, they are marked
soft-dirty as well, since from the user perspective mremap modifies
the virtual memory at mremap's new address.
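
From userspace, the cycle looks roughly like this (a sketch: paths, PID
handling and the probed address are illustrative. The commit text names
/proc/PID/pagemap2; mainline ended up exposing the same bit 55 in
/proc/PID/pagemap, which this sketch reads):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Sketch: clear PID's soft-dirty bits, then (later) test one address. */
static int soft_dirty_check(pid_t pid, uintptr_t addr, long pagesize)
{
        char path[64];
        uint64_t entry = 0;
        int fd;

        /* 1. Equivalent of "echo 4 > /proc/PID/clear_refs". */
        snprintf(path, sizeof(path), "/proc/%d/clear_refs", (int)pid);
        fd = open(path, O_WRONLY);
        if (fd < 0)
                return -1;
        if (write(fd, "4", 1) != 1) {
                close(fd);
                return -1;
        }
        close(fd);

        /* 2. ...let the task run for a while... */

        /* 3. Read the 64-bit pagemap entry for addr; test bit 55. */
        snprintf(path, sizeof(path), "/proc/%d/pagemap", (int)pid);
        fd = open(path, O_RDONLY);
        if (fd < 0)
                return -1;
        pread(fd, &entry, sizeof(entry),
              (off_t)(addr / pagesize) * sizeof(entry));
        close(fd);

        return (int)((entry >> 55) & 1);  /* 1 => written since clear */
}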
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>