1. 16 Sep, 2009 14 commits
    • A
      HWPOISON: Add madvise() based injector for hardware poisoned pages v4 · 9893e49d
      Andi Kleen authored
      Impact: optional, useful for debugging
      
      Add a new madvise sub command to inject poison for some
      pages in a process' address space.  This is useful for
      testing the poison page handling.
      
      This patch can allow root to tie up large amounts of memory.
      I got feedback from container developers and they didn't see any
      problem.
      
      v2: Use write flag for get_user_pages to make sure to always get
      a fresh page
      v3: Don't request write mapping (Fengguang Wu)
      v4: Move MADV_* number to avoid conflict with KSM (Hugh Dickins)
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
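As a rough illustration of how a test program might drive the injector described above, here is a minimal userspace sketch. The helper name `inject_hwpoison` is hypothetical; the `MADV_HWPOISON` fallback value of 100 is taken from the Linux UAPI headers and is only used when libc does not define it. The call normally needs CAP_SYS_ADMIN and a kernel built with CONFIG_MEMORY_FAILURE, and on success the kernel really treats the pages as corrupted, so this belongs on a test machine only.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Fallback for older libc headers; value as in the Linux UAPI headers. */
#ifndef MADV_HWPOISON
#define MADV_HWPOISON 100
#endif

/*
 * inject_hwpoison() is a hypothetical helper name, not a kernel or libc API.
 * It asks the kernel to treat the pages covering [addr, addr + len) as
 * hardware-poisoned. Returns 0 on success or a negative errno value.
 */
static int inject_hwpoison(void *addr, size_t len)
{
    if (madvise(addr, len, MADV_HWPOISON) != 0)
        return -errno;
    return 0;
}
```

A caller would typically map a scratch page, touch it, pass its page-aligned address to the helper, and then observe how the process is signalled.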
    • A
      HWPOISON: The high level memory error handler in the VM v7 · 6a46079c
      Andi Kleen authored
      Add the high level memory handler that poisons pages
      that got corrupted by hardware (typically by a two bit flip in a DIMM
      or a cache) on the Linux level. The goal is to prevent everyone
      from accessing these pages in the future.
      
      This is done at the VM level by marking a page hwpoisoned
      and taking the appropriate action based on the type of page
      it is.
      
      The code that does this is portable and lives in mm/memory-failure.c
      
      To quote the overview comment:
      
      High level machine check handler. Handles pages reported by the
      hardware as being corrupted usually due to a 2bit ECC memory or cache
      failure.
      
      This focuses on pages detected as corrupted in the background.
      When the current CPU tries to consume corruption the currently
      running process can just be killed directly instead. This implies
      that if the error cannot be handled for some reason it's safe to
      just ignore it because no corruption has been consumed yet. Instead
      when that happens another machine check will happen.
      
      Handles page cache pages in various states. The tricky part
      here is that we can access any page asynchronously with respect to
      other VM users, because memory failures could happen anytime and anywhere,
      possibly violating some of their assumptions. This is why this code
      has to be extremely careful. Generally it tries to use normal locking
      rules, as in get the standard locks, even if that means the
      error handling takes potentially a long time.
      
      Some of the operations here are somewhat inefficient and have non
      linear algorithmic complexity, because the data structures have not
      been optimized for this case. This is in particular the case
      for the mapping from a vma to a process. Since this case is expected
      to be rare we hope we can get away with this.
      
      There are in principle two strategies to kill processes on poison:
      - just unmap the data and wait for an actual reference before
      killing
      - kill as soon as corruption is detected.
      Both have advantages and disadvantages and should be used
      in different situations. Right now both are implemented and can
      be switched with a new sysctl, vm.memory_failure_early_kill.
      The default is early kill.
      
      The patch does some rmap data structure walking on its own to collect
      processes to kill. This is unusual because normally all rmap data structure
      knowledge is in rmap.c only. I put it here for now to keep
      everything together, and rmap knowledge has been seeping out anyway.
      
      Includes contributions from Johannes Weiner, Chris Mason, Fengguang Wu,
      Nick Piggin (who did a lot of great work) and others.
      
      Cc: npiggin@suse.de
      Cc: riel@redhat.com
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
    • A
      HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process · 4db96cf0
      Andi Kleen authored
      This allows processes to override their early/late kill
      behaviour on hardware memory errors.
      
      Typically, applications that are memory error aware are
      better off with early kill (they see the error as soon
      as possible); all others are better off with late kill (they only
      see the error when it really impacts execution).
      
      There's a global sysctl, but this way an application
      can set its specific policy.
      
      We're using two bits, one to signify that the process
      stated its intention and that
      
      I also made the prctl future proof by enforcing
      that the unused arguments are 0.
      
      The state is inherited by children.
      
      Note this makes us officially run out of process flags
      on 32bit, but the next patch can easily add another field.
      
      Manpage patch will be supplied separately.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
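To make the per-process policy concrete, here is a small sketch of how an application might opt in to early kill. It uses the prctl interface as documented in current prctl(2) man pages (`PR_MCE_KILL_SET`, `PR_MCE_KILL_EARLY`, `PR_MCE_KILL_GET`), which may differ in detail from what this particular patch version introduced. The helper names are hypothetical, and the fallback constants are only used if the libc headers lack them.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Fallbacks for old headers; values as in the Linux UAPI <linux/prctl.h>. */
#ifndef PR_MCE_KILL
#define PR_MCE_KILL         33
#define PR_MCE_KILL_CLEAR   0
#define PR_MCE_KILL_SET     1
#define PR_MCE_KILL_LATE    0
#define PR_MCE_KILL_EARLY   1
#define PR_MCE_KILL_DEFAULT 2
#endif
#ifndef PR_MCE_KILL_GET
#define PR_MCE_KILL_GET     34
#endif

/* Hypothetical helper: set this process's policy (early, late or default). */
static int set_mce_kill_policy(unsigned long policy)
{
    /* The trailing arguments are unused and must be 0. */
    if (prctl(PR_MCE_KILL, (unsigned long)PR_MCE_KILL_SET, policy, 0UL, 0UL) != 0)
        return -errno;
    return 0;
}

/* Hypothetical helper: read the current policy back, or a negative errno. */
static int get_mce_kill_policy(void)
{
    int rc = prctl(PR_MCE_KILL_GET, 0UL, 0UL, 0UL, 0UL);
    return rc < 0 ? -errno : rc;
}
```

A memory-error-aware program would call `set_mce_kill_policy(PR_MCE_KILL_EARLY)` once at startup; passing a non-zero value in one of the unused arguments is expected to fail with EINVAL.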
    • A
      HWPOISON: Define a new error_remove_page address space op for async truncation · 25718736
      Andi Kleen authored
      Truncating metadata pages is not safe right now, because
      we haven't audited all file systems.
      
      To enable truncation only for data address spaces, define
      a new address_space callback, error_remove_page.
      
      This is used for memory-failure.c memory error handling.
      
      It can then be set to truncate_inode_page().
      
      This patch just defines the new operation and adds documentation.
      
      Callers and users come in follow-on patches.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
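The idea of an optional per-address-space callback can be sketched in plain userspace C. Everything below (`demo_cache_page`, `demo_aops`, `demo_handle_error`, and the two ops tables) is a hypothetical model, not the kernel's actual structures: an address space that is known to be safe to truncate sets the callback, while one that has not been audited leaves it NULL and the error handler declines to remove the page.

```c
#include <stddef.h>

/* Hypothetical stand-in for a cached page. */
struct demo_cache_page {
    int in_cache;   /* 1 while the page is still in the cache */
};

/* Hypothetical stand-in for an address-space operations table. */
struct demo_aops {
    /* Optional: remove a page hit by a memory error. NULL means unsupported. */
    int (*error_remove_page)(struct demo_cache_page *page);
};

/* Models dropping a single page from the cache. */
static int demo_truncate_page(struct demo_cache_page *page)
{
    page->in_cache = 0;
    return 0;
}

/* Data mappings opt in; metadata mappings do not (not audited yet). */
static const struct demo_aops demo_data_aops = {
    .error_remove_page = demo_truncate_page,
};
static const struct demo_aops demo_metadata_aops = {
    .error_remove_page = NULL,
};

/* Error handler: only removes the page if the mapping provides the callback. */
static int demo_handle_error(const struct demo_aops *ops,
                             struct demo_cache_page *page)
{
    if (ops == NULL || ops->error_remove_page == NULL)
        return -1;  /* leave the page alone */
    return ops->error_remove_page(page);
}
```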
    • W
      HWPOISON: Add invalidate_inode_page · 83f78668
      Wu Fengguang authored
      Add a simple way to invalidate a single page.
      This is just a refactoring of the truncate.c code.
      Originally from Fengguang, modified by Andi Kleen.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
    • N
      HWPOISON: Refactor truncate to allow direct truncating of page v2 · 750b4987
      Nick Piggin authored
      Extract truncate_inode_page() out of the truncate path so that
      it can be used by memory-failure.c
      
      [AK: description, headers, fix typos]
      v2: Some white space changes from Fengguang Wu
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
    • A
      HWPOISON: Handle hardware poisoned pages in try_to_unmap · 888b9f7c
      Andi Kleen authored
      When a page has the poison bit set, replace the PTE with a poison entry.
      This causes the right error handling to be done later when a process runs
      into it.
      
      v2: add a new flag to not do that (needed for the memory-failure handler
      later) (Fengguang)
      v3: remove unnecessary is_migration_entry() test (Fengguang, Minchan)
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
    • A
      HWPOISON: Use bitmask/action code for try_to_unmap behaviour · 14fa31b8
      Andi Kleen authored
      try_to_unmap currently has multiple modes (migration, munlock, normal unmap)
      which are selected by magic flag variables. The logic is not very
      straightforward, because each of these flags changes multiple behaviours (e.g.
      migration turns off aging, not only sets up migration ptes etc.)
      Also the different flags interact in magic ways.
      
      A later patch in this series adds another mode to try_to_unmap, so
      this quickly becomes unmanageable.
      
      Replace the different flags with an action code (migration, munlock, munmap)
      and some additional flags as modifiers (ignore mlock, ignore aging).
      This makes the logic more straightforward and allows easier extension
      to new behaviours. Change all the callers to declare what they want to
      do.
      
      This patch is supposed to be a nop in behaviour. If anyone can prove
      it is not, that would be a bug.
      
      Cc: Lee.Schermerhorn@hp.com
      Cc: npiggin@suse.de
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
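The "action code plus modifier flags" layout described above can be sketched as follows. The `DEMO_*` names and bit positions are hypothetical and chosen only for illustration; they are not claimed to match the kernel's definitions. The low byte carries exactly one action, and higher bits carry independent modifiers, so a caller states what it wants in a single argument.

```c
/* Hypothetical flag layout: one action in the low byte, modifiers above it. */
enum demo_unmap_flags {
    DEMO_UNMAP         = 0,        /* action: normal unmap */
    DEMO_MIGRATION     = 1,        /* action: install migration entries */
    DEMO_MUNLOCK       = 2,        /* action: munlock handling */
    DEMO_ACTION_MASK   = 0xff,

    DEMO_IGNORE_MLOCK  = 1 << 8,   /* modifier: do not stop at mlocked pages */
    DEMO_IGNORE_ACCESS = 1 << 9    /* modifier: skip aging (referenced) checks */
};

/* Extract the action code, ignoring any modifier bits. */
#define DEMO_ACTION(flags) ((flags) & DEMO_ACTION_MASK)

/* Aging is done unless the caller explicitly asked to ignore it. */
static int demo_should_age(int flags)
{
    return !(flags & DEMO_IGNORE_ACCESS);
}

/* Only the migration action installs migration entries. */
static int demo_installs_migration_entry(int flags)
{
    return DEMO_ACTION(flags) == DEMO_MIGRATION;
}
```

With this layout a migration caller would pass something like `DEMO_MIGRATION | DEMO_IGNORE_ACCESS`, making the "migration also turns off aging" coupling explicit at the call site instead of implicit in the callee.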
    • A
      HWPOISON: Add basic support for poisoned pages in fault handler v3 · d1737fdb
      Andi Kleen authored
      - Add a new VM_FAULT_HWPOISON error code to handle_mm_fault. Right now
      architectures have to explicitly enable poison page support, so
      this is forward compatible with all architectures. They only need
      to add it when they enable poison page support.
      - Add poison page handling in the swap-in fault code
      
      v2: Add missing delayacct_clear_flag (Hidehiro Kawai)
      v3: Really use delayacct_clear_flag (Hidehiro Kawai)
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
    • A
      HWPOISON: Add new SIGBUS error codes for hardware poison signals · ad5fa913
      Andi Kleen authored
      Add new SIGBUS codes for reporting machine checks as signals. When
      the hardware detects an uncorrected ECC error it can trigger these
      signals.
      
      This is needed for telling KVM's qemu about machine checks that happen to
      guests, so that it can inject them, but might also be useful for other programs.
      I find it useful in my test programs.
      
      This patch merely defines the new types.
      
      - Define two new si_codes for SIGBUS.  BUS_MCEERR_AO and BUS_MCEERR_AR
      * BUS_MCEERR_AO is for "Action Optional" machine checks, which means that some
      corruption has been detected in the background, but nothing has been consumed
      so far. The program can ignore those if it wants (but most programs would
      already get killed)
      * BUS_MCEERR_AR is for "Action Required" machine checks. This happens
      when corrupted data is consumed or the application ran into an area
      which has been known to be corrupted earlier. These require immediate
      action and cannot just be returned from. Most programs would kill themselves.
      - They report the address of the corruption in the user address space
      in si_addr.
      - Define a new si_addr_lsb field that reports the extent of the corruption
      to user space. That's currently always a (small) page. The user application
      cannot tell where in this page the corruption happened.
      
      AK: I plan to write a man page update before anyone asks.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
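As a sketch of how a SIGBUS handler might interpret these fields, the helper below turns `si_code`, `si_addr` and `si_addr_lsb` into the affected address range. The function name `mce_corrupted_range` is hypothetical, and the fallback numeric values for the two si_codes are the ones used by the Linux UAPI headers, defined here only if libc lacks them.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

/* Fallbacks for old headers; values as in the Linux UAPI siginfo definitions. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/*
 * Hypothetical helper: if si describes a machine-check SIGBUS, store the
 * start and length of the corrupted region and return 1; otherwise return 0.
 * si_addr_lsb is the least significant valid bit of si_addr, so the extent
 * is 2^si_addr_lsb bytes.
 */
static int mce_corrupted_range(const siginfo_t *si, uintptr_t *start, size_t *len)
{
    if (si->si_signo != SIGBUS)
        return 0;
    if (si->si_code != BUS_MCEERR_AO && si->si_code != BUS_MCEERR_AR)
        return 0;

    size_t extent = (size_t)1 << si->si_addr_lsb;
    *start = (uintptr_t)si->si_addr & ~(uintptr_t)(extent - 1);
    *len = extent;
    return 1;
}
```

A handler installed with `SA_SIGINFO` could call this helper and then decide, based on whether the code is `BUS_MCEERR_AO` or `BUS_MCEERR_AR`, whether to discard and rebuild the affected data or to terminate.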
    • A
      HWPOISON: Add support for poison swap entries v2 · a7420aa5
      Andi Kleen authored
      Memory migration uses special swap entry types to trigger special actions on
      page faults. Extend this mechanism to also support poisoned swap entries, to
      trigger poison handling on page faults. This allows follow-on patches to
      prevent processes from faulting in poisoned pages again.
      
      v2: Fix overflow in MAX_SWAPFILES (Fengguang Wu)
      v3: Better overflow fix (Hidehiro Kawai)
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
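The mechanism of reserving special "swap types" can be modelled in userspace. The sketch below is an assumption-laden illustration, not the kernel's encoding: all `DEMO_*` names, the 5-bit type field, and the 64-bit entry layout are hypothetical. It shows why reserving type values for migration and poison reduces the number of usable swap files, which is where an overflow in the maximum can creep in.

```c
#include <stdint.h>

/* Hypothetical layout: 5 type bits on top, offset in the remaining bits. */
#define DEMO_TYPE_SHIFT     5
#define DEMO_OFFSET_BITS    (64 - DEMO_TYPE_SHIFT)
#define DEMO_OFFSET_MASK    ((UINT64_C(1) << DEMO_OFFSET_BITS) - 1)

/* Types reserved at the top of the range for special entries. */
#define DEMO_HWPOISON_NUM   1
#define DEMO_MIGRATION_NUM  2
#define DEMO_MAX_SWAPFILES \
    ((1 << DEMO_TYPE_SHIFT) - DEMO_MIGRATION_NUM - DEMO_HWPOISON_NUM)

#define DEMO_TYPE_HWPOISON        DEMO_MAX_SWAPFILES
#define DEMO_TYPE_MIGRATION_READ  (DEMO_MAX_SWAPFILES + DEMO_HWPOISON_NUM)
#define DEMO_TYPE_MIGRATION_WRITE (DEMO_TYPE_MIGRATION_READ + 1)

typedef struct { uint64_t val; } demo_swp_entry;

/* Pack a type and an offset into one entry. */
static demo_swp_entry demo_entry(unsigned type, uint64_t offset)
{
    demo_swp_entry e = {
        ((uint64_t)type << DEMO_OFFSET_BITS) | (offset & DEMO_OFFSET_MASK)
    };
    return e;
}

static unsigned demo_type(demo_swp_entry e)
{
    return (unsigned)(e.val >> DEMO_OFFSET_BITS);
}

static uint64_t demo_offset(demo_swp_entry e)
{
    return e.val & DEMO_OFFSET_MASK;
}

/* A fault handler would check this before treating the entry as real swap. */
static int demo_is_hwpoison_entry(demo_swp_entry e)
{
    return demo_type(e) == DEMO_TYPE_HWPOISON;
}
```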
    • A
      HWPOISON: Export some rmap vma locking to outside world · 10be22df
      Andi Kleen authored
      Needed for later patch that walks rmap entries on its own.
      
      This used to be very frowned upon, but memory-failure.c does
      some rather specialized rmap walking and rmap has been stable
      for quite some time, so I think it's ok now to export it.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
    • A
      HWPOISON: Add page flag for poisoned pages · d466f2fc
      Andi Kleen authored
      Hardware poisoned pages need special handling in the VM and shouldn't be
      touched again. This requires a new page flag. Define it here.
      
      The page flags wars seem to be over, so it shouldn't be a problem
      to get a new one.
      
      v2: Add TestSetHWPoison (suggested by Johannes Weiner)
      Acked-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
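The value of a test-and-set accessor for the new flag can be shown with a userspace model using C11 atomics. The names below (`demo_flag_page`, `DEMO_PG_HWPOISON`, `demo_test_set_hwpoison`) are hypothetical and do not reproduce the kernel's page-flag macros; the point is only that setting the bit and learning its previous value happen in one atomic step, so two concurrent error reports for the same page cannot both believe they were first.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit and page structure for illustration only. */
#define DEMO_PG_HWPOISON (1UL << 0)

struct demo_flag_page {
    atomic_ulong flags;
};

/*
 * Atomically set the poison bit and report whether it was already set.
 * Returns true if the page had been marked poisoned before this call.
 */
static bool demo_test_set_hwpoison(struct demo_flag_page *page)
{
    unsigned long old = atomic_fetch_or(&page->flags, DEMO_PG_HWPOISON);
    return (old & DEMO_PG_HWPOISON) != 0;
}

/* Plain test of the poison bit. */
static bool demo_page_hwpoison(struct demo_flag_page *page)
{
    return (atomic_load(&page->flags) & DEMO_PG_HWPOISON) != 0;
}
```

An error handler built on this would proceed with recovery only when `demo_test_set_hwpoison()` returns false, and treat a true result as "already being handled".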
    • N
      Nicolas Pitre has a new email address · 2f82af08
      Nicolas Pitre authored
      Due to problems at cam.org, my nico@cam.org email address is no longer
      valid.  From now on, nico@fluxnic.net should be used instead.
      Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 15 Sep, 2009 3 commits
  3. 14 Sep, 2009 10 commits
  4. 12 Sep, 2009 13 commits