1. 17 2月, 2012 1 次提交
    • S
      uprobes, mm, x86: Add the ability to install and remove uprobes breakpoints · 2b144498
      Srikar Dronamraju 提交于
      Add uprobes support to the core kernel, with x86 support.
      
      This commit adds the kernel facilities, the actual uprobes
      user-space ABI and perf probe support comes in later commits.
      
      General design:
      
      Uprobes are maintained in an rb-tree indexed by inode and offset
      (the offset here is from the start of the mapping). For a unique
      (inode, offset) tuple, there can be at most one uprobe in the
      rb-tree.
      
      Since the (inode, offset) tuple identifies a unique uprobe, more
      than one user may be interested in the same uprobe. This provides
      the ability to connect multiple 'consumers' to the same uprobe.
      
      Each consumer defines a handler and a filter (optional). The
      'handler' is run every time the uprobe is hit, if it matches the
      'filter' criteria.
      
      The first consumer of a uprobe causes the breakpoint to be
      inserted at the specified address and subsequent consumers are
      appended to this list.  On subsequent probes, the consumer gets
      appended to the existing list of consumers. The breakpoint is
      removed when the last consumer unregisters. For all other
      unregisterations, the consumer is removed from the list of
      consumers.
      
      Given a inode, we get a list of the mms that have mapped the
      inode. Do the actual registration if mm maps the page where a
      probe needs to be inserted/removed.
      
      We use a temporary list to walk through the vmas that map the
      inode.
      
      - The number of maps that map the inode, is not known before we
        walk the rmap and keeps changing.
      - extending vm_area_struct wasn't recommended, it's a
        size-critical data structure.
      - There can be more than one maps of the inode in the same mm.
      
      We add callbacks to the mmap methods to keep an eye on text vmas
      that are of interest to uprobes.  When a vma of interest is mapped,
      we insert the breakpoint at the right address.
      
      Uprobe works by replacing the instruction at the address defined
      by (inode, offset) with the arch specific breakpoint
      instruction. We save a copy of the original instruction at the
      uprobed address.
      
      This is needed for:
      
       a. executing the instruction out-of-line (xol).
       b. instruction analysis for any subsequent fixups.
       c. restoring the instruction back when the uprobe is unregistered.
      
      We insert or delete a breakpoint instruction, and this
      breakpoint instruction is assumed to be the smallest instruction
      available on the platform. For fixed size instruction platforms
      this is trivially true, for variable size instruction platforms
      the breakpoint instruction is typically the smallest (often a
      single byte).
      
      Writing the instruction is done by COWing the page and changing
      the instruction during the copy, this even though most platforms
      allow atomic writes of the breakpoint instruction. This also
      mirrors the behaviour of a ptrace() memory write to a PRIVATE
      file map.
      
      The core worker is derived from KSM's replace_page() logic.
      
      In essence, similar to KSM:
      
       a. allocate a new page and copy over contents of the page that
          has the uprobed vaddr
       b. modify the copy and insert the breakpoint at the required
          address
       c. switch the original page with the copy containing the
          breakpoint
       d. flush page tables.
      
      replace_page() is being replicated here because of some minor
      changes in the type of pages and also because Hugh Dickins had
      plans to improve replace_page() for KSM specific work.
      
      Instruction analysis on x86 is based on instruction decoder and
      determines if an instruction can be probed and determines the
      necessary fixups after singlestep.  Instruction analysis is done
      at probe insertion time so that we avoid having to repeat the
      same analysis every time a probe is hit.
      
      A lot of code here is due to the improvement/suggestions/inputs
      from Peter Zijlstra.
      
      Changelog:
      
      (v10):
       - Add code to clear REX.B prefix as suggested by Denys Vlasenko
         and Masami Hiramatsu.
      
      (v9):
       - Use insn_offset_modrm as suggested by Masami Hiramatsu.
      
      (v7):
      
       Handle comments from Peter Zijlstra:
      
       - Dont take reference to inode. (expect inode to uprobe_register to be sane).
       - Use PTR_ERR to set the return value.
       - No need to take reference to inode.
       - use PTR_ERR to return error value.
       - register and uprobe_unregister share code.
      
      (v5):
      
       - Modified del_consumer as per comments from Peter.
       - Drop reference to inode before dropping reference to uprobe.
       - Use i_size_read(inode) instead of inode->i_size.
       - Ensure uprobe->consumers is NULL, before __uprobe_unregister() is called.
       - Includes errno.h as recommended by Stephen Rothwell to fix a build issue
         on sparc defconfig
       - Remove restrictions while unregistering.
       - Earlier code leaked inode references under some conditions while
         registering/unregistering.
       - Continue the vma-rmap walk even if the intermediate vma doesnt
         meet the requirements.
       - Validate the vma found by find_vma before inserting/removing the
         breakpoint
       - Call del_consumer under mutex_lock.
       - Use hash locks.
       - Handle mremap.
       - Introduce find_least_offset_node() instead of close match logic in
         find_uprobe
       - Uprobes no more depends on MM_OWNER; No reference to task_structs
         while inserting/removing a probe.
       - Uses read_mapping_page instead of grab_cache_page so that the pages
         have valid content.
       - pass NULL to get_user_pages for the task parameter.
       - call SetPageUptodate on the new page allocated in write_opcode.
       - fix leaking a reference to the new page under certain conditions.
       - Include Instruction Decoder if Uprobes gets defined.
       - Remove const attributes for instruction prefix arrays.
       - Uses mm_context to know if the application is 32 bit.
      Signed-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Also-written-by: NJim Keniston <jkenisto@us.ibm.com>
      Reviewed-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Anton Arapov <anton@redhat.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linux-mm <linux-mm@kvack.org>
      Link: http://lkml.kernel.org/r/20120209092642.GE16600@linux.vnet.ibm.com
      [ Made various small edits to the commit log ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2b144498
  2. 26 1月, 2012 1 次提交
  3. 17 1月, 2012 1 次提交
  4. 13 1月, 2012 3 次提交
  5. 11 1月, 2012 1 次提交
  6. 30 12月, 2011 1 次提交
  7. 18 12月, 2011 2 次提交
  8. 16 12月, 2011 1 次提交
  9. 13 12月, 2011 1 次提交
    • M
      x86, efi: EFI boot stub support · 291f3632
      Matt Fleming 提交于
      There is currently a large divide between kernel development and the
      development of EFI boot loaders. The idea behind this patch is to give
      the kernel developers full control over the EFI boot process. As
      H. Peter Anvin put it,
      
      "The 'kernel carries its own stub' approach been very successful in
      dealing with BIOS, and would make a lot of sense to me for EFI as
      well."
      
      This patch introduces an EFI boot stub that allows an x86 bzImage to
      be loaded and executed by EFI firmware. The bzImage appears to the
      firmware as an EFI application. Luckily there are enough free bits
      within the bzImage header so that it can masquerade as an EFI
      application, thereby coercing the EFI firmware into loading it and
      jumping to its entry point. The beauty of this masquerading approach
      is that both BIOS and EFI boot loaders can still load and run the same
      bzImage, thereby allowing a single kernel image to work in any boot
      environment.
      
      The EFI boot stub supports multiple initrds, but they must exist on
      the same partition as the bzImage. Command-line arguments for the
      kernel can be appended after the bzImage name when run from the EFI
      shell, e.g.
      
      Shell> bzImage console=ttyS0 root=/dev/sdb initrd=initrd.img
      
      v7:
       - Fix checkpatch warnings.
      
      v6:
      
       - Try to allocate initrd memory just below hdr->inird_addr_max.
      
      v5:
      
       - load_options_size is UTF-16, which needs dividing by 2 to convert
         to the corresponding ASCII size.
      
      v4:
      
       - Don't read more than image->load_options_size
      
      v3:
      
       - Fix following warnings when compiling CONFIG_EFI_STUB=n
      
         arch/x86/boot/tools/build.c: In function ‘main’:
         arch/x86/boot/tools/build.c:138:24: warning: unused variable ‘pe_header’
         arch/x86/boot/tools/build.c:138:15: warning: unused variable ‘file_sz’
      
       - As reported by Matthew Garrett, some Apple machines have GOPs that
         don't have hardware attached. We need to weed these out by
         searching for ones that handle the PCIIO protocol.
      
       - Don't allocate memory if no initrds are on cmdline
       - Don't trust image->load_options_size
      
      Maarten Lankhorst noted:
       - Don't strip first argument when booted from efibootmgr
       - Don't allocate too much memory for cmdline
       - Don't update cmdline_size, the kernel considers it read-only
       - Don't accept '\n' for initrd names
      
      v2:
      
       - File alignment was too large, was 8192 should be 512. Reported by
         Maarten Lankhorst on LKML.
       - Added UGA support for graphics
       - Use VIDEO_TYPE_EFI instead of hard-coded number.
       - Move linelength assignment until after we've assigned depth
       - Dynamically fill out AddressOfEntryPoint in tools/build.c
       - Don't use magic number for GDT/TSS stuff. Requested by Andi Kleen
       - The bzImage may need to be relocated as it may have been loaded at
         a high address address by the firmware. This was required to get my
         macbook booting because the firmware loaded it at 0x7cxxxxxx, which
         triggers this error in decompress_kernel(),
      
      	if (heap > ((-__PAGE_OFFSET-(128<<20)-1) & 0x7fffffff))
      		error("Destination address too large");
      
      Cc: Mike Waychison <mikew@google.com>
      Cc: Matthew Garrett <mjg@redhat.com>
      Tested-by: NHenrik Rydberg <rydberg@euromail.se>
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      Link: http://lkml.kernel.org/r/1321383097.2657.9.camel@mfleming-mobl1.ger.corp.intel.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      291f3632
  10. 09 12月, 2011 1 次提交
    • T
      memblock: Kill early_node_map[] · 0ee332c1
      Tejun Heo 提交于
      Now all ARCH_POPULATES_NODE_MAP archs select HAVE_MEBLOCK_NODE_MAP -
      there's no user of early_node_map[] left.  Kill early_node_map[] and
      replace ARCH_POPULATES_NODE_MAP with HAVE_MEMBLOCK_NODE_MAP.  Also,
      relocate for_each_mem_pfn_range() and helper from mm.h to memblock.h
      as page_alloc.c would no longer host an alternative implementation.
      
      This change is ultimately one to one mapping and shouldn't cause any
      observable difference; however, after the recent changes, there are
      some functions which now would fit memblock.c better than page_alloc.c
      and dependency on HAVE_MEMBLOCK_NODE_MAP instead of HAVE_MEMBLOCK
      doesn't make much sense on some of them.  Further cleanups for
      functions inside HAVE_MEMBLOCK_NODE_MAP in mm.h would be nice.
      
      -v2: Fix compile bug introduced by mis-spelling
       CONFIG_HAVE_MEMBLOCK_NODE_MAP to CONFIG_MEMBLOCK_HAVE_NODE_MAP in
       mmzone.h.  Reported by Stephen Rothwell.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Chen Liqin <liqin.chen@sunplusct.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      0ee332c1
  11. 06 12月, 2011 4 次提交
  12. 05 12月, 2011 1 次提交
  13. 25 11月, 2011 1 次提交
  14. 23 11月, 2011 1 次提交
  15. 01 11月, 2011 1 次提交
  16. 13 10月, 2011 1 次提交
  17. 28 9月, 2011 1 次提交
    • P
      doc: fix broken references · 395cf969
      Paul Bolle 提交于
      There are numerous broken references to Documentation files (in other
      Documentation files, in comments, etc.). These broken references are
      caused by typo's in the references, and by renames or removals of the
      Documentation files. Some broken references are simply odd.
      
      Fix these broken references, sometimes by dropping the irrelevant text
      they were part of.
      Signed-off-by: NPaul Bolle <pebolle@tiscali.nl>
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      395cf969
  18. 21 9月, 2011 2 次提交
    • E
      x86: geode: New PCEngines Alix system driver · d4f3e350
      Ed Wildgoose 提交于
      This new driver replaces the old PCEngines Alix 2/3 LED driver with a
      new driver that controls the LEDs through the leds-gpio driver. The
      old driver accessed GPIOs directly, which created a conflict and
      prevented also loading the cs5535-gpio driver to read other GPIOs on
      the Alix board. With this new driver, we hook into leds-gpio which in
      turn uses GPIO to control the LEDs and therefore it's possible to
      control both the LEDs and access onboard GPIOs
      
      Driver is moved to platform/geode as requested by Grant and any other
      geode initialisation modules should move here also
      
      This driver is inspired by leds-net5501.c by Alessandro Zummo.
      
      Ideally, leds-net5501.c should also be moved to platform/geode. 
      Additionally the driver relies on parts of the patch: 7f131cf3 ("leds:
      leds-alix2c - take port address from MSR) by Daniel Mack to perform
      detection of the Alix board.
      
      [akpm@linux-foundation.org: include module.h]
      Signed-off-by: NEd Wildgoose <kernel@wildgooses.com>
      Cc: git@wildgooses.com
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Cc: Daniel Mack <daniel@caiaq.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Richard Purdie <rpurdie@rpsys.net>
      Reviewed-by: NGrant Likely <grant.likely@secretlab.ca>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      d4f3e350
    • S
      iommu: Rename the DMAR and INTR_REMAP config options · d3f13810
      Suresh Siddha 提交于
      Change the CONFIG_DMAR to CONFIG_INTEL_IOMMU to be consistent
      with the other IOMMU options.
      
      Rename the CONFIG_INTR_REMAP to CONFIG_IRQ_REMAP to match the
      irq subsystem name.
      
      And define the CONFIG_DMAR_TABLE for the common ACPI DMAR
      routines shared by both CONFIG_INTEL_IOMMU and CONFIG_IRQ_REMAP.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Cc: yinghai@kernel.org
      Cc: youquan.song@intel.com
      Cc: joerg.roedel@amd.com
      Cc: tony.luck@intel.com
      Cc: dwmw2@infradead.org
      Link: http://lkml.kernel.org/r/20110824001456.558630224@sbsiddha-desk.sc.intel.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
      d3f13810
  19. 08 9月, 2011 1 次提交
  20. 03 8月, 2011 1 次提交
  21. 01 8月, 2011 1 次提交
    • H
      x86, random: Architectural inlines to get random integers with RDRAND · 628c6246
      H. Peter Anvin 提交于
      Architectural inlines to get random ints and longs using the RDRAND
      instruction.
      
      Intel has introduced a new RDRAND instruction, a Digital Random Number
      Generator (DRNG), which is functionally an high bandwidth entropy
      source, cryptographic whitener, and integrity monitor all built into
      hardware.  This enables RDRAND to be used directly, bypassing the
      kernel random number pool.
      
      For technical documentation, see:
      
      http://software.intel.com/en-us/articles/download-the-latest-bull-mountain-software-implementation-guide/
      
      In this patch, this is *only* used for the nonblocking random number
      pool.  RDRAND is a nonblocking source, similar to our /dev/urandom,
      and is therefore not a direct replacement for /dev/random.  The
      architectural hooks presented in the previous patch only feed the
      kernel internal users, which only use the nonblocking pool, and so
      this is not a problem.
      
      Since this instruction is available in userspace, there is no reason
      to have a /dev/hw_rng device driver for the purpose of feeding rngd.
      This is especially so since RDRAND is a nonblocking source, and needs
      additional whitening and reduction (see the above technical
      documentation for details) in order to be of "pure entropy source"
      quality.
      
      The CONFIG_EXPERT compile-time option can be used to disable this use
      of RDRAND.
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      Originally-by: NFenghua Yu <fenghua.yu@intel.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      628c6246
  22. 25 7月, 2011 1 次提交
  23. 22 7月, 2011 2 次提交
  24. 21 7月, 2011 1 次提交
  25. 15 7月, 2011 2 次提交
  26. 14 7月, 2011 1 次提交
    • G
      sched: adjust scheduler cpu power for stolen time · 095c0aa8
      Glauber Costa 提交于
      This patch makes update_rq_clock() aware of steal time.
      The mechanism of operation is not different from irq_time,
      and follows the same principles. This lives in a CONFIG
      option itself, and can be compiled out independently of
      the rest of steal time reporting. The effect of disabling it
      is that the scheduler will still report steal time (that cannot be
      disabled), but won't use this information for cpu power adjustments.
      
      Everytime update_rq_clock_task() is invoked, we query information
      about how much time was stolen since last call, and feed it into
      sched_rt_avg_update().
      
      Although steal time reporting in account_process_tick() keeps
      track of the last time we read the steal clock, in prev_steal_time,
      this patch do it independently using another field,
      prev_steal_time_rq. This is because otherwise, information about time
      accounted in update_process_tick() would never reach us in update_rq_clock().
      Signed-off-by: NGlauber Costa <glommer@redhat.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Tested-by: NEric B Munson <emunson@mgebm.net>
      CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      CC: Anthony Liguori <aliguori@us.ibm.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      095c0aa8
  27. 11 7月, 2011 2 次提交
  28. 07 7月, 2011 3 次提交