1. 21 9月, 2015 1 次提交
    • H
      net: add Hisilicon Network Subsystem support (config and documents) · fc7e37c6
      huangdaode 提交于
      The Hisilicon Network Subsystem is a long term evolution IP which is
      supposed to be used in Hisilicon ICT SoC. The IP, which is called hns
      for short, is a TCP/IP acceleration engine, which can directly decode
      TCP/IP stream and distribute them to different ring buffers.
      
      HNS can be configured to work on different mode for different scenario.
      This patch make use only some of the mode to make it as standard
      ethernet NIC. The other mode will be added soon.
      
      The whole function has 4 kernel sub-modules:
      
      hnae: the HNS acceleration engine framework. It provides a abstract
      interface between the engine and the upper layers which make use of the
      engine by ring buffer.
      
      hns_enet_drv: a standard ethernet driver that base on the ring buffer.
      
      hns_dsaf: one of the implementation of HNS acceleration engine, which is
      applied on Hililicon hip05, Hi1610 and other later-on SoCs
      
      hns_mdio: the mdio control to the PHY, used by acceleration engine
      
      This submit add basic config and documents
      Signed-off-by: Nhuangdaode <huangdaode@hisilicon.com>
      Signed-off-by: NKenneth Lee <liguozhu@huawei.com>
      Signed-off-by: NYisen Zhuang <Yisen.Zhuang@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fc7e37c6
  2. 16 9月, 2015 1 次提交
  3. 11 9月, 2015 9 次提交
    • C
      dma-mapping: consolidate dma_set_mask · 452e06af
      Christoph Hellwig 提交于
      Almost everyone implements dma_set_mask the same way, although some time
      that's hidden in ->set_dma_mask methods.
      
      This patch consolidates those into a common implementation that either
      calls ->set_dma_mask if present or otherwise uses the default
      implementation.  Some architectures used to only call ->set_dma_mask
      after the initial checks, and those instance have been fixed to do the
      full work.  h8300 implemented dma_set_mask bogusly as a no-ops and has
      been fixed.
      
      Unfortunately some architectures overload unrelated semantics like changing
      the dma_ops into it so we still need to allow for an architecture override
      for now.
      
      [jcmvbkbc@gmail.com: fix xtensa]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Signed-off-by: NMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      452e06af
    • C
      dma-mapping: consolidate dma_supported · ee196371
      Christoph Hellwig 提交于
      Most architectures just call into ->dma_supported, but some also return 1
      if the method is not present, or 0 if no dma ops are present (although
      that should never happeb). Consolidate this more broad version into
      common code.
      
      Also fix h8300 which inorrectly always returned 0, which would have been
      a problem if it's dma_set_mask implementation wasn't a similarly buggy
      noop.
      
      As a few architectures have much more elaborate implementations, we
      still allow for arch overrides.
      
      [jcmvbkbc@gmail.com: fix xtensa]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Signed-off-by: NMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ee196371
    • C
      dma-mapping: cosolidate dma_mapping_error · efa21e43
      Christoph Hellwig 提交于
      Currently there are three valid implementations of dma_mapping_error:
      
       (1) call ->mapping_error
       (2) check for a hardcoded error code
       (3) always return 0
      
      This patch provides a common implementation that calls ->mapping_error
      if present, then checks for DMA_ERROR_CODE if defined or otherwise
      returns 0.
      
      [jcmvbkbc@gmail.com: fix xtensa]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Signed-off-by: NMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      efa21e43
    • C
      dma-mapping: consolidate dma_{alloc,free}_noncoherent · 1e893752
      Christoph Hellwig 提交于
      Most architectures do not support non-coherent allocations and either
      define dma_{alloc,free}_noncoherent to their coherent versions or stub
      them out.
      
      Openrisc uses dma_{alloc,free}_attrs to implement them, and only Mips
      implements them directly.
      
      This patch moves the Openrisc version to common code, and handles the
      DMA_ATTR_NON_CONSISTENT case in the mips dma_map_ops instance.
      
      Note that actual non-coherent allocations require a dma_cache_sync
      implementation, so if non-coherent allocations didn't work on
      an architecture before this patch they still won't work after it.
      
      [jcmvbkbc@gmail.com: fix xtensa]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Signed-off-by: NMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1e893752
    • C
      dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent} · 6894258e
      Christoph Hellwig 提交于
      Since 2009 we have a nice asm-generic header implementing lots of DMA API
      functions for architectures using struct dma_map_ops, but unfortunately
      it's still missing a lot of APIs that all architectures still have to
      duplicate.
      
      This series consolidates the remaining functions, although we still need
      arch opt outs for two of them as a few architectures have very
      non-standard implementations.
      
      This patch (of 5):
      
      The coherent DMA allocator works the same over all architectures supporting
      dma_map operations.
      
      This patch consolidates them and converges the minor differences:
      
       - the debug_dma helpers are now called from all architectures, including
         those that were previously missing them
       - dma_alloc_from_coherent and dma_release_from_coherent are now always
         called from the generic alloc/free routines instead of the ops
         dma-mapping-common.h always includes dma-coherent.h to get the defintions
         for them, or the stubs if the architecture doesn't support this feature
       - checks for ->alloc / ->free presence are removed.  There is only one
         magic instead of dma_map_ops without them (mic_dma_ops) and that one
         is x86 only anyway.
      
      Besides that only x86 needs special treatment to replace a default devices
      if none is passed and tweak the gfp_flags.  An optional arch hook is provided
      for that.
      
      [linux@roeck-us.net: fix build]
      [jcmvbkbc@gmail.com: fix xtensa]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: NMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6894258e
    • O
      mm, mpx: add "vm_flags_t vm_flags" arg to do_mmap_pgoff() · 1fcfd8db
      Oleg Nesterov 提交于
      Add the additional "vm_flags_t vm_flags" argument to do_mmap_pgoff(),
      rename it to do_mmap(), and re-introduce do_mmap_pgoff() as a simple
      wrapper on top of do_mmap().  Perhaps we should update the callers of
      do_mmap_pgoff() and kill it later.
      
      This way mpx_mmap() can simply call do_mmap(vm_flags => VM_MPX) and do not
      play with vm internals.
      
      After this change mmap_region() has a single user outside of mmap.c,
      arch/tile/mm/elf.c:arch_setup_additional_pages().  It would be nice to
      change arch/tile/ and unexport mmap_region().
      
      [kirill@shutemov.name: fix build]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NDave Hansen <dave.hansen@linux.intel.com>
      Tested-by: NDave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1fcfd8db
    • K
      mm: mark most vm_operations_struct const · 7cbea8dc
      Kirill A. Shutemov 提交于
      With two exceptions (drm/qxl and drm/radeon) all vm_operations_struct
      structs should be constant.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: NOleg Nesterov <oleg@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7cbea8dc
    • Y
      lib/decompressors: use real out buf size for gunzip with kernel · 2d3862d2
      Yinghai Lu 提交于
      When loading x86 64bit kernel above 4GiB with patched grub2, got kernel
      gunzip error.
      
      | early console in decompress_kernel
      | decompress_kernel:
      |       input: [0x807f2143b4-0x807ff61aee]
      |      output: [0x807cc00000-0x807f3ea29b] 0x027ea29c: output_len
      | boot via startup_64
      | KASLR using RDTSC...
      |  new output: [0x46fe000000-0x470138cfff] 0x0338d000: output_run_size
      |  decompress: [0x46fe000000-0x47007ea29b] <=== [0x807f2143b4-0x807ff61aee]
      |
      | Decompressing Linux... gz...
      |
      | uncompression error
      |
      | -- System halted
      
      the new buffer is at 0x46fe000000ULL, decompressor_gzip is using
      0xffffffb901ffffff as out_len.  gunzip in lib/zlib_inflate/inflate.c cap
      that len to 0x01ffffff and decompress fails later.
      
      We could hit this problem with crashkernel booting that uses kexec loading
      kernel above 4GiB.
      
      We have decompress_* support:
          1. inbuf[]/outbuf[] for kernel preboot.
          2. inbuf[]/flush() for initramfs
          3. fill()/flush() for initrd.
      This bug only affect kernel preboot path that use outbuf[].
      
      Add __decompress and take real out_buf_len for gunzip instead of guessing
      wrong buf size.
      
      Fixes: 1431574a (lib/decompressors: fix "no limit" output buffer length)
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Cc: Alexandre Courbot <acourbot@nvidia.com>
      Cc: Jon Medhurst <tixy@linaro.org>
      Cc: Stephen Warren <swarren@wwwdotorg.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2d3862d2
    • D
      kexec: split kexec_load syscall from kexec core code · 2965faa5
      Dave Young 提交于
      There are two kexec load syscalls, kexec_load another and kexec_file_load.
       kexec_file_load has been splited as kernel/kexec_file.c.  In this patch I
      split kexec_load syscall code to kernel/kexec.c.
      
      And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and
      use kexec_file_load only, or vice verse.
      
      The original requirement is from Ted Ts'o, he want kexec kernel signature
      being checked with CONFIG_KEXEC_VERIFY_SIG enabled.  But kexec-tools use
      kexec_load syscall can bypass the checking.
      
      Vivek Goyal proposed to create a common kconfig option so user can compile
      in only one syscall for loading kexec kernel.  KEXEC/KEXEC_FILE selects
      KEXEC_CORE so that old config files still work.
      
      Because there's general code need CONFIG_KEXEC_CORE, so I updated all the
      architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects
      KEXEC_CORE in arch Kconfig.  Also updated general kernel code with to
      kexec_load syscall.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NDave Young <dyoung@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Petr Tesarik <ptesarik@suse.cz>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Josh Boyer <jwboyer@fedoraproject.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2965faa5
  4. 10 9月, 2015 13 次提交
  5. 09 9月, 2015 10 次提交
    • V
      hexagon/time: Migrate to new 'set-state' interface · d70e22d5
      Viresh Kumar 提交于
      Migrate hexagon driver to the new 'set-state' interface provided by
      clockevents core, the earlier 'set-mode' interface is marked obsolete
      now.
      
      This also enables us to implement callbacks for new states of clockevent
      devices, for example: ONESHOT_STOPPED.
      
      We weren't doing anything in the ->set_mode() callback. So, this patch
      doesn't provide any set-state callbacks.
      
      Cc: linux-hexagon@vger.kernel.org
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRichard Kuo <rkuo@codeaurora.org>
      d70e22d5
    • V
      mm: rename alloc_pages_exact_node() to __alloc_pages_node() · 96db800f
      Vlastimil Babka 提交于
      alloc_pages_exact_node() was introduced in commit 6484eb3e ("page
      allocator: do not check NUMA node ID when the caller knows the node is
      valid") as an optimized variant of alloc_pages_node(), that doesn't
      fallback to current node for nid == NUMA_NO_NODE.  Unfortunately the
      name of the function can easily suggest that the allocation is
      restricted to the given node and fails otherwise.  In truth, the node is
      only preferred, unless __GFP_THISNODE is passed among the gfp flags.
      
      The misleading name has lead to mistakes in the past, see for example
      commits 5265047a ("mm, thp: really limit transparent hugepage
      allocation to local node") and b360edb4 ("mm, mempolicy:
      migrate_to_node should only migrate to node").
      
      Another issue with the name is that there's a family of
      alloc_pages_exact*() functions where 'exact' means exact size (instead
      of page order), which leads to more confusion.
      
      To prevent further mistakes, this patch effectively renames
      alloc_pages_exact_node() to __alloc_pages_node() to better convey that
      it's an optimized variant of alloc_pages_node() not intended for general
      usage.  Both functions get described in comments.
      
      It has been also considered to really provide a convenience function for
      allocations restricted to a node, but the major opinion seems to be that
      __GFP_THISNODE already provides that functionality and we shouldn't
      duplicate the API needlessly.  The number of users would be small
      anyway.
      
      Existing callers of alloc_pages_exact_node() are simply converted to
      call __alloc_pages_node(), with the exception of sba_alloc_coherent()
      which open-codes the check for NUMA_NO_NODE, so it is converted to use
      alloc_pages_node() instead.  This means it no longer performs some
      VM_BUG_ON checks, and since the current check for nid in
      alloc_pages_node() uses a 'nid < 0' comparison (which includes
      NUMA_NO_NODE), it may hide wrong values which would be previously
      exposed.
      
      Both differences will be rectified by the next patch.
      
      To sum up, this patch makes no functional changes, except temporarily
      hiding potentially buggy callers.  Restricting the checks in
      alloc_pages_node() is left for the next patch which can in turn expose
      more existing buggy callers.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NRobin Holt <robinmholt@gmail.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NChristoph Lameter <cl@linux.com>
      Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Cliff Whickman <cpw@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      96db800f
    • M
      x86: use generic early mem copy · 5dd2c4bd
      Mark Salter 提交于
      The early_ioremap library now has a generic copy_from_early_mem()
      function.  Use the generic copy function for x86 relocate_initrd().
      
      [akpm@linux-foundation.org: remove MAX_MAP_CHUNK define, per Yinghai Lu]
      Signed-off-by: NMark Salter <msalter@redhat.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5dd2c4bd
    • M
      arm64: support initrd outside kernel linear map · 1570f0d7
      Mark Salter 提交于
      The use of mem= could leave part or all of the initrd outside of the
      kernel linear map.  This will lead to an error when unpacking the initrd
      and a probable failure to boot.  This patch catches that situation and
      relocates the initrd to be fully within the linear map.
      Signed-off-by: NMark Salter <msalter@redhat.com>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1570f0d7
    • T
      mem-hotplug: handle node hole when initializing numa_meminfo. · 95cf82ec
      Tang Chen 提交于
      When parsing SRAT, all memory ranges are added into numa_meminfo.  In
      numa_init(), before entering numa_cleanup_meminfo(), all possible memory
      ranges are in numa_meminfo.  And numa_cleanup_meminfo() removes all
      ranges over max_pfn or empty.
      
      But, this only works if the nodes are continuous.  Let's have a look at
      the following example:
      
      We have an SRAT like this:
      SRAT: Node 0 PXM 0 [mem 0x00000000-0x5fffffff]
      SRAT: Node 0 PXM 0 [mem 0x100000000-0x1ffffffffff]
      SRAT: Node 1 PXM 1 [mem 0x20000000000-0x3ffffffffff]
      SRAT: Node 4 PXM 2 [mem 0x40000000000-0x5ffffffffff] hotplug
      SRAT: Node 5 PXM 3 [mem 0x60000000000-0x7ffffffffff] hotplug
      SRAT: Node 2 PXM 4 [mem 0x80000000000-0x9ffffffffff] hotplug
      SRAT: Node 3 PXM 5 [mem 0xa0000000000-0xbffffffffff] hotplug
      SRAT: Node 6 PXM 6 [mem 0xc0000000000-0xdffffffffff] hotplug
      SRAT: Node 7 PXM 7 [mem 0xe0000000000-0xfffffffffff] hotplug
      
      On boot, only node 0,1,2,3 exist.
      
      And the numa_meminfo will look like this:
      numa_meminfo.nr_blks = 9
      1. on node 0: [0, 60000000]
      2. on node 0: [100000000, 20000000000]
      3. on node 1: [20000000000, 40000000000]
      4. on node 4: [40000000000, 60000000000]
      5. on node 5: [60000000000, 80000000000]
      6. on node 2: [80000000000, a0000000000]
      7. on node 3: [a0000000000, a0800000000]
      8. on node 6: [c0000000000, a0800000000]
      9. on node 7: [e0000000000, a0800000000]
      
      And numa_cleanup_meminfo() will merge 1 and 2, and remove 8,9 because the
      end address is over max_pfn, which is a0800000000.  But 4 and 5 are not
      removed because their end addresses are less then max_pfn.  But in fact,
      node 4 and 5 don't exist.
      
      In a word, numa_cleanup_meminfo() is not able to handle holes between nodes.
      
      Since memory ranges in node 4 and 5 are in numa_meminfo, in
      numa_register_memblks(), node 4 and 5 will be mistakenly set to online.
      
      If you run lscpu, it will show:
      NUMA node0 CPU(s):     0-14,128-142
      NUMA node1 CPU(s):     15-29,143-157
      NUMA node2 CPU(s):
      NUMA node3 CPU(s):
      NUMA node4 CPU(s):     62-76,190-204
      NUMA node5 CPU(s):     78-92,206-220
      
      In this patch, we use memblock_overlaps_region() to check if ranges in
      numa_meminfo overlap with ranges in memory_block.  Since memory_block
      contains all available memory at boot time, if they overlap, it means the
      ranges exist.  If not, then remove them from numa_meminfo.
      
      After this patch, lscpu will show:
      NUMA node0 CPU(s):     0-14,128-142
      NUMA node1 CPU(s):     15-29,143-157
      NUMA node4 CPU(s):     62-76,190-204
      NUMA node5 CPU(s):     78-92,206-220
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Vladimir Murzin <vladimir.murzin@arm.com>
      Cc: Fabian Frederick <fabf@skynet.be>
      Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
      Cc: Baoquan He <bhe@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      95cf82ec
    • M
      sparc32: do not include swap.h from pgtable_32.h · b3d9ed3f
      Michal Hocko 提交于
      "memcg: export struct mem_cgroup" will add includes into
      linux/memcontrol.h which lead to further header dependency issues as
      reported by Guenter Roeck:
      
        In file included from include/linux/highmem.h:7:0,
                         from include/linux/bio.h:23,
                         from include/linux/writeback.h:192,
                         from include/linux/memcontrol.h:30,
                         from include/linux/swap.h:8,
                         from ./arch/sparc/include/asm/pgtable_32.h:17,
                         from ./arch/sparc/include/asm/pgtable.h:6,
                         from arch/sparc/kernel/traps_32.c:23:
        include/linux/mm.h: In function 'is_vmalloc_addr':
        include/linux/mm.h:371:17: error: 'VMALLOC_START' undeclared (first use in this function)
        include/linux/mm.h:371:17: note: each undeclared identifier is reported only once for each function it appears in
        include/linux/mm.h:371:41: error: 'VMALLOC_END' undeclared (first use in this function)
        include/linux/mm.h: In function 'maybe_mkwrite':
        include/linux/mm.h:556:3: error: implicit declaration of function 'pte_mkwrite'
      
      The issue is that pgtable_32.h depends on swap.h to get swap_entry_t but
      that goes all the way down to linux/mm.h which wants to have VMALLOC_*
      which is defined later in pgtable_32.h, though.
      
      swap_entry_t is defined in include/mm_types.h so it should be sufficient
      to include this header without more dependencies.
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Reported-by: NGuenter Roeck <linux@roeck-us.net>
      Tested-by: NGuenter Roeck <linux@roeck-us.net>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b3d9ed3f
    • J
      xen/privcmd: Further s/MFN/GFN/ clean-up · a13d7201
      Julien Grall 提交于
      The privcmd code is mixing the usage of GFN and MFN within the same
      functions which make the code difficult to understand when you only work
      with auto-translated guests.
      
      The privcmd driver is only dealing with GFN so replace all the mention
      of MFN into GFN.
      
      The ioctl structure used to map foreign change has been left unchanged
      given that the userspace is using it. Nonetheless, add a comment to
      explain the expected value within the "mfn" field.
      Signed-off-by: NJulien Grall <julien.grall@citrix.com>
      Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      a13d7201
    • J
      xen: Use correctly the Xen memory terminologies · 0df4f266
      Julien Grall 提交于
      Based on include/xen/mm.h [1], Linux is mistakenly using MFN when GFN
      is meant, I suspect this is because the first support for Xen was for
      PV. This resulted in some misimplementation of helpers on ARM and
      confused developers about the expected behavior.
      
      For instance, with pfn_to_mfn, we expect to get an MFN based on the name.
      Although, if we look at the implementation on x86, it's returning a GFN.
      
      For clarity and avoid new confusion, replace any reference to mfn with
      gfn in any helpers used by PV drivers. The x86 code will still keep some
      reference of pfn_to_mfn which may be used by all kind of guests
      No changes as been made in the hypercall field, even
      though they may be invalid, in order to keep the same as the defintion
      in xen repo.
      
      Note that page_to_mfn has been renamed to xen_page_to_gfn to avoid a
      name to close to the KVM function gfn_to_page.
      
      Take also the opportunity to simplify simple construction such
      as pfn_to_mfn(page_to_pfn(page)) into xen_page_to_gfn. More complex clean up
      will come in follow-up patches.
      
      [1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=e758ed14f390342513405dd766e874934573e6cbSigned-off-by: NJulien Grall <julien.grall@citrix.com>
      Reviewed-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Acked-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
      Acked-by: NWei Liu <wei.liu2@citrix.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      0df4f266
    • J
      arm/xen: implement correctly pfn_to_mfn · 5192b35d
      Julien Grall 提交于
      After the commit introducing convertion between DMA and guest addresses,
      all the callers of pfn_to_mfn are expecting to get a GFN (Guest Frame
      Number). On ARM, all the guests are auto-translated so the GFN is equal
      to the Linux PFN (Pseudo-physical Frame Number).
      
      The current implementation may return an MFN if the caller is passing a
      PFN associated to a mapped foreign grant. In pratice, I haven't seen
      the problem on running guest but we should fix it for the sake of
      correctness.
      
      Correct the implementation by always returning the pfn passed in parameter.
      
      A follow-up patch will take care to rename pfn_to_mfn to a suitable
      name.
      Signed-off-by: NJulien Grall <julien.grall@citrix.com>
      Reviewed-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      5192b35d
    • J
      xen: Make clear that swiotlb and biomerge are dealing with DMA address · 32e09870
      Julien Grall 提交于
      The swiotlb is required when programming a DMA address on ARM when a
      device is not protected by an IOMMU.
      
      In this case, the DMA address should always be equal to the machine address.
      For DOM0 memory, Xen ensure it by have an identity mapping between the
      guest address and host address. However, when mapping a foreign grant
      reference, the 1:1 model doesn't work.
      
      For ARM guest, most of the callers of pfn_to_mfn expects to get a GFN
      (Guest Frame Number), i.e a PFN (Page Frame Number) from the Linux point
      of view given that all ARM guest are auto-translated.
      
      Even though the name pfn_to_mfn is misleading, we need to ensure that
      those caller get a GFN and not by mistake a MFN. In pratical, I haven't
      seen error related to this but we should fix it for the sake of
      correctness.
      
      In order to fix the implementation of pfn_to_mfn on ARM in a follow-up
      patch, we have to introduce new helpers to return the DMA from a PFN and
      the invert.
      
      On x86, the new helpers will be an alias of pfn_to_mfn and mfn_to_pfn.
      
      The helpers will be used in swiotlb and xen_biovec_phys_mergeable.
      
      This is necessary in the latter because we have to ensure that the
      biovec code will not try to merge a biovec using foreign page and
      another using Linux memory.
      
      Lastly, the helper mfn_to_local_pfn has been renamed to bfn_to_local_pfn
      given that the only usage was in swiotlb.
      Signed-off-by: NJulien Grall <julien.grall@citrix.com>
      Reviewed-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      32e09870
  6. 08 9月, 2015 6 次提交
    • H
      6dc0dcde
    • H
      parisc: Drop CONFIG_SMP around update_cr16_clocksource() · 72581cec
      Helge Deller 提交于
      No need to use CONFIG_SMP around update_cr16_clocksource(). It checks for
      num_online_cpus() beeing greater than 1, which is always 1 in UP builds.
      Signed-off-by: NHelge Deller <deller@gmx.de>
      72581cec
    • J
      xen: switch extra memory accounting to use pfns · 626d7508
      Juergen Gross 提交于
      Instead of using physical addresses for accounting of extra memory
      areas available for ballooning switch to pfns as this is much less
      error prone regarding partial pages.
      Reported-by: NRoger Pau Monné <roger.pau@citrix.com>
      Tested-by: NRoger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      626d7508
    • J
      xen: limit memory to architectural maximum · cb9e444b
      Juergen Gross 提交于
      When a pv-domain (including dom0) is started it tries to size it's
      p2m list according to the maximum possible memory amount it ever can
      achieve. Limit the initial maximum memory size to the architectural
      limit of the hardware in order to avoid overflows during remapping
      of memory.
      
      This problem will occur when dom0 is started with an initial memory
      size being a multiple of 1GB, but without specifying it's maximum
      memory size. The kernel must be configured without
      CONFIG_XEN_BALLOON_MEMORY_HOTPLUG for the problem to happen.
      Reported-by: NRoger Pau Monné <roger.pau@citrix.com>
      Tested-by: NRoger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      cb9e444b
    • J
      xen: avoid another early crash of memory limited dom0 · ab24507c
      Juergen Gross 提交于
      Commit b1c9f169047b ("xen: split counting of extra memory pages...")
      introduced an error when dom0 was started with limited memory occurring
      only on some hardware.
      
      The problem arises in case dom0 is started with initial memory and
      maximum memory being the same. The kernel must be configured without
      CONFIG_XEN_BALLOON_MEMORY_HOTPLUG for the problem to happen. If all
      of this is true and the E820 map of the machine is sparse (some areas
      are not covered) then the machine might crash early in the boot
      process.
      
      An example E820 map triggering the problem looks like this:
      
      [    0.000000] e820: BIOS-provided physical RAM map:
      [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009d7ff] usable
      [    0.000000] BIOS-e820: [mem 0x000000000009d800-0x000000000009ffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
      [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000cf7fafff] usable
      [    0.000000] BIOS-e820: [mem 0x00000000cf7fb000-0x00000000cf95ffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000cf960000-0x00000000cfb62fff] ACPI NVS
      [    0.000000] BIOS-e820: [mem 0x00000000cfb63000-0x00000000cfd14fff] usable
      [    0.000000] BIOS-e820: [mem 0x00000000cfd15000-0x00000000cfd61fff] ACPI NVS
      [    0.000000] BIOS-e820: [mem 0x00000000cfd62000-0x00000000cfd6cfff] ACPI data
      [    0.000000] BIOS-e820: [mem 0x00000000cfd6d000-0x00000000cfd6ffff] ACPI NVS
      [    0.000000] BIOS-e820: [mem 0x00000000cfd70000-0x00000000cfd70fff] usable
      [    0.000000] BIOS-e820: [mem 0x00000000cfd71000-0x00000000cfea8fff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000cfea9000-0x00000000cfeb9fff] ACPI NVS
      [    0.000000] BIOS-e820: [mem 0x00000000cfeba000-0x00000000cfecafff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000cfecb000-0x00000000cfecbfff] ACPI NVS
      [    0.000000] BIOS-e820: [mem 0x00000000cfecc000-0x00000000cfedbfff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000cfedc000-0x00000000cfedcfff] ACPI NVS
      [    0.000000] BIOS-e820: [mem 0x00000000cfedd000-0x00000000cfeddfff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000cfede000-0x00000000cfee3fff] ACPI NVS
      [    0.000000] BIOS-e820: [mem 0x00000000cfee4000-0x00000000cfef6fff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000cfef7000-0x00000000cfefffff] usable
      [    0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fec10000-0x00000000fec10fff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed00fff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fed40000-0x00000000fed44fff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fed61000-0x00000000fed70fff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fed80000-0x00000000fed8ffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
      [    0.000000] BIOS-e820: [mem 0x0000000100001000-0x000000020effffff] usable
      
      In this case the area a0000-dffff isn't present in the map. This will
      confuse the memory setup of the domain when remapping the memory from
      such holes to populated areas.
      
      To avoid the problem the accounting of to be remapped memory has to
      count such holes in the E820 map as well.
      Reported-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      ab24507c
    • J
      xen: avoid early crash of memory limited dom0 · eafd72e0
      Juergen Gross 提交于
      Commit b1c9f169047b ("xen: split counting of extra memory pages...")
      introduced an error when dom0 was started with limited memory.
      
      The problem arises in case dom0 is started with initial memory and
      maximum memory being the same and exactly a multiple of 1 GB. The
      kernel must be configured without CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
      for the problem to happen. In this case it will crash very early
      during boot due to the virtual mapped p2m list not being large
      enough to be able to remap any memory:
      
      (XEN) Freed 304kB init memory.
      mapping kernel into physical memory
      about to get started...
      (XEN) traps.c:459:d0v0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
      (XEN) domain_crash_sync called from entry.S: fault at ffff82d080229a93 create_bounce_frame+0x12b/0x13a
      (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
      (XEN) ----[ Xen-4.5.2-pre  x86_64  debug=n Not tainted ]----
      (XEN) CPU:    0
      (XEN) RIP:    e033:[<ffffffff81d120cb>]
      (XEN) RFLAGS: 0000000000000206   EM: 1 CONTEXT: pv guest (d0v0)
      (XEN) rax: ffffffff81db2000   rbx: 000000004d000000   rcx: 0000000000000000
      (XEN) rdx: 000000004d000000   rsi: 0000000000063000   rdi: 000000004d063000
      (XEN) rbp: ffffffff81c03d78   rsp: ffffffff81c03d28   r8:  0000000000023000
      (XEN) r9:  00000001040ff000   r10: 0000000000007ff0   r11: 0000000000000000
      (XEN) r12: 0000000000063000   r13: 000000000004d000   r14: 0000000000000063
      (XEN) r15: 0000000000000063   cr0: 0000000080050033   cr4: 00000000000006f0
      (XEN) cr3: 0000000105c0f000   cr2: ffffc90000268000
      (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
      (XEN) Guest stack trace from rsp=ffffffff81c03d28:
      (XEN)   0000000000000000 0000000000000000 ffffffff81d120cb 000000010000e030
      (XEN)   0000000000010006 ffffffff81c03d68 000000000000e02b ffffffffffffffff
      (XEN)   0000000000000063 000000000004d063 ffffffff81c03de8 ffffffff81d130a7
      (XEN)   ffffffff81c03de8 000000000004d000 00000001040ff000 0000000000105db1
      (XEN)   00000001040ff001 000000000004d062 ffff8800092d6ff8 0000000002027000
      (XEN)   ffff8800094d8340 ffff8800092d6ff8 00003ffffffff000 ffff8800092d7ff8
      (XEN)   ffffffff81c03e48 ffffffff81d13c43 ffff8800094d8000 ffff8800094d9000
      (XEN)   0000000000000000 ffff8800092d6000 00000000092d6000 000000004cfbf000
      (XEN)   00000000092d6000 00000000052d5442 0000000000000000 0000000000000000
      (XEN)   ffffffff81c03ed8 ffffffff81d185c1 0000000000000000 0000000000000000
      (XEN)   ffffffff81c03e78 ffffffff810f8ca4 ffffffff81c03ed8 ffffffff8171a15d
      (XEN)   0000000000000010 ffffffff81c03ee8 0000000000000000 0000000000000000
      (XEN)   ffffffff81f0e402 ffffffffffffffff ffffffff81dae900 0000000000000000
      (XEN)   0000000000000000 0000000000000000 ffffffff81c03f28 ffffffff81d0cf0f
      (XEN)   0000000000000000 0000000000000000 0000000000000000 ffffffff81db82e0
      (XEN)   0000000000000000 0000000000000000 0000000000000000 0000000000000000
      (XEN)   ffffffff81c03f38 ffffffff81d0c603 ffffffff81c03ff8 ffffffff81d11c86
      (XEN)   0300000100000032 0000000000000005 0000000000000020 0000000000000000
      (XEN)   0000000000000000 0000000000000000 0000000000000000 0000000000000000
      (XEN)   0000000000000000 0000000000000000 0000000000000000 0000000000000000
      (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
      
      This can be avoided by allocating aneough space for the p2m to cover
      the maximum memory of dom0 plus the identity mapped holes required
      for PCI space, BIOS etc.
      Reported-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      eafd72e0