1. 24 Mar, 2013 2 commits
    • K
      block: Avoid deadlocks with bio allocation by stacking drivers · df2cb6da
      Kent Overstreet committed
      Previously, if we ever try to allocate more than once from the same bio
      set while running under generic_make_request() (i.e. a stacking block
      driver), we risk deadlock.
      
      This is because of the code in generic_make_request() that converts
      recursion to iteration; any bios we submit won't actually be submitted
      (so they can complete and eventually be freed) until after we return -
      this means if we allocate a second bio, we're blocking the first one
      from ever being freed.
      
      Thus if enough threads call into a stacking block driver at the same
      time with bios that need multiple splits, and the bio_set's reserve gets
      used up, we deadlock.
      
      This can be worked around in the driver code - we could check if we're
      running under generic_make_request(), then mask out __GFP_WAIT when we
      go to allocate a bio, and if the allocation fails punt to workqueue and
      retry the allocation.
      
      But this is tricky and not a generic solution. This patch solves it for
      all users by inverting the previously described technique. We allocate a
      rescuer workqueue for each bio_set, and then in the allocation code if
      there are bios on current->bio_list we would be blocking, we punt them
      to the rescuer workqueue to be submitted.
      
      This guarantees forward progress for bio allocations under
      generic_make_request() provided each bio is submitted before allocating
      the next, and provided the bios are freed after they complete.
      
      Note that this doesn't do anything for allocation from other mempools.
      Instead of allocating per bio data structures from a mempool, code
      should use bio_set's front_pad.
      
      Tested it by forcing the rescue codepath to be taken (by disabling the
      first GFP_NOWAIT attempt), and then ran it with bcache (which does a lot
      of arbitrary bio splitting) and verified that the rescuer was being
      invoked.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Muthukumar Ratty <muthur@gmail.com>
      df2cb6da
    • K
      block: Reorder struct bio_set · 57fb233f
      Kent Overstreet committed
      This is prep work for the next patch, which embeds a struct bio_list in
      struct bio_set.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      57fb233f
  2. 18 Mar, 2013 1 commit
  3. 16 Mar, 2013 1 commit
  4. 15 Mar, 2013 1 commit
    • P
      list: Fix double fetch of pointer in hlist_entry_safe() · f65846a1
      Paul E. McKenney committed
      The current version of hlist_entry_safe() fetches the pointer twice,
      once to test for NULL and the other to compute the offset back to the
      enclosing structure.  This is OK for normal lock-based use because in
      that case, the pointer cannot change.  However, when the pointer is
      protected by RCU (as in "rcu_dereference(p)"), then the pointer can
      change at any time.  This use case can result in the following sequence
      of events:
      
      1.	CPU 0 invokes hlist_entry_safe(), fetches the RCU-protected
      	pointer and sees that it is non-NULL.
      
      2.	CPU 1 invokes hlist_del_rcu(), deleting the entry that CPU 0
      	just fetched a pointer to.  Because this is the last entry
      	in the list, the pointer fetched by CPU 0 is now NULL.
      
      3.	CPU 0 refetches the pointer, obtains NULL, and then gets a
      	NULL-pointer crash.
      
      This commit therefore applies gcc's "({ })" statement expression to
      create a temporary variable so that the specified pointer is fetched
      only once, avoiding the above sequence of events.  Please note that
      it is the caller's responsibility to use rcu_dereference() as needed.
      This allows RCU-protected uses to work correctly without imposing
      any additional overhead on the non-RCU case.
      
      Many thanks to Eric Dumazet for spotting the root cause!
      Reported-by: CAI Qian <caiqian@redhat.com>
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Li Zefan <lizefan@huawei.com>
      f65846a1
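The double fetch described in the commit above can be illustrated in user space. The following is a minimal sketch, assuming GCC or Clang (statement expressions are a GNU extension). The names `entry_safe_double`, `entry_safe_single`, `fetch_head` and `struct item` are hypothetical and exist only for this illustration; a counter stands in for the racing CPU by recording how often the pointer expression is evaluated.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next; };
struct item { int value; struct hlist_node node; };

/* Map a pointer to the embedded node back to its enclosing structure. */
#define hlist_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Double-fetch form: "ptr" is evaluated once for the NULL test and once
 * more for the offset computation. */
#define entry_safe_double(ptr, type, member) \
	((ptr) ? hlist_entry(ptr, type, member) : NULL)

/* Single-fetch form: a statement expression stores the pointer in a
 * temporary, so "ptr" is evaluated exactly once. */
#define entry_safe_single(ptr, type, member) \
	({ __typeof__(ptr) ____ptr = (ptr); \
	   ____ptr ? hlist_entry(____ptr, type, member) : NULL; })

static struct hlist_node *shared_head;  /* stands in for an RCU-protected pointer */
static int fetch_count;                 /* number of times the pointer was read */

static struct hlist_node *fetch_head(void)
{
	fetch_count++;
	return shared_head;
}

/* Return how many fetches the double-fetch macro performs. */
static int fetches_with_double(struct item *it)
{
	shared_head = &it->node;
	fetch_count = 0;
	struct item *e = entry_safe_double(fetch_head(), struct item, node);
	assert(e == it);
	return fetch_count;
}

/* Return how many fetches the single-fetch macro performs. */
static int fetches_with_single(struct item *it)
{
	shared_head = &it->node;
	fetch_count = 0;
	struct item *e = entry_safe_single(fetch_head(), struct item, node);
	assert(e == it);
	return fetch_count;
}
```

Calling `fetches_with_double()` and `fetches_with_single()` on the same `struct item` shows the difference: any update that lands between the two reads in the first form can make the NULL test and the offset computation disagree, while the second form leaves no such window.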
  5. 14 Mar, 2013 5 commits
    • D
      UAPI: fix endianness conditionals in linux/raid/md_p.h · ca044f9a
      David Howells committed
      In the UAPI header files, __BIG_ENDIAN and __LITTLE_ENDIAN must be
      compared against __BYTE_ORDER in preprocessor conditionals where these are
      exposed to userspace (that is they're not inside __KERNEL__ conditionals).
      
      However, in the main kernel the norm is to check for
      "defined(__XXX_ENDIAN)" rather than comparing against __BYTE_ORDER and
      this has incorrectly leaked into the userspace headers.
      
      The definition of struct mdp_superblock_s in linux/raid/md_p.h is wrong in
      this way.  Note that userspace will likely interpret the ordering of the
      fields incorrectly as the big-endian variant on little-endian machines -
      depending on header inclusion order.
      
      [!!!] NOTE [!!!]  This patch may adversely change the userspace API.  It might
      be better to fix the ordering of events_hi, events_lo, cp_events_hi and
      cp_events_lo in struct mdp_superblock_s / typedef mdp_super_t.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ca044f9a
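The difference between the two styles of endianness test can be seen from user space. This is a minimal sketch, assuming a Linux C library (glibc or musl) whose `<endian.h>` defines `__BYTE_ORDER`, `__LITTLE_ENDIAN` and `__BIG_ENDIAN` all at once. The function names are hypothetical and not part of any kernel header.

```c
#include <assert.h>
#include <endian.h>  /* assumed to define __BYTE_ORDER, __LITTLE_ENDIAN, __BIG_ENDIAN */

/* Kernel-internal style: tests only whether the macro exists.  In user
 * space both macros exist, so this answers "yes" on every machine. */
static int defined_style_says_big(void)
{
#if defined(__BIG_ENDIAN)
	return 1;
#else
	return 0;
#endif
}

/* UAPI style: compares the macro against __BYTE_ORDER. */
static int compare_style_says_big(void)
{
#if __BYTE_ORDER == __BIG_ENDIAN
	return 1;
#else
	return 0;
#endif
}

/* Run-time probe used as ground truth: look at the first byte of 1. */
static int runtime_is_big(void)
{
	const unsigned int probe = 1;
	return *(const unsigned char *)&probe == 0;
}
```

On a little-endian machine `defined_style_says_big()` still returns 1, which is the kind of misreading the commit message warns about, whereas `compare_style_says_big()` agrees with `runtime_is_big()`.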
    • D
      UAPI: fix endianness conditionals in linux/acct.h · 29ba06b9
      David Howells committed
      In the UAPI header files, __BIG_ENDIAN and __LITTLE_ENDIAN must be
      compared against __BYTE_ORDER in preprocessor conditionals where these are
      exposed to userspace (that is they're not inside __KERNEL__ conditionals).
      
      However, in the main kernel the norm is to check for
      "defined(__XXX_ENDIAN)" rather than comparing against __BYTE_ORDER and
      this has incorrectly leaked into the userspace headers.
      
      The definition of ACCT_BYTEORDER in linux/acct.h is wrong in this way.
      Note that userspace will likely interpret this incorrectly as the
      big-endian variant on little-endian machines - depending on header
      inclusion order.
      
      [!!!] NOTE [!!!]  This patch may adversely change the userspace API.  It might
      be better to fix the value of ACCT_BYTEORDER.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      29ba06b9
    • D
      UAPI: fix endianness conditionals in linux/aio_abi.h · 51b154ed
      David Howells committed
      In the UAPI header files, __BIG_ENDIAN and __LITTLE_ENDIAN must be
      compared against __BYTE_ORDER in preprocessor conditionals where these are
      exposed to userspace (that is they're not inside __KERNEL__ conditionals).
      
      However, in the main kernel the norm is to check for
      "defined(__XXX_ENDIAN)" rather than comparing against __BYTE_ORDER and
      this has incorrectly leaked into the userspace headers.
      
      The definition of PADDED() in linux/aio_abi.h is wrong in this way.  Note
      that userspace will likely interpret this and thus the order of fields in
      struct iocb incorrectly as the little-endian variant on big-endian
      machines - depending on header inclusion order.
      
      [!!!] NOTE [!!!]  This patch may adversely change the userspace API.  It might
      be better to fix the ordering of aio_key and aio_reserved1 in struct iocb.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Benjamin LaHaise <bcrl@kvack.org>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      51b154ed
    • T
      idr: deprecate idr_pre_get() and idr_get_new[_above]() · c8615d37
      Tejun Heo committed
      Now that all in-kernel users are converted to use the new alloc
      interface, mark the old interface deprecated.  We should be able to
      remove these in a few releases.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c8615d37
    • A
      include/linux/res_counter.h needs errno.h · ebf47beb
      Andrew Morton committed
      alpha allmodconfig:
      
        In file included from mm/memcontrol.c:28:
        include/linux/res_counter.h: In function 'res_counter_set_limit':
        include/linux/res_counter.h:203: error: 'EBUSY' undeclared (first use in this function)
        include/linux/res_counter.h:203: error: (Each undeclared identifier is reported only once
        include/linux/res_counter.h:203: error: for each function it appears in.)
      
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Frederic Weisbecker <fweisbec@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ebf47beb
  6. 13 Mar, 2013 2 commits
  7. 12 Mar, 2013 4 commits
  8. 09 Mar, 2013 1 commit
  9. 08 Mar, 2013 1 commit
  10. 06 Mar, 2013 1 commit
    • K
      acpi: Export the acpi_processor_get_performance_info · c705c78c
      Konrad Rzeszutek Wilk committed
      The git commit d5aaffa9
      (cpufreq: handle cpufreq being disabled for all exported function)
      tightens the cpufreq API by returning errors when disable_cpufreq()
      had been called.
      
      The problem we are hitting is that the module xen-acpi-processor which
      uses the ACPI's functions: acpi_processor_register_performance,
      acpi_processor_preregister_performance, and acpi_processor_notify_smm
      fails at acpi_processor_register_performance with -22.
      
      Note that earlier during bootup in arch/x86/xen/setup.c there is also
      a call to cpufreq's API: disable_cpufreq().
      
      This is b/c we want the Linux kernel to parse the ACPI data, but leave
      the cpufreq decisions to the hypervisor.
      
      In v3.9 all the checks that d5aaffa9
      added are now hit and the calls to cpufreq_register_notifier will now
      fail. This means that acpi_processor_ppc_init ends up printing:
      
      "Warning: Processor Platform Limit not supported"
      
      and the acpi_processor_ppc_status is not set.
      
      The repercussion of that is that the call to
      acpi_processor_register_performance fails right away at:
      
      	if (!(acpi_processor_ppc_status & PPC_REGISTERED))
      
      and we don't progress any further on parsing and extracting the _P*
      objects.
      
      The only reason the Xen code called that function was b/c it was
      exported and the only way to gather the P-states. But we can also
      just make acpi_processor_get_performance_info be exported and not
      use acpi_processor_register_performance. This patch does so.
      Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      c705c78c
  11. 05 Mar, 2013 1 commit
    • N
      usb: gadget: composite: fix kernel-doc warnings · 43febb27
      Nishanth Menon committed
      A few trivial fixes for composite driver:
      
      Warning(include/linux/usb/composite.h:165): No description found for parameter
      	'fs_descriptors'
      Warning(include/linux/usb/composite.h:165): Excess struct/union/enum/typedef
      	member 'descriptors' description in 'usb_function'
      Warning(include/linux/usb/composite.h:321): No description found for parameter
      	'gadget_driver'
      Warning(drivers/usb/gadget/composite.c:1777): Excess function parameter 'bind'
      	description in 'usb_composite_probe'
      
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Kosina <trivial@kernel.org>
      Cc: linux-usb@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Nishanth Menon <nm@ti.com>
      Signed-off-by: Felipe Balbi <balbi@ti.com>
      43febb27
  12. 04 Mar, 2013 4 commits
    • R
      ACPI / glue: Drop .find_bridge() callback from struct acpi_bus_type · 92414481
      Rafael J. Wysocki committed
      After PCI and USB have stopped using the .find_bridge() callback in
      struct acpi_bus_type, the only remaining user of it is SATA, but SATA
      only pretends to be a user, because it points that callback to a stub
      always returning -ENODEV.
      
      For this reason, drop the SATA's dummy .find_bridge() callback and
      remove .find_bridge(), which is not used any more, from struct
      acpi_bus_type entirely.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Yinghai Lu <yinghai@kernel.org>
      Acked-by: Jeff Garzik <jgarzik@pobox.com>
      92414481
    • R
      ACPI / glue: Add .match() callback to struct acpi_bus_type · 53540098
      Rafael J. Wysocki committed
      USB uses the .find_bridge() callback from struct acpi_bus_type
      incorrectly, because as a result of the way it is used by USB every
      device in the system that doesn't have a bus type or parent is
      passed to usb_acpi_find_device() for inspection.
      
      What USB actually needs, though, is to call usb_acpi_find_device()
      for USB ports that don't have a bus type defined, but have
      usb_port_device_type as their device type, as well as for USB
      devices.
      
      To fix that replace the struct bus_type pointer in struct
      acpi_bus_type used for matching devices to specific subsystems
      with a .match() callback to be used for this purpose and update
      the users of struct acpi_bus_type, including USB, accordingly.
      Define the .match() callback routine for USB, usb_acpi_bus_match(),
      in such a way that it will cover both USB devices and USB ports
      and remove the now redundant .find_bridge() callback pointer from
      usb_acpi_bus.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: Yinghai Lu <yinghai@kernel.org>
      Acked-by: Jeff Garzik <jgarzik@pobox.com>
      53540098
    • K
      eCryptfs: allow userspace messaging to be disabled · 290502be
      Kees Cook committed
      When the userspace messaging (for the less common case of userspace key
      wrap/unwrap via ecryptfsd) is not needed, allow eCryptfs to build with
      it removed. This saves on kernel code size and reduces potential attack
      surface by removing the /dev/ecryptfs node.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
      290502be
    • E
      fs: Limit sys_mount to only request filesystem modules. · 7f78e035
      Eric W. Biederman committed
      Modify the request_module to prefix the file system type with "fs-"
      and add aliases to all of the filesystems that can be built as modules
      to match.
      
      A common practice is to build all of the kernel code and leave code
      that is not commonly needed as modules, with the result that many
      users are exposed to any bug anywhere in the kernel.
      
      Looking for filesystems with a fs- prefix limits the pool of possible
      modules that can be loaded by mount to just filesystems trivially
      making things safer with no real cost.
      
      Using aliases means user space can control the policy of which
      filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
      with blacklist and alias directives, allowing simple, safe,
      well-understood workarounds for known problematic software.
      
      This also addresses a rare but unfortunate problem where the filesystem
      name is not the same as its module name and module auto-loading
      would not work.  While writing this patch I saw a handful of such
      cases.  The most significant being autofs that lives in the module
      autofs4.
      
      This is relevant to user namespaces because we can reach the request
      module in get_fs_type() without having any special permissions, and
      people get uncomfortable when a user specified string (in this case
      the filesystem type) goes all of the way to request_module.
      
      After having looked at this issue I don't think there is any
      particular reason to perform any filtering or permission checks beyond
      making it clear in the module request that we want a filesystem
      module.  The common pattern in the kernel is to call request_module()
      without regard to the user's permissions.  In general all a filesystem
      module does once loaded is call register_filesystem() and go to sleep.
      Which means there is not much attack surface exposed by loading a
      filesystem module unless the filesystem is mounted.  In a user
      namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
      which most filesystems do not set today.
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Reported-by: Kees Cook <keescook@google.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      7f78e035
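The prefixing scheme from the commit above can be sketched in user space as follows. This is only an illustration under stated assumptions: `fs_module_alias()` is a hypothetical helper, not a kernel function, and it merely builds the "fs-" prefixed name that a module request would carry.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the module alias for a filesystem type by
 * prepending "fs-".  Returns 0 on success and -1 if the result does not
 * fit into buf (or if buf is unusable).
 */
static int fs_module_alias(char *buf, size_t len, const char *fstype)
{
	if (buf == NULL || fstype == NULL || len == 0)
		return -1;

	int n = snprintf(buf, len, "fs-%s", fstype);

	/* snprintf reports the length it wanted; anything >= len was truncated. */
	return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

With this shape, a mount request for type `autofs` asks for the alias `fs-autofs` instead of a module literally named `autofs`, so a module with a different name can still be found through its alias, and a string that is not a filesystem alias cannot pull in an unrelated module.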
  13. 03 Mar, 2013 8 commits
    • J
      mm: define VM_GROWSUP for CONFIG_METAG · 9ca52ed9
      James Hogan committed
      Commit cc2383ec ("mm: introduce
      arch-specific vma flag VM_ARCH_1") merged in v3.7-rc1.
      
      The above commit combined several arch-specific vma flags into one, and
      in the process it changed the VM_GROWSUP definition to depend on
      specific architectures rather than CONFIG_STACK_GROWSUP. Therefore add
      an ifdef for CONFIG_METAG to also set VM_GROWSUP.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-mm@kvack.org
      9ca52ed9
    • J
      metag: Internal and external irqchips · 5698c50d
      James Hogan committed
      Meta core internal interrupts (from HWSTATMETA and friends) are vectored
      onto the TR1 core trigger for the current thread. This is demultiplexed
      in irq-metag.c to individual Linux IRQs for each internal interrupt.
      
      External SoC interrupts (from HWSTATEXT and friends) are vectored onto
      the TR2 core trigger for the current thread. This is demultiplexed in
      irq-metag-ext.c to individual Linux IRQs for each external SoC interrupt.
      The external irqchip has devicetree bindings for configuring the number
      of irq banks and the type of masking available.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Grant Likely <grant.likely@secretlab.ca>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Rob Landley <rob@landley.net>
      Cc: Dom Cobley <popcornmix@gmail.com>
      Cc: Simon Arlott <simon@fire.lp0.eu>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Maxime Ripard <maxime.ripard@free-electrons.com>
      Cc: devicetree-discuss@lists.ozlabs.org
      Cc: linux-doc@vger.kernel.org
      5698c50d
    • J
      metag: Time keeping · a2c5d4ed
      James Hogan committed
      Add time keeping code for metag. Meta hardware threads have 2 timers.
      The background timer (TXTIMER) is used as a free-running time base, and
      the interrupt timer (TXTIMERI) is used for the timer interrupt. Both
      counters traditionally count at approximately 1MHz.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      a2c5d4ed
    • J
      metag: ptrace · bc3966bf
      James Hogan committed
      The ptrace interface for metag provides access to some core register
      sets using the PTRACE_GETREGSET and PTRACE_SETREGSET operations. The
      details of the internal context structures are abstracted into user API
      structures to both ease use and allow flexibility to change the internal
      context layouts. Copyin and copyout functions for these register sets
      are exposed to allow signal handling code to use them to copy to and
      from the signal context.
      
      struct user_gp_regs (NT_PRSTATUS) provides access to the core general
      purpose register context.
      
      struct user_cb_regs (NT_METAG_CBUF) provides access to the TXCATCH*
      registers, which contain information about a memory fault, unaligned
      access error or watchpoint. This can be modified to alter the way the
      fault is replayed on resume ("catch replay"), or to prevent the replay
      taking place.
      
      struct user_rp_state (NT_METAG_RPIPE) provides access to the state of
      the Meta read pipeline which can be used to hide memory latencies in
      hand optimised data loops.
      
      Extended DSP register state, DSP RAM, and hardware breakpoint registers
      aren't yet exposed through ptrace.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Tony Lindgren <tony@atomide.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      bc3966bf
    • J
      asm-generic/unistd.h: handle symbol prefixes in cond_syscall · 4dd3c959
      James Hogan committed
      Some architectures have symbol prefixes and set CONFIG_SYMBOL_PREFIX,
      but this wasn't taken into account by the generic cond_syscall. It's
      easy enough to fix in a generic fashion, so add the symbol prefix to
      symbol names in cond_syscall when CONFIG_SYMBOL_PREFIX is set.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Mike Frysinger <vapier@gentoo.org>
      4dd3c959
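The idea of adding a symbol prefix inside a generic macro can be sketched with plain string pasting. This is a hedged illustration, not the kernel's definition: `SKETCH_SYMBOL_PREFIX` stands in for `CONFIG_SYMBOL_PREFIX`, and `WEAK_ALIAS_DIRECTIVE` is a hypothetical macro that only builds the assembler text instead of emitting it.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for CONFIG_SYMBOL_PREFIX on an architecture that prefixes
 * C symbols with an underscore (an assumption for this sketch). */
#define SKETCH_SYMBOL_PREFIX "_"

/* Build the text of a weak-alias directive for symbol x, with the
 * prefix applied to every symbol name that appears in it. */
#define WEAK_ALIAS_DIRECTIVE(prefix, x) \
	".weak\t" prefix #x "\n\t.set\t" prefix #x "," prefix "sys_ni_syscall"

/* Directive as an architecture without a symbol prefix would see it. */
static const char *directive_without_prefix(void)
{
	return WEAK_ALIAS_DIRECTIVE("", sys_foo);
}

/* Directive as an architecture with a symbol prefix would see it. */
static const char *directive_with_prefix(void)
{
	return WEAK_ALIAS_DIRECTIVE(SKETCH_SYMBOL_PREFIX, sys_foo);
}
```

Because adjacent string literals are concatenated by the compiler, the same macro body serves both cases; only the value of the prefix changes.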
    • J
      asm-generic/io.h: check CONFIG_VIRT_TO_BUS · c93d0312
      James Hogan committed
      Make asm-generic/io.h check CONFIG_VIRT_TO_BUS before defining
      virt_to_bus() and bus_to_virt(), otherwise it's easy to accidentally
      have a silently failing incorrect direct mapped definition rather than
      no definition at all.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      c93d0312
    • Y
      x86, ACPI, mm: Revert movablemem_map support · 20e6926d
      Yinghai Lu committed
      Tim found:
      
        WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
        Hardware name: S2600CP
        sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
        smpboot: Booting Node   1, Processors  #1
        Modules linked in:
        Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
        Call Trace:
          set_cpu_sibling_map+0x279/0x449
          start_secondary+0x11d/0x1e5
      
      Don Morris reproduced on a HP z620 workstation, and bisected it to
      commit e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock
      is ready")
      
      It turns out movable_map has some problems, and it breaks several things
      
      1. numa_init is called several times, NOT just for srat. so those
      	nodes_clear(numa_nodes_parsed)
      	memset(&numa_meminfo, 0, sizeof(numa_meminfo))
         can not be just removed.  Need to consider sequence is: numaq, srat, amd, dummy.
         and make fall back path working.
      
      2. simply split acpi_numa_init to early_parse_srat.
         a. that early_parse_srat is NOT called for ia64, so you break ia64.
         b.  for (i = 0; i < MAX_LOCAL_APIC; i++)
      	     set_apicid_to_node(i, NUMA_NO_NODE)
           still left in numa_init. So it will just clear result from early_parse_srat.
           it should be moved before that....
         c.  it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
             early before override from INITRD is settled.
      
      3. that patch TITLE is totally misleading: there is NO x86 in the title,
         but it changes critical x86 code. As a result the x86 maintainers did
         not notice the problem early. Those patches really should be routed
         via tip/x86/mm.
      
      4. after that commit, following range can not use movable ram:
        a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed?
        b. initrd... it will be freed after booting, so it could be on movable...
        c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G
      	anymore.
        d. init_mem_mapping: can not put page table high anymore.
        e. initmem_init: vmemmap can not be high local node anymore. That is
           not good.
      
      If a node is hotpluggable, memory-related ranges such as the page table
      and vmemmap can be placed on that node without problems, and should be
      on that node.
      
      We have a workaround patch that could fix some of the problems, but some
      cannot be fixed.
      
      So just remove that offending commit and related ones including:
      
       f7210e6c ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
          protect movablecore_map in memblock_overlaps_region().")
      
       01a178a9 ("acpi, memory-hotplug: support getting hotplug info from
          SRAT")
      
       27168d38 ("acpi, memory-hotplug: extend movablemem_map ranges to
          the end of node")
      
       e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock is
          ready")
      
       fb06bc8e ("page_alloc: bootmem limit with movablecore_map")
      
       42f47e27 ("page_alloc: make movablemem_map have higher priority")
      
       6981ec31 ("page_alloc: introduce zone_movable_limit[] to keep
          movable limit for nodes")
      
       34b71f1e ("page_alloc: add movable_memmap kernel parameter")
      
       4d59a751 ("x86: get pg_data_t's memory from other node")
      
      Later we should have patches that make sure the kernel puts the page
      table and vmemmap in local node RAM instead of pushing them down to
      node0.  We also need to find a way to put other kernel-used RAM in local
      node RAM.
      Reported-by: Tim Gardner <tim.gardner@canonical.com>
      Reported-by: Don Morris <don.morris@hp.com>
      Bisected-by: Don Morris <don.morris@hp.com>
      Tested-by: Don Morris <don.morris@hp.com>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      20e6926d
    • G
      iio: Fix build error seen if IIO_TRIGGER is defined but IIO_BUFFER is not · 5a4d7291
      Guenter Roeck committed
      If CONFIG_IIO_TRIGGER is defined but CONFIG_IIO_BUFFER is not, the following
      build error is seen.
      
      drivers/iio/common/st_sensors/st_sensors_trigger.c:21:5: error:
      redefinition of ‘st_sensors_allocate_trigger’
      In file included from
      drivers/iio/common/st_sensors/st_sensors_trigger.c:18:0:
      include/linux/iio/common/st_sensors.h:239:19: note: previous
      definition of ‘st_sensors_allocate_trigger’ was here
      drivers/iio/common/st_sensors/st_sensors_trigger.c:65:6: error:
      redefinition of ‘st_sensors_deallocate_trigger’
      In file included from
      drivers/iio/common/st_sensors/st_sensors_trigger.c:18:0:
      include/linux/iio/common/st_sensors.h:244:20: note: previous
      definition of ‘st_sensors_deallocate_trigger’ was here
      
      This occurs because st_sensors_deallocate_trigger is built if CONFIG_IIO_TRIGGER
      is defined, but the dummy function is compiled if CONFIG_IIO_BUFFER is defined.
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Acked-by: Denis Ciocca <denis.ciocca@st.com>
      Signed-off-by: Jonathan Cameron <jic23@kernel.org>
      5a4d7291
  14. 02 Mar, 2013 8 commits
    • A
      constify path_get/path_put and fs_struct.c stuff · dcf787f3
      Al Viro committed
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      dcf787f3
    • A
      cache the value of file_inode() in struct file · dd37978c
      Al Viro committed
      Note that this thing does *not* contribute to inode refcount;
      it's pinned down by dentry.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      dd37978c
    • A
      dm: add target num_write_bios fn · b0d8ed4d
      Alasdair G Kergon committed
      Add a num_write_bios function to struct target.
      
      If an instance of a target sets this, it will be queried before the
      target's mapping function is called on a write bio, and the response
      controls the number of copies of the write bio that the target will
      receive.
      
      This provides a convenient way for a target to send the same data to
      more than one device.  The new cache target uses this in writethrough
      mode, to send the data both to the cache and the backing device.
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      b0d8ed4d
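The optional-callback pattern described in the commit above can be sketched as follows. The types and names here (`struct target_sketch`, `cache_num_write_bios`, `copies_for_write`) are hypothetical and much simpler than the device-mapper structures; treating an unset callback as "one copy" is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a device-mapper target instance. */
struct target_sketch {
	/* Optional hook: how many copies of a write bio the target wants. */
	unsigned (*num_write_bios)(const struct target_sketch *ti);
	int writethrough;  /* 1 if data must also reach the backing device */
};

/* A cache-like target asks for two copies in writethrough mode: one for
 * the cache device and one for the backing device. */
static unsigned cache_num_write_bios(const struct target_sketch *ti)
{
	return ti->writethrough ? 2 : 1;
}

/* Core-side query made before the mapping function is called on a write.
 * Targets that do not set the hook receive a single bio (assumed default). */
static unsigned copies_for_write(const struct target_sketch *ti)
{
	assert(ti != NULL);
	return ti->num_write_bios ? ti->num_write_bios(ti) : 1;
}
```

Keeping the hook optional means existing targets need no changes, while a target that sends the same data to several devices can request the extra copies per instance.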
    • M
      dm kcopyd: introduce configurable throttling · df5d2e90
      Mikulas Patocka committed
      This patch allows the administrator to reduce the rate at which kcopyd
      issues I/O.
      
      Each module that uses kcopyd acquires a throttle parameter that can be
      set in /sys/module/*/parameters.
      
      We maintain a history of kcopyd usage by each module in the variables
      io_period and total_period in struct dm_kcopyd_throttle. The actual
      kcopyd activity is calculated as a percentage of time equal to
      "(100 * io_period / total_period)".  This is compared with the user-defined
      throttle percentage threshold and if it is exceeded, we sleep.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      df5d2e90
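The throttling decision in the commit above boils down to one percentage comparison. A minimal sketch follows, assuming a simplified `struct throttle_sketch` in place of `struct dm_kcopyd_throttle`; treating a threshold of 100 or more as "never throttle" and an empty history as "do not sleep" are assumptions of this sketch, not statements from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct dm_kcopyd_throttle. */
struct throttle_sketch {
	unsigned throttle;      /* user-defined threshold, in percent */
	unsigned io_period;     /* time spent doing kcopyd I/O */
	unsigned total_period;  /* total observed time */
};

/* Return 1 if the caller should sleep before issuing more I/O. */
static int should_sleep(const struct throttle_sketch *t)
{
	assert(t != NULL);

	/* Assumed: 100% or more disables throttling; no history, no sleep. */
	if (t->throttle >= 100 || t->total_period == 0)
		return 0;

	/* Activity as a percentage of time: 100 * io_period / total_period. */
	unsigned long long busy_percent =
		100ULL * t->io_period / t->total_period;

	return busy_percent > t->throttle;
}
```

For example, with a threshold of 50 and a history of 80 I/O ticks out of 100, the activity is 80%, which exceeds the threshold, so the caller sleeps; with 30 ticks out of 100 it does not.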
    • M
      dm ioctl: allow message to return data · a2606241
      Mikulas Patocka committed
      This patch introduces enhanced message support that allows the
      device-mapper core to recognise messages that are common to all devices,
      and for messages to return data to userspace.
      
      Core messages are processed by the function "message_for_md".  If the
      device mapper doesn't support the message, it is passed to the target
      driver.
      
      If the message returns data, the kernel sets the flag
      DM_MESSAGE_OUT_FLAG.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      a2606241
    • M
      dm ioctl: optimize functions without variable params · 02cde50b
      Mikulas Patocka committed
      Device-mapper ioctls receive and send data in a buffer supplied
      by userspace.  The buffer has two parts.  The first part contains
      a 'struct dm_ioctl' and has a fixed size.  The second part depends
      on the ioctl and has a variable size.
      
      This patch recognises the specific ioctls that do not use the variable
      part of the buffer and skips allocating memory for it.
      
      In particular, when a device is suspended and a resume ioctl is sent,
      this now avoids memory allocation completely.
      
      The variable "struct dm_ioctl tmp" is moved from the function
      copy_params to its caller ctl_ioctl and renamed to param_kernel.
      It is used directly when the ioctl function doesn't need any arguments.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      02cde50b
    • A
      dm: rename request variables to bios · 55a62eef
      Alasdair G Kergon committed
      Use 'bio' in the name of variables and functions that deal with
      bios rather than 'request' to avoid confusion with the normal
      block layer use of 'request'.
      
      No functional changes.
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      55a62eef
    • M
      dm: fix truncated status strings · fd7c092e
      Mikulas Patocka committed
      Avoid returning a truncated table or status string instead of setting
      the DM_BUFFER_FULL_FLAG when the last target of a table fills the
      buffer.
      
      When processing a table or status request, the function retrieve_status
      calls ti->type->status. If ti->type->status returns non-zero,
      retrieve_status assumes that the buffer overflowed and sets
      DM_BUFFER_FULL_FLAG.
      
      However, targets don't return non-zero values from their status method
      on overflow. Most targets always return zero.
      
      If a buffer overflow happens in a target that is not the last in the
      table, it gets noticed during the next iteration of the loop in
      retrieve_status; but if a buffer overflow happens in the last target, it
      goes unnoticed and erroneously truncated data is returned.
      
      In the current code, the targets behave in the following way:
      * dm-crypt returns -ENOMEM if there is not enough space to store the
        key, but it returns 0 on all other overflows.
      * dm-thin returns errors from the status method if a disk error happened.
        This is incorrect because retrieve_status doesn't check the error
        code, it assumes that all non-zero values mean buffer overflow.
      * all the other targets always return 0.
      
      This patch changes the ti->type->status function to return void (because
      most targets don't use the return code). Overflow is detected in
      retrieve_status: if the status method fills up the remaining space
      completely, it is assumed that buffer overflow happened.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      fd7c092e
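The overflow rule from the commit above ("if the status method fills up the remaining space completely, assume overflow") can be sketched in user space. The function names are hypothetical, and `snprintf` stands in for whatever truncating formatter a target uses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for a target's status method: returns void and silently
 * truncates its output to the space it was given. */
static void target_status(const char *text, char *result, size_t maxlen)
{
	snprintf(result, maxlen, "%s", text);
}

/* Caller-side check: returns 1 if overflow must be assumed, i.e. the
 * string plus its terminating NUL used up all remaining space. */
static int status_overflowed(const char *text, char *buf, size_t remaining)
{
	assert(buf != NULL && remaining > 0);
	target_status(text, buf, remaining);
	return strlen(buf) + 1 == remaining;
}
```

The check is deliberately conservative: a status string that fits exactly is also reported as an overflow, which costs the caller one retry with a larger buffer but never returns truncated data as if it were complete.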