1. 16 4月, 2013 1 次提交
  2. 15 4月, 2013 3 次提交
    • L
      cgroup: remove cgrp->top_cgroup · 05fb22ec
      Li Zefan 提交于
      It's not used, and it can be retrieved via cgrp->root->top_cgroup.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      05fb22ec
    • T
      cgroup: introduce sane_behavior mount option · 873fe09e
      Tejun Heo 提交于
      It's a sad fact that at this point various cgroup controllers are
      carrying so many idiosyncrasies and pure insanities that it simply
      isn't possible to reach any sort of sane consistent behavior while
      maintaining staying fully compatible with what already has been
      exposed to userland.
      
      As we can't break exposed userland interface, transitioning to sane
      behaviors can only be done in steps while maintaining backwards
      compatibility.  This patch introduces a new mount option -
      __DEVEL__sane_behavior - which disables crazy features and enforces
      consistent behaviors in cgroup core proper and various controllers.
      As exactly which behaviors it changes are still being determined, the
      mount option, at this point, is useful only for development of the new
      behaviors.  As such, the mount option is prefixed with __DEVEL__ and
      generates a warning message when used.
      
      Eventually, once we get to the point where all controller's behaviors
      are consistent enough to implement unified hierarchy, the __DEVEL__
      prefix will be dropped, and more importantly, unified-hierarchy will
      enforce sane_behavior by default.  Maybe we'll able to completely drop
      the crazy stuff after a while, maybe not, but we at least have a
      strategy to move on to saner behaviors.
      
      This patch introduces the mount option and changes the following
      behaviors in cgroup core.
      
      * Mount options "noprefix" and "clone_children" are disallowed.  Also,
        cgroupfs file cgroup.clone_children is not created.
      
      * When mounting an existing superblock, mount options should match.
        This is currently pretty crazy.  If one mounts a cgroup, creates a
        subdirectory, unmounts it and then mount it again with different
        option, it looks like the new options are applied but they aren't.
      
      * Remount is disallowed.
      
      The behaviors changes are documented in the comment above
      CGRP_ROOT_SANE_BEHAVIOR enum and will be expanded as different
      controllers are converted and planned improvements progress.
      
      v2: Dropped unnecessary explicit file permission setting sane_behavior
          cftype entry as suggested by Li Zefan.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NSerge E. Hallyn <serge.hallyn@ubuntu.com>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      873fe09e
    • T
      move cgroupfs_root to include/linux/cgroup.h · 25a7e684
      Tejun Heo 提交于
      While controllers shouldn't be accessing cgroupfs_root directly, it
      being hidden inside kern/cgroup.c makes somethings pretty silly.  This
      makes routing hierarchy-wide settings which need to be visible to
      controllers cumbersome.
      
      We're gonna add another hierarchy-wide setting which needs to be
      accessed from controllers.  Move cgroupfs_root and its flags to the
      header file so that we can access root settings with inline helpers.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NSerge E. Hallyn <serge.hallyn@ubuntu.com>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      25a7e684
  3. 13 4月, 2013 1 次提交
  4. 11 4月, 2013 2 次提交
    • L
      cgroup: implement cgroup_is_descendant() · 78574cf9
      Li Zefan 提交于
      A couple controllers want to determine whether two cgroups are in
      ancestor/descendant relationship.  As it's more likely that the
      descendant is the primary subject of interest and there are other
      operations focusing on the descendants, let's ask is_descendent rather
      than is_ancestor.
      
      Implementation is trivial as the previous patch guarantees that all
      ancestors of a cgroup stay accessible as long as the cgroup is
      accessible.
      
      tj: Removed depth optimization, renamed from cgroup_is_ancestor(),
          rewrote descriptions.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      78574cf9
    • R
      cgroup: remove bind() method from cgroup_subsys. · 84cfb6ab
      Rami Rosen 提交于
      The bind() method of cgroup_subsys is not used in any of the
      controllers (cpuset, freezer, blkio, net_cls, memcg, net_prio,
      devices, perf, hugetlb, cpu and cpuacct)
      
      tj: Removed the entry on ->bind() from
          Documentation/cgroups/cgroups.txt.  Also updated a couple
          paragraphs which were suggesting that dynamic re-binding may be
          implemented.  It's not gonna.
      Signed-off-by: NRami Rosen <ramirose@gmail.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      84cfb6ab
  5. 08 4月, 2013 3 次提交
    • T
      cgroup: remove cgroup_lock_is_held() · 2219449a
      Tejun Heo 提交于
      We don't want controllers to assume that the information is officially
      available and do funky things with it.
      
      The only user is task_subsys_state_check() which uses it to verify RCU
      access context.  We can move cgroup_lock_is_held() inside
      CONFIG_PROVE_RCU but that doesn't add meaningful protection compared
      to conditionally exposing cgroup_mutex.
      
      Remove cgroup_lock_is_held(), export cgroup_mutex iff CONFIG_PROVE_RCU
      and use lockdep_is_held() directly on the mutex in
      task_subsys_state_check().
      
      While at it, add parentheses around macro arguments in
      task_subsys_state_check().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      2219449a
    • T
      cgroup: unexport locking interface and cgroup_attach_task() · b9777cf8
      Tejun Heo 提交于
      Now that all external cgroup_lock() users are gone, we can finally
      unexport the locking interface and prevent future abuse of
      cgroup_mutex.
      
      Make cgroup_[un]lock() and cgroup_lock_live_group() static.  Also,
      cgroup_attach_task() doesn't have any user left and can't be used
      without locking interface anyway.  Make it static too.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      b9777cf8
    • T
      cgroup, cpuset: replace move_member_tasks_to_cpuset() with cgroup_transfer_tasks() · 8cc99345
      Tejun Heo 提交于
      When a cpuset becomes empty (no CPU or memory), its tasks are
      transferred with the nearest ancestor with execution resources.  This
      is implemented using cgroup_scan_tasks() with a callback which grabs
      cgroup_mutex and invokes cgroup_attach_task() on each task.
      
      Both cgroup_mutex and cgroup_attach_task() are scheduled to be
      unexported.  Implement cgroup_transfer_tasks() in cgroup proper which
      is essentially the same as move_member_tasks_to_cpuset() except that
      it takes cgroups instead of cpusets and @to comes before @from like
      normal functions with those arguments, and replace
      move_member_tasks_to_cpuset() with it.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      8cc99345
  6. 20 3月, 2013 1 次提交
  7. 13 3月, 2013 1 次提交
  8. 06 3月, 2013 3 次提交
  9. 05 3月, 2013 1 次提交
    • L
      cgroup: fix cgroup_path() vs rename() race · 65dff759
      Li Zefan 提交于
      rename() will change dentry->d_name. The result of this race can
      be worse than seeing partially rewritten name, but we might access
      a stale pointer because rename() will re-allocate memory to hold
      a longer name.
      
      As accessing dentry->name must be protected by dentry->d_lock or
      parent inode's i_mutex, while on the other hand cgroup-path() can
      be called with some irq-safe spinlocks held, we can't generate
      cgroup path using dentry->d_name.
      
      Alternatively we make a copy of dentry->d_name and save it in
      cgrp->name when a cgroup is created, and update cgrp->name at
      rename().
      
      v5: use flexible array instead of zero-size array.
      v4: - allocate root_cgroup_name and all root_cgroup->name points to it.
          - add cgroup_name() wrapper.
      v3: use kfree_rcu() instead of synchronize_rcu() in user-visible path.
      v2: make cgrp->name RCU safe.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      65dff759
  10. 03 3月, 2013 3 次提交
    • J
      mm: define VM_GROWSUP for CONFIG_METAG · 9ca52ed9
      James Hogan 提交于
      Commit cc2383ec ("mm: introduce
      arch-specific vma flag VM_ARCH_1") merged in v3.7-rc1.
      
      The above commit combined several arch-specific vma flags into one, and
      in the process it changed the VM_GROWSUP definition to depend on
      specific architectures rather than CONFIG_STACK_GROWSUP. Therefore add
      an ifdef for CONFIG_METAG to also set VM_GROWSUP.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-mm@kvack.org
      9ca52ed9
    • J
      metag: Internal and external irqchips · 5698c50d
      James Hogan 提交于
      Meta core internal interrupts (from HWSTATMETA and friends) are vectored
      onto the TR1 core trigger for the current thread. This is demultiplexed
      in irq-metag.c to individual Linux IRQs for each internal interrupt.
      
      External SoC interrupts (from HWSTATEXT and friends) are vectored onto
      the TR2 core trigger for the current thread. This is demultiplexed in
      irq-metag-ext.c to individual Linux IRQs for each external SoC interrupt.
      The external irqchip has devicetree bindings for configuring the number
      of irq banks and the type of masking available.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Grant Likely <grant.likely@secretlab.ca>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Rob Landley <rob@landley.net>
      Cc: Dom Cobley <popcornmix@gmail.com>
      Cc: Simon Arlott <simon@fire.lp0.eu>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Maxime Ripard <maxime.ripard@free-electrons.com>
      Cc: devicetree-discuss@lists.ozlabs.org
      Cc: linux-doc@vger.kernel.org
      5698c50d
    • Y
      x86, ACPI, mm: Revert movablemem_map support · 20e6926d
      Yinghai Lu 提交于
      Tim found:
      
        WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
        Hardware name: S2600CP
        sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
        smpboot: Booting Node   1, Processors  #1
        Modules linked in:
        Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
        Call Trace:
          set_cpu_sibling_map+0x279/0x449
          start_secondary+0x11d/0x1e5
      
      Don Morris reproduced on a HP z620 workstation, and bisected it to
      commit e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock
      is ready")
      
      It turns out movable_map has some problems, and it breaks several things
      
      1. numa_init is called several times, NOT just for srat. so those
      	nodes_clear(numa_nodes_parsed)
      	memset(&numa_meminfo, 0, sizeof(numa_meminfo))
         can not be just removed.  Need to consider sequence is: numaq, srat, amd, dummy.
         and make fall back path working.
      
      2. simply split acpi_numa_init to early_parse_srat.
         a. that early_parse_srat is NOT called for ia64, so you break ia64.
         b.  for (i = 0; i < MAX_LOCAL_APIC; i++)
      	     set_apicid_to_node(i, NUMA_NO_NODE)
           still left in numa_init. So it will just clear result from early_parse_srat.
           it should be moved before that....
         c.  it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
             early before override from INITRD is settled.
      
      3. that patch TITLE is total misleading, there is NO x86 in the title,
         but it changes critical x86 code. It caused x86 guys did not
         pay attention to find the problem early. Those patches really should
         be routed via tip/x86/mm.
      
      4. after that commit, following range can not use movable ram:
        a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed?
        b. initrd... it will be freed after booting, so it could be on movable...
        c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G
      	anymore.
        d. init_mem_mapping: can not put page table high anymore.
        e. initmem_init: vmemmap can not be high local node anymore. That is
           not good.
      
      If node is hotplugable, the mem related range like page table and
      vmemmap could be on the that node without problem and should be on that
      node.
      
      We have workaround patch that could fix some problems, but some can not
      be fixed.
      
      So just remove that offending commit and related ones including:
      
       f7210e6c ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
          protect movablecore_map in memblock_overlaps_region().")
      
       01a178a9 ("acpi, memory-hotplug: support getting hotplug info from
          SRAT")
      
       27168d38 ("acpi, memory-hotplug: extend movablemem_map ranges to
          the end of node")
      
       e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock is
          ready")
      
       fb06bc8e ("page_alloc: bootmem limit with movablecore_map")
      
       42f47e27 ("page_alloc: make movablemem_map have higher priority")
      
       6981ec31 ("page_alloc: introduce zone_movable_limit[] to keep
          movable limit for nodes")
      
       34b71f1e ("page_alloc: add movable_memmap kernel parameter")
      
       4d59a751 ("x86: get pg_data_t's memory from other node")
      
      Later we should have patches that will make sure kernel put page table
      and vmemmap on local node ram instead of push them down to node0.  Also
      need to find way to put other kernel used ram to local node ram.
      Reported-by: NTim Gardner <tim.gardner@canonical.com>
      Reported-by: NDon Morris <don.morris@hp.com>
      Bisected-by: NDon Morris <don.morris@hp.com>
      Tested-by: NDon Morris <don.morris@hp.com>
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      20e6926d
  11. 02 3月, 2013 7 次提交
    • A
      constify path_get/path_put and fs_struct.c stuff · dcf787f3
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      dcf787f3
    • A
      cache the value of file_inode() in struct file · dd37978c
      Al Viro 提交于
      Note that this thing does *not* contribute to inode refcount;
      it's pinned down by dentry.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      dd37978c
    • A
      dm: add target num_write_bios fn · b0d8ed4d
      Alasdair G Kergon 提交于
      Add a num_write_bios function to struct target.
      
      If an instance of a target sets this, it will be queried before the
      target's mapping function is called on a write bio, and the response
      controls the number of copies of the write bio that the target will
      receive.
      
      This provides a convenient way for a target to send the same data to
      more than one device.  The new cache target uses this in writethrough
      mode, to send the data both to the cache and the backing device.
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      b0d8ed4d
    • M
      dm kcopyd: introduce configurable throttling · df5d2e90
      Mikulas Patocka 提交于
      This patch allows the administrator to reduce the rate at which kcopyd
      issues I/O.
      
      Each module that uses kcopyd acquires a throttle parameter that can be
      set in /sys/module/*/parameters.
      
      We maintain a history of kcopyd usage by each module in the variables
      io_period and total_period in struct dm_kcopyd_throttle. The actual
      kcopyd activity is calculated as a percentage of time equal to
      "(100 * io_period / total_period)".  This is compared with the user-defined
      throttle percentage threshold and if it is exceeded, we sleep.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      df5d2e90
    • A
      dm: rename request variables to bios · 55a62eef
      Alasdair G Kergon 提交于
      Use 'bio' in the name of variables and functions that deal with
      bios rather than 'request' to avoid confusion with the normal
      block layer use of 'request'.
      
      No functional changes.
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      55a62eef
    • M
      dm: fix truncated status strings · fd7c092e
      Mikulas Patocka 提交于
      Avoid returning a truncated table or status string instead of setting
      the DM_BUFFER_FULL_FLAG when the last target of a table fills the
      buffer.
      
      When processing a table or status request, the function retrieve_status
      calls ti->type->status. If ti->type->status returns non-zero,
      retrieve_status assumes that the buffer overflowed and sets
      DM_BUFFER_FULL_FLAG.
      
      However, targets don't return non-zero values from their status method
      on overflow. Most targets returns always zero.
      
      If a buffer overflow happens in a target that is not the last in the
      table, it gets noticed during the next iteration of the loop in
      retrieve_status; but if a buffer overflow happens in the last target, it
      goes unnoticed and erroneously truncated data is returned.
      
      In the current code, the targets behave in the following way:
      * dm-crypt returns -ENOMEM if there is not enough space to store the
        key, but it returns 0 on all other overflows.
      * dm-thin returns errors from the status method if a disk error happened.
        This is incorrect because retrieve_status doesn't check the error
        code, it assumes that all non-zero values mean buffer overflow.
      * all the other targets always return 0.
      
      This patch changes the ti->type->status function to return void (because
      most targets don't use the return code). Overflow is detected in
      retrieve_status: if the status method fills up the remaining space
      completely, it is assumed that buffer overflow happened.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      fd7c092e
    • R
      hsi: fix kernel-doc warnings · 8eae508b
      Randy Dunlap 提交于
      Fix kernel-doc warnings in hsi files:
      
        Warning(include/linux/hsi/hsi.h:136): Excess struct/union/enum/typedef member 'e_handler' description in 'hsi_client'
        Warning(include/linux/hsi/hsi.h:136): Excess struct/union/enum/typedef member 'pclaimed' description in 'hsi_client'
        Warning(include/linux/hsi/hsi.h:136): Excess struct/union/enum/typedef member 'nb' description in 'hsi_client'
        Warning(drivers/hsi/hsi.c:434): No description found for parameter 'handler'
        Warning(drivers/hsi/hsi.c:434): Excess function parameter 'cb' description in 'hsi_register_port_event'
      
      Don't document "private:" fields with kernel-doc notation.
      If you want to leave them fully documented, that's OK, but
      then don't mark them as "private:".
      Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
      Cc: Carlos Chinea <carlos.chinea@nokia.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-omap@vger.kernel.org
      Acked-by: NNishanth Menon <nm@ti.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8eae508b
  12. 01 3月, 2013 5 次提交
  13. 28 2月, 2013 9 次提交
    • A
      dmaengine: dw_dmac: move to generic DMA binding · f9c6a655
      Arnd Bergmann 提交于
      The original device tree binding for this driver, from Viresh Kumar
      unfortunately conflicted with the generic DMA binding, and did not allow
      to completely seperate slave device configuration from the controller.
      
      This is an attempt to replace it with an implementation of the generic
      binding, but it is currently completely untested, because I do not have
      any hardware with this particular controller.
      
      The patch applies on top of the slave-dma tree, which contains both the base
      support for the generic DMA binding, as well as the earlier attempt from
      Viresh. Both of these are currently not merged upstream however.
      
      This version incorporates feedback from Viresh Kumar, Andy Shevchenko
      and Russell King.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      Acked-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Vinod Koul <vinod.koul@linux.intel.com>
      Cc: devicetree-discuss@lists.ozlabs.org
      Cc: linux-arm-kernel@lists.infradead.org
      Signed-off-by: NVinod Koul <vinod.koul@intel.com>
      f9c6a655
    • S
      hlist: drop the node parameter from iterators · b67bfe0d
      Sasha Levin 提交于
      I'm not sure why, but the hlist for each entry iterators were conceived
      
              list_for_each_entry(pos, head, member)
      
      The hlist ones were greedy and wanted an extra parameter:
      
              hlist_for_each_entry(tpos, pos, head, member)
      
      Why did they need an extra pos parameter? I'm not quite sure. Not only
      they don't really need it, it also prevents the iterator from looking
      exactly like the list iterator, which is unfortunate.
      
      Besides the semantic patch, there was some manual work required:
      
       - Fix up the actual hlist iterators in linux/list.h
       - Fix up the declaration of other iterators based on the hlist ones.
       - A very small amount of places were using the 'node' parameter, this
       was modified to use 'obj->member' instead.
       - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
       properly, so those had to be fixed up manually.
      
      The semantic patch which is mostly the work of Peter Senna Tschudin is here:
      
      @@
      iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
      
      type T;
      expression a,c,d,e;
      identifier b;
      statement S;
      @@
      
      -T b;
          <+... when != b
      (
      hlist_for_each_entry(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue(a,
      - b,
      c) S
      |
      hlist_for_each_entry_from(a,
      - b,
      c) S
      |
      hlist_for_each_entry_rcu(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_rcu_bh(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue_rcu_bh(a,
      - b,
      c) S
      |
      for_each_busy_worker(a, c,
      - b,
      d) S
      |
      ax25_uid_for_each(a,
      - b,
      c) S
      |
      ax25_for_each(a,
      - b,
      c) S
      |
      inet_bind_bucket_for_each(a,
      - b,
      c) S
      |
      sctp_for_each_hentry(a,
      - b,
      c) S
      |
      sk_for_each(a,
      - b,
      c) S
      |
      sk_for_each_rcu(a,
      - b,
      c) S
      |
      sk_for_each_from
      -(a, b)
      +(a)
      S
      + sk_for_each_from(a) S
      |
      sk_for_each_safe(a,
      - b,
      c, d) S
      |
      sk_for_each_bound(a,
      - b,
      c) S
      |
      hlist_for_each_entry_safe(a,
      - b,
      c, d, e) S
      |
      hlist_for_each_entry_continue_rcu(a,
      - b,
      c) S
      |
      nr_neigh_for_each(a,
      - b,
      c) S
      |
      nr_neigh_for_each_safe(a,
      - b,
      c, d) S
      |
      nr_node_for_each(a,
      - b,
      c) S
      |
      nr_node_for_each_safe(a,
      - b,
      c, d) S
      |
      - for_each_gfn_sp(a, c, d, b) S
      + for_each_gfn_sp(a, c, d) S
      |
      - for_each_gfn_indirect_valid_sp(a, c, d, b) S
      + for_each_gfn_indirect_valid_sp(a, c, d) S
      |
      for_each_host(a,
      - b,
      c) S
      |
      for_each_host_safe(a,
      - b,
      c, d) S
      |
      for_each_mesh_entry(a,
      - b,
      c, d) S
      )
          ...+>
      
      [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
      [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foudnation.org: redo intrusive kvm changes]
      Tested-by: NPeter Senna Tschudin <peter.senna@gmail.com>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b67bfe0d
    • M
      include/linux/eventfd.h: fix incorrect filename is a comment · 1d730c49
      Martin Sustrik 提交于
      Comment in eventfd.h referred to 'include/asm-generic/fcntl.h'
      while the correct path is 'include/uapi/asm-generic/fcntl.h'.
      Signed-off-by: NMartin Sustrik <sustrik@250bpm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1d730c49
    • R
      ipmi: remove superfluous kernel/userspace explanation · 59fb1b9f
      Robert P. J. Day 提交于
      Given the obvious distinction between kernel and userspace supported
      by uapi/, it seems unnecessary to comment on that.
      Signed-off-by: NRobert P. J. Day <rpjday@crashcourse.ca>
      Signed-off-by: NCorey Minyard <cminyard@mvista.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      59fb1b9f
    • T
      idr: implement lookup hint · 0ffc2a9c
      Tejun Heo 提交于
      While idr lookup isn't a particularly heavy operation, it still is too
      substantial to use in hot paths without worrying about the performance
      implications.  With recent changes, each idr_layer covers 256 slots
      which should be enough to cover most use cases with single idr_layer
      making lookup hint very attractive.
      
      This patch adds idr->hint which points to the idr_layer which
      allocated an ID most recently and the fast path lookup becomes
      
      	if (look up target's prefix matches that of the hinted layer)
      		return hint->ary[ID's offset in the leaf layer];
      
      which can be inlined.
      
      idr->hint is set to the leaf node on idr_fill_slot() and cleared from
      free_layer().
      
      [andriy.shevchenko@linux.intel.com: always do slow path when hint is uninitialized]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0ffc2a9c
    • T
      idr: add idr_layer->prefix · 54616283
      Tejun Heo 提交于
      Add a field which carries the prefix of ID the idr_layer covers.  This
      will be used to implement lookup hint.
      
      This patch doesn't make use of the new field and doesn't introduce any
      behavior difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      54616283
    • T
      idr: make idr_layer larger · 050a6b47
      Tejun Heo 提交于
      With recent preloading changes, idr no longer keeps full layer cache per
      each idr instance (used to be ~6.5k per idr on 64bit) and the previous
      patch removed restriction on the bitmap size.  Both now allow us to have
      larger layers.
      
      Increase IDR_BITS to 8 regardless of BITS_PER_LONG.  Each layer is
      slightly larger than 2k on 64bit and 1k on 32bit and carries 256 entries.
      The size isn't too large, especially compared to what we used to waste on
      per-idr caches, and 256 entries should be able to serve most use cases
      with single layer.  The max tree depth is 4 which is much better than the
      previous 6 on 64bit and 7 on 32bit.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      050a6b47
    • T
      idr: remove length restriction from idr_layer->bitmap · 1d9b2e1e
      Tejun Heo 提交于
      Currently, idr->bitmap is declared as an unsigned long which restricts
      the number of bits an idr_layer can contain.  All bitops can handle
      arbitrary positive integer bit number and there's no reason for this
      restriction.
      
      Declare idr_layer->bitmap using DECLARE_BITMAP() instead of a single
      unsigned long.
      
      * idr_layer->bitmap is now an array.  '&' dropped from params to
        bitops.
      
      * Replaced "== IDR_FULL" tests with bitmap_full() and removed
        IDR_FULL.
      
      * Replaced find_next_bit() on ~bitmap with find_next_zero_bit().
      
      * Replaced "bitmap = 0" with bitmap_clear().
      
      This patch doesn't (or at least shouldn't) introduce any behavior
      changes.
      
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1d9b2e1e
    • T
      idr: remove MAX_IDR_MASK and move left MAX_IDR_* into idr.c · e8c8d1bc
      Tejun Heo 提交于
      MAX_IDR_MASK is another weirdness in the idr interface.  As idr covers
      whole positive integer range, it's defined as 0x7fffffff or INT_MAX.
      
      Its usage in idr_find(), idr_replace() and idr_remove() is bizarre.
      They basically mask off the sign bit and operate on the rest, so if
      the caller, by accident, passes in a negative number, the sign bit
      will be masked off and the remaining part will be used as if that was
      the input, which is worse than crashing.
      
      The constant is visible in idr.h and there are several users in the
      kernel.
      
      * drivers/i2c/i2c-core.c:i2c_add_numbered_adapter()
      
        Basically used to test if adap->nr is a negative number which isn't
        -1 and returns -EINVAL if so.  idr_alloc() already has negative
        @start checking (w/ WARN_ON_ONCE), so this can go away.
      
      * drivers/infiniband/core/cm.c:cm_alloc_id()
        drivers/infiniband/hw/mlx4/cm.c:id_map_alloc()
      
        Used to wrap cyclic @start.  Can be replaced with max(next, 0).
        Note that this type of cyclic allocation using idr is buggy.  These
        are prone to spurious -ENOSPC failure after the first wraparound.
      
      * fs/super.c:get_anon_bdev()
      
        The ID allocated from ida is masked off before being tested whether
        it's inside valid range.  ida allocated ID can never be a negative
        number and the masking is unnecessary.
      
      Update idr_*() functions to fail with -EINVAL when negative @id is
      specified and update other MAX_IDR_MASK users as described above.
      
      This leaves MAX_IDR_MASK without any user, remove it and relocate
      other MAX_IDR_* constants to lib/idr.c.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Jean Delvare <khali@linux-fr.org>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Sean Hefty <sean.hefty@intel.com>
      Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
      Cc: "Marciniszyn, Mike" <mike.marciniszyn@intel.com>
      Cc: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Cc: Or Gerlitz <ogerlitz@mellanox.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Acked-by: NWolfram Sang <wolfram@the-dreams.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e8c8d1bc