1. 08 May, 2013: 2 commits
    • rwsem: check counter to avoid cmpxchg calls · 9607a85b
      Davidlohr Bueso committed
      This patch tries to reduce the number of cmpxchg calls in the writer
      failed path by checking the counter value first before issuing the
      instruction.  If ->count is not set to RWSEM_WAITING_BIAS then there is
      no point wasting a cmpxchg call.
      
      Furthermore, Michel states "I suppose it helps due to the case where
      someone else steals the lock while we're trying to acquire
      sem->wait_lock."
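A minimal userspace sketch of the check-before-cmpxchg idea, using C11 atomics. The bias constants and the function name are hypothetical stand-ins modeled loosely on the generic rwsem counter layout; this is not the kernel's code:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bias values, modeled on the generic rwsem counter layout. */
#define WAITING_BIAS      (-0x10000L)
#define ACTIVE_BIAS       1L
#define ACTIVE_WRITE_BIAS (WAITING_BIAS + ACTIVE_BIAS)

/*
 * Writer slow path: try to grab the lock while other tasks are queued.
 * A plain load is cheap; the cmpxchg (which needs the cache line
 * exclusively) is only issued when it can actually succeed.
 */
static bool try_write_lock_waiting(atomic_long *count)
{
	long expected = WAITING_BIAS;

	/* Someone holds or just stole the lock: don't waste a cmpxchg. */
	if (atomic_load_explicit(count, memory_order_relaxed) != WAITING_BIAS)
		return false;

	/* The value may still change after the load, so the
	 * compare-and-swap remains the authoritative check. */
	return atomic_compare_exchange_strong(count, &expected,
					      ACTIVE_WRITE_BIAS);
}
```

The early load is only a filter; correctness still rests on the compare-and-swap.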
      
      Two very different workloads and machines were used to see how this
      patch improves throughput: pgbench on a quad-core laptop and aim7 on a
      large 8 socket box with 80 cores.
      
      Some results comparing Michel's fast-path write lock stealing
      (tps-rwsem) against this patch on top of it (tps-patch), on a
      quad-core laptop running pgbench:
      
        | db_size | clients | tps-rwsem | tps-patch |
        +---------+---------+-----------+-----------+
        | 160 MB  |       1 |      6906 |      9153 | + 32.5%
        | 160 MB  |       2 |     15931 |     22487 | + 41.1%
        | 160 MB  |       4 |     33021 |     32503 |
        | 160 MB  |       8 |     34626 |     34695 |
        | 160 MB  |      16 |     33098 |     34003 |
        | 160 MB  |      20 |     31343 |     31440 |
        | 160 MB  |      30 |     28961 |     28987 |
        | 160 MB  |      40 |     26902 |     26970 |
        | 160 MB  |      50 |     25760 |     25810 |
        +---------+---------+-----------+-----------+
        | 1.6 GB  |       1 |      7729 |      7537 |
        | 1.6 GB  |       2 |     19009 |     23508 | + 23.7%
        | 1.6 GB  |       4 |     33185 |     32666 |
        | 1.6 GB  |       8 |     34550 |     34318 |
        | 1.6 GB  |      16 |     33079 |     32689 |
        | 1.6 GB  |      20 |     31494 |     31702 |
        | 1.6 GB  |      30 |     28535 |     28755 |
        | 1.6 GB  |      40 |     27054 |     27017 |
        | 1.6 GB  |      50 |     25591 |     25560 |
        +---------+---------+-----------+-----------+
        | 7.6 GB  |       1 |      6224 |      7469 | + 20.0%
        | 7.6 GB  |       2 |     13611 |     12778 |
        | 7.6 GB  |       4 |     33108 |     32927 |
        | 7.6 GB  |       8 |     34712 |     34878 |
        | 7.6 GB  |      16 |     32895 |     33003 |
        | 7.6 GB  |      20 |     31689 |     31974 |
        | 7.6 GB  |      30 |     29003 |     28806 |
        | 7.6 GB  |      40 |     26683 |     26976 |
        | 7.6 GB  |      50 |     25925 |     25652 |
        +---------+---------+-----------+-----------+
      
      The aim7 workloads also improved overall on top of Michel's
      patchset.  Full graphs of how the rwsem series plus this patch
      behaves on a large 8-socket machine, compared with a vanilla kernel,
      are available at:
      
        http://stgolabs.net/rwsem-aim7-results.tar.gz
      Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kref: minor cleanup · 2d864e41
      Anatol Pomozov committed
       - make warning smp-safe
       - result of atomic _unless_zero functions should be checked by caller
         to avoid use-after-free error
       - trivial whitespace fix.
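The first two points can be illustrated with C11 atomics. This is a userspace sketch under stated assumptions: struct kref_model and both helpers are hypothetical, fprintf() stands in for WARN_ON_ONCE(), and the GCC/Clang warn_unused_result attribute stands in for requiring callers to check the result:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for struct kref. */
struct kref_model {
	atomic_int refcount;
};

/*
 * SMP-safe sanity check: inspect the value produced by the increment
 * itself, instead of a separate read before the increment that another
 * CPU could invalidate in between.
 */
static void kref_model_get(struct kref_model *k)
{
	static bool warned;   /* crude stand-in for WARN_ON_ONCE() */
	int old = atomic_fetch_add(&k->refcount, 1);

	if (old < 1 && !warned) {
		warned = true;
		fprintf(stderr, "kref_model_get: refcount was %d\n", old);
	}
}

/*
 * Take a reference only if the object is still live.  false means the
 * object is being freed and touching it would be a use-after-free.
 */
static bool __attribute__((warn_unused_result))
kref_model_get_unless_zero(struct kref_model *k)
{
	int old = atomic_load(&k->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&k->refcount, &old, old + 1))
			return true;
	}
	return false;
}
```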
      
      Link: https://lkml.org/lkml/2013/4/12/391
      
      Tested: compile x86, boot machine and run xfstests
      Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
      [ Removed line-break, changed to use WARN_ON_ONCE()  - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 07 May, 2013: 13 commits
  3. 06 May, 2013: 1 commit
  4. 01 May, 2013: 10 commits
  5. 30 April, 2013: 8 commits
    • lib/: rename random32() to prandom_u32() · f39fee5f
      Akinobu Mita committed
      Use a preferable function name, one which implies that a
      pseudo-random number generator is used.
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • uuid: use prandom_bytes() · cedddb00
      Akinobu Mita committed
      Use prandom_bytes() to generate 16 bytes of pseudo-random bytes.
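As a rough illustration of what a random (version 4) UUID generator does with those 16 bytes, here is a userspace sketch. fake_prandom_bytes() is a hypothetical xorshift substitute for prandom_bytes(), and the version/variant stamping follows RFC 4122 rather than being copied from lib/uuid.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for prandom_bytes(): Marsaglia xorshift32.
 * Pseudo-random only; not suitable where unpredictability matters. */
static uint32_t xs_state = 2463534242u;

static void fake_prandom_bytes(uint8_t *buf, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		xs_state ^= xs_state << 13;
		xs_state ^= xs_state >> 17;
		xs_state ^= xs_state << 5;
		buf[i] = (uint8_t)xs_state;   /* low byte of each step */
	}
}

/* Fill 16 pseudo-random bytes, then stamp the RFC 4122 version (4)
 * and variant (binary 10) bits, using the big-endian byte layout. */
static void gen_uuid_v4(uint8_t uu[16])
{
	fake_prandom_bytes(uu, 16);
	uu[6] = (uu[6] & 0x0F) | 0x40;   /* version 4: high nibble of byte 6 */
	uu[8] = (uu[8] & 0x3F) | 0x80;   /* variant: top two bits of byte 8 */
}
```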
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • idr: introduce idr_alloc_cyclic() · 3e6628c4
      Jeff Layton committed
      As Tejun points out, there are several users of the IDR facility that
      attempt to use it in a cyclic fashion.  These users are likely to see
      -ENOSPC errors after the counter wraps one or more times however.
      
      This patchset adds a new idr_alloc_cyclic routine and converts several
      of these users to it.  Many of these users are in obscure parts of the
      kernel, and I don't have a good way to test some of them.  The change is
      pretty straightforward though, so hopefully it won't be an issue.
      
      There is one other cyclic user of idr_alloc that I didn't touch in
      ipc/util.c.  That one is doing some strange stuff that I didn't quite
      understand, but it looks like it should probably be converted later
      somehow.
      
      This patch:
      
      Thus spake Tejun Heo:
      
          Ooh, BTW, the cyclic allocation is broken.  It's prone to -ENOSPC
          after the first wraparound.  There are several cyclic users in the
          kernel and I think it probably would be best to implement cyclic
          support in idr.
      
      This patch does that by adding a new idr_alloc_cyclic function that such
      users in the kernel can use.  With this, there's no need for a caller to
      keep track of the last value used as that's now tracked internally.  This
      should prevent the ENOSPC problems that can hit when the "last allocated"
      counter exceeds INT_MAX.
      
      Later patches will convert existing cyclic users to the new interface.
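A toy model of the cyclic allocation described above. It uses a fixed-size array instead of an IDR tree, and every name is hypothetical; it only shows how keeping the cursor internally and wrapping back to start avoids the premature -ENOSPC:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_IDS 64   /* hypothetical fixed ID space */

struct cyclic_ids {
	bool used[MAX_IDS];
	int cur;          /* where the next search starts, tracked internally */
};

/* Return the first free ID in [from, end), or -1. */
static int find_free(const struct cyclic_ids *ids, int from, int end)
{
	for (int id = from; id < end; id++)
		if (!ids->used[id])
			return id;
	return -1;
}

/*
 * Allocate an ID in [start, end) (end <= MAX_IDS), preferring IDs at or
 * above the last one handed out and wrapping back to start only when the
 * top of the range is exhausted.
 */
static int alloc_cyclic(struct cyclic_ids *ids, int start, int end)
{
	int from = (ids->cur > start && ids->cur < end) ? ids->cur : start;
	int id = find_free(ids, from, end);

	if (id < 0 && from > start)
		id = find_free(ids, start, from);      /* wrap around */
	if (id < 0)
		return -ENOSPC;

	ids->used[id] = true;
	ids->cur = id + 1;
	return id;
}
```

A freed ID is not handed out again until the cursor wraps, which is the property cyclic users rely on.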
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Cc: John McCutchan <john@johnmccutchan.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Or Gerlitz <ogerlitz@mellanox.com>
      Cc: Robert Love <rlove@rlove.org>
      Cc: Roland Dreier <roland@purestorage.com>
      Cc: Sridhar Samudrala <sri@us.ibm.com>
      Cc: Steve Wise <swise@opengridcomputing.com>
      Cc: Tom Tucker <tom@opengridcomputing.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lib, net: make isodigit() public and use it · 2e0fb404
      Andy Shevchenko committed
      There are at least two users of isodigit().  Let's make it a public
      function of ctype.h.
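An octal-digit test of this kind is a one-line range check. The sketch below is a userspace version; parse_octal3() is a hypothetical caller added only to show typical use (decoding an octal escape such as "\101"), not one of the existing users mentioned above:

```c
#include <assert.h>

/* Return non-zero if c is an octal digit ('0'..'7'). */
static inline int isodigit(const char c)
{
	return c >= '0' && c <= '7';
}

/* Hypothetical user: decode up to three octal digits.
 * Stores the number of digits consumed in *used. */
static int parse_octal3(const char *s, int *used)
{
	int val = 0, n = 0;

	while (n < 3 && isodigit(s[n])) {
		val = val * 8 + (s[n] - '0');
		n++;
	}
	*used = n;
	return val;
}
```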
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • argv_split(): teach it to handle mutable strings · 095d141b
      Oleg Nesterov committed
      argv_split() allocates an argv[count_argc(str)] array and assumes that it
      will find the same number of arguments later.  This is obviously wrong if
      this string can be changed, say, by sysctl.
      
      With this patch argv_split() kstrndup's the whole string and does not
      split it; instead, it simply replaces the spaces with zeroes and keeps
      the allocated memory in argv[-1] for argv_free(arg).
      
      We do not use argv[0] because:
      
      	- str can be all-spaces or empty. In fact this case is fine,
      	  we could kfree() it before return, but:
      
      	- str can have a space at the start, and we can not rely on
      	  kstrndup(skip_spaces(str)) because it can equally race if
      	  this string is mutable.
      
      Also, simplify count_argc() and kill the no longer used skip_arg().
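A userspace sketch of the approach, assuming malloc()/free() in place of kstrndup()/kfree() and using hypothetical function names: copy the input once, count and split the copy in place, and hide the buffer in argv[-1]:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Count whitespace-separated words. */
static int count_argc(const char *s)
{
	int count = 0;

	while (*s) {
		if (isspace((unsigned char)*s)) {
			s++;
			continue;
		}
		count++;
		while (*s && !isspace((unsigned char)*s))
			s++;
	}
	return count;
}

/*
 * Split a private copy of str.  Counting and splitting both use the copy,
 * so a concurrent change to str cannot make them disagree.  The copy is
 * kept in the hidden slot argv[-1] so argv_free_copy() can release it.
 */
static char **argv_split_copy(const char *str, int *argcp)
{
	size_t len = strlen(str);
	char *buf = malloc(len + 1);
	char **slots, **argv, *p;
	int argc, i = 0;

	if (!buf)
		return NULL;
	memcpy(buf, str, len + 1);               /* snapshot the input once */

	argc = count_argc(buf);
	slots = malloc(sizeof(*slots) * (argc + 2));  /* hidden slot + NULL */
	if (!slots) {
		free(buf);
		return NULL;
	}
	slots[0] = buf;
	argv = slots + 1;

	for (p = buf; *p; ) {
		if (isspace((unsigned char)*p)) {
			*p++ = '\0';             /* separators become NULs */
			continue;
		}
		argv[i++] = p;
		while (*p && !isspace((unsigned char)*p))
			p++;
	}
	argv[i] = NULL;

	if (argcp)
		*argcp = argc;
	return argv;
}

static void argv_free_copy(char **argv)
{
	if (!argv)
		return;
	free(argv[-1]);                          /* the string copy */
	free(argv - 1);                          /* the pointer array */
}
```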
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lib/int_sqrt.c: optimize square root algorithm · 30493cc9
      Davidlohr Bueso committed
      Optimize the current version of the shift-and-subtract (hardware)
      algorithm, described by John von Neumann[1] and Guy L Steele.
      
      Iterating 1,000,000 times, perf shows for the current version:
      
       Performance counter stats for './sqrt-curr' (10 runs):
      
               27.170996 task-clock                #    0.979 CPUs utilized            ( +-  3.19% )
                       3 context-switches          #    0.103 K/sec                    ( +-  4.76% )
                       0 cpu-migrations            #    0.004 K/sec                    ( +-100.00% )
                     104 page-faults               #    0.004 M/sec                    ( +-  0.16% )
              64,921,199 cycles                    #    2.389 GHz                      ( +-  0.03% )
              28,967,789 stalled-cycles-frontend   #   44.62% frontend cycles idle     ( +-  0.18% )
         <not supported> stalled-cycles-backend
             104,502,623 instructions              #    1.61  insns per cycle
                                                   #    0.28  stalled cycles per insn  ( +-  0.00% )
              34,088,368 branches                  # 1254.587 M/sec                    ( +-  0.00% )
                   4,901 branch-misses             #    0.01% of all branches          ( +-  1.32% )
      
             0.027763015 seconds time elapsed                                          ( +-  3.22% )
      
      And for the new version:
      
      Performance counter stats for './sqrt-new' (10 runs):
      
                0.496869 task-clock                #    0.519 CPUs utilized            ( +-  2.38% )
                       0 context-switches          #    0.000 K/sec
                       0 cpu-migrations            #    0.403 K/sec                    ( +-100.00% )
                     104 page-faults               #    0.209 M/sec                    ( +-  0.15% )
                 590,760 cycles                    #    1.189 GHz                      ( +-  2.35% )
                 395,053 stalled-cycles-frontend   #   66.87% frontend cycles idle     ( +-  3.67% )
         <not supported> stalled-cycles-backend
                 398,963 instructions              #    0.68  insns per cycle
                                                   #    0.99  stalled cycles per insn  ( +-  0.39% )
                  70,228 branches                  #  141.341 M/sec                    ( +-  0.36% )
                   3,364 branch-misses             #    4.79% of all branches          ( +-  5.45% )
      
             0.000957440 seconds time elapsed                                          ( +-  2.42% )
      
      Furthermore, this saves space in instruction text:
      
         text    data     bss     dec     hex filename
          111       0       0     111      6f lib/int_sqrt-baseline.o
           89       0       0      89      59 lib/int_sqrt.o
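For reference, the bit-by-bit shift-and-subtract method can be written as below. This is a userspace sketch of that classic algorithm, assuming BITS_PER_LONG is derived from sizeof(unsigned long); it is not claimed to be a verbatim copy of the patched lib/int_sqrt.c:

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Integer square root: returns floor(sqrt(x)). */
static unsigned long int_sqrt(unsigned long x)
{
	unsigned long b, m, y = 0;

	if (x <= 1)
		return x;

	/* m walks down the even bit positions, highest first. */
	m = 1UL << (BITS_PER_LONG - 2);
	while (m != 0) {
		b = y + m;          /* trial value for this result bit */
		y >>= 1;

		if (x >= b) {       /* bit fits: subtract and keep it */
			x -= b;
			y += m;
		}
		m >>= 2;
	}

	return y;
}
```

Only shifts, adds and compares are used, which is why it suits code that wants to avoid floating point.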
      
      [1] http://en.wikipedia.org/wiki/First_Draft_of_a_Report_on_the_EDVAC
      Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Reviewed-by: Jonathan Gonzalez <jgonzlez@linets.cl>
      Tested-by: Jonathan Gonzalez <jgonzlez@linets.cl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • genalloc: add devres support, allow to find a managed pool by device · 9375db07
      Philipp Zabel committed
      This patch adds three exported functions to lib/genalloc.c:
      devm_gen_pool_create, dev_get_gen_pool, and of_get_named_gen_pool.
      
      devm_gen_pool_create is a managed version of gen_pool_create that keeps
      track of the pool via devres and allows the management code to
      automatically destroy it after device removal.
      
      dev_get_gen_pool retrieves the gen_pool for a given device, if it was
      created with devm_gen_pool_create, using devres_find.
      
      of_get_named_gen_pool retrieves the gen_pool for a given device node and
      property name, where the property must contain a phandle pointing to a
      platform device node.  The corresponding platform device is then fed into
      dev_get_gen_pool and the resulting gen_pool is returned.
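The devres pattern behind devm_gen_pool_create() and dev_get_gen_pool() can be modeled in userspace as follows. Everything here is hypothetical (struct device_model, the devres_model_* helpers and a toy pool type); it only mirrors the idea that a resource is registered with a release callback, found again by that callback, and released automatically when the device goes away:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace model of device-managed resources (devres). */
typedef void (*release_fn)(void *res);

struct devres_node {
	struct devres_node *next;
	release_fn release;
	void *res;
};

struct device_model {
	struct devres_node *resources;    /* most recent first */
};

static int devres_model_add(struct device_model *dev, void *res,
			    release_fn release)
{
	struct devres_node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->res = res;
	n->release = release;
	n->next = dev->resources;
	dev->resources = n;
	return 0;
}

/* Look a resource up by its release callback. */
static void *devres_model_find(struct device_model *dev, release_fn release)
{
	for (struct devres_node *n = dev->resources; n; n = n->next)
		if (n->release == release)
			return n->res;
	return NULL;
}

/* Device removal: release resources in reverse order of registration. */
static void device_model_remove(struct device_model *dev)
{
	while (dev->resources) {
		struct devres_node *n = dev->resources;

		dev->resources = n->next;
		n->release(n->res);
		free(n);
	}
}

/* Toy pool with a managed constructor and a per-device lookup. */
struct pool_model {
	int min_alloc_order;
};

static int pools_destroyed;

static void pool_model_release(void *res)
{
	free(res);
	pools_destroyed++;
}

static struct pool_model *devm_pool_model_create(struct device_model *dev,
						 int min_alloc_order)
{
	struct pool_model *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	p->min_alloc_order = min_alloc_order;
	if (devres_model_add(dev, p, pool_model_release)) {
		free(p);
		return NULL;
	}
	return p;
}

static struct pool_model *dev_get_pool_model(struct device_model *dev)
{
	return devres_model_find(dev, pool_model_release);
}
```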
      
      [akpm@linux-foundation.org: make the of_get_named_gen_pool() stub static, fixing a zillion link errors]
      [akpm@linux-foundation.org: squish "struct device declared inside parameter list" warning]
      Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
      Acked-by: Grant Likely <grant.likely@secretlab.ca>
      Tested-by: Michal Simek <monstr@monstr.eu>
      Cc: Fabio Estevam <fabio.estevam@freescale.com>
      Cc: Matt Porter <mporter@ti.com>
      Cc: Dong Aisheng <dong.aisheng@linaro.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Javier Martin <javier.martin@vista-silicon.com>
      Cc: Huang Shijie <shijie8@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, show_mem: suppress page counts in non-blockable contexts · 4b59e6c4
      David Rientjes committed
      On large systems with a lot of memory, walking all RAM to determine page
      types may take a half second or even more.
      
      In non-blockable contexts, the page allocator will emit a page allocation
      failure warning unless __GFP_NOWARN is specified.  In such contexts, irqs
      are typically disabled and such a lengthy delay may even result in NMI
      watchdog timeouts.
      
      To fix this, suppress the page walk in such contexts when printing the
      page allocation failure warning.
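A hedged sketch of the decision described above. The flag name, its value and the page model are hypothetical; the point is only that a caller which cannot block asks the report to skip the walk over every page:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical filter flag: suppress the per-page walk. */
#define SHOW_MEM_SKIP_PAGE_COUNT 0x1u

struct page_model {
	bool reserved;
};

/* Allocation-failure path: decide what the memory report may do. */
static unsigned int alloc_fail_filter(bool can_block)
{
	unsigned int filter = 0;

	/* Non-blocking callers often run with irqs off; a long walk there
	 * could trip the NMI watchdog. */
	if (!can_block)
		filter |= SHOW_MEM_SKIP_PAGE_COUNT;
	return filter;
}

/* Returns the number of reserved pages, or -1 if the walk was skipped. */
static long show_mem_model(unsigned int filter,
			   const struct page_model *pages, long npages)
{
	long reserved = 0;

	if (filter & SHOW_MEM_SKIP_PAGE_COUNT)
		return -1;               /* walk over all RAM suppressed */

	for (long i = 0; i < npages; i++)
		if (pages[i].reserved)
			reserved++;
	return reserved;
}
```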
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 18 April, 2013: 1 commit
    • x86, kdump: Set crashkernel_low automatically · c729de8f
      Yinghai Lu committed
      Chao said that kdump worked well on his system on 3.8 without any
      extra parameter, even though the iommu does not work with kdump.  Now
      he has to append crashkernel_low=Y in the first kernel to make kdump
      work.
      
      We have now modified crashkernel=X to allocate memory beyond 4G (if
      available) and do not allocate a low range for crashkernel if the user
      does not specify that with crashkernel_low=Y.  This causes a regression
      if iommu is not enabled.  Without iommu, swiotlb needs to be set up in
      the first 4G and there is no low memory available to the second kernel.
      
      Set crashkernel_low automatically if the user does not specify that.
      
      For systems that do support IOMMU with kdump properly, the user could
      specify crashkernel_low=0 to save that 72M low ram.
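The 72M figure can be reproduced with simple arithmetic, assuming a default swiotlb of 64 MiB plus the 8 MiB of extra low memory mentioned in the -v4 note below. The helper name and the use of -1 for "not specified" are hypothetical:

```c
#include <assert.h>

#define MiB (1UL << 20)

/*
 * Low (below 4G) reservation for the crash kernel.
 * user_low < 0 : crashkernel_low= not given, size it automatically.
 * user_low >= 0: honour the user, including 0 to reserve nothing.
 */
static unsigned long crashkernel_low_size(long user_low,
					  unsigned long swiotlb_bytes)
{
	if (user_low >= 0)
		return (unsigned long)user_low;
	return swiotlb_bytes + 8 * MiB;   /* swiotlb + assumed extra 8 MiB */
}
```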
      
      -v3: add swiotlb_size() according to Konrad.
      -v4: add comments what 8M is for according to hpa.
           also update more crashkernel_low= in kernel-parameters.txt
      -v5: update changelog according to Vivek.
      -v6: Change description about swiotlb referring according to HATAYAMA.
      Reported-by: WANG Chao <chaowang@redhat.com>
      Tested-by: WANG Chao <chaowang@redhat.com>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1366089828-19692-2-git-send-email-yinghai@kernel.org
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  7. 16 April, 2013: 1 commit
  8. 14 April, 2013: 1 commit
    • kobject: fix kset_find_obj() race with concurrent last kobject_put() · a49b7e82
      Linus Torvalds committed
      Anatol Pomozov identified a race condition that hits module unloading
      and re-loading.  To quote Anatol:
      
       "This is a race condition that exists between kset_find_obj() and
        kobject_put().  kset_find_obj() might return a kobject whose refcount
        is 0 if this kobject is being freed by kobject_put() in another
        thread.
      
        Here is the timeline for the crash in the case where kset_find_obj()
        searches for an object that nobody holds and another thread is doing
        kobject_put() on the same kobject:
      
          THREAD A (calls kset_find_obj())     THREAD B (calls kobject_put())
          spin_lock()
                                               atomic_dec_return(kobj->kref), counter gets zero here
                                               ... starts kobject cleanup ....
                                               spin_lock() // WAIT thread A in kobj_kset_leave()
          iterate over kset->list
          atomic_inc(kobj->kref) (counter becomes 1)
          spin_unlock()
                                               spin_lock() // taken
                                               // it does not know that thread A increased counter so it
                                               remove obj from list
                                               spin_unlock()
                                               vfree(module) // frees module object with containing kobj
      
          // kobj points to freed memory area!!
          kobject_put(kobj) // OOPS!!!!
      
        The race above happens because module.c tries to use kset_find_obj()
        when somebody unloads module.  The module.c code was introduced in
        commit 6494a93d"
      
      Anatol supplied a patch specific for module.c that worked around the
      problem by simply not using kset_find_obj() at all, but rather than make
      a local band-aid, this just fixes kset_find_obj() to be thread-safe
      using the proper model of refusing to get a new reference if the
      refcount has already dropped to zero.
      
      See examples of this proper refcount handling not only in the kref
      documentation, but in various other equivalent uses of this pattern by
      grepping for atomic_inc_not_zero().
      
      [ Side note: the module race does indicate that module loading and
        unloading is not properly serialized wrt sysfs information using the
        module mutex.  That may require further thought, but this is the
        correct fix at the kobject layer regardless. ]
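The fixed lookup pattern can be sketched in userspace with a mutex standing in for the kset spinlock. All types and names are hypothetical; the essential point is that a reference is taken only through a get-unless-zero step, so an object whose count already reached zero is never returned:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct kobject and struct kset. */
struct kobj_model {
	const char *name;
	atomic_int refcount;
	struct kobj_model *next;
};

struct kset_model {
	pthread_mutex_t lock;          /* the kernel uses a spinlock */
	struct kobj_model *head;
};

/* Increment refcount unless it is already zero (object being freed). */
static bool kobj_get_unless_zero(struct kobj_model *k)
{
	int old = atomic_load(&k->refcount);

	while (old != 0)
		if (atomic_compare_exchange_weak(&k->refcount, &old, old + 1))
			return true;
	return false;
}

/* Find an object by name; returns it with an extra reference, or NULL. */
static struct kobj_model *kset_find_model(struct kset_model *ks,
					  const char *name)
{
	struct kobj_model *ret = NULL;

	pthread_mutex_lock(&ks->lock);
	for (struct kobj_model *k = ks->head; k; k = k->next) {
		if (strcmp(k->name, name) == 0) {
			/* A dying object is treated as "not found". */
			if (kobj_get_unless_zero(k))
				ret = k;
			break;
		}
	}
	pthread_mutex_unlock(&ks->lock);
	return ret;
}
```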
      Reported-analyzed-and-tested-by: Anatol Pomozov <anatol.pomozov@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 10 April, 2013: 1 commit
  10. 28 March, 2013: 1 commit
  11. 23 March, 2013: 1 commit
    • lru_cache: introduce lc_get_cumulative() · cbe5e610
      Lars Ellenberg committed
      New helper to be able to consolidate more updates
      into a single transaction.
      Without this, we can only grab a single refcount
      on an updated element while preparing a transaction.
      
      lc_get_cumulative - like lc_get; also finds to-be-changed elements
        @lc: the lru cache to operate on
        @enr: the label to look up
      
        Unlike lc_get, this also returns the element for @enr if it belongs to
        a pending transaction, so the return values are like for lc_get(),
        plus:
      
        pointer to an element already on the "to_be_changed" list.
      	  In this case, the cache was already marked %LC_DIRTY.
      
        Caller needs to make sure that the pending transaction is completed,
        before proceeding to actually use this element.
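A heavily simplified model of the difference between the two lookups, assuming only the behaviour stated above. The element layout, field names and both functions are hypothetical, and allocating a slot for a label not yet in the cache is left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified cache element. */
struct lc_elem_model {
	unsigned int label;     /* the @enr this slot caches */
	unsigned int refcnt;
	bool to_be_changed;     /* part of a pending transaction */
};

struct lc_model {
	struct lc_elem_model *elems;
	size_t nr_elems;
	bool dirty;             /* stands in for LC_DIRTY */
};

static struct lc_elem_model *lc_model_find(struct lc_model *lc,
					   unsigned int enr)
{
	for (size_t i = 0; i < lc->nr_elems; i++)
		if (lc->elems[i].label == enr)
			return &lc->elems[i];
	return NULL;
}

/* lc_get-like: refuse elements still part of a pending transaction. */
static struct lc_elem_model *lc_model_get(struct lc_model *lc,
					  unsigned int enr)
{
	struct lc_elem_model *e = lc_model_find(lc, enr);

	if (!e || e->to_be_changed)
		return NULL;
	e->refcnt++;
	return e;
}

/*
 * lc_get_cumulative-like: also hand out (and reference) elements already
 * on the to_be_changed list, so further updates can join that transaction.
 * The caller must wait for the transaction to complete before using it.
 */
static struct lc_elem_model *lc_model_get_cumulative(struct lc_model *lc,
						     unsigned int enr)
{
	struct lc_elem_model *e = lc_model_find(lc, enr);

	if (!e)
		return NULL;
	if (e->to_be_changed)
		assert(lc->dirty);   /* the cache was already marked dirty */
	e->refcnt++;
	return e;
}
```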
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      
      Fixed up by Jens to export lc_get_cumulative().
      Signed-off-by: Jens Axboe <axboe@kernel.dk>