1. 28 Jan, 2014 (1 commit)
  2. 24 Jan, 2014 (13 commits)
  3. 22 Jan, 2014 (6 commits)
    • reciprocal_divide: update/correction of the algorithm · 809fa972
      Hannes Frederic Sowa authored
      Jakub Zawadzki noticed that some divisions by reciprocal_divide()
      were not correct [1][2]. He could also show this with BPF code:
      divisors are transformed via reciprocal_value() for runtime
      invariance and later passed to reciprocal_divide(), and reversing
      that in a BPF dump ended up with a different, off-by-one K in
      some situations.
      
      This has been fixed by Eric Dumazet in commit aee636c4
      ("bpf: do not use reciprocal divide"). This follow-up patch
      improves reciprocal_value() and reciprocal_divide() to work in
      all cases by using the Granlund and Montgomery method, so that
      future uses are also safe and free of non-obvious side effects.
      Known problems with the old implementation were that division by 1
      always returned 0, and that there were some off-by-ones when the
      dividend and divisor were very large. As far as we can tell, this
      did not seem to be problematic for its current users; Eric Dumazet
      checked the slab usage, but we cannot say so with certainty for
      flex_array. Still, in order to fix that, we propose an extension of
      the original implementation from commit 6a2d7a95 resp. [3][4],
      using the algorithm proposed in "Division by Invariant Integers
      Using Multiplication" [5] by Torbjörn Granlund and Peter L.
      Montgomery. In pseudocode, for q = n/d where q, n and d are in
      the u32 universe:
      
      1) Initialization:
      
        int l = ceil(log_2 d)
        uword m' = floor((1<<32)*((1<<l)-d)/d)+1
        int sh_1 = min(l,1)
        int sh_2 = max(l-1,0)
      
      2) For q = n/d, all uword:
      
        uword t = (n*m')>>32
        q = (t+((n-t)>>sh_1))>>sh_2
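
      As a rough, self-contained C sketch of the pseudocode above (names are
      illustrative and need not match the kernel's final code):

        #include <stdint.h>

        struct recip { uint32_t m; uint8_t sh1, sh2; };

        /* 1) Initialization for a fixed divisor d > 0 */
        static struct recip recip_value(uint32_t d)
        {
                struct recip R;
                int l = 0;

                while ((1ULL << l) < d)                 /* l = ceil(log_2 d) */
                        l++;
                R.m = (uint32_t)((((1ULL << 32) * ((1ULL << l) - d)) / d) + 1);
                R.sh1 = l < 1 ? l : 1;                  /* min(l, 1) */
                R.sh2 = l > 1 ? l - 1 : 0;              /* max(l - 1, 0) */
                return R;
        }

        /* 2) q = n / d, all in u32, intermediate product widened to 64 bit */
        static uint32_t recip_divide(uint32_t n, struct recip R)
        {
                uint32_t t = (uint32_t)(((uint64_t)n * R.m) >> 32);

                return (t + ((n - t) >> R.sh1)) >> R.sh2;
        }

      Note that for d = 1 this gives l = 0, m' = 1, sh_1 = sh_2 = 0 and thus
      q = n, which the old implementation got wrong.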
      
      The assembler implementation from Agner Fog [6] also helped a lot
      while implementing.  We have tested the implementation on x86_64,
      ppc64, i686 and s390x; on x86_64/Haswell it still comes in at half
      the latency of a normal divide.
      
      Joint work with Daniel Borkmann.
      
        [1] http://www.wireshark.org/~darkjames/reciprocal-buggy.c
        [2] http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c
        [3] https://gmplib.org/~tege/division-paper.pdf
        [4] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html
        [5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556
        [6] http://www.agner.org/optimize/asmlib.zip
      Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Cc: Veaceslav Falico <vfalico@redhat.com>
      Cc: Jay Vosburgh <fubar@us.ibm.com>
      Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lib/show_mem.c: show num_poisoned_pages when oom · 25487d73
      Xishi Qiu authored
      Show num_poisoned_pages when OOM; it is a little helpful for finding
      the reason.  It will also be emitted any time show_mem() is called.
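
      The added output boils down to something like the following sketch
      (assuming the counter is only compiled in under CONFIG_MEMORY_FAILURE):

        #ifdef CONFIG_MEMORY_FAILURE
                printk("%lu pages hwpoisoned\n",
                       atomic_long_read(&num_poisoned_pages));
        #endif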
      Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
      Suggested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: David Rientjes <rientjes@google.com>
      Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lib/cpumask.c: use memblock apis for early memory allocations · c1529500
      Santosh Shilimkar authored
      Switch to memblock interfaces for the early memory allocator instead
      of the bootmem allocator.  No functional change in behavior from the
      bootmem users' point of view.
      
      Archs already converted to NO_BOOTMEM now use memblock interfaces
      directly instead of the bootmem wrappers built on top of memblock.
      For the archs which still use bootmem, these new APIs simply fall
      back to the existing bootmem APIs.
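
      The conversion is essentially a one-for-one substitution of the
      allocator call; roughly (illustrative, not the exact hunk):

        /* before: bootmem */
        *mask = alloc_bootmem(cpumask_size());

        /* after: memblock; memblock_virt_alloc() falls back to bootmem
         * on architectures that have not selected NO_BOOTMEM */
        *mask = memblock_virt_alloc(cpumask_size(), 0);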
      Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Grygorii Strashko <grygorii.strashko@ti.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Paul Walmsley <paul@pwsan.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Tony Lindgren <tony@atomide.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lib/swiotlb.c: use memblock apis for early memory allocations · 457ff1de
      Santosh Shilimkar authored
      Switch to memblock interfaces for the early memory allocator instead
      of the bootmem allocator.  No functional change in behavior from the
      bootmem users' point of view.
      
      Archs already converted to NO_BOOTMEM now use memblock interfaces
      directly instead of the bootmem wrappers built on top of memblock.
      For the archs which still use bootmem, these new APIs simply fall
      back to the existing bootmem APIs.
      Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Grygorii Strashko <grygorii.strashko@ti.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Paul Walmsley <paul@pwsan.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Tony Lindgren <tony@atomide.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, show_mem: remove SHOW_MEM_FILTER_PAGE_COUNT · aec6a888
      Mel Gorman authored
      Commit 4b59e6c4 ("mm, show_mem: suppress page counts in
      non-blockable contexts") introduced SHOW_MEM_FILTER_PAGE_COUNT to
      suppress PFN walks on large memory machines.  Commit c78e9363 ("mm:
      do not walk all of system memory during show_mem") avoided a PFN walk in
      the generic show_mem helper which removes the requirement for
      SHOW_MEM_FILTER_PAGE_COUNT in that case.
      
      This patch removes PFN walkers from the arch-specific implementations
      that report on a per-node or per-zone granularity.  ARM and unicore32
      still do a PFN walk, as they report memory usage per bank, which is a
      much finer granularity at which the debugging information may still be
      of use.  As the remaining arches doing PFN walks have relatively small
      amounts of memory, this patch simply removes SHOW_MEM_FILTER_PAGE_COUNT.
      
      [akpm@linux-foundation.org: fix parisc]
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: James Bottomley <jejb@parisc-linux.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dma-debug: introduce debug_dma_assert_idle() · 0abdd7a8
      Dan Williams authored
      Record actively mapped pages and provide an api for asserting a given
      page is dma inactive before execution proceeds.  Placing
      debug_dma_assert_idle() in cow_user_page() flagged the violation of the
      dma-api in the NET_DMA implementation (see commit 77873803 "net_dma:
      mark broken").
      
      The implementation includes the capability to count, in a limited way,
      repeat mappings of the same page that occur without an intervening
      unmap.  This 'overlap' counter is limited to the few bits of tag space
      in a radix tree.  This mechanism is added to mitigate false negative
      cases where, for example, a page is dma mapped twice and
      debug_dma_assert_idle() is called after the page is un-mapped once.
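
      Usage is a single assertion placed before the CPU touches a page that
      may still be under DMA; a rough sketch of the cow_user_page() hook
      mentioned above:

        /* in cow_user_page(): have dma-debug complain if 'src' still has
         * an active DMA mapping before we copy it for copy-on-write */
        debug_dma_assert_idle(src);
        copy_user_highpage(dst, src, va, vma);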
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Vinod Koul <vinod.koul@intel.com>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: James Bottomley <JBottomley@Parallels.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 21 Jan, 2014 (1 commit)
  5. 20 Jan, 2014 (1 commit)
  6. 17 Jan, 2014 (2 commits)
  7. 15 Jan, 2014 (1 commit)
    • lib/percpu_counter.c: fix __percpu_counter_add() · 74e72f89
      Ming Lei authored
      __percpu_counter_add() may be called in softirq/hardirq handlers (for
      example, blk_mq_queue_exit() is typically called in hardirq/softirq
      context), so we need to call this_cpu_add() (an irq-safe helper) to
      update the percpu counter, otherwise counts may be lost.
      
      This fixes the problem that 'rmmod null_blk' hangs in blk_cleanup_queue()
      because of miscounting of request_queue->mq_usage_counter.
      
      This patch is the v1 of the previous one, "lib/percpu_counter.c:
      disable local irq when updating percpu counter", and takes Andrew's
      approach, which may be more efficient for archs (x86, s390) that
      have an optimized this_cpu_add().
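
      The core of the change is the per-CPU fast path; roughly (a sketch,
      not the full upstream function):

        /* before (not irq safe): the read-modify-write can be interrupted
         * between the read and the write, losing the handler's update */
        __this_cpu_write(*fbc->counters, __this_cpu_read(*fbc->counters) + amount);

        /* after (irq safe): a single instruction on x86/s390, an
         * irq-disabled section on other architectures */
        this_cpu_add(*fbc->counters, amount);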
      Signed-off-by: Ming Lei <tom.leiming@gmail.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Shaohua Li <shli@fusionio.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Fan Du <fan.du@windriver.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 13 Jan, 2014 (1 commit)
  9. 09 Jan, 2014 (1 commit)
  10. 07 Jan, 2014 (1 commit)
  11. 05 Jan, 2014 (1 commit)
  12. 19 Dec, 2013 (1 commit)
  13. 18 Dec, 2013 (1 commit)
    • lib: introduce arch optimized hash library · 71ae8aac
      Francesco Fusco authored
      We introduce a new hashing library that is meant to be used in
      contexts where speed is more important than uniformity of the
      hashed values.  The hash library leverages architecture-specific
      implementations to achieve high performance and falls back to
      jhash() for the generic case.
      
      On Intel-based x86 architectures, the library can exploit the crc32l
      instruction, part of the Intel SSE4.2 instruction set, if the
      instruction is supported by the processor. This implementation
      is twice as fast as the jhash() implementation on an i7 processor.
      
      Additional architectures, such as ARM64, provide instructions for
      accelerating the computation of CRC, so they could be added as well
      in follow-up work.
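
      To illustrate the idea outside the kernel (user-space sketch, not the
      kernel API itself; the real code falls back to jhash() rather than the
      trivial mix below):

        #include <stdint.h>
        #include <stddef.h>
        #ifdef __SSE4_2__
        #include <nmmintrin.h>                  /* _mm_crc32_u32() */
        #endif

        static uint32_t fast_hash32(const uint32_t *data, size_t words,
                                    uint32_t seed)
        {
                uint32_t h = seed;

                for (size_t i = 0; i < words; i++) {
        #ifdef __SSE4_2__
                        h = _mm_crc32_u32(h, data[i]);  /* hardware CRC32C */
        #else
                        h = (h ^ data[i]) * 0x9e3779b1u; /* stand-in fallback */
                        h ^= h >> 16;
        #endif
                }
                return h;
        }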
      Signed-off-by: Francesco Fusco <ffusco@redhat.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Thomas Graf <tgraf@redhat.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 12 Dec, 2013 (1 commit)
    • kernfs: s/sysfs_dirent/kernfs_node/ and rename its friends accordingly · 324a56e1
      Tejun Heo authored
      kernfs has just been separated out from sysfs and we're already in
      full conflict mode.  Nothing can make the situation any worse.  Let's
      take the chance to name things properly.
      
      This patch performs the following renames.
      
      * s/sysfs_elem_dir/kernfs_elem_dir/
      * s/sysfs_elem_symlink/kernfs_elem_symlink/
      * s/sysfs_elem_attr/kernfs_elem_file/
      * s/sysfs_dirent/kernfs_node/
      * s/sd/kn/ in kernfs proper
      * s/parent_sd/parent/
      * s/target_sd/target/
      * s/dir_sd/parent/
      * s/to_sysfs_dirent()/rb_to_kn()/
      * misc renames of local vars when they conflict with the above
      
      Because md, mic and gpio dig into sysfs details, this patch ends up
      modifying them.  All are sysfs_dirent renames and trivial.  While we
      can avoid these by introducing a dummy wrapping struct sysfs_dirent
      around kernfs_node, given the limited usage outside kernfs and sysfs
      proper, I don't think such a workaround is called for.
      
      This patch is strictly rename only and doesn't introduce any
      functional difference.
      
      - mic / gpio renames were missing.  Spotted by kbuild test robot.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
      Cc: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  15. 09 Dec, 2013 (2 commits)
  16. 08 Dec, 2013 (2 commits)
  17. 02 Dec, 2013 (1 commit)
    • KEYS: Fix multiple key add into associative array · 23fd78d7
      David Howells authored
      If sufficient keys (or keyrings) are added into a keyring such that a node in
      the associative array's tree overflows (each node has a capacity N, currently
      16) and such that all N+1 keys have the same index key segment for that level
      of the tree (the level'th nibble of the index key), then assoc_array_insert()
      calls ops->diff_objects() to indicate at which bit position the two index keys
      vary.
      
      However, __key_link_begin() passes a NULL object to assoc_array_insert() with
      the intention of supplying the correct pointer later before we commit the
      change.  This means that keyring_diff_objects() is given a NULL pointer as one
      of its arguments which it does not expect.  This results in an oops like the
      attached.
      
      With the previous patch to fix the keyring hash function, this can be forced
      much more easily by creating a keyring and only adding keyrings to it.  Add any
      other sort of key and a different insertion path is taken - all 16+1 objects
      must want to cluster in the same node slot.
      
      This can be tested by:
      
      	r=`keyctl newring sandbox @s`
      	for ((i=0; i<=16; i++)); do keyctl newring ring$i $r; done
      
      This should work fine, but oopses when the 17th keyring is added.
      
      Since ops->diff_objects() is always called with the first pointer pointing to
      the object to be inserted (ie. the NULL pointer), we can fix the problem by
      changing the to-be-inserted object pointer to point to the index key passed
      into assoc_array_insert() instead.
      
      Whilst we're at it, we also switch the arguments so that they are the same as
      for ->compare_object().
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
      IP: [<ffffffff81191ee4>] hash_key_type_and_desc+0x18/0xb0
      ...
      RIP: 0010:[<ffffffff81191ee4>] hash_key_type_and_desc+0x18/0xb0
      ...
      Call Trace:
       [<ffffffff81191f9d>] keyring_diff_objects+0x21/0xd2
       [<ffffffff811f09ef>] assoc_array_insert+0x3b6/0x908
       [<ffffffff811929a7>] __key_link_begin+0x78/0xe5
       [<ffffffff81191a2e>] key_create_or_update+0x17d/0x36a
       [<ffffffff81192e0a>] SyS_add_key+0x123/0x183
       [<ffffffff81400ddb>] tracesys+0xdd/0xe2
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Stephen Gallagher <sgallagh@redhat.com>
  18. 30 Nov, 2013 (2 commits)
    • Revert "smp/cpumask: Make CONFIG_CPUMASK_OFFSTACK=y usable without debug dependency" · 962d9c57
      Ingo Molnar authored
      This reverts commit 9dd12201.
      
      Revert it until Linus's concerns are addressed: this option should not
      allow nonsensical CONFIG_CPUMASK_OFFSTACK and CONFIG_NR_CPUS values, and
      it should probably select sane defaults as well.
      
      Cc: Josh Boyer <jwboyer@fedoraproject.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/n/tip-etcruvuw9neycYf0Rripxrjv@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sysfs, kernfs: introduce kernfs_create_dir[_ns]() · 93b2b8e4
      Tejun Heo authored
      Introduce a kernfs interface to manipulate a directory, which takes
      and returns sysfs_dirents.
      
      create_dir() is renamed to kernfs_create_dir_ns() and its arguments
      and return value are updated.  create_dir() usages are replaced with
      kernfs_create_dir_ns() and sysfs_create_subdir() usages are replaced
      with kernfs_create_dir().  Dup warnings are handled explicitly by
      sysfs users of the kernfs interface.
      
      sysfs_enable_ns() is renamed to kernfs_enable_ns().
      
      This patch doesn't introduce any behavior changes.
      
      v2: Dummy implementation for !CONFIG_SYSFS updated to return -ENOSYS.
      
      v3: kernfs_enable_ns() added.
      
      v4: Refreshed on top of "sysfs: drop kobj_ns_type handling, take #2"
          so that this patch removes sysfs_enable_ns().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  19. 28 Nov, 2013 (1 commit)