1. 24 5月, 2014 1 次提交
    • A
      lib/test_bpf.c: don't use gcc union shortcut · ece80490
      Andrew Morton 提交于
      Older gcc's (mine is gcc-4.4.4) make a mess of this.
      
      lib/test_bpf.c:74: error: unknown field 'insns' specified in initializer
      lib/test_bpf.c:75: warning: missing braces around initializer
      lib/test_bpf.c:75: warning: (near initialization for 'tests[0].<anonymous>.insns[0]')
      lib/test_bpf.c:76: error: extra brace group at end of initializer
      lib/test_bpf.c:76: error: (near initialization for 'tests[0].<anonymous>')
      lib/test_bpf.c:76: warning: excess elements in union initializer
      lib/test_bpf.c:76: warning: (near initialization for 'tests[0].<anonymous>')
      lib/test_bpf.c:77: error: extra brace group at end of initializer
      
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ece80490
  2. 22 5月, 2014 1 次提交
    • A
      net: filter: cleanup invocation of internal BPF · 5fe821a9
      Alexei Starovoitov 提交于
      Kernel API for classic BPF socket filters is:
      
      sk_unattached_filter_create() - validate classic BPF, convert, JIT
      SK_RUN_FILTER() - run it
      sk_unattached_filter_destroy() - destroy socket filter
      
      Cleanup internal BPF kernel API as following:
      
      sk_filter_select_runtime() - final step of internal BPF creation.
        Try to JIT internal BPF program, if JIT is not available select interpreter
      SK_RUN_FILTER() - run it
      sk_filter_free() - free internal BPF program
      
      Disallow direct calls to BPF interpreter. Execution of the BPF program should
      be done with SK_RUN_FILTER() macro.
      
      Example of internal BPF create, run, destroy:
      
        struct sk_filter *fp;
      
        fp = kzalloc(sk_filter_size(prog_len), GFP_KERNEL);
        memcpy(fp->insni, prog, prog_len * sizeof(fp->insni[0]));
        fp->len = prog_len;
      
        sk_filter_select_runtime(fp);
      
        SK_RUN_FILTER(fp, ctx);
      
        sk_filter_free(fp);
      
      Sockets, seccomp, testsuite, tracing are using different ways to populate
      sk_filter, so first steps of program creation are not common.
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Acked-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5fe821a9
  3. 14 5月, 2014 1 次提交
  4. 12 5月, 2014 2 次提交
    • A
      net: filter: additional BPF tests · 9def624a
      Alexei Starovoitov 提交于
      All tests should pass with and without JIT.
      
      Example output:
      test_bpf: #0 TAX 35 16 16 PASS
      test_bpf: #1 TXA 7 7 7 PASS
      test_bpf: #2 ADD_SUB_MUL_K 10 PASS
      test_bpf: #3 DIV_KX 33 PASS
      test_bpf: #4 AND_OR_LSH_K 10 10 PASS
      test_bpf: #5 LD_IND 8 8 8 PASS
      test_bpf: #6 LD_ABS 8 8 8 PASS
      test_bpf: #7 LD_ABS_LL 13 14 PASS
      test_bpf: #8 LD_IND_LL 12 12 12 PASS
      test_bpf: #9 LD_ABS_NET 10 12 PASS
      test_bpf: #10 LD_IND_NET 11 12 12 PASS
      ...
      
      Numbers are times in nsec per filter for given input data.
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9def624a
    • A
      net: filter: BPF testsuite · 64a8946b
      Alexei Starovoitov 提交于
      The testsuite covers classic and internal BPF instructions.
      It is particularly useful for JIT compiler developers.
      Adds to "net" selftest target.
      
      The testsuite can be used as a set of micro-benchmarks.
      It measures execution time of each BPF program in nsec.
      
      This patch adds core framework.
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      64a8946b
  5. 19 4月, 2014 1 次提交
  6. 09 4月, 2014 1 次提交
    • J
      lib/percpu_counter.c: fix bad percpu counter state during suspend · e39435ce
      Jens Axboe 提交于
      I got a bug report yesterday from Laszlo Ersek in which he states that
      his kvm instance fails to suspend.  Laszlo bisected it down to this
      commit 1cf7e9c6 ("virtio_blk: blk-mq support") where virtio-blk is
      converted to use the blk-mq infrastructure.
      
      After digging a bit, it became clear that the issue was with the queue
      drain.  blk-mq tracks queue usage in a percpu counter, which is
      incremented on request alloc and decremented when the request is freed.
      The initial hunt was for an inconsistency in blk-mq, but everything
      seemed fine.  In fact, the counter only returned crazy values when
      suspend was in progress.
      
      When a CPU is unplugged, the percpu counters merges that CPU state with
      the general state.  blk-mq takes care to register a hotcpu notifier with
      the appropriate priority, so we know it runs after the percpu counter
      notifier.  However, the percpu counter notifier only merges the state
      when the CPU is fully gone.  This leaves a state transition where the
      CPU going away is no longer in the online mask, yet it still holds
      private values.  This means that in this state, percpu_counter_sum()
      returns invalid results, and the suspend then hangs waiting for
      abs(dead-cpu-value) requests to complete which of course will never
      happen.
      
      Fix this by clearing the state earlier, so we never have a case where
      the CPU isn't in online mask but still holds private state.  This bug
      has been there since forever, I guess we don't have a lot of users where
      percpu counters needs to be reliable during the suspend cycle.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      Reported-by: NLaszlo Ersek <lersek@redhat.com>
      Tested-by: NLaszlo Ersek <lersek@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e39435ce
  7. 08 4月, 2014 5 次提交
  8. 07 4月, 2014 1 次提交
    • J
      percpu_counter: fix bad counter state during suspend · 60b0ea12
      Jens Axboe 提交于
      I got a bug report yesterday from Laszlo Ersek <lersek@xxxxxxxxxx>, in
      which he states that his kvm instance fails to suspend. He Laszlo
      bisected it down to this commit:
      
      commit 1cf7e9c6
      Author: Jens Axboe <axboe@xxxxxxxxx>
      Date: Fri Nov 1 10:52:52 2013 -0600
      
      virtio_blk: blk-mq support
      
      where virtio-blk is converted to use the blk-mq infrastructure. After
      digging a bit, it became clear that the issue was with the queue drain.
      blk-mq tracks queue usage in a percpu counter, which is incremented on
      request alloc and decremented when the request is freed. The initial
      hunt was for an inconsistency in blk-mq, but everything seemed fine. In
      fact, the counter only returned crazy values when suspend was in
      progress. When a CPU is unplugged, the percpu counters merges that CPU
      state with the general state. blk-mq takes care to register a hotcpu
      notifier with the appropriate priority, so we know it runs after the
      percpu counter notifier. However, the percpu counter notifier only
      merges the state when the CPU is fully gone. This leaves a state
      transition where the CPU going away is no longer in the online mask, yet
      it still holds private values. This means that in this state,
      percpu_counter_sum() returns invalid results, and the suspend then hangs
      waiting for abs(dead-cpu-value) requests to complete which of course
      will never happen.
      
      Fix this by clearing the state earlier, so we never have a case where
      the CPU isn't in online mask but still holds private state. This bug has
      been there since forever, I guess we don't have a lot of users where
      percpu counters needs to be reliable during the suspend cycle.
      
      Reported-by: <lersek@redhat.com>
      Tested-by: NLaszlo Ersek <lersek@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      60b0ea12
  9. 04 4月, 2014 11 次提交
    • R
      lib/decompress_inflate.c: include appropriate header file · af661e83
      Rashika Kheria 提交于
      Include appropriate header file include/linux/decompress/inflate.h in
      lib/decompress_inflate.c because it has prototype declaration of
      function defined in lib/decompress_inflate.c.
      
      Also, fix the guard around the header file
      include/linux/decompress/inflate.h to use a more unique guard symbol.
      This avoids conflict with the INFLATE_H defined by
      zlib_inflate/inflate.h.
      
      This eliminates the following warning in lib/decompress_inflate.c:
      
        lib/decompress_inflate.c:35:17: warning: no previous prototype for `gunzip' [-Wmissing-prototypes]
      Signed-off-by: NRashika Kheria <rashika.kheria@gmail.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      af661e83
    • R
      lib/clz_ctz.c: add prototype declarations in lib/clz_ctz.c · 3c516cdd
      Rashika Kheria 提交于
      Add prototype declarations of functions in lib/clz_ctz.c.  These
      functions are required by GCC builtins and hence can not be removed
      despite of their unreferenced appearance in kernel source.
      
      This eliminates the following warning in lib/clz_ctz.c:
      
        lib/clz_ctz.c:16:12: warning: no previous prototype for `__ctzsi2' [-Wmissing-prototypes]
        lib/clz_ctz.c:22:12: warning: no previous prototype for `__clzsi2' [-Wmissing-prototypes]
        lib/clz_ctz.c:44:12: warning: no previous prototype for `__clzdi2' [-Wmissing-prototypes]
        lib/clz_ctz.c:50:12: warning: no previous prototype for `__ctzdi2' [-Wmissing-prototypes]
      Signed-off-by: NRashika Kheria <rashika.kheria@gmail.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      Acked-by: NChanho Min <chanho.min@lge.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3c516cdd
    • D
      lib/random32.c: minor cleanups and kdoc fix · d3d47eb2
      Daniel Borkmann 提交于
      These are just some very minor and misc cleanups in the PRNG.  In
      prandom_u32() we store the result in an unsigned long which is
      unnecessary as it should be u32 instead that we get from
      prandom_u32_state().  prandom_bytes_state()'s comment is in kdoc format,
      so change it into such as it's done everywhere else.  Also, use the
      normal comment style for the header comment.  Last but not least for
      readability, add some newlines.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d3d47eb2
    • S
      lib/devres.c: fix some sparse warnings · b104d6a5
      Steven Rostedt 提交于
      Having a discussion about sparse warnings in the kernel, and that we
      should clean them up, I decided to pick a random file to do so.  This
      happened to be devres.c which gives the following warnings:
      
          CHECK   lib/devres.c
        lib/devres.c:83:9: warning: cast removes address space of expression
        lib/devres.c:117:31: warning: incorrect type in return expression (different address spaces)
        lib/devres.c:117:31:    expected void [noderef] <asn:2>*
        lib/devres.c:117:31:    got void *
        lib/devres.c:125:31: warning: incorrect type in return expression (different address spaces)
        lib/devres.c:125:31:    expected void [noderef] <asn:2>*
        lib/devres.c:125:31:    got void *
        lib/devres.c:136:26: warning: incorrect type in assignment (different address spaces)
        lib/devres.c:136:26:    expected void [noderef] <asn:2>*[assigned] dest_ptr
        lib/devres.c:136:26:    got void *
        lib/devres.c:226:9: warning: cast removes address space of expression
      
      Mostly it's just the use of typecasting to void * without adding
      __force, or returning ERR_PTR(-ESOMEERR) without typecasting to a
      __iomem type.
      
      I added a helper macro IOMEM_ERR_PTR() that does the typecast to make
      the code a little nicer than adding ugly typecasts to the code.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Tejun Heo <tj@kernel.org>
      Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b104d6a5
    • R
      vsprintf: remove %n handling · 708d96fd
      Ryan Mallon 提交于
      All in-kernel users of %n in format strings have now been removed and
      the %n directive is ignored.  Remove the handling of %n so that it is
      treated the same as any other invalid format string directive.  Keep a
      warning in place to deter new instances of %n in format strings.
      Signed-off-by: NRyan Mallon <rmallon@gmail.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      708d96fd
    • A
      lib/syscall.c: unexport task_current_syscall() · d300d59c
      Andrew Morton 提交于
      It is only used by procfs and procfs cannot be a module.
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d300d59c
    • V
      kobject: don't block for each kobject_uevent · bcccff93
      Vladimir Davydov 提交于
      Currently kobject_uevent has somewhat unpredictable semantics.  The
      point is, since it may call a usermode helper and wait for it to execute
      (UMH_WAIT_EXEC), it is impossible to say for sure what lock dependencies
      it will introduce for the caller - strictly speaking it depends on what
      fs the binary is located on and the set of locks fork may take.  There
      are quite a few kobject_uevent's users that do not take this into
      account and call it with various mutexes taken, e.g.  rtnl_mutex,
      net_mutex, which might potentially lead to a deadlock.
      
      Since there is actually no reason to wait for the usermode helper to
      execute there, let's make kobject_uevent start the helper asynchronously
      with the aid of the UMH_NO_WAIT flag.
      
      Personally, I'm interested in this, because I really want kobject_uevent
      to be called under the slab_mutex in the slub implementation as it used
      to be some time ago, because it greatly simplifies synchronization and
      automatically fixes a kmemcg-related race.  However, there was a
      deadlock detected on an attempt to call kobject_uevent under the
      slab_mutex (see https://lkml.org/lkml/2012/1/14/45), which was reported
      to be fixed by releasing the slab_mutex for kobject_uevent.
      
      Unfortunately, there was no information about who exactly blocked on the
      slab_mutex causing the usermode helper to stall, neither have I managed
      to find this out or reproduce the issue.
      
      BTW, this is not the first attempt to make kobject_uevent use
      UMH_NO_WAIT.  Previous one was made by commit f520360d ("kobject:
      don't block for each kobject_uevent"), but it was wrong (it passed
      arguments allocated on stack to async thread) so it was reverted in
      05f54c13 ("Revert "kobject: don't block for each kobject_uevent".").
      It targeted on speeding up the boot process though.
      Signed-off-by: NVladimir Davydov <vdavydov@parallels.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bcccff93
    • J
      mm: keep page cache radix tree nodes in check · 449dd698
      Johannes Weiner 提交于
      Previously, page cache radix tree nodes were freed after reclaim emptied
      out their page pointers.  But now reclaim stores shadow entries in their
      place, which are only reclaimed when the inodes themselves are
      reclaimed.  This is problematic for bigger files that are still in use
      after they have a significant amount of their cache reclaimed, without
      any of those pages actually refaulting.  The shadow entries will just
      sit there and waste memory.  In the worst case, the shadow entries will
      accumulate until the machine runs out of memory.
      
      To get this under control, the VM will track radix tree nodes
      exclusively containing shadow entries on a per-NUMA node list.  Per-NUMA
      rather than global because we expect the radix tree nodes themselves to
      be allocated node-locally and we want to reduce cross-node references of
      otherwise independent cache workloads.  A simple shrinker will then
      reclaim these nodes on memory pressure.
      
      A few things need to be stored in the radix tree node to implement the
      shadow node LRU and allow tree deletions coming from the list:
      
      1. There is no index available that would describe the reverse path
         from the node up to the tree root, which is needed to perform a
         deletion.  To solve this, encode in each node its offset inside the
         parent.  This can be stored in the unused upper bits of the same
         member that stores the node's height at no extra space cost.
      
      2. The number of shadow entries needs to be counted in addition to the
         regular entries, to quickly detect when the node is ready to go to
         the shadow node LRU list.  The current entry count is an unsigned
         int but the maximum number of entries is 64, so a shadow counter
         can easily be stored in the unused upper bits.
      
      3. Tree modification needs tree lock and tree root, which are located
         in the address space, so store an address_space backpointer in the
         node.  The parent pointer of the node is in a union with the 2-word
         rcu_head, so the backpointer comes at no extra cost as well.
      
      4. The node needs to be linked to an LRU list, which requires a list
         head inside the node.  This does increase the size of the node, but
         it does not change the number of objects that fit into a slab page.
      
      [akpm@linux-foundation.org: export the right function]
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Reviewed-by: NMinchan Kim <minchan@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Metin Doslu <metin@citusdata.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ozgun Erdogan <ozgun@citusdata.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Ryan Mallon <rmallon@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      449dd698
    • J
      lib: radix_tree: tree node interface · 139e5616
      Johannes Weiner 提交于
      Make struct radix_tree_node part of the public interface and provide API
      functions to create, look up, and delete whole nodes.  Refactor the
      existing insert, look up, delete functions on top of these new node
      primitives.
      
      This will allow the VM to track and garbage collect page cache radix
      tree nodes.
      
      [sasha.levin@oracle.com: return correct error code on insertion failure]
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Metin Doslu <metin@citusdata.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ozgun Erdogan <ozgun@citusdata.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Ryan Mallon <rmallon@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      139e5616
    • J
      mm: filemap: move radix tree hole searching here · e7b563bb
      Johannes Weiner 提交于
      The radix tree hole searching code is only used for page cache, for
      example the readahead code trying to get a a picture of the area
      surrounding a fault.
      
      It sufficed to rely on the radix tree definition of holes, which is
      "empty tree slot".  But this is about to change, though, as shadow page
      descriptors will be stored in the page cache after the actual pages get
      evicted from memory.
      
      Move the functions over to mm/filemap.c and make them native page cache
      operations, where they can later be adapted to handle the new definition
      of "page cache hole".
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Reviewed-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Metin Doslu <metin@citusdata.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ozgun Erdogan <ozgun@citusdata.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Ryan Mallon <rmallon@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e7b563bb
    • J
      lib: radix-tree: add radix_tree_delete_item() · 53c59f26
      Johannes Weiner 提交于
      Provide a function that does not just delete an entry at a given index,
      but also allows passing in an expected item.  Delete only if that item
      is still located at the specified index.
      
      This is handy when lockless tree traversals want to delete entries as
      well because they don't have to do an second, locked lookup to verify
      the slot has not changed under them before deleting the entry.
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: NMinchan Kim <minchan@kernel.org>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Metin Doslu <metin@citusdata.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ozgun Erdogan <ozgun@citusdata.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Ryan Mallon <rmallon@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      53c59f26
  10. 02 4月, 2014 2 次提交
    • A
      get rid of DEBUG_WRITECOUNT · 4597e695
      Al Viro 提交于
      it only makes control flow in __fput() and friends more convoluted.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      4597e695
    • P
      netlink: don't compare the nul-termination in nla_strcmp · 8b7b9324
      Pablo Neira 提交于
      nla_strcmp compares the string length plus one, so it's implicitly
      including the nul-termination in the comparison.
      
       int nla_strcmp(const struct nlattr *nla, const char *str)
       {
              int len = strlen(str) + 1;
              ...
                      d = memcmp(nla_data(nla), str, len);
      
      However, if NLA_STRING is used, userspace can send us a string without
      the nul-termination. This is a problem since the string
      comparison will not match as the last byte may be not the
      nul-termination.
      
      Fix this by skipping the comparison of the nul-termination if the
      attribute data is nul-terminated. Suggested by Thomas Graf.
      
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8b7b9324
  11. 29 3月, 2014 1 次提交
  12. 23 3月, 2014 1 次提交
  13. 20 3月, 2014 1 次提交
  14. 04 3月, 2014 2 次提交
    • H
      lib/radix-tree.c: swapoff tmpfs radix_tree: remember to rcu_read_unlock · 5f30fc94
      Hugh Dickins 提交于
      Running fsx on tmpfs with concurrent memhog-swapoff-swapon, lots of
      
        BUG: sleeping function called from invalid context at kernel/fork.c:606
        in_atomic(): 0, irqs_disabled(): 0, pid: 1394, name: swapoff
        1 lock held by swapoff/1394:
         #0:  (rcu_read_lock){.+.+.+}, at: [<ffffffff812520a1>] radix_tree_locate_item+0x1f/0x2b6
      
      followed by
      
        ================================================
        [ BUG: lock held when returning to user space! ]
        3.14.0-rc1 #3 Not tainted
        ------------------------------------------------
        swapoff/1394 is leaving the kernel with locks still held!
        1 lock held by swapoff/1394:
         #0:  (rcu_read_lock){.+.+.+}, at: [<ffffffff812520a1>] radix_tree_locate_item+0x1f/0x2b6
      
      after which the system recovered nicely.
      
      Whoops, I long ago forgot the rcu_read_unlock() on one unlikely branch.
      
      Fixes e504f3fd ("tmpfs radix_tree: locate_item to speed up swapoff")
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5f30fc94
    • D
      dma debug: account for cachelines and read-only mappings in overlap tracking · 3b7a6418
      Dan Williams 提交于
      While debug_dma_assert_idle() checks if a given *page* is actively
      undergoing dma the valid granularity of a dma mapping is a *cacheline*.
      Sander's testing shows that the warning message "DMA-API: exceeded 7
      overlapping mappings of pfn..." is falsely triggering.  The test is
      simply mapping multiple cachelines in a given page.
      
      Ultimately we want overlap tracking to be valid as it is a real api
      violation, so we need to track active mappings by cachelines.  Update
      the active dma tracking to use the page-frame-relative cacheline of the
      mapping as the key, and update debug_dma_assert_idle() to check for all
      possible mapped cachelines for a given page.
      
      However, the need to track active mappings is only relevant when the
      dma-mapping is writable by the device.  In fact it is fairly standard
      for read-only mappings to have hundreds or thousands of overlapping
      mappings at once.  Limiting the overlap tracking to writable
      (!DMA_TO_DEVICE) eliminates this class of false-positive overlap
      reports.
      
      Note, the radix gang lookup is sub-optimal.  It would be best if it
      stopped fetching entries once the search passed a page boundary.
      Nevertheless, this implementation does not perturb the original net_dma
      failing case.  That is to say the extra overhead does not show up in
      terms of making the failing case pass due to a timing change.
      
      References:
        http://marc.info/?l=linux-netdev&m=139232263419315&w=2
        http://marc.info/?l=linux-netdev&m=139217088107122&w=2Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Reported-by: NSander Eikelenboom <linux@eikelenboom.it>
      Reported-by: NDave Jones <davej@redhat.com>
      Tested-by: NDave Jones <davej@redhat.com>
      Tested-by: NSander Eikelenboom <linux@eikelenboom.it>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Francois Romieu <romieu@fr.zoreil.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Wei Liu <wei.liu2@citrix.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3b7a6418
  15. 27 2月, 2014 1 次提交
    • B
      vsprintf: Add support for IORESOURCE_UNSET in %pR · d19cb803
      Bjorn Helgaas 提交于
      Sometimes we have a struct resource where we know the type (MEM/IO/etc.)
      and the size, but we haven't assigned address space for it.  The
      IORESOURCE_UNSET flag is a way to indicate this situation.  For these
      "unset" resources, the start address is meaningless, so print only the
      size, e.g.,
      
        - pci 0000:0c:00.0: reg 184: [mem 0x00000000-0x00001fff 64bit]
        + pci 0000:0c:00.0: reg 184: [mem size 0x2000 64bit]
      
      For %pr (printing with raw flags), we still print the address range,
      because %pr is mostly used for debugging anyway.
      
      Thanks to Fengguang Wu <fengguang.wu@intel.com> for suggesting
      resource_size().
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      d19cb803
  16. 24 2月, 2014 2 次提交
  17. 17 2月, 2014 1 次提交
  18. 14 2月, 2014 1 次提交
  19. 08 2月, 2014 1 次提交
    • T
      sysfs, kobject: add sysfs wrapper for kernfs_enable_ns() · fa4cd451
      Tejun Heo 提交于
      Currently, kobject is invoking kernfs_enable_ns() directly.  This is
      fine now as sysfs and kernfs are enabled and disabled together.  If
      sysfs is disabled, kernfs_enable_ns() is switched to dummy
      implementation too and everything is fine; however, kernfs will soon
      have its own config option CONFIG_KERNFS and !SYSFS && KERNFS will be
      possible, which can make kobject call into non-dummy
      kernfs_enable_ns() with NULL kernfs_node pointers leading to an oops.
      
      Introduce sysfs_enable_ns() which is a wrapper around
      kernfs_enable_ns() so that it can be made a noop depending only on
      CONFIG_SYSFS regardless of the planned CONFIG_KERNFS.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fa4cd451
  20. 06 2月, 2014 1 次提交
  21. 05 2月, 2014 1 次提交
    • L
      kbuild: don't enable DEBUG_INFO when building for COMPILE_TEST · 12b13835
      Linus Torvalds 提交于
      It really isn't very interesting to have DEBUG_INFO when doing compile
      coverage stuff (you wouldn't want to run the result anyway, that's kind
      of the whole point of COMPILE_TEST), and it currently makes the build
      take longer and use much more disk space for "all{yes,mod}config".
      
      There's somewhat active discussion about this still, and we might end up
      with some new config option for things like this (Andi points out that
      the silly X86_DECODER_SELFTEST option also slows down the normal
      coverage tests hugely), but I'm starting the ball rolling with this
      simple one-liner.
      
      DEBUG_INFO isn't that noticeable if you have tons of memory and a good
      IO subsystem, but it hurts you a lot if you don't - for very little
      upside for the common use.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      12b13835
  22. 03 2月, 2014 1 次提交