1. 13 Dec, 2016 18 commits
  2. 12 Dec, 2016 1 commit
  3. 10 Dec, 2016 2 commits
    • udp: add batching to udp_rmem_release() · 6b229cf7
      Eric Dumazet committed
      If udp_recvmsg() constantly releases sk_rmem_alloc
      for every read packet, it gives opportunity for
      producers to immediately grab spinlocks and desperately
      try adding another packet, causing false sharing.
      
      We can add a simple heuristic to give the signal
      by batches of ~25 % of the queue capacity.
      
      This patch considerably increases performance under
      flood by about 50 %, since the thread draining the queue
      is no longer slowed by false sharing.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
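The batching heuristic described in the commit above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel implementation: `rx_queue_model` and `rmem_release_batched` are hypothetical names, plain `int` fields stand in for the socket's atomic counters, and the "~25 %" threshold is taken to be `rcvbuf >> 2`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of a receive queue's memory accounting. */
struct rx_queue_model {
    int rcvbuf;      /* queue capacity in bytes (models sk_rcvbuf) */
    int rmem_alloc;  /* bytes charged to the queue, also read by producers */
    int deferred;    /* bytes already consumed but not yet released */
};

/*
 * Consumer side: accumulate released bytes locally and only touch the
 * shared counter once about 25 % of the capacity has been drained.
 * Returns true when the shared counter was actually updated.
 */
static bool rmem_release_batched(struct rx_queue_model *q, int size)
{
    assert(size >= 0);

    q->deferred += size;
    if (q->deferred < (q->rcvbuf >> 2))
        return false;             /* keep batching, producers see no change */

    q->rmem_alloc -= q->deferred; /* one update for the whole batch */
    q->deferred = 0;
    return true;
}
```

With a capacity of 1000 bytes, three 100-byte reads produce a single update of the shared counter instead of three, which is the effect the commit message attributes the reduced false sharing to.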
    • udp: copy skb->truesize in the first cache line · c84d9490
      Eric Dumazet committed
      In UDP RX handler, we currently clear skb->dev before skb
      is added to receive queue, because device pointer is no longer
      available once we exit from RCU section.
      
      Since this first cache line is always hot, let's reuse this space
      to store skb->truesize and thus avoid a cache line miss at
      udp_recvmsg()/udp_skb_destructor time while receive queue
      spinlock is held.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 09 Dec, 2016 8 commits
  5. 08 Dec, 2016 2 commits
    • kthread: Make struct kthread kmalloc'ed · 1da5c46f
      Oleg Nesterov committed
      commit 23196f2e "kthread: Pin the stack via try_get_task_stack() /
      put_task_stack() in to_live_kthread() function" is a workaround for the
      fragile design of struct kthread being allocated on the task stack.
      
      struct kthread in its current form should be removed, but this needs
      cleanups outside of kthread.c.
      
      As a first step move struct kthread away from the task stack by making it
      kmalloc'ed. This allows accessing kthread.exited without the magic of
      trying to pin task stack and the try logic in to_live_kthread().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Chunming Zhou <David1.Zhou@amd.com>
      Cc: Roman Pen <roman.penyaev@profitbricks.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20161129175057.GA5330@redhat.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • hotplug: Make register and unregister notifier API symmetric · 777c6e0d
      Michal Hocko committed
      Yu Zhao has noticed that __unregister_cpu_notifier only unregisters its
      notifiers when HOTPLUG_CPU=y while the registration might succeed even
      when HOTPLUG_CPU=n if MODULE is enabled. This means that e.g. zswap
      might keep a stale notifier on the list after the manual cleanup during
      pool teardown and thus corrupt the list, resulting in the following:
      
      [  144.964346] BUG: unable to handle kernel paging request at ffff880658a2be78
      [  144.971337] IP: [<ffffffffa290b00b>] raw_notifier_chain_register+0x1b/0x40
      <snipped>
      [  145.122628] Call Trace:
      [  145.125086]  [<ffffffffa28e5cf8>] __register_cpu_notifier+0x18/0x20
      [  145.131350]  [<ffffffffa2a5dd73>] zswap_pool_create+0x273/0x400
      [  145.137268]  [<ffffffffa2a5e0fc>] __zswap_param_set+0x1fc/0x300
      [  145.143188]  [<ffffffffa2944c1d>] ? trace_hardirqs_on+0xd/0x10
      [  145.149018]  [<ffffffffa2908798>] ? kernel_param_lock+0x28/0x30
      [  145.154940]  [<ffffffffa2a3e8cf>] ? __might_fault+0x4f/0xa0
      [  145.160511]  [<ffffffffa2a5e237>] zswap_compressor_param_set+0x17/0x20
      [  145.167035]  [<ffffffffa2908d3c>] param_attr_store+0x5c/0xb0
      [  145.172694]  [<ffffffffa290848d>] module_attr_store+0x1d/0x30
      [  145.178443]  [<ffffffffa2b2b41f>] sysfs_kf_write+0x4f/0x70
      [  145.183925]  [<ffffffffa2b2a5b9>] kernfs_fop_write+0x149/0x180
      [  145.189761]  [<ffffffffa2a99248>] __vfs_write+0x18/0x40
      [  145.194982]  [<ffffffffa2a9a412>] vfs_write+0xb2/0x1a0
      [  145.200122]  [<ffffffffa2a9a732>] SyS_write+0x52/0xa0
      [  145.205177]  [<ffffffffa2ff4d97>] entry_SYSCALL_64_fastpath+0x12/0x17
      
      This can even be triggered manually by changing
      /sys/module/zswap/parameters/compressor multiple times.
      
      Fix this issue by making unregister APIs symmetric to the register so
      there are no surprises.
      
      Fixes: 47e627bc ("[PATCH] hotplug: Allow modules to use the cpu hotplug notifiers even if !CONFIG_HOTPLUG_CPU")
      Reported-and-tested-by: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Cc: linux-mm@kvack.org
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Link: http://lkml.kernel.org/r/20161207135438.4310-1-mhocko@kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  6. 07 Dec, 2016 8 commits
    • gpio: pl061: move platform data into driver · 562b4884
      Linus Walleij committed
      No boardfile defines any PL061 platform data anymore: the
      Integrator IM/PD-1 includes the file but is not making use
      of the struct. Let's delete the include and all references,
      then move the platform data into the driver for later
      consolidation into the driver state container.
      
      The only resource defined by the IM/PD-1 is the IRQ which
      is passed through the AMBA PrimeCell bus abstraction
      struct amba_device.
      
      Cc: arm@vger.kernel.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Russell King <linux@armlinux.org.uk>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
    • acpi, nfit, libnvdimm: fix / harden ars_status output length handling · efda1b5d
      Dan Williams committed
      Given ambiguities in the ACPI 6.1 definition of the "Output (Size)"
      field of the ARS (Address Range Scrub) Status command, a firmware
      implementation may in practice return 0, 4, or 8 to indicate that there
      is no output payload to process.
      
      The specification states "Size of Output Buffer in bytes, including this
      field.". However, 'Output Buffer' is also the name of the entire
      payload, and earlier in the specification it states "Max Query ARS
      Status Output Buffer Size: Maximum size of buffer (including the Status
      and Extended Status fields)".
      
      Without this fix if the BIOS happens to return 0 it causes memory
      corruption as evidenced by this result from the acpi_nfit_ctl() unit
      test.
      
       ars_status00000000: 00020000 00000000                    ........
       BUG: stack guard page was hit at ffffc90001750000 (stack is ffffc9000174c000..ffffc9000174ffff)
       kernel stack overflow (page fault): 0000 [#1] SMP DEBUG_PAGEALLOC
       task: ffff8803332d2ec0 task.stack: ffffc9000174c000
       RIP: 0010:[<ffffffff814cfe72>]  [<ffffffff814cfe72>] __memcpy+0x12/0x20
       RSP: 0018:ffffc9000174f9a8  EFLAGS: 00010246
       RAX: ffffc9000174fab8 RBX: 0000000000000000 RCX: 000000001fffff56
       RDX: 0000000000000000 RSI: ffff8803231f5a08 RDI: ffffc90001750000
       RBP: ffffc9000174fa88 R08: ffffc9000174fab0 R09: ffff8803231f54b8
       R10: 0000000000000008 R11: 0000000000000001 R12: 0000000000000000
       R13: 0000000000000000 R14: 0000000000000003 R15: ffff8803231f54a0
       FS:  00007f3a611af640(0000) GS:ffff88033ed00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: ffffc90001750000 CR3: 0000000325b20000 CR4: 00000000000406e0
       Stack:
        ffffffffa00bc60d 0000000000000008 ffffc90000000001 ffffc9000174faac
        0000000000000292 ffffffffa00c24e4 ffffffffa00c2914 0000000000000000
        0000000000000000 ffffffff00000003 ffff880331ae8ad0 0000000800000246
       Call Trace:
        [<ffffffffa00bc60d>] ? acpi_nfit_ctl+0x49d/0x750 [nfit]
        [<ffffffffa01f4fe0>] nfit_test_probe+0x670/0xb1b [nfit_test]
      
      Cc: <stable@vger.kernel.org>
      Fixes: 747ffe11 ("libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing")
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
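The failure mode in the commit above (a size of 0 leading to an enormous copy length) is what happens when a header size is subtracted from an unsigned length without a guard. The following is a minimal sketch of defensive length handling, not the code in `acpi_nfit_ctl()`: the function name `ars_payload_len`, the two constants, and the final clamp to the caller's remaining buffer space are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define ARS_SIZE_FIELD_BYTES 4u  /* the "Output (Size)" field itself */
#define ARS_HEADER_BYTES     8u  /* status field plus size field */

/*
 * Hypothetical helper: how many payload bytes follow the size field,
 * given the firmware-reported out_length and the space the caller has
 * left (remainder). Accepts both readings of the specification.
 */
static uint32_t ars_payload_len(uint32_t out_length, uint32_t remainder)
{
    uint32_t payload;

    /* 0 and 4 both mean "no payload"; 0 - 4 would wrap to a huge value. */
    if (out_length <= ARS_SIZE_FIELD_BYTES)
        return 0;

    /* Firmware counted the whole buffer, status field included. */
    if (out_length >= ARS_HEADER_BYTES &&
        out_length - ARS_HEADER_BYTES == remainder)
        return remainder;

    /* Firmware counted from the size field onward; never exceed the buffer. */
    payload = out_length - ARS_SIZE_FIELD_BYTES;
    return payload < remainder ? payload : remainder;
}

static void ars_payload_len_example(void)
{
    /* All three "empty" encodings named in the commit yield no payload. */
    assert(ars_payload_len(0, 0) == 0);
    assert(ars_payload_len(4, 0) == 0);
    assert(ars_payload_len(8, 0) == 0);
}
```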
    • netfilter: ingress: translate 0 nf_hook_slow retval to -1 · df122f58
      Florian Westphal committed
      The caller assumes that < 0 means that skb was stolen (or free'd).
      
      All other return values continue skb processing.
      
      nf_hook_slow returns 3 different return value types:
      
      A) a (negative) errno value: the skb was dropped (NF_DROP, e.g.
      by iptables '-j DROP' rule).
      
      B) 0. The skb was stolen by the hook or queued to userspace.
      
      C) 1. all hooks returned NF_ACCEPT so the caller should invoke
         the okfn so packet processing can continue.
      
      nft ingress facility currently doesn't have the 'okfn' that
      the NF_HOOK() macros use; there is no nfqueue support either.
      
      So 1 means that nf_hook_ingress() caller should go on processing the skb.
      
      In order to allow use of NF_STOLEN from ingress we need to translate
      this to an errno number, else we'd crash because we continue with
      already-freed (or about-to-be-freed) skb.
      
      The errno value isn't checked, it's just important that it's less than 0,
      so return -1.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
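The return-value translation described in the commit above can be sketched as a small stand-alone model. The function names `ingress_translate` and `caller_may_touch_skb` are hypothetical and only mirror the three cases (A, B, C) from the commit message; this is not the kernel's `nf_hook_ingress()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: map the hook result to what the ingress caller expects. */
static int ingress_translate(int hook_ret)
{
    /* Case B: 0 means the skb was stolen or queued, so report it as gone. */
    if (hook_ret == 0)
        return -1;
    /* Case A (negative errno, dropped) and case C (1, accepted) pass through. */
    return hook_ret;
}

/* The caller only distinguishes "< 0: skb is gone" from "keep processing". */
static bool caller_may_touch_skb(int ingress_ret)
{
    return ingress_ret >= 0;
}

static void ingress_translate_example(void)
{
    assert(caller_may_touch_skb(ingress_translate(1)));        /* accepted */
    assert(!caller_may_touch_skb(ingress_translate(0)));       /* stolen */
    assert(!caller_may_touch_skb(ingress_translate(-EPERM)));  /* dropped */
}
```

Without the translation, a stolen skb (0) would satisfy the caller's "not negative" check and processing would continue on memory the caller no longer owns.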
    • netfilter: x_tables: pack percpu counter allocations · ae0ac0ed
      Florian Westphal committed
      Instead of allocating each xt_counter individually, allocate 4k chunks
      and then use these for counter allocation requests.
      
      This should speed up rule evaluation by increasing data locality. It
      also speeds up ruleset loading because we reduce calls to the percpu
      allocator.
      
      As Eric points out, we can't use PAGE_SIZE, since page_allocator would
      fail on arches with 64k page size.
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
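The chunked allocation scheme in the commit above can be approximated in user space as follows. This is a sketch under assumptions: `calloc()` stands in for the percpu allocator, `counter_model`, `chunk_alloc_state`, and `counter_alloc` are hypothetical names, and chunks are never freed here to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CHUNK_SIZE 4096u  /* fixed 4k, deliberately not the page size */

/* Hypothetical stand-in for a packet/byte counter pair. */
struct counter_model {
    uint64_t pcnt;
    uint64_t bcnt;
};

struct chunk_alloc_state {
    unsigned char *mem;   /* current chunk, NULL when a new one is needed */
    size_t off;           /* next free offset inside the current chunk */
    unsigned int chunks;  /* number of chunks requested so far */
};

/* Hand out counters from a 4k chunk; allocate a new chunk only when full. */
static struct counter_model *counter_alloc(struct chunk_alloc_state *s)
{
    struct counter_model *c;

    assert(s != NULL);

    if (!s->mem) {
        s->mem = calloc(1, CHUNK_SIZE);
        if (!s->mem)
            return NULL;
        s->off = 0;
        s->chunks++;
    }

    c = (struct counter_model *)(s->mem + s->off);
    s->off += sizeof(*c);

    /* No room for another counter: force a fresh chunk on the next call. */
    if (s->off > CHUNK_SIZE - sizeof(*c))
        s->mem = NULL;

    return c;
}
```

With 16-byte counters, 256 requests are served by one underlying allocation, and consecutive counters sit next to each other in memory, which is the locality benefit the commit message refers to.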
    • netfilter: x_tables: pass xt_counters struct to counter allocator · f28e15ba
      Florian Westphal committed
      Keeps some noise away from a followup patch.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: x_tables: pass xt_counters struct instead of packet counter · 4d31eef5
      Florian Westphal committed
      On SMP we overload the packet counter (unsigned long) to contain
      percpu offset.  Hide this from callers and pass xt_counters address
      instead.
      
      Preparation patch to allocate the percpu counters in page-sized batch
      chunks.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: decouple nf_hook_entry and nf_hook_ops · d415b9eb
      Aaron Conole committed
      During nfhook traversal we only need a very small subset of
      nf_hook_ops members.
      
      We need:
      - next element
      - hook function to call
      - hook function priv argument
      
      Bridge netfilter also needs 'thresh'; can be obtained via ->orig_ops.
      
      nf_hook_entry struct is now 32 bytes on x86_64.
      
      A followup patch will turn the run-time list into an array that only
      stores hook functions plus their priv arguments, eliminating the ->next
      element.
      Suggested-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Aaron Conole <aconole@bytheb.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
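The split between a registration-time description and a lean traversal entry, as described in the commit above, might look like the following user-space sketch. All type and function names ending in `_model` are hypothetical, the verdict values are simplified, and the layout only mirrors the four members listed in the commit message (next, hook, priv, and a back-pointer to the original ops).

```c
#include <assert.h>
#include <stddef.h>

enum verdict_model { VERDICT_DROP = 0, VERDICT_ACCEPT = 1 };

typedef enum verdict_model hook_fn_model(void *priv, void *pkt);

/* Registration-time description; rarely needed while packets flow. */
struct hook_ops_model {
    hook_fn_model *hook;
    void *priv;
    int priority;
};

/* Traversal entry: four pointers, i.e. 32 bytes on a 64-bit target. */
struct hook_entry_model {
    struct hook_entry_model *next;
    hook_fn_model *hook;
    void *priv;
    const struct hook_ops_model *orig_ops;
};

/* Walk the list and stop at the first hook that does not accept. */
static enum verdict_model run_hooks(const struct hook_entry_model *e, void *pkt)
{
    for (; e; e = e->next) {
        enum verdict_model v;

        assert(e->hook != NULL);
        v = e->hook(e->priv, pkt);
        if (v != VERDICT_ACCEPT)
            return v;
    }
    return VERDICT_ACCEPT;
}

/* Example hook: counts invocations through its priv argument. */
static enum verdict_model count_hook(void *priv, void *pkt)
{
    (void)pkt;
    ++*(int *)priv;
    return VERDICT_ACCEPT;
}
```

The traversal loop reads only `next`, `hook`, and `priv`; anything else, such as the bridge-specific threshold mentioned in the commit, is reached through `orig_ops` when it is actually needed.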
    • netfilter: introduce accessor functions for hook entries · 0aa8c57a
      Aaron Conole committed
      This allows easier future refactoring.
      Signed-off-by: Aaron Conole <aconole@bytheb.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  7. 06 Dec, 2016 1 commit