1. 16 Jun, 2015 3 commits
  2. 13 Jun, 2015 1 commit
  3. 11 Jun, 2015 5 commits
  4. 05 Jun, 2015 2 commits
  5. 03 Jun, 2015 1 commit
  6. 01 Jun, 2015 2 commits
    • M
      include/uapi/linux/virtio_balloon.h: include linux/virtio_types.h · 8a7b19d8
      Committed by Mikko Rapeli
      Fixes userspace compilation error:
      
      error: unknown type name ‘__virtio16’
        __virtio16 tag;
      Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      8a7b19d8
    • N
      tcp: fix child sockets to use system default congestion control if not set · 9f950415
      Committed by Neal Cardwell
      Linux 3.17 and earlier are explicitly engineered so that if the app
      doesn't specifically request a CC module on a listener before the SYN
      arrives, then the child gets the system default CC when the connection
      is established. See tcp_init_congestion_control() in 3.17 or earlier,
      which says "if no choice made yet assign the current value set as
      default". The change ("net: tcp: assign tcp cong_ops when tcp sk is
      created") altered these semantics, so that children got their parent
      listener's congestion control even if the system default had changed
      after the listener was created.
      
      This commit returns to those original semantics from 3.17 and earlier,
      since they are the original semantics from 2007 in 4d4d3d1e ("[TCP]:
      Congestion control initialization."), and some Linux congestion
      control workflows depend on that.
      
      In summary, if a listener socket specifically sets TCP_CONGESTION to
      "x", or the route locks the CC module to "x", then the child gets
      "x". Otherwise the child gets current system default from
      net.ipv4.tcp_congestion_control. That's the behavior in 3.17 and
      earlier, and this commit restores that.
      
      Fixes: 55d8694f ("net: tcp: assign tcp cong_ops when tcp sk is created")
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Daniel Borkmann <dborkman@redhat.com>
      Cc: Glenn Judd <glenn.judd@morganstanley.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9f950415
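The selection rule summarized in this commit message can be modeled in a few lines of userspace C. This is only a sketch under stated assumptions, not the kernel code: `listener_sketch` and `child_cc` are hypothetical names, and the precedence of a route lock over an explicit setting is an assumption, since the message only covers the case where both name the same module.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the state a listener carries. */
struct listener_sketch {
    const char *explicit_cc; /* set via TCP_CONGESTION, or NULL if never set */
    const char *route_cc;    /* module locked by the route, or NULL */
};

/*
 * Pick the congestion control module for a child socket.
 * system_default stands in for net.ipv4.tcp_congestion_control as read
 * when the connection is established, not when the listener was created.
 */
static const char *child_cc(const struct listener_sketch *l,
                            const char *system_default)
{
    if (l->route_cc)            /* assumption: a route lock is checked first */
        return l->route_cc;
    if (l->explicit_cc)
        return l->explicit_cc;
    return system_default;      /* no choice made yet: current default */
}
```

With a listener that never set TCP_CONGESTION, changing the system default after the listener exists changes what `child_cc` returns, which is the behavior the commit restores.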
  7. 31 May, 2015 3 commits
  8. 29 May, 2015 5 commits
    • S
      tracing/mm: don't trace mm_page_pcpu_drain on offline cpus · 649b8de2
      Committed by Shreyas B. Prabhu
      Since tracepoints use RCU for protection, they must not be called on
      offline cpus.  trace_mm_page_pcpu_drain can be called on an offline cpu
      in this scenario caught by LOCKDEP:
      
           ===============================
           [ INFO: suspicious RCU usage. ]
           4.1.0-rc1+ #9 Not tainted
           -------------------------------
           include/trace/events/kmem.h:265 suspicious rcu_dereference_check() usage!
      
          other info that might help us debug this:
      
          RCU used illegally from offline CPU!
          rcu_scheduler_active = 1, debug_locks = 1
           1 lock held by swapper/5/0:
            #0:  (&(&zone->lock)->rlock){..-...}, at: [<c0000000002073b0>] .free_pcppages_bulk+0x70/0x920
      
          stack backtrace:
           CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.1.0-rc1+ #9
           Call Trace:
             .dump_stack+0x98/0xd4 (unreliable)
             .lockdep_rcu_suspicious+0x108/0x170
             .free_pcppages_bulk+0x60c/0x920
             .free_hot_cold_page+0x208/0x280
             .destroy_context+0x90/0xd0
             .__mmdrop+0x58/0x160
             .idle_task_exit+0xf0/0x100
             .pnv_smp_cpu_kill_self+0x58/0x2c0
             .cpu_die+0x34/0x50
             .arch_cpu_idle_dead+0x20/0x40
             .cpu_startup_entry+0x708/0x7a0
             .start_secondary+0x36c/0x3a0
             start_secondary_prolog+0x10/0x14
      
      Fix this by converting mm_page_pcpu_drain trace point into
      TRACE_EVENT_CONDITION where condition is cpu_online(smp_processor_id())
      Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
      Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      649b8de2
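The idea behind a conditional tracepoint can be illustrated outside the kernel. The sketch below is not TRACE_EVENT_CONDITION itself: `TRACE_IF`, `record_event`, and `cpu_is_online` are hypothetical stand-ins, and the flag replaces the real cpu_online(smp_processor_id()) check.

```c
#include <assert.h>
#include <stdbool.h>

static int events_recorded;        /* counts events that were actually emitted */
static bool cpu_is_online = true;  /* hypothetical stand-in for cpu_online(smp_processor_id()) */

/* Hypothetical event sink; a real tracepoint would enter RCU-protected code here. */
static void record_event(unsigned long pfn)
{
    (void)pfn;
    events_recorded++;
}

/* Evaluate the condition first and skip the trace call entirely when it is false. */
#define TRACE_IF(cond, call) do { if (cond) { call; } } while (0)

/* Hypothetical free path that may run on a CPU that is going offline. */
static void free_page_sketch(unsigned long pfn)
{
    TRACE_IF(cpu_is_online, record_event(pfn));
}
```

The point of the design is that the guard lives with the tracepoint, so every call site gets the check without being edited.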
    • S
      tracing/mm: don't trace mm_page_free on offline cpus · 1f0c27b5
      Committed by Shreyas B. Prabhu
      Since tracepoints use RCU for protection, they must not be called on
      offline cpus.  trace_mm_page_free can be called on an offline cpu in this
      scenario caught by LOCKDEP:
      
           ===============================
           [ INFO: suspicious RCU usage. ]
           4.1.0-rc1+ #9 Not tainted
           -------------------------------
           include/trace/events/kmem.h:170 suspicious rcu_dereference_check() usage!
      
          other info that might help us debug this:
      
          RCU used illegally from offline CPU!
          rcu_scheduler_active = 1, debug_locks = 1
           no locks held by swapper/1/0.
      
          stack backtrace:
           CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.1.0-rc1+ #9
           Call Trace:
             .dump_stack+0x98/0xd4 (unreliable)
             .lockdep_rcu_suspicious+0x108/0x170
             .free_pages_prepare+0x494/0x680
             .free_hot_cold_page+0x50/0x280
             .destroy_context+0x90/0xd0
             .__mmdrop+0x58/0x160
             .idle_task_exit+0xf0/0x100
             .pnv_smp_cpu_kill_self+0x58/0x2c0
             .cpu_die+0x34/0x50
             .arch_cpu_idle_dead+0x20/0x40
             .cpu_startup_entry+0x708/0x7a0
             .start_secondary+0x36c/0x3a0
             start_secondary_prolog+0x10/0x14
      
      Fix this by converting mm_page_free trace point into TRACE_EVENT_CONDITION
      where condition is cpu_online(smp_processor_id())
      Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
      Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1f0c27b5
    • S
      tracing/mm: don't trace kmem_cache_free on offline cpus · e5feb1eb
      Committed by Shreyas B. Prabhu
      Since tracepoints use RCU for protection, they must not be called on
      offline cpus.  trace_kmem_cache_free can be called on an offline cpu in
      this scenario caught by LOCKDEP:
      
          ===============================
          [ INFO: suspicious RCU usage. ]
          4.1.0-rc1+ #9 Not tainted
          -------------------------------
          include/trace/events/kmem.h:148 suspicious rcu_dereference_check() usage!
      
          other info that might help us debug this:
      
          RCU used illegally from offline CPU!
          rcu_scheduler_active = 1, debug_locks = 1
          no locks held by swapper/1/0.
      
          stack backtrace:
          CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.1.0-rc1+ #9
          Call Trace:
            .dump_stack+0x98/0xd4 (unreliable)
            .lockdep_rcu_suspicious+0x108/0x170
            .kmem_cache_free+0x344/0x4b0
            .__mmdrop+0x4c/0x160
            .idle_task_exit+0xf0/0x100
            .pnv_smp_cpu_kill_self+0x58/0x2c0
            .cpu_die+0x34/0x50
            .arch_cpu_idle_dead+0x20/0x40
            .cpu_startup_entry+0x708/0x7a0
            .start_secondary+0x36c/0x3a0
            start_secondary_prolog+0x10/0x14
      
      Fix this by converting kmem_cache_free trace point into
      TRACE_EVENT_CONDITION where condition is cpu_online(smp_processor_id())
      Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
      Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e5feb1eb
    • D
      percpu_counter: batch size aware __percpu_counter_compare() · 80188b0d
      Committed by Dave Chinner
      XFS uses non-standard batch sizes for avoiding frequent global
      counter updates on its allocated inode counters, as they increment
      or decrement in batches of 64 inodes. Hence the standard percpu
      counter batch of 32 means that the counter is effectively a global
      counter. Currently XFS uses a batch size of 128 so that it doesn't
      take the global lock on every single modification.
      
      However, XFS also needs to compare accurately against zero, which
      means we need to use percpu_counter_compare(), and that has a
      hard-coded batch size of 32, and hence will spuriously fail to
      detect when it is supposed to use precise comparisons and hence
      the accounting goes wrong.
      
      Add __percpu_counter_compare() to take a custom batch size so we can
      use it sanely in XFS and factor percpu_counter_compare() to use it.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      80188b0d
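Why the comparison must know the batch size is easier to see with a small model. The following is a userspace sketch, not the kernel's percpu_counter: the struct, the fixed `NR_CPUS_SKETCH`, and the function names are hypothetical, and no locking is modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_CPUS_SKETCH 4

struct pcpu_counter_sketch {
    int64_t count;                  /* global value, only approximate */
    int32_t local[NR_CPUS_SKETCH];  /* per-CPU deltas not yet folded in */
};

/* Accumulate locally; fold into the global count once |delta| reaches batch. */
static void counter_add(struct pcpu_counter_sketch *c, int cpu,
                        int32_t amount, int32_t batch)
{
    int64_t v = (int64_t)c->local[cpu] + amount;

    if (v >= batch || v <= -batch) {
        c->count += v;
        c->local[cpu] = 0;
    } else {
        c->local[cpu] = (int32_t)v;
    }
}

/* Precise value: global count plus every per-CPU delta. */
static int64_t counter_sum(const struct pcpu_counter_sketch *c)
{
    int64_t sum = c->count;

    for (int i = 0; i < NR_CPUS_SKETCH; i++)
        sum += c->local[i];
    return sum;
}

/* Returns 1, 0 or -1; trusts the rough count only if the gap exceeds the worst-case error. */
static int counter_compare(const struct pcpu_counter_sketch *c,
                           int64_t rhs, int32_t batch)
{
    int64_t count = c->count;

    if (llabs(count - rhs) > (int64_t)batch * NR_CPUS_SKETCH)
        return count > rhs ? 1 : -1;

    count = counter_sum(c);
    if (count > rhs)
        return 1;
    if (count < rhs)
        return -1;
    return 0;
}
```

If each of the four CPUs holds an unfolded delta of 100 under a batch of 128, the global count is still 0 while the true sum is 400. Comparing against 200 with a batch of 32 trusts the rough count and reports "less than"; passing the batch actually used for updates falls through to the precise sum.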
    • N
      block: discard bdi_unregister() in favour of bdi_destroy() · aad653a0
      Committed by NeilBrown
      bdi_unregister() now contains very little functionality.
      
      It contains a "WARN_ON" if bdi->dev is NULL.  This warning is of no
      real consequence as bdi->dev isn't needed by anything else in the function,
      and it triggers if
         blk_cleanup_queue() -> bdi_destroy()
      is called before bdi_unregister, which happens since
        Commit: 6cd18e71 ("block: destroy bdi before blockdev is unregistered.")
      
      So this isn't wanted.
      
      It also calls bdi_set_min_ratio().  This needs to be called after
      writes through the bdi have all been flushed, and before the bdi is destroyed.
      Calling it early is better than calling it late as it frees up a global
      resource.
      
      Calling it immediately after bdi_wb_shutdown() in bdi_destroy()
      perfectly fits these requirements.
      
      So bdi_unregister() can be discarded with the important content moved to
      bdi_destroy(), as can the
        writeback_bdi_unregister
      event which is already not used.
      Reported-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org (v4.0)
      Fixes: c4db59d3 ("fs: don't reassign dirty inodes to default_backing_dev_info")
      Fixes: 6cd18e71 ("block: destroy bdi before blockdev is unregistered.")
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Tested-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      aad653a0
  9. 28 May, 2015 3 commits
    • J
      mac80211: Fix mac80211.h docbook comments · 3a7af58f
      Committed by Jonathan Corbet
      A couple of enums in mac80211.h became structures recently, but the
      comments didn't follow suit, leading to errors like:
      
        Error(.//include/net/mac80211.h:367): Cannot parse enum!
        Documentation/DocBook/Makefile:93: recipe for target 'Documentation/DocBook/80211.xml' failed
        make[1]: *** [Documentation/DocBook/80211.xml] Error 1
        Makefile:1361: recipe for target 'mandocs' failed
        make: *** [mandocs] Error 2
      
      Fix the comments accordingly.  Added a couple of other small
      comment fixes while I was there to silence other recently-added docbook
      warnings.
      Reported-by: Jim Davis <jim.epost@gmail.com>
      Signed-off-by: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      3a7af58f
    • R
      cpumask_set_cpu_local_first => cpumask_local_spread, lament · f36963c9
      Committed by Rusty Russell
      da91309e (cpumask: Utility function to set n'th cpu...) created a
      genuinely weird function.  I never saw it before, it went through DaveM.
      (He only does this to make us other maintainers feel better about our own
      mistakes.)
      
      cpumask_set_cpu_local_first's purpose is to say "I need to spread things
      across N online cpus, choose the ones on this numa node first"; you call
      it in a loop.
      
      It can fail.  One of the two callers ignores this, the other aborts and
      fails the device open.
      
      It can fail in two ways: allocating the off-stack cpumask, or through a
      convoluted codepath which AFAICT can only occur if cpu_online_mask
      changes.  Which shouldn't happen, because if cpu_online_mask can change
      while you call this, it could return a now-offline cpu anyway.
      
      It contains a nonsensical test "!cpumask_of_node(numa_node)".  This was
      drawn to my attention by Geert, who said this causes a warning on Sparc.
      It sets a single bit in a cpumask instead of returning a cpu number,
      because that's what the callers want.
      
      It could be made more efficient by passing the previous cpu rather than
      an index, but that would be more invasive to the callers.
      
      Fixes: da91309e
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (then rebased)
      Tested-by: Amir Vadai <amirv@mellanox.com>
      Acked-by: Amir Vadai <amirv@mellanox.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      f36963c9
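The "local node first, then everyone else" ordering described in this commit message can be sketched with plain bitmasks. This is a userspace model under assumptions, not the kernel function: a 64-bit word stands in for a cpumask, and `local_spread`, `nth_cpu`, and `popcount64` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Count the set bits in a 64-bit mask. */
static int popcount64(uint64_t m)
{
    int n = 0;

    while (m) {
        m &= m - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Return the index of the i-th set bit (0-based), or -1 if there is none. */
static int nth_cpu(uint64_t mask, unsigned int i)
{
    for (int cpu = 0; cpu < 64; cpu++) {
        if ((mask >> cpu) & 1u) {
            if (i == 0)
                return cpu;
            i--;
        }
    }
    return -1;
}

/*
 * Return a CPU number for index i: CPUs that are online and on the
 * given node come first, then the remaining online CPUs. The index
 * wraps, so a CPU is returned whenever at least one is online.
 */
static int local_spread(unsigned int i, uint64_t online, uint64_t node_mask)
{
    uint64_t local = online & node_mask;
    unsigned int n_online = (unsigned int)popcount64(online);
    unsigned int n_local = (unsigned int)popcount64(local);

    if (n_online == 0)
        return -1;
    i %= n_online;
    if (i < n_local)
        return nth_cpu(local, i);
    return nth_cpu(online & ~node_mask, i - n_local);
}
```

Returning a CPU number rather than setting a bit in a caller-supplied mask removes the allocation that could fail, so callers looping over an index have no error path to handle as long as some CPU is online.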
    • J
      sctp: Fix mangled IPv4 addresses on a IPv6 listening socket · 9302d7bb
      Committed by Jason Gunthorpe
      sctp_v4_map_v6 was subtly writing and reading from members
      of a union in a way that clobbered data it needed to read before
      it read it.
      
      Zeroing the v6 flowinfo overwrites the v4 sin_addr with 0, meaning
      that every place that calls sctp_v4_map_v6 gets ::ffff:0.0.0.0 as the
      result.
      
      Reorder things to guarantee correct behaviour no matter what the
      union layout is.
      
      This impacts user space clients that open an IPv6 SCTP socket and
      receive IPv4 connections. Prior to 299ee user space would see a
      sockaddr with AF_INET and a correct address, after 299ee the sockaddr
      is AF_INET6, but the address is wrong.
      
      Fixes: 299ee123 (sctp: Fixup v4mapped behaviour to comply with Sock API)
      Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9302d7bb
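The hazard is easy to reproduce in userspace, because the standard `sockaddr_in` and `sockaddr_in6` layouts overlap in the same way when placed in a union. The sketch below is not the kernel function: `addr_sketch` and `map_v4_to_v6` are hypothetical names, and it shows only the "read everything you need before writing anything" ordering.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical stand-in for a union holding either address family. */
union addr_sketch {
    struct sockaddr_in v4;
    struct sockaddr_in6 v6;
};

/* Convert an IPv4 address stored in the union, in place, to ::ffff:a.b.c.d. */
static void map_v4_to_v6(union addr_sketch *a)
{
    /* Copy out every v4 field first; v6 writes may land on top of them. */
    in_port_t port = a->v4.sin_port;
    struct in_addr v4addr = a->v4.sin_addr;

    /* Only now is it safe to overwrite the storage through the v6 view. */
    memset(&a->v6, 0, sizeof(a->v6));  /* also zeroes flowinfo and scope id */
    a->v6.sin6_family = AF_INET6;
    a->v6.sin6_port = port;
    a->v6.sin6_addr.s6_addr[10] = 0xff;
    a->v6.sin6_addr.s6_addr[11] = 0xff;
    memcpy(&a->v6.sin6_addr.s6_addr[12], &v4addr, sizeof(v4addr));
}
```

Because the v4 fields are held in locals before any store, the result does not depend on which v6 member happens to share bytes with `sin_addr`.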
  10. 27 May, 2015 2 commits
    • P
      perf/x86: Fix event/group validation · b371b594
      Committed by Peter Zijlstra
      Commit 43b45780 ("perf/x86: Reduce stack usage of
      x86_schedule_events()") violated the rule that 'fake' scheduling; as
      used for event/group validation; should not change the event state.
      
      This went mostly un-noticed because repeated calls of
      x86_pmu::get_event_constraints() would give the same result. And
      x86_pmu::put_event_constraints() would mostly not do anything.
      
      Commit e979121b ("perf/x86/intel: Implement cross-HT corruption
      bug workaround") made the situation much worse by actually setting the
      event->hw.constraint value to NULL, so when validation and actual
      scheduling interact we get NULL ptr derefs.
      
      Fix it by removing the constraint pointer from the event and move it
      back to an array, this time in cpuc instead of on the stack.
      
      validate_group()
        x86_schedule_events()
          event->hw.constraint = c; # store
      
            <context switch>
              perf_task_event_sched_in()
                ...
                  x86_schedule_events();
                    event->hw.constraint = c2; # store
      
                    ...
      
                    put_event_constraints(event); # assume failure to schedule
                      intel_put_event_constraints()
                        event->hw.constraint = NULL;
      
            <context switch end>
      
          c = event->hw.constraint; # read -> NULL
      
          if (!test_bit(hwc->idx, c->idxmsk)) # <- *BOOM* NULL deref
      
      This in particular is possible when the event in question is a
      cpu-wide event and group-leader, where the validate_group() tries to
      add an event to the group.
      Reported-by: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 43b45780 ("perf/x86: Reduce stack usage of x86_schedule_events()")
      Fixes: e979121b ("perf/x86/intel: Implement cross-HT corruption bug workaround")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b371b594
    • S
      drivers: of/base: move of_init to driver_init · 194ec936
      Committed by Sudeep Holla
      Commit 5590f319 ("drivers/core/of: Add symlink to device-tree from
      devices with an OF node") adds the symlink `of_node` for each device
      pointing to its device tree node while creating/initialising it.
      
      However, the devicetree sysfs is created and set up in of_init which is
      executed at core_initcall level. For all the devices created before
      of_init, the following error is thrown:
      	"Error -2(-ENOENT) creating of_node link"
      
      Like many other components in driver model, initialize the sysfs support
      for OF/devicetree from driver_init so that it's ready before any devices
      are created.
      
      Fixes: 5590f319 ("drivers/core/of: Add symlink to device-tree from
      	devices with an OF node")
      Suggested-by: Rob Herring <robh+dt@kernel.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Pawel Moll <pawel.moll@arm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
      Tested-by: Robert Schwebel <r.schwebel@pengutronix.de>
      Acked-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      194ec936
  11. 25 May, 2015 1 commit
  12. 23 May, 2015 1 commit
    • E
      tcp: fix a potential deadlock in tcp_get_info() · d654976c
      Committed by Eric Dumazet
      Taking socket spinlock in tcp_get_info() can deadlock, as
      inet_diag_dump_icsk() holds the &hashinfo->ehash_locks[i],
      while packet processing can use the reverse locking order.
      
      We could avoid this locking for TCP_LISTEN states, but lockdep would
      certainly get confused as all TCP sockets share same lockdep classes.
      
      [  523.722504] ======================================================
      [  523.728706] [ INFO: possible circular locking dependency detected ]
      [  523.734990] 4.1.0-dbg-DEV #1676 Not tainted
      [  523.739202] -------------------------------------------------------
      [  523.745474] ss/18032 is trying to acquire lock:
      [  523.750002]  (slock-AF_INET){+.-...}, at: [<ffffffff81669d44>] tcp_get_info+0x2c4/0x360
      [  523.758129]
      [  523.758129] but task is already holding lock:
      [  523.763968]  (&(&hashinfo->ehash_locks[i])->rlock){+.-...}, at: [<ffffffff816bcb75>] inet_diag_dump_icsk+0x1d5/0x6c0
      [  523.774661]
      [  523.774661] which lock already depends on the new lock.
      [  523.774661]
      [  523.782850]
      [  523.782850] the existing dependency chain (in reverse order) is:
      [  523.790326]
      -> #1 (&(&hashinfo->ehash_locks[i])->rlock){+.-...}:
      [  523.796599]        [<ffffffff811126bb>] lock_acquire+0xbb/0x270
      [  523.802565]        [<ffffffff816f5868>] _raw_spin_lock+0x38/0x50
      [  523.808628]        [<ffffffff81665af8>] __inet_hash_nolisten+0x78/0x110
      [  523.815273]        [<ffffffff816819db>] tcp_v4_syn_recv_sock+0x24b/0x350
      [  523.822067]        [<ffffffff81684d41>] tcp_check_req+0x3c1/0x500
      [  523.828199]        [<ffffffff81682d09>] tcp_v4_do_rcv+0x239/0x3d0
      [  523.834331]        [<ffffffff816842fe>] tcp_v4_rcv+0xa8e/0xc10
      [  523.840202]        [<ffffffff81658fa3>] ip_local_deliver_finish+0x133/0x3e0
      [  523.847214]        [<ffffffff81659a9a>] ip_local_deliver+0xaa/0xc0
      [  523.853440]        [<ffffffff816593b8>] ip_rcv_finish+0x168/0x5c0
      [  523.859624]        [<ffffffff81659db7>] ip_rcv+0x307/0x420
      
      Let's use u64_sync infrastructure instead. As a bonus, 64bit
      arches get optimized, as these are nop for them.
      
      Fixes: 0df48c26 ("tcp: add tcpi_bytes_acked to tcp_info")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d654976c
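Reading a 64-bit statistic without taking the socket lock is typically done with a sequence counter, where the reader retries if a writer was active. The sketch below models that general idea in userspace with C11 atomics. It is an assumption-laden illustration, not the kernel API: `stat_sketch` and its functions are hypothetical, a single writer is assumed, and the value is split into two 32-bit halves to mimic a machine that cannot load 64 bits at once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 64-bit statistic stored as two halves, guarded by a sequence count. */
struct stat_sketch {
    atomic_uint seq;               /* odd while an update is in progress */
    atomic_uint_least32_t lo, hi;
};

static void stat_init(struct stat_sketch *s)
{
    atomic_init(&s->seq, 0u);
    atomic_init(&s->lo, 0u);
    atomic_init(&s->hi, 0u);
}

/* Single writer assumed: bump seq to odd, store both halves, bump seq to even. */
static void stat_write(struct stat_sketch *s, uint64_t v)
{
    atomic_fetch_add(&s->seq, 1u);
    atomic_store(&s->lo, (uint32_t)v);
    atomic_store(&s->hi, (uint32_t)(v >> 32));
    atomic_fetch_add(&s->seq, 1u);
}

/* Lock-free reader: retry when a write was in progress or completed meanwhile. */
static uint64_t stat_read(struct stat_sketch *s)
{
    unsigned int start;
    uint32_t lo, hi;

    do {
        start = atomic_load(&s->seq);
        lo = atomic_load(&s->lo);
        hi = atomic_load(&s->hi);
    } while ((start & 1u) || atomic_load(&s->seq) != start);

    return ((uint64_t)hi << 32) | lo;
}
```

The reader never blocks the writer and never takes a lock, so it cannot participate in a lock-order inversion. On a machine with native 64-bit loads the retry loop is unnecessary, which matches the commit's remark that 64-bit arches get this for free.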
  13. 20 May, 2015 2 commits
    • F
      Revert "netfilter: bridge: query conntrack about skb dnat" · faecbb45
      Committed by Florian Westphal
      This reverts commit c055d5b0.
      
      There are two issues:
      'dnat_took_place' made me think that this is related to
      -j DNAT/MASQUERADE.
      
      But that's only one part of the story.  This is also relevant for SNAT
      when we undo snat translation in reverse/reply direction.
      
      Furthermore, I originally wanted to do this mainly to avoid
      storing ipv6 addresses once we make DNAT/REDIRECT work
      for ipv6 on bridges.
      
      However, I forgot about SNPT/DNPT which is stateless.
      
      So we can't escape storing address for ipv6 anyway. Might as
      well do it for ipv4 too.
      Reported-and-tested-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      faecbb45
    • D
      xen/events: don't bind non-percpu VIRQs with percpu chip · 77bb3dfd
      Committed by David Vrabel
      A non-percpu VIRQ (e.g., VIRQ_CONSOLE) may be freed on a different
      VCPU than it is bound to.  This can result in a race between
      handle_percpu_irq() and removing the action in __free_irq() because
      handle_percpu_irq() does not take desc->lock.  The interrupt handler
      sees a NULL action and oopses.
      
      Only use the percpu chip/handler for per-CPU VIRQs (like VIRQ_TIMER).
      
        # cat /proc/interrupts | grep virq
         40:      87246          0  xen-percpu-virq      timer0
         44:          0          0  xen-percpu-virq      debug0
         47:          0      20995  xen-percpu-virq      timer1
         51:          0          0  xen-percpu-virq      debug1
         69:          0          0   xen-dyn-virq      xen-pcpu
         74:          0          0   xen-dyn-virq      mce
         75:         29          0   xen-dyn-virq      hvc_console
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Cc: <stable@vger.kernel.org>
      77bb3dfd
  14. 19 May, 2015 1 commit
  15. 17 May, 2015 1 commit
    • H
      rhashtable: Add cap on number of elements in hash table · 07ee0722
      Committed by Herbert Xu
      We currently have no limit on the number of elements in a hash table.
      This is a problem because some users (tipc) set a ceiling on the
      maximum table size and when that is reached the hash table may
      degenerate.  Others may encounter OOM when growing and if we allow
      insertions when that happens the hash table performance may also
      suffer.
      
      This patch adds a new parameter insecure_max_entries which becomes
      the cap on the table.  If unset it defaults to max_size * 2.  If
      it is also zero it means that there is no cap on the number of
      elements in the table.  However, the table will grow whenever the
      utilisation hits 100% and if that growth fails, you will get ENOMEM
      on insertion.
      
      As allowing oversubscription is potentially dangerous, the name
      contains the word insecure.
      
      Note that the cap is not a hard limit.  This is done for performance
      reasons as enforcing a hard limit will result in use of atomic ops
      that are heavier than the ones we currently use.
      
      The reasoning is that we're only guarding against a gross over-
      subscription of the table, rather than a small breach of the limit.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      07ee0722
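The defaulting rule for the cap can be written down as a small model. This is a sketch only: `table_params_sketch`, `effective_cap`, and `insert_allowed` are hypothetical names, and the check is done on a plain integer, whereas the commit notes the real cap is deliberately not a hard limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the table parameters discussed in the commit. */
struct table_params_sketch {
    unsigned int max_size;              /* ceiling on table size, 0 if unset */
    unsigned int insecure_max_entries;  /* explicit element cap, 0 if unset */
};

/* Explicit cap if given, else max_size * 2; a result of 0 means "no cap". */
static unsigned int effective_cap(const struct table_params_sketch *p)
{
    if (p->insecure_max_entries)
        return p->insecure_max_entries;
    return p->max_size * 2;
}

/* Decide whether one more element may be inserted given the current count. */
static bool insert_allowed(const struct table_params_sketch *p,
                           unsigned int nelems)
{
    unsigned int cap = effective_cap(p);

    return cap == 0 || nelems < cap;
}
```

In this model a table with `max_size` 8 and no explicit cap stops accepting insertions at 16 elements. With both fields zero, the only remaining failure mode is the one the message describes: growth failing with ENOMEM.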
  16. 16 May, 2015 1 commit
  17. 15 May, 2015 3 commits
    • R
      rename RTNH_F_EXTERNAL to RTNH_F_OFFLOAD · eea39946
      Committed by Roopa Prabhu
      RTNH_F_EXTERNAL today is printed as "offload" in iproute2 output.
      
      This patch renames the flag to be consistent with what the user sees.
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eea39946
    • J
      uidgid: make uid_valid and gid_valid work with !CONFIG_MULTIUSER · 929aa5b2
      Committed by Josh Triplett
      {u,g}id_valid call {u,g}id_eq, which calls __k{u,g}id_val on both
      arguments and compares.  With !CONFIG_MULTIUSER, __k{u,g}id_val return a
      constant 0, which makes {u,g}id_valid always return false.  Change
      {u,g}id_valid to compare their argument against -1 instead.  That produces
      identical results in the normal CONFIG_MULTIUSER=y case, but with
      !CONFIG_MULTIUSER will make {u,g}id_valid constant-fold into "return
      true;" rather than "return false;".
      
      This fixes uses of devpts without CONFIG_MULTIUSER.
      Signed-off-by: Josh Triplett <josh@joshtriplett.org>
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      929aa5b2
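The constant-folding problem can be shown with a tiny model of the single-user configuration. All names here are hypothetical stand-ins for the kernel's types and helpers; the only thing carried over from the commit is that the value accessor returns a constant 0.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int uid_sketch_t;
typedef struct { uid_sketch_t val; } kuid_sketch_t;

#define INVALID_UID_SKETCH ((kuid_sketch_t){ (uid_sketch_t)-1 })

/* Model of the !CONFIG_MULTIUSER accessor: every uid reads back as 0. */
static uid_sketch_t kuid_val_single_user(kuid_sketch_t uid)
{
    (void)uid;
    return 0;
}

/* Old shape: "not equal to the invalid uid", with both sides run through the accessor. */
static bool uid_valid_old(kuid_sketch_t uid)
{
    return !(kuid_val_single_user(uid) ==
             kuid_val_single_user(INVALID_UID_SKETCH));
}

/* New shape: compare the accessor result against the raw value -1. */
static bool uid_valid_new(kuid_sketch_t uid)
{
    return kuid_val_single_user(uid) != (uid_sketch_t)-1;
}
```

In the old shape both operands fold to 0, so the equality is always true and the function always reports "invalid". Comparing against the raw -1 keeps the sentinel out of the accessor, so the result folds to true instead.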
    • V
      gfp: add __GFP_NOACCOUNT · 8f4fc071
      Committed by Vladimir Davydov
      Not all kmem allocations should be accounted to memcg.  The following
      patch gives an example when accounting of a certain type of allocations to
      memcg can effectively result in a memory leak.  This patch adds the
      __GFP_NOACCOUNT flag which if passed to kmalloc and friends will force the
      allocation to go through the root cgroup.  It will be used by the next
      patch.
      
      Note, since in case of kmemleak enabled each kmalloc implies yet another
      allocation from the kmemleak_object cache, we add __GFP_NOACCOUNT to
      gfp_kmemleak_mask.
      
      Alternatively, we could introduce a per kmem cache flag disabling
      accounting for all allocations of a particular kind, but (a) we would not
      be able to bypass accounting for kmalloc then and (b) a kmem cache with
      this flag set could not be merged with a kmem cache without this flag,
      which would increase the number of global caches and therefore
      fragmentation even if the memory cgroup controller is not used.
      
      Despite its generic name, currently __GFP_NOACCOUNT disables accounting
      only for kmem allocations while user page allocations are always charged.
      To catch abusing of this flag, a warning is issued on an attempt of
      passing it to mem_cgroup_try_charge.
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>	[4.0.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8f4fc071
  18. 13 May, 2015 3 commits