1. 09 Jun 2015 (1 commit)
2. 31 May 2015 (3 commits)
3. 29 May 2015 (4 commits)
• tracing/mm: don't trace mm_page_pcpu_drain on offline cpus · 649b8de2
  Authored by Shreyas B. Prabhu
      Since tracepoints use RCU for protection, they must not be called on
      offline cpus.  trace_mm_page_pcpu_drain can be called on an offline cpu
      in this scenario caught by LOCKDEP:
      
           ===============================
           [ INFO: suspicious RCU usage. ]
           4.1.0-rc1+ #9 Not tainted
           -------------------------------
           include/trace/events/kmem.h:265 suspicious rcu_dereference_check() usage!
      
          other info that might help us debug this:
      
          RCU used illegally from offline CPU!
          rcu_scheduler_active = 1, debug_locks = 1
           1 lock held by swapper/5/0:
            #0:  (&(&zone->lock)->rlock){..-...}, at: [<c0000000002073b0>] .free_pcppages_bulk+0x70/0x920
      
          stack backtrace:
           CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.1.0-rc1+ #9
           Call Trace:
             .dump_stack+0x98/0xd4 (unreliable)
             .lockdep_rcu_suspicious+0x108/0x170
             .free_pcppages_bulk+0x60c/0x920
             .free_hot_cold_page+0x208/0x280
             .destroy_context+0x90/0xd0
             .__mmdrop+0x58/0x160
             .idle_task_exit+0xf0/0x100
             .pnv_smp_cpu_kill_self+0x58/0x2c0
             .cpu_die+0x34/0x50
             .arch_cpu_idle_dead+0x20/0x40
             .cpu_startup_entry+0x708/0x7a0
             .start_secondary+0x36c/0x3a0
             start_secondary_prolog+0x10/0x14
      
Fix this by converting the mm_page_pcpu_drain tracepoint into
TRACE_EVENT_CONDITION, where the condition is cpu_online(smp_processor_id()).
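
A sketch of the conditional form. Since mm_page_pcpu_drain belongs to the
mm_page event class in include/trace/events/kmem.h, the equivalent
DEFINE_EVENT_CONDITION variant applies; TRACE_EVENT_CONDITION takes the
same TP_CONDITION() argument:

  DEFINE_EVENT_CONDITION(mm_page, mm_page_pcpu_drain,

  	TP_PROTO(struct page *page, unsigned int order, int migratetype),

  	TP_ARGS(page, order, migratetype),

  	/* the RCU-protected probe is never invoked when this is false,
  	 * so the tracepoint becomes safe on an offline CPU */
  	TP_CONDITION(cpu_online(smp_processor_id()))
  );
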
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      649b8de2
• tracing/mm: don't trace mm_page_free on offline cpus · 1f0c27b5
  Authored by Shreyas B. Prabhu
      Since tracepoints use RCU for protection, they must not be called on
      offline cpus.  trace_mm_page_free can be called on an offline cpu in this
      scenario caught by LOCKDEP:
      
           ===============================
           [ INFO: suspicious RCU usage. ]
           4.1.0-rc1+ #9 Not tainted
           -------------------------------
           include/trace/events/kmem.h:170 suspicious rcu_dereference_check() usage!
      
          other info that might help us debug this:
      
          RCU used illegally from offline CPU!
          rcu_scheduler_active = 1, debug_locks = 1
           no locks held by swapper/1/0.
      
          stack backtrace:
           CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.1.0-rc1+ #9
           Call Trace:
             .dump_stack+0x98/0xd4 (unreliable)
             .lockdep_rcu_suspicious+0x108/0x170
             .free_pages_prepare+0x494/0x680
             .free_hot_cold_page+0x50/0x280
             .destroy_context+0x90/0xd0
             .__mmdrop+0x58/0x160
             .idle_task_exit+0xf0/0x100
             .pnv_smp_cpu_kill_self+0x58/0x2c0
             .cpu_die+0x34/0x50
             .arch_cpu_idle_dead+0x20/0x40
             .cpu_startup_entry+0x708/0x7a0
             .start_secondary+0x36c/0x3a0
             start_secondary_prolog+0x10/0x14
      
Fix this by converting the mm_page_free tracepoint into TRACE_EVENT_CONDITION,
where the condition is cpu_online(smp_processor_id()), as in the previous patch.
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1f0c27b5
• tracing/mm: don't trace kmem_cache_free on offline cpus · e5feb1eb
  Authored by Shreyas B. Prabhu
      Since tracepoints use RCU for protection, they must not be called on
      offline cpus.  trace_kmem_cache_free can be called on an offline cpu in
      this scenario caught by LOCKDEP:
      
          ===============================
          [ INFO: suspicious RCU usage. ]
          4.1.0-rc1+ #9 Not tainted
          -------------------------------
          include/trace/events/kmem.h:148 suspicious rcu_dereference_check() usage!
      
          other info that might help us debug this:
      
          RCU used illegally from offline CPU!
          rcu_scheduler_active = 1, debug_locks = 1
          no locks held by swapper/1/0.
      
          stack backtrace:
          CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.1.0-rc1+ #9
          Call Trace:
            .dump_stack+0x98/0xd4 (unreliable)
            .lockdep_rcu_suspicious+0x108/0x170
            .kmem_cache_free+0x344/0x4b0
            .__mmdrop+0x4c/0x160
            .idle_task_exit+0xf0/0x100
            .pnv_smp_cpu_kill_self+0x58/0x2c0
            .cpu_die+0x34/0x50
            .arch_cpu_idle_dead+0x20/0x40
            .cpu_startup_entry+0x708/0x7a0
            .start_secondary+0x36c/0x3a0
            start_secondary_prolog+0x10/0x14
      
Fix this by converting the kmem_cache_free tracepoint into
TRACE_EVENT_CONDITION, where the condition is cpu_online(smp_processor_id()).
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e5feb1eb
• percpu_counter: batch size aware __percpu_counter_compare() · 80188b0d
  Authored by Dave Chinner
XFS uses non-standard batch sizes to avoid frequent global counter
updates on its allocated-inode counters, as they increment or
decrement in batches of 64 inodes.  With the standard percpu counter
batch of 32, the counter is effectively a global counter.  XFS
therefore uses a batch size of 128 so that it doesn't take the global
lock on every single modification.

However, XFS also needs to compare accurately against zero, which
means it needs percpu_counter_compare().  That has a hard-coded batch
size of 32, and hence spuriously fails to detect when a precise
comparison is needed, so the accounting goes wrong.

Add __percpu_counter_compare() to take a custom batch size so we can
use it sanely in XFS, and refactor percpu_counter_compare() to use it.
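
A sketch of the new interface and the refactored helper; the XFS caller
and its batch constant below are illustrative, not lifted from the patch:

  /*
   * New interface: like percpu_counter_compare(), but with a
   * caller-supplied batch so the precise-comparison threshold matches
   * how the counter is actually modified.  Returns -1, 0 or 1.
   */
  int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch);

  static inline int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
  {
  	return __percpu_counter_compare(fbc, rhs, percpu_counter_batch);
  }

  /* Hypothetical XFS-style caller with its larger batch size: */
  #define XFS_FDBLOCKS_BATCH	1024

  if (__percpu_counter_compare(&mp->m_fdblocks, 0, XFS_FDBLOCKS_BATCH) < 0)
  	return -ENOSPC;	/* genuinely out of space */
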
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
      80188b0d
4. 28 May 2015 (2 commits)
• cpumask_set_cpu_local_first => cpumask_local_spread, lament · f36963c9
  Authored by Rusty Russell
      da91309e (cpumask: Utility function to set n'th cpu...) created a
      genuinely weird function.  I never saw it before, it went through DaveM.
      (He only does this to make us other maintainers feel better about our own
      mistakes.)
      
cpumask_set_cpu_local_first's purpose is to say "I need to spread things
across N online cpus, choose the ones on this numa node first"; you call
it in a loop.
      
      It can fail.  One of the two callers ignores this, the other aborts and
      fails the device open.
      
      It can fail in two ways: allocating the off-stack cpumask, or through a
      convoluted codepath which AFAICT can only occur if cpu_online_mask
      changes.  Which shouldn't happen, because if cpu_online_mask can change
      while you call this, it could return a now-offline cpu anyway.
      
      It contains a nonsensical test "!cpumask_of_node(numa_node)".  This was
      drawn to my attention by Geert, who said this causes a warning on Sparc.
      It sets a single bit in a cpumask instead of returning a cpu number,
      because that's what the callers want.
      
      It could be made more efficient by passing the previous cpu rather than
      an index, but that would be more invasive to the callers.
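
A sketch of the replacement API: instead of filling in a cpumask and
possibly failing, cpumask_local_spread() returns the i-th online CPU,
preferring CPUs on the given NUMA node, and cannot fail.  The driver
loop below is a hypothetical usage example:

  #include <linux/cpumask.h>

  static void spread_queues(int nr_queues, int numa_node)
  {
  	int i;

  	for (i = 0; i < nr_queues; i++) {
  		unsigned int cpu = cpumask_local_spread(i, numa_node);

  		/* bind queue i to 'cpu', e.g. via irq_set_affinity_hint() */
  		pr_debug("queue %d -> cpu %u\n", i, cpu);
  	}
  }
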
      
      Fixes: da91309e
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (then rebased)
Tested-by: Amir Vadai <amirv@mellanox.com>
Acked-by: Amir Vadai <amirv@mellanox.com>
Acked-by: David S. Miller <davem@davemloft.net>
      f36963c9
• sctp: Fix mangled IPv4 addresses on an IPv6 listening socket · 9302d7bb
  Authored by Jason Gunthorpe
sctp_v4_map_v6 was subtly writing to and reading from members
of a union in a way that clobbered data it needed to read before
it read it.
      
      Zeroing the v6 flowinfo overwrites the v4 sin_addr with 0, meaning
      that every place that calls sctp_v4_map_v6 gets ::ffff:0.0.0.0 as the
      result.
      
      Reorder things to guarantee correct behaviour no matter what the
      union layout is.
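
A simplified sketch of the reordering idea (not the exact patch): read
every v4 field into locals before storing any v6 field, so overlapping
union members can no longer clobber input that is still needed:

  static void sctp_v4_map_v6(union sctp_addr *addr)
  {
  	__be16 port = addr->v4.sin_port;
  	__be32 saddr = addr->v4.sin_addr.s_addr;	/* read before any write */

  	addr->v6.sin6_family = AF_INET6;
  	addr->v6.sin6_port = port;
  	addr->v6.sin6_flowinfo = 0;	/* previously this zeroed sin_addr first */
  	addr->v6.sin6_scope_id = 0;
  	addr->v6.sin6_addr.s6_addr32[0] = 0;
  	addr->v6.sin6_addr.s6_addr32[1] = 0;
  	addr->v6.sin6_addr.s6_addr32[2] = htonl(0x0000ffff);
  	addr->v6.sin6_addr.s6_addr32[3] = saddr;	/* ::ffff:a.b.c.d */
  }
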
      
This impacts user-space clients that open an IPv6 SCTP socket and
receive IPv4 connections.  Prior to 299ee123, user space would see a
sockaddr with AF_INET and a correct address; after 299ee123, the sockaddr
is AF_INET6, but the address is wrong.
      
      Fixes: 299ee123 (sctp: Fixup v4mapped behaviour to comply with Sock API)
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
      9302d7bb
5. 25 May 2015 (1 commit)
6. 23 May 2015 (1 commit)
• tcp: fix a potential deadlock in tcp_get_info() · d654976c
  Authored by Eric Dumazet
      Taking socket spinlock in tcp_get_info() can deadlock, as
      inet_diag_dump_icsk() holds the &hashinfo->ehash_locks[i],
      while packet processing can use the reverse locking order.
      
We could avoid this locking for TCP_LISTEN states, but lockdep would
certainly get confused, as all TCP sockets share the same lockdep classes.
      
      [  523.722504] ======================================================
      [  523.728706] [ INFO: possible circular locking dependency detected ]
      [  523.734990] 4.1.0-dbg-DEV #1676 Not tainted
      [  523.739202] -------------------------------------------------------
      [  523.745474] ss/18032 is trying to acquire lock:
      [  523.750002]  (slock-AF_INET){+.-...}, at: [<ffffffff81669d44>] tcp_get_info+0x2c4/0x360
      [  523.758129]
      [  523.758129] but task is already holding lock:
      [  523.763968]  (&(&hashinfo->ehash_locks[i])->rlock){+.-...}, at: [<ffffffff816bcb75>] inet_diag_dump_icsk+0x1d5/0x6c0
      [  523.774661]
      [  523.774661] which lock already depends on the new lock.
      [  523.774661]
      [  523.782850]
      [  523.782850] the existing dependency chain (in reverse order) is:
      [  523.790326]
      -> #1 (&(&hashinfo->ehash_locks[i])->rlock){+.-...}:
      [  523.796599]        [<ffffffff811126bb>] lock_acquire+0xbb/0x270
      [  523.802565]        [<ffffffff816f5868>] _raw_spin_lock+0x38/0x50
      [  523.808628]        [<ffffffff81665af8>] __inet_hash_nolisten+0x78/0x110
      [  523.815273]        [<ffffffff816819db>] tcp_v4_syn_recv_sock+0x24b/0x350
      [  523.822067]        [<ffffffff81684d41>] tcp_check_req+0x3c1/0x500
      [  523.828199]        [<ffffffff81682d09>] tcp_v4_do_rcv+0x239/0x3d0
      [  523.834331]        [<ffffffff816842fe>] tcp_v4_rcv+0xa8e/0xc10
      [  523.840202]        [<ffffffff81658fa3>] ip_local_deliver_finish+0x133/0x3e0
      [  523.847214]        [<ffffffff81659a9a>] ip_local_deliver+0xaa/0xc0
      [  523.853440]        [<ffffffff816593b8>] ip_rcv_finish+0x168/0x5c0
      [  523.859624]        [<ffffffff81659db7>] ip_rcv+0x307/0x420
      
Let's use the u64_stats_sync infrastructure instead.  As a bonus, 64-bit
arches are optimized, as these sync operations are no-ops for them.
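
A sketch of the pattern, assuming a u64_stats_sync field (syncp) is
added next to the counters in struct tcp_sock, which is what this kind
of conversion requires; the helper names below are hypothetical wrappers:

  #include <linux/u64_stats_sync.h>

  /* writer side (packet processing path): no spinlock needed */
  static void tcp_account_bytes(struct tcp_sock *tp, u64 acked, u64 rcvd)
  {
  	u64_stats_update_begin(&tp->syncp);
  	tp->bytes_acked += acked;
  	tp->bytes_received += rcvd;
  	u64_stats_update_end(&tp->syncp);
  }

  /* reader side (tcp_get_info()): retry loop instead of the socket
   * spinlock; on 64-bit arches the seqcount compiles away entirely */
  static void tcp_read_bytes(const struct tcp_sock *tp, struct tcp_info *info)
  {
  	unsigned int start;

  	do {
  		start = u64_stats_fetch_begin_irq(&tp->syncp);
  		info->tcpi_bytes_acked = tp->bytes_acked;
  		info->tcpi_bytes_received = tp->bytes_received;
  	} while (u64_stats_fetch_retry_irq(&tp->syncp, start));
  }
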
      
      Fixes: 0df48c26 ("tcp: add tcpi_bytes_acked to tcp_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
      d654976c
7. 20 May 2015 (2 commits)
• Revert "netfilter: bridge: query conntrack about skb dnat" · faecbb45
  Authored by Florian Westphal
      This reverts commit c055d5b0.
      
      There are two issues:
      'dnat_took_place' made me think that this is related to
      -j DNAT/MASQUERADE.
      
But that's only one part of the story.  This is also relevant for SNAT,
when we undo the snat translation in the reverse/reply direction.
      
      Furthermore, I originally wanted to do this mainly to avoid
      storing ipv6 addresses once we make DNAT/REDIRECT work
      for ipv6 on bridges.
      
      However, I forgot about SNPT/DNPT which is stateless.
      
      So we can't escape storing address for ipv6 anyway. Might as
      well do it for ipv4 too.
Reported-and-tested-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      faecbb45
• xen/events: don't bind non-percpu VIRQs with percpu chip · 77bb3dfd
  Authored by David Vrabel
      A non-percpu VIRQ (e.g., VIRQ_CONSOLE) may be freed on a different
      VCPU than it is bound to.  This can result in a race between
      handle_percpu_irq() and removing the action in __free_irq() because
      handle_percpu_irq() does not take desc->lock.  The interrupt handler
      sees a NULL action and oopses.
      
      Only use the percpu chip/handler for per-CPU VIRQs (like VIRQ_TIMER).
      
        # cat /proc/interrupts | grep virq
         40:      87246          0  xen-percpu-virq      timer0
         44:          0          0  xen-percpu-virq      debug0
         47:          0      20995  xen-percpu-virq      timer1
         51:          0          0  xen-percpu-virq      debug1
         69:          0          0   xen-dyn-virq      xen-pcpu
         74:          0          0   xen-dyn-virq      mce
         75:         29          0   xen-dyn-virq      hvc_console
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Cc: <stable@vger.kernel.org>
      77bb3dfd
8. 19 May 2015 (1 commit)
9. 17 May 2015 (1 commit)
• rhashtable: Add cap on number of elements in hash table · 07ee0722
  Authored by Herbert Xu
      We currently have no limit on the number of elements in a hash table.
      This is a problem because some users (tipc) set a ceiling on the
      maximum table size and when that is reached the hash table may
degenerate.  Others may encounter OOM when growing, and if we allow
insertions when that happens, hash table performance may also
suffer.
      
This patch adds a new parameter insecure_max_entries which becomes
the cap on the table.  If unset it defaults to max_size * 2.  If
      it is also zero it means that there is no cap on the number of
      elements in the table.  However, the table will grow whenever the
      utilisation hits 100% and if that growth fails, you will get ENOMEM
      on insertion.
      
      As allowing oversubscription is potentially dangerous, the name
      contains the word insecure.
      
      Note that the cap is not a hard limit.  This is done for performance
      reasons as enforcing a hard limit will result in use of atomic ops
      that are heavier than the ones we currently use.
      
The reasoning is that we're only guarding against a gross
oversubscription of the table, rather than a small breach of the limit.
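
A minimal sketch of setting the cap; the object struct and values are
hypothetical, insecure_max_entries is the field this patch adds:

  #include <linux/rhashtable.h>

  struct my_obj {
  	u32 key;
  	struct rhash_head node;
  };

  /* cap the table at 64k entries; insertions beyond the cap (or after
   * a failed grow at 100% utilisation) fail instead of oversubscribing */
  static const struct rhashtable_params my_params = {
  	.head_offset		= offsetof(struct my_obj, node),
  	.key_offset		= offsetof(struct my_obj, key),
  	.key_len		= sizeof(u32),
  	.max_size		= 32768,
  	.insecure_max_entries	= 65536,	/* new in this patch */
  };
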
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
      07ee0722
10. 16 May 2015 (1 commit)
11. 15 May 2015 (3 commits)
• rename RTNH_F_EXTERNAL to RTNH_F_OFFLOAD · eea39946
  Authored by Roopa Prabhu
      RTNH_F_EXTERNAL today is printed as "offload" in iproute2 output.
      
      This patch renames the flag to be consistent with what the user sees.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
      eea39946
• uidgid: make uid_valid and gid_valid work with !CONFIG_MULTIUSER · 929aa5b2
  Authored by Josh Triplett
      {u,g}id_valid call {u,g}id_eq, which calls __k{u,g}id_val on both
      arguments and compares.  With !CONFIG_MULTIUSER, __k{u,g}id_val return a
      constant 0, which makes {u,g}id_valid always return false.  Change
      {u,g}id_valid to compare their argument against -1 instead.  That produces
      identical results in the normal CONFIG_MULTIUSER=y case, but with
      !CONFIG_MULTIUSER will make {u,g}id_valid constant-fold into "return
      true;" rather than "return false;".
      
      This fixes uses of devpts without CONFIG_MULTIUSER.
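
A sketch of the resulting helpers in include/linux/uidgid.h: comparing
the raw value against -1 directly avoids uid_eq()/gid_eq(), which
degenerate to 0 == 0 when !CONFIG_MULTIUSER:

  static inline bool uid_valid(kuid_t uid)
  {
  	return __kuid_val(uid) != (uid_t) -1;
  }

  static inline bool gid_valid(kgid_t gid)
  {
  	return __kgid_val(gid) != (gid_t) -1;
  }
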
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      929aa5b2
• gfp: add __GFP_NOACCOUNT · 8f4fc071
  Authored by Vladimir Davydov
      Not all kmem allocations should be accounted to memcg.  The following
      patch gives an example when accounting of a certain type of allocations to
      memcg can effectively result in a memory leak.  This patch adds the
      __GFP_NOACCOUNT flag which if passed to kmalloc and friends will force the
      allocation to go through the root cgroup.  It will be used by the next
      patch.
      
Note, since with kmemleak enabled each kmalloc implies yet another
allocation from the kmemleak_object cache, we add __GFP_NOACCOUNT to
gfp_kmemleak_mask.
      
      Alternatively, we could introduce a per kmem cache flag disabling
      accounting for all allocations of a particular kind, but (a) we would not
      be able to bypass accounting for kmalloc then and (b) a kmem cache with
      this flag set could not be merged with a kmem cache without this flag,
      which would increase the number of global caches and therefore
      fragmentation even if the memory cgroup controller is not used.
      
      Despite its generic name, currently __GFP_NOACCOUNT disables accounting
      only for kmem allocations while user page allocations are always charged.
      To catch abusing of this flag, a warning is issued on an attempt of
      passing it to mem_cgroup_try_charge.
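
A hypothetical caller sketch (foo_cache_entry is an invented name; the
flag usage is the point): a long-lived, shared kernel allocation that
should not be charged to the current task's memcg:

  struct foo_cache_entry *entry;

  /* bypass memcg kmem accounting for this shared, long-lived object;
   * user page allocations are still charged as usual */
  entry = kmalloc(sizeof(*entry), GFP_KERNEL | __GFP_NOACCOUNT);
  if (!entry)
  	return -ENOMEM;
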
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>	[4.0.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8f4fc071
12. 13 May 2015 (4 commits)
13. 12 May 2015 (1 commit)
• HID: hid-sensor-hub: Fix debug lock warning · 2d94e522
  Authored by Srinivas Pandruvada
When CONFIG_DEBUG_LOCK_ALLOC is defined, the mutex magic is compared
and warned about when (l->magic != l), where l is the address of the
mutex.  In hid-sensor-hub, a per-hsdev mutex is initialized during MFD
cell creation as part of hsdev creation.  This hsdev, which contains
the mutex, is part of the platform data for the cell.  But platform
data is copied in platform_device_add_data() in platform.c.  This copy
duplicates the whole hsdev structure, including the mutex, and once
copied the magic no longer matches.  So when a client driver calls
sensor_hub_input_attr_get_raw_value(), the mutex warning triggers.
Avoid this by allocating the mutex dynamically; the pointer stays
valid even after the copy.
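
A sketch of the fix pattern, assuming a mutex_ptr field in the hsdev
structure (names follow the hid-sensor-hub driver): store a pointer to
a separately allocated mutex instead of embedding the mutex itself, so
the lockdep magic survives the memcpy() of the platform data:

  hsdev->mutex_ptr = kzalloc(sizeof(struct mutex), GFP_KERNEL);
  if (!hsdev->mutex_ptr)
  	return -ENOMEM;
  mutex_init(hsdev->mutex_ptr);

  /* users now lock through the pointer: */
  mutex_lock(hsdev->mutex_ptr);
  /* ... critical section ... */
  mutex_unlock(hsdev->mutex_ptr);
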
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      2d94e522
14. 11 May 2015 (1 commit)
• pty: Fix input race when closing · 1a48632f
  Authored by Peter Hurley
      A read() from a pty master may mistakenly indicate EOF (errno == -EIO)
      after the pty slave has closed, even though input data remains to be read.
      For example,
      
             pty slave       |        input worker        |    pty master
                             |                            |
                             |                            |   n_tty_read()
      pty_write()            |                            |     input avail? no
        add data             |                            |     sleep
        schedule worker  --->|                            |     .
                             |---> flush_to_ldisc()       |     .
      pty_close()            |       fill read buffer     |     .
        wait for worker      |       wakeup reader    --->|     .
                             |       read buffer full?    |---> input avail ? yes
                             |<---   yes - exit worker    |     copy 4096 bytes to user
        TTY_OTHER_CLOSED <---|                            |<--- kick worker
                             |                            |
      
      		                **** New read() before worker starts ****
      
                             |                            |   n_tty_read()
                             |                            |     input avail? no
                             |                            |     TTY_OTHER_CLOSED? yes
                             |                            |     return -EIO
      
      Several conditions are required to trigger this race:
      1. the ldisc read buffer must become full so the input worker exits
      2. the read() count parameter must be >= 4096 so the ldisc read buffer
         is empty
      3. the subsequent read() occurs before the kicked worker has processed
         more input
      
      However, the underlying cause of the race is that data is pipelined, while
      tty state is not; ie., data already written by the pty slave end is not
      yet visible to the pty master end, but state changes by the pty slave end
      are visible to the pty master end immediately.
      
      Pipeline the TTY_OTHER_CLOSED state through input worker to the reader.
      1. Introduce TTY_OTHER_DONE which is set by the input worker when
         TTY_OTHER_CLOSED is set and either the input buffers are flushed or
         input processing has completed. Readers/polls are woken when
         TTY_OTHER_DONE is set.
      2. Reader/poll checks TTY_OTHER_DONE instead of TTY_OTHER_CLOSED.
      3. A new input worker is started from pty_close() after setting
         TTY_OTHER_CLOSED, which ensures the TTY_OTHER_DONE state will be
         set if the last input worker is already finished (or just about to
         exit).
      
      Remove tty_flush_to_ldisc(); no in-tree callers.
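
A sketch of the reader-side change in n_tty_read(), simplified from the
surrounding wait loop: report EOF/-EIO only once the input worker has
confirmed there is nothing left to process:

  if (!input_available_p(tty, 0)) {
  	/* test TTY_OTHER_DONE rather than TTY_OTHER_CLOSED: the
  	 * worker sets it only after all slave-written data has
  	 * been processed or flushed */
  	if (test_bit(TTY_OTHER_DONE, &tty->flags)) {
  		retval = -EIO;
  		break;
  	}
  	/* otherwise keep waiting: data may still be in flight from
  	 * the (now closed) slave through the input worker */
  }
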
      
      Fixes: 52bce7f8 ("pty, n_tty: Simplify input processing on final close")
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=96311
      BugLink: http://bugs.launchpad.net/bugs/1429756
      Cc: <stable@vger.kernel.org> # 3.19+
Reported-by: Andy Whitcroft <apw@canonical.com>
Reported-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1a48632f
15. 10 May 2015 (1 commit)
16. 09 May 2015 (1 commit)
17. 08 May 2015 (1 commit)
18. 07 May 2015 (1 commit)
19. 06 May 2015 (5 commits)
20. 05 May 2015 (2 commits)
• RDMA/core: Enable the iWarp Port Mapper to provide the actual address of the connecting peer to its clients · 6eec1774
  Authored by Tatyana Nikolova
      
      Add functionality to enable the port mapper on the passive side to provide to its
      clients the actual (non-mapped) ip/tcp address information of the connecting peer
      
      1) Adding remote_info_cb() to process the address info of the connecting peer
         The address info is provided by the user space port mapper service when
         the connection is initiated by the peer
      2) Adding a hash list to store the remote address info
      3) Adding functionality to add/remove the remote address info
         After the info has been provided to the port mapper client,
         it is removed from the hash list
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
      6eec1774
• blk-mq: fix FUA request hang · b2387ddc
  Authored by Shaohua Li
When a FUA request enters the DATA stage of the flush pipeline, the
request is added to the mq requeue list and will then be added to
ctx->rq_list.  blk_mq_attempt_merge() might merge the request with a bio.
Later, when the request has finished the flush pipeline,
request->__data_len is 0.  I then only saw the bio get its endio called;
the original request never finishes.

Adding REQ_FLUSH_SEQ to REQ_NOMERGE_FLAGS looks like an easy fix.
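
A sketch of what that one-liner looks like (mask as in the 4.1-era
include/linux/blk_types.h): in-flight flush-sequence requests become
non-mergeable, so blk_mq_attempt_merge() leaves them alone:

  #define REQ_NOMERGE_FLAGS \
  	(REQ_NOMERGE | REQ_STARTED | REQ_SOFTBARRIER | REQ_FLUSH | \
  	 REQ_FUA | REQ_FLUSH_SEQ)
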
      
      stable: 3.15+
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
      b2387ddc
21. 04 May 2015 (3 commits)
• mac80211: fix 90 kernel-doc warnings · ff419b3f
  Authored by Randy Dunlap
      Eliminate 90 of these warnings:
      
      Warning(..//include/net/mac80211.h:1682): No description found for parameter 'drv_priv[0]'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      ff419b3f
• lib: make memzero_explicit more robust against dead store elimination · 7829fb09
  Authored by Daniel Borkmann
      In commit 0b053c95 ("lib: memzero_explicit: use barrier instead
      of OPTIMIZER_HIDE_VAR"), we made memzero_explicit() more robust in
      case LTO would decide to inline memzero_explicit() and eventually
      find out it could be elimiated as dead store.
      
      While using barrier() works well for the case of gcc, recent efforts
      from LLVMLinux people suggest to use llvm as an alternative to gcc,
      and there, Stephan found in a simple stand-alone user space example
      that llvm could nevertheless optimize and thus elimitate the memset().
      A similar issue has been observed in the referenced llvm bug report,
      which is regarded as not-a-bug.
      
      Based on some experiments, icc is a bit special on its own, while it
      doesn't seem to eliminate the memset(), it could do so with an own
      implementation, and then result in similar findings as with llvm.
      
      The fix in this patch now works for all three compilers (also tested
      with more aggressive optimization levels). Arguably, in the current
      kernel tree it's more of a theoretical issue, but imho, it's better
      to be pedantic about it.
      
It's clearly visible with gcc/llvm though, with the code below: had we
used only barrier() here, llvm would have omitted the clearing; not so
with the barrier_data() variant:
      
        static inline void memzero_explicit(void *s, size_t count)
        {
          memset(s, 0, count);
          barrier_data(s);
        }
      
        int main(void)
        {
          char buff[20];
          memzero_explicit(buff, sizeof(buff));
          return 0;
        }
      
        $ gcc -O2 test.c
        $ gdb a.out
        (gdb) disassemble main
        Dump of assembler code for function main:
         0x0000000000400400  <+0>: lea   -0x28(%rsp),%rax
         0x0000000000400405  <+5>: movq  $0x0,-0x28(%rsp)
         0x000000000040040e <+14>: movq  $0x0,-0x20(%rsp)
         0x0000000000400417 <+23>: movl  $0x0,-0x18(%rsp)
         0x000000000040041f <+31>: xor   %eax,%eax
         0x0000000000400421 <+33>: retq
        End of assembler dump.
      
        $ clang -O2 test.c
        $ gdb a.out
        (gdb) disassemble main
        Dump of assembler code for function main:
         0x00000000004004f0  <+0>: xorps  %xmm0,%xmm0
         0x00000000004004f3  <+3>: movaps %xmm0,-0x18(%rsp)
         0x00000000004004f8  <+8>: movl   $0x0,-0x8(%rsp)
         0x0000000000400500 <+16>: lea    -0x18(%rsp),%rax
         0x0000000000400505 <+21>: xor    %eax,%eax
         0x0000000000400507 <+23>: retq
        End of assembler dump.
      
As gcc and clang, but also icc, define __GNUC__, it's sufficient to
define this in compiler-gcc.h only for it to be picked up.  For a
fallback or otherwise unsupported compiler, we define it as a barrier.
Similarly for ecc, which does not support gcc inline asm.
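
The barrier_data() definitions, sketched from the patch: the "r"(ptr)
input makes the pointed-to buffer appear live to the compiler, and the
"memory" clobber forbids caching it, so the preceding memset() cannot
be treated as a dead store:

  /* compiler-gcc.h (gcc, clang, icc all define __GNUC__): */
  #define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory")

  /* compiler.h fallback for compilers without gcc inline asm: */
  #ifndef barrier_data
  # define barrier_data(ptr) barrier()
  #endif
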
      
Reference: https://llvm.org/bugs/show_bug.cgi?id=15495
Reported-by: Stephan Mueller <smueller@chronox.de>
Tested-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Stephan Mueller <smueller@chronox.de>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: mancha security <mancha1@zoho.com>
      Cc: Mark Charlebois <charlebm@gmail.com>
      Cc: Behan Webster <behanw@converseincode.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      7829fb09
• codel: fix maxpacket/mtu confusion · a5d28090
  Authored by Eric Dumazet
In the presence of TSO/GSO/GRO packets, codel at low rates can be quite
useless.  In the following example, not a single packet was ever dropped,
while the average delay in the codel queue is ~100 ms!
      
      qdisc codel 0: parent 1:12 limit 16000p target 5.0ms interval 100.0ms
       Sent 134376498 bytes 88797 pkt (dropped 0, overlimits 0 requeues 0)
       backlog 13626b 3p requeues 0
        count 0 lastcount 0 ldelay 96.9ms drop_next 0us
        maxpacket 9084 ecn_mark 0 drop_overlimit 0
      
This comes from a confusion about what the minimal backlog should be.  It
is pretty clear it is not 64KB or whatever max GSO packet size ever
reached the qdisc.

codel's intent was to use the MTU of the device.
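
A sketch of the fix idea, under the assumption that codel_params gains
an mtu field: initialise it from the device MTU and use it, rather than
the largest packet seen so far, as the backlog floor below which
packets are never dropped:

  /* at qdisc init time: */
  q->params.mtu = psched_mtu(qdisc_dev(sch));

  /* in codel_should_drop(), simplified: compare the backlog against
   * the device MTU instead of stats->maxpacket, so a single queued
   * GSO super-packet no longer disables dropping */
  if (codel_time_before(vars->ldelay, params->target) ||
      sch->qstats.backlog <= params->mtu) {
  	vars->first_above_time = 0;	/* leave the dropping state */
  	return false;
  }
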
      
      After the fix, we finally drop some packets, and rtt/cwnd of my single
      TCP flow are meeting our expectations.
      
      qdisc codel 0: parent 1:12 limit 16000p target 5.0ms interval 100.0ms
       Sent 102798497 bytes 67912 pkt (dropped 1365, overlimits 0 requeues 0)
       backlog 6056b 3p requeues 0
        count 1 lastcount 1 ldelay 36.3ms drop_next 0us
        maxpacket 10598 ecn_mark 0 drop_overlimit 0
Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Kathleen Nichols <nichols@pollere.com>
      Cc: Dave Taht <dave.taht@gmail.com>
      Cc: Van Jacobson <vanj@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
      a5d28090