1. 24 Jun 2015, 1 commit
  2. 23 Jun 2015, 2 commits
    • module: add per-module param_lock · b51d23e4
      Committed by Dan Streetman
      Add a "param_lock" mutex to each module, and update params.c to use
      the correct built-in or module mutex while locking kernel params.
      Remove the kparam_block_sysfs_r/w() macros, replacing them with direct
      calls to kernel_param_[un]lock(module).
      
      The kernel param code currently uses a single mutex to protect
      modification of any and all kernel params. While this generally works,
      there is one specific problem with it: a module callback function
      cannot safely load another module, i.e. with request_module() or even
      with indirect calls such as crypto_has_alg(). If the module to be
      loaded has any of its params configured (e.g. with a /etc/modprobe.d/*
      config file), then the attempt will result in a deadlock between the
      first module param callback waiting for modprobe, and modprobe trying to
      lock the single kernel param mutex to set the new module's param.
      
      This fixes that by using per-module mutexes, so that each individual
      module is protected against concurrent changes in its own kernel params,
      but is not blocked by changes to other modules' params. All built-in
      modules continue to use the built-in mutex, since they are always
      loaded; references to them (e.g. request_module(), crypto_has_alg())
      will never cause load-time param changing.
      
      This also simplifies the interface used by modules to block sysfs access
      to their params. The current functions to block and unblock sysfs param
      access are split up by read and write and expect a single kernel param
      to be passed, yet their actual operation is identical and applies to all
      params, not just the one passed to them: they simply lock and unlock
      the global param mutex. They are replaced with direct calls to
      kernel_param_[un]lock(THIS_MODULE), which locks THIS_MODULE's
      param_lock, or, if the module is built-in, the built-in mutex.
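
      A minimal sketch of the new interface, assuming a module with a single
      writable param (names here are illustrative, not from the patch):

      	static bool enabled;
      	module_param(enabled, bool, 0644);

      	static void reconfigure(void)
      	{
      		kernel_param_lock(THIS_MODULE);	/* was: kparam_block_sysfs_w(enabled) */
      		/* read or update this module's params consistently */
      		kernel_param_unlock(THIS_MODULE);	/* was: kparam_unblock_sysfs_w(enabled) */
      	}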
      Suggested-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Dan Streetman <ddstreet@ieee.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      b51d23e4
    • module: make perm const · 5104b7d7
      Committed by Dan Streetman
      Change the struct kernel_param.perm field to a const, as it should never
      be changed.
      Signed-off-by: Dan Streetman <ddstreet@ieee.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (cut from larger patch)
      5104b7d7
  3. 28 May 2015, 11 commits
    • kernel/params.c: generalize bool_enable_only · d19f05d8
      Committed by Luis R. Rodriguez
      This takes out the bool_enable_only implementation from
      the module loading code and generalizes it so that others
      can make use of it.
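
      For instance, a module wanting a one-way enable switch (settable to
      true at load or run time, but never back to false) can now reuse the
      generalized ops; a hedged sketch with illustrative names:

      	static bool enforce;
      	module_param_cb(enforce, &param_ops_bool_enable_only, &enforce, 0644);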
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Cc: cocci@systeme.lip6.fr
      Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      d19f05d8
    • kernel/params: constify struct kernel_param_ops uses · 9c27847d
      Committed by Luis R. Rodriguez
      Most code already uses consts for the struct kernel_param_ops;
      sweep the kernel for the last offending stragglers. Other than
      include/linux/moduleparam.h and kernel/params.c, all other changes
      were generated with the following Coccinelle SmPL patch. Merge
      conflicts between trees can be handled with Coccinelle.
      
      In the future git could get Coccinelle merge support to deal with
      patch --> fail --> grammar --> Coccinelle --> new patch conflicts
      automatically for us on patches where the grammar is available and
      the patch is of high confidence. Consider this a feature request.
      
      Test compiled on x86_64 against:
      
      	* allnoconfig
      	* allmodconfig
      	* allyesconfig
      
      @ const_found @
      identifier ops;
      @@
      
      const struct kernel_param_ops ops = {
      };
      
      @ const_not_found depends on !const_found @
      identifier ops;
      @@
      
      -struct kernel_param_ops ops = {
      +const struct kernel_param_ops ops = {
      };
      
      Generated-by: Coccinelle SmPL
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Junio C Hamano <gitster@pobox.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: cocci@systeme.lip6.fr
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      9c27847d
    • sysfs: tightened sysfs permission checks · 28b8d0c8
      Committed by Gobinda Charan Maji
      There was some inconsistency in the restrictions of
      VERIFY_OCTAL_PERMISSIONS(). Previously the test was "User perms >=
      group perms >= other perms". The permission field of User, Group or
      Other consists of three bits: the LSB is EXECUTE permission, the MSB is
      READ permission, and the middle bit is WRITE permission. But logically
      WRITE is "more privileged" than READ.

      Say, for example, the permission value is "0430". Here User has only
      READ permission, whereas Group has both WRITE and EXECUTE permission.

      So the checks could be tightened; the tests are separated into
      USER_READABLE >= GROUP_READABLE >= OTHER_READABLE,
      USER_WRITABLE >= GROUP_WRITABLE, and OTHER_WRITABLE is not permitted.
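
      A sketch of what the tightened macro accepts and rejects at build time
      (param name and values illustrative):

      	module_param(debug, int, 0644);	/* ok: reads u>=g>=o, writes u>=g, other not writable */
      	module_param(debug, int, 0430);	/* rejected: group may write but user may not */
      	module_param(debug, int, 0646);	/* rejected: other-writable is never allowed */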
      Signed-off-by: Gobinda Charan Maji <gobinda.cemk07@gmail.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      28b8d0c8
    • module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING · 6c9692e2
      Committed by Peter Zijlstra
      Andrew worried about the overhead on small systems; only use the fancy
      code when either perf or tracing is enabled.
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Requested-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      6c9692e2
    • module: Optimize __module_address() using a latched RB-tree · 93c2e105
      Committed by Peter Zijlstra
      Currently __module_address() is using a linear search through all
      modules in order to find the module corresponding to the provided
      address. With a lot of modules this can take a lot of time.
      
      One of the users of this is kernel_text_address(), which is employed
      by many stack unwinders, which in turn are used by perf-callchain and
      ftrace (possibly from NMI context).

      So by optimizing __module_address() we optimize the many stack unwinders
      used by both perf and tracing in performance-sensitive code.
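
      Roughly, the lookup turns from a list walk into a latched-tree search;
      a sketch of the idea, not the literal patch:

      	struct module *__module_address(unsigned long addr)
      	{
      		if (addr < module_addr_min || addr > module_addr_max)
      			return NULL;
      		/* latched RB-tree lookup: O(log n) and safe without locks */
      		return mod_find(addr);
      	}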
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      93c2e105
    • rbtree: Implement generic latch_tree · ade3f510
      Committed by Peter Zijlstra
      Implement a latched RB-tree in order to get unconditional RCU/lockless
      lookups.
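
      A hedged sketch of the resulting API (object names illustrative):

      	static bool my_less(struct latch_tree_node *a, struct latch_tree_node *b);
      	static int my_comp(void *key, struct latch_tree_node *n);

      	static const struct latch_tree_ops my_ops = {
      		.less = my_less,
      		.comp = my_comp,
      	};
      	static struct latch_tree_root my_root;

      	/* modifications are serialized externally */
      	latch_tree_insert(&obj->lt_node, &my_root, &my_ops);
      	latch_tree_erase(&obj->lt_node, &my_root, &my_ops);

      	/* lookups need no lock, even concurrent with insert/erase */
      	node = latch_tree_find(key, &my_root, &my_ops);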
      
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <David.Woodhouse@intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      ade3f510
    • seqlock: Introduce raw_read_seqcount_latch() · 7fc26327
      Committed by Peter Zijlstra
      Because with latches there is a strict data dependency on the seq load,
      we can avoid the rmb in favour of a read_barrier_depends().
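
      With this, the read side becomes roughly:

      	static inline int raw_read_seqcount_latch(seqcount_t *s)
      	{
      		/* the data dependency on the loaded sequence replaces smp_rmb() */
      		return lockless_dereference(s->sequence);
      	}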
      Suggested-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      7fc26327
    • rcu: Move lockless_dereference() out of rcupdate.h · 0a04b016
      Committed by Peter Zijlstra
      I want to use lockless_dereference() from seqlock.h, which would mean
      including rcupdate.h from it, however rcupdate.h already includes
      seqlock.h.
      
      Avoid this by moving lockless_dereference() into compiler.h. This is
      somewhat tricky, since it uses smp_read_barrier_depends(), which isn't
      available there, but as it's a CPP macro we can get away with it.

      The alternative would be moving it into asm/barrier.h, but that would
      mean updating each arch (which I can do if people feel that is more
      appropriate).
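
      For reference, the macro as it lands in compiler.h is roughly:

      	#define lockless_dereference(p) \
      	({ \
      		typeof(p) _________p1 = READ_ONCE(p); \
      		smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
      		(_________p1); \
      	})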
      
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      0a04b016
    • seqlock: Better document raw_write_seqcount_latch() · 6695b92a
      Committed by Peter Zijlstra
      Improve the documentation of the latch technique as used in the
      current timekeeping code, such that it can be readily employed
      elsewhere.
      
      Borrow from the comments in timekeeping and replace those with a
      reference to this more generic comment.
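
      The documented pattern, condensed into a sketch (data_query() and
      modify() stand in for caller-specific code):

      	struct latch_struct {
      		seqcount_t	seq;
      		struct data	data[2];
      	};

      	/* writer, serialized externally: update the two copies in turn */
      	static void latch_modify(struct latch_struct *latch, ...)
      	{
      		raw_write_seqcount_latch(&latch->seq);
      		modify(latch->data[0], ...);
      		raw_write_seqcount_latch(&latch->seq);
      		modify(latch->data[1], ...);
      	}

      	/* reader: always sees one consistent copy, never blocks the writer */
      	static struct entry *latch_query(struct latch_struct *latch, ...)
      	{
      		struct entry *entry;
      		unsigned seq, idx;

      		do {
      			seq = raw_read_seqcount_latch(&latch->seq);
      			idx = seq & 0x01;
      			entry = data_query(latch->data[idx], ...);
      		} while (read_seqcount_retry(&latch->seq, seq));

      		return entry;
      	}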
      
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <David.Woodhouse@intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: Michel Lespinasse <walken@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      6695b92a
    • rbtree: Make lockless searches non-fatal · d72da4a4
      Committed by Peter Zijlstra
      Change the insert and erase code such that lockless searches are
      non-fatal.
      
      In and of itself an rbtree cannot be correctly searched while
      in-modification, we can however provide weaker guarantees that will
      allow the rbtree to be used in conjunction with other techniques, such
      as latches; see 9b0fd802 ("seqcount: Add raw_write_seqcount_latch()").
      
      For this to work we need the following guarantees from the rbtree
      code:
      
       1) a lockless reader must not see partial stores, which would allow it
          to observe nodes that are invalid memory.

       2) there must not be (temporary) loops in the tree structure in the
          modifier's program order, as this would cause a lookup which
          interrupts the modifier to get stuck indefinitely.
      
      For 1) we must use WRITE_ONCE() for all updates to the tree structure;
      in particular this patch only does rb_{left,right}, as those are the
      only elements required for simple searches.

      This generates slightly worse code, probably because of the volatile
      accesses; but in pointer-chasing-heavy code a few more instructions
      should not matter.
      
      For 2) I have carefully audited the code and drawn every intermediate
      link state and not found a loop.
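
      Concretely, link updates go through WRITE_ONCE(); a representative
      hunk, sketched:

      	/* before */
      	parent->rb_left = node;
      	/* after: a lockless reader sees either the old or the new pointer,
      	 * never a partial store */
      	WRITE_ONCE(parent->rb_left, node);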
      
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <David.Woodhouse@intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Reviewed-by: Michel Lespinasse <walken@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      d72da4a4
    • module: Sanitize RCU usage and locking · 0be964be
      Committed by Peter Zijlstra
      Currently the RCU usage in module is an inconsistent mess of RCU and
      RCU-sched, this is broken for CONFIG_PREEMPT where synchronize_rcu()
      does not imply synchronize_sched().
      
      Most usage sites use preempt_{dis,en}able() which is RCU-sched, but
      (most of) the modification sites use synchronize_rcu(). With the
      exception of the module bug list, which actually uses RCU.
      
      Convert everything over to RCU-sched.
      
      Furthermore, add lockdep asserts to all sites, because it's not at all
      clear to me that the required locking is observed, especially on
      exported functions.
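
      The resulting discipline, sketched:

      	/* reader side: RCU-sched */
      	preempt_disable();
      	mod = __module_address(addr);
      	/* ... use mod ... */
      	preempt_enable();

      	/* writer side, e.g. while unlinking a module */
      	list_del_rcu(&mod->list);
      	synchronize_sched();	/* pairs with preempt_disable(), not synchronize_rcu() */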
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      0be964be
  4. 23 May 2015, 1 commit
    • tcp: fix a potential deadlock in tcp_get_info() · d654976c
      Committed by Eric Dumazet
      Taking the socket spinlock in tcp_get_info() can deadlock, as
      inet_diag_dump_icsk() holds &hashinfo->ehash_locks[i],
      while packet processing can use the reverse locking order.

      We could avoid this locking for TCP_LISTEN states, but lockdep would
      certainly get confused, as all TCP sockets share the same lockdep
      classes.
      
      [  523.722504] ======================================================
      [  523.728706] [ INFO: possible circular locking dependency detected ]
      [  523.734990] 4.1.0-dbg-DEV #1676 Not tainted
      [  523.739202] -------------------------------------------------------
      [  523.745474] ss/18032 is trying to acquire lock:
      [  523.750002]  (slock-AF_INET){+.-...}, at: [<ffffffff81669d44>] tcp_get_info+0x2c4/0x360
      [  523.758129]
      [  523.758129] but task is already holding lock:
      [  523.763968]  (&(&hashinfo->ehash_locks[i])->rlock){+.-...}, at: [<ffffffff816bcb75>] inet_diag_dump_icsk+0x1d5/0x6c0
      [  523.774661]
      [  523.774661] which lock already depends on the new lock.
      [  523.774661]
      [  523.782850]
      [  523.782850] the existing dependency chain (in reverse order) is:
      [  523.790326]
      -> #1 (&(&hashinfo->ehash_locks[i])->rlock){+.-...}:
      [  523.796599]        [<ffffffff811126bb>] lock_acquire+0xbb/0x270
      [  523.802565]        [<ffffffff816f5868>] _raw_spin_lock+0x38/0x50
      [  523.808628]        [<ffffffff81665af8>] __inet_hash_nolisten+0x78/0x110
      [  523.815273]        [<ffffffff816819db>] tcp_v4_syn_recv_sock+0x24b/0x350
      [  523.822067]        [<ffffffff81684d41>] tcp_check_req+0x3c1/0x500
      [  523.828199]        [<ffffffff81682d09>] tcp_v4_do_rcv+0x239/0x3d0
      [  523.834331]        [<ffffffff816842fe>] tcp_v4_rcv+0xa8e/0xc10
      [  523.840202]        [<ffffffff81658fa3>] ip_local_deliver_finish+0x133/0x3e0
      [  523.847214]        [<ffffffff81659a9a>] ip_local_deliver+0xaa/0xc0
      [  523.853440]        [<ffffffff816593b8>] ip_rcv_finish+0x168/0x5c0
      [  523.859624]        [<ffffffff81659db7>] ip_rcv+0x307/0x420
      
      Let's use the u64_stats_sync infrastructure instead. As a bonus, 64-bit
      arches get optimized, as these operations are a nop for them.
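
      In sketch form (reader and writer sides, fields abbreviated):

      	/* writer, in packet processing under the socket lock */
      	u64_stats_update_begin(&tp->syncp);
      	tp->bytes_acked += acked;
      	u64_stats_update_end(&tp->syncp);

      	/* reader in tcp_get_info(): no socket spinlock needed */
      	do {
      		start = u64_stats_fetch_begin_irq(&tp->syncp);
      		info->tcpi_bytes_acked = tp->bytes_acked;
      	} while (u64_stats_fetch_retry_irq(&tp->syncp, start));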
      
      Fixes: 0df48c26 ("tcp: add tcpi_bytes_acked to tcp_info")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d654976c
  5. 20 May 2015, 2 commits
    • Revert "netfilter: bridge: query conntrack about skb dnat" · faecbb45
      Committed by Florian Westphal
      This reverts commit c055d5b0.
      
      There are two issues:

      'dnat_took_place' made me think that this is related to
      -j DNAT/MASQUERADE, but that's only one part of the story. This is
      also relevant for SNAT, when we undo the SNAT translation in the
      reverse/reply direction.

      Furthermore, I originally wanted to do this mainly to avoid
      storing IPv6 addresses once we make DNAT/REDIRECT work
      for IPv6 on bridges. However, I forgot about SNPT/DNPT, which is
      stateless.

      So we can't escape storing the address for IPv6 anyway. Might as
      well do it for IPv4 too.
      Reported-and-tested-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      faecbb45
    • xen/events: don't bind non-percpu VIRQs with percpu chip · 77bb3dfd
      Committed by David Vrabel
      A non-percpu VIRQ (e.g., VIRQ_CONSOLE) may be freed on a different
      VCPU than it is bound to.  This can result in a race between
      handle_percpu_irq() and removing the action in __free_irq() because
      handle_percpu_irq() does not take desc->lock.  The interrupt handler
      sees a NULL action and oopses.
      
      Only use the percpu chip/handler for per-CPU VIRQs (like VIRQ_TIMER).
      
        # cat /proc/interrupts | grep virq
         40:      87246          0  xen-percpu-virq      timer0
         44:          0          0  xen-percpu-virq      debug0
         47:          0      20995  xen-percpu-virq      timer1
         51:          0          0  xen-percpu-virq      debug1
         69:          0          0   xen-dyn-virq      xen-pcpu
         74:          0          0   xen-dyn-virq      mce
         75:         29          0   xen-dyn-virq      hvc_console
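
      The shape of the fix, sketched (chip selection now depends on whether
      the VIRQ is per-CPU):

      	if (percpu)
      		irq_set_chip_and_handler_name(irq, &xen_percpu_chip,
      					      handle_percpu_irq, "virq");
      	else
      		irq_set_chip_and_handler_name(irq, &xen_dynamic_chip,
      					      handle_edge_irq, "virq");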
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Cc: <stable@vger.kernel.org>
      77bb3dfd
  6. 19 May 2015, 1 commit
  7. 17 May 2015, 1 commit
    • rhashtable: Add cap on number of elements in hash table · 07ee0722
      Committed by Herbert Xu
      We currently have no limit on the number of elements in a hash table.
      This is a problem because some users (tipc) set a ceiling on the
      maximum table size, and when that is reached the hash table may
      degenerate. Others may encounter OOM when growing, and if we allow
      insertions when that happens, hash table performance may also
      suffer.

      This patch adds a new parameter, insecure_max_entries, which becomes
      the cap on the table. If unset, it defaults to max_size * 2. If that
      too is zero (because max_size is unset), there is no cap on the
      number of elements in the table. However, the table will grow whenever
      the utilisation hits 100%, and if that growth fails, you will get
      ENOMEM on insertion.
      
      As allowing oversubscription is potentially dangerous, the name
      contains the word insecure.
      
      Note that the cap is not a hard limit. This is done for performance
      reasons, as enforcing a hard limit would result in the use of atomic
      ops that are heavier than the ones we currently use.

      The reasoning is that we're only guarding against gross
      oversubscription of the table, rather than a small breach of the limit.
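
      Usage is one more field in the table parameters (values illustrative):

      	static const struct rhashtable_params my_params = {
      		.key_len		= sizeof(u32),
      		.key_offset		= offsetof(struct my_obj, key),
      		.head_offset		= offsetof(struct my_obj, node),
      		.max_size		= 1024,
      		.insecure_max_entries	= 2048,	/* 0 would mean 2 * max_size */
      	};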
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      07ee0722
  8. 16 May 2015, 1 commit
  9. 15 May 2015, 3 commits
    • rename RTNH_F_EXTERNAL to RTNH_F_OFFLOAD · eea39946
      Committed by Roopa Prabhu
      RTNH_F_EXTERNAL today is printed as "offload" in iproute2 output.
      
      This patch renames the flag to be consistent with what the user sees.
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eea39946
    • uidgid: make uid_valid and gid_valid work with !CONFIG_MULTIUSER · 929aa5b2
      Committed by Josh Triplett
      {u,g}id_valid call {u,g}id_eq, which calls __k{u,g}id_val on both
      arguments and compares.  With !CONFIG_MULTIUSER, __k{u,g}id_val return a
      constant 0, which makes {u,g}id_valid always return false.  Change
      {u,g}id_valid to compare their argument against -1 instead.  That produces
      identical results in the normal CONFIG_MULTIUSER=y case, but with
      !CONFIG_MULTIUSER will make {u,g}id_valid constant-fold into "return
      true;" rather than "return false;".
      
      This fixes uses of devpts without CONFIG_MULTIUSER.
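
      The change amounts to, roughly:

      	static inline bool uid_valid(kuid_t uid)
      	{
      		/* was: !uid_eq(uid, INVALID_UID), which constant-folds to
      		 * false when __kuid_val() always returns 0 */
      		return __kuid_val(uid) != (uid_t) -1;
      	}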
      Signed-off-by: Josh Triplett <josh@joshtriplett.org>
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      929aa5b2
    • gfp: add __GFP_NOACCOUNT · 8f4fc071
      Committed by Vladimir Davydov
      Not all kmem allocations should be accounted to memcg.  The following
      patch gives an example when accounting of a certain type of allocations to
      memcg can effectively result in a memory leak.  This patch adds the
      __GFP_NOACCOUNT flag which if passed to kmalloc and friends will force the
      allocation to go through the root cgroup.  It will be used by the next
      patch.
      
      Note, since with kmemleak enabled each kmalloc implies yet another
      allocation from the kmemleak_object cache, we add __GFP_NOACCOUNT to
      gfp_kmemleak_mask.
      
      Alternatively, we could introduce a per kmem cache flag disabling
      accounting for all allocations of a particular kind, but (a) we would not
      be able to bypass accounting for kmalloc then and (b) a kmem cache with
      this flag set could not be merged with a kmem cache without this flag,
      which would increase the number of global caches and therefore
      fragmentation even if the memory cgroup controller is not used.
      
      Despite its generic name, currently __GFP_NOACCOUNT disables accounting
      only for kmem allocations, while user page allocations are always
      charged. To catch abuse of this flag, a warning is issued on an attempt
      to pass it to mem_cgroup_try_charge().
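
      Callers simply OR the flag in; a minimal sketch:

      	/* charged to the root cgroup, bypassing memcg kmem accounting */
      	obj = kmalloc(size, GFP_KERNEL | __GFP_NOACCOUNT);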
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>	[4.0.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8f4fc071
  10. 13 May 2015, 4 commits
  11. 12 May 2015, 1 commit
    • HID: hid-sensor-hub: Fix debug lock warning · 2d94e522
      Committed by Srinivas Pandruvada
      When CONFIG_DEBUG_LOCK_ALLOC is defined, the mutex magic is compared and
      a warning is issued when (l->magic != l), where l is the address of the
      mutex. In hid-sensor-hub, as part of hsdev creation, a per-hsdev mutex
      is initialized during MFD cell creation. This hsdev, which contains the
      mutex, is part of the platform data for the cell. But platform data is
      copied in platform_device_add_data() in platform.c. This copy copies the
      whole hsdev structure, including the mutex, and once copied the magic
      no longer matches. So when a client driver calls
      sensor_hub_input_attr_get_raw_value(), this triggers the mutex warning.
      To avoid this, allocate the mutex dynamically; the pointer remains
      valid even after the copy.
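
      The shape of the fix, sketched (assuming the hsdev keeps only a
      pointer, here called mutex_ptr, so the copy stays valid):

      	hsdev->mutex_ptr = devm_kzalloc(&pdev->dev, sizeof(struct mutex),
      					GFP_KERNEL);
      	if (!hsdev->mutex_ptr)
      		return -ENOMEM;
      	mutex_init(hsdev->mutex_ptr);
      	/* platform_device_add_data() now copies only the pointer; the
      	 * mutex itself, and its lockdep magic, is never memcpy'd */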
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      2d94e522
  12. 11 May 2015, 1 commit
    • pty: Fix input race when closing · 1a48632f
      Committed by Peter Hurley
      A read() from a pty master may mistakenly indicate EOF (errno == -EIO)
      after the pty slave has closed, even though input data remains to be read.
      For example,
      
             pty slave       |        input worker        |    pty master
                             |                            |
                             |                            |   n_tty_read()
      pty_write()            |                            |     input avail? no
        add data             |                            |     sleep
        schedule worker  --->|                            |     .
                             |---> flush_to_ldisc()       |     .
      pty_close()            |       fill read buffer     |     .
        wait for worker      |       wakeup reader    --->|     .
                             |       read buffer full?    |---> input avail ? yes
                             |<---   yes - exit worker    |     copy 4096 bytes to user
        TTY_OTHER_CLOSED <---|                            |<--- kick worker
                             |                            |
      
      		                **** New read() before worker starts ****
      
                             |                            |   n_tty_read()
                             |                            |     input avail? no
                             |                            |     TTY_OTHER_CLOSED? yes
                             |                            |     return -EIO
      
      Several conditions are required to trigger this race:
      1. the ldisc read buffer must become full so the input worker exits
      2. the read() count parameter must be >= 4096 so the ldisc read buffer
         is empty
      3. the subsequent read() occurs before the kicked worker has processed
         more input
      
      However, the underlying cause of the race is that data is pipelined,
      while tty state is not; i.e., data already written by the pty slave end
      is not yet visible to the pty master end, but state changes by the pty
      slave end are visible to the pty master end immediately.
      
      Pipeline the TTY_OTHER_CLOSED state through input worker to the reader.
      1. Introduce TTY_OTHER_DONE which is set by the input worker when
         TTY_OTHER_CLOSED is set and either the input buffers are flushed or
         input processing has completed. Readers/polls are woken when
         TTY_OTHER_DONE is set.
      2. Reader/poll checks TTY_OTHER_DONE instead of TTY_OTHER_CLOSED.
      3. A new input worker is started from pty_close() after setting
         TTY_OTHER_CLOSED, which ensures the TTY_OTHER_DONE state will be
         set if the last input worker is already finished (or just about to
         exit).
      
      Remove tty_flush_to_ldisc(); no in-tree callers.
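
      On the reader side this boils down to, roughly:

      	if (test_bit(TTY_OTHER_DONE, &tty->flags))	/* was: TTY_OTHER_CLOSED */
      		return -EIO;	/* true EOF: slave closed and input fully drained */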
      
      Fixes: 52bce7f8 ("pty, n_tty: Simplify input processing on final close")
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=96311
      BugLink: http://bugs.launchpad.net/bugs/1429756
      Cc: <stable@vger.kernel.org> # 3.19+
      Reported-by: Andy Whitcroft <apw@canonical.com>
      Reported-by: H.J. Lu <hjl.tools@gmail.com>
      Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1a48632f
  13. 10 May 2015, 1 commit
  14. 09 May 2015, 1 commit
  15. 08 May 2015, 1 commit
  16. 07 May 2015, 1 commit
  17. 06 May 2015, 5 commits
  18. 05 May 2015, 2 commits
    • RDMA/core: Enable the iWarp Port Mapper to provide the actual address of the connecting peer to its clients · 6eec1774
      Committed by Tatyana Nikolova
      
      Add functionality to enable the port mapper on the passive side to
      provide to its clients the actual (non-mapped) ip/tcp address
      information of the connecting peer:

      1) Add remote_info_cb() to process the address info of the connecting
         peer. The address info is provided by the user-space port mapper
         service when the connection is initiated by the peer.
      2) Add a hash list to store the remote address info.
      3) Add functionality to add/remove the remote address info. After the
         info has been provided to the port mapper client, it is removed
         from the hash list.
      Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
      Reviewed-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      6eec1774
    • blk-mq: fix FUA request hang · b2387ddc
      Committed by Shaohua Li
      When a FUA request enters the DATA stage of the flush pipeline, the
      request is added to the mq requeue list and will then be added to
      ctx->rq_list, where blk_mq_attempt_merge() might merge the request with
      a bio. Later, when the request has finished the flush pipeline,
      request->__data_len is 0. Then I only saw the bio get endio called; the
      original request never finishes.

      Adding REQ_FLUSH_SEQ into REQ_NOMERGE_FLAGS looks like an easy fix.
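
      i.e., roughly:

      	#define REQ_NOMERGE_FLAGS \
      		(REQ_NOMERGE | REQ_STARTED | REQ_SOFTBARRIER | REQ_FLUSH | \
      		 REQ_FUA | REQ_FLUSH_SEQ)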
      
      stable: 3.15+
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      b2387ddc