1. 19 May 2020 (3 commits)
  2. 10 May 2020 (1 commit)
  3. 07 May 2020 (2 commits)
  4. 05 May 2020 (5 commits)
  5. 01 May 2020 (2 commits)
    • K
      security: Fix the default value of fs_context_parse_param hook · 54261af4
      Committed by KP Singh
      security_fs_context_parse_param is called by vfs_parse_fs_param and
      a successful return value (i.e., 0) implies that a parameter will be
      consumed by the LSM framework. This stops all further parsing of the
      parameter by VFS. Furthermore, if an LSM hook returns success, the
      remaining LSM hooks are not invoked for the parameter.
      
      The current default behavior of returning success means that all the
      parameters are expected to be parsed by the LSM hook and none of them
      end up being populated by the VFS in fs_context.
      
      This was noticed when lsm=bpf is supplied on the command line before any
      other LSM. As the bpf lsm uses this default value to implement a default
      hook, this resulted in a failure to parse any fs_context parameters and
      a failure to mount the root filesystem.
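
      For illustration, a minimal sketch of the intended per-parameter
      semantics, using a hypothetical hook and option name; returning
      -ENOPARAM means "not consumed", so the remaining LSMs and the VFS
      still parse the parameter, while 0 means the LSM consumed it:

      #include <linux/fs_context.h>
      #include <linux/string.h>

      static int example_lsm_fs_context_parse_param(struct fs_context *fc,
                                                    struct fs_parameter *param)
      {
              /* Not our option: report -ENOPARAM so other LSMs and the VFS
               * keep parsing it.  A blanket default of 0 ("consumed") is
               * what starved fs_context of its parameters.
               */
              if (strcmp(param->key, "example_lsm_opt") != 0)
                      return -ENOPARAM;

              /* ... record the option for this LSM ... */
              return 0;       /* consumed: no further parsing of this parameter */
      }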
      
      Fixes: 98e828a0 ("security: Refactor declaration of LSM hooks")
      Reported-by: Mikko Ylinen <mikko.ylinen@linux.intel.com>
      Signed-off-by: KP Singh <kpsingh@google.com>
      Signed-off-by: James Morris <jmorris@namei.org>
      54261af4
    • P
      mptcp: move option parsing into mptcp_incoming_options() · cfde141e
      Committed by Paolo Abeni
      The mptcp_options_received structure carries several per-packet
      flags (mp_capable, mp_join, etc.). Such fields must be cleared on
      each packet, even on dropped ones or packets not carrying any MPTCP
      options, but the current MPTCP code clears them only on TCP option
      reset.
      
      In several races/corner cases we end up with stray bits in the
      incoming options, leading to WARN_ON splats, e.g.:
      
      [  171.164906] Bad mapping: ssn=32714 map_seq=1 map_data_len=32713
      [  171.165006] WARNING: CPU: 1 PID: 5026 at net/mptcp/subflow.c:533 warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531)
      [  171.167632] Modules linked in: ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel geneve ip6_udp_tunnel udp_tunnel macsec macvtap tap ipvlan macvlan 8021q garp mrp xfrm_interface veth netdevsim nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun binfmt_misc intel_rapl_msr intel_rapl_common rfkill kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev virtio_balloon pcspkr i2c_piix4 sunrpc ip_tables xfs libcrc32c crc32c_intel serio_raw virtio_console ata_generic virtio_blk virtio_net net_failover failover ata_piix libata
      [  171.199464] CPU: 1 PID: 5026 Comm: repro Not tainted 5.7.0-rc1.mptcp_f227fdf5d388+ #95
      [  171.200886] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
      [  171.202546] RIP: 0010:warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531)
      [  171.206537] Code: c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 1d 8b 55 3c 44 89 e6 48 c7 c7 20 51 13 95 e8 37 8b 22 fe <0f> 0b 48 83 c4 08 5b 5d 41 5c c3 89 4c 24 04 e8 db d6 94 fe 8b 4c
      [  171.220473] RSP: 0018:ffffc90000150560 EFLAGS: 00010282
      [  171.221639] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      [  171.223108] RDX: 0000000000000000 RSI: 0000000000000008 RDI: fffff5200002a09e
      [  171.224388] RBP: ffff8880aa6e3c00 R08: 0000000000000001 R09: fffffbfff2ec9955
      [  171.225706] R10: ffffffff9764caa7 R11: fffffbfff2ec9954 R12: 0000000000007fca
      [  171.227211] R13: ffff8881066f4a7f R14: ffff8880aa6e3c00 R15: 0000000000000020
      [  171.228460] FS:  00007f8623719740(0000) GS:ffff88810be00000(0000) knlGS:0000000000000000
      [  171.230065] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  171.231303] CR2: 00007ffdab190a50 CR3: 00000001038ea006 CR4: 0000000000160ee0
      [  171.232586] Call Trace:
      [  171.233109]  <IRQ>
      [  171.233531] get_mapping_status (linux-mptcp/net/mptcp/subflow.c:691)
      [  171.234371] mptcp_subflow_data_available (linux-mptcp/net/mptcp/subflow.c:736 linux-mptcp/net/mptcp/subflow.c:832)
      [  171.238181] subflow_state_change (linux-mptcp/net/mptcp/subflow.c:1085 (discriminator 1))
      [  171.239066] tcp_fin (linux-mptcp/net/ipv4/tcp_input.c:4217)
      [  171.240123] tcp_data_queue (linux-mptcp/./include/linux/compiler.h:199 linux-mptcp/net/ipv4/tcp_input.c:4822)
      [  171.245083] tcp_rcv_established (linux-mptcp/./include/linux/skbuff.h:1785 linux-mptcp/./include/net/tcp.h:1774 linux-mptcp/./include/net/tcp.h:1847 linux-mptcp/net/ipv4/tcp_input.c:5238 linux-mptcp/net/ipv4/tcp_input.c:5730)
      [  171.254089] tcp_v4_rcv (linux-mptcp/./include/linux/spinlock.h:393 linux-mptcp/net/ipv4/tcp_ipv4.c:2009)
      [  171.258969] ip_protocol_deliver_rcu (linux-mptcp/net/ipv4/ip_input.c:204 (discriminator 1))
      [  171.260214] ip_local_deliver_finish (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/ipv4/ip_input.c:232)
      [  171.261389] ip_local_deliver (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:252)
      [  171.265884] ip_rcv (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:539)
      [  171.273666] process_backlog (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/core/dev.c:6135)
      [  171.275328] net_rx_action (linux-mptcp/net/core/dev.c:6572 linux-mptcp/net/core/dev.c:6640)
      [  171.280472] __do_softirq (linux-mptcp/./arch/x86/include/asm/jump_label.h:25 linux-mptcp/./include/linux/jump_label.h:200 linux-mptcp/./include/trace/events/irq.h:142 linux-mptcp/kernel/softirq.c:293)
      [  171.281379] do_softirq_own_stack (linux-mptcp/arch/x86/entry/entry_64.S:1083)
      [  171.282358]  </IRQ>
      
      We could address the issue by explicitly clearing the relevant
      fields in several places - tcp_parse_option, tcp_fast_parse_options,
      possibly others.
      
      Instead, we move the MPTCP option parsing into the already existing
      MPTCP ingress hook, so that the fields need to be cleared in a
      single place.
      
      This allows us to drop an MPTCP hook from the TCP code and to
      remove the quite large mptcp_options_received from the tcp_sock
      struct. On the flip side, MPTCP sockets will traverse the
      option space twice (in tcp_parse_option() and in
      mptcp_incoming_options()). That looks acceptable: we already
      do that for SYN and third-ACK packets, plain TCP sockets will
      benefit from it, and even MPTCP sockets will experience better
      code locality, reducing the jumps between TCP and MPTCP code.
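
      As a rough sketch of the single clearing point (signature and helpers
      are illustrative; the in-tree code may differ):

      void mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
      {
              struct mptcp_options_received mp_opt;

              /* Start every packet from a clean state (even dropped packets
               * or packets without MPTCP options) so no stray mp_capable or
               * mp_join bits survive from a previous segment.
               */
              memset(&mp_opt, 0, sizeof(mp_opt));

              /* ... walk the TCP option space again, fill mp_opt, and act
               * on it for this subflow ...
               */
      }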
      
      v1 -> v2:
       - rebased on current '-net' tree
      
      Fixes: 648ef4b8 ("mptcp: Implement MPTCP receive path")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cfde141e
  6. 30 April 2020 (2 commits)
  7. 29 April 2020 (2 commits)
    • O
      NFSv4.1: fix handling of backchannel binding in BIND_CONN_TO_SESSION · dff58530
      Committed by Olga Kornievskaia
      Currently, if the client sends BIND_CONN_TO_SESSION with
      NFS4_CDFC4_FORE_OR_BOTH but only gets NFS4_CDFS4_FORE back, it ignores
      the fact that it wasn't able to enable a backchannel.
      
      To make sure, the client sends BIND_CONN_TO_SESSION as the first
      operation on the connections (i.e., no other session compounds have
      been sent before), and if the client's request to bind the backchannel
      is not satisfied, it resets the connection and retries.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      dff58530
    • N
      SUNRPC: defer slow parts of rpc_free_client() to a workqueue. · 7c4310ff
      Committed by NeilBrown
      The rpciod workqueue is on the write-out path for freeing dirty memory,
      so it is important that it never block waiting for memory to be
      allocated - this can lead to a deadlock.
      
      rpc_execute() - which is often called by an rpciod work item - calls
      rpc_task_release_client(), which can lead to rpc_free_client().
      
      rpc_free_client() makes two calls which could potentially block
      waiting for memory allocation.
      
      rpc_clnt_debugfs_unregister() calls into debugfs and will block while
      any of the debugfs files are being accessed.  In particular it can block
      while any of the 'open' methods are being called and all of these use
      malloc for one thing or another.  So this can deadlock if the memory
      allocation waits for NFS to complete some writes via rpciod.
      
      rpc_clnt_remove_pipedir() can take the inode_lock() and, while it isn't
      obvious that memory allocations can happen while the lock is held, it is
      safer to assume they might and to not let rpciod call
      rpc_clnt_remove_pipedir().
      
      So this patch moves these two calls (together with the final kfree() and
      rpciod_down()) into a work-item to be run from the system work-queue.
      rpciod can continue its important work, and the final stages of the free
      can happen whenever they happen.
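
      A minimal sketch of that deferral pattern; the cl_work field name and
      the exact split between fast and slow teardown are illustrative:

      static void rpc_free_client_work(struct work_struct *work)
      {
              struct rpc_clnt *clnt = container_of(work, struct rpc_clnt, cl_work);

              /* These may block on memory allocation or on debugfs access,
               * so they must not run on rpciod.
               */
              rpc_clnt_debugfs_unregister(clnt);
              rpc_clnt_remove_pipedir(clnt);

              kfree(clnt);
              rpciod_down();
      }

      static void rpc_free_client(struct rpc_clnt *clnt)
      {
              /* ... fast, non-blocking teardown stays with the caller ... */
              INIT_WORK(&clnt->cl_work, rpc_free_client_work);
              schedule_work(&clnt->cl_work);  /* runs from the system workqueue */
      }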
      
      I have seen this deadlock on a 4.12 based kernel where debugfs used
      synchronize_srcu() when removing objects.  synchronize_srcu() requires a
      workqueue and there were no free worker threads and none could be
      allocated.  While debugfs no longer uses SRCU, I believe the deadlock
      is still possible.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      7c4310ff
  8. 28 April 2020 (15 commits)
    • U
      amba: Initialize dma_parms for amba devices · f4584884
      Committed by Ulf Hansson
      It's currently the amba driver's responsibility to initialize the pointer,
      dma_parms, for its corresponding struct device. The benefit of this
      approach is that it avoids the initialization and doesn't waste memory
      for the struct device_dma_parameters, as this can be decided on a
      case-by-case basis.
      
      However, it has turned out that this approach is not very practical. Not
      only does it lead to open coding, but also to real errors. In principle,
      callers of dma_set_max_seg_size() don't check the error code, but just
      assume it succeeds.
      
      For these reasons, let's do the initialization from the common amba bus at
      the device registration point. This also follows the way the PCI devices
      are being managed, see pci_device_add().
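
      The registration-time initialization is conceptually a one-liner; a
      sketch, assuming struct amba_device carries an embedded struct
      device_dma_parameters named dma_parms (the helper name is illustrative):

      static void amba_setup_dma_parms(struct amba_device *dev)
      {
              /* Point the generic struct device at storage owned by the
               * amba_device, so dma_set_max_seg_size() can never fail for
               * lack of dma_parms and drivers need not open-code this.
               */
              dev->dev.dma_parms = &dev->dma_parms;
      }
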
      Suggested-by: Christoph Hellwig <hch@lst.de>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: <stable@vger.kernel.org>
      Tested-by: Haibo Chen <haibo.chen@nxp.com>
      Reviewed-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20200422101013.31267-1-ulf.hansson@linaro.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f4584884
    • U
      driver core: platform: Initialize dma_parms for platform devices · 9495b7e9
      Committed by Ulf Hansson
      It's currently the platform driver's responsibility to initialize the
      pointer, dma_parms, for its corresponding struct device. The benefit of
      this approach is that it avoids the initialization and doesn't waste
      memory for the struct device_dma_parameters, as this can be decided on a
      case-by-case basis.
      
      However, it has turned out that this approach is not very practical.  Not
      only does it lead to open coding, but also to real errors. In principle,
      callers of dma_set_max_seg_size() don't check the error code, but just
      assume it succeeds.
      
      For these reasons, let's do the initialization from the common platform bus
      at the device registration point. This also follows the way the PCI devices
      are being managed, see pci_device_add().
      Suggested-by: Christoph Hellwig <hch@lst.de>
      Cc: <stable@vger.kernel.org>
      Tested-by: Haibo Chen <haibo.chen@nxp.com>
      Reviewed-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20200422100954.31211-1-ulf.hansson@linaro.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9495b7e9
    • R
      locktorture.c: Fix if-statement empty body warnings · be44ae62
      Committed by Randy Dunlap
      When using -Wextra, gcc complains about torture_preempt_schedule()
      when its definition is empty (i.e., when CONFIG_PREEMPTION is not
      set/enabled).  Fix these warnings by adding an empty do-while block
      for that macro when CONFIG_PREEMPTION is not set.
      Fixes these build warnings:
      
      ../kernel/locking/locktorture.c:119:29: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
      ../kernel/locking/locktorture.c:166:29: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
      ../kernel/locking/locktorture.c:337:29: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
      ../kernel/locking/locktorture.c:490:29: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
      ../kernel/locking/locktorture.c:528:29: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
      ../kernel/locking/locktorture.c:553:29: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
      
      I have verified that there is no object code change (with gcc 7.5.0).
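
      For reference, the warning-free shape of the stub looks roughly like
      this (a sketch; the CONFIG_PREEMPTION branch is assumed to expand to
      preempt_schedule()):

      #ifdef CONFIG_PREEMPTION
      #define torture_preempt_schedule()      preempt_schedule()
      #else
      /* do { } while (0) gives the empty macro a statement body, so callers
       * such as
       *      if (cond)
       *              torture_preempt_schedule();
       * no longer trigger -Wempty-body, and no object code changes.
       */
      #define torture_preempt_schedule()      do { } while (0)
      #endif
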
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org>
      be44ae62
    • P
      rcu-tasks: Add Kconfig option to mediate smp_mb() vs. IPI · 9ae58d7b
      Committed by Paul E. McKenney
      This commit provides a new TASKS_TRACE_RCU_READ_MB Kconfig option that
      enables use of read-side memory barriers by both rcu_read_lock_trace()
      and rcu_read_unlock_trace() when they are executed with the
      current->trc_reader_special.b.need_mb flag set.  This flag is currently
      never set.  Doing that is the subject of a later commit.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      9ae58d7b
    • P
      rcu-tasks: Split ->trc_reader_need_end · 276c4104
      Committed by Paul E. McKenney
      This commit splits ->trc_reader_need_end by using the rcu_special union.
      This change permits readers to check to see if a memory barrier is
      required without any added overhead in the common case where no such
      barrier is required.  This commit also adds the read-side checking.
      Later commits will add the machinery to properly set the new
      ->trc_reader_special.b.need_mb field.
      
      This commit also makes rcu_read_unlock_trace_special() tolerate nested
      read-side critical sections within interrupt and NMI handlers.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      276c4104
    • P
      rcu-tasks: Make RCU Tasks Trace make use of RCU scheduler hooks · 43766c3e
      Committed by Paul E. McKenney
      This commit makes the calls to rcu_tasks_qs() detect and report
      quiescent states for RCU tasks trace.  If the task is in a quiescent
      state and if ->trc_reader_checked is not yet set, the task sets its own
      ->trc_reader_checked.  This will cause the grace-period kthread to
      remove it from the holdout list if it still remains there.
      
      [ paulmck: Fix conditional compilation per kbuild test robot feedback. ]
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      43766c3e
    • P
      rcu-tasks: Add an RCU Tasks Trace to simplify protection of tracing hooks · d5f177d3
      Committed by Paul E. McKenney
      Because RCU does not watch exception early-entry/late-exit, idle-loop,
      or CPU-hotplug execution, protection of tracing and BPF operations is
      needlessly complicated.  This commit therefore adds a variant of
      Tasks RCU that:
      
      o	Has explicit read-side markers to allow finite grace periods in
      	the face of in-kernel loops for PREEMPT=n builds.  These markers
      	are rcu_read_lock_trace() and rcu_read_unlock_trace().
      
      o	Protects code in the idle loop, exception entry/exit, and
      	CPU-hotplug code paths.  In this respect, RCU-tasks trace is
      	similar to SRCU, but with lighter-weight readers.
      
      o	Avoids expensive read-side instructions, having overhead similar
      	to that of Preemptible RCU.
      
      There are of course downsides:
      
      o	The grace-period code can send IPIs to CPUs, even when those
      	CPUs are in the idle loop or in nohz_full userspace.  This is
      	mitigated by later commits.
      
      o	It is necessary to scan the full tasklist, much as for Tasks RCU.
      
      o	There is a single callback queue guarded by a single lock,
      	again, much as for Tasks RCU.  However, those early use cases
      	that request multiple grace periods in quick succession are
      	expected to do so from a single task, which makes the single
      	lock almost irrelevant.  If needed, multiple callback queues
      	can be provided using any number of schemes.
      
      Perhaps most important, this variant of RCU does not affect the vanilla
      flavors, rcu_preempt and rcu_sched.  The fact that RCU Tasks Trace
      readers can operate from idle, offline, and exception entry/exit in no
      way enables rcu_preempt and rcu_sched readers to do so.
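
      A minimal read-side usage sketch; the protected structure and pointer
      are illustrative:

      struct foo { int a; };
      struct foo *gp;         /* published with rcu_assign_pointer() elsewhere */

      static void reader(void)
      {
              struct foo *p;

              rcu_read_lock_trace();          /* explicit read-side marker */
              p = READ_ONCE(gp);
              /* ... use *p; the reader may run from idle, exception
               * entry/exit, or around CPU-hotplug paths ...
               */
              rcu_read_unlock_trace();
      }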
      
      The memory ordering was outlined here:
      https://lore.kernel.org/lkml/20200319034030.GX3199@paulmck-ThinkPad-P72/
      
      This effort benefited greatly from off-list discussions of BPF
      requirements with Alexei Starovoitov and Andrii Nakryiko.  At least
      some of the on-list discussions are captured in the Link: tags below.
      In addition, KCSAN was quite helpful in finding some early bugs.
      
      Link: https://lore.kernel.org/lkml/20200219150744.428764577@infradead.org/
      Link: https://lore.kernel.org/lkml/87mu8p797b.fsf@nanos.tec.linutronix.de/
      Link: https://lore.kernel.org/lkml/20200225221305.605144982@linutronix.de/
      Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
      Cc: Andrii Nakryiko <andriin@fb.com>
      [ paulmck: Apply feedback from Steve Rostedt and Joel Fernandes. ]
      [ paulmck: Decrement trc_n_readers_need_end upon IPI failure. ]
      [ paulmck: Fix locking issue reported by rcutorture. ]
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      d5f177d3
    • P
      rcu-tasks: Add an RCU-tasks rude variant · c84aad76
      Committed by Paul E. McKenney
      This commit adds a "rude" variant of RCU-tasks that has as quiescent
      states schedule(), cond_resched_tasks_rcu_qs(), userspace execution,
      and (in theory, anyway) cond_resched().  In other words, RCU-tasks rude
      readers are regions of code with preemption disabled, but excluding code
      early in the CPU-online sequence and late in the CPU-offline sequence.
      Updates make use of IPIs and force an IPI and a context switch on each
      online CPU.  This variant is useful in some situations in tracing.
      Suggested-by: Steven Rostedt <rostedt@goodmis.org>
      [ paulmck: Apply EXPORT_SYMBOL_GPL() feedback from Qiujun Huang. ]
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      [ paulmck: Apply review feedback from Steve Rostedt. ]
      c84aad76
    • P
      rcu-tasks: Refactor RCU-tasks to allow variants to be added · 5873b8a9
      Committed by Paul E. McKenney
      This commit splits out generic processing from RCU-tasks-specific
      processing in order to allow additional flavors to be added.  It also
      adds a def_bool TASKS_RCU_GENERIC to enable the common RCU-tasks
      infrastructure code.
      
      This is primarily, but not entirely, a code-movement commit.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      5873b8a9
    • P
      rcu: Reinstate synchronize_rcu_mult() · b3d73156
      Committed by Paul E. McKenney
      With the advent and likely usage of synchronize_rcu_rude(), there is
      again a need to wait on multiple types of RCU grace periods, for
      example, call_rcu_tasks() and call_rcu_tasks_rude().  This commit
      therefore reinstates synchronize_rcu_mult() in order to allow these
      grace periods to be straightforwardly waited on concurrently.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      b3d73156
    • P
      sched/core: Add function to sample state of locked-down task · 2beaf328
      Committed by Paul E. McKenney
      A running task's state can be sampled in a consistent manner (for example,
      for diagnostic purposes) simply by invoking smp_call_function_single()
      on its CPU, which may be obtained using task_cpu(), then having the
      IPI handler verify that the desired task is in fact still running.
      However, if the task is not running, this sampling can in theory be done
      immediately and directly.  In practice, the task might start running at
      any time, including during the sampling period.  Gaining a consistent
      sample of a not-running task therefore requires that something be done
      to lock down the target task's state.
      
      This commit therefore adds a try_invoke_on_locked_down_task() function
      that invokes a specified function if the specified task can be locked
      down, returning true if successful and if the specified function returns
      true. Otherwise this function simply returns false.  Given that the
      function passed to try_invoke_on_locked_down_task() might be invoked with
      a runqueue lock held, that function had better be quite lightweight.
      
      The function is passed the target task's task_struct pointer and the
      argument passed to try_invoke_on_locked_down_task(), allowing easy access
      to task state and to a location for further variables to be passed in
      and out.
      
      Note that the specified function will be called even if the specified
      task is currently running.  The function can use ->on_rq and task_curr()
      to quickly and easily determine the task's state, and can return false
      if this state is not to the function's liking.  The caller of the
      try_invoke_on_locked_down_task() would then see the false return value,
      and could take appropriate action, for example, trying again later or
      sending an IPI if matters are more urgent.
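
      A hedged usage sketch; the callback, its argument structure, and the
      retry policy are all illustrative:

      struct sample_state {
              bool sampled;
      };

      /* May run with a runqueue lock held, so keep it lightweight. */
      static bool sample_if_not_running(struct task_struct *t, void *arg)
      {
              struct sample_state *s = arg;

              if (t->on_rq || task_curr(t))
                      return false;   /* running: decline, let the caller retry or IPI */

              /* The task is locked down: its state can be sampled consistently. */
              s->sampled = true;
              return true;
      }

      /* Caller side:
       *      if (!try_invoke_on_locked_down_task(t, sample_if_not_running, &s))
       *              ... try again later, or send an IPI if matters are urgent ...
       */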
      
      It is expected that use cases such as the RCU CPU stall warning code will
      simply return false if the task is currently running.  However, there are
      use cases involving nohz_full CPUs where the specified function might
      instead fall back to an alternative sampling scheme that relies on heavier
      synchronization (such as memory barriers) in the target task.
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      [ paulmck: Apply feedback from Peter Zijlstra and Steven Rostedt. ]
      [ paulmck: Invoke if running to handle feedback from Mathieu Desnoyers. ]
      Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      2beaf328
    • L
      rcu: Remove unused ->rcu_read_unlock_special.b.deferred_qs field · f0bdf6d4
      Committed by Lai Jiangshan
      The ->rcu_read_unlock_special.b.deferred_qs field is set to true in
      rcu_read_unlock_special() but never set to false.  This is not
      particularly useful, so this commit removes this field.
      
      The only possible justification for this field is to ease debugging
      of RCU deferred quiescent states, but the combination of the other
      ->rcu_read_unlock_special fields plus ->rcu_blocked_node and of course
      ->rcu_read_lock_nesting should cover debugging needs.  And if this last
      proves incorrect, this patch can always be reverted, along with the
      required setting of ->rcu_read_unlock_special.b.deferred_qs to false
      in rcu_preempt_deferred_qs_irqrestore().
      Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      f0bdf6d4
    • P
      rcu: Add rcu_gp_might_be_stalled() · 6be7436d
      Committed by Paul E. McKenney
      This commit adds rcu_gp_might_be_stalled(), which returns true if there
      is some reason to believe that the RCU grace period is stalled.  The use
      case is where an RCU free-memory path needs to allocate memory in order
      to free it, a situation that should be avoided where possible.
      
      But where it is necessary, there is always the alternative of using
      synchronize_rcu() to wait for a grace period in order to avoid the
      allocation.  And if the grace period is stalled, allocating memory to
      asynchronously wait for it is a bad idea of epic proportions: Far better
      to let others use the memory, because these others might actually be
      able to free that memory before the grace period ends.
      
      Thus, rcu_gp_might_be_stalled() can be used to help decide whether
      allocating memory on an RCU free path is a semi-reasonable course
      of action.
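
      A sketch of that decision on a hypothetical free path; the bookkeeping
      node, callback, and helper names are illustrative:

      struct deferred_free {
              struct rcu_head rh;
              void *obj;
      };

      static void deferred_free_cb(struct rcu_head *rh)
      {
              struct deferred_free *d = container_of(rh, struct deferred_free, rh);

              kfree(d->obj);
              kfree(d);
      }

      /* Must be called from a context that may sleep. */
      static void free_after_grace_period(void *obj)
      {
              struct deferred_free *d = NULL;

              /* Only allocate for the asynchronous path if the grace period
               * does not appear to be stalled.
               */
              if (!rcu_gp_might_be_stalled())
                      d = kmalloc(sizeof(*d), GFP_NOWAIT | __GFP_NOWARN);

              if (d) {
                      d->obj = obj;
                      call_rcu(&d->rh, deferred_free_cb);
              } else {
                      /* Stalled (or allocation failed): waiting beats tying
                       * up more memory behind the stalled grace period.
                       */
                      synchronize_rcu();
                      kfree(obj);
              }
      }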
      
      Cc: Joel Fernandes <joel@joelfernandes.org>
      Cc: Uladzislau Rezki <urezki@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      6be7436d
    • J
      Revert "rculist: Describe variadic macro argument in a Sphinx-compatible way" · ddc46593
      Committed by Jonathan Neuschäfer
      This reverts commit f452ee09.
      
      The workaround became unnecessary with commit 43756e34
      ("scripts/kernel-doc: Add support for named variable macro arguments").
      Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      ddc46593
    • S
      vsock/virtio: fix multiple packet delivery to monitoring devices · a78d1639
      Committed by Stefano Garzarella
      In virtio_transport.c, if the virtqueue is full, the transmitting
      packet is queued up and it will be sent in the next iteration.
      This causes the same packet to be delivered multiple times to
      monitoring devices.
      
      We want to continue to deliver packets to monitoring devices before
      they are put in the virtqueue, to avoid replies appearing in the
      packet capture before the transmitted packet.
      
      This patch fixes the issue, adding a new flag (tap_delivered) in
      struct virtio_vsock_pkt, to check if the packet is already delivered
      to monitoring devices.
      
      In vhost/vsock.c, we are splitting packets, so we must set
      'tap_delivered' to false when we queue up the same virtio_vsock_pkt
      to handle the remaining bytes.
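
      A sketch of how the flag might gate delivery; the wrapper name is
      illustrative, and the check could equally live inside
      virtio_transport_deliver_tap_pkt() itself:

      static void deliver_tap_pkt_once(struct virtio_vsock_pkt *pkt)
      {
              if (pkt->tap_delivered)
                      return;         /* monitoring devices already saw this pkt */

              virtio_transport_deliver_tap_pkt(pkt);
              pkt->tap_delivered = true;
      }

      /* vhost/vsock.c: when the same pkt is requeued for its remaining
       * bytes, clear the flag so the continuation is captured as well:
       *
       *      pkt->tap_delivered = false;
       */
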
      Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a78d1639
  9. 27 April 2020 (1 commit)
  10. 23 April 2020 (2 commits)
    • N
      tracing: Remove DECLARE_TRACE_NOARGS · a2806ef7
      Committed by Nikolay Borisov
      This macro was intentionally broken so that the kernel code is not
      polluted with such noargs macros used simply as markers. This use case
      can be satisfied by using dummy non-inline functions. Just remove it.
      
      Link: http://lkml.kernel.org/r/20200413153246.8511-1-nborisov@suse.com
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      a2806ef7
    • M
      arch: split MODULE_ARCH_VERMAGIC definitions out to <asm/vermagic.h> · 62d0fd59
      Committed by Masahiro Yamada
      As the bug report [1] pointed out, <linux/vermagic.h> must be included
      after <linux/module.h>.
      
      I believe we should not impose any include order restriction. We often
      sort include directives alphabetically, but it is just coding style
      convention. Technically, we can include header files in any order by
      making every header self-contained.
      
      Currently, arch-specific MODULE_ARCH_VERMAGIC is defined in
      <asm/module.h>, which is not included from <linux/vermagic.h>.
      
      Hence, the straight-forward fix-up would be as follows:
      
      |--- a/include/linux/vermagic.h
      |+++ b/include/linux/vermagic.h
      |@@ -1,5 +1,6 @@
      | /* SPDX-License-Identifier: GPL-2.0 */
      | #include <generated/utsrelease.h>
      |+#include <linux/module.h>
      |
      | /* Simply sanity version stamp for modules. */
      | #ifdef CONFIG_SMP
      
      This works enough, but for further cleanups, I split MODULE_ARCH_VERMAGIC
      definitions into <asm/vermagic.h>.
      
      With this, <linux/module.h> and <linux/vermagic.h> will be orthogonal,
      and the location of MODULE_ARCH_VERMAGIC definitions will be consistent.
      
      For arc and ia64, MODULE_PROC_FAMILY is only used for defining
      MODULE_ARCH_VERMAGIC. I squashed it.
      
      For hexagon, nds32, and xtensa, I removed <asm/module.h> entirely
      because they contained nothing but the MODULE_ARCH_VERMAGIC definition.
      Kbuild will automatically generate <asm/module.h> at build-time,
      wrapping <asm-generic/module.h>.
      
      [1] https://lore.kernel.org/lkml/20200411155623.GA22175@zn.tnic
      Reported-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Acked-by: Jessica Yu <jeyu@kernel.org>
      62d0fd59
  11. 22 April 2020 (3 commits)
    • J
      pnp: Use list_for_each_entry() instead of open coding · 01b2bafe
      Committed by Jason Gunthorpe
      Aside from good practice, this avoids a warning from gcc 10:
      
      ./include/linux/kernel.h:997:3: warning: array subscript -31 is outside array bounds of ‘struct list_head[1]’ [-Warray-bounds]
        997 |  ((type *)(__mptr - offsetof(type, member))); })
            |  ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      ./include/linux/list.h:493:2: note: in expansion of macro ‘container_of’
        493 |  container_of(ptr, type, member)
            |  ^~~~~~~~~~~~
      ./include/linux/pnp.h:275:30: note: in expansion of macro ‘list_entry’
        275 | #define global_to_pnp_dev(n) list_entry(n, struct pnp_dev, global_list)
            |                              ^~~~~~~~~~
      ./include/linux/pnp.h:281:11: note: in expansion of macro ‘global_to_pnp_dev’
        281 |  (dev) != global_to_pnp_dev(&pnp_global); \
            |           ^~~~~~~~~~~~~~~~~
      arch/x86/kernel/rtc.c:189:2: note: in expansion of macro ‘pnp_for_each_dev’
        189 |  pnp_for_each_dev(dev) {
      
      This is because the common code doesn't cast the starting list_head to
      the containing struct.
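
      With the generic iterator, the macro can be expressed roughly as
      follows; the termination test then compares &dev->global_list against
      the list head instead of running container_of() on &pnp_global itself:

      #include <linux/list.h>

      /* list_for_each_entry() never applies container_of() to the list head
       * when testing for termination, so gcc 10 has nothing to warn about.
       */
      #define pnp_for_each_dev(dev) \
              list_for_each_entry(dev, &pnp_global, global_list)

      Callers such as the pnp_for_each_dev(dev) loop in arch/x86/kernel/rtc.c
      are unchanged.
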
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      [ rjw: Whitespace adjustments ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      01b2bafe
    • V
      net: stmmac: Enable SERDES power up/down sequence · b9663b7c
      Committed by Voon Weifeng
      This patch enables the Intel SERDES power up/down sequence. The SERDES
      converts 8/10-bit data to the SGMII signal. Below is an example of the
      HW configuration for SGMII mode. The SERDES is located in the PHY IF
      block in the diagram below.
      
      <-----------------GBE Controller---------->|<--External PHY chip-->
      +----------+         +----+            +---+           +----------+
      |   EQoS   | <-GMII->| DW | < ------ > |PHY| <-SGMII-> | External |
      |   MAC    |         |xPCS|            |IF |           | PHY      |
      +----------+         +----+            +---+           +----------+
             ^               ^                 ^                ^
             |               |                 |                |
             +---------------------MDIO-------------------------+
      
      PHY IF configuration and status registers are accessible through
      mdio address 0x15, which is defined as mdio_adhoc_addr. During D0,
      the driver needs to power up the PHY IF by changing its power state
      to P0. Likewise, for D3, the driver sets the PHY IF power state to P3.
      Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
      Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b9663b7c
    • J
      vmalloc: fix remap_vmalloc_range() bounds checks · bdebd6a2
      Committed by Jann Horn
      remap_vmalloc_range() has had various issues with the bounds checks it
      promises to perform ("This function checks that addr is a valid
      vmalloc'ed area, and that it is big enough to cover the vma") over time,
      e.g.:
      
       - not detecting pgoff<<PAGE_SHIFT overflow
      
       - not detecting (pgoff<<PAGE_SHIFT)+usize overflow
      
       - not checking whether addr and addr+(pgoff<<PAGE_SHIFT) are the same
         vmalloc allocation
      
       - comparing a potentially wildly out-of-bounds pointer with the end of
         the vmalloc region
      
      In particular, since commit fc970227 ("bpf: Add mmap() support for
      BPF_MAP_TYPE_ARRAY"), unprivileged users can cause kernel null pointer
      dereferences by calling mmap() on a BPF map with a size that is bigger
      than the distance from the start of the BPF map to the end of the
      address space.
      
      This could theoretically be used as a kernel ASLR bypass, by using
      whether mmap() with a given offset oopses or returns an error code to
      perform a binary search over the possible address range.
      
      To allow remap_vmalloc_range_partial() to verify that addr and
      addr+(pgoff<<PAGE_SHIFT) are in the same vmalloc region, pass the offset
      to remap_vmalloc_range_partial() instead of adding it to the pointer in
      remap_vmalloc_range().
      
      In remap_vmalloc_range_partial(), fix the check against
      get_vm_area_size() by using size comparisons instead of pointer
      comparisons, and add checks for pgoff.
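
      A sketch of the kind of checks described above; the helper name and
      reduced argument list are illustrative (the real code operates on the
      vma and the vm_struct directly):

      #include <linux/overflow.h>
      #include <linux/vmalloc.h>

      static int check_vmalloc_remap(const void *kaddr, unsigned long pgoff,
                                     unsigned long usize)
      {
              unsigned long off, end;
              struct vm_struct *area = find_vm_area(kaddr);

              if (!area)
                      return -EINVAL;         /* not a vmalloc'ed area */
              if (check_shl_overflow(pgoff, PAGE_SHIFT, &off))
                      return -EINVAL;         /* pgoff << PAGE_SHIFT overflows */
              if (check_add_overflow(off, usize, &end))
                      return -EINVAL;         /* off + usize overflows */
              /* Compare sizes, never potentially out-of-bounds pointers. */
              if (end > get_vm_area_size(area))
                      return -EINVAL;
              return 0;
      }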
      
      Fixes: 83342314 ("[PATCH] mm: introduce remap_vmalloc_range()")
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: stable@vger.kernel.org
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: Andrii Nakryiko <andriin@fb.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: KP Singh <kpsingh@chromium.org>
      Link: http://lkml.kernel.org/r/20200415222312.236431-1-jannh@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bdebd6a2
  12. 20 April 2020 (1 commit)
  13. 19 April 2020 (1 commit)
    • G
      xattr.h: Replace zero-length array with flexible-array member · 43951585
      Committed by Gustavo A. R. Silva
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kinds of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, notice that dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
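
      For instance, allocating the struct above typically goes through the
      kernel's struct_size() helper and is unaffected by the conversion (a
      sketch; count is the desired number of trailing elements and struct boo
      is assumed to be a complete type):

      struct foo *p = kmalloc(struct_size(p, array, count), GFP_KERNEL);

      if (p) {
              p->stuff = 0;
              /* 'count' trailing struct boo elements follow 'stuff'. */
      }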
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      43951585