1. 16 9月, 2019 6 次提交
    • B
      gpio: don't WARN() on NULL descs if gpiolib is disabled · c9c90711
      Bartosz Golaszewski 提交于
      [ Upstream commit ffe0bbabb0cffceceae07484fde1ec2a63b1537c ]
      
      If gpiolib is disabled, we use the inline stubs from gpio/consumer.h
      instead of regular definitions of GPIO API. The stubs for 'optional'
      variants of gpiod_get routines return NULL in this case as if the
      relevant GPIO wasn't found. This is correct so far.
      
      Calling other (non-gpio_get) stubs from this header triggers a warning
      because the GPIO descriptor couldn't have been requested. The warning
      however is unconditional (WARN_ON(1)) and is emitted even if the passed
      descriptor pointer is NULL.
      
      We don't want to force the users of 'optional' gpio_get to check the
      returned pointer before calling e.g. gpiod_set_value() so let's only
      WARN on non-NULL descriptors.
      
      Cc: stable@vger.kernel.org
      Reported-by: NClaus H. Stovgaard <cst@phaseone.com>
      Signed-off-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      c9c90711
    • Y
      dm mpath: fix missing call of path selector type->end_io · 69409854
      Yufen Yu 提交于
      [ Upstream commit 5de719e3d01b4abe0de0d7b857148a880ff2a90b ]
      
      After commit 396eaf21 ("blk-mq: improve DM's blk-mq IO merging via
      blk_insert_cloned_request feedback"), map_request() will requeue the tio
      when issued clone request return BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE.
      
      Thus, if device driver status is error, a tio may be requeued multiple
      times until the return value is not DM_MAPIO_REQUEUE.  That means
      type->start_io may be called multiple times, while type->end_io is only
      called when IO complete.
      
      In fact, even without commit 396eaf21, setup_clone() failure can
      also cause tio requeue and associated missed call to type->end_io.
      
      The service-time path selector selects path based on in_flight_size,
      which is increased by st_start_io() and decreased by st_end_io().
      Missed calls to st_end_io() can lead to in_flight_size count error and
      will cause the selector to make the wrong choice.  In addition,
      queue-length path selector will also be affected.
      
      To fix the problem, call type->end_io in ->release_clone_rq before tio
      requeue.  map_info is passed to ->release_clone_rq() for map_request()
      error path that result in requeue.
      
      Fixes: 396eaf21 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
      Cc: stable@vger.kernl.org
      Signed-off-by: NYufen Yu <yuyufen@huawei.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      69409854
    • V
      drm/vblank: Allow dynamic per-crtc max_vblank_count · 2b4f5679
      Ville Syrjälä 提交于
      [ Upstream commit ed20151a7699bb2c77eba3610199789a126940c4 ]
      
      On i965gm we need to adjust max_vblank_count dynamically
      depending on whether the TV encoder is used or not. To
      that end add a per-crtc max_vblank_count that takes
      precedence over its device wide counterpart. The driver
      can now call drm_crtc_set_max_vblank_count() to configure
      the per-crtc value before calling drm_vblank_on().
      
      Also looks like there was some discussion about exynos needing
      similar treatment.
      
      v2: Drop the extra max_vblank_count!=0 check for the
          WARN(last!=current), will take care of it in i915 code (Daniel)
          WARN_ON(!inmodeset) (Daniel)
          WARN_ON(dev->max_vblank_count)
          Pimp up the docs (Daniel)
      
      Cc: stable@vger.kernel.org
      Cc: Inki Dae <inki.dae@samsung.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20181127182004.28885-1-ville.syrjala@linux.intel.comReviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      2b4f5679
    • D
      keys: Fix the use of the C++ keyword "private" in uapi/linux/keyctl.h · 3f3beae2
      David Howells 提交于
      [ Upstream commit 2ecefa0a15fd0ef88b9cd5d15ceb813008136431 ]
      
      The keyctl_dh_params struct in uapi/linux/keyctl.h contains the symbol
      "private" which means that the header file will cause compilation failure
      if #included in to a C++ program.  Further, the patch that added the same
      struct to the keyutils package named the symbol "priv", not "private".
      
      The previous attempt to fix this (commit 8a2336e5) did so by simply
      renaming the kernel's copy of the field to dh_private, but this then breaks
      existing userspace and as such has been reverted (commit 8c0f9f5b).
      
      [And note, to those who think that wrapping the struct in extern "C" {}
       will work: it won't; that only changes how symbol names are presented to
       the assembler and linker.].
      
      Instead, insert an anonymous union around the "private" member and add a
      second member in there with the name "priv" to match the one in the
      keyutils package.  The "private" member is then wrapped in !__cplusplus
      cpp-conditionals to hide it from C++.
      
      Fixes: ddbb4114 ("KEYS: Add KEYCTL_DH_COMPUTE command")
      Fixes: 8a2336e5 ("uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      cc: Randy Dunlap <rdunlap@infradead.org>
      cc: Lubomir Rintel <lkundrak@v3.sk>
      cc: James Morris <jmorris@namei.org>
      cc: Mat Martineau <mathew.j.martineau@linux.intel.com>
      cc: Stephan Mueller <smueller@chronox.de>
      cc: Andrew Morton <akpm@linux-foundation.org>
      cc: Linus Torvalds <torvalds@linux-foundation.org>
      cc: stable@vger.kernel.org
      Signed-off-by: NJames Morris <james.morris@microsoft.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      3f3beae2
    • H
      media: cec/v4l2: move V4L2 specific CEC functions to V4L2 · 85130845
      Hans Verkuil 提交于
      [ Upstream commit 9cfd2753f8f3923f89cbb15f940f3aa0e7202d3e ]
      
      Several CEC functions are actually specific for use with receivers,
      i.e. they should be part of the V4L2 subsystem, not CEC.
      
      These functions deal with validating and modifying EDIDs for (HDMI)
      receivers, and they do not actually have anything to do with the CEC
      subsystem and whether or not CEC is enabled. The problem was that if
      the CEC_CORE config option was not set, then these functions would
      become stubs, but that's not right: they should always be valid.
      
      So replace the cec_ prefix by v4l2_ and move them to v4l2-dv-timings.c.
      Update all drivers that call these accordingly.
      Signed-off-by: NHans Verkuil <hans.verkuil@cisco.com>
      Reported-by: NLars-Peter Clausen <lars@metafoo.de>
      Cc: <stable@vger.kernel.org>      # for v4.17 and up
      Signed-off-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      85130845
    • M
      {nl,mac}80211: fix interface combinations on crypto controlled devices · 1aa38ece
      Manikanta Pubbisetty 提交于
      [ Upstream commit e6f4051123fd33901e9655a675b22aefcdc5d277 ]
      
      Commit 33d915d9e8ce ("{nl,mac}80211: allow 4addr AP operation on
      crypto controlled devices") has introduced a change which allows
      4addr operation on crypto controlled devices (ex: ath10k). This
      change has inadvertently impacted the interface combinations logic
      on such devices.
      
      General rule is that software interfaces like AP/VLAN should not be
      listed under supported interface combinations and should not be
      considered during validation of these combinations; because of the
      aforementioned change, AP/VLAN interfaces(if present) will be checked
      against interfaces supported by the device and blocks valid interface
      combinations.
      
      Consider a case where an AP and AP/VLAN are up and running; when a
      second AP device is brought up on the same physical device, this AP
      will be checked against the AP/VLAN interface (which will not be
      part of supported interface combinations of the device) and blocks
      second AP to come up.
      
      Add a new API cfg80211_iftype_allowed() to fix the problem, this
      API works for all devices with/without SW crypto control.
      Signed-off-by: NManikanta Pubbisetty <mpubbise@codeaurora.org>
      Fixes: 33d915d9e8ce ("{nl,mac}80211: allow 4addr AP operation on crypto controlled devices")
      Link: https://lore.kernel.org/r/1563779690-9716-1-git-send-email-mpubbise@codeaurora.orgSigned-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      1aa38ece
  2. 10 9月, 2019 5 次提交
    • L
      libceph: allow ceph_buffer_put() to receive a NULL ceph_buffer · 0f134f6e
      Luis Henriques 提交于
      [ Upstream commit 5c498950f730aa17c5f8a2cdcb903524e4002ed2 ]
      Signed-off-by: NLuis Henriques <lhenriques@suse.com>
      Reviewed-by: NJeff Layton <jlayton@kernel.org>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      0f134f6e
    • Y
      gpio: Fix build error of function redefinition · 60520902
      YueHaibing 提交于
      [ Upstream commit 68e03b85474a51ec1921b4d13204782594ef7223 ]
      
      when do randbuilding, I got this error:
      
      In file included from drivers/hwmon/pmbus/ucd9000.c:19:0:
      ./include/linux/gpio/driver.h:576:1: error: redefinition of gpiochip_add_pin_range
       gpiochip_add_pin_range(struct gpio_chip *chip, const char *pinctl_name,
       ^~~~~~~~~~~~~~~~~~~~~~
      In file included from drivers/hwmon/pmbus/ucd9000.c:18:0:
      ./include/linux/gpio.h:245:1: note: previous definition of gpiochip_add_pin_range was here
       gpiochip_add_pin_range(struct gpio_chip *chip, const char *pinctl_name,
       ^~~~~~~~~~~~~~~~~~~~~~
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Fixes: 964cb341 ("gpio: move pincontrol calls to <linux/gpio/driver.h>")
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Link: https://lore.kernel.org/r/20190731123814.46624-1-yuehaibing@huawei.comSigned-off-by: NLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      60520902
    • P
      netfilter: nf_tables: use-after-free in failing rule with bound set · 5776970f
      Pablo Neira Ayuso 提交于
      [ Upstream commit 6a0a8d10a3661a036b55af695542a714c429ab7c ]
      
      If a rule that has already a bound anonymous set fails to be added, the
      preparation phase releases the rule and the bound set. However, the
      transaction object from the abort path still has a reference to the set
      object that is stale, leading to a use-after-free when checking for the
      set->bound field. Add a new field to the transaction that specifies if
      the set is bound, so the abort path can skip releasing it since the rule
      command owns it and it takes care of releasing it. After this update,
      the set->bound field is removed.
      
      [   24.649883] Unable to handle kernel paging request at virtual address 0000000000040434
      [   24.657858] Mem abort info:
      [   24.660686]   ESR = 0x96000004
      [   24.663769]   Exception class = DABT (current EL), IL = 32 bits
      [   24.669725]   SET = 0, FnV = 0
      [   24.672804]   EA = 0, S1PTW = 0
      [   24.675975] Data abort info:
      [   24.678880]   ISV = 0, ISS = 0x00000004
      [   24.682743]   CM = 0, WnR = 0
      [   24.685723] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000428952000
      [   24.692207] [0000000000040434] pgd=0000000000000000
      [   24.697119] Internal error: Oops: 96000004 [#1] SMP
      [...]
      [   24.889414] Call trace:
      [   24.891870]  __nf_tables_abort+0x3f0/0x7a0
      [   24.895984]  nf_tables_abort+0x20/0x40
      [   24.899750]  nfnetlink_rcv_batch+0x17c/0x588
      [   24.904037]  nfnetlink_rcv+0x13c/0x190
      [   24.907803]  netlink_unicast+0x18c/0x208
      [   24.911742]  netlink_sendmsg+0x1b0/0x350
      [   24.915682]  sock_sendmsg+0x4c/0x68
      [   24.919185]  ___sys_sendmsg+0x288/0x2c8
      [   24.923037]  __sys_sendmsg+0x7c/0xd0
      [   24.926628]  __arm64_sys_sendmsg+0x2c/0x38
      [   24.930744]  el0_svc_common.constprop.0+0x94/0x158
      [   24.935556]  el0_svc_handler+0x34/0x90
      [   24.939322]  el0_svc+0x8/0xc
      [   24.942216] Code: 37280300 f9404023 91014262 aa1703e0 (f9401863)
      [   24.948336] ---[ end trace cebbb9dcbed3b56f ]---
      
      Fixes: f6ac85858976 ("netfilter: nf_tables: unbind set in rule from commit path")
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      5776970f
    • C
      net_sched: fix a NULL pointer deref in ipt action · 38166934
      Cong Wang 提交于
      [ Upstream commit 981471bd3abf4d572097645d765391533aac327d ]
      
      The net pointer in struct xt_tgdtor_param is not explicitly
      initialized therefore is still NULL when dereferencing it.
      So we have to find a way to pass the correct net pointer to
      ipt_destroy_target().
      
      The best way I find is just saving the net pointer inside the per
      netns struct tcf_idrinfo, which could make this patch smaller.
      
      Fixes: 0c66dc1e ("netfilter: conntrack: register hooks in netns when needed by ruleset")
      Reported-and-tested-by: itugrok@yahoo.com
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      38166934
    • V
      net: sched: act_sample: fix psample group handling on overwrite · 5ff0ab0c
      Vlad Buslov 提交于
      [ Upstream commit dbf47a2a094edf58983265e323ca4bdcdb58b5ee ]
      
      Action sample doesn't properly handle psample_group pointer in overwrite
      case. Following issues need to be fixed:
      
      - In tcf_sample_init() function RCU_INIT_POINTER() is used to set
        s->psample_group, even though we neither setting the pointer to NULL, nor
        preventing concurrent readers from accessing the pointer in some way.
        Use rcu_swap_protected() instead to safely reset the pointer.
      
      - Old value of s->psample_group is not released or deallocated in any way,
        which results resource leak. Use psample_group_put() on non-NULL value
        obtained with rcu_swap_protected().
      
      - The function psample_group_put() that released reference to struct
        psample_group pointed by rcu-pointer s->psample_group doesn't respect rcu
        grace period when deallocating it. Extend struct psample_group with rcu
        head and use kfree_rcu when freeing it.
      
      Fixes: 5c5670fa ("net/sched: Introduce sample tc action")
      Signed-off-by: NVlad Buslov <vladbu@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5ff0ab0c
  3. 06 9月, 2019 3 次提交
  4. 29 8月, 2019 1 次提交
    • D
      rxrpc: Fix read-after-free in rxrpc_queue_local() · a05354cb
      David Howells 提交于
      commit 06d9532fa6b34f12a6d75711162d47c17c1add72 upstream.
      
      rxrpc_queue_local() attempts to queue the local endpoint it is given and
      then, if successful, prints a trace line.  The trace line includes the
      current usage count - but we're not allowed to look at the local endpoint
      at this point as we passed our ref on it to the workqueue.
      
      Fix this by reading the usage count before queuing the work item.
      
      Also fix the reading of local->debug_id for trace lines, which must be done
      with the same consideration as reading the usage count.
      
      Fixes: 09d2bf59 ("rxrpc: Add a tracepoint to track rxrpc_local refcounting")
      Reported-by: syzbot+78e71c5bab4f76a6a719@syzkaller.appspotmail.com
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a05354cb
  5. 25 8月, 2019 3 次提交
    • R
      drm/i915/cfl: Add a new CFL PCI ID. · 4af28b2f
      Rodrigo Vivi 提交于
      commit d0e062ebb3a44b56a7e672da568334c76f763552 upstream.
      
      One more CFL ID added to spec.
      
      Cc: José Roberto de Souza <jose.souza@intel.com>
      Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com>
      Reviewed-by: NJosé Roberto de Souza <jose.souza@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180803232721.20038-1-rodrigo.vivi@intel.comSigned-off-by: NWan Yusof, Wan Fahim AsqalaniX <wan.fahim.asqalanix.wan.yusof@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4af28b2f
    • M
      KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block · 8c7053d1
      Marc Zyngier 提交于
      commit 5eeaf10eec394b28fad2c58f1f5c3a5da0e87d1c upstream.
      
      Since commit commit 328e5664 ("KVM: arm/arm64: vgic: Defer
      touching GICH_VMCR to vcpu_load/put"), we leave ICH_VMCR_EL2 (or
      its GICv2 equivalent) loaded as long as we can, only syncing it
      back when we're scheduled out.
      
      There is a small snag with that though: kvm_vgic_vcpu_pending_irq(),
      which is indirectly called from kvm_vcpu_check_block(), needs to
      evaluate the guest's view of ICC_PMR_EL1. At the point were we
      call kvm_vcpu_check_block(), the vcpu is still loaded, and whatever
      changes to PMR is not visible in memory until we do a vcpu_put().
      
      Things go really south if the guest does the following:
      
      	mov x0, #0	// or any small value masking interrupts
      	msr ICC_PMR_EL1, x0
      
      	[vcpu preempted, then rescheduled, VMCR sampled]
      
      	mov x0, #ff	// allow all interrupts
      	msr ICC_PMR_EL1, x0
      	wfi		// traps to EL2, so samping of VMCR
      
      	[interrupt arrives just after WFI]
      
      Here, the hypervisor's view of PMR is zero, while the guest has enabled
      its interrupts. kvm_vgic_vcpu_pending_irq() will then say that no
      interrupts are pending (despite an interrupt being received) and we'll
      block for no reason. If the guest doesn't have a periodic interrupt
      firing once it has blocked, it will stay there forever.
      
      To avoid this unfortuante situation, let's resync VMCR from
      kvm_arch_vcpu_blocking(), ensuring that a following kvm_vcpu_check_block()
      will observe the latest value of PMR.
      
      This has been found by booting an arm64 Linux guest with the pseudo NMI
      feature, and thus using interrupt priorities to mask interrupts instead
      of the usual PSTATE masking.
      
      Cc: stable@vger.kernel.org # 4.12
      Fixes: 328e5664 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put")
      Signed-off-by: NMarc Zyngier <maz@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      8c7053d1
    • Q
      asm-generic: fix -Wtype-limits compiler warnings · 0755b6b1
      Qian Cai 提交于
      [ Upstream commit cbedfe11347fe418621bd188d58a206beb676218 ]
      
      Commit d66acc39 ("bitops: Optimise get_order()") introduced a
      compilation warning because "rx_frag_size" is an "ushort" while
      PAGE_SHIFT here is 16.
      
      The commit changed the get_order() to be a multi-line macro where
      compilers insist to check all statements in the macro even when
      __builtin_constant_p(rx_frag_size) will return false as "rx_frag_size"
      is a module parameter.
      
      In file included from ./arch/powerpc/include/asm/page_64.h:107,
                       from ./arch/powerpc/include/asm/page.h:242,
                       from ./arch/powerpc/include/asm/mmu.h:132,
                       from ./arch/powerpc/include/asm/lppaca.h:47,
                       from ./arch/powerpc/include/asm/paca.h:17,
                       from ./arch/powerpc/include/asm/current.h:13,
                       from ./include/linux/thread_info.h:21,
                       from ./arch/powerpc/include/asm/processor.h:39,
                       from ./include/linux/prefetch.h:15,
                       from drivers/net/ethernet/emulex/benet/be_main.c:14:
      drivers/net/ethernet/emulex/benet/be_main.c: In function 'be_rx_cqs_create':
      ./include/asm-generic/getorder.h:54:9: warning: comparison is always
      true due to limited range of data type [-Wtype-limits]
         (((n) < (1UL << PAGE_SHIFT)) ? 0 :  \
               ^
      drivers/net/ethernet/emulex/benet/be_main.c:3138:33: note: in expansion
      of macro 'get_order'
        adapter->big_page_size = (1 << get_order(rx_frag_size)) * PAGE_SIZE;
                                       ^~~~~~~~~
      
      Fix it by moving all of this multi-line macro into a proper function,
      and killing __get_order() off.
      
      [akpm@linux-foundation.org: remove __get_order() altogether]
      [cai@lca.pw: v2]
        Link: http://lkml.kernel.org/r/1564000166-31428-1-git-send-email-cai@lca.pw
      Link: http://lkml.kernel.org/r/1563914986-26502-1-git-send-email-cai@lca.pw
      Fixes: d66acc39 ("bitops: Optimise get_order()")
      Signed-off-by: NQian Cai <cai@lca.pw>
      Reviewed-by: NNathan Chancellor <natechancellor@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Bill Wendling <morbo@google.com>
      Cc: James Y Knight <jyknight@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      0755b6b1
  6. 16 8月, 2019 4 次提交
  7. 09 8月, 2019 7 次提交
    • T
      cgroup: Include dying leaders with live threads in PROCS iterations · 4340d175
      Tejun Heo 提交于
      commit c03cd7738a83b13739f00546166969342c8ff014 upstream.
      
      CSS_TASK_ITER_PROCS currently iterates live group leaders; however,
      this means that a process with dying leader and live threads will be
      skipped.  IOW, cgroup.procs might be empty while cgroup.threads isn't,
      which is confusing to say the least.
      
      Fix it by making cset track dying tasks and include dying leaders with
      live threads in PROCS iteration.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-and-tested-by: NTopi Miettinen <toiwoton@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4340d175
    • T
      cgroup: Implement css_task_iter_skip() · 370b9e63
      Tejun Heo 提交于
      commit b636fd38dc40113f853337a7d2a6885ad23b8811 upstream.
      
      When a task is moved out of a cset, task iterators pointing to the
      task are advanced using the normal css_task_iter_advance() call.  This
      is fine but we'll be tracking dying tasks on csets and thus moving
      tasks from cset->tasks to (to be added) cset->dying_tasks.  When we
      remove a task from cset->tasks, if we advance the iterators, they may
      move over to the next cset before we had the chance to add the task
      back on the dying list, which can allow the task to escape iteration.
      
      This patch separates out skipping from advancing.  Skipping only moves
      the affected iterators to the next pointer rather than fully advancing
      it and the following advancing will recognize that the cursor has
      already been moved forward and do the rest of advancing.  This ensures
      that when a task moves from one list to another in its cset, as long
      as it moves in the right direction, it's always visible to iteration.
      
      This doesn't cause any visible behavior changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      370b9e63
    • A
      compat_ioctl: pppoe: fix PPPOEIOCSFWD handling · e6e9bcef
      Arnd Bergmann 提交于
      [ Upstream commit 055d88242a6046a1ceac3167290f054c72571cd9 ]
      
      Support for handling the PPPOEIOCSFWD ioctl in compat mode was added in
      linux-2.5.69 along with hundreds of other commands, but was always broken
      sincen only the structure is compatible, but the command number is not,
      due to the size being sizeof(size_t), or at first sizeof(sizeof((struct
      sockaddr_pppox)), which is different on 64-bit architectures.
      
      Guillaume Nault adds:
      
        And the implementation was broken until 2016 (see 29e73269 ("pppoe:
        fix reference counting in PPPoE proxy")), and nobody ever noticed. I
        should probably have removed this ioctl entirely instead of fixing it.
        Clearly, it has never been used.
      
      Fix it by adding a compat_ioctl handler for all pppoe variants that
      translates the command number and then calls the regular ioctl function.
      
      All other ioctl commands handled by pppoe are compatible between 32-bit
      and 64-bit, and require compat_ptr() conversion.
      
      This should apply to all stable kernels.
      Acked-by: NGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e6e9bcef
    • A
      net/mlx5e: Prevent encap flow counter update async to user query · 0ccf4726
      Ariel Levkovich 提交于
      [ Upstream commit 90bb769291161cf25a818d69cf608c181654473e ]
      
      This patch prevents a race between user invoked cached counters
      query and a neighbor last usage updater.
      
      The cached flow counter stats can be queried by calling
      "mlx5_fc_query_cached" which provides the number of bytes and
      packets that passed via this flow since the last time this counter
      was queried.
      It does so by reducting the last saved stats from the current, cached
      stats and then updating the last saved stats with the cached stats.
      It also provide the lastuse value for that flow.
      
      Since "mlx5e_tc_update_neigh_used_value" needs to retrieve the
      last usage time of encapsulation flows, it calls the flow counter
      query method periodically and async to user queries of the flow counter
      using cls_flower.
      This call is causing the driver to update the last reported bytes and
      packets from the cache and therefore, future user queries of the flow
      stats will return lower than expected number for bytes and packets
      since the last saved stats in the driver was updated async to the last
      saved stats in cls_flower.
      
      This causes wrong stats presentation of encapsulation flows to user.
      
      Since the neighbor usage updater only needs the lastuse stats from the
      cached counter, the fix is to use a dedicated lastuse query call that
      returns the lastuse value without synching between the cached stats and
      the last saved stats.
      
      Fixes: f6dfb4c3 ("net/mlx5e: Update neighbour 'used' state using HW flow rules counters")
      Signed-off-by: NAriel Levkovich <lariel@mellanox.com>
      Reviewed-by: NRoi Dayan <roid@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0ccf4726
    • E
      net/mlx5: Fix modify_cq_in alignment · cd84a107
      Edward Srouji 提交于
      [ Upstream commit 7a32f2962c56d9d8a836b4469855caeee8766bd4 ]
      
      Fix modify_cq_in alignment to match the device specification.
      After this fix the 'cq_umem_valid' field will be in the right offset.
      
      Cc: <stable@vger.kernel.org> # 4.19
      Fixes: bd37197554eb ("net/mlx5: Update mlx5_ifc with DEVX UID bits")
      Signed-off-by: NEdward Srouji <edwards@mellanox.com>
      Reviewed-by: NYishai Hadas <yishaih@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cd84a107
    • D
      drivers/base: Introduce kill_device() · c23106d4
      Dan Williams 提交于
      commit 00289cd87676e14913d2d8492d1ce05c4baafdae upstream.
      
      The libnvdimm subsystem arranges for devices to be destroyed as a result
      of a sysfs operation. Since device_unregister() cannot be called from
      an actively running sysfs attribute of the same device libnvdimm
      arranges for device_unregister() to be performed in an out-of-line async
      context.
      
      The driver core maintains a 'dead' state for coordinating its own racing
      async registration / de-registration requests. Rather than add local
      'dead' state tracking infrastructure to libnvdimm device objects, export
      the existing state tracking via a new kill_device() helper.
      
      The kill_device() helper simply marks the device as dead, i.e. that it
      is on its way to device_del(), or returns that the device was already
      dead. This can be used in advance of calling device_unregister() for
      subsystems like libnvdimm that might need to handle multiple user
      threads racing to delete a device.
      
      This refactoring does not change any behavior, but it is a pre-requisite
      for follow-on fixes and therefore marked for -stable.
      
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Fixes: 4d88a97a ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver...")
      Cc: <stable@vger.kernel.org>
      Tested-by: NJane Chu <jane.chu@oracle.com>
      Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Link: https://lore.kernel.org/r/156341207332.292348.14959761496009347574.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      c23106d4
    • H
      scsi: fcoe: Embed fc_rport_priv in fcoe_rport structure · 93d6f084
      Hannes Reinecke 提交于
      commit 023358b136d490ca91735ac6490db3741af5a8bd upstream.
      
      Gcc-9 complains for a memset across pointer boundaries, which happens as
      the code tries to allocate a flexible array on the stack.  Turns out we
      cannot do this without relying on gcc-isms, so with this patch we'll embed
      the fc_rport_priv structure into fcoe_rport, can use the normal
      'container_of' outcast, and will only have to do a memset over one
      structure.
      Signed-off-by: NHannes Reinecke <hare@suse.de>
      Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      93d6f084
  8. 07 8月, 2019 3 次提交
  9. 04 8月, 2019 5 次提交
    • B
      block, scsi: Change the preempt-only flag into a counter · c58a6507
      Bart Van Assche 提交于
      commit cd84a62e0078dce09f4ed349bec84f86c9d54b30 upstream.
      
      The RQF_PREEMPT flag is used for three purposes:
      - In the SCSI core, for making sure that power management requests
        are executed even if a device is in the "quiesced" state.
      - For domain validation by SCSI drivers that use the parallel port.
      - In the IDE driver, for IDE preempt requests.
      Rename "preempt-only" into "pm-only" because the primary purpose of
      this mode is power management. Since the power management core may
      but does not have to resume a runtime suspended device before
      performing system-wide suspend and since a later patch will set
      "pm-only" mode as long as a block device is runtime suspended, make
      it possible to set "pm-only" mode from more than one context. Since
      with this change scsi_device_quiesce() is no longer idempotent, make
      that function return early if it is called for a quiesced queue.
      Signed-off-by: NBart Van Assche <bvanassche@acm.org>
      Acked-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMing Lei <ming.lei@redhat.com>
      Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c58a6507
    • J
      sched/fair: Use RCU accessors consistently for ->numa_group · a5a3915f
      Jann Horn 提交于
      commit cb361d8cdef69990f6b4504dc1fd9a594d983c97 upstream.
      
      The old code used RCU annotations and accessors inconsistently for
      ->numa_group, which can lead to use-after-frees and NULL dereferences.
      
      Let all accesses to ->numa_group use proper RCU helpers to prevent such
      issues.
      Signed-off-by: NJann Horn <jannh@google.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Fixes: 8c8a743c ("sched/numa: Use {cpu, pid} to create task groups for shared faults")
      Link: https://lkml.kernel.org/r/20190716152047.14424-3-jannh@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a5a3915f
    • J
      sched/fair: Don't free p->numa_faults with concurrent readers · 48046e09
      Jann Horn 提交于
      commit 16d51a590a8ce3befb1308e0e7ab77f3b661af33 upstream.
      
      When going through execve(), zero out the NUMA fault statistics instead of
      freeing them.
      
      During execve, the task is reachable through procfs and the scheduler. A
      concurrent /proc/*/sched reader can read data from a freed ->numa_faults
      allocation (confirmed by KASAN) and write it back to userspace.
      I believe that it would also be possible for a use-after-free read to occur
      through a race between a NUMA fault and execve(): task_numa_fault() can
      lead to task_numa_compare(), which invokes task_weight() on the currently
      running task of a different CPU.
      
      Another way to fix this would be to make ->numa_faults RCU-managed or add
      extra locking, but it seems easier to wipe the NUMA fault statistics on
      execve.
      Signed-off-by: NJann Horn <jannh@google.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Fixes: 82727018 ("sched/numa: Call task_numa_free() from do_execve()")
      Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      48046e09
    • J
      iommu/iova: Fix compilation error with !CONFIG_IOMMU_IOVA · 3a0c22cb
      Joerg Roedel 提交于
      commit 201c1db90cd643282185a00770f12f95da330eca upstream.
      
      The stub function for !CONFIG_IOMMU_IOVA needs to be
      'static inline'.
      
      Fixes: effa467870c76 ('iommu/vt-d: Don't queue_iova() if there is no flush queue')
      Signed-off-by: NJoerg Roedel <jroedel@suse.de>
      Signed-off-by: NDmitry Safonov <dima@arista.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3a0c22cb
    • D
      iommu/vt-d: Don't queue_iova() if there is no flush queue · 4fd0eb60
      Dmitry Safonov 提交于
      commit effa467870c7612012885df4e246bdb8ffd8e44c upstream.
      
      Intel VT-d driver was reworked to use common deferred flushing
      implementation. Previously there was one global per-cpu flush queue,
      afterwards - one per domain.
      
      Before deferring a flush, the queue should be allocated and initialized.
      
      Currently only domains with IOMMU_DOMAIN_DMA type initialize their flush
      queue. It's probably worth to init it for static or unmanaged domains
      too, but it may be arguable - I'm leaving it to iommu folks.
      
      Prevent queuing an iova flush if the domain doesn't have a queue.
      The defensive check seems to be worth to keep even if queue would be
      initialized for all kinds of domains. And is easy backportable.
      
      On 4.19.43 stable kernel it has a user-visible effect: previously for
      devices in si domain there were crashes, on sata devices:
      
       BUG: spinlock bad magic on CPU#6, swapper/0/1
        lock: 0xffff88844f582008, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
       CPU: 6 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #1
       Call Trace:
        <IRQ>
        dump_stack+0x61/0x7e
        spin_bug+0x9d/0xa3
        do_raw_spin_lock+0x22/0x8e
        _raw_spin_lock_irqsave+0x32/0x3a
        queue_iova+0x45/0x115
        intel_unmap+0x107/0x113
        intel_unmap_sg+0x6b/0x76
        __ata_qc_complete+0x7f/0x103
        ata_qc_complete+0x9b/0x26a
        ata_qc_complete_multiple+0xd0/0xe3
        ahci_handle_port_interrupt+0x3ee/0x48a
        ahci_handle_port_intr+0x73/0xa9
        ahci_single_level_irq_intr+0x40/0x60
        __handle_irq_event_percpu+0x7f/0x19a
        handle_irq_event_percpu+0x32/0x72
        handle_irq_event+0x38/0x56
        handle_edge_irq+0x102/0x121
        handle_irq+0x147/0x15c
        do_IRQ+0x66/0xf2
        common_interrupt+0xf/0xf
       RIP: 0010:__do_softirq+0x8c/0x2df
      
      The same for usb devices that use ehci-pci:
       BUG: spinlock bad magic on CPU#0, swapper/0/1
        lock: 0xffff88844f402008, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
       CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #4
       Call Trace:
        <IRQ>
        dump_stack+0x61/0x7e
        spin_bug+0x9d/0xa3
        do_raw_spin_lock+0x22/0x8e
        _raw_spin_lock_irqsave+0x32/0x3a
        queue_iova+0x77/0x145
        intel_unmap+0x107/0x113
        intel_unmap_page+0xe/0x10
        usb_hcd_unmap_urb_setup_for_dma+0x53/0x9d
        usb_hcd_unmap_urb_for_dma+0x17/0x100
        unmap_urb_for_dma+0x22/0x24
        __usb_hcd_giveback_urb+0x51/0xc3
        usb_giveback_urb_bh+0x97/0xde
        tasklet_action_common.isra.4+0x5f/0xa1
        tasklet_action+0x2d/0x30
        __do_softirq+0x138/0x2df
        irq_exit+0x7d/0x8b
        smp_apic_timer_interrupt+0x10f/0x151
        apic_timer_interrupt+0xf/0x20
        </IRQ>
       RIP: 0010:_raw_spin_unlock_irqrestore+0x17/0x39
      
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Lu Baolu <baolu.lu@linux.intel.com>
      Cc: iommu@lists.linux-foundation.org
      Cc: <stable@vger.kernel.org> # 4.14+
      Fixes: 13cf0174 ("iommu/vt-d: Make use of iova deferred flushing")
      Signed-off-by: NDmitry Safonov <dima@arista.com>
      Reviewed-by: NLu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: NJoerg Roedel <jroedel@suse.de>
      [v4.14-port notes:
      o minor conflict with untrusted IOMMU devices check under if-condition]
      Signed-off-by: NDmitry Safonov <dima@arista.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4fd0eb60
  10. 31 7月, 2019 2 次提交
    • L
      access: avoid the RCU grace period for the temporary subjective credentials · 408af823
      Linus Torvalds 提交于
      commit d7852fbd0f0423937fa287a598bfde188bb68c22 upstream.
      
      It turns out that 'access()' (and 'faccessat()') can cause a lot of RCU
      work because it installs a temporary credential that gets allocated and
      freed for each system call.
      
      The allocation and freeing overhead is mostly benign, but because
      credentials can be accessed under the RCU read lock, the freeing
      involves a RCU grace period.
      
      Which is not a huge deal normally, but if you have a lot of access()
      calls, this causes a fair amount of seconday damage: instead of having a
      nice alloc/free patterns that hits in hot per-CPU slab caches, you have
      all those delayed free's, and on big machines with hundreds of cores,
      the RCU overhead can end up being enormous.
      
      But it turns out that all of this is entirely unnecessary.  Exactly
      because access() only installs the credential as the thread-local
      subjective credential, the temporary cred pointer doesn't actually need
      to be RCU free'd at all.  Once we're done using it, we can just free it
      synchronously and avoid all the RCU overhead.
      
      So add a 'non_rcu' flag to 'struct cred', which can be set by users that
      know they only use it in non-RCU context (there are other potential
      users for this).  We can make it a union with the rcu freeing list head
      that we need for the RCU case, so this doesn't need any extra storage.
      
      Note that this also makes 'get_current_cred()' clear the new non_rcu
      flag, in case we have filesystems that take a long-term reference to the
      cred and then expect the RCU delayed freeing afterwards.  It's not
      entirely clear that this is required, but it makes for clear semantics:
      the subjective cred remains non-RCU as long as you only access it
      synchronously using the thread-local accessors, but you _can_ use it as
      a generic cred if you want to.
      
      It is possible that we should just remove the whole RCU markings for
      ->cred entirely.  Only ->real_cred is really supposed to be accessed
      through RCU, and the long-term cred copies that nfs uses might want to
      explicitly re-enable RCU freeing if required, rather than have
      get_current_cred() do it implicitly.
      
      But this is a "minimal semantic changes" change for the immediate
      problem.
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NPaul E. McKenney <paulmck@linux.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Jan Glauber <jglauber@marvell.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Jayachandran Chandrasekharan Nair <jnair@marvell.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      408af823
    • T
      gpu: host1x: Increase maximum DMA segment size · 4d14323a
      Thierry Reding 提交于
      [ Upstream commit 1e390478cfb527e34c9ab89ba57212cb05c33c51 ]
      
      Recent versions of the DMA API debug code have started to warn about
      violations of the maximum DMA segment size. This is because the segment
      size defaults to 64 KiB, which can easily be exceeded in large buffer
      allocations such as used in DRM/KMS for framebuffers.
      
      Technically the Tegra SMMU and ARM SMMU don't have a maximum segment
      size (they map individual pages irrespective of whether they are
      contiguous or not), so the choice of 4 MiB is a bit arbitrary here. The
      maximum segment size is a 32-bit unsigned integer, though, so we can't
      set it to the correct maximum size, which would be the size of the
      aperture.
      Signed-off-by: NThierry Reding <treding@nvidia.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      4d14323a
  11. 28 7月, 2019 1 次提交
    • R
      jbd2: introduce jbd2_inode dirty range scoping · af3812b6
      Ross Zwisler 提交于
      commit 6ba0e7dc64a5adcda2fbe65adc466891795d639e upstream.
      
      Currently both journal_submit_inode_data_buffers() and
      journal_finish_inode_data_buffers() operate on the entire address space
      of each of the inodes associated with a given journal entry.  The
      consequence of this is that if we have an inode where we are constantly
      appending dirty pages we can end up waiting for an indefinite amount of
      time in journal_finish_inode_data_buffers() while we wait for all the
      pages under writeback to be written out.
      
      The easiest way to cause this type of workload is do just dd from
      /dev/zero to a file until it fills the entire filesystem.  This can
      cause journal_finish_inode_data_buffers() to wait for the duration of
      the entire dd operation.
      
      We can improve this situation by scoping each of the inode dirty ranges
      associated with a given transaction.  We do this via the jbd2_inode
      structure so that the scoping is contained within jbd2 and so that it
      follows the lifetime and locking rules for that structure.
      
      This allows us to limit the writeback & wait in
      journal_submit_inode_data_buffers() and
      journal_finish_inode_data_buffers() respectively to the dirty range for
      a given struct jdb2_inode, keeping us from waiting forever if the inode
      in question is still being appended to.
      Signed-off-by: NRoss Zwisler <zwisler@google.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: stable@vger.kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      af3812b6