1. 06 April 2019, 8 commits
    • sched/core: Use READ_ONCE()/WRITE_ONCE() in move_queued_task()/task_rq_lock() · e8e0bd49
      Committed by Andrea Parri
      [ Upstream commit c546951d9c9300065bad253ecdf1ac59ce9d06c8 ]
      
      move_queued_task() synchronizes with task_rq_lock() as follows:
      
      	move_queued_task()		task_rq_lock()
      
      	[S] ->on_rq = MIGRATING		[L] rq = task_rq()
      	WMB (__set_task_cpu())		ACQUIRE (rq->lock);
      	[S] ->cpu = new_cpu		[L] ->on_rq
      
      where "[L] rq = task_rq()" is ordered before "ACQUIRE (rq->lock)" by an
      address dependency and, in turn, "ACQUIRE (rq->lock)" is ordered before
      "[L] ->on_rq" by the ACQUIRE itself.
      
      Use READ_ONCE() to load ->cpu in task_rq() (c.f., task_cpu()) to honor
      this address dependency.  Also, mark the accesses to ->cpu and ->on_rq
      with READ_ONCE()/WRITE_ONCE() to comply with the LKMM.
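      
      A minimal sketch of the resulting pattern (illustrative fragments, not the
      exact kernel code):
      
      	/* Writer side, move_queued_task(): publish the migration state and the
      	 * new CPU with WRITE_ONCE() (the latter inside __set_task_cpu()). */
      	WRITE_ONCE(p->on_rq, TASK_ON_RQ_MIGRATING);
      	__set_task_cpu(p, new_cpu);
      
      	/* Reader side, task_rq_lock(): READ_ONCE(->cpu) heads the address
      	 * dependency that orders the rq load before the lock acquisition. */
      	struct rq *rq = cpu_rq(READ_ONCE(p->cpu));
      	raw_spin_lock(&rq->lock);
      	if (likely(rq == task_rq(p) && !task_on_rq_migrating(p)))
      		return rq;	/* otherwise unlock and retry */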
      Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul E. McKenney <paulmck@linux.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: https://lkml.kernel.org/r/20190121155240.27173-1-andrea.parri@amarulasolutions.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e8e0bd49
    • perf/aux: Make perf_event accessible to setup_aux() · efd85d83
      Committed by Mathieu Poirier
      [ Upstream commit 840018668ce2d96783356204ff282d6c9b0e5f66 ]
      
      When pmu::setup_aux() is called the coresight PMU needs to know which
      sink to use for the session by looking up the information in the
      event's attr::config2 field.
      
      As such, simply replace the cpu information with the complete perf_event
      structure and change all affected callers.
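      
      The shape of the change to the pmu::setup_aux() hook (sketch; the exact
      prototype may differ slightly):
      
      	/* before: only the CPU number was available to the driver */
      	void *(*setup_aux)(int cpu, void **pages, int nr_pages, bool overwrite);
      
      	/* after: the whole event is passed, so drivers such as coresight can
      	 * inspect event->attr.config2 to find the requested sink */
      	void *(*setup_aux)(struct perf_event *event, void **pages,
      			   int nr_pages, bool overwrite);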
      Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Reviewed-by: Suzuki Poulouse <suzuki.poulose@arm.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-s390@vger.kernel.org
      Link: http://lkml.kernel.org/r/20190131184714.20388-2-mathieu.poirier@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      efd85d83
    • genirq: Avoid summation loops for /proc/stat · 1f369486
      Committed by Thomas Gleixner
      [ Upstream commit 1136b0728969901a091f0471968b2b76ed14d9ad ]
      
      Waiman reported that on large systems with a large number of interrupts the
      readout of /proc/stat takes a long time to sum up the interrupt
      statistics. In principle this is not a problem, but for unknown reasons
      some enterprise-quality software reads /proc/stat at a high frequency.
      
      The reason for this is that interrupt statistics are accounted per cpu. So
      the /proc/stat logic has to sum up the interrupt stats for each interrupt.
      
      This can be largely avoided for interrupts which are not marked as
      'PER_CPU' interrupts by simply adding a per interrupt summation counter
      which is incremented along with the per interrupt per cpu counter.
      
      The PER_CPU interrupts need to avoid that and use only per cpu accounting
      because they share the interrupt number and the interrupt descriptor and
      concurrent updates would conflict or require unwanted synchronization.
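      
      A rough sketch of the idea (field and helper names are illustrative, not
      the exact patch):
      
      	struct irq_desc {
      		/* ... */
      		unsigned int	tot_count;	/* per-interrupt sum, only for !PER_CPU IRQs */
      	};
      
      	/* interrupt accounting path */
      	__this_cpu_inc(*desc->kstat_irqs);	/* existing per-CPU counter */
      	desc->tot_count++;			/* new summation counter */
      
      	/* /proc/stat can then report desc->tot_count directly instead of
      	 * summing the per-CPU counters over all possible CPUs. */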
      Reported-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Waiman Long <longman@redhat.com>
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Link: https://lkml.kernel.org/r/20190208135020.925487496@linutronix.de
      
      8<-------------
      
      v2: Undo the unintentional layout change of struct irq_desc.
      
       include/linux/irqdesc.h |    1 +
       kernel/irq/chip.c       |   12 ++++++++++--
       kernel/irq/internals.h  |    8 +++++++-
       kernel/irq/irqdesc.c    |    7 ++++++-
       4 files changed, 24 insertions(+), 4 deletions(-)
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1f369486
    • sched/topology: Fix percpu data types in struct sd_data & struct s_data · 845d4849
      Committed by Luc Van Oostenryck
      [ Upstream commit 99687cdbb3f6c8e32bcc7f37496e811f30460e48 ]
      
      The percpu members of struct sd_data and s_data are declared as:
      
      	struct ... ** __percpu member;
      
      So their type is:
      
      	__percpu pointer to pointer to struct ...
      
      But looking at how they're used, their type should be:
      
      	pointer to __percpu pointer to struct ...
      
      and they should thus be declared as:
      
      	struct ... * __percpu *member;
      
      So fix the placement of '__percpu' in the definition of these
      structures.
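      
      Concretely, for struct sd_data this means (sketch; the other members follow
      the same pattern):
      
      	struct sd_data {
      		struct sched_domain * __percpu *sd;	/* was: struct sched_domain ** __percpu sd; */
      		/* ... */
      	};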
      
      This addresses a bunch of Sparse warnings like:
      
      	warning: incorrect type in initializer (different address spaces)
      	  expected void const [noderef] <asn:3> *__vpp_verify
      	  got struct sched_domain **
      Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190118144936.79158-1-luc.vanoostenryck@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      845d4849
    • clk: fractional-divider: check parent rate only if flag is set · 763a895a
      Committed by Katsuhiro Suzuki
      [ Upstream commit d13501a2bedfbea0983cc868d3f1dc692627f60d ]
      
      A custom approximation for a fractional divider may not need parent clock
      rate checking. For example, Rockchip SoCs work fine using the grandparent
      clock rate even if the target rate is greater than the parent rate.
      
      This patch checks parent clock rate only if CLK_SET_RATE_PARENT flag
      is set.
      
      As a detailed example, consider the clock tree of the Rockchip I2S audio
      hardware:
        - The clock rate of CPLL is 1.2GHz, GPLL is 491.52MHz.
        - i2s1_div is an integer divider that can divide by N (N is 1~128).
          The input clock is CPLL or GPLL. The initial divider value is N = 1.
          Ex) PLL = CPLL, N = 10, the i2s1_div output rate is
            CPLL / 10 = 1.2GHz / 10 = 120MHz
        - i2s1_frac is a fractional divider that can divide the input by x/y,
          where x and y are 16-bit integers.
      
      CPLL --> | selector | ---> i2s1_div -+--> | selector | --> I2S1 MCLK
      GPLL --> |          | ,--------------'    |          |
                            `--> i2s1_frac ---> |          |
      
      The clock mux system tries to choose a suitable source from i2s1_div and
      i2s1_frac for the master clock (MCLK) of I2S1.
      
      The bad scenario is as follows:
        - Try to set MCLK to 8.192MHz (32kHz audio playback).
          The candidate setting is
          - i2s1_div: GPLL / 60 = 8.192MHz
          The i2s1_div candidate is exactly the same as the target clock rate,
          so the mux chooses this clock source. The i2s1_div output rate changes
          from 491.52MHz to 8.192MHz.
      
        - After that, try to set MCLK to 11.2896MHz (44.1kHz audio playback).
          The candidate settings are
          - i2s1_div : CPLL / 107 = 11.214945MHz
          - i2s1_frac: i2s1_div   = 8.192MHz
            This is because clk_fd_round_rate() thinks the target rate
            (11.2896MHz) is higher than the parent rate (i2s1_div = 8.192MHz)
            and returns the parent clock rate.
      
      The above is the current upstream behavior. The clock mux system chooses
      i2s1_div, but this clock rate is not acceptable for the I2S driver, so
      users cannot play audio.
      
      The expected behavior is:
        - Try to set the master clock to 11.2896MHz (44.1kHz audio playback).
          The candidate settings are
          - i2s1_div : CPLL / 107          = 11.214945MHz
          - i2s1_frac: i2s1_div * 147/6400 = 11.2896MHz
                       Change i2s1_div to GPLL / 1 = 491.52MHz at the same
                       time.
      
      With this commit applied, clk_fd_round_rate() calls Rockchip's custom
      approximation function even if the target rate is higher than the parent
      rate. The custom function changes both the grandparent (i2s1_div) and
      parent (i2s1_frac) settings at the same time, so the clock mux system can
      choose i2s1_frac and audio works fine.
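      
      The gist of the change in clk_fd_round_rate() looks roughly like this
      (simplified sketch, not the exact driver code):
      
      	if (!rate ||
      	    (!(clk_hw_get_flags(hw) & CLK_SET_RATE_PARENT) && rate >= *parent_rate))
      		return *parent_rate;
      
      	/* otherwise fall through to the (possibly SoC-specific) approximation,
      	 * which may also re-rate the parent */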
      Signed-off-by: Katsuhiro Suzuki <katsuhiro@katsuster.net>
      Reviewed-by: Heiko Stuebner <heiko@sntech.de>
      [sboyd@kernel.org: Make function into a macro instead]
      Signed-off-by: Stephen Boyd <sboyd@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      763a895a
    • f2fs: fix to check inline_xattr_size boundary correctly · 4ab78f4d
      Committed by Chao Yu
      [ Upstream commit 500e0b28ecd3c5aade98f3c3a339d18dcb166bb6 ]
      
      We use below condition to check inline_xattr_size boundary:
      
      	if (!F2FS_OPTION(sbi).inline_xattr_size ||
      		F2FS_OPTION(sbi).inline_xattr_size >=
      				DEF_ADDRS_PER_INODE -
      				F2FS_TOTAL_EXTRA_ATTR_SIZE -
      				DEF_INLINE_RESERVED_SIZE -
      				DEF_MIN_INLINE_SIZE)
      
      There are three problems with that check:
      - we should allow inline_xattr_size to equal the minimum size of the inline
      {data,dentry} area.
      - F2FS_TOTAL_EXTRA_ATTR_SIZE and inline_xattr_size are based on
      different size units; the former is in units of 4 bytes, the latter in units of 1 byte.
      - DEF_MIN_INLINE_SIZE only indicates the minimum size of the inline data area;
      however, we need to consider the minimum size of the inline dentry area as well.
      A minimal inline dentry should contain at least two entries, '.' and
      '..', so the minimum inline_dentry size is 40 bytes.
      
      .bitmap		1 * 1 = 1
      .reserved	1 * 1 = 1
      .dentry		11 * 2 = 22
      .filename	8 * 2 = 16
      total		40
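      
      The 40-byte minimum can be expressed as a constant derived from the table
      above (the macro name is illustrative):
      
      	#define MIN_INLINE_DENTRY_SIZE	(1 /* .bitmap */ + 1 /* .reserved */ + \
      					 2 * 11 /* .dentry */ + 2 * 8 /* .filename */)	/* = 40 */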
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      4ab78f4d
    • include/linux/relay.h: fix percpu annotation in struct rchan · d6ad08aa
      Committed by Luc Van Oostenryck
      [ Upstream commit 62461ac2e5b6520b6d65fc6d7d7b4b8df4b848d8 ]
      
      The percpu member of this structure is declared as:
      	struct ... ** __percpu member;
      So its type is:
      	__percpu pointer to pointer to struct ...
      
      But looking at how it's used, its type should be:
      	pointer to __percpu pointer to struct ...
      and it should thus be declared as:
      	struct ... * __percpu *member;
      
      So fix the placement of '__percpu' in the definition of this
      structure.
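      
      Concretely, for struct rchan this means (sketch):
      
      	struct rchan {
      		/* ... */
      		struct rchan_buf * __percpu *buf;	/* was: struct rchan_buf ** __percpu buf; */
      	};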
      
      This silences a few Sparse warnings like:
      	warning: incorrect type in initializer (different address spaces)
      	  expected void const [noderef] <asn:3> *__vpp_verify
      	  got struct sched_domain **
      
      Link: http://lkml.kernel.org/r/20190118144902.79065-1-luc.vanoostenryck@gmail.com
      Fixes: 017c59c0 ("relay: Use per CPU constructs for the relay channel buffer pointers")
      Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
      Cc: Jens Axboe <axboe@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d6ad08aa
    • tracing: kdb: Fix ftdump to not sleep · b73c7d02
      Committed by Douglas Anderson
      [ Upstream commit 31b265b3baaf55f209229888b7ffea523ddab366 ]
      
      As reported back in 2016-11 [1], the "ftdump" kdb command triggers a
      BUG for "sleeping function called from invalid context".
      
      kdb's "ftdump" command wants to call ring_buffer_read_prepare() in
      atomic context.  A very simple solution for this is to add allocation
      flags to ring_buffer_read_prepare() so kdb can call it without
      triggering the allocation error.  This patch does that.
      
      Note that in the original email thread about this, it was suggested
      that perhaps the solution for kdb was to either preallocate the buffer
      ahead of time or create our own iterator.  I'm hoping that this
      alternative of adding allocation flags to ring_buffer_read_prepare()
      can be considered since it means I don't need to duplicate more of the
      core trace code into "trace_kdb.c" (for either creating my own
      iterator or re-preparing a ring allocator whose memory was already
      allocated).
      
      NOTE: another option for kdb is to actually figure out how to make it
      reuse the existing ftrace_dump() function and totally eliminate the
      duplication.  This sounds very appealing and actually works (the "sr
      z" command can be seen to properly dump the ftrace buffer).  The
      downside here is that ftrace_dump() fully consumes the trace buffer.
      Unless that is changed I'd rather not use it because it means "ftdump
      | grep xyz" won't be very useful to search the ftrace buffer since it
      will throw away the whole trace on the first grep.  A future patch to
      dump only the last few lines of the buffer will also be hard to
      implement.
      
      [1] https://lkml.kernel.org/r/20161117191605.GA21459@google.com
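      
      The shape of the change (sketch; the exact prototype and variable names
      may differ):
      
      	struct ring_buffer_iter *
      	ring_buffer_read_prepare(struct ring_buffer *buffer, int cpu, gfp_t flags);
      
      	/* trace_kdb.c runs in atomic context, so it can now pass GFP_ATOMIC,
      	 * while the normal tracing paths keep using GFP_KERNEL. */
      	iter.buffer_iter[cpu] =
      		ring_buffer_read_prepare(trace_buf->buffer, cpu, GFP_ATOMIC);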
      
      Link: http://lkml.kernel.org/r/20190308193205.213659-1-dianders@chromium.org
      Reported-by: Brian Norris <briannorris@chromium.org>
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b73c7d02
  2. 03 April 2019, 2 commits
    • drivers: base: Helpers for adding device connection descriptions · e99d90ce
      Committed by Heikki Krogerus
      commit cd7753d371388e712e3ee52b693459f9b71aaac2 upstream.
      
      Introducing helpers for adding and removing multiple device
      connection descriptions at once.
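      
      A minimal sketch of such a helper (the array termination convention and the
      exact API are assumptions, not the actual driver-core code):
      
      	static void device_connections_add(struct device_connection *cons)
      	{
      		struct device_connection *c;
      
      		for (c = cons; c->endpoint[0]; c++)	/* assumed: empty-terminated array */
      			device_connection_add(c);
      	}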
      Acked-by: Hans de Goede <hdegoede@redhat.com>
      Tested-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e99d90ce
    • mm: add support for kmem caches in DMA32 zone · 62d342d6
      Committed by Nicolas Boichat
      commit 6d6ea1e967a246f12cfe2f5fb743b70b2e608d4a upstream.
      
      Patch series "iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables",
      v6.
      
      This is a followup to the discussion in [1], [2].
      
      IOMMUs using ARMv7 short-descriptor format require page tables (level 1
      and 2) to be allocated within the first 4GB of RAM, even on 64-bit
      systems.
      
      For L1 tables that are bigger than a page, we can just use
      __get_free_pages with GFP_DMA32 (on arm64 systems only, arm would still
      use GFP_DMA).
      
      For L2 tables that only take 1KB, it would be a waste to allocate a full
      page, so we considered 3 approaches:
       1. This series, adding support for GFP_DMA32 slab caches.
       2. genalloc, which requires pre-allocating the maximum number of L2 page
          tables (4096, so 4MB of memory).
       3. page_frag, which is not very memory-efficient as it is unable to reuse
          freed fragments until the whole page is freed. [3]
      
      This series is the most memory-efficient approach.
      
      stable@ note:
        We confirmed that this is a regression, and IOMMU errors happen on 4.19
        and linux-next/master on MT8173 (elm, Acer Chromebook R13). The issue
        most likely starts from commit ad67f5a6 ("arm64: replace ZONE_DMA
        with ZONE_DMA32"), i.e. 4.15, and presumably breaks a number of Mediatek
        platforms (and maybe others?).
      
      [1] https://lists.linuxfoundation.org/pipermail/iommu/2018-November/030876.html
      [2] https://lists.linuxfoundation.org/pipermail/iommu/2018-December/031696.html
      [3] https://patchwork.codeaurora.org/patch/671639/
      
      This patch (of 3):
      
      IOMMUs using ARMv7 short-descriptor format require page tables to be
      allocated within the first 4GB of RAM, even on 64-bit systems.  On arm64,
      this is done by passing GFP_DMA32 flag to memory allocation functions.
      
      For IOMMU L2 tables that only take 1KB, it would be a waste to allocate
      a full page using get_free_pages, so we considered 3 approaches:
       1. This patch, adding support for GFP_DMA32 slab caches.
       2. genalloc, which requires pre-allocating the maximum number of L2
          page tables (4096, so 4MB of memory).
       3. page_frag, which is not very memory-efficient as it is unable
          to reuse freed fragments until the whole page is freed.
      
      This change makes it possible to create a custom cache in DMA32 zone using
      kmem_cache_create, then allocate memory using kmem_cache_alloc.
      
      We do not create a DMA32 kmalloc cache array, as there are currently no
      users of kmalloc(..., GFP_DMA32).  These calls will continue to trigger a
      warning, as we keep GFP_DMA32 in GFP_SLAB_BUG_MASK.
      
      This implies that calls to kmem_cache_*alloc on a SLAB_CACHE_DMA32
      kmem_cache must _not_ use GFP_DMA32 (it is anyway redundant and
      unnecessary).
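      
      Typical usage then looks like this (cache name and sizes are illustrative):
      
      	struct kmem_cache *l2_cache;
      	void *ptep;
      
      	l2_cache = kmem_cache_create("io-pgtable-l2", SZ_1K, SZ_1K,
      				     SLAB_CACHE_DMA32, NULL);
      	if (l2_cache)
      		ptep = kmem_cache_alloc(l2_cache, GFP_KERNEL);	/* note: no GFP_DMA32 */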
      
      Link: http://lkml.kernel.org/r/20181210011504.122604-2-drinkcat@chromium.org
      Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Sasha Levin <Alexander.Levin@microsoft.com>
      Cc: Huaisheng Ye <yehs1@lenovo.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Yong Wu <yong.wu@mediatek.com>
      Cc: Matthias Brugger <matthias.bgg@gmail.com>
      Cc: Tomasz Figa <tfiga@google.com>
      Cc: Yingjoe Chen <yingjoe.chen@mediatek.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Hsin-Yi Wang <hsinyi@chromium.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      62d342d6
  3. 27 March 2019, 1 commit
  4. 24 March 2019, 8 commits
    • KVM: Call kvm_arch_memslots_updated() before updating memslots · 23ad135a
      Committed by Sean Christopherson
      commit 152482580a1b0accb60676063a1ac57b2d12daf6 upstream.
      
      kvm_arch_memslots_updated() is at this point in time an x86-specific
      hook for handling MMIO generation wraparound.  x86 stashes 19 bits of
      the memslots generation number in its MMIO sptes in order to avoid
      full page fault walks for repeat faults on emulated MMIO addresses.
      Because only 19 bits are used, wrapping the MMIO generation number is
      possible, if unlikely.  kvm_arch_memslots_updated() alerts x86 that
      the generation has changed so that it can invalidate all MMIO sptes in
      case the effective MMIO generation has wrapped so as to avoid using a
      stale spte, e.g. a (very) old spte that was created with generation==0.
      
      Given that the purpose of kvm_arch_memslots_updated() is to prevent
      consuming stale entries, it needs to be called before the new generation
      is propagated to memslots.  Invalidating the MMIO sptes after updating
      memslots means that there is a window where a vCPU could dereference
      the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO
      spte that was created with (pre-wrap) generation==0.
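      
      A simplified sketch of the required ordering in install_new_memslots()
      (not the exact code):
      
      	gen = old_memslots->generation + 1;	/* illustrative generation update */
      
      	kvm_arch_memslots_updated(kvm, gen);	/* zap stale MMIO sptes first ... */
      	slots->generation = gen;		/* ... then publish the new generation */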
      
      Fixes: e59dbe09 ("KVM: Introduce kvm_arch_memslots_updated()")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      23ad135a
    • dm: fix to_sector() for 32bit · 7668d6e4
      Committed by NeilBrown
      commit 0bdb50c531f7377a9da80d3ce2d61f389c84cb30 upstream.
      
      A dm-raid array with devices larger than 4GB won't assemble on
      a 32 bit host since _check_data_dev_sectors() was added in 4.16.
      This is because to_sector() treats its argument as an "unsigned long"
      which is 32bits (4GB) on a 32bit host.  Using "unsigned long long"
      is more correct.
      
      Kernels as early as 4.2 can have other problems due to to_sector()
      being used on the size of a device.
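      
      The fix is essentially widening the argument type (sketch):
      
      	static inline sector_t to_sector(unsigned long long n)	/* was: unsigned long */
      	{
      		return (n >> SECTOR_SHIFT);
      	}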
      
      Fixes: 0cf45031 ("dm raid: add support for the MD RAID0 personality")
      cc: stable@vger.kernel.org (v4.2+)
      Reported-and-tested-by: Guillaume Perréal <gperreal@free.fr>
      Signed-off-by: NeilBrown <neil@brown.name>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7668d6e4
    • arm64: Fix HCR.TGE status for NMI contexts · 85c8ea22
      Committed by Julien Thierry
      commit 5870970b9a828d8693aa6d15742573289d7dbcd0 upstream.
      
      When using VHE, the host needs to clear HCR_EL2.TGE bit in order
      to interact with guest TLBs, switching from EL2&0 translation regime
      to EL1&0.
      
      However, a non-maskable asynchronous event, such as SDEI, could happen while
      TGE is cleared. Because of this, address translation operations
      relying on the EL2&0 translation regime could fail (TLB invalidation,
      userspace access, ...).
      
      Fix this by properly setting HCR_EL2.TGE when entering NMI context and
      clear it if necessary when returning to the interrupted context.
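      
      Very roughly, the idea is the following (sketch only; the real patch hooks
      into the NMI entry/exit paths and also handles the non-VHE case):
      
      	u64 hcr = read_sysreg(hcr_el2);
      
      	if (!(hcr & HCR_TGE))
      		write_sysreg(hcr | HCR_TGE, hcr_el2);	/* NMI entry: force TGE */
      
      	/* ... NMI handling ... */
      
      	if (!(hcr & HCR_TGE))
      		write_sysreg(hcr, hcr_el2);		/* NMI exit: restore */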
      Signed-off-by: Julien Thierry <julien.thierry@arm.com>
      Suggested-by: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: James Morse <james.morse@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: linux-arch@vger.kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      85c8ea22
    • bpf: only test gso type on gso packets · 9920eb40
      Committed by Willem de Bruijn
      [ Upstream commit 4c3024debf62de4c6ac6d3cb4c0063be21d4f652 ]
      
      BPF can adjust gso only for TCP bytestreams, so fail on other gso types.
      
      But do so only on gso packets: the field is not touched if !gso_size.
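      
      In other words, the guard in the affected helpers becomes roughly (sketch):
      
      	if (skb_is_gso(skb) && !skb_is_gso_tcp(skb))
      		return -ENOTSUPP;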
      
      Fixes: b90efd225874 ("bpf: only adjust gso_size on bytestream protocols")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      9920eb40
    • device property: Fix the length used in PROPERTY_ENTRY_STRING() · c835b441
      Committed by Heikki Krogerus
      commit 2b6e492467c78183bb629bb0a100ea3509b615a5 upstream.
      
      With string type property entries we need to use
      sizeof(const char *) instead of the number of characters as
      the length of the entry.
      
      If the string was shorter than sizeof(const char *),
      attempts to read it would have failed with -EOVERFLOW. The
      problem has been hidden because all built-in string
      properties have had a string longer than 8 characters until
      now.
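      
      The essence of the fix in the PROPERTY_ENTRY_STRING() macro (sketch; the
      other fields of the entry are unchanged and omitted here):
      
      	#define PROPERTY_ENTRY_STRING(_name_, _val_)			\
      	(struct property_entry) {					\
      		.name = _name_,						\
      		.length = sizeof(const char *),	/* was: the string length */	\
      		/* ... string-type marker and value fields unchanged ... */	\
      	}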
      
      Fixes: a85f4204 ("device property: helper macros for property entry creation")
      Cc: 4.5+ <stable@vger.kernel.org> # 4.5+
      Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c835b441
    • splice: don't merge into linked buffers · 2af926fd
      Committed by Jann Horn
      commit a0ce2f0aa6ad97c3d4927bf2ca54bcebdf062d55 upstream.
      
      Before this patch, it was possible for two pipes to affect each other after
      data had been transferred between them with tee():
      
      ============
      $ cat tee_test.c
      
      #define _GNU_SOURCE
      #include <err.h>
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      
      int main(void) {
        int pipe_a[2];
        if (pipe(pipe_a)) err(1, "pipe");
        int pipe_b[2];
        if (pipe(pipe_b)) err(1, "pipe");
        if (write(pipe_a[1], "abcd", 4) != 4) err(1, "write");
        if (tee(pipe_a[0], pipe_b[1], 2, 0) != 2) err(1, "tee");
        if (write(pipe_b[1], "xx", 2) != 2) err(1, "write");
      
        char buf[5];
        if (read(pipe_a[0], buf, 4) != 4) err(1, "read");
        buf[4] = 0;
        printf("got back: '%s'\n", buf);
      }
      $ gcc -o tee_test tee_test.c
      $ ./tee_test
      got back: 'abxx'
      $
      ============
      
      As suggested by Al Viro, fix it by creating a separate type for
      non-mergeable pipe buffers, then changing the types of buffers in
      splice_pipe_to_pipe() and link_pipe().
      
      Cc: <stable@vger.kernel.org>
      Fixes: 7c77f0b3 ("splice: implement pipe to pipe splicing")
      Fixes: 70524490 ("[PATCH] splice: add support for sys_tee()")
      Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2af926fd
    • keys: Fix dependency loop between construction record and auth key · 72683282
      Committed by David Howells
      [ Upstream commit 822ad64d7e46a8e2c8b8a796738d7b657cbb146d ]
      
      In the request_key() upcall mechanism there's a dependency loop by which if
      a key type driver overrides the ->request_key hook and the userspace side
      manages to lose the authorisation key, the auth key and the internal
      construction record (struct key_construction) can keep each other pinned.
      
      Fix this by the following changes:
      
       (1) Killing off the construction record and using the auth key instead.
      
       (2) Including the operation name in the auth key payload and making the
           payload available outside of security/keys/.
      
       (3) The ->request_key hook is given the authkey instead of the cons
           record and operation name.
      
      Changes (2) and (3) allow the auth key to naturally be cleaned up if the
      keyring it is in is destroyed or cleared or the auth key is unlinked.
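      
      The hook change described in (3), roughly (prototypes approximate):
      
      	/* before */
      	int (*request_key)(struct key_construction *cons, const char *op, void *aux);
      
      	/* after: the auth key carries the operation name in its payload */
      	int (*request_key)(struct key *authkey, void *aux);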
      
      Fixes: 7ee02a316600 ("keys: Fix dependency loop between construction record and auth key")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: James Morris <james.morris@microsoft.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      72683282
    • bpf: only adjust gso_size on bytestream protocols · 413e3985
      Committed by Willem de Bruijn
      [ Upstream commit b90efd2258749e04e1b3f71ef0d716f2ac2337e0 ]
      
      bpf_skb_change_proto and bpf_skb_adjust_room change skb header length.
      For GSO packets they adjust gso_size to maintain the same MTU.
      
      The gso size can only be safely adjusted on bytestream protocols.
      Commit d02f51cb ("bpf: fix bpf_skb_adjust_net/bpf_skb_proto_xlat
      to deal with gso sctp skbs") excluded SKB_GSO_SCTP.
      
      Since then type SKB_GSO_UDP_L4 has been added, whose contents are one
      gso_size unit per datagram. Also exclude these.
      
      Move from a blacklist to a whitelist check to future proof against
      additional such new GSO types, e.g., for fraglist based GRO.
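      
      The whitelist check is essentially a new helper along these lines (sketch):
      
      	static inline bool skb_is_gso_tcp(const struct sk_buff *skb)
      	{
      		return skb_is_gso(skb) &&
      		       skb_shinfo(skb)->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6);
      	}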
      
      Fixes: bec1f6f6 ("udp: generate gso with UDP_SEGMENT")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      413e3985
  5. 14 March 2019, 2 commits
  6. 10 March 2019, 1 commit
    • cpufreq: Use struct kobj_attribute instead of struct global_attr · 464b4279
      Committed by Viresh Kumar
      commit 625c85a62cb7d3c79f6e16de3cfa972033658250 upstream.
      
      The cpufreq_global_kobject is created using kobject_create_and_add()
      helper, which assigns the kobj_type as dynamic_kobj_ktype and show/store
      routines are set to kobj_attr_show() and kobj_attr_store().
      
      These routines pass struct kobj_attribute as an argument to the
      show/store callbacks. But all the cpufreq files created using the
      cpufreq_global_kobject expect the argument to be of type struct
      attribute. Things work fine currently as no one accesses the "attr"
      argument. We may not see issues even if the argument is used, as struct
      kobj_attribute has struct attribute as its first element and so they
      will both get the same address.
      
      But this is logically incorrect and we should rather use struct
      kobj_attribute instead of struct global_attr in the cpufreq core and
      drivers and the show/store callbacks should take struct kobj_attribute
      as argument instead.
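      
      So a global cpufreq attribute's show callback now looks like this (sketch):
      
      	static ssize_t show_boost(struct kobject *kobj,
      				  struct kobj_attribute *attr, char *buf)
      	{
      		return sprintf(buf, "%d\n", cpufreq_boost_enabled());
      	}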
      
      This bug was caught by CFI Clang builds in the Android kernel, which
      catch mismatches in function prototypes for such callbacks.
      Reported-by: Donghee Han <dh.han@samsung.com>
      Reported-by: Sangkyu Kim <skwith.kim@samsung.com>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      464b4279
  7. 06 March 2019, 3 commits
  8. 27 February 2019, 4 commits
  9. 23 February 2019, 2 commits
  10. 20 February 2019, 2 commits
    • mmc: block: handle complete_work on separate workqueue · c4609e81
      Committed by Zachary Hays
      commit dcf6e2e38a1c7ccbc535de5e1d9b14998847499d upstream.
      
      The kblockd workqueue is created with the WQ_MEM_RECLAIM flag set.
      This generates a rescuer thread for that queue that will trigger when
      the CPU is under heavy load and collect the uncompleted work.
      
      In the case of mmc, this creates the possibility of a deadlock when
      there are multiple partitions on the device as other blk-mq work is
      also run on the same queue. For example:
      
      - worker 0 claims the mmc host to work on partition 1
      - worker 1 attempts to claim the host for partition 2 but has to wait
        for worker 0 to finish
      - worker 0 schedules complete_work to release the host
      - rescuer thread is triggered after time-out and collects the dangling
        work
      - rescuer thread attempts to complete the work in order starting with
        claim host
      - the task to release host is now blocked by a task to claim it and
        will never be called
      
      The above results in multiple hung tasks that lead to failures to
      mount partitions.
      
      Handling complete_work on a separate workqueue avoids this by keeping
      the work completion tasks separate from the other blk-mq work. This
      allows the host to be released without getting blocked by other tasks
      attempting to claim the host.
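      
      Roughly, the fix allocates a dedicated workqueue and queues the completion
      work there (sketch; flags and names approximate):
      
      	card->complete_wq = alloc_workqueue("mmc_complete",
      					    WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
      
      	/* completion path: no longer queued on kblockd */
      	queue_work(mq->card->complete_wq, &mq->complete_work);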
      Signed-off-by: Zachary Hays <zhays@lexmark.com>
      Fixes: 81196976 ("mmc: block: Add blk-mq support")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c4609e81
    • perf/x86: Add check_period PMU callback · 74cbb754
      Committed by Jiri Olsa
      commit 81ec3f3c4c4d78f2d3b6689c9816bfbdf7417dbb upstream.
      
      Vince (and later on Ravi) reported crashes in the BTS code during
      fuzzing with the following backtrace:
      
        general protection fault: 0000 [#1] SMP PTI
        ...
        RIP: 0010:perf_prepare_sample+0x8f/0x510
        ...
        Call Trace:
         <IRQ>
         ? intel_pmu_drain_bts_buffer+0x194/0x230
         intel_pmu_drain_bts_buffer+0x160/0x230
         ? tick_nohz_irq_exit+0x31/0x40
         ? smp_call_function_single_interrupt+0x48/0xe0
         ? call_function_single_interrupt+0xf/0x20
         ? call_function_single_interrupt+0xa/0x20
         ? x86_schedule_events+0x1a0/0x2f0
         ? x86_pmu_commit_txn+0xb4/0x100
         ? find_busiest_group+0x47/0x5d0
         ? perf_event_set_state.part.42+0x12/0x50
         ? perf_mux_hrtimer_restart+0x40/0xb0
         intel_pmu_disable_event+0xae/0x100
         ? intel_pmu_disable_event+0xae/0x100
         x86_pmu_stop+0x7a/0xb0
         x86_pmu_del+0x57/0x120
         event_sched_out.isra.101+0x83/0x180
         group_sched_out.part.103+0x57/0xe0
         ctx_sched_out+0x188/0x240
         ctx_resched+0xa8/0xd0
         __perf_event_enable+0x193/0x1e0
         event_function+0x8e/0xc0
         remote_function+0x41/0x50
         flush_smp_call_function_queue+0x68/0x100
         generic_smp_call_function_single_interrupt+0x13/0x30
         smp_call_function_single_interrupt+0x3e/0xe0
         call_function_single_interrupt+0xf/0x20
         </IRQ>
      
      The reason is that while the event init code does several checks
      for BTS events and prevents several unwanted config bits for
      BTS events (like precise_ip), the PERF_EVENT_IOC_PERIOD ioctl allows
      a BTS event to be created without those checks being done.
      
      The following sequence will cause the crash:
      
      If we create an 'almost' BTS event with precise_ip and callchains,
      and then turn it into a BTS event by changing its period, it will crash
      in perf_prepare_sample() because precise_ip events are expected to come
      in with callchain data initialized, but that's not the
      case for the intel_pmu_drain_bts_buffer() caller.
      
      Add a check_period callback to be called before the period
      is changed via PERF_EVENT_IOC_PERIOD. It denies the change
      if the event would become a BTS event. Also add the limit_period
      check.
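      
      The new hook, roughly (prototype approximate):
      
      	struct pmu {
      		/* ... */
      		int (*check_period)(struct perf_event *event, u64 value);
      	};
      
      	/* perf_event_period() calls it before applying the new value; the x86
      	 * implementation rejects periods that would turn the event into BTS. */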
      Reported-by: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20190204123532.GA4794@krava
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      74cbb754
  11. 15 February 2019, 1 commit
    • SUNRPC: Always drop the XPRT_LOCK on XPRT_CLOSE_WAIT · 2440f3ce
      Committed by Benjamin Coddington
      This patch is only appropriate for stable kernels v4.16 - v4.19
      
      Since commit 9b30889c ("SUNRPC: Ensure we always close the socket after
      a connection shuts down"), and until commit c544577daddb ("SUNRPC: Clean up
      transport write space handling"), it is possible for the NFS client to spin
      in the following tight loop:
      
      269.964083: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=0 action=call_bind [sunrpc]
      269.964083: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=0 action=call_connect [sunrpc]
      269.964083: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=0 action=call_transmit [sunrpc]
      269.964085: xprt_transmit: peer=[10.0.1.82]:2049 xid=0x761d3f77 status=-32
      269.964085: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=-32 action=call_transmit_status [sunrpc]
      269.964085: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=-32 action=call_status [sunrpc]
      269.964085: rpc_call_status: task:43@0 status=-32
      
      The issue is that the path through call_transmit_status does not release
      the XPRT_LOCK when the transmit result is -EPIPE, so the socket cannot be
      properly shut down.
      
      The below commit fixed things up in mainline by unconditionally calling
      xprt_end_transmit() and releasing the XPRT_LOCK after every pass through
      call_transmit.  However, the entirety of this commit is not appropriate for
      stable kernels because its original inclusion was part of a series that
      modifies the sunrpc code to use a different queueing model.  As a result,
      there are machinations within this patch that are not needed for a stable
      fix and will not make sense without a larger backport of the mainline
      series.
      
      In this patch, we take the slightly modified bit of the mainline patch
      below, which is to release the XPRT_LOCK on transmission error should we
      detect that the transport is waiting to close.
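      
      The stable-only change in call_transmit_status() is roughly (sketch):
      
      	case -EPIPE:
      		/* the transport is waiting to close; drop the XPRT_LOCK so the
      		 * socket shutdown can make progress */
      		if (test_bit(XPRT_CLOSE_WAIT, &task->tk_xprt->state))
      			xprt_end_transmit(task);
      		break;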
      
      commit c544577daddb618c7dd5fa7fb98d6a41782f020e upstream
      Author: Trond Myklebust <trond.myklebust@hammerspace.com>
      Date:   Mon Sep 3 23:39:27 2018 -0400
      
          SUNRPC: Clean up transport write space handling
      
          Treat socket write space handling in the same way we now treat transport
          congestion: by denying the XPRT_LOCK until the transport signals that it
          has free buffer space.
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      
      The original discussion of the problem is here:
      
          https://lore.kernel.org/linux-nfs/20181212135157.4489-1-dwysocha@redhat.com/T/#t
      
      This passes my usual cthon and xfstests on NFS as applied on v4.19 mainline.
      Reported-by: Dave Wysochanski <dwysocha@redhat.com>
      Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2440f3ce
  12. 13 February 2019, 6 commits