1. 22 Nov, 2019 3 commits
  2. 10 Nov, 2019 10 commits
  3. 09 Nov, 2019 3 commits
  4. 08 Nov, 2019 1 commit
    • EDAC/ghes: Fix locking and memory barrier issues · 23f61b9f
      Robert Richter committed
      The ghes registration and refcount are broken in several ways:
      
       * ghes_edac_register() returns with success for a 2nd instance
         even if a first instance's registration is still running. This is
         not correct as the first instance may fail later. A subsequent
         registration may not finish before the first. Parallel registrations
         must be avoided.
      
       * The refcount was increased even if a registration failed. This
         leads to stale counters preventing the device from being released.
      
       * The ghes refcount may not be decremented properly on unregistration.
         Always decrement the refcount once ghes_edac_unregister() is called to
         keep the refcount sane.
      
       * The ghes_pvt pointer is handed to the irq handler before registration
         finished.
      
       * The mci structure could be freed while the irq handler is running.
      
      Fix this by adding a mutex to ghes_edac_register(). This mutex
      serializes instances to register and unregister. The refcount is only
      increased if the registration succeeded. This makes sure the refcount is
      in a consistent state after registering or unregistering a device.
      
      Note: A spinlock cannot be used here as the code section may sleep.
      
      The ghes_pvt is protected by ghes_lock now. This ensures the pointer is
      not updated before registration has finished or while the irq handler is
      running. It is unset before unregistering the device, including the necessary
      (implicit) memory barriers that make the changes visible to other CPUs.
      Thus, the device cannot be used anymore by an interrupt.
      
      Also, rename ghes_init to ghes_refcount for better readability and
      switch to refcount API.
      
      A refcount is needed because there can be multiple GHES structures being
      defined (see ACPI 6.3 specification, 18.3.2.7 Generic Hardware Error
      Source, "Some platforms may describe multiple Generic Hardware Error
      Source structures with different notification types, ...").
      
      Another approach, using the mci's device refcount (get_device()) together
      with a release function, does not work here. A release function will be
      called only for device_release() with the last put_device() call. The
      device must be deleted *before* that with device_del(). This is only
      possible by maintaining a separate refcount.
      
       [ bp: touchups. ]
      
      Fixes: 0fe5f281 ("EDAC, ghes: Model a single, logical memory controller")
      Fixes: 1e72e673 ("EDAC/ghes: Fix Use after free in ghes_edac remove path")
      Co-developed-by: James Morse <james.morse@arm.com>
      Signed-off-by: James Morse <james.morse@arm.com>
      Co-developed-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Robert Richter <rrichter@marvell.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: https://lkml.kernel.org/r/20191105200732.3053-1-rrichter@marvell.com
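The locking scheme described in this commit message can be illustrated with a small user-space sketch. It is not the ghes_edac code: every name here (`sketch_register`, `do_setup`, `reg_mutex`, and so on) is hypothetical, a pthread mutex stands in for the kernel mutex, and a single lock covers both the counter and the published pointer, whereas the commit describes a separate ghes_lock for the pointer. The point it shows is that registrations are serialized, the count only grows after a successful setup, and the pointer is published last and unpublished first.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* All names are hypothetical stand-ins, not the kernel's identifiers. */
struct fake_mci { int id; };

static pthread_mutex_t reg_mutex = PTHREAD_MUTEX_INITIALIZER; /* serializes (un)register */
static unsigned int reg_refcount;    /* number of successful registrations */
static struct fake_mci the_mci;      /* the single logical controller */
static struct fake_mci *active_mci;  /* what an interrupt handler would read */

/* Stand-in for the part of registration that may sleep and may fail. */
static int do_setup(bool should_fail)
{
	return should_fail ? -1 : 0;
}

int sketch_register(bool should_fail)
{
	int rc = 0;

	pthread_mutex_lock(&reg_mutex);
	if (reg_refcount > 0) {
		/* An earlier registration has fully completed; just count this one. */
		reg_refcount++;
		goto unlock;
	}
	rc = do_setup(should_fail);
	if (rc)
		goto unlock;         /* failure: the refcount stays untouched */
	active_mci = &the_mci;       /* publish only after setup succeeded */
	reg_refcount = 1;
unlock:
	pthread_mutex_unlock(&reg_mutex);
	return rc;
}

void sketch_unregister(void)
{
	pthread_mutex_lock(&reg_mutex);
	/* The underflow guard is an assumption of this sketch. */
	if (reg_refcount > 0 && --reg_refcount == 0)
		active_mci = NULL;   /* unpublish before any teardown */
	pthread_mutex_unlock(&reg_mutex);
}

unsigned int sketch_refcount(void)
{
	unsigned int n;

	pthread_mutex_lock(&reg_mutex);
	n = reg_refcount;
	pthread_mutex_unlock(&reg_mutex);
	return n;
}

bool sketch_is_published(void)
{
	bool published;

	pthread_mutex_lock(&reg_mutex);
	published = active_mci != NULL;
	pthread_mutex_unlock(&reg_mutex);
	return published;
}
```

A mutex rather than a spinlock fits here for the reason the message gives: the setup step may sleep while the lock is held.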
  5. 06 Nov, 2019 5 commits
  6. 25 Oct, 2019 1 commit
  7. 24 Oct, 2019 1 commit
  8. 19 Oct, 2019 2 commits
    • EDAC, skx: Retrieve and print retry_rd_err_log registers · e80634a7
      Tony Luck committed
      Skylake logs some additional useful information in per-channel
      registers in addition to the architectural status/addr/misc
      logged in the machine check bank.
      
      Pick up this information and add it to the EDAC log:
      
      	retry_rd_err_[five 32-bit register values]
      
      Sorry, no definitions for these registers. OEMs and DIMM vendors
      will be able to use them to isolate which cells in the DIMM are
      causing problems.
      
      	correrrcnt[per rank corrected error counts]
      
      Note that if additional errors are logged while these registers are
      being read, you may see a jumble of values, some from earlier errors and
      others from later errors (since the registers report the most recently
      logged error). The correrrcnt registers provide error counts per possible
      rank. If these counts only change by one since the previous error logged
      for this channel, then it is safe to assume that the registers logged
      provide a coherent view of one error.
      
      With this change EDAC logs look like this:
      
      EDAC MC4: 1 CE memory read error on CPU_SrcID#2_MC#0_Chan#1_DIMM#0 (channel:1 slot:0 page:0x8f26018 offset:0x0 grain:32 syndrome:0x0 -  err_code:0x0101:0x0091 socket:2 imc:0 rank:0 bg:0 ba:0 row:0x1f880 col:0x200 retry_rd_err_log[0001a209 00000000 00000001 04800001 0001f880] correrrcnt[0001 0000 0000 0000 0000 0000 0000 0000])
      Acked-by: Aristeu Rozanski <aris@redhat.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
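The coherence rule in this message (counts changing by exactly one since the previous logged error) can be sketched as a small check over two correrrcnt snapshots. This is only an illustration under assumptions: the function name and `SKETCH_NUM_RANKS` are hypothetical, the eight slots mirror the eight values in the sample log line, and the counters are assumed to be 16 bits wide and to wrap around.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: eight slots, as in the correrrcnt[...] sample above. */
#define SKETCH_NUM_RANKS 8

/*
 * Return true if the per-rank corrected error counts grew by exactly one
 * in total between two snapshots, i.e. only one error was logged in
 * between. Assumes 16-bit counters; the cast makes the subtraction wrap
 * modulo 65536.
 */
bool snapshot_looks_coherent(const uint16_t prev[SKETCH_NUM_RANKS],
			     const uint16_t cur[SKETCH_NUM_RANKS])
{
	unsigned int total = 0;

	for (size_t i = 0; i < SKETCH_NUM_RANKS; i++)
		total += (uint16_t)(cur[i] - prev[i]);

	return total == 1;
}
```

If the check fails, the retry_rd_err_log values logged alongside may mix fields from more than one error.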
    • EDAC, skx_common: Refactor so that we initialize "dev" in result of adxl decode. · 29b8e84f
      Tony Luck committed
      Simplifies the code a little.
      Acked-by: Aristeu Rozanski <aris@redhat.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  9. 17 Oct, 2019 2 commits
    • Merge branch 'edac-urgent' into edac-for-next · 3a5e7ec9
      Borislav Petkov committed
      Pick up urgent change into next queue.
      Signed-off-by: Borislav Petkov <bp@suse.de>
    • EDAC/ghes: Fix Use after free in ghes_edac remove path · 1e72e673
      James Morse committed
      ghes_edac models a single logical memory controller, and uses a global
      ghes_init variable to ensure only the first ghes_edac_register() will
      do anything.
      
      ghes_edac is registered the first time a GHES entry in the HEST is
      probed. There may be multiple entries, so subsequent attempts to
      register ghes_edac are silently ignored as the work has already been
      done.
      
      When a GHES entry is unregistered, it calls ghes_edac_unregister(),
      which free()s the memory behind the global variables in ghes_edac.
      
      But since there may be multiple GHES entries, the next call to
      ghes_edac_unregister() will dereference the free()d memory and attempt
      to free it a second time.
      
      This may also be triggered on a platform with one GHES entry, if the
      driver is unbound/re-bound and unbound. The re-bind step will do
      nothing because of ghes_init, the second unbind will then do the same
      work as the first.
      
      Doing the unregister work on the first call is unsafe, as another
      CPU may be processing a notification in ghes_edac_report_mem_error(),
      using the memory we are about to free.
      
      ghes_init is already half of the reference counting. We only need
      to do the register work for the first call, and the unregister work
      for the last. Add the unregister check.
      
      This means we no longer free ghes_edac's memory while there are
      GHES entries that may receive a notification.
      
      This was detected by KASAN and DEBUG_TEST_DRIVER_REMOVE.
      
       [ bp: merge into a single patch. ]
      
      Fixes: 0fe5f281 ("EDAC, ghes: Model a single, logical memory controller")
      Reported-by: John Garry <john.garry@huawei.com>
      Signed-off-by: James Morse <james.morse@arm.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Robert Richter <rrichter@marvell.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20191014171919.85044-2-james.morse@arm.com
      Link: https://lkml.kernel.org/r/304df85b-8b56-b77e-1a11-aa23769f2e7c@huawei.com
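The "register work on the first call, unregister work on the last" idea can be sketched with a plain atomic counter. This is a user-space illustration with hypothetical names (`sketch_get`, `sketch_put`, `shared_state`), using C11 atomics in place of the kernel's primitives; it is not the patch itself.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's global state. */
static atomic_int users;
static int *shared_state;

/* Only the first caller allocates; later callers just take a reference. */
int sketch_get(void)
{
	if (atomic_fetch_add(&users, 1) > 0)
		return 0;

	shared_state = calloc(1, sizeof(*shared_state));
	if (!shared_state) {
		atomic_fetch_sub(&users, 1);
		return -1;
	}
	return 0;
}

/* Only the last caller frees, so the memory is released exactly once. */
void sketch_put(void)
{
	if (atomic_fetch_sub(&users, 1) != 1)
		return;

	free(shared_state);
	shared_state = NULL;
}

bool sketch_state_alive(void)
{
	return shared_state != NULL;
}
```

A bare counter like this removes the double free, but it does not serialize callers: a second `sketch_get()` can return success while the first is still allocating, so a caller that needs the state to be ready on return would need additional locking.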
  10. 09 Oct, 2019 1 commit
  11. 01 Oct, 2019 11 commits
    • EDAC: skx_common: get rid of unused type var · f05390d3
      Mauro Carvalho Chehab committed
      	drivers/edac/skx_common.c: In function ‘skx_mce_output_error’:
      	drivers/edac/skx_common.c:478:8: warning: variable ‘type’ set but not used [-Wunused-but-set-variable]
      	  478 |  char *type, *optype;
      	      |        ^~~~
      Acked-by: Borislav Petkov <bp@alien8.de>
      Acked-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
    • EDAC: sb_edac: get rid of unused vars · 323014d8
      Mauro Carvalho Chehab committed
      There are several unused vars in this driver, probably because
      it was a modified copy of another driver. Get rid of them.
      
      	drivers/edac/sb_edac.c: In function ‘knl_get_dimm_capacity’:
      	drivers/edac/sb_edac.c:1343:16: warning: variable ‘sad_size’ set but not used [-Wunused-but-set-variable]
      	 1343 |  u64 sad_base, sad_size, sad_limit = 0;
      	      |                ^~~~~~~~
      	drivers/edac/sb_edac.c: In function ‘sbridge_mce_output_error’:
      	drivers/edac/sb_edac.c:2955:8: warning: variable ‘type’ set but not used [-Wunused-but-set-variable]
      	 2955 |  char *type, *optype, msg[256];
      	      |        ^~~~
      	drivers/edac/sb_edac.c: In function ‘sbridge_unregister_mci’:
      	drivers/edac/sb_edac.c:3203:22: warning: variable ‘pvt’ set but not used [-Wunused-but-set-variable]
      	 3203 |  struct sbridge_pvt *pvt;
      	      |                      ^~~
      	At top level:
      	drivers/edac/sb_edac.c:266:18: warning: ‘correrrthrsld’ defined but not used [-Wunused-const-variable=]
      	  266 | static const u32 correrrthrsld[] = {
      	      |                  ^~~~~~~~~~~~~
      	drivers/edac/sb_edac.c:257:18: warning: ‘correrrcnt’ defined but not used [-Wunused-const-variable=]
      	  257 | static const u32 correrrcnt[] = {
      	      |                  ^~~~~~~~~~
      Acked-by: Borislav Petkov <bp@alien8.de>
      Acked-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
    • EDAC: i5400_edac: get rid of some unused vars · bb66f867
      Mauro Carvalho Chehab committed
      There are several temporary unused vars:
      
      	drivers/edac/i5400_edac.c: In function ‘i5400_get_mc_regs’:
      	drivers/edac/i5400_edac.c:1058:6: warning: variable ‘maxdimmperch’ set but not used [-Wunused-but-set-variable]
      	 1058 |  int maxdimmperch;
      	      |      ^~~~~~~~~~~~
      	drivers/edac/i5400_edac.c:1057:6: warning: variable ‘maxch’ set but not used [-Wunused-but-set-variable]
      	 1057 |  int maxch;
      	      |      ^~~~~
      	drivers/edac/i5400_edac.c: In function ‘i5400_init_dimms’:
      	drivers/edac/i5400_edac.c:1174:6: warning: variable ‘max_dimms’ set but not used [-Wunused-but-set-variable]
      	 1174 |  int max_dimms;
      	      |      ^~~~~~~~~
      	drivers/edac/i5400_edac.c:1173:14: warning: variable ‘channel_count’ set but not used [-Wunused-but-set-variable]
      	 1173 |  int ndimms, channel_count;
      	      |              ^~~~~~~~~~~~~
      
      Get rid of them.
      Acked-by: Borislav Petkov <bp@alien8.de>
      Acked-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
    • EDAC: i5400_edac: print type at debug message · 1acd05e4
      Mauro Carvalho Chehab committed
      There are 3 types of non-recoverable errors that the MC reports:
      
      	- Fatal;
      	- Non-fatal uncorrected
      	- Non-fatal correctable
      
      While we don't add it to the log itself, it could be useful to
      have this at least for debug messages.
      
      This shuts up this warning:
      
      	drivers/edac/i5400_edac.c: In function ‘i5400_proccess_non_recoverable_info’:
      	drivers/edac/i5400_edac.c:524:8: warning: variable ‘type’ set but not used [-Wunused-but-set-variable]
      	  524 |  char *type = NULL;
      	      |        ^~~~
      Acked-by: Borislav Petkov <bp@alien8.de>
      Acked-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
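The fix described here, using the already-computed type string in a debug message instead of deleting it, might look roughly like the sketch below. The bit masks, function names, and message format are all hypothetical and chosen only for illustration; the driver's real register layout and debug macros differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bit masks, not the i5400 register definitions. */
#define SKETCH_FATAL          0x1u
#define SKETCH_NF_UNCORRECTED 0x2u
#define SKETCH_NF_CORRECTABLE 0x4u

/* Map an error bitmask to one of the three type names from the message. */
const char *sketch_error_type(uint32_t errors)
{
	if (errors & SKETCH_FATAL)
		return "Fatal";
	if (errors & SKETCH_NF_UNCORRECTED)
		return "Non-fatal uncorrected";
	if (errors & SKETCH_NF_CORRECTABLE)
		return "Non-fatal correctable";
	return "Unknown";
}

/*
 * Format a debug line that consumes the type string, so the variable is
 * both set and used and -Wunused-but-set-variable has nothing to report.
 */
int sketch_format_debug(char *buf, size_t len, uint32_t errors)
{
	const char *type = sketch_error_type(errors);

	return snprintf(buf, len, "type=%s errors=0x%x",
			type, (unsigned int)errors);
}
```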
    • EDAC: i7300_edac: fix a kernel-doc syntax · 48356e0d
      Mauro Carvalho Chehab committed
      The declaration of the kerneldoc entry is wrong, causing this
      warning:
      
      	drivers/edac/i7300_edac.c:824: warning: Function parameter or member 'mir_no' not described in 'decode_mir'
      Acked-by: Borislav Petkov <bp@alien8.de>
      Acked-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
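For reference, a kernel-doc comment avoids the "not described" warning when every parameter in the signature has a matching `@name:` line. The function below is a hypothetical illustration of that layout (`sketch_mir_entry` and `SKETCH_MAX_MIR` are made up for this example); it is not the driver's decode_mir().

```c
#include <stdint.h>

#define SKETCH_MAX_MIR 3 /* hypothetical table size */

/**
 * sketch_mir_entry() - Look up one entry of a hypothetical MIR table
 * @mir_no: index of the entry to read
 * @mir: table of SKETCH_MAX_MIR raw register values
 *
 * Each parameter above is documented under exactly the name used in the
 * signature below.
 *
 * Return: the raw value at @mir_no, or 0 if @mir_no is out of range.
 */
uint16_t sketch_mir_entry(int mir_no, const uint16_t mir[SKETCH_MAX_MIR])
{
	if (mir_no < 0 || mir_no >= SKETCH_MAX_MIR)
		return 0;

	return mir[mir_no];
}
```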
    • EDAC: i7300_edac: rename a kernel-doc var description · 9f95c8d5
      Mauro Carvalho Chehab committed
      One var was renamed, but the associated kernel-doc markup still
      points to the old name.
      Acked-by: Borislav Petkov <bp@alien8.de>
      Acked-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
    • EDAC: i5100_edac: get rid of an unused var · c43fa3b1
      Mauro Carvalho Chehab committed
      As reported by GCC with W=1:
      
      	drivers/edac/i5100_edac.c:714:16: warning: variable ‘et’ set but not used [-Wunused-but-set-variable]
      	  714 |  unsigned long et;
      	      |                ^~
      
      This looks like a leftover from code that predates the addition of
      a udelay().
      Acked-by: Borislav Petkov <bp@alien8.de>
      Acked-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
    • Linux 5.4-rc1 · 54ecb8f7
      Linus Torvalds committed
    • Merge tag 'for-5.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · bb48a591
      Linus Torvalds committed
      Pull btrfs fixes from David Sterba:
       "A bunch of fixes that accumulated in recent weeks, mostly material for
        stable.
      
        Summary:
      
         - fix for regression from 5.3 that prevents using balance convert
           with single profile
      
         - qgroup fixes: rescan race, accounting leak with multiple writers,
           potential leak after io failure recovery
      
         - fix for use after free in relocation (reported by KASAN)
      
         - other error handling fixups"
      
      * tag 'for-5.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls
        btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space
        btrfs: Fix a regression which we can't convert to SINGLE profile
        btrfs: relocation: fix use-after-free on dead relocation roots
        Btrfs: fix race setting up and completing qgroup rescan workers
        Btrfs: fix missing error return if writeback for extent buffer never started
        btrfs: adjust dirty_metadata_bytes after writeback failure of extent buffer
        Btrfs: fix selftests failure due to uninitialized i_mode in test inodes
    • Merge tag 'csky-for-linus-5.4-rc1' of git://github.com/c-sky/csky-linux · 80b29b6b
      Linus Torvalds committed
      Pull csky updates from Guo Ren:
       "This round of csky subsystem just some fixups:
      
         - Fix mb() synchronization problem
      
         - Fix dma_alloc_coherent with PAGE_SO attribute
      
         - Fix cache_op failed when cross memory ZONEs
      
         - Optimize arch_sync_dma_for_cpu/device with dma_inv_range
      
         - Fix ioremap function losing
      
         - Fix arch_get_unmapped_area() implementation
      
         - Fix defer cache flush for 610
      
         - Support kernel non-aligned access
      
         - Fix 610 vipt cache flush mechanism
      
         - Fix add zero_fp fixup perf backtrace panic
      
         - Move static keyword to the front of declaration
      
         - Fix csky_pmu.max_period assignment
      
         - Use generic free_initrd_mem()
      
         - entry: Remove unneeded need_resched() loop"
      
      * tag 'csky-for-linus-5.4-rc1' of git://github.com/c-sky/csky-linux:
        csky: Move static keyword to the front of declaration
        csky: entry: Remove unneeded need_resched() loop
        csky: Fixup csky_pmu.max_period assignment
        csky: Fixup add zero_fp fixup perf backtrace panic
        csky: Use generic free_initrd_mem()
        csky: Fixup 610 vipt cache flush mechanism
        csky: Support kernel non-aligned access
        csky: Fixup defer cache flush for 610
        csky: Fixup arch_get_unmapped_area() implementation
        csky: Fixup ioremap function losing
        csky: Optimize arch_sync_dma_for_cpu/device with dma_inv_range
        csky/dma: Fixup cache_op failed when cross memory ZONEs
        csky: Fixup dma_alloc_coherent with PAGE_SO attribute
        csky: Fixup mb() synchronization problem
    • Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · cef0aa0c
      Linus Torvalds committed
      Pull ARM SoC fixes from Olof Johansson:
       "A few fixes that have trickled in through the merge window:
      
         - Video fixes for OMAP due to panel-dpi driver removal
      
         - Clock fixes for OMAP that broke no-idle quirks + nfsroot on DRA7
      
         - Fixing arch version on ASpeed ast2500
      
         - Two fixes for reset handling on ARM SCMI"
      
      * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
        ARM: aspeed: ast2500 is ARMv6K
        reset: reset-scmi: add missing handle initialisation
        firmware: arm_scmi: reset: fix reset_state assignment in scmi_domain_reset
        bus: ti-sysc: Remove unpaired sysc_clkdm_deny_idle()
        ARM: dts: logicpd-som-lv: Fix i2c2 and i2c3 Pin mux
        ARM: dts: am3517-evm: Fix missing video
        ARM: dts: logicpd-torpedo-baseboard: Fix missing video
        ARM: omap2plus_defconfig: Fix missing video
        bus: ti-sysc: Fix handling of invalid clocks
        bus: ti-sysc: Fix clock handling for no-idle quirks