1. 07 8月, 2019 40 次提交
    • Y
      IB/mlx5: Move MRs to a kernel PD when freeing them to the MR cache · 7e5ce9f3
      Yishai Hadas 提交于
      commit 9ec4483a3f0f71a228a5933bc040441322bfb090 upstream.
      
      Fix unreg_umr to move the MR to a kernel owned PD (i.e. the UMR PD) which
      can't be accessed by userspace.
      
      This ensures that nothing can continue to access the MR once it has been
      placed in the kernels cache for reuse.
      
      MRs in the cache continue to have their HW state, including DMA tables,
      present. Even though the MR has been invalidated, changing the PD provides
      an additional layer of protection against use of the MR.
      
      Link: https://lore.kernel.org/r/20190723065733.4899-5-leon@kernel.org
      Cc: <stable@vger.kernel.org> # 3.10
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
      Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7e5ce9f3
    • Y
      IB/mlx5: Use direct mkey destroy command upon UMR unreg failure · 3cfa1087
      Yishai Hadas 提交于
      commit afd1417404fba6dbfa6c0a8e5763bd348da682e4 upstream.
      
      Use a direct firmware command to destroy the mkey in case the unreg UMR
      operation has failed.
      
      This prevents a case that a mkey will leak out from the cache post a
      failure to be destroyed by a UMR WR.
      
      In case the MR cache limit didn't reach a call to add another entry to the
      cache instead of the destroyed one is issued.
      
      In addition, replaced a warn message to WARN_ON() as this flow is fatal
      and can't happen unless some bug around.
      
      Link: https://lore.kernel.org/r/20190723065733.4899-4-leon@kernel.org
      Cc: <stable@vger.kernel.org> # 4.10
      Fixes: 49780d42 ("IB/mlx5: Expose MR cache for mlx5_ib")
      Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
      Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3cfa1087
    • Y
      IB/mlx5: Fix unreg_umr to ignore the mkey state · 41be1928
      Yishai Hadas 提交于
      commit 6a053953739d23694474a5f9c81d1a30093da81a upstream.
      
      Fix unreg_umr to ignore the mkey state and do not fail if was freed.  This
      prevents a case that a user space application already changed the mkey
      state to free and then the UMR operation will fail leaving the mkey in an
      inappropriate state.
      
      Link: https://lore.kernel.org/r/20190723065733.4899-3-leon@kernel.org
      Cc: <stable@vger.kernel.org> # 3.19
      Fixes: 968e78dd ("IB/mlx5: Enhance UMR support to allow partial page table update")
      Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
      Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      41be1928
    • J
      xen/swiotlb: fix condition for calling xen_destroy_contiguous_region() · 04fdca1f
      Juergen Gross 提交于
      commit 50f6393f9654c561df4cdcf8e6cfba7260143601 upstream.
      
      The condition in xen_swiotlb_free_coherent() for deciding whether to
      call xen_destroy_contiguous_region() is wrong: in case the region to
      be freed is not contiguous calling xen_destroy_contiguous_region() is
      the wrong thing to do: it would result in inconsistent mappings of
      multiple PFNs to the same MFN. This will lead to various strange
      crashes or data corruption.
      
      Instead of calling xen_destroy_contiguous_region() in that case a
      warning should be issued as that situation should never occur.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Reviewed-by: NJan Beulich <jbeulich@suse.com>
      Acked-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      04fdca1f
    • M
      nbd: replace kill_bdev() with __invalidate_device() again · eb828241
      Munehisa Kamata 提交于
      commit 2b5c8f0063e4b263cf2de82029798183cf85c320 upstream.
      
      Commit abbbdf12 ("replace kill_bdev() with __invalidate_device()")
      once did this, but 29eaadc0 ("nbd: stop using the bdev everywhere")
      resurrected kill_bdev() and it has been there since then. So buffer_head
      mappings still get killed on a server disconnection, and we can still
      hit the BUG_ON on a filesystem on the top of the nbd device.
      
        EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null)
        block nbd0: Receive control failed (result -32)
        block nbd0: shutting down sockets
        print_req_error: I/O error, dev nbd0, sector 66264 flags 3000
        EXT4-fs warning (device nbd0): htree_dirblock_to_tree:979: inode #2: lblock 0: comm ls: error -5 reading directory block
        print_req_error: I/O error, dev nbd0, sector 2264 flags 3000
        EXT4-fs error (device nbd0): __ext4_get_inode_loc:4690: inode #2: block 283: comm ls: unable to read itable block
        EXT4-fs error (device nbd0) in ext4_reserve_inode_write:5894: IO failure
        ------------[ cut here ]------------
        kernel BUG at fs/buffer.c:3057!
        invalid opcode: 0000 [#1] SMP PTI
        CPU: 7 PID: 40045 Comm: jbd2/nbd0-8 Not tainted 5.1.0-rc3+ #4
        Hardware name: Amazon EC2 m5.12xlarge/, BIOS 1.0 10/16/2017
        RIP: 0010:submit_bh_wbc+0x18b/0x190
        ...
        Call Trace:
         jbd2_write_superblock+0xf1/0x230 [jbd2]
         ? account_entity_enqueue+0xc5/0xf0
         jbd2_journal_update_sb_log_tail+0x94/0xe0 [jbd2]
         jbd2_journal_commit_transaction+0x12f/0x1d20 [jbd2]
         ? __switch_to_asm+0x40/0x70
         ...
         ? lock_timer_base+0x67/0x80
         kjournald2+0x121/0x360 [jbd2]
         ? remove_wait_queue+0x60/0x60
         kthread+0xf8/0x130
         ? commit_timeout+0x10/0x10 [jbd2]
         ? kthread_bind+0x10/0x10
         ret_from_fork+0x35/0x40
      
      With __invalidate_device(), I no longer hit the BUG_ON with sync or
      unmount on the disconnected device.
      
      Fixes: 29eaadc0 ("nbd: stop using the bdev everywhere")
      Cc: linux-block@vger.kernel.org
      Cc: Ratna Manoj Bolla <manoj.br@gmail.com>
      Cc: nbd@other.debian.org
      Cc: stable@vger.kernel.org
      Cc: David Woodhouse <dwmw@amazon.com>
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NMunehisa Kamata <kamatam@amazon.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eb828241
    • W
      arm64: cpufeature: Fix feature comparison for CTR_EL0.{CWG,ERG} · 8dfef0f4
      Will Deacon 提交于
      commit 147b9635e6347104b91f48ca9dca61eb0fbf2a54 upstream.
      
      If CTR_EL0.{CWG,ERG} are 0b0000 then they must be interpreted to have
      their architecturally maximum values, which defeats the use of
      FTR_HIGHER_SAFE when sanitising CPU ID registers on heterogeneous
      machines.
      
      Introduce FTR_HIGHER_OR_ZERO_SAFE so that these fields effectively
      saturate at zero.
      
      Fixes: 3c739b57 ("arm64: Keep track of CPU feature registers")
      Cc: <stable@vger.kernel.org> # 4.4.x-
      Reviewed-by: NSuzuki K Poulose <suzuki.poulose@arm.com>
      Acked-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NWill Deacon <will@kernel.org>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8dfef0f4
    • W
      arm64: compat: Allow single-byte watchpoints on all addresses · 2bddc985
      Will Deacon 提交于
      commit 849adec41203ac5837c40c2d7e08490ffdef3c2c upstream.
      
      Commit d968d2b8 ("ARM: 7497/1: hw_breakpoint: allow single-byte
      watchpoints on all addresses") changed the validation requirements for
      hardware watchpoints on arch/arm/. Update our compat layer to implement
      the same relaxation.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NWill Deacon <will@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2bddc985
    • W
      drivers/perf: arm_pmu: Fix failure path in PM notifier · c385cda0
      Will Deacon 提交于
      commit 0d7fd70f26039bd4b33444ca47f0e69ce3ae0354 upstream.
      
      Handling of the CPU_PM_ENTER_FAILED transition in the Arm PMU PM
      notifier code incorrectly skips restoration of the counters. Fix the
      logic so that CPU_PM_ENTER_FAILED follows the same path as CPU_PM_EXIT.
      
      Cc: <stable@vger.kernel.org>
      Fixes: da4e4f18 ("drivers/perf: arm_pmu: implement CPU_PM notifier")
      Reported-by: NAnders Roxell <anders.roxell@linaro.org>
      Acked-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: NWill Deacon <will@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c385cda0
    • H
      parisc: Fix build of compressed kernel even with debug enabled · 5f80ac50
      Helge Deller 提交于
      commit 3fe6c873af2f2247544debdbe51ec29f690a2ccf upstream.
      
      With debug info enabled (CONFIG_DEBUG_INFO=y) the resulting vmlinux may get
      that huge that we need to increase the start addresss for the decompression
      text section otherwise one will face a linker error.
      Reported-by: NSven Schnelle <svens@stackframe.org>
      Tested-by: NSven Schnelle <svens@stackframe.org>
      Cc: stable@vger.kernel.org # v4.14+
      Signed-off-by: NHelge Deller <deller@gmx.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f80ac50
    • C
      cgroup: kselftest: relax fs_spec checks · 001f93d9
      Chris Down 提交于
      commit b59b1baab789eacdde809135542e3d4f256f6878 upstream.
      
      On my laptop most memcg kselftests were being skipped because it claimed
      cgroup v2 hierarchy wasn't mounted, but this isn't correct.  Instead, it
      seems current systemd HEAD mounts it with the name "cgroup2" instead of
      "cgroup":
      
          % grep cgroup /proc/mounts
          cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate 0 0
      
      I can't think of a reason to need to check fs_spec explicitly
      since it's arbitrary, so we can just rely on fs_vfstype.
      
      After these changes, `make TARGETS=cgroup kselftest` actually runs the
      cgroup v2 tests in more cases.
      
      Link: http://lkml.kernel.org/r/20190723210737.GA487@chrisdown.nameSigned-off-by: NChris Down <chris@chrisdown.name>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      001f93d9
    • S
      s390/dasd: fix endless loop after read unit address configuration · 6cb9e0d9
      Stefan Haberland 提交于
      commit 41995342b40c418a47603e1321256d2c4a2ed0fb upstream.
      
      After getting a storage server event that causes the DASD device driver
      to update its unit address configuration during a device shutdown there is
      the possibility of an endless loop in the device driver.
      
      In the system log there will be ongoing DASD error messages with RC: -19.
      
      The reason is that the loop starting the ruac request only terminates when
      the retry counter is decreased to 0. But in the sleep_on function there are
      early exit paths that do not decrease the retry counter.
      
      Prevent an endless loop by handling those cases separately.
      
      Remove the unnecessary do..while loop since the sleep_on function takes
      care of retries by itself.
      
      Fixes: 8e09f215 ("[S390] dasd: add hyper PAV support to DASD device driver, part 1")
      Cc: stable@vger.kernel.org # 2.6.25+
      Signed-off-by: NStefan Haberland <sth@linux.ibm.com>
      Reviewed-by: NJan Hoeppner <hoeppner@linux.ibm.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6cb9e0d9
    • Y
      mm: vmscan: check if mem cgroup is disabled or not before calling memcg slab shrinker · beb0cc78
      Yang Shi 提交于
      commit fa1e512fac717f34e7c12d7a384c46e90a647392 upstream.
      
      Shakeel Butt reported premature oom on kernel with
      "cgroup_disable=memory" since mem_cgroup_is_root() returns false even
      though memcg is actually NULL.  The drop_caches is also broken.
      
      It is because commit aeed1d32 ("mm/vmscan.c: generalize
      shrink_slab() calls in shrink_node()") removed the !memcg check before
      !mem_cgroup_is_root().  And, surprisingly root memcg is allocated even
      though memory cgroup is disabled by kernel boot parameter.
      
      Add mem_cgroup_disabled() check to make reclaimer work as expected.
      
      Link: http://lkml.kernel.org/r/1563385526-20805-1-git-send-email-yang.shi@linux.alibaba.com
      Fixes: aeed1d32 ("mm/vmscan.c: generalize shrink_slab() calls in shrink_node()")
      Signed-off-by: NYang Shi <yang.shi@linux.alibaba.com>
      Reported-by: NShakeel Butt <shakeelb@google.com>
      Reviewed-by: NShakeel Butt <shakeelb@google.com>
      Reviewed-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Jan Hadrava <had@kam.mff.cuni.cz>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>	[4.19+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      beb0cc78
    • S
      ALSA: hda: Fix 1-minute detection delay when i915 module is not available · 72651bbd
      Samuel Thibault 提交于
      commit 74bf71ed792ab0f64631cc65ccdb54c356c36d45 upstream.
      
      Distribution installation images such as Debian include different sets
      of modules which can be downloaded dynamically.  Such images may notably
      include the hda sound modules but not the i915 DRM module, even if the
      latter was enabled at build time, as reported on
      https://bugs.debian.org/931507
      
      In such a case hdac_i915 would be linked in and try to load the i915
      module, fail since it is not there, but still wait for a whole minute
      before giving up binding with it.
      
      This fixes such as case by only waiting for the binding if the module
      was properly loaded (or module support is disabled, in which case i915
      is already compiled-in anyway).
      
      Fixes: f9b54e19 ("ALSA: hda/i915: Allow delayed i915 audio component binding")
      Signed-off-by: NSamuel Thibault <samuel.thibault@ens-lyon.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      72651bbd
    • O
      selinux: fix memory leak in policydb_init() · 46650ac2
      Ondrej Mosnacek 提交于
      commit 45385237f65aeee73641f1ef737d7273905a233f upstream.
      
      Since roles_init() adds some entries to the role hash table, we need to
      destroy also its keys/values on error, otherwise we get a memory leak in
      the error path.
      
      Cc: <stable@vger.kernel.org>
      Reported-by: syzbot+fee3a14d4cdf92646287@syzkaller.appspotmail.com
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NOndrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: NPaul Moore <paul@paul-moore.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      46650ac2
    • M
      mtd: rawnand: micron: handle on-die "ECC-off" devices correctly · e7bb4c81
      Marco Felsch 提交于
      commit 8493b2a06fc5b77ef5c579dc32b12761f7b7a84c upstream.
      
      Some devices are not supposed to support on-die ECC but experience
      shows that internal ECC machinery can actually be enabled through the
      "SET FEATURE (EFh)" command, even if a read of the "READ ID Parameter
      Tables" returns that it is not.
      
      Currently, the driver checks the "READ ID Parameter" field directly
      after having enabled the feature. If the check fails it returns
      immediately but leaves the ECC on. When using buggy chips like
      MT29F2G08ABAGA and MT29F2G08ABBGA, all future read/program cycles will
      go through the on-die ECC, confusing the host controller which is
      supposed to be the one handling correction.
      
      To address this in a common way we need to turn off the on-die ECC
      directly after reading the "READ ID Parameter" and before checking the
      "ECC status".
      
      Cc: stable@vger.kernel.org
      Fixes: dbc44edb ("mtd: rawnand: micron: Fix on-die ECC detection logic")
      Signed-off-by: NMarco Felsch <m.felsch@pengutronix.de>
      Reviewed-by: NBoris Brezillon <boris.brezillon@collabora.com>
      Signed-off-by: NMiquel Raynal <miquel.raynal@bootlin.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e7bb4c81
    • G
      IB/hfi1: Fix Spectre v1 vulnerability · fafaeae4
      Gustavo A. R. Silva 提交于
      commit 6497d0a9c53df6e98b25e2b79f2295d7caa47b6e upstream.
      
      sl is controlled by user-space, hence leading to a potential
      exploitation of the Spectre variant 1 vulnerability.
      
      Fix this by sanitizing sl before using it to index ibp->sl_to_sc.
      
      Notice that given that speculation windows are large, the policy is
      to kill the speculation on the first load and not worry if it can be
      completed with a dependent load/store [1].
      
      [1] https://lore.kernel.org/lkml/20180423164740.GY17484@dhcp22.suse.cz/
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
      Link: https://lore.kernel.org/r/20190731175428.GA16736@embeddedorSigned-off-by: NDoug Ledford <dledford@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fafaeae4
    • M
      gpiolib: fix incorrect IRQ requesting of an active-low lineevent · fdb0fb56
      Michael Wu 提交于
      commit 223ecaf140b1dd1c1d2a1a1d96281efc5c906984 upstream.
      
      When a pin is active-low, logical trigger edge should be inverted to match
      the same interrupt opportunity.
      
      For example, a button pushed triggers falling edge in ACTIVE_HIGH case; in
      ACTIVE_LOW case, the button pushed triggers rising edge. For user space the
      IRQ requesting doesn't need to do any modification except to configuring
      GPIOHANDLE_REQUEST_ACTIVE_LOW.
      
      For example, we want to catch the event when the button is pushed. The
      button on the original board drives level to be low when it is pushed, and
      drives level to be high when it is released.
      
      In user space we can do:
      
      	req.handleflags = GPIOHANDLE_REQUEST_INPUT;
      	req.eventflags = GPIOEVENT_REQUEST_FALLING_EDGE;
      
      	while (1) {
      		read(fd, &dat, sizeof(dat));
      		if (dat.id == GPIOEVENT_EVENT_FALLING_EDGE)
      			printf("button pushed\n");
      	}
      
      Run the same logic on another board which the polarity of the button is
      inverted; it drives level to be high when pushed, and level to be low when
      released. For this inversion we add flag GPIOHANDLE_REQUEST_ACTIVE_LOW:
      
      	req.handleflags = GPIOHANDLE_REQUEST_INPUT |
      		GPIOHANDLE_REQUEST_ACTIVE_LOW;
      	req.eventflags = GPIOEVENT_REQUEST_FALLING_EDGE;
      
      At the result, there are no any events caught when the button is pushed.
      By the way, button releasing will emit a "falling" event. The timing of
      "falling" catching is not expected.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NMichael Wu <michael.wu@vatics.com>
      Tested-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
      Signed-off-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fdb0fb56
    • J
      mmc: meson-mx-sdio: Fix misuse of GENMASK macro · 7e3efb65
      Joe Perches 提交于
      commit 665e985c2f41bebc3e6cee7e04c36a44afbc58f7 upstream.
      
      Arguments are supposed to be ordered high then low.
      Signed-off-by: NJoe Perches <joe@perches.com>
      Reviewed-by: NNeil Armstrong <narmstrong@baylibre.com>
      Fixes: ed80a13b ("mmc: meson-mx-sdio: Add a driver for the Amlogic
      Meson8 and Meson8b SoCs")
      Cc: stable@vger.kernel.org
      Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7e3efb65
    • D
      mmc: dw_mmc: Fix occasional hang after tuning on eMMC · 29841b5c
      Douglas Anderson 提交于
      commit ba2d139b02ba684c6c101de42fed782d6cd2b997 upstream.
      
      In commit 46d17952 ("mmc: dw_mmc: Wait for data transfer after
      response errors.") we fixed a tuning-induced hang that I saw when
      stress testing tuning on certain SD cards.  I won't re-hash that whole
      commit, but the summary is that as a normal part of tuning you need to
      deal with transfer errors and there were cases where these transfer
      errors was putting my system into a bad state causing all future
      transfers to fail.  That commit fixed handling of the transfer errors
      for me.
      
      In downstream Chrome OS my fix landed and had the same behavior for
      all SD/MMC commands.  However, it looks like when the commit landed
      upstream we limited it to only SD tuning commands.  Presumably this
      was to try to get around problems that Alim Akhtar reported on exynos
      [1].
      
      Unfortunately while stress testing reboots (and suspend/resume) on
      some rk3288-based Chromebooks I found the same problem on the eMMC on
      some of my Chromebooks (the ones with Hynix eMMC).  Since the eMMC
      tuning command is different (MMC_SEND_TUNING_BLOCK_HS200
      vs. MMC_SEND_TUNING_BLOCK) we were basically getting back into the
      same situation.
      
      I'm hoping that whatever problems exynos was having in the past are
      somehow magically fixed now and we can make the behavior the same for
      all commands.
      
      [1] https://lkml.kernel.org/r/CAGOxZ53WfNbaMe0_AM0qBqU47kAfgmPBVZC8K8Y-_J3mDMqW4A@mail.gmail.com
      
      Fixes: 46d17952 ("mmc: dw_mmc: Wait for data transfer after response errors.")
      Signed-off-by: NDouglas Anderson <dianders@chromium.org>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Alim Akhtar <alim.akhtar@gmail.com>
      Cc: Enric Balletbo i Serra <enric.balletbo@collabora.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      29841b5c
    • F
      Btrfs: fix race leading to fs corruption after transaction abort · 50d70040
      Filipe Manana 提交于
      commit cb2d3daddbfb6318d170e79aac1f7d5e4d49f0d7 upstream.
      
      When one transaction is finishing its commit, it is possible for another
      transaction to start and enter its initial commit phase as well. If the
      first ends up getting aborted, we have a small time window where the second
      transaction commit does not notice that the previous transaction aborted
      and ends up committing, writing a superblock that points to btrees that
      reference extent buffers (nodes and leafs) that were not persisted to disk.
      The consequence is that after mounting the filesystem again, we will be
      unable to load some btree nodes/leafs, either because the content on disk
      is either garbage (or just zeroes) or corresponds to the old content of a
      previouly COWed or deleted node/leaf, resulting in the well known error
      messages "parent transid verify failed on ...".
      The following sequence diagram illustrates how this can happen.
      
              CPU 1                                           CPU 2
      
       <at transaction N>
      
       btrfs_commit_transaction()
         (...)
         --> sets transaction state to
             TRANS_STATE_UNBLOCKED
         --> sets fs_info->running_transaction
             to NULL
      
                                                          (...)
                                                          btrfs_start_transaction()
                                                            start_transaction()
                                                              wait_current_trans()
                                                                --> returns immediately
                                                                    because
                                                                    fs_info->running_transaction
                                                                    is NULL
                                                              join_transaction()
                                                                --> creates transaction N + 1
                                                                --> sets
                                                                    fs_info->running_transaction
                                                                    to transaction N + 1
                                                                --> adds transaction N + 1 to
                                                                    the fs_info->trans_list list
                                                              --> returns transaction handle
                                                                  pointing to the new
                                                                  transaction N + 1
                                                          (...)
      
                                                          btrfs_sync_file()
                                                            btrfs_start_transaction()
                                                              --> returns handle to
                                                                  transaction N + 1
                                                            (...)
      
         btrfs_write_and_wait_transaction()
           --> writeback of some extent
               buffer fails, returns an
      	 error
         btrfs_handle_fs_error()
           --> sets BTRFS_FS_STATE_ERROR in
               fs_info->fs_state
         --> jumps to label "scrub_continue"
         cleanup_transaction()
           btrfs_abort_transaction(N)
             --> sets BTRFS_FS_STATE_TRANS_ABORTED
                 flag in fs_info->fs_state
             --> sets aborted field in the
                 transaction and transaction
      	   handle structures, for
                 transaction N only
           --> removes transaction from the
               list fs_info->trans_list
                                                            btrfs_commit_transaction(N + 1)
                                                              --> transaction N + 1 was not
      							    aborted, so it proceeds
                                                              (...)
                                                              --> sets the transaction's state
                                                                  to TRANS_STATE_COMMIT_START
                                                              --> does not find the previous
                                                                  transaction (N) in the
                                                                  fs_info->trans_list, so it
                                                                  doesn't know that transaction
                                                                  was aborted, and the commit
                                                                  of transaction N + 1 proceeds
                                                              (...)
                                                              --> sets transaction N + 1 state
                                                                  to TRANS_STATE_UNBLOCKED
                                                              btrfs_write_and_wait_transaction()
                                                                --> succeeds writing all extent
                                                                    buffers created in the
                                                                    transaction N + 1
                                                              write_all_supers()
                                                                 --> succeeds
                                                                 --> we now have a superblock on
                                                                     disk that points to trees
                                                                     that refer to at least one
                                                                     extent buffer that was
                                                                     never persisted
      
      So fix this by updating the transaction commit path to check if the flag
      BTRFS_FS_STATE_TRANS_ABORTED is set on fs_info->fs_state if after setting
      the transaction to the TRANS_STATE_COMMIT_START we do not find any previous
      transaction in the fs_info->trans_list. If the flag is set, just fail the
      transaction commit with -EROFS, as we do in other places. The exact error
      code for the previous transaction abort was already logged and reported.
      
      Fixes: 49b25e05 ("btrfs: enhance transaction abort infrastructure")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      50d70040
    • F
      Btrfs: fix incremental send failure after deduplication · 009d7a4e
      Filipe Manana 提交于
      commit b4f9a1a87a48c255bb90d8a6c3d555a1abb88130 upstream.
      
      When doing an incremental send operation we can fail if we previously did
      deduplication operations against a file that exists in both snapshots. In
      that case we will fail the send operation with -EIO and print a message
      to dmesg/syslog like the following:
      
        BTRFS error (device sdc): Send: inconsistent snapshot, found updated \
        extent for inode 257 without updated inode item, send root is 258, \
        parent root is 257
      
      This requires that we deduplicate to the same file in both snapshots for
      the same amount of times on each snapshot. The issue happens because a
      deduplication only updates the iversion of an inode and does not update
      any other field of the inode, therefore if we deduplicate the file on
      each snapshot for the same amount of time, the inode will have the same
      iversion value (stored as the "sequence" field on the inode item) on both
      snapshots, therefore it will be seen as unchanged between in the send
      snapshot while there are new/updated/deleted extent items when comparing
      to the parent snapshot. This makes the send operation return -EIO and
      print an error message.
      
      Example reproducer:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
      
        # Create our first file. The first half of the file has several 64Kb
        # extents while the second half as a single 512Kb extent.
        $ xfs_io -f -s -c "pwrite -S 0xb8 -b 64K 0 512K" /mnt/foo
        $ xfs_io -c "pwrite -S 0xb8 512K 512K" /mnt/foo
      
        # Create the base snapshot and the parent send stream from it.
        $ btrfs subvolume snapshot -r /mnt /mnt/mysnap1
        $ btrfs send -f /tmp/1.snap /mnt/mysnap1
      
        # Create our second file, that has exactly the same data as the first
        # file.
        $ xfs_io -f -c "pwrite -S 0xb8 0 1M" /mnt/bar
      
        # Create the second snapshot, used for the incremental send, before
        # doing the file deduplication.
        $ btrfs subvolume snapshot -r /mnt /mnt/mysnap2
      
        # Now before creating the incremental send stream:
        #
        # 1) Deduplicate into a subrange of file foo in snapshot mysnap1. This
        #    will drop several extent items and add a new one, also updating
        #    the inode's iversion (sequence field in inode item) by 1, but not
        #    any other field of the inode;
        #
        # 2) Deduplicate into a different subrange of file foo in snapshot
        #    mysnap2. This will replace an extent item with a new one, also
        #    updating the inode's iversion by 1 but not any other field of the
        #    inode.
        #
        # After these two deduplication operations, the inode items, for file
        # foo, are identical in both snapshots, but we have different extent
        # items for this inode in both snapshots. We want to check this doesn't
        # cause send to fail with an error or produce an incorrect stream.
      
        $ xfs_io -r -c "dedupe /mnt/bar 0 0 512K" /mnt/mysnap1/foo
        $ xfs_io -r -c "dedupe /mnt/bar 512K 512K 512K" /mnt/mysnap2/foo
      
        # Create the incremental send stream.
        $ btrfs send -p /mnt/mysnap1 -f /tmp/2.snap /mnt/mysnap2
        ERROR: send ioctl failed with -5: Input/output error
      
      This issue started happening back in 2015 when deduplication was updated
      to not update the inode's ctime and mtime and update only the iversion.
      Back then we would hit a BUG_ON() in send, but later in 2016 send was
      updated to return -EIO and print the error message instead of doing the
      BUG_ON().
      
      A test case for fstests follows soon.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203933
      Fixes: 1c919a5e ("btrfs: don't update mtime/ctime on deduped inodes")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      009d7a4e
    • M
      kbuild: initialize CLANG_FLAGS correctly in the top Makefile · 4c5a4425
      Masahiro Yamada 提交于
      commit 5241ab4cf42d3a93b933b55d3d53f43049081fa1 upstream.
      
      CLANG_FLAGS is initialized by the following line:
      
        CLANG_FLAGS     := --target=$(notdir $(CROSS_COMPILE:%-=%))
      
      ..., which is run only when CROSS_COMPILE is set.
      
      Some build targets (bindeb-pkg etc.) recurse to the top Makefile.
      
      When you build the kernel with Clang but without CROSS_COMPILE,
      the same compiler flags such as -no-integrated-as are accumulated
      into CLANG_FLAGS.
      
      If you run 'make CC=clang' and then 'make CC=clang bindeb-pkg',
      Kbuild will recompile everything needlessly due to the build command
      change.
      
      Fix this by correctly initializing CLANG_FLAGS.
      
      Fixes: 238bcbc4e07f ("kbuild: consolidate Clang compiler flags")
      Cc: <stable@vger.kernel.org> # v5.0+
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Reviewed-by: NNathan Chancellor <natechancellor@gmail.com>
      Acked-by: NNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c5a4425
    • M
      kconfig: Clear "written" flag to avoid data loss · 3736612d
      M. Vefa Bicakci 提交于
      commit 0c5b6c28ed68becb692b43eae5e44d5aa7e160ce upstream.
      
      Prior to this commit, starting nconfig, xconfig or gconfig, and saving
      the .config file more than once caused data loss, where a .config file
      that contained only comments would be written to disk starting from the
      second save operation.
      
      This bug manifests itself because the SYMBOL_WRITTEN flag is never
      cleared after the first call to conf_write, and subsequent calls to
      conf_write then skip all of the configuration symbols due to the
      SYMBOL_WRITTEN flag being set.
      
      This commit resolves this issue by clearing the SYMBOL_WRITTEN flag
      from all symbols before conf_write returns.
      
      Fixes: 8e2442a5f86e ("kconfig: fix missing choice values in auto.conf")
      Cc: linux-stable <stable@vger.kernel.org> # 4.19+
      Signed-off-by: NM. Vefa Bicakci <m.v.b@runbox.com>
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3736612d
    • Y
      drm/nouveau: fix memory leak in nouveau_conn_reset() · 4c6500b5
      Yongxin Liu 提交于
      [ Upstream commit 09b90e2fe35faeace2488234e2a7728f2ea8ba26 ]
      
      In nouveau_conn_reset(), if connector->state is true,
      __drm_atomic_helper_connector_destroy_state() will be called,
      but the memory pointed by asyc isn't freed. Memory leak happens
      in the following function __drm_atomic_helper_connector_reset(),
      where newly allocated asyc->state will be assigned to connector->state.
      
      So using nouveau_conn_atomic_destroy_state() instead of
      __drm_atomic_helper_connector_destroy_state to free the "old" asyc.
      
      Here the is the log showing memory leak.
      
      unreferenced object 0xffff8c5480483c80 (size 192):
        comm "kworker/0:2", pid 188, jiffies 4294695279 (age 53.179s)
        hex dump (first 32 bytes):
          00 f0 ba 7b 54 8c ff ff 00 00 00 00 00 00 00 00  ...{T...........
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<000000005005c0d0>] kmem_cache_alloc_trace+0x195/0x2c0
          [<00000000a122baed>] nouveau_conn_reset+0x25/0xc0 [nouveau]
          [<000000004fd189a2>] nouveau_connector_create+0x3a7/0x610 [nouveau]
          [<00000000c73343a8>] nv50_display_create+0x343/0x980 [nouveau]
          [<000000002e2b03c3>] nouveau_display_create+0x51f/0x660 [nouveau]
          [<00000000c924699b>] nouveau_drm_device_init+0x182/0x7f0 [nouveau]
          [<00000000cc029436>] nouveau_drm_probe+0x20c/0x2c0 [nouveau]
          [<000000007e961c3e>] local_pci_probe+0x47/0xa0
          [<00000000da14d569>] work_for_cpu_fn+0x1a/0x30
          [<0000000028da4805>] process_one_work+0x27c/0x660
          [<000000001d415b04>] worker_thread+0x22b/0x3f0
          [<0000000003b69f1f>] kthread+0x12f/0x150
          [<00000000c94c29b7>] ret_from_fork+0x3a/0x50
      Signed-off-by: NYongxin Liu <yongxin.liu@windriver.com>
      Signed-off-by: NBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      4c6500b5
    • Z
      x86, boot: Remove multiple copy of static function sanitize_boot_params() · 84ce0452
      Zhenzhong Duan 提交于
      [ Upstream commit 8c5477e8046ca139bac250386c08453da37ec1ae ]
      
      Kernel build warns:
       'sanitize_boot_params' defined but not used [-Wunused-function]
      
      at below files:
        arch/x86/boot/compressed/cmdline.c
        arch/x86/boot/compressed/error.c
        arch/x86/boot/compressed/early_serial_console.c
        arch/x86/boot/compressed/acpi.c
      
      That's becausethey each include misc.h which includes a definition of
      sanitize_boot_params() via bootparam_utils.h.
      
      Remove the inclusion from misc.h and have the c file including
      bootparam_utils.h directly.
      Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/1563283092-1189-1-git-send-email-zhenzhong.duan@oracle.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      84ce0452
    • J
      x86/paravirt: Fix callee-saved function ELF sizes · 740e0167
      Josh Poimboeuf 提交于
      [ Upstream commit 083db6764821996526970e42d09c1ab2f4155dd4 ]
      
      The __raw_callee_save_*() functions have an ELF symbol size of zero,
      which confuses objtool and other tools.
      
      Fixes a bunch of warnings like the following:
      
        arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_pte_val() is missing an ELF size annotation
        arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_pgd_val() is missing an ELF size annotation
        arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_make_pte() is missing an ELF size annotation
        arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_make_pgd() is missing an ELF size annotation
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NJuergen Gross <jgross@suse.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/afa6d49bb07497ca62e4fc3b27a2d0cece545b4e.1563413318.git.jpoimboe@redhat.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      740e0167
    • J
      x86/kvm: Don't call kvm_spurious_fault() from .fixup · ba5c072f
      Josh Poimboeuf 提交于
      [ Upstream commit 3901336ed9887b075531bffaeef7742ba614058b ]
      
      After making a change to improve objtool's sibling call detection, it
      started showing the following warning:
      
        arch/x86/kvm/vmx/nested.o: warning: objtool: .fixup+0x15: sibling call from callable instruction with modified stack frame
      
      The problem is the ____kvm_handle_fault_on_reboot() macro.  It does a
      fake call by pushing a fake RIP and doing a jump.  That tricks the
      unwinder into printing the function which triggered the exception,
      rather than the .fixup code.
      
      Instead of the hack to make it look like the original function made the
      call, just change the macro so that the original function actually does
      make the call.  This allows removal of the hack, and also makes objtool
      happy.
      
      I triggered a vmx instruction exception and verified that the stack
      trace is still sane:
      
        kernel BUG at arch/x86/kvm/x86.c:358!
        invalid opcode: 0000 [#1] SMP PTI
        CPU: 28 PID: 4096 Comm: qemu-kvm Not tainted 5.2.0+ #16
        Hardware name: Lenovo THINKSYSTEM SD530 -[7X2106Z000]-/-[7X2106Z000]-, BIOS -[TEE113Z-1.00]- 07/17/2017
        RIP: 0010:kvm_spurious_fault+0x5/0x10
        Code: 00 00 00 00 00 8b 44 24 10 89 d2 45 89 c9 48 89 44 24 10 8b 44 24 08 48 89 44 24 08 e9 d4 40 22 00 0f 1f 40 00 0f 1f 44 00 00 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41
        RSP: 0018:ffffbf91c683bd00 EFLAGS: 00010246
        RAX: 000061f040000000 RBX: ffff9e159c77bba0 RCX: ffff9e15a5c87000
        RDX: 0000000665c87000 RSI: ffff9e15a5c87000 RDI: ffff9e159c77bba0
        RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9e15a5c87000
        R10: 0000000000000000 R11: fffff8f2d99721c0 R12: ffff9e159c77bba0
        R13: ffffbf91c671d960 R14: ffff9e159c778000 R15: 0000000000000000
        FS:  00007fa341cbe700(0000) GS:ffff9e15b7400000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007fdd38356804 CR3: 00000006759de003 CR4: 00000000007606e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        PKRU: 55555554
        Call Trace:
         loaded_vmcs_init+0x4f/0xe0
         alloc_loaded_vmcs+0x38/0xd0
         vmx_create_vcpu+0xf7/0x600
         kvm_vm_ioctl+0x5e9/0x980
         ? __switch_to_asm+0x40/0x70
         ? __switch_to_asm+0x34/0x70
         ? __switch_to_asm+0x40/0x70
         ? __switch_to_asm+0x34/0x70
         ? free_one_page+0x13f/0x4e0
         do_vfs_ioctl+0xa4/0x630
         ksys_ioctl+0x60/0x90
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x55/0x1c0
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7fa349b1ee5b
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/64a9b64d127e87b6920a97afde8e96ea76f6524e.1563413318.git.jpoimboe@redhat.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      ba5c072f
    • Z
      xen/pv: Fix a boot up hang revealed by int3 self test · 11cb9f87
      Zhenzhong Duan 提交于
      [ Upstream commit b23e5844dfe78a80ba672793187d3f52e4b528d7 ]
      
      Commit 7457c0da024b ("x86/alternatives: Add int3_emulate_call()
      selftest") is used to ensure there is a gap setup in int3 exception stack
      which could be used for inserting call return address.
      
      This gap is missed in XEN PV int3 exception entry path, then below panic
      triggered:
      
      [    0.772876] general protection fault: 0000 [#1] SMP NOPTI
      [    0.772886] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0+ #11
      [    0.772893] RIP: e030:int3_magic+0x0/0x7
      [    0.772905] RSP: 3507:ffffffff82203e98 EFLAGS: 00000246
      [    0.773334] Call Trace:
      [    0.773334]  alternative_instructions+0x3d/0x12e
      [    0.773334]  check_bugs+0x7c9/0x887
      [    0.773334]  ? __get_locked_pte+0x178/0x1f0
      [    0.773334]  start_kernel+0x4ff/0x535
      [    0.773334]  ? set_init_arg+0x55/0x55
      [    0.773334]  xen_start_kernel+0x571/0x57a
      
      For 64bit PV guests, Xen's ABI enters the kernel with using SYSRET, with
      %rcx/%r11 on the stack. To convert back to "normal" looking exceptions,
      the xen thunks do 'xen_*: pop %rcx; pop %r11; jmp *'.
      
      E.g. Extracting 'xen_pv_trap xenint3' we have:
      xen_xenint3:
       pop %rcx;
       pop %r11;
       jmp xenint3
      
      As xenint3 and int3 entry code are same except xenint3 doesn't generate
      a gap, we can fix it by using int3 and drop useless xenint3.
      Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com>
      Reviewed-by: NJuergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      11cb9f87
    • P
      mlxsw: spectrum_dcb: Configure DSCP map as the last rule is removed · d3e36788
      Petr Machata 提交于
      [ Upstream commit dedfde2fe1c4ccf27179fcb234e2112d065c39bb ]
      
      Spectrum systems use DSCP rewrite map to update DSCP field in egressing
      packets to correspond to priority that the packet has. Whether rewriting
      will take place is determined at the point when the packet ingresses the
      switch: if the port is in Trust L3 mode, packet priority is determined from
      the DSCP map at the port, and DSCP rewrite will happen. If the port is in
      Trust L2 mode, 802.1p is used for packet prioritization, and no DSCP
      rewrite will happen.
      
      The driver determines the port trust mode based on whether any DSCP
      prioritization rules are in effect at given port. If there are any, trust
      level is L3, otherwise it's L2. When the last DSCP rule is removed, the
      port is switched to trust L2. Under that scenario, if DSCP of a packet
      should be rewritten, it should be rewritten to 0.
      
      However, when switching to Trust L2, the driver neglects to also update the
      DSCP rewrite map. The last DSCP rule thus remains in effect, and packets
      egressing through this port, if they have the right priority, will have
      their DSCP set according to this rule.
      
      Fix by first configuring the rewrite map, and only then switching to trust
      L2 and bailing out.
      
      Fixes: b2b1dab6 ("mlxsw: spectrum: Support ieee_setapp, ieee_delapp")
      Signed-off-by: NPetr Machata <petrm@mellanox.com>
      Reported-by: NAlex Veber <alexve@mellanox.com>
      Tested-by: NAlex Veber <alexve@mellanox.com>
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      d3e36788
    • K
      ipc/mqueue.c: only perform resource calculation if user valid · 48c5c4f0
      Kees Cook 提交于
      [ Upstream commit a318f12ed8843cfac53198390c74a565c632f417 ]
      
      Andreas Christoforou reported:
      
        UBSAN: Undefined behaviour in ipc/mqueue.c:414:49 signed integer overflow:
        9 * 2305843009213693951 cannot be represented in type 'long int'
        ...
        Call Trace:
          mqueue_evict_inode+0x8e7/0xa10 ipc/mqueue.c:414
          evict+0x472/0x8c0 fs/inode.c:558
          iput_final fs/inode.c:1547 [inline]
          iput+0x51d/0x8c0 fs/inode.c:1573
          mqueue_get_inode+0x8eb/0x1070 ipc/mqueue.c:320
          mqueue_create_attr+0x198/0x440 ipc/mqueue.c:459
          vfs_mkobj+0x39e/0x580 fs/namei.c:2892
          prepare_open ipc/mqueue.c:731 [inline]
          do_mq_open+0x6da/0x8e0 ipc/mqueue.c:771
      
      Which could be triggered by:
      
              struct mq_attr attr = {
                      .mq_flags = 0,
                      .mq_maxmsg = 9,
                      .mq_msgsize = 0x1fffffffffffffff,
                      .mq_curmsgs = 0,
              };
      
              if (mq_open("/testing", 0x40, 3, &attr) == (mqd_t) -1)
                      perror("mq_open");
      
      mqueue_get_inode() was correctly rejecting the giant mq_msgsize, and
      preparing to return -EINVAL.  During the cleanup, it calls
      mqueue_evict_inode() which performed resource usage tracking math for
      updating "user", before checking if there was a valid "user" at all
      (which would indicate that the calculations would be sane).  Instead,
      delay this check to after seeing a valid "user".
      
      The overflow was real, but the results went unused, so while the flaw is
      harmless, it's noisy for kernel fuzzers, so just fix it by moving the
      calculation under the non-NULL "user" where it actually gets used.
      
      Link: http://lkml.kernel.org/r/201906072207.ECB65450@keescookSigned-off-by: NKees Cook <keescook@chromium.org>
      Reported-by: NAndreas Christoforou <andreaschristofo@gmail.com>
      Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      48c5c4f0
    • D
      drivers/rapidio/devices/rio_mport_cdev.c: NUL terminate some strings · f8731422
      Dan Carpenter 提交于
      [ Upstream commit 156e0b1a8112b76e351684ac948c59757037ac36 ]
      
      The dev_info.name[] array has space for RIO_MAX_DEVNAME_SZ + 1
      characters.  But the problem here is that we don't ensure that the user
      put a NUL terminator on the end of the string.  It could lead to an out
      of bounds read.
      
      Link: http://lkml.kernel.org/r/20190529110601.GB19119@mwanda
      Fixes: e8de3701 ("rapidio: add mport char device driver")
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Acked-by: NAlexandre Bounine <alex.bou9@gmail.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      f8731422
    • M
      uapi linux/coda_psdev.h: move upc_req definition from uapi to kernel side headers · 35ee8b84
      Mikko Rapeli 提交于
      [ Upstream commit f90fb3c7e2c13ae829db2274b88b845a75038b8a ]
      
      Only users of upc_req in kernel side fs/coda/psdev.c and
      fs/coda/upcall.c already include linux/coda_psdev.h.
      
      Suggested by Jan Harkes <jaharkes@cs.cmu.edu> in
        https://lore.kernel.org/lkml/20150531111913.GA23377@cs.cmu.edu/
      
      Fixes these include/uapi/linux/coda_psdev.h compilation errors in userspace:
      
        linux/coda_psdev.h:12:19: error: field `uc_chain' has incomplete type
        struct list_head    uc_chain;
                         ^
        linux/coda_psdev.h:13:2: error: unknown type name `caddr_t'
        caddr_t             uc_data;
        ^
        linux/coda_psdev.h:14:2: error: unknown type name `u_short'
        u_short             uc_flags;
        ^
        linux/coda_psdev.h:15:2: error: unknown type name `u_short'
        u_short             uc_inSize;  /* Size is at most 5000 bytes */
        ^
        linux/coda_psdev.h:16:2: error: unknown type name `u_short'
        u_short             uc_outSize;
        ^
        linux/coda_psdev.h:17:2: error: unknown type name `u_short'
        u_short             uc_opcode;  /* copied from data to save lookup */
        ^
        linux/coda_psdev.h:19:2: error: unknown type name `wait_queue_head_t'
        wait_queue_head_t   uc_sleep;   /* process' wait queue */
        ^
      
      Link: http://lkml.kernel.org/r/9f99f5ce6a0563d5266e6cf7aa9585aac2cae971.1558117389.git.jaharkes@cs.cmu.eduSigned-off-by: NMikko Rapeli <mikko.rapeli@iki.fi>
      Signed-off-by: NJan Harkes <jaharkes@cs.cmu.edu>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Fabian Frederick <fabf@skynet.be>
      Cc: Sam Protsenko <semen.protsenko@linaro.org>
      Cc: Yann Droneaud <ydroneaud@opteya.com>
      Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      35ee8b84
    • S
      coda: fix build using bare-metal toolchain · dea2ee49
      Sam Protsenko 提交于
      [ Upstream commit b2a57e334086602be56b74958d9f29b955cd157f ]
      
      The kernel is self-contained project and can be built with bare-metal
      toolchain.  But bare-metal toolchain doesn't define __linux__.  Because
      of this u_quad_t type is not defined when using bare-metal toolchain and
      codafs build fails.  This patch fixes it by defining u_quad_t type
      unconditionally.
      
      Link: http://lkml.kernel.org/r/3cbb40b0a57b6f9923a9d67b53473c0b691a3eaa.1558117389.git.jaharkes@cs.cmu.eduSigned-off-by: NSam Protsenko <semen.protsenko@linaro.org>
      Signed-off-by: NJan Harkes <jaharkes@cs.cmu.edu>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Fabian Frederick <fabf@skynet.be>
      Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
      Cc: Yann Droneaud <ydroneaud@opteya.com>
      Cc: Zhouyang Jia <jiazhouyang09@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      dea2ee49
    • Z
      coda: add error handling for fget · cf3ddc00
      Zhouyang Jia 提交于
      [ Upstream commit 02551c23bcd85f0c68a8259c7b953d49d44f86af ]
      
      When fget fails, the lack of error-handling code may cause unexpected
      results.
      
      This patch adds error-handling code after calling fget.
      
      Link: http://lkml.kernel.org/r/2514ec03df9c33b86e56748513267a80dd8004d9.1558117389.git.jaharkes@cs.cmu.eduSigned-off-by: NZhouyang Jia <jiazhouyang09@gmail.com>
      Signed-off-by: NJan Harkes <jaharkes@cs.cmu.edu>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Fabian Frederick <fabf@skynet.be>
      Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
      Cc: Sam Protsenko <semen.protsenko@linaro.org>
      Cc: Yann Droneaud <ydroneaud@opteya.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      cf3ddc00
    • P
      lib/test_string.c: avoid masking memset16/32/64 failures · 93b83005
      Peter Rosin 提交于
      [ Upstream commit 33d6e0ff68af74be0c846c8e042e84a9a1a0561e ]
      
      If a memsetXX implementation is completely broken and fails in the first
      iteration, when i, j, and k are all zero, the failure is masked as zero
      is returned.  Failing in the first iteration is perhaps the most likely
      failure, so this makes the tests pretty much useless.  Avoid the
      situation by always setting a random unused bit in the result on
      failure.
      
      Link: http://lkml.kernel.org/r/20190506124634.6807-3-peda@axentia.se
      Fixes: 03270c13 ("lib/string.c: add testcases for memset16/32/64")
      Signed-off-by: NPeter Rosin <peda@axentia.se>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      93b83005
    • K
      lib/test_overflow.c: avoid tainting the kernel and fix wrap size · 8e087a2a
      Kees Cook 提交于
      [ Upstream commit 8e060c21ae2c265a2b596e9e7f9f97ec274151a4 ]
      
      This adds __GFP_NOWARN to the kmalloc()-portions of the overflow test to
      avoid tainting the kernel.  Additionally fixes up the math on wrap size
      to be architecture and page size agnostic.
      
      Link: http://lkml.kernel.org/r/201905282012.0A8767E24@keescook
      Fixes: ca90800a ("test_overflow: Add memory allocation overflow tests")
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Reported-by: NRandy Dunlap <rdunlap@infradead.org>
      Suggested-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      8e087a2a
    • D
      mm/cma.c: fail if fixed declaration can't be honored · 439c79ed
      Doug Berger 提交于
      [ Upstream commit c633324e311243586675e732249339685e5d6faa ]
      
      The description of cma_declare_contiguous() indicates that if the
      'fixed' argument is true the reserved contiguous area must be exactly at
      the address of the 'base' argument.
      
      However, the function currently allows the 'base', 'size', and 'limit'
      arguments to be silently adjusted to meet alignment constraints.  This
      commit enforces the documented behavior through explicit checks that
      return an error if the region does not fit within a specified region.
      
      Link: http://lkml.kernel.org/r/1561422051-16142-1-git-send-email-opendmb@gmail.com
      Fixes: 5ea3b1b2 ("cma: add placement specifier for "cma=" kernel parameter")
      Signed-off-by: NDoug Berger <opendmb@gmail.com>
      Acked-by: NMichal Nazarewicz <mina86@mina86.com>
      Cc: Yue Hu <huyue2@yulong.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Peng Fan <peng.fan@nxp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      439c79ed
    • A
      x86: math-emu: Hide clang warnings for 16-bit overflow · 1b84e674
      Arnd Bergmann 提交于
      [ Upstream commit 29e7e9664aec17b94a9c8c5a75f8d216a206aa3a ]
      
      clang warns about a few parts of the math-emu implementation
      where a 16-bit integer becomes negative during assignment:
      
      arch/x86/math-emu/poly_tan.c:88:35: error: implicit conversion from 'int' to 'short' changes value from 49216 to -16320 [-Werror,-Wconstant-conversion]
                                            (0x41 + EXTENDED_Ebias) | SIGN_Negative);
                                            ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~
      arch/x86/math-emu/fpu_emu.h:180:58: note: expanded from macro 'setexponent16'
       #define setexponent16(x,y)  { (*(short *)&((x)->exp)) = (y); }
                                                            ~  ^
      arch/x86/math-emu/reg_constant.c:37:32: error: implicit conversion from 'int' to 'short' changes value from 49085 to -16451 [-Werror,-Wconstant-conversion]
      FPU_REG const CONST_PI2extra = MAKE_REG(NEG, -66,
                                     ^~~~~~~~~~~~~~~~~~
      arch/x86/math-emu/reg_constant.c:21:25: note: expanded from macro 'MAKE_REG'
                      ((EXTENDED_Ebias+(e)) | ((SIGN_##s != 0)*0x8000)) }
                       ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
      arch/x86/math-emu/reg_constant.c:48:28: error: implicit conversion from 'int' to 'short' changes value from 65535 to -1 [-Werror,-Wconstant-conversion]
      FPU_REG const CONST_QNaN = MAKE_REG(NEG, EXP_OVER, 0x00000000, 0xC0000000);
                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      arch/x86/math-emu/reg_constant.c:21:25: note: expanded from macro 'MAKE_REG'
                      ((EXTENDED_Ebias+(e)) | ((SIGN_##s != 0)*0x8000)) }
                       ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
      
      The code is correct as is, so add a typecast to shut up the warnings.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190712090816.350668-1-arnd@arndb.deSigned-off-by: NSasha Levin <sashal@kernel.org>
      1b84e674
    • Q
      x86/apic: Silence -Wtype-limits compiler warnings · 242666b2
      Qian Cai 提交于
      [ Upstream commit ec6335586953b0df32f83ef696002063090c7aef ]
      
      There are many compiler warnings like this,
      
      In file included from ./arch/x86/include/asm/smp.h:13,
                       from ./arch/x86/include/asm/mmzone_64.h:11,
                       from ./arch/x86/include/asm/mmzone.h:5,
                       from ./include/linux/mmzone.h:969,
                       from ./include/linux/gfp.h:6,
                       from ./include/linux/mm.h:10,
                       from arch/x86/kernel/apic/io_apic.c:34:
      arch/x86/kernel/apic/io_apic.c: In function 'check_timer':
      ./arch/x86/include/asm/apic.h:37:11: warning: comparison of unsigned
      expression >= 0 is always true [-Wtype-limits]
         if ((v) <= apic_verbosity) \
                 ^~
      arch/x86/kernel/apic/io_apic.c:2160:2: note: in expansion of macro
      'apic_printk'
        apic_printk(APIC_QUIET, KERN_INFO "..TIMER: vector=0x%02X "
        ^~~~~~~~~~~
      ./arch/x86/include/asm/apic.h:37:11: warning: comparison of unsigned
      expression >= 0 is always true [-Wtype-limits]
         if ((v) <= apic_verbosity) \
                 ^~
      arch/x86/kernel/apic/io_apic.c:2207:4: note: in expansion of macro
      'apic_printk'
          apic_printk(APIC_QUIET, KERN_ERR "..MP-BIOS bug: "
          ^~~~~~~~~~~
      
      APIC_QUIET is 0, so silence them by making apic_verbosity type int.
      Signed-off-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/1562621805-24789-1-git-send-email-cai@lca.pwSigned-off-by: NSasha Levin <sashal@kernel.org>
      242666b2
    • B
      be2net: Signal that the device cannot transmit during reconfiguration · ade866ad
      Benjamin Poirier 提交于
      [ Upstream commit 7429c6c0d9cb086d8e79f0d2a48ae14851d2115e ]
      
      While changing the number of interrupt channels, be2net stops adapter
      operation (including netif_tx_disable()) but it doesn't signal that it
      cannot transmit. This may lead dev_watchdog() to falsely trigger during
      that time.
      
      Add the missing call to netif_carrier_off(), following the pattern used in
      many other drivers. netif_carrier_on() is already taken care of in
      be_open().
      Signed-off-by: NBenjamin Poirier <bpoirier@suse.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ade866ad