1. 17 January 2018, 1 commit
  2. 16 January 2018, 3 commits
  3. 14 January 2018, 1 commit
  4. 12 January 2018, 2 commits
    • net/mlx5: Fix get vector affinity helper function · 05e0cc84
      Committed by Saeed Mahameed
      mlx5_get_vector_affinity used to call pci_irq_get_affinity. After
      reverting the patch that set the device affinity via the
      PCI_IRQ_AFFINITY API, calling pci_irq_get_affinity became useless and
      broke RDMA mlx5 users.  To fix this, this patch provides an
      alternative way to retrieve IRQ vector affinity using the legacy IRQ
      API, following the smp_affinity procfs read implementation.
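
      A minimal sketch of such a helper, assuming the legacy irq_to_desc()
      lookup and an affinity hint populated by the driver (the
      vector-to-IRQ lookup helper and field names here are illustrative,
      not necessarily the exact patch):

        static inline const struct cpumask *
        mlx5_get_vector_affinity(struct mlx5_core_dev *dev, int vector)
        {
                /* assumed: driver-private mapping from vector to IRQ number */
                unsigned int irq = mlx5_vector2irqn(dev, vector);
                struct irq_desc *desc = irq_to_desc(irq);

                /* prefer the hint the driver set, like a smp_affinity read */
                return desc->affinity_hint ?: desc->irq_common_data.affinity;
        }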
      
      Fixes: 231243c8 ("Revert mlx5: move affinity hints assignments to generic code")
      Fixes: a435393a ("mlx5: move affinity hints assignments to generic code")
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • {net,ib}/mlx5: Don't disable local loopback multicast traffic when needed · 8978cc92
      Committed by Eran Ben Elisha
      There are system platform information management interfaces (such as
      HOST2BMC) for which local loopback multicast traffic cannot be disabled.
      
      Separate the disable_local_lb_mc and disable_local_lb_uc capability bits
      so the driver will not disable multicast loopback traffic when doing so
      is not supported.  (It is expected that firmware will not set
      disable_local_lb_mc if, for example, HOST2BMC is running.)
      
      The function mlx5_nic_vport_update_local_lb makes a best effort to
      disable/enable UC/MC loopback traffic and returns success only if it
      managed to change everything the firmware allows.
      
      Adapt mlx5_ib and mlx5e to support the new cap bits.
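
      A sketch of the gating this implies, with the two capability bits
      checked independently (the MLX5_CAP_GEN/MLX5_SET usage here is
      illustrative, not the exact patch):

        int mlx5_nic_vport_update_local_lb(struct mlx5_core_dev *mdev, bool enable)
        {
                u32 in[MLX5_ST_SZ_DW(modify_nic_vport_context_in)] = {};

                /* nothing to do if firmware supports neither knob */
                if (!MLX5_CAP_GEN(mdev, disable_local_lb_mc) &&
                    !MLX5_CAP_GEN(mdev, disable_local_lb_uc))
                        return 0;

                /* only select the fields the firmware allows us to change */
                if (MLX5_CAP_GEN(mdev, disable_local_lb_mc))
                        MLX5_SET(modify_nic_vport_context_in, in,
                                 field_select.disable_mc_local_lb, 1);
                if (MLX5_CAP_GEN(mdev, disable_local_lb_uc))
                        MLX5_SET(modify_nic_vport_context_in, in,
                                 field_select.disable_uc_local_lb, 1);

                return mlx5_modify_nic_vport_context(mdev, in, sizeof(in));
        }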
      
      Fixes: 2c43c5a0 ("net/mlx5e: Enable local loopback in loopback selftest")
      Fixes: c85023e1 ("IB/mlx5: Add raw ethernet local loopback support")
      Fixes: bded747b ("net/mlx5: Add raw ethernet local loopback firmware command")
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Cc: kernel-team@fb.com
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
  5. 10 January 2018, 1 commit
    • bpf: avoid false sharing of map refcount with max_entries · be95a845
      Committed by Daniel Borkmann
      In addition to commit b2157399 ("bpf: prevent out-of-bounds
      speculation"), also change the layout of struct bpf_map such that
      false sharing of fast-path members like max_entries is avoided
      when the map's reference counter is altered. Therefore enforce
      them to be placed into separate cachelines.
      
      pahole dump after change:
      
        struct bpf_map {
              const struct bpf_map_ops  * ops;                 /*     0     8 */
              struct bpf_map *           inner_map_meta;       /*     8     8 */
              void *                     security;             /*    16     8 */
              enum bpf_map_type          map_type;             /*    24     4 */
              u32                        key_size;             /*    28     4 */
              u32                        value_size;           /*    32     4 */
              u32                        max_entries;          /*    36     4 */
              u32                        map_flags;            /*    40     4 */
              u32                        pages;                /*    44     4 */
              u32                        id;                   /*    48     4 */
              int                        numa_node;            /*    52     4 */
              bool                       unpriv_array;         /*    56     1 */
      
              /* XXX 7 bytes hole, try to pack */
      
              /* --- cacheline 1 boundary (64 bytes) --- */
              struct user_struct *       user;                 /*    64     8 */
              atomic_t                   refcnt;               /*    72     4 */
              atomic_t                   usercnt;              /*    76     4 */
              struct work_struct         work;                 /*    80    32 */
              char                       name[16];             /*   112    16 */
              /* --- cacheline 2 boundary (128 bytes) --- */
      
              /* size: 128, cachelines: 2, members: 17 */
              /* sum members: 121, holes: 1, sum holes: 7 */
        };
      
      Now all entries in the first cacheline are read-only throughout the
      lifetime of the map and are set up once during map creation. Overall
      struct size and number of cachelines don't change from the
      reordering. struct bpf_map is usually the first member embedded in
      the map structs of specific map implementations, so also avoid
      letting those members sit at the end, where they could share a
      cacheline with the first map values, e.g. in the array, since remote
      CPUs could trigger map updates for those just as well (easily and
      intentionally dirtying members like max_entries) while having
      subsequent values in cache.
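
      A minimal sketch of how the split can be enforced with the kernel's
      ____cacheline_aligned annotation (field subset illustrative):

        struct bpf_map {
                /* first cacheline: read-mostly, written once at map creation */
                const struct bpf_map_ops *ops ____cacheline_aligned;
                struct bpf_map *inner_map_meta;
                u32 max_entries;
                /* ... remaining read-only fields ... */

                /* second cacheline: refcounts, dirtied on the control path */
                struct user_struct *user ____cacheline_aligned;
                atomic_t refcnt;
                atomic_t usercnt;
        };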
      
      Quoting from Google's Project Zero blog [1]:
      
        Additionally, at least on the Intel machine on which this was
        tested, bouncing modified cache lines between cores is slow,
        apparently because the MESI protocol is used for cache coherence
        [8]. Changing the reference counter of an eBPF array on one
        physical CPU core causes the cache line containing the reference
        counter to be bounced over to that CPU core, making reads of the
        reference counter on all other CPU cores slow until the changed
        reference counter has been written back to memory. Because the
        length and the reference counter of an eBPF array are stored in
        the same cache line, this also means that changing the reference
        counter on one physical CPU core causes reads of the eBPF array's
        length to be slow on other physical CPU cores (intentional false
        sharing).
      
      While this doesn't 'control' the out-of-bounds speculation through
      masking the index as in commit b2157399, triggering a manipulation
      of the map's reference counter is really trivial, so let's not allow
      max_entries to be easily affected by it.
      
      Splitting them into separate cachelines also generally makes sense
      from a performance perspective, in that the fast path won't take a
      cache miss when the map gets pinned or reused in other programs from
      the control path; this likewise avoids unintentional false sharing.
      
        [1] https://googleprojectzero.blogspot.ch/2018/01/reading-privileged-memory-with-side.html

      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  6. 09 January 2018, 2 commits
    • bpf: prevent out-of-bounds speculation · b2157399
      Committed by Alexei Starovoitov
      Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
      memory accesses under a bounds check may be speculated even if the
      bounds check fails, providing a primitive for building a side channel.
      
      To avoid leaking kernel data, round up array-based maps and mask the
      index after the bounds check, so a speculated load with an
      out-of-bounds index will load either a valid value from the array or
      zero from the padded area.
      
      Unconditionally mask the index for all array types, even when
      max_entries is not rounded up to a power of 2 for the root user.
      When a map is created by an unprivileged user, generate a sequence of
      bpf insns that includes an AND operation to make sure that the JITed
      code includes the same 'index & index_mask' operation.
      
      If a prog_array map is created by an unprivileged user, replace
        bpf_tail_call(ctx, map, index);
      with
        if (index < max_entries) {
          index &= map->index_mask;
          bpf_tail_call(ctx, map, index);
        }
      (along with rounding up to a power of 2) to prevent out-of-bounds
      speculation. There is a secondary, redundant 'if (index >= max_entries)'
      check in the interpreter and in all JITs, but it can be optimized out
      later if necessary.
      
      Other array-like maps (cpumap, devmap, sockmap, perf_event_array,
      cgroup_array) cannot be used by unprivileged users, so no changes are
      needed there.
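
      A sketch of the masking idea on the array lookup path (the real
      changes also cover the interpreter, verifier fixups and the syscall
      paths; field names follow the description above):

        static void *array_map_lookup_elem(struct bpf_map *map, void *key)
        {
                struct bpf_array *array = container_of(map, struct bpf_array, map);
                u32 index = *(u32 *)key;

                if (unlikely(index >= array->map.max_entries))
                        return NULL;

                /* index_mask = roundup_pow_of_two(max_entries) - 1; the AND
                 * bounds the speculated load even if the branch above is
                 * mispredicted.
                 */
                return array->value + array->elem_size * (index & array->index_mask);
        }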
      
      That fixes the bpf side of "Variant 1: bounds check bypass
      (CVE-2017-5753)" on all architectures, with and without JIT.
      
      v2->v3:
      Daniel noticed that the attack can potentially be crafted via syscall
      commands without loading a program, so add masking to those paths as well.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • locking/lockdep: Remove cross-release leftovers · 527187d2
      Committed by Ingo Molnar
      There are two leftover cross-release facilities:
      
       - the crossrelease_hist_*() irq-tracing callbacks (NOPs currently)
       - the complete_release_commit() callback (NOP as well)
      
      Remove them.
      
      Cc: David Sterba <dsterba@suse.com>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  7. 08 January 2018, 1 commit
  8. 06 January 2018, 2 commits
  9. 03 January 2018, 1 commit
    • efi/capsule-loader: Reinstate virtual capsule mapping · f24c4d47
      Committed by Ard Biesheuvel
      Commit:
      
        82c3768b ("efi/capsule-loader: Use a cached copy of the capsule header")
      
      ... refactored the capsule loading code that maps the capsule header,
      to avoid having to map it several times.
      
      However, as it turns out, the vmap() call we ended up removing did not
      just map the header, but the entire capsule image, and dropping this
      virtual mapping breaks capsules that are processed by the firmware
      immediately (i.e., without a reboot).
      
      Unfortunately, that change was part of a larger refactor that allowed
      a quirk to be implemented for Quark, which has a non-standard memory
      layout for capsules, and we have slightly painted ourselves into a
      corner by allowing quirk code to mangle the capsule header and memory
      layout.
      
      So we need to fix this without breaking Quark. Fortunately, Quark does
      not appear to care about the virtual mapping, and so we can simply
      do a partial revert of commit:
      
        2a457fb3 ("efi/capsule-loader: Use page addresses rather than struct page pointers")
      
      ... and create a vmap() mapping of the entire capsule (including header)
      based on the reinstated struct page array, unless running on Quark, in
      which case we pass the capsule header copy as before.
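
      A sketch of the reinstated mapping, assuming a cap_info that carries
      the struct page array and its page count (field names illustrative):

        /* map the whole capsule, header included, into one VA range */
        cap_info->capsule = vmap(cap_info->pages, cap_info->index,
                                 VM_MAP, PAGE_KERNEL);
        if (!cap_info->capsule)
                return -ENOMEM;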
      Reported-by: Ge Song <ge.song@hxt-semitech.com>
      Tested-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
      Tested-by: Ge Song <ge.song@hxt-semitech.com>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: <stable@vger.kernel.org>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Fixes: 82c3768b ("efi/capsule-loader: Use a cached copy of the capsule header")
      Link: http://lkml.kernel.org/r/20180102172110.17018-3-ard.biesheuvel@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  10. 02 January 2018, 1 commit
    • fscache: Fix the default for fscache_maybe_release_page() · 98801506
      Committed by David Howells
      Fix the default for fscache_maybe_release_page() for when the cookie isn't
      valid or the page isn't cached.  It mustn't return false as that indicates
      the page cannot yet be freed.
      
      The problem with the default is that if, say, there's no cache, but a
      network filesystem's pages are using up almost all the available memory, a
      system can OOM because the filesystem ->releasepage() op will not allow
      them to be released as fscache_maybe_release_page() incorrectly prevents
      it.
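
      A sketch of the corrected default path (helper and flag names assumed
      from the fscache API, for illustration):

        static inline bool
        fscache_maybe_release_page(struct fscache_cookie *cookie,
                                   struct page *page, gfp_t gfp)
        {
                if (fscache_cookie_valid(cookie) && PageFsCache(page))
                        return __fscache_maybe_release_page(cookie, page, gfp);

                /* no cookie or page not cached: the page may be released */
                return true;
        }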
      
      This can be tested by writing a sequence of 512MiB files to an AFS mount.
      It does not affect NFS or CIFS because both of those wrap the call in a
      check of PG_fscache and it shouldn't bother Ceph as that only has
      PG_private set whilst writeback is in progress.  This might be an issue for
      9P, however.
      
      Note that the pages aren't entirely stuck.  Removing a file or unmounting
      will clear things because that uses ->invalidatepage() instead.
      
      Fixes: 201a1542 ("FS-Cache: Handle pages pending storage that get evicted under OOM conditions")
      Reported-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Tested-by: Marc Dionne <marc.dionne@auristor.com>
      cc: stable@vger.kernel.org # 2.6.32+
  11. 30 December 2017, 3 commits
    • timers: Reinitialize per cpu bases on hotplug · 26456f87
      Committed by Thomas Gleixner
      The timer wheel bases are not (re)initialized on CPU hotplug. That leaves
      them with a potentially stale clk and next_expiry value, which can cause
      trouble when the CPU is plugged in again.

      Add a prepare callback which forwards the clock, sets next_expiry far in
      the future and resets the control flags to a known state.

      Set base->must_forward_clk so the first timer which is queued will try to
      forward the clock to the current jiffies value.
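
      A sketch of such a prepare callback, following the description above
      (constant and field names assumed from the timer wheel code):

        int timers_prepare_cpu(unsigned int cpu)
        {
                struct timer_base *base;
                int b;

                for (b = 0; b < NR_BASES; b++) {
                        base = per_cpu_ptr(&timer_bases[b], cpu);
                        base->clk = jiffies;            /* forward the clock */
                        base->next_expiry = base->clk + NEXT_TIMER_MAX_DELTA;
                        base->is_idle = false;
                        base->must_forward_clk = true;  /* re-forward on first enqueue */
                }
                return 0;
        }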
      
      Fixes: 500462a9 ("timers: Switch to a non-cascading wheel")
      Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272152200.2431@nanos
    • genirq/irqdomain: Rename early argument of irq_domain_activate_irq() · 702cb0a0
      Committed by Thomas Gleixner
      The 'early' argument of irq_domain_activate_irq() is actually used to
      denote reservation mode. To avoid confusion, rename it before abuse
      happens.
      
      No functional change.
      
      Fixes: 72491643 ("genirq/irqdomain: Update irq_domain_ops.activate() signature")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alexandru Chirvasitu <achirvasub@gmail.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Maciej W. Rozycki <macro@linux-mips.org>
      Cc: Mikael Pettersson <mikpelinux@gmail.com>
      Cc: Josh Poulson <jopoulso@microsoft.com>
      Cc: Mihai Costache <v-micos@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: linux-pci@vger.kernel.org
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Dexuan Cui <decui@microsoft.com>
      Cc: Simon Xiao <sixiao@microsoft.com>
      Cc: Saeed Mahameed <saeedm@mellanox.com>
      Cc: Jork Loeser <Jork.Loeser@microsoft.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: devel@linuxdriverproject.org
      Cc: KY Srinivasan <kys@microsoft.com>
      Cc: Alan Cox <alan@linux.intel.com>
      Cc: Sakari Ailus <sakari.ailus@intel.com>
      Cc: linux-media@vger.kernel.org
    • genirq: Introduce IRQD_CAN_RESERVE flag · 69790ba9
      Committed by Thomas Gleixner
      Add a new flag to mark interrupts which can use reservation mode. This is
      going to be used in subsequent patches to disable reservation mode for a
      certain class of MSI devices.
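
      A sketch of what such a flag and its accessors look like in
      include/linux/irq.h (the bit position is assumed to be a free one):

        enum {
                /* ... existing IRQD_* flags ... */
                IRQD_CAN_RESERVE = (1 << 26),   /* assumed free bit */
        };

        static inline void irqd_set_can_reserve(struct irq_data *d)
        {
                __irqd_to_state(d) |= IRQD_CAN_RESERVE;
        }

        static inline bool irqd_can_reserve(struct irq_data *d)
        {
                return __irqd_to_state(d) & IRQD_CAN_RESERVE;
        }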
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Alexandru Chirvasitu <achirvasub@gmail.com>
      Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Maciej W. Rozycki <macro@linux-mips.org>
      Cc: Mikael Pettersson <mikpelinux@gmail.com>
      Cc: Josh Poulson <jopoulso@microsoft.com>
      Cc: Mihai Costache <v-micos@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: linux-pci@vger.kernel.org
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Dexuan Cui <decui@microsoft.com>
      Cc: Simon Xiao <sixiao@microsoft.com>
      Cc: Saeed Mahameed <saeedm@mellanox.com>
      Cc: Jork Loeser <Jork.Loeser@microsoft.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: devel@linuxdriverproject.org
      Cc: KY Srinivasan <kys@microsoft.com>
      Cc: Alan Cox <alan@linux.intel.com>
      Cc: Sakari Ailus <sakari.ailus@intel.com>
      Cc: linux-media@vger.kernel.org
  12. 28 December 2017, 2 commits
  13. 24 December 2017, 1 commit
    • x86/mm/pti: Add infrastructure for page table isolation · aa8c6248
      Committed by Thomas Gleixner
      Add the initial files for kernel page table isolation, with a minimal init
      function and the boot time detection for this misfeature.
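
      A sketch of the kind of boot-time detection this adds (the
      command-line option and helper names are assumed for illustration):

        void __init pti_check_boottime_disable(void)
        {
                /* honor an assumed "nopti" kernel parameter */
                if (cmdline_find_option_bool(boot_command_line, "nopti"))
                        return;

                /* only enable on CPUs affected by the leak */
                if (!boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
                        return;

                setup_force_cpu_cap(X86_FEATURE_PTI);
        }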
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  14. 22 December 2017, 2 commits
    • IB/mlx5: Fix congestion counters in LAG mode · 71a0ff65
      Committed by Majd Dibbiny
      Congestion counters are counted and queried per physical function.
      When working in LAG mode, CNP packets can be sent or received on
      either function, so the congestion counters should be aggregated from
      the two physical functions.
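
      A sketch of the aggregation idea (the loop shape and buffer handling
      are illustrative, not the exact driver code):

        /* in LAG mode, sum each congestion counter over both functions */
        for (port = 0; port < num_ports; port++) {
                err = mlx5_cmd_query_cong_counter(mdev[port], false, out, outlen);
                if (err)
                        return err;

                for (i = 0; i < num_counters; i++)
                        values[i] += be64_to_cpu(counters[i]);
        }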
      
      Fixes: e1f24a79 ("IB/mlx5: Support congestion related counters")
      Signed-off-by: Majd Dibbiny <majd@mellanox.com>
      Reviewed-by: Aviv Heller <avivh@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • net: reevalulate autoflowlabel setting after sysctl setting · 513674b5
      Committed by Shaohua Li
      sysctl.ip6.auto_flowlabels defaults to 1. On our hosts, we set it to 2.
      If the sockopt doesn't set autoflowlabel, outgoing packets from the
      hosts are supposed to not include a flowlabel. This is true for normal
      packets, but not for reset packets.
      
      The reason is that ipv6_pinfo.autoflowlabel is set at sock creation.
      If we later change sysctl.ip6.auto_flowlabels, ipv6_pinfo.autoflowlabel
      isn't updated, so the sock keeps its old auto-flowlabel behavior.
      Reset packets suffer from this problem because they are sent from a
      special control socket which is created at boot time. Since
      sysctl.ipv6.auto_flowlabels is 1 by default, the control socket will
      always have its ipv6_pinfo.autoflowlabel set, even after the user sets
      sysctl.ipv6.auto_flowlabels to 2, so reset packets will always carry a
      flowlabel. Normal socks created before the sysctl change suffer from
      the same issue. We can't even turn off autoflowlabel unless we kill
      all socks on the hosts.
      
      To fix this, if the IPV6_AUTOFLOWLABEL sockopt is used, we use the
      autoflowlabel setting from the user; otherwise we always call
      ip6_default_np_autolabel(), which reflects the current sysctl setting.
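
      A sketch of the resulting lookup, where a per-sock flag records
      whether the sockopt was ever used (the flag name is assumed):

        static bool ip6_autoflowlabel(struct net *net, const struct ipv6_pinfo *np)
        {
                /* the sockopt wins if it was set; otherwise track the sysctl */
                if (!np->autoflowlabel_set)
                        return ip6_default_np_autolabel(net);

                return np->autoflowlabel;
        }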
      
      Note, this changes the behavior a little bit. Before commit 42240901
      (ipv6: Implement different admin modes for automatic flow labels), the
      autoflowlabel behavior of a sock wasn't sticky, e.g. if the sysctl
      changed, existing connections would change their autoflowlabel
      behavior. After that commit, the autoflowlabel behavior was sticky for
      the whole life of the sock. With this patch, the behavior is no longer
      sticky.
      
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Tom Herbert <tom@quantonium.net>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 21 December 2017, 3 commits
    • bpf: fix integer overflows · bb7f0f98
      Committed by Alexei Starovoitov
      There were various issues related to the limited size of integers used in
      the verifier:
       - `off + size` overflow in __check_map_access()
       - `off + reg->off` overflow in check_mem_access()
       - `off + reg->var_off.value` overflow or 32-bit truncation of
         `reg->var_off.value` in check_mem_access()
       - 32-bit truncation in check_stack_boundary()
      
      Make sure that any integer math cannot overflow by not allowing
      pointer math with large values.
      
      Also reduce the scope of "scalar op scalar" tracking.
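
      A sketch of the kind of limit this imposes in the verifier (the
      constant names follow the patch description; values illustrative):

        /* keep off + reg->off + var_off comfortably inside 32 bits */
        #define BPF_MAX_VAR_OFF (1 << 29)
        #define BPF_MAX_VAR_SIZ (1 << 29)

        if (reg->var_off.value + reg->off >= BPF_MAX_VAR_OFF)
                return -EACCES;  /* reject pointer math with large values */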
      
      Fixes: f1174f77 ("bpf/verifier: rework value tracking")
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • block: unalign call_single_data in struct request · 4ccafe03
      Committed by Jens Axboe
      A previous change blindly added massive alignment to the
      call_single_data structure in struct request. This ballooned its size
      from 296 to 320 bytes on my setup, for no valid reason at all.
      
      Use the unaligned struct __call_single_data variant instead.
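
      A sketch of the change in struct request (surrounding fields elided):

        struct request {
                /* ... */
                /* unaligned variant: no forced cacheline alignment here */
                struct __call_single_data csd;
                /* ... */
        };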
      
      Fixes: 966a9671 ("smp: Avoid using two cache lines for struct call_single_data")
      Cc: stable@vger.kernel.org # v4.14
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block-throttle: avoid double charge · 111be883
      Committed by Shaohua Li
      If a bio is throttled and then split, the bio can be resubmitted and
      enter throttling again. This causes part of the bio to be charged
      multiple times. If the cgroup has an IO limit, the double charge
      significantly harms performance. Bio splits have become quite common
      since the arbitrary bio size change.
      
      To fix this, we always set the BIO_THROTTLED flag if a bio is throttled.
      If the bio is cloned/split, we copy the flag to the new bio too to avoid
      a double charge. However, a cloned bio could be directed to a new disk,
      in which case keeping the flag would be a problem. The observation is
      that we always set a new disk for the bio in this case, so we can clear
      the flag in bio_set_dev().
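
      A sketch of clearing the flag when the bio is pointed at a new disk
      (macro body abridged, following the description above):

        #define bio_set_dev(bio, bdev)                                   \
        do {                                                             \
                /* a new disk has not throttled this clone yet */        \
                if ((bio)->bi_disk != (bdev)->bd_disk)                   \
                        bio_clear_flag(bio, BIO_THROTTLED);              \
                (bio)->bi_disk = (bdev)->bd_disk;                        \
                (bio)->bi_partno = (bdev)->bd_partno;                    \
        } while (0)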
      
      This issue has existed for a long time; the arbitrary bio size change
      just makes it worse, so this should go into stable, at least back to v4.2.
      
      V1 -> V2: don't add an extra field to the bio, based on discussion with Tejun
      
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: stable@vger.kernel.org
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  16. 20 December 2017, 3 commits
    • net/mlx5: Cleanup IRQs in case of unload failure · d6b2785c
      Committed by Moshe Shemesh
      When mlx5_stop_eqs fails to destroy any of the EQs, it returns with an
      error. In such a failure flow the function returns without releasing
      all EQ IRQs, and pci_free_irq_vectors then fails. Fix this by only
      warning on destroy EQ failure and continuing to release the other EQs
      and their IRQs.
      
      It fixes the following kernel trace:
      kernel: kernel BUG at drivers/pci/msi.c:352!
      ...
      ...
      kernel: Call Trace:
      kernel: pci_disable_msix+0xd3/0x100
      kernel: pci_free_irq_vectors+0xe/0x20
      kernel: mlx5_load_one.isra.17+0x9f5/0xec0 [mlx5_core]
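
      A sketch of the changed error handling in mlx5_stop_eqs (helper and
      field names assumed; only the shape matters):

        list_for_each_entry_safe(eq, n, &table->comp_eqs_list, list) {
                err = mlx5_destroy_unmap_eq(dev, eq);
                if (err)
                        mlx5_core_warn(dev, "failed to destroy EQ 0x%x (%d)\n",
                                       eq->eqn, err);
                /* keep going: remaining EQs and their IRQs must be freed */
        }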
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Fix rate limit packet pacing naming and struct · 37e92a9d
      Committed by Eran Ben Elisha
      In mlx5_ifc, the struct size was incomplete, and thus the driver was
      sending garbage after the last defined field. Fix this by adding a
      reserved field to complete the struct size.

      In addition, rename all set_rate_limit symbols to set_pp_rate_limit to
      be compliant with the Firmware <-> Driver definition.
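
      A sketch of the mlx5_ifc padding convention this relies on (the field
      layout is illustrative, not the exact command):

        struct mlx5_ifc_set_pp_rate_limit_in_bits {
                u8 opcode[0x10];
                u8 reserved_at_10[0x10];
                /* ... */
                u8 rate_limit[0x20];
                /* reserved tail pads the command to its firmware-defined size */
                u8 reserved_at_a0[0x120];
        };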
      
      Fixes: 7486216b ("{net,IB}/mlx5: mlx5_ifc updates")
      Fixes: 1466cc5b ("net/mlx5: Rate limit tables support")
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • Revert "mlx5: move affinity hints assignments to generic code" · 231243c8
      Committed by Saeed Mahameed
      Before the offending commit, mlx5 core handled IRQ affinity itself,
      and it seems that the new generic code has some drawbacks, one of them
      being that the user loses the ability to modify IRQ affinity after the
      initial affinity values are assigned.

      The issue is still being discussed and a solution in the new generic
      code is required; until then we need to revert this patch.
      
      This fixes the following issue:
      echo <new affinity> > /proc/irq/<x>/smp_affinity
      fails with -EIO
      
      This reverts commit a435393a.
      Note: kept mlx5_get_vector_affinity in include/linux/mlx5/driver.h since
      it is used in mlx5_ib driver.
      
      Fixes: a435393a ("mlx5: move affinity hints assignments to generic code")
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jes Sorensen <jsorensen@fb.com>
      Reported-by: Jes Sorensen <jsorensen@fb.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
  17. 19 December 2017, 2 commits
    • block: fix blk_rq_append_bio · 0abc2a10
      Committed by Jens Axboe
      Commit caa4b024 ("blk-map: call blk_queue_bounce from
      blk_rq_append_bio") moves blk_queue_bounce() into blk_rq_append_bio(),
      but doesn't consider the fact that the bounced bio becomes invisible
      to the caller since the parameter type is 'struct bio *'. Make it a
      pointer to a pointer to a bio, so the caller sees the right bio even
      after a bounce.
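
      A sketch of the signature change (body abridged):

        int blk_rq_append_bio(struct request *rq, struct bio **bio)
        {
                /* may replace *bio with the bounced clone */
                blk_queue_bounce(rq->q, bio);

                /* ... append *bio, not the original pointer, to rq ... */
                return 0;
        }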
      
      Fixes: caa4b024 ("blk-map: call blk_queue_bounce from blk_rq_append_bio")
      Cc: Christoph Hellwig <hch@lst.de>
      Reported-by: Michele Ballabio <barra_cuda@katamail.com>
      (handling failure of blk_rq_append_bio(), only call bio_get() after
      blk_rq_append_bio() returns OK)
      Tested-by: Michele Ballabio <barra_cuda@katamail.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: don't let passthrough IO go into .make_request_fn() · 14cb0dc6
      Committed by Ming Lei
      Commit a8821f3f ("block: Improvements to bounce-buffer handling") tries
      to make sure that the bio passed to .make_request_fn won't exceed
      BIO_MAX_PAGES, but ignores the fact that passthrough I/O can use
      blk_queue_bounce() too. In particular, passthrough IO may not be
      sector-aligned, and the check of 'sectors < bio_sectors(*bio_orig)'
      inside __blk_queue_bounce() may become true even though the max bvec
      number doesn't exceed BIO_MAX_PAGES; this causes the bio to be split
      and the original passthrough bio to be submitted to
      generic_make_request().
      
      This patch fixes the issue by checking whether the bio is passthrough
      IO and, if so, using bio_kmalloc() to allocate the cloned passthrough bio.
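
      A sketch of the allocation split this implies in __blk_queue_bounce()
      (bio_is_passthrough() is the helper the description suggests; the
      clone calls are illustrative):

        if (bio_is_passthrough(*bio_orig))
                /* passthrough IO must stay whole: no bounce bio_set, no split */
                bio = bio_clone_kmalloc(*bio_orig, GFP_NOIO);
        else
                bio = bio_clone_bioset(*bio_orig, GFP_NOIO, bounce_bio_set);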
      
      Cc: NeilBrown <neilb@suse.com>
      Fixes: a8821f3f ("block: Improvements to bounce-buffer handling")
      Tested-by: Michele Ballabio <barra_cuda@katamail.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  18. 17 December 2017, 3 commits
  19. 15 December 2017, 6 commits