1. 09 9月, 2016 1 次提交
  2. 05 9月, 2016 1 次提交
    • M
      bonding: Fix bonding crash · 24b27fc4
      Mahesh Bandewar 提交于
      Following few steps will crash kernel -
      
        (a) Create bonding master
            > modprobe bonding miimon=50
        (b) Create macvlan bridge on eth2
            > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \
      	   type macvlan
        (c) Now try adding eth2 into the bond
            > echo +eth2 > /sys/class/net/bond0/bonding/slaves
            <crash>
      
      Bonding does lots of things before checking if the device enslaved is
      busy or not.
      
      In this case when the notifier call-chain sends notifications, the
      bond_netdev_event() assumes that the rx_handler /rx_handler_data is
      registered while the bond_enslave() hasn't progressed far enough to
      register rx_handler for the new slave.
      
      This patch adds a rx_handler check that can be performed right at the
      beginning of the enslave code to avoid getting into this situation.
      Signed-off-by: NMahesh Bandewar <maheshb@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      24b27fc4
  3. 29 8月, 2016 1 次提交
    • R
      net: smc91x: fix SMC accesses · 2fb04fdf
      Russell King 提交于
      Commit b70661c7 ("net: smc91x: use run-time configuration on all ARM
      machines") broke some ARM platforms through several mistakes.  Firstly,
      the access size must correspond to the following rule:
      
      (a) at least one of 16-bit or 8-bit access size must be supported
      (b) 32-bit accesses are optional, and may be enabled in addition to
          the above.
      
      Secondly, it provides no emulation of 16-bit accesses, instead blindly
      making 16-bit accesses even when the platform specifies that only 8-bit
      is supported.
      
      Reorganise smc91x.h so we can make use of the existing 16-bit access
      emulation already provided - if 16-bit accesses are supported, use
      16-bit accesses directly, otherwise if 8-bit accesses are supported,
      use the provided 16-bit access emulation.  If neither, BUG().  This
      exactly reflects the driver behaviour prior to the commit being fixed.
      
      Since the conversion incorrectly cut down the available access sizes on
      several platforms, we also need to go through every platform and fix up
      the overly-restrictive access size: Arnd assumed that if a platform can
      perform 32-bit, 16-bit and 8-bit accesses, then only a 32-bit access
      size needed to be specified - not so, all available access sizes must
      be specified.
      
      This likely fixes some performance regressions in doing this: if a
      platform does not support 8-bit accesses, 8-bit accesses have been
      emulated by performing a 16-bit read-modify-write access.
      
      Tested on the Intel Assabet/Neponset platform, which supports only 8-bit
      accesses, which was broken by the original commit.
      
      Fixes: b70661c7 ("net: smc91x: use run-time configuration on all ARM machines")
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      Tested-by: NRobert Jarzmik <robert.jarzmik@free.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2fb04fdf
  4. 27 8月, 2016 2 次提交
  5. 24 8月, 2016 1 次提交
    • T
      drm/tegra: dsi: Enhance runtime power management · 87904c3e
      Thierry Reding 提交于
      The MIPI DSI output on Tegra SoCs requires some external logic to
      calibrate the MIPI pads before a video signal can be transmitted. This
      MIPI calibration logic requires to be powered on while the MIPI pads are
      being used, which is currently done as part of the DSI driver's probe
      implementation.
      
      This is suboptimal because it will leave the MIPI calibration logic
      powered up even if the DSI output is never used.
      
      On Tegra114 and earlier this behaviour also causes the driver to hang
      while trying to power up the MIPI calibration logic because the power
      partition that contains the MIPI calibration logic will be powered on
      by the display controller at output pipeline configuration time. Thus
      the power up sequence for the MIPI calibration logic happens before
      it's power partition is guaranteed to be enabled.
      
      Fix this by splitting up the API into a request/free pair of functions
      that manage the runtime dependency between the DSI and the calibration
      modules (no registers are accessed) and a set of enable, calibrate and
      disable functions that program the MIPI calibration logic at points in
      time where the power partition is really enabled.
      
      While at it, make sure that the runtime power management also works in
      ganged mode, which is currently also broken.
      Reported-by: NJonathan Hunter <jonathanh@nvidia.com>
      Tested-by: NJonathan Hunter <jonathanh@nvidia.com>
      Signed-off-by: NThierry Reding <treding@nvidia.com>
      87904c3e
  6. 18 8月, 2016 2 次提交
  7. 17 8月, 2016 1 次提交
    • C
      PCI: Use positive flags in pci_alloc_irq_vectors() · 4fe0d154
      Christoph Hellwig 提交于
      Instead of passing negative flags like PCI_IRQ_NOMSI to prevent use of
      certain interrupt types, pass positive flags like PCI_IRQ_LEGACY,
      PCI_IRQ_MSI, etc., to specify the acceptable interrupt types.
      
      This is based on a number of pending driver conversions that just happend
      to be a whole more obvious to read this way, and given that we have no
      users in the tree yet it can still easily be done.
      
      I've also added a PCI_IRQ_ALL_TYPES catchall to keep the case of accepting
      all interrupt types very simple.
      
      [bhelgaas: changelog, fix PCI_IRQ_AFFINITY doc typo, remove mention of
      PCI_IRQ_NOLEGACY]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NAlexander Gordeev <agordeev@redhat.com>
      4fe0d154
  8. 16 8月, 2016 1 次提交
  9. 14 8月, 2016 1 次提交
    • S
      net: remove type_check from dev_get_nest_level() · 952fcfd0
      Sabrina Dubroca 提交于
      The idea for type_check in dev_get_nest_level() was to count the number
      of nested devices of the same type (currently, only macvlan or vlan
      devices).
      This prevented the false positive lockdep warning on configurations such
      as:
      
      eth0 <--- macvlan0 <--- vlan0 <--- macvlan1
      
      However, this doesn't prevent a warning on a configuration such as:
      
      eth0 <--- macvlan0 <--- vlan0
      eth1 <--- vlan1 <--- macvlan1
      
      In this case, all the locks end up with a nesting subclass of 1, so
      lockdep thinks that there is still a deadlock:
      
      - in the first case we have (macvlan_netdev_addr_lock_key, 1) and then
        take (vlan_netdev_xmit_lock_key, 1)
      - in the second case, we have (vlan_netdev_xmit_lock_key, 1) and then
        take (macvlan_netdev_addr_lock_key, 1)
      
      By removing the linktype check in dev_get_nest_level() and always
      incrementing the nesting depth, lockdep considers this configuration
      valid.
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      952fcfd0
  10. 12 8月, 2016 2 次提交
  11. 11 8月, 2016 1 次提交
  12. 10 8月, 2016 2 次提交
    • D
      perf/core: Set cgroup in CPU contexts for new cgroup events · db4a8356
      David Carrillo-Cisneros 提交于
      There's a perf stat bug easy to observer on a machine with only one cgroup:
      
        $ perf stat -e cycles -I 1000 -C 0 -G /
        #          time             counts unit events
            1.000161699      <not counted>      cycles                    /
            2.000355591      <not counted>      cycles                    /
            3.000565154      <not counted>      cycles                    /
            4.000951350      <not counted>      cycles                    /
      
      We'd expect some output there.
      
      The underlying problem is that there is an optimization in
      perf_cgroup_sched_{in,out}() that skips the switch of cgroup events
      if the old and new cgroups in a task switch are the same.
      
      This optimization interacts with the current code in two ways
      that cause a CPU context's cgroup (cpuctx->cgrp) to be NULL even if a
      cgroup event matches the current task. These are:
      
        1. On creation of the first cgroup event in a CPU: In current code,
        cpuctx->cpu is only set in perf_cgroup_sched_in, but due to the
        aforesaid optimization, perf_cgroup_sched_in will run until the next
        cgroup switches in that CPU. This may happen late or never happen,
        depending on system's number of cgroups, CPU load, etc.
      
        2. On deletion of the last cgroup event in a cpuctx: In list_del_event,
        cpuctx->cgrp is set NULL. Any new cgroup event will not be sched in
        because cpuctx->cgrp == NULL until a cgroup switch occurs and
        perf_cgroup_sched_in is executed (updating cpuctx->cgrp).
      
      This patch fixes both problems by setting cpuctx->cgrp in list_add_event,
      mirroring what list_del_event does when removing a cgroup event from CPU
      context, as introduced in:
      
        commit 68cacd29 ("perf_events: Fix stale ->cgrp pointer in update_cgrp_time_from_cpuctx()")
      
      With this patch, cpuctx->cgrp is always set/clear when installing/removing
      the first/last cgroup event in/from the CPU context. With cpuctx->cgrp
      correctly set, event_filter_match works as intended when events are
      sched in/out.
      
      After the fix, the output is as expected:
      
        $ perf stat -e cycles -I 1000 -a -G /
        #         time             counts unit events
           1.004699159          627342882      cycles                    /
           2.007397156          615272690      cycles                    /
           3.010019057          616726074      cycles                    /
      Signed-off-by: NDavid Carrillo-Cisneros <davidcc@google.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1470124092-113192-1-git-send-email-davidcc@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      db4a8356
    • L
      Revert "printk: create pr_<level> functions" · a0cba217
      Linus Torvalds 提交于
      This reverts commit 874f9c7d.
      
      Geert Uytterhoeven reports:
       "This change seems to have an (unintendent?) side-effect.
      
        Before, pr_*() calls without a trailing newline characters would be
        printed with a newline character appended, both on the console and in
        the output of the dmesg command.
      
        After this commit, no new line character is appended, and the output
        of the next pr_*() call of the same type may be appended, like in:
      
          - Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000
          - Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)
          + Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)"
      
      Joe Perches says:
       "No, that is not intentional.
      
        The newline handling code inside vprintk_emit is a bit involved and
        for now I suggest a revert until this has all the same behavior as
        earlier"
      Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Requested-by: NJoe Perches <joe@perches.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a0cba217
  13. 09 8月, 2016 6 次提交
    • A
      KVM: arm64: ITS: return 1 on successful MSI injection · fd837b08
      Andre Przywara 提交于
      According to the KVM API documentation a successful MSI injection
      should return a value > 0 on success.
      Return possible errors in vgic_its_trigger_msi() and report a
      successful injection back to userland, while also reporting the
      case where the MSI could not be delivered due to the guest not
      having the LPI mapped, for instance.
      Signed-off-by: NAndre Przywara <andre.przywara@arm.com>
      Reviewed-by: NEric Auger <eric.auger@redhat.com>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      fd837b08
    • M
      genirq/msi: Make sure PCI MSIs are activated early · f3b0946d
      Marc Zyngier 提交于
      Bharat Kumar Gogada reported issues with the generic MSI code, where the
      end-point ended up with garbage in its MSI configuration (both for the vector
      and the message).
      
      It turns out that the two MSI paths in the kernel are doing slightly different
      things:
      
      generic MSI: disable MSI -> allocate MSI -> enable MSI -> setup EP
      PCI MSI: disable MSI -> allocate MSI -> setup EP -> enable MSI
      
      And it turns out that end-points are allowed to latch the content of the MSI
      configuration registers as soon as MSIs are enabled.  In Bharat's case, the
      end-point ends up using whatever was there already, which is not what you
      want.
      
      In order to make things converge, we introduce a new MSI domain flag
      (MSI_FLAG_ACTIVATE_EARLY) that is unconditionally set for PCI/MSI. When set,
      this flag forces the programming of the end-point as soon as the MSIs are
      allocated.
      
      A consequence of this is that we have an extra activate in irq_startup, but
      that should be without much consequence.
      
      tglx: 
      
       - Several people reported a VMWare regression with PCI/MSI-X passthrough. It
         turns out that the patch also cures that issue.
      
       - We need to have a look at the MSI disable interrupt path, where we write
         the msg to all zeros without disabling MSI in the PCI device. Is that
         correct?
      
      Fixes: 52f518a3 "x86/MSI: Use hierarchical irqdomains to manage MSI interrupts"
      Reported-and-tested-by: NBharat Kumar Gogada <bharat.kumar.gogada@xilinx.com>
      Reported-and-tested-by: NFoster Snowhill <forst@forstwoof.ru>
      Reported-by: NMatthias Prager <linux@matthiasprager.de>
      Reported-by: NJason Taylor <jason.taylor@simplivity.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Acked-by: NBjorn Helgaas <bhelgaas@google.com>
      Cc: linux-pci@vger.kernel.org
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1468426713-31431-1-git-send-email-marc.zyngier@arm.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      f3b0946d
    • S
      qed: Add dcbx app support for IEEE Selection Field. · 59bcb797
      Sudarsana Reddy Kalluru 提交于
      MFW now supports the Selection field for IEEE mode. Add driver changes to
      use the newer MFW masks to read/write the port-id value.
      Signed-off-by: NSudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
      Signed-off-by: NYuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      59bcb797
    • D
      bpf: fix checksum fixups on bpf_skb_store_bytes · 479ffccc
      Daniel Borkmann 提交于
      bpf_skb_store_bytes() invocations above L2 header need BPF_F_RECOMPUTE_CSUM
      flag for updates, so that CHECKSUM_COMPLETE will be fixed up along the way.
      Where we ran into an issue with bpf_skb_store_bytes() is when we did a
      single-byte update on the IPv6 hoplimit despite using BPF_F_RECOMPUTE_CSUM
      flag; simple ping via ICMPv6 triggered a hw csum failure as a result. The
      underlying issue has been tracked down to a buffer alignment issue.
      
      Meaning, that csum_partial() computations via skb_postpull_rcsum() and
      skb_postpush_rcsum() pair invoked had a wrong result since they operated on
      an odd address for the hoplimit, while other computations were done on an
      even address. This mix doesn't work as-is with skb_postpull_rcsum(),
      skb_postpush_rcsum() pair as it always expects at least half-word alignment
      of input buffers, which is normally the case. Thus, instead of these helpers
      using csum_sub() and (implicitly) csum_add(), we need to use csum_block_sub(),
      csum_block_add(), respectively. For unaligned offsets, they rotate the sum
      to align it to a half-word boundary again, otherwise they work the same as
      csum_sub() and csum_add().
      
      Adding __skb_postpull_rcsum(), __skb_postpush_rcsum() variants that take the
      offset as an input and adapting bpf_skb_store_bytes() to them fixes the hw
      csum failures again. The skb_postpull_rcsum(), skb_postpush_rcsum() helpers
      use a 0 constant for offset so that the compiler optimizes the offset & 1
      test away and generates the same code as with csum_sub()/_add().
      
      Fixes: 608cd71a ("tc: bpf: generalize pedit action")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      479ffccc
    • L
      unsafe_[get|put]_user: change interface to use a error target label · 1bd4403d
      Linus Torvalds 提交于
      When I initially added the unsafe_[get|put]_user() helpers in commit
      5b24a7a2 ("Add 'unsafe' user access functions for batched
      accesses"), I made the mistake of modeling the interface on our
      traditional __[get|put]_user() functions, which return zero on success,
      or -EFAULT on failure.
      
      That interface is fairly easy to use, but it's actually fairly nasty for
      good code generation, since it essentially forces the caller to check
      the error value for each access.
      
      In particular, since the error handling is already internally
      implemented with an exception handler, and we already use "asm goto" for
      various other things, we could fairly easily make the error cases just
      jump directly to an error label instead, and avoid the need for explicit
      checking after each operation.
      
      So switch the interface to pass in an error label, rather than checking
      the error value in the caller.  Best do it now before we start growing
      more users (the signal handling code in particular would be a good place
      to use the new interface).
      
      So rather than
      
      	if (unsafe_get_user(x, ptr))
      		... handle error ..
      
      the interface is now
      
      	unsafe_get_user(x, ptr, label);
      
      where an error during the user mode fetch will now just cause a jump to
      'label' in the caller.
      
      Right now the actual _implementation_ of this all still ends up being a
      "if (err) goto label", and does not take advantage of any exception
      label tricks, but for "unsafe_put_user()" in particular it should be
      fairly straightforward to convert to using the exception table model.
      
      Note that "unsafe_get_user()" is much harder to convert to a clever
      exception table model, because current versions of gcc do not allow the
      use of "asm goto" (for the exception) with output values (for the actual
      value to be fetched).  But that is hopefully not a limitation in the
      long term.
      
      [ Also note that it might be a good idea to switch unsafe_get_user() to
        actually _return_ the value it fetches from user space, but this
        commit only changes the error handling semantics ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1bd4403d
    • P
      sctp: Export struct sctp_info to userspace · dca3f53c
      Phil Sutter 提交于
      This is required to correctly interpret INET_DIAG_INFO messages exported
      by sctp_diag module.
      Signed-off-by: NPhil Sutter <phil@nwl.cc>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dca3f53c
  14. 08 8月, 2016 2 次提交
    • J
      block: rename bio bi_rw to bi_opf · 1eff9d32
      Jens Axboe 提交于
      Since commit 63a4cc24, bio->bi_rw contains flags in the lower
      portion and the op code in the higher portions. This means that
      old code that relies on manually setting bi_rw is most likely
      going to be broken. Instead of letting that brokeness linger,
      rename the member, to force old and out-of-tree code to break
      at compile time instead of at runtime.
      
      No intended functional changes in this commit.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      1eff9d32
    • J
      block/mm: make bdev_ops->rw_page() take a bool for read/write · c11f0c0b
      Jens Axboe 提交于
      Commit abf54548 changed it from an 'rw' flags type to the
      newer ops based interface, but now we're effectively leaking
      some bdev internals to the rest of the kernel. Since we only
      care about whether it's a read or a write at that level, just
      pass in a bool 'is_write' parameter instead.
      
      Then we can also move op_is_write() and friends back under
      CONFIG_BLOCK protection.
      Reviewed-by: NMike Christie <mchristi@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      c11f0c0b
  15. 06 8月, 2016 2 次提交
  16. 05 8月, 2016 4 次提交
    • M
      mm/block: convert rw_page users to bio op use · abf54548
      Mike Christie 提交于
      The rw_page users were not converted to use bio/req ops. As a result
      bdev_write_page is not passing down REQ_OP_WRITE and the IOs will
      be sent down as reads.
      Signed-off-by: NMike Christie <mchristi@redhat.com>
      Fixes: 4e1b2d52 ("block, fs, drivers: remove REQ_OP compat defs and related code")
      
      Modified by me to:
      
      1) Drop op_flags passing into ->rw_page(), as we don't use it.
      2) Make op_is_write() and friends safe to use for !CONFIG_BLOCK
      Signed-off-by: NJens Axboe <axboe@fb.com>
      abf54548
    • J
      Include: blkdev: Removed duplicate 'struct request;' declaration. · 6d25ec14
      John Pittman 提交于
      In include/linux/blkdev.h duplicate declarations of the request
      struct exist.  Cleaned up by removing the second, unneeded
      declaration.
      Signed-off-by: NJohn Pittman <jpittman@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      6d25ec14
    • D
      block: fix bdi vs gendisk lifetime mismatch · df08c32c
      Dan Williams 提交于
      The name for a bdi of a gendisk is derived from the gendisk's devt.
      However, since the gendisk is destroyed before the bdi it leaves a
      window where a new gendisk could dynamically reuse the same devt while a
      bdi with the same name is still live.  Arrange for the bdi to hold a
      reference against its "owner" disk device while it is registered.
      Otherwise we can hit sysfs duplicate name collisions like the following:
      
       WARNING: CPU: 10 PID: 2078 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x64/0x80
       sysfs: cannot create duplicate filename '/devices/virtual/bdi/259:1'
      
       Hardware name: HP ProLiant DL580 Gen8, BIOS P79 05/06/2015
        0000000000000286 0000000002c04ad5 ffff88006f24f970 ffffffff8134caec
        ffff88006f24f9c0 0000000000000000 ffff88006f24f9b0 ffffffff8108c351
        0000001f0000000c ffff88105d236000 ffff88105d1031e0 ffff8800357427f8
       Call Trace:
        [<ffffffff8134caec>] dump_stack+0x63/0x87
        [<ffffffff8108c351>] __warn+0xd1/0xf0
        [<ffffffff8108c3cf>] warn_slowpath_fmt+0x5f/0x80
        [<ffffffff812a0d34>] sysfs_warn_dup+0x64/0x80
        [<ffffffff812a0e1e>] sysfs_create_dir_ns+0x7e/0x90
        [<ffffffff8134faaa>] kobject_add_internal+0xaa/0x320
        [<ffffffff81358d4e>] ? vsnprintf+0x34e/0x4d0
        [<ffffffff8134ff55>] kobject_add+0x75/0xd0
        [<ffffffff816e66b2>] ? mutex_lock+0x12/0x2f
        [<ffffffff8148b0a5>] device_add+0x125/0x610
        [<ffffffff8148b788>] device_create_groups_vargs+0xd8/0x100
        [<ffffffff8148b7cc>] device_create_vargs+0x1c/0x20
        [<ffffffff811b775c>] bdi_register+0x8c/0x180
        [<ffffffff811b7877>] bdi_register_dev+0x27/0x30
        [<ffffffff813317f5>] add_disk+0x175/0x4a0
      
      Cc: <stable@vger.kernel.org>
      Reported-by: NYi Zhang <yizhan@redhat.com>
      Tested-by: NYi Zhang <yizhan@redhat.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      
      Fixed up missing 0 return in bdi_register_owner().
      Signed-off-by: NJens Axboe <axboe@fb.com>
      df08c32c
    • P
      block: add missing group association in bio-cloning functions · 20bd723e
      Paolo Valente 提交于
      When a bio is cloned, the newly created bio must be associated with
      the same blkcg as the original bio (if BLK_CGROUP is enabled). If
      this operation is not performed, then the new bio is not associated
      with any group, and the group of the current task is returned when
      the group of the bio is requested.
      
      Depending on the cloning frequency, this may cause a large
      percentage of the bios belonging to a given group to be treated
      as if belonging to other groups (in most cases as if belonging to
      the root group). The expected group isolation may thereby be broken.
      
      This commit adds the missing association in bio-cloning functions.
      
      Fixes: da2f0f74 ("Btrfs: add support for blkio controllers")
      Cc: stable@vger.kernel.org # v4.3+
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Reviewed-by: NNikolay Borisov <kernel@kyup.com>
      Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      20bd723e
  17. 04 8月, 2016 10 次提交
    • J
      dynamic_debug: add jump label support · 9049fc74
      Jason Baron 提交于
      Although dynamic debug is often only used for debug builds, sometimes
      its enabled for production builds as well.  Minimize its impact by using
      jump labels.  This reduces the text section by 7000+ bytes in the kernel
      image below.  It does increase data, but this should only be referenced
      when changing the direction of the branches, and hence usually not in
      cache.
      
           text     data     bss       dec     hex  filename
        8194852  4879776  925696  14000324  d5a0c4  vmlinux.pre
        8187337  4960224  925696  14073257  d6bda9  vmlinux.post
      
      Link: http://lkml.kernel.org/r/d165b465e8c89bc582d973758d40be44c33f018b.1467837322.git.jbaron@akamai.comSigned-off-by: NJason Baron <jbaron@akamai.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9049fc74
    • J
      jump_label: remove bug.h, atomic.h dependencies for HAVE_JUMP_LABEL · 1f69bf9c
      Jason Baron 提交于
      The current jump_label.h includes bug.h for things such as WARN_ON().
      This makes the header problematic for inclusion by kernel.h or any
      headers that kernel.h includes, since bug.h includes kernel.h (circular
      dependency).  The inclusion of atomic.h is similarly problematic.  Thus,
      this should make jump_label.h 'includable' from most places.
      
      Link: http://lkml.kernel.org/r/7060ce35ddd0d20b33bf170685e6b0fab816bdf2.1467837322.git.jbaron@akamai.comSigned-off-by: NJason Baron <jbaron@akamai.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1f69bf9c
    • K
      dma-mapping: use unsigned long for dma_attrs · 00085f1e
      Krzysztof Kozlowski 提交于
      The dma-mapping core and the implementations do not change the DMA
      attributes passed by pointer.  Thus the pointer can point to const data.
      However the attributes do not have to be a bitfield.  Instead unsigned
      long will do fine:
      
      1. This is just simpler.  Both in terms of reading the code and setting
         attributes.  Instead of initializing local attributes on the stack
         and passing pointer to it to dma_set_attr(), just set the bits.
      
      2. It brings safeness and checking for const correctness because the
         attributes are passed by value.
      
      Semantic patches for this change (at least most of them):
      
          virtual patch
          virtual context
      
          @r@
          identifier f, attrs;
      
          @@
          f(...,
          - struct dma_attrs *attrs
          + unsigned long attrs
          , ...)
          {
          ...
          }
      
          @@
          identifier r.f;
          @@
          f(...,
          - NULL
          + 0
           )
      
      and
      
          // Options: --all-includes
          virtual patch
          virtual context
      
          @r@
          identifier f, attrs;
          type t;
      
          @@
          t f(..., struct dma_attrs *attrs);
      
          @@
          identifier r.f;
          @@
          f(...,
          - NULL
          + 0
           )
      
      Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.comSigned-off-by: NKrzysztof Kozlowski <k.kozlowski@samsung.com>
      Acked-by: NVineet Gupta <vgupta@synopsys.com>
      Acked-by: NRobin Murphy <robin.murphy@arm.com>
      Acked-by: NHans-Christian Noren Egtvedt <egtvedt@samfundet.no>
      Acked-by: Mark Salter <msalter@redhat.com> [c6x]
      Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
      Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
      Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com>
      Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
      Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
      Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
      Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
      Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
      Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
      Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
      Acked-by: NBjorn Andersson <bjorn.andersson@linaro.org>
      Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
      Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
      Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      00085f1e
    • A
      include/linux/bitmap.h: cleanup · 4b9d314c
      Andrew Morton 提交于
      Remove two unneeded `else's.
      
      Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4b9d314c
    • M
      tree-wide: replace config_enabled() with IS_ENABLED() · 97f2645f
      Masahiro Yamada 提交于
      The use of config_enabled() against config options is ambiguous.  In
      practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
      author might have used it for the meaning of IS_ENABLED().  Using
      IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc.  makes the intention
      clearer.
      
      This commit replaces config_enabled() with IS_ENABLED() where possible.
      This commit is only touching bool config options.
      
      I noticed two cases where config_enabled() is used against a tristate
      option:
      
       - config_enabled(CONFIG_HWMON)
        [ drivers/net/wireless/ath/ath10k/thermal.c ]
      
       - config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE)
        [ drivers/gpu/drm/gma500/opregion.c ]
      
      I did not touch them because they should be converted to IS_BUILTIN()
      in order to keep the logic, but I was not sure it was the authors'
      intention.
      
      Link: http://lkml.kernel.org/r/1465215656-20569-1-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: Stas Sergeev <stsp@list.ru>
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: Joshua Kinard <kumba@gentoo.org>
      Cc: Jiri Slaby <jslaby@suse.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Markos Chandras <markos.chandras@imgtec.com>
      Cc: "Dmitry V. Levin" <ldv@altlinux.org>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Will Drewry <wad@chromium.org>
      Cc: Nikolay Martynov <mar.kolya@gmail.com>
      Cc: Huacai Chen <chenhc@lemote.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
      Cc: Rafal Milecki <zajec5@gmail.com>
      Cc: James Cowgill <James.Cowgill@imgtec.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Alex Smith <alex.smith@imgtec.com>
      Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
      Cc: Qais Yousef <qais.yousef@imgtec.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Brian Norris <computersforpeace@gmail.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: "Luis R. Rodriguez" <mcgrof@do-not-panic.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Kalle Valo <kvalo@qca.qualcomm.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Tony Wu <tung7970@gmail.com>
      Cc: Huaitong Han <huaitong.han@intel.com>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Jason Cooper <jason@lakedaemon.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrea Gelmini <andrea.gelmini@gelma.net>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Rabin Vincent <rabin@rab.in>
      Cc: "Maciej W. Rozycki" <macro@imgtec.com>
      Cc: David Daney <david.daney@cavium.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      97f2645f
    • M
      IB/mlx4: Add diagnostic hardware counters · 3f85f2aa
      Mark Bloch 提交于
      Expose IB diagnostic hardware counters.
      The counters count IB events and are applicable for IB and RoCE.
      
      The counters can be divided into two groups, per device and per port.
      Device counters are always exposed.
      Port counters are exposed only if the firmware supports per port counters.
      
      rq_num_dup and sq_num_to are only exposed if we have firmware support
      for them, if we do, we expose them per device and per port.
      rq_num_udsdprd and num_cqovf are device only counters.
      
      rq - denotes responder.
      sq - denotes requester.
      
      |-----------------------|---------------------------------------|
      |	Name		|	Description			|
      |-----------------------|---------------------------------------|
      |rq_num_lle		| Number of local length errors		|
      |-----------------------|---------------------------------------|
      |sq_num_lle		| number of local length errors		|
      |-----------------------|---------------------------------------|
      |rq_num_lqpoe		| Number of local QP operation errors	|
      |-----------------------|---------------------------------------|
      |sq_num_lqpoe		| Number of local QP operation errors	|
      |-----------------------|---------------------------------------|
      |rq_num_lpe		| Number of local protection errors	|
      |-----------------------|---------------------------------------|
      |sq_num_lpe		| Number of local protection errors	|
      |-----------------------|---------------------------------------|
      |rq_num_wrfe		| Number of CQEs with error		|
      |-----------------------|---------------------------------------|
      |sq_num_wrfe		| Number of CQEs with error		|
      |-----------------------|---------------------------------------|
      |sq_num_mwbe		| Number of Memory Window bind errors	|
      |-----------------------|---------------------------------------|
      |sq_num_bre		| Number of bad response errors		|
      |-----------------------|---------------------------------------|
      |sq_num_rire		| Number of Remote Invalid request	|
      |			| errors				|
      |-----------------------|---------------------------------------|
      |rq_num_rire		| Number of Remote Invalid request	|
      |			| errors				|
      |-----------------------|---------------------------------------|
      |sq_num_rae		| Number of remote access errors	|
      |-----------------------|---------------------------------------|
      |rq_num_rae		| Number of remote access errors	|
      |-----------------------|---------------------------------------|
      |sq_num_roe		| Number of remote operation errors	|
      |-----------------------|---------------------------------------|
      |sq_num_tree		| Number of transport retries exceeded	|
      |			| errors				|
      |-----------------------|---------------------------------------|
      |sq_num_rree		| Number of RNR NAK retries exceeded	|
      |			| errors				|
      |-----------------------|---------------------------------------|
      |rq_num_rnr		| Number of RNR NAKs sent		|
      |-----------------------|---------------------------------------|
      |sq_num_rnr		| Number of RNR NAKs received		|
      |-----------------------|---------------------------------------|
      |rq_num_oos		| Number of Out of Sequence requests	|
      |			| received				|
      |-----------------------|---------------------------------------|
      |sq_num_oos		| Number of Out of Sequence NAKs	|
      |			| received				|
      |-----------------------|---------------------------------------|
      |rq_num_udsdprd		| Number of UD packets silently		|
      |			| discarded on the Receive Queue due to	|
      |			| lack of receive descriptor		|
      |-----------------------|---------------------------------------|
      |rq_num_dup		| Number of duplicate requests received	|
      |-----------------------|---------------------------------------|
      |sq_num_to		| Number of time out received		|
      |-----------------------|---------------------------------------|
      |num_cqovf		| Number of CQ overflows		|
      |-----------------------|---------------------------------------|
      Signed-off-by: NMark Bloch <markb@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      3f85f2aa
    • M
      net/mlx4: Query performance and diagnostics counters · bfaf3168
      Mark Bloch 提交于
      Add a function to query diagnostics counters from the firmware.
      Signed-off-by: NMark Bloch <markb@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      bfaf3168
    • M
      net/mlx4: Add diagnostic counters capability bit · c7c122ed
      Mark Bloch 提交于
      Add a bit that indicates if the firmware supports per port
      diagnostic counters.
      Signed-off-by: NMark Bloch <markb@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      c7c122ed
    • P
      extable.h: add stddef.h so "NULL" definition is not implicit · 49aadcf1
      Paul Gortmaker 提交于
      While not an issue now, eventually we will have independent users of
      the extable.h file and we will stop sourcing it via module.h header.
      
      In testing that pending work, with very sparse builds, characteristic
      of an "allnoconfig" on various architectures, we can sometimes hit an
      instance where the very basic standard definitions aren't present,
      resulting in:
      
       include/linux/extable.h:26:9: error: 'NULL' undeclared (first use in this function)
      
      To be clear, this isn't a regression, since currently extable.h is
      only used by module.h -- however, we will need this addition present
      before we start migrating exception table users off module.h and onto
      extable.h during the next release cycle.
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      49aadcf1
    • J
      modules: add ro_after_init support · 444d13ff
      Jessica Yu 提交于
      Add ro_after_init support for modules by adding a new page-aligned section
      in the module layout (after rodata) for ro_after_init data and enabling RO
      protection for that section after module init runs.
      Signed-off-by: NJessica Yu <jeyu@redhat.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      444d13ff