1. 18 8月, 2016 2 次提交
    • D
      perf/core: Generalize event->group_flags · 4ff6a8de
      David Carrillo-Cisneros 提交于
      Currently, PERF_GROUP_SOFTWARE is used in the group_flags field of a
      group's leader to indicate that is_software_event(event) is true for all
      events in a group. This is the only usage of event->group_flags.
      
      This pattern of setting a group level flags when all events in the group
      share a property is useful for the flag introduced in the next patch and
      for future CQM/CMT flags. So this patches generalizes group_flags to work
      as an aggregate of event level flags.
      
      PERF_GROUP_SOFTWARE denotes an inmutable event's property. All other flags
      that I intend to add are also determinable at event initialization.
      To better convey the above, this patch renames event's group_flags to
      group_caps and PERF_GROUP_SOFTWARE to PERF_EV_CAP_SOFTWARE.
      
      Individual event flags are stored in the new event->event_caps. Since the
      cap flags do not change after event initialization, there is no need to
      serialize event_caps. This new field is used when events are added to a
      context, similarly to how PERF_GROUP_SOFTWARE and is_software_event()
      worked.
      
      Lastly, for consistency, updates is_software_event() to rely in event_cap
      instead of the context index.
      Signed-off-by: NDavid Carrillo-Cisneros <davidcc@google.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1471467307-61171-3-git-send-email-davidcc@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      4ff6a8de
    • M
      bitmap.h, perf/core: Fix the mask in perf_output_sample_regs() · 29dd3288
      Madhavan Srinivasan 提交于
      When decoding the perf_regs mask in perf_output_sample_regs(),
      we loop through the mask using find_first_bit and find_next_bit functions.
      
      While the exisiting code works fine in most of the case, the logic
      is broken for big-endian 32-bit kernels.
      
      When reading a u64 mask using (u32 *)(&val)[0], find_*_bit() assumes
      that it gets the lower 32 bits of u64, but instead it gets the upper
      32 bits - which is wrong.
      
      The fix is to swap the words of the u64 to handle this case.
      This is _not_ a regular endianness swap.
      Suggested-by: NYury Norov <ynorov@caviumnetworks.com>
      Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: NYury Norov <ynorov@caviumnetworks.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/1471426568-31051-2-git-send-email-maddy@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      29dd3288
  2. 12 8月, 2016 2 次提交
  3. 11 8月, 2016 1 次提交
  4. 10 8月, 2016 4 次提交
    • P
      locking/qrwlock: Fix write unlock bug on big endian systems · 2db34e8b
      pan xinhui 提交于
      This patch aims to get rid of endianness in queued_write_unlock(). We
      want to set  __qrwlock->wmode to NULL, however the address is not
      &lock->cnts in big endian machine. That causes queued_write_unlock()
      write NULL to the wrong field of __qrwlock.
      
      So implement __qrwlock_write_byte() which returns the correct
      __qrwlock->wmode address.
      Suggested-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NPan Xinhui <xinhui.pan@linux.vnet.ibm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman.Long@hpe.com
      Cc: arnd@arndb.de
      Cc: boqun.feng@gmail.com
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1468835259-4486-1-git-send-email-xinhui.pan@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2db34e8b
    • P
      perf/core: Optimize perf_pmu_sched_task() · e48c1788
      Peter Zijlstra 提交于
      For perf record -b, which requires the pmu::sched_task callback the
      current code is rather expensive:
      
           7.68%  sched-pipe  [kernel.vmlinux]    [k] perf_pmu_sched_task
           5.95%  sched-pipe  [kernel.vmlinux]    [k] __switch_to
           5.20%  sched-pipe  [kernel.vmlinux]    [k] __intel_pmu_disable_all
           3.95%  sched-pipe  perf                [.] worker_thread
      
      The problem is that it will iterate all registered PMUs, most of which
      will not have anything to do. Avoid this by keeping an explicit list
      of PMUs that have requested the callback.
      
      The perf_sched_cb_{inc,dec}() functions already takes the required pmu
      argument, and now that these functions are no longer called from NMI
      context we can use them to manage a list.
      
      With this patch applied the function doesn't show up in the top 4
      anymore (it dropped to 18th place).
      
           6.67%  sched-pipe  [kernel.vmlinux]    [k] __switch_to
           6.18%  sched-pipe  [kernel.vmlinux]    [k] __intel_pmu_disable_all
           3.92%  sched-pipe  [kernel.vmlinux]    [k] switch_mm_irqs_off
           3.71%  sched-pipe  perf                [.] worker_thread
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      e48c1788
    • D
      perf/core: Set cgroup in CPU contexts for new cgroup events · db4a8356
      David Carrillo-Cisneros 提交于
      There's a perf stat bug easy to observer on a machine with only one cgroup:
      
        $ perf stat -e cycles -I 1000 -C 0 -G /
        #          time             counts unit events
            1.000161699      <not counted>      cycles                    /
            2.000355591      <not counted>      cycles                    /
            3.000565154      <not counted>      cycles                    /
            4.000951350      <not counted>      cycles                    /
      
      We'd expect some output there.
      
      The underlying problem is that there is an optimization in
      perf_cgroup_sched_{in,out}() that skips the switch of cgroup events
      if the old and new cgroups in a task switch are the same.
      
      This optimization interacts with the current code in two ways
      that cause a CPU context's cgroup (cpuctx->cgrp) to be NULL even if a
      cgroup event matches the current task. These are:
      
        1. On creation of the first cgroup event in a CPU: In current code,
        cpuctx->cpu is only set in perf_cgroup_sched_in, but due to the
        aforesaid optimization, perf_cgroup_sched_in will run until the next
        cgroup switches in that CPU. This may happen late or never happen,
        depending on system's number of cgroups, CPU load, etc.
      
        2. On deletion of the last cgroup event in a cpuctx: In list_del_event,
        cpuctx->cgrp is set NULL. Any new cgroup event will not be sched in
        because cpuctx->cgrp == NULL until a cgroup switch occurs and
        perf_cgroup_sched_in is executed (updating cpuctx->cgrp).
      
      This patch fixes both problems by setting cpuctx->cgrp in list_add_event,
      mirroring what list_del_event does when removing a cgroup event from CPU
      context, as introduced in:
      
        commit 68cacd29 ("perf_events: Fix stale ->cgrp pointer in update_cgrp_time_from_cpuctx()")
      
      With this patch, cpuctx->cgrp is always set/clear when installing/removing
      the first/last cgroup event in/from the CPU context. With cpuctx->cgrp
      correctly set, event_filter_match works as intended when events are
      sched in/out.
      
      After the fix, the output is as expected:
      
        $ perf stat -e cycles -I 1000 -a -G /
        #         time             counts unit events
           1.004699159          627342882      cycles                    /
           2.007397156          615272690      cycles                    /
           3.010019057          616726074      cycles                    /
      Signed-off-by: NDavid Carrillo-Cisneros <davidcc@google.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1470124092-113192-1-git-send-email-davidcc@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      db4a8356
    • L
      Revert "printk: create pr_<level> functions" · a0cba217
      Linus Torvalds 提交于
      This reverts commit 874f9c7d.
      
      Geert Uytterhoeven reports:
       "This change seems to have an (unintendent?) side-effect.
      
        Before, pr_*() calls without a trailing newline characters would be
        printed with a newline character appended, both on the console and in
        the output of the dmesg command.
      
        After this commit, no new line character is appended, and the output
        of the next pr_*() call of the same type may be appended, like in:
      
          - Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000
          - Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)
          + Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)"
      
      Joe Perches says:
       "No, that is not intentional.
      
        The newline handling code inside vprintk_emit is a bit involved and
        for now I suggest a revert until this has all the same behavior as
        earlier"
      Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Requested-by: NJoe Perches <joe@perches.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a0cba217
  5. 09 8月, 2016 5 次提交
    • S
      tracing: Fix tick_stop tracepoint symbols for user export · c87edb36
      Steven Rostedt (Red Hat) 提交于
      The symbols used in the tick_stop tracepoint were not being converted
      properly into integers in the trace_stop format file. Instead we had this:
      
      print fmt: "success=%d dependency=%s", REC->success,
          __print_symbolic(REC->dependency, { 0, "NONE" },
           { (1 << TICK_DEP_BIT_POSIX_TIMER), "POSIX_TIMER" },
           { (1 << TICK_DEP_BIT_PERF_EVENTS), "PERF_EVENTS" },
           { (1 << TICK_DEP_BIT_SCHED), "SCHED" },
           { (1 << TICK_DEP_BIT_CLOCK_UNSTABLE), "CLOCK_UNSTABLE" })
      
      User space tools have no idea how to parse "TICK_DEP_BIT_SCHED" or the other
      symbols used to do the bit shifting. The reason is that the conversion was
      done with using the TICK_DEP_MASK_* symbols which are just macros that
      convert to the BIT shift itself (with the exception of NONE, which was
      converted properly, because it doesn't use bits, and is defined as zero).
      
      The TICK_DEP_BIT_* needs to be denoted by TRACE_DEFINE_ENUM() in order to
      have this properly converted for user space tools to parse this event.
      
      Cc: stable@vger.kernel.org
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Fixes: e6e6cc22 ("nohz: Use enum code for tick stop failure tracing message")
      Reported-by: NLuiz Capitulino <lcapitulino@redhat.com>
      Tested-by: NLuiz Capitulino <lcapitulino@redhat.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      c87edb36
    • S
      virtio-vsock: fix include guard typo · 28ad5557
      Stefan Hajnoczi 提交于
      Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      28ad5557
    • M
      genirq/msi: Make sure PCI MSIs are activated early · f3b0946d
      Marc Zyngier 提交于
      Bharat Kumar Gogada reported issues with the generic MSI code, where the
      end-point ended up with garbage in its MSI configuration (both for the vector
      and the message).
      
      It turns out that the two MSI paths in the kernel are doing slightly different
      things:
      
      generic MSI: disable MSI -> allocate MSI -> enable MSI -> setup EP
      PCI MSI: disable MSI -> allocate MSI -> setup EP -> enable MSI
      
      And it turns out that end-points are allowed to latch the content of the MSI
      configuration registers as soon as MSIs are enabled.  In Bharat's case, the
      end-point ends up using whatever was there already, which is not what you
      want.
      
      In order to make things converge, we introduce a new MSI domain flag
      (MSI_FLAG_ACTIVATE_EARLY) that is unconditionally set for PCI/MSI. When set,
      this flag forces the programming of the end-point as soon as the MSIs are
      allocated.
      
      A consequence of this is that we have an extra activate in irq_startup, but
      that should be without much consequence.
      
      tglx: 
      
       - Several people reported a VMWare regression with PCI/MSI-X passthrough. It
         turns out that the patch also cures that issue.
      
       - We need to have a look at the MSI disable interrupt path, where we write
         the msg to all zeros without disabling MSI in the PCI device. Is that
         correct?
      
      Fixes: 52f518a3 "x86/MSI: Use hierarchical irqdomains to manage MSI interrupts"
      Reported-and-tested-by: NBharat Kumar Gogada <bharat.kumar.gogada@xilinx.com>
      Reported-and-tested-by: NFoster Snowhill <forst@forstwoof.ru>
      Reported-by: NMatthias Prager <linux@matthiasprager.de>
      Reported-by: NJason Taylor <jason.taylor@simplivity.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Acked-by: NBjorn Helgaas <bhelgaas@google.com>
      Cc: linux-pci@vger.kernel.org
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1468426713-31431-1-git-send-email-marc.zyngier@arm.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      f3b0946d
    • P
      cxl: Use fixed width predefined types in data structure. · cbd74e1b
      Philippe Bergheaud 提交于
      This patch fixes a regression introduced by commit b810253b ("cxl:
      Add mechanism for delivering AFU driver specific events").
      
      It changes the type u8 to __u8 in the uapi header cxl.h, because the
      former is a kernel internal type, and may not be defined in userland
      build environments, in particular when cross-compiling libcxl on x86_64
      linux machines (RHEL6.7 and Ubuntu 16.04).
      
      This patch also changes the size of the field data_size, and makes it
      constant, to support 32-bit userland applications running on big-endian
      ppc64 kernels transparently.
      
      mpe: This is an ABI change, however the ABI was only added during the
      4.8 merge window so has never been part of a released kernel - therefore
      we give ourselves permission to change it.
      
      Fixes: b810253b ("cxl: Add mechanism for delivering AFU driver specific events")
      Signed-off-by: NPhilippe Bergheaud <felix@linux.vnet.ibm.com>
      Reviewed-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      cbd74e1b
    • L
      unsafe_[get|put]_user: change interface to use a error target label · 1bd4403d
      Linus Torvalds 提交于
      When I initially added the unsafe_[get|put]_user() helpers in commit
      5b24a7a2 ("Add 'unsafe' user access functions for batched
      accesses"), I made the mistake of modeling the interface on our
      traditional __[get|put]_user() functions, which return zero on success,
      or -EFAULT on failure.
      
      That interface is fairly easy to use, but it's actually fairly nasty for
      good code generation, since it essentially forces the caller to check
      the error value for each access.
      
      In particular, since the error handling is already internally
      implemented with an exception handler, and we already use "asm goto" for
      various other things, we could fairly easily make the error cases just
      jump directly to an error label instead, and avoid the need for explicit
      checking after each operation.
      
      So switch the interface to pass in an error label, rather than checking
      the error value in the caller.  Best do it now before we start growing
      more users (the signal handling code in particular would be a good place
      to use the new interface).
      
      So rather than
      
      	if (unsafe_get_user(x, ptr))
      		... handle error ..
      
      the interface is now
      
      	unsafe_get_user(x, ptr, label);
      
      where an error during the user mode fetch will now just cause a jump to
      'label' in the caller.
      
      Right now the actual _implementation_ of this all still ends up being a
      "if (err) goto label", and does not take advantage of any exception
      label tricks, but for "unsafe_put_user()" in particular it should be
      fairly straightforward to convert to using the exception table model.
      
      Note that "unsafe_get_user()" is much harder to convert to a clever
      exception table model, because current versions of gcc do not allow the
      use of "asm goto" (for the exception) with output values (for the actual
      value to be fetched).  But that is hopefully not a limitation in the
      long term.
      
      [ Also note that it might be a good idea to switch unsafe_get_user() to
        actually _return_ the value it fetches from user space, but this
        commit only changes the error handling semantics ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1bd4403d
  6. 08 8月, 2016 2 次提交
    • J
      block: rename bio bi_rw to bi_opf · 1eff9d32
      Jens Axboe 提交于
      Since commit 63a4cc24, bio->bi_rw contains flags in the lower
      portion and the op code in the higher portions. This means that
      old code that relies on manually setting bi_rw is most likely
      going to be broken. Instead of letting that brokeness linger,
      rename the member, to force old and out-of-tree code to break
      at compile time instead of at runtime.
      
      No intended functional changes in this commit.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      1eff9d32
    • J
      block/mm: make bdev_ops->rw_page() take a bool for read/write · c11f0c0b
      Jens Axboe 提交于
      Commit abf54548 changed it from an 'rw' flags type to the
      newer ops based interface, but now we're effectively leaking
      some bdev internals to the rest of the kernel. Since we only
      care about whether it's a read or a write at that level, just
      pass in a bool 'is_write' parameter instead.
      
      Then we can also move op_is_write() and friends back under
      CONFIG_BLOCK protection.
      Reviewed-by: NMike Christie <mchristi@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      c11f0c0b
  7. 06 8月, 2016 3 次提交
  8. 05 8月, 2016 4 次提交
    • M
      mm/block: convert rw_page users to bio op use · abf54548
      Mike Christie 提交于
      The rw_page users were not converted to use bio/req ops. As a result
      bdev_write_page is not passing down REQ_OP_WRITE and the IOs will
      be sent down as reads.
      Signed-off-by: NMike Christie <mchristi@redhat.com>
      Fixes: 4e1b2d52 ("block, fs, drivers: remove REQ_OP compat defs and related code")
      
      Modified by me to:
      
      1) Drop op_flags passing into ->rw_page(), as we don't use it.
      2) Make op_is_write() and friends safe to use for !CONFIG_BLOCK
      Signed-off-by: NJens Axboe <axboe@fb.com>
      abf54548
    • J
      Include: blkdev: Removed duplicate 'struct request;' declaration. · 6d25ec14
      John Pittman 提交于
      In include/linux/blkdev.h duplicate declarations of the request
      struct exist.  Cleaned up by removing the second, unneeded
      declaration.
      Signed-off-by: NJohn Pittman <jpittman@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      6d25ec14
    • D
      block: fix bdi vs gendisk lifetime mismatch · df08c32c
      Dan Williams 提交于
      The name for a bdi of a gendisk is derived from the gendisk's devt.
      However, since the gendisk is destroyed before the bdi it leaves a
      window where a new gendisk could dynamically reuse the same devt while a
      bdi with the same name is still live.  Arrange for the bdi to hold a
      reference against its "owner" disk device while it is registered.
      Otherwise we can hit sysfs duplicate name collisions like the following:
      
       WARNING: CPU: 10 PID: 2078 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x64/0x80
       sysfs: cannot create duplicate filename '/devices/virtual/bdi/259:1'
      
       Hardware name: HP ProLiant DL580 Gen8, BIOS P79 05/06/2015
        0000000000000286 0000000002c04ad5 ffff88006f24f970 ffffffff8134caec
        ffff88006f24f9c0 0000000000000000 ffff88006f24f9b0 ffffffff8108c351
        0000001f0000000c ffff88105d236000 ffff88105d1031e0 ffff8800357427f8
       Call Trace:
        [<ffffffff8134caec>] dump_stack+0x63/0x87
        [<ffffffff8108c351>] __warn+0xd1/0xf0
        [<ffffffff8108c3cf>] warn_slowpath_fmt+0x5f/0x80
        [<ffffffff812a0d34>] sysfs_warn_dup+0x64/0x80
        [<ffffffff812a0e1e>] sysfs_create_dir_ns+0x7e/0x90
        [<ffffffff8134faaa>] kobject_add_internal+0xaa/0x320
        [<ffffffff81358d4e>] ? vsnprintf+0x34e/0x4d0
        [<ffffffff8134ff55>] kobject_add+0x75/0xd0
        [<ffffffff816e66b2>] ? mutex_lock+0x12/0x2f
        [<ffffffff8148b0a5>] device_add+0x125/0x610
        [<ffffffff8148b788>] device_create_groups_vargs+0xd8/0x100
        [<ffffffff8148b7cc>] device_create_vargs+0x1c/0x20
        [<ffffffff811b775c>] bdi_register+0x8c/0x180
        [<ffffffff811b7877>] bdi_register_dev+0x27/0x30
        [<ffffffff813317f5>] add_disk+0x175/0x4a0
      
      Cc: <stable@vger.kernel.org>
      Reported-by: NYi Zhang <yizhan@redhat.com>
      Tested-by: NYi Zhang <yizhan@redhat.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      
      Fixed up missing 0 return in bdi_register_owner().
      Signed-off-by: NJens Axboe <axboe@fb.com>
      df08c32c
    • P
      block: add missing group association in bio-cloning functions · 20bd723e
      Paolo Valente 提交于
      When a bio is cloned, the newly created bio must be associated with
      the same blkcg as the original bio (if BLK_CGROUP is enabled). If
      this operation is not performed, then the new bio is not associated
      with any group, and the group of the current task is returned when
      the group of the bio is requested.
      
      Depending on the cloning frequency, this may cause a large
      percentage of the bios belonging to a given group to be treated
      as if belonging to other groups (in most cases as if belonging to
      the root group). The expected group isolation may thereby be broken.
      
      This commit adds the missing association in bio-cloning functions.
      
      Fixes: da2f0f74 ("Btrfs: add support for blkio controllers")
      Cc: stable@vger.kernel.org # v4.3+
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Reviewed-by: NNikolay Borisov <kernel@kyup.com>
      Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      20bd723e
  9. 04 8月, 2016 13 次提交
    • M
      Soft RoCE driver · 8700e3e7
      Moni Shoua 提交于
      Soft RoCE (RXE) - The software RoCE driver
      
      ib_rxe implements the RDMA transport and registers to the RDMA core
      device as a kernel verbs provider. It also implements the packet IO
      layer. On the other hand ib_rxe registers to the Linux netdev stack
      as a udp encapsulating protocol, in that case RDMA, for sending and
      receiving packets over any Ethernet device.  This yields a RDMA
      transport over the UDP/Ethernet network layer forming a RoCEv2
      compatible device.
      
      The configuration procedure of the Soft RoCE drivers requires
      binding to any existing Ethernet network device. This is done with
      /sys interface.
      
      A userspace Soft RoCE library (librxe) provides user applications
      the ability to run with Soft RoCE devices.  The use of rxe verbs ins
      user space requires the inclusion of librxe as a device specifics
      plug-in to libibverbs. librxe is packaged separately.
      
      Architecture:
      
           +-----------------------------------------------------------+
           |                          Application                      |
           +-----------------------------------------------------------+
                                  +-----------------------------------+
                                  |             libibverbs            |
      User                        +-----------------------------------+
                                  +----------------+ +----------------+
                                  | librxe         | | HW RoCE lib    |
                                  +----------------+ +----------------+
      +---------------------------------------------------------------+
           +--------------+                           +------------+
           | Sockets      |                           | RDMA ULP   |
           +--------------+                           +------------+
           +--------------+                  +---------------------+
           | TCP/IP       |                  | ib_core             |
           +--------------+                  +---------------------+
                                   +------------+ +----------------+
      Kernel                       | ib_rxe     | | HW RoCE driver |
                                   +------------+ +----------------+
           +------------------------------------+
           | NIC driver                         |
           +------------------------------------+
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           +-----------------------------------------------------------+
           |                          Application                      |
           +-----------------------------------------------------------+
                                  +-----------------------------------+
                                  |             libibverbs            |
      User                        +-----------------------------------+
                                  +----------------+ +----------------+
                                  | librxe         | | HW RoCE lib    |
                                  +----------------+ +----------------+
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           +--------------+                           +------------+
           | Sockets      |                           | RDMA ULP   |
           +--------------+                           +------------+
           +--------------+                  +---------------------+
           | TCP/IP       |                  | ib_core             |
           +--------------+                  +---------------------+
                                   +------------+ +----------------+
      Kernel                       | ib_rxe     | | HW RoCE driver |
                                   +------------+ +----------------+
           +------------------------------------+
           | NIC driver                         |
           +------------------------------------+
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      Soft RoCE resources:
      
      [1[ https://github.com/SoftRoCE/librxe-dev librxe - source code in
      Github
      [2] https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home - Soft RoCE
      Wiki page
      [3] https://github.com/SoftRoCE/librxe-dev - Soft RoCE userspace library
      Signed-off-by: NKamal Heib <kamalh@mellanox.com>
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NMoni Shoua <monis@mellanox.com>
      Reviewed-by: NHaggai Eran <haggaie@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      8700e3e7
    • J
      dynamic_debug: add jump label support · 9049fc74
      Jason Baron 提交于
      Although dynamic debug is often only used for debug builds, sometimes
      its enabled for production builds as well.  Minimize its impact by using
      jump labels.  This reduces the text section by 7000+ bytes in the kernel
      image below.  It does increase data, but this should only be referenced
      when changing the direction of the branches, and hence usually not in
      cache.
      
           text     data     bss       dec     hex  filename
        8194852  4879776  925696  14000324  d5a0c4  vmlinux.pre
        8187337  4960224  925696  14073257  d6bda9  vmlinux.post
      
      Link: http://lkml.kernel.org/r/d165b465e8c89bc582d973758d40be44c33f018b.1467837322.git.jbaron@akamai.comSigned-off-by: NJason Baron <jbaron@akamai.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9049fc74
    • J
      jump_label: remove bug.h, atomic.h dependencies for HAVE_JUMP_LABEL · 1f69bf9c
      Jason Baron 提交于
      The current jump_label.h includes bug.h for things such as WARN_ON().
      This makes the header problematic for inclusion by kernel.h or any
      headers that kernel.h includes, since bug.h includes kernel.h (circular
      dependency).  The inclusion of atomic.h is similarly problematic.  Thus,
      this should make jump_label.h 'includable' from most places.
      
      Link: http://lkml.kernel.org/r/7060ce35ddd0d20b33bf170685e6b0fab816bdf2.1467837322.git.jbaron@akamai.comSigned-off-by: NJason Baron <jbaron@akamai.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1f69bf9c
    • K
      dma-mapping: use unsigned long for dma_attrs · 00085f1e
      Krzysztof Kozlowski 提交于
      The dma-mapping core and the implementations do not change the DMA
      attributes passed by pointer.  Thus the pointer can point to const data.
      However the attributes do not have to be a bitfield.  Instead unsigned
      long will do fine:
      
      1. This is just simpler.  Both in terms of reading the code and setting
         attributes.  Instead of initializing local attributes on the stack
         and passing pointer to it to dma_set_attr(), just set the bits.
      
      2. It brings safeness and checking for const correctness because the
         attributes are passed by value.
      
      Semantic patches for this change (at least most of them):
      
          virtual patch
          virtual context
      
          @r@
          identifier f, attrs;
      
          @@
          f(...,
          - struct dma_attrs *attrs
          + unsigned long attrs
          , ...)
          {
          ...
          }
      
          @@
          identifier r.f;
          @@
          f(...,
          - NULL
          + 0
           )
      
      and
      
          // Options: --all-includes
          virtual patch
          virtual context
      
          @r@
          identifier f, attrs;
          type t;
      
          @@
          t f(..., struct dma_attrs *attrs);
      
          @@
          identifier r.f;
          @@
          f(...,
          - NULL
          + 0
           )
      
      Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.comSigned-off-by: NKrzysztof Kozlowski <k.kozlowski@samsung.com>
      Acked-by: NVineet Gupta <vgupta@synopsys.com>
      Acked-by: NRobin Murphy <robin.murphy@arm.com>
      Acked-by: NHans-Christian Noren Egtvedt <egtvedt@samfundet.no>
      Acked-by: Mark Salter <msalter@redhat.com> [c6x]
      Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
      Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
      Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com>
      Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
      Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
      Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
      Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
      Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
      Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
      Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
      Acked-by: NBjorn Andersson <bjorn.andersson@linaro.org>
      Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
      Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
      Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      00085f1e
    • A
      include/linux/bitmap.h: cleanup · 4b9d314c
      Andrew Morton 提交于
      Remove two unneeded `else's.
      
      Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4b9d314c
    • M
      tree-wide: replace config_enabled() with IS_ENABLED() · 97f2645f
      Masahiro Yamada 提交于
      The use of config_enabled() against config options is ambiguous.  In
      practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
      author might have used it for the meaning of IS_ENABLED().  Using
      IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc.  makes the intention
      clearer.
      
      This commit replaces config_enabled() with IS_ENABLED() where possible.
      This commit is only touching bool config options.
      
      I noticed two cases where config_enabled() is used against a tristate
      option:
      
       - config_enabled(CONFIG_HWMON)
        [ drivers/net/wireless/ath/ath10k/thermal.c ]
      
       - config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE)
        [ drivers/gpu/drm/gma500/opregion.c ]
      
      I did not touch them because they should be converted to IS_BUILTIN()
      in order to keep the logic, but I was not sure it was the authors'
      intention.
      
      Link: http://lkml.kernel.org/r/1465215656-20569-1-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: Stas Sergeev <stsp@list.ru>
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: Joshua Kinard <kumba@gentoo.org>
      Cc: Jiri Slaby <jslaby@suse.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Markos Chandras <markos.chandras@imgtec.com>
      Cc: "Dmitry V. Levin" <ldv@altlinux.org>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Will Drewry <wad@chromium.org>
      Cc: Nikolay Martynov <mar.kolya@gmail.com>
      Cc: Huacai Chen <chenhc@lemote.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
      Cc: Rafal Milecki <zajec5@gmail.com>
      Cc: James Cowgill <James.Cowgill@imgtec.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Alex Smith <alex.smith@imgtec.com>
      Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
      Cc: Qais Yousef <qais.yousef@imgtec.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Brian Norris <computersforpeace@gmail.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: "Luis R. Rodriguez" <mcgrof@do-not-panic.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Kalle Valo <kvalo@qca.qualcomm.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Tony Wu <tung7970@gmail.com>
      Cc: Huaitong Han <huaitong.han@intel.com>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Jason Cooper <jason@lakedaemon.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrea Gelmini <andrea.gelmini@gelma.net>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Rabin Vincent <rabin@rab.in>
      Cc: "Maciej W. Rozycki" <macro@imgtec.com>
      Cc: David Daney <david.daney@cavium.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      97f2645f
    • A
      IB/core: Support for CMA multicast join flags · ab15c95a
      Alex Vesker 提交于
      Added UCMA and CMA support for multicast join flags. Flags are
      passed using UCMA CM join command previously reserved fields.
      Currently supporting two join flags indicating two different
      multicast JoinStates:
      
      1. Full Member:
         The initiator creates the Multicast group(MCG) if it wasn't
         previously created, can send Multicast messages to the group
         and receive messages from the MCG.
      
      2. Send Only Full Member:
         The initiator creates the Multicast group(MCG) if it wasn't
         previously created, can send Multicast messages to the group
         but doesn't receive any messages from the MCG.
      
         IB: Send Only Full Member requires a query of ClassPortInfo
             to determine if SM/SA supports this option. If SM/SA
             doesn't support Send-Only there will be no join request
             sent and an error will be returned.
      
         ETH: When Send Only Full Member is requested no IGMP join
      	will be sent.
      Signed-off-by: NAlex Vesker <valex@mellanox.com>
      Reviewed by: Hal Rosenstock <hal@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      ab15c95a
    • M
      IB/mlx4: Add diagnostic hardware counters · 3f85f2aa
      Mark Bloch 提交于
      Expose IB diagnostic hardware counters.
      The counters count IB events and are applicable for IB and RoCE.
      
      The counters can be divided into two groups, per device and per port.
      Device counters are always exposed.
      Port counters are exposed only if the firmware supports per port counters.
      
      rq_num_dup and sq_num_to are only exposed if we have firmware support
      for them, if we do, we expose them per device and per port.
      rq_num_udsdprd and num_cqovf are device only counters.
      
      rq - denotes responder.
      sq - denotes requester.
      
      |-----------------------|---------------------------------------|
      |	Name		|	Description			|
      |-----------------------|---------------------------------------|
      |rq_num_lle		| Number of local length errors		|
      |-----------------------|---------------------------------------|
      |sq_num_lle		| number of local length errors		|
      |-----------------------|---------------------------------------|
      |rq_num_lqpoe		| Number of local QP operation errors	|
      |-----------------------|---------------------------------------|
      |sq_num_lqpoe		| Number of local QP operation errors	|
      |-----------------------|---------------------------------------|
      |rq_num_lpe		| Number of local protection errors	|
      |-----------------------|---------------------------------------|
      |sq_num_lpe		| Number of local protection errors	|
      |-----------------------|---------------------------------------|
      |rq_num_wrfe		| Number of CQEs with error		|
      |-----------------------|---------------------------------------|
      |sq_num_wrfe		| Number of CQEs with error		|
      |-----------------------|---------------------------------------|
      |sq_num_mwbe		| Number of Memory Window bind errors	|
      |-----------------------|---------------------------------------|
      |sq_num_bre		| Number of bad response errors		|
      |-----------------------|---------------------------------------|
      |sq_num_rire		| Number of Remote Invalid request	|
      |			| errors				|
      |-----------------------|---------------------------------------|
      |rq_num_rire		| Number of Remote Invalid request	|
      |			| errors				|
      |-----------------------|---------------------------------------|
      |sq_num_rae		| Number of remote access errors	|
      |-----------------------|---------------------------------------|
      |rq_num_rae		| Number of remote access errors	|
      |-----------------------|---------------------------------------|
      |sq_num_roe		| Number of remote operation errors	|
      |-----------------------|---------------------------------------|
      |sq_num_tree		| Number of transport retries exceeded	|
      |			| errors				|
      |-----------------------|---------------------------------------|
      |sq_num_rree		| Number of RNR NAK retries exceeded	|
      |			| errors				|
      |-----------------------|---------------------------------------|
      |rq_num_rnr		| Number of RNR NAKs sent		|
      |-----------------------|---------------------------------------|
      |sq_num_rnr		| Number of RNR NAKs received		|
      |-----------------------|---------------------------------------|
      |rq_num_oos		| Number of Out of Sequence requests	|
      |			| received				|
      |-----------------------|---------------------------------------|
      |sq_num_oos		| Number of Out of Sequence NAKs	|
      |			| received				|
      |-----------------------|---------------------------------------|
      |rq_num_udsdprd		| Number of UD packets silently		|
      |			| discarded on the Receive Queue due to	|
      |			| lack of receive descriptor		|
      |-----------------------|---------------------------------------|
      |rq_num_dup		| Number of duplicate requests received	|
      |-----------------------|---------------------------------------|
      |sq_num_to		| Number of time out received		|
      |-----------------------|---------------------------------------|
      |num_cqovf		| Number of CQ overflows		|
      |-----------------------|---------------------------------------|
      Signed-off-by: NMark Bloch <markb@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      3f85f2aa
    • M
      net/mlx4: Query performance and diagnostics counters · bfaf3168
      Mark Bloch 提交于
      Add a function to query diagnostics counters from the firmware.
      Signed-off-by: NMark Bloch <markb@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      bfaf3168
    • M
      net/mlx4: Add diagnostic counters capability bit · c7c122ed
      Mark Bloch 提交于
      Add a bit that indicates if the firmware supports per port
      diagnostic counters.
      Signed-off-by: NMark Bloch <markb@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      c7c122ed
    • P
      extable.h: add stddef.h so "NULL" definition is not implicit · 49aadcf1
      Paul Gortmaker 提交于
      While not an issue now, eventually we will have independent users of
      the extable.h file and we will stop sourcing it via module.h header.
      
      In testing that pending work, with very sparse builds, characteristic
      of an "allnoconfig" on various architectures, we can sometimes hit an
      instance where the very basic standard definitions aren't present,
      resulting in:
      
       include/linux/extable.h:26:9: error: 'NULL' undeclared (first use in this function)
      
      To be clear, this isn't a regression, since currently extable.h is
      only used by module.h -- however, we will need this addition present
      before we start migrating exception table users off module.h and onto
      extable.h during the next release cycle.
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      49aadcf1
    • J
      modules: add ro_after_init support · 444d13ff
      Jessica Yu 提交于
      Add ro_after_init support for modules by adding a new page-aligned section
      in the module layout (after rodata) for ro_after_init data and enabling RO
      protection for that section after module init runs.
      Signed-off-by: NJessica Yu <jeyu@redhat.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      444d13ff
    • P
      exceptions: fork exception table content from module.h into extable.h · 0ef76537
      Paul Gortmaker 提交于
      For historical reasons (i.e. pre-git) the exception table stuff was
      buried in the middle of the module.h file.  I noticed this while
      doing an audit for needless includes of module.h and found core
      kernel files (both arch specific and arch independent) were just
      including module.h for this.
      
      The converse is also true, in that conventional drivers, be they
      for filesystems or actual hardware peripherals or similar, do not
      normally care about the exception tables.
      
      Here we fork the exception table content out of module.h into a
      new file called extable.h -- and temporarily include it into the
      module.h itself.
      
      Then we will work our way across the arch independent and arch
      specific files needing just exception table content, and move
      them off module.h and onto extable.h
      
      Once that is done, we can remove the extable.h from module.h
      and in doing it like this, we avoid introducing build failures
      into the git history.
      
      The gain here is that module.h gets a bit smaller, across all
      modular drivers that we build for allmodconfig.  Also the core
      files that only need exception table stuff don't have an include
      of module.h that brings in lots of extra stuff and just looks
      generally out of place.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      0ef76537
  10. 03 8月, 2016 4 次提交