1. 05 Mar, 2014 2 commits
    • ia64/efi: Implement efi_enabled() · 09206380
      Committed by Matt Fleming
      There's no good reason to keep efi_enabled() under CONFIG_X86 anymore,
      since nothing about the implementation is specific to x86.
      
      Set EFI feature flags in the ia64 boot path instead of claiming to
      support all features. The old behaviour was actually buggy since
      efi.memmap never points to a valid memory map, so we shouldn't be
      claiming to support EFI_MEMMAP.
      
      Fortunately, this bug was never triggered because EFI_MEMMAP isn't used
      outside of arch/x86 currently, but that may not always be the case.
      Reviewed-and-tested-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
    • efi: Move facility flags to struct efi · 3e909599
      Committed by Matt Fleming
      As we grow support for more EFI architectures they're going to want the
      ability to query which EFI features are available on the running system.
      Instead of storing this information in an architecture-specific place,
      stick it in the global 'struct efi', which is already the central
      location for EFI state.
      
      While we're at it, let's change the return value of efi_enabled() to be
      bool and replace all references to 'facility' with 'feature', which is
      the usual word used to describe the attributes of the running system.
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
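The idea of keeping feature flags in one global structure and querying them through a bool-returning efi_enabled() can be sketched as below. This is a minimal userspace model, not the kernel's code: the bit numbers, the `struct efi_state` stand-in, and the `efi_set_feature()` helper are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit numbers; the kernel defines its own set of features. */
enum efi_feature {
    EFI_BOOT = 0,
    EFI_RUNTIME_SERVICES = 1,
    EFI_MEMMAP = 2,
};

/* Stand-in for the global 'struct efi': the flags live with the other EFI state. */
struct efi_state {
    unsigned long flags;
};

static struct efi_state efi;

/* Hypothetical helper: an architecture's boot path sets only what it supports. */
static void efi_set_feature(enum efi_feature feature)
{
    assert((unsigned int)feature < 8 * sizeof(efi.flags));
    efi.flags |= 1UL << feature;
}

/* Query a feature; returns bool rather than int. */
static bool efi_enabled(enum efi_feature feature)
{
    assert((unsigned int)feature < 8 * sizeof(efi.flags));
    return (efi.flags & (1UL << feature)) != 0;
}
```

Under this model, a boot path that has no valid memory map simply never sets EFI_MEMMAP, so efi_enabled(EFI_MEMMAP) stays false there.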
  2. 14 Feb, 2014 1 commit
  3. 13 Feb, 2014 1 commit
  4. 11 Feb, 2014 4 commits
    • block: Fix cloning of discard/write same bios · 8423ae3d
      Committed by Kent Overstreet
      Immutable biovecs changed the way bio segments are treated in such a way that
      bio_for_each_segment() cannot now do what we want for discard/write same bios,
      since bi_size means something completely different for them.
      
      Fortunately discard and write same bios never have more than a single biovec, so
      bio_for_each_segment() is unnecessary and not terribly meaningful for them, but
      we still have to special case them in a few places.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Tested-by: Richard W.M. Jones <rjones@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • smp.h: fix x86+cpu.c sparse warnings about arch nonboot CPU calls · fb37bb04
      Committed by Paul Gortmaker
      Use what we already do for arch_disable_smp_support() to fix these:
      
        arch/x86/kernel/smpboot.c:1155:6: warning: symbol 'arch_enable_nonboot_cpus_begin' was not declared. Should it be static?
        arch/x86/kernel/smpboot.c:1160:6: warning: symbol 'arch_enable_nonboot_cpus_end' was not declared. Should it be static?
        kernel/cpu.c:512:13: warning: symbol 'arch_enable_nonboot_cpus_begin' was not declared. Should it be static?
        kernel/cpu.c:516:13: warning: symbol 'arch_enable_nonboot_cpus_end' was not declared. Should it be static?
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • blk-mq: rework flush sequencing logic · 18741986
      Committed by Christoph Hellwig
      Switch to using a preallocated flush_rq for blk-mq similar to what's done
      with the old request path.  This allows us to set up the request properly
      with a tag from the actually allowed range and ->rq_disk as needed by
      some drivers.  To make life easier we also switch to dynamic allocation
      of ->flush_rq for the old path.
      
      This effectively reverts most of
      
          "blk-mq: fix for flush deadlock"
      
      and
      
          "blk-mq: Don't reserve a tag for flush request"
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: rework I/O completions · 30a91cb4
      Committed by Christoph Hellwig
      Rework I/O completions to work more like the old code path.  blk_mq_end_io
      now stays out of the business of deferring completions to other CPUs
      and calling blk_mark_rq_complete.  The latter is very important to allow
      completing requests that have timed out and thus are already marked completed,
      the former allows using the IPI callout even for driver specific completions
      instead of having to reimplement them.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  5. 10 Feb, 2014 2 commits
    • fs: Add prototype declaration to appropriate header file include/linux/bio.h · c4540a7d
      Committed by Rashika Kheria
      Add prototype declaration to header file include/linux/bio.h because it
      is used by more than one file.
      
      This eliminates the following warning in bio-integrity.c:
      fs/bio-integrity.c:214:14: warning: no previous prototype for ‘bio_integrity_tag_size’ [-Wmissing-prototypes]
      Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • fix O_SYNC|O_APPEND syncing the wrong range on write() · d311d79d
      Committed by Al Viro
      It actually goes back to 2004 ([PATCH] Concurrent O_SYNC write support)
      when sync_page_range() had been introduced; generic_file_write{,v}() correctly
      synced
      	pos_after_write - written .. pos_after_write - 1
      but generic_file_aio_write() synced
      	pos_before_write .. pos_before_write + written - 1
      instead.  Which is not the same thing with O_APPEND, obviously.
      A couple of years later correct variant had been killed off when
      everything switched to use of generic_file_aio_write().
      
      All users of generic_file_aio_write() are affected, and the same bug
      has been copied into other instances of ->aio_write().
      
      The fix is trivial; the only subtle point is that generic_write_sync()
      ought to be inlined to avoid calculations useless for the majority of
      calls.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
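The difference between the two range calculations described above can be worked through with a small model. This is only a sketch: `model_write_end()` and the two `range_from_*()` helpers are hypothetical names, and the write itself is reduced to arithmetic on offsets (with append, the data lands at the end of the file regardless of the caller's position).

```c
#include <assert.h>
#include <stdbool.h>

/* Inclusive byte range handed to the sync step. */
struct byte_range {
    long long first;
    long long last;
};

/* Toy model of a write: returns the file position after the write.
 * With append set, the data starts at file_size, not at pos. */
static long long model_write_end(long long pos, long long file_size,
                                 long long len, bool append)
{
    assert(len > 0);
    long long start = append ? file_size : pos;
    return start + len;
}

/* Range derived from the position before the write (wrong with append). */
static struct byte_range range_from_pos_before(long long pos_before,
                                               long long written)
{
    struct byte_range r = { pos_before, pos_before + written - 1 };
    return r;
}

/* Range derived from the position after the write (always covers the data). */
static struct byte_range range_from_pos_after(long long pos_after,
                                              long long written)
{
    struct byte_range r = { pos_after - written, pos_after - 1 };
    return r;
}
```

For example, appending 100 bytes to a 4096-byte file with a caller position of 0 gives a position after the write of 4196. Working back from it yields 4096..4195, the bytes actually written, while working forward from the position before the write yields 0..99, which contains none of the new data.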
  6. 09 Feb, 2014 1 commit
  7. 08 Feb, 2014 3 commits
  8. 07 Feb, 2014 2 commits
    • IB/mlx5: Fix binary compatibility with libmlx5 · 78c0f98c
      Committed by Eli Cohen
      Commit c1be5232 ("Fix micro UAR allocator") broke binary compatibility
      between libmlx5 and mlx5_ib since it defines a different value to the number
      of micro UARs per page, leading to wrong calculation in libmlx5. This patch
      defines struct mlx5_ib_alloc_ucontext_req_v2 as an extension to struct
      mlx5_ib_alloc_ucontext_req.  The extended size is determined in mlx5_ib_alloc_ucontext()
      and in case of old library we use uuarn 0 which works fine -- this is
      achieved due to create_user_qp() falling back from high to medium then to
      low class where low class will return 0.  For new libraries we use the
      more sophisticated allocation algorithm.
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
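Telling an old library from a new one by the size of the request it sends might look roughly like the sketch below. It is not the driver's code: the struct names `req_v1` and `req_v2`, their field names, and `parse_req()` are all hypothetical, and the only idea taken from the commit message is that the v2 request extends the original one and that the size decides which path is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical original request layout sent by an old library. */
struct req_v1 {
    uint32_t total_uuars;
    uint32_t low_latency_uuars;
};

/* Hypothetical extended layout: same prefix, extra fields appended. */
struct req_v2 {
    uint32_t total_uuars;
    uint32_t low_latency_uuars;
    uint32_t flags;
    uint32_t reserved;
};

/* Copy the request into a zeroed v2 struct and report which version was seen.
 * Returns 1 or 2, or -1 if the length matches neither layout. */
static int parse_req(const void *buf, size_t len, struct req_v2 *out)
{
    assert(buf != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    if (len == sizeof(struct req_v1)) {
        memcpy(out, buf, sizeof(struct req_v1));
        return 1;   /* old library: caller falls back to the simple path */
    }
    if (len == sizeof(struct req_v2)) {
        memcpy(out, buf, sizeof(struct req_v2));
        return 2;   /* new library: caller may use the newer allocator */
    }
    return -1;
}
```

Because the extended struct keeps the original fields as its prefix, an old caller's request is still read correctly and the new fields simply default to zero.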
    • swap: add a simple detector for inappropriate swapin readahead · 579f8290
      Committed by Shaohua Li
      This is a patch to improve swap readahead algorithm.  It's from Hugh and
      I slightly changed it.
      
      Hugh's original changelog:
      
      swapin readahead does a blind readahead, whether or not the swapin is
      sequential.  This may be ok on harddisk, because large reads have
      relatively small costs, and if the readahead pages are unneeded they can
      be reclaimed easily - though, what if their allocation forced reclaim of
      useful pages? But on SSD devices large reads are more expensive than
      small ones: if the readahead pages are unneeded, reading them in caused
      significant overhead.
      
      This patch adds very simplistic random read detection.  Stealing the
      PageReadahead technique from Konstantin Khlebnikov's patch, avoiding the
      vma/anon_vma sophistications of Shaohua Li's patch, swapin_nr_pages()
      simply looks at readahead's current success rate, and narrows or widens
      its readahead window accordingly.  There is little science to its
      heuristic: it's about as stupid as can be whilst remaining effective.
      
      The table below shows elapsed times (in centiseconds) when running a
      single repetitive swapping load across a 1000MB mapping in 900MB ram
      with 1GB swap (the harddisk tests had taken painfully too long when I
      used mem=500M, but SSD shows similar results for that).
      
      Vanilla is the 3.6-rc7 kernel on which I started; Shaohua denotes his
      Sep 3 patch in mmotm and linux-next; HughOld denotes my Oct 1 patch
      which Shaohua showed to be defective; HughNew this Nov 14 patch, with
      page_cluster as usual at default of 3 (8-page reads); HughPC4 this same
      patch with page_cluster 4 (16-page reads); HughPC0 with page_cluster 0
      (1-page reads: no readahead).
      
      HDD for swapping to harddisk, SSD for swapping to VertexII SSD.  Seq for
      sequential access to the mapping, cycling five times around; Rand for
      the same number of random touches.  Anon for a MAP_PRIVATE anon mapping;
      Shmem for a MAP_SHARED anon mapping, equivalent to tmpfs.
      
      One weakness of Shaohua's vma/anon_vma approach was that it did not
      optimize Shmem: seen below.  Konstantin's approach was perhaps mistuned,
      50% slower on Seq: did not compete and is not shown below.
      
      HDD        Vanilla Shaohua HughOld HughNew HughPC4 HughPC0
      Seq Anon     73921   76210   75611   76904   78191  121542
      Seq Shmem    73601   73176   73855   72947   74543  118322
      Rand Anon   895392  831243  871569  845197  846496  841680
      Rand Shmem 1058375 1053486  827935  764955  764376  756489
      
      SSD        Vanilla Shaohua HughOld HughNew HughPC4 HughPC0
      Seq Anon     24634   24198   24673   25107   21614   70018
      Seq Shmem    24959   24932   25052   25703   22030   69678
      Rand Anon    43014   26146   28075   25989   26935   25901
      Rand Shmem   45349   45215   28249   24268   24138   24332
      
      These tests are, of course, two extremes of a very simple case: under
      heavier mixed loads I've not yet observed any consistent improvement or
      degradation, and wider testing would be welcome.
      
      Shaohua Li:
      
      Testing shows Vanilla is slightly better than Hugh's patch in the
      sequential workload.  I observed that with Hugh's patch the readahead
      size sometimes shrinks too fast (from 8 to 1 immediately) in the
      sequential workload if there is no hit.  In such a case, continuing to
      do readahead is actually good.
      
      I have not prepared a sophisticated algorithm for the sequential workload
      because so far we can't guarantee that sequentially accessed pages are
      swapped out sequentially.  So I slightly changed Hugh's heuristic: don't
      shrink the readahead size too fast.
      
      Here are my test results (in seconds, average of 3 runs):
      	Vanilla		Hugh		New
      Seq	356		370		360
      Random	4525		2447		2444
      
      The attached graph shows the swapin/swapout throughput I collected with
      'vmstat 2'.  The first part is running a random workload (until around
      1200 on the x-axis) and the second part is running a sequential workload.
      Swapin and swapout throughput are almost identical in steady state in
      both workloads, which is the expected behavior, whereas in Vanilla swapin
      is much larger than swapout, especially in the random workload (because
      of wrong readahead).
      
      Original patches by: Shaohua Li and Konstantin Khlebnikov.
      
      [fengguang.wu@intel.com: swapin_nr_pages() can be static]
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
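A window heuristic of the kind described above (widen on readahead hits, narrow on misses, but never shrink too fast) could be modeled as follows. This is a simplified sketch and not the kernel's swapin_nr_pages(): the function name `next_window()`, its parameters, and the exact growth rule are assumptions; only the "at most halve per step" limit and the cap at the page_cluster window size come from the changelog.

```c
#include <assert.h>

/* Pick the next readahead window size in pages.
 *   hits        - readahead pages that were actually used since the last call
 *   last_window - window size chosen on the previous call
 *   max_window  - upper bound, e.g. 8 for page_cluster 3
 */
static unsigned int next_window(unsigned int hits, unsigned int last_window,
                                unsigned int max_window)
{
    assert(max_window >= 1);

    unsigned int pages = 1;         /* no hits: treat the access as random */
    if (hits > 0) {
        pages = 4;                  /* some hits: grow to a power of two */
        while (pages < hits + 2)
            pages <<= 1;
    }

    if (pages > max_window)
        pages = max_window;

    /* Do not shrink too fast: drop to at most half of the previous window. */
    if (pages < last_window / 2)
        pages = last_window / 2;

    return pages;
}
```

With max_window set to 8, a run of misses steps the window down 8, 4, 2, 1 rather than jumping straight from 8 to 1, which is the behavior Shaohua's change is meant to provide for sequential workloads that see an occasional miss.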
  9. 06 Feb, 2014 3 commits
    • gpio: consumer.h: Move forward declarations outside #ifdef · a3485d08
      Committed by Lars-Peter Clausen
      Make sure that the forward declared structs in gpio/consumer.h are also visible
      on the else branch of the CONFIG_GPIOLIB #ifdef.
      
      Fixes the following warnings and their associated errors when CONFIG_GPIOLIB is
      not selected:
      	include/linux/gpio/consumer.h:67:14: warning: 'struct device' declared inside parameter list
      	include/linux/gpio/consumer.h:67:14: warning: its scope is only this definition or declaration, which is probably not what you want
      	[...]
      Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
      Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
    • execve: use 'struct filename *' for executable name passing · c4ad8f98
      Committed by Linus Torvalds
      This changes 'do_execve()' to get the executable name as a 'struct
      filename', and to free it when it is done.  This is what the normal
      users want, and it simplifies and streamlines their error handling.
      
      The controlled lifetime of the executable name also fixes a
      use-after-free problem with the trace_sched_process_exec tracepoint: the
      lifetime of the passed-in string for kernel users was not at all
      obvious, and the user-mode helper code used UMH_WAIT_EXEC to serialize
      the pathname allocation lifetime with the execve() having finished,
      which in turn meant that the trace point that happened after
      mm_release() of the old process VM ended up using already free'd memory.
      
      To solve the kernel string lifetime issue, this simply introduces
      "getname_kernel()" that works like the normal user-space getname()
      function, except with the source coming from kernel memory.
      
      As Oleg points out, this also means that we could drop the tcomm[] array
      from 'struct linux_binprm', since the pathname lifetime now covers
      setup_new_exec().  That would be a separate cleanup.
      Reported-by: Igor Zhbanov <i.zhbanov@samsung.com>
      Tested-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • of/device: Nullify match table in of_match_device() for CONFIG_OF=n · 1db73ae3
      Committed by Geert Uytterhoeven
      If the of_device_id table inside a device driver is protected by #ifdef
      CONFIG_OF, the driver still has to provide a dummy declaration of the
      table, or wrap it inside of_match_ptr(), when calling of_match_device()
      in the CONFIG_OF=n case, else the driver fails to compile with e.g.
      
      drivers/spi/spi-rspi.c: In function 'rspi_probe':
      drivers/spi/spi-rspi.c:1203:26: error: 'rspi_of_match' undeclared (first use in this function)
      drivers/spi/spi-rspi.c:1203:26: note: each undeclared identifier is reported only once for each function it appears in
      
      Make of_match_device() nullify the table pointer if CONFIG_OF=n to fix
      this.
      Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
      Signed-off-by: Geert Uytterhoeven <geert+renesas@linux-m68k.org>
      Signed-off-by: Rob Herring <robh@kernel.org>
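One way to nullify the table pointer when the feature is configured out is to route the call through a macro that discards the table name before the compiler ever sees it. The sketch below assumes that approach; the minimal struct definitions, `lookup_match()`, `demo_of_match`, and `demo_probe()` are hypothetical stand-ins, and CONFIG_OF is deliberately left undefined to model the CONFIG_OF=n build.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel types used in this sketch. */
struct device { const char *name; };
struct of_device_id { const char *compatible; };

#ifdef CONFIG_OF
#define of_match_ptr(ptr) (ptr)
#else
#define of_match_ptr(ptr) NULL      /* the table name is dropped here */
#endif

/* Hypothetical lookup helper; a real one would walk the table. */
static const struct of_device_id *
lookup_match(const struct of_device_id *matches, const struct device *dev)
{
    (void)dev;
    return matches;
}

/* Wrapping the table argument means callers need no dummy declaration. */
#define of_match_device(matches, dev) \
    lookup_match(of_match_ptr(matches), (dev))

#ifdef CONFIG_OF
static const struct of_device_id demo_of_match[] = {
    { "vendor,demo" },
    { NULL },
};
#endif

/* Compiles even though demo_of_match is not declared in this configuration. */
static const struct of_device_id *demo_probe(const struct device *dev)
{
    assert(dev != NULL);
    return of_match_device(demo_of_match, dev);
}
```

In this configuration the preprocessor replaces the table name with NULL, so the driver-style probe function builds without the "undeclared" error and simply gets no match.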
  10. 05 Feb, 2014 2 commits
  11. 02 Feb, 2014 1 commit
  12. 01 Feb, 2014 1 commit
  13. 31 Jan, 2014 7 commits
  14. 30 Jan, 2014 4 commits
  15. 29 Jan, 2014 2 commits
  16. 28 Jan, 2014 4 commits