1. 23 4月, 2015 1 次提交
  2. 21 4月, 2015 2 次提交
    • T
      net: add skb_checksum_complete_unset · 4e18b9ad
      Tom Herbert 提交于
      This function changes ip_summed to CHECKSUM_NONE if CHECKSUM_COMPLETE
      is set. This is called to discard checksum-complete when packet
      is being modified and checksum is not pulled for headers in a layer.
      Signed-off-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e18b9ad
    • L
      smp: don't use 16-bit words for atomic accesses · f4d03bd1
      Linus Torvalds 提交于
      Yes, it should work, but it's a bad idea.  Not only did ARM64 not have
      the 16-bit access code (there's a separate patch to add it), it's just
      not a good atomic type.  Some architectures fundamentally don't do
      atomic accesses in them (alpha), and it's not like it saves any space
      here anyway because of structure packing issues.
      
      We normally should aim for flags to be "unsigned int" or "unsigned
      long".  And if space is at a premium, use a single byte (although that
      causes problems on alpha again).  There might be very special cases
      where a 16-byte entity is really wanted, but this is not one of them.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f4d03bd1
  3. 20 4月, 2015 1 次提交
  4. 19 4月, 2015 2 次提交
  5. 18 4月, 2015 3 次提交
    • D
      netns: remove BUG_ONs from net_generic() · 2591ffd3
      Denys Vlasenko 提交于
      This inline has ~500 callsites.
      
      On 04/14/2015 08:37 PM, David Miller wrote:
      > That BUG_ON() was added 7 years ago, and I don't remember it ever
      > triggering or helping us diagnose something, so just remove it and
      > keep the function inlined.
      
      On x86 allyesconfig build:
      
          text     data      bss       dec     hex filename
      82447071 22255384 20627456 125329911 77861f7 vmlinux4
      82441375 22255384 20627456 125324215 7784bb7 vmlinux5prime
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      CC: Eric W. Biederman <ebiederm@xmission.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: Jan Engelhardt <jengelh@medozas.de>
      CC: Jiri Pirko <jpirko@redhat.com>
      CC: linux-kernel@vger.kernel.org
      CC: netdev@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2591ffd3
    • J
      net: remove unused 'dev' argument from netif_needs_gso() · 8b86a61d
      Johannes Berg 提交于
      In commit 04ffcb25 ("net: Add ndo_gso_check") Tom originally
      added the 'dev' argument to be able to call ndo_gso_check().
      
      Then later, when generalizing this in commit 5f35227e
      ("net: Generalize ndo_gso_check to ndo_features_check")
      Jesse removed the call to ndo_gso_check() in netif_needs_gso()
      by calling the new ndo_features_check() in a different place.
      This made the 'dev' argument unused.
      
      Remove the unused argument and go back to the code as before.
      
      Cc: Tom Herbert <therbert@google.com>
      Cc: Jesse Gross <jesse@nicira.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8b86a61d
    • E
      inet_diag: fix access to tcp cc information · 521f1cf1
      Eric Dumazet 提交于
      Two different problems are fixed here :
      
      1) inet_sk_diag_fill() might be called without socket lock held.
         icsk->icsk_ca_ops can change under us and module be unloaded.
         -> Access to freed memory.
         Fix this using rcu_read_lock() to prevent module unload.
      
      2) Some TCP Congestion Control modules provide information
         but again this is not safe against icsk->icsk_ca_ops
         change and nla_put() errors were ignored. Some sockets
         could not get the additional info if skb was almost full.
      
      Fix this by returning a status from get_info() handlers and
      using rcu protection as well.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      521f1cf1
  6. 17 4月, 2015 17 次提交
  7. 16 4月, 2015 14 次提交
    • R
      cpumask: resurrect CPU_MASK_CPU0 · 1527781d
      Rusty Russell 提交于
      We removed it in 2f0f267e (cpumask: remove deprecated functions.),
      but grep shows it still used by MIPS, and not unreasonably.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      1527781d
    • R
      linux/bitmap.h: improve BITMAP_{LAST,FIRST}_WORD_MASK · 89c1e79e
      Rasmus Villemoes 提交于
      The macro BITMAP_LAST_WORD_MASK can be implemented without a conditional,
      which will generally lead to slightly better generated code (221 bytes
      saved for allmodconfig-GCOV_KERNEL, ~2k with GCOV_KERNEL).  As a small
      bonus, this also ensures that the nbits parameter is expanded exactly
      once.
      
      In BITMAP_FIRST_WORD_MASK, if start is signed gcc is technically allowed
      to assume it is positive (or divisible by BITS_PER_LONG), and hence just
      do the simple mask.  It doesn't seem to use this, and even on an
      architecture like x86 where the shift only depends on the lower 5 or 6
      bits, and these bits are not affected by the signedness of the expression,
      gcc still generates code to compute the C99 mandated value of start %
      BITS_PER_LONG.  So just use a mask explicitly, also for consistency with
      BITMAP_LAST_WORD_MASK.
      Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Tejun Heo <tj@kernel.org>
      Reviewed-by: NGeorge Spelvin <linux@horizon.com>
      Cc: Yury Norov <yury.norov@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      89c1e79e
    • R
      lib/string_helpers.c: change semantics of string_escape_mem · 41416f23
      Rasmus Villemoes 提交于
      The current semantics of string_escape_mem are inadequate for one of its
      current users, vsnprintf().  If that is to honour its contract, it must
      know how much space would be needed for the entire escaped buffer, and
      string_escape_mem provides no way of obtaining that (short of allocating a
      large enough buffer (~4 times input string) to let it play with, and
      that's definitely a big no-no inside vsnprintf).
      
      So change the semantics for string_escape_mem to be more snprintf-like:
      Return the size of the output that would be generated if the destination
      buffer was big enough, but of course still only write to the part of dst
      it is allowed to, and (contrary to snprintf) don't do '\0'-termination.
      It is then up to the caller to detect whether output was truncated and to
      append a '\0' if desired.  Also, we must output partial escape sequences,
      otherwise a call such as snprintf(buf, 3, "%1pE", "\123") would cause
      printf to write a \0 to buf[2] but leaving buf[0] and buf[1] with whatever
      they previously contained.
      
      This also fixes a bug in the escaped_string() helper function, which used
      to unconditionally pass a length of "end-buf" to string_escape_mem();
      since the latter doesn't check osz for being insanely large, it would
      happily write to dst.  For example, kasprintf(GFP_KERNEL, "something and
      then %pE", ...); is an easy way to trigger an oops.
      
      In test-string_helpers.c, the -ENOMEM test is replaced with testing for
      getting the expected return value even if the buffer is too small.  We
      also ensure that nothing is written (by relying on a NULL pointer deref)
      if the output size is 0 by passing NULL - this has to work for
      kasprintf("%pE") to work.
      
      In net/sunrpc/cache.c, I think qword_add still has the same semantics.
      Someone should definitely double-check this.
      
      In fs/proc/array.c, I made the minimum possible change, but longer-term it
      should stop poking around in seq_file internals.
      
      [andriy.shevchenko@linux.intel.com: simplify qword_add]
      [andriy.shevchenko@linux.intel.com: add missed curly braces]
      Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
      Acked-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      41416f23
    • S
      printk: comment pr_cont() stating it is only to continue a line · 7b1460ec
      Steven Rostedt 提交于
      KERN_CONT is nicely commented in kern_levels.h, but pr_cont() is now used
      more often, and it lacks the comment stating what it is used for.  It can
      be confused as continuing the log level, but that is not its purpose.  Its
      purpose is to continue a line that had no newline enclosed.  This should
      be documented by pr_cont() as well.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7b1460ec
    • J
      kernel/reboot.c: add orderly_reboot for graceful reboot · 7a54f46b
      Joel Stanley 提交于
      The kernel has orderly_poweroff which allows the kernel to initiate a
      graceful shutdown of userspace, by running /sbin/poweroff.  This adds
      orderly_reboot that will cause userspace to shut itself down by calling
      /sbin/reboot.
      
      This will be used for shutdown initiated by a system controller on
      platforms that do not use ACPI.
      
      orderly_reboot() should be used when the system wants to allow userspace
      to gracefully shut itself down.  For cases where the system may imminently
      catch on fire, the existing emergency_restart() provides an immediate
      reboot without involving userspace.
      Signed-off-by: NJoel Stanley <joel@jms.id.au>
      Cc: Fabian Frederick <fabf@skynet.be>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Kerr <jk@ozlabs.org>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7a54f46b
    • J
      kernel/resource.c: remove deprecated __check_region() and friends · 96831c0a
      Jakub Sitnicki 提交于
      All users of __check_region(), check_region(), and check_mem_region() are
      gone.  We got rid of the last user in v4.0-rc1.  Remove them.
      
      bloat-o-meter on x86_64 shows:
      
      add/remove: 0/3 grow/shrink: 0/0 up/down: 0/-102 (-102)
      function                                     old     new   delta
      __kstrtab___check_region                      15       -     -15
      __ksymtab___check_region                      16       -     -16
      __check_region                                71       -     -71
      Signed-off-by: NJakub Sitnicki <jsitnicki@gmail.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      96831c0a
    • I
      kernel: conditionally support non-root users, groups and capabilities · 2813893f
      Iulia Manda 提交于
      There are a lot of embedded systems that run most or all of their
      functionality in init, running as root:root.  For these systems,
      supporting multiple users is not necessary.
      
      This patch adds a new symbol, CONFIG_MULTIUSER, that makes support for
      non-root users, non-root groups, and capabilities optional.  It is enabled
      under CONFIG_EXPERT menu.
      
      When this symbol is not defined, UID and GID are zero in any possible case
      and processes always have all capabilities.
      
      The following syscalls are compiled out: setuid, setregid, setgid,
      setreuid, setresuid, getresuid, setresgid, getresgid, setgroups,
      getgroups, setfsuid, setfsgid, capget, capset.
      
      Also, groups.c is compiled out completely.
      
      In kernel/capability.c, capable function was moved in order to avoid
      adding two ifdef blocks.
      
      This change saves about 25 KB on a defconfig build.  The most minimal
      kernels have total text sizes in the high hundreds of kB rather than
      low MB.  (The 25k goes down a bit with allnoconfig, but not that much.
      
      The kernel was booted in Qemu.  All the common functionalities work.
      Adding users/groups is not possible, failing with -ENOSYS.
      
      Bloat-o-meter output:
      add/remove: 7/87 grow/shrink: 19/397 up/down: 1675/-26325 (-24650)
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NIulia Manda <iulia.manda21@gmail.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Tested-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2813893f
    • R
      include/linux: remove empty conditionals · 23f40a94
      Rasmus Villemoes 提交于
      Commit 607ca46e ("UAPI: (Scripted) Disintegrate include/linux") left
      behind some empty conditional blocks.  Since they are useless and may
      cause a reader to wonder whether something is missing, remove them.
      Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      23f40a94
    • M
      zsmalloc: support compaction · 312fcae2
      Minchan Kim 提交于
      This patch provides core functions for migration of zsmalloc.  Migraion
      policy is simple as follows.
      
      for each size class {
              while {
                      src_page = get zs_page from ZS_ALMOST_EMPTY
                      if (!src_page)
                              break;
                      dst_page = get zs_page from ZS_ALMOST_FULL
                      if (!dst_page)
                              dst_page = get zs_page from ZS_ALMOST_EMPTY
                      if (!dst_page)
                              break;
                      migrate(from src_page, to dst_page);
              }
      }
      
      For migration, we need to identify which objects in zspage are allocated
      to migrate them out.  We could know it by iterating of freed objects in a
      zspage because first_page of zspage keeps free objects singly-linked list
      but it's not efficient.  Instead, this patch adds a tag(ie,
      OBJ_ALLOCATED_TAG) in header of each object(ie, handle) so we could check
      whether the object is allocated easily.
      
      This patch adds another status bit in handle to synchronize between user
      access through zs_map_object and migration.  During migration, we cannot
      move objects user are using due to data coherency between old object and
      new object.
      
      [akpm@linux-foundation.org: zsmalloc.c needs sched.h for cond_resched()]
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Cc: Juneho Choi <juno.choi@lge.com>
      Cc: Gunho Lee <gunho.lee@lge.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Seth Jennings <sjennings@variantweb.net>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      312fcae2
    • B
      dax: use pfn_mkwrite to update c/mtime + freeze protection · 0e3b210c
      Boaz Harrosh 提交于
      From: Yigal Korman <yigal@plexistor.com>
      
      [v1]
      Without this patch, c/mtime is not updated correctly when mmap'ed page is
      first read from and then written to.
      
      A new xfstest is submitted for testing this (generic/080)
      
      [v2]
      Jan Kara has pointed out that if we add the
      sb_start/end_pagefault pair in the new pfn_mkwrite we
      are then fixing another bug where: A user could start
      writing to the page while filesystem is frozen.
      Signed-off-by: NYigal Korman <yigal@plexistor.com>
      Signed-off-by: NBoaz Harrosh <boaz@plexistor.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0e3b210c
    • B
      mm: new pfn_mkwrite same as page_mkwrite for VM_PFNMAP · dd906184
      Boaz Harrosh 提交于
      This will allow FS that uses VM_PFNMAP | VM_MIXEDMAP (no page structs) to
      get notified when access is a write to a read-only PFN.
      
      This can happen if we mmap() a file then first mmap-read from it to
      page-in a read-only PFN, than we mmap-write to the same page.
      
      We need this functionality to fix a DAX bug, where in the scenario above
      we fail to set ctime/mtime though we modified the file.  An xfstest is
      attached to this patchset that shows the failure and the fix.  (A DAX
      patch will follow)
      
      This functionality is extra important for us, because upon dirtying of a
      pmem page we also want to RDMA the page to a remote cluster node.
      
      We define a new pfn_mkwrite and do not reuse page_mkwrite because
        1 - The name ;-)
        2 - But mainly because it would take a very long and tedious
            audit of all page_mkwrite functions of VM_MIXEDMAP/VM_PFNMAP
            users. To make sure they do not now CRASH. For example current
            DAX code (which this is for) would crash.
            If we would want to reuse page_mkwrite, We will need to first
            patch all users, so to not-crash-on-no-page. Then enable this
            patch. But even if I did that I would not sleep so well at night.
            Adding a new vector is the safest thing to do, and is not that
            expensive. an extra pointer at a static function vector per driver.
            Also the new vector is better for performance, because else we
            Will call all current Kernel vectors, so to:
              check-ha-no-page-do-nothing and return.
      
      No need to call it from do_shared_fault because do_wp_page is called to
      change pte permissions anyway.
      Signed-off-by: NYigal Korman <yigal@plexistor.com>
      Signed-off-by: NBoaz Harrosh <boaz@plexistor.com>
      Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dd906184
    • A
      mm/mempool.c: kasan: poison mempool elements · 92393615
      Andrey Ryabinin 提交于
      Mempools keep allocated objects in reserved for situations when ordinary
      allocation may not be possible to satisfy.  These objects shouldn't be
      accessed before they leave the pool.
      
      This patch poison elements when get into the pool and unpoison when they
      leave it.  This will let KASan to detect use-after-free of mempool's
      elements.
      Signed-off-by: NAndrey Ryabinin <a.ryabinin@samsung.com>
      Tested-by: NDavid Rientjes <rientjes@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dmitry Chernenkov <drcheren@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      92393615
    • K
      mm: uninline and cleanup page-mapping related helpers · e39155ea
      Kirill A. Shutemov 提交于
      Most-used page->mapping helper -- page_mapping() -- has already uninlined.
       Let's uninline also page_rmapping() and page_anon_vma().  It saves us
      depending on configuration around 400 bytes in text:
      
         text	   data	    bss	    dec	    hex	filename
       660318	  99254	 410000	1169572	 11d8a4	mm/built-in.o-before
       659854	  99254	 410000	1169108	 11d6d4	mm/built-in.o
      
      I also tried to make code a bit more clean.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e39155ea
    • S
      mm: cma: add trace events for CMA allocations and freeings · 99e8ea6c
      Stefan Strogin 提交于
      Add trace events for cma_alloc() and cma_release().
      
      The cma_alloc tracepoint is used both for successful and failed allocations,
      in case of allocation failure pfn=-1UL is stored and printed.
      Signed-off-by: NStefan Strogin <stefan.strogin@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mpn@google.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
      Cc: Thierry Reding <treding@nvidia.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      99e8ea6c