1. 01 3月, 2018 1 次提交
  2. 28 2月, 2018 3 次提交
  3. 24 2月, 2018 3 次提交
  4. 23 2月, 2018 8 次提交
  5. 22 2月, 2018 12 次提交
    • T
      seccomp, ptrace: switch get_metadata types to arch independent · 2a040f9f
      Tycho Andersen 提交于
      Commit 26500475 ("ptrace, seccomp: add support for retrieving seccomp
      metadata") introduced `struct seccomp_metadata`, which contained unsigned
      longs that should be arch independent. The type of the flags member was
      chosen to match the corresponding argument to seccomp(), and so we need
      something at least as big as unsigned long. My understanding is that __u64
      should fit the bill, so let's switch both types to that.
      
      While this is userspace facing, it was only introduced in 4.16-rc2, and so
      should be safe assuming it goes in before then.
      Reported-by: N"Dmitry V. Levin" <ldv@altlinux.org>
      Signed-off-by: NTycho Andersen <tycho@tycho.ws>
      CC: Kees Cook <keescook@chromium.org>
      CC: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: N"Dmitry V. Levin" <ldv@altlinux.org>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      2a040f9f
    • A
      bug.h: work around GCC PR82365 in BUG() · 173a3efd
      Arnd Bergmann 提交于
      Looking at functions with large stack frames across all architectures
      led me discovering that BUG() suffers from the same problem as
      fortify_panic(), which I've added a workaround for already.
      
      In short, variables that go out of scope by calling a noreturn function
      or __builtin_unreachable() keep using stack space in functions
      afterwards.
      
      A workaround that was identified is to insert an empty assembler
      statement just before calling the function that doesn't return.  I'm
      adding a macro "barrier_before_unreachable()" to document this, and
      insert calls to that in all instances of BUG() that currently suffer
      from this problem.
      
      The files that saw the largest change from this had these frame sizes
      before, and much less with my patch:
      
        fs/ext4/inode.c:82:1: warning: the frame size of 1672 bytes is larger than 800 bytes [-Wframe-larger-than=]
        fs/ext4/namei.c:434:1: warning: the frame size of 904 bytes is larger than 800 bytes [-Wframe-larger-than=]
        fs/ext4/super.c:2279:1: warning: the frame size of 1160 bytes is larger than 800 bytes [-Wframe-larger-than=]
        fs/ext4/xattr.c:146:1: warning: the frame size of 1168 bytes is larger than 800 bytes [-Wframe-larger-than=]
        fs/f2fs/inode.c:152:1: warning: the frame size of 1424 bytes is larger than 800 bytes [-Wframe-larger-than=]
        net/netfilter/ipvs/ip_vs_core.c:1195:1: warning: the frame size of 1068 bytes is larger than 800 bytes [-Wframe-larger-than=]
        net/netfilter/ipvs/ip_vs_core.c:395:1: warning: the frame size of 1084 bytes is larger than 800 bytes [-Wframe-larger-than=]
        net/netfilter/ipvs/ip_vs_ftp.c:298:1: warning: the frame size of 928 bytes is larger than 800 bytes [-Wframe-larger-than=]
        net/netfilter/ipvs/ip_vs_ftp.c:418:1: warning: the frame size of 908 bytes is larger than 800 bytes [-Wframe-larger-than=]
        net/netfilter/ipvs/ip_vs_lblcr.c:718:1: warning: the frame size of 960 bytes is larger than 800 bytes [-Wframe-larger-than=]
        drivers/net/xen-netback/netback.c:1500:1: warning: the frame size of 1088 bytes is larger than 800 bytes [-Wframe-larger-than=]
      
      In case of ARC and CRIS, it turns out that the BUG() implementation
      actually does return (or at least the compiler thinks it does),
      resulting in lots of warnings about uninitialized variable use and
      leaving noreturn functions, such as:
      
        block/cfq-iosched.c: In function 'cfq_async_queue_prio':
        block/cfq-iosched.c:3804:1: error: control reaches end of non-void function [-Werror=return-type]
        include/linux/dmaengine.h: In function 'dma_maxpq':
        include/linux/dmaengine.h:1123:1: error: control reaches end of non-void function [-Werror=return-type]
      
      This makes them call __builtin_trap() instead, which should normally
      dump the stack and kill the current process, like some of the other
      architectures already do.
      
      I tried adding barrier_before_unreachable() to panic() and
      fortify_panic() as well, but that had very little effect, so I'm not
      submitting that patch.
      
      Vineet said:
      
      : For ARC, it is double win.
      :
      : 1. Fixes 3 -Wreturn-type warnings
      :
      : | ../net/core/ethtool.c:311:1: warning: control reaches end of non-void function
      : [-Wreturn-type]
      : | ../kernel/sched/core.c:3246:1: warning: control reaches end of non-void function
      : [-Wreturn-type]
      : | ../include/linux/sunrpc/svc_xprt.h:180:1: warning: control reaches end of
      : non-void function [-Wreturn-type]
      :
      : 2.  bloat-o-meter reports code size improvements as gcc elides the
      :    generated code for stack return.
      
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82365
      Link: http://lkml.kernel.org/r/20171219114112.939391-1-arnd@arndb.deSigned-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: Vineet Gupta <vgupta@synopsys.com>	[arch/arc]
      Tested-by: Vineet Gupta <vgupta@synopsys.com>	[arch/arc]
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Christopher Li <sparse@chrisli.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      173a3efd
    • S
      mm, mlock, vmscan: no more skipping pagevecs · 9c4e6b1a
      Shakeel Butt 提交于
      When a thread mlocks an address space backed either by file pages which
      are currently not present in memory or swapped out anon pages (not in
      swapcache), a new page is allocated and added to the local pagevec
      (lru_add_pvec), I/O is triggered and the thread then sleeps on the page.
      On I/O completion, the thread can wake on a different CPU, the mlock
      syscall will then sets the PageMlocked() bit of the page but will not be
      able to put that page in unevictable LRU as the page is on the pagevec
      of a different CPU.  Even on drain, that page will go to evictable LRU
      because the PageMlocked() bit is not checked on pagevec drain.
      
      The page will eventually go to right LRU on reclaim but the LRU stats
      will remain skewed for a long time.
      
      This patch puts all the pages, even unevictable, to the pagevecs and on
      the drain, the pages will be added on their LRUs correctly by checking
      their evictability.  This resolves the mlocked pages on pagevec of other
      CPUs issue because when those pagevecs will be drained, the mlocked file
      pages will go to unevictable LRU.  Also this makes the race with munlock
      easier to resolve because the pagevec drains happen in LRU lock.
      
      However there is still one place which makes a page evictable and does
      PageLRU check on that page without LRU lock and needs special attention.
      TestClearPageMlocked() and isolate_lru_page() in clear_page_mlock().
      
      	#0: __pagevec_lru_add_fn	#1: clear_page_mlock
      
      	SetPageLRU()			if (!TestClearPageMlocked())
      					  return
      	smp_mb() // <--required
      					// inside does PageLRU
      	if (!PageMlocked())		if (isolate_lru_page())
      	  move to evictable LRU		  putback_lru_page()
      	else
      	  move to unevictable LRU
      
      In '#1', TestClearPageMlocked() provides full memory barrier semantics
      and thus the PageLRU check (inside isolate_lru_page) can not be
      reordered before it.
      
      In '#0', without explicit memory barrier, the PageMlocked() check can be
      reordered before SetPageLRU().  If that happens, '#0' can put a page in
      unevictable LRU and '#1' might have just cleared the Mlocked bit of that
      page but fails to isolate as PageLRU fails as '#0' still hasn't set
      PageLRU bit of that page.  That page will be stranded on the unevictable
      LRU.
      
      There is one (good) side effect though.  Without this patch, the pages
      allocated for System V shared memory segment are added to evictable LRUs
      even after shmctl(SHM_LOCK) on that segment.  This patch will correctly
      put such pages to unevictable LRU.
      
      Link: http://lkml.kernel.org/r/20171121211241.18877-1-shakeelb@google.comSigned-off-by: NShakeel Butt <shakeelb@google.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9c4e6b1a
    • J
      mm: memcontrol: fix NR_WRITEBACK leak in memcg and system stats · c3cc3911
      Johannes Weiner 提交于
      After commit a983b5eb ("mm: memcontrol: fix excessive complexity in
      memory.stat reporting"), we observed slowly upward creeping NR_WRITEBACK
      counts over the course of several days, both the per-memcg stats as well
      as the system counter in e.g.  /proc/meminfo.
      
      The conversion from full per-cpu stat counts to per-cpu cached atomic
      stat counts introduced an irq-unsafe RMW operation into the updates.
      
      Most stat updates come from process context, but one notable exception
      is the NR_WRITEBACK counter.  While writebacks are issued from process
      context, they are retired from (soft)irq context.
      
      When writeback completions interrupt the RMW counter updates of new
      writebacks being issued, the decs from the completions are lost.
      
      Since the global updates are routed through the joint lruvec API, both
      the memcg counters as well as the system counters are affected.
      
      This patch makes the joint stat and event API irq safe.
      
      Link: http://lkml.kernel.org/r/20180203082353.17284-1-hannes@cmpxchg.org
      Fixes: a983b5eb ("mm: memcontrol: fix excessive complexity in memory.stat reporting")
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Debugged-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NRik van Riel <riel@surriel.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c3cc3911
    • A
      Kbuild: always define endianess in kconfig.h · 101110f6
      Arnd Bergmann 提交于
      Build testing with LTO found a couple of files that get compiled
      differently depending on whether asm/byteorder.h gets included early
      enough or not.  In particular, include/asm-generic/qrwlock_types.h is
      affected by this, but there are probably others as well.
      
      The symptom is a series of LTO link time warnings, including these:
      
          net/netlabel/netlabel_unlabeled.h:223: error: type of 'netlbl_unlhsh_add' does not match original declaration [-Werror=lto-type-mismatch]
           int netlbl_unlhsh_add(struct net *net,
          net/netlabel/netlabel_unlabeled.c:377: note: 'netlbl_unlhsh_add' was previously declared here
      
          include/net/ipv6.h:360: error: type of 'ipv6_renew_options_kern' does not match original declaration [-Werror=lto-type-mismatch]
           ipv6_renew_options_kern(struct sock *sk,
          net/ipv6/exthdrs.c:1162: note: 'ipv6_renew_options_kern' was previously declared here
      
          net/core/dev.c:761: note: 'dev_get_by_name_rcu' was previously declared here
           struct net_device *dev_get_by_name_rcu(struct net *net, const char *name)
          net/core/dev.c:761: note: code may be misoptimized unless -fno-strict-aliasing is used
      
          drivers/gpu/drm/i915/i915_drv.h:3377: error: type of 'i915_gem_object_set_to_wc_domain' does not match original declaration [-Werror=lto-type-mismatch]
           i915_gem_object_set_to_wc_domain(struct drm_i915_gem_object *obj, bool write);
          drivers/gpu/drm/i915/i915_gem.c:3639: note: 'i915_gem_object_set_to_wc_domain' was previously declared here
      
          include/linux/debugfs.h:92:9: error: type of 'debugfs_attr_read' does not match original declaration [-Werror=lto-type-mismatch]
           ssize_t debugfs_attr_read(struct file *file, char __user *buf,
          fs/debugfs/file.c:318: note: 'debugfs_attr_read' was previously declared here
      
          include/linux/rwlock_api_smp.h:30: error: type of '_raw_read_unlock' does not match original declaration [-Werror=lto-type-mismatch]
           void __lockfunc _raw_read_unlock(rwlock_t *lock) __releases(lock);
          kernel/locking/spinlock.c:246:26: note: '_raw_read_unlock' was previously declared here
      
          include/linux/fs.h:3308:5: error: type of 'simple_attr_open' does not match original declaration [-Werror=lto-type-mismatch]
           int simple_attr_open(struct inode *inode, struct file *file,
          fs/libfs.c:795: note: 'simple_attr_open' was previously declared here
      
      All of the above are caused by include/asm-generic/qrwlock_types.h
      failing to include asm/byteorder.h after commit e0d02285
      ("locking/qrwlock: Use 'struct qrwlock' instead of 'struct __qrwlock'")
      in linux-4.15.
      
      Similar bugs may or may not exist in older kernels as well, but there is
      no easy way to test those with link-time optimizations, and kernels
      before 4.14 are harder to fix because they don't have Babu's patch
      series
      
      We had similar issues with CONFIG_ symbols in the past and ended up
      always including the configuration headers though linux/kconfig.h.  This
      works around the issue through that same file, defining either
      __BIG_ENDIAN or __LITTLE_ENDIAN depending on CONFIG_CPU_BIG_ENDIAN,
      which is now always set on all architectures since commit 4c97a0c8
      ("arch: define CPU_BIG_ENDIAN for all fixed big endian archs").
      
      Link: http://lkml.kernel.org/r/20180202154104.1522809-2-arnd@arndb.deSigned-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Babu Moger <babu.moger@amd.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Nicolas Pitre <nico@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      101110f6
    • A
      include/linux/sched/mm.h: re-inline mmdrop() · d34bc48f
      Andrew Morton 提交于
      As Peter points out, Doing a CALL+RET for just the decrement is a bit silly.
      
      Fixes: d70f2a14 ("include/linux/sched/mm.h: uninline mmdrop_async(), etc")
      Acked-by: NPeter Zijlstra (Intel) <peterz@infraded.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d34bc48f
    • D
      net: Allow a rule to track originating protocol · cac56209
      Donald Sharp 提交于
      Allow a rule that is being added/deleted/modified or
      dumped to contain the originating protocol's id.
      
      The protocol is handled just like a routes originating
      protocol is.  This is especially useful because there
      is starting to be a plethora of different user space
      programs adding rules.
      
      Allow the vrf device to specify that the kernel is the originator
      of the rule created for this device.
      Signed-off-by: NDonald Sharp <sharpd@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cac56209
    • Y
      tcp: remove the hardcode in the definition of TCPF Macro · a823fed0
      Yafang Shao 提交于
      TCPF_ macro depends on the definition of TCP_ macro.
      So it is better to define them with TCP_ marco.
      Signed-off-by: NYafang Shao <laoar.shao@gmail.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a823fed0
    • E
      tcp: remove sk_check_csum_caps() · dead7cdb
      Eric Dumazet 提交于
      Since TCP relies on GSO, we do not need this helper anymore.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dead7cdb
    • E
      tcp: switch to GSO being always on · 0a6b2a1d
      Eric Dumazet 提交于
      Oleksandr Natalenko reported performance issues with BBR without FQ
      packet scheduler that were root caused to lack of SG and GSO/TSO on
      his configuration.
      
      In this mode, TCP internal pacing has to setup a high resolution timer
      for each MSS sent.
      
      We could implement in TCP a strategy similar to the one adopted
      in commit fefa569a ("net_sched: sch_fq: account for schedule/timers drifts")
      or decide to finally switch TCP stack to a GSO only mode.
      
      This has many benefits :
      
      1) Most TCP developments are done with TSO in mind.
      2) Less high-resolution timers needs to be armed for TCP-pacing
      3) GSO can benefit of xmit_more hint
      4) Receiver GRO is more effective (as if TSO was used for real on sender)
         -> Lower ACK traffic
      5) Write queues have less overhead (one skb holds about 64KB of payload)
      6) SACK coalescing just works.
      7) rtx rb-tree contains less packets, SACK is cheaper.
      
      This patch implements the minimum patch, but we can remove some legacy
      code as follow ups.
      
      Tested:
      
      On 40Gbit link, one netperf -t TCP_STREAM
      
      BBR+fq:
      sg on:  26 Gbits/sec
      sg off: 15.7 Gbits/sec   (was 2.3 Gbit before patch)
      
      BBR+pfifo_fast:
      sg on:  24.2 Gbits/sec
      sg off: 14.9 Gbits/sec  (was 0.66 Gbit before patch !!! )
      
      BBR+fq_codel:
      sg on:  24.4 Gbits/sec
      sg off: 15 Gbits/sec  (was 0.66 Gbit before patch !!! )
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NOleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0a6b2a1d
    • F
      net/mac8390: Convert to nubus_driver · 494a973e
      Finn Thain 提交于
      This resolves an old bug that constrained this driver to no more than
      one card.
      Tested-by: NStan Johnson <userm57@yahoo.com>
      Signed-off-by: NFinn Thain <fthain@telegraphics.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      494a973e
    • E
      net: sched: add em_ipt ematch for calling xtables matches · ccc007e4
      Eyal Birger 提交于
      The commit a new tc ematch for using netfilter xtable matches.
      
      This allows early classification as well as mirroning/redirecting traffic
      based on logic implemented in netfilter extensions.
      
      Current supported use case is classification based on the incoming IPSec
      state used during decpsulation using the 'policy' iptables extension
      (xt_policy).
      
      The module dynamically fetches the netfilter match module and calls
      it using a fake xt_action_param structure based on validated userspace
      provided parameters.
      
      As the xt_policy match does not access skb->data, no skb modifications
      are needed on match.
      Signed-off-by: NEyal Birger <eyal.birger@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ccc007e4
  6. 21 2月, 2018 3 次提交
  7. 20 2月, 2018 6 次提交
  8. 19 2月, 2018 4 次提交