1. 13 7月, 2017 23 次提交
    • L
      test_sysctl: test against int proc_dointvec() array support · 7c43a657
      Luis R. Rodriguez 提交于
      Add a few initial respective tests for an array:
      
        o Echoing values separated by spaces works
        o Echoing only first elements will set first elements
        o Confirm PAGE_SIZE limit still applies even if an array is used
      
      Link: http://lkml.kernel.org/r/20170630224431.17374-7-mcgrof@kernel.orgSigned-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7c43a657
    • L
      test_sysctl: add simple proc_douintvec() case · 2920fad3
      Luis R. Rodriguez 提交于
      Test against a simple proc_douintvec() case.  While at it, add a test
      against UINT_MAX.  Make sure UINT_MAX works, and UINT_MAX+1 will fail
      and that negative values are not accepted.
      
      Link: http://lkml.kernel.org/r/20170630224431.17374-6-mcgrof@kernel.orgSigned-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2920fad3
    • L
      test_sysctl: add simple proc_dointvec() case · eb965eda
      Luis R. Rodriguez 提交于
      Test against a simple proc_dointvec() case.  While at it, add a test
      against INT_MAX.  Make sure INT_MAX works, and INT_MAX+1 will fail.
      Also test negative values work.
      
      Link: http://lkml.kernel.org/r/20170630224431.17374-5-mcgrof@kernel.orgSigned-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eb965eda
    • L
      test_sysctl: test against PAGE_SIZE for int · 1c0357c8
      Luis R. Rodriguez 提交于
      Add the following tests to ensure we do not regress:
      
        o Test using a buffer full of space (PAGE_SIZE-1) followed by a
          single digit works
      
        o Test using a buffer full of spaces (PAGE_SIZE or over) will fail
      
      As tests increase instead of unloading the module and reloading it we
      can just do a shell reset_vals() with a reset to values we know are set
      at init on the driver.
      
      Link: http://lkml.kernel.org/r/20170630224431.17374-4-mcgrof@kernel.orgSigned-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1c0357c8
    • L
      test_sysctl: add generic script to expand on tests · 64b67120
      Luis R. Rodriguez 提交于
      This adds a generic script to let us more easily add more tests cases.
      Since we really have only two types of tests cases just fold them into
      the one file.  Each test unit is now identified into its separate
      function:
      
          # ./sysctl.sh -l
        Test ID list:
      
        TEST_ID x NUM_TEST
        TEST_ID:   Test ID
        NUM_TESTS: Number of recommended times to run the test
      
        0001 x 1 - tests proc_dointvec_minmax()
        0002 x 1 - tests proc_dostring()
      
      For now we start off with what we had before, and run only each test
      once.  We can now watch a test case until it fails:
      
        ./sysctl.sh -w 0002
      
      We can also run a test case x number of times, say we want to run a test
      case 100 times:
      
        ./sysctl.sh -c 0001 100
      
      To run a test case only once, for example:
      
        ./sysctl.sh -s 0002
      
      The default settings are specified at the top of sysctl.sh.
      
      Link: http://lkml.kernel.org/r/20170630224431.17374-3-mcgrof@kernel.orgSigned-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      64b67120
    • L
      test_sysctl: add dedicated proc sysctl test driver · 9308f2f9
      Luis R. Rodriguez 提交于
      The existing tools/testing/selftests/sysctl/ tests include two test
      cases, but these use existing production kernel sysctl interfaces.  We
      want to expand test coverage but we can't just be looking for random
      safe production values to poke at, that's just insane!
      
      Instead just dedicate a test driver for debugging purposes and port the
      existing scripts to use it.  This will make it easier for further tests
      to be added.
      
      Subsequent patches will extend our test coverage for sysctl.
      
      The stress test driver uses a new license (GPL on Linux, copyleft-next
      outside of Linux).  Linus was fine with this [0] and later due to Ted's
      and Alans's request ironed out an "or" language clause to use [1] which
      is already present upstream.
      
      [0] https://lkml.kernel.org/r/CA+55aFyhxcvD+q7tp+-yrSFDKfR0mOHgyEAe=f_94aKLsOu0Og@mail.gmail.com
      [1] https://lkml.kernel.org/r/1495234558.7848.122.camel@linux.intel.com
      
      Link: http://lkml.kernel.org/r/20170630224431.17374-2-mcgrof@kernel.orgSigned-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9308f2f9
    • L
      sysctl: add unsigned int range support · 61d9b56a
      Luis R. Rodriguez 提交于
      To keep parity with regular int interfaces provide the an unsigned int
      proc_douintvec_minmax() which allows you to specify a range of allowed
      valid numbers.
      
      Adding proc_douintvec_minmax_sysadmin() is easy but we can wait for an
      actual user for that.
      
      Link: http://lkml.kernel.org/r/20170519033554.18592-6-mcgrof@kernel.orgSigned-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      61d9b56a
    • L
      sysctl: simplify unsigned int support · 4f2fec00
      Luis R. Rodriguez 提交于
      Commit e7d316a0 ("sysctl: handle error writing UINT_MAX to u32
      fields") added proc_douintvec() to start help adding support for
      unsigned int, this however was only half the work needed.  Two fixes
      have come in since then for the following issues:
      
        o Printing the values shows a negative value, this happens since
          do_proc_dointvec() and this uses proc_put_long()
      
      This was fixed by commit 5380e564 ("sysctl: don't print negative
      flag for proc_douintvec").
      
        o We can easily wrap around the int values: UINT_MAX is 4294967295, if
          we echo in 4294967295 + 1 we end up with 0, using 4294967295 + 2 we
          end up with 1.
        o We echo negative values in and they are accepted
      
      This was fixed by commit 425fffd8 ("sysctl: report EINVAL if value
      is larger than UINT_MAX for proc_douintvec").
      
      It still also failed to be added to sysctl_check_table()...  instead of
      adding it with the current implementation just provide a proper and
      simplified unsigned int support without any array unsigned int support
      with no negative support at all.
      
      Historically sysctl proc helpers have supported arrays, due to the
      complexity this adds though we've taken a step back to evaluate array
      users to determine if its worth upkeeping for unsigned int.  An
      evaluation using Coccinelle has been done to perform a grammatical
      search to ask ourselves:
      
        o How many sysctl proc_dointvec() (int) users exist which likely
          should be moved over to proc_douintvec() (unsigned int) ?
      	Answer: about 8
      	- Of these how many are array users ?
      		Answer: Probably only 1
        o How many sysctl array users exist ?
      	Answer: about 12
      
      This last question gives us an idea just how popular arrays: they are not.
      Array support should probably just be kept for strings.
      
      The identified uint ports are:
      
        drivers/infiniband/core/ucma.c - max_backlog
        drivers/infiniband/core/iwcm.c - default_backlog
        net/core/sysctl_net_core.c - rps_sock_flow_sysctl()
        net/netfilter/nf_conntrack_timestamp.c - nf_conntrack_timestamp -- bool
        net/netfilter/nf_conntrack_acct.c nf_conntrack_acct -- bool
        net/netfilter/nf_conntrack_ecache.c - nf_conntrack_events -- bool
        net/netfilter/nf_conntrack_helper.c - nf_conntrack_helper -- bool
        net/phonet/sysctl.c proc_local_port_range()
      
      The only possible array users is proc_local_port_range() but it does not
      seem worth it to add array support just for this given the range support
      works just as well.  Unsigned int support should be desirable more for
      when you *need* more than INT_MAX or using int min/max support then does
      not suffice for your ranges.
      
      If you forget and by mistake happen to register an unsigned int proc
      entry with an array, the driver will fail and you will get something as
      follows:
      
      sysctl table check failed: debug/test_sysctl//uint_0002 array now allowed
      CPU: 2 PID: 1342 Comm: modprobe Tainted: G        W   E <etc>
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS <etc>
      Call Trace:
       dump_stack+0x63/0x81
       __register_sysctl_table+0x350/0x650
       ? kmem_cache_alloc_trace+0x107/0x240
       __register_sysctl_paths+0x1b3/0x1e0
       ? 0xffffffffc005f000
       register_sysctl_table+0x1f/0x30
       test_sysctl_init+0x10/0x1000 [test_sysctl]
       do_one_initcall+0x52/0x1a0
       ? kmem_cache_alloc_trace+0x107/0x240
       do_init_module+0x5f/0x200
       load_module+0x1867/0x1bd0
       ? __symbol_put+0x60/0x60
       SYSC_finit_module+0xdf/0x110
       SyS_finit_module+0xe/0x10
       entry_SYSCALL_64_fastpath+0x1e/0xad
      RIP: 0033:0x7f042b22d119
      <etc>
      
      Fixes: e7d316a0 ("sysctl: handle error writing UINT_MAX to u32 fields")
      Link: http://lkml.kernel.org/r/20170519033554.18592-5-mcgrof@kernel.orgSigned-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
      Suggested-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Cc: Liping Zhang <zlpnobody@gmail.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4f2fec00
    • L
      sysctl: fold sysctl_writes_strict checks into helper · d383d484
      Luis R. Rodriguez 提交于
      The mode sysctl_writes_strict positional checks keep being copy and pasted
      as we add new proc handlers.  Just add a helper to avoid code duplication.
      
      Link: http://lkml.kernel.org/r/20170519033554.18592-4-mcgrof@kernel.orgSigned-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
      Suggested-by: NKees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d383d484
    • L
      sysctl: kdoc'ify sysctl_writes_strict · a19ac337
      Luis R. Rodriguez 提交于
      Document the different sysctl_writes_strict modes in code.
      
      Link: http://lkml.kernel.org/r/20170519033554.18592-3-mcgrof@kernel.orgSigned-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a19ac337
    • L
      sysctl: fix lax sysctl_check_table() sanity check · 89c5b53b
      Luis R. Rodriguez 提交于
      Patch series "sysctl: few fixes", v5.
      
      I've been working on making kmod more deterministic, and as I did that I
      couldn't help but notice a few issues with sysctl.  My end goal was just
      to fix unsigned int support, which back then was completely broken.
      Liping Zhang has sent up small atomic fixes, however it still missed yet
      one more fix and Alexey Dobriyan had also suggested to just drop array
      support given its complexity.
      
      I have inspected array support using Coccinelle and indeed its not that
      popular, so if in fact we can avoid it for new interfaces, I agree its
      best.
      
      I did develop a sysctl stress driver but will hold that off for another
      series.
      
      This patch (of 5):
      
      Commit 7c60c48f ("sysctl: Improve the sysctl sanity checks")
      improved sanity checks considerbly, however the enhancements on
      sysctl_check_table() meant adding a functional change so that only the
      last table entry's sanity error is propagated.  It also changed the way
      errors were propagated so that each new check reset the err value, this
      means only last sanity check computed is used for an error.  This has
      been in the kernel since v3.4 days.
      
      Fix this by carrying on errors from previous checks and iterations as we
      traverse the table and ensuring we keep any error from previous checks.
      We keep iterating on the table even if an error is found so we can
      complain for all errors found in one shot.  This works as -EINVAL is
      always returned on error anyway, and the check for error is any non-zero
      value.
      
      Fixes: 7c60c48f ("sysctl: Improve the sysctl sanity checks")
      Link: http://lkml.kernel.org/r/20170519033554.18592-2-mcgrof@kernel.orgSigned-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      89c5b53b
    • B
      kexec/kdump: minor Documentation updates for arm64 and Image · a711bdc0
      Bharat Bhushan 提交于
      Minor updates in Documentation for arm64 as relocatable kernel.  Also
      this patch updates documentation for using uncompressed image "Image"
      which is used for ARM64.
      
      Link: http://lkml.kernel.org/r/1495104793-6563-1-git-send-email-Bharat.Bhushan@nxp.comSigned-off-by: NBharat Bhushan <Bharat.Bhushan@nxp.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Pratyush Anand <panand@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a711bdc0
    • X
      kdump: protect vmcoreinfo data under the crash memory · 1229384f
      Xunlei Pang 提交于
      Currently vmcoreinfo data is updated at boot time subsys_initcall(), it
      has the risk of being modified by some wrong code during system is
      running.
      
      As a result, vmcore dumped may contain the wrong vmcoreinfo.  Later on,
      when using "crash", "makedumpfile", etc utility to parse this vmcore, we
      probably will get "Segmentation fault" or other unexpected errors.
      
      E.g.  1) wrong code overwrites vmcoreinfo_data; 2) further crashes the
      system; 3) trigger kdump, then we obviously will fail to recognize the
      crash context correctly due to the corrupted vmcoreinfo.
      
      Now except for vmcoreinfo, all the crash data is well
      protected(including the cpu note which is fully updated in the crash
      path, thus its correctness is guaranteed).  Given that vmcoreinfo data
      is a large chunk prepared for kdump, we better protect it as well.
      
      To solve this, we relocate and copy vmcoreinfo_data to the crash memory
      when kdump is loading via kexec syscalls.  Because the whole crash
      memory will be protected by existing arch_kexec_protect_crashkres()
      mechanism, we naturally protect vmcoreinfo_data from write(even read)
      access under kernel direct mapping after kdump is loaded.
      
      Since kdump is usually loaded at the very early stage after boot, we can
      trust the correctness of the vmcoreinfo data copied.
      
      On the other hand, we still need to operate the vmcoreinfo safe copy
      when crash happens to generate vmcoreinfo_note again, we rely on vmap()
      to map out a new kernel virtual address and update to use this new one
      instead in the following crash_save_vmcoreinfo().
      
      BTW, we do not touch vmcoreinfo_note, because it will be fully updated
      using the protected vmcoreinfo_data after crash which is surely correct
      just like the cpu crash note.
      
      Link: http://lkml.kernel.org/r/1493281021-20737-3-git-send-email-xlpang@redhat.comSigned-off-by: NXunlei Pang <xlpang@redhat.com>
      Tested-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1229384f
    • X
      powerpc/fadump: use the correct VMCOREINFO_NOTE_SIZE for phdr · 5203f499
      Xunlei Pang 提交于
      vmcoreinfo_max_size stands for the vmcoreinfo_data, the correct one we
      should use is vmcoreinfo_note whose total size is VMCOREINFO_NOTE_SIZE.
      
      Like explained in commit 77019967 ("kdump: fix exported size of
      vmcoreinfo note"), it should not affect the actual function, but we
      better fix it, also this change should be safe and backward compatible.
      
      After this, we can get rid of variable vmcoreinfo_max_size, let's use
      the corresponding macros directly, fewer variables means more safety for
      vmcoreinfo operation.
      
      [xlpang@redhat.com: fix build warning]
        Link: http://lkml.kernel.org/r/1494830606-27736-1-git-send-email-xlpang@redhat.com
      Link: http://lkml.kernel.org/r/1493281021-20737-2-git-send-email-xlpang@redhat.comSigned-off-by: NXunlei Pang <xlpang@redhat.com>
      Reviewed-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Reviewed-by: NDave Young <dyoung@redhat.com>
      Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5203f499
    • X
      kexec: move vmcoreinfo out of the kernel's .bss section · 203e9e41
      Xunlei Pang 提交于
      As Eric said,
       "what we need to do is move the variable vmcoreinfo_note out of the
        kernel's .bss section. And modify the code to regenerate and keep this
        information in something like the control page.
      
        Definitely something like this needs a page all to itself, and ideally
        far away from any other kernel data structures. I clearly was not
        watching closely the data someone decided to keep this silly thing in
        the kernel's .bss section."
      
      This patch allocates extra pages for these vmcoreinfo_XXX variables, one
      advantage is that it enhances some safety of vmcoreinfo, because
      vmcoreinfo now is kept far away from other kernel data structures.
      
      Link: http://lkml.kernel.org/r/1493281021-20737-1-git-send-email-xlpang@redhat.comSigned-off-by: NXunlei Pang <xlpang@redhat.com>
      Tested-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com>
      Reviewed-by: NJuergen Gross <jgross@suse.com>
      Suggested-by: NEric Biederman <ebiederm@xmission.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
      Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      203e9e41
    • C
      kernel/fork.c: virtually mapped stacks: do not disable interrupts · 112166f8
      Christoph Lameter 提交于
      The reason to disable interrupts seems to be to avoid switching to a
      different processor while handling per cpu data using individual loads and
      stores.  If we use per cpu RMV primitives we will not have to disable
      interrupts.
      
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705171055130.5898@east.gentwo.orgSigned-off-by: NChristoph Lameter <cl@linux.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      112166f8
    • G
      mm/memory.c: mark create_huge_pmd() inline to prevent build failure · 91a90140
      Geert Uytterhoeven 提交于
      With gcc 4.1.2:
      
          mm/memory.o: In function `create_huge_pmd':
          memory.c:(.text+0x93e): undefined reference to `do_huge_pmd_anonymous_page'
      
      Interestingly, create_huge_pmd() is emitted in the assembler output, but
      never called.
      
      Converting transparent_hugepage_enabled() from a macro to a static
      inline function reduced the ability of the compiler to remove unused
      code.
      
      Fix this by marking create_huge_pmd() inline.
      
      Fixes: 16981d76 ("mm: improve readability of transparent_hugepage_enabled()")
      Link: http://lkml.kernel.org/r/1499842660-10665-1-git-send-email-geert@linux-m68k.orgSigned-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      91a90140
    • I
      kernel.h: handle pointers to arrays better in container_of() · c7acec71
      Ian Abbott 提交于
      If the first parameter of container_of() is a pointer to a
      non-const-qualified array type (and the third parameter names a
      non-const-qualified array member), the local variable __mptr will be
      defined with a const-qualified array type.  In ISO C, these types are
      incompatible.  They work as expected in GNU C, but some versions will
      issue warnings.  For example, GCC 4.9 produces the warning
      "initialization from incompatible pointer type".
      
      Here is an example of where the problem occurs:
      
      -------------------------------------------------------
         #include <linux/kernel.h>
         #include <linux/module.h>
      
        MODULE_LICENSE("GPL");
      
        struct st {
        	int a;
        	char b[16];
        };
      
        static int __init example_init(void) {
        	struct st t = { .a = 101, .b = "hello" };
        	char (*p)[16] = &t.b;
        	struct st *x = container_of(p, struct st, b);
        	printk(KERN_DEBUG "%p %p\n", (void *)&t, (void *)x);
        	return 0;
        }
      
        static void __exit example_exit(void) {
        }
      
        module_init(example_init);
        module_exit(example_exit);
      -------------------------------------------------------
      
      Building the module with gcc-4.9 results in these warnings (where '{m}'
      is the module source and '{k}' is the kernel source):
      
      -------------------------------------------------------
        In file included from {m}/example.c:1:0:
        {m}/example.c: In function `example_init':
        {k}/include/linux/kernel.h:854:48: warning: initialization from incompatible pointer type
          const typeof( ((type *)0)->member ) *__mptr = (ptr); \
                                                        ^
        {m}/example.c:14:17: note: in expansion of macro `container_of'
          struct st *x = container_of(p, struct st, b);
                         ^
        {k}/include/linux/kernel.h:854:48: warning: (near initialization for `x')
          const typeof( ((type *)0)->member ) *__mptr = (ptr); \
                                                        ^
        {m}/example.c:14:17: note: in expansion of macro `container_of'
          struct st *x = container_of(p, struct st, b);
                         ^
      -------------------------------------------------------
      
      Replace the type checking performed by the macro to avoid these
      warnings.  Make sure `*(ptr)` either has type compatible with the
      member, or has type compatible with `void`, ignoring qualifiers.  Raise
      compiler errors if this is not true.  This is stronger than the previous
      behaviour, which only resulted in compiler warnings for a type mismatch.
      
      [arnd@arndb.de: fix new warnings for container_of()]
        Link: http://lkml.kernel.org/r/20170620200940.90557-1-arnd@arndb.de
      Link: http://lkml.kernel.org/r/20170525120316.24473-7-abbotti@mev.co.ukSigned-off-by: NIan Abbott <abbotti@mev.co.uk>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NMichal Nazarewicz <mina86@mina86.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Johannes Berg <johannes.berg@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Alexander Potapenko <glider@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c7acec71
    • S
      include/linux/dcache.h: use unsigned chars in struct name_snapshot · 0a2c13d9
      Stephen Rothwell 提交于
      "kernel.h: handle pointers to arrays better in container_of()" triggers:
      
      In file included from include/uapi/linux/stddef.h:1:0,
                       from include/linux/stddef.h:4,
                       from include/uapi/linux/posix_types.h:4,
                       from include/uapi/linux/types.h:13,
                       from include/linux/types.h:5,
                       from include/linux/syscalls.h:71,
                       from fs/dcache.c:17:
      fs/dcache.c: In function 'release_dentry_name_snapshot':
      include/linux/compiler.h:542:38: error: call to '__compiletime_assert_305' declared with attribute error: pointer type mismatch in container_of()
        _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
                                            ^
      include/linux/compiler.h:525:4: note: in definition of macro '__compiletime_assert'
          prefix ## suffix();    \
          ^
      include/linux/compiler.h:542:2: note: in expansion of macro '_compiletime_assert'
        _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
        ^
      include/linux/build_bug.h:46:37: note: in expansion of macro 'compiletime_assert'
       #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
                                           ^
      include/linux/kernel.h:860:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
        BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
        ^
      fs/dcache.c:305:7: note: in expansion of macro 'container_of'
         p = container_of(name->name, struct external_name, name[0]);
      
      Switch name_snapshot to use unsigned chars, matching struct qstr and
      struct external_name.
      
      Link: http://lkml.kernel.org/r/20170710152134.0f78c1e6@canb.auug.org.auSigned-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0a2c13d9
    • L
      Merge branch 'i2c/for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 235b84fc
      Linus Torvalds 提交于
      Pull i2c updates from Wolfram Sang:
       "This pull request contains:
      
         - i2c core reorganization. One source file became too monolithic. It
           is now split up, yet we still have the same named object as the
           final output. This should ease maintenance.
      
         - new drivers: ZTE ZX2967 family, ASPEED 24XX/25XX
      
         - designware driver gained slave mode support
      
         - xgene-slimpro driver gained ACPI support
      
         - bigger overhaul for pca-platform driver
      
         - the algo-bit module now supports messages with enforced STOP
      
         - slightly bigger than usual set of driver updates and improvements
      
        and with much appreciated quality assurance from Andy Shevchenko"
      
      * 'i2c/for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (51 commits)
        i2c: Provide a stub for i2c_detect_slave_mode()
        i2c: designware: Let slave adapter support be optional
        i2c: designware: Make HW init functions static
        i2c: designware: fix spelling mistakes
        i2c: pca-platform: propagate error from i2c_pca_add_numbered_bus
        i2c: pca-platform: correctly set algo_data.reset_chip
        i2c: acpi: Do not create i2c-clients for LNXVIDEO ACPI devices
        i2c: designware: enable SLAVE in platform module
        i2c: designware: add SLAVE mode functions
        i2c: zx2967: drop COMPILE_TEST dependency
        i2c: zx2967: always use the same device when printing errors
        i2c: pca-platform: use dev_warn/dev_info instead of printk
        i2c: pca-platform: use device managed allocations
        i2c: pca-platform: add devicetree awareness
        i2c: pca-platform: switch to struct gpio_desc
        dt-bindings: add bindings for i2c-pca-platform
        i2c: cadance: fix ctrl/addr reg write order
        i2c: zx2967: add i2c controller driver for ZTE's zx2967 family
        dt: bindings: add documentation for zx2967 family i2c controller
        i2c: algo-bit: add support for I2C_M_STOP
        ...
      235b84fc
    • L
      Merge tag 'iommu-updates-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · fb4e3bee
      Linus Torvalds 提交于
      Pull IOMMU updates from Joerg Roedel:
       "This update comes with:
      
         - Support for lockless operation in the ARM io-pgtable code.
      
           This is an important step to solve the scalability problems in the
           common dma-iommu code for ARM
      
         - Some Errata workarounds for ARM SMMU implemenations
      
         - Rewrite of the deferred IO/TLB flush code in the AMD IOMMU driver.
      
           The code suffered from very high flush rates, with the new
           implementation the flush rate is down to ~1% of what it was before
      
         - Support for amd_iommu=off when booting with kexec.
      
           The problem here was that the IOMMU driver bailed out early without
           disabling the iommu hardware, if it was enabled in the old kernel
      
         - The Rockchip IOMMU driver is now available on ARM64
      
         - Align the return value of the iommu_ops->device_group call-backs to
           not miss error values
      
         - Preempt-disable optimizations in the Intel VT-d and common IOVA
           code to help Linux-RT
      
         - Various other small cleanups and fixes"
      
      * tag 'iommu-updates-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (60 commits)
        iommu/vt-d: Constify intel_dma_ops
        iommu: Warn once when device_group callback returns NULL
        iommu/omap: Return ERR_PTR in device_group call-back
        iommu: Return ERR_PTR() values from device_group call-backs
        iommu/s390: Use iommu_group_get_for_dev() in s390_iommu_add_device()
        iommu/vt-d: Don't disable preemption while accessing deferred_flush()
        iommu/iova: Don't disable preempt around this_cpu_ptr()
        iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #126
        iommu/arm-smmu-v3: Enable ACPI based HiSilicon CMD_PREFETCH quirk(erratum 161010701)
        iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #74
        ACPI/IORT: Fixup SMMUv3 resource size for Cavium ThunderX2 SMMUv3 model
        iommu/arm-smmu-v3, acpi: Add temporary Cavium SMMU-V3 IORT model number definitions
        iommu/io-pgtable-arm: Use dma_wmb() instead of wmb() when publishing table
        iommu/io-pgtable: depend on !GENERIC_ATOMIC64 when using COMPILE_TEST with LPAE
        iommu/arm-smmu-v3: Remove io-pgtable spinlock
        iommu/arm-smmu: Remove io-pgtable spinlock
        iommu/io-pgtable-arm-v7s: Support lockless operation
        iommu/io-pgtable-arm: Support lockless operation
        iommu/io-pgtable: Introduce explicit coherency
        iommu/io-pgtable-arm-v7s: Refactor split_blk_unmap
        ...
      fb4e3bee
    • L
      Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs · 6b1c776d
      Linus Torvalds 提交于
      Pull overlayfs updates from Miklos Szeredi:
       "This work from Amir introduces the inodes index feature, which
        provides:
      
         - hardlinks are not broken on copy up
      
         - infrastructure for overlayfs NFS export
      
        This also fixes constant st_ino for samefs case for lower hardlinks"
      
      * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (33 commits)
        ovl: mark parent impure and restore timestamp on ovl_link_up()
        ovl: document copying layers restrictions with inodes index
        ovl: cleanup orphan index entries
        ovl: persistent overlay inode nlink for indexed inodes
        ovl: implement index dir copy up
        ovl: move copy up lock out
        ovl: rearrange copy up
        ovl: add flag for upper in ovl_entry
        ovl: use struct copy_up_ctx as function argument
        ovl: base tmpfile in workdir too
        ovl: factor out ovl_copy_up_inode() helper
        ovl: extract helper to get temp file in copy up
        ovl: defer upper dir lock to tempfile link
        ovl: hash overlay non-dir inodes by copy up origin
        ovl: cleanup bad and stale index entries on mount
        ovl: lookup index entry for copy up origin
        ovl: verify index dir matches upper dir
        ovl: verify upper root dir matches lower root dir
        ovl: introduce the inodes index dir feature
        ovl: generalize ovl_create_workdir()
        ...
      6b1c776d
    • A
      fix a braino in compat_sys_getrlimit() · 58c7ffc0
      Al Viro 提交于
      Reported-and-tested-by: NMeelis Roos <mroos@linux.ee>
      Fixes: commit d9e968cb "getrlimit()/setrlimit(): move compat to native"
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      58c7ffc0
  2. 12 7月, 2017 7 次提交
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 3b06b1a7
      Linus Torvalds 提交于
      Pull sparc fixes from David Miller:
      
       - Fix symbol version generation for assembler on sparc, from
         Nagarathnam Muthusamy.
      
       - Fix compound page handling in gup_huge_pmd(), from Nitin Gupta.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc64: Fix gup_huge_pmd
        Adding the type of exported symbols
        sed regex in Makefile.build requires line break between exported symbols
        Adding asm-prototypes.h for genksyms to generate crc
      3b06b1a7
    • L
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 130568d5
      Linus Torvalds 提交于
      Pull more block updates from Jens Axboe:
       "This is a followup for block changes, that didn't make the initial
        pull request. It's a bit of a mixed bag, this contains:
      
         - A followup pull request from Sagi for NVMe. Outside of fixups for
           NVMe, it also includes a series for ensuring that we properly
           quiesce hardware queues when browsing live tags.
      
         - Set of integrity fixes from Dmitry (mostly), fixing various issues
           for folks using DIF/DIX.
      
         - Fix for a bug introduced in cciss, with the req init changes. From
           Christoph.
      
         - Fix for a bug in BFQ, from Paolo.
      
         - Two followup fixes for lightnvm/pblk from Javier.
      
         - Depth fix from Ming for blk-mq-sched.
      
         - Also from Ming, performance fix for mtip32xx that was introduced
           with the dynamic initialization of commands"
      
      * 'for-linus' of git://git.kernel.dk/linux-block: (44 commits)
        block: call bio_uninit in bio_endio
        nvmet: avoid unneeded assignment of submit_bio return value
        nvme-pci: add module parameter for io queue depth
        nvme-pci: compile warnings in nvme_alloc_host_mem()
        nvmet_fc: Accept variable pad lengths on Create Association LS
        nvme_fc/nvmet_fc: revise Create Association descriptor length
        lightnvm: pblk: remove unnecessary checks
        lightnvm: pblk: control I/O flow also on tear down
        cciss: initialize struct scsi_req
        null_blk: fix error flow for shared tags during module_init
        block: Fix __blkdev_issue_zeroout loop
        nvme-rdma: unconditionally recycle the request mr
        nvme: split nvme_uninit_ctrl into stop and uninit
        virtio_blk: quiesce/unquiesce live IO when entering PM states
        mtip32xx: quiesce request queues to make sure no submissions are inflight
        nbd: quiesce request queues to make sure no submissions are inflight
        nvme: kick requeue list when requeueing a request instead of when starting the queues
        nvme-pci: quiesce/unquiesce admin_q instead of start/stop its hw queues
        nvme-loop: quiesce/unquiesce admin_q instead of start/stop its hw queues
        nvme-fc: quiesce/unquiesce admin_q instead of start/stop its hw queues
        ...
      130568d5
    • L
      Merge tag 'smb3-security-fixes-for-4.13' of git://git.samba.org/sfrench/cifs-2.6 · 908b852d
      Linus Torvalds 提交于
      Pull cifs fixes and sane default from Steve French:
       "Upgrade default dialect to more secure SMB3 from older cifs dialect"
      
      * tag 'smb3-security-fixes-for-4.13' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: Clean up unused variables in smb2pdu.c
        [SMB3] Improve security, move default dialect to SMB3 from old CIFS
        [SMB3] Remove ifdef since SMB3 (and later) now STRONGLY preferred
        CIFS: Reconnect expired SMB sessions
        CIFS: Display SMB2 error codes in the hex format
        cifs: Use smb 2 - 3 and cifsacl mount options setacl function
        cifs: prototype declaration and definition to set acl for smb 2 - 3 and cifsacl mount options
      908b852d
    • L
      Merge tag 'ceph-for-4.13-rc1' of git://github.com/ceph/ceph-client · 3bf7878f
      Linus Torvalds 提交于
      Pull ceph updates from Ilya Dryomov:
       "The main item here is support for v12.y.z ("Luminous") clusters:
        RESEND_ON_SPLIT, RADOS_BACKOFF, OSDMAP_PG_UPMAP and CRUSH_CHOOSE_ARGS
        feature bits, and various other changes in the RADOS client protocol.
      
        On top of that we have a new fsc mount option to allow supplying
        fscache uniquifier (similar to NFS) and the usual pile of filesystem
        fixes from Zheng"
      
      * tag 'ceph-for-4.13-rc1' of git://github.com/ceph/ceph-client: (44 commits)
        libceph: advertise support for NEW_OSDOP_ENCODING and SERVER_LUMINOUS
        libceph: osd_state is 32 bits wide in luminous
        crush: remove an obsolete comment
        crush: crush_init_workspace starts with struct crush_work
        libceph, crush: per-pool crush_choose_arg_map for crush_do_rule()
        crush: implement weight and id overrides for straw2
        libceph: apply_upmap()
        libceph: compute actual pgid in ceph_pg_to_up_acting_osds()
        libceph: pg_upmap[_items] infrastructure
        libceph: ceph_decode_skip_* helpers
        libceph: kill __{insert,lookup,remove}_pg_mapping()
        libceph: introduce and switch to decode_pg_mapping()
        libceph: don't pass pgid by value
        libceph: respect RADOS_BACKOFF backoffs
        libceph: make DEFINE_RB_* helpers more general
        libceph: avoid unnecessary pi lookups in calc_target()
        libceph: use target pi for calc_target() calculations
        libceph: always populate t->target_{oid,oloc} in calc_target()
        libceph: make sure need_resend targets reflect latest map
        libceph: delete from need_resend_linger before check_linger_pool_dne()
        ...
      3bf7878f
    • L
      Merge git://www.linux-watchdog.org/linux-watchdog · 07d306c8
      Linus Torvalds 提交于
      Pull watchdog updates from Wim Van Sebroeck:
      
       - Add Renesas RZ/A WDT Watchdog driver
      
       - STM32 Independent WatchDoG (IWDG) support
      
       - UniPhier watchdog support
      
       - Add F71868 support
      
       - Add support for NCT6793D and NCT6795D
      
       - dw_wdt: add reset lines support
      
       - core: add option to avoid early handling of watchdog
      
       - core: introduce watchdog_worker_should_ping helper
      
       - Cleanups and improvements for sama5d4, intel-mid_wdt, s3c2410_wdt,
         orion_wdt, gpio_wdt, it87_wdt, meson_wdt, davinci_wdt, bcm47xx_wdt,
         zx2967_wdt, cadence_wdt
      
      * git://www.linux-watchdog.org/linux-watchdog: (32 commits)
        watchdog: introduce watchdog_worker_should_ping helper
        watchdog: uniphier: add UniPhier watchdog driver
        dt-bindings: watchdog: add description for UniPhier WDT controller
        watchdog: cadence_wdt: make of_device_ids const.
        watchdog: zx2967: constify zx2967_wdt_ops.
        watchdog: bcm47xx_wdt: constify bcm47xx_wdt_hard_ops and bcm47xx_wdt_soft_ops
        watchdog: davinci: Add missing clk_disable_unprepare().
        watchdog: davinci: Handle return value of clk_prepare_enable
        watchdog: meson: Handle return value of clk_prepare_enable
        watchdog: it87: Add support for various Super-IO chips
        watchdog: it87: Use infrastructure to stop watchdog on reboot
        watchdog: it87: Drop support for resetting watchdog though CIR and Game port
        watchdog: it87: Convert to use watchdog core infrastructure
        watchdog: it87: Drop FSF mailing address
        watchdog: dw_wdt: get reset lines from dt
        watchdog: bindings: dw_wdt: add reset lines
        watchdog: w83627hf: Add support for NCT6793D and NCT6795D
        watchdog: core: add option to avoid early handling of watchdog
        watchdog: f71808e_wdt: Add F71868 support
        watchdog: Add STM32 IWDG driver
        ...
      07d306c8
    • L
      Merge tag 'chrome-platform-for-linus-4.13' of... · a3ddacba
      Linus Torvalds 提交于
      Merge tag 'chrome-platform-for-linus-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform
      
      Pull chrome platform updates from Benson Leung:
       "Changes in this pull request are around catching up cros_ec with the
        internal chromeos-kernel versions of cros_ec, cros_ec_lpc, and
        cros_ec_lightbar.
      
        Also, switching maintainership from olof to bleung"
      
      * tag 'chrome-platform-for-linus-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform:
        platform/chrome : Add myself as Maintainer
        platform/chrome: cros_ec_lightbar - hide unused PM functions
        cros_ec: Don't signal wake event for non-wake host events
        cros_ec: Fix deadlock when EC is not responsive at probe
        cros_ec: Don't return error when checking command version
        platform/chrome: cros_ec_lightbar - Avoid I2C xfer to EC during suspend
        platform/chrome: cros_ec_lightbar - Add userspace lightbar control bit to EC
        platform/chrome: cros_ec_lightbar - Control of suspend/resume lightbar sequence
        platform/chrome: cros_ec_lightbar - Add lightbar program feature to sysfs
        platform/chrome: cros_ec_lpc: Add MKBP events support over ACPI
        platform/chrome: cros_ec_lpc: Add power management ops
        platform/chrome: cros_ec_lpc: Add support for GOOG004 ACPI device
        platform/chrome: cros_ec_lpc: Add support for mec1322 EC
        platform/chrome: cros_ec_lpc: Add R/W helpers to LPC protocol variants
        mfd: cros_ec: Add support for dumping panic information
        cros_ec_debugfs: Pass proper struct sizes to cros_ec_cmd_xfer()
        mfd: cros_ec: add debugfs, console log file
        mfd: cros_ec: Add EC console read structures definitions
        mfd: cros_ec: Add helper for event notifier.
      a3ddacba
    • L
      Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu · a0188177
      Linus Torvalds 提交于
      Pull x86nommu update from Greg Ungerer:
       "Only a single change, to remove old Kconfig options from defconfigs"
      
      * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
        m68k: defconfig: Cleanup from old Kconfig options
      a0188177
  3. 11 7月, 2017 10 次提交
    • L
      Merge branch 'akpm' (patches from Andrew) · 9967468c
      Linus Torvalds 提交于
      Merge more updates from Andrew Morton:
      
       - most of the rest of MM
      
       - KASAN updates
      
       - lib/ updates
      
       - checkpatch updates
      
       - some binfmt_elf changes
      
       - various misc bits
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (115 commits)
        kernel/exit.c: avoid undefined behaviour when calling wait4()
        kernel/signal.c: avoid undefined behaviour in kill_something_info
        binfmt_elf: safely increment argv pointers
        s390: reduce ELF_ET_DYN_BASE
        powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB
        arm64: move ELF_ET_DYN_BASE to 4GB / 4MB
        arm: move ELF_ET_DYN_BASE to 4MB
        binfmt_elf: use ELF_ET_DYN_BASE only for PIE
        fs, epoll: short circuit fetching events if thread has been killed
        checkpatch: improve multi-line alignment test
        checkpatch: improve macro reuse test
        checkpatch: change format of --color argument to --color[=WHEN]
        checkpatch: silence perl 5.26.0 unescaped left brace warnings
        checkpatch: improve tests for multiple line function definitions
        checkpatch: remove false warning for commit reference
        checkpatch: fix stepping through statements with $stat and ctx_statement_block
        checkpatch: [HLP]LIST_HEAD is also declaration
        checkpatch: warn when a MAINTAINERS entry isn't [A-Z]:\t
        checkpatch: improve the unnecessary OOM message test
        lib/bsearch.c: micro-optimize pivot position calculation
        ...
      9967468c
    • Z
      kernel/exit.c: avoid undefined behaviour when calling wait4() · dd83c161
      zhongjiang 提交于
      wait4(-2147483648, 0x20, 0, 0xdd0000) triggers:
      UBSAN: Undefined behaviour in kernel/exit.c:1651:9
      
      The related calltrace is as follows:
      
        negation of -2147483648 cannot be represented in type 'int':
        CPU: 9 PID: 16482 Comm: zj Tainted: G    B          ---- -------   3.10.0-327.53.58.71.x86_64+ #66
        Hardware name: Huawei Technologies Co., Ltd. Tecal RH2285          /BC11BTSA              , BIOS CTSAV036 04/27/2011
        Call Trace:
          dump_stack+0x19/0x1b
          ubsan_epilogue+0xd/0x50
          __ubsan_handle_negate_overflow+0x109/0x14e
          SyS_wait4+0x1cb/0x1e0
          system_call_fastpath+0x16/0x1b
      
      Exclude the overflow to avoid the UBSAN warning.
      
      Link: http://lkml.kernel.org/r/1497264618-20212-1-git-send-email-zhongjiang@huawei.comSigned-off-by: Nzhongjiang <zhongjiang@huawei.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dd83c161
    • Z
      kernel/signal.c: avoid undefined behaviour in kill_something_info · 4ea77014
      zhongjiang 提交于
      When running kill(72057458746458112, 0) in userspace I hit the following
      issue.
      
        UBSAN: Undefined behaviour in kernel/signal.c:1462:11
        negation of -2147483648 cannot be represented in type 'int':
        CPU: 226 PID: 9849 Comm: test Tainted: G    B          ---- -------   3.10.0-327.53.58.70.x86_64_ubsan+ #116
        Hardware name: Huawei Technologies Co., Ltd. RH8100 V3/BC61PBIA, BIOS BLHSV028 11/11/2014
        Call Trace:
          dump_stack+0x19/0x1b
          ubsan_epilogue+0xd/0x50
          __ubsan_handle_negate_overflow+0x109/0x14e
          SYSC_kill+0x43e/0x4d0
          SyS_kill+0xe/0x10
          system_call_fastpath+0x16/0x1b
      
      Add code to avoid the UBSAN detection.
      
      [akpm@linux-foundation.org: tweak comment]
      Link: http://lkml.kernel.org/r/1496670008-59084-1-git-send-email-zhongjiang@huawei.comSigned-off-by: Nzhongjiang <zhongjiang@huawei.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4ea77014
    • K
      binfmt_elf: safely increment argv pointers · 67c6777a
      Kees Cook 提交于
      When building the argv/envp pointers, the envp is needlessly
      pre-incremented instead of just continuing after the argv pointers are
      finished.  In some (likely impossible) race where the strings could be
      changed from userspace between copy_strings() and here, it might be
      possible to confuse the envp position.  Instead, just use sp like
      everything else.
      
      Link: http://lkml.kernel.org/r/20170622173838.GA43308@beastSigned-off-by: NKees Cook <keescook@chromium.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Qualys Security Advisory <qsa@qualys.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      67c6777a
    • K
      s390: reduce ELF_ET_DYN_BASE · a73dc537
      Kees Cook 提交于
      Now that explicitly executed loaders are loaded in the mmap region, we
      have more freedom to decide where we position PIE binaries in the
      address space to avoid possible collisions with mmap or stack regions.
      
      For 64-bit, align to 4GB to allow runtimes to use the entire 32-bit
      address space for 32-bit pointers.  On 32-bit use 4MB, which is the
      traditional x86 minimum load location, likely to avoid historically
      requiring a 4MB page table entry when only a portion of the first 4MB
      would be used (since the NULL address is avoided).  For s390 the
      position could be 0x10000, but that is needlessly close to the NULL
      address.
      
      Link: http://lkml.kernel.org/r/1498154792-49952-5-git-send-email-keescook@chromium.orgSigned-off-by: NKees Cook <keescook@chromium.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Pratyush Anand <panand@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a73dc537
    • K
      powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB · 47ebb09d
      Kees Cook 提交于
      Now that explicitly executed loaders are loaded in the mmap region, we
      have more freedom to decide where we position PIE binaries in the
      address space to avoid possible collisions with mmap or stack regions.
      
      For 64-bit, align to 4GB to allow runtimes to use the entire 32-bit
      address space for 32-bit pointers.  On 32-bit use 4MB, which is the
      traditional x86 minimum load location, likely to avoid historically
      requiring a 4MB page table entry when only a portion of the first 4MB
      would be used (since the NULL address is avoided).
      
      Link: http://lkml.kernel.org/r/1498154792-49952-4-git-send-email-keescook@chromium.orgSigned-off-by: NKees Cook <keescook@chromium.org>
      Tested-by: NMichael Ellerman <mpe@ellerman.id.au>
      Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Pratyush Anand <panand@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      47ebb09d
    • K
      arm64: move ELF_ET_DYN_BASE to 4GB / 4MB · 02445990
      Kees Cook 提交于
      Now that explicitly executed loaders are loaded in the mmap region, we
      have more freedom to decide where we position PIE binaries in the
      address space to avoid possible collisions with mmap or stack regions.
      
      For 64-bit, align to 4GB to allow runtimes to use the entire 32-bit
      address space for 32-bit pointers.  On 32-bit use 4MB, to match ARM.
      This could be 0x8000, the standard ET_EXEC load address, but that is
      needlessly close to the NULL address, and anyone running arm compat PIE
      will have an MMU, so the tight mapping is not needed.
      
      Link: http://lkml.kernel.org/r/1498251600-132458-4-git-send-email-keescook@chromium.orgSigned-off-by: NKees Cook <keescook@chromium.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      02445990
    • K
      arm: move ELF_ET_DYN_BASE to 4MB · 6a9af90a
      Kees Cook 提交于
      Now that explicitly executed loaders are loaded in the mmap region, we
      have more freedom to decide where we position PIE binaries in the
      address space to avoid possible collisions with mmap or stack regions.
      
      4MB is chosen here mainly to have parity with x86, where this is the
      traditional minimum load location, likely to avoid historically
      requiring a 4MB page table entry when only a portion of the first 4MB
      would be used (since the NULL address is avoided).
      
      For ARM the position could be 0x8000, the standard ET_EXEC load address,
      but that is needlessly close to the NULL address, and anyone running PIE
      on 32-bit ARM will have an MMU, so the tight mapping is not needed.
      
      Link: http://lkml.kernel.org/r/1498154792-49952-2-git-send-email-keescook@chromium.orgSigned-off-by: NKees Cook <keescook@chromium.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Pratyush Anand <panand@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Qualys Security Advisory <qsa@qualys.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6a9af90a
    • K
      binfmt_elf: use ELF_ET_DYN_BASE only for PIE · eab09532
      Kees Cook 提交于
      The ELF_ET_DYN_BASE position was originally intended to keep loaders
      away from ET_EXEC binaries.  (For example, running "/lib/ld-linux.so.2
      /bin/cat" might cause the subsequent load of /bin/cat into where the
      loader had been loaded.)
      
      With the advent of PIE (ET_DYN binaries with an INTERP Program Header),
      ELF_ET_DYN_BASE continued to be used since the kernel was only looking
      at ET_DYN.  However, since ELF_ET_DYN_BASE is traditionally set at the
      top 1/3rd of the TASK_SIZE, a substantial portion of the address space
      is unused.
      
      For 32-bit tasks when RLIMIT_STACK is set to RLIM_INFINITY, programs are
      loaded above the mmap region.  This means they can be made to collide
      (CVE-2017-1000370) or nearly collide (CVE-2017-1000371) with
      pathological stack regions.
      
      Lowering ELF_ET_DYN_BASE solves both by moving programs below the mmap
      region in all cases, and will now additionally avoid programs falling
      back to the mmap region by enforcing MAP_FIXED for program loads (i.e.
      if it would have collided with the stack, now it will fail to load
      instead of falling back to the mmap region).
      
      To allow for a lower ELF_ET_DYN_BASE, loaders (ET_DYN without INTERP)
      are loaded into the mmap region, leaving space available for either an
      ET_EXEC binary with a fixed location or PIE being loaded into mmap by
      the loader.  Only PIE programs are loaded offset from ELF_ET_DYN_BASE,
      which means architectures can now safely lower their values without risk
      of loaders colliding with their subsequently loaded programs.
      
      For 64-bit, ELF_ET_DYN_BASE is best set to 4GB to allow runtimes to use
      the entire 32-bit address space for 32-bit pointers.
      
      Thanks to PaX Team, Daniel Micay, and Rik van Riel for inspiration and
      suggestions on how to implement this solution.
      
      Fixes: d1fd836d ("mm: split ET_DYN ASLR from mmap ASLR")
      Link: http://lkml.kernel.org/r/20170621173201.GA114489@beastSigned-off-by: NKees Cook <keescook@chromium.org>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Qualys Security Advisory <qsa@qualys.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pratyush Anand <panand@redhat.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eab09532
    • D
      fs, epoll: short circuit fetching events if thread has been killed · c257a340
      David Rientjes 提交于
      We've encountered zombies that are waiting for a thread to exit that are
      looping in ep_poll() almost endlessly although there is a pending
      SIGKILL as a result of a group exit.
      
      This happens because we always find ep_events_available() and fetch more
      events and never are able to check for signal_pending() that would break
      from the loop and return -EINTR.
      
      Special case fatal signals and break immediately to guarantee that we
      loop to fetch more events and delay making a timely exit.
      
      It would also be possible to simply move the check for signal_pending()
      higher than checking for ep_events_available(), but there have been no
      reports of delayed signal handling other than SIGKILL preventing zombies
      from exiting that would be fixed by this.
      
      It fixes an issue for us where we have witnessed zombies sticking around
      for at least O(minutes), but considering the code has been like this
      forever and nobody else has complained that I have found, I would simply
      queue it up for 4.12.
      
      Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1705031722350.76784@chino.kir.corp.google.comSigned-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c257a340