1. 17 5月, 2016 2 次提交
  2. 13 5月, 2016 2 次提交
    • A
      mm: thp: calculate the mapcount correctly for THP pages during WP faults · 6d0a07ed
      Andrea Arcangeli 提交于
      This will provide fully accuracy to the mapcount calculation in the
      write protect faults, so page pinning will not get broken by false
      positive copy-on-writes.
      
      total_mapcount() isn't the right calculation needed in
      reuse_swap_page(), so this introduces a page_trans_huge_mapcount()
      that is effectively the full accurate return value for page_mapcount()
      if dealing with Transparent Hugepages, however we only use the
      page_trans_huge_mapcount() during COW faults where it strictly needed,
      due to its higher runtime cost.
      
      This also provide at practical zero cost the total_mapcount
      information which is needed to know if we can still relocate the page
      anon_vma to the local vma. If page_trans_huge_mapcount() returns 1 we
      can reuse the page no matter if it's a pte or a pmd_trans_huge
      triggering the fault, but we can only relocate the page anon_vma to
      the local vma->anon_vma if we're sure it's only this "vma" mapping the
      whole THP physical range.
      
      Kirill A. Shutemov discovered the problem with moving the page
      anon_vma to the local vma->anon_vma in a previous version of this
      patch and another problem in the way page_move_anon_rmap() was called.
      
      Andrew Morton discovered that CONFIG_SWAP=n wouldn't build in a
      previous version, because reuse_swap_page must be a macro to call
      page_trans_huge_mapcount from swap.h, so this uses a macro again
      instead of an inline function. With this change at least it's a less
      dangerous usage than it was before, because "page" is used only once
      now, while with the previous code reuse_swap_page(page++) would have
      called page_mapcount on page+1 and it would have increased page twice
      instead of just once.
      
      Dean Luick noticed an uninitialized variable that could result in a
      rmap inefficiency for the non-THP case in a previous version.
      
      Mike Marciniszyn said:
      
      : Our RDMA tests are seeing an issue with memory locking that bisects to
      : commit 61f5d698 ("mm: re-enable THP")
      :
      : The test program registers two rather large MRs (512M) and RDMA
      : writes data to a passive peer using the first and RDMA reads it back
      : into the second MR and compares that data.  The sizes are chosen randomly
      : between 0 and 1024 bytes.
      :
      : The test will get through a few (<= 4 iterations) and then gets a
      : compare error.
      :
      : Tracing indicates the kernel logical addresses associated with the individual
      : pages at registration ARE correct , the data in the "RDMA read response only"
      : packets ARE correct.
      :
      : The "corruption" occurs when the packet crosse two pages that are not physically
      : contiguous.   The second page reads back as zero in the program.
      :
      : It looks like the user VA at the point of the compare error no longer points to
      : the same physical address as was registered.
      :
      : This patch totally resolves the issue!
      
      Link: http://lkml.kernel.org/r/1462547040-1737-2-git-send-email-aarcange@redhat.comSigned-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: N"Kirill A. Shutemov" <kirill@shutemov.name>
      Reviewed-by: NDean Luick <dean.luick@intel.com>
      Tested-by: NAlex Williamson <alex.williamson@redhat.com>
      Tested-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
      Tested-by: NJosh Collier <josh.d.collier@intel.com>
      Cc: Marc Haber <mh+linux-kernel@zugschlus.de>
      Cc: <stable@vger.kernel.org>	[4.5]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6d0a07ed
    • H
      gre: Fix wrong tpi->proto in WCCP · da73b4e9
      Haishuang Yan 提交于
      When dealing with WCCP in gre6 tunnel, it sets the wrong tpi->protocol,
      that is, ETH_P_IP instead of ETH_P_IPV6 for the encapuslated traffic.
      Signed-off-by: NHaishuang Yan <yanhaishuang@cmss.chinamobile.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      da73b4e9
  3. 12 5月, 2016 20 次提交
  4. 11 5月, 2016 5 次提交
  5. 10 5月, 2016 5 次提交
    • J
      export tc ife uapi header · d99079e2
      Jamal Hadi Salim 提交于
      Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d99079e2
    • D
      net: l3mdev: Move get_saddr and rt6_dst · 4a65896f
      David Ahern 提交于
      Move l3mdev_rt6_dst_by_oif and l3mdev_get_saddr to l3mdev.c. Collapse
      l3mdev_get_rt6_dst into l3mdev_rt6_dst_by_oif since it is the only
      user and keep the l3mdev_get_rt6_dst name for consistency with other
      hooks.
      
      A follow-on patch adds more code to these functions making them long
      for inlined functions.
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4a65896f
    • M
      uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h · 4a91cb61
      Mikko Rapeli 提交于
      glibc's net/if.h contains copies of definitions from linux/if.h and these
      conflict and cause build failures if both files are included by application
      source code. Changes in uapi headers, which fixed header file dependencies to
      include linux/if.h when it was needed, e.g. commit 1ffad83d, made the
      net/if.h and linux/if.h incompatibilities visible as build failures for
      userspace applications like iproute2 and xtables-addons.
      
      This patch fixes compile errors when glibc net/if.h is included before
      linux/if.h:
      
      ./linux/if.h:99:21: error: redeclaration of enumerator ‘IFF_NOARP’
      ./linux/if.h:98:23: error: redeclaration of enumerator ‘IFF_RUNNING’
      ./linux/if.h:97:26: error: redeclaration of enumerator ‘IFF_NOTRAILERS’
      ./linux/if.h:96:27: error: redeclaration of enumerator ‘IFF_POINTOPOINT’
      ./linux/if.h:95:24: error: redeclaration of enumerator ‘IFF_LOOPBACK’
      ./linux/if.h:94:21: error: redeclaration of enumerator ‘IFF_DEBUG’
      ./linux/if.h:93:25: error: redeclaration of enumerator ‘IFF_BROADCAST’
      ./linux/if.h:92:19: error: redeclaration of enumerator ‘IFF_UP’
      ./linux/if.h:252:8: error: redefinition of ‘struct ifconf’
      ./linux/if.h:203:8: error: redefinition of ‘struct ifreq’
      ./linux/if.h:169:8: error: redefinition of ‘struct ifmap’
      ./linux/if.h:107:23: error: redeclaration of enumerator ‘IFF_DYNAMIC’
      ./linux/if.h:106:25: error: redeclaration of enumerator ‘IFF_AUTOMEDIA’
      ./linux/if.h:105:23: error: redeclaration of enumerator ‘IFF_PORTSEL’
      ./linux/if.h:104:25: error: redeclaration of enumerator ‘IFF_MULTICAST’
      ./linux/if.h:103:21: error: redeclaration of enumerator ‘IFF_SLAVE’
      ./linux/if.h:102:22: error: redeclaration of enumerator ‘IFF_MASTER’
      ./linux/if.h:101:24: error: redeclaration of enumerator ‘IFF_ALLMULTI’
      ./linux/if.h:100:23: error: redeclaration of enumerator ‘IFF_PROMISC’
      
      The cases where linux/if.h is included before net/if.h need a similar fix in
      the glibc side, or the order of include files can be changed userspace
      code as a workaround.
      
      This change was tested in x86 userspace on Debian unstable with
      scripts/headers_compile_test.sh:
      
      $ make headers_install && \
        cd usr/include && ../../scripts/headers_compile_test.sh -l -k
      ...
      cc -Wall -c -nostdinc -I /usr/lib/gcc/i586-linux-gnu/5/include -I /usr/lib/gcc/i586-linux-gnu/5/include-fixed -I . -I /home/mcfrisk/src/linux-2.6/usr/headers_compile_test_include.2uX2zH -I /home/mcfrisk/src/linux-2.6/usr/headers_compile_test_include.2uX2zH/i586-linux-gnu -o /dev/null ./linux/if.h_libc_before_kernel.h
      PASSED libc before kernel test: ./linux/if.h
      Reported-by: NJan Engelhardt <jengelh@inai.de>
      Reported-by: NJosh Boyer <jwboyer@fedoraproject.org>
      Reported-by: NStephen Hemminger <shemming@brocade.com>
      Reported-by: NWaldemar Brodkorb <mail@waldemar-brodkorb.de>
      Cc: Gabriel Laskar <gabriel@lse.epita.fr>
      Signed-off-by: NMikko Rapeli <mikko.rapeli@iki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4a91cb61
    • J
      compiler-gcc: require gcc 4.8 for powerpc __builtin_bswap16() · 8634de6d
      Josh Poimboeuf 提交于
      gcc support for __builtin_bswap16() was supposedly added for powerpc in
      gcc 4.6, and was then later added for other architectures in gcc 4.8.
      
      However, Stephen Rothwell reported that attempting to use it on powerpc
      in gcc 4.6 fails with:
      
        lib/vsprintf.c:160:2: error: initializer element is not constant
        lib/vsprintf.c:160:2: error: (near initialization for 'decpair[0]')
        lib/vsprintf.c:160:2: error: initializer element is not constant
        lib/vsprintf.c:160:2: error: (near initialization for 'decpair[1]')
        ...
      
      I'm not entirely sure what those errors mean, but I don't see them on
      gcc 4.8.  So let's consider gcc 4.8 to be the official starting point
      for __builtin_bswap16().
      
      Arnd Bergmann adds:
       "I found the commit in gcc-4.8 that replaced the powerpc-specific
        implementation of __builtin_bswap16 with an architecture-independent
        one.  Apparently the powerpc version (gcc-4.6 and 4.7) just mapped to
        the lhbrx/sthbrx instructions, so it ended up not being a constant,
        though the intent of the patch was mainly to add support for the
        builtin to x86:
      
          https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52624
      
        has the patch that went into gcc-4.8 and more information."
      
      Fixes: 7322dd75 ("byteswap: try to avoid __builtin_constant_p gcc bug")
      Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Tested-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8634de6d
    • S
      cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces · 4f41fc59
      Serge E. Hallyn 提交于
      Patch summary:
      
      When showing a cgroupfs entry in mountinfo, show the path of the mount
      root dentry relative to the reader's cgroup namespace root.
      
      Short explanation (courtesy of mkerrisk):
      
      If we create a new cgroup namespace, then we want both /proc/self/cgroup
      and /proc/self/mountinfo to show cgroup paths that are correctly
      virtualized with respect to the cgroup mount point.  Previous to this
      patch, /proc/self/cgroup shows the right info, but /proc/self/mountinfo
      does not.
      
      Long version:
      
      When a uid 0 task which is in freezer cgroup /a/b, unshares a new cgroup
      namespace, and then mounts a new instance of the freezer cgroup, the new
      mount will be rooted at /a/b.  The root dentry field of the mountinfo
      entry will show '/a/b'.
      
       cat > /tmp/do1 << EOF
       mount -t cgroup -o freezer freezer /mnt
       grep freezer /proc/self/mountinfo
       EOF
      
       unshare -Gm  bash /tmp/do1
       > 330 160 0:34 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,freezer
       > 355 133 0:34 /a/b /mnt rw,relatime - cgroup freezer rw,freezer
      
      The task's freezer cgroup entry in /proc/self/cgroup will simply show
      '/':
      
       grep freezer /proc/self/cgroup
       9:freezer:/
      
      If instead the same task simply bind mounts the /a/b cgroup directory,
      the resulting mountinfo entry will again show /a/b for the dentry root.
      However in this case the task will find its own cgroup at /mnt/a/b,
      not at /mnt:
      
       mount --bind /sys/fs/cgroup/freezer/a/b /mnt
       130 25 0:34 /a/b /mnt rw,nosuid,nodev,noexec,relatime shared:21 - cgroup cgroup rw,freezer
      
      In other words, there is no way for the task to know, based on what is
      in mountinfo, which cgroup directory is its own.
      
      Example (by mkerrisk):
      
      First, a little script to save some typing and verbiage:
      
      echo -e "\t/proc/self/cgroup:\t$(cat /proc/self/cgroup | grep freezer)"
      cat /proc/self/mountinfo | grep freezer |
              awk '{print "\tmountinfo:\t\t" $4 "\t" $5}'
      
      Create cgroup, place this shell into the cgroup, and look at the state
      of the /proc files:
      
      2653
      2653                         # Our shell
      14254                        # cat(1)
              /proc/self/cgroup:      10:freezer:/a/b
              mountinfo:              /       /sys/fs/cgroup/freezer
      
      Create a shell in new cgroup and mount namespaces. The act of creating
      a new cgroup namespace causes the process's current cgroups directories
      to become its cgroup root directories. (Here, I'm using my own version
      of the "unshare" utility, which takes the same options as the util-linux
      version):
      
      Look at the state of the /proc files:
      
              /proc/self/cgroup:      10:freezer:/
              mountinfo:              /       /sys/fs/cgroup/freezer
      
      The third entry in /proc/self/cgroup (the pathname of the cgroup inside
      the hierarchy) is correctly virtualized w.r.t. the cgroup namespace, which
      is rooted at /a/b in the outer namespace.
      
      However, the info in /proc/self/mountinfo is not for this cgroup
      namespace, since we are seeing a duplicate of the mount from the
      old mount namespace, and the info there does not correspond to the
      new cgroup namespace. However, trying to create a new mount still
      doesn't show us the right information in mountinfo:
      
                                            # propagating to other mountns
              /proc/self/cgroup:      7:freezer:/
              mountinfo:              /a/b    /mnt/freezer
      
      The act of creating a new cgroup namespace caused the process's
      current freezer directory, "/a/b", to become its cgroup freezer root
      directory. In other words, the pathname directory of the directory
      within the newly mounted cgroup filesystem should be "/",
      but mountinfo wrongly shows us "/a/b". The consequence of this is
      that the process in the cgroup namespace cannot correctly construct
      the pathname of its cgroup root directory from the information in
      /proc/PID/mountinfo.
      
      With this patch, the dentry root field in mountinfo is shown relative
      to the reader's cgroup namespace.  So the same steps as above:
      
              /proc/self/cgroup:      10:freezer:/a/b
              mountinfo:              /       /sys/fs/cgroup/freezer
              /proc/self/cgroup:      10:freezer:/
              mountinfo:              /../..  /sys/fs/cgroup/freezer
              /proc/self/cgroup:      10:freezer:/
              mountinfo:              /       /mnt/freezer
      
      cgroup.clone_children  freezer.parent_freezing  freezer.state      tasks
      cgroup.procs           freezer.self_freezing    notify_on_release
      3164
      2653                   # First shell that placed in this cgroup
      3164                   # Shell started by 'unshare'
      14197                  # cat(1)
      Signed-off-by: NSerge Hallyn <serge.hallyn@ubuntu.com>
      Tested-by: NMichael Kerrisk <mtk.manpages@gmail.com>
      Acked-by: NMichael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      4f41fc59
  6. 09 5月, 2016 6 次提交