1. 28 11月, 2013 3 次提交
    • T
      sysfs, kernfs: introduce kernfs_remove[_by_name[_ns]]() · 879f40d1
      Tejun Heo 提交于
      Introduce kernfs removal interfaces - kernfs_remove() and
      kernfs_remove_by_name[_ns]().
      
      These are just renames of sysfs_remove() and sysfs_hash_and_remove().
      No functional changes.
      
      v2: Dummy kernfs_remove_by_name_ns() for !CONFIG_SYSFS updated to
          return -ENOSYS instead of 0.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      879f40d1
    • T
      sysfs, kernfs: add skeletons for kernfs · b8441ed2
      Tejun Heo 提交于
      Core sysfs implementation will be separated into kernfs so that it can
      be used by other non-kobject users.
      
      This patch creates fs/kernfs/ directory and makes boilerplate changes.
      kernfs interface will be directly based on sysfs_dirent and its
      forward declaration is moved to include/linux/kernfs.h which is
      included from include/linux/sysfs.h.  sysfs core implementation will
      be gradually separated out and moved to kernfs.
      
      This patch doesn't introduce any functional changes.
      
      v2: mount.c added.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b8441ed2
    • T
      sysfs: drop kobj_ns_type handling, take #2 · c84a3b27
      Tejun Heo 提交于
      The way namespace tags are implemented in sysfs is more complicated
      than necessary.  As each tag is a pointer value and required to be
      non-NULL under a namespace enabled parent, there's no need to record
      separately what type each tag is.  If multiple namespace types are
      needed, which currently aren't, we can simply compare the tag to a set
      of allowed tags in the superblock assuming that the tags, being
      pointers, won't have the same value across multiple types.
      
      This patch rips out kobj_ns_type handling from sysfs.  sysfs now has
      an enable switch to turn on namespace under a node.  If enabled, all
      children are required to have non-NULL namespace tags and filtered
      against the super_block's tag.
      
      kobject namespace determination is now performed in
      lib/kobject.c::create_dir() making sysfs_read_ns_type() unnecessary.
      The sanity checks are also moved.  create_dir() is restructured to
      ease such addition.  This removes most kobject namespace knowledge
      from sysfs proper which will enable proper separation and layering of
      sysfs.
      
      This is the second try.  The first one was cb26a311 ("sysfs: drop
      kobj_ns_type handling") which tried to automatically enable namespace
      if there are children with non-NULL namespace tags; however, it was
      broken for symlinks as they should inherit the target's tag iff
      namespace is enabled in the parent.  This led to namespace filtering
      enabled incorrectly for wireless net class devices through phy80211
      symlinks and thus network configuration failure.  a1212d27
      ("Revert "sysfs: drop kobj_ns_type handling"") reverted the commit.
      
      This shouldn't introduce any behavior changes, for real.
      
      v2: Dummy implementation of sysfs_enable_ns() for !CONFIG_SYSFS was
          missing and caused build failure.  Reported by kbuild test robot.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c84a3b27
  2. 22 11月, 2013 3 次提交
    • K
      mm: place page->pmd_huge_pte to right union · 7aa555bf
      Kirill A. Shutemov 提交于
      I don't know what went wrong, mis-merge or something, but ->pmd_huge_pte
      placed in wrong union within struct page.
      
      In original patch[1] it's placed to union with ->lru and ->slab, but in
      commit e009bb30 ("mm: implement split page table lock for PMD
      level") it's in union with ->index and ->freelist.
      
      That union seems also unused for pages with table tables and safe to
      re-use, but it's not what I've tested.
      
      Let's move it to original place.  It fixes indentation at least.  :)
      
      [1] https://lkml.org/lkml/2013/10/7/288Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7aa555bf
    • A
      mm: hugetlbfs: fix hugetlbfs optimization · 27c73ae7
      Andrea Arcangeli 提交于
      Commit 7cb2ef56 ("mm: fix aio performance regression for database
      caused by THP") can cause dereference of a dangling pointer if
      split_huge_page runs during PageHuge() if there are updates to the
      tail_page->private field.
      
      Also it is repeating compound_head twice for hugetlbfs and it is running
      compound_head+compound_trans_head for THP when a single one is needed in
      both cases.
      
      The new code within the PageSlab() check doesn't need to verify that the
      THP page size is never bigger than the smallest hugetlbfs page size, to
      avoid memory corruption.
      
      A longstanding theoretical race condition was found while fixing the
      above (see the change right after the skip_unlock label, that is
      relevant for the compound_lock path too).
      
      By re-establishing the _mapcount tail refcounting for all compound
      pages, this also fixes the below problem:
      
        echo 0 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
      
        BUG: Bad page state in process bash  pfn:59a01
        page:ffffea000139b038 count:0 mapcount:10 mapping:          (null) index:0x0
        page flags: 0x1c00000000008000(tail)
        Modules linked in:
        CPU: 6 PID: 2018 Comm: bash Not tainted 3.12.0+ #25
        Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        Call Trace:
          dump_stack+0x55/0x76
          bad_page+0xd5/0x130
          free_pages_prepare+0x213/0x280
          __free_pages+0x36/0x80
          update_and_free_page+0xc1/0xd0
          free_pool_huge_page+0xc2/0xe0
          set_max_huge_pages.part.58+0x14c/0x220
          nr_hugepages_store_common.isra.60+0xd0/0xf0
          nr_hugepages_store+0x13/0x20
          kobj_attr_store+0xf/0x20
          sysfs_write_file+0x189/0x1e0
          vfs_write+0xc5/0x1f0
          SyS_write+0x55/0xb0
          system_call_fastpath+0x16/0x1b
      Signed-off-by: NKhalid Aziz <khalid.aziz@oracle.com>
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Tested-by: NKhalid Aziz <khalid.aziz@oracle.com>
      Cc: Pravin Shelar <pshelar@nicira.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      27c73ae7
    • D
      mm: thp: give transparent hugepage code a separate copy_page · 30b0a105
      Dave Hansen 提交于
      Right now, the migration code in migrate_page_copy() uses copy_huge_page()
      for hugetlbfs and thp pages:
      
             if (PageHuge(page) || PageTransHuge(page))
                      copy_huge_page(newpage, page);
      
      So, yay for code reuse.  But:
      
        void copy_huge_page(struct page *dst, struct page *src)
        {
              struct hstate *h = page_hstate(src);
      
      and a non-hugetlbfs page has no page_hstate().  This works 99% of the
      time because page_hstate() determines the hstate from the page order
      alone.  Since the page order of a THP page matches the default hugetlbfs
      page order, it works.
      
      But, if you change the default huge page size on the boot command-line
      (say default_hugepagesz=1G), then we might not even *have* a 2MB hstate
      so page_hstate() returns null and copy_huge_page() oopses pretty fast
      since copy_huge_page() dereferences the hstate:
      
        void copy_huge_page(struct page *dst, struct page *src)
        {
              struct hstate *h = page_hstate(src);
              if (unlikely(pages_per_huge_page(h) > MAX_ORDER_NR_PAGES)) {
        ...
      
      Mel noticed that the migration code is really the only user of these
      functions.  This moves all the copy code over to migrate.c and makes
      copy_huge_page() work for THP by checking for it explicitly.
      
      I believe the bug was introduced in commit b32967ff ("mm: numa: Add
      THP migration for the NUMA working set scanning fault case")
      
      [akpm@linux-foundation.org: fix coding-style and comment text, per Naoya Horiguchi]
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Tested-by: NDave Jiang <dave.jiang@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      30b0a105
  3. 21 11月, 2013 3 次提交
    • M
      net/phy: Add the autocross feature for forced links on VSC82x4 · 3fb69bca
      Madalin Bucur 提交于
      Add auto-MDI/MDI-X capability for forced (autonegotiation disabled)
      10/100 Mbps speeds on Vitesse VSC82x4 PHYs. Exported previously static
      function genphy_setup_forced() required by the new config_aneg handler
      in the Vitesse PHY module.
      Signed-off-by: NMadalin Bucur <madalin.bucur@freescale.com>
      Signed-off-by: NShruti Kanetkar <Shruti@freescale.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3fb69bca
    • H
      net: rework recvmsg handler msg_name and msg_namelen logic · f3d33426
      Hannes Frederic Sowa 提交于
      This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
      set msg_namelen to the proper size <= sizeof(struct sockaddr_storage)
      to return msg_name to the user.
      
      This prevents numerous uninitialized memory leaks we had in the
      recvmsg handlers and makes it harder for new code to accidentally leak
      uninitialized memory.
      
      Optimize for the case recvfrom is called with NULL as address. We don't
      need to copy the address at all, so set it to NULL before invoking the
      recvmsg handler. We can do so, because all the recvmsg handlers must
      cope with the case a plain read() is called on them. read() also sets
      msg_name to NULL.
      
      Also document these changes in include/linux/net.h as suggested by David
      Miller.
      
      Changes since RFC:
      
      Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
      non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
      affect sendto as it would bail out earlier while trying to copy-in the
      address. It also more naturally reflects the logic by the callers of
      verify_iovec.
      
      With this change in place I could remove "
      if (!uaddr || msg_sys->msg_namelen == 0)
      	msg->msg_name = NULL
      ".
      
      This change does not alter the user visible error logic as we ignore
      msg_namelen as long as msg_name is NULL.
      
      Also remove two unnecessary curly brackets in ___sys_recvmsg and change
      comments to netdev style.
      
      Cc: David Miller <davem@davemloft.net>
      Suggested-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f3d33426
    • L
      Revert "mm: create a separate slab for page->ptl allocation" · 8b2e9b71
      Linus Torvalds 提交于
      This reverts commit ea1e7ed3.
      
      Al points out that while the commit *does* actually create a separate
      slab for the page->ptl allocation, that slab is never actually used, and
      the code continues to use kmalloc/kfree.
      
      Damien Wyart points out that the original patch did have the conversion
      to use kmem_cache_alloc/free, so it got lost somewhere on its way to me.
      
      Revert the half-arsed attempt that didn't do anything.  If we really do
      want the special slab (remember: this is all relevant just for debug
      builds, so it's not necessarily all that critical) we might as well redo
      the patch fully.
      Reported-by: NAl Viro <viro@zeniv.linux.org.uk>
      Acked-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Kirill A Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8b2e9b71
  4. 20 11月, 2013 4 次提交
  5. 16 11月, 2013 4 次提交
  6. 15 11月, 2013 23 次提交