1. 07 5月, 2014 7 次提交
    • C
      f2fs: add a tracepoint for f2fs_write_{meta,node,data}_page · ecda0de3
      Chao Yu 提交于
      This patch adds a tracepoint for f2fs_write_{meta,node,data}_page to trace when
      page is writting out.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      ecda0de3
    • C
      f2fs: add a tracepoint for f2fs_write_end · dfb2bf38
      Chao Yu 提交于
      This patch adds a tracepoint for f2fs_write_end to trace write op of user.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      dfb2bf38
    • C
      f2fs: add a tracepoint for f2fs_write_begin · 62aed044
      Chao Yu 提交于
      This patch adds a tracepoint for f2fs_write_begin to trace write op of user.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      62aed044
    • C
      f2fs: introduce help macro ADDRS_PER_PAGE() · 6403eb1f
      Chao Yu 提交于
      Introduce help macro ADDRS_PER_PAGE() to get the number of address pointers in
      direct node or inode.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      6403eb1f
    • C
      slub: use sysfs'es release mechanism for kmem_cache · 41a21285
      Christoph Lameter 提交于
      debugobjects warning during netfilter exit:
      
          ------------[ cut here ]------------
          WARNING: CPU: 6 PID: 4178 at lib/debugobjects.c:260 debug_print_object+0x8d/0xb0()
          ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
          Modules linked in:
          CPU: 6 PID: 4178 Comm: kworker/u16:2 Tainted: G        W 3.11.0-next-20130906-sasha #3984
          Workqueue: netns cleanup_net
          Call Trace:
            dump_stack+0x52/0x87
            warn_slowpath_common+0x8c/0xc0
            warn_slowpath_fmt+0x46/0x50
            debug_print_object+0x8d/0xb0
            __debug_check_no_obj_freed+0xa5/0x220
            debug_check_no_obj_freed+0x15/0x20
            kmem_cache_free+0x197/0x340
            kmem_cache_destroy+0x86/0xe0
            nf_conntrack_cleanup_net_list+0x131/0x170
            nf_conntrack_pernet_exit+0x5d/0x70
            ops_exit_list+0x5e/0x70
            cleanup_net+0xfb/0x1c0
            process_one_work+0x338/0x550
            worker_thread+0x215/0x350
            kthread+0xe7/0xf0
            ret_from_fork+0x7c/0xb0
      
      Also during dcookie cleanup:
      
          WARNING: CPU: 12 PID: 9725 at lib/debugobjects.c:260 debug_print_object+0x8c/0xb0()
          ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
          Modules linked in:
          CPU: 12 PID: 9725 Comm: trinity-c141 Not tainted 3.15.0-rc2-next-20140423-sasha-00018-gc4ff6c4 #408
          Call Trace:
            dump_stack (lib/dump_stack.c:52)
            warn_slowpath_common (kernel/panic.c:430)
            warn_slowpath_fmt (kernel/panic.c:445)
            debug_print_object (lib/debugobjects.c:262)
            __debug_check_no_obj_freed (lib/debugobjects.c:697)
            debug_check_no_obj_freed (lib/debugobjects.c:726)
            kmem_cache_free (mm/slub.c:2689 mm/slub.c:2717)
            kmem_cache_destroy (mm/slab_common.c:363)
            dcookie_unregister (fs/dcookies.c:302 fs/dcookies.c:343)
            event_buffer_release (arch/x86/oprofile/../../../drivers/oprofile/event_buffer.c:153)
            __fput (fs/file_table.c:217)
            ____fput (fs/file_table.c:253)
            task_work_run (kernel/task_work.c:125 (discriminator 1))
            do_notify_resume (include/linux/tracehook.h:196 arch/x86/kernel/signal.c:751)
            int_signal (arch/x86/kernel/entry_64.S:807)
      
      Sysfs has a release mechanism.  Use that to release the kmem_cache
      structure if CONFIG_SYSFS is enabled.
      
      Only slub is changed - slab currently only supports /proc/slabinfo and
      not /sys/kernel/slab/*.  We talked about adding that and someone was
      working on it.
      
      [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build]
      [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build even more]
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Tested-by: NSasha Levin <sasha.levin@oracle.com>
      Acked-by: NGreg KH <greg@kroah.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      41a21285
    • N
      hugetlb: ensure hugepage access is denied if hugepages are not supported · 457c1b27
      Nishanth Aravamudan 提交于
      Currently, I am seeing the following when I `mount -t hugetlbfs /none
      /dev/hugetlbfs`, and then simply do a `ls /dev/hugetlbfs`.  I think it's
      related to the fact that hugetlbfs is properly not correctly setting
      itself up in this state?:
      
        Unable to handle kernel paging request for data at address 0x00000031
        Faulting instruction address: 0xc000000000245710
        Oops: Kernel access of bad area, sig: 11 [#1]
        SMP NR_CPUS=2048 NUMA pSeries
        ....
      
      In KVM guests on Power, in a guest not backed by hugepages, we see the
      following:
      
        AnonHugePages:         0 kB
        HugePages_Total:       0
        HugePages_Free:        0
        HugePages_Rsvd:        0
        HugePages_Surp:        0
        Hugepagesize:         64 kB
      
      HPAGE_SHIFT == 0 in this configuration, which indicates that hugepages
      are not supported at boot-time, but this is only checked in
      hugetlb_init().  Extract the check to a helper function, and use it in a
      few relevant places.
      
      This does make hugetlbfs not supported (not registered at all) in this
      environment.  I believe this is fine, as there are no valid hugepages
      and that won't change at runtime.
      
      [akpm@linux-foundation.org: use pr_info(), per Mel]
      [akpm@linux-foundation.org: fix build when HPAGE_SHIFT is undefined]
      Signed-off-by: NNishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      457c1b27
    • A
      nick kvfree() from apparmor · 39f1f78d
      Al Viro 提交于
      too many places open-code it
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      39f1f78d
  2. 06 5月, 2014 1 次提交
  3. 04 5月, 2014 2 次提交
  4. 01 5月, 2014 2 次提交
    • H
      word-at-a-time: simplify big-endian zero_bytemask macro · 789ce9dc
      H. Peter Anvin 提交于
      This is simpler and cleaner.  Depending on architecture, a smart
      compiler may or may not generate the same code.
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      789ce9dc
    • A
      dentry_kill(): don't try to remove from shrink list · 41edf278
      Al Viro 提交于
      If the victim in on the shrink list, don't remove it from there.
      If shrink_dentry_list() manages to remove it from the list before
      we are done - fine, we'll just free it as usual.  If not - mark
      it with new flag (DCACHE_MAY_FREE) and leave it there.
      
      Eventually, shrink_dentry_list() will get to it, remove the sucker
      from shrink list and call dentry_kill(dentry, 0).  Which is where
      we'll deal with freeing.
      
      Since now dentry_kill(dentry, 0) may happen after or during
      dentry_kill(dentry, 1), we need to recognize that (by seeing
      DCACHE_DENTRY_KILLED already set), unlock everything
      and either free the sucker (in case DCACHE_MAY_FREE has been
      set) or leave it for ongoing dentry_kill(dentry, 1) to deal with.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      41edf278
  5. 28 4月, 2014 7 次提交
    • M
      fuse: add renameat2 support · 1560c974
      Miklos Szeredi 提交于
      Support RENAME_EXCHANGE and RENAME_NOREPLACE flags on the userspace ABI.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      1560c974
    • S
      ftrace/module: Hardcode ftrace_module_init() call into load_module() · a949ae56
      Steven Rostedt (Red Hat) 提交于
      A race exists between module loading and enabling of function tracer.
      
      	CPU 1				CPU 2
      	-----				-----
        load_module()
         module->state = MODULE_STATE_COMING
      
      				register_ftrace_function()
      				 mutex_lock(&ftrace_lock);
      				 ftrace_startup()
      				  update_ftrace_function();
      				   ftrace_arch_code_modify_prepare()
      				    set_all_module_text_rw();
      				   <enables-ftrace>
      				    ftrace_arch_code_modify_post_process()
      				     set_all_module_text_ro();
      
      				[ here all module text is set to RO,
      				  including the module that is
      				  loading!! ]
      
         blocking_notifier_call_chain(MODULE_STATE_COMING);
          ftrace_init_module()
      
           [ tries to modify code, but it's RO, and fails!
             ftrace_bug() is called]
      
      When this race happens, ftrace_bug() will produces a nasty warning and
      all of the function tracing features will be disabled until reboot.
      
      The simple solution is to treate module load the same way the core
      kernel is treated at boot. To hardcode the ftrace function modification
      of converting calls to mcount into nops. This is done in init/main.c
      there's no reason it could not be done in load_module(). This gives
      a better control of the changes and doesn't tie the state of the
      module to its notifiers as much. Ftrace is special, it needs to be
      treated as such.
      
      The reason this would work, is that the ftrace_module_init() would be
      called while the module is in MODULE_STATE_UNFORMED, which is ignored
      by the set_all_module_text_ro() call.
      
      Link: http://lkml.kernel.org/r/1395637826-3312-1-git-send-email-indou.takao@jp.fujitsu.comReported-by: NTakao Indoh <indou.takao@jp.fujitsu.com>
      Acked-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: stable@vger.kernel.org # 2.6.38+
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      a949ae56
    • M
      fuse: allow ctime flushing to userspace · ab9e13f7
      Maxim Patlasov 提交于
      The patch extends fuse_setattr_in, and extends the flush procedure
      (fuse_flush_times()) called on ->write_inode() to send the ctime as well as
      mtime.
      Signed-off-by: NMaxim Patlasov <MPatlasov@parallels.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      ab9e13f7
    • M
      fuse: fuse: add time_gran to INIT_OUT · e27c9d38
      Miklos Szeredi 提交于
      Allow userspace fs to specify time granularity.
      
      This is needed because with writeback_cache mode the kernel is responsible
      for generating mtime and ctime, but if the underlying filesystem doesn't
      support nanosecond granularity then the cache will contain a different
      value from the one stored on the filesystem resulting in a change of times
      after a cache flush.
      
      Make the default granularity 1s.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      e27c9d38
    • T
      genirq: x86: Ensure that dynamic irq allocation does not conflict · 62a08ae2
      Thomas Gleixner 提交于
      On x86 the allocation of irq descriptors may allocate interrupts which
      are in the range of the GSI interrupts. That's wrong as those
      interrupts are hardwired and we don't have the irq domain translation
      like PPC. So one of these interrupts can be hooked up later to one of
      the devices which are hard wired to it and the io_apic init code for
      that particular interrupt line happily reuses that descriptor with a
      completely different configuration so hell breaks lose.
      
      Inside x86 we allocate dynamic interrupts from above nr_gsi_irqs,
      except for a few usage sites which have not yet blown up in our face
      for whatever reason. But for drivers which need an irq range, like the
      GPIO drivers, we have no limit in place and we don't want to expose
      such a detail to a driver.
      
      To cure this introduce a function which an architecture can implement
      to impose a lower bound on the dynamic interrupt allocations.
      
      Implement it for x86 and set the lower bound to nr_gsi_irqs, which is
      the end of the hardwired interrupt space, so all dynamic allocations
      happen above.
      
      That not only allows the GPIO driver to work sanely, it also protects
      the bogus callsites of create_irq_nr() in hpet, uv, irq_remapping and
      htirq code. They need to be cleaned up as well, but that's a separate
      issue.
      Reported-by: NJin Yao <yao.jin@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Mathias Nyman <mathias.nyman@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Krogerus Heikki <heikki.krogerus@intel.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1404241617360.28206@ionos.tec.linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      62a08ae2
    • R
      linux/interrupt.h: fix new kernel-doc warnings · def5f127
      Randy Dunlap 提交于
      Fix new kernel-doc warnings in <linux/interrupt.h>:
      
      Warning(include/linux/interrupt.h:219): No description found for parameter 'cpumask'
      Warning(include/linux/interrupt.h:219): Excess function parameter 'mask' description in 'irq_set_affinity'
      Warning(include/linux/interrupt.h:236): No description found for parameter 'cpumask'
      Warning(include/linux/interrupt.h:236): Excess function parameter 'mask' description in 'irq_force_affinity'
      Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
      Link: http://lkml.kernel.org/r/535DD2FD.7030804@infradead.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      def5f127
    • W
      word-at-a-time: avoid undefined behaviour in zero_bytemask macro · ec6931b2
      Will Deacon 提交于
      The asm-generic, big-endian version of zero_bytemask creates a mask of
      bytes preceding the first zero-byte by left shifting ~0ul based on the
      position of the first zero byte.
      
      Unfortunately, if the first (top) byte is zero, the output of
      prep_zero_mask has only the top bit set, resulting in undefined C
      behaviour as we shift left by an amount equal to the width of the type.
      As it happens, GCC doesn't manage to spot this through the call to fls(),
      but the issue remains if architectures choose to implement their shift
      instructions differently.
      
      An example would be arch/arm/ (AArch32), where LSL Rd, Rn, #32 results
      in Rd == 0x0, whilst on arch/arm64 (AArch64) LSL Xd, Xn, #64 results in
      Xd == Xn.
      
      Rather than check explicitly for the problematic shift, this patch adds
      an extra shift by 1, replacing fls with __fls. Since zero_bytemask is
      never called with a zero argument (has_zero() is used to check the data
      first), we don't need to worry about calling __fls(0), which is
      undefined.
      
      Cc: <stable@vger.kernel.org>
      Cc: Victor Kamensky <victor.kamensky@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ec6931b2
  6. 25 4月, 2014 6 次提交
    • M
      tty: Fix race condition between __tty_buffer_request_room and flush_to_ldisc · 6a20dbd6
      Manfred Schlaegl 提交于
      The race was introduced while development of linux-3.11 by
      e8437d7e and
      e9975fde.
      Originally it was found and reproduced on linux-3.12.15 and
      linux-3.12.15-rt25, by sending 500 byte blocks with 115kbaud to the
      target uart in a loop with 100 milliseconds delay.
      
      In short:
       1. The consumer flush_to_ldisc is on to remove the head tty_buffer.
       2. The producer adds a number of bytes, so that a new tty_buffer must
      	be allocated and added by __tty_buffer_request_room.
       3. The consumer removes the head tty_buffer element, without handling
      	newly committed data.
      
      Detailed example:
       * Initial buffer:
         * Head, Tail -> 0: used=250; commit=250; read=240; next=NULL
       * Consumer: ''flush_to_ldisc''
         * consumed 10 Byte
         * buffer:
           * Head, Tail -> 0: used=250; commit=250; read=250; next=NULL
      {{{
      		count = head->commit - head->read;	// count = 0
      		if (!count) {				// enter
      			// INTERRUPTED BY PRODUCER ->
      			if (head->next == NULL)
      				break;
      			buf->head = head->next;
      			tty_buffer_free(port, head);
      			continue;
      		}
      }}}
       * Producer: tty_insert_flip_... 10 bytes + tty_flip_buffer_push
         * buffer:
           * Head, Tail -> 0: used=250; commit=250; read=250; next=NULL
         * added 6 bytes: head-element filled to maximum.
           * buffer:
             * Head, Tail -> 0: used=256; commit=250; read=250; next=NULL
         * added 4 bytes: __tty_buffer_request_room is called
           * buffer:
             * Head -> 0: used=256; commit=256; read=250; next=1
             * Tail -> 1: used=4; commit=0; read=250 next=NULL
         * push (tty_flip_buffer_push)
           * buffer:
             * Head -> 0: used=256; commit=256; read=250; next=1
             * Tail -> 1: used=4; commit=4; read=250 next=NULL
       * Consumer
      {{{
      		count = head->commit - head->read;
      		if (!count) {
      			// INTERRUPTED BY PRODUCER <-
      			if (head->next == NULL)		// -> no break
      				break;
      			buf->head = head->next;
      			tty_buffer_free(port, head);
      			// ERROR: tty_buffer head freed -> 6 bytes lost
      			continue;
      		}
      }}}
      
      This patch reintroduces a spin_lock to protect this case. Perhaps later
      a lock-less solution could be found.
      Signed-off-by: NManfred Schlaegl <manfred.schlaegl@gmx.at>
      Cc: stable <stable@vger.kernel.org> # 3.11
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6a20dbd6
    • R
      of/irq: do irq resolution in platform_get_irq · 9ec36caf
      Rob Herring 提交于
      Currently we get the following kind of errors if we try to use interrupt
      phandles to irqchips that have not yet initialized:
      
      irq: no irq domain found for /ocp/pinmux@48002030 !
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 1 at drivers/of/platform.c:171 of_device_alloc+0x144/0x184()
      Modules linked in:
      CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.12.0-00038-g42a9708 #1012
      (show_stack+0x14/0x1c)
      (dump_stack+0x6c/0xa0)
      (warn_slowpath_common+0x64/0x84)
      (warn_slowpath_null+0x1c/0x24)
      (of_device_alloc+0x144/0x184)
      (of_platform_device_create_pdata+0x44/0x9c)
      (of_platform_bus_create+0xd0/0x170)
      (of_platform_bus_create+0x12c/0x170)
      (of_platform_populate+0x60/0x98)
      
      This is because we're wrongly trying to populate resources that are not
      yet available. It's perfectly valid to create irqchips dynamically, so
      let's fix up the issue by resolving the interrupt resources when
      platform_get_irq is called.
      
      And then we also need to accept the fact that some irqdomains do not
      exist that early on, and only get initialized later on. So we can
      make the current WARN_ON into just into a pr_debug().
      
      We still attempt to populate irq resources when we create the devices.
      This allows current drivers which don't use platform_get_irq to continue
      to function. Once all drivers are fixed, this code can be removed.
      Suggested-by: NRussell King <linux@arm.linux.org.uk>
      Signed-off-by: NRob Herring <robh@kernel.org>
      Signed-off-by: NTony Lindgren <tony@atomide.com>
      Tested-by: NTony Lindgren <tony@atomide.com>
      Cc: stable@vger.kernel.org # v3.10+
      Signed-off-by: NGrant Likely <grant.likely@linaro.org>
      9ec36caf
    • G
      phy: core: make NULL a valid phy reference if !CONFIG_GENERIC_PHY · 2b97789f
      Grygorii Strashko 提交于
      This fixes a regression on Keystone 2 platforms caused by patch
      57303488
      "usb: dwc3: adapt dwc3 core to use Generic PHY Framework" which adds
      optional support of generic phy in DWC3 core.
      
      On Keystone 2 platforms the USB is not working now because
      CONFIG_GENERIC_PHY isn't set and, as result, Generic PHY APIs stubs
      return -ENOSYS always. The log shows:
       dwc3 2690000.dwc3: failed to initialize core
       dwc3: probe of 2690000.dwc3 failed with error -38
      
      Hence, fix it by making NULL a valid phy reference in Generic PHY
      APIs stubs in the same way as it was done by the patch
      04c2faca "drivers: phy: Make NULL
      a valid phy reference".
      Acked-by: NFelipe Balbi <balbi@ti.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
      Signed-off-by: NGrygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: NKishon Vijay Abraham I <kishon@ti.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2b97789f
    • E
      net: Add variants of capable for use on netlink messages · aa4cf945
      Eric W. Biederman 提交于
      netlink_net_capable - The common case use, for operations that are safe on a network namespace
      netlink_capable - For operations that are only known to be safe for the global root
      netlink_ns_capable - The general case of capable used to handle special cases
      
      __netlink_ns_capable - Same as netlink_ns_capable except taking a netlink_skb_parms instead of
      		       the skbuff of a netlink message.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aa4cf945
    • E
      net: Add variants of capable for use on on sockets · a3b299da
      Eric W. Biederman 提交于
      sk_net_capable - The common case, operations that are safe in a network namespace.
      sk_capable - Operations that are not known to be safe in a network namespace
      sk_ns_capable - The general case for special cases.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a3b299da
    • E
      net: Move the permission check in sock_diag_put_filterinfo to packet_diag_dump · a53b72c8
      Eric W. Biederman 提交于
      The permission check in sock_diag_put_filterinfo is wrong, and it is so removed
      from it's sources it is not clear why it is wrong.  Move the computation
      into packet_diag_dump and pass a bool of the result into sock_diag_filterinfo.
      
      This does not yet correct the capability check but instead simply moves it to make
      it clear what is going on.
      Reported-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a53b72c8
  7. 24 4月, 2014 2 次提交
  8. 23 4月, 2014 2 次提交
  9. 22 4月, 2014 1 次提交
    • J
      locks: rename file-private locks to "open file description locks" · 0d3f7a2d
      Jeff Layton 提交于
      File-private locks have been merged into Linux for v3.15, and *now*
      people are commenting that the name and macro definitions for the new
      file-private locks suck.
      
      ...and I can't even disagree. The names and command macros do suck.
      
      We're going to have to live with these for a long time, so it's
      important that we be happy with the names before we're stuck with them.
      The consensus on the lists so far is that they should be rechristened as
      "open file description locks".
      
      The name isn't a big deal for the kernel, but the command macros are not
      visually distinct enough from the traditional POSIX lock macros. The
      glibc and documentation folks are recommending that we change them to
      look like F_OFD_{GETLK|SETLK|SETLKW}. That lessens the chance that a
      programmer will typo one of the commands wrong, and also makes it easier
      to spot this difference when reading code.
      
      This patch makes the following changes that I think are necessary before
      v3.15 ships:
      
      1) rename the command macros to their new names. These end up in the uapi
         headers and so are part of the external-facing API. It turns out that
         glibc doesn't actually use the fcntl.h uapi header, but it's hard to
         be sure that something else won't. Changing it now is safest.
      
      2) make the the /proc/locks output display these as type "OFDLCK"
      
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Carlos O'Donell <carlos@redhat.com>
      Cc: Stefan Metzmacher <metze@samba.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Frank Filz <ffilzlnx@mindspring.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      0d3f7a2d
  10. 20 4月, 2014 2 次提交
    • H
      Input: Add INPUT_PROP_TOPBUTTONPAD device property · f37c0134
      Hans de Goede 提交于
      On some newer laptops with a trackpoint the physical buttons for the
      trackpoint have been removed to allow for a larger touchpad. On these
      laptops the buttonpad has clearly marked areas on the top which are to be
      used as trackpad buttons.
      
      Users of the event device-node need to know about this, so that they can
      properly interpret BTN_LEFT events as being a left / right / middle click
      depending on where on the button pad the clicking finger is.
      
      This commits adds a INPUT_PROP_TOPBUTTONPAD device property which drivers
      for such buttonpads will use to signal to the user that this buttonpad not
      only has the normal bottom button area, but also a top button area.
      Signed-off-by: NHans de Goede <hdegoede@redhat.com>
      Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
      f37c0134
    • H
      Input: serio - add firmware_id sysfs attribute · 0456c66f
      Hans de Goede 提交于
      serio devices exposed via platform firmware interfaces such as ACPI may
      provide additional identifying information of use to userspace.
      
      We don't associate the serio devices with the firmware device (we don't
      set it as parent), so there's no way for userspace to make use of this
      information.
      
      We cannot change the parent for serio devices instantiated though a
      firmware interface as that would break suspend / resume ordering.
      
      Therefore this patch adds a new firmware_id sysfs attribute so that
      userspace can get a string from there with any additional identifying
      information the firmware interface may provide.
      Signed-off-by: NHans de Goede <hdegoede@redhat.com>
      Acked-by: NPeter Hutterer <peter.hutterer@who-t.net>
      Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
      0456c66f
  11. 19 4月, 2014 5 次提交
    • M
      mm: use paravirt friendly ops for NUMA hinting ptes · 29c77870
      Mel Gorman 提交于
      David Vrabel identified a regression when using automatic NUMA balancing
      under Xen whereby page table entries were getting corrupted due to the
      use of native PTE operations.  Quoting him
      
      	Xen PV guest page tables require that their entries use machine
      	addresses if the preset bit (_PAGE_PRESENT) is set, and (for
      	successful migration) non-present PTEs must use pseudo-physical
      	addresses.  This is because on migration MFNs in present PTEs are
      	translated to PFNs (canonicalised) so they may be translated back
      	to the new MFN in the destination domain (uncanonicalised).
      
      	pte_mknonnuma(), pmd_mknonnuma(), pte_mknuma() and pmd_mknuma()
      	set and clear the _PAGE_PRESENT bit using pte_set_flags(),
      	pte_clear_flags(), etc.
      
      	In a Xen PV guest, these functions must translate MFNs to PFNs
      	when clearing _PAGE_PRESENT and translate PFNs to MFNs when setting
      	_PAGE_PRESENT.
      
      His suggested fix converted p[te|md]_[set|clear]_flags to using
      paravirt-friendly ops but this is overkill.  He suggested an alternative
      of using p[te|md]_modify in the NUMA page table operations but this is
      does more work than necessary and would require looking up a VMA for
      protections.
      
      This patch modifies the NUMA page table operations to use paravirt
      friendly operations to set/clear the flags of interest.  Unfortunately
      this will take a performance hit when updating the PTEs on
      CONFIG_PARAVIRT but I do not see a way around it that does not break
      Xen.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NDavid Vrabel <david.vrabel@citrix.com>
      Tested-by: NDavid Vrabel <david.vrabel@citrix.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Steven Noonan <steven@uplinklabs.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      29c77870
    • P
      wait: explain the shadowing and type inconsistencies · 8b32201d
      Peter Zijlstra 提交于
      Stick in a comment before someone else tries to fix the sparse warning
      this generates.
      Suggested-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-o2ro6f3vkxklni0bc8f7m68s@git.kernel.orgSigned-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8b32201d
    • V
      Shiraz has moved · 9cc23682
      Viresh Kumar 提交于
      shiraz.hashim@st.com email-id doesn't exist anymore as he has left the
      company.  Replace ST's id with shiraz.linux.kernel@gmail.com.
      
      It also updates .mailmap file to fix address for 'git shortlog'.
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Cc: Shiraz Hashim <shiraz.linux.kernel@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9cc23682
    • V
      net: sctp: cache auth_enable per endpoint · b14878cc
      Vlad Yasevich 提交于
      Currently, it is possible to create an SCTP socket, then switch
      auth_enable via sysctl setting to 1 and crash the system on connect:
      
      Oops[#1]:
      CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.1-mipsgit-20140415 #1
      task: ffffffff8056ce80 ti: ffffffff8055c000 task.ti: ffffffff8055c000
      [...]
      Call Trace:
      [<ffffffff8043c4e8>] sctp_auth_asoc_set_default_hmac+0x68/0x80
      [<ffffffff8042b300>] sctp_process_init+0x5e0/0x8a4
      [<ffffffff8042188c>] sctp_sf_do_5_1B_init+0x234/0x34c
      [<ffffffff804228c8>] sctp_do_sm+0xb4/0x1e8
      [<ffffffff80425a08>] sctp_endpoint_bh_rcv+0x1c4/0x214
      [<ffffffff8043af68>] sctp_rcv+0x588/0x630
      [<ffffffff8043e8e8>] sctp6_rcv+0x10/0x24
      [<ffffffff803acb50>] ip6_input+0x2c0/0x440
      [<ffffffff8030fc00>] __netif_receive_skb_core+0x4a8/0x564
      [<ffffffff80310650>] process_backlog+0xb4/0x18c
      [<ffffffff80313cbc>] net_rx_action+0x12c/0x210
      [<ffffffff80034254>] __do_softirq+0x17c/0x2ac
      [<ffffffff800345e0>] irq_exit+0x54/0xb0
      [<ffffffff800075a4>] ret_from_irq+0x0/0x4
      [<ffffffff800090ec>] rm7k_wait_irqoff+0x24/0x48
      [<ffffffff8005e388>] cpu_startup_entry+0xc0/0x148
      [<ffffffff805a88b0>] start_kernel+0x37c/0x398
      Code: dd0900b8  000330f8  0126302d <dcc60000> 50c0fff1  0047182a  a48306a0
      03e00008  00000000
      ---[ end trace b530b0551467f2fd ]---
      Kernel panic - not syncing: Fatal exception in interrupt
      
      What happens while auth_enable=0 in that case is, that
      ep->auth_hmacs is initialized to NULL in sctp_auth_init_hmacs()
      when endpoint is being created.
      
      After that point, if an admin switches over to auth_enable=1,
      the machine can crash due to NULL pointer dereference during
      reception of an INIT chunk. When we enter sctp_process_init()
      via sctp_sf_do_5_1B_init() in order to respond to an INIT chunk,
      the INIT verification succeeds and while we walk and process
      all INIT params via sctp_process_param() we find that
      net->sctp.auth_enable is set, therefore do not fall through,
      but invoke sctp_auth_asoc_set_default_hmac() instead, and thus,
      dereference what we have set to NULL during endpoint
      initialization phase.
      
      The fix is to make auth_enable immutable by caching its value
      during endpoint initialization, so that its original value is
      being carried along until destruction. The bug seems to originate
      from the very first days.
      
      Fix in joint work with Daniel Borkmann.
      Reported-by: NJoshua Kinard <kumba@gentoo.org>
      Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Tested-by: NJoshua Kinard <kumba@gentoo.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b14878cc
    • D
      libata/ahci: accommodate tag ordered controllers · 8a4aeec8
      Dan Williams 提交于
      The AHCI spec allows implementations to issue commands in tag order
      rather than FIFO order:
      
      	5.3.2.12 P:SelectCmd
      	HBA sets pSlotLoc = (pSlotLoc + 1) mod (CAP.NCS + 1)
      	or HBA selects the command to issue that has had the
      	PxCI bit set to '1' longer than any other command
      	pending to be issued.
      
      The result is that commands posted sequentially (time-wise) may play out
      of sequence when issued by hardware.
      
      This behavior has likely been hidden by drives that arrange for commands
      to complete in issue order.  However, it appears recent drives (two from
      different vendors that we have found so far) inflict out-of-order
      completions as a matter of course.  So, we need to take care to maintain
      ordered submission, otherwise we risk triggering a drive to fall out of
      sequential-io automation and back to random-io processing, which incurs
      large latency and degrades throughput.
      
      This issue was found in simple benchmarks where QD=2 seq-write
      performance was 30-50% *greater* than QD=32 seq-write performance.
      
      Tagging for -stable and making the change globally since it has a low
      risk-to-reward ratio.  Also, word is that recent versions of an unnamed
      OS also does it this way now.  So, drives in the field are already
      experienced with this tag ordering scheme.
      
      Cc: <stable@vger.kernel.org>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Ed Ciechanowski <ed.ciechanowski@intel.com>
      Reviewed-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      8a4aeec8
  12. 18 4月, 2014 3 次提交