1. 15 1月, 2013 2 次提交
    • M
      f2fs: add global mutex_lock to protect f2fs_stat_list · 66af62ce
      majianpeng 提交于
      There is an race condition between umounting f2fs and reading f2fs/status, which
      results in oops.
      
      Fox example:
      Thread A			Thread B
      umount f2fs 			cat f2fs/status
      
      f2fs_destroy_stats() {		stat_show() {
      				 list_for_each_entry_safe(&f2fs_stat_list)
       list_del(&si->stat_list);
       mutex_lock(&si->stat_lock);
       si->sbi = NULL;
       mutex_unlock(&si->stat_lock);
       kfree(sbi->stat_info);
      } 				 mutex_lock(&si->stat_lock) <- si is gone.
      				 ...
      				}
      
      Solution with a global lock: f2fs_stat_mutex:
      Thread A			Thread B
      umount f2fs 			cat f2fs/status
      
      f2fs_destroy_stats() {		stat_show() {
       mutex_lock(&f2fs_stat_mutex);
       list_del(&si->stat_list);
       mutex_unlock(&f2fs_stat_mutex);
       kfree(sbi->stat_info);		 mutex_lock(&f2fs_stat_mutex);
      }				 list_for_each_entry_safe(&f2fs_stat_list)
      				 ...
      				 mutex_unlock(&f2fs_stat_mutex);
      				}
      Signed-off-by: NJianpeng Ma <majianpeng@gmail.com>
      [jaegeuk.kim@samsung.com: fix typos, description, and remove the existing lock]
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      66af62ce
    • N
      f2fs: remove the blk_plug usage in f2fs_write_data_pages · fa9150a8
      Namjae Jeon 提交于
      Let's consider the usage of blk_plug in f2fs_write_data_pages().
      We can come up with the two issues: lock contention and task awareness.
      
      1. Merging bios prior to grabing "queue lock"
       The f2fs merges consecutive IOs in the file system level before
       submitting any bios, which is similar with the back merge by the
       plugging mechanism in attempt_plug_merge(). Both of them need to acquire
       no queue lock.
      
      2. Merging policy with respect to tasks
       The f2fs merges IOs as much as possible regardless of tasks, while
       blk-plugging is conducted on a basis of tasks. As we can understand
       there are trade-offs, f2fs tries to maximize the write performance with
       well-merged bios.
      
      As a result, if f2fs produces many consecutive but separated bios in
      writepages(), it would be good to use blk-plugging since f2fs would be
      able to avoid queue lock contention in the block layer by merging them.
      But, f2fs merges IOs and submit one bio, which means that there are not
      much chances to merge bios by attempt_plug_merge().
      
      However, f2fs has already been used blk_plug by triggering generic_writepages()
      in f2fs_write_data_pages().
      So to make the overall code consistency, I'd like to remove blk_plug there.
      Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: NAmit Sahrawat <a.sahrawat@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      fa9150a8
  2. 14 1月, 2013 2 次提交
  3. 11 1月, 2013 2 次提交
    • J
      f2fs: move f2fs_balance_fs to punch_hole · 9eaeba70
      Jaegeuk Kim 提交于
      The f2fs_fallocate() has two operations: punch_hole and expand_size.
      
      Only in the case of punch_hole, dirty node pages can be produced, so let's
      trigger f2fs_balance_fs() in this case only.
      Furthermore, let's trigger it at every data truncation routine.
      Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      9eaeba70
    • J
      f2fs: add f2fs_balance_fs in several interfaces · 7d82db83
      Jaegeuk Kim 提交于
      The f2fs_balance_fs() is to check the number of free sections and decide whether
      it needs to conduct cleaning or not. If there are not enough free sections, the
      cleaning job should be started.
      
      In order to control an amount of free sections even under high utilization, f2fs
      should call f2fs_balance_fs at all the VFS interfaces that are able to produce
      dirty pages.
      This patch adds the function calls in the missing interfaces as follows.
      
      1. f2fs_setxattr()
      The f2fs_setxattr() produces dirty node pages so that we should call
      f2fs_balance_fs() either likewise doing in other VFS interfaces such as
      f2fs_lookup(), f2fs_mkdir(), and so on.
      
      2. f2fs_sync_file()
      We should guarantee serving free sections for syncing metadata during fsync.
      Previously, there is no space check before triggering checkpoint and
      sync_node_pages.
      Therefore, if a bunch of fsync calls are triggered under 100% of FS utilization,
      f2fs is able to be faced with no free sections, resulting in BUG_ON().
      
      3. f2fs_sync_fs()
      Before calling write_checkpoint(), we should guarantee that there are minimum
      free sections.
      
      4. f2fs_write_inode()
      f2fs_write_inode() is also able to produce dirty node pages.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      7d82db83
  4. 10 1月, 2013 1 次提交
    • J
      f2fs: revisit the f2fs_gc flow · 408e9375
      Jaegeuk Kim 提交于
      I'd like to revisit the f2fs_gc flow and rewrite as follows.
      
      1. In practical, the nGC parameter of f2fs_gc is meaningless. So, let's
        remove it.
      2. Background GC marks victim blocks as dirty one at a time.
      3. Foreground GC should do cleaning job until acquiring enough free
        sections. Afterwards, it needs to do checkpoint.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      408e9375
  5. 04 1月, 2013 13 次提交
  6. 03 1月, 2013 16 次提交
    • G
      powerpc: Add missing NULL terminator to avoid boot panic on PPC40x · e6449c9b
      Gabor Juhos 提交于
      The missing NULL terminator can cause a panic on
      PPC405 boards during boot:
      
        Linux/PowerPC load: console=ttyS0,115200 root=/dev/mtdblock1 rootfstype=squashfs,jffs2 noinitrd init=/etc/preinit
        Finalizing device tree... flat tree at 0x6a5160
        bootconsole [udbg0] enabled
        Page fault in user mode with in_atomic() = 1 mm = (null)
        NIP = c0275f50  MSR = fffffffe
        Oops: Weird page fault, sig: 11 [#1]
        PowerPC 40x Platform
        Modules linked in:
        NIP: c0275f50 LR: c0275f60 CTR: c0280000
        REGS: c0275eb0 TRAP: 636f7265   Not tainted  (3.7.1)
        MSR: fffffffe <VEC,VSX,EE,PR,FP,ME,SE,BE,IR,DR,PMM,RI> CR: c06a6190  XER: 00000001
        TASK = c02662a8[0] 'swapper' THREAD: c0274000
        GPR00: c0275ec0 c000c658 c027c4bf 00000000 c0275ee0 c000a0ec c020a1a8 c020a1f0
        GPR08: c020f631 c020f404 c025f078 c025f080 c0275f10
         Call Trace:
         ---[ end trace 31fd0ba7d8756001 ]---
      
        Kernel panic - not syncing: Attempted to kill the idle task!
      
      The panic happens since commit 9597abe0
      (sections: fix section conflicts in arch/powerpc), however the root
      cause of this is that the NULL terminator were not added in commit
      a4f740cf (of/flattree: Add of_flat_dt_match()
      helper function).
      
      Cc: Grant Likely <grant.likely@secretlab.ca>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NGabor Juhos <juhosg@openwrt.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      e6449c9b
    • S
      powerpc/vdso: Remove redundant locking in update_vsyscall_tz() · ce73ec6d
      Shan Hai 提交于
      The locking in update_vsyscall_tz() is not only unnecessary because the vdso
      code copies the data unproteced in __kernel_gettimeofday() but also
      introduces a hard to reproduce race condition between update_vsyscall()
      and update_vsyscall_tz(), which causes user space process to loop
      forever in vdso code.
      
      The following patch removes the locking from update_vsyscall_tz().
      
      Locking is not only unnecessary because the vdso code copies the data
      unprotected in __kernel_gettimeofday() but also erroneous because updating
      the tb_update_count is not atomic and introduces a hard to reproduce race
      condition between update_vsyscall() and update_vsyscall_tz(), which further
      causes user space process to loop forever in vdso code.
      
      The below scenario describes the race condition,
      x==0	Boot CPU			other CPU
      	proc_P: x==0
      	    timer interrupt
      		update_vsyscall
      x==1		    x++;sync		settimeofday
      					    update_vsyscall_tz
      x==2						x++;sync
      x==3		    sync;x++
      						sync;x++
      	proc_P: x==3 (loops until x becomes even)
      
      Because the ++ operator would be implemented as three instructions and not
      atomic on powerpc.
      
      A similar change was made for x86 in commit 6c260d58
      ("x86: vdso: Remove bogus locking in update_vsyscall_tz")
      Signed-off-by: NShan Hai <shan.hai@windriver.com>
      CC: <stable@vger.kernel.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      ce73ec6d
    • L
      Linux 3.8-rc2 · d1c3ed66
      Linus Torvalds 提交于
      d1c3ed66
    • L
      Merge branch 'fixes-for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds · d50403dc
      Linus Torvalds 提交于
      Pull LED fix from Bryan Wu.
      
      * 'fixes-for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds:
        leds: leds-gpio: set devm_gpio_request_one() flags param correctly
      d50403dc
    • J
      leds: leds-gpio: set devm_gpio_request_one() flags param correctly · 2d7c22f6
      Javier Martinez Canillas 提交于
      commit a99d76f9 leds: leds-gpio: use gpio_request_one
      
      changed the leds-gpio driver to use gpio_request_one() instead
      of gpio_request() + gpio_direction_output()
      
      Unfortunately, it also made a semantic change that breaks the
      leds-gpio driver.
      
      The gpio_request_one() flags parameter was set to:
      
      GPIOF_DIR_OUT | (led_dat->active_low ^ state)
      
      Since GPIOF_DIR_OUT is 0, the final flags value will just be the
      XOR'ed value of led_dat->active_low and state.
      
      This value were used to distinguish between HIGH/LOW output initial
      level and call gpio_direction_output() accordingly.
      
      With this new semantic gpio_request_one() will take the flags value
      of 1 as a configuration of input direction (GPIOF_DIR_IN) and will
      call gpio_direction_input() instead of gpio_direction_output().
      
      int gpio_request_one(unsigned gpio, unsigned long flags, const char *label)
      {
      ..
      	if (flags & GPIOF_DIR_IN)
      		err = gpio_direction_input(gpio);
      	else
      		err = gpio_direction_output(gpio,
      				(flags & GPIOF_INIT_HIGH) ? 1 : 0);
      ..
      }
      
      The right semantic is to evaluate led_dat->active_low ^ state and
      set the output initial level explicitly.
      Signed-off-by: NJavier Martinez Canillas <javier.martinez@collabora.co.uk>
      Reported-by: NArnaud Patard <arnaud.patard@rtp-net.org>
      Tested-by: NEzequiel Garcia <ezequiel.garcia@free-electrons.com>
      Signed-off-by: NBryan Wu <cooloney@gmail.com>
      2d7c22f6
    • L
      Merge git://www.linux-watchdog.org/linux-watchdog · ef05e9b9
      Linus Torvalds 提交于
      Pull watchdog fixes from Wim Van Sebroeck:
       "This fixes some small errors in the new da9055 driver, eliminates a
        compiler warning and adds DT support for the twl4030_wdt driver (so
        that we can have multiple watchdogs with DT on the omap platforms)."
      
      * git://www.linux-watchdog.org/linux-watchdog:
        watchdog: twl4030_wdt: add DT support
        watchdog: omap_wdt: eliminate unused variable and a compiler warning
        watchdog: da9055: Don't update wdt_dev->timeout in da9055_wdt_set_timeout error path
        watchdog: da9055: Fix invalid free of devm_ allocated data
      ef05e9b9
    • L
      Merge tag '3.8-pci-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 080a62e2
      Linus Torvalds 提交于
      Pull PCI updates from Bjorn Helgaas:
       "Some fixes for v3.8.  They include a fix for the new SR-IOV sysfs
        management support, an expanded quirk for Ricoh SD card readers, a
        Stratus DMI quirk fix, and a PME polling fix."
      
      * tag '3.8-pci-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI: Reduce Ricoh 0xe822 SD card reader base clock frequency to 50MHz
        PCI/PM: Do not suspend port if any subordinate device needs PME polling
        PCI: Add PCIe Link Capability link speed and width names
        PCI: Work around Stratus ftServer broken PCIe hierarchy (fix DMI check)
        PCI: Remove spurious error for sriov_numvfs store and simplify flow
      080a62e2
    • D
      UAPI: Strip _UAPI prefix on header install no matter the whitespace · 8a7eab2b
      David Howells 提交于
      Commit 56c176c9 ("UAPI: strip the _UAPI prefix from header guards
      during header installation") strips the _UAPI prefix from header guards,
      but only if there's a single space between the cpp directive and the
      label.
      
      Make it more flexible and able to handle tabs and multiple white space
      characters.
      Signed-off-by: NDavid Howells <dhowell@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8a7eab2b
    • D
      UAPI: Remove empty Kbuild files · 3d33fcc1
      David Howells 提交于
      Empty files can get deleted by the patch program, so remove empty Kbuild
      files and their links from the parent Kbuilds.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3d33fcc1
    • L
      Merge tag 'ecryptfs-3.8-rc2-fixes' of... · 007f6c3a
      Linus Torvalds 提交于
      Merge tag 'ecryptfs-3.8-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
      
      Pull ecryptfs fixes from Tyler Hicks:
       "Two self-explanatory fixes and a third patch which improves
        performance: when overwriting a full page in the eCryptfs page cache,
        skip reading in and decrypting the corresponding lower page."
      
      * tag 'ecryptfs-3.8-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
        fs/ecryptfs/crypto.c: make ecryptfs_encode_for_filename() static
        eCryptfs: fix to use list_for_each_entry_safe() when delete items
        eCryptfs: Avoid unnecessary disk read and data decryption during writing
      007f6c3a
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · 58890c06
      Linus Torvalds 提交于
      Pull Ceph fixes from Sage Weil:
       "Two of Alex's patches deal with a race when reseting server
        connections for open RBD images, one demotes some non-fatal BUGs to
        WARNs, and my patch fixes a protocol feature bit failure path."
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
        libceph: fix protocol feature mismatch failure path
        libceph: WARN, don't BUG on unexpected connection states
        libceph: always reset osds when kicking
        libceph: move linger requests sooner in kick_requests()
      58890c06
    • M
      mm: mempolicy: Convert shared_policy mutex to spinlock · 42288fe3
      Mel Gorman 提交于
      Sasha was fuzzing with trinity and reported the following problem:
      
        BUG: sleeping function called from invalid context at kernel/mutex.c:269
        in_atomic(): 1, irqs_disabled(): 0, pid: 6361, name: trinity-main
        2 locks held by trinity-main/6361:
         #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff810aa314>] __do_page_fault+0x1e4/0x4f0
         #1:  (&(&mm->page_table_lock)->rlock){+.+...}, at: [<ffffffff8122f017>] handle_pte_fault+0x3f7/0x6a0
        Pid: 6361, comm: trinity-main Tainted: G        W
        3.7.0-rc2-next-20121024-sasha-00001-gd95ef01-dirty #74
        Call Trace:
          __might_sleep+0x1c3/0x1e0
          mutex_lock_nested+0x29/0x50
          mpol_shared_policy_lookup+0x2e/0x90
          shmem_get_policy+0x2e/0x30
          get_vma_policy+0x5a/0xa0
          mpol_misplaced+0x41/0x1d0
          handle_pte_fault+0x465/0x6a0
      
      This was triggered by a different version of automatic NUMA balancing
      but in theory the current version is vunerable to the same problem.
      
      do_numa_page
        -> numa_migrate_prep
          -> mpol_misplaced
            -> get_vma_policy
              -> shmem_get_policy
      
      It's very unlikely this will happen as shared pages are not marked
      pte_numa -- see the page_mapcount() check in change_pte_range() -- but
      it is possible.
      
      To address this, this patch restores sp->lock as originally implemented
      by Kosaki Motohiro.  In the path where get_vma_policy() is called, it
      should not be calling sp_alloc() so it is not necessary to treat the PTL
      specially.
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Tested-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      42288fe3
    • L
      Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 5439ca6b
      Linus Torvalds 提交于
      Pull ext4 bug fixes from Ted Ts'o:
       "Various bug fixes for ext4.  Perhaps the most serious bug fixed is one
        which could cause file system corruptions when performing file punch
        operations."
      
      * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: avoid hang when mounting non-journal filesystems with orphan list
        ext4: lock i_mutex when truncating orphan inodes
        ext4: do not try to write superblock on ro remount w/o journal
        ext4: include journal blocks in df overhead calcs
        ext4: remove unaligned AIO warning printk
        ext4: fix an incorrect comment about i_mutex
        ext4: fix deadlock in journal_unmap_buffer()
        ext4: split off ext4_journalled_invalidatepage()
        jbd2: fix assertion failure in jbd2_journal_flush()
        ext4: check dioread_nolock on remount
        ext4: fix extent tree corruption caused by hole punch
      5439ca6b
    • H
      mempolicy: remove arg from mpol_parse_str, mpol_to_str · a7a88b23
      Hugh Dickins 提交于
      Remove the unused argument (formerly no_context) from mpol_parse_str()
      and from mpol_to_str().
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a7a88b23
    • H
      tmpfs mempolicy: fix /proc/mounts corrupting memory · f2a07f40
      Hugh Dickins 提交于
      Recently I suggested using "mount -o remount,mpol=local /tmp" in NUMA
      mempolicy testing.  Very nasty.  Reading /proc/mounts, /proc/pid/mounts
      or /proc/pid/mountinfo may then corrupt one bit of kernel memory, often
      in a page table (causing "Bad swap" or "Bad page map" warning or "Bad
      pagetable" oops), sometimes in a vm_area_struct or rbnode or somewhere
      worse.  "mpol=prefer" and "mpol=prefer:Node" are equally toxic.
      
      Recent NUMA enhancements are not to blame: this dates back to 2.6.35,
      when commit e17f74af "mempolicy: don't call mpol_set_nodemask() when
      no_context" skipped mpol_parse_str()'s call to mpol_set_nodemask(),
      which used to initialize v.preferred_node, or set MPOL_F_LOCAL in flags.
      With slab poisoning, you can then rely on mpol_to_str() to set the bit
      for node 0x6b6b, probably in the next page above the caller's stack.
      
      mpol_parse_str() is only called from shmem_parse_options(): no_context
      is always true, so call it unused for now, and remove !no_context code.
      Set v.nodes or v.preferred_node or MPOL_F_LOCAL as mpol_to_str() might
      expect.  Then mpol_to_str() can ignore its no_context argument also,
      the mpol being appropriately initialized whether contextualized or not.
      Rename its no_context unused too, and let subsequent patch remove them
      (that's not needed for stable backporting, which would involve rejects).
      
      I don't understand why MPOL_LOCAL is described as a pseudo-policy:
      it's a reasonable policy which suffers from a confusing implementation
      in terms of MPOL_PREFERRED with MPOL_F_LOCAL.  I believe this would be
      much more robust if MPOL_LOCAL were recognized in switch statements
      throughout, MPOL_F_LOCAL deleted, and MPOL_PREFERRED use the (possibly
      empty) nodes mask like everyone else, instead of its preferred_node
      variant (I presume an optimization from the days before MPOL_LOCAL).
      But that would take me too long to get right and fully tested.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f2a07f40
    • E
      epoll: prevent missed events on EPOLL_CTL_MOD · 128dd175
      Eric Wong 提交于
      EPOLL_CTL_MOD sets the interest mask before calling f_op->poll() to
      ensure events are not missed.  Since the modifications to the interest
      mask are not protected by the same lock as ep_poll_callback, we need to
      ensure the change is visible to other CPUs calling ep_poll_callback.
      
      We also need to ensure f_op->poll() has an up-to-date view of past
      events which occured before we modified the interest mask.  So this
      barrier also pairs with the barrier in wq_has_sleeper().
      
      This should guarantee either ep_poll_callback or f_op->poll() (or both)
      will notice the readiness of a recently-ready/modified item.
      
      This issue was encountered by Andreas Voellmy and Junchang(Jason) Wang in:
      http://thread.gmane.org/gmane.linux.kernel/1408782/Signed-off-by: NEric Wong <normalperson@yhbt.net>
      Cc: Hans Verkuil <hans.verkuil@cisco.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: Hans de Goede <hdegoede@redhat.com>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andreas Voellmy <andreas.voellmy@yale.edu>
      Tested-by: N"Junchang(Jason) Wang" <junchang.wang@yale.edu>
      Cc: netdev@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      128dd175
  7. 02 1月, 2013 4 次提交