1. 09 9月, 2013 3 次提交
    • J
      cifs: display iocharset= option in /proc/mounts · 3ae35cde
      Jeff Layton 提交于
      ...but only if it's not the default charset.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <smfrench@gmail.com>
      3ae35cde
    • J
      cifs: create a new Documentation/ directory and move docfiles into it · 30706a54
      Jeff Layton 提交于
      Currently, we have a number of documentation files that live under
      fs/cifs/. Generally, these don't get picked up by distro packagers,
      since they're in a non-standard location. Move them to a new spot
      under Documentation/ instead.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <smfrench@gmail.com>
      30706a54
    • J
      cifs: ensure that srv_mutex is held when dealing with ssocket pointer · 73e216a8
      Jeff Layton 提交于
      Oleksii reported that he had seen an oops similar to this:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
      IP: [<ffffffff814dcc13>] sock_sendmsg+0x93/0xd0
      PGD 0
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in: ipt_MASQUERADE xt_REDIRECT xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables carl9170 ath usb_storage f2fs nfnetlink_log nfnetlink md4 cifs dns_resolver hid_generic usbhid hid af_packet uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev rfcomm btusb bnep bluetooth qmi_wwan qcserial cdc_wdm usb_wwan usbnet usbserial mii snd_hda_codec_hdmi snd_hda_codec_realtek iwldvm mac80211 coretemp intel_powerclamp kvm_intel kvm iwlwifi snd_hda_intel cfg80211 snd_hda_codec xhci_hcd e1000e ehci_pci snd_hwdep sdhci_pci snd_pcm ehci_hcd microcode psmouse sdhci thinkpad_acpi mmc_core i2c_i801 pcspkr usbcore hwmon snd_timer snd_page_alloc snd ptp rfkill pps_core soundcore evdev usb_common vboxnetflt(O) vboxdrv(O)Oops#2 Part8
       loop tun binfmt_misc fuse msr acpi_call(O) ipv6 autofs4
      CPU: 0 PID: 21612 Comm: kworker/0:1 Tainted: G        W  O 3.10.1SIGN #28
      Hardware name: LENOVO 2306CTO/2306CTO, BIOS G2ET92WW (2.52 ) 02/22/2013
      Workqueue: cifsiod cifs_echo_request [cifs]
      task: ffff8801e1f416f0 ti: ffff880148744000 task.ti: ffff880148744000
      RIP: 0010:[<ffffffff814dcc13>]  [<ffffffff814dcc13>] sock_sendmsg+0x93/0xd0
      RSP: 0000:ffff880148745b00  EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff880148745b78 RCX: 0000000000000048
      RDX: ffff880148745c90 RSI: ffff880181864a00 RDI: ffff880148745b78
      RBP: ffff880148745c48 R08: 0000000000000048 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff880181864a00
      R13: ffff880148745c90 R14: 0000000000000048 R15: 0000000000000048
      FS:  0000000000000000(0000) GS:ffff88021e200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000088 CR3: 000000020c42c000 CR4: 00000000001407b0
      Oops#2 Part7
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Stack:
       ffff880148745b30 ffffffff810c4af9 0000004848745b30 ffff880181864a00
       ffffffff81ffbc40 0000000000000000 ffff880148745c90 ffffffff810a5aab
       ffff880148745bc0 ffffffff81ffbc40 ffff880148745b60 ffffffff815a9fb8
      Call Trace:
       [<ffffffff810c4af9>] ? finish_task_switch+0x49/0xe0
       [<ffffffff810a5aab>] ? lock_timer_base.isra.36+0x2b/0x50
       [<ffffffff815a9fb8>] ? _raw_spin_unlock_irqrestore+0x18/0x40
       [<ffffffff810a673f>] ? try_to_del_timer_sync+0x4f/0x70
       [<ffffffff815aa38f>] ? _raw_spin_unlock_bh+0x1f/0x30
       [<ffffffff814dcc87>] kernel_sendmsg+0x37/0x50
       [<ffffffffa081a0e0>] smb_send_kvec+0xd0/0x1d0 [cifs]
       [<ffffffffa081a263>] smb_send_rqst+0x83/0x1f0 [cifs]
       [<ffffffffa081ab6c>] cifs_call_async+0xec/0x1b0 [cifs]
       [<ffffffffa08245e0>] ? free_rsp_buf+0x40/0x40 [cifs]
      Oops#2 Part6
       [<ffffffffa082606e>] SMB2_echo+0x8e/0xb0 [cifs]
       [<ffffffffa0808789>] cifs_echo_request+0x79/0xa0 [cifs]
       [<ffffffff810b45b3>] process_one_work+0x173/0x4a0
       [<ffffffff810b52a1>] worker_thread+0x121/0x3a0
       [<ffffffff810b5180>] ? manage_workers.isra.27+0x2b0/0x2b0
       [<ffffffff810bae00>] kthread+0xc0/0xd0
       [<ffffffff810bad40>] ? kthread_create_on_node+0x120/0x120
       [<ffffffff815b199c>] ret_from_fork+0x7c/0xb0
       [<ffffffff810bad40>] ? kthread_create_on_node+0x120/0x120
      Code: 84 24 b8 00 00 00 4c 89 f1 4c 89 ea 4c 89 e6 48 89 df 4c 89 60 18 48 c7 40 28 00 00 00 00 4c 89 68 30 44 89 70 14 49 8b 44 24 28 <ff> 90 88 00 00 00 3d ef fd ff ff 74 10 48 8d 65 e0 5b 41 5c 41
       RIP  [<ffffffff814dcc13>] sock_sendmsg+0x93/0xd0
       RSP <ffff880148745b00>
      CR2: 0000000000000088
      
      The client was in the middle of trying to send a frame when the
      server->ssocket pointer got zeroed out. In most places, that we access
      that pointer, the srv_mutex is held. There's only one spot that I see
      that the server->ssocket pointer gets set and the srv_mutex isn't held.
      This patch corrects that.
      
      The upstream bug report was here:
      
          https://bugzilla.kernel.org/show_bug.cgi?id=60557
      
      Cc: <stable@vger.kernel.org>
      Reported-by: NOleksii Shevchuk <alxchk@gmail.com>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <smfrench@gmail.com>
      73e216a8
  2. 06 9月, 2013 12 次提交
  3. 05 9月, 2013 2 次提交
    • J
      f2fs: optimize gc for better performance · a26b7c8a
      Jin Xu 提交于
      This patch improves the gc efficiency by optimizing the victim
      selection policy. With this optimization, the random re-write
      performance could increase up to 20%.
      
      For f2fs, when disk is in shortage of free spaces, gc will selects
      dirty segments and moves valid blocks around for making more space
      available. The gc cost of a segment is determined by the valid blocks
      in the segment. The less the valid blocks, the higher the efficiency.
      The ideal victim segment is the one that has the most garbage blocks.
      
      Currently, it searches up to 20 dirty segments for a victim segment.
      The selected victim is not likely the best victim for gc when there
      are much more dirty segments. Why not searching more dirty segments
      for a better victim? The cost of searching dirty segments is
      negligible in comparison to moving blocks.
      
      In this patch, it enlarges the MAX_VICTIM_SEARCH to 4096 to make
      the search more aggressively for a possible better victim. Since
      it also applies to victim selection for SSR, it will likely improve
      the SSR efficiency as well.
      
      The test case is simple. It creates as many files until the disk full.
      The size for each file is 32KB. Then it writes as many as 100000
      records of 4KB size to random offsets of random files in sync mode.
      The testing was done on a 2GB partition of a SDHC card. Let's see the
      test result of f2fs without and with the patch.
      
      ---------------------------------------
      2GB partition, SDHC
      create 52023 files of size 32768 bytes
      random re-write 100000 records of 4KB
      ---------------------------------------
      | file creation (s) | rewrite time (s) | gc count | gc garbage blocks |
      [no patch]  341         4227             1174          174840
      [patched]   324         2958             645           106682
      
      It's obvious that, with the patch, f2fs finishes the test in 20+% less
      time than without the patch. And internally it does much less gc with
      higher efficiency than before.
      
      Since the performance improvement is related to gc, it might not be so
      obvious for other tests that do not trigger gc as often as this one (
      This is because f2fs selects dirty segments for SSR use most of the
      time when free space is in shortage). The well-known iozone test tool
      was not used for benchmarking the patch becuase it seems do not have
      a test case that performs random re-write on a full disk.
      
      This patch is the revised version based on the suggestion from
      Jaegeuk Kim.
      Signed-off-by: NJin Xu <jinuxstyle@gmail.com>
      [Jaegeuk Kim: suggested simpler solution]
      Reviewed-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      a26b7c8a
    • J
      f2fs: merge more bios of node block writes · 423e95cc
      Jaegeuk Kim 提交于
      Previously, we experience bio traces as follows when running simple sequential
      write test.
      
       f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500104928, size = 4K
       f2fs_do_submit_bio: type = NODE, io = no sync, sector = 499922208, size = 368K
       f2fs_do_submit_bio: type = NODE, io = no sync, sector = 499914752, size = 140K
      
       -> total 512K
      
      The first one is to write an indirect node block, and the others are to write
      direct node blocks.
      
      The reason why there are two separate bios for direct node blocks is:
      0. initial state
      ------------------    ------------------
      |                |    |xxxxxxxx        |
      ------------------    ------------------
      
      1. write 368K
      ------------------    ------------------
      |                |    |xxxxxxxxWWWWWWWW|
      ------------------    ------------------
      
      2. write 140K
      ------------------    ------------------
      |WWWWWWW         |    |xxxxxxxxWWWWWWWW|
      ------------------    ------------------
      
      This is because f2fs_write_node_pages tries to write just 512K totally, so that
      we can lose the chance to merge more bios nicely.
      
      After this patch is applied, we can get the following bio traces.
      
        f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500103168, size = 8K
        f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500111368, size = 4K
        f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500107272, size = 512K
        f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500108296, size = 512K
        f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500109320, size = 500K
      
      And finally, we can improve the sequential write performance,
          from 458.775 MB/s to 479.945 MB/s on SSD.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      423e95cc
  4. 04 9月, 2013 10 次提交
  5. 03 9月, 2013 4 次提交
    • J
      f2fs: avoid an overflow during utilization calculation · 222cbdc4
      Jaegeuk Kim 提交于
      The current f2fs uses all the block counts with 32 bit numbers, which is able to
      cover about 15TB volume.
      
      But in calculation of utilization, f2fs multiplies the count by 100 which can
      induce overflow.
      This patch fixes this.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      222cbdc4
    • J
      f2fs: trigger GC when there are prefree segments · c34e333f
      Jaegeuk Kim 提交于
      Previously, f2fs conducts SSR when free_sections() < overprovision_sections.
      But, even though there are a lot of prefree segments, it can consider SSR only.
      So, let's consider the number of prefree segments too for triggering SSR.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      c34e333f
    • L
      vfs: reimplement d_rcu_to_refcount() using lockref_get_or_lock() · 15570086
      Linus Torvalds 提交于
      This moves __d_rcu_to_refcount() from <linux/dcache.h> into fs/namei.c
      and re-implements it using the lockref infrastructure instead.  It also
      adds a lot of comments about what is actually going on, because turning
      a dentry that was looked up using RCU into a long-lived reference
      counted entry is one of the more subtle parts of the rcu walk.
      
      We also used to be _particularly_ subtle in unlazy_walk() where we
      re-validate both the dentry and its parent using the same sequence
      count.  We used to do it by nesting the locks and then verifying the
      sequence count just once.
      
      That was silly, because nested locking is expensive, but the sequence
      count check is not.  So this just re-validates the dentry and the parent
      separately, avoiding the nested locking, and making the lockref lookup
      possible.
      Acked-by: NWaiman Long <waiman.long@hp.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      15570086
    • W
      vfs: use lockref_get_not_zero() for optimistic lockless dget_parent() · df3d0bbc
      Waiman Long 提交于
      A valid parent pointer is always going to have a non-zero reference
      count, but if we look up the parent optimistically without locking, we
      have to protect against the (very unlikely) race against renaming
      changing the parent from under us.
      
      We do that by using lockref_get_not_zero(), and then re-checking the
      parent pointer after getting a valid reference.
      
      [ This is a re-implementation of a chunk from the original patch by
        Waiman Long: "dcache: Enable lockless update of dentry's refcount".
        I've completely rewritten the patch-series and split it up, but I'm
        attributing this part to Waiman as it's close enough to his earlier
        patch  - Linus ]
      Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      df3d0bbc
  6. 31 8月, 2013 2 次提交
  7. 29 8月, 2013 7 次提交
    • E
      sysfs: Restrict mounting sysfs · 7dc5dbc8
      Eric W. Biederman 提交于
      Don't allow mounting sysfs unless the caller has CAP_SYS_ADMIN rights
      over the net namespace.  The principle here is if you create or have
      capabilities over it you can mount it, otherwise you get to live with
      what other people have mounted.
      
      Instead of testing this with a straight forward ns_capable call,
      perform this check the long and torturous way with kobject helpers,
      this keeps direct knowledge of namespaces out of sysfs, and preserves
      the existing sysfs abstractions.
      Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      7dc5dbc8
    • G
      fs/ocfs2/super.c: Use bigger nodestr to accomodate 32-bit node numbers · 49fa8140
      Goldwyn Rodrigues 提交于
      While using pacemaker/corosync, the node numbers are generated using IP
      address as opposed to serial node number generation.  This may not fit
      in a 8-byte string.  Use a bigger string to print the complete node
      number.
      Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      49fa8140
    • W
      vfs: make the dentry cache use the lockref infrastructure · 98474236
      Waiman Long 提交于
      This just replaces the dentry count/lock combination with the lockref
      structure that contains both a count and a spinlock, and does the
      mechanical conversion to use the lockref infrastructure.
      
      There are no semantic changes here, it's purely syntactic.  The
      reference lockref implementation uses the spinlock exactly the same way
      that the old dcache code did, and the bulk of this patch is just
      expanding the internal "d_count" use in the dcache code to use
      "d_lockref.count" instead.
      
      This is purely preparation for the real change to make the reference
      count updates be lockless during the 3.12 merge window.
      
      [ As with the previous commit, this is a rewritten version of a concept
        originally from Waiman, so credit goes to him, blame for any errors
        goes to me.
      
        Waiman's patch had some semantic differences for taking advantage of
        the lockless update in dget_parent(), while this patch is
        intentionally a pure search-and-replace change with no semantic
        changes.     - Linus ]
      Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      98474236
    • E
      ext4: allow specifying external journal by pathname mount option · ad4eec61
      Eric Sandeen 提交于
      It's always been a hassle that if an external journal's
      device number changes, the filesystem won't mount.
      And since boot-time enumeration can change, device number
      changes aren't unusual.
      
      The current mechanism to update the journal location is by
      passing in a mount option w/ a new devnum, but that's a hassle;
      it's a manual approach, fixing things after the fact.
      
      Adding a mount option, "-o journal_path=/dev/$DEVICE" would
      help, since then we can do i.e.
      
      # mount -o journal_path=/dev/disk/by-label/$JOURNAL_LABEL ...
      
      and it'll mount even if the devnum has changed, as shown here:
      
      # losetup /dev/loop0 journalfile
      # mke2fs -L mylabel-journal -O journal_dev /dev/loop0 
      # mkfs.ext4 -L mylabel -J device=/dev/loop0 /dev/sdb1
      
      Change the journal device number:
      
      # losetup -d /dev/loop0
      # losetup /dev/loop1 journalfile 
      
      And today it will fail:
      
      # mount /dev/sdb1 /mnt/test
      mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
             missing codepage or helper program, or other error
             In some cases useful info is found in syslog - try
             dmesg | tail  or so
      
      # dmesg | tail -n 1
      [17343.240702] EXT4-fs (sdb1): error: couldn't read superblock of external journal
      
      But with this new mount option, we can specify the new path:
      
      # mount -o journal_path=/dev/loop1 /dev/sdb1 /mnt/test
      #
      
      (which does update the encoded device number, incidentally):
      
      # umount /dev/sdb1
      # dumpe2fs -h /dev/sdb1 | grep "Journal device"
      dumpe2fs 1.41.12 (17-May-2010)
      Journal device:	          0x0701
      
      But best of all we can just always mount by journal-path, and
      it'll always work:
      
      # mount -o journal_path=/dev/disk/by-label/mylabel-journal /dev/sdb1 /mnt/test
      #
      
      So the journal_path option can be specified in fstab, and as long as
      the disk is available somewhere, and findable by label (or by UUID),
      we can mount.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>
      ad4eec61
    • D
      ext4: mark group corrupt on group descriptor checksum · bdfb6ff4
      Darrick J. Wong 提交于
      If the group descriptor fails validation, mark the whole blockgroup
      corrupt so that the inode/block allocators skip this group.  The
      previous approach takes the risk of writing to a damaged group
      descriptor; hopefully it was never the case that the [ib]bitmap fields
      pointed to another valid block and got dirtied, since the memset would
      fill the page with 1s.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      bdfb6ff4
    • D
      ext4: mark block group as corrupt on inode bitmap error · 87a39389
      Darrick J. Wong 提交于
      If we detect either a discrepancy between the inode bitmap and the
      inode counts or the inode bitmap fails to pass validation checks, mark
      the block group corrupt and refuse to allocate or deallocate inodes
      from the group.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      87a39389
    • D
      ext4: mark block group as corrupt on block bitmap error · 163a203d
      Darrick J. Wong 提交于
      When we notice a block-bitmap corruption (because of device failure or
      something else), we should mark this group as corrupt and prevent
      further block allocations/deallocations from it. Currently, we end up
      generating one error message for every block in the bitmap. This
      potentially could make the system unstable as noticed in some
      bugs. With this patch, the error will be printed only the first time
      and mark the entire block group as corrupted. This prevents future
      access allocations/deallocations from it.
      
      Also tested by corrupting the block
      bitmap and forcefully introducing the mb_free_blocks error:
      (1) create a largefile (2Gb)
      $ dd if=/dev/zero of=largefile oflag=direct bs=10485760 count=200
      (2) umount filesystem. use dumpe2fs to see which block-bitmaps
      are in use by largefile and note their block numbers
      (3) use dd to zero-out the used block bitmaps
      $ dd if=/dev/zero of=/dev/hdc4 bs=4096 seek=14 count=8 oflag=direct
      (4) mount the FS and delete the largefile.
      (5) recreate the largefile. verify that the new largefile does not
      get any blocks from the groups marked as bad.
      Without the patch, we will see mb_free_blocks error for each bit in
      each zero'ed out bitmap at (4). With the patch, we only see the error
      once per blockgroup:
      [  309.706803] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 15: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
      [  309.720824] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 14: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
      [  309.732858] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure
      [  309.748321] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 13: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
      [  309.760331] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure
      [  309.769695] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 12: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
      [  309.781721] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure
      [  309.798166] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 11: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
      [  309.810184] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure
      [  309.819532] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 10: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
      
      Google-Bug-Id: 7258357
      
      [darrick.wong@oracle.com]
      Further modifications (by Darrick) to make more obvious that this corruption
      bit applies to blocks only.  Set the corruption flag if the block group bitmap
      verification fails.
      
      Original-author: Aditya Kali <adityakali@google.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      163a203d