1. 07 Jan, 2011: 2 commits
    • fs: icache RCU free inodes · fa0d7e3d
      Committed by Nick Piggin
      RCU free the struct inode. This will allow:
      
      - Subsequent store-free path walking patch. The inode must be consulted for
        permissions when walking, so an RCU inode reference is a must.
      - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
        to take i_lock no longer need to take sb_inode_list_lock to walk the list in
        the first place. This will simplify and optimize locking.
      - Could remove some nested trylock loops in dcache code
      - Could potentially simplify things a bit in VM land. Do not need to take the
        page lock to follow page->mapping.
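The deferred freeing that "RCU free the struct inode" refers to can be sketched in userspace as below. This is a hypothetical, single-threaded toy: toy_rcu_read_lock(), toy_call_rcu_free() and toy_grace_period() are made-up names, not kernel APIs. A freed inode is parked on a pending list and only released once no reader can still hold a pointer to it. The same deferral is why the memory cannot be recycled immediately, which is the cache-hot reuse cost described below. Real RCU only waits for readers that were already running when the object was queued, whereas this toy conservatively waits for all of them.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model of RCU-deferred freeing (single-threaded). */

struct toy_inode {
    int ino;
    struct toy_inode *next_pending; /* link on the deferred-free list */
};

static int active_readers;              /* readers inside a read-side section */
static struct toy_inode *pending_head;  /* objects waiting for a grace period */
static int freed_count;                 /* total objects actually released */

static void toy_rcu_read_lock(void) { active_readers++; }

static void toy_rcu_read_unlock(void)
{
    assert(active_readers > 0);
    active_readers--;
}

/* Instead of free(): queue the object until a grace period has elapsed. */
static void toy_call_rcu_free(struct toy_inode *inode)
{
    inode->next_pending = pending_head;
    pending_head = inode;
}

/* Conservative grace period: it can only end when no reader is active.
 * Returns the number of objects released, or -1 if readers remain. */
static int toy_grace_period(void)
{
    int n = 0;

    if (active_readers > 0)
        return -1;
    while (pending_head != NULL) {
        struct toy_inode *victim = pending_head;
        pending_head = victim->next_pending;
        free(victim);
        n++;
    }
    freed_count += n;
    return n;
}
```

While a reader is active, the "freed" inode remains readable. Its memory only returns to the allocator after the grace period.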
      
      The downside of this is the performance cost of using RCU. In a simple
      creat/unlink microbenchmark, performance drops by about 10% due to the inability to
      reuse cache-hot slab objects. As iterations increase and RCU freeing starts
      kicking in, this increases to about 20%.
      
      In cases where inode lifetimes are longer (i.e. many inodes may be allocated
      during the average life span of a single inode), much of this cache reuse is
      not applicable, so the regression caused by this patch is smaller.
      
      The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
      however this adds some complexity to list walking and store-free path walking,
      so I prefer to implement this at a later date, if it is shown to be a win in
      real situations. I haven't found a regression in any non-micro benchmark so I
      doubt it will be a problem.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
    • fs: change d_delete semantics · fe15ce44
      Committed by Nick Piggin
      Change d_delete from a dentry deletion notification to a dentry caching
      advice, more like ->drop_inode. Require it to be constant and idempotent,
      and not take d_lock. This is how all existing filesystems use the callback
      anyway.
      
      This makes fine grained dentry locking of dput and dentry lru scanning
      much simpler.
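Under the new semantics, ->d_delete is only asked one question when an unused dentry loses its last reference: should it be cached or dropped? The hypothetical model below illustrates that decision. toy_dentry, toy_dput() and always_drop() are illustrative names, and the real dput() does much more. Because the callback is constant and idempotent, the caller may consult it at any point without the filesystem taking d_lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_dentry;

struct toy_dentry_ops {
    /* Advice only: nonzero means "drop when unused", 0 means "keep cached". */
    int (*d_delete)(const struct toy_dentry *dentry);
};

struct toy_dentry {
    int refcount;
    bool on_lru;     /* kept as an unused, cached dentry */
    bool killed;     /* dropped from the cache */
    const struct toy_dentry_ops *ops;
};

static void toy_dput(struct toy_dentry *d)
{
    assert(d->refcount > 0);
    if (--d->refcount > 0)
        return;
    /* Last reference gone: ask for caching advice. The answer must not
     * change between calls, so it could equally be asked again later,
     * e.g. while scanning the LRU. */
    if (d->ops != NULL && d->ops->d_delete != NULL && d->ops->d_delete(d))
        d->killed = true;
    else
        d->on_lru = true;
}

/* Example advisor that never wants unused dentries cached. */
static int always_drop(const struct toy_dentry *d)
{
    (void)d;
    return 1;
}

static const struct toy_dentry_ops drop_ops = { .d_delete = always_drop };
```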
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
  2. 08 Dec, 2010: 1 commit
    • sunrpc: prevent use-after-free on clearing XPT_BUSY · ed2849d3
      Committed by NeilBrown
      When an xprt is created, it has a refcount of 1, and XPT_BUSY is set.
      The refcount is *not* owned by the thread that created the xprt
      (as is clear from the fact that creators never put the reference).
      Rather, it is owned by the absence of XPT_DEAD.  Once XPT_DEAD is set
      (and XPT_BUSY is clear), that initial reference is dropped and the xprt
      can be freed.
      
      So when a creator clears XPT_BUSY it is dropping its only reference and
      so must not touch the xprt again.
      
      However svc_recv, after calling ->xpo_accept (and so getting an XPT_BUSY
      reference on a new xprt), calls svc_xprt_received.  This clears
      XPT_BUSY and then calls svc_xprt_enqueue, the latter without owning a reference.
      This is dangerous and has been seen to leave svc_xprt_enqueue working
      with an xprt containing garbage.
      
      So we need to hold an extra counted reference over that call to
      svc_xprt_received.
      
      For safety, any time we clear XPT_BUSY and then use the xprt again, we
      first get a reference, and then put it again afterwards.
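The ownership rule and the fix can be modelled with a single-threaded toy. All names here (toy_xprt, toy_received(), toy_close_if_idle()) are hypothetical. The racing_close flag simulates another thread closing the xprt in the window just after the busy flag is cleared. toy_received() takes its own counted reference before clearing that flag, so the object survives until the enqueue step finishes, and the final put is what frees it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the initial reference belongs to "not dead", and a
 * thread holding the busy flag has no counted reference of its own. */
struct toy_xprt {
    int refcount;
    bool busy;
    bool dead;
    bool freed;      /* stands in for the memory having been freed */
    int enqueued;    /* how many times the enqueue step ran */
};

static void toy_get(struct toy_xprt *x)
{
    assert(!x->freed);
    x->refcount++;
}

static void toy_put(struct toy_xprt *x)
{
    assert(x->refcount > 0);
    if (--x->refcount == 0)
        x->freed = true;
}

/* Another thread may close the xprt as soon as it is no longer busy. */
static void toy_close_if_idle(struct toy_xprt *x)
{
    if (!x->busy && !x->dead) {
        x->dead = true;
        toy_put(x);          /* drop the initial "not dead" reference */
    }
}

/* Fixed pattern: pin the xprt before clearing busy, unpin afterwards. */
static void toy_received(struct toy_xprt *x, bool racing_close)
{
    toy_get(x);
    x->busy = false;
    if (racing_close)
        toy_close_if_idle(x);    /* simulate the race window */
    assert(!x->freed);           /* still valid: we hold our own reference */
    x->enqueued++;               /* stands in for svc_xprt_enqueue() */
    toy_put(x);
}
```

Without the toy_get()/toy_put() pair, the simulated close would drop the count to zero before the enqueue step ran.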
      
      Note that svc_close_all does not need this extra protection as there are
      no threads running, and the final free can only be called asynchronously
      from such a thread.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Cc: stable@kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  3. 23 Nov, 2010: 1 commit
  4. 18 Nov, 2010: 1 commit
  5. 17 Nov, 2010: 1 commit
  6. 29 Oct, 2010: 1 commit
  7. 26 Oct, 2010: 6 commits
  8. 25 Oct, 2010: 2 commits
    • SUNRPC: After calling xprt_release(), we must restart from call_reserve · 118df3d1
      Committed by Trond Myklebust
      Rob Leslie reports seeing the following Oops after his Kerberos session
      expired.
      
      BUG: unable to handle kernel NULL pointer dereference at 00000058
      IP: [<e186ed94>] rpcauth_refreshcred+0x11/0x12c [sunrpc]
      *pde = 00000000
      Oops: 0000 [#1]
      last sysfs file: /sys/devices/platform/pc87360.26144/temp3_input
      Modules linked in: autofs4 authenc esp4 xfrm4_mode_transport ipt_LOG ipt_REJECT xt_limit xt_state ipt_REDIRECT xt_owner xt_HL xt_hl xt_tcpudp xt_mark cls_u32 cls_tcindex sch_sfq sch_htb sch_dsmark geodewdt deflate ctr twofish_generic twofish_i586 twofish_common camellia serpent blowfish cast5 cbc xcbc rmd160 sha512_generic sha1_generic hmac crypto_null af_key rpcsec_gss_krb5 nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc ip_gre sit tunnel4 dummy ext3 jbd nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables pc8736x_gpio nsc_gpio pc87360 hwmon_vid loop aes_i586 aes_generic sha256_generic dm_crypt cs5535_gpio serio_raw cs5535_mfgpt hifn_795x des_generic geode_rng rng_core led_class ext4 mbcache jbd2 crc16 dm_mirror dm_region_hash dm_log dm_snapshot dm_mod sd_mod crc_t10dif ide_pci_generic cs5536 amd74xx ide_core pata_cs5536 ata_generic libata usb_storage via_rhine mii scsi_mod btrfs zlib_deflate crc32c libcrc32c [last unloaded: scsi_wait_scan]
      
      Pid: 12875, comm: sudo Not tainted 2.6.36-net5501 #1 /
      EIP: 0060:[<e186ed94>] EFLAGS: 00010292 CPU: 0
      EIP is at rpcauth_refreshcred+0x11/0x12c [sunrpc]
      EAX: 00000000 EBX: defb13a0 ECX: 00000006 EDX: e18683b8
      ESI: defb13a0 EDI: 00000000 EBP: 00000000 ESP: de571d58
       DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
      Process sudo (pid: 12875, ti=de570000 task=decd1430 task.ti=de570000)
      Stack:
       e186e008 00000000 defb13a0 0000000d deda6000 e1868f22 e196f12b defb13a0
      <0> defb13d8 00000000 00000000 e186e0aa 00000000 defb13a0 de571dac 00000000
      <0> e186956c de571e34 debea5c0 de571dc8 e186967a 00000000 debea5c0 de571e34
      Call Trace:
       [<e186e008>] ? rpc_wake_up_next+0x114/0x11b [sunrpc]
       [<e1868f22>] ? call_decode+0x24a/0x5af [sunrpc]
       [<e196f12b>] ? nfs4_xdr_dec_access+0x0/0xa2 [nfs]
       [<e186e0aa>] ? __rpc_execute+0x62/0x17b [sunrpc]
       [<e186956c>] ? rpc_run_task+0x91/0x97 [sunrpc]
       [<e186967a>] ? rpc_call_sync+0x40/0x5b [sunrpc]
       [<e1969ca2>] ? nfs4_proc_access+0x10a/0x176 [nfs]
       [<e19572fa>] ? nfs_do_access+0x2b1/0x2c0 [nfs]
       [<e186ed61>] ? rpcauth_lookupcred+0x62/0x84 [sunrpc]
       [<e19573b6>] ? nfs_permission+0xad/0x13b [nfs]
       [<c0177824>] ? exec_permission+0x15/0x4b
       [<c0177fbd>] ? link_path_walk+0x4f/0x456
       [<c017867d>] ? path_walk+0x4c/0xa8
       [<c0179678>] ? do_path_lookup+0x1f/0x68
       [<c017a3fb>] ? user_path_at+0x37/0x5f
       [<c016359c>] ? handle_mm_fault+0x229/0x55b
       [<c0170a2d>] ? sys_faccessat+0x93/0x146
       [<c0170aef>] ? sys_access+0xf/0x13
       [<c02cf615>] ? syscall_call+0x7/0xb
      Code: 0f 94 c2 84 d2 74 09 8b 44 24 0c e8 6a e9 8b de 83 c4 14 89 d8 5b 5e 5f 5d c3 55 57 56 53 83 ec 1c fc 89 c6 8b 40 10 89 44 24 04 <8b> 58 58 85 db 0f 85 d4 00 00 00 0f b7 46 70 8b 56 20 89 c5 83
      EIP: [<e186ed94>] rpcauth_refreshcred+0x11/0x12c [sunrpc] SS:ESP 0068:de571d58
      CR2: 0000000000000058
      
      This appears to be caused by the function rpc_verify_header() first
      calling xprt_release(), then doing a call_refresh. If we release the
      transport slot, we should _always_ jump back to call_reserve before
      calling anything else.
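The fix amounts to a rule about the task's next state. The hypothetical sketch below is loosely modelled on the Oops above: toy_task, toy_verify_failed() and the other toy_ names are made up. Releasing the slot clears the request pointer, so the only safe next step is to reserve a slot again before refreshing credentials. Jumping straight to the refresh step would trip the assertion, much as the crash above happened in rpcauth_refreshcred().

```c
#include <assert.h>
#include <stddef.h>

struct toy_req { int slot_id; };

struct toy_task;
typedef void (*toy_action)(struct toy_task *);

struct toy_task {
    struct toy_req *req;   /* NULL while no transport slot is held */
    toy_action next;       /* next step of the state machine, NULL when done */
    int refresh_calls;
};

static struct toy_req toy_slot = { 1 };

static void toy_call_refresh(struct toy_task *t)
{
    /* Refreshing credentials needs the request attached to the slot. */
    assert(t->req != NULL);
    t->refresh_calls++;
    t->next = NULL;
}

static void toy_call_reserve(struct toy_task *t)
{
    t->req = &toy_slot;          /* obtain a transport slot */
    t->next = toy_call_refresh;
}

static void toy_release(struct toy_task *t)
{
    t->req = NULL;               /* stands in for xprt_release() */
}

/* Verification failed: release the slot and restart from the reserve
 * step, never from the refresh step directly. */
static void toy_verify_failed(struct toy_task *t)
{
    toy_release(t);
    t->next = toy_call_reserve;
}

static void toy_run(struct toy_task *t)
{
    while (t->next != NULL)
        t->next(t);
}
```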
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@kernel.org
  9. 24 Oct, 2010: 1 commit
  10. 21 Oct, 2010: 3 commits
    • SUNRPC: Properly initialize sock_xprt.srcaddr in all cases · 92476850
      Committed by Chuck Lever
      The source address field in the transport's sock_xprt is initialized
      ONLY IF the RPC application passed a pointer to a source address
      during the call to rpc_create().  However, xs_bind() subsequently uses
      the value of this field without regard to whether the source address
      was initialized during transport creation or not.
      
      So far we've been lucky: the uninitialized value of this field is
      zeroes.  xs_bind(), until recently, used only the sin[6]_addr field in
      this sockaddr, and all zeroes is a valid value for this: it means
      ANYADDR.  This is a happy coincidence.
      
      However, xs_bind() now wants to use the sa_family field as well, and
      expects it to be initialized to something other than zero.
      
      Therefore, the source address sockaddr field should be fully
      initialized at transport create time in _every_ case, not just when
      the RPC application wants to use a specific bind address.
      
      Bruce added a workaround for this missing initialization by adjusting
      commit 6bc9638a, but the "right" way to do this is to ensure that the
      source address sockaddr is always correctly initialized from the
      get-go.
      
      This patch doesn't introduce a behavior change.  It's simply a
      clean-up of Bruce's fix, to prevent future problems of this kind.  It
      may look like overkill, but
      
        a) it clearly documents the default initial value of this field,
      
        b) it doesn't assume that the sockaddr_storage memory is first
           initialized to any particular value, and
      
        c) it will fail verbosely if some unknown address family is passed
           in
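Here is a userspace sketch of the "always initialize" approach, using the standard POSIX socket address types. toy_init_anyaddr() is a hypothetical helper, not the function the patch adds. It fills in sa_family and the wildcard address for AF_INET and AF_INET6 without relying on the caller having zeroed the storage. For any other family it returns an error (here -EAFNOSUPPORT, an assumption), in the spirit of points a) to c) above.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: set *ss to the wildcard ("ANYADDR") address of the
 * given family. Returns 0 on success or -EAFNOSUPPORT for other families. */
static int toy_init_anyaddr(int family, struct sockaddr_storage *ss)
{
    /* Do not assume anything about the previous contents. */
    memset(ss, 0, sizeof(*ss));

    switch (family) {
    case AF_INET: {
        struct sockaddr_in *sin = (struct sockaddr_in *)ss;
        sin->sin_family = AF_INET;
        sin->sin_addr.s_addr = htonl(INADDR_ANY);
        break;
    }
    case AF_INET6: {
        struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)ss;
        sin6->sin6_family = AF_INET6;
        sin6->sin6_addr = in6addr_any;
        break;
    }
    default:
        /* Fail loudly rather than leaving sa_family as 0. */
        return -EAFNOSUPPORT;
    }
    return 0;
}
```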
      
      Originally introduced by commit d3bc9a1d.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • SUNRPC: Use conventional switch statement when reclassifying sockets · 4232e863
      Committed by Chuck Lever
      Clean up.
      
      Defensive coding: If "family" is ever something that is neither
      AF_INET nor AF_INET6, xs_reclassify_socket6() is not the appropriate
      default action.  Choose to do nothing in that case.
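The shape of the cleaned-up switch can be sketched as follows. toy_reclassify_socket() and the two helpers are hypothetical stand-ins that only record which branch ran. With this shape, an unknown family hits the empty default case instead of being handed to the IPv6 helper.

```c
#include <assert.h>
#include <sys/socket.h>

/* Records which (hypothetical) reclassification helper ran last. */
enum toy_class { TOY_NONE, TOY_CLASS4, TOY_CLASS6 };

static enum toy_class toy_last_class = TOY_NONE;

static void toy_reclassify_socket4(void) { toy_last_class = TOY_CLASS4; }
static void toy_reclassify_socket6(void) { toy_last_class = TOY_CLASS6; }

/* Each known family is handled explicitly; anything else does nothing. */
static void toy_reclassify_socket(int family)
{
    switch (family) {
    case AF_INET:
        toy_reclassify_socket4();
        break;
    case AF_INET6:
        toy_reclassify_socket6();
        break;
    default:
        break;
    }
}
```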
      
      Introduced by commit 6bc9638a.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • sunrpc/xprtrdma: clean up workqueue usage · a25e758c
      Committed by Tejun Heo
      * Create and use svc_rdma_wq instead of using the system workqueue and
        flush_scheduled_work().  This workqueue is necessary to serve as the
        flushing domain for rdma->sc_work, which is used to destroy itself
        and thus can't be flushed explicitly.
      
      * Replace cancel_delayed_work() + flush_scheduled_work() with
        cancel_delayed_work_sync().
      
      * Implement synchronous connect in xprt_rdma_connect() using
        flush_delayed_work() on the rdma_connect work instead of using
        flush_scheduled_work().
      
      This is to prepare for the deprecation and removal of
      flush_scheduled_work().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  11. 19 Oct, 2010: 20 commits
  12. 15 Oct, 2010: 1 commit
    • llseek: automatically add .llseek fop · 6038f373
      Committed by Arnd Bergmann
      All file_operations should get a .llseek operation so we can make
      nonseekable_open the default for future file operations without a
      .llseek pointer.
      
      The three cases that we can automatically detect are no_llseek, seq_lseek
      and default_llseek. For cases where we can automatically prove that
      the file offset is always ignored, we use noop_llseek, which maintains
      the current behavior of not returning an error from a seek.
      
      New drivers should normally not use noop_llseek but instead use no_llseek
      and call nonseekable_open at open time.  Existing drivers can be converted
      to do the same when the maintainer knows for certain that no user code
      relies on calling seek on the device file.
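The practical difference between these llseek variants can be modelled in userspace. The toy_ functions below are hypothetical and follow the behaviour described in the message. The no_llseek-style variant fails the seek with -ESPIPE, the noop_llseek-style variant reports the current offset without moving it, and the default_llseek-style variant moves the offset. Only SEEK_SET and SEEK_CUR are modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR */

/* Hypothetical stand-in for the file position state. */
struct toy_file { long long f_pos; };

/* no_llseek-style: the file is not seekable at all. */
static long long toy_no_llseek(struct toy_file *f, long long off, int whence)
{
    (void)f; (void)off; (void)whence;
    return -ESPIPE;
}

/* noop_llseek-style: no error, but the offset is left unchanged. */
static long long toy_noop_llseek(struct toy_file *f, long long off, int whence)
{
    (void)off; (void)whence;
    return f->f_pos;
}

/* default_llseek-style (simplified): actually move the offset. */
static long long toy_default_llseek(struct toy_file *f, long long off, int whence)
{
    long long newpos;

    switch (whence) {
    case SEEK_SET: newpos = off; break;
    case SEEK_CUR: newpos = f->f_pos + off; break;
    default: return -EINVAL;
    }
    if (newpos < 0)
        return -EINVAL;
    f->f_pos = newpos;
    return newpos;
}
```

The noop variant keeps seeks "working" for existing user code even though the driver ignores the offset. That is why new drivers are steered towards the explicit no_llseek plus nonseekable_open combination instead.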
      
      The generated code is often incorrectly indented and right now contains
      comments that clarify for each added line why a specific variant was
      chosen. In the version that gets submitted upstream, the comments will
      be gone and I will manually fix the indentation, because there does not
      seem to be a way to do that using coccinelle.
      
      Some amount of new code is currently sitting in linux-next that should get
      the same modifications, which I will do at the end of the merge window.
      
      Many thanks to Julia Lawall for helping me learn to write a semantic
      patch that does all this.
      
      ===== begin semantic patch =====
      // This adds an llseek= method to all file operations,
      // as a preparation for making no_llseek the default.
      //
      // The rules are
      // - use no_llseek explicitly if we do nonseekable_open
      // - use seq_lseek for sequential files
      // - use default_llseek if we know we access f_pos
      // - use noop_llseek if we know we don't access f_pos,
      //   but we still want to allow users to call lseek
      //
      @ open1 exists @
      identifier nested_open;
      @@
      nested_open(...)
      {
      <+...
      nonseekable_open(...)
      ...+>
      }
      
      @ open exists@
      identifier open_f;
      identifier i, f;
      identifier open1.nested_open;
      @@
      int open_f(struct inode *i, struct file *f)
      {
      <+...
      (
      nonseekable_open(...)
      |
      nested_open(...)
      )
      ...+>
      }
      
      @ read disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      <+...
      (
         *off = E
      |
         *off += E
      |
         func(..., off, ...)
      |
         E = *off
      )
      ...+>
      }
      
      @ read_no_fpos disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ write @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      <+...
      (
        *off = E
      |
        *off += E
      |
        func(..., off, ...)
      |
        E = *off
      )
      ...+>
      }
      
      @ write_no_fpos @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ fops0 @
      identifier fops;
      @@
      struct file_operations fops = {
       ...
      };
      
      @ has_llseek depends on fops0 @
      identifier fops0.fops;
      identifier llseek_f;
      @@
      struct file_operations fops = {
      ...
       .llseek = llseek_f,
      ...
      };
      
      @ has_read depends on fops0 @
      identifier fops0.fops;
      identifier read_f;
      @@
      struct file_operations fops = {
      ...
       .read = read_f,
      ...
      };
      
      @ has_write depends on fops0 @
      identifier fops0.fops;
      identifier write_f;
      @@
      struct file_operations fops = {
      ...
       .write = write_f,
      ...
      };
      
      @ has_open depends on fops0 @
      identifier fops0.fops;
      identifier open_f;
      @@
      struct file_operations fops = {
      ...
       .open = open_f,
      ...
      };
      
      // use no_llseek if we call nonseekable_open
      ////////////////////////////////////////////
      @ nonseekable1 depends on !has_llseek && has_open @
      identifier fops0.fops;
      identifier nso ~= "nonseekable_open";
      @@
      struct file_operations fops = {
      ...  .open = nso, ...
      +.llseek = no_llseek, /* nonseekable */
      };
      
      @ nonseekable2 depends on !has_llseek @
      identifier fops0.fops;
      identifier open.open_f;
      @@
      struct file_operations fops = {
      ...  .open = open_f, ...
      +.llseek = no_llseek, /* open uses nonseekable */
      };
      
      // use seq_lseek for sequential files
      /////////////////////////////////////
      @ seq depends on !has_llseek @
      identifier fops0.fops;
      identifier sr ~= "seq_read";
      @@
      struct file_operations fops = {
      ...  .read = sr, ...
      +.llseek = seq_lseek, /* we have seq_read */
      };
      
      // use default_llseek if there is a readdir
      ///////////////////////////////////////////
      @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier readdir_e;
      @@
      // any other fop is used that changes pos
      struct file_operations fops = {
      ... .readdir = readdir_e, ...
      +.llseek = default_llseek, /* readdir is present */
      };
      
      // use default_llseek if at least one of read/write touches f_pos
      /////////////////////////////////////////////////////////////////
      @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read.read_f;
      @@
      // read fops use offset
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = default_llseek, /* read accesses f_pos */
      };
      
      @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ... .write = write_f, ...
      +	.llseek = default_llseek, /* write accesses f_pos */
      };
      
      // Use noop_llseek if neither read nor write accesses f_pos
      ///////////////////////////////////////////////////////////
      
      @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      identifier write_no_fpos.write_f;
      @@
      // neither read nor write uses offset
      struct file_operations fops = {
      ...
       .write = write_f,
       .read = read_f,
      ...
      +.llseek = noop_llseek, /* read and write both use no f_pos */
      };
      
      @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write_no_fpos.write_f;
      @@
      struct file_operations fops = {
      ... .write = write_f, ...
      +.llseek = noop_llseek, /* write uses no f_pos */
      };
      
      @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      @@
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = noop_llseek, /* read uses no f_pos */
      };
      
      @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      @@
      struct file_operations fops = {
      ...
      +.llseek = noop_llseek, /* no read or write fn */
      };
      ===== End semantic patch =====
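The rule order encoded by the semantic patch can be summarised as a plain C function. This is a hypothetical paraphrase of the comments at the top of the patch, and toy_fops_facts and toy_pick_llseek() are made-up names. It does not reproduce Coccinelle's matching, and corner cases, such as a file_operations structure that matches several rules, may resolve differently in the real patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Facts the semantic patch inspects for one file_operations instance. */
struct toy_fops_facts {
    bool has_llseek;            /* .llseek already set */
    bool open_is_nonseekable;   /* open (directly or nested) calls nonseekable_open */
    bool read_is_seq_read;      /* .read = seq_read */
    bool has_readdir;           /* .readdir present */
    bool read_uses_fpos;        /* .read reads or writes *off */
    bool write_uses_fpos;       /* .write reads or writes *off */
};

/* Returns the llseek method to add, or NULL when .llseek is already set. */
static const char *toy_pick_llseek(const struct toy_fops_facts *f)
{
    if (f->has_llseek)
        return NULL;
    if (f->open_is_nonseekable)
        return "no_llseek";
    if (f->read_is_seq_read)
        return "seq_lseek";
    if (f->has_readdir || f->read_uses_fpos || f->write_uses_fpos)
        return "default_llseek";
    return "noop_llseek";
}
```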
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Julia Lawall <julia@diku.dk>
      Cc: Christoph Hellwig <hch@infradead.org>