cxl: Use call_rcu to reduce latency when releasing the afu fd

The afu fd release path was identified as a significant bottleneck in the overall performance of cxl. While an optimal AFU design would minimise the need to close & reopen the AFU fd, that is not always practical to avoid.

The bottleneck seems to be down to the call to synchronize_rcu(), which blocks until a full RCU grace period has elapsed, i.e. until every thread that was inside an RCU read-side critical section is guaranteed to have left it. Replace it with call_rcu() to free the context structures later so we can return to the application sooner.

This reduces the time spent in the fd release path from 13356 usec to 13.3 usec - about a 1000x speed up.

Reported-by: Fei K Chen <uchen@cn.ibm.com>
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
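The deferred-free pattern described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `mock_rcu_head`, `mock_call_rcu()`, `mock_grace_period_elapsed()`, `mock_container_of()` and `demo_ctx` are hypothetical stand-ins invented for this sketch, not the kernel's RCU implementation and not the cxl driver's code. It shows only the shape of the change: the release path queues a callback and returns at once, and the callback later recovers the enclosing object from the embedded head and frees it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's struct rcu_head. */
struct mock_rcu_head {
	struct mock_rcu_head *next;
	void (*func)(struct mock_rcu_head *head);
};

/* Same idea as the kernel's container_of(): map a member back to its object. */
#define mock_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct mock_rcu_head *pending; /* callbacks waiting for the mock grace period */
static int freed_count;               /* number of contexts reclaimed so far */

/* Mock of call_rcu(): queue the callback and return without waiting. */
static void mock_call_rcu(struct mock_rcu_head *head,
			  void (*func)(struct mock_rcu_head *head))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Mock "grace period has ended": run every queued callback. */
static void mock_grace_period_elapsed(void)
{
	while (pending) {
		struct mock_rcu_head *head = pending;

		pending = head->next; /* read next before the callback frees head */
		head->func(head);
	}
}

/* Hypothetical context with an embedded head, like the new ctx->rcu field. */
struct demo_ctx {
	int pe;
	struct mock_rcu_head rcu;
};

/* Deferred part: recover the context from its embedded head and free it. */
static void demo_reclaim(struct mock_rcu_head *rcu)
{
	struct demo_ctx *ctx = mock_container_of(rcu, struct demo_ctx, rcu);

	free(ctx);
	freed_count++;
}

/* Fast part: unpublish (omitted in this sketch), then defer the free. */
static void demo_ctx_free(struct demo_ctx *ctx)
{
	mock_call_rcu(&ctx->rcu, demo_reclaim);
}

/* Returns 1 if the free was deferred until the mock grace period ended. */
static int demo_run(void)
{
	int start = freed_count;
	int after_release;
	struct demo_ctx *ctx = malloc(sizeof(*ctx));

	if (!ctx)
		return -1;
	ctx->pe = 1;

	demo_ctx_free(ctx);
	after_release = freed_count; /* nothing reclaimed yet */

	mock_grace_period_elapsed();
	return after_release == start && freed_count == start + 1;
}
```

Calling `demo_run()` from a `main()` exercises the whole sequence. In the real kernel the grace period is tracked by the RCU subsystem rather than triggered by hand, and the callback runs in a context where it must not sleep, so only non-blocking cleanup belongs in it.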

8ac75b96 · Ian Munsie · Michael Ellerman · e36f6fe1

Showing 12 additions and 5 deletions

drivers/misc/cxl/context.c +10 -5

drivers/misc/cxl/cxl.h +2 -0

--- a/drivers/misc/cxl/context.c
+++ b/drivers/misc/cxl/context.c
@@ -232,12 +232,9 @@ void cxl_context_detach_all(struct cxl_afu *afu)
 	mutex_unlock(&afu->contexts_lock);
 }
-void cxl_context_free(struct cxl_context *ctx)
+static void reclaim_ctx(struct rcu_head *rcu)
 {
-	mutex_lock(&ctx->afu->contexts_lock);
-	idr_remove(&ctx->afu->contexts_idr, ctx->pe);
-	mutex_unlock(&ctx->afu->contexts_lock);
-	synchronize_rcu();
+	struct cxl_context *ctx = container_of(rcu, struct cxl_context, rcu);
 	free_page((u64)ctx->sstp);
 	ctx->sstp = NULL;
@@ -245,3 +242,11 @@ void cxl_context_free(struct cxl_context *ctx)
 	put_pid(ctx->pid);
 	kfree(ctx);
 }
+void cxl_context_free(struct cxl_context *ctx)
+{
+	mutex_lock(&ctx->afu->contexts_lock);
+	idr_remove(&ctx->afu->contexts_idr, ctx->pe);
+	mutex_unlock(&ctx->afu->contexts_lock);
+	call_rcu(&ctx->rcu, reclaim_ctx);
+}
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -457,6 +457,8 @@ struct cxl_context {
 	bool pending_irq;
 	bool pending_fault;
 	bool pending_afu_err;
+	struct rcu_head rcu;
 };
 struct cxl {