Unverified commit 91dd7f7b, authored by Xuan Ronaldo, committed by GitHub

[IOTDB-2438] update the user guide of Spark Connector (#4931)

Parent 336e88a8
## Spark-IoTDB
### Versions

The versions required for Spark and Java are as follows:
| Spark Version | Scala Version | Java Version | TsFile   |
| ------------- | ------------- | ------------ | -------- |
| `2.4.5`       | `2.12`        | `1.8`        | `0.13.0` |
> Currently we only support Spark 2.4.5. There are known issues with 2.4.7; do not use it.
### Install
```shell
mvn clean scala:compile compile install
```
#### Maven Dependency
```
<dependency>
    <groupId>org.apache.iotdb</groupId>
    <artifactId>spark-iotdb-connector</artifactId>
    <version>0.13.0</version>
</dependency>
```
#### spark-shell user guide
Notice: There is a conflict between the thrift versions used by IoTDB and Spark.
Therefore, if you want to debug in spark-shell, you need to execute `rm -f $SPARK_HOME/jars/libthrift*` and then `cp $IOTDB_HOME/lib/libthrift* $SPARK_HOME/jars/` to resolve it.
Otherwise, you can only debug the code in an IDE. If you want to run your task by `spark-submit`, you must package the dependencies into your jar.
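Put together, the fix is just the two commands quoted above (assuming `$SPARK_HOME` and `$IOTDB_HOME` point at your Spark and IoTDB installations):

```shell
# Remove Spark's bundled libthrift, then copy in the one shipped with IoTDB
rm -f $SPARK_HOME/jars/libthrift*
cp $IOTDB_HOME/lib/libthrift* $SPARK_HOME/jars/
```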
```
spark-shell --jars spark-iotdb-connector-0.13.0.jar,iotdb-jdbc-0.13.0-jar-with-dependencies.jar,iotdb-session-0.13.0-jar-with-dependencies.jar

import org.apache.iotdb.spark.db._
val df = spark.read.format("org.apache.iotdb.spark.db").option("url", "jdbc:iotdb://127.0.0.1:6667/").option("sql", "select * from root").load
df.printSchema()
df.show()
```
To partition the rdd:

```
spark-shell --jars spark-iotdb-connector-0.13.0.jar,iotdb-jdbc-0.13.0-jar-with-dependencies.jar,iotdb-session-0.13.0-jar-with-dependencies.jar

import org.apache.iotdb.spark.db._
val df = spark.read.format("org.apache.iotdb.spark.db").option("url", "jdbc:iotdb://127.0.0.1:6667/").option("sql", "select * from root").
                        option("lowerBound", [lower bound of the time you want to query (inclusive)]).option("upperBound", [upper bound of the time you want to query (inclusive)]).
                        option("numPartition", [the number of partitions you want]).load
df.printSchema()
df.show()
```

By default, query results come back as a wide table. You can also use the narrow table form (see part 4 about how to use the narrow form).

#### Transform between wide and narrow table
* from wide to narrow
```scala
import org.apache.iotdb.spark.db._
val wide_df = spark.read.format("org.apache.iotdb.spark.db").option("url", "jdbc:iotdb://127.0.0.1:6667/").option("sql", "select * from root where time < 1100 and time > 1000").load
val narrow_df = Transformer.toNarrowForm(spark, wide_df)
```
* from narrow to wide
```scala
import org.apache.iotdb.spark.db._
val wide_df = Transformer.toWideForm(spark, narrow_df)
```
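A dataframe should survive a round trip between the two forms; a minimal sketch, reusing the `wide_df` loaded in the first snippet:

```scala
import org.apache.iotdb.spark.db._

// wide -> narrow -> wide: the final frame should match the original wide schema
val narrow_df = Transformer.toNarrowForm(spark, wide_df)
val wide_again = Transformer.toWideForm(spark, narrow_df)
wide_again.show()
```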
#### Java user guide
```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.iotdb.spark.db.*;
public class Example {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .master("local[*]")
        .getOrCreate();

    Dataset<Row> df = spark.read().format("org.apache.iotdb.spark.db")
        .option("url", "jdbc:iotdb://127.0.0.1:6667/")
        .option("sql", "select * from root")
        .load();

    df.printSchema();

df.show();
Dataset<Row> narrowTable = Transformer.toNarrowForm(spark, df);
narrowTable.show();
}
}
```
### Write Data to IoTDB

#### User Guide
```scala
// import narrow table
val df = spark.createDataFrame(List(
      (1L, "root.test.d0", 1, 1L, 1.0F, 1.0D, true, "hello"),
      (2L, "root.test.d0", 2, 2L, 2.0F, 2.0D, false, "world")))

val dfWithColumn = df.withColumnRenamed("_1", "Time")
    .withColumnRenamed("_2", "device_name")
    .withColumnRenamed("_3", "s0")
    .withColumnRenamed("_4", "s1")
    .withColumnRenamed("_5", "s2")
    .withColumnRenamed("_6", "s3")
    .withColumnRenamed("_7", "s4")
    .withColumnRenamed("_8", "s5")

dfWithColumn.write.format("org.apache.iotdb.spark.db")
    .option("url", "jdbc:iotdb://127.0.0.1:6667/")
    .save
```
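A wide table can be written in exactly the same way; only the column layout differs, with each column after `Time` named by a full timeseries path. A minimal sketch (the `root.test.d0.s*` paths are illustrative):

```scala
// import wide table (illustrative timeseries paths as column names)
val df = spark.createDataFrame(List(
      (1L, 1, 1L, 1.0F, 1.0D, true, "hello"),
      (2L, 2, 2L, 2.0F, 2.0D, false, "world")))

val dfWithColumn = df.withColumnRenamed("_1", "Time")
    .withColumnRenamed("_2", "root.test.d0.s0")
    .withColumnRenamed("_3", "root.test.d0.s1")
    .withColumnRenamed("_4", "root.test.d0.s2")
    .withColumnRenamed("_5", "root.test.d0.s3")
    .withColumnRenamed("_6", "root.test.d0.s4")
    .withColumnRenamed("_7", "root.test.d0.s5")

dfWithColumn.write.format("org.apache.iotdb.spark.db")
    .option("url", "jdbc:iotdb://127.0.0.1:6667/")
    .save
```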
#### Notes
1. You can directly write data to IoTDB whether the dataframe contains a wide table or a narrow table.
2. The parameter `numPartition` is used to set the number of partitions. The dataframe that you want to save will be repartitioned based on this parameter before writing data. Each partition will open a session to write data, to increase the number of concurrent requests.
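For example, a minimal sketch of a write with an explicit partition count (the value `4` is arbitrary):

```scala
// Repartitions the dataframe into 4 partitions before writing;
// each partition opens its own session to IoTDB
dfWithColumn.write.format("org.apache.iotdb.spark.db")
    .option("url", "jdbc:iotdb://127.0.0.1:6667/")
    .option("numPartition", "4")
    .save
```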