Update doc with Scala First-N and Partition/Rebalance

2b4d7792 · Aljoscha Krettek · 66f236d2 · 2b4d7792 · 2b4d7792
隐藏空白更改
内联并排

Showing with 83 addition and 17 deletion

docs/dataset_transformations.md docs/dataset_transformations.md +55 -16

docs/programming_guide.md docs/programming_guide.md +28 -1

未找到文件。
--- a/docs/dataset_transformations.md
+++ b/docs/dataset_transformations.md
@@ -284,7 +284,7 @@ When using Case Classes you can also specify the grouping key using the names of

 ~~~scala
 case class MyClass(val a: String, b: Int, c: Double)
-val tuples = DataSet[MyClass]] = // [...]
+val tuples = DataSet[MyClass] = // [...]
 // group on the first and second field
 val reducedTuples = tuples.groupBy("a", "b").reduce { ... }
 ~~~
@@ -1103,15 +1103,11 @@ val unioned = vals1.union(vals2).union(vals3)
 </div>
 </div>

-### Rebalance (Java API Only)
-
+### Rebalance
 Evenly rebalances the parallel partitions of a DataSet to eliminate data skew.
-Only Map-like transformations may follow a rebalance transformation, i.e.,

- Map
- FlatMap
- Filter
- MapPartition
+<div class="codetabs" markdown="1">
+<div data-lang="java" markdown="1">

 ~~~java
 DataSet<String> in = // [...]
@@ -1120,16 +1116,26 @@ DataSet<Tuple2<String, String>> out = in.rebalance()
                                        .map(new Mapper());
 ~~~

-### Hash-Partition (Java API Only)
+</div>
+<div data-lang="scala" markdown="1">
+
+~~~scala
+val in: DataSet[String] = // [...]
+// rebalance DataSet and apply a Map transformation.
+val out = in.rebalance().map { ... }
+~~~
+
+</div>
+</div>
+
+
+### Hash-Partition

 Hash-partitions a DataSet on a given key. 
 Keys can be specified as key-selector functions or field position keys (see [Reduce examples](#reduce-on-grouped-dataset) for how to specify keys).
-Only Map-like transformations may follow a hash-partition transformation, i.e.,

- Map
- FlatMap
- Filter
- MapPartition
+<div class="codetabs" markdown="1">
+<div data-lang="java" markdown="1">

 ~~~java
 DataSet<Tuple2<String, Integer>> in = // [...]
@@ -1138,10 +1144,25 @@ DataSet<Tuple2<String, String>> out = in.partitionByHash(0)
                                        .mapPartition(new PartitionMapper());
 ~~~

-### First-n (Java API Only)
+</div>
+<div data-lang="scala" markdown="1">
+
+~~~scala
+val in: DataSet[(String, Int)] = // [...]
+// hash-partition DataSet by String value and apply a MapPartition transformation.
+val out = in.partitionByHash(0).mapPartition { ... }
+~~~
+
+</div>
+</div>
+
+### First-n

 Returns the first n (arbitrary) elements of a DataSet. First-n can be applied on a regular DataSet, a grouped DataSet, or a grouped-sorted DataSet. Grouping keys can be specified as key-selector functions or field position keys (see [Reduce examples](#reduce-on-grouped-dataset) for how to specify keys).

+<div class="codetabs" markdown="1">
+<div data-lang="java" markdown="1">
+
 ~~~java
 DataSet<Tuple2<String, Integer>> in = // [...]
 // Return the first five (arbitrary) elements of the DataSet
@@ -1155,4 +1176,22 @@ DataSet<Tuple2<String, Integer>> out2 = in.groupBy(0)
 DataSet<Tuple2<String, Integer>> out3 = in.groupBy(0)
                                          .sortGroup(1, Order.ASCENDING)
                                          .first(3);
-~~~
\ No newline at end of file
+~~~
+
+</div>
+<div data-lang="scala" markdown="1">
+
+~~~scala
+val in: DataSet[(String, Int)] = // [...]
+// Return the first five (arbitrary) elements of the DataSet
+val out1 = in.first(5)
+
+// Return the first two (arbitrary) elements of each String group
+val out2 = in.groupBy(0).first(2)
+
+// Return the first three elements of each String group ordered by the Integer field
+val out3 = in.groupBy(0).sortGroup(1, Order.ASCENDING).first(3)
+~~~
+
+</div>
+</div>
\ No newline at end of file
--- a/docs/programming_guide.md
+++ b/docs/programming_guide.md
@@ -608,7 +608,7 @@ DataSet<String> result = in.rebalance()
    <tr>
      <td><strong>Hash-Partition</strong></td>
      <td>
-        <p>Hash-partitions a data set on a given key. Keys can be specified as key-selector functions or field position keys. Only Map-like transformations may follow a hash-partition transformation. (Java API Only)</p>
+        <p>Hash-partitions a data set on a given key. Keys can be specified as key-selector functions or field position keys.</p>
 {% highlight java %}
 DataSet<Tuple2<String,Integer>> in = // [...]
 DataSet<Integer> result = in.partitionByHash(0)
@@ -804,6 +804,33 @@ val result: DataSet[(Int, String)] = data1.cross(data2)
        <p>Produces the union of two data sets.</p>
 {% highlight scala %}
 data.union(data2)
+{% endhighlight %}
+      </td>
+    </tr>
+    <tr>
+      <td><strong>Hash-Partition</strong></td>
+      <td>
+        <p>Hash-partitions a data set on a given key. Keys can be specified as key-selector functions, tuple positions
+        or case class fields.</p>
+{% highlight scala %}
+val in: DataSet[(Int, String)] = // [...]
+val result = in.partitionByHash(0).mapPartition { ... }
+{% endhighlight %}
+      </td>
+    </tr>
+    <tr>
+      <td><strong>First-n</strong></td>
+      <td>
+        <p>Returns the first n (arbitrary) elements of a data set. First-n can be applied on a regular data set, a grouped data set, or a grouped-sorted data set. Grouping keys can be specified as key-selector functions,
+        tuple positions or case class fields.</p>
+{% highlight scala %}
+val in: DataSet[(Int, String)] = // [...]
+// regular data set
+val result1 = in.first(3)
+// grouped data set
+val result2 = in.groupBy(0).first(3)
+// grouped-sorted data set
+val result3 = in.groupBy(0).sortGroup(1, Order.ASCENDING).first(3)
 {% endhighlight %}
      </td>
    </tr>