Commit 2b4d7792, author: Aljoscha Krettek

Update doc with Scala First-N and Partition/Rebalance

Parent 66f236d2
......@@ -284,7 +284,7 @@ When using Case Classes you can also specify the grouping key using the names of
~~~scala
case class MyClass(val a: String, b: Int, c: Double)
val tuples = DataSet[MyClass]] = // [...]
val tuples = DataSet[MyClass] = // [...]
// group on the first and second field
val reducedTuples = tuples.groupBy("a", "b").reduce { ... }
~~~
......@@ -1103,15 +1103,11 @@ val unioned = vals1.union(vals2).union(vals3)
</div>
</div>
### Rebalance (Java API Only)
### Rebalance
Evenly rebalances the parallel partitions of a DataSet to eliminate data skew.
Only Map-like transformations may follow a rebalance transformation, i.e.,
- Map
- FlatMap
- Filter
- MapPartition
<div class="codetabs" markdown="1">
<div data-lang="java" markdown="1">
~~~java
DataSet<String> in = // [...]
......@@ -1120,16 +1116,26 @@ DataSet<Tuple2<String, String>> out = in.rebalance()
.map(new Mapper());
~~~
### Hash-Partition (Java API Only)
</div>
<div data-lang="scala" markdown="1">
~~~scala
val in: DataSet[String] = // [...]
// rebalance DataSet and apply a Map transformation.
val out = in.rebalance().map { ... }
~~~
</div>
</div>
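
To make the effect of rebalancing concrete, the following plain-Java sketch redistributes the elements of skewed partitions round-robin over the same number of partitions. It is an illustration only: `RebalanceSketch` and its `rebalance` method are hypothetical names that are not part of the Flink API, and round-robin assignment is assumed here as one simple way to even out partition sizes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of the Flink API.
class RebalanceSketch {

    // Redistributes all elements round-robin over the same number of partitions,
    // so that partition sizes differ by at most one element afterwards.
    static <T> List<List<T>> rebalance(List<List<T>> partitions) {
        int n = partitions.size();
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.add(new ArrayList<>());
        }
        int next = 0; // index of the partition that receives the next element
        for (List<T> partition : partitions) {
            for (T element : partition) {
                out.get(next).add(element);
                next = (next + 1) % n;
            }
        }
        return out;
    }
}
```

With one partition holding six elements and two empty partitions as input, each of the three output partitions ends up with two elements.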
### Hash-Partition
Hash-partitions a DataSet on a given key.
Keys can be specified as key-selector functions or field position keys (see [Reduce examples](#reduce-on-grouped-dataset) for how to specify keys).
Only Map-like transformations may follow a hash-partition transformation, i.e.,
- Map
- FlatMap
- Filter
- MapPartition
<div class="codetabs" markdown="1">
<div data-lang="java" markdown="1">
~~~java
DataSet<Tuple2<String, Integer>> in = // [...]
......@@ -1138,10 +1144,25 @@ DataSet<Tuple2<String, String>> out = in.partitionByHash(0)
.mapPartition(new PartitionMapper());
~~~
### First-n (Java API Only)
</div>
<div data-lang="scala" markdown="1">
~~~scala
val in: DataSet[(String, Int)] = // [...]
// hash-partition DataSet by String value and apply a MapPartition transformation.
val out = in.partitionByHash(0).mapPartition { ... }
~~~
</div>
</div>
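
The idea behind hash-partitioning can be sketched in plain Java: each element is assigned to a partition derived from the hash code of its key, so elements with equal keys always land in the same partition. This is a minimal sketch under assumptions, not Flink's implementation: `HashPartitionSketch`, its `partitionByHash` helper, and the use of `hashCode()` modulo the partition count are all chosen here for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration; not part of the Flink API.
class HashPartitionSketch {

    // Assigns each element to the partition given by its key's hash code
    // modulo the number of partitions. Equal keys map to the same partition.
    static <T, K> List<List<T>> partitionByHash(
            List<T> elements, Function<T, K> keySelector, int numPartitions) {
        List<List<T>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (T element : elements) {
            // floorMod keeps the index non-negative even for negative hash codes
            int index = Math.floorMod(keySelector.apply(element).hashCode(), numPartitions);
            partitions.get(index).add(element);
        }
        return partitions;
    }
}
```

Because the partition depends only on the key, a following MapPartition function sees all elements that share a key together, which is the property the examples above rely on.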
### First-n
Returns the first n (arbitrary) elements of a DataSet. First-n can be applied on a regular DataSet, a grouped DataSet, or a grouped-sorted DataSet. Grouping keys can be specified as key-selector functions or field position keys (see [Reduce examples](#reduce-on-grouped-dataset) for how to specify keys).
<div class="codetabs" markdown="1">
<div data-lang="java" markdown="1">
~~~java
DataSet<Tuple2<String, Integer>> in = // [...]
// Return the first five (arbitrary) elements of the DataSet
......@@ -1155,4 +1176,22 @@ DataSet<Tuple2<String, Integer>> out2 = in.groupBy(0)
DataSet<Tuple2<String, Integer>> out3 = in.groupBy(0)
.sortGroup(1, Order.ASCENDING)
.first(3);
~~~
\ No newline at end of file
~~~
</div>
<div data-lang="scala" markdown="1">
~~~scala
val in: DataSet[(String, Int)] = // [...]
// Return the first five (arbitrary) elements of the DataSet
val out1 = in.first(5)
// Return the first two (arbitrary) elements of each String group
val out2 = in.groupBy(0).first(2)
// Return the first three elements of each String group ordered by the Integer field
val out3 = in.groupBy(0).sortGroup(1, Order.ASCENDING).first(3)
~~~
</div>
</div>
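
The three First-n variants can be illustrated without Flink on an in-memory list. The sketch below is an assumption-laden illustration, not the library's implementation: `FirstNSketch`, `first`, and `firstPerGroup` are hypothetical names, and passing `null` as the comparator stands in for a grouped but unsorted data set.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration; not part of the Flink API.
class FirstNSketch {

    // First n elements of a list (fewer if the list is shorter than n).
    static <T> List<T> first(List<T> elements, int n) {
        return new ArrayList<>(elements.subList(0, Math.min(n, elements.size())));
    }

    // First n elements of each group. If groupOrder is non-null, every group
    // is sorted with it before the first n elements are taken.
    static <T, K> List<T> firstPerGroup(
            List<T> elements, Function<T, K> keySelector, Comparator<T> groupOrder, int n) {
        Map<K, List<T>> groups = new LinkedHashMap<>();
        for (T element : elements) {
            groups.computeIfAbsent(keySelector.apply(element), k -> new ArrayList<>())
                  .add(element);
        }
        List<T> result = new ArrayList<>();
        for (List<T> group : groups.values()) {
            if (groupOrder != null) {
                group.sort(groupOrder);
            }
            result.addAll(first(group, n));
        }
        return result;
    }
}
```

In a distributed setting the elements returned for an unsorted data set or group are arbitrary, whereas this sketch simply follows list order; only the grouped-sorted case gives a deterministic selection.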
\ No newline at end of file
......@@ -608,7 +608,7 @@ DataSet<String> result = in.rebalance()
<tr>
<td><strong>Hash-Partition</strong></td>
<td>
<p>Hash-partitions a data set on a given key. Keys can be specified as key-selector functions or field position keys. Only Map-like transformations may follow a hash-partition transformation. (Java API Only)</p>
<p>Hash-partitions a data set on a given key. Keys can be specified as key-selector functions or field position keys.</p>
{% highlight java %}
DataSet<Tuple2<String,Integer>> in = // [...]
DataSet<Integer> result = in.partitionByHash(0)
......@@ -804,6 +804,33 @@ val result: DataSet[(Int, String)] = data1.cross(data2)
<p>Produces the union of two data sets.</p>
{% highlight scala %}
data.union(data2)
{% endhighlight %}
</td>
</tr>
<tr>
<td><strong>Hash-Partition</strong></td>
<td>
<p>Hash-partitions a data set on a given key. Keys can be specified as key-selector functions, tuple positions
or case class fields.</p>
{% highlight scala %}
val in: DataSet[(Int, String)] = // [...]
val result = in.partitionByHash(0).mapPartition { ... }
{% endhighlight %}
</td>
</tr>
<tr>
<td><strong>First-n</strong></td>
<td>
<p>Returns the first n (arbitrary) elements of a data set. First-n can be applied on a regular data set, a grouped data set, or a grouped-sorted data set. Grouping keys can be specified as key-selector functions,
tuple positions or case class fields.</p>
{% highlight scala %}
val in: DataSet[(Int, String)] = // [...]
// regular data set
val result1 = in.first(3)
// grouped data set
val result2 = in.groupBy(0).first(3)
// grouped-sorted data set
val result3 = in.groupBy(0).sortGroup(1, Order.ASCENDING).first(3)
{% endhighlight %}
</td>
</tr>
......