Design Doc: Selected Rows

SelectedRows is a sparse tensor data type designed to support embedding operators. The gradient of an embedding table is a sparse tensor: only a few of its rows contain non-zero values. Such a tensor can be represented straightforwardly by the following data structure:

class SelectedRows {
 private:
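  std::vector<int> rows_;  // indices of the rows that hold values
  Tensor value_;           // values of those rows, shape [rows_.size(), ...]
  int height_;             // first dimension of the represented tensor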
};

The height_ field is the first dimension of the SelectedRows. The rows_ field holds the indices of the rows that are non-zero. The value_ field is an N-dimensional tensor of shape [rows_.size() /* NUM_ROWS */, ...] that supplies the values for each of those rows. The shape of a SelectedRows is therefore [height_] + value_.shape[1:].
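The shape rule above can be sketched in C++ as follows. This is only an illustration: the function name SelectedRowsDims is hypothetical, and dimensions are held in a plain std::vector<int64_t> rather than in whatever dimension type the framework uses.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: computes [height] + value_dims[1:].
// value_dims is the shape of value_, i.e. [NUM_ROWS, ...].
std::vector<int64_t> SelectedRowsDims(int64_t height,
                                      const std::vector<int64_t>& value_dims) {
  assert(!value_dims.empty());
  std::vector<int64_t> dims = value_dims;
  dims[0] = height;  // replace NUM_ROWS with the full height
  return dims;
}
```

For instance, with an assumed height of 100 and a value_ of shape [2, 2], the resulting shape would be [100, 2].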

Suppose that a SelectedRows-typed variable x has many rows, but only two of them have values: row 73 is [1, 2] and row 84 is [3, 4]. The SelectedRows representation would be:

x = SelectedRows {
  rows = [73, 84],
  value = [[1, 2], [3, 4]]
}
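To make the relationship between this representation and the equivalent dense tensor concrete, here is a minimal sketch that expands a SelectedRows into a dense buffer. It rests on several assumptions that are not part of the design: the struct SelectedRowsSketch and the function ToDense are hypothetical names, the data is restricted to 2-D float values stored row-major in a std::vector instead of a Tensor, and duplicated row indices are assumed to accumulate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for SelectedRows, limited to 2-D float data so that
// it compiles without the framework's Tensor type.
struct SelectedRowsSketch {
  std::vector<int64_t> rows;  // indices of the rows that carry values
  std::vector<float> value;   // rows.size() x width, row-major
  std::size_t height;         // first dimension of the logical dense tensor
  std::size_t width;          // second dimension, i.e. value_.shape[1]
};

// Expands the sparse representation into a dense height x width buffer.
// Rows that are not listed stay zero. Duplicated row indices are summed,
// which is an assumption of this sketch.
std::vector<float> ToDense(const SelectedRowsSketch& x) {
  assert(x.value.size() == x.rows.size() * x.width);
  std::vector<float> dense(x.height * x.width, 0.0f);
  for (std::size_t i = 0; i < x.rows.size(); ++i) {
    assert(x.rows[i] >= 0);
    const std::size_t r = static_cast<std::size_t>(x.rows[i]);
    assert(r < x.height);
    for (std::size_t j = 0; j < x.width; ++j) {
      dense[r * x.width + j] += x.value[i * x.width + j];
    }
  }
  return dense;
}
```

With the example above and an assumed height of 100, ToDense would return a 100 x 2 buffer in which only rows 73 and 84 are non-zero.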

SelectedRows in Protobuf

SelectedRows is a kind of Variable, so VarDesc in protobuf should describe the SelectedRows information. Only the tensor dimensions of a SelectedRows are described at compile time, since rows_ and value_ depend on the training data. We therefore use TensorDesc to unify data_type and dims. A LodTensorDesc contains a TensorDesc and a lod_level, while the description of a SelectedRows is simply a TensorDesc.

message TensorDesc {
  required DataType data_type = 1;
  repeated int64 dims = 2; // [UNK, 640, 480] is saved as [-1, 640, 480]
}

message LodTensorDesc {
  required TensorDesc tensor = 1;
  optional int32 lod_level = 2;
}

message VarDesc {
  required string name = 1;
  enum VarType { 
    LOD_TENSOR = 0;
    SELECTED_ROWS = 1;
  }
  required VarType type = 2;
  optional LodTensorDesc lod_desc = 3;
  optional TensorDesc selected_rows_desc = 4;
  optional bool persistable = 5 [ default = false ];
}

InferShape for Selected Rows

Just like LoD information, the output tensor type is inferred by the InferShape method. Each operator should decide whether its output is a SelectedRows or a dense tensor.

For example, the gradient operator of TableLookup will always generate a SelectedRows. Its InferShape method should look like the following:

void TableLookupGrad::InferShape(context) {
  ...
  context.SetDataType("Embedding.Grad", kSelectedRows);
}

Sparse Operators

Several operators need to be written to support SelectedRows:

  1. Operators that generate a SelectedRows gradient, e.g. the gradient of TableLookupOp.
  2. Optimizer operators that support a SelectedRows gradient, e.g. SGD or AdaGrad for SelectedRows. However, there should be only one SGD operator; OpWithKernel::Run should select the suitable kernel for either a dense tensor or a SelectedRows.
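The idea in item 2, a single SGD entry point that picks a dense or a sparse kernel, could look roughly like the sketch below. Everything here is hypothetical and only illustrative: GradKind, GradSketch, SgdDenseKernel, SgdSparseKernel and RunSgd are invented names, the parameter is a flat row-major float buffer rather than a Tensor, and the switch merely stands in for the kernel selection that OpWithKernel::Run would perform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical gradient holder: either a dense table or selected rows.
enum class GradKind { kDense, kSelectedRows };

struct GradSketch {
  GradKind kind;
  std::vector<int64_t> rows;  // used only when kind == kSelectedRows
  std::vector<float> value;   // dense: whole table; sparse: rows.size() x width
};

// Dense kernel: every element of the parameter is updated.
void SgdDenseKernel(std::vector<float>& param, const GradSketch& g, float lr) {
  assert(g.value.size() == param.size());
  for (std::size_t i = 0; i < param.size(); ++i) {
    param[i] -= lr * g.value[i];
  }
}

// Sparse kernel: only the rows listed in g.rows are touched.
void SgdSparseKernel(std::vector<float>& param, std::size_t width,
                     const GradSketch& g, float lr) {
  assert(g.value.size() == g.rows.size() * width);
  for (std::size_t i = 0; i < g.rows.size(); ++i) {
    assert(g.rows[i] >= 0);
    const std::size_t r = static_cast<std::size_t>(g.rows[i]);
    assert((r + 1) * width <= param.size());
    for (std::size_t j = 0; j < width; ++j) {
      param[r * width + j] -= lr * g.value[i * width + j];
    }
  }
}

// Single entry point that selects the kernel from the gradient's type.
void RunSgd(std::vector<float>& param, std::size_t width,
            const GradSketch& g, float lr) {
  switch (g.kind) {
    case GradKind::kDense:
      SgdDenseKernel(param, g, lr);
      break;
    case GradKind::kSelectedRows:
      SgdSparseKernel(param, width, g, lr);
      break;
  }
}
```

The point of the sparse kernel is that its cost scales with the number of selected rows rather than with the full height of the embedding table.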