1. 01 Nov, 2020 · 1 commit
    • [tf.data] Minor cleanup · 6679bc94
      Authored by Jiri Simsa
      PiperOrigin-RevId: 340071266
      Change-Id: Ic21209a25a1f8efa1122c9cee4a8ab3b8043c308
  2. 25 Sep, 2020 · 1 commit
  3. 17 Sep, 2020 · 1 commit
    • [tf.data] Add dataset splitting mechanism. · 5703a4ee
      Authored by Andrew Audibert
      This CL introduces the concept of a SplitProvider. A SplitProvider produces a sequence of "split" tensors which are interpreted by source datasets to produce dataset elements.
      
      When we initialize an iterator, a SplitProvider can be passed through the IteratorContext to indicate that the iterator should only iterate through the splits provided by the SplitProvider.
      
      This CL adds an optional DatasetBase::MakeSplitIterator method which creates a SplitIterator to produce splits for the dataset. For non-source datasets, the proper implementation is generally just to call MakeSplitIterator on their input. To support this reasonable default, we add a `DatasetBase::InputDatasets` method, which produces the input datasets for a dataset. If a dataset implements InputDatasets and has a single input dataset, MakeSplitIterator will delegate to that input by default.
      
      This CL only implements splitting for range_dataset_op; other splitting implementations will come in later CLs. This CL also implements a `ShardingSplitProvider` to better test the range_dataset_op splitting implementation. `ShardingSplitProvider` will be useful in its own right for implementing an alternative to AutoShard which leverages splitting.
      
      PiperOrigin-RevId: 332056019
      Change-Id: I73b9b03cb91ae689c57a72fa6ba0acd092cf4cbe
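      A minimal Python sketch of the splitting mechanism described above. This is illustrative only: the real SplitProvider is a C++ class, and the names (`next_split`, `RangeSplitProvider`) are hypothetical stand-ins for the C++ API.

      ```python
      # Hypothetical Python stand-ins for the C++ SplitProvider mechanism.

      class RangeSplitProvider:
          """Produces integer "splits"; a range dataset would map each one
          to a dataset element."""
          def __init__(self, n):
              self._next, self._n = 0, n

          def next_split(self):
              if self._next >= self._n:
                  return None  # end of the split sequence
              split, self._next = self._next, self._next + 1
              return split

      class ShardingSplitProvider:
          """Wraps a base provider and keeps every num_shards-th split,
          mirroring the ShardingSplitProvider the commit describes."""
          def __init__(self, base, num_shards, shard_index):
              self._base, self._num_shards, self._shard_index = base, num_shards, shard_index
              self._count = 0

          def next_split(self):
              while (split := self._base.next_split()) is not None:
                  keep = self._count % self._num_shards == self._shard_index
                  self._count += 1
                  if keep:
                      return split
              return None

      provider = ShardingSplitProvider(RangeSplitProvider(10), num_shards=3, shard_index=1)
      splits = []
      while (s := provider.next_split()) is not None:
          splits.append(s)
      print(splits)  # [1, 4, 7]
      ```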
  4. 15 Sep, 2020 · 1 commit
  5. 03 Sep, 2020 · 1 commit
  6. 29 Jul, 2020 · 1 commit
  7. 23 Jul, 2020 · 1 commit
  8. 07 Apr, 2020 · 1 commit
    • [tf.data] Adding a metric for bytes produced and consumed by individual... · eabc157f
      Authored by Jiri Simsa
      [tf.data] Adding a metric for bytes produced and consumed by individual transformations, refactoring infrastructure for recording tf.data metrics, and moving the metrics API and implementation from `common_runtime` to `framework`.
      
      PiperOrigin-RevId: 305062865
      Change-Id: I63911f00154baf36aa225f66dbef0843239b7392
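      A rough, hypothetical sketch of what per-transformation byte accounting means conceptually; the real counters live in the C++ `framework` metrics layer, and the function names below are invented for illustration.

      ```python
      # Hypothetical sketch: counters keyed by transformation name, recording
      # bytes consumed (input) and produced (output) by each transformation.

      from collections import defaultdict

      _bytes_consumed = defaultdict(int)
      _bytes_produced = defaultdict(int)

      def record_bytes_consumed(transformation: str, num_bytes: int) -> None:
          _bytes_consumed[transformation] += num_bytes

      def record_bytes_produced(transformation: str, num_bytes: int) -> None:
          _bytes_produced[transformation] += num_bytes

      # e.g. a "Map" transformation that consumed 4 KB and produced 16 KB:
      record_bytes_consumed("Map", 4096)
      record_bytes_produced("Map", 16384)
      print(_bytes_produced["Map"])  # 16384
      ```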
  9. 18 Mar, 2020 · 1 commit
  10. 06 Mar, 2020 · 1 commit
  11. 07 Feb, 2020 · 1 commit
  12. 26 Nov, 2019 · 1 commit
  13. 17 Aug, 2019 · 1 commit
  14. 08 Aug, 2019 · 1 commit
    • [tf.data] Serialization and checkpointing related cleanup. · 6d8f05ac
      Authored by Jiri Simsa
      This CL:
      - removes unused `DatasetBase::Save()` and related tests
      - replaces `SerializationContext::optimization_only` with multiple functionality-specific flags (`check_external_state`, `fail_if_unimplemented`, and `serialize_data_tensors`)
      - introduces `DatasetBase::CheckExternalState` as an error-raising replacement for `DatasetBase::IsStateful`, making it possible to communicate the reason why serialization failed through the error status
      - adds `IteratorBase::SaveInternal` and `IteratorBase::RestoreInternal` in preparation for making these methods pure virtual
      
      PiperOrigin-RevId: 262235093
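      A small Python sketch of the error-raising pattern behind `CheckExternalState`: instead of a bare boolean `IsStateful`, the check fails with a status/exception that carries the reason serialization was refused. The class and messages below are illustrative, not TensorFlow API.

      ```python
      class ExternalStateError(Exception):
          """Stands in for a failed C++ Status explaining the external state."""

      class FileBackedDataset:
          def __init__(self, path):
              self._path = path

          # Old style: a bare boolean cannot say *why* serialization must fail.
          def is_stateful(self):
              return True

          # New style: the raised error carries the reason in its message.
          def check_external_state(self):
              raise ExternalStateError(
                  f"Cannot serialize dataset: it depends on external file {self._path!r}")

      try:
          FileBackedDataset("/tmp/input.txt").check_external_state()
      except ExternalStateError as err:
          print(err)  # the caller can surface the precise reason to the user
      ```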
  15. 26 Jul, 2019 · 1 commit
    • [tf.data] Changing the implementation of iterator checkpointing to not store the dataset graph. · 1f878734
      Authored by Jiri Simsa
      After this change, restoring an iterator from a checkpoint requires initializing it with a dataset that matches the one used to initialize the iterator from which the checkpoint was created. In other words, if the Python definition of the input pipeline changes, restoring the iterator will fail.
      
      The motivation for this change is to make it possible to save (and restore) datasets whose graph cannot be serialized (e.g. because it contains ops with resource inputs). This will in turn allow tf.data to implement "reshuffle each iteration" or in-memory caching shared between different Python iterators over the same dataset.
      
      PiperOrigin-RevId: 260144783
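      A sketch of how this behavior surfaces in the modern Python API (via `tf.train.Checkpoint`, whose exact surface post-dates this commit): the checkpoint stores iterator state rather than the dataset graph, so restoring requires an iterator built from a matching dataset definition.

      ```python
      import tensorflow as tf

      dataset = tf.data.Dataset.range(10)
      iterator = iter(dataset)
      ckpt = tf.train.Checkpoint(iterator=iterator)

      print(next(iterator).numpy())  # 0
      print(next(iterator).numpy())  # 1
      path = ckpt.save("/tmp/tf_data_ckpt")  # saves iterator position, not the graph

      # Restoring requires re-creating an iterator from a matching dataset; if the
      # Python pipeline definition changed, the restore would fail.
      restored = iter(tf.data.Dataset.range(10))
      tf.train.Checkpoint(iterator=restored).restore(path)
      print(next(restored).numpy())  # 2 -- resumes where the checkpoint left off
      ```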
  16. 25 Jun, 2019 · 1 commit
  17. 04 Apr, 2019 · 1 commit
  18. 14 Mar, 2019 · 1 commit
  19. 02 Mar, 2019 · 1 commit
    • [tf.data] Add an unbounded thread pool to iterator resources. · 70da1fe2
      Authored by Derek Murray
      The previous implementation of many core `tf.data` transformations
      (e.g. `Dataset.prefetch()`) would create one or more threads each time
      an iterator over those datasets is created
      (e.g. `ds.prefetch(N).repeat(100)` would create and destroy 100
      threads). In addition to the overhead of thread creation, this
      interacts poorly with some malloc implementations, and can contribute
      to memory fragmentation.
      
      The new implementation maintains an unbounded pool of physical threads
      in each iterator (or `MultiDeviceIterator`) resource, and returns logical
      "threads" to that pool when their work is complete instead of exiting
      from them.
      
      PiperOrigin-RevId: 236413014
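      A self-contained sketch of the pattern described above: logical units of work are handed to long-lived physical threads, which return to an idle pool instead of exiting. This is an illustration in Python, not the C++ pool the commit adds.

      ```python
      import queue
      import threading

      class UnboundedThreadPool:
          def __init__(self):
              self._work = queue.SimpleQueue()
              self._idle = 0                 # physical threads waiting for work
              self._lock = threading.Lock()

          def schedule(self, fn):
              """Run fn on an idle physical thread; spawn one only if none is free."""
              with self._lock:
                  if self._idle == 0:
                      threading.Thread(target=self._worker, daemon=True).start()
                  else:
                      self._idle -= 1
              self._work.put(fn)

          def _worker(self):
              while True:
                  fn = self._work.get()      # block until work arrives
                  fn()
                  with self._lock:
                      self._idle += 1        # return this thread to the pool

      pool = UnboundedThreadPool()
      done = threading.Event()
      pool.schedule(done.set)                # reuses or spawns a physical thread
      done.wait()
      print("work completed on a pooled thread")
      ```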
  20. 15 Jan, 2019 · 1 commit
    • [tf.data] Add counters for tf.data elements, autotuning, and optimizations. · b74605a9
      Authored by Jiri Simsa
      This CL:
      - adds counters for tf.data elements, autotuning and optimizations
      - sets the number of iterations of the `tf_data_meta_optimizer` to one -- the iteration of tf.data optimizations is handled by the tf.data meta optimizer itself
      - adds the `alwayslink` attribute to all tf.data optimization BUILD targets to make sure they are always registered (without this, they would not be registered in the TensorFlow server binary I was using for local testing), and further cleans up visibility and dependencies of //third_party/tensorflow/core/grappler/optimizers/data/BUILD
      - introduces TFDataOptimizerBase as a base class for tf.data optimizations
      - moves TensorFlow metrics into tensorflow::metrics namespace
      
      PiperOrigin-RevId: 229302097
  21. 21 Dec, 2018 · 1 commit
  22. 05 Dec, 2018 · 2 commits
  23. 09 Nov, 2018 · 1 commit
  24. 06 Nov, 2018 · 1 commit
  25. 31 Oct, 2018 · 2 commits
  26. 26 Oct, 2018 · 1 commit
  27. 09 Oct, 2018 · 2 commits
  28. 04 Oct, 2018 · 2 commits
  29. 21 Sep, 2018 · 1 commit
  30. 18 Sep, 2018 · 1 commit
    • [tf.data] Adding support for `tf.data.AUTOTUNE` as a special value for the... · c8a0dfc7
      Authored by Jiri Simsa
      [tf.data] Adding support for `tf.data.AUTOTUNE` as a special value for the `num_parallel_calls` argument of `tf.data.Dataset.map()`, `tf.data.Dataset.interleave()`, and `tf.contrib.data.map_and_batch()`.
      
      When `tf.data.AUTOTUNE` is specified, the level of parallelism is determined at runtime. The underlying mechanism instruments the input pipeline to build a performance model and then uses the model to find the optimal values for the parallelism knobs.
      
      PiperOrigin-RevId: 213283297
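      Usage of the feature described above, in present-day spelling (at the time of this commit the constant lived under `tf.contrib.data`; it later moved to `tf.data.experimental.AUTOTUNE` and, in TF 2.4+, `tf.data.AUTOTUNE`):

      ```python
      import tensorflow as tf

      dataset = (
          tf.data.Dataset.range(1000)
          # Parallelism is chosen at runtime by tf.data's performance model.
          .map(lambda x: x * 2, num_parallel_calls=tf.data.AUTOTUNE)
          .batch(32)
          .prefetch(tf.data.AUTOTUNE)
      )

      for batch in dataset.take(1):
          print(batch.shape)  # (32,)
      ```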
  31. 12 Sep, 2018 · 1 commit
  32. 06 Sep, 2018 · 1 commit
  33. 14 Aug, 2018 · 1 commit
    • [tf.data] Internal refactoring of C++ classes and APIs. · 83f1458e
      Authored by Jiri Simsa
      - replacing `OpKernelContext` with newly introduced `DatasetContext` in `DatasetBase` constructor to make it possible to instantiate `DatasetBase` in places where an instance of `OpKernelContext` is not available
      
      - replacing the `dataset::MakeIteratorContext(OpKernelContext* ctx)` factory with an `IteratorContext(OpKernelContext* ctx)` constructor.
      
      - folding `GraphDatasetBase` into `DatasetBase` and removing the default implementation of `AsGraphDefInternal`, making derived classes responsible for implementing it, to encourage developers to provide serialization logic
      
      PiperOrigin-RevId: 208560010
  34. 11 Aug, 2018 · 2 commits
    • [tf.data] Optimization checkpointing improvements. · 8d532ac4
      Authored by Jiri Simsa
      This CL:
      - changes the `OptimizeDataset` checkpointing logic to checkpoint the optimized dataset (as opposed to checkpointing the original dataset plus the optimizations and re-running optimization every time a checkpoint is restored)
      - replaces `OpKernelContext` with newly introduced `SerializationContext` in the signature of `AsGraphDefInternal` to reduce the scope of the context and also simplify the logic for overriding the `FunctionLibraryDefinition` when optimizations take place
      
      PiperOrigin-RevId: 208282562
    • [tf.data] Minor API refactoring. · 0d1b1448
      Authored by Jiri Simsa
      Renaming `AddParentDataset`, `SaveParent`, and `RestoreParent` to `AddInputDataset`, `SaveInput`, and `RestoreInput`.
      
      PiperOrigin-RevId: 208272695
  35. 01 Jun, 2018 · 1 commit
    • [tf.data] Mark DebugString() as const. · 3e3dd647
      Authored by Brennan Saeta
      By marking DebugString() as const we can make some error messages more descriptive. Because DatasetIterator exposes its dataset() accessor with a const return value, a non-const DebugString() could not previously be called on the returned dataset; marking it const makes that possible.
      
      PiperOrigin-RevId: 198796894