Class SparkUtils

java.lang.Object
  org.apache.sysds.runtime.instructions.spark.utils.SparkUtils

public class SparkUtils extends Object
Field Summary

Fields
  static org.apache.spark.storage.StorageLevel DEFAULT_TMP
Constructor Summary

Constructors
  SparkUtils()
Method Summary

All Methods | Static Methods | Concrete Methods

  static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixCell> cacheBinaryCellRDD(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixCell> input)

  static void checkSparsity(String varname, ExecutionContext ec)

  static DataCharacteristics computeDataCharacteristics(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixCell> input)
    Utility to compute dimensions and non-zeros in a given RDD of binary cells.

  static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> copyBinaryBlockMatrix(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)
    Creates a partitioning-preserving deep copy of the input matrix RDD, where the indexes and values are copied.

  static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> copyBinaryBlockMatrix(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, boolean deep)
    Creates a partitioning-preserving copy of the input matrix RDD.

  static org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> copyBinaryBlockTensor(org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> in)
    Creates a partitioning-preserving deep copy of the input tensor RDD, where the indexes and values are copied.

  static org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> copyBinaryBlockTensor(org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> in, boolean deep)
    Creates a partitioning-preserving copy of the input tensor RDD.

  static List<scala.Tuple2<Long,FrameBlock>> fromIndexedFrameBlock(List<Pair<Long,FrameBlock>> in)

  static scala.Tuple2<Long,FrameBlock> fromIndexedFrameBlock(Pair<Long,FrameBlock> in)

  static List<scala.Tuple2<MatrixIndexes,MatrixBlock>> fromIndexedMatrixBlock(List<IndexedMatrixValue> in)

  static scala.Tuple2<MatrixIndexes,MatrixBlock> fromIndexedMatrixBlock(IndexedMatrixValue in)

  static List<Pair<MatrixIndexes,MatrixBlock>> fromIndexedMatrixBlockToPair(List<IndexedMatrixValue> in)

  static Pair<MatrixIndexes,MatrixBlock> fromIndexedMatrixBlockToPair(IndexedMatrixValue in)

  static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> getEmptyBlockRDD(org.apache.spark.api.java.JavaSparkContext sc, DataCharacteristics mc)
    Creates an RDD of empty blocks according to the given matrix characteristics.

  static long getNonZeros(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> input)

  static long getNonZeros(MatrixObject mo)

  static int getNumPreferredPartitions(DataCharacteristics dc)

  static int getNumPreferredPartitions(DataCharacteristics dc, boolean outputEmptyBlocks)

  static int getNumPreferredPartitions(DataCharacteristics dc, org.apache.spark.api.java.JavaPairRDD<?,?> in)

  static String getPrefixFromSparkDebugInfo(String line)

  static String getStartLineFromSparkDebugInfo(String line)

  static boolean isHashPartitioned(org.apache.spark.api.java.JavaPairRDD<?,?> in)
    Indicates if the input RDD is hash partitioned, i.e., it has a partitioner of type org.apache.spark.HashPartitioner.

  static void postprocessUltraSparseOutput(MatrixObject mo, DataCharacteristics mcOut)

  static Pair<Long,FrameBlock> toIndexedFrameBlock(scala.Tuple2<Long,FrameBlock> in)

  static List<Pair<Long,Long>> toIndexedLong(List<scala.Tuple2<Long,Long>> in)

  static IndexedMatrixValue toIndexedMatrixBlock(MatrixIndexes ix, MatrixBlock mb)

  static IndexedMatrixValue toIndexedMatrixBlock(scala.Tuple2<MatrixIndexes,MatrixBlock> in)

  static IndexedTensorBlock toIndexedTensorBlock(TensorIndexes ix, TensorBlock mb)

  static IndexedTensorBlock toIndexedTensorBlock(scala.Tuple2<TensorIndexes,TensorBlock> in)
Method Detail
-
toIndexedMatrixBlock
public static IndexedMatrixValue toIndexedMatrixBlock(scala.Tuple2<MatrixIndexes,MatrixBlock> in)
-
toIndexedMatrixBlock
public static IndexedMatrixValue toIndexedMatrixBlock(MatrixIndexes ix, MatrixBlock mb)
-
toIndexedTensorBlock
public static IndexedTensorBlock toIndexedTensorBlock(scala.Tuple2<TensorIndexes,TensorBlock> in)
-
toIndexedTensorBlock
public static IndexedTensorBlock toIndexedTensorBlock(TensorIndexes ix, TensorBlock mb)
-
fromIndexedMatrixBlock
public static scala.Tuple2<MatrixIndexes,MatrixBlock> fromIndexedMatrixBlock(IndexedMatrixValue in)
-
fromIndexedMatrixBlock
public static List<scala.Tuple2<MatrixIndexes,MatrixBlock>> fromIndexedMatrixBlock(List<IndexedMatrixValue> in)
-
fromIndexedMatrixBlockToPair
public static Pair<MatrixIndexes,MatrixBlock> fromIndexedMatrixBlockToPair(IndexedMatrixValue in)
-
fromIndexedMatrixBlockToPair
public static List<Pair<MatrixIndexes,MatrixBlock>> fromIndexedMatrixBlockToPair(List<IndexedMatrixValue> in)
-
fromIndexedFrameBlock
public static scala.Tuple2<Long,FrameBlock> fromIndexedFrameBlock(Pair<Long,FrameBlock> in)
-
fromIndexedFrameBlock
public static List<scala.Tuple2<Long,FrameBlock>> fromIndexedFrameBlock(List<Pair<Long,FrameBlock>> in)
-
toIndexedLong
public static List<Pair<Long,Long>> toIndexedLong(List<scala.Tuple2<Long,Long>> in)
-
toIndexedFrameBlock
public static Pair<Long,FrameBlock> toIndexedFrameBlock(scala.Tuple2<Long,FrameBlock> in)
-
isHashPartitioned
public static boolean isHashPartitioned(org.apache.spark.api.java.JavaPairRDD<?,?> in)
Indicates if the input RDD is hash partitioned, i.e., it has a partitioner of type org.apache.spark.HashPartitioner.

Parameters:
  in - input JavaPairRDD
Returns:
  true if input is hash partitioned
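The check boils down to a partitioner-type test. The sketch below is a plain-Java analog using minimal stand-in types (the real method inspects the RDD's org.apache.spark.Partitioner, which is not modeled here):

```java
// Minimal stand-in types illustrating the partitioner-type check that
// isHashPartitioned performs; these are hypothetical, not the Spark classes.
interface Partitioner {}
class HashPartitioner implements Partitioner {}
class RangePartitioner implements Partitioner {}

class PartitionerCheck {
    // Analog of SparkUtils.isHashPartitioned: true only if a partitioner
    // is present and it is a HashPartitioner.
    static boolean isHashPartitioned(Partitioner p) {
        return p instanceof HashPartitioner;
    }

    public static void main(String[] args) {
        System.out.println(isHashPartitioned(new HashPartitioner()));  // true
        System.out.println(isHashPartitioned(new RangePartitioner())); // false
        System.out.println(isHashPartitioned(null));                   // false: no partitioner at all
    }
}
```

Note that `instanceof` is false for null, which mirrors an RDD without any partitioner.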
-
getNumPreferredPartitions
public static int getNumPreferredPartitions(DataCharacteristics dc, org.apache.spark.api.java.JavaPairRDD<?,?> in)
-
getNumPreferredPartitions
public static int getNumPreferredPartitions(DataCharacteristics dc)
-
getNumPreferredPartitions
public static int getNumPreferredPartitions(DataCharacteristics dc, boolean outputEmptyBlocks)
-
copyBinaryBlockMatrix
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> copyBinaryBlockMatrix(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)
Creates a partitioning-preserving deep copy of the input matrix RDD, where the indexes and values are copied.

Parameters:
  in - matrix as JavaPairRDD<MatrixIndexes,MatrixBlock>
Returns:
  matrix as JavaPairRDD<MatrixIndexes,MatrixBlock>
-
copyBinaryBlockMatrix
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> copyBinaryBlockMatrix(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, boolean deep)
Creates a partitioning-preserving copy of the input matrix RDD. If a deep copy is requested, indexes and values are copied, otherwise they are simply passed through.

Parameters:
  in - matrix as JavaPairRDD<MatrixIndexes,MatrixBlock>
  deep - if true, perform deep copy
Returns:
  matrix as JavaPairRDD<MatrixIndexes,MatrixBlock>
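The practical difference between the two modes is whether the output shares block objects with the input. A plain-Java sketch of the semantics, using (index, double[]) entries as a hypothetical stand-in for (MatrixIndexes, MatrixBlock) pairs (the real method operates on the RDD, not on lists):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class CopySemantics {
    // deep=true: clone each value array; deep=false: pass values through.
    static List<Map.Entry<Long, double[]>> copy(List<Map.Entry<Long, double[]>> in, boolean deep) {
        List<Map.Entry<Long, double[]>> out = new ArrayList<>();
        for (Map.Entry<Long, double[]> e : in)
            out.add(new SimpleEntry<>(e.getKey(),
                deep ? e.getValue().clone() : e.getValue()));
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<Long, double[]>> src = new ArrayList<>();
        src.add(new SimpleEntry<>(1L, new double[]{1, 2, 3}));

        List<Map.Entry<Long, double[]>> shallow = copy(src, false);
        List<Map.Entry<Long, double[]>> deep    = copy(src, true);

        src.get(0).getValue()[0] = 99; // mutate the source block
        System.out.println(shallow.get(0).getValue()[0]); // 99.0 - shares the array
        System.out.println(deep.get(0).getValue()[0]);    // 1.0  - independent copy
    }
}
```

This is why a deep copy is needed whenever downstream operations mutate blocks in place while the original RDD is still referenced.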
-
copyBinaryBlockTensor
public static org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> copyBinaryBlockTensor(org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> in)
Creates a partitioning-preserving deep copy of the input tensor RDD, where the indexes and values are copied.

Parameters:
  in - tensor as JavaPairRDD<TensorIndexes,BasicTensorBlock>
Returns:
  tensor as JavaPairRDD<TensorIndexes,BasicTensorBlock>
-
copyBinaryBlockTensor
public static org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> copyBinaryBlockTensor(org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> in, boolean deep)
Creates a partitioning-preserving copy of the input tensor RDD. If a deep copy is requested, indexes and values are copied, otherwise they are simply passed through.

Parameters:
  in - tensor as JavaPairRDD<TensorIndexes,BasicTensorBlock>
  deep - if true, perform deep copy
Returns:
  tensor as JavaPairRDD<TensorIndexes,BasicTensorBlock>
-
checkSparsity
public static void checkSparsity(String varname, ExecutionContext ec)
-
getEmptyBlockRDD
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> getEmptyBlockRDD(org.apache.spark.api.java.JavaSparkContext sc, DataCharacteristics mc)
Creates an RDD of empty blocks according to the given matrix characteristics. This is done in a scalable manner by parallelizing block ranges and generating the empty blocks in a distributed fashion, with awareness of preferred output partition sizes.

Parameters:
  sc - spark context
  mc - matrix characteristics
Returns:
  pair RDD of empty matrix blocks
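The block ranges being parallelized are just the grid of block positions implied by the matrix dimensions and block size. A plain-Java sketch of that enumeration (a hypothetical analog; the real method distributes these ranges as an RDD and instantiates an empty MatrixBlock per position):

```java
import java.util.ArrayList;
import java.util.List;

class EmptyBlockIndexes {
    // Enumerate all (rowBlockIndex, colBlockIndex) positions for an
    // rlen x clen matrix with square blocks of side blen.
    static List<long[]> blockIndexes(long rlen, long clen, int blen) {
        long nrb = (rlen + blen - 1) / blen; // ceil(rlen/blen) row blocks
        long ncb = (clen + blen - 1) / blen; // ceil(clen/blen) column blocks
        List<long[]> out = new ArrayList<>();
        for (long i = 1; i <= nrb; i++)      // block indexes are 1-based
            for (long j = 1; j <= ncb; j++)
                out.add(new long[]{i, j});
        return out;
    }

    public static void main(String[] args) {
        // a 2500 x 1700 matrix with 1000 x 1000 blocks -> 3 x 2 = 6 blocks
        System.out.println(blockIndexes(2500, 1700, 1000).size()); // 6
    }
}
```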
-
cacheBinaryCellRDD
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixCell> cacheBinaryCellRDD(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixCell> input)
-
computeDataCharacteristics
public static DataCharacteristics computeDataCharacteristics(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixCell> input)
Utility to compute dimensions and non-zeros in a given RDD of binary cells.

Parameters:
  input - matrix as JavaPairRDD<MatrixIndexes,MatrixCell>
Returns:
  matrix characteristics
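Conceptually this is a single aggregation over the cells: the dimensions are the maximum row and column indexes seen, and the non-zero count is the number of cells with a non-zero value. A plain-Java sketch of that aggregation (a hypothetical analog; the real method aggregates over the RDD rather than an array):

```java
class CellStats {
    // Each cell is {rowIndex, colIndex, value}; returns {rows, cols, nnz}.
    static long[] characteristics(double[][] cells) {
        long rows = 0, cols = 0, nnz = 0;
        for (double[] c : cells) {
            rows = Math.max(rows, (long) c[0]); // dims = max observed indexes
            cols = Math.max(cols, (long) c[1]);
            if (c[2] != 0)                      // count non-zero values only
                nnz++;
        }
        return new long[]{rows, cols, nnz};
    }

    public static void main(String[] args) {
        double[][] cells = {{1, 1, 7.0}, {3, 2, 0.0}, {2, 4, -1.5}};
        long[] dc = characteristics(cells);
        System.out.println(dc[0] + " x " + dc[1] + ", nnz=" + dc[2]); // 3 x 4, nnz=2
    }
}
```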
-
getNonZeros
public static long getNonZeros(MatrixObject mo)
-
getNonZeros
public static long getNonZeros(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> input)
-
postprocessUltraSparseOutput
public static void postprocessUltraSparseOutput(MatrixObject mo, DataCharacteristics mcOut)