Class CompressionSettings
- java.lang.Object
-
- org.apache.sysds.runtime.compress.CompressionSettings
-
public class CompressionSettings extends Object
Compression settings class, used as a bundle of parameters inside the compression framework. See CompressionSettingsBuilder for the defaults of the non-static parameters.
-
-
Field Summary
Fields
Modifier and Type | Field | Description
boolean | allowSharedDictionary | Share DDC Dictionaries between ColGroups.
static int | BITMAP_BLOCK_SZ | Size of the blocks used in a blocked bitmap representation.
double | coCodePercentage | A CoCode parameter whose behavior differs based on the compression method; in general, it reflects how aggressively coCoding is used.
CoCoderFactory.PartitionerType | columnPartitioner | The selected method for column partitioning used in CoCoding compressed columns.
CostEstimatorFactory.CostType | costComputationType | The cost computation type for the compression.
SampleEstimatorFactory.EstimationType | estimationType | The sample type used for sampling.
boolean | isInSparkInstruction | True if the compression runs inside a Spark instruction.
boolean | lossy | True if lossy compression is enabled.
int | maxColGroupCoCode | The maximum number of columns allowed to be CoCoded.
int | maxSampleSize | The maximum size of the sample extracted.
double | minimumCompressionRatio | The minimum compression ratio to achieve.
int | minimumSampleSize | The minimum size of the sample extracted.
static int | PAR_DDC_THRESHOLD | Parallelization threshold for DDC compression.
double | samplePower | The sampling ratio power to use when choosing sample size.
double | samplingRatio | The sampling ratio used when choosing ColGroups.
InsertionSorterFactory.SORT_TYPE | sdcSortType | The sorting type used in sorting/joining offsets to create SDC groups.
int | seed | If the seed is -1, the system uses the system time in milliseconds and the class hash for seeding.
boolean | sortTuplesByFrequency | Sorting values by physical length improves performance by 10-20%, especially for serial execution, but slightly decreases it for parallel (including multi-threaded) execution; it is therefore not applied for distributed operations (also because compression time and garbage collection increase).
boolean | transposed | Transpose input matrix, to optimize access when extracting bitmaps.
String | transposeInput | String specifying which transpose setting is used; can be auto, true, or false.
EnumSet<AColGroup.CompressionType> | validCompressions | Valid compressions list, containing the ColGroup CompressionTypes that are allowed to be used for the compression. The default is to always allow the Uncompressed ColGroup.
-
-
-
Field Detail
-
PAR_DDC_THRESHOLD
public static int PAR_DDC_THRESHOLD
Parallelization threshold for DDC compression
-
BITMAP_BLOCK_SZ
public static final int BITMAP_BLOCK_SZ
Size of the blocks used in a blocked bitmap representation. Note that it is exactly Character.MAX_VALUE. It is not Character.MAX_VALUE + 1 because that breaks the offsets in cases with fully dense values.
See Also:
- Constant Field Values
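The note about Character.MAX_VALUE can be illustrated with a small sketch. It assumes that row positions are split into a block index and an in-block offset stored as a char. The class and method names below are hypothetical and not part of SystemDS, and the overflow reading in the comments is one plausible interpretation of the note.

```java
// Hypothetical illustration; not SystemDS code.
class BitmapBlockSketch {
	// Same value the documentation names: Character.MAX_VALUE (65535).
	static final int BLOCK_SZ = Character.MAX_VALUE;

	/** Index of the block that contains the given row. */
	static int blockIndex(int row) {
		return row / BLOCK_SZ;
	}

	/** Offset of the row inside its block; always in 0..65534, so it fits in a char. */
	static char offsetInBlock(int row) {
		return (char) (row % BLOCK_SZ);
	}

	public static void main(String[] args) {
		// A count of 65536 entries (a fully dense block of size MAX_VALUE + 1)
		// would wrap to 0 when stored in a char.
		char wrapped = (char) (Character.MAX_VALUE + 1);
		System.out.println((int) wrapped);
		System.out.println(blockIndex(65535) + " " + (int) offsetInBlock(65535));
	}
}
```

With a block size of Character.MAX_VALUE, both the offsets and the number of entries in a fully dense block stay representable in a single char.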
-
sortTuplesByFrequency
public final boolean sortTuplesByFrequency
Sorting values by physical length improves performance by 10-20%, especially for serial execution, but slightly decreases it for parallel (including multi-threaded) execution. It is therefore not applied for distributed operations, also because compression time and garbage collection increase.
-
samplingRatio
public final double samplingRatio
Deprecated. The sampling ratio used when choosing ColGroups. Note that the default behavior is to use an exact estimator if the number of elements is below 1000.
-
samplePower
public final double samplePower
The sampling ratio power to use when choosing sample size. It is used according to the function: sampleSize += nRows^samplePower. The value is bounded to the range 0 to 1: a value of 1 gives a sample size of everything, and a value of 0 adds 1.
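The formula above can be sketched as follows. The clamping to a minimum and maximum size and to the number of rows is an assumption about how samplePower might combine with minimumSampleSize and maxSampleSize; the class and method names are hypothetical and not part of SystemDS.

```java
// Hypothetical illustration; not SystemDS code.
class SampleSizeSketch {
	/**
	 * Adds nRows^samplePower to a base size, then clamps the result to
	 * [minSize, maxSize] and finally to nRows (assumed clamping order).
	 */
	static int sampleSize(int nRows, int baseSize, double samplePower, int minSize, int maxSize) {
		if(samplePower < 0 || samplePower > 1)
			throw new IllegalArgumentException("samplePower must be in the range 0 to 1");
		double size = baseSize + Math.pow(nRows, samplePower);
		size = Math.max(size, minSize);
		size = Math.min(size, maxSize);
		return (int) Math.min(size, nRows);
	}

	public static void main(String[] args) {
		// samplePower = 1 samples every row; samplePower = 0 adds exactly 1.
		System.out.println(sampleSize(1000, 0, 1.0, 0, Integer.MAX_VALUE));
		System.out.println(sampleSize(1000, 10, 0.0, 0, Integer.MAX_VALUE));
	}
}
```

Values between 0 and 1 grow the sample sublinearly in the number of rows, which keeps sampling cheap on tall matrices.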
-
allowSharedDictionary
public final boolean allowSharedDictionary
Share DDC Dictionaries between ColGroups.
-
transposeInput
public final String transposeInput
String specifying which transpose setting is used; can be auto, true, or false.
-
seed
public final int seed
If the seed is -1, the system uses the system time in milliseconds and the class hash for seeding.
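A minimal sketch of this seeding rule is shown below. How the millisecond time and the class hash are combined is an assumption (simple addition here), and the class and method names are hypothetical and not part of SystemDS.

```java
import java.util.Random;

// Hypothetical illustration; not SystemDS code.
class SeedSketch {
	/** Returns a reproducible generator for explicit seeds, and a time-based one for -1. */
	static Random createRandom(int seed) {
		if(seed == -1) {
			// Assumed combination: current time in milliseconds plus the class hash.
			long derived = System.currentTimeMillis() + SeedSketch.class.hashCode();
			return new Random(derived);
		}
		return new Random(seed);
	}

	public static void main(String[] args) {
		// The same explicit seed yields the same sequence on every run.
		System.out.println(createRandom(7).nextInt() == createRandom(7).nextInt());
	}
}
```

Setting an explicit seed is the way to make sampling-based estimates repeatable across runs.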
-
lossy
public final boolean lossy
True if lossy compression is enabled
-
columnPartitioner
public final CoCoderFactory.PartitionerType columnPartitioner
The selected method for column partitioning used in CoCoding compressed columns
-
costComputationType
public final CostEstimatorFactory.CostType costComputationType
The cost computation type for the compression
-
maxColGroupCoCode
public final int maxColGroupCoCode
The maximum number of columns allowed to be CoCoded.
-
coCodePercentage
public final double coCodePercentage
A CoCode parameter whose behavior differs based on the compression method. In general, it is a value that reflects how aggressively coCoding is used.
-
validCompressions
public final EnumSet<AColGroup.CompressionType> validCompressions
Valid compressions list, containing the ColGroup CompressionTypes that are allowed to be used for the compression. The default is to always allow the Uncompressed ColGroup.
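The sketch below shows one way such a set could be built with java.util.EnumSet, assuming the default mentioned above means that an uncompressed fallback type is always a member. The nested enum is a stand-in with illustrative constants, not the real AColGroup.CompressionType, and the helper name is hypothetical.

```java
import java.util.EnumSet;

// Hypothetical illustration; not SystemDS code.
class ValidCompressionsSketch {
	// Stand-in for AColGroup.CompressionType; the constants are illustrative.
	enum CompressionType {
		UNCOMPRESSED, DDC, RLE, OLE
	}

	/** Builds the allowed set; UNCOMPRESSED is always included (assumed default). */
	static EnumSet<CompressionType> validCompressions(CompressionType... requested) {
		EnumSet<CompressionType> valid = EnumSet.of(CompressionType.UNCOMPRESSED);
		for(CompressionType t : requested)
			valid.add(t);
		return valid;
	}

	public static void main(String[] args) {
		System.out.println(validCompressions(CompressionType.DDC, CompressionType.RLE));
	}
}
```

Keeping an uncompressed fallback in the set means a column group always has at least one valid representation.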
-
minimumSampleSize
public final int minimumSampleSize
The minimum size of the sample extracted.
-
maxSampleSize
public final int maxSampleSize
The maximum size of the sample extracted.
-
estimationType
public final SampleEstimatorFactory.EstimationType estimationType
The sample type used for sampling
-
transposed
public boolean transposed
Transpose input matrix, to optimize access when extracting bitmaps. This setting is changed inside the script based on the transposeInput setting. This is intentionally left as a mutable value, since the transposition of the input matrix is decided in phase 3.
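The relationship between the transposeInput string and the mutable transposed flag can be sketched as below. The method and its autoChoice parameter are hypothetical; how the framework actually decides the auto case is not described here, so that decision is passed in as an assumption.

```java
// Hypothetical illustration; not SystemDS code.
class TransposeSketch {
	/**
	 * Resolves the transposeInput setting to a boolean.
	 * autoChoice stands in for whatever decision the framework makes for "auto".
	 */
	static boolean resolveTranspose(String transposeInput, boolean autoChoice) {
		switch(transposeInput.toLowerCase()) {
			case "true":
				return true;
			case "false":
				return false;
			case "auto":
				return autoChoice;
			default:
				throw new IllegalArgumentException("Invalid transpose setting: " + transposeInput);
		}
	}

	public static void main(String[] args) {
		boolean transposed = resolveTranspose("auto", true);
		System.out.println(transposed);
	}
}
```

Because the auto case can only be resolved once the input is known, the resulting flag has to be assigned after the settings object is built, which matches the field being left mutable.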
-
minimumCompressionRatio
public final double minimumCompressionRatio
The minimum compression ratio to achieve.
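As a rough illustration of how a minimum ratio could be checked, the sketch below assumes the ratio is defined as original size divided by compressed size and that a result is acceptable when the ratio is at least the minimum. Both the definition and the names are assumptions, not SystemDS code.

```java
// Hypothetical illustration; not SystemDS code.
class CompressionRatioSketch {
	/** Ratio of original to compressed size (assumed definition). */
	static double ratio(long originalBytes, long compressedBytes) {
		if(compressedBytes <= 0)
			throw new IllegalArgumentException("compressedBytes must be positive");
		return (double) originalBytes / compressedBytes;
	}

	/** True if the achieved ratio is at least the configured minimum. */
	static boolean meetsMinimum(long originalBytes, long compressedBytes, double minimumCompressionRatio) {
		return ratio(originalBytes, compressedBytes) >= minimumCompressionRatio;
	}

	public static void main(String[] args) {
		System.out.println(meetsMinimum(1000, 500, 2.0));
		System.out.println(meetsMinimum(1000, 800, 2.0));
	}
}
```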
-
isInSparkInstruction
public final boolean isInSparkInstruction
True if the compression runs inside a Spark instruction.
-
sdcSortType
public final InsertionSorterFactory.SORT_TYPE sdcSortType
The sorting type used in sorting/joining offsets to create SDC groups
-
-