public class ZScoreOutlierFilter extends BaseDatasetFilter
Flags an entry as an outlier if its value for any property lies more than a specified number of standard deviations (the tolerance) from that property's mean.
If an attribute has fewer than 10 distinct values, it is classified as a
discrete variable and will be excluded from screening.
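The screening rule above can be sketched for a single attribute column as follows. This is a hypothetical helper, not Magpie's implementation. It assumes the values arrive as a plain double[] and uses the population standard deviation, because the documentation does not say which estimator the filter uses.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration of the z-score screening rule; not Magpie's code. */
class ZScoreSketch {
    /** Attributes with fewer distinct values than this are treated as discrete (per the class docs). */
    static final int MIN_DISTINCT = 10;

    /** Arithmetic mean of one column. */
    static double mean(double[] values) {
        double sum = 0;
        for (double v : values) sum += v;
        return sum / values.length;
    }

    /** Population standard deviation (assumption: the docs do not say population vs. sample). */
    static double stdDev(double[] values, double mean) {
        double sumSq = 0;
        for (double v : values) sumSq += (v - mean) * (v - mean);
        return Math.sqrt(sumSq / values.length);
    }

    /** True if the column has fewer than MIN_DISTINCT distinct values. */
    static boolean isDiscrete(double[] values) {
        Set<Double> distinct = new HashSet<>();
        for (double v : values) distinct.add(v);
        return distinct.size() < MIN_DISTINCT;
    }

    /** Flags each value lying more than tolerance standard deviations from the column mean. */
    static boolean[] flagOutliers(double[] values, double tolerance) {
        boolean[] flags = new boolean[values.length];
        if (isDiscrete(values)) return flags; // discrete attributes are excluded from screening
        double m = mean(values);
        double s = stdDev(values, m);
        for (int i = 0; i < values.length; i++) {
            flags[i] = Math.abs(values[i] - m) > tolerance * s;
        }
        return flags;
    }
}
```

For example, in the column {0, 1, ..., 9, 100} with a tolerance of 2, only 100 is flagged, while a column with just three distinct values is never screened.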
Usage: <tolerance> [-class] [-attributes]
tolerance: Number of standard deviations from the mean used to classify outliers
-class: Consider class values when determining outliers
-attributes: Consider attribute values when determining outliers
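A minimal parser for the usage string above might look like the following. ZScoreOptions and its parse method are hypothetical. The sketch assumes the flags may appear in either order and that each defaults to false when absent; the usage string specifies neither.

```java
import java.util.List;

/** Hypothetical holder for the settings in "<tolerance> [-class] [-attributes]". */
class ZScoreOptions {
    double tolerance;
    boolean screenClass;      // assumption: false unless -class is given
    boolean screenAttributes; // assumption: false unless -attributes is given

    static ZScoreOptions parse(List<Object> options) {
        if (options == null || options.isEmpty()) {
            throw new IllegalArgumentException("Usage: <tolerance> [-class] [-attributes]");
        }
        ZScoreOptions result = new ZScoreOptions();
        // The first option is the required tolerance, in standard deviations
        result.tolerance = Double.parseDouble(options.get(0).toString());
        // Remaining options are optional flags
        for (Object o : options.subList(1, options.size())) {
            switch (o.toString()) {
                case "-class":
                    result.screenClass = true;
                    break;
                case "-attributes":
                    result.screenAttributes = true;
                    break;
                default:
                    throw new IllegalArgumentException("Unknown option: " + o);
            }
        }
        return result;
    }
}
```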
Modifier and Type | Field and Description |
---|---|
protected double[] | AttributeMean: Means of attributes |
protected java.lang.String[] | AttributeNames: Names of attributes |
protected double[] | AttributeStdDev: Standard deviations of attributes |
protected double | ClassMean: Mean of class variable |
protected double | ClassStdDev: Standard deviation of class variable |
protected boolean | ScreenAttributes: Whether to screen based on attributes |
protected boolean | ScreenClass: Whether to screen based on class variable |
protected double | Tolerance: Number of standard deviations away from the mean used when defining outliers |
protected boolean | Trained: Whether this filter has been trained |
Constructor and Description |
---|
ZScoreOutlierFilter() |
Modifier and Type | Method and Description |
---|---|
protected boolean[] | label(Dataset D): Given a dataset, determine which entries pass the filter. |
java.lang.String | printUsage(): Print out required format for options. |
void | setOptions(java.util.List<java.lang.Object> Options): Set any options for this object. |
void | setScreenAttributes(boolean screenAttributes): Set whether to screen based on whether any attribute value is an outlier. |
void | setScreenClass(boolean screenClass): Set whether to screen based on whether the measured class value is an outlier. |
void | setTolerance(double tolerance): Set the tolerance defining what is an outlier. |
void | train(Dataset trainData): Train the filter, if necessary. |
Methods inherited from class BaseDatasetFilter: filter, parallelLabel, parallelMinimum, setExclude, toExclude
protected double Tolerance
protected boolean ScreenAttributes
protected boolean ScreenClass
protected double[] AttributeMean
protected double[] AttributeStdDev
protected double ClassMean
protected double ClassStdDev
protected boolean Trained
protected java.lang.String[] AttributeNames
public void setOptions(java.util.List<java.lang.Object> Options) throws java.lang.Exception
Specified by: setOptions in interface Options
Parameters:
Options - Array of options as Objects; can be null
Throws:
java.lang.Exception - if there is a problem with the inputs

public java.lang.String printUsage()
Specified by: printUsage in interface Options

public void setTolerance(double tolerance)
Parameters:
tolerance - Number of standard deviations from mean defining an outlier

public void setScreenAttributes(boolean screenAttributes)
Parameters:
screenAttributes - Desired setting

public void setScreenClass(boolean screenClass)
Parameters:
screenClass - Desired setting

public void train(Dataset trainData)
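Description copied from class: BaseDatasetFilter
Train the filter, if necessary.
Specified by: train in class BaseDatasetFilter
Parameters:
trainData - Dataset to use for training

protected boolean[] label(Dataset D)
Description copied from class: BaseDatasetFilter
Given a dataset, determine which entries pass the filter.
Specified by: label in class BaseDatasetFilter
Parameters:
D - Dataset to be labeled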
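The labeling step can be sketched with precomputed statistics mirroring the AttributeMean, AttributeStdDev, ClassMean, and ClassStdDev fields. ZScoreLabeler is hypothetical and works on plain arrays rather than a Dataset. It makes two further assumptions: a per-attribute discrete mask marks the attributes excluded from screening, and true in the returned array means "flagged as an outlier" (how BaseDatasetFilter interprets the labels is not stated here).

```java
/** Hypothetical sketch of label(); field names echo the documented fields. */
class ZScoreLabeler {
    double tolerance;
    boolean screenAttributes;
    boolean screenClass;
    double[] attributeMean;
    double[] attributeStdDev;
    boolean[] discrete; // assumption: true for attributes with fewer than 10 distinct values
    double classMean;
    double classStdDev;

    /** True if x lies more than tolerance standard deviations from mean. */
    private boolean isOutlier(double x, double mean, double stdDev) {
        return Math.abs(x - mean) > tolerance * stdDev;
    }

    /** Returns true for each entry flagged as an outlier (polarity is an assumption). */
    boolean[] label(double[][] attributes, double[] classValues) {
        boolean[] flagged = new boolean[attributes.length];
        for (int i = 0; i < attributes.length; i++) {
            // Class screening: one outlying measured class value is enough
            if (screenClass && isOutlier(classValues[i], classMean, classStdDev)) {
                flagged[i] = true;
                continue;
            }
            // Attribute screening: any non-discrete outlying attribute is enough
            if (screenAttributes) {
                for (int j = 0; j < attributes[i].length; j++) {
                    if (!discrete[j] && isOutlier(attributes[i][j], attributeMean[j], attributeStdDev[j])) {
                        flagged[i] = true;
                        break;
                    }
                }
            }
        }
        return flagged;
    }
}
```

Checking the class value first and short-circuiting on the first outlying attribute reflects the "any property" wording: a single outlying value is enough to flag the entry.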