Plot

The Plot module has the following submodules.

FW (Feature weights)
cor (Feature correlation analysis)
IFS (Incremental feature selection curves)
SHAP (SHAP values summary)
waterfall (Feature waterfall based on SHAP values)
beeswarm (Feature beeswarm based on SHAP values of a specific category)
PCA (Principal component analysis)
TSNE (T-distributed stochastic neighbor embedding)
CM (Confusion matrix)

Feature Weights (FW)

The FW module plots the contribution (weight) of each feature, as scored by the chosen feature selection method.

Parameters	Values	Description
-i, --input	filename path	Feature importance dataframe path (CSV format)
--format	png, pdf	Picture format
-n, --number	int (default=50)	Number of features shown in the picture
-o, --output	directory	Output directory (default: current directory)

FW example

# If you do not have a dataframe of feature importances,
# first run: $ Feature_scML fs -i example.csv -m fscore
$ Feature_scML plot FW -i example_fscore.csv
# This writes example_fscore_FeatureWeight.png to the current directory.

Figure: ../_images/example_fscore_FeatureWeight.png
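
The -n option limits the plot to the highest-weighted features. That selection step can be sketched in Python as follows. The feature names and scores are invented, and the top_features helper is hypothetical; it is not part of Feature_scML.

```python
# Hypothetical illustration: rank features by weight and keep the top n.
# The names and scores below are invented for this sketch.

def top_features(weights, n=50):
    """Return the n (feature, weight) pairs with the largest weights."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

weights = {"geneA": 12.5, "geneB": 3.1, "geneC": 47.0, "geneD": 0.4}

# Print the three highest-weighted features, largest first.
for name, score in top_features(weights, n=3):
    print(f"{name}\t{score}")
```

When n exceeds the number of available features, the slice simply returns all of them.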

Feature Correlation (cor) analysis

The cor module plots the pairwise correlation between features, computed with the Pearson, Spearman, or Kendall method.

Parameters	Values	Description
-i, --input	filename path	Feature dataframe path (CSV format)
-m, --method	pearson, spearman, kendall	Correlation measure
--format	png, pdf	Picture format (default=png)
-n, --feature_number	int (default=10)	Number of top-ranked features shown in the picture
-o, --output	directory	Output directory (default: current directory)

cor example

# If you do not have a feature dataframe ranked by importance,
# first run: $ Feature_scML fs -i example.csv -m fscore
$ Feature_scML plot cor -i example_fscore_data.csv -m pearson -n 10
# This writes pearson_correlation_example_fscore_data_10.png to the current directory.

Figure: ../_images/pearson_correlation_example_fscore_data_10.png
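
To make the difference between the correlation measures concrete, here is a minimal pure-Python sketch of the Pearson and Spearman coefficients for one pair of features. It is an illustration only, not Feature_scML's implementation; the function names and the toy values are assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """Rank values starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for i in order[start:end + 1]:
            ranks[i] = mean_rank
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented toy features: feature_b is monotonic but not linear in feature_a.
feature_a = [1.0, 2.0, 3.0, 4.0, 5.0]
feature_b = [1.0, 4.0, 9.0, 16.0, 25.0]

print("pearson: ", round(pearson(feature_a, feature_b), 4))
print("spearman:", round(spearman(feature_a, feature_b), 4))
```

Because feature_b increases whenever feature_a does, the rank-based Spearman coefficient is 1, while the Pearson coefficient is slightly lower since the relationship is not linear.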

Incremental Feature Selection (IFS) Curves

IFS (incremental feature selection) curves show the classification performance of the top-k ranked features, evaluated iteratively for k ∈ {1, 2, …, n}, where n is the total number of features.

Parameters	Values	Description
-i, --input	filename path	IFS dataframe path (CSV format)
--format	png, pdf	Picture format
-o, --output	directory	Output directory (default: current directory)

IFS example

# If you do not have a dataframe of incremental feature classification performance,
# first run: $ Feature_scML automl -i example.csv -c svm -m fscore --njobs 20
$ Feature_scML plot IFS -i 10-100_fscore_SVM_accuracy.csv
# This writes 10-100_fscore_SVM_accuracy_IFS.png to the current directory.

Figure: ../_images/10-100_fscore_SVM_accuracy_IFS.png
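
The IFS procedure described above can be sketched as a loop over k. In this illustration a nearest-centroid classifier with leave-one-out validation stands in for the real classifier (svm, rf, and so on). The helper names and the toy data are assumptions, and the columns are assumed to be sorted by feature rank already.

```python
# Sketch of an IFS loop. The nearest-centroid classifier and leave-one-out
# validation are stand-ins chosen for brevity, not Feature_scML's method.

def nearest_centroid_predict(train_X, train_y, sample):
    """Predict the label whose class centroid is closest to the sample."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_accuracy(X, y):
    """Leave-one-out accuracy of the nearest-centroid classifier."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        correct += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return correct / len(X)

def ifs_curve(X, y):
    """Accuracy of the top-k features for k = 1..n (columns are pre-ranked)."""
    n_features = len(X[0])
    curve = []
    for k in range(1, n_features + 1):
        X_top_k = [row[:k] for row in X]
        curve.append((k, loo_accuracy(X_top_k, y)))
    return curve

# Invented toy data: two informative features followed by one noisy feature.
X = [
    [0.0, 0.1, 5.0],
    [0.2, 0.0, 1.0],
    [0.1, 0.3, 9.0],
    [1.0, 0.9, 2.0],
    [0.9, 1.1, 8.0],
    [1.1, 1.0, 4.0],
]
y = [0, 0, 0, 1, 1, 1]

curve = ifs_curve(X, y)
for k, acc in curve:
    print(k, round(acc, 3))
```

Plotting the (k, accuracy) pairs gives the IFS curve. In this toy data the accuracy drops once the noisy third feature is included, which is the kind of turning point the curve is meant to reveal.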

SHAP

The SHAP module summarizes the SHAP values of each feature across all categories.

Parameters	Values	Description
-i, --input	filename path	Dataframe path (CSV format)
--format	png, pdf	Picture format
--model_path	filename path	Model file path (joblib)
-c, --classifier	svm, rf, lr	Classifier name
-n, --feature_number	int	Must match the number of features the model was trained on
-o, --output	directory	Output directory (default: current directory)

SHAP example

# If you do not have a trained model,
# first run: $ Feature_scML automl -i example.csv -c rf -m fscore --njobs 20
$ Feature_scML plot SHAP -i example.csv -c rf -n 100 --model_path example_rf.joblib

Figure: ../_images/example_rf_100_SHAP_feature_summary.png

waterfall

The waterfall plot (feature waterfall based on SHAP values) shows the SHAP value of each feature for one specific sample.

Parameters	Values	Description
-i, --input	filename path	Dataframe path (CSV format)
--format	png, pdf	Picture format
--model_path	filename path	Model file path (joblib)
--method	svm	Model name
-n, --feature_number	int	Must match the number of features the model was trained on
-o, --output	directory	Output directory (default: current directory)

waterfall example

# To train and save a model first, run:
# $ Feature_scML automl -i example.csv -c svm -m fscore --njobs 20 --getmodel True
$ Feature_scML plot waterfall -i example_fscore_data.csv --model_path example_20_svm.joblib -s 0 -n 20

Figure: ../_images/example_fscore_data_simple_feature_contribute.png
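
A waterfall plot starts at a base value (the model's average output) and adds one feature contribution at a time until it reaches the prediction for the chosen sample. The sketch below illustrates that additive property with a simplified case: for a linear model with independent features, the SHAP value of feature i reduces to w_i * (x_i - mean_i). The model, feature names, and numbers are invented; this does not use the shap library or a Feature_scML model.

```python
# Invented linear model f(x) = bias + sum(w_i * x_i); an assumption for this sketch.
weights = {"geneA": 2.0, "geneB": -1.0, "geneC": 0.5}
bias = 1.0

# Background data used to estimate the average model output (invented values).
background = [
    {"geneA": 0.0, "geneB": 2.0, "geneC": 4.0},
    {"geneA": 2.0, "geneB": 0.0, "geneC": 0.0},
]
sample = {"geneA": 3.0, "geneB": 2.0, "geneC": 6.0}

def predict(x):
    """Output of the toy linear model for one sample."""
    return bias + sum(weights[name] * x[name] for name in weights)

# Base value: mean prediction over the background data.
base_value = sum(predict(row) for row in background) / len(background)

# Linear-model SHAP value: w_i * (x_i - mean of feature i over the background).
means = {name: sum(row[name] for row in background) / len(background) for name in weights}
contributions = {name: weights[name] * (sample[name] - means[name]) for name in weights}

# Walk the waterfall, largest absolute contribution first.
running = base_value
print(f"base value: {running:.2f}")
for name in sorted(contributions, key=lambda n: abs(contributions[n]), reverse=True):
    running += contributions[name]
    print(f"{name}: {contributions[name]:+.2f} -> {running:.2f}")

print(f"model prediction: {predict(sample):.2f}")
```

The final running total equals the model prediction for the sample, which is why the bars of a waterfall plot always end at the predicted value.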

beeswarm

The beeswarm plot shows an information-dense summary of how the top features in a dataset impact the model’s output.

Parameters	Values	Description
-i, --input	filename path	Dataframe path (CSV format)
--format	png, pdf	Picture format
--model_path	filename path	Model file path (joblib)
--method	svm	Model name
-n, --feature_number	int	Must match the number of features the model was trained on
-s, --sample_label	int	Label category (0, 1, …)
-o, --output	directory	Output directory (default: current directory)

beeswarm example

# To train and save a model first, run:
# $ Feature_scML automl -i example.csv -c svm -m fscore --njobs 20 --getmodel True
# Summarize the SHAP values of all samples for label category 1.
$ Feature_scML plot beeswarm -i example_fscore_data.csv --model_path example_20_svm.joblib -n 20 -s 1

Figure: ../_images/example_fscore_data_simple_feature_summary.png

Principal Component Analysis (PCA)

The PCA plot shows how the selected features affect the clustering of samples.

Parameters	Values	Description
-i, --input	filename path	Dataframe path (CSV format)
--format	png, pdf	Picture format
-n, --feature_number	int	Number of features
-o, --output	directory	Output directory (default: current directory)

PCA example

$ Feature_scML plot PCA -i example_fscore_data.csv -n 100

Figure: ../_images/example_fscore_data_100_PCA.png
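
For readers unfamiliar with PCA, the following pure-Python sketch finds the first principal component of a small dataset by power iteration on the covariance matrix and projects the samples onto it. It is a simplified illustration with invented data and a hypothetical helper name; a real PCA plot uses at least two components, and Feature_scML's own implementation may differ.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (direction, scores) for the first principal component.

    Uses power iteration on the sample covariance matrix. Assumes the data
    has non-zero variance and that the start vector is not orthogonal to
    the leading eigenvector.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered sample onto the component direction.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Invented samples whose second feature is roughly twice the first.
data = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]

direction, scores = first_principal_component(data)
print("direction:", [round(x, 3) for x in direction])
print("scores:   ", [round(s, 3) for s in scores])
```

The direction vector points along the line on which the samples lie, and the scores are the coordinates that would be placed on the first axis of a PCA plot.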

T-distributed Stochastic Neighbor Embedding (T-SNE)

The T-SNE plot shows how the selected features affect the clustering of samples.

Parameters	Values	Description
-i, --input	filename path	Dataframe path (CSV format)
--format	png, pdf	Picture format
-n, --feature_number	int	Number of features
-o, --output	directory	Output directory (default: current directory)

T-SNE example

$ Feature_scML plot TSNE -i example_data.csv -n 100

Figure: ../_images/example_data_100_T-SNE.png

Confusion matrix (CM)

The CM module evaluates classification accuracy by computing the confusion matrix, in which each row corresponds to the true class and each column to the predicted class.

Parameters	Values	Description
-i, --input	filename path	Dataframe path (CSV format)
--format	png, pdf	Picture format
--model_path	filename path	Model file path (joblib)
-n, --feature_number	int	Number of features
-o, --output	directory	Output directory (default: current directory)

CM example

# To train a model first, run: $ Feature_scML automl -i example.csv -c svm -m fscore
$ Feature_scML plot CM -i example_fscore_data.csv -n 100 --model_path example_100_svm.joblib

Figure: ../_images/confusion_matrix_example_fscore_data_100.png
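
To show how such a matrix is built, here is a minimal pure-Python sketch that counts (true, predicted) label pairs, with rows following the true class. The labels and predictions are invented, and the confusion_matrix helper is written for this example; it is not taken from Feature_scML.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows follow the true class, columns the predicted class."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

# Invented labels and predictions for six samples.
y_true = ["T", "T", "B", "B", "B", "NK"]
y_pred = ["T", "B", "B", "B", "NK", "NK"]
labels = ["T", "B", "NK"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(label, row)

# Accuracy is the sum of the diagonal divided by the number of samples.
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(y_true)
print("accuracy:", round(accuracy, 3))
```

Off-diagonal entries show which classes are confused with each other; here one "T" sample is predicted as "B" and one "B" sample as "NK".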