Machine learning

Feature_scML cls function

The cls module provides machine learning classification methods: Support Vector Machine, Random Forest, Gaussian Naive Bayes, and Logistic Regression. Training and test data must both be in CSV format. Hyperparameter optimization is built into the training process. Test data is optional: if no test data is given, the training data is split into training and test sets (train:test = 8:2).
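When no test file is supplied, the training data is divided 8:2. The sketch below illustrates such a split using only the Python standard library. It assumes the CSV layout shown for example_cv2_data.csv further down (a Label column followed by feature columns). The helper names split_train_test and read_labeled_csv, the fixed seed, and the toy data are illustrative assumptions; Feature_scML's own split (for example, whether it stratifies by label) may differ.

```python
import csv
import io
import random


def split_train_test(rows, test_ratio=0.2, seed=0):
    """Shuffle rows and split them into (train, test); 8:2 by default.

    Hypothetical helper for illustration, not part of Feature_scML.
    """
    rows = list(rows)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    rng.shuffle(rows)
    n_test = round(len(rows) * test_ratio)
    return rows[n_test:], rows[:n_test]


def read_labeled_csv(text):
    """Parse CSV text with a header row (assumed: Label + features) into dicts."""
    return list(csv.DictReader(io.StringIO(text)))


# Ten toy samples with a Label column and one feature column (synthetic data).
data = "Label,f1\n" + "\n".join(f"{i % 2},{i * 0.1:.1f}" for i in range(10))
train, test = split_train_test(read_labeled_csv(data))
print(len(train), len(test))  # 8 2
```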

$Feature_scML cls -h
usage: cls

optional arguments:
-h, --help            show this help message and exit
-c {lr,svm,rf,gnb}, --classifier {lr,svm,rf,gnb}
                        Select a machine learning method:
                        lr (Logical Regression)
                        svm (Support Vector Machine)
                        rf (Random Forest)
                        gnb (Gaussian Naive Bayes)
-i INPUT_TRAIN, --input_train INPUT_TRAIN
                        Input train data (CSV)
-o OUTPUT, --output OUTPUT
                        Output directory
--njobs NJOBS         The number of jobs to run in parallel
--input_test INPUT_TEST
                        Input test data filename path (CSV)
--getmodel GETMODEL   Generate model files (default=False)

| Command | Parameters | Description |
|---|---|---|
| --classifier, -c | lr, svm, rf, gnb | lr (Logistic Regression); svm (Support Vector Machine); rf (Random Forest); gnb (Gaussian Naive Bayes) |
| --input_train, -i | filename path | Input training data file path (CSV format) |
| --output, -o | output directory | Output directory (default: current directory) |
| --njobs | int, default=1 | Number of jobs to run in parallel |
| --input_test | filename path | Input test data file path (CSV format). If omitted, the training data is split into training and test sets (train:test = 8:2) |
| --getmodel | True or False | If True, the model file is saved (default: False) |
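The options above can also be assembled programmatically, for example to try several classifiers in a loop. The helper below is a hypothetical sketch (build_cls_command is not part of Feature_scML). It uses only the flags documented in this section, assumes --getmodel accepts the literal value True as the table indicates, and assumes the Feature_scML executable is on PATH when the command is actually run.

```python
import shlex

CLASSIFIERS = ("lr", "svm", "rf", "gnb")  # values accepted by -c/--classifier


def build_cls_command(train_csv, classifier, output=None, njobs=1,
                      test_csv=None, getmodel=False):
    """Return the argument list for `Feature_scML cls` (hypothetical helper)."""
    if classifier not in CLASSIFIERS:
        raise ValueError(f"classifier must be one of {CLASSIFIERS}")
    cmd = ["Feature_scML", "cls", "-i", train_csv, "-c", classifier,
           "--njobs", str(njobs)]
    if output is not None:
        cmd += ["-o", output]
    if test_csv is not None:  # omit to let cls split the training data 8:2
        cmd += ["--input_test", test_csv]
    if getmodel:
        cmd += ["--getmodel", "True"]  # value form assumed from the table above
    return cmd


cmd = build_cls_command("example.csv", "svm", njobs=4, getmodel=True)
print(shlex.join(cmd))
# To execute (requires Feature_scML to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```

Returning a list rather than a single string avoids shell-quoting problems when the command is passed to subprocess.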

Example

$Feature_scML cls -i example.csv -c svm
Feature selection: cv2
cv2 is running
....................
The identity link function does not respect the domain of the Gamma family.
cv2: DONE
$ls
example.csv  example_cv2.csv  example_cv2_data.csv
# example_cv2.csv contains the feature importance scores from the cv2 method.
# The columns of example_cv2_data.csv are ordered by the cv2 feature ranking.

example_cv2.csv

This file is a table with two columns: the feature name (Feature) and its score (cv2_score).

| Feature | cv2_score |
|---|---|
| CCKBR | 122.89836751624631 |
| IFITM1 | 15.62471687317092 |
| PDCL3 | -90.00136155460666 |
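To pick the highest-scoring features from example_cv2.csv in Python, the rows can be sorted by cv2_score. This sketch uses the standard csv module with the three rows shown above as inline data. The helper name rank_features is hypothetical, and the sketch assumes the file header is exactly Feature,cv2_score as displayed and that a higher score means a more important feature (consistent with the column order of example_cv2_data.csv).

```python
import csv
import io

# Contents of example_cv2.csv as shown in the table above.
EXAMPLE_CV2 = """Feature,cv2_score
CCKBR,122.89836751624631
IFITM1,15.62471687317092
PDCL3,-90.00136155460666
"""


def rank_features(csv_text, score_col="cv2_score"):
    """Return (feature, score) pairs sorted from highest to lowest score."""
    rows = csv.DictReader(io.StringIO(csv_text))
    scored = [(r["Feature"], float(r[score_col])) for r in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


top2 = [name for name, _ in rank_features(EXAMPLE_CV2)[:2]]
print(top2)  # ['CCKBR', 'IFITM1']
```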

example_cv2_data.csv

| Label | CCKBR | IFITM1 | PDCL3 |
|---|---|---|---|
| 6 | 0.7922727272727272 | 0.7636363636363637 | 93.6201131016 |
| 3 | 0.8276233766233766 | 0.7772727272727272 | 1517.12046654 |
| 3 | 0.0 | 121.252831887 | 1234.49979645 |
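Because the feature columns of example_cv2_data.csv are already ordered by cv2 ranking, keeping the top k features amounts to keeping the Label column plus the first k feature columns. The sketch below does this with the standard csv module on the rows shown above. top_k_columns is a hypothetical helper; it assumes Label is the first column, as displayed. The reduced file could then be passed to cls with -i, assuming cls expects the same Label-plus-features layout.

```python
import csv
import io

# Contents of example_cv2_data.csv as shown in the table above.
EXAMPLE_CV2_DATA = """Label,CCKBR,IFITM1,PDCL3
6,0.7922727272727272,0.7636363636363637,93.6201131016
3,0.8276233766233766,0.7772727272727272,1517.12046654
3,0.0,121.252831887,1234.49979645
"""


def top_k_columns(csv_text, k):
    """Keep the Label column and the first k (highest-ranked) feature columns."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Column 0 is Label; feature columns follow in cv2 rank order.
        writer.writerow(row[: 1 + k])
    return out.getvalue()


print(top_k_columns(EXAMPLE_CV2_DATA, 2))
```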