Tags:
temporal pattern mining, supervised learning
Owner:
yann.dauxais@irisa.fr
DCM (Discriminant Chronicle Mining) is a C++ implementation of two chronicle mining tasks: frequent chronicle mining and discriminant chronicle mining. The extraction is done from a set of temporal sequences. A chronicle is discriminant if its support in the positive dataset is at least g_{min} times its support in the negative dataset. The parameter g_{min} has to be defined by the user before the run.
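Writing supp(C, S+) and supp(C, S-) for the number of sequences of the positive and negative datasets in which a chronicle C occurs (notation introduced here for clarity), this criterion reads:

supp(C, S+) >= g_{min} * supp(C, S-)

For instance, with g_{min} = 2, a chronicle occurring in 125 positive and 20 negative sequences is discriminant, since 125 >= 2 * 20.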
The implementation of this task in DCM is not complete, as it will not extract the whole set of discriminant chronicles.

This app runs the implementation of DCM available on the Inria GitLab. Two options are available to run DCM through this app:
1. Upload one or more zip files, each containing a positive dataset pos.dat and a negative dataset neg.dat. For each archive, ./DCM -i pos.dat -d neg.dat followed by the parameters will be launched. This option allows running the algorithm on several zip files in the same job.
2. Upload two files and choose which file is the positive and which is the negative dataset. ./DCM followed by the parameters will be launched.

In both options, the minimal frequency threshold is required and has no default value.
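For the first option, the archive can be prepared in advance. Below is a minimal Python sketch using the standard zipfile module; it assumes, as the command above suggests, that the archive must contain files named pos.dat and neg.dat, and the local file paths are placeholders:

import zipfile

# Build an input archive for the zip-based option. The local paths are
# hypothetical; arcname fixes the names DCM expects inside the archive.
with zipfile.ZipFile("input.zip", "w") as archive:
    archive.write("my_positive_sequences.dat", arcname="pos.dat")
    archive.write("my_negative_sequences.dat", arcname="neg.dat")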
Each sequence of a dataset is represented by a line, and each event by a string and an integer. Strings and integers are separated by spaces, just like events. An example of two sequences is the head of the file d214lbbb_H141.dat available in the git repository:
qrs[abnormal] 164 p_wave[normal] 781 qrs[abnormal] 964
qrs[abnormal] 164 p_wave[normal] 781 qrs[abnormal] 964 p_wave[normal] 1647 qrs[abnormal] 1839
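As an illustration of this format, the following Python sketch (an illustrative helper, not part of DCM) writes such a dataset, using the two example sequences above:

# One sequence per line, events as space-separated "event timestamp" pairs.
def write_dataset(sequences, path):
    with open(path, "w") as f:
        for sequence in sequences:
            f.write(" ".join(f"{event} {time}" for event, time in sequence) + "\n")

# The two example sequences shown above:
sequences = [
    [("qrs[abnormal]", 164), ("p_wave[normal]", 781), ("qrs[abnormal]", 964)],
    [("qrs[abnormal]", 164), ("p_wave[normal]", 781), ("qrs[abnormal]", 964),
     ("p_wave[normal]", 1647), ("qrs[abnormal]", 1839)],
]
write_dataset(sequences, "pos.dat")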
The DCM parameters are listed in the help message of the executable. To print this help, use the --help parameter or simply run the executable without parameters.
Usage: Extract input_file fmin [options]
Positional Options (required):
-i [ --input_file ] arg input file containing dataset to mine (string)
- positive dataset if --disc is used
positional : input_file
-f [ --fmin ] arg minimal frequency threshold (number)
Number of sequences if >= 1 (support)
Percentage of the number of positive sequences otherwise
positional : fmin
General Options:
--help Display this help message
-d [ --disc ] arg Extract discriminant chronicles using this file as
negative dataset
-u [ --IBM ] Use IBM format for files instead of sequence per
line
--mincs arg Minimum size of extracted chronicles
--maxcs arg Maximum size of extracted chronicles
-c [ --close ] Extract frequent closed chronicles or discriminant
chronicles from closed multisets if --disc is used
-j [ --json ] Output format is json instead of plain text
-v [ --verbose ] The program will speak
Discriminant chronicles Options:
-g [ --gmin ] arg Minimal growth threshold
default : 2
Frequent chronicles Options:
-a [ --all_different ] Extract chronicles with multisets containing at most
one occurrence of an event
-w [ --cwm ] arg Define the maximal windows size for temporal
constraints
-n [ --not_calc_freq ] If used, doesn't calculate the exact frequency; it's only
known to be bigger than fmin
ignored if --close is used
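For example (with hypothetical dataset sizes), on a positive dataset of 150 sequences, -f 0.8 sets the support threshold to 0.8 * 150 = 120 positive sequences, whereas -f 10 requires a support of 10 sequences directly.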
In input:

./DCM -i "pos.dat" -d "neg.dat" -f 0.8 -g 2

(with the example files of the git repository: ./DCM d214lbbb_H141.dat -d d214pvc.dat -f 0.8 -g 2)

In output (output.log), each chronicle C is reported with its multiset of events, one temporal constraint i, j: (lower, upper) per pair of events, and its supports in the positive and negative datasets (f: 125/20 means 125 positive and 20 negative sequences):

C: {["qrs[abnormal]", "qrs[abnormal]", "qrs[abnormal]"]}
0, 1: (686, 881)
0, 2: (-inf, inf)
1, 2: (539, 894)
f: 125/20

C: {["qrs[abnormal]", "qrs[abnormal]", "qrs[abnormal]", "p_wave[normal]", "p_wave[normal]"]}
0, 1: (-inf, inf)
0, 2: (-inf, 1861)
0, 3: (-173, inf)
0, 4: (706, inf)
1, 2: (653, 1011)
1, 3: (-inf, inf)
1, 4: (-inf, 706)
2, 3: (-inf, inf)
2, 4: (-inf, inf)
3, 4: (-inf, 1653)
f: 116/0

C: {["qrs[abnormal]", "qrs[abnormal]", "p_wave[normal]"]}
0, 1: (1402, 1705)
0, 2: (544, 1384)
1, 2: (-inf, inf)
f: 119/41

C: {["qrs[abnormal]", "qrs[abnormal]", "p_wave[normal]", "p_wave[normal]", "p_wave[normal]"]}
0, 1: (-inf, inf)
0, 2: (-inf, inf)
0, 3: (-inf, inf)
0, 4: (-inf, 1564)
1, 2: (-inf, inf)
1, 3: (-inf, inf)
1, 4: (-172, inf)
2, 3: (-inf, inf)
2, 4: (-inf, inf)
3, 4: (-inf, 1355)
f: 117/30
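To post-process these results programmatically, the output can be parsed. Below is a minimal Python sketch (not part of DCM) assuming the layout shown above and comma-free event names; interval bounds are kept as strings since they may be -inf or inf:

import re

# Each chronicle: 'C: {[events]}', pairwise constraints 'i, j: (lo, hi)',
# then a support line 'f: <positive>/<negative>'.
CHRONICLE = re.compile(
    r"C:\s*\{\[(?P<events>.*?)\]\}\s*"
    r"(?P<constraints>(?:\d+,\s*\d+:\s*\([^)]*\)\s*)*)"
    r"f:\s*(?P<pos>\d+)/(?P<neg>\d+)",
    re.DOTALL,
)
CONSTRAINT = re.compile(r"(\d+),\s*(\d+):\s*\(([^,]+),\s*([^)]+)\)")

def parse_output(text):
    chronicles = []
    for match in CHRONICLE.finditer(text):
        events = [e.strip().strip('"') for e in match.group("events").split(",")]
        constraints = {
            (int(i), int(j)): (lo.strip(), hi.strip())
            for i, j, lo, hi in CONSTRAINT.findall(match.group("constraints"))
        }
        support = (int(match.group("pos")), int(match.group("neg")))
        chronicles.append((events, constraints, support))
    return chronicles

with open("output.log") as f:
    for events, constraints, (pos, neg) in parse_output(f.read()):
        print(len(events), "events, support:", pos, "/", neg)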
22/01/2018 : Version 1, initial version
This app id is: 176
This curl command will create a job and return your job URL, as well as the average execution time. The files and/or dataset fields are optional; remember to remove them if not wanted.

curl -H 'Authorization: Token token=<your_private_token>' -X POST -F job[webapp_id]=176 -F job[param]="" -F job[queue]=standard -F files[0]=@test.txt -F files[1]=@test2.csv -F job[file_url]=<my_file_url> -F job[dataset]=<my_dataset_name> https://allgo.inria.fr/api/v1/jobs
Then, check your job to get the URLs of the result files with:
curl -H 'Authorization: Token token=<your_private_token>' -X GET https://allgo.inria.fr/api/v1/jobs/<job_id>
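The same calls can be made from Python. Here is a minimal sketch assuming the requests library, the two endpoints shown above, and that job[param] carries the DCM parameters; the token and file names are placeholders:

import requests

API = "https://allgo.inria.fr/api/v1/jobs"
HEADERS = {"Authorization": "Token token=<your_private_token>"}

# Create a job; job[param] is assumed to carry the DCM parameters.
with open("pos.dat", "rb") as pos, open("neg.dat", "rb") as neg:
    response = requests.post(
        API,
        headers=HEADERS,
        data={"job[webapp_id]": "176",
              "job[param]": "-f 0.8 -g 2",
              "job[queue]": "standard"},
        files={"files[0]": pos, "files[1]": neg},
    )
response.raise_for_status()
print(response.json())  # inspect the reply to get the job id

# Then check the job status and result file URLs with:
# requests.get(f"{API}/<job_id>", headers=HEADERS).json()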