Tags:
temporal pattern mining, supervised learning
Owner:
yann.dauxais@irisa.fr
DCM (Discriminant Chronicle Mining) is a C++ implementation of two chronicle mining tasks: frequent chronicle mining and discriminant chronicle mining. The extraction is done from a set of temporal sequences. A chronicle is discriminant if its support in the positive dataset is at least g_{min} times its support in the negative dataset. The parameter g_{min} has to be defined by the user before the run.
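Writing supp(C, S+) and supp(C, S-) for the number of sequences of the positive and negative datasets in which a chronicle C occurs (notation introduced here for clarity), this criterion reads:

supp(C, S+) >= g_{min} * supp(C, S-)

For instance, with g_{min} = 2, a chronicle occurring in 125 positive and 20 negative sequences is discriminant, since 125 >= 2 * 20.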
The implementation of this task in DCM is not complete, as it will not extract the whole set of discriminant chronicles.

This app runs the implementation of DCM available on the Inria GitLab. Two options are available to run DCM through this app:
1. Upload one or more zip files, each containing a positive dataset pos.dat and a negative dataset neg.dat. For each archive, ./DCM -i pos.dat -d neg.dat followed by the parameters will be launched. This option allows running the algorithm on several zip files in the same job.
2. Upload two files and choose which file is the positive and which is the negative dataset. ./DCM followed by the parameters will be launched.

In both options, the minimal frequency threshold is required and has no default value.
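For the first option, the archive can be prepared in advance. Below is a minimal Python sketch using the standard zipfile module; it assumes, as the command above suggests, that the archive must contain files named pos.dat and neg.dat, and the local file paths are placeholders:

import zipfile

# Build an input archive for the zip-based option. The local paths are
# hypothetical; arcname fixes the names DCM expects inside the archive.
with zipfile.ZipFile("input.zip", "w") as archive:
    archive.write("my_positive_sequences.dat", arcname="pos.dat")
    archive.write("my_negative_sequences.dat", arcname="neg.dat")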
Each sequence of a dataset is represented by a line, and each event by a string and an integer. Strings and integers are separated by spaces, just like events. An example of two sequences is the head of the file d214lbbb_H141.dat available in the git repository:
qrs[abnormal] 164 p_wave[normal] 781 qrs[abnormal] 964
qrs[abnormal] 164 p_wave[normal] 781 qrs[abnormal] 964 p_wave[normal] 1647 qrs[abnormal] 1839
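As an illustration of this format, the following Python sketch (an illustrative helper, not part of DCM) writes such a dataset, using the two example sequences above:

# One sequence per line, events as space-separated "event timestamp" pairs.
def write_dataset(sequences, path):
    with open(path, "w") as f:
        for sequence in sequences:
            f.write(" ".join(f"{event} {time}" for event, time in sequence) + "\n")

# The two example sequences shown above:
sequences = [
    [("qrs[abnormal]", 164), ("p_wave[normal]", 781), ("qrs[abnormal]", 964)],
    [("qrs[abnormal]", 164), ("p_wave[normal]", 781), ("qrs[abnormal]", 964),
     ("p_wave[normal]", 1647), ("qrs[abnormal]", 1839)],
]
write_dataset(sequences, "pos.dat")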
The DCM parameters are listed in the help message of the executable. To print this help, use the --help parameter or simply run the executable without parameters.
Usage: Extract input_file fmin [options]
Positional Options (required):
-i [ --input_file ] arg input file containing dataset to mine (string)
- positive dataset if --disc is used
positional : input_file
-f [ --fmin ] arg minimal frequency threshold (number)
Number of sequences if >= 1 (support)
Percentage of the number of positive sequences otherwise
positional : fmin
General Options:
--help Display this help message
-d [ --disc ] arg Extract discriminant chronicles using this file as
negative dataset
-u [ --IBM ] Use IBM format for files instead of sequence per
line
--mincs arg Minimum size of extracted chronicles
--maxcs arg Maximum size of extracted chronicles
-c [ --close ] Extract frequent closed chronicles or discriminant
chronicles from closed multisets if --disc is used
-j [ --json ] Output format is json instead of plain text
-v [ --verbose ] The program will speak
Discriminant chronicles Options:
-g [ --gmin ] arg Minimal growth threshold
default : 2
Frequent chronicles Options:
-a [ --all_different ] Extract chronicles with multisets containing at most
one occurrence of an event
-w [ --cwm ] arg Define the maximal windows size for temporal
constraints
-n [ --not_calc_freq ] If used, doesn't calculate the exact frequency; it's only
known to be bigger than fmin
ignored if --close is used
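For example (with hypothetical dataset sizes), on a positive dataset of 150 sequences, -f 0.8 sets the support threshold to 0.8 * 150 = 120 positive sequences, whereas -f 10 requires a support of 10 sequences directly.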
In input:

./DCM -i "pos.dat" -d "neg.dat" -f 0.8 -g 2

(with the example files of the git repository: ./DCM d214lbbb_H141.dat -d d214pvc.dat -f 0.8 -g 2)

In output (output.log), each chronicle C is reported with its multiset of events, one temporal constraint i, j: (lower, upper) per pair of events, and its supports in the positive and negative datasets (f: 125/20 means 125 positive and 20 negative sequences):

C: {["qrs[abnormal]", "qrs[abnormal]", "qrs[abnormal]"]}
0, 1: (686, 881)
0, 2: (-inf, inf)
1, 2: (539, 894)
f: 125/20

C: {["qrs[abnormal]", "qrs[abnormal]", "qrs[abnormal]", "p_wave[normal]", "p_wave[normal]"]}
0, 1: (-inf, inf)
0, 2: (-inf, 1861)
0, 3: (-173, inf)
0, 4: (706, inf)
1, 2: (653, 1011)
1, 3: (-inf, inf)
1, 4: (-inf, 706)
2, 3: (-inf, inf)
2, 4: (-inf, inf)
3, 4: (-inf, 1653)
f: 116/0

C: {["qrs[abnormal]", "qrs[abnormal]", "p_wave[normal]"]}
0, 1: (1402, 1705)
0, 2: (544, 1384)
1, 2: (-inf, inf)
f: 119/41

C: {["qrs[abnormal]", "qrs[abnormal]", "p_wave[normal]", "p_wave[normal]", "p_wave[normal]"]}
0, 1: (-inf, inf)
0, 2: (-inf, inf)
0, 3: (-inf, inf)
0, 4: (-inf, 1564)
1, 2: (-inf, inf)
1, 3: (-inf, inf)
1, 4: (-172, inf)
2, 3: (-inf, inf)
2, 4: (-inf, inf)
3, 4: (-inf, 1355)
f: 117/30
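To post-process these results programmatically, the output can be parsed. Below is a minimal Python sketch (not part of DCM) assuming the layout shown above and comma-free event names; interval bounds are kept as strings since they may be -inf or inf:

import re

# Each chronicle: 'C: {[events]}', pairwise constraints 'i, j: (lo, hi)',
# then a support line 'f: <positive>/<negative>'.
CHRONICLE = re.compile(
    r"C:\s*\{\[(?P<events>.*?)\]\}\s*"
    r"(?P<constraints>(?:\d+,\s*\d+:\s*\([^)]*\)\s*)*)"
    r"f:\s*(?P<pos>\d+)/(?P<neg>\d+)",
    re.DOTALL,
)
CONSTRAINT = re.compile(r"(\d+),\s*(\d+):\s*\(([^,]+),\s*([^)]+)\)")

def parse_output(text):
    chronicles = []
    for match in CHRONICLE.finditer(text):
        events = [e.strip().strip('"') for e in match.group("events").split(",")]
        constraints = {
            (int(i), int(j)): (lo.strip(), hi.strip())
            for i, j, lo, hi in CONSTRAINT.findall(match.group("constraints"))
        }
        support = (int(match.group("pos")), int(match.group("neg")))
        chronicles.append((events, constraints, support))
    return chronicles

with open("output.log") as f:
    for events, constraints, (pos, neg) in parse_output(f.read()):
        print(len(events), "events, support:", pos, "/", neg)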
22/01/2018 : Version 1, initial version
This app id is: 176
This curl command will create a job and return your job URL, as well as the average execution time. The files and/or dataset fields are optional; remember to remove them if not wanted.

curl -H 'Authorization: Token token=<your_private_token>' -X POST -F job[webapp_id]=176 -F job[param]="" -F job[queue]=standard -F files[0]=@test.txt -F files[1]=@test2.csv -F job[file_url]=<my_file_url> -F job[dataset]=<my_dataset_name> https://allgo.inria.fr/api/v1/jobs
Then, check your job to get the URLs of the result files with:
curl -H 'Authorization: Token token=<your_private_token>' -X GET https://allgo.inria.fr/api/v1/jobs/<job_id>
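The same calls can be made from Python. Here is a minimal sketch assuming the requests library, the two endpoints shown above, and that job[param] carries the DCM parameters; the token and file names are placeholders:

import requests

API = "https://allgo.inria.fr/api/v1/jobs"
HEADERS = {"Authorization": "Token token=<your_private_token>"}

# Create a job; job[param] is assumed to carry the DCM parameters.
with open("pos.dat", "rb") as pos, open("neg.dat", "rb") as neg:
    response = requests.post(
        API,
        headers=HEADERS,
        data={"job[webapp_id]": "176",
              "job[param]": "-f 0.8 -g 2",
              "job[queue]": "standard"},
        files={"files[0]": pos, "files[1]": neg},
    )
response.raise_for_status()
print(response.json())  # inspect the reply to get the job id

# Then check the job status and result file URLs with:
# requests.get(f"{API}/<job_id>", headers=HEADERS).json()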