RADI.sh: Repeated Audio motif DIscovery within audio files.




This service finds repeating patterns within speech and audio streams without prior knowledge.

Overview:

RADI.sh discovers and collects occurrences of repeating spoken/audio motifs within the input audio stream. It is language- and topic-independent: it relies neither on prior acoustic or linguistic knowledge nor on training material (an unsupervised approach). It handles large audio streams in a reasonable amount of time.

This service proceeds as follows. First, the audio is converted into a sequence of feature vectors, i.e., either a sequence of MFCCs or a posteriorgram computed from them. Second, the sequence of feature vectors is analyzed progressively using a sliding window to detect repeated motifs and store a prototypical pattern representing each of them in a library. The analysis window consists of two portions: the first is the short pattern to be matched, called the "seed"; the second is its recent future. The seed is considered a potential motif fragment if it partially matches an element of the library, or if it is repeated within its recent future. In that case, the matching patterns are extended to search for the complete motif occurrences. If the similarity between the matching motifs is above a given threshold, either a new occurrence of a motif from the library is detected, or a new motif is detected and added to the library. Otherwise, the seed is discarded.
In the current version of this service, the recent future lasts 90 seconds or less, depending on the remaining duration of the stream. When a new occurrence of an existing motif is detected, its reference motif in the library is updated: it is replaced by the median occurrence according to a dynamic time warping (DTW)-based score. More information can be found in [1].
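The matching and the median-occurrence update both rely on DTW-based scores. As an illustration only (the service uses the Modis implementation, not this code), a length-normalized DTW distance between two sequences of feature frames can be sketched as:

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature
    vectors (one frame per tuple), normalized by the sequence lengths.
    Illustrative sketch only; not the service's actual scoring."""
    n, m = len(a), len(b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])          # local frame distance
            cost[i][j] = d + min(cost[i - 1][j],       # insertion
                                 cost[i][j - 1],       # deletion
                                 cost[i - 1][j - 1])   # match
    return cost[n][m] / (n + m)

# Toy check with made-up 2-dim "frames": a slightly perturbed copy of a
# seed scores much closer to it than an unrelated sequence does.
seed = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]
near = [(x + 0.01, y + 0.01) for x, y in seed]
far = [(9.0, 9.0)] * 4
print(dtw_distance(seed, near) < dtw_distance(seed, far))  # True
```

A low DTW score between two windows is what makes a seed a candidate motif fragment in the procedure described above.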

[1] Muscariello A., Bimbot F. and Gravier G., "Unsupervised Motif Acquisition in Speech via Seeded Discovery and Template Matching", IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 7, pp. 2031-2044, 2012.

File formats:

RADI.sh takes an audio file as input and outputs a text file and a JSON file containing the positions of the discovered motifs. A JSON file produced by other multimedia services from A||go can be provided as input; it is then completed with this description.

  • inputs:
    • audio file: many formats are supported (wav, mp3, ogg, flac, mp4, ...) since the input is converted to a 16-bit, 16 kHz mono wav file using ffmpeg.
    • JSON file (optional): uploading the JSON output "<audio_file_name>.json" from A||go's multimedia web services updates it with RADI.sh's results under the 'radish' label, along with metadata from the audio stream.
  • outputs:
    • text file: "<input_file_name>_radish.txt" is made of three tab-separated columns in the following format:
      <start_time>\t<end_time>\t<motif_index>\n
      <start_time>\t<end_time>\t<motif_index>\n
      ...
      
      each line describing a single motif occurrence. Occurrences of the same motif share the same index.
    • JSON file with the following format:
      {
        "general_info":{
          "src":"<input_file_name>",
          "audio":{
            "duration":"<time_in_hh:mm:ss_format>",
            "start":"<temporal_offset_in_seconds>",
            "format":"<bit_coding_format>",
            "sampling_rate":"<frequency> Hz",
            "nb_channels":"<n> channels",
            "bit_rate":"<bit_rate> kb/s"
          }
        },
        "radish":{
          "annotation_type":"repeated motives",
          "system":"radish",
          "parameters":"<input_parameters>",
          "modality":"audio",
          "time_unit":"seconds",
          "events":[
            {
              "start":<seg_start_time>,
              "end":<seg_end_time>,
              "type":"<motif_index>"
            },
            {
              "start":<seg_start_time>,
              "end":<seg_end_time>,
              "type":"<motif_index>"
            },
            ...
            {
              "start":<seg_start_time>,
              "end":<seg_end_time>,
              "type":"<motif_index>"
            }
          ]
        }
      }
        
    • Each element of "events" is a motif occurrence, with "start" and "end" times in seconds; "type" indicates its similarity class (the motif index).
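For instance, the occurrences in the JSON output can be grouped by motif index with a few lines of Python (the timestamps below are made up):

```python
import json
from collections import defaultdict

# Made-up excerpt following the documented JSON layout.
doc = json.loads('''{
  "radish": {
    "time_unit": "seconds",
    "events": [
      {"start": 12.4, "end": 14.1, "type": "0"},
      {"start": 88.0, "end": 89.6, "type": "0"},
      {"start": 45.2, "end": 47.0, "type": "1"}
    ]
  }
}''')

# Collect all (start, end) spans of each motif under its index.
by_motif = defaultdict(list)
for ev in doc["radish"]["events"]:
    by_motif[ev["type"]].append((ev["start"], ev["end"]))

print(dict(by_motif))  # {'0': [(12.4, 14.1), (88.0, 89.6)], '1': [(45.2, 47.0)]}
```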

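The three-column text output can likewise be loaded with standard Python (the sample values below are made up):

```python
import csv
import io

# Made-up sample in the documented "<start>\t<end>\t<motif_index>" layout.
sample = "12.4\t14.1\t0\n88.0\t89.6\t0\n45.2\t47.0\t1\n"

# One (start_time, end_time, motif_index) tuple per occurrence.
occurrences = [(float(start), float(end), int(idx))
               for start, end, idx in csv.reader(io.StringIO(sample),
                                                 delimiter="\t")]
print(occurrences)  # [(12.4, 14.1, 0), (88.0, 89.6, 0), (45.2, 47.0, 1)]
```

For a real run, replace `io.StringIO(sample)` with an open file handle on "<input_file_name>_radish.txt".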
Parameters:

  • -t: similarity threshold (default: 2)
  • -l: seed length in seconds (default: 0.25s)
  • -p <value>: use a posteriorgram instead of MFCCs for motif discovery (default: MFCC). The value is the number of Gaussian components in the Gaussian Mixture Model used to compute the posteriorgram (typically 256).

Credits and license:

RADI.sh is the online version of Modis: an audio MOtif DIScovery software. It incorporates Spro 5.0 for the extraction of MFCCs and Audioseg for the calculation of the posteriorgram. Spro was developed by Guillaume Gravier. Modis is a free speech and audio motif discovery software created and developed by (in alphabetical order) Frédéric Bimbot, Laurence Catanese, Guillaume Gravier, Armando Muscariello and Nathan Souviraà-Labastie. It is the property of IRISA, CNRS, INRIA and the University of Rennes.

10/08/2017: Version 1.0

How to use our REST API:

Remember to check your private token in your account first. You can find more details in our documentation tab.

This app's id is 87.

This curl command creates a job and returns your job URL, along with the average execution time:

The files and/or dataset fields are optional; remove them if not wanted.
curl -H 'Authorization: Token token=<your_private_token>' -X POST \
  -F job[webapp_id]=87 \
  -F job[param]="" \
  -F job[queue]=standard \
  -F files[0]=@test.txt \
  -F files[1]=@test2.csv \
  -F job[file_url]=<my_file_url> \
  -F job[dataset]=<my_dataset_name> \
  https://allgo.inria.fr/api/v1/jobs

Then, check your job and retrieve the output file URLs with:

curl -H 'Authorization: Token token=<your_private_token>' -X GET https://allgo.inria.fr/api/v1/jobs/<job_id>
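The same request can be assembled from Python. The sketch below only builds the header and form fields of the job-creation call; sending them is left to your HTTP client of choice. The parameter string "-t 2 -l 0.25" is an assumed example of how RADI.sh options might be passed in job[param], not a documented value:

```python
API = "https://allgo.inria.fr/api/v1/jobs"

def make_job_request(token, webapp_id=87, param="", queue="standard"):
    """Assemble the header and form fields sent by the documented curl
    call. The token is your A||go private token."""
    headers = {"Authorization": f"Token token={token}"}
    fields = {
        "job[webapp_id]": str(webapp_id),
        "job[param]": param,
        "job[queue]": queue,
    }
    return headers, fields

# Assumed example of passing RADI.sh options through job[param].
headers, fields = make_job_request("<your_private_token>", param="-t 2 -l 0.25")
print(fields["job[webapp_id]"], fields["job[param]"])  # 87 -t 2 -l 0.25
```

Audio files would be attached as multipart fields (files[0], ...) exactly as in the curl example above.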