Loria speech transcription system v2

Loriasts2 fr

LORIA STS v2 transcribes the speech contained within audio files into text.




The LORIA Speech Transcription System performs a textual transcription of the speech contained in audio files. This system is a newer version of Loria STS based on Kaldi. It is adapted to the French language.

File formats:

This service takes an audio file as input and returns the transcribed speech in several formats: text (CTM, SRT) and JSON.

  • inputs:
    • audio file: many audio formats are supported, because the input file is converted to a 16-bit, 16 kHz mono WAV file by AVCONV
    • JSON file (optional): if you upload the JSON output "<audio_file_name>.json" produced by A||go's multimedia web services, it is updated with the output transcript under the 'loriaSTS' label, along with metadata from the audio stream.
  • outputs:
    • text file: <input_file_name>.ctm has the following format:
      <input_file_name>\t<channel_index>\t<start_time>\t<duration_of_word>\t<word>\t<confidence_score>\n
      <input_file_name>\t<channel_index>\t<start_time>\t<duration_of_word>\t<word>\t<confidence_score>\n
      <input_file_name>\t<channel_index>\t<start_time>\t<duration_of_word>\t<word>\t<confidence_score>\n
      ...
      
      Each line describes one estimated word. <start_time> and <duration_of_word> are in seconds, <channel_index> is A for the audio file's left channel and B for the right one, and <confidence_score> lies between 0 and 1 (1 being the highest confidence in the word estimate).
    • SubRip text file: <input_file_name>.srt contains the resulting transcript in a widely used subtitle format.
      <subtitle_index>\n
      <start_time> --> <end_time>\n
      <words_transcribed_in_this_time_span>\n
      \n
      <subtitle_index>\n
      <start_time> --> <end_time>\n
      <words_transcribed_in_this_time_span>\n
      ...
      
      Times are expressed in the hh:mm:ss,sss format. The transcript is split into 3-second windows to form the subtitles.
    • JSON file with the following format:
      {
      "general_info":{
      "src":"<input_file_name>",
      "audio":{
      "duration":"<time_in_hh:mm:ss_format>",
      "start":"<temporal_offset_in_seconds>",
      "format":"<bit_coding_format>",
      "sampling_rate":"<frequency> Hz",
      "nb_channels":"<n> channels",
      "bit_rate":"<bit_rate> kb/s"
      }
      },
      "loriaSTS":{
      "annotation_type":"speech transcription",
      "system":"loriaSTS",
      "parameters":"<input_parameters>",
      "modality":"audio",
      "time_unit":"seconds",
      "events":[
      {
      "start":<start_time>,
      "end":<end_time>,
      "word":"<estimated_word>",
      "confidence": <confidence_measure>
      },
      {
      "start":<start_time>,
      "end":<end_time>,
      "word":"<estimated_word>",
      "confidence": <confidence_measure>
      },
      ...
      {
      "start":<start_time>,
      "end":<end_time>,
      "word":"<estimated_word>",
      "confidence": <confidence_measure>
      }
      ]
      }
      }
        
      Here <start_time> and <end_time> are expressed in seconds.
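
The output formats described above can be read with a few lines of Python. The following is a minimal sketch, not part of the service: the names CtmWord, parse_ctm, srt_timestamp, group_by_window and low_confidence_events are hypothetical, and the rule used to assign a word to a 3-second window (the window in which the word starts) is an assumption, since the exact rule used by the system is not documented here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CtmWord:
    file_name: str
    channel: str       # "A" (left channel) or "B" (right channel)
    start: float       # start time in seconds
    duration: float    # word duration in seconds
    word: str
    confidence: float  # between 0 and 1


def parse_ctm(text: str) -> List[CtmWord]:
    """Parse tab-separated CTM lines into CtmWord records."""
    words = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, channel, start, duration, word, confidence = line.split("\t")
        words.append(
            CtmWord(name, channel, float(start), float(duration), word, float(confidence))
        )
    return words


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as hh:mm:ss,sss."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def group_by_window(words: List[CtmWord], window: float = 3.0) -> Dict[int, List[str]]:
    """Group words by the window in which they start (assumed rule)."""
    windows: Dict[int, List[str]] = {}
    for w in words:
        windows.setdefault(int(w.start // window), []).append(w.word)
    return windows


def low_confidence_events(doc: dict, threshold: float = 0.5) -> List[dict]:
    """Return the events of the JSON output whose confidence is below threshold."""
    return [e for e in doc["loriaSTS"]["events"] if e["confidence"] < threshold]
```

For instance, parse_ctm can be applied to the contents of <input_file_name>.ctm, and low_confidence_events to the dictionary returned by json.load on the JSON output, to list the words that may need manual review.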

Credits and license:

The Loria speech transcription system v2 is developed by Fohr D., Mella O. and Jouvet D. at LORIA/Inria Nancy. This software relies on Libav/AVCONV, the SOX platform, the speaker diarization software from LIUM and the Kaldi speech recognition toolkit. Kaldi and the speaker diarization software from LIUM are available under the Apache 2.0 and GNU licenses, respectively. HTK is released under a proprietary license. The acoustic and language models were trained on corpora reserved for research and teaching only.

10/08/2017: Version 1.0

How to use our REST API:

First, look up your private token in your account. You can find more details in our documentation tab.

The ID of this app is: 133

This curl command creates a job and returns your job URL, along with the average execution time.

The files and/or dataset fields are optional; remove them if you do not need them.
curl -H 'Authorization: Token token=<your_private_token>' -X POST \
  -F 'job[webapp_id]=133' \
  -F 'job[param]=' \
  -F 'job[queue]=standard' \
  -F 'files[0]=@test.txt' \
  -F 'files[1]=@test2.csv' \
  -F 'job[file_url]=<my_file_url>' \
  -F 'job[dataset]=<my_dataset_name>' \
  https://allgo.inria.fr/api/v1/jobs

Then query your job to retrieve the URLs of its output files:

curl -H 'Authorization: Token token=<your_private_token>' -X GET https://allgo.inria.fr/api/v1/jobs/<job_id>
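
The same GET request can be issued from a script. The following is a minimal sketch using only the Python standard library; the names API_ROOT, build_job_request and fetch_job are hypothetical, and because the structure of the response is not described here, the body is returned as raw text rather than parsed.

```python
import urllib.request

# Base URL taken from the curl commands above.
API_ROOT = "https://allgo.inria.fr/api/v1"


def build_job_request(job_id: int, token: str) -> urllib.request.Request:
    """Build the GET request for a job, mirroring the curl command."""
    return urllib.request.Request(
        f"{API_ROOT}/jobs/{job_id}",
        headers={"Authorization": f"Token token={token}"},
        method="GET",
    )


def fetch_job(job_id: int, token: str) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_job_request(job_id, token)) as response:
        return response.read().decode("utf-8")
```

Calling fetch_job(<job_id>, "<your_private_token>") performs the network request; build_job_request alone can be used to inspect the URL and header without contacting the server.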