Loria speech transcription system

LORIA STS transcribes the speech contained within audio files into text.


The LORIA Speech Transcription System performs a textual transcription of the speech contained in audio files. It is adapted to the French language.

File formats:

This service takes an audio file as input and outputs the transcribed speech using several formats.

  • inputs:
    • audio file: many audio formats are supported, because the input file is converted to a 16-bit, 16 kHz mono WAV file by AVCONV
    • JSON file (optional): if you upload the JSON output "<audio_file_name>.json" from A||go's multimedia web services, it is updated with the output transcripts under the 'loriaSTS' label, along with metadata from the audio stream.
  • outputs:
    • text file: <input_file_name>.ctm has the following format:
      <input_file_name>\t<channel_index>\t<start_time>\t<duration_of_word>\t<word>\t<confidence_score>\n
      <input_file_name>\t<channel_index>\t<start_time>\t<duration_of_word>\t<word>\t<confidence_score>\n
      <input_file_name>\t<channel_index>\t<start_time>\t<duration_of_word>\t<word>\t<confidence_score>\n
      ...
      
      Each line is related to an estimated word, with <start_time> and <duration_of_word> in seconds, <channel_index> being 1 for the audio file's left channel and 2 for the right one, and <confidence_score> taking values between 0 and 1 (1 is the highest confidence in the word estimation).
    • SubRip text file: <input_file_name>.srt contains the resulting transcript in SubRip, a widely used subtitle format:
      <subtitle_index>\n
      <start_time> --> <end_time>\n
      <words_transcribed_in_this_time_span>\n
      \n
      <subtitle_index>\n
      <start_time> --> <end_time>\n
      <words_transcribed_in_this_time_span>\n
      ...
      
      Times are expressed in the hh:mm:ss,sss format. The transcript is divided into 3-second windows to form these subtitles.
    • JSON file with the following format:
      {
        "general_info": {
          "src": "<input_file_name>",
          "audio": {
            "duration": "<time_in_hh:mm:ss_format>",
            "start": "<temporal_offset_in_seconds>",
            "format": "<bit_coding_format>",
            "sampling_rate": "<frequency> Hz",
            "nb_channels": "<n> channels",
            "bit_rate": "<bit_rate> kb/s"
          }
        },
        "loriaSTS": {
          "annotation_type": "speech transcription",
          "system": "loriaSTS",
          "parameters": "<input_parameters>",
          "modality": "audio",
          "time_unit": "seconds",
          "events": [
            {
              "start": <start_time>,
              "end": <end_time>,
              "word": "<estimated_word>",
              "confidence": <confidence_measure>
            },
            {
              "start": <start_time>,
              "end": <end_time>,
              "word": "<estimated_word>",
              "confidence": <confidence_measure>
            },
            ...
            {
              "start": <start_time>,
              "end": <end_time>,
              "word": "<estimated_word>",
              "confidence": <confidence_measure>
            }
          ]
        }
      }
        
      <start_time> and <end_time> are expressed in seconds.
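
The output formats above can be read back with a few lines of Python. The following is a minimal sketch, not code from LORIA STS: the names CtmWord, parse_ctm, to_events, group_by_window and to_srt are hypothetical, the relation end = start + duration is an assumption linking the .ctm and JSON outputs, and the rule used to assign a word to a 3-second window (by its start time) is also an assumption, since the exact rule used by the service is not described here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CtmWord:
    source: str        # <input_file_name>
    channel: int       # 1 = left channel, 2 = right channel
    start: float       # seconds
    duration: float    # seconds
    word: str
    confidence: float  # between 0 and 1


def parse_ctm(text: str) -> List[CtmWord]:
    """Parse tab-separated .ctm lines into CtmWord records."""
    words = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        source, channel, start, duration, word, confidence = line.split("\t")
        words.append(CtmWord(source, int(channel), float(start),
                             float(duration), word, float(confidence)))
    return words


def to_events(words: List[CtmWord]) -> List[Dict]:
    """Build JSON-style events; assumes end = start + duration."""
    return [{"start": w.start,
             "end": round(w.start + w.duration, 3),
             "word": w.word,
             "confidence": w.confidence} for w in words]


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as hh:mm:ss,sss."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def group_by_window(words: List[CtmWord],
                    window: float = 3.0) -> List[Tuple[float, float, str]]:
    """Group words into fixed windows by start time (assumed rule)."""
    groups: Dict[int, List[str]] = {}
    for w in words:
        groups.setdefault(int(w.start // window), []).append(w.word)
    return [(i * window, (i + 1) * window, " ".join(ws))
            for i, ws in sorted(groups.items())]


def to_srt(words: List[CtmWord], window: float = 3.0) -> str:
    """Render the windows as SubRip blocks separated by blank lines."""
    blocks = []
    for number, (start, end, text) in enumerate(
            group_by_window(words, window), start=1):
        blocks.append(f"{number}\n{srt_timestamp(start)} --> "
                      f"{srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)
```

For instance, parse_ctm(open("<input_file_name>.ctm").read()) yields one record per estimated word, which can then be filtered on confidence before calling to_srt.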

Parameters:

  • -bw1 <integer>: width of the exploration beam used during the first Viterbi decoding, in number of states (default: 2000). Decreasing this beam width speeds up computation but lowers transcription quality.
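
The speed/quality trade-off of a beam width can be illustrated on a toy model. The sketch below is not the decoder used by LORIA STS: it is a generic Viterbi search over a two-state hidden Markov model that keeps only the beam_width best states at each frame, and every name and probability in it is made up for illustration.

```python
import math


def prune(hypotheses, beam_width):
    """Keep only the beam_width best-scoring states."""
    ranked = sorted(hypotheses.items(), key=lambda kv: kv[1][0], reverse=True)
    return dict(ranked[:beam_width])


def viterbi_beam(observations, states, start, trans, emit, beam_width):
    """Viterbi decoding in the log domain with a rank-based beam.

    Returns (best_path, best_log_score) among the surviving hypotheses.
    """
    # active maps a state to (log score, best path ending in that state)
    active = {s: (math.log(start[s]) + math.log(emit[s][observations[0]]), [s])
              for s in states}
    active = prune(active, beam_width)
    for obs in observations[1:]:
        candidates = {}
        for prev, (score, path) in active.items():
            for s in states:
                new_score = (score + math.log(trans[prev][s])
                             + math.log(emit[s][obs]))
                if s not in candidates or new_score > candidates[s][0]:
                    candidates[s] = (new_score, path + [s])
        active = prune(candidates, beam_width)
    best_score, best_path = max(active.values(), key=lambda item: item[0])
    return best_path, best_score


# Toy model (illustrative values only).
STATES = ["A", "B"]
START = {"A": 0.5, "B": 0.5}
TRANS = {"A": {"A": 0.8, "B": 0.2}, "B": {"A": 0.1, "B": 0.9}}
EMIT = {"A": {"x": 0.5, "y": 0.1, "z": 0.4},
        "B": {"x": 0.4, "y": 0.5, "z": 0.1}}

full_path, full_score = viterbi_beam(["x", "y"], STATES, START, TRANS, EMIT, 2)
narrow_path, narrow_score = viterbi_beam(["x", "y"], STATES, START, TRANS, EMIT, 1)
```

With a beam of 2 states the search is exhaustive on this model and finds the path B, B. With a beam of 1, state B is discarded after the first frame, so the search ends on the lower-scoring path A, B: fewer hypotheses are expanded, at the cost of missing the best one.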

Reference:

Illina I., Fohr D., Mella O., Cerisara C., "The Automatic News Transcription System : ANTS some Real Time experiments", In Proc. of the 8th International Conference on Spoken Language Processing (ICSLP), October 2004.

Credits and license:

The Loria speech transcription system is the online version of the Automatic News Transcription System (ANTS) developed by Jouvet D., Fohr D. and Mella O. at LORIA/Inria Nancy. This software relies on AVCONV, the SOX platform, the HTK speech recognition toolkit, the speaker diarization software from LIUM and the Julius decoder. Julius and the speaker diarization software from LIUM are released under a revised BSD license and the GNU license, respectively. HTK is released under a proprietary license. Acoustic and language models were trained on corpora reserved for research and teaching only.

10/08/2017: Version 1.0

How to use our REST API:

Remember to check your private token in your account first. You can find more details in the documentation tab.

The ID of this app is 77.

This curl command creates a job and returns your job URL, along with the average execution time.

The files and dataset fields are optional; remove them if you do not need them.
curl -H 'Authorization: Token token=<your_private_token>' -X POST \
  -F 'job[webapp_id]=77' \
  -F 'job[param]=' \
  -F 'job[queue]=standard' \
  -F 'files[0]=@test.txt' \
  -F 'files[1]=@test2.csv' \
  -F 'job[file_url]=<my_file_url>' \
  -F 'job[dataset]=<my_dataset_name>' \
  https://allgo.inria.fr/api/v1/jobs

Then check your job to get the URLs of its files with:

curl -H 'Authorization: Token token=<your_private_token>' -X GET 'https://allgo.inria.fr/api/v1/jobs/<job_id>'
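
The same two calls can be prepared from Python with the standard library only. This is a minimal sketch under stated assumptions, not an official client: the helper names (encode_multipart, build_create_job_request, build_job_status_request) are hypothetical, only the job[file_url] route is covered (uploading local files would need additional file parts), and the response format is not handled because it is not specified here. The requests are built but not sent.

```python
import urllib.request
import uuid

API_JOBS_URL = "https://allgo.inria.fr/api/v1/jobs"
WEBAPP_ID = 77  # ID of this app, as stated above


def auth_headers(token):
    """Authorization header in the form used by the curl commands."""
    return {"Authorization": f"Token token={token}"}


def encode_multipart(fields, boundary=None):
    """Encode text fields as multipart/form-data (what curl -F sends)."""
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")
        lines.append(value)
    lines.append(f"--{boundary}--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def build_create_job_request(token, file_url, param="", queue="standard",
                             boundary=None):
    """Prepare the POST request that creates a job from a file URL."""
    fields = {
        "job[webapp_id]": str(WEBAPP_ID),
        "job[param]": param,
        "job[queue]": queue,
        "job[file_url]": file_url,
    }
    body, content_type = encode_multipart(fields, boundary)
    headers = auth_headers(token)
    headers["Content-Type"] = content_type
    return urllib.request.Request(API_JOBS_URL, data=body, headers=headers,
                                  method="POST")


def build_job_status_request(token, job_id):
    """Prepare the GET request that retrieves a job and its file URLs."""
    return urllib.request.Request(f"{API_JOBS_URL}/{job_id}",
                                  headers=auth_headers(token), method="GET")
```

Passing either request to urllib.request.urlopen sends it. Keeping construction separate from sending makes it possible to inspect the headers and body before anything reaches the server.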