Tags:
speech, transcription, ASR, LORIA, audio, multimedia
Owner:
gabriel.sargent@irisa.fr
LORIA STS transcribes the speech contained within audio files into text.
The LORIA Speech Transcription System performs a textual transcription of the speech contained in audio files. It is adapted to the French language.
This service takes an audio file as input and outputs the transcribed speech in several formats.
<input_file_name>\t<channel_index>\t<start_time>\t<duration_of_word>\t<word>\t<confidence_score>\n
<input_file_name>\t<channel_index>\t<start_time>\t<duration_of_word>\t<word>\t<confidence_score>\n
<input_file_name>\t<channel_index>\t<start_time>\t<duration_of_word>\t<word>\t<confidence_score>\n
...
Each line is related to an estimated word, with <start_time> and <duration_of_word> in seconds, <channel_index> being 1 for the audio file's left channel and 2 for the right one, and <confidence_score> taking values between 0 and 1 (1 is the highest confidence in the word estimation).
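As an illustration of how the tab-separated word lines above might be consumed, here is a minimal Python sketch. The `Word` type, the `parse_word_lines` function and the sample data are hypothetical names and values introduced for this example; they are not part of the service.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    """One estimated word, mirroring the six tab-separated fields."""
    file_name: str
    channel: int        # 1 = left channel, 2 = right channel
    start: float        # seconds
    duration: float     # seconds
    word: str
    confidence: float   # between 0 and 1


def parse_word_lines(text: str) -> List[Word]:
    """Parse the tab-separated output, skipping blank lines."""
    words = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) != 6:
            raise ValueError(f"expected 6 tab-separated fields, got {len(fields)}: {line!r}")
        name, channel, start, duration, word, confidence = fields
        words.append(Word(name, int(channel), float(start), float(duration),
                          word, float(confidence)))
    return words


# Made-up sample in the layout described above.
sample = ("demo.wav\t1\t0.50\t0.30\tbonjour\t0.92\n"
          "demo.wav\t1\t0.80\t0.45\tmonde\t0.40\n")
words = parse_word_lines(sample)
# Words whose confidence score falls below an arbitrary threshold of 0.5.
low_confidence = [w.word for w in words if w.confidence < 0.5]
print(low_confidence)
```

Filtering on the confidence score, as in the last lines, is one possible way to flag words that deserve a manual check.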
<subtitle_index>\n
<start_time> --> <end_time>\n
<words_transcribed_in_this_time_span>\n
\n
<subtitle_index>\n
<start_time> --> <end_time>\n
<words_transcribed_in_this_time_span>\n
...
The times are expressed in the hh:mm:ss,sss format. The transcript is divided into windows of 3 seconds to form these subtitles.
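The subtitle layout above can be approximated from a list of timed words. The sketch below is an illustration only: it assumes that windows start at 0 seconds and that each word is assigned to a window by its start time, which the description does not state. The function names `srt_timestamp` and `words_to_srt` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as hh:mm:ss,sss."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: Iterable[Tuple[float, str]], window: float = 3.0) -> str:
    """Group (start_time, word) pairs into fixed windows and emit subtitle blocks.

    Assumption: a word belongs to the window that contains its start time.
    """
    buckets: Dict[int, List[str]] = {}
    for start, word in words:
        buckets.setdefault(int(start // window), []).append(word)
    blocks = []
    for index, key in enumerate(sorted(buckets), start=1):
        begin = key * window
        end = begin + window
        blocks.append(f"{index}\n{srt_timestamp(begin)} --> {srt_timestamp(end)}\n"
                      f"{' '.join(buckets[key])}\n")
    # Blocks are separated by an empty line, as in the layout above.
    return "\n".join(blocks)


subtitles = words_to_srt([(0.5, "bonjour"), (1.2, "le"), (3.4, "monde")])
print(subtitles)
```

Empty windows are simply skipped here; whether the service emits them is not specified in the description.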
{
  "general_info":{
    "src":"<input_file_name>",
    "audio":{
      "duration":"<time_in_hh:mm:ss_format>",
      "start":"<temporal_offset_in_seconds>",
      "format":"<bit_coding_format>",
      "sampling_rate":"<frequency> Hz",
      "nb_channels":"<n> channels",
      "bit_rate":"<bit_rate> kb/s"
    }
  },
  "loriaSTS":{
    "annotation_type":"speech transcription",
    "system":"loriaSTS",
    "parameters":"<input_parameters>",
    "modality":"audio",
    "time_unit":"seconds",
    "events":[
      { "start":<start_time>, "end":<end_time>, "word":"<estimated_word>", "confidence": <confidence_measure> },
      { "start":<start_time>, "end":<end_time>, "word":"<estimated_word>", "confidence": <confidence_measure> },
      ...
      { "start":<start_time>, "end":<end_time>, "word":"<estimated_word>", "confidence": <confidence_measure> }
    ]
  }
}
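A JSON result with this structure could be read as follows. This is a minimal sketch that assumes "start" and "confidence" are JSON numbers, as the unquoted placeholders suggest; the function name `transcript_from_json` and the miniature sample are hypothetical.

```python
import json


def transcript_from_json(document: str, min_confidence: float = 0.0) -> str:
    """Join the words of the "events" list, in time order, above a confidence threshold."""
    data = json.loads(document)
    events = data["loriaSTS"]["events"]
    ordered = sorted(events, key=lambda event: event["start"])
    return " ".join(e["word"] for e in ordered if e["confidence"] >= min_confidence)


# Made-up miniature result; only the fields used here are filled in.
sample = json.dumps({
    "general_info": {"src": "demo.wav"},
    "loriaSTS": {
        "time_unit": "seconds",
        "events": [
            {"start": 0.80, "end": 1.25, "word": "monde", "confidence": 0.40},
            {"start": 0.50, "end": 0.80, "word": "bonjour", "confidence": 0.92},
        ],
    },
})

full_text = transcript_from_json(sample)
confident_text = transcript_from_json(sample, min_confidence=0.5)
print(full_text)
print(confident_text)
```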
Illina I., Fohr D., Mella O., Cerisara C., "The Automatic News Transcription System : ANTS some Real Time experiments", In Proc. of the 8th International Conference on Spoken Language Processing (ICSLP), October 2004.
The LORIA Speech Transcription System is the online version of the Automatic News Transcription System (ANTS) developed by Jouvet D., Fohr D. and Mella O. at LORIA/Inria Nancy. This software relies on AVCONV, the SOX platform, the HTK speech recognition toolkit, the speaker diarization software from LIUM and the Julius decoder. Julius and the speaker diarization software from LIUM are released under a revised BSD license and the GNU license, respectively. HTK is released under a proprietary license. The acoustic and language models were trained on corpora reserved for research and teaching only.
Input:
Output:
10/08/2017: Version 1.0
The ID of this app is 77.
The following curl command creates a job and returns the job URL along with the average execution time.
The files and/or dataset fields are optional; remove them if you do not need them.
curl -H 'Authorization: Token token=<your_private_token>' -X POST -F job[webapp_id]=77 -F job[param]="" -F job[queue]=standard -F files[0]=@test.txt -F files[1]=@test2.csv -F job[file_url]=<my_file_url> -F job[dataset]=<my_dataset_name> https://allgo.inria.fr/api/v1/jobs
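Since the file, file URL and dataset fields are optional, it can be convenient to assemble the curl command above programmatically and leave out the fields that are not needed. The following Python sketch does only that; the function name `build_create_job_command` and the file name `speech.wav` are hypothetical, and the form fields are copied from the command above.

```python
from typing import List, Optional, Sequence

API_URL = "https://allgo.inria.fr/api/v1/jobs"


def build_create_job_command(token: str,
                             files: Sequence[str] = (),
                             file_url: Optional[str] = None,
                             dataset: Optional[str] = None,
                             param: str = "",
                             queue: str = "standard",
                             webapp_id: int = 77) -> List[str]:
    """Return the curl argument list for creating a job, omitting unused optional fields."""
    command = ["curl",
               "-H", f"Authorization: Token token={token}",
               "-X", "POST",
               "-F", f"job[webapp_id]={webapp_id}",
               "-F", f"job[param]={param}",
               "-F", f"job[queue]={queue}"]
    for index, path in enumerate(files):
        command += ["-F", f"files[{index}]=@{path}"]
    if file_url is not None:
        command += ["-F", f"job[file_url]={file_url}"]
    if dataset is not None:
        command += ["-F", f"job[dataset]={dataset}"]
    command.append(API_URL)
    return command


# Only one local file is attached; no file URL and no dataset.
command = build_create_job_command("<your_private_token>", files=["speech.wav"])
print(command)
```

The resulting list could then be passed to `subprocess.run(command, check=True)`; passing the arguments as a list avoids shell quoting issues with the square brackets.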
Then check your job to get the URLs of the output files:
curl -H 'Authorization: Token token=<your_private_token>' -X GET https://allgo.inria.fr/api/v1/jobs/<job_id>
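The same status request can be issued from Python with the standard library. The sketch below reuses the URL and the Authorization header from the curl command above; the function names `build_job_request` and `fetch_job`, the job ID 1234 and the UTF-8 decoding are assumptions, and the structure of the response is not described here, so the body is returned as raw text.

```python
import urllib.request

API_ROOT = "https://allgo.inria.fr/api/v1/jobs"


def build_job_request(token: str, job_id: int) -> urllib.request.Request:
    """Build the authenticated GET request for one job."""
    return urllib.request.Request(
        f"{API_ROOT}/{job_id}",
        headers={"Authorization": f"Token token={token}"},
        method="GET",
    )


def fetch_job(token: str, job_id: int, timeout: float = 30.0) -> str:
    """Send the request and return the response body (assumed to be UTF-8 text)."""
    with urllib.request.urlopen(build_job_request(token, job_id), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request does not touch the network; fetch_job() does.
request = build_job_request("<your_private_token>", 1234)
print(request.get_method(), request.full_url)
```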