Tags:
Speaker Diarization, audio, segmentation, speech, linkmedia
Owner:
gabriel.sargent@irisa.fr
SpeaDS: Speaker Diarization System. This service detects "who speaks when" within an audio recording, without prior information on the speakers.
SpeaDS segments the input audio stream according to the speakers appearing over time. These segments are labeled with a speaker index and an estimated gender (M: male, F: female). Speaker names are neither known nor estimated.
In this service, the audio stream is analyzed in four main steps.
References:
[1] Ben, Betser, Bimbot and Gravier, "Speaker diarization using bottom-up clustering based on a parameter-derived distance between adapted GMMs", in Proc. of the International Conference on Spoken Language Processing (Interspeech), 2004.
[2] SPro, available through Inria GForge
[3] AudioSeg, available through Inria GForge
(Version 1.0 of this web service includes SPro v5.0 and AudioSeg v1.2.2.)
SpeaDS takes the audio stream from audio or video files as input, and outputs the speaker segmentation in raw text and JSON formats. A JSON file produced by other A||go multimedia services can be provided as input, to be completed with this segmentation.
<speakerI_G>\t<start_time>\t<end_time>
<speakerI_G>\t<start_time>\t<end_time>
...
Each line describes a single segment, I being the speaker index and G the estimated gender of the speaker, either male (M) or female (F).
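This raw text output can be loaded with a few lines of Python. The sketch below is only a minimal example; the file name segmentation.txt is a placeholder for the actual result file.

# Parse the SpeaDS raw text output: one "<speakerI_G>\t<start_time>\t<end_time>" line per segment.
segments = []
with open("segmentation.txt") as f:          # placeholder name for the result file
    for line in f:
        if not line.strip():
            continue
        speaker, start, end = line.rstrip("\n").split("\t")
        segments.append({"spkr": speaker, "start": float(start), "end": float(end)})
for seg in segments:
    print(seg["spkr"], seg["start"], seg["end"])

The JSON output below carries the same segments under "speads" -> "events".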
{ "general_info":{ "src":"<input_file_name>", "audio":{ "duration":"<time_in_hh:mm:ss_format>", "start":"<temporal_offset_in_seconds>", "format":"<bit_coding_format>", "sampling_rate":"<frequency> Hz", "nb_channels":"<n> channels", "bit_rate":"<bit_rate> kb/s" } }, "speads":{ "annotation_type":"speaker segments", "system":"speads", "parameters":"<input_parameters>", "modality":"audio", "time_unit":"seconds", "events":[ { "start":<seg_start_time>, "end":<seg_end_time>, "spkr":"<speakerI_G>" }, { "start":<seg_start_time>, "end":<seg_end_time>, "spkr":"<speakerI_G>" }, ... { "start":<start_time>, "end":<end_time>, "spkr":"<speakerI_G>" } ] } }
SpeaDS was developed by Gabriel Sargent and Guillaume Gravier at IRISA/Inria Rennes Bretagne Atlantique. It can be released and supplied under license on a case-by-case basis. SPro was developed by Guillaume Gravier. AudioSeg was developed by Mathieu Ben, Michaël Betser and Guillaume Gravier.
17/08/2017: Version 1.0
This app id is: 99
The following curl command creates a job and returns your job URL, along with the average execution time.
The files and/or dataset parameters are optional; remove them if not needed.
curl -H 'Authorization: Token token=<your_private_token>' -X POST -F job[webapp_id]=99 -F job[param]="" -F job[queue]=standard -F files[0]=@test.txt -F files[1]=@test2.csv -F job[file_url]=<my_file_url> -F job[dataset]=<my_dataset_name> https://allgo.inria.fr/api/v1/jobs
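Job creation can also be scripted; a minimal Python sketch using the requests library, with the token, input file name and optional fields as placeholders to adapt:

import requests

TOKEN = "<your_private_token>"                      # your A||go private token
headers = {"Authorization": "Token token=" + TOKEN}

# Create a SpeaDS job (webapp_id 99); the files and dataset fields are optional.
with open("test.wav", "rb") as audio:               # placeholder input file
    response = requests.post(
        "https://allgo.inria.fr/api/v1/jobs",
        headers=headers,
        data={"job[webapp_id]": "99", "job[param]": "", "job[queue]": "standard"},
        files={"files[0]": audio},
    )
response.raise_for_status()
print(response.json())   # job information returned by the API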
Then check your job to get the result file URLs with:
curl -H 'Authorization: Token token=<your_private_token>' -X GET https://allgo.inria.fr/api/v1/jobs/<job_id>
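The job check can be scripted the same way; a minimal sketch, assuming job_id was read from the creation response:

import requests

TOKEN = "<your_private_token>"
job_id = 12345                                      # hypothetical id taken from the creation step
response = requests.get(
    "https://allgo.inria.fr/api/v1/jobs/%d" % job_id,
    headers={"Authorization": "Token token=" + TOKEN},
)
response.raise_for_status()
print(response.json())   # job status and result file URLs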