Nero

NERO: Named Entities Recognition - Online version. Named entity detector for text files.



NERO detects named entities in text. A named entity is a textual object (a word or a group of words) that can be assigned to a broad semantic class. This service considers the following classes: people, function, organization, location, human production, time and amount. It is designed for the French language.

Overview:

NERO implements two machine learning approaches for the detection of named entities within "noisy" texts such as speech transcripts obtained automatically. The first approach bases the detection on a Conditional Random Field whereas the second relies on a combination of three Finite State Transducers. They both use several textual features: the words themselves, with additional information (prior knowledge on their class or their importance, and/or morpho-syntactic information). The French corpus ESTER 2 was used for parameter tuning. For more information, please refer to [1] (article in French).

[1] Raymond C. and Fayolle J., "Reconnaissance robuste d'entités nommées sur de la parole transcrite automatiquement", in Proceedings of the 17e conférence sur le Traitement Automatique des Langues Naturelles (TALN'10), Montréal, Québec, Canada, July 2010.

File formats:

  • inputs:
    • text file (.txt): NERO is designed to process automatically produced transcriptions and does not take into account sentence-level information such as capital letters or punctuation marks. Replacing existing apostrophes with a space is advised, as it improves the classification of named entities.
    • JSON file (optional): if you upload the JSON output "<input_file_name>.json" produced by A||go's multimedia web services, it is updated with NERO's results under the 'nero' label, along with the words of the input text stream (see the 'general_info' then 'text' labels).
  • outputs:
    • text file (.txt): NERO produces a copy of the input text augmented with tags marking the beginning, the end and the semantic class of the named entities detected. The classes considered are: people ("pers"), function ("fonc"), organization ("org"), location ("loc"), human production ("prod"), time ("time") and amount ("amount"). Unknown named entities are tagged as "unk".
    • JSON file with the following format:
      {
      "general_info":{
      "src":"<input_file_name>",
      "text":{
      "duration":"00:00:00",
      "start":0,
      "time_unit":"word position",
      "words":[
      "<first_word_of_the_input_text>",
      "<second_word_of_the_input_text>",
      ...
      ]
      }
      },
      "nero":{
      "annotation_type":"named entities",
      "system":"nero",
      "parameters":"<input_parameters>",
      "modality":"text",
      "time_unit":"word position",
      "events":[
      {
      "start":<start_position>,
      "end":<end_position>,
      "type":"<class>"
      },
      {
      "start":<start_position>,
      "end":<end_position>,
      "type":"<class>"
      },
      ...
      {
      "start":<start_position>,
      "end":<end_position>,
      "type":"<class>"
      }
      ]
      }
      }
      
      Each element of the "events" list describes one detected named entity.
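
As an illustration of how the JSON output could be consumed, here is a minimal Python sketch that pairs each event with the words it covers. The sample document is hand-written for this example and is not real NERO output. The sketch also assumes that "start" and "end" are 0-based word positions and that "end" is inclusive; the format description above does not state this, so check it against an actual result file.

```python
import json

def entities_from_nero_json(doc):
    """Return (class, text) pairs for every event under the 'nero' label."""
    words = doc["general_info"]["text"]["words"]
    entities = []
    for event in doc["nero"]["events"]:
        # Assumption: positions are 0-based and "end" is inclusive.
        span = words[event["start"]:event["end"] + 1]
        entities.append((event["type"], " ".join(span)))
    return entities

# Hand-written sample following the documented layout (not real output).
sample = json.loads("""
{
  "general_info": {
    "src": "example.txt",
    "text": {
      "duration": "00:00:00",
      "start": 0,
      "time_unit": "word position",
      "words": ["angela", "merkel", "est", "en", "visite", "lundi", "à", "ankara"]
    }
  },
  "nero": {
    "annotation_type": "named entities",
    "system": "nero",
    "parameters": "",
    "modality": "text",
    "time_unit": "word position",
    "events": [
      {"start": 0, "end": 1, "type": "pers"},
      {"start": 5, "end": 5, "type": "time"},
      {"start": 7, "end": 7, "type": "loc"}
    ]
  }
}
""")

for entity_class, text in entities_from_nero_json(sample):
    print(entity_class, text)
```

If "end" turns out to be exclusive in real output, drop the "+ 1" in the slice.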

Parameters:

  • FST: (no dash) the named entity detection is performed using the Finite State Transducers approach. By default, the Conditional Random Field approach is used.
  • -f2h: enables the hierarchical tagging of named entities.
    Example without "-f2h":
    <fonc> président </fonc> <pers> chirac </pers>
    
    with "-f2h":
    <pers> <fonc> président </fonc> <pers> chirac </pers> </pers>
    

Credits and license:

NERO was developed by Christian Raymond at IRISA/INSA Rennes. The software relies on the OpenFST Library (version 1.3.1) and Wapiti (version 1.4).

Input:

example.txt
Trois ans après la démission du ministre du budget, qui avait dissimulé un compte en Suisse, deux lois importantes ont été votées.
Angela Merkel est en visite lundi à Ankara dans l’espoir de limiter les départ de réfugiés vers l'Europe.
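
The file formats section advises replacing apostrophes with a space before submitting a text. A minimal Python sketch of that step could look as follows. The function name is hypothetical, and handling both the straight (') and the typographic (’) apostrophe is an assumption of this example, not a documented requirement.

```python
def replace_apostrophes(text):
    """Replace straight and typographic apostrophes with a space."""
    # Assumption: both U+0027 (') and U+2019 (’) should be treated alike.
    return text.replace("'", " ").replace("\u2019", " ")

line = "dans l\u2019espoir de limiter les départ de réfugiés vers l'Europe."
print(replace_apostrophes(line))
```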



Output:

example_nero.txt
<time> Trois ans </time> après la démission du <fonc> ministre du budget </fonc>, qui avait dissimulé un compte en <loc> Suisse </loc>, deux lois importantes ont été votées. 
<pers> Angela Merkel </pers> est en visite <time> lundi </time> à <loc> Ankara </loc> dans <org> l’espoir </org> de limiter les départ de réfugiés vers l' <loc> Europe </loc>.
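
To reuse a tagged text file like the one above in a script, the tags can be extracted with a regular expression. The Python sketch below is only an illustration: it assumes that tags are separated from the entity text by single spaces, as in this example, and it handles flat output only. Nested tags produced with "-f2h" would need a proper parser.

```python
import re

# Matches "<class> entity text </class>" as laid out in the example output.
TAG = re.compile(r"<(\w+)> (.*?) </\1>")

def flat_entities(tagged_text):
    """Return (class, text) pairs from non-hierarchical tagged output."""
    return [(m.group(1), m.group(2)) for m in TAG.finditer(tagged_text)]

line = ("<pers> Angela Merkel </pers> est en visite <time> lundi </time> "
        "à <loc> Ankara </loc>")
print(flat_entities(line))
```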

17/08/2017: Version 1.0

How to use our REST API:

First, look up your private token in your account. More details are available in the documentation tab.

The ID of this app is 2.

The following curl command creates a job and returns the job URL along with the average execution time.

The file and dataset arguments are optional; remove the ones you do not need.
curl -H 'Authorization: Token token=<your_private_token>' -X POST \
  -F job[webapp_id]=2 \
  -F job[param]="" \
  -F job[queue]=standard \
  -F files[0]=@test.txt \
  -F files[1]=@test2.csv \
  -F job[file_url]=<my_file_url> \
  -F job[dataset]=<my_dataset_name> \
  https://allgo.inria.fr/api/v1/jobs

Then query your job to get the URLs of its files:

curl -H 'Authorization: Token token=<your_private_token>' -X GET https://allgo.inria.fr/api/v1/jobs/<job_id>
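
The same status request can be issued from a script. The Python sketch below mirrors the curl command above using only the standard library. The function names are hypothetical, and since the response format is not described here, the body is returned as raw text, assuming UTF-8 encoding.

```python
import urllib.request

API_ROOT = "https://allgo.inria.fr/api/v1/jobs"

def build_job_request(job_id, token):
    """Build the GET request for a job, with the token header shown above."""
    return urllib.request.Request(
        f"{API_ROOT}/{job_id}",
        headers={"Authorization": f"Token token={token}"},
        method="GET",
    )

def fetch_job(job_id, token):
    """Send the request and return the response body as text."""
    # Assumption: the body is UTF-8 text; its structure is not documented here.
    with urllib.request.urlopen(build_job_request(job_id, token)) as response:
        return response.read().decode("utf-8")
```

Calling fetch_job with your job ID and private token performs the network request; build_job_request alone can be used to inspect the request without sending it.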