Universal Dependencies, frequently abbreviated as UD, is an international cooperative project to create treebanks of the world's languages.[1] These treebanks are openly accessible. Their core applications are automated text processing in the field of natural language processing (NLP) and research into natural language syntax and grammar, especially within linguistic typology. The project's primary aim is to achieve cross-linguistic consistency of annotation, while still permitting language-specific extensions when necessary. The annotation scheme has its roots in three related projects: Stanford Dependencies,[2] Google universal part-of-speech tags,[3] and the Interset interlingua[4] for morphosyntactic tagsets. The UD annotation scheme represents sentences as dependency trees rather than phrase structure trees. As of January 2022, there were just over 200 treebanks covering more than 100 languages in the UD inventory.
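To make the notion of a dependency tree concrete, the following is a minimal sketch in Python of one way such a tree could be stored and traversed. The `Token` class, the helper functions, and the example sentence are hypothetical and hand-annotated for illustration; they are not taken from any UD treebank or from the project's tooling. Each word records the position of its head (with 0 marking the root) and the relation that links it to that head, so no phrasal nodes are needed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One word of a sentence (hypothetical structure, for illustration)."""
    id: int      # 1-based position in the sentence
    form: str    # surface word form
    upos: str    # universal part-of-speech tag
    head: int    # id of the governing token; 0 marks the root
    deprel: str  # dependency relation to the head


# Hand-annotated example sentence: "The dog chased the cat."
SENTENCE = [
    Token(1, "The", "DET", 2, "det"),
    Token(2, "dog", "NOUN", 3, "nsubj"),
    Token(3, "chased", "VERB", 0, "root"),
    Token(4, "the", "DET", 5, "det"),
    Token(5, "cat", "NOUN", 3, "obj"),
    Token(6, ".", "PUNCT", 3, "punct"),
]


def children(tokens, head_id):
    """Return the tokens directly governed by head_id, in sentence order."""
    return [t for t in tokens if t.head == head_id]


def root(tokens):
    """Return the single token attached to the artificial root (head 0)."""
    roots = children(tokens, 0)
    if len(roots) != 1:
        raise ValueError("expected exactly one root token")
    return roots[0]


def render(tokens, head_id=0, depth=0):
    """Return the tree as indented lines, one per token, depth-first."""
    lines = []
    for t in children(tokens, head_id):
        lines.append("  " * depth + f"{t.deprel}: {t.form} ({t.upos})")
        lines.extend(render(tokens, t.id, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render(SENTENCE)))
```

In this sketch every token has exactly one head, so the whole analysis is a list of word-to-word links. A phrase structure analysis of the same sentence would instead introduce intermediate nodes such as a noun phrase and a verb phrase.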