Recommendations for repository managers on transliteration
Enable UTF-8 support in your repository and use the original writing system whenever possible. If metadata must be transliterated, use recognized standards (e.g. ISO).
Transliteration is the conversion of text from one writing system to another (e.g. from the Greek alphabet to the Latin alphabet). It relies on mapping the graphemes of one writing system to those of another in a standardized way, so that readers can reconstruct the original spelling using standardized transliteration tables or software tools. Some countries have their own transliteration standards.
Transcription, by contrast, is a conversion in which the target text captures the sound of the original rather than its spelling.
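The property that distinguishes transliteration, namely that the original spelling can be reconstructed from a table, can be illustrated with a minimal Python sketch. The table below is a small illustrative subset chosen for this example and is not a complete or authoritative standard; the function name `transliterate` is likewise hypothetical. A real repository would rely on a published standard table or an established tool.

```python
import unicodedata

# Illustrative subset only: NOT a complete or authoritative standard table.
# Every Greek letter maps to exactly one distinct Latin grapheme, which is
# what makes the mapping reversible.
GREEK_TO_LATIN = {
    "α": "a", "δ": "d", "ε": "e", "η": "\u012b",  # i with macron
    "λ": "l", "μ": "m", "ν": "n", "ο": "o",
    "τ": "t", "ω": "\u014d",  # o with macron
}

# The reverse table exists only because the forward table is one-to-one.
LATIN_TO_GREEK = {latin: greek for greek, latin in GREEK_TO_LATIN.items()}


def transliterate(text, table):
    """Map each character through the table; fail loudly on unmapped ones."""
    # Normalize so that letters with diacritics are single code points.
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        if ch.isspace():
            out.append(ch)
        elif ch in table:
            out.append(table[ch])
        else:
            raise KeyError(f"no mapping for {ch!r}")
    return "".join(out)


original = "δωμα"
latin = transliterate(original, GREEK_TO_LATIN)
restored = transliterate(latin, LATIN_TO_GREEK)
print(latin, restored)
```

A transcription, which records sound, generally cannot offer this round trip: several source letters may share one pronunciation and therefore collapse into the same target spelling.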
Transliteration is sometimes unavoidable. Huge amounts of transliterated or transcribed metadata can be found in bibliographic databases and library catalogs. In some research communities, transliterating names and even titles is common practice. Although support for UTF-8 is now common, these practices persist. If a repository already contains transliterated metadata, or its designated community requires that metadata be transliterated, follow these recommendations:
- Use recognized transliteration standards.
- If possible, choose one standard and declare it in the repository’s FAQ / user manual / about pages.
- If this is not possible, declare all used standards in the FAQ / user manual / about pages.
- To ensure that readers can reconstruct the original spelling, provide links to relevant transliteration guidelines (e.g. Library of Congress) and/or tools (e.g. https://alittlehebrew.com/transliterate/, https://www.translitteration.com) in the FAQ / user manual / about pages.
- If author names are transliterated, identifiers such as ORCID should be used to connect different name variants.
- Use language codes for transliterated metadata (e.g. this resource recommends el-Latn to indicate text in Greek transliterated to the Roman alphabet: https://eidr.org/documents/Using_EIDR_Language_Codes.pdf).
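As a rough illustration of the last recommendation, the sketch below parses language tags of the form `language-Script` (such as `el-Latn`) and uses the script subtag to pick out transliterated values in a metadata record. The record layout, the field names, and the `parse_tag` helper are assumptions made for this example, and the pattern covers only a simplified subset of what real language tags allow.

```python
import re

# Simplified pattern: language[-Script][-REGION].
# Real language tags allow further subtags that this sketch ignores.
TAG_RE = re.compile(
    r"^(?P<lang>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|\d{3}))?$"
)


def parse_tag(tag):
    """Split a tag such as 'el-Latn' into language, script, and region."""
    m = TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"unsupported language tag: {tag!r}")
    return {
        "language": m["lang"].lower(),
        # Conventional casing: script subtags are written in title case.
        "script": m["script"].title() if m["script"] else None,
        "region": m["region"].upper() if m["region"] else None,
    }


# Hypothetical record: the same title in the original script and in a
# Latin-script form (illustrative romanization, not tied to a standard).
record = {
    "title": [
        {"value": "Οδύσσεια", "lang": "el"},
        {"value": "Odysseia", "lang": "el-Latn"},
    ]
}

latin_titles = [
    t["value"]
    for t in record["title"]
    if parse_tag(t["lang"])["script"] == "Latn"
]
print(latin_titles)
```

Keeping the original-script value alongside the tagged transliteration lets a repository display or index either form without guessing which one a given string represents.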
Where transliteration standards exist, transcription should be avoided: transcription rules are not always clear, which makes it difficult to reconstruct the original spelling. If transcription is unavoidable, follow the rules and standards for your languages.
AILLA has not needed to transliterate a non-roman script yet, but we are expecting a deposit of modern (21st century) texts written in Mayan hieroglyphics by native speakers of several different Mayan languages. I do not think that UTF-8 can handle this script. This deposit presents new challenges for us, e.g. in determining which language code to use to classify the texts (we plan to use the code for the native language of the author), in rendering the glyphs (it might not be possible to do this in the metadata), and translating the content of the writing into Spanish and English (we must rely on the content creators to do that). I’m sure there will be additional challenges that we have not foreseen.