Appendix 5: Fixing language code inconsistencies in DSpace repository records

If the target language uses unique characters, it may be possible to automatically set the value of the language metadata.

Here is an example SQL statement for DSpace that identifies metadata values written in a target language and sets their language code. It assumes the target language is written with multibyte characters, so that a value's character length differs from its byte length:

update metadatavalue set text_lang='/* Insert here the ISO code of the target language */'
  where metadata_field_id in (/* Insert here the metadata_field_id numbers of the fields that accept string values */)
  and length(text_value) != octet_length(text_value)
  and text_value ~ '^[/* Insert here the characters used only in the target language */].*'
  and (text_lang is null or text_lang != '');

For the character class, use a regular expression that covers all the characters specific to the language. For example, for Japanese: [ぁ-んァ-ヿ亜-腕], and for Cyrillic scripts: [а-тА-ТЋ-Ӿћ-ӿ].
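Before running the update, it can help to check a candidate character class against a few sample values. The following is a minimal Python sketch, not part of DSpace: the function name looks_like_target_language and the sample strings are hypothetical, and Python's re module stands in for the PostgreSQL ~ operator, so edge cases may differ between the two engines. It mirrors two conditions of the query: the value contains at least one multibyte character (assuming UTF-8 storage), and its first character falls in the class.

```python
import re

# Character classes quoted from the text above.
JAPANESE = "[ぁ-んァ-ヿ亜-腕]"
CYRILLIC = "[а-тА-ТЋ-Ӿћ-ӿ]"


def looks_like_target_language(text_value: str, char_class: str) -> bool:
    """Mirror the WHERE clause: multibyte content and a first character in the class."""
    # Equivalent of length(text_value) != octet_length(text_value), assuming UTF-8.
    has_multibyte = len(text_value) != len(text_value.encode("utf-8"))
    # Equivalent of text_value ~ '^[...].*': only the first character is tested.
    starts_in_class = re.match(char_class, text_value) is not None
    return has_multibyte and starts_in_class


if __name__ == "__main__":
    # Hypothetical sample values; replace them with values from your own export.
    samples = ["こんにちは", "Привет", "Hello", "DSpace リポジトリ"]
    for s in samples:
        print(
            s,
            looks_like_target_language(s, JAPANESE),
            looks_like_target_language(s, CYRILLIC),
        )
```

Because the pattern is anchored at the start, a value such as "DSpace リポジトリ" that begins with a Latin character is not matched, even though it contains Japanese text. Whether that is acceptable depends on the records in the repository.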

Another option is an overnight cron job that adds ‘en’ to any metadata lacking a language code; for details, see “Creating a SQL query or function to change text_lang to ‘en’”.
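The statement such a job would run can be sketched as follows. This is an illustration under assumptions, not the job referred to above: it uses Python's built-in sqlite3 module with an in-memory stand-in for the metadatavalue table, the sample rows are invented, and the function name default_missing_lang is hypothetical. On a real DSpace database the same UPDATE would run against PostgreSQL, ideally after a backup and restricted to the relevant metadata_field_id values.

```python
import sqlite3


def default_missing_lang(conn: sqlite3.Connection, lang: str = "en") -> int:
    """Set text_lang to `lang` where it is NULL or empty; return the rows changed."""
    cur = conn.execute(
        "UPDATE metadatavalue SET text_lang = ? "
        "WHERE text_lang IS NULL OR text_lang = ''",
        (lang,),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    # In-memory stand-in table; only the columns used in this appendix are modelled.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadatavalue "
        "(metadata_field_id INTEGER, text_value TEXT, text_lang TEXT)"
    )
    conn.executemany(
        "INSERT INTO metadatavalue VALUES (?, ?, ?)",
        [(1, "A title", None), (1, "Another title", ""), (1, "Un titre", "fr")],
    )
    print(default_missing_lang(conn))
    print(conn.execute("SELECT text_value, text_lang FROM metadatavalue").fetchall())
```

Rows that already carry a language code, such as the ‘fr’ row in the sample data, are left untouched.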

The Atmire CSV Power Tools can be used to edit exported metadata (for example, en versus en_US inconsistencies, brackets, and other language issues).

Source: https://comments.coar-repositories.org/appendix-5-fixing-language-code-inconsistencies-in-dspace-repository-records/