Regenerating the Russian Dictionaries ------------------------------------- May 2014 The raw data is located in the file morphs20030410.mrd. This is the master morphology file from 2003. The corresponding grammatical markup is in rgramtab20030306.tab. The current Russian dicts are built from these two files. It is to be converted to Berkley DB files by running: time ./make-hash-db.pl This will create the files root.db root.db and tail.db. It should take about 45 seconds or less to run. These are then used to create the link-grammar dictionaries: time ./mkdict.pl This should take abput 25 seconds or less. After this, the Russian dictionaries are ready to be used! The query-morph.pl tool will examine the morphology candidates for a given word. For example: ./query-morph.pl ั‚ะตัั‚ Morphology sources ------------------ The morphology is nominally a part of http://www.aot.ru It comes from the "seman" project: http://seman.sourceforge.net/ Source data for the morphology is available under the LGPL license at: http://sourceforge.net/p/seman/svn/HEAD/tree/trunk/ A read-only version can be obtained as: svn checkout svn://svn.code.sf.net/p/seman/svn/trunk seman-svn So, for example, rgramtab.tab is in trunk/Dicts/Morph/ and the morphs.mrd is in trunk/Dicts/SrcMorph/RusSrc/morphs.mrd These files are encoded in KOI8-R. The iconv command can be used to convert KOI8-R to UTF, as folllows: iconv -f KOI8-R -t UTF-8 > So: iconv -f KOI8-R -t UTF-8 morphs.mrd > morphs-utf8.mrd iconv -f KOI8-R -t UTF-8 rgramtab.tab > rgramtab-utf8.tab The End. --------