%
% Affixes get stripped off the left and right side of words,
% i.e. spaces are inserted between the affix and the word itself.
%
% Some of the funky UTF-8 parentheses are used in Asian texts.
% In order to allow the single straight quote ' and the double straight quote ''
% to be stripped off from both the left and the right, they are
% distinguished by the suffixes .x and .y (as in Mr.x Mrs.x or Jr.y Sr.y).
%
% 。 is an end-of-sentence marker used in Japanese texts.

")" "}" "]" ">" » 〉 ) 〕 》 】 ] 』」 "’’" "’" ''.y '.y
"%" "," "." 。.y ‧ ":" ";" "?" "!" ‽ ؟ ?! ….y ....y "”"
━.y –.y ー.y ‐.y 、.y ~
's 're 've 'd 'll 'm ’s ’re ’ve ’d ’ll ’m
¢ ₵ ™ ℠ : RPUNC+;

"(" "{" "[" "<" « 〈 ( 〔 《 【 [ 『 「 、.x ` `` „ “ ‘ ''.x '.x ….x ....x
¿ ¡ "$" US$ USD C$
£ ₤ € ¤ ₳ ฿ ₡ ₢ ₠ ₫ ৳ ƒ ₣ ₲ ₴ ₭ ₺ ℳ ₥ ₦ ₧ ₱ ₰ ₹ ₨ ₪ ﷼ ₸ ₮ ₩ ¥ ៛ 호점
† †† ‡ § ¶ © ® ℗ № "#"
* • ⁂ ❧ ☞ ◊ ※ ○ 。.x ゜ ✿ ☆ * ◕ ● ∇ □ ◇ @ ◎
–.x ━.x ー.x -- - ‧.x : LPUNC+;

% The following quoted list is used during tokenization. Do NOT put
% spaces in between the various quotation marks!!
""«»《》【】『』`„“": QUOTES+;

% The following quoted list is used during tokenization. Do NOT put
% spaces in between the various symbols!!
"()¿¡†‡§¶©®℗№#*•⁂❧☞◊※○。゜✿☆*◕●∇□◇@◎–━ー---‧": BULLETS+;

/en/words/units.1: UNITS+;
/en/words/units.1.dot: UNITS+;
/en/words/units.3: UNITS+;
/en/words/units.4: UNITS+;
/en/words/units.4.dot: UNITS+;
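
%
% Illustration only (a sketch of the intended effect of the affix
% stripping described at the top of this file, not captured parser
% output): since ( is in LPUNC and ) , 's ! are in RPUNC, the input
%     (Hello), John's friend!
% would be split into
%     ( Hello ) , John 's friend !
%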