一分錢, 一分貨 jat1 fan1 cin4, jat1 fan1 fo3
You get what you pay for.
Category Archives: Languages
駟 Cantonese: si3. “Team of four horses”.
(Yeah, the dude next to me is reading 賽馬 rags…)
宮室築成以後, 董卓強選民間少女八百多人, 充作宮娥彩女. 至於從民間搜刮來的財物更是不計其數, 僅囤積的糧食, 便足夠食用二十年.
成 sing4, seng4, cing4. finished.
When the palace was finished,
選 syun2. to choose, select.
民 man4. People, citizen.
間 gaan1. space, interval.
少 siu3/2. Few, less.
女 neoi5. girl
–> young girl
八 baat3. 8
人 jan4. Man. Person
–> 800+ girls
Dong Zhuo selected (forcibly) eight hundred young girls or more
充 cung1. to fill, full, supply
作 zok3. to make, work, perform.
–> supplied to work as
娥 ngo4. beautiful. good
彩 coi2. colour(ful).
–> 彩女 (lower-rank) maids in the palace.
And sent them to work as maids in the palace.
至 zi3. to reach, arrive
於 wu1, jyu1. in, at, on.
從 zung6/cung4/sung1. from, by, since, whence, through
–> as for
搜 sau2/1. search, seek; investigate
刮 gwaat3. shave, pare off, scrape
–> plundered, seized
來 loi4/6, lai4. to come, return.
財 coi4. valuables, riches, possessions.
物 mat6 thing, substance, creature.
更 ga(a)ng1. ang1. further, more.
是 si6. this. yes.
其 kei4. that, his/her/its.
數 sou3/2, sok3. number, several.
As for the property/resources seized from the public/civilians, they were innumerable.
僅 gan2/6. Only, merely, just.
囤 tyun4, deon6. grain basket.
積 zik1. accumulate, store up.
糧 loeng4. food, grain, provisions
食 sik6. eat, food
–> the accumulated/stored up provisions (food)
便 bin6, pin4. convenient, expedient.
足 zuk1, zeoi3. foot; enough.
夠 gau3. enough.
用 jung6. to use.
年 nin4. year
Just the accumulated food was enough to last 20 years.
董卓強迫獻帝遷都長安以後, 強征了二十五萬民夫, 在離長安二百多里的地方, 另築郿塢城, 建造宮室, 規模和京城不相上下.
董 dung2. Supervise. Surname.
卓 coek3/zoek3. Brilliant.
–> Dong Zhuo, died 192. Dictator.
強 koeng4/5, goeng6. Strong.
迫 baak1/3, bik1. Coerce. Busy.
–> Forced, compelled.
獻 hin3. Offer, present. Display.
帝 dai3. Emperor
–> Emperor Xian. Puppet of Dong Zhuo.
遷 cin1. To move, transfer.
都 dou1. Capital
–> Changed the capital to:
長 coeng4. Long.
安 on1. Peace.
後 hau6. After.
After Dong Zhuo forced Emperor Xian to move the capital to Chang’an,
征 zing1. Invade, conquer. Levy, conscript.
了 liu5. Past particle.
二 ji6. 2
十 sap6. 10
五 ng5. 5
萬 maan6. 10K
民 man4. People.
夫 fu1/4. Man, adult man. Those.
he forcibly conscripted 250,000 labourers,
在 zoi6. At.
離 lei4/6. Depart. Separate.
百 baak3. 100
多 do1. Numerous. Several.
里 lei5. Distance unit. Village.
–> At 200+ li away.
的 dik1. Genitive.
地 dei6. Place.
方 fong1. Region.
另 ling6. Another. Separate.
築 zuk1. Build(ing).
郿 mei4. County in Shaanxi.
塢 wu2. Embankment. Low wall.
城 sing4, seng4. Castle, town.
–> Meiwu (name of the new city)
建 gin3. Build.
造 zou6, cou3/5. Build. Begin. Prepare.
宮 gung1. Palace. Temple.
室 sat1. Room. Place.
規 kwai1. Rules. Law.
模 mou4. Model, pattern. Copy.
–> Size, format.
和 wo6/4. Peace. Harmony. And.
京 ging1. Capital city.
不 bat1. Not.
相 soeng1. Mutual. Each other.
上 soeng6/5. Up/top, superior. Go/send up.
下 haa6/5. Bottom, below, inferior. Send down.
–> comparable, equivalent
sent them to a place more than 200 li from Chang’an, and had a new city built, Meiwu, with a palace comparable in scale to the one in the capital.
The quality of the Unihan database, while good overall, degrades along with the popularity of the languages covered. Chinese (Mandarin and Cantonese) is doing okay, Japanese isn’t too bad, whereas Korean and Vietnamese leave a lot to be desired. So I decided to lend a helping hand and see if I could plug a few holes.
Step 1: What holes are there to fill?
The first step was to identify what’s missing. It’s all well and good to say that Korean isn’t well covered by the Unihan database, but actual facts would be better. Over the last few years (10?), I have done terrible things to the Unihan in my own little backyard. Today I have it indexed more or less to my liking as an sqlite database. The tables are (as of last week, who knows what I’ll add):
Don’t mind the initial k- in the table names, it’s how I prefix Constants in my favorite languages, and the habit carried over to sqlite tables. Which is convenient, since Unicode does the same to the field names in Unihan… It could even be that this k- prefix habit was acquired from too much time reading Unihan docs… People familiar with the Unihan file will sneeze at the kHakka table. Si señor, I know that Unihan doesn’t cover Hakka, dammit! I had to fetch data from Dr Lau, and had to first build a Hakka input method (劉拼法) based on Dr Lau’s work, for my Macs. From that, indexing Hakka readings into my Unihan sqlite database wasn’t exactly a hardship.
Likewise, building a jyutping 粵拼 input system for Mac OS X from the Unihan wouldn’t be so hard, but I only reinvent the wheel when it’s really necessary. And a dude called Dominic Yu produced an input plugin back in the days. There you go, complete with instructions. For the curious here’s what my input plugins panel looks like:
So, from this Unihan sqlite database, how to determine what’s missing for Korean? Easy. The gist of it is a simple SQL query:
select distinct codepoint from '+tbl+' where '+tbl+'.codepoint not in (select codepoint from kKorean);
where tbl is each of the tables (except kKorean, of course). So I wrote a Python script that iterates over these tables and removes the duplicates. This yielded close to 18,000 characters without a Korean reading. That’s quite a lot…
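A minimal sketch of what such a loop could look like, not the original script. It assumes every k- table has a `codepoint` column (as in the query above) and that the table list can be read from `sqlite_master`; the function name `missing_codepoints` is made up for illustration.

```python
import sqlite3

def missing_codepoints(conn, target="kKorean"):
    """Return codepoints found in any other k- table but absent from `target`."""
    cur = conn.cursor()
    # Assumption: the tables of interest are exactly those whose name starts with k.
    cur.execute("SELECT name FROM sqlite_master WHERE type='table' AND name LIKE 'k%'")
    tables = [row[0] for row in cur.fetchall() if row[0] != target]
    missing = set()  # a set takes care of characters that show up in several tables
    for tbl in tables:
        query = (
            'SELECT DISTINCT codepoint FROM "{0}" '
            'WHERE codepoint NOT IN (SELECT codepoint FROM "{1}")'
        ).format(tbl, target)
        missing.update(row[0] for row in cur.execute(query))
    return sorted(missing)
```

Called as `missing_codepoints(sqlite3.connect("unihan.db"))` (the file name is a placeholder), it returns one sorted list with each missing codepoint listed once.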
Step 2: Let’s Grab Some Data
Next I had to find a reliable online source to fill in the gaps. I know exactly where to find info on all these missing sinograms, and more, in the dead-tree world (I used to own a copy of the 大漢韓辭典 which has 56,000 chars, give or take). But that wouldn’t be exactly practical… The best source I have found so far is Zonmal, which despite its third-world 20th century, webmaster-as-an-anally-retentive-dictator interface and ugly name, has quite a bit of information. After a little poking around, the local Adolf having tried hard to hide things from people like me – he who should be happy that some people are actually interested – I found out where to POST my queries, and how to find the results if any.
Since I didn’t want to hammer this site – the idea being to retrieve the data, not take it down, this affair being an .aspx thingy hosted on IIS – I had to be gentle. Also, the whole thing being encoded in EUC-KR, grrr, I needed to do on-the-fly conversions. For these reasons, I went back to my favorite language, REALbasic, which is much better equipped than Python for the task. I set a timer at 8 seconds, and for the next 38 hours or so, my trusty MBP pinged that web site one request at a time, gently extracting the information I needed. Tonight I finally saw the result: 8,346 characters with a match, and readings filled out. That’s a little under half of the missing characters. Not so bad.
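The actual run was done in REALbasic and that code is not shown here. Purely as an illustration of the two constraints (one request every 8 seconds, EUC-KR decoded on the fly), here is a rough Python sketch. `fetch_raw` is a hypothetical callable standing in for the real POST request, whose URL and form fields are deliberately left out.

```python
import time

def fetch_politely(chars, fetch_raw, delay=8.0, sleep=time.sleep):
    """Yield (char, decoded_page) pairs, one request at a time.

    `fetch_raw` is a hypothetical callable: it takes one character and
    returns the raw response bytes, assumed to be EUC-KR encoded.
    """
    for i, ch in enumerate(chars):
        if i:
            sleep(delay)  # wait between requests, but not before the first one
        raw = fetch_raw(ch)
        # Decode on the fly; undecodable bytes become U+FFFD instead of raising.
        yield ch, raw.decode("euc_kr", errors="replace")
```

Passing `sleep` in as a parameter is only there so the pause can be swapped out when trying the function without waiting.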
In my list of tables, the one for Korean is called kKorean, and not kHangul – which is the name used in Unihan. The reason is that I store the Korean syllables in romanization, using the Yale system. Yale is definitely not the most common, but it is very well suited for automated conversion to and from hangul. I have two small functions in every language I use that provide this conversion. And they will be used in the next step: indexing.
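For the curious, a sketch of the hangul-to-Yale direction, not my actual functions. It relies on the arithmetic layout of precomposed hangul syllables in Unicode (starting at U+AC00, 19 initials × 21 vowels × 28 finals). The Yale spellings in the tables are the usual ones, but this sketch always writes ㅜ as "wu" and ignores the convention of shortening it after labial consonants. The name `hangul_to_yale` is only illustrative.

```python
# Jamo in Unicode order, with their Yale spellings.
INITIALS = ["k", "kk", "n", "t", "tt", "l", "m", "p", "pp", "s",
            "ss", "", "c", "cc", "ch", "kh", "th", "ph", "h"]
VOWELS = ["a", "ay", "ya", "yay", "e", "ey", "ye", "yey", "o", "wa", "way",
          "oy", "yo", "wu", "we", "wey", "wi", "yu", "u", "uy", "i"]
FINALS = ["", "k", "kk", "ks", "n", "nc", "nh", "t", "l", "lk", "lm", "lp",
          "ls", "lth", "lph", "lh", "m", "p", "ps", "s", "ss", "ng", "c",
          "ch", "kh", "th", "ph", "h"]

def hangul_to_yale(text):
    """Romanize precomposed hangul syllables; other characters pass through."""
    out = []
    for ch in text:
        idx = ord(ch) - 0xAC00
        if 0 <= idx < 19 * 21 * 28:
            lead, rest = divmod(idx, 21 * 28)   # which initial consonant
            vowel, tail = divmod(rest, 28)      # which vowel, which final
            out.append(INITIALS[lead] + VOWELS[vowel] + FINALS[tail])
        else:
            out.append(ch)
    return "".join(out)
```

Syllables are joined without a separator, which is fine for single-syllable readings such as the ones stored per sinogram, but can be ambiguous for longer words.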
Step 3: Cleanup and Indexing
For indexing I went back to Python, since I already had code for indexing from previous experiments. All I needed to do was read each line of the output from step 2, check whether there was a valid reading (or more), convert them to Yale (as the output from Zonmal was in hangul), and update the sqlite database. Barely forty lines of code. My Unihan database is now 35.6MB, including the indexes, and is used by a small web app I use daily to look up sinograms I don’t know, or whose Cantonese reading or meaning I don’t know. Very handy.
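The shape of that indexing pass, as a hedged sketch rather than the real forty lines. Everything concrete here is an assumption: the step 2 output is taken to be one `codepoint<TAB>readings` line per character with comma-separated hangul readings, the table is taken to be `kKorean(codepoint, reading)`, and `to_yale` is whatever hangul-to-Yale converter you have at hand.

```python
import sqlite3

def index_readings(conn, lines, to_yale):
    """Insert Yale readings into kKorean; return the number of rows added."""
    added = 0
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # malformed line: skip it
        codepoint, raw = parts
        # Keep only non-empty readings; a character may have several.
        readings = [r.strip() for r in raw.split(",") if r.strip()]
        for reading in readings:
            conn.execute(
                "INSERT INTO kKorean (codepoint, reading) VALUES (?, ?)",
                (codepoint, to_yale(reading)),
            )
            added += 1
    conn.commit()
    return added
```

Lines with no reading at all simply produce no rows, so characters the site didn’t know stay missing and can be retried against another source later.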
You will find below the source code for steps 1 and 3. You’d need my Unihan sqlite database to run them but it’s too heavy to upload – instead I’ll write another post on how to build it from the Unihan.txt file.
- Dylan’s Hakka Page – hideous but lots of good stuff in there
- Unihan database lookup – the original!
- Dominic Yu’s page on Chinese and computers – jyutping plugin
- Dr Lau’s PinFa input – Big5 encoding
- My own web app, based on Unihan
- Zonmal – the input form only
- Wiktionary zh – useful source but encoding of pinyin borked
- Wiktionary en – same, in English though, and encodings not borked. I’m planning to do a similar operation to fill in the gaps for kMandarin.
- ZDic – built on the Unihan too. And another eye-sore.
- Chinese Text Project – yet another Unihan-based 1996 eye-sore.
- 康熙字典網上版 – Want more eye-soreness, “made in China”?
- vi-nom-vni.mim – Lisp crapola, part of m17n library. Useful chu nom data. This will be used at some stage to fill in the gaps for Viêt.
- Narrow Python – what happens if you wanna go beyond the basic plane in Python? Boom. Read this.
Dear Appeul, I can haz a betteur string encoding endjin for ze Français ? It iz vélocité. Note to Appeul: ASCII is for ze ouik. You Tee Eff foh evah!
A long long time ago, when I was studying linguistics and Asian languages in Paris, I was introduced to a researcher who had written his PhD about the dialect spoken in a Hakka village called Sung Him Tong. He gave me a copy of his PhD dissertation, which I probably have somewhere up in storage in Kwaichung or wherever it is that my stuff is stored.
Back then I wasn’t interested in Cantonese or other non-mainstream Chinese languages. I was immersed in Middle Chinese and other dead languages, and had little time for the languages spoken by live people. I was twenty-something and allowed to be foolish. Anyway, I chucked the dissertation in my library, where it gathered dust. The only impression I kept was that this man must’ve been very determined to live in a Hakka village for 6 months or more, just to study their dialect. The image I had of Hakka villages was that of round, multi-storey wooden houses shared by several families in the boondocks of Mainland China.
Except it’s not in the boondocks… Well, at least not in the middle of nowhere in Mainland China, but a cannon-shot away from Fanling KCR/MTR station. It looks nowhere near as shiny and modern as, say, Central (see some nice pics here), but even 20 years ago, living there must’ve been less of a hardship than living in Shenzhen today…
Anyway for some strange reason I got a blast from the past — I was looking up some references about the Hakka language, and a bibliographical reference to that PhD dissertation came up — and I thought that I should look up that place, 20 years after receiving the dissertation… Better late than never, right? I felt some kind of disappointment — here I was, as a kid, imagining that dude slumming it in the mountains with the indigenous population, whereas he was probably commuting every day on a 小巴… My hero’s a commuter. Sigh…
So I poked around a bit – since this place is near 沙頭角, a place I really want to visit – and hk-place is always a good start when you’re looking for info and piccies about forgotten places in HK. There are lots of not-so-ancient buildings, but there seems to be a bunch of 圍村, home to the 鄧 family, cousins to the people who live in 錦田 (and thus 吉慶圍 I suppose). The two pictures here (click to see the originals) show the contrast that you can find in such HK villages: concrete and stone walls. This is something that I enjoy quite a bit. This village is definitely on my list of things to visit in 2011!
Yesterday, my company was hosting a wine dinner. Around 30 people, including the team (my asshole boss; his not-much-better snobbish boss, who speaks an English that’s so British that none of the yellow-skinned clients understood him; LBP, who wiggled an invite; my colleague and I). I had four clients, and thus had to share a table with my boss. We managed not to fight — at least not during the meal anyway. We basically ignored each other, and entertained our half of the table. Which brings me to my point, yes I have a point, and this rambling will lead somewhere I promise.
The four clients I was trying to keep amused and entertained — Waiter! Pour more wine for the lady, my dear old chap, that’s a good boy, fanks Guv‘ — were total strangers to me. I had received by email the names of two of them, not even the names of their guests. From the names I got, I had kind of figured out that there’d be at least one woman and one man each, but that was about it. They’re friends of a client of mine, a lovely but very anally-retentive biz exec who spends more time taking notes about the wines than savouring them. Whatever dudine, as long as you buy from me.
My regular skit, when I have to entertain guests I’ve never met, includes clowning in their language, if my skills stretch that far. Yesterday night, they kind of did. My Cantonese is nowhere near fluent, but as long as wine is involved — whether sales or clowning — it serves its purpose. And since I managed to clown and sell wines yesterday night, I’ll pat myself on the back. Good show!
Of course, as is usually the case when practicing Cantonese with locals, the question of why Cantonese and not Mandarin came up quite quickly. Two of my guests were insistent that learning Mandarin would be A/ easier, and B/ more useful. Since I wanted to keep them in a good mood, I refrained from launching into a full-blown frenzied attack against Mandarin-imperialism and ridiculing Hongkies urging me to learn Mandarin, when themselves speak it so poorly. If it’s so easy, how come you guys manage not to learn it properly…? Instead, I focused on the usefulness argument. I could have just said Cuz you CantoCunts can’t speak the language anyways, only way to go through your thick dumb skulls is friggin’ Gwongdungwaa! But it wouldn’t have achieved the aforementioned goal of keeping the guests in a good mood. So instead I had to explain that yes, most of my customers indeed didn’t speak Mandarin that well, even if they speak half a dozen other languages.
One of my clients was expressing doubts. I had to give examples to make things clearer. She had made the mistake of equating “I sell wines in China” with “I sell wines in Beijing and Shanghai”. Dudine, I sold more wine, both in $$$ and number of bottles, in Guangdong last week than I ever sold north of Guangzhou. And while I found some of “my” wines in shops in BJ and SH, it’s because my Cantonese clients sold them there. The few clients I have who are of Chinese stock but don’t speak Cantonese (or at least not natively) all speak another Chinese dialect/language and English: Taiwanese, Singaporeans, Chinese-Thai, etc… Friday I spent 8 hours in the bar of a client, negotiating a huge sale with their end-client, a Mainland man who was very well-mannered – rare enough to be mentioned – and who spoke Cantonese with us, but Hakka with his business partner back home. The business partner also spoke Cantonese – the end-client put him on speaker, and had to remind him every 3 minutes: 講白話! Mandarin was never used during the 8 hours we were together. And why would it be? Nobody in the room spoke it, or not well enough to matter anyway. I’m sure it riles Beijing, but they’ll have to suck it up: more business is done every day in China in Chinese dialects and languages than in Mandarin.
So why should I learn Mandarin indeed?
Last Saturday, in Cantonese class, we were asked to make up five questions – each one following one of the five basic sentence patterns we’ve learned so far. One of my fellow students asked this question: “你去過幾多國家呀?” How many countries have you been to? (The question dealt with quantities, numbers and 幾-). Dealing with the answer was easy – until the teacher asked me to name a few. Country names in Chinese… Errr… Right. I could have cheated and blurted out the ones I know already – stuff like 中國 zung1 gwok3, 德國 dak1 gwok3, 英國 jing1 gwok3, etc… But this is about learning, right..? So I was kind of floundering. Here are a few names I learned last week:
- 柬埔寨 gaan2 pou4 zaai6 Cambodia
- 菲律賓 fei1 leot6 ban1 Philippines
- 印尼 jan3 nei4 Indonesia [This one I knew]
- 馬來西亞 maa5 loi4 sai1 aa3 Malaysia
- 歐洲 au1 zau1 Europe
- 意大利 ji3 daai6 lei6 Italy
I should’ve checked Sheik’s countries page…