一分錢, 一分貨 jat1 fan1 cin4, jat1 fan1 fo3
You get what you pay for.
I own a couple of mobile phones. Two of which have a local Hong Kong SIM card in them — others are just for testing, I just use WiFi on them, but I digress. That’s not the point. I don’t have a house phone, however. Or an office phone. Which seems to annoy, for some reason, people who think they have to have a landline number for me. Like banks, the government, and others, intent on emptying my bank account. Sorry folks, I can only be reached, if at all, through one of my mobile phones.
It just so happens that one of the phones I own, with a SIM card in it, is not really a phone. At least, I don’t see it as a phone. Hello Magritte. Ceci n’est pas un téléphone. This is a 3G-enabled micro tablet. No mi amor, sorry ah, gomen ne, not a phone. I know, it has a phone number, and technically, this number could be called. And is, by plenty of people I never heard of. Except. This old Nexus S has a nifty application called Firewall, and it is set on “Block All Calls”. Yup, all. Plus, the dial icon is hidden. As I said, not a phone. A micro tablet. Yupskies.
Now, on the subject of calling, and expecting me to answer. MWAHAHAHA! Really. Apparently my phone numbers (including the one that I never pick up because Firewall just hangs up on them) have been sold multiple times by everybody and Mrs Chan, their mother. And they’ve been sold to everybody and Mrs Lee, their Auntie. Apparently. And they think they have a right to peddle their crap to me, through calls and SMS.
The problem is compounded by the fact that I have a phone with THREE phone numbers in the SIM card: HK, China, Macau. The Macau number has not been sold to anyone apparently, for I never get any phone calls from Macau, just the usual avalanche of Casino-related SMS every time I arrive in this cesspool of gambling and prostitution. But I digress, again. On the other hand, I get a bazillion phone calls on the HK number, and calls + SMS on the Chinese number.
Some of the Chinese SMS are hilarious: love letters from “my wife” (they call me 老公, so that’s my wives, right?); people giving me their (updated) bank account so that I don’t forget to send them the money I apparently owe them (I always feel like sending them a goatsee pic entitled “receipt.pdf” but so far I managed not to…); announcements for various exhibitions and events.
I picked up a couple of calls from China, just for kicks, but they weren’t much fun. No spweekee Engriss. And most of the time 唔識講白話 (“can’t speak Cantonese”) either. So now phone calls from Big6 are treated like other calls:
- Unknown Number: nofanks.
- Number not in my (very large) address book: nofanks.
- Number in my address book: depends. Maybe, maybe not. Probably not, though.
Also, my phone doesn’t ring. Yeah, the one that actually accepts calls. It doesn’t vibrate either. It’s set on silent. Permanently. When I get a phone call (assuming I haven’t turned the network off and just kept WiFi on, of course…), the screen displays the call and info, and that’s it. No ringee. No vibree. Terima kasih.
Of course, since I spend a large fraction of my time on the phone (mostly emails and chat applications though), I miss (involuntarily) very few calls. And the ones I do miss, so what? They can call again. They’ll have to, as I don’t have voicemail, despite the fact that some people think I do: they heard some Chinese and a beep. Well, learn yerself some Chinese, buddy, for that message was not a voicemail announcement: it was telling you that I am unavailable, and to try again later. Woopsies.
駟 Cantonese: si3. “Team of four horses”.
(Yeah, the dude next to me is reading 賽馬 rags…)
宮室築成以後, 董卓強選民間少女八百多人, 充作宮娥彩女. 至於從民間搜刮來的財物更是不計其數, 僅囤積的糧食, 便足夠食用二十年.
成 sing4, seng4, cing4. finished.
When the palace was finished,
選 syun2. to choose, select.
民 man4. People, citizen.
間 gaan1. space, interval.
少 siu3/2. Few, less.
女 neoi5. girl
–> young girl
八 baat3. 8
人 jan4. Man. Person
–> 800+ girls
Dong Zhuo selected (forcibly) eight hundred young girls or more
充 cung1. to fill, full, supply
作 zok3. to make, work, perform.
–> supplied to work as
娥 ngo4. beautiful. good
彩 coi2. colour(ful).
–> 彩女 (lower-rank) maids in the palace.
And sent them to work as maids in the palace.
至 zi3. to reach, arrive
於 wu1, jyu1. in, at, on.
–> as for
從 zung6/cung4/sung1. from, by, since, whence, through
搜 sau2/1. search, seek; investigate
刮 gwaat3. shave, pare off, scrape
–> plundered, seized
來 loi4/6, lai4. to come, return.
財 coi4. valuables, riches, possessions.
物 mat6. thing, substance, creature.
更 ga(a)ng1. ang1. further, more.
是 si6. this. yes.
其 kei4. that, his/her/its.
數 sou3/2, sok3. number, several.
As for the property/resources seized from the public/civilians, they were innumerable.
僅 gan2/6. Only, merely, just.
囤 tyun4, deon6. grain basket.
積 zik1. accumulate, store up.
糧 loeng4. food, grain, provisions
食 sik6. eat, food
–> the accumulated/stored up provisions (food)
便 bin6, pin4. convenient, expedient.
足 zuk1, zeoi3. foot; enough.
夠 gau3. enough.
用 jung6. to use.
年 nin4. year
Just the accumulated food was enough to last 20 years.
董卓強迫獻帝遷都長安以後, 強征了二十五萬民夫, 在離長安二百多里的地方, 另築郿塢城, 建造宮室, 規模和京城不相上下.
董 dung2. Supervise. Surname.
卓 coek3/zoek3. Brilliant.
–> Dong Zhuo, died 192. Dictator.
強 koeng4/5, goeng6. Strong.
迫 baak1/3, bik1. Coerce. Busy.
–> Forced, coerced.
獻 hin3. Offer, present. Display.
帝 dai3. Emperor
–> Emperor Xian. Puppet of Dong Zhuo.
遷 cin1. To move, transfer.
都 dou1. Capital
–> Changed the capital to:
長 coeng4. Long.
安 on1. Peace.
後 hau6. After.
After Dong Zhuo forced Emperor Xian to move the capital to Chang’An,
征 zing1. Invade, conquer. Levy, conscript.
了 liu5. Past particle.
二 ji6. 2
十 sap6. 10
五 ng5. 5
萬 maan6. 10K
民 man4. People.
夫 fu1/4. Man, adult man. Those.
He pressed 250,000 men into service,
在 zoi6. At.
離 lei4/6. Depart. Separate.
百 baak3. 100
多 do1. Numerous. Several.
里 lei5. Distance unit. Village.
–> At 200+ li away.
的 dik1. Genitive.
地 dei6. Place.
方 fong1. Region.
另 ling6. Another. Separate.
築 zuk1. Build(ing).
郿 mei4. County in Shaanxi.
塢 wu2. Embankment. Low wall.
城 sing4, seng4. Castle, town.
–> Meiwu (name of the new city)
建 gin3. Build.
造 zou6, cou3/5. Build. Begin. Prepare.
宮 gung1. Palace. Temple.
室 sat1. Room. Place.
規 kwai1. Rules. Law.
模 mou4. Model, pattern. Copy.
–> Size, format.
和 wo6/4. Peace. Harmony. And.
京 ging1. Capital city.
不 bat1. Not.
相 soeng1. Mutual. Each other.
上 soeng6/5. Up/top, superior. Go/send up.
下 haa6/5. Bottom, below, inferior. Send down.
–> comparable, equivalent
sent them to a place 200 li or more from Chang’An, and had a new city built, Meiwu, with a palace comparable in scale to the one in the capital.
The quality of the Unihan database, while good overall, degrades along with the popularity of the languages covered. Chinese (Mandarin and Cantonese) is doing okay, Japanese isn’t too bad, whereas Korean and Vietnamese leave a lot to be desired. So I decided to lend a helping hand and see if I could plug a few holes.
Step 1: What holes are there to fill?
The first step was to identify what’s missing. It’s all good and well to say that Korean isn’t well covered by the Unihan database, but actual facts would be better. Over the last few years (10?), I have done terrible things to the Unihan in my own little backyard. I have it today indexed more or less to my liking as an sqlite database. The tables are (as of last week, who knows what I’ll add):
Don’t mind the initial k- in the table names, it’s how I prefix Constants in my favorite languages, and the habit carried over to sqlite tables. Which is convenient, since Unicode does the same to the field names in Unihan… It could even be that this k- prefix habit was acquired from too much time reading Unihan docs… People familiar with the Unihan file will sneer at the kHakka table. Si señor, I know that Unihan doesn’t cover Hakka, dammit! I had to fetch data from Dr Lau, and had to first build a Hakka input method (劉拼法) based on Dr Lau’s work, for my Macs. From that, indexing Hakka readings into my Unihan sqlite database wasn’t exactly a hardship.
Likewise, building a jyutping 粵拼 input system for Mac OS X from the Unihan wouldn’t be so hard, but I only reinvent the wheel when it’s really necessary. And a dude called Dominic Yu produced an input plugin back in the days. There you go, complete with instructions. For the curious here’s what my input plugins panel looks like:
So, from this Unihan sqlite database, how to determine what’s missing for Korean? Easy. The gist of it is a simple SQL query:
select distinct codepoint from '+tbl+' where '+tbl+'.codepoint not in (select codepoint from kKorean);
where tbl is each of the tables (except kKorean) of course. So I wrote a Python script that iterates over these tables, taking care of the duplicates of course. This yielded close to 18,000 characters without a Korean reading. That’s quite a lot…
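For readers who want to try the same idea, here is a minimal sketch of that loop in Python. It is not the original script: it assumes every table has a `codepoint` column (as the query above implies), and the helper names `missing_codepoints` and `build_toy_db`, the `reading` column and the three toy rows are made up for illustration.

```python
import sqlite3


def missing_codepoints(conn, target="kKorean"):
    """Codepoints found in any other table but absent from `target`."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    missing = set()  # a set takes care of duplicates across tables
    for tbl in tables:
        if tbl == target:
            continue
        # Table names cannot be bound as parameters, so quote them as identifiers.
        query = (f'SELECT DISTINCT codepoint FROM "{tbl}" '
                 f'WHERE codepoint NOT IN (SELECT codepoint FROM "{target}")')
        missing.update(row[0] for row in conn.execute(query))
    return sorted(missing)


def build_toy_db():
    """Tiny in-memory stand-in for the real database (toy data only)."""
    conn = sqlite3.connect(":memory:")
    for tbl in ("kCantonese", "kMandarin", "kKorean"):
        conn.execute(f'CREATE TABLE "{tbl}" (codepoint TEXT, reading TEXT)')
    conn.executemany("INSERT INTO kCantonese VALUES (?, ?)",
                     [("U+4E00", "jat1"), ("U+4E8C", "ji6"), ("U+4E09", "saam1")])
    conn.executemany("INSERT INTO kMandarin VALUES (?, ?)",
                     [("U+4E00", "yi1"), ("U+4E09", "san1")])
    conn.execute("INSERT INTO kKorean VALUES (?, ?)", ("U+4E00", "il"))
    return conn


if __name__ == "__main__":
    print(missing_codepoints(build_toy_db()))
```

On the toy data, 二 and 三 come out as lacking a Korean reading, each listed once even though 三 appears in two tables.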
Step 2: Let’s Grab Some Data
Next I had to find a reliable online source to fill in the gaps. I know exactly where to find info on all these missing sinograms, and more, in the dead-tree world (I used to own a copy of the 大漢韓辭典 which has 56,000 chars, give or take). But that wouldn’t be exactly practical… The best source I have found so far is Zonmal, which despite its third-world 20th century, webmaster-as-an-anally-retentive-dictator interface and ugly name, has quite a bit of information. After a little poking around, the local Adolf having tried hard to hide things from people like me – he who should be happy that some people are actually interested – I found out where to POST my queries, and how to find the results if any.
Since I didn’t want to hammer this site – the idea being to retrieve the data, not take it down, this affair being an .aspx thingy hosted on IIS – I had to be gentle. Also, the whole thing being encoded in EUC-KR, grrr, I needed to do on-the-fly conversions. For these reasons, I went back to my favorite language, REALbasic, which is much better equipped than Python for the task. I set a timer at 8 seconds, and for the next 38 hours or so, my trusty MBP pinged that web site one request at a time, gently extracting the information I needed. Tonight I finally saw the result: 8,346 characters with a match, and readings filled out. That’s a little under half of the missing characters. Not so bad.
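The real thing was done in REALbasic, but the two constraints (one request every few seconds, EUC-KR decoding on the fly) can be sketched in Python too. This is only an illustration under assumptions: `fetch_politely` and `fake_fetch` are hypothetical names, and the actual HTTP POST is left to a caller-supplied function since the site’s form details are not given here.

```python
import time


def fetch_politely(chars, fetch, delay=8.0, encoding="euc_kr"):
    """Call fetch(char) for each character, pausing between requests.

    `fetch` is any callable returning the raw response bytes; the bytes
    are decoded from EUC-KR into a normal Python string.
    """
    results = {}
    for i, ch in enumerate(chars):
        if i:
            time.sleep(delay)  # be gentle: one request at a time
        raw = fetch(ch)
        results[ch] = raw.decode(encoding, errors="replace")
    return results


def fake_fetch(ch):
    """Stand-in for the real request; returns EUC-KR bytes (toy data)."""
    canned = {"一": "일", "二": "이"}
    return canned[ch].encode("euc_kr")


if __name__ == "__main__":
    print(fetch_politely("一二", fake_fetch, delay=0))
```

Passing the fetch function in keeps the throttling and decoding testable without touching the network.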
In my list of tables, the one for Korean is called kKorean, and not kHangul – which is the name used in Unihan. The reason is that I store the Korean syllables in romanization, using the Yale system. Yale is definitely not the most common, but it is very well suited for automated conversion to and from hangul. I have two small functions in every language I use that provide this conversion. And they will be used in the next step: indexing.
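To show why Yale lends itself to automation, here is a minimal sketch of the hangul-to-Yale direction. It is not the author’s function: it assumes the strict one-to-one Yale letters (so ㅜ is always `wu`, with no abbreviation after labials), relies on the standard Unicode arithmetic for precomposed syllables, and the name `hangul_to_yale` is made up.

```python
# Yale letters in Unicode jamo order: 19 initials, 21 vowels, 28 finals.
LEADS = ["k", "kk", "n", "t", "tt", "l", "m", "p", "pp", "s",
         "ss", "", "c", "cc", "ch", "kh", "th", "ph", "h"]
VOWELS = ["a", "ay", "ya", "yay", "e", "ey", "ye", "yey", "o", "wa", "way",
          "oy", "yo", "wu", "we", "wey", "wi", "yu", "u", "uy", "i"]
TAILS = ["", "k", "kk", "ks", "n", "nc", "nh", "t", "l", "lk", "lm", "lp",
         "ls", "lth", "lph", "lh", "m", "p", "ps", "s", "ss", "ng", "c",
         "ch", "kh", "th", "ph", "h"]

HANGUL_BASE = 0xAC00      # first precomposed syllable
HANGUL_COUNT = 19 * 21 * 28


def hangul_to_yale(text):
    """Romanize precomposed hangul syllables; other characters pass through."""
    out = []
    for ch in text:
        idx = ord(ch) - HANGUL_BASE
        if 0 <= idx < HANGUL_COUNT:
            lead, rest = divmod(idx, 21 * 28)
            vowel, tail = divmod(rest, 28)
            out.append(LEADS[lead] + VOWELS[vowel] + TAILS[tail])
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for syllable in "한국일":
        print(syllable, hangul_to_yale(syllable))
```

Sinogram readings are single syllables, so the sketch does not bother with the separators that multi-syllable Yale text sometimes needs.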
Step 3: Cleanup and Indexing
For indexing I went back to Python, since I had code already for indexing from previous experiments. All I needed to do was read each line of the output from step 2, check whether there was a valid reading (or more), convert them to Yale (as the output from Zonmal was in hangul), and update the sqlite database. Barely forty lines of code. My Unihan database is now 35.6MB, including the indexes, and is used on a small web app I use daily to look up sinograms I either don’t know, don’t know the Cantonese reading, or the meaning. Very handy.
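The indexing step might look roughly like the sketch below. Everything specific in it is an assumption: the tab-separated input format, the `codepoint` and `reading` columns, and the `index_readings` name are hypothetical, and the hangul-to-Yale conversion is passed in as a callable rather than reproduced.

```python
import sqlite3


def index_readings(conn, lines, convert=lambda s: s):
    """Insert readings from 'codepoint<TAB>reading[,reading...]' lines.

    Lines without a valid reading are skipped. Returns the number of
    rows added to kKorean.
    """
    added = 0
    for line in lines:
        codepoint, _, readings = line.strip().partition("\t")
        for reading in readings.split(","):
            reading = reading.strip()
            if not codepoint or not reading:
                continue  # no match was found for this character
            conn.execute(
                "INSERT INTO kKorean (codepoint, reading) VALUES (?, ?)",
                (codepoint, convert(reading)))
            added += 1
    conn.commit()
    return added


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE kKorean (codepoint TEXT, reading TEXT)")
    sample = ["U+4E8C\t이", "U+4E09\t삼", "U+4E01\t"]  # toy input
    print(index_readings(conn, sample))
```

In a real run, `convert` would be the hangul-to-Yale function, so that the stored readings match the rest of the table.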
You will find below the source code for steps 1 and 3. You’d need my Unihan sqlite database to run them but it’s too heavy to upload – instead I’ll write another post on how to build it from the Unihan.txt file.
- Dylan’s Hakka Page – hideous but lots of good stuff in there
- Unihan database lookup – the original!
- Dominic Yu’s page on Chinese and computers – jyutping plugin
- Dr Lau’s PinFa input – Big5 encoding
- My own web app, based on Unihan
- Zonmal – the input form only 😉
- Wiktionary zh – useful source but encoding of pinyin borked
- Wiktionary en – same, in English though, and encodings not borked. I’m planning to do a similar operation to fill in the gaps for kMandarin.
- ZDic – built on the Unihan too. And another eye-sore.
- Chinese Text Project – yet another Unihan-based 1996 eye-sore.
- 康熙字典網上版 – Want more eye-soreness, “made in China”?
- vi-nom-vni.mim – Lisp crapola, part of m17n library. Useful chu nom data. This will be used at some stage to fill in the gaps for Viêt.
- Narrow Python – what happens if you wanna go beyond the basic plane in Python? Boom. Read this.
張丞相知潤州,有婦人夫出不歸,忽聞菜園井中有死人,即往哭曰:吾夫也。以聞於官。升命吏集鄰里驗,是其夫否,皆言井深不可辯。升曰:眾不可辯,而婦人獨知為夫,何耶?送獄訊問,乃奸夫殺之,婦與共謀。
From 棠陰比事 (Tang Yin Bi Shi, “Parallel Cases from under the Pear Tree”), by 桂萬榮 (Gui Wanrong), Song dynasty.
Rough translation. Need to work on it. But you get the gist of this Song-dynasty CSI:
Zhang Sheng Inspects a Well
When Zhang Sheng was governing Runzhou, a woman’s husband had gone out and not come back. She suddenly heard that there was a dead body in the well of a vegetable garden, went there at once and cried: “This is my husband!” The matter was reported to the officials. Zhang Sheng ordered his officers to gather the neighbours and check whether it was indeed her husband. They all said the well was too deep to tell. Zhang Sheng said: “Nobody can tell, and yet this woman alone knows it is her husband. How come?” He sent her to jail for questioning: it turned out her lover had killed the husband, and she was in on the plot.