The missing readings

The quality of the Unihan database, while good overall, degrades along with the popularity of the languages covered. Chinese (Mandarin and Cantonese) is doing okay, Japanese isn’t too bad, whereas Korean and Vietnamese leave a lot to be desired. So I decided to give a helping hand and see if I could plug a few holes.

Step 1: What holes are there to fill?

The first step was to identify what’s missing. It’s all well and good to say that Korean isn’t well covered by the Unihan database, but actual facts would be better. Over the last few years (10?), I have done terrible things to the Unihan in my own little backyard. Today I have it indexed more or less to my liking as an sqlite database. The tables are (as of last week, who knows what I’ll add):

  • kCantonese
  • kMandarin
  • kJapaneseOn
  • kJapaneseKun
  • kKorean
  • kHakka
  • kVietnamese
  • kRSKangXi
  • kDefinition

Don’t mind the initial k- in the table names, it’s how I prefix Constants in my favorite languages, and the habit carried over to sqlite tables. Which is convenient, since Unicode does the same to the field names in Unihan… It could even be that this k- prefix habit was acquired from too much time reading Unihan docs… People familiar with the Unihan file will sneeze at the kHakka table. Si señor, I know that Unihan doesn’t cover Hakka, dammit! I had to fetch data from Dr Lau, and had to first build a Hakka input method (劉拼法) based on Dr Lau’s work, for my Macs. From that, indexing Hakka readings into my Unihan sqlite database wasn’t exactly a hardship.

Likewise, building a jyutping 粵拼 input system for Mac OS X from the Unihan wouldn’t be so hard, but I only reinvent the wheel when it’s really necessary. And a dude called Dominic Yu produced an input plugin back in the day. There you go, complete with instructions. For the curious, here’s what my input plugins panel looks like:

All the input plugins I use

Which kind of reflects my linguistic interests…

So, from this Unihan sqlite database, how to determine what’s missing for Korean? Easy. The gist of it is a simple SQL query:

select distinct codepoint from '+tbl+' where '+tbl+'.codepoint
 not in (select codepoint from kKorean);

where tbl is each of the tables (except kKorean, of course). So I wrote a Python script that iterates over these tables, taking care of the duplicates. This yielded close to 18,000 characters without a Korean reading. That’s quite a lot…
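For illustration, here is a minimal sketch of what such a script could look like. It is not the original script: the only thing taken from the query above is that every table has a codepoint column; the function name missing_korean and the use of a Python set to drop duplicates are assumptions.

```python
import sqlite3

# Every table listed above except kKorean itself.
TABLES = [
    "kCantonese", "kMandarin", "kJapaneseOn", "kJapaneseKun",
    "kHakka", "kVietnamese", "kRSKangXi", "kDefinition",
]

def missing_korean(conn, tables=TABLES):
    """Return the sorted codepoints present in any table but absent from kKorean."""
    missing = set()  # a set removes duplicates found in several tables
    for tbl in tables:
        # Table names cannot be bound as SQL parameters, so they are
        # interpolated here; they come from the fixed list above.
        query = (
            'select distinct codepoint from "{0}" '
            "where codepoint not in (select codepoint from kKorean)".format(tbl)
        )
        missing.update(row[0] for row in conn.execute(query))
    return sorted(missing)
```

Calling missing_korean(sqlite3.connect(path)) with the path of such a database would return one entry per character lacking a Korean reading, however many tables it appears in.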

Step 2: Let’s Grab Some Data

Next I had to find a reliable online source to fill in the gaps. I know exactly where to find info on all these missing sinograms, and more, in the dead-tree world (I used to own a copy of the 大漢韓辭典 which has 56,000 chars, give or take). But that wouldn’t be exactly practical… The best source I have found so far is Zonmal, which despite its third-world 20th century, webmaster-as-an-anally-retentive-dictator interface and ugly name, has quite a bit of information. After a little poking around, the local Adolf having tried hard to hide things from people like me – he who should be happy that some people are actually interested – I found out where to POST my queries, and how to find the results if any.

Since I didn’t want to hammer this site – the idea being to retrieve the data, not take it down, this affair being an .aspx thingy hosted on IIS – I had to be gentle. Also, the whole thing being encoded in EUC-KR, grrr, I needed to do on-the-fly conversions. For these reasons, I went back to my favorite language, REAL Basic, which is much better equipped than Python for the task. I set a timer at 8 seconds, and for the next 38 hours or so, my trusty MBP pinged that web site one request at a time, gently extracting the information I needed. Tonight I finally saw the result: 8,346 characters with a match, and readings filled out. That’s nearly half of the missing characters. Not so bad.
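The original harvester was written in REAL Basic and is not shown here. As a rough Python sketch of the same idea (one request at a time, a fixed pause between requests, EUC-KR decoded on the fly), assuming a hypothetical fetch callable that returns the raw response bytes for one character:

```python
import time

def decode_page(raw_bytes):
    """Decode an EUC-KR response body; undecodable bytes become U+FFFD."""
    return raw_bytes.decode("euc_kr", errors="replace")

def polite_lookup(chars, fetch, delay=8.0):
    """Look up each character in turn, sleeping `delay` seconds between requests.

    `fetch` is a placeholder (an assumption, not a real API): any callable
    that takes one character and returns the response body as bytes.
    """
    results = {}
    for i, ch in enumerate(chars):
        if i:  # no pause before the very first request
            time.sleep(delay)
        results[ch] = decode_page(fetch(ch))
    return results
```

The arithmetic matches the figures above: at one request every 8 seconds, 38 hours is 38 × 3600 / 8 = 17,100 requests, which is in line with close to 18,000 missing characters.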

In my list of tables, the one for Korean is called kKorean, and not kHangul – which is the name used in Unihan. The reason is that I store the Korean syllables in romanization, using the Yale system. Yale is definitely not the most common, but it is very well suited for automated conversion to and from hangul. I have two small functions in every language I use that provide this conversion. And they will be used in the next step: indexing.
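To show why Yale lends itself to automation, here is a sketch of such a pair of functions for single syllables. It is not the author's code. It relies on the standard Unicode arithmetic for precomposed hangul syllables (19 initials × 21 vowels × 28 finals starting at U+AC00) and on the Yale letter values as I understand them. One simplification to note: it always writes ㅜ as wu, ignoring the usual Yale shorthand of plain u after labials, so that the round trip stays unambiguous.

```python
# Yale letters in Unicode jamo order.
INITIALS = ["k", "kk", "n", "t", "tt", "l", "m", "p", "pp", "s",
            "ss", "", "c", "cc", "ch", "kh", "th", "ph", "h"]
VOWELS = ["a", "ay", "ya", "yay", "e", "ey", "ye", "yey", "o", "wa", "way",
          "oy", "yo", "wu", "we", "wey", "wi", "yu", "u", "uy", "i"]
FINALS = ["", "k", "kk", "ks", "n", "nc", "nh", "t", "l", "lk", "lm", "lp",
          "ls", "lth", "lph", "lh", "m", "p", "ps", "s", "ss", "ng", "c",
          "ch", "kh", "th", "ph", "h"]

HANGUL_BASE = 0xAC00  # first precomposed syllable, 가
HANGUL_LAST = 0xD7A3  # last precomposed syllable, 힣

def hangul_to_yale(syllable):
    """Romanize one precomposed hangul syllable in (unabbreviated) Yale."""
    code = ord(syllable)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        raise ValueError("not a precomposed hangul syllable: %r" % syllable)
    lead, rest = divmod(code - HANGUL_BASE, len(VOWELS) * len(FINALS))
    vowel, tail = divmod(rest, len(FINALS))
    return INITIALS[lead] + VOWELS[vowel] + FINALS[tail]

# Reverse table built by romanizing every possible syllable once.
YALE_TO_HANGUL = {
    hangul_to_yale(chr(code)): chr(code)
    for code in range(HANGUL_BASE, HANGUL_LAST + 1)
}

def yale_to_hangul(yale):
    """Convert one Yale syllable back to hangul; raises KeyError if unknown."""
    return YALE_TO_HANGUL[yale]
```

Building the reverse table by brute force avoids writing a parser: Yale uses disjoint letters for consonants and vowels, so each of the 11,172 syllables gets a distinct spelling.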

Step 3: Cleanup and Indexing

For indexing I went back to Python, since I already had code for indexing from previous experiments. All I needed to do was read each line of the output from step 2, check whether there was a valid reading (or more), convert them to Yale (as the output from Zonmal was in hangul), and update the sqlite database. Barely forty lines of code. My Unihan database is now 35.6MB, including the indexes, and is used in a small web app I use daily to look up sinograms that I either don’t know, or whose Cantonese reading or meaning I don’t know. Very handy.
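The shape of that indexing loop can be sketched as follows. Several details are assumptions made for the example, since the post does not give them: the step 2 output is taken to be one line per character, with the codepoint, a tab, then space-separated hangul readings; kKorean is taken to have codepoint and reading columns; and to_yale stands for whatever hangul-to-Yale function is at hand.

```python
import sqlite3

def is_hangul_syllable(text):
    """True for a single precomposed hangul syllable (U+AC00..U+D7A3)."""
    return len(text) == 1 and 0xAC00 <= ord(text) <= 0xD7A3

def index_korean(conn, lines, to_yale):
    """Insert the Yale form of every valid reading; return the number added."""
    added = 0
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # malformed line: skip it
        codepoint, raw = parts
        # Keep only tokens that really are hangul syllables.
        for hangul in raw.split():
            if not is_hangul_syllable(hangul):
                continue
            conn.execute(
                "insert into kKorean (codepoint, reading) values (?, ?)",
                (codepoint, to_yale(hangul)),
            )
            added += 1
    conn.commit()
    return added
```

Lines with no match simply contribute nothing, which is how characters without a reading stay out of the table.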

The Stash:

You will find below the source code for steps 1 and 3. You’d need my Unihan sqlite database to run them but it’s too heavy to upload – instead I’ll write another post on how to build it from the Unihan.txt file.

Songgwang-sa 松廣寺 송광사

Reading Shanna’s blog lately, and enjoying her discovery of Korea, I thought I’d give out-of-the-way hints. I’ve been in a gazillion places in Korea – and while I don’t have that many pictures since my travels were done, ahem, in the 20th Century, it might still be interesting.

The entrance to Songgwang-sa

Songgwang-sa is a Buddhist temple of the Chogye monastic order 曹溪禪宗. According to the remaining records, Songgwang-sa was founded by Zen Master Hyerin in the late Shilla dynasty. It is my favorite temple in Korea (I’ve been there 7 or 8 times), for a few reasons:

  • It’s in Chŏlla-do, near Sunch’ŏn and not far from Chiri-san. It’s a lovely region, away from the hustle of the cities.
  • It’s a friendly temple – there are even foreign monks studying there. Visitors can arrange to stay overnight and experience a little of the Buddhist life.
  • It’s a beautiful temple per se, very well maintained, and very peaceful – if you can avoid the crowds that plague it sometimes, the price of success I bet.
  • It has some cool artefacts – I remember seeing some hPhags-Pa script in the 선보각, #24 on the map below. The language geek in me had a little orgasm right there… :-)
  • The food at the restaurants below the temple is great – like in any self-respecting restaurant in Chŏlla-do.

I shamelessly copied the maps provided on Songgwang-sa’s own website for reference. Go visit their site and get a better idea of the place. But nothing can replace going there.

A similar pic of 大雄寶殿 was on my desktop for years. I need to find my own pics… There it is!

So the question is: WHO IN BOTHERATION REMOVED THAT TREE? Grrr.

How to go there?

It’s pretty easy – although it takes forever. This is based on sliiiiightly dated info (I haven’t been to Songgwang-sa in ages). You need to get first to Gwangju’s Bus Terminal. That’s the easy part! Go either to Kangnam Express Bus Terminal or the East Seoul Terminal (동서울 터미널/강변역). There’ll be buses to Gwangju every 5 minutes, give or take.

Then you arrive in Gwangju – looking it up on the web I see they have improved the building and renamed it uSquare. Ridiculous much? Anyway… Last time I did this I went from the long-distance section of the terminal to the “local” section, bought a ticket for the “direct” bus to Songgwang-sa and was on my way. I see here that the schedule hasn’t changed :-) There are 5 buses per day each way. So you should leave from Seoul very early – 5am or so – so that you can get the first bus to the temple. It’s REALLY worth it. You’ll arrive before lunch and will be able to enjoy some quiet. Then have lunch at one of the restaurants, get a bus back to Gwangju and go have some duck soup (오리탕)!

A few things about Korea.

I wrote the following, in an email, to an Aussie who was being relocated to Korea, and didn’t know the place. This might actually come in handy.

* Housing
There are two systems for rental, one for expats and one for the locals (and those who don’t want to play along with the expat rip-off system).

The expat system is basically a 3-year contract that has to be paid in full, up-front, and it’s not cheap, in areas full of other expats… 2,000US$ and up, way up, per month. So the company – or you – will have to front 72,000 US$ and up. There are a few areas where you can find these places – which are usually nicer than what the locals have. If the company goes for that, so be it :-) You’ll live in a nice place. If you have to pay for yourself, or the budget isn’t sufficient, you might want to consider option 2.

The other system is what they call key-money (welcome to konglish). You sign a contract for 1 to 2 years, pay up a deposit – usually a very large one, 80,000 US$ and up – and you pay no rent. The deposit is refunded when you leave. Yup, I know, sounds weird and one has to wonder how they make money, but there you go. I lived in such places all the time I was there, and I was basically living rent-free. You’ll need the assistance of a bunch of Koreans to get that (to combat instant foreigner-induced price-tag inflation and whatnot), but Koreans are usually in sufficient supply in Seoul…

* Phones
Keep the same Koreans handy when you apply for a mobile phone. Most companies now have a prepaid-ish + very expensive system for foreigners, for fear they’ll leave the country without paying (it wasn’t so when I was there…). To get the “Korean discount” you register the phone in the name of a Korean and set up auto-pay on your own bank account. Voila, thank you, done. ;-)

Phones are CDMA – our phones can’t be used there. Welcome to “we’ll do it our way and screw you twice” Korea (they screw mostly their own people but whatever). Many phones can be set to English – I know mine has that somewhere. If you use the prête-nom services of a Korean person, just get an iPhone – other phones are made by and for Koreans. Seriously. Your HK mobile phone will work (roaming charges $$$$$$$$$$$$$$$$$) if it’s 3G.

* Transportation
Taxis used to be considered as mass-transportation; you’d fit in a cab as many ppl as possible, with ppl stepping on and off cabs along the way. 1997 and the economic crisis brought this to an end, thank God, but taxis remain very cheap. Except the black cabs – avoid them unless you have no choice. Cabbies usually don’t speak English, I mean at all. Prepare addresses in Korean, and have a phone number of a Korean-speaking person ready. The addresses in Korea are very confusing, and cabs have difficulties sometimes finding places.

When you arrive at Incheon airport, DO NOT TAKE A CAB! If the company is not picking you up, take a KAL Limousine bus to your hotel. Buy a ticket at the KAL counter, near Gate 4. Taxis working the airport, even the legal ones, are crooks, and will give you a full tour of Seoul before you arrive home, maybe. I had a cab driver arrested once.

The so-called high-speed train from the airport is, excusez my French, a fucking joke. Neither fast nor convenient. A waste of time. KAL Limousine Bus, lady. :-)

Inside Seoul (the place is 40 miles wide, 20 miles north to south), buses are convenient when you know what you’re doing. Avoid them until you’re settled in. Traffic in Seoul is almost as bad as in Bangkok – seriously. The metro system is getting better now, and serves its purpose. The PAs and signs are in Korean and Konglish, so you’ll be fine after a few days.

Taxis are usually fine within the city – as long as you’re going somewhere they can find. They’re not all honest, the airport cab drivers have cousins downtown, but most of the time they’re ok. You’ll miss HK cabs though. But Seoul cabs have GPS. “nabigayshon”, don’t ask. Sometimes more than one…

* Health
Hospitals are hit and miss – good ones and bad ones. And too many people. Seoul has 11 million people, and just as many in the suburbs. And they usually come to Seoul for work… So health care is kinda tough. Plus it’s massively expensive. Make sure you have good coverage first, along with the local insurance card (hospitals won’t take you in unless you have a medical insurance card) and in case of problems, until you find a suitable place, go to Samsung Jeil Hospital, in Jangchungdong. I used to be a service provider to hospitals, and they’re one of the best.

Unless you’re feeling adventurous and desire sick days, do not drink tap water. Even boiled. Once boiled it still contains heavy metals. Once boiled and filtered it is considered safe to drink. Oh well.

* Aussie Embassy
If they haven’t moved they’re in the Kyobo Building, on Sejongno (-no/-ro means road; as you’ll see, Seoul’s streets are highway-sized). It’s in the northern half of Seoul, downtown. In this building you also have a large bookshop that actually carries English books. There ain’t that many. The Aussie Embassy used to have a bar called the Boomerang Bar, that used to be open (or not) to ordinary people every Friday arvo. Worth checking it out – drinks were cheap there.

* Weather
Seoul has two seasons, interrupted by short-term “seaslets”: freezing cold and dry, and bloody hot and humid. “Spring” is a few weeks of generally clement weather, and fall is wonderful, but lasts about as long as a snowball on a grill. No typhoons, yay, but Jangma, the monsoon. Jangma actually means long rain. You’ll see. People die every year trekking in mountains during Jangma. They never learn.

* Trekking
Rome is called the city of 7 mountains or something like that. Seoul is the city with a thousand of them. I don’t think you can find a mile of flat land inside Seoul. Hills and mountains everywhere. Many of them, alas, layered in concrete apartment blocks. But a few good ones. Koreans LOVE trekking. They’re basically the human version of goats. Even the beer-bellied 2-packs-a-day dudes are better than you’ll ever be.

* Food
Hope you like chillies and garlic. I do :-) Koreans are carnivores. They don’t understand what “vegetarian” means. I’ve even seen Buddhist monks eat meat. There are a few vegetarian places, but mostly, if you’re a veggie (I’m not) you’re in for a lot of home-cooking.

* Travel
Lots of nice places to visit outside Seoul – Seoul’s butt-ugly – and buy this book to give you lots of tips on the place. Avoid travelling during Chinese New Year and the Autumn Festival, because 20 million+ ppl are doing the same. Highway 1 – the main highway in Korea – is smaller than many avenues in Seoul, and turns into a parking lot during these holidays.

Yes, Korea celebrates Chinese New Year and such — they just tell us that it’s LUNAR rather than Chinese, but that’s where they got the Lunar calendar from, China :-)

* Money
The Korean Won is one of those monkey deals with too many zeroes, and alas not convertible outside Korea. There are also restrictions on sending money out. Caveat emptor. The won is available in 1,000, 5,000, 10,000 and since recently 50,000 won bills, and there are coins of 500, 100, 50, and 10 won.

There’s an Octopus-like card (T-Money), except that it has only 2 legs instead of eight, as it’s used mostly in buses, metros and taxis. You can get it in subway stations. Handy. They can be recharged by handing over a bill and the card and a smile, or at machines. Remember. People. Don’t. Speak. English :-)

Opening a bank account usually takes 10 minutes. You’ll only get a debit card, at best, or even just an ATM card. And a cute passbook. Credit cards are harder to get, since, like for mobile phones, you are suspected of wanting to run away without paying. Get used to it.

Standing Seat?!?

立席. Ipseok. That’s what I am looking at, here in the middle of nowhere, in the south-east region of South Korea. I can’t remember what station it was, possibly Pohang. Somewhere quaint, anyway, with remnants of the Japanese era. The station looked definitely like a transplant from another decade, and from across the sea. So. Ipseok. I try to wrap my mind around that concept. 立 means “standing”. 席 means “seat”. Hullo? Standing seat…?

I don’t remember where I was that day, but I remember where I wanted to be – back in civilisation. I was touring the Deep South, sans Cajun music, with a few friends, and we wanted to go from Podunk, South Korea, to Busan. Not that Busan was/is much of a megalopolis, but as far as amenities were concerned, it’s night and day. So we were inquiring about trains going down there.

Back then, there was no TGV clone, aka KTX, linking Seoul to Busan in 4 hours or so. Plus, KTX trains, as they’re called, don’t stop in pissoir-sized stations anyway. So it was Mugunghwa all the way, a cute name for trains that would look luxurious, I am sure, in Myanmar, Uttar Pradesh or Tanzania. It takes these trains 20 minutes to reach cruising speed, but by then they have already had to stop a couple of times, for they are the appointed cattle-movers, and will serve every possible village along the line. Cool when you have to go somewhere remote and otherwise unreachable. Otherwise, a pain in multiple body parts.

My favorite means of transportation in the Nineties was long-distance buses. About as safe as unprotected sex in Uganda, the so-called high-speed buses, oh yeah!, emphasis on speed, and high, cover every imaginable place in South Korea. There are a couple of bus terminals in Seoul, I have used three of them, with buses leaving for Gawdknowswhere every 10 minutes. Some people try to reinvent the wheel, Koreans reinvented the human noria. These buses offer relatively comfortable seats – at least when the buses are not moving – and record average times from Seoul to Gawdknowswhere and back. Plus, long-distance buses have their own lanes on highways. And their drivers know how to shove automobiles aside. Nothing. Can. Delay. Them.

But that day, for some reason, buses were not an option. Thus the inquiry at the station. And the answer. “Sorry. Only ipseok for the next train. And the one after that is in two hours.” Yippee Ki Yay, mother. Okay, what’s ipseok, then, I ask. The employee looks at me like I am dumb. Probably. “Standing only. If there are free seats, you can sit, but if people come in with pre-reserved seats, you have to give them up.” Which is basically what happened. Over the two hours and change that it took us to reach Busan, I switched seats a few times, and so did my friends. We were sitting more or less as a group when we departed, but when we arrived we were spread all over the carriage. Which made for a quiet trip, of sorts. This was before mobile phones became a must-have, when people would actually talk to each other. So while we didn’t have obnoxious people yelling on their phones, the noise level was somewhat high, albeit as a loud hum of conversations. Besides, people in that region speak a dialect with a very strong accent, which makes them sound louder than they may be. Then again, they’re loud, too…

It was probably one of the most unremarkable trips I took in a train – although, considering my mileage on trains, the number of unremarkable and long-forgotten trips is probably high. It sticks in my mind because of that single word. Ipseok. 立席.