
This is an article about my experience learning how to fly an airplane. Given it's rare in Taiwan to have a chance to learn general aviation, I'm gonna write this article in Traditional Chinese instead; an English version might come later.

With all of that covered, we can finally talk about the actual flight training. This post goes over the different kinds of schools and the various stages of training.

Types of schools

As mentioned earlier, the school I found is actually a club: you find your own instructor and rent the club's airplanes to fly. Schools like this are usually called Part 61 schools. There is also a more career-oriented kind, usually aimed at training professional pilots, called Part 141. These schools tend to have a stricter curriculum and higher expectations of their students, and they require a much bigger commitment of time and energy. A well-known example is the EVA Flight Training Academy that EVA Air runs in Sacramento.

Since this post is about my personal experience, everything here describes the Part 61 situation. Part 61 training also depends heavily on the instructor's teaching style, so you may run into a somewhat different experience if you go learn.

Ground training (Private Pilot Knowledge Exam)

This part mainly prepares you for the written test. The material covers just about everything: airport signs and markings, route planning, weather decisions, and more will all show up in the exam questions.

The actual exam is taken at a testing center and you get two and a half hours (though it can usually be finished in 30 minutes to an hour). You may bring a flight computer (an E6B or an electronic one, but not a phone app) plus a drafting compass and a protractor. The latter two are typically used for cross-country flight planning questions that ask you to read a chart and work out headings, distances, times, and so on. Since the questions are drawn randomly on the spot (from a defined pool), I actually never used the compass or protractor on test day.

How do you prepare? If you grew up in Taiwan, I'm sure you already know how 😂: grind the question bank! I highly recommend Sporty's course. It has actual videos to watch, plus websites and apps on all the major platforms for practicing past questions. Just keep hammering those before the exam and test day will be fine! When practicing, I suggest buying or printing a Private Pilot Test Supplement, because the chart-based questions I mentioned really need to be measured on paper with real tools.

This stage can run in parallel with flight training, but I recommend going through the material at least once before you start flying so you have the basic aviation knowledge down. The actual exam can be put off any time before the checkride; I didn't take mine until after my solo cross-country. You need your instructor's endorsement before you can sit the exam (or you can get an online endorsement through Sporty's). During the checkride the DPE may quiz you on the topics you missed on the written test (the score report prints the codes of the question categories you got wrong), so make sure you're well prepared before taking it. I got lucky: even though I wasn't sure about a few questions, I scored 98! 😁

Phase one training (to Solo)

The most important training at the beginning is basically flying out to a nearby practice area and doing all kinds of maneuvers there, over and over. In the Bay Area, the commonly used practice areas are over Half Moon Bay and the stretch north of Stanford up to Crystal Springs Reservoir.

SF Bay Area Practice Area

So what exactly do you practice? At first it's basic aircraft control: learning how to climb, descend, and turn, then combining them into climbing turns, descending turns, and so on. Once those basics are solid, you start practicing stalls (a.k.a. roller coasters... er, no, aerodynamic stalls).

Side note: killing the engine is also called a stall, like stalling a car or a motorcycle on a hill start, so at first I wondered why we would suddenly kill a perfectly good engine XD. I later learned that an aerodynamic stall is when the wings can no longer generate enough lift.

"Why would anyone deliberately stall a perfectly good airplane?" I imagine that question pops into a lot of people's heads; yes, I thought the same. When landing, we don't actually just try to get the airplane onto the ground as quickly as possible. At best that gives you a hard landing (the kind that makes you want to have a word with the captain after an airline flight (kidding)); at worst the impact can be hard enough to collapse the gear, blow a tire, and so on.

To land, the airplane gets close to the ground and enters ground effect, where the airflow disturbed by the ground actually reduces drag and improves performance. At this point we gently pull the nose up so the airplane slowly bleeds off speed without climbing, until it finally stalls, stops flying, and settles onto the runway. Do this delicately and the landing comes out smooth and gentle. In practice, though, wind, approach stability, and other factors mean there is a lot you have to get right to pull off a good landing, especially as a beginner.

Anyway, back to stall practice. The point of the exercise is to teach you how to put the airplane into the landing state, how to properly go around if you need to, and so on. Eventually you also learn the traffic pattern (often called 五邊飛行, "five-leg flying", in Chinese). Airplanes arriving at an airport usually fly a fixed route that separates traffic for different runways and makes it easier for pilots to stay on top of things and set up the airplane; that fixed route is the traffic pattern. Take KSQL for example: we normally use runway 30, which is right traffic, meaning every turn is to the right. You take off from the end marked 12 (upwind), turn right onto the crosswind leg, then right onto downwind, and finally right onto base and final to land back on runway 30. That is one lap of pattern work.

KSQL Runway 30 Traffic Pattern

Once you can land the airplane by yourself, you're still not done! You also have to learn to handle emergencies, especially what to do when a single-engine airplane loses its engine. Of course, an airplane is called an airplane because it is very good at flying; losing thrust doesn't make it fall out of the sky instantly. But once you lose thrust you have to quickly try to restart the engine, secure it if that fails, pitch for best glide to get the maximum gliding distance, then contact ATC for assistance and pick a suitable landing spot (maybe a nearby flat field, but don't forget that if you can glide to an airport, an airport is of course the best choice).

The public commercial airports in the Bay Area actually all allow small airplanes to land. Oakland (KOAK) and San Jose (KSJC), for example, both let small airplanes land and practice there. Arriving right behind an airliner sometimes feels pretty out of place 😂 (but you also have to watch out for the wake turbulence ahead of you: small airplanes are easily upset by the air churned up behind big ones, so in that situation we stay above the airliner's approach path and touch down a bit beyond their touchdown point to avoid the wake).

In theory SFO will let you land too, but the landing and handling fees might run you a few hundred US dollars 😅

After you've learned that whole pile of things, you finish with a Solo Phase Check, where another instructor flies with you to confirm that you are able to fly safely on your own. After that you can fly short trips by yourself, under plenty of restrictions, and you move right on to the next phase of training!

Phase two training (to Solo XC)

In my case, my instructor didn't have me solo right after I passed the Solo Phase Check; instead we went straight into solo cross-country training. A "cross-country" doesn't mean literally flying from the East Coast to the West Coast by yourself (doing that in a little Cessna 172 would be exhausting 😂); it means a flight between two airports more than 50 nm apart in a straight line. Because the distances get longer, the challenge shifts to route planning (How do you decide how high to fly? How do you know how tall the obstacles are? How low is safe and legal? How do you check the weather and decide whether to fly around it or delay? How do you identify the destination airport from the air? Where is the runway? What is the pattern direction and altitude? ...etc.) and to your ability to communicate with controllers and, at small uncontrolled airports, with other pilots. The previous post mentioned a piece of software worth buying, Boeing's ForeFlight, and this is where it really shines: almost everything above can be looked up in it, you can use it to plan your flights, and in actual flight, paired with an ADS-B receiver, it can show where nearby aircraft are and give you audio alerts.

After flying to a few distant airports with your instructor, there is another check just like before, a Solo XC Phase Check, and from then on you can fly the long trips on your own. Because you don't have a license yet at this stage, every cross-country flight needs your instructor to sign off on your plan. You may also run into plenty of times when weather forces you to postpone a flight (for example, in the fall the Bay Area often gets heavy morning fog that doesn't burn off enough for visual flying until around noon).

Snapshot of the GA airports I've visited in 2022

This stage is super fun! You get to start flying all over the place (and you have to fly some longer trips anyway to satisfy the certificate and hour requirements). The picture above shows the airports I've been to recently: blue marks the departure airport and, for the most part, the airports I flew to myself with a brief stop (read: "popped in to use the bathroom") at the FBO, while orange marks touch-and-gos or taxi-backs (usually places I visited with my instructor during phase one). One time I even flew to Sacramento Mather Airport and swung by a nearby Taiwanese supermarket to bring some pork balls home 😁

FBO: Fixed Base Operator

It's basically the place at an airport that provides services for aircraft; there's usually somewhere for pilots to take a short break, and they can fuel the airplane for you. Everyone else is arriving on private jets while you fly in on your own little airplane, which feels pretty out of place 😂

Final exam prep (to Checkride)

This part will have to wait for the next installment! I haven't officially entered this stage yet; the plan is to start practicing in December for the final oral exam and the practical flight test. Hopefully I can get that license!

This is an article about my experience learning how to fly an airplane. Given it's rare in Taiwan to have a chance to learn general aviation, I'm gonna write this article in Traditional Chinese instead; an English version might come later.

The idea of learning to fly has actually been around in my head for a long time, but I never acted on it. Getting started isn't exactly hard, but there are a lot of miscellaneous things to think through and plan, so I kept putting it off until now, when I finally officially started. This post covers the things you run into before formally starting to learn to fly.

Getting started

Why did I want to learn to fly? Probably because I got a bit bored after COVID XD. But the first spark was back at Georgia Tech, when I heard a classmate had learned to fly and suddenly realized, "Wait, this is something anyone can just go learn?!" After graduating I moved to the Bay Area for work, and a friend once flew me on a Bay Tour (a loop around the bay that passes famous sights like the Golden Gate Bridge). We got lucky that day: the air traffic controller (ATC) was willing to let us cut straight through the big, complicated Class B airspace over SFO, and I got a rare photo of SFO from the air. Another time a coworker flew me and some colleagues to Tahoe to ski, and the feeling of skipping all the traffic was just fantastic. Bit by bit these experiences added up into a real urge to go learn.

SFO from the air

Learning to fly (assuming a PPL, Private Pilot License) basically breaks down into two parts: ground knowledge and flight hours. I've heard people suggest just studying the ground knowledge on your own first, and there really are plenty of free resources online, but without the "need" that comes from actually flying, it's hard to stay consistent. I tried starting a #learn-how-to-fly Slack channel at work, hoping to get a few coworkers to learn together, but that didn't pan out. XD For whatever reason, several coworkers who joined late last year already had their PPLs, the channel got livelier, and that gave me more motivation to start. One of them happened to be on my team; we flew to Watsonville together, he let me fly a short stretch on the way back, and I thought, "Okay, this is it. Time to go learn for real!"

Once I really wanted to learn, the next step was finding a place. People who want to become airline pilots usually look for a flight school, but I have neither the interest nor the time for that, so I needed an instructor whose schedule could work with mine, and since I don't own an airplane I also needed somewhere to rent one. Asking around, it seemed like most people I know had joined the West Valley Flying Club. On top of that, the teammate I mentioned had also just joined and needed someone to give him a checkride to verify his flying skills before he could start renting. When we got back from our flight that day, we happened to run into the CFI (Certified Flight Instructor) he had found, who was about to take a student out; I asked for his number and decided to just start with him directly. XD A very casual, very haphazard process. (Part of the reason is also that this club operates out of both KSQL (San Carlos) and KPAO (Palo Alto), which is more convenient for me. The office also has an adorable black cat.)

Process and paperwork

Location and time

Learning to fly really is something that takes time, because so much of it can only be understood through hands-on practice, which means you have to fly, fly, and fly some more. With a day job during the week, that takes some scheduling. I'd recommend carving out at least two, ideally three, sessions a week. Each session takes about three hours (actual flight time usually lands around 1.8 hours); the preflight inspection, parking and securing the airplane, and filling out the logbook before and after take up the rest. I ended up arranging with my boss to leave early one weekday afternoon to fly and make up the hours online afterwards, and I put the other session on a weekend. Luckily the Bay Area weather is mostly sunny, so cancellations are rare. Sometimes you also need ground lessons to fill in knowledge; if your CFI's situation allows, those can go in the evening after work, which is easier to schedule.

Location and timing also affect airplane availability. As mentioned, WVFC operates out of both KSQL and KPAO, which gives you a bit more flexibility, but occasionally the newer airplanes are all gone, or every airplane at one location is rented out for certain time slots, so keep an eye on that.

Money

Yes, there's no denying that learning to fly is pretty expensive. Even on a decent Silicon Valley engineer's income, the cost still makes you think twice. At minimum you need an airplane and an instructor. Around the Bay Area, a newer glass-cockpit airplane (Cessna 172 with G1000) runs about $180 ~ $220/hr, and instructors range from $80 to $140/hr. A lesson is usually billed at about 2.5 hours of instructor time (the airplane is billed on actual flight time, around 2 hours). Add it up and a single lesson comes to roughly $650, and you need two or three a week.

On top of that there are a few extra things to buy:

  • Bose A20: Strictly speaking not required, but a noise-cancelling headset is really great. You already want noise cancelling as an airline passenger; when you're flying you're even closer to the engine and you also need to talk to ATC, so a good headset helps a lot! ($1,095.–)
  • ForeFlight: A super useful app for all kinds of aviation information, from traffic (requires a separate ADS-B receiver, or at least an internet connection when flying low) to every kind of aeronautical chart and airport diagram. Again not required, but extremely handy! ForeFlight is best run on an iPad Mini with Cellular: the mini size fits the cockpit better, and the Cellular model has GPS, so it can serve as backup navigation in an emergency. ($199/yr, with an extra discount if you join SAFE.) If you don't want to spend the money, you can use SkyVector, but there's no offline mode (unless you print things out XD).
  • Notebook, pens, and a checklist: Small items that don't cost much, but since we're talking money, they're worth mentioning. The checklist depends on the aircraft type; trainers are very commonly the Cessna 172S / G1000.

Extra steps for aliens (the foreign kind)

Since US paperwork calls foreigners "aliens", I'll refer to us as space aliens from here on XD. An alien who wants to learn to fly in the US first has to go through a background check called the Flight Training Security Program (formerly the Alien Flight Student Program XD), which really just adds a fingerprinting step. It clears quickly, in about a week or two, but hours flown before that cannot be logged in your logbook as official training time.

(Side note: between visas, the green card, and Global Entry, the US has run background checks on me who knows how many times already. Can't they just share the results? I'd happily sign a data-sharing consent form XD)

Medical

The medical turned out to be surprisingly annoying. The FAA does have an official website for finding an AME (Aviation Medical Examiner), but you can only search by location, and then you still have to call them one by one to find an appointment, cross-checking people's experiences on reddit along the way. In short, it's a somewhat time-consuming step. For a PPL you generally only need a third-class medical certificate, and as long as you don't have too many unusual conditions you can fly.

I recommend starting this step early, so you don't end up failing the medical at the last minute and wasting the money you've spent on training. That said, you don't need the medical result until you solo, so you can judge for yourself when to do it.

Special Issuance

Some medical conditions will keep you from passing the third-class medical right away; you have to submit additional documents to the FAA for extra review before you can get a Special Issuance. If you need one, I suggest asking the AME roughly which documents the FAA will want, getting the proof sorted out with your doctor, and mailing it straight to the FAA, and I'd also suggest calling them every week to ask about progress. I don't know whether that actually makes them review your application faster, but in any case, after two or three months I got mine. The annoying part is that my SI is only valid for one year, so later I may have to reapply early or just switch to BasicMed.

If you've done all of the above, you should already be taking lessons by now. In the next post I'll talk about my own experience with the lessons!

I am a lazy person, so I've really just been compiling the code I want to run on Raspberry Pi ... well, on the Raspberry Pi. It was slow, but it was super simple to set up. However, sometimes you just want to compile something larger than the Raspberry Pi can handle. What now?

The first thing my lazy butt tried was to simply run an ARMv7 image using qemu-system-arm, but that is sadly very slow on my computer since it emulates a different architecture altogether. I was also too lazy to set up a proper buildroot with all the toolchains and libraries properly cross-compiled for the ARMv7 architecture.

I decided to give another approach a try: using qemu user-mode emulation to run ARMv7 userspace directly and to wrap it in docker so I don't need to worry about messing my system up. We should be able to get near full-speed with this method.

Fortunately, someone already published an ARMv7 docker image, agners/archlinuxarm-arm32v7. We just need to get our system to run ARMv7 binaries now. To do this, we need to install binfmt-qemu-static from the AUR, which enables your system to run ELF files from other architectures.
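
To sanity-check that the registration worked, you can peek at the kernel's binfmt_misc entries; the exact entry name (qemu-arm here) depends on the binfmt package you installed:

ls /proc/sys/fs/binfmt_misc/
cat /proc/sys/fs/binfmt_misc/qemu-arm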

If you just start running the container at this point, you will run into this weird problem:

[root@f19789b92d0d code]# cargo build
    Updating crates.io index
warning: spurious network error (2 tries remaining): could not read directory '/root/.cargo/registry/index/github.com-1285ae84e5963aae/.git//refs': Value too large for defined data type; class=Os (2)
warning: spurious network error (1 tries remaining): could not read directory '/root/.cargo/registry/index/github.com-1285ae84e5963aae/.git//refs': Value too large for defined data type; class=Os (2)

Value too large... for wat? I didn't dig into what exactly caused this, but someone hypothesized that it could be a filesystem compatibility issue between 32-bit and 64-bit systems (ARMv7 is 32-bit and my PC is 64-bit; if you run the ARM64v8 image then it should just work), so we need to mount a filesystem that works on a 32-bit system. I tried using mkfs.ext4 -O^64bit and even mkfs.ext3, but they all still produced the same problem. I decided to try another filesystem altogether, and JFS works!

To create a JFS image, you can run:

fallocate -l 4G disk.img
sudo mkfs.jfs disk.img

and then you can run this to mount it:

mkdir mnt
sudo mount -o loop disk.img mnt

That's it! Once you have that JFS filesystem set up, you can run this command to run ARMv7 Arch Linux in docker and compile whatever you need!

docker run -it --rm -v $PWD/mnt:/work agners/archlinuxarm-arm32v7

The #BlackLivesMatter protests are happening in the US. It feels like a very, very distant event for Taiwanese people, and yet it is happening right beside me. I've seen a lot of viewpoints from the Asian American community, and that got me thinking: what am I feeling and thinking as a Taiwanese expatriate living in the US?

[The English version was translated and expanded from my original text in Traditional Chinese.]

To be honest, I knew next to nothing about racism when I came to the US years ago. I grew up in Taiwan all the way until I finished my master's degree. I hadn't heard racism talked about much in Taiwan (not that Taiwan doesn't have it), and I didn't have a deep understanding of US history; quite frankly, I still don't today. It could be that I'm lucky or insensitive to it, but I have also never deeply felt that I was being discriminated against because of my racial background. The biggest discrimination I have felt since I left Taiwan is the oppression of my country. Almost no one recognizes Taiwan as a country, and we have to somehow navigate these gaps as Taiwanese individuals. #YourCountryIsNotACountry I can totally understand that some Taiwanese people, living 7,500 miles away from the US, probably don't have the context to build empathy towards what is going on here.

As I stay longer in the US, I get to know more people of different backgrounds. I hear more about my friends and about what's happening around me. And as I rebuild my identity now that I no longer live in my own identity bubble, I've been reading more. It's really hard not to start feeling and thinking more deeply about racism; it has become a problem I might encounter myself rather than some distant story. I read an article today, "Black Lives Matter, Taiwan's '228 Incident,' and the Transnational Struggle For Liberation", that resonated with me deeply. Growing up in a country that is alienated by the international community, I never thought that one day we would be drawing parallels between the 228 incident and White Terror of Taiwan's dictatorship era and Black history and current events.

Taiwan has come a long way since the dictatorship era. We grew into one of the most modern, democratic and progressive countries in Asia. This did not happen without protests, so we should know this very well ourselves. More recently, we had the Sunflower Movement in 2014, and we had many same-sex marriage demonstrations over the years until we finally legalized it in 2019. We really should know what is going on. Taiwanese society cares a lot about being "polite". Our movements put a lot of emphasis on projecting that image, and everyone is very conscious of it. We would fight our way into the Legislative Yuan while self-patrolling to make sure no one was hurt and no cultural artifacts in the building were damaged, and protesters cleaned everything up afterwards. Yes, those are all great, but is that really everything? We felt it deeply when Hong Kongers were protesting for their freedom, and we saw the police brutality over there as well. It all got me thinking about what exactly a protest is and where I draw the line. In the face of the oppression and systematic discrimination the Black community has been going through, these things matter much less. Minnesota officials also found that arrested looters are linked to white supremacist groups. We have seen this too: there were gangsters trying to blend into our movements to incite violence and escalate things. We should understand what is going on. We have always felt discriminated against by the international community, and we should have empathy here too, as what the Black community faces is far more personal than what we face.

I'm really glad that we have Taiwan. I may not have been living in Taiwan, but seeing us gain more momentum and visibility on the international stage really makes me happy. A few recent big policies are heading down the progressive path. I feel really lucky and proud to be Taiwanese, but we are also far from perfect. We have not finished our own transitional justice for the 228 incident, and we have our own racism problem towards migrant workers from Southeast Asian countries too, not to mention the casual racism I still hear occasionally. I'm not saying every single Taiwanese person should care about all the things in the world; that is perhaps not necessary. However, the very least we can do is look at what is happening and, at a minimum, try to prevent it from happening in Taiwan too. And if you do live in the US, we should care. It's unjust, and we are not protected from racism at all.

The original Chinese version

As I'm sure everyone knows by now, large-scale #BlackLivesMatter protests are happening in the US; it feels like something very distant to Taiwanese people, and yet it is happening right around me.

As a Taiwanese person living in the US, I honestly didn't know much about racism when I first arrived. For one thing, it's a problem that is rarely brought up in Taiwan (not that Taiwan doesn't have it); for another, I didn't know much about the historical context of the US. Even so, you still hear from time to time about Black people, or Asian people, being discriminated against. Maybe I'm just oblivious, but I've never deeply felt discriminated against because of my racial background; the biggest oppression I've felt since going abroad has been against my own country. #YourCountryIsNotACountry So I can completely understand that people born and raised in Taiwan probably find it hard to feel this deeply.

But as time went on, I got to know more people and heard more stories. Through the process of rebuilding my own identity after moving abroad, I read more as well. It's really hard not to start feeling something about all this and thinking about it; after all, for someone living abroad like me, racism is a very personal issue. And today I came across an article that really hit home: "Black Lives Matter, Taiwan's '228 Incident,' and the Transnational Struggle For Liberation", which talks about what the 228 White Terror has in common with the current #BlackLivesMatter movement. I had never imagined that Taiwan's history could connect and resonate with the history of Black people in the US. I recommend reading it if you're interested.

Since I'm saying so much for once, let me describe what I've seen. Start with the incident itself: an unarmed Black man was pinned to the ground, a police officer's knee on his neck, unable to breathe for several minutes, begging for mercy and pleading that he couldn't breathe the whole time. And in the end he really was killed that way. I think most people cannot condone what the officer did. Add to that the long-standing unfairness and violence that American society and police have directed at Black people, and people rose up in protest; anyone who follows US news shouldn't find it hard to see how these protests came about, and the #BlackLivesMatter movement didn't start yesterday. Yes, some protests have been surrounded by violence, stores have been smashed, and so on. But around me in the Bay Area there are also plenty of cities where protests happened and nothing dramatic occurred at all, and volunteers were even out cleaning the streets the next day. As a member of a non-mainstream group myself, even though I can't claim to suffer the same persecution, I can truly understand the anger, and I very much support the protests.

Taiwan's protest movements in recent years, and Hong Kong's anti-extradition movement that started last year and continues today, have all made me rethink how far a protest can reasonably go. In Taiwan, perhaps because we all recognize how much our society cares about being "polite", people pay enormous attention to a protest's image. We would storm into the Legislative Yuan while organizing our own marshals to keep everyone safe and tidying the place up ourselves. Those are all very good things, but I can also appreciate that in the heat of a protest they may not feel all that important. Smashing uninvolved shops and looting are things I still cannot condone, but in reality it's hard to say whether those people are actually protesters; Minnesota officials have said the looters they arrested have ties to white supremacist groups. Isn't that just like how, during Taiwan's protests, everyone was so careful and so afraid of being linked to gangsters? We should be better at using our own experience to empathize, and at feeling this the way we feel about being discriminated against internationally for so long.

Come to think of it, we are really lucky to have Taiwan. I haven't been living there these past few years, but I've watched Taiwan's visibility on the international stage keep growing, and several major national policies are moving toward progressive values. Honestly, I feel very lucky and very proud to be Taiwanese. But Taiwan isn't perfect either: going back to 228, our own transitional justice still isn't finished, and discrimination against migrant workers persists to this day. (Didn't the Taipei Main Station incident happen just recently?) It may be a bit much, and unnecessary, to ask every Taiwanese person to follow world events. But at the very least, I think we can look at what has happened to others and keep the same things from happening in Taiwan in the future.

I want to talk about a problem that has been bugging me as a motorcyclist for a while. I usually ride with a helmet cam. For example, I've been to Japan for some motorcycle road trips and collected hours and hours of video of the road ahead and a few other angles. However, it is really hard to find the highlights in all that footage.

Sometimes you notice something interesting happening on the road. Taking one example from my recent trip to Napa, I saw two squirrels fighting on the road as I rode by (okay, it is both interesting and scary at the same time; luckily I managed to miss them). How do I recover these highlights from a boring, long video? The problem is that roads look very similar, and it is very easy to miss the exact moment you saw something when you are skimming through the video.

I first thought GPS might work if I could just remember where it happened, but it turns out it's really hard to remember at which corner you saw the fun stuff, and even if you do, synchronizing the video with recorded GPS tracks is usually a slow process, even when your helmet cam records a GPS track alongside the video. I also thought about making a hardware button that just records a timestamp, but then I would first need to figure out the right hardware to build, mount it on the bike, and synchronize it with the video too.

Finally I had a really simple idea. What if I just use my hand to cover the camera? It's simple, easy to do and now all I need to figure out is how to detect black frames from the video.

Here is an example of what a “marker” looks like on video when you use your hand to cover the camera for a second. As long as you are covering the lens, it should produce a very dark frame compared to regular daytime riding footage.

gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
_, dist = cv2.threshold(gray, 30, 255, cv2.THRESH_BINARY)
dark_pixels = np.count_nonzero(dist == 0)
dark_percent = (float(dark_pixels) / size * 100)

We first convert the frame to grayscale for easier processing, since all we care about is detecting black pixels anyway. Then we run the frame through a threshold filter to map anything below gray level 30 to 0 (pure black) and everything else to 255 (pure white), and finally we count the pixels whose value equals zero.

A grayscale threshold-processed frame

Now we take this snippet and apply a bit more logic: let's say we count a frame as a marker if more than 95% of its pixels are black. We might also get multiple marker frames while the hand is moving in and out of the view, so we want to merge nearby marker points; let's say we only keep one marker per 5 seconds. Now we can write out the final code!

import sys

import math
from datetime import datetime
import numpy as np
import cv2

MERGE_THRESHOLD_MS = 5000


def format_time(timestamp):
    msec = timestamp % 1000
    parts = [msec]

    secs = math.floor(timestamp / 1000)
    parts.append(secs % 60)

    mins = math.floor(secs / 60)
    parts.append(mins % 60)

    hrs = math.floor(mins / 60)
    parts.append(hrs)

    parts.reverse()
    return "%02d:%02d:%02d.%03d" % tuple(parts)


def main():
    src = cv2.VideoCapture(sys.argv[1])
    if not src.isOpened():
        print("Error opening file")
        sys.exit(0)
    length = int(src.get(cv2.CAP_PROP_FRAME_COUNT))
    width = src.get(cv2.CAP_PROP_FRAME_WIDTH)
    height = src.get(cv2.CAP_PROP_FRAME_HEIGHT)
    size = width * height
    markers = []
    start_time = datetime.now()

    while src.isOpened():
        ret, frame = src.read()
        if not ret:
            break
        idx = int(src.get(cv2.CAP_PROP_POS_FRAMES))
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        _, dist = cv2.threshold(gray, 30, 255, cv2.THRESH_BINARY)
        dark_pixels = np.count_nonzero(dist == 0)
        dark_percent = (float(dark_pixels) / size * 100)
        frame_time = int(src.get(cv2.CAP_PROP_POS_MSEC))
        fps = idx / (datetime.now() - start_time).total_seconds()
        print("\033[0KFrame %d/%d [%s]: %.2f fps, %.2f%% black. %d black frames found.\r" %
              (idx, length, format_time(frame_time), fps, dark_percent, len(markers)),
              end='')
        if dark_percent > 95:
            markers.append(frame_time)

    merged_markers = []
    for marker in markers:
        if not merged_markers or marker - merged_markers[-1] > MERGE_THRESHOLD_MS:
            merged_markers.append(marker)

    print()
    print("Markers:")
    for marker in merged_markers:
        print("  %s" % format_time(marker))

    src.release()


main()

To actually run this script, you will need to have opencv-python and numpy installed.
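
For reference, a run looks roughly like this (the script filename is just an example; the video path is passed as the first argument):

pip install opencv-python numpy
python find_markers.py my_ride.mp4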

One thing I have not figured out how to improve is the performance of the script. It currently takes about 5 minutes to process a 26-minute-long video, and it looks like most of the processing (decoding/analyzing) is done on the CPU. I wonder whether moving some of the processing onto the GPU would help with the speed, but that's another topic for another time!

And this is the story of how I recovered that squirrel snippet from a 4-hour-long recording!

Crostini is the new Google project to bring Linux apps to ChromeOS. Input method support is on the roadmap but has not been implemented yet in the current preview version of Crostini. The situation is a little different from a regular Linux setup because it runs Wayland and uses Google's sommelier project to pass through to the ChromeOS host Wayland compositor.

To set it up, in your Crostini container do:

sudo apt install fcitx # and your IME engine, example for Taiwanese users: fcitx-chewing
sudo apt remove fcitx-module-kimpanel

Then you should use fcitx-config-gtk3 to set it up.

Now we need to set a few environment variables, and we want them to apply to applications started from the launcher menu too. I found that we can set them in the file /etc/systemd/user/cros-garcon.service.d/cros-garcon-override.conf. This file might be overwritten by future updates; suggestions for a better location are welcome. Put these extra lines in there:

Environment="GTK_IM_MODULE=fcitx"
Environment="QT_IM_MODULE=fcitx"
Environment="XMODIFIERS=@im=fcitx"

Finally, we need to start the fcitx daemon. I just put this one-liner in ~/.sommelierrc to do the work:

/usr/bin/fcitx-autostart

That's all! Now enjoy typing Chinese in the Linux apps on Chrome OS!

fcitx in gedit with candidates panel

We have just launched Night Mode on Twitter Lite. Night mode is an exciting feature from an engineering point of view: it was highly requested, it's visually pleasing, and it was the primary driver for our effort to move our CSS to CSS-in-JS. Let's dive into what we did to bring this feature to life!

DISCLAIMER: The post was written and posted after the end of my employment at Twitter. I tried to recall the details as best as I could, and I apologize beforehand for any inaccuracies.

What is it?

Night mode is an increasingly popular feature that is starting to show up on a lot of websites and apps. Most websites use a white background, which can cause eye strain in a dark environment. When users activate night mode, Twitter Lite switches to a dark color theme app-wide.

Styling components

The core of this feature is the ability to dynamically switch the styling of every component on the screen. Our components were styled using CSS. To swap out styling, we would have to build multiple CSS bundles based on a few factors: color theme and LTR/RTL text direction. That is not a very scalable solution and requires users to download new CSS when switching between combinations. The other option would be switching to CSS variables, which unfortunately did not have enough support across the browsers that Twitter Lite intended to support.

Our next option was to switch to a CSS-in-JS solution. We use react-native-web throughout our internal component library and the website, and it has a built-in StyleSheet API that provides this functionality.

// A simplified example of using react-native-web StyleSheet
const styles = StyleSheet.create({
  root: {
    backgroundColor: theme.colors.red
  }
});

const Component = () => <View style={styles.root}/>;

Runtime-generated Style Sheet

To create a StyleSheet instance, you make a StyleSheet.create call and pass in a JSON object that looks very much like its CSS counterpart. The API returns an object with each class name mapped to a number representing the registered styles, while its styling engine works in the background to generate and deduplicate runtime CSS classes. We would need to somehow allow it to:

  1. Rerun the style creation every time we switch to a new theme
  2. Pass in reference to the next theme so we can use the new color palette

We designed a new API wrapping the StyleSheet API, but instead of taking an object, it accepts a function (theme) => styleObject. We store references to all those functions and return an object with dynamic getters. Whenever the user requests a theme switch, we re-run all the style creations with the new theme. The React components can keep using the same styles object returned from the first API call to render with the new style.

// Updated to support the new API
const styles = StyleSheet.create(theme => ({
  root: {
    // do not use color name directly but name colors by context
    backgroundColor: theme.colors.navigationalBackground
  }
}));

const Component = () => <View style={styles.root}/>;
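
For illustration, here is a minimal sketch of how such a wrapper could be put together. This is not the actual Twitter implementation; the module shape, the react-native import (aliased to react-native-web in a setup like ours), and the default theme object are all assumptions.

// Minimal sketch of a theme-aware StyleSheet wrapper (hypothetical, not the real code)
import { StyleSheet as RNStyleSheet } from 'react-native';

let currentTheme = { name: 'light', colors: { navigationalBackground: '#ffffff' } };
const registrations = []; // every style function ever passed to create()
const listeners = [];     // callbacks interested in theme switches

export const StyleSheet = {
  create(styleFn) {
    const entry = { styleFn, styles: RNStyleSheet.create(styleFn(currentTheme)) };
    registrations.push(entry);
    // Dynamic getters: callers keep the same object around, but every property
    // read resolves against the styles registered for the current theme.
    return new Proxy({}, { get: (_, key) => entry.styles[key] });
  },

  onThemeSwitch(listener) {
    listeners.push(listener);
  },

  switchTheme(nextTheme) {
    currentTheme = nextTheme;
    // Re-run every style function against the new theme's palette.
    registrations.forEach((entry) => {
      entry.styles = RNStyleSheet.create(entry.styleFn(nextTheme));
    });
    listeners.forEach((listener) => listener(nextTheme));
  },
};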

Are we all on the same page?

Sounds perfect! New styles are generated, and all the references are updated. The page, however, is not updated. Well, not until some component receives new data. The components are not re-rendering on the spot because we are updating an external variable instead of working with React component state. We need a way to signal components to re-render.

Theoretically, we would love this part to be as performant as possible to reduce the overhead of switching themes. For example, we could use a higher-order component to keep track of the components and their corresponding styles, and use that information to update components on a smaller scale. It turned out to be hard: we would need to wrap many components, the components might have shouldComponentUpdate tricks to prevent themselves from updating, and their children might have shouldComponentUpdate functions too. It does work 80% of the time; unfortunately, the other 20% stands out very much under a dark theme.

One hacky solution would be to somehow recursively call forceUpdate() on every mounted component. It would require some meddling with React internals, and we eventually decided not to do it. In our first implementation, we manually unmounted the previous component tree entirely and remounted a new one; this caused a considerable delay in theme switching and worked outside React's lifecycle. We switched to using React.Fragment with its key set to the theme name, allowing React to optimize the operation better without any lifecycle hooking.

class AppThemeRoot extends React.Component {
  state = { theme: 'light' };

  componentDidMount() {
    StyleSheet.onThemeSwitch((theme) => this.setState({ theme: theme.name }));
  }

  render() {
    return (
      <React.Fragment key={this.state.theme}>
        {this.props.children}
      </React.Fragment>
    );
  }
}

The final touch

Night mode smooth transition

Now that we have the basics going, we would like to make it better. Instead of swapping the content out directly, we would like it to be a smooth transition. We explored a few different options to implement this.

The first option that popped into my head was to implement a cross-fade: fading out the old content while fading in the new content. We can create a copy of the old content with oldDomNode.cloneNode(true) and insert it back into the DOM. It looked absolutely beautiful, but sadly it screwed up our virtualised list implementation, so we had to explore other avenues. The next thing we tried was a simple fade out and fade in. It looks okay when done fast enough that the transition feels smooth. It would, however, show a brief white flash because the default page background is pure white. We addressed the flash by also fading the document background color to the next theme's background color, which makes it feel much more like a cross-fade than a simple fade-out-and-in.
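
A rough sketch of that final fade, with made-up timings and a made-up helper name:

// Hypothetical helper: fade the old content out, swap themes, fade back in,
// while transitioning the document background to avoid the white flash.
function fadeToTheme(rootEl, nextBackgroundColor, applyTheme) {
  document.body.style.transition = 'background-color 150ms ease';
  document.body.style.backgroundColor = nextBackgroundColor;

  rootEl.style.transition = 'opacity 150ms ease';
  rootEl.style.opacity = '0';
  setTimeout(() => {
    applyTheme();              // re-render the tree with the new theme
    rootEl.style.opacity = '1';
  }, 150);
}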

Credit

I hope you enjoyed our journey exploring the implementation of Night Mode. Night Mode couldn't have been made without the team's collaboration. Thanks to Marius and Sidhu for finding the best solution to this problem with me; a special call-out to Sidhu for implementing the proposal. Thanks to the whole team for very efficiently migrating all of our components out of CSS in two hack days, which in turn enabled us to switch the theme of the entire website!

I have worked on Twitter’s new mobile website for the past year. We rebuilt the website using the latest web technologies: React, Redux, Node.js/Express to name a few. It is absolutely an exciting project to work on since you rarely get a chance to rework a large-scale website from the ground up and experiment with the latest tools without having to worry about any historical baggage.

One of the problems that we realized early on is that our Tweet is fairly complex in both the React tree and the DOM tree. A Tweet does not only contain the body text and metadata; it also involves processing #hashtags, @mentions, cards and a lot of Unicode ordeals (one of the most prominent examples is emoji) to make sure we are rendering everything correctly across all platforms.

Tweet can have a complex DOM representation

This normally would not be a problem on a desktop browser, as they have enough processing power to deal with a highly complex DOM tree. However, this is not the case with mobile browsers. We discovered that the performance degrades as the user scrolls further down. What’s even worse is that if we want to implement caching and pre-download say 200 tweets for a user, this will cause our app to effectively render 200 tweets at the same time and lock up the app for a few seconds. I started to look into this problem and realized that a solution to this is to maintain only the visible portion of an infinite list in the DOM tree and render/remove invisible parts as the user scrolls.

How did we solve it?

In the search for a component that supports both lazy rendering and dynamic item heights, we developed a component called LazyList. Supporting items of dynamic height makes the system much more complex, but unfortunately Tweets have non-deterministic heights due to variable content like cards, pictures and text.

The Basics

LazyList works by measuring item heights and calculating which slice of items should be displayed on the screen given the scroll position; this slice is called a projection. It also applies before/after padding to stand in for the out-of-view items, so the scroll bar pill keeps its expected size and position.

Illustration of on-screen layout

In addition to the items visible in the viewport, in order to allow the page to scroll smoothly, we need to render extra items both above and below the visible region. Typically, this results in one to one-and-a-half pages' worth of items. This also gives us a bit of buffer to preload the next page of Tweets before the user hits the bottom of the scrollable area. Now that we have a strategy for how this component should work, we need to fit it into React's lifecycle methods. Ideally we want it to feel just like a ListView component: give it the items and a render function, and get lazy rendering for free.

Lifecycle

The only thing that LazyList needs in order to render is a projection of items. A projection is defined as the slice of input items that is visible in the viewport. To calculate the projection at any given moment, we need to know the height of each item. A typical approach on the web is to render items off-screen, take measurements, and re-render them on-screen with the cached measurements. However, this doubles the rendering cost, which is impractical for a product used by millions of users on lower-end mobile devices. We moved to an in-place measurement technique: we render items on screen first with a guesstimated average height, then cache the actual heights of the rendered items. We repeat this process until the estimated/cached heights match all the items on screen. Using in-place measurement also allows us to accommodate cases where an item's height changes after rendering, such as when loaded images change the overall height of a Tweet.
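
A minimal sketch of what in-place measurement could look like with a ref callback (hypothetical names, not the actual LazyList code):

// Cache the rendered height of item `index`; if it differs from what we assumed,
// ask the list to recompute its projection.
function measureItem(index, heightCache, onHeightChange) {
  return (node) => {
    if (!node) return; // unmounted
    const measured = node.getBoundingClientRect().height;
    if (heightCache[index] !== measured) {
      heightCache[index] = measured;
      onHeightChange();
    }
  };
}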

LazyList lifecycle diagram

Initial rendering (mount)

When the component is mounted for the first time, it has no knowledge of which items will fall within the viewport. It renders nothing and simply triggers a projection update.

Update Projection

The projection can be generated by adding up the item heights sequentially until the sum reaches the scroll offset of the container; that's when we know the items after this point will be in the viewport. We continue adding until the sum exceeds the container height. If there's any item along the way that we don't have a height for, we guesstimate one. The incorrect numbers are corrected after we cache the real heights and update the projection again.

This step is also triggered by input events like resize and scroll.
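
To make the idea concrete, here is a simplified sketch of the projection calculation. The function and parameter names are made up and the overscan buffer is left out, but the walk is the same:

// Simplified sketch of computing a projection (hypothetical, not the real LazyList code).
// `heights` is the cache of measured item heights; unmeasured items use an estimate.
function computeProjection(scrollTop, viewportHeight, itemCount, heights, estimatedHeight) {
  const heightOf = (i) => (heights[i] !== undefined ? heights[i] : estimatedHeight);

  // Walk items until the accumulated height passes the scroll offset:
  // that's the first item that should be rendered.
  let paddingTop = 0;
  let start = 0;
  while (start < itemCount && paddingTop + heightOf(start) < scrollTop) {
    paddingTop += heightOf(start);
    start += 1;
  }

  // Keep adding items until the viewport is covered.
  let end = start;
  let covered = paddingTop;
  while (end < itemCount && covered < scrollTop + viewportHeight) {
    covered += heightOf(end);
    end += 1;
  }

  // paddingTop stands in for everything above the slice; the caller computes
  // paddingBottom from the remaining items the same way.
  return { start, end, paddingTop };
}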

Render

Rendering is fairly straightforward once we've established the projection to use. We simply loop over it and call the renderer function supplied by the user to render each item on screen.

Prologue

After rendering, we update our internal cache of item heights. If we encounter any inconsistencies, it means our current projection is incorrect, and we repeat the process until it settles down. The difference in heights is also deducted from the scroll position so the list stays in a stable position.

Resizing

Resizing a window changes all item widths, which effectively invalidates all cached item heights. However, we definitely do not want to throw the entire cache away and remeasure everything at once. Think of the case where a user has scrolled down 5 pages: if they resize the window, we want the app to adapt gradually instead of waiting for LazyList to remeasure every item; fortunately the in-place measurement technique handles this scenario. We update the new item heights into the cache and let the system correct itself as the user scrolls. The downside of this technique is that the scroll bar pill can be a bit jerky or show sudden resizing, because the first pass renders with stale cached heights and corrects itself on the second pass. However, this outcome is preferable to having the app locked up for several seconds.

Scroll Position Stabilization & Restoration

Notice the first tweet is always in the viewport during resizing

Whenever there is a difference in expected item heights and the actual item heights, the scroll position will be affected. This problem manifests as the list jumping up and down randomly due to miscalculation. We will need an anchoring solution to keep the list stable.

LazyList used a top-aligning strategy, meaning it kept the first rendered item at the same position. This strategy improved the symptom but did not fix it completely, because we're not necessarily aligning items within the viewport. We have since improved it to use an anchor-based solution. It searches for an anchor that is present in the projections both before and after an update, usually the first item within the viewport, and uses that anchor as a point of reference to adjust the scroll position and keep it in the same place. This strategy works pretty well. However, it is tricky to programmatically control the scroll position while inertia scrolling is still in effect: it stops the animation on Safari and causes a slight slowdown on Chrome for Windows, while working fine on Chrome for Mac and Android. We do not have a perfect solution for that yet.
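
The anchoring idea boils down to something like the following sketch (hypothetical helper, not the production code):

// Keep an item that exists in both the old and new projections (the anchor)
// at the same on-screen position across an update.
function withStableAnchor(scroller, anchorEl, applyUpdate) {
  const before = anchorEl.getBoundingClientRect().top;
  applyUpdate(); // re-render with the new projection/heights
  const after = anchorEl.getBoundingClientRect().top;
  scroller.scrollTop += after - before; // cancel out the shift
}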

Timeline position is remembered

Remembering the timeline position is one of the features most Twitter users expect a client to have. It is an interesting challenge, though, because each browser has its own slightly different strategy for restoring scroll positions when navigating back to a previously loaded page: some wait for the whole page to finish loading, some wait an extra bit to account for dynamically loaded data. To implement a cross-browser solution, we took the matter into our own hands. We give each infinite scrolling list a unique ID and persist the item-height cache and anchor candidates under it. When the user navigates back from another screen, we use that information to re-initialize the component and re-render the screen exactly as you left it. We take advantage of the scrollRestoration attribute of the history object to take over restoration whenever it is available, and compensate accordingly when manual takeover is not possible.
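
The feature-detection part is roughly this:

// Opt out of the browser's automatic restoration when the History API supports it,
// so the list can restore its position from the persisted cache instead.
if ('scrollRestoration' in window.history) {
  window.history.scrollRestoration = 'manual';
}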

Onwards

Being central to our performance, this remains a critical component that we revisit from time to time. It has a new name, VirtualScroller, too. We have taken on refactoring and performance tuning (minimizing layout thrashing, optimizing for browser schedulers, etc.), largely thanks to Marius, Paul, the Google Chrome team (especially the article "Complexities of an Infinite Scroller", which informed our improvement plan) and the Microsoft Edge team.

Open Sesame. Sesame is a smart door lock from CandyHouse. It uses Bluetooth Low Energy to communicate wirelessly with smartphone apps. We are going to explain its BLE protocol and how we can write a script to control it. The protocol was reverse engineered from its Android app. This is not full protocol documentation; I only reversed just enough to lock and unlock the door.

BLE Services

The device exposes two BLE services that we can discover:

  • Operations: Normal operation service 00001523-1212-EFDE-1523-785FEABCD123
  • DFU: Device Firmware Upgrade service

The DFU service is for upgrading firmware, while the Operations service is where all the fun happens: it exposes a few characteristics that we can use to read and write data.

  • Command: 00001524-1212-EFDE-1523-785FEABCD123
  • Status: 00001526-1212-EFDE-1523-785FEABCD123

The packet format

Before we can send anything to the lock, we need to first understand the format of its packet.

Field: HMAC | macData | md5(userID) | S/N | OP | payload
Bytes: 32   | 6       | 16          | 4   | 1  | optional

Where:

  • macData is the manufacturerData you can read from the BLE advertisement packet: it is the Sesame's MAC address with 3 bytes of zeroes prepending it, which you will need to strip
  • userID is your email address
  • S/N is a number read from the Status characteristic
  • OP is a number indicating operations: LOCK = 1, UNLOCK = 2
  • payload is not needed for locking and unlocking

The HMAC

HMAC is a standard way to verify the authenticity of a message; Sesame uses SHA-256 as the hash function. The password can be a bit hard to extract. I believe (meaning I traced the reversed code, but I have not verified whether my assumption is correct) it comes from a password that can be retrieved by logging into their XMPP server and chatting with the server for the user profile; however, it then needs to be decrypted with a key hard-coded in the app. I was too lazy to go through all that, so I wrote an Xposed module to extract it from the app: I hooked the SecretKeySpec constructor and waited for it to be initialized with the HMAC password.

Reading the Status

The serialNumber is an incrementing, rollover counter that we need to read from the device and include in the packet. It is located at bytes 6 ~ 10 of the response from the Status characteristic, and you need to add one to it before you use it. Byte 14 is somewhat interesting as well: it is the error code for the last command. You can find a list of error codes in the example code.

Wrapping it up

Pun intended. Before you can send out the constructed packet, you need to break it down into a series of 20-byte packets. The first byte is PT, indicating which part of the packet this is, followed by 19 bytes of the original payload.

  • PT indicates a series of packets: 01 for the first one, 02 for the rest and 04 to finalize it

With that ready, you can simply write the wrapped packet to the Command characteristic. It's a write-and-ack endpoint.

Notes & Trivia

  • The reversing was done with apktool, JADX and Android Studio
  • Interestingly, the Sesame app uses the XMPP protocol to talk to its cloud counterpart
  • Early versions of the app contain a lot of Taiwanese internet memes. #TaiwanNo1 #56不能亡

Example Code

Here's an example snippet for unlocking a Sesame. Have fun!

const crypto = require('crypto');
const noble = require('noble');

const userId = 'REDACTED';
const deviceId = 'REDACTED';
const password = 'REDACTED';

const CODE_UNLOCK = 2;
const serviceOperationUuid = '000015231212efde1523785feabcd123';
const characteristicCommandUuid = '000015241212efde1523785feabcd123';
const characteristicStatusUuid = '000015261212efde1523785feabcd123';

console.log('==> waiting on adapter state change');

noble.on('stateChange', (state) => {
  console.log('==> adapter state change', state);
  if (state === 'poweredOn') {
    console.log('==> start scanning', [serviceOperationUuid]);
    noble.startScanning();
  } else {
    noble.stopScanning();
  }
});

noble.on('discover', (peripheral) => {
  if (peripheral.id !== deviceId) {
    console.log('peripheral discovered; id mismatch:', peripheral.id);
  } else {
    noble.stopScanning();
    connect(peripheral);
  }
});

function connect(peripheral) {
  console.log('==> connecting to', peripheral.id);
  peripheral.connect((error) => {
    if (error) {
      console.log('==> Failed to connect:', error);
    } else {
      console.log('==> connected');
      discoverService(peripheral);
    }
  });
}

function discoverService(peripheral) {
  console.log('==> discovering services');
  peripheral.once('servicesDiscover', (services) => {
    const opServices = services.filter((s) => s.uuid === serviceOperationUuid);
    if (opServices.length !== 1) {
      throw new Error('unexpected number of operation services');
    }

    discoverCharacteristic(peripheral, opServices[0]);
  });
  peripheral.discoverServices();
}

function discoverCharacteristic(peripheral, opService) {
  console.log('==> discovering characteristics');
  opService.once('characteristicsDiscover', (characteristics) => {
    const charStatus = characteristics.filter((c) => c.uuid === characteristicStatusUuid);
    const charCmd = characteristics.filter((c) => c.uuid === characteristicCommandUuid);

    if (charStatus.length !== 1 || charCmd.length !== 1) {
      throw new Error('unexpected number of command/status characteristics');
    }

    unlock(peripheral, charStatus[0], charCmd[0]);
  });
  opService.discoverCharacteristics();
}

function unlock(peripheral, charStatus, charCmd) {
  console.log('==> reading serial number');
  charStatus.on('data', (data) => {
    const sn = data.slice(6, 10).readUInt32LE(0) + 1;
    const err = data.slice(14).readUInt8();
    const errMsg = [
      "Timeout",
      "Unsupported",
      "Success",
      "Operating",
      "ErrorDeviceMac",
      "ErrorUserId",
      "ErrorNumber",
      "ErrorSignature",
      "ErrorLevel",
      "ErrorPermission",
      "ErrorLength",
      "ErrorUnknownCmd",
      "ErrorBusy",
      "ErrorEncryption",
      "ErrorFormat",
      "ErrorBattery",
      "ErrorNotSend"
    ];
    console.log('status update [sn=' + sn + ', err=' + errMsg[err + 1] + ']');
  });
  charStatus.subscribe();
  charStatus.read((error, data) => {
    if (error) { console.log(error); process.exit(-1); }
    if (data) {
      const macData = peripheral.advertisement.manufacturerData;
      const sn = data.slice(6, 10).readUInt32LE(0) + 1;
      const payload = _sign(CODE_UNLOCK, '', password, macData.slice(3), userId, sn);
      console.log('==> unlocking', sn);
      write(charCmd, payload);
      setTimeout(() => process.exit(0), 500);
    }
  });
}

function _sign(code, payload, password, macData, userId, nonce) {
  const hmac = crypto.createHmac('sha256', Buffer.from(password, 'hex'));
  const hash = crypto.createHash('md5');
  hash.update(userId);
  const buf = Buffer.alloc(payload.length + 59);
  macData.copy(buf, 32); /* len = 6 */
  hash.digest().copy(buf, 38); /* len = 16 */
  buf.writeUInt32LE(nonce, 54); /* len = 4 */
  buf.writeUInt8(code, 58); /* len = 1 */
  Buffer.from(payload).copy(buf, 59);
  hmac.update(buf.slice(32));
  hmac.digest().copy(buf, 0);
  return buf;
}

function write(char, payload) {
  const writes = [];
  for(let i=0;i<payload.length;i+=19) {
    const sz = Math.min(payload.length - i, 19);
    const buf = Buffer.alloc(sz + 1);
    if (sz < 19) {
      buf.writeUInt8(4, 0);
    } else if (i === 0) {
      buf.writeUInt8(1, 0);
    } else {
      buf.writeUInt8(2, 0);
    }

    payload.copy(buf, 1, i, i + 19);
    console.log('<== writing:', buf.toString('hex').toUpperCase());
    char.write(buf, false);
  }
}

Zipkin is Twitter's open-source implementation of Google's distributed tracing system, Dapper. It's a great tool for people who want to understand the bottlenecks in their multi-service systems. The only downside is that I found its documentation isn't quite clear about the tracing format, so I decided to write a blog post that gives an overview of the system and its communication protocol.

Before we continue, I would suggest taking a glance at the Dapper paper. It gives you some background knowledge and the assumptions behind Zipkin. I will try to include the relevant points in this post, but you may find it easier if you read the paper first.

Overview

Zipkin splits the roles of a tracing system into four parts: a collector, a query service, a database and a web interface. Zipkin is a passive tracing system, which means the applications are responsible for sending tracing information to Zipkin. Zipkin itself does not actively listen to traffic on the network, nor does it try to ping the applications for statistics.

Architecture

Zipkin Architecture

A typical Zipkin deployment looks like the figure above. The recommended database is Cassandra. The protocol between the applications and the Zipkin collector is Zipkin/Scribe/Thrift (read: Zipkin on Scribe on Thrift). If you want scalability, the Zipkin project recommends setting up a full Scribe environment: you can run multiple copies of the Zipkin collector and configure your server-local Scribe receiver to route Zipkin messages to the cluster for load balancing. For testing or low-workload environments, you can point your application directly at the Zipkin collector, as it speaks the Scribe protocol as well.

Tracing

Zipkin treats every user-initiated request as a trace. Each trace contains several spans, and each span corresponds to an RPC call. In each span, you can have several annotations. There are four annotations a span must have in order to construct a full view of an RPC call (in chronological order): cs, sr, ss, cr, where c stands for client, s stands for server, and the trailing s and r stand for send and receive. Note that these annotations do not all have to be present in a single span. You can send Zipkin two spans with the same spanId carrying (cs, cr) and (sr, ss) respectively; this is a useful property since it lets you log on the client and the server separately. Each of those annotations also has a timestamp denoting when the event happened and a host denoting on which host it happened.

If you take a look at Zipkin's Thrift definition, you will also see that a span carries a list of binary annotations. These are a special kind of annotation that lets you tag request-specific information onto the trace, for example the HTTP request URI, the SQL query or the HTTP response code.

ID propagation

In the last section, we talked about traces and spans. Each trace is identified by a globally unique traceId. Each span is identified by the traceId it belongs to and an in-trace unique spanId. You may also specify a parentSpanId to represent another RPC call made during the parent span's duration. The spans should form an acyclic tree structure.

Now think about how a server handles a request. Let's say we have an nginx server as the frontend, an application server and a database server. When nginx gets a request, it needs to generate a traceId and two spans. The first span denotes the user requesting nginx; it will have spanId = traceId and parentSpanId = 0, the convention for root spans. The second span is generated when nginx initiates the connection to the upstream. It gets a new spanId, has its parentSpanId set to the first span's id, and reuses the same traceId.

nginx then needs to pass the second span's traceId and spanId to the upstream. Fortunately, there's a convention for HTTP: Zipkin uses HTTP headers to carry that information. nginx needs to set X-B3-TraceId, X-B3-SpanId and X-B3-ParentSpanId for the upstream to pick up, and the same process goes on for each layer. If you're using other protocols, you might need to come up with your own creative solution. For example, you may use a SQL comment to carry those ids in database queries.
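
Using the example trace below (traceId 1, root span 1, child span 2), the request nginx forwards to the upstream would carry headers along these lines (IDs are hex-encoded; the host name is made up):

GET /book/1990 HTTP/1.1
Host: upstream.example.com
X-B3-TraceId: 0000000000000001
X-B3-SpanId: 0000000000000002
X-B3-ParentSpanId: 0000000000000001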

In Practice

You should have enough knowledge of Zipkin to get started by now. Let's see how these things are used in code.

Setup

Before we dig into code, you need to deploy Zipkin first. You can download the Zipkin code and set it up yourself. To make things easier, I packaged Zipkin into Docker images, enabling one-liner deployment. Check out docker-zipkin if you're interested.

Communications

We've talked about how Zipkin processes traces and spans. Here we will use an example to show you what gets transferred to the collector under the hood. We will reuse the earlier example: an nginx frontend and an application server. Note that you will not see any Trace object below, since a trace is a virtual entity that only exists as the traceId in spans. In the following examples, we use JSON to denote the objects since it's easier to write. Also, in real-world Zipkin communication, spans are encapsulated in Scribe messages, which look like this:

{ category: "Zipkin", message: <Span-serialized-as-Binary-Thrift-Struct> }

When a user's request hits nginx, nginx sends a span to the collector.

{
  traceId: 1,      // randomly generated globally unique ID
  spanId:  1,      // root span shares spanId with traceId
  parentSpanId: 0, // root span does not have a parent
  name: "GET",     // RPC method name
  annotations: [
    {
      timestamp: "10", // a UNIX timestamp in **milliseconds**
      value: "sr",
      host: {
        ipv4: 0xC0A80101, // IP address, but as an Integer
        port: 80,
        service_name: "nginx"
      }
    }
  ],
  binaryAnnotations: [ // It's optional, useful for store metadata.
    {
      key: "http.uri",
      value: "/book/1990", // would be store as byte[]
      annotationType: "String",
      host: {
        ipv4: 0xC0A80101,
        port: 80,
        service_name: "nginx"
      }
    }
  ]
}

nginx then figures out that it needs to contact the upstream application server to serve the content. Before it initiates the connection, it sends another span to the collector.

{
  traceId: 1,       // all spans in this request shares the same traceid
  spanId:  2,       // note that a new ID is being used
  parentSpanId: 1,  // the user <-> nginx span is now our parent
  name: "GET Book", // RPC method name
  annotations: [
    {
      timestamp: "12",
      value: "cs",
      host: {
        ipv4: 0xC0A80101,
        port: 80,
        service_name: "nginx"
      }
    }
  ],
  binaryAnnotations: []
}

The application server receives the request and, just like nginx, sends a server receive span to the collector.

{
  traceId: 1,
  spanId:  2,
  parentSpanId: 1,
  name: "GET Book",
  annotations: [
    {
      timestamp: "14",
      value: "sr",
      host: {
        ipv4: 0xC0A80102,
        port: 3000,
        service_name: "thin"
      }
    }
  ],
  binaryAnnotations: []
}

After the request has been processed, the application server sends server send to the collector.

{
  traceId: 1,
  spanId:  2,
  parentSpanId: 1,
  name: "GET Book",
  annotations: [
    {
      timestamp: "18",
      value: "ss",
      host: {
        ipv4: 0xC0A80102,
        port: 3000,
        service_name: "thin"
      }
    }
  ],
  binaryAnnotations: []
}

nginx now receives the response from the upstream and sends a cr to the collector. It also sends an ss before it proxies the response back to the user.

// client receive from upstream
{
  traceId: 1,
  spanId:  2,
  parentSpanId: 1,
  name: "GET Book",
  annotations: [
    {
      timestamp: "20",
      value: "cr",
      host: {
        ipv4: 0xC0A80101,
        port: 80,
        service_name: "nginx"
      }
    }
  ],
  binaryAnnotations: []
}

// server send to the user
{
  traceId: 1,
  spanId:  1,
  parentSpanId: 0,
  name: "/book/1990",
  annotations: [
    {
      timestamp: "21",
      value: "ss",
      host: {
        ipv4: 0xC0A80101,
        port: 80,
        service_name: "nginx"
      }
    }
  ],
  binaryAnnotations: [
    {
      key: "http.responseCode",
      value: "200",
      annotationType: "int16",
      host: {
        ipv4: 0xC0A80101,
        port: 80,
        service_name: "nginx"
      }
    }
  ]
}

Send trace to Zipkin

Scala

Let's first talk about Zipkin's native language: Scala. The Zipkin project publishes a client library based on Scrooge and Finagle. To use the library, you will need the following dependencies (shown in Gradle script format).

repositories {
  mavenCentral()
  maven { url "http://maven.twttr.com/" }
}

dependencies {
  compile 'org.apache.thrift:libthrift:0.9.1'
  compile 'com.twitter:finagle-core_2.10:6.12.1'
  compile 'com.twitter:zipkin-scrooge:1.0.0'
  compile 'com.twitter:util-core_2.10:6.12.1'
  compile 'com.twitter:util-app_2.10:6.12.1'
}

For the code example, Twitter already has a great example on GitHub. Please check out zipkin-test.

Java

For Java, I would not recommend using the Finagle Java support just yet (or maybe I'm too dumb to figure it out :( ). Fortunately, there is a Zipkin implementation in Java called Brave. The dependencies you're looking for are listed below.

repositories {
  mavenCentral()
  maven { url "http://maven.twttr.com/" }
}

dependencies {
  compile 'com.github.kristofa:brave-impl:2.1.1'
  compile 'com.github.kristofa:brave-zipkin-spancollector:2.1.1'
}

Brave provides an awesome ZipkinSpanCollector class which automagically handles queueing and threading for you.

Conclusion

Phew, we can finally conclude this long blog post. These are basically the places where I got lost when I tried to understand Zipkin and to extend other services, such as nginx and MySQL, to report traces back to it. I hope these notes help you get hands-on with Zipkin faster. Zipkin actually has more features than we covered here, so please take a look at the doc directory too. Have fun!