Saturday, January 21, 2006

Source Transformation

前幾天由 Tim 的來信,得知有個叫 TXL 的東西,好像很神奇,在好奇心趨使下也去一探究竟

TXL is a unique programming language specifically designed to support computer software analysis and source transformation tasks. ...TXL is best at tasks that involve structural analysis and transformation of formal notations such as programming languages, specification languages, structured documents and so on.

這看起來不是跟 Lex/YaccXSL/T 這些軟體的目的很像嗎?應該是同一類的東西吧?但 Learning TXL 的內文第一段卻說:

TXL is a weird and wonderful language, with a new, rich and distinctly different programming paradigm. Once you get your mind around it, it can help you very rapidly achieve real magic. But because it is different, it takes some time to understand. Along the way you will probably mistake it several times for things it isn't (for example, Haskell, Awk, Yacc or XSL/T).

好吧,不是就不是吧,讓我再仔細瞧瞧,看看它葫蘆裡在賣啥藥……

~~

我們可以把程式語言的 Compiler 或 Interpreter 需要做的事情分成兩個任務:

  1. 讀進原始碼,並找出它的結構。
  2. 處理找出來的結構。

傳統的 Lex 和 Yacc 搭配起來剛好可以產生某個程式語言的程式(通常是 C),用來解決第一個任務。要找出原始碼結構,其實有兩件事要做:

  1. 把原始碼切割成一個個的 token 。負責這件子任務的,叫做 scanner 。
  2. 找出原始碼的階層結構。這是 parser 的責任。

而 scanner 及 parser 這兩項子任務,剛好可以分別由 Lex 和 Yacc 來職掌。詳細的用法,可以去 The Lex & Yacc Page 逛逛。裡面有提到 FlexBison ,它們分別是 Lex 和 Yacc 的後代。我也曾經利用它們來練習寫了個精簡版的 C compiler 。如果你嫌 FlexBison 功能太陽春的話,也許可以去試試 Antlr

~~

TXL About 中有提到每個 Txl 程式都包含兩部份:

  1. A Description of the Structures to be Transformed -- Specified as a directly interpreted BNF grammar, in context-free ambiguous form.
  2. A Set of Structural Transformation Rules -- Specified by example as pattern/replacement pairs combined using functional programming.

TXL 程式的第一部份剛好把 scanner 及 parser 的事情做掉了;第二部份把剩下的翻譯工作也完成了。以前做這些事情要分別動用到三套程式語言,現在 TXL 可以一手包辦,一次搞定!

此外,以 Yacc 這類的 parser generator 來說,都會要我們以類似 BNF 的語法,把要處理的語言文法定出來。 Tex 除了跟別家一樣,支援 Context-Free 的文法外,它還允許訂出的文法可以有歧義遞迴的情形發生。所以用它定義語法比傳統的 Yacc 方便,但也因此,理論上,它在 parsing 時,就不會快過 Yacc 這類工具。

更令人激賞的是,在 scanning 及 parsing 後,接下來要做的翻譯任務,由於 TXL 的語法混合了 functional / rule-based 語言的語法,所以可以很優雅的解決:

The TXL programming language is a hybrid functional / rule-based language with unification, implied iteration and deep pattern match.

The TXL programming language is unique in that it is has a pure functional superstructure that provides scoping, abstraction, parameterization and recursion, over Prolog-like structural rewriting rules providing pattern search, unification and implicit iteration.

The formal semantics and implementation of TXL are based on formal tree rewriting, but the trees are largely hidden from the user due to the by-example style of rule specification.

Prolog 這類 functional language 的優點,在此表露無遺!

1 comments:

York said...

最近由 Programming 版,一連串 Compiler 的討論中,知道了 Spirit ,對相關議題有興趣的,也可以一併玩玩看。