http://www.culturomics.org/
On December 16th, 2010, a team spanning the Cultural Observatory, Harvard, Encyclopaedia Britannica, the American Heritage Dictionary, and Google published a paper describing the Culturomics approach online in the journal Science, and at the same time launched the world's first real-time culturomic browser on Google Labs.
On December 16, 2010, the world's first real-time culturomics browser went online.
For the related Science coverage and the Research Article, see the attachments.
A few excerpts follow:
The resulting corpus contains over 500 billion words, in English (361 billion), French (45 billion), Spanish (45 billion), German (37 billion), Chinese (13 billion), Russian (35 billion), and Hebrew (2 billion). The oldest works were published in the 1500s. The early decades are represented by only a few books per year, comprising several hundred thousand words. By 1800, the corpus grows to 60 million words per year; by 1900, 1.4 billion; and by 2000, 8 billion.
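As a quick sanity check on the "over 500 billion" figure, the per-language counts quoted above can simply be added up. This is only a sketch using the rounded numbers from the excerpt; the variable names are made up for illustration.

```python
# Word counts per language, in billions, copied from the excerpt above.
words_by_language = {
    "English": 361,
    "French": 45,
    "Spanish": 45,
    "German": 37,
    "Chinese": 13,
    "Russian": 35,
    "Hebrew": 2,
}

# Total size of the corpus in billions of words.
total_billions = sum(words_by_language.values())

# Fraction of the corpus contributed by each language.
shares = {lang: count / total_billions for lang, count in words_by_language.items()}

print(f"total: {total_billions} billion words")
for lang, share in shares.items():
    print(f"{lang}: {share:.1%}")
```

With these rounded figures, English makes up roughly two thirds of the corpus.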
The corpus cannot be read by a human. If you tried to read only the entries from the year 2000 alone, at the reasonable pace of 200 words/minute, without interruptions for food or sleep, it would take eighty years. The sequence of letters is one thousand times longer than the human genome: if you wrote it out in a straight line, it would reach to the moon and back 10 times over [8].
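The "eighty years" estimate can be roughly reproduced with a few lines of arithmetic. The sketch below assumes the 8 billion words per year quoted for 2000 in the previous excerpt is the figure the estimate refers to, and that reading goes on around the clock; the function and constant names are hypothetical.

```python
# Assumption: the year-2000 slice is the 8 billion words quoted in the excerpt.
WORDS_IN_2000 = 8_000_000_000
WORDS_PER_MINUTE = 200                # reading pace given in the excerpt
MINUTES_PER_YEAR = 60 * 24 * 365.25   # nonstop reading, no food or sleep


def reading_years(words: int, words_per_minute: int = WORDS_PER_MINUTE) -> float:
    """Years needed to read `words` at a constant pace without any breaks."""
    minutes = words / words_per_minute
    return minutes / MINUTES_PER_YEAR


years = reading_years(WORDS_IN_2000)
print(f"about {years:.0f} years")
```

Under these assumptions the result comes out a little under eighty years, which is consistent with the rounded figure in the excerpt.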