Matsushita Laboratory for Language Learning (Tatsuhiko Matsushita Laboratory), University of Tokyo.

Vocabulary Database for Reading Japanese (VDRJ)

Vocabulary Database for Reading Japanese (VDRJ) Ver. 1.1

If you have the “TM Word List”, please replace it with VDRJ, as many corrections and additions have been made.

Please download the database that meets your needs.

Vocabulary Database for Reading Japanese (for Research) Ver. 1.1

Ver. 1.1 adds the skewness and kurtosis of the sub-frequency distribution, the sub-frequency rankings, and their mean to Ver. 1.0; no other data corrections have been made.
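
As a rough illustration of what skewness and kurtosis of a sub-frequency distribution mean, the sketch below computes both for one word's frequencies across sub-corpora. It assumes the population (biased) moment definitions and excess kurtosis; the page does not say which definitions VDRJ uses, and the function names and sample values are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment, using the population (divide-by-n) definition."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def skewness(values):
    """Population skewness m3 / m2**1.5 (assumed definition, not confirmed for VDRJ)."""
    m2 = central_moment(values, 2)
    if m2 == 0:
        return 0.0  # convention chosen here for a perfectly flat distribution
    return central_moment(values, 3) / m2 ** 1.5


def excess_kurtosis(values):
    """Population excess kurtosis m4 / m2**2 - 3 (assumed definition)."""
    m2 = central_moment(values, 2)
    if m2 == 0:
        return 0.0  # same convention as above
    return central_moment(values, 4) / m2 ** 2 - 3.0


# Hypothetical sub-frequencies of one word in 10 sub-corpora.
concentrated = [100, 0, 0, 0, 0, 0, 0, 0, 0, 0]
symmetric = [1, 2, 3, 4, 5]

print(skewness(concentrated), excess_kurtosis(concentrated))
print(skewness(symmetric), excess_kurtosis(symmetric))
```

A word that occurs almost entirely in one sub-corpus gives a strongly positive skewness, while a symmetric spread gives a skewness of zero.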

The database, which contains 141,950 types, is provided as three separate files. The file extension is .xlsx.
Please open the files with Excel 2007 or a later version. An older version may require a converter.

VDRJ is made up of three databases: Top 60894, Assumed Known Words and Narrowly Ranging Words.

Please download the databases one by one from the links below.

  • VDRJ (Top 60894): vocabulary ordered by importance, ranks 00001-60894 (42MB)
    Download
  • VDRJ (Assumed Known Words: Proper Nouns, Fillers, Signs etc.) (16MB)
    Download
  • VDRJ (Narrowly Ranging Words) (24MB)
    Download

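Statistics etc.
Paper from the proceedings of the 2010 Spring Conference of the Society for Teaching Japanese as a Foreign Language
= A brief introduction to the “TM Word List” (the predecessor of the database).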

Vocabulary Database for Reading Japanese (for Teachers) Ver. 1.0

Download

This database is a simplified version of VDRJ for Research (Top 60894). It should be sufficient for general educational purposes.
It contains the top 60894 words.

The databases below are simplified versions of VDRJ with easier explanations.
They are more simplified than the database for teachers;
however, the words are ordered not only for reading but also for daily-life use. The vocabulary covers approx. 20,000 words.
There are three types of databases: For General Learners, For International Students, and Basic 2500.
The first two include explanations in simple Japanese.
Basic 2500, the database for beginners, has simple English explanations.

The Vocabulary Database for Learners of Japanese Ver. 1.0 (for General Learners)

Download

The Vocabulary Database for Learners of Japanese Ver. 1.0 (for International Students)

Download

The Vocabulary Database for Learners of Japanese: Basic 2500, Ver. 1.0

Download

Features of VDRJ:

The newest word frequency list, made from a book corpus (approx. 28 million tokens) and an internet forum corpus (5 million tokens).
--There have been some lists made from magazine or newspaper corpora, but no list from a large book corpus.
--A list made from a book corpus is better than lists made from magazine or newspaper corpora in terms of the generality of the word-origin distribution and the stability of the words.
--The shortcomings of the book corpus are compensated for by the internet forum vocabulary.

Dispersion is calculated from the sub-frequencies in 10 sub-corpora.
Words are ranked by the usage coefficient, which is the product of frequency and dispersion.
--By taking dispersion into account, unevenly distributed words are excluded from the high-frequency band.
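
The idea of ranking by frequency times dispersion can be sketched as follows. This is a minimal illustration that assumes Juilland's D as the dispersion measure and equally sized sub-corpora; the page does not state which dispersion formula VDRJ uses, so the choice of D, the function names and the sample counts are all assumptions.

```python
import math


def juilland_d(sub_freqs):
    """Juilland's D: 1 - CV / sqrt(n - 1), where CV is the coefficient of
    variation of the sub-frequencies (population standard deviation / mean).
    Assumes sub-corpora of equal size; this measure is an assumption here."""
    n = len(sub_freqs)
    mean = sum(sub_freqs) / n
    if mean == 0:
        return 0.0  # a word that never occurs has no meaningful dispersion
    variance = sum((f - mean) ** 2 for f in sub_freqs) / n
    cv = math.sqrt(variance) / mean
    return 1 - cv / math.sqrt(n - 1)


def usage_coefficient(sub_freqs):
    """Usage coefficient U = total frequency * dispersion."""
    return sum(sub_freqs) * juilland_d(sub_freqs)


# Two hypothetical words with the same total frequency in 10 sub-corpora.
evenly_spread = [10] * 10
concentrated = [100] + [0] * 9

print(usage_coefficient(evenly_spread))
print(usage_coefficient(concentrated))
```

Both words occur 100 times in total, but the word confined to a single sub-corpus gets a dispersion of zero and therefore drops out of the high-frequency band, while the evenly spread word keeps its full frequency.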

--By reordering words with different weightings applied to the sub-frequencies,
new vocabulary ranking indices for general learners and international students have been developed.
Together with the word ranking for written Japanese, three types of ranking are available.
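
How different weightings of sub-frequencies can produce different rankings is sketched below. The sub-corpus names, weights and counts are hypothetical and chosen only for illustration; the actual weights used in VDRJ are not given on this page.

```python
def weighted_frequency(sub_freqs, weights):
    """Sum of sub-frequencies, each multiplied by the weight of its sub-corpus."""
    return sum(weights[name] * freq for name, freq in sub_freqs.items())


def rank_words(table, weights):
    """Return the words sorted from highest to lowest weighted frequency."""
    scores = {word: weighted_frequency(subs, weights) for word, subs in table.items()}
    return sorted(scores, key=lambda word: scores[word], reverse=True)


# Hypothetical sub-frequencies per word and sub-corpus.
table = {
    "時間": {"academic": 30, "literary": 30, "forum": 30},
    "仮説": {"academic": 50, "literary": 5, "forum": 5},
    "やばい": {"academic": 0, "literary": 10, "forum": 40},
}

# Hypothetical weightings: academic text counts more for international
# students, everyday text counts more for general learners.
student_weights = {"academic": 2.0, "literary": 1.0, "forum": 1.0}
general_weights = {"academic": 1.0, "literary": 2.0, "forum": 2.0}

student_ranking = rank_words(table, student_weights)
general_ranking = rank_words(table, general_weights)
print(student_ranking)
print(general_ranking)
```

With these made-up numbers, the academic word 仮説 ranks above the colloquial word やばい for international students, and the order is reversed for general learners.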

--Domain-specific words in the four science domains can be extracted,
as indices and levels for the domain-specificity of academic texts have been added to the database.

--Possible literary words are extracted.

Character Database of Modern Japanese (CDJ)

Character Database of Modern Japanese (CDJ) Version 2.0

Released on January 2, 2014
松下達彦© Tatsuhiko Matsushita

If you have Ver. 1, please update to Ver. 2.

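Main changes from Ver. 1.0
・Reflected the 2010 revision of the Jōyō Kanji list, and added the Jinmeiyō Kanji (kanji for use in personal names).
・Added information on characters used characteristically in academic texts (humanities and arts, social sciences, science and engineering, biology and medicine) and in literary texts.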

Please download the database(s) that meet your needs.

CDJ for Research

All information is contained.
Download

CDJ for Teachers

This should be sufficient for general use.
Download

The items below are under construction.

CDJ for International Students

This contains only the information that learners will need.
The frequencies in academic domains are weighted when calculating the rankings.

CDJ for General Learners

This contains only the information that learners will need.
The frequencies in everyday domains (i.e. literary works and internet forum sites) are weighted when calculating the rankings.

CDJ Basic 450

This contains only information on Hiragana, Katakana, the Roman alphabet and the 450 most basic Kanji that elementary learners should learn.

Database of Japanese Kanji Vocabulary in Contrast to Chinese

Database of Japanese Kanji Vocabulary in Contrast to Chinese (JKVC)

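Version 1.00 (November 26, 2017)
The descriptions in this database still contain incomplete points and errors; please use it with this in mind.
We plan to keep correcting the data and to update the information as needed. Before using the database, please check the version and use the latest one.
Download
(Proceedings paper, 2017 Autumn Conference of the Society for Teaching Japanese as a Foreign Language) 松下・陳・王・陳 (2017), "Development and Application of the Database of Japanese Kanji Vocabulary in Contrast to Chinese"
(Presentation poster, 2017 Autumn Conference of the Society for Teaching Japanese as a Foreign Language) 松下・陳・王・陳 (2017), "Development and Application of the Database of Japanese Kanji Vocabulary in Contrast to Chinese"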

Database of Japanese-Chinese Homographic Kanji Words
