
This is a copy of the daily dump of catalogue and ATF data from the Cuneiform Digital Library Initiative (http://cdli.ucla.edu)


CDLI Daily Bulk Data Dump

Last update was August 2022.

The repository contains a daily dump of all public catalogue and text data from the Cuneiform Digital Library Initiative.

Getting the data

Make sure you have the Git Large File Storage extension (git-lfs) installed; see the git-lfs project documentation for instructions. On Ubuntu, for example, you can also use

$> curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
$> sudo apt-get install git-lfs

Clone the repository

$> git clone https://github.com/cdli-gh/data

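Retrieve the Git LFS data (if files in the working tree are still pointer files afterwards, git lfs pull fetches and checks them out in one step):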

$> cd data
$> git lfs fetch
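If git-lfs was not active when the repository was cloned, large files can remain in the working tree as small text pointer files rather than the real data. A minimal sketch for checking this, assuming a POSIX shell; the function name is_lfs_pointer is hypothetical, and the file name in the usage comment is the catalogue file mentioned later in this README:

```shell
# Hypothetical helper: succeeds (exit status 0) if the given file is still
# a Git LFS pointer. Pointer files are small text files whose first line
# names the LFS pointer spec.
is_lfs_pointer() {
    [ "$(head -n 1 "$1" 2>/dev/null)" = "version https://git-lfs.github.com/spec/v1" ]
}

# Usage (run inside the cloned repository):
#   if is_lfs_pointer cdli_catalogue_1of2.csv; then git lfs pull; fi
```

The check only reads the first line, so it stays fast even on the full-size data files.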

Format

Text data

The CDLI transliteration dump is offered in plain-text, UTF-8 encoded ATF format. For more information about ATF, visit:

  http://oracc.museum.upenn.edu/doc/help/editinginatf/cdliatf/index.html (Scroll down for an example).
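As a quick sanity check on the transliteration dump, the sketch below counts the text entries in an ATF file. It relies on the ATF convention that each text begins with a line starting with &P followed by the CDLI P-number; the function name count_atf_texts is hypothetical, and the path passed to it is whichever ATF file is present in your clone.

```shell
# Hypothetical helper: print the number of texts in an ATF file.
# Assumes each text opens with a header line such as "&P123456 = ...".
count_atf_texts() {
    grep -c '^&P[0-9]' "$1"
}

# Usage:
#   count_atf_texts path/to/file.atf
```

Note that grep -c exits with a non-zero status when the count is zero, which matters if the function is called from a script running under set -e.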

Catalogue data

The catalogue is offered in a UTF-8 encoded, comma-separated (CSV) format. Most fields are explained in detail here:

 https://cdli.ucla.edu/?q=cdli-search-information  

Our data schema is currently being remodeled; get in touch if you would like a sneak peek!

To view a sample of the catalogue on a Unix machine, run the head command from the directory where the file is stored:

head cdli_catalogue_1of2.csv
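To see which fields the catalogue contains, the column names can be listed one per line. A rough sketch, assuming that the first line of the file is a header row and that no column name itself contains a comma or a line break; list_csv_columns is a hypothetical helper name:

```shell
# Hypothetical helper: print the column names of a CSV file, one per line.
# Takes the first line, drops carriage returns and double quotes,
# then splits on commas.
list_csv_columns() {
    head -n 1 "$1" | tr -d '\r"' | tr ',' '\n'
}

# Usage:
#   list_csv_columns cdli_catalogue_1of2.csv
```

This only inspects the header. Because quoted CSV fields may contain commas and line breaks, a proper CSV parser is the safer choice for reading the records themselves.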

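With Windows PowerShell, try the following, where filename stands for the name of the file and n for the number of lines to show: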

Get-Content *filename* -Head *n*

EPP [email protected]
