Run the following command to automatically install all required dependencies.
sh setup.sh
Create a file called .env in the root directory of this repository. Reach out to get the contents of this file.
Use the template file metadata_template.json to fill out the metadata for your dataset.
The name of your data that all users with access will see. Maximum 30 characters.
A paragraph description of your data. Maximum 200 characters.
The original source of the data. Not required, maximum 50 characters.
The date this dataset was created or collected. Not required.
Set to true to allow other users to access the data or set to false so that only you have access.
Option for processing words in the text column(s) of your data. Valid options are "none", "stem", and "lemma". Part-of-speech tagging is only available if "lemma" is selected.
Enable/disable word embeddings for this dataset. Set to true to enable or false to disable.
Grouping column for computing embeddings. Ignored if embeddings are not enabled. THIS WILL CAUSE AN ERROR IF IT DOES NOT MATCH A VALID COLUMN NAME.
Language the text was written in. Supported languages are listed below.
- Chinese
- English
- French
- German
- Greek
- Italian
- Latin
- Portuguese
- Russian
- Spanish
If the language you are looking for is not currently supported, reach out to ask whether it can be added.
List of tags to help users find your dataset when filtering.
List of text columns that will be text mined.
Session token which stores your user data. This allows our server to verify your account and associate this dataset with your account. This token can be found by following these steps:
- Login to Democracy Viewer.
- On any page, Right Click -> Inspect.
- Click on the Application tab.
- Under Storage, open Local Storage and click the URL for Democracy Viewer.
- Click the row with the key democracy-viewer.
- Open user. The text inside the quotes after token is what you need. Make sure to copy everything inside the quotes but not the quotes themselves.
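The constraints described for the metadata fields can be checked before the processing job is started. The following is a minimal sketch, not part of this repository. The key names (`title`, `description`, `source`, `preprocessing_type`, `embeddings`, `embed_col`, `language`, `text`) are hypothetical placeholders, so substitute the keys that actually appear in metadata_template.json. It also assumes that language names are matched case-insensitively and that every text column must exist in the data file.

```python
# Key names are hypothetical placeholders; use the keys from metadata_template.json.
MAX_LENGTHS = {"title": 30, "description": 200, "source": 50}
VALID_PREPROCESSING = {"none", "stem", "lemma"}
SUPPORTED_LANGUAGES = {
    "chinese", "english", "french", "german", "greek",
    "italian", "latin", "portuguese", "russian", "spanish",
}


def check_metadata(metadata, columns):
    """Return a list of problems found in a metadata dict.

    `columns` is the list of column names in the data file.
    An empty list means no problems were found.
    """
    problems = []

    # Length limits apply only to fields that are present.
    for key, limit in MAX_LENGTHS.items():
        value = metadata.get(key)
        if value is not None and len(value) > limit:
            problems.append(f"{key} is longer than {limit} characters")

    if metadata.get("preprocessing_type") not in VALID_PREPROCESSING:
        problems.append('preprocessing_type must be "none", "stem", or "lemma"')

    # Assumption: language names are compared case-insensitively.
    if str(metadata.get("language")).lower() not in SUPPORTED_LANGUAGES:
        problems.append("language is not in the supported list")

    # The grouping column only matters when embeddings are enabled.
    embed_col = metadata.get("embed_col")
    if metadata.get("embeddings") and embed_col is not None:
        if embed_col not in columns:
            problems.append(f"grouping column {embed_col!r} is not a column in the data")

    # Assumption: every listed text column must exist in the data file.
    for column in metadata.get("text", []):
        if column not in columns:
            problems.append(f"text column {column!r} is not a column in the data")

    return problems
```

Calling `check_metadata(metadata, columns)` with the parsed metadata file and the column names of your dataset returns the problems as plain strings, so they can be printed and fixed before the pipeline is run.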
Run the following command to start the processing job.
python run_pipeline.py [data_file] [metadata_file]
Replace [data_file] with the path to your dataset file and [metadata_file] with the path to the metadata file created in step 3.
After running this command, status updates will occasionally be printed to the console. When processing is complete, an email will be sent to the address associated with your Democracy Viewer account. Depending on your dataset size and processing power, this could take a long time to run.
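Because the job can run for a long time, it may be worth confirming that both paths exist before launching it. The sketch below is one way to do that from Python. The helper names `build_command` and `run_pipeline` are hypothetical and not part of this repository, and the only thing assumed about run_pipeline.py is the two positional arguments shown in the command above.

```python
import subprocess
import sys
from pathlib import Path


def build_command(data_file, metadata_file):
    """Build the argument list for the pipeline command, checking both paths first."""
    for path in (data_file, metadata_file):
        if not Path(path).is_file():
            raise FileNotFoundError(f"No such file: {path}")
    # sys.executable is the Python interpreter currently running this script.
    return [sys.executable, "run_pipeline.py", str(data_file), str(metadata_file)]


def run_pipeline(data_file, metadata_file):
    """Run the processing job from the repository root; raises if it exits with an error."""
    subprocess.run(build_command(data_file, metadata_file), check=True)
```

For example, `run_pipeline("data.csv", "metadata.json")` (both file names are placeholders) fails immediately with a FileNotFoundError if either path is wrong, instead of failing after the job has started.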