A computational linguistics and network science project that explores lexical relationships between Burushaski, a language isolate spoken in northern Pakistan, and over 50 major world languages. It uses graph theory and centrality analysis to examine the structural and lexical connections within the resulting multilingual network.
- 🌍 Project Overview
- ✨ Key Features
- ⚙️ Installation
- 🚀 Usage
- 🧠 Methodology
- 📊 Data Sources
- 📈 Visualization Examples
- 🤝 Contributing
- 📬 Contact
- 📝 License
The Burushaski Words Network is a bipartite graph model that:
- Connects Burushaski words with 50 global languages based on shared characters, roots, or phonetic elements.
- Uses advanced centrality metrics to evaluate:
  - Key lexical bridges between Burushaski and other languages.
  - Languages with the strongest linguistic ties to Burushaski.
  - Underlying topological properties of the multilingual network.
This project blends linguistic insight with network analysis, creating a valuable resource for researchers in natural language processing (NLP), comparative linguistics, and language preservation.
- Automated extraction and merging of lexical data.
- Bipartite and projected graph generation using NetworkX.
- Fully compatible with Jupyter/Colab and local environments.
- Implements five core centrality measures:
  - Degree Centrality
  - Betweenness Centrality
  - Closeness Centrality
  - Eigenvector Centrality
  - PageRank Centrality
- Centrality scores are aggregated for global and language-specific insights.
- Enables identification of high-impact words and languages.
- Interactive network graphs for structural exploration.
- Centrality distribution plots to compare word influence.
- Heatmaps to visualize cross-language similarities.
- Custom graph layouts: spring, circular, Kamada-Kawai, etc.
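A cross-language heatmap needs a similarity matrix as input. The sketch below shows one way such a matrix could be computed, assuming Jaccard similarity over character sets; the project may use a different measure. The names `jaccard` and `similarity_matrix` are hypothetical, and the alphabets are placeholder letter sets, not real language data.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(alphabets):
    """Return sorted language names and a square matrix of pairwise similarities."""
    names = sorted(alphabets)
    matrix = [
        [jaccard(alphabets[row], alphabets[col]) for col in names]
        for row in names
    ]
    return names, matrix


# Placeholder character sets for hypothetical languages (NOT real alphabets).
alphabets = {
    "lang_1": set("abcde"),
    "lang_2": set("cdxyz"),
    "lang_3": set("abcde"),
}

names, matrix = similarity_matrix(alphabets)
for name, row in zip(names, matrix):
    print(name, [round(value, 2) for value in row])
```

The resulting list of lists can be passed to any plotting library that draws heatmaps from a 2D array.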
- Python 3.8 or higher
- pip (Python package installer)
# Clone the repository
git clone https://github.com/sardaralikhamosh/Burushaski_Words_Network.git
cd Burushaski_Words_Network
# Create a virtual environment (recommended)
python -m venv venv
# Activate the environment
source venv/bin/activate # Linux/Mac
venv\Scripts\activate # Windows
# Install required packages
pip install -r requirements.txt

Once installed, run the analysis or visualization scripts as needed. Example notebooks and script files are included and run in both Jupyter and Google Colab environments.
The project uses weighted bipartite graphs where:
- Nodes represent either Burushaski words or target languages.
- Edges indicate shared alphabetic characters or phonetic features.
- Edge weights are determined by common characters or linguistic overlap.
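The graph described above can be sketched with NetworkX roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the edge weight is the number of distinct characters a word shares with a language's alphabet, and the function name `build_word_language_graph`, the word tokens, and the alphabets are all placeholders.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Placeholder tokens, NOT real Burushaski words.
words = ["abc", "bcd", "xyz"]

# Placeholder character sets for hypothetical languages, NOT real alphabets.
alphabets = {
    "lang_1": set("abcde"),
    "lang_2": set("cdxyz"),
}


def build_word_language_graph(words, alphabets):
    """Build a weighted bipartite graph of words (side 0) and languages (side 1)."""
    graph = nx.Graph()
    graph.add_nodes_from(words, bipartite=0)
    graph.add_nodes_from(alphabets, bipartite=1)
    for word in words:
        for language, letters in alphabets.items():
            shared = set(word) & letters
            if shared:
                # Assumed weighting: count of distinct shared characters.
                graph.add_edge(word, language, weight=len(shared))
    return graph


graph = build_word_language_graph(words, alphabets)

# Project onto the language side: two languages are linked if they share
# at least one word, weighted by the number of shared words.
language_projection = bipartite.weighted_projected_graph(graph, list(alphabets))

print(sorted(graph.edges(data="weight")))
print(list(language_projection.edges(data="weight")))
```

Storing the `bipartite` node attribute follows the NetworkX convention for marking the two node sets, which keeps later projections and bipartite-specific algorithms unambiguous.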
Centrality measures are computed on both the full and projected networks to assess word/language importance and connectedness.
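As a rough sketch of this step, the five measures can be computed with standard NetworkX functions and collected per node. The toy graph, the helper names `centrality_table` and `mean_score`, and the choice to aggregate by a simple unweighted mean are all assumptions made for illustration; the project's own aggregation may differ. Note that `nx.pagerank` relies on SciPy in recent NetworkX releases.

```python
import networkx as nx

# Toy bipartite graph; node names and weights are placeholders.
graph = nx.Graph()
graph.add_weighted_edges_from([
    ("word_a", "lang_1", 3),
    ("word_a", "lang_2", 1),
    ("word_b", "lang_1", 3),
    ("word_b", "lang_2", 2),
    ("word_c", "lang_2", 3),
])


def centrality_table(g):
    """Return {node: {measure_name: score}} for the five measures."""
    measures = {
        "degree": nx.degree_centrality(g),
        # Betweenness and closeness are computed on unweighted paths here.
        "betweenness": nx.betweenness_centrality(g),
        "closeness": nx.closeness_centrality(g),
        "eigenvector": nx.eigenvector_centrality(g, max_iter=1000, weight="weight"),
        "pagerank": nx.pagerank(g, weight="weight"),
    }
    return {
        node: {name: scores[node] for name, scores in measures.items()}
        for node in g
    }


def mean_score(table):
    """Assumed aggregation: unweighted mean of the five scores per node."""
    return {node: sum(scores.values()) / len(scores) for node, scores in table.items()}


table = centrality_table(graph)
ranking = sorted(mean_score(table).items(), key=lambda item: item[1], reverse=True)
for node, score in ranking:
    print(f"{node}: {score:.3f}")
```

The same two helpers can be applied unchanged to a projected graph. Because the measures are on different scales, a plain mean is crude; rank-based or normalized aggregation may be preferable in practice.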
Lexical data is aggregated from:
- Publicly available language alphabet sets
- Custom Burushaski word lists
- Ethnologue, Wiktionary, and linguistic corpora (where permitted)
Explore word-language relationships through network visualizations and centrality charts.
Contributions are welcome! If you wish to improve the project, submit pull requests or report issues. Please follow the contribution guidelines in CONTRIBUTING.md.
For queries, feedback, or collaborations:
Sardar Ali Khamosh
📧 Email
🌐 LinkedIn
🐙 GitHub Profile
🌐 Website
This project is open-source and available under the MIT License.
