@@ -22,31 +22,33 @@ This roadmap outlines the strategic direction for building a complete, productio
2222By 2027, DataHaskell will provide a high-performance, end-to-end data science toolkit that enables practitioners to build reliable machine learning systems from data ingestion through model deployment.
2323
2424### Core Principles
25- 1 . ** Type Safety First ** : Leverage Haskell's type system to catch errors at compile time
26- 2 . ** Interoperability ** : Seamless integration between ecosystem components
27- 3 . ** Performance ** : Match or exceed Python/R performance benchmarks
28- 4 . ** Ergonomics ** : Intuitive APIs that lower the barrier to entry
29- 5 . ** Production Ready ** : Focus on reliability, monitoring, and deployment
25+ 1 . ** Interoperability ** : Seamless integration between ecosystem components
26+ 2 . ** Performance ** : Match or exceed Python/R performance benchmarks
27+ 3 . ** Ergonomics ** : Intuitive APIs that lower the barrier to entry
28+ 4 . ** Production Ready ** : Focus on reliability, monitoring, and deployment
29+ 5 . ** Type Safety ** : Leverage Haskell's type system (where possible) to catch errors at compile time
3030
3131---
3232
3333## Current State Assessment
3434
35- ### 🟢 Strengths
36- - ** dataframe** (v0.1 launch March 5) : Modern, type-safe dataframe library with IHaskell integration
35+ ### Strengths
36+ - ** dataframe** : Modern dataframe library with IHaskell integration
3737- ** Hasktorch** : Mature deep learning library with PyTorch backend and GPU support
3838- ** distributed-process** : Battle-tested distributed computing framework
39+ - ** IHaskell** : Haskell kernel for Jupyter notebooks
3940- Strong functional programming foundations
4041- Excellent parallelism and concurrency primitives
4142
42- ### 🟡 Gaps to Address
43+ ### Gaps to Address
44+ - No community of maintainers and contributors
4345- Fragmented visualization ecosystem
4446- Limited data I/O format support
4547- Incomplete documentation and tutorials
4648- Sparse integration examples between major libraries
4749- Limited model deployment tooling
4850
49- ### 🔴 Critical Needs
51+ ### Critical Needs
5052- Unified onboarding experience
5153- Comprehensive benchmarking against Python/R
5254- Production deployment patterns
@@ -62,7 +64,7 @@ By 2027, DataHaskell will provide a high-performance, end-to-end data science to
6264** Owner** : dataframe team
6365
6466** Goals** :
65- - ✅ Complete dataframe v0.1 release (March 2026)
67+ - Complete dataframe v1 release (March 2026)
6668- Establish dataframe as the standard tabular data library
6769- Performance parity with Pandas/Polars for common operations
6870
@@ -95,8 +97,8 @@ By 2027, DataHaskell will provide a high-performance, end-to-end data science to
9597
9698** Goals** :
9799- Advanced data manipulation features
98- - Integration with database systems
99- - Time series support
100+ - Computing on files larger than memory
101+ - Integration with cloud database systems
100102
101103** Deliverables** :
1021041 . ** Advanced Operations**
@@ -106,18 +108,13 @@ By 2027, DataHaskell will provide a high-performance, end-to-end data science to
106108 - Complex joins (anti, semi)
107109 - Reshaping operations (melt, cast)
108110
109- 2 . ** Database Connectivity**
111+ 2 . ** Cloud Database Connectivity**
112+ - Read files from cloud storage (AWS/GCP/Azure)
110113 - PostgreSQL integration
111114 - SQLite support
112115 - Query pushdown optimization
113116 - Streaming query results
114117
115- 3 . ** Time Series Extensions**
116- - Date/time indexing
117- - Resampling operations
118- - Time-based rolling windows
119- - Timezone handling
120-
121118---
122119
123120## Pillar 2: Statistical Computing & Visualization
@@ -126,14 +123,13 @@ By 2027, DataHaskell will provide a high-performance, end-to-end data science to
126123** Owner** : Community (needs maintainer)
127124
128125** Goals** :
129- - Establish comprehensive statistics library
126+ - Create a unified machine learning library on top of Hasktorch and statistics
130127- Create unified plotting API
131128
132129** Deliverables** :
133- 1 . ** statistics-next** (modernize existing library)
134- - Descriptive statistics
135- - Hypothesis testing (t-test, ANOVA, chi-square)
136- - Linear regression
130+ 1 . ** statistics**
131+ - Extend hypothesis testing (t-test, ANOVA)
132+ - Simple regression models (linear and logistic)
137133 - Generalized linear models (GLM)
138134 - Survival analysis basics
139135 - Integration with dataframe
@@ -173,7 +169,7 @@ By 2027, DataHaskell will provide a high-performance, end-to-end data science to
173169** Owners** : Hasktorch + dataframe teams
174170
175171** Goals** :
176- - Seamless dataframe → tensor pipeline
172+ - Improve dataframe → tensor pipeline
177173- Example-driven documentation
178174
179175** Deliverables** :
@@ -183,7 +179,7 @@ By 2027, DataHaskell will provide a high-performance, end-to-end data science to
183179 - GPU memory management
184180 - Batch loading utilities
185181
186- 2 . ** ML Workflow Examples**
182+ 2 . ** ML Workflow Examples with the New Unified Library**
187183 - End-to-end classification (Iris, MNIST)
188184 - Regression examples (California Housing)
189185 - Time series forecasting
@@ -297,7 +293,6 @@ By 2027, DataHaskell will provide a high-performance, end-to-end data science to
297293
298294** Deliverables** :
2992951 . ** DataHaskell Website Revamp**
300- - Modern design
301296 - Clear getting started guide
302297 - Library comparison matrix
303298 - Migration guides (from Python, R)
@@ -330,7 +325,7 @@ By 2027, DataHaskell will provide a high-performance, end-to-end data science to
330325 - Example project templates
331326
3323272 . ** IDE Support Improvements**
333- - VSCode extension enhancements
328+ - VSCode IHaskell support, with the DataHaskell stack working out of the box
334329 - HLS integration guides
335330 - Debugging workflows
336331 - IHaskell kernel improvements
@@ -380,41 +375,22 @@ By 2027, DataHaskell will provide a high-performance, end-to-end data science to
380375
381376---
382377
383- ## Integration Priority Matrix
384-
385- ### Critical Integrations (Start Immediately)
386- 1 . ** dataframe ↔ Hasktorch** : Data → Training pipeline
387- 2 . ** dataframe ↔ IHaskell** : Interactive analysis
388- 3 . ** dataframe ↔ statistics** : Analysis workflow
389-
390- ### High Priority (Q2-Q3 2026)
391- 4 . ** dataframe ↔ distributed-process** : Distributed operations
392- 5 . ** Hasktorch ↔ distributed-process** : Distributed training
393- 6 . ** statistics ↔ visualization** : Plot statistical results
394-
395- ### Medium Priority (Q4 2026)
396- 7 . ** All ↔ model deployment** : Production pipeline
397- 8 . ** All ↔ monitoring** : Observability
398-
399- ---
400-
401378## Success Metrics
402379
403380### Q2 2026
404- - [ ] dataframe v0.1 released with 500+ downloads/month
381+ - [ ] dataframe v1 released
405382- [ ] 3 complete end-to-end tutorials published
406383- [ ] Performance benchmarks showing ≥70% of Pandas speed
407384- [ ] 5 integration examples between major libraries
408385
409386### Q4 2026
410387- [ ] 10,000+ total library downloads/month across ecosystem
411- - [ ] 20+ companies using DataHaskell in production
412- - [ ] 50+ active contributors
388+ - [ ] 5+ active contributors
413389- [ ] Performance parity (≥90%) with Pandas for common operations
414390- [ ] Complete ML workflow from data to deployment documented
415391
416392### Q2 2027
417- - [ ] 100 + companies using DataHaskell
393+ - [ ] 2+ companies using DataHaskell
418394- [ ] DataHaskell track at major Haskell conference
419395- [ ] 3+ published case studies
420396- [ ] Comprehensive distributed computing examples
@@ -431,7 +407,6 @@ By 2027, DataHaskell will provide a high-performance, end-to-end data science to
431407### Maintainer Coordination
432408- ** Monthly sync** : All pillar leads (1 hour)
433409- ** Quarterly planning** : Full maintainer group (2 hours)
434- - ** Annual retreat** : Strategic direction (virtual or in-person)
435410
436411### Funding Needs (Optional but Helpful)
4374121 . ** Infrastructure**
@@ -441,7 +416,7 @@ By 2027, DataHaskell will provide a high-performance, end-to-end data science to
441416
4424172 . ** Developer Support**
443418 - Part-time technical writer
444- - Maintainer stipends (Haskell Foundation)
419+ - Maintainer stipends or grants
445420 - Summer of Haskell projects
446421
4474223 . ** Events**
@@ -490,11 +465,10 @@ By 2027, DataHaskell will provide a high-performance, end-to-end data science to
490465** Criteria** :
4914661 . Unmaintained for >6 months
4924672 . Better alternative exists
493- 3 . Low usage (<100 downloads/month)
494- 4 . Creates confusion in ecosystem
468+ 3 . Creates confusion in ecosystem
495469
496470### Version Compatibility Policy
497- - Support last 2 GHC versions
471+ - Support last 2 major GHC versions
498472- Versioning that follows the Haskell Package Versioning Policy (PVP)
499473- Deprecation warnings for 2 releases before removal
500474- Compatibility matrix published on website
@@ -504,7 +478,7 @@ By 2027, DataHaskell will provide a high-performance, end-to-end data science to
504478## Communication Plan
505479
506480### Internal (Maintainers)
507- - ** Slack/ Discord channel** : Daily async communication
481+ - ** Discord channel** : Daily async communication
508482- ** GitHub Discussions** : Technical decisions, RFCs
509483- ** Monthly video call** : Roadmap progress, blockers
510484- ** Quarterly planning session** : Next phase priorities
@@ -518,75 +492,6 @@ By 2027, DataHaskell will provide a high-performance, end-to-end data science to
518492
519493---
520494
521- ## Near-Term Action Items (Next 30 Days)
522-
523- ### For dataframe maintainer (mchav)
524- 1 . [ ] Finalize v0.1 release checklist
525- 2 . [ ] Write Parquet support specification
526- 3 . [ ] Create 3 dataframe ↔ Hasktorch examples
527- 4 . [ ] Set up benchmark infrastructure
528-
529- ### For Hasktorch team
530- 1 . [ ] Test dataframe integration patterns
531- 2 . [ ] Document tensor conversion APIs
532- 3 . [ ] Create example pipeline notebook
533- 4 . [ ] Identify distributed training requirements
534-
535- ### For distributed-process team
536- 1 . [ ] Prototype distributed dataframe operations
537- 2 . [ ] Document deployment patterns
538- 3 . [ ] Create cluster setup guide
539- 4 . [ ] Design fault-tolerance strategy
540-
541- ### For community coordinator
542- 1 . [ ] Set up monthly call schedule
543- 2 . [ ] Create Discord/Slack workspace
544- 3 . [ ] Draft website redesign plan
545- 4 . [ ] Reach out to potential contributors
546-
547- ### For all
548- 1 . [ ] Review and comment on this roadmap
549- 2 . [ ] Identify personal capacity for next 6 months
550- 3 . [ ] Claim ownership of specific deliverables
551- 4 . [ ] Share roadmap with broader community
552-
553- ---
554-
555- ## Appendix A: Related Projects to Consider
556-
557- ### Existing Haskell Projects
558- - ** Frames** : Alternative dataframe (potential collaboration/consolidation?)
559- - ** hmatrix** : Linear algebra (ensure compatibility)
560- - ** statistics** : Statistical computing (modernization candidate)
561- - ** Chart/hvega** : Visualization (integration targets)
562- - ** postgresql-simple** : Database connectivity
563- - ** accelerate** : Array processing with GPU support
564-
565- ### External Integration Targets
566- - ** Apache Arrow** : Zero-copy data interchange
567- - ** DuckDB** : Embedded analytical database
568- - ** ONNX** : Model interchange format
569- - ** MLflow** : ML lifecycle management
570-
571- ---
572-
573- ## Appendix B: Glossary
574-
575- ** Critical Path** : dataframe → statistics → ML toolkit → distributed operations
576- ** Integration Points** : Where libraries share data structures or APIs
577- ** Zero-Copy** : Data sharing without duplication in memory
578- ** Type-Safe** : Compile-time guarantees about data structure and operations
579-
580- ---
581-
582- ## Appendix C: Version History
583-
584- | Version | Date | Changes | Author |
585- | ---------| ------| ---------| --------|
586- | 1.0 | Nov 2026 | Initial comprehensive roadmap | DataHaskell coordinators |
587-
588- ---
589-
590495## How to Use This Roadmap
591496
592497This is a ** living document** . We will:
@@ -595,8 +500,6 @@ This is a **living document**. We will:
595500- Celebrate milestones publicly
596501- Adapt based on community feedback
597502
598- ** Contributing** : See [ CONTRIBUTING.md] for how to propose changes to this roadmap.
599-
600503** Questions?** Open a discussion on GitHub or join our community calls.
601504
602505---