Add scientific data and geospatial publishing guides & case study #2243

Open
2color wants to merge 26 commits into ipfs:main from 2color:geospatial-guide

2color (Member) commented Jan 23, 2026

What

Adds new documentation focused on scientific/geospatial data publishing with IPFS (Zarr + tooling), plus an ORCESTRA case study and related VuePress navigation updates, with a small quickstart retrieval enhancement.

Changes:

  • Added “Scientific data and IPFS landscape guide” and “Publish geospatial Zarr data with IPFS” how-to pages.
  • Added ORCESTRA case study and updated VuePress sidebar/navigation to include the new section and case study.
  • Extended retrieval quickstart with a Python/ipfsspec verified retrieval example and updated spellcheck ignore list.

github-actions bot (Contributor) commented Jan 23, 2026

🚀 Build Preview on IPFS ready

@mishmosh (Collaborator)

This is great as a specific how-to. Is there another, complementary place we can write about all the ways geospatial users can benefit from IPFS?

From live meeting:

  • Consider title “Scientific Data” as category
    • Ecosystem Tooling
    • Guide to Publishing Scientific Data
  • IPFS is used by the geospatial community for better collaboration, data integrity, and open access.
    (make sure we can describe some of the architectures used)
    • Connecting Kubo to your existing data repositories (STAC catalog)
    • Private clusters (but open retrieval) or “Collaborative publishing”
    • Provenance

2color marked this pull request as ready for review February 4, 2026 16:14
mishmosh (Collaborator) left a comment

A few suggestions and comments inline, but I'm confident you can take it from here. Would also like to see @vmx review.

2color and others added 6 commits February 6, 2026 14:34
Co-authored-by: Volker Mische <volker.mische@gmail.com>
Co-authored-by: Mosh <1306020+mishmosh@users.noreply.github.com>
Co-authored-by: Mosh <1306020+mishmosh@users.noreply.github.com>
2color requested a review from vmx February 6, 2026 14:46
Comment on lines +103 to +105
--raw-leaves \
--chunker=size-1048576 \
--cid-version=1 \
Member

Once Kubo 0.40 ships, these flags could be removed and replaced by a one-time `ipfs config profile apply unixfs-v1-2025`, or by setting the `Import.*` values one by one.
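
For reference, the three flags under discussion can be assembled into a full `ipfs add` invocation. The sketch below is illustrative only: the helper name `build_ipfs_add_command` and the `output.zarr` path are assumptions, not taken from the guide, and the code only builds the argument list so it can be inspected without a running Kubo node.

```python
from __future__ import annotations

import shlex


def build_ipfs_add_command(path: str, chunk_size: int = 1048576) -> list[str]:
    """Return an `ipfs add` argument list using the flags quoted above.

    Hypothetical helper for illustration; `path` is whatever directory
    you want to publish (for example a Zarr store).
    """
    return [
        "ipfs", "add",
        "--recursive",                   # add a directory tree
        "--raw-leaves",                  # store file data as raw blocks
        f"--chunker=size-{chunk_size}",  # fixed-size chunks (1 MiB by default)
        "--cid-version=1",               # produce CIDv1
        path,
    ]


if __name__ == "__main__":
    cmd = build_ipfs_add_command("output.zarr")
    # Print a copy-pasteable shell command; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

If the profile mentioned in the comment is applied instead, the same helper would presumably shrink to `ipfs add --recursive <path>`, since the import settings would then come from the node's configuration.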

@2color

This comment was marked as outdated.

2color changed the title from "Add geospatial publishing guide" to "Add scientific data and geospatial publishing guides & case study" Feb 13, 2026
Copilot AI (Contributor) left a comment

Pull request overview

Adds new Scientific Data documentation content to the IPFS docs site, including a hands-on guide for publishing geospatial Zarr datasets and supporting context via a landscape overview and an ORCESTRA case study. Updates the VuePress sidebar to surface the new pages and case study.

Changes:

  • Add a new “Publish Geospatial Zarr Data with IPFS” how-to guide.
  • Add a new “Scientific Data and IPFS Landscape Guide” overview page.
  • Add a new ORCESTRA case study and update VuePress navigation (including sidebar re-organization and case study list).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 20 comments.

| File | Description |
| --- | --- |
| docs/how-to/scientific-data/publish-geospatial-zarr-data.md | New step-by-step publishing guide (Zarr + IPFS), including discovery/access patterns. |
| docs/how-to/scientific-data/landscape-guide.md | New overview of scientific data formats, architectural patterns, and ecosystem tooling. |
| docs/case-studies/orcestra.md | New case study describing ORCESTRA’s use of IPFS for scientific data distribution. |
| docs/.vuepress/config.js | Adds the new Scientific Data pages to the How-to sidebar and adds ORCESTRA to case studies; also reorganizes peer-related sidebar entries. |

mishmosh and others added 5 commits February 19, 2026 03:21
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 12 comments.


Comment on lines +77 to +80
ds = xr.open_dataset(filename)
# Example: targeting ~1 MB chunks with float32 data
ds.to_zarr('output.zarr', encoding={
'var_name': {'chunks': (1, 512, 512)}
Copilot AI Feb 24, 2026

The chunking example uses undefined placeholders (filename, var_name), which will error if readers copy/paste. Consider making these explicit string placeholders (e.g., "path/to/file" / "variable_name") or adding a short comment that they must be replaced.

Suggested change
- ds = xr.open_dataset(filename)
- # Example: targeting ~1 MB chunks with float32 data
- ds.to_zarr('output.zarr', encoding={
-     'var_name': {'chunks': (1, 512, 512)}
+ filename = "path/to/your/file.nc"  # Replace with the path to your dataset
+ ds = xr.open_dataset(filename)
+ # Example: targeting ~1 MB chunks with float32 data
+ ds.to_zarr('output.zarr', encoding={
+     'variable_name': {'chunks': (1, 512, 512)}  # Replace with the name of your variable
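
The "~1 MB" figure in the snippet's comment follows from simple arithmetic: a `(1, 512, 512)` chunk of float32 values holds 262,144 elements at 4 bytes each, which is 1,048,576 bytes uncompressed and matches the `size-1048576` chunker value discussed earlier in this PR. A minimal sketch of that calculation, where the function name `chunk_nbytes` is hypothetical and compression would make the stored chunks smaller:

```python
import math

FLOAT32_ITEMSIZE = 4  # bytes per float32 value


def chunk_nbytes(chunk_shape, itemsize=FLOAT32_ITEMSIZE):
    """Uncompressed size in bytes of one chunk with the given shape."""
    return math.prod(chunk_shape) * itemsize


if __name__ == "__main__":
    # One time step of a 512 x 512 float32 grid.
    print(chunk_nbytes((1, 512, 512)))
```

Swapping in a different dtype is a matter of changing `itemsize`, for example 8 for float64, which doubles the chunk size for the same shape.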

2color and others added 3 commits February 24, 2026 10:48
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
lkluft left a comment

Thanks for adding this really nice article about ORCESTRA!

I made a couple of minor suggestions, but I do like the overall story very much! 👍

<NumberBlock :items="[
{value: '20+', text:'Research institutions'},
{value: '8', text: 'Sub-campaigns'},
{value: '50+', text: 'Participating organizations'},

Suggested change
- {value: '50+', text: 'Participating organizations'},
+ {value: '50+', text: 'Scientists on-site'},


[ORCESTRA](https://orcestra-campaign.org/) (Organized Convection and EarthCARE Studies over the Tropical Atlantic) is an international field campaign that launched in early 2024 to study tropical mesoscale convective systems: the storm systems that play a significant role in the Earth's weather and climate dynamics.

The campaign brings together **over twenty scientific institutions** spanning Europe, North America, and Africa. Eight sub-campaigns (three airborne, one land-based, and four at sea) coordinate aircraft, ships, ground stations, and satellites to collect atmospheric measurements across the tropical Atlantic.

While we did perform overflights with several satellites, one might mention EarthCARE explicitly (as it also comes up in the name).

Suggested change
- The campaign brings together **over twenty scientific institutions** spanning Europe, North America, and Africa. Eight sub-campaigns (three airborne, one land-based, and four at sea) coordinate aircraft, ships, ground stations, and satellites to collect atmospheric measurements across the tropical Atlantic.
+ The campaign brings together **over twenty scientific institutions** spanning Europe, North America, and Africa. Eight sub-campaigns (three airborne, one land-based, and four at sea) coordinate aircraft, ships, and ground stations to collect atmospheric measurements across the tropical Atlantic and validate observations made by the [EarthCARE](https://earth.esa.int/eogateway/missions/earthcare) satellite, which was launched shortly before the start of the campaign.


## How ORCESTRA works

ORCESTRA's eight sub-campaigns span sea, air, and land, collecting atmospheric measurements such as temperature, humidity, wind, radiation, aerosols, and cloud properties. This observational data is structured as multidimensional arrays and stored primarily in the [Zarr](https://zarr.dev/) format, a cloud-native format optimized for chunked, distributed access to large scientific datasets.

Suggested change
- ORCESTRA's eight sub-campaigns span sea, air, and land, collecting atmospheric measurements such as temperature, humidity, wind, radiation, aerosols, and cloud properties. This observational data is structured as multidimensional arrays and stored primarily in the [Zarr](https://zarr.dev/) format, a cloud-native format optimized for chunked, distributed access to large scientific datasets.
+ ORCESTRA's eight sub-campaigns cover sea, air and land. They collect atmospheric measurements such as temperature, humidity, wind, radiation, aerosols and cloud properties, as well as oceanic measurements such as sea-surface temperature, salinity and ocean currents. This observational data is structured as multidimensional arrays and stored primarily in the [Zarr](https://zarr.dev/) format, a cloud-native format optimized for chunked, distributed access to large scientific datasets.


5 participants