`prefix` arg to pandas.read_csv is deprecated

Deprecated as of pandas 1.4.0: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html. It will be removed in a future version, but I don't know when.

Per https://github.com/pandas-dev/pandas/issues/43396, the reason seems to be that it conflicted with the `names` and `header` arguments and didn't add much value. (I think it only took effect if you passed `header=None` and didn't pass `names`.) Instead of `prefix='foo_'`, you're now supposed to run `df.columns = [f'foo_{col}' for col in df.columns]` after `read_csv`.
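For reference, a minimal sketch of that replacement on a header-less CSV. The sample data and the `foo_` prefix are just placeholders:

```python
import io

import pandas as pd

# Made-up two-row CSV with no header line.
csv_text = "1,2,3\n4,5,6\n"

# With header=None and no names, pandas labels the columns 0, 1, 2.
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Rename after the fact, as the deprecation notice suggests.
df.columns = [f"foo_{col}" for col in df.columns]

print(list(df.columns))
```

The resulting names (`foo_0`, `foo_1`, ...) are valid Python identifiers, which is the property that matters for `aes`.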

But p9-cli doesn't give the user a way to do that. By default columns are just numbered 0, 1, ..., and [plotnine.aes](https://plotnine.readthedocs.io/en/stable/generated/plotnine.mapping.aes.html) works best if the column names are valid Python identifiers. So if you have no header, your options to replace `prefix=col` seem to be:

* Provide `names,=col0 ,=col1 ,=col2 ...` - annoying.
* Instead of `x=col0 y=col1`, use `x='data[0]' y='data[1]'` - undocumented in p9-cli, more verbose, and I'm not sure it works in all the same ways.

If I don't like these, options for p9-cli seem to be:

* Implement `prefix` ourselves, by removing it from the kwargs passed to `read_csv` and then renaming the columns afterwards.
    * Either always rename if `prefix` is passed (different from `read_csv`), or only if `header` and `names` are both `None` (might be helpful if e.g. the header in the file is numeric; unlikely to cause problems?).
* If `header` and `names` are both `None`, automatically rename columns to add a prefix. (Obvious choices are `c`, `col` or `col_`. Following [`q`](https://harelba.github.io/q/) I think I like `c`.)
* Some combo. Perhaps: if `header` and `names` are both `None`, automatically add prefix. Look at the `prefix` kwarg to choose the prefix, sensible default if not given. If they're not both `None`, and there's a `prefix` kwarg, add a prefix anyway. I think I like this best.
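A rough sketch of the "combo" option, not actual p9-cli code. The helper name `read_csv_with_prefix` and the `DEFAULT_PREFIX` constant are hypothetical, and it assumes "no header" is signalled by an explicit `header=None` in the kwargs (plain pandas infers a header otherwise); how p9-cli sets its own defaults is not covered here:

```python
import io

import pandas as pd

DEFAULT_PREFIX = "c"  # assumption: the default floated above, following q


def read_csv_with_prefix(source, **kwargs):
    """Hypothetical wrapper: handle `prefix` ourselves instead of passing it on."""
    kwargs = dict(kwargs)
    # Remove `prefix` so the deprecated argument never reaches read_csv.
    prefix = kwargs.pop("prefix", None)

    # Assumption: header-less input means header=None was passed explicitly
    # and no names were given.
    no_header = (
        "header" in kwargs
        and kwargs["header"] is None
        and kwargs.get("names") is None
    )
    if no_header and prefix is None:
        prefix = DEFAULT_PREFIX

    df = pd.read_csv(source, **kwargs)

    # Rename whenever a prefix is in play, including when the user asked for
    # one explicitly despite having a header or names.
    if prefix is not None:
        df.columns = [f"{prefix}{col}" for col in df.columns]
    return df


numbered = read_csv_with_prefix(io.StringIO("1,2\n3,4\n"), header=None)
custom = read_csv_with_prefix(io.StringIO("1,2\n3,4\n"), header=None, prefix="col_")
untouched = read_csv_with_prefix(io.StringIO("a,b\n1,2\n"))
forced = read_csv_with_prefix(io.StringIO("a,b\n1,2\n"), prefix="x_")
```

Here `numbered` gets the default prefix, `custom` uses the one supplied, `untouched` keeps the header from the file, and `forced` shows the "add a prefix anyway" case, which is the one place this differs from the old `read_csv` behaviour.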

Could also move this `prefix` arg outside of `--csv`, which would improve consistency, and possibly also apply to `--dataset` and (if supported in future) reading from sqlite tables and stuff. I think I'll leave it there at least for now though.
