The `prefix` argument to `pandas.read_csv` is deprecated as of pandas 1.4.0 (https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html). It will be removed in a future version, dunno when.
Per pandas-dev/pandas#43396, the reason seems to be that it conflicted with the `names` and `header` arguments and didn't add much value. (I think it only took effect if you passed `header=None` and didn't pass `names`.) Instead of `prefix='foo_'`, you're now supposed to call `df.columns = [f'foo_{col}' for col in df.columns]` after `read_csv`.
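For reference, here is roughly what that migration looks like in a plain pandas script, using a tiny inline CSV. `DataFrame.add_prefix` is an existing pandas method that does the same renaming in one call. I'm including it as an alternative, not as the replacement the pandas issue recommends.

```python
import io

import pandas as pd

# A tiny headerless CSV, inlined so the example runs on its own.
raw = "1,2\n3,4\n"

# Old style (deprecated since pandas 1.4.0):
#   pd.read_csv(io.StringIO(raw), header=None, prefix="foo_")

# Replacement: read without `prefix`, then rename the integer column labels.
df = pd.read_csv(io.StringIO(raw), header=None)
df.columns = [f"foo_{col}" for col in df.columns]
print(list(df.columns))  # ['foo_0', 'foo_1']

# Same result using the existing DataFrame.add_prefix method.
alt = pd.read_csv(io.StringIO(raw), header=None).add_prefix("foo_")
print(list(alt.columns))  # ['foo_0', 'foo_1']
```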
But p9-cli doesn't give the user a way to do that. Without a header, columns are just numbered `0, 1, ...`, and `plotnine.aes` works best if the column names are valid Python identifiers. So if you have no header, your options to replace `prefix=col` seem to be:
- Provide `names,=col0 ,=col1 ,=col2 ...` - annoying.
- Instead of `x=col0 y=col1`, use `x='data[0]' y='data[1]'` - undocumented in p9-cli, more verbose, and I'm not sure it works in all the same ways.
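To make the identifier problem concrete, here's what the default labels look like in plain pandas, next to the "provide names" workaround. This is ordinary pandas code, not p9-cli syntax. I'm assuming p9-cli's `names,=...` form ends up as the `names` list passed to `read_csv`.

```python
import io

import pandas as pd

raw = "1,2,3\n4,5,6\n"

# With header=None, pandas labels the columns 0, 1, 2 (integers),
# none of which is a valid Python identifier.
df = pd.read_csv(io.StringIO(raw), header=None)
print([str(c).isidentifier() for c in df.columns])  # [False, False, False]

# The "provide names" workaround in pandas terms: every column has to be
# spelled out, which gets tedious for wide files.
named = pd.read_csv(io.StringIO(raw), header=None, names=["col0", "col1", "col2"])
print([c.isidentifier() for c in named.columns])  # [True, True, True]
```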
If I don't like those workarounds, the options on the p9-cli side seem to be:
- Implement `prefix` ourselves, by removing it from the kwargs passed to `read_csv` and then renaming the columns afterwards.
  - Either always rename if `prefix` is passed (different from `read_csv`, but might be helpful if e.g. the header in the file is numeric; unlikely to cause problems?), or only rename if `header` and `names` are both `None`.
- If `header` and `names` are both `None`, automatically rename columns to add a prefix. (Obvious choices are `c`, `col` or `col_`. Following `q`, I think I like `c`.)
- Some combo. Perhaps: if `header` and `names` are both `None`, automatically add a prefix, using the `prefix` kwarg if given and a sensible default if not. If they're not both `None` and there's a `prefix` kwarg, add the prefix anyway. I think I like this best.
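A minimal sketch of that last "combo" option, in case it helps pin down the behaviour. `read_csv_with_prefix` and `DEFAULT_PREFIX` are hypothetical names, not existing p9-cli code, and the `c` default just follows the preference above. One assumption worth flagging: pandas' own default for `header` is `"infer"`, so here only an explicit `header=None` (with no `names`) counts as "no header".

```python
import io
from typing import Any, Optional

import pandas as pd

# Hypothetical default prefix; the issue leans towards `c`, following `q`.
DEFAULT_PREFIX = "c"


def read_csv_with_prefix(source: Any, **kwargs: Any) -> pd.DataFrame:
    """Hypothetical p9-cli helper: handle `prefix` ourselves instead of
    forwarding it to pandas, where it is deprecated."""
    # Strip `prefix` so pandas never sees it (no deprecation warning).
    prefix: Optional[str] = kwargs.pop("prefix", None)
    df = pd.read_csv(source, **kwargs)

    # Assumption: a missing `header` means pandas' default "infer",
    # so only an explicit header=None with no names is "no header".
    no_header = kwargs.get("header", "infer") is None and kwargs.get("names") is None

    if no_header and prefix is None:
        # Numbered columns with no explicit prefix: use the default.
        prefix = DEFAULT_PREFIX
    if prefix is not None:
        # Also applies when a header or names were given (unlike read_csv).
        df.columns = [f"{prefix}{col}" for col in df.columns]
    return df


headerless = "1,2\n3,4\n"
with_header = "a,b\n1,2\n"
print(list(read_csv_with_prefix(io.StringIO(headerless), header=None).columns))  # ['c0', 'c1']
print(list(read_csv_with_prefix(io.StringIO(headerless), header=None, prefix="col_").columns))  # ['col_0', 'col_1']
print(list(read_csv_with_prefix(io.StringIO(with_header), prefix="x_").columns))  # ['x_a', 'x_b']
print(list(read_csv_with_prefix(io.StringIO(with_header)).columns))  # ['a', 'b']
```

Files with a normal header are left alone unless a `prefix` is passed explicitly, so existing uses shouldn't change.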
Could also move this `prefix` arg outside of `--csv`, which would improve consistency, and it could then also apply to `--dataset` and (if supported in future) reading from sqlite tables and stuff. I think I'll leave it where it is, at least for now, though.