Conversation
cderv
left a comment
There was a problem hiding this comment.
I think this logic could be better than current for sure. Thanks !
Though several thoughts:
For Quarto, we could output Markdown or HTML. Quarto does this for Jupyter and I believe give priority HTML method; Example:
---
title: test
keep-md: true
---
```{python}
import pandas as pd
df = pd.DataFrame({"object": ["Sun", "Earth"], "radius": [696000, 6371]})
df
```If you look at the intermediary .md output, the table is in its HTML representation.
However, Using Markdown representation could be the same as we parse HTML table
@cscheid do you think it should be like Quarto and Jupyter, or reticulate should output _repr_markdown_ by default for Pandas ? (regarding your comment at #783 (comment)
We can discuss what Quarto does exactly, but currently we select HTML - so I commented with that in mind.
For rmarkdown, I believe this is a good logic that will use any knit_print.data.frame method that another package could define. Is there any possibly loss passing by py_to_r() to rely on data.frame representation, instead of what Pandas HTML representation could do ?
| if (knitr::opts_knit$get("rmarkdown.df_print") != "default" && renderDF) { | ||
| return(knitr::knit_print(py_to_r(value))) | ||
| } |
There was a problem hiding this comment.
I think we need to check really if using knit_print() at this level works really. I don't recall what knitr expect as output , but also it seems to me that there is a special mechanism in reticulate with .engine_context$pending_plots to store some content produce and then make it work with options in eng_python(). Next if (isHtml && py_has_method(value, "_repr_html_")) does something to output HTML raw data. I wonder if this is needed or not.
I don't know if you tested by at least, it is missing an argument here
| if (knitr::opts_knit$get("rmarkdown.df_print") != "default" && renderDF) { | |
| return(knitr::knit_print(py_to_r(value))) | |
| } | |
| if (knitr::opts_knit$get("rmarkdown.df_print") != "default" && renderDF) { | |
| return(knitr::knit_print(py_to_r(value), options)) | |
| } |
After this is fix it is working as expected it seems. (object is passed in captured and added to outputs$data() which is correctly passed to engine_output(). So looks good to me.
There was a problem hiding this comment.
Thanks for the fix, I made the change in 8243db4
Simply returning the raw output seems to work correctly. knit_print output seems to have a knit_html class that is correctly handled by knitr later. DO you have ideas of what could the potential edge cases? I have tried things like printing multiple tables from the same chunk and it worked.
| } | ||
|
|
||
| eng_python_generic_autoprint <- function(captured, value) { | ||
| if (py_has_method(value, "_repr_markdown_")) { |
There was a problem hiding this comment.
As commented in main comment, I wonder if we should do as Quarto and
- Output HTML first if available
- Output Markdown if available
- Then captured (equivalent to
text/plain)
Logic in Quarto for display is at displayMimeType() where we have a logic of ordering possibly output before checking the output values. In this logic, if there is HTML and Markdown representation, HTML is preferred currently
|
@cderv Thank you very much for the review, and sorry for taking so long to go back at this.
We could rely on
I'm afraid I don't have enough information about pros and cons to decide. That can be a breaking change because the current engine behavior is to prefer the markdown representation. Maybe that's a good time to push this change? It would be nice to know though the benefits of prefering HTML over the markdown representation. |
|
Thanks for working on this! I'm currently writing a Quarto book that uses both dplyr and pandas to display dataframes as tables at several places in the book. Currently, I'm stuck at using hidden cells with ipython's |
Improves how the reticulate engine handles pandas data frames rendering.
This PR is an attempt to fix #783
In summary it adresses two problems:
Quarto documents rendered with the Jupyter engine will by default use rich data.frame rendering, which is not the case if they use the reticulate/knitr engine.
We prefer not having such difference between engines, so this PR will special case Quarto rendered documents and use
.to_html()or._repr_markdown()when appropriate when out-printing data.frames into documents in Quarto rendered documents.The reticulate engine doesn't honor RMarkdown's
df_printoption and always usesprint(df)to display pandas data.frames.This PR adds support for such option and we now respect the
df_printoption when auto-printing pandas data.frames. When not using the'default'value, we convert the pandas data frame into an R data frame and then delegate to knitr for printing.Both changes are breaking changes, in the sense that they will now be the default. One should set
reticulate.engine.render_dftoFALSEin order to get back the old behavior.@cderv If you have time, it would be awesome if you can take a look :)