Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add blog post demonstrating the usage of GT.fmt_image() and vals.fmt_image() #558

Merged
merged 3 commits into from
Dec 13, 2024

Conversation

jrycw
Copy link
Collaborator

@jrycw jrycw commented Dec 12, 2024

Hello team,

I’d like to propose a blog post to help users better understand how to render images using the recent improvements to GT.fmt_image() and vals.fmt_image(). Feedback is always welcome!

In addition, I’ve noticed that the table alignment behaves differently depending on whether the file_pattern= argument is used. Below is a minimal example to demonstrate this behavior:

import pandas as pd
from great_tables import GT

data = {
    "name": [
        "Argentine",
        "Bastille",
        "Bérault",
        "Champs-Élysées—Clemenceau",
        "Charles de Gaulle—Étoile",
    ],
    "lines": ["1", "1, 5, 8", "1", "1, 13", "1, 2, 6"],
}

img_url_paths = "https://raw.githubusercontent.com/posit-dev/great-tables/refs/heads/main/great_tables/data/metro_images"

metro_mini = pd.DataFrame(
    {
        **data,
        "filenames": [
            ", ".join(f"metro_{item}.svg" for item in row.split(", ")) for row in data["lines"]
        ],
    }
)

print(metro_mini)

(
    GT(metro_mini)
    .fmt_image("filenames", path=img_url_paths)
    .fmt_image("lines", path=img_url_paths, file_pattern="metro_{}.svg")
    .cols_label(
        {
            "lines": "right-aligned(w file_pattern)",
            "filenames": "left-aligned(w/o file_pattern)",
        }
    )
    .show()
)
                        name    lines                              filenames
0                  Argentine        1                            metro_1.svg
1                   Bastille  1, 5, 8  metro_1.svg, metro_5.svg, metro_8.svg
2                    Bérault        1                            metro_1.svg
3  Champs-Élysées—Clemenceau    1, 13              metro_1.svg, metro_13.svg
4   Charles de Gaulle—Étoile  1, 2, 6  metro_1.svg, metro_2.svg, metro_6.svg

image

I’ve attempted to investigate the code but couldn’t pinpoint the root cause. Could the team clarify if this is an intended feature or a potential bug?

Copy link

codecov bot commented Dec 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.67%. Comparing base (ceef5c5) to head (08acafa).
Report is 4 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #558   +/-   ##
=======================================
  Coverage   90.66%   90.67%           
=======================================
  Files          46       46           
  Lines        5367     5381   +14     
=======================================
+ Hits         4866     4879   +13     
- Misses        501      502    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@github-actions github-actions bot temporarily deployed to docs-preview-blog December 12, 2024 14:42 Destroyed
@rich-iannone
Copy link
Member

Thanks for submitting this post! We'll read it over shortly. To address the question:

I’ve attempted to investigate the code but couldn’t pinpoint the root cause. Could the team clarify if this is an intended feature or a potential bug?

It's a feature (the auto_align= argument in GT()) that has some unintended consequences. Auto-align will align based on dtypes and also whether strings look like numbers/datetimes (contain the character set [0-9 -/:\\.]). This is all in the align_from_data() function:

number_like_matches.append(re.match("^[0-9 -/:\\.]*$", val))

Most of the time, this works well. In the case of your example, those metro line numbers in lines column just happen to have all cells matching the character set. If you just execute GT(metro_mini) you'll see the table aligns like this:

image

And using a letter in place of any of the numbers will result in left alignment of the lines column. That alignment is kept when replacing the text with images via fmt_image().

I think that there should probably be an align= argument for fmt_image() (defaulting to left) as a corrective and useful option. I can see inline images being aligned as left, center, and right as common use cases and having to use cols_align() separately seems a little inconvenient here.

@jrycw
Copy link
Collaborator Author

jrycw commented Dec 12, 2024

@rich-iannone , thank you for the prompt response and for pointing me to the relevant source code.
This is an interesting behavior that I wasn’t aware of—definitely learned something new about Great Tables today!

Copy link
Collaborator

@machow machow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR! This post is incredible, and reads really nicely! The 4 body case studies seem great in terms of prep and example data.

There are two pieces in those case studies that I think could be tweaked a bit to motivate the "why" of the cases. This is important IMO because it tells them what they'll get out of each case study (especially 3 and 4, which are convenience arguments).

Here are the two pieces:

  • The summary bullets at the beginning of the section (e.g. add text after to explain "why"; see comment)
  • The explanations just before the DataFrame code in each case study.
    • (I put an example in my case 1 comment)
    • Since the code preparing tables is folded (a nice move!), you could remove the paragraphs above any folded code explaining what it does.
    • Instead of the code explanations, a sentence prep'ing people on the table output might help orient them. What should they notice in each case study table?

I glanced through the other sections and they look so great!

Comment on lines 25 to 28
* **Case 1**: Local file paths.
* **Case 2**: Full HTTP/HTTPS URLs.
* **Case 3**: File names with the `path=` argument.
* **Case 4**: File names using both the `path=` and `file_pattern=` arguments.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This summary is really helpful up top! It says the "what" of each case (e.g. case 3 is using the path argument). The piece that took me a bit of scanning to find is the "why" of each case. I wonder if we could give a very short statement either in the bullets or below to help readers understand why they might choose a given case.

It seems like case 1 and 2 are fairly obvious. But afaik for others it looks like the whys might be...

  • case 3: specify image names relative to a base directory or url (e.g. "/path/to/images")
  • case 4: specify image names inside a path (e.g. "metro_{}.svg")

I think you could get away with adding a sentence or two beneath the list explaining the whys of case 3 and 4 (or there might be some other nice way of including?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a picture is worth a thousand words? Do you think the mermaid flowchart is clear enough for readers?

image

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we still need to add a few sentences to explain the "why," with the flowchart serving as an additional aid. I'll continue working on this tomorrow, as it's quite late here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using a callout to provide a quick tip for readers?

image

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This little call out box seems great!

Comment on lines 66 to 73
Our goal is to construct a Python dictionary to serve as input for creating a `Pandas` DataFrame.
This dictionary will include the previously defined `data` and a new key, `"case1"`. The value for
`"case1"` is generated by processing the `data["lines"]` column, which contains a list of strings.

For each string in `data["lines"]`, we split it by commas to extract individual numbers. These
numbers are then processed by adding the necessary prefix and suffix to generate valid file paths.
Once processed, the paths are joined back into a single string using `", ".join()`. The resulting
strings from all iterations are collected into a list, which becomes the value for `"case1"`:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These paragraphs seem to explain the code below them. Since that code is folded, I think you could get away with removing these, and instead provide a sentence preparing people for the table output.

(If it seems useful, you could put some of this explanation in the folded code block)

An example for explaining the table might be

Below is a DataFrame, whose case1 column has full file paths that we want to turn into images.



### Case 3: File Names with the `path=` Argument
**Case 3** demonstrates how to simulate a column containing file names. The preparation for creating
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A sentence on why might help prime readers for what they'll get out of this case. Maybe something like...

Case 3 demonstrates using the path= argument to specify images relative to a base directory. This lets you cut out a lot of repetition in file names. <remove DataFrame code prep explanation text>

this works in **Case 4** below.

### Case 4: File Names Using Both the `path=` and `file_pattern=` Arguments
**Case 4** demonstrates how to simulate a column containing only the variable part of a string pattern.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to case 3, the why of case 4 up top might help people see what they'll get out of this section. something like..?:

Case 4 demonstrates how to use path= and file_pattern= to specify images whose names are inside a common name. For example, you might use file_pattern="metro_{}.svg", to specify "metro_1.svg", "metro_2.svg", etc..

@jrycw
Copy link
Collaborator Author

jrycw commented Dec 12, 2024

@machow , thank you so much for the excellent suggestions! I’ll work on incorporating them into the post.

@machow and @rich-iannone , I’d appreciate your thoughts on the following questions:

  • Is it appropriate to set pd.set_option('display.max_colwidth', 150)? I aim to display at least one full path in the column, and this setting works well for shorter paths. However, I’m concerned the paths might be too long. Do you have any suggestions for a better solution?

  • Should we ensure consistent alignment across all tables by explicitly using GT.cols_align()?

@machow
Copy link
Collaborator

machow commented Dec 12, 2024

Is it appropriate to set pd.set_option('display.max_colwidth', 150)?

It seems okay to use, especially if it lets people see the file paths. It's tough the file names are so long, but also--since that's addressed by case study 3--maybe it's a chance to lean into the problem of the long file names. You could flag in case study 1, "these are filepaths to images inside the great_tables.data module, and notice they're super long. (In case study 3 we'll show you how avoid repeating the directory name every time" (or similar?)

@rich-iannone
Copy link
Member

  • Should we ensure consistent alignment across all tables by explicitly using GT.cols_align()?

I think we ought to for these examples.

@jrycw
Copy link
Collaborator Author

jrycw commented Dec 13, 2024

@machow , thank you again for your excellent suggestions!

I decided to place the "why" section in a callout to provide a quick tip for readers. I also refined the wording for each case, thanks to the helpful "something like" examples you provided.

Additionally, I made the following updates:

  1. Added a section to remind readers that file_pattern can be used independently.
  2. Highlighted that GT.fmt_image() supports rendering multiple images within a single row.
  3. Aligned the image column to the right, which I think looks more aesthetically pleasing. While we technically don’t need to apply GT.cols_align() to every table, relying on the auto-align feature could lead to inconsistencies if we modify the behavior of GT.fmt_image() or the GT object in the future.
  4. Reworded and polished some sentences.

@machow and @rich-iannone , feel free to make any direct updates to this PR if needed.

Copy link
Collaborator

@machow machow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This blog post is incredible, @rich-iannone do you want to review? (or feel free to merge!)

Copy link
Member

@rich-iannone rich-iannone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!!

@rich-iannone rich-iannone merged commit 4043f14 into posit-dev:main Dec 13, 2024
14 of 15 checks passed
@jrycw jrycw deleted the blog-render-image branch December 14, 2024 01:08
@jrycw jrycw mentioned this pull request Dec 15, 2024
machow added a commit that referenced this pull request Dec 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants