feat: Update Tile Pre-Processor to support more modes #6558
Conversation
I did an initial high-level pass. I haven't actually looked at the specifics of how the new modes are implemented yet - I'll do that once some of these initial questions are answered.
Have you done any testing with the new modes? Or can you give guidance on when you might want to choose one over another?
if self.mode == "regular":
    np_img = HWC3(np_img)
    if self.down_sampling_rate < 1.1:
I realize that you didn't change this code, but do you know why 1.1 was chosen as the cut-off? It seems like we intentionally chose to allow float downsampling rates, implying that a value such as 1.07 could be a valid downsampling rate.
I am not sure why this was done originally. I'll need to check.
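For reference, a minimal sketch of how the regular mode presumably behaves around this cutoff, based on the snippet above. The early return, the resize call, and the INTER_AREA interpolation are assumptions for illustration, not the PR's exact code:

import cv2
import numpy as np

def regular_downsample(np_img: np.ndarray, down_sampling_rate: float) -> np.ndarray:
    # Rates below 1.1 appear to be treated as "no downsampling", so a value
    # like 1.07 would be silently ignored -- which is the concern raised above.
    if down_sampling_rate < 1.1:
        return np_img

    height, width, _ = np_img.shape
    new_size = (int(width / down_sampling_rate), int(height / down_sampling_rate))
    # INTER_AREA is the usual choice for downscaling (assumed here).
    return cv2.resize(np_img, new_size, interpolation=cv2.INTER_AREA)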
if random.random() > 0.5:
    np_img = self.apply_gaussian_blur(np_img, ksize=int(blur_strength), sigmaX=blur_strength / 2)

if random.random() > 0.5:
    np_img = self.apply_guided_filter(np_img, radius, eps, int(scale_factor))
Why non-deterministic? Should we make these two different modes?
Also, this seems to be how this particular model prefers it: https://huggingface.co/xinsir/controlnet-tile-sdxl-1.0
@@ -0,0 +1,283 @@
# ruff: noqa: E741
This file appears to have been copied from somewhere. Can you link to the original?
Also, why use this implementation over cv2.ximgproc.guidedFilter?
The PR was made to support this new Tile model: https://huggingface.co/xinsir/controlnet-tile-sdxl-1.0
This one uses a custom version of the guidedFilter which they claim to be faster. I have not done benchmarks to test this myself. Functionally, using the inbuilt guidedFilter from cv2 should also work just fine imo.
Seems like it would be easy to test for both speed and behavior. I think that's worth doing so we can decide whether it's worth bringing in this file.
If we do decide that this file is better than the cv2 version, then we should:
- Explain why we are using this over cv2 at the top of this file.
- Permalink to the source that this file was copied from.
- Remove import cv2.ximgproc from the invocation file.
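As a rough sketch of the speed/behavior comparison suggested above (requires opencv-contrib-python for cv2.ximgproc; the custom_guided_filter import and its call signature are hypothetical stand-ins for the file added in this PR):

import time

import cv2
import numpy as np

# Hypothetical import: stand-in for the guided filter implementation copied into this PR.
from custom_guided_filter import guided_filter

# Synthetic float image in [0, 1]; eps conventions can differ between implementations,
# so both filters are run on the same normalized input here.
img = np.random.rand(1024, 1024, 3).astype(np.float32)
radius, eps = 7, 0.01

start = time.perf_counter()
out_custom = guided_filter(img, img, radius, eps)  # signature assumed
print(f"custom implementation: {time.perf_counter() - start:.3f}s")

start = time.perf_counter()
out_cv2 = cv2.ximgproc.guidedFilter(guide=img, src=img, radius=radius, eps=eps)
print(f"cv2.ximgproc.guidedFilter: {time.perf_counter() - start:.3f}s")

# Behavioral difference between the two outputs.
print("mean abs diff:", np.abs(out_custom - out_cv2).mean())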
I reviewed the sampling mode implementations.
Do you have test results that you can share for the new modes? Are they empirically better?
# referenced from
# https://huggingface.co/TTPlanet/TTPLanet_SDXL_Controlnet_Tile_Realistic/blob/37f1c4575b543fb2036e39f5763d082fdd135318/TTP_tile_preprocessor_v5.py
def _blur_resample(self, np_img: np.ndarray[Any, Any]):
This function does not seem to match anything in the linked file. Is the link wrong?
blur_strength = random.sample([i / 10.0 for i in range(10, 201, 2)], k=1)[0]
radius = random.sample([i for i in range(1, 40, 2)], k=1)[0]  # noqa: C416
eps = random.sample([i / 1000.0 for i in range(1, 101, 2)], k=1)[0]
scale_factor = random.sample([i / 10.0 for i in range(10, 181, 5)], k=1)[0]

if random.random() > 0.5:
    np_img = self._apply_gaussian_blur(np_img, ksize=int(blur_strength), sigma_x=blur_strength / 2)

if random.random() > 0.5:
    np_img = self._apply_guided_filter(np_img, radius, eps, int(scale_factor))
All of the random sampling in this block makes it look like experimental code?
Nodes should be deterministic, so we need to either:
- Hard-code these values to good defaults.
- Expose them as parameters.
- Add a seed. (This seems like a bad idea. I can't think of a good reason for a preprocessor to need a seed.)
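A minimal sketch of the second option (exposing the currently random values as deterministic inputs). The field names and defaults below are illustrative guesses derived from the sampled ranges in the diff above, not a proposed final API:

from dataclasses import dataclass

@dataclass
class BlurResampleSettings:
    """Deterministic replacements for the values currently drawn with random.sample()."""
    blur_strength: float = 6.0        # was sampled from 1.0..20.0
    radius: int = 7                   # was sampled from odd values in 1..39
    eps: float = 0.01                 # was sampled from 0.001..0.099
    scale_factor: float = 1.0         # was sampled from 1.0..18.0
    apply_blur: bool = True           # replaces the first `random.random() > 0.5`
    apply_guided_filter: bool = True  # replaces the second `random.random() > 0.5`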
height, width, _ = np_img.shape
ratio = np.sqrt(1024.0 * 1024.0 / (width * height))
resize_w, resize_h = int(width * ratio), int(height * ratio)
np_img = cv2.resize(np_img, (resize_w, resize_h))
It looks like the first step is to resize to match the area 1024x1024. Why? Is this SDXL-specific? Does this interact in some important way with one of the later hard-coded values?
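As a concrete illustration, the step above rescales the image so its area is roughly 1024x1024 pixels (about 1 MP, which is SDXL's native resolution) while preserving aspect ratio; whether that is the intent is exactly the question raised here. A small worked example:

import numpy as np

def target_size(width: int, height: int) -> tuple[int, int]:
    # Scale so the output area is approximately 1024 * 1024 pixels,
    # keeping the aspect ratio. Note this can also upscale small images.
    ratio = np.sqrt(1024.0 * 1024.0 / (width * height))
    return int(width * ratio), int(height * ratio)

print(target_size(1920, 1080))  # -> (1365, 768), area ~1.05 MP
print(target_size(512, 512))    # -> (1024, 1024), upscaled to 1 MP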
height, width, _ = np_img.shape
ratio = np.sqrt(1024.0 * 1024.0 / (width * height))
resize_w, resize_h = int(width * ratio) // 48 * 48, int(height * ratio) // 48 * 48
np_img = cv2.resize(np_img, (resize_w, resize_h))
Same question as above re: why 1024x1024?
And why force it to be a multiple of 48?
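For reference, a small illustration of what the // 48 * 48 flooring does to the dimensions computed above (the choice of 48 itself is the open question here):

def floor_to_multiple(value: int, multiple: int = 48) -> int:
    # Round `value` down to the nearest multiple of `multiple`.
    return value // multiple * multiple

print(floor_to_multiple(1365))  # -> 1344
print(floor_to_multiple(768))   # -> 768 (already a multiple of 48)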
@blessedcoolant Do you plan to pick back up on this?
Summary
Add support for more methods of resampling to work with Tile ControlNet models.
- regular - The default downsizing resampler, which basically downsamples the input image.
- blur - A new guided filter algorithm that works with the now available ControlNet models for SDXL.
- var - Used for image variations, apparently.
- super - TODO -- used with super sizing, but probably should not be in this particular node anyway.

The Tile Image Processor is now also available in the Linear UI.
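A rough sketch of how the mode selection described above might dispatch inside the node; the handler names are hypothetical stand-ins based on the snippets in this review, not the final implementation:

from typing import Callable

import numpy as np

# Hypothetical stand-ins for the per-mode implementations described above.
def _regular_downsample(img: np.ndarray) -> np.ndarray: ...
def _blur_resample(img: np.ndarray) -> np.ndarray: ...
def _var_resample(img: np.ndarray) -> np.ndarray: ...
def _super_resample(img: np.ndarray) -> np.ndarray: ...

MODE_HANDLERS: dict[str, Callable[[np.ndarray], np.ndarray]] = {
    "regular": _regular_downsample,
    "blur": _blur_resample,
    "var": _var_resample,
    "super": _super_resample,
}

def process_tile(np_img: np.ndarray, mode: str) -> np.ndarray:
    try:
        return MODE_HANDLERS[mode](np_img)
    except KeyError:
        raise ValueError(f"Unknown tile preprocessor mode: {mode}") from None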
Related Issues / Discussions
QA Instructions
Merge Plan
Test and Merge.
Checklist