Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Side-Effects in Marsha RFC #159

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

Side-Effects in Marsha RFC #159

wants to merge 1 commit into from

Conversation

dfellis
Copy link
Member

@dfellis dfellis commented Aug 1, 2023

No description provided.

@dfellis dfellis self-assigned this Aug 1, 2023
@dfellis dfellis requested review from depombo and aguillenv August 1, 2023 01:57
Comment on lines +96 to +98
#### Disallow mocks in the test suite

One possibility to fix this situation is to disable mocking in the generated test suite at all. In that case, the side-effect functions *will* be executed for real. This could resolve the side-effect input problem by using real-world data for the test, but it would make the reliability of the generation lower (why we added mock support in the first place), It also doesn't do anything for the side-effect outputs, which would remain completely untested.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, for this case, we would need to see how to achieve this since we are not explicitly allowing the mocks now, except for the file system:

If you are working with files, make sure to mock the file system since the tests will be run in a sandboxed environment.

Maybe the last part of that sentence is making the LLM create mocks for all tests with external deps?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard not to anthropomorphize the LLMs, but including a situation where it must mock may increase the probability that it decides that it should mock in other situations, too.

I have definitely noticed after that was introduced it is using mocks in many more tests than it was before.

Comment on lines +60 to +74
#### Optional side-effect "arguments"

Treating the side-effect inputs and outputs as non-side-effect inputs and outputs that are automatically set (or ignored for the return type) might make the concept less confusing. Wrapping the side-effects in both in brackets at the end of the lists could help here:

```md
# func cnn(string of section to take headlines from, [https://cnn.com]): list of headlines

This function scrapes the cnn.com website for headlines. The section it takes the headlines from is passed to it, with 'home' referring to the homepage at cnn.com and should be special cased, while 'us' refers to cnn.com/us, 'politics' refers to cnn.com/politics, and so on for every top-level category CNN has.

* cnn('home') = ["Florida's new standards for teaching Black history spark outrage", "His books sold over 300 million copies and were translated into 63 languages. Now, a museum is acknowledging his racism", "Player quits match in tears as tennis world slams opponent’s ‘absolutely disgusting’ actions"]
* cnn('us') = ['18-year-old Miami woman arrested after allegedly trying to hire a hitman to go after her 3-year-old son', 'Investigation into Gilgo Beach serial killings suspect expands to Nevada and South Carolina', 'Rescue crews continue search for 2 children swept away by Pennsylvania floodwater that killed their mother']
* cnn('world') = ["Police raids follow shocking video of sexual assault in India’s Manipur state amid ethnic violence", 'Ukrainian air defenses in Odesa outgunned as Russia targets global grain supply', 'Anger boils over as Kenya’s cost of living protests shake the nation']
```

Here it's clear that the `[https://cnn.com]` is *different*, though exactly how is not clear on a first reading. The fact that this input argument doesn't show up in the input arguments list of the generated function but it does show up in the function body might help clear it up a bit for the user just from inspection, but it could also confuse people that it's a "default" value of some sort that you can replace with some other value when that is not the case.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also [] are sometimes used as optional values and that could be confusing. This alternative would not take into account the "output" side effect either, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the suggestions I preferred, though, even though I agree this particular syntax is confusing. It doesn't help with discoverability, but it does make it obvious that it is an explicitly added feature in Marsha as a project with implications on how the compilation will go, rather than just relying on the LLM to "do the right thing" as some of the other examples imply.

* cnn('world') using a mock of https://cnn.com/world = ["Police raids follow shocking video of sexual assault in India’s Manipur state amid ethnic violence", 'Ukrainian air defenses in Odesa outgunned as Russia targets global grain supply', 'Anger boils over as Kenya’s cost of living protests shake the nation']
```

This approach seems pretty "natural" with the examples, bolting on some special behavior if you follow a particular pattern in the example set, but that may also reduce the discoverability of the mocking mechanism, or implying other features are automatically possible by using similar phrasing for other things.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this option seems interesting. I agree that could reduce the discoverability but not sure if that should be a blocker. What I'm not so sure about is if it would be too verbose/repetitive if all the examples use the same mock. Now, if each example could use a different mock, then this one or a similar alternative seems to be the way to go.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beyond the potentially repetitive nature here, though, it may not be obvious that Marsha would be using that syntax (fn_name(args...) [using a mock of ...] =) to extract what precisely is being mocked and trying to force the test generation phase to use an automatically downloaded copy of the site (or other side-effect source).

Thinking about it more this morning, it's also not clear how multiple mocks would work with this syntax, nor how to capture side-effect outputs like viz function calls or database record changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the hard part is the one related to side-effect outputs, I think none of the current alternatives is able to solve it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The [] syntax option does. # func my_func(regular arg type, [something_to_mock]): return type, [side-effect to intercept/validate]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants