Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support ndjson for application/json requests #4969

Closed
wants to merge 4 commits into from

Conversation

pinkasey
Copy link

@pinkasey pinkasey commented Dec 19, 2022

Backstory -

I have a workflows that starts with a webhook.
This webhook is called with a request with Content-Type: application/json, and ndjson body, like this:

{ "id": 1, "message": "hello" }
{ "id": 2, "message": "I'm a JSON inside a NDJSON" }

I am completely unable to make the webhook work, because the parsing of the json is happening in Server.ts, so fiddling with the workflow settings (e.g, setting rawBody to true) - has no effect.
I get an error in the log, and the workflow is not started.

What I propose in this PR -

If a request is made with Content-Type: application/json, and parsing the body as json fails -
try to parse it as ndjson.

  • If that works - great! return an array of all the jsons in the body (1 line = 1 item)
  • if not - log an error, and pass the original answer.

Note:
In a perfect world, the API I'm consuming would send the request to the webhook with application/x-ndjson.
But I have no control over that - it's a 3rd party. And that would be the experience of most users of n8n.
I also tried to see what would've happened if the Content-Type was right - I got an empty body.

This PR is related to this feature-request from body-parser:
expressjs/body-parser#478
(If that feature-request is handled, then we could use it here instead of this PR. But I don't know if they'd answer)

@CLAassistant
Copy link

CLAassistant commented Dec 19, 2022

CLA assistant check
All committers have signed the CLA.

@n8n-assistant n8n-assistant bot added community Authored by a community member core Enhancement outside /nodes-base and /editor-ui labels Dec 19, 2022
}
return result;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

function parseNdjson(str: string): any[] {
  const items = str.trim().split('\n');
  return items.map((item, i) => {
    try {
      return JSON.parse(item);
    } catch (e) {
      throw new Error(
        `error parsing ndjson, while parsing item ${i + 1} of ${items.length}: ${e.message}`,
      );
    }
  });
}

version of the code is equivalent to the original code, but it uses the map method to transform the array of strings into an array of parsed JSON objects, rather than using a for loop. This can make the code easier to read and understand. Additionally, the trim method is used to remove any leading or trailing whitespace from the input string, and the filter method is removed since it is not necessary. Finally, the error message is modified to correctly reference the current item being parsed in the map callback.

Copy link
Author

@pinkasey pinkasey Jan 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!
The code looks better indeed, more readable and there's less memory allocation - trimming each line allocates a new string.

About removing the filter -
an empty line is acceptable according to the spec:

The parser MAY silently ignore empty lines, e.g. \n\n. This behavior MUST be documented and SHOULD be configurable by the user of the parser.

From my experience, API providers WILL do stupid things like putting empty-lines in the middle of an ndjson, occasionally or all the time (reminder - I opened this PR because of an API provider that did not adhere to standard).
It would be a shame that the n8n user get stuck because of this.

So I think it's better to filter out empty-lines nevertheless.
I wish there was a str.isEmpty() that does not allocates a new string.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree, it would be fab to have a str.isEmpty() method that does not allocate a new string, but for now, I think, using the trim method is a good solution to avoid unnecessary memory allocation. No?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about what i wrote about filtering out empty lines?
Do you agree?

next(err);
};
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

function onJsonErrorTryNdjson() {
  return (err: any, req: express.Request, res: express.Response, next: express.NextFunction) => {
    if (
      typeis(req, ['application/json', 'application/x-ndjson']) &&
      err.type === 'entity.parse.failed'
    ) {
      let body = err.body;
      const match = body.match(/^\s*\{[^\n]*}\s*(\n\s*\{[^\n]*}\s*)+$/);
      if (match) {
        try {
          const result = parseNdjson(body);
          req.body = result;
          req.rawBody = Buffer.from(`[${result.join(',\n')}]`);
          res.status(200);
          next();
          return;
        } catch (e) {
          console.error(e);
        }
      }
    }
    next(err);
  };
}

This code is equivalent to the original code, but it uses the match method to check if the body string matches the regular expression, rather than using the match method in a boolean context. This can make the code easier to read and understand. Additionally, the return statement at the end of the if block is moved to the end of the block, to improve readability and to avoid having to use a comment to indicate the end of the "happy flow" code path. Finally, the variable match is introduced to store the result of the match method, so that it can be used in the if statement.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this ChatGPT?

Additionally, the return statement at the end of the if block is moved to the end of the block...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the changes made to the code were intended to improve its readability and maintainability, as when I looked at it at first, I wasn't sure that's the best option to go about it.

You are basically using the match method in a boolean context, which makes the whole thing not clear (at least to me), and not obvious what is being checked for. I used the match method to check if the body string matches the rregex, and I think this is more explicit and straightforward.

I hope that explains things.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lol, but appreciate the doubt :)

@whoiskatrin
Copy link

General questions to this approach:

  • what if you cant determine the type of the request body based on the Content-Type header (for example, incorrect header? or if the type is generic)

@pinkasey
Copy link
Author

General questions to this approach:

  • what if you cant determine the type of the request body based on the Content-Type header (for example, incorrect header? or if the type is generic)

If we can't determine the type of the request body based on the Content-Type header, then we continue as usual, no change there.
I believe the default body type is plain text.
If that causes an error - very well. An incorrect Content-Type is a serious error by the client.

@pinkasey pinkasey requested a review from whoiskatrin March 16, 2023 11:12
@pinkasey
Copy link
Author

I've accepted CR comments, and rebased this branch.

I've since stopped using n8n, so I will not make further changes to this PR.
I hope this PR can be of help to others.

@netroy
Copy link
Member

netroy commented Jun 20, 2024

Sorry that this never got reviewed or merged. Unfortunately this can't be rebased anymore, since Server.ts has changed quite a lot.

If anyone needs ndjson support on webhooks, you can enable the "Binary Data" option on the webhook, and then parse this data in a "Code" node.

If you think this should actually be a feature in n8n, then please create a Feature request, so that this can first be discussed with the team before anything is implemented 🙏🏽

@netroy netroy closed this Jun 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community Authored by a community member core Enhancement outside /nodes-base and /editor-ui
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants