Replies: 3 comments 10 replies
-
For screenshots we may also add limits to their dimensions, which may help keep images helpful and of good quality. For example, a 4K screenshot may not provide much value in small thumbnails, given that most people's screens are at or below FHD resolution (more stats here). Reducing the dimensions of the images will also reduce their size in bytes without reducing their quality, and can give package developers guidelines on how dashboards are going to look on the most common screen sizes.
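A minimal sketch of how such a dimension check could look in validation tooling, written in Go with only the standard library; the FHD cap and the function name are hypothetical, not something that exists today:

```go
package validation

import (
	"fmt"
	"image"
	_ "image/jpeg" // register the decoders we expect for screenshots
	_ "image/png"
	"os"
)

// Hypothetical caps, roughly FHD; not part of any existing spec.
const (
	maxScreenshotWidth  = 1920
	maxScreenshotHeight = 1080
)

// checkScreenshotDimensions reads only the image header, so it stays cheap
// even for large files.
func checkScreenshotDimensions(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	cfg, _, err := image.DecodeConfig(f)
	if err != nil {
		return fmt.Errorf("can't decode image %q: %w", path, err)
	}
	if cfg.Width > maxScreenshotWidth || cfg.Height > maxScreenshotHeight {
		return fmt.Errorf("screenshot %q is %dx%d, which exceeds the %dx%d limit",
			path, cfg.Width, cfg.Height, maxScreenshotWidth, maxScreenshotHeight)
	}
	return nil
}
```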
-
For data streams, I would like to see the limit much lower than 500. If we get to a point where a single package sets up more than 50 data streams, we should likely start a discussion on whether that package needs splitting up, or on how we can get Fleet to enable everything. One thing we wanted to do in the past is potentially add short videos to the packages, which would explode the size, for example a quick getting started / demo. Maybe for these assets we can find a way for them to be referenced instead of packaged, so we don't hit limits. In general, I would start with the limits as low as possible. It is easy to increase a limit but really hard to reduce it later on, as that is a breaking change. This also gives us the opportunity to discuss why a limit needs to be increased.
-
Ok, we hit the deadline. Thank you for reviewing and contributing. We'll continue with the appropriate spec updates. Discussion locked.
-
Why do we need to introduce limits?
We need rules to verify if packages produced by our users are compliant with the solutions we provide. We have to reject revisions that may impact the stability of the Package Registry or Kibana or the overall user experience. Once we support real community users and enable a certification platform, we have to pay more attention to potentially malicious activity, hence the idea of introducing hard limits.
These values have been selected somewhat arbitrarily, based on storage observations, infrastructure limits, security risks, and operational experience.
EDIT:
I updated the proposal with the risks of not applying limit changes.
Global hard limits
Package size
250MB - we know about ML model files, which can weigh up to 100MB; these are the heaviest resources we're aware of. If the ML team considers introducing larger models, we can cautiously raise the limit.
Operational risk: the Package Registry or Kibana can't handle packages of unlimited size; they may become non-operational.
Security risk: huge packages can cause out-of-memory errors and may lead to denial of service (cloud deployments).
File size (single file in a package)
150MB - a per-file limit, set 50% higher than the known ML models.
BTW, 4,294,967,296 B (4 GiB) is the maximum file size allowed by the standard ZIP format.
Operational risk: same as for the package size.
Security risk: same as for the package size.
Total number of files in a package
65535 - the ZIP format, which we use for distributing package resources, has a hard limit of 65,535 entries. If we want to exceed this limit, we would have to adopt the Zip64 format.
Operational risk: we can't allow more files if we want to stick with ZIP archives; we won't be able to build and deliver such packages.
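A minimal sketch of how the archive-level checks above (package size, single-file size, number of entries) could be implemented in Go, assuming the proposed values; the function name is hypothetical:

```go
package validation

import (
	"archive/zip"
	"fmt"
	"os"
)

// Proposed limits from this discussion.
const (
	maxPackageSizeBytes = 250 << 20 // 250MB per package archive
	maxFileSizeBytes    = 150 << 20 // 150MB per file inside the archive
	maxFilesInPackage   = 65535     // hard limit of the standard ZIP format
)

func checkArchiveLimits(path string) error {
	info, err := os.Stat(path)
	if err != nil {
		return err
	}
	if info.Size() > maxPackageSizeBytes {
		return fmt.Errorf("package is %d bytes, exceeds %d", info.Size(), maxPackageSizeBytes)
	}

	r, err := zip.OpenReader(path)
	if err != nil {
		return err
	}
	defer r.Close()

	if len(r.File) > maxFilesInPackage {
		return fmt.Errorf("package has %d entries, exceeds %d", len(r.File), maxFilesInPackage)
	}
	for _, f := range r.File {
		if f.UncompressedSize64 > maxFileSizeBytes {
			return fmt.Errorf("file %q is %d bytes, exceeds %d",
				f.Name, f.UncompressedSize64, maxFileSizeBytes)
		}
	}
	return nil
}
```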
Total number of files in a directory
65535 - this constraint is also enforced by the ZIP format.
4,294,967,295 (= 2^32 - 1) - it doesn't seem likely that we'll ever exceed this limit. Google Cloud doesn't publish any folder limits, but …
Operational risk: same as for the "total number of files in a package"
Local hard limits
Total number of data streams
500 - This is way above our current needs. The packages using the most data streams, like AWS or Zeek, define approximately 50 of them. Theoretically speaking, there are around 200 AWS services, so even if we provide separate data streams for logs and metrics, we are safe to cover them all.
Operational risk: there is certainly a natural limit to the number of data streams that Kibana and Elasticsearch can support. Too many data streams may degrade the UX (slow UI).
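A minimal sketch of counting data streams during validation, assuming the usual layout where each data stream lives in its own directory under data_stream/; the function name is hypothetical:

```go
package validation

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

const maxDataStreams = 500 // proposed limit

func checkDataStreamCount(packageRoot string) error {
	entries, err := os.ReadDir(filepath.Join(packageRoot, "data_stream"))
	if errors.Is(err, os.ErrNotExist) {
		return nil // a package without data streams is fine
	}
	if err != nil {
		return err
	}

	count := 0
	for _, e := range entries {
		if e.IsDir() {
			count++
		}
	}
	if count > maxDataStreams {
		return fmt.Errorf("package defines %d data streams, exceeds %d", count, maxDataStreams)
	}
	return nil
}
```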
Total number of fields per data stream
1024 - Elasticsearch enforces a default limit of 1000 fields per index (index.mapping.total_fields.limit); we know the limit can be raised with the appropriate setting. I think it's safe to introduce such a hard limit for a single data stream. If a data stream defines more fields, it may be a sign to the developer that it should be split into multiple data streams.
Operational risk: long Kibana processing time to install the package, and unknown implications on the Elasticsearch side.
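A minimal sketch of counting field definitions in a fields file, assuming the common structure where group entries nest children under fields; the type and function names are hypothetical:

```go
package validation

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v3"
)

const maxFieldsPerDataStream = 1024 // proposed limit

// fieldDef mirrors the usual shape of entries in a fields file: flat fields
// plus "group" entries that nest children under "fields".
type fieldDef struct {
	Name   string     `yaml:"name"`
	Type   string     `yaml:"type"`
	Fields []fieldDef `yaml:"fields"`
}

// countFields counts every entry, including groups and their nested fields.
func countFields(defs []fieldDef) int {
	total := 0
	for _, d := range defs {
		total++
		total += countFields(d.Fields)
	}
	return total
}

// checkFieldsFile counts definitions in a single fields file; a complete check
// would sum the counts over all files in the data stream's fields/ directory.
func checkFieldsFile(path string) error {
	data, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	var defs []fieldDef
	if err := yaml.Unmarshal(data, &defs); err != nil {
		return fmt.Errorf("can't parse %q: %w", path, err)
	}
	if n := countFields(defs); n > maxFieldsPerDataStream {
		return fmt.Errorf("%q defines %d fields, exceeds the limit of %d", path, n, maxFieldsPerDataStream)
	}
	return nil
}
```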
Graphic resources (screenshots, images)
3MB - I’m not sure whether this is too high, but this limit covers all graphic files currently in the Package Storage. We can enable image compression or web optimization, but we can't demand it.
UX risk: the Kibana UI loading slowly, and unneeded data transfer to render the integrations catalog.
Configuration files (YML files - manifests, fields files)
5MB - these files are read by Kibana to install packages.
Operational risk: we need to protect Kibana from processing unbounded content.
Ingest pipelines (JSON or YAML files)
3MB - these files are read by Kibana and installed in Elasticsearch.
Bad design: long pipelines can be considered an antipattern, and we should prevent people from implementing and publishing them in that form. Every pipeline can always be split into multiple sub-pipelines. We can try enabling content compression, but it can't be done with an off-the-shelf YAML/JSON minifier (due to indentation and placeholders).
Operational risk: not all pipelines are covered by tests, which means it's relatively easy to break data collection.
Config templates (HBS files)
2MB - these files are read and processed by Kibana, then loaded by Elastic Agents. Configs should be as simple and concise as possible, but there are packages that define additional long script processors in them (an antipattern or a workaround).
Security risks: