Replies: 3 comments 10 replies
-
For screenshots we may also add limits to their dimensions, which may help keep images helpful and of good quality. For example, a 4K screenshot may not provide much value in small thumbnails, given that most people's screens are at or below FHD resolution (more stats here). Reducing the dimensions of the images will also reduce their size in bytes without reducing their quality, and can give package developers guidelines on how dashboards are going to look on the most common screen sizes.
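A minimal sketch of how such a dimension check could look in validation tooling, written in Go with only the standard library; the FHD cap and the function name are hypothetical, not something that exists today:

```go
package validation

import (
	"fmt"
	"image"
	_ "image/jpeg" // register the decoders we expect for screenshots
	_ "image/png"
	"os"
)

// Hypothetical caps, roughly FHD; not part of any existing spec.
const (
	maxScreenshotWidth  = 1920
	maxScreenshotHeight = 1080
)

// checkScreenshotDimensions reads only the image header, so it stays cheap
// even for large files.
func checkScreenshotDimensions(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	cfg, _, err := image.DecodeConfig(f)
	if err != nil {
		return fmt.Errorf("can't decode image %q: %w", path, err)
	}
	if cfg.Width > maxScreenshotWidth || cfg.Height > maxScreenshotHeight {
		return fmt.Errorf("screenshot %q is %dx%d, which exceeds the %dx%d limit",
			path, cfg.Width, cfg.Height, maxScreenshotWidth, maxScreenshotHeight)
	}
	return nil
}
```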
-
For data streams, I would like to see the limit much lower than 500. If we get to a point where a single package sets up more than 50 data streams, we should likely start a discussion on whether that package needs splitting up, or on how we can get Fleet to enable everything. One thing we wanted to do in the past is potentially add short videos to the packages, which would explode the size, for example a quick getting started / demo. Maybe for these assets we can find a way for them to be referenced instead of packaged, so we don't hit limits. In general, I would start with the limits as low as possible. It is easy to increase a limit but really hard to reduce it later on, as that is a breaking change. This also gives us the opportunity to discuss why a limit needs to be increased.
-
Ok, we hit the deadline. Thank you for reviewing and contributing. We'll continue with the appropriate spec updates. Discussion locked.
-
Why do we need to introduce limits?
We need rules to verify if packages produced by our users are compliant with the solutions we provide. We have to reject revisions that may impact the stability of the Package Registry or Kibana or the overall user experience. Once we support real community users and enable a certification platform, we have to pay more attention to potentially malicious activity, hence the idea of introducing hard limits.
These values have been selected somewhat arbitrarily, based on storage observations, infrastructure limits, security risks, and operational experience.
EDIT:
I updated the proposal with the risks of not applying limit changes.
Global hard limits
Package size
250MB - we know about ML model files, which can weigh up to 100MB; these are the heaviest resources we're aware of. If the ML team considers introducing larger models, we can cautiously raise the limit.
Operational risk: the Package Registry or Kibana can't handle packages of unlimited size; they may become non-operational.
Security risk: huge packages can cause out-of-memory errors and may lead to denial of service (cloud deployments).
File size (single file in a package)
150MB - a per-file limit, set 50% higher than the known ML models.
BTW, 4,294,967,296 B (4 GiB) is the maximum file size allowed by the standard ZIP format.
Operational risk: same as for the package size.
Security risk: same as for the package size.
Total number of files in a package
65535 - the ZIP format, which we use for distributing package resources, has a hard limit of 65,535 entries. If we want to exceed this limit, we would have to adopt the Zip64 format.
Operational risk: we can't allow more files if we want to stick with ZIP archives; we won't be able to build and deliver such packages.
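A minimal sketch of how the archive-level checks above (package size, single-file size, number of entries) could be implemented in Go, assuming the proposed values; the function name is hypothetical:

```go
package validation

import (
	"archive/zip"
	"fmt"
	"os"
)

// Proposed limits from this discussion.
const (
	maxPackageSizeBytes = 250 << 20 // 250MB per package archive
	maxFileSizeBytes    = 150 << 20 // 150MB per file inside the archive
	maxFilesInPackage   = 65535     // hard limit of the standard ZIP format
)

func checkArchiveLimits(path string) error {
	info, err := os.Stat(path)
	if err != nil {
		return err
	}
	if info.Size() > maxPackageSizeBytes {
		return fmt.Errorf("package is %d bytes, exceeds %d", info.Size(), maxPackageSizeBytes)
	}

	r, err := zip.OpenReader(path)
	if err != nil {
		return err
	}
	defer r.Close()

	if len(r.File) > maxFilesInPackage {
		return fmt.Errorf("package has %d entries, exceeds %d", len(r.File), maxFilesInPackage)
	}
	for _, f := range r.File {
		if f.UncompressedSize64 > maxFileSizeBytes {
			return fmt.Errorf("file %q is %d bytes, exceeds %d",
				f.Name, f.UncompressedSize64, maxFileSizeBytes)
		}
	}
	return nil
}
```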
Total number of files in a directory
65535 - this constraint is also enforced by the ZIP format.
4,294,967,295 (= 2^32 - 1) - it doesn't seem likely that we'll ever exceed this limit. Google Cloud doesn't publish any folder limits, but …
Operational risk: same as for the "total number of files in a package"
Local hard limits
Total number of data streams
500 - This is way above our current needs. The packages using the most data streams, like AWS or Zeek, define approximately 50 of them. Theoretically speaking, there are around 200 AWS services, so even if we provide separate data streams for logs and metrics, we are safe to cover them all.
Operational risk: there is certainly a natural limit to the number of data streams that Kibana and Elasticsearch can support. Too many data streams may degrade the UX (slow UI).
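A minimal sketch of counting data streams during validation, assuming the usual layout where each data stream lives in its own directory under data_stream/; the function name is hypothetical:

```go
package validation

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

const maxDataStreams = 500 // proposed limit

func checkDataStreamCount(packageRoot string) error {
	entries, err := os.ReadDir(filepath.Join(packageRoot, "data_stream"))
	if errors.Is(err, os.ErrNotExist) {
		return nil // a package without data streams is fine
	}
	if err != nil {
		return err
	}

	count := 0
	for _, e := range entries {
		if e.IsDir() {
			count++
		}
	}
	if count > maxDataStreams {
		return fmt.Errorf("package defines %d data streams, exceeds %d", count, maxDataStreams)
	}
	return nil
}
```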
Total number of fields per data stream
1024 - Elasticsearch enforces a default limit of 1000 fields per index (index.mapping.total_fields.limit); we know the limit can be raised with the appropriate setting. I think it's safe to introduce such a hard limit for a single data stream. If a data stream defines more fields, it may be a sign to the developer that it should be split into multiple data streams.
Operational risk: long Kibana processing time to install the package, and unknown implications on the Elasticsearch side.
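A minimal sketch of counting field definitions in a fields file, assuming the common structure where group entries nest children under fields; the type and function names are hypothetical:

```go
package validation

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v3"
)

const maxFieldsPerDataStream = 1024 // proposed limit

// fieldDef mirrors the usual shape of entries in a fields file: flat fields
// plus "group" entries that nest children under "fields".
type fieldDef struct {
	Name   string     `yaml:"name"`
	Type   string     `yaml:"type"`
	Fields []fieldDef `yaml:"fields"`
}

// countFields counts every entry, including groups and their nested fields.
func countFields(defs []fieldDef) int {
	total := 0
	for _, d := range defs {
		total++
		total += countFields(d.Fields)
	}
	return total
}

// checkFieldsFile counts definitions in a single fields file; a complete check
// would sum the counts over all files in the data stream's fields/ directory.
func checkFieldsFile(path string) error {
	data, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	var defs []fieldDef
	if err := yaml.Unmarshal(data, &defs); err != nil {
		return fmt.Errorf("can't parse %q: %w", path, err)
	}
	if n := countFields(defs); n > maxFieldsPerDataStream {
		return fmt.Errorf("%q defines %d fields, exceeds the limit of %d", path, n, maxFieldsPerDataStream)
	}
	return nil
}
```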
Graphic resources (screenshots, images)
3MB - I’m not sure whether this is too high, but this limit covers all graphic files currently in the Package Storage. We can enable image compression or web optimization, but we can't demand it.
UX risk: the Kibana UI loading slowly, and unneeded data transfer to render the integrations catalog.
Configuration files (YML files - manifests, fields files)
5MB - these files are read by Kibana to install packages.
Operational risk: we need to protect Kibana from processing unbounded content.
Ingest pipelines (JSON or YAML files)
3MB - these files are read by Kibana and installed in Elasticsearch.
Bad design: long pipelines can be considered an antipattern, and we should prevent people from implementing and publishing them in that form. Every pipeline can always be split into multiple sub-pipelines. We can try enabling content compression, but it can't be done with an off-the-shelf YAML/JSON minifier (due to indentation and placeholders).
Operational risk: not all pipelines are covered by tests, which means it's relatively easy to break data collection.
Config templates (HBS files)
2MB - these files are read and processed by Kibana, then loaded by Elastic Agents. Configs should be as simple and concise as possible, but there are packages that define additional long script processors in them (an antipattern or a workaround).
Security risks: