Policy about type name in index name is too harsh #2188

Open
1 of 2 tasks
codefromthecrypt opened this issue Dec 30, 2023 · 3 comments

@codefromthecrypt

What kind of issue is this?

  • Bug report. If you’ve found a bug, please provide a code snippet or test to reproduce it below.
    The easier it is to track down the bug, the faster it is solved.
  • Feature Request. Start by telling us what problem you’re trying to solve.
    Often a solution already exists! Don’t send pull requests to implement new features without
    first getting our support. Sometimes we leave features out on purpose to keep the project small.

Issue description

I can't find the pull request discussion about it, but this commit made it illegal to have a type name inside an index name:

047d80b#diff-082a1d730a4b541ff64014349ca4c0ce987cf6a5a9f92a6c4dd2fc2325348cceR86

Such a strict naming convention breaks upgrade paths, including zipkin's as well as the one described in this other report in the user group.

While it is understandable that type names in index names aren't useful in modern versions, the check shouldn't break people or prevent them from upgrading. There are ecosystem tools that expect a common naming convention for the data in indices, and changing them over is a very large effort to make on account of one draconian source line.

cc @xeraa as this has a huge impact, and if this is declined I think the ecosystem will never upgrade.

Steps to reproduce

Code:

The pending code tries to upgrade zipkin-dependencies to support ES 8 without interfering with ES 7, which already works. This policy makes it impossible to upgrade cleanly.

Stack trace:

org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Detected type name in resource [earch.itelasticsearchdependencies$itdependencies:span-2023-12-30/span]. Remove type name to continue.

	at org.elasticsearch.hadoop.rest.Resource.<init>(Resource.java:88)
	at org.elasticsearch.hadoop.rest.InitializationUtils.checkIndexNameForRead(InitializationUtils.java:61)

Version Info

OS : Darwin (for testing)
JVM : azul-11.0.21
Hadoop/Spark: 3.3.4
ES-Hadoop : 8.11.3
ES : 7.x or 8.x

@codefromthecrypt

I removed ES 6 support from our tool, and will have to mention loudly in the release notes that users must first upgrade from ES 6 to ES 7 before using the es-hadoop version that includes this issue, as it is no longer possible to cover the migration path from 6 to 8 in the same binary.

Feel free to close this if you are ok with the impact

codefromthecrypt pushed a commit to openzipkin/zipkin-dependencies that referenced this issue Dec 30, 2023
This drops support of Elasticsearch 6.x, including writing to old index
templates, as it is no longer possible with the driver that supports 7.x
and 8.x.

See elastic/elasticsearch-hadoop#2188

This also updates to the latest Spark 3.x version possible, currently
3.3.x. Note: we can't update to Scala 2.13 due to a conflict between
connectors: (DataStax) Cassandra requires Spark 3.4 on Scala 2.13, but
Elastic requires Spark 3.3.

See elastic/elasticsearch-hadoop#2187

Signed-off-by: Adrian Cole <[email protected]>
@masseyke

masseyke commented Jan 1, 2024

Can you provide more details on how to reproduce this? You are using es-hadoop 8.11.3 pointed at a 6.x elasticsearch with types? For some reason the es-hadoop client thinks that you are pointed at an 8.x elasticsearch (it discovers the version on startup). This is the logic that is supposed to handle it: https://github.com/elastic/elasticsearch-hadoop/blob/v8.11.3/mr/src/main/java/org/elasticsearch/hadoop/rest/Resource.java#L85.
Or are you trying to use an index name with types when reading from elasticsearch 8.x?
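
For readers following along, the kind of version-dependent check described here can be sketched roughly as below. This is an illustration only, not the es-hadoop source: the class `ResourceNameCheck`, the method `indexOf`, and the cutoff at major version 8 are assumptions made for the example. Only the error message text is taken from the stack trace above.

```java
// Hypothetical sketch; names and the version cutoff are assumptions, not es-hadoop code.
final class ResourceNameCheck {
    private ResourceNameCheck() {}

    /**
     * Returns the index part of a resource string such as "span-2023-12-30/span".
     * A resource with a "/type" suffix is accepted for older clusters and
     * rejected when the discovered major version is 8 or later (assumed cutoff).
     */
    static String indexOf(String resource, int esMajorVersion) {
        int slash = resource.indexOf('/');
        if (slash < 0) {
            return resource; // no type name present, nothing to check
        }
        if (esMajorVersion >= 8) {
            throw new IllegalArgumentException(
                "Detected type name in resource [" + resource + "]. Remove type name to continue.");
        }
        return resource.substring(0, slash);
    }
}
```

Under these assumptions, `ResourceNameCheck.indexOf("span-2023-12-30/span", 6)` returns the bare index name, while the same resource with a discovered version of 8 throws. That matches the symptom reported: the failure depends on which version the client believes it is talking to.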

@xeraa

xeraa commented Jan 1, 2024

Maybe one sidenote for the upgrade story: Apache Lucene only writes the current major version (N) and can only read the previous major version (N-1) or the current one (N). Each Elasticsearch major version also moves to a new Lucene major version. So going from 6 to 8 will generally not work, since you'd need to be able to read N-2, at least not without a reindex in which you could fix the _type.

It might not be perfect but many people will probably only keep tracing data for some time (let's say 30 to 90 days). If possible, I'd do a stepwise upgrade from 6 to 7 or within 7 to the new pattern and then 8 as the data ages out. Not perfect but maybe a reasonable tradeoff in upgrades?
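
The N / N-1 rule above can be written down as a small check. This is a sketch under a simplifying assumption: it uses the Elasticsearch major version as a stand-in for the Lucene major version, which holds only because each Elasticsearch major moves to a new Lucene major. The class `UpgradePath` and the method `canRead` are hypothetical names for illustration.

```java
// Hypothetical helper; treats the Elasticsearch major as a proxy for the Lucene major.
final class UpgradePath {
    private UpgradePath() {}

    /** True when a node on nodeMajor can read an index created on indexCreatedMajor. */
    static boolean canRead(int nodeMajor, int indexCreatedMajor) {
        // Only the current major (N) and the previous major (N-1) are readable.
        return indexCreatedMajor == nodeMajor || indexCreatedMajor == nodeMajor - 1;
    }
}
```

With this rule, `canRead(7, 6)` and `canRead(8, 7)` hold but `canRead(8, 6)` does not, which is why the stepwise path (6 to 7, let old data age out or reindex it, then 8) is the one suggested.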
