"SPDX-License-Identifier: Artistic-1.0+" not recognised #3256

Open
vargenau opened this issue Feb 15, 2023 · 7 comments

@vargenau
Contributor

Description

In a PHP source file, I have the following line:

* SPDX-License-Identifier: Artistic-1.0+

This results in:

LicenseInfoInFile: LicenseRef-scancode-unknown-spdx

and

LicenseID: LicenseRef-scancode-unknown-spdx
LicenseName: Unknown SPDX license detected but not recognized
LicenseComment: <text>See details at https://github.com/nexB/scancode-toolkit/blob/develop/src/licensedcode/data/licenses/unknown-spdx.yml
</text>
ExtractedText:  * SPDX-License-Identifier: Artistic-1.0+

I would have expected:

LicenseInfoInFile: Artistic-1.0+

Moreover, the URL https://github.com/nexB/scancode-toolkit/blob/develop/src/licensedcode/data/licenses/unknown-spdx.yml gives a 404 error.

How To Reproduce

svn checkout https://svn.code.sf.net/p/phpwiki/code/trunk phpwiki
./scancode -c -l -i --license-text --spdx-tv phpwiki.spdx phpwiki

Resulting SPDX file:

phpwiki.spdx.txt

System configuration

./scancode --version
ScanCode version: 32.0.0rc1
ScanCode Output Format version: 3.0.0
SPDX License list version: 3.19

Ubuntu 22.10

@vargenau vargenau added the bug label Feb 15, 2023
@AyanSinhaMahapatra
Member

@vargenau this is because the SPDX license identifier Artistic-1.0+ does not exist AFAIK. From https://spdx.org/licenses/, there are the following identifiers for Artistic licenses:

| Name | SPDX-License-Identifier | Link |
|---|---|---|
| Artistic License 1.0 | Artistic-1.0 | Artistic License 1.0 |
| Artistic License 1.0 w/clause 8 | Artistic-1.0-cl8 | Artistic License 1.0 w/clause 8 |
| Artistic License 1.0 (Perl) | Artistic-1.0-Perl | Artistic License 1.0 (Perl) |
| Artistic License 2.0 | Artistic-2.0 | Artistic License 2.0 |

So the LicenseRef-scancode-unknown-spdx detection is intended behavior for now. It means there was an unidentified SPDX license.

But we probably can do better here, I propose the following:

  1. In the new license detection post-processing, if we encounter a LicenseRef-scancode-unknown-spdx match, we can run normal ScanCode license detection again, instead of only the expression parsing we do now on encountering the SPDX-License-Identifier: prefix.
  2. We make the resulting detected_license_expression whatever we find there, but also keep the LicenseRef-scancode-unknown-spdx match in the detection object, with relevant logs to identify what happened.
  3. We can also add a specific rule to detect this case correctly.
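
A minimal sketch of steps 1 and 2, assuming two hypothetical callables (`parse_spdx` and `detect_text`) that stand in for expression parsing and regular text detection. These are not ScanCode APIs and the result layout is invented for illustration:

```python
UNKNOWN_SPDX = "LicenseRef-scancode-unknown-spdx"


def detect_with_fallback(line, parse_spdx, detect_text):
    """Try SPDX expression parsing first, then fall back to text detection.

    parse_spdx and detect_text are hypothetical callables that return a
    license expression string, or None when nothing is recognised.
    """
    log = []
    expression = parse_spdx(line)
    if expression is not None:
        return {"expression": expression, "matches": [expression], "log": log}

    # Step 1: the SPDX expression was not recognised, so re-run detection.
    log.append("SPDX expression not recognised; re-ran regular detection")
    fallback = detect_text(line)

    # Step 2: report the fallback result but keep the unknown-spdx match.
    return {
        "expression": fallback or UNKNOWN_SPDX,
        "matches": [UNKNOWN_SPDX] + ([fallback] if fallback else []),
        "log": log,
    }
```

With stub callables, a line that fails expression parsing but is caught by text detection would report the detected expression while still recording the unknown-spdx match and a log entry.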

@pombredanne what do you think?

Also:

Moreover, the URL https://github.com/nexB/scancode-toolkit/blob/develop/src/licensedcode/data/licenses/unknown-spdx.yml gives a 404 error.

This is a bug, thanks for reporting. We'll add the easy fix soon.
This should have been: https://github.com/nexB/scancode-toolkit/blob/develop/src/licensedcode/data/licenses/unknown-spdx.LICENSE as we recently updated all our rules/licenses to be in a single file instead of two like before. See #3049 for reference.

@vargenau
Contributor Author

Hi @AyanSinhaMahapatra,

Thank you for taking my report into account.

In a source file the SPDX-License-Identifier can be a license identifier from the SPDX License List https://spdx.org/licenses/ but it can also be a license expression.

This is the case here. You have a valid license identifier Artistic-1.0 followed by the + meaning it can be Artistic-1.0 or a later version of this license.

See https://spdx.github.io/spdx-spec/v2.3/SPDX-license-expressions/ for license expressions and https://spdx.github.io/spdx-spec/v2.3/using-SPDX-short-identifiers-in-source-files/ for using short identifiers in source files.
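
To illustrate how such a tag splits into a license identifier plus the "+" operator, here is a rough sketch. The `KNOWN_IDS` set is a tiny hand-picked subset assumed for the example, `parse_simple_tag` is a hypothetical helper, and only the single-license case is handled (no AND/OR/WITH):

```python
import re

# Assumption: a tiny illustrative subset of the SPDX License List.
KNOWN_IDS = {"Artistic-1.0", "Artistic-2.0", "GPL-2.0-only", "GPL-2.0-or-later"}

TAG = re.compile(r"SPDX-License-Identifier:\s*(?P<expr>.+?)\s*$")


def parse_simple_tag(line):
    """Return (license_id, or_later) for a single-license tag, else None."""
    match = TAG.search(line)
    if not match:
        return None
    expr = match.group("expr")
    # A trailing "+" is the "or later" operator, not part of the identifier.
    or_later = expr.endswith("+")
    license_id = expr[:-1] if or_later else expr
    if license_id not in KNOWN_IDS:
        return None
    return license_id, or_later


print(parse_simple_tag(" * SPDX-License-Identifier: Artistic-1.0+"))
```

Under these assumptions the line from the PHP file parses as the known identifier Artistic-1.0 with the "or later" flag set.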

@AyanSinhaMahapatra
Member

@vargenau thanks for pointing out my mistake, I had no idea this was supported in SPDX. 😅

Thanks for the links, I'll look into it more. If we want to support these expressions, we will have to do something similar to what I mentioned above.

@pombredanne
Member

@vargenau Actually, we handle this slightly differently in ScanCode toolkit.

By design, we do not have a concept of an arbitrary "+" suffix, i.e., an "or later" addition. Instead, we create one license key for each license where we have such a thing showing up in the wild, and we maintain multiple SPDX alternative ids in these cases.

For instance with the GPL-2.0, we have this data:
https://scancode-licensedb.aboutcode.org/gpl-2.0-plus.html

spdx_license_key: GPL-2.0-or-later 
other_spdx_license_keys:
    - GPL-2.0+
    - GPL 2.0+

We will always report GPL-2.0-or-later but accept the other keys too when scanning.
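
As a rough illustration of that lookup, using the record quoted above: the dict layout, the helper names and the case-insensitive matching are assumptions made for this sketch, not ScanCode's actual implementation.

```python
# Data copied from the gpl-2.0-plus record quoted above; the dict layout
# and the function names below are illustrative assumptions.
LICENSES = {
    "gpl-2.0-plus": {
        "spdx_license_key": "GPL-2.0-or-later",
        "other_spdx_license_keys": ["GPL-2.0+", "GPL 2.0+"],
    },
}


def build_spdx_index(licenses):
    """Map every accepted SPDX spelling (lowercased) to its license key."""
    index = {}
    for key, record in licenses.items():
        names = [record["spdx_license_key"]]
        names.extend(record.get("other_spdx_license_keys", []))
        for name in names:
            index[name.lower()] = key
    return index


def reported_spdx_key(spdx_id, licenses=LICENSES):
    """Return the one SPDX key that is reported, or None if unknown."""
    key = build_spdx_index(licenses).get(spdx_id.lower())
    return licenses[key]["spdx_license_key"] if key else None


print(reported_spdx_key("GPL-2.0+"))
print(reported_spdx_key("Artistic-1.0+"))
```

With only this one record loaded, GPL-2.0+ resolves to GPL-2.0-or-later, while Artistic-1.0+ has no record and comes back as None, which corresponds to the unknown SPDX case.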

Here, if SPDX-License-Identifier: Artistic-1.0+ is a thing, then we could add a license key for it. Otherwise it will be detected as the unknown SPDX license key "LicenseRef-scancode-unknown-spdx".

BTW do you have an original notice for "./lib/HttpClient.php" that has this Artistic or later license? Do you have more examples of the same?
I could not find an original Artistic license notice for https://web.archive.org/web/20060316094821/http://scripts.incutio.com/httpclient/manual.php and the only reference is in https://web.archive.org/web/20030721002452/http://simon.incutio.com/archive/2003/04/06/onyxRelicensed

Onyx Relicensed

Ed Swindelles has relicensed his Onyx RSS Parser under the MIT License, meaning it can now be used without risk for building commercial software. I haven't decided which license to place HttpClient under, but that one looks like a pretty good bet. I'll work it out in the morning.

I could not find much more in the history of https://github.com/pombredanne/svn.code.sf.net-p-phpwiki-code/commits/master/lib/HttpClient.php, in particular where the "or later" would have come from.

I see though that it used to be (Artistic-1.0 OR Artistic-2.0), changed in pombredanne/svn.code.sf.net-p-phpwiki-code@89c1a05, and GPL-2.0-or-later with https://github.com/pombredanne/svn.code.sf.net-p-phpwiki-code/blame/14232319a9c3c4f737eb2df2522e057c99805f57/lib/HttpClient.php

So I could not find any public evidence upstream that this was ever licensed under the Artistic-1.0 license. But assuming it was, there is no evidence this was Artistic-1.0 or a later version.

@vargenau
Contributor Author

Hi Philippe,

Thank you for your detailed analysis.

The original file, when it was imported to PhpWiki, had the following header:

/**
Version 0.9, 6th April 2003 - Simon Willison ( http://simon.incutio.com/ )
Manual: http://scripts.incutio.com/httpclient/

Copyright © 2003 Incutio Limited
License: http://www.opensource.org/licenses/artistic-license.php
*/

A few years ago, I started adding the SPDX-License-Identifier in all files of the project.
Most files were GPL.
In this one, the URL http://www.opensource.org/licenses/artistic-license.php goes to a page that says there are now two versions of the license.
So I put

SPDX-License-Identifier: Artistic-1.0+

Perhaps this was a mistake.

What was clearly a mistake is that I added a GPL header (that I removed when I noticed it).

To confuse the issue further, I had replaced Artistic-1.0+ with (Artistic-1.0 OR Artistic-2.0) when I mistakenly believed that the "+" had been removed from SPDX.
In fact, "+" is deprecated only for the GPL family. When I realized this, I put Artistic-1.0+ back.

But my question was more general: why create a custom LicenseID when an SPDX LicenseID exists?

In the case of:

SPDX-License-Identifier: Artistic-1.0 OR Artistic-2.0

or

SPDX-License-Identifier: Artistic-1.0 AND Artistic-2.0

the result is

PackageLicenseInfoFromFiles: Artistic-1.0
PackageLicenseInfoFromFiles: Artistic-2.0

which creates no custom LicenseID.
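
For illustration, here is a small sketch of how the individual identifiers can be pulled out of such a compound expression. `license_ids` is a hypothetical helper, it handles only AND, OR and parentheses (not WITH), and it is not how ScanCode builds its SPDX output:

```python
import re


def license_ids(expression):
    """List the distinct license ids in a simple AND/OR expression."""
    # Split on whitespace and parentheses, keeping everything else.
    tokens = re.findall(r"[^\s()]+", expression)
    ids = []
    for token in tokens:
        if token in ("AND", "OR"):
            continue  # operators are not license ids
        if token not in ids:
            ids.append(token)
    return ids


print(license_ids("Artistic-1.0 OR Artistic-2.0"))
print(license_ids("(Artistic-1.0 AND Artistic-2.0)"))
```

Both expressions yield the same two identifiers, each of which is on the SPDX License List, so no custom LicenseRef is needed for them.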

So, yes, I would prefer to have

LicenseInfoInFile: Artistic-1.0+

rather than a custom license. But it is not a major issue; the current behavior is correct.

@pombredanne
Member

The original file, when it was imported to PhpWiki, had the following header:

/**
Version 0.9, 6th April 2003 - Simon Willison ( http://simon.incutio.com/ )
Manual: http://scripts.incutio.com/httpclient/

Copyright © 2003 Incutio Limited
License: http://www.opensource.org/licenses/artistic-license.php
*/

This Artistic notice unfortunately never made it to archive.org: none of the upstream versions have a license per https://web.archive.org/web/20080513205537/http://scripts.incutio.com/httpclient/HttpClient.class.php

And at the time, from 2002 to 2006, the link to opensource.org pointed to the Artistic 1.0 license:
https://web.archive.org/web/20060403041708/http://opensource.org/licenses/artistic-license.php https://web.archive.org/web/20020624033843/http://www.opensource.org/licenses/artistic-license.php
Therefore, based on this and the notice above, I cannot discern an intention to use any version of the Artistic license other than the one that existed when Simon selected it back in 2003. I would not consider that the author's intention was Artistic-1.0 or later, unless Simon remembers any of this from 20 years ago and cares to chime in despite his busy schedule ;)

Dear Simon @simonw :
What was your license for http://scripts.incutio.com/httpclient/ ?
Thanks!

But my question was more general: why create a custom LicenseID when an SPDX LicenseID exists?

This was an early design choice for ScanCode that even predates SPDX: every license MUST have its own concrete record and key, because there are several cases where this matters, GPL being the most prominent. Having a "+" modifier on an existing license id creates a wart in the data model and in practice, all lawyers I chatted with consider an "or later" license variant as different license terms, so it is best to treat them as a different license record.

Frankly, my opinion is that the adoption by SPDX of a "plus" suffix modifying a license was a mistake and SPDX should instead have used only concrete license ids. This is what eventually happened with the A/L/GPL licenses, where we now have the GPL-2.0-only and GPL-2.0-or-later variants.

With these changes (from circa 2017, under pressure from rms), the current state of SPDX ids is half concrete ids and half "+" modifier suffixes, and this is messy.

Technically GPL-2.0-only+ is a valid identifier... but what does this mean?

@vargenau
Contributor Author

Hi Philippe,

I fully agree that GPL-2.0-only+ is legal and makes no sense.

That is why I had proposed last year to add in the SPDX spec a Boolean attribute to licenses, indicating whether or not they support the "+" operator.

See spdx/spdx-spec#689

But it had no support.
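
A rough sketch of what such a Boolean attribute could enable. The `SUPPORTS_PLUS` values and the `check_or_later` helper are purely illustrative assumptions; neither is SPDX data nor the wording of the proposal:

```python
# Hypothetical per-license flag; the values are illustrative, not SPDX data.
SUPPORTS_PLUS = {
    "Artistic-1.0": True,
    "GPL-2.0-only": False,
    "GPL-2.0-or-later": False,
}


def check_or_later(expression):
    """Return an error message for a misuse of "+", or None if acceptable."""
    if not expression.endswith("+"):
        return None
    license_id = expression[:-1]
    if license_id not in SUPPORTS_PLUS:
        return f"unknown license id: {license_id}"
    if not SUPPORTS_PLUS[license_id]:
        return f'"+" is not meaningful on {license_id}'
    return None


print(check_or_later("GPL-2.0-only+"))
print(check_or_later("Artistic-1.0+"))
```

With a flag like this, a validator could reject GPL-2.0-only+ while still accepting "+" on licenses that allow it.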

pombredanne added a commit that referenced this issue Feb 28, 2023
Add new rules to improve detection accuracy in phpwiki.

Reference: #3256
Signed-off-by: Philippe Ombredanne <[email protected]>
pombredanne added a commit that referenced this issue Apr 24, 2023
Add new rules to improve detection accuracy in phpwiki.

Reference: #3256
Signed-off-by: Philippe Ombredanne <[email protected]>