Minimal code needed for declarative usage #247

Open
abalter opened this issue Aug 26, 2024 · 14 comments

Comments

@abalter

abalter commented Aug 26, 2024

This is somewhat unfair to ask, but if someone can help me, it would mean a huge amount. The citeproc-js library is pretty complex and using it requires creating other functions (retrieveItem, retrieveLocale) that don't fully make sense to me. The package is designed to be able to do a large number of things across a large number of use cases.

All I want to do is generate formatted citations given a CSL-JSON library, CSL stylesheet, and locale spec like this:

function getCitation({
  citationIds = [],  // Array of citation ids
  type = "",         // "in-text" or "full"
  sources = "",      // CSL-JSON string
  style = "",        // CSL stylesheet as JSON string
  locale = ""        // Locale XML as JSON string
}) {
  /**
   * Retrieve citations based on specified parameters.
   * 
   * @param {Array} citationIds - List of citation IDs
   * @param {String} type - Type of citation ("in-text" or "full")
   * @param {String} sources - CSL-JSON formatted string of sources
   * @param {String} style - CSL stylesheet converted to JSON
   * @param {String} locale - Locale XML file converted to JSON
   */
  
  // Assuming equivalent functions exist to parse the JSON, stylesheet, and XML
  const sources_object = JSON.parse(sources);
  const style_object = JSON.parse(style);
  const locale_object = JSON.parse(locale);
  
  const selected_sources = sources_object.filter(source => citationIds.includes(source.id));
  
  if (type === "in-text") {
    return selected_sources.map(source => getShortCitation(source, style_object, locale_object));
  } else if (type === "full") {
    return selected_sources.map(source => getFullCitation(source, style_object, locale_object));
  } else {
    console.log("Invalid type: " + type);
    return false;
  }
}

Could someone guide me to the pertinent methods that I could use to build this simple application?

@fbennett
Contributor

I wrote citeproc-js, maybe I can help. First off, by "in-text" and "full," do you mean something like APA in the first case, and something like Chicago Manual footnote style in the second? Or does the first mean "in the document" and the second "in the bibliography"?

@abalter
Author

abalter commented Aug 26, 2024

Hi @fbennett. Thanks for offering to help!

By "in-text" I mean inline citations, what is produced by Cite.format('citation', ... in citation.js. This is probably close to a "citation cluster". For example (Loomes, 2017, pp. 23-27).

By "full" I mean what would go into a bibliography or references list. I'm not sure if citeproc.makeBibliography(filter) returns single full citations or only a full bibliography (the entire library). I don't know what the filter variable is.

https://help.quillbot.com/hc/en-us/articles/4408078736023-What-is-the-difference-between-in-text-citations-and-full-citations

I'm wondering if a "citation cluster" is an in-text citation with one or more references?

I do like the objects returned by makeCitationCluster (here) and makeBibliography (here).

Now that I'm looking over the docs again, I might be grokking it better. This is what I think I'm seeing:

The citeproc instance is initialized with a library of sources (CSL-JSON), and the ability to format citations and references in ANY style or locale specified. This is mediated by the sys object that the user has to create.

I think this creates an enormous overhead for my needs. If I know my style and locale ahead of time and know I'm going to use those, I would like to be able to instantiate a library that can directly create citations without having to know how to go fetch styles and locales.
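
A minimal sketch of the kind of wrapper described above, assuming the style and locale are already in hand as XML strings and the sources as a parsed CSL-JSON array. The name getCitations and its parameter names are hypothetical. CSL is assumed to be the object exported by the citeproc package; it is passed in as an argument so the sketch has no hidden dependencies. CSL.Engine, updateItems, makeCitationCluster, and makeBibliography are citeproc-js calls; the two sys callbacks have to answer synchronously, which is why everything is held in memory here.

```javascript
// Hypothetical wrapper around citeproc-js for a fixed style and locale.
// `CSL` is assumed to be the export of the citeproc package.
function getCitations({ CSL, citationIds, type, items, styleXml, localeXml }) {
  // Index the CSL-JSON items by id so lookups are cheap.
  const byId = new Map(items.map((item) => [String(item.id), item]));

  // The sys object citeproc-js asks for: both callbacks return immediately.
  const sys = {
    retrieveLocale: () => localeXml,
    retrieveItem: (id) => byId.get(String(id)),
  };

  const engine = new CSL.Engine(sys, styleXml);
  engine.updateItems(citationIds); // register the items to be cited

  if (type === "in-text") {
    // One uncontextualized cluster per id, each returned as a string.
    return citationIds.map((id) => engine.makeCitationCluster([{ id }]));
  }
  if (type === "full") {
    // makeBibliography returns [params, entries], or false when the style
    // defines no bibliography.
    const result = engine.makeBibliography();
    return result ? result[1] : [];
  }
  throw new Error(`Invalid type: ${type}`);
}
```

Rendering each id as its own cluster ignores document context (position, ibid, disambiguation across clusters), so this only suits the simple one-off case described here.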

Maybe we could consider making the code more modular? Something I would be willing to help with.

@fbennett
Contributor

fbennett commented Aug 26, 2024 via email

@abalter
Author

abalter commented Aug 26, 2024

First let me say that I understand if this all sounds very critical. Just having CSL, Citeproc-JS, and Citation-JS is an amazing thing! I can see an immense amount of work went into creating the specs and writing the code. It's a huge boon to the academic world.

I do find both of the JS libraries to be quite difficult to use and the code looks like it could possibly be a lot simpler if it were modularized. For example, it would be fantastic if there was a single function that received a single source, style, and locale all as JSON or JavaScript objects and returned an inline citation. But that functionality appears to be entangled with other operations. Although I could be wrong about that.

Maybe what I'm actually suggesting is a feature request:

var citeproc = new CSL.DeclarativeEngine(style, lang);

Where style and lang are the actual CSL style and Locale as strings. Or URLs.

And, why not just default to the Citation Style Language style and locale specs?

Alternatively, a default sys function that would work from strings or file URLs.

I guess I just need to write my own like this:

const sys = {
    // Fetch a URL and return the response body as text (undefined on failure).
    fetchFile: async function(url) {
        try {
            const response = await fetch(url);
            if (!response.ok) {
                throw new Error('Network response was not ok');
            }
            const data = await response.text(); // or response.json(), response.blob() etc.
            return data; // return the fetched data
        } catch (error) {
            console.error('There has been a problem with your fetch operation:', error);
        }
    },
    
    // Load and parse the CSL-JSON library; must finish before retrieveItem is called.
    loadLibrary: async function(library) {
        const library_data = await this.fetchFile(library);
        this.library = JSON.parse(library_data);
    },
    
    // Return the library item with the given id.
    retrieveItem: function(item_id) {
        return this.library.items.find(x => x.id == item_id);
    },
    
    retrieveStyle: async function(style) {
        // const url = `https://raw.githubusercontent.com/citation-style-language/styles/master/${style.csl_name}.csl`;
        const url = `https://www.zotero.org/styles/${style}`;
        return await this.fetchFile(url);
    },

    // Note: citeproc-js calls retrieveLocale synchronously, so the Promise
    // returned here would have to be resolved and cached before use.
    retrieveLocale: async function(locale) {
        const url = `https://raw.githubusercontent.com/citation-style-language/locales/master/locales-${locale}.xml`;
        return await this.fetchFile(url);
    }
};
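
Because citeproc-js expects retrieveItem and retrieveLocale to return their results directly rather than as Promises, one option is to do all the fetching first and only then build the sys object. The following is a sketch under that assumption; loadSys, fetchText, libraryUrl, and localeUrl are hypothetical names, and the fetch function is passed in so the loader can be exercised without a network.

```javascript
// Hypothetical helper: fetch everything up front, then return a sys object
// whose callbacks answer synchronously from memory.
async function loadSys({ fetchText, libraryUrl, localeUrl }) {
  const [libraryText, localeXml] = await Promise.all([
    fetchText(libraryUrl),
    fetchText(localeUrl),
  ]);

  const library = JSON.parse(libraryText);
  // Accept either a bare CSL-JSON array or an object with an `items` array.
  const items = Array.isArray(library) ? library : library.items;
  const byId = new Map(items.map((item) => [String(item.id), item]));

  return {
    retrieveLocale: () => localeXml,
    retrieveItem: (id) => byId.get(String(id)),
  };
}

// One possible fetchText, for browsers or Node versions that provide fetch.
async function fetchText(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Failed to fetch ${url}: ${response.status}`);
  }
  return response.text();
}
```

The style XML can be fetched the same way and handed to the engine constructor once it has arrived.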

@fbennett
Contributor

fbennett commented Aug 26, 2024 via email

@abalter
Author

abalter commented Aug 26, 2024

Ok. Before I jump in, just how rugged is it processing

source -----> citation
               ^
              CSL

?

The XML has a lot of logic in it. Does ALL of that need to get parsed and recoded in javascript?

@fbennett
Contributor

fbennett commented Aug 26, 2024 via email

@abalter
Author

abalter commented Aug 26, 2024

Since there is so much logic in the stylesheets, and so much thought and work has gone into them, it seems to me like this is a perfect job for AI. I just handed ChatGPT the "title" macro from chicago-author-date.csl and asked it to write it in JavaScript:

title macro

 <macro name="title">
    <choose>
      <if variable="title" match="none">
        <choose>
          <if type="personal_communication speech thesis" match="none">
            <text variable="genre" text-case="capitalize-first"/>
          </if>
        </choose>
      </if>
      <else-if type="bill book graphic legislation motion_picture song" match="any">
        <text variable="title" text-case="title" font-style="italic"/>
        <group prefix=" (" suffix=")" delimiter=" ">
          <text term="version"/>
          <text variable="version"/>
        </group>
      </else-if>
      <else-if variable="reviewed-author">
        <choose>
          <if variable="reviewed-title">
            <group delimiter=". ">
              <text variable="title" text-case="title" quotes="true"/>
              <group delimiter=", ">
                <text variable="reviewed-title" text-case="title" font-style="italic" prefix="Review of "/>
                <names variable="reviewed-author">
                  <label form="verb-short" text-case="lowercase" suffix=" "/>
                  <name and="text" delimiter=", "/>
                </names>
              </group>
            </group>
          </if>
          <else>
            <group delimiter=", ">
              <text variable="title" text-case="title" font-style="italic" prefix="Review of "/>
              <names variable="reviewed-author">
                <label form="verb-short" text-case="lowercase" suffix=" "/>
                <name and="text" delimiter=", "/>
              </names>
            </group>
          </else>
        </choose>
      </else-if>
      <else-if type="legal_case interview patent" match="any">
        <text variable="title"/>
      </else-if>
      <else>
        <text variable="title" text-case="title" quotes="true"/>
      </else>
    </choose>
  </macro>

title.js

function formatTitle(citation) {
    // Helper functions for text formatting
    const capitalizeFirst = text => text.charAt(0).toUpperCase() + text.slice(1);
    const toTitleCase = text => text.replace(/\w\S*/g, txt => txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase());
    const italicize = text => `<i>${text}</i>`;
    const quote = text => `"${text}"`;

    let result = "";

    // Handling different conditions
    if (!citation.title) {
        if (!["personal_communication", "speech", "thesis"].includes(citation.type)) {
            result = citation.genre ? capitalizeFirst(citation.genre) : "";
        }
    } else if (["bill", "book", "graphic", "legislation", "motion_picture", "song"].includes(citation.type)) {
        result = italicize(toTitleCase(citation.title));
        if (citation.version) {
            result += ` (${citation.term_version || ''} ${citation.version})`;
        }
    } else if (citation.reviewed_author) {
        if (citation.reviewed_title) {
            result = `${quote(toTitleCase(citation.title))}. Review of ${italicize(toTitleCase(citation.reviewed_title))}, ${citation.reviewed_author.map(author => author.name).join(", ")}`;
        } else {
            result = `Review of ${italicize(toTitleCase(citation.title))}, ${citation.reviewed_author.map(author => `${author.label || ''} ${author.name}`).join(", ")}`;
        }
    } else if (["legal_case", "interview", "patent"].includes(citation.type)) {
        result = citation.title;
    } else {
        result = quote(toTitleCase(citation.title));
    }

    return result;
}

@fbennett
Contributor

fbennett commented Aug 26, 2024 via email

@abalter
Author

abalter commented Aug 26, 2024

That's not my goal. I just thought I'd give it a try and see what happens. Not a good approach though, because then each style gets its own javascript which needs to be maintained.

I guess the goal is to write javascript that knows how to interpret and act on the logic in the macros.
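
To make the idea of interpreting macro logic concrete, here is a deliberately tiny sketch that evaluates a handful of CSL elements (macro, choose with if / else-if / else, group, text) against a CSL-JSON item. The node shape { name, attrs, children } and the function names are assumptions for illustration. Text-case, quotes, font styling, terms, names, and the real group-suppression rules are all ignored, so this is nowhere near a conforming processor.

```javascript
// Test an <if>/<else-if> node's conditions against an item.
// Only the `type` and `variable` conditions are handled in this sketch.
function testCondition(attrs, item) {
  const match = attrs.match || "all"; // CSL defaults to match="all"
  const results = [];
  if (attrs.type) {
    for (const t of attrs.type.split(/\s+/)) results.push(item.type === t);
  }
  if (attrs.variable) {
    for (const v of attrs.variable.split(/\s+/)) results.push(Boolean(item[v]));
  }
  if (match === "any") return results.some(Boolean);
  if (match === "none") return !results.some(Boolean);
  return results.every(Boolean);
}

// Render a node (and its children) to a plain string.
function render(node, item) {
  const attrs = node.attrs || {};
  const children = node.children || [];
  switch (node.name) {
    case "text": {
      const value = attrs.variable ? item[attrs.variable] : attrs.value;
      return value ? (attrs.prefix || "") + value + (attrs.suffix || "") : "";
    }
    case "group": {
      // Simplification: a group is dropped only when every child is empty.
      const parts = children.map((c) => render(c, item)).filter(Boolean);
      if (parts.length === 0) return "";
      return (attrs.prefix || "") + parts.join(attrs.delimiter || "") + (attrs.suffix || "");
    }
    case "choose": {
      // The first branch whose condition holds wins; <else> always matches.
      for (const branch of children) {
        if (branch.name === "else" || testCondition(branch.attrs || {}, item)) {
          return (branch.children || []).map((c) => render(c, item)).join("");
        }
      }
      return "";
    }
    case "macro":
      return children.map((c) => render(c, item)).join("");
    default:
      return ""; // element not supported by this sketch
  }
}
```

The same interpreter then works for any style whose macros stay within this subset, which avoids maintaining per-style JavaScript.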

@larsgw
Collaborator

larsgw commented Aug 26, 2024

Before I say this I just want to denounce LLMs as well. However, I've been thinking about "compiling" CSL into JS or other imperative languages as well, but programmatically of course. You'd need the appropriate helper functions, but it might lend to some interesting optimizations. Is that what you're after @abalter, or do you mean a single function that initializes citeproc to simplify the API?

@abalter
Author

abalter commented Aug 26, 2024

I wasn't thinking about using LLMs the way I think you might be, anyway. I use them to help write code, do some of the dirty work. It's actually quite good at that. Of course, it's just a helper, so I double check everything.

That's all. I wasn't thinking: "hand this over to an AI".

I did a little exploring to understand the limits of XML and XSLT. In a perfect world, each bit of logic in the stylesheet should directly translate to a logical statement in another computer language. Thus, something like xsltproc should be able to apply the logic in any well-formed stylesheet to any well-formed data. I guess it doesn't work like that.

My impression of the codebase is that interpreting and applying the logic in the stylesheets was pretty hellish. I see a lot of stuff that looks like trying to handle edge case after edge case. Is that just the way it is? Or would a fresh approach find common patterns and shortcuts?

I haven't studied a lot of CSL stylesheets yet to see if there are commonalities. I'm assuming each one has a few macros for handling authors, a few for titles, a few for publishers, etc. Maybe there is an ontology somewhere.

@larsgw
Collaborator

larsgw commented Aug 26, 2024

> That's all. I wasn't thinking: "hand this over to an AI".

Sorry for my misinterpretation.

> My impression of the codebase is that interpreting and applying the logic in the stylesheets was pretty hellish. I see a lot of stuff that looks like trying to handle edge case after edge case. Is that just the way it is? Or would a fresh approach find common patterns and shortcuts?

Just my perspective: I tried such a fresh approach a while back to get to know CSL a bit better and found that (1) the specification covers a lot of edge cases, so the actual behavior is sometimes a lot more complex than the XML itself suggests (e.g. handling of names, punctuation, indentation, suppression) and (2) citeproc-js has a lot of heuristics to be able to properly follow the specification in the first place, and covers plenty more edge cases which didn't make it into the specification. You can't easily get rid of those and still get good results unless you keep to the most basic references.

> I haven't studied a lot of CSL stylesheets yet to see if there are commonalities. I'm assuming each one has a few macros for handling authors, a few for titles, a few for publishers, etc. Maybe there is an ontology somewhere.

The macros can differ between styles, and as far as I know there are no guidelines.

@fbennett
Contributor

fbennett commented Aug 26, 2024

@abalter: There are a couple of projects that might be of interest, given your objectives (apologies if you already know of these):

  • citeproc-rs: an implementation of the CSL specification in Rust, with a view to replacing citeproc-js with a tool superior in speed and code composition. The repo hasn't seen major code contributions in three years, but a recent pull request aims to get it working with more recent releases of Rust.
  • csl-next: a working area set up by the designer of CSL itself, aiming to replace the current specification with a greatly simplified system that can be more easily implemented by developers.

5 participants
@fbennett @abalter @larsgw and others