Feature/caching #1922

Open
wants to merge 11 commits into base: v2.6

Conversation

Collaborator

@hmoazam commented Dec 10, 2024

A single caching interface with two levels of cache: an in-memory LRU cache and a FanoutCache on disk.
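For illustration, here is a minimal sketch of what such a two-level cache could look like, using cachetools and diskcache; the class name, defaults, and key scheme are assumptions rather than the PR's exact implementation:

import hashlib
import json
from typing import Any, Dict, Optional

from cachetools import LRUCache
from diskcache import FanoutCache


class TwoLevelCache:
    """Illustrative sketch: an LRU cache in memory backed by a FanoutCache on disk."""

    def __init__(self, directory: str, mem_size_limit: int = 1_000_000,
                 disk_size_limit: int = 30 * 1024**3):
        self.memory_cache = LRUCache(maxsize=mem_size_limit)
        self.fanout_cache = FanoutCache(shards=16, timeout=2, directory=directory,
                                        size_limit=disk_size_limit)

    def cache_key(self, request: Dict[str, Any]) -> str:
        # Stable hash of the request; assumes the request is JSON-serializable.
        return hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()

    def get(self, request: Dict[str, Any]) -> Optional[Any]:
        key = self.cache_key(request)
        if key in self.memory_cache:
            return self.memory_cache[key]
        value = self.fanout_cache.get(key)
        if value is not None:
            self.memory_cache[key] = value  # promote disk hits to the in-memory level
        return value

    def put(self, request: Dict[str, Any], value: Any) -> None:
        key = self.cache_key(request)
        self.memory_cache[key] = value
        self.fanout_cache.set(key, value)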

@hmoazam requested a review from okhat December 10, 2024 22:15
dspy/__init__.py Outdated
@@ -64,6 +66,13 @@
configure = settings.configure
context = settings.context

CACHE_DIR = os.environ.get("DSPY_CACHEDIR") or os.path.join(Path.home(), ".dspy_cache")
Collaborator

@okhat Dec 10, 2024

Note to self: .dspy_cache is already used for litellm, so the naming in this PR should be swapped (it uses .dspy_litellm_cache for the existing cache, but that cache doesn't exist at that new path).

Also, the limit should be 30 GB.

    num_retries=num_retries,
)
from dspy import settings
@dspy_cache_decorator(settings.cache)
Collaborator

Hmm, this is a strange pattern; is the inner function redefined every time? Does the decorator still work that way? I guess it does, but it's probably a bit of overkill.

Collaborator Author

It does work but definitely a weird pattern. Not 100% sure if it has any negative side effects.
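For readers following along, here is a self-contained illustration of the pattern under discussion; the decorator and cache below are stand-ins, not the PR's actual dspy_cache_decorator:

import functools

def cache_decorator(cache: dict):
    # Stand-in for a cache decorator parameterized by a shared cache object.
    def wrap(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            key = (fn.__name__, args, tuple(sorted(kwargs.items())))
            if key in cache:
                return cache[key]
            result = fn(*args, **kwargs)
            cache[key] = result
            return result
        return wrapper
    return wrap

shared_cache: dict = {}

def outer(x: int) -> int:
    # The inner function (and its wrapper) is re-created on every call to outer(),
    # but caching still works because the cache object itself is shared and the
    # key does not depend on the identity of the redefined function object.
    @cache_decorator(shared_cache)
    def inner(x: int) -> int:
        return x * x
    return inner(x)

print(outer(3), outer(3))  # the second call is served from shared_cache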

def cached_litellm_text_completion_inner(request: Dict[str, Any], num_retries: int):
    return litellm_text_completion(
        request,
        cache={"no-cache": False, "no-store": False},
Collaborator

This passes cache to the function below. But for some reason we removed the cache kwarg below? Why?

Collaborator Author

The default (both kwargs true) should be added back to the litellm_text_completion function (I forgot to fix this).
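As a hedged sketch of that fix (not necessarily the PR's final code), the restored default could look like this, with both LiteLLM cache-control flags set to True so direct calls bypass LiteLLM's cache unless the cached wrapper overrides them:

from typing import Any, Dict, Optional

def litellm_text_completion(
    request: Dict[str, Any],
    num_retries: int,
    cache: Optional[Dict[str, bool]] = None,
):
    # Default restored as discussed: both flags True means "bypass LiteLLM's cache"
    # unless the caller (e.g. the cached wrapper) passes False/False explicitly.
    if cache is None:
        cache = {"no-cache": True, "no-store": True}
    # Body omitted in this sketch; the real function forwards request, num_retries,
    # and cache to LiteLLM.
    ...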

    key = self.cache_key(request)
except Exception:
    return None
with self.lock:
Collaborator

Hmm do we need to lock just to read?

Collaborator

If we can remove this lock, we may want to do something like if key := self.memory_cache.get() to avoid a double access.

Collaborator Author

I would think reads are thread safe but the docs aren't clear.
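If the lock does turn out to be removable here, the suggested read path might look like the following; this is a sketch extending the illustrative TwoLevelCache above, not the PR's code:

    def get(self, request: Dict[str, Any]) -> Optional[Any]:
        try:
            key = self.cache_key(request)
        except Exception:
            return None
        # One LRU lookup via the walrus operator instead of a membership check
        # followed by a second indexing access. Note that this treats a cached
        # None value as a miss.
        if (value := self.memory_cache.get(key)) is not None:
            return value
        if (value := self.fanout_cache.get(key)) is not None:
            self.memory_cache[key] = value  # promote disk hits to memory
        return value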

Collaborator

@okhat commented Dec 12, 2024

Recording some thoughts here.

  1. We need to make sure that the caching behavior makes sense if the function being cached raises an exception.
  2. We need to make sure that the caching behavior makes sense for LiteLLM. Specifically, LiteLLM's default caching is a bit nuanced: it does not cache api_* keys (which is important) and, as far as I can tell, it does not cache metadata (like cost). (A sketch of both points follows below.)
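A hedged sketch of both points, with assumed helper names rather than the PR's implementation: failures are never written to the cache, and api_* keys are excluded from the cache key.

import functools
import hashlib
import json
from typing import Any, Callable, Dict


def request_cache_key(request: Dict[str, Any]) -> str:
    # Point 2: drop api_* keys (api_key, api_base, ...) so credentials neither
    # affect the key nor leak into it; metadata-like fields could be dropped
    # the same way.
    relevant = {k: v for k, v in request.items() if not k.startswith("api_")}
    return hashlib.sha256(json.dumps(relevant, sort_keys=True).encode()).hexdigest()


def cached(cache: Dict[str, Any]) -> Callable:
    def wrap(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(request: Dict[str, Any], **kwargs: Any) -> Any:
            key = request_cache_key(request)
            if key in cache:
                return cache[key]
            # Point 1: if fn raises, the exception propagates and nothing is
            # stored, so failures are never cached.
            result = fn(request, **kwargs)
            cache[key] = result
            return result
        return wrapper
    return wrap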

@okhat changed the base branch from main to v2.6 December 13, 2024 15:38
"""

self.memory_cache = LRUCache(maxsize=mem_size_limit)
self.fanout_cache = FanoutCache(shards=16, timeout=2, directory=directory, size_limit=disk_size_limit)
Collaborator

@hmoazam As I understand it, part of the motivation for this PR is to enable users to dump the current state of the cache for a particular DSPy session.

Is the FanoutCache introduced for that purpose? If so, since FanoutCache shares the same directory as the litellm.cache, what extra utility is it providing?

Collaborator

Separately, I think we should define the behavior for dumping / loading a cache.

For example, consider the case where I've already run "Program 1" as a Python script / notebook. Now, I'm running "Program 2" in a separate notebook / Python script invocation. I'm using the default DSPy cache directory for both program executions.

If I dump the cache after I run Program 2, do I also want to include cache contents from Program 1? Or do I only want to include cache contents from Program 2, since that's my "current session"? Should users have a choice?

Once we align on the desired semantics, we can figure out how to make adjustments to the implementation.

cc @okhat

Collaborator

@dbczumar For disk caching, we will migrate from LiteLLM's caching to FanoutCache. The former has proved limiting when users need many read/write processes and threads in parallel.

For a short period of time, we might keep the LiteLLM cache as well. This means that highly active users that upgrade versions regularly will have an implicit seamless migration of their caches, because LiteLLM's requests will remain cached and will implicitly transfer to the FanoutCache. (This behavior is not required, and in principle there are other ways to more intentionally allow migration, but explicit migration is almost never worthwhile for caches.)

Cache dumping is fairly simple. It saves the current in-memory Python session, unless the user triggers a reset for the in-memory cache. (The current in-memory session draws on saved caches on disk, as usual.)
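A rough sketch of what dump/load along these lines could look like, reusing the illustrative TwoLevelCache from above; the method names and pickle format are assumptions:

import pickle
from typing import Any, Dict


class CacheDumpMixin:
    """Sketch only: dump and reload the in-memory level of the two-level cache."""

    def dump(self, path: str) -> None:
        # Saves whatever the current Python session has in memory (the LRU),
        # unless the user has reset the in-memory cache beforehand.
        with open(path, "wb") as f:
            pickle.dump(dict(self.memory_cache), f)

    def load(self, path: str) -> None:
        with open(path, "rb") as f:
            entries: Dict[Any, Any] = pickle.load(f)
        for key, value in entries.items():
            self.memory_cache[key] = value
            self.fanout_cache.set(key, value)  # also persist to disk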
