Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUGFIX] Fix regression: Clear document cache to prevent memory exhaustion #1168

Closed
wants to merge 8 commits into from

Conversation

sebastian-meyer
Copy link
Member

Fixes #1165

@sebastian-meyer sebastian-meyer added the 🐛 bug A non-security related bug. label Feb 4, 2024
@sebastian-meyer sebastian-meyer self-assigned this Feb 4, 2024
Copy link
Member

@stweil stweil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patch still does not fix the memory leak, see details in the issue report.

Copy link
Member

@stweil stweil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now a runtime exception is thrown:

Reindex a collection into database and Solr.
============================================

2024-02-05 15:48:39 Indexing 1/2098 with UID "33" ("https://digi.bib.uni-mannheim.de/vl/ubmadesb/oai:digi.bib.uni-mannheim.de:51284.xml") on PID 3 and Solr core 11.
Mem Usage 1: 72380376
Mem Usage 2: 72380376
Mem Usage 3: 72261640
Mem Usage 4: 73559968
Mem Usage 5: 73559840
2024-02-05 15:48:45 Indexing 2/2098 with UID "72" ("https://digi.bib.uni-mannheim.de/fileadmin/digi/416971342/416971342.xml") on PID 3 and Solr core 11.
Mem Usage 1: 73809792
Mem Usage 2: 73809792

In PersistenceManager.php line 220:
                                                                                                                
  [TYPO3\CMS\Extbase\Persistence\Exception\UnknownObjectException (1249479819)]                                 
  The object of type "Kitodo\Dlf\Domain\Model\Document" given to update must be persisted already, but is new.  
                                                                                                                

Exception trace:
  at /var/www/typo3/public/typo3/sysext/extbase/Classes/Persistence/Generic/PersistenceManager.php:220
 TYPO3\CMS\Extbase\Persistence\Generic\PersistenceManager->update() at /var/www/typo3/public/typo3/sysext/extbase/Classes/Persistence/Repository.php:117
 TYPO3\CMS\Extbase\Persistence\Repository->update() at /var/www/typo3/packages/bfallert/presentation/Classes/Command/BaseCommand.php:268
 Kitodo\Dlf\Command\BaseCommand->saveToDatabase() at /var/www/typo3/packages/bfallert/presentation/Classes/Command/ReindexCommand.php:182
 Kitodo\Dlf\Command\ReindexCommand->execute() at /var/www/typo3/vendor/symfony/console/Command/Command.php:298
 Symfony\Component\Console\Command\Command->run() at /var/www/typo3/vendor/symfony/console/Application.php:1040
 Symfony\Component\Console\Application->doRunCommand() at /var/www/typo3/vendor/symfony/console/Application.php:301
 Symfony\Component\Console\Application->doRun() at /var/www/typo3/vendor/symfony/console/Application.php:171
 Symfony\Component\Console\Application->run() at /var/www/typo3/public/typo3/sysext/core/Classes/Console/CommandApplication.php:91
 TYPO3\CMS\Core\Console\CommandApplication->run() at /var/www/typo3/vendor/typo3/cms-cli/typo3:23
 {closure}() at /var/www/typo3/vendor/typo3/cms-cli/typo3:24

@sebastian-meyer
Copy link
Member Author

OK, this one is tricky!

Clearing the persistence manager's cache in AbstractDocument::saveToDatabase() was too early, because we still need already persisted documents when handling parent/child relationships between them. I've moved the clearing to a later point where all related documents should be persisted as well. Could you please try again?

Generally it seems like the memory consumptions increases during the AbstractDocument::saveToDatabase() and Indexer::add() methods instead of inside the ReindexCommand::execute() method itself. So maybe you could move the debugging statements to those methods to get a better idea of where exactly the memory leaks happen?

@jmechnich
Copy link
Collaborator

@sebastian-meyer unfortunately we are getting the same error as above after applying your latest commit. We made sure to revert the first attempt. You are indeed correct that we can try narrowing the problem down using additional logging, we were hoping though that you had more effective tools for doing that.

@sebastian-meyer
Copy link
Member Author

sebastian-meyer commented Feb 7, 2024

Thanks for the quick response! Unfortunately this is really hard for me to debug, because I don't have a working testing environment with enough example documents to run into memory issues or even get a good idea about how memory consumption develops while indexing more than a handful of documents.
So your assistence with testing and debugging is very much appreciated!

@csidirop
Copy link
Contributor

csidirop commented Feb 21, 2024

Unfortunately this is really hard for me to debug, because I don't have a working testing environment with enough example documents to run into memory issues or even get a good idea about how memory consumption develops while indexing more than a handful of documents.

You really don't need much documents to see the memory increasing. I'm testing with only with 30 documents. When starting the mem usage is around 19.74 MB while reindexing the last one it is at around 90 MB.

So your assistence with testing and debugging is very much appreciated!

Unfortunately the mem usage is still increasing

@sebastian-meyer sebastian-meyer removed this from the Kitodo.Presentation 5.0.0 milestone May 21, 2024
@sebastian-meyer
Copy link
Member Author

This is superseded by #1196, #1197 and #1201.

@sebastian-meyer sebastian-meyer deleted the fix-1165 branch May 21, 2024 13:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐛 bug A non-security related bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Memory leak in kitodo:reindex CLI job
4 participants