Viewing Large Files #7

Open
haydenflinner opened this issue Feb 29, 2020 · 2 comments

@haydenflinner
Hi,

First, just wanted to say great job on this app. The code here, as well as in prompt-toolkit, is exactly the sort of idiomatic code I thought must be out there somewhere, and I'm glad I put off my less rewrite until I found this. There's only one thing missing from your pager that I need from less: viewing large files.

Less, when you press G, seeks to the end of the file. It then tries to calculate line numbers for you if they're enabled, but you can press Ctrl+C to stop that and just show the last screenful of the file. Reading the code, it seems less keeps a sort of linked list of the loaded portions of the file for quick jumping around, e.g. G followed by gg is near instant no matter the size of the file.

In pypager, I've found that pressing G takes me to some point deep in the file (I assume wherever a read timeout fired), and pressing G again takes me deeper still, but not yet to the end. From reading the code, I haven't seen any special handling for large files; the file seems to be treated as one big text buffer.

I'd like to implement low-resource reading of large files (Unix-only for me), and I was wondering if you had any thoughts on where to get started, or gotchas that make it exceedingly difficult. I'm thinking I'll start with a simple mmapped file, so that G does in fact go to the end of the file, then see how searching performs. I think mmap's caching will be enough for a 90% improvement. If that's not good enough, I can look at the more advanced semantic caching that less does, plus a scheme of my own that I think less should use but doesn't.
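To make the idea concrete, here is a minimal sketch of the mmap approach for the G case: map the file, then walk backwards from EOF to find the last screenful without touching the rest of the file. The function name and signature are hypothetical, not pypager code:

```python
import mmap
import os


def last_screenful(path, lines=40):
    """Return the last `lines` lines of a file without reading the whole file."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return []
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Ignore a trailing newline so it doesn't count as an empty last line.
            end = size - 1 if mm[size - 1:size] == b"\n" else size
            # Walk backwards from EOF, one newline per displayed line.
            pos = end
            for _ in range(lines):
                nl = mm.rfind(b"\n", 0, pos)
                if nl < 0:
                    pos = 0
                    break
                pos = nl
            # Start just after the newline we stopped on (or at byte 0).
            start = pos + 1 if mm[pos:pos + 1] == b"\n" else pos
            return mm[start:end].decode("utf-8", "replace").splitlines()
```

The kernel only pages in the tail of the file, so this stays fast on multi-gigabyte inputs; searching backwards with `mm.rfind` is where mmap's page cache would do the heavy lifting.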

@jonathanslenders
Member

Supporting large files should be possible with several changes.
We probably can't use BufferControl anymore, and should use FormattedTextControl instead.

There are indeed tricks with mmap: we can build an index of line endings by running a regex over an mmapped file, then read only the lines that need to be displayed. This is pretty efficient and supports multi-gigabyte files. Right now, the limit is around a few thousand lines.
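A rough sketch of that index, assuming hypothetical helper names (this is not pypager's implementation): one regex pass over the mapping records where every line starts, after which any line range is a single slice.

```python
import mmap
import re


def build_line_index(path):
    """Scan an mmapped file once, recording the byte offset of each line start."""
    with open(path, "rb") as f:
        # Note: mmap raises ValueError for a zero-length file on most platforms.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Line starts are byte 0 plus the byte after every newline.
    starts = [0] + [m.end() for m in re.finditer(b"\n", mm)]
    return mm, starts


def get_lines(mm, starts, first, count):
    """Read only the requested line range [first, first+count) from the mapping."""
    end = starts[first + count] if first + count < len(starts) else len(mm)
    return mm[starts[first]:end].decode("utf-8", "replace").splitlines()
```

The index itself is one int per line, so even a very large file's index fits comfortably in memory, and the display code never decodes more than a screenful at a time.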

Syntax highlighting on big files is a challenge, but can probably be done without too much effort. I'm not sure yet how much work it will take to use prompt_toolkit's Lexer or a FormattedTextControl.

I'm not sure about reading large files from stdin.

@haydenflinner
Author

Awesome, thanks for the pointers. I'll take another look at this today. I'm not worried about large files from stdin; if they're too large for memory, you probably shouldn't be piping them over stdin 😅
