Posted by jxmorris12 13 hours ago
That's absolutely not it. What you're describing is part of the UNIX philosophy: programs should do one thing and do it well, and they should function in a way that makes them very versatile and composable, etc.
And that part of the philosophy works GREAT when everything follows another part of the philosophy: everything should be based on flat text files.
But for a number of reasons, and regardless of whatever we all think of those reasons, we live in a world that has a lot of stuff that is NOT the kind of flat text file grep was made for. Binary formats, minified JS, etc. And so to make the tool more practical on a modern *nix workstation, suddenly more people want defaults that are going to work on their flat text files and transparently ignore things like .git.
It's just that you've shown up to a wildly unprincipled world armed with principles.
No, you are correct; do not doubt yourself. Baked-in behavior catering to a completely separate tool is bad design. Git is the current version-control software, but it's neither the first nor the last. Imagine if we moved to another source-control system and were still burdened with .gitignore files. No thanks.
The Unix tools are designed to be good and explicit at their individual jobs so they can be easily composed together to form more complex tools that cater to the task at hand.
Actually, we are, because some utter idiot wrote:
> grep-like tools which read .gitignore violate POLA.
How about we keep it civilized?
Or some combination of --no-ignore (or -u/--unrestricted) with --ignore-file or --glob.
[1]: https://ugrep.com/
--ignore-file=.ignore
--ignore-file=.gitignore
--ignore-file=.dockerignore
--ignore-file=.npmignore
etc
But then, that's assuming all of those share the same "ignore file" syntax/grammar...
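As a sketch of that shared grammar (the file name and patterns here are made up for illustration): all of these formats are at least close to gitignore syntax, where each line is a glob, a line starting with # is a comment, a trailing / restricts a pattern to directories, and a leading ! re-includes a path. Whether .dockerignore and .npmignore honor every corner of that grammar is exactly the open question.

```
# .ignore — gitignore-style patterns (hypothetical example)
# A trailing slash restricts a pattern to directories:
node_modules/
build/
# Globs match against path components:
*.min.js
# A leading ! re-includes a previously ignored path:
!build/README.md
```

Note that comments must be on their own lines; in gitignore grammar a # mid-line is part of the pattern, which is one of the corner cases where the variants can disagree.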
But that’s the kind of problem that only successful things have to worry about.
But it does make sense today.
I'd argue that modern computers do many astonishing, complicated, and confusing things - for example, they synchronize with cloud storage through complex on-demand mechanisms that present a file as being on the user's computer but only actually download it when the user opens it - and they attempt to do so as a transparent abstraction of a real file on disk. But if ripgrep tried to traverse your corporate Google Drive, Dropbox, or OneDrive, users might be "astonished" when that abstraction breaks down and a minor rg query takes an hour and 800 GB of bandwidth.
It used to be that one polymath engineer could have a decent understanding of the whole pyramid of complexity, from semiconductors through spinlocks on to SQL servers. Now that goal is unachievable for most, and tools ought to be sophisticated enough to help the user with corner cases and gotchas that make their task more difficult than they expected it to be.
[1]: https://en.wikipedia.org/wiki/Principle_of_least_astonishmen...
The comparison between jiff, chrono, time, and hifitime is just as good a read, in my opinion: https://github.com/BurntSushi/jiff/blob/HEAD/COMPARE.md
(And they have also written interesting things on regex, non-regex string matching, etc.)
There was this post from Cursor today, https://cursor.com/blog/fast-regex-search, about building an index for agents after hitting a limit with ripgrep, but I'm not sure what codebase they're searching that warrants it, especially since they'd have to be at 100-200 GB to reach 15 s of runtime. Unless it's returning all matches, that is.
On a mid-size codebase, I fzf- and rg-ed through the code almost instantly, while watching my coworker's computer slow to a crawl when PyCharm started reindexing the project.
Perhaps they run their software on operating or file systems that can't do it, or on hardware with different constraints than the workstation-flavoured laptops I use.
There's also RGA (ripgrep-all) which searches binary files like PDFs, ebooks, doc files: https://github.com/phiresky/ripgrep-all
And I was dead wrong. Overnight, everyone switched to rg (me included).
It’s fast even on a 300 MHz Octane.
SGUG tried hard to port newer packages for IRIX for several years but hit a wall with ABI mismatches leading to GOT corruption. This prevented a lot of larger packages from working or even building.
I picked up the effort again after wondering if LLMs would help. I ran into the ABI problems pretty quickly. This time, though, I had Claude use Ghidra to reverse-engineer the IRIX runtime linker daemon, which gave the LLM enough to understand that the memory structures I’d been using in LLVM were all wrong. See https://github.com/unxmaal/mogrix/blob/main/rules/methods/ir... .
After cracking that mystery I was able to quickly get “impossible” packages building, like WebKit, Qt 5, and even small bits of Go and Rust.
I’m optimistic that we’ll see more useful applications built for this cool old OS.
I’m sort of thinking of AmigaOS/Workbench as well, although, perhaps because of what I assume was always a much larger user base than SGI’s, it never quite went away the way SGI and IRIX did.
It is great seeing these old platforms get a new lease of life.
Eventually I was considering rebuilding the machine completely, but for some reason, after a very long time digging deep into the rabbit hole, I tried plain old grep, and there was the data, exactly where it should have been.
It's a vague story, and it was a while back - I don't remember the specifics, but I sure recall the panic.
If it actually matched grep's contract, with opt-in differences, that would be a game changer and would actually let it become the default for people, but that ship seems to have sailed.
rg : Skips .gitignored, hidden, and binary files
rg -u : Includes .gitignored files
rg -uu : Includes .gitignored + hidden files
rg -uuu : Includes .gitignored + hidden + binary files

Sometimes I forget that some of the config files I have for CI in a project are under a dot directory, and are therefore ignored by rg by default, so I have to repeat the search giving the path to that subdirectory if I want to see those results (or use some extra flags to make rg stop ignoring dot directories other than .git).
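For that dot-directory annoyance, ripgrep's config-file support (described in its GUIDE) can bake the flags in; a sketch, assuming you export RIPGREP_CONFIG_PATH pointing at the file:

```
# ~/.ripgreprc — one flag per line; blank lines and lines starting
# with # are ignored. Search hidden files/directories by default...
--hidden
# ...but still skip the .git directory itself:
--glob=!.git/
```

With that in place, a plain `rg pattern` descends into dot directories while .git stays excluded, and `rg --no-config` turns it off for a single run.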
I still use it, but I've never trusted it fully since then; I double-check.
It's the reason I started using it. Got sick of grep returning results from node_modules etc.
> You could easily just alias a command with the right flag if the capability was opt-in.
I went searching for a way to make grep ignore what's in .gitignore because `--exclude=...` got tedious, and there was ripgrep to answer my prayers.
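For reference, the stock-grep route that gets tedious looks like this (a sketch with GNU grep; the directory names are made up for the demo):

```shell
# Build a toy tree: one real source file, one vendored copy under node_modules
mkdir -p demo/src demo/node_modules/pkg
echo 'needle()' > demo/src/app.js
echo 'needle()' > demo/node_modules/pkg/index.js

# Recursive search, skipping the vendored directory by hand
grep -rn --exclude-dir=node_modules 'needle' demo
# → demo/src/app.js:1:needle()
```

The pain is that --exclude-dir (and --exclude) must be repeated for every directory and pattern you care about, which is exactly the bookkeeping ripgrep reads out of .gitignore for you.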
Maintaining an alias would be more work than just `rg 'regex' .venv` (which is tab-completed after `.v`) the few times I'm looking for something in there. I like to keep my aliases clean and not have to use rg-all to turn off the setting I turned on. Like in your case, `alias rg='rg -u'`, now how do you turn it off?
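On the "how do you turn it off" question: shells have escape hatches for exactly this; prefix the command with a backslash, or with `command`, and the alias is bypassed for that one invocation. A sketch with a harmless stand-in alias (note that bash only expands aliases in scripts after `shopt -s expand_aliases`):

```shell
shopt -s expand_aliases 2>/dev/null || true  # bash scripts need this opt-in

alias greet='echo filtered'   # stand-in for something like alias rg='rg -u'

greet                         # the alias fires, printing "filtered"

\greet 2>/dev/null || echo unfiltered       # backslash bypasses the alias;
                                            # no real greet exists, so the
                                            # fallback prints "unfiltered"

command greet 2>/dev/null || echo unfiltered  # `command` bypasses it too
```

So with `alias rg='rg -u'` in place, `\rg` or `command rg` still gives you the unfiltered-flag-free rg for one call.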
See https://github.com/BurntSushi/ripgrep/blob/master/GUIDE.md#a... for the details.
I wouldn't want to use tools that straddle the two, unless they had a nice clear way of picking one or the other. ripgrep does have "--no-ignore", though I would prefer -a / --all (one could make their own with alias rga='rg --no-ignore')
I think ripgrep will not search UTF-16 files by default. At least, I had some such issue once.
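The classic symptom is that the NUL bytes in UTF-16 make grep-family tools classify the file as binary. A workaround sketch, transcoding with iconv before searching (ripgrep also has an -E/--encoding flag, and recent versions sniff a UTF-16 BOM, so it may not need this):

```shell
# Create a UTF-16LE file; its interleaved NUL bytes look "binary" to grep
printf 'hello needle world\n' | iconv -f UTF-8 -t UTF-16LE > utf16.txt

# Transcode to UTF-8 on the fly, then search normally
iconv -f UTF-16LE -t UTF-8 utf16.txt | grep -n 'needle'
# → 1:hello needle world
```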
I ran into that with pt, and it definitely made me think I was going mad[0]. I can't fully remember if rg suffered from the same issue or not.
[0] https://github.com/monochromegane/the_platinum_searcher/issu...
The ".ignore" name was actually suggested by the author of ag (whereas the author of rg thought it was too generic): https://news.ycombinator.com/item?id=12568245
It's nice and everything, but I remember being happy with the tools before. (I think I moved from grep to ack, then jumped to ag for performance, and then to pt for reasons I don't remember.)
It took me a while, but I remembered I ran into an issue with pt incorrectly guessing the encoding of some files[0].
I can't remember whether rg suffered from the same issue or not, but I do know after switching to rg everything was plain sailing and I've been happy with it since.
[0] https://github.com/monochromegane/the_platinum_searcher/issu...