I'm kinda allergic to writing "I did the thing" posts, so I can't help but tryhard and attempt to make them compelling somehow.
Writing in this manner is also very helpful in making sense of the work for myself. Takes a better understanding of the subject to thoroughly explain what you've built than to merely build it. Sometimes I've gone back and read through one of these updates to just get a refresher on what my thinking was when I built something.
In my experience, that is pretty much what marginalia search is. I rarely get what I expect but I always get something very interesting that makes me understand my expectations better which is very helpful in accomplishing my goals. Thanks for your work, marginalia is probably my favorite little corner of the web.
Off topic, but would there be a way to integrate marginalia with a specific website? Similarly to how people use google search for their forums or how HN uses algolia?
I'm asking this as one of my projects is a link aggregator similar to old reddit (and HN to some extent) and I would like to be able to present to users a search box, but without having to implement document indexing and search. (I assume ad principio that the website is already aligned ethically and technologically with what Marginalia stands for :D)
Should be soon-ish. I'm working right now on laying the ground works for ad-hoc domain filters. That's technically already possible but comes at a too big performance impact that it deteriorates the search results.
When it works, one of the things I have in mind is making a site search-esque functionality available, as well as exposing it via the public API so that it can be whiteboxed.
I'm using RDRPosTagger[1], though I've optimized the code a bit so that it's not just algorithmically efficient, but to use the language in a way that is fast. It isn't perfect, but it's good enough to be useful.
Language detection and sentence splitting are the other two slow bits of processing.
I'm kinda allergic to writing "I did the thing" posts, so I can't help but tryhard and attempt to make them compelling somehow.
Writing in this manner is also very helpful in making sense of the work for myself. Takes a better understanding of the subject to thoroughly explain what you've built than to merely build it. Sometimes I've gone back and read through one of these updates to just get a refresher on what my thinking was when I built something.
I'm asking this as one of my projects is a link aggregator similar to old reddit (and HN to some extent) and I would like to be able to present to users a search box, but without having to implement document indexing and search. (I assume ad principio that the website is already aligned ethically and technologically with what Marginalia stands for :D)
When it works, one of the things I have in mind is making a site search-esque functionality available, as well as exposing it via the public API so that it can be whiteboxed.
Language detection and sentence splitting are the other two slow bits of processing.
[1] https://github.com/datquocnguyen/RDRPOSTagger