I'm not sure what it is, but something about the "Recent Stories Summary" section (the first view) is hard to read. The spacing seems wrong, and the blue font doesn't help. Someone mentioned Garamond too.
It's creating a "wall of text" effect to me and I'm not able to quickly skim and allow my eye to catch the bits that are interesting to me.
As a comparison, the HN homepage is very accessible to me for skimming and finding things to click into (like this entry).
UI is often quite subjective, understood. But I can't really "scan" the first view fast enough. It's all blending together and causes extra processing on my mind.
I hear you - there's something to be done there. My initial thought was to stay as close to convention as I could (links are blue!), but as the RECENT list gets long, it definitely gets less scannable.
I hear you about "links are blue" ... except when you are a link aggregator.
The "links are blue" convention from early HTML was meant to highlight links within a paragraph of prose, not in a list of link items. Blue signals that a piece of text is special relative to the text around it.
In this case, the blue font is distracting because the links are the content. You don't need blue to make your links "stand out". Since the links are effectively the body text, a normal palette would be appropriate.
I don't mind some subtle cues that these are links: underlines, slightly grey text, or even a subtle hover effect. Two cents.
Maybe put the number of occurrences (I assume that's the # at the end) first, so the amount of coverage helps with the signal-to-noise ratio? I'd also take a look at News Minimalist if I were you, and how they use significance scoring as a stand-in for upvotes to provide additional signal: https://www.newsminimalist.com/
It's quite scannable, but obviously you're doing reverse chrono order so up to you how best to solve the UI issue.
Better, to be honest. Keep refining of course. But this is definitely more readable.
I admit that straight black is not quite the right answer either. A slightly toned-down dark grey would be nice. And again, subjectively, I like how HN has a row of smaller, lighter non-link text under each listing, which I think works nicely with the white space between items.
Personally, I far prefer black over grey. Grey is really hard to read across a variety of lighting conditions and devices. The older you get, the more important contrast becomes.
Fair and good point. Sharp black bothers me, so just adding in a little hue is nice for my eyes. But that's me, of course.
Noting also that this text is #000000 black, per the CSS. Maybe the background color helps soften it a little? As in, black on pure white is hard on the eyes, but HN isn't?
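For anyone wanting to put numbers on the black-vs-grey question, the WCAG 2.x contrast ratio is the usual yardstick. Below is a minimal Python sketch of that formula. The color pairs are illustrative assumptions, not values read from either site's stylesheet (#f6f6ef is, I believe, close to HN's off-white background, but check the live CSS).

```python
def _channel(c8: int) -> float:
    """Linearize one 8-bit sRGB channel, per the WCAG 2.x relative-luminance definition."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a #rrggbb color, from 0.0 (black) to 1.0 (white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _channel(r) + 0.7152 * _channel(g) + 0.0722 * _channel(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio (L_lighter + 0.05) / (L_darker + 0.05), from 1:1 up to 21:1."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


# Hypothetical pairs: pure black on white, black on an off-white, dark grey on an off-white.
for fg, bg in [("#000000", "#ffffff"), ("#000000", "#f6f6ef"), ("#333333", "#f6f6ef")]:
    print(f"{fg} on {bg}: {contrast_ratio(fg, bg):.1f}:1")
```

Black on pure white is the 21:1 maximum; an off-white background or a dark grey text color each shave a bit off while staying far above the 4.5:1 AA minimum for body text, which is roughly the compromise both commenters are describing.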
This is exactly the kind of thing I think LLMs are best at. We've created the world's coolest string manipulator, and this is a perfect use for it. Awesome job!
How do you ensure the titles aren’t confabulated? I’ve used Kagi News recently and it summarized the articles about France incorrectly (that’s the only section where I could reliably spot the made-up stuff).
Love it, but the body font (Garamond) is not easy on the eyes. Garamond is one of my favorite fonts in print and at not-too-small sizes. On screen it doesn't look good, because where the strokes get thin they get too thin (or, as font experts call it, too much contrast).
Noted, thank you! I haven't put a tonne into readability, other than some basics - I prefer a serif font, and I made sure the background was easier on the eyes than #FFFFFF haha
In case you are still reading this: Any plans to add RSS etc.? I might be in a small minority, but for me my feedreader is the central source. If I can't subscribe via feed, it doesn't exist for me. That's the way I'm following Techmeme and also Hacker News (with the minimum points set to 80 to show up in the feed). It's kind of annoying how many sites even in the tech field don't offer an RSS feed anymore.
Amazing! Lots of questions, if you don't mind answering:
1) Is this written in Python?
2) If yes, does it use feedparser?
3) How are you storing these feeds in the database?
4) How are you handling CDATA or HTML-based feeds that return lots of HTML? Do you sanitize before storing, or store directly in the database as a CDATA string?
5) How do you handle edge cases and anomalies across different feed providers?
Dude, it's an old-school LAMP stack all day. I use SimplePie to handle and sanitize feeds, storing text only (stripped of HTML). Edge cases are pretty well smoothed out by SimplePie!
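SimplePie is a PHP library, so the following is not its API. It is a rough Python illustration, using only the standard library's `html.parser`, of the "store text only, stripped of HTML" step; `strip_html` is a hypothetical helper name. It assumes the feed library has already unwrapped any CDATA section and handed back the field as an HTML string.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and skips the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp;, &nbsp;, etc. into text
        self._parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)

    def text(self) -> str:
        # Collapse runs of whitespace (including non-breaking spaces) left behind by removed tags.
        return " ".join("".join(self._parts).split())


def strip_html(fragment: str) -> str:
    """Return the plain text of an HTML feed field, ready to store."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any trailing text
    return parser.text()
```

Stripping at ingest time means the database only ever holds plain text, so nothing downstream (the LLM prompt or the page template) has to re-sanitize it.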
I did try to build a public facing news aggregator with a similar ux but I couldn’t pull it off purely based on client side state (and I didn’t want to do user management)
I wonder how much this contributes to the widespread mental health malaise that is often attributed to tech these days? There are certainly plenty of factors to go around, but consider the connotation of "scrolling" and how common it has become as a default response to boredom in modern life, and suddenly it seems quite insidious.
Super insightful. I feel the same way. I can't mentally "conclude" my read for the day, because there is always just One More Article that is just under the threshold.
An extension of Fear of Missing Out, basically. And yes, I think it causes mental exhaustion and might be directly related to some mental disorders that we have yet to really understand.
Yes - I was actually deliberate about leaving infinite scroll out; I started w/ scrolling on tag pages, for example, but switched them to paginated - largely for the feeling you described.
I love that your site comes with an overview instead of clicking away to another site immediately. Feels snappy and looks good. I can see this being my news roundup. Great work!
Fair enough - it's honestly not something I expected anyone to be interested in enough to warrant an about page.
At a high level, it reads RSS feeds from a number of sources, and uses LLMs to identify clusters of stories about the same thing, group them, tag them, and designate them a "top" story or not. That's it.
The biggest thing I've learned in all of this is that o3-mini is far and away the best at following instructions (for this use case). Periodically I'll cycle through the models available on Groq, and always come back to o3-mini.
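To make that description concrete, here is a minimal Python sketch of one scheduled pass: assign each new story to a cluster, then re-tag and re-rank the clusters it touched. It is not the site's code (which runs on a PHP/LAMP stack and calls o3-mini). `run_batch`, the hook signatures, and the toy keyword matcher, tagger, and "top" rule are hypothetical stand-ins for the LLM calls, so the control flow can run without a model.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Story:
    source: str
    title: str


@dataclass
class Cluster:
    cluster_id: int
    stories: list[Story] = field(default_factory=list)
    tags: set[str] = field(default_factory=set)
    is_top: bool = False


# Hypothetical editor hooks; on the real site these decisions come from an LLM.
AssignFn = Callable[[Story, list[Cluster]], Optional[int]]  # cluster_id, or None for "new"
TagFn = Callable[[list[Story]], set[str]]
TopFn = Callable[[Cluster], bool]


def run_batch(new_stories: list[Story], clusters: list[Cluster],
              assign: AssignFn, tag: TagFn, pick_top: TopFn) -> list[Cluster]:
    """One pass: cluster each new story, then re-tag and re-rank every touched cluster."""
    by_id = {c.cluster_id: c for c in clusters}
    next_id = max(by_id, default=0) + 1
    touched: set[int] = set()
    for story in new_stories:
        target = assign(story, clusters)
        if target is None:
            cluster = Cluster(next_id)
            next_id += 1
            clusters.append(cluster)
            by_id[cluster.cluster_id] = cluster
        else:
            cluster = by_id[target]
        cluster.stories.append(story)
        touched.add(cluster.cluster_id)
    for cid in touched:
        cluster = by_id[cid]
        cluster.tags = tag(cluster.stories)
        cluster.is_top = pick_top(cluster)
    return clusters


# Toy stand-ins for the LLM calls, for demonstration only.
def _keywords(title: str) -> set[str]:
    return {w for w in title.lower().split() if len(w) > 4}


def keyword_assign(story: Story, clusters: list[Cluster]) -> Optional[int]:
    words = _keywords(story.title)
    for c in clusters:
        if any(words & _keywords(s.title) for s in c.stories):
            return c.cluster_id
    return None


def first_word_tags(stories: list[Story]) -> set[str]:
    return {s.title.split()[0] for s in stories}


def top_if_three_sources(cluster: Cluster) -> bool:
    return len({s.source for s in cluster.stories}) >= 3
```

Keeping the LLM behind small hooks like these is also what makes it cheap to swap models periodically and compare them against the same pipeline.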
Very nice, I've been working on something similar, but for regular news. But I want to summarize complete articles, and RSS only provides the headlines and sometimes the first paragraph of an article.
So I decided to write web crawlers, but then you run into CAPTCHA stuff. So I instead used Selenium to automate my browser to fetch the news articles. That worked well, but I haven't worked on it since.
Now I'm thinking that with all these AI browsers around these days, maybe that's actually easier than doing it with Selenium. But I haven't researched it properly yet.
In any case, the LLM work of detecting whether two articles are reporting the same news, and summarizing the story, is the same as in your project. So if your project is open source, I would be interested in that part.
Oh man, don't ask - not a dumb question at all. I'll reshare what I put in another comment that answers it, but bottom line is they're a design gap in the context of /recent.
You're right --- incoming & outgoing end up being redundant on the "Recent" view.
Where they're (more) relevant is in the "Top" view where the LLM editor has picked a subset of stories to be categorized as top and incoming/outgoing are the ones that didn't make the cut, organized by timeliness.
I assumed it meant stories that trended highly and are now fading in popularity (outgoing), and stories that are trending quickly and may be on a fast ascent (incoming).
Sort of a combo of "in case you missed it" and "the next new big stories".
I like it. It kind of reminds me of the old Fever RSS reader, which would group together similar articles from different sources, and use that to rank how hot a story was.
Not familiar with Fever, but there is something similar buried at the heart of mine - the LLM clusters stories, and they get promoted to public when they reach a threshold of unique sources.
That threshold is a function of the day of the week: on weekends, when the news cycle is quiet, it lowers the bar, and from Tuesday to Thursday it's at its most restrictive.
Nice work! I built a simplified version for my daily routine of searching and reading industry and research papers: https://multimodal-scout.app/. My main content sources are HN and Hugging Face trending papers, which already serve as a content filter for me instead of scraping random news. It works well for the domain I’m interested in. I’ve also made it open source, so you can tweak the RSS feed and adapt it to your needs. https://github.com/yingzha/multimodal-scout/
Awesome - glad you're enjoying it and thank you for the kind words!
My "stack": LAMP + o3-mini for editorial tasks + Bootstrap for a responsive front end.
That is to say: it's old-school and painfully functional, but light and fast.
Thanks so much for the kind words - it's 100% o3-mini for clustering. I have zero editorial input as to what constitutes a cluster, what's "top" news, etc.
The one subtlety is setting up the LLM to understand whether a new story belongs in an existing cluster or, together with more than one neighbor, constitutes a new cluster. The challenge there is scoping the clustering window (how many hours of stories are up for consideration) and the topic breadth, to avoid creating Katamari-style super-clusters that end up with every story attached to them.
At this point I seem to have found a sweet spot re: the hours window, the frequency of processing, and the design of the prompt, such that it's working consistently.
Very few false positives in terms of spurious clusters being created, or potential clusters being missed.
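One way to express the "clustering window" idea in code: only clusters active within the last N hours are offered to the LLM as candidates, and the model answers with a number that is mapped back to a cluster id. This is a minimal Python sketch under stated assumptions. `window_hours=12`, the `max_size` cap (one hypothetical guard against Katamari-style super-clusters), the prompt wording, and all names are illustrative, and the actual model call is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ClusterSummary:
    cluster_id: int
    headline: str
    last_seen: datetime  # time of the most recent story added to the cluster
    size: int


def candidate_clusters(clusters: list[ClusterSummary], now: datetime,
                       window_hours: int = 12, max_size: int = 40) -> list[ClusterSummary]:
    """Offer only recently active, not-yet-huge clusters to the model."""
    cutoff = now - timedelta(hours=window_hours)
    return [c for c in clusters if c.last_seen >= cutoff and c.size < max_size]


def build_prompt(title: str, candidates: list[ClusterSummary]) -> str:
    numbered = "\n".join(f"{i}. {c.headline}" for i, c in enumerate(candidates, start=1))
    return (
        f"New story: {title}\n"
        f"Existing clusters:\n{numbered}\n"
        "Reply with the number of the matching cluster, or 0 if none match."
    )


def parse_reply(reply: str, candidates: list[ClusterSummary]) -> Optional[int]:
    """Map the model's numeric answer back to a cluster id; anything unexpected means no match."""
    try:
        n = int(reply.strip())
    except ValueError:
        return None
    if 1 <= n <= len(candidates):
        return candidates[n - 1].cluster_id
    return None
```

A narrower window and a tighter cap both shrink the candidate list, which trades a few missed merges for fewer runaway clusters; that is the same balance the comment describes tuning.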
I just finished setting up my own RSS website, only to find this awesome solution. I’m crying right now; if only I had found you earlier, I would have saved so much time. Also, is the code publicly available so that I can customize it for my own needs?
I have done something similar for tech news, aggregating based on tags. That way I can read tech news on micro topics.
https://embit.ca/
Your feedback is appreciated.
Very cool. Having an immutable record "time machine" you can use to re-find something you remember reading is very humane. I'd love to see this for world news, politics, etc.
Hey - still thinking about sources here. With the data I have, I could actually do an interesting analysis of news sources - i.e.:
- how often do their stories become members of clusters?
- how "fast" are they to publish on a topic vs. other competitors - i.e.: who "breaks" the news?
- what tags (people, companies, topics) does a given source stick close to? Which do they shy away from?
Thanks very much for a really interesting set of ideas to explore!
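The first two questions in that list reduce to simple counting over stored stories. Here is a minimal Python sketch. It assumes each story row records its source, publish time, and cluster id (or None if it never joined a cluster), and it treats "breaking" a story as publishing first within its cluster. The `Item` type and function names are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import NamedTuple, Optional


class Item(NamedTuple):
    source: str
    published: datetime
    cluster_id: Optional[int]  # None if the story never joined a cluster


def cluster_membership_rate(items: list[Item]) -> dict[str, float]:
    """Fraction of each source's stories that ended up in some cluster."""
    total: Counter = Counter()
    clustered: Counter = Counter()
    for it in items:
        total[it.source] += 1
        if it.cluster_id is not None:
            clustered[it.source] += 1
    return {src: clustered[src] / total[src] for src in total}


def breaker_share(items: list[Item]) -> dict[str, float]:
    """Share of clusters in which each source published first."""
    first: dict[int, Item] = {}
    for it in items:
        if it.cluster_id is None:
            continue
        best = first.get(it.cluster_id)
        if best is None or it.published < best.published:
            first[it.cluster_id] = it
    wins = Counter(it.source for it in first.values())
    n = len(first)
    return {src: count / n for src, count in wins.most_common()} if n else {}
```

The tag question would follow the same pattern: count (source, tag) pairs and compare each source's distribution with the overall one.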
Cheers and thank you! I'll reshare an earlier comment that I think answers your question - let me know:
Thanks so much for the kind words - it's 100% o3-mini for clustering. I have zero editorial input as to what constitutes a cluster, what's "top" news, etc.
The one subtlety is setting up the LLM to understand whether a new story belongs in an existing cluster or, together with more than one neighbor, constitutes a new cluster. The challenge there is scoping the clustering window (how many hours of stories are up for consideration) and the topic breadth, to avoid creating Katamari-style super-clusters that end up with every story attached to them.
At this point I seem to have found a sweet spot re: the hours window, the frequency of processing, and the design of the prompt, such that it's working consistently.
Very few false positives in terms of spurious clusters being created, or potential clusters being missed.
Very interesting, how do you do that? Do you limit what you feed it, or do you use custom instructions? I had a similar case, so I'd love to know how you're doing the prompting here.
In my case we went with embeddings and clustering to find papers close to each other, because the LLMs were hallucinating.
Cool - Particle looks great - I really like how visual it is.
Distinguishing characteristics - personally I get value from the unambiguous timeline (no editorializing in /recent), and (as nice as the visual is) the non-visual, super simplistic presentation & the curated sources (...which I value b/c I curated them myself haha).
So the bottom line is that DS will appeal to a certain kind of obsessive-compulsive news consumer and synthesizer who wants the right balance of signal to noise and a streamlined presentation that doesn't slow them down. I count myself among that group!
Thanks very much! The architecture is truly recidivistic: LAMP, cron jobs, o3-mini, Bootstrap. It works, and it's fast because it's not complicated and because I'm doing things like updating hourly instead of in real time.
If you could get rid of the cookie message, that would be great: I'm going to add the site as an app on my phone, and that message is annoying to see when I open it.
You should only see that message once, when you first show up, and as annoying as it is, there's a compliance element to it. Let me know if it's persisting for you after accepting!
What is the purpose of having summaries for "Recent", "Incoming", and "Outgoing" all at the top? It seems like all content from the latter two is in the first, right?
You're right --- incoming & outgoing end up being redundant on the "Recent" view.
Where they're (more) relevant is in the "Top" view where the LLM editor has picked a subset of stories to be categorized as top and incoming/outgoing are the ones that didn't make the cut, organized by timeliness.
Oh I should add: incoming will show stories ~20 minutes before they get picked up for "Top" inclusion, if they're going to make the cut, based on how jobs are scheduled.
Thank you for the feedback!
There's no reason for both the story count and the story summary to be clickable. It's confusing because:
(a) It's not clear what the number in parentheses even means (until you click and infer)
(b) Separate links make you think they lead to different pages
Also, echoing another comment, it's not really clear what "incoming" and "outgoing" stories mean. Maybe "new" vs. "stale"?
I also use its sister aggregator site for political news every day - https://www.memeorandum.com/
https://github.com/facundoolano/feedi
Definitely a gap in design!
And did you actually see the time machine at the bottom of the right-hand column? Or was that just a wish-list item of yours?
Note that the top news breaker is The Verge, having broken about 10% of stories on my site; TechCrunch is next at 8%, followed by ... MacRumors at 7%.
I should also add - please post any recommendations re: sources to cover.
Cheers!