The entire New Yorker archive is now digitized

(newyorker.com)

244 points | by thm 5 days ago

15 comments

  • donohoe 1 minute ago
    So about 10 years ago, I worked at The New Yorker and was responsible for launching the redesign, paywall, and move to WordPress - we had almost the full archive. The data was mostly there. The real issue was actually permissions and rights for a lot of the stories. The contracts from 100+ years ago made no mention of the fact that there was going to be an Internet and that we could publish on a domain in a digital format. I can only imagine what a gargantuan task it was to track all of that down and get it covered from a legal perspective. I know that was the issue holding us back then. So glad that’s over.
  • smelendez 4 hours ago
    I’ve long thought about trying to map how the locations of music and maybe theater events listed in the magazine have changed over time.

    There are performances of some kind in pretty much every corner of NYC but it’s interesting to see which neighborhoods have had events deemed relevant to The New Yorker readership in different eras.

    • bufordsharkley 1 hour ago
      It also speaks to what we lose when we lose magazine listings of events (New Yorker effectively gutted this section within the past decade), movie showtime listings via newspaper, etc

      We have a very strong archive going back a century until about 2015, but now wading through linkrot circa 2017 is miserable

    • paganel 3 hours ago
      That's a very neat idea! If you ever have the time to do it you should try it out. In fact, you've given me the idea of trying to do the same for my city, Bucharest; I just need to find some relevant data sources.
  • krelian 4 hours ago
    I hope this gets incorporated into the existing website. I'm not an active subscriber but I used to be and I always thought there was a very fertile "other articles you might like" ground that the New Yorker never took advantage of, given its reputation and legacy.
    • tclancy 1 hour ago
      I’ve happily lost hours to following links at the bottom of one story to the next. The new archive still feels a little clunky (search needs a fair bit of work and the OCR clearly struggled in places), but it’s fun to chase down old classics and they’ve done a great job of highlighting greatest hits from the past 100 years.

      Plus the (really high-quality) crossword puzzles often have an Easter egg where the big revealer is linked to an essay from the past.

  • robin_reala 4 hours ago
    Slightly different question, but does anyone have any info about Google’s digitisation of Mainichi Shimbun’s pre-war articles? The work was announced 3 years ago, but it’s been radio silence since: https://mainichi.jp/english/articles/20221110/p2a/00m/0bu/00...
  • gregsadetsky 1 hour ago
    I think that a better link (even though it lacks the context) is this new archive (which is mostly good as it lets you quickly see all cover pages) - https://www.newyorker.com/archive

    But yeah, without a subscription, this still mostly just leads to walled off pages.

    Accessing the actual archived version of every issue at https://archives.newyorker.com/ is truly wonderful as they are fully digitized back to back.

    • toofy 23 minutes ago
      hopefully a lot of local libraries will have access. i could spend hours sifting through this.
  • TrevorFSmith 1 hour ago
    I am a subscriber but still would love a tarball of PDFs of each issue.
  • boh 3 hours ago
    Honestly this got me to subscribe. The back catalog is pretty stellar with pretty much every major writer of the twentieth century making a contribution. Zooming in on PDFs just wasn't how you wanted to read them.
  • subpixel 5 hours ago
    Here’s a place to start, a list of 250 “best” articles from the New Yorker. I guess this is from previously available articles.

    https://www.reddit.com/r/longform/s/zRJgAEdagi

  • JKCalhoun 4 hours ago
    I saw no way to pull down a PDF. That's unfortunate as I prefer to browse offline.
    • ez_mmk 4 hours ago
      I think you can download the entire issue from the archive
  • gavmor 4 hours ago
    How soon can we chat with it via RAG?
    • visarga 2 hours ago
      Haha, I can't read long articles anymore because I want to reply, a habit I picked up chatting with LLMs.
  • xnx 6 hours ago
    Nice! 100 years worth.
  • NoMoreNicksLeft 6 hours ago
    Could have sworn they did this years ago. I even have the first 80 years or whatever on DVD in the closet.
    • throwup238 1 hour ago
      Normally when laymen say "digitized" they mean one of two things: scanned images in a PDF or fully transcribed (and possibly formatted) text extracted from the scan. The Complete New Yorker you're thinking of was mostly the former, with a bit of indexing (table of contents pointing to the PDFs if I remember correctly).

      This latest digitization project does the latter, transcribing the text into their existing content management system and as far as I can tell, preserving much of the formatting. This comes with full text search, allows cross linking between articles, and all that good stuff.

      I suspect that since they include an LLM summary and started this digitization project in early 2024, this was enabled by LLMs.

    • smelendez 4 hours ago
      If I’m reading this correctly, they now have all their historic articles loaded into their CMS. I think they previously just had a system where you could page (and maybe search?) through scans of old issues, which is also cool but not as versatile.
    • ghaff 5 hours ago
      When a lot of content was being put out on CD/DVD, a number of publications did so, but they are not straightforwardly accessible these days because they're usually tied to an old version of Windows. (Yes, if you want to make a project of it, you can probably get into them, but it has never been worth it for me.)
      • haunter 5 hours ago
        Usually Windows/Wine is the much better case than the old Mac apps (32bit, PPC etc) in the age of Apple Silicon

        https://old.reddit.com/r/thenewyorker/comments/1jlhrve/instr...

        Breaking the DJVU DRM would be the perfect solution though

      • mekael 2 hours ago
        Surprisingly, this has been a project I’ve been tinkering with for years. There is an easy way to get the raw png/jpeg files out, but it does require a Windows box. I'm planning on working on it more over the long holiday.
      • zorked 5 hours ago
        I think the disc release GP is talking about had files in DjVu format.
        • Tomte 3 hours ago
          Encrypted DjVu, and the viewer doesn't run on modern Windows.
      • kopirgan 5 hours ago
        I have the MAD archives, bought in the 90s on CDs, but can't use them.
        • haunter 4 hours ago
          The issues on the Absolutely MAD DVD (1952-2005) are just plain PDF files, no DRM, they work perfectly

          https://files.catbox.moe/x4np6u.png

          • ghaff 4 hours ago
            The CDs I have seem to be proprietary for Windows from the late 90s. But I also have PDFs through 2005 on my computer which I must have "acquired" at some point.
            • haunter 3 hours ago
              The browser app might be some outdated Windows application, that's the case with the MAD DVD too, but you can find the actual issue files in some folders
        • ghaff 4 hours ago
          I have MAD archives somewhere. I thought they were in some standard format but maybe not.

          A lot of the gen 1 or so CD content isn't easily accessible although a more industrious person could probably get to it in some manner.

      • fsckboy 5 hours ago
        doesn't wine have old versions of mswindows pretty much nailed?