Why proteins fold and how GPUs help us fold

(aval.bearblog.dev)

70 points | by diginova 4 hours ago

9 comments

  • fabian2k 1 hour ago
The secondary structure graphic is entirely wrong. It's full of bad chemical formulas, and I would assume it is AI-generated.

    I'm quite impressed by the amino acid overview graphic. I'm sure all images are AI-generated, and this one is something I didn't expect AI to be able to do yet. There are mistakes in there (e.g. Threenine instead of Threonine, charged amino groups for some amino acids), but it doesn't look immediately wrong. Though I haven't needed to know the chemical formula for all the amino acids in a long time, so there are probably more errors in there I didn't immediately notice. The angles and lengths of the bonds are not entirely consistent, but that also happens without AI sometimes if someone doesn't know the drawing tools well. The labels are probably the clearest indicator, because they are partly wrong and not consistent, as they sometimes also include the non-side-chain parts, which doesn't make sense.

    The biology part of the text looks somewhat reasonable overall, I didn't notice any completely outrageous statements at a quick glance. Though I don't like the "folding is reproducible" statement as that is a huge oversimplification. Proteins do misfold, and there is an entire apparatus in the cells to handle those cases and clean them up.

    • D-Machine 1 hour ago
      This article is garbage and makes many incorrect claims, and it is clearly AI-generated. E.g. the claim that "AlphaFold doesn't simulate physics. It recognizes patterns learned from 170,000+ known protein structures" couldn't be farther from the truth. Physical models are baked right into AlphaFold models and development at multiple steps, it is a highly unique architecture and approach.

      AlphaFold models also used TPUs: https://github.com/google-deepmind/alphafold/issues/31#issue...

      EDIT: Also annoying is the usual bullshit about "attention" being some kind of magic. It isn't even clear AlphaFold uses the same kind of attention as typical LLM transformers, because it uses custom "Evoformer" layers instead: https://www.nature.com/articles/s41586-021-03819-2_reference...

    • augment_me 1 hour ago
      The text structure screams GPT5 sadly, so I would not be surprised if not only the text but also the images were wrong.
    • coolness 1 hour ago
      Yeah, I don't really understand why someone would make a blog and use AI to write the articles. Isn't having a blog more about the joy of writing and the learning you do while writing it?
      • lm28469 1 hour ago
        Because it's what cool people do, so if you want to be cool you do it. They didn't realise the cool part was actually having the knowledge and actually writing the text.

        There are many similar things where people just take shortcuts because they don't understand that the interesting part is the process/skill, not the final result. It probably has to do with external validation; reddit is full of "art" subs being polluted by these people, and generative AI is even leaking into leather work, wood carving, lino cut. It's a cancer.

    • Agingcoder 1 hour ago
      It’s also not a solved problem, unlike what the article claims, unless ‘solved’ doesn’t mean ‘works all the time’.
    • robbie-c 1 hour ago
      I think it's just an AI-generated simplification, sucks that it made it to the front page. The subject matter is interesting, I would have loved to have read something written by an expert!
      • fabian2k 1 hour ago
        I would assume so, but I didn't see any smoking guns in the text itself. But I'm also not familiar with the newest models here and their quirks.
        • D-Machine 1 hour ago
          See my point above (https://news.ycombinator.com/item?id=46271980) for smoking guns. There are some pretty basic and grievous factual errors re: GPUs being used when in fact TPUs are used, and completely false claims about physical models not being huge parts of AlphaFold development and even architecture.
          • fabian2k 1 hour ago
            Those errors don't seem AI-specific to me, they could easily be made by humans.
            • D-Machine 57 minutes ago
              True, it is the style of the post that reveals obvious overuse of AI. The errors could well be made by a human, especially since a trivial visit to Wikipedia or one of the original papers will show most of what is being said here re: the actual deep models to be wrong. This is more likely the error of a human than an AI.

              EDIT: Ugh, it is late. I mean, if you used e.g. ChatGPT-5.X with extended thinking and search, it would not make these grievous errors. However, ChatGPT without search and in the default style produces junk basically indistinguishable from this kind of post. So, for me, the smoking gun is that not even the most basic due diligence (reading Wikipedia or looking at the actual papers) has been done, and, given the length and style of the post, this is effectively a smoking gun for (cheap, free-version) AI use.

              But, more importantly, it is indistinguishable in quality from AI slop, and so garbage regardless.

  • penetrarthur 1 hour ago
    Great article!

    On a sidenote, what is this new style of writing using small sentences where each sentence is supposed to be a punchline?

    "And most of those sequences? They don't fold into anything useful. They're junk. They aggregate into clumps. They get degraded by cellular quality control. Only a TINY fraction of possible sequences fold into stable, functional proteins."

    • prof-dr-ir 1 hour ago
      > what is this new style of writing

      Congratulations, you are now able to recognize an AI-generated text.

      (As of December 2025 at least, who knows what they will look like next month.)

    • lm28469 58 minutes ago
      Short sentences are good. Especially when you interact with low-attention individuals. Make sure they stay engaged. It's not just a style. It's a game changer for your blog.
    • cassianoleal 1 hour ago
      Sounds like TEDspeak, only in writing.
  • zkmon 1 hour ago
    If nature did so well for billions of years, why are we taking over its job now? Did it ask for your help?

    Anytime someone talks about large numbers - some galaxy is billions of kilometers away, there are trillions of atoms in the universe, trillions of possible combinations for a problem, etc. - it appears to me that you're talking about some problem that doesn't fall into your job description.

  • atomlib 3 hours ago
    Was this text AI-generated?
  • ursAxZA 1 hour ago
    One protein fold is cute.

    How many H100s do you need to simulate one human cell? Probably more than the universe can power.

  • emptybits 2 hours ago
    I really appreciated the explanation of what proteins are, in simple terms. I assume (?) it's accurate enough for a layperson.

    And I do love the optimism.

    But then you must admit this reads like a B-movie intro:

        Then AI companies showed up in 2020 and said "we got this" and
        solved it in an afternoon. ... We're playing God with molecules
        and it's working.
  • topaz0 2 hours ago
    I got about a page in before finding out this is drivel. The final straw was "AI companies showed up and solved it in an afternoon". No faster way to show you don't know what you're talking about.
    • D-Machine 1 hour ago
      Yeah this article is garbage. The real problem with protein-folding is not compute, or training on known configurations only, but figuring out a differentiable loss that is related to the energy configuration of generated new sequences / molecules, and iterative folding and all sorts of other things. It is very much NOT just a "throw lots of data at GPUs" problem.

      This is all covered cursorily even by Wikipedia - https://en.wikipedia.org/wiki/AlphaFold#AlphaFold_2_(2020).
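      To make the "differentiable loss related to the energy configuration" point concrete, here is a toy sketch (entirely made up for illustration, with no relation to AlphaFold's actual losses or architecture): gradient descent on a differentiable energy for a 2D bead chain, where springs hold bonded neighbors at a target length and a soft repulsion keeps non-bonded beads apart. The hard part in the real problem is designing an energy/loss like this that is both differentiable and physically meaningful; the toy only shows the mechanics of descending one.

      ```python
      import numpy as np

      def energy(x, bond=1.0, k_bond=10.0, k_rep=1.0):
          """Toy chain energy: bond springs + soft non-bonded repulsion."""
          d = np.linalg.norm(np.diff(x, axis=0), axis=1)  # consecutive bond lengths
          e = k_bond * np.sum((d - bond) ** 2)
          n = len(x)
          for i in range(n):
              for j in range(i + 2, n):                   # non-adjacent pairs only
                  r2 = np.sum((x[i] - x[j]) ** 2)
                  e += k_rep / (r2 + 1e-6)                # soft repulsion
          return e

      def num_grad(f, x, eps=1e-5):
          """Central-difference gradient (autodiff stand-in for a toy)."""
          flat = x.ravel()                                # view into x
          g = np.zeros_like(flat)
          for i in range(flat.size):
              old = flat[i]
              flat[i] = old + eps; e_plus = f(x)
              flat[i] = old - eps; e_minus = f(x)
              flat[i] = old
              g[i] = (e_plus - e_minus) / (2 * eps)
          return g.reshape(x.shape)

      rng = np.random.default_rng(0)
      x = rng.normal(size=(8, 2))                         # random initial "fold"
      e_start = energy(x)
      lr = 0.01
      for _ in range(200):
          step = x - lr * num_grad(energy, x)
          if energy(step) < energy(x):                    # accept only downhill steps
              x = step
          else:
              lr *= 0.5
      e_end = energy(x)
      print(e_start, e_end)
      ```

      Minimizing this toy energy is easy; the research problem is that real conformational energy landscapes are rugged and the "right" differentiable surrogate is unknown.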

    • terhechte 1 hour ago
      I don't know the space, so I found the article interesting. Please explain, what's wrong with it?
      • eesmith 29 minutes ago
        From the text:

        > as you're reading this, there are approximately 20,000 different types of proteins working inside your body.

        From https://biologyinsights.com/how-many-human-proteins-are-ther...

        "The human genome contains approximately 19,000 to 20,000 protein-coding genes. While each gene can initiate the production of at least one protein, the total count of distinct proteins is significantly higher. Estimates suggest the human body contains 80,000 to 400,000 different protein types, with some projections reaching up to a million, depending on how a “distinct protein” is defined."

        Plus, that's just in the human DNA. In your body are a whole bunch of bacteria, adding even more types of protein.

        > The actual number of protein molecules? Billions. Trillions if we're counting across all your cells.

        There are on average 10 trillion proteins in a single cell. https://nigms.nih.gov/biobeat/2025/01/proteins-by-the-number... There are over 30 trillion human cells in an adult. https://pmc.ncbi.nlm.nih.gov/articles/PMC4991899/ . That's about 300 septillion proteins in the body. While yes, that's "trillions" in some mathematical sense, in that case it's also "tens" of proteins.

        (The linked-to piece later says "every single one of your 37 trillion cells", showing that "trillions" is far from the correct characterization. "trillions of trillions" would get the point across better.)
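        The multiplication behind that figure, as a quick sanity check using the two cited estimates:

        ```python
        # Back-of-envelope from the two cited estimates above.
        proteins_per_cell = 10e12   # ~10 trillion proteins per cell (NIGMS estimate)
        cells = 30e12               # ~30 trillion human cells
        total = proteins_per_cell * cells
        print(f"{total:.0e}")       # 3e+26, i.e. ~300 septillion (septillion = 1e24)
        ```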

        > Each one has a specific job.

        Proteins can do multiple jobs, unless you define "job" as "whatever the protein does."

        Eg, from https://pmc.ncbi.nlm.nih.gov/articles/PMC3022353/

        "many of the proteins or protein domains encoded by viruses are multifunctional. The transmembrane (TM) domains of Hepatitis C Virus envelope glycoprotein are extreme examples of such multifunctionality. Indeed, these TM domains bear ER retention signals, demonstrate signal function and are involved in E1:E2 heterodimerization (Cocquerel et al. 1999; Cocquerel et al. 1998; Cocquerel et al. 2000). All these functions are partially overlapped and present in the sequence of <30 amino acids"

        > And if even ONE type folds wrong, one could get ... sickle cell anemia

        Sickle cell anemia is due to a mutation in the hemoglobin gene causing a hydrophobic patch to appear on the surface, which causes the hemoglobins to stick to each other.

        It isn't caused by misfolding. https://en.wikipedia.org/wiki/Sickle_cell_disease

        (I haven't researched the others to see if they are due to misfolding.)

        > Your body makes these proteins perfectly

        No, it doesn't. The error rate is quite low, but not perfect. Quoting https://pmc.ncbi.nlm.nih.gov/articles/PMC3866648/

        "Errors are more frequent during protein synthesis, resulting either from misacylation of tRNAs or from tRNA selection errors that cause insertion of an incorrect amino acid (misreading) shifting out of the normal reading frame (frameshifting), or spontaneous release of the peptidyl-tRNA (drop-off) (Kurland et al. 1996). Misreading errors are arguably the most common translational errors (Kramer and Farabaugh 2007; Kramer et al. 2010; Yadavalli and Ibba 2012)."

        > Then AI companies showed up in 2020 and said "we got this" and solved it in an afternoon.

        They didn't simply "show up" in 2020. Google DeepMind was working on it since 2016 or so. https://www.quantamagazine.org/how-ai-revolutionized-protein...

        > we're DESIGNING entirely new proteins that have never existed in nature

        We've been designing new proteins that have never existed in nature for decades. From https://en.wikipedia.org/wiki/Protein_design

        "The first protein successfully designed completely de novo was done by Stephen Mayo and coworkers in 1997 ... Later, in 2008, Baker's group computationally designed enzymes for two different reactions.[7] In 2010, one of the most powerful broadly neutralizing antibodies was isolated from patient serum using a computationally designed protein probe.[8] In 2024, Baker received one half of the Nobel Prize in Chemistry for his advancement of computational protein design, with the other half being shared by Demis Hassabis and John Jumper of Deepmind for protein structure prediction."

        > These are called secondary structures, local patterns in the protein backbone

        The corresponding figure is really messed up. The sequence of atoms in the amino acids is wrong, and the pairs of atoms which are hydrogen bonded are wrong. For example, it shows a hydrogen bond between two double-bonded oxygens, which don't have a hydrogen, and a hydrogen bond between two hydrogens, which would both have partial positive charge. The hydrogen bonds are supposed to go from the N-H to the O=C. See https://en.wikipedia.org/wiki/Beta_sheet#Hydrogen_bonding_pa...

        > Given the same sequence, you get the same structure.

        The structure may depend on environmental factors. For example, https://en.wikipedia.org/wiki/%CE%91-Lactalbumin "α-lactalbumin is a protein that regulates the production of lactose in the milk of almost all mammalian species ... A folding variant of human α-lactalbumin that may form in acidic environments such as the stomach, called HAMLET, probably induces apoptosis in tumor and immature cells."

        There can also be post-translational modifications.

        > The sequence contains all the instructions needed to fold into the correct shape.

        Assuming you know the folding environment.

        > Change the shape even slightly, and the protein stops working.

        I don't know how to interpret this. Some proteins require changing their shape to work. Myosin - a muscle protein - changes its shape during its power stroke.

        > Prions are misfolded proteins that can convert normal proteins into the misfolded form, spreading like an infection

        Earlier the author wrote "It's deterministic (mostly, there are exceptions called intrinsically disordered proteins, but let's not go there)."

        https://en.wikipedia.org/wiki/Prion says "Prions are a type of intrinsically disordered protein that continuously changes conformation unless bound to a specific partner, such as another protein."

        So the author went there. :)

        Either accept that proteins aren't always deterministically folded based on their sequence, or don't use prions as an example of misfolding.

      • D-Machine 1 hour ago
        See for example the AlphaFold2 presentation linked here: https://predictioncenter.org/casp14/doc/presentations/2020_1.... Some excerpts pointing out that most of the innovations are NOT just "huck a transformer at it":

        ====

        Physical insights are built into the network structure, not just a process around it

        - End-to-end system directly producing a structure instead of inter-residue distances

        - Inductive biases reflect our knowledge of protein physics and geometry

        - The positions of residues in the sequence are de-emphasized

        - Instead residues that are close in the folded protein need to communicate

        - The network iteratively learns a graph of which residues are close, while reasoning over this implicit graph as it is being built

        What went badly:

        - Manual work required to get a very high-quality Orf8 prediction

        - Genetics search works much better on full sequences than individual domains

        - Final relaxation required to remove stereochemical violations

        What went well

        - Building the full pipeline as a single end-to-end deep learning system

        - Building physical and geometric notions into the architecture instead of a search process

        - Models that predict their own accuracy can be used for model-ranking

        - Using model uncertainty as a signal to improve our methods (e.g. training new models to eliminate problems with long chains)

        ====

        Also you can read the papers, e.g. https://www.nature.com/articles/s41586-019-1923-7 (available if you search the title on Google Scholar; also https://www.nature.com/articles/s41586-021-03819-2_reference...). There is actual, real good science, physics, and engineering going on here, as compared to e.g. LLMs or computer vision models that are just trained on the internet, and where all the engineering is focused on managing finicky training and compute costs. AlphaFold requires all this and more.

        EDIT: Basically, the article makes it sound like deep models just allowed scientists to sidestep all the complicated physics and magically solve the problem, and while this is arguably somewhat correct for computer vision and much of NLP, it is the exact opposite of the truth for AlphaFold.
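        The "iteratively learns a graph of which residues are close, while reasoning over this implicit graph" bullet can be illustrated with a toy sketch. To be clear: this is NOT AlphaFold or the Evoformer; every name and dimension here is invented, and real pair-biased attention has per-head projections, gating, triangle updates, etc. The toy only shows the loop structure: residue attention is biased by the current pair representation, and the pair representation is then updated from the refined residue features.

        ```python
        import numpy as np

        rng = np.random.default_rng(0)
        N, d = 6, 8                                  # toy: 6 residues, 8 features
        s = rng.normal(size=(N, d))                  # per-residue features
        z = 0.1 * rng.normal(size=(N, N, d))         # pair features ("who is close to whom")
        w_bias = rng.normal(size=(d,)) / np.sqrt(d)  # projects a pair entry to a scalar bias

        def softmax(a, axis=-1):
            a = a - a.max(axis=axis, keepdims=True)
            e = np.exp(a)
            return e / e.sum(axis=axis, keepdims=True)

        for _ in range(3):  # iterative refinement, Evoformer-like in spirit only
            # 1) residue attention with logits biased by the current pair graph
            logits = (s @ s.T) / np.sqrt(d) + z @ w_bias   # (N, N)
            attn = softmax(logits, axis=-1)
            s = s + attn @ s                               # communicate along inferred contacts
            s = s / np.linalg.norm(s, axis=-1, keepdims=True)
            # 2) update the pair graph from refined residue features (outer-product style)
            z = z + 0.01 * (s[:, None, :] * s[None, :, :])
        ```

        The point of the toy is the coupling: the pair representation shapes the attention, and the attention output reshapes the pair representation, which is where the "reasoning over a graph as it is being built" framing comes from.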

  • naaqq 1 hour ago
    Start reading from the 3/4 mark, that’s the ‘how’ part
  • VirusNewbie 2 hours ago
    Did AlphaFold not use TPUs?
    • D-Machine 1 hour ago
      Yes https://github.com/google-deepmind/alphafold/issues/31#issue....

      This article is garbage and makes many incorrect claims, and it is clearly AI-generated. E.g. the claim that "AlphaFold doesn't simulate physics. It recognizes patterns learned from 170,000+ known protein structures" couldn't be farther from the truth. Physical models are baked right into AlphaFold models and development at multiple steps, it is a highly unique architecture and approach.