My adventure in designing API keys

(vjay15.github.io)

60 points | by vjay15 2 days ago

18 comments

  • bob1029 2 hours ago
    I don't understand the need for this level of engineering. It appears we are going for an opaque bearer token here. The checksum is pointless because an entire 512 bit token still fits in an x86 cache line. Comparing the whole sequence won't show up in any profiler session you will ever care about.

    If you want aspects of the token to be inspectable by intermediaries, then you want json web tokens or a similar technology. You do not want to conflate these ideas. JWTs would solve the stated database concern. All you need to store in a JWT scheme are the private/public keys. Explicit tracking of the session is not required.

    • vjay15 46 minutes ago
      Hello bob! The checksum is for offline secret scanning, and also for rejecting API keys that might have a typo (a niche case)

      I was just confused about the JWT approach, since from the research I did it seemed an API key is supposed to be a unique string and that's it!

      • bob1029 22 minutes ago
        The neat thing about JWT is that there are no secrets to scan for. Your secret material ideally lives inside an HSM and never leaves. Scanning for these private keys is a waste of energy if they were generated inside the secure context.
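        To illustrate, here is a minimal stdlib sketch of stateless token verification (HS256 with an in-process secret for brevity; a real deployment would use an asymmetric algorithm with the key material kept inside the HSM, and every name below is illustrative):

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-key-material"  # stands in for the HSM-held key

def b64url(data: bytes) -> bytes:
    # JWT uses unpadded URL-safe base64
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def sign(claims: dict) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    sig = b64url(hmac.new(SECRET, header + b"." + payload, hashlib.sha256).digest())
    return (header + b"." + payload + b"." + sig).decode()

def verify(token: str) -> bool:
    # no database round-trip: the signature alone proves authenticity
    header, payload, sig = token.encode().split(b".")
    expected = b64url(hmac.new(SECRET, header + b"." + payload, hashlib.sha256).digest())
    return hmac.compare_digest(sig, expected)

token = sign({"sub": "account-123"})
assert verify(token)
tampered = token[:-1] + ("A" if token[-1] != "A" else "B")
assert not verify(tampered)
```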
      • petterroea 30 minutes ago
        I may be naive, but I can't imagine anyone typing an API key by hand. Optimizing for it sounds like premature optimization; surely stopping the fewer than one in a million HTTP requests with a hand-typed API key from reaching the DB isn't worth anything.
      • arethuza 40 minutes ago
        "for rejecting api keys which might have a type" - assuming that is meant to be "typo" - won't they get rejected anyway?
        • vjay15 38 minutes ago
          it's just an added benefit, I don't have to make a DB call to verify that :)
    • notpushkin 2 hours ago
      > The checksum is pointless because an entire 512 bit token still fits in an x86 cache line

      I suppose it’s there to avoid a round-trip to the DB. Most of us just need to host the DB on the same machine instead, but given that sharding is involved, I assume the product is big enough that this is undesirable.

      • phire 1 hour ago
        You need to support revocation, so I'm not sure it's ever possible to avoid the need for a round trip to verify the token.
        • kukkamario 1 hour ago
          The point of the checksum is to just drop obviously wrong keys. No need to handle revocation or do any DB access if checksum is incorrect, the key can just be rejected.
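          A sketch of that precheck, assuming a hypothetical "sk_live_" slug, a 32-character hex body, and a CRC32 checksum (the exact shape is illustrative):

```python
import secrets
import zlib

def make_key(prefix: str = "sk_live_") -> str:
    body = secrets.token_hex(16)  # 32 hex chars of randomness
    checksum = format(zlib.crc32(body.encode()) & 0xFFFFFFFF, "08x")
    return prefix + body + checksum

def plausible(key: str, prefix: str = "sk_live_") -> bool:
    # cheap local check: shape and checksum, no DB access needed
    if not key.startswith(prefix) or len(key) != len(prefix) + 40:
        return False
    body, checksum = key[len(prefix):-8], key[-8:]
    return format(zlib.crc32(body.encode()) & 0xFFFFFFFF, "08x") == checksum

key = make_key()
assert plausible(key)
# a single-character corruption fails the checksum before any DB call
assert not plausible(key[:-1] + ("0" if key[-1] != "0" else "1"))
```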
      • rrr_oh_man 1 hour ago
        > I assume the product is big enough

        Experience tells otherwise

  • weitendorf 2 hours ago
    Hey OP, sorry for the negativity, I think most of these commenters right now are pretty off-base. My company is building a lot of API infrastructure and I thought this was a great write up!
  • randomint64 1 hour ago
    While it's true that API keys are basically prefix + base32Encode(ID + secret), you will want a few more things to make secure API keys: at least versioning and hashing metadata to avoid confused deputy attacks.

    Here is a detailed write-up on how to implement production API keys: https://kerkour.com/api-keys
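    A sketch of that shape, with a hypothetical versioned prefix (the "myapp_v1_" slug and the field sizes are my own illustration, not taken from the linked write-up):

```python
import base64
import secrets
import uuid

def make_key(prefix: str = "myapp_v1_") -> tuple[str, uuid.UUID]:
    # key = prefix + base32(ID + secret); the version lives in the prefix
    key_id = uuid.uuid4()
    secret = secrets.token_bytes(20)
    body = base64.b32encode(key_id.bytes + secret).decode().rstrip("=").lower()
    return prefix + body, key_id

def parse_key(key: str, prefix: str = "myapp_v1_") -> tuple[uuid.UUID, bytes]:
    raw = key[len(prefix):].upper()
    raw += "=" * (-len(raw) % 8)  # restore base32 padding
    data = base64.b32decode(raw)
    return uuid.UUID(bytes=data[:16]), data[16:]

key, key_id = make_key()
parsed_id, parsed_secret = parse_key(key)
assert parsed_id == key_id
```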

    • vjay15 46 minutes ago
      Thank you! I will definitely look into it!
  • petterroea 29 minutes ago
    A bit over-engineered, but it was fun to read observations on industry-standard API keys. I agree it would be nice to have more discussion around API keys and the qualities one would want from them.
  • Savageman 1 hour ago
    Side note: the slug prefix is not primarily intended for the end-user / developer to figure out which kind of key it is, but for security scanners to detect when they are committed to code / leaked and invalidate them.
    • vjay15 41 minutes ago
      Ahhhh I see, I didn't think about it that way either, this could help us a lot yea!!!
  • amelius 14 minutes ago
    It's a bit confusing that the "Random hex" example contains characters such as "q" and "p".
    • vjay15 12 minutes ago
      I don't understand your question :o
      • onei 7 minutes ago
        Hex is 0-9, a-f. P and q are outside that character set.
  • calrain 3 hours ago
    I don't like giving away any information whatsoever in an API key, and would lean towards a UUIDv7 string, just trying to avoid collisions.

    Even the random hex with checksum component seems overkill to me, either the API key is correct or it isn't.

    • andrus 2 hours ago
      GitHub introduced checksums to their tokens to aid offline secret scanning. AFAIK it’s mostly an optimization for that use case. But the checksums also mean you can reveal a token’s prefix and suffix to show a partially redacted token, which has its benefits.
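      The partial-redaction trick is simple to sketch (token values below are made up):

```python
def redact(token: str, show: int = 4) -> str:
    """Show only the first and last few characters of a token; mask the middle."""
    if len(token) <= show * 2:
        return "*" * len(token)
    return token[:show] + "*" * (len(token) - show * 2) + token[-show:]

assert redact("abcd1234efgh") == "abcd****efgh"
```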
    • sneak 13 minutes ago
      Identifying an opaque value is useful for security analysis. You can use regex to see when they are committed to repos accidentally, for example.
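      For example, with a made-up key shape of a slug plus 40 hex characters, the scanner is a one-liner:

```python
import re

# hypothetical key shape: "sk_live_" slug followed by 40 hex chars
KEY_RE = re.compile(r"\bsk_live_[0-9a-f]{40}\b")

source = '''
API_KEY = "sk_live_0123456789abcdef0123456789abcdef01234567"
other = "not_a_key"
'''
leaks = KEY_RE.findall(source)
assert len(leaks) == 1
```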
  • pdhborges 1 hour ago
    I don't even understand what approach 3 is doing. They ended up hashing the random part of the API key with a hash function that produces a small hash, and stored that in the metashard server; is that it?
    • vjay15 39 minutes ago
      Yea... sorry, I'm still not the best explainer, but that is the approach: I just wanted to have a shorter hash in the meta shard, that is it. Approach 3 is an attempt by me to write my own base62/base70 encoder ;-;
  • grugdev42 36 minutes ago
    Everything about this is over-engineered. Just KISS.
  • vjay15 2 days ago
    Hello everyone, this is my third blog; I am still a junior learning stuff ^_^
    • vjay15 28 minutes ago
      I NEVER THOUGHT I WOULD BE IN THE MAIN PAGE OF HACKERNEWS THANK YOU SO MUCH GUYS (╥﹏╥)
    • notpushkin 2 hours ago
      Hey, welcome to HN!

      Seeing “hex” pointing to a clearly base62-ish string was a bit interesting :-)

      Also, could we shard based on a short hash of account_id, and store the same hash in the token? This way we can lose the whole api_key → account_id lookup table in the metashard altogether.
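      A sketch of that idea (the shard count, slug, and key layout are made up):

```python
import hashlib
import secrets

NUM_SHARDS = 16

def shard_of(account_id: str) -> int:
    # short, stable hash of the account id decides the shard
    digest = hashlib.sha256(account_id.encode()).digest()
    return digest[0] % NUM_SHARDS

def make_key(account_id: str) -> str:
    # embed the shard hash in the key itself: ak_<shard>_<random>
    return f"ak_{shard_of(account_id):02x}_{secrets.token_hex(16)}"

def shard_from_key(key: str) -> int:
    # route the request without any api_key -> account_id lookup table
    return int(key.split("_")[1], 16)

key = make_key("account-123")
assert shard_from_key(key) == shard_of("account-123")
```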

      • vjay15 45 minutes ago
        Hello, thanks for reading through my blog :D Coming to your question: yes, that is possible! I mentioned it in my second approach!

        But when I mentioned it to my senior he wanted me to default with the random string approach :)

  • tlonny 1 hour ago
    Presumably, because API keys are n bytes of random data rather than a shitty user-generated password, we don’t have to bother using a salt and can use something cheap to compute like SHA-256 instead of a multi-round bcrypt-like?
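    A sketch of what that would look like (illustrative names; the point is that a high-entropy key needs no salt or key stretching):

```python
import hashlib
import hmac
import secrets

def new_key() -> tuple[str, str]:
    # ~256 bits of entropy: dictionary and rainbow-table attacks don't apply
    key = secrets.token_urlsafe(32)
    return key, hashlib.sha256(key.encode()).hexdigest()

def check(presented: str, stored_digest: str) -> bool:
    digest = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(digest, stored_digest)

key, digest = new_key()   # store only the digest in the DB
assert check(key, digest)
assert not check("wrong", digest)
```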
    • vjay15 11 minutes ago
      I can't understand what you are trying to say :o
  • ramchip 2 hours ago
    The purpose of the checksum is to help secret scanners avoid false positives, not to optimize the (extremely rare) case where an API key has a typo
    • vjay15 43 minutes ago
      thank you so much ram chip :) I didn't know that!
  • sneak 12 minutes ago
    This is a very good example of premature optimization.
  • usernametaken29 2 hours ago
    I know sometimes people just like to try things out, but for the love of god do not implement encryption related functionality yourself. Use JWT tokens and OpenSSL or another established library to sign them. This problem is solved. Not essentially solved, solved. Creating your own API key system has a high likelihood of fucking things up for good!
    • fabian2k 2 hours ago
      You don't need any encryption or signing for API keys. Using JWTs is probably more dangerous here, and more annoying for people using the API since you now have to handle refreshing tokens.

      Plain old API keys are straightforward to implement. Create a long random string and save it in the DB. When someone connects to the API, check if the API key is in your DB and use that to authenticate them. That's it.

      • swiftcoder 1 hour ago
        > Plain old API keys are straightforward to implement

        This is pretty much just plain-old-api-keys, at least as far as the auth mechanism is concerned.

        The prefix slug and the checksum are just there so your vulnerability scanner can find and revoke all the keys folks accidentally commit to github.

        • vjay15 42 minutes ago
          yes this is the approach!
      • sabageti 1 hour ago
        We don't store them in plain text, right? Store them hashed, as always.
      • iamflimflam1 2 hours ago
        I would add the capability to be able to seamlessly rotate keys.

        But otherwise, yes, for love of everything holy - keep it simple.
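        Seamless rotation mostly just means allowing more than one active key per account, e.g.:

```python
import secrets

active_keys: dict[str, set[str]] = {}   # account -> currently valid keys

def issue(account: str) -> str:
    key = secrets.token_urlsafe(32)
    active_keys.setdefault(account, set()).add(key)
    return key

def revoke(account: str, key: str) -> None:
    active_keys.get(account, set()).discard(key)

old = issue("alice")
new = issue("alice")   # both keys work during the rollover window
assert old in active_keys["alice"] and new in active_keys["alice"]
revoke("alice", old)   # retire the old key once clients have switched
assert old not in active_keys["alice"]
```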

    • notpushkin 2 hours ago
      The security here comes from looking the key up in the DB, not from any crypto shenanigans.
  • dhruv3006 2 hours ago
    Hey - this was a great blog ! I liked how you used the birthday paradox here.

    PS: I too am working on APIs. Take a look here: https://voiden.md/
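    For anyone curious, the birthday-paradox estimate is easy to reproduce (the numbers below are my own illustration, not the blog's):

```python
import math

def collision_prob(n_keys: int, bits: int) -> float:
    # birthday-paradox approximation: p ≈ 1 - exp(-n^2 / 2^(bits+1))
    return -math.expm1(-(n_keys ** 2) / 2 ** (bits + 1))

# a billion 128-bit keys: collision probability is negligible
assert collision_prob(10**9, 128) < 1e-18

# but 2^32 keys of only 64 bits collide ~39% of the time
assert 0.39 < collision_prob(2**32, 64) < 0.40
```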
