Hacker News — vinext + Cloudflare Workers

new
past
show
ask
show
jobs
submit

▲The perils of UUID primary keys in SQLite (andersmurphy.com)

74 points by emschwartz 9 hours ago | 41 comments

bambax 7 minutes ago [-]

Why would you use UUIDs a primary keys? Let SQLite use rowids internally (which is automatic and invisible), and have a different (indexed) column with UUID if you need that for publishing the ID somewhere.

blopker 7 hours ago [-]

UUIDs are way over used. There is almost always a better key to use, usually a bigint for databases. If you're making some kind of leaderless distributed data store, then maybe, but even then there are other ID sharding strategies I'd go for first depending on the constraints.

For a single database, bigints are smaller and faster, with less footguns.

UUIDs can be nice for an opaque public ID, however I'd still prefer something like a Sqid for space and usability.

Fabricio20 5 hours ago [-]

> bigints are smaller and faster, with less footguns

But be careful!! Javascript WILL interpret your bigints as Number() and round them down because they are too big without telling you!!!

Famously seen by every snowflake user that has interacted with Javascript, quite an annoying problem.

silvestrov 24 minutes ago [-]

Good trick is to prefix all such keys with magic, i.e. a couple of letters that identify type type of key.

Then it will always be a string and you will be free to change the format/type of the key in the future to UUID or whatever you like.

spiffytech 5 hours ago [-]

Fortunately we're seeing more JS DB libraries offering to read large numbers as the BigInt type.

shakna 2 hours ago [-]

But frustratingly, a JS BigInt is nothing like a BigInt in any other language.

In JS - BigInt is 64bit integer.

In anything else - BigInt is a arbitrarily large integer.

anematode 2 hours ago [-]

Hm? JavaScript BigInts are arbitrary precision, and you need to use methods like BigInt.asIntN(64, a) to convert them to 64 bits

mort96 29 minutes ago [-]

I hate this so much because you can’t nicely serialise a BigInt as JSON. Using a string is nicer but it only makes sense where int64 is used as an ID, not where it’s used as a number; and you don’t wanna have to configure this per field per query.

Etheryte 1 hours ago [-]

This is simply not true? Or maybe I misunderstand what you mean?

paulddraper 5 hours ago [-]

Node.js drivers will correctly read int64 as string or bigint, not number.

E.g. pg for PostgreSQL

Maybe there’s a buggy driver but I don’t know it.

JamesSwift 6 hours ago [-]

UUIDs also have a nice benefit of it being impossible to query the wrong table with one if you mixup what an FK goes to

pyuser583 5 hours ago [-]

Yeah this is nice - also helps with grepping dump files.

mamcx 3 hours ago [-]

How is this done?

nickpeterson 3 hours ago [-]

They just mean you catch incorrect joins more easily because there is usually no overlap in keys between unrelated tables. Using int, you’re usually going to have some shared values between two unrelated tables.

sudoshred 3 hours ago [-]

Statistically impossible to inadvertently generate a collision using UUID keys. UUID is designed to be unique when generated across any computer system. Practically speaking if you have an exactly matching pair of UUIDs from disparate system you have found the exact record match. The name gives a hint "Universally unique identifier". -Not a cryptographer.

3 hours ago [-]

masklinn 3 hours ago [-]

The U means if you join the wrong table your join will always come up empty.

It does not actually make it impossible to query the wrong table it just tells you quickly when you’ve done so.

andersmurphy 1 hours ago [-]

Yes this matters even more if you are doing a lot of joins. Naive string UUIDs are 32 bytes (though I use binary uuid in the post which is 16) compared to 8 bytes for a 64-bit int. This matters even more with sqlite as it uses varint encoding. The upshot of all this is your indexes take up a lot less space in memory.

bob1029 7 hours ago [-]

I am finding UUIDs help a lot if your primary schema consumer is an LLM.

Inappropriate aliasing of integer keys allows for silent errors in queries because it will actually return some result a lot of the time. A UUID is immune to this problem. The model recognizes its mistake a lot more reliably when previously non-empty tables start showing up empty after attempting a join.

crubier 5 hours ago [-]

No one ever got fired for using UUIDs

7 hours ago [-]

ac50hz 15 minutes ago [-]

I enjoy these carefully worded posts from Anders Murphy, illustrative and informative, not opinionated and preachy. Very useful, it’s great to see the process, and ofc bookmarkeable material for sharing with others.

cropcirclbureau 41 minutes ago [-]

Is this relevant for other databases? For postgres for example, which supports concurrent writers, wouldn't sequential keys lead to contention on the page at the frontier?

pyuser583 5 hours ago [-]

Oh gosh the ints v uuids debate for pks. This is worse than vim v eMacs or brackets v braces.

andersmurphy 1 hours ago [-]

This is actually a draft. I Wanted to add more details about how this changes with row size etc. I might get time to update it later today.

w10-1 6 hours ago [-]

Isn't the solution just to use the rowid (after doing the read-id-after-insert dance)?

How much trouble does SQLite reysing rowid's actually cause?

andersmurphy 1 hours ago [-]

You don't even need to that. SQLite auto increments the ids and is a single writer (which you should be coordinating at the application level.

Regular rowids are definitely the way to go if you can use them.

wood_spirit 2 hours ago [-]

If you need (or want the convenience of) a uuid and the time of creation is not secret then use ulids eg uuid v7.

yepyoukno 8 hours ago [-]

Perils of “UUIDv4”. Everyone knows that’s what UUIDv7 was really for, and you should always convert that to binary to optimize everything.

JSR_FDED 7 hours ago [-]

Small nit: uuid7 is 128 bits (16 bytes) by definition. So there’s no need to convert it to binary. It already is. Unless you’re working with a stringified version of the uuid7.

yepyoukno 7 hours ago [-]

Oh yes, I meant don’t store as an ID in its string format!

dexterdog 5 hours ago [-]

It's just s dumb as storing dates as strings, but people still do it.

littlecranky67 47 minutes ago [-]

But SQLite does not have a native datetime type so you have to use strings

voakbasda 4 minutes ago [-]

You can use an integer

cenamus 1 hours ago [-]

But also one of the recommended ways of doing it, as it has no native Datetime type.

themafia 4 hours ago [-]

> and you should always convert that to binary to optimize everything

I disagree. I tried this once. Now you need a client access layer to touch the DB in any context. All your console tools no longer work well or at all. If they show up in URLs you need to deoptimize them for transport.

You give up a lot of convenience for this optimization. You should be absolutely sure your design requires it before using it.

dumbledorf 8 hours ago [-]

Wait how is sqlite doing a million inserts a second?

JSR_FDED 7 hours ago [-]

In batches

kg 7 hours ago [-]

sqlite is really fast. I'm surprised it's only a million.

andersmurphy 1 hours ago [-]

It's running on an M1 mac with synchronous full. Wouldn't surprise me if it's possible to get higher numbers.

6 hours ago [-]

smitty1e 7 hours ago [-]

':memory:'

https://sqlite.org/inmemorydb.html

KPGv2 3 hours ago [-]

Except this source code is not using :memory: The linked source code has

    (defonce db
      (d/init-db! "db/db.db"
        {:pool-size 4 :pragma {:synchronous "FULL"}}))

That's writing to disk.

andersmurphy 1 hours ago [-]

Yes it's writing to disk (on a M1 mac which has terribly slow fsync). But, because of the transaction the fsync dance is done once per batch. Each row is the id + a 50 byte data blob.

There's only one index so there's no real write amplification. The numbers will go down as you add more data and indexes.

Rendered at 07:56:38 GMT+0000 (Coordinated Universal Time) with Cloudflare Workers.