Favorite general purpose database for solo projects?
Hello everyone. I want to know: what's your go-to or favorite general-purpose database for your side projects or solo projects, and why?
I've tried MySQL, MongoDB, SQLite, Firestore, and a bit of Postgres over the last 2-3 years. I've enjoyed MySQL the most so far, and I don't like Firestore much (it feels expensive), but I'm not a pro dev yet, nor do I have extensive experience with any of these systems.
After a few projects, the most painful part overall has been migrations, which is why I'm still considering MongoDB. I also know that MySQL & Postgres have JSON data types, and that MySQL has a document-store abstraction with MongoDB-style collections that can coexist with traditional tables in the same database (though it's not very popular).
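For illustration only (this is not from the thread), here's a minimal sketch in Python with psycopg2 of what mixing ordinary relational columns with a JSONB column looks like in Postgres. The table, columns, and connection string are all made up:

    # Hypothetical sketch: ordinary relational columns plus a JSONB column in Postgres.
    # Table, columns, and connection string are made up for illustration.
    import psycopg2

    conn = psycopg2.connect("dbname=sideproject user=app password=secret host=localhost")
    with conn, conn.cursor() as cur:
        # Well-understood fields stay as normal columns; awkward, source-specific
        # fields can live in the JSONB column until they stabilise.
        cur.execute("""
            CREATE TABLE IF NOT EXISTS items (
                id         bigserial PRIMARY KEY,
                name       text NOT NULL,
                source     text NOT NULL,
                created_at timestamptz NOT NULL DEFAULT now(),
                extra      jsonb NOT NULL DEFAULT '{}'::jsonb
            )
        """)
        cur.execute(
            "INSERT INTO items (name, source, extra) VALUES (%s, %s, %s::jsonb)",
            ("Example item", "source_a", '{"colour": "red", "odd_field": 42}'),
        )
    conn.close()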
The project I want to build is tiny and kind of niche. If it ever succeeds, it would be a 1-2 million visits per month kind of website at most; I've seen similar sites peaking at 30M, but from the data I know there's only one like that.
Part of the application's data will be hard to structure, since I don't have direct control over the sources and they all do things differently. The rest will be user-generated data with a predictable shape. Stats will be generated daily from the different sources and the user-generated data. There will also be a system to search through some of the data items, with up to 30-40 parameters to refine a search. So far I've gathered around 40K items, and it grows by 1-2K items per year from what I've seen.
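As a rough, purely illustrative idea of what a multi-parameter search can look like (assuming a psycopg2 connection and the hypothetical items table sketched above; filter and column names are made up): only the filters the user actually set become WHERE clauses, and the driver handles escaping. At ~40K rows, almost any database will answer this quickly.

    # Hypothetical sketch: build the WHERE clause only from the filters that were set.
    def search_items(conn, filters: dict, limit: int = 50):
        clauses, params = [], []
        if "name" in filters:
            clauses.append("name ILIKE %s")
            params.append(f"%{filters['name']}%")
        if "source" in filters:
            clauses.append("source = %s")
            params.append(filters["source"])
        if "min_score" in filters:
            clauses.append("(extra->>'score')::int >= %s")  # filter on a JSONB field
            params.append(filters["min_score"])
        # ...same pattern for the remaining parameters...
        where = " AND ".join(clauses) if clauses else "TRUE"
        sql = f"SELECT id, name, source FROM items WHERE {where} ORDER BY id LIMIT %s"
        params.append(limit)
        with conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()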
What would you use, or not use, and why? What are your overall tips & tricks to avoid shooting yourself in the foot, given that I don't have extensive experience?
Sorry for my English.
I used to use MongoDB for all my projects because it was what I'd first used, when I learnt "proper" programming through an MMO group, and I liked the ability to structure things hierarchically.
My current project (https://nuenki.app) uses Postgres. There isn't much complexity to the data it stores, but there is quite a lot of it (I use Postgres to cache hundreds of thousands of translations). I'm happy with it - I've had fewer deployment issues (zero) than I used to have with mongo (irritatingly many, often around ARM problems), SQL is nicer to use, and Rust's SQLx library works really well.
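To be clear, the sketch below is not Nuenki's actual code or schema (that's Rust + SQLx); it's just a hedged guess, in Python, at what a Postgres-backed translation-cache pattern generally looks like, with made-up table and column names:

    # Hypothetical Postgres-backed cache keyed on (source text, language).
    # Not the author's real schema; just the general upsert-cache pattern.
    import psycopg2

    def ensure_cache_table(conn):
        with conn, conn.cursor() as cur:
            cur.execute("""
                CREATE TABLE IF NOT EXISTS translation_cache (
                    source_text text NOT NULL,
                    lang        text NOT NULL,
                    translated  text NOT NULL,
                    PRIMARY KEY (source_text, lang)
                )
            """)

    def cache_translation(conn, source_text, lang, translated):
        with conn, conn.cursor() as cur:
            # ON CONFLICT makes re-inserting the same sentence idempotent.
            cur.execute("""
                INSERT INTO translation_cache (source_text, lang, translated)
                VALUES (%s, %s, %s)
                ON CONFLICT (source_text, lang) DO UPDATE SET translated = EXCLUDED.translated
            """, (source_text, lang, translated))

    def lookup_translation(conn, source_text, lang):
        with conn.cursor() as cur:
            cur.execute(
                "SELECT translated FROM translation_cache WHERE source_text = %s AND lang = %s",
                (source_text, lang),
            )
            row = cur.fetchone()
            return row[0] if row else None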
If your data is difficult to model because the source you're getting it from is in an awkward format, surely you should be transforming it into something nicer to work with at ingress?
Honestly, Postgres with some sensible keys and optimisation (just Google it and apply what's relevant to your use case) should be more than fast enough.
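"Sensible keys and optimisation" mostly comes down to indexing the columns you actually filter on and checking the plan with EXPLAIN ANALYZE. A short, purely illustrative sketch, continuing the hypothetical items table from earlier (index and column names made up):

    # Hypothetical sketch: add indexes for the filters you use, then ask Postgres
    # how it executes a query. Names are made up.
    import psycopg2

    conn = psycopg2.connect("dbname=sideproject user=app password=secret host=localhost")
    with conn, conn.cursor() as cur:
        cur.execute("CREATE INDEX IF NOT EXISTS items_source_idx ON items (source)")
        cur.execute("CREATE INDEX IF NOT EXISTS items_extra_idx ON items USING gin (extra)")
        cur.execute("EXPLAIN ANALYZE SELECT id FROM items WHERE source = %s", ("source_a",))
        for (line,) in cur.fetchall():
            print(line)  # the query plan, one line per row
    conn.close()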
Thank you very much for your reply.
So far I've seen so many people recommending Postgres, I guess it's the right answer for me.
Do you run it yourself on your own VPSs/servers, or on a managed platform?
Is it hard to upgrade Postgres from version to version? From the MongoDB point of view, was it hard to get into Postgres? (I guess, since I already know some SQL, though not extensively, it'll be easy to pick up.) What are your favorite features?
No issues while using it is a big deal!! (I'll look into the "ARM problems", I don't know what that means.)
I use Postgres via its Docker image on a VPS with some other containers.
That's probably a bit abnormal nowadays - I prefer simplicity with my deployment, and it can all fit on a single VPS with plenty of margin. It's not difficult to do, and I haven't had to touch that container since it was started.
I haven't upgraded Postgres from one version to another.
> From the MongoDB point of view, was it hard to get into Postgres
Nope, and I barely know SQL. You probably know more than me.
> What are your favorite features?
Getting out of my way and being nice and fast. In retrospect I overengineered some aspects of my code around the assumption that postgres is far slower than it actually is.
I really don't use many of its features.
The ARM problems were just that MongoDB didn't like ARM to start with (iirc), and then when I got it running on my raspberry pi they didn't like the lack of SIMD instructions, and then it was a bit of a pain installing an older version that was barely supported anymore. It wasn't particularly difficult, just irritating.
Postgres just worked. Different ARM machine though, so it wasn't quite representative.
OK, so what you meant by ARM is that MongoDB wasn't easy to run on ARM-architecture machines.
Regarding benchmarks, a few months ago I saw some where Postgres obliterated MySQL in performance for both basic reads and writes, though that was with Node.js drivers (which probably matters too). I didn't think Mongo would be "faster", as it seemed from your point of view. Though I assume that for a project of my size (tiny), any of the big general-purpose databases would be more than sufficient, benchmarks aside, and one big machine would do fine.
Looking at the Postgres docs, which are quite nice btw (I didn't remember that), I think there are more features than I could dream of. (More than MySQL!!) The pain point I used to have when trying it was the "administration" stuff; I'll read more of the docs. It was probably just me being a noob with it (skill issue), and I wasn't using Docker back then either.
So far I've tried both cloud platforms, like Google Cloud, and VPSs, and I've enjoyed using a VPS much more. Though I found Google Cloud Run quite powerful and easy to use.
Speaking of what you said in your previous comments about transforming data beforehand: about a year ago I did a PoC of that project for myself, where I wrote multiple scripts (background jobs) that would gather data from each source, and I tried to normalize everything to my schemas and to the common points between them. It was annoying to keep updating them over the course of several days, because I hadn't thought everything through from the get-go, which is one of the reasons I'm considering documents (I used to add a JSON column where I'd put any data I hadn't anticipated). But that's ultimately a skill issue on my part, and thinking about it now, I guess I could just look at the most important sources (more than 10 sources generating data daily, with 2 really big and important ones) and take the time to think my schemas through properly. In that regard, do you put the raw data somewhere? (Would you even keep it?) If so, in the DB, or in text form in S3-like storage?
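One common pattern here (not something anyone in the thread described, just a hedged sketch with made-up names): keep the raw payload from each source in a JSONB column next to the normalized fields, so the normalizer can be re-run later without re-fetching anything. If the payloads are large, dumping the raw files to S3-style object storage and keeping only their key in the row works too.

    # Hypothetical sketch: store the raw source payload alongside normalized fields.
    # Table, columns, and connection string are made up.
    import json
    import psycopg2

    conn = psycopg2.connect("dbname=sideproject user=app password=secret host=localhost")
    with conn, conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS source_records (
                id          bigserial PRIMARY KEY,
                source      text NOT NULL,
                fetched_at  timestamptz NOT NULL DEFAULT now(),
                raw_payload jsonb NOT NULL,  -- exactly what the source returned
                name        text,            -- normalized fields, filled in by a job
                score       integer
            )
        """)
        raw = {"NAME": "Example", "Score": "12", "misc": {"anything": "goes"}}
        # Normalize what you understand today; the raw payload stays for later passes.
        cur.execute(
            """INSERT INTO source_records (source, raw_payload, name, score)
               VALUES (%s, %s::jsonb, %s, %s)""",
            ("source_a", json.dumps(raw), raw["NAME"].strip(), int(raw["Score"])),
        )
    conn.close()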
I use SQLite for this sort of thing (small e-commerce storefront, my wife's niche-but-popular blog, etc.). Actually, I use multiple - one for the DB and Diskcache for the cache.
Caching is cheap, backup is trivial and cheap, and pulling down a perfect snapshot of production when I need to fiddle with something is also trivial and cheap.
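A minimal sketch of that combo in Python (file names and the tiny schema are made up; diskcache is the PyPI library, and the snapshot uses sqlite3's built-in backup API):

    # Hypothetical sketch: SQLite for the data, Diskcache for the cache,
    # and a one-call snapshot of the database. Paths and schema are made up.
    import sqlite3
    from diskcache import Cache  # pip install diskcache

    db = sqlite3.connect("store.db")
    db.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL)")
    db.execute("INSERT INTO orders (total_cents) VALUES (?)", (1999,))
    db.commit()

    cache = Cache("./cache")  # Diskcache also lives on disk, in its own directory
    cache.set("homepage_html", "<html>...</html>", expire=300)

    # A "perfect snapshot" is one file copy, or SQLite's online backup API
    # if the database is in use:
    snapshot = sqlite3.connect("snapshot.db")
    db.backup(snapshot)
    snapshot.close()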
Thank you very much for your reply.
Do you use things like Turso too? How much do you handle with your personal stuff? Was it hard to manage at first? Are there any footguns to dodge there?
SQLite seems awesome and also... not enough for the long run? But I don't know much, TBH.
Database choice doesn't actually matter until it actually matters.
> The project I want to build is tiny and kind of niche. If it ever succeeds, it would be a 1-2 million visits per month kind of website at most; I've seen similar sites peaking at 30M, but from the data I know there's only one like that.
The problem you have right now is getting to that level of success. There are many many problems to solve and things to learn before you get there. Building is the only way to make progress. Building is hard work. Research is easy and not really work.
> I've enjoyed MySQL the most so far
Then, why not use that? Good luck.
Thank you very much for your reply.
You're right, I should just start. Waiting is stupid; I could've done a lot more instead of second-guessing myself, and in the end it won't matter much, as you said.
But also, I'm kind of scared of shooting myself in the foot later down the road if it happens to work. Then again, if it works, I'll change it anyway!
I'll do "eeny, meeny, miny, moe" (I looked it up, it's the equivalent of "am stram gram" in France?) and see what I use :')