If you are an application developer and you build on PostgreSQL, then maybe you have looked into where PostgreSQL comes from, who develops it, and where you can get professional help if needed.
Now, if you are a PostgreSQL developer (hi!), do you know what you are building on, where those things come from, who develops them, and where you get get professional help if needed?
I like to poke around the PostgreSQL Git repository to find interesting statistics and trends that affect PostgreSQL development. The question I had lately is, what are the largest patches that have been committed? That might indicate some kind of upper bound on the size of features you can get committed in one piece. If something is larger, you might need to split it up.
There is a PostgreSQL major release every year, and every release has about 200 changes listed in the release notes. A handful of those are typically listed at the top as “major items”, which are highlighted for the benefit of the public.
PostgreSQL development has certainly professionalized over the years, in the sense that more people do it as a job now rather than (only) as a hobby. I was wondering whether that would be visible in the time of day when people do publicly visible PostgreSQL work.
Let’s say you are following a patch in the PostgreSQL commit fest. The author has posted “v5” of a patch, some feedback has been sent, the author comes back with “v6” and a note saying that they have incorporated all that feedback. How do you check what actually changed?
I have been gathering some statistics about each major PostgreSQL release. Here is the update for this year:
Some time ago, someone asked on a PostgreSQL mailing list whether there was a diagram of the PostgreSQL system catalogs. There wasn’t at the time. Something like that used to be included in the PostgreSQL documentation, but it was never updated, and so it was eventually removed.
One of the new features in PostgreSQL 15 is that you can use ICU collations on a database and instance level.
A little while ago, we had a few PostgreSQL hackers in a room and someone oversaw me typing something like
I seem to be doing this a lot:
When you are coding in a source tree as big as PostgreSQL’s, you will at some point want to look into some kind of source code indexing. It’s often convenient not to bother, since
git grepis actually superfast. But when you want to find where a function is defined among all the call sites, some more intelligence is useful.
Here are some tweaks to
git logthat I have found useful. It might depend on the workflow of individual projects how applicable this is.
I was idly wondering what was taking
pg_dumpso long and noticed that it always seemed to be pegged at 100% CPU usage on the client. That was surprising because naively one might think that the bottleneck are the server’s or the client’s disk or the network. Profiling quickly revealed that the compression library zlib was taking most of the run time on the client. And indeed, turning compression off caused
pg_dumpto fly without getting anywhere near 100% CPU.
So I had this PostgreSQL database that was getting a bit too big, and since it was really only for analytics, I figured it would be a good fit for putting in Hadoop+Hive instead.
About two months ago, this happened:
PgBouncer has a virtual database called
pgbouncer. If you connect to that you can run special SQL-like commands, for example
2001: PostgreSQL 7.1: write-ahead log
There is a lot of helpful information about
screenout there, but I haven’t found anything about this. I don’t want to “forget” any screen sessions, so I’d like to be notified when I log into a box and there are screens running for me. Obviously, there is
screen -ls, but it needs to be wrapped in a bit logic so that it doesn’t annoy when there is no
screenrunning or even installed.
Normally, this doesn’t work as one might naively expect:
This is my new blog. My old blog was here, but it was time to move on.
subscribe via RSS