
PG* DB env preparation #1671

Open
brontolosone wants to merge 3 commits into getodk:next from brontolosone:db-env-preparation

Conversation

@brontolosone
Contributor

@brontolosone brontolosone commented Mar 3, 2026

Problem:
A bit of circling around in #1647 eventually resulted in a solution that uses Bash's expressiveness to deal with the PG* environment variables and, in particular, with backward compatibility for the legacy DB_* variables. Defaults for the PG* env vars would not be defined via the Docker environment, as we can't express what we want in Dockerese: to start with, it cannot distinguish between an undefined variable and an empty one.
Thus in start-odk.sh we would do the necessary work. But the environment created there is not set for an arbitrary `docker compose exec service [...]` command: only env vars set through Docker's own mechanisms are applied for those.
The central-related commands we run go through wrappers in files/service/scripts, in addition to the odk-cmd wrapper (for creating users etc.) that we steer users towards via the docs.
So if we set the environment variables when those wrappers are executed, we'll be good.
This PR implements that idea.
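As a minimal sketch of that idea (file names and the default value here are illustrative, not the exact paths or snippets in this PR): an env.d-style snippet sets a PG* default only when the variable is truly unset, and a wrapper sources the snippets before running the wrapped command. The PR's snippets use bash's `[[ -v ]]`; the POSIX `${VAR+x}` spelling below behaves the same for this purpose.

```shell
set -eu

envdir=$(mktemp -d)
cat > "$envdir/000_db-defaults.sh" <<'EOF'
# default only when truly unset; an explicitly empty PGHOST is left alone
if [ -z "${PGHOST+x}" ]; then export PGHOST=postgres14; fi
EOF

run_wrapped() {
  # source every env.d snippet, then run the wrapped command
  for f in "$envdir"/*; do . "$f"; done
  "$@"          # the real wrappers would exec "$@" after sourcing
}

unset PGHOST
result=$(run_wrapped sh -c 'echo "$PGHOST"')
echo "$result"
```

Because the environment is built at the moment the wrapper runs, it works the same whether the process was started by start-odk.sh or by a later `docker compose exec`.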

As a recap, the alternatives are:

  1. Set PG* vars through Docker.
    • Upside: Consistent env for anything executed in the service container
    • Downside: Due to the lack of expressiveness in Dockerese, this makes certain DB configurations impossible, and we'd again be in the user's way when they configure their database connection to the full extent libpq allows, while the whole idea we started this journey for was to get out of the way.
  2. Run some preprocessor to generate an env file for Docker to source. The classic undesirable "use some build system to create inputs for some other build system".
    • Upside: Consistent env
    • Downside: There is no obvious canonical way of doing this, and it's kinda antithetical to the idea of using Docker (as the Win95 "Start" button); users won't like it if they have to run Make or something. It's throwing out the baby with the bath water: if we're going to create friction, we might as well ask users to migrate their legacy DB_* env vars and be done with it.
  3. This here solution.
    • Upside: Backwards compatible while keeping it possible for the user to use any PG* configuration.
    • Downside: We need to remember to use the wrappers. This PR takes care of the obvious entry points (odk-cmd and the scripts in scripts/), but since there previously weren't any obvious advantages to using those wrappers, we haven't been dogfooding them. For instance, here in Frontend we have been running variants of `docker exec node [...].js`, which then ends in tears because the DB defaults haven't been set. So: a bit of cognitive overhead for us devs.

What has been done to verify that this works as intended?

Why is this the best possible solution? Were any other approaches considered?

See discussion above.

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

See discussion above.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

No changes necessary. In the docs we steer users to odk-cmd.

Before submitting this PR, please make sure you have:

  • branched off and targeted the next branch OR only changed documentation/infrastructure (master is stable and used in production)
  • verified that any code or assets from external sources are properly credited in comments or that everything is internally sourced

@brontolosone brontolosone marked this pull request as ready for review March 3, 2026 13:29
brontolosone added a commit to brontolosone/central-frontend that referenced this pull request Mar 3, 2026
@matthew-white
Member

As a recap, the alternatives are:

  1. Set PG* vars through Docker.
  2. Run some preprocessor to generate an env file for Docker to source. The classic undesirable "use some build system to create inputs for some other build system".
  3. This here solution.

I'm feeling like I may need a second opinion on this one. I would love for us not to go the way of (1), for the reasons you outlined. It'd be awesome for us to be able to tell users, "just set your PG* variables," without then listing a bunch of caveats/exceptions.

That said, option (2) would definitely be a change from our current approach. Option (3) is also something that devs would need to understand and regularly interact with. In particular:

we have been doing variants docker exec node [...].js, which then ends in tears as the DB defaults haven't been set.

IMO it'd be non-ideal if it ended up being harder to use the service container — if each interaction with the container required some setup. For example, I sometimes run docker compose exec service node lib/bin/repl.js, but now it seems like that wouldn't work without a wrapper or some other setup.


I'm not opposed to options (2) or (3), but they do feel like big enough departures from the status quo that I think we should get wider sign-off before implementing one of them.

Though just to throw out an option 4, is there any possibility of moving the logic in env.d/000_DB-env-defaults into Backend? If Backend were solely responsible for converting DB_* environment variables to PG* variables (e.g., as part of setlibpqEnv()), then anything that interacted with Backend would get that automatically. No preprocessing or setup necessary.

Is there anything outside Backend that uses the PG* variables? I guess there's the part where we wait to connect to the database, but we could probably move that inside Backend. Is there anything else?

Comment on lines +1 to +3
# if the PGSERVICE mechanism — libpq named configurations (specified through this env var)
# sourced from a file (referenced through PGSERVICEFILE env var) is used, leave everything be.
if ! [[ -v PGSERVICE ]]; then
Member

What's the reason for these new lines? Feels like this should maybe go in its own PR.

Contributor Author

It was an omission; this proviso should've been in #1647. I want to dodge a bit of PR busywork. It's all on-topic :-)

Member

I definitely get not wanting to open a million different PRs, but I'd encourage a separate PR in this case. I promise to review it quickly. 😅 I have specific questions about these lines, but asking them here feels a tad off-topic. I think having that conversation in its own PR would make the change more transparent — both to those following this work now and to our future selves doing Git archeology.

@matthew-white
Member

Does this change require updates to documentation? If so, please file an issue here and include the link below.

No changes necessary. In the docs we steer users to odk-cmd.

Not always, unfortunately. E.g., at https://docs.getodk.org/central-backup/ we recommend:

docker compose exec service node /usr/odk/lib/bin/restore.js ...

@brontolosone
Contributor Author

Does this change require updates to documentation? If so, please file an issue here and include the link below.

No changes necessary. In the docs we steer users to odk-cmd.

Not always, unfortunately. E.g., at https://docs.getodk.org/central-backup/ we recommend:

docker compose exec service node /usr/odk/lib/bin/restore.js ...

We can provide a wrapper for that. I'm starting to think we might want a generic wrapper so that you can go docker compose exec thewrapper node /some/js/file, this would also help you and your repl.js use case.
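A hedged sketch of such a generic wrapper ("thewrapper" is just the name used in this discussion, and the default value below is illustrative): prepare the environment, then become the requested command via exec, so no extra shell is left sitting between Docker and the command.

```shell
# write a tiny thewrapper to a temp file so the sketch is self-contained
wrapper=$(mktemp)
cat > "$wrapper" <<'EOF'
#!/bin/bash
# the real version would source the env.d snippets here; this single
# default is illustrative only
export PGHOST="${PGHOST-postgres14}"   # "-" (not ":-"): default only when unset
exec "$@"                              # replace this shell with the command
EOF
chmod +x "$wrapper"

unset PGHOST
result=$("$wrapper" sh -c 'echo "$PGHOST"')
echo "$result"
```

With something like this in the image, `docker compose exec service thewrapper node lib/bin/repl.js` would see the same PG* environment that the wrapped central commands do.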

@brontolosone
Contributor Author

Though just to throw out an option 4, is there any possibility of moving the logic in env.d/000_DB-env-defaults into Backend? If Backend were solely responsible for converting DB_* environment variables to PG* variables (e.g., as part of setlibpqEnv()), then anything that interacted with Backend would get that automatically. No preprocessing or setup necessary.

Is there anything outside Backend that uses the PG* variables? I guess there's the part where we wait to connect to the database, but we could probably move that inside Backend. Is there anything else?

I considered that same thing ;-) and felt it doesn't solve the problem; it just moves the environment setting from an easy-to-reach place (e.g. `docker exec -it service bash` and then `source /usr/local/odk/env.d/*`) to an (IMO) harder-to-reach place. For sure we don't want to do it in two places though. And I see wrapping things through the shell (well, not really wrapping anymore if you call exec!) working better than wrapping things through Node 😆; I'm not even sure Node has an equivalent of "becoming" the called process (via the execv* system call family, which I think the bash exec built-in uses; in other words, the "wrapping" process just dissolves).

Yes, psql is a use case. I have the ideal that one would be able to verify their DB connection via some variant of docker exec service psql (which would, in my latest thinking, need to be docker exec service envwrapper psql, but that's a detail).

@brontolosone
Contributor Author

I think there must be a way to set the shell used when doing docker exec to the wrapper. Then you'll always have your env vars 🎉
I'll try that later tonight.

@brontolosone
Contributor Author

I think there must be a way to set the shell used when doing docker exec to the wrapper. Then you'll always have your env vars 🎉 I'll try that later tonight.

Yeah, no, docker exec doesn't run through the container's shell (maybe you don't even have a shell there). Nor does there seem to be a way to make it go through one, by default :-/

@matthew-white
Member

It's still feeling like we should probably pull more people into this conversation (perhaps on Thursday). But let me throw out some thoughts.

I'm starting to think we might want a generic wrapper so that you can go docker compose exec thewrapper node /some/js/file, this would also help you and your repl.js use case.

That feels pretty ergonomic to me — just inserting thewrapper. It still seems like a bit of a breaking change though. But maybe it wouldn't affect anything that happens very often / maybe it wouldn't break any workflows. E.g., hopefully no one is having to run node /usr/odk/lib/bin/restore.js very often.

Couple of follow-up questions about thewrapper:

  1. thewrapper is still part of option 3 from the PR description, right? The idea is that odk-cmd will wrap automatically, while things outside odk-cmd will need thewrapper?
  2. When is thewrapper needed and when is it not? It seems to me like if you're using a custom database and only using the new PG* variables, then it's not needed. But if you're using the postgres14 database in Docker (in which case, you're likely using the PG* variable defaults), or if you're using DB_* variables, then you need thewrapper. Does that sound right?

Yes, psql is a use case. I have the ideal that one would be able to verify their DB connection via some variant of docker exec service psql

This feels like a new use case, so maybe it's not something we need to be backwards-compatible about. I don't feel like it commits us to thewrapper:

  • We could write a little Node script to verify the DB connection.
  • In most cases going forward, there won't be a need for thewrapper to access Postgres:
    • If you're using the postgres14 database in Docker, you can access the postgres14 container directly.
    • If you're using a custom database, then you don't need thewrapper as long as you're using PG* environment variables.

Is there anything outside Backend that uses the PG* variables? I guess there's the part where we wait to connect to the database, but we could probably move that inside Backend. Is there anything else?

I considered that same thing ;-) and felt it doesn't solve the problem, it just moves the environment setting from an easy to reach for place (eg docker exec -it service bash and then going source /usr/local/odk/env.d/*) to a (IMO) harder to reach place.

Could you say more about the aspects of the problem that it doesn't solve? Are there concrete use cases outside docker exec service psql that would be problematic? To me, if moving things into Backend would eliminate the need for thewrapper, that would be a significant benefit.

The main purpose of the service container is to do various things in Node, so it seems reasonable to me to implement this backwards-compatibility in Node itself. It probably wouldn't have been my first choice, but if it prevents things from breaking, it seems like a pragmatic path forward.

@matthew-white
Member

It's probably clear that I lean toward my option 4 currently. But most of all, I just want to understand the various options better (pros and cons) so that the tradeoffs are clear if/when we discuss as a team.

@brontolosone
Contributor Author

brontolosone commented Mar 4, 2026

It's still feeling like we should probably pull more people into this conversation (perhaps on Thursday). But let me throw out some thoughts.

👍

I'm starting to think we might want a generic wrapper so that you can go docker compose exec thewrapper node /some/js/file, this would also help you and your repl.js use case.

That feels pretty ergonomic to me — just inserting thewrapper. It still seems like a bit of a breaking change though. But maybe it wouldn't affect anything that happens very often / maybe it wouldn't break any workflows. E.g., hopefully no one is having to run node /usr/odk/lib/bin/restore.js very often.

Yes.

Couple of follow-up questions about thewrapper:

1. `thewrapper` is still part of option 3 from the PR description, right? The idea is that `odk-cmd` will wrap automatically, while things outside `odk-cmd` will need `thewrapper`?

Yes; and we may want to consider wrapping the scripts in files/service/scripts in addition to odk-cmd.

2. When is `thewrapper` needed and when is it not? It seems to me like if you're using a custom database and only using the new `PG*` variables, then it's not needed. But if you're using the `postgres14` database in Docker (in which case, you're likely using the `PG*` variable defaults), or if you're using `DB_*` variables, then you need `thewrapper`. Does that sound right?

💯. We need the wrapper to effect the fallback defaults, because we can't do it with Docker, mainly because Docker can't distinguish between unset and empty variables.
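The whole discussion hinges on that unset-vs-empty distinction, which libpq cares about. A quick illustration of how shell parameter expansion treats the two (operator behavior as specified by POSIX):

```shell
unset PGHOST
when_unset=${PGHOST-fallback}      # unset: "-" substitutes -> fallback
PGHOST=""
colon_empty=${PGHOST:-fallback}    # ":-" treats empty as missing -> fallback
plain_empty=${PGHOST-fallback}     # "-" keeps the empty value    -> ""
echo "$when_unset $colon_empty [$plain_empty]"
```

Only the `${VAR-default}` form (or an explicit `[[ -v VAR ]]` test in bash) preserves a deliberately empty value, which is what the wrapper needs in order to stay out of the user's way.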

Just migrate the .env!

Doing things in a non-backward compatible way would save a lot of trouble and keep things cleaner and clearer, and we should at least consider it.
It could work like this:

  1. We put our default values straight into uncommented PG* env vars, in our .env.example. New users will copy it to .env and just not touch those vars, unless they have a custom DB.
  2. We let the user run a migration script that does the appropriate replacements/additions on their .env file. Example script.
  3. As a failsafe, in the Dockerfile "build" script we could error out when we don't see the migration marker set by the script, and point users towards the conversion script and/or some documentation on what's going on.
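The step-3 failsafe could look something like the following sketch (the marker name ODK_ENV_MIGRATED and the file contents are invented for illustration): refuse to proceed when the .env carries no migration marker.

```shell
# simulate a migrated .env in a temp file
envfile=$(mktemp)
printf 'ODK_ENV_MIGRATED=1\nPGHOST=db.example.com\n' > "$envfile"

check_migrated() {
  # succeeds when the (invented) migration marker is present
  grep -q '^ODK_ENV_MIGRATED=' "$1"
}

if check_migrated "$envfile"; then
  verdict=ok
else
  verdict="please run the migration script"   # and point at the docs here
fi
echo "$verdict"
```

The build would run the check once at startup and print an actionable error instead of silently using wrong defaults.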

Yes, psql is a use case. I have the ideal that one would be able to verify their DB connection via some variant of docker exec service psql

This feels like a new use case, so maybe it's not something we need to be backwards-compatible about. I don't feel like it commits us to thewrapper:

* We could write a little Node script to verify the DB connection.

The use case for being able to verify your DB connection without involving ODK code is, for instance, for when you've been supplied with credentials for some provisioned DB and you're troubleshooting why you can't connect. And then you call over the sysadmin. You have a much stronger troubleshooting case when it's "I have these PG* vars in my process environment, exactly like the Postgres docs say, and I have the psql we both know and trust right here, and it says X" versus "I have this piece of software you know nothing about and it doesn't seem to be able to connect to the DB". Until ODK supersedes psql's popularity it remains a better common reference point as a connection diagnostic tool 😆

* In most cases going forward, there won't be a need for `thewrapper` to access Postgres:
  
  * If you're using the `postgres14` database in Docker, you can access the `postgres14` container directly.

True, of course there's a way, but it would be very convenient if one could just access it the same way Central does, without having to copy/paste connection specifics from some config file.

  * If you're using a custom database, then you don't need `thewrapper` as long as you're using `PG*` environment variables.

That's very true! Just having the PG* env vars available, always, with defaults, through how Docker sets the environment, for anything inside the container, would be bliss. We'd all love that. 🌈

Is there anything outside Backend that uses the PG* variables? I guess there's the part where we wait to connect to the database

Hmmmm, we should actually use pg_isready instead of psql for that! 22fa7ed .
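pg_isready reads the same PG* variables as psql but needs no credentials and runs no SQL; it just reports whether the server accepts connections, via documented exit statuses. A pure-shell illustration of handling those codes (the pg_isready invocation itself is left as a comment so this sketch runs anywhere):

```shell
explain_pg_isready() {
  # exit statuses as documented for pg_isready
  case "$1" in
    0) echo "accepting connections" ;;
    1) echo "rejecting connections (e.g. starting up)" ;;
    2) echo "no response" ;;
    *) echo "no attempt made (bad parameters)" ;;
  esac
}
# typical use inside the container would be:
#   pg_isready -t 5; explain_pg_isready $?
echo "$(explain_pg_isready 0)"
```

That makes it a better fit than psql for a wait-for-DB loop, since "server up but credentials wrong" and "server unreachable" stay distinguishable.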

but we could probably move that inside Backend. Is there anything else?

Well, all the DB utilities one might want to use (pg_dump, pg_restore etc). But I'm feeling no love from anyone but me for those use cases 💔

I considered that same thing ;-) and felt it doesn't solve the problem, it just moves the environment setting from an easy to reach for place (eg docker exec -it service bash and then going source /usr/local/odk/env.d/*) to a (IMO) harder to reach place.

Could you say more about the aspects of the problem that it doesn't solve? Are there concrete use cases outside docker exec service psql that would be problematic? To me, if moving things into Backend would eliminate the need for thewrapper, that would be a significant benefit.

The main purpose of the service container is to do various things in Node, so it seems reasonable to me to implement this backwards-compatibility in Node itself. It probably wouldn't have been my first choice, but if it prevents things from breaking, it seems like a pragmatic path forward.

I'm now really thinking that we should dispense with all this runtime handling of backwards compat and defaults. I feel there's a big cost associated with it, in terms of extra code paths (to maintain, to test), the unobviousness of what gets decided when and where, etc. There's no substitute for the congruence of "see env var in .env, see the same env var in docker exec service env, no games".
I feel we should seriously consider biting the bullet and migrating the .env as outlined above 🤔… Let's discuss on Thursday.
