0

Say that I have some data that is publicly available to begin (like an archive of hackernews stories), and I would like to make it available publicly in a sql database that is world-accessible. The constraints/salient points are these:

  1. The data is public to begin with so there's no risk of leaking sensitive information.
  2. Users should be allowed to run any SELECT or SHOWPLAN but nothing destructive like DELETE or DROP.
  3. The worst-case scenario here should be that someone manages to somehow sneak a DROP through and the whole thing vanishes. This scenario is acceptable with very low probability. What is not acceptable is that someone manages to get root access to the system on which the database is running.
  4. Rate limiting is outside the scope of the problem. It can be assumed that this can be done higher up, like at the API level.

Could all of these be satisfied by something as simple as having a database fronted by an API with a single endpoint which takes SQL queries, parses them, filters them, and forwards them to the database?

Are there any instances of this happening in the wild? Google returns 0 results when searching "publicly readable relational database"...

2 Answers2

1

There's a difference between making the data publicly available and making the database publicly available.

What is not acceptable is that someone manages to get root access to the system on which the database is running.

This is very true! Regardless of how sensitive the data is (or isn't, in your case) users should never have access to the system in such a way that they can interfere with the operation of that system or the experience of other users. Hence it doesn't make sense to place the database server itself in a situation where it is exposed in any way.

a database fronted by an API with a single endpoint which takes SQL queries, parses them, filters them, and forwards them to the database

This is literally the two- or three- tier architecture used by every web/application server/micro-service out there today. Build your API - possibly as a hosted micro-service if you don't want to worry about your own servers - and have it connect to a protected database behind the scenes: every major cloud service out there (AWS, Google, Oracle) offers a way to do this using SaaS and DBaaS that is publicly accessible and secure.

Users should be allowed to run any SELECT or SHOWPLAN but nothing destructive like DELETE or DROP.

Make sure that your API isn't too open: users should be able to provide input to pre-defined, structured SQL queries, but should never be able to submit raw, ad-hoc SQL. Queries should always be executed with as few privileges as possible, using an account with select privileges and nothing else. Never have your API connect to the database as the owner of the tables, or as a user with any system-level privileges. Always be in control of what is happening in your database: not because your data is sensitive, but because your infrastructure and the experience of your users is.

pmdba
  • 3,104
  • 2
  • 6
  • 11
0

Make sure that your API isn't too open: users should be able to provide input to pre-defined, structured SQL queries, but should never be able to submit raw, ad-hoc SQL.

To add on to pmdba's excellent points, another reason you shouldn't allow users to submit raw / ad-hoc queries is because they could easily tank the performance of your database server, making it inaccessible to others, if those queries are poorly formulated.

In fact, ad-hoc queries are generally the hardest to optimize for even when they're more controlled with the general use cases already understood ahead of time by the database developers. So it would be even tougher trying to optimize for any and all end user input, without even having a hint of what their use cases could be. And giving them free rein to run any query openly (even if limited to just SELECT statements) also gives them the power to write a poor query that tanks your system.

J.D.
  • 37,483
  • 8
  • 54
  • 121