= problem =
Koji, Bodhi and PackageDB contain a lot of data in their databases that would be useful for certain analyses. Access to that data is only possible via slow and restricted web RPCs. For example, there is no easy way to retrieve all retired packages from pkgdb without iterating over all packages. Therefore it would be helpful to have direct read access to the respective databases.
= analysis =
The databases for the services are critical for Fedora's operation, which is why they should not be available directly on the Internet. However, it is usually possible to replicate databases. If there is a way to create a one-way replica from the internal database to an external database, this could be used to create a publicly reachable database with mirrored data from the internal databases. The only remaining problem would be confidential data in the databases. However, at least for Bodhi and Package DB, database dumps are already published at irregular intervals, which makes this unlikely. Authentication is usually done via FAS, so the services do not need to store users' data.
= enhancement recommendation =
Create replica databases on publicly accessible systems. Make an internal system push the data from the internal database to the external system. This ensures that a compromise of the external database does not endanger the internal database, because all the external system can do is receive data from the internal one. Allow everyone, or selected Fedora users, direct access to the external database, preferably with the option to use TLS.
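As a rough illustration of the push model, here is a minimal sketch that uses Python's `sqlite3` as a stand-in for the real databases. The `packages` table, its columns, the status values and the function names are all hypothetical; none of them are taken from the actual pkgdb schema, and a real deployment would use the database's own replication tooling rather than a full copy.

```python
import sqlite3


def make_db(rows=()):
    """Create an in-memory database with a hypothetical `packages` table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE packages (name TEXT PRIMARY KEY, status TEXT)")
    db.executemany("INSERT INTO packages VALUES (?, ?)", rows)
    db.commit()
    return db


def push_snapshot(internal, external):
    """Copy all rows from the internal database to the external one.

    Only the internal side initiates the transfer; the external database
    never opens a connection back to the internal one.
    """
    rows = internal.execute("SELECT name, status FROM packages").fetchall()
    with external:  # single transaction: commit on success, roll back on error
        external.execute("DELETE FROM packages")
        external.executemany("INSERT INTO packages VALUES (?, ?)", rows)
    return len(rows)


def retired_packages(external):
    """The kind of query that direct read access would make trivial."""
    cursor = external.execute(
        "SELECT name FROM packages WHERE status = 'Retired' ORDER BY name"
    )
    return [name for (name,) in cursor]
```

With the external copy in place, the "all retired packages" question from the problem statement becomes a single query instead of an iteration over every package.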
I think this is something that would use a lot of resources and be used by almost no one.
We could possibly look at making available dumps of dbs? But then we would need to audit them to make sure they aren't publishing any private info, which could be a lot of work for again, not many users.
How many people would find this of use? For what?
For one off issues like the "retrieve all retired packages from pkgdb" you could request someone in sysadmin-db run the query for you and get you the data? Or enhance pkgdb to provide an easier interface to do that?
Replying to [comment:1 kevin]:
For one off issues like the "retrieve all retired packages from pkgdb" [...] enhance pkgdb to provide an easier interface to do that?
Packagedb2? :)
I think this is something that would use a lot of resources and be used by almost no one. We could possibly look at making available dumps of dbs? But then we would need to audit them to make sure they aren't publishing any private info, which could be a lot of work for again, not many users.
There are already dumps, for example at https://fedorahosted.org/releases/p/a/packagedb/ and https://fedorahosted.org/releases/b/o/bodhi/ Dumps are better than nothing, but I assume importing them will also take a lot of time. It does at least for Bodhi.
I suspect that this might also make it possible to speed up queries from clients like fedora-easy-karma, because the applications can query just the data they need, without lots of information that is not required.
packagedb2 will provide an interface for this, but who knows how fast it will be. At least the current packagedb returns a lot of data that might not be of interest in every response, e.g. when you only want the current status of a package. The response always contains the usernames of all affected users, the description and the summary, which probably all require unnecessary joins just to get this one piece of information.
Web apps that collect a lot of information from different systems might also benefit from this, because they can display information faster, e.g. https://apps.fedoraproject.org/packages/
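To illustrate the point about unnecessary joins, here is a small sketch, again with `sqlite3` as a stand-in. The schema (`packages`, `acls`) and both function names are assumptions made up for this example, not the real packagedb layout.

```python
import sqlite3


def make_db():
    """Build a hypothetical two-table schema with one sample package."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE packages (id INTEGER PRIMARY KEY, name TEXT UNIQUE,
                               summary TEXT, description TEXT, status TEXT);
        CREATE TABLE acls (package_id INTEGER REFERENCES packages(id),
                           username TEXT);
        INSERT INTO packages VALUES (1, 'foo', 'A tool', 'Long text', 'Approved');
        INSERT INTO acls VALUES (1, 'alice');
        INSERT INTO acls VALUES (1, 'bob');
    """)
    return db


def package_status(db, name):
    """Narrow query: one table, one column, no joins."""
    row = db.execute(
        "SELECT status FROM packages WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


def package_details(db, name):
    """Wide query: joins in every user even if the caller only needs the status."""
    return db.execute(
        "SELECT p.name, p.status, p.summary, p.description, a.username "
        "FROM packages p LEFT JOIN acls a ON a.package_id = p.id "
        "WHERE p.name = ? ORDER BY a.username",
        (name,),
    ).fetchall()
```

A client with direct read access could call something like `package_status` and skip the extra columns and the join entirely.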
I'm not sure if those dumps are the full db or are sanitized somehow.
If fedora-easy-karma needs a better interface, I'd prefer its developer work with bodhi and see if we can expose that?
Sure, we could do better, but how often do these consumers need to run?
Packages should be able to talk to the db already?
It's worth noting that we are still trying to do db replication for HA needs, much less for some db exposing/gravy. ;)
So, I'm going to say no right now and close this, but once we get replication for other needs we could talk to application owners and see if this is more desired?
try bucardo for replication.