@@ -192,6 +192,16 @@
         raise NotImplementedError(archive)
 
 
+def index_db(name, tempdb):
+    print(f'{name.ljust(padding)} Indexing file: {tempdb}')
+
+    if tempdb.endswith('primary.sqlite'):
+        conn = sqlite3.connect(tempdb)
+        conn.execute('CREATE INDEX packageSource ON packages (rpm_sourcerpm)')
+        conn.commit()
+        conn.close()
+
+
 def compare_dbs(name, db1, db2, cache1, cache2):
     print(f'{name.ljust(padding)} Comparing {db1} and {db2}')
 
@@ -412,6 +422,7 @@
 
     download_db(name, repomd_url, archive)
     decompress_db(name, archive, tempdb)
+    index_db(name, tempdb)
     if PUBLISH_CHANGES:
         packages = compare_dbs(name, tempdb, destfile, cache1, cache2)
         publish_changes(name, packages, repomd_url)
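To see what the new index changes, here is a minimal sketch that asks SQLite for its query plan before and after creating it. The in-memory table is a hypothetical, heavily reduced stand-in for the real `packages` table (only the columns needed here), the sample rows are invented, and `lookup_plan` is an illustrative helper, not part of this change. The sketch also uses `IF NOT EXISTS`, which is an assumption on my part to make the statement safe to repeat; the diff above uses a plain `CREATE INDEX`.

```python
import sqlite3


def lookup_plan(conn, source_rpm):
    """Return the 'detail' strings of SQLite's plan for a lookup by source RPM."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT name FROM packages WHERE rpm_sourcerpm = ?",
        (source_rpm,),
    ).fetchall()
    # The human-readable description is the fourth column of each plan row.
    return [row[3] for row in rows]


# Hypothetical, reduced schema: the real primary.sqlite has many more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE packages (pkgKey INTEGER PRIMARY KEY, name TEXT, rpm_sourcerpm TEXT)"
)
conn.executemany(
    "INSERT INTO packages (name, rpm_sourcerpm) VALUES (?, ?)",
    [
        ("gcc", "gcc-9.src.rpm"),
        ("libgcc", "gcc-9.src.rpm"),
        ("guake", "guake-3.src.rpm"),
    ],
)

plan_before = lookup_plan(conn, "gcc-9.src.rpm")

# Same index as in the diff, with IF NOT EXISTS added so re-running is harmless.
conn.execute("CREATE INDEX IF NOT EXISTS packageSource ON packages (rpm_sourcerpm)")
conn.commit()

plan_after = lookup_plan(conn, "gcc-9.src.rpm")

print("before:", plan_before)
print("after: ", plan_after)
conn.close()
```

Without the index the plan should report a full scan of `packages`; with it, the plan should mention `packageSource`. The exact wording of the plan text varies between SQLite versions.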
The `GET_CO_PACKAGE` query is used in `_expand_pkg_info`, which is used in `get_pkg`/`get_src_pkg`/`_process_dep`, and the latter is used in most other API calls. This means the query runs in every single call except `index` and `list_branches`. It turns out that this query is slower than it needs to be.

Firstly, it returns 10 extra columns, and uniqueness is enforced in Python. This can be moved into sqlite using `DISTINCT`. Secondly, determining co-packages uses the `rpm_sourcerpm` column, which has no index. This makes lookups on that column very slow.

For Rawhide source packages, 60% produce 1 package, 15% produce 2, 3.3% produce 3, 10.6% produce 4, 4.2% produce 5, and 2.0% produce 6. Each count from 7 to 90 accounts for less than 1%, a few individual source packages produce 100-300 packages, and of course `texlive` sits at the far end with 5936.

For comparison, I ran `ab -c 100 -n 1000 http://127.0.0.1:8080/f31/pkg/${pkg}` on `master` and on the two commits here. The results are as follows for some packages in that range, with 3 columns for each result:

`guake` (1 co-package):
`gcc` (82 co-packages):
`glibc` (222 co-packages):
`lodash` (299 co-packages (max other than `texlive`)):
`texlive` (5936 co-packages):

Thus the first commit is a bit slower (5%) for most packages (since most are 1-to-1), but 1.42-3.72× faster (in requests per second) for larger packages. The second commit, however, helps all packages, running 1.03-2.36× faster than the first commit. Overall, requests are 2.24-3.83× faster.
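The first change (moving uniqueness from Python into sqlite) can be sketched as follows. This is an illustration only: the actual text of `GET_CO_PACKAGE` is not shown here, so both queries below are assumptions about its general shape, the three-column table is a hypothetical stand-in for `packages`, and the function names `co_packages_python` and `co_packages_sql` are invented for the example.

```python
import sqlite3

# Hypothetical, reduced stand-in for the real packages table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (name TEXT, arch TEXT, rpm_sourcerpm TEXT)")
conn.executemany(
    "INSERT INTO packages VALUES (?, ?, ?)",
    [
        ("gcc", "x86_64", "gcc-9.src.rpm"),
        ("gcc", "i686", "gcc-9.src.rpm"),
        ("libgcc", "x86_64", "gcc-9.src.rpm"),
        ("libgcc", "i686", "gcc-9.src.rpm"),
        ("guake", "noarch", "guake-3.src.rpm"),
    ],
)


def co_packages_python(conn, source_rpm):
    """Fetch whole rows and deduplicate the names in Python."""
    rows = conn.execute(
        "SELECT * FROM packages WHERE rpm_sourcerpm = ?", (source_rpm,)
    ).fetchall()
    # Every column of every matching row crosses into Python before dedup.
    return sorted({row[0] for row in rows})


def co_packages_sql(conn, source_rpm):
    """Let SQLite return only the distinct names."""
    rows = conn.execute(
        "SELECT DISTINCT name FROM packages WHERE rpm_sourcerpm = ? ORDER BY name",
        (source_rpm,),
    ).fetchall()
    return [name for (name,) in rows]


print(co_packages_python(conn, "gcc-9.src.rpm"))
print(co_packages_sql(conn, "gcc-9.src.rpm"))
```

Both functions return the same names, but the second transfers one column and one row per distinct name. That may also explain why the gain grows with the number of co-packages, while a 1-to-1 package sees no benefit from it.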