```diff
@@ -38,15 +38,13 @@
 
         # Duplicate data check (for sqlite output)
         if args.dupcheck:
-            try:
-                item = next(match_iter)  # grab first matching item
-            except StopIteration:
-                # If there is no next match, keep going
-                continue
-            if args.writer.has_item(item):  # if it's already in the db...
-                continue  # skip to next log
-            else:  # otherwise
-                args.writer.write_item(item)  # insert it into the db
+            for item in match_iter:
+                if args.writer.has_item(item):  # if it's already in the db...
+                    continue  # skip to next log
+
+                args.writer.write_item(item)  # insert it into the db
+            # There should be no items left, but to be safe...
+            continue
 
         # Write matching items (sqlite does commit at end, or rollback on error)
         args.writer.write_items(match_iter)
```
I think this fixes #11204.
Basically the "deduplication" happened once for the first line of a log file, and if that triggered then it skipped the entire logfile.
This does mean that all the old numbers will now be slightly "wrong" because there are duplicates in the middle of the data. Eg.
```
JDBG: _dup: CountmeItem(timestamp=1673222415, host='xxxx', os_name='Rocky Linux', os_version='8.7', os_variant='generic', os_arch='x86_64', sys_age=1, repo_tag='epel-8', repo_arch='x86_64')
JDBG: _dup: CountmeItem(timestamp=1673222415, host='xxxx', os_name='Rocky Linux', os_version='8.7', os_variant='generic', os_arch='x86_64', sys_age=1, repo_tag='epel-8', repo_arch='x86_64')
```
Before this patch those would have been counted as two entries, unless the duplicate was the first line of a new logfile, in which case the entire file was skipped.
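To make the failure mode concrete, here is a minimal, runnable sketch of the old and new dupcheck logic. It is an illustration only: `Item` and `Writer` are hypothetical stand-ins for the real `CountmeItem` and the sqlite writer, and `old_dupcheck`/`new_dupcheck` paraphrase the removed and added code in the diff above.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Item:
    # Hypothetical stand-in for CountmeItem, trimmed to two fields.
    timestamp: int
    host: str


class Writer:
    # Toy writer that keeps items in a list instead of a sqlite db,
    # so duplicate writes show up as extra entries.
    def __init__(self) -> None:
        self.db: list[Item] = []

    def has_item(self, item: Item) -> bool:
        return item in self.db

    def write_item(self, item: Item) -> None:
        self.db.append(item)


def old_dupcheck(writer: Writer, match_iter: Iterator[Item]) -> None:
    # Old logic: only the first item was ever checked.
    try:
        item = next(match_iter)
    except StopIteration:
        return
    if writer.has_item(item):
        return  # first line is a dup -> the whole file is discarded
    writer.write_item(item)
    for item in match_iter:  # the rest is written with no dedup check
        writer.write_item(item)


def new_dupcheck(writer: Writer, match_iter: Iterator[Item]) -> None:
    # New logic: every item gets its own has_item() check.
    for item in match_iter:
        if writer.has_item(item):
            continue
        writer.write_item(item)


dup = Item(1673222415, "xxxx")
logfile = [Item(1673222414, "yyyy"), dup, dup]  # duplicate in mid-file

w = Writer()
old_dupcheck(w, iter(logfile))
print(len(w.db))  # 3 -- the mid-file duplicate was counted twice

w = Writer()
new_dupcheck(w, iter(logfile))
print(len(w.db))  # 2 -- the duplicate is skipped

w = Writer()
w.write_item(Item(1673222414, "yyyy"))  # first line already in the db
old_dupcheck(w, iter(logfile))
print(len(w.db))  # still 1 -- the entire file was skipped
```

The same asymmetry shows up in the diff: the new loop replaces the one-shot `next(match_iter)` peek, and the trailing `continue` keeps the later `write_items(match_iter)` call from re-writing anything without a check.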
I'm not going to merge this before I rerun all the data for this year at least once, but feel free to review.
This also contains a second patch, which is a variant of smooge's parse_sql fix.