I do some real-time scraping and have noticed various differences in spot counts, particularly at the start of the day (0000Z). I discussed this with Mark KB9AMG, and his thoughts are below.
My observations are not limited to this one month.


Jim, I was able to compare some things with the 4/26 data despite the issues you were having with the scrapes that day. The one item of interest is what you mentioned some months ago about the spot counts dropping when going from 2300z to 0000z. It appears that you are entirely correct: something is wrong with that first hour of spots. Your scraped data from the 00z hour had almost 10000 more spots than what I had from the database archive. There is nothing wrong with your data; the archive is missing the data for some reason. When I query the database, the data does show up. It just doesn't appear in the zipped archive file. From 0000z to 0008z on 4/26, there are a TOTAL of 11 spots in the zipped archive file, where there should be almost 8000.
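
A minimal sketch of this kind of per-minute comparison, assuming the spot times from the scrape and from the unzipped archive have already been read into two lists of Unix timestamps. The function names and the toy data are hypothetical; reading the actual files is left out because their layouts are not given here.

```python
from collections import Counter
from datetime import datetime, timezone


def count_by_minute(timestamps, start, end):
    """Count spots per UTC minute for Unix timestamps falling in [start, end)."""
    counts = Counter()
    for ts in timestamps:
        t = datetime.fromtimestamp(ts, tz=timezone.utc)
        if start <= t < end:
            counts[t.strftime("%H%Mz")] += 1
    return counts


def compare_counts(scraped, archived, start, end):
    """Return {minute: (scraped_count, archived_count)} for the window."""
    s = count_by_minute(scraped, start, end)
    a = count_by_minute(archived, start, end)
    # A Counter returns 0 for a minute that one source never saw.
    return {m: (s[m], a[m]) for m in sorted(set(s) | set(a))}


# Toy data only; the year is a placeholder and the timestamps are made up.
start = datetime(2020, 4, 26, 0, 0, tzinfo=timezone.utc)
end = datetime(2020, 4, 26, 0, 8, tzinfo=timezone.utc)
base = int(start.timestamp())
scraped = [base, base, base + 120, base + 600]  # last one is outside the window
archived = [base + 120]

for minute, (n_scraped, n_archived) in compare_counts(scraped, archived, start, end).items():
    print(minute, n_scraped, n_archived, n_scraped - n_archived)
```

Running the same comparison over the 2300z hour and the 0000z hour side by side would show whether the shortfall is confined to the first minutes after midnight.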

I can only speculate why this might be happening. Late in the month the archive is very large. I'm guessing the processing of the data on wsprnet starts at 0000z and the completed file shows up around 0030z. For a time the database might be locked out so it can be copied/extracted/archived. What doesn't make sense is why the spots are showing up in the queries but not in the archive file, unless there are 2 different files: one that contains the last 2 weeks of spots to search, and another that has the entire database to create the monthly archives from. I even waited until the next day to see if the spots from 0000z-0008z would then show up in the archive, but they still didn't.

As an experiment, I tried to manually upload one of my spots from 0000z on the 26th that shows up in your scraped data and in the database query. The upload was reported as a dupe, yet the spot does not show up in the archive.

wsprnet might be keeping a 2-week database on the site itself, and there might be a separate database that holds all of the current month's data. For some reason some of the data in the 0000z hour of the 2-week database is not getting stored in the main database file. (I'm just guessing at how things actually work on wsprnet.)

Most likely the cause is the very large size of the archive file this late in the month. It is 630MB compressed and getting close to 3GB uncompressed. I again see many spots missing on the 27th for 0000z-0006z.
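
To pin down which individual spots are absent from the archive rather than just how many, a set difference on a per-spot key is one option. This is only a sketch: the dictionary field names (`time`, `reporter`, `call`) and the choice of key are assumptions, not the actual wsprnet layout, and the call signs below are placeholders.

```python
def spot_key(spot):
    """Identify a spot by time, reporter, and transmitting call sign.

    Assumption: these three fields are enough to tell spots apart.
    """
    return (spot["time"], spot["reporter"], spot["call"])


def missing_from_archive(scraped, archived):
    """Return the scraped spots whose key does not appear in the archive."""
    archived_keys = {spot_key(s) for s in archived}
    return [s for s in scraped if spot_key(s) not in archived_keys]


# Toy data only; times are seconds after 0000z and call signs are placeholders.
scraped = [
    {"time": 0, "reporter": "RX1", "call": "TX1"},
    {"time": 0, "reporter": "RX2", "call": "TX1"},
    {"time": 120, "reporter": "RX1", "call": "TX2"},
]
archived = [
    {"time": 120, "reporter": "RX1", "call": "TX2"},
]

for spot in missing_from_archive(scraped, archived):
    print(spot["time"], spot["reporter"], spot["call"])
```

The same function with the arguments swapped would list anything the archive has that the scrape missed, which is a useful check that the scrape itself is complete.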