
Exercise 1.3: Bulk Data Queries

Sending too many queries for individual IP addresses will tax our servers and will also be quite slow. Instead, use our "Daily Sources" feed, which summarizes all data received the prior day in an easy-to-parse, tab-delimited file.

The most recent file can be found at

The file is quite large (50-100 MBytes). Please download it only once a day. For this exercise, we will start with

curl > /tmp/sources.txt

Let's answer a simple question: what are the top 10 /24 networks, based on the number of IP addresses listed in the file?

First we need to remove comments:

grep -v '^#' /tmp/sources.txt

Next, we "cut" the first column, the source IPs.

grep -v '^#' /tmp/sources.txt | cut -f1
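cut splits on tabs by default, which is exactly the feed's delimiter. A quick illustration with a fabricated two-column record:

```shell
# cut's default field delimiter is the tab character, so -f1 keeps
# the first tab-delimited column (the source IP in the real feed).
printf '192.0.2.1\t5\tfirstseen\n' | cut -f1
# prints: 192.0.2.1
```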

We need to count the number of distinct IPs. So we remove duplicates.

grep -v '^#' /tmp/sources.txt | cut -f1 | sort -u

But we are only interested in /24s. So we run another "cut" to only keep the first 3 octets.

grep -v '^#' /tmp/sources.txt | cut -f1 | sort -u | cut -f1-3 -d'.'

Finally, we sort the /24s, count how often each appears with "uniq -c", and sort numerically so the most frequent /24s end up at the end.

grep -v '^#' /tmp/sources.txt | cut -f1 | sort -u | cut -f1-3 -d'.' | sort | uniq -c | sort -n
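To see the pipeline work end to end, you can run it against a tiny fabricated stand-in for the feed (the file path, IPs, and counts below are made up for illustration):

```shell
# Build a miniature sources file: a comment line, then tab-delimited
# records whose first column is a zero-padded source IP.
printf '# comment line\n' > /tmp/sources_demo.txt
printf '010.000.000.001\t7\n010.000.000.002\t3\n192.000.002.009\t1\n' \
  >> /tmp/sources_demo.txt

# Same pipeline as above: strip comments, keep column 1, de-duplicate,
# truncate to the first three octets, then count and sort so the
# busiest /24 lands on the last line.
grep -v '^#' /tmp/sources_demo.txt | cut -f1 | sort -u \
  | cut -f1-3 -d'.' | sort | uniq -c | sort -n
# the last line shows a count of 2 for 010.000.000
```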

When I ran the command a couple days ago, I got this result for the top 10 (your result may be different):

126 063.088.023
132 042.115.009
136 205.251.193
139 045.083.066
148 045.083.065
148 045.083.067
149 045.083.064
165 103.131.071
185 071.006.233
212 192.035.168

In this example, 45.83/16 was interesting as it showed up four times. A whois lookup reveals that - is owned by yet another internet security research project (Alpha Strike Labs) that is not yet fully listed in our feed but will be by the end of the week :).


Use the "Daily Sources" feed (see above) and the API's "Miner" feed to find any IPs in the daily sources that are also in the miner feed. Let your command-line Kung-Fu shine for this one!

Please try to hit the API only once and save the output to a file.

To remove the padding "0"s from the source IPs in the "daily sources" feed, use this sed command:

sed -E 's/(^|\.)0+([0-9])/\1\2/g' < /tmp/sources.txt

The grouped expression keeps the final digit of each octet, so an all-zero octet like "000" becomes "0" instead of disappearing (a plain 's/\.0+/./g' would delete it).
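A quick sanity check of the zero stripping on fabricated zero-padded IPs; the grouped sed variant below also keeps a lone all-zero octet intact:

```shell
# Fabricated zero-padded IPs; each octet should lose its padding
# but keep its last digit, so "000" becomes "0" rather than vanishing.
printf '010.000.000.001\n063.088.023.004\n' \
  | sed -E 's/(^|\.)0+([0-9])/\1\2/g'
# prints:
# 10.0.0.1
# 63.88.23.4
```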
Hint #1

The API function you are looking for is

Hint #2

Probably the easiest way to get a list of IPs is

curl ''  | jq '.[].ipv4' | tr -d '"' > /tmp/miners
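The hint above assumes the API returns a JSON array of objects with an "ipv4" field. Under that assumption, jq's -r flag prints raw strings without the surrounding quotes, so the tr -d '"' step can be dropped. A quick check against a fabricated response:

```shell
# Fabricated sample of the assumed response shape: a JSON array of
# objects, each carrying an "ipv4" field.
printf '[{"ipv4":"198.51.100.7"},{"ipv4":"203.0.113.9"}]' \
  | jq -r '.[].ipv4'
# prints:
# 198.51.100.7
# 203.0.113.9
```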

Hint #3

Make sure both lists contain one unique, normalized IP per line. The miner list only needs de-duplication; the daily sources still need the comments removed, the first column extracted, and the padding zeros stripped:

sort -u /tmp/miners > /tmp/miners_uniq
grep -v '^#' /tmp/sources.txt | cut -f1 | sed -E 's/(^|\.)0+([0-9])/\1\2/g' | sort -u > /tmp/sources_uniq

Hint #4

Combine the two lists, count occurrences, and keep the IP addresses that show up twice, i.e., in both feeds:

cat /tmp/miners_uniq /tmp/sources_uniq | sort | uniq -c | awk '$1 == 2'

(uniq -c left-pads the count with spaces, so a plain grep '^2' would never match.)
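Because both files are already sorted and de-duplicated, comm can compute the intersection directly and skip the counting step entirely; -12 suppresses the lines unique to either file, leaving only lines common to both. A demo with fabricated stand-in files:

```shell
# Two small sorted, de-duplicated sample files standing in for the
# real miner and daily-source lists.
printf '192.0.2.1\n198.51.100.7\n' > /tmp/demo_miners
printf '198.51.100.7\n203.0.113.9\n' > /tmp/demo_sources

# Print only the lines present in both files.
comm -12 /tmp/demo_miners /tmp/demo_sources
# prints: 198.51.100.7
```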


You will likely get no duplicates. Mining pools are "passive" in that they will only accept connections. These IPs should not show up in our firewall log feed.