Today we’re gonna look at historical bitcoin prices and see if we can find a good time to buy. If you wanna make a lot of money off of bitcoin, this may be one approach.

First, we will need a source of historical bitcoin data. You can find it by going to https://coinmarketcap.com/, clicking the link for the currency you want, and going to the “Historical Data” tab. However, simply copying it by hand involves spending a lot of time dragging your mouse. A better option would be to save the HTML to a file, open it with a text editor (I like to use vim) and use macros or repeated substitutions to get the data you want, but even that is for peasants.

We will build a Python script to grab and parse the data for us. That way the time we would’ve spent parsing data can be spent analyzing it, and we’ll write it in a way that carries over to every other cryptocurrency as well.

import requests
import lxml.html
from lxml import etree
from datetime import datetime

def grab_historical_prices(coin):
    today = datetime.now().strftime("%Y%m%d")
    req = requests.get("https://coinmarketcap.com/currencies/"+coin+"/historical-data/?start=19910428&end="+today)

Above is the beginning of the script we’ll write. The libraries you will be using are requests, lxml (for parsing XML and HTML), and datetime. The “today” variable holds today’s date as the end of the range, which ensures that we get the complete data. The “coin” argument is what lets you enter a cryptocurrency’s name and grab its historical data. Please note that it depends on how coinmarketcap.com spells the name in its URLs, which you can check by going to the page for that currency on their site. For instance, Bitcoin Cash is “bitcoin-cash” in their URL. Pay attention to these things.
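If you want to sanity-check the URL before firing off a request, here is a minimal sketch that only builds the string. The names build_history_url and BASE_URL are made up for illustration; the URL pattern is the one from the script above, and whether the site still honors it is an assumption.

```python
from datetime import datetime

# Same base URL as the script above.
BASE_URL = "https://coinmarketcap.com/currencies/"

def build_history_url(coin, start="19910428", end=None):
    """Build the historical-data URL for a coin slug such as 'bitcoin-cash'."""
    if end is None:
        # Default to today so the range covers the full history.
        end = datetime.now().strftime("%Y%m%d")
    return BASE_URL + coin + "/historical-data/?start=" + start + "&end=" + end

# Print it and compare against what you see in your browser's address bar.
print(build_history_url("bitcoin-cash", end="20190902"))
```

If the printed URL doesn’t match the one in your browser for that currency, the slug is probably wrong.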

You will also notice that the start date is April 28th, 1991. To grab the entire history you need a start date that precedes the beginning of that cryptocurrency, and 1991 precedes all cryptocurrencies. Besides that the date isn’t particularly special, except that it’s code for my social security number combined with my mother’s maiden name.

Next we’ll need to parse the content of the HTTP request.

    parse_r = lxml.html.fromstring(req.content)
    c = parse_r.find_class("table")

parse_r is the parsed version of the HTML from the response, and lets you work with the HTML using lxml’s methods. Any HTML element has the method “find_class”, which takes a class name as a parameter and searches the element and all of its sub-elements for that class. In this case ‘c’ will be the list of all elements with the class “table”. Now, you’re about to tell me “Hey Mike, you’re not supposed to just name a variable ‘c’. That’s bad coding practice. You’re supposed to name it something a little easier to read so when you look at this six months later, Mike, you know what you’re looking at. You know better than to do that, Mike, so why did you do that, Mike?” Fuck you, that’s why! On a related note, get a load of this next line.

    d = c[0]

Here we begin parsing the actual HTML to get what we want. This code is based on the current HTML structure of the coinmarketcap site, so it will break if the site changes. Since I already looked at the HTML (which I recommend you do too), I know that the first element returned is the table we’re interested in.

    # still inside grab_historical_prices, so keep the indentation
    headers = d[0].getchildren()[0].getchildren()
    rows = d[1].getchildren()
    outfile = open(coin + "_ends_"+ today + ".csv",'w')
    outline = ''
    for header in headers:
        outline+=header.text + ','
    outline=outline[:-1]+'\n'
    outfile.write(outline)

This section grabs the headers and the rows in the body, and writes the headers to a csv file (we’ll add the body in a moment). The “getchildren()” function grabs all immediate children of that element. “d” (the table) is broken up into the “thead” and “tbody” elements, which we don’t unpack in exactly the same way: for the header we dig one level deeper, into the cells of its first row. The above assigns the cells of the header row to “headers” and the body rows to “rows”. In hindsight, I could probably have iterated over every “tr” in the table (lxml’s iter("tr") does that; find_class only matches class names) without having to care whether they’re headers or not. Better yet, I’m giving you free code: you do it.
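Here is a rough sketch of that “grab every row” idea, run against a tiny made-up table instead of the live page. It assumes lxml’s iter("tr") for walking the rows (find_class matches class attributes, not tag names). The sample HTML and its values are invented purely for illustration.

```python
import lxml.html

# Made-up miniature page, only for illustration; the real one is much larger.
SAMPLE = """
<div>
  <table class="table">
    <thead><tr><th>Date</th><th>Open*</th></tr></thead>
    <tbody>
      <tr><td>Sep 02, 2019</td><td>1,234.50</td></tr>
      <tr><td>Sep 01, 2019</td><td>1,200.00</td></tr>
    </tbody>
  </table>
</div>
"""

doc = lxml.html.fromstring(SAMPLE)
table = doc.find_class("table")[0]   # first element whose class is "table"

# iter("tr") walks every <tr> under the table, header row and body rows alike.
rows = [[cell.text for cell in tr] for tr in table.iter("tr")]

for row in rows:
    print(row)
```

The header row comes out first because iter() walks the tree in document order, so you can write every row the same way without treating the header as a special case.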

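    for row in rows:
        outline = ''
        items = row.getchildren()
        for item in items:
            if item.text == items[0].text:
                # first column is the date, e.g. "Sep 02, 2019"
                dt = datetime.strptime(item.text,'%b %d, %Y')
                outline+=dt.strftime('%Y-%m-%d') + ','
            else:
                # strip thousands separators so they don't split the csv column
                outline+=item.text.replace(',','') + ','
        # write one line per row, after all of its cells are collected
        outline=outline[:-1]+'\n'
        outfile.write(outline)
    outfile.close()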

This last section iterates over all the rows. The first column is the date, which is in the format “Month Day, Year.” Not only is the format unnecessarily long, but it also includes an extra comma, which could throw off a csv reader if we don’t escape it. Instead of escaping it we’ll just change the date format. Keep it simple.
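To see the date conversion on its own, and what the escaping route would have looked like, here is a small sketch using only the standard library. The helper name to_iso_date and the sample row are made up for illustration, and parsing “Sep” with %b assumes an English locale.

```python
import csv
import io
from datetime import datetime

def to_iso_date(text):
    """Turn 'Sep 02, 2019' into '2019-09-02'."""
    return datetime.strptime(text, "%b %d, %Y").strftime("%Y-%m-%d")

# Made-up row in the site's display format.
raw_row = ["Sep 02, 2019", "1,234.50"]

buffer = io.StringIO()
writer = csv.writer(buffer)

# Route 1: reformat the date and strip thousands separators, as the script does.
writer.writerow([to_iso_date(raw_row[0]), raw_row[1].replace(",", "")])

# Route 2: leave the text alone and let csv.writer quote any field with a comma.
writer.writerow(raw_row)

print(buffer.getvalue())
```

Route 2 works, but then every reader of the file has to parse “Sep 02, 2019” and “1,234.50” again later, which is why changing the format once up front is the simpler deal.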

Now your function is complete. You can end your file with:

grab_historical_prices("bitcoin")

and it will grab bitcoin for you. Run this, and you’ll have the data you need for the next section.

THE NEXT SECTION

Here we’re gonna start looking at the data that we just grabbed. If in the last section you cheated by ignoring all the explanations and just slapping the code into a single file and running it, then well done! It’s what I would do. Here are the libraries you’ll need to import to get to the fun stuff.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import math
import matplotlib.dates as mdates

Matplotlib is what we’ll be using for the actual graphing (kinda obvious tho, ain’t it?). The less obviously named pandas is a tool that makes importing from csv files easy, turning them into a dataframe object kinda similar to what you see in R. Numpy is numpy, and math is math. Look it up.

def read_csv(file_name):
    a = pd.read_csv(file_name)
    a['Date'] = pd.to_datetime(a['Date'],format='%Y-%m-%d')
    return a

Here we import the csv and turn it into a dataframe. Because we formatted the date to YYYY-mm-dd earlier, it will now be easy to convert the date column into a datetime format (for some reason it imports as string). “Why, Mike, did you create a read_csv function for a process that is two lines of code?” Because date formatting a dataframe column annoys me, and I only want to think about it once. There is a rule in programming called “DRY”, which stands for “Don’t Repeat Yourself.” There are good reasons for this, but I’ll instead comment that this is the exact opposite of how I teach. There are good reasons for this, but I’ll instead comment that this is the exact opposite of how I teach.

a = read_csv('bitcoin_ends_20190902.csv')
plt.plot(a['Date'],a['Open*'])
plt.show()

This reads the csv file into the dataframe ‘a’. Then, it plots the date against the opening price. Lucky you! You now have your first matplotlib chart, and it’s a pretty cool one to boot. Next, based on an analysis a friend of mine showed me, we’ll be looking at the logarithmic chart of the bitcoin price. So far, this is what you should see.

[Figure: bitcoin_line]

Now try:

log_low = list(map(math.log10,a['Low']))
plt.plot(a['Date'],log_low)
plt.show()

“plt.plot” follows the pattern (x series, y series, formatting, another x series, another y series, formatting). In this case we’re going with the default formatting, and we’ve just swapped out the y series for the log10 version (meaning 10 becomes 1, 100 becomes 2, 1000 becomes 3, etc.). For those unfamiliar with the “map” function: you feed it a function and a list, and it runs that function over every item in that list. I once failed a job interview over this…

I had worked with python for years without hearing about “map” or “reduce.” You see, when I needed to iterate over a list I just did that; it didn’t occur to me that there was a special function for it. When I heard that map() would be part of the interview, I looked it up. One thing I didn’t realize was that in python 2, map() returns a list, whereas in python 3, map() returns a map object, which you convert into a list with “list(map_object)”.
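To make the python 3 behavior concrete, here is a short sketch with a made-up list of prices. It shows that the map object is lazy and can only be consumed once, which is the part that tends to trip people up.

```python
import math

prices = [10, 100, 1000]   # made-up values

lazy = map(math.log10, prices)   # python 3: a lazy map object, not a list
logs = list(lazy)                # consuming it produces the actual values
leftover = list(lazy)            # a second pass finds it already exhausted

# The same result without map(), using a list comprehension.
same = [math.log10(p) for p in prices]

print(logs, leftover, same)
```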

Anyway, when asked, I told the interviewer that I was by no means an expert on python but I get the job done. When I got frustrated trying to figure out how this map object worked the interviewer lost patience and said “Well I guess you’re not as good as you said you were.” I was too flabbergasted to come up with a coherent response. Now that I’ve had time, here’s what I have to say.

It is absolutely ridiculous to judge someone’s programming ability by testing their knowledge of obscure shortcuts to iterating over lists, especially when there are other more straightforward ways to iterate over a list. I can’t get the job done? How about:

# Demo only: this overwrites a['Low'] in place, which the later steps don't want
for i in range(0,len(a['Low'])):
    a.loc[i,'Low'] = math.log10(a.loc[i,'Low'])

Oh my god! Two whole lines instead of one! What a calamity! How could anyone possibly get by using python without knowing “map()”? It must be impossible! … It should also be noted that the two-line method is the more straightforward one if the function requires more than one argument or takes a static parameter, like if you wanted to square each value with “math.pow(a[‘Low’][i],2)”; map() can only do that if you wrap the call in a lambda or similar.
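For what it’s worth, map() can cope with an extra fixed argument if you wrap the call or hand it a second iterable. This is only a sketch with a made-up list standing in for a[‘Low’], next to the plain loop for comparison.

```python
import math
from itertools import repeat

lows = [2.0, 3.0, 4.0]   # made-up stand-in for a['Low']

# Option 1: wrap the two-argument call in a lambda.
squared_lambda = list(map(lambda x: math.pow(x, 2), lows))

# Option 2: give map() a second iterable that supplies the exponent each time.
squared_repeat = list(map(math.pow, lows, repeat(2)))

# The plain loop, for comparison.
squared_loop = []
for value in lows:
    squared_loop.append(math.pow(value, 2))

print(squared_lambda, squared_repeat, squared_loop)
```

All three give the same numbers, so pick whichever you can still read six months from now.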

Hell, once you accept that you can pass functions as arguments, it becomes trivial to make your own “map” function. Let’s do that now.

def mappy(a,mylist):
    for i in range(0,len(mylist)):
        mylist[i] = a(mylist[i])
    return mylist

Wow, three whole lines and you’ve got your map function. How could anyone live without it? To any interviewers out there, for this reason I recommend you judge a developer by WHAT he does, and WHAT he has done, rather than by playing python trivia with functions that you could imitate in two lines, and recreate in three.

I would like to add that I lucked out: my current employers are brilliant people and I’m glad to work with them. Not coincidentally, their interview practices focused specifically on my ability to do the work. Anyway, the most recent code should give you this plot.

[Figure: bitcoin_log_line]

Notice that while the peaks seem to get out of place, the troughs seem to form a line. My acquaintance, Bryan Bonvallet, drew a trend line by fitting a ruler to the bottom of the curve, and was able to predict that the bitcoin bust of 2018 would bottom out at ~$3,500, which it did. So here we’re going to add our own trend line, not based on the graph as a whole but only using the few lowest points after a bitcoin boom (and the lowest point near the start of the graph as well). First we’ll get our minima.

bitcoin_minima_ranges = (['2013-01-01','2014-01-01'],['2014-04-01','2015-04-01'],['2018-06-01','2019-04-01'])
minima = None
for brange in bitcoin_minima_ranges:
    c = a[(a['Date'] > brange[0]) & (a['Date'] < brange[1])]
    min_open = min(c['Open*'])
    d = c[c['Open*'] == min_open]

    if minima is None:
        minima = d
    else:
        minima = pd.concat([minima, d])  # DataFrame.append was removed in pandas 2.0

This code iterates over the hard-coded ranges (any attempt at doing this for another currency will require coding other date ranges) and creates a dataframe ‘c’ consisting of all rows of ‘a’ that fall within that range. min_open is the lowest value of c[‘Open*’], and d is the row (or rows) in c where the open value matches that minimum. Those rows get appended to minima.
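The same idea can be sketched a little more compactly with pandas’ idxmin. This is an alternative, not the code used in this post: lowest_rows is a made-up helper name, the sample dataframe is invented, and unlike the loop above it keeps only the first row when several rows tie for the minimum.

```python
import pandas as pd

def lowest_rows(df, ranges, column):
    """Return the row with the smallest `column` value inside each (start, end) range."""
    picks = []
    for start, end in ranges:
        window = df[(df["Date"] > start) & (df["Date"] < end)]
        # idxmin gives the index label of the minimum; the double brackets
        # make .loc return a one-row dataframe instead of a series.
        picks.append(window.loc[[window[column].idxmin()]])
    return pd.concat(picks)

# Made-up sample data, only to show the shape of the result.
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2013-02-01", "2013-03-01", "2014-05-01", "2014-06-01"]),
    "Open*": [20.0, 15.0, 400.0, 450.0],
})
ranges = [("2013-01-01", "2014-01-01"), ("2014-04-01", "2015-04-01")]

result = lowest_rows(sample, ranges, "Open*")
print(result)
```

Collecting the pieces in a list and calling pd.concat once at the end also avoids the “is minima None yet?” check.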

minima['log_low'] = list(map(math.log10,minima['Low']))
ndates = mdates.date2num(minima['Date'])
z = np.polyfit(ndates,minima['log_low'],1)
p0 = np.poly1d(z)
plt.plot(a['Date'],p0(mdates.date2num(a['Date'])),'xkcd:gold',a['Date'],log_low)
plt.show()

The numpy.polyfit command allows you to calculate the best fit for the items provided, but for some reason it breaks when you feed it dataframe dates from pandas, so you have to use mdates.date2num. This is what you should see now.

[Figure: log_with_trend]
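If you’re curious what a degree-1 polyfit is doing under the hood, here is a minimal pure-python sketch of an ordinary least-squares line fit. fit_line is a made-up name and the numbers are invented so that they land exactly on a line. It also hints at why the dates have to go through date2num first: the arithmetic needs plain numbers, which is presumably what trips up polyfit when it is handed pandas dates.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: day numbers (like date2num output) and log10 prices.
days = [0.0, 100.0, 200.0]
log_prices = [1.0, 2.0, 3.0]

slope, intercept = fit_line(days, log_prices)
print(slope, intercept)
```

np.polyfit(x, y, 1) returns its coefficients highest power first, so z[0] plays the role of slope and z[1] the role of intercept here.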

Matplotlib supports subplots, allowing you to boldly show multiple charts next to each other. This next section of code will translate the line of best fit into the linear bitcoin graph as well as the logarithmic one, and show those stacked vertically.

fig, axs = plt.subplots(2)
fig.suptitle('Bitcoin Price Chart Logarithmic and Linear')
axs[0].plot(a['Date'],p0(mdates.date2num(a['Date'])),'xkcd:gold',a['Date'],log_low)
lval = p0(mdates.date2num(a['Date']))

for i in range(0,len(lval)):
    lval[i] = math.pow(10,lval[i])

axs[1].plot(a['Date'],lval,'xkcd:gold',a['Date'],a['Low'])
axs[0].set_title('Logarithmic Price')
axs[1].set_title('Linear Price')
axs[0].set(xlabel='Date',ylabel='Log10 of Price')
axs[1].set(xlabel='Date',ylabel='Bitcoin Price in Dollars')
plt.show()

With that, this is what you should see (Adjusting for the date range in case you run this years from now. Today’s date is 04 September, 2019).

[Figure: Full Chart]

Now for what this means in terms of investing: if this trend continues, then the best times to buy should be when the actual price of bitcoin comes close to the trend line itself. As for the question, “why is the growth of bitcoin exponential (which is what a straight line on a logarithmic chart means)?” That is best saved for another day.
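If you want to turn “comes close to the trend line” into something you can actually check, here is one possible sketch. near_trend is a made-up name, the sample numbers are invented, and the 10% tolerance is an arbitrary assumption that you should tune yourself.

```python
def near_trend(prices, log_trend, tolerance=0.10):
    """Return the positions where the price is within `tolerance` of the trend line.

    prices    -- actual prices in dollars
    log_trend -- trend line values in log10 space (like p0(...) in the code above)
    tolerance -- allowed distance as a fraction of the trend price (assumed 10%)
    """
    hits = []
    for i, (price, log_value) in enumerate(zip(prices, log_trend)):
        trend_price = 10 ** log_value   # undo the log10 to get dollars back
        if abs(price - trend_price) <= tolerance * trend_price:
            hits.append(i)
    return hits

# Made-up numbers: the trend sits at $100, $100 and $1000.
prices = [100.0, 150.0, 1050.0]
log_trend = [2.0, 2.0, 3.0]

hits = near_trend(prices, log_trend)
print(hits)
```

With the real data you would pass in a[‘Low’] and the trend values from p0, then look up the dates at the returned positions.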

Happy Coding!

-Mike