This notebook will include example for how to:
- Open ascii files manually
- Open ascii files using the Pandas module
- Plot the data which has been parsed
- Save the results of those plots
# The first step when using python is to import the packages which you want to use.
# is is good practice to import all the packages you want to use in one place (as opposed to throught the file)
# Packages are bits of codes other have written and you have installed which you can call from your programs
# this is super helpful as it means you don't have to write all this code again
# Some particularly helpful packages are:
# - numpy for more powerful mathematics operations such as linear algebra and statistics
# - matplotlib for graphing and visualizaton
# - scipy for some generally helpful scientific tools such as curve fitting
# - pandas for big data handeling
# - astropy for tools specific to astornomy such as time corrections and sky coordinates
# The syntax to bring packages into python can be somewhat confusing, I'll try to break down the basic rules below
# import statments can either start with "import" or "from"
# statements are basically human readable
# the "as" keyword is like assigning a shortcut name to a thing
# so for example you might say
# import numpy as np
# this will bring in all the code from the numpy package to your script. You can then access the code by writing np.<function_name> (for example np.sin(2*np.pi) would return the sin of 2 pi)
# if you didn't have as and instead just wrote
# import numpy
# you would instead write numpy.sin(2*numpy.py)
# you can also import specific things from a package
# from numpy import pi, sin
# in this case you can simply write sin(2*pi) but you wont have access to any of the other things within the package (np.<function_name> won't work because you never defined what np is)
# sometimes packages have subpackages within them
# import numpy.random as rand
# here the subpackage is called random, you can then access anything within random by doing rand.<function_name> (such as rand.normal() for a normal distribution)
# similarly to above you could also just grab one part
# from numpy.random import normal as norm
# this would then let you simply call norm()
# note how you can combine / mix and match the import, from, and as keywords
# Below I will import a few important packages in the way that I most commonly use them (including their most common / standard abbriviations)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
from scipy.stats import gaussian_kde
from matplotlib.colors import LogNorm
FILEPATH = "hlsp_acsggct_hst_acs-wfc_ngc2808_r.rdviq.cal.adj.zpt"
# As a first check lets run a line of code that will make sure the file exists where we think it does!
# If the file exists then this will output okay, otherwise it will raise an error
# don't worry about the details of how the error is raised right now
if not os.path.exists(FILEPATH):
raise OSError(f"file {os.path.basename(FILEPATH)} does not exist!")
else:
print("Okay!")
Okay!
Now we want to be able to read in the contents of the file and do things with them. We can do this "manually" or we can use some code others have written to. While generally I just use pandas (the code in question that others have writte), there is a lot of value in being able to do this yourself. That's because oftentimes files are quite complicated and the prebuilt code simply does not know what to do with them
To read in a file we have to go through three steps
1) open the file
2) copy the contents from the file into the programs memory
3) close the file
in basic that might look like the following
file = open(FILEPATH)
file_contents = file.read()
file.close()
# because the file is really large it would take a long time to print all of it so this just prints the first 100 charectars to the screen, just to make sure this worked
print(file_contents[:100])
id x y Vvega err VIvega err Ivega err Vground Iground Nv Ni wV wI xsig ysig othv othi qfitV qfitI R
There is actually a slightly better easier way to do this
with open(FILEPATH) as file:
file_contents = file.read()
print(file_contents[:100])
id x y Vvega err VIvega err Ivega err Vground Iground Nv Ni wV wI xsig ysig othv othi qfitV qfitI R
Here the "with" block (called a context manager) took care of opening and closing the file for us (this is good cause it can be hard to remember to close a file so its nice to have the program handel it for you). When you need to open a file always try to use a context manager.
We now have the contents of the file read into memory (put another way we have stored them in the variable called file_contents). We now need to "parse" them. In this case we need to take this raw array of charectars and turn it into some structured table of numbers. When you are writing code to parse the contents of a file the first step is always to open the file in a text editor and look at its structure. Then you look for patterns which you can exploit to pull out the data that you want. Below is a snippet from the file we are interested in parsing
id x y Vvega err VIvega err Ivega err Vground Iground Nv Ni wV wI xsig ysig othv othi qfitV qfitI RA Dec 1 1500.801 975.634 19.905 0.0029 0.634 0.0043 19.271 0.0032 20.077 19.230 1 1 1 1 0.014 0.010 0.000 0.000 0.059 0.024 138.0592168 -64.8911376 2 1532.085 861.844 21.514 0.0061 0.917 0.0085 20.597 0.0059 21.783 20.563 1 1 1 1 0.022 0.001 0.121 0.009 0.067 0.152 138.0581958 -64.8927183 3 1530.230 872.678 19.653 0.0026 0.600 0.0039 19.053 0.0029 19.811 19.011 1 1 1 1 0.016 0.009 0.002 0.002 0.054 0.027 138.0582561 -64.8925679 4 1523.191 888.185 23.043 0.0125 1.028 0.0170 22.015 0.0115 23.347 21.984 1 1 1 1 0.056 0.040 0.000 0.000 0.116 0.075 138.0584862 -64.8923524 5 1527.557 920.728 22.079 0.0080 0.853 0.0113 21.226 0.0080 22.328 21.189 1 1 1 1 0.005 0.001 0.000 0.000 0.072 0.048 138.0583423 -64.8919005 6 1533.630 935.460 23.917 0.0189 1.475 0.0236 22.442 0.0141 24.339 22.430 1 1 1 1 0.032 0.021 0.357 0.168 0.074 0.194 138.0581433 -64.8916959 7 1516.235 948.472 24.170 0.0215 0.935 0.0298 23.235 0.0207 24.445 23.201 1 1 1 1 0.043 0.006 0.023 0.033 0.181 0.218 138.0587125 -64.8915150 8 1525.199 951.999 21.847 0.0072 0.780 0.0103 21.067 0.0074 22.071 21.029 1 1 1 1 0.010 0.001 0.006 0.006 0.067 0.058 138.0584188 -64.8914661 9 1533.446 942.922 19.697 0.0026 0.619 0.0040 19.078 0.0029 19.864 19.036 1 1 1 1 0.006 0.008 0.000 0.000 0.051 0.032 138.0581490 -64.8915923 10 1521.701 965.747 20.131 0.0032 0.620 0.0048 19.511 0.0036 20.298 19.469 1 1 1 1 0.011 0.007 0.000 0.000 0.055 0.035 138.0585330 -64.8912751 11 1524.800 975.675 24.433 0.0245 1.345 0.0312 23.088 0.0194 24.824 23.070 1 1 1 1 0.041 0.054 0.174 0.101 0.199 0.238 138.0584313 -64.8911373 12 1532.479 983.762 20.286 0.0035 0.649 0.0052 19.637 0.0038 20.464 19.596 1 1 1 1 0.008 0.013 0.002 0.003 0.049 0.038 138.0581797 -64.8910250 13 1554.750 796.796 21.106 0.0051 0.674 0.0075 20.432 0.0055 21.292 20.391 1 1 1 1 0.022 0.045 0.000 0.000 0.193 0.160 138.0574554 -64.8936220 14 1560.846 815.714 21.803 0.0070 0.826 0.0100 20.977 0.0071 22.044 20.939 1 1 1 1 0.011 0.010 0.000 0.000 0.059 0.053 138.0572552 -64.8933593 15 1549.266 849.083 18.898 0.0018 0.691 0.0027 18.207 0.0020 19.091 18.167 1 1 1 1 0.004 0.012 0.000 0.000 0.051 0.034 138.0576335 -64.8928958 16 1541.173 853.718 23.906 0.0188 1.344 0.0239 22.562 0.0148 24.296 22.543 1 1 1 1 0.016 0.048 0.351 0.179 0.214 0.148 138.0578985 -64.8928313 17 1552.946 842.371 23.734 0.0173 1.345 0.0221 22.389 0.0137 24.125 22.371 1 1 1 1 0.036 0.002 0.640 0.381 0.125 0.110 138.0575132 -64.8929890 18 1539.413 877.985 20.310 0.0035 0.646 0.0052 19.664 0.0039 20.486 19.623 1 1 1 1 0.008 0.008 0.016 0.027 0.059 0.032 138.0579555 -64.8924942 19 1550.015 887.235 24.402 0.0239 1.393 0.0301 23.009 0.0183 24.804 22.993 1 1 1 1 0.025 0.021 0.004 0.003 0.145 0.117 138.0576080 -64.8923659 20 1540.053 873.743 22.633 0.0103 0.957 0.0142 21.676 0.0098 22.916 21.642 1 1 1 1 0.011 0.005 0.840 0.573 0.106 0.068 138.0579347 -64.8925532 21 1558.787 885.288 24.248 0.0221 1.269 0.0286 22.979 0.0181 24.620 22.957 1 1 1 1 0.033 0.045 0.004 0.006 0.177 0.107 138.0573210 -64.8923930 22 1543.851 901.133 21.903 0.0073 0.784 0.0105 21.119 0.0076 22.128 21.081 1 1 1 1 0.013 0.007 0.000 0.000 0.047 0.035 138.0578096 -64.8921728 23 1561.731 909.985 24.552 0.0257 1.391 0.0324 23.161 0.0197 24.954 23.145 1 1 1 1 0.008 0.035 0.177 0.045 0.204 0.124 138.0572241 -64.8920500 24 1544.442 935.553 24.960 0.0314 1.664 0.0378 23.296 0.0210 25.422 23.294 1 1 1 1 0.008 0.043 0.006 0.006 0.211 0.228 138.0577894 -64.8916947 25 1549.343 962.155 22.845 0.0114 1.026 0.0155 21.819 0.0105 23.149 21.788 1 1 1 1 0.013 0.015 0.000 0.000 0.078 0.059 138.0576284 -64.8913253 26 1556.440 947.695 23.489 0.0154 1.057 0.0208 22.432 0.0140 23.801 22.402 1 1 1 1 0.013 0.030 0.000 0.000 0.135 0.077 138.0573964 -64.8915262 27 1539.592 985.276 21.826 0.0071 0.843 0.0100 20.983 0.0071 22.071 20.946 1 1 1 1 0.004 0.007 0.041 0.040 0.061 0.052 138.0579470 -64.8910041 28 1562.309 978.534 25.934 0.2530 1.354 0.3379 24.580 0.2240 26.326 24.562 2 2 1 1 0.001 0.116 0.387 0.294 0.488 0.393 138.0572035 -64.8910980 29 1575.370 735.744 23.939 0.0191 1.212 0.0249 22.727 0.0160 24.295 22.702 1 1 1 1 0.023 0.007 0.017 0.013 0.118 0.077 138.0567819 -64.8944702 30 1586.542 715.469 20.932 0.0047 0.678 0.0069 20.254 0.0051 21.120 20.214 1 1 1 1 0.005 0.021 0.000 0.000 0.059 0.038 138.0564167 -64.8947519 31 1579.079 748.357 23.392 0.0147 1.197 0.0193 22.195 0.0125 23.744 22.170 1 1 1 1 0.013 0.014 0.013 0.009 0.064 0.085 138.0566601 -64.8942950 32 1587.393 738.984 20.093 0.0032 0.627 0.0047 19.466 0.0035 20.262 19.424 1 1 1 1 0.003 0.016 0.000 0.000 0.059 0.041 138.0563883 -64.8944253 33 1582.392 784.999 21.064 0.0050 0.785 0.0072 20.279 0.0051 21.291 20.241 1 1 1 1 0.006 0.008 0.001 0.003 0.054 0.037 138.0565509 -64.8937862 34 1584.768 779.082 23.791 0.0178 1.193 0.0234 22.598 0.0151 24.142 22.573 1 1 1 1 0.021 0.050 0.201 0.123 0.117 0.082 138.0564731 -64.8938684 35 1584.765 801.290 20.786 0.0044 0.658 0.0065 20.128 0.0048 20.967 20.088 1 1 1 1 0.008 0.011 0.000 0.000 0.054 0.036 138.0564726 -64.8935599 36 1587.570 829.191 20.571 0.0040 0.675 0.0059 19.896 0.0043 20.759 19.856 1 1 1 1 0.012 0.001 0.000 0.000 0.060 0.036 138.0563803 -64.8931724 37 1568.598 870.703 23.880 0.0185 1.216 0.0242 22.664 0.0155 24.238 22.640 1 1 1 1 0.034 0.002 0.040 0.039 0.153 0.080 138.0570002 -64.8925957 38 1581.094 877.097 23.736 0.0173 1.135 0.0230 22.601 0.0151 24.071 22.573 1 1 1 1 0.007 0.020 0.000 0.000 0.137 0.074 138.0565912 -64.8925070 39 1568.746 899.814 19.070 0.0020 0.663 0.0029 18.407 0.0022 19.253 18.366 1 1 1 1 0.013 0.008 0.000 0.000 0.061 0.035 138.0569946 -64.8921914 40 1578.019 906.725 23.875 0.0185 1.310 0.0237 22.565 0.0148 24.257 22.545 1 1 1 1 0.035 0.048 0.076 0.070 0.202 0.238 138.0566910 -64.8920955 41 1585.284 892.127 21.174 0.0052 1.575 0.0064 19.599 0.0037 21.618 19.592 1 1 1 1 0.014 0.011 0.001 0.001 0.056 0.030 138.0564537 -64.8922983 42 1586.016 903.027 26.981 0.0958 2.180 0.1054 24.801 0.0441 27.533 24.832 1 1 1 1 0.019 0.046 9.900 0.189 0.692 0.474 138.0564293 -64.8921469 43 1580.735 896.629 24.186 0.0214 1.258 0.0277 22.928 0.0176 24.554 22.906 1 1 1 1 0.050 0.069 0.209 0.316 0.106 0.154 138.0566026 -64.8922357 44 1568.822 920.474 22.433 0.0094 0.987 0.0129 21.446 0.0088 22.725 21.414 1 1 1 1 0.009 0.014 0.000 0.000 0.066 0.082 138.0569918 -64.8919044 45 1583.164 929.258 25.278 0.3970 1.472 0.3979 23.806 0.0270 25.699 23.793 2 1 1 1 0.048 0.049 0.019 0.013 0.306 0.141 138.0565222 -64.8917826 46 1574.176 936.517 26.688 0.6730 1.588 0.6749 25.100 0.0510 27.135 25.094 2 2 1 1 0.064 0.110 4.400 2.114 0.880 0.494 138.0568160 -64.8916817 47 1563.259 957.949 26.519 0.0725 1.926 0.0827 24.593 0.0398 27.030 24.607 1 1 1 1 0.030 0.057 0.331 0.137 0.551 0.320 138.0571729 -64.8913839
We can see that this file has the format of
1) A header line with column names seperated (more formalled called delimited) by spaces
2) 2 blank lines
3) Data lines delimited by spaces and intended by a tab when compared to the header
This is all the information we need to write a simple parser. Recall that the variable file_header stores the contents of the file.
Let's start by extracting the header
# split file_contents into an array of strings based on lines. Each line becomes a new entry into this array. the \n charectar is the UNIX new line charectar.
# This will work on any "UNIX-Like" system. This includes Mac OS and Linux. This does not always work on windows which sometimes uses a different line feed charectar.
lines = file_contents.split('\n')
# get the first (0th because python counts from 0) line, which we already know is the header line
header_row = lines[0]
# We now have the first row but really we want that to be a list of the column names. We can use the split() function to split a string into a list of strings based on whitespace
# every white space charectar will be removed and the strings between them will be added to a new list (in this case called header)
header = header_row.split()
print(header)
['id', 'x', 'y', 'Vvega', 'err', 'VIvega', 'err', 'Ivega', 'err', 'Vground', 'Iground', 'Nv', 'Ni', 'wV', 'wI', 'xsig', 'ysig', 'othv', 'othi', 'qfitV', 'qfitI', 'RA', 'Dec']
Now we have a list of the column names. Next we will extract the data
# Here we get all the lines after (but not including) the third line up do (but not including) the final line (thats what the n:m sytnax does, start at n non inclusive and get everything after it up until m)
# negative numbers count back from the end so -1 is the last line
# we want to drop the last line because if you open the file and look at it you will see the last line is blank with no data on it
data_list = lines[3:-1]
# data_list is now a list of lines, we want to split each list into another list of numbers. There are a few ways you could do this;
# however, one of the best is with something called "list-comprehension"
# This lets you iterate over elements of a list and preform an operation on each one one at a time
# So below we say that for each row (x) in data_list (each row in the file) split that row in the same way we split
# the column names from before. We will be left with a 2D list (a table)
data_seperated = [x.split() for x in data_list]
# One major problem remains however. The data in data_seperated are in the form of strings
# that is to say that while they look like numbers, the computer knows them only as the charectars that represent those
# numbers. It does not know they are actually numbers and can therefore not do any math with them.
# Most languages (Python included) have whats called "casting" where you can take the string representation of a number
# and turn it into the numeric representation (tell the computer its a number and not just a set of charectars)
# next we will cast all the elements of data_seperated to "floats" (decimal numbers)
# recall that data_seperated is a 2D list so we need to loop over 2 indicies. Once again list comprehension makes
# this quite straigtforward
# Here we say for every item (y) in every row (x) in data_seperated turn y into a float
data_cast = [[float(y) for y in x] for x in data_seperated]
Now we have a numeric representation of the data from the file which we opened. The final thing we want to be able to do is to connect these with the header names from before. There is a data structure in python called a dictionary (more generally this is a hash-map) which allows you to index data with strings. We will use the column name as the index and the column as the data to index
# This is a lot like list comprehension, this is dictoinary comprehension (note the {} as opposed to []).
# Dictionaries are key-value paried data-structures, the key is defined on the left side of the :
# The value assigned to that key is on the right side of the :
# Here we define a dictionary called parsed
# We first loop over the "enumeration" of header
# this means loop over every element of header (recall these will be the column names) but don't just return the value, also count up from 0 (in other words return the place that value is in the list).
# The enumeration returns the index then the value, so when we say for column_number, column_name in enumerate(header) each loop iteration the index will
# unpack into column_numebr and the value at that index will unpack into column_name
# we use the column name to define the key for that element in the dict (the left side of the :) and we use the column number to select the correct element from each row
# in the data table we generated in the previous cell
parsed = {column_name: [row[column_number] for row in data_cast] for column_number, column_name in enumerate(header)}
We now have fully parsed this file, you can look at parts of it using the header names, for example if we wanted to plot (spoilers for latter) the x and y positions of the first 100 of these targets we could do the following
x = parsed['x'][:100]
y = parsed['y'][:100]
plt.plot(x, y, '.')
[<matplotlib.lines.Line2D at 0x7fc5f81a73c8>]
Now that's all well and good and once you get used to it it doesn't take too long to write; however, we don't normally have a file which is so spcialised we need to write our own parsed. Normally you can use a pre-canned one. The python package pandas has the ability to parse many files built right in
# All of what we did before (and a good bit more behind the scenes housekeeping) is handeled in this one line
# Read_csv is the function to read an ascii file
# start by giving it the filepath
# then we need to tell it how the file is delimited. By default it will assume commas seperate the data
# here however data is seperated by spaces. We can use whats called a regular expression to generally
# to account for seperation by all whitespace (spaces, tabs, a few other esotaric charectars)
# finally we tell pandas to skip blank lines because we know we have 2 blank lines between the header and the data
parsed = pd.read_csv(FILEPATH, sep=r"\s+", skip_blank_lines=True)
# one of the nice things pandas does is give us this easy to read table output when we look at our data
parsed
id | x | y | Vvega | err | VIvega | err.1 | Ivega | err.2 | Vground | ... | wV | wI | xsig | ysig | othv | othi | qfitV | qfitI | RA | Dec | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1500.801 | 975.634 | 19.905 | 0.0029 | 0.634 | 0.0043 | 19.271 | 0.0032 | 20.077 | ... | 1 | 1 | 0.014 | 0.010 | 0.000 | 0.000 | 0.059 | 0.024 | 138.059217 | -64.891138 |
1 | 2 | 1532.085 | 861.844 | 21.514 | 0.0061 | 0.917 | 0.0085 | 20.597 | 0.0059 | 21.783 | ... | 1 | 1 | 0.022 | 0.001 | 0.121 | 0.009 | 0.067 | 0.152 | 138.058196 | -64.892718 |
2 | 3 | 1530.230 | 872.678 | 19.653 | 0.0026 | 0.600 | 0.0039 | 19.053 | 0.0029 | 19.811 | ... | 1 | 1 | 0.016 | 0.009 | 0.002 | 0.002 | 0.054 | 0.027 | 138.058256 | -64.892568 |
3 | 4 | 1523.191 | 888.185 | 23.043 | 0.0125 | 1.028 | 0.0170 | 22.015 | 0.0115 | 23.347 | ... | 1 | 1 | 0.056 | 0.040 | 0.000 | 0.000 | 0.116 | 0.075 | 138.058486 | -64.892352 |
4 | 5 | 1527.557 | 920.728 | 22.079 | 0.0080 | 0.853 | 0.0113 | 21.226 | 0.0080 | 22.328 | ... | 1 | 1 | 0.005 | 0.001 | 0.000 | 0.000 | 0.072 | 0.048 | 138.058342 | -64.891901 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
308728 | 308729 | 4442.749 | 5098.706 | 25.143 | 0.0340 | 1.483 | 0.0423 | 23.660 | 0.0252 | 25.567 | ... | 1 | 1 | 0.014 | 0.031 | 0.002 | 0.001 | 0.183 | 0.167 | 137.963025 | -64.833873 |
308729 | 308730 | 4471.442 | 5010.146 | 22.015 | 0.0079 | 0.864 | 0.0111 | 21.151 | 0.0078 | 22.268 | ... | 1 | 1 | 0.009 | 0.001 | 0.000 | 0.000 | 0.065 | 0.034 | 137.962086 | -64.835103 |
308730 | 308731 | 4484.483 | 4999.601 | 21.080 | 0.0051 | 0.765 | 0.0074 | 20.315 | 0.0053 | 21.299 | ... | 1 | 1 | 0.010 | 0.002 | 0.000 | 0.000 | 0.042 | 0.029 | 137.961660 | -64.835249 |
308731 | 308732 | 4464.859 | 5026.023 | 22.737 | 0.0110 | 1.009 | 0.0150 | 21.728 | 0.0102 | 23.035 | ... | 1 | 1 | 0.008 | 0.007 | 0.146 | 0.115 | 0.075 | 0.045 | 137.962302 | -64.834883 |
308732 | 308733 | 4470.596 | 5047.705 | 25.217 | 0.0355 | 1.328 | 0.0453 | 23.889 | 0.0282 | 25.603 | ... | 1 | 1 | 0.040 | 0.079 | 0.000 | 0.000 | 0.228 | 0.110 | 137.962115 | -64.834581 |
308733 rows × 23 columns
# And this looks the same as before which is as it should be!
plt.plot(parsed['x'].iloc[:100], parsed['y'].iloc[:100], '.')
[<matplotlib.lines.Line2D at 0x7fc670f5d6a0>]
Congratulations! You have now read in and parsed your first file. This is a suprisingly large part of astronomy. Files can get pretty complicated so having this basic understanding is really helpful!
Now that we have the data loaded into some usable data structure we can think about visualizing it.
Python has many viszlization libraries; however, by far the most commonly used is matplotlib. Matplotlib lets you create any number of graphs in 2 and 3D.
The first thing that we are going to look at is the color-magnitude diagram. In astronomy when we say color we mean the difference in brightness between two "filters" or two very specific wavelength of light. So we might imagine some telescope with a filter called B and filter called G. In this example think of B as the blue filter, only letting through blue light and blocking everything else, and G and the green filter, only letting through green light and blocking everything else. If we take 2 pictures of a star one with the B filter in place and one with the G filter in place we could report the stars Bmag and Gmag, its brightness (or magnitude) in blue light and its magnitude in green light (magnitude has a slightly more subtle definition but brightness is okay to 1st order). We could then say that its B-G color is the difference between the brightness in the blue filter and the green filter. It turns out a lot of interesting structure exists in graphs of magnitude vs color, these are called color-magnitude diagrams (CMDs)
The Hubble Space Telescope (HST) has a variety of filters, they are called things like F606W, F814W, and others in that kinda mold. Don't worry about what the names mean exactly right now (if you are interested: in general they give the center wavelength of the filter and then denote how much space around that center wavelength they cover)
We are going to look at the F606W F814W CMD. In the dataset which we have parsed these have been given the column names Vvega and Ivega.
# Extraxt these from the parsed dataframe, this is not strictly nessisairy; however, often in programming you do things to keep code clean even if it not needed.
# by extracting these here I reduce the number of times I need to write parsed['Vvega/Ivega'] down the line. Simply writing Vvega/Ivega is a bit cleaner and
# will make the code easeier to read down the line.
Vvega = parsed['Vvega']
Ivega = parsed['Ivega']
# Let's plot these now
# '.' tells matplotlib to plot small points (as opposed to 'o' with is large points, or not having anything there in which case it will attempt to connect all points with lines)
plt.plot(Vvega-Ivega, Vvega, '.')
plt.xlabel('F606W-F814W')
plt.ylabel('F606W')
Text(0, 0.5, 'F606W')
So we can already see some structure here; however, there are a few things we can do to make this both more readable and more in line with normal CMDs (how astronomers tend to plot them for some historical reasons)
First of all when we plot CMDs we invert the y-axis (bigger numbers on the bottom). This is because the larger the magnitude the fainter the star. So by flipping the y-axis we keep the brightest objects on top (lowest magnitudes). In order to flip the axis we are going to dive a little more deeply into matplotlib.
Matplotlib operates on a few basic structures. The canvas, the figure, and the axes. In general you don't need to worry about the canvas, thats basically handeled by the backend. The figure only takes a little configureation (this is where you will set the overall size of the figure for example). The axes, which is everything you actually plot, the data, the axis labels, the tick marks, everything you see, is where most of the customization work goes.
Up and until now you have seen graphs made using something like the following:
plt.plot(x, y)
This is great for quick and dirty graphs; however, it limits what you can do latter on. For more powerful and customized plotting we need to use the "matplotlib object model". In general that looks like
fig, ax = plt.subplots(1, 1, figsize=(10, 7))
ax.plot(x, y)
# Lets take a look at this in more detail
# subplots let you make multiple plots in one line, here we only make one so we ask for 1 row and 1 column (thats what the 1, 1 is for)
# the figure (which holds the axes) and the axes (which you plot to) are both returned from this
# finally we set the figure size to 10, 7. These size units are basically arbitray so find ones that work for you
fig, ax = plt.subplots(1, 1, figsize=(10, 7))
ax.plot(Vvega-Ivega, Vvega, '.')
ax.set_xlabel('F606W-F814W')
ax.set_ylabel('F606W')
Text(0, 0.5, 'F606W')
You can see this looks the same as before, except now it is larger. Let's now invert the yaxis. The font size on the x and y axis labels also look quite small, lets make those larger.
fig, ax = plt.subplots(1, 1, figsize=(10, 7))
ax.plot(Vvega-Ivega, Vvega, '.')
# There are many helpful functions within the axes to customize them
ax.axes.invert_yaxis()
ax.set_xlabel('F606W-F814W', fontsize=25)
ax.set_ylabel('F606W', fontsize=25)
Text(0, 0.5, 'F606W')
Now this is much more like a CMD as we would see it in astronomy. However, there are a lot of points here, so many in fact that we cant actually see a lot of the structure. Basically points in the center of the CMD are rendering on top of each other hiding small differences between their location. This is actually somewhat of a challenging problem to solve. Once way is to simply make the points smaller, but this does not always work. What we will do here is make a mathematical representation of the density of these points then plot that. This is called a density map.
To generate the density map we are going to use something called a "gaussian kernel density estimator". The details of how this works are not super important, if you want to read about it here is a good resource https://mathisonian.github.io/kde/. This is a good example how useful python is for glueing different pieces of code together. Note that because of how many points are plotted here this may take quite a while to run
color = Vvega-Ivega
# here we use guassian_kde to create a function we can evaluate at any point in our domain and return the estimate density at that point
# therefore kde_f is a function which takes a coordinate as argument (such as kde_f((0,0)) and returns a density estimate
xy = np.vstack([color, Vvega])
kde_f = gaussian_kde(xy)
# We could evaluate the density at every point we have data for
# However....Thats a lot of points and will take a long time to run
# A faster way is to build up a "mesh". This is a grid of points covering some
# parameter space. Here we will build a mesh covering the color and Vvega range from each of their min values to each of their max values.
# We do some numpy magic, dont worry about this for now, just know for future that stuff like this is sometimes needed
X, Y = np.mgrid[color.min():color.max():100j, Vvega.min():Vvega.max():100j]
positions = np.vstack([X.ravel(), Y.ravel()])
# We then evaluate the kde_f function and reshape it to be an "image" (2D data / matrix)
Z = np.reshape(kde_f(positions).T, X.shape)
# then we plot that image similar to how we did before. Note how for a 2D data set we use the function imshow
fig, ax = plt.subplots(1, 1, figsize=(10, 7))
ax.invert_yaxis()
# extent here tells matplotlib what numeric range the image data runs over
ax.imshow(np.rot90(Z), extent=[color.min(), color.max(), Vvega.min(), Vvega.max()], norm=LogNorm())
<matplotlib.image.AxesImage at 0x7fc5f016a9e8>
Note how you can see very similar structure to before but it is much more clear that there is a very narrow high density region right in the center (this is the principal sequence of NGC 2808) which rapidly falls off. Previously this density gradient was obscured by all the points we were plotting.
The final thing we are going to do in this notebook is to save these results to disk. Below is a bit of code that plots everything and saves everything
N.B. You may see some divide by zero errors here, thats okay ignore them for now.
fig, ax = plt.subplots(1, 1, figsize=(10, 7))
img = ax.imshow(np.rot90(Z/Z.max()), extent=[color.min(), color.max(), Vvega.min(), Vvega.max()], norm=LogNorm())
ax.set_xlabel("F606W-F814W", fontsize=25)
ax.set_ylabel("F606W", fontsize=25)
# bbox_inches=tight tells matplotlib to fit the file size to the figure size, without it you will end up with a lot of whitespace in your image surrounding your file
fig.savefig("NGC2808_principal_exampleCMD.pdf", bbox_inches="tight")
Please let me know if you have any questions. I know that was a lot to have in one file. Its mostly all here so you have a reference going forward, dont think that you have to be able to understand or reproduce all of this immediatly, pretty soon you'll be old hat at this but for now take it at whatever pace you are comfortable with!