Data Handling and Visualization Pt. 1

This notebook includes examples of how to:

- Open ASCII files manually
- Open ASCII files using the pandas module
- Plot the data which has been parsed
- Save the results of those plots

Now we want to be able to read in the contents of the file and do things with them. We can do this "manually" or we can use code that others have written. While I generally just use pandas (the code in question that others have written), there is a lot of value in being able to do this yourself. That's because files are often quite complicated and the prebuilt code simply does not know what to do with them.

Reading in Files

Manually

To read in a file we have to go through three steps:

1) open the file
2) copy the contents from the file into the program's memory
3) close the file

In its most basic form that might look like the following:
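A minimal sketch of those three steps is given here. The filename `example_catalog.txt` is a hypothetical placeholder, and the first few lines write a tiny stand-in file so that the sketch runs on its own; with a real data file you would skip that part.

```python
# Hypothetical filename (an assumption); point this at your own data file.
filename = "example_catalog.txt"

# Write a tiny stand-in file so this sketch runs on its own.
f = open(filename, "w")
f.write("id x y\n\n\n     1 1500.801  975.634\n")
f.close()

# 1) open the file for reading
f = open(filename, "r")
# 2) copy the contents of the file into the program's memory
file_contents = f.read()
# 3) close the file
f.close()
```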

There is actually a slightly better, easier way to do this:
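As a sketch (again with a hypothetical filename and a tiny stand-in file written first so the example is self-contained), reading a file with a "with" statement might look like this:

```python
# Hypothetical filename (an assumption); point this at your own data file.
filename = "example_catalog.txt"

# Write a tiny stand-in file so this sketch runs on its own.
with open(filename, "w") as f:
    f.write("id x y\n\n\n     1 1500.801  975.634\n")

# The file is opened on entering the block and closed on leaving it.
with open(filename, "r") as f:
    file_contents = f.read()

# No explicit f.close() is needed; the file is already closed here.
```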

Here the "with" block (called a context manager) took care of opening and closing the file for us. This is good because it can be hard to remember to close a file, so it's nice to have the program handle it for you. When you need to open a file, always try to use a context manager.

We now have the contents of the file read into memory (put another way, we have stored them in the variable called file_contents). We now need to "parse" them. In this case we need to take this raw array of characters and turn it into a structured table of numbers. When you are writing code to parse the contents of a file, the first step is always to open the file in a text editor and look at its structure. Then you look for patterns which you can exploit to pull out the data that you want. Below is a snippet from the file we are interested in parsing:

id x y Vvega err VIvega err Ivega err Vground Iground Nv  Ni wV wI xsig ysig othv othi qfitV qfitI RA Dec               


     1 1500.801  975.634   19.905   0.0029    0.634   0.0043   19.271   0.0032   20.077   19.230  1  1  1  1  0.014  0.010  0.000  0.000  0.059  0.024   138.0592168   -64.8911376
     2 1532.085  861.844   21.514   0.0061    0.917   0.0085   20.597   0.0059   21.783   20.563  1  1  1  1  0.022  0.001  0.121  0.009  0.067  0.152   138.0581958   -64.8927183
     3 1530.230  872.678   19.653   0.0026    0.600   0.0039   19.053   0.0029   19.811   19.011  1  1  1  1  0.016  0.009  0.002  0.002  0.054  0.027   138.0582561   -64.8925679
     4 1523.191  888.185   23.043   0.0125    1.028   0.0170   22.015   0.0115   23.347   21.984  1  1  1  1  0.056  0.040  0.000  0.000  0.116  0.075   138.0584862   -64.8923524
     5 1527.557  920.728   22.079   0.0080    0.853   0.0113   21.226   0.0080   22.328   21.189  1  1  1  1  0.005  0.001  0.000  0.000  0.072  0.048   138.0583423   -64.8919005
     6 1533.630  935.460   23.917   0.0189    1.475   0.0236   22.442   0.0141   24.339   22.430  1  1  1  1  0.032  0.021  0.357  0.168  0.074  0.194   138.0581433   -64.8916959
     7 1516.235  948.472   24.170   0.0215    0.935   0.0298   23.235   0.0207   24.445   23.201  1  1  1  1  0.043  0.006  0.023  0.033  0.181  0.218   138.0587125   -64.8915150
     8 1525.199  951.999   21.847   0.0072    0.780   0.0103   21.067   0.0074   22.071   21.029  1  1  1  1  0.010  0.001  0.006  0.006  0.067  0.058   138.0584188   -64.8914661
     9 1533.446  942.922   19.697   0.0026    0.619   0.0040   19.078   0.0029   19.864   19.036  1  1  1  1  0.006  0.008  0.000  0.000  0.051  0.032   138.0581490   -64.8915923
    10 1521.701  965.747   20.131   0.0032    0.620   0.0048   19.511   0.0036   20.298   19.469  1  1  1  1  0.011  0.007  0.000  0.000  0.055  0.035   138.0585330   -64.8912751
    11 1524.800  975.675   24.433   0.0245    1.345   0.0312   23.088   0.0194   24.824   23.070  1  1  1  1  0.041  0.054  0.174  0.101  0.199  0.238   138.0584313   -64.8911373
    12 1532.479  983.762   20.286   0.0035    0.649   0.0052   19.637   0.0038   20.464   19.596  1  1  1  1  0.008  0.013  0.002  0.003  0.049  0.038   138.0581797   -64.8910250
    13 1554.750  796.796   21.106   0.0051    0.674   0.0075   20.432   0.0055   21.292   20.391  1  1  1  1  0.022  0.045  0.000  0.000  0.193  0.160   138.0574554   -64.8936220
    14 1560.846  815.714   21.803   0.0070    0.826   0.0100   20.977   0.0071   22.044   20.939  1  1  1  1  0.011  0.010  0.000  0.000  0.059  0.053   138.0572552   -64.8933593
    15 1549.266  849.083   18.898   0.0018    0.691   0.0027   18.207   0.0020   19.091   18.167  1  1  1  1  0.004  0.012  0.000  0.000  0.051  0.034   138.0576335   -64.8928958
    16 1541.173  853.718   23.906   0.0188    1.344   0.0239   22.562   0.0148   24.296   22.543  1  1  1  1  0.016  0.048  0.351  0.179  0.214  0.148   138.0578985   -64.8928313
    17 1552.946  842.371   23.734   0.0173    1.345   0.0221   22.389   0.0137   24.125   22.371  1  1  1  1  0.036  0.002  0.640  0.381  0.125  0.110   138.0575132   -64.8929890
    18 1539.413  877.985   20.310   0.0035    0.646   0.0052   19.664   0.0039   20.486   19.623  1  1  1  1  0.008  0.008  0.016  0.027  0.059  0.032   138.0579555   -64.8924942
    19 1550.015  887.235   24.402   0.0239    1.393   0.0301   23.009   0.0183   24.804   22.993  1  1  1  1  0.025  0.021  0.004  0.003  0.145  0.117   138.0576080   -64.8923659
    20 1540.053  873.743   22.633   0.0103    0.957   0.0142   21.676   0.0098   22.916   21.642  1  1  1  1  0.011  0.005  0.840  0.573  0.106  0.068   138.0579347   -64.8925532
    21 1558.787  885.288   24.248   0.0221    1.269   0.0286   22.979   0.0181   24.620   22.957  1  1  1  1  0.033  0.045  0.004  0.006  0.177  0.107   138.0573210   -64.8923930
    22 1543.851  901.133   21.903   0.0073    0.784   0.0105   21.119   0.0076   22.128   21.081  1  1  1  1  0.013  0.007  0.000  0.000  0.047  0.035   138.0578096   -64.8921728
    23 1561.731  909.985   24.552   0.0257    1.391   0.0324   23.161   0.0197   24.954   23.145  1  1  1  1  0.008  0.035  0.177  0.045  0.204  0.124   138.0572241   -64.8920500
    24 1544.442  935.553   24.960   0.0314    1.664   0.0378   23.296   0.0210   25.422   23.294  1  1  1  1  0.008  0.043  0.006  0.006  0.211  0.228   138.0577894   -64.8916947
    25 1549.343  962.155   22.845   0.0114    1.026   0.0155   21.819   0.0105   23.149   21.788  1  1  1  1  0.013  0.015  0.000  0.000  0.078  0.059   138.0576284   -64.8913253
    26 1556.440  947.695   23.489   0.0154    1.057   0.0208   22.432   0.0140   23.801   22.402  1  1  1  1  0.013  0.030  0.000  0.000  0.135  0.077   138.0573964   -64.8915262
    27 1539.592  985.276   21.826   0.0071    0.843   0.0100   20.983   0.0071   22.071   20.946  1  1  1  1  0.004  0.007  0.041  0.040  0.061  0.052   138.0579470   -64.8910041
    28 1562.309  978.534   25.934   0.2530    1.354   0.3379   24.580   0.2240   26.326   24.562  2  2  1  1  0.001  0.116  0.387  0.294  0.488  0.393   138.0572035   -64.8910980
    29 1575.370  735.744   23.939   0.0191    1.212   0.0249   22.727   0.0160   24.295   22.702  1  1  1  1  0.023  0.007  0.017  0.013  0.118  0.077   138.0567819   -64.8944702
    30 1586.542  715.469   20.932   0.0047    0.678   0.0069   20.254   0.0051   21.120   20.214  1  1  1  1  0.005  0.021  0.000  0.000  0.059  0.038   138.0564167   -64.8947519
    31 1579.079  748.357   23.392   0.0147    1.197   0.0193   22.195   0.0125   23.744   22.170  1  1  1  1  0.013  0.014  0.013  0.009  0.064  0.085   138.0566601   -64.8942950
    32 1587.393  738.984   20.093   0.0032    0.627   0.0047   19.466   0.0035   20.262   19.424  1  1  1  1  0.003  0.016  0.000  0.000  0.059  0.041   138.0563883   -64.8944253
    33 1582.392  784.999   21.064   0.0050    0.785   0.0072   20.279   0.0051   21.291   20.241  1  1  1  1  0.006  0.008  0.001  0.003  0.054  0.037   138.0565509   -64.8937862
    34 1584.768  779.082   23.791   0.0178    1.193   0.0234   22.598   0.0151   24.142   22.573  1  1  1  1  0.021  0.050  0.201  0.123  0.117  0.082   138.0564731   -64.8938684
    35 1584.765  801.290   20.786   0.0044    0.658   0.0065   20.128   0.0048   20.967   20.088  1  1  1  1  0.008  0.011  0.000  0.000  0.054  0.036   138.0564726   -64.8935599
    36 1587.570  829.191   20.571   0.0040    0.675   0.0059   19.896   0.0043   20.759   19.856  1  1  1  1  0.012  0.001  0.000  0.000  0.060  0.036   138.0563803   -64.8931724
    37 1568.598  870.703   23.880   0.0185    1.216   0.0242   22.664   0.0155   24.238   22.640  1  1  1  1  0.034  0.002  0.040  0.039  0.153  0.080   138.0570002   -64.8925957
    38 1581.094  877.097   23.736   0.0173    1.135   0.0230   22.601   0.0151   24.071   22.573  1  1  1  1  0.007  0.020  0.000  0.000  0.137  0.074   138.0565912   -64.8925070
    39 1568.746  899.814   19.070   0.0020    0.663   0.0029   18.407   0.0022   19.253   18.366  1  1  1  1  0.013  0.008  0.000  0.000  0.061  0.035   138.0569946   -64.8921914
    40 1578.019  906.725   23.875   0.0185    1.310   0.0237   22.565   0.0148   24.257   22.545  1  1  1  1  0.035  0.048  0.076  0.070  0.202  0.238   138.0566910   -64.8920955
    41 1585.284  892.127   21.174   0.0052    1.575   0.0064   19.599   0.0037   21.618   19.592  1  1  1  1  0.014  0.011  0.001  0.001  0.056  0.030   138.0564537   -64.8922983
    42 1586.016  903.027   26.981   0.0958    2.180   0.1054   24.801   0.0441   27.533   24.832  1  1  1  1  0.019  0.046  9.900  0.189  0.692  0.474   138.0564293   -64.8921469
    43 1580.735  896.629   24.186   0.0214    1.258   0.0277   22.928   0.0176   24.554   22.906  1  1  1  1  0.050  0.069  0.209  0.316  0.106  0.154   138.0566026   -64.8922357
    44 1568.822  920.474   22.433   0.0094    0.987   0.0129   21.446   0.0088   22.725   21.414  1  1  1  1  0.009  0.014  0.000  0.000  0.066  0.082   138.0569918   -64.8919044
    45 1583.164  929.258   25.278   0.3970    1.472   0.3979   23.806   0.0270   25.699   23.793  2  1  1  1  0.048  0.049  0.019  0.013  0.306  0.141   138.0565222   -64.8917826
    46 1574.176  936.517   26.688   0.6730    1.588   0.6749   25.100   0.0510   27.135   25.094  2  2  1  1  0.064  0.110  4.400  2.114  0.880  0.494   138.0568160   -64.8916817
    47 1563.259  957.949   26.519   0.0725    1.926   0.0827   24.593   0.0398   27.030   24.607  1  1  1  1  0.030  0.057  0.331  0.137  0.551  0.320   138.0571729   -64.8913839

We can see that this file has the following format:

1) A header line with column names separated (more formally, delimited) by spaces
2) 2 blank lines
3) Data lines delimited by spaces and indented by a tab when compared to the header

This is all the information we need to write a simple parser. Recall that the variable file_contents stores the contents of the file.

Let's start by extracting the header
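One way this could be done is sketched here. To keep the example self-contained, `file_contents` is set to a cut-down stand-in (the first five columns and first two rows of the snippet above); in the notebook it would already hold the full file.

```python
# Cut-down stand-in for the real file contents (an assumption for this sketch).
file_contents = (
    "id x y Vvega err\n"
    "\n"
    "\n"
    "     1 1500.801  975.634   19.905   0.0029\n"
    "     2 1532.085  861.844   21.514   0.0061\n"
)

# Break the raw text into a list of lines.
lines = file_contents.split("\n")

# The header is the first line; split() with no arguments splits on any
# run of whitespace and ignores leading and trailing whitespace.
header = lines[0].split()
```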

Now we have a list of the column names. Next we will extract the data
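A sketch of the data extraction, using the same cut-down stand-in for `file_contents` (an assumption so that the block runs on its own), might look like this:

```python
# Cut-down stand-in for the real file contents (an assumption for this sketch).
file_contents = (
    "id x y Vvega err\n"
    "\n"
    "\n"
    "     1 1500.801  975.634   19.905   0.0029\n"
    "     2 1532.085  861.844   21.514   0.0061\n"
)

lines = file_contents.split("\n")

data = []
# Skip the header line and the 2 blank lines that follow it.
for line in lines[3:]:
    # Skip any empty lines (for example the one after the final newline).
    if line.strip() == "":
        continue
    # Split on whitespace and convert every entry to a number.
    data.append([float(value) for value in line.split()])
```

Each entry of `data` is one row of the file, stored as a list of floats.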

Now we have a numeric representation of the data from the file which we opened. The final thing we want to be able to do is to connect these with the header names from before. There is a data structure in Python called a dictionary (more generally this is a hash map) which allows you to index data with strings. We will use the column name as the index and the column as the data to index.
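A minimal sketch of building such a dictionary follows. The `header` and `data` values here are small stand-ins taken from the snippet above, and the name `table` is a hypothetical choice for this example.

```python
# Stand-ins for the parsed header and data (assumptions for this sketch).
header = ["id", "x", "y", "Vvega", "err"]
data = [
    [1.0, 1500.801, 975.634, 19.905, 0.0029],
    [2.0, 1532.085, 861.844, 21.514, 0.0061],
]

# data is stored row by row; zip(*data) transposes it into columns.
columns = list(zip(*data))

# Pair each column name with its column of values.
table = {name: list(column) for name, column in zip(header, columns)}

print(table["x"])
```

One thing to watch for: dictionary keys must be unique, and the full header above contains "err" three times. With this approach the later "err" columns would overwrite the earlier ones, so you may want to rename them (for example by appending the name of the preceding column) before building the dictionary.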

We have now fully parsed this file. You can look at parts of it using the header names. For example, if we wanted to plot (spoilers for later) the x and y positions of the first 100 of these targets, we could do the following:
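As a sketch, assuming the parsed columns live in a dictionary called `table` (a hypothetical name; here it is filled with the first five rows of the snippet so the block runs on its own):

```python
import matplotlib.pyplot as plt

# Stand-in for the parsed file (an assumption for this sketch).
table = {
    "x": [1500.801, 1532.085, 1530.230, 1523.191, 1527.557],
    "y": [975.634, 861.844, 872.678, 888.185, 920.728],
}

# Slicing with [:100] keeps at most the first 100 entries.
x_first = table["x"][:100]
y_first = table["y"][:100]

# "o" draws a marker at each point without connecting lines.
plt.plot(x_first, y_first, "o")
plt.xlabel("x")
plt.ylabel("y")
```

In a notebook the figure is displayed when the cell finishes; in a plain script you would add `plt.show()` at the end.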

Now that's all well and good, and once you get used to it, it doesn't take too long to write; however, we don't normally have a file which is so specialised that we need to write our own parser. Normally you can use a pre-canned one. The Python package pandas has the ability to parse many kinds of files built right in.

With Pandas
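A sketch of the pandas approach follows. It assumes `pandas.read_csv` with a whitespace separator is a suitable reader for this layout, and it reads from an in-memory stand-in (via `io.StringIO`) so the block is self-contained; with a real file you would pass the path instead.

```python
import io

import pandas as pd

# Cut-down stand-in for the real file (an assumption for this sketch).
file_contents = (
    "id x y Vvega err\n"
    "\n"
    "\n"
    "     1 1500.801  975.634   19.905   0.0029\n"
    "     2 1532.085  861.844   21.514   0.0061\n"
)

# sep=r"\s+" treats any run of whitespace as the delimiter.
# Blank lines are skipped by default, and the first line becomes the header.
df = pd.read_csv(io.StringIO(file_contents), sep=r"\s+")

# Columns can then be accessed by name, much like the dictionary above.
x = df["x"]
```

By default pandas renames repeated column names rather than overwriting them, so the repeated "err" columns in the full header should come out as "err", "err.1", and "err.2".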

Congratulations! You have now read in and parsed your first file. This is a surprisingly large part of astronomy. Files can get pretty complicated, so having this basic understanding is really helpful!

Plotting

Now that we have the data loaded into some usable data structure we can think about visualizing it.

Python has many visualization libraries; however, by far the most commonly used is matplotlib. Matplotlib lets you create a wide variety of graphs in both 2D and 3D.

The first thing that we are going to look at is the color-magnitude diagram. In astronomy, when we say color we mean the difference in brightness between two "filters", or two very specific wavelengths of light. So we might imagine some telescope with a filter called B and a filter called G. In this example think of B as the blue filter, only letting through blue light and blocking everything else, and G as the green filter, only letting through green light and blocking everything else. If we take 2 pictures of a star, one with the B filter in place and one with the G filter in place, we could report the star's Bmag and Gmag: its brightness (or magnitude) in blue light and its magnitude in green light (magnitude has a slightly more subtle definition, but brightness is okay to 1st order). We could then say that its B-G color is the difference between the brightness in the blue filter and the green filter. It turns out a lot of interesting structure exists in graphs of magnitude vs color; these are called color-magnitude diagrams (CMDs).

The Hubble Space Telescope (HST) has a variety of filters, with names like F606W, F814W, and others in that kind of mold. Don't worry about what the names mean exactly right now (if you are interested: in general they give the center wavelength of the filter and then denote how much space around that center wavelength they cover).

We are going to look at the F606W F814W CMD. In the dataset which we have parsed these have been given the column names Vvega and Ivega.
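A sketch of such a plot is given here. It assumes the color (Vvega minus Ivega) goes on the x-axis and the Vvega magnitude on the y-axis, which is a common CMD layout but may differ from the original cell. The values are the first six rows of the snippet above, so the block runs on its own; with the full catalog you would use the complete columns.

```python
import matplotlib.pyplot as plt

# First six rows of the Vvega and Ivega columns from the snippet above.
Vvega = [19.905, 21.514, 19.653, 23.043, 22.079, 23.917]
Ivega = [19.271, 20.597, 19.053, 22.015, 21.226, 22.442]

# The color is the difference between the two magnitudes.
color = [v - i for v, i in zip(Vvega, Ivega)]

plt.plot(color, Vvega, "o")
plt.xlabel("F606W - F814W")
plt.ylabel("F606W")
```

Note that the computed color for the first row (0.634) matches the VIvega column in the file, which suggests that column already stores this difference.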

So we can already see some structure here; however, there are a few things we can do to make this both more readable and more in line with normal CMDs (how astronomers tend to plot them for some historical reasons)

First of all when we plot CMDs we invert the y-axis (bigger numbers on the bottom). This is because the larger the magnitude the fainter the star. So by flipping the y-axis we keep the brightest objects on top (lowest magnitudes). In order to flip the axis we are going to dive a little more deeply into matplotlib.

Matplotlib operates on a few basic structures: the canvas, the figure, and the axes. In general you don't need to worry about the canvas; that's basically handled by the backend. The figure only takes a little configuration (this is where you will set the overall size of the figure, for example). The axes, which hold everything you actually plot (the data, the axis labels, the tick marks, everything you see), are where most of the customization work goes.

Up until now you have seen graphs made using something like the following:

plt.plot(x, y)

This is great for quick and dirty graphs; however, it limits what you can do later on. For more powerful and customized plotting we need to use the "matplotlib object model". In general that looks like

fig, ax = plt.subplots(1, 1, figsize=(10, 7))
ax.plot(x, y)

You can see this looks the same as before, except now it is larger. Let's now invert the y-axis. The font size on the x and y axis labels also looks quite small, so let's make those larger.
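A sketch of both changes using the object model follows. The data are again the first six rows of the snippet, and the font size of 20 is an arbitrary choice for illustration.

```python
import matplotlib.pyplot as plt

# First six rows of the Vvega and Ivega columns from the snippet above.
Vvega = [19.905, 21.514, 19.653, 23.043, 22.079, 23.917]
Ivega = [19.271, 20.597, 19.053, 22.015, 21.226, 22.442]
color = [v - i for v, i in zip(Vvega, Ivega)]

fig, ax = plt.subplots(1, 1, figsize=(10, 7))
ax.plot(color, Vvega, "o")

# Flip the y-axis so the brightest stars (smallest magnitudes) are on top.
ax.invert_yaxis()

# Axis labels are set on the axes object; fontsize controls their size.
ax.set_xlabel("F606W - F814W", fontsize=20)
ax.set_ylabel("F606W", fontsize=20)
```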

Now this is much more like a CMD as we would see it in astronomy. However, there are a lot of points here, so many in fact that we can't actually see a lot of the structure. Basically, points in the center of the CMD are rendering on top of each other, hiding small differences between their locations. This is actually a somewhat challenging problem to solve. One way is to simply make the points smaller, but this does not always work. What we will do here is make a mathematical representation of the density of these points and then plot that. This is called a density map.

To generate the density map we are going to use something called a "Gaussian kernel density estimator". The details of how this works are not super important; if you want to read about it, here is a good resource: https://mathisonian.github.io/kde/. This is a good example of how useful Python is for gluing different pieces of code together. Note that because of how many points are plotted here, this may take quite a while to run.
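One way to do this is sketched here, assuming SciPy's `scipy.stats.gaussian_kde` as the estimator and coloring each point by the estimated density at its location. This may not match the original cell exactly. The data are the first eight rows of the snippet so the block runs quickly; with the full catalog the evaluation step is the slow part.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde

# First eight rows of the Vvega and Ivega columns from the snippet above.
Vvega = np.array([19.905, 21.514, 19.653, 23.043, 22.079, 23.917, 24.170, 21.847])
Ivega = np.array([19.271, 20.597, 19.053, 22.015, 21.226, 22.442, 23.235, 21.067])
color = Vvega - Ivega

# gaussian_kde expects an array of shape (number of dimensions, number of points).
points = np.vstack([color, Vvega])

# Build the estimator from the points, then evaluate it at those same points.
density = gaussian_kde(points)(points)

fig, ax = plt.subplots(1, 1, figsize=(10, 7))
# Color each point by the estimated density at its position.
sc = ax.scatter(color, Vvega, c=density, s=20)
ax.invert_yaxis()
ax.set_xlabel("F606W - F814W", fontsize=20)
ax.set_ylabel("F606W", fontsize=20)
fig.colorbar(sc, ax=ax, label="estimated density")
```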

Note how you can see very similar structure to before, but it is much clearer that there is a very narrow high-density region right in the center (this is the main sequence of NGC 2808) which rapidly falls off. Previously this density gradient was obscured by all the points we were plotting.

The final thing we are going to do in this notebook is to save these results to disk. Below is a bit of code that plots everything and saves everything.
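A minimal sketch of saving a figure with `fig.savefig` follows. The output filename `cmd_example.png` and the `dpi` value are hypothetical choices, and the data are a small stand-in from the snippet above rather than the full catalog.

```python
import os

import matplotlib.pyplot as plt

# First six rows of the Vvega and Ivega columns from the snippet above.
Vvega = [19.905, 21.514, 19.653, 23.043, 22.079, 23.917]
Ivega = [19.271, 20.597, 19.053, 22.015, 21.226, 22.442]
color = [v - i for v, i in zip(Vvega, Ivega)]

fig, ax = plt.subplots(1, 1, figsize=(10, 7))
ax.plot(color, Vvega, "o")
ax.invert_yaxis()
ax.set_xlabel("F606W - F814W", fontsize=20)
ax.set_ylabel("F606W", fontsize=20)

# Hypothetical output name; the extension selects the file format.
output_name = "cmd_example.png"
# bbox_inches="tight" trims extra whitespace around the figure.
fig.savefig(output_name, dpi=150, bbox_inches="tight")

print("saved to", os.path.abspath(output_name))
```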

N.B. You may see some divide-by-zero warnings here; that's okay, ignore them for now.

Please let me know if you have any questions. I know that was a lot to have in one file. It's mostly all here so you have a reference going forward. Don't think that you have to be able to understand or reproduce all of this immediately. Pretty soon you'll be an old hand at this, but for now take it at whatever pace you are comfortable with!