A command line large dataset query tool for Nasa's/JPL's NEO dataset

TJPain/command-line-large-dataset-query-tool

Explore Close Approaches of Near-Earth Objects

In this project, you'll use Python - and the skills we've developed throughout this course - to search for and explore close approaches of near-Earth objects (NEOs), using data from NASA/JPL's Center for Near Earth Object Studies.

Overview

At a high level, you'll create Python code that implements a command-line tool to inspect and query a dataset of NEOs and their close approaches to Earth.

Concretely, you'll have to read data from both a CSV file and a JSON file, convert that data into structured Python objects, perform filtering operations on the data, limit the size of the result set, and write the results to a file in a structured format, such as CSV or JSON.

When complete, you'll be able to inspect the properties of the near-Earth objects in the data set and query the data set of close approaches to Earth using any combination of the following filters:

  • Occurs on a given date.
  • Occurs on or after a given start date.
  • Occurs on or before a given end date.
  • Approaches Earth at a distance of at least (or at most) X astronomical units.
  • Approaches Earth at a relative velocity of at least (or at most) Y kilometers per second.
  • Has a diameter that is at least as large as (or at least as small as) Z kilometers.
  • Is marked by NASA as potentially hazardous (or not).

Learning Objectives

By completing this project, you'll have demonstrated an ability to:

  • Represent structured data in Python.
  • Extract data from structured files into Python.
  • Transform the data within Python according to some desired behavior.
  • Save the results in a structured way to a file.

Along the way, you'll have to be able to:

  • Write Python functions to transform data and perform algorithms.
  • Design Python classes to encapsulate useful data types.
  • Provide interface abstractions for complex implementations.

It's normal to encounter bugs along the way, so in all likelihood, you'll also gain practice with valuable debugging skills, whether interpreting stack traces, chasing down system errors, handling and raising appropriate errors, walking through code with pdb, checking preconditions with assert, or simply displaying internal state with print.

Understanding the Near-Earth Object Close Approach Datasets

This project contains two important data sets, and our first step will be to explore and understand the data contained within these structured files.

One dataset (neos.csv) contains information about semantic, physical, orbital, and model parameters for certain small bodies (asteroids and comets, mostly) in our solar system. The other dataset (cad.json) contains information about NEO close approaches - moments in time when the orbit of an astronomical body brings it close to Earth. NASA helpfully provides a glossary to define any unfamiliar terms you might encounter.

Importantly, these datasets come directly from NASA - we haven't dressed them up for you at all.

Small-Bodies Dataset

NASA's Jet Propulsion Laboratory (JPL) provides a web interface to their database of "small bodies" - mostly asteroids and comets - in the solar system. A subset of these small bodies are near-Earth objects (NEOs): "comets and asteroids that have been nudged by the gravitational attraction of nearby planets into orbits that allow them to enter the Earth's neighborhood." 1

From this dataset, you can answer questions such as "what is the diameter of Halley's Comet?" or "is the near-Earth object named 'Eros' potentially hazardous?".

NASA's web service lets you download their data on near-Earth objects in a CSV format. For this project, the data set we've provided (neos.csv) comes directly from a query in which we limited the "Object Group" to NEOs and in which we selected every output field. That's a lot of columns (75, to be exact)!

Let's take an initial look at the first three rows of neos.csv:

id,spkid,full_name,pdes,name,prefix,neo,pha,H,G,M1,M2,K1,K2,PC,diameter,extent,albedo,rot_per,GM,BV,UB,IR,spec_B,spec_T,H_sigma,diameter_sigma,orbit_id,epoch,epoch_mjd,epoch_cal,equinox,e,a,q,i,om,w,ma,ad,n,tp,tp_cal,per,per_y,moid,moid_ld,moid_jup,t_jup,sigma_e,sigma_a,sigma_q,sigma_i,sigma_om,sigma_w,sigma_ma,sigma_ad,sigma_n,sigma_tp,sigma_per,class,producer,data_arc,first_obs,last_obs,n_obs_used,n_del_obs_used,n_dop_obs_used,condition_code,rms,two_body,A1,A2,A3,DT
a0000433,2000433,"   433 Eros (A898 PA)",433,Eros,,Y,N,10.4,0.46,,,,,,16.84,34.4x11.2x11.2,0.25,5.270,4.463e-04,0.921,0.531,,S,S,,0.06,"JPL 658",2459000.5,59000,20200531.0000000,J2000,.2229512647434284,1.458045729081037,1.132972589728666,10.83054121829922,304.2993259000444,178.8822959227224,271.0717325705167,1.783118868433408,.5598186418120109,2459159.351922368362,20201105.8519224,643.0654021001488,1.76061711731731,.148623,57.83961291,3.2865,4.582,9.6497E-9,2.1374E-10,1.4063E-8,1.1645E-6,3.8525E-6,4.088E-6,1.4389E-6,2.6139E-10,1.231E-10,2.5792E-6,1.414E-7,AMO,Giorgini,46330,1893-10-29,2020-09-03,8767,4,2,0,.28397,,,,,
a0000719,2000719,"   719 Albert (A911 TB)",719,Albert,,Y,N,15.5,,,,,,,,,,5.801,,,,,S,,,,"JPL 214",2459000.5,59000,20200531.0000000,J2000,.5465584653041263,2.63860206439375,1.196451769530403,11.56748478123323,183.8669499802364,156.17633771,140.2734217745985,4.080752359257098,.2299551959241748,2458390.496728663387,20180928.9967287,1565.522355575327,4.28616661348481,.203482,79.18908994,1.41794,3.140,2.1784E-8,2.5313E-9,5.8116E-8,2.9108E-6,1.6575E-5,1.6827E-5,2.5213E-6,3.9148E-9,3.309E-10,1.0306E-5,2.2528E-6,AMO,"Otto Matic",39593,1911-10-04,2020-02-27,1874,,,0,.39148,,,,,

Before we're able to write Python code to process this data, we'll need to understand what this data represents.

In this CSV file, the first row is a header, containing names for each of the columns. Each subsequent row represents a single NEO. There are too many columns to understand fully (although we encourage you to learn more by searching NASA's website!), so we'll focus on just a few of them:

  • pdes - the primary designation of the NEO. This is a unique identifier in the database, and its "name" to computer systems.
  • name - the International Astronomical Union (IAU) name of the NEO. This is its "name" to humans.
  • pha - whether NASA has marked the NEO as a "Potentially Hazardous Asteroid," roughly meaning that it's large and can come quite close to Earth.
  • diameter - the NEO's diameter (from an equivalent sphere) in kilometers.

So, the first NEO described in the CSV file has a primary designation of 433 and an IAU name "Eros". It is ('Y') an NEO, but it is not ('N') potentially hazardous. It has a diameter of 16.84 km.

Every NEO has a primary designation, but there exist NEOs without names (in fact, having an IAU name is relatively rare!). Some IAU names are reused for several NEOs. For some NEOs, the data doesn't include information about a diameter, because NASA does not have enough observations to make a reasonably-accurate estimate.
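To make those edge cases concrete, here is a minimal sketch of turning CSV rows into typed Python values. It uses only the four columns discussed above. The function name `parse_neo_row`, the output keys, and the third sample row (designation "9999 ZZ") are invented for illustration. Representing a missing name as None and a missing diameter as NaN is one reasonable choice, not something the data set prescribes.

```python
import csv
import io
import math

# Trimmed sample: the first two rows mirror Eros and Albert from neos.csv.
# The third row is hypothetical and only shows an NEO without an IAU name.
SAMPLE = """pdes,name,pha,diameter
433,Eros,N,16.84
719,Albert,N,
9999 ZZ,,Y,
"""


def parse_neo_row(row):
    """Convert one CSV row (a dict of strings) into typed Python values."""
    return {
        "designation": row["pdes"],
        # An empty string means the NEO has no IAU name.
        "name": row["name"] or None,
        # 'Y' marks a potentially hazardous NEO; anything else is treated as False.
        "hazardous": row["pha"] == "Y",
        # An empty diameter becomes NaN so it can still be stored as a float.
        "diameter": float(row["diameter"]) if row["diameter"] else math.nan,
    }


neos = [parse_neo_row(row) for row in csv.DictReader(io.StringIO(SAMPLE))]
```

With the real file, the same function could be applied to `csv.DictReader(open("data/neos.csv", newline=""))`, since DictReader uses the header row for the keys.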

If you'd like to explore individual NEOs in more detail (and perhaps interpret a few of the rest of the columns), NASA also provides a web interface to search for a single small body as well as an API.

Close Approach Dataset

NASA's Center for Near-Earth Object Studies (CNEOS) also provides data about close approaches of NEOs to Earth. A close approach occurs when an NEO's orbit path brings it near Earth - although, "near" in astronomical terms can be quite far in human-scale units, such as kilometers. Instead of kilometers, astronomical distances within the solar system are often measured with the astronomical unit (au) - the mean distance between the Earth and the sun - although sometimes you'll see distances measured with the lunar distance (ld) - the mean distance between the Earth and the moon - or even plain old kilometers.
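As a rough feel for these units, the sketch below converts astronomical units to kilometers and lunar distances. The au value is the exact IAU definition. The lunar distance constant is an approximate mean value assumed here for illustration, and the function names are hypothetical.

```python
AU_IN_KM = 149_597_870.7   # exact, by IAU definition
LD_IN_KM = 384_400.0       # approximate mean Earth-moon distance (assumed value)


def au_to_km(au):
    """Convert a distance in astronomical units to kilometers."""
    return au * AU_IN_KM


def au_to_ld(au):
    """Convert a distance in astronomical units to lunar distances."""
    return au_to_km(au) / LD_IN_KM


# A "close" approach at 0.09 au is still well over ten million kilometers away.
example_km = au_to_km(0.09)
example_ld = au_to_ld(0.09)
```

Under these constants, 0.09 au works out to roughly 13.5 million km, or about 35 lunar distances.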

From this dataset, you can answer questions such as "On which date(s) does Halley's Comet pass near to Earth?" or "How fast does Eros pass by Earth, on average?"

The data is JSON-formatted, and we've downloaded it from NASA's public API. A description of the API, as well as details about the query parameters and the schema of the returned data, can be found here. Concretely, we asked NASA for this data by querying the API at https://ssd-api.jpl.nasa.gov/cad.api?date-min=1900-01-01&date-max=2100-01-01&dist-max=1. In other words, our data set contains all currently known close approaches that have happened or will happen in the 20th and 21st centuries! Additionally, NASA provides the data in chronological order.

Let's take an initial look at the data in cad.json.

{
  "signature":{
    "source":"NASA/JPL SBDB Close Approach Data API",
    "version":"1.1"
  },
  "count":"406785",
  "fields":["des", "orbit_id", "jd", "cd", "dist", "dist_min", "dist_max", "v_rel", "v_inf", "t_sigma_f", "h"],
  "data":[
    [
       "170903",
       "105",
       "2415020.507669610",
       "1900-Jan-01 00:11",
       "0.0921795123769547",
       "0.0912006569517418",
       "0.0931589328621254",
       "16.7523040362574",
       "16.7505784933163",
       "01:00",
       "18.1"
    ],
    [
       "2005 OE3",
       "52",
       "2415020.606013490",
       "1900-Jan-01 02:33",
       "0.414975519685102",
       "0.414968315685577",
       "0.414982724454678",
       "17.918395877175",
       "17.9180375373357",
       "< 00:01",
       "20.3"
    ],
    ...
  ]
}

It certainly looks different from the CSV data!

The top-level JSON payload is a dictionary with keys "signature", "count", "fields", and "data". The "signature" field shows where this data came from - in this case, from the API provided by NASA/JPL. The "count" field tells us how many entries to expect in the "data" section. The "fields" key maps to a list of strings describing how we should interpret the entries in the "data" section. Lastly, the "data" section itself maps to a list of lists - each element is a list of data for a single close approach, corresponding (by order) with the "fields" key.
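One way to line up "fields" with each entry of "data" is to zip them together, as in the sketch below. The payload is a trimmed copy of the sample above (only "fields" and the two shown rows), and the helper name `rows_as_dicts` is hypothetical.

```python
import json

# Trimmed copy of the cad.json sample: just "fields" and the first two rows.
RAW = """
{
  "fields": ["des", "orbit_id", "jd", "cd", "dist", "dist_min", "dist_max", "v_rel", "v_inf", "t_sigma_f", "h"],
  "data": [
    ["170903", "105", "2415020.507669610", "1900-Jan-01 00:11", "0.0921795123769547", "0.0912006569517418", "0.0931589328621254", "16.7523040362574", "16.7505784933163", "01:00", "18.1"],
    ["2005 OE3", "52", "2415020.606013490", "1900-Jan-01 02:33", "0.414975519685102", "0.414968315685577", "0.414982724454678", "17.918395877175", "17.9180375373357", "< 00:01", "20.3"]
  ]
}
"""


def rows_as_dicts(payload):
    """Yield one dict per close approach, keyed by the names in "fields"."""
    fields = payload["fields"]
    for row in payload["data"]:
        # zip pairs each field name with the value at the same position.
        yield dict(zip(fields, row))


payload = json.loads(RAW)
approaches = list(rows_as_dicts(payload))
```

Note that every value is still a string at this point; converting "dist" or "v_rel" to numbers is a separate step.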

What does each of the fields mean? NASA's API documentation provides the answer:

  • des - primary designation of the asteroid or comet (e.g., 443, 2000 SG344)
  • orbit_id - orbit ID
  • jd - time of close-approach (JD Ephemeris Time)
  • cd - time of close-approach (formatted calendar date/time, in UTC)
  • dist - nominal approach distance (au)
  • dist_min - minimum (3-sigma) approach distance (au)
  • dist_max - maximum (3-sigma) approach distance (au)
  • v_rel - velocity relative to the approach body at close approach (km/s)
  • v_inf - velocity relative to a massless body (km/s)
  • t_sigma_f - 3-sigma uncertainty in the time of close-approach (formatted in days, hours, and minutes; days are not included if zero; example "13:02" is 13 hours 2 minutes; example "2_09:08" is 2 days 9 hours 8 minutes)
  • h - absolute magnitude H (mag)
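The t_sigma_f format is the least obvious of these fields, so here is a hedged sketch of a parser for it. The project itself doesn't use this field; the function name `parse_t_sigma_f` is hypothetical, and treating a leading "<" as "upper bound" is an assumption based on the sample value "< 00:01" in the data.

```python
import re
from datetime import timedelta

# Optional "< " prefix, optional "<days>_" part, then hours and minutes.
_T_SIGMA = re.compile(
    r"^(?P<lt><\s*)?(?:(?P<days>\d+)_)?(?P<hours>\d+):(?P<minutes>\d+)$"
)


def parse_t_sigma_f(text):
    """Return (uncertainty, is_upper_bound) for a t_sigma_f string."""
    match = _T_SIGMA.match(text.strip())
    if match is None:
        raise ValueError(f"Unrecognized t_sigma_f value: {text!r}")
    uncertainty = timedelta(
        days=int(match["days"] or 0),   # days are omitted when zero
        hours=int(match["hours"]),
        minutes=int(match["minutes"]),
    )
    return uncertainty, match["lt"] is not None
```

For example, `parse_t_sigma_f("2_09:08")` yields a timedelta of 2 days, 9 hours, and 8 minutes, with the upper-bound flag set to False.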

With this in mind, we can interpret that the first close approach contained in the dataset is:

  • an asteroid or comet with primary designation "170903"
  • an orbit ID of 105
  • a close approach time of 2415020.507669610 (in JD Ephemeris time) or 1900-Jan-01 00:11 (in a normal format)
  • an approach distance of 0.0921795123769547 astronomical units (with 3-sigma bounds of (0.0912006569517418au, 0.0931589328621254au))
  • an approach velocity of 16.7523040362574 km/s (relative to Earth) or 16.7505784933163 km/s (relative to a massless body)
  • 3-sigma uncertainty in the time of close approach of 1 hour
  • an absolute magnitude of 18.1

The second close approach contained in the dataset is:

  • an asteroid or comet with primary designation "2005 OE3"
  • an orbit ID of 52
  • a close approach time of 2415020.606013490 (in JD Ephemeris time) or 1900-Jan-01 02:33 (in a normal format)
  • an approach distance of 0.414975519685102 astronomical units (with 3-sigma bounds of (0.414968315685577au, 0.414982724454678au))
  • an approach velocity of 17.918395877175 km/s (relative to Earth) or 17.9180375373357 km/s (relative to a massless body)
  • 3-sigma uncertainty in the time of close approach of less than 1 minute.
  • an absolute magnitude of 20.3

As before, this data set contains more information than we need. For this project, we'll make use of the des, cd, dist, and v_rel measurements - although the other attributes can be useful if you wish to extend the project! Fortunately, each entry has well-formatted data for each of these attributes.
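Since cd, dist, and v_rel all arrive as strings, they need converting before they can be compared. The sketch below shows one way to do that with the standard library. The function names are hypothetical stand-ins; the project's helpers.py provides its own datetime utilities, which may differ. The `%b` directive matches abbreviated month names in the current locale, so this assumes an English (or default "C") locale.

```python
from datetime import datetime


def parse_cd(text):
    """Parse NASA's calendar date format, e.g. '1900-Jan-01 00:11'."""
    return datetime.strptime(text, "%Y-%b-%d %H:%M")


def format_time(moment):
    """Format a datetime the way the example output does (no seconds)."""
    return moment.strftime("%Y-%m-%d %H:%M")


# Values taken from the first close approach in the sample above.
time = parse_cd("1900-Jan-01 00:11")
distance = float("0.0921795123769547")   # au
velocity = float("16.7523040362574")     # km/s
```

Parsing into a real datetime, rather than keeping the string, is what makes date comparisons such as "on or after a given start date" straightforward later on.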

Visual Exploration

If you're someone who prefers to explore data sets by poking around a web site, NASA has a tutorial video on how to effectively navigate the CNEOS website, and an interactive close approach data table that you can investigate.

Also, it's important to realize that NASA is discovering new NEOs, and potentially forecasting new close approaches, every week, so their web-based UI might contain updated information that isn't represented in the data files included with this project.

Project Interface

Now that we understand the data with which we'll be working, let's dive into what our program will actually do.

This project is driven by the main.py script. That means that you'll run python3 main.py ... ... ... at the command line to invoke the program that will call your code.

At a command line, you can run python3 main.py --help for an explanation of how to invoke the script.

usage: main.py [-h] [--neofile NEOFILE] [--cadfile CADFILE] {inspect,query,interactive} ...

Explore past and future close approaches of near-Earth objects.

positional arguments:
  {inspect,query,interactive}

optional arguments:
  -h, --help            show this help message and exit
  --neofile NEOFILE     Path to CSV file of near-Earth objects.
  --cadfile CADFILE     Path to JSON file of close approach data.

There are three subcommands: inspect, query, and interactive. Let's take a look at the interfaces of each of these subcommands.

inspect

The inspect subcommand inspects a single NEO, printing its details in a human-readable format. The NEO is specified with exactly one of the --pdes option (the primary designation) and the --name option (the IAU name). The --verbose flag additionally prints out, in a human-readable form, all known close approaches to Earth made by this NEO. Each of these options has an abbreviated version. To remind yourself of the full interface, you can run python3 main.py inspect --help:

$ python3 main.py inspect --help
usage: main.py inspect [-h] [-v] (-p PDES | -n NAME)

Inspect an NEO by primary designation or by name.

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Additionally, print all known close approaches of this NEO.
  -p PDES, --pdes PDES  The primary designation of the NEO to inspect (e.g. '433').
  -n NAME, --name NAME  The IAU name of the NEO to inspect (e.g. 'Halley').
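The "(-p PDES | -n NAME)" part of the usage line is how Python's argparse renders a required, mutually exclusive group. The sketch below reproduces that interface in isolation. It is an illustration under that assumption, not a copy of main.py, and the function name `build_inspect_parser` is hypothetical.

```python
import argparse


def build_inspect_parser():
    """Build a stand-alone parser resembling the `inspect` interface."""
    parser = argparse.ArgumentParser(
        prog="main.py inspect",
        description="Inspect an NEO by primary designation or by name.",
    )
    parser.add_argument(
        "-v", "--verbose", action="store_true",
        help="Additionally, print all known close approaches of this NEO.",
    )
    # Exactly one of --pdes / --name must be given.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-p", "--pdes", help="The primary designation of the NEO.")
    group.add_argument("-n", "--name", help="The IAU name of the NEO.")
    return parser


args = build_inspect_parser().parse_args(["--verbose", "--name", "Ganymed"])
```

Passing both options, or neither, makes argparse print a usage error and exit, which is the "exactly one of" behavior described above.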

Here are a few examples of the inspect subcommand in action:

# Inspect the NEO with a primary designation of 433 (that's Eros!)
$ python3 main.py inspect --pdes 433
NEO 433 (Eros) has a diameter of 16.840 km and is not potentially hazardous.

# Inspect the NEO with an IAU name of "Halley" (that's Halley's Comet!)
$ python3 main.py inspect --name Halley
NEO 1P (Halley) has a diameter of 11.000 km and is not potentially hazardous.

# Attempt to inspect an NEO that doesn't exist.
$ python3 main.py inspect --name fake-comet
No matching NEOs exist in the database.

# Verbosely list information about Ganymed and each of its known close approaches.
# For the record, Ganymed is HUGE - it's the largest known NEO.
$ python3 main.py inspect --verbose --name Ganymed
NEO 1036 (Ganymed) has a diameter of 37.675 km and is not potentially hazardous.
- On 1911-10-15 19:16, '1036 (Ganymed)' approaches Earth at a distance of 0.38 au and a velocity of 17.09 km/s.
- On 1924-10-17 00:51, '1036 (Ganymed)' approaches Earth at a distance of 0.50 au and a velocity of 19.36 km/s.
- On 1998-10-14 05:12, '1036 (Ganymed)' approaches Earth at a distance of 0.46 au and a velocity of 13.64 km/s.
- On 2011-10-13 00:04, '1036 (Ganymed)' approaches Earth at a distance of 0.36 au and a velocity of 14.30 km/s.
- On 2024-10-13 01:56, '1036 (Ganymed)' approaches Earth at a distance of 0.37 au and a velocity of 16.33 km/s.
- On 2037-10-15 18:31, '1036 (Ganymed)' approaches Earth at a distance of 0.47 au and a velocity of 18.68 km/s.

For an NEO to be found with the inspect subcommand, the given primary designation or IAU name must match the data exactly, so if an NEO is mysteriously missing, double-check the spelling and capitalization.
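Exact matching falls out naturally if lookups are backed by dictionaries, as in this minimal sketch. The class name `TinyIndex` and the dict-based NEO records are hypothetical; the project's NEODatabase is described later and may be organized differently.

```python
class TinyIndex:
    """Look up NEO records by exact primary designation or exact IAU name."""

    def __init__(self, neos):
        self._by_designation = {neo["designation"]: neo for neo in neos}
        # Skip NEOs without a name. If a name occurs more than once,
        # the last record with that name wins in this sketch.
        self._by_name = {neo["name"]: neo for neo in neos if neo["name"]}

    def get_by_designation(self, designation):
        return self._by_designation.get(designation)

    def get_by_name(self, name):
        return self._by_name.get(name)


index = TinyIndex([
    {"designation": "433", "name": "Eros"},
    {"designation": "1P", "name": "Halley"},
    {"designation": "1036", "name": "Ganymed"},
])
```

Because dictionary keys are compared exactly, `index.get_by_name("halley")` returns None even though "Halley" is present.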

query

The query subcommand is significantly more advanced - a query generates a collection of close approaches that match a set of specified filters, and either displays a limited set of those results to standard output or writes the structured results to a file.

$ python3 main.py query --help
usage: main.py query [-h] [-d DATE] [-s START_DATE] [-e END_DATE] [--min-distance DISTANCE_MIN] [--max-distance DISTANCE_MAX]
                     [--min-velocity VELOCITY_MIN] [--max-velocity VELOCITY_MAX] [--min-diameter DIAMETER_MIN]
                     [--max-diameter DIAMETER_MAX] [--hazardous] [--not-hazardous] [-l LIMIT] [-o OUTFILE]

Query for close approaches that match a collection of filters.

optional arguments:
  -h, --help            show this help message and exit
  -l LIMIT, --limit LIMIT
                        The maximum number of matches to return. Defaults to 10 if no --outfile is given.
  -o OUTFILE, --outfile OUTFILE
                        File in which to save structured results. If omitted, results are printed to standard output.

Filters:
  Filter close approaches by their attributes or the attributes of their NEOs.

  -d DATE, --date DATE  Only return close approaches on the given date, in YYYY-MM-DD format (e.g. 2020-12-31).
  -s START_DATE, --start-date START_DATE
                        Only return close approaches on or after the given date, in YYYY-MM-DD format (e.g. 2020-12-31).
  -e END_DATE, --end-date END_DATE
                        Only return close approaches on or before the given date, in YYYY-MM-DD format (e.g. 2020-12-31).
  --min-distance DISTANCE_MIN
                        In astronomical units. Only return close approaches that pass as far or farther away from Earth as the given
                        distance.
  --max-distance DISTANCE_MAX
                        In astronomical units. Only return close approaches that pass as near or nearer to Earth as the given
                        distance.
  --min-velocity VELOCITY_MIN
                        In kilometers per second. Only return close approaches whose relative velocity to Earth at approach is as fast
                        or faster than the given velocity.
  --max-velocity VELOCITY_MAX
                        In kilometers per second. Only return close approaches whose relative velocity to Earth at approach is as slow
                        or slower than the given velocity.
  --min-diameter DIAMETER_MIN
                        In kilometers. Only return close approaches of NEOs with diameters as large or larger than the given size.
  --max-diameter DIAMETER_MAX
                        In kilometers. Only return close approaches of NEOs with diameters as small or smaller than the given size.
  --hazardous           If specified, only return close approaches of NEOs that are potentially hazardous.
  --not-hazardous       If specified, only return close approaches of NEOs that are not potentially hazardous.
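Each of these options boils down to comparing one attribute against a reference value, and a close approach matches only if every supplied filter accepts it. The sketch below shows that idea with plain dictionaries. The names `make_filter` and `matches_all` are hypothetical, and the sample values are the rounded figures from the example output further down, not full-precision data.

```python
import operator
from datetime import date


def make_filter(attribute, op, value):
    """Build a predicate comparing one attribute of an approach to `value`."""
    def predicate(approach):
        return op(approach[attribute], value)
    return predicate


def matches_all(approach, filters):
    """An approach matches only if every filter accepts it (logical AND)."""
    return all(check(approach) for check in filters)


APPROACHES = [
    {"designation": "152561", "date": date(2020, 3, 1), "distance": 0.42, "velocity": 11.23},
    {"designation": "462550", "date": date(2020, 3, 1), "distance": 0.47, "velocity": 17.19},
    {"designation": "2019 TU", "date": date(2020, 3, 3), "distance": 0.49, "velocity": 5.92},
]

# Roughly: --start-date 2020-03-01 --end-date 2020-03-31
#          --min-distance 0.4 --max-velocity 12
filters = [
    make_filter("date", operator.ge, date(2020, 3, 1)),
    make_filter("date", operator.le, date(2020, 3, 31)),
    make_filter("distance", operator.ge, 0.4),
    make_filter("velocity", operator.le, 12.0),
]
matching = [a["designation"] for a in APPROACHES if matches_all(a, filters)]
```

Using `operator.ge` and `operator.le` means "min" and "max" filters share one implementation, and an empty filter list accepts every approach.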

Here are a few examples of the query subcommand in action:

# Show (the first) two close approaches in the data set.
$ python3 main.py query --limit 2
On 1900-01-01 00:11, '170903' approaches Earth at a distance of 0.09 au and a velocity of 16.75 km/s.
On 1900-01-01 02:33, '2005 OE3' approaches Earth at a distance of 0.41 au and a velocity of 17.92 km/s.

# Show (the first) three close approaches on July 29th, 1969.
$ python3 main.py query --date 1969-07-29 --limit 3
On 1969-07-29 01:47, '408982' approaches Earth at a distance of 0.36 au and a velocity of 24.24 km/s.
On 1969-07-29 13:33, '2010 MA' approaches Earth at a distance of 0.21 au and a velocity of 8.80 km/s.
On 1969-07-29 19:56, '464798' approaches Earth at a distance of 0.10 au and a velocity of 8.02 km/s.

# Show (the first) three close approaches in 2050.
$ python3 main.py query --start-date 2050-01-01 --limit 3
On 2050-01-01 04:18, '2019 AY9' approaches Earth at a distance of 0.31 au and a velocity of 8.31 km/s.
On 2050-01-01 06:00, '162361' approaches Earth at a distance of 0.19 au and a velocity of 9.08 km/s.
On 2050-01-01 09:55, '2009 LW2' approaches Earth at a distance of 0.04 au and a velocity of 19.02 km/s.

# Show (the first) four close approaches in March 2020 that passed at least 0.4 au from Earth.
$ python3 main.py query --start-date 2020-03-01 --end-date 2020-03-31 --min-distance 0.4 --limit 4
On 2020-03-01 00:28, '152561' approaches Earth at a distance of 0.42 au and a velocity of 11.23 km/s.
On 2020-03-01 09:28, '462550' approaches Earth at a distance of 0.47 au and a velocity of 17.19 km/s.
On 2020-03-02 21:41, '2020 QF2' approaches Earth at a distance of 0.45 au and a velocity of 8.79 km/s.
On 2020-03-03 00:49, '2019 TU' approaches Earth at a distance of 0.49 au and a velocity of 5.92 km/s.

# Show (the first) three close approaches that passed at most 0.0025au from Earth with a relative speed of at most 5 km/s.
# That's slightly less than the average distance between the Earth and the moon.
$ python3 main.py query --max-distance 0.0025 --max-velocity 5 --limit 3
On 1949-01-01 02:53, '2003 YS70' approaches Earth at a distance of 0.00 au and a velocity of 3.64 km/s.
On 1954-03-13 00:00, '2013 RZ53' approaches Earth at a distance of 0.00 au and a velocity of 3.04 km/s.
On 1979-09-02 00:16, '2014 WX202' approaches Earth at a distance of 0.00 au and a velocity of 1.79 km/s.

# Show (the first) three close approaches in the 2000s of NEOs with a known diameter of at least 6 kilometers that passed Earth at a relative velocity of at least 15 km/s.
$ python3 main.py query --start-date 2000-01-01 --min-velocity 15 --min-diameter 6 --limit 3
On 2000-05-21 10:08, '7092 (Cadmus)' approaches Earth at a distance of 0.34 au and a velocity of 28.46 km/s.
On 2004-05-25 03:54, '7092 (Cadmus)' approaches Earth at a distance of 0.41 au and a velocity of 30.52 km/s.
On 2006-06-10 20:04, '1866 (Sisyphus)' approaches Earth at a distance of 0.49 au and a velocity of 26.81 km/s.

# Show (the first) two close approaches in January 2030 of NEOs that are at most 50m in diameter and are marked not potentially hazardous.
$ python3 main.py query --start-date 2030-01-01 --end-date 2030-01-31 --max-diameter 0.05 --not-hazardous --limit 2
On 2030-01-07 20:59, '2010 GH7' approaches Earth at a distance of 0.46 au and a velocity of 18.84 km/s.
On 2030-01-13 07:29, '2010 AE30' approaches Earth at a distance of 0.06 au and a velocity of 14.00 km/s.

# Show (the first) three close approaches in 2021 of potentially hazardous NEOs at least 100m in diameter that pass within 0.1au of Earth at a relative velocity of at least 15 kilometers per second.
$ python3 main.py query --start-date 2021-01-01 --max-distance 0.1 --min-velocity 15 --min-diameter 0.1 --hazardous --limit 3
On 2021-01-21 22:56, '363024' approaches Earth at a distance of 0.07 au and a velocity of 15.31 km/s.
On 2021-02-01 22:26, '2016 CL136' approaches Earth at a distance of 0.04 au and a velocity of 18.06 km/s.
On 2021-08-21 15:10, '2016 AJ193' approaches Earth at a distance of 0.02 au and a velocity of 26.17 km/s.

# Save, to a CSV file, all close approaches.
$ python3 main.py query --outfile results.csv

# Save, to a JSON file, all close approaches in the 2020s of NEOs at least 1km in diameter that pass between 0.01 au and 0.1 au away from Earth.
$ python3 main.py query --start-date 2020-01-01 --end-date 2029-12-31 --min-diameter 1 --min-distance 0.01 --max-distance 0.1 --outfile results.json
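As a rough picture of what "structured results" could look like, the sketch below writes the same rows as CSV and as JSON. The column names in `FIELDNAMES` and the function names are assumptions made for this illustration; the output schema the project actually expects is specified with Task 4 and may differ. The rows reuse values from the first two close approaches in the data set.

```python
import csv
import io
import json

# Hypothetical column names, chosen only for this sketch.
FIELDNAMES = ("datetime_utc", "distance_au", "velocity_km_s", "designation")

ROWS = [
    {"datetime_utc": "1900-01-01 00:11", "distance_au": 0.0921795123769547,
     "velocity_km_s": 16.7523040362574, "designation": "170903"},
    {"datetime_utc": "1900-01-01 02:33", "distance_au": 0.414975519685102,
     "velocity_km_s": 17.918395877175, "designation": "2005 OE3"},
]


def write_csv(rows, stream):
    """Write a header line followed by one CSV line per row."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


def write_json(rows, stream):
    """Write all rows as a single JSON list of objects."""
    json.dump(list(rows), stream, indent=2)


csv_buffer, json_buffer = io.StringIO(), io.StringIO()
write_csv(ROWS, csv_buffer)
write_json(ROWS, json_buffer)
```

The functions take any writable text stream, so the in-memory buffers used here could be swapped for a file opened with `open("results.csv", "w", newline="")`.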

interactive

There's a third useful subcommand named interactive. This subcommand first loads the database and then starts a command loop so that you can repeatedly run inspect and query subcommands on the database without having to wait to reload the data each time you want to run a new command, which saves an extraordinary amount of time. This can be extremely helpful, as it lets you speed up your development cycle and even show off the project more easily to friends.

Here's what an example session might look like:

$ python3 main.py interactive
Explore close approaches of near-Earth objects. Type `help` or `?` to list commands and `exit` to exit.

(neo) inspect --pdes 433
NEO 433 (Eros) has a diameter of 16.840 km and is not potentially hazardous.
(neo) help i
Shorthand for `inspect`.
(neo) i --name Halley
NEO 1P (Halley) has a diameter of 11.000 km and is not potentially hazardous.
(neo) query --date 2020-12-31 --limit 2
On 2020-12-31 05:48, '2010 PQ10' approaches Earth at a distance of 0.45 au and a velocity of 21.69 km/s.
On 2020-12-31 16:00, '2015 YA' approaches Earth at a distance of 0.17 au and a velocity of 5.65 km/s.
(neo) q --date 2021-3-14 --min-velocity 10
On 2021-03-14 06:17, '2019 DS1' approaches Earth at a distance of 0.39 au and a velocity of 20.17 km/s.
On 2021-03-14 20:19, '483656' approaches Earth at a distance of 0.06 au and a velocity of 12.09 km/s.
...

The prompt is (neo) . At the prompt, you can enter either an inspect or a query subcommand, with the exact same options and behavior as you would on the command line. You can use the special command quit, exit, or CTRL+D to exit this session and return to the command line. The command help or ? shows a help menu, and help <command> (e.g. help query) shows a help menu specific to that command. In this environment only, you can also use the short forms i and q for inspect and query (e.g. (neo) i --verbose --name Ganymed).
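This behavior (a custom prompt, `help` and `?`, exiting on CTRL+D) closely resembles what Python's standard cmd module offers. Whether main.py is built on it is an assumption; the sketch below only shows how such a loop could be put together. The class name `NEOShell` is hypothetical, and the commands just record what they were asked to do.

```python
import cmd


class NEOShell(cmd.Cmd):
    """A toy command loop with `inspect`/`query` and their short forms."""

    intro = "Explore close approaches of near-Earth objects."
    prompt = "(neo) "

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.history = []   # records (command, argument string) pairs

    def do_inspect(self, arg):
        """Inspect an NEO by primary designation or by name."""
        self.history.append(("inspect", arg))

    def do_i(self, arg):
        """Shorthand for `inspect`."""
        return self.do_inspect(arg)

    def do_query(self, arg):
        """Query for close approaches that match a collection of filters."""
        self.history.append(("query", arg))

    def do_q(self, arg):
        """Shorthand for `query`."""
        return self.do_query(arg)

    def do_exit(self, arg):
        """Exit the interactive session."""
        return True   # a true return value stops the command loop

    do_quit = do_exit
    do_EOF = do_exit   # CTRL+D arrives as the EOF command


shell = NEOShell()
shell.onecmd("i --name Halley")
stop = shell.onecmd("exit")
```

Calling `shell.cmdloop()` would start the interactive prompt; `onecmd` is used here so the example runs without waiting for input. The method docstrings double as the text shown by `help <command>`.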

Importantly, the interactive session doesn't automatically update when you update your code. This means that, if you make a meaningful change to your Python files, you should exit and restart the session. If the interactive session detects that any Python files have changed since it began, it will warn you before it runs each new command. The interactive subcommand takes an optional argument --aggressive - if specified, the interactive session will instead preemptively exit whenever it notices any changes to any Python files.
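One plausible way to detect such changes is to record each Python file's modification time when the session starts and compare again before each command. The sketch below does that in a temporary directory. It is an illustration only, main.py may use a different mechanism, and the function names are hypothetical.

```python
import os
import pathlib
import tempfile


def snapshot(directory):
    """Map each .py file in `directory` to its modification time (ns)."""
    return {
        path: path.stat().st_mtime_ns
        for path in pathlib.Path(directory).glob("*.py")
    }


def changed_files(before, directory):
    """Return files that were added, removed, or modified since `before`."""
    after = snapshot(directory)
    return sorted(
        path for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    )


with tempfile.TemporaryDirectory() as tmp:
    module = pathlib.Path(tmp) / "models.py"
    module.write_text("x = 1\n")
    before = snapshot(tmp)
    unchanged = changed_files(before, tmp)

    # Simulate an edit by moving the modification time forward one second.
    info = module.stat()
    os.utime(module, ns=(info.st_atime_ns, info.st_mtime_ns + 1_000_000_000))
    modified = changed_files(before, tmp)
```

In this scheme, a non-empty result corresponds to the warning in the default mode and to exiting the session in the --aggressive mode.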

All in all, the interactive subcommand has the following options:

$ python3 main.py interactive --help
usage: main.py interactive [-h] [-a]

Start an interactive command session to repeatedly run `interact` and `query` commands.

optional arguments:
  -h, --help        show this help message and exit
  -a, --aggressive  If specified, kill the session whenever a project file is modified.

Project Scaffolding

Upon starting, the project contains several files and folders to help you get up and running:

.
├── README.md       # This file.
├── main.py
├── models.py       # Task 1.
├── read.py         # Task 2a.
├── database.py     # Task 2b and Task 3b.
├── filters.py      # Task 3a and Task 3c.
├── write.py        # Task 4.
├── helpers.py
├── data
│   ├── neos.csv
│   └── cad.json
└── tests
    ├── test-neos-2020.csv
    ├── test-cad-2020.json
    ├── test_*.py
    ├── ...
    └── test_*.py

Let's take a closer look at the purpose of each of these files and folders:

  • main.py: The main Python script that wraps the command-line tool and orchestrates the data pipeline by invoking the functions and classes that you'll write. You will not need to modify this file.
  • models.py: In this file, you'll define Python objects to represent a NearEarthObject and a CloseApproach. These objects will have a few attributes, a human-readable string representation, and perhaps a property or a method here or there.
  • extract.py: In this file, you'll write functions to read information from data files, creating NearEarthObjects and CloseApproaches from the data.
  • database.py: In this file, you'll define an NEODatabase class to encapsulate the entire data set (connecting NEOs and close approaches) and write methods to get NEOs by primary designation and by name, as well as to query the dataset with a collection of user-specified filters to generate an iterable stream of matching results.
  • filters.py: In this file, you'll create a plethora of filters to be used in conjunction with the NEODatabase to query for a stream of matching close approaches. You'll also write a utility function to limit the number of results produced from a stream.
  • write.py: Finally, in this file, you'll implement functions to write a stream of results (the CloseApproach objects generated by the NEODatabase) to a file either in JSON format or in CSV format.
  • helpers.py: A simple module that provides a few helpful utility functions to convert to and from datetime objects.
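The "limit the number of results produced from a stream" utility mentioned for filters.py can be sketched with itertools.islice, which stops consuming the stream once enough results have been produced. The signature below, including treating None as "no limit", is an assumption for illustration; the starter code may specify different behavior.

```python
import itertools


def limit(iterator, n=None):
    """Yield at most `n` items from `iterator`; with n=None, yield them all."""
    if n is None:
        return iterator
    return itertools.islice(iterator, n)


def numbers():
    """An endless stream, standing in for a large stream of close approaches."""
    for value in itertools.count():
        yield value


# Only three items are ever pulled from the endless stream.
first_three = list(limit(numbers(), 3))
```

Because islice is lazy, a query with a small --limit never needs to examine the whole data set, only as many approaches as it takes to find enough matches.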

The data files are located in the data/ folder.

Additionally, the starter code includes unit tests that will help you check your progress as you advance through this project. The unit tests all live in the tests/ folder. When the project is fully implemented, all of the unit tests should pass. To run all of the tests, you can use python3 -m unittest --verbose at the command line, although many tests will currently fail since the project isn't yet finished.

Tasks to Complete

Overview

At a high level, we'll break down this project into a few manageable tasks.

  • Task 0: Inspect the data. (data/neos.csv and data/cad.json)
  • Task 1: Build models to represent the data. (models.py)
  • Task 2: Extract the data into a custom database (extract.py and database.py)
  • Task 3: Create filters to query the database to generate a stream of matching CloseApproach objects, and limit the result size. (filters.py and database.py)
  • Task 4: Save the data to a file. (write.py)

As you implement these tasks, you'll unlock more and more functionality. When Task 2 is complete, you'll be able to run the inspect subcommand. When Task 3 is complete, you'll be able to run the query subcommand without the --outfile argument. When Task 4 is complete, you'll be able to run everything.

Remember, in this project you won't need to write any code that prompts the user for input - the main.py script will accept arguments from the command line or the interactive session and pass that information to the appropriate Python classes and functions that you create.

Task 0: Inspect the data set.

The very first step of any project involving known data should always be to manually explore the data set. With any tool you'd like (Excel, a text editor, NASA's online browsers, etc), attempt to answer the following questions before you move on.

  • How many NEOs are in the neos.csv data set?
    • Hint: Count the number of rows in the neos.csv file.
    • Answer: 23967
  • What is the primary designation of the first Near Earth Object in the neos.csv data set?
    • Hint: Look at the first row of the CSV, under the header "pdes"
    • Answer: 433
  • What is the diameter of the NEO whose name is "Apollo"?
    • Hint: Look for the row of the CSV containing the name "Apollo" in the "name" column, and find the entry under the "diameter" column.
    • Answer: 1.5 kilometers
  • How many NEOs have IAU names in the data set?
    • Hint: Count the number of rows that have nonempty entries in the "name" column.
    • Answer: 343
  • How many NEOs have diameters in the data set?
    • Hint: Count the number of rows that have nonempty entries in the "diameter" column.
    • Answer: 1268
  • How many close approaches are in the cad.json data set?
    • Hint: Instead of manually counting the entries, you can use the value of the "count" key.
    • Answer: 406785
  • On January 1st, 2000, how close did the NEO whose primary designation is "2015 CL" pass by Earth?
    • Hint: Find entries whose date starts with '2000-Jan-01'. One of the lists represents the close approach of the NEO "2015 CL". What is the value corresponding to the distance from Earth?
    • Answer: About 0.145 au
  • On January 1st, 2000, how fast did the NEO whose primary designation is "2002 PB" pass by Earth?
    • Hint: Find entries whose date starts with '2000-Jan-01'. One of the lists represents the close approach of the NEO "2002 PB". What is the value corresponding to the velocity relative to Earth?
    • Answer: About 29.39 km/s

For this task, you might decide to use the Python interpreter to quickly answer some of these queries if they're too hard to answer by hand.

Have any lingering curiosities about the dataset? You may be able to use bespoke Python scripts to answer these questions, either now or as they come up.
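
As one possible starting point, a throwaway script along these lines could answer the counting questions. It is a minimal sketch that runs against a tiny in-memory stand-in for neos.csv: the sample rows are made up, the real file has more columns, and only the "pdes", "name", and "diameter" headers come from the hints above.

```python
import csv
import io

# Made-up stand-in for data/neos.csv; use open('data/neos.csv') for the real file.
SAMPLE_CSV = """pdes,name,diameter
433,Eros,16.84
2020 FK,,
1862,Apollo,1.5
"""

with io.StringIO(SAMPLE_CSV) as infile:
    rows = list(csv.DictReader(infile))

total = len(rows)                                   # How many NEOs?
named = sum(1 for row in rows if row['name'])       # How many have a name?
sized = sum(1 for row in rows if row['diameter'])   # How many have a diameter?
print(total, named, sized)
```

Empty CSV cells come back as empty strings, which are falsy, so a plain truthiness check is enough to count the nonempty entries.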

Task 1: Design the objects that will store our data.

Well done! Now that we understand the project overview and our data set, it's time to start coding. The first thing we'll do is create Python objects to represent our data. In particular, we're going to create two classes in the models.py file:

  • A NearEarthObject class, to represent the data for a single near-Earth object.
  • A CloseApproach class, to represent the data for a single close approach of an NEO.

In doing so, we'll have to decide how to construct new instances of these classes, which attributes from our dataset belong to each object, how to build a human-readable representation of each object, and which additional methods or properties, if any, we want to include. We'll also have to plan for how these objects will interact with each other.

Designing the NearEarthObject class

The models.py file contains a starting template for the NearEarthObject class. This class object will represent a single near-Earth object.

class NearEarthObject:
    def __init__(self, ...):
        ...

    def __str__(self):
        ...

The __init__ method is the constructor for the class. You will need to decide what arguments it should accept. If you make changes, you should also update the surrounding comments.

The __str__ method will return a human-readable string that captures the contents of the class for a human audience. In contrast, the prewritten __repr__ method is stylized to be machine-readable.

Each NearEarthObject must have attributes (or gettable properties) for the following names:

  • designation: The primary designation for this NearEarthObject.
  • name: The IAU name for this NearEarthObject.
  • diameter: The diameter, in kilometers, of this NearEarthObject.
  • hazardous: Whether or not this NearEarthObject is potentially hazardous.
  • approaches: A collection of this NearEarthObject's close approaches to Earth.

The starter code contains default values for some of these attributes - you should decide how, and if, to replace that code.

Recall that, even though every NEO in the data set has a nonempty primary designation, some NEOs have no name, and some NEOs have no diameter (it's unknown to NASA).

The designation should resolve to a string, the name should resolve to either a nonempty string or the value None, the diameter should resolve to a float (you should use float('nan') to represent an undefined diameter), and the hazardous flag should resolve to a boolean.
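
The conversions described above might be sketched as three small helpers. These function names are hypothetical (they are not part of the starter code), and the 'Y' marker for a hazardous NEO is an assumption about how the CSV encodes that flag - check the data set before relying on it.

```python
import math


def parse_name(raw):
    """Return a nonempty string, or None when the name is missing."""
    return raw if raw else None


def parse_diameter(raw):
    """Return the diameter as a float, or NaN when it is unknown."""
    return float(raw) if raw else float('nan')


def parse_hazardous(raw):
    """Return True only for the (assumed) 'Y' marker."""
    return raw == 'Y'


print(parse_name(''), parse_diameter('16.84'), parse_hazardous('Y'))
print(math.isnan(parse_diameter('')))
```

Because NaN compares as False against every number, use math.isnan to test for a missing diameter rather than ==.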

The approaches attribute, for now, can be an empty collection. In Task 2, you'll use the real data set to populate this collection with the real CloseApproach data.

The __str__ method that you write is up to you - it'll determine how this object is printed, and should be human-readable. For inspiration, we adopted the following format:

>>> neo = ...
>>> print(neo)
NEO {fullname} has a diameter of {diameter:.3f} km and [is/is not] potentially hazardous.
>>> eros = ...
>>> print(eros)
NEO 433 (Eros) has a diameter of 16.840 km and is not potentially hazardous.

In the above, {fullname} is either {designation} ({name}) if the name exists or simply {designation} otherwise. As a hint, this is a great opportunity for a property named fullname!
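
A minimal sketch of that idea follows. The constructor signature here is just one possible choice, not the one the starter code prescribes.

```python
class NearEarthObject:
    """Illustrative model only; the constructor arguments are one possible design."""

    def __init__(self, designation, name=None, diameter=float('nan'), hazardous=False):
        self.designation = designation
        self.name = name
        self.diameter = diameter
        self.hazardous = hazardous
        self.approaches = []  # Populated later, in Task 2.

    @property
    def fullname(self):
        """Return 'designation (name)' when a name exists, else the designation."""
        if self.name:
            return f"{self.designation} ({self.name})"
        return self.designation

    def __str__(self):
        verb = "is" if self.hazardous else "is not"
        return (f"NEO {self.fullname} has a diameter of {self.diameter:.3f} km "
                f"and {verb} potentially hazardous.")


eros = NearEarthObject('433', name='Eros', diameter=16.84)
print(eros)
```

Since fullname is a property, it is read like an attribute (eros.fullname, with no parentheses) and always reflects the current designation and name.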

Designing the CloseApproach class

The models.py file also contains a starting template for the CloseApproach class. This class object will represent a single close approach to Earth by a near-Earth object.

class CloseApproach:
    def __init__(self, ...):
        ...

    def __str__(self):
        ...

The __init__ method is the constructor for the class. You will need to decide what arguments it should accept. If you make changes, you should also update the surrounding comments.

The __str__ method will return a human-readable string that captures the contents of the class for a human audience. In contrast, the prewritten __repr__ method is stylized to be machine-readable.

Each CloseApproach must have attributes (or gettable properties) for the following names:

  • time: The date and time, in UTC, at which the NEO passes closest to Earth.
  • distance: The nominal approach distance, in astronomical units, of the NEO to Earth at the closest point.
  • velocity: The velocity, in kilometers per second, of the NEO relative to Earth at the closest point.
  • neo: The NearEarthObject that is making a close approach to Earth.

The time should resolve to a Python datetime, the distance should resolve to a float, and the velocity should resolve to a float.

The neo attribute, for now, can be None. In its absence, you should include a _designation attribute with the primary designation of the close approach's NEO. In Task 2, you'll use the real data set and this _designation attribute to connect the neo attribute to a real NearEarthObject instance.

You can use the cd_to_datetime function in the helpers module to convert a calendar date from the format provided in cad.json (e.g. "1900-Jan-01 00:00") into a Python datetime object.

The __str__ method that you write is up to you - it'll determine how this object is printed, and should be human-readable. For inspiration, we adopted the following format:

>>> ca = ...
>>> print(ca)
On {time_str}, '{neo.fullname}' approaches Earth at a distance of {distance:.2f} au and a velocity of {velocity:.2f} km/s.
>>> halley_approach = ...
>>> print(halley_approach)
On 1910-05-20 12:49, '1P (Halley)' approaches Earth at a distance of 0.15 au and a velocity of 70.56 km/s.

You should use the datetime_to_str function from the helpers module to format the time attribute to a string without seconds. This is another great opportunity for a property!
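
One way this could fit together is sketched below. To keep the example self-contained, it defines local stand-ins for the two helpers; their format strings are assumptions inferred from the sample values in this README, and in the project you would import the real functions from the helpers module instead. The constructor signature is also just one possible design.

```python
import datetime


def cd_to_datetime(calendar_date):
    # Stand-in for helpers.cd_to_datetime; the format string is an assumption.
    return datetime.datetime.strptime(calendar_date, "%Y-%b-%d %H:%M")


def datetime_to_str(dt):
    # Stand-in for helpers.datetime_to_str; the format string is an assumption.
    return dt.strftime("%Y-%m-%d %H:%M")


class CloseApproach:
    def __init__(self, designation, time, distance, velocity):
        self._designation = designation
        self.time = cd_to_datetime(time)
        self.distance = float(distance)
        self.velocity = float(velocity)
        self.neo = None  # Linked to a NearEarthObject later, in Task 2.

    @property
    def time_str(self):
        """Return the approach time formatted without seconds."""
        return datetime_to_str(self.time)

    def __str__(self):
        # Fall back to the bare designation until the NEO has been linked.
        who = self.neo.fullname if self.neo else self._designation
        return (f"On {self.time_str}, '{who}' approaches Earth at a distance of "
                f"{self.distance:.2f} au and a velocity of {self.velocity:.2f} km/s.")


ca = CloseApproach('2020 FK', '2020-Jan-01 12:30', '0.25', '56.78')
print(ca)
```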

Testing

Make sure to manually test your implementation at an interactive interpreter. Your interactive session might look something like:

$ python3 -q
>>> from models import NearEarthObject, CloseApproach
>>> neo = NearEarthObject(...)  # Use any sample data here.
>>> print(neo.designation)
2020 FK
>>> print(neo.name)
One REALLY BIG fake asteroid
>>> print(neo.diameter)
12.345
>>> print(neo.hazardous)
True
>>> print(neo)
NEO 2020 FK (One REALLY BIG fake asteroid) has a diameter of 12.345 km and is potentially hazardous.
>>> ca = CloseApproach(...)  # Use any sample data here.
>>> print(type(ca.time))
<class 'datetime.datetime'>
>>> print(ca.time_str)
2020-01-01 12:30
>>> print(ca.distance)
0.25
>>> print(ca.velocity)
56.78
>>> print(ca)
On 2020-01-01 12:30, '2020 FK (One REALLY BIG fake asteroid)' approaches Earth at a distance of 0.25 au and a velocity of 56.78 km/s.

As you progress through the remaining tasks, you may have to revisit this file to adapt your implementation - that's expected!

Task 2: Extract data from structured files into Python objects.

Wonderful! Now that we've defined Python objects in models.py that can represent our data, let's extract the real data from our data sets.

For this task, we'll make changes in two files:

  • In extract.py, we'll write functions that take the paths to our data files and extract structured data.
  • In database.py, we'll capture this data in an NEODatabase, precompute auxiliary data structures, interconnect the NearEarthObjects and CloseApproaches, and provide the ability to fetch NEOs by designation or by name.

Task 2a: Extract data from data files.

In the extract.py file, you'll implement the load_neos and load_approaches functions:

def load_neos(neo_csv_path):
    ...
    return neos  # A collection of `NearEarthObject` instances.

def load_approaches(cad_json_path):
    ...
    return approaches  # A collection of `CloseApproach` instances.

The neo_csv_path and cad_json_path arguments are Path-like objects corresponding either to the default data/neos.csv and data/cad.json or to some alternate location specified by the user at the command line. You can open(neo_csv_path) or open(cad_json_path) as usual.

In this module, you'll have to use the built-in csv and json modules. You'll also need to rely on the NearEarthObject and CloseApproach classes you defined in Task 1, which you could end up adapting if needed.

The collections returned by load_neos and load_approaches are then used by the main.py script to create an NEODatabase.
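
The reading step might look roughly like the sketch below. To stay self-contained, it returns plain dictionaries of raw fields; in the project, each dictionary would instead be used to build a NearEarthObject or CloseApproach. The sample files are made up, and the "fields" and "data" keys and the 'des', 'cd', 'dist', and 'v_rel' field names are assumptions about the layout of cad.json - verify them against the real file.

```python
import csv
import json
import tempfile
from pathlib import Path


def load_neos(neo_csv_path):
    """Read one dict of raw string fields per NEO row."""
    with open(neo_csv_path, newline='') as infile:
        return list(csv.DictReader(infile))


def load_approaches(cad_json_path):
    """Read one dict per close approach, pairing field names with values."""
    with open(cad_json_path) as infile:
        contents = json.load(infile)
    # Assumed layout: a "fields" list naming the columns and a "data" list of rows.
    return [dict(zip(contents['fields'], row)) for row in contents['data']]


# Made-up sample files standing in for data/neos.csv and data/cad.json.
with tempfile.TemporaryDirectory() as tmp:
    neo_path = Path(tmp) / 'neos.csv'
    cad_path = Path(tmp) / 'cad.json'
    neo_path.write_text("pdes,name,diameter\n433,Eros,16.84\n")
    cad_path.write_text(json.dumps({
        'count': '1',
        'fields': ['des', 'cd', 'dist', 'v_rel'],
        'data': [['433', '2025-Nov-30 02:18', '0.397647483265833', '3.72885069167641']],
    }))
    neos = load_neos(neo_path)
    approaches = load_approaches(cad_path)

print(neos[0]['pdes'], approaches[0]['cd'])
```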

Task 2b: Encapsulate the data in a NEODatabase.

In the database.py file, you'll implement the __init__ constructor of the NEODatabase object and finish the get_neo_by_designation and get_neo_by_name methods. At the start, the NEODatabase class looks like:

class NEODatabase:
    def __init__(self, neos, approaches):
        ...
    def get_neo_by_designation(self, designation):
        ...
    def get_neo_by_name(self, name):
        ...

The neos and approaches arguments provided to the NEODatabase constructor are exactly the objects produced by the load_neos and load_approaches functions of the extract module.

In the NEODatabase constructor, you must connect together the collection of NearEarthObjects and the collection of CloseApproaches. Specifically, for each close approach, you should determine to which NEO its _designation corresponds, and assign that NearEarthObject to the CloseApproach's .neo attribute (which we set to None in Task 1). Additionally, you should add this close approach to the NearEarthObject's .approaches attribute, which represents a collection of CloseApproaches (which we initialized to an empty collection in Task 1).

In addition to storing the newly-connected NEOs and close approaches, you'll likely want to precompute some helpful auxiliary data structures that can speed up the get_neo_by_designation and get_neo_by_name methods. If you loop over every known NEO in those methods, the resulting code will be unnecessarily slow. What additional data structures can we attach to the NEODatabase that can assist with these methods?

Both the get_neo_by_designation and get_neo_by_name methods should return None if a matching NEO wasn't found in the database. For get_neo_by_name, neither the empty string nor the None singleton should ever be associated with an NEO. Furthermore, in the relatively rare case that there are multiple NEOs with the same name, it's acceptable to return any of them.
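
One possible answer to the question above is a pair of dictionaries built once in the constructor. The sketch below uses SimpleNamespace objects as stand-ins for the real model classes, and the private attribute names are hypothetical. Skipping approaches whose designation is unknown is an assumption of this sketch, not a stated requirement.

```python
from types import SimpleNamespace


class NEODatabase:
    def __init__(self, neos, approaches):
        self._neos = list(neos)
        self._approaches = list(approaches)
        # Lookup tables built once, so each fetch is a single dictionary access.
        self._by_designation = {neo.designation: neo for neo in self._neos}
        self._by_name = {neo.name: neo for neo in self._neos if neo.name}
        # Link each approach to its NEO, and each NEO to its approaches.
        for approach in self._approaches:
            neo = self._by_designation.get(approach._designation)
            if neo is None:
                continue  # Assumption: ignore approaches with no known NEO.
            approach.neo = neo
            neo.approaches.append(approach)

    def get_neo_by_designation(self, designation):
        return self._by_designation.get(designation)

    def get_neo_by_name(self, name):
        return self._by_name.get(name)


# Stand-ins for NearEarthObject and CloseApproach instances.
eros = SimpleNamespace(designation='433', name='Eros', approaches=[])
unnamed = SimpleNamespace(designation='2020 FK', name=None, approaches=[])
visit = SimpleNamespace(_designation='433', neo=None)
db = NEODatabase([eros, unnamed], [visit])
print(db.get_neo_by_name('Eros').designation)
```

Filtering on `if neo.name` when building the name table keeps both None and the empty string out of it, and dict.get returns None for a missing key.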

Testing

It's always a good idea to manually test your implementation at an interactive interpreter. However, starting with Task 2, we provide additional tools for you to check your code.

You can use the pre-written unit tests to check that each of your functions and methods is working as required:

$ python3 -m unittest --verbose tests.test_extract tests.test_database

There are a total of 21 unit tests for this task. When Task 2 is complete, all of the unit tests in these two modules will pass.

Furthermore, after completing Task 2 entirely, the inspect subcommand will fully work. Therefore, you can use the command line to test your code as well:

# Inspect the NEO with an IAU name of Halley.
$ python3 main.py inspect --name Halley
NEO 1P (Halley) has a diameter of 11.000 km and is not potentially hazardous.

# Inspect the NEO with a primary designation of 433 (that's Eros!)
$ python3 main.py inspect --pdes 433
NEO 433 (Eros) has a diameter of 16.840 km and is not potentially hazardous.

# Verbosely inspect the NEO named Ganymed, listing its close approaches.
$ python3 main.py inspect --verbose --name Ganymed
NEO 1036 (Ganymed) has a diameter of 37.675 km and is not potentially hazardous.
- On 1911-10-15 19:16, '1036 (Ganymed)' approaches Earth at a distance of 0.38 au and a velocity of 17.09 km/s.
- On 1924-10-17 00:51, '1036 (Ganymed)' approaches Earth at a distance of 0.50 au and a velocity of 19.36 km/s.
- On 1998-10-14 05:12, '1036 (Ganymed)' approaches Earth at a distance of 0.46 au and a velocity of 13.64 km/s.
- On 2011-10-13 00:04, '1036 (Ganymed)' approaches Earth at a distance of 0.36 au and a velocity of 14.30 km/s.
- On 2024-10-13 01:56, '1036 (Ganymed)' approaches Earth at a distance of 0.37 au and a velocity of 16.33 km/s.
- On 2037-10-15 18:31, '1036 (Ganymed)' approaches Earth at a distance of 0.47 au and a velocity of 18.68 km/s.

Don't forget that you can use the interactive subcommand to repeatedly inspect NEOs without having to reload the database each time!

Task 3: Query close approaches with user-specified criteria.

Woohoo! You're making real progress. We can extract data from structured files, create NearEarthObject and CloseApproach instances to represent that data, and capture the data in an NEODatabase. Now, we'll provide the ability to query the data set of close approaches for a limited size stream of matching results.

We'll split this task up into a few steps:

  1. Create a collection of Filters from the options given by the user at the command line.
  2. Query the database's collection of close approaches to generate a stream of matching close approaches.
  3. Limit the stream of results to at most some given maximum number.

There are several filters that we'll be implementing, corresponding to options from the query subcommand:

  • Date (--date, --start-date, --end-date)
  • Distance (--min-distance, --max-distance)
  • Velocity (--min-velocity, --max-velocity)
  • Diameter (--min-diameter, --max-diameter)
  • Hazardous (--hazardous, --not-hazardous)

Of these, the date, distance, and velocity filters apply to attributes of an instance of CloseApproach, whereas the diameter and hazardous filters apply to attributes of an instance of NearEarthObject. The date filter operates on Python date and datetime objects; the distance, velocity, and diameter filters operate on floats, and the hazardous filter operates on booleans.

You have a lot of design freedom in the first and second steps. They are closely related, so it's a good idea to start with just one filter type (distance, perhaps) in step 1, so that you can build and test step 2. Once step 1 and step 2 are working with a single filter type, you can expand to implement each of the rest of the filters. You can also leverage the tests (in tests.test_query, with python3 -m unittest --verbose tests.test_query) to measure your steady progress through the first two steps.

Task 3a: Creating filters.

For this step, you'll implement the create_filters function in the filters.py file. The main.py script calls this function with the options that the user provided at the command line.

def create_filters(date=None, start_date=None, end_date=None,
                   distance_min=None, distance_max=None,
                   velocity_min=None, velocity_max=None,
                   diameter_min=None, diameter_max=None,
                   hazardous=None):

If the user didn't provide an option, its value will be None. Note that, if the user specifies --not-hazardous, the value of the hazardous argument will be False, not to be confused with None.

You have tons of flexibility in what this function returns. The main.py script takes whatever it receives and passes it directly to the query method that you'll implement in Task 3b.

Designing a program with this much flexibility can be daunting, so we've prepared a first step for one possible approach (from which you can, and likely will, deviate) - under this plan, the create_filters function will produce a collection of instances of subclasses of AttributeFilter - a helper class we've already provided to you. You don't need to rely on AttributeFilter or even use it at all - you can delete it and pursue your own implementation design - but here's the idea:

What do these filters have in common? Each of them compares (with <=, ==, or >=) some attribute (of a CloseApproach or a NearEarthObject) to a reference value. For example, the date filters check if the close approach date is equal to, less than or equal to, or greater than or equal to the date given on the command line. So, the three things that seem to be shared between all of our filters are (1) a way to get the attribute we're interested in and (2) a way to compare that attribute against (3) some reference value. Where there's shared behavior, there's an opportunity for decomposition.

class AttributeFilter:
    def __init__(self, op, value):
        self.op = op
        self.value = value

    def __call__(self, approach):
        return self.op(self.get(approach), self.value)

    @classmethod
    def get(cls, approach):
        raise UnsupportedCriterionError

The three elements are present in the AttributeFilter superclass - in (1) the class method AttributeFilter.get, (2) the op argument to the constructor, and (3) the value argument to the constructor.

This abstract superclass's get method raises UnsupportedCriterionError, a custom subclass of NotImplementedError, but concrete subclasses will be able to override this method to actually get a specific attribute of interest. The op argument will represent the operation corresponding to either <=, ==, or >= - Python's operator module makes these available to us as operator.le, operator.eq, and operator.ge. That is, operator.ge(a, b) is the same as a >= b. Lastly, the value will just be our target value, as supplied by the user at the command line and fed to create_filters by the main module.
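
As a quick illustration of the operator functions mentioned above (the variable names are only for demonstration):

```python
import operator

# Each function takes (a, b) and returns what the infix operator would.
print(operator.le(0.1, 0.3))  # Same as 0.1 <= 0.3
print(operator.eq(0.1, 0.3))  # Same as 0.1 == 0.3
print(operator.ge(0.1, 0.3))  # Same as 0.1 >= 0.3

# Because they are ordinary objects, they can be stored now and applied later.
compare = operator.ge
result = compare(25.0, 10.0)
```

This is what lets a filter hold its comparison as data: the op stored on the instance is simply called with the attribute and the reference value.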

The __call__ method makes instance objects of this type behave as callables - if we have an instance of a subclass of AttributeFilter named f, then the code f(approach) is really evaluating f.__call__(approach). Specifically, "calling" the AttributeFilter with a CloseApproach object will get the attribute of interest (self.get(approach)) and compare it (via self.op) to the reference value (self.value), returning either True or False, representing whether that close approach satisfies the criterion.

As an example, suppose that we wanted to build an AttributeFilter that filtered on the designation attribute of the NearEarthObject attached to a CloseApproach (really, we wouldn't ever need this, because primary designations are unique and we already have NEODatabase.get_neo_by_designation). We could define a new subclass of AttributeFilter:

class DesignationFilter(AttributeFilter):
    @classmethod
    def get(cls, approach):
        return approach.neo.designation

We could then create and use an instance of this new class:

approach_433 = CloseApproach(...)
approach_other = CloseApproach(...)
f = DesignationFilter(operator.eq, '433')
f(approach_433)  # => True
f(approach_other)  # => False

This might seem complex - and it is. Are there different ways to do this? Well, yes. However, this is a relatively clean first approach, and the AttributeFilter is a first step towards unifying these filters, from which you can deviate freely.
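
To make this plan concrete, here is a hedged sketch of how create_filters could assemble filters for a single criterion (distance). The DistanceFilter name and the trimmed-down create_filters signature are hypothetical, the base class raises NotImplementedError in place of the project's UnsupportedCriterionError, and SimpleNamespace objects stand in for CloseApproach instances.

```python
import operator
from types import SimpleNamespace


class AttributeFilter:
    def __init__(self, op, value):
        self.op = op
        self.value = value

    def __call__(self, approach):
        return self.op(self.get(approach), self.value)

    @classmethod
    def get(cls, approach):
        raise NotImplementedError


class DistanceFilter(AttributeFilter):
    @classmethod
    def get(cls, approach):
        return approach.distance


def create_filters(distance_min=None, distance_max=None):
    """Build filters only for the options the user actually supplied."""
    filters = []
    if distance_min is not None:
        filters.append(DistanceFilter(operator.ge, distance_min))
    if distance_max is not None:
        filters.append(DistanceFilter(operator.le, distance_max))
    return filters


near = SimpleNamespace(distance=0.05)
far = SimpleNamespace(distance=0.4)
filters = create_filters(distance_max=0.1)
print([f(near) for f in filters], [f(far) for f in filters])
```

The checks use `is not None` rather than truthiness so that legitimate falsy values, such as a minimum of 0 or hazardous=False, still produce a filter.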

On Comparing Dates

So far, we've been treating dates (naive Python objects that store a year, month, and day) and datetimes (naive Python objects that store a year, month, day, hour, minute, and seconds) as essentially interchangeable. Mostly, we haven't cared too much about the details. However, dates and datetimes are not comparable (would "May 1st" be before, after, or equal to "May 1st at noon"?).

The date, start_date, and end_date arguments supplied to create_filters are dates, but the .time attribute of a CloseApproach is a datetime. You can use the .date() method on datetime objects to get the corresponding moment as a date. That is, you aren't able to evaluate start_date <= approach.time <= end_date, but you are able to evaluate start_date <= approach.time.date() <= end_date.
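
A short demonstration of the difference, using made-up values:

```python
import datetime

start_date = datetime.date(2020, 1, 1)
end_date = datetime.date(2020, 12, 31)
approach_time = datetime.datetime(2020, 5, 1, 12, 0)

# Ordering a date against a datetime is rejected by Python.
try:
    start_date <= approach_time
    mixed_comparison_ok = True
except TypeError:
    mixed_comparison_ok = False

# Converting the datetime to a date first makes the comparison well-defined.
in_range = start_date <= approach_time.date() <= end_date
print(mixed_comparison_ok, in_range)
```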

Task 3b: Query the database of close approaches using user-specified criteria.

Let's turn our attention back to the database.py file. For this task, you'll implement the query method, which will generate a stream of CloseApproaches that match the user's criteria.

The query method accepts one argument - a collection of filters. The main.py script supplies to the query method whatever was returned from the create_filters function you implemented above.

You have a lot of freedom in how you implement this method - your implementation choice depends heavily on how you designed your filters in the previous section. In pseudo-code, we roughly expect the implementation to look something like the following:

define query(filters):
  for each approach in the database's collection of close approaches:
    if this close approach passes each of the criteria:
      yield this close approach

As before, you can certainly deviate from this pattern, especially depending on how you chose to implement the previous step.
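
Translated into Python, the pseudocode might become something like the sketch below. It assumes the filters are callables that return True or False (as in the AttributeFilter plan), and it uses plain floats in place of CloseApproach objects to stay self-contained. The _approaches attribute name is hypothetical.

```python
class NEODatabase:
    def __init__(self, approaches):
        self._approaches = list(approaches)

    def query(self, filters=()):
        """Generate the approaches that satisfy every filter."""
        for approach in self._approaches:
            if all(f(approach) for f in filters):
                yield approach


# Plain floats stand in for CloseApproach objects in this illustration.
db = NEODatabase([0.05, 0.2, 0.4])
matches = db.query([lambda d: d >= 0.1, lambda d: d <= 0.3])
result = list(matches)
print(result)
```

With an empty collection of filters, all() returns True for every approach, so the whole collection is streamed through unfiltered.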

Why yield? Recall that when we use yield in a Python function, it becomes a generator function, capable of pausing and resuming. Generators are often useful to represent sources of data streams. In our project, there might be thousands of close approaches matching the user's criteria, but we might only need to show the first ten (specified with the --limit command-line option). For these cases, we'll want the query function not to return a fully-computed collection of matching close approaches - which could take a while to compute - but rather to generate a stream of matching close approaches. In doing so, we'll make the query method almost instantaneous, and only do the work to determine the next element of the generator (the next matching CloseApproach) if another unit of code asks for it.
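
The following toy example (unrelated to the NEO data) shows that a generator performs work only when a value is requested. The examined list exists purely to record how much of the input has been processed.

```python
examined = []


def matching(values):
    for value in values:
        examined.append(value)  # Record the work actually performed.
        if value % 2 == 0:
            yield value


stream = matching(range(1_000_000))
before = len(examined)  # Nothing has run yet.
first = next(stream)    # Runs only until the first match is found.
after = len(examined)
print(before, first, after)
```

Creating the generator examines nothing; asking for one result examines only as many inputs as it takes to find that result.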

There are a plethora of other ways to optimize this method as well. For example, you could preprocess even more auxiliary data structures in the NEODatabase constructor to speed up specific queries. You might map dates to collections of close approaches that occurred on those dates, to speed up the --date criterion. You might order the close approaches by distance or velocity, or the NEOs by diameter, in order to more efficiently search for matches. Furthermore, you might be able to intelligently combine filters - for example, there are definitely no close approaches that are simultaneously closer than 0.1au (--max-distance 0.1) to Earth and further than 0.3au (--min-distance 0.3) from Earth. Depending on the exact approach you take, some of these changes may affect the design of your filters or the create_filters function, but there are many opportunities for performance improvements.

However, while these additional optimizations are certainly interesting - and in many cases can speed up the time it takes to perform complex queries - they are in no way necessary to successfully complete this task. By following the pseudocode given above, you can query the collection of close approaches to generate (with yield) a stream of results that match user-specified criteria.

Task 3c: Limit the results to at most some maximum number.

After the main.py script runs .query on the NEODatabase with the objects you produced in create_filters, it sends the stream of results through the limit function in the filters module. This is the next function that we'll write.

def limit(iterator, n):
    ...

The first argument - iterator - represents a stream of data, as an iterable. In our pipeline, it will be the stream of CloseApproaches produced by the query method. The second argument - n - represents the maximum number of elements from the stream that might be produced by the limit function. If n is None or zero, you shouldn't limit the results at all.

You should not treat the iterator argument as being an in-memory aggregate data type, such as a list or a tuple. In particular, you should not slice the iterator argument.

Why restrict ourselves in this way? With any sufficiently large dataset, we'd usually like to do the minimum number of operations necessary to achieve our goal. As just discussed, there are some queries for which, if we simply calculated and buffered all matching close approaches from the query method and sliced the result, the runtime would be just too slow. Although our data set may be small enough for the naive solution to be possible, it's still big enough to illustrate noticeably improved performance from leveraging operations on iterators and generators.

As a hint, (although not necessary) you may find the itertools.islice function helpful.
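
For instance, one possible implementation built on that hint is sketched here. The default of n=None is a choice made for this sketch.

```python
import itertools


def limit(iterator, n=None):
    """Produce at most n values from the stream; no limit if n is None or 0."""
    if not n:
        return iterator
    return itertools.islice(iterator, n)


first_three = list(limit(iter(range(10)), 3))
unlimited = list(limit(iter(range(4)), None))
from_endless = list(limit(itertools.count(), 2))
print(first_three, unlimited, from_endless)
```

Because islice pulls values one at a time and stops after n, it works even on an endless stream such as itertools.count(), which could never be sliced as a list.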

Testing

It's getting a little harder to manually test your implementations.

At the command line, as you implement more and more individual filters (and their effect on query), you'll unlock more and more of the options of the query subcommand. When this task is finished, the query subcommand will work completely, with the exception of --outfile. Here are a few examples:

# Query for close approaches on 2020-01-01
$ python3 main.py query --date 2020-01-01

# Query for close approaches in 2020.
$ python3 main.py query --start-date 2020-01-01 --end-date 2020-12-31

# Query for close approaches in 2020 with a distance of <=0.1 au.
$ python3 main.py query --start-date 2020-01-01 --end-date 2020-12-31 --max-distance 0.1

# Query for close approaches in 2020 with a distance of >=0.3 au.
$ python3 main.py query --start-date 2020-01-01 --end-date 2020-12-31 --min-distance 0.3

# Query for close approaches in 2020 with a velocity of <=50 km/s.
$ python3 main.py query --start-date 2020-01-01 --end-date 2020-12-31 --max-velocity 50

# Query for close approaches in 2020 with a velocity of >=25 km/s.
$ python3 main.py query --start-date 2020-01-01 --end-date 2020-12-31 --min-velocity 25

# Query for close approaches of not potentially-hazardous NEOs between 500m and 600m in diameter.
$ python3 main.py query --min-diameter 0.5 --max-diameter 0.6 --not-hazardous

# Query for close approaches of potentially-hazardous NEOs larger than 2.5km passing within 0.1 au at a speed of at least 35 km/s
# Hint: There's only one match in the whole dataset :)
$ python3 main.py query --max-distance 0.1 --min-velocity 35 --min-diameter 2.5 --hazardous

There are more examples at the start of this README and in the main.py file's module comment.

In some cases, you might want to inspect an NEO to check that the diameter and hazardous filters behave correctly.

Again, recall that you can use the interactive subcommand to load the database once and perform several query and inspect commands, which will avoid excessively waiting for your code to reload the database with each command.

Additionally, you can use the pre-written unit tests to exercise each of these steps. You can read the test files if you'd like to see exactly which test cases we use.

$ python3 -m unittest tests.test_query tests.test_limit

There are a total of 37 unit tests for this task. You can use these tests during development as well. As you implement individual filter types, you'll pass more and more of the tests.

When this task is complete, all tests should pass.

Task 4: Report the results.

Fantastic! You've successfully written code to filter and limit the database of close approaches with user-specified criteria. So far, the results have been simply printed to standard output.

For this task, you'll implement functions in write.py to save these results to an output file. You'll write two functions:

  • write_to_csv: Write a stream of CloseApproach objects to a specific CSV file.
  • write_to_json: Write a stream of CloseApproach objects to a specific JSON file.

Each of these functions accepts two arguments: results and filename.

The results parameter is a stream of CloseApproach objects, as produced by the limit function. The filename parameter is a Path-like object with the name of the output file. You can open(filename, 'w') as usual.

If there are no results, then write_to_csv should just write a header row, and write_to_json should just write an empty list.

CSV Output Format

The write_to_csv function should write a stream of results to a CSV file and include a header row. Each row will represent one CloseApproach from the stream of results, and include information about the close approach as well as the associated NEO. The header columns should be: 'datetime_utc', 'distance_au', 'velocity_km_s', 'designation', 'name', 'diameter_km', 'potentially_hazardous'.

As an example, consider the CloseApproach when the NEO Eros approaches Earth on 2025-11-30 02:18. For this close approach, the corresponding row would be:

datetime_utc,distance_au,velocity_km_s,designation,name,diameter_km,potentially_hazardous
...
2025-11-30 02:18,0.397647483265833,3.72885069167641,433,Eros,16.84,False
...

A missing name must be represented by the empty string (not 'None'). A missing diameter must be represented either by the empty string or by the string 'nan'. The potentially_hazardous flag should be either the string 'False' or the string 'True'.
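
A hedged sketch of write_to_csv that follows these rules is shown below. It assumes the attribute names from Task 1 (time_str, distance, velocity, neo, and so on), uses SimpleNamespace objects as stand-ins for the model classes, and writes to a temporary file so that it can run anywhere.

```python
import csv
import tempfile
from pathlib import Path
from types import SimpleNamespace

FIELDNAMES = ('datetime_utc', 'distance_au', 'velocity_km_s',
              'designation', 'name', 'diameter_km', 'potentially_hazardous')


def write_to_csv(results, filename):
    with open(filename, 'w', newline='') as outfile:
        writer = csv.DictWriter(outfile, fieldnames=FIELDNAMES)
        writer.writeheader()  # Written even when there are no results.
        for approach in results:
            writer.writerow({
                'datetime_utc': approach.time_str,
                'distance_au': approach.distance,
                'velocity_km_s': approach.velocity,
                'designation': approach.neo.designation,
                'name': approach.neo.name or '',       # Never the string 'None'.
                'diameter_km': approach.neo.diameter,  # NaN is written as 'nan'.
                'potentially_hazardous': str(approach.neo.hazardous),
            })


# Stand-ins for a NearEarthObject with no name or diameter, and its CloseApproach.
neo = SimpleNamespace(designation='2020 FK', name=None,
                      diameter=float('nan'), hazardous=True)
approach = SimpleNamespace(time_str='2020-01-01 12:30', distance=0.25,
                           velocity=56.78, neo=neo)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'results.csv'
    write_to_csv([approach], path)
    lines = path.read_text().splitlines()

print(lines)
```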

JSON Output Format

The write_to_json function should write a stream of results to a JSON file. The top-level JSON object must be a list, with each entry representing one CloseApproach from the stream of results. Each entry should be a dictionary mapping the keys 'datetime_utc', 'distance_au', 'velocity_km_s' to the associated values on the CloseApproach object and the key 'neo' to a dictionary mapping the keys 'designation', 'name', 'diameter_km', 'potentially_hazardous' to the associated values on the close approach's NEO.

As an example, consider the (same) CloseApproach when the NEO Eros approaches Earth on 2025-11-30 02:18. For this close approach, the corresponding entry would be:

[
  {...},
  {
    "datetime_utc": "2025-11-30 02:18",
    "distance_au": 0.397647483265833,
    "velocity_km_s": 3.72885069167641,
    "neo": {
      "designation": "433",
      "name": "Eros",
      "diameter_km": 16.84,
      "potentially_hazardous": false
    }
  },
  ...
]

The datetime_utc value should be a string formatted with datetime_to_str from the helpers module; the distance_au and velocity_km_s values should be floats; the designation and name should be strings (if the name is missing, it must be the empty string); the diameter_km should be a float (if the diameter_km is missing, it should be the JSON value NaN, which Python's json loader successfully rehydrates as float('nan')); and potentially_hazardous should be a boolean (i.e. the JSON literals false or true, not the strings 'False' nor 'True').
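
The round trip for these value types can be checked with a small, self-contained experiment. The entry below is made up, and json.dumps is used here only to keep the output in memory; write_to_json would call json.dump with an open file instead.

```python
import json
import math

# A made-up entry for an NEO with no name and no known diameter.
entry = {
    'datetime_utc': '2020-01-01 12:30',
    'distance_au': 0.25,
    'velocity_km_s': 56.78,
    'neo': {
        'designation': '2020 FK',
        'name': '',
        'diameter_km': float('nan'),
        'potentially_hazardous': True,
    },
}

text = json.dumps([entry], indent=2)
restored = json.loads(text)
print(text)
print(math.isnan(restored[0]['neo']['diameter_km']))
```

Python's json module emits float('nan') as the bare token NaN and Python booleans as true or false, so no special handling is needed for those two cases.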

Deduplicating Serialization

It can feel as though this output specification includes several edge cases. Fortunately, with the right design, Python's default behavior will handle these edge cases smoothly. While you are free to concretely implement these methods in any way you would like, we recommend that you add .serialize() methods to the NearEarthObject and CloseApproach classes that each produce a dictionary containing relevant attributes for CSV or JSON serialization. These methods can individually handle any edge cases, in a single place. For example:

>>> neo = NearEarthObject(...)
>>> approach = CloseApproach(...)
>>> print(neo.serialize())
{'designation': '433', 'name': 'Eros', 'diameter_km': 16.84, 'potentially_hazardous': False}
>>> print(approach.serialize())
{'datetime_utc': '2025-11-30 02:18', 'distance_au': 0.397647483265833, 'velocity_km_s': 3.72885069167641}
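
A minimal sketch of such methods, and of how the two writers might combine them, could look like this. The constructors are simplified for illustration (the CloseApproach here takes an already-formatted time string), so treat the signatures as assumptions rather than the project's design.

```python
class NearEarthObject:
    def __init__(self, designation, name=None, diameter=float('nan'), hazardous=False):
        self.designation = designation
        self.name = name
        self.diameter = diameter
        self.hazardous = hazardous

    def serialize(self):
        return {
            'designation': self.designation,
            'name': self.name or '',  # A missing name becomes the empty string.
            'diameter_km': self.diameter,
            'potentially_hazardous': self.hazardous,
        }


class CloseApproach:
    def __init__(self, time_str, distance, velocity, neo):
        self.time_str = time_str
        self.distance = distance
        self.velocity = velocity
        self.neo = neo

    def serialize(self):
        return {
            'datetime_utc': self.time_str,
            'distance_au': self.distance,
            'velocity_km_s': self.velocity,
        }


eros = NearEarthObject('433', 'Eros', 16.84)
approach = CloseApproach('2025-11-30 02:18', 0.397647483265833, 3.72885069167641, eros)

# CSV rows are flat; JSON entries nest the NEO under the 'neo' key.
csv_row = {**approach.serialize(), **approach.neo.serialize()}
json_entry = {**approach.serialize(), 'neo': approach.neo.serialize()}
print(csv_row)
print(json_entry)
```

With this split, the missing-name rule lives in one place, and both output formats inherit it.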

Testing

Congratulations! This was the final task for this project.

At this point, all of the unit tests should pass. You can run all of the unit tests:

$ python3 -m unittest
.........................................................................
----------------------------------------------------------------------
Ran 73 tests in 3.666s

OK

Heck, run it with python3 -m unittest --verbose to verbosely celebrate all of the test cases that you have now made pass.

Tests for this specific task are in the tests.test_write module.

Furthermore, the complete functional interface of the command line tool should now work. Therefore, you can now use main.py freely (including the --outfile argument). For example:

# Save (the first) five close approaches on 2020-01-01 to a CSV file.
$ python3 main.py query --date 2020-01-01 --limit 5 --outfile results.csv

# Save (the first) five close approaches on 2020-01-01 to a JSON file.
$ python3 main.py query --date 2020-01-01 --limit 5 --outfile results.json

# Putting it all together.
# Save (the first) five close approaches between 2020-01-01 and 2020-12-31 of a potentially-hazardous NEO larger than 250m in diameter that passed within 0.1au of Earth to a JSON file.
$ python3 main.py query --start-date 2020-01-01 --end-date 2020-12-31 --hazardous --min-diameter 0.25 --max-distance 0.1 --limit 5 --outfile results.json

Recap

We've reviewed a lot of information. Here's a high-level overview of the main parts of each task.

  • Task 0: Inspect data. (data/neos.csv and data/cad.json)
  • Task 1: Build models. (models.py)
    • Write __init__ and __str__ methods for NearEarthObject and CloseApproach
  • Task 2a: Extract data. (extract.py)
    • Implement load_neos and load_approaches to read data from CSV and JSON files.
  • Task 2b: Process data. (database.py)
    • Implement the constructor for NEODatabase, preprocessing the data to help with future queries.
    • Write methods to get NEOs by primary designation or by name.
  • Task 3a: Create filters. (filters.py)
    • Define a hierarchy of Filters.
    • Implement create_filters to create a collection of filters from user-specified criteria.
  • Task 3b: Query matching close approaches. (database.py)
    • Implement the query method to generate a stream of CloseApproaches that match the given filters.
  • Task 3c: Limit results. (filters.py)
    • Write limit to produce only the first values from a generator.
  • Task 4: Save data. (write.py)
    • Implement write_to_csv and write_to_json to save structured data to a formatted file.
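To make the Task 3c item concrete, here is a minimal sketch of a limit function built on itertools.islice from the standard library. The signature, and the convention that an n of 0 or None means "no limit", are assumptions made for this illustration rather than requirements stated in this section.

```python
import itertools


def limit(iterator, n=None):
    """Produce at most the first n values of iterator.

    Treating n of 0 or None as "no limit" is an assumed convention.
    """
    if not n:
        return iter(iterator)
    # islice consumes lazily, so the source is never fully materialized.
    return itertools.islice(iterator, n)


def numbers():
    """Endless generator, used to show that limit stops early."""
    n = 0
    while True:
        yield n
        n += 1


first_five = list(limit(numbers(), 5))
print(first_five)
```

Because islice pulls values one at a time, this works even on a generator that never ends, which is the same property that lets a query stream matching close approaches without building a full list first.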

Development Environments

This project requires Python 3.6+. To see the version of your environment's Python 3, run python3 -V at the command line. You should see: Python 3.X.Y where X >= 6.

Fortunately, this project has no dependencies external to the Python standard library, so there's no need for virtual environments.

All of the examples use the python3 executable. Only if your environment's python -V is also Python 3.6+ can you use python instead of python3.
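The same version check can be made from inside Python. This is only a sketch using sys.version_info from the standard library; it is not the contents of the project's own version test.

```python
import sys

# The project requires Python 3.6 or newer.
REQUIRED = (3, 6)

# version_info compares like a tuple, so this checks major and minor together.
meets_requirement = sys.version_info >= REQUIRED

print(f"Running Python {sys.version_info.major}.{sys.version_info.minor}")
print("OK" if meets_requirement else "Python 3.6+ is required")
```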

There are two primary environments in which you might accomplish this project: (1) in a Udacity classroom workspace; (2) locally, on a machine you control.

Udacity Workspace

Within the Udacity course, under the "Project: Exploring Near-Earth Objects" lesson, find the "Project Workspace" page and click on it. You'll be taken to a development environment with a file explorer, a code editor, and a command line. This workspace already contains all of the necessary files within the /home/workspace folder - you're ready to get started!

Local Development

First, clone the project to your local machine with git clone https://github.com/udacity/nd303-c1-advanced-python-techniques-project-starter.git, and then navigate to the project directory (the one containing main.py).

Check Your Setup

As you settle into your development environment, run the following unit tests to check that your environment is set up correctly. All of the tests should pass, even on the starter code.

$ python3 -m unittest --verbose tests.test_python_version
test_python_version_is_at_least_3_6 (tests.test_python_version.TestPythonVersion) ... ok

----------------------------------------------------------------------
Ran 1 test in X.XXXs

OK
$ python3 -m unittest --verbose tests.test_data_files
test_data_files_are_not_empty (tests.test_data_files.TestDataFiles) ... ok
test_data_files_are_readable (tests.test_data_files.TestDataFiles) ... ok
test_data_files_are_well_formatted (tests.test_data_files.TestDataFiles) ... ok
test_data_files_exist (tests.test_data_files.TestDataFiles) ... ok

----------------------------------------------------------------------
Ran 4 tests in X.XXXs

OK

If any of the tests fail, you should fix the causes of error before beginning this project.

Deliverables

Upon completing this project, you'll have modified at least database.py, extract.py, filters.py, models.py, and write.py. If you went above and beyond with any Stand Out Suggestions, include your changes alongside an EXTENSIONS.md file describing them so that the reviewers can understand what you've done.

Over the course of this project (specifically, in Task 4), you've likely created several output files. You should remove these files before submitting your project.

Rubric

In addition to the functionality requirements detailed above, your submission will be assessed on how well it follows best practices in Python. Roughly speaking, "best practices" can be divided into two categories - mechanics and design.

Good Python mechanics adhere to PEP 8 - the style guide for Python code - and PEP 257 - docstring conventions. These address rules for naming, spacing, commenting, and several common programming patterns. Additionally, good general programming mechanics that you are expected to follow include removing starter code markings, removing extraneous print statements, and documenting your code.

Python design refers to the higher-level organization of your code - the interfaces and implementation boundaries defined by your code objects. Many interface and implementation boundaries are already imposed by the organization of the starter code; however, there are still several situations (particularly in Tasks 3 and 4) in which the organization of your code can reflect well or poorly on the organization of the problem and your choice of solution.

Complete details on grading criteria are available in the attached rubric.
