update class.md #203

Open
wants to merge 43 commits into
base: main
43 commits
a67da2a
Transcript of my data science studies plan.
scdavis50 Mar 23, 2016
244c3c6
Updated transcript fixing the formatting.
scdavis50 Mar 23, 2016
bcbc0f4
Update scott-davis-transcript.md
scdavis50 Mar 23, 2016
f306f38
Update scott-davis-transcript.md
scdavis50 Mar 23, 2016
4464351
Update scott-davis-transcript.md
scdavis50 Mar 23, 2016
7e8245f
Added a couple of resources, fixed tags
scdavis50 Apr 17, 2016
70ca806
Updates to transcript
scdavis50 Apr 18, 2016
d22bab8
additional updates, fixed some formatting
scdavis50 Apr 18, 2016
de7679e
updates
scdavis50 Apr 18, 2016
386dce9
corrected a link
scdavis50 Apr 19, 2016
aef621d
Updated with new algorithm certificaiton
scdavis50 Apr 22, 2016
5590047
Update with edx classes
scdavis50 Apr 29, 2016
7be0b77
Updated with websites, along with completions to date
scdavis50 May 2, 2016
ee559df
Updated with course completions
scdavis50 May 7, 2016
2c84ef2
Updated completions
scdavis50 May 12, 2016
208dd78
Completed Developing Data products class
scdavis50 May 23, 2016
e47f699
updated with edx materials
scdavis50 May 23, 2016
791674e
Finished webscraping with python
scdavis50 May 24, 2016
5cf800e
finished data science from scratch
scdavis50 May 24, 2016
bd079aa
Updates for completions
scdavis50 Jun 13, 2016
5c4ef33
Added completion of coursera data science
scdavis50 Jul 17, 2016
ef353f9
updated with algorithmic toolbox completion
scdavis50 Aug 20, 2016
8397e0e
updated with some additional books finished.
scdavis50 Aug 31, 2016
732cc15
updated with book completions
scdavis50 Sep 8, 2016
29b5eef
updated with book completions
scdavis50 Sep 15, 2016
f06e81c
updated with some additional books finished.
scdavis50 Sep 17, 2016
e836ef1
Create jekyll-gh-pages.yml
clarecorthell Nov 15, 2022
e464450
Update expired link
phuongdoan13 Nov 18, 2022
da46337
NLTK book URL fix
florianbuetow Mar 2, 2023
2d91ff0
README.md Updated
aniketpotabatti Apr 11, 2023
9539dec
Merge pull request #201 from aniketpotabatti/master
clarecorthell Apr 16, 2023
f1691f4
Merge pull request #197 from phuongdoan13/patch-1
clarecorthell Apr 16, 2023
b480dff
Merge pull request #200 from florianbuetow/nltk-book-url-fix
clarecorthell Apr 16, 2023
85d1257
Update README.md
clarecorthell Apr 16, 2023
2c3e09c
Update README.md
clarecorthell Apr 17, 2023
3fd6f15
Update README.md
clarecorthell Apr 17, 2023
2d8010a
Update README.md
clarecorthell Apr 17, 2023
a958072
Update README.md
clarecorthell Apr 18, 2023
c1109b4
Update README.md
clarecorthell Apr 18, 2023
c5da9ac
Update README.md
clarecorthell Apr 18, 2023
5be9ca7
Merge pull request #95 from scdavis50/master
clarecorthell Apr 18, 2023
3e4e19c
Update README.md
clarecorthell Apr 18, 2023
446827f
minor changes
fredsys2016 Aug 19, 2023
50 changes: 50 additions & 0 deletions .github/workflows/jekyll-gh-pages.yml
@@ -0,0 +1,50 @@
# Sample workflow for building and deploying a Jekyll site to GitHub Pages
name: Deploy Jekyll with GitHub Pages dependencies preinstalled

on:
  # Runs on pushes targeting the default branch
  push:
    branches: ["master"]

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
permissions:
  contents: read
  pages: write
  id-token: write

# Allow one concurrent deployment
concurrency:
  group: "pages"
  cancel-in-progress: true

jobs:
  # Build job
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Setup Pages
        uses: actions/configure-pages@v2
      - name: Build with Jekyll
        uses: actions/jekyll-build-pages@v1
        with:
          source: ./
          destination: ./_site
      - name: Upload artifact
        uses: actions/upload-pages-artifact@v1

  # Deployment job
  deploy:
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    needs: build
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v1
17 changes: 12 additions & 5 deletions README.md
@@ -64,13 +64,19 @@ Get familiar and comfortable with manipulating data in a database with a common
* SQL School [Mode Analytics / Tutorials](http://bit.ly/sqlschool)

### Math & Statistics

#### Calculus
* Single Variable Calculus [MIT OpenCourseWare](http://ocw.mit.edu/courses/mathematics/18-01-single-variable-calculus-fall-2006/)
* Multivariable Calculus [MIT OpenCourseWare](http://ocw.mit.edu/courses/mathematics/18-02sc-multivariable-calculus-fall-2010/)

#### Linear Algebra
The foundational mathematics for working with large samples of data. Spend time in exercises until you feel highly confident in the key topics of Linear Algebra. It will serve you well.
* An Intuitive Guide to Linear Algebra [Better Explained / Article](https://betterexplained.com/articles/linear-algebra-guide/)
* A Programmer's Intuition for Matrix Multiplication [Better Explained / Article](https://betterexplained.com/articles/matrix-multiplication/)
* Vector Calculus: Understanding the Cross Product [Better Explained / Article](https://betterexplained.com/articles/cross-product/)
* Vector Calculus: Understanding the Dot Product [Better Explained / Article](https://betterexplained.com/articles/vector-calculus-understanding-the-dot-product/)
* Linear Algebra [Khan Academy / Videos](http://bit.ly/khanlinalg)
* Linear Algebra [MIT](http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/)

#### Statistics
How can we answer questions with data? Everywhere you look, you'll see methods from statistics. Spend a lot of time here!
@@ -120,7 +126,7 @@ A branch of statistics that uses graphical models and specialized statistics to
### Natural Language Processing
The imperfect and immensely useful art (science?) of transforming human language into data.
* From Languages to Information / Stanford CS147 [Materials](http://bit.ly/nlpcs124)
* NLP with Python (NLTK library) [Digital](http://bit.ly/ebook-nltk), [Book ```$55```](https://bookshop.org/a/2958/9780596516499)
* NLP with Python (NLTK library) [Digital](http://bit.ly/py-nltk), [Book ```$55```](https://bookshop.org/a/2958/9780596516499)
* How to Write a Spelling Correcter / Norvig [Tutorial](http://norvig.com/spell-correct.html)

### Graph Analysis
@@ -153,6 +159,7 @@ If you have interest in operations management, manufacturing, supply chains, or

### Deep Learning / Neural Networks
* Neural Networks [Andrej Karpathy / Python Walkthrough](http://bit.ly/karpathyneuralnets)
* Neural Networks for Machine Learning [Geoffrey Hinton / U Toronto](https://www.youtube.com/playlist?list=PLoRl3Ht4JOcdU872GhiYWf6jwrk_SNhz9)
* Deep Learning for Natural Language Processing CS224d [Stanford](http://cs224d.stanford.edu/syllabus.html)

## 🤝 Doing Data Science
@@ -197,9 +204,9 @@ A document conveying the motives, direction, investment, and expected value of t
#### Results Presentation
A slide deck or document with the goal of conveying the results of the work and how the findings support an important decision(s).

Best appended to the Spec, and summarized in a slide deck for easy consumption. Depending on the culture of the group, slides or a short docuemnt may be easier to look through to understand the results of the work. In the remote work era, think about how your work will be passed around and make sure your "above the fold" is easy to understand and clearly conveys the "why" and results in particular.
Best appended to the Spec, and summarized in a slide deck for easy consumption. Depending on the culture of the group, slides or a short document may be easier to look through to understand the results of the work. In the remote work era, think about how your work will be passed around and make sure your "above the fold" is easy to understand and clearly conveys the "why" and results in particular.

__Example__: A particularly polished [presentation](https://medium.com/lyft-engineering/how-lyft-discovered-openstreetmap-is-the-freshest-map-for-rideshare-a7a41bf92ec) of [map quality study results](https://drive.google.com/file/d/1Sb-dOUjeP1Ljqz4ra931D3Pe8B5C3pde/view) showing higher data quality in US maps on OSM than commercially available alternatives. The impact of this work was a) increased confidence in service reliability and b) enabled the company to decide against buying a commercially available annual license costing ~$10mi/yr.
__Example__: A particularly polished [presentation](https://medium.com/lyft-engineering/how-lyft-discovered-openstreetmap-is-the-freshest-map-for-rideshare-a7a41bf92ec) of [map quality study results](https://drive.google.com/file/d/1Sb-dOUjeP1Ljqz4ra931D3Pe8B5C3pde/view) showing higher data quality in US maps on OSM than commercially available alternatives. The impact of this work was a) increased confidence in service reliability for the company and b) enabled the company to decide against buying a commercially available annual license costing millions of dollars annually.

## 🧑‍💻 Capstone Project
_Choose a meaningful project or dataset to demonstrate what you've learned._
@@ -229,9 +236,8 @@ Show the process you used to disprove your hypothesis, preferably in a jupyter n
* Exploratory Data Analysis [Tukey / Book ```$81```](http://amzn.to/1kNUEPa) [```$113```](https://bookshop.org/books/exploratory-data-analysis-classic-version/9780134995458)
* Mining Massive Data Sets / Stanford [Course & Digital Textbook](http://bit.ly/mmds-course) & [Book ```$58```](https://bookshop.org/a/2958/9781108476348)
* Introduction to Information Retrieval / Stanford [Digital](http://bit.ly/ebook-stanford-inforetrieval) & [Book ```$70```](https://bookshop.org/a/2958/9780521865715)
* [Data Science in IPython Notebooks](http://bit.ly/ipynb-ds) (Linear Regression, Logistic Regression, Random Forests, K-Means Clustering)
* Probabilistic Graphical Models [Stanford / Coursera](http://bit.ly/stanford-pgm)
* Differential Equations in Data Science [Python Tutorial](http://bit.ly/ipynb-differentialeq)
* Differential Equations in Data Science [Python Tutorial](https://web.archive.org/web/20190617023702/https://nbviewer.jupyter.org/github/URXtech/techblog/blob/master/continuousTimeMarkovChain/markovChain.ipynb)
* Algorithm Design, Kleinberg & Tardos [Book ```$125```](http://amzn.to/1iMnWm5)
* [Tidy Data in Python](http://www.jeannicholashould.com/tidy-data-in-python.html)
* Designing, Visualizing and Understanding Deep Neural Networks [Berkeley CS294-129](https://bcourses.berkeley.edu/courses/1453965/pages/cs294-129-designing-visualizing-and-understanding-deep-neural-networks)
@@ -243,6 +249,7 @@ Show the process you used to disprove your hypothesis, preferably in a jupyter n
* SQL Tutorials [SQLZOO / Tutorials](http://bit.ly/tut-sqlzoo)
* Machine Learning [Caltech / Edx](http://bit.ly/caltech-ml)
* A Course in Machine Learning [UMD / Digital Book](http://bit.ly/22WyV3N)
* Designing Data Intensive Applications [Book ```$56```](https://bookshop.org/a/2958/9781449373320)

***

2 changes: 2 additions & 0 deletions class.md
@@ -0,0 +1,2 @@
Freddy
Version control system
2 changes: 1 addition & 1 deletion r-resources.md
@@ -13,7 +13,7 @@ _[Note: The core of The Open Source Data Science Masters focuses on programmatic

#### Basic Statistics with R

* An Introduction to Statistical Learning [Book pdf](http://www-bcf.usc.edu/~gareth/ISL/ISLR%20First%20Printing.pdf) ^also a Machine Learning resource
* An Introduction to Statistical Learning [Book pdf](https://www.statlearning.com/) ^also a Machine Learning resource

#### Data Science with R
* Introduction to Data Science [Syracuse University / ebook](http://jsresearch.net/index.html)
133 changes: 133 additions & 0 deletions scott-davis-transcript.md
@@ -0,0 +1,133 @@
<h1>Scott Davis Transcript</h1>
<h2>Open Source Data Science Masters</h2>

<br>I'm going to have some time for independent study this year, so I plan to work through as much as possible. I work in the real estate industry, where we have so much data that isn't used for meaningful analysis, and the tools, though readily available, haven't caught up with the needs of real estate users. That's what I'm interested in working on. I use a lot of GIS and R, so my curriculum is tailored to follow [R](https://www.r-project.org/)/[Python](www.python.org) and [QGIS](www.qgis.org). I'm a bit of an open-source nut, so I like learning much better this way. I'm looking for people to connect with, and possibly to work on projects.<br>

Want to collaborate? Get in touch:
* [linkedin](http://www.linkedin.com/in/scottcdavis);
* [twitter](http://www.twitter.com/scottdavisCRE); or
* [email](mailto:[email protected])


<h2>Open Source Curriculum</h2>
<h3>Base Introduction</h3>
Data Science Introductions
- [ ] Intro to Data Science by UW / Coursera, online course
- [ ] Data Science Specialization by Johns Hopkins / Coursera
- [X] [Data Scientists Toolbox](https://www.coursera.org/account/accomplishments/certificate/UY4EBM46HL)
- [X] [R Programming](https://www.coursera.org/account/accomplishments/records/Va5vuEvGKyr7UyHEL)
- [X] [Getting and Cleaning Data](https://www.coursera.org/account/accomplishments/records/ENSGmvNfx24sANRW)
- [X] [Exploratory Data Analysis](https://www.coursera.org/account/accomplishments/records/2PPsRu2Us3sUehBQ)
- [X] [Reproducible Research]
- [ ] [Statistical Inference] (in progress)
- [ ] [Regression Models] (in progress)
- [X] [Practical Machine Learning]
- [ ] [Developing Data Products]
- [ ] [Data Science Capstone]
- [ ] [Data Science by Harvard](http://cs109.github.io/2015/) (online course)
- [ ] [Data Science with Open Source Tools](http://shop.oreilly.com/product/9780596802363.do)
- [50 Years of Data Science](http://pages.cs.wisc.edu/~anhai/courses/784-fall15/50YearsDataScience.pdf)
- [ ] [Datasmart](http://www.amazon.com/Data-Smart-Science-Transform-Information/dp/111866146X/ref=sr_1_1?s=books&ie=UTF8&qid=1458768727&sr=1-1&keywords=datasmart) - in Excel, but also works in LibreOffice and so much of business analytics is still in Excel.


<h3>Mathematics/Statistics</h3>
- [ ] [Statistics for Spatial Data, Revised Edition](http://www.wiley.com/WileyCDA/WileyTitle/productCd-1119114616.html)
- [ ] [Statistics for Spatio-Temporal Data](http://www.wiley.com/WileyCDA/WileyTitle/productCd-EHEP002348.html)
- [ ] [Linear Algebra](http://www.amazon.com/Linear-Algebra-Dover-Books-Mathematics/dp/048663518X)
- [ ] Problem-Solving Heuristics: [How to Solve It](http://www.amazon.com/How-Solve-It-Mathematical-Princeton/dp/069111966X)

<h3>Computing</h3>
R:
- [ ] [R in Action](https://www.manning.com/books/r-in-action-second-edition?a_bid=5c2b1e1d&a_aid=RiA2ed)
- [ ] [R Cookbook](http://shop.oreilly.com/product/9780596809164.do)
- [ ] [Forecasting: Principles and Practice](http://otexts.com/fpp/)

R Libraries/Task Views
* [ProjectTemplate](http://projecttemplate.net/index.html)
* Spatial Data [CRAN Task View: Analysis of Spatial Data](https://cran.r-project.org/web/views/Spatial.html)
* Spatio-Temporal Data [CRAN Task View: Handling and Analyzing Spatio-Temporal Data](https://cran.r-project.org/web/views/SpatioTemporal.html)
* Optimization [CRAN Task View: Optimization and Mathematical Programming](https://cran.r-project.org/web/views/Optimization.html)
* Finance [CRAN Task View: Empirical Finance](https://cran.r-project.org/web/views/Finance.html)

Python:
- [ ] [Dive Into Python](http://www.diveintopython.net/)
- [ ] [Google's Python Class](code.google.com/edu/languages/google-python-class/)
- [ ] [Python for Data Analysis](http://shop.oreilly.com/product/0636920023784.do)
- [ ] [Webscraping with Python](https://www.packtpub.com/big-data-and-business-intelligence/web-scraping-python)

QGIS:
- [X] [QGIS Tutorials and Tips](http://www.qgistutorials.com/en/)
- [X] [Mastering QGIS](https://www.packtpub.com/application-development/mastering-qgis)
- [ ] [Building Mapping Applications with QGIS](https://www.packtpub.com/application-development/building-mapping-applications-qgis)
- [ ] [GIS Tutorial Workbook 1](https://esripress.esri.com/display/index.cfm?fuseaction=display&websiteID=232&moduleID=1) This is for ArcView, but you can work the examples in QGIS too
- [ ] [GIS Tutorial Workbook 2: Spatial Analysis](https://esripress.esri.com/display/index.cfm?fuseaction=display&websiteID=230&moduleID=0) This is for ArcView, but you can work the examples in QGIS too
- [ ] [QGIS Map Design](https://locatepress.com/qmd) I've just thumbed through this, but it's beautiful and belongs on any list of GIS books.

MySQL:
- [ ] [Learn MySQL in One Video](https://www.youtube.com/watch?v=yPu6qV5byu4)
- [ ] [MySQL Workbench Starter](code.google.com/edu/languages/google-python-class/)

Octave:
- [ ] [GNU Octave Beginners Guide](https://www.packtpub.com/big-data-and-business-intelligence/gnu-octave-beginners-guide)
-
PostGIS/PostGRESQL:
- [ ] [PostGIS Essentials](https://www.packtpub.com/big-data-and-business-intelligence/postgis-essentials)
- [ ] [PostGRESQL Tutorial](http://www.postgresqltutorial.com/)
- [ ] [PostgreSQL: Up and Running: A Practical Introduction to the Advanced Open Source Database](http://shop.oreilly.com/product/0636920032144.do)

<h3>Algorithms</h3>
- [ ] [Algorithms Design & Analysis](http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=IntroToAlgorithms) Stanford openclassroom

<h3>Distributed Computing Paradigms</h3>
- [ ] Intro to Hadoop and MapReduce by Cloudera and Udacity
*Note: I might swap the above course with an EdX course on Apache Spark and distributed computing*

<h3>Data Mining</h3>
- [ ] Mining Massive Data Sets, by Stanford and Coursera
- [ ] [Clean Data](https://www.packtpub.com/big-data-and-business-intelligence/clean-data)

<h3>Machine Learning/Predictive Analytics - Foundational/Theoretical/Practical</h3>
- [ ] Machine Learning, by Ng Stanford and Coursera (NB this class requires a lot of higher level math)
- [ ] [An Introduction to Statistical Learning with Applications in R](http://www.r-bloggers.com/in-depth-introduction-to-machine-learning-in-15-hours-of-expert-videos/) (by the authors of The Elements of Statistical Learning at Stanford.)
- [ ] [Machine Learning with R](https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-r-second-edition)
- [ ] [Building a Recommendation System in R](https://www.packtpub.com/big-data-and-business-intelligence/building-recommendation-system-r)
- [ ] [Mastering Predictive Analytics in R](https://www.packtpub.com/application-development/mastering-predictive-analytics-r)
- [ ] [Bootstrapping Machine Learning](http://www.louisdorard.com/machine-learning-book/)
- [ ] [Applied Predictive Modeling](http://www.amazon.com/gp/product/1461468485?psc=1&redirect=true&ref_=oh_aui_detailpage_o08_s00)

<h3>Analysis</h3>
- [ ] [Practical Data Science Cookbook](http://www.diveintopython.net/)
- [ ] [R Data Analysis Cookbook](code.google.com/edu/languages/google-python-class/)

<h3>Spatial Analysis</h3>
- [ ] [An Introduction to R for Spatial Analysis and Mapping](https://us.sagepub.com/en-us/nam/an-introduction-to-r-for-spatial-analysis-and-mapping/book241031)
- [ ] [Applied Spatial Data Analysis with R](http://www.springer.com/us/book/9781461476177)

<h3>Land Use/Transport/Gravity Modeling</h3>
- [ ] [Integrated Land Use and Transport Modelling: Decision Chains and Hierarchies](http://www.amazon.com/gp/product/0521022177?psc=1&redirect=true&ref_=oh_aui_detailpage_o03_s00)
- [ ] [Gravity and Spatial Interaction Models (Scientific Geography Series)](http://www.amazon.com/gp/product/0803925441?psc=1&redirect=true&ref_=oh_aui_detailpage_o06_s00)
- [ ] [TRANUS Model](http://www.tranus.com/tranus-english)
- [ ] [Urban Sim](https://pypi.python.org/pypi/urbansim)
- [ ] [Huff-tools Package in R](http://rstudio-pubs-static.s3.amazonaws.com/42357_1e6fcc5bcfec439096eb86a106ebf22e.html)
-
<h3>Data Design/Data Viz</h3>
- [ ] [Beautiful Evidence](http://www.edwardtufte.com/tufte/books_be)
- [ ] [Semiology of Graphics](http://www.amazon.com/Semiology-Graphics-Diagrams-Networks-Maps/dp/1589482611)
- [ ] [Visual Complexity Mapping Patterns of Information](hhttp://www.visualcomplexity.com/vc/book/)
- [ ] [The Visual Display of Quantitative Information](http://www.edwardtufte.com/tufte/books_vdqi)
- [ ] [Design for Information](http://isabelmeirelles.com/book-design-for-information/)
- [ ] [Design Elements: A Graphical Style Manual](http://www.amazon.com/Design-Elements-Graphic-Style-Manual/dp/1592532616)
- [ ] [Storytelling with Data](http://www.amazon.com/gp/product/1119002257?psc=1&redirect=true&ref_=oh_aui_detailpage_o09_s00)
- [ ] [Mastering Python Data Visualization](https://www.packtpub.com/big-data-and-business-intelligence/mastering-python-data-visualization)
- [ ] [The Grammar of Graphics](https://www.packtpub.com/big-data-and-business-intelligence/mastering-python-data-visualization)
- [ ] [R Graphics Cookbook](http://shop.oreilly.com/product/9780596809164.do)

<h3>Relevant prior studies</h3>
- [X] MS in Community and Regional Planning, UT-Austin
- [X] BA in Liberal Arts, concentration in geography, UT-Austin

<h2>OpenSource Data Science Masters Capstone Project</h2>
I'm interested in using data science approaches for better intelligence behind real estate decisions, specifically evaluating population growth, transactions, and location decisions. I'd also like to evaluate statistical learning techniques to make better pricing decisions. Finally, I'd like to develop a model to optimize real estate portfolios.

If you'd like to pair up for the capstone, [let me know](http://www.twitter.com/scottdavisCRE)
