[AISWorld] Responses to 'Resources for R and Python'

mmora at securenym.net mmora at securenym.net
Tue Mar 29 13:29:43 EDT 2016


Dear colleague Jerry Flatto,
I forgot to share these links earlier (we are developing a similar syllabus for
an introductory undergraduate MKT course, which is why we have been exploring
these topics). Good luck!
Manuel Mora / UAA, Mexico

---- There are several open-source GUI front-ends for R. I installed
several of them; the most user-friendly, with the fewest installation
problems, was the first option listed:

1st. RCommander
http://www.rcommander.com/

2nd. RKWard
https://rkward.kde.org/

3rd. Deducer
http://www.deducer.org

4th. RStudio
https://www.rstudio.com/

For data science, these great open-source tools are recommended:

1. Canvas Orange
http://orange.biolab.si/

2. KNIME
https://www.knime.org/

3. RapidMiner
https://rapidminer.com/

4. Weka
http://www.cs.waikato.ac.nz/ml/weka/
--------------------------------------------------------------------



On Tue, March 29, 2016 11:42 am, Jerry Flatto wrote:
> I recently posted a request for resources related to R and Python.  Thank
>  you to everyone who responded.  All the responses are provided below.
>
>
>
> After some research and thinking, I am planning to go with Python in my
> classes.  I am including my thoughts on how I arrived at this decision in
> case this might be helpful to others in a similar situation.  Feel free
> to email me at jflatto at uindy.edu to kick this
> around or to tell me why I should rethink my plan.  While leaning towards
>  Python, I can still be swayed.  :-)
>
>
>
>
> My business students generally do not know programming and are not
> generally going to be statistical experts.  I do not see them pushing the
> boundaries of data science but rather working for organizations who want
> to improve their decision making process but will not be "bleeding edge"
> in most cases.
>
>
>
>
> Rather, I see them spending time capturing data from various sources and
> having to clean the data before the analysis.  As such, Python seems to be
> a better fit.  I also see more natural language processing in the
> curriculum, which Python seems to handle better.  I incorporate Tableau in
> the curriculum, which helps with visualization. I do not have a
> philosophical issue with open source versus commercial software; rather, I
> do not want to use commercial software so expensive that my students will
> be very unlikely to have access to it after they graduate.  Tableau is popular
> enough so that I can easily see my students having it available.  Some of
> my other commercial software is just "too expensive" for many companies to
> have.
>
>
>
> As for the option of teaching them R and Python, I am concerned that if I
> go this route, the students will not get enough depth in either one to be
> "dangerous".
>
>
>
>
> Some of the online discussion I have looked at for R versus Python
> include:
>
>
>
>
> http://www.dataschool.io/python-or-r-for-data-science/
>
>
>
>
> https://www.datacamp.com/community/tutorials/r-or-python-for-data-analysis
>
>
>
> https://www.dataquest.io/blog/python-vs-r/
>
>
>
>
> Jerry
>
> "No trees were harmed in the sending of this message; however, a large
> number of electrons were slightly inconvenienced..."
>
>
> Dr. Jerry Flatto, Professor, Information Systems Department - School of
> Business
>
>
> University of Indianapolis, Indianapolis, Indiana, USA
> mailto:jflatto at uindy.edu
>
>
>
>
> This is probably the best resource I have found so far:
> https://cran.r-project.org/doc/contrib/Zhao_R_and_data_mining.pdf. And it
> is available for free.
>
> You may find this useful:
>
>
>
>
>> https://www.datacamp.com/community/tutorials/r-or-python-for-data-analysis
>
> RStudio is a good IDE and the server version is free for universities.
>
>
>
>
> Feel free to use my slides.
>
>
>
>
> http://richardtwatson.com/dm6e/Reader/slides.html
>
>
> Chapters 14-18
>
> Hi Jerry, in 2013 a graduate student and I developed a set of five R
> tutorials that we submitted to a competition but never heard back
> about. Your request reminded me of them, and I just uploaded them to the
> Teradata University Network.  Have you been there yet, by the way?  It's
> teradatauniversitynetwork.com, and a lot of faculty upload their teaching
> materials to share.
>
> Here's the link to the material on TUN:
>
>
> http://www.teradatauniversitynetwork.com/Library/Items/Five-Tutorials-for-Data-Visualization-and-Analysis-with-R/
>
> I would highly recommend DataCamp (https://www.datacamp.com/home), a site
> with several online courses that specialize in R, statistics and
> analytics (and, to a lesser degree, Python). The format is short videos
> followed by hands-on exercises hosted on their cloud R service. I haven't
> used it for teaching, but this is what I've been using to learn R myself,
> and I find the quality of the content and pedagogy to be excellent (with
> the sole exception of their data.table course). The academic price is
> USD 9 per month, but a couple of introductory courses are free, and the
> first chapter of every course is free, so you can easily try it out.
>
> Your choice to provide instruction in R is wise. I wish I had learned it
> during my Ph.D.  I am learning it now.  The learning curve is steep at
> first but R is much more powerful and flexible than SPSS.
>
>
>
> There are quite a few free books available in PDF format online.
>
>
>
>
> R for Beginners
> https://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf
>
>
> The R Inferno.  http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
>
>
> Statistics with R is a webpage
> (http://zoonek2.free.fr/UNIX/48_R/all.html),
> but they also provide a PDF version of the site:
> http://zoonek2.free.fr/UNIX/48_R/all.pdf.bz2
>
>
> R tips. http://pj.freefaculty.org/R/Rtips.pdf
>
>
> http://cran.r-project.org/doc/manuals/
>
>
> http://heather.cs.ucdavis.edu/~matloff/132/NSPpart.pdf
>
>
> https://media.readthedocs.org/pdf/little-book-of-r-for-multivariate-analysis/latest/little-book-of-r-for-multivariate-analysis.pdf
>
>
>
>
> There are free books and resources available on specific topics as well.
>
>
>
>
> An Introduction to Statistical Learning with Applications in R.
> http://www-bcf.usc.edu/~gareth/ISL/
>
>
> http://ggplot2.org/book/qplot.pdf
>
>
>
>
> There are some really good channels on YouTube that provide instruction
> on R.
>
>
>
>
> https://www.youtube.com/channel/UC0MxOB6BCL976Dm2kPK-HgA
>
>
> https://www.youtube.com/user/TheLearnR
>
>
> https://www.youtube.com/user/marinstatlectures
>
>
> https://www.youtube.com/user/Tutorlol
>
>
> https://www.youtube.com/channel/UClYj39vwP_hdlG8mBXz2y4w
>
>
>
>
> I have watched videos on YouTube. I have taken courses on Udemy. The best
> instruction I have taken so far has been on www.datacamp.com. Learning
> how to use R by watching videos is a bit like learning mathematics by
> watching someone else do it.  The only real way to learn is by doing it.
> It is important to do a lot of exercises. DataCamp allows me to do that.
> However, it is not free.
>
> This might not be aimed at the right audience for you, Jerry, but there
> are some great resources for the beginner here:
>
>
>
> http://inventwithpython.com/
>
>
>
>
> With a free online book:
>
>
>
>
> http://inventwithpython.com/chapter1.html
>
> For Python, I recommend "Python for Data Analysis" by McKinney (O'Reilly
> Media).  The author is the creator of the 'pandas' library, very useful
> for data preparation, and he covers a bit of visualization as well.  If
> your students need to start from scratch in the language, I've heard great
> reviews of Learn Python the Hard Way (learnpythonthehardway.org); it's a
> free text, but students can pay a small fee for video lessons.
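To make the "data preparation" point concrete, here is a minimal pandas sketch of the kind of cleaning described earlier in the thread (capturing data and cleaning it before analysis). The CSV text, the column names `region` and `sales`, and the helper `clean_sales` are all hypothetical, made up for illustration; real exports will need their own rules.

```python
import io

import pandas as pd

# Hypothetical messy export: stray whitespace, a duplicated row,
# a missing value, and numbers stored as text with "$" and ",".
RAW_CSV = """region,sales
 North ,"$1,200"
South,$950
South,$950
 East,
"""


def clean_sales(csv_text: str) -> pd.DataFrame:
    """Return a tidy DataFrame from messy CSV text (hypothetical helper)."""
    df = pd.read_csv(io.StringIO(csv_text))
    # Trim stray spaces around the category labels.
    df["region"] = df["region"].str.strip()
    # Remove currency symbols and thousands separators, then convert to numbers;
    # anything that cannot be parsed becomes NaN.
    df["sales"] = pd.to_numeric(
        df["sales"].str.replace(r"[$,]", "", regex=True), errors="coerce"
    )
    # Drop exact duplicate rows and rows with no sales figure.
    return df.drop_duplicates().dropna(subset=["sales"]).reset_index(drop=True)


if __name__ == "__main__":
    print(clean_sales(RAW_CSV))
```

Each step is one vectorized pandas call, which is a large part of why pandas is popular for this kind of work.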
>
>
>
> Definitely use the Anaconda Scientific Python Distribution from
> https://continuum.io.  It's free, and it bundles the latest versions of
> Python with all the commonly used packages for data analysis and
> visualization. Also, if students already have Python installed for work,
> Anaconda installs a separate copy, so it doesn't disrupt their current
> installation. Best of all, it has an "install for self" mode, which means
> that students can install it on computer lab computers without having
> Administrator access... so I can bypass going to the university's IT
> department.
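After an install like the one described above, students can confirm which interpreter they are running and whether the usual analysis packages are importable. This is a minimal sketch, not an Anaconda tool: the package list `COURSE_PACKAGES` and the helper `check_packages` are assumptions chosen from packages mentioned in this thread (note that scikit-learn is imported as `sklearn`).

```python
import importlib.util
import sys

# Assumed course package list, not Anaconda's official manifest.
COURSE_PACKAGES = ["numpy", "scipy", "pandas", "matplotlib", "seaborn", "sklearn"]


def check_packages(names):
    """Map each top-level package name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Shows which Python is running (e.g., the Anaconda copy vs. a system one).
    print("Interpreter:", sys.executable)
    for name, ok in check_packages(COURSE_PACKAGES).items():
        print(f"{name:12s} {'OK' if ok else 'MISSING'}")
```

Because `find_spec` only locates a package without importing it, the check is quick even for large libraries.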
>
> I think considering R/Python instead of SAS/SPSS is a very good idea for
> analytics programs. If you are looking for books on R, you may want
> to consider the following:
>
> 1. An Introduction to Statistical Learning with Applications in R by
> Gareth James et al.
>
>
> 2. R and Data Mining: Examples and Case Studies by Yanchang Zhao
>
>
> 3. Data Mining and Business Analytics with R by Johannes Ledolter
>
> As for which one to use, that really depends on how much analysis and
> mathematics will be needed.
>
>
>
> For heavy data analysis and mathematics, here are the recommended open
> source options:
>
>
>
> 1. R
>
>
> 2. Octave
>
>
>
>
> When you are ready to take an algorithm to a production state or drop
> into another application's workflow, Python with either the pylab or pandas
> packages is the way to go.
>
>
>
> For machine learning, R and Octave are again the best for working out the
> math, while Python is the preferred language for implementing it in
> application code.
>
>
>
> For resources, here are some courses/tutorials that my team has found
> useful:
>
>
>
>
> https://www.udemy.com/r-programming/
>
>
> https://www.udemy.com/applied-data-science-with-r/
>
>
> https://www.udemy.com/applied-data-science-with-python/
>
>
> https://www.udemy.com/data-analysis-in-python-with-pandas/
>
>
>
>
> Although it is a bit dated at this point, I still really love Stanford's
> course on machine learning.  This course does require some pretty heavy
> mathematics/stats, so you might want to brush up on those things before
> taking it:
>
>
>
>
> https://www.coursera.org/learn/machine-learning
>
>
>
>
> One thing you didn't ask about is how to visualize the insights or outputs.
> For this you can certainly leverage the packages of R or Python to provide
> some nice visualization capabilities; however, for more advanced and
> explorable options, there is a JavaScript library, D3.js, that has plugins
> for both R and Python; you might want to research this as well.
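As an example of the built-in Python route mentioned above (as opposed to D3.js, which is JavaScript and not shown here), this minimal matplotlib sketch draws a bar chart and saves it as a PNG. The monthly figures are made up for illustration; the non-interactive `Agg` backend is an assumption that suits lab servers without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render without a display (e.g., on a lab server)
import matplotlib.pyplot as plt

# Hypothetical monthly sales figures, for illustration only.
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 128, 150]

fig, ax = plt.subplots(figsize=(5, 3))
ax.bar(months, sales, color="steelblue")
ax.set_title("Monthly sales (illustrative data)")
ax.set_ylabel("Units sold")
fig.tight_layout()

# Save into memory; pass a filename such as "sales.png" to write to disk instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
png_bytes = buffer.getvalue()
```

Static charts like this cover most coursework; interactive, explorable charts are where a browser library such as D3.js comes in.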
>
> One very useful resource you may consider is PyCharm, the integrated
> development environment for Python by JetBrains; it is free for professors
> and students. You can find it here: https://www.jetbrains.com/pycharm/
>
> First, I think it's great that you are moving towards open-source,
> flexible data analysis tools. This will really help your students think
> about what they are doing and let them be more creative. However, with
> that comes a price: your students need some comfort with programming, or
> at least the ability to think like a programmer, to use these tools...there
> are no buttons to just click on and pretty tables to view data. It's all
> done through programming commands.
>
> Here are the books that I've found most useful.
>
>
> Note: Unless noted otherwise, all the resources below have been made
> freely available by their authors, but they are also available for
> purchase from places like Amazon.com
>
> R Programming Language Resources
>
>
> *	Books by Hadley Wickham (a prominent R developer who has written a
> lot of very useful utilities for R)
>
> *	 <http://r4ds.had.co.nz/> R for Data Science: this is focused on
> using R for statistics.
> *	 <http://adv-r.had.co.nz/> Advanced R: this is focused on R as a
> programming language, not on how to do statistics.
>
> *	 <https://cran.r-project.org/web/views/> CRAN Task Views: this is a
> page maintained by the R Project Team that thematically organizes the
> myriad of packages in R.
>
> *	Pros: Well organized and has decent descriptions and links to many
> packages.
> *	Cons: Not exhaustive...more experimental or relatively new
> packages are not always there (however, this may not be a bad thing).
>
> *	 <http://www.cookbook-r.com/> Cookbook for R takes a "just tell me
> what to do" approach to many common tasks in R.
>
> Python Programming Resources
> Note: There are currently two versions of Python out there: Python 2 and
> Python 3. Normally, the developers try to maintain backwards
> compatibility, but they deviated from that principle for Python 3. Much
> of the language is the same in both versions, but there are a number of
> gotchas (for example, print is a function in Python 3, and dividing two
> integers with / no longer truncates). I've included a reference that I
> think does a good job describing both languages. I'd recommend having your
> students use Python 3, as it's where the language is going.
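A few of those Python 2 to 3 gotchas can be shown in a handful of lines. This is a minimal Python 3 sketch; the helper names `average` and `full_groups` are hypothetical, chosen only for illustration.

```python
# A few Python 3 behaviors that differ from Python 2.

def average(values):
    # "/" is true division in Python 3, even for two ints
    # (Python 2 would have truncated 7 / 2 to 3).
    return sum(values) / len(values)


def full_groups(n_students, group_size):
    # "//" is floor division: use it when you really want a whole number.
    return n_students // group_size


# print is a function in Python 3; Python 2's `print "hi"` is a SyntaxError.
print(average([3, 4]))      # 3.5
print(full_groups(25, 4))   # 6

# Text (str) and bytes are distinct types in Python 3.
word = "café"
print(len(word), len(word.encode("utf-8")))  # 4 5
```

Students who start on Python 3 mostly meet these differences when they copy older Python 2 examples from the web.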
>
> *	 <https://docs.python.org/3.5/> Official Python 3 Documentation --
> Decently written, comprehensive overview of Python's standard library.
> *	Core External Packages for Data Analysis: Unlike R, Python's data
> science toolkit consists of a few "mega packages" as opposed to many
> small, focused packages. Also, these packages almost have a life of their
>  own, with their own conferences and generally well-documented, decent
> looking web pages (unlike R's sparse help files).
>
> *	 <http://scipy.org/> Scipy.org: Not a package, but the SciPy
> organization makes most of the packages below.
> *	 <http://docs.scipy.org/doc/numpy-1.10.0/user/index.html> NumPy:
> Convenient array objects that are more user-friendly than Python's
> built-in lists for numerical computations.
> *	 <http://docs.scipy.org/doc/scipy/reference/> SciPy: The package
> for scientific computing...has tons of stuff, from calculus to statistics
> to image processing, linear algebra, optimization and....
>
> *	 <http://scikit-learn.org/stable/> Scikit-learn: SciPy has a number
> of add-on "kits" (scikits) that provide additional functionality. This one
> has a bunch of cool machine learning algorithms with generally
> user-friendly APIs (so they are more accessible to non-ML experts). Since
> machine learning is pretty hot right now, and the idea of AI and computers
> learning through statistics is just plain cool, even a brief foray into
> this area would be well received by students (e.g., lots of classification
> algorithms boil down to a linear model, albeit in a transformed space).
>
> *	 <http://pandas.pydata.org/pandas-docs/stable/index.html> pandas:
> Its major contribution is the DataFrame, which is meant to have similar
> functionality to R's popular data.frame. It has lots of nice data
> import/export features too (e.g., pandas.read_csv("filename.csv")
> creates a nice data frame right from a local CSV file).
>
>
> *	 <http://matplotlib.org/> Matplotlib: Emulates a lot of MATLAB's
> plotting functionality, again with a generally user-friendly API.
>
> *	 <http://stanford.edu/~mwaskom/software/seaborn/api.html> Seaborn:
> This is a package that uses matplotlib behind the scenes, but it makes a
> lot of the choices for you regarding formatting and display...generally
> good choices ;-) I use it a lot because I don't like fiddling with tons of
>  parameters.
>
> *	(NOT FREE)  <http://www.dabeaz.com/per.html> Python Essential
> Reference by David Beazley. This is a very concise (but well-written)
> reference manual on Python programming (note: it does not have a
> statistics focus). However, it does a good job pointing out the quirks in
> the language and how its internals work, so Python will seem less mysterious.
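To show how NumPy, pandas and scikit-learn fit together, here is a minimal sketch under stated assumptions: the prices, the tiny "hours studied / passed" table, and the choice of logistic regression (a linear classifier, in the spirit of the "linear model" remark above) are all illustrative, not taken from any course material.

```python
import io

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# NumPy: arithmetic applies element-wise to whole arrays, no explicit loop.
prices = np.array([10.0, 12.5, 8.0])
quantities = np.array([3, 2, 5])
revenue = prices * quantities  # element-wise: 30.0, 25.0, 40.0

# pandas: read CSV text into a DataFrame (read_csv also accepts a file path).
CSV = """hours_studied,passed
0,0
1,0
2,1
3,1
"""
df = pd.read_csv(io.StringIO(CSV))

# scikit-learn: models share the same fit/predict pattern.
X = df[["hours_studied"]].to_numpy()  # 2-D feature matrix
y = df["passed"].to_numpy()           # 1-D label vector
model = LogisticRegression().fit(X, y)

print(revenue.sum())                    # 95.0
print(model.predict([[0.5], [2.5]]))    # [0 1]
```

Swapping `LogisticRegression` for another scikit-learn classifier leaves the `fit`/`predict` lines unchanged, which is a large part of what makes the library approachable for non-experts.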
>
>
> New(er) Data Formats
>
>
> It may also be helpful to briefly describe how to use the JSON and YAML
> data formats. They aren't super difficult to learn, both R and Python
> can parse these files into useful data structures, and they allow for
> expressing more complex data (like nested lists). It also helps if your
> students aren't tied to CSV files, useful as they may be for basic
> statistics.
>
> *	 <http://www.w3schools.com/json/> JSON: Less "human readable" but
> widely used.
> *	 <http://ess.khhq.net/wiki/YAML_Tutorial> YAML: More readable, and a
> personal favorite of mine for developing configuration files and
> expressing complex data.
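As a minimal sketch of the JSON point above, Python's standard-library `json` module turns nested JSON into dicts and lists that can then be flattened into table rows. The customer/order record is hypothetical, invented for illustration. YAML is not in the standard library; the third-party PyYAML package provides `yaml.safe_load` for the equivalent parsing step.

```python
import json

# Hypothetical nested record, like one returned by a web API.
RAW = """
{
  "customer": "Acme Corp",
  "orders": [
    {"id": 1, "items": ["widget", "gadget"], "total": 49.5},
    {"id": 2, "items": ["widget"], "total": 20.0}
  ]
}
"""

record = json.loads(RAW)  # JSON objects become dicts, arrays become lists

# Flatten the nested orders into one row per order, ready for a CSV or DataFrame.
rows = [
    {
        "customer": record["customer"],
        "order_id": order["id"],
        "n_items": len(order["items"]),
        "total": order["total"],
    }
    for order in record["orders"]
]

for row in rows:
    print(row)

# json.dumps turns Python data back into JSON text.
print(json.dumps(rows[0]))
```

The nested `items` lists are exactly the kind of structure a flat CSV file cannot hold directly.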
>
> Finally: Don't underestimate YouTube....lots of great stuff related to the
> above, and it's generally easier to digest a 15-minute example.
>
> As a practicing data scientist, I regularly use all the above items, and
> they have helped me learn a lot of techniques.
>
> Hope it helps you and your students.
>
> *	If your students are going to work with R, they most definitely
> should install <https://www.rstudio.com/> RStudio, which is a great IDE
> (dare I say "industry standard"?) for R.
> *	Johns Hopkins offers a
> <https://www.coursera.org/specializations/jhu-data-science> data science
> specialization on Coursera. The specialization itself has a fee, but the
> courses are free, they are based on R, and they're done well. In
> particular, the second course is an introduction to R programming.
> *	The swirl package can be installed from CRAN. It is a learn-by-doing
> approach to R and related topics. Once installed, it lets you choose from a
> <https://github.com/swirldev/swirl_courses#swirl-courses> list of courses
> and then walks you through entering and executing code. Someone shifting
> from, say, Python to R might find it a tad basic, but for a beginner
> it's a fairly painless introduction to R coding.
> *	There's an active
> <https://plus.google.com/communities/117681470673972651781> Statistics and
> R Google+ community where people can seek help.
>
> I could recommend some textbooks, but you have enough by now. Besides,
> it would be useful to visit some interesting sites showing R
> applications. Here is a suggestion:
> <http://www.r-bloggers.com/r-stats-digital-analytics-8-blogs-you-should-follow/>
> R Stats + Digital Analytics: 8 Blogs you should Follow
>
>
>
>
>
> Learning Base R,
>
>
> by Lawrence M. Leemis,
>
> 2016, Lightning Source, ISBN: 978-0-9829174-8-0.
>
>
> Available on Amazon.
>
>
>
>
> *Learning Base R* provides an introduction to the R language for those
> with limited or no prior programming experience.  It introduces the key
> topics, listed below, that are needed to begin analyzing data and
> programming in R.
>
> The focus is on the R language rather than a particular application.
> Nearly 200 exercises make the book appropriate for classroom use.
>
> You might want to take a look at R for Marketing Research and Analytics
> <http://r-marketing.r-forge.r-project.org/>. The first half of the book
> focuses on basic statistical operations that ought to be fairly universal
> (plotting, cross-tabulating, ANOVA and linear regression).  The second
> half covers a variety of more specific methods that are useful in
> marketing including factor analysis, choice modeling and hierarchical
> modeling. It wasn't intended as a textbook, but a few marketing faculty
> have adopted it. They are creating slides and exercises to go with the
> book and should be posting them in the next week or so.  You can read a
> review of the book in the Journal of Statistical Software.
> <https://www.jstatsoft.org/article/view/v067b02>
>
> _______________________________________________
> AISWorld mailing list
> AISWorld at lists.aisnet.org
>
>





