Python is one of the best general-purpose programming languages. These days Python is becoming the preferred language of many data scientists, and the Python community regularly updates its data science libraries. There is an extensive list of libraries for data analysis, machine learning and data visualization. I will give a short description of each library, which should come in handy when you perform data-related tasks in Python.
So let’s get started.
Python Libraries for Data Analysis:
NumPy stands for Numerical Python. It is the foundational library on which most higher-level scientific Python tools are built. Some basic functionalities provided by this library are:
A general-purpose array-processing package designed to manipulate large multi-dimensional arrays.
Ability to create arrays of arbitrary data types.
Basic facilities for discrete Fourier transforms, linear algebra and random number generation.
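The features listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not a complete tour: the array contents, the 5-cycle test signal and the small linear system are made-up values chosen only so the results are easy to check by hand.

```python
import numpy as np

# Multi-dimensional array processing: a 3x4 float array and its column means
a = np.arange(12, dtype=np.float64).reshape(3, 4)
col_means = a.mean(axis=0)  # mean of each column -> [4., 5., 6., 7.]

# Arrays of arbitrary types: a structured dtype with a string and a float field
records = np.array(
    [("x", 1.5), ("y", 2.0)],
    dtype=[("name", "U10"), ("value", "f8")],
)
total_value = records["value"].sum()  # 3.5

# Discrete Fourier transform: a sine with exactly 5 cycles over 64 samples
n = 64
signal = np.sin(2 * np.pi * 5 * np.arange(n) / n)
spectrum = np.fft.rfft(signal)
peak_bin = int(np.argmax(np.abs(spectrum)))  # the energy sits in bin 5

# Linear algebra: solve 3x + y = 9 and x + 2y = 8
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = np.linalg.solve(A, b)  # [2., 3.]

# Random number generation with a seeded generator (reproducible)
rng = np.random.default_rng(42)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)

print(col_means, total_value, peak_bin, solution, samples.shape)
```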
SciPy stands for Scientific Python, and this library depends on NumPy. SciPy contains routines needed for scientific work, such as solving differential equations and optimization. It is built to work with NumPy arrays; on the whole, you can say that most higher-level scientific operations on NumPy arrays are done using SciPy.
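As a rough sketch of the two tasks mentioned above, the snippet below solves a simple differential equation with `scipy.integrate.solve_ivp` and minimizes a function with `scipy.optimize.minimize`. The decay equation dy/dt = -k·y (exact solution y = e^(-k·t)) and the quadratic objective are illustrative choices, picked because their answers are known in advance.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

# Differential equation: dy/dt = -k * y with y(0) = 1, exact solution exp(-k t)
k = 0.5
sol = solve_ivp(
    lambda t, y: -k * y,          # right-hand side f(t, y)
    t_span=(0.0, 4.0),            # integrate from t = 0 to t = 4
    y0=[1.0],                     # initial condition
    t_eval=[0.0, 1.0, 2.0, 3.0, 4.0],
    rtol=1e-8,
    atol=1e-10,
)
print("numerical:", sol.y[0])
print("exact:    ", np.exp(-k * sol.t))

# Optimization: minimize (x - 1)^2 + (y + 2)^2, whose minimum is at (1, -2)
def objective(p):
    x, y = p
    return (x - 1.0) ** 2 + (y + 2.0) ** 2

res = minimize(objective, x0=np.zeros(2))
print("minimum found at:", res.x)
```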
Pandas is one of the most essential libraries for data science. It contains high-level data structures and data manipulation tools designed to make data analysis fast and easy. Pandas is an excellent tool for data munging (cleaning and reshaping raw data). Basic features of Pandas:
Easy handling of missing data (such as NaN and NA).
Automatic data alignment, i.e. objects can be explicitly aligned to a set of labels, or you can simply ignore the labels and let Series and DataFrame automatically align the data in computations.
Columns can be inserted into or deleted from a DataFrame.
Intelligent label-based slicing, fancy indexing and subsetting of large datasets.
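Here is a small sketch touching each of these features. The city/temperature table and the two Series are invented toy data; in practice the same calls would be applied to a DataFrame loaded from your own source.

```python
import numpy as np
import pandas as pd

# Toy data with one missing temperature (NaN)
df = pd.DataFrame(
    {"city": ["A", "B", "C", "D"], "temp": [21.0, np.nan, 19.5, 23.0]},
    index=["w", "x", "y", "z"],
)

# Missing data: count it, then fill it with the column mean
n_missing = int(df["temp"].isna().sum())          # 1
filled = df["temp"].fillna(df["temp"].mean())     # NaN -> mean of the other values

# Automatic alignment: arithmetic matches labels, not positions
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])
aligned_sum = s1 + s2   # labels present in only one Series ('a', 'd') become NaN

# Inserting and deleting columns
df["humidity"] = [0.3, 0.4, 0.5, 0.6]   # append a new column
df.insert(1, "country", "X")            # insert at position 1, scalar broadcast
df = df.drop(columns=["humidity"])      # delete a column

# Label-based slicing (end label is inclusive) and boolean subsetting
subset = df.loc["x":"z", ["city", "temp"]]
hot = df[df["temp"] > 20]               # NaN comparisons evaluate to False

print(df, aligned_sum, subset, hot, sep="\n\n")
```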
Python Libraries for Data Visualization:
Bokeh is an interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of novel graphics in the style of D3.js, and to extend this capability with high-performance interactivity over very large or streaming datasets. With Bokeh you can quickly and easily create interactive plots, dashboards and data applications.
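A minimal sketch of a Bokeh plot written to a standalone HTML file that opens in a browser. The data points and the output filename `bokeh_line.html` are placeholders; the pan/zoom tools and the hover tooltip illustrate the kind of interactivity described above.

```python
from bokeh.models import HoverTool
from bokeh.plotting import figure, save
from bokeh.resources import CDN

OUTPUT = "bokeh_line.html"  # hypothetical output path

x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# A figure with the interactive tools shown in the toolbar
p = figure(
    title="Simple interactive line plot",
    x_axis_label="x",
    y_axis_label="y",
    tools="pan,wheel_zoom,box_zoom,reset,save",
)
p.line(x, y, line_width=2)
p.scatter(x, y, size=8)

# Hover tooltips read the 'x' and 'y' columns Bokeh builds from the lists above
p.add_tools(HoverTool(tooltips=[("x", "@x"), ("y", "@y")]))

# Write a self-contained HTML page that loads BokehJS from the CDN
save(p, filename=OUTPUT, resources=CDN, title="Bokeh demo")
```

Opening the resulting file in a browser gives a zoomable, pannable plot; the same figure object can also be embedded in a dashboard or a Bokeh server application.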
Seaborn is a popular library for making attractive and informative statistical graphics. It is built on top of matplotlib and supports NumPy and pandas data structures. Seaborn offers:
Several built-in themes that improve on matplotlib's default aesthetics.
Tools for choosing color palettes to make beautiful plots that reveal patterns in your data.
Tools that fit and visualize linear regression models for different kinds of independent and dependent variables.
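The three features above (themes, color palettes and regression plots) can be sketched as follows. The noisy linear data are generated on the spot rather than loaded, so the example needs no network access; the output filename is a placeholder, and the non-interactive `Agg` backend is used so the script also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # render to files only, no GUI window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

OUTPUT = "seaborn_regplot.png"  # hypothetical output path

# Built-in theme and a color palette with three distinct hues
sns.set_theme(style="darkgrid")
palette = sns.color_palette("husl", 3)

# Synthetic data: y depends linearly on x plus Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 2, 100)
df = pd.DataFrame({"x": x, "y": y})

# Scatter plot with a fitted linear regression line and confidence band
ax = sns.regplot(data=df, x="x", y="y", color=palette[0])
ax.set_title("Linear regression fit with seaborn")
plt.savefig(OUTPUT)
plt.close()
```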
matplotlib is a 2D plotting library that produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. You can generate plots, histograms, power spectra, bar charts, error charts, scatter plots, etc., with just a few lines of code.
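A short sketch of the "few lines of code" claim: four common chart types on one figure, saved in two hardcopy formats (PNG and PDF). The data are synthetic and the filenames are placeholders.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: write files, open no window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 2 * np.pi, 100)
data = rng.normal(size=500)

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
ax_line, ax_hist, ax_bar, ax_err = axes.flat

ax_line.plot(x, np.sin(x))
ax_line.set_title("Line plot")

ax_hist.hist(data, bins=30)
ax_hist.set_title("Histogram")

ax_bar.bar(["A", "B", "C"], [3, 7, 5])
ax_bar.set_title("Bar chart")

ax_err.errorbar([1, 2, 3, 4], [2.0, 3.1, 2.4, 3.8], yerr=0.3, fmt="o")
ax_err.set_title("Error bars")

fig.tight_layout()
fig.savefig("matplotlib_demo.png", dpi=150)  # raster output
fig.savefig("matplotlib_demo.pdf")           # vector output
plt.close(fig)
```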
ggplot is similar to the ggplot2 package for the R language. In ggplot, "gg" stands for Grammar of Graphics. It can be used to make professional-looking plots quickly, with just a few lines of code.
Python Machine Learning Libraries:
Scikit-learn is one of the most popular libraries for machine learning. It includes a wide range of classifiers, cross-validation and other model selection methods, dimensionality reduction techniques, and modules for linear regression, including multiple linear regression.
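A compact sketch of the pieces named above: a classifier evaluated with 5-fold cross-validation, PCA for dimensionality reduction, and a multiple linear regression. It uses the Iris dataset bundled with scikit-learn; the regression data are synthetic, generated from the made-up relation y = 3·x1 − 2·x2 + 1 so the fitted coefficients can be checked.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes

# Classifier + model selection: scale features, then logistic regression,
# scored with 5-fold (stratified) cross-validation
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.3f}")

# Dimensionality reduction: project the 4 features onto 2 principal components
X_2d = PCA(n_components=2).fit_transform(X)

# Multiple linear regression on noise-free synthetic data
rng = np.random.default_rng(0)
X_reg = rng.normal(size=(200, 2))
y_reg = 3.0 * X_reg[:, 0] - 2.0 * X_reg[:, 1] + 1.0
reg = LinearRegression().fit(X_reg, y_reg)
print("coefficients:", reg.coef_, "intercept:", reg.intercept_)
```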
Shogun is a machine learning library, usable from Python, that is focused on large-scale kernel methods. Its most important strength is Support Vector Machines (SVMs): it comes with a range of different SVM implementations.