Wednesday, November 7, 2018

Study Plan for Data Science

On the auspicious day of Diwali,
I came across Microsoft AI School, and my reaction was: wow!

However, there are many pending things I need to cover before I jump into new things.

So, here is my study plan after completing a couple of certifications from the 'Microsoft Professional Program for Data Science'.

I will write an entire post about what I studied for those certificates, how it has helped me, and what is still missing, which is why I need to restructure my personal study plan to become competent in the field of data science.


1) 'Python for Data Analysis'

its GitHub code exercises for Jupyter Notebook

2) Math of Intelligence

3) DAT275x: Principles of Machine Learning: Python Edition

its GitHub code exercises

4) BigQuery: I will do a project on Kaggle using BigQuery.

5) Refer to the Kernel Masters' code on Kaggle for various problems.

6) Practice on Kaggle datasets and get hands-on with various types of data. I will gradually identify the projects I will cover and mention them here.

7) Participate in a Kaggle competition.

My personal challenges:
1) Everything I do is self-study, so I need to be very focused during my night-time study,
and revise during the day by listening to videos and thinking through solutions to the problems I face whenever I get the chance.

2) I really need to find some friends with whom I can talk about what I study, how I study, what problems I face, etc. Right now the best option is this diary.

Thursday, October 4, 2018

Iris Dataset Prediction

For some reason I was out of touch with machine learning and data science after I finished my certification 'Microsoft: DAT210x Programming with Python for Data Science', so today was a fresh start after a long break. I was confused about where to start. I thought of revising the course, but I wanted to get my hands dirty with a prediction exercise rather than go through theory about what machine learning is and detailed explanations of different models.
I decided to run some code on the Iris data first. I searched GitHub and found plenty of code, but it all used different models. My goal was clear: make predictions from the Iris data. For that I need to:
  1. Prepare Data.
  2. Evaluate Algorithms.
  3. Find more accurate Algorithms.
  4. Predictions
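
The four steps above can be sketched with scikit-learn roughly as follows. This is only a minimal sketch, not the code from the tutorial: it assumes scikit-learn is installed and uses its bundled copy of the Iris data. The three candidate models, the 80/20 split, and the random seed are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# 1. Prepare data: load the bundled Iris data and hold out 20% for a final check.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=7, stratify=iris.target
)

# 2. Evaluate algorithms: 10-fold cross-validation on the training part only.
#    The model choices here are assumptions made for illustration.
models = {
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=7),
    "SVM": SVC(gamma="auto"),
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
scores = {
    name: cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print("{}: mean accuracy {:.3f}".format(name, score))

# 3. Find the more accurate algorithm: pick the best cross-validated score.
best_name = max(scores, key=scores.get)

# 4. Predictions: refit the best model and score it on the held-out data.
best_model = models[best_name].fit(X_train, y_train)
test_accuracy = best_model.score(X_test, y_test)
print("Best: {} (held-out accuracy {:.3f})".format(best_name, test_accuracy))
```

Cross-validating on the training part and keeping a held-out part for the end means the final accuracy is measured on data that none of the models saw while they were being compared.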
Luckily I landed on Jason Brownlee's blog, which has a step-by-step guide to analyzing the Iris data.
At the end of the tutorial I made predictions for a couple of made-up samples, for example as below, using the Support Vector Machine algorithm.

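# Test predictions on two new samples (sepal length, sepal width, petal length, petal width).
# svn is the Support Vector Machine model fitted earlier in the tutorial.
X_new = [[5.2, 3.9, 1.2, 0.2], [6.9, 3.0, 4.1, 1.2]]
Y_new = svn.predict(X_new)
print('Prediction of species:{}'.format(Y_new))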

Output: Prediction of species:['Iris-setosa' 'Iris-versicolor']
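
The snippet above uses `svn` without showing how it was created. A minimal self-contained sketch could look like the following, assuming scikit-learn and its bundled Iris data rather than the CSV file from the tutorial. The name `svn` is kept from the snippet, and the default `SVC` settings are an assumption. Note that the bundled data labels the species 'setosa', 'versicolor' and 'virginica', without the 'Iris-' prefix.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
# Map the numeric targets to species names so the predictions are readable.
y = iris.target_names[iris.target]

# Fit a Support Vector Machine classifier on all 150 samples
# (default hyperparameters; this is an assumption, not the tutorial's setup).
svn = SVC()
svn.fit(iris.data, y)

# Predict the species for two new measurements
# (sepal length, sepal width, petal length, petal width, all in cm).
X_new = [[5.2, 3.9, 1.2, 0.2], [6.9, 3.0, 4.1, 1.2]]
Y_new = svn.predict(X_new)
print('Prediction of species:{}'.format(Y_new))
```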

I am really happy with the accuracy and the predictions.
So good luck to me on my self-learning journey as a data scientist.

Monday, April 10, 2017

What is HDInsight?

Introduction to HDInsight Service


Microsoft Azure HDInsight Service

Microsoft and Hortonworks are working together on Azure HDInsight, a fully managed Apache Hadoop and Apache Spark cloud service that has been hardened for the enterprise and made simpler for users. As a managed Hadoop-as-a-service offering, HDInsight was designed to make Apache Hadoop and Apache Spark simple to use, with a lower management cost and higher developer productivity.



Why Microsoft Azure HDInsight?

Today, the challenges customers face with large volumes of real-time, unstructured data are very different from anything before.

Today's data doesn't fit the traditional relational database approach, as 85% of new data is unstructured. Customers need to manage all of this data alongside their relational data, in data sets as well as in warehouses.

The volume of data is exploding (large volume of data), coming from various social media platforms, websites, online feedback, etc.

Companies are using real-time data to change, build, and optimize their business services, as well as to understand market trends. This data is high velocity, and organizations today don't have staff with the skills or experience (lack of skill) to deal with these challenges.

As organizations scale out the amount of data they capture, managing on-premises infrastructure becomes challenging. Organizations need more IT staff to deploy and administer the solutions.




Resources:
https://msdn.microsoft.com/en-us/library/dn749853.aspx
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-introduction
https://hortonworks.com/datasheet/microsoft-azure-hdinsight/