Saturday, May 30, 2015

Part 4. Making a first model.

“In theory, theory and practice are the same. In practice, they are not”
attributed to Albert Einstein
 
 
 
 
    Greetings! Today we will finally start making some predictions and recommendations. In the previous part we converted our data set to numeric form, so it is ready for most models from the scikit-learn library. In this part we will do the following:
  1. Divide the data set into parts
  2. Try a few simple models from scikit-learn
  3. Try a neural network from the pybrain package
    I will use standard algorithms from the scikit-learn and pybrain libraries. If you feel you can implement your own model, you can share it in the comments. So let's do some practical machine learning.
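    To give a feel for the first step, here is a minimal plain-Python sketch of dividing a data set into a training part and a test part. The `split_data` helper, the 25% test share and the toy `movies` list are assumptions made for illustration, not the code from the full post; scikit-learn ships its own `train_test_split` function for the same job.

```python
import random

def split_data(rows, test_fraction=0.25, seed=42):
    """Shuffle rows and split them into (train, test) lists."""
    shuffled = list(rows)                     # copy, so the input stays untouched
    random.Random(seed).shuffle(shuffled)     # fixed seed makes the split repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical data: (feature vector, label) pairs standing in for movies.
movies = [([i, i % 3], i % 2) for i in range(20)]
train, test = split_data(movies)
print(len(train), len(test))  # 15 5
```

    The model is fitted on the training part only, and the test part is kept aside to check how well it predicts movies it has never seen.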
 

Tuesday, May 12, 2015

Part 3. Data analysis.

"Not everything that can be counted counts,
and not everything that counts can be counted"
attributed to Albert Einstein
 
 
 
    Hi there! By now we have a database that contains information about movies, along with a collection of watched films. In this part we will start to analyze the data. I will show how I chose the features that describe a movie, how I made them measurable, and what I got as a result. As always, I will use the "divide and conquer" tactic and split this article into the following sections:
  1. Converting words to numbers
  2. Converting numbers to features
  3. Normalizing and scaling features
    I will try to describe all the actions I performed to reach the goal, but mostly I relied on my intuition. If your opinion differs, feel free to post it in the comments. So let the analytics begin!
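    As a rough illustration of sections 1 and 3, the sketch below turns words into integer codes and rescales a numeric column to the range [0, 1]. The helper names `encode_labels` and `min_max_scale` and the sample genres and years are hypothetical; the full post may use a different encoding or scaling scheme.

```python
def encode_labels(values):
    """Map each distinct word to an integer, in order of first appearance."""
    mapping = {}
    for v in values:
        mapping.setdefault(v, len(mapping))
    return [mapping[v] for v in values], mapping

def min_max_scale(values):
    """Rescale numbers linearly so the smallest becomes 0 and the largest 1."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical sample columns.
genres = ["drama", "comedy", "drama", "thriller"]
years = [1994, 2008, 1972, 2010]

codes, mapping = encode_labels(genres)
scaled_years = min_max_scale(years)
print(codes)         # [0, 1, 0, 2]
print(scaled_years)
```

    Integer codes imply an order between categories that does not really exist, so for some models a one-hot encoding (one 0/1 column per word) may be a better choice.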

Thursday, April 9, 2015

Part 2. Collect information about watched movies


"You can have data without information, but you cannot have information without data."
Daniel Keys Moran



    Hello again. In this part I will complete the movie database with watched movies. Later this information will be used to create a training set for the machine learning algorithm. This part is short but still important. We will consider the following topics:
  1. How to collect information about watched movies
  2. How to rate movies
    After finishing these two topics you can start analyzing the data and digging for patterns among your favorite movies. If you find any interesting observations, post them in the comments and I will include them in my next part. Let's do the job.

Sunday, April 5, 2015

Part 1. Create movie database.


“It is a capital mistake to theorize before one has data”
Sherlock Holmes

 
 
    Welcome back to my blog. In this part I will try to describe one of the complicated tasks I encountered, one that is mostly not mentioned in courses and literature: the process of creating the data warehouse that is going to be used for our analysis and recommendations.
 
    For myself, I divided this part into three sections, each answering one simple question:
  1. What data do I need?
  2. Where can I find it?
  3. How shall I use it?
 
    Next I will try to answer these questions and mention the problems I faced while creating my movie database. I will describe my thoughts about data, movies and technologies, but your opinion may differ, so I will show several approaches and you can choose any of them or use your own. I will also be grateful if you share your thoughts in the comments. So let's start!
 

Thursday, April 2, 2015

Part 0. Introduction.



"Docendo discimus"
(by teaching, we learn)
 
 
    Hello to everybody reading this blog. I am a software engineer from Ukraine. This blog is dedicated to learning Data Science through practice and is aimed at beginners. You can find dozens of blogs and articles with a similar description, but the main distinction is that I am also a beginner. So if you are experienced in this area, do not hesitate to point out my mistakes in the comments or just write to me by e-mail.
 
    Why have I chosen Data Science? I just like solving complicated tasks and puzzles. I also like to play with data: plot it, make some predictions. So fasten your seatbelts, we are taking off!