Kiva Exploratory Data Analysis on Kaggle

While delving into some data science communities I had heard of Kaggle many times over and had even created an account, but it wasn't until I got an email about a dataset of Kiva loans that I actually got interested.

First of all, what is Kaggle?
Kaggle is a platform for predictive modelling and analytics competitions in which statisticians and data miners compete to produce the best models for predicting and describing the datasets uploaded by companies and users. - Wikipedia
Essentially it is a data science playground where you can access data, do analysis, look at other people's analyses and enter competitions. If you submit an analysis then all your code is available for others to see, which means you can 'be inspired' by techniques you see others using. It also makes it easier to assist others when they need help.

So what about Kiva?
Kiva Microfunds is a 501(c)(3) non-profit organization that allows people to lend money via the Internet to low-income entrepreneurs and students in over 80 countries. Kiva's mission is “to connect people through lending to alleviate poverty.” - Wikipedia
Kiva are doing some really great things and I've had an account with them for a while. In fact, a friend of mine put together a group where everyone put in a little money each month, and it was great to watch the amount we could lend grow.

So once again, the personal connection and a bit of prior knowledge about the data made all the difference, both in making the analysis interesting to me and in making it more robust. I keep coming back to this comment:

Anyway, my Kaggle account is here and my analysis of Kiva data is here. This was my first time using the site, and I really enjoyed it and learned a lot that I can apply to my data reports in my day job. A lot of that learning came from tackling a topic that was outside of my normal sphere of operation, but also from looking at others' work.

I never did finish this analysis. I have a lot more ideas that I hope to implement at some point, but I was really happy with some of the multidimensional visualisations I came up with to help me understand a complex dataset. Below is a chart that displays, for each sector, the range of loan values, the total value of loans and the number of loans.
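
For anyone wanting to build a similar view, the per-sector numbers behind a chart like this (range, total value and count) could be computed roughly as follows. This is only a minimal sketch and not the code from my Kaggle notebook: the field names `sector` and `loan_amount`, the helper `summarise_by_sector` and the sample rows are all assumptions made up for illustration, not real Kiva figures.

```python
from collections import defaultdict

# Made-up rows for illustration only; these are NOT real Kiva figures.
# The field names `sector` and `loan_amount` are assumptions.
loans = [
    {"sector": "Agriculture", "loan_amount": 300.0},
    {"sector": "Agriculture", "loan_amount": 1200.0},
    {"sector": "Retail", "loan_amount": 150.0},
    {"sector": "Retail", "loan_amount": 450.0},
    {"sector": "Retail", "loan_amount": 900.0},
]


def summarise_by_sector(rows):
    """Return {sector: {"min", "max", "total", "count"}} for the given rows."""
    # Collect every loan amount under its sector.
    amounts = defaultdict(list)
    for row in rows:
        amounts[row["sector"]].append(row["loan_amount"])

    # Reduce each sector's list to the four numbers the chart encodes.
    return {
        sector: {
            "min": min(values),
            "max": max(values),
            "total": sum(values),
            "count": len(values),
        }
        for sector, values in amounts.items()
    }


summary = summarise_by_sector(loans)
for sector, stats in sorted(summary.items()):
    print(sector, stats)
```

Each sector ends up with four values, which is what makes the chart multidimensional: the minimum and maximum give the range, while the total and the count can be mapped to something like size or colour.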






