Complete Small Focused Projects and Demonstrate Your Skills
(Complete small, targeted machine learning projects to prove your ability)
A portfolio is typically used by designers and artists to show examples of prior work to prospective clients and employers.
Design, art and photography are fields where the work product is creative and empirical, and where telling someone you can do the work is not valued as highly as showing them.
In this post, I will convince you that building a machine learning portfolio has value to you, others and the community.
You will discover what exactly a machine learning portfolio is, the types of projects that can be included and how to make your portfolio really work for you.
Your Portfolio
- Pick a theme. This is the type of projects that you want to work on. A no-brainer would be reports on customer data (high-value customers, predictions of prospects that convert, etc.).
- Find open datasets. You need to locate datasets that you can practice on that are close to or on your theme. Look on competition websites like Kaggle and the KDD Cup as a starting point. There are a lot of publicly accessible datasets these days that you can practice on!
- Complete projects. Treat each dataset like a project with a client and apply your process to it in order to deliver a result. This may require you to assume the role of the client and take an educated guess as to the outcome they are looking for (a model or a report on a specific question, etc.).
- Write up. Write up your findings as a semi-formal work product and host it publicly online.
Benefits of a Machine Learning Portfolio
Whether you are a beginner just starting out in machine learning or a hardened veteran, a machine learning portfolio can keep you on track and demonstrate your skills. Creating a machine learning portfolio is a valuable exercise for you and for others.
Benefits for You
Building up a collection of completed machine learning projects can keep you focused and motivated, and the projects can be leveraged on future work.
- Focus: Each project has a well-defined purpose and end point. Small projects constrained in effort and resources can keep velocity high.
- Knowledge Base: The corpus of completed projects provides a knowledge base for you to reflect on and leverage as you push into projects further from your comfort zone.
- Trajectory: There are so many shiny things to investigate. Reminding yourself that you are building a consistent collection of projects can be used as a lever to keep you on track.
Benefits to Others
A portfolio of completed projects can be used by others as an indicator of specific skills, ability to communicate and a demonstration of drive.
- Skills: A project can demonstrate your capability with regard to a specific problem domain, tool, library, technology stack or algorithm.
- Communication: A project must be understood at least in terms of its purpose and the findings. The curation of a good portfolio requires excellent communication skills that tautologically demonstrate your ability to communicate technical subjects well.
- Motivation: Working on and completing side projects, regardless of their size or scope, takes a certain level of self-discipline. The fact that you managed to put together a portfolio is a testament to your interest in the subject and your ability to manage your time.
Benefits to the Community
Sharing your projects in public extends the benefits to the broader machine learning community.
- Engagement: A public project can elicit feedback from third parties, who may provide extensions and improvements from which both you and the community can learn.
- Starting Point: A public portfolio project can provide a jumping-off point that others can learn from and build upon, perhaps for their own small project or for something more serious.
- Case Study: A public project can provide a point of study, perhaps for a unique or interesting algorithm behavior or problem decomposition, the very source of innovation.
Hopefully, I’ve convinced you that building a machine learning portfolio has some benefits that interest you. Next, we will look at what exactly a machine learning portfolio is.
Build a Machine Learning Portfolio
A machine learning portfolio is a collection of completed independent projects, each of which uses machine learning in some way. The portfolio presents the collection as a whole and allows review of the individual projects.
Five properties of an effective machine learning portfolio are:
- Accessible: I advocate making the portfolio public in the form of a publicly accessible webpage or collection of public code repositories. You want people to find, read, comment on, and use your work if possible.
- Small: Each project should be small in scope in terms of effort, resources, and most importantly, your time (10-20 hours). You’re busy and it’s hard to keep focus. See my Small Projects Methodology.
- Completed: Small projects help you finish projects. Set a modest project objective and achieve it. As with mini-experiments, you present the findings of both your successes and your failures; they are all useful lessons.
- Independent: Each project should be independent so that it can be understood in isolation. This does not mean you can’t leverage prior work; it means that the project makes sense on its own as a standalone piece of work.
- Understandable: Each project must clearly and effectively communicate its purpose and findings (at the very least). Spend some time making sure a fresh set of eyes understands what you did and why it matters.
Four types of small project ideas that may inspire you are:
- Investigate a property of a machine learning tool or library.
- Investigate the behavior of a machine learning algorithm.
- Investigate and characterize a data set or machine learning problem.
- Implement a machine learning algorithm in your favorite programming language.
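To make the second and fourth ideas concrete, here is a rough sketch of what a very small project could look like: a from-scratch k-nearest neighbors classifier and a loop that checks how accuracy changes with k. Everything in it is an assumption made for illustration (the synthetic two-blob dataset, the function names `make_data`, `knn_predict` and `accuracy`, and the values of k); it is not a prescribed project, just one possible shape for one.

```python
import random
from collections import Counter


def make_data(n, rng):
    """Two overlapping 2-D Gaussian blobs labeled 0 and 1 (synthetic, illustration only)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 0.0 if label == 0 else 2.0
        point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((point, label))
    return data


def knn_predict(train, point, k):
    """Majority vote among the k training points closest to `point`."""
    nearest = sorted(
        train,
        key=lambda item: (item[0][0] - point[0]) ** 2 + (item[0][1] - point[1]) ** 2,
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(knn_predict(train, p, k) == y for p, y in test)
    return correct / len(test)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run can be repeated
    train, test = make_data(200, rng), make_data(100, rng)
    for k in (1, 5, 15, 45):  # odd values avoid tied votes with two classes
        print(f"k={k:>2}  accuracy={accuracy(train, test, k):.2f}")
```

The write-up for a project like this would then explain what the numbers suggest about the effect of k and what you would try next.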
Some ideas for projects that you probably didn’t think were portfolio pieces include:
- Coursework: Your clear presentation of your notes and homework for a machine learning related course (such as a MOOC).
- Book Review: Your clear presentation of your notes from reading and reviewing a machine learning book.
- Software Review: Your clear presentation and worked examples for using a machine learning related software tool or library.
- Competition Participation: Your clearly presented notes and results from participating in a machine learning competition, such as one on Kaggle.
- Commentary: An essay in response to a machine learning-themed blog post or your detailed response to a machine learning-related question on a Q&A site like Quora, Reddit Machine Learning or Cross Validated.
Now that you know what a machine learning portfolio is and have some ideas of projects, let’s look at how to turn up the awesome on your portfolio.
Making Your Portfolio Great
To make your portfolio shine, you need to do some light marketing. Don’t worry, it’s none of that slimy stuff, just good old-fashioned getting the word out.
Code Repository
Consider using a public source code repository such as GitHub or Bitbucket that naturally lists your public projects. These sites encourage you to provide a readme file in the root of each project that describes what the project is all about. Use this feature to clearly describe the purpose and findings for each project. Don’t be afraid to include images, graphs, videos and links.
Provide unambiguous instructions for downloading the project and recreating the results (if there is code or experimentation involved). You want people to re-run your work, so make it as easy as possible (i.e. type this to download, then type this to build and run it).
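One way to make results easy to recreate is to give each project a single entry-point script that fixes its random seed and writes its results to a file. The sketch below is a minimal illustration under my own assumptions: the file name `results.json`, the function names `run_experiment` and `main`, and the toy "experiment" (estimating the mean of noisy samples) are all hypothetical stand-ins for whatever your project actually does.

```python
import json
import random
from pathlib import Path


def run_experiment(seed: int = 42, n_samples: int = 1000) -> dict:
    """Toy stand-in for a project's experiment: estimate the mean of noisy data."""
    rng = random.Random(seed)  # local generator, so the seed alone determines the samples
    samples = [rng.gauss(5.0, 2.0) for _ in range(n_samples)]
    mean = sum(samples) / n_samples
    return {"seed": seed, "n_samples": n_samples, "mean": round(mean, 4)}


def main(output_path: str = "results.json") -> dict:
    """Run the experiment and save the results next to the code."""
    results = run_experiment()
    Path(output_path).write_text(json.dumps(results, indent=2))
    return results


if __name__ == "__main__":
    print(main())
```

With something like this in place, the readme instructions can shrink to two steps: download the repository, then run the script. Anyone who does so with the same seed should get the same numbers you report.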
Curate Projects
You can slap together any old project on GitHub, but only include your best, clearest and most interesting work in your machine learning portfolio.
Curate your projects like a gallery. Choose those that best demonstrate your skills, interests and capabilities. Show off what you can do and what you have done. These ideas of self-promotion can feed back into the projects you might want to tackle. Be clear in your vision, where you want to be and what projects you want to tackle that will help you get there. Own the process.
Present Findings
Spend a lot of time writing up results. Explain how they relate to the aims of the project. Explain the impact they have in the domain or could have. List off opportunities for extensions that you would or could explore if you had another month or year to deep dive on the project.
Create tables, graphs and any other pretty pictures that help you tell your story. Write up your findings as a blog post. For bonus points, create a short screencast showing how you got the results, along with a small PowerPoint presentation on what they mean, and put it up on YouTube. This video can be embedded in your blog post and linked to from your project repository readme file.
Depending on the findings you have and how important they are to you (such as doing well in a Kaggle competition), you can consider creating a technical report and uploading it to Scribd, and uploading your slides to SlideShare.
Promote Your Work
You can share the details of each project as you finish it. You may be completing one per week depending on the number of free hours you can find around study and/or work. Sharing links on social media, such as Twitter, Facebook and Google+, is a good start.
I would urge you to add each project (or just your best projects) as “projects” on LinkedIn. LinkedIn supports the idea of projects, although you may have to create a job for them to be listed against. Consider using the name of your blog or your sole trader company, or invent a relevant job and title such as “Machine Learning Mastery” (wink) or “Self Education”.
Now that we have some ideas on how to make our portfolio shine and how to get the word out, we can look at some examples of machine learning portfolios.
The Machine Learning Portfolio Trend
The idea of a code portfolio is not new; it was baked into GitHub. What is interesting is that in recent interviews with data scientists and managers, portfolios are being requested, even desired, along with participation in machine learning competitions and completion of online training.
Like sample code in programming interviews, machine learning portfolios are becoming a serious part of hiring.
Look for examples of good (or at least filled out) machine learning portfolios. Look for people doing well in machine learning competitions. They typically have an amazing collection of projects described on their blogs and in their public code repositories.
Look for contributors to open source machine learning projects. They can have amazing tutorials, applications and extensions to the software on their blogs and in their public code repositories.
Get started now. Dig up your projects and put them together in a story that explains your knowledge, interest or skills in machine learning.