Data Science and Qubole Help Improve the College Selection Process

Register Here  

Competition Sponsors

Big Data Utah, NDSO, Qubole, ThinkBig, and Utah Geek Events are aiming to help the common good through data!

A strong supporting group of individuals interested in the technologies typically associated with big data, including state and local governments and government agencies, local universities, public utilities, medical services, and business executives from many companies, is involved with the Advisory Board and has helped turn this idea into a reality, which may lead to viable business opportunities.

Competition Details

Qubole Data Services (QDS) simplifies the provisioning, management and scaling of big data analytics workloads. Many data-driven organizations face challenges accessing, processing, and optimizing their data for use in better serving their customers and driving innovation.

In the fall of 2016, Qubole, NDSO, Utah Geek Events, and the local user groups (Slc SQL, Big Data Utah) came together to create a competition to provide insights into the College Selection Process and find ways to improve it through Data Science. The competition was open to all technologists who wanted to learn more about Data Science, massively parallel processing, Big Data, predictive analytics, and big data tools like Spark, Hadoop, and Presto, as well as NoSQL technologies.

Choosing the right college for any student is a complex and multifaceted issue. Optimized selection requires a combination of significant quantitative and qualitative decisions. Without optimization of these decisions, students are often forced to forgo their dream school or the school that would have been an ideal fit. This is due largely to misinformation or misalignment related to personal fit, the best outcome for their career options, as well as financial components such as the availability of grants, student loans, or scholarships.

To solve this problem, leaders in the analytics and education industries are banding together to take a deep dive into college-related data, including President Obama’s College Scorecard. The end goal is an improved system to help students find the very best college.

Goal of the Competition

"How would you improve the College Selection process following Obama’s College Scorecard, or using relevant analytics outside the given criteria to correlate potential outcomes?"

Rules of the Competition

Teams must have at least 5 members. We suggest 5-10 and encourage you to have at least one “newbie.”
All code must be open source.
All data must be “open.”
The data and the code will need to be published at the end of the event.
A Qubole AWS S3 bucket will be used to house the data for the duration of the competition.
The project must focus on improving the College Selection Process based on the Suggested Criteria.
House Team Organizers are ineligible for the Grand Prize.
At least one team member must be present at the Big Mountain Data Conference to win the Grand Prize. If you need a member in Utah, please let us know.
Any and all technologies are welcome; compute will be done using Qubole and AWS EC2.

Competition Milestones

The competition will last for about 45 days in total, from the middle of September to the end of October. Participants will have the opportunity to compete in small teams. Through the use of Advanced Analytics and Machine Learning, each team will research, analyze, model, and present its solution. Judging criteria are based on a wide evaluation of collaboration, Machine Learning models, data quality, and presentation skills.

August 10th – Announcement
August 31st – Charter or Goal “turned in”/data sets + Team Members Registered
September 1st-7th – Cleansing & Preparing Data
September 17th – Hack-a-Thon / Mixer
September 15th-30th – Data Cleansing & Creating Effective Metrics
October 1st-12th – Machine Learning, Modeling, Presentation Building, etc.
October 12th – Presentation Drafts Due
October 17th – 21st – Final Presentations Due + Judges Review
October 24th – 28th – Present to Judges
November 19th – Results / Prizes Announced at the Big Mountain Data Conference


The Grand Prize will be cash, promotion at the Intermountain Data Conference, press, introductions to various business leaders in the Rocky Mountain area, and other cool stuff!

Evaluation Criteria

Data Problem & Hypothesis
Steps Taken to Address that Problem
Machine Learning Techniques
Points for Presentation
Effective Communication

