Data Mining with MySQL – Finding Value in Numbers

July 24th, 2010 Leave a comment 1 comment
Like the article?
Data Mining

When it comes to full scale database analysis, many servers have integrated software packages such as SSIS for Microsoft and dedicated business intelligence tools from SAP, SAS and other forms. Analyzing data with open source technology however requires a bit more work when it comes to implementing an end to end Business Intelligence (BI) tool. While there are a variety of ways to export MySQL data into software packages that can handle modeling, it is important to understand how the process works and the technology required.

How MySQL can work with PMML

Working with broad patterns and intuition on raw database technology can be challenging, which is why developing models through Predictive Model Markup Language (PMML) is an ideal way to start with exported data from MySQL. While database administrators can handle basic tasks in working with data, properly creating predictive models requires first understanding how the development process works from beginning to end.

To determine potential outcomes and possibilities from database information, you can properly score MySQL data in order to segment forecasted outcomes. Data mining allows you to process raw MySQL data in a way that makes it accessible for predictive modeling. While MySQL is a great solution for storing and computing with data, you will need another step in your model in order to make this possible with PMML modeling techniques.

Segmenting Analytics Data

This flows directly into the process of segmenting your data according to defined rules. With PMML you can work directly with MySQL in order to define discrete segments of data rather than working with continuous variables that may not have proper labels. For developers familiar with basic markup languages, PMML will flow naturally in with your coding style. You can write direct commands which can define the data before you input it into a statistical package. Working with raw files in statistical programs can be challenging, so processing your SQL data in advance can save you substantial time. Suppose you want to define two segments of existing site users in your database based upon the frequency of their visits in the past 365 days. You could sort this data and define your variables after the fact but you can just as easily categorize your MySQL data in PMML:

<Discretize field="value">
  <DiscretizeBin binValue="frequent">
    <Interval closure="openclosed" leftmargin="10" rightmargin="100" />
  </DiscretizeBin>
</Discretize>

You could then define a “power user” with leftmargin=”100″ and interval closure “open” so that any users who have visited more than 100 times become power users and can be segmented properly for offers, site features and campaigns. Properly scoring prospects with MySQL data for input into statistical packages can provide substantial insights into the complete process of scoring and segmenting analytics data.

There are a number of different transformations you can work on for your database information so you can add a layer of intelligence on top of your database. While MySQL can provide impressive calculations and has the potential for a wide range of applications, it is best paired with a computation engine that allows you to determine correlations based upon parameters. With an open source base you can build your own BI unit from scratch.

Help us spread the word!
  • Twitter
  • Facebook
  • LinkedIn
  • Pinterest
  • Delicious
  • DZone
  • Reddit
  • Sphinn
  • StumbleUpon
  • Google Plus
  • RSS
  • Email
  • Print
Don't miss another post! Receive updates via email!
  1. It is good that we can define a “power user” with leftmargin=”100″ and interval closure “open” so that any users who have visited more than 100 times become power users

Comment