Wednesday, July 09, 2008

My CLEM experience

CACHE usually brings trouble: it can lead to an ORA-12704 character set mismatch error.

You MUST use Select and Filter to reduce the amount of data read: only read the data/columns that you NEED to produce your result (average transaction count or whatever).

CARMA is quicker than Apriori, and can adjust support when viewing the result?

The ID in the Sequence node is the person's ID, not the transaction ID.

Spend per visit distribution: the X axis is the increasing value of spend/visit, the Y axis is the number of people at that spending level.
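The distribution described above can be sketched in a few lines of Python. This is only an illustration, not something taken from Clementine: the function name, the (total spend, visits) input layout, and the bin width of 10 are all assumptions made for the example.

```python
from collections import Counter

def spend_per_visit_distribution(customers, bin_width=10):
    """Count people per spend-per-visit bin.

    customers: list of (total_spend, visits) tuples (hypothetical layout).
    Returns {bin_start: number_of_people}, ordered by increasing spend/visit.
    """
    counts = Counter()
    for total_spend, visits in customers:
        if visits <= 0:
            continue  # no visits, so spend per visit is undefined
        per_visit = total_spend / visits
        bin_start = int(per_visit // bin_width) * bin_width
        counts[bin_start] += 1
    return dict(sorted(counts.items()))

# Made-up sample: five customers as (total spend, number of visits).
sample = [(100, 4), (90, 3), (45, 3), (300, 10), (12, 1)]
distribution = spend_per_visit_distribution(sample)
print(distribution)  # keys are the X axis (bins), values are the Y axis (people)
```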

good diagram2
Lift-like: the X axis is % of customers, the Y axis is a percentage; several curves can be used to show what % of customers produced the Y value on each curve.
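One common chart of this shape is a cumulative gains curve. The sketch below assumes that reading of the note; the function name, the scores, and the 0/1 outcomes are invented for illustration. Customers are ranked by model score, and each point gives the % of customers on X and the % of the total outcome they account for on Y.

```python
def cumulative_gains(scores, outcomes, steps=10):
    """Return [(pct_customers, pct_of_total_outcome), ...].

    scores:   model score per customer (higher = more likely), hypothetical.
    outcomes: observed value per customer (e.g. 1 = responded, 0 = did not).
    """
    # Rank customers from highest to lowest score, keeping only the outcomes.
    ranked = [o for _, o in sorted(zip(scores, outcomes),
                                   key=lambda pair: pair[0], reverse=True)]
    total = sum(ranked)
    n = len(ranked)
    points = []
    for step in range(1, steps + 1):
        cutoff = round(n * step / steps)   # customers included so far
        captured = sum(ranked[:cutoff])    # outcome captured so far
        pct_y = 100 * captured / total if total else 0.0
        points.append((100 * step / steps, pct_y))
    return points

# Made-up example: 10 customers, 4 responders.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
gains = cumulative_gains(scores, outcomes, steps=5)
for pct_customers, pct_outcome in gains:
    print(f"{pct_customers:5.1f}% of customers -> {pct_outcome:5.1f}% of responses")
```

Calling the function once per model and plotting the point lists together gives several curves on the same axes.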

Press F3 to delete all connections of the selected node;
press the middle button and drag to create a connection; double-click the middle button to bypass a connected node.

Pareto chart: sort Sales in descending order (98 rows, from B2 to B99). Create a column C with a share formula (say C3=B3/SUM($B$2:$B$99)), then format C3 as a percentage. Create another column D with D2=B2 and D3=D2+B3, then drag down (cumulative of B). THEN draw a custom chart: a two-axis column/line chart.
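The same spreadsheet steps can be sketched in Python to check the numbers before charting. The function name and the sample sales figures are made up; the three values in each row correspond to columns B (sorted sales), C (share of total) and D (running total of B).

```python
def pareto_table(sales):
    """Return rows of (sales, share_of_total, cumulative_sales).

    Mirrors the spreadsheet: B sorted descending, C = B / SUM(B),
    D = running total of B.
    """
    ordered = sorted(sales, reverse=True)   # descending sort of column B
    total = sum(ordered)
    rows = []
    running = 0
    for value in ordered:
        running += value                    # D(n) = D(n-1) + B(n)
        rows.append((value, value / total, running))
    return rows

# Made-up sales figures for three products.
table = pareto_table([50, 20, 30])
for sales_value, share, cumulative in table:
    print(f"{sales_value:>6} {share:>7.1%} {cumulative:>6}")
```

Dividing the cumulative column by the total gives the cumulative percentage, which is the usual choice for the line on the second axis.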

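Excel sum: SUM($B$1:$B$99). A dot or a comma may be used instead of the colon, but a comma means only the listed cells (not the ones in between).

QUEST and C5.0 models can be chained together in one stream to output 2 predictions at the same time; they do not interfere with each other, and their lift can be compared.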

RFM node: each field maps to a 5-bin field, and each bin score has a weight of 10, so a customer in the top R and F bins and the second M bin gets a final score of 5*10+5*10+4*10=140 (lots of others get a score of 140 as well).
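The arithmetic above can be reproduced with a small sketch. It assumes bins numbered 1 to 5 (5 = best) and a weight of 10 on each of R, F and M, as in the note; the function name is made up and this is not the RFM node's actual implementation. It also shows why many customers share the score 140.

```python
from itertools import product

def rfm_score(r_bin, f_bin, m_bin, weights=(10, 10, 10)):
    """Weighted RFM score from three bin numbers in 1..5 (5 = best)."""
    for bin_value in (r_bin, f_bin, m_bin):
        if not 1 <= bin_value <= 5:
            raise ValueError("bins must be between 1 and 5")
    w_r, w_f, w_m = weights
    return r_bin * w_r + f_bin * w_f + m_bin * w_m

# Top R, top F, second-best M, as in the note.
score = rfm_score(5, 5, 4)
print(score)

# Every bin combination that lands on the same score.
ties = [bins for bins in product(range(1, 6), repeat=3)
        if rfm_score(*bins) == score]
print(ties)
```

With equal weights the score only depends on the sum of the three bins, so different R/F/M profiles collapse onto the same value.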

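Purpose of PROFSET: a product's profit should not be judged by its own profit alone, but by the total profit of the product combinations in which it appears. The profit of a combination is assigned to that combination according to how often it occurs (for 3 transactions AB, AB, BC, AB occurs 2 times). This can change the profit ranking (top spots) of 54 of the 218 products. (So when promoting, should we promote these hidden profit contributors rather than the items with the largest visible individual profit?)

You can export the processed data to the database while another branch processes it at the same time (data audit, web, distribution), which saves weekend time.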

In essence, mapping data results simply in the creation of a new Filter node, which matches up the appropriate fields by renaming them.

Using parameters

Parameter         Value   Long name
Train.time        5       Time to train (minutes)
Sample.rand_pct   10      Percentage random sample

Note: The parameter names, such as Sample.rand_pct, use correct syntax for referring to node properties, where Sample represents the name of the node and rand_pct is a node property. See Properties Reference Overview for more information.

Once you have defined these parameters, you can easily modify values for the two Sample and Neural Net node properties without reopening each dialog box. Instead, simply select Set Parameters from the SuperNode menu to access the Parameters tab of the SuperNode dialog box, where you can specify new values for Random % and Time. This is particularly useful when exploring the data during numerous iterations of model building.
