Tuesday, August 26, 2008

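fwd: SPSS Clementine Scripts Basic Syntax

A good, fairly deep book: Yuan: 实用数据挖掘 (Applied Data Mining), by the Italian author P. Giudici

Running multiple branches at once is less efficient than enabling a cache and then running the branches one at a time.

If you use CSV, remember to set each field to STRING or REAL properly.

Question: how do you refer to a node created by a script? ^yourname

It is best not to give nodes duplicate names; otherwise a script cannot delete them (error: ambiguous nodes). The following may help:
# Gets a reference to an existing node. This can be a useful way to ensure non-ambiguous references to nodes.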
var mynode
set mynode = get node flag1:derivenode
position ^mynode at 400 400

Distribution node: normalize by color makes all bars the same length.

However, blanks (user defined missing values) do contribute to the aggregate summaries and these values should be replaced with $null$

We will also sort the data to make the data aggregation more efficient??

Distinct performs poorly on large data sets; you can try an alternative: sort the data on the key fields, and then use the CLEM expression @OFFSET with a Select node to select (or discard) the first distinct record from each group
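The sort-then-compare idea can be illustrated outside Clementine. The following is a minimal Python sketch, not Clementine script: the function name and the sample rows are made up for illustration, and the comparison with the previous key only plays the role that @OFFSET would play in a Select node.

```python
def first_of_each_group(records, key):
    """Keep the first record of every run of equal keys in the sorted data."""
    ordered = sorted(records, key=key)  # sort on the key fields first (stable sort)
    kept = []
    previous = object()  # sentinel that never equals a real key value
    for record in ordered:
        current = key(record)
        if current != previous:  # key differs from the previous record's key
            kept.append(record)
        previous = current
    return kept


rows = [
    {"id": 2, "amount": 10},
    {"id": 1, "amount": 5},
    {"id": 2, "amount": 7},
    {"id": 1, "amount": 9},
]
unique_rows = first_of_each_group(rows, key=lambda r: r["id"])
print(unique_rows)
```

Because only the previous key is remembered, the pass needs constant extra memory, which is presumably why this works better than Distinct on large data.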

If the Keys are contiguous check box is selected, values for the key fields will only be treated as equal if they occur in adjacent records
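Python's itertools.groupby happens to behave the same way (it only merges adjacent equal keys), so it makes a convenient illustration of what the contiguous option implies. This is only an analogy in Python with made-up data, not a description of how Clementine implements the option.

```python
from itertools import groupby


def aggregate_contiguous(records):
    """Sum values per key, treating keys as equal only in adjacent records."""
    return [
        (key, sum(value for _, value in group))
        for key, group in groupby(records, key=lambda r: r[0])
    ]


data = [("A", 1), ("A", 2), ("B", 5), ("A", 4)]

# Unsorted input: the last "A" record forms its own group.
unsorted_result = aggregate_contiguous(data)

# Sorted on the key first: all "A" records end up in one group.
sorted_result = aggregate_contiguous(sorted(data, key=lambda r: r[0]))

print(unsorted_result)
print(sorted_result)
```

So the option is only safe when the data has already been sorted on the key fields.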

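Columns with too much missing data should be removed.

Whether a value counts as an outlier depends on the goal of the analysis: finding wealthy people versus profiling ordinary customers (where outliers are unwelcome). Categorical fields usually have no outliers/anomalies.

If a category value occurs too rarely, it has no statistical significance (e.g. running to work): infrequent behaviour.

Anomaly detection looks for abnormal records across many columns;
an outlier can also come from the combination of two columns, e.g. low income with high luxury spending.

Linear regression can be affected by a small proportion of outliers, whereas decision trees and similar models are less sensitive; use a histogram overlay or the mean to see the effect on the distribution. Demographic data is usually not used as clustering input but to validate the clusters afterwards. Using highly correlated columns together amounts to giving them unnecessary extra weight.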
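To get a feel for how much a single outlier can move a linear regression, here is a small pure-Python sketch. The data is invented for illustration: ten points on the line y = x, then the same points with the last y value replaced by 100.

```python
def ols_slope(xs, ys):
    """Slope of the ordinary least squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


xs = list(range(10))
clean_ys = list(xs)                  # perfectly linear: y = x
outlier_ys = clean_ys[:-1] + [100]   # one outlier out of ten records

clean_slope = ols_slope(xs, clean_ys)
outlier_slope = ols_slope(xs, outlier_ys)
print(clean_slope, outlier_slope)
```

One record out of ten is enough to change the fitted slope several times over in this toy case.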

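SPSS Clementine Scripts Basic Syntax
1. Author: bolow (cnjm)

2. Most of this article was compiled by the author in spare time from the official SPSS documentation.
* It is divided into sections by syntax topic; most of them are illustrations of syntax elements, with little explanation.
* When reading, you may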

Standard script: stored in a file

BE AWARE of ^ (variable reference, but not very useful?) and \ (line continuation character, and apparently also the escape for embedded quotes)
If you use quotation marks within a CLEM expression, make sure that each quotation mark is preceded by a backslash (\), for example:

set :node.parameter = "BP = \"HIGH\""


Script Syntax section: Variable names, such as ^mystream, are preceded with a caret (^) symbol when referencing an existing variable whose value has already been set. The caret is not used when declaring or setting the value of the variable. See Referencing Nodes for more information


• You can specify nodes by name, for example, DRUG1n. You can qualify the name by type, for example, Drug:neuralnetnode refers to a Neural Net node named Drug and not to any other kind of node.

• You can specify nodes by type only—for example, :neuralnetnode refers to all Neural Net nodes. Any valid node type can be used—for example, samplenode, neuralnetnode, and kmeansnode. The node suffix is optional and can be omitted, but including it is recommended because it makes identifying errors in scripts easier.

• You can reference each node by its unique ID as displayed on the Annotations tab for each node. Use an "@" symbol followed by the id, for example @id5E5GJK23L.custom_name = "My Node". See Annotating Nodes and Streams for more information.

@MEAN(BALANCE,5): the mean over the last 5 records that have flowed through; @SUM(field): the sum over all records.
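The difference between the windowed and the cumulative form can be sketched in Python. This is an illustration only: the function name and data are invented, and averaging over fewer values while the window is still filling is an assumption, not something confirmed for Clementine.

```python
from collections import deque


def moving_mean_and_running_sum(values, window=5):
    """Return a list of (mean of the last `window` values, sum of all values so far)."""
    recent = deque(maxlen=window)  # the oldest value is dropped automatically
    running_total = 0
    results = []
    for value in values:
        recent.append(value)
        running_total += value
        # Assumption: before `window` records have arrived, average what is there.
        results.append((sum(recent) / len(recent), running_total))
    return results


balances = [10, 20, 30, 40, 50, 60]
stats = moving_mean_and_running_sum(balances, window=5)
print(stats[-1])
```

For the last record the window covers 20 to 60, while the running sum still includes the first value.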

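I. Variables and Types

1 Field names and variable names start with a letter and may contain letters, digits, and underscores.
A name that does not follow these rules must be enclosed in single quotes.

2 Literal formats by data type
String --"c1", "Type 2", "a piece of free text"
Integer --12, 0, -189
Real --12.34, 0.0, -0.0045
Date/time --05/12/2002, 12/05/2002, 12/05/02
Character --`a` or 3
List --[1 2 3], ['Type 1' 'Type 2']

3 Quoting rules
String --Double quotes are preferred. Single quotes also work, but they can be confused with field names.
Character --Use the backquote ` (the key below ESC).
A number can also be used.
An index into a string can also be used, for example lowertoupper("druga"(5)) -> "A"
Field name --Usually unquoted, but a name containing spaces or other special characters must be enclosed in single quotes.
If you quote a field name that is not defined, it may be treated as a string.
Parameter name --Must be enclosed in single quotes.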
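The naming rule from item 1 (letter first, then letters, digits, and underscores, otherwise single quotes) is easy to check mechanically. A minimal Python sketch, with a hypothetical helper name, for anyone who generates scripts from a list of field names:

```python
import re

# Letter first, then letters, digits, or underscores (the rule in item 1).
_PLAIN_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def quote_if_needed(name):
    """Return the name unchanged if it follows the rule, else wrap it in single quotes."""
    if _PLAIN_NAME.fullmatch(name):
        return name
    return "'" + name + "'"


print(quote_if_needed("Age"))
print(quote_if_needed("Order Date"))
print(quote_if_needed("2nd_try"))
```

The sketch does not handle names that themselves contain a single quote.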

II. Syntax

1 Operator precedence (highest first)
function arguments
function calls
**
* / mod div rem
+ -
> < >= <= /== == = /=


2 Control structures
a if..then..else
if EXPR then STATEMENTS 1
else STATEMENTS 2
endif

Example
if ^param = 24 then
create derivenode
else exit 2
endif

b for loops
× for PARAMETER in LIST
STATEMENTS
endfor

× for PARAMETER from N to M
STATEMENTS
endfor

× for PARAMETER in_models
STATEMENTS
endfor
Iterates over the models in the generated models palette; each model's name is passed to the PARAMETER variable.

× for PARAMETER in_fields_at NODE
STATEMENTS
endfor
Operates on each field downstream of NODE.

× for PARAMETER in_fields_to NODE
STATEMENTS
endfor
Operates on each field upstream of NODE.

× exit
for PARAMETER in_streams
STATEMENTS
endfor
Iterates over the currently open streams.

3 Assignment examples
× set :balancenode.directives = [{1.3 "Age > 60"}]
set :fillernode.condition = "(Age > 60) and (BP = \"High\")"
set :derivenode.formula_expr = "substring(5, 1, Drug)"
set Flag:derivenode.flag_expr = "Drug = X"
set :selectnode.condition = "Age >= '$P-cutoff'"
set :derivenode.formula_expr = "Age - GLOBAL_MEAN(Age)"
set nodename.tablename="mytable"
set :databasenode.table="atablename"
set my_node = get node :plotnode
set :samplenode {
max_size = 200
mode = "Include"
sample_type = "First"
}
The full form is: set nodename:NODETYPE.prop=value
In a standalone script, node references must be prefixed with ^

× Setting SuperNode parameters
set mySuperNode.parameters.minvalue = 30
set :process_supernode.parameters.minvalue = 30
set :process_supernode.parameters.minvalue = ""
set mySuperNode:process_supernode.parameters.minvalue = 30
set mySuperNode.parameters.'Data_subset:samplenode.rand_pct' = 50
set :source_supernode.parameters.'Data_subset:samplenode.rand_pct'= 50
When defining a SuperNode parameter, the short name must be used.

4 Positioning a node icon
position nodename at 450 50


5 Executing a node
execute :exe_node_name


6 Creating a node and a stream
× Creating nodes
var x
set x = create typenode
rename ^x as "mytypenode"
position ^x at 200 200
var y
set y = create varfilenode
rename ^y as "mydatasource"
position ^y at 100 200
connect ^y to ^x

set node = create typenode
rename ^node as "mytypenode"
position ^node at 200 200
set node = create varfilenode
rename ^node as "mydatasource"
position ^node at 100 200
connect mydatasource to mytypenode

× Creating a stream
create STREAM DEFAULT_FILENAME


7 Accessing result data
×value RESULT at ROW COLUMN

×set num_rows = :tablenode.output.row_count

×set table_data = :tablenode.output
set last_value = value table_data at num_rows num_cols


8 File operations
× Opening a file
open MODE FILENAME
MODE is create or append

× Closing a file
close FILE

× Example
set file = open create 'C:/Data/script.out'
for I from 1 to 3
write file 'Stream ' >< I
endfor
close file

9 Connecting nodes
create tablenode
create variablefilenode
connect :variablefilenode to :tablenode
set :variablefilenode.full_filename = "C:\Program Files\Clementine\8.1\demos\DRUG1n"
execute 'Table'
set param = value :tablenode.output at 1 1
if ^param = 24 then
create derivenode
else exit 2
endif

Using CLEM expressions in scripts:
You can use CLEM expressions, functions, and operators within Clementine scripts; however, your scripting expression cannot contain calls to any @ functions, date/time functions, and bitwise operations. Additionally, the following rules apply to CLEM expressions in scripting:

• Parameters must be specified in single quotes and with the $P- prefix.

• CLEM expressions must be enclosed in quotes. If the CLEM expression itself contains quoted strings or quoted field names, the embedded quotes must be preceded by a backslash (\). See Scripting Syntax for more information.

You can use global values, such as GLOBAL_MEAN(Age), in scripting; however, you cannot use the @GLOBAL function itself within the scripting environment.

Examples of CLEM expressions used in scripting are:

set :balancenode.directives = [{1.3 "Age > 60"}]
set :fillernode.condition = "(Age > 60) and (BP = \"High\")"
set :derivenode.formula_expr = "substring(5, 1, Drug)"
set Flag:derivenode.flag_expr = "Drug = X"
set :selectnode.condition = "Age >= '$P-cutoff'"
set :derivenode.formula_expr = "Age - GLOBAL_MEAN(Age)"
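When script lines like these are generated by another program, the backslash rule is easy to get wrong by hand. Below is a minimal Python sketch of a helper that wraps a CLEM expression in double quotes and escapes the embedded ones. The helper name is hypothetical, and it only produces the text of a set statement; it does not talk to Clementine.

```python
def clem_property_line(node_ref, prop, clem_expr):
    """Build a 'set' statement with the CLEM expression quoted and escaped."""
    # Every embedded double quote must be preceded by a backslash.
    escaped = clem_expr.replace('"', '\\"')
    return 'set {}.{} = "{}"'.format(node_ref, prop, escaped)


line = clem_property_line(
    ":fillernode", "condition", '(Age > 60) and (BP = "High")'
)
print(line)
```

The sketch assumes the expression contains no backslashes of its own; those would need their own handling.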


Properties for each node are listed in: Scripting, Automation, and CEMI, Properties Reference

derivenode Properties
The Derive node modifies data values or creates new fields from one or more existing fields. It creates fields of type formula, flag, set, state, count, and conditional. See Derive Node for more information.
derivenode properties | Data type | Property description
new_name | string | Name of new field. See the example below for usage.
mode | Single, Multiple | Specifies single or multiple fields.
fields | [field field field] | Used in Multiple mode only to select multiple fields.
name_extension | string | Specifies the extension for the new field name(s).
add_as | Suffix, Prefix | Adds the extension as a prefix (at the beginning) or as a suffix (at the end) of the field name.
result_type | Formula, Flag, Set, State, Count, Conditional | The six types of new fields that you can create.
formula_expr | string | Expression for calculating a new field value in a Derive node.
flag_expr | string |
flag_true | string |
flag_false | string |
set_default | string |
set_value_cond

Special characters: Literal text blocks that include spaces, tabs, and line breaks can be included in scripts by setting them off in triple quotes. Any text within the quoted block is preserved as literal text, including spaces, line breaks, and embedded single and double quotes. No line continuation or escape characters are needed.

A CLEM expression used in a script must be enclosed in double quotes: set :fillernode.condition = "(Age > 60) and (BP = \"High\")"

Kohonen: the default 7x10 grid gives too many clusters; 3x4 is better. Settings: exponential decay tends to produce one especially large cluster; with a small grid, Neighborhood (Phase 1) should not be larger than Neighborhood (Phase 2). The number of clusters found comes close to the specified upper limit.

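Most of the time when using Apriori or GRI, we will either not define data as blanks, or we will define data as blanks and then remove the records with blank values. Either one of these approaches will lead to no confusion in the association rules created.

In short: either do not define blanks, or, if you do, remove the records that contain them. Apriori treats defined missing values as valid values. GRI finds rules involving blanks but excludes them when computing the rule statistics, so you get rules with confidence = 0. CARMA is largely unaffected (it drops blank values normally).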


The Carma node works only with fields of storage type string; moreover, it uses only true values and does not recognize false.
Override…True from the context menu
Right-click again and select Set Storage…String

Carma allows the user to specify rule support, leading to simpler rules. Carma has the limitation that, with tabular data, the fields must be of type flag (and storage type string); if the data are changed to transactional format, Carma can use fields of type set. To obtain negative rules, swap the true and false values given to Carma.


is decreased to the value of Neighborhood (Phase 2) +1. By default, this value is equivalent to that for Neighborhood (Phase 1), so no decrease occurs in phase 1

In small grids, the value of Neighborhood (Phase 1) shouldn’t be set higher than the default of Neighborhood (Phase 2); otherwise the whole grid will be affected.
• Don’t change the Cycles settings unless you get an odd-looking solution; they should be large enough for almost all circumstances, except for unusual data or a large number of clusters.
• The Initial Eta settings are the most likely place to begin to modify a network, and you would normally set Initial Eta (Phase 1) a bit higher, and perhaps Initial Eta (Phase 2) as well.
• If you expect to find one dominant cluster, leave the Learning decay rate as Exponential. If not, you can try using a linear decay.
(Source: Clustering and Association Models with Clementine, Techniques for Clustering, p. 2-21)

In the Kohonen node, blanks are handled by substituting “neutral” values for the missing ones. For range and flag fields with missing values (blanks and nulls), the missing value is replaced with 0.5 (for range fields, this is done after the original values have been transformed into a 0-1 range). Range field values below the lower bound (in the last Type node) will be set to the lower bound and values above the upper bound will be set to the upper bound value. For set fields, the derived indicator field values are all set to 0.0. This is the same missing data handling as we found in the K-Means node.
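The handling described above can be written out as a short Python sketch. This illustrates the stated rules only and is not Clementine's actual code: the function names are invented, None stands in for a blank or null, and using 1.0 as the indicator for a matching set value is an assumption.

```python
def encode_range(value, lower, upper):
    """Rescale a range field to 0-1; assumes upper > lower."""
    if value is None:          # blank or null
        return 0.5             # neutral value in the 0-1 range
    clamped = min(max(value, lower), upper)  # pull out-of-bounds values to the bounds
    return (clamped - lower) / (upper - lower)


def encode_set(value, categories):
    """One indicator per category; a missing value leaves all indicators at 0.0."""
    return [1.0 if value == category else 0.0 for category in categories]


print(encode_range(None, 0, 100))
print(encode_range(150, 0, 100))
print(encode_set(None, ["low", "normal", "high"]))
```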

K-Means: The Encoding value for sets can be set between .0001 and 1.0, inclusive. Values below .70711 will decrease the importance of set and flag fields, while values above that will do the reverse.
Flag fields are not encoded in this manner, as doing so may distort the distances between records. They are given values of 0 and 1.
If symbolic fields are included in a clustering solution, we recommend leaving the encoding value at its default setting unless you have a good reason to change the influence of these fields.

solution compared to numeric fields. Accordingly, the value of .70711 is used instead (the square root of 1/2).
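My reading of why the square root of 1/2 is used: two records that differ on a set field differ in two indicator columns, so with that encoding their Euclidean distance on the field is exactly 1, the same as the largest possible difference on a 0-1 range field. The Python sketch below checks the arithmetic; the function names and category values are invented, and the interpretation itself is an assumption rather than something stated in the notes.

```python
import math


def encode_set_value(value, categories, encoding=math.sqrt(0.5)):
    """Indicator columns for a set field, using `encoding` instead of 1.0."""
    return [encoding if value == category else 0.0 for category in categories]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


colors = ["red", "green", "blue"]

# Default encoding: two different categories are a distance of about 1 apart.
d_default = euclidean(encode_set_value("red", colors),
                      encode_set_value("blue", colors))

# Encoding of 1.0: the same pair is sqrt(2) apart, outweighing a 0-1 range field.
d_one = euclidean(encode_set_value("red", colors, 1.0),
                  encode_set_value("blue", colors, 1.0))

print(d_default, d_one)
```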

TwoStep's ASSUMPTION
It should be noted that the likelihood function assumes that numeric predictors follow normal distributions and the symbolic predictors follow multinomial distributions; the former assumption is not. Missing data is also not supported: records with blank, null, or missing values will be removed.
All three clustering methods depend on the input order of the records.
