Use "Dear [Forename]" when the name is available.
Always type the letter, on one side of the paper only.
Use thick, good-quality paper: not the standard 80 g/m², but 90 to 100 g/m².
Explain how you gained each of the key skills required, and give evidence.
Send it by first-class mail.

At the end of the interview, ask "Will I be able to get feedback?" This creates a positive hook and leaves the door open even if you are rejected.

Set an objective for networking, eg "I want six more people to be aware of the skills I can offer." Be persistent and try again later (ask for permission?).

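Group interview: the teammate taking notes recorded things inaccurately and introduced made-up conditions that were not in the materials. I should have pointed this out.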



Questions you may be asked in Interview

This list will help give you that vital edge in interviews. The trick is to find out what your client is looking for. Once you feel you know this, your confidence will grow. Below is a list of questions employers often ask (including some difficult ones). After each question we explain what the interviewer is really looking for. Remember to put yourself in the employer’s shoes and think about what lies behind each line of questioning.

Tell me about yourself

Employers are looking for a quick snapshot of you (both your background and your personality) and how well you sell yourself and your capabilities. Don’t ramble on.

Why did you apply for the job?

This looks at your levels of motivation and commitment. Make sure you research thoroughly what the job entails. State the benefits you feel you can offer. Say why you want this job – not why you are leaving your present one.

Tell me what you do in your spare time?

This question has a double purpose: to make sure that you have a well-rounded personality, and to ensure that your hobbies won’t interfere with your job. Go over any outside interests quickly, highlighting any job relevance and outlining the skills you have developed through them.

When have you been involved in teams?

Employers want a team player, so give examples of your role within teams (eg creative, promoter, developer, organiser, inspector, maintainer, adviser). Underline what you learned and how it has made you more effective in a team. Link your answer directly to the job you’re after – check if they are looking for a creative, resourceful team member, a detail orientated person who will see tasks through or a positive team leader, and then tailor your answer accordingly.

What are your main strengths and weaknesses?

This revolves around self-awareness. Again, link your strengths to the particular job. Employers want someone who knows what they are good at and where they need to improve. Everybody has a weakness, but employers want to know what you are doing to improve. Choose a positive weakness and turn it into a strength, eg ‘I’m a bit of a perfectionist, but that’s good for quality’ or ‘my financial skills aren’t as sharp as I’d like, but I’m attending an evening class in bookkeeping’.

Why should we employ you?

What skills do you have that could add value to the company? Make brief but telling comparisons between the job description and your ability to meet their needs. State briefly what you can offer and back up anything you say with facts.

What has been your biggest achievement?

This reveals what motivates you (family, work, education or leisure). Choose something that makes you stand out and involves positive characteristics e.g. you developed determination, strength of character etc.

What have you learned from your past work experiences?

This focuses on the skills developed in previous jobs (vacation, part-time, full-time). Think about those jobs. Did you have any responsibility? Pull out the positive elements and focus on benefits to the employer.

When did you last work under pressure or deal with conflict – and how did you cope?

This is aimed at discovering if you can deal with problems quickly and efficiently and confront a situation if you become frustrated. The best technique is to think of an example and explain how the situation arose – then say how you dealt with it. If asked directly if anything made you annoyed or frustrated, be truthful but avoid appearing negative.

What is the biggest problem or dilemma you’ve ever faced?

Try to choose something that will show you in a positive light. How did you get over it? What did you learn? Try to keep it work-related if possible (not, eg, about an ongoing dispute with your neighbour). Your answer will show not only how you cope under stress but also your decision-making ability and strength of character.

What other career opportunities are you looking at?

This will illustrate how well you have researched and thought through your chosen career area. It will also show an employer how much you really want the job. If you list a long series of unrelated career options, it will cast doubt on your motivation.

Where would you like to be in 5 or 10 years’ time?

Again, if you have a clear idea, it will show commitment and vision. If you do have some insight into where you are heading, think of some of the functions and responsibilities you would hope to have.

When have you had to …?

Employers want real evidence that clearly demonstrates you have particular skills. Draw up a list of key skills required for the position (found by dissecting the job ad, job description and person specification) and highlight at least two situations or achievements that prove you have each skill. Practise talking through each example and present a concise, hard-hitting case. Avoid waffle and keep it sharp.

What would you do in … situation?

Situational questions are used to test your overall style and approach. Carefully prepare by listing all the roles you’ll potentially undertake in the new position and think up awkward questions yourself.

So, sell me this product.

Role-play questions really make you think on your feet. Once again, do your homework. Be prepared to demonstrate your skills in action.

What salary do you expect?

Work out a salary range you consider reasonable – job ads and job websites will give you an idea. Don’t undersell or oversell yourself. Give a range and indicate that you are prepared to negotiate.

How competent are you at ……?

Many employers now like to assess candidates using scoring grids with a work-based framework. This makes it important to quote practical examples showing your level of competence.

Are you pregnant/gay/etc?

Yes, it’s an outrageous question, but always be on the alert for it. It may be designed to shock you and assess your reactions. It may equally reflect the fact that some employers lack formal training in interview techniques and fall back on crude stereotypes. Whatever the reason, it’s vital not to lose your cool – just write it off as ignorance.

You haven’t been much of a success so far, have you?

The aggressive approach may also throw you. The reasons could be the same but this time it is more likely to be a deliberate attempt to unnerve you. Again, keep your composure; it’s probably the reaction they are looking for.

Do you have any questions?

Always expect this one – so prepare a list. Include a few probing questions to show you’ve done your homework. Don’t be afraid to write them down and take them to the interview with you.

Other Questions which may be asked

* What brings you to the job market at this point in your career?
* Why would you like to work for this company in particular?
* What attracts you to this role?
* If you could change anything about your career so far, what would it be?
* How would members of your team describe you?
* What important points came out of your last appraisal?
* Describe your management style.
* What do you look for in a manager?
* Describe your toughest client.
* What do you want from your next role?
* What does success mean to you?
* What are the key things that drive or motivate you?
* Describe a difficult work scenario and how you managed it.

Questions to ask your interviewer

* How has this vacancy arisen?
* How would you describe the firm/company culture?
* What do you see as the key challenges of this role?
* How do you differentiate yourselves from your competitors?
* What are the organisation’s major business objectives in the coming year?
* How are employees measured in terms of performance?
* What processes exist to support employees in their career development?
* How would you describe the firm/company’s values?
* What key issues currently face the organisation?
* What can I expect to be involved in during my first six months of joining?
* What are the department’s priorities during the next six months?

Further Advice

Do not hesitate to ask your Recruitment Consultant for any additional advice; remember, it is their job to assist you in getting your ideal job.

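A data warehouse (DW) is a relational database built to store data according to a specific schema so that it can support multidimensional analysis and presentation from multiple perspectives. Its data comes from OLTP source systems. The data in a data warehouse is detailed, integrated and subject-oriented, and it exists to serve the analytical needs of OLAP systems.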

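Historical data must be retained, and new data must be kept as well. In this case we update the original record and insert the new one, using UPDATE / INSERT. For example, an employee was in department A in 2005 and transferred to department B in 2006. When computing 2005 figures, the employee should be assigned to department A. When computing 2006 figures, the employee should be assigned to department B, and any new data inserted afterwards is processed under the new department (B). One approach is to add a flag column to the dimension's member list, marking the historical rows as "expired" and the current row as "current". Another approach is to timestamp the dimension, that is, to store the period during which each historical row was in effect as one of its attributes. When matching against the source tables to generate the fact table, rows are then joined by time period. The advantage of this approach is that the effective period of each dimension member is explicit.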
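Both approaches from the paragraph above can be sketched together with Python's built-in `sqlite3` module: an `is_current` flag plus an explicit effective-date range. This is a minimal sketch under assumptions. The table and column names (`dim_employee`, `employee_sk`, `valid_from`, `valid_to`, `is_current`) are hypothetical, and dates are stored as ISO-8601 strings so that they compare correctly as text.

```python
import sqlite3

def build_dim(conn):
    # Hypothetical dimension table; one row per version of an employee.
    conn.execute("""
        CREATE TABLE dim_employee (
            employee_sk INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, one per version
            employee_id TEXT NOT NULL,                      -- business (natural) key
            department TEXT NOT NULL,
            valid_from TEXT NOT NULL,                       -- inclusive, ISO date
            valid_to TEXT,                                  -- exclusive; NULL = still open
            is_current INTEGER NOT NULL                     -- 1 = current, 0 = expired
        )""")

def apply_change(conn, employee_id, department, change_date):
    """Expire the current row (UPDATE), then insert the new version (INSERT)."""
    row = conn.execute(
        "SELECT department FROM dim_employee WHERE employee_id = ? AND is_current = 1",
        (employee_id,)).fetchone()
    if row is not None and row[0] == department:
        return  # nothing changed, so no new version is needed
    conn.execute(
        "UPDATE dim_employee SET valid_to = ?, is_current = 0 "
        "WHERE employee_id = ? AND is_current = 1",
        (change_date, employee_id))
    conn.execute(
        "INSERT INTO dim_employee (employee_id, department, valid_from, valid_to, is_current) "
        "VALUES (?, ?, ?, NULL, 1)",
        (employee_id, department, change_date))

def department_on(conn, employee_id, as_of):
    """Return the department in effect on a given date (the time-period join)."""
    row = conn.execute(
        "SELECT department FROM dim_employee WHERE employee_id = ? "
        "AND valid_from <= ? AND (valid_to IS NULL OR ? < valid_to)",
        (employee_id, as_of, as_of)).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
build_dim(conn)
apply_change(conn, "E001", "A", "2005-01-01")
apply_change(conn, "E001", "B", "2006-01-01")
print(department_on(conn, "E001", "2005-06-30"))  # A
print(department_on(conn, "E001", "2006-06-30"))  # B
```

When fact rows are generated, the same date-range predicate used in `department_on` assigns each 2005 transaction to department A and each 2006 transaction to department B.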

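When the company's data has piled up and we want to see what is actually in it, we find that it is made up of individual production records, transaction records and so on. These records are the raw data for the fact tables we are going to build, that is, tables of factual records about a particular subject.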

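ETL is short for Extract, Transform and Load. It means extracting data from OLTP systems, transforming and integrating data from different sources to obtain consistent data, and then loading it into the data warehouse. For example, the figure below shows the effect of ETL data transformation.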
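The three ETL steps can be illustrated with a toy pipeline. This is a sketch, not a production design. It assumes two hypothetical sources that encode the same customer facts differently: source A uses ISO dates and "M"/"F" gender codes, and source B uses DD/MM/YYYY dates and 1/0 codes. The transform step maps both onto one consistent layout before loading into an in-memory SQLite table (`dw_customer`, also hypothetical).

```python
import sqlite3
from datetime import datetime

# Hypothetical source A: ISO dates, gender coded "M"/"F".
SOURCE_A = [{"cust_id": "A-1", "gender": "M", "signup": "2006-03-15"}]
# Hypothetical source B: DD/MM/YYYY dates, gender coded 1 (male) / 0 (female).
SOURCE_B = [{"id": 7, "sex": 0, "joined": "02/11/2005"}]

def extract():
    """Extract: pull raw rows from each source (here, in-memory lists)."""
    return SOURCE_A, SOURCE_B

def transform(rows_a, rows_b):
    """Transform: map both sources onto one consistent layout and coding."""
    out = []
    for r in rows_a:
        gender = {"M": "Male", "F": "Female"}[r["gender"]]
        out.append(("A", r["cust_id"], gender, r["signup"]))
    for r in rows_b:
        signup = datetime.strptime(r["joined"], "%d/%m/%Y").date().isoformat()
        gender = "Male" if r["sex"] == 1 else "Female"
        out.append(("B", str(r["id"]), gender, signup))
    return out

def load(conn, rows):
    """Load: write the consistent rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS dw_customer "
                 "(source TEXT, source_id TEXT, gender TEXT, signup_date TEXT)")
    conn.executemany("INSERT INTO dw_customer VALUES (?, ?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(conn, transform(*extract()))
for row in conn.execute("SELECT * FROM dw_customer ORDER BY source"):
    print(row)
```

Keeping the source system and source ID alongside the conformed values makes it possible to trace a warehouse row back to where it came from.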

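Errors are bound to occur during data processing and produce error messages, so how do we capture those messages and correct the problems promptly? The approach is to use one or more log tables to record the errors. For each run, the log table records the number of rows extracted, the number processed successfully, the number that failed, the failed data itself, the processing time and so on. Then, when an error occurs, it is easy to locate the problem and correct or reprocess the faulty data.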
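A minimal sketch of the log-table approach follows, assuming hypothetical tables `etl_log` (one summary row per run) and `etl_error` (one row per failed record, with the raw data and the reason). The input format, comma-separated `sale_id,amount` strings, is also an assumption made for illustration.

```python
import sqlite3
from datetime import datetime

def run_batch(conn, batch_name, raw_rows):
    """Load rows, counting successes and failures, and write one log row per run."""
    conn.execute("CREATE TABLE IF NOT EXISTS fact_sales (sale_id INTEGER, amount REAL)")
    conn.execute("""CREATE TABLE IF NOT EXISTS etl_log (
        batch TEXT, run_at TEXT, extracted INTEGER, succeeded INTEGER, failed INTEGER)""")
    conn.execute("""CREATE TABLE IF NOT EXISTS etl_error (
        batch TEXT, raw_row TEXT, message TEXT)""")
    ok = 0
    for raw in raw_rows:
        try:
            sale_id, amount = raw.split(",")
            conn.execute("INSERT INTO fact_sales VALUES (?, ?)", (int(sale_id), float(amount)))
            ok += 1
        except ValueError as exc:
            # Keep the failing row and the reason so it can be corrected and reprocessed.
            conn.execute("INSERT INTO etl_error VALUES (?, ?, ?)", (batch_name, raw, str(exc)))
    conn.execute("INSERT INTO etl_log VALUES (?, ?, ?, ?, ?)",
                 (batch_name, datetime.now().isoformat(timespec="seconds"),
                  len(raw_rows), ok, len(raw_rows) - ok))
    conn.commit()

conn = sqlite3.connect(":memory:")
run_batch(conn, "batch-001", ["1,10.5", "2,abc", "3"])
print(conn.execute("SELECT extracted, succeeded, failed FROM etl_log").fetchone())  # (3, 1, 2)
```

Splitting the summary counts from the per-row errors keeps the log small to scan, while still preserving the exact failed data for reprocessing.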

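In a data warehouse system, one very important goal is to preserve the history of changes to the data. Change Data Capture (CDC) is a technique developed for this purpose. Common CDC methods include: 1) full-scan comparison of files or tables, 2) reading DBMS logs, 3) adding triggers to the source system, 4) using timestamps in the source system, 5) replication-based capture, and 6) change data capture facilities provided by the DBMS itself. Of these, CDC provided by the DBMS, where the DBMS performs the actual capture, is the prevailing trend.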
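Method 1 (full-scan comparison) is the simplest to illustrate: take two complete snapshots of a source table keyed by primary key and classify every difference. This is a sketch, assuming both snapshots fit in memory as dictionaries. Its cost grows with the size of the whole table rather than with the number of changes, which is one reason the log-based and DBMS-provided methods are preferred at scale.

```python
def diff_snapshots(previous, current):
    """Compare two full snapshots keyed by primary key.

    Both arguments map primary key -> row tuple. Returns (inserts, updates, deletes).
    """
    inserts = {k: v for k, v in current.items() if k not in previous}
    deletes = {k: v for k, v in previous.items() if k not in current}
    updates = {k: (previous[k], v) for k, v in current.items()
               if k in previous and previous[k] != v}
    return inserts, updates, deletes

# Hypothetical snapshots of an employee table: id -> (name, department)
yesterday = {1: ("Smith", "A"), 2: ("Jones", "A"), 3: ("Lee", "C")}
today = {1: ("Smith", "B"), 2: ("Jones", "A"), 4: ("Wang", "C")}
ins, upd, dels = diff_snapshots(yesterday, today)
print(ins)   # {4: ('Wang', 'C')}
print(upd)   # {1: (('Smith', 'A'), ('Smith', 'B'))}
print(dels)  # {3: ('Lee', 'C')}
```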
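
Innovative Applications: Experiences in Data Modeling
2007-09-27 20:38:45  Source: Internet
    The author has worked in the database and data warehouse field since 1998, nearly eight years now, and has had a fair amount of exposure to data modeling. I would not presume to claim anything innovative; this article simply summarizes lessons from my work for everyone to discuss and correct. When it comes to data modeling, one thing must be stated first ...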

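First, it should be said that the current consensus in the data warehousing field is that a data warehouse needs to retain enterprise-wide, consistent atomic-level data. The independent data mart architecture (Independent data marts) has no enterprise-wide consistent data and is likely to lead to information silos. Unless the enterprise is very small or the marts cover only fixed subjects, this architecture is not recommended. The federated data warehouse architecture (Federated Data Warehouse Architecture), whether federated geographically or functionally, requires first building separate data warehouses on different platforms and then integrating them through reference data. This easily leads to incomplete integration, unless the federated architecture also adopts functionality similar to Kimball's bus architecture (Bus Architecture), namely maintaining conformed dimensions in the data staging area and continuously updating them. These two architectures are therefore outside the scope of this discussion. The remaining three architectures are discussed below.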


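  The foremost advocate of this architecture is Inmon, the father of the data warehouse, and his Corporate Information Factory is its typical example. This architecture is also called the Enterprise Data Warehouse (EDW). The Corporate Information Factory is implemented by first integrating data across the whole enterprise and building an enterprise information model, that is, the EDW. Data marts or exploration warehouses are then built for the various analytical needs, sourcing their data from the EDW. A third-normal-form (3NF) atomic layer adds some complexity to building OLAP, but it provides better support for more complex applications such as data-mining warehouses and exploration warehouses. Architectures of this kind take a long time to build and are correspondingly expensive.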

  2) Star schema atomic layer + HOLAP



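  This architecture is also called the centralized architecture (Centralized Architecture). The idea is to build ROLAP directly on the 3NF atomic layer, and MicroStrategy is a product that does this particularly well. Defining ROLAP on a 3NF atomic layer is considerably more complex than defining it on a star-schema atomic layer. With this architecture you need to put more effort into defining the ROLAP layer. Also, because ROLAP metadata is not necessarily in a common format, presentation on top of ROLAP is likely to be limited by the tools. This architecture is very similar to the first one, except that it lacks data marts on top of the atomic layer.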

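  In some industries, such as retail, numeric values like the contract amount in a contract table may change after they are entered. In other words, fact table data can also change. Changes to fact data usually count as corrections, and the fact table can be updated directly, in the same way as Type 1 handling of a slowly changing dimension. The author does not favour handling fact changes by inserting a new correction record as a snapshot, because this causes too much trouble for downstream reporting and analysis programs.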
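The Type 1-style correction the author recommends is simply an in-place overwrite of the fact row, with no history row kept. A minimal sketch, assuming a hypothetical `fact_contract` table keyed by `contract_id`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_contract (contract_id TEXT PRIMARY KEY, amount REAL)")
conn.execute("INSERT INTO fact_contract VALUES ('C-100', 50000.0)")

def correct_amount(conn, contract_id, new_amount):
    """Type 1-style correction: overwrite the fact in place; no history row is kept."""
    cur = conn.execute("UPDATE fact_contract SET amount = ? WHERE contract_id = ?",
                       (new_amount, contract_id))
    return cur.rowcount  # number of fact rows corrected

correct_amount(conn, "C-100", 48000.0)
print(conn.execute("SELECT * FROM fact_contract").fetchall())  # [('C-100', 48000.0)]
```

Because the fact row is overwritten, downstream reports see one row per contract and need no logic to pick the latest correction. The trade-off is that the original value is lost unless it is captured elsewhere, for example in an ETL log.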

Dimension Design Best Practices
Good dimension design is the most important aspect of a well-designed Analysis Services OLAP database. Although the wizards in Analysis Services do much of the work to get you started, it is important to review the design that is created by the wizard and ensure that the attributes, relationships, and hierarchies correctly reflect the data and match the needs of your end-users.

Do create attribute relationships wherever they exist in the data
Attribute relationships are an important part of dimension design. They help the server optimize storage of data, define referential integrity rules within the dimension, control the presence of member properties, and determine how MDX restrictions on one hierarchy affect the values in another hierarchy. For these reasons, it is important to spend some time defining attribute relationships that accurately reflect relationships in the data.

Avoid creating attributes that will not be used
Attributes add to the complexity and storage requirements of a dimension, and the number of attributes in a dimension can significantly affect performance. This is especially true of attributes that have AttributeHierarchyEnabled set to True. Although SQL Server 2005 Analysis Services can support many attributes in a dimension, having more attributes than are actually used decreases performance unnecessarily and can make the end-user experience more difficult.

It is usually not necessary to create an attribute for every column in a table. Even though the wizards do this by default in Analysis Services 2005, a better design approach is to start with the attributes you know you'll need, and later add more attributes. Adding attributes as you discover they are needed is generally a better practice than adding everything and then removing attributes.

Do not create hierarchies where an attribute of a lower level contains fewer members than an attribute of the level above
A hierarchy such as this is frequently an indication that your levels are in the incorrect order: for example, [City] above [State]. It might also indicate that the key columns of the lower level are missing a column: for example, [Year] above [Quarter Number] instead of [Year] above [Quarter with Year]. Either of these situations will lead to confusion for end-users trying to use and understand the cube.
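Before building a hierarchy, you can sanity-check the source data on the relational side by counting the distinct members at each proposed level. The sketch below is a heuristic written for illustration and is not an Analysis Services feature. It assumes hypothetical dimension rows as dictionaries, with each level given by its key columns. A lower level with fewer members than the level above it is flagged.

```python
def check_level_order(rows, levels):
    """Warn when a lower level has fewer distinct members than the level above it.

    rows: dimension rows as dicts; levels: top-to-bottom list of tuples, each tuple
    holding the key columns of one level.
    """
    counts = [len({tuple(r[c] for c in key) for r in rows}) for key in levels]
    warnings = []
    for upper, lower, n_up, n_low in zip(levels, levels[1:], counts, counts[1:]):
        if n_low < n_up:
            warnings.append(f"{lower} has {n_low} members but {upper} above it has {n_up}")
    return warnings

rows = [{"City": "Seattle", "State": "WA"},
        {"City": "Spokane", "State": "WA"},
        {"City": "Portland", "State": "OR"}]
print(check_level_order(rows, [("City",), ("State",)]))  # one warning: levels are reversed
print(check_level_order(rows, [("State",), ("City",)]))  # []
```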

Do not include more than one non-aggregatable attribute per dimension
Because there is no All member, each non-aggregatable attribute will always have some non-all member selected, even if not specified in a query. Therefore, if you include multiple non-aggregatable attributes in a dimension, the selected attributes will conflict and produce unexpected numbers.

For example, in a time dimension it might not make sense to sum the members of [Calendar Year] or [Fiscal Year], but if both are made non-aggregatable, whenever a user asks for data for a specific [Calendar Year] it will be filtered by the default [Fiscal Year] unless they also specify the [Fiscal Year]. Worse, because [Calendar Year] and [Fiscal Year] do not align but overlap, it is difficult to obtain the full data for either a [Calendar Year] or a [Fiscal Year] because the one is filtered by the other.

Do use key columns that completely and correctly define the uniqueness of the members in an attribute
Usually a single key column is sufficient, but sometimes multiple key columns are necessary to uniquely identify members of an attribute. For example, it is common in time dimensions to have a [Month] attribute include both [Year] and [Month Name] as key columns. This is known as a composite key and identifies January of 1997 as being a different member than January of 1998. When you use [Month] in a time hierarchy that also contains [Year], this distinction between January of 1997 and January of 1998 is important.

It may also make sense to have a separate [Month of Year] attribute that has only [Month Name] as the key. This [Month of Year] attribute contains a single January member that spans all years, which can be useful for comparing seasonal data. However, this attribute should not be used in a hierarchy together with [Year] because there is no relationship between [Month of Year] and [Year].

Similar distinctions between [Quarter] and [Quarter of Year], [Semester] and [Semester of Year], and so on should also be made by setting appropriate key columns.
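The difference between a composite-key [Month] and a name-only [Month of Year] amounts to a difference in grouping keys. A short sketch with hypothetical sales rows of `(year, month name, amount)`:

```python
from collections import defaultdict

# Hypothetical fact rows: (year, month name, sales amount)
sales = [(1997, "January", 100), (1997, "February", 80),
         (1998, "January", 120), (1998, "February", 90)]

def aggregate(rows, key_fn):
    """Sum amounts per member, where key_fn defines the attribute's key columns."""
    totals = defaultdict(int)
    for year, month, amount in rows:
        totals[key_fn(year, month)] += amount
    return dict(totals)

# [Month]: composite key (Year, Month Name), so January 1997 and January 1998 differ.
by_month = aggregate(sales, lambda y, m: (y, m))
# [Month of Year]: key is Month Name only, so one January member spans all years.
by_month_of_year = aggregate(sales, lambda y, m: m)
print(by_month[(1997, "January")], by_month[(1998, "January")])  # 100 120
print(by_month_of_year["January"])  # 220
```

The second grouping is the one that supports seasonal comparisons. It also shows why [Month of Year] cannot sit under [Year] in a hierarchy: each of its members spans several years.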

Do perform Process Index after doing a Process Update if the dimension contains flexible AttributeRelationships or a parent-child hierarchy
An aggregation is considered flexible if any attribute included in the aggregation is related, either directly or indirectly, to the key of its dimension through an AttributeRelationship with RelationshipType set to Flexible. Aggregations that include parent-child hierarchies are also considered flexible.

When a dimension is processed by using the Process Update option, any flexible aggregations that the dimension participates in might be dropped, depending on the contents of the new dimension data. These aggregations are not rebuilt by default, so Process Index must then be explicitly performed to rebuild them.

Do use numeric keys for attributes that contain many members (>1 million)
Using a numeric key column instead of a string key column or a composite key will improve the performance of attributes that contain many members. This best practice is based on the same concept as using surrogate keys in relational tables for more efficient indexing. You can specify the numeric surrogate column as the key column and still use a string column as the name column so that the attribute members appear the same to end-users. As a guideline, if the attribute has more than one million members, you should consider using a numeric key.

Do not create redundant attribute relationships
Do not create attribute relationships that are transitively implied by other attribute relationships. The alternative paths created by these redundant attribute relationships can cause problems for the server and are of no benefit to the dimension. For example, if the relationships A->B, B->C, and A->C have been created, A->C is redundant and should be removed.
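Finding redundant relationships is a transitive-reduction check. An edge A->C is redundant if C is still reachable from A once that edge is ignored. The sketch below treats the attribute relationships as a plain adjacency map (attribute -> related attributes), which is an assumption made for illustration and not how Analysis Services stores them.

```python
def reachable(graph, start, skip_edge):
    """Attributes reachable from start, ignoring one specific edge."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if (node, nxt) == skip_edge or nxt in seen:
                continue
            seen.add(nxt)
            stack.append(nxt)
    return seen

def redundant_relationships(graph):
    """Edges u->v that are implied transitively by other relationships."""
    return sorted((u, v) for u, targets in graph.items() for v in targets
                  if v in reachable(graph, u, skip_edge=(u, v)))

rels = {"A": {"B", "C"}, "B": {"C"}}
print(redundant_relationships(rels))  # [('A', 'C')]
```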

Do include the key columns of snowflake tables joined to nullable foreign keys as attributes that have NullProcessing set to UnknownMember
If tables that are used in a dimension are joined on a foreign key column that might contain nulls, it is important that you include in your design an attribute whose key column is the corresponding key in the lookup table. Without such an attribute, the OLAP server would have to issue a query to join the two tables during dimension processing. This makes processing slower; moreover, the default join that is created by the OLAP server would exclude any rows that contain nulls in the foreign key column. It is important to set the NullProcessing option on the key column of this attribute to UnknownMember. The reason is that, by default, nulls are converted to zeros or blanks when the engine processes attributes. This can be dangerous when you are processing a nullable foreign key. Conversion of a null to zero at best produces an error; in the worst case, the zero may be a legitimate value in the lookup table, thereby producing incorrect results.

To handle nullable foreign keys correctly, you must also set UnknownMember to Visible on the dimension. The Cube Wizard and Dimension Wizard currently set this property automatically; however, the Dimension Wizard lets you manually deselect the key attribute of snowflake tables. You must not deselect the key column if the corresponding foreign key is nullable.

If you do not want to browse the attribute that contains the lookup table key column, you can set AttributeHierarchyVisible to False. However, AttributeHierarchyEnabled must be set to True because it is necessary that all other attributes in the lookup table be directly or indirectly related to the lookup key attribute in order to avoid the automatic creation of new joins during dimension processing.
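The danger described above can be reproduced with a relational analogy in SQLite. This is not Analysis Services code: `COALESCE(fk, 0)` stands in for the engine's default null-to-zero conversion, and a LEFT JOIN with an 'Unknown' label stands in for NullProcessing = UnknownMember. The table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_promotion (promotion_key INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO dim_promotion VALUES (0, 'No Discount'), (1, 'Summer Sale');
    CREATE TABLE product (product_id INTEGER, promotion_key INTEGER);  -- nullable FK
    INSERT INTO product VALUES (10, 1), (11, NULL);
""")

# Plain inner join: product 11 is silently dropped.
dropped = conn.execute("""
    SELECT p.product_id, d.name FROM product p
    JOIN dim_promotion d ON d.promotion_key = p.promotion_key
    ORDER BY p.product_id""").fetchall()

# Null converted to zero: product 11 lands on a legitimate member, 'No Discount'.
as_zero = conn.execute("""
    SELECT p.product_id, d.name FROM product p
    JOIN dim_promotion d ON d.promotion_key = COALESCE(p.promotion_key, 0)
    ORDER BY p.product_id""").fetchall()

# Unknown-member treatment: nulls go to an explicit 'Unknown' bucket instead.
as_unknown = conn.execute("""
    SELECT p.product_id, COALESCE(d.name, 'Unknown') FROM product p
    LEFT JOIN dim_promotion d ON d.promotion_key = p.promotion_key
    ORDER BY p.product_id""").fetchall()

print(dropped)     # [(10, 'Summer Sale')]
print(as_zero)     # [(10, 'Summer Sale'), (11, 'No Discount')]
print(as_unknown)  # [(10, 'Summer Sale'), (11, 'Unknown')]
```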

Do set the RelationshipType property appropriately on AttributeRelationships based on whether the relationships between individual members change over time
The relationships between members of some attributes, such as dates in a given month or the gender of a customer, are not expected to change. Other relationships, such as salespeople in a given region or the marital status of a customer, are more prone to change over time. You should set RelationshipType to Flexible for those relationships that are expected to change and set RelationshipType to Rigid for relationships that are not expected to change.

When you set RelationshipType appropriately, the server can optimize the processing of changes and re-building of aggregations.

By default, the user interface always sets RelationshipType to Flexible.

Avoid using ErrorConfigurations with KeyDuplicate set to IgnoreError on dimensions
When KeyDuplicate is set to IgnoreError, it can be difficult to detect problems with incorrect key columns, incorrectly defined AttributeRelationships, and data consistency issues. Instead of using the IgnoreError option, in most cases it is better to correct your design and clean the data. The IgnoreError option may be useful in prototypes where correctness is less of a concern. Be aware that the default value for KeyDuplicate is IgnoreError. Therefore, it is important to change this value after prototyping is complete to ensure data consistency.

Do define explicit default members for non-aggregatable attributes
By default, the All member is used as the default member for aggregatable attributes. This default works very well for aggregatable attributes, but non-aggregatable attributes have no obvious choice for the server to use as a default member; therefore, a member will be selected arbitrarily. This arbitrarily selected member is then selected whenever the attribute is not explicitly included in an MDX query. To avoid this, it is important to explicitly set a default value for each non-aggregatable attribute.

Default members can be explicitly set either on the DimensionAttribute or in the cube script.

Avoid creating user-defined hierarchies that do not have attribute relationships relating each level to the level above
Having attribute relationships between every level in a hierarchy makes the hierarchy strong and enables significant server optimizations.

Avoid creating diamond-shaped attribute relationships
A diamond-shaped relationship refers to a chain of attribute relationships that splits and rejoins but contains no redundant relationships. For example, Day->Month->Year and Day->Quarter->Year have the same start and end points, but do not have any common relationships. The presence of multiple paths can create some ambiguity on the server. If preserving the multiple paths is important, it is strongly recommended that you resolve the ambiguity by creating user hierarchies that contain all the paths.
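A diamond shows up as a pair of attributes joined by more than one chain of relationships. The sketch below counts paths between every pair, using the same illustrative adjacency-map representation (an assumption, not the server's model). It assumes the relationships form a DAG and that redundant edges have already been removed, because a redundant edge such as A->C alongside A->B->C would also produce two paths.

```python
from functools import lru_cache

def diamonds(graph):
    """Pairs (start, end) joined by more than one chain of attribute relationships."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}

    @lru_cache(maxsize=None)
    def paths(src, dst):
        # Number of distinct relationship chains from src to dst (graph must be acyclic).
        if src == dst:
            return 1
        return sum(paths(nxt, dst) for nxt in graph.get(src, ()))

    return sorted((s, t) for s in nodes for t in nodes if s != t and paths(s, t) > 1)

rels = {"Day": ("Month", "Quarter"), "Month": ("Year",), "Quarter": ("Year",)}
print(diamonds(rels))  # [('Day', 'Year')]
```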

Consider setting AttributeHierarchyEnabled to False on attributes that have cardinality that closely matches the key attribute
When an attribute contains roughly one value for each distinct value of the key attribute, it usually means that the attribute contains only alternative identification information or secondary details. Such attributes are usually not interesting to pivot or group by. For example, the Social Security number or telephone number may be interesting properties to view, but there is very little value in being able to pivot and group based on SSN or telephone. Setting AttributeHierarchyEnabled to False on such attributes will reduce the complexity of the dimension for end-users and improve its performance.

If you want to be able to browse such attributes, you can set AttributeHierarchyEnabled to True; however, you should consider setting AttributeHierarchyOptimized to NotOptimized and setting GroupingBehavior to DiscourageGrouping. By setting these properties, you can improve performance and indicate to the users that the attribute is not very useful for grouping.
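To shortlist candidates for AttributeHierarchyEnabled = False, you can compare each column's distinct count with that of the key on the relational source. This is a rough heuristic. The 0.9 threshold and the sample customer rows are assumptions for illustration, not a documented rule.

```python
def near_key_attributes(rows, key_column, threshold=0.9):
    """Attributes whose distinct count is close to the key attribute's distinct count."""
    n_key = len({r[key_column] for r in rows})
    result = []
    for col in rows[0]:
        if col == key_column:
            continue
        ratio = len({r[col] for r in rows}) / n_key
        if ratio >= threshold:
            result.append(col)
    return result

customers = [
    {"CustomerKey": 1, "Phone": "555-0101", "Gender": "F", "City": "Leeds"},
    {"CustomerKey": 2, "Phone": "555-0102", "Gender": "M", "City": "Leeds"},
    {"CustomerKey": 3, "Phone": "555-0103", "Gender": "F", "City": "York"},
]
print(near_key_attributes(customers, "CustomerKey"))  # ['Phone']
```

On a sample this small every ratio is coarse. On real dimension data the near-key columns (phone numbers, SSNs and similar identifiers) stand out much more clearly.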

Consider setting AttributeHierarchyVisible to False on the key attribute of parent-child dimensions
Because the members of the key attribute are also contained in the parent-child hierarchy in a more organized manner, it is usually unnecessary and confusing to the end-user to expose the flat list of members contained in the key attribute.

Avoid setting UnknownMember=Hidden
When you suppress unknown members, the effect is to hide relational integrity issues; moreover, because hidden members might contain data, results might appear not to add up. Therefore, we recommend that you avoid use of this setting except in prototype applications.

Do use MOLAP storage mode for dimensions with outline calculations (custom rollups, semi-additive measures, and unary operators)
Dimensions that contain custom rollups or unary operators will perform significantly better using MOLAP storage. The following dimension types will also benefit from using MOLAP storage: an Account dimension in a measure group that contains measures aggregated using ByAccount; the first time dimension in a measure group that contains other semi-additive measures.

Do use a 64 bit server if you have dimensions with more than 10 million members
If a dimension contains more than 10 million members, using an x64 or an IA-64-based server is recommended for better performance.

Do set the OrderBy property for time attributes and other attributes whose natural ordering is not alphabetical
By default, the server orders attribute members alphabetically, by name. This ordering is especially undesirable for time attributes. To obtain the desired ordering, use the OrderBy and OrderByAttributes properties and explicitly specify how you want the members ordered. For time-based attributes, there is frequently a date or numeric key column that can be used to obtain the correct chronological ordering.
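The effect of ordering by name versus ordering by a numeric key column is easy to see with a handful of month members. Each hypothetical member below carries a name column and a month-number key column.

```python
# Hypothetical [Month] members: (name column, numeric key column)
members = [("April", 4), ("August", 8), ("February", 2), ("January", 1), ("March", 3)]

by_name = [name for name, _ in sorted(members, key=lambda m: m[0])]
by_key = [name for name, _ in sorted(members, key=lambda m: m[1])]
print(by_name)  # ['April', 'August', 'February', 'January', 'March']
print(by_key)   # ['January', 'February', 'March', 'April', 'August']
```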

Do expose a DateTime MemberValue for date attributes
Some clients, such as Excel, will take advantage of the MemberValue property of date members and use the DateTime value that is exposed. When Excel recognizes the value as DateTime, Excel can treat the value as a date type and apply date functions to the value, as well as provide better formatting and filtering. If the key column is a single DateTime column and the name column has not been set, this MemberValue is automatically derived from the key column and no action is necessary. However, in other cases, you can ensure that the MemberValue is DateTime by explicitly specifying the ValueColumn property of the attribute.

Do set AttributeHierarchyEnabled to False, specify a ValueColumn and specify the MimeType of the ValueColumn on attributes that contain images
Because there is no value in browsing the member names of an attribute that contains an image, you should disable browsing by setting AttributeHierarchyEnabled to False. To help clients recognize and display the member property of the attribute as an image, specify the ValueColumn property of the attribute and then set MimeType to an appropriate image type.

Avoid setting IsAggregatable to False on any attribute other than the parent attribute in a parent-child dimension
Non-aggregatable attributes have non-all default members. These default members affect the result of queries whenever the attributes are not explicitly included. Because parent-child hierarchies generally represent the most interesting exploration path in dimensions that contain them, it is best to avoid having non-aggregatable attributes other than the parent attribute.

Do set dimension and attribute Type properties correctly for Time, Account, and Geography dimensions
For time dimensions, it is important to set the dimension and attribute types correctly so that time-related MDX functions and the time intelligence of the Business Intelligence Wizard can work correctly. For Account dimensions, it is similarly important to set appropriate account types when using measures with the aggregate function ByAccount. Geography types are not used by the server, but provide information for client applications.

A common mistake is to set the Type property on a dimension but not on an attribute, or vice-versa. Another common mistake when configuring time dimensions is to confuse the different time attribute types, such as [Month] and [Month of Year].

Consider creating user-defined hierarchies whenever you have a chain of related attributes in a dimension
Chains of related attributes usually represent an interesting navigation path for end-users, and defining hierarchies for these will also provide performance benefits.

Do include all desired attributes of a logical business entity in a single dimension instead of splitting them up over several dimensions
In Analysis Services 2000, each hierarchy was in reality a separate dimension and attributes such as gender and age would also be separate dimensions. In Analysis Services 2005, a dimension can and should contain the complete information about a logical business entity, including multiple hierarchies and many attributes. This does not mean that every piece of information available must be included in the dimension, but rather that any desired information should be included in one dimension instead of split over many dimensions.

There are two exceptions to this guideline:

A dimension can only contain one parent-child hierarchy.

To model multiple joins to a lookup table within a dimension's schema, you must create a separate dimension based on the lookup table and then use this as a referenced dimension.

Do not combine unrelated business entities into a single dimension
Combining attributes of independent business entities, such as customer and product or warehouse and time, into a single dimension will not only create a confusing model, but also reduce query performance because auto-exist will be applied across attributes within the dimension.

Another way to state this rule is that the values of the key attribute of a dimension should uniquely identify a single business entity and not a combination of entities. Generally this means having a single column key for the key attribute.

Do set NullProcessing to UnknownMember on each attribute that has nulls and is used to join to a referenced dimension
By default, nulls are converted to zeros or blanks when the engine processes attributes. This can be dangerous when processing a nullable foreign key, because if a null is converted to zero when zero is a legitimate value in the reference dimension, the join on the values can produce incorrect results. At best, conversion to zero will produce an error.

To prevent these errors, you must also set UnknownMember to Visible on the referenced dimension.

The Cube Wizard in SQL Server 2005 Analysis Services handles both settings automatically, except when dealing with existing dimensions where UnknownMember is not set to Visible.

Do set NullKeyConvertToUnknown to IgnoreError on the ErrorConfiguration on any measure groups that contain a dimension referenced through a nullable column
By default, nulls are converted to zeros or blanks when the engine processes granularity attributes. This can be dangerous when you are processing a nullable foreign key, because if a null value is converted to zero and zero is a legitimate value in the dimension, the join can produce incorrect results. At best, the conversion will produce errors.

To prevent conversion of nulls, you must also set UnknownMember to Visible on the dimension.

The Cube Wizard in SQL Server 2005 Analysis Services handles these settings automatically, except when dealing with existing dimensions where UnknownMember is not set to Visible.

Consider setting AttributeHierarchyVisible to False for attributes included in user-defined hierarchies
It is usually not necessary to expose an attribute in its own single level hierarchy when that attribute is included in a user-defined hierarchy. This duplication only complicates the end-user experience without providing additional value.

One common case in which it is appropriate to present two views of an attribute is in time dimensions. The ability to browse by [Month] and the ability to browse by [Month-Quarter-Year] are both very valuable. However, these two month attributes are actually separate attributes. The first contains only the month value such as “January” while the second contains the month and the year such as “January 1998”.

Do not use proactive caching settings that put dimensions into ROLAP mode
For performance reasons, we strongly discourage the use of dimension proactive caching settings that may put the dimension in ROLAP mode. To ensure that a dimension with proactive caching enabled will never enter ROLAP mode, you should set the OnlineMode property to OnCacheComplete. You can also prevent use of ROLAP mode by deselecting the Bring online immediately check box in the Storage Options dialog box.

Avoid making an attribute non-aggregatable unless it is at the end of the longest chain of attribute relationships in the dimension
Non-aggregatable attributes have non-all default members that affect the result of queries in which values for those attributes are not explicitly specified. Therefore, you should avoid making an attribute non-aggregatable unless that attribute is regularly used. Because the longest chain of attributes generally represents the most interesting exploration path for users, it is best to avoid having non-aggregatable attributes in other, less interesting chains.

Consider creating at least one user-defined hierarchy in each dimension that does not contain a parent-child hierarchy
Most (but not all) dimensions contain some hierarchical structure to the data which is worth exposing in the cube. Frequently the Cube Wizard or Dimension Wizard will not detect this hierarchy. In these cases, you should define a hierarchy manually.

Do set the InstanceSelection property on attributes to help clients determine the best way to display attributes for member selection
If there are too many members to display in a single list, the client user interface can use other methods, such as filtered lists, to display the members. By setting the InstanceSelection property, you provide a hint to client applications to suggest how a list of items should be displayed, based on the expected number of items in the list.

In a temporal database, it is necessary to distinguish between the surrogate key and the primary key. Typically, every row would have both a primary key and a surrogate key. The primary key identifies the unique row in the database; the surrogate key identifies the unique entity in the modelled world. These two keys are not the same. For example, table Staff may contain two rows for "John Smith": one row for when he was employed between 1990 and 1999, and another for when he was employed between 2001 and 2006. The surrogate key is identical (non-unique) in both rows; however, the primary key is unique.
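A minimal Python sketch of the Staff example, under the assumption that each version of a person is one row with a validity period. To avoid mixing up terminology, the fields are named neutrally: `row_key` plays the role the paragraph calls the primary key, and `entity_key` the role it calls the surrogate key (many data-warehousing texts call the entity identifier the business or natural key instead). The key values, dates and the `employed_on` helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class StaffRow:
    row_key: int      # unique per row (the paragraph's "primary key")
    entity_key: int   # shared by every version of one person (the paragraph's "surrogate key")
    name: str
    valid_from: date
    valid_to: date

staff = [
    StaffRow(1, 501, "John Smith", date(1990, 1, 1), date(1999, 12, 31)),
    StaffRow(2, 501, "John Smith", date(2001, 1, 1), date(2006, 12, 31)),
]

def employed_on(rows, entity_key, day):
    """Return the row describing one entity on a given day, or None if no row covers it."""
    for r in rows:
        if r.entity_key == entity_key and r.valid_from <= day <= r.valid_to:
            return r
    return None

print(employed_on(staff, 501, date(1995, 1, 1)))  # the 1990-1999 row
print(employed_on(staff, 501, date(2000, 6, 1)))  # None: the gap between the two spells
```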

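Microsoft headquarters: Microsoft Marketing data analysis projects, etc.
There are now many database vendors. Processing massive data places heavy demands on the database tool; Oracle or DB2 is generally used, and Microsoft's recently released SQL Server 2005 also performs well. In the BI area you also have to choose tools for databases, data warehouses, multidimensional databases, data mining and so on; good ETL tools and good OLAP tools are essential, for example Informatica and Essbase. In a real data analysis project, the author processed 60 million log records per day: SQL Server 2000 took 6 hours, while SQL Server 2005 took only 3 hours.
Partitioning massive data is essential. For example, data that is accessed by year can be partitioned by year. Different databases partition in different ways, but the mechanism is broadly the same. In SQL Server, for instance, partitioning stores different data in different filegroups, and the filegroups are placed on different disk partitions. This spreads the data out, reduces disk I/O contention and system load, and also lets you put logs, indexes and so on onto separate partitions.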
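The partitioning idea can be sketched in a few lines of Python. In SQL Server 2005 itself this is done with a partition function and a partition scheme that map key ranges to filegroups; the Python below only models the concept, and `partition_by_year` and `query_year` are hypothetical helpers. The point it shows is that a query restricted to one year only has to touch that year's bucket.

```python
from collections import defaultdict
from datetime import date

def partition_by_year(rows):
    """Group (date, payload) rows into one bucket per year: a toy stand-in
    for year-based partitions placed on separate filegroups/disks."""
    parts = defaultdict(list)
    for day, payload in rows:
        parts[day.year].append((day, payload))
    return dict(parts)

def query_year(parts, year):
    """A query filtered on one year reads only that partition (partition elimination)."""
    return parts.get(year, [])

rows = [(date(2005, 3, 1), "a"), (date(2006, 1, 1), "b"), (date(2005, 12, 31), "c")]
parts = partition_by_year(rows)
print(sorted(parts))            # [2005, 2006]
print(query_year(parts, 2005))  # only the two 2005 rows
```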
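If system resources are limited and you get out-of-memory errors, increasing virtual memory can help. In one real project the author had to process 1.8 billion records on a machine with 1 GB of RAM and a single 2.4 GHz Pentium 4 CPU; aggregating that much data failed with an out-of-memory error. The fix was to enlarge virtual memory: a 4096 MB paging area was created on each of six disk partitions, raising total virtual memory to 4096 × 6 + 1024 = 25600 MB, which solved the out-of-memory problem.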
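11. Define robust data-cleansing rules and error-handling mechanisms
12. Create views or materialized views
13. Avoid 32-bit machines (in extreme cases)
14. Consider operating-system issues
15. Use data warehouse and multidimensional database storage
16. Use sampled data for data mining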

"rebel state"

Why are Kosovo, Bosnia and Herzegovina not REBEL STATES???

SHAME on the BBC. Shame on the Western media this time.


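As experts in systems integration technology, we can help you improve operational efficiency. We will use our mature reserve of new technologies to help you lower your current debt-to-asset ratio and bring you higher returns, while also helping you build team spirit. We firmly believe we are the partner who can take you to the next level, and we will help you succeed with skilled tactics and strategy. Our world-class professionals make wide use of the industry's best techniques.
Let me tell you the story of Allison, one of our senior managers; that is the best way for you to get to know our company. Just last week she met a client from the West Coast. The company had just completed a merger, and the two original companies' systems looked very hard to reconcile, so it seemed they would have to scrap the old systems and replace everything. Allison's advice was: "Don't buy a single new computer yet." After several days of on-site observation and analysis, she found that with some specialised client software the two existing systems could work together perfectly well. In the end the client spent only $600,000 to get it all done, instead of the $3 million it had planned. The company's president could hardly believe his ears. He was so delighted with the solution that he even invited Allison and her family to spend a week at his mountain cabin this summer.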

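This has got to stop. I will no longer tolerate this careless attitude toward workplace safety. From this moment on, if I find that you haven't followed the safety rules, I will send you home on the spot, without pay. I don't care how busy we are: either these accident reports come back down to an acceptable level, or we'll see what happens.
Look around: there are safety signs everywhere, but I'm sure they aren't really working. If we don't follow the safety rules, we are gambling with our lives. Frank (gesture toward Frank), I know you coach that baseball team. If you broke your wrist (I said if), it would be very hard for you to throw the ball to the infielders. Michael (gesture toward Michael), I know you and your wife like to go fishing at weekends. With a bandaged foot and a limp, you couldn't steer the boat around the lake, handle the floats and land a fish. Once you get hurt, it means I have to phone your loved ones and tell them to come and see you at the clinic or hospital, to tell them you're hurt and you need them. I hate making those calls. I hate bringing bad news to your families. From today, I won't do it anymore. Because if I see any of you ignoring the safety precautions we have all agreed on, I will send you home immediately, and you can explain to your family yourself why you lost your job. Our safety record will be rewritten starting today, because I want every one of you to get home safely tonight and every night.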

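This is simply a disaster. We need to increase our service facilities by 50%. According to our facilities manager, that will cost $400,000. We need to expand our service facilities right away so that we can fulfil our mission.
It is getting harder and harder for us to fulfil our mission. I know this because last month I hardly saw Ronald. When he finally showed up, I asked him where he had been. He said that the last time, he had queued for an hour to get a meal here, and that scavenging in the rubbish bins took less time than waiting here. The sad thing is that he also said he would still rather be with us. I want him here too, and I know you all do as well. This is a safe, decent environment, not a slum. We should give Ronald at least one chance every day to enjoy our welcome and respect.
Ronald is exactly the person we exist to serve, yet we don't have the capacity to serve him and others well. We need more space for food preparation, more stoves, more serving windows, and more clean, safe places for people like Ronald. I want to see Ronald having dinner with us, not searching through rubbish heaps. He deserves better service, just like all our other guests. We need to spend $400,000, and I am going to tell you why it is a sound investment in improving our facilities and our community.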

A good but fairly advanced book, Yuan: Applied Data Mining (by the Italian author P. Giudici)


If you use CSV, remember to set the storage to STRING or REAL properly.

Question: how do you refer to a node name generated by a script? ^yourname

It is best to avoid duplicate node names; otherwise the nodes cannot be deleted by script ("ambiguous nodes"). The following may help: #Gets a reference to an existing node. This can be a useful way to ensure non-ambiguous references to nodes.
var mynode
set mynode = get node flag1:derivenode
position ^mynode at 400 400

Distribution node: normalize by color to make all bars the same length.

However, blanks (user-defined missing values) do contribute to the aggregate summaries, so these values should be replaced with $null$.

We will also sort the data to make the data aggregation more efficient??

Distinct performs poorly on large datasets. As an alternative, sort the data on the key fields, and then use the CLEM expression @OFFSET with a Select node to select (or discard) the first distinct record from each group.
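The sort-then-compare-with-previous idea can be sketched in Python. This is an analogue of the Sort node plus Select node with @OFFSET, not Clementine code; the function name and the field name `id` are hypothetical. After sorting, a record is the first of its group exactly when its key differs from the key of the record before it, so a single pass suffices.

```python
def first_per_group(records, key_fields):
    """Sort on the key fields, then keep a record only when its key differs
    from the previous record's key (roughly what a Select node comparing a
    key with @OFFSET(key, 1) does after a Sort node)."""
    def key_of(rec):
        return tuple(rec[f] for f in key_fields)

    ordered = sorted(records, key=key_of)  # stable: original order kept within a group
    kept = []
    previous = None
    for rec in ordered:
        k = key_of(rec)
        if k != previous:
            kept.append(rec)
        previous = k
    return kept

recs = [{"id": 2, "v": "b1"}, {"id": 1, "v": "a1"}, {"id": 2, "v": "b2"}, {"id": 1, "v": "a2"}]
print([r["v"] for r in first_per_group(recs, ["id"])])  # ['a1', 'b1']
```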

If the Keys are contiguous check box is selected, values for the key fields will only be treated as equal if they occur in adjacent records
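A small Python illustration of what "Keys are contiguous" changes. The function and field names are hypothetical; `itertools.groupby` merges only adjacent records with equal keys, which mirrors the option, while the dictionary branch merges all records with the same key regardless of position. The option is only safe when the data is already sorted (or otherwise grouped) on the keys.

```python
from itertools import groupby

def aggregate_counts(records, key, contiguous):
    """Count records per key value. With contiguous=True only adjacent
    records with the same key are merged, as with 'Keys are contiguous'."""
    if contiguous:
        return [(k, sum(1 for _ in grp)) for k, grp in groupby(records, key=lambda r: r[key])]
    counts = {}
    for r in records:
        counts[r[key]] = counts.get(r[key], 0) + 1
    return list(counts.items())

data = [{"k": "A"}, {"k": "A"}, {"k": "B"}, {"k": "A"}]
print(aggregate_counts(data, "k", contiguous=True))   # [('A', 2), ('B', 1), ('A', 1)]
print(aggregate_counts(data, "k", contiguous=False))  # [('A', 3), ('B', 1)]
```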

Columns with too much missing data should be dropped.

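Whether a value counts as an outlier depends on the purpose of the analysis: when you are looking for wealthy customers, the outliers are what you want; when you are studying ordinary customers, outliers are a problem. Categorical fields usually have no outliers/anomalies.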

If a category value occurs too rarely (e.g. "runs to work"), it has no statistical significance: infrequent behaviour.


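Linear regression can be distorted by a small proportion of outliers; decision trees and similar methods are less affected. Use a histogram overlay or the mean to see the effect on the distribution. Demographic data is usually not used to build clusters but to validate them. Using highly correlated columns together amounts to giving that information unnecessary extra weight.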
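Two of these claims can be checked with a few lines of plain Python. The data is made up for illustration: ten points on y = 2x, then the same points with one y value replaced by an outlier. The second part uses squared Euclidean distance (what distance-based clustering such as K-Means works with) to show that adding a perfectly correlated copy of a column doubles that column's contribution. The function names are hypothetical helpers, not Clementine features.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def sq_euclid(a, b):
    """Squared Euclidean distance between two records."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

xs = list(range(1, 11))
ys = [2 * x for x in xs]          # perfectly linear data, slope 2
ys_outlier = ys[:-1] + [200]      # one bad point out of ten
clean_slope = ols_slope(xs, ys)            # 2.0
outlier_slope = ols_slope(xs, ys_outlier)  # roughly 11.8

# Records as (income, age), then as (income, income_copy, age).
a, b = [1.0, 0.0], [0.0, 0.0]
a2, b2 = [1.0, 1.0, 0.0], [0.0, 0.0, 0.0]
print(sq_euclid(a, b), sq_euclid(a2, b2))  # 1.0 2.0: income now counts twice
```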

* Organized into sections by syntax topic; mostly illustrations of syntax elements, with little explanation.
* When reading, you may

Standard script: stored in a file

BE AWARE of ^ (variable reference, but not very useful?) and \ (line continuation character, and possibly also used to escape embedded quotes).
If you use quotation marks within a CLEM expression, make sure that each quotation mark is preceded by a backslash (\)—for example:

set :node.parameter = "BP = \"HIGH\""

Script Syntax section: Variable names, such as ^mystream, are preceded with a caret (^) symbol when referencing an existing variable whose value has already been set. The caret is not used when declaring or setting the value of the variable. See Referencing Nodes for more information

• You can specify nodes by name--for example, DRUG1n. You can qualify the name by type--for example, Drug:neuralnetnode refers to a Neural Net node named Drug and not to any other kind of node.

• You can specify nodes by type only—for example, :neuralnetnode refers to all Neural Net nodes. Any valid node type can be used—for example, samplenode, neuralnetnode, and kmeansnode. The node suffix is optional and can be omitted, but including it is recommended because it makes identifying errors in scripts easier.

• You can reference each node by its unique ID as displayed on the Annotations tab for each node. Use an "@" symbol followed by the id, for example @id5E5GJK23L.custom_name = "My Node". See Annotating Nodes and Streams for more information.

@MEAN(BALANCE,5): mean of BALANCE over the last 5 records that have passed through; @SUM(field): sum over all records so far.
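A Python analogue of these two sequence functions, assuming (as the Clementine documentation describes) that the 5-record window includes the current record. `running_stats` is a hypothetical helper; it returns, for each incoming record, the pair (moving mean over the last `window` values, cumulative sum). A `deque` with `maxlen` drops the oldest value automatically.

```python
from collections import deque

def running_stats(values, window=5):
    """For each record: (mean of the last `window` values including the
    current one, cumulative sum so far), like @MEAN(FIELD, 5) and @SUM(FIELD)."""
    recent = deque(maxlen=window)
    total = 0
    out = []
    for v in values:
        recent.append(v)
        total += v
        out.append((sum(recent) / len(recent), total))
    return out

balances = [10, 20, 30, 40, 50, 60]
for mean5, cumsum in running_stats(balances):
    print(mean5, cumsum)
# The last line is 40.0 210: the mean of 20..60 and the sum of all six values.
```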

I. Variables and types

1 Field names and variable names start with a letter and may contain letters, digits, and underscores.

2 Data type examples
String -- "c1", "Type 2", "a piece of free text"
Integer -- 12, 0, -189
Real -- 12.34, 0.0, -0.0045
Date/time -- 05/12/2002, 12/05/2002, 12/05/02
Character -- `a` or 3
List -- [1 2 3], ['Type 1' 'Type 2']

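3 Quoting rules
Strings -- preferably use double quotes. Single quotes also work, but they can be confused with field names.
Characters -- use the backquote ` (the key below ESC).
You can also index a character from a string, e.g. lowertoupper("druga"(5)) → "A"
Field names -- usually unquoted, but a name containing spaces or other special characters must be enclosed in single quotes.
Parameter names -- must be enclosed in single quotes.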

II. Syntax

1 Operator precedence (highest first)
* / mod div rem
+ -
> < >= <= /== == = /=

2 Control structures
a if..then..else

if ^param = 24 then
create derivenode
else exit 2

b for loops

× for PARAMETER from N to M

× for PARAMETER in_models

× for PARAMETER in_fields_at NODE

× for PARAMETER in_fields_to NODE

× for PARAMETER in_streams

× exit

3 Assignment examples
× set :balancenode.directives = [{1.3 "Age > 60"}]
set :fillernode.condition = "(Age > 60) and (BP = \"High\")"
set :derivenode.formula_expr = "substring(5, 1, Drug)"
set Flag:derivenode.flag_expr = "Drug = X"
set :selectnode.condition = "Age >= '$P-cutoff'"
set :derivenode.formula_expr = "Age - GLOBAL_MEAN(Age)"
set nodename.tablename="mytable"
set :databasenode.table="atablename"
set my_node = get node :plotnode
set :samplenode {
max_size = 200
mode = "Include"
sample_type = "First"
}
The full form is: set nodename:NODETYPE.prop=value

× Setting SuperNode parameters
set mySuperNode.parameters.minvalue = 30
set :process_supernode.parameters.minvalue = 30
set :process_supernode.parameters.minvalue = ""
set mySuperNode:process_supernode.parameters.minvalue = 30
set mySuperNode.parameters.'Data_subset:samplenode.rand_pct' = 50
set :source_supernode.parameters.'Data_subset:samplenode.rand_pct' = 50

4 Setting the position of a node icon
position nodename at 450 50

5 Executing a node
execute :exe_node_name

6 Creating new nodes and a stream
var x
set x = create typenode
rename ^x as "mytypenode"
position ^x at 200 200
var y
set y = create varfilenode
rename ^y as "mydatasource"
position ^y at 100 200
connect ^y to ^x

set node = create typenode
rename ^node as "mytypenode"
position ^node at 200 200
set node = create varfilenode
rename ^node as "mydatasource"
position ^node at 100 200
connect mydatasource to mytypenode


7 Accessing output data

×set num_rows = :tablenode.output.row_count

×set table_data = :tablenode.output
set last_value = value table_data at num_rows num_cols

8 File operations
× Open a file
open MODE FILENAME, where MODE is create or append

× Close a file
close FILE

× Example
set file = open create 'C:/Data/script.out'
for I from 1 to 3
write file 'Stream ' >< I
endfor
close file

9 Connecting nodes
create tablenode
create variablefilenode
connect :variablefilenode to :tablenode
set :variablefilenode.full_filename = "C:/Program Files/Clementine/8.1/demos/DRUG1n"
execute 'Table'
set param = value :tablenode.output at 1 1
if ^param = 24 then
create derivenode
else exit 2

Using CLEM expressions in scripts:
You can use CLEM expressions, functions, and operators within Clementine scripts; however, your scripting expression cannot contain calls to any @ functions, date/time functions, and bitwise operations. Additionally, the following rules apply to CLEM expressions in scripting:

• Parameters must be specified in single quotes and with the $P- prefix.

• CLEM expressions must be enclosed in quotes. If the CLEM expression itself contains quoted strings or quoted field names, the embedded quotes must be preceded by a backslash (\). See Scripting Syntax for more information.

You can use global values, such as GLOBAL_MEAN(Age), in scripting; however, you cannot use the @GLOBAL function itself within the scripting environment.

Examples of CLEM expressions used in scripting are:

set :balancenode.directives = [{1.3 "Age > 60"}]
set :fillernode.condition = "(Age > 60) and (BP = \"High\")"
set :derivenode.formula_expr = "substring(5, 1, Drug)"
set Flag:derivenode.flag_expr = "Drug = X"
set :selectnode.condition = "Age >= '$P-cutoff'"
set :derivenode.formula_expr = "Age - GLOBAL_MEAN(Age)"

The properties of every node are listed in the Scripting, automation, and CEMI Properties Reference.

derivenode Properties
The Derive node modifies data values or creates new fields from one or more existing fields. It creates fields of type formula, flag, set, state, count, and conditional. See Derive Node for more information.
derivenode properties   Data type                      Property description
new_name                string                         Name of new field. See the example below for usage.
mode                    Single, Multiple               Specifies single or multiple fields.
fields                  [field field field]            Used in Multiple mode only to select multiple fields.
name_extension          string                         Specifies the extension for the new field name(s).
add_as                  Suffix, Prefix                 Adds the extension as a prefix (at the beginning) or as a suffix (at the end) of the field name.
result_type             Formula, Flag, Set, State,     The six types of new fields that you can create.
                        Count, Conditional
formula_expr            string                         Expression for calculating a new field value in a Derive node.
flag_expr               string
flag_true               string
flag_false              string
set_default             string

Special characters: Literal text blocks that include spaces, tabs, and line breaks can be included in scripts by setting them off in triple quotes. Any text within the quoted block is preserved as literal text, including spaces, line breaks, and embedded single and double quotes. No line continuation or escape characters are needed.

A CLEM expression used in a script must be enclosed in double quotes: set :fillernode.condition = "(Age > 60) and (BP = \"High\")"

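Kohonen: the default 7x10 grid is too many; 3x4 is better. Settings: exponential decay tends to produce one especially large cluster; in a small grid, Neighborhood (Phase 1) should not be larger than Neighborhood (Phase 2). The number of clusters found tends to approach the specified upper limit.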

Most of the time when using Apriori or GRI, we will either not define data as blanks, or we will define data as blanks and then remove the records with blank values. Either approach avoids confusion in the association rules created. In short: either don't define blanks at all, or, if you do define them,



The Carma node works only with fields of storage type string:

Override…True from the context menu
Right-click again and select Set Storage…String

(discussed.) Carma allows the user to specify rule support, leading to simpler rules. Carma has the limitation that, with tabular data, the data must be of type flag (and storage type string); if the data are changed to transactional format, Carma can use fields of type set. To generate negative rules, swap Carma's true and false values.

is decreased to the value of Neighborhood (Phase 2) +1. By default, this value is equivalent to that for Neighborhood (Phase 1), so no decrease occurs in phase 1

In small grids, the value of Neighborhood (Phase 1) shouldn’t be set higher than the default of Neighborhood (Phase 2); otherwise the whole grid will be affected.
• Don't change the Cycles settings unless you get an odd-looking solution; they should be large enough for almost all circumstances, except for unusual data or a large number of clusters.
• The Initial Eta settings are the most likely place to begin modifying a network; you would normally set Initial Eta (Phase 1) a bit higher, and perhaps Initial Eta (Phase 2) as well.
Techniques for Clustering 2 - 21
Clustering and Association Models with Clementine
• If you expect to find one dominant cluster, leave the Learning decay rate as Exponential. If not, you can try using a linear decay.

In the Kohonen node, blanks are handled by substituting “neutral” values for the missing ones. For range and flag fields with missing values (blanks and nulls), the missing value is replaced with 0.5 (for range fields, this is done after the original values have been transformed into a 0-1 range). Range field values below the lower bound (in the last Type node) will be set to the lower bound and values above the upper bound will be set to the upper bound value. For set fields, the derived indicator field values are all set to 0.0. This is the same missing data handling as we found in the K-Means node.
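The missing-value and bounds handling described above can be written out as small Python functions. They are a sketch of the rules as stated in the paragraph, not Clementine's implementation: the function names are hypothetical, blanks and nulls are both represented by `None`, and the flag coding (true to 1.0, false to 0.0) is an assumption.

```python
def kohonen_encode_range(value, lower, upper):
    """Range field: clamp to [lower, upper] (the bounds from the last Type node),
    rescale to 0-1, and use the neutral value 0.5 when missing."""
    if value is None:
        return 0.5
    clamped = min(max(value, lower), upper)
    return (clamped - lower) / (upper - lower)

def kohonen_encode_flag(value):
    """Flag field: assumed true -> 1.0, false -> 0.0; missing -> 0.5."""
    if value is None:
        return 0.5
    return 1.0 if value else 0.0

def kohonen_encode_set(value, categories):
    """Set field: one indicator per category; a missing value gives all zeros."""
    return [1.0 if value == c else 0.0 for c in categories]

print(kohonen_encode_range(150, 0, 100))          # 1.0 (above the upper bound)
print(kohonen_encode_range(None, 0, 100))         # 0.5
print(kohonen_encode_set(None, ["a", "b", "c"]))  # [0.0, 0.0, 0.0]
```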

K-Means: the Encoding value for sets can be set between 0.0001 and 1.0, inclusive. Values below 0.70711 will decrease the importance of set fields, while values above that will increase it.
Flag fields are not encoded in this manner, as doing so may distort the distances between records. They are given values of 0 and 1.
If symbolic fields are included in a clustering solution, we recommend leaving the encoding value at its default setting unless you have a good reason to change the influence of these fields.

solution compared to numeric fields. Accordingly, the value of .70711 is used instead (the square root of 1/2).
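One way to see why the square root of 1/2 is a natural default: two records with different values of a set field differ in exactly two indicator fields, so with each indicator scaled by w their squared distance is 2w². With w = √(1/2) that is 1, the same as the largest possible gap on a numeric field rescaled to 0-1; with w = 1 the set field would count double. The Python below is a hedged illustration of that arithmetic; the category names and helper names are hypothetical.

```python
import math

ENCODING = math.sqrt(0.5)   # about 0.70711, the default described above

def encode_set(value, categories, weight=ENCODING):
    """Recode a set field as one indicator per category, scaled by `weight`."""
    return [weight if value == c else 0.0 for c in categories]

def sq_distance(a, b):
    """Squared Euclidean distance between two encoded records."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

cats = ["North", "South", "East"]
d_default = sq_distance(encode_set("North", cats), encode_set("South", cats))
d_unscaled = sq_distance(encode_set("North", cats, 1.0), encode_set("South", cats, 1.0))
print(d_default)   # about 1.0: same weight as a full 0-1 gap on a numeric field
print(d_unscaled)  # 2.0: the set field would dominate
```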

It should be noted that the likelihood function assumes that numeric predictors follow normal distributions and the symbolic predictors follow multinomial distributions; the former assumption is not. Missing data is also not supported: records with blank, null, or missing values will be removed.

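"Go to Sohu, watch the Olympics": this was the heading on Sohu's home page on October 28... How sad. A company that doesn't pay attention to detail.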














Leading zeros matter in programming.

Everything seemed to work with AJAX: we tested with some user IDs and it worked. However, when our customers logged in, they got an error indicating that the correct flag, which IS in the database, was not picked up.

It was not an http/https difference, nor was AJAX failing.

When I wrote the code, the sMU_id returned from the database was a varchar2, so it could be 01112222. However, when it was assigned to mu_id, the leading 0 (if there was one) was lost, because the type was automatically chosen as a number!!! The fix was to write the value out inside quotes, so that it stays a string:


Response.Write("var mu_id; mu_id="" "";mu_id="""&sMU_id & """;"& vbCrLf )
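The same bug and fix, reproduced in Python for illustration. `via_number` shows what any numeric conversion does to an ID such as 01112222, and `js_assignment` builds the client-side line with the value quoted, like the Response.Write fix above. Using `json.dumps` for the quoting is a design choice of this sketch, not part of the original code: it also escapes any quote characters inside the value. (In older JavaScript engines an unquoted literal with a leading 0 may even be read as octal, which is one more reason to quote IDs.)

```python
import json

def via_number(raw):
    """Round-trip an ID through a numeric type: the leading zero is lost."""
    return str(int(raw))

def js_assignment(raw):
    """Emit a JavaScript assignment that keeps the ID as a string.
    json.dumps adds the surrounding quotes and escapes embedded ones."""
    return "var mu_id = " + json.dumps(raw) + ";"

print(via_number("01112222"))     # 1112222
print(js_assignment("01112222"))  # var mu_id = "01112222";
```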

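1. Statistics (《统计学》), David Freedman et al., translated by Wei Zongshu, Shi Xiquan et al., China Statistics Press. Said to be the book that explains statistical thinking best; I read some chapters and got a lot out of them. The book has almost no formulas, yet it captures the essence of statistical thinking.
2. Mind on Statistics (English edition), China Machine Press
An introductory statistics book that needs only high-school maths. One sentence made a deep impression on me: "Mathematics is to statistics what hammer, nails, and wood are to a house: just the materials and tools, not the house itself."
3. Mathematical Statistics and Data Analysis (English edition, 2nd ed.), China Machine Press
4. Business Statistics: A Decision-Making Approach (reprint edition), China Statistics Press
5. Understanding Statistics in the Behavioral Sciences (reprint edition), China Statistics Press
6. Exploratory Data Analysis (《探索性数据分析》), China Statistics Press; same series as book 1. Read the preface by the late Professor Chen Xiru carefully: it amounts to a reflection on mathematical statistics in China.
7. Introduction to Mathematical Statistics (《数理统计引论》)
Author: Chen Xiru
8. A Course in Probability Theory and Mathematical Statistics (《概率论与数理统计教程》), Wei Zongshu
1. Applied Linear Regression (《应用线性回归》), China Statistics Press
2. Regression Analysis by Example (3rd ed., reprint edition)
3. Logistic Regression Models: Methods and Applications, Wang Jichuan and Guo Zhigang, Higher Education Press. One of the few classic statistics textbooks written in China. Both authors come from sociology, so the book stresses application over derivation. Every chapter has detailed SAS and SPSS programs with an analysis of the output. The two have evidently spent a lot of time abroad: the book is written in Chinese, but clearly in the style of a Western textbook.
0. Introduction to Multivariate Statistical Analysis (《多元统计分析引论》), Zhang Yaoting and Fang Kaitai, Science Press
1. Applied Multivariate Analysis (《应用多元分析》, 2nd ed.), Wang Xuemin, Shanghai University of Finance and Economics Press
This seems to be the textbook currently in use. Note, however, that its strength is not the derivations but the later material combined with SAS, plus some of its ideas (e.g. p. 99 on the effect of n on hypothesis testing: that is real statistical intuition, not something you get from pushing formulas around). A very good Chinese multivariate statistics textbook.
2. Analyzing Multivariate Data (English edition), Lattin et al., China Machine Press. Full of intuitive insight and explanation, and very interesting. It doesn't demand much mathematics and the proofs are not strong, but it really is a "statistics book", not a maths book.
3. Applied Multivariate Statistical Analysis (5th ed., reprint edition), Johnson & Wichern, China Statistics Press
In my opinion the best multivariate statistics book available in China, and highly rated by reviewers on Amazon. According to Professor Wang Xuemin, though, some of its proofs are still unclear: Western authors are strong on practice, but their proofs are not great, ha.
1. Time Series Models for Business and Economic Forecasting (《商务和经济预测中的时间序列模型》), Franses
Five-star recommendations on Amazon; covers a lot of recent material and is very practical. Only after reading it did I realize there is much more to time series than AR(1) and MA(1), ha.
2. Forecasting and Time Series: An Applied Approach (3rd ed.), Bowerman & O'Connell
1. Sampling Techniques (《抽样技术》), Cochran, translated by Zhang Yaoting
2. Sampling: Design and Analysis (reprint edition), Lohr, China Statistics Press
Covers many newer methods: nonresponse, nonsampling error and resampling are all discussed. It is also hard going; at the time I was reading it alongside "Advance Microeconomic
1. SAS Software and Applied Statistical Analysis (《SAS软件与应用统计分析》), edited by Wang Jili and Zhang Yaoting
2. SAS V8 Basic Tutorial (《SAS V8基础教程》), Wang Jiagang, China Statistics Press
3. SPSS 11 Statistical Analysis Tutorial, Basic and Advanced volumes (《SPSS11统计分析教程》), Zhang Wentong, Beijing Hope Press
4. Statistical Analysis of Financial Markets (《金融市场的统计分析》), Zhang Yaoting, Guangxi Normal University Press
Common Errors in Statistics (and How to Avoid Them)
Good P.I., Hardin J.W.
John Wiley & Sons; 2003; 240 pp.; ISBN: 0471460680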

Don't extend your brand name into new product line (use a new brand), it will dilute the strong link between the brand and the power product

Product brand extension is like alcohol: a short-term stimulant and a long-term depressant....

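Positioning; The New Positioning; The Book of Five Rings; On War; Jack Trout, Marketing Warfare; The Prince; Gao Jianhua, Winning Without Fighting (《不战而胜》); The Complete Guiguzi (《鬼谷子全书》); Extraordinary Marketing (《非常营销》); Kotler, Marketing Management; Lu Changquan, Cutting Sales (《切割销售》) …………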

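Innovation and Entrepreneurship (Peter Drucker); Give Me Results (《请给我结果》); Stronger Than the Strong (《比强者更强》); Winning (Jack Welch) ……
First: Hardball: How Politics Is Played, by Chris Matthews, former US presidential speechwriter.
Second: the autobiography of Viacom chairman Sumner Redstone, A Passion to Win. Superb, and quite positive in tone.
Third: I Had to Kill (《我不得不杀人》), the autobiography of a former female agent of Israel's Mossad.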

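1. Applying the experience curve to industry cost analysis
High cumulative output, low unit cost, high profit. Take lithium bromide central air-conditioning as an example. Jiangyin Shuangliang entered the industry first, so it has the highest cumulative output, a relatively high market share and a low unit cost; compared with other companies that also pursue overall cost leadership, its profit is the highest. This is because their strategies are basically the same, so they share the same or similar experience curves. However, if two competitors have different product technologies, or can reach different technical levels, their experience curves differ, and so do the results they can achieve. For example, Sanyo Refrigeration entered the market in 1992 with the world-leading lithium bromide products and technology of Japan's Sanyo. Although Jiangyin Shuangliang held the cost advantage at the time, Sanyo Refrigeration came in with new technology, that is, on a different experience curve. Although it was initially behind in market share, it expanded its share rapidly on the strength of its price/performance advantage, established itself in the market, and grew quickly.
2. Applying the experience curve to estimate a company's cost trend
3. Applying the experience curve to the choice of business strategy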
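The experience curve is usually modelled as a log-linear relationship: each doubling of cumulative output multiplies unit cost by a fixed learning rate. The notes give no numbers, so the 80% rate and starting cost of 100 below are purely hypothetical; the sketch shows the shape of the curve behind "high cumulative output, low unit cost", and also how a late entrant on a steeper curve (a lower learning rate, as with Sanyo's different technology) can close the gap.

```python
import math

def unit_cost(n, first_unit_cost, learning_rate):
    """Log-linear experience curve: cost of the n-th unit.
    Each doubling of cumulative output multiplies unit cost by learning_rate."""
    b = math.log2(learning_rate)   # negative exponent, e.g. log2(0.8) is about -0.32
    return first_unit_cost * n ** b

# Hypothetical "80% curve" starting at 100 per unit.
for n in (1, 2, 4, 8):
    print(n, round(unit_cost(n, 100.0, 0.8), 2))   # 100.0, 80.0, 64.0, 51.2

# Hypothetical comparison: incumbent on a 90% curve with 1000 units of experience
# versus an entrant on a 70% curve with only 100 units.
print(round(unit_cost(1000, 100.0, 0.9), 1), round(unit_cost(100, 100.0, 0.7), 1))
```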

Weekend story: the Olympics and war




Xinhua News did report the stabbing in which one American tourist was killed and another injured... The Chinese man must have suffered some huge injustice and had no better target (if only he could have reached higher officials!)

The UK MoD lost more than 600 laptops in the last 4 years.

Telepresence will REDUCE business travel and is FAR BETTER than video conferencing.

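Competitiveness: 1. management (product cost control); 2. marketing capability (sales costs and pushing up prices); 3. technological innovation.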


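● New-market strategy: target groups of people who don't use the product yet (a new segment) and persuade them to adopt it; for example, persuading men to use cosmetics.
● Market-penetration strategy: for customers in existing segments who don't use the product yet, or use it only occasionally, use price cuts, inducements and stronger promotion to get them to adopt it or use more of it. For example, marketers of oral tonics stress the product's everyday health benefits so that customers don't think it is only for when they are ill; if they use it routinely, consumption rises.
● Geographic-expansion strategy: that is,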



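The models most often seen in purchase-decision research are as follows.
(1) Engel-Kollat-Blackwell Model (EKB model). The EKB model was proposed by Engel et al. (1978). Its key point is that the consumer decision process is one integrated procedure rather than a series of separate acts. It centres on the consumer decision process for solving the problem at hand, which passes through five stages: need recognition, information search, alternative evaluation, purchase and consumption, and post-purchase behaviour. The five steps are:
1. Need recognition: the first stage of the purchase decision process, in which the buyer recognizes that a problem or need exists.
2. Information search: the second stage, in which a consumer whose desire to buy has been aroused gathers more information. The consumer may simply pay heightened attention to information or actively search for it.
3. Alternative evaluation: (1) Brand image: the beliefs held about a particular brand. (2) Evaluation procedure: forming attitudes toward each brand; consumers usually apply one or more evaluation procedures, such as quality, size and price, to assess products.
4. Purchase decision: the fourth stage, in which the consumer actually buys the product. It covers whether to buy, when to buy, what to buy, where to buy and how to pay.
5. Post-purchase behaviour: the fifth stage, the consumer's subsequent behaviour based on the outcome of the purchase, including satisfaction and post-purchase dissonance. Both experiences usually enter the consumer's memory and influence later purchase decisions, feeding into the next purchase process.
(2) Engel-Blackwell-Miniard Model (EBM model). Advocated by Engel et al. (1993), it covers all activities and opinions connected with the consumer's purchase of a product and the purchase process: all activities in which consumers directly obtain, consume and dispose of products or services, including the decision processes that precede and follow these activities.
(3) Kotler Model. Kotler (1994) holds that external marketing stimuli and environmental stimuli pass through the consumer's "black box" and produce a purchase decision, and that buying responses differ with individual characteristics and decision processes. The task of marketing is to understand what happens in the consumer's mind between stimulus and response. The whole process covers three kinds of factors: environment, individual differences and psychological processes.
(4) Howard-Sheth Model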
Howard and Sheth (1969) proposed the Consumer Decision Model (CDM). The model consists mainly of six basic variables: information, brand recognition, attitude, confidence, purchase intention and purchase


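6. See: Philip Kotler (USA), Marketing Management (9th edition), translated by Mei Ruhe et al., p. 13, Shanghai People's Publishing House / Prentice-Hall, Inc., published October 1999.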
Chapter 1 Marketing and Customer Satisfaction
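This is the concept of market segmentation. Once the market has been segmented, a company can choose, according to its resources, technical expertise and competitive strength, which segments to serve with its products and services; the segments it chooses to serve are its target market. Market segmentation (Segmentation Market) and the target market (Target Market) are core concepts of modern marketing.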


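The Boston Consulting Group matrix, or simply the BCG method: by convention, a market growth rate of 10% is the dividing line between high and low growth.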
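A small Python sketch of how the 10% growth line is used to place a business unit in the matrix. The note only gives the growth threshold; the relative-market-share threshold of 1.0 (the unit is as large as its biggest rival) is a common convention added here as an assumption, and treating exactly 10% as "high" is an arbitrary boundary choice. The quadrant names are the usual BCG labels.

```python
def bcg_quadrant(market_growth, relative_share, growth_cut=0.10, share_cut=1.0):
    """Place a business unit in the BCG matrix.
    growth_cut: the conventional 10% growth dividing line.
    share_cut: relative market share of 1.0, assumed as the share dividing line."""
    high_growth = market_growth >= growth_cut
    high_share = relative_share >= share_cut
    if high_growth and high_share:
        return "Star"
    if high_growth:
        return "Question mark"
    if high_share:
        return "Cash cow"
    return "Dog"

print(bcg_quadrant(0.15, 1.5))  # Star
print(bcg_quadrant(0.05, 2.0))  # Cash cow
```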






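Although each balanced scorecard measure has its own specific content, the measures are not isolated or completely separate: they often conflict, yet they are inseparable. As Kaplan put it, "The four perspectives of the balanced scorecard are not a mere list. A balanced scorecard made up of the learning, process, customer and financial perspectives contains both outcome measures and the leading measures that drive those outcomes, and there are cause-and-effect relationships among these measures." At the root of this internal logic is the financial perspective that investors need, but investment returns come out of a value-creation process. First, employees innovate and learn, which makes it possible to optimize internal management; once internal management is optimized, the company can serve customers better; customers who value the company's products and services then buy, the company's value is realized, and with it the return on investment. As the company moves forward, new situations arise that again require employee innovation and learning, starting the next cycle and forming a complete, balanced system of linked measures. At the same time, to ensure effective execution of strategy, the BSC uses cause-and-effect chains to integrate financial and non-financial strategic measures in its evaluation system, including both outcome measures and driver measures, which makes it a feed-forward management control system. When the measures are in balance, they reinforce each other; when one measure drifts one-sidedly away from its target and creates conflict, coordination, communication and evaluation mechanisms come into play to restore balance between financial and non-financial measures, leading and lagging measures, long-term and short-term measures, and external and internal measures.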

Get a VMware certification and you'll get a job!

VDI (Citrix infrastructure)

Two names: the Likert scale (5-point questionnaire answers) and Osgood’s Semantic Differential Procedures (describing a thing or concept by rating it on several adjective-pair dimensions). Not, though, generally speaking, two concepts on the lips of every supermarket employee.

Early on: Out of the 45,000 lines, 8,500 accounted for 90 per cent of sales. Working with that number would inevitably be quicker and easier, and common sense suggested that it could yield almost as much insight as if the other 36,500 lower-sales-contributing lines were also included.
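Selecting the lines that make up 90 per cent of sales is a simple cumulative-share cut. The sketch below, with made-up product names and sales figures, sorts lines by sales and keeps adding them until the running total reaches the target share; it is an illustration of the idea, not the team's actual procedure.

```python
def top_lines_for_share(sales, share=0.9):
    """Return the smallest list of product lines (taken in descending order of
    sales) whose combined sales reach `share` of total sales."""
    total = sum(sales.values())
    chosen = []
    running = 0
    for line, amount in sorted(sales.items(), key=lambda kv: kv[1], reverse=True):
        if running >= share * total:
            break
        chosen.append(line)
        running += amount
    return chosen

sales = {"a": 50, "b": 30, "c": 12, "d": 5, "e": 3}   # hypothetical sales by line
print(top_lines_for_share(sales))   # ['a', 'b', 'c'] cover 92% of sales
```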
To the team’s excitement, when they examined the list of products in each cluster, they seemed to make sense. The team settled on 27 different clusters, which became its first customer segments. This was given the catchy title of ‘Tesco Lifestyles’.

ways; for
example, there was a ‘Snacking and Lunch Box’ Bucket.

‘Why not turn Lifestyles upside down?’ they reasoned. Take each
product, and attach to it a series of appropriate attributes, describing
what that product implicitly represented to Tesco customers. Then by
scoring those attributes for each customer based on their consistent
shopping behaviour, and building those scores into an aggregate measurement
per individual, a series of clusters should appear that would
create entirely new segments.
They then set about imagining 50 things that our shopping baskets might say
about customers. What does it mean if we buy a lot of ready meals? A lot
of fresh produce? No meat? Did we like to try out new products, or
exotic ingredients? Are we motivated by price promotions?
Measuring customers on a number of these criteria could start to create
distinct profiles.

Build an Osgood profile for each of the 45,000 products? By creating 20 scales on which to judge the attributes of every
product in the store, it could then create 20 numerical measures. Turning
numbers into insight was becoming a Clubcard speciality.
But what scales to choose? ‘Low fat’ against ‘high fat’, ‘big carton’
against ‘small carton’, ‘needs preparation’ against ‘ready to eat’, and
‘low price’ against ‘high price’ are just a few of the two-tailed Likert
scales that they ended up choosing. There were also single-tailed
measures, such as ‘Is it a promotion?’ and ‘Is it a Major Brand product?’
With 20 scales agreed as a way of grading every product on its shelves,
all that the team had to do was to produce the Osgood Profiles. That is,
45,000 Osgood profiles, one for every product from anchovies to
asparagus, whisky to washing powder. But judging 45,000 products on
20 different scales would mean agreement on 900,000 (45,000 × 20) individual
ratings before the segmentation could be used.


had ever tried to distinguish how ‘adventurous’
every product in a supermarket is. Tinned fish probably isn’t; extra
virgin olive oil is. Is Brie adventurous? How adventurous is it? More
than decaffeinated coffee? Less than a red pepper?
They set about devising a way to allocate attributes for every item.
The process created was known as the Rolling Ball. To create a Rolling
Ball categorization, Pavey and his team started with a small set of
products that definitely have the quality you seek: so if you want to find
out which products are adventurous, start with extra virgin olive oil and
ingredients for Malaysian curries, and see which customers bought
those products.
Then look at what else these customers have in their shopping basket.
Discard items that show up in everyone’s basket (bananas or milk, for
example), and keep looking, building bigger and bigger groups of
products. When can the process stop? This is where the rolling ball idea
came in.
The products that are picked up early will have a high ‘adventurous’
rating. As the ball gets bigger, those ratings are probably lower, and
certainly less reliable. So how to stop the ball? Well, the basic idea was
that each of the major attributes were large dips in a huge surface. When
the ball starts to roll into an adjacent hole, then the ball should stop. For
example, you might start off trying to predict adventurous products, but
after 400 or 500 products are coded, you start to find a lot of products
that are more ‘Fresh’ than ‘Adventurous’, and so the ball has started to
roll down an adjacent hole. The mathematics to solve this problem were
challenging, but the method created groups of products that intuitively
seem right.
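The Rolling Ball procedure is only described in words, and the book says its mathematics was challenging, so the following is a heavily simplified, hypothetical sketch of the idea rather than Tesco's method. Assumptions made here: baskets are sets of product names, "shows up in everyone's basket" means appearing in more than `max_share` of all baskets, a product joins the ball when it is over-represented (lift of at least `min_lift`) among customers who bought something already in the ball, and the ball simply stops after a fixed number of steps instead of detecting when it rolls into an adjacent attribute's "hole". All product names and thresholds are made up.

```python
from collections import Counter

def rolling_ball(baskets, seed, steps=2, max_share=0.5, min_lift=1.5):
    """Grow a product group outward from seed products (simplified sketch).
    baskets: list of sets of product names, one set per customer."""
    n = len(baskets)
    overall = Counter(p for b in baskets for p in b)   # number of baskets containing p
    group = set(seed)
    for _ in range(steps):
        hits = [b for b in baskets if group & b]        # customers who bought from the group
        if not hits:
            break
        inside = Counter(p for b in hits for p in b)
        new = set()
        for p, count in inside.items():
            if p in group:
                continue
            share = overall[p] / n
            if share > max_share:                       # in nearly everyone's basket: skip
                continue
            lift = (count / len(hits)) / share
            if lift >= min_lift:                        # over-represented among these customers
                new.add(p)
        if not new:
            break
        group |= new
    return group

baskets = [
    {"olive oil", "curry paste", "milk"},
    {"olive oil", "curry paste", "brie", "milk"},
    {"curry paste", "lemongrass", "bananas"},
    {"tinned fish", "milk", "bananas"},
    {"tinned fish", "milk", "bananas"},
    {"white bread", "milk", "bananas"},
    {"white bread", "tinned fish", "milk", "bananas"},
    {"lemongrass", "brie", "olive oil"},
]
adventurous = rolling_ball(baskets, {"olive oil"})
print(sorted(adventurous))   # ['brie', 'curry paste', 'lemongrass', 'olive oil']
```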

Each time a cluster became apparent, fewer shoppers remained lost in
20-dimensional space. After six months, 13 well-defined and tested
groups had been identified. But the 14th made no sense.
To make the segmentation work well, an extra
segmentation was born, Shopping Habits, which used not just what
people bought, but when people shopped.

Saturday, August 02, 2008






BBC - BBC Three Programmes - The Real Hustle - Previous episodes: selling one car to several buyers

What have you done that for? You came out of nowhere!

battery is flat

fiver: £5

Bigger bag catches small bag

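Charge: the wrong car park. Just put up a notice saying the machine is broken and, as far as drivers are concerned, it is broken; then collect the money.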

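2. Liking: being likeable
3. Reciprocation: the felt obligation to return favours
4. Consistency: people subconsciously keep their words and actions consistent
5. Social validation: subconsciously conforming to what others do
6. Scarcity: strongly wanting scarce (even useless) resources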


Consumer goods markets have the following characteristics:
Marketing Management (2nd edition)
