Wednesday, May 26, 2010

eDiscovery (Electronic Discovery) Notes

The eDiscovery process (see edrm.net)

1. Identify the relevant data
2a. Preservation
2b. Collection
3a. Processing
3b. Review
3c. Analysis
4. Conclusions
5. Handling the conclusions

Over the course of the process, the volume of data goes down while the relevance of the data goes up.


Vocabulary: plaintiff (the party that brings the suit), defendant (the party being sued), deposition (testimony), conjugation (change of verb form)

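Document search effectiveness can be measured by recall (how many of the correct documents were found) and precision (the number of target files retrieved divided by the number of all files retrieved).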
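
To make the two measures concrete, here is a minimal sketch in Python. The function name `recall_precision` and the document IDs are made up for illustration; they do not come from any eDiscovery tool.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for a set of retrieved document IDs.

    recall    = relevant documents retrieved / all relevant documents
    precision = relevant documents retrieved / all documents retrieved
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # documents that are both retrieved and relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical example: 4 documents are truly relevant, the search returned 3.
relevant_docs = {"doc1", "doc2", "doc3", "doc4"}
retrieved_docs = {"doc1", "doc2", "doc5"}
recall, precision = recall_precision(retrieved_docs, relevant_docs)
print(recall, precision)
```

Here two of the four relevant documents were found (recall 2/4) and two of the three returned documents were relevant (precision 2/3).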

Relevance ranking: the more often a keyword appears in a document, the higher that document's relevance. On the other hand, the more documents of the corpus a term appears in, the less important the term is.
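
This pair of rules is the idea behind TF-IDF weighting (term frequency times inverse document frequency). The sketch below is one simple variant, assuming whitespace tokenization and a natural-log IDF; real search engines use more refined formulas. The function name and the sample corpus are invented for illustration.

```python
import math
from collections import Counter


def tf_idf_scores(term, documents):
    """Score every document for a single term using a basic TF-IDF formula."""
    tokenized = [doc.lower().split() for doc in documents]
    # Document frequency: in how many documents does the term occur at all?
    doc_freq = sum(1 for words in tokenized if term in words)
    if doc_freq == 0:
        return [0.0] * len(documents)
    # A term found in many documents gets a small IDF, so it counts for less.
    idf = math.log(len(documents) / doc_freq)
    scores = []
    for words in tokenized:
        # Term frequency: share of the document's words that are the term.
        tf = Counter(words)[term] / len(words) if words else 0.0
        scores.append(tf * idf)
    return scores


# Hypothetical three-document corpus.
corpus = [
    "contract breach contract damages",
    "contract renewal notice",
    "lunch menu for friday",
]
contract_scores = tf_idf_scores("contract", corpus)
print(contract_scores)
```

The first document mentions "contract" most densely and ranks highest, the second ranks lower, and the third scores zero. A term that occurs in every document gets an IDF of log(1) = 0, so under this formula it contributes nothing to the ranking.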

Clustering: rule based or sample based


Search effectiveness can be measured by recall, the number of responsive documents retrieved divided by the total number of responsive documents, and precision, the number of responsive documents retrieved divided by the total number of documents retrieved. A variety of technologies have been introduced to improve both the recall and the precision of keyword search. Boolean search, which allows the use of AND, OR, and NOT operators in search queries, and proximity search, which finds documents that contain terms within a specified distance of each other, have been used to improve precision by reducing false positives. Stemming, wildcard, and fuzzy search, which find documents containing variations of the specified terms, such as differences in case, conjugation, and spelling, have been used to improve recall by finding variations of the search word that have the same or similar meaning.
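
A rough sketch of how some of these search types behave, using only the Python standard library. All function names here are hypothetical, the tokenizer is deliberately naive, and stemming is left out because it needs a real stemmer. This is an illustration of the ideas, not how any particular review platform implements them.

```python
import difflib
import fnmatch


def tokens(text):
    """Lowercase the text and split on whitespace, trimming basic punctuation."""
    return [w.strip(".,;:!?()\"'").lower() for w in text.split()]


def boolean_match(text, must=(), any_of=(), must_not=()):
    """AND: all of `must`; OR: at least one of `any_of`; NOT: none of `must_not`."""
    words = set(tokens(text))
    return (
        all(t in words for t in must)
        and (not any_of or any(t in words for t in any_of))
        and not any(t in words for t in must_not)
    )


def proximity_match(text, a, b, max_distance):
    """True if terms a and b occur within max_distance words of each other."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


def wildcard_match(text, pattern):
    """Return the words matching a shell-style pattern such as 'terminat*'."""
    return fnmatch.filter(tokens(text), pattern)


def fuzzy_match(text, term, cutoff=0.8):
    """Return words similar to `term`, which catches misspellings."""
    return difflib.get_close_matches(term, tokens(text), n=5, cutoff=cutoff)


# Hypothetical document; "recieved" is misspelled on purpose.
doc = "The defendant terminated the contract. Payment was recieved late."

print(boolean_match(doc, must=("contract", "payment"), must_not=("settlement",)))
print(proximity_match(doc, "defendant", "contract", 3))
print(wildcard_match(doc, "terminat*"))
print(fuzzy_match(doc, "received"))
```

The Boolean and proximity checks narrow the result set (fewer false positives, better precision), while the wildcard and fuzzy checks widen it (more variants found, better recall). An exact keyword search for "received" would miss this document entirely.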
