Algorithms and Data Structures Study

Wiki Note

wiki/算法与数据结构学习.md

[AI Synthesis] A new L1 theme cross-incubated from 13 compressed summaries (mean conf 0.81, max 0.88).

Core patterns

_(Incubated; evidence appended below.)_

Distillation (2026-06-16) - source: ../raw/_posts/2014-04-19-sort-algorithms-priority-queues.md

[Literal] Priority Queues are a data type used to find the largest M items from a stream of N items, especially when there isn't enough memory to store all N items. [AI Synthesis] This implies a scenario where data arrives sequentially and memory is a constraint, making efficient selection crucial.
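The bounded-memory selection described above can be sketched as follows. This is a minimal illustration, not the source post's code: the function name `top_m` is hypothetical, and Python's `heapq` min-heap stands in for whatever priority queue the post used. The heap never holds more than M items, so memory stays proportional to M rather than N.

```python
import heapq
from typing import Iterable, List


def top_m(stream: Iterable[int], m: int) -> List[int]:
    """Return the m largest items of the stream, largest first.

    Hypothetical helper: keeps a min-heap of at most m items, so the
    smallest of the current "top m" is always at heap[0].
    """
    if m <= 0:
        return []
    heap: List[int] = []
    for item in stream:
        if len(heap) < m:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            # The new item beats the weakest survivor: swap it in.
            heapq.heapreplace(heap, item)
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    print(top_m(iter([5, 1, 9, 3, 7, 8]), 3))
```

Each arriving item costs at most O(log M) work, which is what makes the approach practical when N is too large to store or sort.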

Distillation (2026-06-16) - source: ../raw/_posts/2014-05-11-search-algorithms-binary-search.md

[Literal] A symbol table is a data structure for key-value pairs supporting insert and search operations. [AI Synthesis] Binary search on an ordered array is presented as a method to implement this API. [Literal] The implementation uses two parallel arrays, with keys maintained in sorted order. [Literal] Java code examples demonstrate get() and put() methods, utilizing a rank() helper function to find the position of a key. [Literal] get() returns the value if the key is found, while put() updates the value if the key exists or inserts a new key-value pair, resizing the array if necessary. [Literal] The performance analysis highlights that binary search requires at most $\lg N + 1$ compares for a search, but an insertion can take ~$2N$ array accesses in the worst case, leading to ~$N^2$ accesses for inserting N keys into an initially empty table. [AI Synthesis] This makes it efficient for static tables but problematic for dynamic scenarios with frequent intermixed searches and inserts. [Literal] The text concludes that efficient insertion alongside search requires more complex data structures such as binary search trees or hash tables: a linked structure makes insertion cheap, but it gives up the array indexing that binary search depends on.
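The source post shows this in Java; the sketch below restates the same idea in Python under some assumptions. The class name `BinarySearchST` is illustrative, and Python lists grow on their own, so the explicit array resizing mentioned above is handled by `list.insert` (which still shifts every larger entry, the cost behind the ~$2N$ accesses per insertion).

```python
class BinarySearchST:
    """Symbol table on two parallel lists, keys kept in sorted order."""

    def __init__(self):
        self._keys = []
        self._vals = []

    def rank(self, key) -> int:
        """Number of keys strictly smaller than key (binary search)."""
        lo, hi = 0, len(self._keys) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if key < self._keys[mid]:
                hi = mid - 1
            elif key > self._keys[mid]:
                lo = mid + 1
            else:
                return mid
        return lo

    def get(self, key):
        """Return the value for key, or None if the key is absent."""
        i = self.rank(key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._vals[i]
        return None

    def put(self, key, val) -> None:
        """Update the value if key exists, otherwise insert in order."""
        i = self.rank(key)
        if i < len(self._keys) and self._keys[i] == key:
            self._vals[i] = val
            return
        # Inserting shifts all larger entries one slot to the right.
        self._keys.insert(i, key)
        self._vals.insert(i, val)

    def keys(self):
        return list(self._keys)
```

Because `rank()` returns the insertion position even when the key is missing, both `get()` and `put()` can share it.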

Distillation (2026-06-16) - source: ../raw/_posts/2014-05-25-search-algorithms-binary-search-trees.md

[Literal] Binary search trees (BST) combine the flexibility of insertion in a linked list with the efficiency of search in an ordered array. [Literal] A BST is a binary tree where each node's key is larger than all keys in its left subtree and smaller than all keys in its right subtree. [Literal] A node consists of a key, a value, and references to left (smaller keys) and right (larger keys) subtrees. [AI Synthesis] The Java implementation demonstrates recursive get and put methods for searching and inserting key-value pairs, respectively.
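A minimal Python sketch of the recursive get and put described above (the source uses Java; the class names `Node` and `BST` and the `keys()` helper are illustrative assumptions). The tree is unbalanced, so its height depends on insertion order.

```python
class Node:
    """A BST node: key, value, and links to the two subtrees."""

    def __init__(self, key, val):
        self.key = key
        self.val = val
        self.left = None   # subtree with smaller keys
        self.right = None  # subtree with larger keys


class BST:
    def __init__(self):
        self.root = None

    def get(self, key):
        """Return the value for key, or None if it is absent."""
        return self._get(self.root, key)

    def _get(self, node, key):
        if node is None:
            return None
        if key < node.key:
            return self._get(node.left, key)
        if key > node.key:
            return self._get(node.right, key)
        return node.val

    def put(self, key, val) -> None:
        """Insert key, or overwrite its value if already present."""
        self.root = self._put(self.root, key, val)

    def _put(self, node, key, val):
        if node is None:
            return Node(key, val)
        if key < node.key:
            node.left = self._put(node.left, key, val)
        elif key > node.key:
            node.right = self._put(node.right, key, val)
        else:
            node.val = val
        return node

    def keys(self):
        """In-order traversal, which yields the keys in sorted order."""
        out = []

        def walk(node):
            if node is not None:
                walk(node.left)
                out.append(node.key)
                walk(node.right)

        walk(self.root)
        return out
```

Returning the (possibly new) node from `_put` lets the parent re-attach the link, which avoids any special case for inserting into an empty subtree.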

Distillation (2026-06-16) - source: ../raw/_posts/2015-03-20-mcc_lsh.md

[Literal] The primary features of fingerprint ridges are ridge endings, bifurcations, and short ridges (dots). [Literal] A minutia is a distinctive point in a fingerprint, such as where two ridges merge or a ridge terminates. [Literal] Minutiae and patterns are crucial for fingerprint analysis because no two fingers are alike. [Literal] The Minutia Cylinder-Code (MCC) representation associates a local structure with each minutia, normalizing fingerprints for size and orientation. [Literal] This local minutiae representation uses 3D data structures (cylinders) built from invariant distances and angles within a minutia's neighborhood. [Literal] Locality-Sensitive Hashing (LSH) projects data into a lower-dimensional space where similar items are likely to hash to the same bucket, so distances need to be computed only between colliding vectors, which greatly reduces the number of distance calculations. [AI Synthesis] LSH is a technique for approximate nearest neighbor search in high-dimensional spaces. [Literal] For Euclidean distance, LSH can be built by projecting points onto random lines and dividing these lines into fixed-length intervals. [Literal] Nearby points are expected to fall into the same interval, while distant points are less likely to do so. [Literal] Exact Euclidean LSH (E2LSH) offers a randomized approach to the high-dimensional near-neighbor problem in Euclidean space.
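The random-projection scheme for Euclidean distance can be sketched as below. This is an illustration under stated assumptions, not E2LSH itself and not the source's code: the class name `EuclideanLSH`, the Gaussian line directions, the random offset, and the parameters `num_hashes`, `width`, and `seed` are all choices made for the example. Each hash projects a vector onto one random line and reports which fixed-length interval it lands in; a bucket is keyed by the tuple of interval indices.

```python
import math
import random
from typing import Dict, Hashable, List, Sequence, Tuple


class EuclideanLSH:
    """Toy LSH index: h(v) = floor((a . v + b) / width) per random line."""

    def __init__(self, dim: int, num_hashes: int = 4,
                 width: float = 4.0, seed: int = 0):
        rng = random.Random(seed)
        self.width = width
        # Assumption: Gaussian directions and uniform offsets in [0, width).
        self.lines = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                      for _ in range(num_hashes)]
        self.offsets = [rng.uniform(0.0, width) for _ in range(num_hashes)]
        self.buckets: Dict[Tuple[int, ...], List[Hashable]] = {}

    def signature(self, v: Sequence[float]) -> Tuple[int, ...]:
        """Interval index of v on every random line."""
        return tuple(
            math.floor((sum(a * x for a, x in zip(line, v)) + b) / self.width)
            for line, b in zip(self.lines, self.offsets)
        )

    def add(self, label: Hashable, v: Sequence[float]) -> None:
        self.buckets.setdefault(self.signature(v), []).append(label)

    def candidates(self, v: Sequence[float]) -> List[Hashable]:
        """Labels that collide with v; only these need a real distance check."""
        return list(self.buckets.get(self.signature(v), []))
```

A larger `width` makes collisions more likely (more candidates, fewer misses), while more hash functions make buckets more selective. Practical systems usually combine several such tables to recover neighbors that one table misses.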

Distillation (2026-06-16) - source: ../raw/_posts/2015-07-11-hash.md

[Literal] This document introduces hash functions, which map a set of N items into a structure that allows for O(1) expected time dictionary queries, contrasting with O(log N) time for sorted structures. [AI Synthesis] Hashing offers a trade-off between space complexity (O(N)) and query time. [Literal] Hashing involves two main steps: computing a hash function to transform keys into array indices and a collision-resolution process for keys that map to the same index.
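The two steps can be seen in a small hash table. The summary does not say which collision-resolution scheme the source uses, so separate chaining is assumed here purely for illustration; the class name `ChainedHashTable` and the `num_buckets` parameter are likewise hypothetical.

```python
class ChainedHashTable:
    """Hash table with separate chaining (one list of pairs per slot)."""

    def __init__(self, num_buckets: int = 16):
        self._buckets = [[] for _ in range(num_buckets)]

    def _index(self, key) -> int:
        # Step 1: hash function turns the key into an array index.
        return hash(key) % len(self._buckets)

    def put(self, key, val) -> None:
        bucket = self._buckets[self._index(key)]
        # Step 2: collision resolution, scanning the chain for the key.
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, val)
                return
        bucket.append((key, val))

    def get(self, key, default=None):
        for k, v in self._buckets[self._index(key)]:
            if k == key:
                return v
        return default
```

Queries take O(1) expected time only while chains stay short, so a fuller implementation would grow the bucket array as the table fills.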

Distillation (2026-06-16) - source: ../raw/_posts/2015-07-29-clustering.md

[Literal] Clustering aims to divide a set of objects into meaningful groups where objects within a cluster are similar, and objects in different clusters are dissimilar. [Literal] Centroid-based partitioning methods include k-center and k-means. [Literal] Hierarchical methods allow exploration of various clustering results from a dendrogram. [Literal] Density-based methods are also mentioned but not detailed.
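The summary names k-means without describing it, so the following is only a sketch of the standard Lloyd iteration, not the source's own presentation. Taking the first k points as initial centers is a simplifying assumption made to keep the example deterministic; the function name `k_means` and its parameters are illustrative.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def _dist2(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points: Sequence[Point], k: int,
            max_iters: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's iteration: assign to nearest center, then recompute means."""
    # Assumption: the first k points seed the centers.
    centers: List[Point] = [tuple(p) for p in points[:k]]
    labels: Optional[List[int]] = None
    for _ in range(max_iters):
        new_labels = [
            min(range(k), key=lambda c: _dist2(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old center
                centers[c] = tuple(sum(x) / len(members)
                                   for x in zip(*members))
    return centers, labels if labels is not None else []
```

The result depends on the starting centers, which is why practical use typically runs several random initializations and keeps the best one.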

Distillation (2026-06-16) - source: ../raw/_posts/2015-08-04-priorityqueue.md

[Literal] A Priority Queue is a collection where items can be added at any time, but only the item with the highest priority can be removed. [AI Synthesis] This contrasts with a standard queue (FIFO) and a stack (LIFO), where removal order is strictly defined by insertion time. [Literal] Implementations vary: unordered arrays/linked-lists offer O(1) insertion but O(N) removal of the max element. Ordered arrays/linked-lists offer O(1) removal of the max but O(N) insertion. [Literal] A binary heap provides efficient O(log N) insertion and removal of the highest-priority element.
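A compact array-based binary heap shows where the O(log N) bounds come from: both operations walk a single root-to-leaf path. The class name `MaxPQ` and the method names `insert`, `del_max`, `_swim`, and `_sink` are illustrative choices, not taken from the source.

```python
class MaxPQ:
    """Max-oriented priority queue on a 1-based binary heap array."""

    def __init__(self):
        self._heap = [None]  # index 0 unused, so children of k are 2k, 2k+1

    def __len__(self) -> int:
        return len(self._heap) - 1

    def insert(self, item) -> None:
        """Append at the bottom, then swim up to restore heap order."""
        self._heap.append(item)
        self._swim(len(self))

    def del_max(self):
        """Remove and return the largest item."""
        if len(self) == 0:
            raise IndexError("priority queue is empty")
        h = self._heap
        top = h[1]
        last = h.pop()
        if len(self) > 0:
            h[1] = last      # move the last leaf to the root
            self._sink(1)    # and sink it to its proper level
        return top

    def _swim(self, k: int) -> None:
        h = self._heap
        while k > 1 and h[k // 2] < h[k]:
            h[k // 2], h[k] = h[k], h[k // 2]
            k //= 2

    def _sink(self, k: int) -> None:
        h, n = self._heap, len(self)
        while 2 * k <= n:
            j = 2 * k
            if j < n and h[j] < h[j + 1]:
                j += 1       # pick the larger child
            if not h[k] < h[j]:
                break
            h[k], h[j] = h[j], h[k]
            k = j
```

Repeated `del_max()` calls return the items in descending order, which is the basis of heapsort.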

Distillation (2026-06-16) - source: ../raw/_posts/2015-09-04-similaritysearch.md

[Literal] Similarity search aims to find objects with characteristics similar to a query object, a task crucial for databases, data mining, and search engines dealing with feature-rich data represented as high-dimensional vectors. [Literal] This is typically achieved through K-Nearest Neighbor (KNN) or Approximate Nearest Neighbors (ANN) search, where KNN finds the K closest objects and ANN finds objects within a small factor of the true nearest neighbors' distances. [AI Synthesis] The core challenge lies in efficiently indexing and searching these high-dimensional spaces, as traditional methods often fail to meet accuracy, time, and space efficiency requirements.
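For reference, exact KNN can always be answered by a linear scan. The sketch below is that baseline only, not any indexing method from the source; the function name `knn` is hypothetical and Euclidean distance is assumed. Its cost of one distance computation per stored vector is what indexing and ANN methods try to avoid.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn(query: Point, points: Sequence[Point], k: int) -> List[Point]:
    """Return the k points closest to query, nearest first (exact scan)."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(query, p))


if __name__ == "__main__":
    data = [(5.0, 5.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
    print(knn((0.0, 0.0), data, 2))
```

An ANN method may return a different set, provided each returned point is within a small factor of the true neighbor distances, in exchange for examining far fewer candidates.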

Distillation (2026-06-16) - source: ../raw/_posts/diary/2015-04-19-weekly-summary.md

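Done: refined kNN with my supervisor (worked out the divide-and-conquer idea); Hash summary still pending; finished reading 甲骨文 (Oracle Bones); red wine; a slightly more regular schedule; running; played basketball with Lao Yang. To improve: algorithm programming, English and Cantonese, making better use of time.
- (Source: raw/_posts/diary/2015-04-19-weekly-summary.md)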

Distillation (2026-06-16) - source: ../raw/_posts/diary/2015-05-03-weekly-summary.md

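Done: worked on the kNN partition lower-bound bug and data testing; went shopping for clothes and saw a movie in Sha Tin with li yu; attended a TED event at Baptist University (my English listening and speaking are weak); wrote a C pointer-array textsort at the office. To improve: spend 70% of my energy on the algorithms track; weekend activities; improve efficiency.
- (Source: raw/_posts/diary/2015-05-03-weekly-summary.md)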

Distillation (2026-06-16) - source: ../raw/_posts/diary/2015-06-08-weekly-summary.md

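Weekly review: refined the LSH algorithm (performance analysis, C++ version, hash summary); staying up late for the Champions League hurt my energy, so stay up late less; speed up reading and coursework, and improve efficiency.
- Technical: LSH algorithm performance analysis; close reading of the C++ version; summary of the hash part.
- Routine: no energy on Sunday after watching the Champions League on Saturday night; keep a regular schedule and stay up late less.
- Study: speed up progress on reading and coursework; improve efficiency.
- (Source: raw/_posts/diary/2015-06-08-weekly-summary.md)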

Distillation (2026-06-16) - source: ../raw/_posts/diary/2015-07-13-weekly-summary.md

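Weekly report: got to know colleagues better through badminton; improved the website; finished hashtable and iris hamming indexing. TODO: spend more time on languages; speed up the algorithms and systems courses.
- Done: socializing over badminton; website improvements; hashtable (tests still needed); iris hamming indexing.
- TODO: spend more time on languages; speed up progress in the algorithms and systems courses.
- (Source: raw/_posts/diary/2015-07-13-weekly-summary.md)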

Distillation (2026-06-16) - source: ../raw/_posts/diary/2016-10-10-month-summary.md

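Make good use of time and work harder. Practice English every day, especially listening and speaking; research skills; finish one book every month; focus on systems, programming, and algorithms; keep a more regular routine and exercise.
- (Source: raw/_posts/diary/2016-10-10-month-summary.md)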

Evolution

2026-06-16: Theme incubated from 13 compression file(s) (threshold: support≥3, mean≥0.78, max≥0.82).
2026-06-16: Distilled from raw source ../raw/_posts/2014-04-19-sort-algorithms-priority-queues.md.
2026-06-16: Distilled from raw source ../raw/_posts/2014-05-11-search-algorithms-binary-search.md.
2026-06-16: Distilled from raw source ../raw/_posts/2014-05-25-search-algorithms-binary-search-trees.md.
2026-06-16: Distilled from raw source ../raw/_posts/2015-03-20-mcc_lsh.md.
2026-06-16: Distilled from raw source ../raw/_posts/2015-07-11-hash.md.
2026-06-16: Distilled from raw source ../raw/_posts/2015-07-29-clustering.md.
2026-06-16: Distilled from raw source ../raw/_posts/2015-08-04-priorityqueue.md.
2026-06-16: Distilled from raw source ../raw/_posts/2015-09-04-similaritysearch.md.
2026-06-16: Distilled from raw source ../raw/_posts/diary/2015-04-19-weekly-summary.md.
2026-06-16: Distilled from raw source ../raw/_posts/diary/2015-05-03-weekly-summary.md.
2026-06-16: Distilled from raw source ../raw/_posts/diary/2015-06-08-weekly-summary.md.
2026-06-16: Distilled from raw source ../raw/_posts/diary/2015-07-13-weekly-summary.md.
2026-06-16: Distilled from raw source ../raw/_posts/diary/2016-10-10-month-summary.md.

Backlinks