Earlier this week, GitHub engineer Timothy Clem discussed the technology behind the company's new code search tool in a post.

 

Earlier this week, GitHub engineer Timothy Clem discussed the technology behind the company's new code search tool in a post.

 The search engine called BlackBird was built from scratch in Rust. 


  • During his GitHub Universe presentation last year, Clem commented that code search's user experience was poor and very expensive to host. 
  • GitHub uses Shard by Git blob object ID to avoid any duplication and uses delta encoding to reduce the amount of crawling. 
  • BlackBird provides access to around 45 million GitHub repositories, amounting to 115TB of code and 15.5 billion documents. 
  • Delta indexing reduces the number of documents needed to crawl by around 50% and allows GitHub to re-index the entire corpus in 18 hours. 

Post a Comment

Previous Next

Contact Form