Nearly 45GB of source code files, allegedly stolen by a former employee, have revealed the underpinnings of Russian tech giant Yandex’s many apps and services. The leak also revealed key ranking factors for Yandex’s search engine, the kind almost never disclosed in public.
The “Yandex git sources” were posted as a torrent file on January 25 and contain files seemingly taken in July 2022 and dating back to February 2022. Software engineer Arseniy Shestakov claims that he verified with current and former Yandex employees that some archives “for sure contain modern source code for company services.” Yandex told security blog BleepingComputer that “Yandex was not hacked” and that the leak came from a former employee. Yandex said that it did not “see any threat to user data or platform performance.”
The files notably date to February 2022, when Russia began a full-scale invasion of Ukraine. A former executive at Yandex told BleepingComputer that the leak was “political” and noted that the former employee had not tried to sell the code to Yandex competitors. Anti-spam code was also not leaked.
While it’s not clear whether there are security or structural implications of Yandex’s source code revelation, the leak of 1,922 ranking factors in Yandex’s search algorithm is certainly making waves. SEO consultant Martin MacDonald described the leak on Twitter as “probably the most interesting thing to have happened in SEO in years” (as noted by Search Engine Land). In a thread detailing some of the more notable factors, researcher Alex Buraks suggests that “there is a lot of useful information for Google SEO as well.”
Yandex, the fourth-ranked search engine by volume, purportedly employs several ex-Google employees. Yandex tracks many of Google’s ranking factors, identifiable in its code, and competes heavily with Google. Google’s Russian division recently filed for bankruptcy after losing its bank accounts and payment services. Buraks notes that the first factor in Yandex’s list of ranking factors is “PAGE_RANK,” which is seemingly tied to the foundational algorithm created by Google’s co-founders.
As detailed by Buraks (in two threads), Yandex’s engine favors pages that:
- Aren’t too old
- Have a lot of organic traffic (unique visitors) and less search-driven traffic
- Have fewer numbers and slashes in their URL
- Have optimized code rather than “hard pessimization,” with a “PR=0”
- Are hosted on reliable servers
- Happen to be Wikipedia pages or are linked from Wikipedia
- Are hosted or linked from higher-level pages on a domain
- Have keywords in their URL (up to three)
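
To make the URL-related items in that list concrete, here is a minimal Python sketch of how such signals could be measured for a single URL. It is an illustration only and is not taken from the leaked code: the function name `url_features`, the dictionary keys, and the way the three-keyword cap is applied are all assumptions.

```python
import re
from urllib.parse import urlparse


def url_features(url, keywords):
    """Hypothetical helper: count the URL traits mentioned in the list above.

    Not Yandex's implementation; names and the cap logic are assumptions.
    """
    path = urlparse(url).path

    # Fewer digits and slashes in the path are reportedly favored.
    digits = sum(ch.isdigit() for ch in path)
    slashes = path.count("/")

    # Split the path into lowercase alphabetic tokens and count keyword matches,
    # capped at three because the list says "up to three".
    tokens = set(re.findall(r"[a-z]+", path.lower()))
    wanted = {k.lower() for k in keywords}
    keyword_hits = min(len(tokens & wanted), 3)

    return {"digits": digits, "slashes": slashes, "keyword_hits": keyword_hits}


example = url_features(
    "https://example.com/blog/2022/01/seo-ranking-factors",
    ["seo", "ranking", "factors", "yandex"],
)
```

For the example URL, the path contains six digits and four slashes, and three of the four keywords appear in it. How Yandex actually weighs or combines such counts is not something this sketch attempts to show.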
You can search and click through all the factors on Rob Ousbey’s compiled search tool. You’ll find that nearly 1,000 of the ranking factors have the tag “TG_DEPRECATED,” and more than 200 are listed as “TG_UNUSED.” Because the code is from February 2022 and was grabbed in July 2022, Yandex’s search has certainly changed since. But the leak provides a rare look into how search rankings are put together at a site that serves one of the world’s largest countries.
Yandex previously saw its search engine code walk out the door in 2015, when a former employee tried to sell it on the black market for $28,000 to fund his own startup. The surprisingly low figure for the core code of Yandex’s main product suggested he was unaware of its real value. That employee was given a two-year suspended sentence, and the code was never seen publicly.