We Thought. We Experimented. We Failed.
You know how some pages have a “Load More” button at the bottom? Or how, once you scroll say 80% down the page, more text, images or other content appears? This content is generated dynamically, often using JavaScript and a technique called Ajax. These pages are generically called dynamic pages, and they differ from static pages, which deliver all of their text as soon as the page loads.
It’s common knowledge among SEOs that Google is able to read dynamic pages (to a large extent) - Baidu, however, cannot. Baidu’s bot (or BaiduSpider as they call it) is made for a Web 1.0 era, where pages are largely static in nature and everything that appears on the page is present within the HTML itself. On modern sites, though, much of the readable content is often generated by dynamic means. If that is the case, then the less content Baidu is able to read, the less able it is to understand the page.
Thesis: If Baidu can’t read dynamic pages, do sites with dynamic pages rank lower on Baidu?
Let’s find out.
We set out to find a correlation between search engine rank and the degree to which a page is dynamic. We generated [ xxx ] queries and analyzed over [ 1800 pages ], looking at:
• Google in English
• Baidu in English
• Google in Chinese
• Baidu in Chinese
We thought it wise to define a few terms along the way:
SNL: Static Number of Lines: number of lines of useful info displayed on the page before any Ajax (note: this isn’t funny)
DNL: Dynamic Number of Lines: number of lines of useful info displayed on the page after Ajax
We now need a ratio to measure just how ‘heavy’ the dynamic part is, i.e. how much of the page is generated dynamically. Let’s call this PC (or Percent Change in Content), whereby:
PC = (DNL-SNL)/SNL
We define a similar parameter:
PC_Ajax = (DNL - SNL) / DNL
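As a quick illustration, the two ratios above can be computed as follows. This is a minimal sketch: the function names and the example page (40 static lines, 100 lines after Ajax) are made up for illustration and are not taken from our data set.

```python
def percent_change(snl: int, dnl: int) -> float:
    """PC = (DNL - SNL) / SNL: content added by Ajax, relative to the static page."""
    if snl == 0:
        raise ValueError("PC is undefined when SNL is 0")
    return (dnl - snl) / snl


def percent_ajax(snl: int, dnl: int) -> float:
    """PC_Ajax = (DNL - SNL) / DNL: share of the final page that came from Ajax."""
    if dnl == 0:
        raise ValueError("PC_Ajax is undefined when DNL is 0")
    return (dnl - snl) / dnl


# Hypothetical page: 40 useful lines before Ajax, 100 after.
pc = percent_change(40, 100)
pc_ajax = percent_ajax(40, 100)
print(f"PC = {pc:.2f}, PC_Ajax = {pc_ajax:.2f}")
```

Note that PC has no upper bound (a page can grow many times over), while PC_Ajax stays below 1 whenever SNL is greater than 0.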
We go on further to define strictly the percent of content added vs deleted on the pages, but for all intents and purposes, PC and PC_Ajax as generic metrics should point us in the right direction.
The process looks sound so far - but we ran into a concrete wall almost immediately. After evaluating a set of results, we found pages in the results composed almost entirely of dynamic content - this runs directly counter to our thesis. Granted, a number of other factors contribute to a page or site’s ranking on Baidu, for example backlinks and meta tags - let’s come back to this - maybe.
Elsewhere, we found pages where there was neither text nor dynamic content - the pages were all images! One such page was: http://www.grandhotelbeijing.com/ - suggesting that Baidu parses images well, that other factors were at play, or perhaps both.
There were 23 of these ‘all-JS’ pages, which make up 2.35% of our total data points. We’ll omit these, as well as other results where PC < 0 or where DNL equals 0.
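The exclusion rule could be expressed roughly like this. It is only a sketch: the `PageResult` record, the example URLs and the line counts are hypothetical, and treating an ‘all-JS’ page as one with SNL equal to 0 is our assumption for the purposes of the example.

```python
from dataclasses import dataclass


@dataclass
class PageResult:
    url: str
    snl: int  # useful lines before Ajax
    dnl: int  # useful lines after Ajax


def is_usable(page: PageResult) -> bool:
    """Return True if the page survives the exclusion rules."""
    if page.dnl == 0:
        return False  # nothing readable even after Ajax
    if page.snl == 0:
        return False  # assumption: this is what an 'all-JS' page looks like
    pc = (page.dnl - page.snl) / page.snl
    return pc >= 0  # drop pages where content shrank (PC < 0)


# Hypothetical sample, one row per rule.
sample = [
    PageResult("https://example.com/a", snl=40, dnl=100),  # kept
    PageResult("https://example.com/b", snl=0, dnl=120),   # all-JS
    PageResult("https://example.com/c", snl=50, dnl=30),   # PC < 0
    PageResult("https://example.com/d", snl=10, dnl=0),    # DNL equals 0
]
usable = [p for p in sample if is_usable(p)]
print([p.url for p in usable])
```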
The results indicate a positive relationship between rank and how dynamic pages are. That’s it! We’re done!
Not so fast - while the slope is positive for both, it’s somewhat meaningless given the amount of noise. Looking at the top 10 search results (I wonder at this point why we even went out to pages ranked 50th), we see an incredible amount of dynamic content, with 20-30x as much dynamic as static content. What does this mean?
It could be that more highly ranked sites are more sophisticated - perhaps they inject a number of advertisements - and that the static content is sufficient for Baidu to understand the nature of the page. These ideas can’t be validated with the data at hand, however.
Homing in on the top 10, or generally ‘Page 1’ of Search, we find a slightly different trend: Baidu prefers more dynamic pages, while Google still has a preference for less Ajax.
Next, we consider the impact that PC_Ajax has on rank. Notice the difference between PC_Ajax and PC: PC_Ajax uses the final information (i.e. Dynamic Number of Lines) in the denominator, so the values now fall between 0 and 1. At this point, it’s hard to make any sense of this data: there is no slope, nor is there any significance.
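To show why a positive slope on its own says little, here is a small sketch of a least-squares fit that also reports R², the share of variance the line explains. The data points are invented for illustration, and putting rank on the x-axis with PC on the y-axis is an assumption about how such a fit might be set up, not a record of our actual regression.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 is the squared correlation; 0 if y never varies.
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 0.0
    return slope, intercept, r_squared


# Hypothetical, deliberately noisy data: rank vs PC.
ranks = [1, 2, 3, 4, 5]
pcs = [5.0, 0.2, 9.0, 0.5, 8.0]

slope, intercept, r_squared = fit_line(ranks, pcs)
print(f"slope = {slope:.2f}, R^2 = {r_squared:.3f}")
```

With data this scattered the slope comes out positive, yet R² is close to zero, which is the situation described above: a trend line exists, but it explains almost none of the variation.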
These results, while consistent with those found earlier with PC, still don’t say much. To make more sense of the data, we plot the median of PC_Ajax for each rank - and our conclusion is now finally becoming clearer.
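Taking the median per rank is a simple way to damp the outliers. A minimal sketch of that aggregation, assuming the data is available as (rank, PC_Ajax) pairs; the rows below are hypothetical:

```python
from collections import defaultdict
from statistics import median


def median_by_rank(rows):
    """Group (rank, pc_ajax) pairs by rank and take the median of each group."""
    grouped = defaultdict(list)
    for rank, pc_ajax in rows:
        grouped[rank].append(pc_ajax)
    return {rank: median(values) for rank, values in sorted(grouped.items())}


# Hypothetical rows: several pages can share a rank across different queries.
rows = [(1, 0.9), (1, 0.1), (1, 0.5), (2, 0.2), (2, 0.8), (3, 0.4)]
medians = median_by_rank(rows)
for rank, value in medians.items():
    print(rank, round(value, 2))
```

The median is used here rather than the mean because a handful of extremely dynamic pages would otherwise dominate each rank’s average.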
Setting out to run our final set of regressions, we split out the language of the query (e.g. Hong Kong Hotels vs 香港酒店) and evaluate whether language has an impact on the amount of Ajax used, and thus on ranking. After all, Baidu is a Chinese search engine, and Google is an English search engine (well - largely). Perhaps running English queries through Baidu and Chinese queries through Google has been messing up the data all this time? Alas, the answer is ‘No’, and our conclusion becomes even clearer.
We analyzed the relationship between search ranking and the degree to which pages were dynamic. Looking back, if we could better identify the nature of the dynamic content, how much of the page it makes up, and where on the page it is generated, we could perhaps tell whether it adds to a search engine’s ‘understanding’ of the page or is used for other purposes, like advertisements.
At this point, it’s fair to say we found no correlation between how dynamic pages are and how they rank on Baidu.