Nov 14
Guidelines for successful indexing

 


Here are some recommendations that may help MSNBot and other web crawlers effectively index and rank your site. We’ve also provided a list of items and techniques that MSN Search discourages.

Technical recommendations for your website
  • Use only well-formed HTML code in your pages. Ensure that all tags are closed, and that all links function properly. If your site contains broken links, MSNBot may not be able to index your site effectively, and people may not be able to reach all of your pages.
  • If you move a page, set up the page’s original URL to direct people to the new page, and tell them whether the move is permanent or temporary. For more information, see What to do when your site moves.
  • Make sure MSNBot is allowed to crawl your site and is not on your list of web crawlers that are prohibited from indexing your site.
  • Use a robots.txt file or meta tags to control how MSNBot and other web crawlers index your site. The robots.txt file tells web crawlers which files and folders they are not allowed to crawl. The Web Robots Pages provide detailed information on the robots.txt Robots Exclusion standard. This site may be available in English only.
  • Keep your URLs simple and static. Complicated or frequently changed URLs are difficult to use as link destinations. For example, the URL www.example.com/mypage is easier for MSNBot to crawl and for people to type than a long URL with multiple extensions. Also, a URL that doesn’t change is easier for people to remember, which makes it a more likely link destination from other sites.
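
One way to confirm that MSNBot is not locked out by your robots.txt file is to test the rules locally. The sketch below uses Python's standard `urllib.robotparser` module. The robots.txt content, the paths and the `can_crawl` helper are made up for illustration, and the standard-library parser only approximates how a real crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are invented for this example.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/mypage", "/private/report.html"):
        url = "http://www.example.com" + path
        print(path, can_crawl(ROBOTS_TXT, "msnbot", url))
```

If a page you want indexed comes back as blocked for `msnbot`, the matching `Disallow` line is the first thing to review.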

Content guidelines for your website

The best way to attract people to your site, and keep them coming back, is to design your pages with valuable content that your target audience is interested in.

  • In the visible page text, include words users might choose as search query terms to find the information on your site.
  • Limit all pages to a reasonable size. We recommend one topic per page. An HTML page with no pictures should be under 150 KB.
  • Make sure that each page is accessible by at least one static text link.
  • Keep the text that you want indexed outside of images. For example, if you want your company name or address to be indexed, make sure it is displayed on your page outside of a company logo.
  • Add a site map. This enables MSNBot to find all of your pages easily. Links embedded in menus, list boxes, and similar elements are not accessible to web crawlers unless they appear in your site map.
  • Keep your site hierarchy fairly flat. That is, each page should only be one to three clicks away from the home page.
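
The "one to three clicks from the home page" guideline can be checked with a breadth-first search over a site's internal links. The following is a minimal sketch: the `SITE` link map and the function names are hypothetical, and a real check would build the map by crawling the site.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Breadth-first search: minimum number of clicks from home to each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def too_deep(links: dict[str, list[str]], home: str, max_clicks: int = 3) -> list[str]:
    """Pages that need more than max_clicks clicks to reach from home."""
    depths = click_depths(links, home)
    return sorted(page for page, depth in depths.items() if depth > max_clicks)


def unreachable(links: dict[str, list[str]], home: str) -> list[str]:
    """Known pages that no chain of links from home leads to."""
    depths = click_depths(links, home)
    return sorted(page for page in links if page not in depths)


# Hypothetical site structure: page -> pages it links to.
SITE = {
    "/": ["/products", "/about"],
    "/products": ["/products/a"],
    "/products/a": ["/products/a/specs"],
    "/products/a/specs": ["/products/a/specs/pdf"],
    "/orphan": ["/"],
}

if __name__ == "__main__":
    print("too deep:", too_deep(SITE, "/"))
    print("unreachable:", unreachable(SITE, "/"))
```

Pages reported as too deep are candidates for a link from the site map or from a page nearer the home page. Pages reported as unreachable have no static text link leading to them at all.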

Items and techniques discouraged by MSN Search

The following items and techniques are not appropriate uses of the MSN Search index. Use of these items and techniques may affect how your site is ranked within MSN Search and may result in the removal of your site from the index.

  • Loading pages with irrelevant words in an attempt to increase a page’s keyword density. This includes stuffing ALT tags that users are unlikely to view.
  • Using hidden text or links. You should use only text and links that are visible to users.
  • Using techniques to artificially increase the number of links to your page, such as link farms.

Nov 14

 

About website ranking

MSN Search website ranking is completely automated. The MSN Search ranking algorithm analyzes factors such as web page content, the number and quality of websites that link to your pages, and the relevance of your website’s content to keywords. The algorithm is complex and never human-mediated. You cannot pay to boost your website’s relevance ranking; however, we do offer advertising options for website owners.

Each time the index is updated, you may notice a shift in your website’s ranking. As new sites are added and some sites become obsolete, previous relevance rankings are revised.

Although you cannot directly change your website’s ranking, you can optimize its design and technical implementation to enable appropriate ranking by most search engines. See Guidelines for successful indexing.

To comment about MSN Search ranking or about your specific website, please send us feedback.

Nov 14

 


Unlike the other search engines analyzed on this site, MSN itself has published a page about its algorithm.

In addition, the search engine has put together a guide to successful indexing, which is very useful: it shows what the ranking procedure takes into account (without, of course, going into great detail). Broadly, it indicates that:

  • Well-formed HTML should be used.
  • A site map should be added.
  • Flat architectures are favored (no more than three clicks from the home page).
  • Including unrelated words as a form of “deception” is penalized.
  • So is using text in a small font or in the background color.
  • Important text should be placed as high on the page as possible.

This algorithm clearly relies on aspects already discussed for the previous two, which gives an idea of how hard it is to find different strategies for weighing the suitability of a piece of content.
Even so, it should not be forgotten that ranking algorithms are intrinsically very complex. Those in use today are said to consider several million variables.


Nov 14

 

Yahoo! Slurp is Yahoo!’s web-indexing robot. The Yahoo! Slurp crawler collects documents from the Web to build a searchable index for search services using the Yahoo! search engine. These documents are discovered and crawled because other web pages contain links directing to these documents.
As part of the crawling effort, the Yahoo! Slurp crawler will take robots.txt standards into account to ensure we do not crawl and index those pages that you would not like to have returned via Yahoo! Search Technology. If a page is disallowed to be crawled by robots.txt standards, it is neither considered for inclusion nor placed in the search engine’s database.
Yahoo! Slurp follows HREF links. It does not follow SRC links. This means that Yahoo! Slurp does not retrieve or index individual frames referred to by SRC links.
Yahoo! Slurp has support for frames and makes an effort to crawl complex URLs such as those generated by forms, content generation systems, and dynamic page generation software.

How can I prevent Yahoo Slurp from following links from a particular page or archiving a copy of a page?

Yahoo! Slurp obeys the noindex meta-tag. If you place:
<META NAME="robots" CONTENT="noindex">
in the head of your web document, Yahoo! Slurp will retrieve the document, but it will not index the document or place it in the search engine’s database.

<META NAME="robots" CONTENT="noindex"> Yahoo! Slurp will retrieve the document, but it will not index the document.
<META NAME="robots" CONTENT="nofollow"> Yahoo! Slurp will not follow any links that are present on the page to other documents.
<META NAME="robots" CONTENT="noarchive"> Yahoo maintains a cache of all the documents that we fetch, to permit our users to access the content that we indexed (in the event that the original host of the content is inaccessible, or the content has changed). If you do not wish us to archive a document from your site, you can place this tag in the head of the document, and Yahoo will not provide an archived (cached) copy of the document.
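
To see which of these directives a page actually declares, the robots meta-tag can be read with Python's standard `html.parser` module. This is a minimal sketch: the class and function names are hypothetical, and it only reports what the page says, not how Yahoo! Slurp itself interprets it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives listed in <meta name="robots"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> matches.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def robots_directives(html: str) -> set[str]:
    """Return the set of robots directives declared in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><META NAME="robots" CONTENT="noindex, nofollow"></head></html>'
    print(sorted(robots_directives(page)))
```

Note that the attribute values must use plain straight quotes; typographic quotes pasted from a word processor are not valid HTML attribute delimiters.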

Yahoo Slurp indexes not only the title and meta tags but also the full text of web pages, so quality content on the page matters as much as keywords in the title and meta tags.
Yahoo searches for pages, and when it finds one containing the required keyword it lists the page in its SERPs. The position of the page depends on its content. A page that contains the keyword can still be left out, either because its content is poor or because Yahoo could not find the page.

How to attract Yahoo Slurp to crawl the site?

There are three ways to attract the Yahoo crawler to a site:

1. Get links from sites that the Yahoo robot crawls regularly. Yahoo will then visit and crawl your site regularly too. Regular visits are a good sign and help a lot in getting a good ranking.

2. According to Yahoo, browsing a site with the Yahoo Companion Toolbar triggers the Yahoo Slurp bot.

3. Use SiteMatch, Yahoo’s infamous PFI/PPC program. This type of inclusion guarantees a place in the Yahoo index, so there is no problem using it.

Nov 14

The new Yahoo! Search offers these services and tools to help you find whatever you’re looking for, faster and easier than ever.

Yahoo offers the following different types of searches:

Web Search:
Yahoo uses a powerful search algorithm powered by its crawler, Yahoo Slurp, to list webpages related to keywords entered by the user.

News Search:
This section is used to search for news stories, pictures, and audio/video.

Image Search:
The Yahoo image search is used to find photos and illustrations from all over the Web. It lists images found by crawling the src attribute of image tags.

Directory Search:
It lists webpages from the Yahoo directory related to keywords entered by the user.

Yellow Pages Search:
It is used to search for local businesses serving a particular area.

Product Search:
This search is used to find product reviews and prices on the web.

Search Tips
• Use specific words rather than general ones. For example, if you are searching for the history of books, use “history of books” rather than “books”.
• Avoid using words with more than one meaning.
• To include or exclude words, use + and -.
• You can also use logical operators such as OR and AND.
• To search for an exact phrase, enclose it in quotation marks.

Search Meta Words
Here is a list of special keywords that give you results unique to that special keyword instruction. You can enter these Meta words directly into the Yahoo! Search box.

• site: this allows one to find all documents within a particular domain and all its subdomains.
Example: site:searchenginegenie.com
• hostname: this allows one to find all documents from a particular host only.
Example: hostname:autos.yahoo.com
• link: this allows one to find documents that link to a particular URL.
Example: link:http://www.searchenginegenie.com/
• url: this allows one to find a specific document in Yahoo’s index.
Example: url:http://www.searchenginegenie.com/links.html
• inurl: this allows one to find a specific keyword as part of indexed urls.
Example: inurl:bulgarian
• intitle: this allows one to find a specific keyword as part of the indexed titles.
Example: intitle:Bulgarian
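
The Meta words above all share the same `word:value` shape, with no space between the colon and the value. A small helper can build such queries and catch stray whitespace before the query is pasted into the search box. This is a sketch only: the `meta_query` function is hypothetical, and combining a Meta word with extra search terms is an assumption, not something documented above.

```python
# Meta words taken from the list above.
META_WORDS = {"site", "hostname", "link", "url", "inurl", "intitle"}


def meta_query(meta_word: str, value: str, *terms: str) -> str:
    """Build a query string such as 'site:example.com' plus optional extra terms."""
    if meta_word not in META_WORDS:
        raise ValueError(f"unknown Meta word: {meta_word!r}")
    if not value or any(ch.isspace() for ch in value):
        # A space inside the value would split the query into separate terms.
        raise ValueError("value must be non-empty and contain no whitespace")
    return " ".join([f"{meta_word}:{value}", *terms])


if __name__ == "__main__":
    print(meta_query("site", "searchenginegenie.com"))
    print(meta_query("link", "http://www.searchenginegenie.com/"))
    print(meta_query("intitle", "Bulgarian", "recipes"))
```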

Yahoo purchased Overture (the leading Pay Per Click provider) in the middle of 2003. With that purchase, Yahoo also came to own AltaVista and AllTheWeb, which Overture had bought earlier in the same year. By February 2004, Yahoo had stopped showing Google results on its SERPs (Search Engine Result Pages) and started using results based on its own database. The search results were now powered not only by Yahoo’s algorithm, but also by technologies from AltaVista, AllTheWeb and Inktomi.


Nov 14

According to Yahoo, “Yahoo! Web Rank” is a measure of how popular a web page is based on the number of links pointing at it.

When you enable the Web Rank button on the Yahoo! Companion Toolbar, the URLs you visit will be sent to Yahoo! in order to get its Web Rank (from 1 to 10). The info sent about the URL is anonymous and DOES NOT include your Yahoo! ID or any personally identifiable information.

In my view, it is not like Google PageRank in any way. Although Yahoo says it takes links into account, I feel that is not the case: they mostly gauge a site’s popularity from people using their toolbar and use that data to assign a Web Rank. This is another good feature that puts Yahoo in direct competition with the Big G (Google, in our words).

You can turn off Web Rank at any time from the Toolbar Settings menu (pencil icon). When disabled, URLs are no longer sent to Yahoo!.

So how does the Web Rank feature work?

The Yahoo! Web Rank feature of the Yahoo! toolbar works by collecting anonymous URL data about the page you are visiting. This anonymous URL data is sent to the Yahoo! Companion servers, and a Yahoo! Web Rank value (which Yahoo already knows) is returned to the Yahoo! Companion Toolbar as one measurement of the link popularity of the web page or URL you are visiting.
You will see a small icon on your Yahoo! Companion Toolbar displaying the Web Rank value (on a scale of 1 to 10) of the site you are currently visiting. Web Rank is measured by the link popularity of the web page. I feel Yahoo gives a lot of weight to sites listed in Dmoz and the Yahoo directory. According to them, Dmoz is an expert hub, and links from it are credited heavily. We have a site with about 500 backlinks in Yahoo that shows a Web Rank of 0, while another site has only 20 links, one of them from an important category in the Dmoz directory, and shows a Web Rank of 4, which is really surprising.

The big bug I saw in the Web Rank feature concerned Google itself. Google has one of the strongest sets of backlinks on the web, yet Yahoo reported Google’s Web Rank as 0, which was a bug in their system. It was later fixed, and the toolbar now shows a Web Rank of 10. The future may be bright for Yahoo’s Web Rank feature, but right now it is not much of an SEO attraction.

How do I install the Yahoo! Web Rank feature?

First download the Yahoo! Companion Toolbar from Yahoo and install it. During setup you will have the option of enabling the Web Rank feature. If you would like to use it, choose the “Install WITH Web Rank” button on the configuration page and the Yahoo! Web Rank feature will be enabled during the installation of the toolbar.

An important feature of Yahoo Web Rank is that it helps alert the Yahoo! Search crawler to the existence of a particular website or web page and directs the crawler to visit it for inclusion in Yahoo! Search if it is not already in the Yahoo! Search index. This helps with possible inclusion in the Yahoo index. The Yahoo crawler is the second most active crawler on the web, and I believe Yahoo Slurp is triggered by this Web Rank feature.


Nov 14

The procedure Yahoo uses to rank pages also appears to rely on popularity criteria, but not by counting links (a forum post claims that this system is patented by Google).

According to some experts, Web Rank is based on the age of a website as a way of measuring its credibility. However, Yahoo is often said to be following in Google’s footsteps here, so it is not unusual for a site optimized for one search engine to do well in the other too.

The specialist site SearchEngineGenie is worth consulting. Its content is in English, and it gives a broad description of how this algorithm works internally.

However, SeoBook points out that the lack of material and studies on Web Rank is largely due to its still-experimental stage of development.


Nov 14

You know about PageRank, and about two weeks ago I mentioned a new paper from Stanford’s Database Group discussing PeopleRank. Today, another paper was posted on the Stanford server. This one introduces TrustRank, which has been developed to help fight web spam. Here’s the abstract:

Web spam pages use various techniques to achieve higher-than-deserved rankings in a search engine’s results. While human experts can identify spam, it is too expensive to manually evaluate a large number of pages. Instead, we propose techniques to semi-automatically separate reputable, good pages from spam. We first select a small set of seed pages to be evaluated by an expert. Once we manually identify the reputable seed pages, we use the link structure of the web to discover other pages that are likely to be good. In this paper we discuss possible ways to implement the seed selection and the discovery of good pages. We present results of experiments run on the World Wide Web indexed by AltaVista and evaluate the performance of our techniques. Our results show that we can effectively filter out spam from a significant fraction of the web, based on a good seed set of less than 200 sites.
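
The abstract's idea of starting from a few expert-approved seed pages and following the link structure can be sketched as a PageRank-style iteration whose "restart" weight is placed only on the seeds. This is a simplified illustration, not the paper's full method: the tiny `WEB` graph, the seed choice, and the `damping` and `iterations` values are assumptions made for the example.

```python
def trust_rank(links: dict[str, list[str]], good_seeds: set[str],
               damping: float = 0.85, iterations: int = 20) -> dict[str, float]:
    """Propagate trust from hand-picked seed pages along outgoing links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    seeds = [page for page in pages if page in good_seeds]
    if not seeds:
        raise ValueError("at least one seed page must appear in the graph")

    # All of the restart weight sits on the seeds, split evenly between them.
    seed_score = {page: (1.0 / len(seeds) if page in good_seeds else 0.0)
                  for page in pages}
    trust = dict(seed_score)

    for _ in range(iterations):
        updated = {page: (1.0 - damping) * seed_score[page] for page in pages}
        for page, targets in links.items():
            if not targets:
                continue  # a page without out-links passes no trust on
            share = damping * trust[page] / len(targets)
            for target in targets:
                updated[target] += share
        trust = updated
    return trust


# Hypothetical web graph: page -> pages it links to.
WEB = {
    "university": ["library", "news"],
    "library": ["news"],
    "news": ["blog"],
    "blog": [],
    "spam1": ["spam2"],
    "spam2": ["spam1"],
}

if __name__ == "__main__":
    scores = trust_rank(WEB, good_seeds={"university"})
    for page, score in sorted(scores.items(), key=lambda item: -item[1]):
        print(f"{page:12s} {score:.4f}")
```

In this toy graph the two spam pages link only to each other and receive nothing from the seed, so their trust stays at zero however many links they exchange. That is the property that distinguishes this scheme from plain link counting.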


Nov 14

Information retrieval and organization

According to the experts, the TrustRank ranking algorithm spells the end of pages that are of little use yet rank well.

As we saw with the PageRank algorithm, the calculation rested on the idea that “a link is a recommendation”, and the interest of a given website was computed on that basis.

Under these conditions search engines could be fooled: if links are generated from important pages, or simply many links from any site (regardless of its PageRank), the page looks “highly recommended” and would therefore rank near the top.

Search engines try to counter this crude form of deception algorithmically. The algorithms are changed periodically and try to penalize bad practices. However, every algorithm is in itself a law (of ordering or processing), and, as we know, some people devote themselves to finding the weakness in the law that benefits them.

What is TrustRank?

TrustRank is precisely the algorithm that would defeat this kind of trick. It sets out to assess the validity of pages, in order to judge realistically whether they are useful to users.

The details of the process are not yet known. In a paper, Stanford University put forward the idea that the process will probably be guided by humans (rather than by computers alone), who will evaluate a set of web pages (called seeds). These pages will pass TrustRank on, with each page passing on a TrustRank one point lower than its own, so that the value decreases with distance from the seed.

The big difference in this process is that those evaluators could assign negative TrustRank values, which would do away once and for all with useless pages.

Choosing the seed pages

Although it must be stressed that this is speculation, what is certain is that pages of established credibility (universities, official bodies) would be part of that set.

Beyond those, other good candidates might be:

  • Companies with ISO quality certification
  • Long-established bodies in a field of knowledge (Real Academia Española…)
  • News media (press, radio, television)
  • Internet bodies (W3C, IETF)
  • … and many others.

Eliminating unwanted pages

Rigged pages would indeed stop making sense. The algorithm would be supervised by humans, which would make deceptive techniques ineffective. As a result, those sites would lose their appeal and the Internet would finally be cleaned of today’s web spam.


Nov 11

Why did Google make this adjustment? To combat four current problems on the Internet:

  • Unethical SEO practices (selling links, for example)
  • The way PR excessively favored “old” domains over new pages
  • PageRank fraud (the commercial value of many sites depended on PR more than on their quality)
  • AdSense efficiency: including invalid clicks, unintentional clicks and, of course, reducing traffic to spam blogs

Lately there has been a great deal of news about the change Google is reportedly making to its search algorithm, which could significantly change each site’s ranking.

The main action Google has taken is to penalize sites that sell links to a client’s site with the aim of improving its search ranking. The goal is to remove the distortion that this has introduced into the algorithm’s ability to discriminate on the basis of relevance. There is even talk that sites linked to this kind of site will be dropped from Google.

So any company that tries to build search ranking unethically has a hard road ahead. The algorithm, it seems, will no longer forgive sins.
