Google Office Hours 7-15-14
Clearing out 404s
Speaker 1: Once there’s a page that’s a 404, and suppose a client has ten 404s, ten pages with 404s, how long would it take until you guys clean those pages out? Because I still see them after three weeks. They’re still indexed. A client of mine is really picky about that, so she wants them gone.
John Mueller: So if these are just a handful of pages, I just use the URL removal tool for that, and just get them taken out. Usually for 404s or for any kind of page, we essentially just have to recrawl them, and depending on how they’re linked within your website, how often we recrawl your website in general, how often we recrawl those pages, it can take a couple of days, a couple of weeks, maybe even a couple of months for those 404s to drop out. If this is just a handful of pages and you’re really picky about these being visible in search, then you could just use the URL removal tool. If there are a lot of pages, like hundreds or thousands of them, I just let them drop out naturally, even if it takes a couple of months to kind of get reprocessed.
Speaker 1: Ok, and what about random ones? For instance, there’s one from CNN, but the blog is no longer there, and it’s been there over four weeks.
John Mueller: So on someone else’s website or…
Speaker 1: Yeah, yeah
John Mueller: You could use a URL removal tool for that as well. Usually that’s not something you’d have to clean up for other people, so as long as it returns a 404, it’s blocked by a robots.txt, or it has a noindex on it, you can use a URL removal tool for that.
Speaker 1: Ok.
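The three conditions mentioned in the answer above (the URL returns a 404, it is blocked by robots.txt, or it carries a noindex) can be checked before filing a removal request. The following is a minimal sketch, not anything Google publishes: the function name `removal_reasons`, its parameters, and the example.com URLs are all hypothetical, and the caller is assumed to have already fetched the status code, the page HTML, and the robots.txt text.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _RobotsMetaParser(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            for part in (attr.get("content") or "").split(","):
                self.directives.add(part.strip().lower())


def removal_reasons(url, status_code, html, robots_txt, user_agent="Googlebot"):
    """Return which of the three conditions from the answer the URL meets.

    Hypothetical helper: status_code, html and robots_txt are assumed to
    have been fetched by the caller already.
    """
    reasons = []
    if status_code == 404:
        reasons.append("returns 404")

    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(user_agent, url):
        reasons.append("blocked by robots.txt")

    meta = _RobotsMetaParser()
    meta.feed(html)
    if "noindex" in meta.directives:
        reasons.append("has noindex")
    return reasons
```

An empty list means none of the three conditions holds for that URL.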
John Mueller: All right, let’s grab another one from you guys. Let’s see who’s still awake.
Speaker 2: I have a question there if you have time, John.
John Mueller: Sure.
Defending Against SEO Attacks
Speaker 2: I have a good friend in the SEO industry who tells me he runs a very large e-commerce site, and he gets negative SEO attacks all the time, or what he thinks are negative SEO attacks in the form of bad links to his pages. I didn’t think the disavow file worked like this, so I’d like to tell you what he’s doing and you tell me if this is how the disavow file works or not.
He says that he sees a certain page start to lose ranking, he checks the backlinks for that page, he sees some bad links pointing to it or links that he didn’t make or doesn’t trust. He puts those links in a disavow file, and he said like clockwork, three or four days later, the rankings for that page will come back, and he said this has happened about a dozen times, so he doesn’t think it’s coincidence. Now if this is the way the disavow file works, I think that to be an incredible positive thing, because it means that webmasters have a chance of defending themselves against some of this negative SEO crap that’s going on. So I don’t know if you can confirm or deny it, but I didn’t think the disavow file worked that way, John. I thought it took much longer.
John Mueller: Yeah, usually it would take longer. So my guess is maybe they’re seeing effects from something else. Maybe some algorithms are picking up changes and reprocessing them, and it just happens to coincide with that. In general, for the disavow file, we need to recrawl those links, and depending on what the problem was with those links, we need to rerun those algorithms to do that. So if this is with the Penguin algorithm, then it takes a while for us to actually rerun that algorithm, and that’s not something where you’d see changes within a couple of days or a couple of weeks. So from that point of view, that wouldn’t be related to the normal type of issues where we react to problematic links to a page. My guess is that this is just our normal algorithms picking up changes on that website, on those pages, and responding to that, and the disavow file is good in the sense that it’ll prevent those problematic links from causing any problems in the long run, but you’re probably not going to see any really short term effects like that.
Speaker 2: Right. Ok. Thank you. That’s how I thought it worked. Thank you very much.
John Mueller: So I guess in those cases, our normal algorithms are working fine and just ignoring those negative SEO attacks, and the fluctuations they are seeing with those individual URLs are just normal fluctuations from algorithms in general.
Speaker 2: Great. Thank you.
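For reference, the disavow file discussed in this exchange is a plain text file with one entry per line: either a full URL, or a `domain:` line covering a whole domain, with `#` starting a comment. A small sketch of assembling such a file might look like the following. The helper name `build_disavow` and the sample hosts are hypothetical, and nothing here changes the timing John describes.

```python
from urllib.parse import urlparse


def build_disavow(bad_urls, whole_domains=(), note=""):
    """Return disavow-file text: '#' comment, 'domain:' lines, then single URLs.

    Hypothetical helper for illustration; the sample hosts are made up.
    """
    lines = []
    if note:
        lines.append(f"# {note}")
    domains = sorted({d.lower() for d in whole_domains})
    lines.extend(f"domain:{d}" for d in domains)
    for url in sorted(set(bad_urls)):
        host = (urlparse(url).hostname or "").lower()
        if host in domains:
            continue  # already covered by a domain: line
        lines.append(url)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_disavow(
        ["http://spam.example/a.html", "http://links.example/page?id=1"],
        whole_domains=["links.example"],
        note="links found pointing at one product page",
    ))
```

Duplicate URLs are dropped, and a URL whose host is already listed under `domain:` is skipped so the file stays short.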
John Mueller: All right, let’s grab some questions from the Q&A here.
Google Post-Penguin: Could it Happen?
Q: “If Google has decided not to use the Penguin algorithm ever again, would Google rerun one last time to help the sites affected that have cleaned up, or wouldn’t the Penguin-hit site be affected forever?”
A: As far as I know, we’re not retiring any of these algorithms, and when we do retire them, we generally remove them completely so that any effects that were in place from those algorithms won’t be in effect anymore. So it’s not that we turn off this algorithm and keep whatever data was in there forever. It’s essentially something where if we decide to turn off the algorithm, we will remove the data that’s associated with it. So it’s not the case that stuff gets stuck forever.
Speaker 1: Well, there’s rumors out there that it’s not coming back.
John Mueller: Oh, there are lots of rumors out there. Come on. You know that. This is something where we tend to announce things when they’re ready, and we put them up when they’re ready. And sometimes it takes a little bit longer than we expect, so that’s the case here with the Penguin algorithm. It’s not that we’re turning it off or that we’re leaving it in this state forever. It’s essentially just taking a bit longer for us to update that data, so from that point of view, I wouldn’t believe any random rumors that people are making up based on the current situation without having any information from our side. But I think that’s normal in the SEO industry, to some extent. And I think that’s something that we have to work on as well from Google’s point of view, in the sense that people are making up random rumors and they’re making decisions based on those rumors. That’s partially our problem as well because we need to respond to those kinds of questions.
Barry Schwartz: John, speaking about rumors
Speaker 1: I follow your Twitter there, the Webmaster Twitter. Are you guys going to be more active on that?
John Mueller: The webmaster Twitter account, we are a bit more active on that. We have started a lot more on the Google+ page, the Google Webmaster page. Those are essentially our normal platforms where we’re bringing out a lot of this information. If there are specific topics that you think we need to be more active about, feel free to let us know.
Matt Cutts’ Absence
Barry Schwartz: John, speaking about rumors and stuff like that, now that Matt Cutts is no longer around for the short, four month period or so, who do we ask about when there are specific updates? Who is taking over that role at Google to tell us about updates? Is it you, since you took a hit for the authorship, or is it going to be somebody else?
Speaker 1: John Mueller.
John Mueller: Probably a combination of people. Depending on who’s working with the teams involved, they’ll get that information out there. It might be one of us: me, Pierre, Gary, Zinna, Maria, who were working directly with the webmaster outreach team on those issues. But we also have people in Mountain View who are working on a lot of these things, who are running the Twitter account, the Google+ pages, and those are channels as well where we’re going to be bringing out this information.
Barry Schwartz: So there seemed to be something going on June 28 and possibly also July 5. Are you aware of the stuff Google is working on that you could say? A lot of people were saying it was Panda refreshes and stuff like that. Could you comment about that, or are you just not aware?
John Mueller: I’d have to double check what’s on those specific dates. We make updates all the time, so it’s sometimes tricky to figure out why webmasters are noticing this specific update and not the five other ones that we did on that day. So that’s kind of tricky. But if you have examples that we can look at where we’re doing things wrong or where the rankings don’t look like they should, then that’s always useful for us to take back to the teams and figure out what exactly is happening and what we could be doing better.
Barry Schwartz: So just to step back a little, sorry to take up in the past, I got the impression that you didn’t have the level of access or security clearance to know certain things about what’s in the works in terms of certain algorithms, or maybe you were told x months or x days before something was being released to prepare you. But now are you more involved in certain things that are rolling out every so often, in terms of algorithms, since Matt Cutts is on leave, or doesn’t it work that way?
John Mueller: Partially, partially. A lot of the things Matt has been doing, we have to take up that room and start to do those things as well and a lot of what he’s been doing has already been done by other people anyway, so it’s a mixture of both sides.
Barry Schwartz: Ok. Thank you.
Q: “Will Google still run large updates such as Penguin even though Matt Cutts is on leave through October?”
A: Yes, absolutely. Matt doesn’t have to do everything, and I think if we had one person who ran all of our algorithms then that would be bad. These are algorithms that other people have been working on even before Matt was active here, so that’s not something that gets paused or gets stopped until he is back. These updates will continue running. We’ll continue putting out new updates, new algorithms, and hopefully, improve your search results by doing that.
Speaker 1: But the effect won’t be as the one last year, right? The really shaky effect that was… you know…
John Mueller: Some algorithms are more visible than others, so it’s not that we’d say since Matt is on leave, we’ll just make small updates and tweaks instead of bigger changes. We’ve always had to make bigger changes from time to time. Sometimes even the smaller changes look like bigger changes just because the external ranking tools are testing it in a really weird, skewed way that makes it look like a really big change, but actually, it’s kind of a small change. And sometimes really big changes roll out on our side that external tools don’t even notice. All of the Hummingbird stuff, for example; we thought this was a pretty significant change, and a lot of these external ranking and search tracking tools didn’t even realize that we rolled it out.
Barry Schwartz: I noticed, and you guys denied it.
John Mueller: All right. Barry notices everything. I think it’s hard to hide anything from you.
Speaker 1: Even the logo that you changed, he updated right away on his headlines. I didn’t notice the logo change. It was like one millimeter or something like that.
John Mueller: Yeah.
Cross-Marketing on Owned Sites
Q: “If we own multiple sites, some educational, some e-commerce, should banners for the e-commerce sites on the educational sites that we also own be nofollowed? Does it matter if they’re followed, nofollowed, sitewide, or not sitewide?”
A: In general, I’d see this more as a question of “Are these essentially advertisements on those pages or are they natural links within the content?” If they’re essentially advertising, then I’d treat that like any other advertising, even if that’s for your own sites. So in those cases, I’d definitely put the nofollow on there. If this is just a natural link within your content and it happens to be to another site that you own, then that’s fine. That’s like any other natural link on your content.
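As a rough illustration of the advice above, a site owner could scan the markup of a banner block for links to their own shops that lack `rel="nofollow"`. This sketch assumes only advertising markup is passed in, since natural in-content links are fine as they are. The function name `links_missing_nofollow` and the shop.example domain are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class NofollowAudit(HTMLParser):
    """Collects links to the given domains that lack rel="nofollow"."""

    def __init__(self, owned_domains):
        super().__init__()
        self.owned = {d.lower() for d in owned_domains}
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        host = (urlparse(href).hostname or "").lower()
        rel = (attr.get("rel") or "").lower().split()
        if host in self.owned and "nofollow" not in rel:
            self.missing.append(href)


def links_missing_nofollow(banner_html, owned_domains):
    """Return hrefs in the banner markup that point at owned domains without nofollow.

    Hypothetical helper: pass in advertising markup only.
    """
    audit = NofollowAudit(owned_domains)
    audit.feed(banner_html)
    return audit.missing
```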
Q: “Why does Google index pages that don’t actually appear on my website? For example, other websites create an incorrect link to one of my articles, and it gets indexed by Google.”
A: I probably have to take a look at the examples to see what exactly is happening there. My guess is that either your website is actually serving content for those URLs or maybe those URLs are blocked by the robots.txt file so that we can’t actually tell that those pages don’t exist. In both of those cases, that’s suboptimal on your website because essentially any URL could be indexed in a case like that. What I recommend doing there is to make sure that if these URLs don’t exist, they really return a 404 and that they can’t get indexed like that.
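The recommendation above, that URLs which do not exist should really return a 404, can be sketched as a tiny WSGI application. This is an illustration under assumptions, not a description of any particular site: the `KNOWN_PAGES` table and its paths are made up, and a real site would look pages up in its own storage.

```python
# Hypothetical page table; a real site would query its own content store.
KNOWN_PAGES = {
    "/": "<h1>Home</h1>",
    "/articles/first-post": "<h1>First post</h1>",
}


def app(environ, start_response):
    """Minimal WSGI app: serve known paths, answer everything else with 404."""
    path = environ.get("PATH_INFO", "/")
    body = KNOWN_PAGES.get(path)
    if body is None:
        # A mistyped inbound link lands here. The 404 status tells crawlers
        # there is no page to index at this URL.
        start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"Not found"]
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body.encode("utf-8")]
```

The point of the design is that an unknown path never falls through to a 200 response with generic content.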
Churn and Burn Black-Hat Strategy
Barry Schwartz: Can we talk about the sandbox?
John Mueller: The Sandbox… Okay, go ahead. What’s on your mind?
Barry Schwartz: I’ve been seeing a lot of chatter, especially in the “black hat” forums, about these webmasters that say they used to “churn and burn.” They create websites, Google smacks them a month later, and then they try again. They’re saying it’s much harder for them to get their rankings in the normal amount of time. I don’t know what the normal amount of time is, but it’s much harder now.
This reminds me of the original Google Sandbox. I don’t know what you guys named it back in 2004. I don’t know if you were with Google in 2004. But it reminded me of that. They said this was released a little after Panda 4.0, where it’s much harder for them to actually start ranking well for these churn and burn black hat websites. Do you have any comments about that?
John Mueller: That sounds like a good thing to hear, I guess.
Barry Schwartz: No, it is, but...
John Mueller: (laughs) I don’t know specifically what they’re seeing there, but I do know that we have various algorithms to try to recognize these kinds of situations where sites pop up really quickly and get really high visibility in search results, and we need to take action on them manually because they’re doing things quickly. Those are the things that we try to recognize, so maybe they’re just seeing updates in those algorithms that are causing that. But I think-
Barry Schwartz: Were there some updates to algorithms around the Sandbox-y stuff?
John Mueller: I’d have to double check when specifically, but these are the things that we work on as well, and that’s definitely possible that we rolled out one of those recently.
Barry Schwartz: Ok. I’ll touch base with you and find out if you could give me anything on-record. Thanks.
John Mueller: Sure.
Dealing with “Doorway Pages”
Q: “Many high value local searches are still dominated by sites that create hundreds of service-in-placename.html URLs with thin or spun text. How should small local businesses compete against those organic results without adding to the spam?”
A: I think it’s good to avoid creating these kinds of doorway pages because that’s going to be a long term liability, even if they show up in the search results at the moment. I strongly recommend making sure you’re sticking to the guidelines, not falling into this trap of creating doorway pages. At the same time, if you’re seeing these kinds of spammy pages ranking, submitting a spam report is really useful for us. Or, if you see that this is something systematic which you can show with some very generic searches, then that’s something you can also send to me directly, and I can take a look at it with the search quality teams here. This is something where we try to recognize these situations and respond appropriately.
Sometimes, for example, our high quality site algorithms will recognize this and say, “Hey, there are hundreds of pages on this website, they’re all kind of similar, and they’re all really low quality, so maybe we shouldn’t be trusting this website anymore.” We have a bunch of algorithms that try to recognize and work with this kind of issue, but again, if you see that these are still happening, you’re welcome to forward those to us. I can’t promise that we can fix every one of these immediately, but it’s definitely useful feedback for us to bring back to the engineers so that we can work on creating algorithms that handle these situations better.
My Site is Stuck on Page 2: Do I have a Penalty?
Q: “My page has been stuck on page two in the coupon niche. Occasionally, it bounces to page one and a very spam niche will get sites on page one. Is there anything I can do to help it move up, or could being stuck on page two be an algorithmic penalty for me?”
A: I don’t think we have any algorithm penalties that stick websites to page two. We do have various algorithms that look at the quality of these websites, though. So if your website is in the coupon niche, there are a lot of really low quality sites that we see in those niches that essentially just take feeds and re-publish them without adding any additional value to those pages. That’s something where I’d work hard to make sure that there’s something unique and compelling on your website, not just the usual feeds that these kinds of sites re-publish. I imagine it’s going to take a bit of time for users to recognize the high quality nature of your site, and for our algorithms to pick up on that as well. I think I’d just recommend you keep working on that, to make sure your site is the highest quality possible, not just re-published information from various feeds.
Old Domains vs. New Domains
Q: “I decided to migrate a website with a history of search quality issues to a different domain, but the domain I want to use has been 301 redirected to a domain we’re abandoning. Is this a problem? Should I pick an entirely new domain to be safe?”
A: That sounds like a weird circular situation. In general, that should work. At least from a technical point of view, that shouldn’t be that much of a problem. It might take longer for us to recognize this new situation where one site was redirecting to the other and now that other site is redirecting to that first one. With regards to the search quality issues that you mentioned, if you 301 redirect the old site to the new domain, you have to keep in mind that some of those problematic issues with regards to search quality might be forwarded to the new domain as well.
For example, if you have a history of problematic linking to that old domain, and you just 301 redirect to a new domain, then all of those problematic links are essentially forwarded as well. So those issues are following your website along if you just redirect them. On the other hand, if it was more a question of the quality of the content, and you revamped the content completely, and you’re just moving to a different domain to do the branding side of things, then that’s generally fine. I wouldn’t try to use the 301 redirect as a way to get out of the search quality issues without actually fixing those issues first. If you fix those issues, you can stay on that domain because that’s almost as good as moving to a different domain. So in that regard, I’d recommend you have the search quality issues fixed completely, and then, the move from one domain to another is more a technical issue than anything related to search quality.
Barry Schwartz: John, Ashley has a question.
John Mueller: Yes.
Ashley Berman Hale: So if a webmaster follows all directions in a site move, 301s, change of address tool, but Google keeps showing the non-preferred version months after the redirects, what can the webmaster do or how can they change it? Or is it a trust issue with the new domain?
John Mueller: How are you seeing the old domain? Is that if you’re doing a site query or…
Ashley Berman Hale: A site query or you’re Googling. Say you move from a longer domain to a shorter .com domain. You bought a new domain. You wanted a snappier brand, but the longer one keeps showing up. URL structure is entirely the same. 301 redirects are good, no blocking, change of address tool used, but Google just keeps sticking to the old one, let’s say, six months after the site move. What should a webmaster do?
John Mueller: Post in the help forum. (laughs). That’s something we probably want to look at. So posting in the forum or posting to me directly, that’s something you could do to let us know about that so we can take a look.
One thing to kind of keep in mind, if you’re doing a site query for the old domain, then sometimes we’ll just show the old domain content anyway because we think, “Oh, you’re looking for this specific URL for your site.” And even if we know that it’s actually moved to a new one, we’ll say, “Well, we know this content used to be on this URL, so we’ll show it to you because we think it matches what you’re looking for.” Essentially, we’re trying to show you the information that you’re looking for, and in your case, you’re trying to confirm that it’s not there anymore. So the site query would probably not be so helpful in a case like that.
The other thing to keep in mind is that, for larger sites, there will always be some pages that just take like a really long time to be recrawled. So if you do a site query, probably the home page, the main pages, will move over fairly quickly, but there might be some long tail pages there that we just don’t crawl that frequently. Maybe it’ll take a half a year, maybe even a year, for us to recrawl and reprocess all of those pages and see those moves. So that’s something you might see in the site query; long tail pages that are just stuck on the old domain and take a long time to be reprocessed.
Ashley Berman Hale: So if Google is just really tenacious about showing the old, unpreferred version, there’s nothing you can do beyond all the signals, but just wait and ping you… ping the forum?
John Mueller: Yeah, that’s something that should pick up automatically at some point. Sometimes there are algorithms on our side that are trying to pick the right URL where we see there’s a 301 redirect, but we have all the links pointing to the old version, for example. And we think, “Well, everything keeps linking to the old version. It must be the preferred version, even if they’re redirecting somewhere else.”
So really making sure that all the signals are telling us that the new one is really the right one to use, 301, maybe a rel canonical, updating the old links if you can, contact the people that are linking to the site… All that adds up for us, but if we’re really picky, and we just keep sticking to the old link, then it sounds like that’s something our algorithms should be better at recognizing and something that we can talk to the engineers here to make sure that we’re doing the right thing instead of sticking to an old domain.
Ashley Berman Hale: Thank you.
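Two of the signals listed in this exchange, the 301 redirect and the rel canonical, depend on mapping every old URL to the same path on the new domain. A minimal sketch of that mapping follows. The host names and function names are hypothetical, and the actual 301 response would be issued by the web server using this target.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "long-old-brand-name.example"  # hypothetical old domain
NEW_HOST = "snappy.example"               # hypothetical new domain


def redirect_target(url):
    """Return the new-domain URL for an old-domain URL, or None if no redirect applies.

    Path and query are kept unchanged so each old page maps to its own new page.
    """
    parts = urlsplit(url)
    if (parts.hostname or "").lower() != OLD_HOST:
        return None
    return urlunsplit((parts.scheme, NEW_HOST, parts.path, parts.query, parts.fragment))


def canonical_tag(url):
    """Build the rel canonical element to place in the head of the new page."""
    return f'<link rel="canonical" href="{url}">'
```

Keeping the URL structure identical, as in the scenario described, means the mapping needs no lookup table.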
Speaker 3: Hey there John.
John Mueller: Hi.
Speaker 3: Just to kind of add to that, if somebody happens to be in that same situation we were in, we actually had the .co.uk website 301 redirecting to our .com. When we decided to make the changes we did, our .co.uk was put as a stand-alone site, and it worked very well. So this person actually has cleaned up all their problems and is just waiting for a Penguin refresh, but they’re not willing to wait any longer because, as we discussed, we don’t know when it’s going to happen. Then it could be a very positive move for them and shouldn’t be an issue.
John Mueller: Yeah. But I think in your case, it wasn’t that you moved to the other site, you essentially built up a new website on the co.uk. Right?
Speaker 3: Well, right. It’s the same content, essentially, and with pricing and a different currency and using the hreflang. We discovered that there wasn’t anything wrong with our content, and it actually was Penguin holding us back, even though we cleaned up all of our link problems. The only problem we had was the fact that we were waiting for a Penguin refresh because our .com site is still nowhere, and our .co.uk site still is. We just hit page one from co.uk web again, after four and a half years. So we are simply going through these hoops waiting for Penguin to refresh. Now it’s been nine months I think, or whatever it is. There’s probably a lot of people who have cleaned up their problems and are simply just waiting for a Penguin refresh. Another two or three, four months out of business for people can cripple them. So you know, if we hadn’t done this move, I may not be here today in this, in my position still. So it’s pretty vital.
John Mueller: Yeah, I know. It’s something we’re also talking with engineers about to see what we can do to speed that process up because that’s really frustrating in a case like yours where you can actually spend so much time to clean those issues up and you’re waiting for things to be processed and updated again. In your case, that was a really great move with the hreflang because you have UK specific content and that’s a great fit for your website, but that might not work for all websites. That’s something we need to work on to speed things up as well.
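The hreflang setup mentioned here pairs each page with its regional alternates through `link rel="alternate"` elements. As a hedged sketch, with made-up URLs and an assumed `en-us` code for the .com version (the transcript only says the .co.uk site carries UK-specific content), the tags could be generated like this:

```python
def hreflang_tags(alternates):
    """Return one <link rel="alternate"> tag per language-region code.

    alternates maps a code such as "en-gb" to the URL of that version.
    Hypothetical helper; the same full set of tags goes on every version.
    """
    return [
        f'<link rel="alternate" hreflang="{code}" href="{url}">'
        for code, url in sorted(alternates.items())
    ]


if __name__ == "__main__":
    for tag in hreflang_tags({
        "en-gb": "https://offices.example.co.uk/london",  # made-up URL
        "en-us": "https://offices.example.com/london",    # made-up URL, assumed code
    }):
        print(tag)
```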
Speaker 3: Yeah. While we’re talking. Can I ask you a quick question regarding… Now that we’re actually getting traffic and we can do business and all that stuff, I’ve noticed that a lot’s changed in a few years, and we’re not seeing the keywords that are arriving on our site. So they’re searching for Virtual Office London, and we’re getting in our statistics. The new changes Google made in 2010 hide the keywords. If somebody types “office space London cheap”, we want to put the listings that are cheaper at the top, but we can’t do that because the keywords are being blocked. Is there a solution and why is it being done?
John Mueller: Not for those kinds of situations. That’s something you could track through Webmaster Tools, but you can’t kind of respond to it on the fly, and there are various reasons why we started doing that. We’re starting to do that for ads as well now. We think it makes sense to pass a generic referrer along instead of the full referrer that we used to have.
Speaker 3: I think it’s counter-intuitive, to be honest. I think there’s a lot of situations where more data is better, and the only downside that I can see is that potentially, Google wants to protect and hoard this information. That’s how I see it from my point of view. Whereas, if I could plug into Google Analytics live, as it’s happening, to get that protected information, that would be useful. There should be some way that we can get a hold of that information so that we can tailor the search directly to the customer. We’ve talked about Hummingbird and having too big a logo at the top because we don’t want customers to scroll. If I can deliver the right content to the top, the customers won’t scroll. Surely, you know, I’m preempting what my customer wants. It’s got to be the best solution.
John Mueller: Yeah… we saw a lot of really spammy sites that were doing that kind of thing. I think in your case, that might make sense, if we’re sending users of various interests to your same pages. But we’ve seen a lot of situations as well where sites were doing really spammy things there, and the primary reason we’re not showing the referrer is also for privacy so that this information doesn’t get forwarded directly unencrypted. It’s something that the user has for themselves, and it’s not something that we forward along. So I think that’s probably not going to change. Even having a back door through Analytics where you could see this information live, I doubt that’s going to happen. We might be able to get the data in Webmaster Tools a little bit faster so that it’s maybe just a couple of hours or a day behind instead of the three days that it is at the moment, but I don’t see us sending the full referrer anymore or showing this data live through Analytics or some other kind of back channel.
Speaker 3: Yeah. Ok. Yeah. Thanks, John.
John Mueller: I spent a lot of time looking at my log files as well before I started at Google, and it was always interesting to see where, exactly, people were coming from. But I think from a privacy point of view, it makes sense to take this step and to really make sure that this data is as secured as it could be. That’s something I don’t see us changing any time soon, and I don’t see that data coming back, to be honest. Maybe there are ways we can make this a little bit faster, at least in the Webmaster Tools in an aggregated way, to at least give you faster feedback on how people are finding your site.
Speaker 3: I was more concerned about being reactive and delivering exactly what my customer wants. Unfortunately it’s one of those getting-rid-of-something-great-to-deal-with-abuse. It seems like the wrong action to take in that sense. And if somebody’s landing on my site and they’re landing on office space in Birmingham, I already roughly know what they were searching for in the first place. Whether or not they’re using the word “cheap” or “discount”, is irrelevant from the point of privacy, as far as I’m concerned. So it seems a bit of a moot point.
John Mueller: Yeah, I imagine with some sites, it is more a problem than with other sites. But it’s something where we’ve had to take that step, and I really don’t see that coming back, to be honest. If that’s something you were waiting for, I’d try to find other ways to do that. Maybe create a separate page for low price, virtual offices, or those different attributes that you’re looking at there and see if that works for you…
Speaker 3: …If I was to do that I would end up with a Panda issue. And so…
John Mueller: ..No, it kind of depends.
Speaker 3: …I just want to reorder it..
John Mueller: Yeah, and-
Speaker 3: and I want to put a nofollow, noindex so it won’t get indexed. So it’s a circular problem, unfortunately, that all of these things have been built separately without that very idea in mind. It would actually be very useful if it was utilized properly and not made simply on the basis that, “Oh, somebody comes to my page; I’ll create a page based on that content that’s going to be dynamic and full of a bunch of garbage,” which is obviously what people did for a long, long time. I understand that, but I think there’s some very good uses to it.
John Mueller: Yeah.
Speaker 3: Anyway, it’s something to think about if it gets discussed that there are practical uses.
John Mueller: Sure.
Speaker 3: Thanks, John.
E-Mail Notifications Inconsistent with Webmaster Tools
Q: “I’ve noticed some disconnect between Webmaster Tools and the emails. I’ve seen some penalties appear, disappear, and reappear, yet the emails stated that there were no manual actions. Have there been some glitches in the past three weeks as this is unusual?”
A: That shouldn’t be happening. What sometimes happens is that there’s a slight timing problem between when the emails are sent out and when the data is visible in Webmaster Tools. We update the Webmaster Tools data for manual actions maybe twice a day, something around that range, and emails we might send out once a day. So there’s a timing issue there where maybe you’ll get an email and it’ll already have been visible in Webmaster Tools slightly beforehand. It’s not that there’s this one second interval where we send out the email and show the data in Webmaster Tools. It’s sometimes slightly staggered there. But it shouldn’t be the case that things pop up and disappear and reappear and emails come randomly, and you don’t see any of that in Webmaster Tools. That definitely shouldn’t be the case.
One thing I’ve seen where some Webmasters were confused is when manual actions expired. That’s something that happens with all of our manual actions in the sense that at some point, it makes sense to expire this manual action because maybe the webmaster has cleaned up and just hasn’t gone through Webmaster Tools to let us know about that. Usually that’s in the range of a couple of years, something around that time frame. When these manual actions expire, essentially, they’re no longer visible in Webmaster Tools, but, as far as I know, we don’t send out any specific email to tell you about that. So what might happen is that you get an email saying, “Hey, you have a manual action.” You look on Webmaster Tools, it shows a manual action, and a couple years later, you look in Webmaster Tools again, and it doesn’t show the manual actions anymore. But that’s not something that would happen from one day to the next, like come and go, come and go again. It’s something that would probably take a couple of months, a couple of years, to actually drop out of Webmaster Tools.
Duplicate Content: HTTP and HTTPS
Q: “According to a crawling tool, my website has duplicate content issues because my content pages are being served with HTTP and HTTPS protocols, which I don’t redirect to each other. Should I be worried about that?”
A: No, you definitely don’t need to be worried about that. From our point of view, that’s more of a technical issue. That’s something where we always have to deal with duplicate content in that we crawl different URLs. We see the same content. We have to handle that on our side. So that’s not something where you’d see any demotion or penalty because of that. That’s a technical problem. What you could do is use something like the rel canonical. You could set up a redirect if you have a preferred version, but you don’t need to do that all the time. We can live with this situation.
What will happen in this situation that might be negative for your website is that we’ll be recrawling both of these versions more regularly, and maybe your server has a problem with that extra load. With just two copies of the same page, that’s probably not so much of a problem. But if you have session IDs in there as well, if you have different URL parameters that all lead to the same content, then that can add up, where for every page that you have on your website, there are actually like 10 or 100 different URLs that we have to crawl to reprocess that. It sometimes gets to the point where we have to spend a lot of time recrawling duplicates, and we don’t pick up your new content as quickly. Or we crawl the website so frequently that it actually causes problems on your server, in that we’re kind of slowing your server down for all your normal users. But if you’re just seeing this kind of an issue with HTTP and HTTPS, then that’s something we just have to solve on our side where we say, “Oh, we see the same content on these two pages. We have to pick one of these and show it in the search results.” We have to pick one of these pages, and if we don’t know which one to pick, we might pick one that you don’t want to have shown. So if you have a specific opinion on which one we should be showing, tell us. If you don’t mind that we just pick one or the other, then that’s fine too.
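For readers who do want to state a preference, the idea of mapping every duplicate URL onto one preferred version can be sketched in a few lines of Python. This is only an illustration, not anything Google prescribes: the function names, the choice of HTTPS as the preferred scheme, and the `sessionid` parameter name are all assumptions made for the example.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters that never change the page content.
IGNORED_PARAMS = {"sessionid"}


def canonical_url(url: str, preferred_scheme: str = "https") -> str:
    """Return one preferred URL for all HTTP/HTTPS and session-ID variants."""
    parts = urlsplit(url)
    # Drop parameters that only create duplicate URLs for the same content.
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS]
    return urlunsplit((preferred_scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str, preferred_scheme: str = "https") -> str:
    """Build the rel=canonical link element for the page's <head>."""
    href = escape(canonical_url(url, preferred_scheme), quote=True)
    return f'<link rel="canonical" href="{href}">'


print(canonical_link_tag("http://www.example.com/page?x=1&sessionid=abc"))
```

The same `canonical_url` result could also serve as the target of a redirect if the site prefers redirecting over a canonical tag.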
Are Link Removals Necessary, or is Disavow Enough?
Q: “If we were penalized because of backlinks in a widget that users posted on their websites, is it enough if we add a nofollow to all the links, or do we have to completely delete such links?”
A: From our point of view, if you put a nofollow on those links, if you disavow them, or if you delete those links, it’s all the same in that we have to recrawl those pages. We’ll see that the link is no longer passing PageRank, and that’s fine for us. So putting a nofollow there is absolutely fine. Just keep in mind that we have to recrawl those pages to actually see those changes, so that might take a bit of time to actually be completely reprocessed, regardless of which solution you pick.
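As a rough illustration of the nofollow option, a widget's embed code could be generated so that its backlink always carries `rel="nofollow"`. The function name and the example URL below are hypothetical; a real widget would have more markup around the link.

```python
from html import escape


def widget_backlink_html(target_url: str, label: str) -> str:
    """Return the widget's backlink with rel="nofollow" so it passes no PageRank."""
    return '<a href="{}" rel="nofollow">{}</a>'.format(
        escape(target_url, quote=True),  # escape quotes so the attribute stays valid
        escape(label),
    )


print(widget_backlink_html("https://www.example.com/", "Example widget"))
```

Widgets that are already embedded on other sites only change once those sites pick up the new embed code and the pages are recrawled.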
Speaker 2: John?
John Mueller: Yes.
Speaker 2: I have an interesting question, if you don’t mind if I jump in.
John Mueller: Sure.
Problems with Disqus Commenting Plugin?
Speaker 2: Great. I have a site that got hit by Panda 4. Among a number of things, it’s aggregating some content, so I imagine that might have something to do with it. It’s also got Disqus, the commenting system, installed at the bottom. What’s happening is Disqus is not blocking Googlebot from crawling via robots.txt, but it’s just timing out, it looks like. The net effect, however, looks like they’re blocking JS, and it is affecting the design quite a bit because the 10 or 20 comments that would usually show up at the bottom aren’t showing up. I know that Google likes to check out those comments to make sure that everything is high quality as well. So I was worried that could possibly be causing some problems. What do you think?
John Mueller: That shouldn’t be causing that much of a problem. I think the main issue you’d be seeing is if we can’t crawl these comments. If we can’t pick them up for indexing, we won’t be able to rank those pages for those comments.
So for instance, sometimes we have situations where there’s a lot of text on a very technical topic in a blog post, and the comments describe it in a more colloquial way and discuss the issue in slightly different words. In cases like that, those comments are really useful for us to understand that this page is actually about this topic. If we can’t get to those comments, then it’s harder for us to rank that page appropriately. It’s more a matter of finding content on those pages and showing those in searches appropriately than anything from a quality point of view. And there are sometimes legitimate reasons to have blocks of text on a page that are blocked with the robots.txt file. Maybe they’re parts of a page you don’t want to have indexed like that. Maybe there’s some content there that you’re not allowed to have indexed like that for licensing reasons. And you put it in an iframe, and the iframe content is disallowed in the robots.txt file, for example. So there are legitimate reasons to do that, and that’s not something that, from a quality point of view, would be causing any problems.
At the same time, if these comments are timing out for Googlebot, then maybe they’re timing out for users as well, and that’s something to watch out for because that could be degrading the user experience in general. But just because they’re roboted or not visible to Googlebot doesn’t mean the site or the page itself is low quality. It’s just that we can’t show that content in the search results.
Speaker 2: Ok. So that wouldn’t be considered cloaking at all?
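To check whether embedded content such as a comments iframe is disallowed for a crawler, Python's standard `urllib.robotparser` module can evaluate robots.txt rules. The sketch below is only an illustration: the robots.txt content and the `comments.example.com` URLs are made up, and it does not detect timeouts, which are a separate problem from robots.txt blocking.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the host that serves the embedded comments.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /embed/comments/
"""


def is_blocked_for(user_agent: str, robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt rules disallow this URL for the crawler."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


print(is_blocked_for("Googlebot", EXAMPLE_ROBOTS_TXT,
                     "https://comments.example.com/embed/comments/thread-1"))
print(is_blocked_for("Googlebot", EXAMPLE_ROBOTS_TXT,
                     "https://comments.example.com/blog/post-1"))
```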
John Mueller: Not really, no.
Speaker 2: Ok. Great.
John Mueller: All right, here is a question about how Google handles AJAX URLs.
Q: “I’ve noticed that URLs are being indexed without using the hashbang method. Looking at the site, it does have a meta fragment. Is that the reason we no longer need the hashbang?”
Google and Flash for Mobile Users
Barry Schwartz: Why do you hate Flash?
John Mueller: Aah, we don’t hate Flash. I think you’re referring to the mobile stuff that we did recently. It’s not that we hate Flash. We still crawl and index Flash content for the web search normally. It’s just that if you’re using a smartphone, chances are you won’t be able to look at the Flash content there. And if your website is Flash-based and we send smartphone users there, they’re going to be frustrated because they’re not going to see anything. So it’s not that we hate Flash, it’s just that it doesn’t seem to be supported that well among smartphones at the moment. And…
Speaker 1: Hopefully, they get there, they shake hands and they agree, and they’ll work together.
John Mueller: Yeah, this is something where things have moved on a little bit, and if your website is based completely on Flash, then you’re essentially blocking everyone who’s using a smartphone from being able to access your content. That’s also a really bad user experience from our point of view. If we point smartphone users to your pages and we know ahead of time that they won’t be able to see anything there, then that’s something that we should avoid doing. Usually, what happens when we look at the feedback for search is that people will say, “Wow, Google sent me to this page that’s absolutely empty.” “It’s Google’s fault that I can’t read this content.” We take that into our own hands and find a way to handle that better in the search results so that smartphone users are actually happy to use our search and not stuck on sites that they can’t actually use.
Faulty Re-Directs and Mobile Accessibility
Speaker 1: Let’s say that you have 10, 15 faulty redirects. Does it have a serious impact on rankings over time? For instance, let’s say you have faulty redirects for a month or two, and then an algorithm comes by. Will I get impacted if I didn’t fix those faulty redirects on my smartphone?
John Mueller: So… faulty redirects would be when you access the desktop URL, and it redirects you to the mobile home page or to another page…
Speaker 1: Yeah. Correct.
John Mueller: Yeah.
Speaker 1: It still hasn’t been fixed because I think Pierre did say in a blog that it does. But I’m not-
John Mueller: That’s one of the categories of problems we see where smartphone users aren’t able to get to the content they’d like to see. That’s something we’re looking at, for example, showing them lower in the search results, maybe badging the search results to tell users that they might not see the content they’re actually going to be clicking on, those kinds of things. That’s not something we do on a site-wide basis, it’s really on a per URL basis.
Speaker 1: Yeah.
John Mueller: Because we’ve seen some sites where one part of the site works really fine on mobile and another part has these weird, broken redirects. So we do that on a per URL basis, and it’s not that the website itself would be demoted, it’s just those individual pages we’d like to show a little bit lower in search.
Speaker 1: Right, right, and so once it’s fixed, then they’ll pop back up?
John Mueller: Yeah, definitely, and this is something that’s only visible in the smartphone search results. It’s not that it would affect your desktop search results. It’s really just for the smartphone users. When we can recognize that they won’t be able to use this content, we’ll try to take action on that. We’re at the stage where we’re blocking out individual issues where the content is completely blocked for smartphone users and telling those smartphone users about it, maybe showing them a little bit lower in search, those kinds of things, to make it easier for smartphone users to actually get to the content that they can actually look at.
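The difference between a faulty redirect and a correct one can be sketched in Python. This is a simplified illustration under stated assumptions: the mobile site lives on a hypothetical `m.example.com` host and uses the same paths as the desktop site, which will not hold for every site.

```python
from urllib.parse import urlsplit, urlunsplit

MOBILE_HOST = "m.example.com"  # hypothetical mobile hostname


def mobile_equivalent(desktop_url: str) -> str:
    """Map a desktop URL to the equivalent mobile page, not the mobile home page."""
    parts = urlsplit(desktop_url)
    return urlunsplit((parts.scheme, MOBILE_HOST, parts.path, parts.query, ""))


def is_faulty_redirect(desktop_url: str, redirect_target: str) -> bool:
    """Flag redirects that land on a different page than the one requested."""
    desktop, target = urlsplit(desktop_url), urlsplit(redirect_target)
    same_page = (desktop.path or "/") == (target.path or "/") and desktop.query == target.query
    return not same_page


print(mobile_equivalent("https://www.example.com/cars/red?page=2"))
print(is_faulty_redirect("https://www.example.com/cars/red", "https://m.example.com/"))
```

Because the handling described above is per URL, a check like this would be run for each desktop page rather than once for the whole site.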
Speaker 1: But like if one page has 700 words and in mobile… it’s not necessary to have all the 700 words in that one page. Right?
John Mueller: Yeah, yeah, so from our point of view, if you link your desktop pages to a mobile page, then we expect that the content is equivalent. It doesn’t mean that the content has to be exactly the same. The layout can be completely different. Often it is completely different, like the sidebar is missing, the menu’s structure might be slightly different, but the primary content should be equivalent. So if you’re looking at a desktop page about red cars, then the mobile page should be about red cars and not about green bicycles, for example.
Speaker 1: Right. But the content can be less, like ahh…
John Mueller: Sure. Yeah, yeah, we see that a lot, for example, with sites that kind of move the comments away from the mobile version, where you have to click on a link to actually get to the comments. But on the desktop version, the comments are completely visible, and that helps speed up the mobile version of that page. So from our point of view, that’s completely fine.
Speaker 1: Maybe there should be another blog about that, John, because I see this: The mobile, there’s 1,000 words and the “Call us now” is all the way towards the end. That’s crazy.
John Mueller: Well, I’d say that’s their problem in the sense that we’re showing the search results as we think they’re relevant. But if the call to action is completely hidden on mobile, then, that’s their problem and something that they should be working on to improve the general mobile user experience there. At the moment, we’re not ranking search results based on mobile friendliness, apart from issues where like mobile users can’t access the content at all. It might be that at some point in the future, we do start ranking search results for mobile slightly differently by bringing the more mobile friendly sites higher. But at the moment, there are just so many sites that are blocking mobile users completely, that we think it makes more sense to focus on this very obvious situation.
Speaker 1: Ok. Thank you.
Google Answer Box: How is it Affecting Click-Throughs?
Speaker 4: John, would you mind if I ask you a question?
John Mueller: Sure. Go ahead.
Speaker 4: For full disclosure, this is on behalf of someone else, and you may recognize it as soon as I post you the link. In the search result, there’s a Google Answer box. Is there room for abuse from brands when it looks like someone’s asking a generic question? Because they’re getting a branded answer.
John Mueller: Yeah, that’s kind of weird. I think, in general, a lot of these answers that we provide work really well, and they bring this information up higher and make it more interesting to click through to find out more. But branded answers like that are not what we’re trying to do there. I can definitely talk with the team that works on that to see if we can improve that a little bit.
Speaker 4: All right, well, Gary will be pleased.
John Mueller: Yeah, I think that’s something we definitely need to watch out for, that it doesn’t turn into an advertisement for websites, but rather, brings more information to the search results about this general topic.
Speaker 1: Well for now the click-through is through the roof.
John Mueller: It’s tricky. I think this is one of those topics where we have very heated discussions internally. On the one hand, it makes sense to show a bit of a bigger answer for people who are searching for these topics. On the other hand, webmasters might feel that their content is being misused by being shown in this bigger snippet where a searcher might get the information they want directly in the search results instead of clicking through to the site. You say that the click-through rate is probably increasing here. We think the click-through rate is generally increasing, and more interested people are clicking through. Whereas people who are just curious about this topic, they might look at this answer and say, “Oh yeah, this is all I need to know, and I don’t really need to read through the site.” So we are still in discussions internally, at least with regards to how we should handle this kind of answer.
Speaker 1: All right.
Payday Loans Algorithm
Speaker 5: Can I ask you a question?
John Mueller: Sure.
Speaker 5: Regarding the Payday Loans Algorithm, is it a variation of Penguin or something similar to that in that it deals primarily with spammy content off-site? Or could we also be talking about on-site with the quality issues that Panda didn’t get, excessive keyword stuffing or something like that?
John Mueller: I think specifically for that algorithm, we’re looking at a very specific type of website. We’re looking into various signals that we can pull, and that can include on-site and off-site signals. So if you have a spammy payday loan website, then that could involve a lot of factors that you can work on to improve that overall. I wouldn’t necessarily focus only on on-site or only on off-site, but look at it holistically and make sure you’re covering all bases there.
Speaker 5: Are they going to be any longer as far as how it’s going to affect that site than usual with other…
John Mueller: I’m not sure how frequently we’d be updating that. But you know-
Speaker 5: All right, and in one hangout you were talking about a website that might be too far gone, that it would just be better off starting from scratch. What kind of scenario do you think that a site had gotten to that stage where we could look at it and say quite easily, “Oh yeah, might as well start again.” Something that’s been repeatedly hit by a variety of algorithms over time?
John Mueller: It’s really tricky. I don’t think there is any totally obvious signal where you could look at it and say, “Oh, they should just start over again.” But I think if a website has been doing problematic things for quite a long time, then that’s where you start thinking about that. And the tricky part is trying to find the right balance in the sense that if a website has been active for a really long time, it might have this older brand attached to it that people know about. Whereas if the website is still fairly new, then it’s easier to switch to a different domain or a different brand, but at the same time, they probably haven’t done that much bad stuff in the past either.
The situations you need to watch out for are if you’re starting a new website on a domain that has been used for really spammy things in the past. We can, to some extent, take those spammy things and ignore them because we recognize that this is a completely new website. But some of those spammy things might still be associated with that website for a fairly long time. For instance, if they’ve been building spammy links for five, ten years now and someone starts a completely new website on that domain, then that’s a lot of problematic history that is going to be hard to clean up. And if you’re just starting new on that domain name, then probably it’s not that much effort to pick a different domain name and just start on that one instead.
Speaker 5: And just disavowing everything wouldn’t be…
John Mueller: Well, sure, you can. It’s not that it’s impossible, but you have to balance the amount of work and time it takes to clean that up with what you’re trying to achieve. And if you’re a completely new website, then spending a couple of months disavowing old links that you’re not involved in is probably not the first priority on your mind when you’re building a new business. Whereas if you’re an older, established website, then it makes sense to spend more time to actually clean up all of those old problems. So it’s something where there’s not an obvious and easy answer, and I know it makes it hard to make the right decision.
One tricky aspect of that, as well, is that you don’t know how things will change in the future. Maybe you’ll clean up all of the links now, and you don’t really know if the next algorithm update is going to take place next week or maybe it’ll take place in half a year. It’s hard to know when you should cut loose and say, “Ok, this is going to be way too much effort for us to actually clean up completely,” whereas it might be a little bit easier for us to actually start over on a new domain. Some people have started over and have made really great progress there. Other people may have cleaned up their old website and managed to get it back into good shape. So it’s not that it’s impossible to do, one way or the other. It’s just that you have to make a business decision on where you should focus your energy, your money, and your time.
Speaker 5: Ok. Alright, thanks a lot.
John Mueller: And I think that’s where you guys who have more experience with cleaning up issues around links, with cleaning up issues around spam, you’re in a good position to help these websites make the right decision and to look at their business model overall to say, “Hey, you’re a completely new website and you’re on this domain that has been spammed by previous owners in crazy ways in the past. Maybe it makes sense to just pick a different domain name and start on that one instead. And if over time you manage to clean up the domain name, maybe you could put something on there as well or maybe you can move to that one.”
Speaker 5: Would the bigger challenge be that the website had like external links and stuff pointing back to it, or say it had been a poor quality website for a long time previously?
John Mueller: Most of the time, it’s the external factors that are harder to clean up. If it was a really spammy website in the past and it might have gotten stuck on one of our catching-low-quality-content type of algorithms, that just takes a longer time to update again. Usually, I look more at the problematic links in that situation, but I still take a look at archive.org to see what the old content was. If the old content was really problematic, if it was just content from feeds that’s just scrambled and spun, then that’s something also worth taking into account. And with all the new top level domains that are coming out, maybe it just makes sense to grab something that’s completely new where you don’t have any anchor attached to your website that’s dragging you back. Even if you clean up these links now, you never know what else might be kind of dragging you back. If you start out with something completely new, then you don’t have to worry about that. You can spend all of your time and energy making a fantastic website instead of having to clean up old things that could potentially be pulling you down.
The Future of SEO
Speaker 1: John, before you go, there was an article about Google for webmasters: “It’s only going to get worse, but not better.” That was the title. Can you clear this up for the SEO community? Is that true?
John Mueller: Well, I think one thing to keep in mind is the web is constantly evolving, and in the past, it has been a niche market area. In the last couple of years, it’s really gone mainstream. Five or ten years ago, if you opened an online bookstore, you were one of the first five people to do that, and you’d have it really easy to rank for buying books online. Whereas now, there are tons of companies that are offering these services. A lot of the traditional offline businesses are also active online, and it’s just going to be harder to compete. It doesn’t mean that it’s impossible, and there’s still a lot of room for these niche market areas where you’re targeting an audience that is too small for these bigger websites. I think there’s a lot of potential there, but it’s not the case that, from Google’s point of view, it’s going to become much harder. It’s just that the whole market is maturing, and it’s obviously going to be harder to rank number one for something where there are hundreds of other sites that are already competing for that slot.
Speaker 1: Ok, but at the end of the day, you have to know what you’re doing. That, I agree with, but you clarified this before, that SEO is here to stay. You still need it, right?
John Mueller: Yeah, yeah, definitely. There are lots of technical things, for example, that I would like to see as a part of SEO that essentially make sure that the website is accessible to search engines, and those are things that have to remain there. It’s not that you can take a photo of a brochure and put that online as an image file, and suddenly that will rank number one. You really have to know what it involves to put a website up to make sure that it’s crawlable, to make sure that it’s accessible by search engines, that they can read the text on there and index that content properly. And that’s something that’s definitely not going to go away. And all of the understanding of how the marketing side of search engines works, like what people are searching for, how you can create content that matches what they’re searching for, how you can create content that works well for users and works well for search engines. Those are things that aren’t going to go away. Just because there are more people active online doesn’t mean that it’s going to become impossible to do that or unnecessary to do that. There’s still a lot of craftsmanship involved in making a website that works well in search.
Speaker: Of course. Yeah, Ok.
Speaker 3: I actually think that it’s going to get easier once Google actually deals with the churn and burn issues.
Speaker 1: No, it’s just that writing an article like that is totally…I just kind of agreed with it and didn’t agree, but …
Speaker 3: Yeah, I think it’s going to get a lot easier once the churn and burn sites are gone. When Google deals with those spam issues, there’s going to be a lot of spaces in the top 10 that are available. Right now in my niche, the number one site, it’s now gone from number two to number one, and they’re a complete spam site, churn and burn, …..things will change, and it’ll be a much better place. And it won’t be more difficult, it’ll be much easier.
Speaker 2: Hey Gary, Sandeep had a question. I don’t know if John still has time, but Sandeep has been trying to ask his question for the last 45 minutes. John, do you have time for it?
John Mueller: Sure.
Recovering from Penalty
Q: All right. “Content Penalty on our sub-domain”: “Our manual penalty was lifted, but our site’s traffic is still the same. It’s been almost six months, and not picking up.”
A: I’d have to take a quick look. From a really quick glance, I think what you’re looking at there is not that there’s any kind of manual action that’s holding you back anymore. It looks like that’s been resolved, but rather just our algorithms that are not so happy with the quality of your content overall. With regards to forums, that’s sometimes a tricky problem because people can be posting anything in your forums, so you probably need to take action as well and think about ways that you can make sure that the quality of the content on your forum is kind of the highest quality possible.
We’ve seen ways to figure out if an author is providing high quality content overall, and maybe noindexing content from new people that are joining your forums, maybe noindexing forum threads that just have a question and no answers yet. And making sure that the content we pick up for indexing is actually really high quality and the things that are completely new are blocked with a noindex, so that doesn’t get picked up from one day to the next.
That’s something you really need to work on in your forum, and think about ways that you can make sure it works well overall, that the high quality content is indexable and other content is blocked from indexing until you’ve had a chance to vet and process it to make sure it’s really high quality. But that is tricky to do on a forum, and it takes a while to figure out which method works for your forum. Some methods might work for one type of forum, other methods work for a different type of forum. In some cases, if the forum is just filled with chatter, it might make sense to noindex the majority of the content there and focus on what you think is really high quality and have that indexed instead.
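One possible way to apply the noindex idea described above is to decide per thread which robots meta tag to emit. The sketch below is an assumption-laden illustration, not a recommended policy: the `ForumThread` fields, the `MIN_AUTHOR_POSTS` threshold, and the rule itself are invented for the example and would need tuning for a real forum.

```python
from dataclasses import dataclass

MIN_AUTHOR_POSTS = 10  # hypothetical cutoff for treating an author as established


@dataclass
class ForumThread:
    reply_count: int          # answers posted so far
    author_post_count: int    # how many posts the thread starter has made overall
    reviewed: bool = False    # set once a moderator has vetted the thread


def robots_meta_tag(thread: ForumThread) -> str:
    """Keep unanswered threads and threads from brand-new authors out of the index."""
    established = thread.author_post_count >= MIN_AUTHOR_POSTS
    indexable = thread.reviewed or (thread.reply_count > 0 and established)
    content = "index, follow" if indexable else "noindex"
    return f'<meta name="robots" content="{content}">'


print(robots_meta_tag(ForumThread(reply_count=0, author_post_count=50)))
print(robots_meta_tag(ForumThread(reply_count=3, author_post_count=50)))
```

The `reviewed` flag covers the case of content that stays blocked until someone has had a chance to vet it.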
Speaker 1: All right.
John Mueller: All right.
Speaker: We should go to a two hours special.
John Mueller: Oh my God, no.
Speaker 1: (laughs). All right.
John Mueller: Thank you all for coming. Thank you all for all of your questions. I set up another one in English for next week, which is a little bit later. So for those of you in the Pacific time zone, it might be a little bit easier to join in. And I wish you guys a great week.
Speaker 1: Bye.
John Mueller: Bye, everyone.
Speaker 2: Thanks very much, John. You’re a god-send for webmasters. Thank you, John.
John Mueller: Ok, bye.