<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Mozilla IT &#187; Infrastructure Notices</title>
	<atom:link href="http://blog.mozilla.com/it/category/infrastructure-notices/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.mozilla.com/it</link>
	<description>Mozilla IT &#38; Operations</description>
	<lastBuildDate>Thu, 26 Jan 2012 20:19:22 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Mozilla Scheduled Maintenance, Subversion (svn.mozilla.org) will be unavailable 01/28/2012 8pm-2am PST (0400 GMT)</title>
		<link>http://blog.mozilla.com/it/2012/01/26/mozilla-scheduled-maintenance-subversion-svn-mozilla-org-will-be-unavailable-01282012-8pm-2am-pdt/</link>
		<comments>http://blog.mozilla.com/it/2012/01/26/mozilla-scheduled-maintenance-subversion-svn-mozilla-org-will-be-unavailable-01282012-8pm-2am-pdt/#comments</comments>
		<pubDate>Thu, 26 Jan 2012 19:47:12 +0000</pubDate>
		<dc:creator>bhourigan</dc:creator>
				<category><![CDATA[General Updates]]></category>
		<category><![CDATA[Outages]]></category>
		<category><![CDATA[Scheduled Maintenance]]></category>
		<category><![CDATA[l10n]]></category>
		<category><![CDATA[phoenix]]></category>
		<category><![CDATA[subversion]]></category>
		<category><![CDATA[svn]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/it/?p=1684</guid>
		<description><![CDATA[We will have a scheduled maintenance window on Saturday, January 28th at 8pm-2am PST (0400 GMT). The following work will take place: Migrate from San Jose to Phoenix Implement fault tolerant infrastructure During the maintenance period subversion will be unavailable for both reading and writing. Because we&#8217;re switching to a newer version of subversion (and&#8230; <a class="more-link" href="http://blog.mozilla.com/it/2012/01/26/mozilla-scheduled-maintenance-subversion-svn-mozilla-org-will-be-unavailable-01282012-8pm-2am-pdt/" title="Read the rest of &#8220;Mozilla Scheduled Maintenance, Subversion (svn.mozilla.org) will be unavailable 01/28/2012 8pm-2am PST (0400 GMT)&#8221;">Read more</a>]]></description>
			<content:encoded><![CDATA[<p>We will have a scheduled maintenance window on <strong>Saturday, January 28th at 8pm-2am PST (0400 GMT)</strong>. The following work will take place:</p>
<ul>
<li>Migrate from San Jose to Phoenix</li>
<li>Implement fault tolerant infrastructure</li>
</ul>
<p>During the maintenance period Subversion will be unavailable for both reading and writing. Because we&#8217;re switching to a newer version of Subversion (and changing the data store to fsfs), the data migration will require a time-consuming svnadmin dump / svnadmin load. We anticipate this step alone will take about 4 hours.</p>
<p><strong>Time</strong>: January 28th 8pm-2am PST<br />
<strong>Scheduled downtime</strong>: 6 hours<br />
<strong>Estimated actual downtime</strong>: 4.5 hours<br />
<strong>Impact</strong>: All subversion related services will be unavailable <em>including</em> viewvc</p>
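<p>As a rough illustration of the dump/load step, the sketch below creates an fsfs repository and streams the old history into it, which is the scripted equivalent of <code>svnadmin dump OLD | svnadmin load NEW</code>. The repository paths and helper names are hypothetical, and this is not the exact procedure that will be run during the window.</p>

```python
import subprocess


def migration_commands(old_repo, new_repo):
    """Return the three svnadmin command lines for a dump/load migration."""
    return {
        "create": ["svnadmin", "create", "--fs-type", "fsfs", new_repo],
        "dump": ["svnadmin", "dump", "--quiet", old_repo],
        "load": ["svnadmin", "load", "--quiet", new_repo],
    }


def migrate(old_repo, new_repo):
    """Create an fsfs repository, then stream the old history into it."""
    cmds = migration_commands(old_repo, new_repo)
    subprocess.run(cmds["create"], check=True)
    # Equivalent to: svnadmin dump OLD | svnadmin load NEW
    dump = subprocess.Popen(cmds["dump"], stdout=subprocess.PIPE)
    load = subprocess.run(cmds["load"], stdin=dump.stdout)
    dump.stdout.close()
    if dump.wait() != 0 or load.returncode != 0:
        raise RuntimeError("svnadmin dump/load failed")
```

<p>Because the dump replays every revision, the run time grows with repository history, which is why this step dominates the window.</p>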
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/it/2012/01/26/mozilla-scheduled-maintenance-subversion-svn-mozilla-org-will-be-unavailable-01282012-8pm-2am-pdt/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>To Tree or Not to Tree</title>
		<link>http://blog.mozilla.com/it/2011/12/01/to-tree-or-not-to-tree/</link>
		<comments>http://blog.mozilla.com/it/2011/12/01/to-tree-or-not-to-tree/#comments</comments>
		<pubDate>Thu, 01 Dec 2011 11:59:40 +0000</pubDate>
		<dc:creator>dmoore</dc:creator>
				<category><![CDATA[Outages]]></category>
		<category><![CDATA[outage]]></category>
		<category><![CDATA[phx1]]></category>
		<category><![CDATA[spanning tree]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/it/?p=1590</guid>
		<description><![CDATA[This week, our Phoenix datacenter fell prey to a series of brief rolling outages which visibly impacted many of Mozilla&#8217;s public services. Generally speaking, our datacenter architectures are intentionally simple and spanning tree has served us well. However, as we have grown to meet demand, some of our more&#8230; venerable datacenters have become convoluted as&#8230; <a class="more-link" href="http://blog.mozilla.com/it/2011/12/01/to-tree-or-not-to-tree/" title="Read the rest of &#8220;To Tree or Not to Tree&#8221;">Read more</a>]]></description>
			<content:encoded><![CDATA[<p>This week, our Phoenix datacenter fell prey to a series of brief rolling outages which visibly impacted many of Mozilla&#8217;s public services.</p>
<p><a href="http://blog.mozilla.com/it/files/2011/12/core1_down.png"><img class="alignnone size-full wp-image-1592" title="I blame fox2mike" src="http://blog.mozilla.com/it/files/2011/12/core1_down.png" alt="I blame fox2mike" width="545" height="38" /></a></p>
<p>Generally speaking, our datacenter architectures are intentionally simple and <a href="http://en.wikipedia.org/wiki/Spanning_tree_protocol">spanning tree</a> has served us well. However, as we have grown to meet demand, some of our more&#8230; <em>venerable</em> datacenters have become convoluted as new applications are shoehorned into old infrastructure.</p>
<p>Two weeks ago, we brought <a title="This week in IT: We rack some stuff" href="http://blog.mozilla.com/it/2011/11/18/this-week-in-it-we-rack-some-stuff/">a new expansion</a> online in Phoenix. Little did we suspect this would be the straw which broke the camel&#8217;s back. Minor spanning tree events which had previously gone unnoticed quickly escalated into <strong>very</strong> noticeable spanning tree cascades. Frustratingly, outages would often resolve themselves before netops personnel could log in to diagnose them. Cell phones vibrated at odd hours of the night. Unkind words were spoken.</p>
<p>Ultimately, we traced the fragility to an oversight in our spanning tree design. Although Juniper is our vendor of choice, we do rely on Cisco&#8217;s 3120 blade switch for our HP c7000 chassis. This multi-vendor network creates interesting challenges. In this case, we discovered Juniper&#8217;s VSTP mode is not entirely compatible with Cisco&#8217;s rapid-pvst mode. In JUNOS versions prior to 10.3, VSTP is unable to fully converge with rapid-pvst. For more information, see <a href="http://kb.juniper.net/InfoCenter/index?page=content&amp;id=KB18291">Juniper KB 18291</a> <em>(Juniper support account required)</em>.</p>
<h2>What did we learn?</h2>
<ol>
<li>Be diligent about marking server trunk ports as spanning tree edge ports. Otherwise, these ports will generate topology changes when a server reboots.</li>
<li>There&#8217;s no such thing as too much logging. Logging of spanning tree events can alert you to unexpected topology changes (See #1).</li>
<li>Not all spanning tree protocols are created equal. Don&#8217;t blindly trust that spanning tree is doing the right thing.</li>
</ol>
<h2>How do we avoid this, moving forward?</h2>
<p>We&#8217;re taking great pains to eliminate spanning tree entirely from our newest datacenter, <a href="http://blog.mozilla.com/it/tag/project-scl3/">SCL3</a>. While we&#8217;re not quite ready to make the leap to a unified fabric architecture (such as Juniper&#8217;s QFabric or Cisco&#8217;s Nexus), modern multi-chassis technologies can still offer significant improvements. In our case, we&#8217;ll be deploying Juniper&#8217;s XRE line to enable virtual chassis support on our core EX8200 platform.</p>
<div id="attachment_1598" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.mozilla.com/it/files/2011/12/lbox-xre200-left.jpg"><img class="size-medium wp-image-1598" title="Juniper's XRE200" src="http://blog.mozilla.com/it/files/2011/12/lbox-xre200-left-300x99.jpg" alt="Juniper's XRE200" width="300" height="99" /></a><p class="wp-caption-text">Juniper&#39;s XRE200</p></div>
<p>With virtual chassis at every level (core, aggregation, access), we no longer depend on spanning tree for layer 2 redundancy. Instead, we will be able to rely on a <a href="http://en.wikipedia.org/wiki/Link_aggregation">link aggregation protocol</a> (such as LACP). This comes with several added benefits:</p>
<ul>
<li>Improved utilization and load balancing of redundant links</li>
<li>Faster convergence</li>
<li>Capacity for growth</li>
<li>Not spanning tree</li>
</ul>
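<p>To illustrate the first point, here is a minimal sketch of how a link aggregation bundle can spread traffic: each flow is hashed onto one member link, so all links carry traffic while packets within a flow stay in order. This is not Juniper&#8217;s actual hashing algorithm; the function name, interface names, and addresses are made up for the example.</p>

```python
import zlib


def pick_member_link(flow, links):
    """Map a flow (src, dst, sport, dport) onto one member link of a bundle."""
    key = "|".join(str(field) for field in flow).encode()
    # The same flow always hashes to the same link, so its packets stay in order.
    return links[zlib.crc32(key) % len(links)]


links = ["xe-0/0/0", "xe-1/0/0"]  # hypothetical member interfaces
flow = ("10.0.0.5", "10.0.1.9", 49152, 443)
print(pick_member_link(flow, links))

# If a member fails, the flow is rehashed over the surviving links.
print(pick_member_link(flow, links[:1]))
```

<p>Contrast this with spanning tree, where the redundant link is simply blocked until a failure occurs.</p>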
<p>Once this architecture is vetted in SCL3, retrofitting PHX1 with the XRE devices will become a top priority.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/it/2011/12/01/to-tree-or-not-to-tree/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>RFO: SCL1 outage Oct 16, 2011</title>
		<link>http://blog.mozilla.com/it/2011/10/20/rfo-scl1-outage-nov-16-2011/</link>
		<comments>http://blog.mozilla.com/it/2011/10/20/rfo-scl1-outage-nov-16-2011/#comments</comments>
		<pubDate>Fri, 21 Oct 2011 01:01:28 +0000</pubDate>
		<dc:creator>ravi</dc:creator>
				<category><![CDATA[Outages]]></category>
		<category><![CDATA[releng]]></category>
		<category><![CDATA[scl1]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/it/?p=1378</guid>
		<description><![CDATA[On October 13th at 1324 PST Nagios alerted the start of a network event affecting reachability to the SCL1 data center. SCL1 is configured with redundant internet links where a VPN traverses a redundant firewall at both ends. There is also a point-to-point (p2p) that connects directly to SJC1. The running configuration had the VPN&#8230; <a class="more-link" href="http://blog.mozilla.com/it/2011/10/20/rfo-scl1-outage-nov-16-2011/" title="Read the rest of &#8220;RFO: SCL1 outage Oct 16, 2011&#8221;">Read more</a>]]></description>
			<content:encoded><![CDATA[<p>On October 13th at 1324 PST Nagios alerted the start of a network event affecting reachability to the SCL1 data center. SCL1 is configured with redundant internet links where a VPN traverses a redundant firewall at both ends.  There is also a point-to-point (p2p) that connects directly to SJC1.</p>
<p>The running configuration had the VPN as the active path and the p2p disabled because of an ongoing issue (<a title="RESOLVED DUPLICATE - scl1 p2p packet loss appears to increase under load" href="https://bugzilla.mozilla.org/show_bug.cgi?id=680463">bug 680463</a>).</p>
<p>Because the p2p was disabled, there was no standby path when the VPN failed, and SCL1 and all of its services, primarily the release engineering and build infrastructure, suffered a complete outage.</p>
<p>Upon initial investigation, the VPN endpoints, fw1.scl1 and vpn1.sjc1, each showed the other sending an incorrect response while renegotiating the tunnel. Standard non-destructive troubleshooting failed to reestablish the tunnel.</p>
<p>In the normal course of troubleshooting, fw1.scl1 became unresponsive and on-site presence was required. Once fw1.scl1 was restored on site, traffic was shifted from the VPN to the p2p even though the p2p had not been confirmed fixed. Before traffic was moved, optics were reseated and fiber patches were cleaned as basic precautions.</p>
<p>A review of the available logs did not point to any specific reason why the VPN failed or why the methods used to recover it failed.</p>
<p>While traffic was being shifted to the p2p the VPN recovered on its own, but the decision was made to stay on the p2p and monitor it closely, being mindful of <a title="RESOLVED DUPLICATE - scl1 p2p packet loss appears to increase under load" href="https://bugzilla.mozilla.org/show_bug.cgi?id=680463">bug 680463</a>, which has since been resolved.</p>
<p>Netops is investigating configurations to improve link fault detection and automatic failover to the standby path, and will implement them at a later date.</p>
<p>Complete timeline:</p>
<p>13:24 Initial nagios alert.<br />
13:34 Netops is paged.<br />
13:56 Netops responds.<br />
14:05 Escalation to dmoore (page)<br />
14:13 Escalation to dmoore (phone call)<br />
14:15 Escalation to ravi (page)<br />
14:16 Escalation to ravi (page)<br />
14:16 Ravi responds<br />
14:48 fw1.scl1 becomes unresponsive<br />
16:17 Nagios alerts begin to clear</p>
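<p>For reference, the headline durations can be derived from the timeline above. A small sketch (the dictionary keys and helper name are chosen purely for illustration):</p>

```python
from datetime import datetime

timeline = {
    "initial_alert": "13:24",
    "netops_paged": "13:34",
    "netops_responds": "13:56",
    "alerts_clearing": "16:17",
}


def minutes_between(start, end):
    """Minutes elapsed between two same-day HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Initial alert to first clearing alerts.
print(minutes_between(timeline["initial_alert"], timeline["alerts_clearing"]))  # 173
# Page sent to first netops response.
print(minutes_between(timeline["netops_paged"], timeline["netops_responds"]))  # 22
```

<p>That is roughly 2 hours 53 minutes from the first alert until alerts began to clear, and 22 minutes between the page and the first response.</p>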
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/it/2011/10/20/rfo-scl1-outage-nov-16-2011/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mozilla Scheduled Maintenance 7/12 (tree closure), 14, 19/2011 1800 PDT (0100 UTC)</title>
		<link>http://blog.mozilla.com/it/2011/07/11/mozilla-scheduled-maintenance-712-tree-closure-14-192011-1800-pst-0100-utc/</link>
		<comments>http://blog.mozilla.com/it/2011/07/11/mozilla-scheduled-maintenance-712-tree-closure-14-192011-1800-pst-0100-utc/#comments</comments>
		<pubDate>Mon, 11 Jul 2011 07:03:58 +0000</pubDate>
		<dc:creator>ravi</dc:creator>
				<category><![CDATA[Infrastructure Notices]]></category>
		<category><![CDATA[RelEng]]></category>
		<category><![CDATA[Scheduled Maintenance]]></category>
		<category><![CDATA[phx1]]></category>
		<category><![CDATA[scl1]]></category>
		<category><![CDATA[sjc1]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/it/?p=1285</guid>
		<description><![CDATA[Network Operations will be conducting the following maintenance windows: Tuesday July 12 2011 1800 PST (0100 UTC) [Tree Closure] Duration: 3h All Firefox, Fennec &#38; TryServer trees will be closed during this window. The VPN concentrator in SJC1 will be reconfigured to accommodate point-to-point (p2p) links to PHX1 and SCL1.  During this window the p2p&#8230; <a class="more-link" href="http://blog.mozilla.com/it/2011/07/11/mozilla-scheduled-maintenance-712-tree-closure-14-192011-1800-pst-0100-utc/" title="Read the rest of &#8220;Mozilla Scheduled Maintenance 7/12 (tree closure), 14, 19/2011 1800 PST (0100 UTC)&#8221;">Read more</a>]]></description>
			<content:encoded><![CDATA[<p>Network Operations will be conducting the following maintenance windows:</p>
<p><strong>Tuesday July 12 2011 1800 PDT (0100 UTC) [Tree Closure] Duration: 3h</strong></p>
<p><span style="color: #ff6600;">All Firefox, Fennec &amp; TryServer trees will be closed during this window.</span></p>
<p>The VPN concentrator in SJC1 will be reconfigured to accommodate point-to-point (p2p) links to PHX1 and SCL1.  During this window the p2p to SCL1 will be enabled.</p>
<p>The expected client impact is to <a href="http://www.mozilla.com/en-US/mobile/sync/" target="_blank">Firefox Sync</a> users who change their password. During this time LDAP replication will be suspended and affected users may experience client issues until the replication chain is restored.</p>
<p><strong>Thursday July 14 2011 1800 PDT (0100 UTC)</strong> <strong>Duration: 3h</strong><br />
<strong>Tuesday July 19 2011 1800 PDT (0100 UTC) Duration: 3h</strong></p>
<p><span style="color: #ff6600;">No tree closure.</span></p>
<p>On each of these days the router OS will be upgraded on a core and border pair in PHX1. The upgrades include bug fixes that allow IPv6 and the p2p from SJC1 to be enabled.</p>
<p>There is no expected user facing outage during this work.</p>
<p>Please let me know if you have any reason why we should not proceed with any of the planned maintenance. As always, we aim to keep downtime to as little as possible, but unexpected complications can arise causing longer downtime periods than expected. All systems should be operational by the end of the maintenance window.</p>
<p>Feel free to comment directly if you see issues past the planned windows.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/it/2011/07/11/mozilla-scheduled-maintenance-712-tree-closure-14-192011-1800-pst-0100-utc/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mozilla Scheduled Maintenance (Tree Closure) – 5/31/2011, 6am PDT (1300 UTC 31 May 2011)</title>
		<link>http://blog.mozilla.com/it/2011/05/30/mozilla-scheduled-maintenance-tree-closure-%e2%80%93-5312011-6am-pdt-1300-utc-31-may-2011/</link>
		<comments>http://blog.mozilla.com/it/2011/05/30/mozilla-scheduled-maintenance-tree-closure-%e2%80%93-5312011-6am-pdt-1300-utc-31-may-2011/#comments</comments>
		<pubDate>Mon, 30 May 2011 16:22:19 +0000</pubDate>
		<dc:creator>amilewski</dc:creator>
				<category><![CDATA[Scheduled Maintenance]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/it/?p=1263</guid>
		<description><![CDATA[We will have a scheduled maintenance window Tuesday morning, May 31 from 6am to 10am PDT (1300-1700 UTC) All Firefox, Fennec &#38; TryServer trees will be closed during this window. 6:00am PDT (1300 UTC) Talos/Pageloader update. RelEng will be landing a changeset on the build/tools infra. This will require restarting all the masters, so we&#8217;ll&#8230; <a class="more-link" href="http://blog.mozilla.com/it/2011/05/30/mozilla-scheduled-maintenance-tree-closure-%e2%80%93-5312011-6am-pdt-1300-utc-31-may-2011/" title="Read the rest of &#8220;Mozilla Scheduled Maintenance (Tree Closure) – 5/31/2011, 6am PDT (1300 UTC 31 May 2011)&#8221;">Read more</a>]]></description>
			<content:encoded><![CDATA[<p>We will have a scheduled maintenance window <strong>Tuesday morning, May 31</strong> from 6am to 10am PDT (1300-1700 UTC)</p>
<p><span style="color: #ff6600;">All Firefox, Fennec &amp; TryServer trees will be closed during this window.</span></p>
<ul>
<li>6:00am PDT (1300 UTC) Talos/Pageloader update.<br />
RelEng will be landing a changeset on the build/tools infra. This will require restarting all the masters, so we&#8217;ll close the tree at 6am, let any running builds complete, then land the patch and restart. Total expected tree closure is 4 hours.</li>
<li>9:00am PDT (1600 UTC) SCL1 Switch reconfiguration.<br />
We will be enabling portfast on the SCL1 top-of-rack switches to improve remote management. Latency may spike on connections to buildslaves, but no loss of connectivity is expected.</li>
</ul>
<p>Please let me know if you have any reason why we should not proceed with this planned maintenance. As always, we aim to keep downtime to as little as possible, but unexpected complications can arise causing longer downtime periods than expected. All systems should be operational by the end of the maintenance window.</p>
<p>Feel free to comment directly if you see issues past the planned downtime.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/it/2011/05/30/mozilla-scheduled-maintenance-tree-closure-%e2%80%93-5312011-6am-pdt-1300-utc-31-may-2011/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mozilla Scheduled Maintenance (Tree Closure) – 5/9/2011, 6am PDT (1300 UTC 9 May 2011)</title>
		<link>http://blog.mozilla.com/it/2011/05/07/mozilla-scheduled-maintenance-tree-closure-%e2%80%93-592011-6am-pdt-1300-utc-9-may-2011/</link>
		<comments>http://blog.mozilla.com/it/2011/05/07/mozilla-scheduled-maintenance-tree-closure-%e2%80%93-592011-6am-pdt-1300-utc-9-may-2011/#comments</comments>
		<pubDate>Sat, 07 May 2011 15:43:35 +0000</pubDate>
		<dc:creator>amilewski</dc:creator>
				<category><![CDATA[Scheduled Maintenance]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/it/?p=1246</guid>
		<description><![CDATA[We will have a scheduled maintenance window Monday morning, May 9 from 6am to 7am PDT (1300-1400 UTC) All Firefox, Fennec &#38; TryServer trees will be closed during this window. 6:00am PDT (1300 UTC) Talos/Pageloader update. A new Talos/Pageloader bundle will be deployed. This is expected to cause a wobble in test results, so we&#8217;ll&#8230; <a class="more-link" href="http://blog.mozilla.com/it/2011/05/07/mozilla-scheduled-maintenance-tree-closure-%e2%80%93-592011-6am-pdt-1300-utc-9-may-2011/" title="Read the rest of &#8220;Mozilla Scheduled Maintenance (Tree Closure) – 5/9/2011, 6am PDT (1300 UTC 9 May 2011)&#8221;">Read more</a>]]></description>
			<content:encoded><![CDATA[<p>We will have a scheduled maintenance window <strong>Monday morning, May 9</strong> from 6am to 7am PDT (1300-1400 UTC)</p>
<p><span style="color: #ff0000;">All Firefox, Fennec &amp; TryServer trees will be closed during this window.</span></p>
<ul>
<li>6:00am PDT (1300 UTC) Talos/Pageloader update.<br />
A new Talos/Pageloader bundle will be deployed. This is expected to cause a wobble in test results, so we&#8217;ll be closing the tree to push out the update.</li>
</ul>
<p>Please let me know if you have any reason why we should not proceed with this planned maintenance. As always, we aim to keep downtime to as little as possible, but unexpected complications can arise causing longer downtime periods than expected. All systems should be operational by the end of the maintenance window.</p>
<p>Feel free to comment directly if you see issues past the planned downtime.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/it/2011/05/07/mozilla-scheduled-maintenance-tree-closure-%e2%80%93-592011-6am-pdt-1300-utc-9-may-2011/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mozilla Scheduled Maintenance – 03/09/2011, 4pm PST (03/09/2011, 0000 UTC)</title>
		<link>http://blog.mozilla.com/it/2011/03/08/mozilla-scheduled-maintenance-%e2%80%93-03092011-4pm-pst-03092011-0000-utc/</link>
		<comments>http://blog.mozilla.com/it/2011/03/08/mozilla-scheduled-maintenance-%e2%80%93-03092011-4pm-pst-03092011-0000-utc/#comments</comments>
		<pubDate>Wed, 09 Mar 2011 05:39:40 +0000</pubDate>
		<dc:creator>amilewski</dc:creator>
				<category><![CDATA[Scheduled Maintenance]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/it/?p=1229</guid>
		<description><![CDATA[We will have a scheduled maintenance window tomorrow night from 4:00pm to 5:00pm PST. The following changes will take place: 4:00pm PST (0000 UTC) support.mozilla.com. We will be switching over from the San Jose NFS cluster to the Phoenix NFS cluster (bug 639844). Duration 1 hour. Please let me know if you have any reason&#8230; <a class="more-link" href="http://blog.mozilla.com/it/2011/03/08/mozilla-scheduled-maintenance-%e2%80%93-03092011-4pm-pst-03092011-0000-utc/" title="Read the rest of &#8220;Mozilla Scheduled Maintenance – 03/09/2011, 4pm PST (03/09/2011, 0000 UTC)&#8221;">Read more</a>]]></description>
			<content:encoded><![CDATA[<p>We will have a scheduled maintenance window <b>tomorrow night</b> from 4:00pm to 5:00pm PST. The following changes will take place:</p>
<ul>
<li>4:00pm PST (0000 UTC) <a href="http://support.mozilla.com/"><code>support.mozilla.com</code></a>. We will be switching over from the San Jose NFS cluster to the Phoenix NFS cluster (bug <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=639844">639844</a>). <em>Duration 1 hour.</em></li>
</ul>
<p>Please let me know if you have any reason why we should not proceed with this planned maintenance. As always, we aim to keep downtime to as little as possible, but unexpected complications can arise causing longer downtime periods than expected. All systems should be operational by the end of the maintenance window.</p>
<p>Feel free to comment directly if you see issues past the planned downtime.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/it/2011/03/08/mozilla-scheduled-maintenance-%e2%80%93-03092011-4pm-pst-03092011-0000-utc/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mozilla Network Outage Report (Phoenix) – 03/08/2011, 5:00am PST – 11:30am PST</title>
		<link>http://blog.mozilla.com/it/2011/03/08/mozilla-network-outage-report-phoenix-%e2%80%93-03082011-500am-pst-%e2%80%93-1130am-pst/</link>
		<comments>http://blog.mozilla.com/it/2011/03/08/mozilla-network-outage-report-phoenix-%e2%80%93-03082011-500am-pst-%e2%80%93-1130am-pst/#comments</comments>
		<pubDate>Tue, 08 Mar 2011 22:06:19 +0000</pubDate>
		<dc:creator>mrz</dc:creator>
				<category><![CDATA[Outages]]></category>
		<category><![CDATA[juniper]]></category>
		<category><![CDATA[phx1]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/it/?p=1215</guid>
		<description><![CDATA[For several hours this morning, Mozilla&#8217;s Phoenix data center suffered several intermittent outages. This was fall out from yesterday&#8217;s Juniper SRX JunOS upgrade. The following sites/services may have experienced degraded performance or partial/full outages: Firefox Sync Socorro (crash-stats.mozilla.com &#038; crash-reports.mozilla.com) input.mozilla.com pulse.mozilla.org firefoxlive.mozilla.org demos.mozilla.org www.mozillademos.org www.drumbeat.org Background: There were two separate issues that we encountered,&#8230; <a class="more-link" href="http://blog.mozilla.com/it/2011/03/08/mozilla-network-outage-report-phoenix-%e2%80%93-03082011-500am-pst-%e2%80%93-1130am-pst/" title="Read the rest of &#8220;Mozilla Network Outage Report (Phoenix) – 03/08/2011, 5:00am PST – 11:30am PST&#8221;">Read more</a>]]></description>
			<content:encoded><![CDATA[<p>For several hours this morning, Mozilla&#8217;s Phoenix data center suffered several intermittent outages.  This was fall out from <a href="http://blog.mozilla.com/it/2011/03/06/mozilla-scheduled-maintenance-%E2%80%93-03072011-6pm-pst-03082011-0100-utc/">yesterday&#8217;s Juniper SRX JunOS upgrade</a>.</p>
<p>The following sites/services may have experienced degraded performance or partial/full outages:</p>
<ul>
<li>Firefox Sync</li>
<li>Socorro (<code>crash-stats.mozilla.com</code> &#038; <code>crash-reports.mozilla.com</code>)</li>
<li><code>input.mozilla.com</code></li>
<li><code>pulse.mozilla.org</code></li>
<li><code>firefoxlive.mozilla.org</code></li>
<li><code>demos.mozilla.org</code></li>
<li><code>www.mozillademos.org</code></li>
<li><code>www.drumbeat.org</code></li>
</ul>
<p><b>Background:</b><br />
There were two separate issues that we encountered, both tracked in  bug <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=639745">639745</a>.</p>
<ol>
<li>DHCP relay failures. This is a regression in the JunOS code.
<p>Just before 10:00pm Monday night, multiple hosts in Phoenix began to lose their DHCP leases and drop offline. For reasons not yet understood, the DHCP relay feature was no longer operational.</p>
<p>This caused an 8 minute outage for <code>support.mozilla.com</code>.</p></li>
<li>High CPU load. We began experiencing high (maximum) CPU usage on multiple FPCs after upgrading from 10.1 to 10.4R2. This did not have any immediate impact and we opted to continue working overnight with JTAC on a resolution.
<p>This morning as general load increased, this became a service-impacting issue. Netops downgraded to 10.3R2.11 and eventually to 10.2S7 to resolve these issues.</p></li></ol>
<p>We apologize for any inconvenience this may have caused and will continue to work with Juniper to understand why this failed and to find a long-term remedy.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/it/2011/03/08/mozilla-network-outage-report-phoenix-%e2%80%93-03082011-500am-pst-%e2%80%93-1130am-pst/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mozilla Scheduled Maintenance – 03/07/2011, 6pm PST (03/08/2011, 0100 UTC)</title>
		<link>http://blog.mozilla.com/it/2011/03/06/mozilla-scheduled-maintenance-%e2%80%93-03072011-6pm-pst-03082011-0100-utc/</link>
		<comments>http://blog.mozilla.com/it/2011/03/06/mozilla-scheduled-maintenance-%e2%80%93-03072011-6pm-pst-03082011-0100-utc/#comments</comments>
		<pubDate>Mon, 07 Mar 2011 05:38:03 +0000</pubDate>
		<dc:creator>mrz</dc:creator>
				<category><![CDATA[Scheduled Maintenance]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/it/?p=1205</guid>
		<description><![CDATA[We will have an off-schedule maintenance window tomorrow night from 6:00pm to 9:00pm PST. The following changes will take place: 6:00pm PST (0100 UTC) Phoenix Firewall upgrades. We&#8217;ll be picking up vendor-recommended software upgrades. The nature of this upgrade requires redundancy to be disabled and as such, all sites will experience a 10-15 second outage.&#8230; <a class="more-link" href="http://blog.mozilla.com/it/2011/03/06/mozilla-scheduled-maintenance-%e2%80%93-03072011-6pm-pst-03082011-0100-utc/" title="Read the rest of &#8220;Mozilla Scheduled Maintenance – 03/07/2011, 6pm PST (03/08/2011, 0100 UTC)&#8221;">Read more</a>]]></description>
			<content:encoded><![CDATA[<p>We will have an off-schedule maintenance window <strong>tomorrow night</strong> from 6:00pm to 9:00pm PST. The following changes will take place:</p>
<ul>
<li>6:00pm PST (0100 UTC) Phoenix Firewall upgrades. We&#8217;ll be picking up vendor-recommended software upgrades. The nature of this upgrade requires redundancy to be disabled and as such, all sites will experience a 10-15 second outage. While the actual user-facing outage is less than a minute, the entire window will be <em>2 hours.</em>
<p>The following services will be impacted during this upgrade:</p>
<ul>
<li>Firefox Sync</li>
<li>Socorro (<code>crash-stats.mozilla.com</code> &#038; <code>crash-reports.mozilla.com</code>)</li>
<li><code>input.mozilla.com</code></li>
<li><code>pulse.mozilla.org</code></li>
<li><code>firefoxlive.mozilla.org</code></li>
<li><code>demos.mozilla.org</code></li>
<li><code>www.mozillademos.org</code></li>
<li><code>www.drumbeat.org</code></li>
</ul>
</li>
</ul>
<p>Please let me know if you have any reason why we should not proceed with this planned maintenance. As always, we aim to keep downtime to as little as possible, but unexpected complications can arise causing longer downtime periods than expected. All systems should be operational by the end of the maintenance window.</p>
<p>Feel free to comment directly if you see issues past the planned downtime.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/it/2011/03/06/mozilla-scheduled-maintenance-%e2%80%93-03072011-6pm-pst-03082011-0100-utc/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mozilla Scheduled Maintenance (Tree Closure) – 2/18/2011, 7am PST (1500 UTC 18 Feb 2011)</title>
		<link>http://blog.mozilla.com/it/2011/02/17/mozilla-scheduled-maintenance-tree-closure-%e2%80%93-2182011-7am-pst-1500-utc-18-feb-2011/</link>
		<comments>http://blog.mozilla.com/it/2011/02/17/mozilla-scheduled-maintenance-tree-closure-%e2%80%93-2182011-7am-pst-1500-utc-18-feb-2011/#comments</comments>
		<pubDate>Fri, 18 Feb 2011 00:15:59 +0000</pubDate>
		<dc:creator>amilewski</dc:creator>
				<category><![CDATA[Scheduled Maintenance]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/it/?p=1198</guid>
		<description><![CDATA[We will have a scheduled maintenance window tomorrow (Friday) morning from 7am to 8am PST (1500-1600 UTC) All Firefox, Fennec &#38; TryServer trees will be closed during this window. 7:00am PST (1500 UTC) Network hardware reconfiguration. Edge switches in one of our datacenters need reconfiguration. This will close all trees, as Talos machines will experience&#8230; <a class="more-link" href="http://blog.mozilla.com/it/2011/02/17/mozilla-scheduled-maintenance-tree-closure-%e2%80%93-2182011-7am-pst-1500-utc-18-feb-2011/" title="Read the rest of &#8220;Mozilla Scheduled Maintenance (Tree Closure) – 2/18/2011, 7am PST (1500 UTC 18 Feb 2011)&#8221;">Read more</a>]]></description>
			<content:encoded><![CDATA[<p>We will have a scheduled maintenance window <strong>tomorrow (Friday) morning</strong> from 7am to 8am PST (1500-1600 UTC)</p>
<p><span style="color: #ff0000;">All Firefox, Fennec &amp; TryServer trees will be closed during this window.</span></p>
<ul>
<li>7:00am PST (1500 UTC) Network hardware reconfiguration.<br />
Edge switches in one of our datacenters need reconfiguration. This will close all trees, as Talos machines will experience intermittent connectivity during the change. Apologies for the short notice; we want to get this done during a gap in the beta build cycle.</li>
</ul>
<p>Please let me know if you have any reason why we should not proceed with this planned maintenance. As always, we aim to keep downtime to as little as possible, but unexpected complications can arise causing longer downtime periods than expected. All systems should be operational by the end of the maintenance window.</p>
<p>Feel free to comment directly if you see issues past the planned downtime.</p>
<p><strong>UPDATE: 08:10 PST</strong><br />
All maintenance completed successfully, and the trees are reopened.</p>
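<p>The local-to-UTC figures in notices like this one are easy to get wrong, so here is a small sketch for checking them. It uses fixed offsets (PST is UTC-8, PDT is UTC-7) rather than a time zone database; the helper name is made up for the example.</p>

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets: PST is UTC-8 and PDT is UTC-7.
PST = timezone(timedelta(hours=-8), "PST")
PDT = timezone(timedelta(hours=-7), "PDT")


def to_utc(year, month, day, hour, minute, zone):
    """Convert a local wall-clock time in the given zone to UTC."""
    local = datetime(year, month, day, hour, minute, tzinfo=zone)
    return local.astimezone(timezone.utc)


# This window: 7am to 8am PST on February 18, 2011.
start = to_utc(2011, 2, 18, 7, 0, PST)
end = to_utc(2011, 2, 18, 8, 0, PST)
print(start.strftime("%H%M UTC"), end.strftime("%H%M UTC"))  # 1500 UTC 1600 UTC
```

<p>The same helper with <code>PDT</code> covers windows announced during daylight saving time.</p>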
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/it/2011/02/17/mozilla-scheduled-maintenance-tree-closure-%e2%80%93-2182011-7am-pst-1500-utc-18-feb-2011/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

