RuWeb.net Forum

NAC connectivity problems
support - 13.2.2003 at 21:27

Here are all the explanations I received from the administrator of server #2 in the USA about this month's server problems.

7.02.03
===
friends,

this announcement will summarize the events which transpired over the last few days.

31.01.03: the migration of the network segment to a new core level3 gige switch started. this was done in order to increase
transfer capacity, which would ensure consistent performance even at peak times. the switch had been tested for 24 hours
on a 'dummy' network prior to installation, constantly pushing over 300mbps.

1.02.03: NAC and the surrounding area suffered a power outage. the backup generator could not handle the capacity required
on some network segments, including ours, so when the UPS power ran out, the servers went offline. some of the switches that
were reprogrammed friday reverted to the old configuration (the new configuration did not serialize in time). as a result,
some servers were inaccessible even after the power outage had been resolved.

2.02.03: gige level3 routing was enabled again. shortly thereafter, a large ddos attack hit NAC resulting in severe packet
loss. the originating ips were blackholed and the destination ips were null routed as well.

3.02.03: gige appeared unstable and the network reverted to the old configuration. later that day, gige was brought live
again after technicians narrowed the problem down to a bug in cisco's IOS software, which controls the switch. the switch
has been performing consistently and has remained stable since then.

4.02.03: a smurf ddos attack hit NAC, but it was dealt with and connectivity was restored quickly.


What is being done:

+ additional generators are being put in to ensure that there is always enough backup power for the whole colocation facility.

+ additional steps to increase the NOC technicians' ability to deal with ddos attacks are being taken.

while ddos attacks cannot be prevented, all necessary measures are being taken to ensure that their effect on the network is minimized, should they recur.

while we are extremely disappointed that these incidents occurred, the work that NAC technicians have put in to address the problems and resolve them expeditiously was very impressive. we are confident that they will continue handling any problems, should they arise, as well as or better than ever, ensuring the quality of service we have promised to you, our valued clients.
===

11.02.03
===
our machines will be transferred to a new power grid within 48 hours to take advantage of the new megawatt backup power
generator. this will ensure that we will have uninterrupted power supply to the servers even in case of a power outage such
as the one last week. expected downtime 5-10 minutes.
===

13.02.03
===
i am outraged. when moving the server, NAC did not make sure that it booted properly. they pinged it (it responded), but it had hung on fsck. they are working on it now.
===


support - 13.2.2003 at 21:33

In short, the cause of today's outage of more than 6 hours was that the data center staff, after connecting the server to the new power source, did not check whether it had booted completely. The server had hung and was not working, even though it responded to ping.
That's how it goes... :(
We are going to complain. :-E
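
For anyone who wants to watch for this kind of failure themselves: a machine stuck at fsck during boot can still answer ping, so a check has to reach an actual service. Below is a minimal sketch in Python, assuming the services of interest listen on TCP. The function names, the timeout, and any host or port you pass in are illustrative assumptions, not something the data center or RuWeb uses.

```python
import socket


def service_is_up(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Unlike ping, this only succeeds when a process on the server is
    actually accepting connections on that port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host, ports, timeout=3.0):
    """Check several ports on one host; returns a {port: bool} mapping."""
    return {port: service_is_up(host, port, timeout) for port in ports}
```

Calling check_services with the server's hostname and, for example, the web and mail ports would report each port separately, so a server that pings but never finished booting shows up with every service marked as down.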


Anonymous - 13.2.2003 at 22:15

That's bad, of course....


rusko - 14.2.2003 at 07:49

completely agree, this is unacceptable. unfortunately, even though the datacenter staff has always worked at a highly professional level, today the guys slipped up. about 80 servers were being moved to the new generator, and apparently they lost track because of the volume.

we apologize to all ruweb clients who were affected by this.

pavel [rusko eps]


Воплик - 14.4.2003 at 08:22

I have a question.. What happened with the server last Friday, the 11th? It was down from roughly 12-13 until five, Moscow time..? =)


support - 14.4.2003 at 09:43

The power supply unit failed...