Advice on catching S3 upload slowness

I have an application in which admins can create photo albums, and these albums usually have more than a hundred photos. CarrierWave + uploadfy.swf are used to handle the files and send them to S3 (located in São Paulo/Brazil). The app runs on Linode in Atlanta. All the admins are in Brazil, and lately uploads have been very laggy. I know each file has to travel a very long way: it leaves my computer in Brazil, goes to Atlanta, and then the server sends it back to Brazil. I’ve talked to the guys at Linode and they say the problem is with a node on Amazon’s network and that they can do nothing about it.

When I run things locally, everything’s very fast, as expected. I’d like to know how I can debug on the production server to catch this lag, or whether I should just move the server and the files closer together.

Thanks

Have you tried running traceroute between your production machine and S3?
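If you want a quick number alongside the traceroute, here is a minimal sketch that times how long it takes to open a TCP connection from the server to an endpoint. It is not a traceroute and says nothing about individual hops; it only gives a rough round-trip figure you can compare between your server and your own machine. The helper names `connect_time` and `connect_stats` are made up for illustration, and the endpoint in the commented example is an assumption, so substitute whatever host your bucket actually uses.

```ruby
require "socket"
require "benchmark"

# Hypothetical helper: seconds needed to open a TCP connection to host:port.
# Raises if the connection fails or the timeout expires.
def connect_time(host, port, timeout: 5)
  Benchmark.realtime do
    # The empty block makes Socket.tcp close the socket right after connecting.
    Socket.tcp(host, port, connect_timeout: timeout) { }
  end
end

# Hypothetical helper: take several samples and summarize them in milliseconds.
# For an even sample count, :median is the upper of the two middle samples.
def connect_stats(host, port, samples: 5)
  times = Array.new(samples) { connect_time(host, port) * 1000.0 }.sort
  { min: times.first, median: times[times.size / 2], max: times.last }
end

# Example call (assumed endpoint, replace with your bucket's host):
# p connect_stats("s3.sa-east-1.amazonaws.com", 443)
```

Running the same call from the production server and from a machine in Brazil should show whether the server-to-S3 leg is noticeably slower than the rest.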

Are you absolutely positive the slowdown is coming in the part of the response that is uploading the file to S3? Is it possible the slowness is coming in generating whatever response comes after you have finished uploading? Or something before you upload?

Instrumentation will help here. I’m not familiar with CarrierWave, but I’d start by examining your logs to see if they contain enough data to prove where the slowness is happening. If they do not, you’ll want to add more logging so you can pinpoint exactly what the problem is.
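As a rough sketch of what that extra logging could look like: wrap each phase of the request in a timer and log the elapsed time, so the logs show whether the time goes to the S3 upload or to something before or after it. The `timed` helper below is hypothetical and not part of CarrierWave, and the `sleep` calls only stand in for the real work.

```ruby
require "benchmark"
require "logger"

# Hypothetical helper (not part of CarrierWave): runs the block, logs how long
# it took under the given label, and returns the block's result.
def timed(label, logger)
  result = nil
  elapsed = Benchmark.realtime { result = yield }
  logger.info(format("%s took %.3fs", label, elapsed))
  result
end

logger = Logger.new($stdout)

# Stand-ins for the real steps; sleep only simulates work.
timed("before upload", logger) { sleep 0.01 }
timed("upload to S3", logger) { sleep 0.02 }
timed("build response", logger) { sleep 0.01 }
```

In the real app you would pass your application's logger and wrap whichever call actually triggers the upload, plus the code around it. If "upload to S3" dominates, the network theory holds; if not, the problem is elsewhere.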

@benorenstein @derekprior a guy from Linode’s support asked me to run an MTR report from my Linode to the S3 bucket, and when I sent him the report he told me:

According the MTR report, much of the latency is occurring within the Amazon infrastructure as indicated by the time it is taking for the packets to move from one hop to the other within their infrastructure. You may want to reach out to them to determine if they are having any issues within their network. Since this is outside of our network infrastructure, there is limited we can do to address the issue.

And since I know nothing about what the numbers in the report mean, I just took his word for it. I haven’t reached out to Amazon because I don’t pay for that kind of support, so I have to rely on their forums. So far, nobody has helped.

So @derekprior, that’s why I think it has to do with the network: when I run locally it’s fast, and my machine is a 1.7 GHz MacBook Air against the 6-core on my server.

But since I only suspect that’s the problem and haven’t figured out a way to confirm or solve it, I’m stuck.