Minimizing traffic on a server for a web app

A place to discuss the implementation and style of computer programs.

Moderators: phlip, Moderators General, Prelates

addiction
Posts: 27
Joined: Wed Dec 02, 2009 6:22 pm UTC

Minimizing traffic on a server for a web app

Postby addiction » Mon Nov 06, 2017 9:24 pm UTC

Hello all. I'm currently in the process of drafting a web app for my job. The use case for this tool is essentially:

"I have this ID value. I don't know what this value is, because we have a bunch of different systems on a bunch of different servers. Where does this ID show up, and how is it known locally?"

Basically the user treats my tool like Google. They enter their string in, and it returns as much relevant data as it can.

Now, unless someone knows a neat little trick I can use, I have the basics of coding this thing in mind: build it in ASP.NET and C#, and execute SQL against all of our database environments to see if I can get a match.

My problem: I have to assume that every single employee in this company is going to be using this tool simultaneously. This means roughly 300,000 (ish?) people are going to be querying one server at the same time in order to run this query.

Now the querying part is easy. I can do the equivalent of SQL Server's "NOLOCK" on each of the queries to prevent deadlocks. But I don't want to break a server or something due to traffic. I have the necessary C# to basically:
1. Open connection
2. Immediately run the sql
3. Immediately close the connection upon success
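A minimal ADO.NET sketch of those three steps (connection string, table, and column names are placeholders; the NOLOCK hint is included as described above):

```csharp
using System.Data.SqlClient;

// Sketch only: checks whether an ID exists, using placeholder names.
bool IdExists(string connectionString, string id)
{
    const string sql =
        "SELECT 1 FROM dbo.SomeTable WITH (NOLOCK) WHERE SomeField = @id";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        cmd.Parameters.AddWithValue("@id", id);
        conn.Open();                         // 1. open the connection
        object result = cmd.ExecuteScalar(); // 2. run the SQL immediately
        return result != null;               // null means no matching row
    }                                        // 3. Dispose() closes the connection
}
```

With ADO.NET's default pooling, that final close returns the connection to a pool rather than tearing it down, so the open/query/close pattern is cheaper than it looks.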

I think I can also figure out how to time out the connection, but that will take some research. Does anyone have any advice/tips on how to minimize the amount of traffic this tool may add? For those of you that can help, thank you! But please keep this in mind:

1. I'm inexperienced with this. My main skills involve SQL Server. I have the basics to build this, just not the networking experience to keep things from breaking.
2. As previously stated, this will be built in C#, ASP.NET, CSS (not sure if that matters...), and whatever SQL platform is being used. It would be best if I could keep it with just these languages.
3. I'm just a developer, so I don't have access to any of these servers to make modifications to handle heavy traffic. My manager might be able to make a strong case for me so that the respective admins will modify things, but we should probably assume that they won't budge on it.

Sorry for the long post, and thank you for your time!

Thesh
Made to Fuck Dinosaurs
Posts: 5497
Joined: Tue Jan 12, 2010 1:55 am UTC
Location: Colorado

Re: Minimizing traffic on a server for a web app

Postby Thesh » Tue Nov 07, 2017 12:01 am UTC

I mean, create a covering index on just the columns queried, and as long as your database server is more powerful than your web server I wouldn't worry about it. Your web server will use connection pooling by default, which means each request costs just a single extra RPC (sp_reset_connection) as long as your connection string doesn't change. More than a couple thousand requests per second is probably unrealistic for an average web server anyway.
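To illustrate that pooling behavior (a sketch with a placeholder connection string): repeated open/dispose cycles with an identical connection string reuse one physical connection.

```csharp
using System.Data.SqlClient;

// Sketch: two sequential "requests" with the same connection string.
// The second Open() takes the already-established connection from the
// pool; only sp_reset_connection runs, not a new TCP/login handshake.
void TwoRequests(string connectionString) // placeholder connection string
{
    for (int i = 0; i < 2; i++)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            // ... run the lookup query here ...
        } // Dispose() returns the connection to the pool, still open
    }
}
```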
Honesty replaced by greed, they gave us the reason to fight and bleed
They try to torch our faith and hope, spit at our presence and detest our goals

Tub
Posts: 311
Joined: Wed Jul 27, 2011 3:13 pm UTC

Re: Minimizing traffic on a server for a web app

Postby Tub » Tue Nov 07, 2017 7:51 am UTC

Closing the DB connection after success is unwise. Reopening a connection is much more costly than just having an idle connection sitting around. Read up on the connection pool of your language of choice, and use that.

You might ask your DBAs to issue you a read-only user with proper load limits and/or connection limits. Make it their problem(tm). Your critical DB servers should have load monitoring infrastructure in place, so someone is bound to notice load increases before it kills the server.

Other than that, consider implementing a cache. If someone searches for id=42, remember the result, so when another person searches for id=42 you don't need to query the databases. This is only useful if you expect multiple people to search for the same ids, and if the databases change infrequently enough that old results are acceptable.
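A minimal sketch of such a cache in C# (names and the TTL policy are illustrative, not from the thread):

```csharp
using System;
using System.Collections.Concurrent;

// Illustrative cache: remembers lookup results for a fixed TTL so that
// repeated searches for the same id skip the database round-trip.
class LookupCache
{
    private readonly ConcurrentDictionary<string, (bool Found, DateTime At)> _entries
        = new ConcurrentDictionary<string, (bool Found, DateTime At)>();
    private readonly TimeSpan _ttl;

    public LookupCache(TimeSpan ttl) { _ttl = ttl; }

    public bool Lookup(string id, Func<string, bool> queryDatabase)
    {
        if (_entries.TryGetValue(id, out var e) && DateTime.UtcNow - e.At < _ttl)
            return e.Found;                      // cache hit: no DB query

        bool found = queryDatabase(id);          // cache miss: ask the database
        _entries[id] = (found, DateTime.UtcNow); // remember for the next caller
        return found;
    }
}
```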

Also, stop worrying. If your tool ends up being mission-critical for 300,000 people, then you will surely be given the resources to make the DB servers handle the load.

Okita
Staying Alive
Posts: 3071
Joined: Fri Aug 10, 2007 7:51 pm UTC
Location: Finance land.

Re: Minimizing traffic on a server for a web app

Postby Okita » Wed Nov 08, 2017 4:44 am UTC

How fresh does your data need to be? Are we talking minute fresh or second fresh or "as fresh as I can get it"?

A solution that comes to mind is MOAR Apps. Specifically each app has a connection to the db and only knows how to return read_only data while using one connection string. It then passes the info to the requester (being your main webapp) in say, json format. Your main webapp knows to consume and put that info together.

So you have 300k people hitting one web server that's then making requests to a bunch of other smaller web servers. This means each server that's reading on a db environment only needs 1 connection string per server. This will allow you to really handle any scaling problems without making 300k connections. For example, scale up additional instances for db envs that are hit a lot. Or better yet, ask if they can make a read_only db replication that you can connect to.

Your SQL Server knowledge is probably better than mine, but I'm not sure how NOLOCK helps you here. Won't you potentially be reading uncommitted dirty data? And why do you need to prevent deadlocks? I would hope those DBs are on Read Uncommitted?
"I may or may not be a raptor. There is no way of knowing until entering a box that I happen to be in and then letting me sunder the delicious human flesh from your body in reptile fury."

addiction
Posts: 27
Joined: Wed Dec 02, 2009 6:22 pm UTC

Re: Minimizing traffic on a server for a web app

Postby addiction » Wed Nov 08, 2017 3:43 pm UTC

Thanks for the input everyone. It's something for me to research and discuss with the DBA team. I'll start with trying to maximize how much I can do with one connection.

Okita wrote:How fresh does your data need to be? Are we talking minute fresh or second fresh or "as fresh as I can get it"?

A solution that comes to mind is MOAR Apps. Specifically each app has a connection to the db and only knows how to return read_only data while using one connection string. It then passes the info to the requester (being your main webapp) in say, json format. Your main webapp knows to consume and put that info together.

So you have 300k people hitting one web server that's then making requests to a bunch of other smaller web servers. This means each server that's reading on a db environment only needs 1 connection string per server. This will allow you to really handle any scaling problems without making 300k connections. For example, scale up additional instances for db envs that are hit a lot. Or better yet, ask if they can make a read_only db replication that you can connect to.

Your SQL Server knowledge is probably better than mine, but I'm not sure how NOLOCK helps you here. Won't you potentially be reading uncommitted dirty data? And why do you need to prevent deadlocks? I would hope those DBs are on Read Uncommitted?



I think I need to better clarify something: I won't actually be reading any data. Rather, I'll be doing a quick query just to see if the ID exists in a given table. And by finding a match, I can populate everything else myself and keep that work local. So for example, I'd run this:

Code:

if exists(select distinct 1
             from table
             where field = @variable)
begin
    select 1
end
else
begin
    select 0
end


I then load this into a variable and populate the data that I need. So "dirty" data doesn't really apply, because I only care whether it exists. And if someone is searching for this value, they have it, so it exists in the data; or what they have isn't real, or it's a typo or something. So the NOLOCK helps just so I don't have to worry about locking the table and stopping the process, because it doesn't matter if something is being updated. It just matters that my value exists.

I'll ask about the various servers being re-created as snapshots. Or at least relevant tables being created as snapshots. That might help too. My manager is under the impression that this tool would be SUPER helpful to a lot of people, so I might be able to request more than I thought.

Thank you for the input, everyone! I'll look into what you've advised me on.

Thesh
Made to Fuck Dinosaurs
Posts: 5497
Joined: Tue Jan 12, 2010 1:55 am UTC
Location: Colorado

Re: Minimizing traffic on a server for a web app

Postby Thesh » Wed Nov 08, 2017 3:53 pm UTC

That's a lot of unnecessary logic in SQL. Why not just "select 1 from table where field = @variable" and then check for an empty result? I'm assuming field is a unique identifier; otherwise use TOP 1. Defensive coding just makes things a mess, and with the original, the bandwidth will probably cost more than the query itself!
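On the C# side, that simplification could look like this sketch (table and column names are placeholders); ExecuteScalar returns null when the query produces no rows:

```csharp
using System.Data.SqlClient;

// Sketch: existence check with no IF/ELSE in the SQL at all.
bool IdExists(SqlConnection conn, string id)
{
    // TOP 1 in case the column turns out not to be unique.
    using (var cmd = new SqlCommand(
        "SELECT TOP 1 1 FROM dbo.SomeTable WHERE SomeField = @id", conn))
    {
        cmd.Parameters.AddWithValue("@id", id);
        return cmd.ExecuteScalar() != null; // null means: no matching row
    }
}
```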

addiction
Posts: 27
Joined: Wed Dec 02, 2009 6:22 pm UTC

Re: Minimizing traffic on a server for a web app

Postby addiction » Wed Nov 08, 2017 4:21 pm UTC

Well really, I threw that out just as an example, but that's a good point about defensive coding. Either way, I need to do some detailed testing to ensure that what I build works well. But this is straying into another coding topic that's tricky for me. I can easily build queries to perform as well as possible in the environment they're forced into; that's not hard, and testing for it is pretty easy too. What I have very little experience with is the web-app context: worrying about bandwidth and other network-related concerns. My only web experience is a web app I've contributed to here at work, so everything related to this is super new.

Tub
Posts: 311
Joined: Wed Jul 27, 2011 3:13 pm UTC

Re: Minimizing traffic on a server for a web app

Postby Tub » Wed Nov 08, 2017 9:31 pm UTC

I'm not sure what exactly you're doing. Your web server is getting requests, including an ID. You then query multiple database servers for the existence of that ID, on indexed columns, but fetch no additional data. Then what? Do you just return a list of all databases the ID was found in, possibly with some static meta-information for each database, and then you're done? That would seem too simple to be worth worrying about.

Do a ballpark estimate of how often your users will search per day on average, and multiply by 300k. Average over business hours to get requests per second. Multiply by two to accommodate load spikes. What's your number?

Less than 100? Stop worrying.
Less than 1000? Start with a prototype, benchmark, maybe implement a cache, you'll be fine.
Less than 10000? Make a ballpark calculation of how many requests you can answer from cache, and how many requests will hit the DB servers. Present the DBAs with that number and the respective queries. If they have doubts, then you can ask for snapshots/replication slaves/other solutions. For simple lookups like those, thousands per second should not be a problem on a modern DB server, assuming you're granted enough concurrent connections so you don't get limited by network RTT.
More than 10000? Now you can start to worry. ;)
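As a worked example of that estimate (the searches-per-user figure is a pure assumption):

```csharp
// Assumed: 300,000 users, 2 searches per user per day, 8 business hours.
const int users = 300_000;
const int searchesPerUserPerDay = 2;     // assumption, not from the thread
const double businessHours = 8.0;

double perSecond = users * searchesPerUserPerDay / (businessHours * 3600);
double withSpikes = perSecond * 2;       // x2 headroom for load spikes
// perSecond is about 21 req/s; withSpikes about 42 -- "stop worrying" territory.
```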

Not sure how many IDs exist in your databases, but the home-grown version of a snapshot is putting the results of "SELECT DISTINCT field FROM table" into a map/dictionary. You don't seem concerned about stale data, so unless those tables have millions of entries, that can work without involving office politics.
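A sketch of that home-grown snapshot in C# (table and column names are placeholders):

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

// Load every distinct id once; afterwards existence checks are
// in-memory hash lookups instead of database queries.
HashSet<string> LoadSnapshot(string connectionString)
{
    var ids = new HashSet<string>();
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        "SELECT DISTINCT SomeField FROM dbo.SomeTable", conn))
    {
        conn.Open();
        using (var reader = cmd.ExecuteReader())
            while (reader.Read())
                ids.Add(reader.GetString(0));
    }
    return ids;   // ids.Contains(candidate) replaces the DB round-trip
}
```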

Thesh
Made to Fuck Dinosaurs
Posts: 5497
Joined: Tue Jan 12, 2010 1:55 am UTC
Location: Colorado

Re: Minimizing traffic on a server for a web app

Postby Thesh » Wed Nov 08, 2017 10:31 pm UTC

From the description of the problem, it sounds like a single sign-on system: pass a token in a querystring to the target website, then the website checks to make sure the token is in the database - if it is, they allow access, otherwise they deny access. It doesn't necessarily need the information from the database, just whether or not it's there.

addiction
Posts: 27
Joined: Wed Dec 02, 2009 6:22 pm UTC

Re: Minimizing traffic on a server for a web app

Postby addiction » Thu Nov 09, 2017 9:59 pm UTC

Thesh wrote:From the description of the problem, it sounds like a single sign-on system: pass a token in a querystring to the target website, then the website checks to make sure the token is in the database - if it is, they allow access, otherwise they deny access. It doesn't necessarily need the information from the database, just whether or not it's there.


This is correct. I apologize if I haven't been super clear about what I'm looking for, especially since this would be much easier to troubleshoot if you were actual co-workers sitting next to me, instead of kind strangers trying to decipher my text...

In terms of building the web app itself and all of the functionality it needs, I have zero worries about this. Getting it to work is simple, as it should be. The part that has been worrying me is I have no perspective/experience in how to measure this web app's potential impact on the dozens of different servers it will be interacting with. Really, I just have enough relevant knowledge of networking to know that one of my testing points should be to make sure that I don't overload something.

I did have a thought about how to deal with this issue:

My manager and I were thinking that we load this data into our local database. We could time this refresh to happen during some down time and minimize the number of instances where overseas employees are getting incomplete results. Especially if it's an incremental load. I could even put a warning on the app to let the user know that they're searching during a refresh.

This would result in one SUPER LONG table, but I could do all of the work with one query/server connection. Would this solve issues or create new ones? For perspective, this would result in a slowly increasing table starting at about a billion records (the eventual result of a recent company merger). It would be a narrow table, but very long. Indexing would improve the situation, as well as some careful filegroup creations. And since it's our own database/server, my manager might have more pull with getting the necessary modifications to ensure it wouldn't shut down unexpectedly.

Maybe this is my brain trying to cause more problems than solutions, but it feels like I'm missing something important. Otherwise, this sounds like an easy fix to my potential traffic situation? What do we think?

Tub
Posts: 311
Joined: Wed Jul 27, 2011 3:13 pm UTC

Re: Minimizing traffic on a server for a web app

Postby Tub » Thu Nov 09, 2017 11:35 pm UTC

addiction wrote:The part that has been worrying me is I have no perspective/experience in how to measure this web app's potential impact on the dozens of different servers it will be interacting with.

Like I said, start by estimating the number of queries per second. You won't have actual load measurements until you have a prototype, and you won't have actual usage metrics until the app is live, so you need to work with estimates to determine your architecture.

addiction wrote:My manager and I were thinking that we load this data into our local database. We could time this refresh to happen during some down time and minimize the number of instances where overseas employees are getting incomplete results. Especially if it's an incremental load. I could even put a warning on the app to let the user know that they're searching during a refresh.

Treat that DB like a local cache: on a cache miss, fall back to querying the backend servers directly. Caches are meant to reduce load, not to introduce failures.
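That cache-with-fallback behavior could be sketched as (both lookup functions are placeholders):

```csharp
using System;

// Treat the local copy as a cache: a miss there falls back to the
// authoritative remote servers instead of reporting "not found".
bool IdExistsAnywhere(string id,
                      Func<string, bool> localLookup,
                      Func<string, bool> remoteLookup)
{
    if (localLookup(id))
        return true;         // cache hit: cheap local answer
    return remoteLookup(id); // cache miss: ask the backend servers directly
}
```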

addiction wrote:This would result in one SUPER LONG table, but I could do all of the work with one query/server connection. Would this solve issues or create new ones? For perspective, this would result in a slowly increasing table starting at about a billion records (the eventual result of a recent company merger). It would be a narrow table, but very long. Indexing would improve the situation, as well as some careful filegroup creations. And since it's our own database/server, my manager might have more pull with getting the necessary modifications to ensure it wouldn't shut down unexpectedly.

A billion tokens for 300k users? :shock: How do you end up with ~3k tokens per user? How many of those are still active?

But anyway, depending on token length, that's like ~50GB of data, correct? As far as databases go, that's small. An index is mandatory of course; I'm not fluent in SQL Server's disk layout, but other DB servers will benefit greatly if the token is set as the primary key.

But a full copy will saturate a gigabit network for ~7 minutes (assuming all servers are in the same data center), and put heavy load on the involved DB servers. If a copy once a day (overnight) is sufficient, then it can work. If you're able to do incremental updates (SELECT token FROM table WHERE creation_time >= $lastupdate_time), then you will be able to afford more frequent updates, but that requires the DB servers to not only store creation times, but to have them indexed as well. If you're planning to do full copies several times per day, then this may very well be a deoptimization because the copies could be more expensive than the actual queries would have been.
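An incremental pull along those lines might look like this sketch (table and column names assumed, mirroring the SELECT above):

```csharp
using System;
using System.Data.SqlClient;

// Fetch only tokens created since the last successful sync, instead of
// re-copying the full table. Requires an indexed creation_time column.
void IncrementalSync(SqlConnection conn, DateTime lastUpdateTime,
                     Action<string> store)
{
    using (var cmd = new SqlCommand(
        "SELECT token FROM dbo.Tokens WHERE creation_time >= @since", conn))
    {
        cmd.Parameters.AddWithValue("@since", lastUpdateTime);
        using (var reader = cmd.ExecuteReader())
            while (reader.Read())
                store(reader.GetString(0)); // insert into the local copy
    }
}
```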

Also, I'm not sure why you expect your local DB to handle the load just fine, while you expect the remote DB servers to explode under the same load. Is your hardware going to be faster?

