0

I have a use case where a large number of clients are reporting their operational states every minute using an HTTP connection to a php script.

For each client, I have a row in a mysql table with unique clientID.

PHP script first checks if the client is valid, and update the timestamp so I know when was the last time they connected.(A heartbeat of sense)

INSERT into clientStatus (cID,..,timeStamp) VALUES (123,....,time())
ON DUPLICATE KEY UPDATE ......,timeStamp=time();

I use INSERT/ UPDATE to handle any new clients joining, but I can handle it separately if it helps to have a better flow on the larger scale. Each HTTP request creates a write operation (as above) to the DB.

Is there a way to aggregate all such queries (15-20 thousand requests per minute), and save it somewhere, combine them and run it as a single INSERT/ UPDATE query? Or is this line of thinking not the right approach?

At this point, the AWS RDS IOPS is being a bottleneck, and that's what caused me to explore this angle.

aVC
  • 101
  • 2
  • Few thousand requests per minute doesn't sound like a big deal; do you actually have a performance problem that you need resolved? If you do, did you already try to address it by scaling up hardware and/or tuning MySQL? – mustaccio May 10 '22 at 17:54
  • You might want to also check this out, with respect to the hidden cost of insert .. on duplicate key. – mustaccio May 10 '22 at 17:59
  • @mustaccio I have been fine tuning queries, indexes, etc. And I have come a long way. This is another avenue I am exploring. Ideally, for most of these clients, I only need to update the timestamp. Would have been nice If I could update timestamp on the select statement itself. Performance is not the issue. It is all running well. Currently, I have about 20k requests, all doing an insert/update every minute. This client list could grow, and I am planning ahead to reduce cost. – aVC May 10 '22 at 18:01
  • Well, there's ProxySQL... – mustaccio May 10 '22 at 18:03
  • @mustaccio ProxySQL sounds Interesting. Thanks, let me look into this. Ultimately, current intention is to reduce IOPS usage on AWS RDS. – aVC May 10 '22 at 18:48
  • The "thousands per minute" are coming through HTTP? Then through PHP (or some other app)? Which is then making a new connection to MySQL to do that single UPDATE? If not, please spell out the flow in more detail. – Rick James May 10 '22 at 23:23
  • @RickJames Thanks for responding, I have updated the question to make indicate the flow. Currently, each http call is making an individual update. Any thoughts much appreciated. – aVC May 10 '22 at 23:53

1 Answers1

0

There are multiple places where there might be a bottleneck.

  • HTTP
  • The client process that then talks to MySQL
  • Connecting to MySQL (for a single action)
  • Competing with other actions to do the Update (or IODKU)

Plan A: Scrape the web server's log to see that someone made a request. (The web service may not let you do this.) This would let you batch the data, and even de-duplicate much of the data. Periodically in the scraping, do a batch IODKU into the table. The timestamp updating would be delayed, but that may be OK.

Plan B: If the table where the update is occurring is somewhat busy with other stuff, it may be better to split this column out into a separate table. (I often recommend this for a "Likes" and "Visits" counter.)

Plan C: Like Plan B, but stage the updates in a very simple table that is periodically flushed with batch inserts into the appropriate locations.

Until you know which part of the processing is the bottleneck, you can't be sure that any of my suggestions (or others) would be beneficial.

Rick James
  • 78,038
  • 5
  • 47
  • 113
  • Rick, scraping logs might be an idea, but ideally prefer to stick to the real http requests. The requests are legit, and everything happening now is expected. Nothing fishy. When I said iops is bottleneck, I meant cost-wise, reducing the IOPS usage could help us, and hence exploring options to collect all IODKUs, and process in batch, instead of individual connections. I am looking into proxySQL. As I ask towards the end, Is this a healthy approach to explore, in your opinion? – aVC May 11 '22 at 21:14
  • @aVC - Http requests may be the bulk of IOPs. Can you isolate which IOPs come from them vs MySQL? – Rick James May 12 '22 at 16:15