brentroose 11 hours ago

A month ago, I went on a performance quest trying to optimize a PHP script that took 5 days to run. Together with the help of many talented developers, I eventually got it to run in under 30 seconds. The optimization process was so much fun, and so many people pitched in with their ideas, that I eventually decided I wanted to do something more.

That's why I built a performance challenge for the PHP community

The goal of this challenge is to parse 100 million rows of data with PHP, as efficiently as possible. The challenge will run for about two weeks, and at the end there are some prizes for the best entries (among the prizes is the much sought-after PhpStorm Elephpant, of which we only have a handful left).

I hope people will have fun with it :)

Tade0 8 hours ago | parent | next [-]

Pitch this to whoever is in charge of performance at Wordpress.

A Wordpress instance will happily take over 20 seconds to fully load if you disable cache.

rectang 4 hours ago | parent | next [-]

Are you talking about a new, empty WordPress instance running the default theme? Because if so, that doesn't match my anecdotal experience.

If you're talking about a WordPress instance with arbitrary plugins running an arbitrary theme, then sure — but that's an observation about those plugins and themes, not core.

As someone who has to work with WordPress, I have all kinds of issues with it, but "20 seconds to load core with caching disabled" isn't one of them.

lossyalgo 3 hours ago | parent [-]

Can concur. I bought a plugin a few years ago after using the free version for many years, wanting to support the devs for making such a useful plugin. I installed it on a few sites, left my PC running overnight with a tab open to the plugin, and woke up the next day to a lovely rebooted Windows (I hate how the default Windows behavior after a BSOD is a reboot with ZERO indication that it crashed, or whether it was an update that rebooted).

Re-opened all my tabs, and cue the same thing: waking up the next day to a freshly rebooted Windows, which made me suspicious. I assumed at that point it must have been a BSOD, so I dug into the Windows event logs, eventually realizing it was Firefox. I restored the tabs yet again and left the browser open overnight, while installing more and more debugging tools for Firefox, none of which helped me track down the culprit. What pissed me off the most was that Firefox even allowed a process to consume > 30 GB of RAM and cause my PC to crash!

I finally caught it one night after > 10 BSODs. The tab had been open for 20+ hours, and right as it started to spiral out of control and my PC was about to crash, with programs starting to error out and Windows madly paging things to disk, I got lucky and was able to open about:memory to see the culprit: this plugin had some kind of memory leak that wasn't noticeable until it suddenly went nuts.

I emailed the devs multiple times with the full debug output and was ignored for weeks until they finally responded, which pissed me off even more, having finally paid for the pro version only to be greeted with this. The free version didn't seem to have this issue either, which was like an extra slap in the face.

Naked Wordpress is plenty fast, but as soon as you start adding sketchy plugins and themes, things can spiral out of control.

embedding-shape 8 hours ago | parent | prev | next [-]

Microbenchmarks are very different from optimizing performance in real applications in wide use, though. Someone could do great on this specific benchmark and still have no clue how to actually make something large like Wordpress perform OK out of the box.

tracker1 4 hours ago | parent | prev | next [-]

Wordpress is something that I cannot believe hasn't been displaced by a service that uses a separate application for editing and delivery.

It seems like something like Vercel or Cloudflare could host the content side, published as a worker serving mostly-static content from a larger application; that would be more beneficial and run better with less risk, for that matter. Having the app's editing and auth served from the same location is just begging for the issues WP and its plugins have seen.

devmor 3 hours ago | parent [-]

As someone who built full ecommerce websites on wordpress over 15 years ago, I can tell you exactly why it hasn't been replaced - the plugin/theme ecosystem.

There are tens of thousands of plugins and themes to make a Wordpress website do whatever you want and look however you want, either for free or a very low fee. You have to replace that entire ecosystem for the same price to replace Wordpress.

No matter how many times people get hacked, the perceived value of getting something for nothing outweighs the eventual cost.

hparadiz 3 hours ago | parent | next [-]

I did a short contract a few years back where multiple WordPress plugins were pulling different versions of guzzle and I had to use a namespace rewriter to be able to run multiple guzzle versions at the same time.

The thing about WordPress is you can put it on a box and lock it down so hard you just treat it as an untrusted process on your server.

shimman 3 hours ago | parent | prev [-]

Maybe it's just my poor imagination, but how many plugins are truly unique to WP that you can't find on other CMSs? The only ones that come to mind are the plugins that help connect to various B2B or B2C workflows; is that where the gold is mostly found?

AlienRobot 2 hours ago | parent [-]

WP essentially lets plugins do anything they want. The plugins are just scripts that register callbacks to events, and WP fires events on BASICALLY EVERY FUNCTION. This is without exaggeration. I don't remember the exact names right now, but if you have a function like wp_get_title that gets the title of a post, there will be a "get_title" event that can modify which title is returned. So for every function, the data is first computed the default WP way, and then plugins are allowed to discard all that work and replace it with something else entirely. There are events for deciding the canonical URL, for deciding the description of a post, for deciding whether RSS links will be displayed or not (the callback just returns true or false), etc.

In other words, every property can be modified through global event callbacks. Some events are called very early in the whole pipeline that let plugins just render whatever they want (e.g. render custom XML sitemaps).
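The mechanism is easy to sketch in a few lines of plain PHP. Below is a standalone re-implementation for illustration only; in WordPress itself the real API is `add_filter()`/`apply_filters()`, and the hook and function names here are hypothetical:

```php
<?php
// Minimal re-implementation of WordPress-style filter hooks (illustration
// only; in WordPress these functions are provided by core).

$filters = [];

function add_filter(string $hook, callable $cb): void
{
    global $filters;
    $filters[$hook][] = $cb;
}

function apply_filters(string $hook, $value, ...$args)
{
    global $filters;
    foreach ($filters[$hook] ?? [] as $cb) {
        $value = $cb($value, ...$args); // each plugin may replace the value
    }
    return $value;
}

// "Core" computes a default the normal way, then lets plugins override it:
function get_the_title(int $post_id): string
{
    $title = "Post #$post_id";                         // the default WP way
    return apply_filters('the_title', $title, $post_id);
}

// A "plugin" registers a callback that discards core's work entirely:
add_filter('the_title', fn ($title, $id) => strtoupper($title));

echo get_the_title(42), "\n"; // POST #42
```

Every lookup pays for the filter dispatch whether or not any plugin is registered, which is part of why plugin-heavy installs slow down.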

rkozik1989 5 hours ago | parent | prev | next [-]

Much like anything else, your performance is going to vary a lot based on the architecture of the implementation. You really shouldn't deploy anything into production without some kind of caching, whether that's done in the application itself or with memcached/Redis, Varnish, or OPcache.

slopinthebag 3 hours ago | parent | next [-]

Either you use a slow language and deal with caching or you use a fast language and just put Cloudflare/Bunny/etc in front.

LoganDark 5 hours ago | parent | prev [-]

> You really shouldn't deploy anything into production without some kind of caching.

Citation needed? You only need cache if a render is expensive to produce.

monkey_monkey 7 hours ago | parent | prev [-]

That's often a skill issue.

almosthere 5 hours ago | parent [-]

skill issue being they only know php

ge96 4 hours ago | parent | prev | next [-]

5 days to 30 seconds? What kind of factor/order of magnitude is that, damn.

What takes 5 days to run?

hosteur 3 hours ago | parent | next [-]

Poorly made analytics/datawarehouse stuff.

slopinthebag 3 hours ago | parent | prev [-]

One query per column per row

gib444 9 hours ago | parent | prev | next [-]

> A month ago, I went on a performance quest trying to optimize a PHP script that took 5 days to run. Together with the help of many talented developers, I eventually got it to run in under 30 seconds

That's a huge improvement! How much was low hanging fruit unrelated to the PHP interpreter itself, out of curiosity? (E.g. parallelism, faster SQL queries etc)

brentroose 8 hours ago | parent | next [-]

Almost all, actually. I wrote about it here: https://stitcher.io/blog/11-million-rows-in-seconds

A couple of things I did:

- Cursor-based pagination
- Combining insert statements
- Using database transactions to prevent fsync calls
- Moving calculations from the database to PHP
- Avoiding serialization where possible
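Two of those items (combining insert statements, and wrapping them in a transaction so each statement doesn't force its own fsync) can be sketched with PDO. The table and data below are made up, and SQLite stands in for the real database:

```php
<?php
// Sketch: batched multi-row INSERT inside a single transaction.
// Table and data are hypothetical; SQLite stands in for the real database.

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE readings (station TEXT, temp REAL)');

$rows = [['a', 1.5], ['b', 2.5], ['c', 3.5]];

// One transaction for the whole batch: otherwise every INSERT is its own
// implicit transaction and forces a sync (fsync) to disk per statement.
$pdo->beginTransaction();

// One multi-value INSERT instead of one statement per row.
$placeholders = implode(',', array_fill(0, count($rows), '(?, ?)'));
$stmt = $pdo->prepare("INSERT INTO readings (station, temp) VALUES $placeholders");
$stmt->execute(array_merge(...$rows));

$pdo->commit();

echo $pdo->query('SELECT COUNT(*) FROM readings')->fetchColumn(), "\n"; // 3
```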

tiffanyh 8 hours ago | parent [-]

Aren’t these optimizations less about PHP, and more about optimizing how you’re using the database?

toast0 4 hours ago | parent | next [-]

PHP is kind of like C. It can be very fast if you do things right, and it gives you more than enough rope to tie yourself in knots.

Making your application fast is less about tuning your runtime and more about carefully selecting what you do at runtime.

Runtime choice does still matter: an environment where you can reasonably separate sending database queries and receiving the results (async communication), or that otherwise lets you pipeline requests, will tend to have higher throughput if used appropriately, though batching queries can narrow the gap. Languages with easy parallelism can make individual requests faster, at least while you have available resources. Etc.

A lot of popular PHP programs and frameworks start by spending lots of time assembling a beautiful sculpture of objects that will be thrown away at the end of the request. Almost everything is going to be thrown away at the end of the request; making your garbage beautiful doesn't usually help performance.

tiffanyh an hour ago | parent [-]

Would love to read more stories by you toast0 on things you've optimized in the past (given the huge scale you've worked on). Lessons learned, etc. I always find your comments super interesting :)

toast0 43 minutes ago | parent [-]

<3 I always love seeing your comments and questions, too!

Well on the subject of PHP, I think I've got a nice story.

The more recent one is about Wordpress. One day, I had this conversation:

Boss: "will the blog stay up?"

toast0: "yeah, nobody goes to the blog, it's no big deal"

Boss: "they will"

toast0: "oh, ummmm we can serve a static index.html and that should work"

Later that day, he posted https://blog.whatsapp.com/facebook. I took a snapshot to serve as index.html and the blog stayed up.

A few months later, I had a good reason to tear out WordPress (which I had been wanting to do for a long time), so I spent a week and made FakePress, which only did exactly what we needed and could serve our very exciting blog posts in something like 10-20 ms per page view instead of whatever WordPress took (which was especially not very fast if you hit a www server that wasn't in the same colo as our database servers).

That worked pretty well, until the blog was rewritten to run on the FB stack: page weight doubled, but since it was served by the FB CDN, load time stayed about the same. The process to create and translate blog entries was completely different, and the RSS was non-compliant: I didn't want to include a time with the date, and there is/was no available timeless date field in any of the RSS specs, so I just left the time out ... but it was sooo much nicer to run.

Sadly, I haven't been doing any large scale optimization stuff lately; my work stuff doesn't scale much at the moment. Personal small scale fun things include polishing up my crazierl [1] demo (I will update the published demo in the next few days, or email me for the release candidate url), adding IPv6 to my Path MTU Discovery Test [2] since I have somewhere to run IPv6 at MTU 1500, and writing memdisk_uefi [3], which is like Syslinux's MEMDISK but in UEFI.

My goal with memdisk_uefi is to get FreeBSD's installer images usable with PXE in UEFI. As of FreeBSD 15.0, in BIOS mode you can use PXE and MEMDISK to boot an installer image, but UEFI is elusive. I got some feedback from FreeBSD suggesting a different approach than what I have, but I haven't had time to work on that; hopefully soonish.

Oh, and my Vanagon doesn't want to run anymore ... but it's cold out and I don't seem to want to follow the steps in the fuel system diagnosis, so that's not progressing much. I did get a back seat in good shape though, so now it can carry 5 people nowhere instead of only two (caveat: I don't have seat belts for the rear passengers, which would be unsafe if the van was running).

[1] https://crazierl.org/

[2] http://pmtud.enslaves.us/

[3] https://github.com/russor/memdisk_uefi

hu3 8 hours ago | parent | prev | next [-]

It's still valid as an example to the language community of how to apply these optimizations.

swasheck 8 hours ago | parent | prev [-]

in all my years doing database tuning/admin/reliability/etc, performance issues have overwhelmingly been in the bad query/bad data pattern categories. the data platform is rarely the issue

tosti 6 hours ago | parent [-]

The worst offenders I've seen were looping over a shitty ORM

cobbzilla 5 hours ago | parent | next [-]

hey don’t forget, that shitty ORM also empowers you to write beautiful, fluent code that, under the hood, generates a 12-way join that brings down your entire database.

edoceo 5 hours ago | parent | prev [-]

And that is true across languages.

Joel_Mckay 6 hours ago | parent | prev [-]

In general, it is bad practice to touch transaction datasets in PHP script space. Like all foot-guns, it leads to read-modify-write bugs eventually.

Depending on the SQL engine, there are many PHP Cursor optimizations that save moving around large chunks of data.

Cleanly cached PHP can be fast for REST transactional data parsing, but it is also often used as a bodge language by amateurs. PHP is not slow by default, nor is it meant to run persistently (the low memory use is nice), but it still gets a lot of justified criticism.

Erlang and Elixir are much better for clients/host budgets, but less intuitive than PHP =3

contingencies 34 minutes ago | parent | prev | next [-]

Hehe. Optimization ... it's a good way to learn. Earlier in my career I did a lot of PHP. Usually close to bare.

Other than the obvious point that writing an enormous JSON file is a dubious goal in the first place (really): while PHP can be very fast, this is probably faster to implement in shell with sed/grep, or, almost certainly better, by loading into sqlite and then dumping out from there. Your optimization path then likely becomes index specification and processing, and, after the initial load, potentially query or instance parallelization.

The page confirms sqlite is available.

If the judges whinge and shell_exec() is unavailable as a path, a more whinge-tolerant alternative is to use PHP's sqlite feature and then dump to JSON.
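That sqlite-then-dump path might look roughly like this with PHP's SQLite3 extension. The schema and data are hypothetical, and a real entry would stream the 100M-row input rather than use a literal array:

```php
<?php
// Sketch: load rows into SQLite, aggregate there, then dump JSON.
// Schema and data are made up for illustration.

$db = new SQLite3(':memory:'); // or a file on a memory-backed mount
$db->exec('CREATE TABLE rows (city TEXT, temp REAL)');

// Bulk-load inside one transaction to avoid a sync per INSERT.
$db->exec('BEGIN');
$stmt = $db->prepare('INSERT INTO rows VALUES (:city, :temp)');
foreach ([['Oslo', -3.2], ['Lagos', 31.0], ['Oslo', 1.2]] as [$city, $temp]) {
    $stmt->bindValue(':city', $city, SQLITE3_TEXT);
    $stmt->bindValue(':temp', $temp, SQLITE3_FLOAT);
    $stmt->execute();
    $stmt->reset();
}
$db->exec('COMMIT');

// Let SQLite do the aggregation, then serialize only the small result set.
$res = $db->query('SELECT city, AVG(temp) AS avg_temp FROM rows
                   GROUP BY city ORDER BY city');
$out = [];
while ($row = $res->fetchArray(SQLITE3_ASSOC)) {
    $out[] = $row;
}
echo json_encode($out), "\n";
```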

If I wanted to achieve this for some reason in reality, I'd have the file on a memory-backed blockstore before processing, which would yield further gains.

Frankly, this is not much of a programming problem, it's more a system problem, but it's not being specced as such. This shows, in my view, immaturity of conception of the real problem domain (likely IO bound). Right tool for the job.

CyberDildonics 4 hours ago | parent | prev | next [-]

Using a language that is 100x slower than naive native programs to do a "speed challenge" is like spending your entire day speed walking to run errands when you can just learn how to drive a car.

user3939382 8 hours ago | parent | prev | next [-]

exec('c program that does the parsing');

Where do I get my prize? ;)

brentroose 8 hours ago | parent [-]

The FAQ states that solutions like FFI are not allowed because the goal is to solve it with PHP :)

kpcyrd 8 hours ago | parent [-]

What about using the filesystem as an optimized dict implementation?

olmo23 7 hours ago | parent [-]

this is never going to be faster because it requires syscalls

kpcyrd 6 hours ago | parent [-]

The time you lose at the syscall boundary you may be able to win back during much shorter GC pauses.

cess11 4 hours ago | parent [-]

Some suggestions that are already in the PR list disable GC.

https://www.php.net/manual/en/function.gc-disable.php
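For reference, pausing the cycle collector around a hot section looks like this (a generic sketch, not taken from any particular entry):

```php
<?php
// Sketch: disable PHP's cycle collector during a hot loop, then restore it.
// gc_disable() only stops cycle collection; refcounted memory is still freed.

gc_disable();

$sum = 0;
for ($i = 0; $i < 1_000_000; $i++) {
    $sum += $i; // hot loop runs without collector pauses
}

gc_enable();          // restore normal behavior
gc_collect_cycles();  // optionally collect once, at a moment we choose
```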

onion2k 6 hours ago | parent | prev [-]

A month ago, I went on a performance quest trying to optimize a PHP script that took 5 days to run. Together with the help of many talented developers, I eventually got it to run in under 30 seconds.

When people say leetcode interviews are pointless, I might share a link to this post. If that sort of optimization is possible, there is a data structures and algorithms problem in the background somewhere.

nicoburns 5 hours ago | parent | next [-]

I find that these kinds of optimizations are usually more about technical architecture than leetcode. The last time I got speedups this crazy, the biggest win was reducing the number of network/database calls. There were also optimizations around reducing allocations and pulling expensive work out of hot loops. But leetcode interview questions don't tend to cover any of that.

They tend to be about the implementation details of specific algorithms and data structures. Whereas the important skill in most real-world scenarios would be to understand the trade-offs between different algorithms and data structures so that you pick an appropriate off-the-shelf implementation to use.

LollipopYakuza 5 hours ago | parent [-]

I agree. The "advanced" leetcode is about those last % of optimization. But when network latency is involved in a flow, it is usually the most obvious low hanging fruit.

tuetuopay 5 hours ago | parent | prev | next [-]

Well leetcode asks you to implement the data structure, not how and when to use which data structure. I don’t need to know how to implement a bloom filter on a whiteboard off the top of my head to know when to use it.

Twirrim an hour ago | parent [-]

Hell, the number of times I've used a lot of the data structures that come up in leetcode exercises without at least looking at some reference material is pretty small. I usually assume I'm going to misremember it, and go double check before I write it so I don't waste ages debugging later.

slopinthebag 3 hours ago | parent | prev [-]

Do you think they achieved that performance optimisation with a networked service because they switched from insertion sort to quicksort?

hparadiz 2 hours ago | parent [-]

I did the same thing in PHP before. The issue was a foreach inside a foreach with a search. The fix was to build the result set as you populate the main array of objects: as you add items to the array, if there's a value that matches, you throw it into another "results" array for each cardinality (unique value) that exists. Since in PHP objects are always just pointers, your results arrays are relatively painless, basically just a series of pointer-sized integers. Then when you need an answer, the result is instant. I ended up getting an 80-90% speedup. This is not just a PHP thing either; folks end up doing this type of stuff in every language.
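A sketch of that pattern (names and data are made up): build the per-value index while populating the main array, instead of running an inner foreach search later:

```php
<?php
// Sketch: replace a nested foreach-with-search (O(n*m)) by building a
// lookup index keyed on the searched value while populating the main array.

$records = [
    ['id' => 1, 'color' => 'red'],
    ['id' => 2, 'color' => 'blue'],
    ['id' => 3, 'color' => 'red'],
];

$items   = [];
$byColor = []; // one "results" array per unique value (cardinality)
foreach ($records as $r) {
    $obj = (object) $r;              // objects are cheap handles in PHP
    $items[] = $obj;
    $byColor[$obj->color][] = $obj;  // index as we go; no second pass
}

// The lookup that used to be an inner foreach is now a direct array access:
echo count($byColor['red']), "\n"; // 2
```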