| ▲ | shagie a day ago |
| PCI itself is Payment Card Industry; PCI DSS, as noted, is the Data Security Standard. https://en.wikipedia.org/wiki/Payment_Card_Industry_Data_Sec... At the time, it was in the transition between 2.0 and 3.0 (it's been refined many times since). https://listings.pcisecuritystandards.org/documents/PCI-DSS-... is the 3.2.1 audit report template. One of the most important requirements in there is that you don't mix dev and production. The idea of putting a development box next to a production box that runs the same transactions... that just doesn't happen. Failing a PCI DSS audit means anything from hefty fines and increased transaction fees (paying 1% more on each credit card transaction can make a $10k/month - $100k/month fine a rounding error) up to "no, you can't process credit cards", which would mean... well... shutting down the company (that wouldn't happen on a first offense - but it's still not a chat you want to have with accounting about why everything costs 1% more now). Those are things you don't want to deal with as a developer.
So, no. There is no development configuration in production, or mirroring of a point of sales terminal to another system that's running development code. Development code doesn't touch other people's money. We got enough side eyes just for looking at the raw data of our manager's payment card on development systems, because only people who banked at one local bank occasionally experienced a problem with their Visa check card. https://en.wikipedia.org/wiki/Digital_card#Financial_cards - when it says "generally '^'", it means the separator can be some other character... and here it was. This wasn't a problem for most people, but it turned out that the non-standard separator (which we only found after reading the card's raw data), combined with a space in the surname, caused the track to be misparsed and an error returned - and none of our other cards used a separator that didn't match the "generally".
So, being able to generate real production load (in the cafeteria) without using Visa, Mastercard, etc. was important. As was being able to fall back to the nearly antique credit card imprinter ( https://en.wikipedia.org/wiki/Credit_card_imprinter ) for the store that was lucky to get a dozen transactions a day. |
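The "generally '^'" trap described above can be sketched in a few lines (illustrative Python, not the actual register code; per ISO/IEC 7813 the Track 1 field separator is '^', and the "hypothetical heuristic" below is just one way a tolerant parser might detect a deviating issuer):

```python
# Rough sketch (not the actual POS code): parsing Track 1 data where the
# field separator is "generally" '^' -- but an issuer may use another
# character. A parser hard-coded to '^' mis-splits such cards, which is
# exactly the class of bug the comment describes.

def parse_track1_naive(track: str):
    """Assumes '^' separators -- the 'generally' case only."""
    body = track.strip('%?').lstrip('B')
    pan, name, rest = body.split('^', 2)  # raises ValueError if no '^'
    return {"pan": pan, "name": name}

def parse_track1_tolerant(track: str):
    """Detect the separator instead of assuming '^': treat the first
    non-digit after the PAN as the separator (hypothetical heuristic)."""
    body = track.strip('%?').lstrip('B')
    i = 0
    while i < len(body) and body[i].isdigit():
        i += 1
    sep = body[i]                      # whatever this issuer chose
    pan, name, rest = body.split(sep, 2)
    return {"pan": pan, "name": name, "sep": sep}
```

The naive parser works on the common case but blows up on a card using, say, '=' as the separator; the tolerant one handles both. (The PANs used below are well-known published test numbers.)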
|
| ▲ | wcarss 17 hours ago | parent | next [-] |
| > So, no. There is no development configuration in production, or mirroring of a point of sales terminal to another system that's running development code. This is a misreading of the suggestion, I think. My reading of the suggestion is to run a production "dry run" parallel code path, which you can reconcile against the existing system's work for a period of time before you cut over. This isn't precluded by PCI; it's exactly the method a team I led used to verify a rewrite of, and migration to, a "new system" handling over a billion dollars of recurring billing transactions annually: write the new thing with all your normal testing etc., then deploy it alongside in a "just tell us what you would do" mode, then verify its operation for specific case classes, and then roll progressively over to using it for real. edit: I don't mean to suggest this is a trivial thing to do, especially in the context you mentioned, with many elements of hardware and likely odd deployment of updates, etc. |
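The "just tell us what you would do" pattern can be sketched in a few lines (a minimal illustration, with made-up names - not the commenter's actual billing system): the legacy path stays authoritative and is the only one that charges anything, while the rewrite runs in shadow and its answer is merely recorded for later reconciliation.

```python
# Shadow / dry-run pattern: the new code path computes a result but never
# acts on it; disagreements are logged for reconciliation before cutover.

mismatches = []

def bill(invoice, legacy_calc, new_calc, shadow=True):
    amount = legacy_calc(invoice)          # this is what we actually charge
    if shadow:
        try:
            candidate = new_calc(invoice)  # dry run: result recorded only
            if candidate != amount:
                mismatches.append((invoice, amount, candidate))
        except Exception as exc:           # a crash in shadow must not break prod
            mismatches.append((invoice, amount, repr(exc)))
    return amount
```

Once the mismatch log stays empty for the case classes you care about, you flip which calculator is authoritative.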
| |
| ▲ | shagie 14 hours ago | parent [-] | | Our reading of PCI DSS was that there was to be no development code in a production build. Having a --dry-run flag would have meant exactly that. You could do "here is the list of SKUs for transaction 12120112340112345 - run this through the system and see what you get" on our dev boxes hooked up to QA store 2 (and an old device in the lab hooked up to QA store 1). That's not a problem. Sending the scanner reads to both current production and a dev box in production would have been a hardware challenge - not completely insurmountable, but very difficult. Sending the keyboard entry to both devices would be a problem: the screens were different, and you can hand-enter credit card numbers, so keyboard entry is potentially PCI data. The backend store server would also have been difficult. There were updates to the store server (QA store 1 vs QA store 2 running simultaneously) that were needed too. This wasn't something we could progressively roll out to a store. When a store was to get the new terminals, it got a new hardware box, Ingenicos were swapped with Epsons, old Epsons were replaced with new ones (same device, but the screens had to be changed to match a different workflow - they were reprogrammable, but stores didn't have the setup to do that), and a new build was pushed to the store server. You couldn't run register 1 with the old device and register 2 with a new one. Fetching a list of SKUs, printing up a page of barcodes, and running it through was something we could do (and did) in the office. Trying to run a new POS system in a non-production mode next to production and mirroring it (with reconciling end-of-day runs) wasn't feasible, for hardware, software, and PCI reasons that were exacerbated by the hardware and software issues. Online this is potentially easier: send a shopping cart to two different price calculators and log whether the new one matches the old one.
With a POS terminal, this would be more akin to hooking the same keyboard and mouse up to both a Windows machine and a Linux machine, with the Windows machine running MS Word and the Linux machine running OpenOffice, and checking whether, after five minutes of use of the Windows machine, the Linux machine has the same text entered into OpenOffice. Of course it doesn't - the keyboard commands are different, the windows are different sizes, the menus have things in different places in different drop-downs... Similarly, trying to do this with the two POS systems would be a challenge. And to top it off, sometimes the digits typed are hand-keyed credit card numbers (entered when the MSR couldn't get a read) - and you have to make sure those don't show up on the Linux machine. I do realize this is reminiscent of the business handing over a poorly spec'ed thing and, each time someone says "what about...", coming up with another reason it wouldn't work. This was a system I worked on for a long while (a decade and a half ago), and I could spend hours drawing diagrams and explaining the system architecture and the issues we had. Anecdotes about how something worked in a 4M SLOC system are inherently incomplete. | | |
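The "online is easier" variant mentioned above can be sketched as an offline replay of logged carts through both price engines (all names here are illustrative, not from the commenter's system), with one PCI-motivated twist: anything resembling cardholder data is stripped before it reaches the candidate engine, since hand-keyed entry can be PCI data.

```python
# Replay logged carts through the old and new price engines and record
# disagreements. Card-holder data never reaches the candidate engine.

SENSITIVE = {"pan", "cvv", "track_data", "keyed_entry"}

def scrub(cart: dict) -> dict:
    """Drop any potentially-PCI fields before the cart leaves prod."""
    return {k: v for k, v in cart.items() if k not in SENSITIVE}

def reconcile(carts, old_engine, new_engine):
    diffs = []
    for cart in carts:
        safe = scrub(cart)
        old_total = old_engine(safe)
        new_total = new_engine(safe)
        if old_total != new_total:
            diffs.append((safe, old_total, new_total))
    return diffs
```

This sidesteps the keyboard/screen-mirroring problem entirely, which is why it works for a web checkout but not for the physical terminals described above.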
| ▲ | wcarss 12 hours ago | parent [-] | | Neat! Yeah, that's a pretty complex context and I completely see what you mean about the new hardware being part of the rollout and necessarily meaning that you can't just run both systems. My comment is more of a strategy for just a backend or online processing system change than a physical brick and mortar swap out. In my note about misreading the suggestion, I was thinking generally. I do believe that there is no reason from a PCI perspective why a given production system cannot process a transaction live and also in a dry mode on a new code path that's being verified, but if the difference isn't just code paths on a device, and instead involves hardware and process changes, your point about needing to deploy a dev box and that being a PCI issue totally makes sense, plus the bit about it being a bad test anyway because of the differences in actions taken or outputs. The example you gave originally, of shipping to the lower stake exceptional stores first and then working out issues with them before you tried to scale out to everywhere, sounded to me like a very solid approach to mitigating risk while shipping early. | | |
| ▲ | shagie 11 hours ago | parent [-] | | More background on the project. The original register was a custom-written C program running in DOS. It was getting harder and harder to find C programmers. The consultancy that had part of the maintenance contract for it was having the same difficulty, and was both raising its rates and deprioritizing the work items because its senior people (the ones who still knew how to sling C and fit it into computers with 4 MB of memory that you couldn't get replacement parts for anymore) were on other (higher-paying) contracts. So the company I worked at made the decision to switch from that program to a new one. They bought and licensed the full source to a Java POS system (and I've seen the same interface at other big retail companies too) and replaced all the hardware in all the stores... ballpark 5000 POS systems. The professional services consultancy was originally brought in for this (I recall it being underway when I started there in 2010). They missed deadlines and updates, and I'm sure legal got involved over failure to deliver on the contract. I think it was late 2011 when the company pulled the top devs from each team and set us to work on making this ready in all stores by October 2012 (side note: tossing two senior devs from four different teams into a new team results in some challenging personality situations). And that's when we (the devs) flipped the schedule around: instead of March 2013 for the cafeteria and surplus store (because they were the odd ones), we were going to get them in place in March of 2012, so that we could have low-risk production environments while we worked out issues (so many race conditions and graphical event hangs with old-school AWT). --- ... a personality clash memory... it was on some point of architecture and code, and our voices were getting louder.
Bullpen work environment (a bunch of unsaid backstory here), but the director was in the cube on the other side of the bullpen from us. The director "suggested" that we take our discussion to a meeting room... so we packed up a computer (we needed it to talk about code) and all of the POS devices we needed, put it all on a cart, pushed the cart down the hall into a free conference room (there were two conference rooms on that floor - no, this wasn't a building designed for development teams), set up, and went back to loudly discussing. However, we hadn't scheduled or reserved the room... and the director who kicked us out of the bullpen had reserved the room we'd been kicked into, starting shortly after we got there. "We're still discussing the topic; that will probably be another 5-10 minutes from now... and it will take us another 5 minutes to pack the computer back up and take it back to the bullpen. Your cube with extra chairs in it should be available for your meeting, and it's quiet there now without our discussions going on." |
|
|
|
|
| ▲ | hipratham 19 hours ago | parent | prev | next [-] |
| Why not use aged/anonymized data? That way you can use prod data in dev, with custom security rules anonymizing your data, while still following the DSS. |
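One common way "aged/anonymized" production data gets prepared for dev (a generic sketch, not what this thread's system did; the salt name and function are hypothetical): replace each PAN with a same-shaped token that is stable within one export, so records still join, but carries no mapping back to the real number.

```python
import hashlib

def tokenize_pan(pan: str, salt: str = "per-export-random-salt") -> str:
    """Deterministic within one export (records still join on it) but
    irreversible: same length and all digits, derived from a salted hash."""
    digest = hashlib.sha256((salt + pan).encode()).hexdigest()
    digits = "".join(str(int(c, 16) % 10) for c in digest)
    return digits[: len(pan)]
```

The salt is discarded after the export, so even the team that ran it can't reverse the tokens. (Whether this satisfies a given QSA is its own conversation - the point is only that "use prod data in dev" requires this kind of transform first.)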
| |
| ▲ | wcarss 17 hours ago | parent [-] | | Lead: "We have six weeks to ship. Questions?" Dev: "Could we pull an export of relevant historical data and get some time to write code to safely anonymize that, and stand up a parallel production system using just the anonymized data and replicate our deploy there, so we can safely test on real-ish stuff at scale?" Lead: "I'll think about it. In the meantime, please just build the features I asked you to. We gotta hustle on this one." I'm not arguing with this hypothetical exchange that it's infeasible or even a bad idea to do exactly what you suggested, but justifying an upfront engineering cost that isn't directly finishing the job is a difficult argument to win in most contexts. | | |
| ▲ | philipallstar 17 hours ago | parent [-] | | It's very common to use identical systems but anonymised data shipped back to test environments in such cases. There are certain test card numbers that always fail or always succeed against otherwise-real infrastructure on the card provider's side. | | |
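Those published test numbers are deliberately well-formed: they pass the same Luhn check as real cards, so client-side validation can't tell them apart; only the provider's test infrastructure knows to approve or decline them. A quick sketch of the check (standard Luhn algorithm, not tied to any particular processor):

```python
def luhn_ok(pan: str) -> bool:
    """Standard Luhn check digit validation; ignores spaces/dashes."""
    digits = [int(d) for d in pan if d.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:        # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9        # equivalent to summing the two digits
        total += d
    return total % 10 == 0
```

This is why a hand-typed "4111 1111 1111 1111" sails through form validation while a random 16-digit string usually won't.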
| ▲ | wcarss 15 hours ago | parent [-] | | Absolutely, I agree that it's a useful pattern. I've personally typed 4111 1111 1111 1111 into a Stripe form more times than I want to even think about. My point above was that it's not necessarily easy to convince the operators of a business that it's a justifiable engineering expense to set up a new "prod-like but with anonymized data" environment from scratch, because it's not a trivial thing to make and maintain. I do think it's pretty easy to convince operators of a business to adopt the other strategy suggested in a sibling thread: run a dry-mode parallel code path, verify its results, and cut over when you have confidence. This shouldn't really be an alternative to a test environment, but they can both achieve similar things. | | |
| ▲ | 14 hours ago | parent | next [-] | | [deleted] | |
| ▲ | philipallstar 14 hours ago | parent | prev [-] | | > I do think it's pretty easy to convince operators of a business to adopt the other strategy suggested in a sibling thread: run a dry mode parallel code path, verify its results, and cut over when you have confidence. This shouldn't really be an alternative to a test environment, but they can both achieve similar stuff. I agree - it's a nice low-risk way of doing things. | | |
| ▲ | shagie 13 hours ago | parent [-] | | In another comment I explained this more... It is as low-risk as mirroring a keyboard and mouse from a Windows machine running Microsoft Word to a Linux machine running OpenOffice and expecting the same results. You can't run the two systems side by side - different screens, different keyboard entry... and some of the keyboard entry can't touch the other system. And this all assumes you can put a dry path into the production system. If the answer is "no", then you're putting a dev environment into a production environment... and that's certainly a "no". We had test environments, and we had a lab where the two systems sat back to back in two rows, each row hooked up to a different test store (not feasible in a production store environment). |
|
|
|
|
|
|
| ▲ | ChrisGreenHeur a day ago | parent | prev [-] |
| Surely you have logs from the production systems? Just gather the logs, run them through the dev box, and verify the end results match between the two. You don't actually need the dev box to sit next to the production system. |
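The log-replay idea can be sketched like this (field and function names are made up for illustration; as the replies note, such logs could hold SKUs, quantities, and recorded totals, but never real card numbers):

```python
def replay(log_records, new_system_total):
    """Run each logged transaction through the new implementation and
    compare against the total the old system recorded at the time."""
    report = {"matched": 0, "diverged": []}
    for rec in log_records:
        got = new_system_total(rec["skus"])
        if got == rec["recorded_total"]:
            report["matched"] += 1
        else:
            report["diverged"].append(
                (rec["txn_id"], rec["recorded_total"], got))
    return report
```

A divergence report like this is cheap to produce and points straight at the transactions (and hence pricing rules) the new system handles differently.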
| |
| ▲ | brendoelfrendo 21 hours ago | parent [-] | | You cannot, under any circumstances, keep a real card # and use it as test data. I think that's where this conversation is getting hung up, because the idea of running a transaction through prod and then doing the same in test to see if it matches isn't something you can do. I mean, of course you can throw the prices and UPCs at the new system and verify that the new system's math matches the old system's, but that's only the most basic function of a POS system. Testing a transaction end-to-end would have to be done with synthetic data in an isolated environment, and I'll assume that's what OP is trying to articulate. | |
| ▲ | antihero 17 hours ago | parent | next [-] | | There's all this stuff, but I remember when I was a junior freelancer, analysing a calendar availability sync script for a small holiday bookings company (not the big one). The hosts would have a publicly accessible Google Calendar with their bookings on it, which the script I was fixing would pull from. Turns out, most of the hosts stored their customers' full card numbers + expiry etc. in the comment field of the booking. | |
| ▲ | ChrisGreenHeur 20 hours ago | parent | prev [-] | | The reproduction is always fake to some extent; that doesn't matter - the point is to do as good a job as you can. For example, you can have a fake transaction server with made-up credit card numbers mapped to fake accounts that always have enough money, unless the records show they didn't. | |
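A fake authorizer along those lines might look like this (purely illustrative - made-up class and PANs, not any real processor's API): every fake account approves, except the ones the test fixture says should decline.

```python
class FakeAuthServer:
    """Stub payment authorizer for test environments: made-up PANs map
    to fake accounts that always approve, unless listed as declining."""

    def __init__(self, declining_accounts=()):
        self.declining = set(declining_accounts)
        self.log = []            # keep a trail so tests can assert on it

    def authorize(self, fake_pan: str, amount_cents: int) -> bool:
        approved = fake_pan not in self.declining
        self.log.append((fake_pan, amount_cents, approved))
        return approved
```

Pointing the system under test at a stub like this lets you exercise both the approve and decline paths without any real card ever entering the environment.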
| ▲ | Ghoelian 19 hours ago | parent | next [-] | | I've also worked with payment processors a lot.
The ones I've used have test environments where you can fake payments, and some of them (Adyen does this) even give you actual test debit and credit cards, with real IBANs and things like that. | |
| ▲ | skeeter2020 14 hours ago | parent | next [-] | | Don't know anything about the OP's system other than "POS", but the bug they experienced - and (maybe?) all the typical integration stuff like inventory management - is very complex and wouldn't manifest as a payment-processing failure. I'm doubtful that anyone's production inventory or accounting systems allow for "fake" transactions that can be validated by an e2e test. | |
| ▲ | shagie 12 hours ago | parent [-] | | POS stands for Point of Sale in this context. It was Linux running on (year-appropriate) https://www.hp.com/us-en/solutions/pos-systems-products.html... - plus all the add-on peripherals. The POS software was standalone-ish (you could, in theory, hook a generator up to a register and the primary store server and process cash, paper checks, and likely store-branded credit cards)... it wouldn't be pleasant, but it could be done. The logic for discounts, sales, and taxes (and whether an item had sales tax in that jurisdiction) was all on the register. The store server logged the transaction and handled inventory and price lookup, but didn't do price (sale, tax) calculations itself. |
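That split of responsibilities - store server resolves SKU to base price, register applies sale pricing, discounts, and per-jurisdiction taxability - can be sketched roughly like this (all names and the integer-cents representation are hypothetical, not the actual system):

```python
def register_total_cents(lines, price_lookup, sale_prices, taxable, tax_rate_pct):
    """lines: [(sku, qty), ...]. price_lookup (cents) comes from the
    store server; everything after that is computed on the register."""
    subtotal = tax = 0
    for sku, qty in lines:
        unit = sale_prices.get(sku, price_lookup[sku])  # sale beats base price
        ext = unit * qty
        subtotal += ext
        if taxable.get(sku, True):       # jurisdiction-specific flag
            tax += ext * tax_rate_pct // 100
    return subtotal + tax
```

The point of the sketch is the architecture, not the math: all the interesting (and buggy) pricing behavior lives on the register, which is exactly why replaying SKU lists through a register was a meaningful test.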
| |
| ▲ | brazzy 16 hours ago | parent | prev [-] | | It's even public: https://docs.adyen.com/development-resources/test-cards-and-... |
| |
| ▲ | CamouflagedKiwi 18 hours ago | parent | prev [-] | | At some point you start to get far away from reality, though. If the cards have fake numbers, then the other auth information is also incorrect - e.g. the CVC won't match, and neither will the PIN (depending on the format in use, maybe). You can fake all that stuff too, but then how much of the system are you really testing? | |
| ▲ | nenxk 17 hours ago | parent [-] | | I mean, in his example, the discount bug they ran into wouldn't have needed any card numbers; it could have been discovered with fake/cloned transactions that contained no customer detail. In this case it seems it would have been best to test the payment processing in person at a single store, and then also test with sales logs from multiple other locations. | |
| ▲ | ChrisGreenHeur 15 hours ago | parent [-] | | Yep - it sounds like the first implementation step really should have been to gather a large test data set, understand it, and develop the system with it in mind, starting with making tests from the test data. | |
| ▲ | skeeter2020 14 hours ago | parent [-] | | They explained the scenario though and it seems like a combination of rarer edge cases. It's great to think your awesome dev team and QA would have collected test data representing this ahead of time, but surely you've all been caught by this? I know I have; that's why we don't have flawless systems at launch. |
|
|
|
|
|
|