Posted by rosscomputerguy 2 days ago
But all open FPGA projects lack the I/O required for a good design: they have neither SerDes hardware nor DDR I/O cells.
If those numbers are at all right, it puts it in useful territory. Very much so for a first spin.
For a first spin it looks pretty useful overall. The only nitpick I have is that `operation` on the DSP tile should be driven from fabric instead of config (hardcoded in the bitstream); otherwise I don't see a convenient way of resetting the accumulator(?)
I know that I/O is really the second thing that sells FPGAs. I did design a basic SerDes that should just work for this first generation. I do want to do DDR I/O cells in the future.
How fast will the SerDes run, 50 MHz? It is not clear to me from the serdes_tile.dart source code. Can you share the Verilog files?
On the I/O side, getting even a basic 400 MHz oversampled SerDes into a first-gen test chip puts this way ahead of most academic open FPGA efforts.
Really looking forward to seeing the Terra family expand and how the test chips perform.
On these multi-party shuttle projects this gets simplified into a price list where they quote you a high ball-park number that covers the cost of your test chips by a wide margin. The actual cost is never disclosed, certainly not on price lists.
A mask-set maker and a chip fab create half of your product; they own that intellectual product and they won't even tell you what it cost them. They merge their product with yours, and now they co-own your product. There are only a few competing companies worldwide (and fewer every year), and they compete on all this non-disclosed stuff. Prices above all. Never believe what you read on the internet, especially in the chip-war industry.
If you want to make better chips, like the low-power Apple Silicon for example, you create your own EDA software tools to enable the innovation. Creating a new transistor like the CFET [1] means writing new physics simulation tools, for example.
The outdated, buggy 1990s-style OpenLane software, for example, limits what kind of RAM transistors you can make and the complexity of your design.
My team makes asynchronous chips, free-space optics photonics, ultra-dense two-transistor SRAM, niobium SFQ chips, and wafer-scale integrations. All require bespoke software: simulation tools, netlist-rewriting tools, cross-reticle stepper exposure software (a software change in a $400 million machine), etc. Making hardware with near-atomic-size structures is mostly a software job. Hardware is software crystallized early, as Alan Kay quips.
[1] https://www.imec-int.com/en/articles/imec-puts-complementary...
The problem is you can make test chips like Aegis for around $10 each (depending on the yield, i.e. how many of the first 1000 chips actually work), but they are just that: test chips.
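The per-chip figure depends directly on yield; a minimal sketch of that amortization (the $10,000 run cost and 1000-chip batch are hypothetical numbers for illustration, not figures from the thread):

```python
def cost_per_working_chip(run_cost, chips_per_run, yield_fraction):
    """Amortize one shuttle run's cost over only the chips that work."""
    return run_cost / (chips_per_run * yield_fraction)

# Hypothetical: a $10,000 run producing 1000 test chips
print(cost_per_working_chip(10_000, 1000, 1.0))  # 10.0 per chip at 100% yield
print(cost_per_working_chip(10_000, 1000, 0.5))  # 20.0 per chip at 50% yield
```

Halving the yield doubles the effective cost per working chip, which is why "around $10" carries the yield caveat.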
In the case of Morphle Logic we make wafer-scale integrations (WSI) with 10 billion transistors at 180 nm for $750. That yields around 300 million 'gates'; the largest commercial FPGAs barely get to 3 million. So our Morphle Logic WSI is the largest and fastest (up to 12 GHz) FPGA you could get, if we can find a few hundred buyers who want to pay up front (crowdfunding). Please email me if you are interested in such an enormous, fast FPGA.
I'll buy an Aegis FPGA test chip just to find out how hard it is to test a test chip.
Good luck RossComputerGuy, I hope you get working chips back. The same fab and supplier lost our first taped-out chips in the mail... and then they went bankrupt.
I struggled a bit to understand the explanation on github, but eventually got to something that made sense. It would have helped me if it said up front that
- 0, 1, N and Y pass the input signal on (works like a | or - in the input direction)
- when a circuit has both a 0 and a 1 output value, the output becomes 0 (which is why 11 is an AND and not an OR)
Hopefully that's correctly understood? If so, maybe consider updating the explanation for the next person.
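To make my reading concrete, here's a toy Python model of those two rules (the function names and the drive-list representation are mine, not from the repo, and it only holds if I've understood the docs correctly):

```python
def resolve(drives):
    """Conflict rule as I understand it: if a circuit is driven with
    both a 0 and a 1, the 0 wins and the circuit reads as 0."""
    return 0 if (0 in drives and 1 in drives) else drives[0]

def cell_11(a, b):
    """A '11' cell: each input drives the shared circuit with its own
    value, so with the 0-wins rule the result is AND, not OR."""
    return resolve([a, b])

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", cell_11(a, b))  # matches the AND truth table
```

With a 1-wins rule instead, the same wiring would be an OR, which is why the conflict-resolution direction matters so much in the explanation.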
Also, a question: Does a 0 and 1 on the same circuit consume more power than two 0s or two 1s due to the conflicting values? Or is it solved with transistors at the cost of propagation delay? Or something else?
We made seven different implementations of Morphle Logic, some of which are lower power, use fewer transistors, do asynchronous logic in different ways, or are based on superconducting Josephson junctions instead of transistors.
In this particular case the two tokens probably consume the same amount of power regardless of their value, but only measurements will tell.
Morphle Logic WSI has over 47,169,811 yellow cells. You could say that a single yellow Morphle Logic cell is more complex than ten Versal cells, but it's an apples-and-oranges comparison. However you count it, the $500 Morphle Logic WSI (cost price) has 10 billion transistors, while the AMD Versal Premium costs over $100,000 and is effectively smaller in terms of gates, LUTs, or cells even though it has 138 billion transistors.
If I made the Morphle Logic WSI in 2 nm TSMC, it would have more than 52 trillion transistors [1], at least 245,283,018,867 yellow cells, and cost over $22,500. You could easily emulate several AMD Versal Premium VP1902 FPGAs on the wafer.
I'll also note that it has a ton of SRAM onboard which doesn't shrink well, so I'm not convinced just by that extrapolation that you could eclipse it with a simple lithography shrink. Unless you really meant several per wafer, which doesn't really feel like a hard target...
Today the manufacturing process could be optimized better than 25 years ago, so some logic circuits much simpler than a 64-bit CPU (the earlier ones were 32-bit CPUs for integers, but they had 64-bit/80-bit FPUs working at full speed), i.e. with far fewer gate delays per pipeline stage, might be able to reach 12 GHz.
However, something like a 64-bit ALU will certainly not reach 12 GHz. Even a 32-bit ALU is very unlikely to reach 12 GHz. Simple things, like shift registers and Galois-field counters, might reach such speeds, or even higher.
The next CMOS process generation, i.e. 130 nm, already allows making complex processors with more than half of the maximum clock frequency of the fastest processors of today. It also allows making analog amplifiers and mixers for the 5 GHz WiFi frequency bands.
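A back-of-the-envelope check of why a wide ALU at 12 GHz is implausible at such a node (the 0.36 ps-per-nm FO4 delay is a textbook rule of thumb and the "10+ FO4 per ALU stage" figure is a rough estimate, both my assumptions, not numbers from the thread):

```python
period_ps = 1e12 / 12e9   # one 12 GHz clock period, ~83.3 ps
fo4_ps = 0.36 * 130       # rule-of-thumb FO4 inverter delay at 130 nm, ~47 ps
fo4_per_cycle = period_ps / fo4_ps
print(round(fo4_per_cycle, 1))  # ~1.8 FO4 delays per cycle
# A 32/64-bit ALU typically needs on the order of 10+ FO4 per stage,
# so only trivially short logic paths (shifters, LFSR-style counters)
# could fit in a 12 GHz cycle at this node.
```

Under these assumptions, fewer than two FO4 delays fit in a cycle, consistent with the claim that only very simple structures reach such speeds.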
At 110 nm I measured a transistor switching the next transistor on its output. I can prove it; can you disprove it?
A consistent 12 GHz signal cascade was (repeatedly) tested and confirmed on a 28 nm asynchronous chip [1].
Why would it be impossible? [2].
We measured 800 GHz and terahertz clocks on niobium superconducting Josephson junctions [3,4,5].
[1] https://byrdsight.com/asynchronous-technology-has-its-time-f... (see also the slides in the video talk)
[2] "If an elderly but distinguished scientist says that something is possible, he is almost certainly right; but if he says that it is impossible, he is very probably wrong." - Arthur C. Clarke
[3] Ivan Sutherland keynote: Single Flux Quantum (SFQ) Digital Electronics, digital circuits totally distinct from Quan https://www.youtube.com/watch?v=KMVV3ErGSVY
[4] https://www.researchgate.net/profile/Jerome-Pety/publication...
[5] https://scholar.google.com/scholar?hl=en&as_sdt=0,5&qsp=3&q=...
It's very common to X-ray the dies, especially for debugging. Also common is to etch a die layer by layer, take photos, and rebuild the circuit schematic, mainly for reverse engineering, but I've seen companies do it to their own dies too.
Things get more blurry at the board level, the combinations of suppliers and service providers are endless.