edgecase
Author: StJohn Piano
Published: 2018-05-10
Datafeed Article 51
This article has been digitally signed by Edgecase Datafeed.
58954 words - 12963 lines - 325 pages





GOAL



I want to read and verify a standard raw bitcoin transaction.

This will involve:
- learning about the transaction format
- checking the validity of the transaction signature






CONTENTS



- Goal
- Contents
- Brief Summary
- Transaction Verification Recipe
- Downloadable Assets
- Representative Sample Of Bash Commands Used
- Notes / Discoveries
- Slightly-Less-Certain Notes
- Further Work
- Thoughts
- Project Log






BRIEF SUMMARY



I chose a bitcoin transaction to analyse. I studied prior work by Ken Shirriff (and retraced many of his steps) in order to learn how to read and verify a transaction. I then applied this process to my selected transaction and successfully verified it.

I developed code that significantly streamlined the verification process. This code is available in the Downloadable Assets section. A recipe for repeating this process is available in the Transaction Verification Recipe section. This recipe and code can only handle transactions of a particular type (one-input single-signature Pay-To-Public-Key-Hash (P2PKH)).

I learned a lot about how bitcoin transactions are structured. See the Notes / Discoveries section in particular.

In the Project Log section, I included a fairly thorough snapshot sequence of the development of parser1.py (a script for reading a transaction and reporting its structure to the user), which may be helpful for those who are new to programming and are wondering what they can usefully do.






TRANSACTION VERIFICATION RECIPE



Create a work directory. Download all assets associated with this article and place them into the work directory.

Unpack ecdsa-0.10.tar.gz (e.g. with tar -zxvf). Within the resulting ecdsa-0.10 directory, find the ecdsa directory and copy it to the work directory.

The inputs for each script must be edited manually. None have command-line argument processing.

On blockchain.info, a blockchain explorer web application, the raw form of a transaction can be looked up by using a request in this format:

http://blockchain.info/tx/[txid]?format=hex

where [txid] is the transaction identifier.

Example:
http://blockchain.info/tx/9ff7742262f68c560efd352f435b977c024797dc538f7efa08a467039858e42f?format=hex



### TRANSACTION VERIFICATION PROCESS (version 1)


Select a transaction to verify. It must have exactly one input. It can have multiple outputs. The input and the outputs must all be single-signature Pay-To-Public-Key-Hash (P2PKH). The public key in the input scriptSig may be compressed.

Get the raw form of the selected transaction. Save this in a file in the work directory.

Run parser2.py, targeted at this file.
- In the single input information, get these values:
-- [derived property] previous_output_hash (no spaces, big-endian)
-- previous_output_index
-- scriptSig

Use the "previous_output_hash (no spaces, big-endian)" value (a.k.a. the txid) to look up the transaction that supplied an output for use as the input to the selected transaction. Get the raw form of this previous transaction. Save it in a file in the work directory.

Run parser2.py, targeted at this file.
Use the previous_output_index found earlier to select the appropriate output in the result.
- In the information for this output, get this value:
-- scriptPubKey

For the selected transaction to be valid, its scriptSig needs to satisfy the cryptographic condition set by the previous transaction's scriptPubKey.

Use the scriptPubKey and the scriptSig as inputs to script_processor1.py.
Run script_processor1.py. From the output, get these values:
- signature_data
- public_key_data
- public_key_hash

Use public_key_data and public_key_hash as inputs to check_hash1.py. This script will check that the public_key_data in the selected transaction's scriptSig hashes to the public_key_hash in the previous transaction's scriptPubKey.

Use these items produced by script_processor1.py:
- signature_data
- public_key_data
this item produced by parser2.py when parser2.py was targeting the previous transaction:
- scriptPubKey
and the file containing the selected transaction
as inputs to check_signature2.py.

check_signature2.py will raise an informative error if the public_key_data is compressed.

If the public_key_data is compressed, uncompress it using uncompress.py. From the output, get the X and Y values, and use them as input for check_validity_of_point.py, which will check whether the uncompressed form of the public key is a valid point on the secp256k1 curve (if successful, the result is "True"). From the output of uncompress.py, get the uncompressed form of the public key and use it as input to check_signature2.py. Run check_signature2.py again.

check_signature2.py will use the public key to check the validity of the signature of the selected transaction in signable form. The public key and the signature are stored in the selected transaction's scriptSig. The scriptPubKey of the previous transaction is used as part of the transaction-in-signable-form.

If both check_hash1.py and check_signature2.py return successful results, then the selected transaction has been verified.


### END TRANSACTION VERIFICATION PROCESS (version 1)






DOWNLOADABLE ASSETS



The following assets comprise a toolset for verifying a standard transaction. See the Transaction Verification Recipe section.


bjorn_edstrom_ripemd160.py

check_hash1.py

check_signature2.py

check_validity_of_point.py

ecdsa-0.10.tar.gz

parser2.py

pypy_sha256.py

script_processor1.py

uncompress.py






REPRESENTATIVE SAMPLE OF BASH COMMANDS USED



aineko:~ stjohnpiano$ input="48 4d 40 d4 5b 9e a0 d6 52 fc a8 25 8a b7 ca a4 25 41 eb 52 97 58 57 f9 6f b5 0c d7 32 c8 b4 81"


[remove spaces from a string]
aineko:~ stjohnpiano$ echo $input | sed 's/ //g'

484d40d45b9ea0d652fca8258ab7caa42541eb52975857f96fb50cd732c8b481

[count characters in a string]
aineko:~ stjohnpiano$ echo -n "484d40d45b9ea0d652fca8258ab7caa42541eb52975857f96fb50cd732c8b481" | wc -c

64


aineko:work stjohnpiano$ python --version

Python 2.7.13

[change permissions of file]
aineko:work stjohnpiano$ chmod 700 parser1.py


aineko:work stjohnpiano$ ./parser1.py

hello

[add a space after every two characters in a string]
aineko:~ stjohnpiano$ echo -n "df3bd30160e6c6145baaf2c88a8844c13a00d1d5" | sed 's/../& /g'

df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5

aineko:~ stjohnpiano$ pubKeyHashB="df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5"


aineko:~ stjohnpiano$ pubKeyHash="df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5"


[check equality of two strings]
aineko:~ stjohnpiano$ [[ "$pubKeyHashB" = "$pubKeyHash" ]] && echo equal || echo not-equal

equal

[unpack zipped tape archive file]
aineko:Downloads stjohnpiano$ tar -zxvf ecdsa-0.10.tar.gz







NOTES / DISCOVERIES




My working definition of a standard transaction:
- It can have multiple inputs and outputs.
- The inputs and the outputs must all be single-signature Pay-To-Public-Key-Hash (P2PKH).
- The public key in the input scriptSig may be compressed.


The verification recipe I have developed in this article has been written and tested only with single-input transactions. Some code sections would have to be rewritten, expanded, and tested in order to handle multiple-input transactions.


Here is my current working description of the final raw form of a transaction:

Expected format (raw hex bytes):
- version: 4 bytes (little-endian)
- input_count: (var_int)
- inputs:
-- previous_output_hash: 32 bytes (little-endian)
-- previous_output_index: 4 bytes (little-endian)
-- script_length: (var_int)
-- scriptSig
-- sequence: 4 bytes (little-endian)
- output_count: (var_int)
- outputs:
-- value: 8 bytes (little-endian)
-- script_length: (var_int)
-- scriptPubKey
- block lock time: 4 bytes


Note: I might find in future that other transactions have formats that don't match this expected format.

Note: A var_int ("variable integer") value is used to store a number of variable length. It can be used to indicate the length in bytes of the next item, e.g. script_length, which contains the byte length of scriptSig. It can also be used to store a number, e.g. input_count (number of input items). A one-byte var_int has a maximum value of 0xFC (252 in decimal).
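The expected format above can be walked through mechanically. Here is a minimal Python 3 sketch of such a walk (the article's own scripts target Python 2.7). It is not parser2.py: the names read_var_int, read_script and parse_tx are illustrative, and the lock time is simply read as a 4-byte little-endian integer. The raw hex is the selected transaction analysed in this article.

```python
def read_var_int(data, pos):
    """Return (value, next_pos) for the var_int that starts at data[pos]."""
    first = data[pos]
    if first < 0xfd:
        return first, pos + 1
    size = {0xfd: 2, 0xfe: 4, 0xff: 8}[first]
    return int.from_bytes(data[pos + 1:pos + 1 + size], "little"), pos + 1 + size


def read_script(data, pos):
    """Return (script_bytes, next_pos) for a var_int length followed by a script."""
    length, pos = read_var_int(data, pos)
    return data[pos:pos + length], pos + length


def parse_tx(raw_hex):
    """Parse a raw transaction (hex string) into a dictionary of fields."""
    data = bytes.fromhex(raw_hex)
    tx = {"version": int.from_bytes(data[0:4], "little"), "inputs": [], "outputs": []}
    input_count, pos = read_var_int(data, 4)
    for _ in range(input_count):
        txin = {
            # Stored little-endian; reversed here to give the usual txid form.
            "previous_output_hash": data[pos:pos + 32][::-1].hex(),
            "previous_output_index": int.from_bytes(data[pos + 32:pos + 36], "little"),
        }
        txin["script_sig"], pos = read_script(data, pos + 36)
        txin["sequence"] = int.from_bytes(data[pos:pos + 4], "little")
        pos += 4
        tx["inputs"].append(txin)
    output_count, pos = read_var_int(data, pos)
    for _ in range(output_count):
        txout = {"value": int.from_bytes(data[pos:pos + 8], "little")}  # satoshis
        txout["script_pubkey"], pos = read_script(data, pos + 8)
        tx["outputs"].append(txout)
    tx["lock_time"] = int.from_bytes(data[pos:pos + 4], "little")
    if pos + 4 != len(data):
        raise ValueError("transaction length does not match the expected format")
    return tx


RAW_TX = "01000000014f8cad2ca3b602750e06c4300d91b22c17dd97bb5ce635d860df8b0ab325b336010000006b483045022100d8321c3c4b3fc3a34e30547d1cb6aea6855cad330b590b53e824f647dd49239902201e9056e82297a2108b1969cc497150734ab78f51cbf979885afc104f70a65150012102a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261fdffffff02e8d94100000000001976a914338cdde52f708236affa5675f969606ff846ee6f88acc4716e01000000001976a9143496950f1a01285a1605ac9337c06a5596b9fcd888accaee0700"

tx = parse_tx(RAW_TX)
print(tx["inputs"][0]["previous_output_hash"])
print([out["value"] for out in tx["outputs"]])
```

The final length check is a deliberate choice: a transaction that does not fit this layout raises an error instead of being silently misread.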


On blockchain.info, a blockchain explorer web application, the raw form of a transaction can be looked up by using a request in this format:

http://blockchain.info/tx/[txid]?format=hex

where [txid] is the transaction identifier.

Example:
http://blockchain.info/tx/9ff7742262f68c560efd352f435b977c024797dc538f7efa08a467039858e42f?format=hex



I chose the transaction with txid
9ff7742262f68c560efd352f435b977c024797dc538f7efa08a467039858e42f
in the block with hash
0000000000000000001a226aea1237aa124c741d73e897cc8384f273578a682e
at block height 519883 for analysis. It has 1 input and 2 outputs. All the addresses start with a "1", i.e. are not multisig.


Note: txid == "transaction id".


In raw form, this transaction is:

01000000014f8cad2ca3b602750e06c4300d91b22c17dd97bb5ce635d860df8b0ab325b336010000006b483045022100d8321c3c4b3fc3a34e30547d1cb6aea6855cad330b590b53e824f647dd49239902201e9056e82297a2108b1969cc497150734ab78f51cbf979885afc104f70a65150012102a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261fdffffff02e8d94100000000001976a914338cdde52f708236affa5675f969606ff846ee6f88acc4716e01000000001976a9143496950f1a01285a1605ac9337c06a5596b9fcd888accaee0700



In space-separated hex bytes, this transaction is:

01 00 00 00 01 4f 8c ad 2c a3 b6 02 75 0e 06 c4 30 0d 91 b2 2c 17 dd 97 bb 5c e6 35 d8 60 df 8b 0a b3 25 b3 36 01 00 00 00 6b 48 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01 21 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61 fd ff ff ff 02 e8 d9 41 00 00 00 00 00 19 76 a9 14 33 8c dd e5 2f 70 82 36 af fa 56 75 f9 69 60 6f f8 46 ee 6f 88 ac c4 71 6e 01 00 00 00 00 19 76 a9 14 34 96 95 0f 1a 01 28 5a 16 05 ac 93 37 c0 6a 55 96 b9 fc d8 88 ac ca ee 07 00



In subdivided readable form, this transaction is:

- [derived property] byte length: 226
- version: 01 00 00 00
- input_count: 01
- [derived property] input_count_decimal: 1
- input #0:
-- previous_output_hash: 4f 8c ad 2c a3 b6 02 75 0e 06 c4 30 0d 91 b2 2c 17 dd 97 bb 5c e6 35 d8 60 df 8b 0a b3 25 b3 36
-- [derived property] previous_output_hash (no spaces, big-endian): 36b325b30a8bdf60d835e65cbb97dd172cb2910d30c4060e7502b6a32cad8c4f
-- previous_output_index: 01 00 00 00
-- [derived property] previous_output_index_decimal: 1
-- script_length: 6b
-- [derived property] script_length_decimal: 107
-- scriptSig: 48 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01 21 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61
-- sequence: fd ff ff ff
- output_count: 02
- [derived property] output_count_decimal: 2
- output #0:
-- value: e8 d9 41 00 00 00 00 00
-- [derived property] output value in bitcoin: 0.04315624
-- script_length: 19
-- [derived property] script_length_decimal: 25
-- scriptPubKey: 76 a9 14 33 8c dd e5 2f 70 82 36 af fa 56 75 f9 69 60 6f f8 46 ee 6f 88 ac
- output #1:
-- value: c4 71 6e 01 00 00 00 00
-- [derived property] output value in bitcoin: 0.24015300
-- script_length: 19
-- [derived property] script_length_decimal: 25
-- scriptPubKey: 76 a9 14 34 96 95 0f 1a 01 28 5a 16 05 ac 93 37 c0 6a 55 96 b9 fc d8 88 ac



In subdivided readable form, the scriptSig is:
- PUSHDATA: 48
- [derived property] PUSHDATA decimal value: 72
- signature_data: 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01
- PUSHDATA: 21
- [derived property] PUSHDATA decimal value: 33
- public_key_data: 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61
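The subdivision of the scriptSig shown above can be reproduced with a few lines of Python 3. This is a minimal sketch, not script_processor1.py; the name split_pushes is illustrative, and it assumes the script consists only of direct pushes (byte values 0x01-0x4b), which holds for a standard P2PKH scriptSig.

```python
def split_pushes(script):
    """Split a script made only of direct pushes (0x01-0x4b) into its data items."""
    items = []
    pos = 0
    while pos < len(script):
        n = script[pos]
        if not 0x01 <= n <= 0x4b:
            raise ValueError("unsupported opcode 0x%02x at offset %d" % (n, pos))
        item = script[pos + 1:pos + 1 + n]
        if len(item) != n:
            raise ValueError("push runs past the end of the script")
        items.append(item)
        pos += 1 + n
    return items


# scriptSig of the selected transaction, copied from the raw form.
SCRIPT_SIG = bytes.fromhex(
    "483045022100d8321c3c4b3fc3a34e30547d1cb6aea6855cad330b590b53e824f647dd492399"
    "02201e9056e82297a2108b1969cc497150734ab78f51cbf979885afc104f70a6515001"
    "2102a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261"
)

signature_data, public_key_data = split_pushes(SCRIPT_SIG)
print(len(signature_data), len(public_key_data))
```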



In subdivided readable form, the scriptPubKey from the previous transaction (that supplied the unspent output used as the input for this transaction) is:
- OP_DUP: 76
- OP_HASH160: a9
- PUSHDATA: 14
- [derived property] PUSHDATA decimal value: 20
- public_key_hash: 7e 76 71 29 b1 5e e6 d7 e6 be 8a c3 a1 00 dd 29 c4 c6 7e 63
- OP_EQUALVERIFY: 88
- OP_CHECKSIG: ac



The inputs for a new transaction are a set of as-yet-unspent outputs from previous transactions. Inputs transfer value out of a previous address and outputs transfer value into a new address. (It would perhaps be more accurate to say that value is "associated with" an address, rather than that the value is "contained within" it.) A transaction is a collection of inputs and outputs, formatted into a single item.

Blocks are collections of transactions, formatted into a single item.

The bitcoin amount associated with an address does not specifically exist anywhere on the blockchain. It is an implicit value - it is calculated as the sum of the values of the currently-unspent outputs associated with an address (i.e. sent to that address by previous transactions). When some of these unspent outputs are used in a new transaction, value is transferred out of the address.

A bitcoin address is not just a global ID, like an account number in a bank. A standard* address is a value that encodes a hash of a bitcoin public key. The network nodes, following the bitcoin ruleset, verify that new transactions that spend bitcoin from an address are signed by the private key corresponding to a public key that hashes to the value encoded in that address. The nodes reject transactions that do not satisfy this cryptographic condition.

* My working definition of a standard address is: single-signature Pay-To-Public-Key-Hash (P2PKH).

The cryptographic proof is contained in each input (a.k.a. "as-yet-unspent-output") and is specific to that input. For standard* inputs, this proof consists of:
- A public key that hashes to the value encoded in the address that holds the unspent-output.
- A signature, made by the private key associated with the public key, of the entire transaction (albeit an altered form of the transaction).

* My working definition of a standard input is: an input that spends an unspent output from a standard single-signature Pay-To-Public-Key-Hash (P2PKH) address.

If a public key is compressed:
- This will produce a different hash and thus a different address.
- The compressed form must also be used in a transaction input that spends from this address. (It will be hashed and the result compared to the hash encoded in the address.)

It is worth noting that when an address is used to receive bitcoin, only the hash of the public key is revealed. Later, if and when bitcoin is sent from this address, the actual public key will be revealed in the relevant input of the transaction.


Bitcoin ECDSA public keys represent a point on a particular Elliptic Curve (EC) defined in secp256k1. This curve is y^2 = x^3 + 7, evaluated using modular arithmetic over a prime field. In their traditional uncompressed form, public keys contain an identification byte, a 32-byte X coordinate, and a 32-byte Y coordinate.

The public key is derived from the private key by elliptic curve point multiplication. In uncompressed form it holds 512 bits of coordinate data (two 256-bit values).


Each input used must be entirely spent in a transaction.

Suppose that an address that you own contains 1 bitcoin and that you want to send 0.01 bitcoin to someone else from this address. The transaction must spend the entire 1 bitcoin. The solution is to spend the remaining 0.99 bitcoin to another of your addresses, which can in fact be the original address that contained the 1 bitcoin.

Transactions can also include a mining fee. The fee is an implicit value - if there is any bitcoin left over after adding up the inputs and subtracting the outputs, the remainder is the fee paid to the miner. The fee isn't strictly required, but transactions without a fee will be a low priority for miners and may not be processed for days or may be discarded entirely.


In order to learn how to read and verify a signed transaction, I studied Ken Shirriff's article Bitcoins the hard way: Using the raw Bitcoin protocol, in which he created and signed a transaction, and the associated code [link]. I read and verified his signed transaction, and used the process/knowledge from this to read and verify my selected transaction.


The txid of Ken Shirriff's signed transaction is:
3f285f083de7c0acabd9f106a43ec42687ab0bebe2e6f0d529db696794540fea


Here is Ken Shirriff's signed transaction in raw form:

0100000001484d40d45b9ea0d652fca8258ab7caa42541eb52975857f96fb50cd732c8b481000000008a47304402202cb265bf10707bf49346c3515dd3d16fc454618c58ec0a0ff448a676c54ff71302206c6624d762a1fcef4618284ead8f08678ac05b13c84235f1654e6ad168233e8201410414e301b2328f17442c0b8310d787bf3d8a404cfbd0704f135b6ad4b2d3ee751310f981926e53a6e8c39bd7d3fefd576c543cce493cbac06388f2651d1aacbfcdffffffff0162640100000000001976a914c8e90996c7c6080ee06284600c684ed904d14c5c88ac00000000



Here is Ken Shirriff's signed transaction in subdivided readable form:

- [derived property] byte length: 223
- version: 01 00 00 00
- input_count: 01
- [derived property] input_count_decimal: 1
- input #0:
-- previous_output_hash: 48 4d 40 d4 5b 9e a0 d6 52 fc a8 25 8a b7 ca a4 25 41 eb 52 97 58 57 f9 6f b5 0c d7 32 c8 b4 81
-- [derived property] previous_output_hash (no spaces, big-endian): 81b4c832d70cb56ff957589752eb4125a4cab78a25a8fc52d6a09e5bd4404d48
-- previous_output_index: 00 00 00 00
-- [derived property] previous_output_index_decimal: 0
-- script_length: 8a
-- [derived property] script_length_decimal: 138
-- scriptSig: 47 30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01 41 04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd
-- sequence: ff ff ff ff
- output_count: 01
- [derived property] output_count_decimal: 1
- output #0:
-- value: 62 64 01 00 00 00 00 00
-- [derived property] output value in bitcoin: 0.00091234
-- script_length: 19
-- [derived property] script_length_decimal: 25
-- scriptPubKey: 76 a9 14 c8 e9 09 96 c7 c6 08 0e e0 62 84 60 0c 68 4e d9 04 d1 4c 5c 88 ac



In subdivided readable form, the scriptSig is:
- PUSHDATA(71)
- [transaction signature] 30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01
- PUSHDATA(65)
- [bitcoin public key] 04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd


In subdivided readable form, the scriptPubKey from Ken Shirriff's previous transaction (that supplied the unspent output used as the input for this transaction) is:
- OP_DUP
- OP_HASH160
- PUSHDATA(20)
- df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5
- OP_EQUALVERIFY
- OP_CHECKSIG



The value in bitcoin of an output is actually an integer and is denominated in satoshis, the smallest unit of bitcoin. Bitcoin values used for display often include a decimal point, but on the blockchain they are all actually satoshi integer values.
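As a small illustration of the satoshi / bitcoin relationship, here is a Python 3 sketch that converts the 8-byte little-endian value field into satoshis and then into a display string. The function names are illustrative. Integer arithmetic is used throughout, so no floating-point rounding can occur.

```python
SATOSHIS_PER_BITCOIN = 100000000


def value_in_satoshis(value_bytes):
    """Interpret the 8-byte little-endian output value field as an integer."""
    if len(value_bytes) != 8:
        raise ValueError("an output value field is exactly 8 bytes")
    return int.from_bytes(value_bytes, "little")


def format_bitcoin(satoshis):
    """Render a satoshi amount as a bitcoin string with 8 decimal places."""
    whole, fraction = divmod(satoshis, SATOSHIS_PER_BITCOIN)
    return "%d.%08d" % (whole, fraction)


# Output #0 of the selected transaction: e8 d9 41 00 00 00 00 00
satoshis = value_in_satoshis(bytes.fromhex("e8d9410000000000"))
print(satoshis, format_bitcoin(satoshis))  # 4315624 0.04315624
```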


In a transaction, scriptSig and scriptPubKey are small stack programs in a Bitcoin-specific language called Script. These programs contain the instructions for verifying the validity of a transaction input. A bitcoin node must use an implementation of a stack machine to run these programs. Script is simple, stack-based, and processed from left to right.

Some reading indicates that a "stack" is the abstract idea of a sequential collection of objects on which certain operations are defined. A stack machine can process a stack and produce an output. Stack machine implementations may vary across platforms but must behave in the same way in order to be considered to be the same stack machine. A stack machine can therefore be implemented in Python on one computer and in C++ on another computer, but as long as the two stack machine implementations can perform the same operations on the same domain, they can pass data (stack items) between themselves without any trouble.

The stack code does not implement a SHA256 hash function - it simply indicates when and on what data a SHA256 function should operate. The details of implementing SHA256 are left to the stack machine.

Stack code can be manually processed by reading the code and performing the indicated operations e.g. checking that the two top stack items are equal or applying the SHA256 hash function to the data in a specified stack item.

The scriptPubKey is contained within an output of a previous transaction (the one that sent bitcoin to a particular address). The scriptSig is contained within an input of a new transaction (the one that spends bitcoin from a particular address).

The scriptPubKey in the old transaction defines the condition for spending the bitcoins (a hash of a public key). The scriptSig in the new transaction provides data that satisfies this condition (a signature and a public key).

The scriptPubKey from the previous transaction is needed for transaction validation. An input in the new transaction will identify the previous transaction and include the index of the relevant output from the previous transaction. The transaction identifier can be used to look up the previous transaction.
- Note: I think that a standard scriptPubKey could be constructed from the address alone, but it is preferable to source the scriptPubKey directly from the blockchain, so as to avoid any discrepancy.

The scriptSig signature in a valid transaction accomplishes several things:
- It proves that the transaction has not been changed since it was created.
- It proves that the transaction was signed by someone who had a particular public key (which is contained in the scriptSig).
- It proves that the transaction was signed by someone who had the specific public key required by the scriptPubKey of the relevant previous output.

Note: It is perhaps more accurate to say something like "It proves that the transaction was signed by someone who had the private key that corresponds to a particular public key.".
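The signature check behind these claims is the ECDSA verification equation. Below is a minimal pure-Python 3 sketch of it on the secp256k1 curve. It is not the ecdsa-0.10 library used by check_signature2.py, the function names are illustrative, and it is written for clarity, not for speed or side-channel safety. It assumes that z is the double SHA-256 hash of the transaction-in-signable-form read as a big-endian integer, that r and s are the two integers unpacked from the signature, and that the public key is given as an (x, y) point.

```python
# secp256k1 domain parameters: field prime, group order, generator point.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(a, b):
    """Add two curve points. None represents the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % P == 0:
        return None  # a + (-a)
    if a == b:
        slope = 3 * a[0] * a[0] * pow(2 * a[1], P - 2, P) % P
    else:
        slope = (b[1] - a[1]) * pow((b[0] - a[0]) % P, P - 2, P) % P
    x = (slope * slope - a[0] - b[0]) % P
    y = (slope * (a[0] - x) - a[1]) % P
    return (x, y)


def point_mul(k, point):
    """Multiply a curve point by the integer k (double-and-add)."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def ecdsa_verify(z, r, s, public_key):
    """Return True if (r, s) is a valid signature of hash z for public_key."""
    if not (1 <= r < N and 1 <= s < N):
        return False
    s_inv = pow(s, N - 2, N)  # modular inverse, valid because N is prime
    point = point_add(point_mul(z * s_inv % N, G),
                      point_mul(r * s_inv % N, public_key))
    return point is not None and point[0] % N == r
```

Modular inverses are computed with Fermat's little theorem (pow(a, m - 2, m) for prime m), which avoids a separate extended-Euclid routine.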

During transaction validation, scriptSig and scriptPubKey are combined into a single script. The resulting script program is then run. A transaction is valid if nothing in the combined script triggers failure and the top stack item is True (non-zero) when the script exits.

expected script for a standard transaction:
- [scriptSig]
-- PUSHDATA: 1 byte
-- signature data + SIGHASH_ALL
-- PUSHDATA: 1 byte
-- public key data
- [scriptPubKey]
-- OP_DUP / 0x76: 1 byte
-- OP_HASH160 / 0xa9: 1 byte
-- PUSHDATA: 1 byte
-- public key hash: 20 bytes (== 160 bits)
-- OP_EQUALVERIFY / 0x88: 1 byte
-- OP_CHECKSIG / 0xac: 1 byte



Particular byte values are "opcodes". They indicate a stack operation that should be performed. Many byte values signify PUSHDATA (push n subsequent bytes onto the stack as a new stack item), but each indicates a different number of bytes to push.

SIGHASH_ALL is a single byte appended to the signature data. Its value is 0x01.

The signature is in DER encoding and must be unpacked before it can be checked.
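Unpacking the DER encoding yields two integers, usually called r and s. Here is a minimal Python 3 sketch, applied to the signature_data of the selected transaction. The function name is illustrative, and the sketch assumes single-byte DER lengths, which is sufficient for signatures of this size.

```python
def decode_der_signature(signature_data):
    """Return (r, s, hash_type) from DER signature bytes plus a trailing hash_type byte."""
    der, hash_type = signature_data[:-1], signature_data[-1]
    # 0x30 marks a DER sequence; the next byte is the length of its contents.
    if der[0] != 0x30 or der[1] != len(der) - 2:
        raise ValueError("not a DER sequence of the expected length")
    pos = 2
    integers = []
    for _ in range(2):
        # 0x02 marks a DER integer; the next byte is its length in bytes.
        if der[pos] != 0x02:
            raise ValueError("expected a DER integer at offset %d" % pos)
        length = der[pos + 1]
        integers.append(int.from_bytes(der[pos + 2:pos + 2 + length], "big"))
        pos += 2 + length
    if pos != len(der):
        raise ValueError("unexpected bytes after the second integer")
    return integers[0], integers[1], hash_type


SIGNATURE_DATA = bytes.fromhex(
    "3045022100d8321c3c4b3fc3a34e30547d1cb6aea6855cad330b590b53e824f647dd492399"
    "02201e9056e82297a2108b1969cc497150734ab78f51cbf979885afc104f70a6515001"
)

r, s, hash_type = decode_der_signature(SIGNATURE_DATA)
print(hex(r), hex(s), hash_type)
```

Note that the first integer is stored in 33 bytes with a leading 00 byte. DER integers are signed, so the padding byte is needed whenever the top bit of the value would otherwise be set.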

The first byte of the public key data will be 0x04 if the public key is uncompressed. It will be 0x02 or 0x03 if compressed. If it is 0x02, the Y value is even. If it is 0x03, the Y value is odd.

Note: The public key data in the input of my selected transaction begins with 02 and is compressed. The public key data in Ken Shirriff's signed transaction begins with 04 and is uncompressed.


Opcode descriptions:

- byte value: 0x01-0x4b
-- Word: PUSHDATA
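-- Opcodes 1-75. These byte values have no individual opcode names. (OP_1 is a different opcode, 0x51, which pushes the number 1.)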
-- Input: [Special]
-- Output: a single stack item
-- Description: Let N be the byte value of the opcode. The next N bytes are data to be pushed onto the stack as a new stack item.

- byte value: 0x69
-- Word: OP_VERIFY
-- Opcode: 105
-- Input: True / false
-- Output: Nothing / fail
-- Description: Marks transaction as invalid if top stack value is not true. The top stack item is removed.

- byte value: 0x76
-- Word: OP_DUP
-- Opcode: 118
-- Input: top_item
-- Output: top_item, top_item
-- Description: Duplicates the top stack item.

- byte value: 0x87
-- Word: OP_EQUAL
-- Opcode: 135
-- Input: top_item, item_below_top_item
-- Output: True / false.
-- Description: Returns 1 if the inputs are exactly equal, 0 otherwise.

- byte value: 0x88
-- Word: OP_EQUALVERIFY
-- Opcode: 136
-- Input: top_item, item_below_top_item
-- Output: Nothing / fail.
-- Description: Same as OP_EQUAL, but runs OP_VERIFY afterward.

- byte value: 0xa9
-- Word: OP_HASH160
-- Opcode: 169
-- Input: top_item
-- Output: hash
-- Description: The top item on the stack is hashed twice: first with SHA-256 and then with RIPEMD-160.

- byte value: 0xac
-- Word: OP_CHECKSIG
-- Opcode: 172
-- Input: top_item (public key), item_below_top_item (signature). Also hash of entire transaction-in-signable-form.
-- Output: True / false
-- Description: The entire transaction (in an altered form) is hashed. The signature must be a valid signature for this hash and public key. If it is, 1 is returned, otherwise 0 is returned.
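The opcodes described above are enough to run a combined standard script by machine. Below is a minimal Python 3 sketch of such a stack machine. It is not script_processor1.py and the names are illustrative. To stay self-contained, the hashing and signature-checking steps are passed in as functions. The demo at the end uses stand-ins for both (a truncated SHA-256 in place of the real SHA-256 + RIPEMD-160, and a signature check that always succeeds), along with made-up signature and public key bytes, so it demonstrates only the stack mechanics.

```python
import hashlib


def run_p2pkh_script(script_sig, script_pubkey, hash160, check_sig):
    """Run scriptSig followed by scriptPubKey. Return True if the script succeeds."""
    script = script_sig + script_pubkey
    stack = []
    pos = 0
    try:
        while pos < len(script):
            op = script[pos]
            pos += 1
            if 0x01 <= op <= 0x4b:      # PUSHDATA: push the next `op` bytes
                stack.append(script[pos:pos + op])
                pos += op
            elif op == 0x76:            # OP_DUP
                stack.append(stack[-1])
            elif op == 0xa9:            # OP_HASH160
                stack.append(hash160(stack.pop()))
            elif op == 0x88:            # OP_EQUALVERIFY
                if stack.pop() != stack.pop():
                    return False
            elif op == 0xac:            # OP_CHECKSIG: public key is on top
                public_key = stack.pop()
                signature = stack.pop()
                stack.append(b"\x01" if check_sig(signature, public_key) else b"")
            else:
                raise ValueError("unsupported opcode 0x%02x" % op)
    except IndexError:
        return False                    # an operation needed more stack items
    # Success requires a non-zero item on top of the stack.
    return bool(stack) and any(stack[-1])


# Demo with stand-in functions and made-up data (NOT real Bitcoin hashing or keys).
def demo_hash160(data):
    return hashlib.sha256(data).digest()[:20]


demo_public_key = b"\x02" + b"\x11" * 32
demo_signature = b"\x30" * 71 + b"\x01"
demo_script_sig = (bytes([len(demo_signature)]) + demo_signature +
                   bytes([len(demo_public_key)]) + demo_public_key)
demo_script_pubkey = b"\x76\xa9\x14" + demo_hash160(demo_public_key) + b"\x88\xac"

print(run_p2pkh_script(demo_script_sig, demo_script_pubkey,
                       demo_hash160, lambda sig, key: True))
```

Passing hash160 and check_sig as parameters mirrors the point made earlier: the script only says when to hash or check a signature, and the details are left to the stack machine.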



The input signatures in a transaction are created by this process:

- For each input:
-- Acquire the relevant scriptPubKey of the previous transaction (that supplies the unspent output used as an input in the new transaction).
-- Substitute this scriptPubKey for the scriptSig of this input. Also substitute the scriptPubKey's length var_int for the scriptSig's length var_int.
-- Remove the scriptSigs of other inputs. I haven't tested this myself. Some reading indicates that a single 0x00 byte is substituted for each length var_int of other scriptSigs (thus indicating that the scriptSig contains no data).
-- Append a four-byte form of the hash_type. For hash type 1, SIGHASH_ALL, this is 0x01 00 00 00 (The value "1" in little-endian form).
-- The transaction is currently the transaction-in-signable-form.
-- Sign this form of the transaction with the private key corresponding to the public key to which the unspent output was originally sent. (Strictly, it is the double SHA-256 hash of this form that is signed.)
-- Remove the four-byte hash_type from the end of the transaction.
-- Append a one-byte version of the hash_type (e.g. SIGHASH_ALL == 0x01) to the scriptSig signature data.
-- Save the scriptSig for this input.


Note: The transaction-in-signable-form differs for each input.

To construct the final transaction:
- After all signatures have been constructed, place all the final scriptSigs in the appropriate locations within the transaction.
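For verification, the same transaction-in-signable-form has to be rebuilt from the final transaction. Below is a minimal Python 3 sketch for the single-input case. It is not check_signature2.py: the function names are illustrative, and it assumes exactly one input, script lengths that fit in a one-byte var_int, and hash type SIGHASH_ALL. The data is the selected transaction and the scriptPubKey of its previous output, both taken from this article. The value that the signature is actually checked against is the double SHA-256 hash of the signable form.

```python
import hashlib


def signable_form_single_input(raw_tx, prev_script_pubkey, hash_type=1):
    """Rebuild the transaction-in-signable-form of a one-input transaction."""
    if raw_tx[4] != 1:
        raise ValueError("this sketch handles exactly one input")
    # version (4) + input_count (1) + previous_output_hash (32) + index (4)
    length_pos = 4 + 1 + 32 + 4
    script_sig_length = raw_tx[length_pos]
    if script_sig_length >= 0xfd or len(prev_script_pubkey) >= 0xfd:
        raise ValueError("this sketch handles one-byte script lengths only")
    head = raw_tx[:length_pos]
    tail = raw_tx[length_pos + 1 + script_sig_length:]
    return (head
            + bytes([len(prev_script_pubkey)]) + prev_script_pubkey  # replaces scriptSig
            + tail
            + hash_type.to_bytes(4, "little"))                       # four-byte hash_type


def double_sha256(data):
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


RAW_TX = bytes.fromhex("01000000014f8cad2ca3b602750e06c4300d91b22c17dd97bb5ce635d860df8b0ab325b336010000006b483045022100d8321c3c4b3fc3a34e30547d1cb6aea6855cad330b590b53e824f647dd49239902201e9056e82297a2108b1969cc497150734ab78f51cbf979885afc104f70a65150012102a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261fdffffff02e8d94100000000001976a914338cdde52f708236affa5675f969606ff846ee6f88acc4716e01000000001976a9143496950f1a01285a1605ac9337c06a5596b9fcd888accaee0700")
PREV_SCRIPT_PUBKEY = bytes.fromhex("76a9147e767129b15ee6d7e6be8ac3a100dd29c4c67e6388ac")

signable = signable_form_single_input(RAW_TX, PREV_SCRIPT_PUBKEY)
digest = double_sha256(signable)
print(signable.hex())
print(digest.hex())
```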






SLIGHTLY-LESS-CERTAIN NOTES



The items in these notes require more testing/checking/confirmation.


Here is my current understanding of possible var_int values:

- var_int value: < 0xFD
-- Storage length = 1
-- Format = uint8_t
-- Note: A one-byte var_int can encode a maximum length value of 0xFC or 252 bytes.

- var_int value: <= 0xFFFF
-- Storage length = 3
-- Format = 0xFD followed by the length as uint16_t
-- Note: A three-byte var_int can encode a maximum length value of 0xFFFF or 65535 bytes.

- var_int value: <= 0xFFFF FFFF
-- Storage length = 5
-- Format = 0xFE followed by the length as uint32_t
-- Note: A five-byte var_int can encode a maximum length value of 0xFFFF FFFF or 4294967295 bytes.

- var_int value: <= 0xFFFF FFFF FFFF FFFF
-- Storage length = 9
-- Format = 0xFF followed by the length as uint64_t
-- Note: A nine-byte var_int can encode a maximum length value of 0xFFFF FFFF FFFF FFFF or 18446744073709551615 bytes.
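The four cases above can be expressed as a pair of Python 3 functions. This is a minimal sketch with illustrative names. It assumes that the multi-byte forms are stored little-endian, like the other integer fields in a transaction.

```python
def encode_var_int(n):
    """Encode a non-negative integer as a var_int."""
    if n < 0:
        raise ValueError("var_int values are non-negative")
    if n < 0xfd:
        return bytes([n])
    if n <= 0xffff:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xffffffff:
        return b"\xfe" + n.to_bytes(4, "little")
    if n <= 0xffffffffffffffff:
        return b"\xff" + n.to_bytes(8, "little")
    raise ValueError("value does not fit in a var_int")


def decode_var_int(data, pos=0):
    """Return (value, storage_length) for the var_int starting at data[pos]."""
    first = data[pos]
    if first < 0xfd:
        return first, 1
    size = {0xfd: 2, 0xfe: 4, 0xff: 8}[first]
    return int.from_bytes(data[pos + 1:pos + 1 + size], "little"), 1 + size


print(encode_var_int(107).hex())  # 6b (the script_length of the selected transaction)
print(encode_var_int(253).hex())  # fdfd00
```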



An almost 50% reduction in public key size can be realised by dropping the Y coordinate. This is possible because only two points along the curve share any particular X coordinate, so the 32-byte Y coordinate can be replaced with a smaller value indicating whether the Y value is even or odd. The calculation is done in modular arithmetic: y = sqrt(x^3 + 7) mod p, where p is the secp256k1 field prime. The two square roots are y and p - y (the modular counterpart of a positive result and a negative result). Because p is odd, exactly one of the two roots is even and the other is odd, which is why a single even/odd flag is enough to identify the correct Y value.
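Recovering Y from a compressed public key can be sketched in Python 3 as follows. This is not uncompress.py or check_validity_of_point.py, and the function names are illustrative. It relies on a property of the secp256k1 field prime p (p mod 4 == 3), which allows a modular square root to be computed as a single exponentiation.

```python
# secp256k1 field prime.
P = 2**256 - 2**32 - 977


def is_on_curve(x, y):
    """Check y^2 == x^3 + 7 (mod P)."""
    return (y * y - x * x * x - 7) % P == 0


def uncompress(public_key):
    """Convert a 33-byte compressed public key to the 65-byte uncompressed form."""
    prefix = public_key[0]
    if len(public_key) != 33 or prefix not in (0x02, 0x03):
        raise ValueError("not a compressed public key")
    x = int.from_bytes(public_key[1:], "big")
    y_squared = (pow(x, 3, P) + 7) % P
    # Because P % 4 == 3, a square root (if one exists) is y_squared^((P+1)/4).
    y = pow(y_squared, (P + 1) // 4, P)
    if y * y % P != y_squared:
        raise ValueError("no point on the curve has this X coordinate")
    # Prefix 0x02 means even Y, 0x03 means odd Y; switch to the other root if needed.
    if y % 2 != prefix % 2:
        y = P - y
    return b"\x04" + x.to_bytes(32, "big") + y.to_bytes(32, "big")


# Compressed public key from the scriptSig of the selected transaction.
compressed = bytes.fromhex(
    "02a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261")
uncompressed = uncompress(compressed)
x = int.from_bytes(uncompressed[1:33], "big")
y = int.from_bytes(uncompressed[33:], "big")
print(uncompressed.hex())
print(is_on_curve(x, y))
```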


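A transaction signature can be altered by a third party, without access to the private key, in a way that leaves it valid, and the altered transaction can then be rebroadcast. This is known as Transaction Malleability. The altered version has a different txid from the original, and only one of the two versions can be mined, because both spend the same outputs. I think that this can be a problem in two cases: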
- Someone tries to DDOS bitcoin nodes by rebroadcasting many altered transactions. CPU effort is required to process and reject altered transactions (more effort than is required to reject normal pings, I think).
- The altered version may prevent the valid transaction from being mined (included in a block) for a while. This can particularly cause trouble if a person receiving payment attempts to spend the output from an unconfirmed transaction. A third party can then rebroadcast an altered version of the unconfirmed transaction, delaying the secondary payment.


A Bitcoin ECDSA private key in standard format is simply a 256-bit number, between the values:
- 0x01
and
- 0xFFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFE BAAE DCE6 AF48 A03B BFD2 5E8C D036 4140
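This range covers nearly all of the 2^256-1 possible non-zero values. The upper limit is one less than the order of the secp256k1 curve, which Bitcoin uses for its ECDSA signatures (ECDSA is a signature scheme, not an encryption scheme).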


Script is intentionally not Turing-complete, with no loops or GOTO instructions.


A different random number is needed for every ECDSA signature. If an insufficiently random number is used, an attacker can derive the private key from a sufficient number of signatures.





FURTHER WORK



Find out how to calculate the transaction identifier (TXID) of a signed transaction.


"In Bitcoin, a private key in standard format is simply a 256-bit number, between the values:
- 0x01
and
- 0xFFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFE BAAE DCE6 AF48 A03B BFD2 5E8C D036 4140, representing nearly the entire range of 2^256-1 values. The range is governed by the secp256k1 ECDSA encryption standard used by Bitcoin."
-> Investigate this. Are these values accurate?


"However, Bitcoin Core prior to 0.6 used uncompressed keys."
-> How do versions prior to 0.6 verify transactions (produced by other versions) in which a compressed public key is used in an input scriptSig?


Investigate the format of the block_lock_time value. It's apparently not a straightforward little-endian integer value.


Investigate whether any transactions have included a two-byte var_int length value.
E.g. a transaction with 253 inputs.
-> Do other limits within the bitcoin code prevent a transaction having 253 inputs?
- This limit was mentioned in an excerpt: "The transaction must be smaller than 100,000 bytes. That's around 200 times larger than a typical single-input, single-output P2PKH transaction."
-- However, a transaction with 253 inputs and one output would presumably not exceed 100 000 bytes.



Read and verify a multiple-input transaction. Develop appropriate code to streamline this process.


Add block_lock_time to the output of Transaction.to_string() in parser2.py.


A different random number is needed for every ECDSA signature. If an insufficiently random number is used, an attacker can derive the private key from a sufficient number of signatures.
-> How is this attack performed, exactly?
- Note: I think that this random number has to be the same length as the ECDSA public key (which is twice the length of the ECDSA private key). Is this true?






THOUGHTS



Ken Shirriff wrote that "A transaction is the basic operation in the Bitcoin system.". I think it might actually be the basic conceptual item in the Bitcoin system. The definitions/designs of all the other items could be considered as flowing from the idea of a transaction.

Examples:
- A bitcoin address can be thought of as "the condition that must be satisfied by a transaction input scriptSig".
- Blocks are collections of transactions. They lay down a transaction history that is very difficult to alter.
- New bitcoin is generated by using a special case of a transaction (a "coinbase" transaction in a new block).

This is different from gold as a basis for a currency. The physical item "gold" is the basic conceptual item in a gold-based currency system. The idea of a "transaction" derives from the existence of the gold e.g. "what if I give you gold in exchange for your grain?". The amount of money that you have is equivalent to the amount of gold that you have. However, in the Bitcoin system, the amount of money you have is equivalent to: the sum of the values of the currently-unspent transaction outputs associated with addresses that you control. Here, the existence of the monetary item "bitcoin" derives from the existence of previous transactions.

The universe and its physical behaviour enforces the relative scarcity of gold. The mining competition in the Bitcoin system enforces the relative scarcity of coinbase transactions that create new bitcoin.

Any miner could of course devise and follow a new ruleset, e.g. one that doubled the amount of new bitcoin in a coinbase transaction, but other miners would continue mining on the pre-existing chain, so there would be a "fork" - two transaction history chains that diverge from a particular block on the original chain. Each of these transaction history chains ("blockchains") would acquire a market price. Market participants would prefer coins on one chain to the coins on the other. The chain that inflated more (created more coins) would probably be worth less over time (if supply increases, demand/supply ratio i.e. price decreases). This means that although a miner could fork and create a chain in which he acquired more currency units per block created, each unit would be worth less, perhaps much less, thus over time lowering the return on his investment in mining equipment. The fear of loss is what drives the miners to keep using the same ruleset and incidentally to preserve a single transaction history chain with relatively scarce currency unit creation.
- Note: This last thought is not my own, although I arrived at it here from a different starting point - it comes from reading discussions in the #trilema chat channel log and articles from Mircea Popescu.
- Note: Fear of loss for a miner applies when the miner is small relative to the total mining effort. In the case where mining is sufficiently centralised for a sufficiently long period of time, the central mining entity would begin to be able to make changes to the ruleset with much less fear of loss.






PROJECT LOG



I want to find a small standard transaction, i.e. one with 1 input and 1 or 2 outputs.


Update: My working definition of a standard transaction:
- It can have multiple inputs and outputs.
- The input and the outputs must all be single-signature Pay-To-Public-Key-Hash (P2PKH).
- The public key in the input scriptSig may be compressed.


Work computer: Aineko, a 2008 Macbook running Mac OS X 10.6.8 Snow Leopard.


First, I need a sample transaction.

Browse to:
blockchain.info

Newest block listed: 519889

Click "See More" link to browse to larger list of most recent blocks.

Choose seventh-to-last block: 519883

Click its block height to browse to block height page.
blockchain.info/block-height/519883

Block Height 519883: Blocks at depth 519883 in the bitcoin blockchain

Summary:
- Height: 519883 (Main chain)
- Hash:
0000000000000000001a226aea1237aa124c741d73e897cc8384f273578a682e
- Previous Block:
000000000000000000306d04afee78bf94eafee3108b0e9a5a855bf8ecd398c3
- Next Blocks:
00000000000000000003b14ec2a3ac99ec3269b9f9b5a133a96f258ea31f706c
- Time: 2018-04-25 15:45:09
- Received Time: 2018-04-25 15:45:09
- Relayed By: BTC.com
- Difficulty: 3,839,316,899,029.67
- Bits: 390680589
- Number Of Transactions: 2819
- Output Total: 19,872.3921596 BTC
- Estimated Transaction Volume: 1,268.15051772 BTC
- Size: 1157.477 KB
- Version: 0x20000000
- Merkle Root:
38fed9146fcd86d5aaee12b9785e2fe2ec84a4efe15e5ebb6273f70e90884df0
- Nonce: 2668289829
- Block Reward: 12.5 BTC
- Transaction Fees: 1.12204974 BTC



There's only one block displayed. However, if there were another block at this block height, which was not added to the main chain, then the details of two blocks would probably be displayed.

"Next Blocks" should perhaps be "Next Block(s)".

The hash value is a link.
Clicking it leads to:
blockchain.info/block/0000000000000000001a226aea1237aa124c741d73e897cc8384f273578a682e

Block #519883
Summary:
- Number Of Transactions: 2819
- Output Total: 19,872.3921596 BTC
- Estimated Transaction Volume: 1,268.15051772 BTC
- Transaction Fees: 1.12204974 BTC
- Height: 519883 (Main Chain)
- Timestamp: 2018-04-25 15:45:09
- Received Time: 2018-04-25 15:45:09
- Relayed By: BTC.com
- Difficulty: 3,839,316,899,029.67
- Bits: 390680589
- Size: 1157.477 kB
- Weight: 3992.867 kWU
- Version: 0x20000000
- Nonce: 2668289829
- Block Reward: 12.5 BTC

Hashes:
- Hash
0000000000000000001a226aea1237aa124c741d73e897cc8384f273578a682e
- Previous Block
000000000000000000306d04afee78bf94eafee3108b0e9a5a855bf8ecd398c3
- Next Block(s)
00000000000000000003b14ec2a3ac99ec3269b9f9b5a133a96f258ea31f706c
- Merkle Root
38fed9146fcd86d5aaee12b9785e2fe2ec84a4efe15e5ebb6273f70e90884df0



Many transactions are listed on the page (presumably 2819 in total).

First one:

fdf94f02a95e08c034d248f77c3cadd8fa0e870342f63caaa73f6e4186fcd8cb
2018-04-25 15:45:09

- No Inputs (Newly Generated Coins)
=>
- 1C1mCxRukix1KfegAY5zQQJV7samAciZpv: 13.62204974 BTC
- Unable to decode output address: 0 BTC

[Total] 13.62204974 BTC


The transaction hash is a link to:
blockchain.info/tx/fdf94f02a95e08c034d248f77c3cadd8fa0e870342f63caaa73f6e4186fcd8cb


This transaction lacks an input, which means that it's the mining reward for the miner who created the block.


Let's move on to the second transaction.

c023284e8ba0eee2f71fc07121a9364900c8af7d8ec9fe0266a6b768dd9c1b74
2018-04-25 11:30:08

- 1G1jQNuLLgYBrS8WPoyg1Hxhv7AdXgL66K
=>
- 3QUseoZurNCbLAMPNSedphGNdM796B9MhQ: 1.135445 BTC

[Total] 1.135445 BTC



The "3" at the start of the output address indicates a Pay-To-Script-Hash (P2SH) address. These are often, but not necessarily, multi-signature addresses.


Let's move on through the transactions until I find a non-multisig transaction with 1 input and 1 or 2 outputs.


9ff7742262f68c560efd352f435b977c024797dc538f7efa08a467039858e42f
2018-04-25 15:33:43

- 1CXg2fkCXCoHHnr4ijvZCTdqe9hPQkyDZG
=>
- 15haDNxwdj5Q4TgphVtg3CakrYZUBdY3Qu 0.04315624 BTC
- 15o4XcczE8Vx5kVwqGSjCNUh5V9R8sbbeW 0.240153 BTC

[Total] 0.28330924 BTC



Looks good. 1 input, 2 outputs. All addresses start with "1".

The transaction hash is a link to:
blockchain.info/tx/9ff7742262f68c560efd352f435b977c024797dc538f7efa08a467039858e42f


Click the link to browse to the transaction page.

9ff7742262f68c560efd352f435b977c024797dc538f7efa08a467039858e42f

- 1CXg2fkCXCoHHnr4ijvZCTdqe9hPQkyDZG (0.28725795 BTC - Output)
=>
- 15haDNxwdj5Q4TgphVtg3CakrYZUBdY3Qu - (Unspent) 0.04315624 BTC
- 15o4XcczE8Vx5kVwqGSjCNUh5V9R8sbbeW - (Spent) 0.240153 BTC

11 Confirmations
[Total] 0.28330924 BTC


Summary
- Size: 226 (bytes)
- Weight: 904
- Received Time: 2018-04-25 15:33:43
- Lock Time: Block 519882
- Included In Blocks: 519883 ( 2018-04-25 15:45:09 + 11 minutes )
- Confirmations: 11 Confirmations

Inputs and Outputs
- Total Input: 0.28725795 BTC
- Total Output: 0.28330924 BTC
- Fees: 0.00394871 BTC
- Fee per byte: 1,747.217 sat/B
- Fee per weight unit: 436.804 sat/WU
- Estimated BTC Transacted: 0.04315624 BTC

Input Scripts

1)
ScriptSig:
PUSHDATA(72)
[3045022100d8321c3c4b3fc3a34e30547d1cb6aea6855cad330b590b53e824f647dd49239902201e9056e82297a2108b1969cc497150734ab78f51cbf979885afc104f70a6515001]
PUSHDATA(33)
[02a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261]

Output Scripts

1)
DUP
HASH160
PUSHDATA(20)
[338cdde52f708236affa5675f969606ff846ee6f]
EQUALVERIFY
CHECKSIG

2)
DUP
HASH160
PUSHDATA(20)
[3496950f1a01285a1605ac9337c06a5596b9fcd8]
EQUALVERIFY
CHECKSIG



I don't see an option for displaying the raw transaction.


Google "look up raw transaction bitcoin".


Excerpt from 5th result:
bitcointalk.org/index.php?topic=332706.0
Author: dserrano5
PGP 404B 1B79 DF33 2359 4C8F 0F5E 1BCC 1A1F 280A 01F9 (bitcointalkatdserrano5.es)
Date: November 13, 2013, 01:08:24 PM
Subject: Re: Get Hex transaction from nodes

Quote from: DualSignal on November 13, 2013, 12:55:56 PM

How do you get the raw transaction in Hex format, from a transaction that is unconfirmed, and I didn't make, but appears on http://blockchain.info in a format that allows it to be passed into http://blockchain.info/pushtx when it is dropped from all miners memory pools.


Subject line says "from nodes", is that a requirement? blockchain.info returns the raw transaction if you append '?format=hex' to the URL of a transaction, e.g.:

http://blockchain.info/tx/9021b49d445c719106c95d561b9c3fac7bcb3650db67684a9226cd7fa1e1c1a0?format=hex



I note the existence of a "pushtx" option on blockchain.info.


Ok.

Take the link to the selected transaction and append '?format=hex' to it.

http://blockchain.info/tx/9ff7742262f68c560efd352f435b977c024797dc538f7efa08a467039858e42f?format=hex


Browse to this link.

01000000014f8cad2ca3b602750e06c4300d91b22c17dd97bb5ce635d860df8b0ab325b336010000006b483045022100d8321c3c4b3fc3a34e30547d1cb6aea6855cad330b590b53e824f647dd49239902201e9056e82297a2108b1969cc497150734ab78f51cbf979885afc104f70a65150012102a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261fdffffff02e8d94100000000001976a914338cdde52f708236affa5675f969606ff846ee6f88acc4716e01000000001976a9143496950f1a01285a1605ac9337c06a5596b9fcd888accaee0700


Hm. So this is a raw transaction, shown in hex.


The first result for googling "look up raw transaction bitcoin" was:
blockchain.info/decode-tx

Browse to this page.
"This page will decode a raw transaction in hex format (i.e. characters 0-9, a-f) and display it in human readable format"
There is a text field and a button named "Submit Transaction".

Paste in the raw transaction in hex format and click Submit Transaction.

Result:

{ "lock_time":519882, "size":226, "inputs":[ { "prev_out":{ "index":1, "hash":"36b325b30a8bdf60d835e65cbb97dd172cb2910d30c4060e7502b6a32cad8c4f" }, "script":"483045022100d8321c3c4b3fc3a34e30547d1cb6aea6855cad330b590b53e824f647dd49239902201e9056e82297a2108b1969cc497150734ab78f51cbf979885afc104f70a65150012102a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261" } ], "version":1, "vin_sz":1, "hash":"9ff7742262f68c560efd352f435b977c024797dc538f7efa08a467039858e42f", "vout_sz":2, "out":[ { "script_string":"OP_DUP OP_HASH160 338cdde52f708236affa5675f969606ff846ee6f OP_EQUALVERIFY OP_CHECKSIG", "address":"15haDNxwdj5Q4TgphVtg3CakrYZUBdY3Qu", "value":4315624, "script":"76a914338cdde52f708236affa5675f969606ff846ee6f88ac" }, { "script_string":"OP_DUP OP_HASH160 3496950f1a01285a1605ac9337c06a5596b9fcd8 OP_EQUALVERIFY OP_CHECKSIG", "address":"15o4XcczE8Vx5kVwqGSjCNUh5V9R8sbbeW", "value":24015300, "script":"76a9143496950f1a01285a1605ac9337c06a5596b9fcd888ac" } ] }



I think "out" might mean "outputs".
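
To see where the decoder gets these fields from, here is a minimal sketch of a parser for this raw transaction. It assumes the older serialization with no witness data, and it is not the parser developed later in this log; the names read_var_int and parse_tx are my own.

```python
import struct

# Raw hex of the sample transaction, split by field.
RAW_TX_HEX = (
    "01000000"                                                          # version
    "01"                                                                # input count
    "4f8cad2ca3b602750e06c4300d91b22c17dd97bb5ce635d860df8b0ab325b336"  # previous txid
    "01000000"                                                          # previous output index
    "6b"                                                                # scriptSig length
    "483045022100d8321c3c4b3fc3a34e30547d1cb6aea6855cad330b590b53e824f647dd492399"
    "02201e9056e82297a2108b1969cc497150734ab78f51cbf979885afc104f70a6515001"
    "2102a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261"
    "fdffffff"                                                          # sequence
    "02"                                                                # output count
    "e8d9410000000000" "1976a914338cdde52f708236affa5675f969606ff846ee6f88ac"
    "c4716e0100000000" "1976a9143496950f1a01285a1605ac9337c06a5596b9fcd888ac"
    "caee0700"                                                          # lock time
)


def read_var_int(data, pos):
    """Return (value, new position) for the var_int starting at pos."""
    first = data[pos]
    if first < 0xfd:
        return first, pos + 1
    size = {0xfd: 2, 0xfe: 4, 0xff: 8}[first]
    value = int.from_bytes(data[pos + 1:pos + 1 + size], "little")
    return value, pos + 1 + size


def parse_tx(raw):
    """Parse a raw transaction (assumed to carry no witness data) into a dict."""
    tx = {"inputs": [], "outputs": []}
    pos = 0
    tx["version"] = struct.unpack_from("<I", raw, pos)[0]
    pos += 4
    n_inputs, pos = read_var_int(raw, pos)
    for _ in range(n_inputs):
        prev_txid = raw[pos:pos + 32][::-1].hex()  # stored reversed
        index, = struct.unpack_from("<I", raw, pos + 32)
        script_len, pos = read_var_int(raw, pos + 36)
        script = raw[pos:pos + script_len].hex()
        sequence, = struct.unpack_from("<I", raw, pos + script_len)
        pos += script_len + 4
        tx["inputs"].append({"prev_txid": prev_txid, "index": index,
                             "script": script, "sequence": sequence})
    n_outputs, pos = read_var_int(raw, pos)
    for _ in range(n_outputs):
        value, = struct.unpack_from("<Q", raw, pos)  # satoshis
        script_len, pos = read_var_int(raw, pos + 8)
        script = raw[pos:pos + script_len].hex()
        pos += script_len
        tx["outputs"].append({"value": value, "script": script})
    tx["lock_time"] = struct.unpack_from("<I", raw, pos)[0]
    pos += 4
    if pos != len(raw):
        raise ValueError("unexpected trailing bytes")
    return tx


tx = parse_tx(bytes.fromhex(RAW_TX_HEX))
print(tx["version"], len(tx["inputs"]), len(tx["outputs"]), tx["lock_time"])
print(tx["inputs"][0]["prev_txid"], tx["inputs"][0]["index"])
for output in tx["outputs"]:
    print(output["value"], output["script"])
```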



Let's do some reading:

Excerpts from:
bitcoin.org/en/developer-guide

A pop-up note is included on this page: "This documentation has not been extensively reviewed by Bitcoin experts and so likely contains numerous errors."


Block Chain

The block chain provides Bitcoin's public ledger, an ordered and timestamped record of transactions. This system is used to protect against double spending and modification of previous transaction records.

Each full node in the Bitcoin network independently stores a block chain containing only blocks validated by that node. When several nodes all have the same blocks in their block chain, they are considered to be in consensus. The validation rules these nodes follow to maintain consensus are called consensus rules. [...]


Block Chain Overview

[...]

A block of one or more new transactions is collected into the transaction data part of a block. Copies of each transaction are hashed, and the hashes are then paired, hashed, paired again, and hashed again until a single hash remains, the merkle root of a merkle tree.

The merkle root is stored in the block header. Each block also stores the hash of the previous block's header, chaining the blocks together. This ensures a transaction cannot be modified without modifying the block that records it and all following blocks.

Transactions are also chained together. Bitcoin wallet software gives the impression that satoshis are sent from and to wallets, but bitcoins really move from transaction to transaction. Each transaction spends the satoshis previously received in one or more earlier transactions, so the input of one transaction is the output of a previous transaction.

[...]

A single transaction can create multiple outputs, as would be the case when sending to multiple addresses, but each output of a particular transaction can only be used as an input once in the block chain. Any subsequent reference is a forbidden double spend - an attempt to spend the same satoshis twice.

Outputs are tied to transaction identifiers (TXIDs), which are the hashes of signed transactions.

Because each output of a particular transaction can only be spent once, the outputs of all transactions included in the block chain can be categorized as either Unspent Transaction Outputs (UTXOs) or spent transaction outputs. For a payment to be valid, it must only use UTXOs as inputs.

Ignoring coinbase transactions (described later), if the value of a transaction's outputs exceed its inputs, the transaction will be rejected - but if the inputs exceed the value of the outputs, any difference in value may be claimed as a transaction fee by the Bitcoin miner who creates the block containing that transaction.
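
[My note, not part of the excerpt] Applying this rule to the sample transaction chosen earlier, using the input and output values shown on its blockchain.info page, gives the following quick check. The variable names are my own.

```python
SATOSHIS_PER_BTC = 100_000_000

# Values from the transaction page, converted to satoshis.
input_value = 28_725_795                    # 0.28725795 BTC
output_values = [4_315_624, 24_015_300]     # 0.04315624 BTC and 0.240153 BTC
size_bytes = 226
weight_units = 904

# The fee is whatever the outputs leave unclaimed.
fee = input_value - sum(output_values)

print("fee (satoshis):", fee)
print("fee (BTC):", fee / SATOSHIS_PER_BTC)
print("fee per byte:", round(fee / size_bytes, 3))
print("fee per weight unit:", round(fee / weight_units, 3))
```

The results agree with the figures reported on the transaction page (0.00394871 BTC, 1,747.217 sat/B, 436.804 sat/WU).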

[...]


Block Height And Forking

Any Bitcoin miner who successfully hashes a block header to a value below the target threshold can add the entire block to the block chain (assuming the block is otherwise valid). These blocks are commonly addressed by their block height - the number of blocks between them and the first Bitcoin block (block 0, most commonly known as the genesis block).
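
[My note, not part of the excerpt] A sketch of how this could be checked for block 519883. It assumes that the "Bits" value on the block page (390680589) is the compact encoding of the target threshold, with the top byte acting as an exponent and the lower three bytes as a mantissa. That is my understanding of the encoding, not something stated in the excerpt, and the function name target_from_bits is my own.

```python
BITS = 390680589  # "Bits" value from the block 519883 summary
BLOCK_HASH = "0000000000000000001a226aea1237aa124c741d73e897cc8384f273578a682e"


def target_from_bits(bits):
    """Expand a compact 'bits' value into the full target integer."""
    exponent = bits >> 24            # length of the target in bytes
    mantissa = bits & 0x007fffff     # top three bytes of the target
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


target = target_from_bits(BITS)
print(hex(BITS))
print("target:    ", format(target, "064x"))
print("block hash:", BLOCK_HASH)
print("hash is below target:", int(BLOCK_HASH, 16) < target)
```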

[...]

Multiple blocks can all have the same block height, as is common when two or more miners each produce a block at roughly the same time. This creates an apparent fork in the block chain, as shown in the illustration above.

When miners produce simultaneous blocks at the end of the block chain, each node individually chooses which block to accept. In the absence of other considerations, discussed below, nodes usually use the first block they see.

Eventually a miner produces another block which attaches to only one of the competing simultaneously-mined blocks. This makes that side of the fork stronger than the other side. Assuming a fork only contains valid blocks, normal peers always follow the most difficult chain to recreate and throw away stale blocks belonging to shorter forks. (Stale blocks are also sometimes called orphans or orphan blocks, but those terms are also used for true orphan blocks without a known parent block.)

[...]

Since multiple blocks can have the same height during a block chain fork, block height should not be used as a globally unique identifier. Instead, blocks are usually referenced by the hash of their header (often with the byte order reversed, and in hexadecimal).


Transaction Data

Every block must include one or more transactions. The first one of these transactions must be a coinbase transaction, also called a generation transaction, which should collect and spend the block reward (comprised of a block subsidy and any transaction fees paid by transactions included in this block).

The UTXO of a coinbase transaction has the special condition that it cannot be spent (used as an input) for at least 100 blocks. This temporarily prevents a miner from spending the transaction fees and block reward from a block that may later be determined to be stale (and therefore the coinbase transaction destroyed) after a block chain fork.

Blocks are not required to include any non-coinbase transactions, but miners almost always do include additional transactions in order to collect their transaction fees.

All transactions, including the coinbase transaction, are encoded into blocks in binary rawtransaction format.

The rawtransaction format is hashed to create the transaction identifier (txid). From these txids, the merkle tree is constructed by pairing each txid with one other txid and then hashing them together. If there are an odd number of txids, the txid without a partner is hashed with a copy of itself.

The resulting hashes themselves are each paired with one other hash and hashed together. Any hash without a partner is hashed with itself. The process repeats until only one hash remains, the merkle root.
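
[My note, not part of the excerpt] The pairing process described above can be sketched as follows. I am assuming that "hashed" means double SHA-256, and that txids are converted from their displayed form into reversed (internal) byte order before hashing, with the root reversed back for display. The three txids are ones that appeared earlier in this log; they are used only as sample input, so the result is not the merkle root of block 519883 (which has 2819 transactions).

```python
import hashlib


def sha256d(data):
    """Double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(txids_hex):
    """Merkle root of a non-empty list of txids given in displayed hex form."""
    level = [bytes.fromhex(txid)[::-1] for txid in txids_hex]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # an unpaired hash is paired with itself
        level = [sha256d(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0][::-1].hex()


sample_txids = [
    "fdf94f02a95e08c034d248f77c3cadd8fa0e870342f63caaa73f6e4186fcd8cb",
    "c023284e8ba0eee2f71fc07121a9364900c8af7d8ec9fe0266a6b768dd9c1b74",
    "9ff7742262f68c560efd352f435b977c024797dc538f7efa08a467039858e42f",
]
print(merkle_root(sample_txids))
```

With a single txid the loop never runs, so the root is the txid itself, which is what I would expect for a block containing only a coinbase transaction.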

[...]


Transactions

Transactions let users spend satoshis. Each transaction is constructed out of several parts which enable both simple direct payments and complex transactions.

[...]

To keep things simple, this section pretends coinbase transactions do not exist.

[...]

Each transaction has at least one input and one output. Each input spends the satoshis paid to a previous output. Each output then waits as an Unspent Transaction Output (UTXO) until a later input spends it. When your Bitcoin wallet tells you that you have a 10,000 satoshi balance, it really means that you have 10,000 satoshis waiting in one or more UTXOs.

Each transaction is prefixed by a four-byte transaction version number which tells Bitcoin peers and miners which set of rules to use to validate it.

[...]

An output has an implied index number based on its location in the transaction - the index of the first output is zero. The output also has an amount in satoshis which it pays to a conditional pubkey script. Anyone who can satisfy the conditions of that pubkey script can spend up to the amount of satoshis paid to it.

An input uses a transaction identifier (txid) and an output index number (often called "vout" for output vector) to identify a particular output to be spent. It also has a signature script which allows it to provide data parameters that satisfy the conditionals in the pubkey script. (The sequence number and locktime are related and will be covered together in a later subsection.)

[...]

[Let's describe] the workflow Alice uses to send Bob a transaction and which Bob later uses to spend that transaction. Both Alice and Bob will use the most common form of the standard Pay-To-Public-Key-Hash (P2PKH) transaction type. P2PKH lets Alice spend satoshis to a typical Bitcoin address, and then lets Bob further spend those satoshis using a simple cryptographic key pair.

Bob must first generate a private/public key pair before Alice can create the first transaction. Bitcoin uses the Elliptic Curve Digital Signature Algorithm (ECDSA) with the secp256k1 curve; secp256k1 private keys are 256 bits of random data. A copy of that data is deterministically transformed into an secp256k1 public key. Because the transformation can be reliably repeated later, the public key does not need to be stored.

The public key (pubkey) is then cryptographically hashed. This pubkey hash can also be reliably repeated later, so it also does not need to be stored. The hash shortens and obfuscates the public key, making manual transcription easier and providing security against unanticipated problems which might allow reconstruction of private keys from public key data at some later point.

Bob provides the pubkey hash to Alice. Pubkey hashes are almost always sent encoded as Bitcoin addresses, which are base58-encoded strings containing an address version number, the hash, and an error-detection checksum to catch typos. The address can be transmitted through any medium, including one-way mediums which prevent the spender from communicating with the receiver, and it can be further encoded into another format, such as a QR code containing a bitcoin: URI.
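
[My note, not part of the excerpt] A sketch of the encoding described above, applied to the two pubkey hashes in the output scripts of my sample transaction. I am assuming that the address version number for these addresses is 0x00 and that the checksum is the first four bytes of a double SHA-256 hash of the version byte plus the pubkey hash. The function name base58check is my own.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check(version, payload):
    """Encode version byte + payload + 4-byte checksum in base58."""
    data = bytes([version]) + payload
    checksum = hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]
    full = data + checksum
    number = int.from_bytes(full, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = ALPHABET[remainder] + encoded
    # Each leading zero byte is represented by the character "1".
    leading_zeros = len(full) - len(full.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


# Pubkey hashes from the sample transaction's two output scripts.
for pubkey_hash_hex in ("338cdde52f708236affa5675f969606ff846ee6f",
                        "3496950f1a01285a1605ac9337c06a5596b9fcd8"):
    print(base58check(0x00, bytes.fromhex(pubkey_hash_hex)))
```

If the assumptions are right, the two printed strings should be the output addresses shown for the sample transaction (15haDNxwdj5Q4TgphVtg3CakrYZUBdY3Qu and 15o4XcczE8Vx5kVwqGSjCNUh5V9R8sbbeW).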

Once Alice has the address and decodes it back into a standard hash, she can create the first transaction. She creates a standard P2PKH transaction output containing instructions which allow anyone to spend that output if they can prove they control the private key corresponding to Bob's hashed public key. These instructions are called the pubkey script or scriptPubKey.

When, some time later, Bob decides to spend the UTXO, he must create an input which references the transaction Alice created by its hash, called a Transaction Identifier (txid), and the specific output she used by its index number (output index). He must then create a signature script - a collection of data parameters which satisfy the conditions Alice placed in the previous output's pubkey script. Signature scripts are also called scriptSigs.

Pubkey scripts and signature scripts combine secp256k1 pubkeys and signatures with conditional logic, creating a programmable authorization mechanism.

[...]

For a P2PKH-style output, Bob's signature script will contain the following two pieces of data:

1) His full (unhashed) public key, so the pubkey script can check that it hashes to the same value as the pubkey hash provided by Alice.

2) An secp256k1 signature made by using the ECDSA cryptographic formula to combine certain transaction data (described below) with Bob's private key. This lets the pubkey script verify that Bob owns the private key which created the public key.

Bob's secp256k1 signature doesn't just prove Bob controls his private key; it also makes the non-signature-script parts of his transaction tamper-proof so Bob can safely broadcast them over the peer-to-peer network.

[...]

The data Bob signs includes the txid and output index of the previous transaction, the previous output's pubkey script, the pubkey script Bob creates which will let the next recipient spend this transaction's output, and the amount of satoshis to spend to the next recipient. In essence, the entire transaction is signed except for any signature scripts, which hold the full public keys and secp256k1 signatures.

After putting his signature and public key in the signature script, Bob broadcasts the transaction to Bitcoin miners through the peer-to-peer network. Each peer and miner independently validates the transaction before broadcasting it further or attempting to include it in a new block of transactions.

[...]


P2PKH Script Validation

The validation procedure requires evaluation of the signature script and pubkey script. In a P2PKH output, the pubkey script is:

OP_DUP OP_HASH160 [PubkeyHash] OP_EQUALVERIFY OP_CHECKSIG


The spender's signature script is evaluated and prefixed to the beginning of the script. In a P2PKH transaction, the signature script contains an secp256k1 signature (sig) and full public key (pubkey), creating the following concatenation:

[Sig] [PubKey] OP_DUP OP_HASH160 [PubkeyHash] OP_EQUALVERIFY OP_CHECKSIG


The script language is a Forth-like stack-based language deliberately designed to be stateless and not Turing complete. Statelessness ensures that once a transaction is added to the block chain, there is no condition which renders it permanently unspendable. Turing-incompleteness (specifically, a lack of loops or gotos) makes the script language less flexible and more predictable, greatly simplifying the security model.

To test whether the transaction is valid, signature script and pubkey script operations are executed one item at a time, starting with Bob's signature script and continuing to the end of Alice's pubkey script.

[...]

1) The signature (from Bob's signature script) is added (pushed) to an empty stack. Because it's just data, nothing is done except adding it to the stack. The public key (also from the signature script) is pushed on top of the signature.

2) From Alice's pubkey script, the OP_DUP operation is executed. OP_DUP pushes onto the stack a copy of the data currently at the top of it - in this case creating a copy of the public key Bob provided.

3) The operation executed next, OP_HASH160, pushes onto the stack a hash of the data currently on top of it - in this case, Bob's public key. This creates a hash of Bob's public key.

4) Alice's pubkey script then pushes the pubkey hash that Bob gave her for the first transaction. At this point, there should be two copies of Bob's pubkey hash at the top of the stack.

5) Alice's pubkey script executes OP_EQUALVERIFY. OP_EQUALVERIFY is equivalent to executing OP_EQUAL followed by OP_VERIFY (not shown).

OP_EQUAL (not shown) checks the two values at the top of the stack; in this case, it checks whether the pubkey hash generated from the full public key Bob provided equals the pubkey hash Alice provided when she created transaction #1. OP_EQUAL pops (removes from the top of the stack) the two values it compared, and replaces them with the result of that comparison: zero (false) or one (true).

OP_VERIFY (not shown) checks the value at the top of the stack. If the value is false it immediately terminates evaluation and the transaction validation fails. Otherwise it pops the true value off the stack.

6) Finally, Alice's pubkey script executes OP_CHECKSIG, which checks the signature Bob provided against the now-authenticated public key he also provided. If the signature matches the public key and was generated using all of the data required to be signed, OP_CHECKSIG pushes the value true onto the top of the stack.

If false is not at the top of the stack after the pubkey script has been evaluated, the transaction is valid (provided there are no other problems with it).
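
[My note, not part of the excerpt] The stack operations above can be sketched as a toy evaluator. It is not a real script interpreter: the hash and signature-check functions are passed in as parameters, and the two stand-ins used here are deliberately fake (real OP_HASH160 is RIPEMD-160 of SHA-256, and real OP_CHECKSIG performs ECDSA verification over the transaction data). All names are my own.

```python
import hashlib


def stand_in_hash160(data):
    # Stand-in only: NOT the real HASH160 (RIPEMD-160 of SHA-256).
    return hashlib.sha256(data).digest()[:20]


def stand_in_checksig(sig, pubkey):
    # Stand-in only: NOT real ECDSA verification.
    return sig == b"signed-by:" + pubkey


def evaluate_p2pkh(sig, pubkey, pubkey_hash, hash160, checksig):
    """Run [Sig] [PubKey] OP_DUP OP_HASH160 [PubkeyHash] OP_EQUALVERIFY OP_CHECKSIG."""
    stack = []
    stack.append(sig)                     # push signature (from the scriptSig)
    stack.append(pubkey)                  # push public key (from the scriptSig)
    stack.append(stack[-1])               # OP_DUP
    stack.append(hash160(stack.pop()))    # OP_HASH160
    stack.append(pubkey_hash)             # push pubkey hash (from the scriptPubKey)
    if stack.pop() != stack.pop():        # OP_EQUALVERIFY
        return False
    top_pubkey = stack.pop()              # OP_CHECKSIG pops the key, then the signature
    top_sig = stack.pop()
    stack.append(checksig(top_sig, top_pubkey))
    return bool(stack[-1])                # valid if the top of the stack is true


pubkey = b"toy public key"
good_sig = b"signed-by:" + pubkey
pubkey_hash = stand_in_hash160(pubkey)

print(evaluate_p2pkh(good_sig, pubkey, pubkey_hash, stand_in_hash160, stand_in_checksig))
print(evaluate_p2pkh(b"forged", pubkey, pubkey_hash, stand_in_hash160, stand_in_checksig))
```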

[...]


Non-Standard Transactions

If you use anything besides a standard pubkey script in an output, peers and miners using the default Bitcoin Core settings will neither accept, broadcast, nor mine your transaction. When you try to broadcast your transaction to a peer running the default settings, you will receive an error.

[...]

Note: standard transactions are designed to protect and help the network, not prevent you from making mistakes. It's easy to create standard transactions which make the satoshis sent to them unspendable.

As of Bitcoin Core 0.9.3, standard transactions must also meet the following conditions:

1) The transaction must be finalized: either its locktime must be in the past (or less than or equal to the current block height), or all of its sequence numbers must be 0xffffffff.

2) The transaction must be smaller than 100,000 bytes. That's around 200 times larger than a typical single-input, single-output P2PKH transaction.

3) Each of the transaction's signature scripts must be smaller than 1,650 bytes.

[...]

4) The transaction's signature script must only push data to the script evaluation stack. It cannot push new opcodes, with the exception of opcodes which solely push data to the stack.

5) The transaction must not include any outputs which receive fewer than 1/3 as many satoshis as it would take to spend it in a typical input. That's currently 546 satoshis for a P2PKH or P2SH output on a Bitcoin Core node with the default relay fee. [...]
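
[My note, not part of the excerpt] The figure of 546 satoshis can be reproduced under some assumptions that are mine rather than the guide's: a P2PKH output of 34 bytes, an input spending it of 148 bytes (both sizes match my sample transaction), a default relay fee of 1,000 satoshis per 1,000 bytes, and a cost calculation that counts the output's own size as well as the input's. I have not confirmed this against the Bitcoin Core source code.

```python
OUTPUT_SIZE = 34          # bytes, P2PKH output (assumption)
INPUT_SIZE = 148          # bytes, input spending it (assumption)
RELAY_FEE_PER_KB = 1000   # satoshis per 1,000 bytes (assumption)


def dust_threshold(output_size=OUTPUT_SIZE, input_size=INPUT_SIZE,
                   relay_fee_per_kb=RELAY_FEE_PER_KB):
    """Smallest output value that is not less than 3x the cost of spending it."""
    cost_to_spend = (output_size + input_size) * relay_fee_per_kb // 1000
    return 3 * cost_to_spend


print(dust_threshold())
```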

[...]


Locktime And Sequence Number

One thing all signature hash types sign is the transaction's locktime. (Called nLockTime in the Bitcoin Core source code.) The locktime indicates the earliest time a transaction can be added to the block chain.

Locktime allows signers to create time-locked transactions which will only become valid in the future, giving the signers a chance to change their minds.

If any of the signers change their mind, they can create a new non-locktime transaction. The new transaction will use, as one of its inputs, one of the same outputs which was used as an input to the locktime transaction. This makes the locktime transaction invalid if the new transaction is added to the block chain before the time lock expires.

Care must be taken near the expiry time of a time lock. The peer-to-peer network allows block time to be up to two hours ahead of real time, so a locktime transaction can be added to the block chain up to two hours before its time lock officially expires. Also, blocks are not created at guaranteed intervals, so any attempt to cancel a valuable transaction should be made a few hours before the time lock expires.

Previous versions of Bitcoin Core provided a feature which prevented transaction signers from using the method described above to cancel a time-locked transaction, but a necessary part of this feature was disabled to prevent denial of service attacks. A legacy of this system are four-byte sequence numbers in every input. Sequence numbers were meant to allow multiple signers to agree to update a transaction; when they finished updating the transaction, they could agree to set every input's sequence number to the four-byte unsigned maximum (0xffffffff), allowing the transaction to be added to a block even if its time lock had not expired.

Even today, setting all sequence numbers to 0xffffffff (the default in Bitcoin Core) can still disable the time lock, so if you want to use locktime, at least one input must have a sequence number below the maximum. Since sequence numbers are not used by the network for any other purpose, setting any sequence number to zero is sufficient to enable locktime.

Locktime itself is an unsigned 4-byte integer which can be parsed two ways:

1) If less than 500 million, locktime is parsed as a block height. The transaction can be added to any block which has this height or higher.

2) If greater than or equal to 500 million, locktime is parsed using the Unix epoch time format (the number of seconds elapsed since 1970-01-01T00:00 UTC - currently over 1.395 billion). The transaction can be added to any block whose block time is greater than the locktime.
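
[My note, not part of the excerpt] A sketch applying these two rules to my sample transaction, whose raw hex ends with the four bytes caee0700 and whose single input has the sequence bytes fdffffff. It assumes both fields are plain little-endian unsigned integers, and the function names are my own.

```python
LOCKTIME_THRESHOLD = 500_000_000
MAX_SEQUENCE = 0xffffffff


def parse_locktime(raw4):
    """Return (value, kind) for a 4-byte little-endian locktime field."""
    value = int.from_bytes(raw4, "little")
    kind = "block height" if value < LOCKTIME_THRESHOLD else "unix time"
    return value, kind


def locktime_enabled(sequences):
    """Locktime applies only if at least one input sequence is below the maximum."""
    return any(sequence < MAX_SEQUENCE for sequence in sequences)


value, kind = parse_locktime(bytes.fromhex("caee0700"))
sequence = int.from_bytes(bytes.fromhex("fdffffff"), "little")

print(value, kind)
print(hex(sequence), locktime_enabled([sequence]))
```

The value comes out as 519882, interpreted as a block height, which agrees with the "Lock Time: Block 519882" shown on the transaction page. This suggests that, at least for this transaction, the field can be read as an ordinary little-endian integer.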


Transaction Fees And Change

Transactions pay fees based on the total byte size of the signed transaction. Fees per byte are calculated based on current demand for space in mined blocks with fees rising as demand increases. The transaction fee is given to the Bitcoin miner, [...] and so it is ultimately up to each miner to choose the minimum transaction fee they will accept.

There is also a concept of so-called "high-priority transactions" which spend satoshis that have not moved for a long time.

In the past, these "priority" transactions were often exempt from the normal fee requirements. Before Bitcoin Core 0.12, 50 KB of each block would be reserved for these high-priority transactions, however this is now set to 0 KB by default. After the priority area, all transactions are prioritized based on their fee per byte, with higher-paying transactions being added in sequence until all of the available space is filled.

As of Bitcoin Core 0.9, a minimum fee (currently 1,000 satoshis) has been required to broadcast a transaction across the network. Any transaction paying only the minimum fee should be prepared to wait a long time before there's enough spare space in a block to include it. [...]

Since each transaction spends Unspent Transaction Outputs (UTXOs) and because a UTXO can only be spent once, the full value of the included UTXOs must be [sent to output addresses] or given to a miner as a transaction fee. Few people will have UTXOs that exactly match the amount they want to pay, so most transactions include a change output.

Change outputs are regular outputs which spend the surplus satoshis from the UTXOs back to the spender. They can reuse the same P2PKH pubkey hash [...] as was used in the UTXO.

[...]


Transaction Malleability

None of Bitcoin's signature hash types protect the signature script, leaving the door open for a limited denial of service attack called transaction malleability. The signature script contains the secp256k1 signature, which can't sign itself, allowing attackers to make non-functional modifications to a transaction without rendering it invalid. For example, an attacker can add some data to the signature script which will be dropped before the previous pubkey script is processed.

Although the modifications are non-functional - so they do not change what inputs the transaction uses nor what outputs it pays - they do change the computed hash of the transaction. Since each transaction links to previous transactions using hashes as a transaction identifier (txid), a modified transaction will not have the txid its creator expected.

This isn't a problem for most Bitcoin transactions which are designed to be added to the block chain immediately. But it does become a problem when the output from a transaction is spent before that transaction is added to the block chain.

[...]

New transactions should not depend on previous transactions which have not been added to the block chain yet. [...]

Transaction malleability also affects payment tracking. Bitcoin Core's RPC interface lets you track transactions by their txid - but if that txid changes because the transaction was modified, it may appear that the transaction has disappeared from the network.

Current best practices for transaction tracking dictate that a transaction should be tracked by the transaction outputs (UTXOs) it spends as inputs, as they cannot be changed without invalidating the transaction.

Best practices further dictate that if a transaction does seem to disappear from the network and needs to be reissued, that it be reissued in a way that invalidates the lost transaction. One method which will always work is to ensure the reissued payment spends all of the same outputs that the lost transaction used as inputs.

[...]


Private Key Formats

Private keys are what are used to unlock satoshis from a particular address. In Bitcoin, a private key in standard format is simply a 256-bit number, between the values:

0x01 and 0xFFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFE BAAE DCE6 AF48 A03B BFD2 5E8C D036 4140, representing nearly the entire range of 2^256-1 values. The range is governed by the secp256k1 ECDSA encryption standard used by Bitcoin.


Wallet Import Format (WIF)

In order to make copying of private keys less prone to error, Wallet Import Format may be utilized. WIF uses base58Check encoding on a private key, greatly decreasing the chance of copying error, much like standard Bitcoin addresses.

1) Take a private key.

2) Add a 0x80 byte in front of it for mainnet addresses or 0xef for testnet addresses.

3) Append a 0x01 byte after it if it should be used with compressed public keys (described in a later subsection). Nothing is appended if it is used with uncompressed public keys.

4) Perform a SHA-256 hash on the extended key.

5) Perform a SHA-256 hash on the result of the first SHA-256 hash.

6) Take the first four bytes of the second SHA-256 hash; this is the checksum.

7) Add the four checksum bytes from point 6 at the end of the extended key from points 2 and 3.

8) Convert the result from a byte string into a Base58 string using Base58Check encoding.

The process is easily reversible, using the Base58 decoding function, and removing the padding.
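
As an aside from the excerpt, the eight steps above can be sketched as follows. This is a minimal illustration assuming Python 3 and only the standard library; the function names and the example key (the bytes 0x01 to 0x20) are hypothetical and are not taken from the article or from any wallet software.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data):
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    # Each leading zero byte is written as a leading '1'.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def base58_decode(text):
    n = 0
    for ch in text:
        n = n * 58 + B58_ALPHABET.index(ch)
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + n.to_bytes((n.bit_length() + 7) // 8, "big")


def double_sha256(data):
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def private_key_to_wif(key_bytes, compressed=False, testnet=False):
    extended = (b"\xef" if testnet else b"\x80") + key_bytes  # step 2
    if compressed:
        extended += b"\x01"                                   # step 3
    checksum = double_sha256(extended)[:4]                    # steps 4 to 6
    return base58_encode(extended + checksum)                 # steps 7 and 8


def wif_to_private_key(wif):
    raw = base58_decode(wif)
    payload, checksum = raw[:-4], raw[-4:]
    if double_sha256(payload)[:4] != checksum:
        raise ValueError("checksum mismatch")
    key = payload[1:]  # drop the version byte
    compressed = len(key) == 33 and key[-1] == 1
    return key[:32], compressed


example_key = bytes(range(1, 33))  # hypothetical key, for illustration only
wif = private_key_to_wif(example_key)
print(wif)
print(wif_to_private_key(wif) == (example_key, False))
```

The decoding function reverses the steps and re-checks the checksum, which is the "easily reversible" property the excerpt mentions.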

[...]


Public Key Formats

Bitcoin ECDSA public keys represent a point on a particular Elliptic Curve (EC) defined in secp256k1. In their traditional uncompressed form, public keys contain an identification byte, a 32-byte X coordinate, and a 32-byte Y coordinate. The extremely simplified illustration below shows such a point on the elliptic curve used by Bitcoin, y^2 = x^3 + 7, over a field of contiguous numbers.

[illustration not included]

(Secp256k1 actually modulos coordinates by a large prime, which produces a field of non-contiguous integers and a significantly less clear plot, although the principles are the same.)

An almost 50% reduction in public key size can be realized without changing any fundamentals by dropping the Y coordinate. This is possible because only two points along the curve share any particular X coordinate, so the 32-byte Y coordinate can be replaced with a single bit indicating whether the point is on what appears in the illustration as the "top" side or the "bottom" side.

No data is lost by creating these compressed public keys - only a small amount of CPU is necessary to reconstruct the Y coordinate and access the uncompressed public key. Both uncompressed and compressed public keys are described in official secp256k1 documentation and supported by default in the widely-used OpenSSL library.

Because they're easy to use, and because they reduce almost by half the block chain space used to store public keys for every spent output, compressed public keys are the default in Bitcoin Core and are the recommended default for all Bitcoin software.

However, Bitcoin Core prior to 0.6 used uncompressed keys. This creates a few complications, as the hashed form of an uncompressed key is different than the hashed form of a compressed key, so the same key works with two different P2PKH addresses. This also means that the key must be submitted in the correct format in the signature script so it matches the hash in the previous output's pubkey script.

For this reason, Bitcoin Core uses several different identifier bytes to help programs identify how keys should be used:

1) Private keys meant to be used with compressed public keys have 0x01 appended to them before being Base-58 encoded. (See the private key encoding section above.)

2) Uncompressed public keys start with 0x04; compressed public keys begin with 0x03 or 0x02 depending on whether they're greater or less than the midpoint of the curve. These prefix bytes are all used in official secp256k1 documentation.
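
As an aside from the excerpt, here is a minimal sketch of how the Y coordinate can be reconstructed from a compressed key, assuming Python 3. It relies on two things that are not stated in the excerpt and should be treated as assumptions of this sketch: the secp256k1 field prime is 2^256 - 2^32 - 977, and the prefix byte records whether Y is even (0x02) or odd (0x03). The function name is hypothetical, and the example input is the compressed form of the curve's standard generator point.

```python
P = 2**256 - 2**32 - 977  # secp256k1 field prime (assumption of this sketch)


def decompress_pubkey(compressed):
    """Turn a 33-byte compressed key into the 65-byte form starting with 0x04."""
    if len(compressed) != 33 or compressed[0] not in (2, 3):
        raise ValueError("expected 33 bytes starting with 0x02 or 0x03")
    x = int.from_bytes(compressed[1:], "big")
    y_squared = (pow(x, 3, P) + 7) % P  # curve equation: y^2 = x^3 + 7
    # Because P % 4 == 3, a square root (if one exists) is y_squared^((P+1)/4).
    y = pow(y_squared, (P + 1) // 4, P)
    if (y * y) % P != y_squared:
        raise ValueError("x is not the X coordinate of a point on the curve")
    # Choose the root whose parity matches the prefix byte.
    if y % 2 != compressed[0] % 2:
        y = P - y
    return b"\x04" + x.to_bytes(32, "big") + y.to_bytes(32, "big")


GENERATOR_X = "79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798"
full_key = decompress_pubkey(bytes.fromhex("02" + GENERATOR_X))
print(len(full_key))
print(full_key[33:].hex())
```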

[...]


Verifying Payment

[...] Broadcasting a transaction to the network doesn't ensure that the receiver gets paid. A malicious spender can create one transaction that pays the receiver and a second one that pays the same input back to himself. Only one of these transactions will be added to the block chain, and nobody can say for sure which one it will be.

Two or more transactions spending the same input are commonly referred to as a double spend.

[...]

6 confirmations: The network has spent about an hour working to protect the transaction against double spends and the transaction is buried under six blocks. Even a reasonably lucky attacker would require a large percentage of the total network hashing power to replace six blocks. Although this number is somewhat arbitrary, software handling high-value transactions, or otherwise at risk for fraud, should wait for at least six confirmations before treating a payment as accepted.




Hm.


I'm going to use Ken Shirriff's article Bitcoins the hard way: Using the raw Bitcoin protocol and the associated code [link] as a guide. Footnote 11 in the article provided the link to the code.


Excerpts from the article:

Bitcoin uses a variety of keys and addresses, so the following diagram may help explain them. You start by creating a random 256-bit private key. The private key is needed to sign a transaction and thus transfer (spend) bitcoins. Thus, the private key must be kept secret or else your bitcoins can be stolen.

The Elliptic Curve DSA algorithm generates a 512-bit public key from the private key. (Elliptic curve cryptography will be discussed later.) This public key is used to verify the signature on a transaction. Inconveniently, the Bitcoin protocol adds a prefix of 04 to the public key. The public key is not revealed until a transaction is signed, unlike most systems where the public key is made public.

The next step is to generate the Bitcoin address that is shared with others. Since the 512-bit public key is inconveniently large, it is hashed down to 160 bits using the SHA-256 and RIPEMD hash algorithms. The key is then encoded in ASCII using Bitcoin's custom Base58Check encoding. The resulting address, such as 1KKKK6N21XKo48zWKuQKXdvSsCf95ibHFa, is the address people publish in order to receive bitcoins. Note that you cannot determine the public key or the private key from the address. If you lose your private key (for instance by throwing out your hard drive), your bitcoins are lost forever.

Finally, the Wallet Interchange Format key (WIF) is used to add a private key to your client wallet software. This is simply a Base58Check encoding of the private key into ASCII, which is easily reversed to obtain the 256-bit private key. [...]

To summarize, there are three types of keys: the private key, the public key, and the hash of the public key, and they are represented externally in ASCII using Base58Check encoding. The private key is the important key, since it is required to access the bitcoins and the other keys can be generated from it. The public key hash is the Bitcoin address you see published.

[...]

A transaction is the basic operation in the Bitcoin system. You might expect that a transaction simply moves some bitcoins from one address to another address, but it's more complicated than that. A Bitcoin transaction moves bitcoins between one or more inputs and outputs. Each input is a transaction and address supplying bitcoins. Each output is an address receiving bitcoin, along with the amount of bitcoins going to that address.

[...]

Each input used must be entirely spent in a transaction. If an address received 100 bitcoins in a transaction and you just want to spend 1 bitcoin, the transaction must spend all 100. The solution is to use a second output for change, which returns the 99 leftover bitcoins back to you.

Transactions can also include fees. If there are any bitcoins left over after adding up the inputs and subtracting the outputs, the remainder is a fee paid to the miner. The fee isn't strictly required, but transactions without a fee will be a low priority for miners and may not be processed for days or may be discarded entirely.

[...]


Manually creating a transaction

For my experiment I used a simple transaction with one input and one output. [...] I started by buying bitcoins from Coinbase and putting 0.00101234 bitcoins into address 1MMMMSUb1piy2ufrSguNUdFmAcvqrQF8M5, which was transaction 81b4c832.... My goal was to create a transaction to transfer these bitcoins to the address I created above, 1KKKK6N21XKo48zWKuQKXdvSsCf95ibHFa, subtracting a fee of 0.0001 bitcoins. Thus, the destination address will receive 0.00091234 bitcoins.

[...]

Following the specification
[ http://en.bitcoin.it/wiki/Protocol_specification#tx ],
the unsigned transaction can be assembled fairly easily, as shown below. There is one input, which is using output 0 (the first output) from transaction 81b4c832.... Note that this transaction hash is inconveniently reversed in the transaction. The output amount is 0.00091234 bitcoins (91234 is 0x016462 in hex), which is stored in the value field in little-endian form. The cryptographic parts - scriptSig and scriptPubKey - are more complex and will be discussed later.

[Unsigned Transaction]
- version: 01 00 00 00
- input count: 01
- input
-- previous output hash (reversed): 48 4d 40 d4 5b 9e a0 d6 52 fc a8 25 8a b7 ca a4 25 41 eb 52 97 58 57 f9 6f b5 0c d7 32 c8 b4 81
-- previous output index: 00 00 00 00
-- script length:
-- scriptSig: script containing signature
-- sequence: ff ff ff ff
- output count: 01
- output
-- value: 62 64 01 00 00 00 00 00
-- script length:
-- scriptPubKey: script containing destination address
- block lock time: 00 00 00 00
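
As an aside from the excerpt, the arithmetic behind the value field can be checked with a short sketch, assuming Python 3. Amounts are kept as whole satoshi (1 bitcoin = 100,000,000 satoshi) to avoid floating-point rounding; the variable names are hypothetical.

```python
import struct

input_value = 101234  # 0.00101234 bitcoin, in satoshi
fee = 10000           # 0.0001 bitcoin, in satoshi
output_value = input_value - fee

# "<Q" means: little-endian, unsigned 64-bit integer (8 bytes).
value_field = struct.pack("<Q", output_value)

print(output_value)       # 91234
print(hex(output_value))  # 0x16462
print(value_field.hex())  # 6264010000000000
```

The final line reproduces the bytes "62 64 01 00 00 00 00 00" shown in the value field of the listing.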


[...]

The final transaction is shown below. This combines the scriptSig and scriptPubKey above with the unsigned transaction described earlier.

[Signed Transaction]
- version: 01 00 00 00
- input count: 01
- input:
-- previous output hash (reversed): 48 4d 40 d4 5b 9e a0 d6 52 fc a8 25 8a b7 ca a4 25 41 eb 52 97 58 57 f9 6f b5 0c d7 32 c8 b4 81
-- previous output index: 00 00 00 00
-- script length: 8a
-- scriptSig: 47 30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01 41 04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd
-- sequence: ff ff ff ff
- output count: 01
- output:
-- value: 62 64 01 00 00 00 00 00
-- script length: 19
-- scriptPubKey: 76 a9 14 c8 e9 09 96 c7 c6 08 0e e0 62 84 60 0c 68 4e d9 04 d1 4c 5c 88 ac
- block lock time: 00 00 00 00



Very helpful. Bearing in mind the Signed Transaction format described above, let's have another look at the transaction selected earlier.

01000000014f8cad2ca3b602750e06c4300d91b22c17dd97bb5ce635d860df8b0ab325b336010000006b483045022100d8321c3c4b3fc3a34e30547d1cb6aea6855cad330b590b53e824f647dd49239902201e9056e82297a2108b1969cc497150734ab78f51cbf979885afc104f70a65150012102a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261fdffffff02e8d94100000000001976a914338cdde52f708236affa5675f969606ff846ee6f88acc4716e01000000001976a9143496950f1a01285a1605ac9337c06a5596b9fcd888accaee0700



I'll split it into single bytes (two hex characters represent one byte).

aineko:~ stjohnpiano$ input=01000000014f8cad2ca3b602750e06c4300d91b22c17dd97bb5ce635d860df8b0ab325b336010000006b483045022100d8321c3c4b3fc3a34e30547d1cb6aea6855cad330b590b53e824f647dd49239902201e9056e82297a2108b1969cc497150734ab78f51cbf979885afc104f70a65150012102a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261fdffffff02e8d94100000000001976a914338cdde52f708236affa5675f969606ff846ee6f88acc4716e01000000001976a9143496950f1a01285a1605ac9337c06a5596b9fcd888accaee0700


aineko:~ stjohnpiano$ echo $input

01000000014f8cad2ca3b602750e06c4300d91b22c17dd97bb5ce635d860df8b0ab325b336010000006b483045022100d8321c3c4b3fc3a34e30547d1cb6aea6855cad330b590b53e824f647dd49239902201e9056e82297a2108b1969cc497150734ab78f51cbf979885afc104f70a65150012102a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261fdffffff02e8d94100000000001976a914338cdde52f708236affa5675f969606ff846ee6f88acc4716e01000000001976a9143496950f1a01285a1605ac9337c06a5596b9fcd888accaee0700

aineko:~ stjohnpiano$ echo $input | sed 's/../& /g'

01 00 00 00 01 4f 8c ad 2c a3 b6 02 75 0e 06 c4 30 0d 91 b2 2c 17 dd 97 bb 5c e6 35 d8 60 df 8b 0a b3 25 b3 36 01 00 00 00 6b 48 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01 21 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61 fd ff ff ff 02 e8 d9 41 00 00 00 00 00 19 76 a9 14 33 8c dd e5 2f 70 82 36 af fa 56 75 f9 69 60 6f f8 46 ee 6f 88 ac c4 71 6e 01 00 00 00 00 19 76 a9 14 34 96 95 0f 1a 01 28 5a 16 05 ac 93 37 c0 6a 55 96 b9 fc d8 88 ac ca ee 07 00



So the selected transaction, in single bytes, is:

01 00 00 00 01 4f 8c ad 2c a3 b6 02 75 0e 06 c4 30 0d 91 b2 2c 17 dd 97 bb 5c e6 35 d8 60 df 8b 0a b3 25 b3 36 01 00 00 00 6b 48 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01 21 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61 fd ff ff ff 02 e8 d9 41 00 00 00 00 00 19 76 a9 14 33 8c dd e5 2f 70 82 36 af fa 56 75 f9 69 60 6f f8 46 ee 6f 88 ac c4 71 6e 01 00 00 00 00 19 76 a9 14 34 96 95 0f 1a 01 28 5a 16 05 ac 93 37 c0 6a 55 96 b9 fc d8 88 ac ca ee 07 00



Analysis:

- The first four bytes are the version: 01 00 00 00

- The next byte is the input count: 01

- The next item on this tree level should be an "input", which will consist of: previous output hash (reversed), previous output index, script length, scriptSig, and sequence.
-- If the input count were e.g. 02, then there would be two input items.
--- A single byte for input count would imply a maximum of 255 input items. Is this right?

I'll look at the specification linked from Ken Shirriff's article.
en.bitcoin.it/wiki/Protocol_specification#tx

I see that the "tx_in count" field (the number of transaction inputs) has a field size of 1+ (i.e. 1 or more bytes). Its data type is var_int.

The text "var_int" links to:
en.bitcoin.it/wiki/Protocol_documentation#Variable_length_integer

Excerpt:

Variable length integer

Integer can be encoded depending on the represented value to save space. Variable length integers always precede an array/vector of a type of data that may vary in length. Longer numbers are encoded in little endian.

- var_int value: < 0xFD
-- Storage length = 1
-- Format = uint8_t

- var_int value: <= 0xFFFF
-- Storage length = 3
-- Format = 0xFD followed by the length as uint16_t

- var_int value: <= 0xFFFF FFFF
-- Storage length = 5
-- Format = 0xFE followed by the length as uint32_t

- var_int value: -
-- Storage length = 9
-- Format = 0xFF followed by the length as uint64_t

If you're reading the Satoshi client code (BitcoinQT) it refers to this encoding as a "CompactSize".


Hm. 0x is a prefix used to indicate a hex number.

I'm not sure what the "-" represents. I would have thought that "<= 0xFFFF FFFF FFFF FFFF" would be written there instead.

Ok, let's work through this:
- If the value of the first byte is less than 253 (< 0xFD), then there is only one byte in the number, and this byte is the number's value. Largest number that can be represented in this format: 252.
- If the value of the first byte is 253 (0xFD), then there are three bytes in the number, and the next two bytes are the number's value. Largest number that can be represented in this format: FF FF, which in decimal is 255*256 + 255 = 65535.
- If the value of the first byte is 254 (0xFE), then there are five bytes in the number, and the next four bytes are the number's value. Largest number that can be represented in this format: FF FF FF FF, which in decimal is 255*(256^3) + 255*(256^2) + 255*256 + 255 = 4294967295.
- If the value of the first byte is 255 (0xFF), then there are nine bytes in the number, and the next eight bytes are the number's value. Largest number that can be represented in this format: FF FF FF FF FF FF FF FF, which in decimal is 255*(256^7) + 255*(256^6) + 255*(256^5) + 255*(256^4) + 255*(256^3) + 255*(256^2) + 255*256 + 255 = 18446744073709551615.
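
The four cases above can be sketched as a small reader function. This is a minimal illustration assuming Python 3; read_var_int is a hypothetical name and this is not the code from the Downloadable Assets section.

```python
def read_var_int(data, offset=0):
    """Return (value, bytes_consumed) for the var_int starting at data[offset]."""
    first = data[offset]
    if first < 0xFD:
        # One-byte form: the byte is the value.
        return first, 1
    # 0xFD, 0xFE and 0xFF announce a 2-, 4- or 8-byte little-endian value.
    size = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    payload = data[offset + 1 : offset + 1 + size]
    if len(payload) != size:
        raise ValueError("truncated var_int")
    return int.from_bytes(payload, "little"), 1 + size


print(read_var_int(bytes.fromhex("6b")))          # (107, 1)
print(read_var_int(bytes.fromhex("fdffff")))      # (65535, 3)
print(read_var_int(bytes.fromhex("feffffffff")))  # (4294967295, 5)
```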

There are presumably several limits (explicit or emergent) on the number of transaction inputs in the bitcoind code.

One mentioned in an earlier excerpt was:
"The transaction must be smaller than 100,000 bytes. That's around 200 times larger than a typical single-input, single-output P2PKH transaction."


Let's move back to analysis.

- Within the input item, the first item should be the "previous output hash (reversed)". I think this is the hash of the previous transaction that contains this input.

In Ken Shirriff's signed transaction, this is: 48 4d 40 d4 5b 9e a0 d6 52 fc a8 25 8a b7 ca a4 25 41 eb 52 97 58 57 f9 6f b5 0c d7 32 c8 b4 81

Let's find its length. Hashes have a fixed length, so I should be able to use this length to select the previous output hash in the transaction I am analysing.

aineko:~ stjohnpiano$ input="48 4d 40 d4 5b 9e a0 d6 52 fc a8 25 8a b7 ca a4 25 41 eb 52 97 58 57 f9 6f b5 0c d7 32 c8 b4 81"


aineko:~ stjohnpiano$ echo $input | sed 's/ //g'

484d40d45b9ea0d652fca8258ab7caa42541eb52975857f96fb50cd732c8b481

aineko:~ stjohnpiano$ echo -n "484d40d45b9ea0d652fca8258ab7caa42541eb52975857f96fb50cd732c8b481" | wc -c

64


Note: If I pipe to wc -c directly:

aineko:~ stjohnpiano$ echo -n $input | sed 's/ //g' | wc -c

then the result is 65. (I knew this result was wrong because a hex number's length should be even.) I think that a newline character is added by sed. Some reading indicates that this behaviour may be specific to OS X.


64 hex characters == 32 bytes.

32 bytes == 256 bits.


If the output hash is 256 bits, it's probably produced by the SHA256 hash function.


I'll use python to select slices of the transaction.

aineko:~ stjohnpiano$ python

Python 2.7.13 (default, Dec 18 2016, 05:35:59)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> transaction="01000000014f8cad2ca3b602750e06c4300d91b22c17dd97bb5ce635d860df8b0ab325b336010000006b483045022100d8321c3c4b3fc3a34e30547d1cb6aea6855cad330b590b53e824f647dd49239902201e9056e82297a2108b1969cc497150734ab78f51cbf979885afc104f70a65150012102a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261fdffffff02e8d94100000000001976a914338cdde52f708236affa5675f969606ff846ee6f88acc4716e01000000001976a9143496950f1a01285a1605ac9337c06a5596b9fcd888accaee0700"
>>> output_hash=transaction[2*5:2*(5+32)]
>>> len(output_hash)
64
>>> print output_hash
4f8cad2ca3b602750e06c4300d91b22c17dd97bb5ce635d860df8b0ab325b336




So, in the raw transaction, the output hash occupies bytes 5 to 36 (counting from index 0):

01 00 00 00 01 4f 8c ad 2c a3 b6 02 75 0e 06 c4 30 0d 91 b2 2c 17 dd 97 bb 5c e6 35 d8 60 df 8b 0a b3 25 b3 36 01 00 00 00 6b 48 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01 21 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61 fd ff ff ff 02 e8 d9 41 00 00 00 00 00 19 76 a9 14 33 8c dd e5 2f 70 82 36 af fa 56 75 f9 69 60 6f f8 46 ee 6f 88 ac c4 71 6e 01 00 00 00 00 19 76 a9 14 34 96 95 0f 1a 01 28 5a 16 05 ac 93 37 c0 6a 55 96 b9 fc d8 88 ac ca ee 07 00



Next comes the "previous output index", the position of the spent output within the previous transaction's list of outputs. Technically "input" here means "unspent output".

In Ken Shirriff's transaction, the bytes for the previous output index were: 00 00 00 00

So, four bytes.

In my selected transaction, the next four bytes are: 01 00 00 00

This appears to be in little-endian format i.e. the value is "1". The implied indices for unspent outputs in a transaction start from 0, so this is the second unspent output.

Browse to:
blockchain.info/tx/9ff7742262f68c560efd352f435b977c024797dc538f7efa08a467039858e42f
The source address is: 1CXg2fkCXCoHHnr4ijvZCTdqe9hPQkyDZG
and it has an "Output" link, which leads to:
blockchain.info/tx-index/341730624/1
It is indeed displayed as the second output address in the previous transaction.
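
The two fields examined so far (the previous output hash and the previous output index) can be decoded together. This is a minimal sketch assuming Python 3. It also assumes that the "reversed" label in Ken Shirriff's listing means the hash bytes are stored in the opposite order to the transaction identifier that block explorers display; the variable names are hypothetical.

```python
outpoint = bytes.fromhex(
    "4f8cad2ca3b602750e06c4300d91b22c17dd97bb5ce635d860df8b0ab325b336"  # hash
    "01000000"                                                          # index
)

# Reverse the 32 hash bytes to get the identifier in display order.
previous_txid = outpoint[:32][::-1].hex()
# The index is a 4-byte little-endian integer.
previous_index = int.from_bytes(outpoint[32:36], "little")

print(previous_txid)   # 36b325b30a8bdf60d835e65cbb97dd172cb2910d30c4060e7502b6a32cad8c4f
print(previous_index)  # 1
```

Applying the same reversal to the hash in Ken Shirriff's transaction gives a string starting with 81b4c832, which agrees with the transaction identifier quoted in his article.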


Next is the script length, which according to
en.bitcoin.it/wiki/Protocol_specification#tx
is a var_int.

The next byte for this transaction is 6b, which is less than FD, so it is itself the value. 6b = 6*16 + 11 = 107

Therefore, the next 107 bytes are the scriptSig.



The transaction as a whole is:

aineko:~ stjohnpiano$ echo -n "01000000014f8cad2ca3b602750e06c4300d91b22c17dd97bb5ce635d860df8b0ab325b336010000006b483045022100d8321c3c4b3fc3a34e30547d1cb6aea6855cad330b590b53e824f647dd49239902201e9056e82297a2108b1969cc497150734ab78f51cbf979885afc104f70a65150012102a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261fdffffff02e8d94100000000001976a914338cdde52f708236affa5675f969606ff846ee6f88acc4716e01000000001976a9143496950f1a01285a1605ac9337c06a5596b9fcd888accaee0700" | wc -c

452


452 hex characters long.

452/2 = 226 bytes, which matches the length reported by blockchain.info earlier.
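
With the complete byte string in hand, the transaction identifier can also be recomputed. This is a minimal sketch assuming Python 3. It assumes that, for a transaction in this format, the identifier is the double SHA-256 hash of the raw bytes, displayed in reversed byte order. It also assumes that the identifier in the blockchain.info URL above belongs to the selected transaction. If both assumptions hold, the last line should print True.

```python
import hashlib

RAW_HEX = (
    "01000000" "01"
    "4f8cad2ca3b602750e06c4300d91b22c17dd97bb5ce635d860df8b0ab325b336"
    "01000000" "6b"
    "483045022100d8321c3c4b3fc3a34e30547d1cb6aea6855cad330b590b53e824f647dd492399"
    "02201e9056e82297a2108b1969cc497150734ab78f51cbf979885afc104f70a65150"
    "012102a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261"
    "fdffffff" "02"
    "e8d9410000000000" "19" "76a914338cdde52f708236affa5675f969606ff846ee6f88ac"
    "c4716e0100000000" "19" "76a9143496950f1a01285a1605ac9337c06a5596b9fcd888ac"
    "caee0700"
)

raw = bytes.fromhex(RAW_HEX)
digest = hashlib.sha256(hashlib.sha256(raw).digest()).digest()
txid = digest[::-1].hex()  # reversed byte order for display

URL_TXID = "9ff7742262f68c560efd352f435b977c024797dc538f7efa08a467039858e42f"

print(len(raw))  # 226
print(txid)
print(txid == URL_TXID)
```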


Our current position in the transaction is: 4 + 1 + 32 + 4 + 1 = 42. We have just looked at byte 42.
Starting from index 0, we are at index 41.

In python, print transaction[2*42:2*(42+107)].

result:
483045022100d8321c3c4b3fc3a34e30547d1cb6aea6855cad330b590b53e824f647dd49239902201e9056e82297a2108b1969cc497150734ab78f51cbf979885afc104f70a65150012102a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261



So, in the raw transaction, the scriptSig occupies bytes 42 to 148 (counting from index 0):

01 00 00 00 01 4f 8c ad 2c a3 b6 02 75 0e 06 c4 30 0d 91 b2 2c 17 dd 97 bb 5c e6 35 d8 60 df 8b 0a b3 25 b3 36 01 00 00 00 6b 48 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01 21 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61 fd ff ff ff 02 e8 d9 41 00 00 00 00 00 19 76 a9 14 33 8c dd e5 2f 70 82 36 af fa 56 75 f9 69 60 6f f8 46 ee 6f 88 ac c4 71 6e 01 00 00 00 00 19 76 a9 14 34 96 95 0f 1a 01 28 5a 16 05 ac 93 37 c0 6a 55 96 b9 fc d8 88 ac ca ee 07 00



I'll leave the scriptSig for now.


Next: sequence.

In Ken Shirriff's transaction, this was: ff ff ff ff

In this case, it is: fd ff ff ff

I wonder why it's set to "fd ff ff ff", rather than 0.


That's the end of the input item.


Next: The output count

output count == the number of output items

According to
en.bitcoin.it/wiki/Protocol_specification#tx
the output count is a var_int.

The next byte is 02, which is less than FD, so this byte is the value of the output count number. 0*16 + 2 = 2.

There are two outputs, as seen earlier in the blockchain.info output.


Next: value (of the first output)

In Ken Shirriff's transaction, this is: 62 64 01 00 00 00 00 00

In the excerpt above, he notes that this is a little-endian value.

So, eight bytes.

In my transaction, the next eight bytes are: e8 d9 41 00 00 00 00 00

In decimal, the value is (14*16+8) + (13*16+9)*256 + (4*16+1)*(256^2) = 4315624 satoshi == 0.04315624 bitcoin.
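
The same decoding can be double-checked with a short sketch, assuming Python 3 (the variable names are hypothetical):

```python
value_bytes = bytes.fromhex("e8d9410000000000")

# Interpret the 8 bytes as a little-endian unsigned integer (satoshi).
satoshi = int.from_bytes(value_bytes, "little")

# 1 bitcoin = 100,000,000 satoshi; integer arithmetic avoids float rounding.
whole, fraction = divmod(satoshi, 100000000)

print(satoshi)                         # 4315624
print("%d.%08d" % (whole, fraction))   # 0.04315624
```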


Next: script length

According to
en.bitcoin.it/wiki/Protocol_specification#tx
the script length is a var_int.

The next byte is 19, which is less than FD, so this byte is the value of the script length number. 1*16 + 9 = 25.

The next 25 bytes are the scriptPubKey.


Our current position in the transaction is: 4 + 1 + 32 + 4 + 1 + 107 + 4 + 1 + 8 + 1 = 163. We have just looked at byte 163.
Starting from index 0, we are at index 162.

In python, print transaction[2*163:2*(163+25)].

result:
76a914338cdde52f708236affa5675f969606ff846ee6f88ac


So, in the raw transaction, the scriptPubKey occupies bytes 163 to 187 (counting from index 0):

01 00 00 00 01 4f 8c ad 2c a3 b6 02 75 0e 06 c4 30 0d 91 b2 2c 17 dd 97 bb 5c e6 35 d8 60 df 8b 0a b3 25 b3 36 01 00 00 00 6b 48 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01 21 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61 fd ff ff ff 02 e8 d9 41 00 00 00 00 00 19 76 a9 14 33 8c dd e5 2f 70 82 36 af fa 56 75 f9 69 60 6f f8 46 ee 6f 88 ac c4 71 6e 01 00 00 00 00 19 76 a9 14 34 96 95 0f 1a 01 28 5a 16 05 ac 93 37 c0 6a 55 96 b9 fc d8 88 ac ca ee 07 00



I'll leave the scriptPubKey for now.


That's the end of the first output item.


Next: another output item.


Next: value (of the second output)

In my transaction, the next eight bytes are: c4 71 6e 01 00 00 00 00

In decimal, the value is (12*16+4) + (7*16+1)*256 + (6*16+14)*(256^2) + (0*16+1)*(256^3) = 24015300 satoshi == 0.24015300 bitcoin.


Next: script length (of the second output)

The next byte is 19, which is less than FD, so this byte is the value of the script length number. 1*16 + 9 = 25.

The next 25 bytes are the scriptPubKey (of the second output).


Our current position in the transaction is: 4 + 1 + 32 + 4 + 1 + 107 + 4 + 1 + 8 + 1 + 25 + 8 + 1 = 197. We have just looked at byte 197.
Starting from index 0, we are at index 196.

In python, print transaction[2*197:2*(197+25)].

result:
76a9143496950f1a01285a1605ac9337c06a5596b9fcd888ac


So, in the raw transaction, the scriptPubKey of the second output occupies bytes 197 to 221 (counting from index 0):

01 00 00 00 01 4f 8c ad 2c a3 b6 02 75 0e 06 c4 30 0d 91 b2 2c 17 dd 97 bb 5c e6 35 d8 60 df 8b 0a b3 25 b3 36 01 00 00 00 6b 48 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01 21 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61 fd ff ff ff 02 e8 d9 41 00 00 00 00 00 19 76 a9 14 33 8c dd e5 2f 70 82 36 af fa 56 75 f9 69 60 6f f8 46 ee 6f 88 ac c4 71 6e 01 00 00 00 00 19 76 a9 14 34 96 95 0f 1a 01 28 5a 16 05 ac 93 37 c0 6a 55 96 b9 fc d8 88 ac ca ee 07 00



I'll leave the scriptPubKey for now.


That's the end of the second output item.


The last four bytes of the raw transaction are the block lock time.

In Ken Shirriff's transaction, these are: 00 00 00 00

In mine, they are: ca ee 07 00

I'm not sure whether this is little-endian or big-endian.
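
One way to probe the question is to decode the four bytes both ways and compare the results against the locktime rule quoted earlier (values below 500 million are block heights). This is a minimal sketch assuming Python 3. It is only a plausibility check, not a proof, although the other integer fields in this transaction were all little-endian.

```python
lock_time_bytes = bytes.fromhex("caee0700")

as_little_endian = int.from_bytes(lock_time_bytes, "little")
as_big_endian = int.from_bytes(lock_time_bytes, "big")

print(as_little_endian)              # 519882
print(as_little_endian < 500000000)  # True: would be read as a block height
print(as_big_endian < 500000000)     # False: would be read as a Unix time
```

Read as little-endian, the value is a block height of 519882. Read as big-endian, it would be a Unix time of more than 3.4 billion seconds, which is far later than the "currently over 1.395 billion" figure in the excerpt.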



Let's write my selected raw transaction in a more readable, subdivided form:


Ken Shirriff's signed transaction:

[Signed Transaction]
- version: 01 00 00 00
- input count: 01
- input:
-- previous output hash (reversed): 48 4d 40 d4 5b 9e a0 d6 52 fc a8 25 8a b7 ca a4 25 41 eb 52 97 58 57 f9 6f b5 0c d7 32 c8 b4 81
-- previous output index: 00 00 00 00
-- script length: 8a
-- scriptSig: 47 30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01 41 04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd
-- sequence: ff ff ff ff
- output count: 01
- output:
-- value: 62 64 01 00 00 00 00 00
-- script length: 19
-- scriptPubKey: 76 a9 14 c8 e9 09 96 c7 c6 08 0e e0 62 84 60 0c 68 4e d9 04 d1 4c 5c 88 ac
- block lock time: 00 00 00 00



The selected raw transaction, in single bytes:

01 00 00 00 01 4f 8c ad 2c a3 b6 02 75 0e 06 c4 30 0d 91 b2 2c 17 dd 97 bb 5c e6 35 d8 60 df 8b 0a b3 25 b3 36 01 00 00 00 6b 48 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01 21 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61 fd ff ff ff 02 e8 d9 41 00 00 00 00 00 19 76 a9 14 33 8c dd e5 2f 70 82 36 af fa 56 75 f9 69 60 6f f8 46 ee 6f 88 ac c4 71 6e 01 00 00 00 00 19 76 a9 14 34 96 95 0f 1a 01 28 5a 16 05 ac 93 37 c0 6a 55 96 b9 fc d8 88 ac ca ee 07 00



The selected raw transaction, in sections:

- version: 01 00 00 00
- input count: 01
- input:
-- previous output hash (reversed): 4f 8c ad 2c a3 b6 02 75 0e 06 c4 30 0d 91 b2 2c 17 dd 97 bb 5c e6 35 d8 60 df 8b 0a b3 25 b3 36
-- previous output index: 01 00 00 00
-- script length: 6b
-- scriptSig: 48 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01 21 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61
-- sequence: fd ff ff ff
- output count: 02
- output:
-- value: e8 d9 41 00 00 00 00 00
-- script length: 19
-- scriptPubKey: 76 a9 14 33 8c dd e5 2f 70 82 36 af fa 56 75 f9 69 60 6f f8 46 ee 6f 88 ac
- output:
-- value: c4 71 6e 01 00 00 00 00
-- script length: 19
-- scriptPubKey: 76 a9 14 34 96 95 0f 1a 01 28 5a 16 05 ac 93 37 c0 6a 55 96 b9 fc d8 88 ac
- block lock time: ca ee 07 00
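
A minimal sketch of how the multi-byte integer fields listed above can be turned into numbers, assuming (as the Bitcoin protocol documentation describes) that they are stored little-endian, i.e. least significant byte first. The helper name little_endian_to_int is my own and is not part of parser1.py. The code is written to run under both Python 2.7 and Python 3.

```python
def little_endian_to_int(spaced_hex):
    # "e8 d9 41 00" -> ["e8", "d9", "41", "00"]. The first byte is the least
    # significant, so reverse the byte order before reading the hex as a number.
    hex_bytes = spaced_hex.split()
    return int("".join(reversed(hex_bytes)), 16)


# Fields copied from the two transactions listed above.
selected_value_1 = little_endian_to_int("e8 d9 41 00 00 00 00 00")
selected_value_2 = little_endian_to_int("c4 71 6e 01 00 00 00 00")
selected_lock_time = little_endian_to_int("ca ee 07 00")
ken_value = little_endian_to_int("62 64 01 00 00 00 00 00")

print("selected transaction, output 1 value: %d" % selected_value_1)
print("selected transaction, output 2 value: %d" % selected_value_2)
print("selected transaction, block lock time: %d" % selected_lock_time)
print("Ken Shirriff's transaction, output value: %d" % ken_value)
```

Output values are expressed in satoshi (1 bitcoin = 100,000,000 satoshi).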



I wonder why the scriptSig in Ken Shirriff's transaction is longer than the scriptSig in my selected transaction.
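
One way to investigate this is to split each scriptSig into the data items that it pushes. The sketch below assumes that every item is introduced by a single-byte direct push opcode (a byte from 0x01 to 0x4b giving the number of bytes that follow), which holds for both scriptSigs here but not for scripts in general. The function name split_pushes is hypothetical.

```python
def split_pushes(spaced_hex):
    # Assumption: the script consists only of direct pushes (opcodes 0x01..0x4b).
    hex_bytes = spaced_hex.split()
    items = []
    index = 0
    while index < len(hex_bytes):
        length = int(hex_bytes[index], 16)
        if not 1 <= length <= 75:
            raise ValueError("byte %s is not a direct push opcode" % hex_bytes[index])
        index += 1
        item = hex_bytes[index:index + length]
        if len(item) != length:
            raise ValueError("push runs past the end of the script")
        items.append(item)
        index += length
    return items


KEN_SCRIPT_SIG = (
    "47 30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 "
    "02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01 "
    "41 04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 "
    "10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd"
)
SELECTED_SCRIPT_SIG = (
    "48 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 "
    "02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01 "
    "21 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61"
)

for name, script in (("Ken Shirriff", KEN_SCRIPT_SIG), ("selected", SELECTED_SCRIPT_SIG)):
    items = split_pushes(script)
    lengths = [len(item) for item in items]
    print("%s: item lengths %s, second item starts with %s" % (name, lengths, items[1][0]))
```

The second item (the public key) accounts for most of the difference: 65 bytes beginning with 04 in Ken Shirriff's transaction against 33 bytes beginning with 02 in mine. This suggests an uncompressed public key in the first case and a compressed one in the second.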




Next: Find out how to verify the signature on Ken Shirriff's transaction. Later, I will use what I have learned to attempt to verify the signature on my selected transaction.



Excerpts from Ken Shirriff's article:

You might expect that a Bitcoin transaction is signed simply by including the signature in the transaction, but the process is much more complicated. In fact, there is a small program inside each transaction that gets executed to decide if a transaction is valid. This program is written in Script, the stack-based Bitcoin scripting language.

[...]

The Script language is surprisingly complex, with about 80 different opcodes. It includes arithmetic, bitwise operations, string operations, conditionals, and stack manipulation. The language also includes the necessary cryptographic operations (SHA-256, RIPEMD, etc.) as primitives. In order to ensure that scripts terminate, the language does not contain any looping operations. (As a consequence, it is not Turing-complete.) [...]

In order for a Bitcoin transaction to be valid, the two parts of the redemption script must run successfully. The script in the old transaction is called scriptPubKey and the script in the new transaction is called scriptSig. To verify a transaction, the scriptSig executed followed by the scriptPubKey. If the script completes successfully, the transaction is valid and the Bitcoin can be spent. Otherwise, the transaction is invalid. The point of this is that the scriptPubKey in the old transaction defines the conditions for spending the bitcoins. The scriptSig in the new transaction must provide the data to satisfy the conditions.

In a standard transaction, the scriptSig pushes the signature (generated from the private key) to the stack, followed by the public key. Next, the scriptPubKey (from the source transaction) is executed to verify the public key and then verify the signature.

As expressed in Script, the scriptSig is:

PUSHDATA
signature data and SIGHASH_ALL
PUSHDATA
public key data


The scriptPubKey is:

OP_DUP
OP_HASH160
PUSHDATA
Bitcoin address (public key hash)
OP_EQUALVERIFY
OP_CHECKSIG


When this code executes, PUSHDATA first pushes the signature to the stack. The next PUSHDATA pushes the public key to the stack. Next, OP_DUP duplicates the public key on the stack. OP_HASH160 computes the 160-bit hash of the public key. PUSHDATA pushes the required Bitcoin address. Then OP_EQUALVERIFY verifies the top two stack values are equal - that the public key hash from the new transaction matches the address in the old address. This proves that the public key is valid. Next, OP_CHECKSIG checks that the signature of the transaction matches the public key and signature on the stack. This proves that the signature is valid.
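
The stack walk-through in the excerpt above can be traced with a toy sketch. This is not a Script interpreter: the opcodes are hard-coded in the order given in the excerpt, and the hashing and signature checks are passed in as functions. The two stand-in functions used in the demonstration (toy_hash160 and toy_check_sig) are deliberately fake and are my own assumptions. The real OP_HASH160 applies RIPEMD-160 to the SHA-256 digest, and the real OP_CHECKSIG performs ECDSA verification against a hash of the transaction.

```python
import hashlib


def run_p2pkh(signature, public_key, required_hash, hash160, check_sig):
    # Returns True if scriptSig followed by scriptPubKey completes successfully.
    stack = []
    stack.append(signature)               # scriptSig: PUSHDATA signature
    stack.append(public_key)              # scriptSig: PUSHDATA public key
    stack.append(stack[-1])               # OP_DUP
    stack.append(hash160(stack.pop()))    # OP_HASH160
    stack.append(required_hash)           # PUSHDATA public key hash
    if stack.pop() != stack.pop():        # OP_EQUALVERIFY
        return False
    key = stack.pop()                     # OP_CHECKSIG consumes the public key,
    sig = stack.pop()                     # then the signature
    return check_sig(sig, key)


def toy_hash160(data):
    # Stand-in only (assumption): truncated SHA-256 instead of RIPEMD-160(SHA-256).
    return hashlib.sha256(data).digest()[:20]


def toy_check_sig(signature, public_key):
    # Stand-in only (assumption): no ECDSA is performed here.
    return signature == b"toy-signature-by:" + public_key


public_key = b"toy-public-key"
signature = b"toy-signature-by:" + public_key
required_hash = toy_hash160(public_key)

right_key = run_p2pkh(signature, public_key, required_hash, toy_hash160, toy_check_sig)
wrong_key = run_p2pkh(signature, b"other-key", required_hash, toy_hash160, toy_check_sig)
print("matching public key: %s" % right_key)
print("different public key: %s" % wrong_key)
```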



Hm. Ok.
Notes:
- The previous/source transaction that contains the unspent output (which is used as an input in the current transaction) also holds the scriptPubKey, which roughly means "when this unspent output is used as an input for a new transaction (i.e. 'spent'), the transaction must be signed by the private key corresponding to the bitcoin address that holds this unspent output".
- The new/current transaction contains a scriptSig, which contains the signature that satisfies the condition set by the scriptPubKey.
- The scriptSig contains the bitcoin public key. This public key corresponds to the private key of the bitcoin address that "holds" the unspent output that the new transaction will spend. (It "holds" the unspent output in the sense that, according to the bitcoin distributed ledger, this unspent output is associated with this bitcoin address.)
-- Note: The bitcoin public key, when reformatted in a particular way, becomes the bitcoin address.
- The scriptPubKey contains the hash of the bitcoin public key. The scriptSig of a new transaction must contain the bitcoin public key that hashes to this hash value.
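
As a rough illustration of the last note, the public key hash can be lifted out of a scriptPubKey that follows the standard layout described in the excerpt. The sketch assumes the 25-byte form OP_DUP (76), OP_HASH160 (a9), a 20-byte push (14), the hash itself, OP_EQUALVERIFY (88), OP_CHECKSIG (ac). The function name is hypothetical.

```python
def p2pkh_public_key_hash(spaced_hex):
    # Returns the 20 hash bytes, or None if the script does not match the layout.
    b = spaced_hex.split()
    if len(b) != 25:
        return None
    if b[:3] != ["76", "a9", "14"] or b[23:] != ["88", "ac"]:
        return None
    return b[3:23]


# scriptPubKey of the single output in Ken Shirriff's transaction.
script_pub_key = "76 a9 14 c8 e9 09 96 c7 c6 08 0e e0 62 84 60 0c 68 4e d9 04 d1 4c 5c 88 ac"
public_key_hash = p2pkh_public_key_hash(script_pub_key)
print("public key hash: %s" % " ".join(public_key_hash))
```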



So, I need to learn how to execute the various operations in the signature verification. I also need to learn exactly which data is required as input for these operations and which format(s) this data must be in before the operations are performed.


There is one scriptPubKey per output in the selected transaction. If they are used in a new transaction, each will probably require a separate signature. Therefore, when making a new transaction, each input must be signed, presumably separately. I think that each signature will apply to the raw unsigned form of the transaction, and perhaps only to the section of the transaction that deals with a specific input. Some manipulation will therefore be required to get the transaction data into the signed form, in order to verify a signature.


The scriptSig signature does several things:
- It proves that the transaction has not been changed since it was created.
- It proves that the transaction was created by someone who had a specific public key (which is contained in the scriptSig).
- It proves that the transaction was created by someone who had the private key that corresponds to the specific public key required by the scriptPubKey of the relevant input.


Hm. In order to properly verify Ken Shirriff's transaction (i.e. in the way that the bitcoin client would verify it), I need the scriptPubKey of the input, which will be in the previous/source transaction. I will then verify that the scriptSig of Ken Shirriff's transaction satisfies this scriptPubKey.


From an earlier excerpt from Ken Shirriff's article:
"I started by bying bitcoins from Coinbase and putting 0.00101234 bitcoins into address 1MMMMSUb1piy2ufrSguNUdFmAcvqrQF8M5, which was transaction 81b4c832...."
In the article, "81b4c832...." is linked to:
blockchain.info/tx/81b4c832d70cb56ff957589752eb4125a4cab78a25a8fc52d6a09e5bd4404d48

This transaction contains 3 inputs and 2 outputs. The first output is to address 1MMMMSUb1piy2ufrSguNUdFmAcvqrQF8M5.


All the input and output addresses begin with "1", so none of them are multisig. (I think that multisig addresses may affect the transaction format.)



I'll begin writing a transaction parser.


I'll get the raw transaction by appending "?format=hex" to the hyperlink reference.

Browse to:
http://blockchain.info/tx/81b4c832d70cb56ff957589752eb4125a4cab78a25a8fc52d6a09e5bd4404d48?format=hex

Result:

0100000003c7c4e9082dfbb3713e47c1bef9f68efd9b856dfcdee318416daaa2add82770d4010000008b483045022100b2fd3f8a8c226f2addb1b663009c344e2c351dad0daf022d1cb12fe77e81563a02206e012ef6235d10ee058d4c3d61b44a8ca18ebc5e99388e6004f306b289683e02014104365c787aaf52a181a6e110f9d3daa08103f91b1512bea21398b9ea1f1beb5dae9155daad83b1dabbf316c4b6e4bb438344204a1db1d33ccb47d01c2051c0974affffffffee595b71cc5a1fb980a77af8b962534ae0049b2906fbe0790e48fa71e9f02299000000008b483045022100faf84d50e99deeef7d2cf9f8b4600b8de56fd788d9f66521a04378f90b49a87602204608cde776fda754ce871abff873dc4d29aa34c03a50d96085200e825bae73a40141045684d9b38346deb7f93b6b8282dcf8227bfcb72913d8f4c5fad9987e38770467fee26b5a0b57d0aef4df4002463ec1f1934640d6905eede1ed28bb7e432bcbd1ffffffff5e327471c5bdbecca45fccbd698a479424a31c99a9704b68e619f6b3e3b92955010000008a47304402206f361fb4b97aaea04f18cf9ff81a186e48ed4478379e6e93e0ab28d4d48226c902201565b73f2f03effacf429e7a840543c577347518f256dcc3def8558fa8effcef0141040dc0d62bd87d2e54be3f95c4187fa30d48d6cd00431978401dd798b74507d9adb457fd9434876f562735baef0bb9d6e35366c2c181b6d961b8362ad95af726aaffffffff02728b0100000000001976a914df3bd30160e6c6145baaf2c88a8844c13a00d1d588ac95871a00000000001976a9149f036f85506693bb84c5d910d58bdcf8283b20ce88ac00000000



I'll refer to this transaction as "tr_ken1".

Ken Shirriff's transaction is "tr_ken2".

My selected transaction is "tr_s2". Whichever transaction provides the single input to my selected transaction will be "tr_s1".



First goal for a transaction parser: Read the raw transaction and output it in readable sections.



Make a new directory in which to work.

Save the raw transaction above as "tr_ken1.txt" in the work directory.

In TextWrangler, create a new file "parser1.py" and save it in the work directory.


Let's develop.


Parser1

python 2.7.13
#!/opt/local/bin/python

def main():

	print "hello"



if __name__ == "__main__": main()



aineko:work stjohnpiano$ python --version

Python 2.7.13

aineko:work stjohnpiano$ chmod 700 parser1.py


aineko:work stjohnpiano$ ./parser1.py

hello






Parser1

python 2.7.13
#!/opt/local/bin/python

def main():

	file_path_1 = "tr_ken1.txt"
	raw_transaction = file_get_contents(file_path_1)

	print raw_transaction



def file_get_contents(file_path):
	from os.path import isfile
	if not isfile(file_path): 
		stop("%s is not a file." % file_path)
	f = open(file_path, "r")
	data = f.read()
	f.close()
	return data


def stop(message):
	print "ERROR: %s\n" % message
	sys.exit(1)


if __name__ == "__main__": main()



aineko:work stjohnpiano$ ./parser1.py

0100000003c7c4e9082dfbb3713e47c1bef9f68efd9b856dfcdee318416daaa2add82770d4010000008b483045022100b2fd3f8a8c226f2addb1b663009c344e2c351dad0daf022d1cb12fe77e81563a02206e012ef6235d10ee058d4c3d61b44a8ca18ebc5e99388e6004f306b289683e02014104365c787aaf52a181a6e110f9d3daa08103f91b1512bea21398b9ea1f1beb5dae9155daad83b1dabbf316c4b6e4bb438344204a1db1d33ccb47d01c2051c0974affffffffee595b71cc5a1fb980a77af8b962534ae0049b2906fbe0790e48fa71e9f02299000000008b483045022100faf84d50e99deeef7d2cf9f8b4600b8de56fd788d9f66521a04378f90b49a87602204608cde776fda754ce871abff873dc4d29aa34c03a50d96085200e825bae73a40141045684d9b38346deb7f93b6b8282dcf8227bfcb72913d8f4c5fad9987e38770467fee26b5a0b57d0aef4df4002463ec1f1934640d6905eede1ed28bb7e432bcbd1ffffffff5e327471c5bdbecca45fccbd698a479424a31c99a9704b68e619f6b3e3b92955010000008a47304402206f361fb4b97aaea04f18cf9ff81a186e48ed4478379e6e93e0ab28d4d48226c902201565b73f2f03effacf429e7a840543c577347518f256dcc3def8558fa8effcef0141040dc0d62bd87d2e54be3f95c4187fa30d48d6cd00431978401dd798b74507d9adb457fd9434876f562735baef0bb9d6e35366c2c181b6d961b8362ad95af726aaffffffff02728b0100000000001976a914df3bd30160e6c6145baaf2c88a8844c13a00d1d588ac95871a00000000001976a9149f036f85506693bb84c5d910d58bdcf8283b20ce88ac00000000





Parser1

python 2.7.13
#!/opt/local/bin/python

def main():

	file_path_1 = "tr_ken1.txt"
	raw_transaction = file_get_contents(file_path_1)
	
	if len(raw_transaction) % 2 != 0:
		stop("length of raw transaction in hex format is not even.")
	hex_bytes = [raw_transaction[i:i+2] for i in xrange(0, len(raw_transaction), 2)]
	
	print " ".join(hex_bytes)



def file_get_contents(file_path):
	from os.path import isfile
	if not isfile(file_path): 
		stop("%s is not a file." % file_path)
	f = open(file_path, "r")
	data = f.read()
	f.close()
	return data


def stop(message):
	print "\nERROR: %s\n" % message
	sys.exit(1)


if __name__ == "__main__": main()



aineko:work stjohnpiano$ ./parser1.py

01 00 00 00 03 c7 c4 e9 08 2d fb b3 71 3e 47 c1 be f9 f6 8e fd 9b 85 6d fc de e3 18 41 6d aa a2 ad d8 27 70 d4 01 00 00 00 8b 48 30 45 02 21 00 b2 fd 3f 8a 8c 22 6f 2a dd b1 b6 63 00 9c 34 4e 2c 35 1d ad 0d af 02 2d 1c b1 2f e7 7e 81 56 3a 02 20 6e 01 2e f6 23 5d 10 ee 05 8d 4c 3d 61 b4 4a 8c a1 8e bc 5e 99 38 8e 60 04 f3 06 b2 89 68 3e 02 01 41 04 36 5c 78 7a af 52 a1 81 a6 e1 10 f9 d3 da a0 81 03 f9 1b 15 12 be a2 13 98 b9 ea 1f 1b eb 5d ae 91 55 da ad 83 b1 da bb f3 16 c4 b6 e4 bb 43 83 44 20 4a 1d b1 d3 3c cb 47 d0 1c 20 51 c0 97 4a ff ff ff ff ee 59 5b 71 cc 5a 1f b9 80 a7 7a f8 b9 62 53 4a e0 04 9b 29 06 fb e0 79 0e 48 fa 71 e9 f0 22 99 00 00 00 00 8b 48 30 45 02 21 00 fa f8 4d 50 e9 9d ee ef 7d 2c f9 f8 b4 60 0b 8d e5 6f d7 88 d9 f6 65 21 a0 43 78 f9 0b 49 a8 76 02 20 46 08 cd e7 76 fd a7 54 ce 87 1a bf f8 73 dc 4d 29 aa 34 c0 3a 50 d9 60 85 20 0e 82 5b ae 73 a4 01 41 04 56 84 d9 b3 83 46 de b7 f9 3b 6b 82 82 dc f8 22 7b fc b7 29 13 d8 f4 c5 fa d9 98 7e 38 77 04 67 fe e2 6b 5a 0b 57 d0 ae f4 df 40 02 46 3e c1 f1 93 46 40 d6 90 5e ed e1 ed 28 bb 7e 43 2b cb d1 ff ff ff ff 5e 32 74 71 c5 bd be cc a4 5f cc bd 69 8a 47 94 24 a3 1c 99 a9 70 4b 68 e6 19 f6 b3 e3 b9 29 55 01 00 00 00 8a 47 30 44 02 20 6f 36 1f b4 b9 7a ae a0 4f 18 cf 9f f8 1a 18 6e 48 ed 44 78 37 9e 6e 93 e0 ab 28 d4 d4 82 26 c9 02 20 15 65 b7 3f 2f 03 ef fa cf 42 9e 7a 84 05 43 c5 77 34 75 18 f2 56 dc c3 de f8 55 8f a8 ef fc ef 01 41 04 0d c0 d6 2b d8 7d 2e 54 be 3f 95 c4 18 7f a3 0d 48 d6 cd 00 43 19 78 40 1d d7 98 b7 45 07 d9 ad b4 57 fd 94 34 87 6f 56 27 35 ba ef 0b b9 d6 e3 53 66 c2 c1 81 b6 d9 61 b8 36 2a d9 5a f7 26 aa ff ff ff ff 02 72 8b 01 00 00 00 00 00 19 76 a9 14 df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5 88 ac 95 87 1a 00 00 00 00 00 19 76 a9 14 9f 03 6f 85 50 66 93 bb 84 c5 d9 10 d5 8b dc f8 28 3b 20 ce 88 ac 00 00 00 00
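
A hedged variation on the byte-splitting step. Text editors often append a newline when saving a file, which would trip the even-length check, so this sketch removes all whitespace first and also checks that only hex characters remain. The function name split_hex_bytes is my own, and it raises ValueError where parser1.py would call stop().

```python
def split_hex_bytes(text):
    # Remove spaces and newlines, and normalise to lowercase.
    cleaned = "".join(text.split()).lower()
    if len(cleaned) % 2 != 0:
        raise ValueError("length of raw transaction in hex format is not even.")
    for c in cleaned:
        if c not in "0123456789abcdef":
            raise ValueError("character %s is not a hex character." % c)
    # Two hex characters make one byte.
    return [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]


# The first seven bytes of tr_ken1, with a trailing newline added.
print(" ".join(split_hex_bytes("0100000003c7c4\n")))
```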







Parser1

python 2.7.13
#!/opt/local/bin/python

def main():

	file_path_1 = "tr_ken1.txt"
	raw_transaction = file_get_contents(file_path_1)
	
	if len(raw_transaction) % 2 != 0:
		stop("length of raw transaction in hex format is not even.")
	hex_bytes = [raw_transaction[i:i+2] for i in xrange(0, len(raw_transaction), 2)]
	
	version = hex_bytes[0:4]
	input_count = hex_bytes[4]
	
	
	
	print "Transaction:"
	print "- version: %s" % spaced(version)
	print "- input_count: %s" % spaced(input_count)



def spaced(item):
	# expects either a string (single hex byte) or a list (multiple hex bytes)
	if isinstance(item, str):
		return item
	elif isinstance(item, list):
		return " ".join(item)
	stop("item is neither a string nor a list.")


def file_get_contents(file_path):
	from os.path import isfile
	if not isfile(file_path): 
		stop("%s is not a file." % file_path)
	f = open(file_path, "r")
	data = f.read()
	f.close()
	return data


def stop(message):
	print "\nERROR: %s\n" % message
	sys.exit(1)


if __name__ == "__main__": main()



aineko:work stjohnpiano$ ./parser1.py

Transaction:
- version: 01 00 00 00
- input_count: 03








Parser1

python 2.7.13
#!/opt/local/bin/python

def main():

	file_path_1 = "tr_ken1.txt"
	raw_transaction = file_get_contents(file_path_1)
	raw_transaction = raw_transaction.lower()
	
	if len(raw_transaction) % 2 != 0:
		stop("length of raw transaction in hex format is not even.")
	hex_bytes = [raw_transaction[i:i+2] for i in xrange(0, len(raw_transaction), 2)]
	
	version = hex_bytes[0:4]
	input_count = hex_bytes[4]
	
	input_count_length = get_byte_length_of_var_int(input_count)
	
	
	print "Transaction:"
	print "- version: %s" % spaced(version)
	print "- input_count: %s" % spaced(input_count)
	print "-- input_count byte length: %d" % input_count_length



def get_byte_length_of_var_int(byte):
	# examine the first byte of a variable length integer to find out its length. 
	if byte == "ff":
		return 9
	elif byte == "fe":
		return 5
	elif byte == "fd":
		return 3
	return 1


def spaced(item):
	# expects either a string (single hex byte) or a list (multiple hex bytes)
	if isinstance(item, str):
		return item
	elif isinstance(item, list):
		return " ".join(item)
	stop("item is neither a string nor a list.")


def file_get_contents(file_path):
	from os.path import isfile
	if not isfile(file_path): 
		stop("%s is not a file." % file_path)
	f = open(file_path, "r")
	data = f.read()
	f.close()
	return data


def stop(message):
	print "\nERROR: %s\n" % message
	sys.exit(1)


if __name__ == "__main__": main()



aineko:work stjohnpiano$ ./parser1.py

Transaction:
- version: 01 00 00 00
- input_count: 03
-- input_count byte length: 1
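
The byte lengths 1, 3, 5 and 9 returned above correspond to a one-byte marker followed by 0, 2, 4 or 8 bytes of data. A minimal sketch of reading the whole variable length integer, assuming that the data bytes after the marker are little-endian. The function name read_var_int is hypothetical and is not part of parser1.py.

```python
def read_var_int(hex_bytes, index):
    # Returns (value, total byte length) for the variable length integer
    # that starts at hex_bytes[index].
    first = int(hex_bytes[index], 16)
    if first < 0xfd:
        # Values below 0xfd are stored directly in a single byte.
        return first, 1
    payload_length = {0xfd: 2, 0xfe: 4, 0xff: 8}[first]
    payload = hex_bytes[index + 1:index + 1 + payload_length]
    if len(payload) != payload_length:
        raise ValueError("variable length integer runs past the end of the data")
    # The payload is little-endian, so reverse it before reading it as hex.
    value = int("".join(reversed(payload)), 16)
    return value, 1 + payload_length


print("03 -> value %d, byte length %d" % read_var_int(["03"], 0))
print("fd 2c 01 -> value %d, byte length %d" % read_var_int(["fd", "2c", "01"], 0))
```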








Parser1

python 2.7.13
#!/opt/local/bin/python


def main():

	file_path_1 = "tr_ken1.txt"
	raw_transaction = file_get_contents(file_path_1)
	raw_transaction = raw_transaction.lower()
	
	if len(raw_transaction) % 2 != 0:
		stop("length of raw transaction in hex format is not even.")
	hex_bytes = [raw_transaction[i:i+2] for i in xrange(0, len(raw_transaction), 2)]
	
	version = hex_bytes[0:4]
	input_count = hex_bytes[4]
	
	input_count_length = get_byte_length_of_var_int(input_count)
	
	if input_count_length != 1:
		stop("mark1: unwritten section.")
	elif input_count_length == 1:
		input_count_decimal = hex_string_to_decimal(input_count)
	
	
	
	
	print "Transaction:"
	print "- version: %s" % spaced(version)
	print "- input_count: %s" % spaced(input_count)
	print "-- [property] input_count byte length: %d" % input_count_length
	print "-- [property] input_count decimal value: %d" % input_count_decimal





def hex_string_to_decimal(string):
	total = 0
	n = len(string)
	for i in xrange(n):
		c = string[i] # c = character
		v = hex_character_to_decimal(c) # v = value
		power = n - i - 1 # first character has the highest power of 16. power of last character is 0. 
		total += v*(16**power)
	return total
	

def hex_character_to_decimal(c):
	decimal_digits = "0123456790"
	hex_digits = {"a": 10, "b": 11, "c": 12, "d": 13, "e": 14, "f": 15}
	if c in decimal_digits:
		return int(c)
	elif c in hex_digits.keys():
		return hex_digits[c]
	stop("character is not a hex character.")
		

def get_byte_length_of_var_int(byte):
	# examine the first byte of a variable length integer to find out its length. 
	if byte == "ff":
		return 9
	elif byte == "fe":
		return 5
	elif byte == "fd":
		return 3
	return 1


def spaced(item):
	# expects either a string (single hex byte) or a list (multiple hex bytes)
	if isinstance(item, str):
		return item
	elif isinstance(item, list):
		return " ".join(item)
	stop("item is neither a string nor a list.")


def file_get_contents(file_path):
	from os.path import isfile
	if not isfile(file_path): 
		stop("%s is not a file." % file_path)
	f = open(file_path, "r")
	data = f.read()
	f.close()
	return data


def stop(message):
	print "\nERROR: %s\n" % message
	sys.exit(1)


if __name__ == "__main__": main()



aineko:work stjohnpiano$ ./parser1.py

Transaction:
- version: 01 00 00 00
- input_count: 03
-- [property] input_count byte length: 1
-- [property] input_count decimal value: 3
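
For comparison, Python's built-in int() accepts a base argument and performs the same conversion as the hand-written hex_string_to_decimal. A short sketch, in which the name hex_to_int is my own:

```python
def hex_to_int(string):
    # int() with base 16 raises ValueError if the string is not valid hex.
    return int(string, 16)


print("03 -> %d" % hex_to_int("03"))
print("ff -> %d" % hex_to_int("ff"))
```

Writing the conversion by hand, as in the snapshot above, has the advantage of making the positional arithmetic explicit.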







Parser1

python 2.7.13
#!/opt/local/bin/python


def main():

	file_path_1 = "tr_ken1.txt"
	raw_transaction = file_get_contents(file_path_1)
	raw_transaction = raw_transaction.lower()
	
	if len(raw_transaction) % 2 != 0:
		stop("length of raw transaction in hex format is not even.")
	hex_bytes = [raw_transaction[i:i+2] for i in xrange(0, len(raw_transaction), 2)]
	
	version = hex_bytes[0:4]
	
	# next byte is the first byte of the input count. 
	input_count = hex_bytes[4]
	
	# input_count is a variable length integer, so we need to find out how long it is. 
	input_count_length = get_byte_length_of_var_int(input_count)
	
	if input_count_length != 1:
		stop("mark1: unwritten section.")
		# future: append appropriate number of bytes to input_count. 
	elif input_count_length == 1:
		input_count_decimal = hex_string_to_decimal(input_count)
	
	index = 4 + input_count_length # our current byte position within the raw transaction (starts from 0)
	
	# parse each input item
	input = { "previous_output_hash": None, 
			  "previous_output_index": None,
			  "script_length": None,
			  "scriptSig": None }
	inputs = [input]*input_count_decimal
	for i in xrange(input_count_decimal):
		# next 32 bytes contain the hash of the transaction that contains the output that became this input. 
		previous_output_hash = hex_bytes[index:(index+32)]
		# next 4 bytes contain the index of the output in the previous transaction.
		index += 32
		previous_output_index = hex_bytes[index:(index+4)]
		# next byte is the first byte of the script length. 
		# script_length is a variable length integer, so we need to find out how long it is. 
		index += 4
		script_length = hex_bytes[index]
		script_length_length = get_byte_length_of_var_int(script_length)
		
		if script_length_length != 1:
			stop("mark2: unwritten section.")
			# future: append appropriate number of bytes to script_length. 
		elif script_length_length == 1:
			script_length_decimal = hex_string_to_decimal(script_length)
		
		index += script_length_decimal
	
	print "Transaction:"
	print "- [property] byte length: %d" % len(hex_bytes)
	print "- version: %s" % spaced(version)
	print "- input_count: %s" % spaced(input_count)
	print "-- [property] input_count byte length: %d" % input_count_length
	print "-- [property] input_count decimal value: %d" % input_count_decimal





def hex_string_to_decimal(string):
	total = 0
	n = len(string)
	for i in xrange(n):
		c = string[i] # c = character
		v = hex_character_to_decimal(c) # v = value
		power = n - i - 1 # first character has the highest power of 16. power of last character is 0. 
		total += v*(16**power)
	return total
	

def hex_character_to_decimal(c):
	decimal_digits = "0123456790"
	hex_digits = {"a": 10, "b": 11, "c": 12, "d": 13, "e": 14, "f": 15}
	if c in decimal_digits:
		return int(c)
	elif c in hex_digits.keys():
		return hex_digits[c]
	stop("character is not a hex character.")
		

def get_byte_length_of_var_int(byte):
	# examine the first byte of a variable length integer to find out its length. 
	if byte == "ff":
		return 9
	elif byte == "fe":
		return 5
	elif byte == "fd":
		return 3
	return 1


def spaced(item):
	# expects either a string (single hex byte) or a list (multiple hex bytes)
	if isinstance(item, str):
		return item
	elif isinstance(item, list):
		return " ".join(item)
	stop("item is neither a string nor a list.")


def file_get_contents(file_path):
	from os.path import isfile
	if not isfile(file_path): 
		stop("%s is not a file." % file_path)
	f = open(file_path, "r")
	data = f.read()
	f.close()
	return data


def stop(message):
	print "\nERROR: %s\n" % message
	sys.exit(1)


if __name__ == "__main__": main()



aineko:work stjohnpiano$ ./parser1.py


ERROR: character is not a hex character.

Traceback (most recent call last):
  File "./parser1.py", line 123, in <module>
    if __name__ == "__main__": main()
  File "./parser1.py", line 52, in main
    script_length_decimal = hex_string_to_decimal(script_length)
  File "./parser1.py", line 72, in hex_string_to_decimal
    v = hex_character_to_decimal(c) # v = value
  File "./parser1.py", line 85, in hex_character_to_decimal
    stop("character is not a hex character.")
  File "./parser1.py", line 120, in stop
    sys.exit(1)
NameError: global name 'sys' is not defined



Add "import sys" as the first line of stop(message).

aineko:work stjohnpiano$ ./parser1.py


ERROR: character is not a hex character.



Edit the last line of hex_character_to_decimal(c) to be:
stop("character %s is not a hex character." % c)



aineko:work stjohnpiano$ ./parser1.py


ERROR: character 8 is not a hex character.




Hm.


Ah.

The line
decimal_digits = "0123456790"
should have been
decimal_digits = "0123456789"
(the digit 8 was missing and 0 appeared twice).


Edit this line.


aineko:work stjohnpiano$ ./parser1.py

Transaction:
- [property] byte length: 617
- version: 01 00 00 00
- input_count: 03
-- [property] input_count byte length: 1
-- [property] input_count decimal value: 3




Let's just look at the first input for now.

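Note: The line "inputs = [input]*input_count_decimal" is an error. It fills the list with references to one and the same dictionary, so a change made to one entry would appear in every entry. I need to explicitly make a copy of the "input" dictionary for each entry in the "inputs" list.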
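
A small demonstration of the difference, using a cut-down dictionary with a single key. The variable names are for illustration only.

```python
# Multiplying a list repeats the reference, not the dictionary itself.
shared = {"scriptSig": None}
aliased = [shared] * 3
aliased[0]["scriptSig"] = "first"
print("aliased: %s" % [entry["scriptSig"] for entry in aliased])

# Copying the template gives each entry its own dictionary.
template = {"scriptSig": None}
copied = [template.copy() for _ in range(3)]
copied[0]["scriptSig"] = "first"
print("copied: %s" % [entry["scriptSig"] for entry in copied])
```

Note that copy() makes a shallow copy. That is sufficient here because each value is replaced outright rather than modified in place.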




Parser1

python 2.7.13
#!/opt/local/bin/python


def main():

	file_path_1 = "tr_ken1.txt"
	raw_transaction = file_get_contents(file_path_1)
	raw_transaction = raw_transaction.lower()
	
	if len(raw_transaction) % 2 != 0:
		stop("length of raw transaction in hex format is not even.")
	hex_bytes = [raw_transaction[i:i+2] for i in xrange(0, len(raw_transaction), 2)]
	
	version = hex_bytes[0:4]
	
	# next byte is the first byte of the input count. 
	input_count = hex_bytes[4]
	
	# input_count is a variable length integer, so we need to find out how long it is. 
	input_count_length = get_byte_length_of_var_int(input_count)
	
	if input_count_length != 1:
		stop("mark1: unwritten section.")
		# future: append appropriate number of bytes to input_count. 
	elif input_count_length == 1:
		input_count_decimal = hex_string_to_decimal(input_count)
	
	index = 4 + input_count_length # our current byte position within the raw transaction (starts from 0)
	
	# parse each input item
	input = { "previous_output_hash": None, 
			  "previous_output_index": None,
			  "script_length": None,
			  "scriptSig": None,
			  "sequence": None }
	inputs = [None]*input_count_decimal
	for i in xrange(input_count_decimal):
		inputs[i] = input.copy()
	for i in xrange(1):
		# next 32 bytes contain the hash of the transaction that contains the output that became this input. 
		previous_output_hash = hex_bytes[index:(index+32)]
		# next 4 bytes contain the index of the output in the previous transaction.
		index += 32
		previous_output_index = hex_bytes[index:(index+4)]
		# next byte is the first byte of the script length. 
		# script_length is a variable length integer, so we need to find out how long it is. 
		index += 4
		script_length = hex_bytes[index]
		script_length_length = get_byte_length_of_var_int(script_length)
		
		if script_length_length != 1:
			stop("mark2: unwritten section.")
			# future: append appropriate number of bytes to script_length. 
		elif script_length_length == 1:
			script_length_decimal = hex_string_to_decimal(script_length)
		index += script_length_length
		
		# select next [script_length_decimal] bytes. this will be the scriptSig.
		scriptSig = hex_bytes[index:(index+script_length_decimal)]
		
		index += script_length_decimal
		
		# next four bytes are the input's sequence number.
		sequence = hex_bytes[index:(index+4)]
		index += 4
		
		input = inputs[i]
		input["previous_output_hash"] = previous_output_hash
		input["previous_output_index"] = previous_output_index
		input["script_length"] = script_length
		input["scriptSig"] = scriptSig
		input["sequence"] = sequence
		
		break # only do first input for now. 
	
	
	
	
	print "Transaction:"
	print "- [property] byte length: %d" % len(hex_bytes)
	print "- version: %s" % spaced(version)
	print "- input_count: %s" % spaced(input_count)
	print "-- [property] input_count byte length: %d" % input_count_length
	print "-- [property] input_count decimal value: %d" % input_count_decimal
	for i, input in enumerate(inputs):
		print "- input %d" % i
		print "-- previous_output_hash: %s" % spaced(input["previous_output_hash"])
		print "-- previous_output_index: %s" % spaced(input["previous_output_index"])
		print "-- script_length: %s" % spaced(input["script_length"])
		print "-- scriptSig: %s" % spaced(input["scriptSig"])
		print "-- sequence: %s" % spaced(input["sequence"])



def hex_string_to_decimal(string):
	total = 0
	n = len(string)
	for i in xrange(n):
		c = string[i] # c = character
		v = hex_character_to_decimal(c) # v = value
		power = n - i - 1 # first character has the highest power of 16. power of last character is 0. 
		total += v*(16**power)
	return total
	

def hex_character_to_decimal(c):
	decimal_digits = "0123456789"
	hex_digits = {"a": 10, "b": 11, "c": 12, "d": 13, "e": 14, "f": 15}
	if c in decimal_digits:
		return int(c)
	elif c in hex_digits.keys():
		return hex_digits[c]
	stop("character %s is not a hex character." % c)
		

def get_byte_length_of_var_int(byte):
	# examine the first byte of a variable length integer to find out its length. 
	if byte == "ff":
		return 9
	elif byte == "fe":
		return 5
	elif byte == "fd":
		return 3
	return 1


def spaced(item):
	# expects either a string (single hex byte) or a list (multiple hex bytes)
	if isinstance(item, str):
		return item
	elif isinstance(item, list):
		return " ".join(item)
	elif item == None:
		return "None"
	stop("item is neither a string nor a list. item_string = %s" % str(item))


def file_get_contents(file_path):
	from os.path import isfile
	if not isfile(file_path): 
		stop("%s is not a file." % file_path)
	f = open(file_path, "r")
	data = f.read()
	f.close()
	return data


def stop(message):
	import sys
	print "\nERROR: %s\n" % message
	sys.exit(1)


if __name__ == "__main__": main()



aineko:work stjohnpiano$ ./parser1.py

Transaction:
- [property] byte length: 617
- version: 01 00 00 00
- input_count: 03
-- [property] input_count byte length: 1
-- [property] input_count decimal value: 3
- input 0
-- previous_output_hash: c7 c4 e9 08 2d fb b3 71 3e 47 c1 be f9 f6 8e fd 9b 85 6d fc de e3 18 41 6d aa a2 ad d8 27 70 d4
-- previous_output_index: 01 00 00 00
-- script_length: 8b
-- scriptSig: 48 30 45 02 21 00 b2 fd 3f 8a 8c 22 6f 2a dd b1 b6 63 00 9c 34 4e 2c 35 1d ad 0d af 02 2d 1c b1 2f e7 7e 81 56 3a 02 20 6e 01 2e f6 23 5d 10 ee 05 8d 4c 3d 61 b4 4a 8c a1 8e bc 5e 99 38 8e 60 04 f3 06 b2 89 68 3e 02 01 41 04 36 5c 78 7a af 52 a1 81 a6 e1 10 f9 d3 da a0 81 03 f9 1b 15 12 be a2 13 98 b9 ea 1f 1b eb 5d ae 91 55 da ad 83 b1 da bb f3 16 c4 b6 e4 bb 43 83 44 20 4a 1d b1 d3 3c cb 47 d0 1c 20 51 c0 97 4a
-- sequence: ff ff ff ff
- input 1
-- previous_output_hash: None
-- previous_output_index: None
-- script_length: None
-- scriptSig: None
-- sequence: None
- input 2
-- previous_output_hash: None
-- previous_output_index: None
-- script_length: None
-- scriptSig: None
-- sequence: None






Ok. Looks good.
previous_output_index (01 00 00 00) and sequence (ff ff ff ff) hold reasonable values.
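
A quick sketch for reading these fields, assuming that both integers are little-endian and that the previous output hash is stored in reversed byte order relative to the transaction hash shown by block explorers. The helper name reverse_bytes is my own.

```python
def reverse_bytes(spaced_hex):
    # "01 00 00 00" -> "00000001"
    return "".join(reversed(spaced_hex.split()))


# Fields of input 0, copied from the parser output above.
previous_output_hash = (
    "c7 c4 e9 08 2d fb b3 71 3e 47 c1 be f9 f6 8e fd "
    "9b 85 6d fc de e3 18 41 6d aa a2 ad d8 27 70 d4"
)
previous_transaction_hash = reverse_bytes(previous_output_hash)
previous_output_index = int(reverse_bytes("01 00 00 00"), 16)
sequence = int(reverse_bytes("ff ff ff ff"), 16)

print("previous transaction hash: %s" % previous_transaction_hash)
print("previous output index: %d" % previous_output_index)
print("sequence: %d (maximum 4-byte value: %s)" % (sequence, sequence == 2**32 - 1))
```

Read this way, input 0 spends the second output (index 1, counting from 0) of the previous transaction.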

Let's do all the inputs.


Parser1

python 2.7.13
#!/opt/local/bin/python


def main():

	file_path_1 = "tr_ken1.txt"
	raw_transaction = file_get_contents(file_path_1)
	raw_transaction = raw_transaction.lower()
	
	if len(raw_transaction) % 2 != 0:
		stop("length of raw transaction in hex format is not even.")
	hex_bytes = [raw_transaction[i:i+2] for i in xrange(0, len(raw_transaction), 2)]
	
	version = hex_bytes[0:4]
	
	# next byte is the first byte of the input count. 
	input_count = hex_bytes[4]
	
	# input_count is a variable length integer, so we need to find out how long it is. 
	input_count_length = get_byte_length_of_var_int(input_count)
	
	if input_count_length != 1:
		stop("mark1: unwritten section.")
		# future: append appropriate number of bytes to input_count. 
	elif input_count_length == 1:
		input_count_decimal = hex_string_to_decimal(input_count)
	
	index = 4 + input_count_length # our current byte position within the raw transaction (starts from 0)
	
	# parse each input item
	input = { "previous_output_hash": None, 
			  "previous_output_index": None,
			  "script_length": None,
			  "scriptSig": None,
			  "sequence": None }
	inputs = [None]*input_count_decimal
	for i in xrange(input_count_decimal):
		inputs[i] = input.copy()
	for i in xrange(input_count_decimal):
		# next 32 bytes contain the hash of the transaction that contains the output that became this input. 
		previous_output_hash = hex_bytes[index:(index+32)]
		# next 4 bytes contain the index of the output in the previous transaction.
		index += 32
		previous_output_index = hex_bytes[index:(index+4)]
		# next byte is the first byte of the script length. 
		# script_length is a variable length integer, so we need to find out how long it is. 
		index += 4
		script_length = hex_bytes[index]
		script_length_length = get_byte_length_of_var_int(script_length)
		
		if script_length_length != 1:
			stop("mark2: unwritten section.")
			# future: append appropriate number of bytes to script_length. 
		elif script_length_length == 1:
			script_length_decimal = hex_string_to_decimal(script_length)
		index += script_length_length
		
		# select next [script_length_decimal] bytes. this will be the scriptSig.
		scriptSig = hex_bytes[index:(index+script_length_decimal)]
		
		index += script_length_decimal
		
		# next four bytes are the input's sequence number.
		sequence = hex_bytes[index:(index+4)]
		index += 4
		
		input = inputs[i]
		input["previous_output_hash"] = previous_output_hash
		input["previous_output_index"] = previous_output_index
		input["script_length"] = script_length
		input["scriptSig"] = scriptSig
		input["sequence"] = sequence
			
	
	
	
	print "\nTransaction:\n"
	print "- [property] byte length: %d" % len(hex_bytes)
	print "- version: %s" % spaced(version)
	print "- input_count: %s" % spaced(input_count)
	print "-- [property] input_count byte length: %d" % input_count_length
	print "-- [property] input_count decimal value: %d" % input_count_decimal
	for i, input in enumerate(inputs):
		print "\n- input #%d" % i
		print "-- previous_output_hash: %s" % spaced(input["previous_output_hash"])
		print "-- previous_output_index: %s" % spaced(input["previous_output_index"])
		print "-- script_length: %s" % spaced(input["script_length"])
		print "-- scriptSig: %s" % spaced(input["scriptSig"])
		print "-- sequence: %s" % spaced(input["sequence"])



def hex_string_to_decimal(string):
	total = 0
	n = len(string)
	for i in xrange(n):
		c = string[i] # c = character
		v = hex_character_to_decimal(c) # v = value
		power = n - i - 1 # first character has the highest power of 16. power of last character is 0. 
		total += v*(16**power)
	return total
	

def hex_character_to_decimal(c):
	decimal_digits = "0123456789"
	hex_digits = {"a": 10, "b": 11, "c": 12, "d": 13, "e": 14, "f": 15}
	if c in decimal_digits:
		return int(c)
	elif c in hex_digits.keys():
		return hex_digits[c]
	stop("character %s is not a hex character." % c)
		

def get_byte_length_of_var_int(byte):
	# examine the first byte of a variable length integer to find out its length. 
	if byte == "ff":
		return 9
	elif byte == "fe":
		return 5
	elif byte == "fd":
		return 3
	return 1


def spaced(item):
	# expects a string (single hex byte), a list (multiple hex bytes), or None
	if isinstance(item, str):
		return item
	elif isinstance(item, list):
		return " ".join(item)
	elif item == None:
		return "None"
	stop("item is not in [string, list, or None]. item_string = %s" % str(item))


def file_get_contents(file_path):
	from os.path import isfile
	if not isfile(file_path): 
		stop("%s is not a file." % file_path)
	f = open(file_path, "r")
	data = f.read()
	f.close()
	return data


def stop(message):
	import sys
	print "\nERROR: %s\n" % message
	sys.exit(1)


if __name__ == "__main__": main()



aineko:work stjohnpiano$ ./parser1.py


Transaction:

- [property] byte length: 617
- version: 01 00 00 00
- input_count: 03
-- [property] input_count byte length: 1
-- [property] input_count decimal value: 3

- input #0
-- previous_output_hash: c7 c4 e9 08 2d fb b3 71 3e 47 c1 be f9 f6 8e fd 9b 85 6d fc de e3 18 41 6d aa a2 ad d8 27 70 d4
-- previous_output_index: 01 00 00 00
-- script_length: 8b
-- scriptSig: 48 30 45 02 21 00 b2 fd 3f 8a 8c 22 6f 2a dd b1 b6 63 00 9c 34 4e 2c 35 1d ad 0d af 02 2d 1c b1 2f e7 7e 81 56 3a 02 20 6e 01 2e f6 23 5d 10 ee 05 8d 4c 3d 61 b4 4a 8c a1 8e bc 5e 99 38 8e 60 04 f3 06 b2 89 68 3e 02 01 41 04 36 5c 78 7a af 52 a1 81 a6 e1 10 f9 d3 da a0 81 03 f9 1b 15 12 be a2 13 98 b9 ea 1f 1b eb 5d ae 91 55 da ad 83 b1 da bb f3 16 c4 b6 e4 bb 43 83 44 20 4a 1d b1 d3 3c cb 47 d0 1c 20 51 c0 97 4a
-- sequence: ff ff ff ff

- input #1
-- previous_output_hash: ee 59 5b 71 cc 5a 1f b9 80 a7 7a f8 b9 62 53 4a e0 04 9b 29 06 fb e0 79 0e 48 fa 71 e9 f0 22 99
-- previous_output_index: 00 00 00 00
-- script_length: 8b
-- scriptSig: 48 30 45 02 21 00 fa f8 4d 50 e9 9d ee ef 7d 2c f9 f8 b4 60 0b 8d e5 6f d7 88 d9 f6 65 21 a0 43 78 f9 0b 49 a8 76 02 20 46 08 cd e7 76 fd a7 54 ce 87 1a bf f8 73 dc 4d 29 aa 34 c0 3a 50 d9 60 85 20 0e 82 5b ae 73 a4 01 41 04 56 84 d9 b3 83 46 de b7 f9 3b 6b 82 82 dc f8 22 7b fc b7 29 13 d8 f4 c5 fa d9 98 7e 38 77 04 67 fe e2 6b 5a 0b 57 d0 ae f4 df 40 02 46 3e c1 f1 93 46 40 d6 90 5e ed e1 ed 28 bb 7e 43 2b cb d1
-- sequence: ff ff ff ff

- input #2
-- previous_output_hash: 5e 32 74 71 c5 bd be cc a4 5f cc bd 69 8a 47 94 24 a3 1c 99 a9 70 4b 68 e6 19 f6 b3 e3 b9 29 55
-- previous_output_index: 01 00 00 00
-- script_length: 8a
-- scriptSig: 47 30 44 02 20 6f 36 1f b4 b9 7a ae a0 4f 18 cf 9f f8 1a 18 6e 48 ed 44 78 37 9e 6e 93 e0 ab 28 d4 d4 82 26 c9 02 20 15 65 b7 3f 2f 03 ef fa cf 42 9e 7a 84 05 43 c5 77 34 75 18 f2 56 dc c3 de f8 55 8f a8 ef fc ef 01 41 04 0d c0 d6 2b d8 7d 2e 54 be 3f 95 c4 18 7f a3 0d 48 d6 cd 00 43 19 78 40 1d d7 98 b7 45 07 d9 ad b4 57 fd 94 34 87 6f 56 27 35 ba ef 0b b9 d6 e3 53 66 c2 c1 81 b6 d9 61 b8 36 2a d9 5a f7 26 aa
-- sequence: ff ff ff ff





Ok.
The previous_output_index and the sequence for each input hold reasonable values.
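
As a quick sanity check on these values, the 4-byte fields can be decoded as integers. This is a minimal sketch, assuming (as in the Bitcoin transaction format) that these fields are stored little-endian. The helper name little_endian_to_int is hypothetical and is not part of parser1.py. The print calls are written so that they run under both Python 2.7 and Python 3.

```python
def little_endian_to_int(hex_byte_list):
    # Reverse the byte order (the least significant byte comes first in the
    # raw transaction), then read the result as one ordinary hex number.
    return int("".join(reversed(hex_byte_list)), 16)

# values copied from the parser output above
previous_output_index = "01 00 00 00".split()
sequence = "ff ff ff ff".split()

index_value = little_endian_to_int(previous_output_index)
sequence_value = little_endian_to_int(sequence)
print("previous_output_index: %d" % index_value)
print("sequence: %d" % sequence_value)
```

Read this way, input #0 spends output 1 of its previous transaction, and ff ff ff ff is the largest value a 4-byte field can hold (4294967295).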


Next: the outputs.
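
The output count, like the input count and the script lengths, is a variable-length integer. parser1.py only handles the 1-byte case and stops otherwise (the "unwritten section" marks). The following is a hedged sketch of how the multi-byte cases could be decoded, assuming the usual Bitcoin encoding: a prefix byte of fd, fe, or ff followed by a 2-, 4-, or 8-byte little-endian value. read_var_int is a hypothetical helper, not a function in parser1.py.

```python
def read_var_int(hex_bytes, index):
    # hex_bytes: list of 2-character hex strings, as built in parser1.py.
    # Returns (decimal value, number of bytes consumed).
    first = hex_bytes[index]
    payload_lengths = {"fd": 2, "fe": 4, "ff": 8}
    if first not in payload_lengths:
        # single-byte case: the byte itself is the value
        return int(first, 16), 1
    n = payload_lengths[first]
    payload = hex_bytes[index + 1:index + 1 + n]
    if len(payload) != n:
        raise ValueError("var_int is truncated")
    # the payload is little-endian: reverse it before converting
    value = int("".join(reversed(payload)), 16)
    return value, 1 + n

value_1, length_1 = read_var_int(["03"], 0)
value_2, length_2 = read_var_int(["fd", "2c", "01"], 0)  # made-up example bytes
print("%d %d" % (value_1, length_1))
print("%d %d" % (value_2, length_2))
```

The consumed lengths (1, 3, 5, 9) match the values returned by get_byte_length_of_var_int, so a helper like this could replace the three "mark" branches in one place.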





Parser1

python 2.7.13
#!/opt/local/bin/python


def main():

	file_path_1 = "tr_ken1.txt"
	raw_transaction = file_get_contents(file_path_1)
	raw_transaction = raw_transaction.lower()
	
	if len(raw_transaction) % 2 != 0:
		stop("length of raw transaction in hex format is not even.")
	hex_bytes = [raw_transaction[i:i+2] for i in xrange(0, len(raw_transaction), 2)]
	
	version = hex_bytes[0:4]
	
	# next byte is the first byte of the input count. 
	input_count = hex_bytes[4]
	
	# input_count is a variable length integer, so we need to find out how long it is. 
	input_count_length = get_byte_length_of_var_int(input_count)
	
	if input_count_length != 1:
		stop("mark1: unwritten section.")
		# future: append appropriate number of bytes to input_count. 
	elif input_count_length == 1:
		input_count_decimal = hex_string_to_decimal(input_count)
	
	index = 4 + input_count_length # our current byte position within the raw transaction (starts from 0)
	
	# parse each input item
	input = { "previous_output_hash": None, 
			  "previous_output_index": None,
			  "script_length": None,
			  "scriptSig": None,
			  "sequence": None }
	inputs = [None]*input_count_decimal
	for i in xrange(input_count_decimal):
		inputs[i] = input.copy()
	for i in xrange(input_count_decimal):
		# next 32 bytes contain the hash of the transaction that contains the output that became this input. 
		previous_output_hash = hex_bytes[index:(index+32)]
		# next 4 bytes contain the index of the output in the previous transaction.
		index += 32
		previous_output_index = hex_bytes[index:(index+4)]
		# next byte is the first byte of the script length. 
		# script_length is a variable length integer, so we need to find out how long it is. 
		index += 4
		script_length = hex_bytes[index]
		script_length_length = get_byte_length_of_var_int(script_length)
		
		if script_length_length != 1:
			stop("mark2: unwritten section.")
			# future: append appropriate number of bytes to script_length. 
		elif script_length_length == 1:
			script_length_decimal = hex_string_to_decimal(script_length)
		index += script_length_length
		
		# select next [script_length_decimal] bytes. this will be the scriptSig.
		scriptSig = hex_bytes[index:(index+script_length_decimal)]
		
		index += script_length_decimal
		
		# next four bytes are the input's sequence number.
		sequence = hex_bytes[index:(index+4)]
		index += 4
		
		input = inputs[i]
		input["previous_output_hash"] = previous_output_hash
		input["previous_output_index"] = previous_output_index
		input["script_length"] = script_length
		input["scriptSig"] = scriptSig
		input["sequence"] = sequence
			
	
	# next byte is the first byte of the output count. 
	output_count = hex_bytes[index]
	
	# output_count is a variable length integer, so we need to find out how long it is. 
	output_count_length = get_byte_length_of_var_int(output_count)
	
	if output_count_length != 1:
		stop("mark3: unwritten section.")
		# future: append appropriate number of bytes to output_count. 
	elif output_count_length == 1:
		output_count_decimal = hex_string_to_decimal(output_count)
	
	index += output_count_length
	
	
	
	
	
	
	print "\nTransaction:\n"
	print "- [property] byte length: %d" % len(hex_bytes)
	print "- version: %s" % spaced(version)
	print "- input_count: %s" % spaced(input_count)
	print "-- [property] input_count byte length: %d" % input_count_length
	print "-- [property] input_count decimal value: %d" % input_count_decimal
	for i, input in enumerate(inputs):
		print "\n- input #%d" % i
		print "-- previous_output_hash: %s" % spaced(input["previous_output_hash"])
		print "-- previous_output_index: %s" % spaced(input["previous_output_index"])
		print "-- script_length: %s" % spaced(input["script_length"])
		print "--- [property] script_length decimal value: %d" % hex_string_to_decimal(script_length)
		print "-- scriptSig: %s" % spaced(input["scriptSig"])
		print "-- sequence: %s" % spaced(input["sequence"])
	print ""
	print "- output_count: %s" % spaced(output_count)
	print "-- [property] output_count byte length: %d" % output_count_length
	print "-- [property] output_count decimal value: %d" % output_count_decimal



def hex_string_to_decimal(string):
	total = 0
	n = len(string)
	for i in xrange(n):
		c = string[i] # c = character
		v = hex_character_to_decimal(c) # v = value
		power = n - i - 1 # first character has the highest power of 16. power of last character is 0. 
		total += v*(16**power)
	return total
	

def hex_character_to_decimal(c):
	decimal_digits = "0123456789"
	hex_digits = {"a": 10, "b": 11, "c": 12, "d": 13, "e": 14, "f": 15}
	if c in decimal_digits:
		return int(c)
	elif c in hex_digits.keys():
		return hex_digits[c]
	stop("character %s is not a hex character." % c)
		

def get_byte_length_of_var_int(byte):
	# examine the first byte of a variable length integer to find out its length. 
	if byte == "ff":
		return 9
	elif byte == "fe":
		return 5
	elif byte == "fd":
		return 3
	return 1


def spaced(item):
	# expects a string (single hex byte), a list (multiple hex bytes), or None
	if isinstance(item, str):
		return item
	elif isinstance(item, list):
		return " ".join(item)
	elif item == None:
		return "None"
	stop("item is not in [string, list, or None]. item_string = %s" % str(item))


def file_get_contents(file_path):
	from os.path import isfile
	if not isfile(file_path): 
		stop("%s is not a file." % file_path)
	f = open(file_path, "r")
	data = f.read()
	f.close()
	return data


def stop(message):
	import sys
	print "\nERROR: %s\n" % message
	sys.exit(1)


if __name__ == "__main__": main()



aineko:work stjohnpiano$ ./parser1.py


Transaction:

- [property] byte length: 617
- version: 01 00 00 00
- input_count: 03
-- [property] input_count byte length: 1
-- [property] input_count decimal value: 3

- input #0
-- previous_output_hash: c7 c4 e9 08 2d fb b3 71 3e 47 c1 be f9 f6 8e fd 9b 85 6d fc de e3 18 41 6d aa a2 ad d8 27 70 d4
-- previous_output_index: 01 00 00 00
-- script_length: 8b
--- [property] script_length decimal value: 138
-- scriptSig: 48 30 45 02 21 00 b2 fd 3f 8a 8c 22 6f 2a dd b1 b6 63 00 9c 34 4e 2c 35 1d ad 0d af 02 2d 1c b1 2f e7 7e 81 56 3a 02 20 6e 01 2e f6 23 5d 10 ee 05 8d 4c 3d 61 b4 4a 8c a1 8e bc 5e 99 38 8e 60 04 f3 06 b2 89 68 3e 02 01 41 04 36 5c 78 7a af 52 a1 81 a6 e1 10 f9 d3 da a0 81 03 f9 1b 15 12 be a2 13 98 b9 ea 1f 1b eb 5d ae 91 55 da ad 83 b1 da bb f3 16 c4 b6 e4 bb 43 83 44 20 4a 1d b1 d3 3c cb 47 d0 1c 20 51 c0 97 4a
-- sequence: ff ff ff ff

- input #1
-- previous_output_hash: ee 59 5b 71 cc 5a 1f b9 80 a7 7a f8 b9 62 53 4a e0 04 9b 29 06 fb e0 79 0e 48 fa 71 e9 f0 22 99
-- previous_output_index: 00 00 00 00
-- script_length: 8b
--- [property] script_length decimal value: 138
-- scriptSig: 48 30 45 02 21 00 fa f8 4d 50 e9 9d ee ef 7d 2c f9 f8 b4 60 0b 8d e5 6f d7 88 d9 f6 65 21 a0 43 78 f9 0b 49 a8 76 02 20 46 08 cd e7 76 fd a7 54 ce 87 1a bf f8 73 dc 4d 29 aa 34 c0 3a 50 d9 60 85 20 0e 82 5b ae 73 a4 01 41 04 56 84 d9 b3 83 46 de b7 f9 3b 6b 82 82 dc f8 22 7b fc b7 29 13 d8 f4 c5 fa d9 98 7e 38 77 04 67 fe e2 6b 5a 0b 57 d0 ae f4 df 40 02 46 3e c1 f1 93 46 40 d6 90 5e ed e1 ed 28 bb 7e 43 2b cb d1
-- sequence: ff ff ff ff

- input #2
-- previous_output_hash: 5e 32 74 71 c5 bd be cc a4 5f cc bd 69 8a 47 94 24 a3 1c 99 a9 70 4b 68 e6 19 f6 b3 e3 b9 29 55
-- previous_output_index: 01 00 00 00
-- script_length: 8a
--- [property] script_length decimal value: 138
-- scriptSig: 47 30 44 02 20 6f 36 1f b4 b9 7a ae a0 4f 18 cf 9f f8 1a 18 6e 48 ed 44 78 37 9e 6e 93 e0 ab 28 d4 d4 82 26 c9 02 20 15 65 b7 3f 2f 03 ef fa cf 42 9e 7a 84 05 43 c5 77 34 75 18 f2 56 dc c3 de f8 55 8f a8 ef fc ef 01 41 04 0d c0 d6 2b d8 7d 2e 54 be 3f 95 c4 18 7f a3 0d 48 d6 cd 00 43 19 78 40 1d d7 98 b7 45 07 d9 ad b4 57 fd 94 34 87 6f 56 27 35 ba ef 0b b9 d6 e3 53 66 c2 c1 81 b6 d9 61 b8 36 2a d9 5a f7 26 aa
-- sequence: ff ff ff ff

- output_count: 02
-- [property] output_count byte length: 1
-- [property] output_count decimal value: 2







Just the first output for now.



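Error:
In the previous run's output, the script_length decimal value is reported as 138 for every input, although inputs #0 and #1 have script_length 8b (139).

Cause: in the "print data" section, hex_string_to_decimal(script_length) used the script_length variable left over from the last pass of the parsing loop, instead of each input's own script_length. Fix: use input["script_length"].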
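
A minimal reproduction of this kind of mistake, using a small made-up list rather than the real parser state: after a loop finishes, a variable assigned inside it still holds the value from the final pass, so reusing it in a later loop reports the same number every time. The built-in int(x, 16) is used here in place of hex_string_to_decimal to keep the sketch self-contained.

```python
# script_length of each of the three inputs, as hex strings
parsed = [{"script_length": "8b"},
          {"script_length": "8b"},
          {"script_length": "8a"}]

# parsing loop: script_length is reassigned on every pass
for item in parsed:
    script_length = item["script_length"]

# buggy report: uses the variable left over from the last pass
buggy = [int(script_length, 16) for item in parsed]
# fixed report: looks up each item's own value
fixed = [int(item["script_length"], 16) for item in parsed]

print(buggy)
print(fixed)
```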







Parser1

python 2.7.13
#!/opt/local/bin/python


def main():

	file_path_1 = "tr_ken1.txt"
	raw_transaction = file_get_contents(file_path_1)
	raw_transaction = raw_transaction.lower()
	
	if len(raw_transaction) % 2 != 0:
		stop("length of raw transaction in hex format is not even.")
	hex_bytes = [raw_transaction[i:i+2] for i in xrange(0, len(raw_transaction), 2)]
	
	version = hex_bytes[0:4]
	
	# next byte is the first byte of the input count. 
	input_count = hex_bytes[4]
	
	# input_count is a variable length integer, so we need to find out how long it is. 
	input_count_length = get_byte_length_of_var_int(input_count)
	
	if input_count_length != 1:
		stop("mark1: unwritten section.")
		# future: append appropriate number of bytes to input_count. 
	elif input_count_length == 1:
		input_count_decimal = hex_string_to_decimal(input_count)
	
	index = 4 + input_count_length # our current byte position within the raw transaction (starts from 0)
	
	# parse each input item
	input = { "previous_output_hash": None, 
			  "previous_output_index": None,
			  "script_length": None,
			  "scriptSig": None,
			  "sequence": None }
	inputs = [None]*input_count_decimal
	for i in xrange(input_count_decimal):
		inputs[i] = input.copy()
	for i in xrange(input_count_decimal):
		# next 32 bytes contain the hash of the transaction that contains the output that became this input. 
		previous_output_hash = hex_bytes[index:(index+32)]
		index += 32
		# next 4 bytes contain the index of the output in the previous transaction.
		previous_output_index = hex_bytes[index:(index+4)]
		index += 4
		# next byte is the first byte of the script length. 
		# script_length is a variable length integer, so we need to find out how long it is. 
		script_length = hex_bytes[index]
		script_length_length = get_byte_length_of_var_int(script_length)
		
		if script_length_length != 1:
			stop("mark2: unwritten section.")
			# future: append appropriate number of bytes to script_length. 
		elif script_length_length == 1:
			script_length_decimal = hex_string_to_decimal(script_length)
		index += script_length_length
		
		# select next [script_length_decimal] bytes. this will be the scriptSig.
		scriptSig = hex_bytes[index:(index+script_length_decimal)]
		
		index += script_length_decimal
		
		# next four bytes are the input's sequence number.
		sequence = hex_bytes[index:(index+4)]
		index += 4
		
		input = inputs[i]
		input["previous_output_hash"] = previous_output_hash
		input["previous_output_index"] = previous_output_index
		input["script_length"] = script_length
		input["scriptSig"] = scriptSig
		input["sequence"] = sequence
			
	
	# next byte is the first byte of the output count. 
	output_count = hex_bytes[index]
	
	# output_count is a variable length integer, so we need to find out how long it is. 
	output_count_length = get_byte_length_of_var_int(output_count)
	
	if output_count_length != 1:
		stop("mark3: unwritten section.")
		# future: append appropriate number of bytes to output_count. 
	elif output_count_length == 1:
		output_count_decimal = hex_string_to_decimal(output_count)
	
	index += output_count_length
	
	# parse each output item
	output = { "value": None, 
			  "script_length": None,
			  "scriptPubKey": None }
	outputs = [None]*output_count_decimal
	for i in xrange(output_count_decimal):
		outputs[i] = output.copy()
	for i in xrange(1):
		# next 8 bytes contain the value of the output. 
		value = hex_bytes[index:(index+8)]
		index += 8
		
		# next byte is the first byte of the script length. 
		# script_length is a variable length integer, so we need to find out how long it is. 
		script_length = hex_bytes[index]
		script_length_length = get_byte_length_of_var_int(script_length)
		
		if script_length_length != 1:
			stop("mark4: unwritten section.")
			# future: append appropriate number of bytes to script_length. 
		elif script_length_length == 1:
			script_length_decimal = hex_string_to_decimal(script_length)
		index += script_length_length
	
		# select next [script_length_decimal] bytes. this will be the scriptPubKey.
		scriptPubKey = hex_bytes[index:(index+script_length_decimal)]
		
		index += script_length_decimal
		
		output = outputs[i]
		output["value"] = value
		output["script_length"] = script_length
		output["scriptPubKey"] = scriptPubKey
		
		
	
	
	print "\nTransaction:\n"
	print "- [property] byte length: %d" % len(hex_bytes)
	print "- version: %s" % spaced(version)
	print "- input_count: %s" % spaced(input_count)
	print "-- [property] input_count byte length: %d" % input_count_length
	print "-- [property] input_count decimal value: %d" % input_count_decimal
	for i, input in enumerate(inputs):
		print "\n- input #%d" % i
		print "-- previous_output_hash: %s" % spaced(input["previous_output_hash"])
		print "-- previous_output_index: %s" % spaced(input["previous_output_index"])
		print "-- script_length: %s" % spaced(input["script_length"])
		print "--- [property] script_length decimal value: %d" % hex_string_to_decimal(input["script_length"])
		print "-- scriptSig: %s" % spaced(input["scriptSig"])
		print "-- sequence: %s" % spaced(input["sequence"])
	print ""
	print "- output_count: %s" % spaced(output_count)
	print "-- [property] output_count byte length: %d" % output_count_length
	print "-- [property] output_count decimal value: %d" % output_count_decimal
	for i, output in enumerate(outputs):
		print "\n- output #%d" % i
		print "-- value: %s" % spaced(output["value"])
		print "-- script_length: %s" % spaced(output["script_length"])
		#print "--- [property] script_length decimal value: %d" % hex_string_to_decimal(output["script_length"])
		print "-- scriptPubKey: %s" % spaced(output["scriptPubKey"])
	
	


def hex_string_to_decimal(string):
	total = 0
	n = len(string)
	for i in xrange(n):
		c = string[i] # c = character
		v = hex_character_to_decimal(c) # v = value
		power = n - i - 1 # first character has the highest power of 16. power of last character is 0. 
		total += v*(16**power)
	return total
	

def hex_character_to_decimal(c):
	decimal_digits = "0123456789"
	hex_digits = {"a": 10, "b": 11, "c": 12, "d": 13, "e": 14, "f": 15}
	if c in decimal_digits:
		return int(c)
	elif c in hex_digits.keys():
		return hex_digits[c]
	stop("character %s is not a hex character." % c)
		

def get_byte_length_of_var_int(byte):
	# examine the first byte of a variable length integer to find out its length. 
	if byte == "ff":
		return 9
	elif byte == "fe":
		return 5
	elif byte == "fd":
		return 3
	return 1


def spaced(item):
	# expects a string (single hex byte), a list (multiple hex bytes), or None
	if isinstance(item, str):
		return item
	elif isinstance(item, list):
		return " ".join(item)
	elif item == None:
		return "None"
	stop("item is not in [string, list, or None]. item_string = %s" % str(item))


def file_get_contents(file_path):
	from os.path import isfile
	if not isfile(file_path): 
		stop("%s is not a file." % file_path)
	f = open(file_path, "r")
	data = f.read()
	f.close()
	return data


def stop(message):
	import sys
	print "\nERROR: %s\n" % message
	sys.exit(1)


if __name__ == "__main__": main()



aineko:work stjohnpiano$ ./parser1.py


Transaction:

- [property] byte length: 617
- version: 01 00 00 00
- input_count: 03
-- [property] input_count byte length: 1
-- [property] input_count decimal value: 3

- input #0
-- previous_output_hash: c7 c4 e9 08 2d fb b3 71 3e 47 c1 be f9 f6 8e fd 9b 85 6d fc de e3 18 41 6d aa a2 ad d8 27 70 d4
-- previous_output_index: 01 00 00 00
-- script_length: 8b
--- [property] script_length decimal value: 139
-- scriptSig: 48 30 45 02 21 00 b2 fd 3f 8a 8c 22 6f 2a dd b1 b6 63 00 9c 34 4e 2c 35 1d ad 0d af 02 2d 1c b1 2f e7 7e 81 56 3a 02 20 6e 01 2e f6 23 5d 10 ee 05 8d 4c 3d 61 b4 4a 8c a1 8e bc 5e 99 38 8e 60 04 f3 06 b2 89 68 3e 02 01 41 04 36 5c 78 7a af 52 a1 81 a6 e1 10 f9 d3 da a0 81 03 f9 1b 15 12 be a2 13 98 b9 ea 1f 1b eb 5d ae 91 55 da ad 83 b1 da bb f3 16 c4 b6 e4 bb 43 83 44 20 4a 1d b1 d3 3c cb 47 d0 1c 20 51 c0 97 4a
-- sequence: ff ff ff ff

- input #1
-- previous_output_hash: ee 59 5b 71 cc 5a 1f b9 80 a7 7a f8 b9 62 53 4a e0 04 9b 29 06 fb e0 79 0e 48 fa 71 e9 f0 22 99
-- previous_output_index: 00 00 00 00
-- script_length: 8b
--- [property] script_length decimal value: 139
-- scriptSig: 48 30 45 02 21 00 fa f8 4d 50 e9 9d ee ef 7d 2c f9 f8 b4 60 0b 8d e5 6f d7 88 d9 f6 65 21 a0 43 78 f9 0b 49 a8 76 02 20 46 08 cd e7 76 fd a7 54 ce 87 1a bf f8 73 dc 4d 29 aa 34 c0 3a 50 d9 60 85 20 0e 82 5b ae 73 a4 01 41 04 56 84 d9 b3 83 46 de b7 f9 3b 6b 82 82 dc f8 22 7b fc b7 29 13 d8 f4 c5 fa d9 98 7e 38 77 04 67 fe e2 6b 5a 0b 57 d0 ae f4 df 40 02 46 3e c1 f1 93 46 40 d6 90 5e ed e1 ed 28 bb 7e 43 2b cb d1
-- sequence: ff ff ff ff

- input #2
-- previous_output_hash: 5e 32 74 71 c5 bd be cc a4 5f cc bd 69 8a 47 94 24 a3 1c 99 a9 70 4b 68 e6 19 f6 b3 e3 b9 29 55
-- previous_output_index: 01 00 00 00
-- script_length: 8a
--- [property] script_length decimal value: 138
-- scriptSig: 47 30 44 02 20 6f 36 1f b4 b9 7a ae a0 4f 18 cf 9f f8 1a 18 6e 48 ed 44 78 37 9e 6e 93 e0 ab 28 d4 d4 82 26 c9 02 20 15 65 b7 3f 2f 03 ef fa cf 42 9e 7a 84 05 43 c5 77 34 75 18 f2 56 dc c3 de f8 55 8f a8 ef fc ef 01 41 04 0d c0 d6 2b d8 7d 2e 54 be 3f 95 c4 18 7f a3 0d 48 d6 cd 00 43 19 78 40 1d d7 98 b7 45 07 d9 ad b4 57 fd 94 34 87 6f 56 27 35 ba ef 0b b9 d6 e3 53 66 c2 c1 81 b6 d9 61 b8 36 2a d9 5a f7 26 aa
-- sequence: ff ff ff ff

- output_count: 02
-- [property] output_count byte length: 1
-- [property] output_count decimal value: 2

- output #0
-- value: 72 8b 01 00 00 00 00 00
-- script_length: 19
-- scriptPubKey: 76 a9 14 df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5 88 ac

- output #1
-- value: None
-- script_length: None
-- scriptPubKey: None




Looks good.
Output #0's value and script_length are reasonable.
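
To make "reasonable" concrete, output #0 can be decoded a little further. This is a sketch under two assumptions that this log entry does not state: the 8-byte value is a little-endian amount in satoshis, and the scriptPubKey follows the standard P2PKH pattern (OP_DUP OP_HASH160 <20-byte hash> OP_EQUALVERIFY OP_CHECKSIG). Both function names are hypothetical.

```python
def little_endian_to_int(hex_byte_list):
    # least significant byte comes first, so reverse before converting
    return int("".join(reversed(hex_byte_list)), 16)

def extract_p2pkh_hash(script):
    # script: list of 2-character hex strings.
    # Expected layout: 76 a9 14 <20 bytes> 88 ac  (25 bytes in total).
    # Returns the 20-byte hash as a hex string, or None if the layout differs.
    if len(script) != 25:
        return None
    if script[0:3] != ["76", "a9", "14"] or script[23:25] != ["88", "ac"]:
        return None
    return "".join(script[3:23])

# values copied from output #0 above
value = "72 8b 01 00 00 00 00 00".split()
script_pub_key = ("76 a9 14 df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 "
                  "c1 3a 00 d1 d5 88 ac").split()

satoshis = little_endian_to_int(value)
public_key_hash = extract_p2pkh_hash(script_pub_key)
print("value in satoshis: %d" % satoshis)
print("public key hash: %s" % public_key_hash)
```

Under these assumptions the value is 101234 satoshis (0.00101234 BTC), and the script_length of 19 (25 in decimal) is exactly the length of a P2PKH scriptPubKey.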



Let's try with both outputs.



Parser1

python 2.7.13
#!/opt/local/bin/python


def main():

	file_path_1 = "tr_ken1.txt"
	raw_transaction = file_get_contents(file_path_1)
	raw_transaction = raw_transaction.lower()
	
	if len(raw_transaction) % 2 != 0:
		stop("length of raw transaction in hex format is not even.")
	hex_bytes = [raw_transaction[i:i+2] for i in xrange(0, len(raw_transaction), 2)]
	
	version = hex_bytes[0:4]
	
	# next byte is the first byte of the input count. 
	input_count = hex_bytes[4]
	
	# input_count is a variable length integer, so we need to find out how long it is. 
	input_count_length = get_byte_length_of_var_int(input_count)
	
	if input_count_length != 1:
		stop("mark1: unwritten section.")
		# future: append appropriate number of bytes to input_count. 
	elif input_count_length == 1:
		input_count_decimal = hex_string_to_decimal(input_count)
	
	index = 4 + input_count_length # our current byte position within the raw transaction (starts from 0)
	
	# parse each input item
	input = { "previous_output_hash": None, 
			  "previous_output_index": None,
			  "script_length": None,
			  "scriptSig": None,
			  "sequence": None }
	inputs = [None]*input_count_decimal
	for i in xrange(input_count_decimal):
		inputs[i] = input.copy()
	for i in xrange(input_count_decimal):
		# next 32 bytes contain the hash of the transaction that contains the output that became this input. 
		previous_output_hash = hex_bytes[index:(index+32)]
		index += 32
		# next 4 bytes contain the index of the output in the previous transaction.
		previous_output_index = hex_bytes[index:(index+4)]
		index += 4
		# next byte is the first byte of the script length. 
		# script_length is a variable length integer, so we need to find out how long it is. 
		script_length = hex_bytes[index]
		script_length_length = get_byte_length_of_var_int(script_length)
		
		if script_length_length != 1:
			stop("mark2: unwritten section.")
			# future: append appropriate number of bytes to script_length. 
		elif script_length_length == 1:
			script_length_decimal = hex_string_to_decimal(script_length)
		index += script_length_length
		
		# select next [script_length_decimal] bytes. this will be the scriptSig.
		scriptSig = hex_bytes[index:(index+script_length_decimal)]
		
		index += script_length_decimal
		
		# next four bytes are the input's sequence number.
		sequence = hex_bytes[index:(index+4)]
		index += 4
		
		input = inputs[i]
		input["previous_output_hash"] = previous_output_hash
		input["previous_output_index"] = previous_output_index
		input["script_length"] = script_length
		input["scriptSig"] = scriptSig
		input["sequence"] = sequence
			
	
	# next byte is the first byte of the output count. 
	output_count = hex_bytes[index]
	
	# output_count is a variable length integer, so we need to find out how long it is. 
	output_count_length = get_byte_length_of_var_int(output_count)
	
	if output_count_length != 1:
		stop("mark3: unwritten section.")
		# future: append appropriate number of bytes to output_count. 
	elif output_count_length == 1:
		output_count_decimal = hex_string_to_decimal(output_count)
	
	index += output_count_length
	
	# parse each output item
	output = { "value": None, 
			  "script_length": None,
			  "scriptPubKey": None }
	outputs = [None]*output_count_decimal
	for i in xrange(output_count_decimal):
		outputs[i] = output.copy()
	for i in xrange(output_count_decimal):
		# next 8 bytes contain the value of the output. 
		value = hex_bytes[index:(index+8)]
		index += 8
		
		# next byte is the first byte of the script length. 
		# script_length is a variable length integer, so we need to find out how long it is. 
		script_length = hex_bytes[index]
		script_length_length = get_byte_length_of_var_int(script_length)
		
		if script_length_length != 1:
			stop("mark4: unwritten section.")
			# future: append appropriate number of bytes to script_length. 
		elif script_length_length == 1:
			script_length_decimal = hex_string_to_decimal(script_length)
		index += script_length_length
	
		# select next [script_length_decimal] bytes. this will be the scriptPubKey.
		scriptPubKey = hex_bytes[index:(index+script_length_decimal)]
		
		index += script_length_decimal
		
		output = outputs[i]
		output["value"] = value
		output["script_length"] = script_length
		output["scriptPubKey"] = scriptPubKey
		
		
	
	
	print "\nTransaction:\n"
	print "- [property] byte length: %d" % len(hex_bytes)
	print "- version: %s" % spaced(version)
	print "- input_count: %s" % spaced(input_count)
	print "-- [property] input_count byte length: %d" % input_count_length
	print "-- [property] input_count decimal value: %d" % input_count_decimal
	for i, input in enumerate(inputs):
		print "\n- input #%d" % i
		print "-- previous_output_hash: %s" % spaced(input["previous_output_hash"])
		print "-- previous_output_index: %s" % spaced(input["previous_output_index"])
		print "-- script_length: %s" % spaced(input["script_length"])
		print "--- [property] script_length decimal value: %d" % hex_string_to_decimal(input["script_length"])
		print "-- scriptSig: %s" % spaced(input["scriptSig"])
		print "-- sequence: %s" % spaced(input["sequence"])
	print ""
	print "- output_count: %s" % spaced(output_count)
	print "-- [property] output_count byte length: %d" % output_count_length
	print "-- [property] output_count decimal value: %d" % output_count_decimal
	for i, output in enumerate(outputs):
		print "\n- output #%d" % i
		print "-- value: %s" % spaced(output["value"])
		print "-- script_length: %s" % spaced(output["script_length"])
		print "--- [property] script_length decimal value: %d" % hex_string_to_decimal(output["script_length"])
		print "-- scriptPubKey: %s" % spaced(output["scriptPubKey"])
	print ""
	


def hex_string_to_decimal(string):
	total = 0
	n = len(string)
	for i in xrange(n):
		c = string[i] # c = character
		v = hex_character_to_decimal(c) # v = value
		power = n - i - 1 # first character has the highest power of 16. power of last character is 0. 
		total += v*(16**power)
	return total
	

def hex_character_to_decimal(c):
	decimal_digits = "0123456789"
	hex_digits = {"a": 10, "b": 11, "c": 12, "d": 13, "e": 14, "f": 15}
	if c in decimal_digits:
		return int(c)
	elif c in hex_digits.keys():
		return hex_digits[c]
	stop("character %s is not a hex character." % c)
		

def get_byte_length_of_var_int(byte):
	# examine the first byte of a variable length integer to find out its length. 
	if byte == "ff":
		return 9
	elif byte == "fe":
		return 5
	elif byte == "fd":
		return 3
	return 1


def spaced(item):
	# expects a string (single hex byte), a list (multiple hex bytes), or None
	if isinstance(item, str):
		return item
	elif isinstance(item, list):
		return " ".join(item)
	elif item == None:
		return "None"
	stop("item is not in [string, list, or None]. item_string = %s" % str(item))


def file_get_contents(file_path):
	from os.path import isfile
	if not isfile(file_path): 
		stop("%s is not a file." % file_path)
	f = open(file_path, "r")
	data = f.read()
	f.close()
	return data


def stop(message):
	import sys
	# raise an Exception to get a traceback. 
	raise Exception("ERROR: %s\n" % message)


if __name__ == "__main__": main()



aineko:work stjohnpiano$ ./parser1.py


Transaction:

- [property] byte length: 617
- version: 01 00 00 00
- input_count: 03
-- [property] input_count byte length: 1
-- [property] input_count decimal value: 3

- input #0
-- previous_output_hash: c7 c4 e9 08 2d fb b3 71 3e 47 c1 be f9 f6 8e fd 9b 85 6d fc de e3 18 41 6d aa a2 ad d8 27 70 d4
-- previous_output_index: 01 00 00 00
-- script_length: 8b
--- [property] script_length decimal value: 139
-- scriptSig: 48 30 45 02 21 00 b2 fd 3f 8a 8c 22 6f 2a dd b1 b6 63 00 9c 34 4e 2c 35 1d ad 0d af 02 2d 1c b1 2f e7 7e 81 56 3a 02 20 6e 01 2e f6 23 5d 10 ee 05 8d 4c 3d 61 b4 4a 8c a1 8e bc 5e 99 38 8e 60 04 f3 06 b2 89 68 3e 02 01 41 04 36 5c 78 7a af 52 a1 81 a6 e1 10 f9 d3 da a0 81 03 f9 1b 15 12 be a2 13 98 b9 ea 1f 1b eb 5d ae 91 55 da ad 83 b1 da bb f3 16 c4 b6 e4 bb 43 83 44 20 4a 1d b1 d3 3c cb 47 d0 1c 20 51 c0 97 4a
-- sequence: ff ff ff ff

- input #1
-- previous_output_hash: ee 59 5b 71 cc 5a 1f b9 80 a7 7a f8 b9 62 53 4a e0 04 9b 29 06 fb e0 79 0e 48 fa 71 e9 f0 22 99
-- previous_output_index: 00 00 00 00
-- script_length: 8b
--- [property] script_length decimal value: 139
-- scriptSig: 48 30 45 02 21 00 fa f8 4d 50 e9 9d ee ef 7d 2c f9 f8 b4 60 0b 8d e5 6f d7 88 d9 f6 65 21 a0 43 78 f9 0b 49 a8 76 02 20 46 08 cd e7 76 fd a7 54 ce 87 1a bf f8 73 dc 4d 29 aa 34 c0 3a 50 d9 60 85 20 0e 82 5b ae 73 a4 01 41 04 56 84 d9 b3 83 46 de b7 f9 3b 6b 82 82 dc f8 22 7b fc b7 29 13 d8 f4 c5 fa d9 98 7e 38 77 04 67 fe e2 6b 5a 0b 57 d0 ae f4 df 40 02 46 3e c1 f1 93 46 40 d6 90 5e ed e1 ed 28 bb 7e 43 2b cb d1
-- sequence: ff ff ff ff

- input #2
-- previous_output_hash: 5e 32 74 71 c5 bd be cc a4 5f cc bd 69 8a 47 94 24 a3 1c 99 a9 70 4b 68 e6 19 f6 b3 e3 b9 29 55
-- previous_output_index: 01 00 00 00
-- script_length: 8a
--- [property] script_length decimal value: 138
-- scriptSig: 47 30 44 02 20 6f 36 1f b4 b9 7a ae a0 4f 18 cf 9f f8 1a 18 6e 48 ed 44 78 37 9e 6e 93 e0 ab 28 d4 d4 82 26 c9 02 20 15 65 b7 3f 2f 03 ef fa cf 42 9e 7a 84 05 43 c5 77 34 75 18 f2 56 dc c3 de f8 55 8f a8 ef fc ef 01 41 04 0d c0 d6 2b d8 7d 2e 54 be 3f 95 c4 18 7f a3 0d 48 d6 cd 00 43 19 78 40 1d d7 98 b7 45 07 d9 ad b4 57 fd 94 34 87 6f 56 27 35 ba ef 0b b9 d6 e3 53 66 c2 c1 81 b6 d9 61 b8 36 2a d9 5a f7 26 aa
-- sequence: ff ff ff ff

- output_count: 02
-- [property] output_count byte length: 1
-- [property] output_count decimal value: 2

- output #0
-- value: 72 8b 01 00 00 00 00 00
-- script_length: 19
--- [property] script_length decimal value: 25
-- scriptPubKey: 76 a9 14 df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5 88 ac

- output #1
-- value: 95 87 1a 00 00 00 00 00
-- script_length: 19
--- [property] script_length decimal value: 25
-- scriptPubKey: 76 a9 14 9f 03 6f 85 50 66 93 bb 84 c5 d9 10 d5 8b dc f8 28 3b 20 ce 88 ac






Good. Finally, get the block lock time. Also, add a check to confirm that we have reached the end of the raw transaction i.e. that the index has the exact expected value.



Parser1

python 2.7.13
#!/opt/local/bin/python


def main():

	file_path_1 = "tr_ken1.txt"
	raw_transaction = file_get_contents(file_path_1)
	raw_transaction = raw_transaction.lower()
	
	if len(raw_transaction) % 2 != 0:
		stop("length of raw transaction in hex format is not even.")
	hex_bytes = [raw_transaction[i:i+2] for i in xrange(0, len(raw_transaction), 2)]
	
	version = hex_bytes[0:4]
	
	# next byte is the first byte of the input count. 
	input_count = hex_bytes[4]
	
	# input_count is a variable length integer, so we need to find out how long it is. 
	input_count_length = get_byte_length_of_var_int(input_count)
	
	if input_count_length != 1:
		stop("mark1: unwritten section.")
		# future: append appropriate number of bytes to input_count. 
	elif input_count_length == 1:
		input_count_decimal = hex_string_to_decimal(input_count)
	
	index = 4 + input_count_length # our current byte position within the raw transaction (starts from 0)
	
	# parse each input item
	input = { "previous_output_hash": None, 
			  "previous_output_index": None,
			  "script_length": None,
			  "scriptSig": None,
			  "sequence": None }
	inputs = [None]*input_count_decimal
	for i in xrange(input_count_decimal):
		inputs[i] = input.copy()
	for i in xrange(input_count_decimal):
		# next 32 bytes contain the hash of the transaction that contains the output that became this input. 
		previous_output_hash = hex_bytes[index:(index+32)]
		index += 32
		# next 4 bytes contain the index of the output in the previous transaction.
		previous_output_index = hex_bytes[index:(index+4)]
		index += 4
		# next byte is the first byte of the script length. 
		# script_length is a variable length integer, so we need to find out how long it is. 
		script_length = hex_bytes[index]
		script_length_length = get_byte_length_of_var_int(script_length)
		
		if script_length_length != 1:
			stop("mark2: unwritten section.")
			# future: append appropriate number of bytes to script_length. 
		elif script_length_length == 1:
			script_length_decimal = hex_string_to_decimal(script_length)
		index += script_length_length
		
		# select next [script_length_decimal] bytes. this will be the scriptSig.
		scriptSig = hex_bytes[index:(index+script_length_decimal)]
		
		index += script_length_decimal
		
		# next four bytes are the input's sequence number.
		sequence = hex_bytes[index:(index+4)]
		index += 4
		
		input = inputs[i]
		input["previous_output_hash"] = previous_output_hash
		input["previous_output_index"] = previous_output_index
		input["script_length"] = script_length
		input["scriptSig"] = scriptSig
		input["sequence"] = sequence
		
	
	# next byte is the first byte of the output count. 
	output_count = hex_bytes[index]
	
	# output_count is a variable length integer, so we need to find out how long it is.
	output_count_length = get_byte_length_of_var_int(output_count)
	
	if output_count_length != 1:
		stop("mark3: unwritten section.")
		# future: append appropriate number of bytes to output_count. 
	elif output_count_length == 1:
		output_count_decimal = hex_string_to_decimal(output_count)
	
	index += output_count_length
	
	# parse each output item
	output = { "value": None, 
			  "script_length": None,
			  "scriptPubKey": None }
	outputs = [None]*output_count_decimal
	for i in xrange(output_count_decimal):
		outputs[i] = output.copy()
	for i in xrange(output_count_decimal):
		# next 8 bytes contain the value of the output. 
		value = hex_bytes[index:(index+8)]
		index += 8
		
		# next byte is the first byte of the script length. 
		# script_length is a variable length integer, so we need to find out how long it is. 
		script_length = hex_bytes[index]
		script_length_length = get_byte_length_of_var_int(script_length)
		
		if script_length_length != 1:
			stop("mark4: unwritten section.")
			# future: append appropriate number of bytes to script_length. 
		elif script_length_length == 1:
			script_length_decimal = hex_string_to_decimal(script_length)
		index += script_length_length
	
		# select next [script_length_decimal] bytes. this will be the scriptPubKey.
		scriptPubKey = hex_bytes[index:(index+script_length_decimal)]
		
		index += script_length_decimal
		
		output = outputs[i]
		output["value"] = value
		output["script_length"] = script_length
		output["scriptPubKey"] = scriptPubKey
	
	
	# the last four bytes are the block lock time
	block_lock_time = hex_bytes[index:(index+4)]
	index += 4
	
	
	# After parsing the raw transaction, the index (indicating our current position in the byte sequence) should be equal to the length in bytes of the entire transaction. This will confirm that we have parsed exactly the right number of bytes. I'm adding this check because Python's array slicing is quite forgiving and won't throw an error if we overshoot.
	assert index == len(hex_bytes)
	
	
	print "\nTransaction:\n"
	print "- [property] byte length: %d" % len(hex_bytes)
	print "- version: %s" % spaced(version)
	print "- input_count: %s" % spaced(input_count)
	print "-- [property] input_count byte length: %d" % input_count_length
	print "-- [property] input_count decimal value: %d" % input_count_decimal
	for i, input in enumerate(inputs):
		print "\n- input #%d" % i
		print "-- previous_output_hash: %s" % spaced(input["previous_output_hash"])
		print "-- previous_output_index: %s" % spaced(input["previous_output_index"])
		print "-- script_length: %s" % spaced(input["script_length"])
		print "--- [property] script_length decimal value: %d" % hex_string_to_decimal(input["script_length"])
		print "-- scriptSig: %s" % spaced(input["scriptSig"])
		print "-- sequence: %s" % spaced(input["sequence"])
	print ""
	print "- output_count: %s" % spaced(output_count)
	print "-- [property] output_count byte length: %d" % output_count_length
	print "-- [property] output_count decimal value: %d" % output_count_decimal
	for i, output in enumerate(outputs):
		print "\n- output #%d" % i
		print "-- value: %s" % spaced(output["value"])
		print "-- script_length: %s" % spaced(output["script_length"])
		print "--- [property] script_length decimal value: %d" % hex_string_to_decimal(output["script_length"])
		print "-- scriptPubKey: %s" % spaced(output["scriptPubKey"])
	print "\n- block lock time: %s" % spaced(block_lock_time)
	print ""
	


def hex_string_to_decimal(string):
	total = 0
	n = len(string)
	for i in xrange(n):
		c = string[i] # c = character
		v = hex_character_to_decimal(c) # v = value
		power = n - i - 1 # first character has the highest power of 16. power of last character is 0. 
		total += v*(16**power)
	return total
	

def hex_character_to_decimal(c):
	decimal_digits = "0123456789"
	hex_digits = {"a": 10, "b": 11, "c": 12, "d": 13, "e": 14, "f": 15}
	if c in decimal_digits:
		return int(c)
	elif c in hex_digits.keys():
		return hex_digits[c]
	stop("character %s is not a hex character." % c)
		

def get_byte_length_of_var_int(byte):
	# examine the first byte of a variable length integer to find out its length. 
	if byte == "ff":
		return 9
	elif byte == "fe":
		return 5
	elif byte == "fd":
		return 3
	return 1


def spaced(item):
	# expects a string (single hex byte), a list (multiple hex bytes), or None
	if isinstance(item, str):
		return item
	elif isinstance(item, list):
		return " ".join(item)
	elif item == None:
		return "None"
	stop("item is not in [string, list, or None]. item_string = %s" % str(item))


def file_get_contents(file_path):
	from os.path import isfile
	if not isfile(file_path): 
		stop("%s is not a file." % file_path)
	f = open(file_path, "r")
	data = f.read()
	f.close()
	return data


def stop(message):
	import sys
	# raise an Exception to get a traceback. 
	raise Exception("ERROR: %s\n" % message)


if __name__ == "__main__": main()



aineko:work stjohnpiano$ ./parser1.py


Transaction:

- [property] byte length: 617
- version: 01 00 00 00
- input_count: 03
-- [property] input_count byte length: 1
-- [property] input_count decimal value: 3

- input #0
-- previous_output_hash: c7 c4 e9 08 2d fb b3 71 3e 47 c1 be f9 f6 8e fd 9b 85 6d fc de e3 18 41 6d aa a2 ad d8 27 70 d4
-- previous_output_index: 01 00 00 00
-- script_length: 8b
--- [property] script_length decimal value: 139
-- scriptSig: 48 30 45 02 21 00 b2 fd 3f 8a 8c 22 6f 2a dd b1 b6 63 00 9c 34 4e 2c 35 1d ad 0d af 02 2d 1c b1 2f e7 7e 81 56 3a 02 20 6e 01 2e f6 23 5d 10 ee 05 8d 4c 3d 61 b4 4a 8c a1 8e bc 5e 99 38 8e 60 04 f3 06 b2 89 68 3e 02 01 41 04 36 5c 78 7a af 52 a1 81 a6 e1 10 f9 d3 da a0 81 03 f9 1b 15 12 be a2 13 98 b9 ea 1f 1b eb 5d ae 91 55 da ad 83 b1 da bb f3 16 c4 b6 e4 bb 43 83 44 20 4a 1d b1 d3 3c cb 47 d0 1c 20 51 c0 97 4a
-- sequence: ff ff ff ff

- input #1
-- previous_output_hash: ee 59 5b 71 cc 5a 1f b9 80 a7 7a f8 b9 62 53 4a e0 04 9b 29 06 fb e0 79 0e 48 fa 71 e9 f0 22 99
-- previous_output_index: 00 00 00 00
-- script_length: 8b
--- [property] script_length decimal value: 139
-- scriptSig: 48 30 45 02 21 00 fa f8 4d 50 e9 9d ee ef 7d 2c f9 f8 b4 60 0b 8d e5 6f d7 88 d9 f6 65 21 a0 43 78 f9 0b 49 a8 76 02 20 46 08 cd e7 76 fd a7 54 ce 87 1a bf f8 73 dc 4d 29 aa 34 c0 3a 50 d9 60 85 20 0e 82 5b ae 73 a4 01 41 04 56 84 d9 b3 83 46 de b7 f9 3b 6b 82 82 dc f8 22 7b fc b7 29 13 d8 f4 c5 fa d9 98 7e 38 77 04 67 fe e2 6b 5a 0b 57 d0 ae f4 df 40 02 46 3e c1 f1 93 46 40 d6 90 5e ed e1 ed 28 bb 7e 43 2b cb d1
-- sequence: ff ff ff ff

- input #2
-- previous_output_hash: 5e 32 74 71 c5 bd be cc a4 5f cc bd 69 8a 47 94 24 a3 1c 99 a9 70 4b 68 e6 19 f6 b3 e3 b9 29 55
-- previous_output_index: 01 00 00 00
-- script_length: 8a
--- [property] script_length decimal value: 138
-- scriptSig: 47 30 44 02 20 6f 36 1f b4 b9 7a ae a0 4f 18 cf 9f f8 1a 18 6e 48 ed 44 78 37 9e 6e 93 e0 ab 28 d4 d4 82 26 c9 02 20 15 65 b7 3f 2f 03 ef fa cf 42 9e 7a 84 05 43 c5 77 34 75 18 f2 56 dc c3 de f8 55 8f a8 ef fc ef 01 41 04 0d c0 d6 2b d8 7d 2e 54 be 3f 95 c4 18 7f a3 0d 48 d6 cd 00 43 19 78 40 1d d7 98 b7 45 07 d9 ad b4 57 fd 94 34 87 6f 56 27 35 ba ef 0b b9 d6 e3 53 66 c2 c1 81 b6 d9 61 b8 36 2a d9 5a f7 26 aa
-- sequence: ff ff ff ff

- output_count: 02
-- [property] output_count byte length: 1
-- [property] output_count decimal value: 2

- output #0
-- value: 72 8b 01 00 00 00 00 00
-- script_length: 19
--- [property] script_length decimal value: 25
-- scriptPubKey: 76 a9 14 df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5 88 ac

- output #1
-- value: 95 87 1a 00 00 00 00 00
-- script_length: 19
--- [property] script_length decimal value: 25
-- scriptPubKey: 76 a9 14 9f 03 6f 85 50 66 93 bb 84 c5 d9 10 d5 8b dc f8 28 3b 20 ce 88 ac

- block lock time: 00 00 00 00
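
The "unwritten section" marks in the parser above are reached whenever a variable length integer is longer than one byte. A possible way to fill them in is sketched below. It assumes the usual layout of this integer type: a first byte of fd, fe, or ff is followed by 2, 4, or 8 little-endian bytes holding the value, and any other first byte is the value itself. The name read_var_int is hypothetical and is not part of parser1.py.

```python
def read_var_int(hex_bytes, index):
    # hex_bytes: list of two-character hex strings, as built in parser1.py.
    # Returns (value, byte_length) for the variable length integer at index.
    first = hex_bytes[index]
    extra = {"fd": 2, "fe": 4, "ff": 8}.get(first, 0)
    if extra == 0:
        # single-byte case: the byte is the value
        return int(first, 16), 1
    payload = hex_bytes[index + 1:index + 1 + extra]
    if len(payload) != extra:
        raise ValueError("truncated variable length integer")
    # the payload is little-endian, so reverse it before reading it as a number
    value = int("".join(reversed(payload)), 16)
    return value, 1 + extra


value, length = read_var_int(["03"], 0)
```

The returned length can be added to the index in the same way that input_count_length is used above.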




I'd like to see the output values in bitcoin rather than little-endian hex.




Parser1

python 2.7.13
#!/opt/local/bin/python


def main():

	file_path_1 = "tr_ken1.txt"
	raw_transaction = file_get_contents(file_path_1)
	raw_transaction = raw_transaction.lower()
	
	if len(raw_transaction) % 2 != 0:
		stop("length of raw transaction in hex format is not even.")
	hex_bytes = [raw_transaction[i:i+2] for i in xrange(0, len(raw_transaction), 2)]
	
	version = hex_bytes[0:4]
	
	# next byte is the first byte of the input count. 
	input_count = hex_bytes[4]
	
	# input_count is a variable length integer, so we need to find out how long it is. 
	input_count_length = get_byte_length_of_var_int(input_count)
	
	if input_count_length != 1:
		stop("mark1: unwritten section.")
		# future: append appropriate number of bytes to input_count. 
	elif input_count_length == 1:
		input_count_decimal = hex_string_to_decimal(input_count)
	
	index = 4 + input_count_length # our current byte position within the raw transaction (starts from 0)
	
	# parse each input item
	input = { "previous_output_hash": None, 
			  "previous_output_index": None,
			  "script_length": None,
			  "scriptSig": None,
			  "sequence": None }
	inputs = [None]*input_count_decimal
	for i in xrange(input_count_decimal):
		inputs[i] = input.copy()
	for i in xrange(input_count_decimal):
		# next 32 bytes contain the hash of the transaction that contains the output that became this input. 
		previous_output_hash = hex_bytes[index:(index+32)]
		index += 32
		# next 4 bytes contain the index of the output in the previous transaction.
		previous_output_index = hex_bytes[index:(index+4)]
		index += 4
		# next byte is the first byte of the script length. 
		# script_length is a variable length integer, so we need to find out how long it is. 
		script_length = hex_bytes[index]
		script_length_length = get_byte_length_of_var_int(script_length)
		
		if script_length_length != 1:
			stop("mark2: unwritten section.")
			# future: append appropriate number of bytes to script_length. 
		elif script_length_length == 1:
			script_length_decimal = hex_string_to_decimal(script_length)
		index += script_length_length
		
		# select next [script_length_decimal] bytes. this will be the scriptSig.
		scriptSig = hex_bytes[index:(index+script_length_decimal)]
		
		index += script_length_decimal
		
		# next four bytes are the input's sequence number.
		sequence = hex_bytes[index:(index+4)]
		index += 4
		
		input = inputs[i]
		input["previous_output_hash"] = previous_output_hash
		input["previous_output_index"] = previous_output_index
		input["script_length"] = script_length
		input["scriptSig"] = scriptSig
		input["sequence"] = sequence
		
	
	# next byte is the first byte of the output count. 
	output_count = hex_bytes[index]
	
	# output_count is a variable length integer, so we need to find out how long it is.
	output_count_length = get_byte_length_of_var_int(output_count)
	
	if output_count_length != 1:
		stop("mark3: unwritten section.")
		# future: append appropriate number of bytes to output_count. 
	elif output_count_length == 1:
		output_count_decimal = hex_string_to_decimal(output_count)
	
	index += output_count_length
	
	# parse each output item
	output = { "value": None, 
			  "script_length": None,
			  "scriptPubKey": None }
	outputs = [None]*output_count_decimal
	for i in xrange(output_count_decimal):
		outputs[i] = output.copy()
	for i in xrange(output_count_decimal):
		# next 8 bytes contain the value of the output. 
		value = hex_bytes[index:(index+8)]
		index += 8
		
		# next byte is the first byte of the script length. 
		# script_length is a variable length integer, so we need to find out how long it is. 
		script_length = hex_bytes[index]
		script_length_length = get_byte_length_of_var_int(script_length)
		
		if script_length_length != 1:
			stop("mark4: unwritten section.")
			# future: append appropriate number of bytes to script_length. 
		elif script_length_length == 1:
			script_length_decimal = hex_string_to_decimal(script_length)
		index += script_length_length
	
		# select next [script_length_decimal] bytes. this will be the scriptPubKey.
		scriptPubKey = hex_bytes[index:(index+script_length_decimal)]
		
		index += script_length_decimal
		
		output = outputs[i]
		output["value"] = value
		output["script_length"] = script_length
		output["scriptPubKey"] = scriptPubKey
	
	
	# the last four bytes are the block lock time
	block_lock_time = hex_bytes[index:(index+4)]
	index += 4
	
	
	# After parsing the raw transaction, the index (indicating our current position in the byte sequence) should be equal to the length in bytes of the entire transaction. This will confirm that we have parsed exactly the right number of bytes. I'm adding this check because Python's array slicing is quite forgiving and won't throw an error if we overshoot.
	assert index == len(hex_bytes)
	
	
	print "\nTransaction:\n"
	print "- [property] byte length: %d" % len(hex_bytes)
	print "- version: %s" % spaced(version)
	print "- input_count: %s" % spaced(input_count)
	print "-- [property] input_count byte length: %d" % input_count_length
	print "-- [property] input_count decimal value: %d" % input_count_decimal
	for i, input in enumerate(inputs):
		print "\n- input #%d" % i
		print "-- previous_output_hash: %s" % spaced(input["previous_output_hash"])
		print "-- previous_output_index: %s" % spaced(input["previous_output_index"])
		print "-- script_length: %s" % spaced(input["script_length"])
		print "--- [property] script_length decimal value: %d" % hex_string_to_decimal(input["script_length"])
		print "-- scriptSig: %s" % spaced(input["scriptSig"])
		print "-- sequence: %s" % spaced(input["sequence"])
	print ""
	print "- output_count: %s" % spaced(output_count)
	print "-- [property] output_count byte length: %d" % output_count_length
	print "-- [property] output_count decimal value: %d" % output_count_decimal
	for i, output in enumerate(outputs):
		print "\n- output #%d" % i
		print "-- value: %s" % spaced(output["value"])
		print "--- [property] output value in bitcoin: %s" % convert_output_value_to_bitcoin_string(output["value"])
		print "-- script_length: %s" % spaced(output["script_length"])
		print "--- [property] script_length decimal value: %d" % hex_string_to_decimal(output["script_length"])
		print "-- scriptPubKey: %s" % spaced(output["scriptPubKey"])
	print "\n- block lock time: %s" % spaced(block_lock_time)
	print ""



def convert_output_value_to_bitcoin_string(value):
	# convert 8-byte output value (little-endian hex) into bitcoin value. 
	value = list(reversed(value))
	vs = "".join(value) # vs = value string
	vs_decimal = str(hex_string_to_decimal(vs))
	n = len(vs_decimal) # number of characters in the decimal form of the value string.
	if n - 8 > 0: # more than 8 characters i.e. at least 1 btc. bitcoin has 8 decimal places.
		# add the decimal point at the appropriate position within the value string.
		section1 = vs_decimal[:-8]
		section2 = vs_decimal[-8:]
		vs_decimal = section1 + "." + section2
	else:
		# 8 or fewer characters i.e. less than 1 btc.
		# add a 0, a decimal point, and the appropriate number of 0s in front of the value string.
		n_zeros = abs(n-8) # appropriate number of zeros
		vs_decimal = "0." + "0"*n_zeros + vs_decimal
	return vs_decimal


def hex_string_to_decimal(string):
	total = 0
	n = len(string)
	for i in xrange(n):
		c = string[i] # c = character
		v = hex_character_to_decimal(c) # v = value
		power = n - i - 1 # first character has the highest power of 16. power of last character is 0. 
		total += v*(16**power)
	return total
	

def hex_character_to_decimal(c):
	decimal_digits = "0123456789"
	hex_digits = {"a": 10, "b": 11, "c": 12, "d": 13, "e": 14, "f": 15}
	if c in decimal_digits:
		return int(c)
	elif c in hex_digits.keys():
		return hex_digits[c]
	stop("character %s is not a hex character." % c)
		

def get_byte_length_of_var_int(byte):
	# examine the first byte of a variable length integer to find out its length. 
	if byte == "ff":
		return 9
	elif byte == "fe":
		return 5
	elif byte == "fd":
		return 3
	return 1


def spaced(item):
	# expects a string (single hex byte), a list (multiple hex bytes), or None
	if isinstance(item, str):
		return item
	elif isinstance(item, list):
		return " ".join(item)
	elif item == None:
		return "None"
	stop("item is not in [string, list, or None]. item_string = %s" % str(item))


def file_get_contents(file_path):
	from os.path import isfile
	if not isfile(file_path): 
		stop("%s is not a file." % file_path)
	f = open(file_path, "r")
	data = f.read()
	f.close()
	return data


def stop(message):
	import sys
	# raise an Exception to get a traceback. 
	raise Exception("ERROR: %s\n" % message)


if __name__ == "__main__": main()



aineko:work stjohnpiano$ ./parser1.py


Transaction:

- [property] byte length: 617
- version: 01 00 00 00
- input_count: 03
-- [property] input_count byte length: 1
-- [property] input_count decimal value: 3

- input #0
-- previous_output_hash: c7 c4 e9 08 2d fb b3 71 3e 47 c1 be f9 f6 8e fd 9b 85 6d fc de e3 18 41 6d aa a2 ad d8 27 70 d4
-- previous_output_index: 01 00 00 00
-- script_length: 8b
--- [property] script_length decimal value: 139
-- scriptSig: 48 30 45 02 21 00 b2 fd 3f 8a 8c 22 6f 2a dd b1 b6 63 00 9c 34 4e 2c 35 1d ad 0d af 02 2d 1c b1 2f e7 7e 81 56 3a 02 20 6e 01 2e f6 23 5d 10 ee 05 8d 4c 3d 61 b4 4a 8c a1 8e bc 5e 99 38 8e 60 04 f3 06 b2 89 68 3e 02 01 41 04 36 5c 78 7a af 52 a1 81 a6 e1 10 f9 d3 da a0 81 03 f9 1b 15 12 be a2 13 98 b9 ea 1f 1b eb 5d ae 91 55 da ad 83 b1 da bb f3 16 c4 b6 e4 bb 43 83 44 20 4a 1d b1 d3 3c cb 47 d0 1c 20 51 c0 97 4a
-- sequence: ff ff ff ff

- input #1
-- previous_output_hash: ee 59 5b 71 cc 5a 1f b9 80 a7 7a f8 b9 62 53 4a e0 04 9b 29 06 fb e0 79 0e 48 fa 71 e9 f0 22 99
-- previous_output_index: 00 00 00 00
-- script_length: 8b
--- [property] script_length decimal value: 139
-- scriptSig: 48 30 45 02 21 00 fa f8 4d 50 e9 9d ee ef 7d 2c f9 f8 b4 60 0b 8d e5 6f d7 88 d9 f6 65 21 a0 43 78 f9 0b 49 a8 76 02 20 46 08 cd e7 76 fd a7 54 ce 87 1a bf f8 73 dc 4d 29 aa 34 c0 3a 50 d9 60 85 20 0e 82 5b ae 73 a4 01 41 04 56 84 d9 b3 83 46 de b7 f9 3b 6b 82 82 dc f8 22 7b fc b7 29 13 d8 f4 c5 fa d9 98 7e 38 77 04 67 fe e2 6b 5a 0b 57 d0 ae f4 df 40 02 46 3e c1 f1 93 46 40 d6 90 5e ed e1 ed 28 bb 7e 43 2b cb d1
-- sequence: ff ff ff ff

- input #2
-- previous_output_hash: 5e 32 74 71 c5 bd be cc a4 5f cc bd 69 8a 47 94 24 a3 1c 99 a9 70 4b 68 e6 19 f6 b3 e3 b9 29 55
-- previous_output_index: 01 00 00 00
-- script_length: 8a
--- [property] script_length decimal value: 138
-- scriptSig: 47 30 44 02 20 6f 36 1f b4 b9 7a ae a0 4f 18 cf 9f f8 1a 18 6e 48 ed 44 78 37 9e 6e 93 e0 ab 28 d4 d4 82 26 c9 02 20 15 65 b7 3f 2f 03 ef fa cf 42 9e 7a 84 05 43 c5 77 34 75 18 f2 56 dc c3 de f8 55 8f a8 ef fc ef 01 41 04 0d c0 d6 2b d8 7d 2e 54 be 3f 95 c4 18 7f a3 0d 48 d6 cd 00 43 19 78 40 1d d7 98 b7 45 07 d9 ad b4 57 fd 94 34 87 6f 56 27 35 ba ef 0b b9 d6 e3 53 66 c2 c1 81 b6 d9 61 b8 36 2a d9 5a f7 26 aa
-- sequence: ff ff ff ff

- output_count: 02
-- [property] output_count byte length: 1
-- [property] output_count decimal value: 2

- output #0
-- value: 72 8b 01 00 00 00 00 00
--- [property] output value in bitcoin: 0.00101234
-- script_length: 19
--- [property] script_length decimal value: 25
-- scriptPubKey: 76 a9 14 df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5 88 ac

- output #1
-- value: 95 87 1a 00 00 00 00 00
--- [property] output value in bitcoin: 0.01738645
-- script_length: 19
--- [property] script_length decimal value: 25
-- scriptPubKey: 76 a9 14 9f 03 6f 85 50 66 93 bb 84 c5 d9 10 d5 8b dc f8 28 3b 20 ce 88 ac

- block lock time: 00 00 00 00





Hm. Good.
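
As a cross-check on convert_output_value_to_bitcoin_string, the same conversion can be sketched with Python built-ins. The name satoshi_hex_to_btc_string is hypothetical. The sketch assumes the input is the list of 8 two-character hex strings that the parser stores in output["value"].

```python
def satoshi_hex_to_btc_string(value_bytes):
    # value_bytes: list of 8 two-character hex strings, little-endian.
    # Reverse to big-endian, read as an integer (the value in satoshis).
    satoshis = int("".join(reversed(value_bytes)), 16)
    # bitcoin has 8 decimal places: split into whole coins and the remainder.
    whole, fraction = divmod(satoshis, 10**8)
    return "%d.%08d" % (whole, fraction)


output_0 = satoshi_hex_to_btc_string("72 8b 01 00 00 00 00 00".split(" "))
output_1 = satoshi_hex_to_btc_string("95 87 1a 00 00 00 00 00".split(" "))
```

For the two outputs above, this gives 0.00101234 and 0.01738645, matching the parser's result.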

Remove "import sys" from stop(message), as it is now unnecessary.



For comparison, I'll paste the raw transaction tr_ken1 into:
blockchain.info/decode-tx

Result:

{ "lock_time":0, "size":617, "inputs":[ { "prev_out":{ "index":1, "hash":"d47027d8ada2aa6d4118e3defc6d859bfd8ef6f9bec1473e71b3fb2d08e9c4c7" }, "script":"483045022100b2fd3f8a8c226f2addb1b663009c344e2c351dad0daf022d1cb12fe77e81563a02206e012ef6235d10ee058d4c3d61b44a8ca18ebc5e99388e6004f306b289683e02014104365c787aaf52a181a6e110f9d3daa08103f91b1512bea21398b9ea1f1beb5dae9155daad83b1dabbf316c4b6e4bb438344204a1db1d33ccb47d01c2051c0974a" }, { "prev_out":{ "index":0, "hash":"9922f0e971fa480e79e0fb06299b04e04a5362b9f87aa780b91f5acc715b59ee" }, "script":"483045022100faf84d50e99deeef7d2cf9f8b4600b8de56fd788d9f66521a04378f90b49a87602204608cde776fda754ce871abff873dc4d29aa34c03a50d96085200e825bae73a40141045684d9b38346deb7f93b6b8282dcf8227bfcb72913d8f4c5fad9987e38770467fee26b5a0b57d0aef4df4002463ec1f1934640d6905eede1ed28bb7e432bcbd1" }, { "prev_out":{ "index":1, "hash":"5529b9e3b3f619e6684b70a9991ca32494478a69bdcc5fa4ccbebdc57174325e" }, "script":"47304402206f361fb4b97aaea04f18cf9ff81a186e48ed4478379e6e93e0ab28d4d48226c902201565b73f2f03effacf429e7a840543c577347518f256dcc3def8558fa8effcef0141040dc0d62bd87d2e54be3f95c4187fa30d48d6cd00431978401dd798b74507d9adb457fd9434876f562735baef0bb9d6e35366c2c181b6d961b8362ad95af726aa" } ], "version":1, "vin_sz":3, "hash":"81b4c832d70cb56ff957589752eb4125a4cab78a25a8fc52d6a09e5bd4404d48", "vout_sz":2, "out":[ { "script_string":"OP_DUP OP_HASH160 df3bd30160e6c6145baaf2c88a8844c13a00d1d5 OP_EQUALVERIFY OP_CHECKSIG", "address":"1MMMMSUb1piy2ufrSguNUdFmAcvqrQF8M5", "value":101234, "script":"76a914df3bd30160e6c6145baaf2c88a8844c13a00d1d588ac" }, { "script_string":"OP_DUP OP_HASH160 9f036f85506693bb84c5d910d58bdcf8283b20ce OP_EQUALVERIFY OP_CHECKSIG", "address":"1FVnZDLN2c2RqnqLkySoeYD9n7BiydPhMv", "value":1738645, "script":"76a9149f036f85506693bb84c5d910d58bdcf8283b20ce88ac" } ] }



Let's compare results.


These two results agree on:
- size (617 bytes)
- block lock time (0 or 00 00 00 00)
- version (1 or 01 00 00 00 (little-endian))
- input_count (3 or 03)
- output_count (2 or 02)
-- Note: some reading indicates that vin_sz == input_count and vout_sz == output_count.

Differences:
- blockchain.info: The 'hash' on the first level of the tree
(81b4c832d70cb56ff957589752eb4125a4cab78a25a8fc52d6a09e5bd4404d48)
is not present in my parsing result. Some reading indicates that this is the transaction id of the transaction, calculated by hashing the transaction. I'm not sure what format the transaction has to be in before this hashing operation.
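
On the question of what is hashed: for a transaction in this older (pre-SegWit) format, the transaction id is, to my understanding, the double SHA256 of the raw transaction bytes exactly as serialized, displayed with the byte order reversed. The sketch below assumes this. The name txid_from_raw_hex is hypothetical, and the result has not been checked here against the value reported by blockchain.info.

```python
import binascii
import hashlib


def txid_from_raw_hex(raw_hex):
    # raw_hex: the raw transaction as a hex string (e.g. the contents of tr_ken1.txt).
    raw = binascii.unhexlify(raw_hex.strip())
    # hash twice with SHA256
    digest = hashlib.sha256(hashlib.sha256(raw).digest()).digest()
    # reverse the byte order for display, then convert back to hex
    return binascii.hexlify(digest[::-1]).decode("ascii")
```

The reversal is the same one seen in the previous_output_hash values: the bytes inside the transaction are in the opposite order to the hash as it is usually displayed.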


The previous output hash values in the inputs appear to be reversed in the blockchain.info result.

Let's confirm equality.

Example of checking the equality of a hash from blockchain.info and from my result:

aineko:work stjohnpiano$ python

Python 2.7.13 (default, Dec 18 2016, 05:35:59)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> hash_s1 = "c7 c4 e9 08 2d fb b3 71 3e 47 c1 be f9 f6 8e fd 9b 85 6d fc de e3 18 41 6d aa a2 ad d8 27 70 d4"
>>> hash_b1 = "d47027d8ada2aa6d4118e3defc6d859bfd8ef6f9bec1473e71b3fb2d08e9c4c7"
>>> bytes_s1 = hash_s1.split(" ")
>>> bytes_s1_reversed = list(reversed(bytes_s1))
>>> bytes_s1_string = "".join(bytes_s1_reversed)
>>> bytes_s1_string
'd47027d8ada2aa6d4118e3defc6d859bfd8ef6f9bec1473e71b3fb2d08e9c4c7'
>>> hash_b1
'd47027d8ada2aa6d4118e3defc6d859bfd8ef6f9bec1473e71b3fb2d08e9c4c7'
>>> bytes_s1_string == hash_b1
True


The "_b1" suffix stands for "blockchain.info 1". The "_s1" suffix stands for "StJohn 1".

Repeat this process for the other two input previous_output_hash values.

They're all equal.


More agreements:
- The inputs appear in the same order.
- The inputs have the same previous_output_index values (1, 0, 1 or 01 00 00 00, 00 00 00 00, 01 00 00 00)



The scriptSig values in my result output do not appear to be reversed.


Let's confirm equality of the scriptSig values.

Example of checking the equality of a scriptSig from blockchain.info and from my result:

aineko:work stjohnpiano$ python

Python 2.7.13 (default, Dec 18 2016, 05:35:59)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> script_b1 = "483045022100b2fd3f8a8c226f2addb1b663009c344e2c351dad0daf022d1cb12fe77e81563a02206e012ef6235d10ee058d4c3d61b44a8ca18ebc5e99388e6004f306b289683e02014104365c787aaf52a181a6e110f9d3daa08103f91b1512bea21398b9ea1f1beb5dae9155daad83b1dabbf316c4b6e4bb438344204a1db1d33ccb47d01c2051c0974a"
>>> script_s1 = "48 30 45 02 21 00 b2 fd 3f 8a 8c 22 6f 2a dd b1 b6 63 00 9c 34 4e 2c 35 1d ad 0d af 02 2d 1c b1 2f e7 7e 81 56 3a 02 20 6e 01 2e f6 23 5d 10 ee 05 8d 4c 3d 61 b4 4a 8c a1 8e bc 5e 99 38 8e 60 04 f3 06 b2 89 68 3e 02 01 41 04 36 5c 78 7a af 52 a1 81 a6 e1 10 f9 d3 da a0 81 03 f9 1b 15 12 be a2 13 98 b9 ea 1f 1b eb 5d ae 91 55 da ad 83 b1 da bb f3 16 c4 b6 e4 bb 43 83 44 20 4a 1d b1 d3 3c cb 47 d0 1c 20 51 c0 97 4a"
>>> script_s1 = script_s1.replace(" ","")
>>> script_s1 == script_b1
True



Repeat this process for the other two input scriptSig values.

They're all equal.


In the blockchain.info result, the "script" item within each output would appear to be the scriptPubKey in my result.

The scriptPubKey values in my result do not appear to be reversed.

Confirm equality of the scriptPubKey values, using the same process shown above for checking the scriptSig values.

Both are equal.

More agreements:
- The outputs are in the same order.
- Output #0 has the same value in each result (101234 or 0.00101234).
- Output #1 has the same value in each result (1738645 or 0.01738645).


Notes:
- The "script_string" value appears to be decoded from the "script" value.
- The "address" for an output is returned in the blockchain.info result, although it does not appear in the raw transaction in that form. Only the hash of the public key is stored in the transaction, not the public key itself. The public key becomes known only when the output is spent in a new transaction (the public key is included within the scriptSig). However, a P2PKH address is an encoding (Base58Check, with a version byte and a checksum) of that same public key hash, so the "address" can be derived from the scriptPubKey alone. It does not need to be retrieved from a later transaction.
- Sequence numbers are included in my result but not in the blockchain.info result.
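
A P2PKH address is, as far as I know, the Base58Check encoding of a version byte (0x00 on the main network) followed by the 20-byte public key hash, so it can be computed from the scriptPubKey alone. A minimal sketch, with hypothetical helper names base58check_encode and p2pkh_address_from_hash160:

```python
import binascii
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check_encode(payload):
    # checksum = first 4 bytes of the double SHA256 of the payload
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    data = payload + checksum
    # treat the bytes as one big-endian number and rewrite it in base 58
    number = int(binascii.hexlify(data).decode("ascii"), 16)
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    # each leading zero byte is written as a leading "1"
    n_leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * n_leading_zeros + encoded


def p2pkh_address_from_hash160(hash160_hex):
    # version byte 0x00 (assumed: main network P2PKH) + 20-byte public key hash
    payload = b"\x00" + binascii.unhexlify(hash160_hex)
    return base58check_encode(payload)


address_0 = p2pkh_address_from_hash160("df3bd30160e6c6145baaf2c88a8844c13a00d1d5")
```

Applied to the public key hash in the scriptPubKey of output #0, this should reproduce the "address" value shown for that output in the blockchain.info result.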












Ok. Let's continue.



In Ken Shirriff's signed transaction (tr_ken2), the previous output index of the single input was: 00 00 00 00
This equals 0.

So, the first output from the previous transaction (tr_ken1) is used as the input in tr_ken2. (Outputs in a transaction are implicitly 0-indexed.)

From the output of parser1.py, I see that the scriptPubKey of output #0 is:
76 a9 14 df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5 88 ac


Hm.

Next: Look at the scriptPubKey format for basic transactions (i.e. where outputs are Pay-To-Public-Key-Hash (P2PKH)).



In this sentence from an earlier excerpt from Ken Shirriff's article,
"The Script language is surprisingly complex, with about 80 different opcodes."
The phrase "80 different opcodes" links to
en.bitcoin.it/wiki/Script



Excerpts from:
en.bitcoin.it/wiki/Script

Bitcoin uses a scripting system for transactions. Forth-like, Script is simple, stack-based, and processed from left to right. It is intentionally not Turing-complete, with no loops.

A script is essentially a list of instructions recorded with each transaction that describe how the next person wanting to spend the Bitcoins being transferred can gain access to them. The script for a typical Bitcoin transfer to destination Bitcoin address D simply encumbers future spending of the bitcoins with two things: the spender must provide

1) a public key that, when hashed, yields destination address D embedded in the script, and
2) a signature to prove ownership of the private key corresponding to the public key just provided.

[...]

A transaction is valid if nothing in the combined script triggers failure and the top stack item is True (non-zero) when the script exits. [...]

This document is for information purposes only. Officially, Bitcoin script is defined by its reference implementation
[ http://github.com/bitcoin/bitcoin/blob/master/src/script/interpreter.cpp ].

The stacks hold byte vectors. When used as numbers, byte vectors are interpreted as little-endian variable-length integers with the most significant bit determining the sign of the integer. Thus 0x81 represents -1. 0x80 is another representation of zero (so called negative 0). Positive 0 is represented by a null-length vector. Byte vectors are interpreted as Booleans where False is represented by any representation of zero and True is represented by any representation of non-zero.

Leading zeros in an integer and negative zero are allowed in blocks but get rejected by the stricter requirements which standard full nodes put on transactions before retransmitting them. Byte vectors on the stack are not allowed to be more than 520 bytes long. Opcodes which take integers and bools off the stack require that they be no more than 4 bytes long, but addition and subtraction can overflow and result in a 5 byte integer being put on the stack.



Hm.

0x81 in binary is: 1000 0001
0x80 in binary is: 1000 0000

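Ok. So in Script, a number is negative if the highest bit of its most significant byte is "1". Since the encoding is little-endian, the most significant byte is the last byte of the vector. The smallest example, 0x80 or 1000 0000, is "negative 0".

0x81 is -1 in decimal.

Normal ("positive") 0 is an empty (zero-length) byte vector. 0x00 or 0000 0000 is another representation of zero, but it carries a leading zero byte.

Booleans:
- Any representation of zero (the empty byte vector, 0x00, or 0x80) is False.
- Any other value is True.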

Key points:
- Byte vectors are little-endian variable-length integers.
- Don't use negative 0 (0x80) bytes. Don't use leading zero bytes in the integer values. Both of these cause problems with transaction retransmission.
- Maximum length of byte vector is 520 bytes.
- Don't use integers and boolean values that are longer than 4 bytes.
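
The number and Boolean rules above can be sketched in a few lines. This is a minimal illustration in Python 3 (the scripts elsewhere in this article use Python 2.7). The function names decode_script_num and cast_to_bool are labels chosen for this sketch, not names from any Bitcoin library, and the sketch does not enforce the 4-byte limit on numeric operands.

```python
def decode_script_num(data: bytes) -> int:
    """Read a Script byte vector as a little-endian sign-magnitude integer."""
    if len(data) == 0:
        return 0  # positive zero is the empty byte vector
    magnitude = int.from_bytes(data, "little")
    # The sign is the top bit of the last (most significant) byte.
    sign_bit = 0x80 << (8 * (len(data) - 1))
    if magnitude & sign_bit:
        return -(magnitude & ~sign_bit)
    return magnitude


def cast_to_bool(data: bytes) -> bool:
    """Any representation of zero is False; everything else is True."""
    return decode_script_num(data) != 0


for hex_value in ["", "00", "80", "81", "01", "8000", "0080"]:
    raw = bytes.fromhex(hex_value)
    print(repr(hex_value), decode_script_num(raw), cast_to_bool(raw))
```

In this sketch 0x81 decodes to -1, while both 0x80 and the empty vector decode to 0 and are therefore False.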


Further reading indicates that a "stack" is the abstract idea of a sequential collection of objects on which certain operations are defined. Implementations may vary across platforms but must behave in the same way in order to be considered to be the same stack machine. This means that a stack can be implemented in Python on one computer and in C++ on another computer, but as long as the two stack implementations perform the same operations on the same domain, they can pass data between themselves without any trouble.

The stack itself does not implement a SHA256 hash function; it simply indicates when and on what data a SHA256 function should operate. The details of implementing SHA256 are left to the stack implementation.



From reading the list of opcodes shown at:
en.bitcoin.it/wiki/Script
it appears that all opcodes are exactly one byte.


Let's re-quote an excerpt from Ken Shirriff's article:

In a standard transaction, the scriptSig pushes the signature (generated from the private key) to the stack, followed by the public key. Next, the scriptPubKey (from the source transaction) is executed to verify the public key and then verify the signature.

As expressed in Script, the scriptSig is:

PUSHDATA
signature data and SIGHASH_ALL
PUSHDATA
public key data


The scriptPubKey is:

OP_DUP
OP_HASH160
PUSHDATA
Bitcoin address (public key hash)
OP_EQUALVERIFY
OP_CHECKSIG


When this code executes, PUSHDATA first pushes the signature to the stack. The next PUSHDATA pushes the public key to the stack. Next, OP_DUP duplicates the public key on the stack. OP_HASH160 computes the 160-bit hash of the public key. PUSHDATA pushes the required Bitcoin address. Then OP_EQUALVERIFY verifies the top two stack values are equal - that the public key hash from the new transaction matches the address in the old address. This proves that the public key is valid. Next, OP_CHECKSIG checks that the signature of the transaction matches the public key and signature on the stack. This proves that the signature is valid.



So, it looks as though each of these opcodes:
PUSHDATA, OP_DUP, OP_HASH160, OP_EQUALVERIFY, OP_CHECKSIG
is a single byte in a script byte sequence in the raw transaction. The stack implementation knows how to interpret these opcodes and perform the indicated operations.

Exceptions: OP_PUSHDATA1, OP_PUSHDATA2, and OP_PUSHDATA4 are each a single opcode byte followed by a length field of 1, 2, or 4 bytes respectively, so they occupy two, three, and five bytes before the data begins. There may be other multi-byte opcodes.
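
To make the different push forms concrete, here is a hedged Python 3 sketch of reading a single push operation from a script. The opcode values 0x4c, 0x4d and 0x4e for OP_PUSHDATA1/2/4 are taken from the wiki's opcode list; the function name read_push and the example bytes are invented for illustration. OP_0 and all non-push opcodes are deliberately left out.

```python
OP_PUSHDATA1 = 0x4c  # next 1 byte holds the data length
OP_PUSHDATA2 = 0x4d  # next 2 bytes (little-endian) hold the data length
OP_PUSHDATA4 = 0x4e  # next 4 bytes (little-endian) hold the data length


def read_push(script: bytes, pos: int):
    """Read one push operation starting at script[pos].

    Returns (data, next_pos). Raises ValueError if the byte at pos is
    not a push opcode or if the script ends too early.
    """
    opcode = script[pos]
    if 0x01 <= opcode <= 0x4b:
        # The opcode byte itself is the data length.
        length_size, length = 0, opcode
    elif opcode in (OP_PUSHDATA1, OP_PUSHDATA2, OP_PUSHDATA4):
        length_size = {OP_PUSHDATA1: 1, OP_PUSHDATA2: 2, OP_PUSHDATA4: 4}[opcode]
        length_bytes = script[pos + 1 : pos + 1 + length_size]
        if len(length_bytes) != length_size:
            raise ValueError("truncated length field")
        length = int.from_bytes(length_bytes, "little")
    else:
        raise ValueError("not a push opcode: 0x%02x" % opcode)
    start = pos + 1 + length_size
    data = script[start : start + length]
    if len(data) != length:
        raise ValueError("truncated push data")
    return data, start + length


# Made-up example: a direct push of 3 bytes, then the same 3 bytes
# pushed again using OP_PUSHDATA1.
example = bytes.fromhex("03aabbcc" + "4c03aabbcc")
first, pos = read_push(example, 0)
second, pos = read_push(example, pos)
print(first.hex(), second.hex(), pos)
```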



Excerpt from:
en.bitcoin.it/wiki/Script

Standard Transaction to Bitcoin address (pay-to-pubkey-hash)

scriptPubKey: OP_DUP OP_HASH160 {pubKeyHash} OP_EQUALVERIFY OP_CHECKSIG
scriptSig: {sig} {pubKey}


To demonstrate how scripts look on the wire, here is a raw scriptPubKey:

OP_DUP: 76
OP_HASH160: A9
Bytes to push: 14
Data to push: 89 AB CD EF AB BA AB BA AB BA AB BA AB BA AB BA AB BA AB BA
OP_EQUALVERIFY: 88
OP_CHECKSIG: AC


Note: scriptSig is in the input of the spending transaction and scriptPubKey is in the output of the previously unspent i.e. "available" transaction.

Here is how each word is processed:

[Begin Stack Processing]

[Stage 1]
- Stack: Empty
- Script: {sig} {pubKey} OP_DUP OP_HASH160 {pubKeyHash} OP_EQUALVERIFY OP_CHECKSIG
- Description: scriptSig and scriptPubKey are combined.

[Stage 2]
- Stack: {sig} {pubKey}
- Script: OP_DUP OP_HASH160 {pubKeyHash} OP_EQUALVERIFY OP_CHECKSIG
- Description: Constants are added to the stack.

[Stage 3]
- Stack: {sig} {pubKey} {pubKey}
- Script: OP_HASH160 {pubKeyHash} OP_EQUALVERIFY OP_CHECKSIG
- Description: Top stack item is duplicated.

[Stage 4]
- Stack: {sig} {pubKey} {pubHashA}
- Script: {pubKeyHash} OP_EQUALVERIFY OP_CHECKSIG
- Description: Top stack item is hashed.

[Stage 5]
- Stack: {sig} {pubKey} {pubHashA} {pubKeyHash}
- Script: OP_EQUALVERIFY OP_CHECKSIG
- Description: Constant added.

[Stage 6]
- Stack: {sig} {pubKey}
- Script: OP_CHECKSIG
- Description: Equality is checked between the top two stack items.

[Stage 7]
- Stack: true
- Script: Empty.
- Description: Signature is checked for top two stack items.

[End Stack Processing]




Ok. Let's look up the relevant opcodes.


From:
en.bitcoin.it/wiki/Script

I find that:

- 0x00:
-- Word: OP_0 / OP_FALSE
-- Opcode: 0
-- Input: Nothing.
-- Output: [empty value]
-- Description: An empty array of bytes is pushed onto the stack. (This is not a no-op: an item is added to the stack.)

- 0x01-0x4b
-- Word: [N/A]
-- Opcodes: 1-75. (These have no named word. Note that OP_1 is a different opcode, 0x51, which pushes the number 1.)
-- Input: [Special]
-- Output: data
-- Description: Let N be the byte value of the opcode. The next N bytes are data to be pushed onto the stack.

- 0x69
-- Word: OP_VERIFY
-- Opcode: 105
-- Input: True / false
-- Output: Nothing / fail
-- Description: Marks transaction as invalid if top stack value is not true. The top stack value is removed.

- 0x76
-- Word: OP_DUP
-- Opcode: 118
-- Input: x
-- Output: x x
-- Description: Duplicates the top stack item.

- 0x87
-- Word: OP_EQUAL
-- Opcode: 135
-- Input: x1 x2
-- Output: True / false.
-- Description: Returns 1 if the inputs are exactly equal, 0 otherwise.

- 0x88
-- Word: OP_EQUALVERIFY
-- Opcode: 136
-- Input: x1 x2
-- Output: Nothing / fail.
-- Description: Same as OP_EQUAL, but runs OP_VERIFY afterward.

- 0xa9
-- Word: OP_HASH160
-- Opcode: 169
-- Input: in
-- Output: hash
-- Description: The input is hashed twice: first with SHA-256 and then with RIPEMD-160.

- 0xac
-- Word: OP_CHECKSIG
-- Opcode: 172
-- Input: sig pubkey
-- Output: True / false
-- Description: The entire transaction's outputs, inputs, and script (from the most recently-executed OP_CODESEPARATOR to the end) are hashed. The signature used by OP_CHECKSIG must be a valid signature for this hash and public key. If it is, 1 is returned, 0 otherwise.



Notes:
- The "opcode" is the decimal value of the hex byte.
- "Input: x1 x2" presumably means that the inputs to the operation are the top two items on the stack (judging by the example stack process shown above), written in stack order so that the rightmost item (x2) is the top of the stack.
- For OP_HASH160, "Input: in" presumably means the top item on the stack.
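
The opcode list above can be held in a small lookup table. The following Python 3 sketch is only an illustration: the dictionary covers just the opcodes listed here, and the function name describe_opcode is invented for this example.

```python
# Opcode byte -> word, for the opcodes listed above only.
OPCODES = {
    0x00: "OP_0",
    0x69: "OP_VERIFY",
    0x76: "OP_DUP",
    0x87: "OP_EQUAL",
    0x88: "OP_EQUALVERIFY",
    0xa9: "OP_HASH160",
    0xac: "OP_CHECKSIG",
}


def describe_opcode(byte_value: int) -> str:
    """Return a readable name for a single script byte."""
    if 0x01 <= byte_value <= 0x4b:
        # Opcodes 1-75 mean "push the next N bytes".
        return "PUSHDATA(%d)" % byte_value
    return OPCODES.get(byte_value, "UNKNOWN(0x%02x)" % byte_value)


# Print each hex byte, its decimal value (the "opcode"), and its word.
for hex_byte in ["76", "a9", "14", "88", "ac"]:
    value = int(hex_byte, 16)
    print(hex_byte, value, describe_opcode(value))
```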



I think those are all the opcodes I need for reading a standard scriptPubKey.


This is the scriptPubKey under consideration (from tr_ken1):
76 a9 14 df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5 88 ac

Hm.

First byte is 76, or rather 0x76, which is OP_DUP.

Second byte is a9, which is OP_HASH160.

Next: 0x14, which in decimal is 1*16+4 = 20, which is between 1 and 75, so it means PUSHDATA(20), i.e. push the next 20 bytes of data onto the stack, presumably as a single stack item.

Next 20 bytes: df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5

These 20 bytes are presumably the public key hash i.e. if someone's bitcoin public key hashes to this value, they can use the corresponding private key to spend this output.

Next byte: 88, which is OP_EQUALVERIFY.

Last byte: ac, which is OP_CHECKSIG.


So, in readable form, the scriptPubKey is:
- OP_DUP
- OP_HASH160
- PUSHDATA(20)
- df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5
- OP_EQUALVERIFY
- OP_CHECKSIG
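
The byte-by-byte reading above can be condensed into a template check. This Python 3 sketch assumes the standard 25-byte pay-to-pubkey-hash layout described above; the function name extract_p2pkh_hash is invented for this example.

```python
def extract_p2pkh_hash(script_pub_key: bytes) -> bytes:
    """Return the 20-byte public key hash if the script has the layout
    OP_DUP OP_HASH160 PUSHDATA(20) <hash> OP_EQUALVERIFY OP_CHECKSIG."""
    if (
        len(script_pub_key) == 25
        and script_pub_key[:3] == bytes.fromhex("76a914")
        and script_pub_key[23:] == bytes.fromhex("88ac")
    ):
        return script_pub_key[3:23]
    raise ValueError("not a standard P2PKH scriptPubKey")


# The scriptPubKey from tr_ken1.
script_pub_key = bytes.fromhex(
    "76 a9 14 df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5 88 ac"
)
print(extract_p2pkh_hash(script_pub_key).hex())
```

For the scriptPubKey from tr_ken1 this prints df3bd30160e6c6145baaf2c88a8844c13a00d1d5, i.e. the same 20 bytes picked out by hand above.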





Now let's look at the scriptSig in tr_ken2, which should satisfy the cryptographic condition set by scriptPubKey in tr_ken1.


scriptSig:
47 30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01 41 04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd



The first byte is 47, which in decimal is 4*16+7 = 71. This is within the inclusive range 1-75, so the byte is a PUSHDATA command, in this case PUSHDATA(71).

I'm using PUSHDATA(n) to mean that:
- the hex byte in question indicates that the next n bytes are data that should be pushed onto the stack as a single new stack item.

I'll use python to select the next 71 bytes.


aineko:work stjohnpiano$ python

Python 2.7.13 (default, Dec 18 2016, 05:35:59)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> s = "47 30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01 41 04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd"
>>> s2 = s.split(" ")
>>> " ".join(s2[1:(1+71)])
>>> '30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01'

>>> len(s2[1:(1+71)])
>>> 71




So, within scriptSig, the next 71 bytes, which should be the transaction signature, are located here:

scriptSig:
47 30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01 41 04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd



Note: To find the end of the 71-byte sequence, choose the last three bytes from the final string shown in the python output above, which are "3e 82 01", copy the scriptSig to the bottom of the document, position the cursor before scriptSig, and use the text editor (TextWrangler in this case) to search for the byte string "3e 82 01". It's unlikely that e.g. three bytes will repeat exactly within scriptSig. You can search again from the end of the first instance of the byte string in order to make sure.


The next byte is 41, which in decimal is 4*16+1 = 65. This is within the inclusive range 1-75, so the byte is a PUSHDATA command, in this case PUSHDATA(65).


Use python as shown above to select the next 65 bytes. Check that the number of hex bytes selected is actually 65 (Python slicing is elastic/forgiving) using: len(s2[73:73+65])


So, within scriptSig, the last 65 bytes, which should be the bitcoin public key, are located here:

scriptSig:
47 30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01 41 04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd



So, in readable form, the scriptSig is:
- PUSHDATA(71)
- [transaction signature] 30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01
- PUSHDATA(65)
- [bitcoin public key] 04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd




Next: I can actually just crank the stack machine by hand, by performing the required operations in the specified order on the correctly selected data.



scriptSig:
- PUSHDATA(71)
- [transaction signature] 30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01
- PUSHDATA(65)
- [bitcoin public key] 04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd

scriptPubKey:
- OP_DUP
- OP_HASH160
- PUSHDATA(20)
- df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5
- OP_EQUALVERIFY
- OP_CHECKSIG




Here is a manual stack machine:

[Begin Stack Processing]

[Stage 1]
- Stack: Empty
- Script: PUSHDATA(71) [sequence of 71 bytes] PUSHDATA(65) [sequence of 65 bytes] OP_DUP OP_HASH160 PUSHDATA(20) [sequence of 20 bytes] OP_EQUALVERIFY OP_CHECKSIG
- Description: scriptSig and scriptPubKey are combined.

Let's begin executing the stack commands.

The first command is to "push next 71 bytes onto the stack as a new stack item". For convenience, I will refer to the next 71 bytes (value: 30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01) as {sig}.

I will refer to the sequence of 65 bytes as {pubKey} and the sequence of 20 bytes as {pubKeyHash}.

[Stage 2]
- Stack: {sig}
- Script: PUSHDATA(65) [sequence of 65 bytes] OP_DUP OP_HASH160 PUSHDATA(20) [sequence of 20 bytes] OP_EQUALVERIFY OP_CHECKSIG
- Description: First command (PUSHDATA(71)) has been executed.

[Stage 3]
- Stack: {sig} {pubKey}
- Script: OP_DUP OP_HASH160 PUSHDATA(20) [sequence of 20 bytes] OP_EQUALVERIFY OP_CHECKSIG
- Description: Second command (PUSHDATA(65)) has been executed.

[Stage 4]
- Stack: {sig} {pubKey} {pubKey}
- Script: OP_HASH160 PUSHDATA(20) [sequence of 20 bytes] OP_EQUALVERIFY OP_CHECKSIG
- Description: Third command (OP_DUP) has been executed.


Hm. Ok, next stage requires some work. I have to take the top item on the stack and hash it twice, first with SHA-256 and then with RIPEMD-160.


In my archives, I have a Python implementation of SHA256, originally copied from the PyPy source code.

I downloaded it from:
bitbucket.org/pypy/pypy/src/tip/lib_pypy/_sha256.py
on 2017-05-12 and saved it as pypy_sha256.py.

I've saved my copy as an asset of this article. [link]

Save a copy of this file in the work directory.


Note: The sha256(object) class constructor requires its input to be a byte sequence in raw bytes, not hex bytes. The output is also a raw byte sequence. Both of these byte sequences are big-endian (i.e. the highest value byte is first in the sequence).


Let's test.


Create a new file in the work directory: op_hash160.py

The goal is to hash {pubKey}, which is:
04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd




op_hash160.py

python 2.7.13
#!/opt/local/bin/python

from pypy_sha256 import sha256
from binascii import hexlify, unhexlify

input = "04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd"

input_hex_string = input.replace(" ","")

# convert to a byte sequence that can be used as input to the sha256 function. 
byte_sequence = unhexlify(input_hex_string)

# the digest (i.e. the result of hashing the input) will be a byte sequence. 
digest = sha256(byte_sequence).digest()

# convert the byte sequence to hex for display. 
print hexlify(digest)



aineko:work stjohnpiano$ chmod 700 op_hash160.py


aineko:work stjohnpiano$ ./op_hash160.py

acd518cad5449a573db372ff7e3330c4ffb40de59d662fcdc4090c4e021b850b



Ok. The SHA256 digest of the {pubKey} data is:
acd518cad5449a573db372ff7e3330c4ffb40de59d662fcdc4090c4e021b850b
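
As an independent cross-check, the same digest can be computed with the standard library's hashlib instead of pypy_sha256.py. This is a Python 3 sketch (not the Python 2.7 setup used above); the variable names are chosen for this example.

```python
import hashlib

# {pubKey}, copied from the scriptSig above.
pub_key = bytes.fromhex(
    "04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f "
    "13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 "
    "6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd"
)

sha256_hex = hashlib.sha256(pub_key).hexdigest()

# The digest reported by op_hash160.py above.
expected = "acd518cad5449a573db372ff7e3330c4ffb40de59d662fcdc4090c4e021b850b"

print(sha256_hex)
print(sha256_hex == expected)
```

If the two implementations agree, the second line printed should be True.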




Next: RIPEMD-160.


In my archives, I have a Python implementation of RIPEMD160. I originally found it in Vitalik Buterin's pybitcointools repository at:
github.com/vbuterin/pybitcointools

The first two lines of the code are:
## ripemd.py - pure Python implementation of the RIPEMD-160 algorithm.
## Bjorn Edstrom <be@bjrn.se> 16 december 2007.


The original author, Björn Edström, has a website:
www.bjrn.se

After some searching, I found the code stored at:
www.bjrn.se/code/ripemdpy.txt
I downloaded a copy on 2017-08-04 and saved it as bjorn_edstrom_ripemd160.py.

I've saved my copy as an asset of this article. [link]

Save a copy of this file in the work directory.


Note: I think the RIPEMD160(arg) class constructor handles a raw byte sequence as input without any trouble. The author used the phrase "string argument" but I think that he meant "string of bytes" rather than "string of bytes that are valid ASCII characters".


Let's test.



op_hash160.py

python 2.7.13
#!/opt/local/bin/python

from binascii import hexlify, unhexlify
from pypy_sha256 import sha256
from bjorn_edstrom_ripemd160 import RIPEMD160

input = "04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd"

input_hex_string = input.replace(" ","")

# convert to a byte sequence that can be used as input to the sha256 hash function. 
byte_sequence = unhexlify(input_hex_string)

# the digest (i.e. the result of hashing the input) will be a byte sequence. 
sha256_digest = sha256(byte_sequence).digest()

# convert the byte sequence to hex for display. 
print "sha256_digest: %s" % hexlify(sha256_digest)

# the sha256_digest is already a raw byte sequence that can be used as input to the ripemd160 hash function. 
ripemd160_digest = RIPEMD160(sha256_digest).digest()

# convert the byte sequence to hex for display. 
print "ripemd160_digest(sha256_digest): %s" % hexlify(ripemd160_digest)



aineko:work stjohnpiano$ ./op_hash160.py

sha256_digest: acd518cad5449a573db372ff7e3330c4ffb40de59d662fcdc4090c4e021b850b
ripemd160_digest(sha256_digest): df3bd30160e6c6145baaf2c88a8844c13a00d1d5




Ok. The result of OP_HASH160 is:
df3bd30160e6c6145baaf2c88a8844c13a00d1d5

Let's split this into individual hex bytes:


aineko:~ stjohnpiano$ echo -n "df3bd30160e6c6145baaf2c88a8844c13a00d1d5" | sed 's/../& /g'

df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5



Result:
df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5
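
The same split can be done in Python rather than sed. A small Python 3 sketch follows; the function name split_hex_bytes is invented for this example. Unlike the sed command, it does not leave a trailing space.

```python
def split_hex_bytes(hex_string: str) -> str:
    """Insert a space between each pair of hex characters."""
    if len(hex_string) % 2 != 0:
        raise ValueError("expected an even number of hex characters")
    pairs = [hex_string[i : i + 2] for i in range(0, len(hex_string), 2)]
    return " ".join(pairs)


print(split_hex_bytes("df3bd30160e6c6145baaf2c88a8844c13a00d1d5"))
```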





This stage of the stack machine is:

[Stage 5]
- Stack: {sig} {pubKey} {pubKeyHash}
- Script: PUSHDATA(20) [sequence of 20 bytes] OP_EQUALVERIFY OP_CHECKSIG
- Description: Top stack item has been hashed.


Let's continue.


[Stage 6]
- Stack: {sig} {pubKey} {pubKeyHash} {pubKeyHashB}
- Script: OP_EQUALVERIFY OP_CHECKSIG
- Description: Command (PUSHDATA(20)) has been executed.

I've renamed the sequence of 20 bytes to pubKeyHashB, in order to differentiate it from the hash of the public key that was just calculated.


Next stage:

OP_EQUALVERIFY == run OP_EQUAL, then run OP_VERIFY.

OP_EQUAL returns 1 if the two top stack items are exactly equal, 0 otherwise.

Ok.

The current top two stack items are:
- {pubKeyHashB} = df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5
- {pubKeyHash} = df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5


Let's check whether they are equal.


aineko:~ stjohnpiano$ pubKeyHashB="df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5"


aineko:~ stjohnpiano$ pubKeyHash="df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5"


aineko:~ stjohnpiano$ [[ "$pubKeyHashB" = "$pubKeyHash" ]] && echo equal || echo not-equal

equal



They're equal. This means that OP_EQUAL returns 1.

Now run OP_VERIFY.

OP_VERIFY marks the transaction as invalid if top stack value is not true. The top stack value is removed.

For this stack machine, any value other than a representation of zero (e.g. positive 0 (0x00) or negative 0 (0x80)) is "true". OP_EQUAL returned "1", which in hex is 0x01, so OP_VERIFY does not mark the transaction as invalid. It then removes the top stack item.

From an earlier excerpt:

OP_EQUAL pops (removes from the top of the stack) the two values it compared, and replaces them with the result of that comparison: zero (false) or one (true).



So, in this case: OP_EQUAL removes the two top items ({pubKeyHash} and {pubKeyHashB}) and adds a new one ("1"). Then OP_VERIFY removes the "1" item.


So:

[Stage 7]
- Stack: {sig} {pubKey}
- Script: OP_CHECKSIG
- Description: Command OP_EQUALVERIFY has been executed. Transaction has not been marked invalid.
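
The OP_EQUAL and OP_VERIFY steps just performed by hand can be sketched with a Python list as the stack. This is a Python 3 illustration only: the function names and the ScriptFailure exception are invented here, {sig} and {pubKey} are represented by placeholder labels rather than the real bytes, and representing "false" as an empty byte vector is an assumption based on the number rules quoted earlier.

```python
class ScriptFailure(Exception):
    """Raised when an operation marks the transaction as invalid."""


def op_equal(stack: list) -> None:
    """Pop the top two items and push 1 if they are equal, else false."""
    b = stack.pop()
    a = stack.pop()
    stack.append(b"\x01" if a == b else b"")  # assumption: false is empty


def op_verify(stack: list) -> None:
    """Pop the top item and fail if it is any representation of zero."""
    top = stack.pop()
    # Zero is: all bytes 0x00, with the last byte allowed to be 0x80.
    is_true = any(top[:-1]) or (len(top) > 0 and top[-1] not in (0x00, 0x80))
    if not is_true:
        raise ScriptFailure("OP_VERIFY failed")


def op_equalverify(stack: list) -> None:
    """OP_EQUAL followed by OP_VERIFY."""
    op_equal(stack)
    op_verify(stack)


pub_key_hash = bytes.fromhex("df3bd30160e6c6145baaf2c88a8844c13a00d1d5")
pub_key_hash_b = bytes.fromhex("df3bd30160e6c6145baaf2c88a8844c13a00d1d5")

# Placeholder labels stand in for the real {sig} and {pubKey} bytes.
stack = [b"{sig}", b"{pubKey}", pub_key_hash, pub_key_hash_b]
op_equalverify(stack)
print(stack)
```

After the call, only the two placeholder items remain on the stack, matching Stage 7.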



OP_CHECKSIG: The entire transaction's outputs, inputs, and script (from the most recently-executed OP_CODESEPARATOR to the end) are hashed. The signature used by OP_CHECKSIG must be a valid signature for this hash and public key. If it is, 1 is returned, 0 otherwise.




Hm.


- {sig} [transaction signature]: 30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01
- {pubKey} [bitcoin public key]: 04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd



I need the hash (which algorithm?) of the transaction (in what format?). Then I need to verify the signature, which (I think) means to check, using the public key, that the scriptSig's signature is a valid signed-by-the-private-key form of the hash-of-the-formatted-transaction.


Earlier, I found that the txid ("previous output hash") used in a transaction input is 256 bits long, and therefore probably the result of running the SHA256 algorithm. Perhaps SHA256 is also used to hash the raw transaction for signing.

However, the {pubKey} [bitcoin public key] above is 65 bytes == 520 bits long (not 256 bits).

From an earlier excerpt:

The Elliptic Curve DSA algorithm generates a 512-bit public key from the private key. (Elliptic curve cryptography will be discussed later.) This public key is used to verify the signature on a transaction. Inconveniently, the Bitcoin protocol adds a prefix of 04 to the public key.


512 bits + a single hex byte "04" is 520 bits. The first byte of {pubKey} is indeed 04.
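
The 04 prefix plus 512 bits can be split into two 256-bit halves. The Python 3 sketch below assumes that these halves are the x and y coordinates of a point on the secp256k1 curve (y^2 = x^3 + 7 modulo a fixed prime), which is how the uncompressed key format is usually described; the function names are invented for this example.

```python
# secp256k1 field prime.
P = 2**256 - 2**32 - 977


def parse_uncompressed_pubkey(pub_key: bytes):
    """Split a 65-byte uncompressed public key into (prefix, x, y)."""
    if len(pub_key) != 65 or pub_key[0] != 0x04:
        raise ValueError("expected 0x04 followed by 64 bytes")
    x = int.from_bytes(pub_key[1:33], "big")
    y = int.from_bytes(pub_key[33:65], "big")
    return pub_key[0], x, y


def is_on_secp256k1(x: int, y: int) -> bool:
    """Check the curve equation y^2 = x^3 + 7 (mod P)."""
    return (y * y - (x * x * x + 7)) % P == 0


# {pubKey}, copied from the scriptSig above.
pub_key = bytes.fromhex(
    "04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f "
    "13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 "
    "6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd"
)

prefix, x, y = parse_uncompressed_pubkey(pub_key)
print("prefix: %02x" % prefix)
print("x: %064x" % x)
print("y: %064x" % y)
print("on curve:", is_on_secp256k1(x, y))
```

A valid public key should satisfy the curve equation, so the last line is a quick sanity check on the bytes that were copied out of the scriptSig.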





Some earlier excerpts from:
bitcoin.org/en/developer-guide

All transactions, including the coinbase transaction, are encoded into blocks in binary rawtransaction format.

The rawtransaction format is hashed to create the transaction identifier (txid).

[...]

For a P2PKH-style output, Bob's signature script will contain the following two pieces of data:

1) His full (unhashed) public key, so the pubkey script can check that it hashes to the same value as the pubkey hash provided by Alice.

2) An secp256k1 signature made by using the ECDSA cryptographic formula to combine certain transaction data (described below) with Bob's private key. This lets the pubkey script verify that Bob owns the private key which created the public key.

Bob's secp256k1 signature doesn't just prove Bob controls his private key; it also makes the non-signature-script parts of his transaction tamper-proof so Bob can safely broadcast them over the peer-to-peer network.

[...]

The data Bob signs includes the txid and output index of the previous transaction, the previous output's pubkey script, the pubkey script Bob creates which will let the next recipient spend this transaction's output, and the amount of satoshis to spend to the next recipient. In essence, the entire transaction is signed except for any signature scripts, which hold the full public keys and secp256k1 signatures.



Hm.


More excerpts from Ken Shirriff's article:

Signing the transaction

I found signing the transaction to be the hardest part of using Bitcoin manually, with a process that is surprisingly difficult and error-prone. The basic idea is to use the ECDSA elliptic curve algorithm and the private key to generate a digital signature of the transaction, but the details are tricky. The signing process has been described through a 19-step process
[ http://bitcoin.stackexchange.com/questions/3374/how-to-redeem-a-basic-tx ]
(more info)
[ http://en.bitcoin.it/wiki/OP_CHECKSIG ].

[...]

The biggest complication is the signature appears in the middle of the transaction, which raises the question of how to sign the transaction before you have the signature. To avoid this problem, the scriptPubKey script is copied from the source transaction into the spending transaction (i.e. the transaction that is being signed) before computing the signature. Then the signature is turned into code in the Script language, creating the scriptSig script that is embedded in the transaction. It appears that using the previous transaction's scriptPubKey during signing is for historical reasons rather than any logical reason. For transactions with multiple inputs, signing is even more complicated since each input requires a separate signature, but I won't go into the details.

One step that tripped me up is the hash type. Before signing, the transaction has a hash type constant temporarily appended. For a regular transaction, this is SIGHASH_ALL (0x00000001). After signing, this hash type is removed from the end of the transaction and appended to the scriptSig.

Another annoying thing about the Bitcoin protocol is that the signature and public key are both 512-bit elliptic curve values, but they are represented in totally different ways: the signature is encoded with DER encoding but the public key is represented as plain bytes. In addition, both values have an extra byte, but positioned inconsistently: SIGHASH_ALL is put after the signature, and type 04 is put before the public key.

Debugging the signature was made more difficult because the ECDSA algorithm uses a random number. Thus, the signature is different every time you compute it, so it can't be compared with a known-good signature.

Update (Feb 2014): An important side-effect of the signature changing every time is that if you re-sign a transaction, the transaction's hash will change. This is known as Transaction Malleability. There are also ways that third parties can modify transactions in trivial ways that change the hash but not the meaning of the transaction. [...]

With these complications it took me a long time to get the signature to work. Eventually, though, I got all the bugs out of my signing code and succesfully signed a transaction. Here's the code snippet I used.

def makeSignedTransaction(privateKey, outputTransactionHash, sourceIndex, scriptPubKey, outputs):
    myTxn_forSig = (makeRawTransaction(outputTransactionHash, sourceIndex, scriptPubKey, outputs)
         + "01000000") # hash code

    s256 = hashlib.sha256(hashlib.sha256(myTxn_forSig.decode('hex')).digest()).digest()
    sk = ecdsa.SigningKey.from_string(privateKey.decode('hex'), curve=ecdsa.SECP256k1)
    sig = sk.sign_digest(s256, sigencode=ecdsa.util.sigencode_der) + '\01' # 01 is hashtype
    pubKey = keyUtils.privateKeyToPublicKey(privateKey)
    scriptSig = utils.varstr(sig).encode('hex') + utils.varstr(pubKey.decode('hex')).encode('hex')
    signed_txn = makeRawTransaction(outputTransactionHash, sourceIndex, scriptSig, outputs)
    verifyTxnSignature(signed_txn)
    return signed_txn
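
The excerpt above says that the signature is DER-encoded and that the hash type byte is appended to it. Under that description, the 71-byte {sig} item from the scriptSig can be taken apart as follows. This is a Python 3 sketch (the excerpt's own code is Python 2); the function name parse_signature is invented here, and the layout "30 len 02 len r 02 len s hashtype" is an assumption based on reading the bytes, with only single-byte lengths handled.

```python
def parse_signature(sig: bytes):
    """Split a scriptSig signature item into (r, s, hash_type).

    Assumed layout: 30 <len> 02 <len_r> <r> 02 <len_s> <s> <hash_type>
    """
    if sig[0] != 0x30:
        raise ValueError("missing DER sequence tag 0x30")
    if sig[1] != len(sig) - 3:  # excludes tag, length byte and hash type
        raise ValueError("unexpected DER sequence length")
    pos = 2
    integers = []
    for _ in range(2):
        if sig[pos] != 0x02:
            raise ValueError("missing DER integer tag 0x02")
        length = sig[pos + 1]
        integers.append(int.from_bytes(sig[pos + 2 : pos + 2 + length], "big"))
        pos += 2 + length
    if pos != len(sig) - 1:
        raise ValueError("unexpected trailing bytes")
    r, s = integers
    return r, s, sig[-1]


# {sig}, copied from the scriptSig of tr_ken2.
sig = bytes.fromhex(
    "30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c "
    "58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 "
    "28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01"
)

r, s, hash_type = parse_signature(sig)
print("r: %064x" % r)
print("s: %064x" % s)
print("hash type: %02x" % hash_type)
```

The final byte 01 is the hash type (SIGHASH_ALL), and r and s are the two 256-bit values that make up the ECDSA signature itself.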




Hm.

Ok. From the code, I see that the first step is to create a transaction to be signed and append a hash_type ("01000000") to it.

In a transaction, there is one scriptSig per input. In a transaction-to-be-signed, the scriptSig is replaced with the scriptPubKey of the unspent output. The signature will be created using the private key that corresponds to this scriptPubKey.

Question: If there are multiple inputs, a signature is needed for each one. Are all scriptSigs in a multiple-input-transaction replaced with the same scriptPubKey in the corresponding transaction-to-be-signed?

Ken Shirriff's code and article solve his problem (creating a valid 1-input-to-1-output transaction) and don't tackle this question.

Alternative possibility: Only the scriptSig for 1 input is replaced with the scriptPubKey of the unspent-output-being-used-here-as-an-input. The other input scriptSigs are left blank during the signing.

Some reading indicates that the other scriptSigs are in fact replaced with an empty script within the transaction-to-be-signed, so that each of them is serialized as just a script length of zero (a single 0x00 byte).

Future: Test this by verifying a multiple-input transaction.
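
Until that test is done, the rule suggested by the reading above can at least be written down. The Python 3 sketch below is untested against a real multiple-input transaction and assumes the SIGHASH_ALL hash type; the function names are invented, and the two 20-byte hashes (all 0x11 and all 0x22) are made-up placeholder values.

```python
def scripts_for_signing(prev_script_pub_keys, input_index):
    """Return the script placed in each input slot while signing one input.

    Assumption: the input being signed carries the scriptPubKey of the
    output it spends, and every other input carries an empty script.
    """
    return [
        script if i == input_index else b""
        for i, script in enumerate(prev_script_pub_keys)
    ]


def serialize_script(script: bytes) -> bytes:
    """Length prefix plus script bytes (single-byte lengths only)."""
    if len(script) >= 0xfd:
        raise ValueError("longer scripts need a multi-byte var_int")
    return bytes([len(script)]) + script


# Placeholder scriptPubKeys for a hypothetical two-input transaction.
prev_scripts = [
    bytes.fromhex("76a914" + "11" * 20 + "88ac"),
    bytes.fromhex("76a914" + "22" * 20 + "88ac"),
]

for index in range(len(prev_scripts)):
    slots = scripts_for_signing(prev_scripts, index)
    print(index, [serialize_script(slot).hex() for slot in slots])
```

Under this assumption each input gets its own transaction-to-be-signed, and therefore its own hash and signature.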




In Ken Shirriff's repository linked from this article, I find the code for the function makeRawTransaction() in txnUtils.py.

python
# Makes a transaction from the inputs
# outputs is a list of [redemptionSatoshis, outputScript]
def makeRawTransaction(outputTransactionHash, sourceIndex, scriptSig, outputs):
    def makeOutput(data):
        redemptionSatoshis, outputScript = data
        return (struct.pack("<Q", redemptionSatoshis).encode('hex') +
        '%02x' % len(outputScript.decode('hex')) + outputScript)
    formattedOutputs = ''.join(map(makeOutput, outputs))
    return (
        "01000000" + # 4 bytes version
        "01" + # varint for number of inputs
        outputTransactionHash.decode('hex')[::-1].encode('hex') + # reverse outputTransactionHash
        struct.pack('<L', sourceIndex).encode('hex') +
        '%02x' % len(scriptSig.decode('hex')) + scriptSig +
        "ffffffff" + # sequence
        "%02x" % len(outputs) + # number of outputs
        formattedOutputs +
        "00000000" # lockTime
        )



When this function was called earlier, scriptPubKey was passed in as the scriptSig argument.

I note that the length var_int of scriptSig is calculated in this function.
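
The '%02x' formatting in makeRawTransaction() only produces a correct var_int for lengths below 0xfd, which is enough for the scripts in this article. For reference, a fuller encoder might look like the following Python 3 sketch. The thresholds and marker bytes follow the usual description of Bitcoin's variable-length integer; the function name encode_var_int is invented for this example.

```python
import struct


def encode_var_int(n: int) -> bytes:
    """Encode a non-negative integer as a Bitcoin-style var_int."""
    if n < 0:
        raise ValueError("var_int cannot be negative")
    if n < 0xfd:
        return struct.pack("<B", n)            # 1 byte
    if n <= 0xffff:
        return b"\xfd" + struct.pack("<H", n)  # marker + 2 bytes
    if n <= 0xffffffff:
        return b"\xfe" + struct.pack("<I", n)  # marker + 4 bytes
    return b"\xff" + struct.pack("<Q", n)      # marker + 8 bytes


# 25 and 138 are the two script lengths that appear in this article.
for length in [25, 138, 253, 65536]:
    print(length, encode_var_int(length).hex())
```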



I'll create the transaction-to-be-signed manually.



Ken Shirriff's signed transaction (tr_ken2):

[Signed Transaction]
- version: 01 00 00 00
- input count: 01
- input:
-- previous output hash (reversed): 48 4d 40 d4 5b 9e a0 d6 52 fc a8 25 8a b7 ca a4 25 41 eb 52 97 58 57 f9 6f b5 0c d7 32 c8 b4 81
-- previous output index: 00 00 00 00
-- script length: 8a
-- scriptSig: 47 30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01 41 04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd
-- sequence: ff ff ff ff
- output count: 01
- output:
-- value: 62 64 01 00 00 00 00 00
-- script length: 19
-- scriptPubKey: 76 a9 14 c8 e9 09 96 c7 c6 08 0e e0 62 84 60 0c 68 4e d9 04 d1 4c 5c 88 ac
- block lock time: 00 00 00 00
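
Several of the fields listed above are little-endian integers. As a reading aid, the Python 3 sketch below decodes them with struct; the variable names are chosen for this example, and the script length is treated as a single-byte var_int, which holds here because 0x8a is below 0xfd.

```python
import struct

version = bytes.fromhex("01 00 00 00")
prev_hash_reversed = bytes.fromhex(
    "48 4d 40 d4 5b 9e a0 d6 52 fc a8 25 8a b7 ca a4 "
    "25 41 eb 52 97 58 57 f9 6f b5 0c d7 32 c8 b4 81"
)
prev_index = bytes.fromhex("00 00 00 00")
script_length = bytes.fromhex("8a")
sequence = bytes.fromhex("ff ff ff ff")
value = bytes.fromhex("62 64 01 00 00 00 00 00")

version_int = struct.unpack("<L", version)[0]
txid = prev_hash_reversed[::-1].hex()  # undo the byte reversal
prev_index_int = struct.unpack("<L", prev_index)[0]
script_length_int = script_length[0]
sequence_int = struct.unpack("<L", sequence)[0]
value_satoshis = struct.unpack("<Q", value)[0]

print("version:", version_int)
print("previous txid:", txid)
print("previous output index:", prev_index_int)
print("script length:", script_length_int)
print("sequence:", sequence_int)
print("value (satoshis):", value_satoshis)
```

The script length 0x8a is 138, which agrees with the scriptSig layout found earlier: 1 + 71 + 1 + 65 bytes. The value decodes to 91234 satoshis.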



Note: tr_ken2 already has the necessary outputTransactionHash (previous_output_hash) and sourceIndex (previous_output_index).

The previous_output_hash is already reversed.


This is the relevant scriptPubKey from tr_ken1:
76 a9 14 df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5 88 ac


I need to construct a var_int that holds the length of this scriptPubKey. This var_int is included in the substitution.


aineko:work stjohnpiano$ python

Python 2.7.13 (default, Dec 18 2016, 05:35:59)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> x = "76 a9 14 df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5 88 ac"
>>> y = x.replace(" ","")
>>> len(y)/2
>>> 25

>>> hex(25)
>>> '0x19'



So there are 25 bytes in scriptPubKey, which is 19 in hex.

So, now I substitute the relevant scriptPubKey from tr_ken1 into tr_ken2, and I append the hash_type (4 hex bytes) "01 00 00 00", creating tr_ken2_to_be_signed.

- version: 01 00 00 00
- input count: 01
- input:
-- previous output hash (reversed): 48 4d 40 d4 5b 9e a0 d6 52 fc a8 25 8a b7 ca a4 25 41 eb 52 97 58 57 f9 6f b5 0c d7 32 c8 b4 81
-- previous output index: 00 00 00 00
-- script length: 19
-- scriptPubKey: 76 a9 14 df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5 88 ac
-- sequence: ff ff ff ff
- output count: 01
- output:
-- value: 62 64 01 00 00 00 00 00
-- script length: 19
-- scriptPubKey: 76 a9 14 c8 e9 09 96 c7 c6 08 0e e0 62 84 60 0c 68 4e d9 04 d1 4c 5c 88 ac
- block lock time: 00 00 00 00
- hash type: 01 00 00 00



This line from earlier indicates that I now have to hash tr_ken2_to_be_signed twice using SHA256.

s256 = hashlib.sha256(hashlib.sha256(myTxn_forSig.decode('hex')).digest()).digest()


The result of this operation is the item that is actually signed.


Ok. Let's hash it twice.

First, get the hex bytes of tr_ken2_to_be_signed into a single string.

Copy the transaction above. Use TextWrangler's Find-and-Replace with grep enabled to replace "^-.*: " with "".

01 00 00 00
01

48 4d 40 d4 5b 9e a0 d6 52 fc a8 25 8a b7 ca a4 25 41 eb 52 97 58 57 f9 6f b5 0c d7 32 c8 b4 81
00 00 00 00
19
76 a9 14 df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5 88 ac
ff ff ff ff
01

62 64 01 00 00 00 00 00
19
76 a9 14 c8 e9 09 96 c7 c6 08 0e e0 62 84 60 0c 68 4e d9 04 d1 4c 5c 88 ac
00 00 00 00
01 00 00 00
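The same label-stripping can be done without a text editor. A minimal sketch in Python 3, assuming every label line starts with a hyphen and that the label ends at the first colon on the line; listing_to_hex is a hypothetical helper name:

```python
import re

listing = """
- version: 01 00 00 00
- input count: 01
- input:
-- previous output hash (reversed): 48 4d 40 d4 5b 9e a0 d6 52 fc a8 25 8a b7 ca a4 25 41 eb 52 97 58 57 f9 6f b5 0c d7 32 c8 b4 81
-- previous output index: 00 00 00 00
-- script length: 19
-- scriptPubKey: 76 a9 14 df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5 88 ac
-- sequence: ff ff ff ff
- output count: 01
- output:
-- value: 62 64 01 00 00 00 00 00
-- script length: 19
-- scriptPubKey: 76 a9 14 c8 e9 09 96 c7 c6 08 0e e0 62 84 60 0c 68 4e d9 04 d1 4c 5c 88 ac
- block lock time: 00 00 00 00
- hash type: 01 00 00 00
"""

def listing_to_hex(text):
    # Drop everything from the leading hyphen up to and including the first colon on each line.
    without_labels = re.sub(r"(?m)^-[^:\n]*:", "", text)
    # Remove all remaining whitespace (spaces and newlines).
    return re.sub(r"\s+", "", without_labels)

tx_hex = listing_to_hex(listing)
print(len(tx_hex) // 2)
print(tx_hex)
```

The printed string should match the single hex string that the interpreter session below arrives at by removing newlines and spaces.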



I entered the Python interpreter within my work directory, so the sha256 class can be imported from pypy_sha256.py.


>>> x = """
... 01 00 00 00
... 01
...
... 48 4d 40 d4 5b 9e a0 d6 52 fc a8 25 8a b7 ca a4 25 41 eb 52 97 58 57 f9 6f b5 0c d7 32 c8 b4 81
... 00 00 00 00
... 19
... 76 a9 14 df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5 88 ac
... ff ff ff ff
... 01
...
... 62 64 01 00 00 00 00 00
... 19
... 76 a9 14 c8 e9 09 96 c7 c6 08 0e e0 62 84 60 0c 68 4e d9 04 d1 4c 5c 88 ac
... 00 00 00 00
... 01 00 00 00"""
>>> x
'\n01 00 00 00\n01\n\n48 4d 40 d4 5b 9e a0 d6 52 fc a8 25 8a b7 ca a4 25 41 eb 52 97 58 57 f9 6f b5 0c d7 32 c8 b4 81\n00 00 00 00\n19\n76 a9 14 df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5 88 ac\nff ff ff ff\n01\n\n62 64 01 00 00 00 00 00\n19\n76 a9 14 c8 e9 09 96 c7 c6 08 0e e0 62 84 60 0c 68 4e d9 04 d1 4c 5c 88 ac\n00 00 00 00\n01 00 00 00'

>>> y = x.replace("\n","")
>>> y
'01 00 00 000148 4d 40 d4 5b 9e a0 d6 52 fc a8 25 8a b7 ca a4 25 41 eb 52 97 58 57 f9 6f b5 0c d7 32 c8 b4 8100 00 00 001976 a9 14 df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5 88 acff ff ff ff0162 64 01 00 00 00 00 001976 a9 14 c8 e9 09 96 c7 c6 08 0e e0 62 84 60 0c 68 4e d9 04 d1 4c 5c 88 ac00 00 00 0001 00 00 00'

>>> y = y.replace(" ","")
>>> y
'0100000001484d40d45b9ea0d652fca8258ab7caa42541eb52975857f96fb50cd732c8b481000000001976a914df3bd30160e6c6145baaf2c88a8844c13a00d1d588acffffffff0162640100000000001976a914c8e90996c7c6080ee06284600c684ed904d14c5c88ac0000000001000000'

>>> from pypy_sha256 import sha256
>>> from binascii import hexlify, unhexlify
>>> input_hex_string = y
>>> byte_sequence = unhexlify(input_hex_string)
>>> digest = sha256(byte_sequence).digest()
>>> digest2 = sha256(digest).digest()
>>> print hexlify(digest2)
5fda68729a6312e17e641e9a49fac2a4a6a680126610af573caab270d232f850



The result of this operation is:
5fda68729a6312e17e641e9a49fac2a4a6a680126610af573caab270d232f850


aineko:work stjohnpiano$ echo -n "5fda68729a6312e17e641e9a49fac2a4a6a680126610af573caab270d232f850" | sed 's/../& /g'

5f da 68 72 9a 63 12 e1 7e 64 1e 9a 49 fa c2 a4 a6 a6 80 12 66 10 af 57 3c aa b2 70 d2 32 f8 50


In space-separated hex bytes, the result is:
5f da 68 72 9a 63 12 e1 7e 64 1e 9a 49 fa c2 a4 a6 a6 80 12 66 10 af 57 3c aa b2 70 d2 32 f8 50


This result is the double SHA256 hash of the transaction-to-be-signed.
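The same double hash can be reproduced with the hashlib module from the Python standard library instead of pypy_sha256.py. This is a sketch in Python 3 syntax, and double_sha256 is a hypothetical helper name:

```python
import hashlib
from binascii import hexlify, unhexlify

def double_sha256(data):
    """Return SHA256(SHA256(data)) as raw bytes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

# Transaction-to-be-signed, field by field, with the hash type already appended.
tx_to_be_signed_hex = (
    "01000000"                                                          # version
    "01"                                                                # input count
    "484d40d45b9ea0d652fca8258ab7caa42541eb52975857f96fb50cd732c8b481"  # previous output hash
    "00000000"                                                          # previous output index
    "19" "76a914df3bd30160e6c6145baaf2c88a8844c13a00d1d588ac"           # substituted scriptPubKey
    "ffffffff"                                                          # sequence
    "01"                                                                # output count
    "6264010000000000"                                                  # value
    "19" "76a914c8e90996c7c6080ee06284600c684ed904d14c5c88ac"           # output scriptPubKey
    "00000000"                                                          # block lock time
    "01000000"                                                          # hash type
)

digest_hex = hexlify(double_sha256(unhexlify(tx_to_be_signed_hex))).decode("ascii")
print(digest_hex)
```

If the input string has been copied correctly, the printed digest should equal the value reported above.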



Let's look at Ken Shirriff's verifyTxnSignature(signed_txn) function.

Its code is in txnUtils.py, along with the code for some of the functions it calls.

python
# Returns [first, sig, pub, rest]
def parseTxn(txn):
    first = txn[0:41*2]
    scriptLen = int(txn[41*2:42*2], 16)
    script = txn[42*2:42*2+2*scriptLen]
    sigLen = int(script[0:2], 16)
    sig = script[2:2+sigLen*2]
    pubLen = int(script[2+sigLen*2:2+sigLen*2+2], 16)
    pub = script[2+sigLen*2+2:]
            
    assert(len(pub) == pubLen*2)
    rest = txn[42*2+2*scriptLen:]
    return [first, sig, pub, rest]         

# Substitutes the scriptPubKey into the transaction, appends SIGN_ALL to make the version
# of the transaction that can be signed
def getSignableTxn(parsed):
    first, sig, pub, rest = parsed
    inputAddr = utils.base58CheckDecode(keyUtils.pubKeyToAddr(pub))
    return first + "1976a914" + inputAddr.encode('hex') + "88ac" + rest + "01000000"

# Verifies that a transaction is properly signed, assuming the generated scriptPubKey matches
# the one in the previous transaction's output
def verifyTxnSignature(txn):                    
    parsed = parseTxn(txn)      
    signableTxn = getSignableTxn(parsed)
    hashToSign = hashlib.sha256(hashlib.sha256(signableTxn.decode('hex')).digest()).digest().encode('hex')
    assert(parsed[1][-2:] == '01') # hashtype
    sig = keyUtils.derSigToHexSig(parsed[1][:-2])
    public_key = parsed[2]
    vk = ecdsa.VerifyingKey.from_string(public_key[2:].decode('hex'), curve=ecdsa.SECP256k1)
    assert(vk.verify_digest(sig.decode('hex'), hashToSign.decode('hex')))




Together, parseTxn(txn) and getSignableTxn(parsed) perform approximately the same process that I have just carried out manually.
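The offset 41 used in parseTxn is the combined size of the fields that precede the first script length. A minimal Python 3 sketch that spells this out with named constants and rebuilds the transaction-to-be-signed; it assumes a single input and single-byte length fields, takes the previous scriptPubKey as an argument instead of deriving it from the public key, and uses hypothetical function names:

```python
# Byte widths of the fields that precede the first input's script length.
VERSION_BYTES = 4
INPUT_COUNT_BYTES = 1   # assumes a single-byte var_int
PREV_HASH_BYTES = 32
PREV_INDEX_BYTES = 4
HEADER_BYTES = VERSION_BYTES + INPUT_COUNT_BYTES + PREV_HASH_BYTES + PREV_INDEX_BYTES  # 41

def split_signed_txn(txn_hex):
    """Split a one-input signed transaction (hex) into (head, script_sig, rest)."""
    head_end = HEADER_BYTES * 2
    script_len = int(txn_hex[head_end:head_end + 2], 16)  # assumes a single-byte length
    script_start = head_end + 2
    script_end = script_start + script_len * 2
    return txn_hex[:head_end], txn_hex[script_start:script_end], txn_hex[script_end:]

def make_signable_txn(txn_hex, previous_script_pub_key_hex, hash_type_hex="01000000"):
    """Swap the scriptSig for the previous output's scriptPubKey and append the hash type."""
    head, _script_sig, rest = split_signed_txn(txn_hex)
    length_prefix = "%02x" % (len(previous_script_pub_key_hex) // 2)
    return head + length_prefix + previous_script_pub_key_hex + rest + hash_type_hex

tr_ken2 = (
    "0100000001"
    "484d40d45b9ea0d652fca8258ab7caa42541eb52975857f96fb50cd732c8b481"
    "00000000"
    "8a"
    "47" "30440220"
    "2cb265bf10707bf49346c3515dd3d16fc454618c58ec0a0ff448a676c54ff713"
    "0220"
    "6c6624d762a1fcef4618284ead8f08678ac05b13c84235f1654e6ad168233e82"
    "01"
    "41" "04"
    "14e301b2328f17442c0b8310d787bf3d8a404cfbd0704f135b6ad4b2d3ee7513"
    "10f981926e53a6e8c39bd7d3fefd576c543cce493cbac06388f2651d1aacbfcd"
    "ffffffff"
    "01" "6264010000000000"
    "19" "76a914c8e90996c7c6080ee06284600c684ed904d14c5c88ac"
    "00000000"
)

signable = make_signable_txn(tr_ken2, "76a914df3bd30160e6c6145baaf2c88a8844c13a00d1d588ac")
print(signable)
```

The printed string should be identical to the transaction-to-be-signed that was assembled by hand earlier.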

This line:
assert(parsed[1][-2:] == '01') # hashtype

is a check for the hash type that was appended to the scriptSig.

From an earlier excerpt:

One step that tripped me up is the hash type. Before signing, the transaction has a hash type constant temporarily appended. For a regular transaction, this is SIGHASH_ALL (0x00000001). After signing, this hash type is removed from the end of the transaction and appended to the scriptSig.


Ken Shirriff's code appends this SIGHASH_ALL hash type in little-endian form "01 00 00 00" prior to signing, but the verification check looks at the last hex byte, which suggests that it was appended to scriptSig in big-endian form.

The tr_ken2 scriptSig in question is:
47 30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01 41 04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd



It does not end in "01".

However, the subdivided readable form of the scriptSig is:
- PUSHDATA(71)
- [transaction signature] 30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01
- PUSHDATA(65)
- [bitcoin public key] 04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd


This fits with the fact that the parseTxn() function returns "sig" and "pub" as separate items.

The transaction signature section, or {sig}, does end in 01, indicating that the hash type is appended to the {sig} section of the scriptSig, rather than to the entire scriptSig item itself.

The three bytes that precede the "01" byte at the end of {sig} are not "00 00 00", suggesting that although a four-byte version of the idea {hash_type==1} is appended to the transaction prior to signing, only a one-byte version is appended to the signature in the final transaction.
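This reading of the scriptSig can be checked mechanically. A minimal Python 3 sketch, assuming the script contains only direct pushes (opcode values 0x01 to 0x4b, each meaning "push the next N bytes"); split_pushes is a hypothetical helper, not a function from Ken Shirriff's code:

```python
from binascii import unhexlify

def split_pushes(script):
    """Split a script made only of direct pushes (opcodes 0x01-0x4b) into its data items."""
    items = []
    i = 0
    while i < len(script):
        n = script[i]  # Python 3: indexing bytes yields an int
        if not 1 <= n <= 0x4b:
            raise ValueError("unsupported opcode 0x%02x" % n)
        if i + 1 + n > len(script):
            raise ValueError("push runs past the end of the script")
        items.append(script[i + 1:i + 1 + n])
        i += 1 + n
    return items

script_sig = unhexlify(
    "47"                                                                # push 71 bytes
    "30440220"
    "2cb265bf10707bf49346c3515dd3d16fc454618c58ec0a0ff448a676c54ff713"
    "0220"
    "6c6624d762a1fcef4618284ead8f08678ac05b13c84235f1654e6ad168233e82"
    "01"                                                                # one-byte hash type
    "41"                                                                # push 65 bytes
    "04"
    "14e301b2328f17442c0b8310d787bf3d8a404cfbd0704f135b6ad4b2d3ee7513"
    "10f981926e53a6e8c39bd7d3fefd576c543cce493cbac06388f2651d1aacbfcd"
)

sig_with_hash_type, public_key = split_pushes(script_sig)
der_sig = sig_with_hash_type[:-1]    # DER-encoded signature
hash_type = sig_with_hash_type[-1]   # trailing one-byte hash type
print(len(sig_with_hash_type), len(public_key), hash_type)  # 71 65 1
```

The hash type is the last byte of the first pushed item, not the last byte of the scriptSig as a whole.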



I'll run parseTxn on tr_ken2 and take a look at its outputs.


First, get tr_ken2 as a hex byte string.


Ken Shirriff's signed transaction:

[Signed Transaction]
- version: 01 00 00 00
- input count: 01
- input:
-- previous output hash (reversed): 48 4d 40 d4 5b 9e a0 d6 52 fc a8 25 8a b7 ca a4 25 41 eb 52 97 58 57 f9 6f b5 0c d7 32 c8 b4 81
-- previous output index: 00 00 00 00
-- script length: 8a
-- scriptSig: 47 30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01 41 04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd
-- sequence: ff ff ff ff
- output count: 01
- output:
-- value: 62 64 01 00 00 00 00 00
-- script length: 19
-- scriptPubKey: 76 a9 14 c8 e9 09 96 c7 c6 08 0e e0 62 84 60 0c 68 4e d9 04 d1 4c 5c 88 ac
- block lock time: 00 00 00 00



Following the same process as that applied earlier to get tr_ken2_to_be_signed into a single hex byte string, I get:

0100000001484d40d45b9ea0d652fca8258ab7caa42541eb52975857f96fb50cd732c8b481000000008a47304402202cb265bf10707bf49346c3515dd3d16fc454618c58ec0a0ff448a676c54ff71302206c6624d762a1fcef4618284ead8f08678ac05b13c84235f1654e6ad168233e8201410414e301b2328f17442c0b8310d787bf3d8a404cfbd0704f135b6ad4b2d3ee751310f981926e53a6e8c39bd7d3fefd576c543cce493cbac06388f2651d1aacbfcdffffffff0162640100000000001976a914c8e90996c7c6080ee06284600c684ed904d14c5c88ac00000000




parser_ken.py

python 2.7.13
#!/opt/local/bin/python

def main():

	tr_ken2 = "0100000001484d40d45b9ea0d652fca8258ab7caa42541eb52975857f96fb50cd732c8b481000000008a47304402202cb265bf10707bf49346c3515dd3d16fc454618c58ec0a0ff448a676c54ff71302206c6624d762a1fcef4618284ead8f08678ac05b13c84235f1654e6ad168233e8201410414e301b2328f17442c0b8310d787bf3d8a404cfbd0704f135b6ad4b2d3ee751310f981926e53a6e8c39bd7d3fefd576c543cce493cbac06388f2651d1aacbfcdffffffff0162640100000000001976a914c8e90996c7c6080ee06284600c684ed904d14c5c88ac00000000"
	[first, sig, pub, rest] = parseTxn(tr_ken2)

	print "first: %s" % first
	print "sig: %s" % sig
	print "pub: %s" % pub
	print "rest: %s" % rest



# Returns [first, sig, pub, rest]
def parseTxn(txn):
	first = txn[0:41*2]
	scriptLen = int(txn[41*2:42*2], 16)
	script = txn[42*2:42*2+2*scriptLen]
	sigLen = int(script[0:2], 16)
	sig = script[2:2+sigLen*2]
	pubLen = int(script[2+sigLen*2:2+sigLen*2+2], 16)
	pub = script[2+sigLen*2+2:]
	
	assert(len(pub) == pubLen*2)
	rest = txn[42*2+2*scriptLen:]
	return [first, sig, pub, rest]


if __name__ == "__main__": main()



aineko:work stjohnpiano$ ./parser_ken.py

first: 0100000001484d40d45b9ea0d652fca8258ab7caa42541eb52975857f96fb50cd732c8b48100000000
sig: 304402202cb265bf10707bf49346c3515dd3d16fc454618c58ec0a0ff448a676c54ff71302206c6624d762a1fcef4618284ead8f08678ac05b13c84235f1654e6ad168233e8201
pub: 0414e301b2328f17442c0b8310d787bf3d8a404cfbd0704f135b6ad4b2d3ee751310f981926e53a6e8c39bd7d3fefd576c543cce493cbac06388f2651d1aacbfcd
rest: ffffffff0162640100000000001976a914c8e90996c7c6080ee06284600c684ed904d14c5c88ac00000000





Ok. The last byte of "sig", as returned by the parseTxn() function, is "01" as expected. The first byte of "sig" is the same as the first byte of {sig} ("30").


The next line in verifyTxnSignature(txn) is:
sig = keyUtils.derSigToHexSig(parsed[1][:-2])


So, we take the "sig" result from parseTxn(), which I think is the same as {sig} (the length byte at the beginning has already been dropped), and remove the last hash_type byte.

Then, we convert it from DER encoding to a hex byte string.

Here is tr_ken2 {sig} again (without the final 01 hash_type byte):
30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82
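The DER layout of this signature can be read directly: 30 marks a sequence, 44 is the number of bytes that follow, and each of the two integers is introduced by 02 and its own length byte (20, i.e. 32 bytes). As a sketch of what the conversion involves, here is a small Python 3 parser that does not use the ecdsa library. It is an illustration under the assumption of short-form lengths (all lengths below 128), and parse_der_signature is a hypothetical name:

```python
from binascii import unhexlify

def parse_der_signature(der):
    """Return the two integers from a DER-encoded ECDSA signature (short-form lengths only)."""
    if der[0] != 0x30:
        raise ValueError("expected SEQUENCE tag 0x30")
    if der[1] != len(der) - 2:
        raise ValueError("SEQUENCE length does not match the data")
    values = []
    i = 2
    for _ in range(2):
        if der[i] != 0x02:
            raise ValueError("expected INTEGER tag 0x02")
        n = der[i + 1]
        values.append(int.from_bytes(der[i + 2:i + 2 + n], "big"))
        i += 2 + n
    if i != len(der):
        raise ValueError("trailing bytes after the second integer")
    return tuple(values)

der_sig = unhexlify(
    "30440220"
    "2cb265bf10707bf49346c3515dd3d16fc454618c58ec0a0ff448a676c54ff713"
    "0220"
    "6c6624d762a1fcef4618284ead8f08678ac05b13c84235f1654e6ad168233e82"
)
first, second = parse_der_signature(der_sig)

# Same output format as derSigToHexSig: two zero-padded 32-byte values, concatenated.
hex_sig = "%064x%064x" % (first, second)
print(hex_sig)
```

A DER integer is signed, so a value whose first byte is 0x80 or higher is stored with an extra leading 00 byte. The signature in tr_s2, shown later in this log, contains an example: its first integer is introduced by 02 21 00 (33 bytes).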




Let's look at keyUtils.py.


I find:


python
# http://pypi.python.org/pypi/ecdsa/0.10
import ecdsa
import ecdsa.der

[...]

# Input is a hex-encoded, DER-encoded signature
# Output is a 64-byte hex-encoded signature
def derSigToHexSig(s):
	s, junk = ecdsa.der.remove_sequence(s.decode('hex'))
	if junk != '':
		print 'JUNK', junk.encode('hex')
	assert(junk == '')
	x, s = ecdsa.der.remove_integer(s)
	y, s = ecdsa.der.remove_integer(s)
	return '%064x%064x' % (x, y)




I have downloaded a copy of the ecdsa library available at
pypi.python.org/pypi/ecdsa/0.10
It came with this filename: ecdsa-0.10.tar.gz

I've saved my copy as an asset of this article. [link]


Let's uncompress the archive.

aineko:Downloads stjohnpiano$ tar -zxvf ecdsa-0.10.tar.gz

x ecdsa-0.10/
x ecdsa-0.10/ecdsa/
x ecdsa-0.10/ecdsa/__init__.py
x ecdsa-0.10/ecdsa/_version.py
x ecdsa-0.10/ecdsa/curves.py
x ecdsa-0.10/ecdsa/der.py
x ecdsa-0.10/ecdsa/ecdsa.py
x ecdsa-0.10/ecdsa/ellipticcurve.py
x ecdsa-0.10/ecdsa/keys.py
x ecdsa-0.10/ecdsa/numbertheory.py
x ecdsa-0.10/ecdsa/rfc6979.py
x ecdsa-0.10/ecdsa/six.py
x ecdsa-0.10/ecdsa/test_pyecdsa.py
x ecdsa-0.10/ecdsa/util.py
x ecdsa-0.10/LICENSE
x ecdsa-0.10/MANIFEST.in
x ecdsa-0.10/NEWS
x ecdsa-0.10/PKG-INFO
x ecdsa-0.10/README.md
x ecdsa-0.10/setup.py



From within the resulting directory ecdsa-0.10, copy the directory ecdsa to the work directory.

The ecdsa directory contains an __init__.py file and an ecdsa.py file, so (I think) I should be able to use "import ecdsa" within a python script in the work directory. I don't know the exact details of how python importing works.
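On the question of how the import works: Python searches the directories listed in sys.path, and when a script is run directly, the script's own directory is placed first in that list. A directory that contains an __init__.py file is treated as a package, so "import ecdsa" loads ecdsa/__init__.py (the ecdsa.py file inside it is the separate submodule ecdsa.ecdsa). A small Python 3 sketch of this behaviour, using a throwaway package with the hypothetical name demo_pkg:

```python
import importlib
import os
import sys
import tempfile

# Build a temporary "work directory" that contains a package directory.
work_dir = tempfile.mkdtemp()
package_dir = os.path.join(work_dir, "demo_pkg")
os.mkdir(package_dir)
with open(os.path.join(package_dir, "__init__.py"), "w") as f:
    f.write("ANSWER = 42\n")

# Running a script adds the script's directory to the front of sys.path.
# Here that step is simulated by adding work_dir manually.
sys.path.insert(0, work_dir)
importlib.invalidate_caches()

demo_pkg = importlib.import_module("demo_pkg")
print(demo_pkg.ANSWER)                                       # 42
print(os.path.basename(os.path.dirname(demo_pkg.__file__)))  # demo_pkg
```

This is why copying the ecdsa directory next to the script is enough, without installing the library system-wide.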







check_signature.py

python 2.7.13
#!/opt/local/bin/python

import ecdsa
import ecdsa.der

def main():

	tr_ken2_sig = "30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82"
	
	input_hex_byte_string = tr_ken2_sig.replace(" ","")

	hex_sig = derSigToHexSig(input_hex_byte_string)
	
	print "hex_sig: %s" % hex_sig
	

# Input is a hex-encoded, DER-encoded signature
# Output is a 64-byte hex-encoded signature
def derSigToHexSig(s):
	s, junk = ecdsa.der.remove_sequence(s.decode('hex'))
	if junk != '':
		print 'JUNK', junk.encode('hex')
	assert(junk == '')
	x, s = ecdsa.der.remove_integer(s)
	y, s = ecdsa.der.remove_integer(s)
	return '%064x%064x' % (x, y)

if __name__ == "__main__": main()



aineko:work stjohnpiano$ chmod 700 check_signature.py


aineko:work stjohnpiano$ ./check_signature.py

hex_sig: 2cb265bf10707bf49346c3515dd3d16fc454618c58ec0a0ff448a676c54ff7136c6624d762a1fcef4618284ead8f08678ac05b13c84235f1654e6ad168233e82



Result:
2cb265bf10707bf49346c3515dd3d16fc454618c58ec0a0ff448a676c54ff7136c6624d762a1fcef4618284ead8f08678ac05b13c84235f1654e6ad168233e82

Result in space-separated bytes:
2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82



Is the output 64 bytes?

aineko:work stjohnpiano$ echo -n "2cb265bf10707bf49346c3515dd3d16fc454618c58ec0a0ff448a676c54ff7136c6624d762a1fcef4618284ead8f08678ac05b13c84235f1654e6ad168233e82" | wc -c

128


128/2 = 64, so yes. (two hex characters are one byte.)


Let's look at the last three lines of verifyTxnSignature(txn).

public_key = parsed[2]
vk = ecdsa.VerifyingKey.from_string(public_key[2:].decode('hex'), curve=ecdsa.SECP256k1)
assert(vk.verify_digest(sig.decode('hex'), hashToSign.decode('hex')))



So, the first line shows me that I need the public key from scriptSig in tr_ken2, which is:
04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd


Note: The result of running parser_ken.py shows that the PUSHDATA length byte (41) at the start is not included.


The second of the three lines shows me that I should remove the first byte from the public key, convert it from hex into raw bytes, and then use the ecdsa library to create a key object from these raw bytes.

The first byte is "04".

Excerpt from earlier:

Inconveniently, the Bitcoin protocol adds a prefix of 04 to the public key.


So this first byte is a prefix, and is not part of the actual public key value.
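The 04 prefix marks an uncompressed public key: it is followed by the 32-byte X coordinate and the 32-byte Y coordinate of a point on the secp256k1 curve, whose equation is y^2 = x^3 + 7 (mod p). A minimal Python 3 sketch that splits the key and checks the point against that equation; the function names are hypothetical:

```python
# secp256k1 field prime. The curve equation is y^2 = x^3 + 7 (mod P).
P = 2**256 - 2**32 - 977

def parse_uncompressed_public_key(pub_hex):
    """Split a 65-byte uncompressed public key (hex) into integer coordinates (x, y)."""
    if len(pub_hex) != 130 or pub_hex[:2] != "04":
        raise ValueError("expected the 04 prefix followed by 64 bytes")
    return int(pub_hex[2:66], 16), int(pub_hex[66:], 16)

def is_on_curve(x, y):
    """Check whether (x, y) satisfies the secp256k1 curve equation."""
    return (y * y - (x * x * x + 7)) % P == 0

tr_ken2_public_key = (
    "04"
    "14e301b2328f17442c0b8310d787bf3d8a404cfbd0704f135b6ad4b2d3ee7513"
    "10f981926e53a6e8c39bd7d3fefd576c543cce493cbac06388f2651d1aacbfcd"
)
x, y = parse_uncompressed_public_key(tr_ken2_public_key)
print(is_on_curve(x, y))
```

Prefixes 02 and 03 mark compressed keys, which carry only the X coordinate (33 bytes in total). The public key in tr_s2's scriptSig, examined later, begins with 02.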


Finally, the last line asserts that the signature is valid for the double-sha256-hash of the transaction-to-be-signed under this public key, i.e. that it was produced by signing that hash with the corresponding private key.


Ok.


Let's extend the capabilities of check_signature.py.



check_signature.py

python 2.7.13
#!/opt/local/bin/python

import ecdsa
import ecdsa.der

def main():

	from binascii import hexlify, unhexlify

	# double sha256 hash digest of transaction-to-be-signed. 
	tr_ken2_hash_hex_bytes = "5f da 68 72 9a 63 12 e1 7e 64 1e 9a 49 fa c2 a4 a6 a6 80 12 66 10 af 57 3c aa b2 70 d2 32 f8 50"

	tr_ken2_der_sig_hex_bytes = "30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82"
	
	tr_ken2_public_key_hex_bytes = "04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd"
	
	hash_hex_byte_string = tr_ken2_hash_hex_bytes.replace(" ","")
	der_sig_hex_byte_string = tr_ken2_der_sig_hex_bytes.replace(" ","")
	public_key_hex_byte_string = tr_ken2_public_key_hex_bytes.replace(" ","")
	
	# remove "04" prefix.
	public_key_hex_byte_string = public_key_hex_byte_string[2:] 
	
	# convert sig from DER encoding to hex
	sig_hex_byte_string = derSigToHexSig(der_sig_hex_byte_string)
	
	# convert sig, public key, and hash to raw byte sequences
	hash_raw_byte_string = unhexlify(hash_hex_byte_string)
	sig_raw_byte_string = unhexlify(sig_hex_byte_string)
	public_key_raw_byte_string = unhexlify(public_key_hex_byte_string)
	
	# create a key object from the raw-byte-sequence public key.
	vk = ecdsa.VerifyingKey.from_string(public_key_raw_byte_string, curve=ecdsa.SECP256k1)
	
	# verify the signature
	verification_result = vk.verify_digest(sig_raw_byte_string, hash_raw_byte_string)
	
	print "hash_hex_byte_string (%d bytes): %s" % (len(hash_hex_byte_string)/2, hash_hex_byte_string)
	print "sig_hex_byte_string (%d bytes): %s" % (len(sig_hex_byte_string)/2, sig_hex_byte_string)
	print "public_key_hex_byte_string (%d bytes): %s" % (len(public_key_hex_byte_string)/2, public_key_hex_byte_string)
	print "type(verification_result): %s" % type(verification_result)
	print "verification_result: %s" % verification_result
	
	
	

# Input is a hex-encoded, DER-encoded signature
# Output is a 64-byte hex-encoded signature
def derSigToHexSig(s):
	s, junk = ecdsa.der.remove_sequence(s.decode('hex'))
	if junk != '':
		print 'JUNK', junk.encode('hex')
	assert(junk == '')
	x, s = ecdsa.der.remove_integer(s)
	y, s = ecdsa.der.remove_integer(s)
	return '%064x%064x' % (x, y)
	
if __name__ == "__main__": main()



aineko:work stjohnpiano$ ./check_signature.py

hash_hex_byte_string (32 bytes): 5fda68729a6312e17e641e9a49fac2a4a6a680126610af573caab270d232f850
sig_hex_byte_string (64 bytes): 2cb265bf10707bf49346c3515dd3d16fc454618c58ec0a0ff448a676c54ff7136c6624d762a1fcef4618284ead8f08678ac05b13c84235f1654e6ad168233e82
public_key_hex_byte_string (64 bytes): 14e301b2328f17442c0b8310d787bf3d8a404cfbd0704f135b6ad4b2d3ee751310f981926e53a6e8c39bd7d3fefd576c543cce493cbac06388f2651d1aacbfcd
type(verification_result): <type 'bool'>
verification_result: True



The ECDSA implementation in this library reports that the signature is valid.
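For readers who want to see what verify_digest is checking, the ECDSA verification equation can be written out in plain Python. This is a sketch for illustration only, in Python 3 syntax, using the standard secp256k1 parameters. It is not the ecdsa library's implementation, its function names are hypothetical, and it is far too slow and unhardened for real use:

```python
# secp256k1 domain parameters (standard published values).
P = 2**256 - 2**32 - 977
N = int("FFFFFFFF" "FFFFFFFF" "FFFFFFFF" "FFFFFFFE"
        "BAAEDCE6" "AF48A03B" "BFD25E8C" "D0364141", 16)
G = (int("79BE667E" "F9DCBBAC" "55A06295" "CE870B07"
         "029BFCDB" "2DCE28D9" "59F2815B" "16F81798", 16),
     int("483ADA77" "26A3C465" "5DA4FBFC" "0E1108A8"
         "FD17B448" "A6855419" "9C47D08F" "FB10D4B8", 16))

def inverse(a, m):
    # Modular inverse via Fermat's little theorem; valid because P and N are prime.
    return pow(a % m, m - 2, m)

def point_add(a, b):
    # None represents the point at infinity.
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        slope = 3 * x1 * x1 * inverse(2 * y1, P) % P   # tangent (point doubling)
    else:
        slope = (y2 - y1) * inverse(x2 - x1, P) % P    # chord through both points
    x3 = (slope * slope - x1 - x2) % P
    y3 = (slope * (x1 - x3) - y1) % P
    return (x3, y3)

def scalar_multiply(k, point):
    # Double-and-add.
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result

def verify(public_key, digest, r, s):
    """Return True if (r, s) is a valid ECDSA signature of the integer digest."""
    if not (1 <= r < N and 1 <= s < N):
        return False
    w = inverse(s, N)
    point = point_add(scalar_multiply(digest * w % N, G),
                      scalar_multiply(r * w % N, public_key))
    return point is not None and point[0] % N == r

public_key = (
    int("14e301b2328f17442c0b8310d787bf3d8a404cfbd0704f135b6ad4b2d3ee7513", 16),
    int("10f981926e53a6e8c39bd7d3fefd576c543cce493cbac06388f2651d1aacbfcd", 16),
)
digest = int("5fda68729a6312e17e641e9a49fac2a4a6a680126610af573caab270d232f850", 16)
r = int("2cb265bf10707bf49346c3515dd3d16fc454618c58ec0a0ff448a676c54ff713", 16)
s = int("6c6624d762a1fcef4618284ead8f08678ac05b13c84235f1654e6ad168233e82", 16)

print(verify(public_key, digest, r, s))
print(verify(public_key, digest ^ 1, r, s))
```

The first call uses the values from the run above and should agree with the library's result. The second flips the lowest bit of the hash, which should cause verification to fail.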


32 bytes is 32*8 = 256 bits, which is the expected length for the result of running SHA256.

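64 bytes is 64*8 = 512 bits, which is the expected length of an uncompressed ECDSA public key on this curve once the 04 prefix is removed: two 32-byte coordinates (X and Y).

The signature is also 64 bytes, because it consists of two 32-byte integers (usually called r and s). Its length matches the prefix-stripped public key because both are pairs of 256-bit values, not because a signature must be as long as the key.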




Excellent.




Next: Let's change a bit in the hash to see if the library correctly reports a verification failure.


I'll change
tr_ken2_hash_hex_bytes = "5f da 68 72 9a 63 12 e1 7e 64 1e 9a 49 fa c2 a4 a6 a6 80 12 66 10 af 57 3c aa b2 70 d2 32 f8 50"
to
tr_ken2_hash_hex_bytes = "5f da 68 72 9a 63 12 e1 7e 64 1e 9a 49 fa c2 a4 a6 a6 80 12 66 10 af 57 3c aa b2 70 d2 32 f8 51"

I've changed the last byte from 50 to 51. In hex, a single character represents 4 bits. The last 4 bits of the hash have been changed from 0000 to 0001, so the change is just a single bit.
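The claim that only one bit differs can be checked by XOR-ing the two values and counting the set bits. A minimal Python 3 sketch; differing_bits is a hypothetical helper name:

```python
def differing_bits(hex_a, hex_b):
    """Count the bit positions at which two equal-length hex strings differ."""
    if len(hex_a) != len(hex_b):
        raise ValueError("inputs must have the same length")
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

original = "5fda68729a6312e17e641e9a49fac2a4a6a680126610af573caab270d232f850"
modified = "5fda68729a6312e17e641e9a49fac2a4a6a680126610af573caab270d232f851"

print(differing_bits(original, modified))          # 1
print(format(0x50, "08b"), format(0x51, "08b"))    # 01010000 01010001
```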


aineko:work stjohnpiano$ ./check_signature.py

Traceback (most recent call last):
  File "./check_signature.py", line 58, in <module>
    if __name__ == "__main__": main()
  File "./check_signature.py", line 36, in main
    verification_result = vk.verify_digest(sig_raw_byte_string, hash_raw_byte_string)
  File "/Users/stjohnpiano/Desktop/stuff/PROJECTS_CURRENT/hydra/1_articles_under_construction/reading_and_verifying_a_standard_raw_bitcoin_transaction/work/ecdsa/keys.py", line 111, in verify_digest
    raise BadSignatureError
ecdsa.keys.BadSignatureError



Good.





Next: Attempt to verify the signature on my selected transaction (tr_s2). Streamline the process of verification if possible.



raw transaction tr_s2 as a hex byte string:

01000000014f8cad2ca3b602750e06c4300d91b22c17dd97bb5ce635d860df8b0ab325b336010000006b483045022100d8321c3c4b3fc3a34e30547d1cb6aea6855cad330b590b53e824f647dd49239902201e9056e82297a2108b1969cc497150734ab78f51cbf979885afc104f70a65150012102a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261fdffffff02e8d94100000000001976a914338cdde52f708236affa5675f969606ff846ee6f88acc4716e01000000001976a9143496950f1a01285a1605ac9337c06a5596b9fcd888accaee0700



tr_s2 in space-separated hex bytes:

01 00 00 00 01 4f 8c ad 2c a3 b6 02 75 0e 06 c4 30 0d 91 b2 2c 17 dd 97 bb 5c e6 35 d8 60 df 8b 0a b3 25 b3 36 01 00 00 00 6b 48 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01 21 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61 fd ff ff ff 02 e8 d9 41 00 00 00 00 00 19 76 a9 14 33 8c dd e5 2f 70 82 36 af fa 56 75 f9 69 60 6f f8 46 ee 6f 88 ac c4 71 6e 01 00 00 00 00 19 76 a9 14 34 96 95 0f 1a 01 28 5a 16 05 ac 93 37 c0 6a 55 96 b9 fc d8 88 ac ca ee 07 00



The result of manually reading tr_s2:

- version: 01 00 00 00
- input count: 01
- input:
-- previous output hash (reversed): 4f 8c ad 2c a3 b6 02 75 0e 06 c4 30 0d 91 b2 2c 17 dd 97 bb 5c e6 35 d8 60 df 8b 0a b3 25 b3 36
-- previous output index: 01 00 00 00
-- script length: 6b
-- scriptSig: 48 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01 21 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61
-- sequence: fd ff ff ff
- output count: 02
- output:
-- value: e8 d9 41 00 00 00 00 00
-- script length: 19
-- scriptPubKey: 76 a9 14 33 8c dd e5 2f 70 82 36 af fa 56 75 f9 69 60 6f f8 46 ee 6f 88 ac
- output:
-- value: c4 71 6e 01 00 00 00 00
-- script length: 19
-- scriptPubKey: 76 a9 14 34 96 95 0f 1a 01 28 5a 16 05 ac 93 37 c0 6a 55 96 b9 fc d8 88 ac
- block lock time: ca ee 07 00




Let's parse tr_s2 with parser1.py and check that the results match those that were found manually.

Save tr_s2 into a new file called tr_s2.txt within the work directory. Edit parser1.py so that
file_path_1 = "tr_s2.txt"



aineko:work stjohnpiano$ ./parser1.py


Transaction:

- [property] byte length: 226
- version: 01 00 00 00
- input_count: 01
-- [property] input_count byte length: 1
-- [property] input_count decimal value: 1

- input #0
-- previous_output_hash: 4f 8c ad 2c a3 b6 02 75 0e 06 c4 30 0d 91 b2 2c 17 dd 97 bb 5c e6 35 d8 60 df 8b 0a b3 25 b3 36
-- previous_output_index: 01 00 00 00
-- script_length: 6b
--- [property] script_length decimal value: 107
-- scriptSig: 48 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01 21 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61
-- sequence: fd ff ff ff

- output_count: 02
-- [property] output_count byte length: 1
-- [property] output_count decimal value: 2

- output #0
-- value: e8 d9 41 00 00 00 00 00
--- [property] output value in bitcoin: 0.04315624
-- script_length: 19
--- [property] script_length decimal value: 25
-- scriptPubKey: 76 a9 14 33 8c dd e5 2f 70 82 36 af fa 56 75 f9 69 60 6f f8 46 ee 6f 88 ac

- output #1
-- value: c4 71 6e 01 00 00 00 00
--- [property] output value in bitcoin: 0.24015300
-- script_length: 19
--- [property] script_length decimal value: 25
-- scriptPubKey: 76 a9 14 34 96 95 0f 1a 01 28 5a 16 05 ac 93 37 c0 6a 55 96 b9 fc d8 88 ac

- block lock time: ca ee 07 00




Results match.
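The derived values in the parser output follow from reading the multi-byte fields as little-endian integers. A minimal Python 3 sketch using the standard struct module; the helper names are hypothetical, and sequence is decoded here as a little-endian 32-bit integer in the same way as the other 4-byte fields:

```python
import struct
from binascii import unhexlify

def read_uint_le(spaced_hex):
    """Interpret 4 or 8 space-separated hex bytes as an unsigned little-endian integer."""
    raw = unhexlify(spaced_hex.replace(" ", ""))
    fmt = {4: "<I", 8: "<Q"}[len(raw)]
    return struct.unpack(fmt, raw)[0]

def satoshi_to_btc_string(satoshi):
    """Format an integer number of satoshi as a bitcoin amount with 8 decimal places."""
    whole, fraction = divmod(satoshi, 10**8)
    return "%d.%08d" % (whole, fraction)

previous_output_index = read_uint_le("01 00 00 00")
sequence = read_uint_le("fd ff ff ff")
value_0 = read_uint_le("e8 d9 41 00 00 00 00 00")
value_1 = read_uint_le("c4 71 6e 01 00 00 00 00")
lock_time = read_uint_le("ca ee 07 00")

print(previous_output_index)             # 1
print(hex(sequence))                     # 0xfffffffd
print(satoshi_to_btc_string(value_0))    # 0.04315624
print(satoshi_to_btc_string(value_1))    # 0.24015300
print(lock_time)                         # 519882
```

A lock time below 500,000,000 is interpreted as a block height rather than a timestamp, so ca ee 07 00 refers to block 519882.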



I'll write a second parser, using parser1.py as a guide.

This new parser should construct a Transaction object to store all the results.



parser2.py

python 2.7.13
#!/opt/local/bin/python

def main():

	print "hello world"


if __name__ == "__main__": main()



aineko:work stjohnpiano$ chmod 700 parser2.py


aineko:work stjohnpiano$ ./parser2.py

hello world




[development occurs here]




parser2.py

python 2.7.13
#!/opt/local/bin/python

def main():

	raw_transaction_file_path = "tr_s2.txt"
	raw_transaction = file_get_contents(raw_transaction_file_path)
	
	transaction = Transaction()
	transaction.parse_raw_transaction(raw_transaction)
	
	print transaction.to_string()




class Transaction:

	# expected format (raw hex bytes):
	# - version: 4 bytes
	# - input_count: (var_int) 1 to 9 bytes
	# - inputs: 
	# -- previous_output_hash: 32 bytes (little-endian)
	# -- previous_output_index: 4 bytes (little-endian)
	# -- script_length: (var_int) 1 to 9 bytes
	# -- scriptSig
	# -- sequence: 4 bytes
	# - output_count: (var_int) 1 to 9 bytes
	# - outputs:
	# -- value: 8 bytes (little-endian)
	# -- script_length: (var_int) 1 to 9 bytes
	# -- scriptPubKey
	# - block lock time: 4 bytes
	
	# notes:
	# - first byte of var_int indicates its total byte length. More details are in the comments for the get_length_of_var_int function.
	
	# questions:
	# - is sequence little-endian? 
	# - I recall that block_lock_time has an odd format. 
	
	# warning:
	# - other transactions could have formats that don't fit the expected format. 
	
	def __init__(self):
		self.raw_transaction = None # original byte string
		self.hex_bytes = None # list of original bytes
		self.byte_length = None
		self.version = None # original bytes
		self.input_count_bytes = None # original bytes
		self.input_count_decimal = None
		self.inputs = None # list of dictionaries of original bytes (one per input)
		self.output_count_bytes = None # original bytes
		self.output_count_decimal = None
		self.outputs = None # list of dictionaries of original bytes (one per output)
		self.block_lock_time = None # original bytes
		
	def parse_raw_transaction(self, raw_transaction):
		self.check_raw_transaction(raw_transaction)
		hex_bytes = [raw_transaction[i:i+2] for i in xrange(0, len(raw_transaction), 2)]
		version = hex_bytes[0:4]
		index = 4 # this variable stores the index of the-next-byte-to-process in the hex_bytes list (which is 0-indexed).
		# get number of inputs
		input_count_bytes, index = get_var_int_from_hex_bytes(hex_bytes, index)
		input_count_decimal = hex_bytes_to_decimal(input_count_bytes)
		# prepare the storage for the inputs
		input = {
			"previous_output_hash": None, 
			"previous_output_index": None,
			"script_length": None,
			"scriptSig": None,
			"sequence": None,
			}
		inputs = [None]*input_count_decimal
		for i in xrange(input_count_decimal):
			# explicitly copy the dictionary to make sure that the different inputs are different objects. 
			inputs[i] = input.copy()
		# parse each input item
		for i in xrange(input_count_decimal):
			previous_output_hash = hex_bytes[index:(index+32)]
			index += 32
			previous_output_index = hex_bytes[index:(index+4)]
			index += 4
			script_length_bytes, index = get_var_int_from_hex_bytes(hex_bytes, index)
			script_length_decimal = hex_bytes_to_decimal(script_length_bytes)
			scriptSig = hex_bytes[index:(index+script_length_decimal)]
			index += script_length_decimal
			sequence = hex_bytes[index:(index+4)]
			index += 4
			# store the input bytes
			input = inputs[i]
			input["previous_output_hash"] = previous_output_hash
			input["previous_output_index"] = previous_output_index
			input["script_length"] = script_length_bytes
			input["scriptSig"] = scriptSig
			input["sequence"] = sequence
		# get number of outputs
		output_count_bytes, index = get_var_int_from_hex_bytes(hex_bytes, index)
		output_count_decimal = hex_bytes_to_decimal(output_count_bytes)
		# prepare the storage for the outputs
		output = {
			"value": None, 
			"script_length": None,
			"scriptPubKey": None,
			}
		outputs = [None]*output_count_decimal
		for i in xrange(output_count_decimal):
			# explicitly copy the dictionary to make sure that the different outputs are different objects.
			outputs[i] = output.copy()
		# parse each output item
		for i in xrange(output_count_decimal):
			value = hex_bytes[index:(index+8)]
			index += 8
			script_length_bytes, index = get_var_int_from_hex_bytes(hex_bytes, index)
			script_length_decimal = hex_bytes_to_decimal(script_length_bytes)
			scriptPubKey = hex_bytes[index:(index+script_length_decimal)]
			index += script_length_decimal
			# store the output bytes
			output = outputs[i]
			output["value"] = value
			output["script_length"] = script_length_bytes
			output["scriptPubKey"] = scriptPubKey
		# get the block_lock_time
		block_lock_time = hex_bytes[index:(index+4)]
		index += 4
		# confirm that we have traversed the exact length of the raw transaction by checking whether:
		# - the index of the-next-byte-to-be-processed (in a 0-indexed list) == the raw-transaction-byte-length (1-indexed)
		assert index == len(hex_bytes)
		# store the results of parsing the raw transaction in this instance's properties. 
		self.raw_transaction = raw_transaction
		self.hex_bytes = hex_bytes
		self.byte_length = len(hex_bytes)
		self.version = version
		self.input_count_bytes = input_count_bytes
		self.input_count_decimal = input_count_decimal
		self.inputs = inputs
		self.output_count_bytes = output_count_bytes
		self.output_count_decimal = output_count_decimal
		self.outputs = outputs
		self.block_lock_time = block_lock_time
		
	
	def to_string(self):
		# return a printable string representation of self.
		s = "\nTransaction:"
		s += "\n- [derived property] byte length: %s" % self.byte_length
		s += "\n- version: %s" % hex_bytes_to_string(self.version)
		s += "\n- input_count: %s" % hex_bytes_to_string(self.input_count_bytes)
		s += "\n- [derived property] input_count_decimal: %d" % self.input_count_decimal
		for i, input in enumerate(self.inputs): # same order as found within transaction.
			s += "\n- input #%d:" % i
			s += "\n-- previous_output_hash: %s" % hex_bytes_to_string(input["previous_output_hash"])
			previous_output_hash_big_endian = list(reversed(input["previous_output_hash"]))
			
			s += "\n-- [derived property] previous_output_hash (no spaces, big-endian): %s" % "".join(previous_output_hash_big_endian)
			s += "\n-- previous_output_index: %s" % hex_bytes_to_string(input["previous_output_index"])
			previous_output_index_big_endian = list(reversed(input["previous_output_index"]))
			s += "\n-- [derived property] previous_output_index_decimal: %d" % hex_bytes_to_decimal(previous_output_index_big_endian)
			s += "\n-- script_length: %s" % hex_bytes_to_string(input["script_length"])
			s += "\n-- [derived property] script_length_decimal: %d" % hex_bytes_to_decimal(input["script_length"])
			s += "\n-- scriptSig: %s" % hex_bytes_to_string(input["scriptSig"])
			s += "\n-- sequence: %s" % hex_bytes_to_string(input["sequence"])
		s += "\n- output_count: %s" % hex_bytes_to_string(self.output_count_bytes)
		s += "\n- [derived property] output_count_decimal: %d" % self.output_count_decimal
		for i, output in enumerate(self.outputs): # same order as found within transaction.
			s += "\n- output #%d:" % i
			s += "\n-- value: %s" % hex_bytes_to_string(output["value"])
			s += "\n-- [derived property] output value in bitcoin: %s" % self.convert_output_value_to_bitcoin_string(output["value"])
			s += "\n-- script_length: %s" % hex_bytes_to_string(output["script_length"])
			s += "\n-- [derived property] script_length_decimal: %s" % hex_bytes_to_decimal(output["script_length"])
			s += "\n-- scriptPubKey: %s" % hex_bytes_to_string(output["scriptPubKey"])
		s += "\n"
		return s
		
	
	def convert_output_value_to_bitcoin_string(self, value):
		# convert 8-byte output value (little-endian list of hex bytes) into bitcoin value. 
		value = list(reversed(value)) # change to big-endian
		vs = "".join(value) # vs = value string
		vs_decimal = str(hex_string_to_decimal(vs)) # value string in decimal form
		n = len(vs_decimal) # number of characters in the decimal form of the value string.
		if n - 8 > 0: # more than 8 characters i.e. at least 1 btc. bitcoin has 8 decimal places.
			# add the decimal point at the appropriate position within the value string.
			section1 = vs_decimal[:-8]
			section2 = vs_decimal[-8:]
			vs_decimal = section1 + "." + section2
		else:
			# 8 or fewer characters i.e. less than 1 btc.
			# add a 0, a decimal point, and the appropriate number of 0s in front of the value string.
			n_zeros = abs(n-8) # appropriate number of zeros
			vs_decimal = "0." + "0"*n_zeros + vs_decimal
		return vs_decimal
	
	
	def check_raw_transaction(self, input):
		# input should be an even number of lower-case hex byte characters, with no spaces.
		permitted_characters = "0123456789abcdef"
		for character in input:
			if character not in permitted_characters:
				stop("character %s is not in permitted characters list: %s" % (character, permitted_characters))
		if len(input) % 2 != 0:
			stop("length of input string is not even")
		return 0



def hex_bytes_to_string(hex_bytes):
	hex_byte_string = " ".join(hex_bytes) # space-separated hex byte string
	return hex_byte_string


def hex_bytes_to_decimal(hex_bytes):
	# expects list of hex bytes
	hex_byte_string = "".join(hex_bytes)
	decimal_value = hex_string_to_decimal(hex_byte_string)
	return decimal_value


def hex_string_to_decimal(string):
	# expects string of hex bytes
	total = 0
	n = len(string)
	for i in xrange(n):
		c = string[i] # c = character
		v = hex_character_to_decimal(c) # v = value
		power = n - i - 1 # first character has the highest power of 16. power of last character is 0. 
		total += v*(16**power)
	return total
	

def hex_character_to_decimal(c):
	decimal_digits = "0123456789"
	hex_digits = {"a": 10, "b": 11, "c": 12, "d": 13, "e": 14, "f": 15}
	if c in decimal_digits:
		return int(c)
	elif c in hex_digits.keys():
		return hex_digits[c]
	stop("character %s is not a hex character." % c)


def get_var_int_from_hex_bytes(hex_bytes, index):
	# expects hex bytes list and the index of the-next-byte-to-process. 
	# returns var_int hex bytes list and the index of the-next-byte-to-process. 
	byte_1 = hex_bytes[index]
	byte_length = get_length_of_var_int(byte_1)
	if byte_length > 1:
		# future: handle var_ints longer than 1 byte here.
		stop("get_var_int_from_hex_bytes mark1: unwritten section")
	elif byte_length == 1:
		var_int = [byte_1]
		index += 1
	return [var_int, index]
	

def get_length_of_var_int(hex_byte):
	# expected format:
	# - single hex byte (two characters).
	# - certain high byte values indicate that N further bytes are actually part of the value of this variable_integer.
	# source:
	# http://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_integer
	if hex_byte == "ff": 
		# next 8 bytes are a uint64 (unsigned 64-bit integer)
		return 9
	elif hex_byte == "fe":
		# next 4 bytes are a uint32
		return 5
	elif hex_byte == "fd":
		# next 2 bytes are a uint16
		return 3
	# otherwise, this byte is the value of the variable integer, and the length of the integer is 1 byte (uint8). 
	return 1


def file_get_contents(file_path):
	from os.path import isfile
	if not isfile(file_path): 
		stop("%s is not a file." % file_path)
	f = open(file_path, "r")
	data = f.read()
	f.close()
	return data
	

def stop(message):
	# raise an Exception to get a traceback. 
	raise Exception("ERROR: %s\n" % message)


if __name__ == "__main__": main()



aineko:work stjohnpiano$ ./parser2.py


Transaction:
- [derived property] byte length: 226
- version: 01 00 00 00
- input_count: 01
- [derived property] input_count_decimal: 1
- input #0:
-- previous_output_hash: 4f 8c ad 2c a3 b6 02 75 0e 06 c4 30 0d 91 b2 2c 17 dd 97 bb 5c e6 35 d8 60 df 8b 0a b3 25 b3 36
-- [derived property] previous_output_hash (no spaces, big-endian): 36b325b30a8bdf60d835e65cbb97dd172cb2910d30c4060e7502b6a32cad8c4f
-- previous_output_index: 01 00 00 00
-- [derived property] previous_output_index_decimal: 1
-- script_length: 6b
-- [derived property] script_length_decimal: 107
-- scriptSig: 48 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01 21 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61
-- sequence: fd ff ff ff
- output_count: 02
- [derived property] output_count_decimal: 2
- output #0:
-- value: e8 d9 41 00 00 00 00 00
-- [derived property] output value in bitcoin: 0.04315624
-- script_length: 19
-- [derived property] script_length_decimal: 25
-- scriptPubKey: 76 a9 14 33 8c dd e5 2f 70 82 36 af fa 56 75 f9 69 60 6f f8 46 ee 6f 88 ac
- output #1:
-- value: c4 71 6e 01 00 00 00 00
-- [derived property] output value in bitcoin: 0.24015300
-- script_length: 19
-- [derived property] script_length_decimal: 25
-- scriptPubKey: 76 a9 14 34 96 95 0f 1a 01 28 5a 16 05 ac 93 37 c0 6a 55 96 b9 fc d8 88 ac





Ok.
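
parser2.py stops with an error if it meets a var_int longer than 1 byte. Here is a minimal sketch of how the longer forms could be read, assuming the layout described on the wiki page cited in get_length_of_var_int (a prefix byte of 0xfd, 0xfe or 0xff, followed by a 2-, 4- or 8-byte little-endian integer). The function read_var_int is hypothetical and is not part of parser2.py. It is written for Python 3, whereas the scripts in this log are Python 2.7.

```python
# Sketch only: read a variable-length integer from a list of hex bytes.
# read_var_int is a hypothetical helper, not part of parser2.py.

def read_var_int(hex_bytes, index):
    # hex_bytes: list of two-character hex strings, e.g. ["fd", "2c", "01"].
    # index: position of the first byte of the var_int.
    # returns (value, index of the next byte to process).
    first = int(hex_bytes[index], 16)
    # number of bytes that follow the prefix byte (0 if the prefix is the value itself).
    extra = {0xfd: 2, 0xfe: 4, 0xff: 8}.get(first, 0)
    if extra == 0:
        return first, index + 1
    payload = hex_bytes[index + 1 : index + 1 + extra]
    if len(payload) != extra:
        raise ValueError("var_int is truncated")
    # the payload is little-endian, so reverse it before reading it as one hex number.
    value = int("".join(reversed(payload)), 16)
    return value, index + 1 + extra


print(read_var_int(["6b"], 0))              # the script_length seen above
print(read_var_int(["fd", "2c", "01"], 0))  # a made-up 3-byte example
```

The return value has the same shape as the one from get_var_int_from_hex_bytes (a value and the next index), except that the value is already an integer rather than a list of hex bytes.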

These two output lines within an input of the parsed transaction:
-- [derived property] previous_output_hash (no spaces, big-endian): 36b325b30a8bdf60d835e65cbb97dd172cb2910d30c4060e7502b6a32cad8c4f
[...]
-- [derived property] previous_output_index_decimal: 1

provide me with the information necessary to:
- look up the previous transaction that supplied this input (which will be tr_s1), then download it in raw form and parse it.
- select the scriptPubKey from the correct output from the parsed form of this previous transaction.
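
The two derived values above can be reproduced from the raw input fields with a short sketch. It assumes only what the parser output shows: both fields are stored little-endian in the transaction. The helper name little_endian_to_hex_string is hypothetical. Written for Python 3.

```python
# Sketch only: derive the lookup values from the raw (little-endian) input fields.

previous_output_hash = "4f 8c ad 2c a3 b6 02 75 0e 06 c4 30 0d 91 b2 2c 17 dd 97 bb 5c e6 35 d8 60 df 8b 0a b3 25 b3 36"
previous_output_index = "01 00 00 00"


def little_endian_to_hex_string(spaced_hex_bytes):
    # reverse the byte order and remove the spaces.
    return "".join(reversed(spaced_hex_bytes.split(" ")))


txid = little_endian_to_hex_string(previous_output_hash)
output_index = int(little_endian_to_hex_string(previous_output_index), 16)

print(txid)
print(output_index)
```

The first printed value is the hash to use when looking up the previous transaction. The second is the position of the relevant output within that transaction.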



Construct query for getting raw transaction tr_s1 from blockchain.info:
http://blockchain.info/tx/36b325b30a8bdf60d835e65cbb97dd172cb2910d30c4060e7502b6a32cad8c4f?format=hex

Result:
0100000001803945b5278fc29094f3aca6b328e5f0b478a9fb825949c8a9d5409a5d66d9d0010000006a47304402206217b2f6c540c171b86141a4449ae1ebd1e9356f41c73773bb71abf672392a21022033253de37a4cc90883009b3ba57b665d04110ddcf2d2018f1d411c94a90af7210121023043a87b930b2a2b42abb85620257bca8a535198a080c6c6eca104932042cca0fdffffff02d84f9a01000000001976a9141bbbce692a9c85ac25ccb26c1d9a4a0f0336833288ac2352b601000000001976a9147e767129b15ee6d7e6be8ac3a100dd29c4c67e6388ac8ee70700


Save this as tr_s1.txt in the work directory.

Edit parser2.py so that
raw_transaction_file_path = "tr_s1.txt"



aineko:work stjohnpiano$ ./parser2.py


Transaction:
- [derived property] byte length: 225
- version: 01 00 00 00
- input_count: 01
- [derived property] input_count_decimal: 1
- input #0:
-- previous_output_hash: 80 39 45 b5 27 8f c2 90 94 f3 ac a6 b3 28 e5 f0 b4 78 a9 fb 82 59 49 c8 a9 d5 40 9a 5d 66 d9 d0
-- [derived property] previous_output_hash (no spaces, big-endian): d0d9665d9a40d5a9c8495982fba978b4f0e528b3a6acf39490c28f27b5453980
-- previous_output_index: 01 00 00 00
-- [derived property] previous_output_index_decimal: 1
-- script_length: 6a
-- [derived property] script_length_decimal: 106
-- scriptSig: 47 30 44 02 20 62 17 b2 f6 c5 40 c1 71 b8 61 41 a4 44 9a e1 eb d1 e9 35 6f 41 c7 37 73 bb 71 ab f6 72 39 2a 21 02 20 33 25 3d e3 7a 4c c9 08 83 00 9b 3b a5 7b 66 5d 04 11 0d dc f2 d2 01 8f 1d 41 1c 94 a9 0a f7 21 01 21 02 30 43 a8 7b 93 0b 2a 2b 42 ab b8 56 20 25 7b ca 8a 53 51 98 a0 80 c6 c6 ec a1 04 93 20 42 cc a0
-- sequence: fd ff ff ff
- output_count: 02
- [derived property] output_count_decimal: 2
- output #0:
-- value: d8 4f 9a 01 00 00 00 00
-- [derived property] output value in bitcoin: 0.26890200
-- script_length: 19
-- [derived property] script_length_decimal: 25
-- scriptPubKey: 76 a9 14 1b bb ce 69 2a 9c 85 ac 25 cc b2 6c 1d 9a 4a 0f 03 36 83 32 88 ac
- output #1:
-- value: 23 52 b6 01 00 00 00 00
-- [derived property] output value in bitcoin: 0.28725795
-- script_length: 19
-- [derived property] script_length_decimal: 25
-- scriptPubKey: 76 a9 14 7e 76 71 29 b1 5e e6 d7 e6 be 8a c3 a1 00 dd 29 c4 c6 7e 63 88 ac
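
The "output value in bitcoin" lines are derived from the 8-byte little-endian value field. Here is a minimal alternative sketch of that conversion using integer arithmetic instead of string slicing. The function name output_value_to_bitcoin_string is hypothetical. It is not the method from parser2.py, but it should give the same strings. Written for Python 3.

```python
# Sketch only: convert an 8-byte little-endian output value into a bitcoin string.

def output_value_to_bitcoin_string(spaced_hex_bytes):
    # reverse to big-endian, then read as a single hex number.
    # this number counts units of 0.00000001 bitcoin (8 decimal places).
    units = int("".join(reversed(spaced_hex_bytes.split(" "))), 16)
    whole, fraction = divmod(units, 10**8)
    return "%d.%08d" % (whole, fraction)


print(output_value_to_bitcoin_string("d8 4f 9a 01 00 00 00 00"))
print(output_value_to_bitcoin_string("23 52 b6 01 00 00 00 00"))
```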



The scriptPubKey of output #1 is:
76 a9 14 7e 76 71 29 b1 5e e6 d7 e6 be 8a c3 a1 00 dd 29 c4 c6 7e 63 88 ac



This is the scriptPubKey challenge that the single scriptSig of tr_s2 satisfies.



Next: Write a script that can process a subset of Script and display it in a readable form.



Create a file script_processor1.py in the work directory. Run
chmod 700 script_processor1.py
to change its permissions so that it can be run with the command
./script_processor1.py
.



[development occurs here]


script_processor1.py

python 2.7.13
#!/opt/local/bin/python

# description:
# - This script reads a standard concatenated scriptSig + scriptPubKey and displays it in a readable form. 


# notes:

# - sources:
# -- http://www.righto.com/2014/02/bitcoins-hard-way-using-raw-bitcoin.html
# -- http://en.bitcoin.it/wiki/Script
# - Bitcoin uses a scripting system called Script for transactions. Script is Forth-like, simple, stack-based, and processed from left to right.
# - A script is a list of instructions included within the outputs of a transaction, which describe the cryptographic conditions that must be satisfied in order to spend these outputs (i.e. create another transaction that uses them as inputs).
# -- The conditions are stored in the scriptPubKey of the previous transaction's output.
# -- In my terminology, a "standard" transaction is one in which all inputs and outputs are single-signature Pay-To-Public-Key-Hash (P2PKH). 
# -- For standard transactions, the scriptSig for a particular input (or "unspent output") in a new transaction satisfies these conditions by:
# --- Including a public key that hashes to the hash stored in the scriptPubKey.
# --- Including a signature of the hash of the new transaction, made by the private key that corresponds to this public key. The form of the transaction that is hashed uses the previous transaction's scriptPubKey in place of the new transaction's scriptSig (apparently for historical reasons - there's no extra security from using the previous scriptPubKey). The scriptSig is not included in the signed form of the transaction because it can only be created after the signature has been made. 

# expected script for a standard transaction:
# - {sig} (scriptSig)
# -- PUSHDATA: 1 byte
# -- signature data + SIGHASH_ALL
# -- PUSHDATA: 1 byte
# -- public key data
# - {pubKey} (scriptPubKey)
# -- OP_DUP / 0x76: 1 byte
# -- OP_HASH160 / 0xa9: 1 byte
# -- PUSHDATA: 1 byte
# -- public key hash: 20 bytes (== 160 bits)
# -- OP_EQUALVERIFY / 0x88: 1 byte
# -- OP_CHECKSIG / 0xac: 1 byte

# example:
# transaction created manually by Ken Shirriff
# transaction hash (big-endian) = 3f285f083de7c0acabd9f106a43ec42687ab0bebe2e6f0d529db696794540fea
# uses 1 input from a previous transaction:
# previous transaction hash (big-endian) =  81b4c832d70cb56ff957589752eb4125a4cab78a25a8fc52d6a09e5bd4404d48
# scriptPubKey from previous transaction:
# 76 a9 14 df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5 88 ac
# scriptSig from new transaction:
# 47 30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01 41 04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd
# scriptPubKey in readable subdivided form:
# - OP_DUP
# - OP_HASH160
# - PUSHDATA(20)
# - df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5
# - OP_EQUALVERIFY
# - OP_CHECKSIG
# scriptSig in readable subdivided form:
# - PUSHDATA(71)
# - [transaction signature] 30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01
# - PUSHDATA(65)
# - [bitcoin public key] 04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd

# Notes on the script formats:
# - The last byte of the signature, 01, is the hash type, and is not part of the signature itself. 
# - The first byte of the public key, 04, is not part of the public key itself. I don't know what it's meant to indicate. 
# - The signature is in DER encoding. The public key is in hex bytes. 
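# - The signature is made by the 256-bit (32-byte) ECDSA private key that corresponds to the public key. In uncompressed form, the public key is 512 bits (64 bytes), not counting the prefix byte. The curve is SECP256k1. 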

# Further notes:
# - To produce the signature, a particular form of the new transaction is created and signed. This form, which might be called "transaction-in-signable-form", must be re-created in order to check the validity of the signature. 

# opcode descriptions:

# - byte value: 0x01-0x4b
# -- Word: PUSHDATA
# -- Opcodes 1-75. (Not to be confused with OP_1, which is byte value 0x51 and pushes the number 1.)
# -- Input: [Special]
# -- Output: a single stack item
# -- Description: Let N be the byte value of the opcode. The next N bytes are data to be pushed onto the stack.

# - byte value: 0x69
# -- Word: OP_VERIFY
# -- Opcode: 105
# -- Input: True / false
# -- Output: Nothing / fail
# -- Description: Marks transaction as invalid if top stack value is not true. The top stack value is removed.

# - byte value: 0x76
# -- Word: OP_DUP
# -- Opcode: 118
# -- Input: x
# -- Output: x x
# -- Description: Duplicates the top stack item.

# - byte value: 0x87
# -- Word: OP_EQUAL
# -- Opcode: 135
# -- Input: x1 x2
# -- Output: True / false. 
# -- Description: Returns 1 if the inputs are exactly equal, 0 otherwise. 

# - byte value: 0x88
# -- Word: OP_EQUALVERIFY
# -- Opcode: 136
# -- Input: x1 x2
# -- Output: Nothing / fail. 
# -- Description: Same as OP_EQUAL, but runs OP_VERIFY afterward.

# - byte value: 0xa9
# -- Word: OP_HASH160
# -- Opcode: 169
# -- Input: in
# -- Output: hash
# -- Description: The input is hashed twice: first with SHA-256 and then with RIPEMD-160.

# - byte value: 0xac
# -- Word: OP_CHECKSIG
# -- Opcode: 172
# -- Input: sig pubkey
# -- Output: True / false 
# -- Description: The entire transaction's outputs, inputs, and script (from the most recently-executed OP_CODESEPARATOR to the end) are hashed. The signature used by OP_CHECKSIG must be a valid signature for this hash and public key. If it is, 1 is returned, 0 otherwise.


def main():


	scriptPubKey = "76 a9 14 7e 76 71 29 b1 5e e6 d7 e6 be 8a c3 a1 00 dd 29 c4 c6 7e 63 88 ac"
	scriptSig = "48 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01 21 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61"
	
	scriptPubKey_bytes = scriptPubKey.split(" ")
	scriptSig_bytes = scriptSig.split(" ")
	
	# concatenate the two scripts. scriptSig is executed first, so it goes at the front. 
	script_bytes = scriptSig_bytes + scriptPubKey_bytes
	
	# expected script for a standard transaction:
	# - scriptSig
	# -- PUSHDATA: 1 byte
	# -- signature data + SIGHASH_ALL
	# -- PUSHDATA: 1 byte
	# -- public key data
	# - scriptPubKey
	# -- OP_DUP / 0x76: 1 byte
	# -- OP_HASH160 / 0xa9: 1 byte
	# -- PUSHDATA: 1 byte
	# -- public key hash: 20 bytes (== 160 bits)
	# -- OP_EQUALVERIFY / 0x88: 1 byte
	# -- OP_CHECKSIG / 0xac: 1 byte
	
	
	index = 0 # stores the current index-of-the-next-byte-to-process within the script.
	
	print "\nscript:"
	print "- scriptSig (new transaction): %s" % hex_bytes_to_string(scriptSig_bytes)
	print "- scriptPubKey (previous transaction): %s" % hex_bytes_to_string(scriptPubKey_bytes)
	print "- entire script (scriptSig + scriptPubKey): %s " % hex_bytes_to_string(script_bytes)
	
	print "- entire script in readable form:"
	
	# PUSHDATA
	pushdata = script_bytes[index]
	pushdata_decimal = hex_string_to_decimal(pushdata)
	if pushdata_decimal not in xrange(1,76): # PUSHDATA opcodes are 0x01-0x4b (1-75). xrange excludes its upper bound.
		stop("pushdata_decimal (value=%d) not in range(1,76) (PUSHDATA)." % pushdata_decimal)
	index += 1
	print "-- PUSHDATA: %s" % pushdata
	print "-- [derived property] PUSHDATA decimal value: %d" % pushdata_decimal
	
	# signature + hash_type byte
	signature_data = script_bytes[index:index+pushdata_decimal]
	index += pushdata_decimal
	print "-- signature_data: %s" % hex_bytes_to_string(signature_data)
	# confirm that last byte (hash_type) is 0x01.
	assert(signature_data[-1] == "01")
	
	# PUSHDATA
	pushdata = script_bytes[index]
	pushdata_decimal = hex_string_to_decimal(pushdata)
	if pushdata_decimal not in xrange(1,76): # PUSHDATA opcodes are 0x01-0x4b (1-75). xrange excludes its upper bound.
		stop("pushdata_decimal (value=%d) not in range(1,76) (PUSHDATA)." % pushdata_decimal)
	index += 1
	print "-- PUSHDATA: %s" % pushdata
	print "-- [derived property] PUSHDATA decimal value: %d" % pushdata_decimal
	
	# public key, with 04 prefix. 
	public_key_data = script_bytes[index:index+pushdata_decimal]
	index += pushdata_decimal
	print "-- public_key_data: %s" % hex_bytes_to_string(public_key_data)
	# confirm that first byte (prefix) is 0x04.
	assert(public_key_data[0] == "04")
	
	# OP_DUP
	op_dup = script_bytes[index]
	if op_dup != "76":
		stop("expected OP_DUP / 0x76. found 0x%s" % op_dup)
	index += 1
	print "-- OP_DUP: %s" % op_dup
	
	# OP_HASH160
	op_hash160 = script_bytes[index]
	if op_hash160 != "a9":
		stop("expected OP_HASH160 / 0xa9. found 0x%s" % op_hash160)
	index += 1
	print "-- OP_HASH160: %s" % op_hash160
	
	# PUSHDATA
	pushdata = script_bytes[index]
	pushdata_decimal = hex_string_to_decimal(pushdata)
	if pushdata_decimal not in xrange(1,76): # PUSHDATA opcodes are 0x01-0x4b (1-75). xrange excludes its upper bound.
		stop("pushdata_decimal (value=%d) not in range(1,76) (PUSHDATA)." % pushdata_decimal)
	index += 1
	print "-- PUSHDATA: %s" % pushdata
	print "-- [derived property] PUSHDATA decimal value: %d" % pushdata_decimal
	
	# public key hash 
	public_key_hash = script_bytes[index:index+pushdata_decimal]
	index += pushdata_decimal
	print "-- public_key_hash: %s" % hex_bytes_to_string(public_key_hash)
	
	# OP_EQUALVERIFY
	op_equalverify = script_bytes[index]
	if op_equalverify != "88":
		stop("expected OP_EQUALVERIFY / 0x88. found 0x%s" % op_equalverify)
	index += 1
	print "-- OP_EQUALVERIFY: %s" % op_equalverify
	
	# OP_CHECKSIG
	op_checksig = script_bytes[index]
	if op_checksig != "ac":
		stop("expected OP_CHECKSIG / 0xac. found 0x%s" % op_checksig)
	index += 1
	print "-- OP_CHECKSIG: %s" % op_checksig
	
	# confirm that we have processed exactly the right number of bytes (i.e. all the bytes in the script).
	# index is the index within a 0-indexed list, while len() calculates the 1-indexed length of a list.
	assert(index == len(script_bytes))
	
	print ""
	



def hex_bytes_to_string(hex_bytes):
	hex_byte_string = " ".join(hex_bytes) # space-separated hex byte string
	return hex_byte_string


def hex_string_to_decimal(string):
	# expects string of hex bytes, without spaces.
	if string.count(" ") > 0: 
		stop("string (value=%s) contains at least one space." % string)
	total = 0
	n = len(string)
	for i in xrange(n):
		c = string[i] # c = character
		v = hex_character_to_decimal(c) # v = value
		power = n - i - 1 # first character has the highest power of 16. power of last character is 0. 
		total += v*(16**power)
	return total
	

def hex_character_to_decimal(c):
	decimal_digits = "0123456789"
	hex_digits = {"a": 10, "b": 11, "c": 12, "d": 13, "e": 14, "f": 15}
	if c in decimal_digits:
		return int(c)
	elif c in hex_digits.keys():
		return hex_digits[c]
	stop("character %s is not a hex character." % c)


def stop(message):
	# raise an Exception to get a traceback. 
	raise Exception("ERROR: %s\n" % message)


if __name__ == "__main__": main()



aineko:work stjohnpiano$ ./script_processor1.py


script:
- scriptSig (new transaction): 48 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01 21 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61
- scriptPubKey (previous transaction): 76 a9 14 7e 76 71 29 b1 5e e6 d7 e6 be 8a c3 a1 00 dd 29 c4 c6 7e 63 88 ac
- entire script (scriptSig + scriptPubKey): 48 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01 21 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61 76 a9 14 7e 76 71 29 b1 5e e6 d7 e6 be 8a c3 a1 00 dd 29 c4 c6 7e 63 88 ac
- entire script in readable form:
-- PUSHDATA: 48
-- [derived property] PUSHDATA decimal value: 72
-- signature_data: 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01
-- PUSHDATA: 21
-- [derived property] PUSHDATA decimal value: 33
-- public_key_data: 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61
Traceback (most recent call last):
  File "./script_processor1.py", line 270, in <module>
    if __name__ == "__main__": main()
  File "./script_processor1.py", line 183, in main
    assert(public_key_data[0] == "04")
AssertionError



Hm.

In the output, I can see that the first byte of the public_key_data in scriptSig is not actually 0x04. It's 0x02.


An earlier excerpt from:
bitcoin.org/en/developer-guide

Bitcoin ECDSA public keys represent a point on a particular Elliptic Curve (EC) defined in secp256k1. In their traditional uncompressed form, public keys contain an identification byte, a 32-byte X coordinate, and a 32-byte Y coordinate. The extremely simplified illustration below shows such a point on the elliptic curve used by Bitcoin, y^2 = x^3 + 7, over a field of contiguous numbers.

[illustration not included]

An almost 50% reduction in public key size can be realized without changing any fundamentals by dropping the Y coordinate. This is possible because only two points along the curve share any particular X coordinate, so the 32-byte Y coordinate can be replaced with a single bit indicating whether the point is on what appears in the illustration as the "top" side or the "bottom" side.

No data is lost by creating these compressed public keys - only a small amount of CPU is necessary to reconstruct the Y coordinate and access the uncompressed public key. Both uncompressed and compressed public keys are described in official secp256k1 documentation and supported by default in the widely-used OpenSSL library.

Because they're easy to use, and because they reduce almost by half the block chain space used to store public keys for every spent output, compressed public keys are the default in Bitcoin Core and are the recommended default for all Bitcoin software.

However, Bitcoin Core prior to 0.6 used uncompressed keys. This creates a few complications, as the hashed form of an uncompressed key is different than the hashed form of a compressed key, so the same key works with two different P2PKH addresses. This also means that the key must be submitted in the correct format in the signature script so it matches the hash in the previous output's pubkey script.

For this reason, Bitcoin Core uses several different identifier bytes to help programs identify how keys should be used:

1) Private keys meant to be used with compressed public keys have 0x01 appended to them before being Base-58 encoded. (See the private key encoding section above.)

2) Uncompressed public keys start with 0x04; compressed public keys begin with 0x03 or 0x02 depending on whether they're greater or less than the midpoint of the curve. These prefix bytes are all used in official secp256k1 documentation.




Excerpt from:
en.bitcoin.it/wiki/Protocol_documentation#Signatures

Public keys (in scripts) are given as 04 [x] [y] where x and y are 32 byte big-endian integers representing the coordinates of a point on the curve or in compressed form given as [sign] [x] where [sign] is 0x02 if y is even and 0x03 if y is odd.





So, the public key data in this case is:
public key:
- type: 02
- X: a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61
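
The prefix rules quoted from the wiki excerpt above can be sketched as a small function that splits public key data into its parts. The name describe_public_key and the dictionary layout are hypothetical. The meaning given to the 0x02 and 0x03 prefixes is the one stated in the wiki excerpt. Written for Python 3.

```python
# Sketch only: split public key data into prefix and coordinates.

def describe_public_key(spaced_hex_bytes):
    key_bytes = spaced_hex_bytes.split(" ")
    prefix, body = key_bytes[0], key_bytes[1:]
    if prefix == "04" and len(body) == 64:
        # uncompressed: 32-byte X followed by 32-byte Y.
        return {"form": "uncompressed", "x": "".join(body[:32]), "y": "".join(body[32:])}
    if prefix in ("02", "03") and len(body) == 32:
        # compressed: 32-byte X only. the prefix says whether Y is even (02) or odd (03).
        return {"form": "compressed", "x": "".join(body), "y_is_odd": prefix == "03"}
    raise ValueError("unrecognised public key: prefix %s, %d bytes after prefix" % (prefix, len(body)))


public_key_data = "02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61"
print(describe_public_key(public_key_data))
```

Checking the length together with the prefix catches data that starts with a valid prefix byte but has the wrong size for that form.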


Key point:
- "The hashed form of an uncompressed key is different than the hashed form of a compressed key, so the same key works with two different P2PKH addresses. This also means that the key must be submitted in the correct format in the signature script so it matches the hash in the previous output's pubkey script."


Hm. I think that this means that the compressed public key has a corresponding bitcoin address that uses the hash of the compressed public key.


So: The scriptPubKey in tr_s1 probably stores the hash of the compressed form of the public key.


Edit script_processor1.py:
- Change this comment:
# - The first byte of the public key, 04, is not part of the public key itself. I don't know what it's meant to indicate.
to:
# - The first byte of the public key indicates the type of public key. 0x04 == uncompressed form (32-byte X coordinate, 32-byte Y coordinate on the elliptic curve). 0x02 == compressed form (32-byte X, Y is even). 0x03 == compressed form (32-byte X, Y is odd). Source: http://en.bitcoin.it/wiki/Protocol_documentation#Signatures
- Change this line:
# public key, with 04 prefix.
to:
# public key, with 1-byte type prefix in [0x04, 0x02, 0x03].
- Change these two lines:
# confirm that first byte (prefix) is 0x04.
assert(public_key_data[0] == "04")
to:
# confirm that first byte (prefix) is in list [0x04, 0x02, 0x03].
assert(public_key_data[0] in ["04", "02", "03"])


Run it again.



aineko:work stjohnpiano$ ./script_processor1.py


script:
- scriptSig (new transaction): 48 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01 21 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61
- scriptPubKey (previous transaction): 76 a9 14 7e 76 71 29 b1 5e e6 d7 e6 be 8a c3 a1 00 dd 29 c4 c6 7e 63 88 ac
- entire script (scriptSig + scriptPubKey): 48 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01 21 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61 76 a9 14 7e 76 71 29 b1 5e e6 d7 e6 be 8a c3 a1 00 dd 29 c4 c6 7e 63 88 ac
- entire script in readable form:
-- PUSHDATA: 48
-- [derived property] PUSHDATA decimal value: 72
-- signature_data: 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01
-- PUSHDATA: 21
-- [derived property] PUSHDATA decimal value: 33
-- public_key_data: 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61
-- OP_DUP: 76
-- OP_HASH160: a9
-- PUSHDATA: 14
-- [derived property] PUSHDATA decimal value: 20
-- public_key_hash: 7e 76 71 29 b1 5e e6 d7 e6 be 8a c3 a1 00 dd 29 c4 c6 7e 63
-- OP_EQUALVERIFY: 88
-- OP_CHECKSIG: ac
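
The readable form above could also be produced by a table-driven loop, rather than by checking each expected byte in a fixed order. This is a minimal sketch, not the approach used in script_processor1.py. The names OPCODE_NAMES and tokenize_script are hypothetical, and only the opcodes described in the script's comments are included. Written for Python 3.

```python
# Sketch only: turn a script into a list of readable tokens.

OPCODE_NAMES = {
    0x69: "OP_VERIFY",
    0x76: "OP_DUP",
    0x87: "OP_EQUAL",
    0x88: "OP_EQUALVERIFY",
    0xa9: "OP_HASH160",
    0xac: "OP_CHECKSIG",
}


def tokenize_script(spaced_hex_bytes):
    script_bytes = spaced_hex_bytes.split(" ")
    tokens = []
    index = 0  # index of the next byte to process
    while index < len(script_bytes):
        opcode = int(script_bytes[index], 16)
        index += 1
        if 0x01 <= opcode <= 0x4b:
            # PUSHDATA: the opcode value is the number of data bytes that follow.
            data = script_bytes[index : index + opcode]
            if len(data) != opcode:
                raise ValueError("push runs past the end of the script")
            tokens.append("PUSHDATA(%d) %s" % (opcode, "".join(data)))
            index += opcode
        elif opcode in OPCODE_NAMES:
            tokens.append(OPCODE_NAMES[opcode])
        else:
            raise ValueError("unsupported opcode 0x%02x" % opcode)
    return tokens


scriptPubKey = "76 a9 14 7e 76 71 29 b1 5e e6 d7 e6 be 8a c3 a1 00 dd 29 c4 c6 7e 63 88 ac"
for token in tokenize_script(scriptPubKey):
    print("-- %s" % token)
```

A loop like this does not confirm that the script has the expected standard layout. It only names whatever it finds, so the fixed-order checks in script_processor1.py remain the stricter test.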



Note: The signature data in scriptSig of tr_s2 is:
30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01
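
The notes in script_processor1.py say that the signature is in DER encoding and that the last byte is the hash type. Here is a minimal sketch of splitting the signature data along those lines. It assumes the simple DER layout visible in the bytes above: a 0x30 sequence tag and a 1-byte length, then two integers, each with a 0x02 tag and a 1-byte length. The function name split_signature_data is hypothetical. "r" and "s" are the customary names for the two integers of an ECDSA signature and do not appear earlier in this log. Written for Python 3.

```python
# Sketch only: split signature data into its two DER integers and the hash type byte.

def split_signature_data(spaced_hex_bytes):
    data = spaced_hex_bytes.split(" ")
    hash_type = data[-1]  # last byte is the hash type, outside the DER structure.
    der = data[:-1]
    if der[0] != "30":
        raise ValueError("expected DER sequence tag 0x30")
    if int(der[1], 16) != len(der) - 2:
        raise ValueError("sequence length does not match the data")
    index = 2
    integers = []
    for _ in range(2):
        if der[index] != "02":
            raise ValueError("expected DER integer tag 0x02")
        length = int(der[index + 1], 16)
        integers.append("".join(der[index + 2 : index + 2 + length]))
        index += 2 + length
    if index != len(der):
        raise ValueError("unexpected trailing bytes")
    r, s = integers
    return r, s, hash_type


signature_data = "30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01"
r, s, hash_type = split_signature_data(signature_data)
print("r: %s" % r)
print("s: %s" % s)
print("hash_type: %s" % hash_type)
```

Here the first integer is 33 bytes long because DER integers are signed: a leading 00 byte is added when the top bit of the value would otherwise be set.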




Rename check_signature.py to check_signature1.py.

Write check_hash_and_signature1.py, which should:
- hash the public key data from tr_s2 (use code from op_hash160.py)
- confirm that the resulting hash matches the hash in the scriptPubKey in tr_s1.
- parse tr_s2 and generate the transaction-in-signable-form (copy the Transaction class and accompanying functions from parser2.py and add a method for building the transaction-in-signable-form).
- hash the transaction-in-signable-form (double sha256)
- tackle the problem of getting an ECDSA VerifyingKey object from just the X coordinate and the Y sign.
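
The double sha256 step in this list can be sketched with hashlib. This is a substitution: the scripts in this log import a separate sha256 implementation instead. The transaction-in-signable-form has not been built at this point, so the sketch is exercised on the raw tr_s1 instead. It relies on the fact that, for a transaction of this type, the byte-reversed double sha256 of the raw transaction is the transaction hash that was used to look it up. The function name double_sha256 is hypothetical. Written for Python 3.

```python
# Sketch only: double sha256, checked against the known hash of tr_s1.
import hashlib
from binascii import hexlify, unhexlify


def double_sha256(byte_sequence):
    # sha256 applied twice. returns the raw 32-byte digest.
    return hashlib.sha256(hashlib.sha256(byte_sequence).digest()).digest()


raw_tr_s1 = "0100000001803945b5278fc29094f3aca6b328e5f0b478a9fb825949c8a9d5409a5d66d9d0010000006a47304402206217b2f6c540c171b86141a4449ae1ebd1e9356f41c73773bb71abf672392a21022033253de37a4cc90883009b3ba57b665d04110ddcf2d2018f1d411c94a90af7210121023043a87b930b2a2b42abb85620257bca8a535198a080c6c6eca104932042cca0fdffffff02d84f9a01000000001976a9141bbbce692a9c85ac25ccb26c1d9a4a0f0336833288ac2352b601000000001976a9147e767129b15ee6d7e6be8ac3a100dd29c4c67e6388ac8ee70700"
expected = "36b325b30a8bdf60d835e65cbb97dd172cb2910d30c4060e7502b6a32cad8c4f"

digest = double_sha256(unhexlify(raw_tr_s1))
# reverse the byte order to get the big-endian form used in the lookup query.
txid = hexlify(digest[::-1]).decode("ascii")

print(txid)
print(txid == expected)
```

The same function can later be applied to the bytes of the transaction-in-signable-form once it exists.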


Then, for the moment, check the validity of the signature using check_signature1.py.





check_hash_and_signature1.py

python 2.7.13
#!/opt/local/bin/python


from binascii import hexlify, unhexlify
from pypy_sha256 import sha256
from bjorn_edstrom_ripemd160 import RIPEMD160


def main():

	# data from tr_s1
	public_key_hash = "7e 76 71 29 b1 5e e6 d7 e6 be 8a c3 a1 00 dd 29 c4 c6 7e 63"

	# data from tr_s2
	public_key_data = "02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61"

	
	public_key_hash_byte_string = public_key_hash.replace(" ","")
	
	
	# hash of public key data
	digest = op_hash160(public_key_data)
	
	
	
	print ""
	print "public key hash (previous transaction): %s" % public_key_hash
	print "- public key hash without spaces: %s" % public_key_hash_byte_string
	print "public key data (new transaction): %s" % public_key_data

	# convert the byte sequence to hex for display. 
	digest_byte_string = hexlify(digest)
	print "ripemd160(sha256(public_key_data)): %s" % digest_byte_string
	
	# confirm that the hash of the public key data matches the hash in the scriptPubKey in tr_s1. 
	assert(digest_byte_string == public_key_hash_byte_string)
	print "success: calculated hash of the public key data (new transaction) matches the stored hash in the scriptPubKey (previous transaction)"
	
	print ""



def op_hash160(hex_byte_string):
	# expects hex_byte_string (optionally with spaces)
	# returns byte sequence
	
	hex_byte_string = hex_byte_string.replace(" ","")

	# convert to a byte sequence that can be used as input to the sha256 hash function. 
	byte_sequence = unhexlify(hex_byte_string)

	# the digest (i.e. the result of hashing the input) will be a byte sequence. 
	sha256_digest = sha256(byte_sequence).digest()

	# the sha256_digest is already a raw byte sequence that can be used as input to the ripemd160 hash function. 
	ripemd160_digest = RIPEMD160(sha256_digest).digest()
	
	return ripemd160_digest


if __name__ == "__main__": main()



aineko:work stjohnpiano$ ./check_hash_and_signature1.py

public key hash (previous transaction): 7e 76 71 29 b1 5e e6 d7 e6 be 8a c3 a1 00 dd 29 c4 c6 7e 63
- public key hash without spaces: 7e767129b15ee6d7e6be8ac3a100dd29c4c67e63
public key data (new transaction): 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61
ripemd160(sha256(public_key_data)): 7e767129b15ee6d7e6be8ac3a100dd29c4c67e63
success: calculated hash of the public key data (new transaction) matches the stored hash in the scriptPubKey (previous transaction)





Aha. Experimental confirmation that if the public key is supplied in compressed form within the scriptSig of a new transaction, then the compressed form (not the uncompressed form) should be hashed for comparison to the public key hash in the scriptPubKey of the relevant previous transaction.


[some development occurs here]


I'm currently working on this:
- tackle the problem of getting an ECDSA VerifyingKey object from just the X coordinate and the Y sign.


I've read through the ecdsa library. I can't find a function for calculating the Y coordinate from a known X coordinate.

I note that:
- the ecdsa library imports its sha1 hash function from hashlib (which, I think, is a wrapper for openssl). Also sha256, sha512. However, it looks like it's possible to provide an already-calculated hash to a signing function.
- in ecdsa.py, I find: "It is absolutely vital that random_k be an unpredictable number in the range [1, self.public_key.point.order()-1]. If an attacker can guess random_k, he can compute our private key from a single signature. Also, if an attacker knows a few high-order bits (or a few low-order bits) of random_k, he can compute our private key from many signatures. The generation of nonces with adequate cryptographic strength is very difficult and far beyond the scope of this comment."
-- random_k is an argument to class Private_key, method sign( self, hash, random_k ).
-- Summary: A different, unpredictable random number is needed for every signature. If an attacker can guess random_k, a single signature is enough to derive the private key. If the attacker knows only a few bits of random_k, the private key can still be derived from many signatures.
- The code claims that it was written in 2005 and 2006 by Peter Pearson. Some changes made in 2008 and 2009.


From an earlier excerpt: "a point on the elliptic curve used by Bitcoin, y^2 = x^3 + 7"

So, to get Y from X, calculate y = sqrt(x^3 + 7).

Need to convert 32-byte X into an integer. Then convert resulting integer Y into 32-byte string.

Because y is calculated from a square root, there will be two y values for every x value. One positive, one negative.

This indicates that this source
en.bitcoin.it/wiki/Protocol_documentation#Signatures
incorrectly stated that the difference between the two y values is that one is odd and the other is even.

Here is X:
a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61


By using
X = hex_bytes_to_decimal(public_key_bytes)

I find that X_decimal is:
74724975260254336635725088600963580058846919575436399405146825925206718947937

By using
Y = math.sqrt(X**3 + 7)

I find that Y_decimal is:
20426721601886702285677856159981430109269012723278278952324010893280738091637119207319161997770778320257157001379840
- Note: math.sqrt returns a float, so only the first 16 or so digits of this value are meaningful. To display Y in full (i.e. not in scientific notation), use:
print long(Y)



Hm.


Python code excerpt:

public_key_bytes = public_key_data_bytes[1:] # remove first byte
X_decimal = hex_bytes_to_decimal(public_key_bytes)
Y_decimal = math.sqrt(X_decimal**3 + 7)
Y_decimal = int(Y_decimal) # I'm uncertain about this step. However, cryptography works on integers, and the hex() function obviously doesn't accept a float result. 
Y = hex(Y_decimal)
if Y[-1] == "L": Y = Y[:-1] # remove Python's Long indicator
if Y[:2] == "0x": Y = Y[2:] # remove Python's hex indicator
print len(Y)/2


This produces a result of 48 (bytes), but it should be 32.
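
A size estimate explains the 48 bytes. X is a 256-bit number, so X**3 + 7 is about 768 bits long, and its ordinary (real-number) square root is about half that: 384 bits, i.e. 48 bytes. A 32-byte result is only possible if the arithmetic is reduced modulo a 256-bit number. The following is a minimal sketch of that estimate, not one of the scripts in this project. It assumes Python 3.8 or later (for math.isqrt) rather than the Python 2.7 used elsewhere, and byte_length is a helper name introduced here for illustration.

```python
import math  # math.isqrt requires Python 3.8 or later

# X coordinate taken from the compressed public key (prefix byte removed).
x = int("a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261", 16)

# Ordinary (non-modular) integer arithmetic, as in the attempt above.
v = x**3 + 7
root = math.isqrt(v)  # integer square root, rounded down, no float rounding


def byte_length(n):
    # hypothetical helper: number of bytes needed to hold the non-negative integer n
    return (n.bit_length() + 7) // 8


print(byte_length(x))     # 32
print(byte_length(v))     # 96
print(byte_length(root))  # 48
```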


Google "bitcoin uncompress public key".


First result:
bitcoin.stackexchange.com/questions/28398/how-can-i-get-the-uncompressed-public-key-from-the-compressed-public-key-in-open

A comment there by Tim S leads to:
bitcointalk.org/index.php?topic=644919.msg7205689#msg7205689
Author: TimS
Date: June 09, 2014, 12:53:21 AM

Excerpt:

Re: How to get uncompressed public key from compressed one ?

From http://en.wikipedia.org/wiki/Quadratic_residue and http://mersennewiki.org/index.php/Modular_Square_Root, if

r^2 = a mod m where m = 3 mod 4 (as secp256k1's p does)
then
r = +-a^((m+1)/4) mod m

So:

y^2 mod p = (x^3 + 7) mod p
y mod p = +-(x^3 + 7)^((p+1)/4) mod p

So calculate (x^3 + 7)^((p+1)/4) mod p, and if the parity of the first answer you get is wrong, then take the negative of that answer (since we're working modulo an odd number, taking the negative will flip the even/odd parity).

Just for fun, here's some Python code that'll do the calculation in the blink of an eye, preset to work on the public key of (the random) private key 55255657523dd1c65a77d3cb53fcd050bf7fc2c11bb0bb6edabdbd41ea51f641

Code:
python
def pow_mod(x, y, z):
    "Calculate (x ** y) % z efficiently."
    number = 1
    while y:
        if y & 1:
            number = number * x % z
        y >>= 1
        x = x * x % z
    return number

p = 0xfffffffffffffffffffffffffffffffffffffffffffffffffffffffefffffc2f
compressed_key = '0314fc03b8df87cd7b872996810db8458d61da8448e531569c8517b469a119d267'
y_parity = int(compressed_key[:2]) - 2
x = int(compressed_key[2:], 16)
a = (pow_mod(x, 3, p) + 7) % p
y = pow_mod(a, (p+1)//4, p)
if y % 2 != y_parity:
    y = -y % p
uncompressed_key = '04{:x}{:x}'.format(x, y)
print(uncompressed_key)




Ok. The calculation has to be done using modular arithmetic. I don't fully understand the mathematics behind the code, but I'll see if it works.
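
Why the exponent (p+1)/4 works: if a is a square modulo a prime p, then a^((p-1)/2) = 1 mod p (Euler's criterion), so (a^((p+1)/4))^2 = a^((p+1)/2) = a * a^((p-1)/2) = a mod p. The exponent is a whole number only when p = 3 mod 4. The following is a minimal sketch that checks this for the X value under study. It assumes Python 3 and uses the built-in three-argument pow in place of pow_mod (both perform modular exponentiation).

```python
# secp256k1 field prime, the same value as p in the forum post above,
# written in 8-character chunks to make the digit count easy to check.
p = int("ffffffff" * 6 + "fffffffe" + "fffffc2f", 16)

# X coordinate of the public key under study.
x = int("a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261", 16)

# The shortcut r = a^((p+1)/4) mod p needs (p+1)/4 to be a whole number.
assert p % 4 == 3

# Right-hand side of the curve equation y^2 = x^3 + 7, reduced mod p.
a = (pow(x, 3, p) + 7) % p

# Built-in three-argument pow performs modular exponentiation.
r = pow(a, (p + 1) // 4, p)

# r is a square root of a only if a is a square mod p, so check it.
is_root = (r * r) % p == a
print(is_root)  # True

# The other root is p - r. Both are below p, so both fit in 32 bytes.
other_root = p - r
print(max(r, other_root).bit_length() <= 256)  # True
```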




uncompress.py

python 2.7.13
#!/opt/local/bin/python


def main():
	
	public_key_data_byte_string = "02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61"
	public_key_data_byte_string = public_key_data_byte_string.replace(" ","")
	
	print "compressed public key: %s" % public_key_data_byte_string

	public_key_hex_byte_string = uncompress(public_key_data_byte_string)
	
	print "uncompressed public key: %s" % public_key_hex_byte_string


def uncompress(compressed_key):
	# source: http://bitcointalk.org/index.php?topic=644919.msg7205689#msg7205689
	# author: TimS
	# example compressed key: '0314fc03b8df87cd7b872996810db8458d61da8448e531569c8517b469a119d267'
	
	def pow_mod(x, y, z):
		"Calculate (x ** y) % z efficiently."
		number = 1
		while y:
			if y & 1:
				number = number * x % z
			y >>= 1
			x = x * x % z
		return number
	
	p = 0xfffffffffffffffffffffffffffffffffffffffffffffffffffffffefffffc2f
	y_parity = int(compressed_key[:2]) - 2
	x = int(compressed_key[2:], 16)
	a = (pow_mod(x, 3, p) + 7) % p
	y = pow_mod(a, (p+1)//4, p)
	if y % 2 != y_parity:
		y = -y % p
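	# zero-pad each coordinate to 64 hex characters (32 bytes) so that leading zeros are not dropped.
	uncompressed_key = '04{:064x}{:064x}'.format(x, y)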
	return uncompressed_key
	
	
if __name__ == "__main__": main()



aineko:work stjohnpiano$ ./uncompress.py

compressed public key: 02a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261
uncompressed public key: 04a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae226150b222cdcd631b144cfe6ff8df86a5c02e416eb6d10ee9168bda056fc4f855ee



This line:
if y % 2 != y_parity:

suggests that the correct way to distinguish between the two possible Y results really is "even" vs "odd".
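
The two descriptions are compatible. Modulo p, the "negative" of y is p - y. Because p is odd, y and p - y always have opposite parity, so "even vs odd" selects between the same two roots as "positive vs negative". The following is a small check using the Y value printed above (a sketch assuming Python 3; on_curve is a helper name introduced here).

```python
# secp256k1 field prime, written in 8-character chunks.
p = int("ffffffff" * 6 + "fffffffe" + "fffffc2f", 16)

x = int("a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261", 16)
y = int("50b222cdcd631b144cfe6ff8df86a5c02e416eb6d10ee9168bda056fc4f855ee", 16)

# The second square root: the negative of y, taken modulo p.
other_y = p - y


def on_curve(x, y):
    # hypothetical helper: does (x, y) satisfy y^2 = x^3 + 7 (mod p)?
    return (y * y - (x**3 + 7)) % p == 0


print(on_curve(x, y), on_curve(x, other_y))  # True True

# Opposite parity: the "02" prefix corresponds to the even root.
print(y % 2, other_y % 2)  # 0 1
```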



Hm. I get a 65-byte result for the uncompressed public key:
04a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae226150b222cdcd631b144cfe6ff8df86a5c02e416eb6d10ee9168bda056fc4f855ee

A "04" prefix and two 32-byte values for X and Y.

X: a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261
Y: 50b222cdcd631b144cfe6ff8df86a5c02e416eb6d10ee9168bda056fc4f855ee

Now let's see if that actually works.


I can use parts of the ecdsa library to check the validity of a point.



check_validity_of_point.py

python 2.7.13
#!/opt/local/bin/python

from ecdsa.ecdsa import *

X="a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261"
Y="50b222cdcd631b144cfe6ff8df86a5c02e416eb6d10ee9168bda056fc4f855ee"

# convert from hex to decimal integers
X_decimal = int(X,16)
Y_decimal = int(Y,16)

validity = point_is_valid( generator=generator_secp256k1, x=X_decimal, y=Y_decimal )
print validity



aineko:work stjohnpiano$ ./check_validity_of_point.py

True



Ok. It's a valid point.




check_hash_and_signature1.py can currently:
- confirm that the public key in a new transaction hashes to the hash stored in the scriptPubKey of the relevant previous transaction.
- get the signable form of a transaction.
- apply sha256 twice to the transaction-in-signable-form.


Let's store a checkpoint:



check_hash_and_signature1.py

python 2.7.13
#!/opt/local/bin/python

import ecdsa
import ecdsa.der

from binascii import hexlify, unhexlify
from pypy_sha256 import sha256
from bjorn_edstrom_ripemd160 import RIPEMD160


def main():

	# data from previous transaction
	public_key_hash = "7e 76 71 29 b1 5e e6 d7 e6 be 8a c3 a1 00 dd 29 c4 c6 7e 63"
	scriptPubKey = "76 a9 14 7e 76 71 29 b1 5e e6 d7 e6 be 8a c3 a1 00 dd 29 c4 c6 7e 63 88 ac"

	# data from new transaction
	public_key_data = "02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61"

	# new transaction
	raw_transaction_file_path = "tr_s2.txt"
	raw_transaction = file_get_contents(raw_transaction_file_path)
	transaction = Transaction()
	transaction.parse_raw_transaction(raw_transaction)
	
	# hash of public key data
	digest = op_hash160(public_key_data)
	
	public_key_hash_byte_string = public_key_hash.replace(" ","")
	
	print ""
	print "public key hash (previous transaction): %s" % public_key_hash
	print "- public key hash without spaces: %s" % public_key_hash_byte_string
	print "public key data (new transaction): %s" % public_key_data

	# convert the byte sequence to hex for display. 
	digest_byte_string = hexlify(digest)
	print "ripemd160(sha256(public_key_data)): %s" % digest_byte_string
	
	# confirm that the hash of the public key data matches the hash in the scriptPubKey in tr_s1. 
	assert(digest_byte_string == public_key_hash_byte_string)
	print "success: calculated hash of the public key data (new transaction) matches the stored hash in the scriptPubKey (previous transaction)"
	
	# get new transaction in signable form.
	input_index = 0
	scriptPubKey_bytes = scriptPubKey.split(" ")
	signable_transaction_hex_bytes = transaction.get_signable_hex_bytes(input_index, scriptPubKey_bytes)
	signable_transaction_hex_byte_string = "".join(signable_transaction_hex_bytes)
	
	# get double sha256 hash of transaction-in-signable-form
	digest_hex = sha256_hex(signable_transaction_hex_byte_string)
	digest_hex = sha256_hex(digest_hex)
	
	print "double sha256 digest of the new transaction in signable form:\n%s" % digest_hex
	
	print ""




def op_hash160(hex_byte_string):
	# expects hex_byte_string (optionally with spaces)
	# returns byte sequence
	
	hex_byte_string = hex_byte_string.replace(" ","")

	# convert to a byte sequence that can be used as input to the sha256 hash function. 
	byte_sequence = unhexlify(hex_byte_string)

	# the digest (i.e. the result of hashing the input) will be a byte sequence. 
	sha256_digest = sha256(byte_sequence).digest()

	# the sha256_digest is already a raw byte sequence that can be used as input to the ripemd160 hash function. 
	ripemd160_digest = RIPEMD160(sha256_digest).digest()
	
	return ripemd160_digest


def sha256_hex(hex_byte_string):
	# expects hex byte string, optionally separated by spaces.
	# returns hex byte string, not space-separated.
	hex_byte_string = hex_byte_string.replace(" ","")
	# convert to a byte sequence that can be used as input to the sha256 hash function. 
	byte_sequence = unhexlify(hex_byte_string)
	# the digest (i.e. the result of hashing the input) will be a byte sequence. 
	sha256_digest = sha256(byte_sequence).digest()
	sha256_digest_hex_byte_string = hexlify(sha256_digest)
	return sha256_digest_hex_byte_string



class Transaction:


	# expected format (raw hex bytes):
	# - version: 4 bytes
	# - input_count: (var_int) 1 to 9 bytes
	# - inputs: 
	# -- previous_output_hash: 32 bytes (little-endian)
	# -- previous_output_index: 4 bytes (little-endian)
	# -- script_length: (var_int) 1 to 9 bytes
	# -- scriptSig
	# -- sequence: 4 bytes
	# - output_count: (var_int) 1 to 9 bytes
	# - outputs:
	# -- value: 8 bytes (little-endian)
	# -- script_length: (var_int) 1 to 9 bytes
	# -- scriptPubKey
	# - block lock time: 4 bytes
	
	# notes:
	# - first byte of var_int indicates its total byte length. More details are in the comments for the get_length_of_var_int function.
	
	# questions:
	# - is sequence little-endian? 
	# - I recall that block_lock_time has an odd format. 
	
	# warning:
	# - other transactions could have formats that don't fit the expected format. 
	
	
	def __init__(self):
		self.raw_transaction = None
		self.hex_bytes = None
		self.byte_length = None
		self.version = None
		self.input_count_bytes = None
		self.input_count_decimal = None # derived property
		self.inputs = None # list, not byte string
		self.output_count_bytes = None
		self.output_count_decimal = None # derived property
		self.outputs = None # list, not byte string
		self.block_lock_time = None
	
	
	def get_signable_hex_bytes(self, input_index, scriptPubKey_bytes):
		# expects input_index as integer and scriptPubKey_bytes as list of hex bytes. 
		# returns list of hex bytes
		# input_index = the position of the input whose private key is or will be used to sign this transaction. The scriptPubKey to which this input was paid will be substituted for the scriptSig of this input. 
		# future: test to see if 0x00 or nothing should be substituted for the scriptSigs of other inputs.
		if len(self.inputs) > 1:
			stop("can't handle multiple inputs at the moment.")
		input = self.inputs[input_index]
		new_script_length = get_var_int_prefix(scriptPubKey_bytes)
		new_scriptSig = scriptPubKey_bytes
		original_script_length = input["script_length"]
		original_scriptSig = input["scriptSig"]
		# perform substitution
		input["script_length"] = new_script_length
		input["scriptSig"] = new_scriptSig
		hex_bytes = self.get_hex_bytes()
		# change back
		input["script_length"] = original_script_length
		input["scriptSig"] = original_scriptSig
		hex_bytes += ["01","00","00","00"] # add hash type (in 4 byte form)
		return hex_bytes
		
		
	def get_hex_bytes(self):
		hex_bytes = self.version + self.input_count_bytes
		for input in self.inputs:
			hex_bytes += input["previous_output_hash"]
			hex_bytes += input["previous_output_index"]
			hex_bytes += input["script_length"]
			hex_bytes += input["scriptSig"]
			hex_bytes += input["sequence"]
		hex_bytes += self.output_count_bytes
		for output in self.outputs:
			hex_bytes += output["value"]
			hex_bytes += output["script_length"]
			hex_bytes += output["scriptPubKey"]
		hex_bytes += self.block_lock_time
		return hex_bytes
	

	def parse_raw_transaction(self, raw_transaction):
		self.check_raw_transaction(raw_transaction)
		hex_bytes = [raw_transaction[i:i+2] for i in xrange(0, len(raw_transaction), 2)]
		version = hex_bytes[0:4]
		index = 4 # this variable stores the index of the-next-byte-to-process in the hex_bytes list (which is 0-indexed).
		# get number of inputs
		input_count_bytes, index = get_var_int_from_hex_bytes(hex_bytes, index)
		input_count_decimal = hex_bytes_to_decimal(input_count_bytes)
		# prepare the storage for the inputs
		input = {
			"previous_output_hash": None, 
			"previous_output_index": None,
			"script_length": None,
			"scriptSig": None,
			"sequence": None,
			}
		inputs = [None]*input_count_decimal
		for i in xrange(input_count_decimal):
			# explicitly copy the dictionary to make sure that the different inputs are different objects. 
			inputs[i] = input.copy()
		# parse each input item
		for i in xrange(input_count_decimal):
			previous_output_hash = hex_bytes[index:(index+32)]
			index += 32
			previous_output_index = hex_bytes[index:(index+4)]
			index += 4
			script_length_bytes, index = get_var_int_from_hex_bytes(hex_bytes, index)
			script_length_decimal = hex_bytes_to_decimal(script_length_bytes)
			scriptSig = hex_bytes[index:(index+script_length_decimal)]
			index += script_length_decimal
			sequence = hex_bytes[index:(index+4)]
			index += 4
			# store the input bytes
			input = inputs[i]
			input["previous_output_hash"] = previous_output_hash
			input["previous_output_index"] = previous_output_index
			input["script_length"] = script_length_bytes
			input["scriptSig"] = scriptSig
			input["sequence"] = sequence
		# get number of outputs
		output_count_bytes, index = get_var_int_from_hex_bytes(hex_bytes, index)
		output_count_decimal = hex_bytes_to_decimal(output_count_bytes)
		# prepare the storage for the outputs
		output = {
			"value": None, 
			"script_length": None,
			"scriptPubKey": None,
			}
		outputs = [None]*output_count_decimal
		for i in xrange(output_count_decimal):
			# explicitly copy the dictionary to make sure that the different outputs are different objects.
			outputs[i] = output.copy()
		# parse each output item
		for i in xrange(output_count_decimal):
			value = hex_bytes[index:(index+8)]
			index += 8
			script_length_bytes, index = get_var_int_from_hex_bytes(hex_bytes, index)
			script_length_decimal = hex_bytes_to_decimal(script_length_bytes)
			scriptPubKey = hex_bytes[index:(index+script_length_decimal)]
			index += script_length_decimal
			# store the output bytes
			output = outputs[i]
			output["value"] = value
			output["script_length"] = script_length_bytes
			output["scriptPubKey"] = scriptPubKey
		# get the block_lock_time
		block_lock_time = hex_bytes[index:(index+4)]
		index += 4
		# confirm that we have traversed the exact length of the raw transaction by checking whether:
		# - the index of the-next-byte-to-be-processed (in a 0-indexed list) == the raw-transaction-byte-length (1-indexed)
		assert index == len(hex_bytes)
		# store the results of parsing the raw transaction in this instance's properties. 
		self.raw_transaction = raw_transaction
		self.hex_bytes = hex_bytes
		self.byte_length = len(hex_bytes)
		self.version = version
		self.input_count_bytes = input_count_bytes
		self.input_count_decimal = input_count_decimal
		self.inputs = inputs
		self.output_count_bytes = output_count_bytes
		self.output_count_decimal = output_count_decimal
		self.outputs = outputs
		self.block_lock_time = block_lock_time
		
	
	def to_string(self):
		# return a printable string representation of self.
		s = "\nTransaction:"
		s += "\n- [derived property] byte length: %s" % self.byte_length
		s += "\n- version: %s" % hex_bytes_to_string(self.version)
		s += "\n- input_count: %s" % hex_bytes_to_string(self.input_count_bytes)
		s += "\n- [derived property] input_count_decimal: %d" % self.input_count_decimal
		for i, input in enumerate(self.inputs): # same order as found within transaction.
			s += "\n- input #%d:" % i
			s += "\n-- previous_output_hash: %s" % hex_bytes_to_string(input["previous_output_hash"])
			previous_output_hash_big_endian = list(reversed(input["previous_output_hash"]))
			
			s += "\n-- [derived property] previous_output_hash (no spaces, big-endian): %s" % "".join(previous_output_hash_big_endian)
			s += "\n-- previous_output_index: %s" % hex_bytes_to_string(input["previous_output_index"])
			previous_output_index_big_endian = list(reversed(input["previous_output_index"]))
			s += "\n-- [derived property] previous_output_index_decimal: %d" % hex_bytes_to_decimal(previous_output_index_big_endian)
			s += "\n-- script_length: %s" % hex_bytes_to_string(input["script_length"])
			s += "\n-- [derived property] script_length_decimal: %d" % hex_bytes_to_decimal(input["script_length"])
			s += "\n-- scriptSig: %s" % hex_bytes_to_string(input["scriptSig"])
			s += "\n-- sequence: %s" % hex_bytes_to_string(input["sequence"])
		s += "\n- output_count: %s" % hex_bytes_to_string(self.output_count_bytes)
		s += "\n- [derived property] output_count_decimal: %d" % self.output_count_decimal
		for i, output in enumerate(self.outputs): # same order as found within transaction.
			s += "\n- output #%d:" % i
			s += "\n-- value: %s" % hex_bytes_to_string(output["value"])
			s += "\n-- [derived property] output value in bitcoin: %s" % self.convert_output_value_to_bitcoin_string(output["value"])
			s += "\n-- script_length: %s" % hex_bytes_to_string(output["script_length"])
			s += "\n-- [derived property] script_length_decimal: %s" % hex_bytes_to_decimal(output["script_length"])
			s += "\n-- scriptPubKey: %s" % hex_bytes_to_string(output["scriptPubKey"])
		s += "\n"
		return s
		
	
	def convert_output_value_to_bitcoin_string(self, value):
		# convert 8-byte output value (little-endian list of hex bytes) into bitcoin value. 
		value = list(reversed(value)) # change to big-endian
		vs = "".join(value) # vs = value string
		vs_decimal = str(hex_string_to_decimal(vs)) # value string in decimal form
		n = len(vs_decimal) # number of characters in the decimal form of the value string.
		if n - 8 > 0: # more than 8 characters i.e. at least 1 btc. bitcoin has 8 decimal places.
			# add the decimal point at the appropriate position within the value string.
			section1 = vs_decimal[:-8]
			section2 = vs_decimal[-8:]
			vs_decimal = section1 + "." + section2
		else:
			# 8 or fewer characters i.e. less than 1 btc.
			# add a 0, a decimal point, and the appropriate number of 0s in front of the value string.
			n_zeros = abs(n-8) # appropriate number of zeros
			vs_decimal = "0." + "0"*n_zeros + vs_decimal
		return vs_decimal
	
	
	def check_raw_transaction(self, input):
		# input should be an even number of lower-case hex byte characters, with no spaces.
		permitted_characters = "0123456789abcdef"
		for character in input:
			if character not in permitted_characters:
				stop("character %s is not in permitted characters list: %s" % (character, permitted_characters))
		if len(input) % 2 != 0:
			stop("length of input string is not even")
		return 0



def hex_bytes_to_string(hex_bytes):
	hex_byte_string = " ".join(hex_bytes) # space-separated hex byte string
	return hex_byte_string


def hex_bytes_to_decimal(hex_bytes):
	# expects list of hex bytes
	hex_byte_string = "".join(hex_bytes)
	decimal_value = hex_string_to_decimal(hex_byte_string)
	return decimal_value


def hex_string_to_decimal(string):
	# expects string of hex bytes, without spaces.
	if string.count(" ") > 0: 
		stop("string (value=%s) contains at least one space." % string)
	total = 0
	n = len(string)
	for i in xrange(n):
		c = string[i] # c = character
		v = hex_character_to_decimal(c) # v = value
		power = n - i - 1 # first character has the highest power of 16. power of last character is 0. 
		total += v*(16**power)
	return total


def hex_character_to_decimal(c):
	decimal_digits = "0123456789"
	hex_digits = {"a": 10, "b": 11, "c": 12, "d": 13, "e": 14, "f": 15}
	if c in decimal_digits:
		return int(c)
	elif c in hex_digits.keys():
		return hex_digits[c]
	stop("character %s is not a hex character." % c)


def get_var_int_prefix(hex_bytes):
	# expects hex bytes list
	# returns var_int prefix bytes (that describe the length of the hex bytes list).
	if not isinstance(hex_bytes, list):
		stop("expected list, got %s" % type(hex_bytes))
	n = len(hex_bytes)
	if n >= 253: # "fd" in hex. 
		stop("get_var_int_prefix mark1: unwritten section")
	else:
		# a length below 253 is encoded as a single byte.
		# zero-pad to two hex characters so that lengths below 16 still produce a full byte (e.g. "0f", not "f").
		length_byte = "%02x" % n
		return [length_byte]
		
	
def get_var_int_from_hex_bytes(hex_bytes, index):
	# expects hex bytes list and the index of the-next-byte-to-process. 
	# returns var_int hex bytes list and the index of the-next-byte-to-process. 
	byte_1 = hex_bytes[index]
	byte_length = get_length_of_var_int(byte_1)
	if byte_length > 1:
		# future: handle var_ints longer than 1 byte here.
		stop("get_var_int_from_hex_bytes mark1: unwritten section")
	elif byte_length == 1:
		var_int = [byte_1]
		index += 1
	return [var_int, index]
	

def get_length_of_var_int(hex_byte):
	# expected format:
	# - single hex byte (two characters).
	# - certain high byte values indicate that N further bytes are actually part of the value of this variable_integer.
	# source:
	# http://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_integer
	if hex_byte == "ff": 
		# next 8 bytes are a uint64 (unsigned 64-bit integer)
		return 9
	elif hex_byte == "fe":
		# next 4 bytes are a uint32
		return 5
	elif hex_byte == "fd":
		# next 2 bytes are a uint16
		return 3
	# otherwise, this byte is the value of the variable integer, and the length of the integer is 1 byte (uint8). 
	return 1


def file_get_contents(file_path):
	from os.path import isfile
	if not isfile(file_path): 
		stop("%s is not a file." % file_path)
	f = open(file_path, "r")
	data = f.read()
	f.close()
	return data
	

def stop(message):
	# raise an Exception to get a traceback. 
	raise Exception("ERROR: %s\n" % message)


if __name__ == "__main__": main()



aineko:work stjohnpiano$ ./check_hash_and_signature1.py


public key hash (previous transaction): 7e 76 71 29 b1 5e e6 d7 e6 be 8a c3 a1 00 dd 29 c4 c6 7e 63
- public key hash without spaces: 7e767129b15ee6d7e6be8ac3a100dd29c4c67e63
public key data (new transaction): 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61
ripemd160(sha256(public_key_data)): 7e767129b15ee6d7e6be8ac3a100dd29c4c67e63
success: calculated hash of the public key data (new transaction) matches the stored hash in the scriptPubKey (previous transaction)
double sha256 digest of the new transaction in signable form:
a309d05d98ec0f328ef84e436c730fb4e7a0471ace0f8a44a3efa11866a5a796
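
This digest can be reproduced independently of the Transaction class. The following is a minimal sketch, assuming Python 3 and the standard hashlib module (a substitution for the pypy_sha256 module used by the script). The field values are those of tr_s2, with the scriptSig replaced by the scriptPubKey from tr_s1 and the 4-byte hash type appended, as get_signable_hex_bytes does. The variable names are chosen here for illustration.

```python
import hashlib
from binascii import hexlify, unhexlify

# Fields of tr_s2 in signable form, in serialization order.
signable_fields = [
    "01000000",  # version
    "01",        # input_count
    "4f8cad2ca3b602750e06c4300d91b22c17dd97bb5ce635d860df8b0ab325b336",  # previous_output_hash
    "01000000",  # previous_output_index
    "19",        # script_length of the substituted script (25 bytes)
    "76a9147e767129b15ee6d7e6be8ac3a100dd29c4c67e6388ac",  # scriptPubKey from tr_s1
    "fdffffff",  # sequence
    "02",        # output_count
    "e8d9410000000000",  # output 0: value
    "19",                # output 0: script_length
    "76a914338cdde52f708236affa5675f969606ff846ee6f88ac",  # output 0: scriptPubKey
    "c4716e0100000000",  # output 1: value
    "19",                # output 1: script_length
    "76a9143496950f1a01285a1605ac9337c06a5596b9fcd888ac",  # output 1: scriptPubKey
    "caee0700",  # block lock time
    "01000000",  # hash type, in 4-byte form
]

signable_hex = "".join(signable_fields)
signable_bytes = unhexlify(signable_hex)

# Apply sha256 twice: the second pass hashes the raw bytes of the first digest.
first_digest = hashlib.sha256(signable_bytes).digest()
second_digest = hashlib.sha256(first_digest).digest()
digest_hex = hexlify(second_digest).decode("ascii")

# Compare this with the digest printed by check_hash_and_signature1.py.
print(digest_hex)
```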





Inputs for check_signature1.py:
- double sha256 of transaction-in-signable-form:
a309d05d98ec0f328ef84e436c730fb4e7a0471ace0f8a44a3efa11866a5a796
- uncompressed public key:
04a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae226150b222cdcd631b144cfe6ff8df86a5c02e416eb6d10ee9168bda056fc4f855ee
- signature data: 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01

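All of these come from tr_s2 (the digest also depends on the scriptPubKey from tr_s1, and the uncompressed public key is derived from the compressed key in the scriptSig).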


Remove the last byte (hash_type == 01) from the signature and add spaces to the other data items.

Inputs for check_signature1.py:
- double sha256 of transaction-in-signable-form: a3 09 d0 5d 98 ec 0f 32 8e f8 4e 43 6c 73 0f b4 e7 a0 47 1a ce 0f 8a 44 a3 ef a1 18 66 a5 a7 96
- uncompressed public key: 04 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61 50 b2 22 cd cd 63 1b 14 4c fe 6f f8 df 86 a5 c0 2e 41 6e b6 d1 0e e9 16 8b da 05 6f c4 f8 55 ee
- signature data: 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50
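
The DER structure of the signature data can also be read without the ecdsa library: 30 (sequence) and a length byte, then two integers, each introduced by 02 and a length byte. The first integer is r (33 bytes here, because a 00 byte is prepended when the top bit is set) and the second is s. The following is a minimal sketch, assuming Python 3. parse_der_signature is a hypothetical helper that handles only single-byte lengths, which is enough for this signature.

```python
def parse_der_signature(der_hex):
    # expects a hex string (optionally space-separated) without the hash type byte.
    # returns the two integers (r, s).
    data = bytes.fromhex(der_hex.replace(" ", ""))
    assert data[0] == 0x30, "expected a DER sequence"
    assert data[1] == len(data) - 2, "sequence length does not match the data"
    position = 2
    integers = []
    for _ in range(2):
        assert data[position] == 0x02, "expected a DER integer"
        length = data[position + 1]
        start = position + 2
        integers.append(int.from_bytes(data[start:start + length], "big"))
        position = start + length
    assert position == len(data), "unexpected trailing bytes"
    return integers[0], integers[1]


der_sig_hex = (
    "30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad "
    "33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b "
    "19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50"
)

r, s = parse_der_signature(der_sig_hex)
print("r: %064x" % r)
print("s: %064x" % s)
```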





Rewrite check_signature1.py a bit.




check_signature1.py

python 2.7.13
#!/opt/local/bin/python

import ecdsa
import ecdsa.der

def main():

	from binascii import hexlify, unhexlify

	# all data comes from a new transaction. 

	# double sha256 hash digest of transaction-in-signable-form. 
	signable_tx_hash_hex = "a3 09 d0 5d 98 ec 0f 32 8e f8 4e 43 6c 73 0f b4 e7 a0 47 1a ce 0f 8a 44 a3 ef a1 18 66 a5 a7 96"
	
	der_sig_hex = "30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50"
	
	public_key_hex = "04 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61 50 b2 22 cd cd 63 1b 14 4c fe 6f f8 df 86 a5 c0 2e 41 6e b6 d1 0e e9 16 8b da 05 6f c4 f8 55 ee"
	
	signable_tx_hash_hex_ns = signable_tx_hash_hex.replace(" ","") # ns = no spaces
	der_sig_hex_ns = der_sig_hex.replace(" ","")
	public_key_hex_ns = public_key_hex.replace(" ","")
	
	# remove "04" prefix.
	assert(public_key_hex_ns[:2] == "04")
	public_key_hex_ns = public_key_hex_ns[2:] 
	
	# convert sig from DER encoding to the raw 64-byte form (r followed by s), hex-encoded
	sig_hex_ns = derSigToHexSig(der_sig_hex_ns)
	
	# convert sig, public key, and hash to raw byte sequences
	hash_raw_byte_string = unhexlify(signable_tx_hash_hex_ns)
	sig_raw_byte_string = unhexlify(sig_hex_ns)
	public_key_raw_byte_string = unhexlify(public_key_hex_ns)
	
	# create a key object from the raw-byte-sequence public key.
	vk = ecdsa.VerifyingKey.from_string(public_key_raw_byte_string, curve=ecdsa.SECP256k1)
	
	# verify the signature
	verification_result = vk.verify_digest(sig_raw_byte_string, hash_raw_byte_string)
	
	print "signable_tx_hash_hex (%d bytes): %s" % (len(signable_tx_hash_hex_ns)/2, signable_tx_hash_hex_ns)
	print "sig_hex (%d bytes): %s" % (len(sig_hex_ns)/2, sig_hex_ns)
	print "public_key_hex (%d bytes): %s" % (len(public_key_hex_ns)/2, public_key_hex_ns)
	print "type(verification_result): %s" % type(verification_result)
	print "verification_result: %s" % verification_result
	


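# Input is a hex-encoded, DER-encoded signature (without the trailing hash type byte)
# Output is a hex-encoded 64-byte signature: r followed by s, each zero-padded to 32 bytes
def derSigToHexSig(s):
	# strip the outer DER sequence. Anything after the sequence is unexpected.
	s, junk = ecdsa.der.remove_sequence(s.decode('hex'))
	if junk != '':
		print 'JUNK', junk.encode('hex')
	assert(junk == '')
	# the sequence holds two integers: x is the signature's r value and y is its s value.
	x, s = ecdsa.der.remove_integer(s)
	y, s = ecdsa.der.remove_integer(s)
	return '%064x%064x' % (x, y)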


if __name__ == "__main__": main()



aineko:work stjohnpiano$ ./check_signature1.py

signable_tx_hash_hex (32 bytes): a309d05d98ec0f328ef84e436c730fb4e7a0471ace0f8a44a3efa11866a5a796
sig_hex (64 bytes): d8321c3c4b3fc3a34e30547d1cb6aea6855cad330b590b53e824f647dd4923991e9056e82297a2108b1969cc497150734ab78f51cbf979885afc104f70a65150
public_key_hex (64 bytes): a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae226150b222cdcd631b144cfe6ff8df86a5c02e416eb6d10ee9168bda056fc4f855ee
type(verification_result): <type 'bool'>
verification_result: True






Excellent.


Digital signature for tr_s2 found to be valid.

Earlier the public key for tr_s2 was found to be valid (its hash was the expected value).
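
The standard ECDSA verification equation, which is what verify_digest is expected to evaluate, can be written out with plain integer arithmetic. The following is an illustrative sketch only, assuming Python 3. It is not the ecdsa library's implementation and is not hardened for real use (for example, it does not check that the public key lies on the curve). The constants n and G are the published secp256k1 parameters; the function names inverse, add, multiply and verify are introduced here.

```python
p = int("ffffffff" * 6 + "fffffffe" + "fffffc2f", 16)  # field prime
n = int("ffffffff" * 3 + "fffffffe" + "baaedce6af48a03bbfd25e8cd0364141", 16)  # group order
G = (int("79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798", 16),
     int("483ada7726a3c4655da4fbfc0e1108a8fd17b448a68554199c47d08ffb10d4b8", 16))


def inverse(a, m):
    # modular inverse via Fermat's little theorem. Valid because p and n are prime.
    return pow(a % m, m - 2, m)


def add(P, Q):
    # add two curve points. None represents the point at infinity.
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        slope = 3 * x1 * x1 * inverse(2 * y1, p) % p  # tangent line
    else:
        slope = (y2 - y1) * inverse(x2 - x1, p) % p   # chord line
    x3 = (slope * slope - x1 - x2) % p
    y3 = (slope * (x1 - x3) - y1) % p
    return (x3, y3)


def multiply(k, P):
    # double-and-add scalar multiplication.
    result = None
    while k:
        if k & 1:
            result = add(result, P)
        P = add(P, P)
        k >>= 1
    return result


def verify(digest_hex, r, s, public_point):
    if not (1 <= r < n and 1 <= s < n):
        return False
    z = int(digest_hex, 16)  # digest read as a big-endian integer
    w = inverse(s, n)
    point = add(multiply(z * w % n, G), multiply(r * w % n, public_point))
    return point is not None and point[0] % n == r


digest_hex = "a309d05d98ec0f328ef84e436c730fb4e7a0471ace0f8a44a3efa11866a5a796"
r = int("d8321c3c4b3fc3a34e30547d1cb6aea6855cad330b590b53e824f647dd492399", 16)
s = int("1e9056e82297a2108b1969cc497150734ab78f51cbf979885afc104f70a65150", 16)
public_point = (
    int("a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261", 16),
    int("50b222cdcd631b144cfe6ff8df86a5c02e416eb6d10ee9168bda056fc4f855ee", 16),
)

print(verify(digest_hex, r, s, public_point))  # True
```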


Next: Move the functionality of uncompress.py, check_validity_of_point.py, and check_signature1.py into check_hash_and_signature1.py.



Hm. Actually, I think I'll keep everything as separate files for the moment.


Actually, I should split the public key hash check and the double sha256 calculation into separate files. I'll verify the transaction again and, as I do so, I'll perform the split and record the sequence of operations in the verification process.







### START VERIFICATION PROCESS

selected raw transaction, tr_s2:

01000000014f8cad2ca3b602750e06c4300d91b22c17dd97bb5ce635d860df8b0ab325b336010000006b483045022100d8321c3c4b3fc3a34e30547d1cb6aea6855cad330b590b53e824f647dd49239902201e9056e82297a2108b1969cc497150734ab78f51cbf979885afc104f70a65150012102a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261fdffffff02e8d94100000000001976a914338cdde52f708236affa5675f969606ff846ee6f88acc4716e01000000001976a9143496950f1a01285a1605ac9337c06a5596b9fcd888accaee0700


it's stored in file tr_s2.txt.


use parser2.py, targeted at tr_s2.txt, to parse the raw transaction format.

aineko:work stjohnpiano$ ./parser2.py


Transaction:
- [derived property] byte length: 226
- version: 01 00 00 00
- input_count: 01
- [derived property] input_count_decimal: 1
- input #0:
-- previous_output_hash: 4f 8c ad 2c a3 b6 02 75 0e 06 c4 30 0d 91 b2 2c 17 dd 97 bb 5c e6 35 d8 60 df 8b 0a b3 25 b3 36
-- [derived property] previous_output_hash (no spaces, big-endian): 36b325b30a8bdf60d835e65cbb97dd172cb2910d30c4060e7502b6a32cad8c4f
-- previous_output_index: 01 00 00 00
-- [derived property] previous_output_index_decimal: 1
-- script_length: 6b
-- [derived property] script_length_decimal: 107
-- scriptSig: 48 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01 21 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61
-- sequence: fd ff ff ff
- output_count: 02
- [derived property] output_count_decimal: 2
- output #0:
-- value: e8 d9 41 00 00 00 00 00
-- [derived property] output value in bitcoin: 0.04315624
-- script_length: 19
-- [derived property] script_length_decimal: 25
-- scriptPubKey: 76 a9 14 33 8c dd e5 2f 70 82 36 af fa 56 75 f9 69 60 6f f8 46 ee 6f 88 ac
- output #1:
-- value: c4 71 6e 01 00 00 00 00
-- [derived property] output value in bitcoin: 0.24015300
-- script_length: 19
-- [derived property] script_length_decimal: 25
-- scriptPubKey: 76 a9 14 34 96 95 0f 1a 01 28 5a 16 05 ac 93 37 c0 6a 55 96 b9 fc d8 88 ac



Note that the previous_output_index is "1".
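
The "[derived property]" decimal values in the parser output come from reading the multi-byte fields as little-endian integers. Here is a minimal sketch of that conversion, assuming Python 3 (the scripts in this log are Python 2.7). The function names are hypothetical and are not taken from parser2.py.

```python
def little_endian_hex_to_int(hex_byte_string):
    # Accepts hex bytes (optionally with spaces) in the order found in the raw transaction.
    raw = bytes.fromhex(hex_byte_string.replace(" ", ""))
    return int.from_bytes(raw, "little")


def satoshi_to_bitcoin_string(satoshi):
    # Output values are stored in units of 10^-8 bitcoin.
    # Integer arithmetic avoids floating-point rounding.
    return "%d.%08d" % (satoshi // 10**8, satoshi % 10**8)


previous_output_index = little_endian_hex_to_int("01 00 00 00")
output0_value = little_endian_hex_to_int("e8 d9 41 00 00 00 00 00")
output1_value = little_endian_hex_to_int("c4 71 6e 01 00 00 00 00")

print(previous_output_index)
print(satoshi_to_bitcoin_string(output0_value))
print(satoshi_to_bitcoin_string(output1_value))
```

The three printed values should agree with the previous_output_index_decimal and "output value in bitcoin" lines in the parser2.py output above.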


From the single input, get the value of "[derived property] previous_output_hash (no spaces, big-endian)":
36b325b30a8bdf60d835e65cbb97dd172cb2910d30c4060e7502b6a32cad8c4f


Construct a query with which to retrieve this previous transaction in raw form:

http://blockchain.info/tx/36b325b30a8bdf60d835e65cbb97dd172cb2910d30c4060e7502b6a32cad8c4f?format=hex

Save the resulting raw transaction in the file tr_s1.txt (it's the same as before - no change).
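
The previous_output_hash is stored in the raw transaction in the reverse byte order to the one used in the query URL. Here is a minimal sketch of the reversal and the URL construction, assuming Python 3. The function name to_display_txid is hypothetical.

```python
def to_display_txid(previous_output_hash):
    # Reverse the byte order (not the character order) of the 32-byte hash.
    raw = bytes.fromhex(previous_output_hash.replace(" ", ""))
    return raw[::-1].hex()


previous_output_hash = "4f 8c ad 2c a3 b6 02 75 0e 06 c4 30 0d 91 b2 2c 17 dd 97 bb 5c e6 35 d8 60 df 8b 0a b3 25 b3 36"
txid = to_display_txid(previous_output_hash)
url = "http://blockchain.info/tx/%s?format=hex" % txid

print(txid)
print(url)
```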


Use parser2.py, targeted at tr_s1.txt, to parse the raw transaction format.


aineko:work stjohnpiano$ ./parser2.py


Transaction:
- [derived property] byte length: 225
- version: 01 00 00 00
- input_count: 01
- [derived property] input_count_decimal: 1
- input #0:
-- previous_output_hash: 80 39 45 b5 27 8f c2 90 94 f3 ac a6 b3 28 e5 f0 b4 78 a9 fb 82 59 49 c8 a9 d5 40 9a 5d 66 d9 d0
-- [derived property] previous_output_hash (no spaces, big-endian): d0d9665d9a40d5a9c8495982fba978b4f0e528b3a6acf39490c28f27b5453980
-- previous_output_index: 01 00 00 00
-- [derived property] previous_output_index_decimal: 1
-- script_length: 6a
-- [derived property] script_length_decimal: 106
-- scriptSig: 47 30 44 02 20 62 17 b2 f6 c5 40 c1 71 b8 61 41 a4 44 9a e1 eb d1 e9 35 6f 41 c7 37 73 bb 71 ab f6 72 39 2a 21 02 20 33 25 3d e3 7a 4c c9 08 83 00 9b 3b a5 7b 66 5d 04 11 0d dc f2 d2 01 8f 1d 41 1c 94 a9 0a f7 21 01 21 02 30 43 a8 7b 93 0b 2a 2b 42 ab b8 56 20 25 7b ca 8a 53 51 98 a0 80 c6 c6 ec a1 04 93 20 42 cc a0
-- sequence: fd ff ff ff
- output_count: 02
- [derived property] output_count_decimal: 2
- output #0:
-- value: d8 4f 9a 01 00 00 00 00
-- [derived property] output value in bitcoin: 0.26890200
-- script_length: 19
-- [derived property] script_length_decimal: 25
-- scriptPubKey: 76 a9 14 1b bb ce 69 2a 9c 85 ac 25 cc b2 6c 1d 9a 4a 0f 03 36 83 32 88 ac
- output #1:
-- value: 23 52 b6 01 00 00 00 00
-- [derived property] output value in bitcoin: 0.28725795
-- script_length: 19
-- [derived property] script_length_decimal: 25
-- scriptPubKey: 76 a9 14 7e 76 71 29 b1 5e e6 d7 e6 be 8a c3 a1 00 dd 29 c4 c6 7e 63 88 ac




Get the tr_s2_scriptSig:
48 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01 21 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61


From the second output (output index 1) of tr_s1, get the scriptPubKey:
76 a9 14 7e 76 71 29 b1 5e e6 d7 e6 be 8a c3 a1 00 dd 29 c4 c6 7e 63 88 ac



tr_s2_scriptSig satisfies the cryptographic condition set by tr_s1_output1_scriptPubKey, which allows tr_s2 to use tr_s1_output1 as an input.


Take this scriptSig and scriptPubKey and use them as inputs to script_processor1.py.


aineko:work stjohnpiano$ ./script_processor1.py


script:
- scriptSig (new transaction): 48 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01 21 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61
- scriptPubKey (previous transaction): 76 a9 14 7e 76 71 29 b1 5e e6 d7 e6 be 8a c3 a1 00 dd 29 c4 c6 7e 63 88 ac
- entire script (scriptSig + scriptPubKey): 48 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01 21 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61 76 a9 14 7e 76 71 29 b1 5e e6 d7 e6 be 8a c3 a1 00 dd 29 c4 c6 7e 63 88 ac
- entire script in readable form:
-- PUSHDATA: 48
-- [derived property] PUSHDATA decimal value: 72
-- signature_data: 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01
-- PUSHDATA: 21
-- [derived property] PUSHDATA decimal value: 33
-- public_key_data: 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61
-- OP_DUP: 76
-- OP_HASH160: a9
-- PUSHDATA: 14
-- [derived property] PUSHDATA decimal value: 20
-- public_key_hash: 7e 76 71 29 b1 5e e6 d7 e6 be 8a c3 a1 00 dd 29 c4 c6 7e 63
-- OP_EQUALVERIFY: 88
-- OP_CHECKSIG: ac



script_processor1.py didn't raise any errors, which means that the combined script is in the expected format (the standard P2PKH format).
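
The format check can be sketched roughly as follows. This is not the code of script_processor1.py (which isn't reproduced in this part of the log), only an illustration of the standard P2PKH layout shown in its output. It assumes Python 3, and the function names are hypothetical.

```python
def split_p2pkh_script_sig(script_sig_hex):
    # Expects a scriptSig made of exactly two data pushes: signature, then public key.
    data = bytes.fromhex(script_sig_hex.replace(" ", ""))
    items = []
    index = 0
    while index < len(data):
        length = data[index]
        # Opcodes 0x01 to 0x4b mean "push the next <length> bytes".
        if not 0x01 <= length <= 0x4b:
            raise ValueError("unexpected opcode 0x%02x at byte %d" % (length, index))
        index += 1
        item = data[index:index + length]
        if len(item) != length:
            raise ValueError("push runs past the end of the script")
        items.append(item)
        index += length
    if len(items) != 2:
        raise ValueError("expected 2 pushes, found %d" % len(items))
    return items[0], items[1]


def extract_p2pkh_public_key_hash(script_pub_key_hex):
    # Expected layout: OP_DUP OP_HASH160 PUSHDATA(20) <hash> OP_EQUALVERIFY OP_CHECKSIG
    data = bytes.fromhex(script_pub_key_hex.replace(" ", ""))
    if len(data) != 25 or data[:3] != b"\x76\xa9\x14" or data[23:] != b"\x88\xac":
        raise ValueError("scriptPubKey is not in standard P2PKH form")
    return data[3:23]


script_sig = "48 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01 21 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61"
script_pub_key = "76 a9 14 7e 76 71 29 b1 5e e6 d7 e6 be 8a c3 a1 00 dd 29 c4 c6 7e 63 88 ac"

signature_data, public_key_data = split_p2pkh_script_sig(script_sig)
public_key_hash = extract_p2pkh_public_key_hash(script_pub_key)

print(len(signature_data), len(public_key_data))
print(public_key_hash.hex())
```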


Now, write check_hash1.py, drawing material from check_hash_and_signature1.py.
check_hash1.py will use these results from script_processor1.py as input:
- public_key_data: 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61
- public_key_hash: 7e 76 71 29 b1 5e e6 d7 e6 be 8a c3 a1 00 dd 29 c4 c6 7e 63




check_hash1.py

python 2.7.13
#!/opt/local/bin/python


from binascii import hexlify, unhexlify
from pypy_sha256 import sha256
from bjorn_edstrom_ripemd160 import RIPEMD160


def main():
	
	
	# data from previous transaction
	public_key_hash = "7e 76 71 29 b1 5e e6 d7 e6 be 8a c3 a1 00 dd 29 c4 c6 7e 63"

	# data from new transaction
	public_key_data = "02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61"
	
	# hash of public key data
	digest_raw_byte_string = op_hash160(public_key_data)
	
	public_key_hash_hex_byte_string_ns = public_key_hash.replace(" ","") # ns = no spaces
	
	
	print ""
	
	print "public key hash (previous transaction): %s" % public_key_hash
	print "- public key hash without spaces: %s" % public_key_hash_hex_byte_string_ns
	print "public key data (new transaction): %s" % public_key_data

	# convert the byte sequence to hex for display. 
	digest_hex_byte_string = hexlify(digest_raw_byte_string)
	print "ripemd160(sha256(public_key_data)): %s" % digest_hex_byte_string
	
	# confirm that the hash of the public key data matches the hash in the scriptPubKey in tr_s1. 
	assert(digest_hex_byte_string == public_key_hash_hex_byte_string_ns)
	print "success: calculated hash of the public key data (new transaction) matches the stored hash in the scriptPubKey (previous transaction)"
	
	print ""



def op_hash160(hex_byte_string):
	# expects hex_byte_string (optionally with spaces)
	# returns byte sequence
	
	hex_byte_string = hex_byte_string.replace(" ","")

	# convert to a byte sequence that can be used as input to the sha256 hash function. 
	byte_sequence = unhexlify(hex_byte_string)

	# the digest (i.e. the result of hashing the input) will be a byte sequence. 
	sha256_digest = sha256(byte_sequence).digest()

	# the sha256_digest is already a raw byte sequence that can be used as input to the ripemd160 hash function. 
	ripemd160_digest = RIPEMD160(sha256_digest).digest()
	
	return ripemd160_digest


if __name__ == "__main__": main()



aineko:work stjohnpiano$ ./check_hash1.py


public key hash (previous transaction): 7e 76 71 29 b1 5e e6 d7 e6 be 8a c3 a1 00 dd 29 c4 c6 7e 63
- public key hash without spaces: 7e767129b15ee6d7e6be8ac3a100dd29c4c67e63
public key data (new transaction): 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61
ripemd160(sha256(public_key_data)): 7e767129b15ee6d7e6be8ac3a100dd29c4c67e63
success: calculated hash of the public key data (new transaction) matches the stored hash in the scriptPubKey (previous transaction)



tr_s2_public_key hashes to tr_s1_public_key_hash.


Write check_signature2.py, which will import the Transaction class from parser2.py.

Copy the new material from check_hash_and_signature1.py into parser2.py:
- the module-level function get_var_int_prefix.
- the Transaction methods get_signable_hex_bytes and get_hex_bytes.


check_signature2.py inputs:
- tr_s2.txt
- (tr_s2) public_key_data (from script_processor1.py): 02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61
- (tr_s2) signature_data (from script_processor1.py): 30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01
- (tr_s1) scriptPubKey (from output of parser2.py targeted at tr_s1.txt):
76 a9 14 7e 76 71 29 b1 5e e6 d7 e6 be 8a c3 a1 00 dd 29 c4 c6 7e 63 88 ac



Don't remove the "01" final hash_type byte from signature_data manually. I'll do this in check_signature2.py.
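
For reference, the signature_data consists of a DER-encoded pair of integers followed by the one-byte hash_type. Here is a minimal pure-Python sketch of unpacking it, assuming Python 3 and single-byte DER lengths (which holds for signatures of this size). check_signature2.py itself uses ecdsa.der for this step; the function name decode_der_signature is hypothetical.

```python
def decode_der_signature(signature_hex):
    data = bytes.fromhex(signature_hex.replace(" ", ""))
    # The final byte is the hash_type, which is not part of the DER structure.
    der, hash_type = data[:-1], data[-1]
    # 0x30 = DER sequence tag, followed by the length of the remaining bytes.
    if der[0] != 0x30 or der[1] != len(der) - 2:
        raise ValueError("not a DER sequence with a single-byte length")
    index = 2
    integers = []
    for _ in range(2):
        # 0x02 = DER integer tag, followed by the integer's byte length.
        if der[index] != 0x02:
            raise ValueError("expected DER integer tag 0x02")
        length = der[index + 1]
        start = index + 2
        integers.append(int.from_bytes(der[start:start + length], "big"))
        index = start + length
    if index != len(der):
        raise ValueError("trailing bytes after the second integer")
    r, s = integers
    return r, s, hash_type


signature_data = "30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01"
r, s, hash_type = decode_der_signature(signature_data)

# 64-byte form: each integer zero-padded to 32 bytes.
unpacked_signature = "%064x%064x" % (r, s)
print(unpacked_signature)
print(hash_type)
```

The leading 00 byte in the first integer is a DER sign byte (the integer's top bit is set), which is why the first integer occupies 33 bytes but still fits in 32 bytes once unpacked.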


Add a check in check_signature2.py that stops the script if the public key is in compressed form and instructs the user to:
- uncompress the public key
- run the script again

Done.

In this case, the public key starts with "02", so check_signature2.py reports this as a problem. Use uncompress.py to produce an uncompressed public key and use check_validity_of_point.py to confirm that the uncompressed public key is a valid point on the secp256k1 curve.

Adjust uncompress.py so that it prints the X and Y values that can be used as inputs for check_validity_of_point.py.



uncompress.py

python 2.7.13
#!/opt/local/bin/python


def main():
	
	public_key_data_byte_string = "02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61"
	public_key_data_byte_string = public_key_data_byte_string.replace(" ","")
	
	print ""
	
	print "compressed public key: %s" % public_key_data_byte_string

	public_key_hex_byte_string = uncompress(public_key_data_byte_string)
	
	print "uncompressed public key: %s" % public_key_hex_byte_string
	
	n = len(public_key_hex_byte_string[2:])/2
	X = public_key_hex_byte_string[2:2+n]
	Y = public_key_hex_byte_string[2+n:2+2*n]
	print "- X: %s" % X
	print "- Y: %s" % Y
	print ""


def uncompress(compressed_key):
	# source: http://bitcointalk.org/index.php?topic=644919.msg7205689#msg7205689
	# author: TimS
	# example compressed key: '0314fc03b8df87cd7b872996810db8458d61da8448e531569c8517b469a119d267'
	
	def pow_mod(x, y, z):
		"Calculate (x ** y) % z efficiently."
		number = 1
		while y:
			if y & 1:
				number = number * x % z
			y >>= 1
			x = x * x % z
		return number
	
	p = 0xfffffffffffffffffffffffffffffffffffffffffffffffffffffffefffffc2f
	y_parity = int(compressed_key[:2]) - 2
	x = int(compressed_key[2:], 16)
	a = (pow_mod(x, 3, p) + 7) % p
	y = pow_mod(a, (p+1)//4, p)
	if y % 2 != y_parity:
		y = -y % p
	# pad x and y to 64 hex characters each, so that any leading zeros are not dropped.
	uncompressed_key = '04{:064x}{:064x}'.format(x, y)
	return uncompressed_key
	
	
if __name__ == "__main__": main()



aineko:work stjohnpiano$ ./uncompress.py


compressed public key: 02a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261
uncompressed public key: 04a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae226150b222cdcd631b144cfe6ff8df86a5c02e416eb6d10ee9168bda056fc4f855ee
- X: a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261
- Y: 50b222cdcd631b144cfe6ff8df86a5c02e416eb6d10ee9168bda056fc4f855ee





aineko:work stjohnpiano$ ./check_validity_of_point.py

True
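
The code of check_validity_of_point.py isn't shown in this part of the log. A point-validity check of this kind can be sketched as below, assuming that it tests the secp256k1 curve equation y^2 = x^3 + 7 (mod p). This is written for Python 3, and the function name is hypothetical.

```python
# secp256k1 field prime: p = 2^256 - 2^32 - 977
P = 2**256 - 2**32 - 977


def is_on_secp256k1(x, y):
    # A valid point has both coordinates in [0, p) and satisfies y^2 = x^3 + 7 (mod p).
    if not (0 <= x < P and 0 <= y < P):
        return False
    return (y * y - (x * x * x + 7)) % P == 0


# X and Y values printed by uncompress.py
X = int("a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261", 16)
Y = int("50b222cdcd631b144cfe6ff8df86a5c02e416eb6d10ee9168bda056fc4f855ee", 16)

print(is_on_secp256k1(X, Y))
```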



Use the uncompressed public key produced by uncompress.py as new input into check_signature2.py.


First, run check_signature2.py with the original compressed public key data:

aineko:work stjohnpiano$ ./check_signature2.py

Traceback (most recent call last):
File "./check_signature2.py", line 118, in <module>
if __name__ == "__main__": main()
File "./check_signature2.py", line 52, in main
stop("public key starts with %s, not 04 as expected. need to uncompress the public key, change the relevant input to this script, and run this script again." % public_key_hex_ns[:2])
File "./check_signature2.py", line 115, in stop
raise Exception("ERROR: %s\n" % message)
Exception: ERROR: public key starts with 02, not 04 as expected. need to uncompress the public key, change the relevant input to this script, and run this script again.



Error is reported as expected.


Now, use the uncompressed public key.


aineko:work stjohnpiano$ ./check_signature2.py


double sha256 digest of the new transaction in signable form: a309d05d98ec0f328ef84e436c730fb4e7a0471ace0f8a44a3efa11866a5a796
signable_tx_hash_hex (32 bytes): a309d05d98ec0f328ef84e436c730fb4e7a0471ace0f8a44a3efa11866a5a796
unpacked signature (64 bytes): d8321c3c4b3fc3a34e30547d1cb6aea6855cad330b590b53e824f647dd4923991e9056e82297a2108b1969cc497150734ab78f51cbf979885afc104f70a65150
public key (64 bytes): a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae226150b222cdcd631b144cfe6ff8df86a5c02e416eb6d10ee9168bda056fc4f855ee
type(verification_result): <type 'bool'>
verification_result: True




Cool. Validity of signature confirmed.
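
The digest that the signature is checked against can be reproduced independently. Here is a minimal sketch using hashlib, assuming Python 3, a single input, and single-byte script lengths: substitute the previous scriptPubKey for the scriptSig, append the 4-byte hash type, and apply sha256 twice. The function name signable_digest is hypothetical; the scripts in this log use parser2.py and pypy_sha256 instead.

```python
import hashlib

RAW_TX_S2 = "01000000014f8cad2ca3b602750e06c4300d91b22c17dd97bb5ce635d860df8b0ab325b336010000006b483045022100d8321c3c4b3fc3a34e30547d1cb6aea6855cad330b590b53e824f647dd49239902201e9056e82297a2108b1969cc497150734ab78f51cbf979885afc104f70a65150012102a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae2261fdffffff02e8d94100000000001976a914338cdde52f708236affa5675f969606ff846ee6f88acc4716e01000000001976a9143496950f1a01285a1605ac9337c06a5596b9fcd888accaee0700"
SCRIPT_PUB_KEY_S1 = "76a9147e767129b15ee6d7e6be8ac3a100dd29c4c67e6388ac"


def signable_digest(raw_tx_hex, script_pub_key_hex):
    # Offsets are in hex characters (2 per byte).
    # version (4 bytes) + input_count (1) + previous_output_hash (32) + previous_output_index (4)
    script_length_start = 2 * (4 + 1 + 32 + 4)
    script_start = script_length_start + 2
    script_length = int(raw_tx_hex[script_length_start:script_start], 16)
    if script_length >= 0xfd:
        raise ValueError("multi-byte script lengths are not handled in this sketch")
    script_end = script_start + 2 * script_length
    # Substitute the scriptPubKey (with its own length byte) for the scriptSig.
    new_script_length = "%02x" % (len(script_pub_key_hex) // 2)
    signable_hex = (
        raw_tx_hex[:script_length_start]
        + new_script_length
        + script_pub_key_hex
        + raw_tx_hex[script_end:]
        + "01000000"  # hash type, in 4-byte little-endian form
    )
    first_digest = hashlib.sha256(bytes.fromhex(signable_hex)).digest()
    return hashlib.sha256(first_digest).hexdigest()


digest = signable_digest(RAW_TX_S2, SCRIPT_PUB_KEY_S1)
print(digest)
```

The printed value should agree with the "double sha256 digest of the new transaction in signable form" line in the check_signature2.py output above.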



Here's the final code for check_signature2.py.


check_signature2.py

python 2.7.13
#!/opt/local/bin/python


import ecdsa
import ecdsa.der

from binascii import hexlify, unhexlify
from pypy_sha256 import sha256
from parser2 import Transaction


def main():
	

	# data from new transaction:	
	signature_data = "30 45 02 21 00 d8 32 1c 3c 4b 3f c3 a3 4e 30 54 7d 1c b6 ae a6 85 5c ad 33 0b 59 0b 53 e8 24 f6 47 dd 49 23 99 02 20 1e 90 56 e8 22 97 a2 10 8b 19 69 cc 49 71 50 73 4a b7 8f 51 cb f9 79 88 5a fc 10 4f 70 a6 51 50 01"
	#public_key_data = "02 a5 34 d6 52 73 f8 52 0e 1e 84 1c cb b7 82 b7 1e 06 b4 11 7a 90 bb 92 c6 55 f8 13 1c f6 ae 22 61"
	public_key_data = "04a534d65273f8520e1e841ccbb782b71e06b4117a90bb92c655f8131cf6ae226150b222cdcd631b144cfe6ff8df86a5c02e416eb6d10ee9168bda056fc4f855ee"
	
	# new transaction
	raw_transaction_file_path = "tr_s2.txt"
	raw_transaction = file_get_contents(raw_transaction_file_path)
	transaction = Transaction()
	transaction.parse_raw_transaction(raw_transaction)
	
	# data from previous transaction:
	scriptPubKey = "76 a9 14 7e 76 71 29 b1 5e e6 d7 e6 be 8a c3 a1 00 dd 29 c4 c6 7e 63 88 ac"
	
	# get signable form of the transaction.
	input_index = 0
	scriptPubKey_bytes = scriptPubKey.split(" ")
	signable_transaction_hex_bytes = transaction.get_signable_hex_bytes(input_index, scriptPubKey_bytes)
	signable_transaction_hex_byte_string_ns = "".join(signable_transaction_hex_bytes) # ns = no spaces
	
	# hash the transaction-in-signable-form twice using sha256. 
	# convert to a byte sequence that can be used as input to the sha256 hash function. 
	byte_sequence = unhexlify(signable_transaction_hex_byte_string_ns)
	# the digest (i.e. the result of hashing the input) will be a byte sequence. 
	sha256_digest = sha256(byte_sequence).digest()
	# apply sha256 again
	sha256_digest2 = sha256(sha256_digest).digest()
	# convert back to hex
	sha256_digest_hex_byte_string = hexlify(sha256_digest2)
	
	
	# remove any spaces from the three pieces required for checking the signature's validity. 
	signable_tx_hash_hex_ns = sha256_digest_hex_byte_string.replace(" ","") # ns = no spaces
	der_sig_hex_ns = signature_data.replace(" ","")
	public_key_hex_ns = public_key_data.replace(" ","")
	
	# if public key starts with "02" or "03", need to uncompress it.
	if public_key_hex_ns[:2] in ["02", "03"]:
		stop("public key starts with %s, not 04 as expected. need to uncompress the public key, change the relevant input to this script, and run this script again." % public_key_hex_ns[:2])
	
	# check and remove "04" prefix from public key.
	assert(public_key_hex_ns[:2] == "04")
	public_key_hex_ns = public_key_hex_ns[2:]
	
	# check and remove "01" hash_type suffix from signature data.
	assert(der_sig_hex_ns[-2:] == "01")
	der_sig_hex_ns = der_sig_hex_ns[:-2]
	
	# convert sig from DER encoding to hex
	sig_hex_ns = derSigToHexSig(der_sig_hex_ns)
	
	# convert sig, public key, and hash to raw byte sequences
	hash_raw_byte_string = unhexlify(signable_tx_hash_hex_ns)
	sig_raw_byte_string = unhexlify(sig_hex_ns)
	public_key_raw_byte_string = unhexlify(public_key_hex_ns)
	
	# create a key object from the raw-byte-sequence public key.
	vk = ecdsa.VerifyingKey.from_string(public_key_raw_byte_string, curve=ecdsa.SECP256k1)
	
	# verify the signature
	verification_result = vk.verify_digest(sig_raw_byte_string, hash_raw_byte_string)
	
	
	print ""
	print "double sha256 digest of the new transaction in signable form: %s" % sha256_digest_hex_byte_string
	print "signable_tx_hash_hex (%d bytes): %s" % (len(signable_tx_hash_hex_ns)/2, signable_tx_hash_hex_ns)
	print "unpacked signature (%d bytes): %s" % (len(sig_hex_ns)/2, sig_hex_ns)
	print "public key (%d bytes): %s" % (len(public_key_hex_ns)/2, public_key_hex_ns)
	print "type(verification_result): %s" % type(verification_result)
	print "verification_result: %s" % verification_result
	assert(verification_result == True)
	print ""
	


def derSigToHexSig(s):
	# Input is a hex-encoded, DER-encoded signature
	# Output is a 64-byte hex-encoded signature
	# source: http://github.com/shirriff/bitcoin-code: txnUtils.py
	s, junk = ecdsa.der.remove_sequence(s.decode('hex'))
	if junk != '':
		print 'JUNK', junk.encode('hex')
	assert(junk == '')
	x, s = ecdsa.der.remove_integer(s)
	y, s = ecdsa.der.remove_integer(s)
	return '%064x%064x' % (x, y)
	

def file_get_contents(file_path):
	from os.path import isfile
	if not isfile(file_path): 
		stop("%s is not a file." % file_path)
	f = open(file_path, "r")
	data = f.read()
	f.close()
	return data
	

def stop(message):
	# raise an Exception to get a traceback. 
	raise Exception("ERROR: %s\n" % message)


if __name__ == "__main__": main()




And the final code for parser2.py (with the new material from check_hash_and_signature1.py):



parser2.py

python 2.7.13
#!/opt/local/bin/python


def main():

	raw_transaction_file_path = "tr_s1.txt"
	raw_transaction = file_get_contents(raw_transaction_file_path)
	
	transaction = Transaction()
	transaction.parse_raw_transaction(raw_transaction)
	
	print transaction.to_string()
	
	


class Transaction:


	# expected format (raw hex bytes):
	# - version: 4 bytes
	# - input_count: (var_int) 1 to 9 bytes
	# - inputs: 
	# -- previous_output_hash: 32 bytes (little-endian)
	# -- previous_output_index: 4 bytes (little-endian)
	# -- script_length: (var_int) 1 to 9 bytes
	# -- scriptSig
	# -- sequence: 4 bytes
	# - output_count: (var_int) 1 to 9 bytes
	# - outputs:
	# -- value: 8 bytes (little-endian)
	# -- script_length: (var_int) 1 to 9 bytes
	# -- scriptPubKey
	# - block lock time: 4 bytes
	
	# notes:
	# - first byte of var_int indicates its total byte length. More details are in the comments for the get_length_of_var_int function.
	
	# questions:
	# - is sequence little-endian? 
	# - I recall that block_lock_time has an odd format. 
	
	# warning:
	# - other transactions could have formats that don't fit the expected format. 
	
	
	def __init__(self):
		self.raw_transaction = None # original byte string
		self.hex_bytes = None # list of original bytes
		self.byte_length = None
		self.version = None # original bytes
		self.input_count_bytes = None # original bytes
		self.input_count_decimal = None
		self.inputs = None # dictionary of original bytes
		self.output_count_bytes = None # original bytes
		self.output_count_decimal = None
		self.outputs = None # dictionary of original bytes
		self.block_lock_time = None # original bytes
	
	
	def get_signable_hex_bytes(self, input_index, scriptPubKey_bytes):
		# expects input_index as integer and scriptPubKey_bytes as list of hex bytes. 
		# returns list of hex bytes
		# input_index = the position of the input whose private key is or will be used to sign this transaction. The scriptPubKey to which this input was paid will be substituted for the scriptSig of this input. 
		# future: test to see if 0x00 or nothing should be substituted for the scriptSigs of other inputs.
		if len(self.inputs) > 1:
			stop("can't handle multiple inputs at the moment.")
		input = self.inputs[input_index]
		new_script_length = get_var_int_prefix(scriptPubKey_bytes)
		new_scriptSig = scriptPubKey_bytes
		original_script_length = input["script_length"]
		original_scriptSig = input["scriptSig"]
		# perform substitution
		input["script_length"] = new_script_length
		input["scriptSig"] = new_scriptSig
		hex_bytes = self.get_hex_bytes()
		# change back
		input["script_length"] = original_script_length
		input["scriptSig"] = original_scriptSig
		hex_bytes += ["01","00","00","00"] # add hash type (in 4 byte form)
		return hex_bytes
		
		
	def get_hex_bytes(self):
		hex_bytes = self.version + self.input_count_bytes
		for input in self.inputs:
			hex_bytes += input["previous_output_hash"]
			hex_bytes += input["previous_output_index"]
			hex_bytes += input["script_length"]
			hex_bytes += input["scriptSig"]
			hex_bytes += input["sequence"]
		hex_bytes += self.output_count_bytes
		for output in self.outputs:
			hex_bytes += output["value"]
			hex_bytes += output["script_length"]
			hex_bytes += output["scriptPubKey"]
		hex_bytes += self.block_lock_time
		return hex_bytes
	

	
	def parse_raw_transaction(self, raw_transaction):
		self.check_raw_transaction(raw_transaction)
		hex_bytes = [raw_transaction[i:i+2] for i in xrange(0, len(raw_transaction), 2)]
		version = hex_bytes[0:4]
		index = 4 # this variable stores the index of the-next-byte-to-process in the hex_bytes list (which is 0-indexed).
		# get number of inputs
		input_count_bytes, index = get_var_int_from_hex_bytes(hex_bytes, index)
		input_count_decimal = hex_bytes_to_decimal(input_count_bytes)
		# prepare the storage for the inputs
		input = {
			"previous_output_hash": None, 
			"previous_output_index": None,
			"script_length": None,
			"scriptSig": None,
			"sequence": None,
			}
		inputs = [None]*input_count_decimal
		for i in xrange(input_count_decimal):
			# explicitly copy the dictionary to make sure that the different inputs are different objects. 
			inputs[i] = input.copy()
		# parse each input item
		for i in xrange(input_count_decimal):
			previous_output_hash = hex_bytes[index:(index+32)]
			index += 32
			previous_output_index = hex_bytes[index:(index+4)]
			index += 4
			script_length_bytes, index = get_var_int_from_hex_bytes(hex_bytes, index)
			script_length_decimal = hex_bytes_to_decimal(script_length_bytes)
			scriptSig = hex_bytes[index:(index+script_length_decimal)]
			index += script_length_decimal
			sequence = hex_bytes[index:(index+4)]
			index += 4
			# store the input bytes
			input = inputs[i]
			input["previous_output_hash"] = previous_output_hash
			input["previous_output_index"] = previous_output_index
			input["script_length"] = script_length_bytes
			input["scriptSig"] = scriptSig
			input["sequence"] = sequence
		# get number of outputs
		output_count_bytes, index = get_var_int_from_hex_bytes(hex_bytes, index)
		output_count_decimal = hex_bytes_to_decimal(output_count_bytes)
		# prepare the storage for the outputs
		output = {
			"value": None, 
			"script_length": None,
			"scriptPubKey": None,
			}
		outputs = [None]*output_count_decimal
		for i in xrange(output_count_decimal):
			# explicitly copy the dictionary to make sure that the different outputs are different objects.
			outputs[i] = output.copy()
		# parse each output item
		for i in xrange(output_count_decimal):
			value = hex_bytes[index:(index+8)]
			index += 8
			script_length_bytes, index = get_var_int_from_hex_bytes(hex_bytes, index)
			script_length_decimal = hex_bytes_to_decimal(script_length_bytes)
			scriptPubKey = hex_bytes[index:(index+script_length_decimal)]
			index += script_length_decimal
			# store the output bytes
			output = outputs[i]
			output["value"] = value
			output["script_length"] = script_length_bytes
			output["scriptPubKey"] = scriptPubKey
		# get the block_lock_time
		block_lock_time = hex_bytes[index:(index+4)]
		index += 4
		# confirm that we have traversed the exact length of the raw transaction by checking whether:
		# - the index of the-next-byte-to-be-processed (in a 0-indexed list) == the raw-transaction-byte-length (1-indexed)
		assert index == len(hex_bytes)
		# store the results of parsing the raw transaction in this instance's properties. 
		self.raw_transaction = raw_transaction
		self.hex_bytes = hex_bytes
		self.byte_length = len(hex_bytes)
		self.version = version
		self.input_count_bytes = input_count_bytes
		self.input_count_decimal = input_count_decimal
		self.inputs = inputs
		self.output_count_bytes = output_count_bytes
		self.output_count_decimal = output_count_decimal
		self.outputs = outputs
		self.block_lock_time = block_lock_time
		
	
	def to_string(self):
		# return a printable string representation of self.
		s = "\nTransaction:"
		s += "\n- [derived property] byte length: %s" % self.byte_length
		s += "\n- version: %s" % hex_bytes_to_string(self.version)
		s += "\n- input_count: %s" % hex_bytes_to_string(self.input_count_bytes)
		s += "\n- [derived property] input_count_decimal: %d" % self.input_count_decimal
		for i, input in enumerate(self.inputs): # same order as found within transaction.
			s += "\n- input #%d:" % i
			s += "\n-- previous_output_hash: %s" % hex_bytes_to_string(input["previous_output_hash"])
			previous_output_hash_big_endian = list(reversed(input["previous_output_hash"]))
			
			s += "\n-- [derived property] previous_output_hash (no spaces, big-endian): %s" % "".join(previous_output_hash_big_endian)
			s += "\n-- previous_output_index: %s" % hex_bytes_to_string(input["previous_output_index"])
			previous_output_index_big_endian = list(reversed(input["previous_output_index"]))
			s += "\n-- [derived property] previous_output_index_decimal: %d" % hex_bytes_to_decimal(previous_output_index_big_endian)
			s += "\n-- script_length: %s" % hex_bytes_to_string(input["script_length"])
			s += "\n-- [derived property] script_length_decimal: %d" % hex_bytes_to_decimal(input["script_length"])
			s += "\n-- scriptSig: %s" % hex_bytes_to_string(input["scriptSig"])
			s += "\n-- sequence: %s" % hex_bytes_to_string(input["sequence"])
		s += "\n- output_count: %s" % hex_bytes_to_string(self.output_count_bytes)
		s += "\n- [derived property] output_count_decimal: %d" % self.output_count_decimal
		for i, output in enumerate(self.outputs): # same order as found within transaction.
			s += "\n- output #%d:" % i
			s += "\n-- value: %s" % hex_bytes_to_string(output["value"])
			s += "\n-- [derived property] output value in bitcoin: %s" % self.convert_output_value_to_bitcoin_string(output["value"])
			s += "\n-- script_length: %s" % hex_bytes_to_string(output["script_length"])
			s += "\n-- [derived property] script_length_decimal: %s" % hex_bytes_to_decimal(output["script_length"])
			s += "\n-- scriptPubKey: %s" % hex_bytes_to_string(output["scriptPubKey"])
		s += "\n"
		return s
		
	
	def convert_output_value_to_bitcoin_string(self, value):
		# convert 8-byte output value (little-endian list of hex bytes) into bitcoin value. 
		value = list(reversed(value)) # change to big-endian
		vs = "".join(value) # vs = value string
		vs_decimal = str(hex_string_to_decimal(vs)) # value string in decimal form
		n = len(vs_decimal) # number of characters in the decimal form of the value string.
		if n - 8 > 0: # more than 8 characters i.e. at least 1 btc. bitcoin has 8 decimal places.
			# add the decimal point at the appropriate position within the value string.
			section1 = vs_decimal[:-8]
			section2 = vs_decimal[-8:]
			vs_decimal = section1 + "." + section2
		else:
			# 8 or fewer characters i.e. less than 1 btc.
			# add a 0, a decimal point, and the appropriate number of 0s in front of the value string.
			n_zeros = abs(n-8) # appropriate number of zeros
			vs_decimal = "0." + "0"*n_zeros + vs_decimal
		return vs_decimal
	
	
	def check_raw_transaction(self, input):
		# input should be an even number of lower-case hex byte characters, with no spaces.
		permitted_characters = "0123456789abcdef"
		for character in input:
			if character not in permitted_characters:
				stop("character %s is not in permitted characters list: %s" % (character, permitted_characters))
		if len(input) % 2 != 0:
			stop("length of input string is not even")
		return 0



def hex_bytes_to_string(hex_bytes):
	hex_byte_string = " ".join(hex_bytes) # space-separated hex byte string
	return hex_byte_string


def hex_bytes_to_decimal(hex_bytes):
	# expects list of hex bytes
	hex_byte_string = "".join(hex_bytes)
	decimal_value = hex_string_to_decimal(hex_byte_string)
	return decimal_value


def hex_string_to_decimal(string):
	# expects string of hex bytes
	total = 0
	n = len(string)
	for i in xrange(n):
		c = string[i] # c = character
		v = hex_character_to_decimal(c) # v = value
		power = n - i - 1 # first character has the highest power of 16. power of last character is 0. 
		total += v*(16**power)
	return total
	

def hex_character_to_decimal(c):
	decimal_digits = "0123456789"
	hex_digits = {"a": 10, "b": 11, "c": 12, "d": 13, "e": 14, "f": 15}
	if c in decimal_digits:
		return int(c)
	elif c in hex_digits.keys():
		return hex_digits[c]
	stop("character %s is not a hex character." % c)


def get_var_int_prefix(hex_bytes):
	# expects hex bytes list
	# returns var_int prefix bytes (that describe the length of the hex bytes list).
	if not isinstance(hex_bytes, list):
		stop("expected list, got %s" % type(hex_bytes))
	n = len(hex_bytes)
	if n >= 253: # "fd" in hex. 
		stop("get_var_int_prefix mark1: unwritten section")
	else:
		# add a single byte describing the var_int length
		length_byte = "%02x" % n # always two hex characters, e.g. "0a" for 10.
		return [length_byte]
		

def get_var_int_from_hex_bytes(hex_bytes, index):
	# expects hex bytes list and the index of the-next-byte-to-process. 
	# returns var_int hex bytes list and the index of the-next-byte-to-process. 
	byte_1 = hex_bytes[index]
	byte_length = get_length_of_var_int(byte_1)
	if byte_length > 1:
		# future: handle var_ints longer than 1 byte here.
		stop("get_var_int_from_hex_bytes mark1: unwritten section")
	elif byte_length == 1:
		var_int = [byte_1]
		index += 1
	return [var_int, index]
	

def get_length_of_var_int(hex_byte):
	# expected format:
	# - single hex byte (two characters).
	# - certain high byte values indicate that N further bytes are actually part of the value of this variable_integer.
	# source:
	# http://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_integer
	if hex_byte == "ff": 
		# next 8 bytes are a uint64 (unsigned 64-bit integer)
		return 9
	elif hex_byte == "fe":
		# next 4 bytes are a uint32
		return 5
	elif hex_byte == "fd":
		# next 2 bytes are a uint16
		return 3
	# otherwise, this byte is the value of the variable integer, and the length of the integer is 1 byte (uint8). 
	return 1


def file_get_contents(file_path):
	from os.path import isfile
	if not isfile(file_path): 
		stop("%s is not a file." % file_path)
	f = open(file_path, "r")
	data = f.read()
	f.close()
	return data
	

def stop(message):
	# raise an Exception to get a traceback. 
	raise Exception("ERROR: %s\n" % message)


if __name__ == "__main__": main()
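

The var_int handling in parser2.py stops at single-byte values (the longer forms are marked as unwritten sections). Here is a sketch of how the longer forms could be read and written, following the layout described in the get_length_of_var_int comments (fd = uint16, fe = uint32, ff = uint64, all little-endian). It assumes Python 3 and uses hypothetical function names; it is not part of parser2.py.

```python
def read_var_int(hex_bytes, index):
    # Expects a list of two-character hex bytes and the index of the next byte to process.
    # Returns (decimal value, index of the next byte to process).
    first = int(hex_bytes[index], 16)
    extra_byte_counts = {0xfd: 2, 0xfe: 4, 0xff: 8}
    extra = extra_byte_counts.get(first, 0)
    if extra == 0:
        # The first byte is itself the value.
        return first, index + 1
    payload = hex_bytes[index + 1:index + 1 + extra]
    if len(payload) != extra:
        raise ValueError("var_int runs past the end of the data")
    # The payload is little-endian, so reverse it before reading it as a number.
    value = int("".join(reversed(payload)), 16)
    return value, index + 1 + extra


def write_var_int(n):
    # Returns the var_int encoding of n as a list of two-character hex bytes.
    if n < 0xfd:
        raw = n.to_bytes(1, "little")
    elif n <= 0xffff:
        raw = b"\xfd" + n.to_bytes(2, "little")
    elif n <= 0xffffffff:
        raw = b"\xfe" + n.to_bytes(4, "little")
    else:
        raw = b"\xff" + n.to_bytes(8, "little")
    return ["%02x" % byte for byte in raw]


print(read_var_int(["6b"], 0))
print(read_var_int(["fd", "00", "01"], 0))
print(write_var_int(25))
print(write_var_int(253))
```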








### END VERIFICATION PROCESS






As a sanity check, I'll do this again with Ken Shirriff's transaction, tr_ken2.


Earlier, I found that tr_ken1 can be accessed at:
blockchain.info/tx/81b4c832d70cb56ff957589752eb4125a4cab78a25a8fc52d6a09e5bd4404d48

From Ken Shirriff's article, I see that tr_ken1 transferred bitcoin to this address generated by Ken Shirriff: 1MMMMSUb1piy2ufrSguNUdFmAcvqrQF8M5

The transaction tr_ken2 transferred bitcoin from
1MMMMSUb1piy2ufrSguNUdFmAcvqrQF8M5
to
1KKKK6N21XKo48zWKuQKXdvSsCf95ibHFa.

At
blockchain.info/tx/81b4c832d70cb56ff957589752eb4125a4cab78a25a8fc52d6a09e5bd4404d48
click the output address
1MMMMSUb1piy2ufrSguNUdFmAcvqrQF8M5,
which links to:
blockchain.info/address/1MMMMSUb1piy2ufrSguNUdFmAcvqrQF8M5

Two transactions are listed for this address.

The top transaction has the output address
1KKKK6N21XKo48zWKuQKXdvSsCf95ibHFa
so it's tr_ken2.

tr_ken2's transaction hash is
3f285f083de7c0acabd9f106a43ec42687ab0bebe2e6f0d529db696794540fea
On the page, this transaction hash links to
blockchain.info/tx/3f285f083de7c0acabd9f106a43ec42687ab0bebe2e6f0d529db696794540fea

Append ?format=hex to this transaction hash link to get the link to the raw form of the transaction.
http://blockchain.info/tx/3f285f083de7c0acabd9f106a43ec42687ab0bebe2e6f0d529db696794540fea?format=hex

transaction tr_ken2 in raw form:

0100000001484d40d45b9ea0d652fca8258ab7caa42541eb52975857f96fb50cd732c8b481000000008a47304402202cb265bf10707bf49346c3515dd3d16fc454618c58ec0a0ff448a676c54ff71302206c6624d762a1fcef4618284ead8f08678ac05b13c84235f1654e6ad168233e8201410414e301b2328f17442c0b8310d787bf3d8a404cfbd0704f135b6ad4b2d3ee751310f981926e53a6e8c39bd7d3fefd576c543cce493cbac06388f2651d1aacbfcdffffffff0162640100000000001976a914c8e90996c7c6080ee06284600c684ed904d14c5c88ac00000000


This is the input to the verification process.






### START VERIFICATION PROCESS


transaction tr_ken2 in raw form:

0100000001484d40d45b9ea0d652fca8258ab7caa42541eb52975857f96fb50cd732c8b481000000008a47304402202cb265bf10707bf49346c3515dd3d16fc454618c58ec0a0ff448a676c54ff71302206c6624d762a1fcef4618284ead8f08678ac05b13c84235f1654e6ad168233e8201410414e301b2328f17442c0b8310d787bf3d8a404cfbd0704f135b6ad4b2d3ee751310f981926e53a6e8c39bd7d3fefd576c543cce493cbac06388f2651d1aacbfcdffffffff0162640100000000001976a914c8e90996c7c6080ee06284600c684ed904d14c5c88ac00000000



Save this raw transaction in tr_ken2.txt in the work directory (this is a new file).


Run parser2.py, targeted at tr_ken2.txt.



aineko:work stjohnpiano$ ./parser2.py


Transaction:
- [derived property] byte length: 223
- version: 01 00 00 00
- input_count: 01
- [derived property] input_count_decimal: 1
- input #0:
-- previous_output_hash: 48 4d 40 d4 5b 9e a0 d6 52 fc a8 25 8a b7 ca a4 25 41 eb 52 97 58 57 f9 6f b5 0c d7 32 c8 b4 81
-- [derived property] previous_output_hash (no spaces, big-endian): 81b4c832d70cb56ff957589752eb4125a4cab78a25a8fc52d6a09e5bd4404d48
-- previous_output_index: 00 00 00 00
-- [derived property] previous_output_index_decimal: 0
-- script_length: 8a
-- [derived property] script_length_decimal: 138
-- scriptSig: 47 30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01 41 04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd
-- sequence: ff ff ff ff
- output_count: 01
- [derived property] output_count_decimal: 1
- output #0:
-- value: 62 64 01 00 00 00 00 00
-- [derived property] output value in bitcoin: 0.00091234
-- script_length: 19
-- [derived property] script_length_decimal: 25
-- scriptPubKey: 76 a9 14 c8 e9 09 96 c7 c6 08 0e e0 62 84 60 0c 68 4e d9 04 d1 4c 5c 88 ac
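

The field layout reported above can be reproduced with a short sketch. This is not parser2.py itself: it assumes Python 3 and the legacy (pre-SegWit) serialisation, and the names parse_transaction and satoshi_to_bitcoin are hypothetical. The final 4 bytes of the transaction (the lock time) are not listed in the output above, but they are counted in the byte length.

```python
def parse_transaction(raw_hex):
    """Split a raw legacy transaction (hex string) into its fields."""
    data = bytes.fromhex(raw_hex.strip())
    pos = 0

    def take(n):
        # Return the next n bytes and advance the read position.
        nonlocal pos
        chunk = data[pos:pos + n]
        if len(chunk) != n:
            raise ValueError("transaction is truncated")
        pos += n
        return chunk

    def take_var_int():
        first = take(1)[0]
        if first < 0xfd:
            return first
        return int.from_bytes(take({0xfd: 2, 0xfe: 4, 0xff: 8}[first]), "little")

    tx = {"byte_length": len(data), "inputs": [], "outputs": []}
    tx["version"] = int.from_bytes(take(4), "little")
    for _ in range(take_var_int()):
        tx["inputs"].append({
            # Stored little-endian; block explorers show it byte-reversed.
            "previous_output_hash": take(32)[::-1].hex(),
            "previous_output_index": int.from_bytes(take(4), "little"),
            "scriptSig": take(take_var_int()).hex(),
            "sequence": take(4).hex(),
        })
    for _ in range(take_var_int()):
        tx["outputs"].append({
            "value_satoshi": int.from_bytes(take(8), "little"),
            "scriptPubKey": take(take_var_int()).hex(),
        })
    tx["lock_time"] = int.from_bytes(take(4), "little")
    if pos != len(data):
        raise ValueError("unexpected trailing bytes")
    return tx


def satoshi_to_bitcoin(value):
    """Format an integer satoshi amount as a decimal bitcoin string."""
    return "%d.%08d" % (value // 10**8, value % 10**8)


TR_KEN2 = "0100000001484d40d45b9ea0d652fca8258ab7caa42541eb52975857f96fb50cd732c8b481000000008a47304402202cb265bf10707bf49346c3515dd3d16fc454618c58ec0a0ff448a676c54ff71302206c6624d762a1fcef4618284ead8f08678ac05b13c84235f1654e6ad168233e8201410414e301b2328f17442c0b8310d787bf3d8a404cfbd0704f135b6ad4b2d3ee751310f981926e53a6e8c39bd7d3fefd576c543cce493cbac06388f2651d1aacbfcdffffffff0162640100000000001976a914c8e90996c7c6080ee06284600c684ed904d14c5c88ac00000000"

tx = parse_transaction(TR_KEN2)
print(tx["byte_length"])
print(tx["inputs"][0]["previous_output_hash"])
print(satoshi_to_bitcoin(tx["outputs"][0]["value_satoshi"]))
```

The dictionary literals rely on Python evaluating their entries in the order written, which is what keeps the reads in the same order as the fields in the serialised transaction.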



Note that the previous_output_index of the single input is "0".


From the single input, get the value of "[derived property] previous_output_hash (no spaces, big-endian)":
81b4c832d70cb56ff957589752eb4125a4cab78a25a8fc52d6a09e5bd4404d48


Construct a query with which to retrieve this previous transaction in raw form:

http://blockchain.info/tx/81b4c832d70cb56ff957589752eb4125a4cab78a25a8fc52d6a09e5bd4404d48?format=hex

Save the resulting raw transaction in the file tr_ken1.txt (it's the same as before - no change).


Use parser2.py, targeted at tr_ken1.txt, to parse the raw transaction.


aineko:work stjohnpiano$ ./parser2.py


Transaction:
- [derived property] byte length: 617
- version: 01 00 00 00
- input_count: 03
- [derived property] input_count_decimal: 3
- input #0:
-- previous_output_hash: c7 c4 e9 08 2d fb b3 71 3e 47 c1 be f9 f6 8e fd 9b 85 6d fc de e3 18 41 6d aa a2 ad d8 27 70 d4
-- [derived property] previous_output_hash (no spaces, big-endian): d47027d8ada2aa6d4118e3defc6d859bfd8ef6f9bec1473e71b3fb2d08e9c4c7
-- previous_output_index: 01 00 00 00
-- [derived property] previous_output_index_decimal: 1
-- script_length: 8b
-- [derived property] script_length_decimal: 139
-- scriptSig: 48 30 45 02 21 00 b2 fd 3f 8a 8c 22 6f 2a dd b1 b6 63 00 9c 34 4e 2c 35 1d ad 0d af 02 2d 1c b1 2f e7 7e 81 56 3a 02 20 6e 01 2e f6 23 5d 10 ee 05 8d 4c 3d 61 b4 4a 8c a1 8e bc 5e 99 38 8e 60 04 f3 06 b2 89 68 3e 02 01 41 04 36 5c 78 7a af 52 a1 81 a6 e1 10 f9 d3 da a0 81 03 f9 1b 15 12 be a2 13 98 b9 ea 1f 1b eb 5d ae 91 55 da ad 83 b1 da bb f3 16 c4 b6 e4 bb 43 83 44 20 4a 1d b1 d3 3c cb 47 d0 1c 20 51 c0 97 4a
-- sequence: ff ff ff ff
- input #1:
-- previous_output_hash: ee 59 5b 71 cc 5a 1f b9 80 a7 7a f8 b9 62 53 4a e0 04 9b 29 06 fb e0 79 0e 48 fa 71 e9 f0 22 99
-- [derived property] previous_output_hash (no spaces, big-endian): 9922f0e971fa480e79e0fb06299b04e04a5362b9f87aa780b91f5acc715b59ee
-- previous_output_index: 00 00 00 00
-- [derived property] previous_output_index_decimal: 0
-- script_length: 8b
-- [derived property] script_length_decimal: 139
-- scriptSig: 48 30 45 02 21 00 fa f8 4d 50 e9 9d ee ef 7d 2c f9 f8 b4 60 0b 8d e5 6f d7 88 d9 f6 65 21 a0 43 78 f9 0b 49 a8 76 02 20 46 08 cd e7 76 fd a7 54 ce 87 1a bf f8 73 dc 4d 29 aa 34 c0 3a 50 d9 60 85 20 0e 82 5b ae 73 a4 01 41 04 56 84 d9 b3 83 46 de b7 f9 3b 6b 82 82 dc f8 22 7b fc b7 29 13 d8 f4 c5 fa d9 98 7e 38 77 04 67 fe e2 6b 5a 0b 57 d0 ae f4 df 40 02 46 3e c1 f1 93 46 40 d6 90 5e ed e1 ed 28 bb 7e 43 2b cb d1
-- sequence: ff ff ff ff
- input #2:
-- previous_output_hash: 5e 32 74 71 c5 bd be cc a4 5f cc bd 69 8a 47 94 24 a3 1c 99 a9 70 4b 68 e6 19 f6 b3 e3 b9 29 55
-- [derived property] previous_output_hash (no spaces, big-endian): 5529b9e3b3f619e6684b70a9991ca32494478a69bdcc5fa4ccbebdc57174325e
-- previous_output_index: 01 00 00 00
-- [derived property] previous_output_index_decimal: 1
-- script_length: 8a
-- [derived property] script_length_decimal: 138
-- scriptSig: 47 30 44 02 20 6f 36 1f b4 b9 7a ae a0 4f 18 cf 9f f8 1a 18 6e 48 ed 44 78 37 9e 6e 93 e0 ab 28 d4 d4 82 26 c9 02 20 15 65 b7 3f 2f 03 ef fa cf 42 9e 7a 84 05 43 c5 77 34 75 18 f2 56 dc c3 de f8 55 8f a8 ef fc ef 01 41 04 0d c0 d6 2b d8 7d 2e 54 be 3f 95 c4 18 7f a3 0d 48 d6 cd 00 43 19 78 40 1d d7 98 b7 45 07 d9 ad b4 57 fd 94 34 87 6f 56 27 35 ba ef 0b b9 d6 e3 53 66 c2 c1 81 b6 d9 61 b8 36 2a d9 5a f7 26 aa
-- sequence: ff ff ff ff
- output_count: 02
- [derived property] output_count_decimal: 2
- output #0:
-- value: 72 8b 01 00 00 00 00 00
-- [derived property] output value in bitcoin: 0.00101234
-- script_length: 19
-- [derived property] script_length_decimal: 25
-- scriptPubKey: 76 a9 14 df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5 88 ac
- output #1:
-- value: 95 87 1a 00 00 00 00 00
-- [derived property] output value in bitcoin: 0.01738645
-- script_length: 19
-- [derived property] script_length_decimal: 25
-- scriptPubKey: 76 a9 14 9f 03 6f 85 50 66 93 bb 84 c5 d9 10 d5 8b dc f8 28 3b 20 ce 88 ac



Get the tr_ken2_scriptSig:
47 30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01 41 04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd



From the first output (output index 0) of tr_ken1, get the scriptPubKey:
76 a9 14 df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5 88 ac



If tr_ken2 is valid, tr_ken2_scriptSig satisfies the cryptographic condition set by tr_ken1_output0_scriptPubKey, which is what allows tr_ken2 to use tr_ken1_output0 as an input. The remaining steps check this.



Take this scriptSig and scriptPubKey and use them as inputs to script_processor1.py.


aineko:work stjohnpiano$ ./script_processor1.py


script:
- scriptSig (new transaction): 47 30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01 41 04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd
- scriptPubKey (previous transaction): 76 a9 14 df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5 88 ac
- entire script (scriptSig + scriptPubKey): 47 30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01 41 04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd 76 a9 14 df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5 88 ac
- entire script in readable form:
-- PUSHDATA: 47
-- [derived property] PUSHDATA decimal value: 71
-- signature_data: 30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01
-- PUSHDATA: 41
-- [derived property] PUSHDATA decimal value: 65
-- public_key_data: 04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd
-- OP_DUP: 76
-- OP_HASH160: a9
-- PUSHDATA: 14
-- [derived property] PUSHDATA decimal value: 20
-- public_key_hash: df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5
-- OP_EQUALVERIFY: 88
-- OP_CHECKSIG: ac
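

The breakdown above can be sketched in a few lines. This is not script_processor1.py: it assumes Python 3, handles only the standard P2PKH layout, and the name split_p2pkh_script is hypothetical. It relies on the fact that an opcode byte between 01 and 4b means "push this many bytes of data".

```python
def split_p2pkh_script(script_sig_hex, script_pub_key_hex):
    """Split a P2PKH scriptSig and scriptPubKey into their data items (hex)."""
    script_sig = bytes.fromhex(script_sig_hex.replace(" ", ""))
    script_pub_key = bytes.fromhex(script_pub_key_hex.replace(" ", ""))

    # scriptSig: <push n> <signature_data> <push m> <public_key_data>
    if len(script_sig) < 2:
        raise ValueError("scriptSig is too short")
    n = script_sig[0]
    if not 1 <= n <= 0x4b or len(script_sig) < 2 + n:
        raise ValueError("unexpected first push in scriptSig")
    m = script_sig[1 + n]
    if not 1 <= m <= 0x4b or len(script_sig) != 2 + n + m:
        raise ValueError("unexpected second push in scriptSig")

    # scriptPubKey: OP_DUP OP_HASH160 <push 20> <hash> OP_EQUALVERIFY OP_CHECKSIG
    if (len(script_pub_key) != 25
            or script_pub_key[:3] != b"\x76\xa9\x14"
            or script_pub_key[23:] != b"\x88\xac"):
        raise ValueError("scriptPubKey is not in the standard P2PKH format")

    return {
        "signature_data": script_sig[1:1 + n].hex(),
        "public_key_data": script_sig[2 + n:].hex(),
        "public_key_hash": script_pub_key[3:23].hex(),
    }


TR_KEN2_SCRIPTSIG = "47 30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01 41 04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd"
TR_KEN1_OUTPUT0_SCRIPTPUBKEY = "76 a9 14 df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5 88 ac"

parts = split_p2pkh_script(TR_KEN2_SCRIPTSIG, TR_KEN1_OUTPUT0_SCRIPTPUBKEY)
for name, value in parts.items():
    print("%s: %s" % (name, value))
```

Raising an error on any deviation mirrors the behaviour noted below: if nothing is raised, the combined script has the expected shape.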



script_processor1.py didn't raise any errors, which means that the combined script is in the expected format (the standard P2PKH format).


Run check_hash1.py, which uses these results from script_processor1.py as input:
- public_key_data: 04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd
- public_key_hash: df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5



aineko:work stjohnpiano$ ./check_hash1.py


public key hash (previous transaction): df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5
- public key hash without spaces: df3bd30160e6c6145baaf2c88a8844c13a00d1d5
public key data (new transaction): 04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd
ripemd160(sha256(public_key_data)): df3bd30160e6c6145baaf2c88a8844c13a00d1d5
success: calculated hash of the public key data (new transaction) matches the stored hash in the scriptPubKey (previous transaction)




tr_ken2_input_public_key hashes to tr_ken1_output0_public_key_hash.
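

The hash comparison itself is small enough to sketch here. This is not check_hash1.py: it assumes Python 3, the name hash160 is hypothetical, and it depends on the local OpenSSL build still providing ripemd160 (some recent builds disable it, in which case hashlib raises ValueError).

```python
import hashlib


def hash160(data):
    """Return ripemd160(sha256(data)) as bytes."""
    sha256_digest = hashlib.sha256(data).digest()
    # hashlib.new raises ValueError if this build lacks ripemd160.
    return hashlib.new("ripemd160", sha256_digest).digest()


PUBLIC_KEY_DATA = "04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd"
PUBLIC_KEY_HASH = "df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5"

try:
    calculated = hash160(bytes.fromhex(PUBLIC_KEY_DATA.replace(" ", ""))).hex()
    print("calculated hash:", calculated)
    print("match:", calculated == PUBLIC_KEY_HASH.replace(" ", ""))
except ValueError:
    print("ripemd160 is not available in this Python build")
```

Note that the whole 65-byte public_key_data, including the leading 04 byte, is what gets hashed.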



Now run check_signature2.py, with these inputs:
- tr_ken2.txt
- (tr_ken2) public_key_data (from script_processor1.py): 04 14 e3 01 b2 32 8f 17 44 2c 0b 83 10 d7 87 bf 3d 8a 40 4c fb d0 70 4f 13 5b 6a d4 b2 d3 ee 75 13 10 f9 81 92 6e 53 a6 e8 c3 9b d7 d3 fe fd 57 6c 54 3c ce 49 3c ba c0 63 88 f2 65 1d 1a ac bf cd
- (tr_ken2) signature_data (from script_processor1.py): 30 44 02 20 2c b2 65 bf 10 70 7b f4 93 46 c3 51 5d d3 d1 6f c4 54 61 8c 58 ec 0a 0f f4 48 a6 76 c5 4f f7 13 02 20 6c 66 24 d7 62 a1 fc ef 46 18 28 4e ad 8f 08 67 8a c0 5b 13 c8 42 35 f1 65 4e 6a d1 68 23 3e 82 01
- (tr_ken1) output0_scriptPubKey (from output of parser2.py targeted at tr_ken1.txt):
76 a9 14 df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5 88 ac



The public key starts with "04", which marks it as uncompressed, so for this transaction (tr_ken2) it will not be necessary to use uncompress.py and check_validity_of_point.py.
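

For comparison, here is a minimal sketch of what uncompressing a key involves when the first byte is 02 or 03. This is not uncompress.py: it assumes Python 3, the name uncompress_public_key is hypothetical, and it uses the secp256k1 curve equation y^2 = x^3 + 7 (mod p). To have something to test with, it compresses the tr_ken2 public key and then recovers the original.

```python
# The secp256k1 field prime.
P = 2**256 - 2**32 - 977


def uncompress_public_key(compressed_hex):
    """Convert a 33-byte compressed public key (hex) to 65-byte uncompressed hex."""
    data = bytes.fromhex(compressed_hex.replace(" ", ""))
    if len(data) != 33 or data[0] not in (2, 3):
        raise ValueError("not a compressed public key")
    x = int.from_bytes(data[1:], "big")
    y_squared = (pow(x, 3, P) + 7) % P
    # P % 4 == 3, so if a square root exists it is y_squared ** ((P + 1) / 4).
    y = pow(y_squared, (P + 1) // 4, P)
    if y * y % P != y_squared:
        # This doubles as a validity check: x is not on the curve.
        raise ValueError("no point on the curve has this x value")
    # Prefix 02 means y is even, 03 means y is odd.
    if y % 2 != data[0] % 2:
        y = P - y
    return "04" + x.to_bytes(32, "big").hex() + y.to_bytes(32, "big").hex()


UNCOMPRESSED = "0414e301b2328f17442c0b8310d787bf3d8a404cfbd0704f135b6ad4b2d3ee751310f981926e53a6e8c39bd7d3fefd576c543cce493cbac06388f2651d1aacbfcd"

# Build the compressed form: parity prefix followed by the 32-byte x value.
y_is_odd = int(UNCOMPRESSED[-2:], 16) % 2 == 1
compressed = ("03" if y_is_odd else "02") + UNCOMPRESSED[2:66]

print(compressed)
print(uncompress_public_key(compressed) == UNCOMPRESSED)
```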


aineko:work stjohnpiano$ ./check_signature2.py


double sha256 digest of the new transaction in signable form: 5fda68729a6312e17e641e9a49fac2a4a6a680126610af573caab270d232f850
signable_tx_hash_hex (32 bytes): 5fda68729a6312e17e641e9a49fac2a4a6a680126610af573caab270d232f850
unpacked signature (64 bytes): 2cb265bf10707bf49346c3515dd3d16fc454618c58ec0a0ff448a676c54ff7136c6624d762a1fcef4618284ead8f08678ac05b13c84235f1654e6ad168233e82
public key (64 bytes): 14e301b2328f17442c0b8310d787bf3d8a404cfbd0704f135b6ad4b2d3ee751310f981926e53a6e8c39bd7d3fefd576c543cce493cbac06388f2651d1aacbfcd
type(verification_result): <type 'bool'>
verification_result: True



Validity of signature in tr_ken2 confirmed.
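

The signature check can also be reproduced without the ecdsa library. What follows is a minimal pure-Python sketch of ECDSA verification on secp256k1, assuming Python 3. It is not check_signature2.py, all function names are hypothetical, and it is written for clarity rather than speed or side-channel safety. It takes the digest and the public key as reported in the output above, and unpacks r and s from the signature_data.

```python
# secp256k1 domain parameters: field prime, group order, generator point.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFE_BAAEDCE6_AF48A03B_BFD25E8C_D0364141
G = (0x79BE667E_F9DCBBAC_55A06295_CE870B07_029BFCDB_2DCE28D9_59F2815B_16F81798,
     0x483ADA77_26A3C465_5DA4FBFC_0E1108A8_FD17B448_A6855419_9C47D08F_FB10D4B8)


def point_add(a, b):
    """Add two curve points. None represents the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1 % P, P - 2, P) % P
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, P - 2, P) % P
    x3 = (slope * slope - x1 - x2) % P
    return (x3, (slope * (x1 - x3) - y1) % P)


def scalar_mult(k, point):
    """Compute k * point using double-and-add."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def unpack_signature(signature_data):
    """Extract (r, s) from DER signature data followed by a 1-byte hash type."""
    der = signature_data[:-1]  # drop the trailing hash type byte
    if der[0] != 0x30 or der[1] != len(der) - 2 or der[2] != 0x02:
        raise ValueError("unexpected signature encoding")
    r_end = 4 + der[3]
    if der[r_end] != 0x02 or r_end + 2 + der[r_end + 1] != len(der):
        raise ValueError("unexpected signature encoding")
    r = int.from_bytes(der[4:r_end], "big")
    s = int.from_bytes(der[r_end + 2:], "big")
    return r, s


def verify_digest(public_key, digest, r, s):
    """Return True if (r, s) is a valid signature of digest under public_key."""
    if not (1 <= r < N and 1 <= s < N):
        return False
    z = int.from_bytes(digest, "big")
    w = pow(s, N - 2, N)  # inverse of s modulo the group order
    point = point_add(scalar_mult(z * w % N, G), scalar_mult(r * w % N, public_key))
    return point is not None and point[0] % N == r


DIGEST = bytes.fromhex("5fda68729a6312e17e641e9a49fac2a4a6a680126610af573caab270d232f850")
SIGNATURE_DATA = bytes.fromhex("304402202cb265bf10707bf49346c3515dd3d16fc454618c58ec0a0ff448a676c54ff71302206c6624d762a1fcef4618284ead8f08678ac05b13c84235f1654e6ad168233e8201")
PUBLIC_KEY_HEX = "14e301b2328f17442c0b8310d787bf3d8a404cfbd0704f135b6ad4b2d3ee751310f981926e53a6e8c39bd7d3fefd576c543cce493cbac06388f2651d1aacbfcd"
PUBLIC_KEY = (int(PUBLIC_KEY_HEX[:64], 16), int(PUBLIC_KEY_HEX[64:], 16))

R, S = unpack_signature(SIGNATURE_DATA)
print("verification_result:", verify_digest(PUBLIC_KEY, DIGEST, R, S))
```

If this sketch agrees with the ecdsa library, it should report the same verification_result as check_signature2.py did above.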


### END VERIFICATION PROCESS




Let's summarise the verification process as it currently stands.



### TRANSACTION VERIFICATION PROCESS (version 1)


Select a transaction to verify. It must have exactly one input. It can have multiple outputs. The input and the outputs must all be single-signature Pay-To-Public-Key-Hash (P2PKH). The public key in the input scriptSig may be compressed.

Get the raw form of the selected transaction. Save this in a file in the work directory.

Run parser2.py, targeted at this file.
- In the single input, get these values:
-- [derived property] previous_output_hash (no spaces, big-endian)
-- previous_output_index
-- scriptSig

Use the "previous_output_hash (no spaces, big-endian)" value to look up the transaction that supplied an output for use as the input to the selected transaction. Get the raw form of this previous transaction. Save it in a file in the work directory.

Run parser2.py, targeted at this file.
Use the previous_output_index found earlier to select the appropriate output in the result.
- In this output, get this value:
-- scriptPubKey

The selected transaction's scriptSig should satisfy the cryptographic condition set by the previous transaction's scriptPubKey.

Use the scriptPubKey and the scriptSig as inputs to script_processor1.py.
From the output, get these values:
- signature_data
- public_key_data
- public_key_hash

Use public_key_data and public_key_hash as inputs to check_hash1.py. This script will check that the public_key_data in the selected transaction's scriptSig hashes to the public_key_hash in the previous transaction's scriptPubKey.

Use the following items as inputs to check_signature2.py:
- signature_data (produced by script_processor1.py)
- public_key_data (produced by script_processor1.py)
- scriptPubKey (produced by parser2.py when it was targeted at the previous transaction)
- the file containing the selected transaction

If the public_key_data is compressed, uncompress it using uncompress.py and check_validity_of_point.py, then use the uncompressed form as input to check_signature2.py.

check_signature2.py will use the public key to check the validity of the signature of the selected transaction in signable form. The public key and the signature are stored in the selected transaction's scriptSig. The scriptPubKey of the previous transaction is part of the transaction-in-signable-form.

If both check_hash1.py and check_signature2.py return successful results, then the selected transaction has been verified.


### END TRANSACTION VERIFICATION PROCESS (version 1)
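

The "transaction in signable form" mentioned in the process above can be sketched as follows. This is not the code in check_signature2.py: it assumes Python 3, exactly one input, script lengths that fit in a single byte, and the SIGHASH_ALL hash type (the trailing 01 byte of the signature_data). The names signable_form and double_sha256 are hypothetical. The input's scriptSig is replaced by the previous output's scriptPubKey, and the hash type is appended as a 4-byte little-endian value.

```python
import hashlib

SIGHASH_ALL = 1


def double_sha256(data):
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def signable_form(raw_hex, previous_script_pub_key_hex):
    """Build the byte string that the signature of a one-input transaction covers."""
    tx = bytes.fromhex(raw_hex.strip())
    script_pub_key = bytes.fromhex(previous_script_pub_key_hex.replace(" ", ""))
    # Layout up to the script: version (4) | input count (1) |
    # previous_output_hash (32) | previous_output_index (4) | script_length (1)
    if tx[4] != 1:
        raise ValueError("expected exactly one input")
    script_length = tx[41]
    if script_length >= 0xfd or len(script_pub_key) >= 0xfd:
        raise ValueError("multi-byte script lengths are not handled")
    before_script = tx[:41]
    after_script = tx[42 + script_length:]  # sequence, outputs, lock time
    return (before_script
            + bytes([len(script_pub_key)]) + script_pub_key
            + after_script
            + SIGHASH_ALL.to_bytes(4, "little"))


TR_KEN2 = "0100000001484d40d45b9ea0d652fca8258ab7caa42541eb52975857f96fb50cd732c8b481000000008a47304402202cb265bf10707bf49346c3515dd3d16fc454618c58ec0a0ff448a676c54ff71302206c6624d762a1fcef4618284ead8f08678ac05b13c84235f1654e6ad168233e8201410414e301b2328f17442c0b8310d787bf3d8a404cfbd0704f135b6ad4b2d3ee751310f981926e53a6e8c39bd7d3fefd576c543cce493cbac06388f2651d1aacbfcdffffffff0162640100000000001976a914c8e90996c7c6080ee06284600c684ed904d14c5c88ac00000000"
TR_KEN1_OUTPUT0_SCRIPTPUBKEY = "76 a9 14 df 3b d3 01 60 e6 c6 14 5b aa f2 c8 8a 88 44 c1 3a 00 d1 d5 88 ac"

signable = signable_form(TR_KEN2, TR_KEN1_OUTPUT0_SCRIPTPUBKEY)
print(signable.hex())
print(double_sha256(signable).hex())
```

For tr_ken2, the second line printed should match the "double sha256 digest of the new transaction in signable form" that check_signature2.py reported earlier, provided the construction here is the same as the one the script uses.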










[start of notes]



Changes from the original text:

- I have not always preserved the format of any excerpts from webpages on other sites (e.g. not preserving the original bold/italic styles, changing the list structures, not preserving hyperlinks).

- I have not always preserved the format of any computer output (e.g. from running bash commands). Examples: Setting input lines in bold text, adding/removing newlines in order to make a sequence of commands easier to read, using hyphens for lists and sublists instead of indentation, breaking wide tables into consecutive sections.


[end of notes]