Code for "Illusions of Alignment Between Large Language Models and Brains Emerge From Fragile Methods and Overlooked Confounds".
Required packages can be installed using the requirements_updated.txt file.
We recommend first creating a conda environment: conda create -n "YOUR_ENV_NAME" python=3.11 -y
Next, activate this conda environment, and run: pip install -r requirements_updated.txt
To save the neural data and necessary associated data, run create_y_matrix.ipynb. We are working on uploading the associated neural data (.nc) files required for this notebook.
All code to generate model activations (i.e., X variables in regression) is located in the generate_activations folder.
Run OASM.ipynb and simple_features.ipynb to generate activations for OASM and the simple features, respectively.
Run LLM.py to generate activations for LLMs.
Code to combine activations for the banded regressions is located in the stack_features folder.
Stack_features_figure_2.ipynb stacks OASM with GPT2 activations.
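For readers unfamiliar with feature stacking, the following is a minimal sketch of the general idea, not the code in the notebook. It assumes each feature space is a 2D NumPy array with one row per stimulus; the function name stack_feature_bands and the toy array shapes are hypothetical and chosen only for illustration.

```python
import numpy as np


def stack_feature_bands(bands):
    """Concatenate named feature matrices column-wise.

    `bands` maps a band name to an array of shape (n_samples, n_features).
    Returns the stacked matrix and a dict mapping each band name to the
    column slice it occupies, so a banded regression can treat each band
    separately.
    """
    names = list(bands)
    row_counts = {bands[name].shape[0] for name in names}
    if len(row_counts) != 1:
        raise ValueError("all feature matrices must have the same number of rows")

    stacked = np.hstack([bands[name] for name in names])

    slices = {}
    start = 0
    for name in names:
        width = bands[name].shape[1]
        slices[name] = slice(start, start + width)
        start += width
    return stacked, slices


# Toy stand-ins for the real activations (random values, hypothetical shapes).
rng = np.random.default_rng(0)
oasm = rng.standard_normal((10, 4))
gpt2 = rng.standard_normal((10, 6))

X, band_slices = stack_feature_bands({"OASM": oasm, "GPT2": gpt2})
```

Keeping the column slice for each band alongside the stacked matrix makes it possible to recover which columns belong to which feature space later on.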
The main function is himalaya_regression_caller, located within banded_reg_func.py. This function can be called from the command line using the call_banded_reg.py file, or imported into a notebook.
himalaya_regression_caller relies on code from neural_dat_funcs, which generates train/val/test splits for each neural dataset and then calls run_himalayas (located near the end of helper_funcs.py) to get the regression results.
All results are saved to results_all/results_dataset, where dataset is specified by the user. The three datasets used in the paper are pereira, fedorenko, and blank.
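As a small illustration of this naming scheme, the sketch below builds the results folder path for a given dataset. It assumes that "dataset" in the folder name is replaced by the dataset name (for example results_all/results_pereira); the helper results_dir is hypothetical and is not part of the repository.

```python
from pathlib import Path

# Dataset names listed in this README.
KNOWN_DATASETS = ("pereira", "fedorenko", "blank")


def results_dir(dataset, root="results_all"):
    """Return the assumed results folder for `dataset` (hypothetical helper)."""
    if dataset not in KNOWN_DATASETS:
        raise ValueError(
            f"unknown dataset {dataset!r}; expected one of {KNOWN_DATASETS}"
        )
    return Path(root) / f"results_{dataset}"


pereira_dir = results_dir("pereira")
```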
The folders are organized as follows:
figure
All figures are created in the analyze_results/figures_code folder. We provide a separate script to generate the results for each figure.