How to read a CSV file correctly in Pandas while changing the column names

A really basic read_csv question.

I have data in a CSV file that looks like this:

```
Date,Open Price,High Price,Low Price,Close Price,WAP,No.of Shares,No. of Trades,Total Turnover (Rs.),Deliverable Quantity,% Deli. Qty to Traded Qty,Spread High-Low,Spread Close-Open
28-February-2015,2270.00,2310.00,2258.00,2294.85,2279.192067772602217319,73422,8043,167342840.00,11556,15.74,52.00,24.85
27-February-2015,2267.25,2280.85,2258.00,2266.35,2269.239841485775122730,50721,4938,115098114.00,12297,24.24,22.85,-0.90
26-February-2015,2314.90,2314.90,2250.00,2259.50,2277.198324862194860047,69845,8403,159050917.00,22046,31.56,64.90,-55.40
25-February-2015,2290.00,2332.00,2278.35,2318.05,2315.100614216488163214,161995,10174,375034724.00,102972,63.56,53.65,28.05
24-February-2015,2276.05,2295.00,2258.00,2278.15,2281.058946240263344242,52251,7726,119187611.00,13292,25.44,37.00,2.10
23-February-2015,2303.95,2311.00,2253.25,2270.70,2281.912259219760108491,75951,7344,173313518.00,24969,32.88,57.75,-33.25
20-February-2015,2324.00,2335.20,2277.00,2284.30,2301.631421152326354478,79717,10233,183479152.00,23045,28.91,58.20,-39.70
19-February-2015,2304.00,2333.90,2292.00,2326.60,2321.485466301625211160,85835,8847,199264705.00,29728,34.63,41.90,22.60
18-February-2015,2284.00,2305.00,2261.10,2295.75,2282.060986778089405300,69884,6639,159479550.00,26665,38.16,43.90,11.75
16-February-2015,2281.00,2305.85,2266.00,2278.50,2284.961866239581019628,85541,10149,195457923.00,22164,25.91,39.85,-2.50
13-February-2015,2311.00,2324.90,2286.95,2296.40,2311.371235111317676864,109731,5570,253629077.00,69039,62.92,37.95,-14.60
12-February-2015,2280.00,2322.85,2275.00,2315.45,2301.372038211769425569,79766,9095,183571242.00,33981,42.60,47.85,35.45
11-February-2015,2275.00,2295.00,2258.25,2287.20,2279.587966250020639664,60563,7467,138058686.00,20058,33.12,36.75,12.20
10-February-2015,2244.90,2297.40,2225.00,2280.30,2269.562228214830293104,141656,13026,321497107.00,55577,39.23,72.40,35.40
```

I am trying to read this data into a pandas DataFrame with the following read_csv options. I am only interested in two of the columns.

```python
z = pd.read_csv('file.csv', parse_dates=True, index_col="Date", usecols=["Date", "Open Price", "Close Price"], names=["Date", "O", "C"], header=0)
```

This is what I get:

```
             O   C
Date
2015-02-28 NaN NaN
2015-02-27 NaN NaN
2015-02-26 NaN NaN
2015-02-25 NaN NaN
2015-02-24 NaN NaN
```

Or:

```python
z = pd.read_csv('file.csv', parse_dates=True, index_col="Date", usecols=["Date", "Open", "Close"], names=["Date", "Open Price", "Close Price"], header=0)
```

The result:

```
            Open Price  Close Price
Date
2015-02-28         NaN          NaN
2015-02-27         NaN          NaN
2015-02-26         NaN          NaN
2015-02-25         NaN          NaN
```

Am I missing something fundamental, or is there a problem with read_csv in pandas 0.13.1, which is the version on my Debian Wheezy?

You're right, something is odd about the `names` argument. It looks like you can't combine it with `usecols` this way. Either you give a name to every column in the CSV file, or you give no names at all. So you apparently can't set names when you select only some of the columns (`usecols`).

From the `read_csv` documentation:

> names : array-like
> List of column names to use. If file contains no header row, then you should explicitly pass header=None
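Here is a minimal sketch of the "name every column" route. It assumes a recent pandas and has not been checked against 0.13.1, and it uses only the header and first three rows of the question's data. When `names` is given, string entries in `usecols` are matched against `names` rather than against the file's header row. `header=0` makes pandas read the original header row and discard it. The short names other than `Date`, `O` and `C` are hypothetical placeholders.

```python
from io import StringIO

import pandas as pd

# Excerpt of the question's CSV: the header plus the first three data rows.
csv_text = """Date,Open Price,High Price,Low Price,Close Price,WAP,No.of Shares,No. of Trades,Total Turnover (Rs.),Deliverable Quantity,% Deli. Qty to Traded Qty,Spread High-Low,Spread Close-Open
28-February-2015,2270.00,2310.00,2258.00,2294.85,2279.192067772602217319,73422,8043,167342840.00,11556,15.74,52.00,24.85
27-February-2015,2267.25,2280.85,2258.00,2266.35,2269.239841485775122730,50721,4938,115098114.00,12297,24.24,22.85,-0.90
26-February-2015,2314.90,2314.90,2250.00,2259.50,2277.198324862194860047,69845,8403,159050917.00,22046,31.56,64.90,-55.40
"""

# One name per column in the file (13 in total). Only "Date", "O" and "C" are
# kept below; the other short names are hypothetical placeholders.
all_names = ["Date", "O", "H", "L", "C", "WAP", "Shares", "Trades",
             "Turnover", "Deliverable", "DeliPct", "SpreadHL", "SpreadCO"]

z = pd.read_csv(
    StringIO(csv_text),
    header=0,                    # the file's own header row is replaced by `names`
    names=all_names,
    usecols=["Date", "O", "C"],  # matched against `names`, not the file header
    index_col="Date",
)
print(z)
```

Because `names` covers all 13 columns, `O` and `C` line up with "Open Price" and "Close Price" by position. `usecols` then only has to pick the new names.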

You may already know this, but you can rename the columns after reading the file.

```python
import pandas as pd
from io import StringIO  # Python 3; on Python 2 this was "from StringIO import StringIO"

csv = r"""Date,Open Price,High Price,Low Price,Close Price,WAP,No.of Shares,No. of Trades,Total Turnover (Rs.),Deliverable Quantity,% Deli. Qty to Traded Qty,Spread High-Low,Spread Close-Open
28-February-2015,2270.00,2310.00,2258.00,2294.85,2279.192067772602217319,73422,8043,167342840.00,11556,15.74,52.00,24.85
27-February-2015,2267.25,2280.85,2258.00,2266.35,2269.239841485775122730,50721,4938,115098114.00,12297,24.24,22.85,-0.90
26-February-2015,2314.90,2314.90,2250.00,2259.50,2277.198324862194860047,69845,8403,159050917.00,22046,31.56,64.90,-55.40
25-February-2015,2290.00,2332.00,2278.35,2318.05,2315.100614216488163214,161995,10174,375034724.00,102972,63.56,53.65,28.05
24-February-2015,2276.05,2295.00,2258.00,2278.15,2281.058946240263344242,52251,7726,119187611.00,13292,25.44,37.00,2.10
23-February-2015,2303.95,2311.00,2253.25,2270.70,2281.912259219760108491,75951,7344,173313518.00,24969,32.88,57.75,-33.25
20-February-2015,2324.00,2335.20,2277.00,2284.30,2301.631421152326354478,79717,10233,183479152.00,23045,28.91,58.20,-39.70
19-February-2015,2304.00,2333.90,2292.00,2326.60,2321.485466301625211160,85835,8847,199264705.00,29728,34.63,41.90,22.60
18-February-2015,2284.00,2305.00,2261.10,2295.75,2282.060986778089405300,69884,6639,159479550.00,26665,38.16,43.90,11.75
16-February-2015,2281.00,2305.85,2266.00,2278.50,2284.961866239581019628,85541,10149,195457923.00,22164,25.91,39.85,-2.50
13-February-2015,2311.00,2324.90,2286.95,2296.40,2311.371235111317676864,109731,5570,253629077.00,69039,62.92,37.95,-14.60
12-February-2015,2280.00,2322.85,2275.00,2315.45,2301.372038211769425569,79766,9095,183571242.00,33981,42.60,47.85,35.45
11-February-2015,2275.00,2295.00,2258.25,2287.20,2279.587966250020639664,60563,7467,138058686.00,20058,33.12,36.75,12.20
10-February-2015,2244.90,2297.40,2225.00,2280.30,2269.562228214830293104,141656,13026,321497107.00,55577,39.23,72.40,35.40"""

# Read only the three columns of interest, using the names from the header row.
df = pd.read_csv(StringIO(csv), usecols=["Date", "Open Price", "Close Price"], header=0)
# usecols keeps the file's column order, so a positional rename is safe here.
df.columns = ['Date', 'O', 'C']
df
```

Output:

```
                Date        O        C
0   28-February-2015  2270.00  2294.85
1   27-February-2015  2267.25  2266.35
2   26-February-2015  2314.90  2259.50
3   25-February-2015  2290.00  2318.05
4   24-February-2015  2276.05  2278.15
5   23-February-2015  2303.95  2270.70
6   20-February-2015  2324.00  2284.30
7   19-February-2015  2304.00  2326.60
8   18-February-2015  2284.00  2295.75
9   16-February-2015  2281.00  2278.50
10  13-February-2015  2311.00  2296.40
11  12-February-2015  2280.00  2315.45
12  11-February-2015  2275.00  2287.20
13  10-February-2015  2244.90  2280.30
```
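Here is a variant of the rename-afterwards approach. It is a sketch assuming a recent pandas and uses a three-row excerpt of the question's data. It also restores the parsed `Date` index that the question asked for. Renaming with a dict (`rename(columns=...)`) maps old names to new ones, so it does not depend on column order. The explicit date format `"%d-%B-%Y"` is an assumption based on the sample values such as `28-February-2015`.

```python
from io import StringIO

import pandas as pd

# Excerpt of the question's CSV: the header plus the first three data rows.
csv_text = """Date,Open Price,High Price,Low Price,Close Price,WAP,No.of Shares,No. of Trades,Total Turnover (Rs.),Deliverable Quantity,% Deli. Qty to Traded Qty,Spread High-Low,Spread Close-Open
28-February-2015,2270.00,2310.00,2258.00,2294.85,2279.192067772602217319,73422,8043,167342840.00,11556,15.74,52.00,24.85
27-February-2015,2267.25,2280.85,2258.00,2266.35,2269.239841485775122730,50721,4938,115098114.00,12297,24.24,22.85,-0.90
26-February-2015,2314.90,2314.90,2250.00,2259.50,2277.198324862194860047,69845,8403,159050917.00,22046,31.56,64.90,-55.40
"""

df = (
    pd.read_csv(StringIO(csv_text), usecols=["Date", "Open Price", "Close Price"])
    # A dict maps old name -> new name, independent of column order.
    .rename(columns={"Open Price": "O", "Close Price": "C"})
)

# Parse "28-February-2015"-style strings explicitly, then use them as the index.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%B-%Y")
df = df.set_index("Date")
print(df)
```

Parsing the dates after renaming keeps the `read_csv` call itself minimal. This avoids relying on how a particular pandas version infers the date format when `parse_dates=True` is used.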