How do I save a new sheet in an existing Excel file using Pandas?


86

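I want to use Excel files to store data produced with Python. My problem is that I can't add sheets to an existing Excel file. Here is some sample code that illustrates the issue: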

import pandas as pd
import numpy as np

path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"

x1 = np.random.randn(100, 2)
df1 = pd.DataFrame(x1)

x2 = np.random.randn(100, 2)
df2 = pd.DataFrame(x2)

writer = pd.ExcelWriter(path, engine = 'xlsxwriter')
df1.to_excel(writer, sheet_name = 'x1')
df2.to_excel(writer, sheet_name = 'x2')
writer.close()  # close() also writes the file to disk

This code saves two DataFrames to two sheets, named "x1" and "x2". If I create two new DataFrames and try to use the same code to add two new sheets, "x3" and "x4", the original data is lost.

import pandas as pd
import numpy as np

path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"

x3 = np.random.randn(100, 2)
df3 = pd.DataFrame(x3)

x4 = np.random.randn(100, 2)
df4 = pd.DataFrame(x4)

writer = pd.ExcelWriter(path, engine = 'xlsxwriter')
df3.to_excel(writer, sheet_name = 'x3')
df4.to_excel(writer, sheet_name = 'x4')
writer.close()  # close() also writes the file to disk

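I want an Excel file with four sheets: "x1", "x2", "x3", "x4". I know that "xlsxwriter" is not the only "engine"; there is also "openpyxl". I have also seen that other people have already written about this issue, but I still can't understand how to do it.

Here is code taken from this link: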

import pandas
from openpyxl import load_workbook

book = load_workbook('Masterfile.xlsx')
writer = pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') 
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])

writer.save()

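They say that it works, but it is hard to figure out how. I don't understand what "ws.title", "ws", and "dict" are in this context.

What is the best way to save "x1" and "x2", then close the file, open it again, and add "x3" and "x4"?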

Answers:


116

Thank you. I believe a complete example could be helpful for anyone else who has the same issue:

import pandas as pd
import numpy as np

path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"

x1 = np.random.randn(100, 2)
df1 = pd.DataFrame(x1)

x2 = np.random.randn(100, 2)
df2 = pd.DataFrame(x2)

writer = pd.ExcelWriter(path, engine = 'xlsxwriter')
df1.to_excel(writer, sheet_name = 'x1')
df2.to_excel(writer, sheet_name = 'x2')
writer.close()  # close() also writes the file to disk

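From what I understand, it doesn't matter here whether the Excel file is generated with the "xlsxwriter" or the "openpyxl" engine.

When I want to write without losing the original data: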

import pandas as pd
import numpy as np
from openpyxl import load_workbook

path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"

book = load_workbook(path)
writer = pd.ExcelWriter(path, engine = 'openpyxl')
writer.book = book

x3 = np.random.randn(100, 2)
df3 = pd.DataFrame(x3)

x4 = np.random.randn(100, 2)
df4 = pd.DataFrame(x4)

df3.to_excel(writer, sheet_name = 'x3')
df4.to_excel(writer, sheet_name = 'x4')
writer.close()  # close() also writes the file to disk

This code does the job!


Any idea why, when I try this, I get: ValueError: No Excel writer 'Sales Leads Calculations.xlsx'?
bernando_vialli

1
Yes, this adds a sheet to the Excel file without wiping out the existing sheets. Thanks!
Nikhil VJ

2
When saving the Excel file, how do I keep the formatting of the existing sheets?
Vineesh TP

3
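If anyone reads this and wonders how to overwrite an existing sheet with the same name instead of getting a renamed new one: add the line writer.sheets = dict((ws.title, ws) for ws in book.worksheets) after writer.book = book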
危害TE成型加工厂

1
@Stefano Fedele can the same update of an existing Excel file be done with "xlsxwriter" instead of "openpyxl"?
M Nikesh

15

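In the example you shared, you are loading the existing file into book and setting writer.book to book. In the line writer.sheets = dict((ws.title, ws) for ws in book.worksheets), you are accessing each sheet in the workbook as ws. The sheet title is then ws.title, so you are creating a dictionary of {sheet_title: sheet} key-value pairs. This dictionary is then assigned to writer.sheets. Essentially, these steps just load the existing data from 'Masterfile.xlsx' and populate your writer with it.

Now let's say you already have a file with x1 and x2 as sheets. You can use the example code to load the file and then do something like this to add x3 and x4: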
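To make the dictionary step concrete, here is a minimal sketch that builds the same {title: worksheet} mapping on an in-memory openpyxl workbook. The sheet names x1 and x2 are borrowed from the question, and nothing is read from or written to disk; it is only meant to show what ws, ws.title, and dict refer to.

```python
from openpyxl import Workbook

# Build a small in-memory workbook with two sheets (names borrowed from the question).
book = Workbook()
book.active.title = "x1"
book.create_sheet("x2")

# Same expression as in the quoted snippet: map each sheet title to its worksheet object.
sheets = dict((ws.title, ws) for ws in book.worksheets)

print(list(sheets))                 # the sheet titles, in workbook order
print(sheets["x2"] is book["x2"])   # the values are the worksheet objects themselves
```

An equivalent and arguably more readable spelling is the dict comprehension {ws.title: ws for ws in book.worksheets}.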

path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"
writer = pd.ExcelWriter(path, engine='openpyxl')
df3.to_excel(writer, 'x3', index=False)
df4.to_excel(writer, 'x4', index=False)
writer.save()

That should do what you're looking for.


Any idea why, when I try this, I get: ValueError: No Excel writer 'Sales Leads Calculations.xlsx'?
bernando_vialli

18
This will wipe out the existing sheets.
Nikhil VJ

13

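A simple example of writing several DataFrames to Excel at once, and of adding data on a new sheet to an Excel file that has already been written (and closed).

When writing to Excel for the first time (writing "df1" and "df2" to "1st_sheet" and "2nd_sheet"):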

import pandas as pd 
from openpyxl import load_workbook

df1 = pd.DataFrame([[1],[1]], columns=['a'])
df2 = pd.DataFrame([[2],[2]], columns=['b'])
df3 = pd.DataFrame([[3],[3]], columns=['c'])

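excel_dir = "my/excel/dir"  # placeholder; point this at a file ending in .xlsx

with pd.ExcelWriter(excel_dir, engine='xlsxwriter') as writer:
    df1.to_excel(writer, sheet_name='1st_sheet')
    df2.to_excel(writer, sheet_name='2nd_sheet')
    # no explicit writer.save() needed: leaving the with block saves the file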

After you close the Excel file, you may want to "append" data to the same file but on another sheet, say "df3" to a sheet named "3rd_sheet".

book = load_workbook(excel_dir)
with pd.ExcelWriter(excel_dir, engine='openpyxl') as writer:
    writer.book = book
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

    ## Your dataframe to append.
    df3.to_excel(writer, sheet_name='3rd_sheet')

    # the file is saved when the with block exits

Note that the Excel format must not be xls; use xlsx instead.


1
I don't see what this answer adds. In fact, repeated use of a context manager like this will involve a lot more I/O.
查理·克拉克


4

To create a new file:

import numpy as np
import pandas as pd

x1 = np.random.randn(100, 2)
df1 = pd.DataFrame(x1)
with pd.ExcelWriter('sample.xlsx') as writer:
    df1.to_excel(writer, sheet_name='x1')

To append to the file, pass the argument mode='a' to pd.ExcelWriter:

x2 = np.random.randn(100, 2)
df2 = pd.DataFrame(x2)
with pd.ExcelWriter('sample.xlsx', engine='openpyxl', mode='a') as writer:  
    df2.to_excel(writer, sheet_name='x2')

The default is mode='w'. See the documentation.
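As a runnable check of the append behaviour, the sketch below writes to a throwaway temporary file (the path and the small DataFrames are made up for illustration), appends a second sheet, and then overwrites that sheet. The if_sheet_exists argument is an addition not mentioned in this answer; as far as I know it is only available in newer pandas releases (1.3 and later) and only together with mode='a'.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Work in a throwaway directory so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "sample.xlsx")

# Create the file with one sheet (default mode='w').
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    pd.DataFrame(np.random.randn(5, 2)).to_excel(writer, sheet_name="x1")

# Append a second sheet; the existing sheet "x1" is kept.
with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
    pd.DataFrame(np.random.randn(5, 2)).to_excel(writer, sheet_name="x2")

# Replace the contents of "x2"; without if_sheet_exists, newer pandas
# raises an error when the target sheet already exists in append mode.
replacement = pd.DataFrame({"a": [1, 2, 3]})
with pd.ExcelWriter(path, engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
    replacement.to_excel(writer, sheet_name="x2", index=False)

with pd.ExcelFile(path) as xlsx:
    sheet_names = xlsx.sheet_names
print(sheet_names)
```

Append mode relies on the openpyxl engine, so this sketch does not carry over to xlsxwriter, which can only create new files.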


3

This can be done without ExcelWriter, using the tools in openpyxl instead. That can also make it much easier to add fonts to the new sheet using openpyxl.styles.

import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows

#Location of original excel sheet
fileLocation =r'C:\workspace\data.xlsx'

#Location of new file which can be the same as original file
writeLocation=r'C:\workspace\dataNew.xlsx'

data = {'Name':['Tom','Paul','Jeremy'],'Age':[32,43,34],'Salary':[20000,34000,32000]}

#The dataframe you want to add
df = pd.DataFrame(data)

#Load existing sheet as it is
book = load_workbook(fileLocation)
#create a new sheet
sheet = book.create_sheet("Sheet Name")

#Load dataframe into new sheet
for row in dataframe_to_rows(df, index=False, header=True):
    sheet.append(row)

#Save the modified excel at desired location    
book.save(writeLocation)
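Since the answer mentions openpyxl.styles, here is a small sketch of what styling the new sheet could look like. It uses an in-memory Workbook in place of load_workbook(fileLocation), and the sheet name "Styled" and the bold header are illustrative choices, not part of the original answer.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import Font
from openpyxl.utils.dataframe import dataframe_to_rows

df = pd.DataFrame({"Name": ["Tom", "Paul"], "Age": [32, 43]})

# An in-memory workbook stands in for load_workbook(fileLocation).
book = Workbook()
sheet = book.create_sheet("Styled")

# Copy the DataFrame into the sheet, header row first.
for row in dataframe_to_rows(df, index=False, header=True):
    sheet.append(row)

# Make every cell in the header row (row 1) bold.
for cell in sheet[1]:
    cell.font = Font(bold=True)

header = [cell.value for cell in sheet[1]]
print(header)
```

Calling book.save(...) afterwards, as in the answer above, would write the styled sheet to disk.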

This is a nice solution, but I'm not sure what it implies. Do you mean it can't be done with ExcelWriter, or that you simply don't need it?
MattSom

You can do it with ExcelWriter, but I find it easier with just openpyxl.
吉斯·马修

2

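You can read the existing sheets of interest, e.g. "x1" and "x2", into memory and "write them back" before adding more new sheets (keep in mind that the sheets in the file and the sheets in memory are two different things; if you don't read them, they will be lost). This approach uses "xlsxwriter" only; openpyxl is not involved in the writing.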

import pandas as pd
import numpy as np

path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"

# begin <== read selected sheets and write them back
df1 = pd.read_excel(path, sheet_name='x1', index_col=0) # or sheet_name=0
df2 = pd.read_excel(path, sheet_name='x2', index_col=0) # or sheet_name=1
writer = pd.ExcelWriter(path, engine='xlsxwriter')
df1.to_excel(writer, sheet_name='x1')
df2.to_excel(writer, sheet_name='x2')
# end ==>

# now create more new sheets
x3 = np.random.randn(100, 2)
df3 = pd.DataFrame(x3)

x4 = np.random.randn(100, 2)
df4 = pd.DataFrame(x4)

df3.to_excel(writer, sheet_name='x3')
df4.to_excel(writer, sheet_name='x4')
writer.close()  # close() also writes the file to disk

If you want to preserve all existing sheets, you can replace the code between begin and end above with the following:

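# read all existing sheets first, then create the writer and write them back
# (creating the writer before reading may truncate the file on some pandas versions)
xlsx = pd.ExcelFile(path)
frames = {sheet: xlsx.parse(sheet_name=sheet, index_col=0) for sheet in xlsx.sheet_names}
xlsx.close()
writer = pd.ExcelWriter(path, engine='xlsxwriter')
for sheet, df in frames.items():
    df.to_excel(writer, sheet_name=sheet)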
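The read-everything-then-rewrite idea can also be sketched with pd.read_excel(..., sheet_name=None), which returns a dict mapping sheet names to DataFrames. The helper name add_sheets_by_rewriting and the temporary file are hypothetical; the writer engine is left at the pandas default here, and engine='xlsxwriter' could be passed if that package is installed. Note that only cell values survive such a round trip through DataFrames, not formatting.

```python
import os
import tempfile

import pandas as pd


def add_sheets_by_rewriting(path, new_frames):
    """Rewrite the workbook at `path`, keeping old sheets and adding `new_frames`."""
    # Read every existing sheet first; sheet_name=None returns {sheet name: DataFrame}.
    existing = pd.read_excel(path, sheet_name=None, index_col=0)
    # Only now open the file for writing, which starts a fresh workbook.
    with pd.ExcelWriter(path) as writer:
        for name, frame in {**existing, **new_frames}.items():
            frame.to_excel(writer, sheet_name=name)


# Usage on a throwaway file.
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"v": [1, 2]}).to_excel(writer, sheet_name="x1")

add_sheets_by_rewriting(path, {"x3": pd.DataFrame({"v": [3, 4]})})

with pd.ExcelFile(path) as xlsx:
    sheet_names = xlsx.sheet_names
print(sheet_names)
```

If a name in new_frames matches an existing sheet, the new DataFrame wins, because later keys override earlier ones when the two dicts are merged.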

1
#This program is to read from excel workbook to fetch only the URL domain names and write to the existing excel workbook in a different sheet..
#Developer - Nilesh K
import pandas as pd
from openpyxl import load_workbook #for writing to the existing workbook

df = pd.read_excel("urlsearch_test.xlsx")

#You can use the below for the relative path.
# r"C:\Users\xyz\Desktop\Python\

domains = []  # collects the extracted domain names

#begin
#Loop over every row of the sheet and extract the URL prefix from the TEXT column. You can have your own logic here.
for index, row in df.iterrows():
    try:
        text = row['TEXT']  # string to search (renamed from str to avoid shadowing the builtin)
        start = text.index('http')  # position where the URL starts
        end = text.index('/', text.index('/') + 2)  # first '/' after the '//' (assumes no '/' occurs earlier in the text)
        domain = text[start:end]  # substring holding the scheme and domain name
        domains.append(domain)  # append the domain to the list

    #Error handling: skip rows where no URL is found and continue.
    except ValueError:
        print('Error!')
print(domains)
domains = list(dict.fromkeys(domains))  # keep distinct values; comment this line out to keep all values
df1 = pd.DataFrame(domains, columns=['URL'])  # create a DataFrame from the list
#end

#Write using openpyxl so it can be written to same workbook
book = load_workbook('urlsearch_test.xlsx')
writer = pd.ExcelWriter('urlsearch_test.xlsx',engine = 'openpyxl')
writer.book = book
df1.to_excel(writer,sheet_name = 'Sheet3')
writer.close()  # close() also writes the file to disk

#The below can be used to write to a different workbook without using openpyxl
#df1.to_excel(r"C:\Users\xyz\Desktop\Python\urlsearch1_test.xlsx",index=False,sheet_name='sheet1')

1
Other than being about Excel, I don't see how this relates to the question.
Artog

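I was struggling to find a complete solution for reading from and writing to an existing workbook, but couldn't find one. Here I found a hint on how to write to an existing workbook, so I thought I would give a complete solution for my problem. Hope that's clear.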
nileshk611

0

Another fairly simple way to go about this is to write a method like this:

import logging

import pandas as pd
from openpyxl import load_workbook


def _write_frame_to_new_sheet(path_to_file=None, sheet_name='sheet', data_frame=None):
    book = None
    try:
        book = load_workbook(path_to_file)
    except Exception:
        logging.debug('Creating new workbook at %s', path_to_file)
    with pd.ExcelWriter(path_to_file, engine='openpyxl') as writer:
        if book is not None:
            writer.book = book
        data_frame.to_excel(writer, sheet_name=sheet_name, index=False)

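The idea here is to load the workbook at path_to_file if it exists and then append data_frame as a new sheet named sheet_name. If the workbook does not exist, it is created. It seems that neither openpyxl nor xlsxwriter appends, so, as in the example by @Stefano above, you really have to load and then rewrite in order to append.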
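For readers on a recent pandas, where assigning to writer.book is, as far as I know, no longer supported, the same create-or-append behaviour might be sketched by choosing the writer mode from whether the file exists. The helper name and the temporary path below are hypothetical, and this swaps the load-and-rewrite approach for ExcelWriter's mode argument rather than reproducing the method above.

```python
import os
import tempfile

import pandas as pd


def write_frame_to_new_sheet(path_to_file, sheet_name, data_frame):
    """Hypothetical helper: append a sheet if the workbook exists, else create it."""
    mode = "a" if os.path.exists(path_to_file) else "w"
    with pd.ExcelWriter(path_to_file, engine="openpyxl", mode=mode) as writer:
        data_frame.to_excel(writer, sheet_name=sheet_name, index=False)


# Usage on a throwaway file.
path = os.path.join(tempfile.mkdtemp(), "book.xlsx")
write_frame_to_new_sheet(path, "first", pd.DataFrame({"a": [1]}))    # creates the file
write_frame_to_new_sheet(path, "second", pd.DataFrame({"b": [2]}))   # appends to it

with pd.ExcelFile(path) as xlsx:
    sheet_names = xlsx.sheet_names
print(sheet_names)
```

Checking os.path.exists up front avoids the broad except Exception, so a corrupt or unreadable workbook surfaces as an error instead of being silently replaced.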

Licensed under cc by-sa 3.0 with attribution required.